President's Message: Membership, Leadership, Emerging Leaders, and LITA
Karen J. Starr

Karen J. Starr (kstarr@nevadaculture.org) is LITA President 2010–11 and Assistant Administrator for Library and Development Services, Nevada State Library and Archives, Carson City.

In 2006, ALA President Leslie Burger implemented six initiatives, including an Emerging Leaders program that is now in its fifth year. The initiative was designed to prepare librarians who are new to the profession in leadership skills that are applicable on the job and as active leaders within the association. LITA is sponsoring 2011 Emerging Leaders Bohyun Kim and Andreas Orphanides. Bohyun is currently digital access librarian at the Florida International University Medical Library. Andreas is currently librarian for digital technologies and learning at the North Carolina State University Libraries. As of the writing of this column, the projects for 2011 have not been assigned. Additional LITA members accepted into the 2011 ALA Emerging Leaders program include Tabatha Farney, Deana Greenfield, Amanda Harlan, Colleen Harris, Megan Hodge, Matthew Jabaily, Catherine Kosturski, Nicole Pagowsky, Casey Schacher, Sibyl Schaefer, Jessica Sender, and Andromeda Yelton.

LITA provides an ideal environment for its members to enhance their skills. In 2009, Emerging Leaders Team T developed a project, "Making It Personal: Leadership Development Programs for LITA," working in consultation with the LITA Membership Development Committee. Team members included Amanda Hornby (University of Washington), Angelica Guerrero Fortin (San Diego County Library), Dan Overfield (Cuyahoga Community College), and Lisa Carlucci Thomas (Yale University). The Team T members recommended the creation of "an online continuing education program to develop the leadership and project management skills necessary to maintain and promote the value and ability of LITA's professional membership to the greater librarian population." Outcomes for the training would include project-management and team-building skills within a context that focuses on the development and application of technology in libraries.

The team members also recommended establishing a LITA mentorship program that would provide for educating mentees about LITA, sharing areas of expertise and awareness, and developing a network of professionals. Dialogue on the LITA electronic discussion list and conversations with committee and interest group chairs suggest a desire and need for leadership training. The Membership Development Committee is addressing the need for mentors in LITA 101 and LITA 201, held at ALA Annual Conferences and Midwinter Meetings. LITA leadership, including the Membership Development Committee, committee and interest group chairs, the Education Committee, LITA Emerging Leaders, and others, will be included in an ongoing dialogue to see how and what can be implemented from the LITA Leadership Institute and the LITA mentorship program recommendations as submitted by the 2009 Emerging Leaders Team T.

Follow-up by LITA to implement the recommendations of Emerging Leader projects is important to the vitality and longevity of the association. Since 2007, a number of projects have been developed by Emerging Leaders. Information about the projects is available at the following locations online:

■ The ALA website: http://www.ala.org/ala/educationcareerleadership/emergingleaders/index.cfm
■ ALA Connect: http://connect.ala.org/emergingleaders
■ Facebook: http://www.facebook.com/pages/ala-emerging-leaders/156736295251?ref=ts/
■ The Emerging Leaders blog: http://connect.ala.org/2011emergingleaders
■ The Emerging Leaders wiki: http://emergingleaders.ala.org/wiki/index.php?title=main_page


Editorial Board Thoughts
Kyle Felker

Kyle Felker (felkerk@wlu.edu) is an ITAL Editorial Board member, 2007–09, and technology coordinator at Washington and Lee University Library in Lexington, Virginia.

Editor's note: We have an excellent editorial board for this journal, and with this issue we've decided to begin a new column. In each issue of ITAL, one of our board members will reflect on some question related to technology and libraries. We hope you find this new feature thought-provoking. Enjoy!

Any librarian who has been following the professional literature at all in the past ten years knows that there has been an increasing emphasis on user-centeredness in the design and creation of library services. Librarians are trying to understand and even anticipate the needs of users to a degree that's perhaps unprecedented in the history of our profession.

It's no mystery as to why. We now live in a world where global computer networks link users directly with information in such a way that often no middleman is required. Users are exploring information on their own terms, at their own convenience, sometimes even using technologies and systems that they themselves have designed or contributed to. At the same time, most libraries are feeling a financial pinch. Resources are tight, and local governments, institutions of higher education, and corporations are all scrutinizing their library operations more closely, asking "What have you done for me lately?" The unspoken coda is "It better be something good, or I'm cutting your funding."

The increasing need to justify our existence, together with our desire to build more relevant services, is driving an increased interest in assessment. How do we know when we've built a successful service? How do we define "success"? And, perhaps most importantly, in a world filled with technologies that are "here today, gone tomorrow," how do we decide which ones are appropriate to build into enduring and useful services?

As a library technologist, it's this last question that concerns me the most. I'm painfully aware of how quickly new technologies develop, mature, and fade silently into that good night with nary a trace. It's like watching protozoa under a microscope. Which of these can serve as the foundation for real, useful services? It's obvious to me that if I'm going to choose well, it's vital that I place these services in context—and not my context, the user context. In order to do that, I need to understand the users. How do they do their work? What are they most concerned with? How do they think about the library in relation to the research process? How do they use technology as part of that process? How does that process fit into the larger context of the assignment?

To answer questions like these, librarians often turn to basic marketing techniques such as the survey or the focus group. Whether we are aware of it or not, the emphasis on user-centered design is making librarians into marketers. This is a new role for us, and one that most of us have not had the training to cope with. Since most of us haven't been exposed to marketing as a discipline of study, we don't think of what we do as marketing, even when we use marketing techniques. But that's what it is. So whether we know it or not, marketing, particularly market research, is important to us.
Marketing as a discipline is in the process of undergoing some major changes right now. Recent research in sociology, psychology, and neuroscience has uncovered some new and often startling insights into how human beings think and make decisions. Marketers are struggling to incorporate these new models into their research methods and to change their own thinking about how they discover what people want.

I recently collided with this change when my own library decided to do a focus group to help us redesign our website. Since we have a school of business, I asked one of our marketing professors for help. Her advice? Don't do it. As she put it: "You and the users would just be trading ignorances." She then gave me a reading list, which included How Customers Think by Gerald Zaltman, which I now refer to as "the book that made marketing sexy."1

Zaltman's book pulls together a lot of the recent research on how people think, make choices, and remember. Some of it is pretty mind-blowing:

■ 95% of human reasoning is unconscious. It happens at a level we are barely aware of.
■ We think in images much more than we do in language.
■ Social context, emotion, and reason are all involved in the decision-making process. Without emotion, we literally are unable to make choices.
■ All human beings use metaphors to explain and understand the world around them. Metaphor is the bridge between the rational and emotional parts of the decision-making process.
■ Memory is not a collection of immutable snapshots we carry around in our heads. It's much more like a narrative or story—one that we change just by remembering it. Our experiences of the past and present are inextricably linked—one is constantly influencing the other.

Heady stuff. If you follow many of these ideas to their logical conclusions, you end up questioning the value of many traditional marketing techniques, such as surveys and focus groups. For example, if the social context in which a decision is made is important, then surveys are often going to yield false data, since the context in which the person is deciding to tick off this or that box is very different from the context in which they actually decide to use or not use your service or product. Asking users "what services would be useful" in a focus group won't be effective because you are only interviewing the users' rational thought process—it's at least as important to find out how they feel about the service, your library, the task itself, and how they perceive other people's feelings on the subject.

Zaltman proposes a number of very different marketing techniques to get a more complete picture of user decision making:

■ Use lengthy, one-on-one interviews. Interviewing the unconscious is tricky and takes trust; it's something you can't do in a traditional focus group setting.
■ Use images. We think in images, and images are a richer field for bringing unconscious attitudes to the surface.
■ Use metaphor. Invite interviewees to describe their feelings and experiences in metaphor. Explore the metaphors they come up with to more fully understand all the context.

If this sounds more like therapy than marketing to you, then your initial reaction is pretty similar to mine.
But the techniques follow logically from the research Zaltman presents. How many of us have done user assessment and launched a new service, only to find a less than warm reception for it? How many of us have had users tell us they want something, only to see it go unused when it's implemented? Zaltman's model offers potential explanations for why this happens, and methods for avoiding it.

Lest you think this has nothing to do with technology, let me offer an example: library Facebook/MySpace profile pages. There's been a lot of debate on how effective and appropriate these are. It seems to me that we can't gauge how receptive users are to this unless we understand how they feel about and think about those social spaces. This is exactly the sort of insight that new marketing techniques purport to offer us. In fact, if the research is right, and there is a social and emotional component to every choice a person makes, then that applies to every choice a user makes with regard to the library, whether it's the choice to ask a question at the reference desk, the choice to use the library website, or the choice to vote on a library bond issue.

Librarians are doing a lot of things we never imagined we'd ever need or want to do. Web design. Archival digitization. Tagging. Perhaps it's also time to acknowledge that what we do has an important marketing component, and to think of ourselves as marketers (at least part time). I'm sold enough on Zaltman's ideas that I'm willing to try them out at my own institution, and I encourage you to do the same.

Reference

1. Gerald Zaltman, How Customers Think: Essential Insights into the Mind of the Market (Boston, Mass.: Harvard Business School Press, 2003).


Editorial Board Thoughts: Appreciation for History
Cynthia Porter

Cynthia Porter (cporter@atsu.edu) is distance support librarian at A.T. Still University of Health Sciences, Mesa, Arizona.

The future looks exciting for ITAL, with our new open-access and online-only journal. As I look forward, I have been thinking about librarians and the changes I have witnessed in library technology. I would like to thank Judith Carter for her work on ITAL for over 13 years. She encouraged me to volunteer for the editorial board. I will miss her. I believe that lessons from the past can help us.

ITAL's first issue appeared in 1982—the same year that I graduated from high school. I typed all my school papers with a typewriter except for my last couple of papers in college. My father bought an early Apple computer (the Lisa). He had a daisy wheel printer—if we wanted to change fonts, we changed out the daisy wheel. I am thankful for the editing capabilities and font choices I have now when I create documents.

As an undergraduate student, I worked on dedicated OCLC terminals in the interlibrary loan (ILL) department at my college library. I was hired because I had the two hours open when ILL usually used mail. I thought our ILL service was a big help for our students. I could not imagine then that electronic copies of articles could be delivered to ILL customers within one day. Today's ILL staff doesn't have to worry about paper cuts now, either.

I graduated from library school in 1989. When I first started working as a cataloger, we were able to access OCLC on PCs (an improvement from the dumb terminals) in the libraries. Our subject heading lists were in the big red books from the Library of Congress. I tried to use the red books as an example for today's students and they had no idea what I was talking about.
Even though "subject headings" are a foreign concept to many students today, I will always value them and fight for their continuation.

I worked on several retrospective conversion projects when I worked for a library contractor until 1991. The libraries still had card catalogs, and we converted these physical catalogs to online catalogs. Nicholson Baker's article "Discards,"1 published in 1994, fondly remembered card catalogs. This article was discussed fervently in library school, but it seems quaint now. I grew up with card catalogs and I liked being able to browse through the subject listings. Browsing online does not provide the same satisfaction, but I would never give up the ability to keyword search an electronic document. I liked browsing the classification schemes, too. I liked easily seeing where my chosen number appeared within the scheme. It's harder to do the same thing online.

In 1991 I worked at an academic library where we were still converting catalog cards. We all had computers on our desks by then, and we were comfortable with regular use of e-mail. The Internet was still young, and Gophers were the new technology. Even though Gophers were text-based, I thought it was amazing how easy it was to access information from a university on the other side of the country. The Internet was the biggest technology development for me. I currently work with distance students who rely on their Internet connections to use our online library. I could not imagine even having distance students if we weren't connected with computers as we are now.

A 2009 issue of ITAL was dedicated to discovery tools. In Judith Carter's introduction to the issue she cites the browsing theory of Shan-Ju Lin Chang. Browsing is an old practice in libraries, and I am very happy to see that discovery tools use this classic library practice. Bringing like items together has been a helpful organization method for ages. When I studied S. R. Ranganathan and his Colon Classification scheme, I realized that faceted classification would work very well on the web. I found his ideas to be fascinating, but difficult to implement on book labels for classification numbers. Some discovery tools even identify "facets" in searching and limiting. Ranganathan's work is a beautiful example of an old idea blossoming years after its conception. Classification, facets, and browsing are old ideas that are still helping us organize information in our libraries. We can't see the heavily used subjects by how dirty the cards are, but getting exact statistics on search terms is more useful anyway.

I would also like to thank Marc Truitt for his time and contributions to ITAL. Marc recently finished serving for four years as ITAL editor. He helped me remember library technology. I wanted to know about his collaboration with Judith Carter. He said that he "thought no one this side of Pluto could do as well as she" as managing editor. We are lucky to have had brave librarians like Ranganathan, Carter, and Truitt. Although I enjoy remembering the past, I am very happy to utilize modern technology in my library. I don't want to live in the past, but I definitely don't want to forget it either. Thank you, library technology pioneers.

References

1. Nicholson Baker, "Discards," The New Yorker, April 4, 1994, vol. 70, no. 7, 64–85.
President's Message: Moving Forward
Karen J. Starr

Cloud computing. Web 3.0, or the Semantic Web. Google Editions. Books in copyright and books out of copyright. Born digital. Digitized material. The reduction of Stanford University's engineering library book collection by 85 percent. The publishing paradigm most of us know, and have taken for granted, has shifted. Online databases came and we managed them. Then CD-ROMs showed up and mostly went away. And along came the Internet, which we helped implement, use, and now depend on. How we deal with the current shifts happening in information and technology during the next five to ten years will say a great deal about how the library and information community reinvents itself for its role in the twenty-first century.

This shift is different, and it will create both opportunities and challenges for everyone, including those who manage information and those who use it. As a reflection of the shifts in the information arena, LITA is facing its own challenges as an association. It has had a long and productive role in the American Library Association (ALA) dating back to 1966. The talent among the association members is amazing, solid, and a tribute to the individuals who belong to and participate in LITA. LITA's members are leaders to the core and recognized as standouts within ALA as they push the edge of what information management means, and can mean.

For the past three years, LITA members, the board, and the executive committee have been working on a strategic plan for LITA. That process has been described in Michelle Frisque's "President's Message" (ITAL v. 29, no. 2) and elsewhere. The plan was approved at the 2010 ALA Annual Conference in Washington, D.C. A plan is not cast in concrete. It is a dynamic, living document that provides the fabric that drives the association. Why is this process important now more than ever? We are all dealing with the current recession. Libraries are retrenching. People face challenges participating in the library field on various levels. The big information players on the national and international level are changing the playing field. As membership, each of us has an opportunity to affect the future of information and technology locally, nationally, and internationally. This plan is intended to ensure LITA's role as a "go to" place for people in the library, information, and technology fields well into the twenty-first century.

LITA committees and interest groups are being asked to step up to the table and develop action plans to implement the strategies the LITA membership have identified as crucial to the association's ongoing success. Members of the board are liaisons to each of the committees, and there is a board liaison to the interest groups. These individuals will work with committee chairs, interest group chairs, and the membership to implement LITA's plan for the future. The committee and interest group chairs are being asked to contribute those action plans by the 2011 ALA Midwinter Meeting. They will be compiled and made available to all LITA and ALA members for their use through the LITA website (http://lita.org) and ALA Connect (http://connect.ala.org).

What is in it for you? LITA is known for its leadership opportunities, continuing education, training, publications, expertise in standards and information policy, and knowledge and understanding of current and cutting-edge technologies. LITA provides you with opportunities to develop those leadership skills that you can use in your job and lifelong career. The skills of working within a group of individuals to implement a program, influence standards and policy, collaborate with other ALA divisions, and publish can be taken home to your library. Your participation documents your value as an employee and your commitment to lifelong learning. In today's work environment, employers look for staff with proven skills who have contributed to the good of the organization and the profession.

LITA needs your participation in developing and implementing continuing education programs, publishing articles and books, and illustrating by your actions why others want to join the association. How can you do that? Volunteer for a committee, help develop a continuing education program, write an article, write a book, be a role model for others with your LITA participation, and recruit. What does your association gain? A solid structure to support its members in accomplishing the mission, vision, and strategic plan they identified as core for years to come.

Look for opportunities to participate and develop those skills. We will be working with committee and interest group chairs to develop meeting management tool kits over the next year, create opportunities to participate virtually, identify emerging leaders of all types, collaborate with other divisions, and provide input on national information policy and standards through ALA's Office for Information Technology Policy and other similar organizations. If you want to be involved, be sure to let LITA committee and interest group chairs, the board, and your elected officers know.


Editorial: Why Is ITAL Important?
Dan Marmion

Editor's note: What follows is a reprint of Dan Marmion's editorial from ITAL 20, no. 2 (2001), http://www.ala.org/ala/mgrps/divs/lita/ital/2002editorial.cfm. After reading, we ask you to consider: Why does ITAL matter to you? Post your thoughts on ITALica (http://ital-ica.blogspot.com/).

Some time ago I received an e-mail from a library school student, who asked me, "Why is [ITAL] important in the library profession?" I answered the question in this way:

ITAL is important to the library profession for at least four reasons. First, while it is no longer the only publication that addresses the use of technology in the library profession, it is the oldest (dating back to 1968, when it was founded as the Journal of Library Automation) and, we like to think, most distinguished. Second, not only do we publish on a myriad of topics that are pertinent to technology in libraries, we publish at least three kinds of articles on those subjects: pure scholarly articles that give the results of empirical research done on topics of importance to the profession, communications from practitioners in the field that present real-world experiences from which other librarians can profit, and tutorials on specific subjects that teach our readers how to do useful things that will help them in their everyday jobs. The book and software reviews that are in most issues are added bonuses. Third, it is the "official" publication of LITA, the only professional organization devoted to the use of information technology in the library profession.
Fourth, it is a scholarly, peer-reviewed journal, and as such is an important avenue for many academic librarians whose career advancement depends in part on their ability to publish in this type of journal.

In a sentence, then, ITAL is important to the library profession because it contributes to the growth of the profession and its professionals.

After sending my response, I thought it would be interesting to see what some other people with close associations to the journal would add. Thus I posed the same question to the editorial board and to the person who preceded me as editor. Here are some of their comments:

One of the many things that was not traditionally taught in library school was a systematic approach to problem solving—for somebody who needs to acquire this skill and doesn't have a mentor handy, ITAL is a wonderful resource. Over and over again, ITAL describes how a problem was identified and defined, explains the techniques used to investigate it, and details the conclusions that might fairly be drawn from the results of the investigation. Few other journals so effectively model this approach. Regardless of the specific subject of the article, the opportunity to see practical problem-solving techniques demonstrated is always valuable. (Joan Frye Williams)

The one thing I would add to your points, and it ties into a couple of them, is that by some definitions a "profession" is one that does have a major publication. As such, it is not only the "official" publication of LITA but an identity focus for those professionals in this particular area of librarianship. In fact, ideally, I would like to think that's more of a reason why ITAL is important than just the fact that it's a perk of LITA membership. (Jim Kopp)

Real-world experiences from which other librarians would profit—to use your own words. That is my primary reason for reading it, although I take note of tutorials as well. And the occasional book review here may catch my eye, as it is likely more detailed than what might appear in LJ or Booklist, and [I would] be more likely to purchase it for either my office or for the general collection. (Donna Cranmer)

ITAL begins as the oldest and best-established journal for refereed scholarly work in library automation and information technology, a role that by itself is important to libraries and the library profession. ITAL goes beyond that role to add high-quality work that does not fit in the refereed-paper mold, helping librarians to work more effectively. As the official publication of America's largest professional association for library and information technology, ITAL assures a broad audience for important work—and, thanks to its cost-recovery subscription pricing, ITAL makes that work available to nonmembers at prices far below the norm for scholarly publishing. (Walt Crawford)

The journal serves as an historical record/documentation and joins its place with many other items that together record the history of mankind. A professional/scholarly journal has a presumed life that lasts indefinitely. (Ken Bierman)

ITAL is a formal, traditional, and standardized way of sharing ideas within a specific segment of the library community. Librarianship is an institutional profession. As an institution it is an organic organization requiring communication between its members. An advantage of written communication, especially paper-based written communication, is its ability to transcend space and time. A written document can communicate an idea long after the author has died and halfway around the world. Yes, electronic communication can do the same thing, but electronic communication is much more fragile than ideas committed to paper. ITAL provides one means of fostering this communication in a format that is easily usable and recognizable. It is not the only communications format, but it fills a particular niche. In a sentence, ITAL is important to the profession because "communication is the key to our success." (Eric Lease Morgan)

So there you have the thoughts of the editor and a few other folks as to why this journal is important.

Dan Marmion was editor of ITAL, 1999–2004. This editorial was first published in the June 2002 issue of ITAL.

* * *

Why does ITAL matter to you? Post your thoughts on ITALica (http://ital-ica.blogspot.com/).


Editorial
Marc Truitt

Marc Truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, University of Alberta Libraries, Edmonton, Alberta, Canada, and editor of ITAL.

As I write this, Hurricane Ike is within twelve hours of making landfall in Texas; currently, it appears that the storm will strike directly at the Houston–Galveston area. Houstonians with long memories will be comparing Ike to Hurricane Alicia, which devastated the region in 1983, killing twenty-one and doing $2.6 billion in damage.1 Younger residents and/or more recent immigrants to the area will recall Tropical Storm Allison, which, though not of hurricane force, lashed the city and much of east Texas for two weeks in June 2001, leaving in its wake twenty-three dead, $6.4 billion in losses, and tens of thousands of homes damaged or destroyed.2 And of course, more recently, and much better known to all of us, regardless of where we live, Katrina, the "mother of all storms," killed over eighteen hundred, caused over $80 billion in damage, left huge swaths of New Orleans uninhabitable, and created a population exodus with whose effects we are living even to this day.3

Common to each of these disasters—and so many others like them—is the fact that they have often wrought terrible damage on libraries in their areas. Most of us have probably seen the pictures of the water- and mildew-damaged collections at Tulane, Xavier, the University of New Orleans, and the New Orleans Public Library system. And the damage from these events is long-term or even permanent. I formerly worked at the University of Houston (UH), and when I left there in 2006 that institution was still dealing with the consequences of Allison's destruction of UH's subterranean law library. And now I have to wonder whether UH librarians, faculty, and students might not be facing a similar or even worse catastrophe all over again with Ike.

ITAL Editorial Board member Donna Hirst has done the profession a great service with her column, "The Iowa City Flood of 2008: A Librarian and IT Professional's Perspective," which appears in this issue. Her account of how library IT folks there dealt with relocations of servers, library staff, and indeed library IT staff members themselves should be made required reading for all of us in the field, as well as for senior library administrators.

The problem, I think we all secretly know, is that emergency preparedness—also known by its current moniker, "business continuity planning" (BC)—and disaster recovery (DR) are not "sexy" subjects. Devoting a portion of our always too modest resources of money, equipment, staffing, and time to what is, at best, a sort of insurance against what might happen someday seems inexcusably profligate today.
Such planning and preparation doesn't roll out any shiny new services and will win few plaudits from staff or patrons, to say nothing of new resources from those who control our institutional purse strings. Buying higher-bandwidth equipment for a switching closet is likely to be a far easier sell.

That is, until that unthinkable something happens, and your organization is facing (or suffers) a catastrophic loss of IT services. Note that I didn't say "equipment" or "infrastructure." The really important loss will be one of services. "Stuff"—in the form of servers, workstations, networks, etc.—all costs money, but ultimately is replaceable. What are not replaceable—at least not immediately—are library services to staff and patrons: access to computing (networking, e-mail, productivity applications, etc.), Internet resources, and perhaps most importantly nowadays, the licensed electronic content on which we and our patrons have so come to rely. While the news coverage will emphasize (not without justice, I think) the lost or rescued books in a catastrophic loss situation, what staff and patrons are likely to demand first and loudest will be continuation or restoration of technology-based library services such as e-mail, web presence, web access, and licensed content. Lest there be doubt, does anyone recall what drove evacuees into public libraries in the wake of Katrina? It was, as much as anything, the desire to locate loved ones and especially the need to seek information and forms for government assistance—all of which required access to networked computing resources.

I suspect that many of us have a DR plan—if we have one at all—that is sadly dated and that has never been tested. Look at it this way: would you roll out a critical and highly visible new web service without careful preparation and testing? Yet many of us somehow think that BC or DR is somehow different, with no periodic review or testing required. Since we feel we have no resources to devote to BC or DR planning and testing, we excuse our failure to do so by telling ourselves and our administrations that "we can't really plan for a disaster, since the precise circumstances for which we're planning won't be the ones that actually occur." And so we find ourselves later facing a crisis without any preparation.

Here at the University of Alberta Libraries, we've been giving the questions of business continuity and disaster recovery a good deal of thought lately. Our preexisting DR plan was typical of the sort I've described above: out-of-date, vanishingly skeletal in its details, without explicit reference or relevance to maintenance and restoration of mission-critical services, and of course, untested. Impetus for our review has come from several sources. Perhaps the most interesting of these has been a university-sponsored BC planning process that embraces a two-pronged approach:

■ Identify and prioritize your organization's services. Working with other constituencies within the library, we have identified and prioritized approximately ten broad services to be maintained or restored in the event of an interruption of our normal business activities.
For example, our top priority is the continuation or restoration of access to licensed electronic content (e.g., e-journals, e-books, databases, etc.). Our IT disaster planning will be informed by and respond to this goal.

■ Identify "upstream" and "downstream" dependencies. We are dependent on others for services so that we can provide our own; thus we cannot offer access to the Internet for our users unless campus IT provides us with a gateway to off-campus networks. We need to make certain as we plan that campus IT is aware of and can provide this service in the scenarios for which we're planning. By the same token, others are dependent on us for the provision of services critical to their planning: our consortial partners, for example, rely on us for ILS, document delivery, and other technology-based services that we need to plan to continue in the event of a disaster.

These two facets—services and dependencies—can be expressed as a matrix that is helpful in planning for BC and DR goals that are both responsive to the needs of the organization and achievable in terms of upstream and downstream dependencies (a minimal sketch of what such a matrix might look like appears below). It has been an enlightening exercise. One consequence has been our decision to include, as part of next fiscal year's budget request, funding to help create a DR site at our library's remote storage facility, to enable us quickly to restore access to our most critical technology services. In the past, we might have used this annual request as an opportunity to highlight our need for funding to support rolling out some glamorous new service initiative. With this request, though, we are explicitly recognizing that we as an organization need to commit to measures that ensure the continuance, in a variety of situations, of our existing core services. That's a major change in mindset for us, as I suspect it would be for many library IT organizations.

A final interesting aspect of our planning process is that one of the major drivers for the university is a concern about business continuity in the event of a people-based disaster. As avian influenza (aka "bird flu") has spread beyond the confines of its southeast Asian point of origin, worry about how we continue to operate in the midst of a pandemic has been added to the more predictable suite of fires, floods, tornadoes, and earthquakes (okay, not likely in Alberta). Indeed, pandemic planning is in many ways far more difficult than that for more "normal" disasters. While in many smaller libraries the "IT shop" may consist of one person in many hats, in larger organizations such as ours (approximately 25 full-time-equivalent employees in library IT), there tends to be a great deal of specialization. Can the webmaster, in the midst of a crisis, support staff workstations? Can the help desk technician deduce why our vendor for Web of Science has suddenly and inexplicably disabled our access? Our BC process rules tell us that we should be planning for "three-deep" expertise in all critical areas, since the assumption is that a pandemic might mean that a third or more of our staff would be ill (or worse) at any given time. How many of us offer critical technology services that suffer from that IT manager's ultimate staffing nightmare, the single point of failure?

We have no profound answers to these questions, and our planning process is by no means the one that will work for all organizations. But the evidence of Katrina, Ike, and Iowa City is plain: we need to be as prepared as possible for these events.
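To make the services-and-dependencies idea a bit more concrete, here is a minimal sketch of how such a matrix might be recorded. Everything in it is hypothetical: the service names, priorities, and dependencies are invented placeholders for illustration, not the University of Alberta Libraries' actual plan. The point is only that each service carries a priority plus its upstream and downstream dependencies, so that a restoration order (and what each step presupposes) can be read directly from the data.

# Illustrative only: a toy services-and-dependencies matrix for BC/DR planning.
# All names and priorities below are hypothetical placeholders.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    name: str
    priority: int                                         # 1 = restore first
    upstream: List[str] = field(default_factory=list)     # what this service needs from others
    downstream: List[str] = field(default_factory=list)   # who depends on this service


services = [
    Service("licensed e-content access", 1,
            upstream=["campus network gateway", "vendor authentication"],
            downstream=["patrons", "distance students"]),
    Service("ILS (catalog and circulation)", 2,
            upstream=["database server", "campus network gateway"],
            downstream=["consortial partners", "document delivery"]),
    Service("library website", 3,
            upstream=["web server", "campus DNS"],
            downstream=["all public services"]),
]


def restoration_order(svcs: List[Service]) -> List[Service]:
    """Sort services by priority, i.e., the order in which to restore them."""
    return sorted(svcs, key=lambda s: s.priority)


if __name__ == "__main__":
    for svc in restoration_order(services):
        print(f"{svc.priority}. {svc.name}")
        print(f"   needs: {', '.join(svc.upstream)}")
        print(f"   needed by: {', '.join(svc.downstream)}")

In practice the same information could just as easily live in a spreadsheet or on a wiki page; what matters for planning is that priorities and dependencies are written down somewhere that can be consulted, and acted on, in the middle of an outage.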
The time to "get religion" about business continuity and disaster recovery is before the unthinkable occurs, not after. Are there any of you out there with experiences—either in preparation and planning or in recovery operations—that you would consider sharing with ITAL readers? We all would benefit from your thoughts and experiences. I know I would!

Post-Ike postscript. Ike roared ashore four days ago, and it is clear from media coverage since that Galveston suffered a catastrophe and Houston was badly damaged. Reports from area libraries are sketchy and only today beginning to filter out. Meanwhile, at the University of Houston, the building housing the architecture library lost its roof, and the salvageable portions of its collection are to be relocated to the main M.D. Anderson Library.

References

1. "Hurricane Alicia," Wikipedia, http://en.wikipedia.org/wiki/Hurricane_Alicia (accessed Sept. 12, 2007).
2. "Tropical Storm Allison," Wikipedia, http://en.wikipedia.org/wiki/Tropical_Storm_Allison (accessed Sept. 12, 2007).
3. "Hurricane Katrina," Wikipedia, http://en.wikipedia.org/wiki/Hurricane_Katrina (accessed Sept. 12, 2007).


Editorial: Beginnings
Marc Truitt

As I write these lines in late February, the first hints of spring on the Alberta prairie are manifest. Alternatively, perhaps it's just that the longer and warmer days are causing me to "think spring." There are no signs yet of early bulbs—at least, none that I can detect with around a foot of snow in most places—but the sun is now rising at 7:30 a.m. and not setting until 6 p.m., a dramatic change from the barely seven hours of daylight typical of December and January. And while none but the hardiest souls are yet outside in shorts and shirt-sleeves, somehow, daytime highs that hover around freezing seem downright pleasant in comparison with the minus thirties (not counting the wind chill) we were experiencing even a couple of weeks ago. Yes, spring is in the air, even if the calendar says it is still nearly a month away. . . .

So what, you may fairly ask, does the weather in Edmonton have to do with ITAL? This is my first issue of ITAL as editor, and it may not surprise you to hear that I've been thinking quite a bit about what might be the right theme and tone for my first column. While I've been associated with the journal for quite awhile—first as a board member, and more recently as managing editor—my role has always been comfortably limited to background tasks such as refereeing papers and production issues. Now, that is about to change; I am stepping a bit out of my comfort zone. It's about beginnings.

I follow with some awe in the footsteps of a long line of editors of ITAL (and JOLA, its predecessor). I've been honored to serve—and to learn a great deal—from the last two, Dan Marmion and John Webb. You, the readers of ITAL, and I are fortunate to have as returning managing editor Judith Carter, who preceded me and taught me the skills required for that post; I hasten to emphasize that she is definitely not responsible for the things I did not do right in the job! Regular readers of ITAL will recall that John Webb often referred humorously and admiringly to the members of the ITAL Editorial Board as his "junkyard dogs"; he claimed that they kept him honest. With the addition of a couple of fine new members, I'm confident that they will continue to do so in my case!

Okay, with that as preface, enough about me . . . let's talk about ITAL.
■ What's inside this issue

ITAL content has traditionally represented an eclectic blend of the best mainstream and leading/bleeding edge of library technology. We strive to be reflective of the broad, major issues of concern to all librarians, as well as alert to interesting applications that may be little more than a blip at the edge of our collective professional radar screen. Our audience is not limited to those actively working in library technology, although they certainly form ITAL's core readership; rather, we seek to identify and publish content that will be relevant to all with an interest in or need to know about how technology is affecting our profession. Thus, some articles will resonate with staff seeking new ways to use Web 2.0 technologies to engage our readers, while other articles will be of interest to those interested in better exploiting the four decades' worth of bibliographic metadata that forms the backbone of our integrated library systems.

The current issue of ITAL is no exception in this regard. We lead off with two papers that reflect the renewed interest of the past several years in the role and improvement of the library online catalog. Jia Mi and Cathy Weng review OPAC interfaces, searching functionality, and results displays to address the question of why the current OPAC is ineffective and what we can do to revitalize it. Timothy Dickey, in a contribution that received the 2007 LITA/Ex Libris Student Writing Award,1 summarizes the challenges and benefits of a FRBR approach to current and "next-gen" library catalogs. Interestingly, as will become clear at the end of this column, Dickey's is not the first prize-winning FRBR study to appear in the pages of ITAL.

Online learning has long been a subject of interest both to librarians and to the education sector as a whole. Whereas the focus of many previous studies has been on the techniques and efficacy of online learning systems, though, Connie Haley's paper takes a rather different approach, describing and exploring factors that characterize the preference of learners for online training, as compared with more traditional in-person techniques.

In Gary Wan's and Zao Liu's investigation of content-based information retrieval (CBIR) in digital libraries, the authors describe and argue for systems that will enable identification of images and audio clips by automated comparison against digital libraries of image and audio files. Finally, Wooseob Jeong prototypes an innovative application for enhancing web access by the visually impaired. Jeong's application makes use of force feedback, an inexpensive, proven technology drawn from the world of video gaming.

■ Some ideas about where we are going

A change of editorship is always one of those good opportunities for thinking about how we might improve, or of different directions we might explore. With that in mind, here are a couple of things we're either going to try, or that we're considering:

Different voices. ITAL's format has long included provision for two "opinion" columns, one by the editor and another by the president of LITA. From time to time, past editors have given over their columns for guest editorials.
However, there are many other voices that could enrich ITAL's pages, and the existing structure doesn't really have a "place" for the regular airing of these voices. Beginning with the June 2008 issue, ITAL will include a regular column contributed by members of the board, on a rotating basis. The column will be about any topic related to technology and libraries that is on the author's mind. I'm thinking about how we might expand this to include a similar column contributed by ITAL readers. While such reader contributions may lack the currency of a weblog, I think that they would make for thought-provoking commentary.

Oh, and there's that "currency thing." In recent years, those of us who bring you ITAL have—as have those responsible for other ALA publications—discussed at length the whole question of when and how to move to a sustainable model of electronic publishing that will address the needs of readers. This issue is of course especially important in the case of a technology-focused journal, where content tends to age rapidly. Unfortunately, for various reasons, we're not yet at the stage where we can go completely and solely electronic. A recent conversation with one board member, though, surfaced an idea that I think in the meantime has merit: essentially, we might create a preprint site for papers that have been accepted and edited for future publication in ITAL. We might call it something such as ITAL Express, and its mission would be to get content awaiting publication out and accessible. Is this a "done deal"? No, at this stage, it's just an intriguing idea, and I'd be interested in hearing your views about it . . . or anything else related to ITAL, for that matter. You can e-mail me at marc.truitt@ualberta.ca.

■ And finally, congratulations dept.

Last week, Martha Yee, of the Film and Television Archive at the University of California, Los Angeles, received the ALCTS Cataloging and Classification Section's Margaret Mann Citation for 2008. Martha was "recognized for her outstanding contributions to the practice of cataloging and her interest in cataloging education . . . [and her] professional contributions[, which] have included active participation in ALA and ALCTS and numerous publications." Of particular note, the citation specifically singled out her work in the areas of "FRBR, OPAC displays, shared cataloging and other important issues, [in which] Yee is making a significant contribution to the discussions that are leading the development of our field." Surely among the most important of these is her paper "FRBRization: A Method for Turning Online Public Finding Lists into Online Public Catalogs," which appeared in the June 2005 issue of ITAL (p. 77–95). Archived at the ITAL site, dLIST, the CDL eScholarship Repository, and elsewhere, this seminal contribution has become one of the most accessed and cited works on FRBR. We at ITAL are proud to have provided the original venue for this paper and congratulate Martha on being named recipient of the Margaret Mann Award.


Editorial
Marc Truitt

Welcome to 2009! It has been unseasonably cold in Edmonton, with daytime "highs"—I use the term loosely—averaging around -25°C (that's -13°F, for those of you ITAL readers living in the States) for much of the last three weeks. Factor in wind chill (a given on the Canadian prairies), and you can easily subtract another 10°C.
As a result, we've had more than a few days and nights where the adjusted temperature has been much closer to -40°, which is the same in either Celsius or Fahrenheit. While my boss and chief librarian is fond of saying that "real Canadians don't even button their shirts until it gets to minus forty," I've yet to observe such a feat of derring-do by anyone at much less than twenty below. Even your editor's two Labrador retrievers—who love cooler weather—are reluctant to go out in such cold, with the result that both humans and pets have all been coping with bouts of cabin fever since before Christmas.

■ So, when is it "too cold" for a server room?

Why, you may reasonably ask, am I belaboring ITAL readers with the details of our weather? Over the weekend we experienced near-simultaneous failures of both cooling systems in our primary server room (SR1), which meant that nearly all of our library IT services, including our OPAC (which we host for a consortium of twenty area libraries), a separate OPAC for Edmonton Public Library, our website, and access to licensed e-resources, e-mail, files, and print servers had to be shut down. Temperature readings in the room soared from an average of 20–22°C (68–71.5°F) to as much as 37°C (98.6°F) before settling out at around 30°C (86°F). We spent much of the weekend and beginning of this week relocating servers to all manner of places while the cooling system gets fixed. I imagine that next we may move one into each staff person's under-heated office, where they'll be able to perform double duty as high-tech foot warmers! All of this happened, of course, while the temperature outside the building hovered between -20° and -25°C.

This is not the first time we've experienced a failure of our cooling systems during extremely cold weather. Last winter we suffered a series of problems with both the systems in SR1 and in our secondary room a few feet away. The issues we had then were not the same as those we're living through now, but they occurred, as now, at the coldest time of the year. This seeming dichotomy of an overheated server environment in the depths of winter is not a matter of accident or coincidence; indeed, while it may seem counterintuitive, the fact is that many, if not all, of our cooling woes can be traced to the cold outside. The simple explanation is that extreme cold weather stresses and breaks things, including HVAC systems.

As we've tried to analyze this incident, it appears likely that our troubles began when the older of our two systems in SR1 developed a coolant leak at some point after its last preventive maintenance servicing in August. Fall was mild here, and we didn't see the onset of really severe cold weather until early to mid-December. Since the older system is mainly intended for failover of the newer one, and since both systems last received routine service recently, it is possible that the leak could have developed at any time since, although my supposition is that it may itself be a result of the cold. In any case, all seemed well because the newer cooling system in SR1 was adequate to mask the failure of the older unit, until it suffered a controller board failure that took it offline last weekend. But, with the failure of the new system on Saturday, all IT services provided from this room had to be brought down.
After a night spent trying to cool the room with fans and a portable cooling unit, we succeeded in bringing the two OPACs and other core services back online by Sunday, but the coolant leak in the old system was not repaired until midday Monday. Today is Friday, and we've limped along all week on about 60 percent of the cooling normally required in SR1. We hope to have the parts to repair the newer cooling system early next week (fingers crossed!).

Some interesting lessons have emerged from this incident, and while probably not many of you regularly deal with -30°C winters, I think them worth sharing in the hope that they are more generally applicable than our winter extremes are:

1. Document your servers and the services that reside on them. We spent entirely too much time in the early hours of this event trying to relate servers and services. We in information technology (IT) may think of shutting down or powering up servers "Fred," "Wilma," "Betty," and "Barney," but, in a crisis, what we generally should be thinking of is whether or not we can shut down e-mail, file-and-print services, or the integrated library system (ILS) (and, if the latter, whether we shut down just the underlying database server or also the related staff and public services). Perhaps your servers have more obvious names than ours, in which case, count yourself fortunate. But ours are not so intuitively named—there is a perfectly good reason for this, by the way—and with distributed applications where the database may reside here, the application there, and the web front end yet somewhere else, I'd be surprised if your situation isn't as complex as ours. And bear in mind that documentation of dependencies goes two ways: not only do you want to know that "Barney" is hosting the ILS's Oracle database, but you also want to know all of the servers that should be brought up for you to offer ILS-related services.

2. Prioritize your services. If your cooling system (or other critical server-room utility) were suddenly only operating at 50 percent of your normal required capacity, how would you quickly decide which services to shut down and which to leave up? I wrote in this space recently that we've been thinking about prioritized services in the context of disaster recovery and business continuity, but this week's incident tells me that we're not really there yet. Optimally, I think that any senior member of my on-call staff should be empowered in a given critical situation to bring down services on the basis of a predefined set of service priorities.

3. Virtualize, virtualize, virtualize. If we are at all typical of large libraries in the Association of Research Libraries (and I think we are), then it will come as no surprise that we seem to add new services with alarming frequency. I suspect that, as with most places, we tend to try and keep things simple at the server end by hosting new services on separate, dedicated servers. The resulting proliferation of new servers has led to ever-greater strains on power, cooling, and network infrastructures in a facility that was significantly renovated less than two years ago. And I don't see any near-term likelihood that this will change.
We are, consequently, in the very early days of investigating virtualization technology as a means of reducing the number of physical boxes and making much better use of the resources—especially processor and RAM—available to current-generation hardware. I'm hoping that someone among our readership is farther along this path than we are and will consider submitting to ITAL a "how we done it" on virtualization in the library server room very soon!

4. Sometimes low-tech solutions work . . . No one here has failed to observe the irony of an overheated server room when the temperature just steps away is 30° below. Our first thought was how simple and elegant a solution it would be to install ducting, an intake fan, and a damper to the outside of the building. Then, the next time our cooling failed in the depths of winter, voila!, we could solve the problem with a mere turn of the damper control.

5. . . . And sometimes they don't. Not quite, it seems. When asked, our university facilities experts told us that an even greater irony than the one we currently have would be the requirement for CAN$100,000 in equipment to heat that -30°C outside air to around freezing so that we wouldn't freeze pipes and other indoor essentials if we were to adopt the "low-tech" approach and rely on Mother Nature. Oh, well . . .

■ In memoriam

Most of the snail mail I receive as editor consists of advertisements and press releases from various firms providing IT and other services to libraries. But a few months ago a thin, hand-addressed envelope, postmarked Pittsburgh with no return address, landed on my desk. Inside were two slips of paper clipped from a recent issue of ITAL and taped together. On one was my name and address; the other was a mailing label for Jean A. Guasco of Pittsburgh, an ALA life member and ITAL subscriber. Beside her name, in red felt-tip pen, someone had written simply "deceased."

I wondered about this for some time. Who was Ms. Guasco? Where had she worked, and when? Had she published or otherwise been active professionally? If she was a life member of ALA, surely it would be easy to find out more. It turns out that such is not the case, the wonders of the Internet notwithstanding. My obvious first stop, Google, yielded little other than a brief notice of her death in a Pittsburgh-area newspaper and an entry from a digitized September 1967 issue of Special Libraries that identified her committee assignment in the Special Libraries Association and the fact that she was at the time the chief librarian at McGraw-Hill, then located in New York. As a result of checking WorldCat, where I found a listing for her master's thesis, I learned that she graduated from the now-closed School of Library Service at Columbia University in 1953. If she published further, there was no mention of it on Google. My subsequent searches under her name in the standard online LIS indexes drew blanks.

From there, the trail got even colder. McGraw-Hill long ago forsook New York for the wilds of Ohio, and it seems that we as a profession have not been very good at retaining for posterity our directories of those in the field. A friend managed to find listings in both the 1982–83 and 1984–85 volumes of Who's Who in Special Libraries, but all these did was confirm what I already knew: Ms. Guasco was an ALA life member, who by then lived in Pittsburgh. I'm guessing that she was then retired, since her death notice gave her age as eighty-six years.
of her professional career before that, i’m sad that i must say i was able to learn no more. usability as a method for assessing discovery | ipri, yunkin, and brown 181 tom ipri, michael yunkin, and jeanne m. brown usability as a method for assessing discovery the university of nevada las vegas libraries engaged in three projects that helped identify areas of its website that had inhibited discovery of services and resources. these projects also helped generate staff interest in the usability working group, which led these endeavors. the first project studied student responses to the site. the second focused on a usability test with the libraries’ peer research coaches and resulted in a presentation of those findings to the libraries staff. the final project involved a specialized test, the results of which also were presented to staff. all three of these projects led to improvements to the website and will inform a larger redesign. u sability testing has been a component of the university of nevada las vegas (unlv) libraries web management since our first usability studies in 2000.1 usability studies are a widely used and relatively standard set of tools for gaining insight into web functionality. these tests can explore issues such as the effectiveness of interactive forms or the complexity of accessing full-text articles from third-party databases. they can explore aesthetic and other emotional responses to a site. in addition, they can provide an opportunity to collect input concerning satisfaction with the layout and logic of the site. they can reveal mistakes on the site, such as coding errors, incorrect or broken links, and problematic wording. they also allow us to engage in testing issues of discovery to isolate site elements that facilitate or hamper discovery of the libraries’ resources and services. the libraries’ usability working group seized upon two library-wide opportunities to highlight findings of the past year’s studies. the first was the discovery summit, in which the staff viewed videos of staff attempting finding exercises on the homepage and discussed the finding process. the second was the discovery mini-conference, an outgrowth of a new evaluation framework and the libraries’ strategic plan. through a poster display, the working group highlighted areas dealing with discovery of library resources. the mini-conference allowed us to leverage library-wide interest in the topic of effective information-finding on the web to draw wider attention to usability’s importance in identifying the likelihood of our users discovering library resources independently. the usability working group engaged in three projects to help identify areas of the website that inhibited discovery and to generate staff interest in the process of usability. all three of these projects led to improvements to the website and will inform a larger redesign. the first project is an ongoing effort to study student responses to the site. the second was to administer a usability test with the libraries’ peer research coaches and present those findings to the libraries’ staff. the final project was requested by the dean of libraries and involved a specialized test, the results of which also were presented to staff. n student studies the usability working group began its ongoing evaluation of unlv libraries’ website by conducting two series of tests: one with five undergraduate students and one with five graduate students. 
not surprisingly, most students self-reported that the main reason they come to the libraries' site is to find books and journal articles for assignments. the group created a set of fourteen tasks that were based on common needs for completing assignments: 1. find a journal article on the death penalty. (note: if students go somewhere other than the library, guide them back.) 2. find what floor the book the catcher in the rye is on. 3. find the most current issue of the journal popular mechanics. 4. identify a way to ask a question from home. 5. find a video on global warming. 6. you need to write a bibliography for a paper. find something on the website that would help you. 7. find out what lied library's hours were for july 4. 8. find the libraries' tutorial on finding books in the library. 9. the library offers workshops on how to use the library. find one you can take. 10. find a library-recommended website in business. 11. find out what books are checked out on this card. 12. find instructions for printing from your personal laptop. 13. your sociology professor, dr. lampert, has placed something on reserve for your class. please find the material. 14. your professor wants you to read the book efficiency and complexity in grammars by john a. hawkins. find a copy of the book for your assignment. (the moderator will prompt if the person stops at the catalog.) tom ipri (tom.ipri@unlv.edu) is head, media and computer services; michael yunkin (michael.yunkin@unlv.edu) is web content manager/usability specialist; and jeanne m. brown (jeanne.brown@unlv.edu) is head, architecture studies library and assessment librarian, university of nevada las vegas libraries. the results of these tests revealed that the site was not as conducive to discovery as was hoped. the libraries are planning a complete redesign of the site in the near future; however, the results of these first two series of usability tests were compelling enough to prompt an intermediary redesign to improve some of the areas that were troublesome to students. that said, the tests also found certain parts of the old site (figure 1) to be very effective: 1. all participants used the tabbed box in the center of the page, which gives them access to the catalog, serials lists, databases, and reserves. 2. all students quickly found the "ask a librarian" link when prompted to find a way to ask a question from home. 3. most students found the libraries' hours, partly because of the "hours" tab at the top of the page and partly because of multiple access points. 4. many participants used the "site search" tab to navigate to the search page, but few actually used it to conduct searches. they effectively used the site map information also included on the search page. the usability tests also revealed some variables that undermined the goal of discoverability: 1. due to the various sources of library-related information (website, catalog, vendor databases), navigation posed problems for students. although not a specific question in the usability tests, the results show students often struggled to get back to the libraries' home page to start a new question. 2. students often expected to find different content under "help and instruction" than what was there. 3. students used the drop-down boxes as a last resort. often, they would expand a drop-down box and quickly navigate away without selecting anything from the list. 4.
with some exceptions, students mainly ignored the tabs across the top of the home page. 5. although students made good use of the tabbed box in the center of the page, many could not distinguish between "journals" and "articles & databases." 6. similarly, students easily found the "reserves" tab but could not make sense of the difference between "electronic reserves (e-reserves)" and "other reserves." 7. no student found business resources via the "subject guides" drop-down menu at the bottom of the home page. [figure 1. unlv libraries' original website design] n peer-coach test and staff presentation unlv libraries employs peer research coaches, undergraduate students who serve as frontline research mentors to their peers. the usability working group administered the same test they used with the first group of undergraduate and graduate students to the peer research coaches. although these students are trained in library research, they still struggled with some of the usability tasks. the usability working group presented the findings of the peer research coach tests to staff. the peer research coaches are highly regarded in the libraries, so staff were surprised that they had so much difficulty navigating the site; this presentation was the first time many of the staff had seen the results of usability studies of the site. the shocking nature of these results generated a great deal of interest among the staff regarding the work of the usability working group. n the dean's project in january 2009, the dean of libraries asked the usability working group for assistance in planning for the discovery summit. initially, she requested to view the video from some of the usability tests with the goal of identifying discovery-oriented problems on the libraries' website. soon after, the dean tasked the group with performing a new set of usability tests using three subjects: a librarian, a library employee with little research or web expertise, and a faculty researcher. each participant was asked to complete three tasks, first using the libraries' website, then using google. the tasks were based on items found in the libraries' special collections: 1. find a photograph available in unlv libraries of the basic magnesium mine in henderson, nevada. 2. find some information about the baneberry nuclear test. are there any documents in unlv libraries about the lawsuit associated with the test? 3. find some information about the local greenpeace chapter. are there any documents in unlv libraries about the las vegas chapter? the dean viewed those videos and chose the most interesting clips for a presentation at the discovery summit. prior to this meeting, the libraries' staff were instructed to try completing the tasks on their own so that they might see the potential difficulties users must overcome and to compare the user experience provided by our website with that provided by google. at the discovery summit, the dean presented to the staff a number of clips from these special usability tests, giving the staff an opportunity to see where users familiar with the libraries' collections stumble. the staff also were shown several clips of undergraduates using the website to perform basic tasks, such as finding journal articles or videos in the libraries, with varying degrees of success.
these clips helped illustrate the various difficulties users encounter when attempting to discover library holdings, including unfamiliar search interfaces, library jargon, and a lack of clear relationships between the catalog and other databases. this discussion helped set the stage for the discovery mini-conference. n initial changes to the site unlv libraries’ website is in the process of being redesigned, and the results of the usability studies are being used to inform that process. however, because of the seriousness of some of the issues, some changes are being implemented into an intermediary design (figure 2). the new homepage n combines article and journal searching into one tab and removes the word “databases” from the page entirely; n adds a website search to the tabbed box; n adds a “music & video” search option; n makes better use of the picture on the page by incorporating rotating advertisements in that area; n widens the page, allowing more space on the rest of the site’s templates; n breaks the confusing “help & instruction” page into two more specific pages: “help” and “using the libraries”; and n adds the main library and the branch library hours to the homepage. this new homepage is just the beginning of our efforts to improve discovery through the libraries’ website. the usability working group already has plans to do a card sort for the “using the library” category to further refine the content and language of that section. the group plans to test the initial changes to the site to ensure that they are improving discovery. reference 1. jennifer church, jeanne brown, and diane vanderpol, “walking the web: usability testing of navigational pathways at the university of nevada las vegas libraries,” in usability assessment of library-related web sites: methods and case studies, ed. nicole campbell (chicago: ala, 2001). figure 2. unlv libraries’ new website design 152 information technology and libraries | december 2011 ■■ more from the far side of the k–t boundary in my september column, i offered some old-school suggestions for how we as a profession might cope with our confused and unbalanced times. since then, several more have crossed my mind, and i thought i’d offer them, for what they may be worth: ■■ we can outsource everything but responsibility. whether it’s “the cloud,” vendor acquisition profiles, or shelfready cataloguing, outsourcing has become a popular way of dealing with budgetary and staffing stresses during the past few years. generally speaking, i have serious reservations about outsourcing our services, but i do recognize the imperatives that have caused us to resort to them. that said, in farming out critical library services, we do not at the same time gain license to farm out responsibility for their efficient operation. oversight and quality control are still up to us, and it simply will not wash with patrons today, next year, or a century from now to be told that a collection or service is unacceptably substandard because we outsourced it. a vendor’s failure is our failure, too. it’s still “our stuff,” and so are the services. ■■ we’re here to make decisions, not avoid them. document delivery, patron-driven acquisitions, usability studies, and evidence-based methodologies should help to inform and serve as validity checks for our decisions, not be replacements for them. 
as with outsourcing and our over-reliance on technology-driven solutions, i fear that these services and methodologies are in real danger of becoming crutches, enabling us to avoid making decisions that may be difficult, unpopular, tedious, or simply too much work. but if decisions regarding collections and services can be reduced to simple questions of demand or the outcome of a survey, then who needs us? it’s our job to make these decisions; demandor survey-driven techniques are simply there to assist us in doing so. ■■ relevance is relative. we talk about “relevance” in much the same breathlessly reverential voice as we speak of the “user” . . . as if there were but one, uniquely “relevant” service model for that single, all-encompassing “user.” one of the perils of our infatuation with “relevance” is the illusion that by adopting this or that technology or targeted service, we are somehow remaining relevant to “the user.” which user? just as not all patrons come to us seeking potboiler romances, so too not all users demand that all services and collections be made available electronically, over mobile platforms. since we do recognize that our resources are finite, rather than pandering to some groups at the expense of others with trendy temporal come-ons, why not instead focus on long-term services and collections that reflect our values? the patrons who really should matter most to us will respect us for this demonstration of our integrity. ■■ libraries are ecosystems. as with the rest of the world around us, libraries comprise arrays of interlocking, interdependent, and often poorly understood/ documented entities, services, and systems. they’ve developed that way over centuries. and just as so often happens in the larger world, any and every change we make can cause a cascade of countless other changes, many of which we might not anticipate before making that seemingly simple initial change. we are stewards of the libraries in which we work: our obligation, as librarians, is to respect what was bequeathed to us, to care for and use it wisely, and to pass it on to those who follow in at least the condition in which we received it—preferably better. environments, including libraries, change and evolve of course, but critics of the supposedly slow pace of change in libraries fail to grasp that our role is just as much that of the conservationist as it is the advocate of development and change. our mission is not change for change’s sake; rather, it is incremental, considered change that will benefit not only today’s patrons and librarians, but respect those of the past and serve those of the future as well. perhaps librarians need an analogue to the medical profession’s hippocratic oath: primum non nocere, “first, do no harm.” ■■ innocents abroad probably few ital readers will be aware (i certainly wasn’t!) that mark twain’s bestselling book during his lifetime was not tom sawyer or huckleberry finn—or any of a host of others of his now better-remembered works— but rather his 1869 travelogue innocents abroad, or the new pilgrims’ progress. the book, which i’ve been savoring in my spare leisure reading time over the past several months, records in journal form twain’s involvement in a voyage in 1867 by a group of american tourists to various locales in mediterranean europe, northern africa, and the near east. 
in the book, twain gleefully skewers marc truitt marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. outgoing editor’s column: parting thoughts outgoing editor’s column | truitt 153 committee assignments go, i think it fair to say that this is probably one of the more thankless. board members must be expert in all areas of technology, and as important, willing and able to do a credible job of pretending to be so in those areas where they are not expert! they must be able to recognize and create good prose and to offer authors practical, constructive insights and guidance in the sometimes black art of turning promising manuscripts into great articles. as i think many ital authors will attest, they do a superb job at this. they also write some of the most interesting and perceptive editorial columns you’ll see in ital! ■■ judith carter. it’s really impossible to overstate the contributions made by judith to ital. other than a brief four-year interlude during which i served in the role, judith has been managing editor for much of the past decade and more. she taught me the job when she relinquished it in early 2004, and then graciously offered to take it back again when i was named editor four years later. more than any other single person, she is responsible for the ital you hold in your hands, and she does it with skill and tireless dedication. she also has been my coach, my confidante, and—as only a true friend can be—even my butt-kicker when i was late in observing a deadline, which has not infrequently been the case. thank you for everything, judith. ■■ dan and john. the late dan marmion brought me on board at ital as a board member in 2000; he later asked me to serve as his managing editor. he also encouraged me to succeed john webb as editor in 2007. from both dan and john i learned much about the role of an editor and especially about what ital could and should be. i am endlessly appreciative for their mentoring and hope that i have been reasonably successful in maintaining the high standards that they set for the journal. ■■ the authors. without interesting, well-researched, and timely content, there would be no ital. i have been blessed with a rich and nearly constant supply of superb manuscript submissions that the folks who make up the ital “publication machine” have then turned into a highly stimulating and readable journal. i hope you agree. ■■ the readers. and finally, i thank all of you, gentle readers. you are the reason that ital exists. i have been grateful for your support, your patience, and your always-constructive suggestions. beginning with the march 2012 issue, ital will be edited by bob gerrity of boston college. i’ve been acquainted with bob for a number of years, and i can’t think of a better person to guide this journal through the tour-goers, those they encounter, and of course himself; as with twain generally, it is at turns witty, outlandish, biting, and—by today’s lights—completely lacking in political correctness. in short, it’s vintage mark twain: delicious! i mention innocents abroad not simply because i’m currently enjoying it (and hoping that by saying so, i might pique some other ital reader ’s interest in giving it a test drive) but also because it—as with other books, songs, stories, etc., about journeys-taken—is a metaphor for life. 
we are all “innocents” in some sense as we traverse the days and years of growth in selfawareness, relationships, work, and all the other facets that make up life. it’s a comforting way of viewing the world, i think. i’ve served with ital in various capacities for more than eleven years. that’s a very long time in terms of one particular ala/lita committee. it’s now time for my journey and ital’s to part ways. this is my final column as editor of this journal. this “innocent” is debarking the ital ship and moving on. ital is the product of the dedicated labor of many people of whom i am but one. for some of them, it is a labor of love. as with the credits at the end of a film, it is customary for an editor in her or his final column to recognize and thank the people who made it all possible. i’d like to do so now. polite audience members know to remain until “the end” rolls by. i hope you’ll help me honor these people by doing so, too: ■■ mary taylor, valerie edmonds, and melissa prentice in the lita office. over the years, they’ve been unfailingly helpful to me, to say nothing of being nearly as unfailingly tolerant of my clueless and occasionally obstreperous, passive-aggressive ignorance of the byzantine ways of the ala bureaucracy. ■■ ala production services. production services folk are the professionals who, among innumerable other skills, copyedit and typeset manuscripts, perform miracles with figures and tables, and generally make ital into the quality product you receive (whether it is celluloseor electron-based). regardless of ital’s future publishing format and directions, count yourself fortunate as long as the good people in production services continue to play a role. i’d especially like to single out tim clifford, ital’s production editor, who over the past several years has brought skill, grace, stability, and a healthy dose of humor to this critical post. ■■ the members—past and present—of the ital editorial board. the editorial board is a lita committee; the members of this committee serve as the editor’s primary group of reviewer-referees of manuscripts submitted for publication consideration. as 154 information technology and libraries | december 2011 “happy trails,” and “t-t-t-t-that’s all, folks!” “the end.” changes that will be coming over the next few years. i wish him the very best and hope that he has as much fun in the job—and on the journey—as have i. from the managing editor i’d like to take this opportunity to give marc truitt my heartfelt thanks and best wishes as he leaves his longterm relationship with information technology and libraries (ital). i appreciate how he ably stepped into the role of managing editor (me) when i needed to resign to focus on my full-time job. a few years later he became the new editor and i accepted his request to be his me. i think we’ve had a good partnership. i’ve nudged marc about the production schedule while he has managed manuscripts, the peer review process, and eloquently represented the journal when needed. marc held and communicated a clear and scholarly view of the journal to the editorial board and to lita. i have fond memories of many cups of tea drunk in various ala conference venues while we discussed ital, lita, and shared news of mutual friends. we endured the loss of our friend and mentor dan marmion together a year ago september when marc wrote a letter which i read at the memorial service. this too may be my final issue of ital. it is unknown at time of printing. 
i support the online future of ital and have offered my services to robert gerrity until a paper version is no longer supported and we successfully transition my duties into an online environment/to a new me. i know he will take the journal into its new iteration with skill and grace. i have served lita and ital for over 13 years and am proud of the quality peer reviewed journal dan marmion, john webb, marc truitt, the editorial board members and i have shared with the members of lita. it has also been my honor to communicate with each of the authors and to facilitate their scholarly communication to our profession. without the authors, where would we be? thank you all, judith carter. editorial board thoughts | dehmlow 53 mark dehmloweditorial board thoughts the ten commandments of interacting with nontechnical people m ore than ten years of working with technology and interacting with nontechnical users in a higher education environment has taught me many lessons about successful communication strategies. somehow, in that time, i have been fortunate to learn some effective mechanisms for providing constructive support and leading successful technical projects with both technically and “semitechnically” minded patrons and librarians. i have come to think of myself as someone who lives in the “in between,” existing more in the beyond than the bed or the bath, and, while not a native of either place, i like to think that i am someone who is comfortable in both the technical and traditional cliques within the library. ironically, it turns out that the most critical pieces to successfully implementing technology solutions and bridging the digital divide in libraries has been categorically nontechnical in nature; it all comes down to collegiality, clear communication, and a commitment to collaboration. as i ruminated on the last ten plus years of working in technology, i began to think of the behaviors and techniques that have proved most useful in developing successful relationships across all areas of the library. the result is this list of the top ten dos and don’ts for those of us self-identified techies who are working more and more often with the self-identified nontechnical set. 1. be inclusive—i have been around long enough to see how projects that include only technical people are doomed to scrutiny and criticism. the single best strategy i have found to getting buy-in for technical projects is to include key stakeholders and those with influence in project planning and core decision-making. not only does this create support for projects, but it encourages others to have a sense of ownership in project implementation—and when people feel ownership for a project, they are more likely to help it succeed. 2. share the knowledge—i don’t know if it is just the nature of librarianship, but librarians like to know things, and more often than not they have a healthy sense of curiosity about how things work. i find it goes a long way when i take a few moments to explain how a particular technology works. our public services specialists, in particular, often want to know the details of how our digital tools work so that they can teach users most effectively and answer questions users have about how they function. sharing expertise is a really nice way to be inclusive. 3. 
know when you have shared enough—in the same way that i don’t need to know every deep detail of collections management to appreciate it, most nontechies don’t need hour-long lectures on how each component of technology relates to the other. knowing how much information to share when describing concepts is critical to keeping people’s interest and generally keeping you approachable. 4. communicate in english—it is true that every specialization has its own vocabulary and acronyms (oh how we love acronyms in libraries) that have no relevance to nonspecialists. i especially see this in the jargon we use in the library to describe our tools and services. the best policy is to avoid jargon and explain concepts in lay-person’s terms or, if using jargon is unavoidable, define specialized words in the simplest terms possible. using analogies and drawing pictures can be excellent ways to describe technical concepts and how they work. it is amazing how much from kindergarten remains relevant later in life! 5. avoid techno-snobbery—i know that i am risking virtual ostracism in writing this, but i think it needs to be said. just because i understand technology does not make me better than others, and i have heard some variant of the “cup holder on the computer” joke way too often. even if you don’t make these kinds of comments in front of people who aren’t as technically capable as you, the attitude will be apparent in your interactions, and there is truly nothing more condescending. 6. meet people halfway—when people are trying to ask technology-related questions or converse about technical issues, don’t correct small mistakes. instead, try to understand and coax out their meaning; elaborate on what they are saying, and extend the conversation to include information they might not be aware of. people don’t like to be corrected or made to feel stupid—it is embarrassing. if their understanding is close enough to the basic idea, letting small mistakes in terminology slide can create an opening for a deeper understanding. you can provide the correct terminology when talking about the topic without making a point to correct people. 7. don’t make a clean technical/nontechnical distinction— after once offering the “technical” perspective on a topic, one librarian said to me that it wasn’t that they themselves didn’t have any technical mark dehmlow (mdehmlow@nd.edu) is digital initiatives librarian, hesburgh libraries, university of notre dame, notre dame, indiana. 54 information technology and libraries | june 2009 perspective, it just wasn’t perhaps as extensive as mine. each person has some level of technical expertise; it is better to encourage the development of that understanding rather than compartmentalizing people on the basis of their area of expertise. 8. don’t expect everyone to be interested—just because i chose a technical track and am interested in it doesn’t mean everyone should be. sometimes people just want to focus on their area of expertise and let the technical work be handled by the techies. 9. assume everyone is capable—at least at some level. sometimes it is just a question of describing concepts in the right way, and besides, not everyone should be a programmer. everyone brings their own skills to the table and that should be respected. 10. expertise is just that—and no one, no one knows everything. there just isn’t enough time, and our brains aren’t that big. embrace those with different expertise, and bring those perspectives into your project planning. 
a purely technical perspective, while perhaps being efficient, may not provide a practical or intuitive solution for users. diversity in perspective creates stronger projects. in the same way that the most interesting work in academia is becoming increasingly more multidisciplinary, so too the most successful work in libraries needs to bring diverse perspectives to the fore. while it is easy to say libraries are constantly becoming more technically oriented because of the expanse of digital collections and services, the need for the convergence of the technical and traditional domains is clear—digital preservation is a good example of an area that requires the lessons and strengths learned from physical preservation, and, if anything, the technical aspects still raise more questions than solutions—just see henry newman’s article “rocks don’t need to be backed up” to see what i mean.1 increasingly, as we develop and implement applications that better leverage our collections and highlight our services, their success hinges on their usability, user-driven design, and implementations based on user feedback. these “user”-based evaluation techniques fit more closely with traditional aspects of public services: interacting with patrons. lastly, it is also important to remember that technology can be intimidating. it has already caused a good deal of anxiety for those in libraries who are worried about long-term job security as technology continues to initiate changes in the way we perform our jobs. one of the best ways to bring people along is to demystify the scary parts of technology and help them see a role for themselves in the future of the library. going back to maslow’s hierarchy of needs, people want to feel a sense of security and belonging, and i believe it is incumbent upon those of us with a deep understanding of technology to help bring the technical to the traditional in a way that serves everyone in the process. reference 1. henry newman, “rocks don’t need to be backed up,” enterprise storage forum.com (mar. 27, 2009), www.enterprise storageforum.com/continuity/features/article.php/3812496 (accessed april 24, 2009). 8 information technology and libraries | june 20088 information technology and libraries | september 2008 from our readers: virtues and values in digital library architecture mark cyzyk editor’s note: “from our readers” will be an occasional feature, highlighting ital readers’ letters and commentaries on timely issues. at the fall 2007 coalition for networked information (cni) conference in washington, d.c., i pre-sented “a survey and evaluation of open-source electronic publishing systems.” toward the end of my presentation was a slide enumerating some of the things i had personally learned as a web application architect during my review of the systems under consideration: n platform independence should not be neglected. n one inherits the flaws of external libraries and frameworks. choose with care. n installation procedures must be simple and flawless. n don’t wake the sysadmin with “slap a gui on that xml!”—and push application administration out, as much as possible, to select users. n documentation must be concise, complete, and comprehensive. “i can’t guess what you’re thinking.” initially, these were just notes i thought might be useful to others, figuring it’s typically helpful to share experiences, especially at international conferences. 
but as i now look at those maxims, it occurs to me that when abstracted further they point in the direction of more general concepts and traits—concepts and traits that accurately describe us and the products of our labor if we are successful, and prescribe to us the concepts and traits we need to understand and adopt if we are not. in short, peering into each maxim, i can begin to make out some of the virtues and values that underlie, or should underlie, the design and architecture of our digital library systems. n freedom and equality platform independence should not be neglected. “even though this application is written in platformindependent php, the documentation says it must be run on either red hat or suse, or maybe it will run on solaris too, but we don’t have any of these here.” while i no doubt will be heartily flamed for suggesting that microsoft has done more to democratize computing than any other single company, i nevertheless feel the need to point out that, for many of us, windows server operating systems and our responsibility for administering them way back when provided the impetus for adding our swipe-card barcodes to the acl of the data center—surely a badge of membership in the club of enterprise it if ever there was one. you may not like the way windows does things. you may not like the way microsoft plays with the other boys. but to act like they don’t exist is nothing more than foolish burying one’s head in the *nix sand. windows servers have proven themselves time and again as being affordable, easily managed, dependable, and, yes, secure workhorses. windows is the ford pickup truck of the server world, and while that pickup will some day inevitably suffer a blowout of its twenty-year-old head gasket (and will therefore be respectfully relegated to that place where all dearly departed trucks go), it’s been a long and good run. we should recognize and appreciate this. windows clearly has a place in the data center, sitting quietly humming alongside its unix and linux brothers. i imagine that it actually takes some effort to produce platform-dependent applications using platform-independent languages and frameworks. such effort should be put toward other things. keep it pure. and by that i mean, keep it platform independent. freedom to choose and presumed equality among the server-side oses should reign. n responsibility and good sense one inherits the flaws of external libraries and frameworks. choose with care. so you’ve installed the os, you’ve installed and configured the specified web server, you’ve installed and configured the application platform, you’ve downloaded and compiled the source, yet there remains a long list of external libraries to install and configure. one by one you install them. suddenly, when you get to library number 16 you hit a snag. it won’t install. it requires a previous version of library number 7, and multiple versions of library number 7 can’t be installed at the same time on the same box. worse yet, as you take a break to read some more of the documentation, it sure looks like required library number 19 is dependent on the current version of library number 7 and won’t work with any previous version. and could it be that library number 21 is dependent on library number 20, which is dependent on library number 23, which is dependent on—yikes—library number 21? mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect, library digital programs group, sheridan libraries, johns hopkins university in baltimore. 
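the tangle just described (library 21 depending on library 20, which depends on library 23, which depends back on library 21) is simply a cycle in a dependency graph, and it can be spotted before you are sixteen installs deep. the sketch below is illustrative only: the numbered "libraries" are the essay's hypothetical example, and the small depth-first search is one of many ways to find such a cycle; it describes no particular packaging tool.

# illustrative sketch: find a circular dependency among external libraries.
# the numbered "libraries" are hypothetical, echoing the example above.

DEPENDS_ON = {
    "library-16": ["library-7"],
    "library-19": ["library-7"],
    "library-20": ["library-23"],
    "library-21": ["library-20"],
    "library-23": ["library-21"],   # ...and the circle closes
}

def find_cycle(graph):
    """return one dependency cycle as a list of names, or None if the graph is acyclic."""
    visiting, done = set(), set()

    def visit(node, path):
        if node in visiting:                      # back edge: we have looped
            return path[path.index(node):] + [node]
        if node in done:
            return None
        visiting.add(node)
        for dep in graph.get(node, []):
            cycle = visit(dep, path + [node])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

print(find_cycle(DEPENDS_ON))
# ['library-20', 'library-23', 'library-21', 'library-20']

run against a real dependency list, a check like this turns "yikes" into a build-time warning rather than an afternoon of head scratching.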
from our readers: virtues and values in digital library architecture | cyzyk 9 all things come full circle. but let’s suppose you’ve worked out all of these dependencies, you’ve figured out the single, secret order in which they must install, you’ve done it, and it looks like it’s working! yet, when you go to boot up the web service, suddenly there are errors all over the place, a fearsome crashing and burning that makes you want to go home and take a nap. something in your configuration is wrong? something in the way your configuration is interacting with an external library is wrong? you search the logs. you gather the relevant messages. they don’t make a lot of sense. now what to do? you search the lists, you search the wikis to no avail, and finally, in desperation, you e-mail the developers. “but that’s a problem with library x, not with our application.” au contraire. i would like to strongly suggest a copernican revolution in how we think about such situations. while it’s obvious that the developers of the libraries themselves are responsible for developing and maintaining them, i’d like to suggest that this does not relieve you, the developer of a system that relies on their software, from responsibility for its bugs and peculiar configuration problems. i’d like to suggest that, far from pushing responsibility in the case mentioned above out to the developers of the malfunctioning external library, that you, in choosing that library in the first place, have now inherited responsibility for it. even if you don’t believe in this notion of inheritance, if you would please at least act as if it were true, we’d all be in a better place. part of accepting this kind of responsibility is you then acting as a conduit through which we poor implementers learn the true nature of the problem and any solutions or temporary workarounds we may apply so that we can get your system up and running pronto. in the end, it’s all about your system. your system as a whole is only as strong as the weakest link in its chain of dependencies. n simplicity and perfection installation procedures must be simple and flawless. it goes without saying that if we can’t install your system we a fortiori can’t adopt it for use in our organization. i remember once having such a difficult time trying to get a system up and running that i almost gave up. i tried first to get it running against apache 1.4, then against apache 2.0. i had multiple interactions with the developers. i banged my head against the wall of that system for days in frustration. the documentation was of little help. it seemed to be more part of an internal documentation project, a way for the developers to communicate among themselves, than to inform outsiders like me about their system. and related to this i remember driving to work during this time listening to a report on npr about the famous hopkins pediatric neurosurgeon, dr. ben carson. apparently, earlier in the week he had separated the brains of siamese twins and the twins were now doing fine, recuperating. the npr commentator marveled at the intricacy of the operation and at the fact that the whole thing took, i believe, five hours. “five hours? five hours?!” i exclaimed while barreling down the highway in my vintage 1988 ford ranger pickup (head gasket mostly sealed tight, no compression leakage). 
“i can’t get this system at work installed in five days!” our goal as system architects needs to be that we provide to our users simple and flawless installation procedures so that our systems can, on average, be installed and configured in equal or less time than it takes to perform major brain surgery.1 “all in an afternoon” should become our motto. i am happy to find that there are useful and easy to use package managers, e.g., yum and synaptic, for doing such things on various linux distributions. windows has long had solid and sophisticated installation utilities. tomcat supports drop-in-place war files. when possible and appropriate, we need to use them. n justice and e-z livin don’t wake the sysadmin with “slap a gui on that xml!”—and push application administration out, as much as possible, to select users. i remember reading plato’s republic as an undergraduate and the feeling of being let down when the climax of the whole thing was a definition in which “justice” simply is each man serving his proper place in society and not transgressing the boundaries of his role. “that’s it?” i thought. “so you have this rigidly hierarchical society and each person in it knows his role and knows in which slot his role fits—and keeping to this is ‘justice’?” this may not be such a great way to structure a society, but now that i think about it, it’s a great way to structure a computer application. sit down and carefully look at the functions your program will provide. then create a small set of user roles to which these functions will be carefully mapped. in the end you will have a hierarchical structure of roles and functions that should look perfectly simple and rational when drawn on a piece of paper. and while the superuser role should have power over 10 information technology and libraries | september 2008 all and access to all functions in the application, the list of functions that he alone has access to should be small, i.e., the actual work of the superuser should be minimized as much as possible by making sure that most functions are delegated to the members of other, appropriate, proper user roles. doing this happily results in what i call the state of e-z livin: the last thing you want is for users to constantly be calling you with data issues to fix. you therefore will model management of the data—all of it—and the configuration of the application itself—most of it— directly into the architecture of the application, provide users the guis they need to configure and manage things themselves, and push as much functionality as you can out to them where it belongs. let them click their respective ways to happiness and computing goodness. you build the tool, they use it, and you retire back to the land of e-z livin. users are assigned to their roles, and all roles are in their proper places. application architecture justice is achieved. n clarity and wholeness documentation must be concise, complete, and comprehensive. “i can’t guess what you’re thinking.” as system developers we’ve probably all had the magical experience of a mind meld with a fellow developer when working intensively on a project. i have had this experience with two other developers, separately, at different stages of my career. (one of them, in fact, used to point out to everyone that, “between the two of us, we make one good developer!”) this is a wonderful and magical and productive working relationship in which to be, and it needs to be recognized, supported, and exploited whenever it happens. 
you are lucky if you find yourself designing and developing a system and your counterpart is reading your mind and finishing your sentences. however, just as it’s best to leave that nice young couple cuddling in the corner booth alone, so too it really doesn’t make a lot of sense to expect the mind-melded developers to turn out anything that remotely resembles coherent and understandable documentation. those undergoing a mind meld by definition know perfectly well what they mean. to the rest of us it just feels like we missed a memo. if you have the luxury, make sure that the one writing the documentation is not currently undergoing a mind meld with anyone else on the development team. scotty typically stayed behind while he beamed the others down. beam them down. be that scotty. you do the world a great service by staying behind on the ship and dutifully reporting, clearly and comprehensively, what’s happening down on the red planet. to these five maxims, and their corresponding virtues, i would add one more set, one upon which the others rely: n empathy and graciousness you are not your audience. at least in applied computing fields like ours, we need to break with the long-held “guru in the basement” mentality. the actions of various managerial strata have now ostensibly acknowledged for us that technical expertise, especially in applied fields, is a commodity, i.e., it can be bought. a dearth of such expertise is remedied by simply applying money to the situation—admittedly difficult to do at the majority of institutions of higher education, but a common occurrence at the wealthiest. nevertheless, the dogmatic hold of the guru has been broken and the magical aura that once draped her is not now so resplendent—her relative rarity, and the clubby superiority that depended upon it, has been diluted significantly by the sheer number of counterparts who can and will gleefully fill her function. we respect, value, and admire her; it’s just that her stranglehold on things has (rightfully) been broken. and while nobody is truly indispensable, what is more difficult and rare to find is someone who has the guru’s same level of technical chops coupled with a genuine empathic ability to relate to those who are the intended users of her systems and services. unless your systems and services are geared primarily toward other developers, programmers, and architects— and presumably they are not, nor, in the library world, should they be—your users will typically be significantly unlike you. let me repeat that: your users are not like you. rephrased: you are not your audience. when looking back over the other maxims, values, and virtues mentioned in this essay then, the moralpsychological glue that binds them all is composed of empathy for our users—faculty, students, librarians, non-technical staff—and the graciousness to design and carry out a project plan in a spirit of openness, caring, flexibility, humility, respect, and collaboration. when empathy for the users of our systems is absent—and there are cases where you can actually see this in the design and documentation of the system itself—our systems will ultimately not be used. 
when the spirit of graciousness is broken, men become robots, mere rule followers, and users will boycott using their systems and will look elsefrom our readers: virtues and values in digital library architecture | cyzyk 11 where, naturally preferring to avoid playing the simonsays games so often demanded by tech folk in their workaday worlds; there is a reason the comic strip dilbert is so funny and rings so true. when confronted with a lack of empathy and graciousness on our part, the users who can boycott using our systems and services will boycott using our systems and services. and we’ll be left out in the rain, feeling like, as bonnie raitt once sadly sang, “i can’t make you love me if you don’t / i can’t make your heart feel something it won’t.” empathy and graciousness, while not guaranteeing enthusiastic adoption of our systems and services, are a necessary precondition for users even countenancing participation. there are undoubtedly other virtues and values that can usefully be expounded in the context of digital library architecture—consistency, coherence, and elegance immediately come to mind—and i could go on and on analyzing the various maxims surrounding these that bubble up through the stack of consciousness during the course of the day. yet doing so would conflict with another virtue i think is key to the success and enjoyment of opinionpiece essays like this and maybe even of other sorts of publications and presentations: brevity. note 1. a colleague of mine has since informed me that carson’s operation took twenty-five hours, not five. nevertheless, my admonition here still holds. when installation and configuration of our systems are taking longer, significantly longer, than it takes to perform major brain surgery, surely there is something amiss? editorial | truitt 107 marc truitteditorial: computing in the “cloud” silver lining or stormy weather ahead? c loud computing. remote hosting. software as a service (saas). outsourcing. terms that all describe various parts of the same it elephant these days. the sexy ones—cloud computing, for example—emphasize new age-y, “2.0” virtues of collaboration and sharing with perhaps slightly mystic overtones: exactly where and what is the “cloud,” after all? others, such as the more utilitarian “remote hosting” and “outsourcing,” appeal more to the bean counters and sustainabilityminded among us. but they’re really all about the same thing: the tradeoff between cost and control. that the issue increasingly resonates with it operations at all levels these days can be seen in various ways. i’ll cite just a few: n at the meeting of the lita heads of library technology (holt) interest group at the 2009 ala annual conference in chicago, two topics dominated the list of proposed holt programs for the 2010 annual conference. one of these was the question of virtualization technology, and the other was the whole white hat–black hat dichotomy of the cloud.1 practically everyone in the room seemed to be looking at—or wanting to know more about—the cloud and how it might be used to benefit institutions. n my institution is considering outsourcing e-mail. all of it—to google. times are tough, and we’re being told that by handing e-mail over to the googleplex, our hardware, licensing, evergreening, and technical support fees will total zero. zilch. with no advertising. 
heady stuff when your campus hosts thirty-plus central and departmental mail servers, at least as many blackberry servers, and total costs in people, hardware, licensing, and infrastructure are estimated to exceed can$1,000,000 annually. n in the last couple of days, library electronic discussion lists such as web4lib have been abuzz—or do we now say a-twitter?—about amazon's orwellian kindle episode, in which the firm deleted copies of 1984 and animal farm from subscribers' kindle e-book readers without their knowledge or consent.2 indeed, amazon's action was in violation of its own terms of service, in which the company "grants [the kindle owner] the non-exclusive right to keep a permanent copy of the applicable digital content and to view, use, and display such digital content an unlimited number of times, solely on the device or as authorized by amazon as part of the service and solely for [the kindle owner's] personal, noncommercial use."3 all of this has me thinking back to the late 1990s marketing slogan of a manufacturer of consumer-grade mass storage devices—remember removable hard drives? iomega launched its advertising campaign for the 1 gb jaz drive with the catch-line "because it's your stuff." ultimately, whether we park it locally or send it to the cloud, i think we need to remember that it is our stuff. what i fear is that in straitened times, it becomes easy to forget this as we struggle to balance limited staff, infrastructure, and budgets. we wonder how we'll find the time and resources to do all the sexy and forward-looking things, burdened as we are with the demands of supporting legacy applications, "utility" services, and a huge and constantly growing pile of all kinds of content that must be stored, served up, backed up (and, we hope, not too often, restored), migrated, and preserved. the buzz over the cloud and all its variants thus has a certain siren-like quality about it. the notion of signing over to someone else's care—for little or no apparent cost—our basic services and even our own content (our stuff) is very appealing. the song is all the more persuasive in a climate where we've moved from just the normal bad news of merely doing more with less to a situation where staff layoffs are no longer limited to corporate and public libraries, but indeed extend now to our greatest institutions.4 at the risk of sounding like a paranoid naysayer to what might seem a no-brainer proposition, i'd like to suggest a few test questions for evaluating whether, how, and when we send our stuff into the cloud: 1. why are we doing this? what do we hope to gain? 2. what will it cost us? bear in mind that nothing is free—except, in the open-source community, where free beer is, unlike kittens, free. if, for example, the borg offer to provide institutional mail without advertisements, there is surely a cost somewhere. the borg, sensibly enough, are not in business to provide us with pro bono services. 3. what is the gain or loss to our staff and patrons in terms of local customization options, functionality, access, etc.? 4. how much control do we have over the service offered or how our content is used, stored, repurposed, or made available to other parties? 5. what's the exit strategy?
what if we want to pick up and move elsewhere? can we reclaim all of our stuff easily and portably, leaving no sign that we'd ever sent it to the cloud? we are responsible for the services we provide and for the content with which we have been entrusted. we cannot shrug off this duty by simply consigning our services and our stuff to the cloud. to do so leaves us vulnerable to an irreparable loss of credibility with our users; eventually some among them would rightly ask, "so what is it that you folks do, anyway?" we're responsible for it—whether it's at home or in the cloud—because it's our stuff. it is our stuff, right? references and notes 1. i should confess, in the interest of full disclosure, that it was eli neiburger of the ann arbor district library who suggested "hosted services as savior or slippery slope" for next year's holt program. i've shamelessly filched eli's topic, if not his catchy title, for this column. thanks, eli. also, again in the interest of full disclosure, i suggested the virtualization topic, which eventually won the support of the group. finally, some participants in the discussion observed that virtualization technology and hosting are in many ways two sides of the same topical coin, but i'll leave that for others to debate. 2. brad stone, "amazon erases orwell books from kindle," new york times, july 17, 2009, http://www.nytimes.com/2009/07/18/technology/companies/18amazon.html?_r=1 (accessed july 21, 2009). 3. amazon.com, "amazon kindle: license agreement and terms of use," http://www.amazon.com/gp/help/customer/display.html?nodeid=200144530 (accessed july 21, 2009). 4. "budget cutbacks announced in libraries, center for professional development," stanford university news, june 10, 2009, http://news.stanford.edu/news/2009/june17/layoffs-061709.html (accessed july 22, 2009); "harvard libraries cuts jobs, hours," harvard crimson (online edition), june 26, 2009, http://www.thecrimson.com/article.aspx?ref=528524 (accessed july 22, 2009).
michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago. michelle frisque president's message: join us at the forum! the first lita national forum i attended was in milwaukee, wisconsin. it seems like it was only a couple of years ago, but in fact nine national forums have since passed. i was a new librarian, and i went on a lark when a colleague invited me to attend and let me crash in her room for free. i am so glad i took her up on the offer because it was one of the best conferences i have ever attended. it was the first conference that i felt was made up of people like me, people who shared my interests in technology within the library. the programming was a good mix of practical know-how and mind-blowing possibilities. my understanding of what was possible was greatly expanded, and i came home excited and ready to try out the new things i had learned. almost eight years passed before i attended my next forum in cincinnati, ohio. after half a day i wondered why i had waited so long. the program was diverse, covering a wide range of topics. i remember being depressed and outraged about the current state of internet access in the united states as reported by the office for information technology policy.
i felt that surge of recognition when i discovered that other universities were having a difficult time documenting and tracking the various systems they run and maintain. i was inspired by david lanke’s talk, “obligations of leadership.” if you missed it you can still hear it online. it is linked from the lita blog (http:// www.litablog.org). while the next forum may seem like a long way off to you, it is in the forefront of my mind. the national forum 2010 planning committee is busy working to make sure this forum lives up to the reputation of forums past. this year’s forum takes place in atlanta, georgia, september 30–october 3. the theme is “the cloud and the crowd.” program proposals are due february 19, so i cannot give you specifics about the concurrent sessions, but we do hope to have presentations about projects, plans, or discoveries in areas of library-related technology involving emerging cloud technologies; software-as-service, as well as social technologies of various kinds; using virtualized or cloud resources for storage or computing in libraries; library-specific open-source software (oss) and other oss “in” libraries; technology on a budget; using crowdsourcing and user groups for supporting technology projects; and training via the crowd. each accepted program is scheduled to maximize the impact for each attendee. programming ranges from five-minute lightening talks to full day preconferences. in addition, on the basis of attendee comments from previous forums, we have also decided to offer thirtyand seventy-five-minute concurrent sessions. these concurrent sessions will be a mix of traditional singleor multispeaker formats, panel discussions, case studies, and demonstrations of projects. finally, poster sessions will also be available. while programs such as the keynote speakers, lightning talks, and concurrent sessions are an important part of the forum experience, so is the opportunity to network with other attendees. i know i have learned just as much talking with a group of people in the hall between sessions, during lunch, or at the networking dinners as i have sitting in the programs. not only is it a great opportunity to catch up with old friends, you will also have the opportunity to make new ones. for instance, at the 2009 national forum in salt lake city, utah, approximately half of the people who attended were first-time attendees. the national forum is an intimate event whose attendance ranges between 250 and 400 people, thus making it easy to forge personal connections. attendees come from a variety of settings, including academic, public, and special libraries; library-related organizations; and vendors. if you want to meet the attendees in a more formal setting you can attend a networking dinner organized on-site by lita members. this year the dinners were organized by the lita president, lita past president, lita presidentelect, and a lita director-at-large. if you have not attended a national forum or it has been a while, i hope i have piqued your interest in coming to the next national forum in atlanta. registration will open in may! the most up-to-date information about the 2010 forum is available at the lita website (http:// www.lita.org). i know that even after my lita presidency is a distant memory, i will still make time to attend the lita national forum. i hope to see you there! 2 information technology and libraries | june 2007 i write my final president’s column a month after the midwinter meeting in seattle. 
you will read it as preparations for the ala annual conference in washington, d.c. are well underway. despite that discon­ nect in time, i am confident that the level of enthusiasm will continue uninterrupted between the two events. indeed, the midwinter meeting was highly charged with positive energy and excitement. the feelings are reignited if you listen to the numerous podcasts now found on the lita blog. the lita bloggers and podcasters were omni­ present reporting on all of the meetings and recording the musings of the lita top tech trendsters. by the time you have read this you will have also, hopefully, cast your ballot for lita officers and directors after having had the opportunity to listen to brief podcast interviews with the candidates. the lita board approved the election pod­ casts at the annual conference in new orleans. thanks to the collaborative efforts of the nominating committee and the bigwig members, we have this new input into our voting decision­making. the most exciting aspects of the midwinter meeting were the face­to­face, networking opportunities that make lita so great. the lita happy hour crowd filled the six arms bar and lit it up with the wonderful lita glow badges. what was particularly gratifying to me was the number of new lita members alongside those of us who have been around longer than we care to count. the net­ working that went on there was phenomenal! the other important networking opportunity for lita members was the lita town meeting led by lita vice president mark beatty. the room was packed with eager members ready to brainstorm about what they think lita should be doing after consuming a wonderful breakfast. lita’s sponsored emerging leader, michelle boule, and mark have collated the findings and will be working with the other emerging leaders to fine­tune a direction. the podcast interview of michelle and mark is an excellent summary of what you can expect in the next year when mark is president. as stated earlier, this is my last president’s column, which means my term is winding down. using lita’s strategic plan as a guide, i have worked with many of you in lita to ensure that we have a structure in place that allows us to be more adaptable to the rapidly chang­ ing world and to make sure that lita is relevant to lita members 365 x 24 x 7 and not just at conferences and lita national forum. attracting and retaining new members is critical for the health of any organization and in that vein, mark and i have used the ala emerging leaders program as a jumping off point to work with lita’s emerging leaders. the bigwig group is foment­ ing with energy and excitement as they rally bloggers and have this past year launched the podcasting initiative and the lita wiki. all of these things are making it easier for members to communicate about issues of interest in their work as well as to conduct lita business. the lita blog had over nine thousand downloads of its podcasts in the first three weeks after midwinter which confirms the desire for these types of communications! i appointed two task forces that provided recommen­ dations to the lita board at midwinter. the assessment and research task force has recommended that a perma­ nent committee be established to monitor the collection of feedback and assessment data on lita programs and services. having an established assessment process will enable the board to know how well we are accomplishing our strategic plan and to keep us on the correct course to meet membership needs. 
the education working group has recommended the merger of two committees, the education and regional institutes committees, into one education committee. this merged committee will develop a variety of educational opportunities including online and face­to­face sessions. we hope to have both of these committees up and going later in 2007. happily, the feedback from the town meeting parallels the recom­ mendations of the task forces. the board will be revisit­ ing the strategic plan at the annual conference using information gathered at the town meeting. we will also be looking at what new services we should be initiating. all arrows seem to be pointing towards more educational and networking opportunities both virtual and in person. i anticipate that lita members will see some great new things happening in the next year. i have very much enjoyed the opportunity to serve as the lita president this past year. the best part has been getting to know so many lita members who have such creative ideas and who roll up their sleeves and dig in to get the work done. i am very grateful for everyone who has volunteered their time and talents to make lita such a great organization. bonnie postlethwaite (postlethwaiteb@umkc.edu) is lita president 2006/2007 and associate dean of libraries, university of missouri–kansas city. president’s column bonnie postlethwaite gender, technology, and libraries | lamont 137 melissa lamont gender, technology, and libraries information technology (it) is vitally important to many organizations, including libraries. yet a review of employment statistics and a citation analysis show that men make up the majority of the it workforce, in libraries and in the broader workforce. research from sociology, psychology, and women’s studies highlights the organizational and social issues that inhibit women. understanding why women are less evident in library it positions will help inform measures to remedy the gender disparity. t echnology not only produces goods and services, it also influences society and culture and affects our ability to work and communicate. as the computer encroaches more deeply into both workplaces and homes, encouraging participation in the development and use of technology by all segments of society is important. libraries, in particular, need to provide services and products that both appeal to and are accessible by a broad range of clientele. for libraries, information technology (it) has become vitally important to the operation of the organization. yet fewer women are active in it than men. a complex series of social and cultural biases inhibits women from participating in technology both in the library and in the larger workforce. the inclusion of more women in technology would alter the development and design of products and services as well as change the dynamic of the workplace. understanding why women reject it as it is currently practiced is necessary to understanding how to make technology more inviting for women. 
melissa lamont (mlamont@rohan.sdsu.edu) is digital collections librarian, san diego state university.
occupational data
studies and statistics from the broader it fields highlight discrepancies between the compensation, managerial level, and occupational roles of men and women.1 among the numbers are those showing that computer and information science fields included only 519,700 females and slightly more than 1,360,000 males in 2003.2 in the same occupational fields, men earned a median of $74,000 while women earned $63,000.3 similarly, the association of research libraries (arl) statistics from 2004 to 2008 show that men were more often employed as the heads of computer systems departments within libraries. computer systems department heads also earned higher salaries than the heads of other library departments. with the exception of 2004–5, female computer department heads were paid less than their male counterparts, despite the fact that they had more years of experience. in the 2007–8 report, men and women had the same number of years of experience, though women's salaries lagged slightly behind those of the men, as shown in table 1.4
table 1. library computer systems department heads
year    gender  department heads  salary  years in field
2004–5  women   32                76,764  18.9
2004–5  men     60                76,060  16.9
2005–6  women   32                78,767  19.4
2005–6  men     52                79,680  18.4
2006–7  women   26                81,435  18.2
2006–7  men     52                82,409  17.6
2007–8  women   27                87,107  18.8
2007–8  men     51                87,136  18.8
the availability of statistics for the heads of library technology departments belies the difficulty in counting the number of technology positions in libraries, or the broader workplace, and compiling statistics by gender. in a recent study of the job satisfaction of academic library it workers, lim comments on the complexities in identifying survey participants, "as a directory of library it workers does not exist."5 thus, to augment the statistical data for department heads, a citation analysis was used to identify those persons involved enough in library technology to write about it. presumably, authors of articles appearing in technology-oriented journals would have interests and expertise in technology regardless of their position titles or locations within the organization. technology-related articles can and do appear in a wide variety of library journals. journals with a focus on technology were selected to avoid the dilemma of subjectively categorizing individual articles as technical or nontechnical. the journals selected provide a cross-section of association, commercial, electronic, and print publications. information technology and libraries is the journal of the library information technology association division of the american library association (ala). the journal of the american society for information science and technology (jasis&t) is an official publication of the american society for information science and technology. the not-for-profit corporation for national research initiatives publishes d-lib magazine, an electronic publication on digital library research and development. all three are peer-reviewed. computers in libraries, published by information today, includes case studies and how-we-did-it articles and is not peer-reviewed. emerald publishes the peer-reviewed journal library hi tech. the author assembled statistics for the years 2006 and 2007. for the survey, regular columns, editors' sections, reviews, short notices, and association communications were not counted. each authored article was counted.
no attempt was made to include or discount an article based upon the topic. the gender of the authors was determined by notes within the journal, authors' websites, other internet sites, or by communication with the authors. as the statistics in table 2 demonstrate, men publish in these journals at a far higher rate than women, with the exception of computers in libraries. women make up 35 percent of the authors while men make up 65 percent. jasis&t, arguably the most technical and theoretical journal in the analysis, and the journal with the most academic authorship, illustrates the highest disparity. alternatively, the publication computers in libraries contains more articles authored by women. this publication solicits articles on the application of technology—practical and less formal articles to share successes and ideas. it may be argued that female librarians simply publish less than male librarians. two additional publications, the journal of academic librarianship, published by elsevier, and college and research libraries (c&rl), published by the association of college and research libraries, were analyzed for comparison. table 3 illustrates the data for the comparison journals alone, with women making up 62 percent of the authors. female authors outnumbered male authors in the comparison journals, but women account for approximately 80 percent of u.s. librarians and are therefore publishing at a lower rate than men.6 in the interest of comparison, the author also analyzed the journal children and libraries, the journal of the association for library service to children, a division of ala. in 2006 and 2007, only four male authors were represented in children and libraries. they appeared as authors a total of eleven times. all of the remaining fifty authors are female. women made up 82 percent of the total authors while men made up 18 percent. these statistics are similar to a study conducted by hakanson and published in 2005. she analyzed articles in selected journals from the years 1980 to 2000 and found that male authors slightly outnumbered female authors, and further that articles authored by men were more likely to be referenced than those by women.7 the data gathered here are similar: 41 percent of the total authors in both technology and comparison journals are women, and 59 percent are men. male authors also are more likely to be the lead author on articles with multiple authors. again jasis&t shows the greatest disparity. computers in libraries includes more female lead authors, as shown in table 4. in the comparison journals, women are more often the lead author, as shown in table 5. both hakanson's data and the small statistical sample reported here demonstrate that although women hold most library positions, they do not publish a comparable amount. technology journals show the most disparity between the numbers of male and female authors. together, the citation and occupational statistics illustrate the higher visibility men have in it. fewer women are evident in it as department heads, employees, academics, or authors.
table 2. gender of authors in technology journals, 2006–7
publication                          articles  female authors (# / %)  male authors (# / %)
computers in libraries               57        51 / 61.4               32 / 38.6
d-lib magazine                       92        83 / 38.6               132 / 61.4
information technology & libraries   43        28 / 33                 57 / 67
jasis&t                              354       244 / 30.3              560 / 69.7
library hi-tech                      91        63 / 41.2               90 / 58.8
totals                               637       469 / 35                871 / 65
table 3. gender of authors in comparison journals, 2006–7
publication                          articles  female authors (# / %)  male authors (# / %)
college & research libraries         66        81 / 63                 48 / 37
journal of academic librarianship    128       140 / 61                89 / 39
totals                               194       221 / 62                137 / 38
discussion
in the broader workplace, not just libraries, men hold the majority of it positions. the importance of including women in it is not just a matter of equal opportunity. according to rasmussen and hapnes, women will bring different concerns and outlooks to it. further, the products and services produced by a diverse and integrated workforce will appeal to a broader market. including more women in the it workplace will also alter the organizational environment. their ideas and interests will bring new perspectives to development discussions and likely lead to new or different systems.8 understanding why relatively few women enter it fields will help inform measures to alter the current, male-dominated dynamic. by reviewing the research in sociology, psychology, and women's studies, the factors inhibiting women from participation in it can start to be understood. the dissuasive factors are a complex and intertwined combination of organizational culture, occupational segregation, and subtle discrimination.
abilities and perceptions
technology is pervasive throughout the library, and nearly all librarians develop basic technical skills as a condition of employment. librarians may develop more advanced computing skills to address a lack of technical support, to develop new services, or for professional or personal interest. correspondingly, technologists have absorbed library concepts such as description and classification. yet knowledge and ability are valued and evaluated within the social context of the organization, according to scott-dixon. the location of an occupation within the organization will influence the perception of the ability and skill required to succeed in that position.9 although the work of librarians and technologists may be similar or interdependent, the occupations are valued differently. scott-dixon's research addresses the problem of "designating which work is technical enough to merit consideration as it work."10 technologically proficient librarians or staff working outside of the it department will not be considered part of the library's it staff, yet they may be performing at a technological level equal to that of the regular it staff. scott-dixon states, "assumptions about it work incorporate assumptions about who performs this work, and that work performed in traditionally nonwhite, non-male jobs is often viewed as less technical, regardless of the technological objects that are employed in the process."11 the number of women participating in it may be higher than the statistics represent; nevertheless, women are still less directly employed in it. any contributions they make to it will be devalued as a consequence of their positions within the library organization. position and department titles also influence the perceived value of the work. to make traditional library tasks appear modern and relevant, long-established library functions have been renamed. cataloging has become metadata, catalog control has become system administration, and librarianship has become information science. the old chestnut that information science is library science for boys has an element of truth.
in 2006, the average annual starting salary for librarians who categorized their positions as information science was $48,413; the average for those who categorized their positions as library science was $39,580. women who categorized their positions as information science earned an average starting salary of $46,118; men averaged $55,423.12 salary statistics substantiate the research showing that information technology positions are more highly valued and therefore more highly compensated in the library organization. likewise, men are more highly compensated than women.
table 4. gender of lead authors in technology journals, 2006–7
publication                          articles  female first  male first
computers in libraries               20        12            8
d-lib magazine                       50        22            28
information technology & libraries   17        7             10
jasis&t                              140       32            108
library hi-tech                      42        19            23
totals                               269       92            177
table 5. gender of lead authors in comparison journals, 2006–7
publication                          articles  female first  male first
college and research libraries       39        29            10
journal of academic librarianship    61        36            25
totals                               100       65            35
one of the causes of income inequality is occupational segregation.13 occupational segregation occurs when positions with similar educational requirements, but different titles or locations within the organization, are valued differently.14 the difference in the salaries of traditional library department heads and the heads of technology departments is one example of income inequality within the library. according to the arl annual salary survey 2007–08, heads of computer systems departments earn more than $87,100, while heads of rare books and manuscripts departments, who have the second highest salaries, earn $80,628. the rare books and manuscripts department heads are nearly evenly divided by gender; the majority of computer systems department heads are men.15 in libraries, occupational segregation divides traditional library departments and functions from it departments and technology applications. librarians are predominately female and, as the occupational statistics show, it workers are predominately male. the result for libraries has been a gendered segregation of the library workforce.16 the results of occupational segregation are intensified by the tendency for women to avoid defining themselves as technology workers. the research by adam et al. confirms the results of several earlier studies. when asked to define their roles in the organization, men more often associate their positions with it; women tend to identify with a larger or more encompassing group within the organization, not specifically it.17 though these studies did not include librarians, it could be assumed that female librarians would respond much like their counterparts in other industries. in fact, few occupational studies conducted outside the library profession include librarians. thus it appears that women choose to be excluded from an occupational group that is well compensated, integral to the organization, and considered highly skilled. not only do women define their positions as non-it, but women also underestimate their technical skills. hargittai and shafer reviewed a number of studies investigating the self-assessment of computer skills. in those studies, women test at the same skill level as men but consistently underrate their technical ability.
hargittai and shafer conducted a study of internet skills that draws the same conclusion.18 organizational culture women may underestimate their abilities and disassociate with it in part because of the perception of it organizational culture.19 technical positions are associated with long and irregular hours, leading to the assumption that family and home responsibilities will cause women to be less able to contribute. as ramsey and mccorduck note, those assumptions are not associated with men’s work.20 they emphasize that while women “often shoulder more family responsibilities than men . . . the presumption more than the reality tends to limit women’s advancement.”21 the perception of a high commitment level is fostered by the computing industries. the stereotype of the solitary computer geek, typing away in physical, though not virtual, isolation with a social life revolving around the technology is not entirely accurate. yet guzman, stam, and stanton have studied it as an occupational subculture. they call the perceived demands of the subculture “extreme and unusual,” with long hours and constant need for self reeducation.22 the appearance of high cost in time and capital is one way that the already-initiated keep outsiders out. the use of specialized language and jargon, stories of long hours spent, and complaints about end users are all means of solidifying organizational boundaries. the ramsey and mccorduck report points to a perception by some women that the long hours are often “a status symbol, a sign of machismo.”23 all occupational groups participate in us-versus-them behavior however; since it is gendered, the subculture effectively excludes women and exacerbates the segregation. according to guzman, stam, and stanton’s research, one of the hallmarks of the it subculture is the sense of control over other groups within the organization. yet the subculture also shares a sense of fulfillment in assisting others with technology.24 the esoteric knowledge held by it workers is essential to the operation of most organizations, in particular libraries. this gives the subculture an inordinate sense of power.25 the computing professions appear to be linked with masculinity and power, at least in western cultures. melanie wilson writes, “the qualities required for entry to the professions and success in them are seen as masculine.”26 masculine occupations tend to be associated with skill, learning, and hard work. construction, business, and now it have a preponderance of male professionals. masculine occupations are more prestigious and better compensated. wacjman writes, “to be in command of the very latest technology signifies being involved in directing the future, so it is a highly valued and mythologized activity.”27 the idea that women’s skills are more instinctive makes them less valued, and feminized occupations tend to be associated with the innate behaviors.28 wilson points to research indicating that “women’s work tends to be regarded as semi-skilled merely because it is women’s work.”29 women are a higher percentage of elementary school teachers, nurses, and care givers, and those positions receive modest compensation compared to occupations typically held by men. specific to libraries, technology subfields may be seen as acceptable positions for men in an occupation traditionally dominated by women. as the research suggests, an increase in the number of women involved in technology would devalue those fields. 
roos and reskin explored the effect of an increase in the numbers of women on occupational status. in a 1990 paper they wrote:
traditionally, "women's" jobs have been both lower-paying and less valued than "men's." occupational incumbents have thus been chagrined to learn that their occupation is feminizing, fearful of a drop in wages and prestige. this fear has a valid empirical basis: the percentage female in an occupation is negatively correlated with occupational earnings.30
an influx of women into library it would likely devalue the subfield and depress wages; as such, occupational segregation is one means of protecting wages and influence. women are often deterred from entering or excelling in an occupation through subtle discrimination. because the sexist actions or words are not always recognized as discriminatory, subtle sexism is difficult to define. the repetition of the behaviors and language over time creates a sense that those patterns are acceptable, and they become more difficult to change.31 examples of subtle sexism include the expectation that women will be more responsible for social occasions involving food or more responsible for the staff lounge or lunch room. often the informal exchange of information and skills, so-called boy's-room knowledge, eludes women because they are excluded from masculine socializing. in addition, men may be assigned different, usually less clerical, tasks, and women are often associated with the softer tasks of user support, help desks, and interfaces.32 although subtle discrimination occurs in all workplaces, not just libraries, the effects in a gender-segregated workplace are compounded. confronted with a complex series of social, cultural, and organizational cues, women are made to feel less competent and less comfortable with technology. the association of women's positions with lower wages and prestige serves to sustain the occupational segregation and justify the subtle discrimination that hinders women. sometimes perception creates reality. it would be a mistake to group all women as a whole, expecting that the experiences of all are exactly alike, just as not all men are technologically adept. socioeconomic factors, as well as ethnic and geographic differences, influence the abilities and desires of women and men to succeed in technology professions. yet the smaller number of women in technology subfields of librarianship implies an almost "symbolic image of the discipline as masculine, which in turn reinforces the minority position of women."33 likewise, the far greater number of women writing in children's librarianship simply reinforces this subfield as feminine. according to alksnis, "on the demand side, jobs are often seen as requiring the characteristics of the group that already dominates it."34 the lack of women in the it field continues to reinforce the stereotype and perpetuate the imbalance.
conclusion
to remedy the underrepresentation of women in it, it would be simple to call for greater educational opportunities for girls, mentoring programs for young professional women, and economic incentives to retain mid-career women. the situation, however, is not simple. a series of organizational, societal, and cultural perceptions inhibit women from associating or identifying with it.
rasmussen and hapnes refer to a combination of organizational culture and gender politics that discourage women.35 instead of a focus on the numbers of women in it, librarians should work to transform the organizational culture. as technology progresses, the definition of technology work must be reevaluated and the entries into the technology fields must be redefined. in short, what constitutes it must be rethought, recast, and revalued as technology develops. in the library specifically, it and librarianship have much in common. at present, the library has a dichotomized workforce of female librarians and male it workers. over time, the skills of librarians and technologists will blend. if managed properly, the best of classic library theory and practice will combine with it into a dynamic and diverse workforce as well as a thriving and innovative organization.
references and notes
1. examples of research and statistics concerning the number and status of women in technology fields, in addition to those noted in the paper, include carol simard et al., climbing the technical ladder: obstacles and solutions for mid-level women in technology (palo alto, calif.: anita borg institute for women and technology, 2008), http://anitaborg.org/files/climbing_the_technical_ladder.pdf (accessed oct. 20, 2008); u.s. department of labor, bureau of labor statistics, household data annual averages, "table 11. employed persons by detailed occupation, sex, race and hispanic or latino ethnicity," ftp://ftp.bls.gov/pub/special.requests/lf/aat11.txt (accessed oct. 20, 2008); and jay vesgo, "cra taulbee trends: female students and faculty," computing research association, june 17, 2008, www.cra.org/info/taulbee/women.html (accessed oct. 20, 2008).
2. national science foundation, division of science resources statistics, women, minorities, and persons with disabilities in science and engineering: 2007, nsf 07-315, table h-5, employed scientists and engineers by occupation, highest degree level, and sex: 2003 (arlington, va.: national science foundation, 2007): 222, http://www.nsf.gov/statistics/wmpd/pdf/nsf07315.pdf (accessed june 11, 2008).
3. national science foundation, division of science resources statistics, women, minorities, and persons with disabilities in science and engineering: 2007, table h-16, median annual salary of scientists and engineers employed full time, by highest degree, broad occupation, age group, and sex: 2003, 225.
4. association of research libraries, arl annual salary survey 2007–08, table 17, number and average salaries by position and sex (washington, d.c.: arl, 2008): 42–43, tables 17–18, www.arl.org/stats/annualsurveys/salary/annualedssal.shtml (accessed aug. 2008).
5. sook lim, "job satisfaction of information technology workers in academic libraries," library & information science research 30, no. 2 (2008): 120.
6. stephanie maatta, "placements and salaries 2006: what's an mlis worth?" library journal (oct. 15, 2007), www.libraryjournal.com/article/ca6490671.html (accessed aug. 29, 2008).
7. malin hakanson, "the impact of gender on citations: an analysis of college & research libraries, journal of academic librarianship and library quarterly," college & research libraries 66, no. 4 (2005): 312–22.
8. bente rasmussen and tove hapnes, "excluding women from the technologies of the future? a case study of the culture of computer science," futures 23, no. 10 (1991): 1107.
9.
krista scott-dixon, "from digital binary to analog continuum: measuring gendered it labor: notes toward multidimensional methodologies," frontiers 26, no. 1 (2005): 26.
10. ibid.
11. ibid., 30.
12. stephanie maatta, "placements and salaries 2006."
13. christine alksnis, serge desmarais, and james curtis, "workforce segregation and the gender wage gap: is women's work valued as highly as men's?" journal of applied social psychology 38, no. 6 (2008): 1416–41.
14. ibid., 1419.
15. association of research libraries, arl annual salary survey 2007–08, 42–43, tables 17–18.
16. lori ricigliano and renee houston, "men's work, women's work: the social shaping of technology in academic libraries" (paper presented at the association of college and research libraries 11th annual national conference, charlotte, n.c., apr. 10–13, 2003): 1.
17. alison adam et al., "being an 'it' in it: gendered identities in it," european journal of information systems 15, no. 4 (2006): 368–78.
18. eszter hargittai and steven shafer, "differences in actual and perceived online skills: the role of gender," social science quarterly 87, no. 2 (2006): 432–48.
19. rasmussen and hapnes, "excluding women," 1108.
20. nancy ramsey and pamela mccorduck, "where are the women in information technology? preliminary report of literature search and interviews" (report prepared for the national center for women and information technology, feb. 5, 2005): 9, http://www.anitaborg.org/files/abi_wherearethewomen.pdf (accessed june 12, 2009).
21. ibid.
22. indira r. guzman, kathryn r. stam, and jeffrey m. stanton, "the occupational culture of is/it personnel within organizations," the data base for advances in information systems 39, no. 1 (2008): 45.
23. ramsey and mccorduck, "where are the women in information technology?" 9.
24. guzman, stam, and stanton, "occupational culture," 45.
25. ibid.
26. melanie wilson, "a conceptual framework for studying gender in information systems research," journal of information technology 19, no. 1 (2004): 87.
27. judy wajcman, "reflections on gender and technology studies: what is state of the art?" social studies of science 30, no. 3 (june 2000): 454.
28. alksnis, desmarais, and curtis, "workforce segregation," 1418.
29. wilson, "conceptual framework," 85.
30. patricia a. roos and barbara f. reskin, "occupational desegregation in the 1970s: integration and economic equality?" sociological perspectives 35, no. 1 (1992): 87.
31. nijole v. benokraitis, "sex discrimination in the 21st century," in subtle sexism: current practice and prospects for change, ed. nijole v. benokraitis (thousand oaks, calif.: sage, 1997): 11.
32. fiona wilson, "can compute, won't compute: women's participation in the culture of computing," new technology, work and employment 18, no. 2 (2003): 127.
33. vivian anette lagesen, "extreme make-over? the making of gender and computer science" (phd diss., norwegian university of science and technology, trondheim, norway, 2005): 188.
34. alksnis, desmarais, and curtis, "workforce segregation," 1419.
35. rasmussen and hapnes, "excluding women," 1108.
brown university library fund accounting system
robert wedgeworth: brown university library, providence, r.i.
the computer-based acquisitions procedures which have been developed at the library provide more efficient and more effective control over fund accounting and the maintenance of an outstanding order file. the system illustrates an economical, yet highly flexible, approach to automated acquisitions procedures in a university library.
the fund accounting system of the brown university library was initiated on the basis of a program developed in april, 1966. subsequently, it was decided to implement the program in the fall of that year. the necessary in-house equipment, namely an ibm 826 typewriter card punch and an ibm 026 keypunch, was placed on order along with new six-part order forms. about the same time an agreement was reached with the administrative data processing office of the university (tabulating) which would provide for rental time on their ibm 1401, 12k system with three magnetic disks and four magnetic tape-storage units. the services of a part-time programmer were also secured through this office. the system became fully operational on december 1, 1966. the primary objective of the project was to establish more efficient and more effective control over the approximately 150 fund accounts administered by the order department of the university library. in addition, it seemed that a number of by-products were possible. among these were statistical information for management and a file of bibliographical records from which a new accessions list could be drawn on a regular basis. the system was to accommodate the payment of all invoices to be posted against the aforementioned accounts. these include monographic and serial publications as well as supplies and equipment. however, records of outstanding orders were to be maintained for monographic publications only. although the basic routines were to remain much the same, some minor adjustments were necessary to accommodate the new machine system. also, several files of dubious value to the new system were to be maintained in order to gain empirical evidence as to their worth. this report is presented as a record of an attempt to develop an economical, yet highly flexible approach to the automating of acquisitions procedures of a university library. perhaps the scope of the computer-based acquisitions procedures at brown may be determined more easily relative to three recently reported systems of varying complexity. one of the best surveys of automated university library acquisitions systems appears in the project report of the university of illinois, chicago campus (1). however, two of the systems summarized here are more recent. the university of michigan was included in the illinois literature survey, but the first full description to be published appeared just recently. automated acquisitions procedures have been in operation at the university of michigan library since june, 1965 (2). the system features a list of items produced by computer from punch cards in which order information has been recorded. this list is produced on a monthly basis with semi-weekly cumulative supplements. the computer also produces status report cards. these are punch cards, containing summarized order information, which travel with the book and at appropriate processing stages are coded and returned to the computer in order to up-date the status code in the processing list. thus by checking the status code one can determine that a book has been received, received and paid, or cataloged. claim notices are automatically produced for items which remain on order for longer than the predetermined period. in addition to creating and maintaining full financial records and compiling selected statistics, the system will produce specialized acquisitions lists on demand.
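the status-code and claiming logic just described for the michigan system can be sketched in a few lines of code. this is only an illustration of the idea — the original system ran on punch cards, and every class, field, and threshold below is a hypothetical name invented for the example — but it shows how a coded status report card advances an order's status and how overdue items generate claim notices.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// minimal sketch of the status-code idea described above: each order carries a
// status code that is advanced as coded status report cards come back, and a
// claim notice is produced for any order still open past a predetermined period.
// all names and values here are hypothetical.
public class StatusList {

    enum Status { ON_ORDER, RECEIVED, RECEIVED_AND_PAID, CATALOGED }

    static class OrderEntry {
        final String orderNumber;
        final LocalDate orderDate;
        Status status = Status.ON_ORDER;
        OrderEntry(String orderNumber, LocalDate orderDate) {
            this.orderNumber = orderNumber;
            this.orderDate = orderDate;
        }
    }

    // a returned status report card simply moves the matching entry forward
    static void applyStatusCard(List<OrderEntry> list, String orderNumber, Status newStatus) {
        for (OrderEntry e : list) {
            if (e.orderNumber.equals(orderNumber)) {
                e.status = newStatus;
                return;
            }
        }
    }

    // claim notices for items that have stayed on order longer than the allowed number of days
    static List<String> claimNotices(List<OrderEntry> list, LocalDate today, int maxDaysOnOrder) {
        List<String> notices = new ArrayList<>();
        for (OrderEntry e : list) {
            if (e.status == Status.ON_ORDER && e.orderDate.plusDays(maxDaysOnOrder).isBefore(today)) {
                notices.add("claim: order " + e.orderNumber + " placed " + e.orderDate);
            }
        }
        return notices;
    }

    public static void main(String[] args) {
        List<OrderEntry> processingList = new ArrayList<>();
        processingList.add(new OrderEntry("660101", LocalDate.of(1967, 1, 5)));
        processingList.add(new OrderEntry("660102", LocalDate.of(1967, 6, 1)));
        applyStatusCard(processingList, "660102", Status.RECEIVED);
        claimNotices(processingList, LocalDate.of(1967, 10, 1), 120).forEach(System.out::println);
    }
}
```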
yale university library creates a machine readable record of a request before it is searched or ordered (3). as a result, the status-monitoring system is almost immediately effective. an ibm 826 typewriter card punch is used to type purchase orders, and the ibm 357 data collection system is used to monitor the progress of an item through the system. the process information list is produced weekly with daily supplements. automatic claiming and financial record maintenance are also products of the system. moreover, numerous statistics are planned for management purposes. the fund control system reported by the university of hawaii features financial accounting for book purchases based on pre-punched cards corresponding to purchase orders typed (4). the list price is keypunched into the appropriate card in a separate operation and used to encumber funds. upon receipt of the book the invoice is matched with the appropriate punch card, and after actual cost is keypunched the card is used to up-date the account. the michigan and yale systems incorporate all of the major features of operational university library automated acquisitions systems. foremost among them are the list of items being processed and its coordinate monitoring system. the cost of creating and maintaining such a file was prohibitive for brown. brown, michigan and hawaii generate a machine record after searching. unlike michigan and yale, brown and hawaii do not have "total" acquisitions systems plans. at brown serials control is not included. at hawaii fund accounting is the only task of the system. also, brown differs from michigan and yale in that the claiming procedure merely notifies the department that certain items are overdue. the brown system is certainly not as economical as that of hawaii, but the use of the typewriter card punch creates a highly flexible and easily expanded system for the difference in cost.
manual files and procedures
the manual routines of the order department are based upon the maintenance of four basic files. the file documents are all parts of the six-part purchase order form. the outstanding order search file is an alphabetical card file representing unfilled orders, requests to search for items, and inquiries for bibliographical information. this file is virtually independent of other routines, thus making it feasible for it to be merged with the file of items waiting to be cataloged. the processing file consists of outstanding orders filed first by book dealer, and second by order number. this file is used to check in shipments of books, to record reports on orders and to record claims. the numerical control file is an order number sequence file containing one copy of every order typed regardless of its ultimate disposition. it provides rapid access to information regarding retrospective orders. the fund file is a file of completed or cancelled transactions filed first by fund name and second by order number. the latter two files were thought to be of dubious value to the new system. however, it was agreed to maintain both for the time being. in order to accommodate the fund accounting system, the procedures developed feature two basic routines based on the presence or absence of a unique order number.
unique order (figure 1)
items acquired in this fashion include purchases and solicited gifts. continuations, but not serials, are included.
when a request is received in the order department, it is searched in the main catalog, the waiting catalog and the outstanding order file. if it is found to be neither in the library nor on order, it is then given to an order assistant who completes the bibliographical work, if necessary, and assigns a fund and dealer. if the price is listed in a foreign currency, the assistant converts it to u.s. dollars. the request then proceeds to the typist.
(fig. 1. unique order procedure.)
all unique orders are typed on an 826 typewriter card punch. as the typist fills in the six-part order form, pre-selected pieces of information are keypunched automatically. these fields are as follows: order number; order date; source type (d for domestic, etc.); fund number; list price; author; title; imprint; series. orders are proofread on the day after they are typed. the forms are separated and the outstanding order cards are filed immediately in order to detect duplicate orders. at this point the dealer slips are mailed and the numerical control slips filed. the processing file documents, each containing a fund slip, an l.c. order slip, and a cataloger's work slip on a separate perforation, are then filed pending the arrival of the books. also, the deck of ibm cards which has been weeded of voided orders goes to tabulating. although books may be processed without invoices, the normal practice is to process after the arrival of the invoice. the processing file document is obtained and the cost, invoice date and the number of volumes are noted on the fund slip. if the item is a continuation, a supplementary fund slip is made and the original returned to the processing file with the receipt noted. the invoices are cleared and sent to the controller. the fund slips representing books received are sent to the keypuncher in order to up-date the accounts. in the meantime the books, along with the work slips and the l.c. order slips, are sent to the catalog department. as the books are cataloged, the work slips noting any major bibliographical changes and the call number are returned to the order department. from these slips are punched bibliographical adjustment cards and an up-date record card containing the call number and coded for subject and location. the resulting bibliographical record forms the data base for the new accessions listing.
no unique order (figure 2)
items acquired in this fashion include unsolicited gifts, exchanges, standing orders, etc. some continuations and all serials invoices are included. upon arrival, invoiced items without unique order numbers are searched. if they are duplicates they are returned for credit. if they are not duplicates, they are sent to the typist.
(fig. 2. no unique order procedure.)
catalog file slips are typed
and by-product bibliographical and accounting records are punched. on the record card for accounting, the order number field is filled with nines. this signals the program that this entry is a receipt for which there was no unique order number. the series of order numbers beginning with 900000 was originally reserved for assignment to our standing order agreements with presses, societies, etc. eventually, each will have its own order number. however, the last number of the series, 999999, will continue to be used for miscellaneous receipts. presently no accessions listing records are being generated for items without unique order numbers. however, all purchases without unique order numbers are processed with a series 9 order number.
serials
all serial invoices are handled as series 9 transactions with no attempt to record bibliographical information or volume counts. expenditures for serials are accumulated and entered as one transaction each time the accounts are up-dated. this decision was made in anticipation of the development of a separate serials control program.
ibm 1401 files and procedures
the basic function of the computer program for the fund accounting system is to maintain current balances on the various library fund accounts and to maintain a file of outstanding orders exclusive of standing orders. although several correlative functions are distinct possibilities, the only additional function planned is a file of bibliographic records for the production of an accessions listing. figures 3, 4, 5 and 6 illustrate the major tasks to be performed by the system. the programming language used is autocoder.
fund balance forward file
a card file created at the beginning of each fiscal year having two card types.
1. fund group header card
a. group code
b. group name
this card assigns a unique code and name to categories of funds such as endowed, special, etc.
2. fund balance forward and appropriation card
a. fund group code
b. fund code
c. fund name
d. previous year balance forward
e. current income or appropriation
f. balance forward code
g. remaining previous year encumbrances
this card contains information used to establish the individual funds at the beginning of each year. the balance forward code directs the program to carry over excess funds to the next year, not to carry over excess funds to the next year, or to carry over a negative balance to the next year, thereby reducing cash balance resulting from the new income or appropriation. encumbrances are carried over to the next year in order to maintain an accurate net available at all times.
(fig. 3. fund file creation.)
(fig. 4. file maintenance.)
(fig. 5. fund accounts updating.)
library fund file
a magnetic tape file created from the fund balance forward file and containing three record types.
1. fund group header
2. fund record
a. fund group code
b. fund code
c. fund name
d. previous year balance forward
e. current income or appropriation
f. current expenditures
g. cash balance
h. amount encumbered
i. net available
j. volumes purchased
k.
balance forward code
fund record fields a, b, c, d, e, h and k initially are taken from the corresponding fields in the fund balance forward card. current year expenditures and volumes purchased are preset to zero each year. cash balance is determined by the sum of the previous year balance forward and the current income or appropriation. amount encumbered will be preset to zero or taken from the fund card. net available is determined by the difference between cash balance and amount encumbered.
3. fund group trailer
this record is the last within each fund group and contains a summation of the quantitative fields in that fund group. it is used primarily for control purposes.
figure 4 illustrates the file maintenance program for the library fund files. this program permits the addition or deletion of a fund group code, changes to a fund group header, addition or deletion of a specific fund or changes to a specific fund. however, changes to quantitative fields are limited to those fields which are contained in the fund balance forward card. thus, net available may not be changed directly by file maintenance but may be changed by manipulating current income or appropriation. the library fund file is a serial file maintained in ascending algebraic sequence on fund group code, fund code and fund record from major to minor respectively.
outstanding order file
a magnetic disk file created and up-dated by three card types.
1. order card
a. order number
b. order date
c. source type (d is domestic, f is foreign)
d. fund number
e. list price
figure 5 illustrates the program which processes new orders. this program validates fund code, rejects duplicate order numbers and encumbers list price, thereby reducing net available.
2. record card
a. order number
b. invoice date
c. fund code
d. cost
e. continuation order code, if applicable
f. number of volumes
standing orders, blanket orders, serials, etc. are purchased without placing an order. consequently, a series 9 order number is assigned to these record cards. such cards will not match the outstanding order file by definition but will increase amount expended, decrease cash balance and net available and increase volumes purchased. all other record cards must match an existing order number on file. on continuations the record card for each part received produces a transaction as described above, except that the encumbrance remains unchanged until the final record card appears without the continuation order code.
3. adjustment card
this card may be submitted for either an order card or a record card. it is differentiated by a special code. its primary purpose is to correct a previous error or to effect a cancellation. the outstanding order file is in ascending algebraic sequence by fund group, fund code and order number. all cards used in this program must be pre-sorted into this sequence.
printout products
the accumulated punch cards are processed on a bi-weekly schedule by the tabulating office. a file maintenance report (figure 4) is the first product of each run. it lists in detail any adjustments, additions, or deletions to the fund listing plus the results of such operations. at the end of the detailed report is a summary of the status of each active fund. copies of this latter report are distributed for desk use to all order assistants, the chief order librarian, and the librarian.
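the fund arithmetic and card-processing rules described above are simple enough to restate as a small program. the sketch below is an illustration only — the original system was written in autocoder for the ibm 1401, and all class and method names here are invented — but it captures the stated rules: an order card rejects duplicate order numbers and encumbers the list price, a record card adds to expenditures and volumes purchased (reducing cash balance and net available), a series 9 number matches no outstanding order, and a continuation leaves the encumbrance in place until the final part arrives.

```java
import java.util.HashMap;
import java.util.Map;

// illustrative model of the fund-updating rules described in the text;
// names are hypothetical, not taken from the original autocoder program.
public class FundAccount {
    double balanceForward;        // previous year balance forward
    double incomeOrAppropriation; // current income or appropriation
    double expenditures = 0.0;    // preset to zero each year
    double encumbered = 0.0;
    int volumesPurchased = 0;

    // order number -> encumbered list price
    final Map<String, Double> outstandingOrders = new HashMap<>();

    FundAccount(double balanceForward, double income) {
        this.balanceForward = balanceForward;
        this.incomeOrAppropriation = income;
    }

    // cash balance starts as balance forward plus income and is reduced by expenditures
    double cashBalance() { return balanceForward + incomeOrAppropriation - expenditures; }

    // net available = cash balance - amount encumbered
    double netAvailable() { return cashBalance() - encumbered; }

    // order card: reject duplicate order numbers, encumber the list price
    boolean postOrderCard(String orderNumber, double listPrice) {
        if (outstandingOrders.containsKey(orderNumber)) return false; // duplicate rejected
        outstandingOrders.put(orderNumber, listPrice);
        encumbered += listPrice;
        return true;
    }

    // record card: series 9 numbers (e.g. 999999) match no order; otherwise the
    // matching encumbrance is released unless the continuation order code is present
    void postRecordCard(String orderNumber, double cost, int volumes, boolean continuation) {
        expenditures += cost;
        volumesPurchased += volumes;
        if (!orderNumber.startsWith("9") && outstandingOrders.containsKey(orderNumber) && !continuation) {
            encumbered -= outstandingOrders.remove(orderNumber);
        }
    }

    public static void main(String[] args) {
        FundAccount fund = new FundAccount(500.00, 2000.00);
        fund.postOrderCard("123456", 25.00);             // new order encumbers
        fund.postRecordCard("123456", 23.50, 1, false);  // book received and paid
        fund.postRecordCard("999999", 140.00, 0, false); // serials handled as series 9
        System.out.printf("cash balance %.2f, net available %.2f, volumes %d%n",
                fund.cashBalance(), fund.netAvailable(), fund.volumesPurchased);
    }
}
```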
the transaction register of fund activity (figure 5) lists each transaction posted to each fund for the inclusive period. the assistant in charge of bookkeeping is the primary user of this and the detailed file maintenance report. the delinquent orders report (figure 6) lists all past due outstanding orders according to two cycles. domestic orders are listed bi-monthly and foreign orders are listed quarterly. the listing is of the "tickler" variety, as it may not be necessary to ask reports on all of the items. an order will remain on the delinquent orders report until it is filled or cancelled.
(fig. 6. delinquent order listing.)
conclusion
as of october, 1967, the fund accounting system has been in operation for ten months. assessment of its effectiveness in terms of meeting the primary objective shows the system to be an immediate success. at this point costs are about the same for the manual system as for the present one. however, accounts which used to require from 25 to 30 man-hours per month are maintained with about 5 man-hours per month. our current equipment and processing costs run about $325 per month. on the other hand, we have become aware of some shortcomings of the system. the addition of a currency conversion sub-routine would greatly expedite the many requests for foreign publications received daily. secondly, the addition of a dealer code would make the delinquent orders list much more useful. at present a user must search the numerical file for the order to ascertain the dealer. the processing file copies are then pulled to go to the typist who asks reports on delinquent orders. a revised program incorporating both of these features is being planned and will be operational early in 1968. the proposed accessions listing has been rejected as a by-product of this system primarily because of the limited character set available on our ibm 1403 print chain and the excessive length of the average listing. the time and expense of storing and up-dating the bibliographical record for each new acquisition should, in our estimation, result in a more palatable end-product. we have, therefore, temporarily discontinued producing punch cards for the bibliographical records. as a corollary, it should be added that we have turned to a consideration of the paper tape typewriters as input/output devices, focusing on their expanded character set and operating speed. the speed of the 826 leaves much to be desired. the numerical control file has proven its usefulness as a rapid index to our files spanning several years. it is extremely helpful in identifying quotes on old order numbers which have long since been cancelled. the fund file, however, has proven to be a duplicate of our machine file. it is thought that replacement of the slip in the numerical control file with the fund slips would at the same time reduce our files by one and up-date the information in the numerical file. finally, this modest beginning, occasioned by limited financial resources as well as the lack of personnel with experience in data processing, seems to have been justified. moreover, although the increasing complexity of our involvement in library automation poses some serious planning and supervisory problems, we are encouraged by our initial success.
acknowledgments
the staff of the order department have all contributed to the production of this report.
however, a special note of gratitude is acknowledged for the assistance of dorothy woods and gloria hagberg and for the technical advice and assistance of al hansen, library programmer, and david a. jonah, librarian.
references
1. kozlow, robert d.: report on a library project conducted on the chicago campus of the university of illinois (washington: nsf, 1966), p. 50.
2. dunlap, connie: "automated acquisitions procedures at the university of michigan library," library resources & technical services, 11 (spring 1967), 192.
3. alanen, sally; sparks, david e.; kilgour, frederick g.: "a computer-monitored library technical processing system," american documentation institute. proceedings, 3 (1966), 419.
4. shaw, ralph r.: "control of book funds at the university of hawaii library," library resources & technical services, 11 (summer 1967), 380.
eclipse editor for marc records
bojana dimić surla
information technology and libraries | september 2012
abstract editing bibliographic data is an important part of library information systems. in this paper we discuss existing approaches in developing user interfaces for editing marc records. there are two basic approaches: screen forms that support entering bibliographic data without knowledge of the marc structure, and direct editing of marc records shown on the screen. this paper presents the eclipse editor, which fully supports editing of marc records. it is written in java as an eclipse plug-in, so it is platform-independent. it can be extended for use with any data store. the paper also presents a rich client platform (rcp) application made of the marc editor plug-in, which can be used outside of eclipse. the practical application of the results is integration of the rcp application into the bisis library information system. introduction an important module of every library information system (lis) is one for editing bibliographic records (i.e., cataloguing). most library information systems store their bibliographic data in the form of marc records. some of them support cataloging by direct editing of marc records; others have a user interface that enables entering bibliographic data by a user who knows nothing about how marc records are organized. the subject of this paper is user interfaces for editing marc records. it gives software requirements and analyzes existing approaches in this field. as the main part of the paper, we present the eclipse editor for marc records, developed at the university of novi sad as a part of the bisis library information system. the editor uses the marc 21 variant of the marc format. the remainder of this paper describes the motivation for the research, presents the software requirements for cataloging according to marc standards, and provides background on the marc 21 format. it also describes the development of the bisis software system, reviews the literature concerning tools for cataloging, and analyzes existing approaches in developing user interfaces for editing marc records. the results of the research are presented in the final section, which describes the functionality and technical characteristics of the eclipse marc editor. the rich client platform (rcp) version of the editor, which can be used independently of eclipse, is also presented. motivation the motivation for this paper was to provide an improved user interface for cataloging by the marc standard that will lead to more efficient and comfortable work for catalogers.
bojana dimić surla (bdimic@uns.ns.ac.yu) is an associate professor, university of novi sad, serbia. there are two basic approaches in developing user interfaces for marc cataloging. the first approach uses a classic screen form made of text fields and labels with descriptions of the bibliographic data, without any indication of the marc structure. the second approach is direct editing of a record that is shown on the screen. those two approaches will be discussed in detail in "existing approaches in developing user interfaces for editing marc records" below. the current editor in the bisis system is a mixture of these two approaches—it supports direct editing, but data input is done via a text field that opens on double click.1 the idea presented in this paper is to create an editor that overcomes the drawbacks of previous solutions. the approach taken in creating the editor was direct record editing with real-time validation and no additional dialogs. software requirements for marc cataloging the user interface for marc cataloging needs to support the following functions:
• creating marc records that satisfy the constraints proposed by the bibliographic format
• selecting codes for field tags, subfield names, and values of coded elements, such as character positions in the leader and control fields, indicators, and subfield content
• validating entered data
• access to data about the marc format (a "user manual" for marc cataloging)
• exporting and importing created records
• providing various previews of the record, such as catalog cards
background marc 21 as was previously mentioned, the eclipse editor uses the marc 21 variant. marc 21 consists of five formats: bibliographic data, authority data, holdings data, classification data, and community information.2 marc 21 records consist of three parts: the record leader, a set of control fields, and a set of data fields. the record leader content, which follows the ldr label, includes the logical length of the record (first five characters) and the code for record status (sixth character). after the record leader, there are control fields. every control field is written in a new line and consists of a three-character numeric tag and the content of the control field. the content of a control field can be a single datum or a set of fixed-length bibliographic data. control fields are followed by data fields in the record. every line in the record that contains a data field consists of a three-character numeric tag, the values for the first and the second indicator—or the number sign (#) if indicators are not defined for the field—and the list of subfields that belong to the field. detailed analysis of marc 21 shows that there are some constraints on the structure and content of a marc 21 record. constraints on the structure define which fields and subfields can appear more than once in the record (i.e., whether the fields and subfields are repeatable or not), the allowed length of the record elements, and all the elements of the record defined by marc 21. constraints on the record content are defined on the content of the leader, indicators, control fields, and subfields. moreover, some constraints connect several elements in the record (when the content of one element depends on the content of another element in the record). an example of a structural constraint for data field 016 is that the field has a first indicator whereas the second indicator is undefined.
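to make this layout concrete, a small invented record fragment is shown below. it is purely illustrative (not taken from the paper or from any real catalog), and subfield codes are written here with a leading "$":
LDR 00714nam a2200205 a 4500
001 000012345
016 7#$a123456789$2DNLM
100 1#$aexample, author
245 10$aan example title :$bfor illustration only
the leader follows the ldr label, 001 is a control field holding a single datum, and 016, 100, and 245 are data fields whose two indicator values (with # standing in for an undefined indicator) precede their subfields.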
the field 016 can have subfields a, z, 2, and 8, of which z and 8 are repeatable. bisis the results presented in this paper belong to the research on the development of the bisis library information system. this system, which has been in development since 1993, is currently in its fourth version. the editor for cataloging in the current version of bisis was the starting point for the development of the eclipse editor, the subject of this paper.3 apart from the editor for cataloging, the bisis system has a module for circulation and an editor for creating z39.50 queries.4 the indexing and searching of bibliographic records was implemented using the lucene text server.5 as a part of the editor for cataloging, we developed a module for generating various reports and catalog cards from marc records.6 bisis also supports creating an electronic catalog of unimarc records on the web, where the input of bibliographic data can be done without knowing unimarc; the entered data are mapped to unimarc and stored in the bisis database.7 the recent research within the bisis project relates to its extension for managing research results at the university of novi sad. for that purpose, we developed a current research information system (cris) on the recommendation of the nonprofit organization eurocris.8 the paper "cerif compatible data model based on marc 21 format" proposes a data model, compatible with the common european research information format (cerif), that is based on marc 21. in this model, the part of the cerif data model that relates to research results is mapped to marc 21. furthermore, on the basis of this model, research management at the university of novi sad was developed.9 the paper "cerif data model extension for evaluation and quantitative expression of scientific research results" explains the extension of cerif for evaluation of published scientific research. the extension is based on the semantic layer of cerif, which enables classification of entities and their relationships by different classification schemas.10 the current version of the bisis system is based on a variant of the unimarc format. the development of the next version of bisis, which will be based on marc 21, is in progress. the first task was migrating existing unimarc records.11 the second task is developing the editor for marc 21 records, which is the subject of this paper. cataloging tools an editor for cataloging is a standard part of a cataloger's workstation and the subject of numerous studies. lange describes the development of cataloging from handwritten catalog cards, to typewriters (first manual, then electronic), to the appearance of marc records and pc-based cataloger's workstations.12 leroy and thomas discuss the influence of web development on cataloging. they stress that the availability of information on the web, as well as the possibility of having several applications open at the same time in different windows, greatly influences the process of creating bibliographic records. their paper also indicates that there are some problems that result from using large numbers of resources from the web, such as errors that arise from copy-paste methods. consequently, there is a need for an automatic spelling check and the possibility of a detailed review by the cataloger during editing.13 khurshid deals with general principles of the cataloger's workstation, its configuration, and its influence on a cataloger's productivity.
in addition to efficient access to remote and local electronic resources, khurshid includes record transfer through a network and sophisticated record editing as important functions of a cataloger's workstation. furthermore, khurshid notes that it is possible to improve cataloging efficiency in a windows-based cataloger's workstation by finding bibliographic records in other institutions and cutting and pasting lengthy parts of the record (such as summary notes) into one's own catalog.14 existing approaches in developing user interfaces for editing marc records the basic source for this analysis of existing user interfaces for editing marc records was the official marc standards site of the library of congress, in addition to scientific journals and conferences. the analysis of existing systems shows that there are two basic approaches to the implementation of editing marc records:15
• entering bibliographic data in classic screen forms made of text fields and labels, which does not require knowledge of the marc format (concourse,16 koha,17 j-marc18)
• direct editing of a marc record shown on the screen (marcedit,19 isismarc,20 catalis,21 polaris,22 marcmaker and marcbreaker,23 exlibris voyager24).
both of these approaches have advantages and disadvantages. the drawback of the first approach is that it provides a limited set of bibliographic data to edit, and extending that set implies changes to the application or, in the best case, changes in configuration. another problem is that there are usually a lot of text fields, text areas, combo boxes, and labels on the screen that need to be organized into several tabs or additional windows. this usually makes it difficult for users to see errors or to connect different parts of the record when checking their work. moreover, all of the solutions found in the first group perform little validation of the data entered by the user.25 one important advantage of the first approach is that the application can be used by a user who is not familiar with the standard, so the need for access to data about the marc format can be avoided (one of the functions listed above). as for the second approach, editing a marc record directly on the screen overcomes the problem of extending the set of bibliographic data to enter. it also enables users to scan entered data and check the whole record, which appears on the screen. users can also copy and paste parts of records from other resources into the editor. however, the majority of those applications are actually editors for marc files that are later uploaded into a database or transformed into some other format (marcedit, marcmaker and marcbreaker, polaris), and they usually support little or no data validation.26 they allow users to write anything (i.e., the record structure is not controlled by the program) and only validate at the end of the process, when uploading or transforming the record. among those editors there are some, such as catalis and isismarc, that present the marc record as a table. they support control of the structure, but the record presented in this way is usually too big to fit on the screen, so it is separated into several tabs. an important function of editing marc records is selecting codes for coded elements, which can be character positions in the leader or a control field, the value of an indicator, or the value of a subfield. there are also field tags and subfield codes that sometimes need to be selected for addition to a record.
all of the analyzed editors provide additional dialogs for picking these codes, which requires the user to constantly open and close dialogs and can sometimes be annoying. one important fact about editors in the second group is that they can be used only by a user who is familiar with marc, so access to the large set of marc element descriptions can make the job easier. some of the mentioned systems provide descriptions of the fields and subfields (e.g., isismarc), but most of them do not. findings the editor for marc records was developed as a plug-in for eclipse; therefore it is similar to eclipse's java code editors. as the editor is written in java, it is platform-independent. the main part of this editor was created using the oaw xtext framework for developing textual domain-specific languages.27 it was created using model-driven software development, by specifying the model of the marc record in the form of an xtext grammar and generating the editor. all main characteristics of the editor were generated on the basis of the specification of constraints and extensions of the xtext grammar—therefore all changes to the editor can be realized by changing the specification. moreover, this editor can be easily adjusted for any database by using the concept of extensions and extension points in the eclipse plug-in architecture. we make this application independent of eclipse by using rich client platform (rcp) technology. this editor is implemented for the marc 21 bibliographic and holdings formats. user interface figure 1 shows the editor opened within eclipse. the main area is marked with "1"—it shows the marc 21 file that is being edited. that file contains one marc 21 bibliographic record. the field tags and subfield codes are highlighted in the editor, which contributes to presentation clarity. the area marked with "2" serves for listing the errors in the record, that is, nonvalid elements entered in the record. the area marked with "3" shows data about marc 21 in a tree form. this part of the screen has two other possible views: a marc 21 holdings format tree and a navigator, which is the standard eclipse view for browsing resources of the opened project. the actions for creating a record are available in the cataloging menu and on the cataloging toolbar, which is marked with "4." these are actions for previewing the catalog card, creating a new bibliographic record, loading a record from a database (importing the record), uploading a record to a database (exporting the record), and creating a holdings record for the bibliographic record. figure 1. eclipse editor for marc records in the eclipse editor for marc, selecting codes is enabled without opening additional dialogs or windows (figure 2). that is a standard eclipse mechanism for code completion: typing ctrl + space opens a dropdown list with all possible values for the cursor's current position. figure 2. selecting codes record validation is done in real time, and every violation is shown while editing (figure 3). figure 3 depicts two errors in the record: one is a wrong value in the second character position of control field 008, and the other is that two 100 fields were entered, although this field cannot be repeated in a record. figure 3. validation errors
rcp application of the cataloging editor as shown above, the editor is available as an eclipse plug-in, which raises the question of what a cataloger will do with all the other functions of the eclipse integrated development environment (ide). as seen in figures 1 and 3, there are a lot of additional toolbars and menus that are not related to cataloging. the answer lies in rcp technology, which generates independent software applications on the basis of a set of eclipse plug-ins.28 the main window of an rcp application with additional actions is shown in figure 4. besides the cataloging menu that is shown, the window also contains the file menu, which includes save and save as actions, as well as the edit menu, which includes undo and redo actions. all of these actions are also available via the toolbar. figure 4. rcp application conclusion the goal of this paper was to review current user interfaces for editing marc records. we presented two basic approaches in this field and analyzed the advantages and disadvantages of each. we then presented the eclipse marc editor, which is part of the bisis library software system. the idea behind the editor is inputting structured marc data in a form similar to programming-language editors. the author did not find this approach in the accessible literature. the rcp application of the presented editor will find its practical application in future versions of the bisis system. it represents an upgrade of the existing editor and a starting point for forming the version of the bisis system that will be based on marc 21. the acquired results can also be used for the input of other data into the bisis system, including data from the cris system used at the university of novi sad. this paper shows that eclipse plug-in technology can be used for creating end-user applications. the development of applications with the plug-in technology enables the use of a big library of existing components from the eclipse user interface, whereby writing source code is avoided. additionally, the plug-in technology enables the development of extendible applications by using the concept of the extension point. in this way, we can create software components that can be used by a great number of different information systems. by using the concept of "extension point," the editor can be extended with the functions that are specific for a data store. an extension point was created for export and import of marc records, which means the marc editor plug-in can be used with any database management system by extending this extension point. future work in the development of the eclipse marc editor is to implement support for additional marc formats: for authority and classification data, and for community information. these formats have the same record structure but different constraints on the content and different sets of fields and subfields, as well as different codes for character positions and subfields. therefore the appearance of the editor will remain the same; the only difference will be the specification of the constraints and codes for code completion. another interesting topic for discussion is the implementation of other modules of library information systems in eclipse plug-in technology.
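as a rough sketch of how the data-store extension point mentioned above could be consumed (a minimal illustration only: the extension point id, interface, and method names below are invented for this example and are not the actual bisis or editor code), a contributing plug-in could register an implementation class that the editor looks up through the eclipse extension registry:
import org.eclipse.core.runtime.CoreException;
import org.eclipse.core.runtime.IConfigurationElement;
import org.eclipse.core.runtime.Platform;

// illustrative consumer of a hypothetical "record store" extension point;
// all identifiers are invented for this sketch.
public class RecordStoreLocator {

    // contract a contributing data-store plug-in would implement
    public interface RecordStore {
        String importRecord(String recordId);  // load a marc record for editing
        void exportRecord(String marcText);    // save an edited record to the store
    }

    // hypothetical id of the extension point declared by the editor plug-in
    private static final String EXTENSION_POINT_ID = "org.example.marceditor.recordStore";

    // return the first data store contributed by any installed plug-in, or null
    public static RecordStore findRecordStore() throws CoreException {
        IConfigurationElement[] contributions = Platform.getExtensionRegistry()
                .getConfigurationElementsFor(EXTENSION_POINT_ID);
        for (IConfigurationElement contribution : contributions) {
            // "class" is the plugin.xml attribute naming the implementation class
            return (RecordStore) contribution.createExecutableExtension("class");
        }
        return null;
    }
}
in such a scheme, a data-store plug-in would declare the extension in its plugin.xml and supply the implementation class, so the editor itself never needs to know which database management system sits behind the store.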
references
1. bojana dimić and dušan surla, "xml editor for unimarc and marc 21 cataloging," electronic library 27 (2009): 509–28; bojana dimić, branko milosavljević, and dušan surla, "xml schema for unimarc and marc 21 formats," electronic library 28 (2010): 245–62.
2. library of congress, "marc standards," http://www.loc.gov/marc (accessed february 19, 2011).
3. dimić and surla, "xml editor"; dimić, milosavljević, and surla, "xml schema."
4. danijela tešendić, branko milosavljević, and dušan surla, "a library circulation system for city and special libraries," electronic library 27 (2009): 162–68; branko milosavljević and danijela tešendić, "software architecture of distributed client/server library circulation," electronic library 28 (2010): 286–99; danijela boberić and dušan surla, "xml editor for search and retrieval of bibliographic records in the z39.50 standard," electronic library 27 (2009): 474–95.
5. branko milosavljević, danijela boberić, and dušan surla, "retrieval of bibliographic records using apache lucene," electronic library 28 (2010): 525–36.
6. jelena rađenović, branko milosavljević, and dušan surla, "modelling and implementation of catalogue cards using freemarker," program: electronic library and information systems 43 (2009): 63–76.
7. katarina belić and dušan surla, "model of user friendly system for library cataloging," comsis 5 (2008): 61–85; katarina belić and dušan surla, "user-friendly web application for bibliographic material processing," electronic library 26 (2008): 400–410; eurocris homepage, www.eurocris.org (accessed february 21, 2011).
8. dragan ivanović, dušan surla, and zora konjović, "cerif compatible data model based on marc 21 format," electronic library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1906945.
9. eurocris, "common european research information format," http://www.eurocris.org/index.php?page=cerifreleasesandt=1 (accessed february 21, 2011); dragan ivanović et al., "a cerif-compatible research management system based on the marc 21 format," program: electronic library and information systems 44 (2010): 229–51.
10. gordana milosavljević et al., "automated construction of the user interface for a cerif-compliant research management system," electronic library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1954429; dragan ivanović, dušan surla, and miloš racković, "a cerif data model extension for evaluation and quantitative expression of scientific research results," scientometrics 86 (2010): 155–72.
11. gordana rudić and dušan surla, "conversion of bibliographic records to marc 21 format," electronic library 27 (2009): 950–67.
12. holley r. lange, "catalogers and workstations: a retrospective and future view," cataloging & classification quarterly 16 (1993): 39–52.
13. sarah yoder leroy and suzanne leffard thomas, "impact of web access on cataloging," cataloging & classification quarterly 38 (2004): 7–16.
14. zahiruddin khurshid, "the cataloger's workstation in the electronic library environment," electronic library 19 (2001): 78–83.
15. library of congress, "marc standards," http://www.loc.gov/marc (accessed february 19, 2011).
16. book systems, "concourse software product," http://www.booksys.com/v2/products/concourse (accessed february 19, 2011).
17. koha library software community homepage, http://koha-community.org (accessed february 19, 2011).
18. wendy osborn et al., "a cross-platform solution for bibliographic record manipulation in digital libraries" (paper presented at the sixth iasted international conference on communications, internet and information technology, july 2–4, 2007, banff, alberta, canada).
19. terry reese, "marcedit—your complete free marc editing utility," http://people.oregonstate.edu/~reeset.marcedit/html/index.php (accessed february 19, 2011).
20. united nations educational, scientific and cultural organization, "isismarc," http://portal.unesco.org/ci/en/ev.php-url_id=11041&url_do=do_topic&url_section=201.html (accessed february 19, 2011).
21. fernando j. gómez, "catalis," http://inmabb.criba.edu.ar/catalis (accessed february 19, 2011).
22. polaris library systems homepage, http://www.gisinfosystems.com (accessed february 19, 2011).
23. library of congress, "marcmaker and marcbreaker user's manual," http://www.loc.gov/marc/makrbrkr.html (accessed february 19, 2011).
24. exlibris, "exlibris voyager," http://www.exlibrisgroup.com/category/voyager (accessed february 19, 2011).
25. book systems, "concourse software product."
26. bonnie parks, "an interview with terry reese," serials review 31 (2005): 303–8.
27. eclipse.org, "xtext," http://www.eclipse.org/xtext (accessed february 19, 2011).
28. the eclipse foundation, "rich client platform," http://wiki.eclipse.org/index.php/rich_client_platform (accessed february 19, 2011).
tutorial
cheri smith, anastasia guimaraes, mandy havert, and tatiana h. prokrym
missing items: automating the replacement workflow process
academic libraries handle missing items in a variety of ways. the hesburgh libraries of the university of notre dame recently revamped their system for replacing or withdrawing missing items. this article describes the new process that uses a customized database to facilitate efficient and effective communication, tracking, and selector decision making for large numbers of missing items. though missing books are a ubiquitous problem affecting multiple aspects of library services and workflows, policies and procedures for handling them have not generated a great deal of buzz in library literature.
for the purpose of this article, missing books (and other collection items), refers to items that were not returned from circulation or have otherwise gone missing from the collection and cannot be located. significant staff time may be invested in the missing-book process by departments such as collection development, circulation, acquisitions, database management, systems, and public services. more importantly, user experiences can be negatively affected when missing books are not handled efficiently and effectively. while most libraries have procedures for replacing or suppressing catalog records for items that are missing from the stacks or have been checked out and never returned, few have made these procedures public. this article describes the procedure developed by the hesburgh libraries of the university of notre dame to replace missing items or to withdraw them from the catalog. hesburgh libraries’ procedure offers streamlined, paperless routing of records for missing materials, accounts for “nondecisions” by subject librarians, and results in a shortened turnaround time for acquisitions and catalogmaintenance workflows. hesburgh libraries’ experience in 2005, hesburgh libraries recognized its need to develop a streamlined method of processing missing items. because of personnel changes and competing demands on staff time, the routine handling of missing materials had been suspended for roughly five years. during this period, circulation staff continued to perform searches. when staff declared an item officially missing, the item’s catalog record was updated to the item process status “missing” (mi) and paper records were routed to the collection development department office, but no further action was taken. the mounting backlog of missing items in the catalog became a recurring source of frustration to patrons and public-services employees alike. searches for books that were popular among undergraduates often led to items with a “missing” status. to compound the problem, budgetary constraints resulted in the suspension of spending from the fund earmarked for the replacement of missing items. subject librarians were forced to use their own discipline-specific funds to replace items in their areas, but because there was no systematic means of notifying subject librarians of missing items, they replaced items very rarely and on a case-by-case basis—primarily when faculty or graduate students asked a selector to purchase a replacement for an item critical to their teaching or research. also in 2005, a library-wide fund to replace materials was made available. unfortunately, by that time, the tremendous backlog of catalog records for missing items rendered the existing paper-based system unworkable. as a result, a small task force was formed to manage the backlog and to develop a new method for handling future missing items. hesburgh libraries’ solution the missing items task force was initially composed of eight members representing all departments affected by changes in the procedures for handling missing books. the task force was chaired by the subject librarian for psychology and education. other members represented the circulation, collection development, cataloging, catalog and database maintenance (cadm), monograph acquisitions, and systems departments. during the initial meeting, each member described their portion of the workflow and communicated their requirements for effectively completing their parts of the process. 
because most items with the status “missing” were ones that a patron or patrons had either recently used or requested and could therefore be considered relatively high-use material, the task force quickly determined that the search time for missing books should be shortened from one year to six months. task force members from monograph acquisitions were particularly interested in making this change because newer books are more easily replaced if requests were made cheri smith (cheryl.s.smith.454@nd.edu) is coordinator for instructional services, anastasia guimaraes (aguimara @nd.edu) is supervisor of catalog and database maintenance, mandy havert (mhavert@nd.edu) is head of the monograph acquisitions department, and tatiana h. prokrym (tprokrym@ nd.edu) is senior technical consultant at hesburgh libraries of the university of notre dame, notre dame, indiana. 94 information technology and libraries | june 2009 sooner—many books, especially in the sciences, go out of print quickly and become difficult to replace. the systems task force member supplied a spreadsheet containing the roughly three thousand missing items. this initial spreadsheet included all fields that might be useful for staff in monograph acquisitions, cataloging, cadm, and collection development. various strategies for disseminating the spreadsheet to subject librarians were discussed, but all ideas for how the subject librarians might interact with the spreadsheet seemed laborious and inevitably required that someone sort through each item on the list to determine whether the records needed to be sent to monograph acquisitions or cadm for further processing. the process seemed feasible for a onetime effort, but the task force did not see it as a suitable permanent solution. the task force then considered the feasibility of developing a customized database to manage all of the information necessary for library employees—primarily subject librarians and monograph acquisitions and cadm staff—to participate in the processing of missing books. the database once the task force determined that a database would serve hesburgh libraries’ needs more efficiently than a spreadsheetor paper-based system, the task force enlisted the help of an applications developer. hesburgh libraries had previously created a database for handling journal cancellations, and the task force decided to base the replacement application upon this model. the application is therefore written in php and uses a mysql database. the first step in designing the database was to determine which bibliographic metadata (such as call number, isbn, issn, imprint, etc.) would be required by subject librarians to specify replacement or withdrawal decisions, including whether the item was to be replaced with the same edition, any edition, or the newest available edition. because replacement funds may not always be available, the task force wanted to enable the selector to identify other funds to use for the replacement purchase. finally, the task force felt that, no matter how easy the system was to use, there would always be a few subject librarians who choose not to use it. it was therefore important that the database could also account for “nondecisions” from subject librarians. other general database requirements included that it be available through any web browser and accessible to only those people who are part of the replacement-book process. 
with those requirements in mind, the task force created a list of metadata elements to be included in the database (see table 1). on a quarterly basis, the application pulls the database fields—title, author, call number, sub library, imprint, isbn or issn, barcode, previous fund, local cost, description, item status, update date, bib system number, and system number—from hesburgh libraries' ils (aleph v18) and imports them into the replacements database. for each item, bibliographic, circulation, and acquisitions information is retrieved from aleph and combined to generate the export data file. procedurally, a list of all items with an item process status of "missing" is first retrieved into a temporary table from the item record (z30) table. this temporary table consists of the system number, status field, sublibrary, collection, barcode, description, and the last date the item was modified (z30-update-date in aleph). a second temporary table is then created that includes the purchase price and fund code originally used to purchase the item. the two temporary tables are joined and their information merged, creating a single list of missing items and related acquisitions information. this list is then linked to the bibliographic tables to obtain key bibliographic information such as title, author, imprint, isbn or issn, the ils bibliographic number, and the barcode. these combined results are converted into an ascii text file for import into the mysql replacements database. upon the import of the ascii file, an e-mail is sent to the collection development e-mail list, informing subject librarians that data has been loaded and is ready for their review and input. table 2 lists the purpose of each of the nine tables within the replacements database. figure 1 illustrates the relationships and linking fields between the tables.
table 1. fields for the replacements database (database field: data type)
title: varchar(200)
author: varchar(150)
call number: varchar(30)
sub library: varchar(12)
imprint: varchar(150)
isbn or issn: varchar(150)
barcode: varchar(30)
previous fund: varchar(20)
local cost: decimal(10,2)
description: varchar(50)
item status: char(2)
update date: date
bib system number: int(9) unsigned zerofill
system number: varchar(50)
new database fields:
action to take: tinyint(1)
new fund code: int(10)
modified date: date
modified by: varchar(50)
notes: longtext
system-used fields:
transfer date: date
record id: int(10) (auto)
the database provides two "pick lists" for subject librarians. the first pick list is the action to take field. primary choices are "any edition," "newest edition only," "micro format only," and "do not replace." the second pick list is the new fund field. the default choice for this field is hesburgh libraries' replacement fund code, although any acquisitions funds may be selected. both pick lists provide data integrity and assurance that all input from the subject librarians is standardized. two internal fields, record id and transfer date, facilitate programming and identification. these fields are very important for auditing and tracking replacement records through the replacement process. rollbacks are easily handled through the manipulation of these two fields. programmatic process for the initial implementation of this application, the task force decided that batch loads would be performed on an as-needed basis. after the initial phase of the project, the task force implemented a quarter-based schedule.
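the extraction just described can be pictured with a short sql sketch. this is only an illustration of the logic: apart from z30 and the "missing" process status, which the article names, the table and column names below (acq_orders, bib_records, and the rest) are simplified placeholders rather than the real aleph schema, and the libraries' actual export is produced by their own scripts.
-- step 1: missing items pulled from the item record (z30) table
create temporary table tmp_missing as
  select z30_rec_key     as system_number,
         z30_sub_library as sub_library,
         z30_collection  as collection,
         z30_barcode     as barcode,
         z30_description as description,
         z30_update_date as update_date
  from z30
  where z30_item_process_status = 'MI';

-- step 2: purchase price and fund code originally used for each item
create temporary table tmp_acq as
  select system_number, purchase_price, fund_code
  from acq_orders;

-- step 3: merge the two lists and attach key bibliographic data;
-- the result corresponds to the single merged list written out as the ascii file
select m.*, a.purchase_price, a.fund_code,
       b.title, b.author, b.imprint, b.isbn_issn, b.bib_system_number
from tmp_missing m
join tmp_acq     a on a.system_number = m.system_number
join bib_records b on b.system_number = m.system_number;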
for each data load, the exported records are written to a text file, which is then imported into the replacements database through an import script. the import script archives the previous group of processed records, appending them to a set of historical tables stored within the database. the import script further processes the aleph data by eliminating duplicate records and ensuring there is only one record per barcode and system number. the historical tables are checked to see if a missing item has already been loaded into the database and processed. if a record has already been processed, it is automatically deleted from the newly imported item list. after the successful completion of the data load, an e-mail is automatically generated notifying subject librarians that the replacements database is ready for their review and input. the verified missing-item records are then transferred to the main database table, tblreplacements, and are ready for updating. included in the e-mail to subject librarians is a link that directs them to a search window allowing them to take action on the missing items (see figure 2). once the subject librarians update the records, the application provides a mechanism to distribute missing-book records to the appropriate departments for further processing. a collection development staff member runs a series of reports, each one creating a microsoft excel spreadsheet. the first report lists missing-book records marked for replacement and is sent to monograph acquisitions for processing. missing books that have been marked "do not replace," or that have had no action taken on them after a certain time period, are exported to a separate excel spreadsheet that is sent to cadm for suppression or removal of cataloging records. for each report that is run, the application generates an e-mail message notifying all necessary departments that there is information to be processed. a list of processed records is available for viewing and distribution to cadm and acquisitions, as illustrated in figure 3. the application also provides customized manipulation of the data records that are exported to each of the departments. this customization pulls together only the specific fields of interest to each department, such that each export template is unique to each department's needs. at the end of each replacement cycle, the application automatically creates backups and archives the missing-book records.
table 2. tables and their purposes within the database (table: description)
alephdump: stores imported aleph data before processing.
tbltempreplacements: stores aleph data from the alephdump table. this data is processed and sent through verification and truncation programs.
tblreplacements: post-processed aleph records. primary table for all activities, actions, and fund codes selected by the subject librarians.
tblactions: a reference list of valid actions that can be taken by the subject librarians.
tblfunds: a reference list of valid fund codes; originally imported from aleph.
tblacqrecords: temporary table that stores processed records that should be sent to monograph acquisitions.
tblcadmrecords: temporary table that stores processed records that should be sent to cadm.
tblcadmnullrecord: temporary table that stores records where no action has been taken by a subject librarian.
historytblreplacements: an archiving table.
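the de-duplication and history check performed by the import script could be expressed, very roughly, as two sql statements against the staging and history tables. the actual script is php and is not reproduced in the article, and the column names used here (barcode, system_number, record_id) are simplified guesses at the staging-table layout rather than the real schema.
-- keep only one staging row per barcode and system number
-- (record_id is assumed here to be the auto-increment key of the staging table)
delete t1
from tbltempreplacements t1
join tbltempreplacements t2
  on  t2.barcode       = t1.barcode
  and t2.system_number = t1.system_number
  and t2.record_id     < t1.record_id;

-- drop items that were already loaded and processed in an earlier cycle
delete t
from tbltempreplacements t
join historytblreplacements h
  on  h.barcode       = t.barcode
  and h.system_number = t.system_number;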
96 information technology and libraries | june 2009 subject librarian workflow when subject librarians receive a message indicating a new replacement list is ready for review, their job is surprisingly simple. after entering their network id and password to gain access to the database, they can select how they wish to view the list of missing books—by selected call number ranges, by the budget code with which the books were originally purchased, or by system number (the last two options are rarely used). subject librarians can also view items that have already been processed, and they are able to sort this list by subject librarian, action taken, new budget code, or call number. figure 1. relationship diagram for the nine database tables that were created for this application. the aleph system number is used as the primary linking field for most of the tables. missing items: automating the replacement workflow process | smith et al. 97 initially, subject librarians encounter a list of brief records for each item in the database. the brief records include system numbers, titles, authors, volume numbers (if applicable), call numbers, sublibraries, and isbns or issns. if a record has already been reviewed by a subject librarian, the list will include actions taken and the names of the subject librarians who took the action. to take action on an item, subject librarians select the system number, displaying the full record (see figure 4), and may then choose to replace the book with the same edition, any edition, the newest edition available, or a microform version. by using a drop-down menu, the selector can elect to pay for the replacement with replacement funds or with their own subject funds. subject librarians who choose to replace books with their own funds are rewarded at the end of the quarter when their replacement requests appear at the top of the queue for processing by monograph acquisitions. additional functionality includes the ability to directly link to and browse opac records for items in the database. replacement funds cannot be used for second copies of books, so quick access to opac records is often useful. it also facilitates determining if the library owns other editions of the item before taking action. a notes field allows subject librarians to communicate special instructions for monograph acquisitions or cadm, and records can be e-mailed to other librarians for additional input with just a few clicks. subject librarians are able to return to the database at any time during a given quarter to continue making decisions on their missing books and make any adjustments to prior decisions as necessary. if a subject librarian takes no action on an item by the end of the quarter, it is assumed that it is not to be replaced, and these untouched items are sent to cadm for removal or suppression. figure 2. replacements application search window figure 3. processed book records ready to be sent to monograph acquisitions and cadm. notification and data transmission to these units are achieved through the send buttons on this webpage. 98 information technology and libraries | june 2009 monograph acquisitions workflow once the quarterly database processing completes, a comma-separated file is delivered to the shared monograph acquisitions e-mail address. monograph acquisitions staff format, sort, and begin searching the spreadsheet, giving priority to the orders designated for replacement by subject librarian funds over those funded from the library replacement fund. 
staff members routinely search the library catalog for duplicate titles or review orders in process for the same title prior to searching with our library materials vendors. staff members ensure that replacement funds are not used to purchase second copies. material that is not available for purchase is referred by monograph acquisitions to the subject librarian for direction. sometimes the materials may be kept on order with a vendor to continue searching for outof-print or aftermarket availability. other times it is necessary for staff to cancel the order and remove the record from the system completely. likewise, the missing edition may have been subsumed by a newer, revised edition. subject librarians are contacted by search and order staff in the monograph acquisitions department regarding availability of different editions when they did not specify that any edition would be acceptable. when the monograph acquisitions department places a replacement-copy order, the search-and-order unit adds an ils library note field code designating the item is a replacement (rplc), the bibliographic system number of the item being replaced, and any typical order notes such as the initials of the staff member placing the order. the rplc code alerts the receipt unit to route new items to the cataloging supervisor, who then reviews and directs the items to either cataloging or cadm for processing. catalog and database maintenance (cadm) workflow cadm is usually the last unit to edit records in the missing books workflow. the unit receives two reports from the database: a “do not replace” list and a “no action taken” list. both reports get the same treatment: all catalog records for titles listed are removed from the catalog. removal of catalog records is accomplished either by suppression/ deletion of the bibliographic records or complete deletion of all records (item, holdings, bibliographic, and administrative) from the server. for titles that have order or subscription records attached to bibliographic records, a suppression/deletion procedure allows the record to be suppressed from patrons’ view while preserving the title’s order and payment history for internal staff use. records are completely deleted when no such information exists (e.g., a gift copy or an older record that has no such data attached). because it takes a long time to review each newly loaded batch from the catalog into the database, some records that come to cadm for deletion no longer need to be deleted if missing books are found and returned to the shelves. it is very important for staff working on the cleanup of records to check the item process status and not delete any items that have been cleared of the “missing” status. fortunately, aleph allows staff to look up an item’s history and view prior changes made to the record. this item history feature eliminates unnecessary shelf checks for items appearing on cadm reports that are no longer listed as “missing” in the catalog. occasionally, cadm receives requests to delete records directly from monograph acquisitions and cataloging staff because of a revised selector decision. this often occurs when a replacement item is only available in a different edition from figure 4. full record for a missing book in the replacement database missing items: automating the replacement workflow process | smith et al. 99 the one originally sought, or when an item is ultimately unable to be replaced because it has gone out of print or a vendor backs out of a purchase agreement. 
when a different edition is received to replace a missing item, the replacement copy is sent by the receipt unit in monograph acquisitions to cataloging for copy or original cataloging, and cadm is alerted by either monograph acquisitions or cataloging staff if the record for the missing item needs to be deleted. because monograph acquisitions often orders the replacement on its own record with appropriate bibliographic information (we keep the original record just in case the missing piece is found while we wait for replacement), the record for the missing book does not come to cadm on either of the two reports. perhaps in a library with a different makeup of technical services the process would be more streamlined, but because hesburgh libraries has separate cataloging and database maintenance units, we have created such partnerships to make sure nothing falls through the cracks. so far it has worked well, and every party in the process knows and carries out their responsibilities. issues while the initial implementation successfully brought a large backlog of missing records into the database, subsequent loads included duplicate records of some items processed in earlier batches. this duplication occurred, for example, if an item was identified for replacement in a prior database review cycle, but a replacement request had not yet been processed by monograph acquisitions staff. because such an item is still identified as “missing” in the catalog, it was again included in data loaded from the catalog into the missing-books database, creating confusion for selectors, cadm, and monograph acquisitions. to resolve this problem, the import process was revised to include a search for previously loaded items, deleting them before records are viewed by collection managers. a second issue involved the timing of the data load from the catalog into the replacements database. for various reasons, the data load file was not fully generated for several of the scheduled processing dates. to remedy this problem, the application automatically generates an e-mail confirming a successful data load to the collection development department staff. there is continued debate as to whether the missingitems file should be created on a daily basis, providing the capability for collection development to import new data at one time rather than periodically. results since implementing our new system, hesburgh libraries has processed records for 5,141 missing items. since its creation, twenty-five librarians have consulted the database and twenty-three of thirty subject librarians have used the database to request replacements. of the 5,141 records loaded into the database, 2,537 items (49 percent) have been selected for replacement, and 2,604 items (51 percent) have either been suppressed or deleted from our catalog. replacement funds are renewed on an annual basis and have not yet run out. as a reflection of the collection strengths at hesburgh libraries, most of the missing books (21 percent) fell in the theology/religion call number range. language and literatures was the second most popular collection for missing items (17 percent). other collections with significant numbers of missing books are history (15 percent), social sciences (17 percent), science (12 percent), and philosophy (10 percent). 
conclusion although the process could certainly be further developed and refined, the hesburgh libraries missing books application is an amazing improvement over the extremely outdated paper-based method of dealing with missing library materials. the process works; it is both efficient and effective, and employees who engage in the process have reported satisfaction with it. it has not only allowed hesburgh libraries to catch up on its backlog but, more importantly, to stay current and organized, keeping the catalog more accurate and patrons more satisfied. furthermore, should the libraries opt to do a full inventory in the future, the current system will prove invaluable. the authors are pleased to have the opportunity to share our experiences with interested libraries. feel free to contact any of the authors for further information. reproduced with permission of the copyright owner. further reproduction prohibited without permission. 190 information technology and libraries | december 2011 from static and stale to dynamic and collaborative: the drupal difference editor’s note: this paper is adapted from a presentation given at the 2010 lita forum. i n 2009, the university library of the university of california, santa cruz, moved from a static, dreamweaverand html-created website to an entirely new databasedriven website using the open-source content management system (cms) drupal. this article will describe the interdisciplinary approach the project team took for this large-scale transition process, with a focus on user testing, information architecture planning, user analytics, data gathering, and change management. we examine new approaches implemented for group-authoring of resources and the challenges presented by collaboration and crowdsourcing in an academic environment. we also discuss the impact on librarians and staff changing to this new paradigm of website design and development and the training support provided. we present our process for testing, staging, and publishing new content and describe the modules used to build dynamic subjectand course-guide displays. finally, we provide a list of resources and modules for beginning and intermediate drupal users. why change was needed our old library website was created using static html and its organizational structure evolved to mirror the administrative structure of the library. the vocabulary we used was very library-centric and, though useful to library staff, could be confusing to patrons. like many larger, older websites, we had accumulated a number of redundant and defunct pages. many of these pages had not been updated for years, had inconsistent naming conventions, or outdated page design. the catalyst for updating our web presence was predicated on several things. with more than one million visits per year and more than two million page views, our old servers were no longer able to handle this load, and we were about to begin a major project to replace our server hardware. in addition, we anticipated participating in an upcoming transition to a new campuswide website template. we saw this moment of change as an opportunity to revitalize the library website’s entire structure and reorganize it with a more user-centric approach to the menus and vocabulary. to do this, we decided to move away from dreamweaver and the static html approach to web design and instead choose a cms that would provide a more flexible and innovative interface. 
choosing drupal we had done research on commercial and open-source solutions and were leaning toward drupal as our cms. many academic departments at our campus were going through a similar process of website redesign and had already explored the cms options and had chosen drupal. this helped move us toward choosing drupal and taking advantage of a growing developer community on campus. two of the largest units on campus both chose drupal as their cms and have since been great partners for collaboration and peer support. drupal is a free, open-source cms (or content management framework) written in php with a mysql database backing it up. it is a small application of core modules with thousands of add-on modules available to increase functionality. drupal also has a very strong developer community and has been adopted by a growing number of libraries. we have found it to be very open and fluid, which is both a blessing and curse. for any one problem there can be dozens of differing solutions and modules to resolve it. the transition team the library created a core website implementation team consisting of a librarian project manager/developer, a web designer from the it department, and two librarian developers. the core team was supported by a server administrator and an it analyst. the it staff supported the technical aspects of drupal installation, backup, and maintenance. the librarian developers planned the content migration and managed the user interface design, layout, content, scope, and architecture. they needed to know the basics of how drupal works and needed to have much more access to the inner workings of drupal (e.g., modules, user permissions, etc.) than staff. the librarians also would train library staff, so needed to be able to teach and develop documentation and tailor instruction to specific staff needs. everyone who participated in the implementation team had many other competing responsibilities. the librarian developers had other projects and traditional duties such as collection development and reference services, so learning drupal and creating this new website was a part-time project and had to be integrated into existing workloads. tutorial ann hubble, deborah a. murphy, and susan chesley perry ann hubble (ahubble@ucsc.edu) is science librarian, deborah a. murphy (damurphy@ucsc.edu) is emerging technologies librarian, and susan chesley perry (chesley@ucsc.edu) is head of digital initiatives, university of california, santa cruz. selecting a web content management system for an academic library website | hubble, murphy, and perry 191from static and stale to dynamic and collaborative: the drupal difference | hubble, murphy, and perry 191 and often eccentric organizational structures that were no longer meaningful. the previous website had accumulated pages that were a bit more freewheeling in design with a lack of consistent navigation and “look and feel.” adding another layer of complexity, our website changeover took place during a period of great organizational change and a severe budget crisis. surprisingly, what seemed at first a major drawback was actually somewhat helpful. with fewer people spread thinner and doing more work, there was less need to feel in control of individual empires, leading to more cooperation during the changeover. staff learning styles vary, and no one approach to drupal training will work for everyone, so we brought many of the lessons we have learned in our bibliographic instruction sessions to our staff training. 
for example, we focused training on repetition, reassurance, and patience, ensuring it was an active process with hands-on participation as well as a lecture or demonstration. we provided ample time for questions and invited staff to bring their own projects to work on during training sessions. though some staff only needed to learn a few applications within drupal to perform their jobs, most needed specialized instruction to do some departmental-specific task or action that now had a very different interface. we supplemented our large group by drop-in training sessions with specialized departmental sessions, custom-made documentation, individual hands-on training, e-mail updates on system changes, and regular presentations of new system features. not everyone will become a “born again” drupalista, but everyone should at least feel that they can get their work done using drupal. drupal has also meant changes not only in the way content is added to the website, but also in how we handle revisions and updates. in the past, we had a very siloed initially from the increasing interest in drupal at the campus level. they attended a two-day intensive drupal training course from a company called lullabot, which provided an in-depth technical foundation for our initial drupal installation. this level of technical training and content was not appropriate for the other librarian developers on our team. however, a more detailed, midlevel training would have benefited the librarian developers and moved the project forward at a faster pace. these librarian developers learned using a combination of resources, including free online content that covers core drupal skills, combined with a few carefully chosen professional in-person consultations and online training packages and books. drupal is not a static environment, so after the initial training there was still a need for regular updates and refresitishers. our transition team joined the drupal4lib discussion list and consulted with library colleagues using drupal in the northern california area. drupalcon conferences as well as online users groups were excellent places not only to learn but also to make contact with vendors and other developers. several of these resources are listed in the accompanying bibliography in appendix b. staff training by far our largest group of library drupal users was the fifty-plus library staff content contributors who were faced with learning a new approach to web development. drupal’s successful implementation was ultimately dependent on ensuring library staff would be able to create, edit, and manage thousands of library webpages using this new cms. this was a change for everyone in the library, not just a few. the new website meant leaving behind the comfort of routines created over the years, elaborate designs that had been developed, and various idiosyncratic transition planning with the goal of making our new site user-centered, we wanted to make data-driven decisions about design rather than what had ultimately devolved into the practice of decisions based on politics and committee negotiations. to that end, we took several approaches to gathering user data. we inventoried our current site and gathered usage statistics based on website analytics. we met with small campus focus groups who answered questions about library site searching. we created personas for user categories based on profiles of average users (e.g. first-year students, graduate students, faculty, community users, etc.). 
based on this data, we drafted web interface wireframes and began user testing. drupal implementation also included developing a safe and effective means of moving from a testing environment to a final, public production site. this deployment process is a crucial component of ensuring that we could both test new features and still provide a stable environment for our users. after extensive discussions and revisions we developed a process to experiment with new modules and themes in a way that does not overwrite our existing public site. the deployment process goes from development to staging to production. it is critical to be able to determine that a new module or update will not negatively affect the database. the process we follow from our sandbox site to our production site is described in more detail in appendix a. transition team training we had three types of drupal users within the library: system administrators, developers, and staff (the primary content editors); each group had its own training needs. the library project manager, web designer, and it systems administrators benefited 192 information technology and libraries | december 2011 appropriate for this particular subject. each tab can also be customized to display whichever records pertain to this subject area. how our dynamic displays were built cck (content construction kit) module content used in both the “article databases and research tools” and “subject guides” displays is held within a special record type we created using the content construction kit (cck) module. we called this special record, or content type, online resource. we defined fields within the online resource record to hold information about individual resources we want to either display on our website or keep track of internally. the fields we defined include the resource name, url (sometimes multiple urls), description, type of information (article database, dictionary, encyclopedia, etc.), and subject discipline. figure 3 shows what a portion of the online resource record for a particular database changes, it’s updated in a central record and immediately reflected in displays throughout the site. not only is it less work to update information, but we also can provide resources in more varied combinations and make them more findable for our users. figure 1 shows how the dynamically created “article databases and research tools” list appears to a user browsing our website. the default display lists these resources in alphabetical order. the user can display the same group of records sorted by other criteria just by clicking on the appropriate tab. if the “by subject” tab is selected, the resources are displayed under subject headings. selecting the “by type” tab lists the resources by resource types, such as dictionaries and encyclopedias, citation style guides, etc. our subject guides are also created using the same components used to build the “article databases and research tools” lists. figure 2 shows a portion of one of our subject guides. like the previous example, this portion of the guide is created dynamically, displaying only records permissions environment that limited editing to only those given specific permissions. we now have role-based ownership where everyone can edit everything so that we did not have to keep up a detailed list of who does what. initial concern that someone could write over or accidently delete pages was somewhat remedied by the drupal revisions history feature, which assists with version control. 
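to make the online resource content type described above easier to picture, the sketch below models it as a simple record; this is an illustration in python rather than actual drupal configuration (cck fields are defined through the administrative interface, not in code), and the field names and values shown here are invented.

```python
from dataclasses import dataclass, field

# illustrative only: drupal/cck fields are configured through the administrative
# interface; this dataclass just mirrors the fields we defined for "online resource."
@dataclass
class OnlineResource:
    name: str                          # resource name
    urls: list[str]                    # one or more urls
    description: str                   # brief annotation displayed to users
    type_of_information: str           # e.g., "article database", "dictionary", "encyclopedia"
    subjects: list[str] = field(default_factory=list)  # taxonomy terms, e.g., "biology"

# hypothetical record, not an actual entry from our site
example = OnlineResource(
    name="sample article database",
    urls=["http://example.org/database"],
    description="a placeholder description for illustration.",
    type_of_information="article database",
    subjects=["biology", "environmental studies"],
)
```

because every display on the site is generated from records like this, a change to a url or description is made once in the central record and propagates to every page where the resource appears.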
there have been a few pages where ownership is an issue, and we are still in the process of developing a system to ensure that pages are updated when there is no specific individual linked to a page. dynamic displays: article databases and subject guides as part of the move to drupal, we wanted to take advantage of the new environment to redesign some of the more specialized portions of our site. in particular, we hoped that drupal’s dynamic displays would help us to keep our site more current with less effort. with this in mind, we chose to focus on two of the most heavily used resources: our list of article databases and our library subject guides. we planned to transform these static, high-maintenance html pages into dynamic, easily maintained, and easily generated resources. we used a number of drupal modules to develop the library’s website, and these are described in more detail in appendix c. to redesign our list of article databases and our library subject guides, we relied heavily on three important modules: cck (content construction kit), views, and taxonomy. the interaction of these three modules is key to building dynamically created webpages. once these modules are configured, information is input just once. drupal does the work of pulling the right information from each resource to create dynamic displays. if information, such as a url figure 1. dynamic display: article databases and research tools selecting a web content management system for an academic library website | hubble, murphy, and perry 193from static and stale to dynamic and collaborative: the drupal difference | hubble, murphy, and perry 193 and not programmers. we found that drupal was very different from anything we had used before and had a very steep learning curve. if we could start over, we would have invested much more time in lessons learned learning drupal takes time our implementation team was comprised predominantly of librarians and list of defined fields looks like behind the scenes to the librarian web developer. some fields within the online resource record rely on a little further customization. the “type of information” field is defined via an allowed values list. figure 4 shows a portion of the values list we have defined for this particular field. the “subject, discipline, topic” field (figure 3) incorporates a taxonomy list that we first created using the taxonomy module. this taxonomy vocabulary allows us to later sort the resources dynamically in both the “article databases and research tools” (figure 1) and “subject guides” displays (figure 2). taxonomy module figure 5 shows the list of subject terms we created using the taxonomy module. terms are easily added, edited, and organized via this taxonomy display, available only to the web developers. views module–putting it all together to define how the online resource records are displayed to the user (figure 1), we use the views module. views allow us to define, sort, and filter these records for display. figure 6 shows what the “article databases and research tools” view of figure 1 looks like to the web developer. notice that “a–z,” “subjects,” and “by type” are listed in a box on the left side of the page. each of these tabs corresponds to a tab on the page that displays to the user. in this case, “a–z” is bold and is the active tab currently being defined for this display. 
display settings such as the record type used, number of records to display per page, specific fields to display, type of sorting, and url path for the webpage are defined here. figure 2. dynamic display: dynamic portion of a subject guide figure 3. cck module: online resource record: manage fields 194 information technology and libraries | december 2011 learning drupal basics and getting a better grasp of how drupal works as a cms. the architecture and database-driven paradigm of our new drupal site is a significantly different environment from our previous website’s html-designed pages and directory-and-folder organization. of particular importance for our site were three core modules: cck, views, and taxonomy. becoming proficient with these modules was a challenge, and we can’t emphasize enough the importance of good, basic training on their use. start small: identify small parts to bring over initially, the thought of moving our old website to drupal seemed insurmountable. bringing over static html pages was straightforward, but portions of the website (such as converting our database of online resources) took more intensive planning. the entire process became more manageable when we divided up the site and focused on drupalizing small parts at a time. this way we could focus on learning enough drupal to make these portions of the site work without being overwhelmed. project management software: document & share what you’ve done if we were to transition an entire website again we would recommend using some type of project management software before starting. none of the implementation team worked on this site full time. this project was added to our other full-time workload providing reference services, collection planning, teaching, digital projects, etc. during our project we tried several free products but were not satisfied with any of them. we felt that finding the right project management package could have made the website transition process much figure 4. cck module: allowed values list (type of information) figure 5. taxonomy module: subjects list selecting a web content management system for an academic library website | hubble, murphy, and perry 195from static and stale to dynamic and collaborative: the drupal difference | hubble, murphy, and perry 195 and the library is now in a much better position for future website design transitions, a process that will be much easier with so much less static content to migrate. for example, the look and feel of our entire website can be transformed by reconfiguring a few options within drupal. ultimately, the transition of the library website to a drupal environment was a very good thing, and we are glad we did it. it was difficult and messy at times, but our website is now more flexible, agile, adaptable, and better poised for change. epilogue since this article was submitted, the uc santa cruz university library website has moved to an entirely new campus theme. we note that having a drupal-based cms greatly aided this transition process. personas for librarians and content contributors and done more usability testing for non-developers. we found that training and teaching library staff the architecture and databasedriven paradigm of the new drupal culture has been a challenge and we still have varying levels of buy-in. conclusion we now have a consistent look and feel to our site, though there are still many things yet to do. 
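the interaction of cck, views, and taxonomy described in the preceding sections is easier to see in miniature. the following sketch is our own python paraphrase, not drupal code, and the records in it are invented; it simply mimics what the views configuration does with online resource records to produce the a–z, "by subject," and "by type" tabs.

```python
from collections import defaultdict

# invented records standing in for cck "online resource" nodes
resources = [
    {"name": "sample database a", "type": "article database", "subjects": ["biology"]},
    {"name": "sample encyclopedia b", "type": "encyclopedia", "subjects": ["history", "art"]},
    {"name": "sample dictionary c", "type": "dictionary", "subjects": ["art"]},
]

def a_to_z(records):
    """default tab: every record, alphabetical by name."""
    return sorted(r["name"] for r in records)

def by_subject(records):
    """'by subject' tab: group names under each taxonomy term."""
    grouped = defaultdict(list)
    for r in records:
        for term in r["subjects"]:
            grouped[term].append(r["name"])
    return dict(sorted(grouped.items()))

def by_type(records):
    """'by type' tab: group names by resource type."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["type"]].append(r["name"])
    return dict(sorted(grouped.items()))

print(by_subject(resources))
# {'art': ['sample encyclopedia b', 'sample dictionary c'], 'biology': ['sample database a'], 'history': ['sample encyclopedia b']}
```

in drupal these groupings are configured through the views administrative screens rather than hand-coded, but the principle is the same: the information is entered once and presented in several sorted and filtered arrangements.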
now that we are more comfortable using drupal, we can focus on creating more dynamic content, such as staff lists, adding sidebars to pages, and so on. increasing the number of dynamically created pages will mean a more up-to-date site in general. though group authoring within the library is still a challenge, we continue to find ways to encourage collaboration. easier. documenting and sharing how we created elements of the site helped us replicate complex components and allowed us to collaborate more easily on various projects. test, test, test testing the website as we developed it was a crucial component of our work. modules also can interact with other modules in unpredictable ways, so we ultimately found that loading new modules on our sandbox site, a mirror of the library website, was a crucial step in determining compatibility as well as functionality with our existing site (appendix a). it’s essential to practice using a live site without bringing the real production website down. focus on essential modules: cck, views, taxonomy images, wysiwyg editors drupal comes with a set of core modules plus an ever-increasing number of specialized contributed modules. finding and installing the right contributed module that fits a particular need can sometimes be difficult. there are often myriad modules that can solve a problem. it takes time to find and test each one to see if it will actually function as needed, and not all modules work well with one another. focusing on the essential drupal core modules plus cck, views, and taxonomy will help reduce unnecessary development frustrations. staff are important though we created many personas for faculty, students, and community users, we should have created figure 6. views module: article databases and research tools view 196 information technology and libraries | december 2011 appendix a. website deployment process created by bryn kanar and sue chesley perry selecting a web content management system for an academic library website | hubble, murphy, and perry 197from static and stale to dynamic and collaborative: the drupal difference | hubble, murphy, and perry 197 appendix b. drupal resources for getting started ■■ american library association. “drupal4lib interest group (lita library & information technology association).” http://connect.ala.org/node/71787 (accessed march 18, 2011). ■■ american library association. “showcase: database pages & research guides using drupal.” http://connect.ala .org/node/98546 (accessed march 18, 2011). ■■ austin, andy, and christopher harris. “drupal in libraries.” library technology reports 44, no. 4 (2008). ■■ byron, angela, addison berry, nathan haug, jeff eaton, james walker, and jeff robbins. using drupal: choosing and configuring modules to build dynamic websites. sebastopol, ca: o'reilly, 2008. ■■ drupal. “drupal.org.” http://drupal.org/(accessed march 18, 2011). ■■ drupal dojo. “drupal dojo.” http://drupaldojo.com/ (accessed march 18, 2011). ■■ drupal modules. “search, rate, and review drupal modules.” http://drupalmodules.com/ (accessed march 18, 2011). ■■ “drupalconsf san francisco – april 19-21, 2010.” http://sf2010.drupal.org/conference/sessions (accessed march 18, 2011). ■■ drupalib.”drupalib: a place for library drupalers to hang out.” http://drupalib.interoperating.info/ (accessed march 18, 2011). ■■ gotdrupal.com. “gotdrupal: once you've got it, you're addicted!”. http://gotdrupal.com (accessed march 18, 2011). ■■ groups.drupal. 
“libraries.” http://groups.drupal.org/libraries (accessed march 18, 2011). ■■ groups.drupal. “list of libraries using drupal.” http://groups.drupal.org/libraries/libraries (accessed march 18, 2011). ■■ “is this site built with drupal?”. http://www.isthissitebuiltwithdrupal.com/ (accessed march 18, 2011). ■■ learn by the drop. “learn by the drop: a place to learn drupal.” http://learnbythedrop.com/ (accessed march 18, 2011). ■■ “lullabot.” http://lullabot.com (accessed march 18, 2011). ■■ mastering drupal. “drupal screencasts.” http://www.masteringdrupal.com/videos (accessed march 18, 2011). ■■ slideshare. “drupal resources for libraries, sarah houghton-jan.” http://www.slideshare.net/librarianinblack/ drupal-resources-2982935 (accessed march 18, 2011). ■■ slideshare. “introduction to drupal for libraries, laura solomon.” http://www.slideshare.net/oplin/intro-to -drupal-for-libraries (accessed march 18, 2011). ■■ sunrainproductions. “drupalcampla 2009 views demystified.” http://www.sunrainproductions.com/ drupalcampla/views-demystified (accessed march 18, 2011). appendix c. selected drupal modules used on the ucsc library site ■■ administration menu—adds a top menu bar for authenticated users with common administration tasks ■■ cck—allows you to add new content types, for example the online resources content type for a–z list ■■ ckeditor—wysiwyg editor ■■ google analytics—adds google javascript tracking code to all of our site's pages ■■ google cse—allows us to use google as the site search ■■ imce—image-uploading module, also allows you to create subdirectories within the image directory ■■ image cache—allows you to pre-set sizes for images ■■ ldap integration—links user authentication to the library’s ldap server ■■ mollum—spam filter and image captcha (part of spam control) ■■ nice menus—allows drop-down/right/left expandable menus ■■ nodeblock—allows you to specify a content type as being a block, which content creators to edit the block text and title without having to access the block administration page ■■ pathauto—automatically generates path aliases for various kinds of content (nodes, categories, users) ■■ printer-friendly, e-mail and pdf versions—allows you to configure any type of page to display links for print, e-mail, and pdf ■■ rules—allows site administrators to define conditionally executed actions based on occurring events, we use it to send email when new content is created and to hide some content fields from selected user roles ■■ taxonomy—enables us to assign subjects and other categories to content; the url paths and views use taxonomy ■■ webform—enables quick creation of forms and questionnaires 2 information technology and libraries | september 2008 andrew k. pacepresident’s message andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. w elcome to my first ital column as lita president. i’ve had the good fortune to write a number of columns in the past—in computers in libraries, smart libraries newsletter, and most recently american libraries—and it is a role that i have always cherished. there is just enough space to say what you want, but not all the responsibility of backing it up with facts and figures. in the past, i have worried about having enough to say month after month for an undefined period. now i am daunted by only having one year to address the lita membership and communicate goals and accomplishments of my quickly passing tenure. 
i am simultaneously humbled and extremely excited to start my presidential year with lita. i have some ambitious agenda items for the division. i said when i was running that i wanted to make lita the kind of organization that new librarians and it professionals want to join and that seasoned librarians wanted to be active in. recruitment to lita is vital, but there is also work to be done to make that recruitment even easier. i am fortunate in following up the great work of my predecessors, many of whom i have had the pleasure of serving with on the lita board since 2005. they have set the bar for me and make the coming year as challenging as anything i have done in my career. i also owe a lot to the membership who stepped forward to volunteer for committees, liaison appointments, and other volunteer opportunities. i also think it is important for lita members to know just how much the board relies on the faithful and diligent services of the lita staff. at my vice presidential town meeting, i talked about marketing and communication in terms of list (who), method (how), and message (what and why). not only was this a good way to do some navel gazing on what it means to be a member of lita, it laid some groundwork for the year ahead. i think it is an inescapable conclusion that the lita board needs to take another look at strategic planning (which expires this year). the approach i am going to recommend, however, is not one that tries to connote the collective wisdom of a dozen lita leaders. instead, i hope we can define a methodology by which lita committees, interest groups, and the membership at large are empowered to both do the work of the division and benefit from it. one of the quirky things that some people know about me is that i actually love bureaucracy. i was pleased to read in the lita bylaws that it is actually my duty as president to “see that the bylaws are observed by the officers and members of the board of directors.” i will tell you all that i also interpret this to mean that the president and the board will not act in ways that are not prescribed. the strength of a volunteer organization comes from its volunteers. the best legacy a lita president can provide is to give committees, interest groups, and the membership a free reign to create its future. as for the board, its main objective is to oversee the affairs of the division during the period between meetings. frankly, we’re not so great at this, and it is one of the biggest challenges for any volunteer organization. it is also one of my predecessor’s initiatives that i plan to follow through on with his help as immediate past president. participation and involvement—and the ability to follow the work and strategies of the division—should be easier for all of us. so, if i were to put my platform in a nutshell it would be this—recruitment, communication, strategic planning, and volunteer empowerment. i left out fun, because it goes without saying that most of us are part of lita because it’s a fun division with great members. this is a lot to get done in one year, but because it will be fun, i’m looking forward to it. editorial board thoughts: ital 2.0 | boze 57 litablog.org/) i see that there are occasional posts, but there are rarely comments and little in the way of real discussion. it seems to be oriented toward announcements, so perhaps it’s not a good comparison with italica. 
some ala groups are using wordpress for their blogs, a few with user comments, but mostly without much apparent traffic (for example, the ll&m online blog, http://www .lama.ala.orgllandm). in general, blogs don’t seem to be a satisfactory platform for discussion. wikis aren’t particularly useful in this regard, either, so i think that rules out the lita wiki (http://wikis.ala.org/lita/index.php/ main_page). i’ve looked at ala connect (http://connect. ala.org/), which has a variety of web 2.0 features, so it might be a good home for italica. we could also use a mailing list, either one that already exists, such as lita-l, or a new one. the one advantage e-mail has is that it is delivered to the reader, so one doesn’t have to remember to visit a website. we already have rss feeds for the italica blog, so maybe that works well enough as a notification for those who subscribe to them. i’ve also wondered whether a discussion forum (aka message board) would be useful. i frequent a few software-related forums, and i find them conducive to discussion. they have a degree of flexibility lacking in other platforms. it’s easy for any participant to start up a new topic rather than limiting discussion only to topics posted by the owner, as is usually the case with blogs. frankly i’d like to encourage discussion on topics beyond only the articles published in ital. for example, we used to have columns devoted to book and software reviews. even though they were discontinued, those could still be interesting topics for discussion between ital readers. in writing this, my hope is to get feedback from you, the reader, about what ital and italica could be doing for you. how can we use ala connect in ways that would be useful? could we use other platforms to do things beyond simply discussing articles that appear in the print edition? what social web technologies do you use, and how could we apply them to ital? after you read this, i hope you’ll join us at italica for a discussion. let us know what you think. editor’s note: andy serves on the ital editorial board and as the ital website manager. he earns our gratitude every quarter with his timely and professional work to post the new issue online. t he title of this recurring column is “editorial board thoughts,” so as i sit here in the middle of february, what am i thinking about? as i trudge to work each day through the snow and ice, i think about what a nuisance it is to have a broken foot (i broke the fifth metatarsal of my left foot at the midwinter meeting in boston—not recommended) but most recently i’ve been thinking about ital. the march issue is due to be mailed in a couple of weeks, and i got the digital files a week or so ago. in a few days i’ll have to start separating the pdf into individual articles, and then i’ll start up my web editor to turn the rtf files for each article into nicely formatted html. all of this gets fed into ala’s content management system, where you can view it online by pointing your web browser to http://www.lita.org/ala/mgrps/divs/lita/ ital/italinformation.cfm. in case you didn’t realize it, the full text of each issue of ital is there, going back to early 2004. selected full-text articles are available from earlier issues going back to 2001. the site is in need of a face lift, but we expect to work on that in the near future. starting with the september 2008 issue of ital we launched italica, the ital blog at http://ital-ica .blogspot.com/, as a pilot. 
italica was conceived as a forum for readers, authors, and editors of ital to discuss each issue. for a year and a half we’ve been open for reader feedback, and our authors have been posting to the blog and responding to reader comments. what’s your opinion of italica? is it useful? what could we be doing to enhance its usefulness? in reality we haven’t had a great deal of communication via the blog. we are looking at moving italica from blogger to a platform more integrated with existing ala or lita services. is a blog format the best way to encourage discussion? when i look at the lita blog (http:// andy boze (boze.1@nd.edu) is head, desktop computing and network services, university of notre dame hesburgh libraries, notre dame, indiana. andy bozeeditorial board thoughts: ital 2.0 editorial board thoughts | dehmlow 103 i n the age of the internet, google, and the nearly crushing proliferation of metadata, libraries have been struggling with how to maintain their relevance and survive in the face of shrinking budgets and misinformed questions about whether libraries still provide value. in case there was ever any question, the answer is “of course we do.” still, an evolving environment and changing context has motivated us to rethink what we do and how we do it. our response to the shifting environment has been to envision how libraries can provide the best value to our patrons despite an information ecosystem that duplicates (and to some extent replaces) services that have been a core part of our profession for ages. at the same time, we still have to deal with procedures for managing resources we acquire and license, and many of the systems and processes that have served us so well for so many years are not suitable for today’s environment. many have talked about the need to invest in the distinctive services we provide and unique collections we have (e.g., preserving the world’s knowledge and digitizing our unique holdings) as a means to add value to libraries. there are many other ways libraries create value for our users, and one of the best is for us to respond to needs that are specific to our organizations and users— specialized services, focused collections, contextualized discovery, all integrated into environments in which our patrons work, such as course management systems, google, etc. the library market has responded to many of our needs with ermss and next-generation resource management and discovery solutions. all of this is a good start, but like any solution that is designed to work for the greatest common denominator, they often leave a “desired functionality gap” because no one system can do everything for everyone, no development today can address all of the needs of tomorrow, and very rarely do all of the disparate systems integrate with each other. so where does that leave libraries? well, every problem is an opportunity, and there are two important areas that libraries can invest in to ensure that they progress at the same pace as technology, their users, and the market: open systems that have application programmer interfaces (apis), and programmers. apis are a means to access the data and functionality of our vended or opensource systems using a program as opposed to the default interface. apis often take the shape of xml travelling in the same way that webpages do, accessed via a url, but they also can be as complex as writing code in the same language as the base system, for example software development kits (sdks). 
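as a small illustration of the url-and-xml style of api just described, the sketch below requests a record from a hypothetical catalog endpoint and extracts a few fields. the url and element names are invented for the example; any real system's api documentation defines its own.

```python
import urllib.request
import xml.etree.ElementTree as ET

# hypothetical endpoint; a real catalog or discovery api defines its own url and schema
API_URL = "https://catalog.example.edu/api/record?id=12345&format=xml"

def fetch_record(url):
    """fetch an xml record over http and return a few fields as a dict."""
    with urllib.request.urlopen(url, timeout=10) as response:
        tree = ET.parse(response)
    root = tree.getroot()
    # element names below are illustrative, not from any actual vendor schema
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        "call_number": root.findtext("callNumber"),
    }

if __name__ == "__main__":
    print(fetch_record(API_URL))
```

the point is not the particular fields but that a few lines of script can repurpose system data in ways the default interface never anticipated.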
the key here is that apis provide a way to work with the data in our systems, be they backend inventory or front-end discovery interfaces, in ways that weren’t conceived by the software developers. this flexibility enables organizations to respond more rapidly to changing needs. no matter which side of the opensource/vended solution fence you sit on, openness needs to be a fundamental part of any decision process for any new system (or information service) to avoid being stifled by vendor or open-source developer priorities that don’t necessarily reflect your own. the second opportunity is perhaps the more difficult one given the state of library budgets and that the resources that are needed to hire programmers are higher than most other library staff. but having local programming skills easily accessible will be vital to our ability to address our users’ specific needs and change our internal processes as we need to. i think it is good to have at least one technical person who comes from an industry outside of libraries. they bring knowledge that we don’t necessarily have and fresh perspectives on how we do things. if it is not possible to hire a programmer, i would encourage technology managers to look closely at their existing staff, locate those in the organization who are able to think outside of the box, and provide some time and space for them to grow their skill set. i am not so obtuse as to suggest that anyone can be programmer—like any skill it requires a general aptitude and a fundamental interest—but i am a self-taught developer who had a technical aptitude and an strong desire to learn new things, and i suspect that there are many underutilized staff in libraries that with a little encouragement, mentoring, and some new technical knowledge, could easily work with apis and sdks, thereby opening the door for organizations to be nimble and responsive to both internal and external needs. i recognize that with heavy demands it can be difficult to give up some of these highly valued people’s time, but the payoff is overwhelmingly worth it. these days i can only chuckle at the doomsday predictions about libraries and the death of our services— google’s dominance in the search arena has never really made me worried that libraries would become irrelevant. we have too much that google does not, specifically licensed content that our users desire, and we have relationships with our users that google will be incapable of having. i have confidence that what we have to offer will be valuable to our users for some time to come. however, it will take a willingness to evolve with our environment and to invest in skill sets that come at a premium even when it is difficult to do so. mark dehmlow editorial board thoughts: adding value in the internet age— libraries, openness, and programmers mark dehmlow (mdehmlow@nd.edu) is digital initiatives librarian, university of notre dame, notre dame, indiana. computer based acquisitions system at texas a&i university .. ned c. morris: texas a&i university, kingsville, texas. 1 in september, 1966, a system was initiated at the university which provides for the use of automatically produced multiple orders and for the use of change cards to update order information on previously placed orders already on disk storage. the system is geared to an ibm 1620 central processing unit ( 40k) which has processed a total of 10, 222 order transactions the first year. 
it is believed that the system will lend itself to further development within its existing framework and that it will be capable of handling future work loads. in 1925, the library at texas a&i university (first known as south texas state teachers college and later as texas college of arts and industries) had an opening day collection of some 2,500 volumes. by the end of august, 1965, the library's collection had grown to 142,362 volumes, including 3,597 volumes purchased that year. the book budget doubled in september of 1965, and the acquisitions system was severely taxed as the library added by purchase a total of 6,562 volumes. after one full year under the mechanized system discussed below, a total of 9,062 volumes had been added by purchase. counting gifts, transfers, and cancellations, the computer actually handled 10,222 order transactions the first year. the computer-based acquisitions system now in operation was initiated in september of 1966, eleven months after the decision was made to mechanize the process. the library had already experienced successes in computerizing the circulation and serial systems and, because a rapidly expanding book budget had caused the old traditional type of acquisitions system to become unwieldy and seemingly obsolete, it seemed inevitable that the installation of a computerized acquisitions system would follow. furthermore, it was agreed that acquisitions could make use of the computer at no additional cost, since the library was already paying its share of the machine rental costs for circulation and serials. following the decision to go ahead with the project of computerizing the acquisitions system, a preliminary survey was made of the literature on the subject, and a plan for approaching the task was conceived. briefly, the plan hinged upon the idea of an automatically produced multiple order form similar to that proposed by ibm (1). it also provided for use of the change card, reported by becker to be "a unique and very important part of the penn state system" (2). it further provided for the automatic production of a weekly books on order list or "processing information list" similar to that reported by schultheiss to be in use at the university of illinois libraries (3). the plan was written in the form of a proposal which was then sent with an accompanying flow chart to the director of the campus computer center for consideration. the basic proposal for the new system was accepted, and work toward implementation of the system was begun immediately. as was expected, the plan and flow chart had to be altered in some areas as the project progressed. as a first step, the book order request form was redesigned to serve as a work slip in the verification routine, as a source document for keypunching, and, in the end, as notification to the requester that a requested item had been cataloged. the redesigned request card consisted of a single record form printed on one side of an ibm tab card (figure 1). the only objection to usage of this form appeared to be that the requester would have no record of his request unless he produced one for himself. however, this form was adopted because it was judged less expensive. fig. 1. book request form.
fig. 7. example of computer-produced financial statement. (figure 8) for budgetary purposes. the computer also gives credit to the appropriate fund for items cancelled. this accounting is accomplished through the use of one of the change cards mentioned above. the "books on order" list mentioned above is necessarily cumulative to include all new orders processed, since all new requests are checked against this list for possible duplications. this list always provides current information on the status of an order, enabling the user to find out to what stage in the total process a given order has progressed. non-book materials are differentiated from book materials through use of form codes (figure 9) which appear on the "books on order" print-out. fig. 8. fund codes used in the acquisitions system: aed (agricultural education), ag (agriculture), art (art), bio (biology), ba (business administration), chm (chemistry), ed (education), en (engineering), eng (english), geo (geography), gov (government), hst (history), hpe (health and physical education), he (home economics), ia (industrial arts), jrn (journalism), mth (mathematics), mdl (modern language), mus (music), phy (physics), psy (psychology), soc (sociology), spe (speech), gen (general), gft (gifts and transfers). fig. 9. form codes used for non-book materials: microforms (m), films (c), filmstrips (s), records (d), tapes (t), maps (a), manuscripts (u), serials (p). use of change cards if a dealer reports an item unavailable, cancellation data is noted on the first change card, which then is sent to the computer center. here cancellation data is keypunched into the change card and the change card is fed into the computer to remove all information pertaining to the order from disk storage and consequently from the "books on order" list. the second change card is then discarded. if a dealer supplies an item, actual cost and date received are indicated on the first change card, which is then returned to the computer center. here cost and date received are keypunched into the change card and the change card is processed through the computer to record receipt of the item and to adjust the corresponding account if necessary.
the second change card then accompanies the newly acquired item through the various stages of cataloging. at the appropriate time during the cataloging routine, the call number is written on the second change card. when the catalog cards are ready to be filed in the public catalog, the second change card is returned to the computer center where the call number is keypunched into it. from here this change card, usually in a group of several hundred, is fed into the computer and a list of current acquisitions (figure 10) is printed out. the second change card then is coded so as to make possible the deletion from disk storage of all information pertaining to an order which has appeared on an acquisitions list for as long as two months after the item has been cataloged. this allows the catalog department ample time to file cards in the public catalog, thus reducing the possibility of unintentional duplication. once deleted, the item no longer appears on the "books on order" list. use of five-part order form part one (the original) of the order is sent to the dealer. part two is sent to the catalog department for use as an order for cards from the library of congress. part three differs from part two in color only and serves primarily as a record of the library of congress card order. part four, with part five and corresponding change cards, is filed alphabetically first by dealer and then by main entry. part four serves as a report form on which to record dealer reports and other messages pertaining to the status of the item on order. in the event that an order is cancelled, part four is sent to the catalog department as a signal that library of congress cards may also be cancelled. part four is discarded if a claim or cancel procedure is negated by receipt of an ordered item. part five, with part four and corresponding change cards, is filed in the same manner as part four above. when an item is received and paid for, cost and date received are recorded on this copy of the order. part five, designated as the control copy, then is filed by order number in the library's "control" file for possible use in the identification of items already approved for payment, which may no longer appear on the "books on order" list. it further provides official evidence that purchase was duly authorized. fig. 10. example of computer-produced current acquisitions list. gifts and transfers a gift item is processed in the same manner as a purchase except that part one of the order is discarded. an estimate of the value of each title is submitted so that the total value of gifts can be produced automatically for a given period. an item transferred from the bookstore or any other department of the institution is processed in the same manner as a gift, except that the actual cost of the item is used rather than an estimate. standing and continuation orders a standing or continuation order for a series is keypunched with coded information which causes it to appear indefinitely on the "books on order" list. the two-fold purpose of this is to eliminate the possibility of unintentional duplication and to serve as evidence that the order was authorized. an item actually received on a standing or continuation order basis is processed as a confirmation order and is assigned an order number different from the one assigned the original order. in this way, the item received will appear on the "books on order" list next to the original entry only as long as it takes to catalog the item. clearance of invoices and final routines upon receipt of shipment and corresponding invoice, an item is accepted (if as ordered) and the date of acceptance and cost (as per invoice) are noted on the first change card. this change card is then returned (usually in a group of several hundred) to the computer center, where cost and receipt date are keypunched into it. this information is fed into the computer and accurate accounting results. the next printout of the "books on order" list will indicate that the item was received on the date noted. part four of the order is discarded. part five of the order, bearing cost and date received, is filed by order number in the "control" file. the second change card and the original request card accompany the book to the catalog department. book pockets are pasted in the books at this point to accommodate the second change card and, later, the ibm circulation card used by the library's circulation department. at the end of the cataloging routine, the original request card is sent to the requester as notification that the item is ready for use. discussion no attempt has been made to compare costs of the new system to the old. on the surface, however, there appear to be considerable savings in time and clerical personnel. automatic accounting alone results in a net gain of approximately twenty hours per week in clerical time which can be applied to other necessary manual tasks.
manual typing of orders has been completely eliminated with the use of the computer produced order, resulting in further savings in clerical time. limitations of the new system are about the same as those encountered by other mechanized systems, the limiting factors of space in input and electronic storage being most obvious. the present disk storage equipment is capable of storing data on approximately thirteen thousand book orders and this capacity could be doubled with the addition of another 12 journal of library automation vol. 1/ 1 march, 1968 disk unit. the problem · of disk storage space is not critical at present because removal of order information from storage at two-month intervals after the cataloging process creates additional space for new orders. although the new system has definite advantages, perfection was never expected nor does it exist. the human error factor in the book verification and keypunching processes shows up now and then. experience bears out the fact that output is only as perfect as input. nevertheless, there has been a noticeable gain in accuracy with the installation of the new system, mainly because the more exacting method of procedure helps in detecting an error before it is beyond retraction. even keypunching accuracy has been much greater than expected. conclusion the new acquisitions system at texas a&i university does the job that it was designed to do. it has resulted in faster clearance of orders, better control over unintentional duplication of orders, and automatic accounting. it is believed that the system will lend itself to further development within its existing framework and that it will be capable of handling future work loads. acknowledgements much of the credit for the success of the program goes to dr. j. r. guinn, professor and chairman of the department of electrical engineering. his time in reviewing the original proposal and his subsequent efforts toward the implementation of the project resulted in a workable, practical system. credit goes also to mr. patrick barkey, former librarian at texas a&i university (then known as texas college of arts and industries) for the encouragement he gave to the writer and for the support he gave to the project. appreciation is extended also to mr. r. c. janeway, librarian at texas technological college, for submitting some worthy ideas on design of order forms and on acquisitions procedures in general. references 1. international business machines: "mechanized library procedures," ibm data processing application manual (white plains: ibm, n. d.), p. 11. 2. becker, joseph: "system analysis-prelude to library data processing," ala bulletin, 59 (march 19~), 296. 3. schultheiss, louis a.: "data processing aids in acquisitions work," library resources and technical services, 9 (winter 1965), 68. 4. cox, carl c.: "mechanized acquisitions procedures at the university of maryland," college and research libraries, 26 (may 1965), 232. editorial board thoughts: tools of the trade sharon farnel information technology and libraries | march 2012 5 as i was trying to settle on a possible topic for this, my second “editorial board thoughts” piece, i was struggling to find something that i’d like to talk about and that ital readers would (i hope) find interesting. i had my “eureka!” moment one day as i was coming out of a meeting, thinking about a conversation that had taken place around tools. 
now, by tools, i’m referring not to hardware, but to those programs and applications that we can and do use to make our work easier. the meeting was of our institutional repository team, and the tools discussion specifically focused on data cleanup and normalization, citation integration, and the like. i had just recently returned from a short conference where i had heard mentioned or seen demonstrated a few neat applications that i thought had potential. a colleague also had just returned from a different conference, excited by some of things that he’d learned about. and all of the team members had, in recent days, seen various e-mail messages about new tools and applications that might be useful in our environment. we mentioned and discussed briefly some of the tools that we planned to test. one of the tools had already been test driven by a couple of us, and looked promising; another seemed like it might solve several problems, and so was bumped up the testing priority list. during the course of the conversation, it became clear that each of us had a laundry list of tools that we wanted to explore at greater depth. and it also became clear that, as is so often the case, the challenge was finding the time to do so. as we were talking, my head was full of images of an assembly line, widgets sliding by so quickly that you could hardly keep up. i started thinking how you could stand there forever, overwhelmed by the variety and number of things flying by at what seemed like warp speed. alternatively, if you ever wanted to get anywhere, do anything, or be a part of it all, you just had to roll up your sleeves and grab something. the meeting drew to a close, and we all left with a sense that we needed to find a way of tackling the tools-testing process, of sharing what we learn and what we know, all in the hope of finding a set of tools that we, as a team, could become skilled with. i personally felt a little disappointed at not having managed to get around to all of the tools i’d earmarked for further investigation. but i also felt invigorated at the thought of being able to share the load of testing and researching. if we could coordinate ourselves, we might be able to test drive even more tools, increasing the sharon farnel (sharon.farnel@ualberta.ca) is metadata and cataloguing librarian, university of alberta, edmonton, alberta, canada. mailto:sharon.farnel@ualberta.ca editorial board thoughts | farnel 6 likelihood we’d stumble on the few that would be just right! we’d taken furtive steps towards this in the past, but nothing coordinated enough to make it really stick and be effective. i started wondering how other individuals and institutions manage not only to keep up with all of the new and potentially relevant tools that appear at an ever-increasing pace, but more so how they manage to determine which they will become expert at and use going forward. (although i was excited at what we were thinking of doing, i was quite sure that others were likely far ahead of us in this regard!) it made me realize that at some point i—and we—need to stop being bystanders to the assembly line, watching the endless parade of tools pass us by. we need to simply grab on to a tool and take it for a spin. if it works for what we need, we stick with it. if it doesn’t, we put it back on the line, and grab a different one. but at some point we have to take a chance and give something a shot. 
we’ve decided on a few methods we’ll try for taking full advantage of the tool-rich environment in which libraries exist today. our metadata team has set up a “test bench,” a workstation that we can all use and share for trying new tools. a colleague is going to organize monthly brown-bag talks at which team members can demonstrate tools that they’ve been working with and that they think have potential uses in our work. and we’re also thinking of starting an informal, and public, blog, where we can post, among other things, about new tools we’ve tried or are trying, what we’re finding works and how, and what doesn’t and why. we hope these and other initiatives will help us all stay abreast or even slightly ahead of new developments, be flexible in incorporating new tools into our workflows when it makes the most sense, and in building skills and expertise that benefit us and that can be shared with others. so, i ask you, our ital readers, how do you manage the assembly line of tools? how do you gather information on them, and when do you decide to take one off and give it a whirl? how do you decide when something is worth keeping, or when something isn’t quite the right fit and gets placed back on the line? why not let us know by posting on the italica blog? or, even better, why not write about your experience and submit it to ital? we’re always on the lookout for interesting and instructional stories on the tools of our trade! http://ital-ica.blogspot.com/ autocomplete as a research tool: a study on providing search suggestions david ward, jim hahn, and kirsten feist information technology and libraries | december 2012 6 abstract as the library website and its online searching tools become the primary “branch” many users visit for their research, methods for providing automated, context-sensitive research assistance need to be developed to guide unmediated searching toward the most relevant results. this study examines one such method, the use of autocompletion in search interfaces, by conducting usability tests on its use in typical academic research scenarios. the study reports notable findings on user preference for autocomplete features and suggests best practices for their implementation. introduction autocompletion, a searching feature that offers suggestions for search terms as a user types text in a search box (see figure 1), has become ubiquitous on both larger search engines as well as smaller, individual sites. debuting as the “google suggest” feature in 20041, autocomplete has made inroads into the library realm through inclusion in vendor search interfaces, including the most recent proquest interface and in ebsco products. as this feature expands its presence in the library realm, it is important to understand how patrons include it in their workflow and the implications for library site design as well as for reference, instruction, and other library services. an analysis of search logs from our library federated searching tool reveals both common errors in how search queries are entered, as well as patterns in the use of library search tools. for example, spelling suggestions are offered for more than 29 percent of all searches, and more than half (51 percent) of all searches appear to be for known items.2 additionally, punctuation such as commas and a variety of correct and incorrect uses of boolean operators are prevalent. 
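the log analysis behind these figures can be approximated with a short script. the sketch below uses python and a toy query list; the real easy search logs have their own format, and the spelling-suggestion and known-item measurements involve more than this simple counting of boolean operators and stray punctuation.

```python
import re

# toy queries standing in for lines from a search log; the real log format differs
queries = [
    "global warming AND polar bears",
    "catcher in the rye salinger",
    "nursing OR nuring ethics",
    "economics, labor markets",
]

BOOLEAN = re.compile(r"\b(AND|OR|NOT)\b")   # operators as users often type them
PUNCT = re.compile(r"[,;:]")

boolean_hits = sum(1 for q in queries if BOOLEAN.search(q))
punct_hits = sum(1 for q in queries if PUNCT.search(q))

print(f"queries with boolean operators: {boolean_hits}/{len(queries)}")
print(f"queries with commas or other punctuation: {punct_hits}/{len(queries)}")
```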
these patterns suggest that providing some form of guidance in keyword selection at the point of search-term entry could improve the accuracy of composing searches and subsequently the relevance of search results. this study investigates student use of an autocompletion implementation on the initial search entry box for a library’s primary federated searching feature. through usability studies, the authors analyzed how and when students use autocompletion as part of typical library research, asked the students to assess the value and role of autocompletion in the research process, and noted any drawbacks of implementing the feature. additionally, the study sought to analyze how implementing autocompletion on the front end of a search affected providing search suggestions on the back end (search result pages). david ward (dh-ward@illinois.edu) is reference services librarian and jim hahn (jimhahn@illinois.edu) is orientation services and environments librarian, undergraduate library, university of illinois at urbana-champaign; kirsten feist (kmfeist@uh.edu) is library instruction fellow, m.d. anderson library, university of houston.
figure 1. autocomplete implementation
literature review autocomplete as a plug-in has become ubiquitous on site searches large and small. research on autocomplete includes a variety of technical terms that refer to systems using this architecture. examples include real-time query expansion (rtqe), interactive query expansion, search-as-you-type (sayt), query completion, type-ahead search, auto-suggest, and suggestive searching/search suggestions. the principal research concerns for autocomplete include issues related to both back-end architecture and assessments of user satisfaction and systems for specific implementations. nandi and jagadish present a detailed system architecture model for their implementation of autocomplete, which highlights many of the concerns and desirable features of constructing an index that the autocomplete will query against.3 they note in particular that the quality of suggestions presented to the user must be high to compensate for the user interface distraction of having suggestions appear as a user types. this concern is echoed by jung et al. in their analysis of how the results offered by their autocomplete implementation met user expectations.4 their findings emphasize configuring systems to display only keywords that bring about successful searches, noting “precision [of suggested terms] is closely related with satisfaction.” an additional analysis of their implementation also noted that suggesting search facets (or “entity types”) is a way to enhance autocomplete implementations and aid users in selecting suitable keywords for their search.5 wu also suggests using facets to help group suggestions by type, which improves comprehension of a list of possible keyword combinations.6 in defining important design characteristics for autocomplete implementations, wu advocates building in a tolerance for misplaced keywords as a critical component. chaudhuri and kaushik examine possible algorithms to use in building this type of tolerance into search systems.
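to make the error-tolerance and facet-grouping ideas from this literature concrete, here is a minimal sketch. it is written in python rather than the javascript used in the study’s own interface, the hard-coded index is a toy stand-in for a real suggestion store, and the single-pass edit-distance prefix match only gestures at the more sophisticated algorithms chaudhuri and kaushik describe.

# an illustrative sketch (not code from the cited systems) of two ideas from
# the literature above: tolerating small typing errors and grouping
# suggestions by facet ("entity type"). the index is a hard-coded toy;
# a real implementation would query a prepared suggestion store.
from collections import defaultdict

SUGGESTIONS = [
    ("journal of chromatography", "journal"),
    ("journal of chemical ecology", "journal"),
    ("battleship potemkin", "film"),
    ("epic of gilgamesh", "book"),
    ("chromatography methods", "topic"),
]


def edit_distance(a, b):
    """plain dynamic-programming levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(typed, max_errors=1):
    """return suggestions whose beginning is within max_errors edits of the
    text typed so far, grouped by facet."""
    typed = typed.lower().strip()
    grouped = defaultdict(list)
    for term, facet in SUGGESTIONS:
        prefix = term[: len(typed)]
        if edit_distance(typed, prefix) <= max_errors:
            grouped[facet].append(term)
    return dict(grouped)


if __name__ == "__main__":
    # "chromotog" is one substitution away from "chromatog",
    # so the misspelled journal title still surfaces a suggestion.
    print(suggest("journal of chromotog"))

grouping the returned terms by facet mirrors the recommendation, noted above, that suggesting entity types helps users pick suitable keywords.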
misplaced keywords include typing terms in the wrong field (e.g., an author name in a title field), as well as spelling and word order errors.7 systems that are tolerant in this manner “should enumerate all the possible interpretations and then sort them according to their possibilities,” a specification wu refers to as “interpret-as-you-type.”8 additionally, both wu and nandi and jagadish specify fast response time (or synchronization speed) as a key usability feature in autocomplete interfaces, with nandi and jagadish indicating 100ms as a maximum.9,10 speed also is a concern in mobile applications, which is part of the reason paek et al. recommend autocomplete as part of mobile search interfaces, in which reducing keystrokes is a key usability feature.11 on the usability end, white and marchionini12 assess best practices for implementation of searchterm-suggestion systems and users’ perceptions of the quality of suggestions and search results retrieved. they find that offering keyword suggestions before the first set of results has been displayed generated more use of the suggestions than displaying them as part of a results page, even though the same terms were displayed in both cases. providing suggestions at this initial stage also led to better-quality initial queries, particularly in cases where users may have little knowledge of the topic for which they are searching. the researchers also warn that, while presenting “query expansion terms before searchers have seen any search results has the potential to speed up their searching . . . it can also lead them down incorrect search paths.”13 method usability study we conducted two rounds of usability testing on a version of university of illinois at urbanachampaign’s undergraduate library website that contained a search box for the library’s federated/broadcast search tool with autocomplete built in. the testing followed nielsen’s guidelines, using a minimum of five students for each round, with iterative changes to the interface made between rounds based on feedback from the first group.14 we conducted the initial round in summer 2011 with five library undergraduate student workers. the second round was conducted in september 2011 and included eight current undergraduate students with no affiliation to the library. by design, this method does not allow us to state definitive trends for all autocomplete implementations. it is not a statistically significant method by quantitative standards—rather, it gives us a rich set of qualitative data about the particular implementation (easy search) and specific interface (undergrad library homepage) being studied. the study’s questions were approved by the campus institutional review board (irb), and each participant signed an irb waiver before participating. students for the september round were recruited via advertisements on the website and flyers in the library. gift certificates to a local coffee shop provided the incentive for the study. information technology and libraries | december 2012 9 the procedure for each interview focused on two steps (see appendix). first, each participant was asked to use the search tool to perform a series of common research tasks, including three queries for known item searches (locating a specific book, journal, and movie), and two searches that asked the student to recall and describe a current or previous semester’s subject-based search, then use the search interface to find materials on that topic. 
participants were asked to follow a speak-aloud protocol, dictating the decision-making process they went through as they conducted their search, including why they made each choice along the way. researchers observed and took notes, including transcribing user comments and noting mouse movements, clicks, and other choices made during the searches. because part of the hypothesis of the study was that the autocomplete feature would be used as an aid for spelling search queries correctly, titles with possibly challenging spelling were chosen for the known-item searches. participants were not told about or instructed in the use of autocomplete; rather, it was left to each of them to discover it and individually decide whether to use it during each of the searches they conducted as a part of the study. in the second part of the interview, researchers asked students questions about their use (or lack thereof) of the autocomplete feature during the initial set of task-based questions. this set of questions focused on identifying when students felt the autocomplete feature was helpful as part of the search process, why they used it when they did, and why they did not use it in other cases. students also were asked more general questions about ways to improve the implementation of the feature. in the second round of testing (with students from the general campus populace), an additional set of questions was asked to gather student demographic information and to have the participants assess the quality of the choices the autocomplete feature presented them with. these questions were based in part on the work of white and marchionini, who had study participants conduct a similar quality analysis.15 autocomplete implementation the autocomplete feature was written in javascript and based on the jquery autocomplete plugin (http://code.google.com/p/jquery-autocomplete/). autocomplete plugins generally pull results either from a set of previous searches on a site or from a set of known products and pages within a site. for the study, the initial dataset used was a list of thousands of previous searches using the library’s easy search federated search tool. however, this data proved to be extremely messy and slow to search. in particular, a high number of problematic searches were in the data, including entire citations pasted in, misspelled words, and long natural-language strings. constructing an algorithm to clean up and make sense of these difficult queries would have required too much time and overhead, so we investigated other sources. researchers looked at autocomplete apis for both bing (http://api.bing.com/osjson.aspx?query=test) and google (the suggest toolbar api: http://google.com/complete/search?output=toolbar&q=test). both worked well and produced similar relevant results for the test searches. significantly, the search algorithms behind each of these apis were able to process the search query into far more meaningful and relevant results than what was achieved through the test implementation using local data. these algorithms also included correcting misspelled words entered by users by presenting correctly spelled results from the dropdown list. we ultimately chose the google api on the basis of its xml output. findings the study’s findings were consistent across both rounds of usability testing.
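for readers who want to see roughly what the chosen approach involves, the sketch below queries the google toolbar suggest endpoint cited above and extracts suggestion strings from its xml response. it is a python illustration, not the authors’ jquery code; the endpoint behaved this way around the time of the study and may since have changed, and the element and attribute names assumed here (a “data” attribute on suggestion elements) are inferences about that xml format rather than a documented contract.

# a rough sketch (not the authors' production code) of querying the google
# "suggest" toolbar endpoint cited above and pulling suggestion strings out
# of its xml response. the endpoint url comes from the article; the element
# and attribute names are assumptions and may have changed since 2012.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def google_toolbar_suggestions(query, limit=10):
    """return up to `limit` suggestion strings for a partial query."""
    url = ("http://google.com/complete/search?output=toolbar&q="
           + urllib.parse.quote(query))
    with urllib.request.urlopen(url, timeout=2) as response:
        xml_bytes = response.read()
    tree = ET.fromstring(xml_bytes)
    suggestions = []
    # walk every element and keep any "data" attribute found, so the sketch
    # tolerates minor differences in the response structure.
    for element in tree.iter():
        text = element.attrib.get("data")
        if text:
            suggestions.append(text)
    return suggestions[:limit]


if __name__ == "__main__":
    # e.g., the misspelled start of "journal of chromatography" used in the study
    print(google_toolbar_suggestions("journal of chormo"))

in the study’s interface, the jquery autocomplete plugin rendered strings like these in the dropdown beneath the search box.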
notable themes include using autocomplete to correct spelling on known-item searches (specific titles, authors, etc.), to build student confidence with an unfamiliar topic, to speed up the search process, to focus broad searches, and to augment search-term vocabulary. the study also details important student perceptions about autocomplete that can guide the implementation process in both library systems and instructional scenarios. these student perceptions include themes of autocomplete’s popularity, desire for local resource suggestions, various cosmetic page changes, and user perception of the value of autocomplete to their peers. spelling “it definitely helps with spelling,” said one student, responding to a prompt of how they would explain the autocomplete feature to friends. correcting search-term spelling is a key way in which students chose to make use of the autocomplete feature. for known-item searches, all eight students in the second round of testing selected suggestions from auto-complete at least two times out of the three searches conducted. of those eight students, four (50 percent) used auto-complete every time (three out of three opportunities), and four (50 percent) used it 67 percent of the time (two out of three opportunities). we found that of this latter group who only selected auto-complete suggestions two out of the three opportunities presented, three of them did in fact refer to the dropdown selections when typing their inquiries, but did not actively select these suggestions from the dropdown all three times. in choosing to use autocomplete for spelling correction, one student noted that autocomplete was helpful “if you have an idea of a word but not how it’s spelled.” it is interesting to note, with regard to clicking on the correct spellings, that students do not always realize they are choosing a different spelling than what they had started typing. an example is the search for journal of chromatography, which some students started spelling as “journal of chormo,” then picked the correct spelling (starting “chroma”) from the list, without apparently realizing it was different. this is an important theme: if a student does not have an accurate spelling from which to begin, the search might fail, or the student will assume the library does not have any information on the chosen topic. this is particularly true in many current library catalog interfaces, which do not provide spelling suggestions on their search result pages. locating known items information technology and libraries | december 2012 11 another significant use of the autocomplete feature was in cases where students were looking for a specific item but had only a partial citation. in one case, a student used autocomplete to find a specific course text by typing in the general topic (e.g., “africa”) and then an author’s name that the course instructor had recommended. the google implementation did an excellent job of combining these pieces of information into a list of actual book titles from which to choose. this finding also echoes those of white and marchioni, who note that autocomplete “improved the quality of initial queries for both known item and exploratory tasks.”16 the study also found this to be an important finding because overall, students are looking for valid starting points in their research (see “confidence” below), and autocomplete was found to be one way to support finding instructor-approved items in the library. 
this echoes findings from project information literacy, which shows students typically turn to instructor-sanctioned materials first when beginning research.17 this use case typically arises when an instructor suggests an author or seminal text on a research topic to a student, often with an incomplete or inaccurate title. one participant also mentioned that they wanted the autocomplete feature to suggest primary or respected authors based on the topic they entered. confidence “[autocomplete is] an assurance that it [the research topic] is out there . . . you’re not the first person to look for it.”—student participant there were multiple themes related to the concept of user confidence discovered in the study. first, some participants noted that when they see the suggestions provided by autocomplete it verifies that what they are searching is “real”—validating their research idea and giving them the sense that others have been successful previously searching for their topic. when students were asked the source of the autocomplete suggestions, most thought that results were generated based on previous user searches. their response to this particular question highlighted the notion of “popularity ranking,” in that many were confident that the suggestions presented were a result of popular local queries. in addition, one participant thought that results generated were based on synonyms of the word they typed, while another believed that the results generated were included only if the text typed matched descriptions of materials or topics currently present in the library’s databases. some students did indicate the similarity of search results to google suggestions, but they did not make an exact connection between the two. this assumption that the terms are vetted seems to lend authority to the suggestions themselves and parallels the research of jung et al., who investigated satisfaction based on the connection between user expectations on selecting an autocomplete keyword and results.18 the benefit of autocomplete-provided suggestions in this context was noted even in cases when participants did not explicitly select items from the autocomplete list. students’ confidence in their own knowledge of a topic also factored into when they used autocomplete. participants reported that if they knew a topic well (particularly if the topic chosen was one that they had previously completed a paper on), it was faster to just type it in without choosing a suggestion from the autocomplete suggestion list. one participant also noted that common topics (e.g., “someone’s name and biography”) would also be cases in which they would not use the suggestions. after the first round of usability testing, a question was added to the post-test assessment asking students to rate their confidence as a researcher on a five-point scale. all participants in the second round rated themselves as a four or five out of five. while this confirms findings on student confidence from studies like project information literacy, this assessment question ultimately had no correlation to actual use of autocomplete suggestions during the subject-based research phase of the study. rather, confidence in the topic itself seemed to be the defining factor in use. speed the study also showed that speed is a factor in deciding when to use autocomplete functionality.
specifically, autocomplete should be implemented in a way in which it is not perceived as slowing down the search process. this includes having results displayed in a way that is easily ignored if students want to type in an entire search phrase themselves, and having the presentation and selection of search suggestions done in a way that is easy to read and quick to be selected. autocomplete is perceived as a time-saver when clicking on an item will shorten the amount of typing students need to do. however, some students will ignore autocomplete altogether; they do this when they know what they want, and they feel that speed is compromised if they need to stop and look at the suggestions when they already know what they want to search. in the study, different participants would often cite speed as a reason for both selecting and not selecting an item for the same question, particularly with the known-item searches. this finding indicates that a successful implementation should include both a speedy response (as noted above in nandi and jagadish’s research on delivering suggestions within 100ms, paek et al.’s research on reducing keystrokes, and white and marchionini’s finding that providing suggested words was “a real time-saver”),19 as well as an interface which does not force users to select an item to proceed, or obscure the typing of a search query. focusing topics “it helps to complete a thought.” “[autocomplete is] extra brainstorming, but from the computer.”—participant responses the above quotes indicate the use of autocomplete as a tool for query formulation and search-term identification, a function closely related to the association of college and research libraries (acrl) information literacy standard two, which includes competencies for selecting appropriate search keywords and controlled vocabulary related to a topic.20 these quotes also parallel a similar finding from white and marchionini,21 who had a user comment that autocomplete “offered words (paths) to go down that i might not have thought of on my own.” the use of autocomplete for scoping and refining a topic also parallels elements of the reference interview, specifically the open and closed questions typically asked to help a student define what aspects of a topic they are interested in researching. this finding has many exciting implications for how elements and best practices from both classroom instruction and reference methodologies can be injected directly into search interfaces, to aid students who may not consult with a librarian directly during the course of their research. autocomplete was used at a lower rate, and in different ways, for subject searching compared to known-item searching. three out of eight participants (38 percent) from the second round of testing did not use autocomplete at all for subject-based searching (zero of two opportunities). five out of eight participants (62 percent) used autocomplete on one of two search opportunities (50 percent). no participants used autocomplete on both of the search opportunities. the stage of research a student was in helped to indicate where and how autocomplete could be useful in topic formulation and search-term selection for subject searches. participants indicated that they would use autocomplete for narrowing ideas if they were at a later stage in a paper, when they knew more about what they wanted or needed specifics on their topic.
however, early in a paper, some participants indicated they just wanted broad information and did not want to narrow possible results too early. this finding also supports previous research from project information literacy, which describes student desire to learn the “big-picture context” as a key function in the early part of the research process.22 at this topic-focusing stage, some participants told us that the search suggestions reminded them of topics that were discussed in class. further, the study showed that autocomplete suggests aspects of topics to student that they had not previously considered, and one participant indicated that she might change her topic if she saw something interesting from the list of suggestions, particularly something she had not thought of yet. interface implementation though students who opted to utilize the autocomplete feature were generally satisfied with the results generated, some students recommended increasing the number of autocomplete suggestions in the dropdown menu to increase the probability of finding their desired topic or known item or to potentially lead to other related topics to narrow their search. in addition, students recommended increasing the width of the autocomplete text box, as its present proportions are insufficient for displaying longer suggestions without text wrapping. some students also noted that increasing the height of the dropdown menu containing the autocomplete suggestions might help reduce the necessity to scroll through the results and may help to draw user attention to all results for those who elect not to use the scroll bar. beyond the suggested improvements for the functionality of the autocomplete feature, students also noted a few cosmetic changes they would like to see implemented. in particular, students would prefer to see larger text and a better use of fonts and font colors when using autocomplete. one student noted that if different fonts and colors were used in this feature, the results generated might stand out more and better attract users, or better draw users’ attention to the recommended search terms. autocomplete as a research tool | ward, hahn, and feist 14 perceived value to peers most students who participated in the study stated that they would recommend that their fellow classmates utilize the autocomplete feature for two primary purposes: known-item searches and locating alternative options for research topics. one student noted that she would recommend using this feature to search keywords “easily and efficiently,” while another student indicated that the feature helps to link to other related keywords. this finding also revealed that users were not intimidated by the feature and did not see it as a distraction from the search process, an initial researcher concern. conclusion and future directions implementation implications implementing autocomplete functionality that accounts for the observed research tendencies and preferences of users makes for a compelling search experience. participant selection of autocomplete suggestions varied between the types of searches studied. spelling correction was the one universally acknowledged use. for subject-based searching, confidence in the topic searched and the stage of research emerged as indicators of the likelihood of autocomplete suggestions being taken. the use and effectiveness of providing subject suggestions requires further study, however. 
students expect suggestions to produce usable results within a library’s collections, so the source of the suggestions should incorporate known, viable subject taxonomies to maximize benefits and not lead students down false search paths. there is an ongoing need to investigate possible search-term dictionaries outside of google, such as lists of library holdings, journal titles, article titles, and controlled vocabulary from key library databases. the “brainstorming” aspect of autocomplete for subject searching is an intriguing benefit that should be more fully explored and supported. in combination with these findings, participants’ positive responses to some of the assessment questions (including first impressions of autocomplete and willingness to recommend it to friends) indicate that autocomplete is a viable tool to incorporate site-wide into library search interfaces. instruction implications traditional academic library instruction tends to focus on thinking of all possible search terms, synonyms, and alternative phrasing before the onset of actual searching and engagement with research interfaces. this process is later refined in the classroom by examining controlled vocabulary within a set of search results. however, observations from this study (as well as researcher experience with users at the reference desk) indicate that students in real-world situations often skip this step and rely on a more trial-and-error method for choosing search terms, beginning with one concept or phrasing rather than creating a list of options that they try sequentially. the implication for classroom practice is that instruction on search-term formulation should include a review of autocomplete suggestions as well as practical methods for integrating these suggestions into the research process. this is particularly important as vendor databases move toward making autocomplete a default feature. proper instruction in its use can help advance acrl information literacy goals and provide a practical, context-sensitive way to explain how a varied vocabulary is important for achieving relevant results in a research setting.23 reference implications as with classroom instruction, traditional reference practice emphasizes a prescriptive path for research that involves analyzing which aspects of a topic or alternate vocabulary will be most relevant to a search before search-term entry. open and closed questioning techniques encourage users to think about different facets of their topic, such as time period, location, and type of information (e.g., statistics) that might be relevant. an enhanced implementation of autocomplete can incorporate these best practices from the reference interview into the list of suggestions to aid unmediated searching. one way this might be incorporated is through presenting faceted results that change on the basis of user selection of the type and nature of information they are looking for, such as a time period, format, or subject. for broadcast and federated searching interfaces, this could extend into the results users are then presented with, specifically selecting items or databases on the basis of suggestions made during the search-entry phase, rather than presenting users with a multitude of options to make sense of, some of which may be irrelevant to the actual information need. finally, the findings on use of autocomplete also have implications for search-results pages.
many of the common uses (e.g., spelling suggestions and additional search-term suggestion) also should be standard on results pages. this, too, is a common feature of commercial interfaces. bing, for example, includes a related searches feature (on the left of a standard results page), that suggests context-specific search terms based on the query. this feature is also part of their api (http://www.bing.com/developers/s/apibasics.html). providing these reference-without-alibrarian features is essential both in establishing user confidence in library research tools and in developing research skills and an understanding of the information literacy concepts necessary to becoming better researchers. our autocomplete use findings draw attention to user needs and library support across search processes; specifically, autocomplete functionality offers support while forming search queries and can improve the results of user searching. for this reason, we recommend that autocomplete functionality be investigated for implementation across all library interfaces and websites to provide unified support for user searches. the benefits that can be realized from autocomplete can be maximized by consulting with reference and instruction personnel on the benefits noted above and collaboratively devising best practices for integrating autocomplete results into searchstrategy formulation and classroom-teaching workflows. http://www.bing.com/developers/s/apibasics.html autocomplete as a research tool | ward, hahn, and feist 16 references 1. “autocomplete—web search help,” google, support.google.com/websearch/bin/answer.py?hl=en&answer=106230 (accessed february 7, 2012). 2. william mischo, internal use study, unpublished, 2011. 3. arnab nandi and h. v. jagadish, “assisted querying using instant-response interfaces,” in proceedings of the 2007 acm sigmod international conference on management of data (new york: acm, 2007), 1156–58, doi: 10.1145/1247480.1247640. 4. hanmin jung et al., “comparative evaluation of reliabilities on semantic search functions: auto-complete and entity-centric unified search,” in proceedings of the 5th international conference on active media technology (berlin, heidelberg: springer-verlag, 2009), 104–13, doi: 10.1007/978-3-642-04875-3_15. 5. hanmin jung et al., “auto-complete for improving reliability on semantic web service framework,” in proceedings of the symposium on human interface 2009 on human interface and the management of information. information and interaction. part ii: held as part of hci international 2009 (berlin, heidelberg: springer-verlag, 2009), 36–44, doi: 10.1007/978-3-64202559-4_5. 6. hao wu,“search-as-you-type in forms: leveraging the usability and the functionality of search paradigm in relational databases,” vldb 2010, 36th international conference on very large data bases, september 13–17, 2010, singapore, p. 36–41, www.vldb2010.org/proceedings/files/vldb_2010_workshop/phd_workshop_2010/phd%20wor kshop/content/p7.pdf (accessed february 7, 2012). 7. surajit chaudhuri and raghav kaushik, “extending autocompletion to tolerate errors,” in proceedings of the 35th sigmod international conference on management of data (new york,: acm, 2009), 707–18, doi: 10.1145/1559845.1559919,. 8. wu, “search-as-you_type in forms,” 38. 9. wu, “search-as-you-type in forms.” 10. ibid. 11. 
tim paek, bongshin lee, and bo thiesson, “designing phrase builder: a mobile real-time query expansion interface,” in proceedings of the 11th international conference on humancomputer interaction with mobile devices and services (new york: acm, 2009), 7:1–7:10, doi: 10.1145/1613858.1613868. http://support.google.com/websearch/bin/answer.py?hl=en&answer=106230 http://www.vldb2010.org/proceedings/files/vldb_2010_workshop/phd_workshop_2010/phd%20workshop/content/p7.pdf http://www.vldb2010.org/proceedings/files/vldb_2010_workshop/phd_workshop_2010/phd%20workshop/content/p7.pdf information technology and libraries | december 2012 17 12. ryen w. white and gary marchionini, “examining the effectiveness of real-time query expansion,” information processing and management 43, no. 3 (2007): 685–704, doi: 10.1016/j.ipm.2006.06.005. 13. white and marchionini, “examining the effectiveness of real-time query expansion,” 701. 14. jakob nielsen, “why you only need to test with 5 users,” jakob nielsen’s alertbox (blog), march 19, 2000, www.useit.com/alertbox/20000319.html (accessed february 7, 2012). see also walter apai, “interview with web usability guru, jakob nielsen,” webdesigner depot (blog), september 28, 2009, www.webdesignerdepot.com/2009/09/interview-with-web-usability-gurujakob-nielsen/ (accessed february 7, 2012). 15. white and marchionini, “examining the effectiveness of real-time query expansion.” 16. ibid. 17. alison j. head and michael b. eisenberg, “lessons learned: how college students seek information in the digital age,” project information literacy progress report, december 1, 2009, projectinfolit.org/pdfs/pil_fall2009_finalv_yr1_12_2009v2.pdf (accessed february 7, 2012). 18. jung et al., “comparative evaluation of reliabilities on semantic search functions.” 19. jung et al., “comparative evaluation of reliabilities on semantic search functions”; paek, lee, and thiesson, “designing phrase builder”; white and marchionini, “examining the effectiveness of real-time query expansion.” 20. association of college and research libraries (acrl), “information literacy competency standards for higher education,” http://www.ala.org/acrl/standards/informationliteracycompetency (accessed february 7, 2012). 21. white and marchionini, “examining the effectiveness of real-time query expansion.” 22. head and eisenberg, “lessons learned.” 23. association of college and research libraries (acrl), “information literacy competency standards for higher education.” http://www.useit.com/alertbox/20000319.html http://www.webdesignerdepot.com/2009/09/interview-with-web-usability-guru-jakob-nielsen/ http://www.webdesignerdepot.com/2009/09/interview-with-web-usability-guru-jakob-nielsen/ http://projectinfolit.org/pdfs/pil_fall2009_finalv_yr1_12_2009v2.pdf http://www.ala.org/acrl/standards/informationliteracycompetency autocomplete as a research tool | ward, hahn, and feist 18 appendix. questions task-based questions 1. does the library have a copy of “the epic of gilgamesh?” 2. does the library own the movie “battleship potempkin?” 3. does the library own the journal/article “journal of chromatography?” 4. for this part, we would like you to imagine you are doing research for a recent paper, either one you have already completed or one you are currently working on. a. what is this paper about? (what is your research question?) b. what class is it for? c. search for an article on yyy 5. same as 4, but different class/topic, and search for a book on yyy autocomplete-specific questions 1. 
what is your first impression of the autocomplete feature? 2. have you seen this feature before? a. if so where have you used it? 3. why did you/did you not use the suggested words? (words in the dropdown) 4. where do you think the suggestions are coming from? or, how are they being chosen? 5. when would you use this? 6. when would you not use it? 7. how can it be improved? 8. overall, what do you like/not like about this option? 9. would you suggest this feature to a friend? 10. if you were to explain this feature to a friend how might you explain it to them? assessment and demographic questions autocomplete feature 1. [known item] rate the quality/appropriateness of each of the first five autocomplete dropdown suggestions for your search: (5 point scale) 1—poor quality/not appropriate 2—low quality 3—acceptable 4—good quality –5—high quality/very appropriate information technology and libraries | december 2012 19 2. [subject/topic search] rate the quality/appropriateness of each of the first five autocomplete dropdown suggestions for your search: (5 point scale) 1—poor quality/not appropriate 2—low quality –3—acceptable 4—good quality –5—high quality/very appropriate 3. please indicate how strongly you agree or disagree with the following statement: “the autocomplete feature is useful for narrowing down a research topic.” (5 point scale): 1—strongly disagree 2—disagree –3—undecided –4—agree –5—strongly agree demographics 1. please indicate your current class status a.  freshman b.  sophomore c.  junior d.  senior 2. what is your declared or anticipated major? 3. have you had a librarian come talk to one of your classes or give an instruction session in one of your classes? if yes, which class(es)? 4. please rate your overall confidence level when beginning research for classes that require library resources for a paper or assignment. (5 point scale): 1—no confidence 2—low confidence 3—reasonable confidence 4—high confidence –5—very high confidence 5. what factors influence your confidence level when beginning research for classes that require library resources for a paper or assignment? article title | author 33online workplace training in libraries | haley 33 this study was designed to explore and describe the relationships between preference for online training and traditional face-to-face training. included were variables of race, gender, age, education, experience of library employees, training providers, training locations, and institutional professional development policies, etc. in the library context. the author used a bivariate test, kruskalwallis test and mann-whitney u test to examine the relationship between preference for online training and related variables. i n the era of information explosion, the nature of library and information services makes library staff update their work knowledge and skills regularly. workplace training has played an important role in the acquisition of knowledge and skills required to keep up with this information explosion. as richard a. swanson states, human resource development (hrd) is personnel training and development and organization development to improve processes and enhance the learning and performance of individuals, organizations, communities, and society (swanson 2001). training is the largest component of hrd. it helps library employees acquire more skills through continuous learning. online workplace training is a relatively new medium of delivery. 
this new form of training has been explored in the literature of human resources development in corporation settings (macpherson, elliot, harris, and homan 2004), but it has not been adequately explored in university and library settings. universities are unique settings in which to study hrd, and libraries are unique settings in which to examine hrd theory and practice. in human resource development literature there are studies on participation (wang and wang 2004) from the perspective of individual motivation, attitudes, etc.; however, more research needs to be conducted to explore library employees’ demographics related to online training in the unique library contexts, such as various staff training and development, as well as training policies. hrd literature includes studies of online learning in formal educational settings (hiltz and goldman 2004; shank and sitze 2001; waterhouse 2005), and there are studies on relationships between national culture and the utility of online training (downey, wentling, wentling, and wadsworth 2005). but there has been very little research conducted in terms of online workplace training for library staff. it is not clear what relationships exist among preferences for online training and demographic variables such as ethnicity, gender, age, educational level, and years of library experience. due to lack of research in these areas, workplace training in libraries will be less effective if certain ethnic groups, or certain age groups, prefer traditional face-to-face training as libraries move toward online training. the author believes that research should govern library practice. therefore, it is necessary to research this topic and disseminate the findings. because of the growth in online training, there is a need to gain a better understanding of these relationships. ■ purpose of the study the study aims to reveal the relationships between preferences for online or traditional face-to-face training and variables such as ethnicity, gender, age, educational level, and years of experience. it also studies the relationships among preference for online training and other variables of training locations, training providers, training budgets, and professional development policies. the constructs are: the preference for online training was related to demographics, library’s training budget, professional development policies, training providers, and the training locations. these factors were included in the research questionnaire. we begin with the research questions, review the current literature, and then discuss the method, results, and need for further research. correlational research questions 1. what is the relationship between ethnicity and online workplace training preferences? 2. what is the relationship of employees’ educational levels, age, and years of library experience to online workplace training preferences? 3. how does preference for online workplace training in libraries relate to employee gender? 4. how does preference for online workplace training in libraries relate to training locations, training providers, training budgets, and professional development policies? 5. do library staff prefer traditional face-to-face training over online training? ■ review of the literature as stated above, training is the largest component of hrd. the discipline of hrd relies on three core theories: psychological theory, economic theory, and system theory. swanson (2001) stated: online workplace training in libraries connie k. haley connie k. 
haley (chaley@csu.edu) is systems librarian, chicago state university library, illinois 34 information technology and libraries | march 200834 information technology and libraries | march 2008 economic theory is recognized as a primary driver and survival metric of organizations; system theory recognizes purpose, pieces, and relationships that can maximize or strangle systems and subsystems; and psychological theory acknowledges human beings as brokers of productivity and renewal along with the cultural and behavioral nuances. each of these three theories is unique, complementary, and robust. together they make up the core theory underlying the discipline of hrd (p. 92–93). three specific economic theory perspectives are believed to be most appropriate to the discipline of hrd: (1) scarce resource theory, (2) sustainable resource theory, and (3) human capital theory (swanson 2001). training is an investment to human capital with valuable returns, but no costs. wenger and snyder’s study (as cited in mahmood, ahmad, samah, and idris 2004) states that today’s economy runs on knowledge and skills. thurow’s study (as cited in swanson 2001) states that new industries of the future depend on brain power. man-made competitive advantages replace the comparative advantage of natural-resources endowments or capital endowments. in a rapidly changing society, maintaining organizational and individual competence has become a greater challenge than ever before (hake 1999). competences include knowledge, skills, and attitudes. much of the literature focuses on job-related functional competences (deist and winterton 2005). library workplace training is one of the primary methods of investing in human capital and increasing competence for library employees. training is the process through which skills are developed, information is provided, and attributes are nurtured (davis and davis 1998). to increase training participation and efficacy, libraries need to determine employees’ preferences for online training or traditional face-to-face training; a resulting high training participation rate would increase the competence of all employees. library trainers and administrators can encourage nonparticipants to attend training by offering different training sessions (online or face-to-face), and/or by changing training policies and budget allocations. unlike personality and intelligence, skill competence may be learned; hence it may be improved through training and development (mcclelland 1998). nadler and tushman (1999) emphasized core competence as a key organizational resource that could be exploited to gain competitive advantage. core competence was defined as collective learning in the organization, especially how to coordinate diverse production skills and integrate multiple streams of technologies (prahalad and hamel 1990). mezirow (2000) asserted that there are asymmetrical power relationships that influence the learning process (as cited in baumgartner 2000). learning more about the relationships may benefit training and learning. in other words, training may be more effective if it is provided in the form preferred by the majority of staff. as stated above, there is very little research about online workplace training for library staff. past studies have focused on how to conduct online training for working catalogers (ferris 2002) or on online teaching for students (crichton and labonte 2003; hitch and hirsch 2001). 
from the design and implementation perspectives, kovacs (2000) discussed web-based training in libraries, and unruh (2000) emphasized problems in delivery of web-based training. markless (2002) addressed learning theory and other relevant theories that could be used to teach in libraries. yet there is a lack of research on the demographics of library staff participation in workplace training and a lack of research on the training preferences of library staff. ■ methodology the study took place in an online environment. the research activities covered a twenty-day period from april 10 to april 30, 2006. survey questionnaires and consent forms were posted on the web. select participants the survey url (http://freeonlinesurveys.com/rendersurvey.asp?id=106221) was sent to library staff via library discussion lists along with a consent form including contact information and a brief explanation of the survey’s purpose. the surveys were anonymous and confidential. names, e-mail addresses, and personally identifiable information were not tracked. all participants filled out the survey online. the sample was limited to employees who were at least nineteen years old. directors and department heads were also welcome to participate. instrument data collected for this study included categorical data (i.e., gender and ethnicity) and numeric data (age, years of education, and years of experience). this was an attitudinal survey; hence, the rensis likert scale was used for data feedback. most of the data was quantitative likert scale, such as the preference for online training, the professional development policy, and the budget allocation for training. data collection “entailed measuring the attitudes of employees, providing feedback to participants, and stimulating joint planning for improvement” (swanson 2001). likert-type scales provide more variation of responses and lend themselves to stronger statistical analysis (creswell 2005). it is important to select a well-tested instrument that article title | author 35online workplace training in libraries | haley 35 reports reliable and valid data. however, measuring attitudes has been one of the most challenging forms of psychometric measurement (thorkildsen 2005). due to a lack of similar studies of libraries’ online training, no instruments could be found for this study except the education participation scale (eps), the deterrents to participation scale (dps), and the style analysis survey (sas) instruments. boshier’s forty-item eps (1974) is reliable in differentiating among diverse groups with varying reasons for participating in continuing education (as cited in merriam and caffarella 1999). the eps is used to find the motivations as to why people participate in continuing education; consequently, the eps cannot answer all questions of this study. similarly, the dps reveals factors of nonparticipation; hence, the dps cannot be used in this study. and while the sas is designed to identify how individuals prefer to learn, concentrate, and perform in both educational and work environments (sloan, daane, and giesen 2002), after careful examination, it was found that the sas was not well-suited to this study. because surveys are used to collect data and to assess opinions and attitudes (creswell 2005), the researcher chose to develop a survey that contained about 20 items to assess library staff’s opinions and attitudes toward online training. 
the survey consisted of three parts: demographic variables, likert-scale assessment of online workplace training preference, and open-ended questions that were worded to reflect reasons for training preference (see appendix). to capture demographic data, participants were asked to indicate their age, years of library experience, years of education (high school/ged = 12; two years college = 14; bachelor’s degree = 16; one master’s degree = 18; two master’s degrees = 20; ph.d/ed.d = 22+), gender (1 = male or 2 = female), ethnicity (1 = asian/pacific islander, 2 = american indian, 3 = african american, 4 = hispanic, 5 = white, non-hispanic, and 6 = other). the likert scale items are designed using a forced-choice likert scale (smith 2006), that is, an even number of response options (1 = strongly agree; 2 = agree; 3 = mildly agree; 4 = mildly disagree; 5 = disagree; 6 = strongly disagree), rather than an odd number (strongly agree; agree; neither agree nor disagree; disagree; strongly disagree). a scoring decision is consistently applied in order to have a meaningful interpretation of the scores. thus, for the likert scale items, the scaling method is to use high scores to represent stronger resistance to a measured attitude of online training. to insure reliability and validity of scores, the questionnaire was reviewed by an expert in the library field to validate if questions were representative of the library field. data collection the way a researcher plans to draw a sample is related to the best way to collect data (fowler 2002). the above sampling approach made it easier for data collection. the author collected data via the web survey company by paying for survey services on a monthly basis. the data was collected by the end of april 2006. the total number of participants was 292 (n=292), of which 260 were valid. thirty-two participants did not complete the survey; those surveys with missing data were excluded from analysis. survey results were saved in a text file and then downloaded into spss for analysis. ■ results and analysis beside general frequency analysis, the kruskal-wallis test was used for six ethnic groups. since some ethnic groups had small sample sizes, all minorities (48) were merged in one ethnic group. thus, the mann-whitney test was used for the two ethnic groups—minority and majority. the author also assessed bivariate relationships with preference of online training and other variables. frequencies analysis frequencies analysis includes demographics, preference of online training versus face-to-face training, budget, and professional development policies. demographics. eighty-five percent of participants were female, 81 percent were white, 49 percent had one master ’s degree, and 23 percent had two or more master ’s degrees. nearly 70 percent were forty years old or older; 45 percent were fifty years old or older. thirty-six percent had less than 10 years of library experience (see table 1). preference of online training versus face-to-face training. most participants (87.3 percent) reported that online training was less effective than traditional face-to-face training. generally speaking, fewer participants (33.9 percent) preferred online training: strongly agree (3.1 percent), agree (13.5 percent), and mildly agree (17.3 percent). more participants (66.1 percent) did not prefer online training: mildly disagree (28.8 percent), disagree (28.1 percent), and strongly disagree (9.2 percent). budget. 
fifty-five percent of participants somewhat agree their library allocates sufficient budget for training: strongly agree (8.8 percent), agree (25.8 percent), and mildly agree (20 percent). professional development policies. sixty-eight percent of participants somewhat agree their libraries had good professional development policies: strongly agree (13.5 percent), agree (30 percent), mildly agree (24.6 percent). table 2 shows the frequencies of preference of online training, budget, and policy. 36 information technology and libraries | march 200836 information technology and libraries | march 2008 kruskal-wallis test of ethnicity (α = .05) in the kruskal-wallis test for ethnicity, to match the total number of 48 minorities, 48 white people were randomly selected from 212. the test was not significantly different. in the kruskal-wallis test, chi-square is 2.222 (df = 4) and asymptotic significance was 0.715, which was greater than the criterion α = .05. there was no difference in preference for online training between ethnic groups. mann-whitney u test statistics of ethnicity (α = .05) the mann-whitney test of ethnicity was not significant. asymptotic significance is 0.81 (z = -.241), which was greater than the criterion α = .05. there was no difference in preference for online training between the minorities group and the group of white/not hispanic. mann-whitney u test statistics of gender (α = .05) the mann-whitney test of gender was not significant. asymptotic significance was 0.675 (z = -.419), which was greater than the chosen α value (α = .05). there was no significant difference in preference for online training between males and females. bivariate analysis (α = .05) bivariate correlations were computed (see table 3). preference for online training was not associated with age, years of education, years of library experience, sufficient training budget, or professional development policy. it makes sense to believe that traditional face-to-face training has better quality than online training. before the survey analysis, the author expected that younger employees would prefer online training and older ones would prefer traditional face-to-face training due to the older employees’ reluctance to change. it was also expected that highly educated employees would prefer online training while less educated ones, with fewer online skills, would prefer traditional face-to-face training. another assumption was that employees with more library experience would prefer online training while less experienced ones would prefer traditional face-to-face training. the survey showed these assumptions were wrong. it was also assumed that an insufficient training budget might result in a preference for online training, since online training is more cost effective; and that good professional development policies might result in preference for traditional face-to-face training because it is of better quality than online training. the survey found these assumptions to be false. training budget and professional development policies were irrelevant to the preference for online training. however, it was not surprising to find that preference for online training was associated with training providers and training locations, as seen in table 3. ■ discussion the exploration of the relationships among these variables revealed that the preference for online training was not related to demographics, budgets, or professional development policies. 
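the nonparametric tests reported above were run in spss, but equivalent calls are available in standard statistical libraries. the sketch below shows the same kruskal-wallis and mann-whitney comparisons using python’s scipy.stats; the likert responses in it are made-up placeholder values (1 = strongly agree through 6 = strongly disagree), included only to demonstrate the calls, not to re-create the study’s data or results.

# a minimal sketch of the nonparametric tests reported above, using python's
# scipy.stats in place of the spss procedures the author used. the likert
# scores below are invented placeholder values, not the study's data.
from scipy.stats import kruskal, mannwhitneyu

# preference-for-online-training scores keyed by ethnic group (toy values)
groups = {
    "asian/pacific islander": [3, 4, 5, 2, 4],
    "african american":       [4, 4, 3, 5, 2],
    "hispanic":               [5, 3, 4, 4, 3],
    "white, non-hispanic":    [4, 5, 3, 4, 6],
}

# kruskal-wallis: do the score distributions differ across several groups?
h_stat, p_value = kruskal(*groups.values())
print(f"kruskal-wallis h = {h_stat:.3f}, p = {p_value:.3f}")

# mann-whitney u: two-group comparison, e.g., minority vs. majority respondents
minority = [s for name, scores in groups.items()
            if name != "white, non-hispanic" for s in scores]
majority = groups["white, non-hispanic"]
u_stat, p_value = mannwhitneyu(minority, majority, alternative="two-sided")
print(f"mann-whitney u = {u_stat:.3f}, p = {p_value:.3f}")

merging all minority respondents into a single group before the mann-whitney comparison mirrors the merging step described above for the small ethnic-group samples.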
however, the preference for online training did show a correlation to training providers and locations. it was surprising to discover that the preference for online training was not associated with ethnicity, gender, age, education, or library experience.

table 1. demographic characteristics
characteristic                    frequency (n)    %
gender
  male                            40               15
  female                          220              85
ethnicity
  asian/pacific islander          22               8.5
  american indian                 2                0.8
  african american                17               6.5
  hispanic                        7                2.7
  white                           212              81.5
age
  20–29                           23               9.4
  30–39                           54               21.2
  40–49                           61               23.9
  50–59                           102              40
  60+                             14               5.5
  missing                         5                1.9
education
  less than 16 years              27               10.4
  16–17 years/bachelor            45               17.4
  18–19 years/one master          128              49.4
  20–21 years/two masters         43               16.6
  22+ years/doctorate             16               6.2
  missing data                    1                0.4
years of library experience
  less than 10 years              94               35.9
  10–19 years                     81               31.3
  20–29 years                     48               16.2
  more than 30 years              37               14.3
  missing data                    1                0.4

it was interesting to note that training budgets and professional development policies were not related to the preference for online training. several study hypotheses were confirmed. library staff preferred traditional face-to-face training as opposed to online training. although one-third (33.8 percent) of participants preferred (including mildly agree, agree, and strongly agree) online training, only 12.7 percent of participants thought that online training was more effective than traditional face-to-face training. on the other hand, the majority (80 percent) preferred online training when the training was held out of state; 56.2 percent preferred online training when it was held in state. the study concluded that online training was preferred if the training locations required participants to travel great distances from the library. of the participants, 63.1 percent preferred online training when the training was provided by a vendor. some participants did not think face-to-face contact was important for vendor training. this finding suggests that online training is a better choice for vendor training. fifty-five percent preferred online training when it was provided by an association/organization. association/organization trainers should consider a combination of online and traditional face-to-face training to meet the needs of the majority. online training can be provided for some specific tasks, and supplemented by face-to-face training for others. the following are survey summaries of key reasons to use online and traditional training, along with suggestions from the survey participants.
the following are survey summaries of key reasons to use online and traditional training, along with suggestions from the survey participants.

the main reasons to use online training
■ flexible (allows more people from one worksite to participate)
■ saves time
■ eliminates travel cost
■ generally lower training costs
■ ease of access (able to have hands-on practice with a technology and software program, able to refer back to supplemental materials, able to obtain a wider range of training, appropriate to give general overviews in preparation for more in-depth face-to-face training)
■ convenient (have some control over one's time, attend training from the comfort of home or office rather than having to drive somewhere and sit through a presentation, fits easily into a busy schedule, and self-paced in asynchronous online training)

the main reasons to use face-to-face training
■ questions and answers: able to ask questions and discuss answers, see immediate feedback, questions others are asking may include some that you didn't think of, and problems solved directly
■ networking with peers: face-to-face training allows for serendipitous networking opportunities, you have the option of personal conversations with trainers as well as social opportunities to meet other professionals, it is hard to meet people and make friends through online training, get out of the library once in a while, find out what experiences staff from other departments or libraries are having
■ better communication and interaction: have personal interaction with instructors and participants, share ideas and experiences with others, enjoy discussions and the diversity of personal opinions that come from face-to-face training
■ learn efficiently and effectively: learn from others, not just the instructor, get more out of real training, easy to get disinterested if there is no face-to-face contact, learn better from an instructor
■ technology barrier: sometimes technology can get in the way of training, some online training was poorly designed, and some online classes took forever to load and two seconds to read the whole page

table 2. frequencies of preference of online training, budget, and policy

descriptor                      mean*   median*   std. deviation
preference of online training   3.93    4         1.281
budget                          3.45    3.0       1.550
policy                          3.0     3.0       1.437

* 1 = strongly agree; 2 = agree; 3 = mildly agree; 4 = mildly disagree; 5 = disagree; 6 = strongly disagree

suggestions to improve library workplace training

administrative support. the most important factor is having library administrators who support training and encourage staff at all levels to attend training. provide workshops for professional librarians and civil service workers that relate to their work, and give them release time for training. library administrators must understand the importance of training and develop training policies with a commitment toward staff development. library administrators must plan and design training infrastructures for core competence and cumulative learning, instead of spontaneous one-shot training for new products or systems.

more training. many participants expressed their desire for more training. training not only increases the knowledge and skills needed for their jobs, but also provides opportunities to network with colleagues. more face-to-face and technical hands-on training is needed, since many librarians felt left out of the technology loop and think that maintaining a current view of developments in technology is difficult. more online training is needed, both asynchronous and synchronous.
asynchronous training is good for self-paced learning, which is preferred by many survey participants, while some enjoy online webcasts of seminars and workshops for better interaction. it is hoped that state libraries will provide online streaming videos on various topics for academic, private, and public library staff.

more funding. make more funding available for library workplace training. the training budget should not be the first thing cut when budgets get tight.

a combination of online and traditional face-to-face training. walton (1999) notes that we must ensure we learn and grow, and we may learn and grow by participating in workplace training. training programs should be built into strategic hrd plans that best fit employees' learning preferences. this study shows that online training works well with basic informational topics and most technology topics (databases, searching, or web-related technologies). certain simple topics, such as a vendor's product and procedural training, were more appropriate for online training. some topics do not translate well into online training, however, such as how to conduct storytimes, which require a lot of interaction between participants. difficult topics need traditional training for direct answers from the instructor, and topics that need in-depth discussion should be provided with traditional training. in other words, provide basic training online and save face-to-face training for more difficult topics.

table 3. bivariate correlations with preference of online training (α = .05)

variables                              preference of online training
age                                    .980
education                              .507
library experience                     .259
budget                                 .858
prof. development policies             .280
training provider
  vendors                              <.01*
  associations/org. (ala, oclc, etc.)  <.01*
  lib. consortia                       <.01*
  library/institution                  <.01*
training location
  out of state                         <.01*
  in state                             <.01*
  in town                              <.01*
  in house                             <.01*

* significant at α = .05

■ future research

future research should focus on new learning needs, how people interact with technology, and how people learn in an online environment. more research is needed on the varieties of online training. in this study, the generic term "online training" was used; future studies need to distinguish between static asynchronous online training and interactive synchronous online training. static online training includes text-only and text-graphic formats, with or without voice. interactive online training includes voice-only and voice-video formats, with the ability to ask and answer questions in real time. as time goes by, more people will have taken online training and will be more comfortable with it; as more people gain online training experience, their attitudes toward online training may change. further research should examine and measure library staff's preference for these varieties of online training. in addition, participants should be surveyed by grouping experienced and non-experienced online trainees. finally, studies may be conducted to survey library staff in other countries to compare their preferences with those of their u.s. peers.

the goal of this study is to provide helpful information for department heads, supervisors, and library human resources staff to assist them in determining the types of training that will be most effective to meet training needs.
the author hopes this study also provides useful information to all library employees who attend training or workshops, including civil service personnel and librarians, and that this study will be utilized for further research on library training and, in turn, that research will make more contributions to the workplace training literature of libraries and other professions.

acknowledgements

the author thanks lorraine lazouskas, john webb, judith carter, and the copy editors at ala production services for their assistance and valuable input on this manuscript.

bibliography

baumgartner, l. m. 2000. preface. in l. m. baumgartner and s. b. merriam (eds.), adult learning and development: multicultural stories. malabar, fla.: krieger publishing.
creswell, j. w. 2005. educational research: planning, conducting, and evaluating quantitative and qualitative research. 2nd ed. upper saddle river, n.j.: pearson merrill prentice hall.
crichton, s., and r. labonte. 2003. innovative practices for innovators: walking the talk; online training for online teaching. educational technology & society 6, no. 1: 70–73.
davis, j. r., and a. b. davis. 1998. effective training strategies: a comprehensive guide to maximizing learning in organizations. san francisco: berrett-koehler.
deist, f. d., and j. winterton. 2005. what is competence? human resource development international 8, no. 1: 27–46.
downey, s., r. m. wentling, t. wentling, and a. wadsworth. 2005. the relationship between national culture and the usability of an e-learning system. human resource development international 8, no. 1: 47–64.
ferris, a. m. 2002. cataloging internet resources using marc21 and aacr2: online training for working catalogers. cataloging and classification quarterly 34, no. 3: 339–353.
fowler, f. j. 2002. survey research methods. 3rd ed. thousand oaks, calif.: sage.
haley, c. k. 2006. who participates in online workplace training in libraries? survey results retrieved april 25, 2006, from http://freeonlinesurveys.com/viewresults.asp?surveyid=183507.
hake, b. j. 1999. lifelong learning in late modernity: the challenges to society, organizations, and individuals. adult education quarterly 49, no. 2: 79–90.
hiltz, s. r., and r. goldman, eds. 2004. learning together online. mahwah, n.j.: lawrence erlbaum associates.
hitch, l. p., and d. hirsch. 2001. model training. journal of academic librarianship 27, no. 1: 15–19.
kovacs, d. k. 2000. designing and implementing web-based training in libraries. business and finance division bulletin 113 (winter): 31–37.
macpherson, a., m. elliot, i. harris, and g. homan. 2004. e-learning: reflections and evaluation of corporate programme. human resource development international 7, no. 3: 295–313.
mahmood, n. h. n., a. ahmad, b. a. samah, and k. idris. 2004. informal learning of management knowledge and skills and transfer of learning among head nurses. in human resource development in asia: harmony and partnership, r. moon, a. m. osman-gani, k. shinil, g. roth, and h. oh, eds. seoul: the korea academy of hrd.
markless, s. 2002. learning about learning rather than about teaching. retrieved july 5, 2007, from http://www.ifla.org/iv/ifla68/papers/081-119e.pdf.
mcclelland, d. 1998. identifying competencies with behavioral-event interviews. psychological science 9, no. 5: 331–339.
merriam, s. b., and r. s. caffarella. 1999. learning in adulthood. san francisco: jossey-bass.
mezirow, j. 2000. learning to think like an adult: transformation theory; core concepts. in learning as transformation: critical perspectives on a theory in progress, j. mezirow and associates, eds. san francisco: jossey-bass.
nadler, d. a., and m. tushman. 1999. the organization of the future: strategic imperatives and core competencies for the 21st century. organizational dynamics 27, no. 1: 45–48.
prahalad, c. k., and g. hamel. 1990. the core competence of the corporation. harvard business review 68, no. 3: 79–91.
shank, p., and a. sitze. 2004. making sense of online learning. san francisco: pfeiffer.
sloan, t., c. j. daane, and j. giesen. 2002. mathematics anxiety and learning styles: what is the relationship in elementary preservice teachers? school science and mathematics 102, no. 2: 84–87.
smith, j. t. 2006. applied categorical data analysis. lecture presented in spring 2006 at northern illinois university, dekalb.
swanson, r. a. 2001. foundations of human resource development. san francisco: berrett-koehler.
thorkildsen, t. a. 2005. fundamentals of measurement in applied research. boston: pearson education.
unruh, d. l. 2000. desktop videoconferencing: the promise and problems of delivery of web-based training. internet and higher education 3, no. 3: 183–199.
walton, j. 1999. strategic human resource development. harlow, england: pearson education.
wang, g. g., and j. wang. 2004. toward a theory of human resource development learning participation. human resource development review 3, no. 4: 326–353.
waterhouse, s. 2005. the power of elearning: the essential guide for teaching in the digital age. boston: pearson education.

appendix. questionnaire

part i.
1. gender ❏ male ❏ female
2. ethnicity ❏ asian or pacific islander ❏ american indian ❏ african american ❏ hispanic ❏ white, non-hispanic ❏ other ____
3. please indicate the year of your birth: _________
4. please indicate years of education: _________
5. please indicate years of library experience: ________

part ii.
for questions 6–16, please read each item and check the response that best matches your degree of agreement/disagreement (1 = strongly agree; 2 = agree; 3 = mildly agree; 4 = mildly disagree; 5 = disagree; 6 = strongly disagree).
6. if training is provided by library vendors such as ebsco or blackwell, i would prefer that it be offered online rather than face-to-face.
7. if training is provided by associations/organizations such as ala and oclc, i would prefer that it be offered online rather than face-to-face.
8. if training is provided by library consortia, i would prefer that it be offered online rather than face-to-face.
9. if training is provided by your institution or library, i would prefer that it be offered online rather than face-to-face.
10. if the training location is out of state, i would prefer that it be offered online rather than face-to-face.
11. if the training location is in state, i would prefer that it be offered online rather than face-to-face.
12. if the training location is in town, i would prefer that it be offered online rather than face-to-face.
13. if the training location is in-house, i would prefer that it be offered online rather than face-to-face.
14. my library allocates sufficient budget for training (may include online training).
15. my library has good professional or staff development policies.
16. generally speaking, i prefer online training rather than face-to-face training.

part iii.
17. state reasons for your preference of traditional face-to-face training.
18. state reasons for your preference of online training.
19. please make suggestions to improve library workplace training.
20. do you think that online training is less effective than traditional face-to-face training? yes __ no __

the internet public library (ipl): an exploratory case study on user perceptions

monica maceli, susan wiedenbeck, and eileen abels

monica maceli (mgm36@drexel.edu) is a doctoral student, susan wiedenbeck (susan.wiedenbeck@drexel.edu) is a professor, and eileen abels (eabels@drexel.edu) is a professor at the college of information science and technology, drexel university, philadelphia.

the internet public library (ipl), now known as ipl2, was created in 1995 with the mission of serving the public by providing librarian-recommended internet resources and reference help. we present an exploratory case study on public perceptions of an "internet public library," based on qualitative analysis of interviews with ten college student participants: some current users and others unfamiliar with the ipl. the exploratory interviews revealed some confusion around the ipl's name and the types of resources and services that would be offered. participants made many positive comments about the ipl's resource quality, credibility, and personal help.
the internet public library (ipl), now known as ipl2, is an online-based public service organization and a learning and teaching environment originally developed by the university of michigan's school of information and currently hosted by drexel university's ischool. the ipl was created in 1995 as a project in a graduate seminar; a diverse group of students worked to create an online space that would be both a library and an internet institution, helping librarians and the public identify useful internet resources and content collections. with a strong mission to serve and educate a varied community of users, the ipl sought to help the public navigate the increasingly complex internet environment as well as advocate for the continuing relevance of librarians in a digital world. the resulting ipl provided online reference, content collections (such as ready reference and a full-text reading room), youth-oriented resources, and services for other librarians, all through its free, web-based presence.1 currently, the ipl consists of a publicly accessible website with several large content collections (such as "potus: presidents of the united states"), sections targeted toward teens and children ("teenspace" and "kidspace"), and a question and answer service where users can e-mail questions to be answered by volunteer librarians.2

there has been an enormous amount of change in the internet and digital libraries since the ipl's inception in 1995. while web use statistics, user feedback, and incoming patron questions indicate that the ipl remains well-used and valued, there are many questions about its place in an increasingly information-rich online environment. digital and physical holdings, academic and public libraries, free and subscription resources, internet encyclopedias, and a multitude of other offerings form a complex (and often overwhelming) information-seeking environment. to move forward effectively and to best serve its existing and potential users, the ipl must pursue a path that is adapted to the present state of the internet and that is user-informed and user-driven. recent large-scale studies, such as the 2005 oclc reports on perceptions of libraries and information resources, have begun to explore user perceptions of libraries in the complex internet environment.3 these studies emphasize the importance of user perceptions of library use, questioning whether libraries still matter in the rapidly growing infosphere and what future use trends might be. in the internet environment, user perceptions play a key role in use (or nonuse) of library resources and services as information-seekers are faced with myriad easily accessible electronic information sources. the ipl's name, for example, may or may not be perceived as initially helpful to users' information-seeking needs. repeat use relates to such perceptions as well, in the amount of value users perceive in the library's resources over the many other sources available. in beginning to explore such issues, there is a need for current research addressing user perceptions of an internet public library: what the name implies to both existing and potential users as well as the associated functions and resources that should be offered. in this study, we present an exploratory case study on public perceptions of the ipl. qualitative analysis of interviews with ten college students, some of whom are current users of the ipl and others with no exposure to the ipl, begins to yield an understanding of the public perception of what an internet public library should be. this study seeks to expand our understanding of such issues and explore the present-day requirements for the ipl by addressing the following research questions:

■■ what is the public perception of an internet public library?
■■ what services and materials should an internet public library offer?

■■ background

the ipl: origins and research

in 1995, joe janes, a professor at the university of michigan's school of information and library studies, ran a graduate seminar in which a group of students created a web-based library intended to be a hybrid of both physical library services and internet resources and offerings. the resulting ipl would take the best from both the physical and digital

there has also been a continuous evaluation of the role of the library in an increasingly digital world, a question janes sought to address in his first imaginings of the ipl. a study conducted in 2005 claimed that "electronic information-seeking by the public, both adults and children, is now an everyday reality and large numbers of people have the expectation that they should be able to seek information solely in a virtual mode if they so choose."12 this trend in electronic information-seeking has driven both public and academic libraries to create and support vast networks of licensed and free online information, directories, and guides. these electronic offerings, which (at least in theory) are desired and appreciated by users, are often overshadowed by the wealth of quickly accessible information from tools such as search engines.13 in competition with quickly accessible (though not necessarily credible or accurate) information sources, librarians have struggled to find their place and relevance in an evolving environment. google and other web search engines often shape users' experiences and expectations with information-seeking, more so than any formal librarian-driven instruction such as in boolean searching. several recent comprehensive studies have explored user perceptions of libraries, both physical and digital, in relationship to the larger internet.
abels explored the perspective of libraries and librarians across a broad population consisting of both library users and non-users.14 her findings included the fact that web search engines were the starting point for the majority of information-seeking, and that there is a high preference among users for virtual reference desk services. she proposed an information-seeking model in which the library serves as one of many internet resources, including free websites and interpersonal sources, and is likely not the user’s first stop. in respect to this model of information-seeking, abels suggests that “librarians need to accept the broader framework of the information seeker and develop services that integrate the library and the librarian into this framework.”15 in 2005, oclc released what is possibly the most comprehensive study to date of the public’s perceptions of library and information resources as explored on a number of levels, including both the physical and digital environments.16 findings relevant to libraries on the internet (and this study) included the following: ■■ 84 percent of participants reported beginning an information search from a search engine; only 1 percent started from a library website ■■ there was a preference for self-service and a tendency to not seek assistance from library staff ■■ users were not aware of most libraries’ electronic resources ■■ college students have the highest rate of library use ■■ users typically cross-reference other sites to validate their results worlds while developing its own unique offerings and features.4 janes had conceived the idea in 1994, when the internet’s continued growth began to make it clear that the role of libraries and librarians would be forever changed as a result. janes’ motivating question was “what does librarianship have to say to the network environment and vice versa?”5 the ipl tackled a broad mission of enhancing the value of the internet by providing resources to its varied users, educating and strengthening that community, and (perhaps most unique at the time) communicating “its’ creators vision of the unique roles of library culture and traditions on the internet.”6 initial student brainstorming sessions yielded the priorities that the ipl would address and included such services as reference, community outreach, and youth services. the first version of the ipl contained electronic versions of classic library offerings, such as magazines, texts, serials, newspapers, and an e-mail reference service. the ipl was well received and continued its development, adding and expanding resources to support specific communities such as teens and children. the ipl was awarded several grants over the next few years, allowing for expansion and continuation.7 a wealth of librarian volunteers, composed of students and staff, contributed to the ipl, in particular toward the e-mail reference services. with a stated goal of responding to patrons’ questions within one week, the reference services provide help and direct contact with the ipl’s user base, many of whom are students working on school assignments.8 the ipl’s collections are discoverable through search engines (popular offerings such as the “potus: presidents of the united states” resources rank highly in search results lists) and through its presence on social networking sites such as myspace, facebook, and twitter. additionally, ipl distributes brochures to teachers and librarians at relevant conferences. 
the ipl has been the focus of many research studies covering a broad range of themes, such as its history and funding, digital reference and the ipl’s question-andanswer service, and its resources and collections.9 also, in line with the original mission of the ipl, janes developed the internet public library handbook to share best practices with other librarians.10 the majority of publications, however, have focused on ipl’s reference service, which is uniquely placed as a librarian-staffed volunteer digital reference service. as the ipl has collected and retained all reference interactions since its inception in 1995, there is a wealth of data readily available to such studies and exploratory work into how best to analyze it.11 user perceptions of digital libraries the internet is a vastly different world than it was in the early days of the ipl’s creation. the expectations of library patrons, both in digital and in physical environments, have changed as well. and as the internet evolves 18 information technology and libraries | march 2011 of the public, which is the intention of this study. ■■ method this exploratory study consisted of a qualitative analysis of data gathered from interviews and observations of ten college student participants who were academic library users and nonusers of the ipl. a pilot study preceded the final research effort and allowed us to iteratively tailor the study design to best pursue our research questions. our initial study design incorporated a usability test portion, in which users were presented with a series of information-seeking needs and instructed to use the ipl’s website to answer the questions. however, we later dropped this portion of the study because pilot results found that it contributed little to answering our research questions about public perceptions; it largely explored implementation details, which was not the focus of this study. following the pilot study, we recruited ten drexel university students from the university’s w. w. hagerty library. this ensured recruiting participants who were at least minimally familiar with physical libraries and who were from a variety of academic focuses. the participant group included eight females and two males—two were graduate students, eight were undergraduates—from a variety of majors, including biology, biomedical engineering, business, library science, accounting, international studies, and information systems. participants took an average of twenty-six minutes to complete the study. the study consisted of a short interview to assess the user’s experience with public libraries (both physical and online) and their expectations of an internet public library. these open-ended questions (included in the appendix) sought to determine what features, services, or content were desired or expected by users, whether the term of “internet public library” was meaningful, if there were similarities to web-based systems that the participants were already familiar with, or if they had previously used a website they would consider an internet public library. all interviews were audio recorded and transcribed. an initial coding scheme was established and iteratively developed (table 1). 
once we observed significant overlap between participant responses, the study then proceeded to the final analysis and presentation, using inductive qualitative analysis to code text and identify themes from the data.22 ■■ findings all participants were current or former public library patrons; six participants (p1, p4, p5, p6, p8, and p9) were a portion of the study focused on library identity or brand in the mind of the public; participants found the library brand to be “books,” with no other terms or concepts coming close. as a companion report to this study, oclc released a report focused on the library perceptions of college students.17 as our study uses a college student participant base, oclc’s findings are highly relevant. the vast majority of college students reported using search engines as a starting point for informationseeking and expressed a strong desire for self-service library resources. as compared to the general population, however, college students have the highest rate and broadest use of both physical and digital library resources and a corresponding high awareness of these services. the relationship between public libraries and the internet was explored in depth in a 2002 study by d’elia et al.18 the study sought to systematically investigate patrons’ use of the internet and of public libraries. findings included the fact that the internet and public libraries are often complementary; that more than half of internet users were library users and vice versa; and that libraries are valued more than the internet for providing accurate information, privacy, and child-oriented spaces and services. participants made a distinction between the service characteristics of the public library versus those of the internet. many of the most-valued characteristics of the internet (such as information that is always available when needed) were not supported by physical libraries because of limited offerings and hours. in addition to large, comprehensive surveys, there have been several case-study approaches, exploring user perceptions of a particular digital library or library feature. tammaro researched user perceptions of an italian digital library, finding the catalog, online databases, and electronic journals to be most valued; she found speed of access, remote access, a larger number of resources, and personalization to be key digital library services.19 this study also reported a consistent theme in digital library literature: a patron base primarily consisting of novice users who do not know how to use the library and are unaware of the various services offered. crowley et al. evaluated an existing academic library’s webpages for issues and user needs.20 they identified issues with navigational structures and overly technical terminology and a general need for robust help and extensive research portals. in respect to our study, we found no literature that studied perceptions of internet public libraries. as mentioned earlier, research that addressed the ipl from the perspective of its patrons largely focused on ipl’s reference services. in 2008, ipl staff reported 13,857 reference questions received and 9,794,292 website visitors.21 although reference is clearly a vital and well-used service, there is also a great deal of website collection use that must be researched. recent literature does not address the current state of the ipl from the perspective the internet public library (ipl) | maceli, wiedenbeck, and abels 19 of such a library. 
a few remained confused about how such a concept would relate to physical public libraries and the internet in general. one participant assumed that such a term must mean the web presence of a particular physical public library. another’s immediate reaction was to question the value of such a venture in light of existing internet resources: “i mean, the internet is already useful, so i don’t know [how useful it would be]” (p2). two other participants found meaning in the term by associating it with a known library website, such as that of their academic library or local physical public library. when asked what websites seem similar in function or appearance to what they would consider an internet public library, responses varied. while most participants could not name any similar website or service, one mentioned several academic library websites that he was familiar with, another described several bookseller websites (amazon.com, half.com, and abebooks.com), and a third mentioned wikipedia (but then immediately retracted the statement, after deciding that wikipedia was not a library). theme 2: quick and easy, but still credible participants were highly enthusiastic about the perceived benefits in access to and credibility of information from an internet public library. ease of use and faster information access, often from home, were key motivators for use of internet-based libraries, both public and academic. as described earlier, there is a wealth of competing information options freely available on the internet. given this, participants felt that an internet public library would offer the most value because of its credible information: i like the ready reference [almanacs, encyclopedias]. . . . i’m not used to using any of these, wikipedia is just so ready and user friendly. it’s so easy to go to wikipedia but it’s not necessarily credible. . . . whereas i feel like this is definitely credible. it’s something i could use if i needed to in some sort of academic setting. (p10) theme 3: lack of differentiation between public and academic; physical and digital libraries for many participants, there was confusion about what was or was not a public library, and they initially considered their academic library in that category. overall, participants did not think of public and academic libraries (physical or on the internet) as distinctly different; rather they were more likely to be associated with phase of life. participants that were not current public library users reported using public libraries frequently during their years of elementary education. for participants that were current public library users, physical public libraries (and other local academic libraries) were used to fill in the gaps current public library users, and four (p2, p3, p7, and p10) had used public libraries in the past but were no longer using their services. two participants were graduate students (p3 and p9) with the remainder undergraduates, and two of the ten students had used the ipl website before (p3 and p6). the participants could be characterized as relatively infrequent public library users with a strong interest in the physical book holdings of the public library, primarily for leisure but frequently for research as well. several participants mentioned scholarly databases that were provided by their public library (typically from within the library or online with access using a public library card). 
there was also interest in leisure audiovisual offerings and in using the library as a destination for leisure. the following themes illustrate our main findings with respect to our research questions. as described above, we conceptualized our raw data into broad themes through an iterative process of inductive coding and analysis. although multiple themes emerged as associated with each of our research questions, we present only the most important and relevant themes (see table 2). all themes were supported by responses from multiple participants. we will further elaborate the themes discovered later in this section; a selected relevant and meaningful participant quote illustrates each theme. theme 1: confusion about name “internet public library” was not an immediately clear term to four of the participants; the six other participants were able to immediately begin describing their concept table 1. inductive coding scheme developed from raw transcript text, used to identify key themes coding scheme physical public libraries tied to life phase confusion between academic and public current use frequency of use perceptions of an internet public library access properties of physical libraries reference resources tools users general internet use academic library use similar sites to ipl 20 information technology and libraries | march 2011 would contain both electronic online items and locally available items in physical formats. in particular, connections to local physical libraries to share item holdings and availability status were desired: “general book information and maybe a list of where books can be found. like online, the local place you can find the books.” (p7) given that information-seeking, for this group, was conducted indiscriminately across physical and digital libraries, this integrated view into local physical resources seems to be a natural request. theme 6: personal and personalized help although no participants claimed that reference was a service that they typically use during their physical public library experiences, it was a strong expectation for an internet public library and mentioned by nearly every participant. when questioned as to how this reference interaction should take place, there was a clear preference for communicating via instant message: “reference information. . . . you know, where you have real people. a place where you can ask questions. . . . if you think you can get an answer at a library, then online you would hope to get the same things.” (p1) in addition to being able to interact with a “real” librarian, participants desired other personalized elements, such as resources and services dedicated to information needy populations (like children) as well as resources supporting the community and personal lifestyle issues and topics (like health and money). ■■ discussion in summary, we characterized the participants in this case study as low-frequency physical public library users with a high association between life phase (high school or grade school) and public library use. participants looked to public libraries to provide physical books—primarily for leisure but often for research use as well—leisure dvds and cds, scholarly databases, and a space to “hang for items that could not be located at their school’s academic library, either through physical or digital offerings. consistent with this finding, a few participants reported conducting searches across both local academic and public libraries in pursuit of a particular item. 
there was a general disregard for where the item came from, as long as it could be acquired with relatively little effort from physically close local or online resources. however, participants reported typically starting with their academic libraries for school resources and the public libraries for leisure materials “i go to the philadelphia public library probably once a month or so usually for dvds but sometimes for books that i can’t find here [academic library]. . . . i usually check here first because it’s closer.” (p5) theme 4: electronic resources, catalog, and searching tools are key there were many participant comments, and some confusion, around what type of resources an internet public library would provide, as well as whether they would be free or not (one participant assumed there would be a fee to read online). the desired resources (in order of importance) included leisure and research e-books, scholarly databases, online magazines and newspapers, and dvds and cds (pointers to where those physical items could be found in local libraries). a few comments were negative, assuming the resources provided would only be electronic, but participants were mostly enthusiastic about the types and breadth of resources that such a website would offer. for example, one participant commented, “i think you could get more resources. . . . the library i usually visit is kind of small so it’s very limited in the range of information you can find.” (p4) many participants emphasized the importance of providing robust, yet easy-to-use, search tools in managing complex information spaces and conveying item availability. theme 5: connections to physical libraries several participants assumed that the resource collection table 2. themes identified research question themes identified what is the public perception of an internet public library? confusion about name quick and easy, but still credible lack of differentiation between public and academic; physical and digital libraries what services and materials would such a website offer? electronic resources, catalog, and searching tools are key connections to physical libraries personal and personalized help the internet public library (ipl) | maceli, wiedenbeck, and abels 21 infosphere—their services and collections both physical and virtual.25 this is, like many issues in library systems design, a complex challenge. as previous research has shown, extending the metaphor of the physical library into the digital environment does not always assist users, especially when they may be more likely to draw on previous experiences with other internet resources.26 the original prospectus for the internet public library, as developed by joe janes, acknowledges the different capabilities of physical libraries and libraries on the internet, claiming that the ipl would “be a true hybrid, taking the best from both worlds but also evolving its own features.”27 if users anticipate an experience similar to the internet resources they typically use (such as search engines), then the ipl may best serve its users by moving closer to “internet” than “library.” however, such a choice may entail unforeseen tradeoffs. several participants in this study mused over what physical public library characteristics would carry over to a digital public library and the potential tradeoffs: “you wouldn’t have to leave your home but at the same time i think it’s easier to wander the library and just see things that catch your eye. 
and i like the quiet setting of the library too.” (p8) another participant mentioned the distinctly positive public library experience, and how such an experience should be reflected in an internet-based public library: “i think that public libraries have a very positive reputation within communities. and i don’t think it would be bad for an internet public library to move toward that expectation that people have.” (p3) the question remains, then, whether the ipl can compete with a multitude of other internet resources without losing the familiar and positive essence of a traditional physical public library. or rather, how can the ipl find a way to translate that essence to a digital environment without sacrificing performance and user expectations of internet services? ■■ conclusion during this study, participants described an internet public library that, in many ways, takes the best features of several currently existing and popular websites. an internet public library should contain all the information of wikipedia, yet be as credible as information received directly from your local librarian. it should search across both websites and physical holdings, like abebooks.com or a search aggregator. it should search as powerfully and as easily as google, yet return fewer, more targeted results. and it should provide real-time help immediately and conveniently, all from the comfort of your home. out” or occupy leisure time. for the participants, an internet public library (an occasionally confusing term) described a service you could access from home, which included electronic books, information about locally available physical books, scholarly databases, reference or help services, and robust search tools. it must be easy to use and tailored to needy community populations such as children and teens. for several participants it would be similar to existing bookseller websites (such as amazon. com or abebooks.com) or academic library websites. in exploring how these findings can inform the future design and direction of the ipl, it is again necessary to reflect on the values and concepts that inspired the original creation of the ipl. the initial choice of the ipl’s name was intended to reflect a novel system at the time, as joe janes detailed in the ipl prospectus: “i would view each of those three words as equally important in conveying the intent of this project: internet, public, and library. i think the combination of the three of them produces something quite different than any pair or individual might suggest.”23 all three of these concepts—internet, public, and library—have evolved with the changing nature of the internet. and, as the research explored would indicate, there may not be a distinct boundary between these concepts from the perspective of users. our finding that participants seek information by indiscriminately crossing public and academic libraries, as well as digital and physical resource formats verifies earlier research efforts.24 as the amount of information accessible on the internet has expanded, the boundary of the library can be seen as either expanding (providing credible indexing, pointers, and information about useful resources from all over the internet), contracting (primarily providing access to select resources that must be accessed through subscription), or existing somewhere in between, depending on the perspective. 
in any of these cases, it is vital that the ipl present its resources, services, and offerings such that its value and contribution to information-seeking is highlighted and clear to users. amorphously placed in a complex world of digital and physical information, the ipl must work toward creating a strong image of its offering and mission; an image that is transparent to its users, starting with its name. this challenge is not the ipl’s alone, but rather that of all internet library portals, resources, and services. the 2005 oclc report on perceptions of libraries expressed the importance of a strengthened image for internet libraries: libraries will continue to share an expanding infosphere with an increasing number of content producers, providers and consumers. information consumers will continue to self-serve from a growing information smorgasbord. the challenge for libraries is to clearly define and market their relevant place in that 22 information technology and libraries | march 2011 library,” journal of electronic publishing 3, no. 2 (1997). 8. david s. carter and joseph janes, “unobtrusive data analysis of digital reference questions and service at the internet public library: an exploratory study,” library trends 49, no. 2 (2000): 251–65. 9. on the ipl’s history and funding, see barbara hegenbart, “the economics of the internet public library,” library hi tech 16, no. 2 (1998): 69–83; joseph janes, “serving the internet public: the internet public library,” electronic library 14, no. 2 (1996): 122–26; and carter and janes, “unobtrusive data analysis,” 251–65. on digital reference and ipl’s question-andanswer service, see kenneth r. irwin, “professional reference service at the internet public library with ‘freebie’ librarians,” searcher—the magazine for database professionals 6, no. 9 (1998): 21–23; nettie lagace and michael mcclennen, “questions and quirks: managing an internet-based distributed reference service,” computers in libraries 18, no. 2 (1998): 24–27; sara ryan, “reference service for the internet community: a case study of the internet public library reference division,” library & information science research 18, no. 3 (1996): 241–59; and elizabeth shaw, “real time reference in a moo: promise and problems,” internet public library, http://www.ipl.org/div/iplhist/moo .html (accessed dec. 4, 2008). on the ipl’s resources and collections, see thomas pack, “a guided tour of the internet public library—cyberspace’s unofficial library offers outstanding collections of internet resources,” database 19, no. 5 (1996): 52–56. 10. joseph janes, the internet public library handbook (new york: neal schuman, 1999). 11. carter and janes, “unobtrusive data analysis,” 251–65. 12. gloria j. leckie and lisa m. given, “understanding information-seeking: the public library context,” advances in librarianship 29 (2005): 1–72. 13. james rettig, “reference service: from certainty to uncertainty,” advances in librarianship 30 (2006): 105–43. 14. eileen abels, “information seekers’ perspectives of libraries and librarians,” advances in librarianship 28 (2004): 151–70. 15. ibid., 168. 16. cathy de rosa et al., “perceptions of libraries.” 17. cathy de rosa et al., “college students’ perceptions of libraries.” 18. george d’elia et al., “the impact of the internet on public library use: an analysis of the current consumer market for library and internet services,” journal of the american society for information science & technology 53, no. 10 (2002): 802–20. 19. 
anna maria tammaro, “user perceptions of digital libraries: a case study in italy,” performance measurement & metrics 9, no. 2 (2008): 130–37. 20. gwyneth h. crowley et al., “user perceptions of the library’s web pages: a focus group study at texas a&m university,” the journal of academic librarianship 28, no. 4 (2002): 205–10. 21. adam feldman, e-mail to author, apr. 3, 2009; mark galloway, e-mail to author, apr. 3, 2009. 22. for information on inductive qualitative analysis, see david r. thomas. “a general inductive approach for analyzing qualitative evaluation data” american journal of evaluation 27, no. 2 (2006): 237–46; michael quinn patton, qualitative research and evaluation methods (thousand oaks, calif.: sage, 2002); these are clearly complex, far-reaching, and labor-intensive requirements. and many of these requirements are currently difficult and unresolved challenges to digital libraries in general, not simply the ipl. this preliminary study is limited in its college student participant base and small sample size, which may not reflect perspectives of the greater community of ipl users. these results therefore may not be generalizable to other populations who are current or potential users of the ipl, including other targeted groups such as children and teens. additionally, our chosen participant group, college students who are physical library users, had relatively high levels of library and technology experience, as well as complex expectations. our results would likely differ with a participant group of novice internet users. as detailed above, this study explores public perceptions of an internet public library—an important aspect of the ipl that is not well studied and that has implications on ipl use and repeat use. while the ipl was carefully and thoughtfully constructed by a dedicated group of librarians, students, and educators, there has not been a recent study devoted to understanding what an internet public library should be today. more recently, in january 2010, the ipl merged with the librarians’ internet index to form ipl2. the two collections were merged and the website was redesigned. although this merger was because of circumstances unrelated to our research, our findings were leveraged during the redesign (for example, in naming the collections). in the future, our findings can be used in further ipl2 design iterations or explored in subsequent research studies in the specific context of ipl2 or of digital libraries in general. as discussed above, this study may be extended to different participant populations and to existing but remote ipl2 users. this study may also be continued in a more design-oriented direction to explore the usability and user acceptance of ipl2’s website. references 1. joseph janes, “the internet public library: an intellectual history,” library hi tech 16, no. 2 (1998): 55–68. 2. “about the internet public library,” internet public library, http://ipl.org/div/about/ (accessed feb. 17, 2009). 3. cathy de rosa et al., “perceptions of libraries and information resources,” oclc online computer library center, 2005, http://www.oclc.org/reports/pdfs/percept_all .pdf (accessed mar. 9, 2009); cathy de rosa et al., “college students’ perceptions of libraries and information resources,” oclc online computer library center, 2005, http://www .oclc.org/reports/pdfs/studentperceptions.pdf (accessed mar. 9, 2009). 4. janes, “the internet public library,” 55. 5. ibid., 56. 6. ibid., 57. 7. 
lorrie lejeune, “before its time: the internet public the internet public library (ipl) | maceli, wiedenbeck, and abels 23 american society for information science & technology 58, no. 3 (2007): 433–45. 25. de rosa et al., “college students’ perceptions of libraries,” 146. 26. makri et al., “a library or just another information resource?” 434. 27. joseph janes, “the internet public library,” 56. and matthew b. miles and michael huberman, qualitative data analysis: an expanded sourcebook, 2nd ed. (thousand oaks, calif.: sage, 1994). 23. janes, “the internet public library,” 56. 24. for example, stephann makri et al., “a library or just another information resource? a case study of users’ mental models of traditional and digital libraries,” journal of the appendix. interview protocol ■■ have you ever visited a public library? ■❏ if so, how often do you visit and why? ■❏ what services do you typically use? ■❏ can you describe your last visit and what you were looking for? ■❏ what do you think an internet public library would be? ■■ what sort of services would it offer? ■■ what else should it do? ■■ have you ever visited an internet public library? jeng ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ 
฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ 
฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ 
for those producers of content who are not able to meet the requirements of ingest, or who do not have access to an oais archive provider, what are the options? with the recent downturn in the economy, the availability of staff and the funding for the support of digital libraries has no doubt left many collections at risk of abandonment. is there a method for preparation of content for long-term storage that is within the reach of existing staff with few technical skills? if the content cannot get to the safe harbor of a trusted digital library, is it consigned to extinction? or are there steps we can take to mitigate the potential loss? the oais model incorporates six functional entities: ingest, data management, administration, preservation planning, archival storage, and access.9 of these six, only archival storage is primary; all the others are useless without the actual content. and if the content cannot be accessed in some form, the storage of it may also be useless. therefore the minimal components that must be met are those of archival storage and some form of access. the lowest cost and simplest option for archival storage currently available is the distribution of multiple copies dispersed across a geographical area, preferably on different platforms, as recommended by the current lockss initiative,10 which focuses on bit-level preservation.11 private lockss network models (such as the alabama digital preservation network)12 are the lowest-cost implementation, requiring only hardware, membership in lockss, and a small amount of time and technical expertise. reduction of the six functional entities to only two negates the need for a tremendous amount of metadata collection. in contrast, other leaders of the digital preservation movement have been stating for years that benign neglect is not a workable solution for digital materials. eric van de velde, director of caltech's library information technology group, stated that the "digital archive must be actively managed."3 tom cramer of stanford university agrees: "benign neglect doesn't work for digital objects. preservation requires active, managed care."4 the digital preservation europe website argues that benign neglect of digital content "is almost a guarantee that it will be inaccessible in the future."5 abby smith goes so far as to say that "neglect of digital data is a death sentence."6 arguments to support this statement are primarily those of media or data carrier storage fragility and obsolescence of hardware, software, and format. however, the impact of these arguments can be reduced to a manageable nightmare. by removing as much as possible of the intermediate systems, storing open-source code for the software and operating system needed for access to the digitized content, and locating archival content directly on the file system itself, we reduce the problems to primarily that of format obsolescence. this approach will enable us to forge ahead in the face of our lack of resources and our rather desperate need for rapid, cheap, and pragmatic solutions.
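the bit-level preservation that lockss-style replication provides rests on a simple idea: keep complete copies of the same content in several places and routinely compare checksums so that silent corruption in one copy can be detected and repaired from another. the following is a minimal sketch of that idea only; it is a local fixity audit written for illustration, not the lockss software, and the directory paths are hypothetical.

```python
import hashlib
from pathlib import Path

def checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hex digest of one file, read in chunks so large masters fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def manifest(root):
    """Map each file's path, relative to one copy's root, to its checksum."""
    root = Path(root)
    return {str(p.relative_to(root)): checksum(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def audit(copy_a, copy_b):
    """Report files missing from one copy and files whose bits differ."""
    a, b = manifest(copy_a), manifest(copy_b)
    missing = sorted(set(a) ^ set(b))
    damaged = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return missing, damaged

if __name__ == "__main__":
    # Hypothetical on-site and off-site replicas of the same archival tree.
    missing, damaged = audit("/archive/onsite", "/mnt/offsite/archive")
    print("missing from one copy:", missing)
    print("checksum mismatches:", damaged)
```

roughly speaking, a private lockss network performs an audit of the same character, but by polling among geographically dispersed peer caches rather than by mounting both copies on one machine.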
current long-term preservation archives operating within the open archival information system (oais) model assume that producers can meet the requirements of ingest.7 however, the amount of content that needs to be deposited into archives and the expanding variety of formats and genres that are unsupported are overwhelming the ability of depositors to prepare content for preservation. andrea goethals of harvard proposed that we revisit assumptions of producer ability to prepare content for deposit in accordance with the current best practices.8
benign neglect: developing life rafts for digital content
in his keynote speech at the archiving 2009 conference in arlington, virginia, clifford lynch called for the development of a benign neglect model for digital preservation, one in which as much content as possible is stored in whatever manner available in hopes of there someday being enough resources to more properly preserve it. this is an acknowledgment of current resource limitations relative to the burgeoning quantities of digital content that need to be preserved. we need low cost, scalable methods to store and preserve materials. over the past few years, a tremendous amount of time and energy has, sensibly, been devoted to developing standards and methods for best practices. however, a short survey of some of the leading efforts clarifies for even the casual observer that implementation of the proposed standards is beyond many of those who are creating or hosting digital content, particularly because of restrictions on acceptable formats, requirements for extensive metadata in specific xml encodings, need for programmers for implementation, costs for participation, or simply a lack of a clear set of steps for the uninitiated to follow (examples include: planets, premis, dcc, caspar, irods, sound directions, hathitrust).1 the deluge of digital content, coupled with the lack of funding for digital preservation and exacerbated by the expanding variety of formats, makes the application of extensive standards and extraordinary techniques beyond the reach of the majority. given the current circumstances, lynch says, either we can seek perfection and store very little, or we can be sloppy and preserve more, discarding what is simply intractable.2 jody l. deridder (jlderidder@ua.edu) is head, digital services, university of alabama. one of the drawbacks to gathering and storing content developed during digitization is that developing digital libraries usually have a highly chaotic disorganization of files, directory structures, and metadata that impede digital preservation readiness.19 if the archival digital files cannot be easily and readily associated with the metadata that provides their context, and if the files themselves are not organized in a fashion that makes their relationships transparent, reconstruction of delivery at some future point is seriously in question. underfunded cultural heritage institutions need clear specifications for file organization and preparation that they are capable of meeting without programming staff or extensive time commitments. particularly in the current economic downturn, few institutions have the technical skills to create mets wrappers to clarify file relationships.20 one potential solution is to use the organization of files in the file system itself to communicate clearly to future archivists how the files relate to one another.
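a minimal sketch of what such file-system-based organization might look like follows: a small routine that derives the directory path from an item's identifiers and drops the metadata file beside the master it describes. the identifiers and file names here are invented placeholders, not the university of alabama scheme described next, which uses its own numbering conventions.

```python
from pathlib import Path

def store_item(archive_root, institution, collection, item, sequence,
               master_bytes, metadata_text):
    """Write an archival master and its matching metadata file into a directory
    tree whose path mirrors the file name, so their relationship is legible
    from the file system alone. All identifiers here are hypothetical."""
    name = f"{institution}_{collection}_{item}_{sequence:04d}"
    folder = Path(archive_root) / institution / collection / item
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"{name}.tif").write_bytes(master_bytes)    # archival master
    (folder / f"{name}.txt").write_text(metadata_text)    # matching metadata
    return folder / f"{name}.tif"

if __name__ == "__main__":
    # produces /archive/u0001/0002/0003/u0001_0002_0003_0004.tif beside
    # /archive/u0001/0002/0003/u0001_0002_0003_0004.txt (hypothetical paths)
    store_item("/archive", "u0001", "0002", "0003", 4,
               master_bytes=b"...image data...",
               metadata_text="title: sample page\ncollection: hypothetical papers\n")
```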
at the university of alabama, we have adopted a standardized file naming system that organizes content by the holding institution and type, collection, item, and then sequence of delivery (see figure 1). the file names are echoed in the file system: top level directories match the holding institution number sequence, secondary level directory names match the assigned collection number sequence, and so forth. metadata and documentation are stored at whatever level in the file system corresponds to the files to which they apply, and these text and xml files have file names that also correspond to the files to which they apply, which assists further in identification (see figure 2).21 by both naming and ordering the files according to the same system, and bypassing the need for databases, complex metadata schemes and software, we leverage the simplicity of the file system to bring order to chaos and to enable our content to be easily reconstructed by future systems. the relay principle states that a preservation system should support its own migration. preserving any type of digital information requires preserving the information's context so that it can be interpreted correctly. this seems to indicate that both the intellectual context and the logical context need to be provided. context may include provenance information to verify authenticity, integrity, and interpretation;17 it may include structural information about the organization of the digital files and how they relate to one another; and it should certainly include documentation about why this content is important, for whom, and how it may be used (including access restrictions). because the cost of continued migration of content is very high, a method of mitigating that cost is to allow content to become obsolete but to support sufficient metadata and contextual information to be able to resurrect full access and use at some future time—the resurrection principle. to be able to resurrect obsolete materials, it would be advisable to store the content with open-source software that can render it, an open-source operating system that can support the software, and separate plain-text instructions for how to reconstruct delivery. in addition, underlying assumptions of the storage device itself need to be made explicit if possible (type of file system partition, supported length of file names, character encodings, inode information locations, etc.). some of the need for this form of preservation may be diminished through such efforts as the planets timecapsule deposit.18 this consortium has gathered the supporting software and information necessary to access current common types of digital files (such as pdf) for long-term storage in swiss fort knox. where the focus has been on what is the best metadata to collect, the question becomes: what is the minimal metadata and contextual information needed? the following is an attempt to begin this conversation in the hope that debate will clarify and distill the absolutely necessary and specific requirements to enable long-term access with the lowest possible barrier to implementation. if we consider the purpose of preservation to be solely that of ensuring long-term access, it is possible to selectively identify information for inclusion.
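one way to make that question concrete is to write the answer down next to the content itself. the sketch below emits the kind of plain-text context file the resurrection principle seems to call for; the field list and sample values are illustrative only and are not drawn from any published metadata standard.

```python
from pathlib import Path

# An illustrative minimum of context: enough that a future archivist can judge
# why the material matters, how it is organized, and how to bring it back.
# Field names and sample values are placeholders, not a standard.
CONTEXT = [
    ("what this is", "scanned city council minutes, 1902-1918 (hypothetical)"),
    ("why it matters and to whom", "primary source for local government history"),
    ("provenance", "digitized in 2010 from bound volumes held by the city clerk"),
    ("access restrictions", "none; public domain"),
    ("how the files relate", "one directory per volume; one tiff per page, numbered in reading order"),
    ("rendering software", "any tiff viewer; source for an open-source viewer stored in software/"),
    ("storage assumptions", "ext3 file system, utf-8 file names, names under 255 bytes"),
]

def write_context(directory):
    """Drop a human-readable context.txt beside the content it describes."""
    text = "\n".join(f"{label}: {value}" for label, value in CONTEXT) + "\n"
    Path(directory, "context.txt").write_text(text, encoding="utf-8")

write_context(".")
```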
the recent proposal by the researchers of the national geospatial digital archive (ngda) may help to direct our focus. they have defined three architectural design principles that are necessary to preserve content over time: the fallback principle, the relay principle, and the resurrection principle.13 in the event that the system itself is no longer functional, then a preservation system should support some form of hand-off of its content—the fallback principle. this can be met by involvement in lockss, as specified above. lacking the ability to support even this, current creators and hosts of digital content may be at the mercy of political or private support for ingest into trusted digital repositories.14 the recently developed bagit file package format includes valuable information to ensure uncorrupted transfer for incorporation into such an archive.15 each base directory containing digital files is considered a bag, and the contents can be any types of files in any organization or naming convention; the software tags the content (or payload) with checksums and manifest, and bundles it into a single archive file for transfer and storage. an easily usable tool to create these manifests has already been developed to assist underfunded cultural heritage organizations in preparing content for a hosting institution or government infrastructure willing to preserve the content.16 the gap of who would take and manage the content is still uncertain. while no programmers are needed to organize content into such a clear, consistent, and standardized order, we are developing scripts that will assist others who seek to follow this path. these scripts not only order the content, they also create lockss manifests at each level of the content, down to the collection level, so that the archived material is ready for lockss pickup. a standardized lockss plugin for this method is available. to assist in providing access without a storage database, we are also developing an open-source web delivery system (acumen),22 which dynamically collects content from this protected archival storage arrangement (or from web-accessible directories) and provides online delivery of cached derivatives and metadata, as well as webcrawler-enabled content to expand accessibility. this model of online delivery will enable low cost, scalable development of digital libraries by simply ordering content within the archival storage location. providing simple, clear, accessible methods of preparing content for preservation, of duplicating archival treasures in lockss, and of web accessibility without excessive cost or deep web database storage of content, will enable underfunded cultural heritage institutions to help ensure that their content will continue to survive the current preservation challenges. as david seaman pointed out, the more a digital item is used, the more it is copied and handled, the more it will be preserved.23 focusing on archival storage (via lockss) and accessibility of content fulfills the two most primary oais functional capabilities and provides a life raft option for those who are not currently able to surmount the forbidding tsunami of requirements being drafted as best practices for preservation. the importance of offering feasible options for the continued support of the long tail of digitized content cannot be overstated. while the heavily funded centers may be able to preserve much of the content under their purview, this is only a small fraction of the valuable digitized material currently facing dissolution in the black hole of our cultural memory. as clifford lynch pointed out, funding cutbacks at the sub-federal level are destroying access and preservation of government records; corporate records are winding up in the trash; news is lost daily; and personal and cultural heritage materials are disappearing as we speak.24 it is valuable and necessary to determine best practices and to seek to employ them to retain as much of the cultural and historical record as possible, and in an ideal world, these practices would be applied to all valuable digital content. but in the practical and largely resource-constrained world of most libraries and other cultural institutions, this is not feasible. the scale of content creation, the variety and geographic dispersal of materials, and the cost of preparation and support makes it impossible for this level of attention to be applied to the bulk of what must be saved. for our cultural memory from this period to survive, we need to communicate simple, clear, scalable, inexpensive options to digital holders and creators.
figure 1. university of alabama libraries digital file naming scheme (©2009. used with permission.)
figure 2. university of alabama libraries metadata organization (©2009. used with permission.)
references
1. planets consortium, planets preservation and long-term access through networked services, http://www.planets-project.eu/ (accessed mar. 29, 2011); library of congress, premis (preservation metadata maintenance activity), http://www.loc.gov/standards/premis/ (accessed mar. 29, 2011); dcc (digital curation centre), http://www.dcc.ac.uk/ (accessed mar. 29, 2011); caspar (cultural, artistic, and scientific knowledge for preservation, access, and retrieval), http://www.casparpreserves.eu/ (accessed mar. 29, 2011); irods (integrated rule-oriented data system), https://www.irods.org/index.php/irods:data_grids,_digital_libraries,_persistent_archives,_and_real-time_data_systems (accessed mar. 29, 2011); mike casey and bruce gordon, sound directions: best practices for audio preservation, http://www.dlib.indiana.edu/projects/sounddirections/paperspresent/sd_bp_07.pdf (accessed june 14, 2010); hathitrust: a shared digital repository, http://www.hathitrust.org/ (accessed mar. 29, 2011).
2. clifford lynch, challenges and opportunities for digital stewardship in the era of hope and crisis (keynote speech, is&t archiving 2009 conference, arlington, va., may 2009).
3. jane deitrich, e-journals: do-it-yourself publishing, http://eands.caltech.edu/articles/e%20journals/ejournals5.html (accessed aug. 9, 2009).
4. tom cramer, quoted in art pasquinelli, "digital libraries and repositories: issues and trends" (sun microsystems presentation at the summit bibliotheken, universitätsbibliothek kassel, 18–19 mar. 2009), slide 12, http://de.sun.com/sunnews/events/2009/bibsummit/pdf/2-art-pasquinelli.pdf (accessed july 12, 2009).
5. digital preservation europe, what is digital preservation? http://www.digitalpreservationeurope.eu/what-is-digital-preservation/ (accessed june 14, 2010).
6. abby smith, "preservation," in susan schreibman, ray siemens, john unsworth, eds., a companion to digital humanities (oxford: blackwell, 2004), http://www.digitalhumanities.org/companion/ (accessed june 14, 2010).
7. consultative committee for space data systems, reference model for an open archival information system (oais), ccsds 650.0-b-1 blue book, jan. 2002, http://public.ccsds.org/publications/archive/650x0b1.pdf (accessed june 14, 2010).
8. andrea goethals, "meeting the preservation demand responsibly = lowering the ingest bar?" archiving 2009 (may 2009): 6.
9. consultative committee for space data systems, reference model.
10. stanford university et al., lots of copies keep stuff safe (lockss), http://www.lockss.org/lockss/home (accessed mar. 29, 2011).
11. david s. rosenthal et al., "requirements for digital preservation systems: a bottom-up approach," d-lib magazine 11 (nov. 2005): 11, http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html (accessed june 14, 2010).
12. alabama digital preservation network (adpnet), http://www.adpn.org/ (accessed mar. 29, 2011).
13. greg janée, "preserving geospatial data: the national geospatial digital archive's approach," archiving 2009 (may 2009): 6.
14. research libraries group/oclc, trusted digital repositories: attributes and responsibilities, http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf (accessed july 17, 2009).
15. andy boyko et al., the bagit file packaging format (0.96) (ndiipp content transfer project), http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf (accessed july 18, 2009).
16. library of congress, bagit library, http://www.digitalpreservation.gov/partners/resources/tools/index.html#b (accessed june 14, 2010).
17. andy powell, pete johnston, and thomas baker, "domains and ranges for dcmi properties: definition of the dcmi term provenance," http://dublincore.org/documents/domain-range/index.shtml#provenancestatement (accessed july 18, 2009).
18. planets consortium, planets time capsule—a showcase for digital preservation, http://www.ifs.tuwien.ac.at/dp/timecapsule/ (accessed june 14, 2010).
19. martin halbert, katherine skinner, and gail mcmillan, "avoiding the calf-path: digital preservation readiness for growing collections and distributed preservation networks," archiving 2009 (may 2009): 6.
20. library of congress, metadata encoding and transmission standard (mets), http://www.loc.gov/standards/mets.
21. jody l. deridder, "from confusion and chaos to clarity and hope," in digitization in the real world: lessons learned from small to medium-sized digitization projects, ed. kwong bor ng and jason kucsma (metropolitan new york library council, n.y., 2010).
22. tonio loewald and jody deridder, "metadata in, library out. a simple, robust digital library system," code4lib journal 10 (2010), http://journal.code4lib.org/articles/3107 (accessed aug. 29, 2010).
23. david seaman, "the dlf today" (keynote presentation, 2004 symposium on open access and digital preservation, atlanta, ga.), paraphrased by eric lease morgan in musings on information and librarianship, http://infomotions.com/musings/openaccesssymposium/ (accessed aug. 9, 2009).
24. lynch, challenges and opportunities.
guest editorial clifford lynch information technology and libraries | march 2012
congratulations lita and information technology and libraries.
since the early days of the internet, i’ve been continually struck by the incredible opportunities that it offers organizations concerned with the creation, organization, and dissemination of knowledge to advance their core missions in new and more effective ways. libraries and librarians were consistently early and aggressive in recognizing, seizing, and advocating for these opportunities, though they’ve faced—and continue to face—enormous obstacles ranging from copyright laws to the amazing inertia of academic traditions in scholarly communication. yet the library profession has been slow to open up access to the publications of its own professional societies, to take advantage of the greater reach and impact that such policies can offer. making these changes is not easy: there are real financial implications that suddenly seem very serious when you are a member of a board of directors, charged with a fiduciary duty to your association, and you have to push through plans to realign its finances, organizational mission, and goals in the new world of networked information. so, as a long-time lita member, i find it a great pleasure to see lita finally reach this milestone with information technology and libraries (ital) moving to fully open-access electronic distribution, and i congratulate the lita leadership for the persistence and courage to make this happen. it’s a decision that will, i believe, make the journal much more visible, and a more attractive venue for authors; it will also make it easier to use in educational settings, and to further the interactions between librarians, information scientists, computer scientists, and members of other disciplines. on a broader ala-wide level, ital now joins acrl’s college & research libraries as part of the american library association’s portfolio of open-access journals. supporting ital as an open-access journal is a very good reason indeed to be a member of lita. clifford lynch (clifford@cni.org) is executive director, coalition for networked information. mailto:clifford@cni.org editor’s comments bob gerrity information technology and libraries | march 2012 4 welcome to the first issue of information technology and libraries (ital) as an open-access, eonly publication. as announced to lita members in early january, this change in publishing model will help ensure the long-term viability of ital by making it more accessible, more current, more relevant, and more environmentally friendly. ital will continue to feature high-quality articles that have undergone a rigorous peer-review process, but it will also begin expanding content to include more case studies, commentary, and information about topics and trends of interest to the lita community and beyond. look for a new scope statement for ital shortly. we’re pleased to include in this issue the winning paper from the 2011 lita/ex libris student writing award contest, abigail mcdermott’s overview on copyright law. we also have two lengthier-than-usual studies on library discovery services. the first, jason vaughan’s overview of his library’s investigations into web-scale discovery options, was accepted for publication more than a year ago, but due to its length did not make it into “print” until now, since we no longer face the constraints associated with the production of a print journal. the second study, by jody condit fagan and colleagues at james madison university, focuses on discovery-tool usability. 
jimmy ghaphery and erin white provide a timely overview of the results of their surveys on the use and management of web-based research guides. tomasz neugebauer and bin han offer a strategy and workflow for batch importing electronic theses and dissertations (etds) into an eprints repository. with the first open-access, e-only issue launched, our attention will be turned to updating and improving the ital website and expanding the back content available. our goal is to have all of the back issues of both ital and its predecessor, journal of library automation (jola), openly available from the ital site. we’ll also be exploring ways to better integrate the italica blog and the ital preprints site with the main site. suggestions and feedback are welcome, at the e-mail address below. bob gerrity (robert.gerrity@bc.edu) is associate university librarian for information technology, boston college libraries, chestnut hill, massachusetts. selecting a web content management system for an academic library website | black 185 others. the osu libraries needed a content management system (cms). web content management is the discipline of collecting, organizing, categorizing, and structuring information that is to be delivered on a website. cmss support a distributed content model by separating the content from the presentation and giving the content provider an easy to use interface for adding content. but not just any cms would work. it was important to select a system that would work for the organization. the focus of this article is the process followed by osu libraries in the selection of a web cms. other aspects of the project, such as the creation of the user focused information architecture, the redesign of the site, the implementation of the cms, and the management of the project are outside the scope of this article. ■■ literature review content and workflow management for library web sites: case studies, a set of case studies edited by holly yu, a special issue of library hi tech dedicated to content management, and other articles effectively outlined the need for libraries to move from static websites, dominated by html webpages, to dynamic database and cms driven websites.1 each of these works noted the messy, unmanageable situation of the static websites in which the content is inconsistently displayed and impossible to maintain. seadle summarizes the case well when he wrote “a content management system (cms) offers a way to manage large amounts of web-based information that escapes the burden of coding all of the information into each page in html by hand.”2 a cms provides an interface for content providers to add their contributions to the website without requiring knowledge of html; it separates the layout and design of the webpages from the content and provides the opportunity for reuse of both content and the code running the site. these features of a cms permit a library to professionalize its website by enforcing a consistency of design across all pages while at the same time increasing efficiency by making the maintenance of the content itself less technically challenging.3 the potential of the cms is powerful, yet it is not an easy process to select and implement a cms. one challenge is that the process of selecting and implementing a cms is not a fully technical one. the selection must be tied to the goals and strategy of the library and parent elizabeth l. 
black selecting a web content management system for an academic library website this article describes the selection of a web content management system (cms) at the ohio state university libraries. the author outlines the need for a cms, describes the system requirements to support a large distributed content model and shares the cms trial method used, which directly included content provider feedback side-by-side with the technical experts. the selected cms is briefly described. i magine a city that has been inhabited consistently for hundreds, perhaps thousands of years. those arriving in the city’s main port follow clear, wide paths that are easy to navigate. soon, however, the visitor notices that the signs change. they look similar but the terms are different and the spaces have an increasingly different look. continuing further, the visitor is lost. some sections look drastically different, as if they belong in an entirely different city. other sections are abandoned. the buildings at first seem occupied, but upon closer inspection all is old and neglected. the visitor tries to head back to the main, clear sections but cannot find the way. in frustration, the visitor leaves the city and moves on, often giving up the mission that led to the city in the first place. this metaphor describes the state of the ohio state university (osu) libraries’ website at the beginning of this project. the website has many content providers, more than 150 at one point. these content providers were given accounts to ftp files to the web server and a variety of web editors with which to manage their files. the site consisted of more than 100,000 files of many types: html, php, image files, microsoft office formats, pdf, etc. the files with content were primarily static html files. in 2005, the osu libraries began to implement a php-based template that included three php statements that called centrally maintained files for the header, the main navigation, and the footer. the template also called a series of centrally controlled style sheets. the goal was to have the content providers add the body of the pages and leave the rest to be managed by these central files. this didn’t work as intended. because of a combination of page editing practices learned with static html and a variety of skill with cascading style sheets (css), many pages lost the central control of the header, menu, and footer. also, the template was confusing for many because they had to wade through a lot of code they didn’t understand. one part of this content model was right—giving the person with the content knowledge the power to update the content while centrally controlling parts that should remain consistent throughout the website. unfortunately, the technical piece of the model didn’t support this goal. it required too much technical knowledge from the content providers. the real solution was a system that would allow the content providers to focus on their content and leave the technical knowledge to elizabeth l. black (black. 367@osu.edu) is head, web implementation team, ohio state university libraries, columbus, ohio. 186 information technology and libraries | december 2011 and interviews/focus groups with the current content providers. the research was similar to that described previously in the literature review section of this article. 
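the principle behind the centrally managed header, navigation, and footer described above, namely that content providers supply only the body of a page while presentation stays under central control, can be shown in a few lines. the sketch below uses python's string templating rather than the php actually deployed at osu, and the template and fragments are invented for illustration.

```python
from string import Template

# Centrally maintained presentation: one template, edited in one place.
PAGE = Template("""<html>
<head><title>$title</title><link rel="stylesheet" href="/central/site.css"></head>
<body>
$header
$navigation
<main>$body</main>
$footer
</body>
</html>""")

# Central fragments standing in for the shared header, menu, and footer files.
HEADER = "<header>university libraries</header>"
NAVIGATION = "<nav>home | research | services | about</nav>"
FOOTER = "<footer>contact the libraries</footer>"

def render(title, body):
    """A content provider supplies only a title and a body; the rest is central."""
    return PAGE.substitute(title=title, header=HEADER, navigation=NAVIGATION,
                           footer=FOOTER, body=body)

print(render("course reserves", "<p>how to place materials on reserve.</p>"))
```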
the most helpful for this project was a 2006 issue of library hi tech focused on cmss.10 the most useful of these articles was wiggins, remley, and klingler’s article about the work done at kent state university, particularly the way in which they organized their requirements.11 a working group of four served as interviewers for the focus groups with current web content providers. they worked in pairs, with one serving as a recorder and the other as the facilitator, who asked the questions. fifteen interview sessions were held over a period of three months. the focus group participants were invited to participate in like groups as much as possible, so for example the foreign language librarians were interviewed together in a different session from the instruction librarians. however, no one participated more than once in an interview. the same set of guiding questions was used for each interview. they are included in the appendix. the results of these interviews became the basis for the requirements document to which the technical team added the technical requirements. ■■ the cms requirements the requirements were gathered into five categories: content creation and ownership, content management, publishing, presentation, and administration/technical. these categories were modeled after those used for the project at kent state university.12 the full list is detailed below by category. content creation and ownership requirements ■■ separation of content and presentation: the content owners can add and edit content without impact on the presentation ■■ web-based gui content-editing environment that is intuitive and easy to learn without knowledge of html ■■ metadata created and maintained for each webpage or equivalent content level that contains: ■❏ owner ■❏ subject terms or tags describing the content ■■ multi-user authoring without overwriting ■■ can handle a large number of content providers (approximately 200) ■■ can integrate rss and other dynamic content from other sources ■■ can handle different content types, including: ■❏ text ■❏ images organization, must meet specific local requirements for functionality, and must include revision of the content management environment, meaning new roles for the people involved with the website.4 karen coombs noted that “the implementation of a content management system dramatically changes the role of the web services staff” and requires training for the librarians and staff who are now empowered to provide the content.5 another challenge was and continues to be a lack of a turn-key library cms.6 several libraries that did a systematic requirements gathering process generally found that the readily available cmss did not meet their requirements, and they ended up writing their own applications.7 building a cms is not a project to take lightly, so only a select few libraries with dedicated in-house programming staff are able to take on such an endeavor. the sharing of the requirements of these in-house library specific cmss is valuable for other libraries in identifying their own requirements. in the past few years, the field of open-source cmss has increased, making it more likely that a library will find a viable cms in the existing marketplace that will meet the organization’s needs. drupal is an open-source cms that was one of the first viable options for libraries and so is widely used in the library community. 
it was the subject of an edition of library technology reports in 2008.8 since drupal opened the door for open-source cmss in libraries, others have entered the market as well. in 2009 john harney noted, “there are few technologies as prolific as web content management systems. some experts number these systems in the 80-plus range, and most would concede there are at least 50.”9 the cms selection process described here builds on those described in the literature by integrating their requirements and methods to address the needs of a very large decentralized website. it builds on the increased emphasis on user involvement in technology solution building and selection by fully incorporating the cms users in the selection process. further, the process described here took place after those described in the literature, after the opensource cms field had significantly improved. the options were much greater at the time of this study and this article describes the increased possibilities of second generation cmss. while there still does not exist the perfect library ready turn-key cms, there are many excellent, robust open-source cmss available. this article describes one process for selecting among them, including an in-depth trial of three major systems: drupal, modx, and silverstripe. ■■ gathering requirements there were two parts to the requirements gathering process undertaken at osu libraries: research of the literature selecting a web content management system for an academic library website | black 187 ■■ meets established usability standards ■■ dynamic navigation generated for main and subsections of website that includes breadcrumbs and menus ■■ searching ■❏ search engine for website ■❏ can pass searches on to other library web servers ■■ mobile device friendly version (optional) ■■ delivery presentable in the browsers used most heavily by osu libraries websites visitors ■■ page load time is acceptable ■■ easy search engine optimization administration ■■ lamp (linux, apache, mysql, php) platform ■■ good documentation for technical support and end users ■■ scalable in terms of both content and traffic ■■ skills required to maintain system are available at osul the next step was to take this extensive requirements list and identify cmss that would be appropriate for a side-by-side test with both content providers and systems engineers. ■■ cms trial the web cms would become a critical part of the web infrastructure so it was important to ensure selection of the best system for both the content providers and the it team. between may 21 and august 29, 2008, two groups worked with the cmss, testing them on criteria taken from the initial requirements documents. the first team included fourteen content providers with diverse content areas and diverse technical skills; this group rated each system on a content providers set of criteria. the second team, which included the systems engineer and a technical support specialist, rated each system on a set of criteria that was more technical in nature. each participant used a microsoft excel spreadsheet containing requirements condensed from the full list. they rated each system on a scale of 1 to 3 for each criterion, where 1 was difficult, 2 was moderate, and 3 was easy. the project manager in the it web team led the trial. 
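the arithmetic behind the comparison is simple: every rating is a 1, 2, or 3, and each group's ratings for a system are pooled and averaged into one score per system. the numbers below are invented for illustration and are not the actual trial data reported in figure 1.

```python
from statistics import mean

# Hypothetical ratings: each inner list is one tester's 1-3 scores for one
# system across several criteria (1 = difficult, 2 = moderate, 3 = easy).
content_provider_ratings = {
    "drupal":       [[2, 1, 2, 3], [2, 2, 1, 2]],
    "modx":         [[2, 2, 2, 2], [3, 2, 2, 1]],
    "silverstripe": [[3, 3, 2, 3], [3, 2, 3, 3]],
}

def group_score(ratings_by_system):
    """Pool every criterion rating from every tester and average per system."""
    return {system: round(mean(score for tester in testers for score in tester), 2)
            for system, testers in ratings_by_system.items()}

print(group_score(content_provider_ratings))
# e.g. {'drupal': 1.88, 'modx': 2.0, 'silverstripe': 2.75}
```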
the criteria given to the content providers were: ■■ web gui intuitiveness ■■ media integration ■■ editor ease of use ■■ ability to add content ■■ ability to preview content ■■ ability to publish content ■■ metadata storage ■❏ videos ■❏ camtasia/captivate tutorials ■❏ flash files ■■ content owners can create tables and/or databases for display of tabular data in the web gui interface ■■ content owners can create forms in the web gui interface ■■ option for faculty and professional staff to have webpages featuring their work and their profiles ■❏ all staff must have some control over the personal information available about them on the public website content management ■■ link maintenance ■❏ does not allow internal pages to be deleted if linked to by another cms page ■❏ can regularly check the viability of external links ■❏ periodic reminders to content owners to check their content ■■ way to repurpose content elements to multiple pages for content such as: ■❏ descriptions of article and research databases ■❏ highlight or feature content elements ■■ access controls ■❏ that allows content owners to only edit their content ■❏ that allow web liaisons to provide first line support for their departments ■❏ integrates into our existing security structures (shibboleth) ■■ robust reporting features ■❏ integration with quality web analytics software ■❏ content update tracking ■❏ system usage ■❏ customized report creation publishing ■■ ability to preview before publishing ■■ cms can produce rss feeds for dynamic sections of content ■■ page templates and style sheets are used to control page layout and design centrally ■■ display non-roman scripts using unicode ■■ extensible—can incorporate non-cms content into the site ■■ ability to add personalization options for site users presentation ■■ meets ada and w3c accessibility requirements ■■ code validates to current html specifications 188 information technology and libraries | december 2011 on the technical requirement that the system be easy to extend and integrate. a simple website served as the hub for the cms trial. the site included links to each cms instance, a link to the project blog for updates from the project team, and a link to a wiki space where trial participants shared ideas and thoughts with one another. the web team’s issue tracking system was integrated into this site so participants could easily ask questions of the technical team and report problems. time was set aside each week to handle all reported issues. of the sixteen participants who started the trial, thirteen completed a criteria spreadsheet. the project manager totaled and then averaged the scores provided by each group to determine the overall content provider score and the overall technical score (see figure 1). in the end, both the content providers and the systems engineers agreed that silverstripe was the cms that best met the requirements. ■■ silverstripe silverstripe was released as an open-source cms in november 2006 by silverstripe limited. they had developed the cms as part of their business of creating websites for clients. the company was founded in 2000 and is headquartered in wellington, new zealand. the company continues to use the cms for their website business and also offers paid support for the cms. 
the testers agreed that silverstripe provided the best match in the areas of easy content creation by multiple authors, handling multilingual content, management of different types of content and content files, search engine optimization, and meeting web standards. a strong and growing open-source community and strong documentation were additional keys to the selection of silverstripe.14 use by high-profile clients, such as the 2008 democratic national convention, provided proof that silverstripe could handle high traffic. the content providers praised silverstripe for the intuitive user interface, the system’s ease of use, specifically the ease of previewing and publishing content. they also noted that silverstripe handled the metadata supporting the pages as well as tabular and form page content better than the other systems. the technical evaluators noted silverstripe’s modular structure, which makes it flexible enough to integrate easily with existing web applications and accommodate local customizations without modifying the core system. silverstripe includes a template language, which fully separates the content from the presentation. in practice, this means that even informed users cannot spot a silverstripe website through simple web browsing, as is common with other cmss. ■■ ability to “feature items” ■■ ability to add rss feeds ■■ ability to enter tabular data ■■ ability to create forms ■■ testing area for new features the criteria given to the systems engineers were: ■■ installation ■■ maintainability ■■ technical documentation ■■ active developer community ■■ structure management (subsites/trees) ■■ access control/permissions ■■ link management ■■ ease of extensibility ■■ interoperability (data portability and web services) the cms requirements document was used in conjunction with the cmsmatrix (http://cmsmatrix.org) website to select five cmss to participate in a trial.13 the five systems selected were drupal 6.2, modx 0.9.6, silverstripe 2.2.2, plone 3.0.6, and typo3. the systems engineer installed all five cmss on a development server and did a simple configuration to make each operational for testing. it was at this stage that plone and typo3 were dropped from the trial because they took too long to configure and set up. the goal was to do a simple installation of the base cms, without any modules, but some systems were not functional as a cms without some modules so we added modules selectively. at the point of the selection of the systems for the trial, the project leaders noted that the entire list of requirements could not be met by an existing cms. they also noted that the majority of the key needs could be met with an existing system. therefore the goal remained to select an existing open-source web cms with the emphasis figure 1. cms trial scores selecting a web content management system for an academic library website | black 189 web guides in a content management system,” library hi tech 24, no. 1 (2006): 29–53; yan han, “digital content management: the search for a content management system,” library hi tech 22, no. 4 (2004): 355–65; david kane and nora hegarty, “new web site, new opportunities: enforcing standards compliance within a content management system,” library hi tech 25, no. 2 (2007): 276–87; ed salazar, “content management for the virtual library,” information technology & libraries 25, no. 3 (2006): 170–75. 2. seadle, “content management systems,” 5. 3. 
huttenlock, beaird, and fordham, “untangling a tangled web”; kane and hegarty, “new web site, new opportunities”; salazar, “content management for the virtual library.” 4. holly yu, “library web content management: needs and challenges,” in content and workflow management for library web sites: case studies, ed. holly yu, 1–21 (hersey, pa.: information science, 2005). 5. karen coombs, “navigating content management,” library journal 133 (winter 2008): 24. 6. yu, “library web content management,” 10. 7. goans, leach, and vogel, “beyond html”; salazar, “content management for the virtual library”; rick wiggins, jeph remley, and tom klingler, “building a local cms at kent state,” library hi tech 24, no. 1 (2006): 69–101; regina beach and miqueas dial, “building a collection development cms on a shoe-string,” library hi tech 24, no. 1 (2006): 115–25. 8. andy austin and christopher harris, library technology reports 44, no. 4 (may/june 2008). 9. john harney, “are open-source web content management systems a bargain?” infonomics 23, no. 3 (may/june 2009): 59–62. 10. library hi tech 24, no. 1 (2006). 11. wiggins, remley, and klingler, “building a local cms at kent state.” 12. ibid. 13. cms matrix, “the content management comparison tool,” http://cmsmatrix.org/ (accessed aug. 16, 2010). 14. silverstripe.org, “open source help & support,” http:// silverstripe.org/help-and-support/ (accessed aug. 16, 2010). ■■ conclusion an academic library website is a complex operation. the best ones use the strengths of the organization to their fullest: give web content authors direct access to maintain their content without burdening them with the requirement of technical expertise in html. excellent sites also offer a consistent user experience facilitated by centrally managed presentation. a web cms facilitates this model. the selection of a web cms is not solely a technical decision; it is most effective when made in partnership with the web content providers. the process followed by osu libraries described here provides an example of one such selection process. ■■ acknowledgements the author thanks james muir and jason thompson for their thoughtful contributions to this article and their exceptional work on the project. none of it would have been possible without them. references 1. holly yu, ed., content and workflow management for library web sites: case studies (hersey, pa.: information science, 2005); michael seadle, “content management systems,” library hi tech 24, no. 1 (2006): 5–7; terry l. huttenlock, jeff w. beaird, and ronald w. fordham, “untangling a tangled web: a case study in choosing and implementing a cms,” library hi tech 24, no. 1 (2006): 61–68; doug goans, guy leach, and teri m. vogel, “beyond html: developing and re-imagining library appendix. content provider focus interview questions each group interview included a series of questions, which could be modified depending on the direction in which the interviews progressed. these are the questions provided to the interviewers: 1. who is your audience? 2. how do you teach/communicate with each audience? 3. what types of information are you trying to communicate? 4. how dynamic or static is the information? 5. what are the most important resources in your discipline? 6. who do you teach most frequently? undergrads, grads? 7. where do you start your instruction: with library.osu.edu or the department? 8. how do you connect the users/audience to your resources? 9. what message do you want to deliver? 10. 
what is unique about your discipline/ needs/ department? 11. what would make things easier for you? 6 information technology and libraries | march 2011 i n the new lita strategic plan, members have suggested an objective for open access (oa) in scholarly communications. some people describe oa as articles the author has to pay someone to publish. that can be true, but that’s not how i think of it. oa is definitely not vanity publishing. most oa journals are peer-reviewed. i like the definition provided by enablingopenscholarship: open access is the immediate (upon or before publication), online, free availability of research outputs without any of the restrictions on use commonly imposed by publisher copyright agreements.1 my focus on oa journals increased precipitously when the licensing for a popular american weekly medical journal changed. we could only access online articles from one on-campus computer unless we increased our annual subscription payment by 500 percent. we didn’t have the funds, and now the students suffer the consequences. i think it was an unfortunate decision the journal’s publishers made. i know from experience that if a student can’t access the first article they want, they will find another one that is available. interlibrary loan is simpler than ever, but i think only the patient and curious students will make the effort to contact us and request an article they cannot obtain. in 2006 scientist gary ward wrote that faculty at many institutions experience problems accessing current research. when faculty teach “what is available to them rather than what their students most need to know, the education of these students and the future of science in the u.s. will suffer.” he explains it is a false assumption that those who need access to scientific literature already have it. interlibrary loans or pay-per-view are often offered by publishers as the solution to the access problem, but this misses an essential fact of how we use the scientific literature: we browse. it is often impossible to tell from looking at an abstract whether a paper contains needed methodological detail or the perfect illustration to make a point to one’s students. apart from considerations of cost, time, and quality, interlibrary loans and pay-per-views simply do not meet the needs of those of us who often do not know what we’re looking for until we find it.2 i want our medical students and tomorrow’s doctors to have access to all of the most current medical research. we offer the service of providing jama articles to students, but i’m guessing that we hear from a small percentage of the students who can’t access the full text online. are people reading oa articles? not only are scholars reading the articles, but they are citing those articles in their publications. consider the public library of science’s plosone (http://www.plosone.org/home.action), a peerreviewed, open-access, online publication that features reports on primary research from all disciplines within science and medicine. in june 2010, plosone received its first impact factor of 4.351—an impressive number. that impact factor puts plosone in the top 25 percent of the institute for scientific information’s (isi) biology category.3 the impact factor is calculated annually by isi and represents the average number of citations received per paper published in that journal during the two preceding years.4 in other words, articles from plosone published in 2008 and 2009 were highly cited. is oa making an impact in my medical library? 
is oa making an impact in my medical library? i believe it is, although i won't be happy until our students can access the online journals they want from off campus and the library won't have to pay outrageous licensing fees. we have more than one thousand oa journal titles in our list of online journals. the more full text they can access, the less they'll have to settle for their second or third choice because their first choice is not available online.

i'm glad that lita members included oa in their strategic plan. the number of oa journals is increasing, and i believe we will continue to see that the articles are reaching readers and making a difference. i don't think ital will be adopting the "author pays" model of oa, but the editorial board is dedicated to providing lita members with the access they want.

references

1. enablingopenscholarship, "enabling open scholarship: open access," http://www.openscholarship.org/jcms/c_6157/open-access?portal=j_55&printview=true (accessed jan. 18, 2011).
2. gary ward, "deconstructing the arguments against improved public access," newsletter of the american society for cell biology, nov. 2006, http://www.ascb.org/filetracker.cfm?fileid=550 (accessed jan. 18, 2011).
3. phil davis, "plos one: is a high impact factor a blessing or a curse?" online posting, june 21, 2010, the scholarly kitchen, http://scholarlykitchen.sspnet.org/2010/06/21/plosone-impact-factor-blessing-or-a-curse/ (accessed jan. 18, 2011).
4. thomson reuters, "introducing the impact factor," http://thomsonreuters.com/products_services/science/academic/impact_factor/ (accessed jan. 18, 2011).

cynthia porter (cporter@atsu.edu) is distance support librarian at a.t. still university of health sciences, mesa, arizona.

marc truitt
editorial: and now for something (completely) different

marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

the issue of ital you hold in your hands—be that issue physical or virtual; we won't even go into the question of your hands!—represents something new for us. for a number of years, ex libris (and previously, endeavor information systems) has generously sponsored the lita/ex libris (née lita/endeavor) student writing award competition. the competition seeks manuscript submissions from enrolled lis students in the areas of ital's publishing interests; a lita committee on which the editor of ital serves as an ex officio member evaluates the entries and names a winner. traditionally, the winning essay has appeared in the pages of ital. in recent years, perhaps mirroring the waning interest in publication in traditional peer-reviewed venues, the number of entrants in the competition has declined. in 2008, for instance, there were but nine submissions, and to get those, we had to extend the deadline six weeks from the end of february to mid-april. in previous years, as i understand it, there often were even fewer. this year, without moving the goalposts, we had—hold onto your hats!—twenty-seven entries. of these, the review committee identified six finalists for discussion. the turnout was so good, in fact, that with the agreement of the committee, we at ital proposed to publish not only the winning paper but the other finalist entries as well. we hope that you will find them as stimulating as have we.
even more importantly, we hope that by publishing such a large group of papers representing 2009's best in technology-focused lis work, we will encourage similarly large numbers of quality submissions in the years to come.

i would like to offer sincere thanks to my university of alberta colleague sandra shores, who as guest editor for this issue worked tirelessly over the past few months to shepherd quality student papers into substantial and interesting contributions to the literature. she and managing editor judith carter—who guest-edited our recent discovery issue—have both done fabulous jobs with their respective ital special issues. bravo!

■■ ex libris' sponsorship

in one of those ironic twists that one more customarily associates with movie plots than with real life, the lita/ex libris student writing award recently almost lost its sponsor. at very nearly the same time that sandra was completing the preparation of the manuscripts for submission to ala production services (where they are copyedited and typeset), we learned that ex libris had notified lita that it had "decided to cease sponsoring" the student writing award. a brief round of e-mails among principals at lita, ex libris, and ital ensued, with the outcome being that carl grant, president of ex libris north america, graciously agreed to continue sponsorship for another year and reevaluate underwriting the award for the future. we at ital and i personally are grateful.

carl's message about the sponsorship raises some interesting issues on which i think we should reflect. his first point goes like this:

it simply is not realistic for libraries to continue to believe that vendors have cash to fund these things at the same levels when libraries don't have cash to buy things (or want to delay purchases or buy the product for greatly reduced amounts) from those same vendors. please understand the two are tied together.

point taken and conceded. money is tight. carl's argument, i think, speaks as well to a larger, implied question. libraries and library vendors share highly synergistic and, in recent years, increasingly antagonistic relationships. library vendors—and i think library system vendors in particular—come in for much vitriol and precious little appreciation from those of us on the customer side. we all think they charge too much (and by implication, must also make too much), that their support and service are frequently unresponsive to our needs, and that their systems are overly large, cumbersome, and usually don't do things the way we want them done. at the same time, we forget that they are catering to the needs and whims of a small, highly specialized market that is characterized by numerous demands, a high degree of complexity, and whose members—"standards" notwithstanding—rarely perform the same task the same way across institutions. we expect very individualized service and support, but at the same time are penny-pinching misers in our ability and willingness to pay for these services. we are beggars, yet we insist on our right to be choosers. finally, at least for those of us of a certain generation—and yep, i count myself among its members—we chose librarianship for very specific reasons, which often means we are more than a little uneasy with concepts of "profit" and "bottom line" as applied to our world. we fail to understand the open-source dictum that "free as in kittens and not as in beer" means that we will have to pay someone for these services—it's only a question of whom we will pay.
carl continues, making another point:

i do appreciate that you're trying to provide us more recognition as part of this. frankly, that was another consideration in our thought of dropping it—we just didn't feel like we were getting much for it. i've said before and i'll say again, i've never, in all my years in this business had a single librarian say to me that because we sponsored this or that, it was even a consideration in their decision to buy something from us. not once, ever. companies like ours live on sales and service income. i want to encourage you to help make librarians aware that if they do appreciate when we do these things, it sure would be nice if they'd let us know in some real tangible ways that show that is true. . . . good will does not pay bills or salaries unless that good will translates into purchases of products and services (and please note, i'm not just speaking for ex libris here, i'm saying this for all vendors).

and here is where carl's and my views may begin to diverge. let's start by drawing a distinction between vendor tchotchkes and vendor sponsorship. in fairness, carl didn't say anything about tchotchkes, so why am i? i do so because i think that we need to bear in mind that there are multiple ways vendors seek to advertise themselves and their services to us, and geegaws are one such. trinkets are nice—i have yet to find a better gel pen than the ones given out at iug 14 (would that i could get more!)—but other than reminding me of a vendor's name, they serve little useful purpose. the latter, vendor sponsorship, is something very different, very special, and not readily totaled on the bottom line.

carl is quite right that sponsorship of the student writing award will not in and of itself cause me to buy aleph, primo, or sfx (oh right, i have that last one already!). these are products whose purchase is the result of lengthy and complex reviews that include highly detailed and painstaking needs analysis, specifications, rfps, site visits, demonstrations, and so on. due diligence to our parent institutions and obligations to our users require that we search for a balance among best-of-breed solutions, top-notch support, and fair pricing. those things aren't related to sponsorship. what is related to sponsorship, though, is a sense of shared values and interests. of "doing the right thing." i may or may not buy carl's products because of the considerations above (and yes, ex libris fields very strong contenders in all areas of library automation); i definitely will, though, be more likely to think favorably of ex libris as a company that has similar—though not necessarily identical—values to mine, if it is obvious that it encourages and materially supports professional activities that i think are important. support for professional growth and scholarly publication in our field are two such values. i'm sure we can all name examples of this sort of behavior: in addition to support of the student writing award, ex libris' long-standing prominence in the national information standards organization (niso) comes to mind.
so too does the founding and ongoing support by innovative interfaces and the library consulting firm r2 for the taiga forum (http://www.taigaforum.org/), a group of academic associate university librarians. to the degree that i believe ex libris or another firm shares my values by supporting such activities—that it "does the right thing"—i will be just a bit more inclined to think positively of it when i'm casting about for solutions to a technology or other need faced by my institution. i will think of that firm as kin, if you will. with that, i will end this by again thanking carl and ex libris—because we don't say thank you often enough!—for their generous support of the lita/ex libris student writing award. i hope that it will continue for a long time to come. that support is something about which i do care deeply. if you feel similarly—be it about the student writing award, niso, taiga, or whatever—i urge you to say so by sending an appropriate e-mail to your vendor's representative or by simply saying thanks in person to the company's head honcho on the ala exhibit floor. and the next time you are neck-deep in seemingly identical vendor quotations and need a way to figure out how to decide between them, remember the importance of shared values.

■■ dan marmion

longtime lita members and ital readers in particular will recognize the name of dan marmion, editor of this journal from 1999 through 2004. many current and recent members of the ital editorial board—including managing editor judith carter, webmaster andy boze, board member mark dehmlow, and me—can trace our involvement with ital to dan's enthusiastic period of stewardship as editor. in addition to his leadership of ital, dan has been a mentor, colleague, boss, and friend. his service philosophy is best summarized in the words of a simple epigram that for many years has graced the wall behind the desk in his office: "it's all about access!!"

because of health issues, and in order to devote more time to his wife diana, daughter jennifer, and granddaughter madelyn, dan recently decided to retire from his position as associate director for information systems and digital access at the university of notre dame hesburgh libraries. he also will pursue his personal interests, which include organizing and listening to his extensive collection of jazz recordings, listening to books on cd, and following the exploits of his favorite sports teams, the football irish of notre dame, the indianapolis colts, and the new york yankees. we want to express our deep gratitude for all he has given to the profession, to lita, to ital, and to each of us personally over many years. we wish him all the best as he embarks on this new phase of his life.

communications

tam dalrymple
"just-in-case" answers: the twenty-first-century vertical file

tam dalrymple (dalrympt@oclc.org) is senior information specialist at oclc, dublin, ohio.

this article discusses the use of oclc's questionpoint service for managing electronic publications and other items that fall outside the scope of oclc library's opac and web resources pages, yet need to be "put somewhere." the local knowledge base serves as both a collection development tool and as a virtual vertical file, with records that are easy to enter, search, update, or delete.

we do not deliberately collect for the vertical file, but add to it day by day the useful thing which turns up.
these include clippings from newspapers, excerpts from periodicals . . . broadsides that are not injured by folding . . . anything that we know will be used if available.
—wilson bulletin, 1919

information that "will be used if available" sounds like the contents of the internet.1 as with libraries everywhere, the oclc library has come to depend on the internet as an almost limitless resource. and like libraries everywhere, it has confronted the advantages and disadvantages of that scope. this means that in addition to using the opac and oclc library's webpages, oclc library staff have used a mix of bookmarks, del.icio.us tags, and post-it® notes to keep track of relevant, authoritative, substantive, and potentially reusable information. much has been written about the use of questionpoint's transaction management capabilities and of the important role of knowledge bases in providing closure to an inquiry. in contrast, this article will look at questionpoint's use as a management tool for future questions, for items that fall outside the scope of oclc library's opac and web resources pages yet need to be "put somewhere." the questionpoint local knowledge base is just the spot for these new vertical file items.

about oclc library

oclc is the world's largest nonprofit membership computer library service and research organization. more than 69,000 libraries in 112 countries and territories around the world use oclc services to locate, acquire, catalog, lend, and preserve library materials. oclc library was established in 1977 to provide support for oclc's mission. the collection concentrates on library, information, and computer sciences and business management, and has special collections that include the papers of frederick g. kilgour and archives of the dewey decimal classification™. oclc library has a distinct clientele to which it offers a complete range of services—print and electronic collections, reference, interlibrary loan—within its subject areas. because of the nature of the organization, the library supports long-term and collaborative research, such as that done by oclc programs and research staff, as well as the immediate information needs of product management and marketing staff. oclc library also provides information to oclc's other service areas, such as finance and human resources. while most oclc library acquisitions are done on demand, oclc library selects and maintains an extensive collection of periodicals, journals, and reference resources, most of them online and accessible—along with the opac—to oclc employees worldwide from the library's webpages (see figure 1).

figure 1. oclc library intranet homepage

often, however, oclc staff, like those of many organizations, are too busy to consult these resources themselves and thus depend on the library. oclc library staff pursue the answers to such research questions through the library's collections and look to enhance the collections with "anything that we know will be" of use. one of the challenges is keeping track of the "anything" that falls outside the library's primary collections scope; questionpoint helps with that task.

traditional uses of questionpoint

questionpoint is a service that provides question management tools aimed at increasing the visibility of reference services and making them more efficient. oclc library uses many of those tools, but there are significant ones it does not use (for example, chat).
and although the library's questionpoint-based aska link is visible by default on the front page of the corporate intranet as well as on oclc library–specific pages, less than 8 percent of questions over the last year were received through that link. one reason for this low use may be that for most of oclc library's history, e-mail has been the primary contact method, and so it remains. even when the staff need clarification of a question, they automatically opt for telephone or e-mail messaging. working with a web form and question-and-answer software has not caught on as a replacement for these more established methods. however, questionpoint remains the reference "workspace." when questions come in through e-mail or phone, librarians enter them into questionpoint, using it to add notes and keep track of sources checked. completed transactions are added to the local knowledge base. (because their questions involve proprietary matters, many special libraries do not add their answers to the global knowledge base, and oclc library is no exception. the local knowledge base is accessible only by oclc library staff.)

not surprisingly, most of the questions received are about libraries, museums, and other cultural institutions, their collections, users, and staff. this means that the likelihood of reuse of the information in the oclc library knowledge base is relatively high, and makes the local knowledge base an early stop in the reference process. though statistics vary widely by individual institutions and type of library—and though some libraries have opted not to use the knowledge base—the average ratio for all questionpoint libraries is about one knowledge base search for every three questions received. in contrast, in the past year oclc library staff averaged 4.2 local knowledge base searches for every three questions received. the view of the questionpoint knowledge base as a repository of answers to questions that have been asked is a traditional one. oclc library's use of the questionpoint knowledge base in anticipation of the information needs of its clients—as a way of collection development—is distinctive. in many respects this use creates an updated version of the old-fashioned vertical file.

nontraditional uses of questionpoint

just-in-case

the vertical file has a quirky place in the annals of librarianship. it has been the repository for facts and information too good to throw away but not quite good enough to catalog. h. w. wilson still offers its vertical file index, a specialized subject index to pamphlets issued on topics often unavailable in book form, which began in 1932. by now, except for special collections, the internet has practically relegated the vertical file to the backroom with the card platens and electric erasers. oclc library now uses its questionpoint knowledge base to manage information that once might have gone into a vertical file: the authoritative reports, studies, .org sites, and other resources that are often not substantive enough to catalog, but too good to hide away in a single staff member's bookmarks. the questionpoint knowledge base provides a place for these resources; more importantly, questionpoint provides fast, efficient ways to collect, tag, manage, and use them.
questionpoint allows development of such collections with powerful capabilities that allow for future retrieval and use of the information, and it does so without the incredibly time-consuming processes of the past. a 1909 description of such processes describes in detail the inefficiency of yore:

in the public library [sic] of newark, n.j., material is filed in folders made of no. 1 tag manila paper, cut into pieces about 11x18 inches in size. one end is so turned up against the others as to make a receptacle 11x19 1/2 inches. the front fold is a half inch shorter than the back one, and this leaves a margin exposed on the back one, whereon the subject of that folder is written.2

thus a major benefit of using questionpoint to manage these resources is saving time. because questionpoint is a routine part of oclc library's workflow, it allows the addition of items directly to the knowledge base quickly and with a minimum of fuss. there is initially no need to make the entry "pretty," but only to describe the resource briefly, add the url, and tag it (see figure 2).

figure 2. a sample questionpoint entry, this for a report by the national endowment for the arts

unlike a physical vertical file, tagging items in the knowledge base allows items to be "put" in multiple places. staff can also add comments that characterize the authoritativeness of a resource. occasionally librarians come across articles or resources that might address multiple questions. instead of burying the data in one overarching knowledge base record, staff can make an entry for each aspect of the resource. an example of this is www.galbithink.org/libraries/analysis.htm, a page created by douglas galbi, senior economist with the federal communications commission (see figure 3). the site provides statistics, including historical statistics, on u.s. public libraries. rather than describe these generically with a tag like "library statistics"—not very useful in any case—each source can be added separately to the questionpoint knowledge base. for example, the item "audiovisual materials in u.s. public libraries" can be assigned specific tags—audiovisual, av, videos—that will make the data more accessible in the future. in other words, librarians use the faq model of asking and answering just one question at a time.

figure 3. a page with diverse facts and figures: www.galbithink.org/libraries/analysis.htm

an important element in adding "answers" to oclc library's knowledge base is the ability to provide context. with questionpoint, librarians can not only describe what the resource is, but why it may be of future use. and just the act of adding information to the knowledge base serves as a valuable mnemonic—"i've seen that somewhere." records added to the knowledge base in this way can be easily updated with information about newer editions or better sources. equally valuable is the ability to edit and add keywords when the resource becomes useful for unforeseen questions.
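the article does not show questionpoint's underlying record structure, but the practice just described, one knowledge base entry per aspect of a source, each with its own url, tags, and a note on future use, can be sketched in a few lines of generic code. the field names and lookup function below are illustrative assumptions, not questionpoint's actual data model or api:

from dataclasses import dataclass

@dataclass
class VerticalFileRecord:
    # a minimal stand-in for one "virtual vertical file" entry
    description: str
    url: str
    tags: set
    context: str = ""   # why the item may answer a future question

records = [
    VerticalFileRecord(
        description="audiovisual materials in u.s. public libraries",
        url="http://www.galbithink.org/libraries/analysis.htm",
        tags={"audiovisual", "av", "videos"},
        context="historical statistics page by douglas galbi; one entry per aspect of the site",
    ),
]

def find_by_tag(tag):
    # unlike a physical folder, a tagged record is "filed" in several places at once
    return [r for r in records if tag in r.tags]

print([r.description for r in find_by_tag("av")])

nothing here is specific to questionpoint; the point is simply that a tagged record can surface under several future questions, which a single physical folder cannot.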
sharing information with staff

the knowledge base also serves as a more formal collection development tool. when librarians run across potentially valuable resources, they can send a description and a link to a product manager who may find it of use. library staff use questionpoint's keyword capability to add tags of people's names and job titles to facilitate ongoing current awareness. employees may provide feedback suggesting an item be added to the permanent print collection, or linked to from the library website. oclc library strives to inform users without subjecting them to information overload. when a 2007 survey of oclc staff found the library's rss feeds seldom used, librarians began to send e-mails directly to individuals and teams. the reaction of oclc staff indicates that such personal messages, with content summaries that allow recipients to quickly evaluate the contents, are more often read than oclc library rss feeds—especially if items sent continue to be valuable. requirements that enable this kind of sharing include knowledge of company goals, staff needs, and product initiatives. to keep up-to-date, librarians meet regularly with other oclc staff and monitor organizational changes. attendance at oclc's members council meetings provides information on hot topics that help identify resources for future use. while oclc's growth as a global organization has brought challenges in maintaining awareness of the full range of organization needs, the questionpoint knowledge base offers a practical way to manage increased volume. maintaining resources of potential interest to staff with questionpoint has another benefit: it helps keep librarians aware of internal experts who can help the library with questions, and in many cases allows the library to connect staff with mutual interests to one another. this has become especially important as oclc has grown and its services continue to integrate with one another.

conclusions

beyond its usefulness as a system to receive, manage, and answer inquiries, questionpoint is providing a way to facilitate access to online resources that addresses the particular needs of oclc library's constituency. it is fast and easy to use: a standard part of the daily workflow. it enables direct links to sources and accommodates tagging those sources with the names of people and projects, as well as subjects. it serves as part of the library's collection management and selection system. using questionpoint in this way has some potential drawbacks. "just in case" acquisition of virtual resources entails some of the risks of traditional acquisitions: acquiring resources that are seldom used, creating a database of resources that are difficult to retrieve, and perhaps the necessity of "weeding" or updating obsolete items. with company growth comes the issue of scalability, as well. but for now, the benefits have far outweighed the risks. most of the items added have been identified for and shared with at least one staff member, so the effort has provided immediate payoff.

■■ the knowledge base serves as a collection development tool, helping to identify items that can be cataloged and added to the permanent collection.
■■ the record in the knowledge base can serve as a reminder to check for later editions.
■■ the knowledge base records are easy to update or even delete.

the questionpoint virtual vertical file helps oclc library manage and share those useful things that "just turn up."

references

1. "the vertical file for pamphlets and miscellany," wilson bulletin 1, no. 16 (june 1919): 351.
2. kate louise roberts, "vertical file," public libraries 12 (oct. 1907): 316–17.

cost analysis of an automated and manual cataloging and book processing system

joselyn druschel, washington state university, pullman.
a comparative cost analysis of an automated network system (wln) and a local manual system of cataloging and book processing at washington state university libraries indicates that the automated system is about 20 percent less costly than the manual system. a per-unit cost approach was used in calculating the monthly cost of each system based on the average number of items processed per month under the automated system. the process and the results of the analysis are presented in a series of charts which detail the tasks, items processed, and unit and total monthly costs of both the manual and automated systems. the higher costs of the manual system were essentially staff costs.

the technical services division (tsd) of washington state university libraries (wsul) has had considerable experience in the use of automated techniques in selected areas of technical processing. an in-house automated acquisitions system was developed and implemented in 1967; that in-house system was eventually replaced by the acquisitions component of the washington library network (wln). since november 1977, the technical services division of wsul has used the wln bibliographic component for data verification (searching) and cataloging of materials. although the library has generally known its total automation expenditures, it has lacked a more precise breakdown of cost data on automated processing. moreover, the library has practically no cost data on manual processing. this report deals only with the costs of using the wln bibliographic system, not the wln acquisitions component. an analysis was made of the total costs of both the automated and manual book processing systems. the objectives in undertaking the cost analysis were threefold: (1) to identify the essentially unknown costs of manual processing; (2) to provide more exact cost data on automated processing; and (3) to develop comparable data on the costs of each system.

manuscript received october 1980; accepted december 1980.

methodology

the methodology used in this cost analysis was a per-unit cost approach. first, each process or task in which the staff were engaged in cataloging and book processing was identified. second, the per-unit cost (e.g., staff, data base, materials) of each process was calculated. finally, monthly costs were determined by multiplying the average number of items processed per month by the unit cost per task. the cost analysis charts (tables 1(a)–1(e), manual system; tables 2(a)–2(d), automated system), which detail the tasks, items processed, and unit and total costs, form the body of the analysis. equipment costs (purchase, lease, maintenance) were calculated separately and are included in the summary cost data for each system (table 3).

identification of processes

the staff of the tsd cataloging and book processing unit perform the following functions: bibliographic verification, bibliographic record production, bibliographic record maintenance, the marking of materials, binding preparation and receipt (for most of the library system), and the preparation of book cards.

table 1(a). cost analysis: manual cataloging and book processing system (bibliographic searching). monthly totals: idc microfiche searching, 3,972 items, $1,949; manual national union catalog searching, 1,275 items, $2,668; bibliographic searching total, 5,247 items, $4,617.
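as a concrete illustration of that per-unit arithmetic, the short sketch below re-computes the lt i microfiche-search row of table 1(a) (staff at $.084 per minute for 3 minutes per item, plus the idc subscription of $10,000 per year spread over 47,664 searches); the function and variable names are mine, not the article's:

def unit_cost(staff_rate_per_min, minutes_per_item, other_cost_per_item=0.0):
    # cost of performing one task on one item: staff time plus any per-item charge
    return staff_rate_per_min * minutes_per_item + other_cost_per_item

subscription_per_search = round(10_000 / 47_664, 2)          # idc fiche, about $0.21 per search
lt1_search = unit_cost(0.084, 3, subscription_per_search)    # lt i searching, about $0.462 per item
monthly = lt1_search * 2_484                                  # 2,484 such items handled per month
print(subscription_per_search, round(lt1_search, 3), round(monthly))
# prints: 0.21 0.462 1148 -- the $1,148 per month shown for this row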
table 1(b). cost analysis: manual cataloging and book processing system (bibliographic record production, processing and products). monthly totals: cataloging with lc microfiche copy, 1,057 items, $3,298; cataloging with modified nuc/lc copy, 984 items, $4,359; original cataloging, 222 items, $3,145; catalog cards (7 cards per set at $.055 per card), $1,654; cataloging total, 2,263 items, $12,456; miscellaneous bibliographic record production, $687; bibliographic record production total, $13,143.
table 1(c). cost analysis: manual cataloging and book processing system (bibliographic record maintenance). monthly total: $6,402.
table 1(d). cost analysis: manual cataloging and book processing system (marking). monthly total: $4,631.

table 1(e). total monthly costs, manual system (summary): staff costs, $25,775; data base costs, none; subscription costs, $1,076; materials costs, $1,942; total cost, $28,793 per month.

table 2(a). cost analysis: automated cataloging and book processing system (bibliographic searching). monthly totals: wln data base searching, 3,972 items, $1,535; manual national union catalog searching, 508 items, $635; bibliographic searching total, 4,480 items, $2,170.
table 2(b). cost analysis: automated cataloging and book processing system (bibliographic record production, processing and products). monthly totals: cataloging with wln data base copy, 1,376 items, $4,161; cataloging with cip copy, 153 items, $739; cataloging with modified nuc/lc copy, 95 items, $934; original cataloging, 222 items, $3,343; wln cataloging total, 1,846 items, $9,177; non-wln microform cataloging, 407 items, $511; non-wln music scores, 10 items, $78; cataloging total, 2,263 items, $9,766; miscellaneous costs, $2,071; bibliographic record production total, $11,837.
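the wln data base copy row of table 2(b) also shows how an automated per-item cost is assembled from staff and data base components; the sketch below simply re-adds those published figures (the variable names are mine):

staff = 0.094 * 6                                   # lt ii attaches holdings and orders cards, 6 min/item
database = 1.60 + 0.15 + 4 * 0.055 + 0.43 + 0.06    # record use, card request, 4 shelflist cards, com, terminal use
per_item = staff + database                         # about $3.024 per title cataloged with wln copy
print(round(per_item, 3), round(per_item * 1_376))
# prints: 3.024 4161 -- the $4,161 per month shown for 1,376 such titles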
table 2(c). cost analysis: automated cataloging and book processing system (bibliographic record maintenance). monthly total: $4,194.

table 2(d). cost analysis: automated cataloging and book processing system (marking). monthly total: $4,589.
table 2(e). total monthly costs, automated system (summary): staff costs, $16,849; data base costs, $5,480; subscription costs, $157; materials costs, $304; total cost, $22,790 per month.

table 3. cataloging and book processing system: summary comparison of costs per month.
manual system: staff, $25,775; data base, none; subscriptions, $1,076; materials, $1,942; equipment, $462; total, $29,255 per month.
automated system: staff, $16,849; data base, $5,480; subscriptions, $157; materials, $304; equipment, $890; total, $23,680 per month.
cost comparison: the difference is $5,575 per month, or $66,900 per year, in favor of the automated system (roughly 19 percent of the manual system's monthly cost, consistent with the "about 20 percent" savings noted above).

since 1978 this unit, as well as all units in the technical services division, has periodically analyzed unit activities and recorded the data collected on work assignment/staffing profile sheets (see table 4 for a sample profile sheet). the primary purpose of the profiles was to develop a detailed account of work distribution throughout tsd in order to determine the staffing requirements necessary for each unit to maintain an even workflow. in the cost analysis, the cataloging and book processing (cbp) profile was used to identify each unit process, as well as to provide the basic data on the number and level of staff and the time required to perform each process. additionally, for the automated system, the cbp profile sheets, together with wln invoices (see figure 1 for a sample invoice) and wln monthly activity reports (see figure 2 for a sample activity report), were used to determine the average number of items processed per month. for example, since about 85 percent of the cataloging done in tsd is via wln, it was possible to derive exact figures from wln invoices for the average number of items cataloged per month. the wln invoices also differentiated between data-base copy cataloging and original data entry. the cbp profile sheets were used to determine the average number of non-wln items cataloged. using a combination of wln invoice and profile data, a chart was constructed of the average number of items searched and cataloged per month under the automated system (see table 5).
in order to make costs comparable, an assumption was made that the same average number of items was searched and cataloged under the previous manual system, and a similar chart was made for it (see table 6). in reality, the available staff under the manual system could not process the same amount of material per month.

table 4. technical services division work assignment/staffing profile, november 1978 (unit: cataloging and book processing; subunit: lc copy editing). the profile lists each task or process with the average number of items received for processing, the average time per item, the staff hours needed, and the number and level of staff assigned.

fig. 1. washington library network customer invoice.

fig. 2. washington library network monthly activity report (selective sample).

in the cost analysis of the automated system, the monthly wages for staff members of the cataloging and book processing unit were based on current monthly salaries (as of february 1980) plus estimated fringe benefits (21 percent). the total wages were added together for each level of staff and divided by the number of staff at that level to give an
average monthly wage. this average was then divided by 174 (the standard figure for university staff hours per month) to determine the average hourly rate. to calculate staff costs per minute, it was necessary to carry the per-minute costs to the third decimal to approximate the total dollars expended for staffing (see table 7). no other indirect costs, e.g., breaks, annual leave, or holidays, were included in staff wages; however, in order to determine the staff hours available to perform the functions being analyzed, nonproductive hours or staff hours devoted to other assignments had to be calculated and deducted. these calculations were made according to the following formula:

committee assignment (varied): __ hours/year
unit meetings (varied): __ hours/year
breaks (standard): 120 hours/year
annual leave (varied): __ hours/year
holidays (standard): 88 hours/year
sick leave (standardized, based on hours earned per month): 96 hours/year
total hours/year ÷ 12 = __ hours/month

the primary reasons for variation in the nonproductive hours were length of service and whether a staff member was faculty or classified. staff costs under the manual system were based on current monthly wages; however, the number and level of staff are essentially that which existed at the time the manual system was functioning (see table 8). timeslip costs were not based on the minimum hourly wage, since a large number of hours were work/study during the period of the analysis. the total monthly expenditure was divided by the total hours worked to derive the per-minute timeslip costs. no effort was made to reconstruct actual timeslip costs under the manual system, but the same per-minute timeslip costs were used in order to avoid unnecessary skewing of staff costs under the manual system.
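to make the arithmetic concrete, the lt ii rate used throughout the cost charts works out as follows (the figures are those reported in table 7; only the step-by-step layout is added here):

$3,955 per month for four lt ii positions, including 21 percent fringe benefits = $989 per position
$989 per month ÷ 174 hours = $5.68 per hour
$5.68 per hour ÷ 60 = $.094 per minute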
6.9 cents per data-base inquiry, three kinds of processes allow a given number of inquiries without charge. since not all allowable inquiries are always used for these processes, there are generally a number of inquiries which can be made without charges being assessed. between july 1979 and february 1980, the average number of monthly inquiries for which there was a charge was 11,800; the average number per month for which there was no charge assessed was 8,044. for this reason, in the cost analysis of the automated system (table 2(a)), there appears a category "items searched, no inquiry charges" under the bibliographic searching section. (note: part of the "no charge" inquiries are generated and used by the acquisitions unit and are therefore not included in this analysis.)

table 6. type and average number of items searched/cataloged per month on manual system (based on cbp work assignment/staffing profile)
(columns: searched (idc)/month; found/month; not found/month; nuc searched/month)
book approvals: 600; 300 (50%); 300; -
firm orders: 700; 280 (40%); 420; 420
new acquisitions (re-searched): 295; 59 (20%); 236; -
precats: 1,380; 276 (20%); 1,104; -
documents: 125; 12 (10%); 113; 113
serials: 100; 5 (5%); 95; 95
rush: 75; 23 (30%); 52; 52
gifts: 100; 5 (5%); 95; 95
monographic series: 300; 90 (30%); 210; 210
originals: 222; 0 (0%); 222; 222
reinstates: 75; 7 (10%); 68; 68
total: 3,972; 1,057 (26.5%); 2,915; 1,275
type and quantity of materials cataloged: idc copy 1,057; modified copy 984; original cataloging 222; total 2,263

although the terminal service and line charges might simply have been added as a total amount to the data-base costs, it seemed more meaningful to distribute these costs on a per-use basis. the method used to distribute these charges was to identify each use of the bibliographic data base, and to divide the total monthly costs of terminals and lines by the total monthly units of use (see table 9). this method of distributing terminal service and line charges not only provided per-unit terminal use costs, but also served to categorize kinds and quantity of data-base use.

subscription and material costs
subscription costs include only those bibliographic tools purchased for use in tsd for the purpose of bibliographic searching. as a result of the increased growth of the bibliographic data base, fewer tools are being used for searching under the automated system than under the manual system. prior to the implementation of wln, the library subscribed to bibliographic data (lc and cip copy) on microfiche supplied by the information dynamics corporation (idc). the per-unit costs of all subscriptions are presented in the cost analysis charts (tables 1(a) and 2(a)). material costs include only those materials unique to cataloging and book processing; general supplies, such as pencils and paper, are not included. the calculation of the per-unit cost of most materials is generally straightforward. it should be noted, however, that under the automated system, products, i.e., materials, are included in the data-base costs, and only those materials used independent of the data base, e.g., book pockets and book cards, are listed as material costs on the charts.
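the staff-cost arithmetic repeated throughout tables 7 and 8 below follows one pattern: monthly salaries plus 21 percent fringe benefits, divided by the number of staff at that level, by 174 hours per month, and by 60 minutes. a minimal javascript sketch of that calculation (the function name is illustrative; the sample figures are those of the lt ii line in table 7):

// per-minute staff cost: monthly salaries + 21% fringe, divided by
// staff count, by 174 hours/month, and by 60 minutes/hour
function costPerMinute(monthlySalaries, staffCount) {
  var withFringe = monthlySalaries * 1.21;   // e.g., $3,269 * 1.21 = about $3,955
  var perPerson = withFringe / staffCount;   // about $989/mo for lt ii (4)
  var perHour = perPerson / 174;             // about $5.68/hr
  return perHour / 60;                       // about $.094/min
}
costPerMinute(3269, 4); // 0.094..., matching the lt ii line of table 7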
table 7. staff costs: automated cataloging and book processing system
staff costs/month (salaries/month; plus 21% fringe benefits; costs/month):
classified staff:
oa ii: $912; $192; $1,104
lt i (4): $2,888; $606; $3,494
lt ii (4): $3,269; $686; $3,955
lt iii (2): $2,024; $425; $2,449
supervisor ii (2): $2,578; $541; $3,119
subtotal: $14,121
faculty:
catalogers (3.5) (monos): $4,691; $985; $5,676
unit head: $1,774; $373; $2,147
subtotal: $7,823
staff costs/minute:
timeslip: $1,456/mo ÷ 809 hrs = $1.80/hr ÷ 60 = $.03/min
oa ii: $1,104/mo ÷ 174 = $6.34/hr ÷ 60 = $.105/min
lt i (4): $3,494/mo ÷ 4 = $874/mo ÷ 174 = $5.02/hr ÷ 60 = $.083/min
lt ii (4): $3,955/mo ÷ 4 = $989/mo ÷ 174 = $5.68/hr ÷ 60 = $.094/min
lt iii (2): $2,449/mo ÷ 2 = $1,225/mo ÷ 174 = $7.04/hr ÷ 60 = $.117/min
supervisor ii (2): $3,119/mo ÷ 2 = $1,560/mo ÷ 174 = $8.97/hr ÷ 60 = $.15/min
catalogers (3.5): $5,676/mo ÷ 3.5 = $1,622/mo ÷ 174 = $9.32/hr ÷ 60 = $.155/min
unit head: $2,147/mo ÷ 174 = $12.34/hr ÷ 60 = $.205/min
total staff costs/month:
timeslip (809 hrs @ $1,456/mo): $1,456
special projects librarian: $345*
classified staff: $14,121
faculty: $7,823
total (all staff): $23,745
*amount of time (wages) assigned to cataloging.

under the manual system, due to the divisional arrangement of the library system and the number of card catalogs being maintained, the formula for producing sets of cards for a single title was complex. for this reason, the costs and number of cards produced for the titles cataloged per month are listed as a separate line item.

equipment costs
equipment costs include only equipment unique to cataloging and book processing, i.e., required for processing or products. general equipment, such as desks, book trucks, and typewriters, is not included.

equipment-automated system
during the period covered by the cost analysis, november 1977 to february 1980, the following equipment was purchased for the automated system:
7 bibliographic terminals: $24,360
10 modems or modem contention units: $5,433
2 printers: $6,500
subtotal: $36,293
tax: $1,887
total: $38,180
two pieces of equipment are currently being leased (maintenance included): keypunch @ $92.61/month; verifier @ $101.12/month; total $193.73/month.
summary of monthly equipment costs: purchases (5-year amortization) $636.33; maintenance $60.00; leased equipment $193.73; total $890.06/month.

equipment-manual system
if the automated system had not been implemented, the following equipment would have been purchased during this period:
2 card catalogs: $3,755
5 kardex units: $4,475
2 linedex units: $2,944
subtotal: $11,174
tax: $581
total: $11,755
although the anticipated life span of this equipment should be considerably greater than that of terminals and modems, it has also been amortized over a five-year period. the rationale for this period of amortization is that the rate of growth of the files for which the equipment is used results in the purchase of additional equipment equivalent to the expected replacement of electronic equipment. therefore, the initial cost of these purchases amortized would have been $196/month. since the multilith has been owned by the library for more than twenty years, its purchase price is not applicable to this analysis. however, maintenance on the multilith is $72.27/month. two pieces of equipment were being leased under the manual system (maintenance included): keypunch @ $92.61/month; verifier @ $101.12/month; total $193.73/month.
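the monthly purchase figures in the two equipment summaries are simply the purchase totals (including tax) spread over the five-year (60-month) amortization period stated above. a minimal javascript sketch of that arithmetic, using the totals given in the text:

// five-year (60-month) straight-line amortization of equipment purchases
function monthlyAmortization(purchaseTotalWithTax) {
  return purchaseTotalWithTax / 60;
}
monthlyAmortization(38180); // about $636.33/month for the automated system
monthlyAmortization(11755); // about $195.92/month, rounded to $196 for the manual system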
summary of monthly equipment costs: purchases (5-year amortization) $196.00; maintenance $72.27; leased equipment $193.73; total $462.00/month.

summary and conclusion
the cost analysis clearly indicates that at washington state university libraries the automated cataloging and book processing system is less expensive than its previous manual system. by using the bibliographic component of the washington library network, the library has reduced the costs of searching, cataloging, and record maintenance by almost 20 percent (see table 10, summary comparison costs by function). the higher costs of the manual system are essentially staff costs.

table 8. staff costs: manual cataloging and book processing system (based on the 1977 staffing levels at current staff costs)
staff costs/month (salaries/month; plus 21% fringe benefits; costs/month):
classified staff:
oa ii-typing: $912; $192; $1,104
lt i (11): $7,950; $1,670; $9,622
lt ii (3): $2,434; $511; $2,945
lt iii (5): $5,060; $1,063; $6,123
supervisor i (2): $2,175; $457; $2,632
supervisor ii: $1,289; $271; $1,560
offset duplicator operator: $1,135; $238; $1,373
subtotal: $25,359
faculty:
catalogers (3.5): $4,691; $985; $5,676
unit head: $1,774; $373; $2,147
subtotal: $7,823
staff costs/minute:
timeslip: $2,174/mo ÷ 1,208 hrs = $1.80/hr ÷ 60 = $.03/min
oa ii-typing: $1,104/mo ÷ 174 = $6.34/hr ÷ 60 = $.105/min
lt i (11): $9,622/mo ÷ 11 = $875/mo ÷ 174 = $5.03/hr ÷ 60 = $.084/min
lt ii (3): $2,945/mo ÷ 3 = $982/mo ÷ 174 = $5.64/hr ÷ 60 = $.094/min
lt iii (5): $6,123/mo ÷ 5 = $1,225/mo ÷ 174 = $7.04/hr ÷ 60 = $.117/min
supervisor i (2): $2,632/mo ÷ 2 = $1,316/mo ÷ 174 = $7.56/hr ÷ 60 = $.126/min
supervisor ii: $1,560/mo ÷ 174 = $8.97/hr ÷ 60 = $.149/min
offset duplicator operator: $1,373/mo ÷ 174 = $7.89/hr ÷ 60 = $.13/min
catalogers (3.5): $5,846/mo ÷ 3.5 = $1,670/mo ÷ 174 = $9.60/hr ÷ 60 = $.155/min
unit head: $2,147/mo ÷ 174 = $12.34/hr ÷ 60 = $.205/min
total staff costs/month:
timeslip (1,208 hrs @ $2,174/mo): $2,174
classified staff: $25,359
faculty: $7,823
total (all staff): $35,356

table 9. bibliographic data base use per month (one unit = one access to or process in data base)
searching: 10,688
cataloging (data base copy): 1,529
cataloging (original data entry): 317
authority verification (317 x 7): 2,219
bibliographic changes/corrections: 360
ill, ref, general: 537
total units: 15,650
wln terminal service and telecommunication line charges/month: 5 1/2 terminals @ $140/mo = $770/mo; 5 1/2 lines @ $40/mo = $220/mo; total $990/mo.
$990 ÷ 15,650 = $.06/terminal use for the cataloging and book processing system.
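the per-use terminal figure in table 9 distributes the combined monthly terminal and line charges across every use of the data base. a minimal javascript sketch of that distribution (the function name is illustrative):

// distribute monthly terminal and telecommunication line charges across all data-base uses
function costPerTerminalUse(terminalCharges, lineCharges, unitsOfUse) {
  return (terminalCharges + lineCharges) / unitsOfUse;
}
costPerTerminalUse(770, 220, 15650); // about $0.06 per terminal use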
table 10. cataloging and book processing system: summary comparison costs by function (excluding equipment costs)
(columns: function; number of items; costs per month)
manual system:
1. bibliographic searching: 5,247; $4,617
2. bibliographic record production (cost of catalog cards distribution): [2,263]*; [$13,143]†
lc copy cataloging: 1,057; $4,092
modified copy cataloging: 984; $5,021
original cataloging: 222; $3,343
miscellaneous: na; $687
3. bibliographic record maintenance: na; $6,402
4. marking: na; $4,631
total: $28,793
automated system:
1. bibliographic searching: 4,480; $2,170
2. bibliographic record production (cost of catalog cards included): [2,263]*; [$11,837]†
lc and cip copy cataloging: 1,529; $4,900
modified copy cataloging: 512; $1,523
original cataloging: 222; $3,343
miscellaneous: na; $2,071
3. bibliographic record maintenance: na; $4,194
4. marking: na; $4,589
total: $22,790
*total of items listed below. †total of costs listed below.

under that system, eleven more staff and 1,365 more timeslip hours were needed per month to process the same amount of materials as is processed under the automated system. in fact, compared to the staff costs of both the manual and automated systems, the costs of equipment, data-base use (including products), terminal service, and telecommunication lines of the automated system are a relatively small percentage (27 percent) of the total cataloging and book processing costs. this analysis serves to underscore a basic reality of the current library organization: personnel is one of its largest expenditures, and staff-intensive systems are very costly. this cost analysis has not directly addressed the issue of the quality of processing and products of either the manual or automated systems. the analysis suggests, however, that the automated system is more efficient in terms of staff time. moreover, the tsd staff has found that not only can more be done with fewer staff, but the automated system also provides more accurate data and has the flexibility to accommodate with relative ease the many corrections and changes that must be made to the library's bibliographic files.

joselyn druschel is assistant director for automation and technical support at the washington state university libraries. she is currently chairing a staff task force which is developing specifications for the libraries' on-line catalog.

158 information technology and libraries | december 2009

michelle frisque
president's message

i know the president's message is usually dedicated to talking about where lita is now or where we are hoping lita will be in the future, but i would like to deviate from the usual path. the theme of this issue of ital is "discovery," and i thought i would participate in that theme. like all of you, i wear many hats. i am president of lita. i am head of the information services department at the galter health sciences library at northwestern university. i also am a new part-time student in the masters of learning and organizational change program at northwestern university. as a student and a practicing librarian, i am now on both sides of the discovery process. as head of the information systems department, i lead the team that is responsible for developing and maintaining a website that assists our health-care clinicians, researchers, students, and staff with selecting and managing the electronic information they need when they need it. as a student, i am a user of a library discovery system.

in a recent class, we were learning about the burke-litwin causal model of organization performance and change. the article we were reading described the model; however, it did not answer all of my questions. i thought about my options and decided i should investigate further. before i continue, i should confess that, like many students, i was working on this homework assignment at the last minute, so the resources had to be available online. this should be easy, right? i wanted to find an overview of the model. i first tried the library's website using several search strategies and browsed the resources in metalib, the library catalog, and libguides with no luck. the information i found was not what i was looking for.
i then tried wikipedia without success. finally, as a last resort, i searched google. i figured i would find something there, right? i didn't. while i found many scholarly articles and sites that would give me more information for a fee, none of the results i reviewed gave me an overview of the model in question. i gave up. the student in me thought: it should not be this hard! the librarian in me just wanted to forget i had ever had this experience.

this got me to thinking: why is this so hard? libraries have "stuff" everywhere. we access "stuff," like books, journals, articles, images, datasets, etc., from hundreds of vendors and thousands of publishers who guard their stuff and dictate how we and our users can access that stuff. that's a problem. i could come up with a million other reasons why this is so difficult, but i won't. instead, i would like to think about what could be. in this same class we learned about appreciative inquiry (ai) theory. i am simplifying the theory, but the essence of ai is to think about what you want something to be instead of identifying the problems of what is. i decided to put ai to the test and tried to come up with my ideal discovery process. i put both my student and librarian hats on, and here is what i have come up with so far:
■■ i want to enter my search in one place and search once for what i need. i don't want to have to search the same terms many times in various locations in the hopes one of them has what i am looking for. i don't care where the stuff is or who provides the information. if i am allowed to access it i want to search it.
■■ i want items to be recommended to me on the basis of what i am searching. i also want the system to recommend other searches i might want to try.
■■ i want the search results to be organized for me. while perusing a result list can be loads of fun because you never know what you might find, i don't always have time to go through pages and pages of information.
■■ i want the search results to be returned to me in a timely manner.
■■ i want the system to learn from me and others so that the results list improves over time.
■■ i want to find the answer.
i'm sure if i had time i would come up with more. while we aren't there yet, we should continually take steps—both big and small—to perfect the discovery process. i look forward to reading the articles in this issue to see what other librarians have discovered, and i hope to learn new things that will bring us one step closer to creating the ultimate discovery experience.

michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago.

practical limits to the scope of digital preservation
mike kastellec

abstract
this paper examines factors that limit the ability of institutions to digitally preserve the cultural heritage of the modern era. the author takes a wide-ranging approach to shed light on limitations to the scope of digital preservation. the author finds that technological limitations to digital preservation have been addressed but still exist, and that non-technical aspects—access, selection, law, and finances—move into the foreground as technological limitations recede. the author proposes a nested model of constraints to the scope of digital preservation and concludes that costs are digital preservation's most pervasive limitation.
introduction
imagine for a moment what perfect digital preservation would entail: a perfect archive would capture all the content generated by humanity instantly and continuously. it would catalog that information and make it available to users, yet it would not stifle creativity by undermining creators' right to control their creations. most of all, it would perfectly safeguard all the information it ingested eternally, at a cost society is willing and able to sustain. now return to reality: digital preservation is decidedly imperfect. today's archives fall far short of the possibilities outlined above. much previous scholarship debates the quality of different digital preservation strategies; this paper looks past these arguments to shed light on limitations to the scope of digital preservation. what are the factors that limit the ability of libraries, archives, and museums (henceforth collectively referred to as archival institutions) to digitally preserve the cultural heritage of the modern era?1 i first examine the degree to which technological limitations to digital preservation have been addressed. next, i identify the non-technical factors that limit the archival of digital objects. finally, i propose a conceptual model of limitations to digital preservation.

mike kastellec (makastel@ncsu.edu) is libraries fellow, north carolina state university libraries, raleigh, nc.

technology
any discussion of digital preservation naturally begins with consideration of the limits of digital preservation technology. while all aspects of digital preservation are by definition related to technology, there are two purely technical issues at the core of digital preservation: data loss and technological obsolescence.2 many things can cause data loss. the constant risk is physical deterioration. a digital file consists at its most basic level of binary code written to some form of physical media. just like analog media (paper, vinyl recordings), digital media (optical discs, hard drives) are subject to degradation at a rate determined by the inherent properties of the medium and environment in which it is stored.3 when the physical medium of a digital file decays to the point where one or more bits lose their definition, the file becomes partially or wholly unreadable. other causes of data loss include software bugs, human action (e.g., accidental deletion or purposeful alteration), and environmental dangers (e.g., fire, flood, war). assuming a digital archive can overcome the problem of physical deterioration, it then faces the issue of technological obsolescence. binary code is simply a string of zeroes and ones (sometimes called a bitstream)—like any encoded information, this code is only useful if it can be decoded into an intelligible format. this process depends on hardware, used to access a bitstream from a piece of physical media, and software, which decodes the bitstream into an intelligible object, such as a document or video displayed on a screen, a printout, or an audio output. technological obsolescence occurs when either the hardware or software needed to render a bitstream usable is no longer available. given the rapid pace of change in computer hardware and software, technological obsolescence is a constant concern.4
most digital preservation strategies involve staying ahead of deterioration and obsolescence by copying data from older to current generations of file formats and storage media (migration) or by keeping many copies that are tested against one another to find and correct errors (data redundancy).5 other strategies to overcome obsolescence include pre-emptively converting data to standardized formats (normalization) or avoiding conversion and instead using virtualized hardware and software to simulate the original digital environment needed to access obsolete formats (emulation). as may be expected of a young field,6 there is a great deal of debate over the merits of each of these strategies. to date, the arguments mostly concern the quality of preservation, which is beyond the scope of this work. what should not be contentious is that each strategy also imposes limitations on the potential scale of digital preservation. migration and normalization are intensive processes, in the sense that they normally require some level of human interaction. any human-mediated process limits the scale of an archival institution's preservation activities, as trained staffs are a limited and expensive resource. emulation postpones the processing of data until it is later accessed, potentially allowing greater ingest of information. as a strategy, however, it remains at least partly theoretical and untested, increasing the possibility that future access will be limited.

data redundancy deserves closer examination, as it has emerged as the gold standard in recent years. the limitations data redundancy imposes on digital preservation are two-fold. the first is that simple maintenance of multiple copies necessarily increases expenses, therefore—given equal levels of funding—less information can be preserved redundantly than can be preserved without such measures. (cost considerations are inextricably linked to every other limitation on digital preservation and are examined in greater detail in "finances," below.) there are practical, technical limitations on the bandwidth, disk access, and processing speeds needed to perform parity checks (tests of each bit's validity) of large datasets to guard against data loss. pushing against these limitations incurs dramatic costs, limiting the scale of digital preservation. current technology and funding are many orders of magnitude short of what is required to archive the amount of information desired by society over the long term.7 the second way technology limits digital preservation is more complex—it concerns error rates of archived data. non-redundant storage strategies are also subject to errors, of course. only redundant systems have been proposed as a theoretical solution to the technological problem of digital preservation,8 though, so it is necessary to examine their error rate in particular. on a theoretical level, given sufficient copies, redundant backup is all but infallible. in practice, technological limitations emerge.9 the number of copies required to ensure perfect bit preservation is a function of the reliability of the hardware storing each copy. multiple studies have found that hardware failure rates greatly exceed manufacturers' claims.10 rosenthal argues that, given the extreme time spans under consideration, storage reliability is not just unknown but untestable.11 he therefore concludes that it cannot be known with certainty how many copies are needed to sustain acceptably low error rates.
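the dependence of copy count on hardware reliability can be made concrete with a toy calculation: if each copy independently survives a given period with probability p, the chance that every copy is lost is (1 - p) raised to the number of copies, so lower per-copy reliability demands more copies for the same target. a minimal javascript sketch (the probabilities are illustrative, and the independence assumption is a simplification, not a measured value):

// toy model: probability that all copies of an object are lost,
// assuming each copy fails independently over the period considered
function lossProbability(perCopySurvival, copies) {
  return Math.pow(1 - perCopySurvival, copies);
}
lossProbability(0.99, 3); // 1e-6 -- three copies on fairly reliable storage
lossProbability(0.90, 3); // 1e-3 -- same copy count, less reliable storage
// about six copies of the less reliable storage are needed to reach the same 1e-6 level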
even today's best digital preservation technologies are subject to some degree of loss and error. analog materials are also inevitably subject to deterioration, of course, but the promise of digital media leads many to unrealistic expectations of perfection. nevertheless, modern digital preservation technology addresses the fundamental needs of archival institutions to a workable degree. technological limitations to digital preservation still exist, but the aspects of digital preservation beyond purely technical considerations—access, selection, law, and finances—should gain greater relative importance than they have in the past.

access
with regard to digital preservation, there are two different dimensions of access that are important. at one end of a digital preservation operation, authorized users must be able to access an archival institution's holdings and unauthorized users restricted from doing so. this is largely a question of technology and rights management—users must be able to access preserved information and permitted to do so. this dimension of access is addressed in the technology and law sections of this paper. the other dimension of access occurs at the other end of a digital preservation operation: an archival institution must be able to access a digital object to preserve it. this simple fact leads to serious restrictions on the scope of digital preservation because much of the world's digital information is inaccessible for the purposes of archiving by libraries and archives. there are a number of reasons why a given digital object may be inaccessible. large-scale harvesting of webpages requires automated programs that "crawl" the web, discovering and capturing pages as they go. web crawlers cannot access password-protected sites (e.g., facebook) and database-backed sites (all manner of sites, including many blogs, news sites, e-commerce sites, and countless collections of data). this inaccessible portion of the web is estimated to dwarf the readily accessible portion by orders of magnitude. there is also an enormous amount of inaccessible digital information that is not part of the web at all, such as emails, company intranets, and digital objects created and stored by individuals.12 additionally, there is a temporal limit to access. some digital objects only are accessible (or even exist) for a short window of time, and all require some measure of active preservation to avoid permanent loss.13 the lifespans of many webpages are vanishingly short. other pages, like some news items, are publicly accessible for a short window before they are hidden behind paywalls. even long-lasting digital objects are often dynamic: the ads accompanying a webpage may change with each visit; news articles and other documents are revised; blog posts and comments are deleted. if an archival institution cannot access a digital object quickly or frequently enough, the object cannot be archived, at least not completely. large-scale digital preservation, which in practice necessarily relies on periodic automated harvesting of content, is therefore limited to capturing snapshots of the changes digital objects undergo over their lifespans.

law
existing copyright law does not translate well to the digital realm.
leaving aside the complexities of international copyright law, in the united states it is not clear, for example, whether an archival institution like the library of congress is bound by licensing restrictions and if it can require deposit of digital objects, nor whether content on the web or in databases should be treated as published or unpublished.14 "many of the uncertainties come from applying laws to technologies and methods of distribution they were not designed to address."15 a lack of revised laws or even relevant court decisions significantly impacts the potential scale of digital preservation, as few archival institutions will venture to preserve digital objects without legal protection for doing so. given this unclear legal environment, efforts at large-scale digital preservation are hampered by the need to secure permission to archive from the rights holder of each piece of content.16 this obviously has enormous impact on preserving the web, but even scholarly databases and periodical archives may not hold full rights to all of their published content. additionally, a single digital object can include content owned by any number of authors, each of whose permission is needed for legal archival. without stronger legal protection for archival institutions, the scope of digital preservation is severely limited by copyright restrictions. digital preservation is further limited by licensing agreements, which can be even more restrictive than general copyright law. frequently, purchase of a digital object does not transfer ownership to the end-user, but rather grants limited licensed access to the object. in this case, libraries do not enjoy the customary right of first sale that, among other things, allows for actions related to preservation that would otherwise breach copyright.17 preservation of licensed works requires that libraries either cede archival responsibility to rights holders, negotiate the right to archive licensed copies, or create dark archives that preserve objects in an inaccessible state until their copyright expires.

selection
the limitation selection imposes on digital preservation hinges on the act of intellectual appraisal. the total digital content created each year already outstrips the total current storage capacity of the world by a wide margin.18 it is clear libraries and archives cannot preserve everything so, more than ever, deciding what to preserve is critical.19 models of selection for digital objects can be plotted on a scale according to the degree of human mediation they entail. at one end, the selective model is closest to selection in the analog world, with librarians individually identifying digital objects worthy of digital preservation. at the other end of the scale, the whole domain model involves minimal human mediation, with automated harvesting of digital objects. the collaborative model, in which archival institutions negotiate agreements with publishers to deposit content, falls somewhere between these two extremes, as does the thematic model, which can apply either selective- or whole-domain-type approaches to relatively narrow sets of digital objects defined by event, topic, or community. each of these approaches results in limits to the scope of digital preservation. the human mediation of the selective model limits the scale of what can be preserved, as objects can only be acquired as quickly as staff can appraise them.
the collaborative and thematic models offer the potential for thorough coverage of their target but by definition are limited in scope. the whole domain model avoids the bottleneck of human appraisal but, more than any other model, is subject to the access limitations discussed above. whole domain harvesting is also essentially wasteful, as it is an anti-selection approach—everything found is kept, irrespective of potential value. this wastefulness makes the whole domain model extremely expensive because of the technological resources required to manage information at such a scale.

finances
the ultimate limiting factor is financial reality. considerations of funding and cost have both broad and narrow effects. the narrow effects are on each of the other limitations previously identified—financial constraints are intertwined with the constraints imposed by technology, access, law, and selection. the technological model of digital preservation that offers the highest quality and lowest risk, redundant offsite copies, also carries hard-to-sustain costs. while the cost of storage continues to drop, hardware costs actually make up only a small percentage of the total cost of digital preservation. power, cooling, and—for offsite copy strategies—bandwidth costs are significant and do not decrease as scale increases to the same degree that storage costs do. cost considerations similarly fuel non-technical limitations: increased funding can increase the rate at which digital objects are accessed for preservation and can enable development of systems to mine deep web resources. selection is limited by the number of staff who can evaluate objects or the need to develop systems to automate appraisal. negotiating perpetual access to objects or arranging to purchase archival copies creates additional costs. the broad financial effect is that any digital preservation requires dedicated funding over an indefinite timespan. lavoie outlines the problem:

much of the discussion in the digital preservation community focuses on the problem of ensuring that digital materials survive for future generations. in comparison, however, there has been relatively little discussion of how we can ensure that digital preservation activities survive beyond the current availability of soft-money funding; or the transition from a project's first-generation management to the second; or even how they might be supplied with sufficient resources to get underway at all.20

there are many possible funding models for digital preservation,21 each with their own limitations. creators and rights holders can preserve their own content but normally have little incentive to do so over the long-term, as demand for access slackens. publicly funded agencies can preserve content, but they may lack a clear mandate for doing so, and they are chronically underfunded. preservation may be voluntarily funded, as is the case for wikipedia, although it is not clear if there is enough potential volunteer funding for more than a few preservation efforts. fees may support preservation, either through charging users for access or by third-party organizations charging content owners for archival services; in such cases, however, fees may also discourage access or provision of content, respectively.

a nested model of limitations
these aspects can be seen as a series of nested constraints (see figure 1).

figure 1. nested model of limitations
at the highest level, there are technical limitations on how much digital information can be preserved at an acceptable quality. within that constraint, only a limited portion of what could possibly be preserved can be accessed by archival institutions for digital preservation. next, within that which is accessible, there are legal limitations on what may be archived. the subset defined by technological, access, and legal limitations still holds far more information than archival institutions are capable of archiving, therefore selection is required, entailing either the limited quality of automated gathering or the limited quantity of human-mediated appraisal. finally, each of these constraints is in turn limited by financial considerations, so finances exert pressure at each level.

conclusion
it is possible to envision alternative ways to model these series of constraints—the order could be different, or they could all be centered on a single point but not nested within each other. thus, undue attention should not be given to the specific sequence outlined above. one important conclusion that may be drawn, however, is that the identified limitations are related but distinct. the preponderance of digital preservation research to date has understandably focused on overcoming technological limitations. with the establishment of the redundant backup model, which addresses technological limitations to a workable degree, the field would be well served by greater efforts to push back the non-technical limitations of access, law, and selection. the other conclusion is that costs are digital preservation's most pervasive limitation. as rosenthal plainly states it, "society's ever-increasing demands for vast amounts of data to be kept for the future are not matched by suitably lavish funds."22 if funding cannot be increased, expectations must be tempered. perhaps it has always been the case, but the scale of the digital landscape makes it clear that preservation is a process of triage. for the foreseeable future, the amount of digital information that could possibly be preserved far outstrips the amount that feasibly can be preserved. it is useful to put the advances in digital preservation technology in perspective and to recognize that non-technical factors also play a large role in determining how much of our cultural heritage may be preserved for the benefit of future generations.

references and notes
1. issues specific to digitized objects (i.e., digital versions of analog originals) are not specifically addressed herein. technological limitations apply equally to digitized and born-digital objects, however, and the remaining limitations overlap greatly in either case.
2. francine berman et al., sustainable economics for a digital planet: ensuring long-term access to digital information (blue ribbon task force on sustainable digital preservation and access, 2010), http://brtf.sdsc.edu/biblio/brtf_final_report.pdf (accessed apr. 23, 2011).
3. marilyn deegan and simon tanner, "some key issues in digital preservation," in digital convergence—libraries of the future, ed. rae earnshaw and john vince, 219–37 (london: springer london, 2007), www.springerlink.com.proxy-remote.galib.uga.edu/content/h12631/#section=339742&page=1 (accessed nov. 18, 2010).
4. berman et al., sustainable economics for a digital planet; deegan and tanner, "digital convergence."
5. data redundancy normally will also entail hardware migration; it may or may not also incorporate file format migration.
6. the library of congress, for instance, only began digital preservation in 2000 (www.digitalpreservation.gov/partners/pioneers/index.html [accessed apr. 24, 2011]).
7. david s. h. rosenthal, "bit preservation: a solved problem?" international journal of digital curation 5, no. 1 (july 21, 2010), www.ijdc.net/index.php/ijdc/article/view/151 (accessed mar. 14, 2011).
8. h. m. gladney, "durable digital objects rather than digital preservation," january 1, 2008, http://eprints.erpanet.org/149 (accessed mar. 14, 2011).
9. rosenthal, "bit preservation."
10. ibid. rosenthal cites studies by schroeder and gibson (2007) and pinheiro (2007).
11. ibid.
12. peter lyman, "archiving the world wide web," in building a national strategy for digital preservation: issues in digital media archiving (washington, dc: council on library and information resources and library of congress, 2002), 38–51, www.clir.org/pubs/reports/pub106/pub106.pdf (accessed dec. 1, 2010); f. mccown, c. c. marshall, and m. l. nelson, "why web sites are lost (and how they're sometimes found)," communications of the acm 52, no. 11 (2009): 141–45; margaret e. phillips, "what should we preserve? the question for heritage libraries in a digital world," library trends 54, no. 1 (summer 2005): 57–71.
13. deegan and tanner, "digital convergence"; mccown, marshall, and nelson, "why web sites are lost (and how they're sometimes found)."
14. june besek, copyright issues relevant to the creation of a digital archive: a preliminary assessment (the council on library and information resources and the library of congress, 2003), www.clir.org/pubs/reports/pub112/contents.html (accessed mar. 15, 2011).
15. ibid., 17.
16. archival institutions that do not pay heed to this restriction, such as the internet archive (www.archive.org), claim their actions constitute fair use. the legality of this claim is as yet untested.
17. berman et al., sustainable economics for a digital planet.
18. francine berman, "got data?" communications of the acm 51, no. 12 (december 2008): 50, http://portal.acm.org/citation.cfm?id=1409360.1409376&coll=portal&dl=acm&idx=j79&part=magazine&wanttype=magazines&title=communications (accessed nov. 20, 2010).
19. phillips, "what should we preserve?"
20. brian f. lavoie, "the fifth blackbird," d-lib magazine 14, no. 3/4 (march 2008): i, www.dlib.org/dlib/march08/lavoie/03lavoie.html (accessed mar. 14, 2011).
21. berman et al., sustainable economics for a digital planet.
22. rosenthal, "bit preservation."

76 information technology and libraries | june 2010

godmar back and annette bailey
web services and widgets for library information systems

as more libraries integrate information from web services to enhance their online public displays, techniques that facilitate this integration are needed. this paper presents a technique for such integration that is based on html widgets.
we discuss three example systems (google book classes, tictoclookup, and majax) that implement this technique. these systems can be easily adapted without requiring programming experience or expensive hosting.

to improve the usefulness and quality of their online public access catalogs (opacs), more and more librarians include information from additional sources into their public displays.1 examples of such sources include web services that provide additional bibliographic information, social bookmarking and tagging information, book reviews, alternative sources for bibliographic items, table-of-contents previews, and excerpts. as new web services emerge, librarians quickly integrate them to enhance the quality of their opac displays. conversely, librarians are interested in opening the bibliographic, holdings, and circulation information contained in their opacs for inclusion into other web offerings they or others maintain. for example, by turning their opac into a web service, subject librarians can include up-to-the-minute circulation information in subject or resource guides. similarly, university instructors can use an opac's metadata records to display citation information ready for import into citation management software on their course pages. the ability to easily create such "mash-up" pages is crucial for increasing the visibility and reach of the digital resources libraries provide.

although the technology to use web services to create mash-ups is well known, several practical requirements must be met to facilitate its widespread use. first, any environment providing for such integration should be easy to use, even for librarians with limited programming background. this ease of use must extend to environments that include proprietary systems, such as vendor-provided opacs. second, integration must be seamless and customizable, allowing for local display preferences and flexible styling. third, the setup, hosting, and maintenance of any necessary infrastructure must be low-cost and should maximize the use of already available or freely accessible resources. fourth, performance must be acceptable, both in terms of latency and scalability.2

in this paper we discuss the design space of methods for integrating information from web services into websites. we focus primarily on client-side mash-ups, in which code running in the user's browser contacts web services directly without the assistance of an intermediary server or proxy. to create such mash-ups, we advocate the use of "widgets," which are easy-to-use, customizable html elements whose use does not require programming knowledge. although the techniques we discuss apply to any web-based information system, we specifically consider how an opac can become both the target of web services integration and also a web service that provides information to be integrated elsewhere. we describe three widget libraries we have developed, which provide access to four web services. these libraries have been deployed by us and others. our contributions are twofold: we give practitioners an insight into the trade-offs surrounding the appropriate choice of mash-up model, and we present the specific designs and use examples of three concrete widget libraries librarians can directly use or adapt. all software described in this paper is available under the lgpl open source license.

godmar back (gback@cs.vt.edu) is assistant professor, department of computer science, and annette bailey (afbailey@vt.edu) is assistant professor, university libraries, virginia tech university, blacksburg.

■■ background
web-based information systems use a client-server architecture in which the server sends html markup to the user's browser, which then renders this html and displays it to the user. along with html markup, a server may send javascript code that executes in the user's browser. this javascript code can in turn contact the original server or additional servers and include information obtained from them into the rendered content while it is being displayed. this basic architecture allows for myriad possible design choices and combinations for mash-ups. each design choice has implications for ease of use, customizability, programming requirements, hosting requirements, scalability, latency, and availability.

server-side mash-ups
in a server-side mash-up design, shown in figure 1, the mash-up server contacts the base server and each source when it receives a request from a client. it combines the information received from the base server and the sources and sends the combined html to the client. server-side mash-up systems that combine base and mash-up servers are also referred to as data mash-up systems. such data mash-up systems typically provide a web-based configuration front-end that allows users to select data sources, specify the manner in which they are combined, and to create a layout for the entire mash-up. examples of such systems include dapper and yahoo! pipes.3 these systems require very little programming knowledge, but they limit mash-up creators to the functionality supported by a particular system and do not allow the user to leverage the layout and functionality of an existing base server, such as an existing opac. integrating server-side mash-up systems with proprietary opacs as the base server is difficult because the mash-up server must parse the opac's output before integrating any additional information. moreover, users must now visit—or be redirected to—the url of the mash-up server. although some emerging extensible opac designs provide the ability to include information from external sources directly and easily, most currently deployed systems do not.4 in addition, those mash-up servers that do usually require server-side programming to retrieve and integrate the information coming from the mash-up sources into the page. the availability of software libraries and the use of special purpose markup languages may mitigate this requirement in the future.
from a performance scalability point of view, the mash-up server is a bottleneck in server-side mash-ups and therefore must be made large enough to handle the expected load of end-user requests. on the other hand, the caching of data retrieved from mash-up sources is simple to implement in this arrangement because only the mash-up server contacts these sources. such caching reduces the frequency with which requests have to be sent to sources if their data is cacheable, that is, if real-time information is not required. the latency in this design is the sum of the time required for the client to send a request to the mash-up server and receive a reply, plus the processing time required by the server, plus the time incurred by sending a request and receiving a reply from the last responding mash-up source. this model assumes that the mash-up server contacts all sources in parallel, or as soon as the server knows that information from a source should be included in a page. the availability of the system depends on the availability of all mash-up sources. if a mash-up source does not respond, the end user must wait until such failure is apparent to the mash-up server via a timeout. finally, because the mash-up server acts as a client to the base and source servers, no additional security considerations apply with respect to which sources may be contacted. there also are no restrictions on the data interchange format used by source servers as long as the mash-up server is able to parse the data returned.

figure 1. server-side mash-up construction

client-side mash-ups
in a client-side setup, shown in figure 2, the base server sends only a partial website to the client, along with javascript code that instructs the client which other sources of information to contact. when executed in the browser, this javascript code retrieves the information from the mash-up sources directly and completes the mash-up.

figure 2. client-side mash-up construction

the primary appeal of client-side mashing is that no mash-up server is required, and thus the url that users visit does not change. consequently, the mash-up server is no longer a bottleneck. equally important, no maintenance is required for this server, which is particularly relevant when libraries use turnkey solutions that restrict administrative access to the machine housing their opac. on the other hand, without a mash-up server, results from mash-up sources can no longer be centrally cached. thus the mash-up sources themselves must be sufficiently scalable to handle the expected number of requests. as a load-reducing strategy, mash-up sources can label their results with appropriate expiration times to influence the caching of results in the clients' browsers. availability is increased because the mash-up degrades gracefully if some of the mash-up sources fail, since the information from the remaining sources can still be displayed to the user. assuming that requests are sent by the client in parallel or as soon as possible, and assuming that each mash-up source responds with similar latency to requests sent by the user's browser as to requests sent by a mash-up server, the latency for a client-side mash-up is similar to the server-side mash-up.
however, unlike in the server-side approach, the page designer has the option to display partial results to the user while some requests are still in progress, or even to delay sending some requests until the user explicitly requests the data by clicking on a link or other element on the page. because client-side mash-ups rely on javascript code to contact web services directly, they are subject to a number of restrictions that stem from the security model governing the execution of javascript code in current browsers. this security model is designed to protect the user from malicious websites that could exploit client-side code and abuse the user’s credentials to retrieve html or xml data from other websites to which a user has access. such malicious code could then relay this potentially sensitive data back to the malicious site. to prevent such attacks, the security model allows the retrieval of html text or xml data only from sites within the same domain as the origin site, a policy commonly known as sameorigin policy. in figure 2, sources a and b come from the same domain as the page the user visits. the restrictions of the same-origin policy can be avoided by using the javascript object notation (json) interchange format.5 because client-side code may retrieve and execute javascript code served from any domain, web services that are not co-located with the origin site can make their results available using json. doing so facilitates their inclusion into any page, independent of the domain from which it is served (see source c in figure 2). many existing web services already provide an option to return data in json format, perhaps along with other formats such as xml. for web services that do not, a proxy server may be required to translate the data coming from the service into json. if the implementation of a proxy server is not feasible, the web service is usable only on pages within the same domain as the website using it. client-side mash-ups lend themselves naturally to enhancing the functionality of existing, proprietary opac systems, particularly when a vendor provides only limited extensibility. because they do not require server-side programming, the absence of a suitable vendor-provided server-side programming interface does not prevent their creation. oftentimes, vendor-provided templates or variables can be suitably adapted to send the necessary html markup and javascript code to the client. the amount of javascript code a librarian needs to write (or copy from a provided example) determines both the likelihood of adoption and the maintainability of a given mash-up creation. the less javascript code there is to write, the larger the group of librarians who feel comfortable trying and adopting a given implementation. the approach of using html widgets hides the use of javascript almost entirely from the mash-up creator. html widgets represent specially composed markup, which will be replaced with information coming from a mash-up source when the page is rendered. because the necessary code is contained in a javascript library, adapters do not need to understand programming to use the information coming from the web service. finally, html widgets are also preferable for javascript-savvy users because they create a layer of abstraction over the complexity and browser dependencies inherent in javascript programming. 
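because json responses are loaded as executable javascript, the usual implementation of the workaround described above is to inject a <script> element whose url names a callback function; the browser fetches and runs the returned code, which invokes the callback with the data. a minimal javascript sketch of the pattern, using the google book search dynamic link api request that appears in table 1 below (the helper name and the empty handler are illustrative, not part of the widget libraries described in this paper):

// jsonp: fetch a cross-domain web service response by appending a <script> tag
function jsonpRequest(url, callbackName, handler) {
  window[callbackName] = handler;                    // the service calls this function
  var script = document.createElement('script');
  script.src = url + '&callback=' + callbackName;    // e.g., ...&callback=process
  document.getElementsByTagName('head')[0].appendChild(script);
}
// example: request metadata for an isbn from the google book search dynamic link api
jsonpRequest(
  'http://books.google.com/books?bibkeys=isbn:0596000278&jscmd=viewapi',
  'process',
  function (result) { /* update the page with the returned metadata */ }
);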
■■ the google book classes widget library
to illustrate our approach, we present a first example that allows the integration of data obtained from google book search into any website, including opac pages. google book search provides access to google's database of book metadata and contents. because of the company's book scanning activities as well as through agreements with publishers, google hosts scanned images of many book jackets as well as partial or even full previews for some books. many libraries are interested in either using the book jackets when displaying opac records or alerting their users if google can provide a partial or full view of an item a user selected in their catalog, or both.6 this service can help users decide whether to borrow the book from the library.

the google book search dynamic link api
the google book search dynamic link api is a json-based web service through which google provides certain metadata for items it has indexed. it can be queried using bibliographic identifiers such as isbn, oclc number, or library of congress control number (lccn). it returns a small set of data that includes the url of a book jacket thumbnail image, the url of a page with bibliographic information, the url of a preview page (if available), as well as information about the extent of any preview and whether the preview viewer can be embedded directly into other pages. table 1 shows the json result returned for an example isbn.

widgetization
to facilitate the easy integration of this service into websites without javascript programming, we developed a widget library. from the adapter's perspective, the use of these widgets is extremely simple. the adapter places html <span> or <div>
tags into the page where they want data from google book search to display. these tags contain an html attribute that acts as an identifier to describe the bibliographic item for which information should be retrieved. it may contain its isbn, oclc number, or lccn. in addition, the tags also contain one or more html <class> attributes to describe which processing should be done with the information retrieved from google to integrate it into the page. these classes can be combined with a list of traditional css classes in the <class> attribute to apply further style and formatting control. examples as an example, consider the following html an adapter may use in a page: <span title=“isbn:0596000278” class=“gbs -thumbnail gbs-link-to-preview”></span> when processed by the google book classes widget library, the class “gbs-thumbnail” instructs the widget to embed a thumbnail image of the book jacket for isbn 0596000278, and “gbs-link-to-preview” provides instructions to wrap the <span> tag in a hyperlink pointing to google’s preview page. the result is as if the server had contacted google’s web service and constructed the html shown in example 1 in table 2, but the mash-up creator does not need to be concerned with the mechanics of contacting google’s service and making the necessary manipulations to the document. example 2 in table 2 demonstrates a second possible use of the widget. in this example, the creator’s intent is to display an image that links to google’s information page if and only if google provides at least a partial preview for the book in question. this goal is accomplished by placing the image inside the span and using style=“display:none” to make the span initially invisible. the span is made visible only if a preview is available at google, displaying the hyperlinked image. the full list of features supported by the google book classes widget library can be found in table 3. integration with legacy opacs the approach described thus far assumes that the mashup creator has sufficient control over the html markup that is sent to the user. this assumption does not always hold if the html is produced by a vendor-provided system, since such systems automatically generate most of the html used to display opac search results or individual bibliographic records. if the opac provides an extension system, such as a facility to embed customized links to external resources, it may be used to generate the necessary html by utilizing variables (e.g., “@#isbn@” for isbn numbers) set by the opac software. if no extension facility exists, accommodations by the widget library are needed to maintain the goal of not requiring any programming on the part of the adapter. we implemented such accommodations to facilitate the use of google book classes within a iii millennium opac.7 we used magic strings such as “isbn:millennium.record” in a table 1. 
table 1. sample request and response for google book search dynamic link api

request:
http://books.google.com/books?bibkeys=isbn:0596000278&jscmd=viewapi&callback=process

json response:
process({
  "isbn:0596000278": {
    "bib_key": "isbn:0596000278",
    "info_url": "http://books.google.com/books?id=ezqe1hh91q4c\x26source=gbs_viewapi",
    "preview_url": "http://books.google.com/books?id=ezqe1hh91q4c\x26printsec=frontcover\x26source=gbs_viewapi",
    "thumbnail_url": "http://bks4.books.google.com/books?id=ezqe1hh91q4c\x26printsec=frontcover\x26img=1\x26zoom=5\x26sig=acfu3u2d1usnxw9baqd94u2nc3quwhjn2a",
    "preview": "partial",
    "embeddable": true
  }
});

table 2. example of client-side processing by the google book classes widget library

example 1, html written by adapter:
<span title="isbn:0596000278" class="gbs-thumbnail gbs-link-to-preview"></span>

example 1, resultant html after client-side processing:
<a href="http://books.google.com/books?id=ezqe1hh91q4c&printsec=frontcover&source=gbs_viewapi">
  <span title="" class="gbs-thumbnail gbs-link-to-preview">
    <img src="http://bks3.books.google.com/books?id=ezqe1hh91q4c&printsec=frontcover&img=1&zoom=5&sig=acfu3u2d1usnxw9baqd94u2nc3quwhjn2a" />
  </span>
</a>

example 2, html written by adapter:
<span style="display: none" title="isbn:0596000278" class="gbs-link-to-info gbs-if-partial-or-full">
  <img src="http://www.google.com/intl/en/googlebooks/images/gbs_preview_button1.gif" />
</span>

example 2, resultant html after client-side processing:
<a href="http://books.google.com/books?id=ezqe1hh91q4c&source=gbs_viewapi">
  <span title="" class="gbs-link-to-info gbs-if-partial-or-full">
    <img src="http://www.google.com/intl/en/googlebooks/images/gbs_preview_button1.gif" />
  </span>
</a>

table 3. supported google book classes

gbs-thumbnail: include an <img> embedding the thumbnail image
gbs-link-to-preview: wrap span/div in link to preview at google book search (gbs)
gbs-link-to-info: wrap span/div in link to info page at gbs
gbs-link-to-thumbnail: wrap span/div in link to thumbnail at gbs
gbs-embed-viewer: directly embed a viewer for the book's content into the page, if possible
gbs-if-noview: keep this span/div only if gbs reports that the book's viewability is "noview"
gbs-if-partial-or-full: keep this span/div only if gbs reports that the book's viewability is at least "partial"
gbs-if-partial: keep this span/div only if gbs reports that the book's viewability is "partial"
gbs-if-full: keep this span/div only if gbs reports that the book's viewability is "full"
gbs-remove-on-failure: remove this span/div if gbs doesn't return book information for this item

■■ the tictoclookup widget library

the tictocs journal table of contents service is a free online service that allows academic researchers and other users to keep up with newly published research by giving them access to thousands of journal tables of contents from multiple publishers.8 the tictocs consortium compiles and maintains a dataset that maps issns and journal titles to rss-feed urls for the journals' tables of contents.
the tictoclookup web service

we used the tictocs dataset to create a simple json web service called "tictoclookup" that returns rss-feed urls when queried by issn and, optionally, by journal title. table 4 shows an example query and response.

table 4. sample request and response for the tictoclookup web service

request:
http://tictoclookup.appspot.com/0028-0836?title=nature&jsoncallback=process

json response:
process({
  "lastmod": "wed apr 29 05:42:36 2009",
  "records": [{
    "title": "nature",
    "rssfeed": "http://www.nature.com/nature/current_issue/rss"
  }],
  "issn": "00280836"
});

to accommodate different hosting scenarios, we created two implementations of this tictoclookup: a standalone and a cloud-based implementation. the standalone version is implemented as a python web application conformant to the web server gateway interface (wsgi) specification. hosting this version requires access to a web server that supports a wsgi-compatible environment, such as apache's mod_wsgi. the python application reads the tictocs dataset and responds to lookup requests for specific issns. a cron job downloads the most up-to-date version of the dataset periodically.

the cloud version of the tictoclookup service is implemented as a google app engine (gae) application. it uses the highly scalable and highly available gae datastore to store tictocs data records. gae applications run on servers located in google's regional data centers so that requests are handled by a data center geographically close to the requesting client. as of june 2009, google hosting of gae applications is free, which includes a free allotment of several computational resources. for each application, gae allows quotas of up to 1.3 million requests and the use of up to 10 gb of bandwidth per twenty-four-hour period. although this capacity is sufficient for the purposes of many small and medium-size institutions, additional capacity can be purchased at a small cost.

widgetization

to facilitate the easy integration of this service into websites without javascript programming, we developed a widget library. like google book classes, this widget library is controlled via html attributes associated with html <span> or <div> tags that are placed into the page where the user decides to display data from the tictoclookup service. the html <title> attribute identifies the journal by its issn or its issn and title. as with google book classes, the html <class> attribute describes the desired processing, which may contain traditional css classes.

example

consider the following html an adapter may use in a page:

<span style="display:none" class="tictoc-link tictoc-preview tictoc-alternate-link" title="issn:00280836: nature">
  click to subscribe to table of contents for this journal
</span>

when processed by the tictoclookup widget library, the class "tictoc-link" instructs the widget to wrap the span in a link to the rss feed at which the table of contents is published, allowing users to subscribe to it. the class "tictoc-preview" associates a tooltip element with the span, which displays the first entries of the feed when the user hovers over the link. we use the google feeds api, another json-based web service, to retrieve a cached copy of the feed. the "tictoc-alternate-link" class places an alternate link into the current document, which in some browsers triggers the display of the rss feed icon in the status bar. the <span> element, which is initially invisible, is made visible if and only if the tictoclookup service returns information for the given pair of issn and title. figure 4 provides a screenshot of the display if the user hovers over the link.

figure 4. sample use of tictoclookup classes
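the lookup underlying these classes follows the same jsonp pattern as the google book classes example. the sketch below shows a minimal client-side call against the request and response format of table 4 and mimics the "tictoc-link" behavior of revealing a hidden placeholder; the function names and the "toc-placeholder" id are our own illustration, not the published tictoc.js code.

// illustrative jsonp call against the tictoclookup service (format per table 4)
function lookupTicToc(issn, title, handler) {
  var callbackName = "tictocCallback_" + new Date().getTime();
  window[callbackName] = function (result) {
    window[callbackName] = undefined;
    handler(result.records && result.records.length ? result.records[0] : null);
  };
  var script = document.createElement("script");
  script.src = "http://tictoclookup.appspot.com/" + encodeURIComponent(issn) +
               "?title=" + encodeURIComponent(title) +
               "&jsoncallback=" + callbackName;
  document.body.appendChild(script);
}

// example: reveal a hidden placeholder span and wrap it in a link to the
// journal's table-of-contents feed, similar to the "tictoc-link" class
lookupTicToc("0028-0836", "nature", function (record) {
  var span = document.getElementById("toc-placeholder");   // hypothetical id
  if (!record || !span) { return; }    // leave the span hidden if no feed is known
  var link = document.createElement("a");
  link.href = record.rssfeed;
  span.parentNode.replaceChild(link, span);
  link.appendChild(span);
  span.style.display = "";             // make the span visible
});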
as with google book classes, the mash-up creator does not need to be concerned with the mechanics of contacting the tictoclookup web service and making the necessary manipulations to the document. table 5 provides a complete overview of the classes tictoclookup supports.

table 5. supported tictoclookup classes

tictoc-link: wrap span/div in link to table of contents
tictoc-preview: display tooltip with preview of current entries
tictoc-embed-n: embed preview of first n entries
tictoc-alternate-link: insert <link rel="alternate"> into document
tictoc-append-title: append the title of the journal to the span/div

integration with legacy opacs

similar to the google book classes widget library, we implemented provisions that allow the use of tictoclookup classes on pages over which the mash-up creator has limited control. for instance, specifying a title attribute of "issn:millennium.issnandtitle" harvests the issn and journal title from the iii millennium's record display page.

■■ majax

whereas the widget libraries discussed thus far integrate external web services into an opac display, majax is a widget library that integrates information coming from an opac into other pages, such as resource guides or course displays. majax is designed for use with a iii millennium integrated library system (ils) whose vendor does not provide a web-services interface. the techniques we used, however, extend to other opacs as well. like many legacy opacs, millennium not only lacks a web-services interface, but also lacks any programming interface to the records contained in the system and does not provide access to the database or file system of the machine housing the opac.

providing opac data as a web service

we implemented two methods to access records from the millennium opac using bibliographic identifiers such as isbn, oclc number, bibliographic record number, and item title. both methods provide access to complete marc records and holdings information, along with locations and real-time availability for each held item. majax extracts this information via screen scraping from the marc record display page. as with all screen-scraping approaches, the code performing the scraping must be updated if the output format provided by the opac changes. in our experience, such changes occur at a frequency of less than once per year.

the first method, majax 1, implements screen scraping using javascript code that is contained in a document placed in a directory on the server (/screens), which is normally used for supplementary resources, such as images. this document is included in the target page as a hidden html <iframe> element (see frame b in figure 2). consequently, the same-domain restriction applies to the code residing in it. majax 1 can thus be used only on pages within the same domain; for instance, if the opac is housed at opac.library.university.edu, majax 1 may be used on all pages within *.university.edu (not merely *.library.university.edu). the key advantage of majax 1 is that no additional server is required.
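the following heavily simplified sketch conveys the screen-scraping idea behind majax 1: a script served from the opac's own domain reads the record display page loaded in a hidden <iframe> and extracts availability information. the element names, the "majax-record-frame" and "holdings-target" ids, and the text patterns are hypothetical; a real implementation must match whatever markup the opac actually emits and be adjusted when that output changes.

// much-simplified screen-scraping sketch (not the published majax 1 code)
function extractHoldings(recordDocument) {
  var holdings = [];
  var rows = recordDocument.getElementsByTagName("tr");
  for (var i = 0; i < rows.length; i++) {
    var text = rows[i].innerText || rows[i].textContent || "";
    // assume holdings rows mention a status keyword such as "available"
    if (/available|checked out|due/i.test(text)) {
      holdings.push(text.replace(/\s+/g, " ").trim());
    }
  }
  return holdings;   // e.g. ["main library  qa76.76 ...  available"]
}

// usage: read the hidden iframe; same-domain access is exactly what restricts
// majax 1 to pages served from within the opac's domain
var frame = document.getElementById("majax-record-frame");   // hypothetical id
frame.onload = function () {
  var holdings = extractHoldings(frame.contentDocument);
  var target = document.getElementById("holdings-target");   // hypothetical placeholder
  target.appendChild(document.createTextNode(holdings.join("; ")));
};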
the second method, majax 2, uses an intermediary server that retrieves the data from the opac, translates it to json, and returns it to the client. this method, shown in figure 5, returns json data and therefore does not suffer from the same-domain restriction. however, it requires hosting the majax 2 web service. like the tictoclookup web service, we implemented the majax 2 web service using python conformant to wsgi. a single installation can support multiple opacs.

figure 5. architecture of the majax 2 web service

widgetization

the majax widget library allows the integration of both majax 1 and majax 2 data into websites without javascript programming. the <span> tags function as placeholders, and <title> and <class> attributes describe the desired processing. majax provides a number of "majax classes," multiple of which can be specified. these classes allow a mash-up creator to insert a large variety of bibliographic information, such as the values of marc fields. classes are also provided to insert fully formatted, ready-to-copy bibliographic references in harvard style, live circulation information, links to the catalog record, links to online versions of the item (if applicable), a ready-to-import ris description of the item, and even images of the book cover. a list of classes majax supports is provided in table 6.

table 6. selected majax classes

majax-marc-fff-s: marc field fff, subfield s
majax-marc-fff: concatenation of all subfields in field fff
majax-syndetics-*: book cover image
majax-showholdings: current holdings and availability information
majax-showholdings-brief: current holdings and availability information, in brief format
majax-endnote: ris version of record
majax-ebook: link to online version, if any
majax-linktocatalog: link to record in catalog
majax-harvard-reference: reference in harvard style
majax-newline: newline
majax-space: space

examples

figure 6 provides an example use of majax widgets. four <span> tags expand into the book cover, a complete harvard-style reference, the value of a specific marc field (020), and a display of the current availability of the item, wrapped in a link to the catalog record. texts such as "copy is available" shown in figure 6 are localizable.

figure 6. example use of majax widgets

html written by adapter:
<table width="340"><tr><td>
  <span class="majax-syndetics-vtech" title="i1843341662"></span>
</td><td>
  <span class="majax-harvard-reference" title="i1843341662"></span>
  <br />
  isbn: <span class="majax-marc-020" title="i1843341662"></span>
  <br />
  <span class="majax-linktocatalog majax-showholdings" title="i1843341662"></span>
</td></tr></table>

display in browser after processing:
dahl, mark., banerjee, kyle., spalti, michael., 2006, digital libraries : integrating content and systems / oxford, chandos publishing, xviii, 203 p.
isbn: 1843341662 (hbk.)
1 copy is available

even though there are multiple majax <span> tags that refer to the same isbn, the majax widget library will contact the majax 1 or majax 2 web service only once per identifier, independent of how often it is used in a page. to manage the load, the majax client-side library can be configured to not exceed a maximum number of requests per second, per client.

all software described in this paper is available under the lgpl open source license. the majax libraries have been used by us and others for about two years. for instance, the "new books" list in our library uses majax 1 to provide circulation information. faculty members at our institution are using majax to enrich their course websites. a number of libraries have adopted majax 1, which is particularly easy to host because no additional server is required.

■■ related work

most ilss in use today do not provide suitable web-services interfaces to access either bibliographic information or availability data.9 this shortcoming is addressed by multiple initiatives. the ils discovery interface task force (ils-di) created a set of recommendations that facilitate the integration of discovery interfaces with legacy ilss, but does not define a concrete api.10 related, the iso 20775 holdings standard describes an xml schema to describe the availability of items across systems, but does not describe an api for accessing them.11 many ilss provide a z39.50 interface in addition to their html-based web opacs, but z39.50 does not provide standardized holdings and availability.12 nevertheless, there is hope within the community that ils vendors will react to their customers' needs and provide web-services interfaces that implement these recommendations.

the jangle project provides an api and an implementation of the ils-di recommendations through a representational state transfer (rest)-based interface that uses the atom publishing protocol (app).13 jangle can be linked to legacy ilss via connectors. the use of the xml-based app prevents direct access from client-side javascript code, however. in the future, adoption and widespread implementation of the w3c working draft on cross-origin resource sharing may relax the same-origin restriction in a controlled fashion, and thus allow access to app feeds from javascript across domains.14

screen scraping is a common technique used to overcome the lack of web-services interfaces. for instance, oclc's worldcat local product obtains access to availability information from legacy ilss in a similar fashion as our majax 2 service.15 whereas the web services used or created in our work exclusively use a rest-based model and return data in json format, interfaces based on soap (formerly simple object access protocol) whose semantics are described by a wsdl specification provide an alternative if access from within client-side javascript code is not required.16

oclc grid services provides rest-based web-services interfaces to several databases, including the worldcat search api and identifier services such as xisbn, xissn, and xoclcnum for frbr-related metadata.17 these services support xml and json and could benefit from widgetization for easier inclusion into client pages.

the use of html markup to encode processing instructions is common in javascript frameworks, such as yui or dojo, which use <div> elements with custom-defined attributes (so-called expando attributes) for this purpose.18 google gadgets uses a similar technique as well.19 the widely used context objects in spans (coins) specification exploits <span> tags to encode openurl context objects in pages for processing by client-side extensions.20 librarything uses client-side mash-up techniques to incorporate a social tagging service into opac pages.21 although their technique uses a <div> element as a placeholder, it does not allow customization via classes; the changes to the content are encoded in custom-generated javascript code for each library that subscribes to the service. the juice project shares our goal of simplifying the enrichment of opac pages with content from other sources.22 it provides a set of reusable components that is directed at javascript programmers, not librarians. in the computer-science community, multiple emerging projects investigate how to simplify the creation of server-side data mash-ups by end user programmers.23

■■ conclusion

this paper explored the design space of mash-up techniques for the seamless inclusion of information from web services into websites. we considered the cases where an opac is either the target of such integration or the source of the information being integrated. we focused on client-side techniques in which each user's browser contacts web services directly because this approach lends itself to the creation of html widgets. these widgets allow the integration and customization of web services without requiring programming. therefore nonprogrammers can become mash-up creators. we described in detail the functionality and use of several widget libraries and web services we built. table 7 provides a summary of the functionality and hosting requirements for each system discussed. although the specific requirements for each system differ because of their respective nature, all systems are designed to be deployable with minimum effort and resource requirements. this low entry cost and the provision of a high-level, nonprogramming interface constitute two crucial preconditions for the broad adoption of mash-up techniques in libraries, which in turn has the potential to vastly increase the reach and visibility of their electronic resources in the wider community.

table 7. summary of features and requirements for the widget libraries presented in this paper
(columns: majax 1; majax 2; google book classes; tictoclookup classes)

web service: screen scraping of iii record display; json proxy for iii record display; google book search dynamic link api (books.google.com); tictoc cloud application (tictoclookup.appspot.com)
hosted by: existing millennium installation (/screens); wsgi/python script on libx.lib.vt.edu; google, inc.; google, inc. via google app engine
data provenance: your opac; your opac; google; jisc (www.tictocs.ac.uk)
additional cost: n/a; can use libx.lib.vt.edu for testing, must run wsgi-enabled web server in production; free, but subject to google terms of service; generous free quota, pay per use beyond that
same-domain restriction: yes; no; no; no
widgetization: majax.js, class-based (majax classes); majax.js, class-based (majax classes); gbsclasses.js, class-based (gbs classes); tictoc.js, class-based (tictoc classes)
requires javascript programming: no; no; no; no
requires additional server: no; yes (apache + mod_wsgi); no; no (if using gae), else need apache + mod_wsgi
iii bib record display: n/a; n/a; yes; yes
iii webbridge integration: yes; yes; yes; yes

references

1. nicole engard, ed., library mashups: exploring new ways to deliver library data (medford, n.j.: information today, 2009); andrew darby and ron gilmour, "adding delicious data to your library website," information technology & libraries 28, no. 2 (2009): 100–103.
2. monica brown-sica, "playing tag in the dark: diagnosing slowness in library response time," information technologies & libraries 27, no. 4 (2008): 29–32.
3. dapper, "dapper dynamic ads," http://www.dapper.net/ (accessed june 19, 2009); yahoo!, "pipes," http://pipes.yahoo.com/pipes/ (accessed june 19, 2009).
4.
jennifer bowen, “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1,” information technology & libraries 27, no. 2 (2008): 6–19; john blyberg, “ils customer bill-of-rights,” online posting, blyberg.net, nov. 20, 2005, http://www.blyberg .net/2005/11/20/ils-customer-bill-of-rights/ (accessed june 18, 2009). 5. douglas crockford, “the application/json media type for javascript object notation (json),” memo, the internet society, july 2006, http://www.ietf.org/rfc/rfc4627.txt (accessed mar. 30, 2010). 6. google, “who’s using the book search apis?” http:// code.google.com/apis/books/casestudies/ (accessed june 16, 2009). 7. innovative interfaces, “millennium ils,” http://www.iii .com/products/millennium_ils.shtml (accessed june 19, 2009). 8. joint information systems committee, “tictocs journal tables of contents service,” http://www.tictocs.ac.uk/ (accessed june 18, 2009). 9. mark dahl, kyle banarjee, and michael spalti, digital libraries: integrating content and systems (oxford, united kingdom: chandos, 2006). 10. john ockerbloom et al., “dlf ils discovery interface task group (ils-di) technical recommendation,” (dec. 8, 2008), http://diglib.org/architectures/ilsdi/dlf_ils_ discovery_1.1.pdf (accessed june 18, 2009). 11. international organization for standardization, “information and documentation—schema for holdings information,” http://www.iso.org/iso/catalogue_detail .htm?csnumber=39735 (accessed june 18, 2009) 12. national information standards organization, “ansi/ niso z39.50—information retrieval: application service definition and protocol specification,” (bethesda, md.: niso pr., 2003), http://www.loc.gov/z3950/agency/z39-50-2003.pdf (accessed may 31, 2010). 13. ross singer and james farrugia, “unveiling jangle: untangling library resources and exposing them through the atom publishing protocol,” the code4lib journal no. 4 (sept. 22, 2008), http://journal.code4lib.org/articles/109 (accessed apr. 21, 2010); roy fielding, “architectural styles and the design of network-based software architectures” (phd diss., university of california, irvine, 2000); j. c. gregorio, ed., “the atom publishing protocol,” memo, the internet engineering task force, oct. 2007, http://bitworking.org/projects/atom/rfc5023.html (accessed june 18, 2009). 14. world wide web consortium, “cross-origin resource sharing: w3c working draft 17 march 2009,” http://www .w3.org/tr/access-control/ (accessed june 18, 2009). 15. oclc online computer library center, “worldcat and cataloging documentation,” http://www.oclc.org/support/ documentation/worldcat/default.htm (accessed june 18, 2009). 16. f. curbera et al., “unraveling the web services web: an introduction to soap, wsdl, and uddi,” ieee internet computing 6, no. 2 (2002): 86–93. 17. oclc online computer library center, “oclc web services,” http://www.worldcat.org/devnet/wiki/services (accessed june 18, 2009); international federation of library associations and institutions study group on the functional requirements for bibliographic records, “functional requirements for bibliographic records : final report,” http://www.ifla.org/files/ cataloguing/frbr/frbr_2008.pdf (accessed mar. 31, 2010). 18. yahoo!, “the yahoo! user interface library (yui),” http://developer.yahoo.com/yui/ (accessed june 18, 2009); dojo foundation, “dojo—the javascript toolkit,” http://www .dojotoolkit.org/ (accessed june 18, 2009). 19. google, “gadgets.* api developer’s guide,” http://code. 
google.com/apis/gadgets/docs/dev_guide.html (accessed june 18, 2009). 20. daniel chudnov, “coins for the link trail,” library journal 131 (2006): 8–10. 21. librarything, “librarything,” http://www.librarything .com/widget.php (accessed june 19, 2009). 22. robert wallis, “juice—javascript user interface componentised extensions,” http://code.google.com/p/juice-project/ (accessed june 18, 2009). 23. jeffrey wong and jason hong, “making mashups with marmite: towards end-user programming for the web” conference on human factors in computing systems, san jose, california, april 28–may 3, 2007: conference proceedings, volume 2 (new york: association for computing machinery, 2007): 1435–44; guiling wang, shaohua yang, and yanbo han, “mashroom: end-user mashup programming using nested tables” (paper presented at the international world wide web conference, madrid, spain, 2009): 861–70; nan zang, “mashups for the web-active user” (paper presented at the ieee symposium on visual languages and human-centric computing, herrshing am ammersee, germany, 2008): 276–77. author id box for 2 column layout this article examines the linguistic structure of folksonomy tags collected over a thirty-day period from the daily tag logs of del.icio.us, furl, and technorati. the tags were evaluated against the national information standards organization (niso) guidelines for the construction of controlled vocabularies. the results indicate that the tags correspond closely to the niso guidelines pertaining to types of concepts expressed, the predominance of single terms and nouns, and the use of recognized spelling. problem areas pertain to the inconsistent use of count nouns and the incidence of ambiguous tags in the form of homographs, abbreviations, and acronyms. with the addition of guidelines to the construction of unambiguous tags and links to useful external reference sources, folksonomies could serve as a powerful, flexible tool for increasing the user-friendliness and interactivity of public library catalogs, and also may be useful for encouraging other activities, such as informal online communities of readers and user-driven readers’ advisory services. o ne of the most daunting challenges of information management in the digital world is the ability to keep, or refind, relevant information; book­ marking is one of the most popular methods for storing relevant web information for reaccess and reuse (bruce, jones, and dumais 2004). the rising popularity of social bookmark managers, such as del.icio.us, addresses these concerns by allowing users to organize their bookmarks by assigning tags that reflect directly their own vocabu­ lary and needs. the collection of user­assigned tags is referred to commonly as a folksonomy. in recent years, significant developments have occurred in the creation of customizable user features in public library catalogs. these features offer clients the opportunity to customize their own library web pages and to store items of interest to them, such as book lists. client participation in these interfaces, however, is largely reactive; clients can select items from the catalog, but they have little ability to orga­ nize and categorize these items in a way that reflects their own needs and language. digital document repositories, such as library cata­ logs, normally index the subject of their contents via key­ words or subject headings. 
traditionally, such indexing is performed either by an authority, such as a librarian or a professional indexer, or is derived from the authors of the documents; in contrast, collaborative tagging, or folkson­ omy, allows anyone to freely attach keywords or tags to content. demspey (2003) and ketchell (2000) recommend that clients be allowed to annotate resources of interest and to share these annotations with other clients with similar interests. folksonomies can thus make significant contributions to public library catalogs by enabling cli­ ents to organize personal information spaces; namely, to create and organize their own personal information space in the catalog. clients find items of interest (items in the library catalog, citations from external databases, external web pages, and so on) and store, maintain, and organize them in the catalog using their own tags. in order to more fully understand these applications, it is important to examine how folksonomies are struc­ tured and used, and the extent to which they reflect user needs not found in existing lists of subject headings. the purpose of this proposed research is thus to examine the structure and scope of folksonomies. how are the tags that constitute the folksonomies structured? to what extent does this structure reflect and differ from the norms used in the construction of controlled vocabular­ ies ,such as library of congress subject headings? what are the strengths and weaknesses of folksonomies (for example, reflect user need, ambiguous headings, redun­ dant headings, and so forth)? this article will examine a selection of tags obtained from three folksonomy sites, del.icio.us (referred to henceforth as delicious), furl, and technorati, over a thirty­day period. the structure of these tags will be examined and evaluated against section 6 of the niso guidelines for the construction of controlled vocabularies (niso 2005), which looks specifically at the choice and form of terms. ■ definitions of folksonomies folksonomies have been described as “user­created meta­ data . . . grassroots community classification of digital assets” (mathes 2004). wikipedia (2006) describes a folksonomy as “an internet­based information retrieval methodology consisting of collaboratively generated, open­ended labels that categorize content such as web pages, online photographs, and web links.” the concept of collaboration is attributed commonly to folksonomies (bateman, brooks, and mccalla 2006; cattuto, loreto, and pietronero 2006; fichter 2006; golder and huberman the structure and form of folksonomy tags: the road to the public library catalog louise f. spiteri louise f. spiteri (louise.spiteri@dal.ca) is associate professor at the school of information management, dalhousie university, halifax, nova scotia, canada. this research was funded by the oclc/alise library and information science research grant program. the structure and form of folksonomy tags | spiteri 13 1� information technology and libraries | september 20071� information technology and libraries | september 2007 2006; mathes 2004; quintarelli 2005; udell 2004). thomas vander wal, who coined the term folksonomy, argues, however, that: the definition of folksonomy has become completely unglued from anything i recognize. . . . it is not col­ laborative . . . it is the result of personal free tagging of information and objects (anything with a url) for one’s own retrieval. the tagging is done in a social environment (shared and open to others). 
the act of tagging is done by the person consuming the informa­ tion” (vanderwal.net 2005). it may be more accurate, therefore, to say that folk­ sonomies are created in an environment where, although people may not actively collaborate in their creation and assignation of tags, they may certainly access and use tags assigned by others. folksonomies thus enable the use of shared tags. folksonomies are used primarily in social bookmark­ ing sites, such as delicious (http://del.icio.us/) and furl (http://www.furl.net/), which allow users to add sites they like to their personal collections of links, to organize and categorize these sites by adding their own terms, or tags, and to share this collection with other people with the same interests. the tags are used to collocate bookmarks within a user’s collection and bookmarks across the entire system, so, for example, the page http://del.icio.us/tag/blogging will show all bookmarks that are tagged with blogging by any member of the delicious site. ■ benefits of folksonomies quintarelli (2005) and fichter (2006) suggest that folk­ sonomies reflect the movement of people away from authoritative, hierarchical taxonomic schemes that reflect an external viewpoint and order that may not necessarily reflect users’ ways of thinking. “in a social distributed environment, sharing one’s own tags makes for innova­ tive ways to map meaning and let relationships naturally emerge” (quintarelli 2005). vander wal (2006) adds that “the value in this external tagging is derived from people using their own vocabulary and adding explicit mean­ ing, which may come from inferred understanding of the information/object.” an attractive feature of folksonomies is their inclusive­ ness; they reflect the vocabulary of the users, regardless of viewpoint, background, bias, and so forth. folksonomies may thus be perceived to be a democratic system where everyone has the opportunity to contribute and share tags (kroski 2006). the development of folksonomies may reflect also the difficulty and expense of applying con­ trolled taxonomies to the web: building, maintaining, and enforcing a sound, controlled vocabulary is often simply too expensive in terms of development time and of the steep learning curve needed by the user of the system to learn the classification scheme (fichter 2006; kroski 2006; quintarelli 2005; shirky 2004). a further limitation of taxonomies is that they may become outdated easily. new concepts or products may emerge that are not yet included in the taxonomy; in comparison, folksonomies easily accommodate such new concepts (fichter 2006; mitchell 2005; wu, zubair, and maly, 2006). shirky (2004) points out that the advantage of folksonomies is not that they are better than controlled vocabularies, but that they are better than nothing. folksonomies follow desire lines, which are expres­ sions of the direct information needs of the user (kroski 2006; mathes 2004; merholz 2004). these desire lines also may reflect the needs of communities of interest: tag­ gers who use same set of tags have formed a group and can seek each other out using simple search techniques. “tagging provides users an easy, yet powerful method to express themselves within a community” (szekely and torres 2005). ■ weaknesses of folksonomies folksonomies share the problems inherent to all uncon­ trolled vocabularies, such as ambiguity, polysemy, syn­ onymy, and basic level variation (fichter 2006; golder and huberman 2006; guy and tomkin 2006; mathes 2004). 
the terms in a folksonomy may have inherent ambiguity as different users apply terms to documents in different ways. the polysemous tag port could refer to a sweet fortified wine, a porthole, a place for loading and unloading ships, the left­hand side of a ship or air­ craft, or a channel endpoint in a communications system. folksonomies do not include guidelines for use or scope notes. folksonomies provide for no synonym control; the terms mac, macintosh, and apple, for example, are all used to describe apple macintosh computers. similarly, both singular and plural forms of terms appear (for example, flower and flowers), thus creating a number of redun­ dant headings. the problem with basic level variation is that related terms that describe an item vary along a continuum of specificity ranging from very general to very specific, so, for example, documents tagged perl and javascript may be too specific for some users, while a document tagged programming may be too general for others. folksonomies provide no formal guidelines for the choice and form of tags, such as the use of com­ pound headings, punctuation, word order, and so forth; for example, should one use the tag vegan cooking or cooking, vegan? guy and tomkin (2006) provide some general suggestions for tag selection best practices, such as the use of plural rather than singular forms, the use article title | author 15the structure and form of folksonomy tags | spiteri 15 of underscore to join terms in a multiterm concept (for example, open_source), following conventions estab­ lished by others, and adding synonyms. these sugges­ tions are rather too vague to be of much use, however; for example, under what circumstances should singular forms be used (such as noncount nouns), and how should synonyms be linked? ■ applications of folksonomies other than social bookmarking sites, folksonomies are used in commercial shopping sites, such as amazon (http://www.amazon.com/), where clients tag items of interest; these tags can be accessed by people with similar interests. platial (http://www.platial.com/ splash) is used to tag personal collections of maps. examples of the use of folksonomies for intranets include ibm’s social bookmarking application dogear, which allows people to bookmark pages within their intranet (http://domino.watson.ibm.com/cambridge/ research.nsf/99751d8eb5a20c1f852568db004efc90/ 1c181ee5fbcf59fb852570fc0052ad75?opendocument), and scuttle (http://sourceforge.net/projects/scuttle/), an open­source bookmarking project that can be hosted on web servers for free. penntags (http://tags.library. upenn.edu/) is a social bookmarking service offered by the university of pennsylvania library to its community members. steve museum is a project that is investigating the incorporation of folksonomies into museum catalogs (trant and wyman 2006). another potential application of folksonomies is to public library catalogs, where users can organize and tag items of interest in user­specific folders; users could then decide whether or not to post the tags publicly (spiteri 2006). ■ analyses of folksonomies analysis of the structure, or composition, of tags has thus far been limited; there has been more emphasis placed upon the co­occurrence of tags and their frequency of use. cattuto, loreto, and pietronero (2006) applied a stochas­ tic model of user behavior to investigate the statistical properties of tag co­occurrence; their results suggest that users of collaborative tagging systems share universal behaviors. 
michlmayr (2005) compared tags assigned to a set of delicious bookmarks to the dmoz (http://www. dmoz.org/) taxonomy, which is designed by a commu­ nity of volunteers. the study concluded that there were few instances of overlap between the two sets of terms. mathes (2004) provides an interesting analysis of the strengths and limitations of the structure of delicious and flickr, but does not provide an explanation of the meth­ odology used to derive his observations; it is not clear, for example, for how long he studied these two sites, how many tags he examined, what elements he was looking for, or what evaluative criteria he applied. golder and huberman (2006) conducted an analysis of the structure of collaborative tagging systems, look­ ing at user activity and kinds and frequencies of tags. specifically, golder and huberman looked at what tags delicious members assigned and how many bookmarks they assigned to each tag. this study identified a number of functions tags perform for bookmarks, including iden­ tifying the: ■ subject of the item; ■ format of the item (for example, blog); ■ ownership of the item; and ■ characteristics of the item (for example, funny). while the golder and huberman study provides an important look at tag use, their study is limited in that they examined only one site for a period of four days; their results are an excellent first step in the analysis of tag use, but the narrow focus of their population and sample size means that their observations are not easily generalized. furthermore, this study focuses more on how bookmarks are associated with tags (for example, how many bookmarks are assigned per tag and by whom) rather than at the structural composition of the tags themselves. guy and tonkin (2006) collected a random sampling of tags from delicious and flickr to see whether “popular objections to folksonomic tagging are based on fact.” the authors do not explain, however, over what period the tags were acquired (for example, over a one­day period, over a month), nor to they provide any evaluative criteria. the tags were entered into aspell, an open source spell checker, from which the authors concluded that 40 percent of flickr and 28 percent of delicious tags were either mis­ spelled, encoded in a manner not understood by aspell, or consisted of compound words of two or more words. tags did not follow convention in such areas as the use of case or singular versus plural forms. while this study certainly focuses upon the structure of the tags, the bases for the authors’ conclusions are problematic. it is not clear that the use of a spell checker is a sufficient measure of quality. does the spell checker allow for cultural variations in spell­ ing (for example, labor or labour)? how well­recognized and comprehensive is the source vocabulary for this spell checker? furthermore, if a tag does not exist in the spell checker, does this necessarily mean that the tag is incor­ rect? tags may include several neologisms, such as podcasting, that may not yet exist in conventional dictionaries but are well­recognized in a particular domain. the authors do not mention whether they took into account the cor­ 16 information technology and libraries | september 200716 information technology and libraries | september 2007 rect use of the singular form of such tags as noncountable nouns (for example, air) or tags that describe disciplines or emotions (for example, history and love). 
if a named entity (person or organization) was not recognized by aspell, does this mean that the tag was classified as incorrect? lastly, the authors seem to imply that compound words of two or more words are necessarily incorrect, which may not be the case (for example, open source software). the pitfalls of folksonomies have been well­docu­ mented; what is missing is an in­depth analysis of the linguistic structure of tags against an established bench­ mark. while popular opinion suggests that folksonomies suffer from ambiguous and inconsistent structure, the actual extent of these problems is not yet clear; further­ more, analyses conducted so far have not established clear benchmarks of quality pertaining to good tag structure. although there are no guidelines for the construction of tags, recognized guidelines do exist for the construction of terms that are used in taxonomies. although these guidelines discuss the elucidation of inter­term relation­ ships (hierarchical, associative, and equivalent), which does not apply to the flat space of folksonomies, they contain sections pertaining to the choice and formation of concept terms that may, in fact, have relevance for the construction of tags. ■ methodology selection of folksonomy sites tags were chosen from three popular folksonomy sites: delicious, furl, and technorati (http://www.technorati. com/). delicious and furl function as bookmarking sites, while technorati enables people to search for and organize blogs. these sites were chosen because they provide daily logs of the most popular tags that have been assigned by their members on a given day. the daily tag logs from each of the sites were acquired over a thirty­day period (february 1–march 2, 2006). the daily tags for each site were entered into an excel spreadsheet. a list of unique tags for each site was compiled after the thirty­day period; unique refers to the single instance of a tag. some of the tags were used only once during the thirty­day period, while others, such as travel, occurred several times, so travel appears only once in the list of unique tags. variations of the same tag—for example, car or cars, cheney or dick cheney—were considered to constitute two unique tags. only english­language tags were accumulated. the analysis of the tag structure in the three lists was conducted by applying the niso guidelines for thesaurus construction, which are the most current set of recognized guidelines for the: contents, display, construction . . . of controlled vocabu­ laries. this standard focuses on controlled vocabularies that are used for the representation of content objects in knowledge organization systems including lists, syn­ onym rings, taxonomies, and thesauri (niso 2005, 1). while folksonomies are not controlled vocabularies, they are lists of terms used to describe content, which means that the niso guidelines could work well as a benchmark against which to examine how folksonomy tags are structured as well as the extent to which this structure reflects the widely accepted norm for controlled vocabu­ laries. section 6 of the guidelines (term choice, scope, and form) was applied to the tags, specifically the following elements (see appendix a for the expanded list): 6.3 term choice 6.4 grammatical form of terms 6.5 nouns 6.6 selecting the preferred form only those elements in section 6 that were found to apply to the lists of unique tags are included in appendix a. 
for each site, the section 6 elements were applied to each unique tag; for example, it was noted whether a tag consists of one or more terms, whether the tag is a noun, adjective, or adverb, and so on. the frequency of occur­ rence of the section 6 elements was noted for each site and then compared across the three sites in order to determine the existence of any patterns in tag structure and the extent to which these patterns reflect current practice in the design of controlled vocabularies. definition and disambiguation of tags the meanings of the tags were determined based upon (1) the context of their use; and (2) their definition in three external sources, namely merriam webster online dic­ tionary (http://www.m­w.com/); google (http://www. google.com/); and wikipedia (http://www.wikipedia. org/). merriam­webster was used specifically to define all tags other than those that constitute unique entities (for example, named people, places, organizations, or products) and to determine the various meanings of tags that are homographs (for example, art or web). the actual concept represented by homographs was determined by examin­ ing the sites or blogs to which the tag was assigned. merriam­webster also was used to determine the grammatical form of a tag; for example, noun, verbal noun, adjective, or adverb. determining verbal nouns proved to be complicated, especially given that niso relies only on examples to illustrate such nouns. some tags could serve as both verbal and simple nouns; for example, the tag clipping could describe the activity to clip or an item that has been clipped, such as a newspaper article title | author 17the structure and form of folksonomy tags | spiteri 17 clipping. similarly, does skiing refer to an activity, or the sport? if the dictionary defined a tag as an activity, the tag was classified as a verbal noun. in the case of tags that were defined as both verbal nouns and simple nouns, the context in which the tag was used determined the final classification. the dictionary also was used to determine the type of concept represented by a tag. the niso guidelines do not define any of these seven types of concepts outlined in section 6.3.2; they provide only a short list of examples for each type. if the term represented by the tag was defined as an activity, property, material, event, discipline or field of study, or unit of measurement, it was classified as such unless the context of the tag suggested otherwise. if none of these six types was defined in the dictionary, the default value of thing was assigned to the tag. these definitions were then compared to the context in which the tag was used. in the case of the tag art, for example, an examination of the sites associated with this tag indicated that it refers to art objects, rather than the discipline, so it was classified as a thing. merriam­webster was used to determine whether a tag constitutes a recognized term in standard english (both united states and united kingdom variants); for example, the tag blogs is a recognized term in the dictionary, while podcasting is not. niso does not provide a clear definition of slang, neologism, or jargon, other than to say that they are nonstandard terms not generally found in dictionaries. is the term podcasting, for example, an instance of slang, jargon, or neologism? at what point does jargon become a neologism? 
because of the difficulty of distinguishing among these three categories, it was decided to use the broader category nonstandard terms to cover tags that (1) could not be found in the dictionary; or (2) are designated as vulgar or slang in the dictionary. google and wikipedia were used to define the mean­ ings of tags that constitute unique entities. wikipedia also was used to distinguish the various meanings of tags that constitute abbreviations or acronyms via its disambigua­ tion pages; for example, the tag nfl is given eight pos­ sible meanings. in this case, the tag nfl is used to refer specifically to the national football league, so the tag is a homograph, noun, and unique entry. ■ tagging conventions and guidelines of the folksonomy sites delicious delicious defines tags as: one­word descriptors that you can assign to your bookmarks. . . . they’re a little bit like keywords but non­hierarchical. you can assign as many tags to a bookmark as you like and easily rename or delete them later. tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders” (del.icio.us 2006a). the delicious help page for tags encourages people to “enter as many tags as you would like, each separated by a space” in the tag field. this paragraph explains briefly that two lists of tags may appear under the entry form used to enter a bookmark. the first list consists of popular tags assigned by other people to the bookmark in question, while the second consists of recommended tags, which contains a combination of tags that have been assigned by the client in question as well as other users (del.icio.us 2006b). it is not clear how the two lists differ in that they both contain tags assigned by other people to the bookmark at hand. the only tangible guideline provided about how tags should be structured is the sentence “your only limitation on tags is that they must not include spaces.” delicious thus addresses only indirectly the fact that it does not allow multiterm tags; the examples provided suggest ways in which compound terms can be expressed; for example, san­francisco, sanfranciso, san.franciso (del. ico.us 2006b). punctuation thus appears to be allowed in the construction of tags, which is confirmed by the sug­ gestion that asterisks may be used to rate bookmarks: “a tag of * might mean an ok link, *** is pretty good, and a bookmark tagged ***** is awesome” (del.icio.us 2006b). it is thus possible that tags may not consist of recognizable terms, even though asterisks are neither searchable nor indicative of content. furl the furl web site uses the term topics rather than tags, but provides no guidelines or instructions for how to con­ struct these topics. furl mentions only that when entering a bookmark, “a small window will pop up. it should have the title and url of the page you are looking at. enter any additional details (i.e., topic, rating, comments) and click save” (furl 2006). furl provides all users with a list of default topics to which one can add at will. furl provides no guidelines as to whether single or multiword topics may be used; it is only by trial and error that the user discovers that the latter are, in fact, allowed. technorati in its tags help page, technorati encourages users to “think of a tag as a simple category name. people can categorize their posts, photos, and links with any tag that makes sense” (technorati 2006). a tag may be “anything, but it should be descriptive. 
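the rel="tag" markup described above is the only machine-readable structure any of the three sites imposes on tags. purely as an illustration of how such tags can be harvested and deduplicated, which this study's unique-tag lists accomplished manually in excel, a short script might look like the following; the function names are our own and the example is not part of the study's methodology.

// illustrative only: harvest technorati-style rel="tag" links from a post and
// build a case-folded list of unique tags, analogous to a unique-tag list
function harvestTags(rootElement) {
  var anchors = rootElement.getElementsByTagName("a");
  var seen = {};
  var uniqueTags = [];
  for (var i = 0; i < anchors.length; i++) {
    if (!/\btag\b/.test(anchors[i].rel)) { continue; }   // only <a rel="tag"> elements
    var tag = (anchors[i].innerText || anchors[i].textContent || "").trim().toLowerCase();
    if (tag && !seen[tag]) {                             // record each tag only once
      seen[tag] = true;
      uniqueTags.push(tag);
    }
  }
  return uniqueTags;   // e.g. ["global warming", "politics"]
}

// example: count unique tags in the current page
var tags = harvestTags(document.body);
console.log(tags.length + " unique tags: " + tags.join(", "));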
please only use tags that are rel­ evant to the post” (technorati 2006). technorati tags are 1� information technology and libraries | september 20071� information technology and libraries | september 2007 embedded into individual blogs via the link rel=”tag”; for example: <a href=”http://technorati.com/tag/ global+warming” rel=”tag”>global warming</a>. the tag will appear as simply global warming. no other guidelines are provided about how tags should be constructed. as can be seen, the three folksonomy sites provide very few guidelines or conventions for how tags should be constructed. users are not pointed to the common problems that exist in uncontrolled vocabulary, such as ambiguous headings, homographs, synonyms, spelling variations, and so forth, nor are suggestions made as to the preferred form of tags, such as nouns, plural forms, or the distinction between count nouns (for example, dogs) and mass nouns (for example, air). given this lack of guidance, it is not unreasonable to assume that the tags acquired from these sites will vary considerably in form and structure. ■ findings unless stated otherwise, the number of tags per folk­ sonomy site is 76 for delicious, 208 for furl, and 229 for technorati. homographs the niso guidelines recommend that homographs— terms with identical spellings but different meanings— should be avoided as far as possible in the selection of terms. homographs constitute 22 percent of delicious tags, 12 percent of furl tags, and 20 percent of technorati tags. unique entities constitute a significant proportion of the homographs in all three sites, with 71 percent in delicious, 43 percent in furl, and 55 percent in technorati. the most frequently occurring homographs across the three sites consist predominantly of computer­related terms, such as ajax and css. single-word versus multiword terms the niso guidelines recommend that terms should represent a single concept expressed by a single or mul­ tiword term, as needed. single­term tags constitute 93 percent of delicious tags, 76 percent of furl tags, and 80 percent of technorati tags. the preponderance of single tags in delicious may reflect the fact that it does not allow for the use of spaces between the different elements of the same tag; for example, open source. types of concepts niso provides a list of seven types of concepts that may be represented by terms; while this list is not exhaustive, it represents the most frequently occurring types of con­ cept. table 1 shows the percentage of tags that correspond to each of the seven types of concepts. tags that represent things are clearly predominant in the three sites, with activities and properties forming a distant second and third in importance. none of the tags represent events or measures, and only a fraction of the technorati tags represent materials. the niso guidelines provide no indication of the expected distribution of the types of concepts, so it is difficult to determine to what extent the three folksonomy sites are consistent with other lists of descriptors. none of the tags fell outside the scope of the seven types of concepts. unique entities unique entities may represent the names of people, places, organizations, products, and specific events (niso 2005). unique entities constitute 22 percent of delicious tags, 14 percent of furl tags, and 49 percent of technorati tags. 
there is no consistency in the percentage of unique enti­ ties: technorati has nearly twice the percentage of tags than delicious has, and nearly triple the percentage of tags than furl has. computer­related products constitute 100 percent of the unique entities in delicious, 63 percent in furl, and 38 percent in technorati. the remainder of the unique entities in furl and technorati represent places, people, and corporate bodies. the unique entities in technorati are closely related to developments in current news events, an occurrence that is likely due to the site’s focus on blogs rather than web sites. as will be discussed in a subsequent section, the unique entries constitute a significant proportion of the tags that represent ambiguous acronyms or abbreviated terms, such as ajax or psp. table 1. concepts represented by the tags delicious (%) furl (%) technorati (%) things 76 82 90.0 materials 0 0 0.4 activities 12 10 4.0 events 0 0 0.0 properties 8 6 4.0 disciplines 4 3 1.0 measures 0 0 0.0 article title | author 19the structure and form of folksonomy tags | spiteri 19 grammatical forms of terms the niso standards recommend the use of the following grammatical forms of terms: ■ nouns and noun phrases ■ verbal nouns ■ noun phrases ■ premodified noun phrases ■ postmodified noun phrases ■ adjectives ■ adverbs table 2 shows the distribution of the grammatical forms of tags. if all the types of nouns are combined, then 95 percent of delicious tags, 94 percent of furl tags, and 97 percent of technorati tags constitute types of nouns. the gram­ matical structure of the tags in the three folksonomy sites thus reflects very closely the niso recommendations that tags consist of mainly nouns, with the added proviso that adjectives and adverbs be kept to a minimum. none of the folksonomy sites used adverbs as tags, and the num­ ber of adjectives was very small, forming an average total of 5 percent of the tags. nouns (plural and singular forms) niso divides nouns into two categories: count nouns (how many?), and noncount, or mass nouns (how much?). niso recommends that count nouns appear in the plural form and mass nouns in the singular form. niso specifies other types of nouns that appear typi­ cally in the singular form: ■ abstract concepts ■ beliefs; for example, judaism, taoism ■ activities; for example, digestion, distribution ■ emotions; for example, anger, envy, love, pity ■ properties; for example, conductivity, silence ■ disciplines; for example, chemistry, astronomy ■ unique entities table 3 shows the distribution of the singular and plu­ ral forms of noun tags. the term singular nouns was used to collocate all the types of non­plural nouns. table 3 represents the number of tags that constitute count nouns; this does not mean, however, that the tags appeared correctly in the plural form. of the count nouns, 36 percent of delicious tags, 62 percent of furl tags, and 34 percent of technorati tags appeared correctly in the plural form. it should be noted that although table 3 indicates that properties constitute 8 percent of delicious, 6 percent of furl, and 4 percent of technorati tags, most of these tags are adjectives, and thus are not counted in the table. the niso guidelines do not suggest the typical distribution of count versus singular nouns, but table 3 indicates that at least among the three folksonomy sites, singular nouns form the bulk of the tags. table 2. 
table 2. grammatical form of tags

grammatical form             delicious (%)   furl (%)   technorati (%)
nouns                              88            71           86
verbal nouns                        5             6            4
noun phrases—premodified            1            15            4
noun phrases—postmodified           0             2            3
adjectives                          6             6            3
adverbs                             0             0            0

table 3. count and noncount noun tags

                     delicious (%)   furl (%)   technorati (%)
count nouns                18            35           23
noncount nouns             77            59           74
   mass nouns              36            32           19
   activities              12            10            4
   properties               3             0            1
   disciplines              4             3            1
   unique entities         22            14           49
total                      95            94           97

spelling

the niso guidelines divide the spelling of terms into two sections: warrant and authority. with respect to warrant, niso recommends that “the most widely accepted spelling of words, based on warrant, should be adopted,” with cross-references made between variant spellings of terms. as far as authority is concerned, spelling should follow the practice of well-established dictionaries or glossaries. while spelling refers normally to whole words, i included in this analysis acronyms and abbreviations used to denote unique entities, such as countries or product names, as there are recognized spellings of such acronyms and abbreviations. table 4 shows the tags from the three sites that do not conform to recognized spelling; the terms in parentheses show the accepted spelling.

table 4. tags that do not conform to spelling warrant

delicious (n=76): howto (how to); opensource (open source); toread (to read)
furl (n=208): hollywood bday (hollywood birthday); med-books (medical books); oralsex (oral sex)
technorati (n=229): met-art pics (metropolitan art pictures); superbowl (super bowl); web-20 (web2.0)

the number of tags that do not conform to spelling warrant is clearly very few, constituting a total of 4 percent of the delicious tags, 3 percent of the furl tags, and 2 percent of the technorati tags. two of the nonrecognized spellings in delicious are likely due to the difficulty of creating compound tags in this site, as was discussed earlier. the remainder of the tags conformed to recognized spellings as found in the three reference sources consulted. the findings suggest that tags are spelled consistently and in keeping with recognized warrant across the three folksonomy sites. because of the international nature of the three folksonomy sites, no default english spelling was assumed. table 5 shows those tags whose spellings reflect regional variations.

table 5. tags that reflect regional spelling variations

delicious (n=76): humor (u.s. spelling)
furl (n=208): humor (u.s. spelling); jewelry (u.s. spelling)
technorati (n=229): favourite (british spelling); humour (british spelling)

none of the three folksonomy sites featured lexical variants of any one tag. as the three sites are united states–based, the preponderance of american spelling is not surprising. what is surprising, however, is that technorati features only the british variants in the total of tags examined in this study. it should be pointed out that the two lexical variants of these terms do appear in the three folksonomy sites; the two variants simply did not appear in the daily logs examined. no system to enable cross-referencing (for example, humour use or see humor) exists in any of the three folksonomy sites, nor is cross-referencing discussed in the help logs of the sites.

abbreviations, initialisms, and acronyms

niso recommends that the full form of terms should be used. abbreviations or acronyms should be used only when they are so well-established that the full form of the term is rarely used. cross-references should be made between the full and abbreviated forms of the terms. abbreviations and acronyms constitute 22 percent of delicious tags, 16 percent of furl tags, and 19 percent of technorati tags. the majority of these abbreviations and acronyms pertain to unique entities, such as product names (for example, flash, mac, and nfl). in the case of delicious and furl, none of the abbreviated tags is referred to also by its full form.
four of the abbreviated technorati tags have full-form equivalents:

■ cheney/dick cheney
■ ie/internet explorer
■ sheehan/cindy sheehan
■ uae/united arab emirates

abbreviations and acronyms play a significant role in the ambiguity of the tags from the three sites; they represent 71 percent of the abbreviated delicious tags, 45 percent of the abbreviated furl tags, and 73 percent of the abbreviated technorati tags. furl and technorati are very similar in the proportion of abbreviated tags used, but delicious is significantly higher. the delicious tags are focused more heavily upon computer-related products, which may explain why there are so many more abbreviated tags, as many of these products are often referred to by these shorter terms; for example, css, flash, apple, and so on.

neologisms, slang, and jargon

the niso guidelines explain that neologisms, slang, and jargon terms are generally not included in standard dictionaries and should be used only when there is no other widely accepted alternative. nonstandard tags do not constitute a particularly relevant proportion of the total number of tags per site; they account for 3 percent of the delicious tags, 10 percent of the furl tags, and 6 percent of the technorati tags. the nonstandard tags refer almost exclusively to either computer- or sex-related concepts, such as podcast, wiki, and camsex.

nonalphabetic characters

this section of the niso guidelines deals with the use of capital letters and nonalphabetic characters. capitalization was not examined in the three folksonomy sites, as none of them are case sensitive; delicious and furl, for example, post tags in lower case, regardless of whether the user has assigned upper or lower case, while technorati shows capital letters only if they are assigned by the users themselves. the niso guidelines state that nonalphabetic characters, such as hyphens, apostrophes (unless used for the possessive case), symbols, and punctuation marks, should not be used because they cause filing and searching problems. table 6 shows the occurrence of nonalphabetic characters in the three folksonomy sites.

table 6. nonalphabetic characters

character        delicious (n=76)   furl (n=208)                     technorati (n=229)
hyphens          —                  hollywood b-day; urlproject      consumercredit; web2.0
apostrophes      —                  mom’s medical (possessive)       valentine’s day (possessive)
underscore       safari_export      blogger_life                     —
full stop        —                  web 2.0 (part of product name)   web-2.0 (part of product name)
forward slash    —                  —                                /africa
+ sign           —                  jcr+                             —

a very small proportion of the tags in the three folksonomy sites contains nonalphabetic characters, namely 1 percent of the delicious tags, and 3 percent of the furl and technorati tags. as was discussed previously, the delicious help screens may encourage people to use nonalphabetic characters to construct compound tags; in spite of this, however, such characters are not, in fact, used very frequently. it should be noted that the terms above were all searched, with punctuation intact, in their respective sites; in all three cases, the search engines retrieved the tags and their associated blogs or web sites, which suggests that nonalphabetic characters may not negatively impact searching.
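before turning to the discussion, it may help to make the retrieval consequences of these findings concrete. the short sketch below shows one way a tagging site could conflate the singular/plural and punctuation variants identified above; it is a hypothetical illustration written for this column, not a description of how delicious, furl, or technorati actually process tags, and its plural-stripping rule is deliberately crude.

import re

def normalize_tag(tag):
    """collapse case, punctuation, and a trailing plural -s so that variant
    forms of the same concept map to one retrieval key. this is an
    illustrative rule only, not the behavior of any of the three sites."""
    words = re.split(r"[\s_.\-/+]+", tag.lower())
    singular = []
    for w in words:
        if not w:
            continue
        # naive plural stripping: "computers" -> "computer", but "css" is kept
        if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
            w = w[:-1]
        singular.append(w)
    return " ".join(singular)

# variant forms discussed in the findings and the discussion below
for tag in ["open_source", "open.source", "computer", "computers"]:
    print(tag, "->", normalize_tag(tag))

under this rule, open_source and open.source collapse to the same key, and computers collapses onto computer; a space-less compound such as opensource would still need a word list before it could be split, which is one reason the guidance offered (or not offered) by the sites matters.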
■ discussion and recommendations

the tags examined from the three folksonomy sites correspond closely to a number of the niso guidelines pertaining to the structure of terms, namely in the types of concepts expressed by the tags, the predominance of single tags, the predominance of nouns, the use of recognized spelling, and the use of primarily alphabetic characters. potential problem areas in the structure of the tags pertain to the inconsistent use of the singular and plural form of count nouns, the difficulty with creating multiterm tags in delicious, and the incidence of ambiguous tags in the form of homographs and unqualified abbreviations or acronyms. as has been seen, a significant proportion of tags that represent count nouns appears incorrectly in the singular form. because many search engines do not deploy default truncation, the use of the singular or plural form could affect retrieval; a search for the tag computer in delicious, for example, retrieved 208,409 hits, while one for computers retrieved 91,205 hits. some of the results from the two searches overlapped, but only if both the singular and plural forms of the tags coexist. it would thus be useful for the help features of the folksonomy sites to explain the difference between count and noncount nouns and to discuss the impact of the form of the noun upon retrieval. while all three sites conform to the niso recommendation that single terms be used whenever possible, some concepts cannot be expressed in this fashion, and thus folksonomy sites should accommodate the use of multiterm tags. furl and technorati allow for their use, but make no mention of this feature in their help screens, which means that such tags may be constructed inconsistently—for example, by the insertion of punctuation—where a simple space between the tags will suffice. as has been seen, delicious does not allow directly for the construction of multiterm tags, and in its instructions it actually promotes inconsistency in how various punctuation devices may be used to conflate two or three separate tags, once again to the detriment of retrieval, as is shown below:

opensource: 103,476 hits
open_source: 91,205 hits
open.source: 26,494 hits

delicious should consider allowing for the insertion of spaces between the composite words of a compound tag; without this facility, users may be unaware of how to create compound tags. alternatively, delicious should recommend the use of only one punctuation symbol to conflate terms, such as the underscore. furl and technorati should explain clearly that compound tags may be formed by the simple convention of placing a space between the terms. ambiguous headings constitute the most problematic area in the construction of the tags; these headings take the form of homographs and abbreviations or acronyms. in the case of computer-related product names, it may be safe to assume that in the context of an online environment it is likely that the meaning of these product names is relatively self-evident.
in the case of the tag yahoo, for example, none of the sites or blogs associated with this tag pertained to “a member of a race of brutes in swift’s gulliver’s travels who have the form and all the vices of humans, or a boorish, crass, or stupid person” (merriam-webster 2007), but referred consistently to the internet service provider and search engine. on the other hand, the tag ajax was used to refer to asynchronous javascript and xml technology as well as to a number of mainly european soccer teams. given the international audience of these folksonomy sites, it may be unwise to assume that the meanings of these homographs are self-evident. library of congress subject headings often uses parenthetical qualifiers to clarify the meaning of terms—for example, python (computer program language)—even though this goes against niso recommendations. it is unlikely, however, that such use of parentheses will be effective in the folksonomy sites. a search for opera (browser), for example, will likely imply an underlying boolean and operator, which detracts from the purpose and value of the parenthetical qualifier; this was confirmed in a furl search, where the terms opera and browser appeared either immediately adjacent to each other or within the same document. the application of the section of the niso guidelines pertaining to abbreviations and acronyms is particularly difficult, as it is important to balance between using abbreviated forms of concepts that are so well-known that the full version is hardly used versus creating ambiguous tags. the fact that abbreviated forms appear so prominently in the daily logs of the three folksonomy sites suggests that the abbreviated forms of these tags are, in fact, very well established. at face value, therefore, many of the abbreviated tags are ambiguous because they can refer to different concepts, but it is questionable whether such tags as css, flash, apple, and rss, for example, are, in fact, ambiguous to the users of the sites. the use of the full forms for these tags seems cumbersome, as these concepts are hardly ever referred to in their full form. it could possibly be argued, in fact, that in some cases the full forms may not be familiar; i may know to what concept rss refers, for example, without knowing the specific words represented by the letters r, s, s. the possible ambiguity of abbreviated forms is compounded by the fact that none of the three folksonomy sites allows for cross-references between equivalent terms, which is a standard feature of most controlled vocabularies, for example:

nfl use national football league
national football league used for nfl

the help screens of the three sites do not address the notion of ambiguity in the construction of tags: they do not draw people’s attention to the inherent ambiguity of abbreviated forms that may represent more than one concept. the sites also fail to address the fact that abbreviated forms (or any tag, for that matter) may be culturally based, so that while the meaning of nfl may be obvious to north american users, this may not be the case for people who live in other geographic areas. it may be useful for the folksonomy sites to add direct links to an online dictionary and to wikipedia, and to encourage people to use these sites to determine whether their chosen tags may have more than one application or meaning; i had not realized, for example, that rss could represent twenty-three different concepts until i used wikipedia and was led to a disambiguation page.
access to these external sources may help users decide which full version of the abbreviation to use in the case of ambiguity. the examination of the structure of the tags pointed to some deficiencies in section 6 of the niso guidelines, specifically its occasional lack of sufficient definition or explanation of some of its recommendations. the guidelines list seven types of concepts that are typically represented by controlled vocabulary terms, but rely only upon a few examples to define the meaning and scope of these concepts. the guidelines thus provide no consistent mechanism by which the creators of terms can assess consistently the types of concepts represented. how, for example, is a discipline to be determined? does the term business represent a discipline if it is a subject area that is taught formally in a post-secondary institute, for example? is it necessary for a discipline to be recognized as such among a majority of educational institutions? in its examples for events, niso lists holidays and revolutions. it is unclear, however, what level of specificity applies to this concept; would christmas, for example, be considered an event or a unique entity/proper noun (which is listed separately from types of concepts)? it is only later in the guidelines, under the examples provided for unique entities (for example, fourth of july), that one may assume that a named event should be considered a unique entity. verbal nouns also are difficult to determine based only upon the niso examples, and once again no guidelines are provided to determine whether a noun represents an activity or a thing, or possibly both; for example, skiing or clipping. the lack of clear definitions in niso also appeared in the section pertaining to slang, neologisms, and jargon, which are considered to be nonstandard terms that do not generally appear in dictionaries. as was discussed previously, it is not clear at what point a jargon term or a slang term becomes a neologism. all of the slang tags found in the three sites (for example, babe) appeared in merriam-webster, which may serve to make this niso section even more ambiguous.

■ conclusion

the most notable suggested weaknesses of folksonomies are their potential for ambiguity, polysemy, synonymy, and basic level variation, as well as the lack of consistent guidelines for the choice and form of tags. the examination of the tags of the three folksonomy sites in light of the niso guidelines suggests that ambiguity and polysemy (such as homographs) are indeed problems in the structure of the folksonomy tags, although the actual proportion of homographs and ambiguous tags each constitutes fewer than one-quarter of the tags in each of the three folksonomy sites. in other words, although ambiguity and polysemy are certainly problematic areas, most of the tags in each of the three sites are unambiguous in their meaning and thus conform to niso recommendations. the help sites of the three folksonomies provide few tangible guidelines for (1) the construction of tags, which affects the construction of multiterm tags; and (2) the clear distinction between the singular and plural forms of count versus noncount nouns. as has been shown, the use of the singular or plural forms of terms, as well as the use of punctuation to form multiterm tags, affects search results.
a large proportion of the tags in all three sites consists of single terms, which mitigates the impact on retrieval, but the inconsistent use of the singular and plural forms of nouns is indeed significant and thus may have a marked effect upon retrieval. synonymy and basic level variation were not examined in this study, but are certainly worthy of further exploration. in other areas, the tags conform closely to the niso guidelines for the choice and form of controlled vocabularies. the tags represent mostly nouns, with very few unqualified adjectives or adverbs. the tags represent the types of concepts recommended by niso and conform well to recognized standards of spelling. most of the tags conform to standard usage; there are few instances of nonstandard usage, such as slang or jargon. in short, the structure of the tags in all three sites is well within the standards established and recognized for the construction of controlled vocabularies. should library catalogs decide to incorporate folksonomies, they should consider creating clearly written recommendations for the choice and form of tags that could include the following areas:

■ the difference between count and noncount nouns, as well as an explanation of how the use of the singular and plural forms affects retrieval.
■ one standard way in which to construct multiterm tags; for example, the insertion of a space between the component terms, or the use of an underscore between the terms.
■ a link to a recognized online dictionary and to wikipedia to enable users to determine the meanings of terms, to disambiguate amongst homographs, and to determine if the full form would be preferable to the abbreviated form. an explanation of the impact of ambiguous tags and homographs upon retrieval would be useful.
■ an acceptable use policy that would cover areas of potential concern, such as the use of potentially offensive tags, overly graphic tags, and so forth. although such terms were not the focus of this study, their presence was certainly evident in some cases, and would need to be considered in an environment that includes clients of all ages.

with the use of such expanded guidelines and links to useful external reference sources, folksonomies could serve as a very powerful and flexible tool for increasing the user-friendliness and interactivity of public library catalogs, and also may be useful for encouraging other activities, such as informal online communities of readers and user-driven readers’ advisory services.

works cited

bateman, s., c. brooks, and g. mccalla. 2006. collaborative tagging approaches for ontological metadata in adaptive e-learning systems. http://www.win.tue.nl/sw-el/2006/camera-ready/02-bateman_brooks_mccalla_swel2006_final.pdf (accessed jan. 11, 2007).

bruce, h., w. jones, and s. dumais. 2004. keeping and re-finding information on the web: what do people do and what do they need? seattle: information school. http://kftf.ischool.washington.edu/re-finding_information_on_the_web3.pdf (accessed jan. 11, 2007).

cattuto, c., v. loreto, and l. pietronero. 2006. collaborative tagging and semiotic dynamics. http://arxiv.org/ps_cache/cs/pdf/0605/0605015.pdf (accessed jan. 11, 2007).

del.icio.us. 2006a. del.icio.us/about. http://del.icio.us/about/ (accessed jan. 11, 2007).

del.icio.us. 2006b. del.icio.us/help/tags. http://del.icio.us/help/tags (accessed jan. 11, 2007).
dempsey, l. 2003. the recombinant library: portals and people. journal of library administration 39, no. 4: 103–36.

fichter, d. 2006. intranet applications for tagging and folksonomies. online 30, no. 3: 43–45.

furl. 2006. how to save a page in furl. http://www.furl.net/howtosave.jsp (accessed jan. 11, 2007).

golder, s. a., and b. a. huberman. 2006. usage patterns of collaborative tagging systems. journal of information science 32, no. 2: 198–208.

guy, m., and e. tonkin. 2006. tidying up tags? d-lib magazine 12, no. 1. http://www.dlib.org/dlib/jan.06/guy/01guy.html (accessed jan. 11, 2007).

ketchell, d. s. 2000. too many channels: making sense out of portals and personalization. information technology and libraries 19, no. 4: 175–79.

kroski, e. 2006. the hive mind: folksonomies and user-based tagging. http://infotangle.blogsome.com/2005/12/07/the-hive-mind-folksonomies-and-user-based-tagging/ (accessed jan. 11, 2007).

mathes, a. 2004. folksonomies—cooperative classification and communication through shared metadata. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html (accessed jan. 11, 2007).

merholz, p. 2004. ethnoclassification and vernacular vocabularies. http://www.peterme.com/archives/000387.html (accessed jan. 11, 2007).

merriam-webster. 2007. yahoo. http://www.m-w.com/ (accessed jan. 11, 2007).

michlmayr, e. 2005. a case study on emergent semantics in communities. http://wit.tuwien.ac.at/people/michlmayr/publications/michlmayr_casestudy_on_emergentsemantics_final.pdf (accessed jan. 11, 2007).

mitchell, r. l. 2005. tag teams wrestle with web content. computerworld 38, no. 16: 31.

niso. 2005. guidelines for the construction, format, and management of monolingual controlled vocabularies. ansi/niso z39.19-2005. bethesda, md.: national information standards organization. http://www.niso.org/standards/resources/z39-19-2005.pdf (accessed jan. 11, 2007).

quintarelli, e. 2005. folksonomies: power to the people. http://www.iskoi.org/doc/folksonomies.htm (accessed jan. 11, 2007).

shirky, c. 2004. folksonomy. http://www.corante.com/many/archives/2004/08/25/folksonomy.php (accessed jan. 11, 2007).

spiteri, l. f. 2006. the use of folksonomies in public library catalogues. the serials librarian 51, no. 2: 75–89.

szekely, b., and e. torres. 2005. ranking bookmarks and bistros: intelligent community and folksonomy development. http://torrez.us/archives/2005/07/13/tagrank.pdf (accessed jan. 11, 2007).

technorati. 2006. technorati help: tags. http://www.technorati.com/help/tags.html (accessed jan. 11, 2007).

trant, j., and b. wyman. 2006. investigating social tagging and folksonomy in art museums with steve.museum. http://www.archimuse.com/research/www2006-tagging-steve.pdf (accessed jan. 11, 2007).

udell, j. 2004. collaborative knowledge gardening. http://www.infoworld.com/article/04/08/20/34opstrategic_1.html (accessed jan. 11, 2007).

vander wal, t. 2006. understanding folksonomy: tagging that works. http://s3.amazonaws.com/2006presentations/dconstruct/tagging_in_rw.pdf (accessed jan. 11, 2007).

vanderwal.net. 2005. folksonomy definition and wikipedia. http://www.vanderwal.net/random/entrysel.php?blog=1750 (accessed jan. 11, 2007).

wikipedia. 2006. folksonomy. http://en.wikipedia.org/wiki/folksonomy (accessed jan. 11, 2007).

wu, h., m. zubair, and k. maly. 2006. harvesting social knowledge from folksonomies. http://delivery.acm.org/10.1145/1150000/1149962/p111-wu.pdf (accessed jan. 11, 2007).
appendix a: list of niso elements

6.3 term form
6.3.1 single word vs. multiword terms
6.3.2 types of concepts
   terms for things and their physical parts
   terms for materials
   terms for activities or processes
   terms for events or occurrences
   terms for properties or states
   terms for disciplines or subject fields
   terms for units of measurement
6.3.3 unique entities
6.4 grammatical forms of terms
6.4.1 nouns and noun phrases
6.4.1.1 verbal nouns
6.4.1.2 noun phrases
6.4.1.2.1 premodified noun phrases
6.4.1.2.2 postmodified noun phrases
6.4.2 adjectives
6.4.3 adverbs
6.5 nouns
6.5.1 count nouns
6.5.2 mass nouns
6.5.3 other types of singular nouns
6.5.3.1 abstract concepts
6.5.3.2 unique entities
6.6.2 spelling
6.6.2.1 spelling—warrant
6.6.2.2 spelling—authorities
6.6.3 abbreviations, initialisms, and acronyms
6.6.3.1 preference for abbreviation
6.6.3.2 preference for full form
6.6.3.2.1 general use
6.6.3.2.2 ambiguity
6.6.4 neologisms, slang, and jargon
6.7.1 capitalization and nonalphabetic characters

tagging: an organization scheme for the internet

marijke a. visser

marijke a. visser (marijkea@gmail.com) is a library and information science graduate student at indiana university, indianapolis, and will be graduating may 2010. she is currently working for ala’s office for information and technology policy as an information technology policy analyst, where her area of focus includes telecommunications policy and how it affects access to information.

how should the information on the internet be organized? this question and the possible solutions spark debates among people concerned with how we identify, classify, and retrieve internet content. this paper discusses the benefits and the controversies of using a tagging system to organize internet resources. tagging refers to a classification system where individual internet users apply labels, or tags, to digital resources. tagging increased in popularity with the advent of web 2.0 applications that encourage interaction among users. as more information is available digitally, the challenge to find an organizational system scalable to the internet will continue to require forward thinking. trained to ensure access to a range of informational resources, librarians need to be concerned with access to internet content. librarians can play a pivotal role by advocating for a system that supports the user at the moment of need. tagging may just be the necessary system.

who will organize the information available on the internet? how will it be organized? does it need an organizational scheme at all? in 1998, thomas and griffin asked a similar question, “who will create the metadata for the internet?” in their article with the same name.1 ten years later, this question has grown beyond simply supplying metadata to assuring that at the moment of need, someone can retrieve the information necessary to answer their query. given new classification tools available on the internet, the time is right to reassess traditional models, such as controlled vocabularies and taxonomies, and contrast them with folksonomies to understand which approach is best suited for the future. this paper gives particular attention to delicious, a social networking tool for generating folksonomies. the amount of information available to anyone with an internet connection has increased in part because of the internet’s participatory nature. users add content in a variety of formats and through a variety of applications to personalize their web experience, thus making internet content transitory in nature and challenging to lock into place. the continual influx of new information is causing a rapid cultural shift, more rapid than many people are able to keep up with or anticipate.
conversations on a range of topics that take place using web technologies happen in real time. unless you are a participant in these conversations and debates using web-based communication tools, changes are passing you by. internet users in general have barely grasped the concept of web 2.0 and already the advanced “internet cognoscenti” write about web 3.0.2 regarding the organization and availability of internet content, librarians need to be ahead of the crowd as the voice who will assure content will be readily accessible to those that seek it. internet users actively participating in and shaping the online communities are, perhaps unintentionally, influencing how those who access information via the internet expect to be able to receive and use digital resources. librarians understand that the way information is organized is critical to its accessibility. they also understand the communities in which they operate. today, librarians need to be able to work seamlessly among the online communities, the resources they create, and the end user. as internet use evolves, librarians as information stakeholders should stay abreast of web 2.0 developments. by positioning themselves to lead the future of information organization, librarians will be able to select the best emerging web-based tools and applications, become familiar with their strengths, and leverage their usefulness to guide users in organizing internet content. shirky argues that the internet has allowed new communities to form. primarily online, these communities of internet users are capable of dramatically changing society both on- and offline. shirky contends that because of the internet, “group action just got easier.”3 according to shirky, we are now at the critical point where internet use, while dependent on technology, is actually no longer about the technology at all. the web today (web 2.0) is about participation. “this [the internet] is a medium that is going to change society.”4 lessig points out that content creators are “writing in the socially, culturally relevant sense for the 21st century and to be able to engage in this writing is a measure of your literacy in the 21st century.”5 it is significant that creating content is no longer reserved for the internet cognoscenti. internet users with a variety of technological skills are participating in web 2.0 communities. information architects, web designers, librarians, business representatives, and any stakeholder dependent on accessing resources on the internet have a vested interest in how internet information is organized. not only does the architecture of participation inherent in the internet encourage completely new creative endeavors, it serves as a platform for individual voices as demonstrated in personal and organizationally sponsored blogs: lessig 2.0, boing boing, open access news, and others. these internet conversations contribute diverse viewpoints on a stage where, theoretically, anyone can access them.
web 2.0 technologies challenge our understanding of what constitutes information and push policy makers to negotiate equitable internet-use policies for the public, the content creators, corporate interests, and the service providers. to maintain an open internet that serves the needs of all the players, those involved must embrace the opportunity for cultural growth the social web represents. for users who access, create, and distribute digital content, information is anything but static; nor is using it the solitary endeavor of reading a book. its digital format makes it especially easy for people to manipulate it and shape it to create new works. people are sharing these new works via social technologies for others to then remix into yet more distinct creative work. communication is fundamentally altered by the ability to share content on the internet. today’s internet requires a reevaluation of how we define and organize information. the manner in which digital information is classified directly affects each user’s ability to access needed information to fully participate in twenty-first-century culture. new paradigms for talking about and classifying information that reflect the participatory internet are essential.

■ background

the controversy over organizing web-based information can be summed up by comparing two perspectives represented by shirky and peterson. in her introduction, peterson states, “items that are different or strange can become a barrier to networking.”6 shirky maintains, “as the web has shown us, you can extract a surprising amount of value from big messy data sets.”7 briefly, in this instance ontology refers to the idea of defining where digital information can and should be located (virtually). folksonomy describes an organizational system where individuals determine the placement and categorization of digital information. both terms are discussed in detail below. although any organizational system necessitates talking about the relationship(s) among the materials being organized, the relationships can be classified in multiple ways. to organize a given set of entities, it is necessary to establish in what general domain they belong and in what ways they are related. applying an ontological, or hierarchical, classification system to digital information raises several points to consider. first, there are no physical space restrictions on the internet, so relationships among digital resources do not need to be strictly identified. second, after recognizing that internet resources do not need the same classification standards as print material, librarians can begin to isolate the strengths of current nondigital systems that could be adapted to a system for the internet. third, librarians must be ready to eliminate current systems entirely if they fail to serve the needs of internet users. traditional systems for organizing information were developed prior to the information explosion on the internet. the internet’s unique platform for creating, storing, and disseminating information challenges pre–digital-age models. designing an organizational system for the internet that supports creative innovation and succeeds in providing access to the innovative work is paramount to moving the twenty-first-century culture forward.

■ assessing alternative models

controversy encourages scrutiny of alternative models.
in understanding the options for organizing digital information, it is important to understand traditional classification models. smith discusses controlled vocabularies, taxonomies, and facets as three traditional methods for applying metadata to a resource. according to smith, a controlled vocabulary is an unambiguous system for managing the meanings of words. it links synonyms, allowing a search to retrieve information on the basis of the relationship between synonyms.8 taxonomies are hierarchical, controlled vocabularies that establish parent–child relationships between terms. a faceted classification system categorizes information using the distinct properties of that information.9 in such a system, information can exist in more than one place at a time. a faceted classification system is a precursor to the bottom-up system represented by folksonomic tagging. folksonomy, a term coined in 2004 by thomas vander wal, refers to a “user-created categorical structure development with an emergent thesaurus.”10 vander wal further separates the definition into two types: a narrow and a broad folksonomy.11 in a broad folksonomy, many people tag the same object with numerous tags or a combination of their own and others’ tags. in a narrow folksonomy, one or few people tag an object with primarily singular terms. internet searching represents a unique challenge to people wanting to organize its available information. search engines like yahoo! and google approach the chaotic mass of information using two different techniques. yahoo! created a directory similar to the file folder system with a set of predetermined categories that were intended to be universally useful. in so doing, the yahoo! developers made assumptions about how the general public would categorize and access information. the categories and subsequent subcategories were not necessarily logically linked in the eyes of the general public. the yahoo! directory expanded as internet content grew, but the digital folder system, like a taxonomy, required an expert to maintain. shirky notes the yahoo! model could not scale to the internet. there are too many possible links to be able to successfully stay within the confines of a hierarchical classification system. additionally, on the internet, the links are sufficient for access because if two items are linked at least once, the user has an entry point to retrieve either one or both items.12 a hierarchical system does not assure a successful internet search, and it requires a user to comprehend the links determined by the managing expert. in the google approach, developers acknowledged that the user with the query best understood the unique reasoning behind her search. the user therefore could best evaluate the information retrieved. according to shirky, the google model let go of the hierarchical file system because developers recognized effective searching cannot predetermine what the user wants. unlike yahoo!, google makes the links between the query and the resources after the user types in the search terms.13 trusting in the link system led google to understand and profit from letting the user filter the search results. to select the best organizational model for the internet it is critical to understand its emergent nature. a model that does not address the effects of web 2.0 on internet use and fails to capture participant-created content and tagging will not be successful.
one approach to organizing digital resources has been for users to bookmark websites of personal interest. these bookmarks have been stored on the user’s computer, but newer models now combine the participatory web with saving, or tagging, websites. social bookmarking typifies the emergent web and the attraction of online networking. innovative and controversial, the folksonomy model brings to light numerous criteria necessary for a robust organizational system. a social bookmarking network, delicious is a tool for generating folksonomies. it combines a large amount of self-interest with the potential for an equal, if not greater, amount of social value. delicious users add metadata to resources on the internet by applying terms, or tags, to urls. users save these tagged websites to a personal library hosted on the delicious website. the default settings on delicious share a user’s library publicly, thus allowing other people—not limited to registered delicious account holders—to view any library. that the delicious developers understood how internet users would react to this type of interactive application is reflected in the popularity of delicious. delicious arrived on the scene in 2003, and in 2007 developers introduced a number of features to encourage further user collaboration. with a new look (going from the original del.icio.us to its current moniker, delicious) as well as more ways for users to retrieve and share resources by 2007, delicious had 3 million registered users and 100 million unique urls.14 the reputation of delicious has generated interest among people concerned with organizing the information available via the internet. how does the folksonomy or delicious model of open-ended tagging affect searching, information retrieving, and resource sharing? delicious, whose platform is heavily influenced by its users, operates with no hierarchical control over the vocabulary used as tags. this underscores the organization controversy. bottom-up tagging gives each person tagging an equal voice in the categorization scheme that develops through the user generated tags. at the same time, it creates a chaotic information-retrieval system when compared to traditional controlled vocabularies, taxonomies, and other methods of applying metadata.15 a folksonomy follows no hierarchical scheme. every tag generated supplies personal meaning to the associated url and is equally weighted. there will be overlap in some of the tags users select, and that will be the point of access for different users. for the unique tags, each delicious user can choose to adopt or reject them for their personal tagging system. either way, the additional tags add possible future access points for the rest of the user community. the social usefulness of the tags grows organically in relationship to their adoption by the group. can the internet support an organizational system controlled by user-generated tags? by the very nature of the participatory web, whose applications often get better with user input, the answer is yes. delicious and other social tagging systems are proving that their folksonomic approach is robust enough to satisfy the organizational needs of their users. defined by vander wal, a broad folksonomy is a classification system scalable to the internet.16 the problem with projecting already-existing search and classification strategies to the internet is that the internet is constantly evolving, and classic models are quickly overcome. 
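the mechanics of a broad folksonomy of the kind delicious supports can be sketched in a few lines. the example below is illustrative only: the bookmark data and url are invented for the example, and the code is not delicious’s actual implementation; it simply shows how tags applied independently by many users aggregate into a shared description of a resource.

from collections import Counter, defaultdict

# (user, url, tags) triples as a stand-in for saved bookmarks
bookmarks = [
    ("user1", "http://example.org/folksonomy-paper", ["folksonomy", "tagging", "metadata"]),
    ("user2", "http://example.org/folksonomy-paper", ["folksonomy", "web2.0"]),
    ("user3", "http://example.org/folksonomy-paper", ["folksonomy", "tagging", "toread"]),
]

tag_counts = defaultdict(Counter)
for user, url, tags in bookmarks:
    for tag in set(tags):          # count each tag once per user
        tag_counts[url][tag] += 1

# the aggregate view: widely shared tags rise to the top of the list
for url, counts in tag_counts.items():
    print(url)
    for tag, n in counts.most_common():
        print(f"  {tag}: applied by {n} user(s)")

in the aggregate view, tags shared by several users become the resource’s strongest access points, while a single-use tag such as toread remains a personal entry point without being imposed on anyone else; this is the emergent, bottom-up organization the folksonomy model relies on.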
even in the nonprint world of the internet, taxonomies and controlled vocabulary entail a commitment both from the entity wanting to organize the system and the users who will be accessing it. developing a taxonomy involves an expert, which requires an outlay of capital and, as in the case with yahoo!, a taxonomy is not necessarily what users are looking for. to be used effectively, taxonomies demand a certain amount of user finesse and complacency. the user must understand the general hierarchy and by default must suspend their own sense of category and subcategory if they do not mesh with the given system. the search model used by google, where the user does the filtering, has made it a significantly more successful search engine. google recognizes natural language, making it user friendly; however, it remains merely a search engine. it is successful at making links, but it leaves the user stranded without a means to organize search results beyond simple page rank. traditional hierarchical systems and search strategies like those of yahoo! and google neglect to take into account the tremendous popularity of the participatory web. successful web applications today support user interaction; to disregard this is naive and short-sighted. in contrast to a simple page-rank results list or a hierarchical system, delicious results provide the user with rich, multilayer results. figure 1 shows four of the first ten results of a delicious search for the term “folksonomy.” the articles by the four authors in the left column were tagged according to the diagram.

figure 1. search results for “folksonomy” using delicious.

two of the articles are peer-reviewed, and two are cited repeatedly by scholars researching tagging and the internet. in this example, three unique terms are used to tag those articles, and the other terms provide additional entry points for retrieval. further information available using delicious shows that the guy article was tagged by 1,323 users, the mathes article by 2,787 users, the shirky article by 4,383 users, and the peterson article by 579 users.17 from the basic delicious search, the user can combine terms to narrow the query as well as search what other users have tagged with those terms. similar to the card catalog, where a library patron would often unintentionally find a book title by browsing cards before or after the actual title she originally wanted, a delicious user can browse other users’ libraries, often finding additional pertinent resources. a user will retrieve a greater number of relevant and automatically filtered results than with an advanced google search. as an ancillary feature, once a delicious user finds an attractive tag stream—a series of tags by a particular user—they can opt to follow the user who created the tag stream, thereby increasing their personal resources. hence delicious is effective personally and socially. it emulates what internet users expect to be able to do with digital content: find interesting resources, personalize them, in this case with tags, and put them back out for others to use if they so choose. proponents of folksonomy recognize there are benefits to traditional taxonomies and controlled vocabulary systems.
shirky delineates two features of an organizational system and their characteristics, providing an example of when a hierarchical system can be successful (see table 1).18

table 1. domains and their participants

domain to be organized     participants in the domain
small corpus               expert catalogers
formal categories          authoritative source of judgment
restricted entities        coordinated users
clear edges                expert users

these characteristics apply to situations using databases, journal articles, and dissertations, as spelled out by peterson, for example.19 specific organizations with identifiable common terminology—for example, medical libraries—can also benefit from a traditional classification system. these domains are the antithesis of the domain represented by the web. the success of controlled vocabularies, taxonomies, and their resulting systems depends on broad user adoption. that, in combination with the cost of creating and implementing a controlled system, raises questions as to their utility and long-term viability for use on the web. though meant for longevity, a taxonomy fulfills a need at one fixed moment in time. a folksonomy is never static. taxonomies developed by experts have not yet been able to be extended adequately for the breadth and depth of internet resources. neither have traditional viewpoints been scaled to accept the challenges encountered in trying to organize the internet. folksonomy, like taxonomy, seeks to provide the information critical to the user at the moment of need. folksonomy, however, relies on users to create the links that will retrieve the desired results. doctorow puts forward three critiques of a hierarchical metadata system, emphasizing the inadequacies of applying traditional classification schemes to the digital stage:

1. there is not a “correct” way to categorize an idea.
2. competing interests cannot come to a consensus on a hierarchical vocabulary.
3. there is more than one way to describe something.

doctorow elaborates: “requiring everyone to use the same vocabulary to describe their material denudes the cognitive landscape, enforces homogeneity in ideas.”20 the internet raises the level of participation to include innumerable voices. the astonishing thing is that it thrives on this participation. guy and tonkin address the “folksonomic flaw” by saying user-generated tags are by definition imprecise. they can be ambiguous, overly personal, misspelled, or contrived compound words. guy and tonkin suggest the need to improve tagging by educating the users or by improving the systems to encourage more accurate tagging.21 this, however, does not acknowledge that successful web 2.0 applications depend on the emergent wisdom of the user community. the systems permit organic evolution and continual improvement by user participation. a folksonomy evolves much the way a species does. unique or single-use tags have minimal social import and do not gain recognition. tags used by more than a few people reinforce their value and emerge as the more robust species.

■ conclusion

the benefits of the internet are accessible to a wide range of users. the rewards of participation are immediate, social, and exponential in scope. user-generated content and associated organization models support the internet’s unique ability to bring together unlikely social relationships that would not necessarily happen in another milieu.
to paraphrase shirky and lessig, people are participating in a moment of social and technological evolution that is altering traditional ways of thinking about information, thereby creating a break from traditional systems. folksonomic classification is part of that break. its utility grows organically as users add tagged content to the system. it is adaptive, and its strengths can be leveraged according to the needs of the group. while there are “folksonomic flaws” inherent in a bottom-up classification system, there is tremendous value in weighting individual voices equally. following the logic of web 2.0 technology, folksonomy will improve according to the input of the users. it is an organizational system that reflects the basic tenets of the emergent internet. it may be the only practical solution in a world of participatory content creation. shirky describes the internet by saying, “there is no shelf in the digital world.”22 classic organizational schemes like the dewey decimal system were created to organize resources prior to the advent of the internet. a hierarchical system was necessary because there was a physical limitation on where a resource could be located; a book can only exist in one place at one time. in the digital world, the shelf is simply not there. material can exist in many different places at once and can be retrieved through many avenues. a broad folksonomy supports a vibrant search strategy. it combines individual user input with that of the group. this relationship creates data sets inherently meaningful to the community of users seeking information on any given topic at any given moment. this is why a folksonomic approach to organizing information on the internet is successful. users are rewarded for their participation, and the system improves because of it. folksonomy mirrors and supports the evolution of the internet. librarians, trained to be impartial and ethically bound to assure access to information, are the logical mediators among content creators, the architecture of the web, corporate interests, and policy makers. critical conversations are no longer happening only in traditional publications of the print world. they are happening with communication platforms like youtube, twitter, digg, and delicious. information organization is one issue on which librarians can be progressive. dedicated to making information available, librarians are in a unique position to take on challenges raised by the internet. as the profession experiments with the introduction of web 3.0, librarians need to position themselves between what is known and what has yet to evolve. librarians have always leveraged the interests and needs of their users to tailor their services to the individual entry point of every person who enters the library. because more and more resources are accessed via the internet, librarians will have to maintain a presence throughout the web if they are to continue to speak for the informational needs of their users. part of that presence necessitates an ability to adapt current models to the internet. more importantly, it requires recognition of when to forgo conventional service methods in favor of more innovative approaches. working in concert with the early adopters, corporate interests, and general internet users, librarians can promote a successful system for organizing internet resources. for the internet, folksonomic tagging is one solution that will assure users can retrieve information necessary to answer their queries.

references and notes
1. charles f. thomas and linda s. griffin, “who will create the metadata for the internet?” first monday 3, no. 12 (dec. 1998).
2. web 2.0 is a fairly recent term, although now ubiquitous among people working in and around internet technologies. attributed to a conference held in 2004 between medialive international and o’reilly media, web 2.0 refers to the web as being a platform for harnessing the collective power of internet users interested in creating and sharing ideas and information without mediation from corporate, government, or other hierarchical policy influencers or regulators. web 3.0 is a much more fluid concept as of this writing. there are individuals who use it to refer to a semantic web where information is analyzed or processed by software designed specifically for computers to carry out the currently human-mediated activity of assigning meaning to information on a webpage. there are librarians involved with exploring virtual-world librarianship who refer to the 3d environment as web 3.0. the important point here is that what internet users now know as web 2.0 is in the process of being altered by individuals continually experimenting with and improving upon existing web applications. web 3.0 is the undefined future of the participatory internet.
3. clay shirky, “here comes everybody: the power of organizing without organizations” (presentation videocast, berkman center for internet & society, harvard university, cambridge, mass., 2008), http://cyber.law.harvard.edu/interactive/events/2008/02/shirky (accessed oct. 1, 2008).
4. ibid.
5. lawrence lessig, “early creative commons history, my version,” videocast, aug. 11, 2008, lessig 2.0, http://lessig.org/blog/2008/08/early_creative_commons_history.html (accessed aug. 13, 2008).
6. elaine peterson, “beneath the metadata: some philosophical problems with folksonomy,” d-lib magazine 12, no. 11 (2006), http://www.dlib.org/dlib/november06/peterson/11peterson.html (accessed sept. 8, 2008).
7. clay shirky, “ontology is overrated: categories, links, and tags,” online posting, spring 2005, clay shirky’s writings about the internet, http://www.shirky.com/writings/ontology_overrated.html#mind_reading (accessed sept. 8, 2008).
8. gene smith, tagging: people-powered metadata for the social web (berkeley, calif.: new riders, 2008): 68.
9. ibid., 76.
10. thomas vander wal, “folksonomy,” online posting, feb. 7, 2007, vanderwal.net, http://www.vanderwal.net/folksonomy.html (accessed aug. 26, 2008).
11. thomas vander wal, “explaining and showing broad and narrow folksonomies,” online posting, feb. 21, 2005, personal infocloud, http://www.personalinfocloud.com/2005/02/explaining_and_.html (accessed aug. 29, 2008).
12. shirky, “ontology is overrated.”
13. ibid.
14. michael arrington, “exclusive: screen shots and feature overview of delicious 2.0 preview,” online posting, june 16, 2005, techcrunch, http://www.techcrunch.com/2007/09/06/exclusive-screen-shots-and-feature-overview-of-delicious-20-preview/ (accessed jan. 6, 2010).
15. smith, tagging, 67–93.
16. vander wal, “explaining and showing broad and narrow folksonomies.”
17. adam mathes, “folksonomies—cooperative classification and communication through shared metadata” (graduate paper, university of illinois urbana–champaign, dec. 2004); peterson, “beneath the metadata”; shirky, “ontology is overrated”; thomas and griffin, “who will create the metadata for the internet?”
18. shirky, “ontology is overrated.”
19. peterson, “beneath the metadata.”
20. cory doctorow, “metacrap: putting the torch to seven straw-men of the meta-utopia,” online posting, aug. 26, 2001, the well, http://www.well.com/~doctorow/metacrap.htm (accessed sept. 15, 2008).
21. marieke guy and emma tonkin, “folksonomies: tidying up tags?” d-lib magazine 12, no. 1 (2006), http://www.dlib.org/dlib/january06/guy/01guy.html (accessed sept. 8, 2008).
22. shirky, “ontology is overrated.”

judith carter

editorial board thoughts: issue introduction

discovery: what do you mean by that?

judith carter (jcarter.mls@gmail.com) is head of technical services at marquette university raynor memorial libraries, milwaukee, wi, and managing editor of ital.

mwuah ha ha ha haaa! finally it’s my turn. i hold the power of the editorial. (can you tell i’m writing this around halloween?) seriously now, i’ve been intimately and extensively involved with information technology and libraries for eleven years, yet this is the first time i’ve escaped from behind the editing scenes to address the readership directly. as managing editor for seven of the eleven volumes (18–22 and 27–28) and an editorial board member reviewing manuscripts (vols. 23–26), i am honored marc agreed to let me be guest editor for this theme issue. this issue is a compilation of presentations from the discovery mini-conference held at the university of nevada las vegas (unlv) libraries in the spring of 2009. the first article by jennifer fabbi gives the full chronology and framework of the project, but i have the pleasure of introducing this issue and topic by virtue of my role as guest editor, as well as my own participation in the mini-conference before i left unlv in july 2009.

■ what is discovery?

when the dean of libraries, patricia iannuzzi, announced that unlv would have a libraries-wide, poster-session style discovery mini-conference, jennifer fabbi and i decided we wanted to be part of it. we had already been exploring various aspects of discovery as part of an organizational focus as well as following up on a particular event that happened earlier in the year. while serving on a search committee, we posed a question to all the candidates: “what do you see the library catalog looking like in the future?
what do you see as the relationship between the library catalog and other access or discovery tools?” one of the candidates had such a unique answer that it got us thinking: are we all talking about the same thing when we discuss discovery? the mini-conference gave us the opportunity to explore the idea further. an all-library summit that preceded the mini-conference announcement had focused on users finding known items. we knew that discovery was so much more and that it depended on the users’ needs. of course, first we went to multiple online dictionaries to look up the meanings of “discovery” and found the following definitions: n something learned or found; something new that has been learned or found n the process of learning something; the fact or process of finding out about something for the first time n the process of finding something; the process or act of finding something or somebody unexpectedly or after searching we also looked at famous quotes about discovery. there were some of our favorites: a discovery is said to be an accident meeting a prepared mind. —albert szent-gyorgyi education is a progressive discovery of our own ignorance. —will durant next, a colleague recommended we look at chang’s browsing theory.1 this theory covered the broad spectrum of how users seek information and showed a more serendipitous view than the former focus of known item search. obviously, browsing implies a physical interaction with a collection, so we reframed the themes to fit discovery in the “every-library” electronic information environment. chang’s five browsing themes, adapted to discovery: n looking for a specific item, to locate n looking for something with common characteristics, to find “more like this” n keeping up-to-date, to find out what’s new in a field, topic or intellectual area n learning or finding out, to define or form a research question n goal-free, to satisfy curiosity or be entertained.2 all interesting information, but a little theoretical for a visual presentation. to make these themes more concrete and visual, i suggested we apply them to personas as described in one of my favorite books, the inmates are running the asylum.3 this encourages programmers to create a user with a full backstory and then design a product for their needs. to do this in an entertaining way, we identified five types of users we’ve encountered in our libraries and described an information-seeking need for each. i then created some colorful and representational characters using a well-known, alliteratively named candy’s website. our five characters were 1. mina, stylishly dressed and always carries a cell phone, is an undergraduate who rarely uses the library. she has a sociology class library assignment to find information on the cell phone habits of generation x. 2. ms. lvite lives in the las vegas area and contributes to the library. she is a regular from the community judith carter (jcarter.mls@gmail.com) is head of technical services at marquette university raynor memorial libraries, milwaukee, wi and managing editor of ital. 162 information technology and libraries | december 2009 who likes to dig into everything the library owns about small mining towns in nevada. 3. dr. prof is a faculty member with a slightly outdated wardrobe but a thirst for knowledge. he wants to know what books have been published in his field of quantum bowtie mechanics by any of his colleagues across the country. 4. 
phdead tired is a slightly mussed grad student who is always in the library clutching a cup of coffee. he needs to narrow down his dissertation topic. 5. duuuuude is an energetic, sociable young man who likes to hang out in the library with his friends. he has some time to kill on the computer. on our poster, we asked the discovery mini-conference attendees to place cutouts of our personas on a pie chart divided into the five themes of discovery. jennifer and i expected certain placements and were pleasantly surprised when our attendees challenged our assumptions with alternate possibilities. another section of the poster related discovery behaviors to specific electronic discovery tools. we provided a few and asked the attendees to add others (see table 1). while talking with each attendee, we provided a bookmark listing the five discovery behaviors (with colorful character personas) and suggested they keep them in mind as they visited the other conference sessions. we challenged them to identify what user behaviors the other presenters' systems or services were targeting. the message jennifer and i hoped to convey with our poster was this: the way we think about discovery, or the users' goals in finding information, drives the discovery systems we have or will create.

table 1. relating discovery behaviors to electronic discovery tools
user wants . . . | provide the user . . . | other tools?*
to find a specific item | search by title, author, or call number (e.g., libraries' webopac) | search a database; worldcat; flickr; google books
to find items with common characteristics | items linked by subject headings, format, or other elements; tag clouds; federated search for article databases (e.g., webopac, encore, article databases) | flickr; summon; twine; delicious
to be kept up-to-date | recently added items by subject; integration of blogs for news or updates (e.g., new books list, libguides, encore "recently added") | blogs; rss feeds; apple itunes; amazon readers advisory; authors/musicians websites; newspapers online
to learn more about something | general information that provides context, reviews (e.g., wikipedia, google, encore community reviews) | dissertation abstracts; encyclopedias; database of databases (for context); peer to peer: delicious, social tagging
to satisfy curiosity or be entertained | surfing the web, multimedia, social networking (e.g., google, youtube, facebook) | myspace; world of warcraft; second life; podcasts; wikipedia "random article" feature
* ideas generated at the discovery mini-conference

as you read through this issue, i hope you'll see some new ways to think about discovery and that those ways will fuel this audience's potential to create new tools. what follows is a textual walk around our mini-conference. taken as individual articles, each might not look like what you are used to seeing in ital. taken as a whole that grew out of the process, these articles are what makes this a special issue. as i said before, jennifer fabbi provides the background and process for the discovery mini-conference. then, alex dolski describes a prototype multipac discovery system he created and demonstrated, and he discusses the issues surrounding the design of such a system. tom ipri, michael yunkin, and jeanne brown, as members of the usability working group, had already been conducting testing on unlv libraries' website. they share their methods, findings, and results with us.
thomas sommer presents a look at what the special collections department has implemented to aid discovery of their unique materials. wendy starkweather and eva stowers used the mini-conference as an opportunity to research how other libraries are providing discovery opportunities to students via smartphones. patrick griffis describes his work with free screen capture tools to build pathfinders to promote resource discovery. patrick griffis and cyrus ford each looked at enhancing catalog records, so they combined their two presentations here to describe ways to enrich the online catalog to better aid our users’ success. references 1. shan-ju chang, “chang’s browsing,” in theories of information behavior, by karen e. fisher, sanda erdelez, and lynne mckechnie (medford, n.j.: information today, 2005): 69–74. 2. ibid., 71–72. 3. alan cooper, the inmates are running the asylum, (indianapolis, ind.: sams, 1999). personas are described in chapter 9. figure 1. “initial thoughts” and “five general themes of discovery behavior” panel from the discovery mini-conference poster editorial board thoughts | eden 109 editorial board thoughts bradford lee eden musings on the demise of paper w e have been hearing the dire predictions about the end of paper and the book since microfiche was hailed as the savior of libraries decades ago. now it seems that technology may be finally catching up with the hype. with the amazon kindle and the sony reader beginning to sell in the marketplace despite the cost (about $360 for the kindle), it appears that a whole new group of electronic alternatives to the print book will soon be available for users next year. amazon reports that e-book sales quadrupled in 2008 from the previous year. this has many technology firms salivating and hoping that the consumer market is ready to move to digital reading as quickly and profitably as the move to digital music. some of these new devices and technologies are featured in the march 3, 2009, fortune article by michael v. copeland titled “the end of paper?”1 part of the problem with current readers is their challenges for advertising. because the screen is so small, there isn’t any room to insert ads (i.e., revenue) around the margins of the text. but new readers such as plastic logic, polymer vision, and firstpaper will have larger screens, stronger image resolution, and automatic wireless updates, with color screens and video capabilities just over the horizon. still, working out a business model for newspapers and magazines is the real challenge. and how much will readers pay for content? with everything “free” over the internet, consumers have become accustomed to information readily available for no immediate cost. so how much to charge and how to make money selling content? the plastic logic reader weighs less than a pound, is one-eighth of an inch thick, and resembles an 8½ x 11 inch sheet of paper or a clipboard. it will appear in the marketplace next year, using plastic transistors powered by a lithium battery. while not flexible, it is a very durable and break-resistant device. other e-readers will use flexible display technology that allows one to fold up the screen and place the device into a pocket. much of this technology is fueled by e-ink, a start-up company that is behind the success of the kindle and the reader. they are exploring the use of color and video, but both have problems in terms of reading experience and battery wear. in the long run, however, these issues will be resolved. 
expense is the main concern: just how much are users willing to pay to read something in digital rather than analog? amazon has been hugely successful with the kindle, selling more than 500,000 for just under $400 in 2007. and with the drop in subscriptions for analog magazines and newspapers, advertisers are becoming nervous about their futures. or will the “pay by the article” model, like that used for digital music sales, become the norm? so what should or do these developments mean for libraries? it means that we should probably be exploring the purchase of some of these products when they appear and offering them (with some content) for checkout to our patrons. many of us did something similar when it became apparent that laptops were wanted and needed by students for their use. many of us still offer this service today, even though many campuses now require students to purchase them anyway. offering cutting-edge technology with content related to the transmission and packaging of information is one way for our clientele to see libraries as more than just print materials and a social space. and libraries shouldn’t pay full price (or any price) for these new toys; companies that develop these products are dying to find free research and development focus groups that will assist them in versioning and upgrading their products for the marketplace. what better avenue than college students? related to this is the recent announcement by the university of michigan that their university press will now be a digital operation to be run as part of the library.2 decreased university and library budgets have meant that university presses have not been able to sell enough of their monographs to maintain viable business models. the move of a university press to a successful scholarly communication and open-source publishing entity like the university of michigan libraries means that the press will be able to survive, and it also indicates that the newer model of academic libraries as university publishers will have a prototypical example to point out to their university’s administration. in the long run, these types of partnerships are essential if academic libraries are to survive their own budget cuts in the future. references 1. michael v. copeland, “the end of paper?” cnnmoney .com, mar. 3, 2009, http://money.cnn.com/2009/03/03/ technology/copeland_epaper.fortune/ (accessed june 22, 2009). 2. andrew albanese, “university of michigan press merged with library, with new emphasis on digital monographs,” libraryjournal.com, mar. 26, 2009, http://www .libraryjournal.com/article/ca6647076.html (accessed june 22, 2009). bradford lee eden (eden@library.ucsb.edu) is associate university librarian for technical services and scholarly communication, university of california, santa barbara. 
public library computer waiting queues: alternatives to the first-come-first-served strategy stuart williamson

abstract this paper summarizes the results of a simulation of alternative queuing strategies for a public library computer sign-up system. using computer usage data gathered from a public library, the performance of these various queuing strategies is compared in terms of the distribution of user wait times. the consequences of partitioning a pool of public computers are illustrated, as are the potential benefits of prioritizing users in the waiting queue according to the amount of computer time they desire.

introduction many of us at public libraries are all too familiar with the scene: a crowd of customers huddled around the library entrance in the morning, anxiously waiting for the doors to open to begin a race for the computers. from this point on, the wait for a computer at some libraries, such as the one we will examine, can hover near thirty minutes on busy days and peak at an hour or more. such long waiting times are a common source of frustration for both customers and staff. by far the most effective solution to this problem is to install more public computers at your library. of course, when the space or money runs out, this may no longer be possible. another approach is to reduce the length or number of sessions each customer is allowed.
unfortunately, reducing session length can make completion of many important tasks difficult; whereas, restricting the number of sessions per day can result in customers upset over being unable to use idle computers.1 finally, faced with daunting wait times, libraries eager to make their computers accessible to more people may be tempted to partition their waiting queue by installing separate fifteen-minute “express” computers. a primary focus of this paper is to illustrate how partitioning the pool of public computers can significantly increase waiting times. additionally, several alternative queuing strategies are presented for providing express-like computer access without increasing overall waiting times. we often take for granted the notion that first-come-first-served (fcfs) is a basic principle of fairness. “i was here first,” is an intuitive claim that we understand from an early age. however, stuart williamson (swilliamson@metrolibrary.org) is researcher, metropolitan library system, oklahoma city, oklahoma. mailto:swilliamson@metrolibrary.org information technology and libraries | june 2012 73 the inefficiency present in a strictly fcfs queue is implicitly acknowledged when we courteously invite a person with only a few items to bypass our overflowing grocery cart to proceed ahead in the check-out line. most of us would agree to wait an additional few minutes rather than delay someone else for a much greater length of time. when express lanes are present, they formalize this process by essentially allowing customers needing help for only a short period of time to cut in line. these line cuts are masked by the establishment of separate dedicated lines, i.e., the queue is partitioned into express and non-express lines. one question addressed by this article is “is there a middle ground?” in other words, how might a library system set up its computer waiting queue to achieve express-lane type service without splitting the set of public internet computers into partitions that operate separately and in parallel? several such strategies are presented here along with the results of how each performed in a computer simulation using actual customer usage data from a public library. strategies queuing systems are heavily researched in a number of disciplines, particularly computer science and operations research. the complexity and sheer number of different queuing models can present a formidable barrier to library professionals. this is because, in the absence of real-world data, it is often necessary to analyze a queuing system mathematically by approximating its key features with an applicable probability distribution. unfortunately, applying these distributions entails adopting their underlying assumptions as well as any additional assumptions involved in calculating the input parameters. for instance, the poisson distribution (used to approximate customer arrival rates) requires that the expected arrival rate be uniform across all time intervals, an assumption which is clearly violated when school lets out and teenagers suddenly swarm the computers.2 even if we can account for such discrepancies, there remains the difficulty of estimating the correct arrival rate parameter for each discrete time interval being analyzed. fortunately, many libraries now use automated computer sign-up systems which provide access to vast amounts of real-world data. with realistic data, it is possible to simulate various queuing strategies, a few of which will be analyzed in this article. 
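as an aside, the uniform-arrival-rate assumption behind a poisson model is easy to check against real sign-up logs before deciding whether an analytic approach is workable. the short sketch below (written in python; the data layout is assumed, since the article does not publish its file format) simply tallies sign-ups by hour, so an after-school surge shows up immediately.

```python
from collections import Counter

def arrivals_by_hour(signup_times):
    """signup_times: iterable of sign-up timestamps in seconds since opening.
    a roughly flat hourly profile is consistent with a single poisson rate;
    a mid-afternoon spike (school letting out) clearly is not."""
    counts = Counter(t // 3600 for t in signup_times)
    return {hour: counts.get(hour, 0) for hour in range(max(counts) + 1)}

# example with made-up sign-up times: a quiet morning, then a rush in hour 7
print(arrivals_by_hour([600, 4200, 25300, 25400, 25500, 26000]))
```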
a computer simulation using real-world data provides a good picture of the practical implications of any queuing strategy we care to devise without the need for complex models. as is often the case, designing a waiting queue strategy involves striking a balance among competing factors. for instance, one way of reducing waiting times involves breaking with the fcfs rule and allowing users in one category to cut in front of other users. how many cuts are acceptable? does the shorter wait time for users in one category justify the longer waits in another? there are no right answers to these questions. while simulating a strategy can provide a realistic picture of its results in terms of waiting times, evaluating which strategy's results are preferable for a particular library must be done on a case-by-case basis. in addition to the standard fcfs strategy with a single pool of computers and the same fcfs strategy implemented with one computer removed from the pool to serve as a dedicated fifteen-minute express computer (referred to as fcfs-15), we will consider for comparison three other well-known alternative queuing strategies: shortest-job-first (sjf), highest-response-ratio-next (hrrn), and a variant of shortest-job-first (sjf-fb) which employs a feedback mechanism to restrict the number of times a given user may be bypassed in the queue.3 the three alternative strategies all require advance knowledge or estimation of how long each particular computer session will last. in our case, this means customers would need to indicate how long a session they desire upon first signing up for a computer. although any number of minutes would be acceptable, we will limit the sign-up options to four categories in fifteen-minute intervals: fifteen minutes, thirty minutes, forty-five minutes, and sixty minutes. each session will then be initially categorized into one of four priority classes (p1, p2, p3, and p4) accordingly. as the data will show, customers selecting shorter sessions are given a higher priority in the queue and will thus have a shorter expected waiting time. it should be noted that relying on users to choose their own session length presents its own set of problems. it is often difficult to estimate how much time will be required to accomplish a given set of tasks online. however, users face a similar difficulty in deciding whether to opt for a dedicated fifteen-minute computer under the fcfs-15 system. the trade-off between use time and wait time should provide an incentive for some users to self-ration their computer use, placing an additional downward pressure on wait times. however, user adaptations in response to various queuing strategies are outside the scope of this analysis and will not be considered further. the shortest-job-first (sjf) strategy functions by simply selecting from the queue the user in the highest priority class. the amount of time spent waiting by each user is only considered as a tie breaker among users occupying the same priority class. our results demonstrate that the sjf strategy is generally best for minimizing overall average waiting time as well as for getting customers needing the least amount of computer time online the fastest. the main drawbacks of this strategy are that these gains come at the expense of more line cuts and higher average and maximum waiting times for the lowest priority users—those needing the longest sessions (sixty minutes).
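to make the priority classes and the sjf selection rule concrete, here is a minimal python sketch (not the author's simulation code, which the article does not reproduce; the record and function names are illustrative). it maps a requested session length to classes p-1 through p-4 and then picks the next user by class, using arrival time only to break ties.

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrival: int        # sign-up time in seconds since opening
    requested_min: int  # desired session length: 15, 30, 45, or 60

def priority_class(req: Request) -> int:
    """map a requested session length to priority class 1 (highest) .. 4 (lowest):
    15 min -> p-1, 30 min -> p-2, 45 min -> p-3, 60 min -> p-4."""
    return min(4, max(1, (req.requested_min + 14) // 15))

def pick_next_sjf(queue: list[Request]) -> Request:
    """shortest-job-first: the best (lowest-numbered) priority class wins;
    waiting time matters only as a tie breaker within a class."""
    chosen = min(queue, key=lambda r: (priority_class(r), r.arrival))
    queue.remove(chosen)
    return chosen

# example: a 60-minute user who signed up first is bypassed by a later 15-minute user
q = [Request(arrival=0, requested_min=60), Request(arrival=300, requested_min=15)]
print(pick_next_sjf(q))  # Request(arrival=300, requested_min=15)
```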
there is no limit to how many times a user can be passed over in the queue. in theory, this means that such a user could be continually bypassed and never be assigned a computer during the day. the sjf-fb strategy is a variant of sjf with the addition of a feedback mechanism that increases the priority of users each time they are cut in line. for instance, if a user signs up for a sixty-minute session, he/she is initially assigned a priority of 4. suppose that shortly after, another user signs up for a thirty-minute session and is assigned a priority of 2. the next available computer will be assigned to the user with priority 2. the bypassed user's priority will now be bumped up by a set interval. in this simulation an interval of 0.5 is used, so the bypassed user's new priority becomes 3.5. as a result, users beginning with a priority of 4 will reach the highest priority of 1 after being bypassed six times and will not be bypassed further. this effectively restricts the maximum number of times a user can be cut in front of to six. the final alternative strategy, highest-response-ratio-next (hrrn), is a balance between fcfs and sjf. it considers both the arrival time and requested session length when assigning a priority to each user in the queue. each time a user is selected from the queue, the response ratio is recalculated for all users. the user with the highest response ratio is selected and assigned the open computer. the formula for response ratio is: response ratio = (time spent waiting + requested session length) / requested session length. this allows users with a shorter session request to cut in line, but only up to a point. even customers requesting the longest possible session move up in priority as they wait, just at a slower pace. this method produces the same benefits and drawbacks as the sjf strategy; but the effects of both are moderated, and the possibility of unbounded waiting is eliminated. still, although the expected number of cuts will be lower using hrrn than with sjf, there is no limit on how many times a user may be passed over in the queue. the response ratio formula can be generalized by scaling the importance of the waiting time factor. for instance, in the modified response ratio, increasing values of x > 1 will cause the strategy to more closely resemble fcfs, and decreasing values of 0 < x < 1 will more closely resemble sjf. one could experiment with different values of x to find a desired balance between the number of line cuts and the impact on average waiting times for customers in the various priority classes. this won't be pursued here, and x will be assumed to be 1.

methodology the data used in this simulation come from the metropolitan library system's southern oaks library in oklahoma city. this library has eighteen public internet computers that customers can sign up for using proprietary software developed by jimmy welch, deputy executive director/technology for the metropolitan library system. the waiting queue employs the first-come-first-served (fcfs) strategy. customers are allotted an initial session of up to sixty minutes but may extend their session in thirty-minute increments so long as the waiting queue is empty. repeat customers are also allowed to sign up for additional thirty-minute sessions during the day, provided that no user currently in the queue has been waiting for more than ten minutes (an indication that demand for computers is currently high).
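continuing the sketch above with a mutable priority field, the sjf-fb bump and the hrrn selection rule might look like the following. the 0.5 bump interval and the floor at priority 1 come from the article; the response ratio is the standard hrrn definition, and the exponent x is only an assumed, illustrative stand-in for the article's generalized waiting-time weight, which is not reproduced in this text.

```python
from dataclasses import dataclass

@dataclass
class WaitingUser:
    arrival: int           # sign-up time, in seconds since opening
    requested_min: int     # 15, 30, 45, or 60
    priority: float = 4.0  # 1.0 (best) .. 4.0 (worst)

BUMP = 0.5  # how much a bypassed user's priority improves (value from the article)

def record_bypass(bypassed: list[WaitingUser]) -> None:
    """sjf-fb feedback: every user who was cut in line moves up by BUMP,
    but never past the top class, so at most six cuts are possible."""
    for u in bypassed:
        u.priority = max(1.0, u.priority - BUMP)

def response_ratio(u: WaitingUser, now: int, x: float = 1.0) -> float:
    """standard hrrn ratio: (waiting time + requested time) / requested time.
    the exponent x on the waiting-time term is an assumed way to weight waiting
    more heavily (x > 1, closer to fcfs) or less heavily (x < 1, closer to sjf)."""
    wait_min = (now - u.arrival) / 60.0
    return (wait_min ** x + u.requested_min) / u.requested_min

def pick_next_hrrn(queue: list[WaitingUser], now: int, x: float = 1.0) -> WaitingUser:
    """select and remove the waiting user with the highest response ratio."""
    chosen = max(queue, key=lambda u: response_ratio(u, now, x))
    queue.remove(chosen)
    return chosen
```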
anonymous usage data gathered by the system in august 2010 was compiled to produce the information about each customer session shown in table 1. public library computer waiting queues | williamson 76 table 1. session data (units in minutes) the information about each session required for the simulation includes the time at which the user arrived to sign up for a computer, the number of minutes it took the user to log in once assigned a computer, how many minutes of computer time were used, whether or not this was the user’s first or a subsequent session for the day, and finally, whether the user gave up waiting and abandoned his/her place in the queue. users are given eight minutes to log in once a computer station is assigned to them before they are considered to have abandoned the queue. once this data has been gathered, the computer simulation runs by iterating through each second the library is open. as user sign-up times are encountered in the data, they are added to the waiting queue. when a computer becomes available, a user is selected from the queue using the strategy being simulated and assigned to the open computer. the customer occupies the computer for the length of time given by their associated log-in delay and session length. when this time expires, customers are removed from their computer and the information recorded during their time spent in the waiting queue is logged. results there were 7,403 sign-ups for the computers at the southern oaks library in august 2010. each of these requests is assigned a priority class based on the length of the session as detailed in table 2. the intended session length of users choosing to abandon the queue is unknown. abandoned sign-ups are assigned a priority class randomly in proportion to the overall distribution of priority classes in the data so as not to introduce any systematic bias into the results. even though their actual session length is zero, these users participate in the queue and cause the computer eventually assigned to them to sit idle for eight minutes until it is re-assigned. customers signing up for a subsequent session during the day are always assigned the lowest priority class (p-4) regardless of their requested session length. this is a policy decision to not give priority to users who have already received a computer session for the day. information technology and libraries | june 2012 77 table 2. assignment of priority classes figure 1 displays the average waiting time for each priority class during the simulation (bars) along with the total number of sessions initially assigned to each class (line). it is immediately obvious from the chart that each alternative strategy excels at reducing the average wait for high priority (p1) users. also observe how removing one computer from the pool to serve exclusively as a fifteen-minute computer drastically increases the fcfs-15 average wait times in the other priority classes. clearly, removing one (or more) computer from the pool to serve as a dedicated fifteen-minute station is a poor strategy here for all but the 519 users in class p-1. losing just one of the eighteen available computers nearly doubles the average wait for the remaining 6,884 users in the other priority classes. figure 1. average user wait minutes by priority class public library computer waiting queues | williamson 78 by contrast, note that the reduced average wait times for the highest priority users in class p-1 persist in classes p-2 and p-3 for the non-fcsc strategies. 
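a bare-bones version of the second-by-second replay described in the methodology might look like the sketch below. it assumes a list of sign-up records sorted by arrival time, the eighteen stations and eight-minute login window described above, and a pluggable selection rule (a first-come-first-served rule is included for illustration); it is a sketch under those assumptions, not the metropolitan library system's sign-up software.

```python
N_COMPUTERS = 18
LOGIN_WINDOW = 8 * 60  # seconds a user may take to log in before abandoning

def simulate(signups, pick_next, day_seconds=12 * 60 * 60):
    """replay one day of sign-ups under a given queue-selection rule.
    signups: dicts with 'arrival', 'login_delay', and 'session' in seconds,
             sorted by arrival time (an abandoned user has session == 0).
    pick_next: function(queue, now) returning the next record to seat.
    returns (record, seconds waited) pairs for later analysis."""
    queue, waits = [], []
    free_at = [0] * N_COMPUTERS   # time at which each station next becomes free
    i = 0                         # index of the next sign-up to arrive
    for now in range(day_seconds):
        while i < len(signups) and signups[i]["arrival"] <= now:
            queue.append(signups[i])   # the user joins the waiting queue
            i += 1
        for c in range(N_COMPUTERS):
            if free_at[c] <= now and queue:
                user = pick_next(queue, now)
                waits.append((user, now - user["arrival"]))
                # the station is tied up for the login delay (capped at the
                # eight-minute window) plus the actual session length
                busy = min(user["login_delay"], LOGIN_WINDOW) + user["session"]
                free_at[c] = now + busy
    return waits

def pick_fcfs(queue, now):
    """first-come-first-served: simply take the earliest arrival."""
    user = min(queue, key=lambda u: u["arrival"])
    queue.remove(user)
    return user
```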
the sjf strategy produces the most dramatic reductions for the 2,164 users not in class p-4. however, for the 5,239 users in class p-4, the sjf strategy produced an average wait time that was 2.1 minutes longer than the purely fcfs strategy. the hrrn strategy achieves lesser wait time reductions than sjf in the higher priority classes, but hrrn increased the average wait for users in class p-4 by only 0.7 minutes relative to fcfs. the average wait using the sjf-fb strategy falls in between that of sjf and hrrn for each priority class while guaranteeing users will be cut at most six times. an examination of the maximum wait times for each priority class in figure 2 illustrates how the express lane itself can be a bottleneck. even with a dedicated fifteen-minute express computer under the fcfs-15 strategy, at least one user would have waited over half an hour to use a computer for fifteen minutes or less. in all but the highest priority class (p-2 through p-4), the fcfs-15 strategy again performs poorly with at least one user in each of these classes waiting over ninety minutes for a computer. figure 2. maximum user wait minutes by priority class capping the number of times a user may be passed over in the queue under the sfj-fb strategy makes it less likely that members of classes p-2 and p-3 will be able to take advantage of their higher priority to cut in front of users in class p-4 during periods of peak demand. as a result, the sjf-fb maximum wait times for classes p-2 and p-3 are similar to those under the fcfs strategy. this was not the case in the breakdown of sjf-fb average waiting times across priority classes in figure 1. information technology and libraries | june 2012 79 table 3 breaks down waiting times for each queuing strategy according to the overall percentage of users waiting no more than the given number of minutes. here we see the effects of each strategy on the system as a whole, instead of by priority class. notice that the overall average wait times for the non-fcfs strategies are lower than those of fcfs. this indicates that the total reduction in waiting times for high-priority users exceeds the additional time spent waiting by users in class p-4. in other words, these strategies are globally more efficient than fcfs. notice, too, in table 3 that the non-fcfs strategies achieve significant reductions in the median wait time compared with fcfs. table 3. distribution of wait times by strategy after demonstrating the impact that breaking the first-come-first-served rule can have on waiting times, it is important to examine the line cuts that are associated with each of these strategies. line cuts are recorded by each user in the simulation while waiting in the queue. each time a user is selected from the queue and assigned a computer, remaining users who arrived prior to the one just selected note having been skipped over. by the time they are assigned a computer, users have recorded the total number of times they were passed over in the queue. public library computer waiting queues | williamson 80 figure 3. cumulative distribution of line cuts by queuing strategy figure 3 displays the cumulative percentage of users experiencing no more than the listed number of cuts for each non-fcfs strategy. the majority of users are not passed over at all under these strategies. however, there is a small minority of users that will be repeatedly cut in line. 
for instance, in our simulation, one unfortunate individual was passed over in the queue sixteen times under the sjf strategy. this user waited ninety-one minutes using this strategy as opposed to only fifty-nine minutes under the familiar fcfs waiting queue. most customers would become upset upon seeing a string of sixteen people jump over them in the queue and get on a computer while they are enduring such a long wait. the hrrn strategy caused a maximum of nine cuts to an individual in this simulation. this user waited seventy-three minutes under hrrn versus only fifty-five minutes using fcfs. extreme examples such as those above are the exception. under the hrrn and sjf-fb strategies, 99% of users were passed over at most four times while waiting in the queue. conclusion we have examined the simulation of several queuing strategies using a single month of computer usage data from the southern oaks library. the relative performance difference between queuing strategies will depend on the supply and demand of computers at any given location. clearly, at libraries with plenty of public computers for which customers seldom have to wait, the choice of queuing strategy is inconsequential. however, for libraries struggling with waiting times on par with those examined here, the choice can have a substantial impact. information technology and libraries | june 2012 81 in general, however, these simulation results demonstrate the ability of non-fcfs queuing strategies to significantly lower waiting times for certain classes of users without partitioning the pool of computers. these reductions in waiting times come at the cost of allowing high priority users to essentially cut in line. this causes slightly longer wait times for low priority users; but, overall average and median wait times see a small reduction. of course, for some customers, being passed over in line even once is intolerable. furthermore, creating a system to implement an alternative queuing strategy may present obstacles of its own. however, if the need to provide for quick, short-term computer access is pressing enough for a library to create a separate pool of “express” computers; then, one of the non-fcfs queuing strategies discussed in this paper may be a viable alternative. at the very least, the fcfs-15 simulation results should give one pause before resorting to designated “express” and “nonexpress” computers in an attempt to remedy unacceptable customer waiting times. acknowledgments the author would like to thank the metropolitan library system, kay bauman, jimmy welch, sudarshan dhall, and bo kinney for their support and assistance with this paper as well as tracey thompson and tim spindle for their excellent review and recommendations. references 1. j. d. slone, “the impact of time constraints on internet and web use,” journal of the american society for information science and technology 58 (2007): 508–17. 2. william mendenhall and terry sincich, statistics for engineering and the sciences (upper saddle river, nj: prentice-hall, 2006), 151–54. 3. abraham silberschatz, peter baer galvin, and greg gagne, operating system concepts (hoboken, nj: wiley, 2009), 188–200. editorial: singularity—are we there, yet? | truitt 55 i n my last column, i wrote about two books—nicholas carr ’s the shallows and william powers’ hamlet’s blackberry—relating to learning in the always-on, always connected environment of “screens.”1 since then, two additional works have come to my attention. 
while i won’t be able to do them justice in the space i have here, they deserve careful consideration and open discussion by those of us in the library community. if carr’s and power’s books are about how we learn in an always-connected world of screens, sherry turkle’s alone together and elias aboujaoude’s virtually you are about who we are in the process of becoming in that world.2 turkle is a psychologist at mit who studies human– computer interactions. among her previous works are the second self (1984) and life on the screen (1995). aboujaoude is a psychiatrist at the stanford university school of medicine, where he serves as director of the obsessive compulsive disorder clinic and the impulse control disorders clinic. based on extensive coverage of specialist and popular literature, as well as numerous anonymized accounts of patients and subjects encountered by the authors, both works are characterized by thorough research and thoughtful analysis. while their approaches to the topic of “what we are becoming” as a result of screens may differ— aboujaoude’s, for example, focuses on “templates” and the terminology of traditional psychiatry, while turkle’s examines the relationship between loneliness and solitude (they are different), and how these in turn relate to the world of screens—their observations of the everyday manifestations of what might be called the pathology of screens bear many common threads. i’m acutely aware of the potential for injustice (at best) and misrepresentation or misunderstanding (rather worse) that i risk in seeking to distill two very complex studies into such a small space. and, frankly, i’m still trying to wrap my head around both the books and the larger issues they raise. with that caveat, i still think we should be reading about and widely discussing the phenomena reported, which many of us observe on a daily basis. in the sections that follow, i’d like to touch on a very few themes that emerge from these books. ■■ “why do people no longer suffice?”3 a pair of anecdotes that turkle recounts to explain her reasons for writing the current book seems worth sharing at the outset. in the first, she describes taking her then-fourteen-year-old daughter, rebecca, to the charles darwin exhibition at new york’s american museum of natural history in 2005. among the many artifacts on display was a pair of live giant galapagos tortoises: “one tortoise was hidden from view; the other rested in its cage, utterly still. 
rebecca inspected the visible tortoise thoughtfully for a while and then said matter-of-factly, ‘they could have used a robot.’” when turkle queried other bystanders, many of the children agreed, with one saying, ‘for what the turtles do, you didn’t have to have live ones.’” in this case, “alive enough” was sufficient for the purpose at hand.4 sometime later, turkle read and publicly expressed her reservations about british computer scientist david levy’s book, love and sex with robots, in which levy predicted that by the middle of this century, love with robots will be as normal as love with other humans, while the number of sexual acts and lovemaking positions commonly practiced between humans will be extended, as robots will teach more than is in all of the world’s published sex manuals combined.5 contacted by a reporter from scientific american about her comments regarding levy’s book, turkle was stunned when the reporter, equating the possibility of relationships between humans and robots with gay and lesbian relationships, accused her of likewise opposing these human-to-human relationships. if we now have reached a point where gay and lesbian relationships can strike us as comparable to human-to-machine relationships, something very important has changed; for turkle, it suggested that we are on the threshold of what she terms the “robotic moment”: this does not mean that companionate robots are common among us; it refers to our state of emotional—and i would say philosophical—readiness. i find people willing to seriously consider robots not only as pets but as potential friends, confidants and romantic partners. we don’t seem to care what these artificial intelligences “know” or “understand” of the human moments we might “share” with them. at the robotic moment, the performance of connection seems connection enough. we are poised to attach to the inanimate without prejudice.6 marc truitteditorial: singularity—are we there, yet? marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 56 information technology and libraries | june 2011 while these examples are admittedly extreme, both authors agree that something very basic has changed in the way we conduct ourselves. turkle characterizes it as mobile technology having made each of us “pausable,” i.e., that a face-to-face interaction being interrupted by an incoming call, text message, or e-mail is no longer extraordinary; rather, in the “new etiquette,” it is “close to the norm.”10 and the rudeness, as well we know, isn’t limited to mobile communications. referring to “flame wars,” which regularly erupt in online communities, aboujaoude observes: the internet makes it easier to suspend ethical codes governing conduct and behavior. gentleness, common courtesy, and the little niceties that announce us as well-mannered, civilized, and sociable members of the species are quickly stripped away to reveal a completely naked, often unpleasant human being.11 even our routine e-mail messages—lacking as they often do salutations and closing sign-offs—are characterized by a form of curtness heretofore unacceptable in paper communications. 
remarkably, to those old enough to recall the traditional norms, the brusqueness is not only unintended, it is as well unconscious; “[we] just don’t think warmth and manners are necessary or even advisable in cyberspace.”12 ■■ castles in the air: avatars, profiles, and remaking ourselves as we wish we were finally, a place to love your body, love your friends, and love your life. —second life, “what is second life?”13 one of the interesting and worrisome themes in both turkle’s and aboujaoude’s studies is that of the reinvention and transformation of the self, in the form of online personas and avatars. this is the stock-in-trade of online communities and gaming sites such as facebook and second life. these sites cater to our nearly universal desire to be someone other than who we are: online, you’re slim, rich, and buffed up, and you feel you have more opportunities than in the real world. . . . we can reinvent ourselves as comely avatars. we can write the facebook profile that pleases us. we can edit our messages until they project the self we want to be.14 the problem is that for many there is an increasing fuzziness at the interface between real and virtual ■■ changing mores, or the triumph of rudeness i can’t think of any successful online community where the nice, quiet, reasonable voices defeat the loud, angry ones. . . . the computer somehow nullifies the social contract. —heather champ, yahoo!’s flickr community manager7 sadly, we’ve all experienced it. we get stuck on a bus, train, or in an elevator with someone engaged in a loud conversation on her or his mobile phone. all too often, the person is loudly carrying on about matters we wish we weren’t there to hear. perhaps it’s a fight with a partner. or a discussion of some delicate health matter. whatever it is, we really don’t want to know, but because of the limitations imposed by physical spaces, we can’t avoid being a party to at least half of the conversation. what’s wrong with these individuals? do they really have no consideration or sense of propriety? it turns out that in matters of tact and good taste, the ground has shifted, and where once we understood and abided by commonly accepted rules of conduct and respect for others, we do so no longer. indeed, the everyday obnoxious intrusions by those using public spaces for their private conversations are among the least of offenders. consider the following situations shared by turkle: sal, 62 years old, holds a small dinner party at his home as part of his “reentry into society” after several years of having cared for his recently deceased wife: i invited a woman, about fifty, who works in washington. in the middle of a conversation about the middle east, she takes out her blackberry. she wasn’t speaking on it. i wondered if she was checking her e-mail. i thought she was being rude, so i asked her what she was doing. she said that she was blogging the conversation. she was blogging the conversation.8 turkle later tells of attending a memorial service for a friend. several [attendees] around me used the [printed] program’s stiff, protective wings to hide their cell phones as they sent text messages during the service. one of the texting mourners, a woman in her late sixties, came over to chat with me after the service. matter-of-factly, she offered, “i couldn’t stand to sit that long without getting on my phone.” the point of the service was to take a moment. 
this woman had been schooled by a technology she’d had for less than a decade to find this close to impossible.9 editorial: singularity—are we there, yet? | truitt 57 enough” became yet more blurred. turkle’s anecdotes of children explaining the “aliveness” of these robots are both touching and disturbing. speaking of a tamagotchi, one child wrote a poem: “my baby died in his sleep. i will forever weep. then his batteries went dead. now he lives in my head.”19 the concept of “alive enough” is not unique to the very young, either. by 2009, sociable robots had moved beyond children’s toys with the introduction of paro, a baby seal-like “creature” aimed at providing companionship to the elderly and touted as “the most therapeutic robot in the world. . . . the children were onto something: the elderly are taken with the robots. most are accepting and there are times when some seem to prefer a robot with simple demands to a person with more complicated ones.”20 where does it end? turkle goes on to describe nursebot, a device aimed at hospitals and long-term care facilities, which colleagues characterized as “a robot even sherry can love.” but when turkle injured herself in a fall a few months later, [i was] wheeled from one test to another on a hospital stretcher. my companions in this journey were a changing collection of male orderlies. they knew how much it hurt when they had to lift me off the gurney and onto the radiology table. they were solicitous and funny. . . . the orderly who took me to the discharge station . . . gave me a high five. the nursebot might have been capable of the logistics, but i was glad that i was there with people. . . . between human beings, simple things reach you. when it comes to care, there may be no pedestrian jobs.21 but need we librarians care about something as farfetched as nursebot? absolutely. now that ibm has proven that it can design a machine—okay, an array of machines, but something much more compact is surely coming soon—that can win at jeopardy!, is the robotic reference librarian really that much of a hurdle? take a bit of watson technology, stick it in nursebot, give it sensible shoes, and hey, i can easily imagine bibliobot, factory-standard in several guises, including perhaps donna reed (as mary, who becomes the town librarian in the alter-life of capra’s it’s a wonderful life) or shirley jones (as marian, the librarian, in the music man). i like donna reed as much as anyone, but do i really want reference assistance from her android doppelgänger? but then, for years after the introduction of the atm, i confess that i continued taking lunch hours off just so that i could deal with a “real person” at the bank, so perhaps it’s just me. the future is in the helping/service professions, indeed! and when we’re all replaced by robots (sociable and otherwise), what will we do to fill the time? personas: “not surprisingly, people report feeling let down when they move from the virtual to the real world. it is not uncommon to see people fidget with their smartphones, looking for virtual places where they might once again be more.”15 turkle speaks of the development of what she terms a “vexed relationship” between the real and the virtual: in games where we expect to play an avatar, we end up being ourselves in the most revealing ways; on social-networking sites such as facebook, we think we will be presenting ourselves, but our profile ends up as somebody else—often the fantasy of who we want to be. 
distinctions blur.16 and indeed, some completely lose sight of what is real and what is not. aboujaoude relates the story of alex, whose involvement in an online community became so consuming that he not only created for himself an online persona—“’i then meticulously painted in his hair, streak by streak, and picked “azure blue” for his eye color and “snow white” for his teeth.’”—but also left his “real” girlfriend after similarly remaking the avatar of his online girlfriend, nadia—“from her waist size to the number of freckles on her cheeks.” speaking of his former “real” girlfriend, alex said, “real had become overrated.”17 ■■ “don’t we have people for these jobs?”18 ageist disclaimer: when i grew up, robots—those that weren’t in science fiction stories or films—were things that were touted as making auto assembly lines more efficient, or putting auto workers out of jobs, depending on your perspective. while not technically a robot, the other machine that characterized “that time” was the automated teller machine (atm), which freed us from having to do our banking during traditional weekday hours, and not coincidentally resulted, again, in the loss of many entry-level jobs in financial institutions. as i recall, we were all reassured that the future lay in “helping/ service” professions, where the danger of replacement by machines was thought to be minimal. now, fast forward 30 years. the first half of turkle’s book is the history of “sociable robots” and our interactions with them. moving from the reactions of mit students to joseph weizenbaum’s eliza in the mid-1970s, she recounts her studies of children’s interactions, first with electronic toys—e.g., tamagotchi—and later, with increasingly sophisticated and “alive” robots, such as furby, aibo, and my real baby. with each generation, these devices made yet more “demands” on their owners—for care, “feeding”, etc. and with each generation, the line between “alive” and “alive 58 information technology and libraries | june 2011 to admit that we’ve seen many examples of how connectedness between people we’d otherwise consider “normal” has and is changing our manners and mores.24 many libraries and other public spaces, reacting to patron complaints about the lack of consideration shown by some users, have had to declare certain areas “cell phone free.” in the interest of getting your attention, i’ve admittedly selected some fairly extreme examples from the two books at hand. however, i think the point is that, now that the glitter of always-on, always-connected, has begun to fade a bit, there is a continuum of dysfunctional behaviors that we are beginning to notice, and it’s time to talk about how we as librarians fit into all of this. are there things we in libraries are doing that encourage some of these less desirable and even unhealthy behaviors? which takes us to a second concern raised by some of my gentle draft-readers: we’ve heard this tale before. television, and radio before it, were technologies that, when they were new, were criticized as corrupting and leading us to all sorts of negative, self-destructive, and socially undesirable behaviors. how are screens and the technology of always-connected any different? a part of me—the one that winces every time someone glibly refers to the “transformational” changes taking place around us—agrees. i was trained as a historian, to take a long view about change. and we’re talking about technologies that—in the case of the web— have been in common use for just over fifteen years. 
that said, my interest here is in seeing our profession begin a conversation about how connective technologies have influenced behavioral changes in people, and especially about how we in libraries may be unwittingly abetting those behavioral changes. television and radio were fundamentally different technologies in that they were one-way broadcast tools. and to the best of my recollection, neither has ever been widely adopted by or in libraries. yes, we’ve circulated videos and sound recordings, and even provided limited facilities for the playback of such media. but neither has ever really had an impact on the traditional core business of libraries, which is the encouragement and facilitation of the largely solitary, contemplative act of reading. connective technologies, in the form of intelligent machines and network-based communities, can be said to be antithetical to this core activity. we need to think about that, and to consider carefully the behaviors we may be encouraging. notwithstanding those critics of change in our profession who feel we move far too glacially, i would maintain that we have often been, if not at the forefront of the technology pack, then certainly among its most enthusiastic ■■ where from here? i titled this column “singularity.” for those not familiar with the literature of science fiction, turkle provides a useful explanation: this notion has migrated from science fiction to engineering. the singularity is the moment—it is mythic; you have to believe in it—when machine intelligence crosses a tipping point. past this point, say those who believe, artificial intelligence will go beyond anything we can currently conceive. . . . at the singularity, everything will become technically possible, including robots that love. indeed, at the singularity, we may merge with the robotic and achieve immortality. the singularity is technological rapture.22 i think it’s pretty clear that we’re still a fair distance from anything that one might reasonably term a singularity. but the concept is surely present, albeit in a somewhat less hubristic degree, when we speak in uncritical awe of “game-changing” or “transformational” technologies. turkle puts it this way: the triumphalist narrative of the web is the reassuring story that people want to hear and that technologists want to tell. but the heroic story is not the whole story. in virtual worlds and computer games, people are flattened into personae. on social networks, people are reduced to their profiles. on our mobile devices, we often talk to each other on the move and with little disposable time—so little, in fact, that we communicate in a new language of abbreviation in which letters stand for words and emoticons for feelings. . . . we are increasingly connected to each other but oddly more alone: in intimacy, new solitudes.23 some of my endlessly patient friends—the ones who provide both you and me with some measure of buffering from the worst of my rants in prepublication drafts of these columns—have asked questions about how all this relates to libraries, for example: how much it is legitimate to generalize to the broader population research findings from cases of obsessive compulsive disorder? the individuals studied are, of course, obsessive and compulsive, in relation to the internet and new technologies. do their behaviors not represent an extreme end of the population? a fair question. and yes, the examples i’ve provided in this column are admittedly somewhat extreme. 
but turkle and aboujaoude both point to many examples that are far more common. i think all of us would have

references and notes

1. marc truitt, "editorial: the air is full of people," information technology and libraries 30 (mar. 2011): 3–5. http://www.ala.org/ala/mgrps/divs/lita/ital/302011/3001mar/editorial_pdf.cfm (accessed apr. 25, 2011). 2. sherry turkle, alone together: why we expect more from technology and less from each other (new york: basic books, 2011); elias aboujaoude, virtually you: the dangerous powers of the e-personality (new york: norton, 2011). 3. turkle, 19. 4. ibid., 3–4. 5. quoted in ibid., 5. 6. ibid., 9–10. emphasis added. 7. quoted in aboujaoude, 99. 8. turkle, 162. emphasis in original. 9. ibid., 295. 10. turkle, 161. 11. aboujaoude, 96. 12. ibid., 98. 13. quoted in turkle, 1. 14. ibid., 12. 15. ibid. 16. ibid., 153. 17. aboujaoude, 77–78. 18. turkle, 290. 19. ibid., 34. 20. ibid., 103–4. 21. ibid., 120–21. 22. ibid., 25. 23. ibid., 18–19. 24. for a recent and typical example, see david carr, "keep your thumbs still when i'm talking to you," new york times, apr. 15, 2011, http://www.nytimes.com/2011/04/17/fashion/17text.html (accessed may 2, 2011). 25. aboujaoude, 283.

adopters. in our quest to remain "relevant" to our university or school administrations, governing boards, and (in theory, at least) our patrons, we have embraced with remarkably little reservation just about every technology trend that's come along in the past few decades. at the same time, we've been remarkably uncritical and unreflective about our role in, and the larger implications of, what we might be doing by adopting these technologies. aboujaoude, in a surprising, but i think largely correct summary comment, observes: extremely little is available, however, for the individual interested in learning more about how virtual technology has reshaped our inner universe and may be remapping our brains. as centers of learning, public libraries, schools, and universities may be disproportionately responsible for this deficiency. they outdo one another in digitalizing their holdings and speeding up their internet connections, and rightfully see those upgrades as essential to compete for students, scholars, and patrons. in exchange, however, and with few exceptions, they teach little about the unintended, less obvious, and more personal consequences of the world wide web. the irony is, at least in some libraries' case, that their very survival seems threatened by a shift that they do not seem fully engaged in trying to understand, much less educate their audiences about.25 i could hardly agree more. so, how do we answer aboujaoude's critique?

life out of balance. those who saw it will surely recall the 1982 film that juxtaposed images of stunning natural beauty with scenes of humankind's intrusion into the environment, all set to a score by philip glass. the title is a hopi word meaning "life out of balance," "crazy life," "life in turmoil," "life disintegrating," or "a state of life that calls for another way of living." while the film, as i recall, relied mainly on images of urban landscapes, mines, power lines, etc., to make its point about our impact on the world around us, it did include as well images that had a technological focus, even if the pre–pc technology exemplars shown may seem somewhat quaint thirty years later.1 the sense that one is living in unbalanced, crazy, or tumultuous times is nothing new.
indeed, i think it’s fair to say that most of us—our eyes and perspectives firmly and narrowly riveted to the here and now—tend to believe that our own specific time is one of uniquely rapid and disorienting change. but just as there have been, and will be, periods of rapid technological change, social upheaval, etc.—“been there, done that, got the t-shirt,” to recall the memorably pithy, if now slightly oh-so-aughts, slogan—so too have there been reactions to the conditions that characterized those times. a couple of very different but still pertinent examples come to mind. in the second half of the nineteenth century, a reaction against the social conservatism and shoddy, mass-produced goods of the victorian era began in england. inspired by writer and designer william morris, the arts and crafts movement emphasized simplicity, hand-made (as opposed to factory-made) objects, and social reform. by the turn of the century, the movement had migrated to the united states—memo to self: who were the leading lights of the movement in canada?—finding expression in the “mission-style” furniture of gustav stickley, the elegant art pottery of rookwood, marblehead, and teco, and the social activism of elbert hubbard’s roycrofters. fast-forward another half-century to the mid-1960s and the counter-culture of that time, itself a reaction to the racism, sexism, militarism, and social regimentation of the preceding decade. for a brief period, experimentation with “alternative lifestyles,” resistance to the vietnam war, and agitation for social, racial, and sexual change flourished. whatever one’s views about, say, the flower children, civil rights demonstrations, or the wisdom of u.s. involvement in vietnam, it’s well-nigh impossible to argue that the society that emerged from that time was not fundamentally different from the one that preceded it. that both of these “movements” ultimately were subsumed into the larger whole from which they sprang is only partly the issue. and my aim is not to romanticize either of these times, even as i confess to more than a passing interest in and sympathy for both. rather, my point is that their roots lay in a reaction to excesses—social, cultural, economic, political, even technological—that marked their times. they were the result of what might be termed “life out of balance.” in turn, their result, viewed through a longer lens, was a new balance, incorporating elements of the status quo ante and critical pieces from the movements themselves. thesis —> antithesis —> synthesis. we find ourselves in such unbalanced times again today. even without resort to over-hyped adjectives such as “transformational,” it is fair to say that we are in uncertain times. in libraries, budgets, staffing levels, and gate counts are in decline. the formats and means of information delivery are rapidly changing. debates rage over whether we are merely in the business of delivering “information” or of preserving, describing, and imparting learning and knowledge. perhaps worst of all, as our role in the society of which we are a part changes into something we cannot yet clearly see, we fear “irrelevance.” what will happen when everyone around us comes to believe that “everything [at least, everything that’s important] is on the web” and that libraries and librarians no longer have a raison d’être? for much of the past decade and a half—some among us might argue even longer—we’ve reacted by taking the rat-in-the-wheel approach. 
to remain “relevant,” we’ve adopted practically every new fad or technology that came along, endlessly spinning the wheel faster and faster, adopting the tokens of society around us in the hope that by so doing we would stanch the bleeding of money, staff, patrons, and our own morale. as i’ve observed in this space previously,2 we’ve added banks of über-connected computers, clearing away book stacks to design technology-focused creative services and collaborative spaces around them. we’ve treated books almost as smut, to be hidden away in “plain-brown-wrapper” compact storage facilities. we’ve reduced staffing, in the process outsourcing some services and automating others so that they become depersonalized, the library equivalent of a bank automated teller machine. we’ve forsaken collection building, preferring instead to rent access to resources we don’t own and to cede digitization control of those resources that we ostensibly do own. where does it end? in a former job, i used to joke that my director’s vision of the library would not be fully realized until no one but the director and the library’s system administrator were left on staff and nothing but a giant super-server remained of the library. it seemed only black humor then. today it’s just black. marc truitt marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. editorial: koyaanisqatsi 88 information technology and libraries | september 2011 and intellectual rest. they are places of the imagination. play to these strengths. those seeking to reimagine library spaces as refuges could hardly do better than to look to jasper fforde’s magical bookworld in the thursday next series for inspiration.3 stuffy academics and special libraries take note: library magic is not something restricted to children’s rooms in public libraries. walk through the glorious spaces of yale’s sterling memorial library or visit the reading room at the university of alberta’s rutherford library—known to the present generation of students as the “harry potter room,” for its evocation of the hogwarts school’s great hall—and then tell me that magic does not abound in such places. it’s present in all of our libraries, if we but have eyes to see and hearts to feel. ■■ the library was once a place for the individual. to contemplate. to do research. to know the peace and serenity of being alone. in recent years, as we’ve moved toward service models that emphasize collaboration and groups, i think we’ve lost track of those who do not visit us to socialize or work in groups. we need to reclaim them by devoting as much attention to services and spaces aimed at those seeking aloneness as we do at those seeking togetherness. the preceding list will probably brand me in the minds of some readers as anti-technology. i am not. after spending the greater part of my career working in library it, i still can be amazed at what is possible. “golly? we can do that?” but i firmly believe that library technology is not an end in itself. it is a tool, a service, whose purpose is to facilitate the delivery of knowledge, learning, and information that our collections and staff embody. nothing more. that world view may make me seem old fashioned; if such be the case, count me proudly guilty. in the end, though, i come back to the question of balance. 
there was a certain balance in and about libraries that prevailed before the most recent waves of technological change began washing over libraries a couple of decades ago. those waves disrupted but did not destroy the old balance. instead, they’ve left us out of balance, in a state of koyaanisqatsi. it’s time to find a new equilibrium, one that respects and celebrates the strengths of our traditional services and collections while incorporating the best that new technologies have to offer. it’s time to synthesize the two into something better than either. it’s time for balance. references and notes 1. wikipedia, “koyaanisqatsi,” http://en.wikipedia.org/ wiki/koyaanisqatsi (accessed july 12, 2011). ital readers in the united states can view the entire film online at http://www more importantly, where has all this wheel spinning gotten us, other than continued decline and yet more hand-wringing and anguish about irrelevance? it’s time to recognize that we are living in a state of koyaanisqatsi (life out of balance). and it’s up to us to do something new about it by creating a new balance. here are a few perhaps out-of-the-box ideas that i think could help with establishing that balance. spoiler alert: some of these may seem just a bit retro. i can’t help it: my formative library years predate the chicxulub asteroid impact. anyway, here goes: ■■ cease worrying so about “relevance.” instead, identify our niche: design services and collections that are “right” and uniquely ours, rather than pale reflections of fads that others can do better and that will eventually pass. we are not google. we are not starbucks. we know that we cannot hope to beat these sorts of outfits at their games; perhaps less obvious is that we should be extremely wary of even partnering with them. their agenda is not ours, and in any conflict between agendas, theirs is likely to prevail. we must identify something unique at which we excel. ■■ find comfort in our own skins. too many of us, i sense, are at some level uneasy with calling ourselves “librarians.” perhaps this is so because so many of us came to the profession by this or that circuitous route, that is, that we intended to be something else and wound up as librarians. get over it and wear the sensible shoes proudly. ■■ stop trying to run away from or hide books. they are, after all, perceived as our brand. is that such a bad thing? ■■ quit designing core services and tools that are based on the assumption that our patrons are all lazy imbeciles who will otherwise flee to google. the evidence suggests that those folks so inclined are already doing it anyway; why not instead aim at the segment that cares about provision of quality content and services—in collections, face-to-face instruction, and metadata? people can detect our arrogance and condescension on this point and will respond accordingly, either by being insulted and alienated or by acting as we depict them. ■■ begin thinking about how to design and deliver services that are less reliant on technology. technology has become, to borrow from marx, the opiate of libraries and librarians; we rely on it to the exclusion of nontechnological approaches, even when the latter are available to us. technology has become an end in itself, rather than a means to an end. ■■ libraries are perceived by many as safe harbors and refuges from any number of storms. they are places of rest—not only of physical rest, but of emotional editorial | truitt 89 editorial.cfm (accessed july 13, 2011). 3. 
begin with fforde's the eyre affair (2001) and proceed from there. if you are a librarian and are not quickly enchanted, you probably should consider a career change very soon! thank you, michele n! .youtube.com/watch?v=sps6c9u7ras. sadly, the rest of us must borrow or rent a copy. 2. marc truitt, "no more silver bullets, please," information technology & libraries 29, no. 2 (june 2010), http://www.ala.org/ala/mgrps/divs/lita/publications/ital/292010/2902jun/

president's message continued from page 86

we give to the organization. the lita assessment and research committee recently surveyed membership to find out why people belong to lita. this is an important step in helping lita provide programming etc. that will be most beneficial to its users, but the decision on whether to be a lita member i believe is more personal and doesn't rest on the fact that a particular drupal class is offered or that a particular speaker is a member of the top tech trends panel. it is based on the overall experience that you have as a member, the many little things. i knew in just a few minutes of attending my first lita open house 12 years ago that i had found my ala home in lita. i wish that everyone could have such a positive experience being a member of lita. if your experience is less than positive how can it be more so? what are we doing right? what could we do differently? please let me or another officer know, and/or volunteer to become more involved and create a more valuable experience for yourself and others.

preparing locally encoded electronic finding aid inventories for union environments: a publishing model for encoded archival description
plato l. smith ii
information technology and libraries | june 2008

this paper will briefly discuss encoded archival description (ead) finding aids, the workflow and process involved in encoding finding aids using the ead metadata standard, our institution's current publishing model for ead finding aids, current ead metadata enhancement, and new developments in our publishing model for ead finding aids at florida state university libraries. for brevity and within the scope of this paper, fsu libraries will be referred to as fsu, an electronic ead finding aid and/or archival finding aid will be referred to as ead or eads, and locally encoded electronic ead finding aid inventories will be referred to as eads @ fsu.

■■ what is an ead finding aid?

many scholars, researchers, and learning and scholarly communities are unaware of the existence of rare, historic, and scholarly primary source materials such as inventories, registers, indexes, archival documents, papers, and manuscripts located within institutions' collections/holdings, particularly special collections and archives. a finding aid—a document providing information on the scope, contents, and locations of collections/holdings—serves as both an information provider and guide for scholars, researchers, and learning and scholarly communities, directing them to the exact locations of rare, historic, and scholarly primary source materials within institutions' collections/holdings, particularly noncirculating and rare materials. the development of the finding aid led to the institution of an encoding and markup language that was software/hardware independent, flexible, extensible, and allowed online presentation on the world wide web.
in order to provide logical structure, content presentation, and hierarchical navigation, as well as to facilitate internet access of finding aids, the university of california–berkeley library in 1993 initiated a cooperative project that would later give rise to development of the nonproprietary sgml-based, xml-compliant, machine-readable markup language encoding finding aid standard, encoded archival description (ead) document type definition (dtd) (loc, 2006a). thus, an ead finding aid is a finding aid that has been encoded using encoded archival description and which should be validated against an ead dtd. the ead xml that produces the ead finding aid via an extensible style sheet language (xsl) should be checked for well-formed-ness via an xml validator (i.e. xml spy, oxygen, etc.) to ensure proper nesting of ead metadata elements “the ead document type definition (dtd) is a standard for encoding archival finding aids using extensible markup language (xml)” (loc, 2006c). an ead finding aid includes descriptive and generic elements along with attribute tags to provide descriptive information about the finding aid itself, such as title, compiler, compilation date, and the archival material such as collection, record group, series, or container list. florida state university libraries has been creating locally encoded electronic encoded archival description (ead) finding aids using a note tab light text editor template and locally developed xsl style sheets to generate multiple ead manifestations in html, pdf, and xml formats online for over two years. the formal ead encoding descriptions and guidelines are developed with strict adherence to the best practice guidelines for the implementation of ead version 2002 in florida institutions (fcla, 2006), manuscript processing reference manual (altman & nemmers, 2006), and ead version 2002. an ead note tab light template is used to encode findings down to the collection level and create ead xml files. the ead xml files are tranformed through xsl stylesheets to create ead finding aids for select special collections. n ead workflow, processes, and publishing model the certified archivist and staff in special collections and a graduate assistant in the digital library center encode finding aids in ead metadata standard using an ead clip and ead template library in note tab light text editor via data entry input for the various descriptive, administrative, generic elements, and attribute metadata element tags to generate ead xml files. the ead xml files are then checked for validity and well-formed-ness using xml spy 2006. currently, ead finding aids are encoded down to the folder level, but recent florida heritage project 2005–2006 grant funding has allowed selected special collections finding aids to be encoded down to the item level. currently, we use two xsl style sheets, ead2html.xsl and ead2pdf.xsl, to generate html and pdf formats, and simply display the raw xml as part of rendering ead finding aids as html, pdf, and xml and presenting these manifestations to researchers and end users. the ead2html.xsl style sheet used to generate the html versions was developed with specifications such as use of fsu seal, color, and display with input from the special collections department head. the ead2pdf.xsl style sheet used to generate pdf versions uses xsl-fo (formatting plato l. smith ii (psmithii@fsu.edu) is digital initiatives librarian at florida state university libraries, tallahassee. 
preparing locally encoded electronic finding aid inventories for union environments | smith 27 object), and was also developed with specifications for layout and design input from the special collections department head. the html versions are generated using xml spy home edition with built-in xslt, and the pdf versions are generated using apache formatting object processor (fop) software from the command line. ead finding aids, eads @ fsu, are available in html, pdf, and xml formats (see figure 1). the style sheets used, ead authoring software, and eads @ fsu original site are available via www.lib.fsu.edu/dlmc/dlc/ findingaids. n enriching ead metadata as ead standards and developments in the archival community advance, we had to begin a way of enriching our ead metadata to prepare our locally encoded ead finding aids for future union catalog searching and opac access. the first step toward enriching the metadata of our ead finding aids was to use rlg ead report card (oclc, 2008) on one of our ead finding aids. the test resulted in the display of missing required (req), mandatory (m), mandatory if applicable (ma), recommended (rec), optional (opt), and encoding analogs (relatedencoding and encodinganalog attributes) metadata elements (see figure 2). the second test involved reference online archive of california best practices guidelines (oac bpg), specifically appendix b (cdl, 2005, ¶ 2), to create a formal public identifier (fpi) for our ead finding aids and make the ead fpis describing archives content standards (dacs)–compliant. this second test resulted in the creation of our very first dacs– compliant ead formal public identifier. example: <eadid countrycode=“us” identifier=“mss2003004” mainagencycode=“ftasu” publicid=“-//florida state university::strozier library::special collections//text (us::ftasu::ftasu2003004:: bernard f. sliger collection)//en”>ftasu2003004. xml</eadid> the rlg ead report card and appendix b of oac bpg together helped us modify our ead finding aid encoding template and workflow to enrich the ead document identifier <did> metadata tag element, include missing mandatory ead metadata elements, and develop fpis for all of our ead finding aids. prior to recent new developments in the publishing model of ead finding aids at fsu libraries, the ead finding aids in our eads @ fsu inventories could not be easily found using traditional web search engines, were part of the so-called “deep web,” (prom & habing, 2002) and were “unidimensional in that they [were] based upon the assumption that there [was] an object in a library and there [was] a descriptive surrogate for that object, the cataloging record” (hensen, 1999). ead finding aids in our eads @ fsu inventories did not have a descriptive surrogate catalog record and lacked the relevant related encoding and analog metadata elements within the ead metadata with which to facilitate “metadata crosswalks”—mapping one metadata standard with another metadata standard to facilitate crosssearching. “to make the metadata in ead instance as robust as possible, and to allow for crosswalks to other encoding schemes, we mandate the inclusion of the relatedencoding and encodinganalog attributes in both the <eadheader> and <archdesc> segments” (meissner, et al., 2002). incorporating an ead quality checking tool such as rlg bpg and ead compliance such as dacs when figure 1. ead finding aids in html, pdf, and xml format figure 2. 
rlg ead report card of xml ead file 28 information technology and libraries | june 2008 authoring eads, will assist in improving ead encoding and ead finding aids publishing model. n some key issues with creating and managing ead finding aids one of the major issues with creating and managing ead finding aids is the set of rules used for describing papers, manuscripts, and archival documents. the former set of rules used for providing consistent descriptions and anglo-american cataloging rules (aacr) bibliographic catalog compliance for papers, manuscripts, and archival documents down to collection level was archives, personal papers, and manuscripts (appm), which was complied by steven l. hensen and published by the library of congress in 1983. however, the need for more description granularity down to the item level, enhanced bibliographic catalog specificity, marc and ead metadata standards implementations and metadata standards crosswalks, and inclusion of descriptors of archival material types beyond personal papers and manuscripts prompted the development of describing archives: a content standard (dacs), published in 2004 with the second edition published in 2007. “dacs [u.s. implementation of international standard for the description of archival materials and their creators] is an output-neutral set of rules for describing archives, personal papers, and manuscripts collections, and can be applied to all material types ”(pearce-moses, 2005). some international standards for describing archival materials are general international standard archival description isad(g) and international standard archival authority record for corporate bodies, persons, and families [isaar(cpf)]. other issues with creating and managing ead finding aids include (list not exhaustive): 1. online presentation of finding aids 2. exposing finding aids electronically for searching 3. provision of a search interface to search finding aids 4. online public access catalog record (marc) and link to finding aids 5. finding aids linked to digitized content of collections eads @ fsu exist in html for online presentation, pdf for printing, and xml for exporting, which allow researchers greater flexibility and options in the information-gathering and research processes and have improved the way archivists communicated guides to archival collections with researchers as opposed to paper finding aids physically housed within institutions. eads @ fsu have existed online in html, pdf, and xml formats for two years in a static html document and then moved to drupal (mysql database with php) for about one year, which improved online maintenance but not researcher functionality. however, the purchase and upgrade of a digital content management system marked a huge advancement in the development of our ead finding aids implementation and thus resolutions to issues numbers 1–3. researchers now have a single-point search interface to search eads @ fsu across all our digital collections/ institutional repository (see figure 3); the ability to search within the finding aids via full-text indexing of pdfs; the option of brief (thumbnails with ead, htm, pdf, and xml manifestation icons), table (title, creator, and identifier), and full (complete ead finding aid dc record with manifestations) views of search results, which provides different levels of exposures of ead finding aids; and the ability to save/e-mail search results. 
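stepping back to the encoding workflow described earlier (check each ead xml file for well-formedness and validity, then run it through ead2html.xsl, or ead2pdf.xsl plus apache fop for print), that validate-then-transform step can be approximated with a short script. the sketch below uses python and the lxml library; the stylesheet and file names echo the examples in this paper, but the dtd path and output name are placeholders, and this is an illustration rather than fsu's production process.

    # minimal sketch, assuming python + lxml; paths are placeholders,
    # not fsu's actual files.
    from lxml import etree

    def publish_ead(ead_path, dtd_path, xslt_path, out_path):
        # parsing raises XMLSyntaxError if the file is not well-formed,
        # standing in for the xml spy / oxygen check described above
        ead_doc = etree.parse(ead_path)

        # validate against the ead 2002 dtd
        dtd = etree.DTD(dtd_path)
        if not dtd.validate(ead_doc):
            raise ValueError(dtd.error_log.filter_from_errors())

        # apply the html style sheet; the pdf route would instead apply
        # ead2pdf.xsl and hand the resulting xsl-fo to apache fop
        transform = etree.XSLT(etree.parse(xslt_path))
        html_doc = transform(ead_doc)
        with open(out_path, "wb") as out:
            out.write(etree.tostring(html_doc, pretty_print=True))

    publish_ead("ftasu2003004.xml", "ead.dtd", "ead2html.xsl", "ftasu2003004.html")

run once per finding aid, a loop of this kind would regenerate the html manifestation for every ead xml file in an inventory; the pdf branch differs only in the stylesheet and in the final formatting step.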
future initiatives are underway to enhance eads @ fsu implementation via the creation of ead marc records through dublin core to marc metadata crosswalk, to deep link to ead finding aids via 856 field in marc records, and to begin digitizing and linking to ead finding aids archival content via digital archival object <dao> ead element. <dao> is “linking element that uses the attributes entityref or href to connect the finding aid information to electronic representations of the described materials. the <dao> and <daogrp> elements allow the content of an archival collection or record figure 3. online search gui for ead finding aids and digital collections within ir preparing locally encoded electronic finding aid inventories for union environments | smith 29 group to be incorporated in the finding aid” (loc, 2006b). we have opted to create basic dublin core records of ead finding aids based on the information in the ead finding aids descriptive summary (front matter) first and then crosswalk to marc, but are cognizant that this current workflow is subject to change in the pursuit of advancement. however, we are seeking ways to improve the ead workflow and ead marc record creation through more communication and future collaboration with the fsu libraries cataloging department. n number of finding aids and percent of eads @ fsu as of february 16, 2006, we had 700 collections with finding aids in which 220 finding aids are electronic and encoded in html (31 percent of total finding aids). from the 220 electronic finding aids, 60 are available as html, pdf, and xml finding aids (20 percent of electronic finding aids are eads @ fsu). however, we currently have 63 ead finding aids available online in html, pdf, and xml formats. n new developments in publishing eads @ fsu current eads @ fsu include the recommendations from test 1 and test 2 (rlg bpg and dacs compliance) which were discussed earlier and the digital content management system (i.e. digitool) creates a descriptive digital surrogate of the ead objects in the form of brief and basic dublin core metadata records for each ead finding aid along with multiple ead manifestations (see figure 4). we have successfully built and launched our first new digital collection, fsu special collections ead inventories, in digitool 3.0 as part of fsu libraries dlc digital repository (http://digitool3.lib.fsu.edu/r/), a relational database digital content management system (dcms). digitool has an oracle 9i relational database management system backend, searchable web-based gui, a default ead style sheet that allows full-text searching of eads, supports marc, dc, mets metadata standards, jpeg2000 (built in tools for images and thumbnails) as well as z39.50 and oai protocols which will enable resource discovery and exposing of eads @ fsu. you can visit fsu special collections ead finding aids inventories at http://digitool3.lib.fsu.edu/r/? func=collections-result&collection_id=1076. n national, international, and regional aggregation of finding aids initiatives rlg’s archivegrid (http://archivegrid.org/web/index. jsp) is an international, cross-institutional search constituting the aggregation of primary source archival materials of more than 2,500 research libraries, museums, and archives with a single-point interface to search archival collections from across research institutions. 
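returning briefly to the dublin core-to-marc crosswalk planned above, the idea can be pictured with a toy mapping. the python sketch below follows the general spirit of the library of congress dc-to-marc crosswalk (title to 245, summary to 520, an electronic location to 856 for the deep link to the finding aid); the exact tags, indicators, and subfields fsu would use are not stated in this paper, so the mapping and the sample record are illustrative assumptions only.

    # illustrative dublin core -> marc mapping; not fsu's workflow.
    DC_TO_MARC = {
        "title":       "245",  # title statement
        "creator":     "720",  # uncontrolled name
        "description": "520",  # summary note (scope and content)
        "subject":     "653",  # uncontrolled index term
        "identifier":  "856",  # electronic location: deep link to the finding aid
    }

    def crosswalk(dc_record):
        """flatten a simple dc dict into (marc tag, value) pairs."""
        fields = []
        for element, values in dc_record.items():
            tag = DC_TO_MARC.get(element)
            if tag is None:
                continue  # unmapped elements are skipped in this sketch
            for value in values:
                fields.append((tag, value))
        return sorted(fields)

    # a toy record drawn from an ead descriptive summary (values invented)
    print(crosswalk({
        "title": ["bernard f. sliger collection"],
        "creator": ["sliger, bernard f."],
        "identifier": ["http://digitool3.lib.fsu.edu/r/"],
    }))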
other international, cross-institutional searches of aggregated archival collections are: n intute: arts& humanities in the united kingdom www.intute.ac.uk/artsandhumanities/ cgi-bin/browse.pl?id=200025 (international guide to subcategories of archival materials) n archives made easy www.archivesmade easy.org (guide to archives by country) there are also some regional initiatives, which provide cross-institutional search of aggregations of finding aids: n publication of archival library and museum materials (palmm) http://palmm.fcla.edu (crossfigure 4. ead finding aids in ead (default), html, pdf, and xml manifestations 30 information technology and libraries | june 2008 institutional searches in fl fsu participates, fl) n virginia heritage: guides to manuscript and archival collections in virginia http://ead.lib .virginia.edu/vivaead/ (cross-institutional searches in virginia) n texas archival resources online www.lib.utexas. edu/taro/ (cross-institutional searches in texas) n online archive of new mexico http://elibrary .unm.edu/oanm/ (cross-institutional searches in new mexico) awareness of regional, national, and international aggregation of finding aids initiatives and engagement in regional aggregation of finding aids will enable a consistent advancement in the development and implementation of eads @ fsu. acknowledgments fsu libraries digital library center and special collections department, florida heritage project funding (fcla), chuck f. thomas (fcla), and robert mcdonald (sdsc) assisted in the development, implementation, and success of eads at fsu. references altman, b. & nemmers, j. (2006). manuscripts processing reference manual. florida state university special collections. california digital library (cdl). (2005). oac best practice guidelines for encoded archival description, appendix b. formal public identifiers for finding aids. retrieved october 6, 2006 from www.cdlib.org/inside/diglib/guidelines/bpgead/ bpgead_app.html#d0e2995. digital library center, florida state university libraries. (2006). fsu special collections ead finding aids inventories. retrieved january 5, 2007 from http://digitool3.lib.fsu.edu/ r/?func=collections-result&collection_id=1076. florida center of library automation (fcla). (2004). palmm: publication of archival library and museum materials, archival collections. retrieved january 7, 2007 from http://palmm.fcla .edu. florida center for library automation (fcla). (2006). best practice guidelines for the implementaton of ead version 2002 in florida institutions. (john nemmers, ed.). accessed april 21, 2008, at www.fcla.edu/dlini/openingarchives/new/ floridaeadguidelines.pdf fox, m. (2003). the ead cookbook — 2002 edition.chicago: the society of american archivists. retrieved october 6, 2006 from www.archivists.org/saagroups/ead/ead2002cookbook .html. hensen, s. l. (1999). nistf ii and ead: the evolution of archival description. encoded archival description: context, theory, and case studies (pp. 23–34). chicago: the society of american archivsits library of congress (loc). (2006a). development of the encoded archival description dtd. retrieved october 6, 2006 from www.loc.gov/ead/eaddev.html. library of congress (loc). (2006b). digital archival object— encoded archival description tag library—version 2002. retrieved january 8, 2007 from www.loc.gov/ead/tglib. library of congress (loc). (2006c). encoded archival description —version 2002 official site. etd dtd version 2002. retrieved april 19, 2008 from www.loc.gov/ead/ead2002a.html. 
meissner, d., kinney, g., lacy, m., nelson, n., proffitt, m., rinehart, r., ruddy, d., stockling, b., webb, m., & young, t. (2002). rlg best practices guidelines for encoded archival description (pp. 1-24). mountain view: rlg. retrieved january 5, 2007 from www.rlg.org/en/pdfs/bpg.pdf. national library of australia. (1999). use of encoded archival description (ead) for manuscript collection retrieved january 4, 2007 from www.nla.gov.au/initiatives/ead/eadintro .html. oclc. (2007). archivegrid—open the door to history. retrieved january 4, 2007 from http://archivegrid.org/web. oclc. (2008). ead report card. retrieved april 11, 2008 www.oclc.org/programs/ourwork/past/ead/reportcard .htm. pearce-moses, r. (2005). a glossary of archival and records terminology. chicago: society of american archivists. retrieved january 8, 2007 from www.archivists.org/glossary/index.asp. prom, c. j. & habing, t. g. (2002). using the open archives initiative protocols with ead . paper preserted at the international conference on digital libraries proceedings of the 2nd acm/ieee-cs joint conference on digital libraries. portland, oregan, usa, july 14-18, 2002. retrieved october 6, 2006 from http://portal.acm .org/citation.cfm?doid=544220.544255. reese, t. (2005). building lite-weight ead repositories,. paper presented in the international conference on digital libraries proceedings of the 5th acm/ieee-cs joint conference on digital libraries. new york: acm. retrieved january 5, 2007 from http://doi.acm.org/10.1145/1065385.1065498. special collections department, university of virginia. (2004). virginia heritage guides to manuscripts and archival collections in virginia. retrieved january 7, 2007 from http://ead.lib.virginia .edu/vivaead/. thomas, c., et al. (2006). best practices guidelines for the implementation of ead version 2002 in florida institutions. florida state university special collections. university of texas libraries, university of texas at austin. (unknown). texas archival resources online (taro). retrieved january 4, 2007 from www.lib.utexas.edu/taro. 
improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides
kate a. pittsley and sara memmott
kate a. pittsley (kpittsle@emich.edu) is an assistant professor and business information librarian and sara memmott (smemmott@emich.edu) is an instructor and emerging technologies librarian at eastern michigan university, ypsilanti, michigan.
information technology and libraries | september 2012

abstract

can the navigation of complex research websites be improved so that users more often find their way without intermediation or instruction? librarians at eastern michigan university discovered both anecdotally and by looking at patterns in usage statistics that some students were not recognizing navigational elements on web-based research guides, and so were not always accessing secondary pages of the guides. in this study, two types of navigation improvements were applied to separate sets of online guides. usage patterns from before and after the changes were analyzed. both sets of experimental guides showed an increase in use of secondary guide pages after the changes were applied whereas a comparison group with no navigation changes showed no significant change in usage patterns. in this case, both duplicate menu links and improvements to tab design appeared to improve independent student navigation of complex research sites.

introduction

anecdotal evidence led librarians at eastern michigan university (emu) to investigate possible navigation issues related to the libguides platform. anecdotal evidence included (1) incidents of emu librarians not immediately recognizing the tab navigation when looking at implementations of the libguides platform on other university sites during the initial purchase evaluation, (2) multiple encounters with students at the reference desk who did not notice the tab navigation, and (3) a specific case involving use of a guide with an online course. the case investigation started with a complaint from a professor that graduate students in her online course were suddenly using far fewer resources than students in the same course during previous semesters. the students in that semester's section relied heavily—often solely—on one database, while most students during previous semesters had used multiple research sources.
this course has always relied on a research guide prepared by the liaison librarian; the selection of resources provided had not changed significantly between the semesters, and the assignment had not changed. furthermore, the same professor taught the course and did not alter her recommendation to the students to use the resources on the research guide. what had changed between the semesters was the platform used to present research guides. the library had just migrated from a simple one-page format for research guides to the more flexible multipage format offered by the libguides platform. only a few resources were listed on the first libguides page of the guide used for the course. only one of these resources was a subscription database, and that database was the one that current students were using to the exclusion of many other useful sources. after speaking with the professor, the liaison librarian also worked one-on-one with a student in the course. the student confirmed that she had not noticed the tab navigation and so was unaware of the numerous resources offered on subsequent pages. the professor then sent a message to all students in the course explaining the tab navigation. subsequently the professor reported that students in the course used a much wider range of sources in assignments.

statistical evidence of the problem

a look at statistics on guide use for fall 2010 showed that on almost all guides the first pages of guides were the most heavily used. as the usual entry point, it wasn't surprising that the first pages would receive the most use; however, on many multipage guides, the difference in use between the first page and all secondary pages was dramatic. that users missed the tab navigation and so did not realize additional guide pages existed seemed like a possible explanation for this usage pattern. librarians felt strongly that most users should be able to navigate guides without direct instruction in their use, and they were concerned by the evidence that indicated problems with the guide navigation. was there something that could be done to improve independent student navigation in libguides? two types of design changes to navigation were considered. to test the changes, each navigation change was applied to separate sets of guides. usage patterns were then compared for those guides before and after changes were made. the investigators also looked at usage patterns over the same period for a comparison group to which no navigation changes had been made.

literature review

navigation in libguides and pathfinders

the authors reviewed numerous articles related to libguides or pathfinders generally, but found few that mention navigation issues. they then turned to studies of website navigation in general. in an early article on the transition to web-based library guides, cooper noted that "computer screens do not allow viewers to visualize as much information simultaneously as do print guides, and consequently the need for uncomplicated, easily understood design is even greater."1 four university libraries' usability studies of the libguides platform specifically address navigation issues.
university of michigan librarians dubicki et al. found that “tabs are recognizable and meaningful—users understood the function of the tabs.”2 the michigan study then focused on the use of meaningful language for tab labels. however, at the latrobe university library (australia), corbin and karasmanis found a consistent pattern of students not recognizing the navigation tabs, and so recommended providing additional navigation links elsewhere on the page.3 at the university of washington, hungerford et al. found students did not immediately recognize the tab navigation: information technology and libraries | se ptember 2012 54 during testing it was observed that users frequently did not notice a guide’s tabs right away as a navigational option. users’ eyes were drawn to the top middle of the page first and would focus on content there, especially if there was actionable content, such as links to other pages or resources.4 the solution at the university of washington was to require that all guides have a main page navigation area (libguides “box”) with a menu of links to the tabbed pages. after a usability study, mit libraries also recommended use of a duplicate navigation menu on the first page, stating in mit libraries staff guidelines for creating libguides to “make sure to link to the tabs somewhere on the main page” as “users don’t always see the tabs, so providing alternate navigation helps.”5 navigation palmer mentions navigation as one of the factors most significantly associated with website success as measured by user satisfaction, likelihood to use a site again, and use frequency.6 however, effective navigation may be difficult to achieve. nielsen found in numerous studies that “users look straight at the content and ignore the navigation areas when they scan a new page.”7 in a presentation on the top ten mistakes in web design, human–computer interaction scholar tullis included “awkward or confusing navigation.”8 the following review of the literature on website navigation design is limited to studies of navigation models that use browsing via menus, tabs, and menu bars. the navigation problem seen in libguides is far from unique. usability studies for other information-rich websites demonstrate similar problems with users not recognizing navigation tabs or menu bars similar to those used in libguides. in 2001, mcgillis and toms investigated the usability of a library website with a horizontal navigation bar at the top of the page, a design similar to the single row of libguides tabs. this study found that users either did not see the navigation bar or did not realize it could be clicked.9 in multiple usability studies, u.s. census bureau researchers found similar problems with navigation bars on government websites. in 2009, olmsted-hawala et al. reported that study participants did not use the top-navigation bar on the census bureau’s business and industry website.10 the next year, chen et al. again reported problems with top-navigation bar use on the governments division public website, explaining that the “top-navigation bar blends into the header, leading participants to skip over the tabs and move directly to the main content. this is a recurring issue the usability laboratory has identified with many web sites.”11 one possible explanation for user neglect of tabs and navigation bars may be a phenomenon termed “banner blindness.” as early as 1999, benway provided in-depth analysis of this problem. 
in his thesis, he uses the word “banner” not just for banner ads, but also for banners that consist of horizontal graphic buttons similar to the libguides tab design. benway’s experiments show that an attempt to make important items visually prominent may have the opposite effect— that “the visual distinctiveness may actually make important items seem unimportant.” benway follows with two recommendations: (1) that “any method that is created to make something stand out should be carefully tested with users who are specifically looking for that content to ensure that it does not cause banner blindness,” and (2) that “any item visually distinguished on a page should be duplicated within a collection of links or other navigation areas of the page. that way, if searchers ignore the large salient item, they can still find what they need through basic navigation.”12 improving independent student navigation of complex educational websites | pittsley and memmott 55 in 2005, tullis cited multiple studies that showed that users found information faster or more effectively by using a simple table of contents than by using other navigation forms, including tabbased navigation.13 yet in 2011, nicolson et al. found that “participants rarely used table of contents; and often appeared not to notice them.”14 yelinek et al. pointed to a practical problem in using content menus on libguides pages: since libguides pages can be copied or mirrored on other guides, guide authors must be cognizant that such menus could cause problems with incorrect or confusing navigational links on copied or mirrored pages.15 success can also depend on the location of navigational elements, although researchers disagree on effects of location. in addition, user expectations of where to look for navigation elements may change over time along with changes in web conventions. in 2001, bernard studied user expectations as to where common web functions would be located on the screen layout. he found that “most participants expected the links to web pages within a website to be almost exclusively located in the upper-left side of a web page, which conforms to the current convention of placing links on [the] left side.”16 in 2004, pratt et al. found that users were equally effective using horizontal or vertical navigation menus, but when given a choice more users chose to use vertical navigation.17 also in 2004, mccarthy et al. 
performed an eye-tracking study, which showed faster search times when sites conformed to the expected left navigation menu and a user bias toward searching the middle of the screen; but it also found that the initial effect of menu position diminished with repeated use of a site.18 nonetheless, jones found that by 2006 most corporate webpages used “horizontally aligned primary navigation using buttons, tabs, or other formatted text.”19 in 2008, cooke found that users looked equally at left, top, and center menus; however, when “a visually prominent navigation menu populated the center of the web page, participants were more likely to direct their search in this location.”20 wroblewski describes how tab navigation was first popularized by amazon.21 burrell and sodan investigated user preferences for six navigation styles and found that users clearly preferred tab navigation “because it is most easily understood and learned.”22 in the often-cited web design manual don’t make me think, krug also recommends tabs: “tabs are one of the very few cases where using a physical metaphor in a user interface actually works.”23 krug recommends that tabs be carefully designed to resemble file folder tabs. they should “create the visual illusion that the active tab is in front of the other tabs . . . the active tab needs to be a different color or contrasting shade [than the other tabs] and it has to physically connect with the space below it. this is what makes the active tab ‘pop’ to the front.”24 an often-cited u.s. department of health and human services manual on research-based web design addresses principles of good tab design, stating that tabs should be located near the top of the page and should “look like clickable versions of real-world tabs. real-world tabs are those that resemble the ones found in a file drawer.”25 nielsen provides similar guidelines for tab design, which include that the selected tab should be highlighted, the current tab should be connected to the content area (just like a physical tab), and that one should use only one row of tabs.26 more recently, cronin highlighted examples of good tab design that effectively use elements such as rounded tab corners, space between tabs, and an obvious design for the active tab that visually connects the tab to the area beneath it.27 christie also provides best practices for tab design that include consistent use of only one row of tabs, use of a prominent color for the active tab and a single information technology and libraries | se ptember 2012 56 background color for unselected tabs, changing the font color on the active tab, and use of rounded corners to enhance the file-folder-tab metaphor.28 two articles mention that the complexity of a site can be a factor in navigation success. mccarthy et al. found that search times are significantly affected by site complexity and recommended finding ways to balance the provision of numerous user options with simplifying the site so that users can find their way.29 little specifically suggests reducing the amount of extraneous information on libguides pages in her article, which applies cognitive load theory to use of library research guides.30 in sum, effective navigation is difficult to achieve. however, navigation design can be improved by considering the purpose of the site, user expectations, common conventions, best practices, the possibility that intuitive ideas for design may not perform as expected (e.g., banner blindness), the site’s complexity, and more. 
research question and method

could design changes improve independent student use of libguides tab navigation? the literature reviewed above suggested two likely design changes to test: adding additional navigation links in the body of the page and improving the tab design. testing these design changes on selected guides would allow the emu library to assess the impact before implementing changes on all library research guides. for this experiment, each type of navigation change was applied to separate subsets of guides; a subset of similar guides was selected as a comparison group; and usage patterns were analyzed for similar periods before and after changes were made. navigation design changes were made to fourteen subject guides related to business. the business subject guides were divided into two experimental groups of seven guides. in group a, a table of contents box with navigation links was added to the front page of each guide, and in group b, the navigation tabs were altered in appearance. no navigation changes were made to comparison group c. class-specific guides were excluded from the experiment, as in many cases the business librarian would have instructed students in the use of tabs on class guides. changes were made at the beginning of the winter 2011 semester so that an entire semester's data could be collected and compared to the previous semester's usage patterns.

the design for group a was similar to the university of washington implementation of a "what's in the guide" box on guide homepages that repeated the tab navigation links.31 for guides in group a, a table of contents box was placed on the guide homepages. it contained a simple list of links to the secondary pages of the guides, using the same labels as on the navigation tabs. the table of contents box used a larger font size than other body text and was given an outline color that contrasted with the outline color used on other boxes and matched the navigation tab color to create visual cues that this box had a different function from the other boxes on the page (navigation). the table of contents box was placed alongside other content on the guide homepages so users could still see the most relevant resources immediately. figure 1 shows a guide containing a table of contents box.

figure 1. group a guide with content menu box labeled "guide sections"

the design change for group b focused on the navigation tabs. libguides tabs exhibit some of the properties of good tab design, such as allowing for rounded corners and contrasting colors for the selected tabs. other aspects are not ideal, such as the line that separates the active tab from the page body.32 in the emu library's initial libguides implementation, the option for tabs with rounded corners was used to resemble the design of manila file folders and increase the association with the file-folder metaphor. possibilities for further design adaptation on the experimental guides were somewhat limited because these changes needed to be applied to the tabs of just a selected set of guides. the investigators theorized that increasing the height of the tabs might make them more closely resemble paper file folder tabs. increasing the height would also increase the area of the tabs, and the larger size might also make the tabs more noticeable. this option was simple to implement on the guides in group b by adding html break tags, <br />, to the tab text.
taller tabs also provided more room for text on the tabs. tabs in libguides will expand in width to fit the text label used, and if the tabs on a guide require more space on the page, they will be displayed in multiple rows. multiple rows of tabs are visually confusing and break the tabs metaphor, decreasing their usefulness for navigation.33 the emu library’s best practices for research guides already encouraged limiting tabs to one row. adding height to tabs allowed for clearer text labels on some guides without expanding the tab display beyond a single row. figure 2 shows a guide containing the altered taller tabs. information technology and libraries | se ptember 2012 58 figure 2. group b guide with tabs redesigned to look more like file folder tabs while variations in content and usage of library guides did not allow for a true control group, other social science subject guides were selected as a comparison group. social science subject guides were excluded from the comparison group if they had very low guide usage during the fall 2010 semester (fewer than thirty uses), or if they had fewer than three tabs, making them structurally dissimilar to the business guides. this left a group of sixteen comparison guides. no changes were made to the navigation design of these guides during the test period. the business guides—which the authors had permission to experiment with—tend to be longer and have more pages than other guides. on average, the experimental guides had more pages per guide than the comparison guides; guides in groups a and b averaged nine pages per guide, and comparison guides averaged five pages per guide. guides with more pages will tend to have a higher percentage of hits on secondary pages because there are more pages available to users. however, the authors intended to measure the change in usage patterns with each guide measured against itself in different periods, and the number of pages in each guide did not change from semester to semester. data collection and results libguides provides monthly usage statistics that include the total hits on each guide and the number of hits on each page of a guide. use of secondary pages of the guides was measured by calculating the proportion of hits to each guide that occurred on secondary pages. data for the fall 2010 semester (september through december 2010) was used to measure usage patterns before navigation changes were made to the experimental guides. data for the winter 2011 semester (january through april 2011) was used to measure usage patterns after navigation changes were made. each would represent a full semester’s use at similar enrollment levels with many of the same courses and assignments. usage patterns for the comparison guides were also examined for these periods. improving independent student navigation of complex educational websites | pittsley and memmott 59 as shown in figures 3 and 4, in both group a and group b, the percentage of hits on secondary pages increased in five guides and decreased in two guides. figure 3. group a: change in secondary page usage with content menus added for winter 2011 figure 4. group b: change in secondary page usage with new tab design for winter 2011 both groups of experimental guides showed an increase in use of secondary guide pages after the design changes were made. the median usage score was calculated for each group. group a, with the added menu links, showed an increase of 10.3 points in the median percentage of guide hits on secondary pages. 
group b, with redesigned tabs, showed an increase of 10.4 points in the median percentage of guide hits on secondary pages. within the comparison guides, the proportion of hits on secondary pages did not change significantly from fall 2010 to winter 2011. table 1 shows the median percentage of guide hits on secondary pages before and after navigation design changes.

table 1. median percentage of guide hits on secondary pages

              group a:           group b:          group c:
              menu links added   tabs redesigned   comparison group
fall 2010     39.1%              50.5%             37.7%
winter 2011   49.4%              60.9%             37.4%

the box plot in figure 5 graphically illustrates the range of the usage of secondary pages in each group of guides and the changes from fall 2010 to winter 2011, showing the minimum, maximum, and median scores, as well as the range of each quartile.

figure 5. distribution of percentage of guide hits on secondary pages. this figure demonstrates the change in usage pattern for groups a and b and the lack of change in usage pattern for comparison group c.

averages for the percentage change in secondary tab use were also computed for the combined experimental groups and the comparison group.

table 2. average change in secondary tab use from fall 2010 to winter 2011, comparing all experimental guides (groups a & b) with all comparison (group c) guides.

                n    mean      std. deviation   std. error mean
experimental    14   .07871    .097840          .026149
comparison      16   -.02550   .145977          .036494

when comparing all experimental guides and all comparison guides, the change in use of secondary pages was found to be statistically significant. the average change in use of secondary pages for all experimental guides (groups a and b) was .07871, and the average for all comparison guides (group c) was -.02550. a t test showed that this difference was significant at the p < .05 level (p = .032).

study limitations

in some (possibly many) cases, the first page of the guide provides all necessary sources and advice for an assignment. we measured actual use of secondary pages, but were unable to measure recognition of navigation elements where the student did not use the secondary pages because they had no need for additional resources. because it wasn't possible to control use of the guides during the periods studied, it is possible that factors other than the design changes contributed to the pattern of hits. though subject guides rather than class guides were used to limit the influence of instruction in the use of guides, it wasn't possible to determine with certainty if other faculty members instructed a significant number of students in the use of particular guides during the periods examined. the comparison group was slightly dissimilar in that they had fewer pages than the experimental guides; however, the number of pages on a guide did not correlate with a change in percentage of hits on secondary pages from one semester to the next.

application of findings

when presented with the study results, the full library faculty at emu expressed interest in using both design changes across all library research guides.
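the quantities reported above, the per-guide share of hits on secondary pages, the group medians, and the t test summarized in table 2, are simple to compute. the sketch below is a minimal illustration using invented hit counts rather than the study's data, and it uses scipy's independent-samples t test as one way to make the group comparison; it is not the authors' actual workflow. with the study's fourteen experimental and sixteen comparison values, the same call would reproduce the means and the p value of .032 reported above.

```python
from statistics import median
from scipy import stats

# hypothetical libguides statistics: for each guide, (total hits,
# hits on secondary pages) summed over a semester; the real study
# had 14 experimental and 16 comparison guides
fall = {"marketing": (480, 180), "accounting": (350, 175), "finance": (220, 70)}
winter = {"marketing": (510, 260), "accounting": (340, 200), "finance": (250, 95)}

def secondary_share(total_hits, secondary_hits):
    """proportion of a guide's hits that landed on secondary pages."""
    return secondary_hits / total_hits

changes = []
for guide in fall:
    before = secondary_share(*fall[guide])
    after = secondary_share(*winter[guide])
    changes.append(after - before)
    print(f"{guide}: {before:.1%} -> {after:.1%}")

print(f"median share, fall 2010: {median(secondary_share(*v) for v in fall.values()):.1%}")

# table 2 compares the mean per-guide change for experimental guides
# against comparison guides with an independent-samples t test
comparison_changes = [0.01, -0.03, 0.02, -0.04, 0.00, -0.02]  # invented group c values
t_stat, p_value = stats.ttest_ind(changes, comparison_changes)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```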
the change to tab design—which is easiest to implement—has been made to all subject guides. some librarians also chose to add content menus to selected guides. since the complexity of research guides is also a factor in successful navigation,35 a recent libguides enhancement was used to move elements from the header area to the bottom of the guides. the elements moved out of the header included the date of last update, guide url, print option, and rss updates. the investigators hypothesize that the reduced complexity of the header may help in recognizing the tab navigation. although convinced that the experimental changes made a difference to independent student navigation in research guides, the authors hope to find further ways to strengthen independent navigation. vendor design changes to enhance the tab metaphor, such as creating a more visible connection between the active tab and page, might also improve navigation.36 information technology and libraries | se ptember 2012 62 conclusion designing navigation for complex sites, such as library research guides, is likely to be an ongoing challenge. this study suggests that thoughtful design changes can improve navigation. in this case, both duplicate menu links and improvements to tab design improved independent student navigation of complex research sites. references and notes 1. eric a. cooper, “library guides on the web: traditional tenets and internal issues,” computers in libraries 17, no. 9 (1997): 52. 2. barbara dubicki beaton et al., libguides usability task force guerrilla testing (ann arbor: university of michigan, 2009), http://www.lib.umich.edu/content/libguides-guerillatesting. 3. jenny corbin and sharon karasmanis, health sciences information literacy modules usability testing report (bundoora, australia: la trobe university library, 2009), http://arrow.latrobe.edu.au:8080/vital/access/handleresolver/1959.9/80852. 4. rachel hungerford, lauren ray, christine tawatao, and jennifer ward, libguides usability testing: customizing a product to work for your users (seattle: university of washington libraries, 2010), 6, http://hdl.handle.net/1773/17101. 5. mit libraries, research guides (libguides) usability results (cambridge, ma: mit libraries, 2008), http://libstaff.mit.edu/usability/2008/libguides-summary.html; mit libraries, guidelines for staff libguides (cambridge, ma: mit libraries, 2011), http://libguides.mit.edu/staff-guidelines. 6. jonathan w. palmer, “web site usability, design, and performance metrics,” information systems research 13, no. 2 (2002): 151-67, doi:10.1287/isre.13.2.151.88. 7. jakob nielsen, “is navigation useful?,” jakob nielsen’s alertbox, http://www.useit.com/alertbox/20000109.html. 8. thomas s. tullis, “web-based presentation of information: the top ten mistakes and why they are mistakes,” in hci international 2005 conference: 11th international conference on human-computer interaction, 22–27, july 2005, caesars palace, las vegas, nevada usa (mahwah nj: lawrence erlbaum associates, 2005), doi:10.1.1.107.9769. 9. louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62, no. 4 (2001): 355–67, http://crl.acrl.org/content/62/4/355.short. 10. erica olmsted-hawala et al., usability evaluation of the business and industry web site, survey methodology #2009–15, (washington, dc: statistical research division, u.s. census bureau, 2009), http://www.census.gov/srd/papers/pdf/ssm2009–15.pdf. 11. 
jennifer chen et al., usability evaluation of the governments division public web site, survey methodology #2010–02 (washington, dc: u.s. census bureau, usability laboratory, 2010), 19, http://www.census.gov/srd/papers/pdf/ssm2010-02.pdf. 12. jan panero benway, "banner blindness: what searching users notice and do not notice on the world wide web" (phd diss., rice university, 1999), 75, http://hdl.handle.net/1911/19353. 13. tullis, "web-based presentation of information." 14. donald j. nicolson et al., "combining concurrent and sequential methods to examine the usability and readability of websites with information about medicines," journal of mixed methods research 5, no. 1 (2011): 25–51, doi:10.1177/1558689810385694. 15. kathryn yelinek et al., "using libguides for an information literacy tutorial 2.0," college & research libraries news 71, no. 7 (july 2010): 352–55, http://crln.acrl.org/content/71/7/352.short. 16. michael l. bernard, "developing schemas for the location of common web objects," proceedings of the human factors and ergonomics society annual meeting 45, no. 15 (october 1, 2001): 1162, doi:10.1177/154193120104501502. 17. jean a. pratt, robert j. mills, and yongseog kim, "the effects of navigational orientation and user experience on user task efficiency and frustration levels," journal of computer information systems 44, no. 4 (2004): 93–100. 18. john d. mccarthy, m. angela sasse, and jens riegelsberger, "could i have the menu please? an eye tracking study of design conventions," people and computers 17, no. 1 (2004): 401–14. 19. scott l. jones, "evolution of corporate homepages: 1996 to 2006," journal of business communication 44, no. 3 (2007): 236–57, doi:10.1177/0021943607301348. 20. lynne cooke, "how do users search web home pages?" technical communication 55, no. 2 (2008): 185. 21. luke wroblewski, "the history of amazon's tab navigation," lukew ideation + design, may 7, 2007, http://www.lukew.com/ff/entry.asp?178. after addition of numerous product categories made tabs impractical, amazon now relies on a left-side navigation menu. 22. a. burrell and a. c. sodan, "web interface navigation design: which style of navigation-link menus do users prefer?" in 22nd international conference on data engineering workshops, april 2006. proceedings (washington, d.c.: ieee computer society, 2006), 42–42, doi:10.1109/icdew.2006.163. 23. steve krug, don't make me think! a common sense approach to web usability, 2nd ed. (berkeley: new riders, 2006), 79. 24. ibid., 82. 25. u.s. department of health and human services, "navigation," in research-based web design & usability guidelines (washington, dc: u.s. department of health and human services, 2006), 8, http://www.usability.gov/pdfs/chapter7.pdf. 26. jakob nielsen, "tabs, used right," jakob nielsen's alertbox, http://www.useit.com/alertbox/tabs.html. 27.
matt cronin, "showcase of well-designed tabbed navigation," smashing magazine, april 6, 2009, http://www.smashingmagazine.com/2009/04/06/showcase-of-well-designed-tabbed-navigation. 28. alex christie, "usability best practice, part 1—tab navigation," tamar, january 13, 2010, http://blog.tamar.com/2010/01/usability-best-practice-part-1-tab-navigation. 29. mccarthy, sasse, and riegelsberger, "could i have the menu please?" 30. jennifer j. little, "cognitive load theory and library research guides," internet reference services quarterly 15, no. 1 (2010): 52–63, doi:10.1080/10875300903530199. 31. hungerford et al., libguides usability testing. 32. christie, "usability best practice"; nielsen, "tabs, used right"; krug, don't make me think; cronin, "showcase of well-designed tabbed navigation." 33. christie, "usability best practice"; nielsen, "tabs, used right." 34. eva d. vaughan, statistics: tools for understanding data in the behavioral sciences (upper saddle river, nj: prentice hall, 1998), 66. 35. mccarthy, sasse, and riegelsberger, "could i have the menu please?" 36. springshare, the libguides vendor, has been amenable to customer feedback and open to suggestions for platform improvements.

tutorial

playing tag in the dark: diagnosing slowness in library response time

margaret brown-sica

in this article the author explores how the systems department at the auraria library (which serves more than thirty thousand primarily commuting students at the university of colorado–denver, the metropolitan state college of denver, and the community college of denver) diagnosed and analyzed slow response time when querying proprietary databases. issues examined include vendor issues, proxy issues, library network hardware, and bandwidth and network traffic.

"why is everything so slow?" this is the question that library systems departments often have the most trouble answering. it is also easy to dismiss because it is often the fault of factors beyond the control of library staff. what usually prompts these questions are the experiences of the reference librarians. when these librarians are trying to help students at the reference desk, it is very frustrating when databases seem to respond to queries slowly, files take forever to load onto the computer screen, and all the while the line in front of the desk continues to grow. or the library gets calls from students using databases and the catalog from their homes who complain that searching library resources takes too long, and that they are getting frustrated and using google instead. this question is so painful because libraries spend so much of their shrinking budgets on high-quality information in the form of expensive proprietary databases, and it is all wasted if users have trouble using them. in this case the problem seemed to be how slow the process of searching for information and downloading documents from databases was. for lack of a better term, the auraria library called this the "response time" problem.
this article will discuss the various ways the systems (technology) department of the auraria library, which serves the university of colorado–denver, metropolitan state college of denver, and the community college of denver, tried to identify problems and improve database response time. the systems department defined “response time” as the time it took for a person to send a query from a computer at home or in the library to a proprietary information database and receive a response back, or how long it took to load a selected fulltext article from a database. when a customer sets out to use a database in the library, the query to the database could be slowed down by many different factors. the first is the proxy, in our case innovative interfaces’ inc. web access management (iii wam), a product that authenticates the user via the iii api (application program interface) product. to do this the query travels over network hardware, switches, and wires to the iii server and back again. then the query goes to the database’s server, which may be almost anywhere in the world. hardware problems at the database vendor’s end can affect this transfer. in the case of auraria library this transfer can be influenced by traffic on the library’s network, the university’s network, and any other place in between. this could also be hampered by the amount of memory in the computer where the query originates, by the amount of tasks being performed by that computer, etc. the bandwidth of the network and its speed can also have an effect. basically, the bottlenecks needed to be found and fixed. bottlenecks are described by webopedia as “the delay in transmission of data through the circuits of a computer’s microprocessor or over a tcp/ip network. the delay typically occurs when a system’s bandwidth cannot support the amount of information being relayed at the speed it is being processed. there are, however, many factors that can create a bottleneck in a system.”1 literature review there is not a lot on database response slowness in library literature, probably because the issue overlaps with computer science and really is not one problem but a possibility of one of several problems. the issue is figuring out where the problem lies. gerhan and mutula examined technical reasons for network slowness, performing bandwidth testing at a library in botswana and one in the united states using the same computer, and giving several suggestions for testing, fixing technical problems, and issues to examine. gerhan and mutula concluded that bandwidth and insufficient network infrastructure were the main culprits in their situation. they studied both bandwidth and bandwidth “squeeze.” looking for the bandwidth “squeeze” means looking along the internet’s “journey of many stages through routers and exchange points, each successively farther removed from the user.”2 bandwidth bottlenecks could occur at any one or more of those stages in the query’s transmission. the following four sections parse that lengthy pathway and examine how each may contribute to delays. badue et al. in their article “basic issues on the processing of web queries,” described web margaret brown-sica (margaret.brown -sica@ucdenver.edu) is head of technology and distance education support, auraria library, serving the university of colorado–denver, metropolitan state college of denver, and the community college of denver. 
30 information technology and libraries | december 200830 information technology and libraries | december 2008 queries, load balancing, and how they function.3 bertot and mcclure’s “assessing sufficiency and quality of bandwidth for public libraries” is based on data collected as part of the 2006 public libraries and the internet study and provides a very straightforward approach for checking specific areas for problems.4 it outlines why basic data such as bandwidth readings may not give the complete picture. it also gives a nice outline of factors involved such as local settings and parameters, ultimate connectivity path, application resource needs, and protocol priority. azuma, okamoto, hasegawa, and masayuki’s “design, implementation and evaluation of resource management system for internet servers” was very helpful in understanding the role and function of proxy servers and problems they can present.5 vendor issues this is a very thorny topic because it is out of the library’s control, and also because the library has so many databases. the systems department asked the reference staff to send reports of problems listing the type of activity attempted, time and dates, the names of the database, the problem and any error messages encountered. a few that seemed to be the slowest were selected for special examination. one vendor worked extensively with the library and in the end it was believed that there were problems at their end in load balancing, which eventually seemed to be fixed. that company was in the middle of a merger and that may have also been an issue. we also noted that a database that uses very large image files, artstor, was hard to use because it was so slow. this company sent the library an application that simulated the databases’ use and was supposed to test to see if bandwidth at auraria library was sufficient for that database. according to the test, it was. databases that consistently were perceived as the slowest were those that had the largest documents and pictures, such as those that used primarily pdfs and visual material. this, with the results of the testing, pointed to a problem independent of vendor issues. bandwidth and network traffic the systems department decided to do bandwidth testing on the library’s public and staff computers after reading gerhan and mutula’s article about the university of botswana. the general perception is that bandwidth is often the primary problem in network slowness, as well as the problems with databases that use larger files. several of the computers were tested in several successive days during what is usually the busiest time for the network, between noon and 2 p.m. the results were good, averaging about 3000 kilobytes per second (kbps). for this test we used the cnet bandwidth meter, which downloads an image to your computer, measures the time of the download, and compares it to the maximum speeds offered by other internet service providers.6 there are several bandwidth meters available on the internet. when the network administrator checked the switches for network traffic, they showed low traffic, almost always less than 20 percent of capacity. this was confusing: if the problem was neither with the bandwidth nor the vendors, what was causing the slow network performance? one of the university network administrators was consulted to see if any factor in their sphere could be having an effect on our network. we knew that the main university network had implemented a bandwidth shaper to regulate bandwidth. 
“these devices limit bandwidth . . . by greedy applications, guarantee minimum throughput for users, groups or protocols, and better utilize wide-area connections by smoothing out bursty traffic.”7 it was thought that perhaps this might be incorrectly prioritizing some of the library’s traffic. this was a dead end, though—the network administrators had stopped using the device. if the bandwidth was good and the traffic was manageable, then the problem appeared to not be at the library. however, according to bertot and mcclure,

the bandwidth question is complex because typically an arbitrary number describes the number of kbps used to define “broadband.” . . . such arbitrary definitions to describe bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term “high speed” for connections of 200kbps in at least one direction. there are three problematic issues with this definition:

1. it specifies unidirectional bandwidth, meaning that a 200kbps download, but a much slower upload (e.g., 56kbps) would fit this definition;
2. regardless of direction, bandwidth of 200kbps is neither high speed nor does it allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly.
3. the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high-use multiple-workstation public-access context.8

proxy issues

auraria library uses the iii wam proxy server product. there were several things that pointed to the proxy being an issue. one was that the systems department had been experimenting with invoking the proxy in the library building in order to collect more accurate statistics and found that complaints about speed seemed to have started around the same time as this experiment. but if the bandwidth was not showing inadequacy and the traffic was light, why was this happening? the answer is better explained by azuma et al.:

needless to say, busy web servers must have many simultaneous http sessions, and server throughput is degraded when effective resource management is not considered, even with large network capacity. web proxy servers must also accommodate a large number of tcp connections, since they are usually prepared by isps (internet service providers) for their customers. furthermore, proxy servers must handle both upward tcp connections (from proxy server to web servers) and downward tcp connections (from client hosts to proxy server). hence, the proxy server becomes a likely spot for bottlenecks to occur during web document transfers, even when the bandwidth of the network and web server performance are adequate.9

testing was done from on campus and off campus, with and without using the proxy server. the results showed that the connection was faster without the proxy. when testing was done from the health sciences library at the university of colorado with the same type of server and proxy, the response time was much faster.
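the on-campus and off-campus tests amount to timing the same request with and without the proxy in the path. the sketch below shows one way such a measurement might be scripted; the urls are placeholders, and because iii wam is a rewriting proxy the proxied request is made here by prefixing a hypothetical proxy host to the database url rather than by configuring an http proxy.

```python
import time
import urllib.request

# placeholder urls: the same database request, reached directly and
# through the library's rewriting proxy (hypothetical hosts)
DIRECT_URL = "http://vendor.example.com/search?q=test"
PROXIED_URL = "http://wam.library.example.edu/login?url=" + DIRECT_URL

def average_fetch_time(url, tries=5):
    """average seconds to fetch a url, a rough stand-in for response time."""
    timings = []
    body = b""
    for _ in range(tries):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            body = response.read()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings), len(body)

direct_time, size = average_fetch_time(DIRECT_URL)
proxied_time, _ = average_fetch_time(PROXIED_URL)

# a crude throughput estimate in kilobits per second, in the spirit of
# the bandwidth meter used above
print(f"direct:  {direct_time:.2f} s  (~{size * 8 / direct_time / 1000:.0f} kbps)")
print(f"proxied: {proxied_time:.2f} s")
```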
the difference between auraria library and the other library is that the community that auraria library serves (the community college of denver, metropolitan state college, and the university of colorado–denver) has a much larger user population who overwhelmingly use databases from home, therefore taxing the proxy server. the other library belonged to a smaller campus, but the hardware was the same. the proxy was immediately dropped for on-campus users, and that resulted in some response-time improvements. a conference call was set up with the proxy vendor to determine if improvements in response time might be attained by changing from a proxy server to ldap (lightweight directory access protocol) authentication. the response given was that although there might be other benefits, improved response time was not one of them.

library network hardware

it was evident that the biggest bottleneck was the proxy, so the systems department decided to take a closer look at iii's hardware. the switch that regulated traffic between the network and the server that houses our integrated library system, part of which is the proxy server, was discovered to have been set at "half-duplex." half-duplex refers to the transmission of data in just one direction at a time. for example, a walkie-talkie is a half-duplex device because only one party can talk at a time. in contrast, a telephone is a full-duplex device because both parties can talk simultaneously. duplex modes often are used in reference to network data transmissions. some modems contain a switch that lets you select between half-duplex and full-duplex modes. the correct choice depends on which program you are using to transmit data through the modem.10 when this setting was changed to full duplex, response time improved. there was also concern that this switch had not been functioning as well as it could. the switch was replaced, and this also improved response time. in addition, the old server purchased through iii was a generic server that had specifications based on the demands of the ils software and didn't take into consideration the amount of traffic going to the proxy server. auraria library, which serves a campus of more than thirty thousand full-time equivalent students, is a library with one of the largest commuter student populations in the country. a new server had been scheduled to be purchased in the near future, so a call was made to the ils vendor to talk about our hypothesis and requirements. the vendor agreed that the library should change the specification on the new server to make sure it served the library's unique demands. a server will be purchased with increased memory and a second processor to hopefully keep these problems from happening again in the next few years. also, the cabling between the switch and the server was changed to better facilitate heavy traffic.

conclusion

although it is sometimes a daunting task to try to discover where problems occur in the library's database response time because there are so many contributing factors and because librarians often do not feel that they have enough technical knowledge to analyze such problems, there are certain things that can be examined and analyzed. it is important to look at how each library is unique and may be inadequately served by current bandwidth and hardware configurations. it is also important not to be intimidated by computer science literature and to trust patterns of reported problems.
the auraria library systems department was fortunate to also be able to compare problems with colleagues at other libraries and test in those libraries, which revealed issues that were unique and therefore most likely due to a problem at the library end. it is important to keep learning about how your system functions and to try to diagnose the problem by slowly looking at one piece at a time. though no one ever seems to be completely satisfied with the speed of their network, the employees of auraria library, especially those who work with the public, have been pleased with the increased speed they are experiencing when using proprietary databases. having improved on the response-time issue, other problems that are not caused by the proxy hardware have been illuminated, such as browser configuration, which may be hampering certain databases—something that had been attributed to the network.

references

1. webopedia, s.v. "bottleneck," www.webopedia.com/term/b/bottleneck.html (accessed oct. 8, 2008). 2. david r. gerhan and stephen mutula, "bandwidth bottlenecks at the university of botswana," library hi tech 23, no. 1 (2005): 102–17. 3. claudine badue et al., "basic issues on the processing of web queries," sigir forum; 2005 proceedings (new york: association for computing machinery, 2005): 577–78. 4. john carlo bertot and charles r. mcclure, "assessing sufficiency and quality of bandwidth for public libraries," information technology and libraries 26, no. 1 (mar. 2007): 14–22. 5. kazuhiro azuma, takuya okamoto, go hasegawa, and murata masayuki, "design, implementation and evaluation of resource management system for internet servers," journal of high speed networks 14, no. 4 (2005): 301–16. 6. "cnet bandwidth meter," http://reviews.cnet.com/internet-speed-test (accessed oct. 8, 2008). 7. michael j. demaria, "warding off wan gridlock," network computing nov. 15, 2002, www.networkcomputing.com/showitem.jhtml?docid=1324f3 (accessed oct. 8, 2008). 8. bertot and mcclure, "assessing sufficiency and quality of bandwidth for public libraries," 14. 9. azuma, okamoto, hasegawa, and masayuki, "design, implementation and evaluation of resource management system for internet servers," 302. 10. webopedia, s.v. "half-duplex," www.webopedia.com/term/h/half_duplex.html (accessed oct. 8, 2008).

graphs in libraries: a primer

james e. powell, daniel alcazar, matthew hopkins, robert olendorf, tamara m. mcmahon, amber wu, and linn collins

answer routine searches is compelling. how, we wonder, can we bring a bit of google to the library world? google harvests vast quantities of data from the web. this data aggregation is obviously complex. how does google make sense of it all so that it can offer searchers the most relevant results? answering this question requires understanding what google is doing, which requires a working knowledge of graph theory. we can then apply these lessons to library systems, make sense of voluminous bibliometric data, and give researchers tools that are as effective for them as google is for web surfers. just as web surfers want to know which sites are most relevant, researchers want to know which of the relevant results are the most reliable, the most influential, and of the highest quality. can quantitative metrics help answer these qualitative questions?
the more deeply libraries and librarians can mine relationships between articles and authors and between subjects and institutions, the more reliable are their metrics. suppose some librarians want to compare the relative influence of two authors. they might first look at the authors’ respective number of publications. but are those papers of equally high quality? they might next count all citations to those papers. but are the citing articles of high quality? deeper still, they might assign different weights to each citing article using its own number of citations. at each step, whether realizing it or not, they are applying graph theory. with deeper knowledge of this subject, librarians can embrace complexity and harness it for research tools of powerful simplicity. ■■ pagerank and the global giant graph indexing the web is a massive challenge. the internet is a network of computer hardware resources so complex that no one really knows exactly how it is structured. in fact, researchers have resorted to conducting experiments to discern the structure and size of the internet and its potential vulnerability to attacks. representations of the data collected by these experiments are based on network whenever librarians use semantic web services and standards for representing data, they also generate graphs, whether they intend to or not. graphs are a new data model for libraries and librarians, and they present new opportunities for library services. in this paper we introduce graph theory and explore its real and potential applications in the context of digital libraries. part 1 describes basic concepts in graph theory and how graph theory has been applied by information retrieval systems such as google. part 2 discusses practical applications of graph theory in digital library environments. some of the applications have been prototyped at the los alamos national laboratory research library, others have been described in peer-reviewed journals, and still others are speculative in nature. the paper is intended to serve as a high-level tutorial to graphs in libraries. part 1. introduction to graph theory complexity surrounds us, and in the twenty-first century, our attempts at organization and structure sometimes lead to more complexity. in layman’s terms, complexity refers to problems and objects that have many distinct but interrelated issues or components. there also is an interdisciplinary field referred to as “complex systems,” which investigates emergent properties, such as collective intelligence.1 emergent properties are an embodiment of the old adage “the whole is greater than the sum of its parts.” these are behaviors or characteristics of a system “where the parts don’t give a real sense of the whole.”2 libraries reside at the nexus of these two definitions: they are creators and caretakers of complex data sets (metadata), and they are the source of explicit records of the complex and evolving intellectual and social relationships underlying the evolution of knowledge. digital libraries are complex systems. patrons visit libraries hoping to find some order in complexity or to discover a path to new knowledge. instead, they become the integration point for a complex set of systems as they juggle resource discovery by interacting with multiple systems, either overtly or via federated search, and by contending with multiple vendor sites to retrieve articles of interest. 
contrast this with google’s simple approach to content discovery: a user enters a few terms in a single box, and google returns a large list of results spanning the internet, placing the most relevant results at the top of this list. no one would suggest using google for all research needs, but its simplicity and recognized ability to james e. powell (jepowell@lanl.gov) is research technologist, daniel a. alcazar (dalcazar@lanl.gov) is professional librarian, matthew hopkins (mfhop@lanl.gov) is library professional, tamara m. mcmahon (tmcmahon@lanl.gov) is library technology professional, amber wu (amber.ponichtera@gmail.com) is graduate research assistant, and linn collins (linn@lanl .gov) is technical project manager, los alamos national laboratory, los alamos, new mexico. robert olendorf (olendorf@unm .edu) is data librarian for science and engineering, university of new mexico libraries, albuquerque, new mexico. 158 information technology and libraries | december 2011 influence a person has in a business context. if we want to analyze this aspect of the network, then it makes sense to consider the fact that some relationships are more influential than others. for example, a relationship with the president of the company is more significant than a relationship with a coworker, since it is a safe assumption that a direct relationship with the company leader will increase influence. so we assign weights to the edges based on who the edge connects to. google does something similar. all the webpages they track have centrality values, but google’s weighting algorithm takes into account the relative importance of the pages that connect to a given resource. the weighting algorithm bases importance on the number of links pointing to a page, not the page’s internal content, which makes it difficult for website authors to manipulate the system and climb the results ladder. so if a given webpage science, also known as graph theory. this is not the same network that ties all the computers on the internet together, though at first glance it is a similar idea. network science is a technique for representing the relationships between components of a complex system.3 it uses graphs, which consist of nodes and edges, to represent these sets of relationships. generally speaking, a node is an actor or object of some sort, and an edge is a relationship or property. in the case of the web, universal resource locators (urls) can be thought of as nodes, and connections between pages can be thought of as links or edges. this may sound familiar because the semantic web is largely built around the idea of graphs, where each pair of nodes with a connecting edge is referred to as a triple. in fact, tim berners-lee refers to the semantic web as the global giant graph—a place where statements of facts about things are published online and distinctly addressable, just as webpages are today.4 the semantic web differs from the traditional web in its use of ontologies that place meaning on the links and in the expectation that nodes are represented by universal resource identifiers (uris) or by literal (string, integer, etc.) values, as shown in figure 1, where the links in a web graph have meaning in the semantic web. semantic web data are a form of graph, so graph analysis techniques can be applied to semantic graphs, just as they are applied to representations of other complex systems, such as social networks, cellular metabolic networks, and ecological food webs. 
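the difference between a raw link count and the google-style weighted importance described above is easy to see with a small graph library. the sketch below uses the networkx package (one possible tool, not the one used by google or by the authors of this column) on an invented miniature web graph: in-degree gives the simple link count, and networkx's pagerank function gives an eigenvector-style weighted score.

```python
import networkx as nx

# an invented miniature web graph: an edge (a, b) means page a links to page b
web = nx.DiGraph([
    ("hub", "page1"), ("hub", "page2"), ("hub", "page3"),
    ("page1", "page2"), ("page3", "page2"),
    ("page4", "hub"), ("page5", "hub"), ("page6", "hub"),
])

in_links = dict(web.in_degree())   # simple degree: count of incoming links
importance = nx.pagerank(web)      # weighted, eigenvector-style score

for page in sorted(web.nodes, key=importance.get, reverse=True):
    print(f"{page}: {in_links[page]} in-links, pagerank {importance[page]:.3f}")
```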
herein lies the secret behind google’s success: google builds a graph representation of the data it collects. these graphs play a large role in determining what users see in response to any given query. google uses a graph analysis technique called eigenvector centrality.5 in essence, google calculates the relative importance of a given webpage as a function of the importance of the pages that point to it. a simpler centrality measure is called degree centrality. degree centrality is simply a count of the number of edges a given node has. in a social network, degree centrality might tell you how many friends a given person has. if a person has edges representing friendship that connect him to seventeen other nodes, representing other people in the network, then his degree value is seventeen (see figure 2). if a person with seventeen friends has more friendship edges than any other person in the network, then he has the highest degree centrality for that network. eigenvector centrality expands on degree centrality. consider a social network that represents the amount of figure 1. a traditional web graph is compared to a corresponding semantic web graph. notice that replacing traditional web links with semantic links facilitates a deeper understanding of how the resources are related. graphs in libraries: a primer | powell et al. 159 networks evidence for the evolution of metabolic processes.7 chemists have used networks to model reactions in a step-wise fashion by “editing” graphs representing models of molecules and their reactivity,8 and they also have used graphs to better comprehend phase transition states, such as the freezing of water or the emergence of superconductivity when a material is cooled.9 economists have used graphs to model market trades and the effects of globalization.10 infectious disease specialists have used networks to model the spread of disease and to evaluate prospective vaccination plans.11 sociologists have modeled the complex interactions of people in communities.12 and in libraries, computer scientists have explored citation networks and coauthorship networks,13 and they have developed maps of science that integrate scientific papers, their topics, the journals in which they appear, and comsumers’ usage patterns to provide a new view of the pursuit of science.14 network science can make complexity more comprehensible by representing a subset of actors and relationships in a complex system as a graph. these has only two edges, it may still rank higher than a more connected page if one of the pages that links to it has a large number of pages pointing to it (see figure 3). this weighted degree centrality measure is eigenvector centrality, and a higher eigenvector centrality score causes a page to show up closer to the top of a google results set. the user never sees a graph, but this graphbased approach to exploring a complex system (the web), works quite well for routine web searches. ■■ graph theory graph theory, also known as network science, has evolved tremendously in the last decade. for example, information scientists have discovered hubs in the web that connect large numbers of pages, and if removed, disconnect large portions of the network.6 biologists have begun to explore cellular processes, such as metabolism, by modeling these processes as networks and have even found in these figure 2. friendship network figure 3. node 2 ranks higher than node 1 because node 3, which connects to node 2, has more incoming links than node 1. 
node 3 is deemed more important than node 9, which has no incoming links. 160 information technology and libraries | december 2011 as subgraphs, e.g., in the case where a person has two friends who are also mutual friends. small world networks have numerous highly interconnected subgroups called clusters. these may be distributed throughout the network in a regular fashion, with a few random connections that connect the otherwise disconnected clusters. these random links have the effect of greatly reducing the path length between any two nodes and explain the oft-cited six degrees of separation that connect all people to one another. in social networks, agency is often described as the mechanism by which graphs can then be explored visually and mathematically. graphs can be used to represent systems as they are, to extract subsets of these systems, or to discover wholly artificial collections of relationships between components of a speculative system. data also can be represented as graphs when they consist of “measurements that are either of or from a system conceptualized as a network.”15 in short, graphs offer a continuum of techniques for comprehending complexity and are suitable either for a layman with casual interest in a topic or a serious researcher ferreting out discrete details. at the core of network science is the graph. as stated earlier, a graph is a collection of nodes and the edges that connect some of those nodes, together representing a set of actors and relationships in a type of system. relationships can be unidirectional (e.g., in a social network, when the information flows from one person to another) or bidirectional (e.g., when the information flows back and forth between two individuals). relationships also can vary in significance and can be assigned a weight—for example, a person’s relationship to his or her supervisor might be weighted more heavily than a person’s relationship to his or her peers. a graph can consist of a single type of node (for subjects) and a single type of edge connecting those nodes (for predicates). these are called unipartite graphs. from the standpoint of graph theory, these are the easiest types of graphs to work with. graphs that represent two relationships (bipartite) or more are typically reduced to unipartite graphs in the process of exploring them because the vast majority of techniques for evaluating graphs were developed for graphs that address a single relationship between a set of nodes. ■■ global properties of graphs there are other aspects of graphs to consider, sometimes referred to as “global graph properties.”16 there are two basic classes of networks: homogeneous networks and inhomogeneous networks.17 these graphs exhibit characteristics that may not be comprehensible by close examination (e.g., by examining degree centrality, node clustering, or paths within a graph)18 but may be apparent, depending on the size and the way in which the graph is rendered, merely by looking at a visualization of the graph. in homogeneous graphs, nodes have no significant difference between their number of connections. examples include random graphs, complete graphs, and small world networks. in random graphs there is an equal probability that any two nodes will be connected (see figure 4), while in complete graphs (see figure 5) all nodes are connected with one another. random graphs are often used as tools to evaluate networks that describe real systems. complete graphs might occur in social networks figure 4. a random graph. 
any given node has an equal probability of being linked to any other node figure 5. a complete graph. all nodes are connected to all other nodes graphs in libraries: a primer | powell et al. 161 building blocks of networks.20 a three-node feedback motif is a set of nodes where the edges between them form a triangle and the edges are directional. in other words, node a is connected to (and might convey some information to) node b; node b, in turn, has the same relationship with node c; and node c is connected to (and conveys information back to) node a. in digital libraries, for example, if similar papers exhibit the same pattern of connectivity to a group of subject or keyword categories, motifs will make it possible to readily identify the topical overlap between them. collections of nodes that have a high degree of connectivity with each other are called clusters.21 in many complex systems, clusters are formed by preferential attachment. a group of highly clustered nodes that have low connectivity to the larger graph is known as a clique. while there are other aspects of graphs that can be explored, these four—node centrality measures, paths between nodes, motifs, and clustering—are accessible to most users and are significant in graphs representing bibliographic metadata and textual content. this will become clearer in the examples that follow. ■■ quantitative evaluation of graphs returning now to centrality measures, two of particular interest in digital libraries are degree centrality and betweenness centrality (or flow centrality). an interesting aspect of graphs is that, regardless of the data being represented, centrality measures and clustering characteristics often reveal important clues about the system that the data these random links get established. agency refers to the idea that multiple, often unpredictable actions on the part of individuals in a network result in unanticipated connections between people. examples of such actions are hobbies, past work experience, meeting someone new while on a trip to another country—pretty much anything that takes a person outside his or her normal social circles. in the case of inhomogeneous graphs, not all nodes are created equal. one type, scale-free networks, is common in a variety of systems ranging from biological to technological (see figure 6. these exhibit a structure in which a few nodes play a central role in connecting many others. these hubs form as a result of preferential attachment, known colloquially as “the rich get richer.” researchers became aware of scale-free networks as a result of analysis of the web when it was in its infancy. scale-free networks have been documented in biology, social networks, and technological networks. as a result, they are quite important in the field of information science. small world and scale-free networks are typical of complex systems that occur in nature or evolve because of emergent dynamic processes, in which a system self-organizes over time. small world networks provide fast, reliable communication between nodes, while scale-free networks are more fault tolerant, making them ideal for systems such as living cells, which are frequently challenged by the external environment.19 ■■ local properties of graphs below the ten-thousand-foot system-level view of networks, graphs can be scrutinized more closely using many other techniques. 
we will now consider four broad categories of local characteristics that describe networks and how they are, or could be, applied in digital libraries: node centrality measures, paths between nodes, motifs, and clustering. centrality measures make it possible to determine the importance of a given node in a network. degree centrality, in its simplest form, is simply a count of the number of edges connected to any given node in a network: a node with high-degree centrality has many connections to other nodes compared to a typical node in the graph. paths make it possible to explore the connections between nodes. an author who is two degrees removed from another author—in other words, the friend of a friend of a friend—has a path length of 2. researchers are often interested specifically in the shortest path between a given pair of nodes. many other types of paths can be explored depending on the type of network, but in libraries, paths that describe the flow of ideas or communication between people are most likely to be useful. motifs are the fundamental recurring structures that make up the larger graph, and they often are called the figure 6. example of a scale-free coauthorship network. a few nodes have many links, and most nodes have few or a single link to another node 162 information technology and libraries | december 2011 path that connects a node through other nodes back to itself. within graph visualization tools, the placement of nodes can vary from one layout to another. what matters is not the pictorial representation (though this can be useful), but the underlying relationships between nodes (the topology). along with clustering, paths help differentiate motifs, which are considered to be building blocks of some types of complex networks. since bibliographic metadata represents communication in one form or another, it is often most common to apply social network theory to graphs. but it is also possible to apply various centrality measures to graphs that are not social and to use these to discover significant nodes within those graphs. in part 2 we consider various unipartite and bipartite graphs that might be especially useful for examining digital library metadata. part 2. graph theory applications in digital libraries library systems, by virtue of the content they contain, are complex systems. fielded searches, faceted searches, and full-text searches all allow users to access aspects of the complex system. fielded searches leverage the explicit structure that has been encoded into the metadata describing the resources that users are ultimately trying to find (articles, books, etc). full-text searches enable users to explore in a more free-form manner, subject of course to the availability of searchable text. often, full-text search means the user is searching titles, abstracts, and other content that summarizes a resource, rather than the actual full text of articles and books. even if the user is searching the full content, there are relationships and aspects of the content that are not readily discernible through a full-text search. furthermore, there is not one single, comprehensive digital library—many library systems live in the deep web, that is, they are databases that are not indexed by search engines like google, and so users must describes, whether it’s coauthorship relationships or protein interactions in the cell of a living organism. 
often the clusters or nodes that exhibit a higher score in some centrality calculation are significant in some meaningful way compared to other nodes. recall that degree centrality refers to how many edges a given node has. degree centrality can vary significantly in strength depending on the relationships that are represented in the graph. consider a graph of citations between papers. while it may be obvious to humans that the mostly highly cited papers will have the highest-degree centrality, computers have no idea what this means. it is still up to humans to lend a degree of comprehensibility to the raw calculation: in other words, to understand that a paper with high-degree centrality is an important paper, at least among the papers the graph represents. betweenness centrality exposes how integral a given node is to a network. basically, without getting into the mathematics, it measures how often a node falls on the shortest path between other nodes. thus, nodes with high betweenness centrality do not necessarily have a lot of edges, but they bridge disparate clusters. in an informational network, the nodes with high betweenness centrality are crucial to information flow, social connections, or collaborations. hubs are examples of nodes with high betweenness centrality. the removal of a hub causes large portions of a network to become detached. in figure 7, the node labeled “folkner, w.m.” exhibits high betweenness centrality, since it connects two clusters together. a cluster coefficient expresses whether a given node in a network is a member of a tightly interlinked collection of nodes, or clique. the cluster coefficient of an entire graph reveals the overall tendency for clustering in a graph, with higher cluster coefficients typical of small world graphs. in other types of graphs, clusters sometimes manifest as homophily; that is, nodes of a given type are highly interconnected with one another and have few connections with nodes of other types. in social networks, this is sometimes referred to as the “birds of a feather” effect. in a more current reference, the effect was explored as a function of the likelihood that someone would “unfriend” an acquaintance on the social networking site facebook.22 in some networks (such as the internet), clusters are connected by hubs, while in others, the hub is the primary connecting node of other nodes. paths refer to the edges that connect nodes. the simplest case of a path is an edge that connects two nodes directly. path analysis addresses the set of edges that connect two nodes that are not, themselves, directly connected. the shortest path, as its name implies, refers to the route that uses the least number of edges to connect from node a to node b and measures the number of edges, not the linear distance. walks and paths refer to a list of nodes between two nodes, with walks allowing repeat visits to nodes, and paths not allowing them. cycles refer to a figure 7. paths in a coauthorship network graphs in libraries: a primer | powell et al. 163 coauthorship (collaboration) networks coauthorship (collaboration) networks are typically small world networks in which crossand interdisciplinary work provides the random links that connect various clusters (see figure 8). these graphs can be explored to determine which researchers are having the most influence in a given field; influence is a function of frequency of authorship. a prime example is the collaboration network graph for paul erdős, a highly productive mathematician. 
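The bridging behavior described here is easy to reproduce on a toy graph. The sketch below (again assuming Python/networkx; the node labels are hypothetical) joins two tight clusters through a single bridge node and shows that the bridge scores highest on betweenness centrality even though it has only two edges, while the tightly clustered nodes score highest on the cluster coefficient.

```python
import networkx as nx

# Two tight clusters joined by a single "bridge" author (hypothetical labels).
G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"),      # cluster 1
                  ("d", "e"), ("d", "f"), ("e", "f"),      # cluster 2
                  ("c", "bridge"), ("bridge", "d")])       # bridge between them

# Betweenness centrality: the bridge node lies on every shortest path
# between the two clusters, so it scores highest despite having few edges.
print(nx.betweenness_centrality(G))

# Cluster coefficient per node, and the average for the whole graph.
print(nx.clustering(G))
print(nx.average_clustering(G))
```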
his popularity and influence in academia have led to the creation of the erdős number, which is “defined as indicating the topological distance in the graph depicting the co-authorship relations.”23 liu et al. proposed a node analysis measure that they called authorrank, which establishes weighted directed edges between authors. the author’s authorrank value is the sum of the weighted edges connected to that author.24 these networks also can be used to explore how an idea spreads and what opportunities may exist for future collaborations, as well as many other existing and potential relationships. citation graphs citation graphs more strongly resemble scale-free networks, in which early papers in a given field tend to accumulate more links. such hub papers can be cited hundreds or even thousands of times, while most papers are cited far less often or not at all. many researchers have explored citation graphs, though the person often credited with first noting the network characteristics of citation patterns was derek j. de solla price in 1965.25 more recently, mark newman introduced the concept of what he calls “first mover advantage” to describe the preferential attachment observed in citation networks.26 search each individually. but if more of these systems adopted semantic web standards, they could be explored as graphs, and relationships between different databases would be easier to discern and represent to the user. many libraries have tried to emulate google by incorporating federated search engines with a single search box as an interface. this copies the form of google’s search engine but not its underlying power. to do that, libraries must enhance full-text searches by drawing on relationships. a full-text search will (hopefully) find relevant papers on a given topic, but a researcher often wants to find the best papers on that topic. to meet that need, libraries must harness the information contained in relationships; otherwise each paper is stuck in a vacuum. cited references are one way to connect papers. for researchers and librarians alike, this is a familiar metric for assessing a paper’s relative importance. the web of science and scopus are two databases that perform this function. looked at another way, citation counts are nothing more than degree centrality applied to a simple graph in which papers are nodes and references are edges (see the sketch below). thus, in the framework of graph theory, citation analysis is just a small sliver of a world of possible relationships, many of which are unexplored. the following examples outline use case scenarios in which graph techniques are or could be applied to library data, such as bibliographic metadata, to help users find relationships and conduct research. ■■ informational graphs intrinsic to digital library systems there are multiple relationships represented within and between metadata contained in library systems that can be represented as graphs and explored using graph techniques. some of these, such as citation networks, are among the best-studied informational networks. citation networks are valued because the data describing them is readily accessible and because scientists studying classes of networks have used them as surrogates for exploring scale-free networks. they are often evaluated as static networks (i.e., a snapshot in time), but some also have dynamic characteristics (e.g., they change and grow over time, or they allow information-flow analysis).
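As a concrete reading of the observation that citation counts are simply degree centrality on a paper graph, the following sketch (Python and networkx assumed, as above; the paper identifiers are hypothetical) builds a small directed citation graph and reads citation counts off as in-degree.

```python
import networkx as nx

# Directed citation graph: an edge (a, b) means paper a cites paper b.
# Paper identifiers are hypothetical.
cites = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p4", "p2"), ("p5", "p3")]
G = nx.DiGraph(cites)

# Citation count is just the in-degree of each paper.
print(dict(G.in_degree()))            # {'p1': 3, 'p2': 1, 'p3': 1, 'p4': 0, 'p5': 0}

# The same idea, normalized, via in-degree centrality.
print(nx.in_degree_centrality(G))
```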
techniques such as pagerank can be used to evaluate information when the importance of a linking resource is as important as the number of links to a resource. multirelational networks can be developed to explore dynamic processes in research fields by using library data to provide the basic topological framework for some of the explorations. figure 8. a coauthorship network 164 information technology and libraries | december 2011 network with three types of nodes: one to represent individual pieces of debris, a second to represent collections of debris that are the original object that the debris is a fragment of, and a third to represent conjunction events (near misses) between objects. another example of graphs being used as tools is the case of developing vaccination strategies to curtail the spread of an infectious disease.30 in this case, networks have been used to determine that one of the best strategies for curtailing the transmission of a disease is to identify and vaccinate hubs, rather than to conduct mass vaccination campaigns. in libraries, graphs as tools could be used to help researchers identify collaboration opportunities, to disambiguate author identities and aggregate related materials, to allow library staff to evaluate the academic contribution of a group of researchers (bibliometrics), and to explore geospatial and temporal aspects of information, including changes in research focus over time. graphs for author name disambiguation author name disambiguation is a long-standing problem in libraries. many resources have been devoted to manual and automatic name authority control, yet the problem remains unsolved. projects such as oclc viaf and efforts to establish unique author identifiers will no doubt improve the situation, but many problems remain.31 meanwhile, we have experimented with an approach to author name matching by generating multirelational graphs. authors subject–author (expertise) graphs graphs that connect authors by subject areas can vary because of the granularity of subject headings (see figure 9). high-level subject headings tend to function as hubs, but more useful relationships are revealed by specific subject headings and author-provided keywords. the map of science merges publications and citations with actual end user usage patterns captured in library systems and deals, in part, with categories of scientific research.27 it clusters publications and visualizes them “as a journal network that outlines the relationships between various scientific domains.” implicit in this a model is the relationship of authors to subject areas. institution–topic and nation–topic (expertise) graphs from a commercial or geopolitical perspective, graphs that represent institutional or national expertise can reveal valuable information for scientists, elected officials, and investors, particularly in networks that represent the change in a given organization or region’s contributions to a field over time. metadata for scientific papers typically includes enough information to generate nodes and edges describing this. the resulting graph can reveal unexpected details, such as national or institutional efforts to nurture expertise in a given field, and the results of those efforts. the visualization of this data may take the form of icons that vary in shape and size depending on various aspects of nodes in the institution-topic network. 
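A minimal sketch of the pagerank idea mentioned above (Python/networkx assumed, hypothetical paper identifiers): because pagerank weights a citation by the importance of the citing paper, a paper cited by other well-cited papers can outrank one that merely has more raw citations.

```python
import networkx as nx

# Small citation graph (edge a -> b means a cites b); identifiers are hypothetical.
G = nx.DiGraph([("p2", "p1"), ("p3", "p1"), ("p4", "p3"), ("p5", "p3"), ("p6", "p3")])

# PageRank weights a citation by the importance of the citing paper,
# so p1 (cited by the well-cited p3) can outrank p3 despite fewer raw citations.
ranks = nx.pagerank(G, alpha=0.85)
for paper, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(paper, round(score, 3))
```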
these visual representations can then be overlaid onto a map, with various visual aspects of the icons also affected by centrality measures applied to a given institution’s contributions.28 ■■ graphs as tools graph representations can be used as tools to explore a variety of complex systems. even systems that do not initially appear to manifest networks of relationships can often be better understood when some aspect of the system is represented as a graph. this approach requires thinking about what aspects of information needs, discovery, or consumption might be represented or evaluated using networks. two interesting examples from other fields will illustrate the point. a 2009 paper in acta astronautica proposed that techniques to reduce the amount of space junk in orbit around the earth could be evaluated using graph theory techniques.29 the authors propose a dynamic multirelational figure 9. a subject–author graph for stephen hawking graphs in libraries: a primer | powell et al. 165 computation over time because it is typically so important to understanding data. allen’s temporal intervals address machine reasoning over disparate means of recording the temporal aspects of events.33 another temporal computing concept that has applicability to graphs is from the memento project, which makes it possible for users to view prior versions of webpages.34 entities in the memento ontology can become predicates in triples, which in turn can become edges in graphs. using graphs, time can be represented as a relationship between objects or as a distinct object within a graph. nodes that connect through a temporal node may overlap, coincide, or co-occur. nodes that cluster around time represent something important about the objects. genomic-document and proteindocument networks many people hoped that mapping the human genome would result in countless medical advances, but the process whereby genes manifest themselves in living organisms turned out to be much more complex—there wasn’t just a simple mapping between genes and organism traits, there were other processes controlled by genes representing additional layers of complexity scientists had not anticipated. today biologists apply network science to these processes to reveal the missing pieces of this puzzle.35 just as the process itself is complex, the information needs of these researchers benefit from a more sophisticated approach. biologists often need to find papers that reference a given gene or protein sequence. and so, representing these relationships (e.g., article–gene) as graphs has the added benefit of making the digital library research data compatible with the methods that biologists already use to document what they know about these processes. although this is a specialized type of graph, a similar approach might be valuable to researchers in a number of scientific disciplines, including materials science, astrophysics, and environmental sciences. graphs of omission one of the less obvious capabilities of network science is to make predictions about complex systems by looking for missing nodes in graphs.36 this has many applications: for example, identifying a hub in the metabolic processes of bacteria can yield new targets for antibiotics, but it is vital to know that interrupting the enzyme that serves as that hub will effectively kill the organism. 
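One simple way to approximate this search for missing links is a neighborhood-overlap score. The sketch below (Python/networkx assumed) ranks currently unlinked node pairs by their Jaccard coefficient; high-scoring pairs are candidate "missing" links worth a closer look. This is a deliberately simpler heuristic than the hierarchical model in the work cited above, offered only to make the idea concrete.

```python
import networkx as nx

# A small collaboration graph (hypothetical authors).
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")])

# Score currently unlinked pairs by neighborhood overlap (Jaccard coefficient).
# High-scoring pairs are candidate missing links worth investigating.
candidates = sorted(nx.jaccard_coefficient(G), key=lambda t: t[2], reverse=True)
for u, v, score in candidates[:5]:
    print(u, v, round(score, 2))
```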
making predictions about the evolution of research by identifying areas for cross-disciplinary collaboration or areas where little work has been done—enabling a researcher to leverage are the primary nodes of interest, but relationships such as topic areas, titles, dates, and even soundex representations of names also are represented. as one would expect, phonetically similar names cluster around particular soundex representations. shared coauthorship patterns and shared topic areas can reveal that two different names are for the same author as, for example, when a person’s name changes after marriage (see figure 10). graphs for title or citation deduplication string edit distance involves counting the number of changes that would need to be made to one string to convert it to another, and it is one of the most common approaches to deduplicating titles, citations, and author names. multirelational graphs, in which titles are linked to authors, publication dates, and subjects, result in subgraphs that appear virtually identical when two title variants are represented. centrality measures can be applied to unipartite subgraphs of these networks to home in on areas where data duplication may exist. temporal-topic graphs for analyzing the evolution of knowledge over time a particularly active area of research in graph theory is the representation of dynamical systems as networks. a dynamical system is described as a complex system that changes over time.32 computer scientists have developed various strategies and technologies to cope with figure 10. two authors with similar names linked by subject nodes 166 information technology and libraries | december 2011 basis for an on-the-fly search expansion tool. a querysuggestion tool might look at user-entered terms and determine that some are hubs, then suggest related terms from nodes that connect to those hub nodes. remember, graphs need not be visible to be useful! global subject resolution using dbpedia although dbpedia appears to lag behind wikipedia in terms of completeness and scrutiny by domain experts, it offers one mechanism for unifying user-provided tags, author keywords, and library-assigned subject headings with a graph of known facts about a topic. links into and out of dbpedia’s graphs on a given topic would enable serendipitous knowledge discovery through browsing these semantic graphs. viaf linked author data oclc’s effort to convert tens of millions of identity records into graphs describing various attributes of authors promises to enhance exploration of digital library content on the author dimension.42 these authority records contain a wealth of information, linking name variations, basic genealogical data such as birth and death dates, associations with institutions, subject areas, and titles published by authors. although some rough edges need to be smoothed (one of the authors of this paper discovered that his own authorship data was linked with another author of the same name), iterative refinement of this data as it is actually used may enable crowd-sourced the first-mover advantage and thus advance his or her career—is a valuable service that libraries are well positioned to provide (see figure 11). machine-supplied suggestions offer another type of prediction. for example, providing the prompt “did you mean john smith and climate change?” can leverage real or predicted relationships between author and subject (see figure 12). graphs, in turn, can be used to create tools that will simplify an author–subject search. 
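The string edit distance used for title and citation deduplication can be written as a small self-contained function; the sketch below is one such implementation in Python (the article does not specify an implementation, and the title strings are illustrative). A small distance between two title variants flags a candidate duplicate for review.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Two title variants that likely describe the same work (illustrative strings).
t1 = "graphs in libraries: a primer"
t2 = "graphs in libraries - a primer"
print(edit_distance(t1, t2))  # a small distance suggests a candidate duplicate
```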
viral concept detection phase transition typically refers to a process in thermodynamics that describes the point at which a material changes from one state of matter to another (e.g., liquid to solid). phase transition also applies to the dispersal of a new idea. interestingly enough, graphs representing matter at the point of phase transition, and graphs representing the spread of a fad in a social network, exhibit the same recognizable pattern of change: suddenly there are links between many more nodes, there’s a dramatic increase in clustering, and something called a giant component emerges.37 in a giant component, all of the nodes in that portion of the graph are interlinked, resulting in a complete graph like figure 5. this is not so different from what one observes when something “goes viral” on the internet. in a library, a dynamic graph showing the usage of new keywords for emerging subject areas would likely reflect a similar pattern. ■■ linked data graph examples cross-collection graphs, or graphs that link data under your control to data published online, can be constructed by building links into the web of linked data.38 linked data refers to semantic graphs of statements that various organizations publish on the web. for example, geonames.org publishes millions of statements about geographic locations on the linked data web.39 as these graphs grow and evolve, opportunities emerge for using this data in combination with your own data in various ways. for example, it would be quite interesting to develop a network representation of library subject headings and their relationships to concepts in the encyclopedic linked data collection known as dbpedia.40 the resulting graph could be used in a variety of ways: for example, to evaluate the consistency of statements made about concepts, to establish semantic links between user-provided tags and concepts,41 or to function as the figure 11. identifying areas for collaboration: a co-author graph with many simple motifs and few clusters might indicate a field ripe for collaboration graphs in libraries: a primer | powell et al. 167 content could be represented and explored as a graph, and some research has already shown that geographic networks—especially those representing human-constructed entities such as cities and transportation networks—exhibit small world characteristics.45 another way graphs can express geographic relationships in useful ways would be in representing the concept of nearness. waldo tobler’s first law of geography states that “everything is related to everything else, but near things are more related than distant things.”46 in practice, human beings define nearness in different ways, so a graph representing a shared concept of nearness would be very valuable, particularly in exploring works associated with biological, ecological, geological, or evolutionary sciences. graph representations of nearness could be developed by librarians working with scientists and could be the geographic equivalent to subject guides and finding aids. they also might be useful across disciplines and would enable machine inferencing across data that include geographic relationships. still other kinds of graphs what might a digital library tool based on graph theory look like? what could it do? it wouldn’t necessarily depict visualizations of graphs (though in some cases visual graphs are the most efficient way to impart concepts). 
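The emergence of a giant component can be watched directly in a simulation. The sketch below (Python/networkx assumed, purely illustrative) adds random edges to an initially empty graph and reports the size of the largest connected component, which jumps sharply once the average degree passes the critical point, the same qualitative signature described for ideas that "go viral."

```python
import random
import networkx as nx

random.seed(1)
n = 200
G = nx.empty_graph(n)

# Add random edges one at a time and watch the largest connected component.
# Around the critical point (average degree near 1) its size jumps sharply:
# the "giant component" behavior described above.
for step in range(1, 301):
    u, v = random.sample(range(n), 2)
    G.add_edge(u, v)
    if step % 50 == 0:
        giant = max(nx.connected_components(G), key=len)
        print(f"edges={step:3d}  largest component={len(giant)}")
```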
after all, citation databases utilize graph theory, but the user only sees a number (cite count) and lists of articles (citing or cited). in many cases, then, the tool would perform graph evaluation techniques behind the scenes, translating these metrics into simple descriptive queries for the user. for example, a user interested in the most influential papers in his field would enter his subject, and then on the backend, the tool would apply eigenvector centrality to that subject’s citation graph. if the same user finds an especially relevant article, clicking a “find similar articles” button will produce a list of papers in that graph with the shortest path length to the paper in question. researchers also could use this tool to evaluate authors and institutions in various ways: ■■ is my output diverse or specialized compared to my colleagues? the tool assigns a score for each author based on degree centrality in a subject-author graph. ■■ i want to find potential collaborators. the tool returns authors connected to researcher by the shortest path length in a coauthorship graph. ■■ i want to collaborate with colleagues from other departments at my institution. high betweenness centrality quality control that will more rapidly identify and resolve these problems. linked geographic data using geonames it is ironic that the use of networks to describe geographic aspects of the world is in its infancy, considering that many consider leonhard euler’s attempt to find a mathematical solution to the seven bridges of königsberg problem in 1735 to be the birth of the field.43 as some authors have pointed out, geometric evaluation of geographic relationships is actually a poor way to explore geographic relationships.44 graphs can be used to express arbitrary relationships between geographically separated objects, and it is perhaps no accident that our road and railway systems are in fact among the most familiar graphs that people encounter in the real word. a subway map is a graph where subway stations are nodes linked by railway. graphs can represent the relationships between topological features, the visibility of buildings in a city to one another, or the land, sea, and air transportation that links one country to another. geonames supplies a rich collection of geographic information that includes descriptions of geopolitical entities (cities, states, countries), geophysical features, and various names that have been ascribed to these objects. the geographic relationships in intellectual figure 12. find similar articles: a search for hv reynolds might prompt a suggestion for sd miller, who has a similar authorship pattern 168 information technology and libraries | december 2011 nov. 21, 2007, timbl’s blog, http://dig.csail.mit.edu/bread crumbs/node/215. 5. lawrence page et al., the pagerank citation ranking: bringing order to the web (1999), http://citeseerx.ist.psu.edu/ viewdoc/summary?doi=10.1.1.31.1768. 6. duncan s. callaway et al., “network robustness and fragility: percolation on random graphs,” physical review letters 85, no. 25 (2000): 5468–71. 7. adreas wagner and david a. fell, “the small world inside large metabolic networks,” proceedings of the royal society b: biological sciences 268, no. 1478 (2001): 1803–10. 8. gil benko, christopher flamm, and peter f. stadler, “a graph-based toy model of chemistry,” journal of chemical information and modeling 43, no. 4 (2003): 1085–93. 9. tad hogg, bernardo a. huberman, and colin p. 
williams, “phase transition and the search problem,” artificial intelligence 81 (1996): 1–15. 10. vladimir boginski, sergiy butenko, and panos m. pardalos, “mining market data: a network approach,” computers & operations research 33, no. 11 (2006): 3171–84. 11. zoltán dezső and albert-lászló barabási, “halting viruses in scale-free networks,” physical review e 65, no. 5 (2002), doi: 10.1103/physreve.65.055103. 12. hans noel and brendan nyhan, “the ‘unfriending’ problem: the consequences of homophily in friendship retention for causal estimates of social influence,” sept. 2010, http://arxiv.org/abs/1009.3243. 13. johan bollen et al., “toward alternative metrics of journal impact: a comparison of download and citation data,” information processing & management 41, no. 6 (2005): 1419–40; xiaoming liu et al., “co-authorship networks in the digital library research community,” information processing & management 41, no. 6 (2005): 1462–80. 14. johan bollen et al., “clickstream data yields highresolution maps of science,” ed. alan ruttenberg, plos one 4, no. 3 (3, 2009): e4803. 15. eric kolaczyk, statistical analysis of network data (new york; london: springer, 2009). 16. alejandro cornejo and nancy lynch, “reliably detecting connectivity using local graph traits,” csail technical reports mit-csail-tr-2010–043, 2010, http://hdl.handle .net/1721.1/58484 (accessed feb. 17, 2011). 17. réka albert, hawoong jeong, and albert-lászló barabási, “error and attack tolerance of complex networks,” nature 406, no. 6794 (2000): 378–82. 18. m. e. j. newman, “scientific collaboration networks. ii. shortest paths, weighted networks, and centrality,” physical review e 64, no. 1 (2001), doi: 10.1103/physreve.64.016132. 19. albert, jeong, and barabási, “error and attack tolerance.” 20. r. milo, “network motifs: simple building blocks of complex networks,” science 298, no. 5594 (2002): 824–27. 21. lawrence j. hubert, “some applications of graph theory to clustering,” psychometrika 39, no. 3 (1974): 283–309. 22. noel and nyhan, “the ‘unfriending’ problem.” 23. alexandru balaban and douglas klein, “co-authorship, rational erdős numbers, and resistance distances in graphs,” scientometrics 55, no. 1 (2002): 59–70. 24. liu et al., “co-authorship networks in the digital library research community.” 25. derek j. de solla price, “networks of scientific papers,” in a subject–author graph for that institution may locate potential “bridge” subjects to collaborate in. ■■ i’m leaving my current job. what other institutions are doing similar work? in an institution–subject graph, the shorter the path length between two institutions, the more comparable they may be. graphs also enable libraries to reach outside their own data to build connections with other data sets. heterogeneity, which makes relational database representations of arbitrary relationships difficult or impossible, becomes a trivial matter of adding additional nodes and edges to bridge collections. the linked data web defines simple requirements for establishing just such representations, and libraries are wellpositioned to build these bridges. ■■ conclusion for many centuries, libraries have served as repositories for the accumulated knowledge and creative products of civilization, and they contain mankind’s best efforts at comprehending complexity. this knowledge includes scientific works that strive to understand various aspects of the physical world, many of which are complex and require the efforts of numerous researchers over time. 
since the advent of the dewey decimal system, librarians have worked on many fronts to make this knowledge discoverable and to assist in its evaluation. qualitative evaluation increasingly requires understanding a resource in a larger context. we suggest that this context is itself a complex system, which would benefit from the modeling and quantitative evaluation techniques that network science has to offer. we believe librarians are well positioned to leverage network science to explore and comprehend emergent properties of complex information environments. as motivation for this pursuit, we offer in closing this prescient quote from carl woese, which though focused on the discipline of biology, is equally applicable to the myriad complexities of modern life: “a society that permits biology to become an engineering discipline, that allows that science to slip into the role of changing the living world without trying to understand it, is a danger to itself.”47 references 1. melanie mitchell, complexity: a guided tour (oxford, england; new york: oxford univ. pr., 2009). 2. carl woese, “a new biology for a new century,” microbiology and molecular biology reviews (june 2004): 173–86, doi: 10.1128/mmbr. 68.2.173–186.2004. 3. national research council (u.s.), network science (washington, d.c.: national academies pr., 2005). 4. tim berners-lee, “giant global graph,” online posting, graphs in libraries: a primer | powell et al. 169 networks,” proceedings of the national academy of sciences of the united states of america 98, no. 2 (jan. 16, 2001): 404–9. 38. chris bizer, richard cyganiak, and tom heath, how to publish linked data on the web? http://sites.wiwiss.fu-berlin.de/ suhl/bizer/pub/linkeddatatutorial/ (accessed feb. 17, 2011). 39. geonames, http://www.geonames.org/ (accessed feb. 17, 2011). 40. dbpedia, http://dbpedia.org/ (accessed february 17, 2011). 41. alexandre passant and phillippe laublet, “meaning of a tag: a collaborative approach to bridge the gap between tagging and linked data,” proceedings of the www 2008 workshop linked data on the web (ldow2008), bejing, apr. 2008, doi: 10.1.1.142.6915. 42. oclc, “viaf”; oclc homepage, http://www.oclc.org/ us/en/default.htm (accessed feb. 17, 2011). 43. norman biggs, graph theory, 1736–1936 (oxford, england; new york: clarendon, 1986). 44. bin jiang, “small world modeling for complex geographic environments,” in complex artificial environments (springer berlin heidelberg, 2006): 259–71, http://dx.doi.org/10.1007/3 -540-29710-3_17. 45. gillian byrne and lisa goddard, “the strongest link: libraries and linked data,” d-lib magazine 16, no. 11/12 (2010), http://www.dlib.org/dlib/november10/byrne/11byrne.html (accessed feb. 17, 2011). 46. daniel sui, “tobler’s first law of geography: a big idea for a small world?” annals of the association of american geographers 94, no. 2 (2004): 269–77. 47. woese, “a new biology for a new century.” science 149, no. 3683 (july 30, 1965): 510–15. 26. m. e. j. newman, “the first-mover advantage in scientific publication,” epl (europhysics letters) 86, no. 6 (2009): 68001. 27. bollen et al., “clickstream data yields high-resolution maps of science.” 28. chaomei chen, jasna kuljis, and ray j. paul, “visualizing latent domain knowledge,” ieee transactions on systems, man and cybernetics, part c (applications and reviews) 31, no. 4 (nov. 2001): 518–29. 29. hugh g. lewis et al., “a new analysis of debris mitigation and removal using networks,” acta astronautica 66, no. 1–2 (2010): 257–68. 30. 
dezso and barabási, “halting viruses in scale-free networks.” 31. oclc, “viaf (the virtual international authority file) [oclc—activities],” http://www.oclc.org/research/activities/viaf/ (accessed feb. 17, 2011). 32. mitchell, complexity: a guided tour. 33. james f. allen, “toward a general theory of action and time,” artificial intelligence 23, no. 2 (1984): 123–54. 34. herbert van de sompel et al., “memento: timemap apo for web archives,” http://www.mementoweb.org/events/ ia201002/slides/memento_201002_timemap.pdf (accessed feb. 17, 2011). 35. hawoong jeong et al., “lethality and centrality in protein networks,” nature 411 (may 3, 2001): 41–42. 36. aaron clauset, cristopher moore, and m. e. j. newman, “hierarchical structure and the prediction of missing links in networks,” nature 453, no. 7191 (2008): 98–101. 37. m. e. j. newman, “the structure of scientific collaboration incoming editor’s column | gerrity 155 bob gerrity g reetings ital readers. i’m writing this in late september, as the boston red sox attempt to back their way into the major league baseball postseason after blowing a 9-game lead over tampa bay in a major-league september meltdown of epic proportions. [red sox fans are prone to hyperbole, but in this case no hyperbole is needed: this meltdown really is epic.] it’s down to the last game of the season, and like many red sox fans, i’m hopeful but not optimistic. the fate of the 2011 red sox will be old news by the time this appears in print, though: as i’m coming to learn, the wheels of scholarly publishing continue to turn ever so slowly, unless forced to do otherwise. which brings me to why i’m taking on the role of editor of ital. on one hand, i’m fortunate to be taking on the editorship of a journal that quite clearly has been stewarded with care, dedication, and attention by my predecessors. i’ve spent quite a few hours recently in the z678.9 section of my library’s stacks, perusing three decades of back volumes of ital and its predecessor, the journal of library automation. there’s an impressive body of scholarly and informational output on library automation and related topics, from the sublime (“to boolean or not to boolean?” september 1983), to the not-so-sublime (“the effects of baud rate, performance anxiety, and experience in online bibliographic searches,” march 1990), to the sentimental (“floppies to pass the billion-dollar level in ’84.” september 1982), to the déjà-vu-all-over-again (“ls2000—the integrated library system for oclc,” june 1984). overall, i’d have to say there’s a solid foundation to build on, plus plenty of good content in the pipeline, and it would be easy to continue on in the same vein. but that’s not why i’m here. i’m fortunate to be taking on the role of editor as ital faces significant changes. in his inaugural editorial for ital in march 2005, then-incoming editor john webb articulated a number of worthy goals for ital, to both broaden and deepen the content of the journal and the demographic of the authors contributing to it. one goal in particular, though, strikes me (in hindsight of course) as problematic: “i hope to . . . 
facilitate the electronic publication of articles without endangering—but in fact enhancing—the absolutely essential financial contribution that the journal provides to the association.” anyone who has observed the struggles of the newspaper industry in recent years or been involved in the shift towards e-only in the world of academic/scholarly journals will not be surprised to learn that, in the intervening years since john wrote his column and ital has continued in print plus electronic form, revenues (primarily from subscriptions and advertising) have steadily declined while production and distribution costs have not, resulting in an increasing annual subsidy from ala/lita to support the publication. as a result, i’ve been tasked with exploring a new publication model for ital: open access and electronic only. plans for—and the timing of — this transition are still being developed as i write this, but should be finalized before “my” first issue is published in march 2012. there is much about ital that will not change even if the publication format does. a primary focus of the journal will continue to be to solicit and publish high-quality, peer-reviewed papers covering a broad array of topics related to the design, application, and use of technology in libraries. changes i would like to see include making ital more timely and more relevant to the day-to-day work interests of many of its readers. i’d like to add more topical, current, and informational content to ital without negatively impacting its traditional role as a publication vehicle for librarians in tenure-track positions. ital in an e-only format also needs to provide easy and transparent ways for readers to be informed when new content is published and to offer advice, criticism, and commentary to help improve ital. i look forward to your feedback as ital moves in a new direction, about which i’m both hopeful and optimistic. i would like to offer my sincere thanks to the outgoing editor of ital, marc truitt, who has been both helpful and gracious during this editorial transition. marc is passionate about ital and its legacy, and i hope he’ll see the future ital as a worthy successor to, rather than an unfortunate break from, the journal he’s stewarded for the past several years. incoming editor’s column: ch-ch-ch-ch-changes (turn and face the strain) bob gerrity (robert.gerrity@bc.edu) is associate university librarian for information technology, boston college libraries, chestnut hill, massachusetts. 50 communications how long the wait until we can call it television jerry borrell: congressional research service, library of congress , washington , d.c* this brief article will review videotex and teletext. there is little need to define terminology because new hybrid systems are being devised almost constantly (hats off to oclc's latest buzzword-viewtel). ylost useful of all would be an examination of the types of technology being used for information provision. the basic requirement for all systems is a data base-i.e ., data stored so as to allow its retrieval and display on a television screen. the interactions between the computer and the television screens are means to distinguish technologies. in teletext and videotex a device known as a decoder uses data encoded onto the lines of a broadcast signal (whatever the medium of transmission ) to generate the display screen. 
in videotex, voice grade telephone lines or interactive cable are used to carry data communications between two points (usually 1200 baud from the computer and 300 baud or less from the decoder and th e television screen). in teletext the signal is broadcast over airwaves (wideband) or via a time-sharing system (narrowband). the numerous configurations possible make straightforward classification of syst e ms questionable. a review of the systems currently available is useful to illustrate these terms, videotex and teletext. compuserve, the columbus, ohio-based company, provides on-line searching of newspapers to about 4,000 users. reader's digest recently acquired 51 percent of the source, a time*the views expressed in this paper do not necessarily represent those of the library of congress or of the congressional research ser~ vice. sharing service that provides more than 100 different (nonbibliographic) data bases to about 5,000 users. the warner and american express joint project, qube (also columbus-based), utilizes cable broadcast with a limited interactive capability . it does not allow for on-demand provision of information ; rather, it uses a polling technique. antiope, the french teletext system, used at ksl in st. louis last year and undergoing further tests in los angeles at knxt in the coming year, is only part of a complex data transmission system known as didon. antiope is also at an experimental stage in france, with 2,500 terminals scheduled for use in 1981. ceef ax and oracle , broadcast teletext by the bbc and ibc in britain, have an estimated 100,000 users currently. two thousand adapted television sets are being sold every month . prestel, bbc's videotex system, currently has approximately 5,000 users, half of whom are businesses. all other countries in europe are conducting experiments with one of the technologies. in canada, telidon, the most technically advanced system, has 200 users. experiments involving telidon are being conducted nationwide due to government interest in telecommunications improvements. telidon will also be used in washington in the spring of 1981 for consumer evaluation. these cursory notes should indicate the breadth of interes t in alternative means of information provision. video and electronic publishing newsletters (see references) keep track of the number of users and are the best way to keep informed of activities and developments. several important trends are becoming evident. perhaps the most evident is the realization that videography is being developed in countries other than the u.s. as a result of strong support by the national posts and telecommunications (ptt) authorities . until recently there was a feeling that the u.s. was technically behind europe. what is now evident is that in the free market system of the u . s. manufacturers or other potential system providers have had insufficient impetus to provide videotex/teletext technology. the technology of information display (see borrell, journal of library automation, v.13 (dec. 1980), p.277-81) in the u.s. is an order of magnitude more sophisticated than in europe. the point being that in the absence of strong ptt pressure, videography in the u . s. developed for specialized markets in which telecommunications were not a central need. in the one area of great demand, teletext services for the hearing impaired, decoders were developed and have been employed for a number of years (about 25,000 are currently in use ). 
as the high cost of telecommunications bandwidth is eased by data compression, direct broadcasting by satellite, enhanced cable services, and fiber optic networks, then videotex and te letext will become available on a wide scale in the u.s. the computer inquiry ii decision by the fcc involving reinterpretation of the communications act of 1934 has given at&t permission to enter the data processing market . in fact, at&t, in its third experiment with videotex, is taking such an aggressive stance that it seems to be doing everything that its critics have feared: providing updatable classified ads (dynamic yellow pages), allowing users to place information into the system memory , and providing voice mail servicesthereby taking on the newspapers, home computer manufacturers, and the u . s. postal service. in addition, banking services will be offered . as the largest company in the u.s ., this stance cannot be ignored. at&t supplies about 80 percent of the phone service in the u .s., and has the potential, if allowed , to become a broadcaster, data processor, publisher, and banker ; cross-ownership was never allowed up to this time . the trend toward specialized services provision is also exemplified by the communications 51 french and british systems. prestel , which was originally targeted for a home market, is now promoted with the tacit policy of being a special business service allowing financial and private data to be provided to subscribers. sofratev, the marketers of the french teletext system, are acknowledging the importance of transactional markets in two ways, based on technology they have named "smart card," a credit card-size (in one configuration) plate with a built-in microprocessor or chip. the card will allow system users to access material that will have controlled readership. an example would be a magazine of financial data provided to those who need such information (or, more importantly, are willing to pay for it). in a more complex effort, the largest retailer in paris will advertise material via teletext and system users will be able to make acquisitions with their smart card, which can be programmed with financial data. nor is this the end of the effort by the french to market information display technology. the electronic phone directory, being offered by bell in austin , is replicated in a more modest way by the french, who plan to produce a six-byeight-inch black-and,white display unit that will provide. phone directory information (both white and yellow pages) to all of france by the 1990s. developed as part of the "telerriatique" program of the .french government, the terminals represent to some (the parent company of the source has tendered an offer for up to 250,000 of the terminals) a low-cost alternative for providing videotex to a mass market. the tandy home computer in its videotex configuration seems to fill the same market slot. perhaps the most disturbing trend, at least from a librarian's point of view, is the fact that contemporary data systems are being created which could benefit greatly from the experience of librarians and libraries. for instance, research into the methods of access-keyword, phonetic and geographical-by the french is intended to pro:vide a flexible and easily used system for untrained persons searching for directory information, and is being performed by an advertising and yellow pages 52 journal of library automation vol. 14/1 march 1981 publishing firm. 
with a feeling of deja vu i listened to an explanation of how difficult it is to develop a system for the novice; one proposed solution is to allow only the first four letters of a word to be entered (one of the search methods used at the library of congress, which does suggest some cross-fertilization ). whatever the trends, the reality is that librarians and information scientists are playing decreasing roles in the growth of information display technology. hardware systems analysts, advertisers, and communications specialists are the main professions that have an active role to play in the information age. perhaps the answer is an immediate and radical change in the training of library schools of today. our small role may reflect our penchant to be collectors, archivists, and guardians of the information repositories . have we become the keepers of the system? the demand today is for service, information, and entertainment. if we librarians cannot fulfill these needs our places are not assured. should the american library association (ala) be ensuring that libraries are a part of all ongoing tests of videotex-at least in some way-either as organizers, information providers, or in analysis? consider the force of the argument given at the ala 1980 new york annual conference that cable television should be a medium that librarians become involved with for the future. certainly involvement is an important role, but we , like the industrialists and marketers before us, must make smart decisions and choose the proper niche and the most effective way to use our limited resources if we are to serve any part of society in the future. bibliography 1. electronic publishing revietc. oxford, england : learned information ltd . quarterly . 2. home video report . white plains, new york : knowledge industry publications. weekly. 3. ieee transactions on consumer electronics. new york: ieee broadcast, cable, and consumer electronics soc iety . five tim es yearly. 4. international videotex /te letext news. washington , d. c.: arlen communications ltd. monthly . 5. videodisc/teletext news. westport , conn.: microform revi ew. quarterly. 6. videoprint. norwalk , conn.: videoprint. two times monthly. 7. viewdata/videotex report. new york: link resources corp. monthly. data processing library: a very special library sherry cook, mercedes dumlao, and maria szabo: bechtel data processing library, san francisco, california. the 1980s are here and with them comes the ever broadening application of the computer. this presents a new challenge to libraries. what do we do with all these computer codes? how do we index the material? and most importantly, how do we make it accessible to our patrons or computer users? bechtel's data processing library has met these demands. the genesis for th e collection was bechte l's conversion from a honeywell 6000 computer to a univac lloo in 1974. all the programs in use at that time were converted to run on the univac system. it seemed a good time to put all of the computer programs together from all of the various bechtel divisions into a controlled collection. the librarians were charged with the responsibility of enforcing standards and control of bechtel's computer programs. the major benefits derived from placing all computer programs into a controlled library were: 1. company-wide usage of the programs. 2. minimize investment in program development through common usage. 3. computer file and documentation storage by the library to safeguard the investment. 4. 
central location for audits of program code and documentation. 5. centralized reporting on bechtel programs . developing the collection involved basic cataloging techniques which were greatly modified to encompass all the information that computer programs generate, including actual code, documentation, and list66 1 comparative costs of converting shelf list records to machine readable form richard e. chapin and dale h. pretzer: michigan state university library, east lansing, michigan a study at michigan state university library compared costs of three different methods of conversion: keypunching, paper-tape typew1·iting, and optical scanning by a service bureau. the record converted included call number, copy number, first 39 letters of the author's name, first 43 letters of the title, and m.te of publication. source documents were all of the shelf list cards at the library. the end products were a master book tape of the library collections and a machine readable book card for each volume to be used in an automated circulation system. the problems of format, cost and techniques in converting bibliographic data to machine readable form have caused many libraries to defer the automation of certain routine operations. the literature offers little for the administrator facing the decisions of what to convert and how to convert it. automated circulation systems require at least partial conversion of the accumulated bibliographic record. the university of missouri, like many libraries, has been converting the past record only for books as they are circulated ( 1) . southern illinois university ( 2) and johns hopkins ( 3), on the other hand, have converted the record for their entire collections. the southern illinois program is based upon converting only the call number. johns hopkins has converted the call number, main entry, title, pagination, size, and number of copies. and missouri has recorded call number, accession number, and abbreviated author and title. costs of shelf list conversion/ chapin and pretzer 67 · several methods of converting the record have been described. missouri employed keypunching; southern illinois marked code sheets which were scanned electronically and converted to magnetic tape; johns hopkins, working from microfilm copy of the shelf list, used special type font and typed the records for optical scanning. an ibm report on converting the national union catalog recommended an on-line terminal as the best method of conversion ( 4). studies at michigan state university led to the conclusion that acquisition, serials, circulation, and card production contained certain routines that might well be automated. once automation of circulation was decided upon as our initial effort, decisions were necessary as to the conversion. it was recommended that a portion of the bibliographic record for all items in the shelf list should be converted. information other than the call number is being used for other programs ( 5) . cost figures for converting library records are scarce. in only two instances are figures available. the ibm report on the national union cata. log shows that the average entry in nuc contains 277 characters, with an estimated conversion cost ranging from $0.3531 to $0.417 per entry. the proposed conversion method employs an on-line terminal, a technique not available to most libraries. the johns hopkins conversion of "about 300,000 · cards" was accomplished by optical scanning and cost $18,170 (3,p.4). this figures out at about $.06 per record. 
later in the report it is stated that the conversion “is at a rate of $.0038 per character converted” (3, p.25). at $.06 per card and $.0038 per character, the converted record would consist of only 16 characters! in the study herewith reported, every effort was made to arrive at comparative cost figures for the three methods of conversion that are readily available to most research libraries: keypunching, paper-tape typewriting, and optical scanning as accomplished through a service bureau. methods of study the shelf list records of the michigan state university library were divided into three sections by numbering catalog drawers in sequence: 1,2,3; then 2,3,1; then 3,1,2. all the drawers marked with number one became one sample group; those marked two and three made up the other groups. this method of numbering the drawers gave samples from each area of the classification schedule for each method of conversion. the bibliographic data were taken directly from the shelf list without transferring information to worksheets. a sample of the shelf list shows that 74 per cent of the cards are library of congress cards or copies of library of congress proof slips. of those cards produced in the library, only 12 per cent of the total were abbreviated records. the keypunch operators, the typists, and the service bureau were instructed to extract information from the shelf list record. all differences in type (capitals, italics, etc.) were to be ignored; transliterated titles were to be used in those cases where entries were in a non-roman alphabet; accents and diacritical marks were ignored, except where they made a difference in filing, as with umlauts; all numbers in title and author fields were to be spelled out as if written. fig. 1. shelf list cards. information that was transcribed is marked in the example, figure 1. the complete call number 1) was included. author 2) was typed through 39 spaces, including dates, if possible. in cases where the author entry was lengthy, the operators were instructed to stop at the end of 39 spaces. title 3) was recorded as completely as possible through 43 spaces, but not to extend beyond the first major punctuation. date 4) was included as shown. only one copy 5) was shown on each entry. in the example of abbreviated form in figure 1, five separate records were required, with a change only in copy number. the master book tape includes the call number, which occupies 32 spaces; 3 spaces are allowed for copy number, 39 for author, 43 for title, and 4 for date of publication.
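To make the fixed-width record layout concrete, here is a small illustrative sketch (Python, chosen only for demonstration; the original project worked with keypunch cards, paper tape, and optical scanning, not a scripting language) that packs a shelf list entry into the 32/3/39/43/4-column master-tape format. The sample values are hypothetical.

```python
# Field widths for the master book tape record described above:
# call number (32), copy number (3), author (39), title (43), date (4).
WIDTHS = {"call_number": 32, "copy": 3, "author": 39, "title": 43, "date": 4}

def pack_record(call_number, copy, author, title, date):
    """Pack one shelf list entry into a fixed-width, 121-character record."""
    fields = {"call_number": call_number, "copy": copy,
              "author": author, "title": title, "date": date}
    # Truncate anything too long and pad everything to its fixed width.
    return "".join(fields[name][:w].ljust(w) for name, w in WIDTHS.items())

record = pack_record("QD941 .A413", "1",
                     "Agranovich, Vladimir",
                     "Spatial dispersion in crystal optics", "1966")
print(len(record))   # 121
print(repr(record))
```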
on the book card, figure 2, which was generated by the computer from the master book tape, the format is as follows: 32 spaces for call number, 3 for copy number, 11 for author, 26 for title, and 4 for the year published. the remainder of the card is for machine codes used in the circulation system. fig. 2. book pocket card. the book card alone can be created directly by the keypunch. however, if a library has equipment available for a more complete program, it is useful to prepare information in a format to create a master book tape. programs have been written so that the master tape can be added to or deleted from at a later date. four operators worked on the project at michigan state university. two of them were average keypunch operators with little typing skill, one was an expert typist, and the other was an expert keypunch operator. the first two operators were trained to use both the keypunch and the flexowriter. the purpose in using a variety of typists and operators for the job was to arrive at average figures for the conversion project. the data show great variance of output among operators. the outline of the methods used is shown in figure 3. the keypunch method recorded the bibliographic data by use of an ibm 026 keypunch. the punch cards were transferred to a magnetic tape and the book cards were generated by the computer. the paper-tape typewriter information was punched in paper tape by the use of a 2201 flexowriter. a portion of the sample was converted directly to magnetic tape. since some libraries will not have a paper-tape to magnetic-tape converter, the remainder of the paper-tape sample was converted to punch cards and then to magnetic tape. fig. 3. flowchart of shelf list record. optical scanner the optical scanning method was handled by farrington corporation’s service bureau, input services, in dayton, ohio. the service bureau assigned 10 to 15 employees to transcribe the shelf list. they used ibm selectric typewriters with a special type font. special symbols were used to designate the end of a field. the data were recorded on continuous-form paper. the typed record was then edited and scanned, producing a magnetic tape. after the tape was used for production of book cards, it was added to the master book tape. the first batch of cards sent to dayton was gone from the library for approximately four weeks. after the personnel at dayton became accustomed to the format and to library terminology, the turnaround time was approximately two weeks. the 255,000 records which were converted by the service bureau were sent off campus in four separate batches. machine verification of the record was not required. each operator was instructed to proofread her own copy. machine verification was considered, but the idea was discarded because of the extra cost involved. also, since book cards were to be inserted in all volumes, final verification would result when the books and cards were matched. results in the conversion, keypunching cost 6.63 cents per record. paper-tape ran slightly higher, at 7.07 cents; this higher cost was due to the added cost of machinery and the added cost of going from paper tape to magnetic tape.
optical scanning, through a service bureau, was exactly the same as keypunching, 6.63 cents, including the programming costs. cost details are shown in table 1.

table 1. average cost per shelf list record converted

                                  keypunch     paper-tape typewriter    scanning, service bureau
labor (1)                         $.04073      $.03960
  salary                           .03723       .03620
  fringe benefits                  .00350       .00340
equipment rental (2)               .00322       .00888
  computer                         .00280       .00840
  supplies                         .00042       .00048                  $.00030
contractual services               .00003       .00052 (3)               .06600 (5)
overhead (4)                       .02232       .02172
total                             $.06630      $.07072                  $.06630

(1) average costs for all operators based upon salary of $2.10 per hour, and fringe benefits of 9.4 per cent.
(2) rental time to library of ibm 1401 computer is $30.00 per hour, including personnel costs.
(3) includes $.000089 for tape-to-tape conversion and $.000091 for tape to card to magnetic tape conversion.
(4) university charge of 54.87 per cent of salaries, for space, utilities, maintenance, etc. this figure does not include cost of training and supervision.
(5) $.057 per record plus .009 per record for programming costs.

late in the study we observed that a seemingly inordinate amount of the flexowriter time was consumed by the automatic movement of the typewriter carriage to the pre-determined fixed fields. in order to circumvent this the operator was instructed to strike one key to indicate end of field, and then she no longer had to wait for the carriage movement. by using the manual field markers, as opposed to automatic fixed fields, the cost of the flexowriter operation was reduced to 6.672 cents per record. the disadvantage of the manual field-marking system was the increased chance of operator error, which amounted to 3.13 per cent more than the fixed-field method. for this reason, and in spite of the economy of the manual method, the use of pre-determined fixed fields for flexowriter conversion is to be preferred. in the comparison of the salary costs for keypunching and for the use of the flexowriter, great variations were shown among operators. two participants were asked to use both the keypunch and the flexowriter on varying days, with tallies of their output accounted for throughout the entire project. operator 1 was essentially a skilled keypunch operator who had some background in typing. her salary cost per record during keypunching was 3.98 cents; her salary cost for the paper-tape typewriter was 7.92 cents. operator 2 was a skilled keypunch operator who was also sent to typing class for one term to raise her typing skill. her salary cost was 3.92 cents per record on the keypunch and 3.79 cents per record on the paper-tape machine. operator 3, who was a skilled keypunch operator, averaged 2.32 cents per record for salary cost. operator 4, who was a typist and not a keypunch operator, produced records on the flexowriter at a cost of 3.56 cents per record. the above figures indicate salaries only, and do not include overhead, fringe benefits, and other expenses which are reflected in the total conversion cost shown. a letter from farrington service corporation stated the following information about the scanning operation: "1) our typists produced an approximate total of 7,950 typing pages in the course of this conversion. 2) each typist averaged from 3.6 to 3.8 pages per hour. 3) we processed an average of 800-1,000 (shelf list) cards, per girl, per day. 4) the total man hours expended in this project was 2,144.
5) the amount of error detected as a result of sight verification varies significantly from girl to girl. the average, however, ran approximately 2.8 per cent (of records to be corrected)." comparison was made of actual records converted per eight-hour day by each of the methods. the service bureau, with skilled typists, was able to convert approximately 100 records an hour for each typist. the most efficient keypunch operator averaged about 75 records per hour, which was noticeably more than the average. the paper-tape typist, using pre-programmed fixed fields, reached 65 records per hour, but was able to produce 73 records per hour by manually typing the field markers. a short-run sample was stop-watch-timed to give an indication of the differences in results for each method when only minimum changes in certain fields, such as copy number or volume number, were required. costs of shelf list conversion/ chapin and pretzer 73 on the keypunch machine an operator consumed 34.6 seconds in typing the initial record and 20.4 seconds in duplicating the basic information and changing data in one given field. the operator with the automatic program flexowriter consumed 47.2 seconds typing the initial record, including 13.2 seconds in shifting fields and automatically firing the record marks, and 24 seconds duplicating the record. when she manually indicated the field information, she was able to convert the initial record in slightly less time-30 seconds; and she took 22.8 seconds to duplicate the data with a change in one field. final verification will be completed only when all cards are matched with the proper books. for those books that do not circulate, this may never be accomplished. a sample of cards was selected to reflect the three methods of conversion. the service bureau cards contained fewer errors than those produced by keypunching and paper-tape typewriting. production of records that were not acceptable to the computer in an edit program occurred in 1.75 per cent of the sample for keypunching, 0.93 per cent for paper-tape typewriting, and 0.16 per cent for service bureau. operator errors, discovered while matching cards with books, showed a higher percentage: 4.62 per cent for keypunching, 3.60 per cent for flexowriter, and 0.35 per cent for service bureau. conclusions and recommendations 1. the cost of converting a portion of the bibliographic record is relatively inexpensive when compared to the total cost of automated library programs. one reason for our delay in entering into the field of an automated circulation program was that of making the book cards. now that this task has been completed, it is obvious that conversion is a one-time cost that can well be absorbed. if the library cannot afford the original conversion, at a cost of 6 or 7 cents a record, then the library cannot afford to proceed with automated programs. 2. there is no difference in cost between keypunching a machine readable record and in having the project undertaken by a service bureau. the use of paper-tape typewriter for conversion costs more than the other two methods. 3. large scale conversion of records to machine readable form might well be done by an outside organization. in order to get the task completed in a short period of time, a library would be required to hire a number of short-term clerical employees. in the case of michigan state, situated in the small community of east lansing, recruiting and training a large number of employees for short-term projects is most difficult. 
it is rather certain that the overhead for such a program would bring the cost beyond that of using a service bureau. on the basis of our experience it is recommended that the conversion be sent to a service bureau. 74 journal of library automation vol. 1/ 1 march, 1968 4. a library can get along without portions of. a shelf list for short periods of time. one of the predicted problems of sending material off campus to be converted was that of losing the availability of the shelf list records. although there were some inconveniences, it was found that the library could carry on its operations and function without the shelf list. certainly, this could not be done if the shelf list cards were gone for any length of time. acknowledgment a grant from the council on library resources, inc., made possible the study described in this paper. references 1. parker, ralph h.: "development of automatic systems at the university of missouri library," in university of illinois graduate school of library science, ptoceedings of the 1963 clinic on library applications of data processing. (champaign, ill.: illini union bookstore, 1964)' 43-55. 2. southern illinois university. office of systems and procedures: an automated circulation control system for the delyte w. morris library; the system and its progress in brief. (carbondale, ill.: southern illinois university, 1963). 3. the johns hopkins university. the milton s. eisenhower library: progress report on an operations research and systems engineering study of a university library. (baltimore: johns hopkins, 1965). 4. international business machines. federal systems division: report on a pilot project for converting the pre-1952 national union catalog to a machine readable record. (rockville, maryland: ibm, 1965). 5. chapin, richard e.: "administrative and economic considerations for library automation," in university of illinois graduate school of library science, proceedings of the 1967 clinic on applications of data processing. (in press). the next generation library catalog | yang and hofmann 141 sharon q. yang and melissa a. hofmann the next generation library catalog: a comparative study of the opacs of koha, evergreen, and voyager open source has been the center of attention in the library world for the past several years. koha and evergreen are the two major open-source integrated library systems (ilss), and they continue to grow in maturity and popularity. the question remains as to how much we have achieved in open-source development toward the next-generation catalog compared to commercial systems. little has been written in the library literature to answer this question. this paper intends to answer this question by comparing the next-generation features of the opacs of two open-source ilss (koha and evergreen) and one proprietary ils (voyager’s webvoyage). m uch discussion has occurred lately on the nextgeneration library catalog, sometimes referred to as the library 2.0 catalog or “the third generation catalog.”1 different and even conflicting expectations exist as to what the next-generation library catalog comprises: in two sentences, this catalog is not really a catalog at all but more like a tool designed to make it easier for students to learn, teachers to instruct, and scholars to do research. 
it provides its intended audience with a more effective means for finding and using data and information.2 such expectations, despite their vagueness, eventually took concrete form in 2007.3 among the most prominent features of the next-generation catalog are a simple keyword search box, enhanced browsing possibilities, spelling corrections, relevance ranking, faceted navigation, federated search, user contribution, and enriched content, just to mention a few. over the past three years, libraries, vendors, and open-source communities have intensified their efforts to develop opacs with advanced features. the next-generation catalog is becoming the current catalog. the library community welcomes open-source integrated library systems (ilss) with open arms, as evidenced by the increasing number of libraries and library consortia that have adopted or are considering opensource options, such as koha, evergreen, and the open library environment project (ole project). librarians see a golden opportunity to add features to a system that will take years for a proprietary vendor to develop. open-source opacs, especially that of koha, seem to be more innovative than their long-established proprietary counterparts, as our investigation shows in this paper. threatened by this phenomenon, ils vendors have rushed to improve their opacs, modeling them after the next-generation catalog. for example, ex libris pushed out its new opac, webvoyage 7.0, in august of 2008 to give its opac a modern touch. one interesting question remains. in a competition for a modernized opac, which opac is closest to our visions for the next-generation library catalog: opensource or proprietary? the comparative study described in this article was conducted in the hope of yielding some information on this topic. for libraries facing options between open-source and proprietary systems, “a thorough process of evaluating an integrated library system (ils) today would not be complete without also weighing the open source ils products against their proprietary counterparts.”3 ■■ scope and purpose of the study the purpose of the study is to determine which opac of the three ilss—koha, evergreen, or webvoyage—offers more in terms of services and is more comparable to the next-generation library catalog. the three systems include two open-source and one proprietary ilss. koha and evergreen are chosen because they are the two most popular and fully developed open-source ilss in north america. at the time of the study, koha had 936 implementations worldwide; evergreen had 543 library users.4 we chose webvoyage for comparison because it is the opac of the voyager ils by ex libris, the biggest ils vendor in terms of personnel and marketplace.5 it also is one of the more popular ilss in north america, with a customer base of 1,424 libraries, most of which are academic.6 as the sample only includes three ilss, the study is very limited in scope, and the findings cannot be extrapolated to all open-source and proprietary catalogs. but, hopefully, readers will gain some insight into how much progress libraries, vendors, and open-source communities have achieved toward the next-generation catalog. ■■ literature review a review of the library literature found two relevant studies on the comparison of opacs in recent years. the first study was conducted by two librarians in slovenia investigating how much progress libraries had made toward the next-generation catalog.7 six online catalogs sharon q. 
yang (yangs@rider.edu) is systems librarian and melissa a. hofmann (mhofmann@rider.edu) is bibliographic control librarian, rider university. 142 information technology and libraries | september 2010 were examined and evaluated, including worldcat, the slovene union catalog cobiss, and those of four public libraries in the united states. the study also compared services provided by the library catalogs in the sample with those offered by amazon. the comparison took place primarily in six areas: search, presentation of results, enriched content, user participation, personalization, and web 2.0 technologies applied in opacs. the authors gave a detailed description of the research results supplemented by tables and snapshots of the catalogs in comparison. the findings indicated that “the progress of library catalogues has really been substantial in the last few years.” specifically, the library catalogues have made “the best progress on the content field and the least in user participation and personalization.” when compared to services offered by amazon, the authors concluded that “none of the six chosen catalogues offers the complete package of examined options that amazon does.”8 in other words, library catalogs in the sample still lacked features compared to amazon. the other comparative study was conducted by linda riewe, a library school student, in fulfillment for her master’s degree from san jose university. the research described in her thesis is a questionnaire survey targeted at 361 libraries that compares open-source (specifically, koha and evergreen) and propriety ilss in north america. more than twenty proprietary systems were covered, including horizon, voyager, millennium, polaris, innopac, and unicorn.9 only a small part of her study was related to opacs. it involved three questions about opacs and asked librarians to evaluate the ease of use of their ils opac’s search engines, their opac search engine’s completeness of features, and their perception of how easy it is for patrons to make self-service requests online for renewals and holds. a scale of 1 to 5 was used (1 = least satisfied; 5= very satisfied) regarding the three aspects of opacs. the mean and medium satisfaction ratings for open-source opacs were higher than those of proprietary ones. koha’s opac was ranked 4.3, 3.9, and 3.9, respectively in mean, the highest on the scale in all three categories, while the proprietary opacs were ranked 3.9, 3.6, and 3.6.10 evergreen fell in the middle, still ahead of proprietary opacs. the findings reinforced the perception that open-source catalogs, especially koha, offer more advanced features than proprietary ones. as riewe’s study focused more on the cost and user satisfaction with ilss, it yielded limited information about the connected opacs. no comparative research has measured the progress of open-source versus proprietary catalogs toward the next-generation library catalog. therefore the comparison described in this paper is the first of its kind. as only koha, everygreen, and voyager’s opacs are examined in this paper, the results cannot be extrapolated. studies on a larger scale are needed to shed light on the progress librarians have made toward the next-generation catalog. ■■ method the first step of the study was identifing and defining of a set of measurements by which to compare the three opacs. 
a review of library literature on the next-generation library catalog revealed different and somewhat conflicting points of views as to what the nextgeneration catalog should be. as marshall breeding put it, “there isn’t one single answer. we will see a number of approaches, each attacking the problem somewhat differently.”11 this study decided to use the most commonly held visions, which are summarized well by breeding and by morgan’s lita executive summary.12 the ten parameters identified and used in the comparison were taken primarily from breeding’s introduction to the july/ august 2007 issue of library technology reports, “nextgeneration library catalogs.”13 the ten features reflect some librarians’ visions for a modern catalog. they serve as additions to, rather than replacements of, the feature sets commonly found in legacy catalogs. the following are the definitions of each measurement: ■■ a single point of entry to all library information: “information” refers to all library resources. the next-generation catalog contains not only bibliographical information about printed books, video tapes, and journal titles but also leads to the full text of all electronic databases, digital archives, and any other library resources. it is a federated search engine for one-stop searching. it not only allows for one search leading to a federation of results, it also links to full-text electronic books and journal articles and directs users to printed materials. ■■ state-of-the-art web interface: library catalogs should be “intuitive interfaces” and “visually appealing sites” that compare well with other internet search engines.14 a library’s opac can be intimidating and complex. to attract users, the next-generation catalog looks and feels similar to google, amazon, and other popular websites. this criterion is highly subjective, however, because some users may find google and amazon anything but intuitive or appealing. the underlying assumption is that some internet search engines are popular, and a library catalog should be similar to be popular themselves. ■■ enriched content: breeding writes, “legacy catalogs tend to offer text-only displays, drawing only on the marc record. a next-generation catalog might bring in content from different sources to strengthen the visual appeal and increase the amount of information presented to the user.”15 the enriched content the next generation library catalog | yang and hofmann 143 includes images of book covers, cd and movie cases, tables of contents, summaries, reviews, and photos of items that traditionally are not present in legacy catalogs. ■■ faceted navigation: faceted navigation allows users to narrow their search results by facets. the types of facets may include subjects, authors, dates, types of materials, locations, series, and more. many discovery tools and federated search engines, such as villanova university’s vufind and innovative interface’s encore, have used this technology in searches.16 auto-graphics also applied this feature in their opac, agent iluminar.17 ■■ simple keyword search box: the next-generation catalog looks and feels like popular internet search engines. the best example is google’s simple user interface. that means that a simple keyword search box, instead of a controlled vocabulary or specific-field search box, should be presented to the user on the opening page with a link to an advanced search for user in need of more complex searching options. 
■■ relevancy: traditional ranking of search results is based on the frequency and positions of terms in bibliographical records during keyword searches. relevancy has not worked well in opacs. in addition, popularity is another factor that has not been taken into consideration in relevancy ranking. for instance, “when ranking results from the library’s book collection, the number of times that an item has been checked out could be considered an indicator of popularity.”18 by the same token, the size and font of tags in a tag cloud or the number of comments users attach to an item may also be considered relevant in ranking search results. so far, almost no opacs are capable of incorporating circulation statistics into relevancy ranking. ■■ “did you mean . . . ?”: when a search term is not spelled correctly or nothing is found in the opac in a keyword search, the spell checker will kick in and suggest the correct spelling or recommend a term that may match the user’s intended search term. for example, a modern catalog may generate a statement such as “did you mean . . . ?” or “maybe you meant . . . .” this may be a very popular and useful service in modern opacs. ■■ recommendations and related materials: the nextgeneration catalog is envisioned as promoting reading and learning by making recommendations of additional related materials to patrons. this feature is an imitation of amazon and websites that promote selling by stating “customers who bought this item also bought . . . .” likewise, after a search in the opac, a statement such as “patrons who borrowed this book also borrowed the following books . . .” may appear. ■■ user contribution—ratings, reviews, comments, and tagging: legacy catalogs only allow catalogers to add content. in the next-generation catalog, users can be active contributors to the content of the opac. they can rate, write reviews, tag, and comment on items. user contribution is an important indicator for use and can be used in relevancy ranking. ■■ rss feeds: the next-generation catalog is dynamic because it delivers lists of new acquisitions and search updates to users through rss feeds. modern catalogs are service-oriented; they do more than provide a simple display search results. the second step is to apply these ten visions to the opacs of koha, evergreen, and webvoyage to determine if they are present or absent. the opacs used in this study included three examples from each system. they may have been product demos and live catalogs randomly chosen from the user list on the product websites. the latest releases at the time of the study was koha 3.0, evergreen 2.0, webvoyage 7.1. in case of discrepancies between product descriptions and reality, we gave precedence to reality over claims. in other words, even if the product documentation lists and describes a feature, this study does not include it if the feature is not in action either in the demo or live catalogs. despite the fact that a planned future release of one of those investigated opacs may add a feature, this study only recorded what existed at the time of the comparison. the following are the opacs examined in this paper. 
koha ■■ koho demo for academic libraries: http://academic .demo.kohalibrary.com/ ■■ wagner college: http://wagner.waldo.kohalibrary .com/ ■■ clearwater christian college: http://ccc.kohalibrary .com/ evergreen ■■ evergreen demo: http://demo.gapines.org/opac/ en-us/skin/default/xml/index.xml ■■ georgia pines: http://gapines.org/opac/en-us/ skin/default/xml/index.xml ■■ columbia bible college at http://columbiabc .evergreencatalog.com/opac/en-ca/skin/default/ xml/index.xml webvoyage ■■ rider university libraries: http://voyager.rider.edu ■■ renton college library: http://renton.library.ctc .edu/vwebv/searchbasic 144 information technology and libraries | september 2010 ■■ shoreline college library: http://shoreline.library .ctc.edu/vwebv/searchbasic the final step includes data collection and compilation. a discussion of findings follows. the study draws conclusions about which opac is more advanced and has more features of the next-generation library catalog. ■■ findings each of the opacs of koha, evergreen, and webvoyage are examined for the presence of the ten features of the next-generation catalog. single point of entry for all library information none of the opacs of the three ilss provides true federated searching. to varying degrees, each is limited in access, showing an absence of contents from electronic databases, digital archives, and other sources that generally are not located in the legacy catalog. of the three, koha is more advanced. while webvoyage and evergreen only display journal-holdings information in their opacs, koha links journal titles from its catalog to proquest’s serials solutions, thus leading users to fulltext journals in the electronic databases. the example in figure 1 (koha demo) shows the journal title unix update with an active link to the full-text journal in the availability field. the link takes patrons to serials solutions, where full text at the journal-title level is listed for each database (see figure 2). each link will take you into the full text in each database. state-of-the-art web interface as beauty is in the eye of the beholder, the interface of a catalog can be appealing to one user but prohibitive to another. with this limitation in mind, the out-of-thebox user interface at the demo sites was considered for each opac. all the three catalogs have the google-like simplicity in presentation. all of the user interfaces are highly customizable. it largely depends on the library to make the user interface appealing and welcoming to users. figures 3–5 show snapshots from each ilss demo sites and have not been customized. however, there are a few differences in the “state of the art.” for one, koha’s navigation between screens relies solely on the browser’s forward and back buttons, while webvoyage and evergreen have internal navigation buttons that more efficiently take the user between title lists, headings lists, and record displays, and between records in a result set. while all three opacs offer an advanced search page with multiple boxes for entering search terms, only webvoyage makes the relationship between the terms in different boxes clear. by the use of a drop-down box, it makes explicit that the search terms are by default anded and also allows for the selection of or and not. in koha’s and evergreen’s advanced search, however, the terms are anded only, a fact that is not at all obvious to the user. in the demo opacs examined, there is no option to choose or or not between rows, nor is there any indication that the search is anded. 
the point of providing multiple search boxes is to guide users in constructing a boolean search without their having to worry about operators and syntax. in koha, however, users have to type an or or not statement themselves within the text box, thus defeating the purpose of having multiple boxes. while evergreen allows for a not construction within a row (“does not contain”), it does not provide an option for or (“contains” and “matches exactly” are the other two options available). see figures 6–8. thus koha’s and evergreen’s advanced search is less than intuitive for users and certainly less functional than webvoyage’s.

figure 1. link to full-text journals in serials solutions in koha
figure 2. links to serials solutions from koha
figure 3. koha: state-of-the-art user interface
figure 4. evergreen: state-of-the-art user interface
figure 5. voyager: state-of-the-art user interface
figure 6. voyager advanced search
figure 7. koha advanced search
figure 8. evergreen advanced search

enriched content

to varying degrees, enriched content is present in all three catalogs, with koha providing the most. while all three catalogs have book covers and movie-container art, koha has much more in its catalog. for instance, it displays tags, descriptions, comments, and amazon reviews. webvoyage displays links to google books for book reviews and content summaries but does not have tags, descriptions, and comments in the catalog. see figures 9–11.

figure 9. koha enriched content
figure 10. evergreen enriched content
figure 11. voyager enriched content

faceted navigation

the koha opac is the only catalog of the three to offer faceted navigation. the “refine your search” feature allows users to narrow search results by availability, places, libraries, authors, topics, and series. clicking on a term within a facet adds that term to the search query and generates a narrower list of results. the user may then choose another facet to further refine the search. while evergreen appears to have faceted navigation upon first glance, it actually does not possess this feature. the following facets appear after a search generates hits: “relevant subjects,” “relevant authors,” and “relevant series.” but choosing a term within a facet does not narrow down the previous search. instead, it generates an entirely new search with the selected term; it does not add the new term to the previous query. users must manually combine the terms in the simple search box or through the advanced search page. webvoyage also does not offer faceted navigation—it only provides an option to “filter your search” by format, language, and date when a set of results is returned. see figures 12–14.

figure 12. koha faceted navigation
figure 13. evergreen faceted navigation
figure 14. voyager faceted navigation

keyword searching

koha, evergreen, and webvoyage all present a simple keyword search box with a link to the advanced search (see figures 3–5).

relevancy

neither koha, evergreen, nor webvoyage provides any evidence for meeting the criteria of the next-generation catalog’s more inclusive vision of relevancy ranking, such as accounting for an item’s popularity or allowing user tags. koha uses index data’s zebra program for its relevance ranking, which “reads structured records in a variety of input formats . . . and allows access to them through exact boolean search expressions and relevance-ranked free-text queries.”19 evergreen’s dokuwiki states that the base relevancy score is determined by the cover density of the searched terms. after this base score is determined, items may receive score bumps based on word order, matching on the first word, and exact matches depending on the type of search performed.20 these statements do not indicate that either koha or evergreen go beyond the traditional relevancy-ranking methods of legacy systems, such as webvoyage.

did you mean . . . ?

only evergreen has a true “did you mean . . . ?” feature. when no hits are returned, evergreen provides a suggested alternate spelling (“maybe you meant . . . ?”) as well as a suggested additional search (“you may also like to try these related searches . . .”). koha has a spell-check feature, but it automatically normalizes the search term and does not give the option of choosing a different one. this is not the same as a “did you mean . . . ?” feature as defined above. while the normalizing process may be seamless, it takes the power of choice away from the user and may be problematic if a particular alternative spelling or misspelling is searched purposefully, such as “womyn.” (when “womyn” is searched as a keyword in the koha demo opac, 16,230 hits are returned. this catalog does not appear to contain the term as spelled, which is why it is normalized to women. the fact that the term does not appear as is may not be transparent to the searcher.) with normalization, the user may also be unaware that any mistake in spelling has occurred, and the number of hits may differ between the correct spelling and the normalized spelling, potentially affecting discovery. the normalization feature also only works with particular combinations of misspellings, where letter order affects whether a match is found. otherwise the system returns a “no result found!” message with no suggestions offered. (try “homoexuality” vs. “homoexsuality.” in koha’s demo opac, the former, with a missing “s,” yields 553 hits, while the latter, with a misplaced “s,” yields none.) however, koha is a step ahead of webvoyage, which has no built-in spell checker at all. if a search fails, the system returns the message “search resulted in no hits.” see figures 15–17.

figure 15. evergreen: did you mean . . . ?
figure 16. koha: did you mean . . . ?
figure 17. voyager: did you mean . . . ?

recommendations/related materials

none of the three online catalogs can recommend materials for users.

user contributions

koha is the only system of the three that allows users to add tags, comments, descriptions, and reviews. in koha’s opac, user-added tags form tag clouds, and the font and size of each keyword or tag indicate that keyword or tag’s frequency of use. all the tags in a tag cloud serve as hyperlinks to library materials. users can write their own reviews to complement the amazon reviews. all user-added reviews, descriptions, and comments have to be approved by a librarian before they are finalized for display in the opac. nevertheless, the user contribution in the koha opac is not easy to use. it may take many clicks before a user can figure out how to add or edit text. it requires user login, and the system cannot keep track of the search hits after a login takes place. therefore the user contribution features of koha need improvement. see figure 18.

figure 18. koha user contributions

rss feeds

koha provides rss feeds, while evergreen and webvoyage do not.

■■ conclusion

table 1 is a summary of the comparisons in this paper. these comparisons show that the koha opac has six out of the ten compared features for the next-generation catalog, plus two halves. its full-fledged features include state-of-the-art web interface, enriched content, faceted navigation, a simple keyword search box, user contribution, and rss feeds.

table 1. summary of features of the next-generation catalog

                                                     koha      evergreen    voyager
single point of entry for all library information    partial   no           no
state-of-the-art web interface                       yes       yes          yes
enriched content                                     yes       yes          yes
faceted navigation                                   yes       no           no
keyword search                                       yes       yes          yes
relevancy                                            no        no           no
did you mean . . . ?                                 partial   yes          no
recommended/related materials                        no        no           no
user contribution                                    yes       no           no
rss feed                                             yes       no           no

the two halves indicate the existence of a feature that is not fully developed. for instance, “did you mean . . . ?” in koha does not work the way the next-generation catalog is envisioned. in addition, koha has the capability of linking journal titles to full text via serials solutions, while the other two opacs only display holdings information. evergreen falls into second place, providing four out of the ten compared features: state-of-the-art interface, enriched content, a keyword search box, and “did you mean . . . ?” webvoyage, the voyager opac from ex libris, comes in third, providing only three out of the ten features for the next-generation catalog. based on the evidence, koha’s opac is more advanced and innovative than evergreen’s or voyager’s. among the three catalogs, the open-source opacs compare more favorably to the ideal next-generation catalog than the proprietary opac. however, none of them is capable of federated searching. only koha offers faceted navigation. webvoyage does not even provide a spell checker. the ils opac still has a long way to go toward the next-generation catalog. though this study samples only three catalogs, hopefully the findings will provide a glimpse of the current state of open-source versus proprietary catalogs. ils opacs are not comparable in features and functions to stand-alone opacs, also referred to as “discovery tools” or “layers.” some discovery tools, such as ex libris’ primo, also are federated search engines and are modeled after the next-generation catalog. recently they have become increasingly popular because they are bolder and more innovative than ils opacs. two of the best stand-alone open-source opacs are villanova university’s vufind and oregon state university’s libraryfind.21 both boast eight out of ten features of the next-generation catalog.22 technically it is easier to develop a new stand-alone opac with all the next-generation catalog features than to mend old ils opacs. as more and more libraries are disappointed with their ils opacs, more discovery tools will be implemented. vendors will stop improving ils opacs and concentrate on developing better discovery tools. the fact that ils opacs are falling behind current trends may eventually bear no significance for libraries—at least for the ones that can afford the purchase or implementation of a more sophisticated discovery tool or stand-alone opac. certainly small and public libraries who cannot afford a discovery tool or a programmer for an open-source opac overlay will suffer, unless market conditions change.

references

1. tanja merčun and maja žumer, “new generation of catalogues for the new generation of users: a comparison of six library catalogues,” program: electronic library & information systems 42, no. 3 (july 2008): 243–61.
2. eric lease morgan, “a ‘next-generation’ library catalog—executive summary (part #1 of 5),” online posting, july 7, 2006, lita blog: library information technology association, http://litablog.org/2006/07/07/a-next-generation-library-catalog-executive-summary-part-1-of-5/ (accessed nov. 10, 2008).
3. marshall breeding, introduction to “next generation library catalogs,” library technology reports 43, no. 4 (july/aug. 2007): 5–14.
4. ibid.
5. marshall breeding, “library technology guides: key resources in the field of library automation,” http://www.librarytechnology.org/lwc-search-advanced.pl (accessed jan. 23, 2010).
6. marshall breeding, “investing in the future: automation marketplace 2009,” library journal (apr. 1, 2009), http://www.libraryjournal.com/article/ca6645868.html (accessed jan. 23, 2010).
7. marshall breeding, “library technology guides: company directory,” http://www.librarytechnology.org/exlibris.pl?sid=20100123734344482&code=vend (accessed jan. 23, 2010).
8. merčun and žumer, “new generation of catalogues.”
9. ibid.
10. linda riewe, “integrated library system (ils) survey: open source vs. proprietary-tables” (master’s thesis, san jose university, 2008): 2–5, http://users.sfo.com/~lmr/ils-survey/tables-all.pdf (accessed nov. 4, 2008).
11. ibid., 26–27.
12. breeding, introduction.
13. ibid.; morgan, “a ‘next-generation’ library catalog.”
14. breeding, introduction.
15. ibid.
16. ibid.
17. villanova university, “vufind,” http://vufind.org/ (accessed june 10, 2010); innovative interfaces, “encore,” http://encoreforlibraries.com/ (accessed june 10, 2010).
18. auto-graphics, “agent iluminar,” http://www4.auto-graphics.com/solutions/agentiluminar/agentiluminar.htm (accessed june 10, 2010).
19. breeding, introduction; morgan, “a ‘next-generation’ library catalog.”
20. index data, “zebra,” http://www.indexdata.dk/zebra/ (accessed jan. 3, 2009).
21. evergreen dokuwiki, “search relevancy ranking,” http://open-ils.org/dokuwiki/doku.php?id=scratchpad:opac_demo&s=core (accessed dec. 19, 2008).
22. villanova university, “vufind”; oregon state university, “libraryfind,” http://libraryfind.org/ (accessed june 10, 2010).
23. sharon q. yang and kurt wagner, “open source standalone opacs” (microsoft powerpoint presentation, 2010 virtual academic library environment annual conference, piscataway, new jersey, jan. 8, 2010).

60 information technology and libraries | june 2011

because this is a family program and because we are all polite people, i can’t really use the term i want to here. let’s just say that i am an operating system [insert term here for someone who is highly promiscuous]. i simply love to install and play around with various operating systems, primarily free operating systems (oses), primarily linux distributions. and the more exotic, the better, even though i always dutifully return home at the end of the evening to my beautiful and beloved ubuntu.
in the past year or two i can recall installing (and in some cases actually using) the following: gentoo, mint, fedora, debian, moonos, knoppix, damn small linux, easypeasy, ubuntu netbook remix, xubuntu, opensuse, netbsd, sabayon, simplymepis, centos, geexbox, and reactos. (aside from stock ubuntu and all things canonical, the one i keep a constant eye on is moonos [http://www.moonos.org/], a stunningly beautiful and eminently usable ubuntu-based remix by a young artist and programmer in cambodia, chanrithy thim.) in the old days i would have rustled up an old, sloughed-off pc to use as an experimental “server” upon which i would unleash each of these oses, one at a time. but those were the old days, and these are the new days. my boss kindly bought me a big honkin’ windows-based workstation about a year and a half ago, a box with plenty of processing power and memory (can you even buy a new workstation these days that’s not incredibly powerful, and incredibly inexpensive?), so my need for hardware above and beyond what i use in my daily life is mitigated. specifically, it’s mitigated through use of virtual machines. i have long used virtualbox (http://www.virtualbox .org/) to create virtual machines (vms), lopped-off hunks of ram and disk space to be used for the installation of a completely different os. with virtualbox, you first describe the specifications of the vm you’d like to create—how much of the host’s ram to provide, how large a virtual hard disk, boot order, access to host cd drives, usb devices, etc. you click a button to create it, then you install an os onto it, the “guest” os, in the usual way. (well, not exactly the usual way; it’s actually easier to install an os here because you can boot directly from a cd image, or iso file, negating the need to mess with anything so distasteful and old-fashioned and outre as an actual, physical cd-rom.) in my experience, you can create a new vm in mere seconds; then it’s all a matter of how difficult the os is to install, and the linux distributions are becoming easier and easier to install as the months plow on. at any rate, as far as your new os is concerned, it is being installed on bare metal. virtual? real? for most intents and purposes the guest os knows no difference. in the titillatingly dangerous and virus-ridden cyberworld in which we live, i’ll not mention the prophylactic uses of vms because, again, this is a family program and we’re all polite people. suffice it to say, the typical network connection of a vm is nated behind the nic of the host machine, so at least as far as active network– based attacks are concerned, your guest vm is at least as secure as its host, even more so because it sits in its own private network space. avoiding software-based viruses and trojans inside your vm? let’s just say that the wisdom passed down the cybergenerations still holds: when it rains, you wear a raincoat—if you see what i’m saying. aside from enabling, even promoting my shameless os promiscuity, how are vms useful in an actual work setting? for one, as a longtime windows guy, if i need to install and test something that is *nix-only, i don’t need a separate box with which to do so. (and vice versa too for all you unix-weaned ladies and gentlemen who find the need to test something on a rocker from redmond.) 
if there is a software dependency on a particular os, a particular version of a particular os, or even if the configuration of what i’m trying to test is so peculiar i just don’t want to attempt to mix it in with an existing, stable vm, i can easily and painlessly whip up a new instance of the required os and let it fly. and deleting all this when i’m done is easily accomplished within the virtualbox gui. using a virtual machine facilitates the easy exploration of new operating systems and new applications, and moving toward using virtual machines is similar to when i first started using a digital camera. you are free to click click click with no further expense accrued. you don’t like what you’ve done? blow it away and begin anew. all this vm business has spread, at my home institution, from workstation to data center. i now run both a development and test server on vms physically sitting on a massive production server in our data center—the kind of machine that when switched on causes a brown-out in the tri-state area. this is a very efficient way to do things though because when i needed access to my own server, our system administrator merely whipped up a vm for me to use. to me, real or virtual, it was all the same; to the system administrator, it greatly simplified operations. and i may joke about the loud clank of the host server’s power switch and subsequent dimming of the lights, but doing things this way has been shown to be more energy efficient than running a server farm in which each server editorial board thoughts: just like being there, or how i learned to stop coveting bare metal and learned to love my vm mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect in the sheridan libraries, johns hopkins university, baltimore, maryland. mark cyzyk editorial board thoughts | cyzyk 61 virtual machines: zero-cost playgrounds for the promiscuous, and energy efficient, staff saving tools for system operations. what’s not to like? throw dual monitors into the mix (one for the host os; one for the guest), and it’s just like being there. sucks in enough juice to quench the thirst of its redundant power supplies. (they’re redundant, they repeat themselves; they’re redundant, they repeat themselves—so you don’t want too many of them around slurping up the wattage, slurping up the wattage . . . ) 100 information technology and libraries | june 2009 tutorial andrew darby and ron gilmour adding delicious data to your library website social bookmarking services such as delicious offer a simple way of developing lists of library resources. this paper outlines various methods of incorporating data from a delicious account into a webpage. we begin with a description of delicious linkrolls and tagrolls, the simplest but least flexible method of displaying delicious results. we then describe three more advanced methods of manipulating delicious data using rss, json, and xml. code samples using php and javascript are provided. o ne of the primary components of web 2.0 is social bookmarking. social bookmarking services allow users to store bookmarks on the web where they are available from any computer and to share these bookmarks with other users. even better, these bookmarks can be annotated and tagged to provide multiple points of subject access. social bookmarking services have become popular with librarians as a means of quickly assembling lists of resources. 
since anything with a url can become a bookmark, such lists can combine diverse resource types such as webpages, scholarly articles, and library catalog records. it is often desirable for the data stored in a social bookmarking account to be displayed in the context of a library webpage. this creates consistent branding and a more professional appearance. delicious (http://delicious .com/), one of the most popular social bookmarking tools, allows users to extract data from their accounts and to display this data on their own websites. delicious offers multiple ways of doing this, from simply embedding html in the target webpage to interacting with the api.1 in this paper we will begin by looking at the simplest methods for users uncomfortable with programming, and then move on to three more advanced methods using rss, json, and xml. our examples use php, a cross-platform scripting language that may be run on either linux/ unix or windows servers. while it is not possible for us to address the many environments (such as cmses) in which websites are constructed, our code should be adaptable to most contexts. this will be especially simple in the many popular php–based cmses such as drupal, joomla, and wordpress. it should be noted that the process of tagging resources in delicious requires little technical expertise, so the task of assembling lists of resources can be accomplished by any librarian. the construction of a website infrastructure (presumably by the library’s webmaster) is a more complex task that may require some programming expertise. linkrolls and tagrolls the simplest way of sharing links is to point users directly to the desired andrew darby (adarby@ithaca.edu) is web services librarian, and ron gilmour (rgilmour@ithaca.edu) is science librarian at ithaca college library, ithaca, new york. figure 1. delicious linkroll page adding delicious data to your library website | darby and gilmour 101 delicious page. to share all the items labeled “biology” for the user account “iclibref,” one could disseminate the url http://delicious.com/iclibref/ biology. the obvious downside is that the user is no longer on your website, and they may be confused by their new location and what they are supposed to do there. linkrolls, a utility available from the delicious site, provides a number of options for generating code to display a set of bookmarked links, including what tags to display, the number, the type of bullet, and the sorting criterion (see figure 1).2 this utility creates simple html code that can be added to a website. a related tool, tagrolls, creates the ubiquitous delicious tag cloud.3 for many librarians, this will be enough. with the embedded linkroll code, and perhaps a bit of css styling, they will be satisfied with the results. however, delicious also offers more advanced methods of interacting with data. for more control over how delicious data appears on a website, the user must interact with delicious through rss, json or xml. rss like most web 2.0 applications, delicious makes its content available as rss feeds. feeds are available at a variety of levels, from the delicious system as a whole down to a particular tag in a particular account. within a library context, the most useful types of feeds will be those that point to lists of resources with a given tag. 
for example, the request http://feeds.delicious.com/rss/iclibref/biology returns the rss feed for the “biology” tag of the “iclibref” account, with items listed as follows:

<item rdf:about="http://icarus.ithaca.edu/cgi-bin/pwebrecon.cgi?bbid=237870">
  <title>darwin's dangerous idea (evolution 1)</title>
  <dc:date>2008-04-09T18:40:00Z</dc:date>
  <link>http://icarus.ithaca.edu/cgi-bin/pwebrecon.cgi?bbid=237870</link>
  <dc:creator>iclibref</dc:creator>
  <description>this episode interweaves the drama in key moments of darwin&#039;s life with documentary sequences of current research, linking past to present and introducing major concepts of evolutionary theory. 2001</description>
  <dc:subject>biology</dc:subject>
</item>

to display delicious rss results on a website, the webmaster must use some rss parsing tool in combination with a script to display the results. the xml_rss package provides an easy way to read rss using php.4 the code for such an operation might look like this:

<?php
require_once "XML/RSS.php";
$rss = new XML_RSS("http://feeds.delicious.com/rss/iclibref/biology");
$rss->parse();
echo "<ul>";
foreach ($rss->getItems() as $item) {
    echo "<li><a href=\"" . $item['link'] . "\">" . $item['title'] . "</a></li>";
}
echo "</ul>";
?>

this code uses xml_rss to parse the rss feed and then prints out a list of linked results. rss is designed primarily as a current awareness tool. consequently, a delicious rss feed only returns the most recent thirty-one items. this makes sense from an rss perspective, but it will not often meet the needs of librarians who are using delicious as a repository of resources. despite this limitation, the delicious rss feed may be useful in cases where currency is relevant, such as lists of recently acquired materials.

json

a second method to retrieve results from delicious is using javascript object notation or json.5 as with the rss feed method, a request with credentials goes out to the delicious server. the response returns in json format, which can then be processed using javascript. an example request might be http://feeds.delicious.com/v2/json/iclibref/biology. by navigating to this url, the json response can be observed directly. a json response for a single record (formatted for readability) looks like this:

delicious.posts = [
  {"u":"http:\/\/icarus.ithaca.edu\/cgi-bin\/pwebrecon.cgi?bbid=237870",
   "d":"darwin's dangerous idea (evolution 1)",
   "t":["biology"],
   "dt":"2008-04-09T06:40:00Z",
   "n":"this episode interweaves the drama in key moments of darwin's life with documentary sequences of current research, linking past to present and introducing major concepts of evolutionary theory. 2001"}
];

it is instructive to look at the json feed because it displays the information elements that can be extracted: “u” for the url of the resource, “d” for the title, “t” for a comma-separated list of related tags, “n” for the note field, and “dt” for the timestamp. to display results in a webpage, the feed is requested using javascript. then the json objects must be looped through and displayed as desired. alternately, as in the script below, the json objects may be placed into an array for sorting. the following is a simple example of a script that displays all of the available data with each item in its own paragraph. this script also sorts the links alphabetically. while rss returns a maximum of thirty-one entries, json allows a maximum of one hundred. the exact number of items returned may be modified through the count parameter at the end of the url. at the ithaca college library, we chose to use json because at the time, delicious did not offer the convenient tagrolls, and the results returned by rss were displayed in reverse chronological order and truncated at thirty-one items.
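a minimal sketch of the kind of sorting-and-display script described above, assuming the feed has been included with a script tag and has populated the delicious.posts array shown earlier (the count value, variable names, and markup below are illustrative assumptions, not taken from the original page):

<script type="text/javascript" src="http://feeds.delicious.com/v2/json/iclibref/biology?count=100"></script>
<script type="text/javascript">
// delicious.posts is filled in by the feed script included above.
// copy it, sort alphabetically by title ("d"), then write one paragraph per bookmark.
var posts = delicious.posts.slice();
posts.sort(function (a, b) {
  return a.d.toLowerCase().localeCompare(b.d.toLowerCase());
});
for (var i = 0; i < posts.length; i++) {
  var p = posts[i];
  var tags = p.t ? p.t.join(", ") : "";
  document.write("<p><a href=\"" + p.u + "\">" + p.d + "</a><br />" +
                 (p.n || "") + "<br />tags: " + tags + " (" + p.dt + ")</p>");
}
</script>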
currently, we have a single php page that can display any delicious result set within our library website template. librarians generate links with parameters that designate a page title, a comma-delimited list of desired tags, and whether or not item descriptions should be displayed. for example, www.ithacalibrary.com/research/delish_feed.php?label=biology%20films&tag=biology,biologyi&notes=yes will return a page that looks like figure 2. the advantage of this approach is that librarians can easily generate webpages on the fly and send the url to their faculty members or add it to a subject guide or other webpage. the php script only has to read the "$_GET" variables from the url and then query delicious for this content.

xml

delicious offers an application programming interface (api) that returns xml results from queries passed to delicious through https. for instance, the request https://api.del.icio.us/v1/posts/recent?&tag=biology returns an xml document listing the fifteen most recent posts tagged as "biology" for a given account. unlike either the rss or the json methods, the xml api offers a means of retrieving all of the posts for a given tag by allowing requests such as https://api.del.icio.us/v1/posts/all?&tag=biology. this type of request is labor intensive for the delicious server, so it is best to cache the results of such a query for future use. this involves the user writing the results of a request to a file on the server and then checking to see if such an archived file exists before issuing another request. a php utility called deliciousposts, which provides caching functionality, is available for free.6 note that the username is not part of the request and must be supplied separately. unlike the public rss or json feeds, using the xml api requires users to log in to their own account. from a script, this can be accomplished using the php curl functions:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $queryurl);
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$posts = curl_exec($ch);
curl_close($ch);

this code logs into a delicious account, passes it a query url, and makes the results of the query available as a string in the variable $posts. the content of $posts can then be processed as desired to create web content. one way of doing this is to use an xslt stylesheet to transform the results into html, which can then be printed to the browser:

/* create a new dom document from your stylesheet */
$xsl = new DOMDocument;
$xsl->load("mystylesheet.xsl");

/* set up the xslt processor */
$xp = new XSLTProcessor;
$xp->importStyleSheet($xsl);

/* create another dom document from the contents of the $posts variable */
$doc = new DOMDocument;
$doc->loadXML($posts);

/* perform the xslt transformation and output the resulting html */
$html = $xp->transformToXML($doc);
echo $html;

conclusion

delicious is a great tool for quickly and easily saving bookmarks. it also offers some very simple tools such as linkrolls and tagrolls to add delicious content to a website. but to exert more control over this data, the user must interact with the delicious api or feeds. we have outlined three different ways to accomplish this: rss is a familiar option and a good choice if the data is to be used in a feed reader, or if only the most recent items need be shown. json is perhaps the fastest method, but requires some basic scripting knowledge and can only display one hundred results.
the xml option involves more programming but allows an unlimited number of results to be returned. all of these methods facilitate the use of delicious data within an existing website. references 1. delicious, tools, http://delicious.com/help/tools (accessed nov. 7, 2008). 2. linkrolls may be found from your delicious account by clicking settings > linkrolls, or directly by going to http://delicious.com/help/linkrolls (accessed nov. 7, 2008). 3. tagrolls may be found from your delicious account by clicking settings > tagrolls, or directly by going to http://delicious.com/help/tagrolls (accessed nov. 7, 2008). 4. martin jansen and clay loveless, "pear::package::xml_rss," http://pear.php.net/package/xml_rss (accessed nov. 7, 2008). 5. introducing json, http://json.org (accessed nov. 7, 2008). 6. ron gilmour, "deliciousposts," http://rongilmour.info/software/deliciousposts (accessed nov. 7, 2008). copyright: regulation out of line with our digital reality? abigail j. mcdermott information technology and libraries | march 2012 abstract this paper provides a brief overview of the current state of copyright law in the united states, focusing on the negative impacts of these policies on libraries and patrons. the article discusses four challenges current copyright law presents to libraries and the public in general, highlighting three concrete ways intellectual property law interferes with digital library services and systems. finally, the author suggests that a greater emphasis on copyright literacy and a commitment among the library community to advocate for fairer policies is vital to correcting the imbalance between the interests of the public and those of copyright holders. introduction in july 2010, the library community applauded when librarian of congress james h. billington announced new exemptions to the digital millennium copyright act (dmca). those with visual disabilities and the librarians who serve them can now circumvent digital rights management (drm) software on e-books to activate a read-aloud function.1 in addition, higher education faculty in departments other than film and media studies can now break through drm software to include high-resolution film clips in class materials and lectures. however, their students cannot, since only those who are pursuing a degree in film can legally do the same.2 that means that english students who want to legally include high-resolution clips from the critically acclaimed film sense and sensibility in their final projects on jane austen's novel will have to wait another three years, when the librarian of congress will again review the dmca. the fact that these new exemptions to the dmca were a cause for celebration is one indicator of the imbalanced state of the copyright regulations that control creative intellectual property in this country. as the consumer-advocacy group public knowledge asserted, "we continue to be disappointed that the copyright office under the digital millennium copyright act can grant extremely limited exemptions and only every three years. this state of affairs is an indication that the law needs to be changed."3 this paper provides a brief overview of the current state of u.s. copyright law, especially developments during the past fifteen years, with a focus on the negative impact these policies have had and will continue to have on libraries, librarians, and the patrons they serve.
abigail j. mcdermott (ajmcderm@umd.edu) is graduate research associate, the information policy and access center (ipac), and master's candidate in library science, university of maryland, college park. this paper does not provide a comprehensive and impartial primer on copyright law, a complex and convoluted topic, instead identifying concerns about the effects an out-of-balance intellectual property system is having on the library profession, library services, and creative expression in our digital age. as with any area of public policy, the battles over intellectual property issues create an ever-fluctuating copyright environment, and therefore, this article is written to be current with policy developments as of october 2011. finally, this paper recommends that librarians seek to better educate themselves about copyright law, and some innovative responses to an overly restrictive system, so that we can effectively advocate on our own behalf, and better serve our patrons. the state of u.s. copyright law copyright law is a response to what is known as the "progress clause" of the constitution, which charges congress with the responsibility "to promote the progress of science and the useful arts . . . to this end, copyright assures authors the right to their original expression, but encourages others to build freely upon the ideas and information conveyed by a work."4 fair use, a statutory exception to u.s. copyright law, is a complex subject, but a brief examination of the principle gets to the heart of copyright law itself. when determining fair use, courts consider 1. the purpose and character of the use; 2. the nature of the copyrighted work; 3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and 4. the effect of the use upon the potential market for the copyrighted work.5 while fair use is an "affirmative defense" to copyright infringement,6 invoking fair use is not the same as admitting to copyright infringement. teaching, scholarship, and research, as well as instances in which the use is not-for-profit and noncommercial, are all legitimate examples of fair use, even if fair use is determined on a case-by-case basis.7 despite the byzantine nature of copyright law, there are four key issues that present the greatest challenges and obstacles to librarians and people in general: the effect of the dmca on the principle of fair use; the dramatic extension of copyright terms codified by the sonny bono copyright term extension act; the disappearance of the registration requirement for copyright holders; and the problem of orphan works. the digital millennium copyright act (dmca) the dmca has been controversial since its passage in 1998. title i of the dmca implements two 1996 world intellectual property organization (wipo) treaties that obligate member states to enforce laws that make tampering with drm software illegal. the dmca added chapter 12 to the u.s. copyright act (17 u.s.c.
§§ 1201–1205), and it criminalized the trafficking of “technologies designed to circumvent access control devices protecting copyrighted material from unauthorized information technology and libraries | march 2012 9 copying or use.”8 while film studios, e-book publishers, and record producers have the right to protect their intellectual property from illegal pirating, the dmca struck a serious blow to the principle of fair use, placing librarians and others who could likely claim fair use when copying a dvd or pdf file in a catch-22 scenario. while the act of copying the file may be legal according to fair use, breaking through any drm technology that prevents that copying is now illegal.9 the sonny bono copyright term extension act while the copyright act of 1790 only provided authors and publishers with twenty-eight years of copyright protection, the sonny bono copyright term extension act of 1998 increased the copyright terms of all copyrighted works that were eligible for renewal in 1998 to ninety-five years after the year of the creator’s death. in addition, all works copyrighted on or after january 1, 1978, now receive copyright protection for the life of the creator plus seventy years (or ninety-five years from the date of publication for works produced by multiple creators).10 jack valenti, former president of the motion picture association of american, was not successful in pushing copyright law past the bounds of the constitution, which mandates that copyright be limited, although he did try to circumvent this constitutional requirement by suggesting that copyright terms last forever less one day.11 the era of automatic copyright registration perhaps the most problematic facet of modern u.s. copyright law appears at first glance to be the most innocuous. the copyright act of 1976 did away with the registration requirement established by the copyright act of 1790.12 that means that any creative work “fixed in any tangible medium of expression” is automatically copyrighted at the moment of its creation.13 that includes family vacation photos stored on a computer hard drive; they are copyrighted and your permission is required to use them. the previous requirement of registration meant authors and creators had to actively register their works, so anything that was not registered entered the public domain, replenishing that important cultural realm.14 now that copyright attaches at the moment an idea is expressed through a cocktail napkin doodle or an outline, virtually nothing new enters the public domain until its copyright term expires—at least seventy years later. in fact, nothing new will enter the public domain through copyright expiration until 2019. until then, the public domain is essentially frozen in the year 1922.15 the problem of orphan works in addition, the incredibly long copyright terms that apply to all books, photographs, and sound recordings have created the problem of orphan works. orphan works are those works that are under copyright protection, but whose owners are difficult or impossible to locate, often due to death.16 these publications are problematic for researchers, librarians, and the public in general: orphan works are perceived to be inaccessible because of the risk of infringement liability that a user might incur if and when a copyright owner subsequently appears. 
consequently, many works that are, copyright: regulation out of line with our digital reality | mcdermott 10 in fact, abandoned by owners are withheld from public view and circulation because of uncertainty about the owner and the risk of liability.17 if copyright expired with the death of the author, or if there were a clause that would allow these works to pass into the public domain if the copyright holder’s heirs did not actively renew copyright for another term, then these materials would be far less likely to fall into legal limbo. currently, many are protected despite the fact that acquiring permission to use them is all but impossible. a study of orphan works in the collections of united kingdom public sector institutions found that these works are likely to have little commercial value, but high “academic and cultural significance,” and when contacted, these difficult-to-trace rights holders often grant permission for reproduction without asking for compensation.18 put another way, orphan works are essentially “locking up culture and other public sector content and preventing organizations from serving the public interest.”19 the row that arose in september 2011 between the hathitrust institutions and the authors guild over the university of michigan’s orphan works digitization project, with j. r. salamanca’s longout-of-print 1958 novel the lost country serving as the pivot point in the dispute, is an example of the orphan works problem. the fact that university of michigan associate university librarian john price wilkin was forced to assure the public that “no copyrighted books were made accessible to any students” illustrates the absurdity in arguing over whether it’s right to digitize books that are no longer even accessible in their printed form.20 libraries, digitization, and copyright law: the quiet crisis while one can debate if u.s. copyright law is still oriented toward the public good, the more relevant question in this context is the effect copyright law has on the library profession. drm technology can get in the way of serving library patrons with visual disabilities and every library needs to place a copyright disclaimer on the photocopiers, but how much more of a stumbling block is intellectual property law to librarians in general, and the advance of library systems and technology in particular? the answer is undeniably that current u.s. copyright legislation places obstacles in the way of librarians working in all types of libraries. while there are many ways that copyright law affects library services and collections in this digital area, three challenges are particularly pressing: the problem of ownership and licensing of digital content or collections; the librarian as de facto copyright expert; and copyright law as it relates to library digitization programs generally, and the google book settlement in particular. digital collections: licenses replace ownership in the past, people bought a book, and they owned that copy. there was little they could accidentally or unknowingly do to infringe on the copyright holder’s rights. 
likewise, when physical collections were their only concern, librarians could rely on sections 108 and 109 of the copyright law to protect them from liability when they copied a book or other work and when they loaned materials in their collections to patrons.21 today, we live partly in the physical world and partly in the digital world, reaching out and connecting to each other across fiber optic lines in the same way we once did around the water cooler. likewise, the digital means of production are widely distributed. in a multimedia world, where sharing an informative or entertaining video clip is as easy as embedding a link onto someone's facebook wall, the temptation to infringe on rights by distributing, reproducing, or displaying a creative work is all too common, and all too easy.22 many librarians believe that disclaimers on public-access computer terminals will protect them from lawsuit, but they do not often consider placing such disclaimers on their cd or dvd collections. yet a copyright holder would not have to prove the library is aware of piracy to accuse the library of vicarious infringement of copyright. the copyright holder may even be able to argue that the library sees some financial gain from this piracy if the existence of the material that is being pirated serves as the primary reason a patron visits the library.23 even the physical cd collection in the public library can place the institution in danger of copyright infringement; yet the copyright challenges raised by cutting-edge digital resources, like e-books, are undoubtedly more complicated. e-books are replacing traditional books in many contexts. like most digital works today, e-books are licensed, not purchased outright. the problem licensing presents to libraries is that licensed works are not sold, they are granted through contracts, and contracts can change suddenly and negate fair-use provisions of u.s. copyright law.24 while libraries are now adept at negotiating contracts with subscription database providers, e-books are in many ways even more difficult to manage, with many vendors requiring that patrons delete or destroy the licensed content on their personal e-readers at the end of the lending period.25 the entire library community was rocked by harpercollins's february 2011 decision to limit licenses on e-books offered through library ebook vendors like overdrive to twenty-six circulations, with many librarians questioning the publisher's assertion that this seemingly arbitrary limitation is related to the average lifespan of a single print copy.26 license holders have an easy time arguing that any use of their content without paying fees is a violation of their copyright.
that is not the case when a fair use argument is justified, and while many in the library community may acquiesce to these arguments, “in recent cases, courts have found the use of a work to be fair despite the existence of a licensing market.”27 when license agreements are paired with drm technology, libraries may find themselves managing thousands of micropayments to allow their users to view, copy, move, print, or embed, for example, the pdf of a scholarly journal article.28 in the current climate of reduced staff and shrinking budgets, managing these complex licensing agreements has the potential to cripple many libraries. the librarian as accidental copyright czar during a special libraries association (sla) q&a session on copyright law in the digital age, the questions submitted to the panel came from librarians working in hospitals, public libraries, academic libraries, and even law libraries. librarians are being thrust into the position of de facto copyright expert. one of the speakers mentioned that she must constantly remind the lawyers at copyright: regulation out of line with our digital reality | mcdermott 12 the firm she works for that they should not copy and paste the full text of news or law review journal articles into their e-mails, and instead, they should send a link. the basis of her argument is the third factor of fair use mentioned earlier: the amount or substantiality of the portion of the copyrighted work being used.29 since fair use is not a “bright line” principle, the more factors you have on your side the better when you are using a copyrighted work without the owners express permission.30 librarians working in any institution must seek express permission from copyright holders for any video they wish to post, or embed, on library-managed websites. e-reserves and streaming video, mainstays of many educators and librarians seeking to capture the attention of this digital generation, have become bright red targets for litigious copyright holders who want to shrink the territory claimed under the fair-use banner even further. many in the library community are aware of the georgia state university e-reserves lawsuit, cambridge university press et al. v. patton, in which a group of academic publishers have accused the school of turning its e-reserves system into a vehicle for intentional piracy.31 university librarians are implicated for not providing sufficient oversight. it has come to light that the association of american publishers (aap) approached other schools, including cornell, hofstra, syracuse, and marquette, before filing a suit against georgia state. generally, the letters come from aaps outside counsel and are accompanied by “the draft of a federal court legal complaint that alleges copyright infringement.”32 the aap believes that e-reserves are by nature an infringement of copyright law, so they demand these universities work with their association to draft guidelines for electronic content that support aaps “cost-per-click theory of contemporary copyright: no pay equals no click.”33 it seems that georgia state was not willing to quietly concede to aap’s view on the matter, and they are now facing the association in court.34 a decision in this case was pending at the time this article went to press. 
the case brought by the association for information and media equipment (aime) against ucla is similar, except it focuses on the posting of videos so they can be streamed by students on password-protected university websites that do not allow the copying or retention of the videos.35 ucla argued that the video streaming services for students are protected by the technology education and copyright harmonization (teach) act of 2002, which is the same act that allows all libraries to offer patrons online access to electronic subscription databases off-site through a user-authentication system.36 in addition, ucla argues that it is simply allowing its students to “time shift” these videos, a practice deemed not to infringe on copyright law by the supreme court in its landmark sony corp. v. universal city studios, inc. decision of 1984.37 the american library association (ala), association of research libraries (arl), and the association of college and research libraries (acrl) jointly published an opinion supporting ucla in this case. many in the wider library community sympathized with ucla’s library administrators, who cite budget cuts that reduced hours at the school’s media laboratory as one reason they must now offer students a video-streaming option.38 in the end, the case was dismissed, mostly due to the lack of standing aime had to bring the suite against ucla, a state agency, in federal court. while the judge did not https://exch.mail.umd.edu/owa/webreadyviewbody.aspx?t=att&id=rgaaaadxslsgbeewtj9q0yhnkit2bwboujgpo3tvsou0x%2bkwiyfqalrqjtslaaboujgpo3tvsou0x%2bkwiyfqapiuledyaaaj&attid0=eacjse6zzphuq6qbfqvhbhu8&attcnt=1&pn=1#footnote30#footnote30 information technology and libraries | march 2012 13 expressly rule on the fair-use argument ucla made, the ruling did confirm that streaming is not a form of video distribution and that the public-performance argument ucla made regarding the videos was not invalidated by the fact that they made copies of the videos in question.39 digitization programs and the google book settlement librarians looking to digitize print collections, either for preservation or to facilitate online access, are also grappling with the copyright monopoly. librarians who do not have the time or resources to seek permission from publishers and authors before scanning a book in their collection cannot touch anything published after 1922. librarylaw.com provides a helpful chart directed at librarians considering digitization projects, but the overwhelming fine print below the chart speaks to the labyrinthine nature of copyright.40 the google book settlement continues to loom large over both the library profession and the publishing industry. at the heart of debate is google’s library project, which is part of google book search, originally named google print.41 the library project allows users to search for books using google’s algorithms to provide at its most basic a “snippet view” of the text from a relevant publication. 
authors and publishers could also grant their permission to allow a view of select sample pages, and of course if the book is in the public domain, then google can make the entire work visible online.42 in all cases, the user will see a "buy this book" link so that he or she could purchase the publication from online vendors on unrelated sites.43 google hoped to sidestep the copyright permission quandary for a digitization project of this scale, announcing that it would proceed with the digitization of cooperative library collections and that it would be the responsibility of publishers and authors to actively opt out or vocalize their objection to seeing their works digitized and posted online.44 google attempted to turn the copyright permissions process on its head, which was the basis of the class action lawsuit authors guild v. google inc.45 before the settlement was reached, google pointed to kelly v. arriba soft corp as proof that the indexing functions of an internet search engine constitute fair use. in that 2002 case, the ninth circuit court of appeals found that a website's posting of thumbnail images, or "imprecise copies of low resolution, scaled down images," constitutes fair use, and google argued its "snippet view" function is equivalent to a thumbnail image.46 however, judge denny chin rejected the google book settlement in march 2011, citing the fact that google would in essence be "exploiting books without the permission of copyright owners" and could also establish a monopoly over the digitized books market. the decision did in the end hinge on the fact that google wanted to follow an opt-out program for copyright holders rather than an affirmative opt-in system.47 the google book settlement was dismissed without prejudice, leaving the door open to further negotiations between the parties concerned. going forward, the library community should be concerned with how google will handle orphan works and how its index of digitized works will be made available to libraries and the public. the 2008 settlement granted google the nonexclusive right to digitize all books published before january 5, 2009, and in exchange, google would have "paid 70% of the net revenue earned from uses of google book search in the united states to rights holders."48 in addition, google would have established the book rights registry to negotiate with google and others seeking to "digitize, index or display" those works on behalf of the rights holders.49 approval of the settlement would have allowed google to move forward with plans to expand google book search and "to sell subscriptions to institutions and electronic versions of books to individuals."50 the concern that judge denny chin expressed over a potential google book monopoly was widespread among the library community.
while the settlement would not have given google exclusive rights to digitize and display these copyrighted works, google planned to ensure via the settlement that it would have received the same terms the book rights registry negotiated with any third-party digital library, while also inoculating itself against the risk of any copyright infringement lawsuits that could be filed against a competitor.51 that would have left libraries vulnerable to any subscription price increases for the google books service.52 libraries should carefully watch the negotiations around any future google books settlement, paying attention to a few key issues.53 there was considerable concern that under the terms of the 2008 settlement, even libraries participating in the google books library project would need to subscribe to the service to have access to digitized copies of the books in their own collections.53 many librarians also vocalized their disappointment in google's abandonment of its fair-use argument when it agreed to the 2008 settlement, which, if it succeeded, would have been a boon to nonprofit, library-driven digitization programs.54 finally, many librarians were concerned that google's book rights registry was likely to become the default rights holder for the orphan works in the google books library, and that claims that google books is an altruistic effort to establish a world library conceals the less admirable aim of the project—to monetize out-of-print and orphan works.55 librarians as free culture advocates: implications and recommendations our digital nation has turned copyright law into a minefield for both librarians and the public at large. intellectual property scholar lawrence lessig failed in his attempt to argue before the supreme court that the sonny bono copyright term extension act was an attempt to regulate free speech and therefore violated the first amendment.56 but many believe that our restrictive copyright laws at least violate the intent of the progress clause of the constitution, if not the first amendment: "unconstrained access to past works helps determine the richness of future works. inversely, when past works are inaccessible except to a privileged minority, future works are impoverished."57 while technological advances have placed the digital means of production into the hands of the masses, intellectual property law is leading us down a path to self-censorship.58 as the profession "at the heart of both the knowledge economy and a healthy democracy,"59 it is in our best interest as librarians to recognize the important role we have to play in restoring the balance to copyright law. to engage in the debate over copyright law in the digital age, the library community needs to educate itself and advocate for our own self-interests, focusing on three key areas: 1. copyright law in the classroom and at the conference. we must educate new and seasoned librarians on the nature of copyright law, and the impact it has on library practice and systems. library schools must step up to the plate and include a thorough overview of copyright law in their library science curriculum. while including copyright law in a larger legal-issues class is acceptable, the complexity of current u.s.
copyright law demonstrates that this is not a subject that can be glossed over in a single lecture. furthermore, there needs to be a stronger emphasis on continuing education and training on copyright law within the library profession. the sla offers a copyright certificate program, but the reach of such programs is not wide enough. copyright law, and the impacts current policy has on the library profession, must be prominently featured at library conferences. the university of maryland university college’s center for intellectual property offers an online community forum for discussing copyright issues and policies, but it is unclear how many librarians are members.60 2. librarians as standard-bearers for the free culture movement. while the library copyright alliance, to which the ala, arl, and acrl all belong, files amicus briefs in support of balanced copyright law and submits comments to wipo, the wider library community must also advocate for copyright reform, since this is an issue that affects all librarians, everywhere. as a profession, we need to throw our collective weight behind legislative measures that address the copyright monopoly. there have been a number of unfortunate failures in recent years. s. 1621, or the consumers, schools, and libraries digital management awareness act of 2003, attempted to address a number of drm issues, including a requirement that access controlled digital media and electronics include disclosures on the nature of the drm technology in use.61 h.r. 107, the digital media consumers rights act of 2003, would have amended the dmca to allow those researching the technology to circumvent drm software while also eliminating the catch-22 that makes circumventing drm software for fair-use purposes illegal. the balance act of 2003 (h.r. 1066) included provisions to expand fair use to the act of transmitting, accepting, and saving a copyrighted digital work for personal use. all of this legislation died in committee, as did h.r. 5889 (orphan works act of 2008) and s. 2913 (shawn bentley orphan works act of 2008). both bills would have addressed the orphan works dilemma, clearly spelling out the steps one must take to use an orphan work with no express permission from the copyright holder, without fear of a future lawsuit. could a show of support from the library community have saved these bills? it is impossible to know, but it is in our best interest to follow these legislative battles in the future and make sure our voice is heard. 3. libraries and the creative commons phenomenon. in addition, librarians need to take part in the creative commons (cc) movement by actively directing patrons towards this world of digital works that have clear, simple use and attribution requirements. creative commons was founded in 2001 with the support of the center for the study of the public domain at duke university school of law.62 the movement is essentially about free culture, and the idea that many people want to share their creative works and allow others to use or build off of their efforts easily and without seeking their permission. 
it is not intended to supplant copyright law, and lawrence lessig, one of the founders of creative commons, has said many times that he believes intellectual property law is necessary and that piracy is inexcusable.63 instead, a cc license states in clear terms exactly what rights the creator reserves, and conversely, what rights are granted to everyone else.64 as lawrence lessig explains, you go to the creative commons website (http://creativecommons.org); you pick the opportunity to select a license: do you want to permit commercial uses or not? do you want to allow modifications or not? if you allow modifications, do you want to require a kind of copyleft idea that other people release the modifications under a similarly free license? that is the core, and that produces a license.65 there are currently six cc licenses, and they include some combination of the four license conditions defined by creative commons: attribution (by), share alike (sa), noncommercial (nc), and no derivatives (nd).66 each of the four conditions is designated by a clever symbol, and the six licenses display these symbols after the creative commons trademark itself, two small c's inside a circle.67 there are "hundreds of millions of cc licensed works" that can be searched through google and yahoo, and some notable organizations that rely on cc licenses include flickr, the public library of science, wikipedia, and now whitehouse.gov.68 all librarians not already familiar with this approach need to educate themselves on cc licenses and how to find cc licensed works.69 while librarians must still inform their patrons about the realities of copyright law, it is just as important to direct patrons, students, and colleagues to cc licensed materials, so that they can create the mash-ups, videos, and podcasts that are the creative products of our web 2.0 world.70 the creative commons system is not perfect, and "creative commons gives the unskilled an opportunity to fail at many junctures."71 yet that only speaks to the necessity of educating the library community about the "some rights reserved" movement, so that librarians, who are already called upon to understand traditional copyright law, are also educating our society about how individuals can protect their intellectual property while preserving and strengthening the public domain. conclusion the library community can no longer afford to consider intellectual property law as a foreign topic appropriate for law schools but not library schools. those who are behind the slow extermination of the public domain rely on the complexity of copyright law, and the misunderstanding of the principle of fair use, to make their arguments easier and to browbeat libraries and the public into handing over the rights the constitution bestows on everyone. librarians need to engage in the debate over copyright law to retain control over their collections, and to better serve their patrons. in the past, the library community has not hesitated to stand up for the freedom of speech and self-expression, whether it means taking a stand against banning books from school libraries or fighting to repeal clauses of the usa patriot act. today's library patrons are not just information consumers—they are also information producers. therefore it is just as critical for librarians to advocate for their creative rights as it is for them to defend their freedom to read.
the internet has become such a strong incubator of creative expression and innovation that the innovators are looking for a way to shirk the very laws that were designed to protect their interests. in the end, the desire to create and innovate seems to be more innate than those writing our intellectual property laws expected. perhaps financial gain is less of a motivator than the pleasure of sharing a piece of ourselves and our worldview with the rest of society. whether that's the case or not, what is clear is that if we do not roll back legislation like the sonny bono copyright term extension act and the dmca so as to save the public domain, the pressure to create outside the bounds of the law is going to turn more inventors and artists into anarchists, threatening the interests of reasonable copyright holders. as librarians, we must curate and defend the creative property of the established, while fostering the innovative spirit of the next generation. as information, literature, and other creative works move out of the physical world, and off the shelves, into the digital realm, librarians need to do their part to ensure legislation is aligned with this new reality. if we do not, our profession may suffer first, but it will not be the last casualty of the copyright wars. references 1. beverly goldberg, "lg unlocks doors for creators, consumers with dmca exceptions," american libraries 41, no. 9 (summer 2010): 14. 2. ibid. 3. goldberg, "lg unlocks doors." 4. christopher alan jennings, fair use on the internet, prepared by the congressional research service (washington, dc: library of congress, 2002), 2. 5. ibid., 1. 6. ibid. 7. brandon butler, "urban copyright legends," research library issues 270 (june 2010): 18. 8. robin jeweler, "digital rights" and fair use in copyright law, prepared by the congressional research service (washington, dc: library of congress, 2003), 5. 9. rachel bridgewater, "tipping the scales: how free culture helps restore balance in the age of copyright maximalism," oregon library association quarterly 16, no. 3 (fall 2010): 19. 10. charles w. bailey jr., "strong copyright + drm + weak net neutrality = digital dystopia?" information technology & libraries 25, no. 3 (summer 2006): 117; u.s. copyright office, "copyright law of the united states," under "chapter 3: duration of copyright," http://www.copyright.gov/title17 (accessed december 8, 2010). 11. dan hunter, "culture war," texas law review 83, no. 4 (2005): 1130. 12. bailey, "strong copyright," 118. 13. u.s. copyright office, "copyright law of the united states," under "chapter 1: subject matter and scope of copyright," http://www.copyright.gov/title17 (accessed december 8, 2010). 14. bailey, "strong copyright," 118. 15. mary minnow, "library digitization table," http://www.librarylaw.com/digitizationtable.htm (accessed december 8, 2010).
16. brian t. yeh, "orphan works" in copyright law, prepared by the congressional research service (washington, dc: library of congress, 2002), summary. 17. ibid. 18. jisc, in from the cold: an assessment of the scope of "orphan works" and its impact on the delivery of services to the public (cambridge, uk: jisc, 2009), 6. 19. ibid. 20. andrew albanese, "hathitrust suspends its orphan works release," publishers weekly, sept. 16, 2011, http://www.publishersweekly.com/pw/by-topic/digital/copyright/article/48722-hathitrust-suspends-its-orphan-works-release-.html (accessed october 13, 2011). 21. u.s. copyright office, "copyright law of the united states," under "chapter 1." 22. u.s. copyright office, copyright basics (washington, dc: u.s. copyright office, 2000), www.copyright.gov/circs/circ1.html (accessed december 8, 2010). 23. mary minnow, california library association, "library copyright liability and pirating patrons," http://www.cla-net.org/resources/articles/minow_pirating.php (accessed december 10, 2010). 24. bailey, "strong copyright," 118. 25. overdrive, "copyright," http://www.overdrive.com/copyright.asp (accessed december 13, 2010). 26. josh hadro, "harpercollins puts 26 loan cap on ebook circulations," library journal (february 25, 2011), http://www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp (accessed october 13, 2011). 27. butler, "urban copyright legends," 18. 28. bailey, "strong copyright," 118. 29. library of congress, fair use on the internet, 3. 30. ibid., summary. 31. matthew k. dames, "education use in the digital age," information today 27, no. 4 (april 2010): 18. 32. ibid. 33. dames, "education use in the digital age," 18. 34. matthew k. dames, "making a case for copyright officers," information today 25, no. 7 (july 2010): 16. 35. william c. dougherty, "the copyright quagmire," journal of academic librarianship 36, no. 4 (july 2010): 351. 36. ibid. 37. library of congress, "digital rights" and fair use in copyright law, 9. 38. dougherty, "the copyright quagmire," 351. 39. kevin smith, "streaming video case dismissed," scholarly communications @ duke, october 4, 2011, http://blogs.library.duke.edu/scholcomm/2011/10/04/streaming-video-case-dismissed/ (accessed october 13, 2011).
40. dougherty, "the copyright quagmire," 351. 41. librarylaw.com, "library digitization table." 42. kate m. manuel, the google library project: is digitization for purposes of online indexing fair use under copyright law, prepared by the congressional research service (washington, dc: library of congress, 2009), 1–2. 43. jeweler, "digital rights" and fair use in copyright law, 2. 44. ibid. 45. ibid. 46. manuel, the google library project, 2. 47. amir efrati and jeffrey a. trachtenberg, "judge rejects google books settlement," wall street journal, march 23, 2011, http://online.wsj.com/article/sb10001424052748704461304576216923562033348.html (accessed october 13, 2011). 48. jennings, fair use on the internet, 7. 49. manuel, the google library project, 2. 50. ibid., 9–10. 51. ibid. 52. ibid. 53. pamela samuelson, "google books is not a library," huffington post, october 13, 2009, http://www.huffingtonpost.com/pamela-samuelson/google-books-is-not-a-lib_b_317518.html (accessed december 10, 2009). 54. ivy anderson, "hurtling toward the finish line: should the google book settlement be approved?" against the grain 22, no. 3 (june 2010): 18. 55. samuelson, "google books is not a library." 56. jeweler, "digital rights" and fair use in copyright law, 3. 57. bailey, "strong copyright," 116. 58. cushla kapitzke, "rethinking copyrights for the library through creative commons licensing," library trends 58, no. 1 (summer 2009): 106. 59. ibid. 60. university of maryland university college, "member community," center for intellectual property, http://cipcommunity.org/s/1039/start.aspx (accessed february 21, 2011). 61. robin jeweler, copyright law: digital rights management legislation, prepared by the congressional research service (washington, dc: library of congress, 2004), summary. 62. creative commons, "history," http://creativecommons.org/about/history/ (accessed december 8, 2010). 63. lawrence lessig, "the vision for the creative commons? what are we and where are we headed? free culture," in open content licensing: cultivating the creative commons, ed. brian fitzgerald (sydney: sydney university press, 2007), 42. 64. steven j. melamut, "free creativity: understanding the creative commons licenses," american association of law libraries 14, no. 6 (april 2010): 22.
65. lessig, "the vision for the creative commons?" 45. 66. creative commons, "about," http://creativecommons.org/about/ (accessed december 8, 2010). 67. ibid. 68. ibid. 69. bridgewater, "tipping the scales," 21. 70. ibid. 71. woody evans, "commons and creativity," searcher 17, no. 9 (october 2009): 34. communications robert n. bland and mark a. stoffan returning classification to the catalog the concept of a classified catalog, or using classification as a form of subject access, has been almost forgotten by contemporary librarians. recent developments indicate that this is changing as libraries seek to enhance the capabilities of their online catalogs. the western north carolina library network (wncln) has developed a "classified browse" feature for its shared online catalog that makes use of library of congress classification. while this feature is not expected to replace keyword searching, it offers both novice and experienced library users another way of identifying relevant materials. classification to modern librarians is almost exclusively a tool for organizing and arranging books (or other physical media) on shelves. the role of classification as a form of subject access to collections through the public catalog—the concept of the classified catalog—has been almost forgotten. from a review of the literature, it does not appear that any major u.s. library has supported a classified catalog since boston university libraries closed its classified catalog in 1973.1 to be sure, nearly all online catalogs nowadays have some form of what is called a "call number search" or a "shelf list browsing capability" that is based on classification, but this is a humble and little-used feature because it requires that a call number (or at least a call number stem) be known and entered by the user, when no verbal index to the classification is available online. this search methodology provides nothing in the way of a systematic and hierarchical arrangement and display of subject classes, complete with accompanying verbal descriptions, that the classified catalog seeks to accomplish. but as karen markey put it in her recent review of classification and the online catalog, "to this day, the only way in which most end users experience classification online is through their online catalog's shelf list browsing capability."2 there are signs that this situation is changing.
the recently released endeca-based catalog at north carolina state university libraries uses library of congress classification (lcc) in a prominent way to provide for browsing of the collection without need of the user entering any search terms at all.3 the lcc outline is presented on the main search entry screen with verbal captions describing the classes, allowing users to navigate through several layers of the outline to retrieve with a click of the mouse bibliographic records for materials assigned to those classes. in a converse way, the new online catalog being developed by the florida center for library automation uses lc classification as a kind of back end to keyword searching. following a keyword search, a user can limit the results set by confining it to a designated lcc range chosen again from an online display of the lcc outline.4 both of these catalogs use three levels of the lcc outlines from the most general single letter level classes (q for sciences, for example) through the two-letter classes for more specific subjects (qc for physics, qd for chemistry) to an even finer granularity with designated numeric ranges within the two-letter classes identifying specific subdisciplines (qd241–qd441 for organic chemistry). the western north carolina library network (wncln) has been experimenting with classification as a retrieval tool in the public catalog for some time,5 and it has just implemented the first version of what we call a classified catalog browse in our innovative millennium system.6 like the two catalogs just mentioned, the classified catalog browse is based on software that is external to the ils software and integrated with that software through linking and webpage designs. also, like the previously discussed catalogs, it is based on scanning and incorporating into the catalog the lcc outlines as published by the library of congress. the wncln catalog goes a step further, however, in bringing the entire lc classification online down to the individual class number level—at least that portion of the classification that is actually used in our catalog. this is done through extracting class numbers and associated subject headings from bibliographic and authority records in our catalog and building an online classification display with descriptive captions (a verbal index) from these bibliographic and authority records. the result is a hierarchical display (to continue the example from above) not only of qd241–qd441 for organic chemistry but within this, qd271 for chromatographic analysis, qd273 for organic electrochemistry, and so on. the design of our interface presents this as a fourth level to which the user can "drill" down beginning with q for sciences, qd for chemistry, qd241–qd441 for organic chemistry, and finally qd271 for chromatographic analysis (figures 1–4). robert n. bland (bland@unca.edu) is associate university librarian for technical services, university of north carolina at asheville. mark stoffan (mstoffan@fsu.edu) is associate director for library technology at florida state university, tallahassee. figure 1. level 1 of lc classification in wncln webpac
from this fourth level, the user can click an associated link to execute a search of the catalog by the class number in question using the call number search function of the ils (figure 5); a second link for that class number will present the same list of titles but sorted by "most popular" (i.e., the items that have been checked out most frequently) from a separate but linked external database (figure 6); a third link will search the catalog by the associated subject heading for the class (figure 7); and finally a fourth link will show other subject headings that have been used in the catalog with this specific class number (figure 8). what does having the lc classification online in our catalog accomplish for our users? part of the point of our project is to answer this very question. chan and others7 have theorized that incorporation of the classification system into the catalog as a retrieval tool can provide enhanced subject access that is not possible through standard alphabetical subject headings and keyword searching alone. figure 2. level 2 of lc classification in wncln webpac figure 3. level 3 of lc classification in wncln webpac early studies by markey and others at oclc seem to have confirmed this with an online version of the dewey decimal classification.8 since (as far as we know) the library of congress classification has not really been tested as an online retrieval tool in a live catalog up to now, our implementation will serve as a kind of test bed for this hypothesis. how actual users in fact exploit this feature is of course only something that experience will tell. a cursory look, however, would seem to indicate definite advantages to this approach. first of all, many studies indicate that two of the major sources of failure with subject retrieval in online systems are misspellings and poor choice of search terms by users. no matter how far we may try to go with keyword searching and relevance ranking, no online library retrieval system is likely to do much with "napolyan's fites" when what the user is looking for are books on the military campaigns of the emperor napoleon. figure 4. level 4 of lc classification in wncln webpac figure 5. call number search display in wncln with the classification system and verbal index online most of these problems are eliminated, since users can navigate to a subject of choice without ever entering a search term. moreover, given the design of the verbal index based on library of congress subject headings, the user is led to actual subject headings used in the catalog, which should provide for precise retrieval beyond what is ordinarily possible with keywords even when entered correctly, and (importantly) a retrieval set that is always greater than zero. the infamous and frustrating problem of "no hits" is eliminated. secondly, the great attraction of the classified catalog approach is that it arranges subjects in a hierarchical fashion based on integral connections among the topics in a way that cannot be accommodated in an alphabetic subject approach because of the vagaries of spelling.
the topics "violence," "social conflict," and "conflict management," for example, obviously spread out in an alphabetical subject list, are collocated in the classified catalog under the class "hm1106–hm1171 interpersonal relations" (figure 9), allowing the user to find references to materials all in one place in the catalog just as the classification system arranges the books on these subjects all in one place on the library shelves. alphabetical subject indexes, of course, attempt to ameliorate this problem by means of cross references, but there is clearly a limit to how far one can go with this approach. finally, the classified catalog provides an efficient way for collection development staff to review specific subject areas and to make better informed purchasing decisions regarding the collections. in the wncln design, the classes at the bottom level of the hierarchy are linked to the catalog by call number and subject headings, and each class carries an indication of the number of items assigned that class number. the classes are also linked to an external database that shows the frequency of circulation of items in the class as well as title and date of publication. a quick review of this list can inform a bibliographer of circulation rates as well as the currency of materials in the class. as mentioned, the captions that are displayed with the lcc hierarchy in the wncln catalog are extracted from subject headings and authority records present in our catalog. readers familiar with lc marc record services may wonder why we took this approach to building the verbal index rather than using the information available in the lc marc classification records. machine-readable records for lc classification are now available in marc format. these files include records for each individual class number with a corresponding verbal caption. while we did experiment with using these files, cost and complexity determined that we go another direction. the lc classification files are huge, containing hundreds of thousands of classification numbers that we do not now and probably never would use in our wncln catalog simply because we (unlike lc) have no materials on these subjects. while these records could be filtered out by matching against lc class numbers that are found in our catalog and discarding non-matches, this would add yet another level of processing to an already complex process, as would handling the lc table subdivisions that are used in the lc schedules and that are separate from the standard class numbers. secondly, the lc marc classification files require a subscription costing several thousand dollars per year, as well as a substantial payment for the retrospective file needed to begin building the database of class numbers. on the other hand, extracting the verbal index from subject headings and authority records in our own catalog adds no cost to our processing. these headings and authority records are created and maintained, of course, as a standard part of the cataloging process, and accordingly only headings and authority records that match materials owned by our libraries are included. figure 6. most used titles display figure 7. subject search display in wncln
the description or caption that is finally assigned to a class number is determined by a computer program that analyzes both authority records and bibliographic records found in our catalog that are assigned the class number in question, with the subject heading that is used most frequently as a primary subject generally being the one selected as the caption for the class. these class numbers with associated subject headings are then processed by another program, which eventually builds html files representing the classification with links to the catalog and the external “most used” database as alluded to above. these standard html files, along with the files representing the first three levels of the lcc outline, are then loaded onto our web server to display the classification system online.

figure 8. related subjects display in wncln
figure 9. collocation of terms in the classified catalog

a second advantage of this approach is that using the actual subject heading as the caption or description for the class makes it possible to use that caption as a direct link to a subject search in the catalog, as shown in the illustration in figure 4. a disadvantage is that the captions from the lcc files are designed to retain the hierarchy that is represented in the printed schedules in a visual way by formatting and indenting. captions derived from subject headings do not retain this feature. we have tried to accommodate this in our display of the schedules by replicating the class number ranges from the outline in the appropriate place in the full display of the schedules, thereby building a hierarchy from these ranges as genus and the individual class numbers as species. this does not manage to retain the full hierarchy of the lc schedules as shown in the printed schedules or as represented in lc’s online classification web product, but it is, we hope, an adequate surrogate for the purpose intended.

in fact, in most cases, the captions derived from the extracted subject and authority headings match quite nicely the captions included in the actual lcc schedules, as shown in a comparison from the psychology classification of the hierarchy as it appears in our classified catalog browse and as it appears online in lc’s classification web product (figures 10 and 11). what is missing in our representation of the classification is not so much the subject content of the classes but the notes and information about literary form that are included in the actual lcc schedules. thus, our lcc online is not a strict image of the lcc as it would appear in printed or electronic form based on the hierarchies and captions devised by the lc. nor for that matter—despite our terminology—is it a true classified catalog, since only one classification (that used in the call number) is assigned to each item, whereas in a true classified catalog multiple classifications may be assigned to an item. it is nevertheless an online presentation of the lcc with links to our catalog that seeks to enhance subject access by exploiting the power of the classification system to organize materials by integral subject classes and to show relationships among subjects by a hierarchical arrangement of classes as genus, species, and subspecies.
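the caption-selection and html-generation steps just described can be pictured roughly as follows. this is a minimal sketch under assumed record shapes and an invented query syntax; the actual programs, record formats, and webpac urls are not specified in the article.

from collections import Counter
from urllib.parse import quote_plus

def choose_caption(class_number, bib_records):
    # pick the subject heading used most often as a primary subject among
    # bibliographic records carrying this class number
    headings = [r["primary_subject"] for r in bib_records
                if r["class_number"] == class_number and r.get("primary_subject")]
    return Counter(headings).most_common(1)[0][0] if headings else None

def class_entry_html(class_number, caption, item_count):
    # one bottom-level entry: the caption links to a subject search, the
    # class number links to a call number search (query syntax is invented)
    return ('<li><a href="/search?type=subject&q=%s">%s</a> '
            '(<a href="/search?type=call&q=%s">%s</a>, %d items)</li>'
            % (quote_plus(caption), caption, quote_plus(class_number),
               class_number, item_count))

if __name__ == "__main__":
    records = [
        {"class_number": "HM1106", "primary_subject": "interpersonal relations"},
        {"class_number": "HM1106", "primary_subject": "interpersonal relations"},
        {"class_number": "HM1106", "primary_subject": "social interaction"},
    ]
    caption = choose_caption("HM1106", records)
    print(class_entry_html("HM1106", caption, len(records)))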
and, perhaps just as importantly, it is an implementation that requires no additional cataloging effort on the part of our staff, nor any additional costs for data or processing other than the investment we have made in development of the software and the small amount of time required weekly to update the files. we do not expect that the classified catalog browse will replace keyword or subject searching as the primary means of subject access to our collections. we do believe that it promises to be a powerful and effective complement to our standard ils searches that may improve subject searching for both the novice and the experienced user.

figure 10. class captions in the wncln webpac
figure 11. class captions in lc’s classification web

references
1. margaret hindle hazen, “the closing of the classified catalog at boston university,” library resources and technical services 18 (1974): 221–26.
2. karen markey, joan s. mitchell, and diane vizine-goetz, “forty years of classification online: final chapter or future unlimited?” cataloging and classification quarterly 42 (2006): 1–63.
3. north carolina state university libraries, “ncsu libraries online catalog,” north carolina state university, www.lib.ncsu.edu/catalog (accessed mar. 23, 2007).
4. florida center for library automation, “state university libraries of florida–endeca,” board of governors, state of florida, http://catalog.fcla.edu (accessed mar. 23, 2007).
5. the western north carolina library network is a consortium consisting of the libraries of appalachian state university, the university of north carolina at asheville, and western carolina university.
6. western north carolina library network, “library catalog,” western north carolina library network, http://wncln.wncln.org (accessed mar. 23, 2007).
7. lois mai chan, “library of congress classification as an online retrieval tool: potentials and limitations,” information technology and libraries 5 (1986): 181–92.
8. karen markey and anh demeyer, dewey decimal classification online project: evaluation of a library schedule and index integrated into the subject searching capabilities of an online catalog: final report to the council on library resources (dublin, ohio: oclc, 1986), report no. oclc/opr/rr-86/1.

welcome to my first ital president’s column. each president only gets a year to do these columns, so expectations must be low all around. my hope is to stimulate some thinking and conversation that results in lita members’ ideas being exchanged and to create real opportunities to implement those ideas. my first column i thought i would keep short and sweet, and discuss just a few of the ideas that have been rattling around in my head since the 2007 midwinter lita town meeting, which have been enhanced by a number of discussions among librarians over the last six months. with any luck, these thoughts might have some bearing on what any of those ideas could mean to our organization.

first off, i don’t think i can express how weird this whole presidential appellation is to me. i am extremely proud to be associated with lita, and honored and surprised at being elected. i come from a consortia environment and an extremely flat organization. solving problems is often a matter of throwing all the parties in a room together and hashing it out until solutions are arrived at. i’ve been a training librarian for quite a while now, and pragmatic approaches to problem solving are my central focus.
i’m a consortia wrangler, a trainer, and a technology pusher, and i hope my approach is, and will be, to listen hard and then see what can be accomplished. so in my own way, i find being president kind of on the embarrassing side. it’s like not knowing what to do with your hands when you’re speaking in public.

at the lita town meeting (http://litablog.org/2007/06/17/lita-town-meeting-2007-report/) it was pretty obvious that members want community in all its various forms, face-to-face in multiple venues and online in multiple venues. it’s also pretty obvious from the studies done by pew internet and american life and by oclc that our users, and in particular our younger users, really want community. the web 2.0 and the library 2.0 movements are responses to that desire. as a somewhat flippant observation, we spent a generation educating our kids to work in groups, and now we shouldn’t be surprised that they want to work and play in groups. many of us work effectively in collaborative groups every day. we find it exciting, productive, and even fun. it’s an environment that we would like to create for our patrons, in-house and virtually. it’s what we would like to see in our association.

having been to every single top tech trends program and listened to the lita trendsters, one theme that often comes up is that complaining about the systems our vendors deliver can at times be pointless, because they simply deliver what we ask for. there is of course a corollary to this. once a system is in the marketplace, adding functionality often becomes centered around the low-hanging fruit. as a fictitious example, a vendor might easily add the ability for the patron to change the colors of the display, but adding a shelf list browse might take serious coding to create. so through discussions and rfp, we ask for and get the pretty colors while the browsing function waits, a form of procrastination. so then does innovation come only when all the low-hanging fruit has finally been plucked, and there’s nothing else to procrastinate on?

as social organizations, libraries, ala, lita and other groups, it appears that we have plucked all the low-hanging fruit of web 1.0. e-mail and static web pages have been done to death. as a pragmatist, what concerns me most is implementation. what delivery systems should and can we adopt and develop to fulfill the promise of services we’d like? can we ensure that barriers to participation are either eliminated or so low as to include everyone? i like to think that web 2.0 is innovation toward mirroring how we personally want to work and play and how we want our social structures to perform. so how can we make lita mirror how we want to work and play? i do know it’s not just making everything a wiki.

mark beatty (mbeatty@wils.wisc.edu) is lita president 2007/2008 and trainer, wisconsin library services, madison.
president’s column
mark beatty

enterprise digital asset management (dam) systems are beginning to be explored in higher education, but little information about their implementation issues is available. this article describes the university of michigan’s investigation of managing and retrieving rich media assets in an enterprise dam system. it includes the background of the pilot project and descriptions of its infrastructure and metadata schema. two case studies are summarized—one in healthcare education, and one in teacher education and research.
experiences with five significant issues are summarized: privacy, intellectual ownership, digital rights management, uncataloged materials backlog, and user interface and integration with other systems. u niversities are producers and repositories of large amounts of intellectual assets. these assets are of various forms: in addition to text materials, such as journal papers, there are theses, performances from per­ forming arts departments, recordings of native speakers of indigenous languages, or videos demonstrating surgical procedures, to name a few.1 such multimedia materials have not, in general, been available outside the originat­ ing academic department or unit, let alone systematically cataloged or indexed. valuable assets are “lost” by being locked away in individual drawers or hard disks.2 managing and retrieving multimedia assets are not problems confined to academia. media companies such as broadcast news agencies and movie studios also have faced this problem, leading to their adoption of digital asset management (dam) systems. in brief, dam systems are not only repositories of digital­rich media content and the associated metadata, but also provide management functionalities similar to database manage­ ment systems, including access control.3 a dam system can “ingest digital assets, store and index assets for easy searching, retrieve assets for use in many environments, and manage the rights associated with those assets.”4 in summer 2000, the university of michigan (u­m) tv station, umtv, was searching for a video archive solution. that fall, a u­m team visited cnn and experienced a “eureka!” moment. as james hilton, then­associate provost for academic, information, and instructional technology affairs, later wrote, “building a digital asset management into the infrastructure . . . will be the digital equivalent of bringing indoor plumbing to the campus.”5 in spring 2001, an enterprise dam system was considered for inclusion in the university infrastruc­ ture. upon completion of a limited proof­of­concept project, a cross­campus team developed the request for proposals (rfp) for the dams living lab, which was issued in july 2002 and subsequently awarded to ibm and ancept. in august 2003, hardware and software installation began in the living lab.6 by 2006, the project changed its name to bluestream to appeal to the grow­ ing mainstream user base.7 six academic and two support units agreed to partner in the pilot: ■ school of education ■ school of dentistry ■ college of literature, science, and the arts ■ school of nursing ■ school of pharmacy ■ school of social work ■ information technology central services ■ university libraries the academic units were asked to provide typical and unusual digital media assets to be included in the living lab pilot. the pilot focused on rich media, so the preferred types of assets were digital video, images, and other multimedia delivered over the web. the living lab pilot was designed to address four key questions: ■ how to create a robust infrastructure to process, manage, store, and publish digital rich media assets and their associated metadata. ■ how to build an environment where assets are eas­ ily searched, shared, edited, and repurposed in the academic model. ■ how to streamline the workflow required to create new works with digital rich media assets. ■ how to provide a campuswide platform for future application of rights declaration techniques (or other ip tools) to existing assets. 
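as a toy illustration of that definition of a dam system—ingest, indexing for easy searching, retrieval, and management of rights and access—the sketch below models an asset record and a tiny in-memory repository. every field name and the crude acl stand-in are this sketch's own simplifications, not the living lab's actual design (its um_core schema and access control lists are described later in this article).

from dataclasses import dataclass, field

@dataclass
class Asset:
    # hypothetical, simplified record
    asset_id: str
    title: str
    mime_type: str
    metadata: dict = field(default_factory=dict)      # descriptive metadata
    rights: str = ""                                   # rights statement
    allowed_users: set = field(default_factory=set)    # crude stand-in for an acl

class MiniDam:
    """Toy repository: ingest, keyword search, and access enforcement."""
    def __init__(self):
        self._store = {}

    def ingest(self, asset: Asset):
        self._store[asset.asset_id] = asset

    def search(self, keyword: str, user: str):
        hits = []
        for a in self._store.values():
            text = " ".join([a.title, *map(str, a.metadata.values())]).lower()
            if keyword.lower() in text and user in a.allowed_users:
                hits.append(a.asset_id)
        return hits

if __name__ == "__main__":
    dam = MiniDam()
    dam.ingest(Asset("vid-001", "jazz composition", "video/mp4",
                     {"creator": "faculty member"}, "cc-by", {"instructor1"}))
    print(dam.search("jazz", "instructor1"))   # ['vid-001']
    print(dam.search("jazz", "student9"))      # []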
this article describes the challenges encountered during the research­and­development phase of the u­m enterprise dam system project known as the living lab. the project has now ended, and the implemented project is known as bluestream. enterprise digital asset management system pilot: lessons learned yong-mi kim, judy ahronheim, kara suzuka, louis e. king, dan bruell, ron miller, and lynn johnson yong-mi kim (kimym@umich.edu) is carat-rackham fellow 2004, school of information; judy ahronheim (jaheim@umich .edu) is metadata specialist, university libraries; kara suzuka (ksuzuka@umich.edu) is assistant research scientist, school of education; louis e. king (leking@umich.edu) is managing producer, digital media commons; dan bruell (danlbee@umich .edu) is director, school of dentistry; ron miller (ronalan@umich .edu) is multimedia services position lead, school of education; and lynn johnson (lynjohns@umich.edu) is associate professor, school of dentistry, university of michigan, ann arbor. article title | author 5enterprise dam system pilot | kim, ahronheim, suzuka, king, bruell, miller, and johnson 5 ■ background of the living lab: u-m enterprise dam system project an enterprise project such as the living lab at u­m can have significant impact on an institution’s teaching and learning activities by allowing all faculty and students easy yet secure access to media assets across the entire campus. such extensive impact can only be obtained by overcoming numerous and varied obstacles and by docu­ menting actual implementation experiences employed to overcome those challenges. enterprise dam system vendors such as stellent, artesia, and canto list clients from many different industry sectors, including gov­ ernment and education, but provide no detailed case studies on their web sites.8 information regarding the status of enterprise dam system projects and specific issues that arose during implementation is difficult to find. information publicly available for enterprise dam system projects in higher education is usually in the form of white papers or proposals that do not cover the actual implementations.9 given the high degree of interest and the number of pilot projects announced in recent years, this shortcoming has prompted the writing of this article, which presents the most important lessons learned dur­ ing the first phase of the living lab pilot project with the hope that these experiences will be valuable to other academic institutions considering similar projects. as part of its core mission, u­m strives to meet the teaching and learning needs of the entire campus. thus, the living lab pilot solicited participation from a diverse cross­section of the university’s departments and units with the goal of evaluating the use of varied teaching and learning assets for the system. from the beginning, it was expected that this system would handle assets in many different forms, such as digital video or digitized images, and also accommodate various organizational schemas and metadata for different collections. this sets the u­m enterprise dam system apart from projects that focus on only one type of collection or define a large monolithic metadata schema for all assets. data were gathered through interviews with asset providers, focus groups with potential users, and a review of the relevant literature. a number of barriers were identified during the pilot’s first phase. 
while there were some technical barriers, the most significant barriers were cultural and organizational ones for which technical solutions were not clear. perhaps the most significant cultural divide was between the culture of academia and the culture of the commercial sector. cultural and organizational assumptions from commercial business practices were embedded in the design of the products initially used in the living lab implementation. thus, an additional implementation challenge was determining which issues should be resolved through technical means, and which should be solved by changing the academic culture. this is expected to be an ongoing challenge.

■ architecture (building the infrastructure)

an enterprise dam system in an academic community such as u-m needs to support a wide variety of services in order to meet the numerous and varied teaching, research, service, and administrative functions. figure 1 illustrates the services that are provided by an enterprise dam system and concurrently demonstrates its complexity. the left column, process, lists a few of the media processes that various producers will use to prepare their media for subsequent ingestion into the enterprise dam system; the middle column, manage, demonstrates the various functions of the enterprise dam system; while the third column, publish, lists a subset of the publishing venues for the media.

figure 1. component services of the living lab (created by louis e. king, ©2004 regents of the university of michigan)

because an enterprise dam system supports a variety of rich media, a number of software tools and workflows are required. figure 2 illustrates this complexity and describes the architecture and workflow used to add a video segment. the organization of figure 2 parallels that of figure 1. the left column, process, indicates that flip factory by telestream is used to convert digital video from the original codec to one that can be used for playback.10 in addition, videologger by virage uses media analysis algorithms to extract key frames and time codes from the video as well as to convert the speech to text for easy searching.11 the middle column, manage, illustrates tools from ibm that help create rich media as well as tools from stellent, such as its ancept media server (ams), that store and index the rich media assets.12 the third column, publish, illustrates two examples of how these digital video assets could be made available to the end user. one strategy is as a real video stream using real network’s helix server, and the other as a quicktime video stream using ibm’s videocharger.13 a thorough discussion of all of the software and hardware that make up u-m’s dam system is beyond the scope of this article. however, a list of the software components with links to their associated web sites is provided in figure 3.

from the beginning the living lab pilot aimed for a diverse collection of assets to promote resource discovery and sharing across the university. figure 4 illustrates how the living lab is expected to fit into the varied publishing venues that comprise the campus teaching and learning infrastructure. existing storage and network infrastructures are used to deliver media assets to various software systems on campus. the living lab is used to streamline the cataloging, searching, and retrieving processes encountered during academic teaching and research activities.
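the process → manage → publish pipeline sketched in figures 1 and 2 can be caricatured as a chain of steps. the functions below are hypothetical stand-ins for the commercial tools named above (flip factory, videologger, the ancept media server, and the streaming servers); they show only the shape of the workflow, not any vendor api.

def transcode(path):
    """process: convert the original codec to a playback format."""
    return {"source": path, "playback_copy": path + ".mp4"}

def analyze(asset):
    """process: extract key frames, time codes, and speech-to-text."""
    asset["key_frames"] = ["00:00:01", "00:01:30"]        # placeholder values
    asset["transcript"] = [("00:00:05", "introduction")]  # (time code, text)
    return asset

def store_and_index(asset, repository):
    """manage: store the asset and its metadata so it can be searched."""
    asset_id = f"asset-{len(repository) + 1}"
    repository[asset_id] = asset
    return asset_id

def publish(asset_id):
    """publish: hand back streaming urls for the delivery servers."""
    return [f"rtsp://stream.example.edu/real/{asset_id}",
            f"http://stream.example.edu/quicktime/{asset_id}"]

if __name__ == "__main__":
    repo = {}
    asset = analyze(transcode("lecture_raw.dv"))
    asset_id = store_and_index(asset, repo)
    print(publish(asset_id))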
the following example describes how the enterprise dam system fits into the future campus cyberinfrastructure. a faculty member in the school of music is a jazz composer. one of her compositions is digitally stored in the enterprise dam system along with the associated metadata (cataloging information) that will allow the piece to be found during a search. that single audio file is then found, accessed, and used by five unique publishing venues—the course web site, the university web site, a radio broadcast, the music store, and the library archive. the faculty member uses the piece in her jazz interpretation course and thus includes a link to the composition on her sakai course web site.14 when she receives an award, the u-m issues a press release on the u-m web site that includes a link to an audio sample. concurrently, michigan radio uses the enterprise dam system to find the piece for a radio interview with her that includes an audio segment.15 her performance is published by block m records, u-m’s web-based recording label, and, lastly, the library permanently stores the valuable piece in its institutional archive, deep blue.16

figure 2. the living lab architecture (created by louis e. king, ©2004 regents of the university of michigan)
figure 3. software used in the living lab: north american systems ancept media server (www.nasi.com/ancept.php); ibm content manager (www-306.ibm.com/software/data/cm/cmgr/mp/); telestream flip factory (www.telestream.net/products/flipfactory.htm); virage videologger (www.virage.com/content/products/index.en.html); ibm videocharger (www-306.ibm.com/software/data/videocharger/); real networks helix server (www.realnetworks.com/products/media_delivery.html); apple quicktime streaming server (www.apple.com/quicktime/streamingserver/); handmade software image alchemy (www.handmadesw.com/products/image_alchemy.htm)

■ metadata (managing assets within the academic model)

the vision for enterprise dam at u-m is for digital assets to not only be stored in a secure repository, but also be findable, accessible, and usable by the appropriate persons in the university community in their academic endeavors. information about these assets, or metadata, is a crucial component of fulfilling this vision. an important question that arises is, “what kind of metadata should be required for the assets in the living lab?”

to help answer this question, potential asset providers were interviewed regarding their current approach to metadata, such as whether they used a particular schema and how well it met their purposes. not surprisingly, asset providers had widely varied metadata implementations. while the assets intended for the living lab pilot all had some metadata, the scope and granularity varied greatly. metadata storage and access methods also varied, ranging from databases implemented using commercial database products and providing web front-ends, to a combination of paper and spreadsheet records that had to be consulted together to locate a particular asset. the assets to be used in the living lab pilot consisted primarily of high- and low-resolution digital images and digitized video. these interviews also generated a number of requirements for any potential living lab metadata schema.
it was determined that the schema should be able to:
■ describe heterogeneous collections at an appropriate level of granularity and detail, allowing for domain-specific description needs and vocabularies;
■ allow metadata entry by non-specialists;
■ enable searches across multiple subject areas and collections;
■ provide provenance information for the assets; and
■ provide information on authorized uses of the assets for differing classes of users.

an examination of the literature showed a general consensus that no single metadata standard could meet the requirements of heterogeneous collections.17 projects as diverse as pb core and vius at penn state adopted the approach of drawing from multiple existing metadata standards.18 their approaches differ in that pb core is a combination of selected metadata elements from a number of standards plus additional elements unique to pb core, while vius opted for a merged superset of all the elements in the standards selected.

in interviews with asset providers (usually faculty), cataloging backlog and the lack of personnel for generating and entering metadata emerged as consistent problems. there was concern that an overly complex or specialized schema would aggravate the cataloging backlog by making metadata generation time-consuming and cumbersome. budgetary constraints made hiring professional metadata creators prohibitive. another aspect of the personnel problem was that adequate description required subject specialists who were, ideally, the resource authors or creators. but subject specialists, while familiar with the resources and the potential audience for them, may not be knowledgeable about how to produce high-quality metadata, such as controlled vocabularies or consistent naming formats.

to address these issues, the simpler and more straightforward indexing process offered by dublin core (dc) was selected as the starting point for the metadata schema in the living lab.19 dc was originally developed to support resource discovery of a digital object, with resource authors as metadata creators. dc is a relatively small standard, but is extensible through the use of qualifiers. it has been adopted as a standard by a number of standards organizations, such as iso and ansi. a body of research exists on its use in digital libraries and its efficacy for author-generated metadata, and there are metadata crosswalks between dc and most other metadata standards. a number of other subject-specific standards were also examined for more specialized description needs and controlled vocabularies: vra core, ims learning resource meta-data specification, and snodent.20

in the end, the project leaders elected to adopt a rather novel approach to metadata by not defining one metadata schema for all assets. by taking advantage of the power of multiple approaches (for example, pb core for mix-and-match, and vius for a merged superset) each collection can have its own schema as long as it contains the elements of a more general, lowest-common-denominator schema. this overall schema, um_core, was defined based on dc. the elements are prefixed with dc or um to specify the schema origin.

figure 5. the u-m enterprise dam system metadata scheme um_core: dc_title, dc_creator, dc_subject, um_secondarysubject, dc_description, dc_publisher, dc_contributor, dc_date, dc_type, dc_format, dc_identifier, dc_source, dc_language, dc_relation, dc_coverage, dc_rights, um_publisher, um_alternatepublisher

um_publisher and um_alternatepublisher identify who should be contacted about problems or questions regarding that particular asset. um_secondarysubject is a cross-collection subject classification schema developed by the u-m libraries, and helps map the asset into the context of the university.

figure 4. the enterprise dam system as the future campus infrastructure for academic venues (created by louis e. king, ©2004 regents of the university of michigan)
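a minimal sketch of the lowest-common-denominator rule is shown below: a collection-specific schema may add its own elements but should still contain every um_core element. the element names are taken from figure 5; treating them as a python set and checking for missing elements is this sketch's own simplification, not living lab code.

# um_core element names from figure 5; requiring them as a set is an assumption
UM_CORE = {
    "dc_title", "dc_creator", "dc_subject", "um_secondarysubject",
    "dc_description", "dc_publisher", "dc_contributor", "dc_date",
    "dc_type", "dc_format", "dc_identifier", "dc_source", "dc_language",
    "dc_relation", "dc_coverage", "dc_rights",
    "um_publisher", "um_alternatepublisher",
}

def missing_core_elements(collection_schema: set) -> set:
    """Return any um_core elements absent from a collection-specific schema."""
    return UM_CORE - collection_schema

if __name__ == "__main__":
    # a hypothetical oral pathology schema adds domain-specific elements
    oral_pathology = UM_CORE | {"um_diagnosis", "um_stain", "um_magnification"}
    print(missing_core_elements(oral_pathology))                 # set()
    print(missing_core_elements({"dc_title", "dc_date", "um_publisher"}))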
in adopting such an approach to metadata, metadata creation is seen not as a one-shot process, but a collaborative and iterative one. for example, on initial ingestion into the living lab, the only metadata entered for an image may be dc_title, dc_date, and um_publisher. additional metadata may be entered as users discover and use the asset, or as input from a subject specialist becomes available.

the discussion so far has focused on metadata produced with human intervention. a number of metadata elements can be obtained from the digital objects through the use of software. in an enterprise dam system, this is referred to as automatically generated metadata and is what can be directly obtained from a computer file, such as file name, file size, and file format. this type of metadata is expected to play a larger role as an increasing proportion of assets will be born digital and come accompanied by a rich set of embedded metadata. for example, images or video produced by current digital cameras contain exchangeable image file format (exif) metadata, which include such information as image size, date produced, and camera model used. when available, the living lab presents automatically generated metadata to the user in addition to the elements in um_core.

thus, asset metadata in the living lab can be produced in two ways: automatically generated through a tool such as virage videologger in the case of video, or entered by hand through the current dam system interface.21 in addition, if metadata already exist in a database format, such as filemaker, this can be imported once the appropriate mappings are defined.22 videologger, a video analysis tool for digital video files, can extract video key frames, add closed captions, determine whether the audio is speech or music, convert speech to text, and identify (through facial recognition) the speaker(s). these capabilities allow for more sophisticated searching of video assets compared to the current capabilities of search engines such as google. some degree of content-based searching can now be done, as opposed to searching that relies on the title and other textual description provided separately from the video itself. for the pilot, particular interest was expressed in the speech recognition capability of videologger. videologger generates a time-coded text of spoken keywords with 50 to 90 percent accuracy. the result is not nearly accurate enough to generate a transcript, but does indeed provide robust data for searching the content of video. given the diversity of assets in the living lab, it is clear that the university can utilize low-cost keyword analysis to enhance search granularity as well as the more expensive, fully accurate hand-processed transcript.
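a rough sketch of how such a time-coded keyword text can be exploited for searching is shown below; the transcript shape (time code, recognized word) and the keyword list are invented for illustration and do not reflect videologger's actual output format.

# hypothetical shape for speech-to-text output: (seconds, recognized word)
transcript = [
    (12.4, "hello"), (13.0, "today"), (45.2, "flossing"),
    (88.7, "anesthetic"), (152.3, "flossing"),
]

def keyword_hits(transcript, keywords):
    """Return, for each keyword, the time codes at which it was recognized,
    so a reviewer can jump straight to those points in the video."""
    wanted = {k.lower() for k in keywords}
    hits = {k: [] for k in wanted}
    for time_code, word in transcript:
        if word.lower() in wanted:
            hits[word.lower()].append(time_code)
    return hits

if __name__ == "__main__":
    print(keyword_hits(transcript, ["flossing", "consent"]))
    # {'flossing': [45.2, 152.3], 'consent': []}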
■ workflow examples

two instructional challenges demonstrate how an enterprise digital asset management system can provide a solution to instructional dilemmas and how a unique workflow needs to be created for each situation. the challenges related to each project are described.

school of dentistry

the educational dilemma

the u-m school of dentistry uses standardized patient instructors (spis) to assess students’ abilities to interact with patients. carefully trained actors play carefully scripted patient roles. dental students interview the patients, read their records, and make decisions about the patients’ care, all in a few minutes (see figure 6). each session is video recorded. currently, spis grade each student on predetermined criteria, and the video recording is only used if a student contests the spis’ grade. ideally, a dental educator should review each recording and also grade each student. however, the u-m class size of 105 dental students causes a recording-based grading process to be prohibitively expensive in terms of personnel time. in addition, the use of digital videotape makes it difficult for the recorded sessions to be made available to the students. because the tapes are part of the student’s record, they cannot be checked out. if a student wants to review a tape, she or he must make an appointment and review it in a supervised setting.

living lab solution

the u-m school of dentistry’s living lab pilot attempted simultaneously to improve the spi program and lower the cost of faculty grading of spi sessions through three goals:
1. use speech-to-text analysis to create an easily searched transcript;
2. streamline the recording process; and
3. make the videos available online for student review.
each of these challenges and the current results are summarized.

speech-to-text analysis

it was hypothesized that an effective speech-to-text analysis of the spi session could enable a grader quickly to locate video segments that: (1) represented student discussion of specific dental procedures; and (2) contained student verbalizations of key clinical communication skills.23 in summer 2005, nine spi sessions were recorded and a comparison between manual transcription and the automated speech-to-text processes was conducted. the transcribed audio track was manually marked up with time-coded reference points and inserted as an annotation track to the video. those same videos also were analyzed through the videologger speech-to-text service in the living lab, resulting in an automatically generated, time-coded text track. lastly, six keywords were selected that, if spoken by the student, indicated the correct use of either a dental procedure or good communication skills. keyword searches were conducted on both the manual transcription and the speech-to-text analysis. three results were calculated on the keyword searches of both versions of all nine recorded sessions. they were: (1) the number of successful keyword searches; (2) the number of successful search results that did not actually contain the keywords (false positives); and (3) the time required to complete the manual transcription and text-to-speech analysis of the recordings. the results demonstrated that the speech-to-text analysis matched the manual transcription 20 to 60 percent of the time. also, the speech-to-text process resulted in a false positive less than 10 percent of the time.
lastly, the time required to complete the speech­to­text analysis of a session was two minutes, while the average time required to complete a manual transcription of the same session was 180 minutes. while not perfect, the results are encouraging that manually transcribing the audio is no longer necessary. improvements are being made to the clinical environment and microphones so that a higher­quality recording is obtained. it is anticipated that those changes combined with improved software will improve the results of the speech­to­text analysis sufficiently so that automated keyword searches can be conducted for grading purposes. streamlining the recording process scale is a significant challenge to capturing 105 spi inter­ actions in a short amount of time. two to three weeks are required for the entire class of 105 students to complete a series of spi experiences, with as a many as four concur­ rent sessions at any given time. in summer 2006, it was decided to record 50 percent of one class. logistically, one camera operator could staff two stations simultane­ ously. the stations had to be physically close enough for a one­person operation, but not so close that audio from the adjacent session was recorded. the optimal distance was about thirty to thirty­five feet of separa­ tion. staggering the start times of each session allowed the camera operator to make sure each was started with optimal settings. since the results of the speech­to­text analysis were linked to the quality of the equipment used, two prosumer minidv cameras with professional quality microphones and tripods also were purchased. student availability an important strength of living lab is the ability to make the assets both protected and accessible. the current itera­ tion does not have an interface for user­created access con­ trol lists (acl), instead they need to be created by a systems administrator. once a systems administrator has created an acl, academic technology support staff can add or subtract people. to satisfy family educational rights and protection act regulations, a separate acl is needed for each student for the spi project.24 currently, the possibility of including the spi recordings and their associated transcriptions as ele­ ments of an eportfolio is being explored.25 in the meantime, students can use url references to include these videos and transcripts in such web­based tools as eportfolios and course management systems. discussion as the challenges of improving speech­to­text analysis, recording workflow, and user­created acls are overcome, the spi program will be able to operate at a new and previ­ ously unimagined level. a more objective keyword grad­ ing process can be instituted. students will be easily able to search through and review their sessions at times and locations that are convenient for them. living lab also will allow students to view their eportfolio of spi interactions and witness how they have improved their communica­ tion skills with patients. for the first time in healthcare education, a clinician’s communication skills, such as bedside or chairside manner, will be able to be taught and assessed using objective methods. school of education the challenge of using records of practice for research and professional education classroom documentation plays a significant role in educational research and in the professional education of teachers at the u­m school of education. 
collections of videos capturing classroom lessons, small-group work, and interviews with students and teachers—as well as other classroom records, such as images of student work, teacher lesson plans, and assessment documents—are basic to much of the research that takes place in the school of education. however, there also is a large and increasing demand to use these records from real classrooms for educational purposes at the u-m and beyond, creating rich media materials for helping preservice and practicing teachers learn to see, understand, and engage in important practices of teaching. this desire to create widely distributed educational materials from classroom documentation raises two important challenges: first, there is the important challenge of protecting the identity of children (and, in some cases, teachers); and second, there is the difficult task of ensuring that the classroom records can be easily accessed by individuals who have permission to view and use the records while being inaccessible to those without permission.

one research and materials development project at the u-m school of education has been exploring the use of living lab to support the critical work of processing classroom records for use in research and in educational materials, and the distribution and protection of classroom records as they are integrated into teacher education lessons and professional development sessions at the u-m and other sites in the united states. the findings and challenges of these efforts are summarized below.

processing classroom records

the classroom records used in the pilot were processed in three main ways, producing three different types of products:
■ preservation copies are high-quality formats of the classroom records with minimal loss of digital information that can be read by modern computers with standard software. these files are given standardized filenames, cleaned of artifacts and minor irregularities, and de-identified (that is, digitally altered to remove any information that could reveal the identity of the students and, in some cases, of the teachers).
■ working copies are lower-quality versions of the preservation copies that are still sufficient for printing or displaying and viewing. trading some degree of quality for smaller file sizes and thus data rates, the working copies are easier for people to use and share. additionally, these files are further developed to enhance usability: videos are clipped and composited to feature particular episodes; videos also are subtitled, flagged with chapter markers (or other types of coding), and embedded with links for accessing other relevant information; images of student and teacher work are organized into multipage pdfs with bookmarks, links, and other navigational aids; and all files are embedded with metadata for aiding their discovery and revealing information about the files and their contents.
■ distribution copies are typically similar in quality to the working copies but are often integrated into other documents or with other content; they are labeled with copyright information and statements about the limitations of use. they are, in many cases, edited for use on a variety of platforms and copy protected in small ways (for example, word and powerpoint files are converted to pdfs).
the living lab was found to support this processing of classroom records in two important ways.
first, the system allowed for the setup and use of workflows that enabled undergraduate students hired by the project to upload processed files into the system and walk through a series of quality checks, focused on different aspects of the products. so, for example, when checking the preservation copies, one person was assigned to check the preservation copy against the actual artifact to make sure everything was captured adequately and that the resulting digital file was named properly (“quality check 1”). another individual was assigned to make sure the content was cleaned up properly and that no identifying information appeared anywhere (“quality check 2”). and finally, a third person checked the file against the metadata to make sure that all basic information about the file was correct (“quality check 3”). files that passed through all checks were organized into collections accessible to project members and others (“organize”). files that failed along the way were sent back to the beginning of the workflow (the “drawing board”), fixed, and checked again (see figure 7).

figure 6. a dental student interviewing an spi.

second, living lab allowed asset and collection development to be carried out collaboratively and iteratively, enabling different individuals to add value in different ways over time. undergraduate students did much of the initial processing and checking of the assets; skilled staff members converted subtitles into speech metadata housed within living lab; and, eventually, project faculty and graduate students will add other types of analytic codes and content-specific metadata to the assets.

distribution and protection of classroom records

in addition to supporting the production of various types of assets and collections, the living lab supported the distribution and protection of classroom records for use in education settings both at u-m and other institutions. for example, almost fifteen hours of classroom videos from a third-grade mathematics class were made accessible to and were used by instructors and students in the college of education at michigan state university. in a different context, approximately ten minutes of classroom video was made available to instructors in mathematics departments at brigham young university, the university of georgia, and the city college of new york to use in courses for elementary teachers. each asset (and its derivatives) housed within living lab has a url that can be embedded within web pages and online course-management systems, allowing for a great deal of flexibility in how and where the assets are presented and used. at the same time, each call to the server is checked and, when required, users are prompted to authenticate by logging in before any assets are delivered. this has great potential for easily, seamlessly, and safely integrating living lab assets into a variety of web spaces. although this feature has indeed allowed for a great deal of flexibility, there were and continue to be challenges with creating an integrated and seamless experience for school of education students and their instructors. for example, depending on a variety of factors, such as user operating systems and web browser combinations, users might be prompted for multiple logins.
additionally, the login for the living lab server can be quite unforgiving, locking out users who fail to login properly in the first few tries and providing limited communication about what has occurred and what needs to be done to correct the situation.

discussion

during the living lab pilot a number of workflow challenges were overcome that now allow numerous and varied types of media related to classroom records to be ingested into living lab, and derivatives created. this demonstrates that living lab is ready for complex media challenges associated with instruction. however, the next challenge of delivering easily and smoothly to others still remains. once authentication and authorization are conducted using single sign-on techniques that allow users to access assets securely from living lab through other systems, assets will be able to be incorporated into web-based materials and used to enhance the instruction of teachers in ways that have yet to be conceived.

figure 7. living lab workflow: drawing board → quality check 1 → quality check 2 → quality check 3 → organize

■ privacy, intellectual property, and copyright

during the course of the pilot, a number of issues emerged. among these were some of the most critical issues that institutions considering embarking on a similar asset management system need to address. these issues are:
■ privacy;
■ intellectual ownership and author control of materials;
■ digital rights management and copyright;
■ uncataloged materials backlog; and
■ user interface and integration with other campus systems.
up to this point, enterprise dam systems had been developed and used primarily by commercial enterprises—for example, cnn and other broadcasting companies. using a product developed by and for the commercial sector brought to the fore the cultural differences between the academy and the commercial sector (see figure 8). the first three issues in the previous list are related to the differing cultures of commercial enterprise and academia. these issues are addressed below. the fourth and fifth issues are addressed in the section “other important issues.”

privacy

videos of medical procedures can be of tremendous value to students. in their own words, “watching is different from reading about it in a textbook.” but subjects have the right to retract their consent regarding the use of their images or treatment information for educational purposes. this creates a dilemma: if other assets have been created using it, do all of them have to be withdrawn? for example, if a professor included an image from the university’s dam system in a classroom powerpoint or keynote presentation, and subsequently included the presentation in the university’s dam system, what is the status of this file if the patient withdraws consent for use of her or his treatment information?26 when must the patient’s request be fulfilled? can it be done at the end of the semester, or does it need to be completed immediately? if the request must be fulfilled immediately, the faculty member may not have sufficient time to find a comparable replacement. waiting until the end of the semester helps balance patient privacy with teaching needs. in either case, files must be withdrawn from the enterprise dam system and links to those files removed. consent status and asset relationships must be part of the metadata for an asset to handle such situations.
consideration must be given to associating a digital copy of all consent forms with the corresponding asset within an enterprise dam system. intellectual ownership and author control of materials authors’ rights, as recognized by the berne convention for the protection of literary and artistic works, have two components.27 one, the economic right in the work, is what is usually recognized by copyright law in the united states, being a property right that the author of the work can transfer to others through a contract. the other component—the moral rights of the author—is not explicitly acknowledged by copyright law in the united states and thus may escape consideration regarding ownership and use of intellectual property. moral rights include the right to the integrity of the work, and thus come into play in situations where a work is distorted or misrepresented. unlike economic rights, moral rights cannot be transferred and remain with the author. in a university setting, the university may own the economic right for a researcher’s work, in the form of copyright, but the researcher retains moral rights. the following incident illustrates what can happen when only property rights are taken into account. a digital video segment of a medical procedure was being shown as part of a living lab demo at a university it showcase. because the u­m held the copyright for that particular videotape, no problems were foreseen regarding its usage. a faculty member recognized the video as one she had cre­ ated several years ago and expressed great concern that it had been used for such a purpose without her knowledge or consent. the concern arose from the fact that video showed an outdated procedure. while the faculty member continued to use this video in the classroom, she felt this was different from having it available through the living lab. in the classroom, the faculty member alerted students to the outdated practices during the viewing, and she had full control over who viewed it. the faculty member felt she lost this control and additional clarification when the video became available through living lab. that is, her work was now misrepresented and her moral rights as an author were violated. digital rights management and copyright in the academic world, digital rights management (drm) is becoming a necessary component in disseminating intellectual products of all forms.28 however, at this time there are few standards and no technical drm solution that works for all media on all platforms. therefore, u­m has elected to use social rather than technical means of managing digital rights. the living lab metadata schema provides an element for rights statements, dc_rights. these metadata, combined with education of the univer­ sity community about copyright, fair use, and the highly granular access control and privileges management of the system, provide the community with the knowledge and tools to use the assets ethically. the university can establish rights declarations to use in the dc_rights field as standards are developed and prec­ edent is established in the courts. 
these declarations may include copyright licenses developed by the university legal counsel as well as those from the creative commons.29

current solution—access control lists

a clear difference between the cultures of commercial enterprises and academia emerged regarding access to assets, administered through acls.30

figure 8. differences between commercial and university uses of a dam system:
commercial dam system model — assets held centrally; access, roles, and privileges managed centrally; monolithic metadata frameworks.
university dam system model — federated ownership of assets; distributed management of access, privileges, and roles; federated metadata schema; agnostic user interface(s) re: privileges, ownership.

an acl specifies who is allowed to access an asset and how they can use it. in commercial settings, access to assets is centrally managed, while in academia, with its complex set of intellectual and copyright issues, it is preferable to have them managed by the asset holders. university users repeatedly asked for the ability to define acls for each asset in the living lab. currently, end users and support staff cannot define acls—only system administrators can create them. the middleware for user-defined acls has been fully developed, and the user interface for user-defined acls will be made available in the next version.

this capability is important in the academic environment because the composition of group(s) of people requiring access to a particular asset is fluid and can span many organizational boundaries, both within and outside the university. a research group owning a collection of assets may want to restrict access for various reasons, including requirements set forth by an institutional review board (irb, a university group that oversees research projects involving human subjects), or regulations such as the health insurance portability and accountability act of 1996, which addresses patient health information privacy.32 the research group will want flexible access control, as research group members may collaborate with others inside and outside the university. the original irb approval may specify that confidentiality of the subjects must be maintained, and collected data, such as video or transcripts, can only be viewed by those directly involved in the research project and cannot be browsed by other researchers not involved in the study or the public at large. in another situation, a collection of art images may only be viewed by current students of the institution, thus requiring a different acl. this situation is still open to interpretation. some say patient consent regarding the use of information for instructional purposes cannot be withdrawn for the use of existing information at the home institution. they can only withdraw it for the use of future assets. others may feel that patients can withdraw permission for the use of their patient assets.

other important issues

uncataloged materials backlog

what emerged from interviews and focus groups with content providers was that while there was no lack of assets they would like to see online, a large proportion of these assets had never been cataloged or even systematically labeled in some form.
this finding may be attributed in part to the pilot focusing on existing assets that have previously not been available for widespread sharing—such as the files stored on faculty hard disks and departmental servers—only known to a favored few. owners or creators of these materials had not consciously thought about sharing these materials or making them available to others. librarians, in contrast, have devel­ oped systems and practices to ensure the findability of materials that enter the library. asset owners were more than willing to have the assets placed online, but did not have the time or resources to provide the appropriate metadata. hiring personnel to create the metadata is problematic, as there is a limit to the metadata that can be entered by non­experts, and experts often are scarce and expensive. for example, for a collection of oral pathology images of microscopic slides, a subject expert must provide the diagnoses, stain, magnification, and other information for each image. without these details, merely putting the slides online is of little value, but these metadata cannot be provided by laypeople. collaborative metadata creation, allowing multiple metadata authors and iterations, may be one solution to this problem. a number of studies indicate that both organiza­ tional support and user­friendly metadata creation tools are necessary for resource authors to create high­ quality metadata.33 some of the backlog may be resolved through development of tools aimed at resource authors. in addition, increased use of digital file formats with embedded metadata may contribute to reducing future backlog by requiring less human involvement in meta­ data creation. faculty need to be taught that metadata raises the value and utility of assets. as they come to understand the essential role metadata plays, they, too, will invest in its creation. user interface and integration with other systems an enterprise dam system has two basic types of uses: by producers and by users. producers tend to be digital media technologists who create the digital assets and ingest them into the enterprise dam system. the users are the faculty, students, and staff who use these digital assets in their teaching, learning, or research. the research and development version of the enter­ prise dam system, living lab, works well for digital asset producers, but not for the users of these digital assets. ingestion and accessing processes are quite complex and are not currently integrated with other campus systems, such as the online library catalog or the sakai­based, campuswide course management sys­ tem, ctools.34 digital producers who are comfortable with complex systems are able to ingest and access rich media. however, users have to log onto the enterprise dam system and navigate its complex user interface. the level of complexity of accessing the media can cre­ ate a barrier to adoption and use. if the level of complex­ ity for accessing the assets is too high for users, then the system also is too complex to expect users to contribute to the ingestion of digital assets. 14 information technology and libraries | december 200714 information technology and libraries | december 2007 in both student and faculty focus groups there was concern about the technical skills needed for faculty use of an enterprise dam system in the classroom. ideally faculty should be able to incorporate assets seamlessly from the enterprise dam system to their classroom mate­ rials, such as powerpoint or keynote presentations. 
then, the presentations created on their computers should dis­ play without glitches on the classroom system. obviously faculty members cannot be expected to troubleshoot in the classroom when display problems occur. if the enterprise dam system is perceived as difficult to use, or as requiring a lot of troubleshooting by the user, this will discourage adoption by the faculty. this creates additional demands on the enterprise dam system, and potential additional it staffing demands for the academic units wanting to promote enterprise dam system use. when a problem is experienced in the classroom, the departmental it support, not the enterprise dam system support team, will be the first to be called. ideally, an enterprise dam system should be linked to the campus it infrastructure such that users or con­ sumers do not interact with the dam system itself, but rather through existing academic tools, such as the library gateway, course management system, or departmental web sites. having to learn a new system could be a sig­ nificant barrier to use for many potential dam system users in academia. ■ conclusions and lessons learned the vision of a dam system that would allow faculty and students easy yet secure access to myriad rich media assets is extremely appealing to members of the academy. conducting the pilot projects revealed numerous techni­ cal and cultural problems to resolve prior to achieving this vision. the authors anticipate that other institutions will need to address these same issues before undertaking their own enterprise dam system. using commercial software developed in academia during the course of the living lab pilot, the differ­ ences between academia and the commercial sector proved to be a significant issue. assumptions about the organizational culture and work methods are built into systems, often in a tacit manner. in the case of the initial iteration of the living lab, these assumptions were those of the corporate world, the primary clients of the commercial providers as well the environment of the developers. u­m project participants, meanwhile, brought their own expectations based on the reality of their work environment in academia. universities do not have a strict hierarchical structure, with each aca­ demic unit and department having a great degree of local control. academia also has a culture of sharing, where teaching or research products are often shared with no payment involved, other than acknowledgment of the source. thus, there was a process of mutual edu­ cation and negotiation regarding what was and was not acceptable in the enterprise dam system implementa­ tion. this difference of cultures first manifested itself with acls. in the initial implementation, an acl could be defined only by a system administrator. this was a showstopper for the u­m participants, who thought that asset providers themselves would be able to define and modify the acl for any particular asset. a centralized solution with a single owner of the assets (the company), which is acceptable in the corporate environment, is not acceptable in a university environment, where each user is consumer and owner. defining who has access to an asset can be a complex problem in academia, since this access is a moving target subject to both departmental and institutional constraints. libraries and librarians the traditional role of libraries is one of preserving and making accessible the intellectual property of all of humanity. 
with each new advance in information tech­ nology, such as dam systems, the role of libraries and librarians continues to evolve. this pilot highlighted the role and value of librarians skilled in metadata develop­ ment and assignment. without their expertise and early involvement, there would have been no standard method of indexing assets, thus preventing users from finding useful media. also, the project reinforced two reasons for encouraging asset creators to assign metadata at the asset creation point instead of at the archival point. one, this ensures that metadata are assigned when the content expertise is available. it is very difficult for producers to assign metadata retrospectively, and the indexing information may no longer be available at the point of archive. two, metadata assignment at the point of asset creation helps to ensure consistent metadata assignment that lends itself to automated solutions at the time of archiving.35 thus, while their role in digital asset man­ agement systems continues to evolve, the authors predict that the librarians’ role will evolve around metadata, and that libraries will start to become the archive for digital materials. it is anticipated that librarians will work with technical experts to develop workflows that include the automated metadata assignment to help faculty routinely add existing and new collections of assets to the system. one example of such a role is deep blue at the university of michigan. deep blue is a digital framework for pre­ serving and finding the best scholarly and artistic work produced at the university. article title | author 15enterprise dam system pilot | kim, ahronheim, suzuka, king, bruell, miller, and johnson 15 production productivity new technical complexities emerge with each new asset collection added to the u­m system. new workflows as well as richer software features continue to be developed to meet newly identified integration and user interface needs. as the living lab experience advances, techni­ cal barriers are eliminated and new workflows auto­ mated. the authors anticipate that, eventually, automated workflows will allow faculty and staff to routinely use digital assets with a minimum of technical expertise, thus decreasing the personnel costs associated with the use of rich media. for the foreseeable future, however, techni­ cally knowledgeable staff will be required to develop these workflows and even complete a significant amount of the work. academic practice the more delicate and challenging issue is educating fac­ ulty on the value and power of digital assets to improve their research and teaching. dam is a new concept to fac­ ulty, and it will only become useful when integrated into their daily teaching and research. this will happen as fac­ ulty members become more knowledgeable and increase their comfort in the use of digital assets. the dental case study demonstrates that an improved student experience can be provided with such an asset management system, while the education case study demonstrates that a com­ plex set of authentic classroom materials can be orga­ nized and ingested for use by others. these case studies are only two examples of the unanticipated outcomes that result from the use of digital assets in education. 
the authors predict that as more unanticipated and innova­ tive uses of digital assets are discovered, these new uses will, in turn, lead to increased academic productivity—for example, teaching more without increasing the number of faculty, students teaching each other with rich media, small­group work, and project­based learning. the list of possibilities is endless. as the living lab evolved from a research and development project into the implementation project known as bluestream, it has become an actual classroom resource. this article described myriad issues that were addressed so that other institutions can embark on their own enterprise dam systems fully informed about the road ahead. the remaining technical issues can and will be resolved over time. the greatest challenges that remain are being discovered as faculty and students use bluestream to improve teaching, learning, and research activities. the success of bluestream specifically, and enterprise dam systems in general, will be determined by their successes and failures in meeting the needs of faculty and students. ■ acknowledgements the authors recognize that the living lab pilot program was conducted with the support of others. we thank ruxandra­ana iacob for her administrative contributions to the project. we thank both ruxandra­ana iacob and sharon grayden for their assistance with writing this article. thanks to karen dickinson for her encourage­ ment, optimism, and constant support throughout the project. we thank mark fitzgerald for his vision regard­ ing the potential of the school of dentistry spi project and for conducting the original research. the living lab pilot was conducted with support from the university of michigan office of the provost through the carat partnership program, which pro­ vided funding for the pilot, and the carat­rackham fellowship program, which funded the metadata work. references 1. a. doyle and l. dawson, “current practices in digital asset management,” internet2/cni performance archive & retrieval working group, 2003, http://docs.internet2.edu/ doclib/draft­internet2­humanities­digital­asset­management­ practices­200310.html (accessed feb. 17, 2007). 2. d. z. spicer, p. b. deblois, and the educause current issues committee. “fifth annual educause survey identifies current it issues.” educause quarterly 27, no. 2 (2004): 8–22. 3. humanities advanced technology and information insti­ tute (hatii), university of glasgow, and the national initiative for a networked cultural heritage (ninch), “the ninch guide to good practice in the digital representation and man­ agement of cultural heritage materials,” 2003, www.nyu.edu/ its/humanities/ninchguide (accessed july 10, 2005). 4. a. mccord, “overview of digital asset management sys­ tems,” educause evolving technologies committee, sept. 6, 2002. 5. james l. hilton, “digital management systems,” educause review 38, no. 2 (2003): 53. 6. james. hilton, “university of michigan digital asset management system,” 2004. http://sitemaker.umich.edu/ bluestream/files/dams_year01_campus.ppt (accessed feb. 15, 2007). 7. the university of michigan, “bluestream,” 2006, http:// sitemaker.umich.edu/bluestream (accessed feb. 15, 2007). 8. oracle corp., “stellent universal content management,” 2006, www.stellent.com/en/index.htm (accessed feb. 15, 2007); artesia digital media group, “artesia: the open text digital media group,” 2006, www.artesia.com/ (accessed feb. 15, 2007); canto, “canto,” 2007, www.canto.com (accessed feb. 15, 2007). 9. r. d. vernon and o. v. 
riger, “digital asset management: an introduction to key issues,” www.cit.cornell.edu/oit/arch­ init/digassetmgmt.html (accessed sept. 24, 2004); yan han, “digital content management: the search for a content man­ agement system,” library hi tech 22, no. 4 (2004): 355–65; stan­ ford university libraries and academic information resources, 16 information technology and libraries | december 200716 information technology and libraries | december 2007 “media preservation: digital preservation,” 2005, http://library. stanford.edu/depts/pres/mediapres/digital.html (accessed july 29, 2005). 10. telestream, “telestream, inc.,” 2005, www.telestream.net/ products/flipfactory.htm (accessed feb. 15, 2007). 11. autonomy, inc., “virage products overview: virage vid­ eologger,” 2006, www.virage.com/content/products/index. en.html (accessed feb. 15, 2007). 12. international business machines corp., “ancept media server: digital asset management solution,” 2007, www.nasi. com/ancept.php (accessed feb. 15, 2007). 13. realnetworks, inc., “realnetworks media servers,” 2007, www.realnetworks.com/products/media_delivery.html (accessed feb. 15, 2007); apple, inc., “quicktime streaming server,” 2007, www.apple.com/quicktime/streamingserver (accessed feb. 15, 2007); international business machines corp., “db2 content manager video charger,” 2007, www­306.ibm. com/software/data/videocharger/ (accessed feb. 15, 2007). 14. sakai, “sakai: collaboration and learning environment for education,” 2007, www.sakaiproject.org (accessed feb. 15, 2007). 15. the university of michigan, “michigan radio,” 2007, www.michiganradio.org (accessed feb. 15, 2007). 16. the university of michigan, “block m records,” 2005, www.blockmrecords.org (accessed feb. 15, 2007); the univer­ sity of michigan, “deep blue,” 2007, http://deepblue.lib.umich. edu (accessed feb. 15, 2007). 17. e. duval et al., “metadata principles and practicalities,” d-lib magazine 8, no 4 (2002); a. m. white et al., “pb core— the public broadcasting metadata initiative: progress report,” 2003 dublin core conference sept. 28–oct. 2, 2003, seattle; j. attig, a. copeland, and m. pelikan, “context and meaning: the challenges of metadata for a digital image library within the university,“ college & research libraries 65, no. 3 (may 2004): 251–61. 18. white et al., “pb core—the public broadcasting meta­ data initiative”; attig, copeland, and pelikan, “context and meaning.” 19. dublin core metadata initiative, “dublin core metadata initiative,” 2007, http://dublincore.org (accessed feb. 15, 2007). 20. visual resources association, “vra core categories, version 3.0,” 2002, www.vraweb.org/vracore3.htm (accessed feb. 15, 2007); louis j. goldberg, et al., “the significance of snodent,” studies in health technology and informatics 116 (aug. 2005): 737–42; http://ontology.buffalo.edu/medo/sno­ dent_05.pdf (accessed feb. 15, 2007). 21. autonomy, “virage products overview.” 22. filemaker, inc., “filemaker,” 2007, www.filemaker.com/ products (accessed feb. 15, 2007). 23. m. fitzgerald et al., “efficacy of speech­to­text technol­ ogy in managing video recorded interactions,” journal of dental research 85, special issue a (2006): abstract no. 833. 24. u.s. department of education, “family educational rights and privacy act ferpa,” 2005, www.ed.gov/policy/ gen/guid/fpco/ferpa/index.html (accessed feb. 15, 2007). 25. g. lorenzo and j. ittelson, “an overview of e­portfolios,” educause learning initiative, 2005, http://educause.edu/ir/ library/pdf/eli3001.pdf (accessed feb. 15, 2007). 26. 
microsoft corp., "microsoft office powerpoint 2007," 2007, http://office.microsoft.com/en-us/powerpoint/default.aspx (accessed feb. 15, 2007); apple, inc., "keynote," 2007, www.apple.com/iwork/keynote (accessed feb. 15, 2007). 27. world intellectual property organization, "berne convention for the protection of literary and artistic works," 1979, www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html (accessed feb. 15, 2007). 28. wikimedia foundation, inc., "digital rights management," 2007, http://en.wikipedia.org/wiki/digital_rights_management (accessed feb. 15, 2007). 29. creative commons, "creative commons," 2007, http://creativecommons.org (accessed feb. 15, 2007). 30. wikimedia foundation, inc., "access control list," 2007, http://en.wikipedia.org/wiki/access_control_list (accessed feb. 15, 2007). 31. the university of michigan, "um institutional review boards," 2007, www.irb.research.umich.edu (accessed feb. 15, 2007). 32. health insurance portability and accountability act of 1996 (hipaa), "centers for medicare and medicaid services," 2005, www.cms.hhs.gov/hipaageninfo/downloads/hipaalaw.pdf (accessed feb. 15, 2007). 33. j. greenberg et al., "author-generated dublin core metadata for web resources: a baseline study in an organization," journal of digital information 2, no. 2 (2002), http://journals.tdl.org/jodi/article/view/jodi-39/45 (accessed nov. 10, 2007); a. crystal and j. greenberg, "usability of a metadata creation application for resource authors," library & information science research 27, no. 2 (2005): 177–89. 34. the university of michigan, "ctools," 2007, https://ctools.umich.edu/portal (accessed feb. 15, 2007). 35. m. cox et al., descriptive metadata for television (amsterdam: focal pr., 2006); michael a. chopey, "planning and implementing a metadata-driven digital repository," cataloging & classification quarterly 40, no. 3/4 (2005): 255–87.

przemysław skibiński and jakub swacha

the efficient storage of text documents in digital libraries

przemysław skibiński (inikep@ii.uni.wroc.pl) is associate professor, institute of computer science, university of wrocław, poland. jakub swacha (jakubs@uoo.univ.szczecin.pl) is associate professor, institute of information technology in management, university of szczecin, poland.

in this paper we investigate the possibility of improving the efficiency of data compression, and thus reducing storage requirements, for seven widely used text document formats. we propose an open-source text compression software library, featuring an advanced word-substitution scheme with static and semidynamic word dictionaries. the empirical results show an average storage space reduction as high as 78 percent compared to uncompressed documents, and as high as 30 percent compared to documents compressed with the free compression software gzip.

it is hard to expect the continuing rapid growth of global information volume not to affect digital libraries.1 the growth of stored information volume means growth in storage requirements, which poses a problem in both technological and economic terms.
fortunately, the digital library's hunger for resources can be tamed with data compression.2

the primary motivation for our research was to limit the data storage requirements of the student thesis electronic archive in the institute of information technology in management at the university of szczecin. the current regulations state that every thesis should be submitted in both printed and electronic form. the latter facilitates automated processing of the documents for purposes such as plagiarism detection or statistical language analysis. considering the introduction of the three-cycle higher education system (bachelor/master/doctorate), there are several hundred theses added to the archive every year. although students are asked to submit microsoft word–compatible documents such as doc, docx, and rtf, other popular formats such as tex script (tex), html, ps, and pdf are also accepted, both in the case of the main thesis document, containing the thesis and any appendixes that were included in the printed version, and the additional appendixes, comprising materials that were left out of the printed version (such as detailed data tables, the full source code of programs, program manuals, etc.). some of the appendixes may be multimedia, in formats such as png, jpeg, or mpeg.3

notice that this paper deals with text-document compression only. although the size of individual text documents is often significantly smaller than the size of individual multimedia objects, their collective volume is large enough to make the compression effort worthwhile. the reason for focusing on text-document compression is that most multimedia formats have efficient compression schemes embedded, whereas text document formats usually either are uncompressed or use schemes with efficiency far worse than the current state of the art in text compression.

although the student thesis electronic archive was our motivation, we propose a solution that can be applied to any digital library containing text documents. as the recent survey by kahl and williams revealed, 57.5 percent of the examined 1,117 digital library projects consisted of text content, so there are numerous libraries that could benefit from implementation of the proposed scheme.4 in this paper, we describe a state-of-the-art approach to text-document compression and present an open-source software library implementing the scheme that can be freely used in digital library projects.

in the case of text documents, improvement in compression effectiveness may be obtained in two ways: with or without regard to their format. the more nontextual content in a document (e.g., formatting instructions, structure description, or embedded images), the more it requires format-specific processing to improve its compression ratio. this is because most document formats have their own ways of describing their formatting, structure, and nontextual inclusions (plain text files have no inclusions). for this reason, we have developed a compound scheme that consists of several subschemes that can be turned on and off or run with different parameters. the most suitable solution for a given document format can be obtained by merely choosing the right schemes and adequate parameter values.
experimentally, we have found the optimal subscheme combinations for the following formats used in digital libraries: plain text, tex, rtf, text annotated with xml, html, as well as the device-independent rendering formats ps and pdf.5 first we discuss related work in text compression, then describe the basis of the proposed scheme and how it should be adapted for particular document formats. the section "using the scheme in a digital library project" discusses how to use the free software library that implements the scheme. then we cover the results of experiments involving the proposed scheme and a corpus of test files in each of the tested formats.

■ text compression

there are two basic principles of general-purpose data compression. the first one works on the level of character sequences, the second one works on the level of individual characters. in the first case, the idea is to look for matching character sequences in the past buffer of the file being compressed and replace such sequences with shorter code words; this principle underlies the algorithms derived from the concepts of abraham lempel and jacob ziv (lz-type).6 in the second case, the idea is to gather frequency statistics for characters in the file being compressed and then assign shorter code words for frequent characters and longer ones for rare characters (this is exactly how huffman coding works—arithmetic coding assigns value ranges rather than individual code words).7

as the characters form words, and words form phrases, there is high correlation between subsequent characters. to produce shorter code words, a compression algorithm either has to observe the context (understood as several preceding characters) in which the character appeared and maintain separate frequency models for different contexts, or has to first decorrelate the characters (by sorting them according to their contexts) and then use an adaptive frequency model when compressing the output (as the characters' dependence on context becomes dependence on position). whereas the former solution is the foundation of prediction by partial match (ppm) algorithms, burrows-wheeler transform (bwt) compression algorithms are based on the latter.8

witten et al., in their seminal work managing gigabytes, emphasize the role of data compression in text storage and retrieval systems, stating three requirements for the compression process: good compression, fast decoding, and feasibility of decoding individual documents with minimum overhead.9 the choice of compression algorithm should depend on what is more important for a specific application: better compression or faster decoding.
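both principles meet in deflate, the algorithm implemented by gzip and used as one of the back-end compressors later in this paper: lz77-style sequence matching followed by huffman coding of the matches and literals. the following minimal round-trip sketch assumes only that the free zlib library (a deflate implementation) is installed; the sample string and the reported bits-per-character figure are purely illustrative.

    #include <zlib.h>
    #include <cassert>
    #include <cstring>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        // a small piece of repetitive text; real documents compress far better
        std::string text = "to be, or not to be: that is the question. to be, or not to be.";

        // compressBound() gives the worst-case size of the deflate output
        uLongf packedLen = compressBound(text.size());
        std::vector<Bytef> packed(packedLen);
        int rc = compress(packed.data(), &packedLen,
                          reinterpret_cast<const Bytef*>(text.data()), text.size());
        assert(rc == Z_OK);   // packedLen now holds the actual compressed size

        // decompress into a buffer of the known original size and verify the round trip
        std::vector<Bytef> restored(text.size());
        uLongf restoredLen = restored.size();
        rc = uncompress(restored.data(), &restoredLen, packed.data(), packedLen);
        assert(rc == Z_OK && restoredLen == text.size());
        assert(std::memcmp(restored.data(), text.data(), text.size()) == 0);

        // bitrate in output bits per input character, the measure used in the experiments below
        std::cout << "bits per character: " << 8.0 * packedLen / text.size() << "\n";
        return 0;
    }

the same compress() call is, in effect, what gzip performs; the word-substitution transform described next aims to hand this stage more compressible input rather than to replace it.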
an early work of jon louis bentley and others showed that a significant improvement in text compression can be achieved by treating a text document as a stream of space-delimited words rather than individual characters.10 this technique can be combined with any general-purpose compression method in two ways: by redesigning character-based algorithms as word-based ones or by implementing a two-stage scheme whose first step is a transform replacing words with dictionary indices and whose second step is passing the transformed text through any generalpurpose compressor.11 from the designer’s point of view, although the first approach provides more control over how the text is modeled, the second approach is much easier to implement and upgrade to future general-purpose compressors.12 notice that the separation of the wordreplacement stage from the compression stage does not imply that two distinct programs have to be used—if only an appropriate general-purpose compression software library is available, a single utility can use it to compress the output of the transform it first performed. an important element of every word-based scheme is the dictionary of words that lists character sequences that should be treated as single entities. the dictionary can be dynamic (i.e., constructed on-line during the compression of every document),13 static (i.e., constructed off-line before the compression stage and once for every document of a given class—typically, the language of the document determines its class),14 or semidynamic (i.e., constructed off-line before compression stage but individually for every document).15 semidynamic dictionaries must be stored along with the compressed document. dynamic dictionaries are reconstructed during decompression (which makes the decoding slower than in the other cases). when the static dictionary is used, it must be distributed with the decoder; since a single dictionary is used to compress multiple files, it usually attains the best compression ratios, but it is only effective with documents of the class it was originally prepared for. n the basic compression scheme the basis of our approach is a word-based, lossless text compression scheme, dubbed compression for textual digital libraries (ctdl). the scheme consists of up to four stages: 1. document decompression 2. dictionary composition 3. text transform 4. compression stages 1–2 are optional. the first is for retrieving textual content from files compressed poorly with generalpurpose methods. it is only executed for compressed input documents. it uses an embedded decompressor for files compressed using the deflate algorithm,16 but an external tool—precomp—is used to decode natively compressed pdf documents.17 the second stage is for constructing the dictionary of the most frequent words in the processed document. doing so is a good idea when the compressed documents have no common set of words. if there are many documents in the same language, a common dictionary fares better—it usually does not pay off to store an individual dictionary with each file because they all contain similar lists of words. for this reason we have developed two variants of the scheme. the basic ctdl includes stage 2; therefore it can use a document-specific semidynamic dictionary in the third stage. the ctdl+ variant uses a static dictionary common for all files in the same language; therefore it can omit stage 2. 
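to make the word-substitution idea concrete before the individual stages are described, the following self-contained sketch performs a toy semidynamic pass: it counts the space-delimited words of a document, keeps the frequent ones as a dictionary, and replaces their occurrences with short indices. the thresholds, the '#' marker, and the printable token syntax are illustrative simplifications, not ctdl's actual code words or escape handling; the transformed stream would then be handed to a general-purpose back end such as the deflate call sketched earlier.

    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    int main() {
        std::string text = "the cat sat on the mat and the dog sat on the rug";

        // pass 1: count space-delimited words (a toy "semidynamic" dictionary pass)
        std::map<std::string, int> freq;
        std::vector<std::string> tokens;
        std::istringstream in(text);
        std::string w;
        while (in >> w) { ++freq[w]; tokens.push_back(w); }

        // keep words seen at least twice, most frequent first (lower index = shorter code)
        std::vector<std::string> dict;
        for (const auto& p : freq) if (p.second >= 2) dict.push_back(p.first);
        std::sort(dict.begin(), dict.end(),
                  [&](const std::string& a, const std::string& b) { return freq[a] > freq[b]; });

        // pass 2: the transform proper; dictionary words become "#index" tokens.
        // a real scheme would also escape literal '#' and use compact binary code words
        std::map<std::string, size_t> index;
        for (size_t i = 0; i < dict.size(); ++i) index[dict[i]] = i;
        std::ostringstream out;
        for (const auto& t : tokens) {
            auto it = index.find(t);
            if (it != index.end()) out << '#' << it->second << ' ';
            else out << t << ' ';
        }

        // the dictionary travels with the document in the semidynamic case; the
        // transformed stream then goes to any general-purpose compressor
        std::cout << "dictionary:";
        for (const auto& d : dict) std::cout << ' ' << d;
        std::cout << "\ntransformed: " << out.str() << "\n";
        return 0;
    }

a real implementation must also decide where the dictionary lives: stored with each file for the semidynamic variant, or shipped once with the decoder for the static one, as noted above.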
during stage 2, all the potential dictionary items that meet the word requirements are extracted from the document and then sorted according to their frequency the efficient storage of text documents in digital libraries | skibiński and swacha 145 to form a dictionary. the requirements define the minimum length and frequency of a word in the document (by default, 2 and 6 respectively) as well as its content. only the following kinds of strings are accepted into the dictionary: n a sequence of lowercase and uppercase letters (“a”–“z”, “a”–“z”) and characters with ascii code values from range 128–255 (thus it supports any typical 8-bit text encoding and also utf-8) n url address prefixes of the form “http:// domain/,” where domain is any combination of letters, digits, dots, and dashes n e-mails—patterns of the form “login@domain,” where login and domain are any combination of letters, digits, dots, and dashes n runs of spaces stage 3 begins with parsing the text into tokens. the tokens are defined by their content; as four types of content are distinguished, there are also four classes of tokens: words, numbers, special tokens, and characters. every token is then encoded in a way that depends on the class it belongs to. the words are those character sequences that are listed in the dictionary. every word is replaced with its dictionary index, which is then encoded using symbols that are rare or nonexistent in the input document. indexes are encoded with code words that are between one and four bytes long, with lower indexes (denoting more frequent words) being assigned shorter code words. the numbers are sequences of decimal digits, which are encoded with a dense binary code, and, similarly to letters, placed in a separate location in the output file. the special tokens can be decimal fractions, ip numerical addresses, dates, times, and numerical ranges. as they have a strict format and differ only in numerical values, they are encoded as sequences of numbers.18 finally, the characters are the tokens that do not belong to any of the aforementioned group. they are simply copied to the output file, with the exception of those rare characters that were used to construct code words; they are copied as well, but have to be preceded with a special escape symbol. the specialized transform variants (see the next section) distinguish three additional classes from the character class: letters (words not in the dictionary), single white spaces, and multiple white spaces. stage 4 could use any general-purpose compression method to encode the output of stage 3. for this role, we have investigated several open-licensed, generalpurpose compression algorithms that differ in speed and efficiency. as we believe that document access speed is important to textual digital libraries, we have decided to focus on lz–type algorithms because they offer the best decompression times. ctdl has two embedded backend compressors: the standard deflate and lzma, wellknown for its ability to attain high compression ratios.19 n adapting the transform for individual text document formats the text document formats have individual characteristics; therefore the compression ratio can be improved by adapting the transform for a particular format. as we noted in the introduction, we propose a set of subschemes (modifications of the original processing steps or additional processing steps) that can help compression— provided the issue that a given subscheme addresses is valid for the document format being compressed. 
there are two groups of subschemes: the first consists of solutions that can be applied to more than one document format. it includes n changing the minimum word frequency threshold (the “minfr” column in table 1) that a word must pass to be included in the semidynamic dictionary (notice that no word can be added to a static dictionary); n using spaceless word model (“wdspc” column in table 1) in which a single space between two words is not encoded at all; instead, a flag is used to mark two neighboring words that are not separated by a space; n run-length encoding of multiple spaces (“spruns” column in table 1); n letter containers (“letcnt” column in table 1), that is, removing sequences of letters (belonging to words that are not included in the dictionary) to a separate location in the output file (and leaving a flag at their original position). table 1 shows the assignment of the mentioned subschemes to document formats, with “+” denoting that a given subscheme should be applied when processing a given document format. notice that we use different subschemes for the same format depending on whether a semidynamic (ctdl) or static (ctdl+) dictionary is used. the remaining subschemes are applied for only one document format. they attain an improvement in compression performance by changing the definition of acceptable dictionary words, and, in one case (ps), by changing the definition of number strings. the encoder for the simplest of the examined formats—plain text files—performs no additional formatspecific processing. the first such modification is in the tex encoder. the difference is that words beginning with “\” (tex 146 information technology and libraries | september 2009 instructions) are now accepted in the dictionary. the modification for pdf documents is similar. in this case, bracketed words (pdf entities)— for example “(abc)”—are acceptable as dictionary entries. notice that pdf files are internally compressed by default—the transform can be applied after decompressing them into textual format. the precomp tool is used for this purpose. the subscheme for ps files features two modifications: its dictionary accepts words beginning with “/” and “\” or ending with “(“, and its number tokens can contain not only decimal but also hexadecimal digits (though a single number must have at least one decimal digit). the hexadecimal number must be at least 6 digits long, and is encoded with a flag: a byte containing its length (numbers with more than 261 digits are split into parts) and a sequence of bytes, each containing two digits from the number (if the number of digits is odd, the last byte contains only one digit). for rtf documents, the dictionary accepts the “\”-preceded words, like the tex files. moreover, the hexadecimal numbers are encoded in the same way as in the ps subscheme so that rtf documents containing images can be significantly reduced in size. specialization for xml is roughly the transform described in our earlier article, “revisiting dictionarybased compression.”20 it allows for xml start tags and entities to be added to dictionary, and it replaces every end tag respecting the xml well-formedness rule (i.e., closing the element opened most recently) with a single flag. it also uses a single flag to denote xml attribute value begin and end marks. html documents are handled similarly. 
the only difference is that the tags that, according to the html 4.01 specification, are not expected to be followed by an endtag (base, link, xbasehref, br, meta, hr, img, area, input, embed, param and col) are ignored by the mechanism replacing closing tags (so that it can guess the correct closing tag even after the singular tags were encountered).21 n using the scheme in a digital library project many textual digital libraries seriously lack text compression capabilities, and popular digital library systems, such as greenstone, have no embedded efficient text compression.22 therefore we have decided to develop ctdl as an open-source software library. the library is free to use and can be downloaded from www.ii.uni.wroc .pl/~inikep/research/ctdl/ctdl09.zip. the library does not require any additional nonstandard libraries. it has both the text transform and back-end compressors embedded. however, compressing pdf documents requires them to be decompressed first with the free precomp tool. the compression routines are wrapped in a code selecting the best algorithm depending on the chosen compression mode and the input document format. the interface of the library consists of only two functions: ctdl_encode and ctdl_decode, for, respectively, compressing and decompressing documents. ctdl_encode takes the following parameters: n char* filename—name of the input (uncompressed) document n char* filename_out—name of the output (compressed) document n efiletype ftype—format of the input document, defined as: enum efiletype { html, pdf, ps, rtf, tex, txt, xml}; n edictionarytype dtype—dictionary type, defined as: enum edictionarytype { static, semidynamic }; ctdl_decode takes the following parameters: n char* filename—name of the input (compressed) document n char* filename_out—name of the output (decompressed) document table 1. universal transform optimizations ctdl settings ctdl+ settings format minfr wdspc spruns letcnt wdspc spruns letcnt html 3 + + + + + pdf 3 ps 6 + + rtf 3 + + + tex 3 + + + + + + txt 6 + + + + + + xml 3 + + + + + the efficient storage of text documents in digital libraries | skibiński and swacha 147 the library was written in the c++ programming language, but a compiled static library is also distributed; thus it can be used in any language that can link such libraries. currently, the library is compatible with two platforms: microsoft windows and linux. to use static dictionaries, the respective dictionary file must be available. the library is supplied with an english dictionary trained on a 3 gb text corpus from project gutenberg.23 seven other dictionaries—german, spanish, finnish, french, italian, polish, and russian— can be freely downloaded from www.ii.uni.wroc.pl/~inikep/ research/dicts. there also is a tool that helps create a new dictionary from any given corpus of documents, available from skibiński upon request via e-mail (inikep@ii.uni .wroc.pl). the library can be used to reduce the storage requirements or also to reduce the time of delivering a requested document to the library user. in the first case, the decompression must be done on the server side. in the second case, it must be done on the client side, which is possible because stand-alone decompressors are available for microsoft windows and linux. obviously, a library can support both options by providing the user with a choice whether a document should be delivered compressed or not. 
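a minimal caller built on the two functions described above might look as follows. note two assumptions: the header name ctdl.h is a guess, and the enumerators are written here in upper case (TXT, HTML, STATIC, SEMIDYNAMIC) because a c++ constant literally spelled static would collide with the keyword; check the distributed header for the exact declarations. the file names are placeholders.

    #include "ctdl.h"  // assumed header name; declares ctdl_encode, ctdl_decode, and the enums

    int main() {
        // ctdl+ mode: compress an html document against the shared static english dictionary
        // (requires the static dictionary file shipped with the library to be present)
        char in_html[]  = "thesis.html";
        char out_html[] = "thesis.html.ctdl";
        ctdl_encode(in_html, out_html, HTML, STATIC);

        // ctdl mode: compress a plain-text file with its own semidynamic dictionary
        char in_txt[]  = "notes.txt";
        char out_txt[] = "notes.txt.ctdl";
        ctdl_encode(in_txt, out_txt, TXT, SEMIDYNAMIC);

        // restore the originals; the decoder takes no further parameters
        char restored_html[] = "thesis.restored.html";
        char restored_txt[]  = "notes.restored.txt";
        ctdl_decode(out_html, restored_html);
        ctdl_decode(out_txt, restored_txt);
        return 0;
    }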
if documents are to be decompressed client-side, the basic ctdl, using a semidynamic dictionary, seems handier, since it does not require the user to obtain the static dictionary that was used to compress the downloaded document. still, the size of such a dictionary is usually small, so it does not disqualify ctdl+ from this kind of use. n experimental results we tested ctdl experimentally on a benchmark set of text documents. the purpose of the tests was to compare the storage requirements of different document formats in compressed and uncompressed form. in selecting the test files we wanted to achieve the following goals: n test all the formats listed in table 1 (therefore we decided to choose documents that produced no errors during document format conversion) n obtain verifiable results (therefore we decided to use documents that can be easily obtained from the internet) n measure the actual compression improvement from applying the proposed scheme (apart from the rtf format, the scheme is neutral to the images embedded in documents; therefore we decided to use documents that have no embedded images) for these reasons, we used the following procedure for selecting documents to the test set. first, we searched the project gutenberg library for tex documents, as this format can most reliably be transformed into the other formats. from the fifty-one retrieved documents, we removed all those containing images as well as those that the htlatex tool failed to convert to html. in the eleven remaining documents, there were four jane austen books; this overrepresentation was handled by removing three of them. the resulting eight documents are given in table 2. from the tex files we generated html, pdf, and ps documents. then we used word 2007 to transform html documents into rtf, doc, and xml (thus this is the microsoft word xml format, not the project gutenberg xml format). the txt files were downloaded from project gutenberg. the tests were conducted on a low-end amd sempron 3000+ 1.80 ghz system with 512 mb ram and a seagate 80 gb ata drive, running windows xp sp2. for comparison purposes, we used three generalpurpose compression programs: n gzip implementing deflate n bzip2 implementing a bwt-based compression algorithm table 2. test set documents specification file name title author tex size (bytes) 13601-t expositions of holy scripture: romans corinthians maclaren 1,443,056 16514-t a little cook book for a little girl benton 220,480 1noam10t north america, v. 1 trollope 804,813 2ws2610 hamlet shakespeare 194,527 alice30 alice in wonderland carroll 165,844 cdscs10t some christmas stories dickens 127,684 grimm10t fairy tales grimm 535,842 pandp12t pride and prejudice austen 727,415 148 information technology and libraries | september 2009 n ppmvc implementing a ppm-derived compression algorithm24 tables 3–10 show n the bitrate attained on each test file by the deflatebased gzip in default mode, the proposed compression scheme in the semidynamic and static variants with deflate as the back-end compression algorithm, 7-zip in lzma mode, the proposed compression scheme in the semidynamic and static variants with lzma as the back-end compression algorithm, bzip2 and ppmvc; n the average bitrate attained on the whole test corpus; and n the total compression and decompression times (in seconds) for the whole test corpus, measured on the test platform (they are total elapsed times including program initialization and disk operations). 
bitrates are given in output bits per character of an uncompressed document in a given format, so a smaller table 3. compression efficiency and times for the txt documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.944 2.244 2.101 2.337 2.057 1.919 2.158 1.863 16514-t 2.566 2.150 1.969 2.228 1.993 1.838 2.010 1.780 1noam10t 2.967 2.337 2.109 2.432 2.151 1.958 2.160 1.946 2ws2610 3.217 2.874 2.459 2.871 2.659 2.312 2.565 2.343 alice30 2.906 2.533 2.184 2.585 2.360 2.056 2.341 2.090 cdscs10t 3.222 2.898 2.298 2.928 2.721 2.192 2.694 2.436 grimm10t 2.832 2.275 2.090 2.357 2.079 1.931 2.112 1.886 pandp12t 2.901 2.251 2.097 2.366 2.061 1.930 2.032 1.835 average 2.944 2.445 2.163 2.513 2.260 2.017 2.259 2.022 comp. time 0.688 1.234 0.954 6.688 2.640 2.281 2.110 3.281 dec. time 0.125 0.454 0.546 0.343 0.610 0.656 0.703 3.453 table 4. compression efficiency and times for the tex documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.927 2.233 2.092 2.328 2.049 1.913 2.146 1.852 16514-t 2.277 1.904 1.794 1.957 1.744 1.645 1.746 1.534 1noam10t 2.976 2.370 2.142 2.445 2.186 1.986 2.195 1.976 2ws2610 3.206 2.906 2.482 2.864 2.674 2.323 2.562 2.340 alice30 2.897 2.526 2.183 2.573 2.350 2.048 2.332 2.085 cdscs10t 3.224 2.931 2.328 2.941 2.759 2.222 2.723 2.466 grimm10t 2.831 2.304 2.120 2.364 2.113 1.960 2.143 1.910 pandp12t 2.881 2.239 2.090 2.346 2.049 1.916 2.013 1.817 average 2.902 2.427 2.154 2.477 2.241 2.002 2.233 1.998 comp. time 0.688 1.250 0.969 6.718 2.703 2.406 2.140 3.329 dec. time 0.109 0.453 0.547 0.360 0.609 0.672 0.703 3.485 the efficient storage of text documents in digital libraries | skibiński and swacha 149 bitrate (of, e.g., rtf documents compared to the plain text) does not mean the file is smaller, only that the compression was better. uncompressed files have a bitrate of 8 bits per character. looking at the results obtained for txt documents (table 3), we can see an average improvement of 17 percent for ctdl and 27 percent for ctdl+ compared to the baseline deflate implementation. compared to the baseline lzma implementation, the improvement is 10 percent for ctdl and 20 percent for ctdl+. also, ctdl+ combined with lzma compresses txt documents 31 percent better than gzip, 11 percent better than bzip2, and slightly better than the state-of-the-art ppmvc implementation. in case of tex documents (table 4), the gzip results were improved, on average, by 16 percent using ctdl and by 26 percent using ctdl+; the numbers for lzma are 10 percent for ctdl and 19 percent for ctdl+. in a cross-method comparison, ctdl+ with lzma beats gzip by 31 percent, bzip2 by 10 percent, and attains results very close to ppmvc. on average, deflate-based ctdl compressed xml documents 20 percent better than the baseline algorithm (table 5), and with ctdl+ the improvement rises to 26 percent. ctdl improves lzma compression by 11 percent, and ctdl+ improves it by 18 percent. ctdl+ with lzma beats gzip by 33 percent, bzip2 by 8 percent, and loses only 4 percent to ppmvc. similar results were obtained for html documents (table 6): they were compressed with ctdl and deflate 18 percent better than with the deflate algorithm alone, and 27 percent better with ctdl+. lzma compression efficiency is improved by 11 percent with ctdl and 20 percent with ctdl+. ctdl+ with lzma beats gzip by 33 percent, bzip2 by 9 percent, and loses only 2 percent to ppmvc. 
for rtf documents (table 7), the gzip results were improved, on average, by 18 percent using ctdl, and 25 percent using ctdl+; the numbers for lzma are respectively 9 percent for ctdl and 17 percent for ctdl+. in a cross-method comparison, ctdl+ with lzma beats gzip by 34 percent, bzip2 by 7 percent, and loses 5 percent to ppmvc. although there is no mode designed especially for doc documents in ctdl (table 8), the basic txt mode was used, as it was found experimentally to be the best choice available. the results show it managed to improve deflate-based compression by 9 percent using ctdl, and by 21 percent using ctdl+, whereas lzma-based compression was improved respectively by 4 percent for ctdl and 14 percent for ctdl+. combined with lzma, ctdl+ compresses doc documents 30 percent better than gzip, 13 percent better than bzip2, and 1 percent better than ppmvc. in case of ps documents (table 9), the gzip results were improved, on average, by 5 percent using ctdl, and by 8 percent using ctdl+; the numbers for lzma improved 3 percent for ctdl and 5 percent for ctdl+. in a cross-method comparison, ctdl+ with lzma beats gzip by 8 percent, losing 5 percent to bzip2 and 7 percent to ppmvc. finally, ctdl improved deflate-based compression of pdf documents (table 10) by 9 percent using ctdl and 10 percent using ctdl+ (compared to gzip; the numbers are table 5. compression efficiency and times for the xml documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.046 1.551 1.514 1.585 1.405 1.339 1.451 1.242 16514-t 0.871 0.698 0.670 0.703 0.612 0.590 0.599 0.552 1noam10t 2.383 1.870 1.736 1.914 1.711 1.575 1.724 1.515 2ws2610 0.691 0.539 0.497 0.561 0.474 0.440 0.461 0.422 alice30 1.477 1.258 1.140 1.248 1.131 1.034 1.116 0.999 cdscs10t 2.106 1.892 1.576 1.862 1.741 1.462 1.721 1.538 grimm10t 1.878 1.485 1.422 1.521 1.337 1.276 1.337 1.198 pandp12t 1.875 1.404 1.349 1.465 1.263 1.207 1.252 1.105 average 1.666 1.337 1.238 1.357 1.209 1.115 1.208 1.071 comp. time 0.750 1.844 1.390 10.79 4.891 5.828 7.047 3.688 dec. time 0.141 0.672 0.750 0.421 0.859 0.953 1.140 3.907 150 information technology and libraries | september 2009 much higher if compared to the embedded pdf compression—see “native” column in table 10); the numbers for lzma are respectively 7 percent for ctdl and 10 percent for ctdl+. combined with lzma, ctdl+ compresses pdf documents 28 percent better than gzip, 4 percent better than bzip2, and 5 percent worse than ppmvc. the results presented in tables 3–10 show that ctdl manages to improve compression efficiency of the general-purpose algorithms it is based on. the scale of improvement varies between document types, but for most of them it is more than 20 percent for ctdl+ and 10 percent for ctdl. the smallest improvement is achieved in case of ps (about 5 percent). figure 1 shows the same results in another perspective: the bars show how much better compression ratios were obtained for the same documents using different compression schemes compared to gzip with default options (0 percent means no improvement). compared to gzip, ctdl offers a significantly better compression ratio at the expense of longer processing time. the relative difference is especially high in case of decompression. however, in absolute terms, even in the worst case of pdf, the average delay between ctdl+ and gzip is below 180 ms for compression and 90 ms for decompression per file. taking into consideration the low-end specification of the test computer, these results table 6. 
compression efficiency and times for the html documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.696 2.054 1.940 2.121 1.868 1.751 1.932 1.670 16514-t 1.726 1.405 1.310 1.436 1.258 1.180 1.257 1.113 1noam10t 2.768 2.159 1.972 2.244 1.979 1.815 1.973 1.785 2ws2610 2.084 1.747 1.504 1.743 1.525 1.344 1.499 1.303 alice30 2.451 2.124 1.829 2.128 1.929 1.701 1.888 1.684 cdscs10t 2.880 2.593 2.084 2.597 2.410 1.966 2.348 2.131 grimm10t 2.603 2.074 1.916 2.138 1.883 1.752 1.889 1.688 pandp12t 2.640 2.037 1.891 2.120 1.826 1.717 1.777 1.596 average 2.481 2.024 1.806 2.066 1.835 1.653 1.820 1.621 comp. time 0.750 1.438 1.078 8.203 3.421 3.328 2.672 3.500 dec. time 0.140 0.515 0.594 0.359 0.688 0.750 0.812 3.672 table 7. compression efficiency and times for the rtf documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 1.882 1.431 1.372 1.428 1.267 1.200 1.300 1.120 16514-t 0.834 0.701 0.696 0.662 0.601 0.591 0.568 0.529 1noam10t 2.244 1.774 1.637 1.765 1.594 1.462 1.601 1.404 2ws2610 0.784 0.630 0.581 0.629 0.545 0.500 0.520 0.485 alice30 1.382 1.196 1.065 1.134 1.046 0.948 0.995 0.922 cdscs10t 2.059 1.882 1.558 1.784 1.704 1.432 1.645 1.488 grimm10t 1.618 1.301 1.227 1.285 1.150 1.082 1.149 1.010 pandp12t 1.742 1.340 1.264 1.336 1.169 1.115 1.142 1.012 average 1.568 1.282 1.175 1.253 1.135 1.041 1.115 0.996 comp. time 0.766 2.047 1.500 12.62 6.500 7.562 8.032 3.922 dec. time 0.156 0.688 0.766 0.469 0.875 0.953 1.312 4.157 the efficient storage of text documents in digital libraries | skibiński and swacha 151 certainly seem good enough for practical applications. compared to lzma, ctdl offers better compression and a shorter compression time at the expense of longer decompression time. notice that the absolute gain in compression time is several times the loss in decompression time, and the decompression time remains short, noticeably shorter than bzip2’s and several times shorter than ppmvc’s. ctdl+ beats bzip2 (with the sole exception of ps documents) in terms of compression ratio and achieves results that are mostly very close to the resourcehungry ppmvc. n conclusions in this paper we addressed the problem of compressing text documents. although individual text documents rarely exceed several megabytes in size, their entire collections can have very large storage space requirements. although text documents are often compressed with general-purpose methods such as deflate, much better compression can be obtained with a scheme specialized for text, and even better if the scheme is additionally specialized for individual document formats. we have developed such a scheme (ctdl), beginning with a text transform designed earlier for xml documents and table 8. compression efficiency and times for the doc documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.798 2.183 2.062 2.181 1.976 1.854 2.115 1.818 16514-t 2.226 2.213 2.073 1.712 1.712 1.652 1.919 1.686 1noam10t 2.851 2.250 2.025 2.289 2.057 1.869 2.113 1.870 2ws2610 2.497 2.499 2.210 2.095 2.095 1.890 2.251 1.999 alice30 2.744 2.714 2.270 2.345 2.345 2.038 2.348 2.058 cdscs10t 2.916 2.891 2.231 2.559 2.560 2.062 2.475 2.196 grimm10t 2.691 2.677 2.059 2.179 2.179 1.856 2.075 1.833 pandp12t 2.761 2.171 2.050 2.189 1.955 1.843 1.983 1.770 average 2.686 2.450 2.123 2.194 2.110 1.883 2.160 1.904 comp. time 0.718 1.312 1.031 7.078 4.063 3.001 2.250 3.421 dec. time 0.125 0.375 0.547 0.344 0.547 0.718 0.735 3.625 table 9. 
compression efficiency and times for the ps documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.847 2.634 2.589 2.213 2.105 2.074 2.011 1.778 16514-t 3.226 3.129 3.039 2.730 2.707 2.699 2.613 2.505 1noam10t 2.718 2.551 2.490 2.147 2.060 2.015 1.892 1.694 2ws2610 3.064 2.922 2.795 2.600 2.521 2.450 2.336 2.186 alice30 3.224 3.154 3.026 2.750 2.745 2.691 2.553 2.400 cdscs10t 3.110 3.029 2.890 2.657 2.683 2.579 2.447 2.276 grimm10t 2.833 2.664 2.597 2.288 2.200 2.162 2.074 1.863 pandp12t 2.814 2.533 2.468 2.193 2.049 1.998 1.858 1.644 average 2.980 2.827 2.737 2.447 2.384 2.334 2.223 2.043 comp. time 1.328 3.015 2.500 14.23 10.96 11.09 4.171 5.765 dec. time 0.203 0.688 0.781 0.609 1.063 1.125 1.360 6.063 152 information technology and libraries | september 2009 modifying it for the requirements of each of the investigated document formats. it has two operation modes: basic ctdl and ctdl+ (the latter uses a common word dictionary for improved compression) and uses two back-end compression algorithms: deflate and lzma (differing in compression speed and efficiency). the improvement in compression efficiency, which can be observed in the experimental results, amounts to a significant reduction of data storage requirements, giving the reasons to use the library in both new and existing digital library projects instead of general-purpose compression programs. to facilitate this process, we implemented the scheme as an open-source software library under the same name, freely available at http://www.ii.uni.wroc . p l / ~ i n i k e p / re s e a rc h / c t d l / ctdl09.zip. although the scheme and the library are now complete, we plan future extensions aiming both to increase the level of specializations for currently handled document formats and to extend the list of handled document formats. table 10. compression efficiency and times for the (uncompressed) pdf documents deflate lzma bzip2 ppmvc file name native gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 3.443 2.624 2.191 2.200 1.986 1.708 1.656 1.852 1.659 16514-t 4.370 2.839 2.836 2.810 2.422 2.422 2.328 2.378 2.241 1noam10t 3.379 2.522 2.103 2.094 1.924 1.659 1.603 1.770 1.587 2ws2610 3.519 2.204 2.346 2.248 1.781 1.947 1.860 1.625 1.480 alice30 3.886 2.863 2.753 2.668 2.429 2.308 2.216 2.315 2.137 cdscs10t 3.684 2.835 2.688 2.557 2.399 2.276 2.164 2.260 2.079 grimm10t 3.543 2.557 2.135 2.120 2.008 1.713 1.661 1.858 1.696 pandp12t 3.552 2.684 2.267 2.256 2.071 1.831 1.769 1.870 1.705 average 3.672 2.641 2.415 2.369 2.128 1.983 1.907 1.991 1.823 comp. time n/a 1.594 3.672 3.250 19.62 13.31 16.32 5.641 7.375 dec. time n/a 0.219 0.844 0.969 0.719 1.219 1.360 1.765 7.859 figure 1. compression improvement relative to gzip the efficient storage of text documents in digital libraries | skibiński and swacha 153 acknowledgements szymon grabowski is the coauthor of the xml-wrt transform, which served as the basis for the ctdl library. references 1. john f. gantz et al., the diverse and exploding digital universe: an updated forecast of worldwide information growth through 2011 (framingham, mass.: idc, 2008), http://www .emc.com/collateral/analyst-reports/diverse-exploding-digital -universe.pdf (accessed may 7, 2009). 2. timothy c. bell, alistair moffat, and ian h. witten, “compressing the digital library,” in proceedings of digital libraries ‘94 (college station: texas a&m univ. 1994): 41. 3. ian h. witten and david bainbridge, how to build a digital library (san francisco: morgan kaufmann, 2002). 4. chad m. 
kahl and sarah c. williams, “accessing digital libraries: a study of arl members’ digital projects,” the journal of academic librarianship 32, no. 4 (2006): 364. 5. donald e. knuth, tex: the program (reading, mass.: addison-wesley, 1986); microsoft technical support, rich text format (rtf) version 1.5 specification, 1997, http://www.biblioscape .com/rtf15_spec.htm (accessed may 7, 2009); tim bray et al., eds., extensible markup language (xml) 1.0 (fourth edition), 2006, http://www.w3.org/tr/2006/rec-xml-20060816 (accessed may 7, 2009); dave raggett, arnaud le hors, and ian jacobs, eds., w3c html 4.01 specification, 1999, http://www.w3.org/ tr/rec-html40/ (accessed may 7, 2009); postscript language reference, 3rd ed. (reading, mass.: addison-wesley, 1999), http://www.adobe.com/devnet/postscript/pdfs/plrm.pdf (accessed may 7, 2009); pdf reference, 6th ed., version 1.7, 2006, http://www.adobe.com/devnet/acrobat/pdfs/pdf_ reference_1-7.pdf (accessed may 7, 2009). 6. jacob ziv and abraham lempel, “a universal algorithm for sequential data compression,” ieee transactions on information theory 23, no. 3 (1977): 337. 7. ian h. witten, alistair moffat, and timothy c. bell, managing gigabytes: compressing and indexing documents and images, 2nd ed. (san francisco: morgan kaufmann, 1999). 8. john g. cleary and ian h. witten, “data compression using adaptive coding and partial string matching,” ieee transactions on communication 32, no. 4, (1984): 396; michael burrows and david j. wheeler, “a block-sorting lossless data compression algorithm,” digital equipment corporation src research report 124, 1994, www.hpl.hp.com/techreports/ compaq-dec/src-rr-124.pdf (accessed may 7, 2009). 9. witten, moffat, and bell, managing gigabytes. 10. jon louis bentley et al., “a locally adaptive data compression scheme,” communications of the acm 29, no. 4 (1986): 320; r. nigel horspool and gordon v. cormack, “constructing word-based text compression algorithms,” proceedings of the data compression conference (snowbird, utah, 1992): 62. 11. see for example andrei v. kadach, “text and hypertext compression,” programming & computer software 23, no. 4 (1997): 212; alistair moffat, “word-based text compression,” software—practice & experience 2, no. 19 (1989): 185; przemysław skibiński, szymon grabowski, and sebastian deorowicz, “revisiting dictionary-based compression,” software— practice & experience 35, no. 15 (2005): 1455. 12. przemysław skibiński, jakub swacha, and szymon grabowski, “a highly efficient xml compression scheme for the web,” proceedings of the 34th international conference on current trends in theory and practice of computer science, lncs 4910 (2008): 766. 13. jon louis bentley et al., “a locally adaptive data compression scheme,” communications of the acm 29, no. 4 (1986): 320. 14. skibiński, grabowski, and deorowicz, “revisiting dictionary-based compression,” 1455. 15. skibiński, swacha, and grabowski, “a highly efficient xml compression scheme for the web,” 766. 16. peter deutsch, “deflate compressed data format specification version 1.3,” rfc1951, network working group, 1996, www.ietf.org/rfc/rfc1951.txt (accessed may 7, 2009). 17. christian schneider, precomp—a command line precompressor, 2009, http://schnaader.info/precomp.html (accessed may 7, 2009). 18. the technical details of the algorithm constructing code words and assigning them to indexes, and encoding numbers and special tokens, are given in skibiński, swacha, and grabowski, “a highly efficient xml compression scheme for the web,” 766. 19. 
david solomon, data compression: the complete reference, 4th ed. (london: springer-verlag, 2006). 20. skibiński, swacha, and grabowski, “a highly efficient xml compression scheme for the web,” 766. 21. dave raggett, arnaud le hors, and ian jacobs, eds., w3c html 4.01 specification, 1999, http://www.w3.org/tr/rec -html40/ (accessed may 7, 2009). 22. ian h. witten, david bainbridge, and stefan boddie, “greenstone: open source dl software,” communications of the acm 44, no. 5 (2001): 47. 23. project gutenberg, 2008, http://www.gutenberg.org/ (accessed may 7, 2009). 24. przemysław skibiński and szymon grabowski, “variablelength contexts for ppm,” proceedings of the ieee data compression conference (snowbird, utah, 2004): 409. alcts cover 2 lita cover 3, cover 4 index to advertisers 86 information technology and libraries | september 2011 on technology and other decisions in my career. i know that i can post a question to lita-l or ala connect and get a quick, diverse response to an inquiry. i know that i can call on my lita colleagues to serve as references and reviewers as i move through my career. i also know that i can depend upon lita to help keep me current and well informed about technology and how it is integrated into our libraries and lives. this also gives me an edge in my career. so much of the lita experience is currently gained from attending meetings in person and making connections—those of you who have attended the lita happy hour can probably attest to this. for several years lita has not had a requirement to attend meetings in person and allows for virtual participation in committees and interest groups. several ad hoc methods have developed to allow members to attend meetings virtually. to better institutionalize the process two new taskforces have been formed to look at virtual participation in formal and informal lita meetings. a broadcasting taskforce is charged with making a recommendation on the best ways to deliver business meetings and another taskforce is charged with investigating methods to deliver education and programming to members virtually. both taskforces will pay careful attention to interaction and other attributes of in person gatherings that can be applied to virtual meetings so that we retain the connection-making experience. it is hard to assign monetary value to membership in an association, but we do so every time we make a decision to join or renew membership. when i renew and pay annual dues to lita i affirm that i am receiving value, and i do so without thinking. it is a given that i will renew. in addition to my library memberships i am a member of the wildlife conservation society (the group behind the bronx zoo and several other zoos in nyc). each year as i renew my membership i do a quick cost analysis calculating how many times i visited the zoos and what it would have cost my family if we were not members. but before i can finish that exercise my mind begins to wander and i start to think about the excursions to the zoos-camel rides, newborn animals—and those experiences and the memories created derail any cost recovery exercise. it is impossible to put monetary value on the wonderful experiences my family share during our visits to the zoo (incidentally it is more economical as well). i also feel some pride in contributing to an organization that does such wonderful programming and makes a real difference for animals and our planet. i understand that my membership helps them do what they do best. 
i don’t do this cost analysis with lita, but perhaps i should. the current price of lita membership is sixty dollars per year, which is about sixteen cents per day. as members we need to ask ourselves if we are receiving in return what a s i write my first president’s message for ital, i am wrapping up my year as vice president and the ala annual conference is fast approaching. the past year has been a busy one—handling necessary division business, including meeting with my fellow ala vice presidents, making committee appointments, planning an orientation for new board members, strategic planning, and attending various conferences and meetings to prepare me for my role as lita president. i am lucky to follow such wonderful leaders as karen starr and michelle frisque, who have both helped ready me for the year ahead. my life outside of lita has been equally busy. i started a new position as the director of weill cornell medical college library earlier this year and have a busy home life with two small children. as usual, i have been juggling quite a bit and often dropping a few balls. my mantra is that it is impossible to keep all the balls in the air all of the time, but when they do drop be careful not to let them roll so far away from you so that you lose sight of them. eventually i pick them up and start juggling again. i know that i am not alone in this juggling exercise. lita members have real jobs and friends, family, and other social responsibilities that keep us busy. so why do we give so much to our profession, including lita? if you are like me, it is because we get so much in return. the importance of activity and leadership in national, professional associations cannot be overrated. my experience in lita and other professional library associations has given me an opportunity to hone leadership skills working with various committees and boards over the years. the achievements that i have made in my career have a direct correlation to my work with lita. as libraries flatten organizational structures, lita is one place where anyone can take on leadership roles, gaining valuable experience. many members have agreed to take on leadership roles in the coming year by volunteering for committees and taskforces and accepting various appointments and i want to thank everyone who came forward. in the coming year i will be working with several officers and committees to develop orientations, mentoring initiatives, and leadership training for our members. i do appreciate that not everyone wants to take on a leadership role in lita. the networking opportunities, both formal and informal, also have been extremely valuable in my career. the people i have met in lita have become colleagues i am comfortable turning to for advice colleen cuddy (colleen.cuddy@med.cornell.edu) is lita president 2011–12 and director of the samuel j. wood library and c. v. starr biomedical information center at weill cornell medical college, new york, new york colleen cuddy president’s message: reflections on membership continued on page 89 editorial | truitt 89 editorial.cfm (accessed july 13, 2011). 3. begin with fforde’s the eyre affair (2001) and proceed from there. if you are a librarian and are not quickly enchanted, you probably should consider a career change very soon! thank you, michele n! .youtube.com/watch?v=sps6c9u7ras. sadly, the rest of us must borrow or rent a copy. 2. marc truitt, “no more silver bullets, please,” information technology & libraries 29, no. 
2 (june 2010), http://www.ala.org/ala/mgrps/divs/lita/publications/ital/292010/2902jun/ we give to the organization. the lita assessment and research committee recently surveyed membership to find out why people belong to lita; this is an important step in helping lita provide programming and other offerings that will be most beneficial to its users, but the decision on whether to be a lita member, i believe, is more personal and doesn't rest on the fact that a particular drupal class is offered or that a particular speaker is a member of the top tech trends panel. it is based on the overall experience that you have as a member, the many little things. i knew in just a few minutes of attending my first lita open house 12 years ago that i had found my ala home in lita. i wish that everyone could have such a positive experience being a member of lita. if your experience is less than positive, how can it be more so? what are we doing right? what could we do differently? please let me or another officer know, and/or volunteer to become more involved and create a more valuable experience for yourself and others. president's message continued from page 86
reference information extraction and processing using conditional random fields

tudor groza, gunnar aastrand grimnes, and siegfried handschuh

abstract

fostering both the creation and the linking of data with the aim of supporting the growth of the linked data web requires us to improve the acquisition and extraction mechanisms of the underlying semantic metadata. this is particularly important for the scientific publishing domain, where currently most of the datasets are being created in an author-driven, manual manner. in addition, such datasets capture only fragments of the complete metadata, usually omitting important elements such as the references, although they represent valuable information. in this paper we present an approach that aims at dealing with this aspect of extraction and processing of reference information. the experimental evaluation shows that, currently, our solution handles very well diverse types of reference format, thus making it usable for, or adaptable to, any area of scientific publishing.

1. introduction

the progressive adoption of semantic web 1 techniques resulted in the creation of a series of datasets connected by the linked data 2 initiative, and via the linked data principles, into a universal web of linked data. in order to foster the continuous growth of this linked data web, we need to improve the acquisition and extraction mechanisms of the underlying semantic metadata. unfortunately, the scientific publishing domain, a domain with an enormous potential for generating large amounts of linked data, still promotes trivial mechanisms for producing semantic metadata. 3 as an illustration, the metadata acquisition process of the semantic web dog food server, 4 the main linked data publication repository available on the web, consists of two steps:

■■ the authors manually fill in submission forms corresponding to different publishing venues (e.g., conferences or workshops), with the resulting (usually xml) information being transformed via scripts into semantic metadata, and
■■ the entity uris (i.e., authors and publications) present in this semantic metadata are then manually mapped to existing web uris for linking/consolidation purposes.

tudor groza (tudor.groza@uq.edu.au) is postdoctoral research fellow, school of information technology and electrical engineering, university of queensland; gunnar aastrand grimnes (grimnes@dfki.uni-kl.de) is researcher, german research center for artificial intelligence (dfki) gmbh, kaiserslautern, germany; siegfried handschuh (msiegfried.handschuh@deri.org) is senior lecturer/associate professor, national university of ireland, galway, ireland.

moreover, independent of the creation/acquisition process, one particular component of the publication metadata, i.e., the reference information, is almost constantly neglected. the reason is mainly the amount of work required to manually create it, or the complexity of the task, in the case of automatic extraction. as a result, currently, there are no datasets in the linked data web exposing reference information, while the number of digital libraries providing search and link functionality over references is rather limited.
this is quite a problematic gap if we consider the amount of information provided by references and their foundational support for other application techniques that bring value to researchers and librarians, such as citation analysis and citation metrics, tracking temporal author-topic evolution 5 or co-authorship graph analysis. 6,7 in this paper we focus on the first of the above-mentioned steps, i.e., providing the underlying mechanisms for automatic extraction of reference metadata. we devise a solution that enables extraction and chunking of references using conditional random fields (crf). 8 the resulting metadata can then be easily transformed into semantic metadata adhering to particular schemas via scripts, the added value being the exclusion of the manual author-driven creation step from the process. from the domain perspective, we focus on computer science and health sciences only because these domains have representative datasets that can be used for evaluation and hence enable comparison against similar approaches. however, we believe that our model can be applied also in domains such as digital humanities or social sciences, and we intend, in the near future, to build a corresponding corpus that would allow us to test and adapt (if necessary) our solution to these domains. figure 1. examples of chunked and labeled reference strings reference chunking represents the process of label sequencing a reference string, i.e., tagging the parts of the reference containing the authors, the title, the publication venue, etc. the main issue associated with this task is the lack of uniformity in the reference representation. figure 1 presents three examples of chunked and labeled reference strings. one cannot infer generic patterns for all types of references. for example, the year (or date) of some of the references of this paper are similar to example 2 from the figure, i.e., they are located at the very end of the reference string. unfortunately, this does not hold for some journal reference formats, such as the one presented in example 1. and at the same time, the actual date might not comprise only the year, but also the month (and even day). in addition to the placement of the particular types of tokens within the reference string, one of the major concerns when labeling these types of tokens is disambiguation. generally, there are three categories of ambiguous elements: reference information extraction and processing |groza, grimnes, and handschuh 8  names—can act as authors, editors, or even part of organization names (e.g., max planck institute); in example 1 a name is used as part of the title;  numbers—can act as pages, years, days, volume numbers, or just numbers within the title;  locations—can act as actual locations or part of organization names (e.g., univ. of wisconsin) to help the chunker in performing disambiguation, one can use a series of markers, such as, pp. for pages, tr for technical reports, univ. or institute for organization. however, there are cases where such markers help in detecting the general category of the token, e.g., publication venue, but a more detailed disambiguation is required. for example, the proc. marker generally signals the publication venue of the reference, without knowing exactly whether it represents a workshop, conference or even journal (as in the case of proc. natl. acad. sci.—proceedings of the national academy of sciences). 
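since figure 1 itself is not reproduced here, the short python sketch below illustrates what such label sequencing produces for a single, hypothetical reference string; the reference, the tag spellings, and the grouping step are illustrative assumptions rather than the authors' actual data structures.

from itertools import groupby

# a hypothetical reference string, already tokenized and tagged as (token, label) pairs
chunked = [
    ("J.", "author"), ("Doe", "author"), ("and", "author"), ("A.", "author"), ("Smith.", "author"),
    ("Compressing", "title"), ("library", "title"), ("data.", "title"),
    ("In", "booktitle"), ("Proc.", "booktitle"), ("of", "booktitle"), ("the", "booktitle"),
    ("Digital", "booktitle"), ("Libraries", "booktitle"), ("Conference,", "booktitle"),
    ("pp.", "pages"), ("41-48,", "pages"),
    ("2009.", "date"),
]

# collapse consecutive tokens carrying the same label into the "blocked partition"
# of the reference string described in the text
fields = [(label, " ".join(tok for tok, _ in group))
          for label, group in groupby(chunked, key=lambda pair: pair[1])]

for label, text in fields:
    print(f"{label:10s} {text}")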
the solution we have devised was built to properly handle such disambiguation issues and the intrinsic heterogeneous nature of references. the features of the crf chunker model were chosen to provide a representative discrimination between the different fields of the reference string. consequently, as the experimental results show, the resulting chunker has a superior efficiency, while at the same time maintaining an increased versatility.

the rest of the paper is structured as follows: in section 2 we briefly describe conditional random fields and analyze the existing related work. section 3 details the crf-based reference chunker and, before concluding in section 5, section 4 presents our experimental results.

2. background

2.1 conditional random fields

to have a better understanding of the machine learning technique used by our solution, in the following we give a brief description of the conditional random fields paradigm.

figure 2. example linear crf—showing dependencies between features x and classes y

conditional random fields (crf) is a probabilistic graphical model for classification. crf, in general, can represent many different types of graphical models; however, in the scope of this paper, we use the so-called linear-chain crfs. a simple example of a linear dependency graph is shown in figure 2, where only the features x of the previous item influence the class of the current item y. the conditional probability is defined as

p(y | x) = (1 / Z(x)) exp( Σ_j θ_j F_j(y, x) ),

where F_j(y, x) = Σ_i f_j(y_{i-1}, y_i, x, i) and Z(x) = Σ_y exp( Σ_j θ_j F_j(y, x) ), the sum in Z(x) ranging over all possible label sequences y.

the model is usually trained by maximizing the log-likelihood of the training data by gradient methods. a dynamic algorithm is used to compute all the required probabilities p_θ(y_i, y_{i+1}) for calculating the gradient of the likelihood. this means that in contrast to traditional classification algorithms in machine learning (e.g., support vector machines 9 ), it not only considers the attributes of the current element when determining the class, but also attributes of preceding and succeeding items. this makes it ideal for tagging sequences, such as chunking of parts of speech or parts of references, which is what we require for our chunking task.

2.2 related work

in recent years, extensive research has been performed in the area of automatic metadata extraction from scientific publications. most of the approaches focus on one of the two main metadata components, i.e., on the heading/bibliographic metadata or on the reference metadata, but there are also cases when the entire set is targeted. as this paper focuses only on the second component, within this section we present and discuss those applications that deal strictly with reference chunking.

the parscit framework is the closest technique mapping to our goals and methodology. 10 parscit is an open-source reference-parsing package. while its first version used a maximum entropy model to perform reference chunking, 11 currently, inspired by the work of peng et al., 12 it uses a trained crf model for label sequencing. the model was obtained based on a set of twenty-three token-oriented features tailored towards correcting the errors that peng's crf model produced. our crf chunker builds on the work of parscit. however, as we aimed at improving the chunking performance, we altered some of the existing features and introduced additional ones. moreover, we have compiled significantly larger gazetteers required for detecting different aspects, such as names, places, organizations, journals, or publishers.
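to make the linear-chain formulation in section 2.1 concrete, the following minimal numpy sketch computes log Z(x) with the forward (dynamic programming) recursion and the probability of one label sequence. the emission and transition score matrices stand in for the weighted feature sums θ·f and are filled with random numbers, so this is an illustration of the mathematics only, not the mallet-based implementation used later in the paper.

import numpy as np

def logsumexp(a, axis):
    """numerically stable log-sum-exp along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_partition(emit, trans):
    """log Z(x) for a linear-chain crf, computed with the forward algorithm.
    emit[t, j]  -- score of label j at position t (the weighted feature sum)
    trans[i, j] -- score of moving from label i at position t-1 to label j at t
    """
    alpha = emit[0]
    for t in range(1, emit.shape[0]):
        # alpha'[j] = logsumexp_i(alpha[i] + trans[i, j]) + emit[t, j]
        alpha = logsumexp(alpha[:, None] + trans, axis=0) + emit[t]
    return logsumexp(alpha, axis=0)

def sequence_log_prob(labels, emit, trans):
    """log p(y | x) for one concrete label sequence y (list of label indices)."""
    score = emit[0, labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1], labels[t]] + emit[t, labels[t]]
    return score - log_partition(emit, trans)

# toy instance: a 4-token reference and 3 labels (say author / title / date)
rng = np.random.default_rng(0)
emit = rng.normal(size=(4, 3))
trans = rng.normal(size=(3, 3))
print(np.exp(sequence_log_prob([0, 0, 1, 2], emit, trans)))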
one of the first attempts to extract and index reference information led to the currently well known system, citeseer. 13 around the same period, seymore et al. developed one of the first reference chunking approaches that used machine learning techniques. 14 the authors trained a hidden markov model (hmm) to build a reference sequence labeler using internal states for different parts of the fields. as it represented pioneering work, it also resulted in the first gold standard set, the cora dataset. at a later stage, the same group applied crf for the first time to perform reference chunking, which later inspired parscit. 15 reference information extraction and processing |groza, grimnes, and handschuh 10 in the same learning-driven category is the work of han et al. 16 the authors proposed an effective word clustering approach with the goal of reducing feature dimensionality when compared to hmm, while at the same time improving the overall chunking performance. the resultant domain, rule-based word clustering method for cluster feature representation used clusters formed from various domain databases and word orthographic properties. consequently, they achieved an 8.5 percent improvement on the overall accuracy of reference fields classification combined with a significant dimensionality reduction. flux-cim 17 is the only unsupervised 18 approach that targets reference chunking. the system uses automatically constructed knowledge bases from an existing set of sample references for recognizing the component fields of a reference. the chunking process features two steps:  a probability estimation of a given term within a reference which is a value for a given reference field based on the information encoded in their knowledge bases, and  the use of generic structural properties of references. similarly to seymore et al., 19 the authors have also created two datasets (specifically for the computer science and health science areas) to be used for comparing the achieved accuracies. a completely different, and novel, direction was developed by poon and domingos. 20 unlike all the other approaches, they propose a solution where the segmentation (chunking) of the reference fields is performed together with the entity resolution in a single integrated inference process. they, thus, help in disambiguating the boundaries of less-clear chunked fields, using the already well-segmented ones. although the results achieved are similar to, and even better than some of, the above-mentioned approaches, this is suboptimal from the computational perspective: the chunking/resolution time reported by the authors measured around thirty minutes. in addition to the previously described works, which were specifically tailored for bibliographic metadata extraction, there are a series of other approaches that could be used for the same purpose. for example, cesario et al. propose an innovative recursive boosting strategy, with progressive classification, to reconcile textual elements to an existing attribute schema. 21 in the case of bibliographic metadata segmentation, the metadata fields would correspond to the textual elements, while an ontology describing them (e.g., dublincore 22 or swrc 23 ) would have the schema role. the authors even describe an evaluation of the method using the dblp citation dataset, however, without giving precise details on the fields considered for segmentation. 
some other approaches include, in general, any sequence labeling techniques, e.g., slf, 24 named entity recognition techniques, 25 or even field association (fa) terms extraction, 26 the latter working on bibliographic metadata fields in a quasi-similar manner as the recursive boosting strategy. in conclusion, it is worth mentioning that retrieving citation contexts is an interesting research area especially in the context of digital libraries. our current work does not feature this aspect, but we regard it as one of the key next steps to be tackled. consequently, we mention the research performed by schwartz et al. 27 teufel et al., 28 or wu et al. 29 that deal with using citation contexts for discerning a citation's function and analyzing how this influences or is influenced by the work it points to. information technology and libraries | june 2012 11 3. method this section presents the crf chunker model. we start by defining the preprocessing steps that deal with the extraction of the references block, dividing the block into actual reference entries and cleaning the reference strings, and then detail the crf reference chunker features. 3.1 prerequisites most of the features used by the crf chunker require some forms of vocabulary entries. therefore, we have manually compiled a comprehensive list of gazetteers (only for english, except for the names), explained as follows:  firstname—25,155 entries gazetteer of the most common first names (independent of gender);  lastname—48,378 entries list of the most common surnames;  month—month names gazetteer and associated abbreviations;  venuetype—a structured gazetteer with five categories: conference, workshop, journal, techreport, and website. each category has attached its own gazetteer, containing specific keywords and not actual titles. for example, the conference gazetteerfeatures ten unigrams signaling conferences, such as conference, conf, or symposium;  location—places, cities, and countries gazetteer comprising 17,336 entries;  organization—150 entries gazetteer listing organization prefixes and suffixes (e.g., e.v. or kgaa);  proceedings—simple list of all possible appearances of the proceedings marker;  publisher—564 entries gazetteer comprising publisher unigrams (produced from around 150 publisher names);  jtitle—12,101 entries list of journal title unigrams (produced from around 1600 journal titles);  connection—a 42 entries stop-word gazetteer (e.g., to, and, as). 3.2 preprocessing in the preprocessing stage we deal with three aspects:  cleaning the provided input,  extracting the reference block, and  the division of the reference block into reference entries. the first step aims to clean the raw textual input received by the chunker of unwanted spacing characters while at the same time ensuring proper spacing where necessary. since the source of the textual input is unknown to the chunker, we make no assumptions with regard to its structure or content. 30 thus, in order to avoid inherent errors that might appear as a result of extracting the raw text from the original document, we perform the following cleaning steps:  we compress the text by eliminating unnecessary carriage returns, such that the lines containing less than 15 characters are merged with previous ones, 31  we introduce spaces after some punctuation characters, such as “,,” “.” or “-”, and finally,  we split the camel-cased strings, such as johndoe. 
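a minimal python sketch of the three cleaning steps just listed follows, under two stated assumptions: the 15-character threshold and the punctuation set are taken literally from the description above, and the camel-case split is applied naively (a production version would have to spare urls, decimal numbers, and abbreviations).

import re

def clean_raw_text(raw, min_line_len=15):
    """compact the input, space out punctuation, and split camel-cased strings."""
    # 1. merge lines shorter than min_line_len characters into the previous line
    merged = []
    for line in raw.splitlines():
        line = line.strip()
        if merged and len(line) < min_line_len:
            merged[-1] = (merged[-1] + " " + line).strip()
        else:
            merged.append(line)
    text = "\n".join(merged)
    # 2. introduce a space after ",", "." and "-" where one is missing
    text = re.sub(r"([,.\-])(?=\S)", r"\1 ", text)
    # 3. split camel-cased strings such as "JohnDoe" -> "John Doe"
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return text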
reference information extraction and processing |groza, grimnes, and handschuh 12 the result will be a compact and clean version of the input. also, if the raw input is already compact and clean, this preprocessing step will not affect it. the extraction of the reference block is done using regular expressions. generally, we search in the compacted and cleaned input for specific markers, like references or bibliography, located mainly at the beginning of a line. if these are not directly found, we try different variations, such as, looking for the markers at the end of a line, or looking for split markers onto two lines (e.g., ref – erences, or refer – ences). this latter case is a typical consequence of the above-described compacting step if the initial input was erroneously extracted. the text following the markers is considered for division, although it may contain unwanted parts such as appendices or tables. the division into individual reference entries is performed on a case basis. after splitting the reference block based on new lines, we look for prefix patterns at the beginning of each line. as an example, we analyze which lines start with “[”, “(”, or a number followed by “.” or space, and we record the positions of these lines in the list of all lines. to ensure that we don't consider any false positives when merging the adjacent lines into a reference entry, we compute a global average of the differences between positions. assuming that a reference does not span on more than four lines, if this average is between one and four, a reference entry is created. the same average is also used to extract the last reference in the list, thus detaching it from eventual appendices or tables. 3.3 the reference chunking model we have built the crf learning model based on a series of features used in principle also by the other crf reference chunking approaches such as parscit 32 or peng and mccallum 33 . a set of feature values is used to characterize each token present in the reference string, where the reference's token list is obtained by dividing the reference string into space-separated pieces. the complete list of features is detailed as follows. we use example 1 from figure 1 toexemplify the feature values. 
 token—the original reference token: bronzwaer,  clean token—the original token, stripped of any punctuation and lower cased: bronzwaer  token ending—a flag signaling the type of ending (possible values: lower cap – c / upper cap – c / digit – 0 / punctuation character: ,  token decomposition–start—five individual values corresponding to token's first five characters, taken gradually: b, br, bro, bron, bronz  token decomposition–end—five individual values corresponding to the token's last five characters, taken gradually: r, er, aer, waer, zwaer,  pos tag—the token's part of speech tag (possible values: proper noun phrase – nnp ,  noun phrase – np, adjective – jj, cardinal number – cd, etc): nnp  orthographic case—a flag signaling the token's orthographic case (possible values:  initialcap, singlecap, lowercase, mixedcaps, allcaps): singlecap  punctuation type—a flag signaling the presence and type of a trailing punctuation character (possible values: cont, stop, other): cont  number type—a flag signaling the presence and type of a number in the token (possible values: year, ordinal, 1dig, 2dig, 3dig, 4dig, 4dig+, nonumber): nonumber information technology and libraries | june 2012 13  dictionary entries—a set of ten flags signaling the presence of the token in the set of individual gazetteers listed in sect. 3.1. for our example the dictionary feature set would be: no lastname no no no no no no no no  date check—a flag checking whether the token may contain a date in form of a period of days, e.g., 12-14 (possible values: possdate, no): no  pages check—a flag checking whether the token may contain pages, e.g., 234–238 (possible values: posspages, no): no  token placement—the token placement in the reference string, based on its division into nine equal consecutive buckets. this feature indicates the bucket number: 0 for training purposes we compiled and manually tagged a set of 830 randomly chosen references. these were extracted from random publications from diverse conferences and journals from the computer science field (collected from ieee explorer, springer link or the acm portal), manually cleaned, tagged, and categorized according to their type of publication venue. 34 to achieve an increased versatility, instead of performing crossvalidation, 35 which would result in a datasettailored model with limited or no versatility, we opted for sampling the test data. hence, we included in the training corpus some samples from the testing datasets as follows: 10 percent of the cora dataset (i.e., 20 entries), 36 10 percent of the flux-cim cs dataset (i.e., 30 entries), 37 and 1% of the flux-cim hs dataset (i.e., 20 entries). consequently, the final training corpus consisted of a total of 900 reference strings. to clarify, this is, to some extent, similar to the dataset-specific cross-validation, but instead of considering, for example, a 60–40 ratio for training/testing, we used only 10 percent for training, while the testing (described in section 4) was performed as a direct application of the chunker on the entire dataset. as already mentioned, our focus on computer science and health sciences is strictly due to evaluation purposes. our proposed model is domain-agnostic, and hence, the steps described here can be easily performed on datasets emerged from other domains, if at all necessary. 
in reality, the chunker’s performance on references from a domain not covered above can be easily boosted simply by including a sample of references in the training set and then retraining the chunker. the list of labels used for training and then testing consists of author, title, journal, conference, workshop, website, technicalrep, date, publisher, location, volnum, pages, etal, note, editors, organization. as we will see in the evaluation, not all labels were actually used for testing (e.g., note or editors), some of them being present in the model for the sake of disambiguation. also, as opposed to the other approaches, we made a clear distinction between workshop and conference, which adds an extra degree to the complexity of the disambiguation. the crf model was trained using the mallet (a machine learning for language toolkit) implementation. 38 the output of the chunker is post-processed to expose a series of fine-grained details. as shown in figure 1 in all the examples, the chunking provides a blocked partition of the reference string, but we require for the author field an even deeper partition. consequently, following a rule-based approach we extract the individual author names from the author block making use of the punctuation marks, the orthographic case, and the alternation between initials and actual names. when no initials, subject to the existing punctuation marks, we consider as a rule-of-thumb that each name generally comprises one first name and one surname (in this order, i.e., john doe). the result of the post-processing is used in the linking process. reference information extraction and processing |groza, grimnes, and handschuh 14 4. experimental results we have performed an extensive evaluation of the proposed reference chunking approach. in general, all the previous work in reference chunking focuses on raw reference chunking, i.e., label sequencing at the macro level. more concretely, the other approaches split and tag the reference strings using blocks of complete references, without going into details such as chunking individual authors. the only exception is the parscit package that does perform complete reference chunking in a similar fashion as we do. the evaluation results presented in this section, will feature complete chunking only for our solution and for parscit, and raw chunking for the rest of the approaches. field parscit peng han et al. our approach p r f1 f1 p r f1 p r f1 author 98.7 99.3 98.99 99.4 92.6 99.1 97.6 99.08 99.6 99.30 title 96.0 98.4 97.18 98.3 92.2 93.0 92.6 95.64 95.64 95.64 date 100 98.4 99.19 98.9 98.5 95.9 97.2 99.33 98.67 98.99 pages 97.7 98.4 98.04 98.6 95.6 96.9 96.2 99.28 99.22 99.24 location 95.6 90.0 92.71 87.2 77.7 71.5 74.5 93.45 92.59 93.01 organization 90.9 87.9 89.37 94.0 76.5 77.3 76.9 100 87.87 93.54 journal 90.8 91.2 90.99 91.3 77.1 78.7 77.9 94.02 97.42 95.68 booktitle 92.7 94.2 93.44 93.7 88.7 88.9 88.88 97.77 98.44 98.10 publisher 95.2 88.7 91.83 76.1 56.0 64.1 59.9 94.84 95.83 95.33 tech. rep. 94.0 79.6 86.2 86.7 56.2 64.1 59.9 100 90.90 95.23 website 100 100 100 table 1. evaluation results on the cora dataset an additional observation we need to make is related to the reference fields taken into account. most of the fields we have focused on coincide with the fields considered by all the existing relevant approaches. 
nevertheless, there are also some discrepancies, listed as follows:

■■ the fields volume, number, editors, or note were used in the chunking process but are not considered for evaluation
■■ unlike all the other approaches, we make the distinction between conference and workshop as publication venues. however, for alignment purposes (i.e., to be able to compare our results with the other approaches), in the evaluation results these are merged into the booktitle field.

the actual tests were performed on four different datasets, three of them used also for evaluating the other approaches, and a fourth one compiled by us. in the case of the three existing datasets, during the experimental evaluation we did not make use of the preprocessing step as they were already clean. as the evaluation metric, we used the f1 score, 39 i.e., the harmonic mean of precision and recall, using the following formula:

f1 = 2 × precision × recall / (precision + recall)

in the following, we iterate over each dataset by providing a short description and the experimental results. it is worth mentioning that our crf reference chunker was trained only once, as described earlier, and not specifically for each dataset.

4.1 dataset: cora

the cora dataset is the first gold standard created for automatic reference chunking. 40 it comprises two hundred reference strings and focuses on the computer science area. each entry is segmented into thirteen different fields: author, editor, title, booktitle, journal, volume, publisher, date, pages, location, tech, institution and note.

table 1 shows the comparative evaluation results on the cora dataset of parscit, peng et al., 41 han et al., 42 and our approach. we observe that our chunker outperforms the other chunkers on most of the fields, with some of them presenting a significant increase in performance (looking at the f1 score): journal from 91.3 percent to 95.68 percent, booktitle from 93.44 percent to 98.10 percent, publisher from 91.83 percent to 95.33 percent, and especially tech. rep. from 86.7 percent to 95.23 percent. in the case of the fields where our chunker was outperformed, the f1 score is very close to the best of the approaches and includes an increase in one of its two components (i.e., precision or recall). for example, on the organization field, we scored 93.54 percent, the best being peng's 94 percent. however, we achieved a gain of almost 10 percent in precision when compared with parscit (100 percent vs. 90.9 percent precision). similarly, on the date field, our f1 was 98.99 percent, opposed to parscit's 99.19 percent, but with a better recall of 98.67 percent.

               parscit                   flux-cim                 our approach
field          p      r      f1         p      r      f1         p      r      f1
author         98.8   99.0   98.89      93.59  95.58  94.57      99.08  99.08  99.08
title          98.8   98.3   98.54      93.0   93.0   93.0       99.65  99.65  99.65
date           99.8   94.5   97.07      97.75  97.44  97.59      98.55  98.19  98.36
pages          94.7   99.3   96.94      97.0   97.84  97.41      97.28  97.72  97.49
location       96.9   88.4   92.45      96.83  97.6   97.21      95.55  94.5   95.02
journal        97.1   82.9   89.43      95.71  97.81  96.75      94.0   97.91  95.91
booktitle      95.7   99.3   97.46      97.47  95.45  96.45      99.13  99.13  99.13
publisher      98.8   75.9   85.84      100    100    100        98.59  98.59  98.59

table 2. evaluation results on the flux-cim dataset—cs domain

               flux-cim                 our approach
field          p      r      f1         p      r      f1
author         98.57  99.04  98.81      99.8   99.36  99.57
title          84.88  85.14  85.01      91.39  91.39  97.39
date           99.85  99.5   99.61      99.89  99.69  99.78
pages          99.1   99.2   99.45      99.94  99.59  99.76
journal        97.23  89.35  93.13      99.42  99.16  99.28

table 3. evaluation results on the flux-cim dataset—hs domain
4.2 dataset: flux-cim

flux-cim 43 is an unsupervised 44 reference extraction and chunking system. in order to evaluate its performance, the authors of flux-cim created two separate datasets:

■■ the flux-cim cs dataset, composed of a collection of heterogeneous references from the computer science field, and
■■ the flux-cim hs dataset, comprising an organized and controlled collection of references from pubmed.

the flux-cim cs dataset contains three hundred reference strings randomly selected from the acm digital library. each string is segmented into ten fields: author, title, conf, journal, volume, number, pub, date, pages and place. the flux-cim hs dataset contains 2000 entries, with each entry segmented into six fields: author, title, journal, volume, date and pages.

table 2 presents the comparative test results achieved by parscit, flux-cim, and our approach on the cs dataset. similar to the cora dataset, our chunker outperformed the other chunkers on the majority of the fields, exceptions being the location, journal, and publisher fields. the test results on the hs dataset are presented in table 3. here we can observe a clear performance improvement on all fields, in some cases the difference being significant, e.g., the title field, from 85.01 percent to 97.39 percent, or the journal field, from 93.12 percent to 99.28 percent. this increase is even more relevant considering the size of the dataset, each 1 percent representing twenty references.

4.3 dataset: cs-sw

while the cora and flux-cim cs datasets do focus on the computer science field, they do not cover the slight differences in reference format that can be found nowadays in the semantic web community. consequently, to show the even broader application of our approach, we have compiled a dataset named cs-sw comprising 576 reference strings randomly selected from publications in the semantic web area, from conferences such as the international semantic web conference (iswc), the european semantic web conference (eswc), the world wide web conference (www), or the european conference on knowledge acquisition (and co-located workshops). 45 each reference entry is segmented into twelve fields: author, title, conference, workshop, journal, techrep, organization, publisher, date, pages, website and location.

table 4 shows the results of the tests carried out on this dataset. one can easily observe that the chunker performed in a similar manner as on the cora dataset, with emphasis on the author, date, pages and publisher fields.

               our approach
field          p      r      f1
author         98.61  99.27  98.93
title          94.91  93.29  94.09
date           98.89  98.34  98.61
pages          98.94  97.24  98.08
location       93.9   92.77  93.33
organization   85.71  80.00  82.75
journal        94.59  93.33  93.95
conference     96.66  95.08  95.86
workshop       83.33  88.23  85.71
publisher      96.61  97.43  97.01
tech. rep.     100    80     88.88
website        98.14  94.64  96.35

table 4. evaluation results on the cs-sw dataset

5. conclusion

in this paper we presented a novel approach for extracting and chunking reference information from scientific publications. the solution, realized using a crf-trained chunker, achieved good results in the experimental evaluation, in addition to an increased versatility shown by applying the one-time trained chunker on multiple testing datasets.
this enables a straightforward adoption and reuse of our solution for generating semantic metadata in any digital library or publication repository focused on scientific publishing. as next steps, we plan to create a comprehensive dataset covering multiple heterogeneous domains (e.g., social sciences or digital humanities) and evaluate the chunker’s performance on it. then we will focus on developing an accurate reference consolidation and linking technique, to address the second step mentioned in section 1, i.e., aligning the resulting metadata to the existing linked data on the web. we plan to develop a flexible consolidation mechanism by dynamically generating and executing sparql queries from chunked reference fields and filtering the results via two string approximation metrics (a combination of monge-elkan and chapman soundex algorithms). the sparql queries generation will be implemented in an extensible manner, via customizable query modules, to accommodate the heterogeneous nature of the diverse linked data sources. finally, we intend to develop an overlay interface for arbitrary online publication repositories, to enable on-the-fly creation, visualization, and linking of semantic metadata from repositories that currently do not expose their datasets in a semantic / linked manner. acknowledgements the work presented in this paper has been funded by science foundation ireland under grant no. sfi/08/ce/i1380 (lion-2). references and notes 1. tim berners-lee et al., “the semantic web,” scientific american 284 (2001): 35–43. 2. christian bizer et al., “linked data—the story so far,” international journal on semantic web and information systems 5 (2009): 1–22. 3. generating computer-understandable metadata represents an issue, in general, in the publishing domain, and not necessarily only in its scientific area. however, the relevant literature dealing with metadata extraction/generation has focused on scientific publishing, because of its accelerated growing rate, especially with the increasing use of the world wide web as a dissemination mechanism. reference information extraction and processing |groza, grimnes, and handschuh 18 4. knud moeller et al., “recipes for semantic web dog food – the eswc and iswc metadata projects,” proceedings of the 6th international semantic web conference (busan, korea, 2007). 5. wei peng and tao li, “temporal relation co-clustering on directional social network and author-topic evolution,” knowledge and information systems 26 (2011): 467–86. 6. laszlo barabasi et al., “evolution of the social network of scientific collaborations,” physica a: statistical mechanics and its applications 311 (2002): 590–614. 7. xiaoming liu et al., “co-authorship networks in the digital library research community,” information processing & management 41 (2005): 1462–80. 8. john d. lafferty et al., “conditional random fields: probabilistic models for segmenting and labeling sequence data,” proceedings of the 18th international conference on machine learning (san francisco, ca, usa, 2001): 282–89. 9. vladimir vapnik, the nature of statistical learning theory (new york: springer, 1995). 10. isaac g. councill et al., “parscit: an open-source crf reference string parsing package,” proceedings of the sixth international language resources and evaluation (marrakech, morocco, 2008). 11. yong kiat ng, “citation parsing using maximum entropy and repairs” (master's thesis, national university of singapore, 2004). 12. 
12. fuchun peng and andrew mccallum, "information extraction from research papers using conditional random fields," information processing & management 42 (2006): 963–79.
13. c. lee giles et al., "citeseer: an automatic citation indexing system," proceedings of the third acm conference on digital libraries (pittsburgh, pa, 1998): 89–98.
14. kristie seymore et al., "learning hidden markov model structure for information extraction," proceedings of the aaai workshop on machine learning for information extraction (1999): 37–42.
15. isaac g. councill et al., "parscit: an open-source crf reference string parsing package," proceedings of the sixth international language resources and evaluation (marrakech, morocco, 2008).
16. hui han et al., "rule-based word clustering for document metadata extraction," proceedings of the symposium on applied computing (santa fe, new mexico, 2005).
17. eli cortez et al., "flux-cim: flexible unsupervised extraction of citation metadata," proceedings of the 2007 conference on digital libraries (new york, 2007): 215–24.
18. machine learning methods can be broadly classified into two categories: supervised and unsupervised. supervised methods require training on specific datasets that exhibit the characteristics of the target domain. to achieve high accuracy levels, the training dataset needs to be reasonably large and, more importantly, it has to cover most of the possible exceptions from the intrinsic data patterns. unlike supervised methods, unsupervised methods do not require training and, in principle, use generic rules to encode both the expected patterns and the possible exceptions of the target data.
19. peng and mccallum, "information extraction from research papers using conditional random fields."
20. hoifung poon and pedro domingos, "joint inference in information extraction," proceedings of the 22nd national conference on artificial intelligence (vancouver, british columbia, canada, 2007): 913–18.
21. ariel schwartz et al., "multiple alignment of citation sentences with conditional random fields and posterior decoding," proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (prague, czech republic, 2007): 847–57.
22. simone teufel et al., "automatic classification of citation function," proceedings of the 2006 conference on empirical methods in natural language processing (sydney, australia, 2006): 103–10.
23. jien-chen wu et al., "computational analysis of move structures in academic abstracts," coling/acl interactive presentation sessions (sydney, australia, 2006): 41–44.
24. eugenio cesario et al., "boosting text segmentation via progressive classification," knowledge and information systems 15 (2008): 285–320.
25. dublin core website, http://dublincore.org (accessed may 4, 2011).
26. york sure et al., "the swrc ontology—semantic web for research communities," proceedings of the 12th portuguese conference on artificial intelligence (covilha, portugal, 2005).
27. yanjun qi et al., "semi-supervised sequence labeling with self-learned features," proceedings of the ieee international conference on data mining (miami, fl, usa, 2009).
28. david sanchez et al., "content annotation for the semantic web: an automatic web-based approach," knowledge and information systems 27 (2011): 393–418.
29. tshering cigay dorji et al., "extraction, selection and ranking of field association (fa) terms from domain-specific corpora for building a comprehensive fa terms dictionary," knowledge and information systems 27 (2011): 141–61.
30. please note that the chunker is document-format agnostic and takes as input only raw text. the actual extraction of this raw text from the original document (pdf, doc, or some other format) is the user's responsibility.
31. as a note, we chose this length of fifteen characters empirically, based on the assumption that in any format the publication content lines usually have more than fifteen characters.
32. lafferty et al., "conditional random fields: probabilistic models for segmenting and labeling sequence data."
33. councill et al., "parscit: an open-source crf reference string parsing package."
34. the manual tagging was performed by a single person, and since the reference chunks have no ambiguity attached, we did not see the need for running any data reliability tests.
35. ron kohavi, "a study of cross-validation and bootstrap for accuracy estimation and model selection," proceedings of the 14th international joint conference on artificial intelligence (montreal, quebec, 1995): 1137–43.
36. peng and mccallum, "information extraction from research papers using conditional random fields."
37. councill et al., "parscit: an open-source crf reference string parsing package."
38. mallet: machine learning for language toolkit, http://mallet.cs.umass.edu (accessed may 4, 2011).
39. william m. shaw et al., "performance standards and evaluations in ir test collections: cluster-based retrieval models," information processing & management 33 (1997): 1–14.
40. peng and mccallum, "information extraction from research papers using conditional random fields."
41. councill et al., "parscit: an open-source crf reference string parsing package."
42. seymore et al., "learning hidden markov model structure for information extraction."
43. han et al., "rule-based word clustering for document metadata extraction."
44. cortez et al., "flux-cim: flexible unsupervised extraction of citation metadata."
45. the cs-sw dataset is available at http://resources.smile.deri.ie/corpora/cs-sw (accessed may 4, 2011).
editorial board thoughts: system requirements
this past spring, my alma mater, the school of library and information studies (slis) at the university of alberta, restructured the it component of its mlis program. as a result, as of september 2010, incoming students are expected to possess certain basic it skills before beginning their program.1 these skills include the following:
■■ comprehension of the components and operations of a personal computer
■■ microsoft windows file management
■■ proficiency with microsoft office (or similar) products, including word processing and presentation software
■■ use of e-mail
■■ basic web browsing and searching
this new requirement got me thinking: is this common practice among ala-accredited library schools? if other schools are also requiring basic it skills prior to entry, how do those required by slis compare? so i thought i'd do a little investigating to see what others in "library school land" are doing. before i continue, a word of warning: this was by no means a rigorous scientific investigation, but rather an informal survey of the landscape.
i started my investigation with ala’s directory of institutions offering accredited master’s programs.2 there are fifty-seven institutions listed in the directory. i visited each institution’s website and looked for pages describing technology requirements, computer-competency requirements, and the like. if i wasn’t able to find the desired information after fifteen or twenty minutes, i would note “nothing found” and move on to the next. in the end i found some sort of list of technology or computer-competency requirements on thirty-three (approximately 58 percent) of the websites. it may be the case that such a list exists on other sites and i didn’t find it. i should also note that five of the lists i found focus more on software and hardware than on skills in using said software and hardware. even considering these conditions, however, i was somewhat surprised at the low numbers. is it simply assumed that today’s students already have these skills? or is it expected that they will be picked up along the way? i don’t claim to know the answers, and discovering them would require a much more detailed and thorough investigation, but they are interesting questions nonetheless. once i had found the requirements, i examined them in some detail to get a sense of the kinds of skills listed. while i won’t enumerate them all, i did find the most common ones to be similar to those required by slis— basic comfort with a personal computer and proficiency with word processing and presentation software, e-mail, file management, and the internet. a few (5) schools also list comfort with local systems (e-mail accounts, online courseware, etc.). several (7) schools mention familiarity with basic database design and functionality, while a few (5) list basic web design. very few (3) mention competency with security tools (firewalls, virus checkers, etc.), and just slightly more (4) mention familiarity with web 2.0 tools like blogs, wikis, etc. while many (14) specifically mention searching under basic internet skills, few (7) mention proficiency with opacs or other common information tools such as full-text databases. interestingly, one school has a computer programming requirement, with mentions of specific acceptable languages, including c++, pascal, java, and perl. but this is certainly the exception rather than the rule. i was encouraged that there seems to be a certain agreement on the basics. but i was a little surprised at the relative rarity of competency with wikis and blogs and all those web 2.0 tools that are so often used and talked about in today’s libraries. is this because there is still some uncertainty as to the utility of such tools in libraries? or is it because of a belief that the members of the millennial or “digital” generation are already expert in using them? i don’t know the reasons, but it is interesting to ponder nonetheless. i was also surprised that a level of information literacy isn’t listed more often, particularly given that we’re talking about slis programs. i do know, of course, that many of these skills will be developed or enhanced as students work their way through their programs, but it also seems to me that there is so much other material to learn that the more that can be taken care of beforehand, the better. librarians work in a highly technical and technological environment, and this is only going to become even more the case for future generations of librarians. 
certainly, basic familiarity with a variety of applications and tools and comfort with rapidly changing technologies are major assets for librarians. in fact, ala recognizes the importance of "technological knowledge and skills" as core competencies of librarianship. specifically mentioned are the following:
■■ information, communication, assistive, and related technologies as they affect the resources, service delivery, and uses of libraries and other information agencies.
■■ the application of information, communication, assistive, and related technology and tools consistent with professional ethics and prevailing service norms and applications.
■■ the methods of assessing and evaluating the specifications, efficacy, and cost efficiency of technology-based products and services.
■■ the principles and techniques necessary to identify and analyze emerging technologies and innovations in order to recognize and implement relevant technological improvements.3
given what we know about the importance of technology to librarians and librarianship, my investigation has left me with two questions: (1) why aren't more library schools requiring certain it skills prior to entry into their programs? and (2) are those who do require them asking enough of their prospective students? i hope you, our readers, might ask yourselves these questions and join us on italica for what could turn out to be a lively discussion.
sharon farnel (sharon.farnel@ualberta.ca) is metadata & cataloguing librarian at the university of alberta in edmonton, alberta, canada.
references
1. university of alberta school of library and information studies, "degree requirements: master of library & information studies," www.slis.ualberta.ca/mlis_degree_requirements.cfm (accessed aug. 5, 2010).
2. american library association office for accreditation, "library & information studies directory of institutions offering accredited master's programs 2008–2009," 2008, http://ala.org/ala/educationcareers/education/accreditedprograms/directory/pdf/lis_dir_20082009.pdf (accessed aug. 5, 2010).
3. american library association, "ala's core competences of librarianship," january 2009, www.ala.org/ala/educationcareers/careers/corecomp/corecompetences/finalcorecompstat09.pdf (accessed aug. 5, 2010).
colleen cuddy
president's message: data discovery
colleen cuddy (colleen.cuddy@med.cornell.edu) is lita president 2011–12 and director of the samuel j. wood library and c. v. starr biomedical information center at weill cornell medical college, new york, new york.
last week i attended the second annual vivo conference in washington, d.c. vivo (vivoweb.org) is a semantic web application that enables the discovery of research and scholarship across disciplines in an institution, with the potential to also link scholars and research across institutions. despite an earthquake and a hurricane, the conference itself was the real showstopper—excellent, informative programming, engaging speakers, great networking and exchange of ideas. my institution is one of the core vivo members, so it was an opportunity to showcase our work, see what others are doing, and learn more about trends in research, e-science, and data discovery and collaboration initiatives. much of what i learned or rediscovered at vivo will make it into my fifty-minute presentation on the subject at the lita national forum in st. louis later this month. in fact, the vivo conference itself reminded me of our own national forum in size, scope, and content. it was a good mix of in-depth technical discussions coupled with broad coverage of issues and trends in scientific research. this attention to content balance is something that lita consistently gets right at our annual forum—there is literally something for everyone, from introductory concepts to technical details—and i look forward to seeing many familiar faces and meeting some new folks at this year's lita national forum in st. louis, "rivers of data: currents of change."
i would also like to take this opportunity to personally invite each and every ital reader to the 2012 lita national forum. building on this year's theme, the 2012 lita national forum will be "the new world of data: discover. connect. remix." i just signed off on the theme this week, and i am excited and impressed by the work completed by the national forum planning committee so far. please look for the call for papers and posters to come out in late december. i love the forum because it is much more intimate than the much larger ala meetings; i always come away with new ideas and new friends. i am not alone in this feeling. a recent forum attendee commented, "(the lita forum) was one of the best conferences i have attended. i met a far greater concentration of peers—colleagues at other libraries doing similar work—at lita forum than i have met at other similar conferences." i don't think i could say it better myself. the 2012 forum theme is one of great personal interest to me, and i plan to extend the theme to the lita president's program on june 24, 2012, in anaheim.
in fact, hardly a day goes by in my professional life (and it sometimes creeps into my personal life too!) when i don't think about the issues of connecting people with data, and then how to present that data in ways that are relevant to their needs. the tides are shifting in health sciences libraries and likely in your library too. ongoing changes in publishing and the changing nature of research have challenged the traditional nature of the library. it is no longer solely a repository for information, physical or virtual. as librarians move from collecting and cataloging bibliographic information, new roles have emerged in data discovery, in its preservation, and in helping to make data more accessible. important specialties include knowledge management, data visualization, e-science, and copyright. librarians have valuable skill sets in mining and accessing data, human–computer interaction, computer interface design, and knowledge management that can be leveraged now.
it is inevitable that data discovery will quicken the pace of science and lead to collaboration, and collaboration will in turn lead to data discovery and accelerate the pace of science, and so on. in short, twentieth-century data stored in individual scientists' notebooks or computers is largely inaccessible; twenty-first-century data needs to be available 24/7 in a curated state for continuous analysis. information overload and the data deluge created by the intersection of science and technology are two very real problems that librarians have the skills and ability to deal with. and, as i talk of science, bear in mind that it extends beyond the biological and physical sciences to encompass the social sciences as well. interdisciplinary studies in particular have intensive data needs. in fields such as public health and urban planning, government data alongside research data is used to predict trends, forecast, make decisions, etc. government data is a particularly important part of the equation. consider the recent nsf requirement for researchers to provide open access to their data for any nsf-sponsored grants. it is likely other government agencies will follow suit.
one of taiga's provocative statements of 2011 is "#10. the oversupply of mlss," which states that "within five years, library programs will have overproduced mlss at a rate greater even than humanities phds and glutted a permanently diminished market."1 as the alarming scenario of an overabundance of new mlss in proportion to available library jobs presents itself, i encourage librarians to begin to envision themselves as digital information brokers or data scientists. the us department of labor, in the 2010–11 occupational outlook handbook, anticipates that librarian jobs in nontraditional settings will grow the fastest over this decade. nontraditional libraries and jobs include working as information brokers for private corporations, nonprofit organizations, and consulting firms. "many companies are turning to librarians because of
their research and organizational skills and their knowledge of computer databases and library automation systems. librarians can review vast amounts of information and analyze, evaluate, and organize it according to a company's specific needs."2 we have been seeing new job titles emerging to reflect these needs, such as data curation librarian, digital data outreach librarian, gis librarian, etc.
what is your library doing with data? how can you and your library address the data needs of the twenty-first century? what technology is needed to address data needs? how can lita help you meet those needs? consider this column a call to arms for librarians of all backgrounds. the time to address data discovery is now!
references
1. "taiga 2011 provocative statements," http://taigaforumprovocativestatements.blogspot.com/ (accessed sept. 22, 2011).
2. united states department of labor, bureau of labor statistics, occupational outlook handbook, 2010–11 edition, http://www.bls.gov/oco/ocos068.htm (accessed sept. 22, 2011).
statement of ownership, management, and circulation
information technology and libraries, publication no. 280-800, is published quarterly in march, june, september, and december by the library information and technology association, american library association, 50 e. huron st., chicago, illinois 60611-2795. editor: marc truitt, associate director, information technology resources and services, university of alberta, k adams/cameron library and services, university of alberta, edmonton, ab t6g 2j8 canada. annual subscription price, $65. printed in u.s.a. with periodical-class postage paid at chicago, illinois, and other locations. as a nonprofit organization authorized to mail at special rates (dmm section 424.12 only), the purpose, function, and nonprofit status for federal income tax purposes have not changed during the preceding twelve months.
extent and nature of circulation (average figures denote the average number of copies printed each issue during the preceding twelve months; actual figures denote the actual number of copies of the single issue published nearest to filing date: september 2010 issue). total number of copies printed: average, 4,547; actual, 4,494. mailed outside country paid subscriptions: average, 3,608; actual, 3,577. sales through dealers and carriers, street vendors, and counter sales: average, 395; actual, 367. total paid distribution: average, 4,003; actual, 3,944. free or nominal rate copies mailed at other classes through the usps: average, 27; actual, 27. free distribution outside the mail (total): average, 118; actual, 117. total free or nominal rate distribution: average, 145; actual, 144. total distribution: average, 4,148; actual, 4,088. office use, leftover, unaccounted, spoiled after printing: average, 399; actual, 406. total: average, 4,547; actual, 4,494. percentage paid: average, 96.50; actual, 96.48.
statement of ownership, management, and circulation (ps form 3526, september 2007) filed with the united states post office postmaster in chicago, october 1, 2011.
this article discusses structural, systems, and other types of bias that arise in matching new records to large databases. the focus is databases for bibliographic utilities, but other related database concerns will be discussed. problems of satisfying a "match" with sufficient flexibility and rigor in an environment of imperfect data are presented, and sources of unintentional variance are discussed.
editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital.
sameness is a sometime thing. libraries and other information-intensive organizations have long faced the problem of large collections of records growing incrementally. computerized records in a networked environment have encouraged the recognition that duplicate records pose a serious threat to efficient information retrieval. yet what constitutes a duplicate record may be neither exact nor completely predictable. levels of discernment are required to permit matches on records that do not differ significantly and to reject those that do.
initial definitions
matching is defined as the process by which additions to a large database are screened and compared with existing database records. ideally, this process of matching ensures that duplicates are not added, nor erroneous replacements made of record pairs that are not really equivalent. oclc (online computer library center, inc.) is a nonprofit organization serving member libraries and related institutions throughout the world. it is the chief database capital of the organization, and it is "owned" in a sense by the member libraries worldwide that use and contribute to it. at this writing, it contains over seventy-three million records. this discussion focuses chiefly on oclc's extended worldcat (xwc), though many of the issues are common to other bibliographic databases. examples of these include the research libraries group's research libraries information network (rlin) database, pica (a european cooperative of libraries headquartered in the netherlands), and other union catalogs.
the literature will demonstrate that the problems described exist in many if not most large bibliographic databases. the database contents are representations or surrogates of the objects in shared collections. individual records in xwc are complex bibliographic representations of physical or virtual objects—books, films, urls, maps, slides, and much more. each of these records consists of metadata, i.e., "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource"1 (appendix a). the records use an xml variation of the marc communications format.2 for example, a record for a book might typically contain fields such as author, title, publisher, and date, and many more in addition. the representation of any one object can be quite complex, containing scores of fields and subfields. such a record may be quite brief, or several thousand characters long. the depth and richness of the records varies enormously. they may describe materials in more than 450 languages. this is a database against which millions of searches and millions of records are processed each month.
why is matching a challenge? two records describing the same intellectual creation or work (e.g., shakespeare's othello) can vary by physical form and other attributes. two records describing both the same work and exactly the same form can differ from each other if the records were created under different rules of record description (cataloging). two records intended to describe the same object can vary unintentionally if typographical or other entry errors are present in one or both. thus sorting out significant from insignificant differences is critical. an example of the challenges of developing matching software in the metadata capture project is described elsewhere.3
the scope of misinformation is limited to information storage and retrieval, and specifically to comparison of incoming records to candidate matches in the database. the authors define misinformation as follows:
1. anything that can cause two database records, i.e., representations of different items, to be mistaken as representations of the same item. these can lead to inappropriate merging or updates.
2. the effect of techniques or processes of search that can obscure distinctions in differing items.
3. any case where matching misses an appropriate match due to nonsignificant differences in two records that really represent the same item.
note that disinformation (the intentional effort to misrepresent) is not considered in scope for this discussion. the assumption is that cooperation is in the interests of all parties contributing to a shared database. we do not assume that all institutions sharing the database have the same goals.
misinformation and bias in metadata processing: matching in large databases
gail thornburg and w. michael oskins
gail thornburg (thornbug@oclc.org) has taught at the university of maryland and the university of illinois, and served as an adjunct professor at kent state university, and as a senior-level software engineer at oclc. w. michael oskins (oskins@oclc.org) has worked as a developer and researcher at oclc for twenty years.
what is bias?
bias can be defined as factors in the creation or processing of database records that feed on misinformation or missing information, and skew charac­ terizations of the database records in question. context—matching and bias how are matching and bias related to each other? the growth of a database is in part a function of the matching process. if matching is not tuned correctly, the database can grow or change in nonoptimal ways. another way to look at the problem is to consider the goal of success in searching, and the need to know when to stop. human beings recognize that failure to find the best information for a given problem may be costly. finding the best information when less would suffice may also be costly. systems need to know this. for a large shared data­ base, hundreds of thousands of records may be processed in a day; the system must be as efficient as possible. what are some costs? fail to match when one should, and duplicates may proliferate in the database. match badly, and there is risk of merging multiple records that do not represent the same item. a system of matching can fail in more than one way. balance is needed. 1. searches, which are based on data in the incom­ ing record, may be too precise to find legitimate matches. loosen the criteria too much, and the search may return too many records to compare. 2. once retrieved, candidate matches are evaluated. compare candidates too narrowly, and records with insignificant differences will be rejected. fail to take note of salient differences between incom­ ing record and database record, and the match will be wrong, undetected, and potentially hard to detect in the future. the goals vary in different matching projects. for some projects, setting “holdings,” the indication that a member library owns a copy of something, is the main goal of the processing. this does not involve adding, replacing, or merging database records. for other projects, the goal is to update the database, either by replacing matched records, merging multiple duplicate records into one, or by adding new records if no match is found in the database. for the latter, bad matching could compromise database contents. n background hickey and rypka provide a good review of the problems of identifying duplicates and the implications for match­ ing software.4 their study notes concerns from a variety of library networks including that of the university of toronto (utlas), washington library network (wln), and research libraries group (rlin). they also refer­ ence studies on duplicate detection in the illinois state­ wide bibliographic database and at oak ridge national laboratories. background discussion of broader misinfor­ mation issues in shared library catalogs can be found in bade’s paper.5 a good, though dated, review of duplicate record problems can be found in the o’neill, rogers, and oskins article.6 the authors discuss their analysis of differences in records that are similar but not identical, and which elements caused failure to match two records for the same item. for example, when there was only one differing element in a pair, they found that element was most often publication date. their study shows the difficulties for experts to determine with certainty that a bibliographic record is for the same item. 
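the two ways a matching system can fail described above—retrieval that is too narrow or too broad, and candidate comparison that is too strict or too forgiving—can be made concrete with a small sketch. the python fragment below is a hypothetical illustration of that two-stage shape (retrieve candidates, then compare fields), not the matching software discussed in this article; the field names, the search_index function, and the result-set limit are invented for the example.

    def find_match(incoming, search_index, max_candidates=100):
        # stage 1: candidate retrieval. start with the most specific boolean
        # query and broaden it by dropping terms if nothing comes back.
        terms = [incoming.get("title"), incoming.get("author"), incoming.get("date")]
        terms = [t for t in terms if t]
        candidates = []
        while terms:
            candidates = search_index(terms)   # assumed to return records matching all terms
            if candidates:
                break
            terms = terms[:-1]                 # drop a term to broaden the query
        if not candidates or len(candidates) > max_candidates:
            return None                        # nothing retrievable, or too many to compare

        # stage 2: candidate comparison, field by field.
        for record in candidates:
            if normalize(record.get("title", "")) != normalize(incoming.get("title", "")):
                continue                       # salient difference: reject this candidate
            if record.get("date") and incoming.get("date") and record["date"] != incoming["date"]:
                continue                       # differing dates are a frequent cause of mismatch
            return record                      # first surviving candidate is treated as the match
        return None

    def normalize(text):
        # fold case and collapse whitespace so trivial differences do not block a match
        return " ".join(text.lower().split())

tightening or loosening either stage shifts which failure mode dominates: a stricter comparison lets duplicates into the database, while a looser one risks merging records that actually represent different items.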
problems of typographical errors in shared biblio­ graphic records come under discussion by beall and kafadar.7 their study of copy cataloging errors found only 35.8 percent were corrected later by libraries, though the ordinary assumption is that copy cataloging will be updated when more information is available for an item. pollock and zamora report on a spelling error detection project at chemical abstracts service (cas) and charac­ terize the types of errors they found.8 chemical abstracts databases are among the most searched databases in the world. cas is usually characterized as a set of sources with considerable depth and breadth. of the four most common typographical errors they describe, errors of omission are most common, with insertion second, substitution third, and transposition fourth. over 90 percent of the errors they found were single letter errors. this is in agreement with the findings of o’neill and aluri, though the databases were substantially different.9 another study on moving­ image materials focuses on problems of near­equivalents in cataloging.10 yee suggests that cataloging practice tends to lead to making too many separate records for near equivalents. owen gingerich provides insight in the use of holdings information in oclc and other bibliographic utilities such as rlin for scholarly research in locating early editions of copernicus’ de revolutionibus.11 among other sources, he used holdings information in multiple bibliographic utilities to help in collecting a census of copies of de revolutionibus, and plotting its movements through europe in the sixteenth century. his article high­ lights the importance of distinguishing very similar items for scholarly research. shedenhelm and burk discuss the introduction of vendor records into oclc’s worldcat database.12 their results indicate that these minimal­level records increase the duplication rate within the database and can be costly to upgrade. (see further discussion in the section change in contributor characteristics below.) one problem in analysis of sources of mismatch in previous studies is that there is no good way to detect and charac­ public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 17misinformation and bias in metadata processing | thornburg and oskins 17 terize typos that form real words. jasco reviews studies characterizing types and sources of errors.13 sheila intner compares the quality issues in the databases of oclc and the research libraries group (rlg) and finds the issues similar.14 intner used matched samples of records from both worldcat and rlin to list and compare types of errors in the records. she noted that while the perception at that time was that rlin had higher­quality cataloging, the differences found were not statistically significant. jeffrey beall, while focusing in his study on the full­ text online database jstor, notes the commonality of problems in metadata quality.15 in addition, he discusses the special quality problems in a database of scanned images. the scanning software itself may introduce typo­ graphical errors. like xwc, the database changes rapidly. o’neill and visine­goetz present a survey of quality con­ trol issues in online databases.16 their sections on dupli­ cate detection and on matching algorithms illustrate the commonalities of these problems in a variety of shared cataloging databases. they cite variation in title as the most common reason for failure to identify a duplicate record that should match. 
variations in publisher, names, and pagination were noted as common. lei zeng pres­ ents a study of chinese language records in the oclc and rlin databases.17 zeng discusses quality problems including (1) format errors such as field and subfield tagging and incorrect punctuation; (2) content errors such as missing fields and internal record inconsisten­ cies; and (3) editing and inputting errors such as spacing and misspelling. part 2 of her study presents the results of the prototype rule­based system developed to catch such errors.18 while the author refrains from comparing the quality of oclc and rlin chinese language catalog records, the discussion makes clear that the quality issues are common to a number of online databases. more work is needed on quality and accuracy of shared records in non­roman scripts, or in other lan­ guages transliterated to roman script. n types of bias to be considered specific factors that may tend to bias an attempt to match one record to another include: 1. violated expectations—system software expects data it does not receive, or data received is not well formed. 2. temporal bias—changes in rules and philosophies of record creation over time. 3. design bias—choices in layout of the records, which favor one type of record representation at the expense of another. 4. judgment calls—distinctions introduced in record representations due to differing but legitimate variation in expert judgment. oclc is a multina­ tional cooperative and there is no universal set of standards and rules for creating database records. rules of cataloging most widely used are not abso­ lutely prescriptive and are designed to allow local deviation to meet local needs.19 5. structural bias—process and systems bias. this category reflects internal influences, inherent in the automatic processing, storage, and retrieval of large numbers of records. 6. growth of the database environment—whether in raw numbers of records, numbers of specific formats, numbers of foreign languages, or other characteristics that may affect efficient location and comparison of records. 7. changes in contributor characteristics––in the goals or focus of institutions that contribute to the database. violated expectations data may not conform to expectations. expectations about the nature of records in the data­ bases are frequently violated. what seem to be good rules for matching may not work well if the incoming data is not well formed, or simply not constructed as expected. biasing sources in the incoming data include the fol­ lowing: 1. typographical errors occur in titles and other parts of the record. anywhere the software has to parse text, an entry error—or even correction of an entry error by a later update—could con­ found matching. this could confound both (a) query execution and (b) candidate comparisons. basically the system expects textual data such as the name of a title or publisher to be correct, and machine­based efforts to detect errors in data are expensive to run. spelling detection techniques can compensate in some ways for data problems, but will not identify cases of real­word errors. see kukich for a survey of spelling error, real­word, and context­dependent techniques.20 2. there is also the issue of real word differences in similar text strings. an automated system with programmed fault tolerance may wrongly equate the publisher name “mila” with “mela” when they are distinct publishers. 
equivalence tables can cross­reference known variations on well­known publisher names, but cannot predict merges and other organizational changes. or consider author names: are “john smith” and “jon smith” the 1� information technology and libraries | june 20071� information technology and libraries | june 2007 same? this is a major problem with automated authority control where context clues may not be trustworthy. 3. errors of formatting of variable fields in the meta­ data contribute to false mismatch. the rules for data entry in the marc record are complex and have changed over time. erroneous placement or coding of subfields poses challenges for iden­ tification of relevant data. the software must be fault tolerant wherever possible. changes in the format of the data itself in these fields/sub­ fields may further complicate record comparisons. isbns (international standard book numbers) and lccns (library of congress control numbers) have both changed format in the recent past. 4. errors occur in the fields that indicate format of the information. in bibliographic records, format information is used to derive the overall type of material being described: book, url, dvd, and so on. errors in the data in combination can generate an incorrect material type for the record. 5. language of cataloging: this comparison has in the past caused inappropriate mismatches. the require­ ments in the new matching aimed to address this. 6. language in formation of queries: marc records frequently are a mixture of languages. as has been seen in other projects with intensive comparison of text, overlap in languages has the potential to confuse comparisons of short strings of text.21 the assumption made here is that the use of all pos­ sible syllables contained in the title should tend to mitigate language problems. nothing short of semantic analysis by the software is likely to solve such a problem, and contextual approaches to detection have had most success (in the produc­ tion environment) in carefully controlled cases. matching overall must be generic in its problem solving techniques. temporal bias large databases developed over time have their contents influenced by changes in standards for record creation, changes in contributor perception of the role of the data­ base, and changes in technology to be described. changes may include the following: 1. description level: e.g. changes such as book or elec­ tronic book. these have evolved from format­ to content­based descriptions that transcend format. over time, the cataloging rules for describing formats have changed. thus a format description created earlier might inadvertently “mismatch” the newer description of exactly the same item. for example, the rules for describing a book on a cd originally emphasized the cd format, whereas now, the emphasis might be shifted to focus on the intellectual content, the fact that it is a book. 2. the role of the database once perceived as chiefly repository or even backup source for a given library has become a shared resource with responsibilities to a community larger than any one library. 3. over time, the use of the database may change. (this is further discussed in the section on growth of the environment later.) searching has to satisfy the reference function of the database, but match­ ing as a process also relies on searching, and its goals are different. 4. varied standards worldwide challenge coopera­ tion. while u.s. 
libraries usually follow aacr2 and use the marc21 communications format, other parts of the world may use unimarc and country­specific cataloging rules. for instance, the pica bibliotekssystem, which hosts the dutch union catalog, used the prussian cataloging rules, which tended to focus on title entries.22 the switch to the rak was made by the early nineties.23 5. some libraries may not use any form of marc but submit a spreadsheet that is then converted to marc. there is some potential for ambiguities in those conversions due to lack of 1:1 correspon­ dence of parts. 6. even within a country, standards change over time, so that “correct” cataloging in one decade may not match that in a later period. neither is wrong, in its own temporal context, but each results in different metadata being created to describe the same item. intner points out that oclc’s database was initi­ ated a full decade before rlg implemented rlin, and rlin started almost the same time as the aacr2 publication.24 thus rlin had many fewer pre­aacr2 records in its database, while worldcat had many more preexisting records to try to match with the newer aacr2 forms. 7. objects referenced in the database may change over time. for instance, a record describing an elec­ tronic resource may point to a location no longer valid for that resource. 8. vendor records are created as advance advertis­ ing, but there is no guarantee the records will be updated later. estimating the time before updates occur is impossible. 9. records themselves change over time as they are copied, derived, and migrated into other systems. they may be enhanced or corrected in any system where they reside. so when they return to the origi­ nating database, they may have been transformed so far as to be unrecognizable as representations of the same item. this problem is not unique to xwc; public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 1�misinformation and bias in metadata processing | thornburg and oskins 1� it is a challenge for any shared database where export of records and reentry is likely. design bias the title, author, publisher, place of publication, and other elements of a record, designed in a time when most of the contents of a library were books, may not appear as clear or usable for other forms of informa­ tion, such as web sites or software. there is a risk to any design of a representation for an object, that it may favor distinctions in one format over another. or representations imported from other schemes may lose distinctions in the crosswalk from one scheme to another. a crosswalk is a mechanism for the mapping of data elements/content from one metadata scheme to another. dublin core and marc are just two examples of schemes used by library professionals. software exists to convert dublin core metadata to marc for­ mat, but the process of converting less complex data to a scheme of more structured data has inevitable limita­ tions. for instance, dublin core has “subject” while marc has dozens of ways to indicate subject, each with a different kind of designation for subject aspects of an item.25 see discussion in beall.26 libraries commonly exchange or purchase records from external sources to reduce the volume or costs of in­house cataloging. if an institution harvests metadata from multiple sources, there can be varying structures, content standards, and overall quality, all of which can make record compari­ sons error prone. 
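as a rough illustration of the kind of information loss a crosswalk can introduce, the sketch below maps a simplified dublin core description onto a handful of marc-like fields. it is a toy mapping written for this discussion, not an implementation of the marc standard or of any production crosswalk, and funneling every dc subject into a single uncontrolled index term field is exactly the sort of flattening described above.

    def dc_to_marc(dc_record):
        # toy crosswalk from a flat dublin core dict to marc-style (tag, value) pairs.
        # dublin core has a single 'subject' element; marc distinguishes many kinds of
        # subject access (650 topical, 651 geographic, 600 personal name, ...). with no
        # further information, everything lands in one uncontrolled index term field,
        # so those distinctions are lost on the way in.
        mapping = {
            "creator": "100",     # main entry, personal name (assumed personal)
            "title": "245",
            "publisher": "260",
            "date": "260",        # folded into the same publication statement
            "identifier": "020",  # assumed to be an isbn
        }
        marc_fields = []
        for element, values in dc_record.items():
            if isinstance(values, str):
                values = [values]
            for value in values:
                if element == "subject":
                    marc_fields.append(("653", value))   # uncontrolled index term
                elif element in mapping:
                    marc_fields.append((mapping[element], value))
                # elements with no obvious target (e.g., 'relation') are silently dropped,
                # another place where the richer source description cannot be reconstructed
        return marc_fields

    # example: three quite different kinds of subject collapse into three identical 653 tags
    print(dc_to_marc({"title": "othello", "creator": "shakespeare, william",
                      "subject": ["tragedy", "venice (italy)", "othello (fictitious character)"]}))

a record with several carefully distinguished subject terms comes out the other side as a run of interchangeable index fields, which is precisely the loss of granularity that makes later record comparisons error prone.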
while library and information science professionals have been creating metadata in the form of catalog records for a long time, the wider community of digital repositories may be outside the lis commu­ nity, and have varied understanding of the need for consistent representations of data. robertson discusses the challenges of metadata creation outside the library community.27 museums and archives may take a dif­ ferent view of what quality standards in metadata are. for example, for a museum, extensive detail about the provenance of an object is necessary. archives often record information at the collection level rather than the object level; for example, a box of miscellaneous papers, as opposed to a record for each of the papers within the box. educators need to describe resources such as learning objects. a learning object is any entity, digital or nondigital, which can be used, reused, or referenced during technology­supported learning 28 for these objects a metadata record using the ieee lom standard may be used.29 while this is as complex as a marc record, it has less bibliographic description and more focus on description of the nature and use of the learning object. in short, for one type of institution the notion of appropriate granularity of description may be too detailed or too vague for the needs of another type of institution. judgment calls two persons creating independent records for the same item exercise judgment in describing what is most impor­ tant about the object. one may say it is a book with an accompanying cd, another may say it is software on a cd, accompanied by a book of documentation. another example of legitimate variation is the choice of use of ellipses […] to leave out parts of long titles in a metadata description. one record creator may list the whole title, another may list only the first part followed by the mark of ellipsis to indicate abbreviation of the lengthy title. either is correct, but may not match each other without special techniques. see appendix b for the perils of ellipsis handling. the form of name of a publisher, given other occur­ rences of a publisher name in a record, may be abbrevi­ ated. for instance, in one place the corporate author who is also the publisher might be listed in the author field as “department of health and human services” and then abbreviated—or not—in the publisher area as “the department.” note that there are limitations inherent to the valida­ tion of any system of matching, in that human reviewers may not be able to determine whether two representa­ tions in fact describe the same item. structural bias 1. process bias refers to any features of the software which at run­time may change the way matching is carried out, whether by shortening or lengthen­ ing the analysis, or otherwise branching the logical flow. this can arise from many sources, including but not limited to the following factors. a. there is need for efficient processing of large num­ bers of incoming records. this can force an empha­ sis on speedy matching. that is, matching not required to replace records tends to be optimized to stop searching/matching as early as is reason­ able. in the case where unique key searching finds a single match to an incoming record, it is fairly easy for the software to “justify” stopping. if there are multiple matches found, more analysis may be needed before the decision to stop matching can be made. over time the numbers of records processed has increased enormously. b. 
matching needs to exploit “unique” keys to speed searching, yet these may not prove to be unique. though agreements are in place for use of numeric keys such as isbns, creation of these keys is not under the control of any one organization. 20 information technology and libraries | june 200720 information technology and libraries | june 2007 c. problems arise when brief records are com­ pared with fuller records. comparisons may be biased inadvertently towards false matches. such sparseness of data has been identified as a problem in rlin matching as well as in xwc. d. at the same time there is bias toward less generic titles in matching. requirements of sys­ tem throughput mandate an upper limit on the size of result set that the matching software will even attempt to analyze. this upper limit could tend to discriminate against effective retrieval of generic titles. matching will reject very large results sets of searches. so the query that has fewer title terms may tend to retrieve too much. titles such as “proceedings” or “bulletin” may be difficult to match if insufficient other informa­ tion is present in the record for the query to use. ironically this can mean addition of more generic titles to the database, since what is there is in effect less findable. e. transparency can contribute to bias in that, for each layer of transparency a layer of opacity may be added, when information is filtered out from a user’s view. that user may be a human or an application. openurl access to “appropriate copy” is an example from the standards world. the complexity of choosing among multiple online copies has become known as the “appro­ priate copy” problem. there are a number of instances where more than one legitimate copy of an electronic article may exist, such as mir­ roring or aggregator databases. it is essentially a problem of where and how to introduce localiza­ tion into the linking process.30 appropriateness reflects the user’s context, e.g., location, license agreements in place, cost, and other factors. 2. systems bias. what is this, really? the database can be seen as “agent.” the weight of its own mass may affect efforts to use its contents. a. for maintainers of large database systems, the goals of database repository and search engine may be somewhat at odds. yet librarians do make use of the database as reference source. b. search strategies for the software that acts as a user of the database is necessarily developed and optimized at a certain point in time. yet a river of new information flows into this data­ base. 1. if the numbers of types of entries in various database indexes grows nonproportion­ ally, search strategies that worked well in the past could potentially fall “out of tune” with the database contents. see growth of the environment section below. 2. change in proportions of languages in the database may render an application’s use of stopword lists less effective. 3. if changes in technology or practice result in new forms of material being described in the database, the software searches using material type as a limiter may not work properly. the software is using abstractions provided by the database, and they need to be kept synchronized. c. automated query construction presents its own problems. the use of boolean searching [term a and term b and term c] is quite restrictive in the sense that there is no “halfway” or flex for a record being included in a set of candidates. 
matching starts with the most specific search to avoid too­high numbers of records retrieved, and all it can do is drop or rearrange terms from a query in the effort to broaden the results. d. disconnects in metadata object creation/revision are another problem. links can point to broken uris (uniform resource identifiers). controlled vocabularies can drift or expand. even more confusing, a uri that is not broken may point to content which has changed to the point where the metadata no longer describes the item it once did. at one extreme, bruce and hillmann describe the curious case of citation of judicial opinions, for which a record of the opinion may be created as much as eighteen months before the volume with the official citation is printed, and thus the official citation cannot be created.31 e. expectations for creation of metadata play a role as well. traditional cataloging has generally had an expectation that most metadata is being cre­ ated once and reused. yet current practice may be more iterative, and must be, if such problems as records with broken internet uris are to be avoided. f. loss of synchronization can subvert process­ ing. note that other elements of metadata may become divorced or out of synch with the origi­ nal target /purpose. the prefix to an isbn was originally intended to describe the publisher, but is now an unreliable discriminator. numeric keys intended to identify items uniquely can retrieve multiple items, if the scheme for assign­ ing them is not applied consistently. in the worst case, meaningful data elements may become so corrupted as to be useless for record retrieval or even comparison of two records. g. ownership issues can detract from optimal data­ base management. member institutions’ percep­ tions of ownership of individual records can conflict with the goals of efficient search and retrieval. members may resist the idea of a “bet­ public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 21misinformation and bias in metadata processing | thornburg and oskins 21 ter” record being merged with a “lesser” one. so systems have ways of ranking records by source or contents with the general goal of trying to avoid losing information, but with the specific effect of refraining from actions that might be enriching in a given case. growth of the database environment a shared database can grow in unpredictable ways. a change in the relative proportions of different types of materials or topical coverage can render once­effective searches ineffective due to large result sets. an example of this is the number of internet­related entries in xwc. a search such as “dog” restricted to “internet­related” entries in 1995 retrieved thirty­four hits. this might be a manageable number. but in 2005, 225 entries were in the result set. similarly with subject headings, one search on “computer animation” retrieved fourteen hits in 1980, and 342 in 2005. in both cases the result sets grew from manageable to “too large” over time. the increase in the number of foreign language entries in a database can cause problems. just determining what language an entry is in can be difficult, and records may contain multiple languages. also, such languages as chinese, japanese, and korean can overlap. chinese syllables such as: “a, an, to, no, jan, ka, jun, lung, sung, i, lo, la, le, so, sun, juan,” seen out of context might be chinese or any one of several other languages. 
determining appropriate handling of stopwords and other rules for effective title matching becomes more complex as more languages populate the database. changes in contributor characteristics copy cataloging practices in an institution can affect xwc indirectly. an institution previously oriented to fixing downloaded records may adopt a policy of refrain­ ing from changing downloaded records. historical inde­ pendence of libraries is one illustration. prior to the 1970s, most libraries did not share their cataloging with other libraries. many institutions, especially smaller ones, were outside the loop and did things their own way. they used what rules they felt were useful, if they used any rules at all. later they converted sparse and poorly formed data into marc records and sent them to oclc for matching, perhaps in an effort to get back a more complete and useful record. yet the matching process is not always able to distinguish or interpret these local dialects. changes in specialization of cata­ loging staff at an institution, or cutbacks in staff can lead to reduced facility in providing original cataloging. outsourcing of cataloging work can affect handling of specialized materials as well. the introduction of vendor records and their characteristics has been noted by shedenhelm and burk.32 as they note, these records are very brief bibliographic records originally designed to advertise an item for sale by the vendor. these mini­ mal level records have a relatively high degree of dupli­ cation with existing records (37.5 percent in their study) and because of their sparseness can increase the cost of cataloging. changes in the proportion of contribu­ tors who create records in non­marc formats such as dublin core can affect the completeness of bibliographic entries. the use of such formats, meant to facilitate the entry of bibliographic materials, does come with a cost. group cataloging is a process whereby smaller libraries can join a larger set of institutions in order to reduce costs and facilitate cataloging. this larger group then contributes to oclc’s database as an entity. the growth of group cataloging has resulted in the addition of more records from smaller libraries, which may in the future have an effect on searching/matching in xwc worldcat overall. internationalization may be a factor as well. the marc format is an anglo­based format with english­language­based documentation. rapid inter­ national growth thrusts a broader range of traditions into a marc/oclc world. the role of character sets is heightened as the database grows. a cyrillic record may not be confidently matched to a transliterated record for the same item. although worldcat has a long his­ tory with cjk records, marc and worldcat are not yet accustomed to a wide repertoire of character sets. now, however, xwc is an environment in which expanding character coverage is possible, and likely. future research n we need more systematic study of the types of errors/omissions encountered in marc record cre­ ation. n how can the process of matching accomodate objects that change over time? n how does the conversion from new metadata schemes affect matching to marc records? does it help to know in what format a record arrived, or under what rules it was created? n how can we address sparseness in vendor records or legal citations? how can we deal with other advance publication issues? n how do changes in philosophy of the database affect the integrity of the matching process? 
conclusions

in this review we have seen that characterizing metadata at a high level is difficult. challenges for adding to a large, complex database include some of the following:

■■ rules for expert creation of metadata inevitably change over time.
■■ the object of the metadata itself may change, more often than may be convenient.
■■ comparisons of briefer records to records that are more elaborate descriptions can have pitfalls. search and comparison strategies for such record pairs are challenged by the need to have matching algorithms that work for every scenario.
■■ changes within the database may themselves contribute to the exacerbation of matching problems if duplicates are added too often, or records are merged that actually represent different contents. because of the risk, policies for merging and replacing records tend to be conservative, but this does not always favor the greatest efficiency in database processing.
■■ changes in the membership sharing a database are likely to affect its shape and searchability.
■■ newer schemes of metadata representation are likely to challenge existing algorithms for determining matches.

references

1. national information standards organization, understanding metadata (bethesda, md.: niso pr., 2004), 1, http://www.niso.org/standards/resources/understandingmetadata.pdf (accessed feb. 26, 2006).
2. library of congress, "marc 21 concise format for bibliographic data (2002)," http://www.loc.gov/marc/bibliographic/ecbdhome.html (accessed nov. 20, 2004).
3. gail thornburg, "matching: discrimination, misinformation, and sudden death," informing science conference, flagstaff, ariz., june 2005.
4. thomas b. hickey and david j. rypka, "automatic detection of duplicate monographic records," journal of library automation 12, no. 2 (june 1979): 125–42.
5. david bade, "the creation and persistence of misinformation in shared library catalogs," occasional paper no. 211 (graduate school of library and information science, university of illinois at urbana–champaign, apr. 2002).
6. edward t. o'neill, sally a. rogers, and w. michael oskins, "characteristics of duplicate records in oclc's online union catalog," library resources and technical services 37, no. 1 (1993): 59–71.
7. jeffrey beall and karen kafadar, "the effectiveness of copy cataloging at eliminating typographical errors in shared bibliographic records," library resources & technical services 48, no. 2 (apr. 2004): 92–101.
8. j. j. pollock and a. zamora, "collection and characterization of spelling errors in scientific and scholarly text," journal of the american society for information science 34, no. 1 (1983): 51–58.
9. edward t. o'neill and rao aluri, "a method for correcting typographical errors in subject headings in oclc records," research report no. oclc/opr/rr-80/3 (1980).
10. martha m. yee, "manifestations and year-equivalents: theory, with special attention to moving-image materials," library resources and technical services 38, no. 3 (1995): 227–55.
11. owen gingerich, "researching the book nobody read: the de revolutionibus of nicolaus copernicus," the papers of the bibliographical society of america 99, no. 4 (2005): 484–504.
12. laura d. shedenhelm and bartley a. burk, "book vendor records in the oclc database: boon or bane?" library resources and technical services 45, no. 1 (2001): 10–19.
13. peter jasco, "content evaluation of databases," in annual review of information science and technology, vol. 32 (medford, n.j.: information today, inc., for the american society for information science, 1997), 231–67.
14. sheila intner, "quality in bibliographic databases: an analysis of member-controlled cataloging of oclc and rlin," advances in library administration and organization 8 (1989): 1–24.
15. jeffrey beall, "metadata and data quality problems in the digital library," journal of digital information 6, no. 3 (2005): 10–11.
16. edward t. o'neill and diane vizine-goetz, "quality control in online databases," annual review of information science and technology 23 (washington, d.c.: american society for information science, 1988).
17. lei zeng, "quality control of chinese-language records using a rule-based data validation system. part 1: an evaluation of the quality of chinese-language records in the oclc oluc database," cataloging and classification quarterly 16, no. 4 (1993): 25–66.
18. lei zeng, "quality control of chinese-language records using a rule-based data validation system. part 2: a study of a rule-based data validation system for online chinese cataloging," cataloging and classification quarterly 18, no. 1 (1993): 3–26.
19. anglo-american cataloguing rules, 2nd ed., 2002 rev. (chicago: ala, 2002).
20. karen kukich, "techniques for automatically correcting words in text," acm computing surveys 24, no. 4 (1992): 377–439.
21. gail thornburg, "the syllables in the haystack: technical challenges of non-chinese in a wade-giles to pinyin conversion," information technology and libraries 21, no. 3 (2002): 120–26.
22. hartmut walravens, "serials cataloguing in germany: the historical development," cataloging and classification quarterly 35, no. 3/4 (2003): 541–51; instruktionen für die alphabetischen kataloge der preuszischen bibliotheken vom 10. mai 1899, 2. ausg. in der fassung vom 10. august 1908 (berlin: behrend & co., 1909).
23. richard greene, e-mail message to author, nov. 13, 2006; regeln für die alphabetische katalogisierung: rak / irmgard bouvier (wiesbaden, germany: l. reichert, 1980, c1977).
24. intner, "quality in bibliographic databases."
25. richard greene, e-mail message to author, feb. 27, 2006.
26. beall, "metadata and data quality problems in the digital library."
27. r. john robertson, "metadata quality: implications for library and information science professionals," library review 54, no. 5 (2005): 295–300.
28. ieee learning technology standards committee, "wg12: learning objects metadata," http://ltsc.ieee.org/wg12 (accessed feb. 26, 2006).
29. ibid.
30. orien beit-arie et al., "linking to the appropriate copy: report of a doi-based prototype," d-lib 7, no. 9 (sept. 2001).
31. thomas r. bruce and diane i. hillmann, "the continuum of metadata quality: defining, expressing, exploiting," in metadata in practice (chicago: ala, 2004), 238–56.
32. shedenhelm and burk, "book vendor records in the oclc database."

appendix a. sample cdfrecord record from the xwc database

cgm 7a 27681290 vf bcahru mr baaafu 920714r19551952fr 092 mleng 92513007 dlcamim dlc lp5921 u.s.
copyright office xxu mr vbe 6360-6361 (viewing copy) fgb 5643-5647 (ref print) fpa 0621-0625 (masterpos) othello (motion picture : welles) the tragedy of othello-- the moor of venice / a mercury production, [films marceau?] ; directed, produced, and written by orson welles. u.s. ; [morocco?] france : films marceau, 1952 ; [morocco? : s.n., 1952?] ; united states : united artists, 1955. 2 videocassettes of 2 (ca. 92 min.) : sd., b&w ; 3/4 in. viewing copy. 10 reels of 10 on 5 (ca. 8280 ft.) : sd., b&w ; 35 mm. ref print. 10 reels of 10 on 5 (ca. 8280 ft.) : sd., b&w ; 35 mm. masterpos. copyright: orson welles; 19sep52; lp5921. reference sources cited below and m/b/rs preliminary cataloging card list title as othello. photography, anchisi brizzi, g.r. aldo, george fanto ; film editors, john shepridge, jean sacha, renzo lucidi, william morton ; music, francesco lavagnino, alberto barberis. orson welles, suzanne cloutier, micheál macliammóir, robert coote. director, producer, and writer credits taken from focus on orson welles, p. 205. lc has u.s. reissue copy. dlc new york times, 9/15/55. an adaptation of the play by william shakespeare. reference sources used: new york times, 9/15/55; international motion picture almanac, 1956, p. 329; focus on orson welles, p. 205–206; monthly film bulletin, v. 23, no. 267, p. 44; index de la cinématographie française, 1952, p. 496. received: 5/26/87 from lc video lab; viewing copy; preservation, made from ref print, paperwork in acq: copyright--material movement form file, lwo 21635; copyright collection. received: 12/2/64; ref print; copyright deposit; copyright collection. received: 5/70; masterpos; gift; afi theatre collection. othello (fictitious character)--drama. plays. mim features. mim welles, orson, 1915- direction, production, writing, cast. cloutier, suzanne, 1927- cast. mac liammóir, micheál, 1899–1978, cast. coote, robert, 1909–1982, cast. copyright collection (library of congress) dlc afi theatre collection (library of congress) dlc othello.

appendix b. the perils of judging near matches

a. challenges of handling ellipses in titles thought to be similar

incoming title: general explanation of tax legislation enacted in ... / prepared by the staff of the joint committee on taxation
match: general explanation of tax legislation enacted in the 104th congress prepared by the staff of the joint committee on taxation

incoming title: general explanation of tax legislation enacted in ... / prepared by the staff of the joint committee on taxation
match: general explanation of tax legislation enacted in the 106th congress prepared by the staff of the joint committee on taxation

incoming title: general explanation of tax legislation enacted in ... / prepared by the staff of the joint committee on taxation
match: general explanation of tax legislation enacted in the 107th congress prepared by the staff of the joint committee on taxation

incoming title: general explanation of tax legislation enacted in ... / prepared by the staff of the joint committee on taxation
match: general explanation of tax legislation enacted in the 108th congress prepared by the staff of the joint committee on taxation

b. partial matches in names which might represent the same publisher

publisher comparison is challenging in an environment where organizations are regularly merged or acquired by other organizations.
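before the examples, here is a minimal sketch of the sort of normalized, containment-based comparison a matcher might apply to publisher names. the normalization steps and the three-way verdict are illustrative assumptions, not the rules actually used in xwc:

import re

# illustrative normalization; not the comparison rules used in xwc
def normalize_publisher(name):
    """lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", name.lower())).strip()

def compare_publishers(a, b):
    """return 'match', 'questionable', or 'no-match' for two publisher strings."""
    a, b = normalize_publisher(a), normalize_publisher(b)
    if a == b:
        return "match"
    words_a, words_b = set(a.split()), set(b.split())
    if words_a <= words_b or words_b <= words_a:
        # one name is wholly contained in the other, e.g. "wiley" vs. "john wiley"
        return "questionable"
    return "no-match"

# compare_publishers("Pearson Prentice Hall", "Prentice Hall")   -> "questionable"
# compare_publishers("U S GPO", "U S Fish and Wildlife Service") -> "no-match"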
there is no real authority control for publishers that would help cataloguers decide on a preferred form. when governmental organizations are added to the mix, the challenges increase. below are some examples of non-matching text of publisher names in records, which might or might not be considered the same by a human expert. (the publisher names have been normalized.)

1. publisher name may be partially or differently recorded in two records

incoming publisher: konzeptstudien kantonale planungsgruppe
match: kantonale planungsgruppe konzeptstudien (word order different)

incoming publisher: institut francais proche orient
match: institut francais darcheologie proche orient

incoming publisher: u s dept of commerce national oceanic and atmospheric administration national environmental satellite data and information service
match: national oceanic and atmospheric administration

2. publisher name may have changed due to acquisition by another organization

incoming publisher: pearson prentice hall
match: prentice hall

incoming publisher: uxl
match: uxl thomson gale

incoming publisher: thomson arco
match: arco thomson learning

3. one record may show a "publisher" which is actually a government distributing agency or clearinghouse such as the u.s. government printing office or national technical information service (ntis), while the candidate match shows the actual government agency. these can be almost impossible to evaluate.

incoming publisher: u s congressional service
match: supt g p o (here the distributor is the government printing office, listed as the publisher)

incoming publisher: u s dept of commerce national oceanic and atmospheric administration national environmental satellite data and information service
match: national oceanic and atmospheric administration

incoming publisher: u s gpo
match: u s fish and wildlife service

4. the publisher in a record may start with or end with the publisher in the second record. should it be called a match?

good:
incoming publisher: trotta
match: editorial trotta

incoming publisher: wiley
match: john wiley

questionable?
incoming publisher: prentice hall
match: prentice hall regents canada

incoming publisher: geuthner
match: orientaliste geuthner

incoming publisher: oxford
match: distributed royal affairs oxford

incoming publisher: pan union general secretariat organization states
match: social science section cultural affairs pan union

marc truitt
editorial: reflections on what we mean by "forever"

what do we mean when we tell people that we want or intend to preserve content or an object "forever"? a couple of weeks ago, i attended the fall meeting of the preservation and archiving special interest group (pasig) in san francisco. the group, generously sponsored by sun microsystems, is the brainchild of art pasquinelli of sun and michael keller of stanford. first, a confession on my part. since the university of alberta (ua) was one of the founding members of pasig, i had occasion to attend the first several pasig meetings. in the beginning, there were just a handful of—perhaps fewer than ten—institutions represented. it seemed at the first couple of meetings, when the group was still finding its direction, that the content was slim, repetitious, and overly focused on sun's own solutions in the digital preservation and archiving (dpa) arena.
since we had other attendees ably representing ua, i stayed away from the following several meetings. well, pasig has grown up. the attendee list for this meeting boasted nearly two hundred persons representing more than thirty institutions. among the attendees were many of the leading lights in dpa and the profession generally. institutions represented included several north american and european national libraries, as well as arls, memory institutions, and a host of companies and consultants offering a range of dpa solutions. yes, pasig has arrived, and we have art, mike, and sun to thank for this. if i have one real remaining complaint about pasig, it's that the group is still overly focused on sun's solutions. true, other vendors such as exlibris and vtls attended, but their solutions don't compete; rather, they build on sun's offerings. and while microsoft also was in attendance for the first time, its presentation focused not so much on dpa solutions—it has none—as on a raft of interesting and useful plug-ins whose purpose is to facilitate preservation of content created in microsoft products such as word, excel, powerpoint, etc. other large vendors of dpa solutions—think ibm, for one—remain conspicuously absent. it's time for sun to do the "right thing" and "open source" pasig. if sun wishes to continue to sponsor pasig by lending administrative and organizational expertise, that would be great. indeed, a leading but not controlling role in pasig would be entirely consistent with the company's new focus on support of open-source efforts such as mysql, openoffice, and opensolaris. so, what about the title of this editorial? when we talk of digital preservation, just how long are we thinking of preserving an object? ask any twenty specialists in dpa, and chances are that you'll get at least ten different answers. for some, the timeframe can be as short as five to twenty years. for others, it's fifty or perhaps one hundred years. at pasig, at least one presenter described an organizational business model that envisions preserving content for five hundred years. and there are even some in our profession who glibly use what one might call "the dpa f-word," although fortunately none of them seemed to be in attendance at this fall's pasig. what does this mean in a very practical, nuts-and-bolts it sense? chris wood of sun gave a presentation at the 2008 pasig spring meeting in which he estimated that the cost to supply power and cooling alone to maintain a petabyte (1,000 tb) of disk-based digital content for a mere ten years would easily exceed $1 million.1 refining his figures downward somewhat, wood noted a few months later at the following pasig meeting that for a 1 tb drive, the five-year power and cooling cost for 2008–12 could be estimated at approximately $320, or $640,000 per petabyte over ten years, still a considerable sum.2 add to this the costs of migration—consider that a modern spinning disk is generally thought to have a useful lifespan of about five years, and tape may have two or three decades—and the need for regular integrity-checking of digital content for "bit-rot," and you have the stuff of a sustainability nightmare. these challenges don't even include the messy question of preserving an object so that it is usable in a century or five.
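wood's per-petabyte figure follows directly from the per-drive estimate; the short calculation below simply re-derives the numbers quoted above:

# re-deriving wood's estimate: $320 per 1 tb drive for five years of power and cooling
cost_per_tb_five_years = 320                                # dollars
cost_per_tb_per_year = cost_per_tb_five_years / 5           # $64 per tb per year
petabyte_in_tb = 1000
ten_year_cost_per_pb = cost_per_tb_per_year * petabyte_in_tb * 10
print(ten_year_cost_per_pb)                                 # 640000.0, i.e., $640,000 per petabyte over ten years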
while we probably will be able to read word and excel files for the foreseeable future, there are already countless files created with now-defunct pc applications of the 1980s and 1990s; many are stored on all kinds of obsolete media and today are skating on the edge of inaccessibility. already we are seeing concern expressed at institutions with significant digital library and digitization commitments that curating, migrating, and ensuring the integrity and usability of growing petabytes of content over centuries may be unsustainable in both dollars and staff.3 can we even imagine the possible maintenance burden for our descendants, say, 250 or 500 years from now? in 2006, alexander stille observed that "one of the great ironies of the information age is that, while the late twentieth century will undoubtedly have recorded more data than any other period in history, it will also almost certainly have lost more information than any previous era."4 how are we to deal with this? can we meaningfully plan for the preservation of digital content over centuries given our poor track record over just the past few decades? perhaps we're thinking too big when we speak of "forever." maybe we need to begin by conceptualizing and implementing on a more manageable scale. or, to adopt a phrase that seemed to become the informal mantra of both this year's pasig and the immediately preceding ipres meeting, "to get to forever you have to get to five years first."5

marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

about this issue of ital

a few months ago, while she was still working at the university of nevada las vegas, ital's longtime managing editor, judith carter, shared with me the program for the discovery mini-conference that had just been held at unlv. the presentations, originally cast as poster sessions, suggested a diverse and fascinating collection of insights deserving of wider attention. i suggested to judith that she and her colleagues had the makings of a great ital theme issue, and i'm pleased that they accepted my invitation to rework the presentations into a form suitable for publication here. i hope that you will find the results of their work interesting—i certainly do. they've done a superb job! bravo to judith and the presenters at the unlv discovery mini-conference!

corrigenda

in our september issue, in an article by kathleen carlson, we inadvertently characterized camtasia studio as an open-source product. it is not. camtasia studio is published by techsmith corporation. you can find out more at the product website (http://www.techsmith.com/camtasia.asp). also, in the same article, we provided a url to a flash tutorial titled "how to order an article that asu does not own." ms. carlson has recently advised us that the tutorial in question is no longer available.

references and notes

1. chris wood, "the billion file problem and other archive issues" (presentation, spring meeting of the sun preservation and archiving special interest group [pasig], san francisco, california, may 28, 2008), http://events-at-sun.com/pasig_spring/presentations/chriswood_massivearchive.pdf (accessed oct. 22, 2009).
2. chris wood, "archive and preservation: emerging storage: technologies & trends" (presentation, fall meeting of pasig, baltimore, maryland, nov.
19, 2008), http://events -at-sun.com/pasig_fall08/presentations/pasig_wood.pdf. (accessed oct. 22, 2009). 3. consider, for example, the following extract from a recent posting to the syslib-l electronic discussion list by the head of library systems at the university of north carolina at chapel hill: i’m exaggerating a little in my subject line, but it’s been less than 4 years since we purchased our first large (5tb) storage array. we now have a raw 65tb online, and 84tb on order—although a considerable chunk of that 84 is going to replace storage that’s going out of warranty/maintenance and is more cost effective to replace (apple xraids, for instance). in the end, though we’ll net out with 100tb or thereabouts by the end of next year. a great deal of this space is going to digitization projects—no surprise there. we have over 20tb now in our “digital archive,” storage i consider dim, if not dark. we need a heck of a lot of space for staging backups, givien [sic] how much we write to tape in a 24-hour period. individual staff aren’t abusing our lack of quotas—it’s really almost all legitimate, project-driven work that’s eating us up. what’s scarier is that we’re now talking seriously about moving from project-driven work to programmatic work: the latest large photographic archive we acquired is being scanned as part of the acquisition/processing workflow. we’re looking at ways to prioritize the scanning of our manuscript collections. donors increasingly expect to see their gifts online. and we’re not even yet supporting an “institutional repository.” will owen, “0 to 60 in three years: mass storage management,” online posting, dec. 8, 2008, syslib-l@listserv.indiana.edu, https://listserv.indiana.edu/cgi-bin/wa-iub.exe?a0=syslib-l (account required; accessed oct. 22, 2009). 4. alexander stille, “are we losing our memory? or, the museum of obsolete technology,” lost magazine, no. 3 (feb. 2006), http://www.lostmag.com/issue3/memory.php (accessed oct. 22, 2009). while stille was referring in this quotation to both digital and nondigital materials, his comments are but part of a larger debate positing that the latter half of the twentieth century could well come to be known in the future as a “digital dark age” because of the vast quantity of at-risk digital content, recently estimated by one expert at some 369 exabytes (369 billian gb) worth of data. physorg.com, “‘digital dark age’ may doom some data,” http://www.physorg.com/news144343006 .html (accessed oct. 22, 2009). 5. ed summers, “ipres, iipc, pasic roundup/braindump,” online posting, oct. 14, 2009, inkdroid, http://inkdroid .org/journal/2009/10/14/ipres-iipc-pasig-roundupbrain dump/ (accessed oct. 22, 2009). 2 information technology and libraries | june 2008 mark beatty (mbeatty@wils.wisc.edu) is lita president 2007/2008 and trainer, wisconsin library services, madison. mark beattypresident’s message i’ve recently read three quite different articles that surprisingly all had something similar to say with a different twist on the theme uppermost in my brain the last year or two. here’s the briefest of quotes from the three. i would suggest your full reading of all three if you haven’t already. n lankes, silverstein and nicholson, “participatory networks: the library as conversation,” in the december 2007 information technology and libraries: “with their principles, dedication to service, and unique knowledge of infrastructure, libraries are poised not simply to respond to new technologies, but to drive them. 
by tying technological implementation, development and improvement to the mission of facilitating conversations across fields, libraries can gain invaluable visibility and resources.” n bill crowley, “lifecycle librarianship,” in the april 1, 2008 library journal: “public, academic, and school librarians should adopt the service philosophy of lifecycle librarianship and jointly plan at town, city, or county levels to identify and meet human learning needs from “lapsit to nursing home.”” n joe kissell, “instant messaging for introverts,” in the april 4, 2008 tidbits (http://db.tidbits.com/ article/9544): “several people i discussed this issue with (using im and twitter) expressed dismay at having had relationships deteriorate due to an unwillingness on another person’s part to adapt to changing technology. for example, people who don’t use e-mail don’t get evites, and so they end up being excluded from parties.” what all three express to me is a concern that libraries, and just plain humans, need to be part of the conversation, part of the social structure, and full participants in life. we are now, through surveys and meetings and focus groups, starting to know that new librarians and new lita members are most interested in networking with their colleagues using multiple methods to fulfill the whole range of their professional and social needs. lankes wants to make sure we participate with all our constituencies, crowley wants us to spend a lifetime with those constituencies, and kissell wants to make sure we get invited to the party. that sounds a bit facetious but i believe the point is that our association, our libraries, our social structures are now required to be active participants, physically and virtually, in the life of their communities. we have to recognize our communities and then act to participate and provide space and support to those communities. this takes work and the will to always be part of our communities. all of which leads to my president’s program, featuring keynote speaker joe janes and the blogging folks at “it’s all good” at the ala annual conference 2008 in anaheim, california. it will be part of sunday afternoon with lita, taking place on sunday, june 29, 2008. the program line up will include: n top technology trends 1:30–3:00 p.m. n lita awards and scholarships reception 3–4 p.m. n lita president’s program 4–5 p.m. n “isn’t it great to be in the library . . . wherever that is?” it’s often said that today we have to run three libraries at once: the library of yesterday, today, and tomorrow. we run both the physical, visible library, and the one that exists beyond the walls. this raises many questions of what a library is and encompasses, what it isn’t, where the boundaries lie, the impact on what we do and how we do it, what our clients want, how we serve them, and what kinds of librarians serve them. this program will attempt to examine the full social and cultural constructs of libraries that move beyond basic web 2.0 and integrate patrons, librarians, and resources in what should be a ubiquitous manner. join joe janes, associate professor in the information school of the university of washington in seattle and columnist for american libraries, keynote speaker, along with members of the “it’s all good” blogging group (http://scanblog.blogspot.com) as the reactor panel for a lively exploration of possible futures. 
i hope you’ll be able to attend but be assured that members of the lita community will blog and report and even record the sessions in various ways that will be made freely available to our community. a semantic model of selective dissemination of information | morales-del-castillo et al. 21 a semantic model of selective dissemination of information for digital libraries j. m. morales-del-castillo, r. pedraza-jiménez, a. a. ruíz, e. peis, and e. herrera-viedma in this paper we present the theoretical and methodological foundations for the development of a multi-agent selective dissemination of information (sdi) service model that applies semantic web technologies for specialized digital libraries. these technologies make possible achieving more efficient information management, improving agent–user communication processes, and facilitating accurate access to relevant resources. other tools used are fuzzy linguistic modelling techniques (which make possible easing the interaction between users and system) and natural language processing (nlp) techniques for semiautomatic thesaurus generation. also, rss feeds are used as “current awareness bulletins” to generate personalized bibliographic alerts. n owadays, one of the main challenges faced by information systems at libraries or on the web is to efficiently manage the large number of documents they hold. information systems make it easier to give users access to relevant resources that satisfy their information needs, but a problem emerges when the user has a high degree of specialization and requires very specific resources, as in the case of researchers.1 in “traditional” physical libraries, several procedures have been proposed to try to mitigate this issue, including the selective dissemination of information (sdi) service model that make it possible to offer users potentially interesting documents by accessing users’ personal profiles kept by the library. nevertheless, the progressive incorporation of new information and communication technologies (icts) to information services, the widespread use of the internet, and the diversification of resources that can be accessed through the web has led libraries through a process of reinvention and transformation to become “digital” libraries.2 this reengineering process requires a deep revision of work techniques and methods so librarians can adapt to the new work environment and improve the services provided. in this paper we present a recommendation and sdi model, implemented as a service of a specialized digital library (in this case, specialized in library and information science), that can increase the accuracy of accessing information and the satisfaction of users’ information needs on the web. this model is built on a multi-agent framework, similar to the one proposed by herrera-viedma, peis, and morales-del-castillo,3 that applies semantic web technologies within the specific domain of specialized digital libraries in order to achieve more efficient information management (by semantically enriching different elements of the system) and improved agent–agent and user–agent communication processes. furthermore, the model uses fuzzy linguistic modelling techniques to facilitate the user–system interaction and to allow a higher grade of automation in certain procedures. to increase improved automation, some natural language processing (nlp) techniques are used to create a system thesaurus and other auxiliary tools for the definition of formal representations of information resources. 
in the next section, “instrumental basis,” we briefly analyze sdi services and several techniques involved in the semantic web project, and we describe the preliminary methodological and instrumental bases that we used for developing the model, such as fuzzy linguistic modelling techniques and tools for nlp. in “semantic sdi service model for digital libraries,” the bulk of this work, the application model that we propose is presented. finally, to sum up, some conclusive data are highlighted. n instrumental basis filtering techniques for sdi services filtering and recommendation services are based on the application of different process-management techniques that are oriented toward providing the user exactly the information that meets his or her needs or can be of his or her interest. in textual domains, these services are usually developed using multi-agent systems, whose main aims are n to evaluate and filter resources normally represented in xml or html format; and n to assist people in the process of searching for and retrieving resources.4 j. m. morales-del-castillo (josemdc@ugr.es) is assistant professor of information science, library and information science department, university of granada, spain. r. pedrazajiménez (rafael.pedraza@upf.edu) is assistant professor of information science, journalism and audiovisual communication department, pompeu fabra university, barcelona, spain. a. a. ruíz (aangel@ugr.es) is full professor of information science, library and information science department, university of granada. e. peis (epeis@ugr.es) is full professor of information science, library and information science department, university of granada. e. herrera-viedma (viedma@decsai.ugr.es) is senior lecturer in computer science, computer science and artificial intelligence department, university of granada. 22 information technology and libraries | march 2009 traditionally, these systems are classified as either content-based recommendation systems or collaborative recommendation systems.5 content-based recommendation systems filter information and generate recommendations by comparing a set of keywords defined by the user with the terms used to represent the content of documents, ignoring any information given by other users. by contrast, collaborative filtering systems use the information provided by several users to recommend documents to a given user, ignoring the representation of a document’s content. it is common to group users into different categories or stereotypes that are characterized by a series of rules and preferences, defined by default, that represent the information needs and common behavioural habits of a group of related users. the current trend is to develop hybrids that make the most of content-based and collaborative recommendation systems. 
in the field of libraries, these services usually adopt the form of sdi services that, depending on the profile of subscribed users, periodically (or when required by the user) generate a series of information alerts that describe the resources in the library that fit a user’s interests.6 sdi services have been studied in different research areas, such as the multi-agent systems development domain,7 and, of course, the digital libraries domain.8 presently, many sdi services are implemented on web platforms based on a multi-agent architecture where there is a set of intermediate agents that compare users’ profiles with the documents, and there are input-output agents that deal with subscriptions to the service and display generated alerts to users.9 usually, the information is structured according to a certain data model, and users’ profiles are defined using a series of keywords that are compared to descriptors or the full text of the documents. despite their usefulness, these services have some deficiencies: n the communication processes between agents, and between agents and users, are hindered by the different ways in which information is represented. n this heterogeneity in the representation of information makes it impossible to reuse such information in other processes or applications. a possible solution to these deficiencies consists of enriching the information representation using a common vocabulary and data model that are understandable by humans as well as by software agents. the semantic web project takes this idea and provides the means to develop a universal platform for the exchange of information.10 semantic web technologies the semantic web project tries to extend the model of the present web by using a series of standard languages that enable enriching the description of web resources and make them semantically accessible.11 to do that, the project basis itself on two fundamental ideas: (1) resources should be tagged semantically so that information can be understood both by humans and computers, and (2) intelligent agents should be developed that are capable of operating at a semantic level with those resources and that infer new knowledge from them (shifting from the search of keywords in a text to the retrieval of concepts).12 the semantic backbone of the project is the resource description framework (rdf) vocabulary, which provides a data model to represent, exchange, link, add, and reuse structured metadata of distributed information sources, thereby making them directly understandable by software agents.13 rdf structures the information into individual assertions (e.g., “resource,” “property,” and “property value triples”) and uniquely characterizes resources by means of uniform resource identifiers (uris), allowing agents to make inferences about them using web ontologies or other, simpler semantic structures, such as conceptual schemes or thesauri.14 even though the adoption of the semantic web and its application to systems like digital libraries is not free from trouble (because of the nature of the technologies involved in the project and because of the project’s ambitious objectives,15 among other reasons), the way these technologies represent the information is a significant improvement over the quality of the resources retrieved by search engines, and it also allows the preservation of platform independence, thus favouring the exchange and reuse of contents.16 as we can see, the semantic web works with information written in natural language that is 
structured in a way that can be interpreted by machines. for this reason, it is usually difficult to deal with problems that require operating with linguistic information that has a certain degree of uncertainty (e.g., when quantifying the user's satisfaction in relation to a product or service). a possible solution could be the use of fuzzy linguistic modelling techniques as a tool for improving system–user communication.

fuzzy linguistic modelling

fuzzy linguistic modelling supplies a set of approximate techniques appropriate for dealing with qualitative aspects of problems.17 the ordinal linguistic approach is defined according to a finite set of tags (s), completely ordered and with odd cardinality (seven or nine tags):

s = {si , i ∈ {0, ..., t}}

the central term has a value of approximately 0.5, and the rest of the terms are arranged symmetrically around it. the semantics of each linguistic term is given by the ordered structure of the set of terms, considering that each linguistic term of the pair (si, st-i) is equally informative. each label si is assigned a fuzzy value defined in the interval [0,1] that is described by a linear trapezoidal membership function represented by the 4-tuple (ai, bi, αi, βi). (the first two parameters show the interval where the membership value is 1.0; the third and fourth parameters show the left and right limits of the distribution.) additionally, we need to define the following properties:

1. the set is ordered: si ≥ sj if i ≥ j.
2. there is a negation operator: neg(si) = sj, with j = t − i.
3. maximization operator: max(si, sj) = si if si ≥ sj.
4. minimization operator: min(si, sj) = si if si ≤ sj.

it also is necessary to define aggregation operators, such as linguistic weighted averaging (lwa),18 capable of operating with and combining linguistic information. focusing on facilitating the interaction between users and the system, the other starting objective is to achieve the development and implementation of the proposed model in the most automated way possible. to do this, we use a basic auxiliary tool—a thesaurus—that, among other tasks, assists users in the creation of their profiles and enables automating the generation of alerts. that is why it is critical to define the way in which we create this tool, and in this work we propose a specific method for the semiautomatic development of thesauri using nlp techniques.

nlp techniques and other automating tools

nlp consists of a series of linguistic techniques, statistical approaches, and machine learning algorithms (mainly clustering techniques) that can be used, for example, to summarize texts automatically, to develop automatic translators, and to create voice recognition software. another possible application of nlp would be the semiautomatic construction of thesauri using different techniques. one of them consists of determining the lexical relations between the terms of a text (mainly synonymy, hyponymy, and hyperonymy),19 and extracting the terms that are most representative of the text's specific domain.20 it is possible to elicit these relations by using linguistic tools, like princeton's wordnet (http://wordnet.princeton.edu), and clustering techniques.
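as a concrete illustration of how such lexical relations can be pulled from wordnet, here is a minimal sketch using nltk's wordnet interface; the library choice is an assumption made for the example (the authors query wordnet 2.1 directly), and the relations returned are only as good as wordnet's coverage of the domain:

# assumes the nltk package and its wordnet corpus are installed
from nltk.corpus import wordnet as wn

def lexical_relations(term):
    """collect synonyms, hypernyms, and hyponyms for every sense of a term."""
    relations = {}
    for synset in wn.synsets(term):
        relations[synset.name()] = {
            "synonyms": synset.lemma_names(),
            "hypernyms": [h.name() for h in synset.hypernyms()],
            "hyponyms": [h.name() for h in synset.hyponyms()],
        }
    return relations

# e.g. lexical_relations("library") returns, for each sense, the more general
# concepts (hypernyms) used in the bottom-up growth of the thesaurus.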
wordnet is a powerful multilanguage lexical database where each one of its entries is defined, among other elements, by its synonyms (synsets), hyponyms, and hyperonyms.21 as a consequence, once given the most important terms of a domain, wordnet can be used to create a thesaurus from them (after leaving out all terms that have not been identified as belonging or related to the domain of interest).22 this tool can also be used with clustering techniques—for example, to group the documents of a collection into a set of nodes or clusters, depending on their similarity. each of these clusters is described by the most representative terms of its documents. these terms make up the most specific level of a thesaurus and are used to search wordnet for their synonyms and more general terms, contributing (with the repetition of this procedure) to the bottom-up development of the thesaurus.23 although there are many others, these are some of the most well-known techniques of semiautomatic thesaurus generation (semiautomatic because, needless to say, the supervision of experts is necessary to determine the validity of the final result). for specialized digital libraries, we propose developing, on a multi-agent platform and using all these tools, sdi services capable of generating alerts and recommendations for users according to their personal profiles. in particular, the model presented here is the result of merging several previous models, and its service is based on the definition of "current-awareness bulletins," where users can find a basic description of the resources recently acquired by the library or those that might be of interest to them.24

the semantic sdi service model for digital libraries

the sdi service includes two agents (an interface agent and a task agent) distributed in a four-level hierarchical architecture: user level, interface level, task level, and resource level. its main components are a repository of full-text documents (which make up the stock of the digital library) and a series of elements described using different rdf-based vocabularies: one or several rss feeds that play a role similar to that of current-awareness bulletins in traditional libraries; a repository of recommendation log files that store the recommendations made by users about the resources; and a thesaurus that lists and hierarchically relates the most relevant terms of the specialization domain of the library.25 also, the semantics of each element (that is, its characteristics and the relations the element establishes with other elements in the system) are defined in a web ontology developed in web ontology language (owl).26 next, we describe these main elements as well as the different functional modules that the system uses to carry out its activity.

elements of the model

there are four basic elements that make up the system:
to create the thesaurus, we followed the method suggested by pedraza-jiménez, valverde-albacete, and navia-vázquez.27 the learning technique used for the creation of a thesaurus includes four phases: preprocessing of documents, parameterizing the selected terms, conceptualizing their lexical stems, and generating a lattice or graph that shows the relation between the identified concepts. essentially, the aim of the preprocessing phase is to prepare the documents’ parameterization by removing elements regarded as superfluous. we have developed this phase in three stages: eliminating tags (stripping), standardizing, and stemming. in the first stage, all the tags (html, xml, etc.) that can appear in the collection of documents are eliminated. the second stage is the standardization of the words in the documents in order to facilitate and improve the parameterization process. at this stage, the acronyms and n-grams (bigrams and trigrams) that appear in the documents are identified using lists that were created for that purpose. once we have detected the acronyms and n-grams, the rest of the text is standardized. dates and numerical quantities are standardized, being substituted with a script that identifies them. all the terms (except acronyms) are changed to small letters, and punctuation marks are removed. finally, a list of function words is used to eliminate from the texts articles, determiners, auxiliary verbs, conjunctions, prepositions, pronouns, interjections, contractions, and grade adverbs. all the terms are stemmed to facilitate the search of the final terms and to improve their calculation during parameterization. to carry out this task, we have used morphy, the stemming algorithm used by wordnet. this algorithm implements a group of functions that check whether a term is an exception that does not need to be stemmed and then convert words that are not exceptions to their basic lexical form. those terms that appear in the documents but are not identified by morphy are eliminated from our experiment. the parameterization phase has a minimum complexity. once identified, the final terms (roots or bases) are quantified by being assigned a weight. such weight is obtained by the application of the scheme term frequencyinverse document frequency (tf-idf), a statistic measure that makes possible the quantification of the importance of a term or n-gram in a document depending on its frequency of appearance and in the collection the document belongs to. finally, once the documents have been parameterized, the associated meanings of each term (lemma) are extracted by searching for them in wordnet (specifically, we use wordnet 2.1 for unix-like systems). thus we get the group of synsets associated with each word. the group of hyperonyms and hyponyms also are extracted from the vocabulary of the analyzed collection of documents. the generation of our thesaurus—that is, the identification of descriptors that better represent the content of documents, and the identification of the underlying relations between them—is achieved using formal concept analysis techniques. this categorization technique uses the theory of lattices and ordered sets to find abstraction relations from the groups it generates. furthermore, this technique enables clustering the documents depending on the terms (and synonyms) it contains. also, a lattice graph is generated according to the underlying relations between the terms of the collection, taking into account the hyperonyms and hyponyms extracted. 
in that graph, each node represents a descriptor (namely, a group of synonym terms) and clusters the set of documents that contain it, linking them to those with which it has any relation (of hyponymy or hyperonymy). once the thesaurus is obtained by identifying its terms and the underlying relations between them, it is automatically represented using the simple knowledge organization system (skos) vocabulary (see figure 1).28

user profiles

user profiles can be defined as structured representations that contain personal data, interests, and preferences of users with which agents can operate to customize the sdi service. in the model proposed here, these profiles are basically defined with friend of a friend (foaf), a specific rdf/xml vocabulary for describing people (which favours profile interoperability, since this is a widespread vocabulary supported by an owl ontology), and another nonstandard vocabulary of our own to define fields not included in foaf (see figure 2).29 profiles are generated the moment the user is registered in the system, and they are structured in two parts: a public profile that includes data related to the user's identity and affiliation, and a private profile that includes the user's interests and preferences about the topic of the alerts he or she wishes to receive. to define their preferences, users must specify keywords and concepts that best define their information needs. later, the system compares those concepts with the terms in the thesaurus, using the edit tree algorithm as a similarity measure.30 this function matches character strings, then returns the term introduced (if there's an exact match) or the lexically most similar term (if not). consequently, if the suggested term satisfies user expectations, it will be added to the user's profile together with its synonyms (if any). in those cases where the suggested term is not satisfactory, the system must provide a tool or application that enables users to browse the thesaurus and select terms that better describe their needs. an example of this type of application is thmanager (http://thmanager.sourceforge.net), a project of the universidad de zaragoza, spain, that enables editing, visualizing, and navigating structures defined in skos. each of the terms selected by the user to define his or her areas of interest has an associated linguistic frequency value (tagged as ) that we call "satisfaction frequency." it represents the regularity with which a particular preference value has been used in alerts positively evaluated by the user. this frequency measures the relative importance of the preferences stated by the user and allows the interface agent to generate a ranked list of results. the range of possible values for these frequencies is defined by a group of seven labels that we get from the fuzzy linguistic variable "frequency," whose expression domain is defined by the linguistic term set s = {always, almost_always, often, occasionally, rarely, almost_never, never}, being the default value and "occasionally" being the central value.

rss feeds

thanks to the popularization of blogs, there has been widespread use of several vocabularies specifically designed for the syndication of contents (that is, for making the content of a website accessible to other internet users by means of hyperlink lists called "feeds").
to create our current-awareness bulletin we use rss 1.0, a vocabulary that enables managing hyperlink lists in an easy and flexible way. it utilizes the rdf/xml syntax and data model and is easily extensible because of the use of modules that enable extending the vocabulary without modifying its core each time new describing elements are added.

figure 1. sample entry of a skos core thesaurus
figure 2. user profile sample

in this model several modules are used: the dublin core (dc) module to define the basic bibliographic information of the items utilizing the elements established by the dublin core metadata initiative (http://dublincore.org); the syndication module to facilitate software agents synchronizing and updating rss feeds; and the taxonomy module to assign topics to feed items. the structure of the feeds comprises two areas: one where the channel itself is described by a series of basic metadata like a title, a brief description of the content, and the updating frequency; and another where the descriptions of the items that make up the feed (see figure 3) are defined (including elements such as title, author, summary, hyperlink to the primary resource, date of creation, and subjects).

figure 3. rss feed item sample

recommendation log file

each document in the repository has an associated recommendation log file in rdf that includes the listing of evaluations assigned to that resource by different users since the resource was added to the system. each of the entries of the recommendation log files consists of a recommendation value, a uri that identifies the user who made the recommendation, and the date of the record (see figure 4). the expression domain of the recommendations is defined by the following set of five fuzzy linguistic labels that are extracted from the linguistic variable "quality of the resource": q = {very_low, low, medium, high, very_high}.

figure 4. recommendation log file sample

these elements represent the raw materials for the sdi service that enable it to develop its activity through four processes or functional modules: the profiles updating process, rss feeds generation process, alert generation process, and collaborative recommendation process.

system processes

profiles updating process

since the sdi service's functions are based on generating passive searches to rss feeds from the preferences stored in a user's profile, updating the profiles becomes a critical task. user profiles are meant to store long-term preferences, but the system must be able to detect any subtle change in these preferences over time to offer accurate recommendations. in our model, user profiles are updated using a simple mechanism that enables finding users' implicit preferences by applying fuzzy linguistic techniques and taking into account the feedback users provide. users are asked about their satisfaction degree (ej) in relation to the information alert generated by the system (i.e., whether the items retrieved are interesting or not). this satisfaction degree is obtained from the linguistic variable "satisfaction," whose expression domain is the set of seven linguistic labels s′ = {total, very_high, high, medium, low, very_low, null}. this mechanism updates the satisfaction frequency associated with each user preference according to the satisfaction degree ej. it requires the use of a matching function similar to those used to model threshold weights in weighted search queries.31 the function proposed here rewards the frequencies associated with the preference values present when the resources assessed are satisfactory, and it penalizes them when this assessment is negative. let ej ∈ s′ be the degree of satisfaction and fi l ∈ s the frequency of property i (in this case, i = "preference") with value l; then we define the updating function g: s′ × s → s:

'http://www.doaj.org/oai.article'; # the oai repository
mylibrary::config->instance( 'articles' ); # the mylibrary instance

# create a facet called formats
$facet = mylibrary::facet->new;
$facet->facet_name( 'formats' );
$facet->facet_note( 'types of physical items embodying information.' );
$facet->commit;
$formatid = $facet->facet_id;

# create an associated term called articles
$term = mylibrary::term->new;
$term->term_name( 'articles' );
$term->term_note( 'short, scholarly essays.' );
$term->facet_id( $formatid );
$term->commit;
$articleid = $term->term_id;

# create a location type called url
$location_type = mylibrary::resource::location::type->new;
$location_type->name( 'url' );
$location_type->description( 'the location of an internet resource.' );
$location_type->commit;
$location_type_id = $location_type->location_type_id;

# create a harvester and loop through each oai set
this satisfaction degree is obtained from the linguistic variable “satisfaction,” whose expression domain is the set of five linguistic labels: s’ = {total, very_high, high, medium, low, very_low, null}. this mechanism updates the satisfaction frequency associated with each user preference according to the satisfaction degree ej. it requires the use of a matching function similar to those used to model threshold weights in weighted search queries.31 the function proposed here rewards the frequencies associated with the preference values present when resources assessed are satisfactory, and it penalizes them when this assessment is negative. let ej { }t,=hba,|ss,s ba 0,...∈∈ s’ be the degree of satisfaction, and f j i l { }t,=hba,|ss,s ba 0,...∈∈ s the frequency of property i (in this case i = “preference”) with value l, then we define the updating function g as s’x s→s: { } { } ( ) {=f,eg s ‘http://www.doaj.org/oai.article’; # the oai repository mylibrary::config->instance( ‘articles’ ); # the mylibrary instance # create a facet called formats $facet = mylibrary::facet->new; $facet->facet_name( ‘formats’ ); $facet->facet_note( ‘types of physical items embodying information.’ ); $facet->commit; $formatid = $facet->facet_id; # create an associated term called articles $term = mylibrary::term->new; $term->term_name( ‘articles’ ); $term->term_note( ‘short, scholarly essays.’ ); $term->facet_id( $formatid ); $term->commit; $articleid = $term->term_id; # create a location type called url $location_type = mylibrary::resource::location::type->new; $location_type->name( ‘url’ ); $location_type->description( ‘the location of an internet resource.’ ); $location_type->commit; $location_type_id = $location_type->location_type_id; # create a harvester and loop through each oai set mylibrary: a digital library framework and toolkit | morgan 21 $harvester = net::oai::harvester->new( ‘baseurl’ => doaj ); $sets = $harvester->listsets; foreach ( $sets->setspecs ) { # get each record in this set and process it $records = $harvester->listallrecords( metadataprefix => ‘oai_dc’, set => $_ ); while ( $record = $records->next ) { # map the oai metadata to mylibrary attributes $fkey = $record->header->identifier; $metadata = $record->metadata; $name = $metadata->title; @creators = $metadata->creator; $note = $metadata->description; $publisher = $metadata->publisher; next if ( ! $publisher ); $location = $metadata->identifier; next if ( ! 
$location ); $date = $metadata->date; $source = $metadata->source; @subjects = $metadata->subject; # create and commit a mylibrary resource $resource = mylibrary::resource->new; $resource->fkey( $fkey ); $resource->name( $name ); $creator = ‘’; foreach ( @creators ) { $creator .= “$_|” } $resource->creator( $creator ); $resource->note( $note ); $resource->publisher( $publisher ); $resource->source( $source ); $resource->date( $date ); $subject = ‘’; foreach ( @subjects ) { $subject .= “$_|” } $resource->subject( $subject ); $resource->related_terms( new => [ $articleid ]); $resource->add_location( location => $location, location_type => $location_type_id ); $resource->commit; } } 22 information technology and libraries | september 2008 # done exit; appendix b # index mylibrary data with kinosearch # require use kinosearch::invindexer; use kinosearch::analysis::polyanalyzer; use mylibrary::core; # define use constant index => ‘../etc/index’; # location of the index mylibrary::config->instance( ‘articles’ ); # mylibrary instance to use # initialize the index $analyzer = kinosearch::analysis::polyanalyzer->new( language => ‘en’ ); $invindexer = kinosearch::invindexer->new( invindex => index, create => 1, analyzer => $analyzer ); # define the index’s fields $invindexer->spec_field( name => ‘id’ ); $invindexer->spec_field( name => ‘title’ ); $invindexer->spec_field( name => ‘description’ ); $invindexer->spec_field( name => ‘source’ ); $invindexer->spec_field( name => ‘publisher’ ); $invindexer->spec_field( name => ‘subject’ ); $invindexer->spec_field( name => ‘creator’ ); # get and process each resource foreach ( mylibrary::resource->get_ids ) { # create, fill, and commit a document with content my $resource = mylibrary::resource->new( id => $_ ); my $doc = $invindexer->new_doc; $doc->set_value ( id => $resource->id ); mylibrary: a digital library framework and toolkit | morgan 23 $doc->set_value ( title => $resource->name ) unless ( ! $resource->name ); $doc->set_value ( source => $resource->source ) unless ( ! $resource->source ); $doc->set_value ( publisher => $resource->publisher ) unless ( ! $resource->publisher ); $doc->set_value ( subject => $resource->subject ) unless ( ! $resource->subject ); $doc->set_value ( creator => $resource->creator ) unless ( ! $resource->creator ); $doc->set_value ( description => $resource->note ) unless ( ! $resource->note ); $invindexer->add_doc( $doc ); } # optimize and done $invindexer->finish( optimize => 1 ); exit; appendix c # search a kinosearch index and display content from mylibrary # require use kinosearch::searcher; use kinosearch::analysis::polyanalyzer; use mylibrary::core; # define use constant index => ‘../etc/index’; # location of the index mylibrary::config->instance( ‘articles’ ); # mylibrary instance to use # get the query my $query = shift; if ( ! $query ) { print “enter a query. “; chop ( $query = )} # open the index $analyzer = kinosearch::analysis::polyanalyzer->new( language => ‘en’ ); $searcher = kinosearch::searcher->new( invindex => index, analyzer => $analyzer ); # search $hits = $searcher->search( qq( $query )); # get the number of hits and display $total_hits = $hits->total_hits; 24 information technology and libraries | september 2008 print “your query ($query) found $total_hits record(s).\n\n”; # process each search result while ( $hit = $hits->fetch_hit_hashref ) { # get the mylibrary resource $resource = mylibrary::resource->new( id => $hit->{ ‘id’ }); # extract dublin core elements and display print “ id = “ . 
$resource->id . “\n”; print “ name = “ . $resource->name . “\n”; print “ date = “ . $resource->date . “\n”; print “ note = “ . $resource->note . “\n”; print “ creators = “; foreach ( split /\|/, $resource->creator ) { print “$_; “ } print “\n”; # get related terms and display @resource_terms = $resource->related_terms(); print “ term(s) = “; foreach (@resource_terms) { $term = mylibrary::term->new(id => $_); print $term->term_name, “ ($_)”, ‘; ‘; } print “\n”; # get locations (urls) and display @locations = $resource->resource_locations(); print “ location(s) = “; foreach (@locations) { print $_->location, “; “ } print “\n\n”; } # done exit; information retrieval using a middleware approach danijela boberić krstićev information technology and libraries | march 2013 54 abstract this paper explores the use of a mediator/wrapper approach to enable the search of an existing library management system using different information retrieval protocols. it proposes an architecture for a software component that will act as an intermediary between the library system and search services. it provides an overview of different approaches to add z39.50 and search/retrieval via url (sru) functionality using a middleware approach that is implemented on the bisis library management system. that wrapper performs transformation of contextual query language (cql) into lucene query language. the primary aim of this software component is to enable search and retrieval of bibliographic records using the sru and z39.50 protocols, but the proposed architecture of the software components is also suitable for inclusion of the existing library management system into a library portal. the software component provides a single interface to server-side protocols for search and retrieval of records. additional protocols could be used. this paper provides practical demonstration of interest to developers of library management systems and those who are trying to use open-source solutions to make their local catalog accessible to other systems. introduction information technologies are changing and developing very quickly, forcing continual adjustment of business processes to leverage the new trends. these changes affect all spheres of society, including libraries. there is a need to add new functionality to existing systems in ways that are cost effective and do not require major redevelopment of systems that have achieved a reasonable level of maturity and robustness. this paper describes how to extend an existing library management system with new functionality supporting easy sharing of bibliographic information with other library management systems. one of the core services of library management systems is support for shared cataloging. this service consists of the following activities: a librarian when processing a new bibliographical unit first checks whether the bibliographic unit has already been recorded in another library in the world. if it is found, then the librarian stores that electronic records to his/her local database of bibliographic records. in order to enable those activities, it is necessary that standard way of communication between different library management systems exists. currently, the well-known standards in this area are z39.501 and sru.2 danijela boberić krstićev (dboberic@uns.ac.rs) is a member department of mathematics and informatics, faculty of sciences, university of novi sad, serbia. 
mailto:dboberic@uns.ac.rs information retrieval using a middleware approach | krstićev 55 in this paper, a software component that integrates services for retrieval bibliographic records using the z39.50 and sru standard is described. the main purpose of that component is to encapsulate server sides of the appropriate protocols and to provide a unique interface for communication with the existing library management system. the same interface may be used regardless of which protocols are used for communication with the library management system. in addition, the software component acts as an intermediary between two different library management systems. the main advantage of the component is that it is independent of library management system with which it communicates. also, the component could be extended with new search and retrieval protocols. by using the component, the functionality of existing library management systems would be improved and redevelopment of the existing system would not be necessary. it means that the existing library management system would just need to provide an interface for communication with that component. that interface can even be implemented as an xml web service. standards used for search and retrieval the z39.50 standard was one of the first standards that defined a set of services to search for and retrieve data. the standard is an abstract model that defines communication between the client and server and does not go into details of implementation of the client or server. the model defines abstract prefixes used for search that do not depend on the implementation of the underlying system. it also defines the format in which data can be exchanged. the z39.50 standard defines query language type-1, which is required when implementing this standard. the z39.50 standard has certain drawbacks that new generation of standards, like sru, is trying to overcome. sru tries to keep functionality defined by z39.50 standard, but to allow its implementation using current technologies. one of the main advantages of the sru protocol, as opposed to z39.50, is that it allows messages to be exchanged in a form of xml documents, which was not the case with the z39.50 protocol. the query language used in sru is called contextual query language (cql).3 the sru standard has two implementations, one in which search and retrieval is done by sending messages via the hypertext transfer protocol (http) get and post methods (sru version) and the other for sending messages using the simple object access protocol (soap) (srw version). the main difference between sru and srw is in the way of sending messages.4 the srw version of the protocol packs messages in the soap envelope element, while the sru version of the protocol sends messages based on parameter/value pairs that are included in the url. another difference between the two versions is that the sru protocol for messages transfer uses only http, while srw, in can use secure shell (ssh) and simple mail transfer protocol (smtp), in addition to http. information technology and libraries | march 2013 56 related work a common approach for adding sru support to library systems, most of which already support, the z39.50 search protocol,5 has been to use existing software architecture that supports the z39.50 protocol. simultaneously supporting both protocols is very important because individual libraries will not decide to move to the new protocol until it is widely adopted within the library community. 
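As a concrete point of reference for the approaches discussed below, an SRU searchRetrieve operation is simply an HTTP GET request whose parameters carry a CQL query. The following sketch is illustrative only: the host name is hypothetical and the record schema and CQL index are merely common choices, but the parameter names themselves (operation, version, query, maximumRecords, recordSchema) are the ones defined by the SRU specification.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;

    // Sends a minimal SRU searchRetrieve request and prints the XML response.
    // The base URL is hypothetical; a real SRU server would answer with a
    // searchRetrieveResponse document containing the matching records.
    public class SruRequestSketch {
        public static void main(String[] args) throws Exception {
            String cql = "dc.title = \"digital libraries\"";   // an ordinary CQL query
            String url = "http://catalog.example.org/sru"
                    + "?operation=searchRetrieve"
                    + "&version=1.1"
                    + "&query=" + URLEncoder.encode(cql, "UTF-8")
                    + "&maximumRecords=10"
                    + "&recordSchema=marcxml";

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);   // raw XML response
                }
            }
        }
    }

An SRW client would instead wrap the same query in a SOAP envelope and send it with an HTTP POST, which is the main practical difference between the two variants described above.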
one approach in the implementation of a system for retrieval of data using both protocols is to create two independent server-side components for z39.50 and sru, where both software components access a single database. this approach involves creating a server implementation from the scratch without the utilization of existing architectures, which could be considered a disadvantage. figure 1. software architecture of a system with separate implementations of serverside protocols this approach is good if there is an existing z39.50 or sru server-side implementation, or if there is a library management system, for example, that supports just the z39.50 protocol, but has open programming code and allows changes that would allow the development of an sru service. the system architecture that is based on this approach is shown in figure 1 as a unified modeling language (uml) component diagram. in this figure, the software components that constitute the implementation of the client and the server side for each individual protocol are clearly separated, while the database is shared. the main disadvantage of this approach is that adding support for new search and retrieval protocols requires the transformation of the query language supported by that new protocol into the query language of target system. for example, if the existing library management system uses a relational database to store bibliographic records, for every a new protocol added, its query language must be transformed into the structured query language (sql) supported by the database. z39.50 server side sru server side database z39.50 client side sru client side zservice sruservice jdbc information retrieval using a middleware approach | krstićev 57 however, in most commercial library management systems that support server-side z39.50, local development and maintenance of additional services may not be possible due to the closed nature of the systems. one of the solutions in this case would be to create a so-called “gateway” software component that implements both an sru server and a z39.50 client, used to access the existing z39.50 server. that is, if a sru client's application sends search request, the gateway will accept that request, transform it into the z39.50 request and forward the request to the z39.50 server. similarly, when the gateway receives a response from the z39.50 server, the gateway will transform this response in sru response and forward it to the client. in this way, the client will have the impression that communicates directly with the sru server, while the existing z39.50 server will think that it sends response directly to the z39.50 client. figure 2 presents a component diagram that represents the architecture of the system that is based on this approach. figure 2. software architecture of a system with a gateway the software architecture shown in the figure 2 is one of the most common approaches and is used by the library of congress (lc),6 which uses the commercial voyager7 library information system, which allows searching by the z39.50 protocol. in order to support search of the lc database using sru, indexdata8 developed the yazproxy software component,9 which is an sruz39.50 gateway. 
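A gateway of this kind is essentially a thin server that speaks SRU to the outside world and Z39.50 to the legacy system. The sketch below is schematic only: it is not YAZ Proxy's code, the nested Z3950Client interface is a stand-in for whatever toolkit the gateway would actually use (for example JAFER or another Z39.50 client library), and the query translation is reduced to a placeholder method.

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Schematic SRU-to-Z39.50 gateway: accepts an SRU searchRetrieve request,
    // translates the CQL query into a Z39.50 Type-1 (RPN) query, forwards it
    // to an existing Z39.50 server, and returns the records as an SRU response.
    public class SruToZ3950Gateway extends HttpServlet {

        // Stand-in for a Z39.50 client supplied by a toolkit such as JAFER.
        public interface Z3950Client {
            String[] search(String rpnQuery, int maxRecords) throws IOException;
        }

        private Z3950Client z3950;   // provided at deployment time (e.g., created in init())

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            if (z3950 == null) {
                resp.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE,
                        "no Z39.50 backend configured");
                return;
            }
            String cql = req.getParameter("query");            // SRU carries CQL
            String maxParam = req.getParameter("maximumRecords");
            int max = (maxParam == null) ? 10 : Integer.parseInt(maxParam);

            String rpn = translateCqlToRpn(cql);                // the heart of the gateway
            String[] records = z3950.search(rpn, max);          // talk Z39.50 to the old server

            resp.setContentType("text/xml;charset=UTF-8");
            resp.getWriter().println("<searchRetrieveResponse>");
            resp.getWriter().println("  <numberOfRecords>" + records.length + "</numberOfRecords>");
            // ... each record would be wrapped in SRU <record> elements here ...
            resp.getWriter().println("</searchRetrieveResponse>");
        }

        // Placeholder: in practice this walks the CQL parse tree and emits RPN.
        private String translateCqlToRpn(String cql) {
            return "@attr 1=4 " + cql;   // illustrative only
        }
    }

YAZ Proxy itself is a production implementation of this pattern built on the YAZ toolkit rather than as a Java servlet, but the request flow is the same.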
the same idea10 was used in the implementation of the "the european library”11 database sru client side jdbc gateway sru server side z39.50 client side srutoz3950converter zservice z39.50 server side sruservice information technology and libraries | march 2013 58 portal, which aims to provide integrated access to the major collections of all the european national libraries. another interesting approach in designing software architecture for systems dealing with retrieval of information can be observed in the systems involved in searching heterogeneous information sources. the architecture of these systems is shown in figure 3. the basic idea in most of these systems is to provide the user with a single interface to search different systems. this means that there is a separate component that will accept a user query and transform it into a query that is supported by the specific system component that offers search and data retrieval. this component is also known as a mediator. a separate wrapper component must be created for each system to be searched, to convert the user's query to a query that is understood by the particular target system.12 figure 3. architecture with the mediator/wrapper approach figure 3 shows a system architecture that enables communication with three different systems (system1, system2 and systemn), each of which may use a different query language and therefore need different wrapper components (wrapper1, wrapper2 and wrappern ). in this architecture, each system can be a new mediator component that will interact with other systems. that is, the wrapper component can communicate with the system or with another mediator. the role of the mediator is to accept the request defined by the user and send it to all wrapper components. the wrapper components know how to transform the query that is sent by a mediator into a query that is supported by the target system with which the wrapper communicates. in addition, the wrapper has to transform data received from the target system in a format prescribed by the mediator. communication between client applications and the mediator client mediator system1 system2 systemn wrapper1 wrapper2 wrappern converter1 concrete query languagenconcrete query language2concrete query language1 converter2 convertern uniform query language information retrieval using a middleware approach | krstićev 59 may be through one of the protocols for search and retrieval of information, for example through the sru or z39.50 protocols, or it may be a standard http protocol. systems in which the architecture is based on the mediator/wrapper approach are described in several papers. coiera et al (2005)13 describe the architecture of a system that deals with the federated search of journals in the field of medicine, using the internal query language unified query language (uql). for each information source with which the system communicates, a wrapper was developed to translate queries from uql into the native query language of the source. the wrapper also has the task of returning search results to the mediator. those results are returned as an xml document, with a defined internal format called a unified response language (urel). as an alternative to using particular defined languages (uql and urel), a cql query language and the sru protocol could be used. 
another example of the use of mediators is described by cousins and sanders (2006),14 who address the interoperability issues in cross-database access and suggest how to incorporate a virtual union catalogue into the wider information environment through the application of middleware, using the z39.50 protocol to communicate with underlying sources. software component for services integration this paper describes a software component that would enable the integration of services for search and retrieval of bibliographic records into an existing library system. the main idea is that the component should be modular and flexible in order to allow the addition of new protocols for search and easy integration into the existing system. based on the papers analyzed in the previous section, it was concluded that a mediator/wrapper approach would work best. the architecture of system that would include the component and that would allow search and retrieval of bibliographic records from other library systems is shown in figure 4. z39.50 client sru client library information system recordmanager intermediary mediator wrapper z39.50 server sru server information technology and libraries | march 2013 60 figure 4. architecture of system for retrieval of bibliographic records in figure 4, the central place is occupied by the intermediary component, which consists of a mediator component and a wrapper component. this component is an intermediary between the search service and an existing library system. the library system provides an interface (recordmanager) which is responsible for returning records that match the received query. figure 4 also shows the components that are client applications that use specific protocols for communication (sru and z39.50), as well as the components that represent the server-side implementation of appropriate protocols. this paper will not describe the architecture of components that implement the server side of the z39.50 and sru protocols, primarily because there are already a lot of open-source solutions15 that implement those components and can easily be connected with this intermediary component. in order to test the intermediary component, we used the server side of the z39.50 protocol developed through the jafer project16 ; for the sru server side, we developed a special web service in the java programming language. in further discussion, it is assumed that the intermediary component receives queries from server-side z39.50 and sru services, and that this component does not contain any implementation of these protocols. the mediator component, which is part of the intermediary component, must accept queries sent by the server-side search and retrieval services. the mediator component uses its own internal representation of queries, so it is therefore necessary to transform received queries into the appropriate internal representation. after that, the mediator will establish communication with the wrapper component, which is in charge of executing queries in existing library system. the basic role of the wrapper component is to transform queries received from the mediator into queries supported by library system. after executing the query, the wrapper sends search results as an xml document to the mediator. before sending those results to server side of protocol, the mediator must transform those results into the format that was defined by the client. 
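Reduced to interfaces, the flow just described might look like the sketch below. The method signatures mirror the class diagrams discussed in the following sections (getRecords on the mediator, executeQuery on the wrapper, select and getRecords on RecordManager), and the CQL object model is assumed to come from the cql-java project cited later; the orchestration code, however, is an illustration rather than the BISIS implementation itself.

    import java.util.Map;
    import org.z3950.zing.cql.CQLNode;   // CQL object model from the cql-java project

    // Converts a protocol-specific query (a Z39.50 Type-1 query, a CQL string, ...)
    // into the mediator's internal representation, a CQL parse tree.
    interface QueryConverter {
        CQLNode parseQuery(Object query) throws Exception;
    }

    // Serializes a record retrieved from the library system into the format
    // requested by the client (MARC21slim, UNIMARCslim, Dublin Core, ...).
    interface RecordSerializer {
        String serialize(String record);
    }

    // Interface exposed by the existing library management system:
    // run a query, then fetch the matching bibliographic records.
    interface RecordManager {
        int[] select(Object query);
        String[] getRecords(int[] hits);
    }

    // Translates the internal CQL representation into the query language of the
    // underlying system (Lucene in the case of BISIS), typically via RecordManager.
    interface Wrapper {
        String[] executeQuery(CQLNode cqlQuery) throws Exception;
    }

    // Entry point used by the Z39.50 and SRU server-side components.
    class MediatorService {
        private final QueryConverter converter;
        private final Wrapper wrapper;
        private final Map<String, RecordSerializer> serializers;   // keyed by format name

        MediatorService(QueryConverter c, Wrapper w, Map<String, RecordSerializer> s) {
            this.converter = c;
            this.wrapper = w;
            this.serializers = s;
        }

        String[] getRecords(Object query, String format) throws Exception {
            RecordSerializer serializer = serializers.get(format);
            if (serializer == null) {
                throw new IllegalArgumentException("unsupported record format: " + format);
            }
            CQLNode internal = converter.parseQuery(query);   // unify the incoming query
            String[] raw = wrapper.executeQuery(internal);    // run it against the library system
            String[] out = new String[raw.length];
            for (int i = 0; i < raw.length; i++) {
                out[i] = serializer.serialize(raw[i]);         // convert to the requested format
            }
            return out;
        }
    }

In the article there is a separate converter for each supported query language (one for Z39.50 Type-1 queries and one for CQL strings) and a separate serializer class for each output format; the single instances above are only meant to show how the pieces fit together.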
mediator software component the mediator is a software component that provides a unique interface for different client applications. in this study, as shown in figure 4, a slightly different solution was selected. instead of the mediator communicating directly with the client application, which in the case of protocols for data exchange is client side of that protocol, it actually communicates with the server components that implement the appropriate protocols, and the client application exchanges messages with the corresponding server-side protocol. the z39.50 client exchanges messages with the appropriate z39.50 server, and it communicates with the mediator component. a similar process is done when communication is done using the sru protocol. what is important to emphasize is that the z39.50 and sru servers communicate with the mediator through a unified user interface, represented in figure 5 by class mediatorservice. in this way the same method is used to submit the query and receive results, regardless of which protocol is used. that means information retrieval using a middleware approach | krstićev 61 that our system becomes more scalable and that it is possible to add some new search and retrieval protocols without refactoring the mediator component. figure 5 shows the uml class diagram that describes the software mediator component. the mediatorservice class is responsible for communication with the server-side z39.50 and sru protocols. this class accepts queries from the server side of protocols and returns bibliographic records in the format defined by the server. the mediator can accept queries defined by different query languages. its task is to transform these queries to an internal query language, which will be forwarded to the wrapper component. in this implementation, accepted queries are transformed into an object representation of cql, as defined by the sru standard. one of the reasons for choosing cql is that concepts defined in the z39.50 standard query language can be easily mapped to the corresponding concepts defined by cql. cql is semantically rich, so can be used to create various types of queries. also, because it is based on the concept of context set, it is extensible and allows usage of various types of context sets for different purposes. so, cql is not just limited to the function of searching bibliographic material. it could, for example, be used for searching geographical data. accordingly, it was assumed that cql is a general query language and that probably any query language could be transformed into it. in this implementation, the object model of cql query defined in project cqljava17 was used. in the case that there is a new query language, it would be necessary to perform mapping of the new query language into cql or to extend the object model of cql with new concepts. this implementation of the mediator component could transform two different types of queries into the cql object model. currently, it can transform type-1 queries (used by z39.50) and cql queries into cql object representation. to to add a new query language, it would just be necessary to add a new class that would implement the interface queryconverter shown in figure 5, but the architecture of component mediator remains the same. one task of the mediator component is to return records in the format that was defined by the client that sent the request. information technology and libraries | march 2013 62 figure 5. 
uml class diagram of mediator component as the mediator communicates with the z39.50 and sru server side, the task of the z39.50 and sru server side will be to check whether the format that the client requires is supported by the underlying system. if it is not supported, the request is not sent to mediator. otherwise, the mediator ensures the transformation of retrieved records into the chosen format. the mediator obtains bibliographic records from the wrapper in the form of an xml document that is valid according to the appropriate xml schema.18 the xml schema allows the creation of an xml document describing bibliographic records according to the unimarc19 or marc2120 format. the current implementation of the mediator component supports transformation of bibliographic records into an xml document that can be an instance of the unimarcslim xml schema,21 the marc21slim xml schema,22 or the dublin core xml schema.23 adding support for a new format would require creating a new class that would extend the class recordserializer (figure 5). because this mediator component works with xml, the transformation of bibliographic records into a new format also could be done by using exstensible stylesheet language transformations (xslt). 0..11..1 0..1 1..* 0..1 0..1 mediatorservice + getrecords (object query, string format) : string[] wrapper + executequery (cqlnode cqlquery) : string[] cqlstringconverter + parsequery (object query) : cqlnode rpnconverter + parsequery (object query) : cqlnode queryconverter + parsequery (object query) : cqlnode marc21serializer + serialize (string r) : sting dublincoreserializer + serialize (string r) : sting unimarcserializer + serialize (string r) : sting recordserialize + serialize (string r) : sting information retrieval using a middleware approach | krstićev 63 wrapper software component the wrapper software component is responsible for ensuring communication between the mediator and the existing library system. that is, the wrapper component is responsible for transforming the cql object representation into a concrete query that is supported by the existing library system and for obtaining results that match the query. implementation of the wrapper component directly depends on the architecture of the existing library system. figure 7 proposes a possible architecture of the wrapper component. this proposed architecture assumes that the existing library system provides some kind of service that will be used by the wrapper component to send the query and obtain results. the recordmanager interface in figure 7 is an example of such a service. recordmanager has two operations, one which executes the query and returns the number of hits and the second operation which returns bibliographic records. this proposed solution is useful for libraries that use a library management system that can be extended. it may not be appropriate for libraries using an “off the self” library management system that cannot be extended. the proposed architecture of the wrapper component is based on a strategy design pattern,24 primarily because of the need for transformation of the cql query into a query that is supported by the library system. according to the cql concept of context sets, all prefixes that can be searched are grouped in context sets, and these sets are registered with the library of congress. 
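As an illustration of that internal representation, a CQL query whose indexes come from the Dublin Core context set can be parsed into the cql-java object model in a few lines. This is a sketch assuming the cql-java API (org.z3950.zing.cql.CQLParser); exact method names may differ between versions, and the query itself is only an example.

    import org.z3950.zing.cql.CQLNode;
    import org.z3950.zing.cql.CQLParser;

    // Parses a CQL query into the object model the mediator uses internally.
    // dc.title and dc.creator are indexes from the Dublin Core context set.
    public class CqlParseSketch {
        public static void main(String[] args) throws Exception {
            String query = "dc.title = semantic and dc.creator = morales";
            CQLNode root = new CQLParser().parse(query);

            // toCQL() re-serializes the parse tree; a wrapper would instead walk
            // the tree and map each index to a field of the underlying system.
            System.out.println(root.toCQL());
        }
    }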
the concept of context sets enables specific communities and users to define their own prefixes, relations, and modifiers without fear that their name will be identical to the name of prefix defined in another set. that is, it is possible to define two prefixes with the same name, but they belong to different sets and therefore have different semantics. cql offers the possibility of combining in a single query elements that are defined in different context sets. when parsing a query, it is necessary to check which context set a particular item belongs to and then to apply appropriate mapping of the element from the context set to the corresponding element defined by the query language used in the library system. the strategy design pattern includes patterns that describe the behavior of objects (behavioral patterns), which determine the responsibility of each object and the way in which objects communicate with each other. the main task of a strategy pattern is to enable easy adjustment of the algorithm that is applied by an object at runtime. strategy pattern defines a family of algorithms, each of which is encapsulated in a single object. figure 6 is shows a class diagram from the book “design patterns: elements of reusable object-oriented software,“25 which describes basic elements of strategy patterns. information technology and libraries | march 2013 64 figure 6. strategy design pattern the basic elements of this pattern are the classes context, strategy, concretestrategya and concretestrategyb. the class context is in charge of choosing and changing algorithms in a way that creates an instance of the appropriate class, which implements the interface strategy. interface strategy contains the method algorityinterface(), which should implement all classes that implement that interface. class concretestrategya implements one concrete algorithm. this design pattern is used when transforming cql queries primarily because cql queries can consist of elements that belong to different context sets, whose elements are interpreted differently. classes context, strategy, cqlstrategy and dcstrategy, shown in figure 7, are elements of strategy pattern responsible for mapping concepts defined by cql. the class context is responsible for selection of appropriate strategies for parsing, depending on which context set the element that is going to be transformed belongs to. class cqlstrategy and dcstrategy are responsible for mapping the elements belonging respectively to the cql or dublin core context set in the appropriate elements of a particular query language used by the library system. the use of strategy pattern makes it possible, in real time, to change the algorithm that will parse the query depending on what context set is used. the described implementation of a wrapper component enables the parsing of queries that contain only elements that belong to cql and/or the dublin core context set. in order to provide support for a new context set, a new implementation of interface strategy (figure 7) would be required, including an algorithm to parse the elements defined by this new set. information retrieval using a middleware approach | krstićev 65 figure 7. uml class diagram of wrapper component integration of intermediary software components into the bisis library system the bisis library system was developed at the faculty of science and the faculty of technical sciences in novi sad, serbia, and has had several versions since its introduction in 1993. 
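Before turning to the details of the BISIS integration, the context-set strategy just described can be reduced to a small sketch. The interface and class names are adapted from the wrapper's class diagram (Strategy, CQLStrategy, DCStrategy, a mapIndexToUnderlyingPrefix operation); the concrete field names ti, au, and sb are the BISIS Lucene fields given in the mapping example below, and everything else is illustrative.

    import java.util.HashMap;
    import java.util.Map;

    // Strategy interface: one implementation per CQL context set.
    interface Strategy {
        // Maps a CQL index (e.g. "dc.title") to a field of the underlying system.
        String mapIndexToUnderlyingPrefix(String index);
    }

    // Maps indexes of the Dublin Core context set to BISIS Lucene fields.
    class DCStrategy implements Strategy {
        private static final Map<String, String> FIELDS = new HashMap<>();
        static {
            FIELDS.put("dc.title", "ti");     // title   -> ti
            FIELDS.put("dc.creator", "au");   // creator -> au
            FIELDS.put("dc.subject", "sb");   // subject -> sb
        }
        public String mapIndexToUnderlyingPrefix(String index) {
            return FIELDS.get(index.toLowerCase());
        }
    }

    // Maps indexes of the default CQL context set; the catch-all field is illustrative.
    class CQLStrategy implements Strategy {
        public String mapIndexToUnderlyingPrefix(String index) {
            return "cql.anywhere".equalsIgnoreCase(index) ? "all" : null;
        }
    }

    // Context: selects the appropriate strategy at runtime from the index prefix.
    class Context {
        private Strategy strategy;

        void setStrategy(String contextSet) {
            strategy = "dc".equalsIgnoreCase(contextSet) ? new DCStrategy() : new CQLStrategy();
        }

        String mapIndexToUnderlyingPrefix(String index) {
            // the part of the index before the dot names the context set
            setStrategy(index.contains(".") ? index.substring(0, index.indexOf('.')) : "cql");
            return strategy.mapIndexToUnderlyingPrefix(index);
        }
    }

Supporting a new context set then comes down to adding one more Strategy implementation, exactly as the article notes; the real classes also carry a parseOperand operation that builds the actual Lucene operand from the mapped field.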
the fourth and current version of the system is based on xml technologies. among the core functional units of bisis26 are: • circulation of library material • cataloging of bibliographic records • indexing and retrieval of bibliographic records • downloading bibliographic records through z39.50 protocol • creation of a card catalog • creation of statistical reports an intermediary software component has been integrated into the bisis system. the intermediary component was written in the java programming language and implemented as a web application. communication between server applications that support the z39.50 and sru protocols and the intermediary component is done using the software package hessian.27 hessian offers a simple implementation of two protocols to communicate with web services, a binary protocol and its corresponding xml protocol, both of which rely on http. use of hessian package makes it easy to create a java servlet on the server side and proxy object on client-side, which will be used to 0..1 1..1 0..11..1 0..1 1..1 context + + + setstrategy (string strategy) mapindext ounderlayingprefix (string index) parseoperand (string index, cqlt ermnode node) : void : string : object strategy + + mapindext ounderlayingprefix (string index) parseoperand (string underlayingpref, cqlt ermnode node) : string : object cqlstrategy + + mapindext ounderlayingprefix (string index) parseoperand (string underlayingpref, cqlt ermnode node) : string : object dcstrategy + + mapindext ounderlayingprefix (string index) parseoperand (string underlayingpref, cqlt ermnode node) : string : object recordmanager + + select (object query) getrecords (int hits[]) : int[] : string[] wrapper + executequery (cqlnode cqlquery) makequery (cqlnode cql, object underlayingquery) : string[] : object information technology and libraries | march 2013 66 communicate with the servlet. in this case, the proxy object is deployed on the server side of protocol and the intermediary component contains a servlet. communication between the intermediary and bisis is also realized using the hessian software package, which leads to the possibility of creating a distributed system because the existing library system, the intermediary component, and server applications that implement the protocols can be located on physically separate computers. the bisis library system uses the lucene software package for indexing and searching. lucene has defined its own query language,29 so the wrapper component that is integrated into bisis has to transform to the cql query object model the object representation of the query defined by lucene. therefore the wrapper first needs to determine to which context set the index belongs and then apply the appropriate strategy for mapping the index. the rules for mapping the index to lucene fields are read from the corresponding xml document that is defined for every context set. listing 1 below provides an example of an xml document that contains some rules for mapping indexes of the dublin core context set to lucene index fields. the xml element index represents the name of index which is going to be mapped, while the xml element mappingelement contains the name of lucene field. for example, the title index defined in the dublincore context set, which denotes search by title of the publication, is mapped to the field ti, which is used by the search engine of bisis system. title ti creator au subject sb listing 1. 
xml document with rules for mapping the dublincore context set after the index is mapped to corresponding fields in lucene, a similar procedure is repeated for a relationship that may belong to some other context set or may have modifiers that belong to some information retrieval using a middleware approach | krstićev 67 other context set. it is therefore necessary to change the current strategy for mapping into a new one. by doing this, all elements of the cql query are converted into a lucene query, so the new query can be sent to bisis to be executed. approximately 40 libraries in serbia currently use the bisis system, which includes a z39.50 client, allowing the libraries to search the collections of other libraries that support communication through the z39.50 protocol. by integrating the intermediary component in the bisis system, non-bisis libraries may now search the collections of libraries that use bisis. as a first step, the intermediary component was just integrated in a few libraries, without any major problems. the component is most useful to the city libraries that use system bisis, because they have many branches, which can now search and retrieve bibliographic records from their central libraries. the component could potentially be used by other library management system, assuming the presence of an appropriate wrapper component to transform cql to the target query language. conclusion this paper describes an independent, modular software component that enables the integration of a service for search and retrieval of bibliographic records into an existing library system. the software component provides a single interface to server-side protocols to search and retrieve records, and could be extended to support additional server-side protocols. the paper describes the communication of this component with z39.50 and sru servers. the software component was developed for integration with the bisis library system, but is an independent component that could be integrated in any other library system. the proposed architecture of the software component is also suitable for inclusion of the existing library system into a single portal. the architecture of the portal should involve one mediator component whose task would be to communicate with wrapper components of individual library systems. each library system would implement its own search and store functionalities and could function independently of the portal. the basic advantage of this architecture is that it is possible to include new library systems that provide search services. it is only necessary to add a new wrapper that will perform the appropriate transformation of the query obtained from the mediator component in a query that the library system can process. the task of the mediator is to send queries to the wrapper, while each wrapper can establish communication with a specific library system. after obtaining the results from underlying library system, the mediator should be able to combine results, remove duplicate, and sort results. in this way end user would have impression that he has been searched a single database. references 1. “information retrieval (z39.50): application service definition and protocol specification,” http://www.loc.gov/z3950/agency/z39-50-2003.pdf (accessed february 22, 2013). http://www.loc.gov/z3950/agency/z39-50-2003.pdf information technology and libraries | march 2013 68 2. “search/retrieval via url,” http://www.loc.gov/standards/sru/. 3. 
“contextual query language – cql,” http://www.loc.gov/standards/sru/specs/cql.html. 4. eric lease morgan, "an introduction to the search/retrieve url service (sru),” ariadne 40 (2004), http://www.ariadne.ac.uk/issue40/morgan. 5. larry e. dixson, "yaz proxy installation to enhance z39.50 server performance,” library hi tech 27, no. 2 (2009): 277-285, http://dx.doi.org/10.1108/07378830910968227; mike taylor and adam dickmeiss, “delivering marc/xml records from the library of congress catalogue using the open protocols srw/u and z39.50,” (paper presented at world library and information congress: 71st ifla general conference and council, oslo, 2005). 6. mike taylor and adam dickmeiss,“delivering marc/xml records from the library of congress catalogue using the open protocols srw/u and z39.50,” (paper presented at world library and information congress: 71st ifla general conference and council, oslo, 2005). 7. “voyager integrated library system,” http://www.exlibrisgroup.com/category/voyager. 8. “indexdata,” http://www.indexdata.com/. 9. “yazproxy,” http://www.indexdata.com/yazproxy. 10. theo van veen and bill oldroyd, “search and retrieval in the european library,” d-lib magazine 10, no. 2 (2004), http://www.dlib.org/dlib/february04/vanveen/02vanveen.html.. 11. “тhe european library,” http://www.theeuropeanlibrary.org./tel4/. 12. gio wiederhold ,“mediators in the architecture of future information systems,” computer 25, no. 3 (1992): 38-49, http://dx.doi.org/10.1109/2/121508. 13. enrico coiera, martin walther, ken nguyen, and nigel h. lovell, “architecture for knowledgebased and federated search of online clinical evidence,” journal of medical internet research 7, no. 5 (2005), http://www.jmir.org/2005/5/e52/. 14. shirley cousins and ashley sanders, “incorporating a virtual union catalogue into the wider information environment through the application of middleware: interoperability issues in crossdatabase access,” journal of documentation 62, no. 1 (2006): 120-144, http://dx.doi.org/10.1108/00220410610642084. 15. “sru software and tools,” http://www.loc.gov/standards/sru/resources/tools.html; “z39.50 registry of implementators,” http://www.loc.gov/z3950/agency/register/entries.html. 16. “jafer toolkit project,” http://www.jafer.org. 17. “cql-java: a free cql compiler for java,” http://zing/z3950.org/cql/java/. http://www.loc.gov/standards/sru/ http://www.loc.gov/standards/sru/specs/cql.html http://www.ariadne.ac.uk/issue40/morgan http://dx.doi.org/10.1108/07378830910968227 http://www.exlibrisgroup.com/category/voyager http://www.indexdata.com/ http://www.indexdata.com/yazproxy http://www.dlib.org/dlib/february04/vanveen/02vanveen.html http://www.theeuropeanlibrary.org./tel4/ http://dx.doi.org/10.1109/2/121508 http://www.jmir.org/2005/5/e52/ http://dx.doi.org/10.1108/00220410610642084 http://www.loc.gov/standards/sru/resources/tools.html http://www.loc.gov/z3950/agency/register/entries.html http://www.jafer.org/ http://zing/z3950.org/cql/java/ information retrieval using a middleware approach | krstićev 69 18. bojana dimić, branko milosavljević and dušan surla,“xml schema for unimarc and marc 21 formats,” the electronic library 28, no. 2 (2010): 245-262, http://dx.doi.org/10.1108/02640471011033611. 19. “unimarc formats and related documentation,” http://www.ifla.org/en/publications/unimarcformats-and-related-documentation. 20. “marc 21 format for bibliographic data,” http://www.loc.gov/marc/bibliographic/. 21. 
“unimarcslim xml schema,” http://www.bncf.firenze.sbn.it/progetti/unimarc/slim/documentation/unimarcslim.xsd. 22. “marc21slim xml schema,” http://www.loc.gov/standards/marcxml/schema/marc21slim.xsd. 23. “dublincore xml schema,” http://www.loc.gov/standards/sru/resources/dc-schema.xsd. 24. erich gamma, richard helm, ralph johnson, and john vlissides, design patterns: elements of reusable object-oriented software (indianapolis: addison–wesley, 1994), 315-323. 25. ibid. 26. danijela boberić and branko milosavljević, “generating library material reports in software system bisis,” (proceedings of the 4th international conference on engineering technologies icet, novi sad, 2009); danijela boberić and dušan surla, “xml editor for search and retrieval of bibliographic records in the z39.50 standard”, the electronic library 27, no. 3 (2009): 474-495, http://dx.doi.org/10.1108/02640470910966916 (accessed february 22, 1013); bojana dimić and dušan surla, “xml editor for unimarc and marc21 cataloguing,” the electronic library 27, no. 3 (2009): 509-528, http://dx.doi.org/10.1108/02640470910966934 (accessed february 22, 2013); jelena rađenović, branko milosavljеvić and dušan surla, “modelling and implementation of catalogue cards using freemarker,” program: electronic library and information systems 43, no. 1 (2009): 63-76, http://dx.doi.org/10.1108/00330330934110 (accessed february 22, 2013); danijela tešendić, branko milosavljević and dušan surla, “a library circulation system for city and special libraries”, the electronic library 27, no. 1 (2009): 162-186, http://dx.doi.org/10.1108/02640470910934669. 27. “hessian,” http://hessian.caucho.com/doc/hessian-overview.xtp. 28. branko milosavljević, danijela boberić, and dušan surla, “retrieval of bibliographic records using apache lucene,” the electronic library 28, no. 4 (2010): 525-539, http://dx.doi.org/10.1108/02640471011065355. acknowledgement the work is partially supported by the ministry of education and science of the republic of serbia, through project no. 174023: "intelligent techniques and their integration into wide-spectrum decision support." http://dx.doi.org/10.1108/02640471011033611 http://www.ifla.org/en/publications/unimarc-formats-and-related-documentation http://www.ifla.org/en/publications/unimarc-formats-and-related-documentation http://www.loc.gov/marc/bibliographic/ http://www.bncf.firenze.sbn.it/progetti/unimarc/slim/documentation/unimarcslim.xsd http://www.loc.gov/standards/marcxml/schema/marc21slim.xsd http://www.loc.gov/standards/sru/resources/dc-schema.xsd http://dx.doi.org/10.1108/02640470910966916 http://dx.doi.org/10.1108/02640470910966934 http://dx.doi.org/10.1108/00330330934110 http://dx.doi.org/10.1108/02640470910934669 http://hessian.caucho.com/doc/hessian-overview.xtp http://dx.doi.org/10.1108/02640471011065355 abstract smartphones: a potential discovery tool | starkweather and stoward 187 smartphones: a potential discovery tool wendy starkweather and eva stowers the anticipated wide adoption of smartphones by researchers is viewed by the authors as a basis for developing mobile-based services. in response to the unlv libraries’ strategic plan’s focus on experimentation and outreach, the authors investigate the current and potential role of smartphones as a valuable discovery tool for library users. 
w hen the dean of libraries announced a discovery mini-conference at the university of nevada las vegas libraries to be held in spring 2009, we saw the opportunity to investigate the potential use of smartphones as a means of getting information and services to students. being enthusiastic users of apple’s iphone, we and the web technical support manager, developed a presentation highlighting the iphone’s potential value in an academic library setting. because wendy is unlv libraries’ director of user services, she was interested in the applicability of smartphones as a tool for users to more easily discover the libraries’ resources and services. eva, as the health sciences librarian, was aware of a long tradition of pda use by medical professionals. indeed, first-year bachelor of science nursing students are required to purchase a pda bundled with select software. together we were drawn to the student-outreach possibilities inherent in new smartphone applications such as twitter, facebook, and myspace. n presentation our brief review of the news and literature about mobile phones in general provided some interesting findings and served as a backdrop for our presentation: n a total of 77 percent of internet experts agreed that the mobile phone would be “the primary connection tool” for most people in the world by 2020.1 the number of smartphone users is expected to top 100 million by 2013. there are currently 25 million smartphone users, with sales in north america having grown 69 percent in 2008.2 n smartphones offer a combination of technologies, including gps tracking, digital cameras, and digital music, as well as more than fifty-thousand specialized apps for the iphone and new ones being designed for the blackberry and the palm pre.3 the palm pre offered less than twenty applications at its launch, but one million apllication downloads had been performed by june 24, 2009, less than a month after launch.4 n the 2009 horizon report predicts that the time to adoption of these mobile devices in the educational context will be “one year or less.”5 data gathered from campus users also was presented, providing another context. in march 2009, a survey of university of california, davis (uc-davis) students showed that 43 percent owned a smartphone.6 uc-davis is participating in apple’s university education forum. here at unlv, 37 percent of students and 26 percent of faculty and staff own a smartphone.7 the presentation itself highlighted the mobile applications that were being developed in several libraries to enhance student research, provide library instruction, and promote library services. two examples were abilene christian university (http://www.acu.edu/technology/ mobilelearning/index.html), which in fall 2008 distributed iphones and ipod touches to the incoming freshman class; and stanford university (http://www.stanford .edu/services/wirelessdevice/iphone/) which participates in “itunes u” (http://itunes.stanford.edu/). if the libraries were to move forward with smartphone technologies, it would be following the lead of such universities. readers also may be interested in joan lippincott’s recent concise summary of the implications of mobile technologies for academic libraries as well as the chapter on library mobile initiatives in the july 2008 library technology report.8 n goals: a balancing act ultimately the goal for many of these efforts is to be where the users are. 
this aspiration is spelled out in unlv libraries’ new strategic plan relating to infrastructure evolution, namely, “work towards an interface and system architecture that incorporates our resources, internal and external, and allows the user to access from their preferred starting point.”9 while such a goal is laudable and fits very well into the discovery emphasis of the mini-conference presentation, we are well aware of the need for further investigation before proceeding directly to full-scale development of a complete suite of mobile services for our users. of critical importance is ascertaining where our users are and determining whether they want us to be there and in what capacity. the value of this effort is demonstrated in booth’s research report on student interest in emerging technologies at ohio state university. the report includes the results of an extensive environmental survey of their wendy starkweather (wendy.starkweather@unlv.edu) is director, user services division, and eva stowers (eva.stowers @unlv.edu) is medical/health sciences librarian at the university of nevada las vegas libraries. 188 information technology and libraries | december 2009 library users. the study is part of ohio state’s effort to actualize their culture of assessment and continuous learning and to use “extant local knowledge of user populations and library goals” to inform “homegrown studies to illuminate contextual nuance and character, customization that can be difficult to achieve when using externally developed survey instruments.”10 unlv libraries are attempting to balance early experimentation and more extensive data-driven decision-making. the recently adopted strategic plan includes specific directions associated with both efforts. for experimentation, the direction states, “encourage staff to experiment with, explore, and share innovative and creative applications of technology.”11 to that end, we have begun working with our colleagues to introduce easy, small-scale efforts designed to test the waters of mobile technology use through small pilot projects. “text-a-librarian” has been added to our existing group of virtual reference service, and we introduced a “text the call number and record” service to our library’s opac in july 2009. unlv libraries’ strategic plan helps foster the healthy balance by directing library staff to “emphasize data collection and other evidence based approaches needed to assess efficiency and effectiveness of multiple modes and formats of access/ownership” and “collaborate to educate faculty and others regarding ways to incorporate library collections and services into education experiences for students.”12 action items associated with these directions will help the libraries learn and apply information specific to their users as the libraries further adopt and integrate mobile technologies into their services. as we begin our planning in earnest, we look forward to our own set of valuable discoveries. references 1. janna anderson and lee rainie, the future of the internet iii, pew internet & american life project, http://www.pewinternet .org/~/media//files/reports/2008/pip_futureinternet3.pdf .pdf (accessed july 20, 2009). 2. sam churchill, “smartphone users: 110m by 2013,” blog entry, mar. 24, 2009, dailywireless.org, http://www.daily wireless.org/2009/03/24/smartphone-users-100m-by-2013 (accessed july 20, 2009). 3. 
mg siegler, “state of the iphone ecosystem: 40 million devices and 50,000 apps,” blog entry, june 8, 2009, tech crunch, http://www.techcrunch.com/2009/06/08/40-million-iphones -and-ipod-touches-and-50000-apps (accessed july 20, 2009). 4. jenna wortham, “palm app catalog hits a million downloads,” blog entry, june 24, 2009, new york times technology, http://bits.blogs.nytimes.com/2009/06/24/palm-app-cataloghits-a-million-downloads (accessed july 20, 2009). 5. larry johnson, alan levine, and rachel smith, horizon report, 2009 edition (austin, tex.: the new media consortium, 2009), http://www.nmc.org/pdf/2009-horizon-report.pdf (accessed july 20, 2009). 6. university of california, davis. “more than 40% of campus students own smartphones, yearly tech survey says,” technews, http://technews.ucdavis.edu/news2.cfm?id=1752 (accessed july 20, 2009). 7. university of nevada las vegas, office of information technology, “student technology survey report: 2008– 2009,” http://oit.unlv.edu/sites/default/files/survey/survey results2008_students3_27_09.pdf (accessed july 20, 2009). 8. joan lippincott, “mobile technologies, mobile users: implications for academic libraries,” arl bi-monthly report 261 (dec. 2008), http://www.arl.org/bm~doc/arl-br-261-mobile .pdf. (accessed july 20, 2009); ellyssa kroski, “library mobile initiatives,” library technology reports 44, no. 5 (july 2008): 33–38. 9. “unlv libraries strategic plan 2009–2011,” http://www .library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 20, 2009): 2. 10. char booth, informing innovation: tracking student interest in emerging library technologies at ohio university (chicago: association of college and research libraries, 2009), http:// www.ala.org/ala/mgrps/divs/acrl/publications/digital/ ii-booth.pdf (accessed july 20, 2009); “unlv libraries strategic plan 2009–2011,” 6. 11. “unlv libraries strategic plan 2009–2011,” 2. 12. ibid. manzari user-centered design of a web site | manzari and trinidad-christensen 163 this study describes the life cycle of a library web site created with a user-centered design process to serve a graduate school of library and information science (lis). findings based on a heuristic evaluation and usability study were applied in an iterative redesign of the site to better serve the needs of this special academic library population. recommendations for design of web-based services for library patrons from lis programs are discussed, as well as implications for web sites for special libraries within larger academic library settings. u ser-centered design principles were applied to the creation of a web site for the library and information science (lis) library at the c. w. post campus of long island university. this web site was designed for use by master’s degree and doctoral students in the palmer school of library and information science. the prototype was subjected to a usability study consisting of a heuristic evaluation and usability testing. the results were employed in an iterative redesign of the web site to better accommodate users’ needs. this was the first usability study of a web site at the c. w. post library. human-computer interaction, the study of the interaction of human performance with computers, imposes a rigorous methodology on the process of user-interface design. more than an intuitive determination of userfriendliness, a successful interactive product is developed by careful design, testing, and redesign based on the testing outcomes. 
testing the product several times as it is being developed, or iterative testing, allows the users’ needs to be incorporated into the design. the interface should be designed for a specific community of users and set of tasks to be accomplished, with the goal of creating a consistent, usable product. the lis library had a web site that was simply a description of the collection and did not provide access to online specialized resources. a new web site was designed for the lis library by the incoming lis librarian who made a determination of what content might be useful for lis students and faculty. the goal was to have such content readily accessible in a web site separate from the main library web site. the web site for the lis library includes: ฀ access to all online databases and journals related to lis; ฀ a general overview of the lis library and its resources as well as contact information, hours, and staff; ฀ a list of all print and online lis library journal subscriptions, grouped by both title and subject, with links to access the online journals; ฀ links to other web sites in the lis field; ฀ links to other university web pages, including the main library’s home page, library catalog, and instructions for remote database access, as well as to the lis school web site; ฀ a link to jake (jointly administered knowledge environment), a project by yale university that allows users to search for periodical titles within online databases, since the library did not have this type of access through its own software. this information was arranged in four top-level pages with sublevels. design considerations included making the site both easy to learn and efficient once users were familiar with it. since classes are taught at four locations in the metropolitan area, the site needed to be flexible enough to serve students at the c. w. post campus library as well as remotely. the layout of the information was designed to make the web site uncluttered and attractive. different color schemes were tried and informally polled among users. a version with white text on black background prompted strong likes or dislikes when shown to users. although this combination is easy to read, it was rejected because of the strong negative reactions from several users. photographs of the lis library and students were included. the pages were designed with a menu on the left side; fly-out menus were used to access submenus. where main library pages already existed for information to be included in the lis web site, such as lis hours and staff, links to those pages were made instead of re-creating the information in the lis web site. an attempt was made to render the site accessible to users with disabilities, and pages were made compliant with the world wide web consortium (w3c) by using their html validator and their cascading style sheet validator.1 ฀ literature review usability is a term with many definitions, varying by field.2 the fields of industrial engineering, product research and development, computer systems, and library science all share the study of human-and-machine interaction, as well user-centered design of a web site for library and information science students: heuristic evaluation and usability testing laura manzari and jeremiah trinidad-christensen laura manzari (manzari@liu.edu) is an associate professor and library and information science librarian at the c. w. post campus of long island university, brookville, n.y. 
jeremiah trinidad-christensen (jt2118@columbia.edu) is a gis/map librarian at columbia university, new york, n.y. 164 information technology and libraries | september 2006 as a commitment to users. dumas and reddish explain it simply: “usability means that the people who use the product can do so quickly and easily to accomplish their own tasks.”3 user-centered design incorporates usability principles into product design and places the focus on the user during project development. gould and lewis cite three principles of user-centered design: an early focus on users and tasks, empirical measurement of product usage, and iterative design to include user input into product design and modification.4 jakob nielsen, an often-cited usability engineering specialist, emphasizes that for increased functionality, engineering usability principles should apply to web design, which should be treated as a software development project. he advocates incorporating user evaluation into the design process first through a heuristic evaluation, followed by usability testing with a redesign of the product after each phase of evaluation.5 usability principles have been applied to library web-site design; however, library web-site usability studies often do not include the additional heuristic evaluation recommended by nielsen.6 in addition to usability, consideration should also be given during the design process to making the web site accessible to people with disabilities. federal agencies are now required by the rehabilitation act to make their web sites accessible to the disabled. section 508 part 1194.22 of the act enumerates sixteen rules for internet applications to help ensure web-site access for people with various disabilities.7 similarly, the web accessibility initiative hosted by the w3c works to ensure that accessibility practices are considered in web-site design. they developed the web content accessibility guidelines for making web sites accessible to people with disabilities.8 although articles have been written about usability testing of academic library web sites, very little has been written about usability testing of special-collection web sites for distinct user populations within larger academic settings.9 ฀ heuristic evaluation methodology heuristic evaluation is a usability engineering method in which a small set of expert evaluators examine a user interface for design problems by judging its compliance with a set of recognized usability principles or heuristics. nielsen developed a set of ten widely adopted usability heuristics (see sidebar). after studying the use of individual evaluators as well as groups of varying sizes, nielsen and molich recommend using three to five evaluators for a heuristic evaluation.10 the use of multiple experts will catch more flaws than a single expert, but using more than five experts does not produce greater results. in comparisons of heuristic evaluation and usability testing, the heuristic evaluation uncovered more of the minor problems while usability testing uncovered more major, global problems.11 since each method tends to uncover different usability problems, it is recommended that both methods be used complementarily, particularly with an iterative design change between the heuristic evaluation and the usability testing. for the heuristic evaluation, four people were approached from the palmer lis school faculty and ph.d. program with expertise in web-site design and humancomputer interaction. three agreed to participate. 
they were asked to familiarize themselves with the web site and evaluate it according to nielsen’s ten heuristics, which were provided to them. ฀ heuristic evaluation results the evaluators were all in agreement that the language was appropriate for lis students. one evaluator said if new students were not familiar with some of the terms they soon would be. another thought jake, the tool to access full text, might not be clear to students at first, but the lis web-site explanation was fine the way it was. they were also in agreement that the web site was well designed. comments included: “the purpose and description of each page is short and to the point, and there is a good, clean, viewable page for the users”; “the site was well designed and not over designed”; “very clear and user friendly”; “excellent example of limiting unnecessary irrelevant information.” the only page to receive a “poor layout” comment was the lengthy subject list of journals, though no suggestions for improvement were made. concern was expressed about links to other web sites on campus. one evaluator thought new students might be confused about the relationship between long island university, c. w. post, and the palmer school. two evaluators thought links to the main library’s web site could cause confusion because of the different design and layout. a preference for the design of the lis library web site over the main library and palmer school web sites was expressed. to eliminate some confusion, the menu options for other campus web sites were dropped down to a separate menu right below the menu of lis web pages. for additional clarity, some of the main library pages were re-created in the style of the lis pages instead of linking to the original page. the evaluators made several concrete suggestions for menu changes, which were included in the redesign. it was suggested that several menu options were unclear and needed clarification, so additional text was added for clarity at the expense of brevity. long island university’s online catalog is named liucat and was listed that way on the menu. new students might not be familiar with this name, so the menu label was changed to liucat (library catalog). user-centered design of a web site | manzari and trinidad-christensen 165 for the link to jake, a description, find periodicals in online databases, was added for clarification. it was also suggested that the link to the main library web page for all databases could cause confusion since the layout and design of that page is different. the wording was changed to all databases (located in the c. w. post library web site). menu options were originally arranged in order of anticipated use (see figure 1). thus, the order of menu options from the lis home page was databases, journals, library catalog, other web sites, palmer school, and main library. evaluators suggested that putting the option for lis home page first would give users an easy “emergency exit” to return to the home page if they were lost. the original menu options also varied from page to page. for example, menu options on the database page referred only to pages that users might need while doing database searches. at the suggestion of evaluators, the menu options were changed to be consistent on every page (see figure 2). a redesign based on these results was completed and posted to the internet for public use (see figure 3). ฀ usability testing methodology usability testing is an empirical method for improving design. 
test subjects are gathered from the population who will use the product and are asked to perform real tasks using the prototype while their performance and reactions to the product are observed and recorded by an interviewer. this observation and recording of behavior distinguishes usability testing from focus groups. observation allows the tester to see when and where users become frustrated or confused. the goal is to jakob nielsen’s usability heuristics visibility of system status—the system should always keep users informed about what is going on, through appropriate feedback within reasonable time. match between system and the real world— the system should speak the user’s language, with words, phrases, and concepts familiar to the user rather than system-oriented terms. follow real-world conventions, making information appear in a natural and logical order. user control and freedom—users often choose system functions by mistake and will need a clearly marked “emergency exit” to leave the unwanted state without having to go through an extended dialogue. support undo and redo. consistency and standards—users should not have to wonder whether different words, situations, or actions mean the same thing. follow platform conventions. error prevention—even better than good error messages is a careful design that prevents problems from occurring in the first place. recognition rather than recall—make objects, actions, and options visible. the user should not have to remember information from one part of the dialogue to another. instructions for use of the system should be visible or easily retrievable whenever appropriate. flexibility and efficiency of use—accelerators, unseen by the novice user, may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. allow users to tailor frequent actions. aesthetic and minimalist design—dialogues should not contain information that is irrelevant or rarely needed. every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility. help users recognize, diagnose, and recover from errors—error messages should be expressed in plain language (no codes), precisely indicate the problems, and constructively suggest a solution. help and documentation—even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. any such information would be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large.12 figure 1. original menu figure 2. revised menu 166 information technology and libraries | september 2006 uncover usability problems with the product, not to test the participants themselves. the data gathered are then analyzed to recommend changes to fix usability problems. in addition to recording empirical data such as number of errors made or time taken to complete tasks, active intervention allows the interviewer to question participants about reasons for their actions as well as about their opinions regarding the product. in fact, subjects are asked to verbalize their thought processes as they complete the tasks using the interface. test subjects are usually interviewed individually and are all given the same pretest briefing from a script with a list of instructions followed by tasks representing actual use. test subjects are also asked questions about their likes and dislikes. 
in most situations, payment or other incentives are offered to help recruit subjects. four or five subjects will reveal 80 percent of usability problems.13 messages were sent to students via the palmer school’s mailing lists requesting volunteers. a ten-dollar gift certificate to a bookstore was offered as an inducement to recruitment. input was desired from both master’s degree and doctoral students. the first nine volunteers to respond—all master’s degree students—were accepted. this group included students from both the main and satellite campuses. no ph.d. students volunteered to participate at first, citing busy schedules, but eventually a doctoral student was recruited. testing was conducted in computer labs at the library, at the palmer school, and at the manhattan satellite campus. demographic information was gathered regarding users’ gender, age range, university status, familiarity with computers, with the internet, and with the lis library, as well as the type of internet connection and browser usually used. the subjects were given eight tasks to complete using the web site. the tasks reflected both the type of assignment a student might receive in class and the type of information they might seek on the lis web site on their own. the questions were designed to test usability of different parts of the web site. ฀ ฀usability testing results the first task tested the print journals page and asked if the lis library subscribes to a specific journal and whether it is refereed. (the web site uses an asterisk next to a journal title to indicate that it is refereed.) all subjects were able to easily find that the lis library does hold the journal title. although it was not initially obvious that the asterisk was a notation indicating that the journal was refereed, most of the subjects eventually found the explanatory note. many of the subjects did not know what a refereed journal was, and some asked if a definition could be provided on the site. for the second task, subjects needed to use jake to find the full text of an article. none of the students were familiar with jake but were able to use the lis web site to gain an understanding of its purpose and to access it. the third task asked subjects to find a library association that required using the other web sites page. all subjects demonstrated an understanding of how to use this page and found the information. the fourth task tested the full-text databases page. only one subject actually used this page to complete the task. the rest used the all databases link to the main library’s database list. that link appears above the link to full-text databases and most subjects chose that link without looking at the next menu option. several subfigure 3. final home page user-centered design of a web site | manzari and trinidad-christensen 167 jects became confused when they were taken to the main library’s page, just as the evaluators had predicted. even though wording was added warning users that they were leaving the lis web site, most subjects did not read it and wondered why the page layout changed and was not as clear. they also had trouble navigating back to the lis web site from the main library web site. the fifth task tested the journals by subject page. this task took longer for most of the subjects to answer, but all were able to use the page successfully to find a journal on a given subject. the sixth task required using the lis home page, and everyone easily used it to find the operating hours. 
the seventh task required subjects to find an online journal title that could be accessed from the electronic journals page. all subjects navigated this page easily. the final task asked subjects to find a book review. most subjects did not look at the page for library and information sciences databases to access the books in print database, saying they did not think it would be included there. instead, they used the link to the main library’s database page. one subject was not able to complete this task. problems primarily occurred during testing when subjects left the lis page to use a non-library science database located on the main web site. subjects had problems getting back to the lis site from the main library site. while performing tasks, some subjects would scroll up and down long lists instead of using the toolbars provided to bring the user to an exact location on the page. some preferred using the back button instead of using the lis web-site menu to navigate. these seemed to be individual styles of using the web and not any usability problem with the site. several people consistently used the menu to return to the lis home page before starting each new task, even though they could have navigated directly to the page they needed, making a return to the home page unnecessary. this validated the recommendation from the heuristic study that the link to the home page always be the first menu option to give users a comfortable safety valve when they get lost. the final questions asked subjects for their opinions on what they did and did not like about the web site, as well as any suggestions for improving the site. all subjects responded that they liked the layout of the pages, calling them uncluttered, clean, attractive, and logical. there were very few suggestions for improving the site. one person asked that contact information be included on the menu options in addition to its location right below the menu on the lis home page. another participant suggested adding class syllabi to the web site each semester, listing required texts along with a link to an online bookstore. some of the novice users asked for explanations of unfamiliar terms such as “refereed journals.” a participant suggested including a search engine instead of using links to navigate the site. this was considered during the initial site design but was not included since the site did not have a large number of pages. however, a search engine may be worth including. the one doctoral student had previously only used the main library’s web page to access databases. originally, he said he did not see the advantage of a site devoted to information science sources for doctoral candidates, since that program is more multidisciplinary. however, after completing the usability study, the student concluded that the lis web site was useful. he suggested that it should be publicized more to doctoral candidates and that it be more prominently highlighted on the main library web site. though the questions asked were about the lis web site, several subjects complained about the layout of the main library web site and suggested that it have better linking to the lis web site to enable it to be accessed more easily. ฀ conclusions iterative testing and user-centered design resulted in a product that testing revealed to be easy to learn and efficient to use, and about which subjects expressed satisfaction. 
based on findings that some students had not even been aware of the existence of the lis web site, greater emphasis is now given to the web site and its features during new student orientations. the biggest problem users had was navigating from the web pages of the main library back to the lis site. it was suggested that the lis site be highlighted more prominently on the main library web site. some users were confused by the different layouts between the sites, but no one expressed a preference for the design used by the main library web site. despite this confusion, subjects overwhelmingly expressed positive feedback about having a specialized library site serving their specific needs. issues regarding web-site design can be problematic for smaller specialized libraries within larger institutions. in this case, some of the problems navigating between the sites could be resolved by changes to the main library site. the design of the lis web site was preferred over the main campus web site by both the heuristic evaluators and the students in the usability test. however, designers of a main library web site might not be receptive to suggestions from a specialized or branch library. although consistency in design would eliminate confusion, requiring the specialcollection’s web site to follow a design set by the main institution could be a loss for users. in this instance, the main site was designed without user input, whereas the specialized library serving a smaller population was able to be more dynamic and responsive to its users. finding an appropriate balance for a site used by students new to the field as well as advanced students is 168 information technology and libraries | september 2006 a challenge. although the students in the study were all experienced computer and web users, their familiarity with basic library concepts varied greatly. a few novice users expressed some confusion as to the difference between journals and index databases. there actually was a description of each of these sources on the site but it was not read. (the subjects barely read any of the site’s text, so it can be difficult to make some points clearer when users want to navigate quickly without reading instructions. several subjects who did not bother to read text on the site still suggested having more notes to explain unfamiliar terms. however, if the site becomes too overloaded with explanations of library concepts, it could become annoying for more advanced users.) a separate page with a glossary is a possibility—based on the study, however, it will probably not be read. another possibility is a handout for students that could have more text for new users without cluttering the web site. having such a handout would also serve to publicize the site. there was some concern prior to the study that offering more advanced features, such as providing access to jake or indicating which journals are refereed, might be off-putting for new students; therefore, test questions were designed to gauge reactions to these features. most students in the study did express some intimidation at not being familiar with these concepts. however, all the subjects eventually figured out how to use jake and, once they tried it, thought it was a good idea to include it. even new students who had the most difficulty were still able to navigate and learn from the site to be able to use it efficiently. an online survey was added to the final design to allow continuous user input. 
the site consistently receives positive feedback through these surveys. it was planned that responses could be used to continually assess the site and ensure that it is kept responsive and up-to-date; however specific suggestions have not yet been forthcoming. how valuable was usability testing to the web-site design? several good suggestions were made and implemented, and the process confirmed that the site was well designed. it provided some insight into how subjects used the web site that had not been anticipated by the designers. since usability studies are fairly easy and inexpensive to conduct, it is probably a step worth taking during the web-site design process even if it results in only minor changes to the design. references and notes 1. w3c, “the w3c markup validation service,” validator .w3.org (accessed nov. 1, 2005); w3c, “the w3c css validation service,” jigsaw.w3.org/css-validator (accessed nov. 1, 2005). 2. see carol m. barnum, usability testing and research (new york: longman international, 2002); alison j. head, “web redemption and the promise of usability,” online 23, no. 6 (1999): 20–29; international standards organization, ergonomic requirements for office work with visual display terminals. part 11: guidance on usability—iso 9241-11 (geneva: international organization for standardization, 1998); judy jeng, “what is usability in the context of the digital library and how can it be measured?” information technology and libraries 24, no. 2 (2005): 47–52; jakob nielsen, usability engineering (boston: academic, 1993); ruth ann palmquist, “an overview of usability for the study of users’ web-based information retrieval behavior,” journal of education for library and information science 42, no. 2 (2001): 123–36. 3. joseph s. dumas and janice c. redish, a practical guide to usability testing (portland: intellect bks., 1999), 4. 4. john d. gould and clayton h. lewis, “designing for usability: key principles and what designers think,” communications of the acm 28 no. 3 (1985): 300–11. 5. jakob nielsen, “heuristic evaluation,” in jakob nielsen and robert l. mack, eds., usability inspection methods (new york: wiley, 1994), 25–62. 6. see denise t. covey, usage and usability assessment: library practices and concerns (washington, d.c.: digital library federation, 2002); nicole campbell, usability assessment of library-related web sites (chicago: ala, 2001); kristen l. garlock and sherry piontek, designing web interfaces to library services and resources (chicago: ala, 1999); anna noakes schulze, “user-centered design for information professionals,” journal of education for library and information science 42, no. 2 (2001): 116–22; susan m. thompson, “remote observation strategies for usability testing,” information technology and libraries 22, no. 3 (2003): 22–32. 7. government services administration, “section 508: section 508 standards,” www.section508.gov/index.cfm?fuseacti on=content&id=12#web (accessed nov. 1, 2005). 8. w3c, “web content accessibility guidelines 2.0,” www .w3.org/tr/wcag20 (accessed nov. 1, 2005). 9. see susan augustine and courtney greene, “discovering how students search a library web site: a usability case study,” college and research libraries 63, no. 4 (2002): 354–65; brenda battleson, austin booth, and jane weintrop, “usability testing of an academic library web site: a case study,” journal of academic librarianship 27, no. 3 (2001): 188–98; janice krueger, ron l. 
ray, and lorrie knight, “applying web usability techniques to assess student awareness of library web resources,” journal of academic librarianship 30, no. 4 (2004): 285–93; thura mack et al., “designing for experts: how scholars approach an academic library web site,” information technology and libraries 23, no. 1 (2004): 16–22; mark shelstad, “content matters: analysis of a web site redesign,” oclc systems & services 21, no. 3 (2005): 209–25; robert l. tolliver et al., “web site redesign and testing with a usability consultant: lessons learned,” oclc systems & services 21, no. 3 (2005): 156–67; dominique turnbow et al., “usability testing for web redesign: a ucla case study,” oclc systems & services 21, no. 3 (2005): 226–34; leanne m. vandecreek, “usability analysis of northern illinois user-centered design of a web site | manzari and trinidad-christensen 169 university libraries’ web site: a case study,” oclc systems & services 21, no. 3 (2005): 181–92. 10. jakob nielsen and rolf molich, “heuristic evaluation of user interfaces,” in proceedings of the acm chi ’90 (new york: association for computing machinery, 1990), 249–56. 11. robin jeffries et al., “user interface evaluation in the real world: a comparison of a few techniques,” in proceedings of the acm chi ’91 (new york: association for computing machinery, 1991), 119–24; jakob nielsen, “finding usability problems through heuristic evaluation,” in proceedings of the acm chi ’92 (new york: association for computing machinery, 1992), 373–86. 12. jakob nielsen, “heuristic evaluation,” 25–62. 13. jeffrey rubin, handbook of usability testing: how to plan, design, and conduct effective tests (new york: wiley, 1994); jakob nielsen, “why you only need to test with five users, alertbox mar. 19, 2000,” www.useit.com/alertbox/20000319.html (accessed nov. 1, 2005). 184 information technology and libraries | december 2009 thomas sommer unlv special collections in the twenty-first century university of nevada las vegas (unlv) special collections is consistently striving to provide several avenues of discovery to its diverse range of patrons. specifically, unlv special collections has planned and implemented several online tools to facilitate unearthing treasures in the collections. these online tools incorporate web 2.0 features as well as searchable interfaces to collections. t he university of nevada las vegas (unlv) special collections has been working toward creating a visible archival space in the twenty-first century that assists its patrons’ quest for historical discovery in unlv’s unique southern nevada, gaming, and las vegas collections. this effort has helped patrons ranging from researchers to students to residents. special collections has created a discovery environment that incorporates several points of access, including virtual exhibits, a collection-wide search box, and digital collections. unlv special collections also has added web 2.0 features to aid in the discovery and enrichment of this historical information. these new features range from a what’s new blog to a digital collection with interactive features. the first point of discovery within the unlv special collections website began with the virtual exhibits. staff created the virtual exhibits as static html pages that showcased unique materials housed within unlv special collections. they showed the scope and diversity of materials on a specific topic available to researchers, faculty, and students. 
one virtual exhibit is “dino at the sands” (figure 1), a point of discovery for the history not only of dean martin but of many rat pack exploits.1 the photographs in this exhibit come from the sands collection. it is a static html page, and it provides information and pictures regarding one of las vegas’ most famous entertainers. this exhibit contains links to rat pack information and various resources on dean martin, including photographs, books, and videotapes. a second mode of discovery within the unlv special collections website is its new “search special collections” google-like search box (figure 2). this is located on the homepage and searches the manuscript, photograph, and oral history primary source collections.2 the purpose is to aid in the discovery of material within the collections that is not yet detailed in the public online catalog. in the past researchers would have to work through the special collection’s website to locate the resources. they can now go to one place to search for various types of material—a one-stop shop. the search results are easy to read and highlight the search term (see figure 3).3 the third point of access is the digital collection. these collections are digital copies of original materials located within the archives. the digital copies are presented online, described, and organized for easy access. each collection offers full-text searches, browsing, zoom, pan, figure 2. unlv special collections search box figure 1. “dino at the sands” exhibit thomas sommer (thomas.sommer@unlv.edu) is university and technical services archivist in special collections at the university of nevada las vegas libraries. unlv special collections in the twenty-first century | sommer 185 side-by-side comparison, and exporting for presentation and reuse. the newest example of a digital collection is “southern nevada: the boomtown years” (figure 4).4 this collection brings together a wide range of original materials from various collections located within unlv special collections, the nevada state museum, the historical society in las vegas, and the clark county heritage museum. it even provides standards-based activities for elementary and high school students. this project was funded by the nevada state library and archives under the library services and technology act (lsta) as amended through the institute of museum figure 4. “southern nevada: the boomtown years” digital collection figure 5. “what’s new” blog figure 6. unlv special collection facebook page figure 3. hoover dam search results 186 information technology and libraries | december 2009 and library services (imls). unlv special collections director peter michel selected the content. the team included fourteen members, four of whom were funded by the grant. christy keeler, phd, created the educator pages and designed the student activities. new collections are great, but users have to know they exist. to announce new collections and displays, special collections first added a what’s new blog that includes an rss feed to keep patrons up-to-date on new messages (figure 5).5 another avenue of interaction was implemented in april 2009 when special collections created its own facebook page (figure 6).6 students and researchers are encouraged to become fans. status updates with images and links to southern nevada and las vegas resources lead the fans back to the main website where the other treasures can be discovered. 
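the rss feed attached to the what's new blog is simply an xml document that lists recent announcements; feed readers poll it on behalf of subscribers and display new items as they appear. the fragment below is an invented illustration of what a single entry in such a feed might look like: only the blog and collection addresses are real, and the item itself (title, date, description) is hypothetical.

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>what's new in special collections</title>
        <link>http://blogs.library.unlv.edu/whats_new_in_special_collections/</link>
        <description>announcements of new unlv special collections exhibits and digital collections</description>
        <!-- one hypothetical announcement entry -->
        <item>
          <title>new digital collection: southern nevada: the boomtown years</title>
          <link>http://digital.library.unlv.edu/boomtown/</link>
          <pubDate>Tue, 28 Jul 2009 09:00:00 GMT</pubDate>
          <description>an invented example item; subscribers see it in their feed reader as soon as it is published.</description>
        </item>
      </channel>
    </rss>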
special collections has implemented various web 2.0 features within its newest digital collections. specifically, it added a comments section, a “rate it” feature, and an rss feature to its latest digital collections (figures 7, 8, and 9). these latest trends enrich the collections’ resources with patron-supplied information.7 as is apparent, unlv special collections implemented several online tools to allow patrons to discover its extensive primary resources. these tools range from virtual exhibits and digital collections with web 2.0 features to blogs and social networking sites. special collections has endeavored to stay on top of the latest trends to benefit its patrons and facilitate their discovery of historical materials in the twenty-first century. figure 8. “rate it” feature for aerial view of hughes aircraft plant photograph figure 7. comments section for aerial view of hughes aircraft plant photograph figure 9. rss feature for the index to the “welcome home howard” digital collection continued on page 190 190 information technology and libraries | december 2009 as previously mentioned, these easy-to-use tools can allow screencast videos and screenshots to be integrated into a variety of online spaces. a particularly effective type of online space for potential integration of such screencast videos and screenshots are library “how do i find . . .” research help guides. many of these “how do i find . . .” research help guides serve as pathfinders for patrons, outlining processes for obtaining information sources. currently, many of these pathfinders are in text form, and experimentation with the tools outlined in this article can empower library staff to enhance their own pathfinders with screencast videos and screenshot tutorials. reference 1. “unlv libraries strategic plan 2009–2011,” http://www .library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 30, 2009): 2. unlv special collections continued from page 186 references 1. peter michel, “dino at the sands,” unlv special collections, http://www.library.unlv.edu/speccol/dino/index.html (accessed july 28, 2009). 2. peter michel, “unlv special collections search box.” unlv special collections. http://www.library.unlv.edu/speccol/ index.html (accessed july 28, 2009). 3. unlv special collections search results, “hoover dam,” http://www.library.unlv.edu/speccol/databases/index .php?search_query=hoover+dam&bts=search&cols[]=oh&cols []=man&cols[]=photocoll&act=2 (accessed october 27, 2009). 4. unlv libraries, “southern nevada: the boomtown years,” http://digital.library.unlv.edu/boomtown/ (accessed july 28, 2009). 5. unlv special collections, “what’s new in special collections,” http://blogs.library.unlv.edu/whats_new_in_special_ collections/ (accessed july 28, 2009). 6. unlv special collections, “unlv special collections facebook homepage,” http://www.facebook.com/home .php?#/pages/las-vegas-nv/unlv-special-collections/70053 571047?ref=search (accessed july 28, 2009). 7. unlv libraries, “comments section for the aerial view of hughes aircraft plant photograph,” http://digital.library .unlv.edu/hughes/dm.php/hughes/82 (accessed july 28, 2009); unlv libraries, “‘rate it’ feature for the aerial view of hughes aircraft plant photograph,” http://digital.library.unlv.edu/ hughes/dm.php/hughes/82 (accessed july 28, 2009); unlv libraries, “rss feature for the index to the welcome home howard digital collection” http://digital.library.unlv.edu/hughes/ dm.php/ (accessed july 28, 2009). 
josé r. hilera, carmen pagés, j. javier martínez, j. antonio gutiérrez, and luis de-marcos an evolutive process to convert glossaries into ontologies dictionary, the outcome will be limited by the richness of the definition of terms included in that dictionary. it would be what is normally called a “lightweight” ontology,6 which could later be converted into a “heavyweight” ontology by implementing, in the form of axioms, knowledge not contained in the dictionary. this paper describes the process of creating a lightweight ontology of the domain of software engineering, starting from the ieee standard glossary of software engineering terminology.7 ■■ ontologies, the semantic web, and libraries within the field of librarianship, ontologies are already being used as alternative tools to traditional controlled vocabularies. this may be observed particularly within the realm of digital libraries, although, as krause asserts, objections to their use have often been raised by the digital library community.8 one of the core objections is the difficulty of creating ontologies as compared to other vocabularies such as taxonomies or thesauri. nonetheless, the semantic richness of an ontology offers a wide range of possibilities concerning indexing and searching of library documents.
the term ontology (used in philosophy to refer to the “theory about existence”) has been adopted by the artificial intelligence research community to define a categorization of a knowledge domain in a shared and agreed form, based on concepts and relationships, which may be formally represented in a computer readable and usable format. the term has been widely employed since 2001, when berners-lee et al. envisaged the semantic web, which aims to turn the information stored on the web into knowledge by transforming data stored in every webpage into a common scheme accepted in a specific domain.9 to accomplish that task, knowledge must be represented in an agreed-upon and reusable computer-readable format. to do this, machines will require access to structured collections of information and to formalisms which are based on mathematical logic that permits higher levels of automatic processing. technologies for the semantic web have been developed by the world wide web consortium (w3c). the most relevant technologies are rdf (resource description this paper describes a method to generate ontologies from glossaries of terms. the proposed method presupposes an evolutionary life cycle based on successive transformations of the original glossary that lead to products of intermediate knowledge representation (dictionary, taxonomy, and thesaurus). these products are characterized by an increase in semantic expressiveness in comparison to the product obtained in the previous transformation, with the ontology as the end product. although this method has been applied to produce an ontology from the “ieee standard glossary of software engineering terminology,” it could be applied to any glossary of any knowledge domain to generate an ontology that may be used to index or search for information resources and documents stored in libraries or on the semantic web. f rom the point of view of their expressiveness or semantic richness, knowledge representation tools can be classified at four levels: at the basic level (level 0), to which dictionaries belong, tools include definitions of concepts without formal semantic primitives; at the taxonomies level (level 1), tools include a vocabulary, implicit or explicit, as well as descriptions of specialized relationships between concepts; at the thesauri level (level 2), tools further include lexical (synonymy, hyperonymy, etc.) and equivalence relationships; and at the reference models level (level 3), tools combine the previous relationships with other more complex relationships between concepts to completely represent a certain knowledge domain.1 ontologies belong at this last level. according to the hierarchic classification above, knowledge representation tools of a particular level add semantic expressiveness to those in the lowest levels in such a way that a dictionary or glossary of terms might develop into a taxonomy or a thesaurus, and later into an ontology. there are a variety of comparative studies of these tools,2 as well as varying proposals for systematically generating ontologies from lower-level knowledge representation systems, especially from descriptor thesauri.3 this paper proposes a process for generating a terminological ontology from a dictionary of a specific knowledge domain.4 given the definition offered by neches et al. 
(“an ontology is an instrument that defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary”)5 it is evident that the ontology creation process will be easier if there is a vocabulary to be extended than if it is developed from scratch. if the developed ontology is based exclusively on the josé r. hilera (jose.hilera@uah.es) is professor, carmen pagés (carmina.pages@uah.es) is assistant professor, j. javier martínez (josej.martinez@uah.es) is professor, j. antonio gutiérrez (jantonio.gutierrez@uah.es) is assistant professor, and luis de-marcos (luis.demarcos@uah.es) is professor, department of computer science, faculty of librarianship and documentation, university of alcalá, madrid, spain. 196 information technology and libraries | december 2010 configuration management; data types; errors, faults, and failures; evaluation techniques; instruction types; language types; libraries; microprogramming; operating systems; quality attributes; software documentation; software and system testing; software architecture; software development process; software development techniques; and software tools.15 in the glossary, entries are arranged alphabetically. an entry may consist of a single word, such as “software,” a phrase, such as “test case,” or an acronym, such as “cm.” if a term has more than one definition, the definitions are numbered. in most cases, noun definitions are given first, followed by verb and adjective definitions as applicable. examples, notes, and illustrations have been added to clarify selected definitions. cross-references are used to show a term’s relations with other terms in the dictionary: “contrast with” refers to a term with an opposite or substantially different meaning; “syn” refers to a synonymous term; “see also” refers to a related term; and “see” refers to a preferred term or to a term where the desired definition can be found. figure 2 shows an example of one of the definitions of the glossary terms. note that definitions can also include framework),10 which defines a common data model to specify metadata, and owl (ontology web language),11 which is a new markup language for publishing and sharing data using web ontologies. more recently, the w3c has presented a proposal for a new rdf-based markup system that will be especially useful in the context of libraries. it is called skos (simple knowledge organization system), and it provides a model for expressing the basic structure and content of concept schemes, such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other similar types of controlled vocabularies.12 the emergence of the semantic web has created great interest within librarianship because of the new possibilities it offers in the areas of publication of bibliographical data and development of better indexes and better displays than those that we have now in ils opacs.13 for that reason, it is important to strive for semantic interoperability between the different vocabularies that may be used in libraries’ indexing and search systems, and to have compatible vocabularies (dictionaries, taxonomies, thesauri, ontologies, etc.) based on a shared standard like rdf. there are, at the present time, several proposals for using knowledge organization systems as alternatives to controlled vocabularies. 
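as an illustration of the kind of shared, rdf-based encoding mentioned above, the entry for a term such as high order language could be expressed in skos roughly as follows. this is a minimal sketch under assumed names: the http://example.org/se namespace is invented for the example, and the labels and links simply restate the glossary's own cross-references for the term (its "high level language" synonym and its "contrast with" reference to assembly language).

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:skos="http://www.w3.org/2004/02/skos/core#">
      <!-- a glossary term modeled as a skos concept (hypothetical uris) -->
      <skos:Concept rdf:about="http://example.org/se#HighOrderLanguage">
        <skos:prefLabel xml:lang="en">high order language</skos:prefLabel>
        <skos:altLabel xml:lang="en">high level language</skos:altLabel>
        <skos:definition xml:lang="en">a programming language that requires little knowledge of the computer on which a program will run ...</skos:definition>
        <skos:broader rdf:resource="http://example.org/se#ProgrammingLanguage"/>
        <skos:related rdf:resource="http://example.org/se#AssemblyLanguage"/>
      </skos:Concept>
    </rdf:RDF>

an indexing or search system that understands skos can then treat the preferred and alternative labels as interchangeable entry points and use the broader and related links to expand or narrow a query.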
for example, folksonomies, though originating within the web context, have been proposed by different authors for use within libraries “as a powerful, flexible tool for increasing the user-friendliness and interactivity of public library catalogs.”14 authors argue that the best approach would be to create interoperable controlled vocabularies using shared and agreed-upon glossaries and dictionaries from different domains as a departure point, and then to complete evolutive processes aimed at semantic extension to create ontologies, which could then be combined with other ontologies used in information systems running in both conventional and digital libraries for indexing as well as for supporting document searches. there are examples of glossaries that have been transformed into ontologies, such as the cambridge healthtech institute’s “pharmaceutical ontologies glossary and taxonomy” (http://www.genomicglossaries.com/content/ontolo gies.asp), which is an “evolving terminology for emerging technologies.” ■■ ieee standard glossary of software engineering terminology to demonstrate our proposed method, we will use a real glossary belonging to the computer science field, although it is possible to use any other. the glossary, available in electronic format (pdf), defines approximately 1,300 terms in the domain of software engineering (figure 1). topics include addressing assembling, compiling, linking, loading; computer performance evaluation; figure 1. cover of the glossary document generating collaborative systems for digital libraries | hilera et al. 197 4. define the classes and the class hierarchy 5. define the properties of classes (slots) 6. define the facets of the slots 7. create instances as outlined in the introduction, the ontology developed using our method is a terminological one. therefore we can ignore the first two steps in noy’s and mcguinness’ process as the concepts of the ontology coincide with the terms of the glossary used. any ontology development process must take into account the basic stages of the life cycle, but the way of organizing the stages can be different in different methods. in our case, since the ontology has a terminological character, we have established an incremental development process that supposes the natural evolution of the glossary from its original format (dictionary or vocabulary format) into an ontology. the proposed life cycle establishes a series of steps or phases that will result in intermediate knowledge representation tools, with the final product, the ontology, being the most semantically rich (figure 4). therefore this is a product-driven process, in which the aim of every step is to obtain an intermediate product useful on its own. the intermediate products and the final examples associated with the described concept. in the resulting ontology, the examples were included as instances of the corresponding class. in figure 2, it can be seen that the definition refers to another glossary on programming languages (std 610.13), which is a part of the series of dictionaries related to computer science (“ieee std 610,” figure 3). other glossaries which are mentioned in relation to some references about term definitions are 610.1, 610.5, 610.7, 610.8, and 610.9. to avoid redundant definitions and possible inconsistencies, links must be implemented between ontologies developed from those glossaries that include common concepts. 
the ontology generation process presented in this paper is meant to allow for integration with other ontologies that will be developed in the future from the other glossaries. in addition to the explicit references to other terms within the glossary and to terms from other glossaries, the textual definition of a concept also has implicit references to other terms. for example, from the phrase “provides features designed to facilitate expression of data structures” included in the definition of the term high order language (figure 2), it is possible to determine that there is an implicit relationship between this term and the term data structure, also included in the glossary. these relationships have been considered in establishing the properties of the concepts in the developed ontology. ■■ ontology development process many ontology development methods presuppose a life cycle and suggest technologies to apply during the process of developing an ontology.16 the method described by noy and mcguinness is helpful when beginning this process for the first time.17 they establish a seven-step process: 1. determine the domain and scope of the ontology 2. consider reusing existing ontologies 3. enumerate important terms in the ontology figure 2. example of term definition in the ieee glossary figure 3. ieee computer science glossaries 610—standard dictionary of computer terminology 610.1—standard glossary of mathematics of computing terminology 610.2—standard glossary of computer applications terminology 610.3—standard glossary of modeling and simulation terminology 610.4—standard glossary of image processing terminology 610.5—standard glossary of data management terminology 610.6—standard glossary of computer graphics terminology 610.7—standard glossary of computer networking terminology 610.8—standard glossary of artificial intelligence terminology 610.9—standard glossary of computer security and privacy terminology 610.10—standard glossary of computer hardware terminology 610.11—standard glossary of theory of computation terminology 610.12—standard glossary of software engineering terminology 610.13—standard glossary of computer languages terminology high order language (hol). a programming language that requires little knowledge of the computer on which a program will run, can be translated into several difference machine languages, allows symbolic naming of operations and addresses, provides features designed to facilitate expression of data structures and program logic, and usually results in several machine instructions for each program statement. examples include ada, cobol, fortran, algol, pascal. syn: high level language; higher order language; third generation language. contrast with: assembly language; fifth generation language; fourth generation language; machine language. note: specific languages are defined in p610.13 198 information technology and libraries | december 2010 since there are terms with different meanings (up to five in some cases) in the ieee glossary of software engineering terminology, during dictionary development we decided to create different concepts (classes) for the same term, associating a number to these concepts to differentiate them. 
for example, there are five different definitions for the term test, which is why there are five concepts (test1–test5), corresponding to the five meanings of the term: (1) an activity in which a system or component is executed under specified conditions, the results are observed or recorded, and an evaluation is made of some aspect of the system or component; (2) to conduct an activity as in (1); (3) a set of one or more test cases; (4) a set of one or more test procedures; (5) a set of one or more test cases and procedures. taxonomy the proposed lifecycle establishes a stage for the conversion of a dictionary into a taxonomy, understanding taxonomy as an instrument of concepts categorization, product are a dictionary, which has a formal and computer processed structure, with the terms and their definitions in xml format; a taxonomy, which reflects the hierarchic relationships between the terms; a thesaurus, which includes other relationships between the terms (for example, the synonymy relationship); and, finally, the ontology, which will include the hierarchy, the basic relationships of the thesaurus, new and more complex semantic relationships, and restrictions in form of axioms expressed using description logics.18 the following paragraphs describe the way each of these products is obtained. dictionary the first step of the proposed development process consists of the creation of a dictionary in xml format with all the terms included in the ieee standard glossary of software engineering terminology and their related definitions. this activity is particularly mechanical and does not need human intervention as it is basically a transformation of the glossary from its original format (pdf) into a format better suited to the development process. all formats considered for the dictionary are based on xml, and specifically on rdf and rdf schema. in the end, we decided to work with the standards daml+oil and owl,19 though we are not opposed to working with other languages, such as skos or xmi,20 in the future. (in the latter case, it would be possible to model the intermediate products and the ontology in uml graphic models stored in xml files.)21 in our project, the design and implementation of all products has been made using an ontology editor. we have used oiled (with oilviz plugin) as editor, both because of its simplicity and because it allows the exportation to owl and daml formats. however, with future maintenance and testing in mind, we decided to use protégé (with owl plugin) in the last step of the process, because this is a more flexible environment with extensible modules that integrate more functionality such as ontology annotation, evaluation, middleware service, query and inference, etc. figure 5 shows the dictionary entry for “high order language,” which appears in figure 2. note that the dictionary includes only owl:class (or daml:class) to mark the term; rdf:label to indicate the term name; and rdf:comment to provide the definition included in the original glossary. figure 4. ontology development process highorderlanguage figure 5. example of dictionary entry generating collaborative systems for digital libraries | hilera et al. 199 example, when analyzing the definition of the term compiler: “(is) a computer program that translates programs expressed in a high order language into their machine language equivalent,” it is possible to deduce that compiler is a subconcept of computer program, which is also included in the glossary.) 
in addition to the lexical or syntactic analysis, it is necessary for an expert in the domain to perform a semantic analysis to complete the development of the taxonomy. the implementation of the hierarchical relationships among the concepts is made using rdfs:subclassof, regardless of whether the taxonomy is implemented in owl or daml format, since both languages specify this type of relationship in the same way. figure 6 shows an example of a hierarchical relationship included in the definition of the concept pictured in figure 5. thesaurus according to the international organization for standardization (iso), a thesaurus is “the vocabulary of a controlled indexing language, formally organized in order to make explicit the a priori relations between concepts (for example ‘broader’ and ‘narrower’).”25 this definition establishes the lexical units and the semantic relationships between these units as the elements that constitute a thesaurus. the following is a sample of the lexical units: ■■ descriptors (also called “preferred terms”): the terms used consistently when indexing to represent a concept that can be in documents or in queries to these documents. the iso standard introduces the option of adding a definition or an application note to every term to establish explicitly the chosen meaning. this note is identified by the abbreviation sn (scope note), as shown in figure 7. ■■ non-descriptors (“non-preferred terms”): the synonyms or quasi-synonyms of a preferred term. a nonpreferred term is not assigned to documents submitted to an indexing process, but is provided as an entry point in a thesaurus to point to the appropriate descriptor. usually the descriptors are written in capital letters and the nondescriptors in small letters. ■■ compound descriptors: the terms used to represent complex concepts and groups of descriptors, which allow for the structuring of large numbers of thesaurus descriptors into subsets called micro-thesauri. in addition to lexical units, other fundamental elements of a thesaurus are semantic relationships between these units. the more common relationships between lexical units are the following: ■■ equivalence: the relationship between the descriptors and the nondescriptors (synonymous and that is, as a systematical classification in a traditional way. as gilchrist states, there is no consensus on the meaning of terms like taxonomy, thesaurus, or ontology.22 in addition, much work in the field of ontologies has been done without taking advantage of similar work performed in the fields of linguistics and library science.23 this situation is changing because of the increasing publication of works that relate the development of ontologies to the development of “classic” terminological tools (vocabularies, taxonomies, and thesauri). this paper emphasizes the importance and usefulness of the intermediate products created at each stage of the evolutive process from glossary to ontology. the end product of the initial stage is a dictionary expressed as xml. the next stage in the evolutive process (figure 4) is the transformation of that dictionary into a taxonomy through the addition of hierarchical relationships between concepts. to do this, it is necessary to undertake a lexicalsemantic analysis of the original glossary. this can be done in a semiautomatic way by applying natural language processing (nlp) techniques, such as those recommended by morales-del-castillo et al.,24 for creating thesauri. 
the basic processing sequence in linguistic engineering comprises the following steps: (1) incorporate the original documents (in our case the dictionary obtained in the previous stage) into the information system; (2) identify the language in which they are written, distinguishing independent words; (3) "understand" the processed material at the appropriate level; (4) use this understanding to transform, search, or translate data; (5) produce the new media required to present the resulting outcomes; and finally, (6) present the final outcome to human users by means of the most appropriate peripheral device—screen, speakers, printer, etc. an important aspect of this process is natural language comprehension. for that reason, several different kinds of programs are employed, including lemmatizers (which implement stemming algorithms to extract the lexeme or root of a word), morphologic analyzers (which glean information about sentences from their constituent elements: morphemes, words, and parts of speech), syntactic analyzers (which group sentence constituents to extract elements larger than words), and semantic models (which represent language semantics in terms of concepts and their relations, using abstraction, logical reasoning, organization, and data-structuring capabilities). from the information in the software engineering dictionary and from a lexical analysis of it, it is possible to determine a hierarchical relationship when the name of a term contains the name of another one (for example, the term language and the terms programming language and hardware design language), or when expressions such as "is a" linked to the name of another term included in the glossary appear in the text of the term definition. (for example, when analyzing the definition of the term compiler: "(is) a computer program that translates programs expressed in a high order language into their machine language equivalent," it is possible to deduce that compiler is a subconcept of computer program, which is also included in the glossary.) the life cycle proposed in this paper (figure 4) includes a third step or phase that transforms the taxonomy obtained in the previous phase into a thesaurus through the incorporation of relationships between the concepts that complement the hierarchical relations included in the taxonomy. basically, we have to add two types of relationships—equivalence and associative, represented in the standard thesauri with uf (and use) and rt respectively. we will continue using xml to implement this new product. there are different ways of implementing a thesaurus using a language based on xml. for example, matthews et al. proposed a standard rdf format,26 whereas hall created an ontology in daml.27 in both cases, the authors modeled the general structure of a thesaurus from classes (rdf:class or daml:class) and properties (rdf:property or daml:objectproperty). as for the relationships themselves, iso establishes that the abbreviation uf (used for) precedes the nondescriptors linked to a descriptor, and the abbreviation use is used in the opposite case. for example, a thesaurus developed from the ieee glossary might include a descriptor "high order language" and an equivalence relationship with a nondescriptor "high level language" (figure 7). ■■ hierarchical: a relationship between two descriptors. in the thesaurus one of these descriptors has been defined as superior to the other one. there are no hierarchical relationships between nondescriptors, nor between nondescriptors and descriptors. a descriptor can have no lower descriptors or several of them, and no higher descriptors or several of them.
according to the iso standard, hierarchy is expressed by means of the abbreviations bt (broader term), to indicate the generic or higher descriptors, and nt (narrower term), to indicate the specific or lower descriptors. the term at the head of the hierarchy to which a term belongs can be included, using the abbreviation tt (top term). figure 7 presents these hierarchical relationships. ■■ associative: a reciprocal relationship that is established between terms that are neither equivalent nor hierarchical, but are semantically or conceptually associated to such an extent that the link between them should be made explicit in the controlled vocabulary, on the grounds that it may suggest additional terms for use in indexing or retrieval. it is generally indicated by the abbreviation rt (related term). there are no associative relationships between nondescriptors and descriptors, or between descriptors already linked by a hierarchical relation. it is possible to establish associative relationships between descriptors belonging to the same or to different categories. the associative relationships can be of very different types. for example, they can represent causality, instrumentation, location, similarity, origin, action, etc. figure 7 shows two associative relations, indicating that high order language is related to both assembly language and machine language. (figure 7. fragment of a thesaurus entry: high order language (descriptor); sn: a programming language that . . . ; uf: high level language (non-descriptor); uf: third generation language (non-descriptor); tt: language; bt: programming language; nt: object oriented language; nt: declarative language; rt: assembly language (contrast with); rt: machine language (contrast with); high level language, use: high order language; third generation language, use: high order language.) (figure 6. example of taxonomy entry.) ontology ding and foo state that "ontology promotes standardization and reusability of information representation through identifying common and shared knowledge. ontology adds values to traditional thesauri through deeper semantics in digital objects, both conceptually, relationally and machine understandably."29 this semantic richness may imply deeper hierarchical levels, richer relationships between concepts, the definition of axioms or inference rules, etc. the final stage of the evolutive process is the transformation of the thesaurus created in the previous stage into an ontology. this is achieved through the addition of one or more of the basic elements of semantic complexity that differentiate ontologies from other knowledge representation standards (such as dictionaries, taxonomies, and thesauri). for example: ■■ semantic relationships between the concepts (classes) of the thesaurus have been added as properties or ontology slots. ■■ axioms of classes and axioms of properties. these are restriction rules that the elements of the ontology are declared to satisfy. for example, disjoint classes have been defined, and the relationships have been implemented as properties with quantification restrictions (existential or universal) and cardinality restrictions.
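both kinds of axioms just listed can be written directly in owl. the fragment below is a purely hypothetical sketch: the article does not say which classes were declared disjoint or which cardinalities were used, so the classes sourceprogram, objectprogram, and testcase and the property verifies are invented solely to show the syntax (the fragment would sit inside the rdf:rdf wrapper shown earlier).

  <!-- a disjointness axiom between two classes -->
  <owl:Class rdf:about="#SourceProgram">
    <owl:disjointWith rdf:resource="#ObjectProgram"/>
  </owl:Class>

  <!-- a cardinality restriction: every instance of TestCase is related
       through the property "verifies" to at least one other individual -->
  <owl:Class rdf:about="#TestCase">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#verifies"/>
        <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:minCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>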
in the rdf format of matthews et al., the general structure of a thesaurus is modeled with five classes: thesaurusobject, concept, topconcept, term, and scopenote; and several properties to implement the relations, like hasscopenote (sn), isindicatedby, preferredterm, usedfor (uf), conceptrelation, broaderconcept (bt), narrowerconcept (nt), topofhierarchy (tt), and isrelatedto (rt). recently the w3c has developed the skos specification, created to define knowledge organization schemes. in the case of thesauri, skos includes specific tags, such as skos:concept, skos:scopenote (sn), skos:broader (bt), skos:narrower (nt), skos:related (rt), etc., that are equivalent to those just listed. the skos specification does not make any statement about the formal relationship between the class of skos concept schemes and the class of owl ontologies, which allows different design patterns to be explored for using skos in combination with owl. although any of the above-mentioned formats could be used to implement the thesaurus, given that the end product of our process is to be an ontology, our proposal is that the product to be generated during this phase should have a format compatible with the final ontology and with the previous taxonomy. therefore a minimal number of changes will be carried out on the product created in the previous step, resulting in a knowledge representation tool similar to a thesaurus. that tool does not need to be modified during the following (final) phase of transformation into an ontology. nevertheless, if for some reason it is necessary to have the thesaurus in one of the other formats (such as skos), it is possible to apply a simple xslt transformation to the product. another option would be to integrate a thesaurus ontology, such as the one proposed by hall,28 with the ontology representing the ieee glossary. in the thesaurus implementation carried out in our project, the following limitations have been considered: ■■ only the hierarchical relationships implemented in the taxonomy have been considered. these include relationships of type "is-a," that is, generalization relationships or type–subset relationships. relationships that can be included in the thesaurus marked with tt, bt, and nt, like relations of type "part of" (that is, partitive relationships), have not been considered. instead of considering them as hierarchical relationships, the final ontology includes the possibility of describing classes as a union of classes. ■■ the relationships of synonymy (uf and use) used to model the cross-references in the ieee glossary ("syn" and "see," respectively) were implemented as equivalent terms, that is, as equivalence axioms between classes (owl:equivalentclass or daml:sameclassas), with inverse properties to reflect the preference of the terms. ■■ the rest of the associative relationships (rt) that were included in the thesaurus correspond to the cross-references of the type "contrast with" and "see also" that appear explicitly in the ieee glossary. ■■ neither compound descriptors nor groups of descriptors have been implemented because there is no such structure in the glossary. software based on techniques of linguistic analysis has been developed to facilitate the establishment of the properties and restrictions. this software analyzes the definition text for each of the more than 1,500 glossary terms (in thesaurus format), isolating those words that match the name of other glossary terms (or a word in the definition text of other glossary terms). the isolated words will then be candidates for a relationship between both of them. (figure 8 shows the candidate properties obtained from the software engineering glossary.)
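to illustrate the skos alternative mentioned above, the following hypothetical fragment re-expresses the figure 7 entry as a skos concept; the concept uris are invented, and the mapping chosen here (non-descriptors as skos:altlabel, sn as skos:scopenote, bt/nt/rt as skos:broader, skos:narrower, and skos:related) is only one reasonable way an xslt transformation of our product could render it.

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:skos="http://www.w3.org/2004/02/skos/core#">
    <skos:Concept rdf:about="http://example.org/segloss#HighOrderLanguage">
      <skos:prefLabel xml:lang="en">high order language</skos:prefLabel>
      <skos:altLabel xml:lang="en">high level language</skos:altLabel>            <!-- uf -->
      <skos:altLabel xml:lang="en">third generation language</skos:altLabel>      <!-- uf -->
      <skos:scopeNote xml:lang="en">a programming language that ...</skos:scopeNote>  <!-- sn -->
      <skos:broader rdf:resource="http://example.org/segloss#ProgrammingLanguage"/>   <!-- bt -->
      <skos:narrower rdf:resource="http://example.org/segloss#ObjectOrientedLanguage"/>  <!-- nt -->
      <skos:related rdf:resource="http://example.org/segloss#AssemblyLanguage"/>      <!-- rt -->
    </skos:Concept>
  </rdf:RDF>

in the implementation described here, however, the thesaurus stays in the owl/daml representation, and the candidate properties of figure 8 are reviewed interactively, as described next.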
the user then has the option of creating relationships with the identified candidate words. the user must indicate, for every relationship to be created, the restriction type that it represents, as well as the existential or universal quantification or the cardinality (minimum or maximum). after confirming this information, the program updates the file containing the ontology (owl or daml), adding the property to the class that represents the processed term. figure 9 shows an example of the definition of two properties and their application to the class highorderlanguage: a property express with existential quantification over the class datastructure to indicate that a language must represent at least one data structure; and a property translateto of universal type to indicate that any high-level language is translated into machine language (machinelanguage). ■■ results, conclusions, and future work the existence of ontologies of specific knowledge domains (software engineering in this case) facilitates the process of finding resources about this discipline on the semantic web and in digital libraries, as well as the reuse of learning objects of the same domain stored in repositories available on the web.30 when a new resource is indexed in a library catalog, a new record that conforms to the ontology's conceptual data model may be included, and its properties assigned according to the concept definitions in the ontology. the user may later execute semantic queries; the search system traverses the ontology to identify the concept the user is interested in and then launches a wider query that includes the resources indexed under that concept. ontologies, like the one that has been "evolved," may also be used in an open way to index and search for resources on the web. in that case, however, semantic search engines such as swoogle (http://swoogle.umbc.edu/) are required in place of traditional syntactic search engines, such as google. the creation of a complete ontology of a knowledge domain is a complex task. in the case of the domain presented in this paper, that of software engineering, although there have been initiatives toward ontology creation that have yielded publications by renowned authors in the field,31 a complete ontology has yet to be created and published.
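as a sketch of what the two figure 9 restrictions described above could look like in owl, the fragment below applies an existential and a universal restriction to the class highorderlanguage; the property and class spellings (express, translateto, datastructure, machinelanguage) follow the names used in the text, but the exact serialization in the project's file may differ.

  <owl:ObjectProperty rdf:ID="express"/>
  <owl:ObjectProperty rdf:ID="translateTo"/>

  <owl:Class rdf:about="#HighOrderLanguage">
    <!-- existential restriction: a high order language expresses at least one data structure -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#express"/>
        <owl:someValuesFrom rdf:resource="#DataStructure"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <!-- universal restriction: whatever a high order language translates to is machine language -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#translateTo"/>
        <owl:allValuesFrom rdf:resource="#MachineLanguage"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>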
this paper has described a process for developing a modest but complete ontology from a glossary of terminology, both in owl format and daml+oil format, which is ready to use in the semantic web. (figure 8. candidate properties obtained from the linguistic analysis of the software engineering glossary: an alphabetical list of several hundred candidate verbs and verb phrases, for example accept, allocate, call, contain, convert, executes, generate, implement, translate, and verify.)
as described at the opening of this article, our aim has been to create a lightweight ontology as a first version, which will later be improved by including more axioms and relationships that increase its semantic expressiveness. we have tried to make this first version as tailored as possible to the initial glossary, knowing that later versions will be improved by others who might take on the work. such improvements will increase the ontology's utility, but will make it a less-faithful representation of the ieee glossary from which it was derived. the ontology we have developed includes 1,521 classes that correspond to the same number of concepts represented in the ieee glossary. (included in this number are the different meanings that the glossary assigns to each term.) we defined 324 properties or relationships between these classes. these are based on a semiautomated linguistic analysis of the glossary content (for example, allow, convert, execute, operatewith, produces, translate, transform, utilize, workin, etc.), which will be refined in future versions. (figure 9. example of ontology entry.) the authors' aim is to use this ontology, which we have called ontoglose (ontology glossary software engineering), to unify the vocabulary. ontoglose will be used in a more ambitious project, whose purpose is the development of a complete ontology in software engineering from the swebok guide.32 although this paper has focused on this ontology, the method that has been described may be used to generate an ontology from any dictionary. the flexibility that owl permits for ontology description, along with its compatibility with other rdf-based metadata languages, makes possible interoperability between ontologies and between ontologies and other controlled vocabularies and allows for the building of merged representations of multiple knowledge domains. these representations may eventually be used in libraries and repositories to index and search for any kind of resource, not only those related to the original field. ■■ acknowledgments this research is co-funded by the spanish ministry of industry, tourism and commerce profit program (grant tsi-020100-2008-23). the authors also want to acknowledge support from the tifyc research group at the university of alcala. references and notes 1. m. dörr et al., state of the art in content standards (amsterdam: ontoweb consortium, 2001). 2. d. soergel, "the rise of ontologies or the reinvention of classification," journal of the american society for information science 50, no. 12 (1999): 1119–20; a. gilchrist, "thesauri, taxonomies and ontologies—an etymological note," journal of documentation 59, no. 1 (2003): 7–18. 3. b. j. wielinga et al., "from thesaurus to ontology," proceedings of the 1st international conference on knowledge capture (new york: acm, 2001): 194–201; j. qin and s. paling, "converting a controlled vocabulary into an ontology: the case of gem," information research 6 (2001): 2. 4. according to van heijst, schreiber, and wielinga, ontologies can be classified as terminological ontologies, information ontologies, and knowledge modeling ontologies; terminological ontologies specify the terms that are used to represent knowledge in the domain of discourse, and they are in use principally to unify vocabulary in a certain domain. g. van heijst, a. t. 20. w3c, skos; object management group, xml metadata interchange (xmi), 2003, http://www.omg.org/technology/documents/formal/xmi.htm (accessed oct. 5, 2009). 21.
uml (unified modeling language) is a standardized general-purpose modeling language (http://www.uml.org). nowadays, different uml plugins for ontologies’ editors exist. these plugins allow working with uml graphic models. also, it is possible to realize the uml models with a case tool, to export them to xml format, and to transform them to the ontology format (for example, owl) using a xslt sheet, as the one published in d. gasevic, “umltoowl: converter from uml to owl,” http://www.sfu.ca/~dgasevic/projects/umltoowl/ (accessed oct. 5, 2009). 22. gilchrist, “thesauri, taxonomies and ontologies.” 23. soergel, “the rise of ontologies or the reinvention of classification.” 24. j. m. morales-del-castillo et al., “a semantic model of selective dissemination of information for digital libraries,” information technology & libraries 28, no. 1 (2009): 22–31. 25. international standards organization, iso 2788:1986 documentation—guidelines for the establishment and development of monolingual thesauri (geneve: international standards organization, 1986). 26. b. m. matthews, k. miller, and m. d. wilson, “a thesaurus interchange format in rdf,” 2002, http://www.w3c.rl.ac .uk/swad/thes_links.htm (accessed feb. 10, 2009). 27. m. hall, “call thesaurus ontology in daml,” dynamics research corporation, 2001, http://orlando.drc.com/daml/ ontology/call-thesaurus (accessed oct. 5, 2009). 28. ibid. 29. y. ding and s. foo, “ontology research and development. part 1—a review of ontology generation,” journal of information science 28, no. 2 (2002): 123–36. see also b. h. kwasnik, “the role of classification in knowledge representation and discover,” library trends 48 (1999): 22–47. 30. s. otón et al., “service oriented architecture for the implementation of distributed repositories of learning objects,” international journal of innovative computing, information & control (2010), forthcoming. 31. o. mendes and a. abran, “software engineering ontology: a development methodology,” metrics news 9 (2004): 68–76; c. calero, f. ruiz, and m. piattini, ontologies for software engineering and software technology (berlin: springer, 2006). 32. ieee, guide to the software engineering body of knowledge (swebok) (los alamitos, calif.: ieee computer society, 2004), http:// www.swebok.org (accessed oct. 5, 2009). schereiber, and b. j. wielinga, “using explicit ontologies in kbs development,” international journal of human & computer studies 46, no. 2/3 (1996): 183–292. 5. r. neches et al., “enabling technology for knowledge sharing,” ai magazine 12, no. 3 (1991): 36–56. 6. o. corcho, f. fernández-lópez, and a. gómez-pérez, “methodologies, tools and languages for buildings ontologies. where is their meeting point?” data & knowledge engineering 46, no. 1 (2003): 41–64. 7. intitute of electrical and electronics engineers (ieee), ieee std 610.12-1990(r2002): ieee standard glossary of software engineering terminology (reaffirmed 2002) (new york: ieee, 2002). 8. j. krause, “semantic heterogeneity: comparing new semantic web approaches with those of digital libraries,” library review 57, no. 3 (2008): 235–48. 9. t. berners-lee, j. hendler, and o. lassila, “the semantic web,” scientific american 284, no. 5 (2001): 34–43. 10. world wide web consortium (w3c), resource description framework (rdf): concepts and abstract syntax, w3c recommendation 10 february 2004, http://www.w3.org/tr/rdf-concepts/ (accessed oct. 5, 2009). 11. world wide web consortium (w3c), web ontology language (owl), 2004, http://www.w3.org/2004/owl (accessed oct. 5, 2009). 
12. world wide web consortium (w3c), skos simple knowledge organization system, 2009, http://www.w3.org/ tr/2009/rec-skos-reference-20090818/ (accessed oct. 5, 2009). 13. m. m. yee, “can bibliographic data be put directly onto the semantic web?” information technology & libraries 28, no. 2 (2009): 55-80. 14. l. f. spiteri, “the structure and form of folksonomy tags: the road to the public library catalog,” information technology & libraries 26, no. 3 (2007): 13–25. 15. corcho, fernández-lópez, and gómez-pérez, “methodologies, tools and languages for buildings ontologies.” 16. ieee, ieee std 610.12-1990(r2002). 17. n. f. noy and d. l. mcguinness, “ontology development 101: a guide to creating your first ontology,” 2001, stanford university, http://www-ksl.stanford.edu/people/dlm/ papers/ontology-tutorial-noy-mcguinness.pdf (accessed sept 10, 2010). 18. d. baader et al., the description logic handbook (cambridge: cambridge univ. pr., 2003). 19. world wide web consortium, daml+oil reference description, 2001, http://www.w3.org/tr/daml+oil-reference (accessed oct. 5, 2009); w3c, owl. editorial | truitt 3 w ithin the last few months, two provocative books have been published that take different approaches to the question of how we learn in the always-on, always-connected electronic environment of “screens.” while neither is specifically directed at librarians, i think both deserve to be read and discussed widely in our community. ■■ the shallows the first, the shallows: what the internet is doing to our brains (norton, 2010), by nicholas carr, is an expanded version of his article “is google making us stupid?” published in the july/august 2008 issue of atlantic monthly and discussed in this space soon after.1 carr’s arguments in the shallows will be familiar to those who read his earlier article, but they are more thoroughly developed in his book and worth summarizing here. carr’s thesis is that use of connective technology—the internet and the web—is leading to a remapping of cognitive reading and thinking skills, and a “shallowing” of these mental faculties: over the last few years i’ve had an uncomfortable sense that someone, or something, has been tinkering with my brain, remapping the neural circuitry, reprogramming the memory. . . . i’m not thinking the way i used to think. i feel it most strongly when i’m reading. i used to find it easy to immerse myself in a book or a lengthy article. . . . that’s rarely the case anymore. (5) the problem, as carr goes on to describe at some length, chronicling in detail the results of years of neurological investigations, is that the brain is “plastic.” “virtually all of our neural circuits—whether they’re involved in feeling, seeing, hearing, moving, thinking, learning, perceiving, or remembering—are subject to change.” and one of the things that is changing them the most drastically today is our growing reliance on digital information. the paradox is that as we repeat an activity—surfing the web and clicking on links, rather than engaging with linear texts, for example—chemically induced synapses cause us to want to continue the new activity, strengthening those links (34). this quality of plastic neural circuits that can be remapped, when combined with the “ecosystem of interruption technologies” of the internet and the web (e.g., in-text hyperlinks, e-mail and rss alerts, text messaging, twitter, multiple widgets, etc.) 
is resulting in what carr argues is a growing inability or unwillingness to engage with and reflect deeply upon extended text (91).2 as carr puts it, the linear, literary mind . . . [that has] been the imaginative mind of the renaissance, the rational mind of the enlightenment, the inventive mind of the industrial revolution, even the subversive mind of modernism . . . may soon be yesterday’s mind. (10) there is much more. carr offers pointed critiques of major internet players and the roles they play in facilitating and exploiting the remapping of our neural circuits. google, whose “profits are tied directly to the velocity of people’s information intake,” is to carr “in the business of distraction” (156–57). the google book initiative “shouldn’t be confused with the libraries we’ve known until now. it’s not a library of books. it’s a library of snippets. . . . the strip-mining of ‘relevant content’ replaces the slow excavation of meaning” (166). ultimately, for carr, it’s about who is controlling whom. while the internet may permit us to better perform some functions—search, for example—“it poses a threat to our integrity as human beings . . . we program our computers and thereafter they program us” (214). put another way, “the computer screen bulldozes our doubts with its bounties and conveniences. it is so much our servant that it would seem churlish to notice that it is also our master” (4). ■■ hamlet’s blackberry perhaps less familiar than carr’s work is william powers’ hamlet’s blackberry: a practical philosophy for building a good life in the digital age (harpercollins 2010). powers, a writer whose work has appeared in the washington post, the new york times, the new republic, and elsewhere, describes the influence of digital technology (or “screens,” to use his shorthand)3 and connectedness on our lives: in the last few decades, we’ve found a powerful new way to pursue more busyness: digital technology. computers and smart phones are often pitched as solutions to our stressful, overextended lives. . . . but at the same time, they link us more tightly to all the sources of our busyness. our screens are conduits for everything that keeps us hopping—mandatory and optional, worthwhile and silly. . . . marc truitteditorial: “the air is full of people” marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 4 information technology and libraries | march 2011 if not yet a general consensus, that people are coming to experience and understand these costs. finally, they also make the point that things need not continue on their present course. i can imagine that if we in libraries take carr and powers seriously, there might be significant implications for service models and collections practices. both books have been reviewed in all the usual mainstream places. remarkably though, to me—and excluding a scant few discussion list threads such as that on web4lib several years ago—i’ve seen no discussion in the usual professional venues of their implications where libraries are concerned. perhaps i’m simply not reading the “right” weblogs or discussion lists. 
i’m not under the illusion that libraries or librarians can by themselves alter our rush toward the “shallows.” still, given our eagerness to discuss how we extend the reach of “screens” in libraries—whether in the form of learning commons, wireless access, mobile-friendly websites, clearing stacks of “tree-books” in favor of e-books, etc.—would it not be reasonable to think that we should show as much concern about the consequences of such activities, and even some interest in providing possible remedial alternatives? one of my favorite library spaces in college was the linonia and brothers reading room in yale’s sterling memorial library (see a photo of the reading room at http://images.library.yale.edu/madid/oneitem.aspx ?id=1772930). its dark oak paneling, built-in bookshelves, overstuffed leather easy chairs, cozy alcoves, toasty, footwarming steam radiators, and stained-glass windows overlooking a quiet courtyard represented the epitome of the nineteenth-century “gentleman’s library” and encouraged the sort of deep reading and contemplation that are becoming so rare in our institutions today. i spent many hours there, reading, thinking, dreaming—and yes, catnapping too. i haven’t visited the “l&b” in years; i hope it is still the way i so fondly recall it. over the past few years, as we’ve considered the various aspects of the library-as-space question, we’ve created all manner of collaborative, group-focused, überconnected learning spaces. we’ve also created bookfree spaces (to say nothing of book-free “libraries”), food-friendly spaces, quiet and cell-phone-free spaces, and a host of others of which i’m sure i haven’t thought. so, in an attempt to get us thinking about what carr ’s and powers’ books might mean for libraries, here’s a crazy idea to start us off: how about a screen-free space for deep reading and contemplation? it should be very low-tech: no mobiles, no laptops, no desktops, no networks, no clickety-clack of keys, no chimes of incoming e-mail and tweets, no unearthly glow of monitors. no food, drink, or group-study areas, either. just a quiet, inviting, comfortable space for individual reading and the goal is no longer to be “in touch” but to erase the possibility of ever being out of touch. to merge, to live simultaneously with everyone, sharing every moment, every perception, thought, and action via our screens. even the places where we used to go to get away from the crowd and the burdens it imposes on us are now connected. the simple act of going out for a walk is completely different today from what it was fifteen years ago. whether you’re walking down a big-city street or in the woods outside a country town, if you’re carrying a mobile device with you, the global crowd comes along. . . . the air is full of people. 
(14–15) drawing inspiration and analogy from a list of philosophers and other historical and literary figures beginning with plato and ending with mcluhan, powers describes seven practical approaches, tools, and techniques for disconnecting from our screen-driven life: ■■ seek physical distance (plato) ■■ seek intellectual and emotional distance (seneca) ■■ hope for devices that might allow us to customize our degree of connectedness (gutenberg) ■■ consider older, low-tech tools as alternatives where possible (shakespeare via hamlet) ■■ create positive rituals (ben franklin) ■■ create a “walden zone” refuge (thoreau) ■■ be aware of and take personal control from technology by being aware of that technology (mcluhan) powers then reviews how he and his family used these techniques to regain the sense of control and depth they felt they’d lost to screens. in the past several months, i’ve tried a couple myself. i no longer carry a blackberry unless i’m traveling out of town. i avoid e-mail and the internet completely on saturdays (my “internet sabbath”). the effect of these two small and easily achieved changes has been little short of liberating, providing space to think and reflect without the distraction of always-on connectedness. walking my lab seamus has become a special pleasure! ■■ bringing libraries into the picture so, what do carr’s and powers’ theses mean for libraries, and what do they mean in particular for those of us who provide technology solutions for libraries? they remind us that there is a very real human cost to the technology of screens and always-on connectedness that have become our stock-in-trade in recent years. as well, they provide convincing evidence that there is a growing awareness, editorial | truitt 5 references and notes 1. carr’s atlantic monthly article appeared in volume 301 (july/aug. 2008) and can be found at http://www.theatlantic . c o m / m a g a z i n e / a rc h i v e / 2 0 0 8 / 0 7 / i s g o o g l e m a k i n g u s -stupid/6868/ (accessed jan. 14, 2011); my ital column on the topic is at http://www.ala.org/ala/mgrps/divs/lita/ ital/272008/2703sep/editorial_pdf.cfm (accessed jan. 14, 2011). 2. the term “ecosystem of interruption technologies” belongs to cory doctorow. 3. powers uses the term “screens” to describe “the connective digital devices that have been widely adopted in the last two decades, including desktop and notebook computers, mobile phones, e-readers, and tablets” (1). thought. would some of our patrons adopt it? i’m willing to bet that they would. do we not owe them the same commitment to service that we’ve worked so hard to provide to those who wish to be collaborative and “always-on”? absolutely. no, we can’t change the world or stop the march of the screens. but perhaps, as with powers’ “walden zone,” we can start by providing a close-at-hand safe harbor for those of our patrons seeking refuge from the “always-on” world of screens. 44 information technology and libraries | december 2007 author id box for 3 column layout column titlecommunications afghanistan digital library initiative: revitalizing an integrated library system yan han and atifa rawan this paper describes an afghanistan digital library initiative of building an integrated library system (ils) for afghanistan universities and colleges based on open-source software. 
as one of the goals of the afghan equality digital libraries alliance, the authors applied systems analysis approach, evaluated different open-source ilss, and customized the selected software to accommodate users’ needs. improvements include arabic and persian language support, user interface changes, call number label printing, and isbn-13 support. to our knowledge, this ils is the first at a large academic library running on open-source software. the last quarter­century has been devastating for afghanistan, with an uninterrupted period of inva­ sions, civil wars, and oppressive regimes. “since 1979, the education system was virtually destroyed on all levels. schools and colleges were closed, looted, or physically reduced; student bodies and faculties were emptied by war, migration, and eco­ nomic hardship; and libraries were gutted.”1 kabul university (ku), for example, was largely demolished by 1994 and completely closed down in 1998. it is universally recognized that afghanistan desperately needs trained faculty, teachers, librarians, and staff. the current state of the higher education system is one of dramatic destruction and deteriora­ tion. based on rawan’s assessments of ku library, most of its collections were damaged or destroyed. she found that there were approximately 60,000 to 70,000 books in english, 2,000 to 3,000 books in persian, and 2,000 theses in persian. none of these collections have manual or online catalog records. the library has eigh­ teen staff members, but not all are fully trained in library activities.2 rebuilding the educational infra­ structure in afghanistan is essential. afghan equality digital libraries alliance the university of arizona (ua) library has been involved in rebuilding academic libraries in afghanistan since april 2002. in 2005, we were invited to be part of the digital libraries alliance (dla) as part of the afghan equality alliances: 21st century universities for afghanistan initiative funded by the usaid and washington state university. dla’s goal is to build the capacity of afghan libraries and librarians to work with open source digital libraries platforms; and to provide and enhance access to schol­ arly information resources and open content that all afghanistan univer­ sities can share. revitalizing the afghan ils an integrated library system (ils) usually includes several critical com­ ponents, such as acquisitions, cat­ aloging, catalog (search and find), circulation, and patron management. traditionally it has been the center of any library. recent developments in digital libraries have resulted in dis­ tributed systems in libraries, and the ils is treated as one of many digital library systems. it still is critical to have a centralized ils to provide a primary way to access library­owned materials for afghanistan universi­ ties and colleges. other services, such as interlibrary loan and other digital library systems, can be further devel­ oped to extend libraries’ services to users and communities. the ua library is working collab­ oratively with other dla members, including universities around the world and universities in afghanistan. one of the goals is to develop a digital library environment, includ­ ing a centralized ils for four aca­ demic universities in kabul (kabul university, polytechnic university, kabul medical university, and kabul education university). in the future, the ils will include other regional institutions throughout afghanistan. 
the ils will support 30,000 students and 2,000 faculty in afghan universi­ ties and colleges. overview of the ils market currently the ils market is primar­ ily dominated by commercial sys­ tems, such as innovative interface, endeavor, and sirsi. compared with other computing areas, open­source systems in ils are immature and limited, as there are only a few prod­ ucts available, and most of them do not have the full features of an ils. however, they are providing a valu­ able alternative to those costly com­ mercial systems. based on the availability of exist­ ing funding, experiences with com­ mercial vendors, and consideration of vendor supports and future direc­ tions, the authors decided to build a digital library infrastructure with the open concept (open access, open source, and open standards). the decision is widely influenced by glo­ balization, open access, open source, open standards, and increasing user expectations. at the same time, the decision gives us an opportunity to develop and integrate new tools and services for libraries as suggested by the university of california.3 koha is probably the most renowned open­source ils. it is yan han (hany@u.library.arizona.edu) is systems librarian and atifa rawan (rawana@u.library.arizona.edu) is librarian at the university of arizona libraries, tucson. afghanistan digital library initiative | han and rawan 45 a full­featured ils, developed in new zealand and first deployed in horowhenua library trust in 2000. so far koha has been running in a few public and special libraries. the underlying architecture is the linux, apache, mysql, and perl (lamp) stack. building on a simi­ lar lamp (linux, apache, mysql, and php) architecture, openbiblio has a relatively short history, releas­ ing its first beta 0.1.0 version in 2002 and currently in beta 0.5.1 version. webils is an open­source ils based on unesco’s cds/isis database, developed by the institute for computer and information engineering in poland. the software has some ils features, including cataloging, catalog (search and find), loan, and report modules. weblis must run on windows and window­ based web servers, such as xitami/ microsoft iis and isis database. gnuteca, another open­source ils widely deployed in south america universities, was developed in brazil. as with webils, it has some ils features, such as cataloging, cata­ log, and loan; however, the software interface is written in portuguese, which presents a language barrier for u.s. and afghanistan users. the paper open source integrated library systems provides a good overview of other systems.4 systems analysis the authors adopted systems analy­ sis by taking account of afghan col­ lections, users’ needs, and systems functionality required to perform essential library operations. koha was chosen as the base software, due to its functionality, maturity, and support. some of the reasons are: ■ the software architecture is open­ source lamp, which is popular, stable, and predominant. ■ our staff have skills in these open software systems. ■ it is a full­featured open­source ils. certain components, such as multiple branch support and user management, are critical. ■ two large public libraries serv­ ing population of 30,000 users in new zealand and united states have been running their ils on koha for a few years. the soft­ ware is stable, and most bugs have been fixed. ■ koha has a mailing list that is used by koha developers and users as a communication tool to ask and answer questions. 
kabul universities have com­ puter science faculty and students who have the capacity to participate in the development. due to working schedules and locations, we prefer to develop and maintain the system in the ua library. the technical project team consists of three people: yan han, who is responsible for manag­ ing the overall implementation and development in the open source ils system; one part­time (twenty hours per week) student developer whose major task is to develop and man­ age source code; and a temporary student (ten hours per week for two months) responsible for translating english to farsi and dari. testing tasks, such as unit testing and sys­ tem testing, are shared by all mem­ bers of the team. major challenges farsi and dari languages support koha version 2.2 cannot correctly handle east asian language records, including farsi and dari records. supporting persian, farsi, and dari records is a very important require­ ment, as these afghan universities have quite a few persian and dari materials. koha generates a web­ based graphical user interface (gui) through perl included templates that use a html meta tag with western character set (iso­8559­1) to encode characters. browsers such as internet explorer and firefox use the meta tag to decode characters with a predefined character set. therefore, other characters, such as arabic and persian as well as chinese would not be displayed correctly. the perl tem­ plates were identified and modified to allow characters to be encoded in unicode, and this solved the prob­ lem. persian and dari characters can be entered into the cataloging module and displayed correctly in the gui. however, we should understand the limitations of this approach when dealing with other east asian character sets, such as chinese characters. only frequently used characters can be represented. a project of academia sinica is one of the efforts to deal with 65,000 unique chinese characters.5 farsi/dari gui as the project is designed for local afghanistan users, there is a need for a farsi and dari gui. the current version of koha does not have such an interface, and we decided to create a new farsi/dari gui for the opac. the koha system’s internal structure is logically arranged; therefore, our development work in translation is not difficult to manage. the transla­ tion student translates english words in perl template files into farsi and dari. at the same time he works with the developer to make sure it is dis­ played correctly in the opac. figure 1 is the screenshot of the gui. other improvements we further developed a spine label printing module and integrated the module into the ils, as there is no such function provided. the module allows library staff to print one or more standardized labels (1.5 inches high by 1 inch wide) with oclc formats on gaylord lsl 01 paper, which has fifty­six labels per sheet. 46 information technology and libraries | december 2007 lstaff can select an appropriate label slot to start and print out his or her choices of labels through the web preview feature. this feature eases library staff operations and provides cost savings for label papers. isbn­13 replaced isbn­10 after january 1, 2007, and any ils has to be able to handle the new isbn­13. our ils has been improved to han­ dle both isbn standards. thanks to koha’s delegation of the gui and major functionality, interfaces such as fonts and web pages can be modi­ fied through the templates and css. 
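as a rough illustration of the character-encoding change described above for the farsi and dari records, the kind of html meta tag that limits a template to western characters, and the unicode declaration that replaces it, are sketched below; this is an assumption about the general form of the koha 2.2 template markup rather than a copy of the project's files.

  <!-- before: the template declares a western (latin-1) character set,
       so persian, dari, and arabic script cannot be displayed correctly -->
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

  <!-- after: declaring unicode (utf-8) lets the browser decode and
       display persian, dari, and arabic characters in the opac -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

for the fix to work end to end, the declared character set must of course match the encoding in which the records themselves are stored and delivered, which is why the templates and the cataloging module were adjusted together.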
a z39.50 service has been configured to allow users to search other librar­ ies’ catalogs. hardware and software support afghanistan is still developing its fun­ damental infrastructure: electricity, transportation, and communication. when considering buying hardware for the ils, difficult issues, such as server services and computer parts, have to be solved. even international it companies, such as dell, hp, and ibm, have very limited services and support in afghanistan. regarding software and system support, our strategies are to: ■ maintain and develop the open source software at the ua library by the project team; ■ run one server in kabul, afghanistan, administrated by a local system administrator. ■ run one server in the ua library administrated by the library’s system administrator. cost we estimated our overall cost for building the open­source system is low and reasonable. the system is currently run­ ning on a dell 2800 server ($5,000 for 3ghz cpu, 4gb ram, and five 73gb hard drives), kernel built debian linux (free), apache 2 (free), mysql (free), and perl (free). han spends four hours per week for coor­ dination, communication, and man­ agement of the project. the student developer works twenty hours per week for development and mainte­ nance, while the translation student will spend one hundred hours for translation. conclusion revitalizing an afghan ils is the first important goal to build digital library initiatives for the afghanistan higher education system. by under­ standing afghan university librar­ ies, collections, and users, the ua library is working with other dla members to build the open source ils. the new farsi and dari user interface, language support, and other improvements have been made to meet needs of afghan uni­ versities and colleges. the cost of using and developing existing open source software is reasonable. acknowledgments we thank usaid, washington state university, and other dla mem­ bers for providing support. this work was supported by usaid and washington state university. references and notes 1. nazif sharani et. al., conference transcription, conference on strate­ gic planning of higher education for afghanistan, 2002, indiana university, bloomington, oct. 6–7. 2. atifa rawan, transformation in afghanistan: rebuilding libraries, paper presented at azla conference, mesa, ariz., oct. 11–13, 2005. 3. the university of california libraries, rethinking how we provide bibliographic services for the university of california, 2005, http://libraries.univer­ sityofcalifornia.edu/sopag/bstf/final. pdf. 4. eric anctil and jamshid beheshti, open source integrated library systems: an overview, 2004, www.anctil.org/users/ eric/oss4ils.html (accessed nov. 5, 2006). 5. derming juang et al., “resolving the unencoded character problem for chinese digital libraries,” proceedings of the 5th acm/ieee-cs joint conference on digital libraries, jcdl 2005, denver (june 7–11, 2005): 311–19 (new york: acm pr., 2005). figure 1: afghanistan academic libraries union catalog in farsi/dari lita cover 2, cover 3, cover 4 index to advertisers a s i approach the end of my tenure as ital edi­ tor, i reflect on the many lita members who have not submitted articles for possible publica­ tion in our journal. i am especially mindful of the smaller number who have promised or hinted or implied that they intended to or might submit articles. 
admittedly, some of them may have done so because i asked them, and their replies to me were the polite ones that one expects of the honorable members of the library and information technology association of the american library association. librarians are as individuals almost all or almost always polite in their professional discourse. pondering these potential authors, particularly the smaller number, i conjured a mental picture of a fictional, male, potential ital author. i don’t know why my fic­ tional potential author was male—it may be because more males than females are members of that group; it may be because i’m a male; or it may be unconscious sex­ ism. i’m not very self­analytic. my mental picture of this fictional male potential author saw him driving home from his place of employ­ ment after having an after­work half gallon of rum when, into the picture, a rattlesnake crawled on to the seat of his car and bit him on the scrotum. lucky him: he was, after all, a figment of my imagina­ tion. (any resemblance between my fictional author and a real potential author is purely coincidental.) lucky me: we all know that such an incident is not unthinkable in library land. lucky lita: it is unlikely that any member will cancel his or her membership or any subscriber, his, her, or its subscription because the technical term “scro­ tum” found its way into my editorial. ital is, after all, a technology journal, and members and readers ought to be offended if our journal abjures technical terminology. likewise they should be offended if our articles discuss library technology issues misusing technical terms or concepts, or confusing technical issues with policy issues, or stating technology problems or issues in the title or abstract or introduction then omitting any mention of said problems until the final paragraph(s). ital referees are quite diligent in questioning authors when they think terminology has been used loosely. their close readings of manuscripts have caught more than one author mislabeling policies related to the uses of informa­ tion technologies as if the policies were themselves tech­ nical conundrums. most commonly, they have required authors who state major theses or technology problems at the beginnings of their manuscripts, then all but ignore these until the final paragraphs, to rewrite sections of their manuscripts to emphasize the often interesting questions raised at the outset. what, pray tell, is the editor trying to communicate to readers? two things, primarily. first, i have been following with interest the several heated discussions that have taken place on lita­l for the past number of months. sometimes, the idea of the traditional quarterly scholarly/professional journal in a field changing so rapidly may seem almost quaint. a typical ital article is five months old when it is pub­ lished. a typical discussion thread on lita­l happens in “real time” and lasts two days at most. a small number of participants raise and “solve” an issue in less than a half dozen posts. a few times, however, a question asked or a comment posted by a lita member has led to a flurry of irrelevant postings, or, possibly worse, sustained bomb­ ing runs from at least two opposing camps that have left some members begging to be removed from the list until the all clear signal has been sounded. i’ve read all of these, and i could not help but won­ der, what if ital accepted manuscripts as short as lita­l postings? what would our referees do? 
i suspect, for our readers’ sakes, most would be rejected. authors whose manuscripts are rejected receive the comments made by the referees and me explaining why we cannot accept their submissions. the most frequent reason is that they are out of scope, irrelevant to the purposes of lita. when someone posts a technology question to lita­l that gener­ ates responses advising the questioner that implementing the technology in question is bad policy, the responses are, from an editor’s point of view, out of scope. how many lita members have authority—real authority—to set policy for their libraries? a second “popular” reason for rejections is that the manuscripts pose “false” problems that may be technological but that are not technologies that are within the “control” of libraries. these are out of scope in a different manner. third, some manuscripts do not pass the “so what” test. some days i wish that lita­l responders would referee, honestly, their own responses for their relevance to the questions or issues or so­whatness and to the membership. second, and more importantly to me, lita members, whether or not your bodies include the part that we all have come to know and defend, do you have the “­” to send your ital editor a manuscript to be chewed upon not by rattlesnakes but by the skilled professionals who are your ital editorial board members and referees? i hope (and do i dare beg again?) so. your journal will not suffer quaintness unless you make it so. editorial: the virtues of deliberation john webb john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university, and editor of information technology and libraries. editorial | webb 3 bridging the gap: self-directed staff technology training | quinney, smith, and galbraith 205 kayla l. quinney, sara d. smith, and quinn galbraith bridging the gap: self-directed staff technology training of hbll patrons. as anticipated, results indicated that students frequently use text messages, social networks, blogs, etc., while fewer staff members use these technologies. for example, 42 percent of the students reported that they write a blog, while only 26 percent of staff and faculty do so. also, 74 percent of the students and only 30 percent of staff and faculty indicated that they belonged to a social network. after concluding that staff and faculty were not as connected as their student patrons are to technology, library administration developed the technology challenge to help close this gap. the technology challenge was a self-directed training program requiring participants to explore new technology on their own by spending at least fifteen minutes each day learning new technology skills. this program was successful in promoting lifelong learning by teaching technology applicable to the work and home lives of hbll employees. we will first discuss literature that shows how technology training can help academic librarians connect with student patrons, and then we will describe the technology challenge and demonstrate how it aligns with the principles of self-directed learning. the training will be evaluated by an analysis of the results of two surveys given to participants before and after the technology challenge was implemented. 
■■ library 2.0 and “librarian 2.0” hbll wasn’t the first to notice the gap between librarians and students, mcdonald and thomas noted that “gaps have materialized,” and library technology does not always “provide certain services, resources, or possibilities expected by emerging user populations like the millennial generation.”1 college students, who grew up with technology, are “digital natives,” while librarians, many having learned technology later in life, are “digital immigrants.”2 the “digital natives” belong to the millennial generation, described by shish and allen as a generation of “learners raised on and confirmed experts in the latest, fastest, coolest, greatest, newest electronic technologies.”3 according to sweeny, when students use libraries, they expect the same “flexibility, geographic independence, speed of response, time shifting, interactivity, multitasking, and time savings” provided by the technology they use daily.4 students are undergraduates, as members of the millennial generation, are proficient in web 2.0 technology and expect to apply these technologies to their coursework—including scholarly research. to remain relevant, academic libraries need to provide the technology that student patrons expect, and academic librarians need to learn and use these technologies themselves. because leaders at the harold b. lee library of brigham young university (hbll) perceived a gap in technology use between students and their staff and faculty, they developed and implemented the technology challenge, a self-directed technology training program that rewarded employees for exploring technology daily. the purpose of this paper is to examine the technology challenge through an analysis of results of surveys given to participants before and after the technology challenge was implemented. the program will also be evaluated in terms of the adult learning theories of andragogy and selfdirected learning. hbll found that a self-directed approach fosters technology skills that librarians need to best serve students. in addition, it promotes lifelong learning habits to keep abreast of emerging technologies. this paper offers some insights and methods that could be applied in other libraries, the most valuable of which is the use of self-directed and andragogical training methods to help academic libraries better integrate modern technologies. l eaders at the harold b. lee library of brigham young university (hbll) began to suspect a need for technology training when employees were asked during a meeting if they owned an ipod or mp3 player. out of the twenty attendees, only two raised their hands—one of whom worked for it. perceiving a technology gap between hbll employees and student patrons, library leaders began investigating how they could help faculty and staff become more proficient with the technologies that student patrons use daily. to best serve student patrons, academic librarians need to be proficient with the technologies that student patrons expect. hbll found that a self-directed learning approach to staff technology training not only fosters technology skills, but also promotes lifelong learning habits. to further examine the technology gap between librarians and students, the hbll staff, faculty, and student employees were given a survey designed to explore generational differences in media and technology use. student employees were surveyed as representatives of the larger student body, which composes the majority kayla l. 
quinney (quinster27@gmail.com) is research specialist, sara d. smith (saradsmith@gmail.com) is research specialist, and quinn galbraith (quinn_galbraith@byu.edu) is library human resource training and development manager, brigham young university library, provo, utah. 206 information technology and libraries | december 2010 2.0,” a program that “focuses on self-exploration and encourages staff to learn about new technologies on their own.”24 learning 2.0 encouraged library staff to explore web 2.0 tools by completing twenty-three exercises involving new technologies. plcmc’s program has been replicated by more than 250 libraries and organizations worldwide,25 and several libraries have written about their experiences, including academic26 and public libraries.27 these programs—and the technology challenge implemented by hbll—integrate the theories of adult learning. in the 1960s and 1970s, malcolm knowles introduced the theory of andragogy to describe the way adults learn.28 knowles described adults as learners who (1) are self-directed, (2) use their experiences as a resource for learning, (3) learn more readily when they experience a need to know, (4) seek immediate application of knowledge, and (5) are best motivated by internal rather than external factors.29 the theory and practice of self-directed learning grew out of the first learning characteristic and assumes that adults prefer self-direction in determining and achieving learning goals, and therefore learners exercise independence in determining how and what they learn.30 these theories have had a considerable effect on adult education practice31 and employee development programs.32 when adults participate in trainings that align with the assumptions of andragogy, they are more likely to retain and apply what they have learned.33 ■■ the technology challenge hbll’s technology challenge is similar to learning 2.0 in that it encourages self-directed exploration of web 2.0 technologies, but it differs in that participants were even more self-directed in exploration and that they were asked to participate daily. these features encouraged more self-directed learning in areas of participant interest as well as habit formation. it is not our purpose to critique learning 2.0, but to provide some evidence and analysis to demonstrate the success of hands-on, self-directed training approaches and to suggest other ways for libraries to apply self-directed learning to technology training. the technology challenge was implemented from june 2007 to january 2008. hbll staff included 175 full-time employees, 96 of whom participated in the challenge. (the student employees were not involved.) participants were asked to spend fifteen minutes each day learning a new technology skill. hbll leaders used rewards to make the program enjoyable and to motivate participation: for each minute spent learning technology, participants earned one point, and when one thousand points were earned, the participant would receive a gift certificate to the campus bookstore. staff and faculty participated and tracked their progress through an online masters of “informal learning”; that is, they are accustomed to easily and quickly gathering information relevant to their lives from the internet and from friends. shish and allen claimed that millennials prefer “interactive, hyper-linked multimedia over the traditional static, textoriented printed items. 
they want a sense of control; they need experiential and collaborative approaches rather than formal, librarian-guided, library-centric services.”5 these students arrive on campus expecting “to handle the challenges of scholarly research” using similar methods and technologies.6 interactive technologies such as blogs, wikis, streaming media applications, and social networks, are referred to as “web 2.0.” abram argued that web 2.0 technology “could be useful in an enterprise, institutional research, or community environment, and could be driven or introduced by the library.”7 “library 2.0” is a concept referring to a library’s integration of these technologies; it is essentially the use of “web 2.0 opportunities in a library environment.”8 manesss described library 2.0 is user-centered, social, innovative, and provider of a multimedia experiences.9 it is a community that “blurs the line between librarian and patron, creator and consumer, authority and novice.”10 libraries have been using web 2.0 technology such as blogs,11 wikis,12 and social networks13 to better serve and connect with patrons. blogs allow libraries to “provide news, information and links to internet resources,”14 and wikis create online study groups15 and “build a shared knowledge repository.”16 social networks can be particularly useful in connecting with undergraduate students: millennials use technology to collaborate and make collective decisions,17 and libraries can capitalize on this tendency by using social networks, which for students would mean, as bates argues, “an informational equivalent of the reliance on one’s facebook friends.”18 students expect library 2.0—and as libraries integrate new technologies, the staff and faculty of academic libraries need to become “librarian 2.0.” according to abram, librarian 2.0 understands users and their needs “in terms of their goals and aspirations, workflows, social and content needs, and more. librarian 2.0 is where the user is, when the user is there.”19 the modern library user “needs the experience of the web . . . to learn and succeed,”20 and the modern librarian can help patrons transfer technology skills to information seeking. librarian 2.0 is prepared to help patrons familiar with web 2.0 to “leverage these [technologies] to make a difference in reaching their goals.”21 therefore staff and faculty “must become adept at key learning technologies themselves.”22 stephen abram asked, “are the expectations of our users increasing faster than our ability to adapt?”23 and this same concern motivated hbll and other institutions to initiate staff technology training programs. the public library of charlotte and mecklenburg county of north carolina (plcmc) developed “learning bridging the gap: self-directed staff technology training | quinney, smith, and galbraith 207 their ability to learn and use technology. to be eligible to receive the gift card, participants were required to take this exit survey. sixty-four participants, all of whom had met or exceeded the thousand-point goal, chose to complete this survey, so the results of this survey represent the experiences of 66 percent of the participants. of course, if those who had not completed the technology challenge had taken the survey the results may have been different, but the results do show how those who chose to actively participate reacted to this training program. the survey included both quantifiable and open-ended questions (see appendix b for survey results and a list of the open-ended questions). 
the survey results, along with an analysis of the structure of the challenge itself, demonstrates that the program aligns with knowles’s five principles of andragogy to successfully help employees develop both technology skills and learning habits. self-direction the technology challenge was self-directed because it gave participants the flexibility to select which tasks and challenges they would complete. garrison wrote that in a self-directed program, “learners should be provided with choices of how they wish to proactively carry out the learning process. material resources should be available, approaches suggested, flexible pacing accommodated, and questioning and feedback provided when needed.”34 hbll provided a variety of challenges and training sessions related to various technologies. technology challenge participants were given the independence to choose which learning methods to use, including which training sessions to attend and which challenges to complete. according to the exit survey, the most popular training methods were small, instructor-led groups, followed by self-learning through reading books and articles. group training sessions were organized by hbll leadership and addressed topics such as microsoft office, rss feeds, computer organization skills, and multimedia software. other learning methods included web tutorials, dvds, large group discussions, and one-on-one tutoring. the group training classes preferred by hbll employees may be considered more teacher-directed than self-directed, but the technology challenge was self-directed as a whole in that learners were given the opportunity to choose what they learned and how they learned it. the structure of the technology challenge allowed participants to set their own pace. staff and faculty were given several months to complete the challenge and were responsible to pace themselves. on the exit survey, one participant commented: “if i didn’t get anything done one week, there wasn’t any pressure.” another enjoyed flexibility in deciding when and where to complete the tasks: “i liked being able to do the challenge anywhere. when i had a few minutes between appointments, classes, board game called “techopoly.” participation was voluntary, and staff and faculty were free to choose which tasks and challenges they would complete. tasks fell into one of four categories: software, hardware, library technology, and the internet. participants were required to complete one hundred points in each category, but beyond that, were able to decide how to spend their time. examples of tasks included attending workshops, exploring online tutorials, and reading books or articles about a relevant topic. for each hundred points earned, participants could complete a mini-challenge, which included reading blogs or e-books, listening to podcasts, or creating a photo cd (see appendix a for a more complete list). participants who completed fifteen out of twenty possible challenges were entered into a drawing for another gift certificate. before beginning the challenge, all participants were surveyed about their current use of technology. on this survey, they indicated that they were most uncomfortable with blogs, wikis, image editors, and music players. these results provided a focus for technology challenge trainings and mini-challenges. while not all of these technologies may apply directly to their jobs, 60 percent indicated that they were interested in learning them. 
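the scoring rules described above (one point per minute of learning, at least one hundred points in each of the four categories, a thousand-point goal for the gift certificate, a mini-challenge available for each hundred points, and drawing eligibility after fifteen of twenty mini-challenges) are simple enough to express as a small tally. the sketch below is purely illustrative and hypothetical; the class and method names are invented for this example and are not taken from the original techopoly site.

# illustrative sketch of the technology challenge scoring rules described above;
# names and structure are hypothetical, not the original techopoly implementation.
CATEGORIES = ("software", "hardware", "library technology", "internet")

class ChallengeTally:
    def __init__(self):
        # one point is earned per minute of learning, tracked per category
        self.points = {c: 0 for c in CATEGORIES}
        self.mini_challenges_done = 0

    def log_minutes(self, category, minutes):
        if category not in self.points:
            raise ValueError(f"unknown category: {category}")
        self.points[category] += minutes  # 1 point per minute

    @property
    def total(self):
        return sum(self.points.values())

    def earned_gift_certificate(self):
        # the thousand-point goal, with at least one hundred points in each category
        return self.total >= 1000 and all(p >= 100 for p in self.points.values())

    def mini_challenges_available(self):
        # one mini-challenge could be completed for each hundred points earned
        return self.total // 100

    def in_drawing(self):
        # completing fifteen of the twenty mini-challenges earned entry into a drawing
        return self.mini_challenges_done >= 15

tally = ChallengeTally()
tally.log_minutes("software", 300)
tally.log_minutes("internet", 450)
tally.log_minutes("hardware", 150)
tally.log_minutes("library technology", 120)
print(tally.total, tally.earned_gift_certificate(), tally.mini_challenges_available())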
forty-four percent reported that time was the greatest impediment to learning new technology; therefore the daily fifteen-minute requirement was introduced with the hope that it was small enough to be a good incentive to participate but substantial enough to promote habit formation and allow employees enough time to familiarize themselves with the technology. although some productivity may have been lost due to the time requirement (especially in cases where participants may have spent more than the required time), library leaders felt that technology training was an investment in hbll employees and that, at least for a few months, it was worth any potential loss in productivity. because participants could chose how and when they learned technology, they could incorporate the challenge into their work schedules according to their own needs, interests, and time constraints. of ninety-six participants, sixty-six reached or exceeded the thousand-point goal, and eight participants earned more than two thousand points. ten participants earned between five hundred and one thousand points, and another six earned between one hundred and five hundred. although not all participants completed the challenge, most were involved to some extent in learning technology during this time. ■■ the technology challenge and adult learning after finishing the challenge, participants took an exit survey to evaluate the experience and report changes in 208 information technology and libraries | december 2010 were willing, even excited, to learn technology skills: 37 percent “agreed” and 60 percent “strongly agreed” that they were interested in learning new technology. their desire to learn was cultivated by the survey itself, which helped them recognize and focus on this interest, and the challenge provided a way for employees to channel their desire to learn technology. immediate application learners need to see an opportunity for immediate application of their knowledge: ota et al. explained that “they want to learn what will help them perform tasks or deal with problems they confront in everyday situations and those presented in the context of application to real life.”39 because of the need for immediate application, the technology challenge encouraged staff and faculty to learn technology skills directly related to their jobs—as well as technology that is applicable to their personal or home lives. hbll leaders hoped that as staff became more comfortable with technology in general, they would be motivated to incorporate more complex technologies into their work. here is one example of how the technology challenge catered to adult learners’ need to apply what they learn: before designing the challenge, hbll held a training session to teach employees the basics of photoshop. even though attendees were on the clock, the turnout was discouraging. library leaders knew they needed to try something new. in the revamped photoshop workshop that was offered as part of the technology challenge, attendees brought family photos or film and learned how to edit and experiment with their photos and burn dvd copies. this time, the class was full: the same computer program that before drew only a few people was now exciting and useful. focusing on employees’ personal interests in learning new software, instead of just on teaching the software, better motivated staff and faculty to attend the training. 
motivation as stated by ota et al., adults are motivated by external factors but are usually more motivated by internal factors: “adults are responsive to some external motivators (e.g., better job, higher salaries), but the most potent motivators are internal (e.g., desire for increased job satisfaction, self-esteem).”40 on the entrance survey, participants were given the opportunity to comment on their reasons for participating in the challenge. the gift card, an example of an external motivation, was frequently cited as an important motivation. but many also commented on more internal motivations: “it’s important to my job to stay proficient in new technologies and i’d like to stay current”; “i feel that i need to be up-to-date or meetings i could complete some of the challenges.” employees could also determine how much or how little of the challenge they wanted to complete: many reached well over the thousand-point goal, while others fell a little short. participants began at different skill levels, and thus could use the time and resources allotted to explore basic or more advanced topics according to their needs and interests. garrison had noted the importance of providing resources and feedback in self-directed learning.35 the techopoly website provided resources (such as specific blogs or websites to visit) and instructions on how to use and access technology within the library. hbll also hired a student to assist staff and faculty one-on-one by explaining answers to their questions about technology and teaching other skills he thought may be relevant to their initial problem. the entrance and exit surveys provided opportunities for self-reflection and self-evaluation by questioning the participants’ use of technology before the challenge and asking them to evaluate their proficiency in technology after the challenge. use of experience the use of experience as a source of learning is important to adult learners: “the richest resource for learning resides in adults themselves; therefore, tapping into their experiences through experiential techniques (discussions, simulations, problem-solving activities, or case methods) is beneficial.”36 the small-group discussions and one-onone problem solving made available to hbll employees certainly fall into these categories. small-group classes are one of the best ways to encourage adults to share and validate their experiences, and doing so increases retention and application of new information.37 the trainings and challenges encouraged participants to make use of their work and personal experiences by connecting the topic to work or home application. for example, one session discussed how blogs relate to libraries, and another helped participants learn adobe photoshop skills by editing personal photographs. need to know adult learners are more successful when they desire and recognize a need for new knowledge or skills. the role of a trainer is to help learners recognize this “need to know” by “mak[ing] a case for the value of learning.”38 hbll used the generational survey and presurvey to develop a need and desire to learn. the results of the generational survey, which demonstrated a gap in technology use between librarians and students, were presented and discussed at a meeting held before the initiation of the technology challenge to help staff and faculty understand why it was important to learn 2.0 technology. 
results of the presurvey showed that staff and faculty bridging the gap: self-directed staff technology training | quinney, smith, and galbraith 209 statistical reports or working with colleagues from other libraries.” ■■ “i learned how to set up a server that i now maintain on a semi-regular basis. i learned a lot about sfx and have learned some perl programming language as well that i use in my job daily as i maintain sfx.” ■■ “the new oclc client was probably the most significant. i spent a couple of days in an online class learning to customize the client, and i use what i learned there every single day.” ■■ “i use google docs frequently for one of the projects i am now working on.” participants also indicated weaknesses in the technology challenge. almost 20 percent of those who completed the challenge reported that it was too easy. this is a valid point—the challenge was designed to be easy so as not to intimidate staff or faculty who are less familiar with technology. it is important to note that these comments came from those who completed the challenge—other participants may have found the tasks and mini-challenges more difficult. the goal was to provide an introduction to web 2.0, not to train experts. however, a greater range of tasks and challenges could be provided in the future to allow staff and faculty more selfdirection in selecting goals relevant to their experience. to encourage staff and faculty to attend sponsored training sessions as part of the challenge, hbll leaders decided to double points for time spent at these classes. this certainly encouraged participation, but it lead to “point inflation”—perhaps being one reason why so many reported that the challenge was too easy to complete. the doubling of points may also have encouraged staff to spend more time in workshops and less time practicing or applying the skills learned. a possible solution would be offering 1.5 points, or offering a set number of points for attendance instead of counting per minute. it also may have been informative for purpose of analysis to have surveyed both those who did not complete the challenge as well as those who chose not to participate. because the presurvey indicated that time was the biggest deterrent to learning and incorporating new technology, we assume that many of those who did not participate or who did not complete the challenge felt that they did not have enough time to do so. there is definitely potential for further investigation into why library staff would not want to participate in a technology training program, what would motivate them to participate, and how we could redesign the technology challenge to make it more appealing to all of our staff and faculty. several library employees have requested that hbll sponsor another technology challenge program. because of the success of the first and because of continuing interest in technology training, we plan to do so in the future. we will make changes and adjustments according to the on technology in order to effectively help patrons”; “to identify and become comfortable with new technologies that will make my work more efficient, more presentable, and more accurate.” ■■ lifelong learning staff and faculty responded favorably to the training. none of the participants who took the exit survey disliked the challenge; 34 percent even reported that they strongly liked it. 
ninety-five percent reported that they enjoyed the process of learning new technology, and 100 percent reported that they were willing to participate in another technology challenge—thus suggesting success in the goal of encouraging lifelong technology learning. the exit survey results indicate that after completing the challenge, staff and faculty are more motivated to continue learning—which is exactly what hbll leaders hoped to accomplish. eighty-nine percent of the participants reported that their desire to learn new technology had increased, and 69 percent reported that they are now able to learn new technology faster after completing the technology challenge. ninety-seven percent claimed that they were more likely to incorporate new technology into home or work use, and 98 percent said they recognized the importance of staying on top of emerging technologies. participants commented that the training increased their desire to learn. one observed, “i often need a challenge to get motivated to do something new,” and another participant reported feeling “a little more comfortable trying new things out.” the exit survey asked participants to indicate how they now use technology. one employee keeps a blog for her daughter’s dance company, and another said, “i’m on my way to a full-blown googlereader addiction.” another participant applied these new skills at home: “i’m not so afraid of exploring the computer and other software programs. i even recently bought a computer for my own personal use at home.” the technology challenge was also successful in helping employees better serve patrons: “i can now better direct patrons to services that i would otherwise not have known about, such as streaming audio and video and e-book readers.” another participant felt better connected to student patrons: “i understand the students better and the things they use on a daily basis.” staff and faculty also found their new skills applicable to work beyond patron interaction, and many listed specific examples of how they now use technology at work: ■■ “i have attended a few microsoft office classes that have helped me tremendously in doing my work more efficiently, whether it is for preparing monthly 210 information technology and libraries | december 2010 2. richard t. sweeny, “reinventing library buildings and services for the millennial generation,” library administration & management 19, no. 4 (2005): 170. 3. win shish and martha allen, “working with generationd: adopting and adapting to cultural learning and change,” library management 28, no. 1/2 (2006): 89. 4. sweeney, “reinventing library buildings,” 170. 5. shish and allen, “working with generation-d,” 96. 6. ibid., 98. 7. stephen abram, “social libraries: the librarian 2.0 pheonomenon,” library resources & technical services 52, no. 2 (2008): 21. 8. ibid. 9. jack m. maness “library 2.0 theory: web 2.0 and its implications for libraries,” webology 3, no. 2 (2006), http:// www.webology.ir/2006/v3n2/a25.html?q=link:webology.ir/ (accessed jan. 8, 2010). 10. ibid., under “blogs and wikis,” para. 4. 11. laurel ann clyde, “library weblogs,” library management 22, no. 4/5 (2004): 183–89; maness, “library 2.0. theory.” 12. see matthew m. bejune, “wikis in libraries,” information technology & libraries 26, no. 3 (2007): 26–38 ; darlene fichter, “the many forms of e-collaboration: blogs, wikis, portals, groupware, discussion boards, and instant messaging,” online: exploring technology & resources for information professionals 29, no. 
4 (2005): 48–50; maness, “library 2.0 theory.” 13. mary ellen bates, “can i facebook that?” online: exploring technology and resources for information professionals 31, no. 5 (2007): 64; sarah elizabeth miller and lauren a. jensen, “connecting and communicating with students on facebook,” computers in libraries 27, no. 8 (2007): 18–22. 14. clyde, “library weblogs,” 183. 15. maness, “library 2.0 theory.” 16. fichter, “many forms of e-collaboration,” 50. 17. sweeney, “reinventing library buildings”; bates, “can i facebook that?” 18. bates, “can i facebook that?” 64. 19. abram, “social libraries,” 21. 20. ibid., 20. 21. ibid., 21. 22. shish and allen, “working with generation-d,” 90. 23. abram, “social libraries,” 20. 24. helene blowers and lori reed, “the c’s of our sea change: plans for training staff, from core competencies to learning 2.0,” computers in libraries 27, no. 2 (2007): 11. 25. helene blowers, learning 2.0, 2007, http://plcmclearning .blogspot.com (accessed jan. 8, 2010). 26. for examples, see ilana kingsley and karen jensen, “learning 2.0: a tool for staff training at the university of alaska fairbanks rasmuson,” the electronic journal of academic & special librarianship 12, no. 1 (2009), http://southernlibrarianship.icaap.org/content/v10n01/kingsley_i01.html (accessed jan. 8, 2010); beverly simmons, “learning (2.0) to be a social library,” tennessee libraries 58, no. 2 (2008): 1–8. 27. for examples, see christine mackenzie, “creating our future: workforce planning for library 2.0 and beyond,” australasian public libraries & information services 20, no. 3 (2007): 118–24; liisa sjoblom, “embracing technology: the deschutes public library’s learning 2.0 program,” ola quarterly 14, no. 2 (2007): 2–6; hui-lan titango and gail l. mason, “learning library 2.0: 23 things @ scpl,” library management 30, no. 1/2 feedback we have received, and continue to evaluate it and improve it based on survey results. the purpose of a second technology challenge would be to reinforce what staff and faculty have already learned, to teach new skills, and to help participants remember the importance of lifelong learning when it comes to technology. ■■ conclusion hbll’s self-directed technology challenge was successful in teaching technology skills and in promoting lifelong learning—as well as in fostering the development of librarian 2.0. abram listed key characteristics and duties of librarian 2.0, including learning the tools of web 2.0; connecting people, technology, and information; embracing “nontextual information and the power of pictures, moving images, sight, and sound”; using the latest tools of communication; and understanding the “emerging roles and impacts of the blogosphere, web syndicasphere, and wikisphere.”41 survey results indicated that hbll employees are on their way to developing these attributes, and that they are better equipped with the skills and tools to keep learning. like plcmc’s learning 2.0, the technology challenge could be replicated in libraries of various sizes. obviously an exact replication would not be feasible or appropriate for every library—but the basic ideas, such as the principles of andragogy and self-directed learning could be incorporated, as well as the daily time requirement or the use of surveys to determine weaknesses or interests in technology skills. whatever the case, there is a great need for library staff and faculty to learn emerging technologies and to keep learning them as technology continues to change and advance. 
but the most important benefit of a self-directed training program focusing on lifelong learning is effective employee development. the goal of any training program is to increase work productivity—and as employees become more productive and efficient, they are happier and more excited about their jobs. on the exit survey, one participant expressed initially feeling hesitant about the technology challenge and feared that it would increase an already hefty workload. however, once the challenge began, the participant enjoyed “taking the time to learn about new things. i feel i am a better person/librarian because of it.” and that, ultimately, is the goal—not only to create better librarians, but also to create better people. notes 1. robert h. mcdonald and chuck thomas, “disconnects between library culture and millennial generation values,” educause quarterly 29, no. 4 (2006): 4. bridging the gap: self-directed staff technology training | quinney, smith, and galbraith 211 ers,” journal of extension 33 (2005), http://www.joe.org/ joe/2006december/tt5.php (accessed jan. 8, 2010); wayne g. west, “group learning in the workplace,” new directions for adult and continuing education 71 (1996): 51–60. 33. ota et al., “needs of learners.” 34. d. r. garrison, “self-directed learning: toward a comprehensive model,” adult education quarterly 48 (1997): 22. 35. ibid. 36. ota et al., “needs of learners,” under “needs of the adult learner,” para. 4. 37. ota et al., “needs of learners”; west, “group learning.” 38. ota et al., “needs of learners,” under “needs of the adult learner,” para. 2. 39. ibid., para. 6. 40. ibid., para 7. 41. abram, “social library,” 21–22. (2009): 44–56; illinois library association, “continuous improvement: the transformation of staff development,” the illinois library association reporter 26, no. 2 (2008): 4–7; and thomas simpson, “keeping up with technology: orange county library embraces 2.0,” florida libraries 20, no. 2 (2007): 8–10. 28. sharan b. merriam, “andragogy and self-directed learning: pillars of adult learning theory,” new directions for adult & continuing education 89 (2001): 3–13. 29. malcolm shepherd knowles, the modern practice of adult education: from pedagogy to andragogy (new york: cambridge books, 1980). 30. jovita ross-gordon, “adult learners in the classroom,” new directions for student services 102 (2003): 43–52. 31. merriam, “pillars of adult learning”; ross-gordon, “adult learners.” 32. carrie ota et al., “training and the needs of learnappendix a. technology challenge “mini challenges” technology challenge participants had the opportunity to complete fifteen of twenty mini-challenges to become eligible to win a second gift certificate to the campus bookstore. below are some examples of technology mini-challenges: 1. read a library or a technology blog 2. listen to a library podcast 3. check out a book from circulation’s new self-checkout machine 4. complete an online copyright tutorial 5. catalog some books on librarything 6. read an e-book with sony ebook reader or amazon kindle 7. scan photos or copy them from a digital camera and then burn them onto a cd 8. backup data 9. change computer settings 10. schedule meetings with microsoft outlook 11. create a page or comment on a page on the library’s intranet wiki 12. use one of the library’s music databases to listen to music 13. use wordpress or blogger to create a blog 14. post a photo on a blog 15. use google reader or bloglines to subscribe to a blog or news page using rss 16. 
reserve and check out a digital camera, camcorder, dvr, or slide scanner from the multimedia lab and create something with it 17. convert media on the analog media racks 18. edit a family photograph using photo-editing software 19. attend a class in the multimedia lab 20. make a phone call using skype
appendix b. exit survey results
how did you like the technology challenge overall? strongly disliked 0 (0%); disliked 0 (0%); liked 42 (66%); strongly liked 22 (34%).
how did you like the reporting system used for the technology challenge (the techopoly game)? strongly disliked 0 (0%); disliked 4 (6%); liked 41 (64%); strongly liked 19 (30%).
would you participate in another technology challenge? yes 64 (100%); no 0 (0%).
what percentage of time did you spend using the following methods of learning? (participants were asked to allocate 100 points among the categories) instructor-led large group 15.3; instructor-led small group 27; one-on-one instruction 3.5; web tutorial 12.8; self-learning (books, articles) 27.4; dvds 0.5; small group discussion 2.7; large group discussion 2.6; other 6.7.
i am more likely to incorporate new technology into my home or work life. strongly disagree 0 (0%); disagree 2 (3%); agree 49 (77%); strongly agree 13 (20%).
i enjoy the process of making new technology a part of my work or home life. strongly disagree 0 (0%); disagree 2 (3%); agree 37 (58%); strongly agree 24 (38%).
after completing the technology challenge, my desire to learn new technologies has increased. strongly disagree 0 (0%); disagree 7 (11%); agree 44 (69%); strongly agree 13 (20%).
i feel i now learn new technologies more quickly. strongly disagree 0 (0%); disagree 20 (31%); agree 39 (61%); strongly agree 5 (8%).
open-ended questions ■■ what would you change about the technology challenge? ■■ what did you like about the technology challenge? ■■ what technologies were you introduced to during the technology challenge that you now use on a regular basis? ■■ in what ways do you feel the technology challenge has benefited you the most?
how much more proficient do you feel in . . . hardware: not any 31%, somewhat 64%, a lot 5%; software: not any 8%, somewhat 72%, a lot 20%; internet resources: not any 17%, somewhat 68%, a lot 15%; library technology: not any 23%, somewhat 64%, a lot 13%.
in order for you to succeed in your job, how important is keeping abreast of new technologies to you? not important 1 (2%); important 22 (34%); very important 41 (64%).
marc truitt editorial i doubt that many of the blog people are in the habit of sustained reading of complex texts. —michael gorman, 2005 so, three plus years after the fact, why am i opening with michael gorman's unfortunate characterization of those he labeled "blog people"? i have no interest in reopening this debate, honestly! but the problem with generalizations, however unfair, is that at their heart there is just enough substance to make them "stick"—to give them a grain or two of credibility. gorman's words struck a chord in me that existed before his charge and has continued to exist to this day. the substance in gorman's words had little to do with these "blog people" as such; rather, my interest was piqued by the implications in his remark about how we all deal with "complex texts" and the "sustained reading" of the same.
in a time of wide availability of full-text electronic articles, it has become so easy and tempting to cherry pick the odd phrase here or there, without study of the work as a whole. how has scholarship especially been changed by the ease with which we can reduce works to snippets without having considered their overall context? i’m not arguing that scholarly research and writing hasn’t always been at least in part about finding the perfect juicy quotation around which we then weave our own theses. many of us well recall the boxes of 3x5” citation and 5x8” quotation files that we or our patrons laboriously assembled through weeks, months, and years of detailed research. but if the style of compiling these files that i witnessed (and indeed did) is any guide, their existence was the product of precisely that “sustained reading of complex texts” of which gorman spoke. my vague, nagging sense is that what is changing is this style of approaching whole texts. i wondered then about how much scholarly research today is driven by keyword searches of digitized texts that then essentially produce “virtual quotation files” without our having had to struggle with their context in the whole of the original source text? fast forward three years. lately, several articles touching on our changing ways of interacting with resources have appeared in both scholarly and popular venues, and these have served to underline my sense that we are missing something because of our growing lack of engagement with whole texts. writing in the july/august issue of the atlantic monthly, nicholas carr asks “is google making us stupid?” drawing an analogy to the scene in the film 2001: a space odyssey, in which astronaut dave bowman disables supercomputer hal’s memory circuits, carr says i can feel it, too. over the past few years i’ve had an uncomfortable sense that someone, or something, has been tinkering with my brain, remapping the neural circuitry, reprogramming the memory. my mind isn’t going—so far as i can tell—but it’s changing. i’m not thinking the way i used to think. i can feel it most strongly when i’m reading. immersing myself in a book or a lengthy article used to be easy. my mind would get caught up in the narrative or the turns of the argument, and i’d spend hours strolling through long stretches of prose. that’s rarely the case anymore. now my concentration often starts to drift after two or three pages. i get fidgety, lose the thread, begin looking for something else to do. i feel as if i’m always dragging my wayward brain back to the text. the deep reading that used to come naturally has become a struggle.1 carr goes on to explain that “what the net seems to be doing is chipping away my capacity for concentration and contemplation. my mind now expects to take in information the way the net distributes it: in a swiftly moving stream of particles. once i was a scuba diver in the sea of words. now i zip along the surface like a guy on a jet ski.”2 carr’s nagging fear found similar expression among some tech-savvy participants of library online forums; one of the more interesting comments appeared on the web4lib electronic discussion list. in a discussion of the article, tim spalding of librarything observed that he himself had experienced what he dubbed “the google effect” and noted something is lost. . . . human culture often advances by externalizing pieces of our mental life—writing externalizes memory, calculators externalize arithmetic, maps, and now gps, externalize way-finding, etc. 
each shift changes the culture. and each shift comes with a cost. nobody memorizes texts anymore, nobody knows the times tables past ten or twelve and nobody can find their way home from the stars and the side of the tree the moss grows on.3 meanwhile, another article appeared on a closely related topic, this time in the journal science. james a. evans observed that, because “scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse,” the easy availability of electronic resources was resulting in an “ironic change” for scientific marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 4 information technology and libraries | september 2008 scholarship, in that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles. the forced browsing of print archives may have stretched scientists and scholars to anchor findings deeply into past and present scholarship. searching online is more efficient and following hyperlinks quickly puts researchers in touch with prevailing opinion, but this may accelerate consensus and narrow the range of findings and ideas built upon.4 evans’s research highlights an additional irony: an unintended benefit to the scholarly process in the paperbased world was “poor indexing,” since it encouraged browsing through less relevant, older, or more marginal literature. this browsing had the effect of “facilitat[ing] broader comparisons and led researchers into the past. modern graduate education parallels this shift in publication—shorter in years, more specialized in scope, culminating less frequently in a true dissertation than an album of articles.”5 what is one to make of all of this? at the outset, i wish to state clearly that i am not some sort of anti e-text luddite. electronic texts are a fact of life, and are becoming moreso every day. even though they are in their infancy as a medium, they’ve already transformed the landscape of bibliographic access. my interest is not with the tool, but with the manner in which we are using it. i began by suggesting that i share with gorman a concern about how we increasingly engage with “complex texts” today. unlike him, though, my concern is not limited only to the so-called blog people (whomever they may be), but indeed, it includes all of us. with the explosion in easily accessible electronic texts, our ideas and habits concerning interaction with these texts are changing, sometimes in unintended ways. in a recent informal survey i conducted of my colleagues at work, i asked, “have you ever read an e-book (not just a journal article) from (virtual) cover to (virtual) cover?” for those whose answer was affirmative, i also asked, “how many such books have you read in their entirety?” out of twenty-odd responses, three individuals answered that yes, they had had occasion to read an entire e-book (for a total of six books among the three “yes” respondents, which seemed surprisingly high to me). of greater interest, though, were those who chose to question the premise of the survey, arguing that people don’t “read” e-books the way that they read paper ones. 
it does make one wonder, then, how amazon thinks it possesses a viable business model in the kindle e-book reader, for which it currently lists an astounding 140,000+ available e-books. clearly, some e-books are being read as whole texts, by some people, for some purposes. but i suspect that’s another story.6 carr and evans use slightly differing imagery to describe a similar phenomenon. carr closes with a reference back to the death of 2001’s hal, saying, “as we come to rely on computers to mediate our understanding of the world, it is our own intelligence that flattens into artificial intelligence.”7 evans, on the other hand, compares contemporary scientific researchers to newton and darwin, each of whom produced works that “not only were engaged in current debates, but wove their propositions into conversation with astronomers, geometers, and naturalists from centuries past.” twenty-first-century scientists and scholars, by contrast, are able because of readily available electronic resources “to frame and publish their arguments more efficiently, [but] they weave them into a more focused—and more narrow—past and present.” 8 perhaps the most succinct statement, though, comes from librarything’s tim spalding, who summarized the problem thusly: “we advance by becoming dumber.”9 an ital research and publishing opportunity for an inquisitive and enterprising scholar, perhaps? i’d welcome the manuscript! shameless plugs department. by the time you read this, we at ital will have launched our new blog, italica (http://ital-ica.blogspot.com). italica addresses a need we on the ital editorial board have long sensed; that is, an area for “letters to the editor,” updates to articles, supplementary materials we can’t work into the journal—you name it. one of the most important features of italica will be a forum for readers’ conversations with our authors: we’ll ask authors to host and monitor discussion for a period of time after publication so that you’ll then have a chance to interact with them. italica is currently a pilot project. for our first issue we will have begun with a discussion hosted by jennifer bowen, whose article “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase i” was published in the june 2008 issue of ital. for our second italica, we plan to expand coverage and discussion to include all articles and other features in the september issue you now have in hand. italica is sure to become a stimulating supplement to and forum for topics originating in ital. we look forward to seeing you there! references and notes extract. michael gorman, “revenge of the blog people!” library journal (feb. 15, 2005) www.libraryjournal.com/article/ ca502009.html (accessed july 21, 2008). 1. nicholas carr, “is google making us stupid?” the atlantic monthly 301 (july/aug. 2008) www.theatlantic.com/ doc/200807/google (accessed july 23, 2008). editor’s column | truitt 5 2. ibid. 3. tim spalding, “re: ‘is google making us stupid? what the internet is doing to our brains,’” web4lib discussion list post, june 19, 2008, http://article.gmane.org/gmane.education .web4lib/12349 (accessed july 24, 2008). 4. james a. evans, “electronic publication and the narrowing of science and scholarship,” science (july 18, 2008) www .sciencemag.org/cgi/content/full/321/5887/395 (accessed july 24, 2008). emphasis added. 5. ibid. 6. 
as of 5:30pm (est), july 24, 2008, amazon's website listed 145,591 "kindle books." www.amazon.com/s/qid=1216934603/ref=sr_hi?ie=utf8&rs=154606011&bbn=154606011&rh=n%3a154606011&page=1. 7. carr, "is google making us stupid?" 8. evans, "electronic publication and the narrowing of science and scholarship." 9. spalding, "re: 'is google making us stupid?'" an integrated computer based technical processing system in a small college library jack w. scott: kent state university library, kent, ohio (formerly lorain county community college, lorain, ohio) a functioning technical processing system in a two-year community college library utilizes a model 2201 friden flexowriter with punch card control and tab card reading units, an ibm 026 key punch, and an ibm 1440 computer, with two tape and two disc drives, to produce all acquisitions and catalog files based primarily on a single typing at the time of initiating an order. records generated by the initial order, with slight updating of information, are used to produce, via computer, manual and mechanized order files and shelf lists, catalogs in both the traditional 3x5 card form and book form, mechanized claiming of unfilled orders, and subject bibliographies. the lorain county community college, a two-year institution designed for 4000 students, opened in september 1964, with no librarian and no library collection. when the librarian was hired in october 1964, lack of personnel, both professional and clerical, forced him to examine closely traditional ways of ordering and preparing materials, his main task being the controlled building of a collection as quickly as possible. no library having been established, there were no inflexible rules governing acquisitions or cataloging and no catalogs or other files enforcing their pattern on future plans. the librarian was free to experiment and adapt as much as he desired; and adapt and experiment he did, remembering, at least most of the time, the primary reasons for designing the system. these were 1) to notify the vendor about what material was desired; 2) to have readily available information about when material had been ordered and when it might arrive; 3) to provide a record of encumbrances; 4) to make sure that material received was the material which had been ordered; 5) to initiate payment for material received; 6) to provide catalog copy for technical processes to use in producing card and book catalogs; 7) to provide inexpensive control cards for a circulation system; and 8) to provide whatever other statistics might be needed by the librarian. the librarian attended the purdue conference on library automation (october 2-3, 1964) and an ibm conference on automation held in cleveland (december 1964), and visited libraries with data processing installations, such as the decatur public library. then an extensive literature search was run on the subject of mechanization of libraries and the available material thoroughly reviewed. it was the consensus of the president, the librarian, and the manager of data processing that, as white said later, "the computer will play a major part in how libraries are organized and operated because libraries are a part of the fabric of society and computers are becoming a daily accepted part of life." (1) moreover, it was agreed that the use of data processing equipment would be justified only if it made building a collection more efficient and more economical than manual methods could do.
after careful consideration of the ibm 870 document writing system (2) and the system described by kraft (3) as input techniques for the college library, it was decided to use the friden flexowriter, recommended both at purdue and, in european applications, by bernstein (4). its most attractive feature was the use of paper tapes to generate various secondary records without the necessity of proofreading each one. the college, by mid-1965, had the following equipment available for library use: one friden flexowriter (model 2201) with card punch control unit and tab card reading unit, one ibm 026 key punch with alternate programming, and guaranteed time on the college-owned ibm 1440 8k computer with two tape and two disc drives. to produce punched paper tape and tab cards with only one keyboarding, an electrical connection between the flexowriter and the keypunch was especially designed and installed. it was fortunate for the library that the college also had an excellent data processing manager who was interested in seeing data processing machines and techniques utilized in as many ways as possible. with his enthusiastic support, aid in programming and preparation of flow charts, and patient cooperation, it was not surprising that the automation of library processes was completely successful. at this time it was decided that since the college was likely to remain a single-campus institution it would be uneconomical to rely solely on a book catalog, even though the portability of such a device was most attractive to librarian and faculty alike. therefore, it was planned to have the public catalog, as well as the official shelf list, in card form, permitting both to be kept current economically. these two files were to be supplemented with crude book catalogs which would be a by-product, among others, of the typing of the original book orders. these book catalogs were not to replace the card catalog but simply to extend and facilitate use of the collection. it was also decided to design a system which would duplicate as few as possible of the manual aspects of normal technical processing systems, but one which would, at the same time, permit the return to a manual system from a machine system with a minimum of trouble and tribulation if support for the library's automated system should be withdrawn. concern about such withdrawal of support had originally been voiced by durkin and white in 1961, when they said: "there have been a number of unfortunate examples of libraries that abandoned their home-grown catalogs for a machine retrieval program because there was some free computer time, only to lose their machine time to a higher priority project and to be left with information storage to which they no longer have access. many of these librarians, and others who have heard about their plight, are determined not to burn their bridges behind them by abandoning their reliable, if old-fashioned, 3x5 card catalogs." (5) although the necessity of returning to an inefficient manual system has not, to date, raised its ugly head, there were times when it was most comforting to know that routes of retreat and reformation were available. under the present system there is only one manual keyboarding of descriptive catalog main entries for most titles. all other records are generated from these main entries.
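the idea that a single keyboarding of the main entry drives every later product can be sketched in a few lines. the following is a loose, modern paraphrase of that design under assumed field names; it is not a reconstruction of the actual flexowriter tapes, tab card layouts, or 1440 programs.

# a loose modern paraphrase of "one keyboarding, many records": a single main-entry
# record is typed once and every later product is derived from it. field names and
# sample data are assumptions for illustration, not the original card or tape layouts.
from dataclasses import dataclass

@dataclass
class MainEntry:
    order_number: str
    author: str
    title: str
    vendor: str
    price: float

def order_slip(entry: MainEntry) -> str:
    # the multiple request form sent to the vendor carries the full entry
    return f"order {entry.order_number}: {entry.author}, {entry.title} -- {entry.vendor} (${entry.price:.2f})"

def on_order_card(entry: MainEntry) -> dict:
    # the tab card holds full order data but only abbreviated bibliographic data
    return {"order": entry.order_number, "auth": entry.author[:10],
            "titl": entry.title[:20], "vend": entry.vendor, "cost": entry.price}

def shelf_list_stub(entry: MainEntry, call_number: str) -> dict:
    # at receipt the same entry, plus the call number assigned by the cataloger,
    # becomes the shelf-list record
    return {"call": call_number, "auth": entry.author, "titl": entry.title,
            "vend": entry.vendor, "cost": entry.price}

e = MainEntry("65-0042", "doe, jane", "a sample title", "sample vendor", 6.50)
print(order_slip(e))
print(on_order_card(e))
print(shelf_list_stub(e, "Z678.9 .D64"))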
this integrated system was adopted on the assumption that cataloging information in some form (6) would be available for a high percentage of books. experience showed that about 95 percent of acquisitions did have catalog copy readily available. of 4029 titles processed in a 5-month period, catalog copy was available for 3824. after verification that a requested title is neither in the library nor on order, a copy of a catalog entry is located in a source such as the national union catalog, library of congress proofsheets, or publisher's weekly, etc. the catalog information is manually typed in its entirety (including subject headings) onto five-part multiple request forms, using the friden flexowriter. output from the friden consists of the multiple order, a punched paper tape containing the full bibliographic entry but no order information, and tab cards, punched by the slave ibm key punch, which contain full order information but only abbreviated bibliographic data (figure 1: on order creation routine). the tab cards, containing full order information, are used as input to the 1440 computer to create an "on order" file arranged by order number and stored on magnetic tape, from which an "on order" printout is produced weekly (figure 2: on order update). at any given time this magnetic tape order file can be used to total the dollar amount of outstanding orders to any given vendor, or the total amount outstanding to all vendors (figure 3: on order cost tally). the punched paper tape and two copies of the request form are stored in a standard 3x5 card file arranged by main entry. one copy of the request form is to be used as a work slip when material is received. the original and one copy of the request form is sent to the vendor, with instructions to return one copy with shipment. in the event the vendor does not comply, the main entry can be located readily by checking the order number or order date on the "on order" printout and using the abbreviated bibliographic information which appears there. if the material requested has not been shipped within three months, the magnetic tape order file is used to prepare tab cards containing all original order information and the cards are sent to the library with a notice stating that shipment is overdue. these tab cards are used as input to the flexowriter tab card reader unit, which activates the flexowriter itself and prepares "overdue, ship or cancel" notices to the vendor (figure 4: late on order routine). products when material is received, the paper tape and one copy of the main entry work slip are pulled from the card order file and sent to the cataloger, who notes on the work slip the call number to be used as well as any changes. the work slip, punched paper tape, and book then pass to the technician who does the shelf listing.
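the uses made of the magnetic tape "on order" file (the weekly printout, the cost tallies by vendor, and the three-month claim notices) amount to a few passes over one keyed file. the sketch below restates those passes in modern terms with invented field names and sample data; the original work was of course carried out as 1440 batch runs against tape, not in python.

# a modern restatement of the passes made over the "on order" file described above:
# weekly listing by order number, cost tally by vendor, selection of orders more
# than roughly three months old for "overdue, ship or cancel" notices, and deletion
# at receipt. field names, vendors, and dates are invented for illustration.
from datetime import date, timedelta

on_order = [
    {"order": "65-0040", "vendor": "vendor a", "cost": 7.25, "ordered": date(1965, 9, 3)},
    {"order": "65-0041", "vendor": "vendor b", "cost": 4.95, "ordered": date(1965, 11, 20)},
    {"order": "65-0042", "vendor": "vendor a", "cost": 6.50, "ordered": date(1966, 1, 5)},
]

def weekly_printout(orders):
    # the weekly "on order" list, arranged by order number
    return sorted(orders, key=lambda o: o["order"])

def cost_outstanding(orders, vendor=None):
    # total encumbrance, for one vendor or for all vendors
    return sum(o["cost"] for o in orders if vendor is None or o["vendor"] == vendor)

def overdue(orders, today, months=3):
    # orders still unfilled after roughly three months trigger claim notices
    cutoff = today - timedelta(days=months * 30)
    return [o for o in orders if o["ordered"] < cutoff]

def receive(orders, order_number):
    # receipt of the material deletes the item from the on-order file
    return [o for o in orders if o["order"] != order_number]

today = date(1966, 2, 1)
print(cost_outstanding(on_order, "vendor a"))          # outstanding to one vendor
print([o["order"] for o in overdue(on_order, today)])  # orders needing claim notices
on_order = receive(on_order, "65-0041")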
at this point the original output paper tape containing full bibliographic information is used as input for the flexowriter to create a standard 3x5 hard-copy shelf list card containing full bibliographic information, as well as inventory data such as vendor, date of receipt and cost. the last three items and the call number are added manually as "changes." simultaneously a new paper tape is produced as output which contains bibliographic information from the first tape and all revisions deemed necessary by the cataloger. the revised paper tape is used on the flexowriter to prepare 3x5 card sets for the public catalog. at the same time the slave keypunch prepares a set of tab cards containing full acquisitions information: cost, vendor, date of receipt; and abbreviated bibliographic information: short author, short title, full call number (including copy, year, part and volume), accession number and short edition statement (figure 5: shelf list creation routine). the tab cards are used first to delete the item from the magnetic tape "on order" file and second as input to create a magnetic tape shelf list of abbreviated information arranged by call number (figure 6: weekly shelf list update). the magnetic tape shelf list is used to create 1) eight copies of author, title, and classified catalogs which are updated semi-annually; 2) printouts of weekly acquisitions; 3) subject printouts on demand; and 4) tab cards which serve as circulation cards for books, films, drawings, tape and disc recordings, filmstrips and any other materials. the tab cards can be used with the ibm 357 circulation system or any similar system. discussion the efficiency of this system is most dramatically demonstrated by the amount of work accomplished per person per year. one technician can process over one thousand orders per month. over fifteen thousand fully cataloged volumes per year (approximately eleven thousand titles) are added to the collection by a technical processing department which consists solely of one full-time cataloger and two full-time technicians. one technician spends one half of her time typing orders and the other half preparing the shelf list. at present the limiting factor in processing material is not the personnel time available but rather time on the flexowriter-keypunch combination, which runs continuously for sixty hours per week. the cataloger feels that if some thirty hours more per week were available for running the machines, or if a second flexowriter were available to handle catalog card output, it would then be possible to order, receive, and fully process fifteen thousand titles per year (eighteen to twenty thousand volumes) with only the present technical processing staff. references 1. white, herbert s.: "to the barricades! the computers are coming!" special libraries 57 (november, 1966), 631. 2. general information manual: mechanized library procedures (white plains, n.y.: ibm, n.d.). 3. kraft, donald h.: library automation with data processing equipment (chicago: ibm, 1964). 4. bernstein, hans h.: "die verwendung von flexowritern in dokumentation und bibliothek", nachrichten für dokumentation 12 (june, 1961), 92. 5. durkin, robert e.; white, herbert s.: "simultaneous preparation of library catalogs for manual and machine applications", special libraries 52 (may, 1961), 231. 6.
references

1. white, herbert s.: "to the barricades! the computers are coming!" special libraries 57 (november, 1966), 631.
2. general information manual: mechanized library procedures (white plains, n.y.: ibm, n.d.).
3. kraft, donald h.: library automation with data processing equipment (chicago: ibm, 1964).
4. bernstein, hans h.: "die verwendung von flexowritern in dokumentation und bibliothek," nachrichten für dokumentation 12 (june, 1961), 92.
5. durkin, robert e.; white, herbert s.: "simultaneous preparation of library catalogs for manual and machine applications," special libraries 52 (may, 1961), 231.
6. kaiser, walter h.: "new face and place for the catalog card," library journal 88 (january, 1963), 186.

metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1

jennifer bowen

the extensible catalog (xc) project at the university of rochester will design and develop a set of open-source applications to provide libraries with an alternative way to reveal their collections to library users. the goals and functional requirements developed for xc reveal generalizable needs for metadata to support a next-generation discovery system. the strategies that the xc project team and xc partner institutions will use to address these issues can contribute to an agenda for attention and action within the library community to ensure that library metadata will continue to support online resource discovery in the future.

library metadata, whether in the form of marc 21 catalog records or in a variety of newer metadata schemas, has served its purpose for library users by facilitating their discovery of library resources within online library catalogs (opacs), digital libraries, and institutional repositories. however, libraries now face the challenge of making this wealth of legacy catalog data function adequately within next-generation web discovery environments. approaching this challenge will require:

■■ an understanding of the metadata itself and a commitment to deriving as much value from it as possible;
■■ a vision for the capabilities of future technology;
■■ an understanding of the needs of current (and, where possible, future) library users; and
■■ a commitment to ensuring that lessons learned in this area inform the development of both future library systems and future metadata standards.

the university of rochester's extensible catalog (xc) project will bring these various perspectives together to design and develop a set of open-source, collaboratively built next-generation discovery tools for libraries. the xc project team seeks to make the best possible use of legacy library metadata, while also informing the future development of discovery metadata for libraries. during phase 1 of the xc project (2006–2007), the xc project team created a plan for developing xc and defined the goals and initial functional requirements for the system. this paper outlines the major metadata-related issues that the xc project team and xc partner institutions will need to address to build the xc system during phase 2. it also describes how the xc team and xc partners will address these issues, and concludes by presenting a number of issues for the broader library community to consider.

while this paper focuses on the work of a single library project, the goals and functional requirements developed for the xc project reveal many generalizable needs for metadata to support a next-generation discovery system.1 the metadata-related goals of the xc project—to facilitate the use of marc metadata outside an integrated library system (ils), to combine marc metadata with metadata from other sources in a single discovery environment, and to facilitate new functionality (e.g., faceted browsing, user tagging)—are very similar to the goals of other library projects and commercial vendor discovery software. the issues described in this paper thus transcend their connection to the xc project and can be considered general needs for library discovery metadata in the near future.
in addition to informing the library community about the xc project and encouraging comment on that work, the author hopes that identifying and describing metadata issues that are important for xc—and that are likely to be important for other projects as well—will encourage the library community to set these issues as high priorities for attention and action within the next few years.

jennifer bowen (jbowen@library.rochester.edu) is director of metadata management at the university of rochester river campus libraries, new york, and is co-principal investigator for the extensible catalog project.

the extensible catalog project

the university of rochester's vision for the extensible catalog (xc) is to design and develop a set of open-source applications that provide libraries with an alternative way to reveal their collections to library users. xc will provide easy access to all resources (both digital and physical collections) and will enable library content to be revealed through other web applications that libraries may already be using. xc will be released as open-source software, so it will be available for free download, and libraries will be able to adopt, customize, and extend the software to meet their local needs.

the xc project is a collaborative effort between partner institutions that will serve a variety of roles in its development. phase 1 of the xc project, funded by the andrew w. mellon foundation and carried out by the university of rochester river campus libraries between april 2006 and june 2007, resulted in the creation of a project plan for the development of xc. during xc phase 1, the xc project team recruited a number of other institutions that will serve as xc partners and who have agreed to contribute resources toward building and implementing xc during phase 2. xc phase 2 (october 2007 through june 2009) is supported through additional funding from the andrew w. mellon foundation, the university of rochester, and xc partners. during phase 2, the xc project team, assisted by xc partners, will deploy the xc software and make it available as open-source software.2

through its various components, the xc system will provide a platform for local development and experimentation that will ultimately allow libraries to manage and reveal their metadata through a variety of web applications such as web sites, institutional repositories, and content management systems. a library may choose to create its own customized local interface to xc, or use xc's native user interface "as is." the native xc interface will include web 2.0 functionality, such as tagging and faceted browsing of search results that will be informed by frbr (functional requirements for bibliographic records)3 and frad (functional requirements for authority data)4 conceptual models. the xc software will handle multiple metadata schemas, such as marc 215 and dublin core,6 and will be able to serve as a repository for both existing and future library metadata. in addition, xc will facilitate the creation and incorporation of user-created metadata, enabling such metadata to be enhanced, augmented, and redistributed in a variety of ways. the xc project team has designed a modular architecture for xc, as shown in the simplified schematic in figure 1.
xc will bring together metadata from a variety of sources (integrated library systems, digital repositories, etc.), apply services to that metadata, and display it in a usable way in the web environments where users expect to find it.7 xc's architecture will allow institutions that implement the software to take advantage of innovative models for shared metadata services, which will be described in this paper.

figure 1. xc system diagram

xc phase 1 activities

during the now-completed xc phase 1, the xc project team focused on six areas of activity:

1. survey and understand existing research on user practices.
2. gauge library demand for the xc system.
3. anticipate and prepare for the metadata requirements of the new system.
4. learn about and build on related projects.
5. experiment with and incorporate useful, freely available code.
6. build a community of interest.

the xc project team carried out a variety of research activities to inform the overall goals and high-level functional requirements for xc. this research included a literature search and ongoing monitoring of discussion lists and blogs, to allow the team to keep up with the most current discussions taking place about next-generation library discovery systems and related technologies and projects.8 the xc team also consulted regularly with prospective partners and other knowledgeable colleagues who are engaged in defining the concept of a next-generation library discovery system. in order to gauge library demand for the xc system, the team also conducted a survey of interested institutions.9 this paper reports the results of the third area of activity during xc phase 1—anticipating and preparing for the metadata requirements of the new system—and looks ahead to plans to develop the xc software during phase 2.

xc goals and metadata functional requirements

the goals of the xc project have significant implications for the metadata functionality of the system, with each goal suggesting specific high-level functional requirements for how the system can achieve that particular goal. the five goals are:

■■ goal 1: provide access to all library resources, digital and non-digital.
■■ goal 2: bring metadata about library resources into a more open web environment.
■■ goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing.
■■ goal 4: conduct user research to inform system development.
■■ goal 5: publish the xc code as open-source software.

an overview of each xc goal and its related high-level metadata requirements appears below. each requirement is then discussed in more detail, with a plan for how the xc project team will address that requirement when developing the xc software.

goal 1: provide access to all library resources, digital and non-digital

working alongside a library's current integrated library system (ils) and its other web applications, xc will strive to bring together access to all library resources, thus eliminating the data silos that are now likely to exist between a library's opac and its various digital repositories and commercial databases. this goal suggests two fairly obvious metadata requirements (requirements 1 and 2).

requirement 1—the system must be capable of acquiring and managing metadata from multiple sources: ilss, digital repositories, licensed databases, etc.
a typical library currently has metadata pertaining to its collections residing in a variety of separate online systems: marc data in an ils, metadata in various schemas in digital collections and repositories, citation data in commercial databases, and other content on library web sites. a library that implements xc may want to populate the system with metadata from several online environments to simplify access to all types of resources. to achieve goal 1, xc must be capable of acquiring and managing metadata from all of these sources. each online environment and type of metadata present their own challenges. repurposing marc data repurposing marc metadata from an existing ils will be one of the biggest metadata tasks for a next-generation discovery system such as xc. in planning xc, we have assumed that most libraries will keep their current ils for the next few years or perhaps migrate to a newer commercial or open-source ils. in either case, most libraries will likely continue to rely on an ils’s staff functionality to handle materials acquisition, cataloging, circulation, etc. for the short term. relying upon an ils as a processing environment does not, however, mean that a library must use the opac portion of that ils as its means of resource discovery for users. xc will provide other options for resource retrieval by using web services to interact with the ils in the background.10 to repurpose ils metadata and enable it to be used in various web discovery environments, xc will harvest a copy of marc metadata records from an institution’s ils using the open archives initiative protocol for metadata harvesting (oai-pmh).11 using web services and standard protocols such as oaipmh offers not only a short-term solution for reusing metadata from an ils, but can also be used in both the shortand long-term to harvest metadata from any system that is oai-pmh harvestable, as will be discussed further below. while harvesting metadata from existing systems into xc creates duplication of metadata between an ils and xc, this actually has significant benefits. xc will handle metadata updates through automated harvesting services that minimize additional work for library staff, other than for setting up and managing the automated services themselves. the internal xc metadata cache can be easily regenerated from the original repositories and services when necessary, such as to enable future changes to the internal xc metadata schema. the xc system architecture also makes use of internal metadata duplication among xc’s components, which allows these components to communicate with each other using oaipmh. this built-in metadata redundancy will also enable xc to communicate with external services using this standard protocol. it is important to distinguish the deliberate metadata redundancies built into the xc architecture from the type of metadata redundancies that have been singled out for elimination in the library of congress working group on the future of bibliographic control draft report (recommendation 1.1)12 and previously in the university of california (uc) libraries bibliographic services task force’s final report.13 these other “negative” redundancies result from difficulties in sharing metadata among different environments and cause significant additional staff expense for libraries to enrich or recreate metadata locally. xc’s architecture actually solves many of these problems by facilitating the sharing of enriched metadata among xc users. 
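the text above notes that xc will harvest marc records from an ils using oai-pmh and re-harvest them whenever the cache needs to be rebuilt. the sketch below shows a minimal, generic oai-pmh listrecords loop using only the standard protocol parameters (verb, metadataPrefix, resumptionToken); the endpoint url and the "marc21" prefix in the usage comment are placeholders, and this is not the xc harvester itself.

```python
# a minimal oai-pmh listrecords harvester using only the standard protocol
# parameters. a real ils or repository advertises the metadata prefixes it
# supports via the listmetadataformats verb.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, metadata_prefix="oai_dc"):
    """yield each <record> element from a full listrecords harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.fromstring(response.read())
        for record in tree.iter(OAI_NS + "record"):
            yield record
        token = tree.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break   # no resumption token means the harvest is complete
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# hypothetical usage; replace the url with a repository's actual oai-pmh endpoint.
# for record in harvest("https://example.org/oai", metadata_prefix="marc21"):
#     header = record.find(OAI_NS + "header")
#     print(header.findtext(OAI_NS + "identifier"))
```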
xc can also adapt as the library community begins to address the types of costly metadata redundancies mentioned in the above reports, such as between the oclc worldcat database14 and copies of that marc data contained within a library's ils, because xc will be capable of harvesting metadata from any source that uses a standard api.15

metadata from digital repositories and other free sources

xc will harvest metadata from various digital collections and repositories, using oai-pmh, and will maintain a copy of the harvested metadata within the xc metadata cache, as shown in figure 1. the metadata services hub architecture provides flexibility and possible economy for xc users by offering the option for multiple xc institutions to share a single metadata hub, thus allowing participating institutions to take full advantage of the hub's capabilities to aggregate and augment metadata from multiple sources. while the procedure for harvesting metadata from an external repository is not technologically difficult in itself, managing the flow of metadata coming from multiple sources and aggregating that metadata for use in xc will require the development of sophisticated software. to address this, the xc project team is partnering with established experts in bibliographic metadata aggregation to develop the metadata services portion of the xc architecture. the team from cornell university that has developed the software behind the national science digital library's metadata management system (nsdl/mms)16 is advising the xc team in the development of the xc metadata services hub, which will be built on top of the basic nsdl/mms software.

the xc metadata services hub will coordinate metadata services into a reusable task grouping that can be started on demand or scheduled to run regularly. this xc component will harvest xml metadata and combine metadata records that refer to equivalent resources (based on uniform resource identifier [uri], if available, or other unique identifier) into what the cornell team describes as a "mudball." each mudball will contain the original metadata, the sources for the metadata, and the references to any services used to combine metadata into the mudball. the mudball may also contain metadata that is the result of further automated processing or services to improve quality or to explicitly identify relationships between resources. hub services could potentially record the source of each individual metadata statement within each mudball, which would then allow a metadata record to be redelivered in its original or in an enriched form when requested.17 by allowing for the capture of provenance data for each data element, the hub could potentially provide much more granular information about the origin of metadata—and much more flexibility for recombining metadata—than is possible in most marc-based environments.

after using the redeployed nsdl/mms software as the foundation for the xc metadata hub, the xc project team will develop additional hub services to support xc's functional requirements. xc-specific hub services will accommodate incoming marc data (including marc holdings data for non-digital resources); basic authority control; mappings from marc 21, marcxml,18 and dublin core to an internal xc schema defined within the xc application profile (described below); and other services to facilitate the functionality of the xc user environments (see discussion of requirement 5, below).
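the hub's "mudball" aggregation is described only in outline here, so the following sketch should be read as an assumption-laden illustration of the general idea rather than the nsdl/mms or xc design: records from different sources that share an identifier are folded together, and the source of every individual statement is retained so a record can later be redelivered in its original or in an enriched form.

```python
# an illustrative aggregation step: records that share an identifier (a uri or
# other unique id) are folded into one aggregate, keeping the source of every
# individual statement.
from collections import defaultdict

incoming = [
    {"source": "ils",        "id": "urn:isbn:0123456789", "title": "metadata basics", "subject": "cataloging"},
    {"source": "repository", "id": "urn:isbn:0123456789", "creator": "smith, j.",     "subject": "metadata"},
]

def aggregate(records):
    """group records by identifier; keep (value, source) pairs per element."""
    mudballs = defaultdict(lambda: defaultdict(list))
    for rec in records:
        ball = mudballs[rec["id"]]
        for element, value in rec.items():
            if element in ("id", "source"):
                continue
            ball[element].append((value, rec["source"]))   # statement-level provenance
    return mudballs

def original_view(mudball, source):
    """reconstruct just the statements contributed by one source."""
    return {e: [v for v, s in vals if s == source]
            for e, vals in mudball.items() if any(s == source for _, s in vals)}

balls = aggregate(incoming)
print(dict(balls["urn:isbn:0123456789"]))                  # merged, with provenance
print(original_view(balls["urn:isbn:0123456789"], "ils"))  # redelivered in original form
```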
finally, the xc hub services will make the metadata available for harvesting from the hub by the xc client integration applications.

metadata for licensed content

for a next-generation discovery system such as xc to provide access to all library resources, it will need to provide access to licensed content, such as citation data and full-text databases. metasearch technology provides one option for incorporating access to licensed content into xc. unfortunately, various difficulties with metasearch technology19 and usability issues with some metasearch products20 make metasearch technology a less-than-ideal solution. an alternative approach would bring metadata from licensed content directly into a system such as xc. the metadata services hub architecture for xc is capable of handling the ingest and processing of metadata supplied by commercial content providers by adding additional services to handle the necessary schema transformations and to control access to the licensed content. the more difficult issue with licensed content may be to obtain the cooperation of commercial vendors to ingest their metadata into xc. pursuing individual agreements with vendors to negotiate rights to ingest their metadata is beyond the original scope of xc's phase 2 project. however, the xc team will continue to monitor ongoing developments in this area, especially the work of the ethicshare project, which uses a system architecture very similar to that of xc.21 it remains our goal to build a system that will facilitate the inclusion of licensed content within xc in situations where commercial providers have made it available to xc users.

requirement 1 summary

when considering needed functionality for a next-generation discovery system, the ability to ingest and manage metadata from a variety of sources is of paramount importance. unlike a current ils, where we often think of metadata as mostly static unless it is supplemented by new, updated, and deleted records, we should instead envision the metadata in a next-generation system as being in constant motion, moving from one environment to another and being harvested and transformed on a scheduled basis. the metadata services hub architecture of the xc system will accommodate and facilitate such constant movement of metadata.

requirement 2—the system must handle multiple metadata schemas.

an extension of requirement 1 will be the necessity for a next-generation system such as xc to handle metadata from multiple schemas, as the system harvests those schemas from various sources.

library metadata priorities

as a part of the xc survey of libraries described earlier in this paper, the xc team queried respondents about what metadata schemas they currently use or plan to use in the near future. many responding libraries indicated that they expect to increase their use of non–marc 21 metadata within the next three years, although no library indicated the intention to completely move away from marc 21 within that time period. nevertheless, the idea of a "marc exit strategy" has been discussed in various circles.22 the architecture of xc will enable libraries to move beyond the constraints of a marc-based system without abandoning their ils, and will provide an opportunity for libraries to stage their "marc exit strategy" in a way that suits their purposes.
libraries also indicated that they plan to move away from homegrown schemas toward accepted standards such as mets,23 mods,24 mads,25 premis,26 ead,27 vra core,28 and dublin core.29 several responding libraries plan to move toward a wider variety of metadata schemas in the near future, and will focus on using xmlbased schemas to facilitate interoperability and metadata harvesting. to address the needs of these libraries in the future, xc’s metadata services will contain a variety of transformation services to handle a variety of schemas. taking into account the metadata schemas mentioned the most often among survey respondents, the software developed during phase 2 of the xc project will support harvested metadata in marc 21, marcxml, and dublin core (including qualified dublin core).30 metadata crosswalks and mapping one respondent to the xc survey offered the prediction that “reuse of existing metadata and transformation of metadata from one format to another will become commonplace and routine.”31 xc’s internal metadata transformations must be designed with this in mind, to facilitate making these activities “commonplace and routine.” fortunately, many maps and crosswalks already exist that potentially can be incorporated into a next-generation system such as xc.32 the metadata services hub architecture for xc can function as a standard framework for applying a variety of existing crosswalks within a single, shared environment. following “best practices” for crosswalking metadata, such as those developed by the digital library federation (dlf),33 will be extremely important in this environment. as the dlf guidelines describe, metadata schema transformation is not as straightforward as it might first appear to be. while the dlf guidelines advise always crosswalking from a more robust schema to a simpler one, sometimes in a series of steps, such mapping will often result in “dumbing down” of metadata, or loss of granularity. this is a particularly important concern for the xc project because a large percentage of the metadata handled by xc will be rich legacy marc 21 metadata, and we hope to maintain as much of that richness as possible within the xc system. in addition to simply mapping one data element in a schema to its closest equivalent in another, it is essential to ensure that the underlying metadata models of the two schemas being crosswalked are compatible. the authors of the framework for a bibliographic future draft document define multiple layers of such models that need to be considered,34 and offer a general highlevel comparison between the frbr data model35 and the dcmi (dublin core metadata initiative) abstract model (dcam).36 more detailed comparisons of models are also taking place as a part of the development of the new metadata content standard, resource description and access (rda).37 the developers of rda have issued documents offering a detailed mapping of rda elements to rda’s underlying model (frbr)38 and analyzing the relationship between rda elements, the dcmi abstract model, and the metadata framework.39 as a result of a meeting held april 30–may 1, 2007, a joint dcmi/rda task group is now undertaking the collaborative work necessary to carry out the following tasks: n develop an rda element vocabulary. n develop an rda/dublin core application profile based on frbr and frad. 
n disclose rda value vocabularies using rdf/ rdfs/skos.40 these efforts hold much potential to provide a more rigorous way to communicate about metadata across multiple communities and to increase the compatibility of different metadata schemas and their underlying models. such compatibility will be essential to enabling the functionality of future discovery systems such as xc. an xc metadata application profile the xc project team will define a metadata application profile for xc as a way to document decisions made about data elements, content standards, and crosswalking used within the system. the use of an application profile can facilitate metadata migration, harvesting, and other automated processes, and presents an approach to metadata that is more flexible and responsive to local needs than simply adopting someone else’s metadata guidelines.41 application profiles facilitate the use of multiple schemas because elements can be selected for inclusion from more than one existing schema, or additional elements can be created and defined locally.42 because the xc system will incorporate harvested metadata from a variety of sources, the use of an application profile will be essential to support xc’s complex system requirements. the dcmi community has published guidelines for creating a dublin core application profile (dcap), which is defined more specifically as: [a] form for documenting which terms a given application uses in its metadata, with what extensions or adaptations, and specifying how those terms relate both to formal standards such as dublin core as well as to less formally defined element sets and vocabularies.43 metadata to support next-generation library resource discovery | bowen 11 the announcement of plans to develop an rda/ dublin core application profile illustrates the important role that application profiles are beginning to take to facilitate the interoperability of metadata schemas. the planned rda/dc application profile will “translate” rda into a standard structure that will allow it to be related more easily to other metadata element sets. unfortunately, the rda/dc application profile will likely not be completed in time for it to be incorporated into the first release of the xc software in mid-2009. nevertheless, we intend to use the existing definitions of rda elements to inform the development of the xc application profile.44 this will allow us to anticipate any future incompatibilities between the rda/dc and the xc application profiles, and ensure that xc will be wellpositioned to take advantage of rda-based metadata when rda is implemented. this process may have the reciprocal benefit of also informing the developers of rda of any rda elements that may be difficult to implement within a next-generation system such as xc. the potential value of rda to the xc project—in terms of providing a consistent approach to bibliographic and authority metadata and facilitating frbr-related user functionality—is very significant. it is hoped that at some point xc can become an early adopter of rda and provide a mechanism through which libraries can move their legacy marc 21 metadata into a system that is compatible with an emerging international metadata standard. 
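the xc application profile and internal schema are still to be defined, so the sketch below is only a toy illustration of the two ideas discussed above: a small declarative profile that documents which marc 21 fields feed which internal elements, and the "dumbing down" that occurs when several distinct marc subject fields collapse into a single simpler element. the element names and the profile structure are assumptions, not the xc profile.

```python
# a toy application profile: each entry names an internal element, the marc 21
# field(s) it is drawn from, and whether it is required.
PROFILE = {
    "title":   {"marc": ["245"], "required": True},
    "creator": {"marc": ["100", "110", "111"], "required": False},
    "subject": {"marc": ["600", "610", "650", "651"], "required": False},
}

def crosswalk(marc_record, profile=PROFILE):
    """map a simplified marc record (tag -> list of field values) to profile elements."""
    out = {}
    for element, rule in profile.items():
        values = []
        for tag in rule["marc"]:
            values.extend(marc_record.get(tag, []))
        if values:
            out[element] = values
        elif rule["required"]:
            out[element] = ["[title unavailable]"]   # a required element must not be dropped
    return out

# note the loss of granularity: 600/610/650/651 all collapse into one "subject"
# element, so the distinction between personal, corporate, topical, and
# geographic headings is "dumbed down" in the simpler schema.
marc = {"245": ["metadata and the web"], "100": ["coyle, karen"],
        "650": ["metadata"], "651": ["rochester (n.y.)"]}
print(crosswalk(marc))
```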
n goal 2: bring metadata about library resources into a more open web environment xc will reveal library metadata not only through its own separate interface (either the out-of-the-box xc interface or an interface designed by the local library), but will also allow library metadata to be revealed through other web applications. the latter approach will bring library resources directly to web locations that library users are already visiting, rather than attempting to entice users to visit an additional library-specific web location. making library metadata work effectively in the broader web environment (outside the well-defined boundaries of an ils or repository) will require the following requirements 3 and 4: requirement 3—metadata must conform to the standards of the new web environments as well as to that of the system from which it originated. achieving requirement 3 will require library metadata in future systems to perform a dual function: to conform to both existing library standards as well as to web standards and conventions. one way to achieve this is to ensure that the two types of standards themselves are compatible. coyle and hillmann have argued persuasively for changes in the direction of rda development to allow metadata created using rda to function in the broader web environment. these changes include the need to follow a clearly refined, high-level metadata model, to create data elements that can be manipulated by machines, and to move toward the use of uris instead of textual identifiers.45 after the announcement of the outcomes of the rda/dc data modeling meeting, the two authors are considerably more optimistic about rda functioning as a standard within the broader web environment.46 this discourse concerning rda shows but a piece of the process through which long-established library metadata standards need to be reexamined to make library metadata understandable to both humans and machines on the web. moving away from aacr2 toward rda, and ultimately toward incorporating standard web conventions into library metadata, can be a difficult process for those involved in creating and maintaining library standards. nevertheless, transforming library metadata standards in this way is essential to fulfill the requirements necessary for next-generation library discovery systems. requirement 4—metadata must function effectively within the new web environments as well as within the system from which it originated. not only must metadata for a next-generation system follow the conventions and standards used in the broader web, but the data also needs to be able to function effectively in a broader web environment. this is a slightly different proposition from requirement 3, and will necessitate testing the metadata standards themselves to ensure that they enable library metadata to function effectively. the xc project will provide direct experience with using library metadata in two types of web environments: content management systems and learning management systems. 
library metadata in a content management system

as shown in the xc architecture diagram in figure 1, the xc project team will build one of the primary user environments for xc on top of the open-source content management system, drupal.47 the xc drupal module will allow us to respond to many of the needs expressed by libraries in their responses to the xc survey48 by supplying:

■■ a web application server with a back-end database;
■■ a user interface with web 2.0 features;
■■ library-controlled web pages that will treat library metadata as a native data type;
■■ a metadata interface for enhancing or correcting metadata in the system; and
■■ an administrative interface.

the xc team will bring library metadata into the drupal content management system (cms) as a native content type within that environment, creating a drupal "node" for each metadata record. this will allow xc to take advantage of many native features of the drupal cms, such as a taxonomy system.49 building xc interfaces on top of the drupal cms will also give us an opportunity to collaborate with partner libraries that are already active participants in the drupal user community. xc's architecture will allow the possibility of developing additional user environments on top of other content management systems. bringing library metadata into these new environments will provide many new opportunities for libraries to manipulate their metadata and present it to users without being constrained by the limitations of the current generation of library systems. such opportunities will then inform the future requirements for library metadata in such environments.

library metadata in a learning management system

figure 1 illustrates two examples of xc user environments through learning management systems: xc interfaces to both the blackboard learning system50 and sakai.51 much exciting work is being done at other institutions to bring library content into these web applications.52 xc will build on projects such as these to reveal library metadata for non-licensed library resources from an ils through learning management systems. specifically, we plan to develop the capability for libraries to make the display of library metadata context-sensitive within the learning management system. for example, searching or browsing on a page for a particular academic course could be configured to reflect the subject area of the course (e.g., chemistry) and automatically present library resources related to that subject.53 this capability will build upon the experiences gained by the university of rochester through its work to develop its "course resources" system.54 such xc functionality will be integrated directly into the learning management system, rather than simply providing a link out to a separate library system. again, we hope that our efforts to bring library metadata into these new environments will encourage libraries to engage in further work to integrate library resources into broader web environments and inform future requirements for library metadata in these environments.
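the course-page scenario above (a chemistry course automatically presenting chemistry resources) reduces to a simple filtering step once records carry subject metadata. the sketch below shows only that filtering idea; the field names are assumptions, and it says nothing about how xc will actually hook into blackboard or sakai.

```python
# a course page in a learning management system knows its subject area, and the
# discovery layer filters harvested records to match it.
records = [
    {"title": "organic chemistry reactions", "subjects": ["chemistry"]},
    {"title": "intro to roman history",      "subjects": ["history"]},
    {"title": "spectroscopy handbook",       "subjects": ["chemistry", "physics"]},
]

def resources_for_course(course_subject, records):
    """return records whose subject list mentions the course's subject area."""
    wanted = course_subject.lower()
    return [r for r in records if any(wanted == s.lower() for s in r["subjects"])]

# a hypothetical chemistry course page would be handed only the chemistry items.
for r in resources_for_course("Chemistry", records):
    print(r["title"])
```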
goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing

new functionality for users will require that metadata fulfill more sophisticated functions in a next-generation system than it may have done in an ils or repository, in order to provide more intuitive searching and navigation. the system will also need to capture and incorporate metadata generated through tagging, user-contributed reviews, etc. such new functionality creates the need for requirements 5 and 6.

requirement 5—metadata must support functionality to facilitate intuitive searching and navigation, such as faceted browsing and frbr-informed results groupings.

enabling faceting and clustering

much research has already been done regarding the design of faceted search interfaces in general.55 when considered along with user research conducted at other institutions56 and to be conducted during the development of xc, this data provides a strong foundation for the design of a faceted browse environment. the xc project team has already gained firsthand experience with developing faceted browsing through the development of the "c4" prototype interface during phase 1 of the xc project.57 to enable faceting within xc, we will also pay particular attention to what others have discovered through designing faceted interfaces on top of legacy marc 21 metadata. specific lessons learned from those involved with north carolina state university's endeca-based catalog,58 vanderbilt university's primo implementation,59 and plymouth state university's scriblio system60 provide valuable guidance for the xc project team as we design facets for the xc system. ideally, a mechanism should be developed to enable these discoveries to feed back into the development of metadata and encoding standards, so that changes to existing standards can be considered to facilitate faceting in the future.

several new system implementations have used library of congress subject headings (lcsh) and lc subdivisions from marc 21 records as the basis for deriving facets. the xc "c4" prototype interface provides facets for topic, genre, and region that are based simply upon one or more marc 21 6xx tags.61 north carolina state university's endeca-based system has enabled facets for topic, genre, region, and era using lcsh subdivisions as well, but this has necessitated a "massive cleanup" of subdivisions, as described by charley pennell.62 oclc's fast (faceted application of subject terminology) project may provide another option for enabling such facets.63 a library could populate its marc 21 data with fast headings, based upon the existing lcsh in the records, and then use the fast headings as the basis for generating facets. it remains to be seen whether fast will offer significant benefit over lcsh itself when it comes to faceting, however, since fast headings are generated directly from lcsh.

while marc 21 metadata has some known difficulties where faceting and clustering are concerned (such as those involving lcsh), the xc system will encounter additional difficulties when implementing these technologies with less robust metadata schemas such as simple dublin core, and especially across metadata from a variety of schemas. the development of web services to augment batches of metadata records in an automated manner holds some promise for improving the creation of facets from other metadata schemas. within the xc system, such services could be added to the metadata services hub and run against ingested metadata. while designing extensive services of this type is beyond the scope of the next phase of xc software development, we will encourage others to develop such services for xc.
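as a concrete illustration of deriving facets from marc 21 6xx fields, the sketch below tallies topic, genre, and region facet values from 650, 655, and 651 headings in a small batch of simplified records. the tag-to-facet assignment mirrors the general approach described above but is not the c4 prototype's code, and real lcsh strings would still need the kind of subdivision cleanup mentioned in the text.

```python
# counting facet values from simplified marc records: 650 (topical term),
# 655 (genre/form), and 651 (geographic name) are standard marc 21 tags, but
# the record structure and the tag-to-facet assignment here are illustrative.
from collections import Counter, defaultdict

FACET_TAGS = {"650": "topic", "655": "genre", "651": "region"}

def build_facets(records):
    """tally facet values across a batch of records (tag -> list of headings)."""
    facets = defaultdict(Counter)
    for record in records:
        for tag, facet in FACET_TAGS.items():
            for heading in record.get(tag, []):
                facets[facet][heading.strip().lower()] += 1
    return facets

batch = [
    {"650": ["Metadata", "Cataloging"], "651": ["United States"]},
    {"650": ["Metadata"], "655": ["Handbooks"]},
]
for facet, counts in build_facets(batch).items():
    print(facet, counts.most_common())
```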
another (but much less desirable) approach to augmenting metadata is for a metadata specialist to manually edit one record or group of records. the xc cataloging interface, built within the drupal cms, will allow recordby-record editing of metadata when necessary. while we see this editing interface as essential functionality for xc, we anticipate that libraries will want to use this feature sparingly. in many cases it will be preferable to correct or augment metadata within its original repository (e.g., the institution’s ils) and then re-harvest the corrected metadata, rather than correcting it manually within xc itself. because of the expense of manual metadata augmentation and correction, libraries will be well-advised to rely upon insights gained through user research to assess the value of this type of work. for example, a library might decide to edit individual metadata records only when the correction or augmentation will support specific system functionality that is of high priority for the institution’s users. implementing frbr results groupings to incorporate logical groupings of search results based upon the frbr64 and frad65 data models over sets of diverse metadata within xc, we will encounter similar difficulties that we face with faceting and clustering. various analyses of the marc 21 formats have dealt extensively with the relationship between frbr and marc 21,66 and others have written specifically about methodology for frbrizing a marc-based catalog.67 in addition, various tools and web services are available that can potentially facilitate this process.68 even with this extensive body of work to draw upon, however, the success of our implementation of frbr-based functionality will depend upon both the quality and completeness of the system’s metadata. metadata in xc that originated as dublin core records may need significant augmentation to be incorporated effectively into frbrized results displays. to maximize the ability of the system to support frbr/frad results groupings, we may need to supplement automated grouping of resources with a combination of additional services for the metadata services hub, and with cataloger-generated metadata correction and augmentation, as described above.69 the xc team will use the results of user research carried out during the next phase of the xc project to inform our decision-making regarding what frbr-informed results grouping users find helpful, and then assess what specific metadata augmentation services are needed for xc. providing frbr-informed groupings of related records in search results will be easier when the underlying metadata incorporates principles of authority control. of course, the vast majority of the non-marc metadata that will be ingested into xc will not be under authority control. again, this situation suggests the need for additional services or functionality to improve existing metadata within the xc metadata hub, the xc cataloging interface, or both. as an experiment in developing services to facilitate authority control, the xc project team carried out a pilot project in partnership with a group of software engineering students from the rochester institute of technology (rit) during phase 1 of xc. the rit students designed a basic name access control tool that can be used across disparate metadata schemas in an environment such as xc. 
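the next paragraph describes what the rit name access control tool can ingest and automate; purely as an illustration of the kind of normalization-based matching such a tool might start from, the sketch below links exact normalized matches to an authorized heading and queues everything else for cataloger review. the names, normalization rules, and data structures are assumptions, not the rit tool's design.

```python
# a naive name-matching pass: normalize name strings from different schemas,
# link exact matches to an authorized form, and queue the rest for review.
import re

authority = {"smith, john, 1950-": "Smith, John, 1950-"}   # normalized key -> authorized heading

def normalize(name):
    """lowercase, strip punctuation and extra spaces so 'Smith, John.' == 'smith, john'."""
    name = re.sub(r"[^\w\s,-]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()

def match_names(candidates, authority):
    """return (linked, problems): exact normalized matches vs. names needing review."""
    linked, problems = {}, []
    for raw in candidates:
        key = normalize(raw)
        if key in authority:
            linked[raw] = authority[key]
        else:
            problems.append(raw)   # handed to the cataloger as a problem report
    return linked, problems

# names as they might arrive from marc bibliographic fields and dublin core creators
incoming = ["Smith, John, 1950-", "smith, john 1950-", "Jones, A."]
linked, problems = match_names(incoming, authority)
print(linked)     # exact normalized match linked to the authorized heading
print(problems)   # near-matches and unknowns still need a cataloger's review
```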
the tool can ingest marc 21 authority and bibliographic records as well as dublin core records, provide automated matching, and facilitate a cataloger's handling of problem reports.70 the xc project team will implement the automated portion of the tool as a web service within the xc hub, and the "cataloger facilitation" portion of the tool within the xc cataloging user interface. institutions that use xc can then incorporate additional tools to facilitate authority control into xc as they are needed and developed. in addition to providing a test case for developing xc metadata services, the rit pilot project proved valuable by providing an opportunity for student software developers and catalogers to discuss the functional requirements of a cataloging tool. not only did the experience enable the developers to understand the needs of the system's intended users, but it also presented an opportunity for the engineering students to demonstrate technological possibilities that the catalogers—who work almost exclusively with legacy ils technology—may not have envisioned before participating in the project.

requirement 6—the system must manage user-generated metadata resulting from user tagging, submission of reviews, etc.

because users now expect web-based tools to offer web 2.0 functionalities, the xc project has as one of its basic goals to incorporate these functionalities into xc's user environments. the results of the xc survey rank tools to support the finding, gathering, use, and reuse of scholarly content (e.g., rss feeds, blogs, tagging, user reviews) eighth out of a list of twenty new desirable opac features.71 we expect to learn much more about the usefulness of web 2.0 technology within a next-generation system through the user research that we will carry out during phase 2 of the xc project. the xc system will capture metadata generated by users from any one of the system's user environments (e.g., drupal-based interface, learning management system integration) and harvest it back into the system's metadata services hub for processing.72 the xc application profile will incorporate user-generated metadata, mapped into its own carefully defined metadata elements. this will allow us to capture and manage this metadata as discrete content, without inadvertently mixing it with other metadata created by library staff or ingested from other sources.

goal 4: conduct user research to inform system development

user research will be essential to informing the design and functionality of the xc software. to align xc's functional requirements as closely as possible with user needs, the xc project team will practice a user-centered design methodology that takes an iterative approach to defining the system's functional requirements. since we will engage concurrently in the processes of user research and software design, we will not fully determine the system requirements for xc until a significant amount of user research has been done. a complete picture of the demands upon metadata within xc will thus emerge as we gain information from our user research.

goal 5: publish the xc code as open-source software

central to the vision of the xc project is sharing the xc software freely throughout the library community and beyond. our hope is that others will use all or part of the xc software, modify it, and improve it to meet their own needs. new requirements for the metadata within xc are likely to arise as this process takes place.
other future changes to the xc software will also be needed to ensure the software's continued compatibility with various metadata standards and schemas. these changes will all affect the system requirements for xc over time.

addressing goals 4 and 5

while goals 1 through 3 for the xc project result in specific high-level functional requirements for the system's discovery metadata that can be addressed and discussed as xc is being developed, goals 4 and 5 present general challenges that must be addressed in the future. goal 4 is likely to fuel the need to update the xc software over time as the needs of users change. goal 5 provides a challenge to managing that updating process in a collaborative environment. these two goals suggest an additional general requirement for the system's metadata:

requirement 7—the system's metadata must be extensible to facilitate future enhancements and updates.

enabling future user needs

developing xc using a user-centered design process in which user research and software design occur simultaneously will enable us to design and build a system that is as responsive as possible to the needs of users that are seeking library resources. however, user needs will change during the life of the xc software. these needs must be assessed and addressed, and then weighed against the desires of individual institutions that use xc and who request specific system enhancements. to carry forward the xc project's commitment to serving users, we will develop a governance model for the xc community that brings the needs of future users into the decision-making process by providing a method for continuing to determine and capture user needs. in addition, we will consciously cultivate a commitment to user research among members of the xc community. because the xc software will be released as open source, we can also encourage xc partners to develop whatever additional functionality they need for their own institutions and make these enhancements available to the entire community of xc users. this approach is very different from the enhancement process in place for most commercial systems, and xc partner institutions may need to adjust to this approach.

enabling future metadata standards

as current metadata standards are revised and new standards and schemas are created, xc must be able to accommodate these changes. new crosswalks will allow new metadata schemas to be mapped to the xc internal schema in the future. the xc application profile can be updated with the addition of new data elements as needed. the drupal-based xc user environment will also allow institutions that use xc to create new internal data types to incorporate additional types of metadata. as the development of the semantic web moves forward73 and enables smart linking between existing authority files and vocabularies,74 xc's architecture can make use of the resulting web services, either by incorporating them through the xc metadata services hub or through the native xc user interface as part of a user search query.

further considerations

the above discussion of the goals and requirements for xc has revealed a number of issues related to the development of next-generation discovery systems that are unfortunately beyond the scope of the next phase of the xc project. we therefore offer them as a possible agenda for future work by the broader library community:

1.
explore the wider usefulness of web-based metadata services and the need for an automated metadata services coordinator to control these functions. libraries are already comfortable with basic “services” that are performed on metadata by an outside agency: for example, a library may send copies of its marc records to a vendor for authority processing or enrichment with tables of contents or other data elements. the library community should encourage vendors and others to develop these and other metadata enrichment options as automated web services. 2. study the advantages of using statement-level metadata provenance, as used in the nsdl metadata management system and considered for use within the xc metadata services hub, and explore whether there are ways that marc 21 could move toward allowing more granularity in recording and sharing metadata provenance. 3. to facilitate access to licensed library resources, encourage the development of more robust metasearch technology and standards so that technological limitations do not hinder system performance and search result usability. if this is not successful, libraries and content providers must work together to enable metadata for licensed resources to be revealed within open discovery environments such as xc and ethicshare.75 this second scenario will enable libraries to directly address usability issues with the display of licensed content, which may make it a more desirable longer-term solution than attempting to improve metasearch technology. 4. the administrative bodies of the two groups represented on the dcmi/rda task group (i.e., the dublin core metadata initiative and the rda committee of principals) have a responsibility to take the lead in funding this group’s work to develop and maintain the rda/dc application profile and its related registries and vocabularies. beyond this, however, the broader library community must recognize that this work is essential to ensure that future library metadata standards will function in the broader web environment, and offer additional administrative and financial support for it in the coming years. 5. to ensure that library standards work effectively outside of traditional library systems, catalogers and metadata experts must develop ongoing, collaborative working relationships with system developers. such collaboration will necessitate educating each group of experts about the domain of the other. 6. libraries should experiment with using metadata in new environments and use the lessons learned from this activity to inform the metadata standards development process. while current library automation environments by and large do not provide opportunities for this, the extensible catalog will provide a flexible platform where experimentation can take place.76 xc will make experimentation as risk-free as possible by ensuring that the original metadata brought into the system can be reharvested in its original form, thus minimizing concerns about possible data corruption. xc will also minimize the investment needed for a library to engage in this experimentation because it will be released as open-source software. 7. to facilitate new functionality for next-generation library discovery environments, libraries must share their new expertise in this area with each other. 
for example, library professional organizations (such as ala and its associations) should form discussion groups and committees devoted to sharing lessons learned from the implementation of faceted interfaces and web 2.0 technologies, such as tagging and folksonomies. such groups should develop a "best practices" document outlining a preferred way to define facets from marc 21 data that can be used by any library implementing faceting on top of its legacy metadata.

8. the library community should discuss and encourage mechanisms for pooling and sharing user-generated metadata among libraries and other interested institutions.

conclusions

to present library resources via the web in a manner that users now expect, library metadata must function in ways that have never been required of it before. making library metadata function effectively within the broader web environment will require that libraries take advantage of the combined knowledge of experts in the areas of cataloging/metadata and system development who share a common vision for serving library users. the challenges to making legacy library metadata and newer metadata for digital resources interact effectively in the broader web environment are significant, and work must begin now to ensure that we can preserve the investment that libraries have made in their legacy metadata. while the recommendations within this report are the result of planning to develop one particular library discovery system—the extensible catalog (xc)—these lessons can inform the development of other systems as well. the actual development of xc will continue to add to our knowledge in this area. while it may be tempting to wait and see what commercial vendors offer as their next generation of commercial discovery products, such a passive approach may jeopardize the future viability of library metadata. projects such as the extensible catalog can serve as a vehicle for moving forward by providing an opportunity for libraries to experiment and to then take informed action to move the library community toward a next generation of resource discovery systems.

acknowledgments

phase 1 of the extensible catalog project was funded through a grant from the andrew w. mellon foundation. this paper is in partial fulfillment of that grant, originally funded on april 1, 2006, and concluding on june 30, 2007. the author acknowledges the contributions of the entire university of rochester extensible catalog project team to the content of this paper, and especially thanks david lindahl, barbara tillett, and konstantin gurevich for reading and offering suggestions on drafts of this paper.

references and notes

1. despite the use of the word "catalog" within the name of the extensible catalog project, this paper will avoid using the word "catalog" in the phrase "next-generation catalog" because this may misleadingly convey the idea of a catalog as solely a single, separate web destination for library users. instead, terms such as "discovery environment" and "discovery system" will be preferred.
2. the xc blog provides a list of xc partners, describes their roles in xc phase 2, and provides links to reports that represent the outcomes of xc phase 1. "xc (extensible catalog): an open-source online system that will unify access to traditional and digital library resources," www.extensiblecatalog.info (accessed october 4, 2007).
3.
ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998), www.ifla.org/vii/s13/frbr/ frbr.pdf (accessed july 23, 2007). 4. ifla working group on functional requirements and numbering of authority records (franar), “functional requirements for authority data: a conceptual model,” april 1, 2007, www.ifla.org/vii/d4/franar-conceptualmodel2ndreview.pdf (accessed july 23, 2007). 5. library of congress, network development and marc standards office, “marc 21 formats,” april 18, 2005, www.loc .gov/marc/marcdocz.html (accessed september 3, 2007). 6. “dublin core metadata element set, version 1.1,” december 20, 2004, http://dublincore.org/documents/dces (accessed september 3, 2007). 7. university of rochester river campus libraries, “extensible catalog phase 2,” (grant proposal submitted to the andrew w. mellon foundation, july 11, 2007). 8. “literature list,” extensible catalog blog, www. extensiblecatalog.info/?page_id=17 (accessed august 27, 2007). 9. a summary of the results of this survey is available on the xc blog. nancy fried foster et al., “extensible catalog survey report,” july 20, 2007, www.extensiblecatalog.info/wp-content/ uploads/2007/07/xc%20survey%20report.pdf (accessed july 23, 2007). 10. lorcan dempsey has written of the need for a service layer for libraries that would facilitate the “de-coupling” of resource retrieval from back-end processing. lorcan dempsey, “a palindromic ils service layer,” lorcan dempsey’s weblog, january 20, 2006, http://orweblog.oclc.org/archives/000927. html (accessed august 24, 2007). 11. “open archives initiative protocol for metadata harvesting v. 2.0,” www.openarchives.org/oai/openarchivesprotocol. html (accessed august 27, 2007). 12. library of congress, working group on the future of bibliographic control, “report on the future of bibliographic control: draft for public comment,” november 30, 2007, www .loc.gov/bibliographic-future/news/lcwg-report-draft-11-3007-final.pdf (accessed december 30, 2007). 13. university of california libraries bibliographic services task force, “rethinking how we provide bibliographic services for the university of california,” final report, 34, http://libraries. universityofcalifornia.edu/sopag/bstf/final.pdf (accessed august 24, 2007). 14. “[worldcat.org] search for an item in libraries near you,” www.worldcat.org (accessed august 24, 2007). 15. oclc’s plan to create additional apis to worldcat as part of its worldcat grid project is a welcome development that may enable oclc members to harvest metadata directly from worldcat into a system such as xc in the future. see the following blog posting for an early description of oclc’s plans, which have not been formally unveiled by oclc as of this writing: bess sadler, “the librarians and the chocolate factory: oclc developer network day,” solvitur ambulando, october 3, 2007, www.ibiblio.org/bess/?p=88 (accessed december 30, 2007). 16. “metadata management system,” nsdl registry, september 20, 2006, http://metadataregistry.org/wiki/index.php/ metadata_management_system (accessed july 23, 2007). 17. diane hillmann, stuart sutton, and jon phipps, “nsdl metadata improvement and augmentation services,”(grant proposal submitted to the national science foundation, 2007). 18. library of congress, network development and marc standards office, “marcxml: marc 21 xml schema,” july 26, 2006, www.loc.gov/standards/marcxml (accessed september 3, 2007). 
metadata to support next-generation library resource discovery | bowen 17 19. andrew k. pace, “category: metasearch,” hectic pace, http://blogs.ala.org/pace.php?cat=150 (accessed august 27, 2007). see in particular the following blog entries: “metameta,” july 25, 2006; “more meta,” september 29, 2006; “preaching to the publishers,” oct 31, 2006; “even more meta,” july 11, 2007; and “still here,” august 21, 2007. 20. david lindahl, “metasearch in the users’ context,” the serials librarian 51, no. 3/4 (2007): 220–222. 21. ethicshare, a collaborative project of the university of minnesota, georgetown university, indiana university–bloomington, indiana university–purdue university indianapolis, and the university of virginia, is addressing this challenge as part of its plan to develop a sustainable online environment for the practical ethics community. the architecture of the proposed ethicshare system has many similarities to that of xc, but the project focuses specifically upon ingesting citation metadata from a variety of sources, including commercial providers. see cecily marcus, “ethicshare planning phase final report,” july 2007, www.lib.umn.edu/about/ethicshare/university%20 of%20minnesota_ethicshare_final_report.pdf (accessed august 27, 2007). 22. roy tennant used this phrase in “marc exit strategies,” library journal 127, no. 19 (november 15, 2002), www.libraryjournal.com/article/ca256611.html?q=tennant+exit (accessed july 23, 2007); karen coyle presented her vision for moving beyond marc to a more flexible, identifier-based record structure that will facilitate a range of library functions in “future considerations: the functional library systems record,” library hi tech 22, no. 2 (2004). 23. library of congress, network development and marc standards office, “mets: metadata encoding and transmission standard official web site,” august 23, 2007, www.loc.gov/ standards/mets (accessed september 3, 2007). 24. library of congress, network development and marc standards office, “mods: metadata object description schema,” august 22, 2007, www.loc.gov/standards/mods (accessed september 3, 2007). 25. library of congress, network development and marc standards office, “mads: metadata authority description schema,” february 2, 2007, www.loc.gov/standards/mads (accessed september 3, 2007). 26. “premis: preservation metadata maintenance activity,” july 31, 2007, www.loc.gov/standards/premis (accessed september 3, 2007). 27. library of congress, network development and marc standards office, “ead: encoded archival description version 2002 official site,” august 17, 2007, www.loc.gov/ead (accessed september 3, 2007). 28. visual resources association, “vra core: welcome to the vra core 4.0,” www.vraweb.org/projects/vracore4 (accessed september 3, 2007). 29. “dublin core metadata element set, version 1.1.” 30. other xml-compatible schemas, such as mods and mads, will also be supported initially in xc if they are first converted into marc xml or qualified dublin core. in the future, we plan to allow these other schemas to be harvested directly into xc. 31. foster et al., “extensible catalog survey report,” july 20, 2007, 15. the original comment was submitted by meg bellinger in yale university’s response to the xc survey. 32. patricia harpring et al., “metadata standards crosswalks,” in introduction to metadata: pathways to digital information (getty research institute, n.d.), www.getty.edu/research/ conducting_research/standards/intrometadata/crosswalks. 
html (accessed august 29, 2007); see also carol jean godby, jeffrey a. young, and eric childress, “a repository of metadata crosswalks,” d-lib magazine 10, no. 12 (december 2004), www .dlib.org/dlib/december04/godby/12godby.html (accessed july 23, 2007). 33. digital library federation, “crosswalkinglogic,” june 22, 2007, http://webservices.itcs.umich.edu/mediawiki/oaibp/ index.php/crosswalkinglogic (accessed august 28, 2007). 34. karen coyle et al., “framework for a bibliographic future,” may 2007, http://futurelib.pbwiki.com/framework (accessed july 23, 2007). 35. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 36. andy powell et al., “dcmi abstract model,” dublin core metadata initiative, june 4, 2007, http://dublincore.org/ documents/abstract-model (accessed august 29, 2007). 37. joint steering committee for development of rda, “rda: resource description and access: background,” july 16, 2007, www.collectionscanada.ca/jsc/rda.html (accessed august 29, 2007). 38. joint steering committee for development of rda, “rda-frbr mapping,” june 14, 2007, www.collectionscanada .ca/jsc/docs/5rda-frbrmapping.pdf (accessed august 29, 2007). 39. joint steering committee for development of rda, “rda element analysis,” june 14, 2007, www.collectionscanada.ca/ jsc/docs/5rda-elementanalysis.pdf (accessed august 28, 2007). a revised version of the document was issued on december 16, 2007, at www.collectionscanada.gc.ca/jsc/docs/5rda-element analysisrev.pdf (accessed december 30, 2007). 40. “data model meeting: british library, london 30 april–1 may 2007,” www.bl.uk/services/bibliographic/meeting.html (accessed july 23, 2007). the task group has outlined its work plan, including deliverables, on its wiki at http://dublincore .org/dcmirdataskgroup (accessed october 4, 2007). 41. emily a hicks, jody perkins, and margaret beecher maurer, “application profile development for consortial digital libraries,” library resources and technical services 51, no. 2 (april 2007). 42. makx dekkers, “application profiles, or how to mix and match metadata schemas,” cultivate interactive, january 2001, www.cultivate-int.org/issue3/schemas (accessed august 29, 2007). 43. thomas baker et al., “dublin core application profile guidelines,” september 3, 2005, http://dublincore.org/usage/ documents/profile-guidelines (accessed october 8, 2007). 44. joint steering committee for development of rda, “rda element analysis.” 45. karen coyle and diane hillmann, “resource description and access (rda): cataloging rules for the 20th century,” d-lib magazine 13, no. 1/2 (jan./feb. 2007), www.dlib.org/dlib/ january07/coyle/01coyle.html (accessed august 24, 2007). 46. karen coyle, “astonishing announcement: rda goes 2.0,” coyle’s information, may 3, 2007, http://kcoyle.blogspot .com/2007/05/astonishing-announcement-rda-goes-20.html (accessed august 29, 2007). 18 information technology and libraries | june 2008 47. “drupal.org,” http://drupal.org (accessed august 30, 2007). 48. foster et al., “extensible catalog survey report,” 14. 49. “taxonomy: a way to organize your content,” drupal.org, http://drupal.org/handbook/modules/taxonomy (accessed september 12, 2007). 50. “blackboard learning system,” www.blackboard.com/ products/academic_suite/learning_system/index.bb (accessed august 31, 2007). 51. “sakai: collaboration and learning environment for education,” http://sakaiproject.org (accessed august 31, 2007). 52. 
for example, the library into blackboard project at california state fullerton has developed a toolkit for faculty that brings openurl resolver functionality into blackboard to create linked citations to resources. see “putting the library into blackboard: a toolkit for cal state fullerton faculty,” 2005, www .library.fullerton.edu/librarytoolkit/default.shtml (accessed august 31, 2007); and susan tschabrun, “putting the library into blackboard: using the sfx openurl generator to create a toolkit for faculty.” the sakaibrary project at indiana university and the university of michigan are working to integrate licensed library content into sakai using metasearch technology. see “sakaibrary: integrating licensed library resources with sakai,” june 28, 2007, www.dlib.indiana.edu/projects/sakai (accessed august 31, 2007). 53. university of rochester river campus libraries, “extensible catalog phase 2.” 54. susan gibbons, “library course management systems: an overview,” library technology reports 41, no. 3 (may/june 2005): 34–37. 55. marti a. hearst, “design recommendations for hierarchical faceted search interfaces,” august 2006, http:// flamenco.berkeley.edu/papers/faceted-workshop06.pdf (accessed august 31, 2007). 56. kristin antelman, emily lynema, and andrew k. pace, “toward a twenty-first century library catalog,” information technology and libraries 25, no. 3 (september 2006): 128–138. 57. “c4,” https://www.library.rochester.edu/c4 (accessed september 28, 2007). as of the time of this writing, the c4 prototype is available to the public. however, the prototype is no longer being developed, and this prototype may cease to be available at some point in the future. 58. charley pennell, “forward to the past: resurrecting faceted search @ ncsu libraries,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.lib.ncsu.edu/endeca/ presentations/200706-facetedcatalogs-pennell.ppt (accessed august 31, 2007). 59. mary charles lasater, “authority control meets faceted browse: vanderbilt and primo,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.ala.org/ala/lita/litamembership/ litaigs/authorityalcts/2007annualfiles/marycharleslasater.ppt (accessed august 31, 2007). 60. casey bisson, “faceting and clustering: an implementation report based on scriblio,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), http://oz.plymouth.edu/~cbisson/ presentations/alaannual_2-2007june24.pdf (accessed august 31, 2007). 61. “subject access fields (6xx),” in marc 21 concise format for bibliographic data (2006), www.loc.gov/marc/bibliographic/ ecbdsubj.html (accessed september 28, 2007). 62. pennell, “forward to the past: resurrecting faceted search@ ncsu libraries.” 63. “fast: faceted application of subject terminology,” www.oclc.org/research/projects/fast (accessed august 31, 2007). 64. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 65. ifla working group on functional requirements and numbering of authority records (franar), “functional requirements for authority data.” 66. library of congress, network development and marc standards office, “functional analysis of the marc 21 bibliographic and holding formats,” april 6, 2006, www.loc. gov/marc/marc-functional-analysis/functional-analysis.html (accessed august 31, 2007); martha m. 
yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 2 (june 2005): 77–95; pat riva, “mapping marc 21 linking entry fields to frbr and tillett’s taxonomy of bibliographic relationships,” library resources and technical services 48, no. 2 (april 2004): 130–143. 67. trond aalberg, “a process and tool for the conversion of marc records to a normalized frbr implementation,” in digital libraries: achievements, challenges and opportunities (berlin/heidelberg: springer, 2006), 283–292; christian monch and trond aalberg, “automatic conversion from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2003): 405–411; david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib magazine 11, no. 10 (october 2005), www .dlib.org/dlib/october05/crane/10crane.html (accessed august 24, 2007). 68. trond aalberg, frank berg haugen, and ole husby, “a tool for converting from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2006), 453–456; “frbr work-set algorithm,” www .oclc.org/research/software/frbr/default.htm (accessed august 31, 2007); “xisbn (web service),” www.worldcat .org/affiliate/webservices/xisbn/app.jsp (accessed august 31, 2007). 69. for example, marc 21 data may need to be augmented to extract data attributes related to frbr works and expressions that are not explicitly coded within a marc 21 bibliographic record (such as a date associated with a work coded within a general note field); or to “sort out” the fields in a marc 21 bibliographic record for a single resource that contains various works and/or expressions (e.g. ,a sound recording with multiple tracks), to associate the various fields (performer access points, analytical entries, subject headings, etc.) with the appropriate work or expression. 70. while the rit-developed tool is not publicly available at the time of this writing, it is our intent to post it to sourceforge (www.sourceforge.net) in the near future. the final report of the rit project is available at http://docushare.lib.rochester.edu/ docushare/dsweb/get/document-27362 (accessed january 2, 2008). metadata to support next-generation library resource discovery | bowen 19 71. foster et al., “extensible catalog survey report.” 72. note the arrow pointing to the left in figure 1 between the user environments and the metadata services hub. 73. jane greenberg and eva mendez, knitting the semantic web (binghamton, ny: haworth information press, 2007). this volume, co-published simultaneously as cataloging and classification quarterly 43, no. 3/4, contains a wealth of articles that explore the role that libraries can, and should, play in the development of the semantic web. 74. corey a. harper and barbara b. tillett explore various methods for making these controlled vocabularies available in “library of congress controlled vocabularies and their application to the semantic web,” cataloging and classification quarterly 43, no. 3/4 (2007): 63. the development of skos (simple knowledge organization system), a semantic web language for representing controlled structured vocabularies, will also be valuable for xc. see alistair miles and jose r. perez-aguiera, “skos: simple knowledge organisation for the web,” catalogingand classification quarterly 43, no. 3/4 (2007). 75. marcus, “ethicshare planning phase final report.” 76. 
the talis platform provides another promising environment for experimentation and development. see “talis platform: semantic web application platform,” talis, www.talis.com/ platform (accessed september 2, 2007). student use of library computers: are desktop computers still relevant in today’s libraries? susan thompson information technology and libraries |december 2012 20 abstract academic libraries have traditionally provided computers for students to access their collections and, more recently, facilitate all aspects of studying. recent changes in technology, particularly the increased presence of mobile devices, calls into question how libraries can best provide technology support and how it might affect the use of other library services. a two-year study conducted at california state university san marcos library analyzed student use of computers in the library, both the library’s own desktop computers and laptops owned by students. the study found that, despite the increased ownership of mobile technology by students, they still clearly preferred to use desktop computers in the library. it also showed that students who used computers in the library were more likely to use other library services and physical collections. introduction for more than thirty years, it has been standard practice in libraries to provide some type of computer facility to assist students in their research. originally, the focus was on providing access to library resources, first the online catalog and then journal databases. for the past decade or so, this has expanded to general-use computers, often in an information-commons environment, capable of supporting all aspects of student research from original resource discovery to creation of the final paper or other research product. however, times are changing and the ready access to mobile technology has brought into question whether libraries need to or should continue to provide dedicated desktop computers. do students still use and value access to computers in the library? what impact does student computer use have on the library and its other services? have we reached the point where we should reevaluate how we use computers to support student research? california state university san marcos (csusm) is a public university with about nine thousand students, primarily undergraduates from the local area. csusm was established in 1991 and is one of the youngest campuses in the 23-campus california state university system. the library, originally located in space carved out of an administration building, moved into its own dedicated library building in 2004. one of the core principles in planning the new building was the vision of the library as a teaching and learning center. as a result, a great deal of thought went into the design of technology to support this vision. rather than viewing technology’s role as just supporting access to library resources, we expanded its role to providing cradle-to-grave support for the entire research process. we also felt that encouraging students to work in the library would encourage use of traditional library materials and the expertise of library staff, since these resources would be readily available.1 susan thompson (sthompsn@csusm.edu) is coordinator of library systems, california state university san marcos. 
student use of library computers | thompson 21 rethinking our assumptions about library technology’s role in the student research process led us to consider the entire building as a partner in the students’ learning process. rather than centralizing all computer support in one information commons, we wanted to provide technology wherever students want to use it. we used two strategies. first, we provided centralized technology using more than two hundred desktop computers, most located in four of our learning spaces: reference, classrooms, the media library, and the computer lab. three of these spaces are configured like information commons, providing full-service research computers grouped around the service desks near each library entrance. in addition, simplified “walk-up” computers are available on every floor. the simplified computers provide limited web services to encourage quick turnaround and no login requirement to ensure ready access to library collections for everyone, including community members. the other major component of our technology plan was the provision of wireless throughout the building, along with extensive power outlets to support mobile computing. more than forty quiet study rooms, along with table “islands” in the stacks, help support the use of laptops for group study. however, only two of these quiet studies, located in the media library, provide desktop computers designed specifically to support group work. in 2009 and again in 2010, we conducted computer use studies to evaluate the success of the library’s technology strategy and determine whether the library’s desktop computers were still meeting student needs as envisioned by the building plan. the goal of the study was to obtain a better understanding of how students use the library’s computers, including types of applications used, computer preferences, and computer-related study habits. the study addressed several specific research questions. first, librarians were concerned that the expanded capabilities of the desktop computers distracted students from an academic and library research focus. were students using the library’s computers appropriately? second, the original technology plan had provided extensive support for mobile technology, but the technology landscape has changed over time. how did the increase in student ownership of mobile devices—now at more than 80 percent—affect the use of the desktop computers? finally, did providing an application-rich computer environment encourage student to conduct more of their studying in the library, leading them more frequently to use traditional library collections and services? this article will focus on the study results pertaining to the second and third research questions. we found that, according to our expectations, students using library computer facilities also made extensive use of traditional library services. however, we were surprised to discover that the growing availability of mobile devices had relatively little impact on students’ continuing preference for libraryprovided desktop computers. literature review the concept of the information commons was just coming into vogue in the early 2000s, when we were designing our library building, and it strongly influenced our technology design as well as building design. 
information commons, defined by steiner as the “functional integration of technology and service delivery,” have become one of the primary methods by which libraries provide enhanced computing support for students studying in the library.2 one of the changes in libraries motivating the information-commons concept is the desire to support a broad range of learning styles, including the propensity to mix academic and social activities. particularly influential to our design was the concept of the information commons supporting students’ projects “from inception to completion” by providing appropriate technologies to facilitate research, collaboration, and consultation.3 information technology and libraries |december 2012 22 providing access to computers appears to contribute to the value of libraries as “place.” shill and toner, early in the era of information commons, noted “there are no systematic, empirical studies documenting the impact of enhanced library buildings on student usage of the physical library.” 4 since then, several evaluations of the information-commons approach seem to show a positive correlation between creation of a commons and higher library usage because students are now able to complete all aspects of their assignments in the library. for example, the university of tennessee and indiana university have shown significant increases in gate counts after they implemented their commons.5 while many studies discuss the value of information commons, very few look at why library computers are preferred over computers in other areas on campus. burke looked at factors influencing students’ choice of computing facilities at an australian university.6 given a choice of central computer labs, residence hall computers, and the library’s information commons, most students preferred the computers in the library over the other computer locations, with more than half using the library computers more than once a week. they rated the library most highly on its convenience and closeness to resources. perhaps the most important trend likely to affect libraries’ support for student technology needs is the increased use of mobile technology. the 2010 nationwide educause center for applied research (ecar) study, from the same year as the second csusm study, showed that 89 percent of students had laptops.7 other nationwide studies have corroborated this high level of laptop ownership.8 so, does this increased use of laptops and mobile devices have affect the use of desktop computers? the 2010 ecar study reported that desktop ownership (about 50 percent in 2010) had declined by more than 25 percent between 2006 and 2009, a significant period in the lifetime of csusm’s new library building. pew’s internet & american life project trend data showed desktop ownership as the only gadget category in which ownership is decreasing, from 68 percent in 2006 to 55 percent at the end of 2011.9 some libraries and campuses are beginning to respond to the increase in laptop ownership by changing their support for desktop computers. university of colorado boulder, in an effort to decrease costs and increase availability of flexible campus spaces, is making a major move away from providing desktop computers.10 while they found that 97 percent of their students own laptops and other mobile devices, they were concerned that many students still preferred to use desktop computers when on campus. 
to entice students to bring their laptops to campus, the university is enhancing its support for mobile devices by converting their central computer labs into flexible-use space with plentiful power outlets, flexible furniture, printing solutions, and access to the usual campus software. nevertheless, it may be premature for all libraries and universities to eliminate their desktop computer support. tom, voss, and scheetz found students want flexibility with a spectrum of technological options.11 certainly, they want wi-fi and power outlets to support their mobile technology. however, students also want conventional campus workstations providing a variety of functions, such as quick print and email computers, long-term workstations with privacy, and workstations at larger tables with multiple monitors that support group work. while the ubiquity of laptops is an important factor today, other forms of mobile devices may become more important in the future. a 2009 wall street journal article reported the trend for business travelers is to rely on smartphones rather than laptops.12 for the last three years, educause’s horizon reports have made support for non-laptop mobile technologies one of the top trends. the 2009 horizon report mentioned that in countries like japan, “young people equipped with mobiles often see no reason to own personal computers.”13 in 2010, horizon reported an interesting pilot project at a community college in which one group of students was issued mobile devices and another group was not.14 members of the group with the mobile devices were found to work on the course more during their spare time. the 2011 horizon report discusses mobiles as capable devices in their own right that are increasingly users’ first choice for internet access.15 therefore, rather than trying to determine which technology is most important, libraries may need to support multiple devices. trends described in the ecar and horizon studies make it clear that students own multiple devices. so how do they use them in the study environment? head’s interviews with undergraduate students at ten us campuses found that “students use a less is more approach to manage and control all of the it devices and information systems available to them.”16 for example, in the days before final exams, students were selective in their use of technology to focus on coursework yet remain connected with the people in their lives. the question then may not be which technology libraries should support but rather how to support the right technology at the right time.

method

the csusm study used a mixed-method approach, combining surveys with real-time observation to improve the effectiveness of assessment and generate a more holistic understanding of how library users made their technology choices. the study protocol received exempt status from the university human subjects review board. it was carried out twice over a two-year period to determine whether time of the semester affected usage. in 2009, the study was administered at the end of the spring term, april 15 to may 3. we expected that students near the end of the term would be preparing for finals and completing assignments, including major projects. the 2010 study was conducted near the beginning of the term, february 4 to february 18. we expected that early-term students would be less engaged in academic assignments, particularly major research projects. we carried out each study over a two-week period.
an attempt was made to check consistency by duplicating each time and location. each location was surveyed monday—thursday, once in the morning and once in the afternoon during the heavy-use times of 11 a.m. and 2 p.m. the survey locations included two large computer labs (more than eighty computers each), one located near the library reference desk and one near the academic technology helpdesk. other locations included twenty computers in the media library, a handful of desktop computers in the curriculum area, and laptop users, mostly located on the fourth and fifth floor of the library. the fourth and fifth floor observations also included the library’s forty quiet study rooms. for the 2010 study, the other large computer lab on campus (108 computers), located outside the library, also was included for comparison purposes. we used two techniques: a quantitative survey of library computer users and a qualitative observation of software applications usage and selected study habits. the survey tried to determine the purpose for which the student was using the computer for that day, what their computer preference was, and what other business they might have in the library. it also asked students for their suggestions for changes in the library. the survey was usually completed within the five-minute period that we had estimated and contained no identifying personal information. the survey administrator handed-out the one-page paper survey, along with a pencil if desired, to each student using a library workstation or using a laptop during each designated observation information technology and libraries |december 2012 24 period. users who refused to take the survey were counted in the total number of students asked to do the survey. however, users who indicated they refused because they had already completed a survey on a previous observation date were marked as “dup” in the 2010 survey and were not counted again. the “dup” statistic proved useful as an independent confirmation of the popularity of the library computers. the second method involved conducting “over-the-shoulder” observations of students using the library computers. while students were filling out the paper survey, the survey administrator walked behind the users and inconspicuously looked at their computer screens. all users in the area were observed whether or not they had agreed to take the survey. the one exception was users in group-study rooms. the observer did not enter the room and could only note behaviors visible from the door window, such as laptop usage or group studying. based on brief (one minute or less) observations, administrators noted on a form the type of software application the student was using at that point in time. the observer also noted other, nondesktop computer technical devices in use (specifically laptops, headphones, and mobile devices such as smart phones), and study behaviors, such as groupwork (defined as two or more people working together). the student was not identified on the form. we felt that these observations could validate information provided by the users on the survey. results we completed 1,452 observations in 2009 and 2,501 observations in 2010. the gate counts for the primary month each study took place—70,607 for april 2009 and 59,668 for february 2010— show the library was used more heavily during the final exam period. 
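the “dup” accounting described above feeds directly into the 2010 return-rate figure reported in the next paragraph. as a purely illustrative sketch (in python, using only the totals quoted below rather than the raw survey data), the arithmetic works out as follows:

# duplicate-adjusted return-rate arithmetic for the 2010 survey;
# the counts mirror the figures reported in the results discussion below.
completed = 1123        # surveys completed in 2010
unique_asks = 1423      # distinct students approached (repeat approaches excluded)
duplicates = 619        # repeat approaches of students who had already completed a survey

return_rate = completed / unique_asks      # 1123 / 1423 = 0.789, the reported 79 percent
refusals = unique_asks - completed         # refusals implied among the unique asks
print(f"return rate: {return_rate:.0%}; refusals: {refusals}; repeat approaches: {duplicates}")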
the larger number of results the second year was due to more careful observation of laptop and study-group computer users on the fourth and fifth floor and the addition of observations in a nonlibrary computer lab rather than an increase of students available to be observed. the observations looked at application usage, study habits, and devices present, but this article will only discuss the observations pertaining to devices. in 2009, 17 percent of students were observed using laptops (see table 1). this number almost doubled in 2010 to 33 percent. most laptop users were observed on the fourth and fifth floors where furniture, convenient electrical outlets, and quiet study rooms provided the best support for this technology. very few desktop computers were available, so students desiring to study on these floors have to bring their own laptops. almost 20 percent of students in 2010 were observed with other mobile technology, such as cell phones or ipods, and 16 percent were wearing headphones, which indicated there was other, often not visible, mobile technology in use.

table 1. mobile technology observed
laptop in use: 17% (2009), 33% (2010)
headphones in use: 16% (2010)
mobile device in use (cell phone, ipod): 18% (2010)

in 2009, 1,141 students completed the computer-use survey. however, we were unable to accurately determine the return rate that year. the nature of the study, which surveyed the same locations multiple times, revealed that many of the students were approached more than once to complete the survey. thus the majority of the refusals to take the survey were because the subject had already completed one previously. the 2010 study accounted for this phenomenon by counting refusals and duplications separately. in 2010, 1,123 students completed the survey out of 1,423 unique asks, resulting in a 79 percent return rate. the 619 duplicates counted represented about half of the 2010 surveys completed and could be considered another indicator of frequent use of the library’s computers. the 2010 results included an additional 290 surveys completed by students using the other large computer lab on campus outside the library.

table 2. frequency of computer use
daily when on campus: 49% (2009), 42% (2010)
several times a week: 33% (2009), 30% (2010)
several times a month: 11% (2009), 15% (2010)
rarely use computers in library: 9% (2009), 10% (2010)

in both years of the study, 78 percent of students said they preferred to use computers in the library to other computer lab locations on campus. students also indicated they were frequent users (see table 2). in 2009, 82 percent of students used the library computers frequently—49 percent daily and 33 percent several times a week. the frequency of use in the 2010 early term study dropped about 10 percent to 72 percent but with the same proportion of daily vs. weekly users. convenience and quiet were the top reasons given by more than half of students as to why they preferred the library computers, followed closely by atmosphere. about a quarter of students preferred library computers because of their close access to other library services.

table 3. preferred computer to use in the library
sit-down pc: 84% (2009), 71% (2010)
walk-up pc: 6% (2009), 5% (2010)
own laptop: 23% (2009), 28% (2010)
laptop checked out in library: 2% (2009), 2% (2010)

the types of computer that students preferred to use in the library were desktop computers followed by laptops owned by the students (see table 3).
it is notable that the preference for desktop computers changed significantly from 2009 to 2010: 84 percent of students preferred desktop computers in 2009 vs. 72 percent in 2010—a 12 percent decrease. not surprisingly, few students preferred the simplified walk-up computers used for quick lookups. however, we did not expect such little interest in checking out laptops, with only 2 percent preferring that option. the 2010 study added a new question to the survey to better understand the types of technology devices owned by students (see table 4). in 2010, 84 percent of students owned a laptop (combining the netbook and laptop statistics). almost 40 percent of students owned a desktop; therefore many students owned more than one type of computer. of the 85 percent of students that indicated they had a cell phone, about one-third indicated they owned smart phones. the majority of students own music players. the one technology students were not interested in was e-book readers, with less than 2 percent indicating ownership.

table 4. technology devices owned by students (2010)
laptop: 77%
ipod/mp3 music player: 59%
regular cell phone: 52%
desktop computer: 40%
smart phone: 31%
netbook: 7%
other handheld devices: 1%
kindle/book reader: 1%

to understand how the use of technology might affect use of the library in general, the survey asked students what other library services they used on the same day they were using library computers. table 5 shows survey responses are very similar between the late term 2009 study and the early term in 2010. by far the most popular use of the library, by more than three-quarters of the students, was for study. around 25 percent of the students planned to meet with others, and 20 percent planned to use the media services. around 15 percent of students planned to checkout print books, 15 percent planned to use journals, and 10 percent planned to ask for help. the biggest difference for students early in the term was an increased interest (5 percent more) in using the library for study. the late-term students were 9 percent more likely to meet with others. by contrast, users in the nonlibrary computer lab were much less likely to make use of other library services. only 24 percent of nonlibrary users planned to study in the library, and 8 percent planned to meet with others in the library that day. use of all other library services was less than 5 percent by the nonlibrary computer users.

table 5. other library services used
study: 76% (2009), 81% (2010), 23% (non-library)
meet with others: 35% (2009), 26% (2010), 7% (non-library)
use media: 20% (2009), 22% (2010), 4% (non-library)
checkout a book: 16% (2009), 13% (2010), 3% (non-library)
look for journals/newspapers: 15% (2009), 13% (2010), 3% (non-library)
ask questions/get help: 10% (2009), 10% (2010), 2% (non-library)
use a reserve book: 8% (2009), 9% (2010), 2% (non-library)
create a video/web page: 6% (2009), 3% (2010), 0% (non-library)
pick up ill/circuit: 3% (2009), 3% (2010), 0% (non-library)
other: 0% (2009), 4% (2010), 1% (non-library)

in 2010, we also asked users what changes they would like in the library, and 58 percent of respondents provided suggestions. the question was not limited to technology, but by far the biggest request for change was to provide more computers (requested by 30 percent of all respondents). analysis of the other survey questions regarding computer ownership and preferences revealed who was requesting more traditional desktops in the library. surprisingly, most were laptop users; 90 percent of laptop owners wanted more computers, and 88 percent of the respondents making this request were located on the fourth and fifth floor, which were almost exclusively laptop users.
the next most common comments were remarks indicating student satisfaction with the current library services: 19 percent of students said they were satisfied with current library services and 9 percent praised the library and its services. commonality of requests dropped quickly at that point, with the fourth most common request being for more quiet (2 percent).

discussion

the results show that students consistently prefer to use computers in the library, with 78 percent declaring a preference for the library over other computer locations on campus both years of the study. this preference is confirmed by the statistics reported by csusm’s campus it department, which tracks computer login data. this data consistently shows the library computer labs are used more than nonlibrary computer labs, with the computers near the library reference desk as the most popular, followed closely by the library’s second large computer lab, which is located next to the technology help desk. for instance, during the 2010 study period, the reference desk lab (80 computers) had 6,247 logins compared to 3,218 logins in the largest nonlibrary lab (108 computers)—double the amount of usage. the data also shows that use of the computers near the reference desk increased by 15 percent between 2007 and 2010. supporting the popularity of using computers in the library is the fact that most students are repeat customers. table 2 shows 82 percent of the 2009 late-term respondents used the library computers several times a week, with almost half using our computers daily. in contrast, 72 percent of the 2010 early term students used the library computers daily or several times a week. the 10 percent drop in frequency of visits to the library for computing applied to both laptop and desktop users and seems to be largely due to not yet receiving enough work from classes to justify more frequent use. the kind of computer that users preferred changed somewhat over the course of the study. the preference for desktop computers dropped from 84 percent of students in 2009 to 72 percent in 2010 (see table 3). one reason for this 12 percent drop may be related to how the survey was administered. the 2010 study did a more thorough job of surveying the fourth and fifth library floors where most laptop users are. as a result, the laptop floors represented 29 percent of the response in 2010 vs. only 13 percent in 2009. these numbers are also reflected in the proportion of laptops observed each year—33 percent in 2010 vs. 17 percent in 2009 (see table 1). the drop in desktop computer preference is interesting because it was not matched by an equally large increase in laptop preference, which only increased by 5 percent. the other reason for the decrease in desktop preference is likely due to the larger change seen nationwide in student laptop ownership.
for instance, the pew study of gadget ownership showed a 13 percent drop in desktop ownership over a five-year period, 2006–2011, while at the same time laptop ownership almost doubled from 30 percent to 56 percent.17 however, it is interesting to note that, according to the pew study, in 2011 the percent of adults who owned each type of device was nearly equal—55 percent for desktops and 56 percent for laptops. the 2010 survey tried to better understand students’ preferences by identifying all the kinds of technology they had available to them. we found that 77 percent of csusm students owned laptops and an additional 7 percent owned the netbook form of laptops (see table 4). the combined 84 percent laptop ownership is comparable with the 2010 ecar study’s finding of 89 percent student laptop ownership nationwide.18 this high level of laptop ownership may explain why the users who preferred laptop computers almost all preferred to use their own rather than laptops checked out in the library. despite the high laptop ownership and decrease in desktop preference, it is significant that the majority of csusm students still prefer to use desktop computers in the library. aside from the 72 percent of respondents who specifically stated a preference for desktop computers, the top suggestion for library improvement was to add more desktop computers, requested by 38 percent of respondents. further analysis of the survey data revealed that it was the laptop owners and the fourth and fifth floor laptop users who were the primary requestors of more desktop computers. to try to better understand this seemingly contradictory behavior, we have done some further investigation. anecdotal conversations with users during the survey indicated that convenience and reliability are two factors affecting students’ decisions to use desktop computers. the desktop computers’ speed and reliable internet connections were regarded as particularly important when uploading a final project to a professor, with some students stating they came to the library specifically to upload an assignment. in may 2012, the csusm library held a focus group that provided additional insight into the question of desktops vs. laptops. all of the eight-student focus group participants owned laptops, yet all eight participants indicated that they preferred to use desktop computers in the library. when asked why, participants indicated the reliability and speed of the desktop computers and the convenience of not having to remember to bring their laptop to school and “lug” it around. another factor influencing the convenience factor may be that our campus does not require that students own a laptop and bring it to class, so they may have less motivation to travel with their laptop. supporting the idea that students perceive different benefits for each type of computer, six of the eight participants owned a desktop computer in addition to a laptop. the 2010 study also showed that students see value in owning both a desktop and a laptop computer, since the 40 percent ownership of desktop computers overlaps the 84 percent ownership of laptops (see table 4).

table 6. reasons students prefer using library computer areas (2009 and 2010): library services are close; library staff are close

for almost half of the students surveyed, one of the reasons for their preference for using computers in the library was either the ready access to library services or staff (see table 6). even more significant, when specifically asked what else they planned to do in the library that day besides using the computer (see table 5), more than 80 percent of the students indicated that they intended to use the library for purposes other than computing. the top two uses for the library were studying (76 percent in 2009, 81 percent in 2010) and meeting with others (35/26 percent), indicating the importance of the library as place. the most popular library service was the media
even more significant, when specifically asked what else they planned to do in the library that day besides using the computer (see table 5), more than 80 percent of the students indicated that they intended to use the library for purposes other than computing. the top two uses for the library were studying (76 percent in 2009, 81 percent in 2010) and meeting with others (35/26 percent), indicating the importance of the library as place. the most popular library service was the media 0% 5% 10% 15% 20% 25% 30% library services are close library staff are close 2009 2010 student use of library computers | thompson 31 library (20/22 percent) followed by collections with 16/13 percent planning to checkout a book and 15/13 percent planning to look for journals and newspapers. it is interesting that the level of use of these library services was similar whether early or late in the term. the biggest difference was that early term students were less likely to be working with a group but were slightly more likely to be engaged in general studying. even the less-used services, such as asking a question (10 percent) or using a reserve book (8 percent), exhibited an appropriate amount of usage if one looks at the actual numbers. for example, 8 percent of 1,123 2010 survey respondents represent 90 students who used reserve materials sometime during the 8 hours of the two-week survey period. to put the use of the library by computer users into perspective, we also asked students using the nonlibrary computer lab if they planned to use the library sometime that same day. only 24 percent of the nonlibrary computer users planned to study in the library that day vs. 81 percent of the library computer users; only 4 percent planned to use media vs. 24 percent; and 2 percent planned to check out a book vs. 13 percent. the implication is clear that students using computers in the library are much more likely to use the library’s other services. we usually think of providing desktop computers as a service for students, and so it is. however, the study results show that providing computers also benefits the library itself. it reinforces its role as place by providing a complete study environment for students and encouraging all study behaviors including communication and working with others. the popularity of the library computers provide us with a “captive audience” of repeat customers. conclusion the csusm library technology that was planned in 2004 is still meeting students’ needs. although most of our students own laptops, most still prefer to use desktop computers in the library. in fact, providing a full-service computer environment to support the entire research process benefits the entire library. students who use computers in the library appear to conduct more of their studying in the library and thus make more use of traditional library collections and services. going forward, several questions arise for future studies. csusm is a commuter school. students often treat their work space in the library as their office for the day, which increases the importance of a reliable and comfortable computer arrangement. one question that could be asked is whether the results would be different for colleges where most students live on campus or nearby. if the university requires that all students own their own laptop and expects them to bring them to class, how does that affect the relevance of desktop computers in the library? the 2010 study was completed just a few weeks before the first ipad was introduced. 
since students have identified convenience and weight as reasons for not carrying their laptops, are tablets and ultra-light computers, like the macbook air, more likely to be carried on campus by students and used them more frequently for their research? how important is it to have a supportive mobile infrastructure with features such as high speed wifi, ability to use campus printers, and access to campus applications? are students using smart phones and other mobile devices for study purposes? in fact, are we focusing too much on laptops, and are other mobile devices starting to take over that role? this study’s results make it clear that we can’t just look at data such as ecar’s, which show high laptop ownership, and assume that means students don’t want or won’t use library computers. as information technology and libraries |december 2012 32 the types of mobile devices continue to grow and evolve, libraries should continue to develop ways to facilitate their research role. however, the bottom line may not be that one technology will replace another but rather that students will have a mix of devices and will choose which device is best suited to a particular purpose. therefore libraries, rather than trying to pick which device to support, may need to develop a broad-based strategy to support them all. references 1. susan m. thompson and gabriella sonntag. “chapter 4: building for learning: synergy of space, technology and collaboration.” learning commons: evolution and collaborative essentials. oxford: chandos publishing (2008): 117-199. 2. heidi m. steiner and robert p. holley, “the past, present, and possibilities of commons in the academic library,” reference librarian 50, no. 4 (2009): 309–332. 3. michael j. whitchurch and c. jeffery belliston,“information commons at brigham young university: past, present, and future,” reference services review 34, no. 2 (2006): 261–78. 4. harold shill and shawn tonner, “creating a better place: physical improvements in academic libraries, 1995–2002,” college & research libraries 64 (2003): 435. 5. barbara i. dewey, “social, intellectual, and cultural spaces: creating compelling library environments for the digital age,” journal of library administration 48, no. 1 (2008): 85–94; diane dallis and carolyn walters, “reference services in the commons environment,” references services review 34, no. 2 (2006): 248–60. 6. liz burke et al., “where and why students choose to use computer facilities: a collaborative study at an australian and united kingdom university,” australian academic & research libraries 39, no. 3 (september 2008): 181–97. 7. shannon d. smith and judith borreson caruso, the ecar study of undergraduate students and information technology, 2010 (boulder, co: educause center for applied research, october 2010), http://net.educause.edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed march 21, 2012). 8. 
pew internet & american life project, “adult gadget ownership over time (2006–2012),” http://www.pewinternet.org/static-pages/trend-data-(adults)/device-ownership.aspx (accessed june 14, 2012); the horizon report: 2009 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2010 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2011 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012). 9. pew internet, “adult gadget ownership.” 10. deborah keyek-franssen et al., computer labs study, university of colorado boulder office of information technology, october 7, 2011, http://oit.colorado.edu/sites/default/files/labsstudy-penultimate-10-07-11.pdf (accessed june 15, 2012). 11. j. s. c. tom, k. voss, and c. scheetz, “the space is the message: first assessment of a learning studio,” educause quarterly 31, no. 2 (2008), http://www.educause.edu/ero/article/space-message-first-assessment-learning-studio (accessed june 25, 2012). 12. nick wingfield, “time to leave the laptop behind,” wall street journal, february 23, 2009, http://online.wsj.com/article/sb122477763884262815.html (accessed june 15, 2012). 13. the horizon report: 2009 edition. 14. the horizon report: 2010 edition. 15. the horizon report: 2011 edition. 16. alison j. head and michael b. eisenberg, “balancing act: how college students manage technology while in the library during crunch time,” project information literacy research report, information school, university of washington, october 12, 2011, http://projectinfolit.org/pdfs/pil_fall2011_techstudy_fullreport1.1.pdf (accessed june 14, 2012). 17. pew internet, “adult gadget ownership.” 18. smith and caruso, ecar study.

content-based information retrieval and digital libraries

this paper discusses the applications and importance of content-based information retrieval technology in digital libraries. it generalizes the process and analyzes current examples in four areas of the technology. content-based information retrieval has been shown to be an effective way to search for the type of multimedia documents that are increasingly stored in digital libraries. as a good complement to traditional text-based information retrieval technology, content-based information retrieval will be a significant trend for the development of digital libraries. with several decades of their development, digital libraries are no longer a myth.
in fact, some general digital libraries such as the national science digital library (nsdl) and the internet public library are widely known and used. the advance of computer technology makes it possible to include a colossal amount of information in various formats in a digital library. in addition to traditional text-based documents such as books and articles, other types of materials—including images, audio, and video—can also be easily digitized and stored. therefore, how to retrieve and present this multimedia information effectively through the interface of a digital library becomes a significant research topic. currently, there are three methods of retrieving information in a digital library. the first and the easiest way is free browsing. by this means, a user browses through a collection and looks for desired information. the second method—the most popular technique used today—is textbased retrieval. through this method, textual information (full text of text-based documents and/or metadata of multimedia documents) is indexed so that a user can search the digital library by using keywords or controlled terms. the third method is content-based retrieval, which enables a user to search multimedia information in terms of the actual content of image, audio, or video (marques and furht 2002). some content features that have been studied so far include color, texture, size, shape, motion, and pitch. while some may argue that text-based retrieval techniques are good enough to locate desired multimedia information, as long as it is assigned proper metadata or tags, words are not sufficient to describe what is sometimes in a human’s mind. imagine a few examples: a patron comes to a public library with a picture of a rare insect. without expertise in entomology, the librarian won’t know where to start if only a text-based information retrieval system is available. however, with the help of content-based image retrieval, the librarian can upload the digitized image of the insect to an online digital image library of insects, and the system will retrieve similar images with detailed description of this insect. similarly, a patron has a segment of music audio, about which he or she knows nothing but wants to find out more. by using the content-based audio retrieval system, the patron can get similar audio clips with detailed information from a digital music library, and then listen to them to find an exact match. this procedure will be much easier than doing a search on a text-based music search system. it is definitely helpful if a user can search this non-textual information by styles and features. in addition, the advance of the world wide web brings some new challenges to traditional text-based information retrieval. while today’s web-based digital libraries can be accessed around the world, users with different language and cultural backgrounds may not be able to do effective keyword searches of these libraries. content-based information retrieval techniques will increase the accessibility of these digital libraries greatly, and this is probably a major reason it has become a hot research area in the past decade. ideally, a content-based information retrieval system can understand the multimedia data semantically, such as its objects and categories to which it belongs. therefore, a user is able to submit semantic queries and retrieve matched results. however, a great difficulty in the current computer technology is to extract high-level or semantic features of multimedia information. 
most projects still focus on lower-level features, such as color, texture, and shape. simply put, a typical content-based information retrieval system works in this way: first, for each multimedia file in the database, certain feature information (e.g., color, motion, or pitch) is extracted, indexed, and stored. second, when a user composes a query, the feature information of the query is calculated as vectors. finally, the system compares the similarity between the feature vectors of the query and multimedia data, and retrieves the best matching records. if the user is not satisfied with the retrieved records, he or she can refine the search results by selecting the most relevant ones to the search query, and repeat the search with the new information. this process is illustrated in figure 1. the following sections will examine some existing content-based information retrieval techniques for most common information formats (image, audio, and video) in digital libraries, as well as their limitations and trends. gary (gang) wan (gwan@tamu.edu) is a science librarian and assistant professor, and zao liu (zliu@tamu.edu) is a distance learning librarian and assistant professor at sterling c. evans library, texas a&m university, college station, texas. gary (gang) wan and zao liu 42 information technology and libraries | march 200842 information technology and libraries | march 2008 ■ content-based image retrieval there have been a large number of different contentbased image retrieval (cbir) systems proposed in the last few years, either building on prior work or exploring novel directions. one similarity among these systems is that most perform feature extraction as the first step in the process, obtaining global image features such as color, shape, and texture (datta et al., 2005). one of the most well-known cbir systems is query by image content (qbic), which was developed by ibm. it uses several different features, including color, sketches, texture, shape, and example images to retrieve images from image and video databases. since its launch in 1995, the qbic model has been employed for quite a few digital libraries or collections. one recent adopter is the state hermitage museum in russia (www.hermitage. ru), which uses qbic for its web-based digital image collection. users can find artwork images by selecting colors from a palette or by sketching shapes on a canvas. the user can also refine existing search results by requesting all artwork images with similar visual attributes. the following screenshots demonstrate how a user can do a content-based image search with qbic technology. in figure 2.1, the user chooses a color from the palette and composes the color schema of artwork he or she is looking for. figure 2.2 shows the artwork images that match the query schema. another example of digital libraries or collections that have incorporated cbir technology is the national science foundation’s international digital library project (www.memorynet.org), a project that is composed of several image collections. the information retrieval system for these collections includes both a traditional text-based search engine and a cbir system called simplicity (semantics-sensitive integrated matching for picture libraries) developed by wang et al. (2001) of pennsylvania state university. from the front page of these image collections, a user can choose to display a random group of images (figure 3.1). 
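the matching step that these systems share, comparing a query feature vector against stored vectors and ranking the results by similarity, can be illustrated with a short sketch. the following php fragment is not taken from qbic, simplicity, or any other system discussed here; it is a minimal, hypothetical example that ranks stored color-histogram vectors by euclidean distance to a query vector.

<?php
// minimal, hypothetical illustration of the matching step described above:
// each stored image is represented by a feature vector (here, a normalized
// color histogram); the query vector is compared against every stored
// vector and the closest matches are returned first.
function euclidean_distance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $diff = $value - $b[$i];
        $sum += $diff * $diff;
    }
    return sqrt($sum);
}

// hypothetical feature index: image id => feature vector
$index = [
    'img001' => [0.10, 0.40, 0.50],
    'img002' => [0.60, 0.20, 0.20],
    'img003' => [0.15, 0.35, 0.50],
];

$query = [0.12, 0.38, 0.50];   // vector extracted from the query image

// score every stored vector against the query and sort by distance
$scores = [];
foreach ($index as $id => $vector) {
    $scores[$id] = euclidean_distance($query, $vector);
}
asort($scores);

// the best-matching records come first; relevance feedback would simply
// rerun this ranking with a revised query vector
print_r(array_slice($scores, 0, 2, true));

in a real system the vectors are much longer and the comparison is backed by an index rather than a linear scan, but the ranking idea is the same.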
figure 1. the general process of content-based information retrieval
figure 2.1. a user query
figure 2.2. the search results for this query
below each image is a "similar" button; clicking this allows the user to view a group of images that contain similar objects to the previously selected one (figure 3.2). by providing feedback to the search engine this way, the user can find images of desired objects without knowing their names or descriptions. simply put, simplicity segments each image into small regions, extracts several features (such as color, location, and shape) from these small regions, and classifies these regions into some semantic categories (such as textured/nontextured and graph/photograph). when computing the similarity between the query image and images in the database, all these features will be considered and integrated, and the best matching results will be retrieved (wang et al., 2001).
figure 3.1. a group of random images in the collection
figure 3.2. cbir results
similar applications of cbir technology in digital libraries include the university of california–berkeley's digital library project (http://bnhm.berkeley.edu), the national stem digital library (ongoing), and virginia tech's anthropology digital library, etana (ongoing). while these feature-based approaches have been explored over the years, an emerging new research direction in cbir is automatic concept recognition and annotation. ideally, automatic concept recognition and annotation can discover the concepts that an image conveys and assign a set of metadata to it, thus allowing image search through the use of text. a trusted automatic concept recognition and annotation system can be a good solution for large data sets. however, the semantic gap between computer processors and human brains remains the major challenge in the development of a robust automatic concept recognition and annotation system (datta et al., 2005). a recent example of efforts in this field is li and wang's alipr (automatic linguistic indexing of pictures—real time, http://alipr.com) project (2006). through a web interface, users are able to search images in several different ways: they may do text searches and provide feedback to the system to find similar images. users may also upload an image, and the system will perform concept analysis and generate a set of annotations or tags automatically, as shown in figure 4. the system then retrieves images from the database that are visually similar to the uploaded image. in the process of automatic annotation, if the user doesn't think the tags given by the system are suitable, he or she can input other tags to describe the image. this is also the "training" process for the alipr system.
figure 4. alipr's automatic annotation feature
since cbir is the major research area and has the longest history in content-based information retrieval, there are many models, products, and ongoing projects in addition to the above examples. as image collections become a significant part of digital libraries, more attention has been paid to the possibilities of providing content-based image search as a complement to existing metadata search.
■ content-based audio retrieval
compared with cbir, content-based audio retrieval (cbar) is relatively new, and fewer research projects on it can be found. in general, existing cbar approaches start from the content analysis of audio clips. an example of this content analysis is extracting basic audio elements, such as duration, pitch, amplitude, brightness, and bandwidth (wold et al., 1996). because of the great difficulties in recognizing audio content, research in this area is less mature than that in content-based image and video retrieval. although no cbar system has been found to be implemented by any digital library so far, quite a few projects provide good prototypes or directions. one good example is zhang and kuo's (2001) research project on audio classification and retrieval. the prototype system is composed of three stages: coarse-level audio segmentation, fine-level classification, and audio retrieval. in the first stage, audio signals are semantically segmented and classified into several basic types, including speech, music, song, speech with music background, environmental sounds, and silence. some physical audio features—such as the energy function, the fundamental frequency, and the spectral peak tracks—are examined in this stage. in the second stage, further classification is conducted for every basic type. features are extracted from the time-frequency representation of audio signals to reveal subtle differences of timbre and pattern among different classes of sounds. based on these differences, the coarse-level segments obtained in stage one can be classified into narrower categories. for example, speech can be differentiated into the voices of men, women, and children. finally, in the information retrieval stage, two approaches—query-by-keyword and query-by-example—are employed. the query-by-keyword approach is more like the traditional text-based search system. the query-by-example approach is similar to content-based image retrieval systems: where an image can be searched by color, texture, and histogram, audio clips can be retrieved by distinct features such as timbre, pitch, and rhythm. this way, a user may choose from a given list of features, listen to the retrieved samples, and modify the input feature set to get more desired results. zhang and kuo's prototype is a very typical and classic cbar system. it is relatively mature and can be used by large digital audio libraries. more recently, li et al. (2003) proposed a new feature extraction method particularly for music genre classification, named daubechies wavelet coefficient histograms (dwchs). dwchs capture the local and global information of music signals simultaneously by computing their histograms. similar to other cbar strategies, this method divides the process of music genre classification into two steps: feature extraction and multi-class classification. the signal information representing the music is extracted first, and then an algorithm is used to identify the labels from the representation of the music sounds with respect to their features. since the decomposition of an audio signal can produce a set of subband signals at different frequencies corresponding to different characteristics, li et al. (2003) proposed a new methodology, the dwchs algorithm, for feature extraction. with this algorithm, the decomposition of the music signals is obtained at the beginning, and then a histogram of each subband is constructed. hence, the energy for each subband is computed, and the characteristics of the music are represented by these subbands.
one finding from this research reveals that this methodology, along with advanced machine learning techniques, has significantly improved accuracy of music genre classification (li et al. 2003). therefore, this methodology potentially can be used by those digital music libraries widely developed in past several years. ■ content-based video retrieval content-based video retrieval (cbvr) is a more recent research topic than cbir and cbar, partly because the digitization technology for video appeared later than those for image and audio. as digital video websites such as youtube and google video become more popular, how to retrieve desired video clips effectively is a great concern. searching by some features of video, such as motion and texture, can be a good complement to the traditional text-based search method. one of the earliest examples is the videoq system developed by chang et al. (1997) of columbia university (www.ctr.columbia.edu/videoq), which allows a user to search video based on a rich set of visual features and spatio-temporal relationships. the video clips in the database are stored as mpeg files. through a web interface, the user can formulate a query scene as a collection of objects with different attributes, including motion, shape, color, and texture. once the user has formulated the query, it is sent to a query server, which contains several databases for different content features. on the query server, the similarities between the features of each object specified in the query and those of the objects in the database are computed; a list of video clips is then retrieved based on their similarity values. for each of these video clips, key-frames are dynamically extracted from the video database and returned to browser. the matched objects are highlighted in the returned key-frame. the user can interactively view these matched video clips by simply clicking on the keyframe. meanwhile, the video clip corresponding to that key-frame is extracted from the video database (chang et al. 1997). figures 5.1–5.2 show an example of a visual search through the videoq system. many other cbvr projects also examine these content features and try to find more efficient ways to retrieve data. a recent example is wang et al.’s (2006) project, vferret, a content-based similarity search tool for continuous archived video. the vferret system segments video data into clips and extracts both visual and audio features as metadata. then a user can do a metadata search or article title | author 45content-based information retrieval and digital libraries | wan and liu 45 content-based search to retrieve desired video clips. in the first stage, a simple segmentation method is used to split the archived digital video into five-minute video clips. the system then extracts twenty image frames evenly from each of these five-minute video clips for visual feature extraction. additionally, the system splits the audio channel of each clip into twenty individual fifteensecond segments for further audio feature extraction. in the second stage, both audio and visual features are extracted. for visual features, the color element is used as the content feature. for audio features, 154 audio features originally used by ellis and lee (2004) to describe audio segments are computed. for each fifteen-second video segment, the visual feature vector extracted from the sample image and the audio feature vector extracted from the corresponding audio segment are combined into a single feature vector. 
in the information retrieval stage, the user first submits a video clip query; its feature vector is then computed and compared with those of the video clips in the database, and the most similar clips are retrieved (wang et al. 2006). similar projects in this area include carnegie mellon university's informedia digital video library (www.informedia.cs.cmu.edu) and muvis of finland's tampere university of technology (http://muvis.cs.tut.fi/index.html).
figure 5.1. the user composes a query
figure 5.2. search results for the sample query
■ content-based information retrieval for other digital formats
with the advance of digitization technology, the content and formats of digital libraries are much richer than before. they are not limited to text, image, audio, and video. some new formats of digital content are emerging. digital libraries of 3-d objects are good examples. since 3-d models have arbitrary topologies and cannot be easily "parameterized" using a standard template, as in the case of 2-d forms (bustos et al. 2005), content-based 3-d model retrieval is a more challenging research topic than retrieval of the other multimedia formats discussed earlier. so far, four types of solutions—primitive-based, statistics-based, geometry-based, and view-based—have been found (bimbo and pala 2006). primitive-based solutions represent 3-d objects with a basic set of parameterized primitive elements. parameters are used to control the shape of each primitive element and to fit each primitive element with a part of the model. with statistics-based approaches, shape descriptions based on statistical models are created and measured. geometry-based methods, however, use geometric properties of the 3-d object and their measures as global shape descriptors. for view-based solutions, a set of 2-d views of the model and descriptors of their content are used to represent the 3-d object shape (bimbo and pala 2006). another novel example is moustakas et al.'s (2005) project on 3-d model search using sketches. in the experimental system, the vector of geometrical descriptors for each 3-d model is calculated during the feature extraction stage. in the retrieval stage, a user can initially use one of the sketching interfaces (such as the virtual reality interface or an air mouse) to sketch a 2-d contour of the desired 3-d object. the 2-d shape is recognized by the system, and a sample primitive is automatically inserted in the scene. next, the user defines other elements that cannot be described by the 2-d contour, such as the height of the object, and manipulates the 2-d contour until it reaches its target position. the final query is formed after all the primitives are inserted. finally, the system computes the similarities between the query model and each 3-d model in the database, and renders the best matching records. an online demonstration can be found for a european project specifically designed for a 3-d digital museum collection, sculpteur (www.sculpteurweb.org). from its web-based search interface, a user can choose to do a metadata search or content-based search for a 3-d object. the search strategy here is somewhat similar to that in some cbir systems: the user can upload a 3-d model in vrml format, then select a search algorithm (such as similar color, texture, etc.) to perform a search within a digital collection of 3-d models.
as 3-d computer visualization has been widely used in a variety of areas, there are more research projects focusing on the content-based information retrieval techniques for this new multimedia format. ■ conclusion there is no doubt that content-based information retrieval technology is an emerging trend for digital library development and will be an important complement to the traditional text-based retrieval technology. the ideal cbir system can semantically understand the information in a digital library, and render users the most desirable data. however, the machine understanding of semantic information still remains to be a great difficulty. therefore, most current research projects, including those discussed in this paper, deal with the understanding and retrieval of lower-level features or physical features of multimedia content. certainly, as related disciplines such as computer vision and artificial intelligence keep developing, more researches will be done on higher-level feature-based retrieval. in addition, the growing varieties of multimedia content in digital libraries have also brought many new challenges. for instance, 3-d models now become important components of many digital libraries and museums. content-based retrieval technology can be a good direction for this type of content, since the shapes of these 3-d objects are often found more effectively if the user can compose the query visually. new cbir approaches need to be developed for these novel formats. furthermore, most cbir projects today tend to be web-based. by contrast, many project were based on client applications in the 1990s. these web-based cbir tools will have significant influence on digital libraries or repositories, as most of them are also web-based. particularly in the age of web 2.0, some large digital repositories—such as flickr for images and youtube and google video for video—are changing people’s daily lives. the implementation of cbir will be a great benefit to millions of users. since the nature of cbir is to provide better search aids to end users, it is extremely important to focus on the actual user’s needs and how well the user can use these new search tools. it is surprising to find that little usability testing has been done for most cbir projects. such testing should be incorporated into future cbir research before it is widely adopted. bibliography bimbo, a. and p. pala. 2006. content-based retrieval of 3-d models. acm transactions on multimedia computing, communications, and applications 2, no. 1: 20–43. bustos, b., et al. 2005. feature-based similarity search in 3-d object databases. acm computing surveys 37, no. 4: 345–387. chang, s., et al. 1997). videoq: an automated content based video search system using visual cues. in proceedings of the 5th acm international conference on multimedia, e. p. glinert, et al., eds. new york: acm. datta r., et al. 2005. content-based image retrieval: approaches and trends of the new age. in proceedings of the 7th international workshop on multimedia information retrieval, in conjunction with acm international conference on multimedia, h. zhang, , j. smith, and q. tian, eds. new york: acm. ellis, d. and k. lee. minimal-impact audio-based personal archives. in proceedings of the 1st acm workshop on continuous archival and retrieval of personal experiences carpe, j. gemmell, et al., eds. new york: acm. li, t., et al. 2003. a comparative study on content-based music genre classification. 
in proceedings of the 26th annual international acm sigir conference on research and development in information retrieval, c. clarke, et al., eds. new york: acm.
li, j. and j. wang. 2006. real-time computerized annotation of pictures. in proceedings of the 14th annual acm international conference on multimedia, k. nahrstedt, et al., eds. new york: acm.
marques, o. and b. furht. 2002. content-based image and video retrieval. norwell, mass.: kluwer.
moustakas, k., et al. 2005. master-piece: a multimodal (gesture+speech) interface for 3d model search and retrieval integrated in a virtual assembly application. proceedings of the enterface: 62–75.
wang, j., et al. 2001. simplicity: semantics-sensitive integrated matching for picture libraries. ieee trans. pattern analysis and machine intelligence 23, no. 9: 947–963.
wang, z., et al. 2006. vferret: content-based similarity search tool for continuous archived video. in proceedings of the 3rd acm workshop on continuous archival and retrieval of personal experiences, k. maze et al., eds. new york: acm.
wold, e., et al. 1996. content-based classification, search, and retrieval of audio. ieee multimedia 3, no. 3: 27–36.
zhang, t. and c. kuo. 2001. content-based audio classification and retrieval for audiovisual data parsing. norwell, mass.: kluwer.
trends at a glance: a management dashboard of library statistics
emily morton-owens and karen l.
hanson information technology and libraries | september 2012 36 abstract systems librarians at an academic medical library created a management data dashboard. charts were designed using best practices for data visualization and dashboard layout, and include metrics on gatecount, website visits, instant message reference chats, circulation, and interlibrary loan volume and turnaround time. several charts draw on ezproxy log data that has been analyzed and linked to other databases to reveal use by different academic departments and user roles (such as faculty or student). most charts are bar charts and include a linear regression trend line. the implementation uses perl scripts to retrieve data from eight different sources and add it to a mysql data warehouse, from which php/javascript webpages use google chart tools to create the dashboard charts. introduction new york university health sciences libraries (nyuhsl) had adopted a number of systems that were either open-source, home-grown, or that offered apis of one sort or another. examples include drupal, google analytics, and a home-grown interlibrary loan (ill) system. systems librarians decided to capitalize on the availability of this data by designing a system that would give library management a single, continuously self-updating point of access to monitor a variety of metrics. previously this kind of information had been assembled annually for surveys like aahsl and arl. 1 the layout and scope of the dashboard was influenced by google analytics and a beta dashboard project at brown.2 the dashboard enables closer scrutiny of trends in library use, ideally resulting in a more agile response to problems and opportunities. it allows decisions and trade-offs to be based on concrete data rather than impressions, and it documents the library’s service to its user community, which is important in a challenging budget climate. although the end product builds on a long list of technologies—especially perl, mysql, php, javascript, and google chart tools—the design of the project is lightweight and simple, and the number of lines of code required to power it is remarkably small. further, the design is modular. this means that nyuhsl could offer customized versions for staff in different roles, restricting the display to show only data that is relevant to the individual’s work. because most libraries have a unique combination of technologies in place to handle functions like circulation, reference questions, circulation, and so forth, a one-size-fits-all software package that emily morton-owens (emily.morton.owens@gmail.com) was web services librarian and karen hanson (karen.hanson@med.nyu.edu) is knowledge systems librarian, new york university health sciences libraries, new york. trends at a glance: a management dashboard of library statistics | morton-owens and hanson 37 could be used by any library may not be feasible. instead, this lightweight and modular approach could be re-created relatively easily to fit local circumstances and needs. visual design principles in designing the dashboard, we tried to use some best practices for data visualization and assembling charts into a dashboard. the best-known authority on data visualization, edward tufte, states “above all else, show the data.”3 in part, this means minimizing distractions, such as unnecessary gridlines and playful graphics. ideally, every dot of ink on the page would represent data. 
he also emphasizes truthful proportions, meaning the chart should be proportional to the actual measurements.4 a chart should display data from zero to the highest quantity, not arbitrarily starting the measurements at a higher number, because that distorts the proportions between the part and the whole. a chart also should not use graphics that differ in width as well as length, because that causes the area of the graphic to increase incorrectly, as opposed to simply the length increasing. pie charts are popular chart types that have serious problems in this respect despite their popularity; they require users to judge the relative area of the slices, which is difficult to do accurately.5 generally, it is better to use a bar chart with different length bars whose proportions users can judge better. color should also be used judiciously. some designers use too many colors for artistic effect, which creates a “visual puzzle”6 as the user wonders whether the colors carry meaning. some colors stand out more than others and should be used with caution. for example, red is often associated with something urgent or negative, so it should only be used in appropriate contexts. duller, less saturated colors are more appropriate for many data visualizations. a contrasting style is exemplified by nigel holmes, who designs charts and infographics with playful visual elements. a recent study compared the participants’ reactions to holmes’ work with plain charts of the same data.7 there was no significant difference in comprehension or shortterm memorability; however, the researchers found that the embellished charts were more memorable over the long term, as well as more enjoyable to look at. that said, holmes’ style is most appropriate for charts that are trying to drive home a certain interpretation. in the case of the dashboard, we did not want to make any specific point, nor did we have any way of knowing in advance what the data would reveal, so we used tufte’s principles in our design. a comparable authority on dashboard design is stephen few. a dashboard combines multiple data displays in a single point of access. as in the most familiar example, a car dashboard, it usually has to do with controlling or monitoring something without taking your focus from the main task.8 a dashboard should be simple and visual, not requiring the user to tune out extraneous information or interpret novel chart concepts. the goal is not to offer a lookup table of precise values. the user should be able to get the idea without reading too much text or having to think information technology and libraries | september 2012 38 too hard about what the graph represents. thinking again of a car, its speedometer does not offer a historical analysis of speed variation because this is too much data to process while the car is moving. similarly, the dashboard should ideally fit on one screen so that it can be taken in at a glance. if this is not possible, at least all of the individual charts should be presented intact, without scrolling or being cramped in ways that distort the data. a dashboard should present data dimensions that are dynamic. the user will refer to the dashboard frequently, so presenting data that does not change over time only takes up space. better yet, the data should be presented alongside a benchmark or goal. a benchmark may be a historical value for the same metric or perhaps a competitor’s value. a goal is an intended future value that may or may not ever have been reached. 
either way, including this alternate value gives context for whether the current performance is desirable. this is essential for making the dashboard into a decision-making tool. nils rasmussen et al. discuss three levels of dashboards: strategic, tactical (related to progress on a specific project), and operational (related to everyday, department-level processes). 9 so far, nyuhsl’s dashboard is primarily operational, monitoring whether ordinary work is proceeding as planned. later in this paper we will discuss ways to make the dashboard better suited to supporting strategic initiatives. system architecture the dashboard architecture consists of three main parts: importer scripts that get data from diverse sources, a data warehouse, and php/javascript scripts that display the data. the data warehouse is a simple mysql database; the term “warehouse” refers to the fact that it contains a stripped-down, simplified version of the data that is appropriate for analysis rather than operations. our approach to handling the data is an etl (extract, transform, load) routine. data are extracted from different sources, transformed in various ways, and loaded into the data warehouse. our data transformations include reducing granularity and enriching the data using details drawn from other datasets, such as the institutional list of ip ranges and their corresponding departments. data rarely change once in the warehouse because they represent historical measurements, not open transactions.10 there is an importer script customized for each data source. the data sources differ in format and location. for example, google analytics is a remote data source with a unique data export api, the ill data are in a local mysql database, and libraryh3lp has remote csv log files. the scripts run automatically via a cron job at 2a.m. and retrieves data for the previous day. that time was chosen to ensure all other nightly cron jobs that affect the databases are complete before the dashboard imports start. each uses custom code for its data source and creates a series of mysql insert queries to put the needed data fields in the mysql data warehouse. for example, a script might pull the dates when an ill request was placed and filled, but not the title of the requested item. trends at a glance: a management dashboard of library statistics | morton-owens and hanson 39 a carefully thought-out data model simplifies the creation of reports. the data structure should aim to support future expansion. in the data warehouse, information that was previously formatted and stored in very inconsistent ways is brought together uniformly. there is one table for each kind of data with consistent field names for dates, services, and so forth, and others that combine related data in useful ways. the dashboard display consists of a number of widgets, one for each chart. each chart is created with a mixture of php and javascript. google chart tools interprets lines of javascript to draw an attractive, proportional chart. we do not want to hardcode the values in this javascript, of course, because the charts should be dynamic. therefore we use php to query the data warehouse and a statement for each line of results to “write” a line of the data in javascript. figure 1. php is used to read from the database and generate rows of data as server-side javascript. each php/javascript file created through this process is embedded in a master php page. 
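a minimal sketch of one such widget follows. it is not the authors' production code; the connection details, the weekly_visits table, and its columns are assumptions. it shows the pattern from figure 1, php querying the warehouse and writing the result rows out as javascript, together with the least-squares trend line calculation described in the next paragraphs.

<?php
// hypothetical widget: weekly visit counts drawn with google chart tools.
// database credentials and the weekly_visits table/columns are assumptions,
// not the authors' schema.
$db = new PDO('mysql:host=localhost;dbname=warehouse', 'dashboard', 'secret');
$rows = $db->query(
    'SELECT week_start, SUM(visits) AS visits
       FROM weekly_visits
      GROUP BY week_start
      ORDER BY week_start
      LIMIT 26'
)->fetchAll(PDO::FETCH_ASSOC);

// accumulate the sums needed for the least-squares trend line (y = mx + b),
// excluding the 26th (partial) week as the article describes
$n = min(count($rows), 25);
$sx = $sy = $sxy = $sxx = 0;
for ($x = 0; $x < $n; $x++) {
    $y = (int) $rows[$x]['visits'];
    $sx += $x; $sy += $y; $sxy += $x * $y; $sxx += $x * $x;
}
$m = ($n * $sxx - $sx * $sx) ? ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx * $sx) : 0;
$b = $n ? ($sy - $m * $sx) / $n : 0;

// the trend line is drawn between week zero and week twenty-five and
// colored according to the slope
$trendColor = ($m >= 0) ? '#109618' : '#dc3912';
?>
<script type="text/javascript">
  // php "writes" one javascript row per result line, as in figure 1
  var chartRows = [
    ['week', 'visits'],
<?php foreach ($rows as $r): ?>
    ['<?php echo htmlspecialchars($r['week_start']); ?>', <?php echo (int) $r['visits']; ?>],
<?php endforeach; ?>
  ];
  var trendEndpoints = [[0, <?php echo round($b, 1); ?>],
                        [25, <?php echo round($m * 25 + $b, 1); ?>]];
  var trendColor = '<?php echo $trendColor; ?>';
  // chartRows can be passed to google.visualization.arrayToDataTable() and
  // drawn as a column chart; trendEndpoints supply the overlaid trend line.
</script>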
this master page controls the order and layout of the individual widgets using the php include feature to add each chart file to the page plus a css stylesheet to determine the spacing of the charts. finally, because all the queries take a relatively long time to run, the page is cached and refreshes itself the first time the page is opened each day. the dashboard can be refreshed manually if the database or code is modified and someone wants to see the results immediately. many of the dashboard’s charts include a linear regression trend line. this feature is not provided by google charts and must be inserted into the widget’s code manually. the formula can be found online.11 the sums and sums of squares are totted up as the code loops through each line of data, and these totals are used to calculate the slope and intercept. in our twenty-six-week displays, we never want to include the twenty-sixth week of data because that is the present (partial) week. the linear regression line takes the form y = mx + b. we can use that formula along with the slope and intercept values to calculate y-values for week zero and the next-to-last week (week twentyfive). those two points are plotted and the trend line is drawn between them. the color of the line depends on its slope (greater or less than zero). depending on whether we want that chart’s metric to go up or down, the line is green for the desirable direction and red for the undesirable direction. information technology and libraries | september 2012 40 details on individual systems gatecount most of nyuhsl’s five locations have electronic gates to track the number of patrons who visit. formerly these statistics were kept in a microsoft excel spreadsheet, but now there is a simple web form into which staff can enter the gate reading twice daily. the data goes directly into the data warehouse, and the a.m. and p.m. counts are automatically summed. there is some errorchecking to prevent incorrect numbers being entered, which varies depending on whether that location’s gate is the kind that provides a continuously increasing count or is reset each day. the data are presented in a stacked bar chart, summed for the week. the user can hover over the stacked bars to see numbers for each location, but the height of the stacked bar and the trend line represent the total visits for all locations together. figure 2. stacked bar chart with trendline showing visits per week to pphysical library branches over a twenty-six-week period ticketing nyuhsl manages online user requests with a simple ticketing system that integrates with drupal. there are four types of tickets, two of which involve helping users and two of which involve reporting problems. the “helpful” services are general reference questions and literature search requests. the “trouble” services are computer problems and e-resource problems. these two pairs trends at a glance: a management dashboard of library statistics | morton-owens and hanson 41 each have their own stacked bar chart because, ideally, the number of “helpful” tickets would go up while the number of “trouble” tickets would go down. each chart has a trend line, color-coded for the direction that is desirable in that case. figure 3. stacked bar chart with trendline showing trouble tickets by type the script that imports this information into the data warehouse simply does so from another local mysql database. it only fetches the date and the type of request, not the actual question or response. 
it also inserts a record into the user transactions table, which will be discussed in the section on user data. drupal nyuhsl’s drupal site allows librarians directly to contribute content like subject guides and blog posts.12 the dashboard tracks the number of edits contributed by users (excluding the web services librarian and the web manager, who would otherwise swamp the results). this is done with a simple count query on the node_revisions table in the drupal database. because no other processing is needed and caching ensures the query will be done at most once per day, this is the only widget that pulls data directly from the original database at the time the chart is drawn. koha koha is an open-source opac system. at nyuhsl, koha’s database is in mysql. each night the importer script copies “issues” data from koha’s statistics table. this supports the creation of a information technology and libraries | september 2012 42 stacked bar chart showing the number of item checkouts each week, with each bar divided according to the type of item borrowed (e.g., book or laptop). as with other charts, a color-coded trend line was added to show the change in the number of item checkouts. google analytics the dashboard relies on the google analytics php interface (gapi) to retrieve data using the google analytics data export api.13 nothing is stored in the data warehouse and there is no importer script. the first widget gets and displays weekly total visits for all nyuhsl websites, the main nyuhsl website, and visits from mobile devices. a trend line is calculated from the “all sites” count. the second widget retrieves a list of the top “outbound click” events for the past thirty days and returns them as urls. a regular expression is used to remove any ezproxy prefix, and the remaining url is matched against our electronic resources database to get the title. thus, for example, the widget displays “web of knowledge” instead of “http://ezproxy.med.nyu.edu/login?url=http://apps.isiknowledge.com/.” a future improvement to this display would require a new table in the data warehouse and importer script to store historic outbound click results. this data would support comparison of the current list with past results to identify click destinations that are trending up or down. figure 4. most popular links clicked on to leave the library’s website in a thirty-day period trends at a glance: a management dashboard of library statistics | morton-owens and hanson 43 libraryh3lp libraryh3lp is a jabber-based im product that allows librarians to jointly manage a queue of reference queries. it offers csv-formatted log files that a perl script can access using “curl,” a command-line tool that mimics a web browser’s login, cookies, and file requests. the csv log is downloaded via curl, processed with perl’s text::csv module, and the data are then inserted into the warehouse. the first libraryh3lp widget counts the number of chats handled by each librarian over the past ninety days. the second widget tracks the number of chats for the past twenty-six weeks and includes a trend line. figure 5. bar chart showing number of im chats per week over a twenty-six-week period document delivery services the document delivery services (dds) department fulfills ill requests. the web application that manages these requests is homegrown, with a database in mysql. each night, a script copies the latest requests to the data warehouse. 
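the url cleanup behind that display can be sketched as follows. the regular expression is based on the example url above; the eresources table, its columns, and the lookup-by-prefix approach are assumptions rather than the actual nyuhsl schema.

<?php
// strip an ezproxy prefix from an outbound-click url and look up a
// human-readable title for it. the eresources table and its columns
// are hypothetical.
function resolveOutboundClick(PDO $db, string $url): string
{
    // remove the proxy login prefix, e.g.
    // http://ezproxy.med.nyu.edu/login?url=http://apps.isiknowledge.com/
    $clean = preg_replace(
        '#^https?://ezproxy\.med\.nyu\.edu/login\?url=#i',
        '',
        $url
    );

    // match the remaining url against the e-resources database
    $stmt = $db->prepare(
        'SELECT title FROM eresources WHERE ? LIKE CONCAT(url_prefix, "%") LIMIT 1'
    );
    $stmt->execute([$clean]);
    $title = $stmt->fetchColumn();

    // fall back to the cleaned url when no title is found
    return $title !== false ? $title : $clean;
}

// example: displays "web of knowledge" if the eresources table maps
// apps.isiknowledge.com to that title
// echo resolveOutboundClick($db, 'http://ezproxy.med.nyu.edu/login?url=http://apps.isiknowledge.com/');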
the dashboard uses this data to display a chart of how many requests are made each week and which publications are requested from other libraries most frequently. this data could be used to determine whether there are resources that should be considered for purchase. information technology and libraries | september 2012 44 the dds data was also used to demonstrate how data might be used to track service performance. one chart shows the average time it takes to fulfill a document request. further evaluation is required to determine the usefulness of such a chart for motivating improvement of the service or whether this is perceived as a negative use of the data. some libraries may find this kind of information useful for streamlining services. figure 6. this stacked bar chart shows the number of document delivery requests handled per week. the chart separates patron requests from requests made by other libraries. ezproxy data ezproxy is an oclc tool for authenticating users who attempt to access the library’s electronic resources. it does not log e-resource use where the user is automatically authenticated using the institutional ip range, but the data are still valuable because it logs a significant amount of use that can support in-depth analysis. because of the gaps in the data, much of the analysis looks at patterns and relationships in the data rather than absolute values. karen coombs’ article discussing the analysis of ezproxy logs to understand e-resource at the department level provided the initial motivation to switch on the ezproxy log.14 when logging is enabled, a standard web log file is produced. here is a sample line from the log: 123.45.6.7 amyu0gh5brmuska hansok01 [09/sep/2011:18:25:23 -0500] post http://ovidsp.tx.ovid.com: 80/sp3.3.1a/ovidweb.cgi http/1.1 20020472 http://ovidsp.tx.ovid.com.ezproxy.med.nyu.edu/sp-3.3.1a/ovidweb.cgi trends at a glance: a management dashboard of library statistics | morton-owens and hanson 45 each line in the log contains a user ip address, a unique session id, the user id, the date and time of access, the url requested by the user, the http status code, the number of bytes in the requested file, and the referrer (the page the user clicked on to get to the site). the ezproxy log data undergoes some significant processing before being inserted into the ezproxy report tables. the main goal of this is to enrich the data with relevant supplemental information while eliminating redundancy. to facilitate this process, the importer script first dumps the entire log into a table and then performs multiple updates on the dataset. during the first step of processing, the ip addresses are compared to a list of departmental ip ranges maintained by medical center it. if a match is found, the “location accessed” is stored against the log line. next, the user id is compared with the institutional people database, retrieving a user type (faculty, staff, or student) and a department, if available (e.g., radiology). one item of significant interest to senior management is the level of use within hospitals. as a medical library, we are interested in the library’s value to patient care. if there is significant use in the hospitals, this could furnish positive evidence about the library’s role in the clinical setting. next, the resource url and the referring address are truncated down to domain names. the links in the log are very specific, showing detailed user activity. 
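a rough sketch of that first processing pass follows, splitting a log line into the fields listed above and attaching the location and user information. the two lookup functions are stubs standing in for the internal ip-range list and people database, which the article does not describe in detail.

<?php
// split one ezproxy log line into the fields listed above (ip, session id,
// user id, timestamp, method, url, protocol, status, bytes, referrer) and
// enrich it with location and user information.
function parse_ezproxy_line(string $line): ?array
{
    $pattern = '/^(\S+)\s+(\S+)\s+(\S+)\s+\[([^\]]+)\]\s+' .
               '(\S+)\s+(\S+)\s+(\S+)\s+(\d{3})\s+(\S+)\s+(\S+)/';
    if (!preg_match($pattern, $line, $m)) {
        return null;                               // skip malformed lines
    }
    return [
        'ip'       => $m[1],
        'session'  => $m[2],
        'user'     => $m[3],
        'accessed' => $m[4],                       // e.g. 09/sep/2011:18:25:23 -0500
        'url'      => $m[6],                       // requested resource
        'status'   => (int) $m[8],
        'referrer' => $m[10],
    ];
}

// stub: the real lookup checks the ip against departmental ranges from it
function lookup_location(string $ip): string { return 'unknown location'; }

// stub: the real lookup queries the institutional people database
function lookup_person(string $userId): array { return ['staff', 'unknown dept']; }

function enrich(array $hit): array
{
    $hit['location'] = lookup_location($hit['ip']);       // e.g. "tisch hospital"
    [$hit['user_type'], $hit['user_dept']] = lookup_person($hit['user']);
    return $hit;
}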
because the library is operating in a medical environment, privacy is a concern, and so specific addresses are truncated to a top-level domain (e.g., ovid.com) to suppress any tie to a specific article, e-book, or other specific resource. finally, a query is run against the remaining raw data to condense the log down to unique session id/resource combinations, and this block of data is inserted into a new table. each user visit to a unique resource in a single session is recorded; for example, if a user visits lexis nexis, ovid medline, scopus, and lexis nexis again in a single session, three lines will be recorded in the user activity table. a single line in the final ezproxy activity table contains a unique combination of location accessed (e.g., tisch hospital), user department (e.g., radiology), user type (e.g., staff), earliest access date/time for that resource (e.g., 9/9/2011 18:25), resource name (e.g., scopus.com), session id, and referring domain (e.g., hsl.med.nyu.edu). there is significant repetition in the log. depending on what filters are set up, every image within a webpage could be a line in the log. the method of condensing the data described previously results in a much smaller and more manageable dataset. for example, on a single day 115,070 rows were collected in the ezproxy log, but only 2,198 were inserted into the final warehouse table after truncating the urls and removing redundancy. in a separate query on the raw data table, a distinct list containing user id, date, and the word "eresources" is built and stored in a "user transactions" table. this very basic data is stored so that simple user analysis can be performed (see "user data" below).
figure 7. line chart showing total number of ezproxy sessions captured per week over a twenty-six-week period
once the ezproxy data are transferred to the appropriate tables, the raw data (and thus the most concerning data from a privacy standpoint) is purged from the database. several dashboard charts were created using the streamlined ezproxy data: a simple count of weekly e-resource users, and a table showing resources whose use changed most significantly since the previous month. it was challenging to calculate the significance of the variations in use, since resources that went from one session in a month to two sessions were showing the same proportional change as those that increased from one thousand to two thousand sessions. a basic calculation was created to highlight the more significant changes in use:
d = p - q
if d < 0 then significance = d - 8 × 10^(d/(q+1))
if d > 0 then significance = d + 8 × 10^(d/(q+1))
where d = difference between last month and this month, p = number of visits last month (8 to 1 days ago), and q = number of visits in the previous month (15 to 9 days ago).
this equation serves the basic purpose of identifying unusual changes in e-resource use. for example, one e-resource was shown trending up in use after a librarian taught a course in it.
figure 8. table of e-resources showing the most significant change in use over the last month compared to the previous month
the ezproxy data has already proven to be a rich source of data. the work so far has only scratched the surface of what the data could show.
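continuing the sketch started above, the truncation and de-duplication steps might look roughly like this; the field names follow the previous sketch and are assumptions, not the authors' code.

<?php
// reduce each enriched hit to a top-level domain and keep only one row per
// session/resource combination, producing rows for the ezproxy activity table.
function domain_of(string $url): string
{
    $host  = parse_url($url, PHP_URL_HOST) ?: $url;
    $parts = explode('.', $host);
    return implode('.', array_slice($parts, -2));   // ovidsp.tx.ovid.com -> ovid.com
}

function condense(array $hits): array
{
    $unique = [];
    foreach ($hits as $hit) {
        $resource = domain_of($hit['url']);
        $key = $hit['session'] . '|' . $resource;
        if (!isset($unique[$key])) {                // first use of this resource in the session
            $unique[$key] = [
                'session'   => $hit['session'],
                'resource'  => $resource,
                'referrer'  => domain_of($hit['referrer']),
                'location'  => $hit['location'],
                'user_type' => $hit['user_type'],
                'user_dept' => $hit['user_dept'],
                'accessed'  => $hit['accessed'],    // earliest access time for the resource
            ];
        }
    }
    return array_values($unique);
}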
only two charts are currently displayed on the dashboard, but the value of thisdata is more likely to come from one-off customized reports based on specific queries, like tracking use of individual resources over time or looking at variations of use within specific buildings, departments, or user types. there is also a lot that could be done with the referrer addresses. for example, the library has been submitting tips to the newsletter that is delivered by email. the referrer log allows the number of clicks from this source to be measured so that librarians can monitor the success of this marketing technique. user data each library system includes some user information. where user information is available in a system, a separate table is populated in the warehouse. as mentioned briefly above, a user id, a date, and the type of service used (e-resources, dds, literature search, etc.) is stored. details of the transaction are not kept here. the user id can be used to look up basic information about the user such as role (faculty, staff, student) and department. we should emphasize for clarity that the detailed information about the activity is completely separated from any information about the user so that the data cannot be joined back together. information technology and libraries | september 2012 48 the most sensitive data, such as the raw ezproxy log data, is purged after the import script has copied the truncated and de-identified data. even though the data stored is very basic, information at the granularity of individual users is never displayed on the dashboard. the user information is aggregated by user type for further analysis and display. the institutional people database can be used to determine how many people are in each department. a table added to the dashboard shows the number of resource uses and the percentage of each department that used library resources in a six-month period. some potential uses of this data include identifying possible training needs and measuring the success of library outreach to specific departments. for example, if one department uses the resources very little, this may indicate a training or marketing deficit. it may also be interesting to analyze how the academic success of a department aligns with library resource use. do the highest intensity users of library resources have greater professional output or higher prestige as a research department, for example? it is unsurprising to find that medical students and librarians are most likely to use library resources. the graduate medical education group is third and includes medical residents (newly qualified doctors on a learning curve). as with the ezproxy data, there are numerous insights to be gained from this data that will help the library make strategic decisions about future services. figure 9. table showing the proportion of each user group that has used at least one library service in a six-month period results trends at a glance: a management dashboard of library statistics | morton-owens and hanson 49 the dashboard has been available for almost a year. it requires a password and is only available to nyuhsl’s senior management team and librarians who have asked for access. feedback on the dashboard has been positive, and librarians have begun to make suggestions to improve its usefulness. one librarian uses the data warehouse for his own reports and will soon provide his queries so that they can be added to the dashboard. 
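the aggregation behind that table can be sketched as a single query over the user transactions table; the table and column names and the headcount figures below are assumptions, not nyuhsl's actual schema or numbers.

<?php
// for each user type, count how many distinct people used at least one
// library service in the past six months and express it as a percentage
// of the group's headcount (hard-coded example values here; in practice
// the headcounts would come from the institutional people database).
$db = new PDO('mysql:host=localhost;dbname=warehouse', 'dashboard', 'secret');

$active = $db->query(
    "SELECT user_type, COUNT(DISTINCT user_id) AS active_users
       FROM user_transactions
      WHERE transaction_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
      GROUP BY user_type"
)->fetchAll(PDO::FETCH_KEY_PAIR);

$headcounts = ['faculty' => 3200, 'staff' => 5100, 'student' => 730];  // example values

foreach ($headcounts as $group => $total) {
    $used = $active[$group] ?? 0;
    printf("%s: %d of %d (%.1f%%)\n",
           $group, $used, $total, $total ? 100 * $used / $total : 0);
}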
the dashboard has facilitated discoveries about the nature of our users and has identified potential training needs and areas of weakness in outreach. a static dashboard snapshot was recently created for presentation to the dean of the medical school to illustrate the extent and breadth of library use. the initial dashboard aimed to demonstrate the kinds of library statistics that it is possible to extract and display, but there is much to be done to improve its operational usefulness. a dashboard working group has been established to build on the original proof-of-concept by improving the data model and adding relevant charts. some charts will be incorporated into the public website as a snapshot of library activity. the dashboard was structured to be adaptable and expandable. the next iteration will support customization of the display for each user. new charts will be added as requested, and charts that are perceived to be less insightful will be removed. for example, one chart shows the number of reference chat requests answered by each librarian in addition to the number of chats handled per week. the usefulness of this chart was questioned when it was observed that the results were merely a reflection of which librarians had the most time at their own desks, allowing them to answer chats. this is an example of how it can be difficult to separate context from numbers. in this instance the individual statistics were only included because the data was available, not because of any particular request from management, so these charts may be removed from the dashboard. nyuhsl is also investigating the ex libris tool ustat, which supports analysis of counter (counting online usage of networked electronic resources) reports from e-resources vendors. ustat covers some of the larger gaps in the ezproxy log, including journal-level rather than vendor-level analysis and, most importantly, the use statistics for non-ezproxied addresses. a future project will be to see whether there is an automated way to extract use metrics, either from ustat or directly from the vendors, to be incorporated into the data warehouse. preliminary discussions are being held with it administrators about the possibility of ezproxying library resource urls as they pass through the firewall so that the ezproxy log becomes a more complete reflection of use. an example of a strategic decision based on dashboard data involves nyuhsl's mobile website. librarians had been considering whether to invest substantial effort in identifying and presenting free apps and mobile websites to complement the library's small collection of licensed mobile content. the chart of website visits on the dashboard surprisingly shows that the number of visits that come from mobile devices is consistently fewer than 3 percent, probably because of the relatively modest selection of mobile-optimized website resources. rather than invest significant effort in cataloging additional, potentially lackluster free resources that would not be seen by a large number of users, the team decided to wait for more headlining subscription-based resources to become available and increase traffic to the mobile site. it would be worthwhile to add charts to the dashboard that track metrics related to new strategic initiatives, which would require librarians to translate strategic ideas into measurable quantities.
for example, if the library aspired to make sure users received responses more quickly, charts tracking the response time for various services could be added and grouped together to track progress on this goal. as data continues to accumulate, it will be possible to extend the timeframe of the charts, for example, making weekly charts into monthly ones. over time, the data may become more static, requiring more complicated calculations to reveal interesting trends.
conclusions
the medical center has a strong ethic of metric-driven decisions, and the dashboard brings the library in line with this initiative. the dashboard allows librarians and management to monitor key library operations from a single, convenient page, with an emphasis on long-term trends rather than day-to-day fluctuations in use. it was put together using freely available tools that should be within the reach of people with moderate programming experience. assembling the dashboard required background knowledge of the systems in question, was made possible by nyuhsl's use of open-source and homegrown software, and increased the designers' understanding of the data and tools in question.
references
1. association of academic health sciences libraries, "annual statistics," http://www.aahsl.org/mc/page.do?sitepageid=84868 (accessed november 7, 2011); association of research libraries, "arl statistics," http://www.arl.org/stats/annualsurveys/arlstats (accessed november 7, 2011).
2. brown university library, "dashboard_beta :: dashboard information," http://library.brown.edu/dashboard/info (accessed january 5, 2012).
3. edward r. tufte, the visual display of quantitative information (cheshire, ct: graphics press, 2001), 92.
4. ibid., 56.
5. ibid., 178.
6. ibid., 153.
7. scott bateman et al., "useful junk? the effects of visual embellishment on comprehension and memorability of charts," chi '10 proceedings of the 28th international conference on human factors in computing systems (new york: acm, 2010), doi: 10.1145/1753326.1753716.
8. stephen few, information dashboard design: the effective visual communication of data (beijing: o'reilly, 2006), 98.
9. nils rasmussen, claire y. chen, and manish bansal, business dashboards: a visual catalog for design and deployment (hoboken, nj: wiley, 2009), ch. 4.
10. richard j. roiger and michael w. geatz, data mining: a tutorial-based primer (boston: addison wesley, 2003), 186.
11. one example: stefan waner and steven r. costenoble, "fitting functions to data: linear and exponential regression," february 2008, http://people.hofstra.edu/stefan_waner/realworld/calctopic1/regression.html (accessed january 5, 2012).
12. emily g. morton-owens, "editorial and technological workflow tools to promote website quality," information technology & libraries 30, no. 3 (september 2011): 92–98.
13. google, "gapi—google analytics api php interface," http://code.google.com/p/gapi-google-analytics-php-interface (accessed january 5, 2012).
14. karen a. coombs, "lessons learned from analyzing library database usage data," library hi tech 23, no. 4 (2005): 598–609, doi: 10.1108/07378830510636373.
http://people.hofstra.edu/stefan_waner/realworld/calctopic1/regression.html http://code.google.com/p/gapi-google-analytics-php-interface/ http://code.google.com/p/gapi-google-analytics-php-interface/ library use of web-based research guides jimmy ghaphery and erin white information technology and libraries | march 2012 21 abstract this paper describes the ways in which libraries are currently implementing and managing webbased research guides (a.k.a. pathfinders, libguides, subject guides, etc.) by examining two sets of data from the spring of 2011. one set of data was compiled by visiting the websites of ninety-nine american university arl libraries and recording the characteristics of each site’s research guides. the other set of data is based on an online survey of librarians about the ways in which their libraries implement and maintain research guides. in conclusion, a discussion follows that includes implications for the library technology community. selected literature review while there has been significant research on library research guides, there has not been a recent survey either of the overall landscape or of librarian attitudes and practices. there has been recent work on the efficacy of research guides as well as strategies for their promotion. there is still work to be done on developing a strong return on investment metric for research guides, although the same could probably be said for other library technologies including websites, digital collections, and institutional repositories. subject-based research guides have a long history in libraries that predates the web as a servicedelivery mechanism. a literature-review article from 2007 found that research on the subject gained momentum around 1996 with the advent of electronic research guides, and that there was a need for more user-centric testing.1 by the mid-2000s, it was rare to find a library that did not offer research guides through its website.2 the format of guides has certainly shifted over time to database-driven efforts through local library programming and commercial offerings. a number of other articles start to answer some of the questions about usability posed in the 2007 literature review by vileno. in 2008, grays, del bosque, and costello used virtual focus groups as a test bed for guide evaluation.3 two articles from the august 2010 issue of the journal of library administration contain excellent literature reviews and look toward marketing, assessment, and best practices.4 also in 2010, vileno followed up on the 2007 literature review with usability testing that pointed toward a number of areas in which users experienced difficulties with research guides.5 jimmy ghaphery (jghapher@vcu.edu) is head, library information systems and erin white (erwhite@vcu.edu) is web systems librarian, virginia commonwealth university libraries, richmond, va. 
mailto:jghapher@vcu.edu library use of web-based research guides | ghaphery and white 22 in terms of cross-library studies, an interesting collaboration in 2008 between cornell and princeton universities found that students, faculty, and librarians perceived value in research guides, but that their qualitative comments and content analysis of the guides themselves indicated a need for more compelling and effective features.6 the work of morris and grimes from 1999 should also be mentioned; the authors surveyed 53 university libraries, finding that it was rare to find a library with formal management policies for their research guides.7 most recently, libguides has emerged as a leader in this arena, offering a popular software-as-aservice (saas) model and as such is not yet heavily represented in the literature. a multichapter libguides lita guide is pending publication and will cover such topics as implementing and managing libguides, setting standards for training and design, and creating and managing guides. arl guides landscape during the week of march 3rd, 2011, the authors visited the websites of 99 american university arl libraries to determine the prevalence and general characteristics of their subject-based research guides. in general, the visits reinforced the overarching theme within the literature that subject-based research guides are a core component of academic library web services. all 99 libraries offered research guides that were easy to find from the library home page. libguides was very prominent as a platform, in production at 67 of the 99 libraries. among these, it appeared that at least 5 libraries were in the process of migrating from a previous system (either a homegrown, database-driven site or static html pages) to libguides. in addition to the presence and platform, the authors recorded additional information about the scope and breadth of each site’s research guides. for each site, the presence of course-based research guides was recorded. in some cases the course guides had a separate listing, whereas in others they were intermingled with the subject-based research guides. course guides were found on 75 of the 99 libraries visited. of these, 63 were also libguides sites. it is certainly possible that course guides are being deployed at some of the other libraries but were not immediately visible in visiting the websites, or that course guides may be deployed through a course management system. nonetheless, it appears that the use of libguides encourages the presence of public-facing course guides. qualitatively, there was wide diversity of how course guides were organized and presented, varying from a simple a-to-z listing of all guides to separately curated landing pages specifically organized by discipline. the number of guides was recorded for each libguides site. it was possible to append “/browse.php?o=a” to the base url to determine how many guides and authors were published at each site. this php extension was the publicly available listing of all guides on each libguides platform. the “/browse.php?o=a” extension no longer publicly reports these statistics; however, findings could be reproduced by manually counting the number of guides and authors on each site. the authors confirmed the validity of this method in the fall of 2011 by revisiting four sites and finding that the numbers derived from manual counting were in line with the previous findings. 
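as a rough illustration of how such a census could be scripted rather than tallied by hand, the sketch below (in python) fetches a site's public browse listing and counts distinct guide and author identifiers. it is a sketch only: the base url is a placeholder, and the link patterns are assumptions about the 2011-era libguides markup rather than a documented interface, so they would need to be checked against the actual pages.

import re
import urllib.request

def count_guides(base_url):
    """fetch the public browse listing and count guide and author entries."""
    url = base_url.rstrip("/") + "/browse.php?o=a"
    listing = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    # the patterns below are illustrative guesses at the guide and profile links,
    # not a documented libguides api
    guides = set(re.findall(r'content\.php\?pid=(\d+)', listing))
    authors = set(re.findall(r'profile\.php\?id=(\d+)', listing))
    return len(guides), len(authors)

guides, authors = count_guides("http://libguides.example.edu")  # hypothetical site
print(guides, "guides by", authors, "authors")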
of the 63 libguides sites we observed, a total of 14,522 guides were counted from 2,101 authors for an average of 7 guides per author. on average, each site had 220 guides from 32 authors (median of 179 guides; 29 authors). at the high end of the scale, one site had 713 guides from 46 authors. based on the volume observed, libraries appear to be investing significant time toward the creation, and presumably the maintenance, of this content. in addition to creation and ongoing maintenance, such long lists of topics raise a number of usability issues that libraries will also be wise to keep in mind.8

survey

the literature review and website visits call out two strong trends:

1. research guides are as commonplace as books in libraries,
2. libguides is the elephant in the room, so much so that it is hard to discuss research guides without discussing libguides.

based on preliminary findings from the literature review and survey, we looked to further describe how libraries are supporting, innovating, implementing, and evaluating their research guides. a ten-question survey was designed to better understand how research guides sit within the cultural environment of libraries. it was distributed to a number of professional discussion lists the week of april 19, 2011 (see appendix). the following lists were used in an attempt to get a balance of opinion from populations of both technical and public services librarians: code4lib, web4lib, lita-l, lib-ref-l, and ili-l. the survey was made available for two weeks following the list announcements. survey response was very strong, with 198 responses (188 libraries) received without the benefit of any follow-up recruitment. ten institutions submitted more than one response. in these cases only the first response was included for analysis. we did not complete a response for our own institution. the vast majority (155, 82%) of respondents were from college or university libraries. of the remaining 33, 24 (13%) were from community college libraries, with only 9 (5%) identifying themselves as public, school, private, or governmental. among the college and university libraries, 17 (9%) identified themselves as members of the arl, which comprises 126 members.9 in terms of "what system best describes your research guides by subject?" the results were similar to the survey of arl websites. most libraries (129, 69%) reported libguides as their system, followed by "customized open source system" and "static html pages," both at 20 responses (11% each). sixteen libraries (9%) reported using a homegrown system, with three libraries (2%) reporting "other commercial system." in terms of initiating and maintaining a guides system, much of the work within libraries seems to be happening outside of library systems departments. when asked which statement best described who selected the guides system, 67 respondents (36%) indicated their library research guides were "initiated by public services," followed closely by "more of a library-wide initiative" at 63 responses (34%). in the middle at 34 responses (18%) was "initiated by an informal cross-departmental group." only 10 respondents (5%) selected "initiated by systems," with the top-down approach of "initiated by administration" gathering 14 responses (7%).
when narrowing the responses to those sites that are using libguides or campusguides, the portrait is not terribly different, with 36% library-wide, 35% public services, 18% informal cross-departmental, 7% administration, and systems trailing at 4%. likewise there was not a strong indication of library systems involvement in maintaining or supporting research guides. sixty-nine responses (37%) indicated "no ongoing involvement" and an additional 35 (19%) indicated "n/a we do not have a systems department." there were only 21 responses (11%) stating "considerable ongoing involvement," with the balance of 63 responses (34%) for "some ongoing involvement." not surprisingly, there was a correlation between the type of research guide and the amount of systems involvement. for sites running a "customized open source system," "other commercial system," or "homegrown system," at least 80% of responses indicated either "considerable" or "some" ongoing systems involvement. in contrast, 37% of sites running libguides or campusguides indicated "considerable" or "some" technical involvement. further, the libguides and campusguides users recorded the highest percentage (43%) of "no ongoing involvement" compared to 37% of all respondents. interestingly, 20% of libguides and campusguides users answered "n/a we do not have a systems department," which is not significantly higher than all respondents for this question at 19%.

the level of interaction between research guides and enterprise library systems was not reported as strong. when asked "which statement best describes the relationship between your web content management system and your research guides?" 112 responses (60%) indicated that "our content management system is independent of our research guides" with an additional 51 responses (27%) indicating that they did not have a content management system (cms). only 12 respondents (6%) said that their cms was integrated with their research guides, with a remaining 13 (7%) saying that their cms was used for "both our website and our research guides." a similar portrait was found in seeking out the relationship between research guides and discovery/federated search tools. when asked "which statement best describes the relationship between your discovery/federated search tool and your research guides?" roughly half of the respondents (96, 51%) did not have a discovery system ("n/a we do not have a discovery tool"). only 12 respondents (6%) selected "we prominently feature our discovery tool on our guides," whereas more than double that number, 26 (14%), said "we typically do not include our discovery tool on our guides." fifty-four respondents (29%) took the middle path of "our discovery tool is one of many search options we feature on our guides." in the case of both discovery systems and content management systems, it seems that research guides are typically not deeply integrated.

when asked "what other type of content do you host on your research guides system?" respondents selected from a list of choices as reflected in table 1.

table 1. other types of content hosted on research guides system

answer | total | percent | percent of libguides/campusguides sites
course pages | 127 | 68% | 74%
"how to" instruction | 123 | 65% | 77%
alphabetical list of all databases | 76 | 40% | 42%
"about the library" information (for example hours, directions, staff directory, events) | 59 | 31% | 35%
digital collections | 34 | 18% | 19%
everything—we use the research guide platform as our website | 16 | 9% | 9%
none of the above | 17 | 9% | 2%
these answers reinforce the portrait of integration within the larger library web presence. while the research guides platform is an important part of that presence, significant content is also being managed by libraries through other systems. it is also consistent with the findings from the arl website visits, where course pages were consistently found within the research guides platform. for sites reporting libguides or campusguides as their platform, inclusion of course pages and how-to instruction was even higher, at 74% and 77%, respectively.

another multi-answer question sought to determine what types of policies are being used by libraries for the management of research guides: "which of the following procedures or policies do you have in place for your research guides?" responses are summarized in table 2.

table 2. management policies/procedures for research guides

answer | total | percent | percent using libguides/campusguides
style guides for consistent presentation | 105 | 56 | 58
maintenance and upkeep of guides | 94 | 50 | 53
link checking | 87 | 46 | 50
required elements such as contact information, chat, pictures, etc. | 78 | 41 | 56
training for guide creators | 73 | 39 | 43
transfer of guides to another author due to separation or change in duties | 72 | 38 | 41
defined scope of appropriate content | 43 | 23 | 22
allowing and/or moderating user tags, comments, ratings | 36 | 19 | 25
none of the above | 36 | 19 | 19
controlled vocabulary/tagging system for managing guides | 23 | 12 | 25

while nearly one in five libraries reported none of the policies in place at all, the responses indicate that there is effort being applied toward the management of these systems. the highest percentage for any given policy was 56% for "style guides for consistent presentation." best practices in these areas could be emerging, or many of these policies could be specific to individual library needs. as with the survey question on content, the research-guides platform also has a role, with the libguides and campusguides users reporting much higher rates of policies for "controlled vocabulary/tagging" (25% vs. 12%) and "required elements" (56% vs. 41%). in both of these cases, it is likely that the need for policies arises from the availability of these features and options that may not be present in other systems. based on this supposition, it is somewhat surprising that the libguides and campusguides sites reported the same lack of policy adoption (none of the above; 19%).

the final question in the survey further explored the management posture for research guides by asking a free-text question: "how do you evaluate the success or failure of your research guides?" results were compiled into a spreadsheet. the authors used inductive coding to find themes and perform a basic data analysis on the responses, including a tally of which evaluation methods were used and how often. one in five institutions (37 respondents, 19.6%) looked only to usage stats, while seven respondents (4%) indicated that their library had performed usability testing as part of the evaluation. forty-four respondents (23.4%) said they had no evaluation method in place ("ouch! it hurts to write that."), though many expressed an interest in or plans to begin evaluation. another emerging theme included ten respondents who quantified success in terms of library adoption and ease of use.
this included one respondent who had adopted libguides in light of prohibitive it regulations ("we chose libguides because it [the campus it department] would not allow us to create class-specific research webpages"). several institutions also expressed frustration with the survey instrument because they were in the process of moving from one guides system to another and were not sure how to address many questions. most responses indicated that there are more questions than answers regarding the efficacy of their research guides, though the general sentiment toward the idea of guides was positive, with words such as "positive," "easy," "like," and "love" appearing in 16 responses. countering that, 5 respondents indicated that their libraries' research-guides projects had fallen through.

conclusion

this study confirms previous research that web-based research guides are a common offering, especially in academic libraries. adding to this, we have recorded a quantitative adoption of libguides both through visiting arl websites and through a survey distributed to library listservs. further, this study did not find a consistent management or assessment practice for library research guides. perhaps the most interesting finding from this study is the role of library systems departments with regard to research guides. it appears that many library systems departments are not actively involved in either the initiation or ongoing support of web-based research guides. what are the implications for the library technology community and what questions arise for future research? the apparent ascendancy of libguides over local solutions is certainly worth considering and in part demonstrates some comfort within libraries for cloud computing and saas. time will tell how this might spread to other library systems. the popularity of libguides, at its heart a specialized content management system, also calls into question the vitality and adaptability of local content management system implementations in libraries. more generally, does the desire to professionally select and steward information for users on research guides indicate librarian misgivings about the usability of enterprise library systems? how do attitudes toward research guides differ between public services and technical services? hopefully these questions serve as a call for continued technical engagement with library research guides. what shape that engagement may have in the future is an open question, but based on the prevalence and descriptions of current implementations, such consideration by the library technology community is worthwhile.

references

1. luigina vileno, "from paper to electronic, the evolution of pathfinders: a review of the literature," reference services review 35, no. 3 (2007): 434–51.
2. martin courtois, martha higgins, and aditya kapur, "was this guide helpful? users' perceptions of subject guides," reference services review 33, no. 2 (2005): 188–96.
3. lateka j. grays, darcy del bosque, and kristen costello, "building a better m.i.c.e. trap: using virtual focus groups to assess subject guides for distance education students," journal of library administration 48, no. 3/4 (2008): 431–53.
4. mira foster et al., "marketing research guides: an online experiment with libguides," journal of library administration 50, no. 5/6 (july/september 2010): 602–16; alisa c.
gonzalez and theresa westbrock, “reaching out with libguides: establishing a working set of best practices,” journal of library administration 50, no. 5/6 (july/september, 2010): 638–56. 5. luigina vileno, “testing the usability of two online research guides,” partnership: the canadian journal of library and information practice and research 5, no. 2 (2010), http://journal.lib.uoguelph.ca/index.php/perj/article/view/1235 (accessed august 8, 2011). 6. angela horne and steve adams, “do the outcomes justify the buzz? an assessment of libguides at cornell university and princeton university—presentation transcript,” presented at the association of academic and research libraries, seattle, wa, 2009, http://www.slideshare.net/smadams/do-the-outcomes-justify-the-buzz-an-assessment-oflibguides-at-cornell-university-and-princeton-university (accessed august 8, 2011). 7. sarah morris and marybeth grimes, “a great deal of time and effort: an overview of creating and maintaining internet-based subject guides,” library computing 18, no. 3 (1999): 213–16. 8. mathew miles and scott bergstrom, “classification of library resources by subject on the library website: is there an optimal number of subject labels?” information technology & libraries 28, no. 1 (march 2009): 16–20, http://www.ala.org/lita/ital/files/28/1/miles.pdf (accessed august 8, 2011). 9. association of research libraries, “association of research libraries: member libraries,” http://www.arl.org/arl/membership/members.shtml (accessed october 24, 2011). http://journal.lib.uoguelph.ca/index.php/perj/article/view/1235 http://www.slideshare.net/smadams/do-the-outcomes-justify-the-buzz-an-assessment-of-libguides-at-cornell-university-and-princeton-university http://www.slideshare.net/smadams/do-the-outcomes-justify-the-buzz-an-assessment-of-libguides-at-cornell-university-and-princeton-university http://www.ala.org/lita/ital/files/28/1/miles.pdf http://www.arl.org/arl/membership/members.shtml information technology and libraries | march 2012 29 appendix. survey library use of web-based research guides please complete the survey below. we are researching libraries’ use of web-based research guides. please consider filling out the following survey, or forwarding this survey to the person in your library who would be in the best position to describe your library’s research guides. responses are anonymous. thank you for your help! jimmy ghaphery, vcu libraries erin white, vcu libraries 1) what is the name of your organization? __________________________________ note that the name of your organization will only be used to make sure multiple responses from the same organization are not received. any publication of results will not include specific names of organizations. 2) which choice best describes your library? o arl o university library o college library o community college library o public library o school library o private library o governmental library o nonprofit library 3) what type of system best describes your research guides by subject? o libguides or campusguides o customized open source system o other commercial system o homegrown system o static html pages 4) which statement best describes the selection of your current research guides system? 
o initiated by administration o initiated by systems o initiated by public services o initiated by an informal cross-departmental group o more of a library-wide initiative library use of web-based research guides | ghaphery and white 30 5) how much ongoing involvement does your systems department have with the management of your research guides? o no ongoing involvement o some ongoing involvement o considerable ongoing involvement o n/a we do not have a systems department 6) what other type of content do you host on your research guides system? o course pages o “how to” instruction o alphabetical list of all databases o “about the library” information (for example: hours, directions, staff directory, events) o digital collections o everything—we use the research guide platform as our website o none of the above 7) which statement best describes the relationship between your discovery/federated search tool and your research guides? o we typically do not include our discovery tool on our guides o our discovery tool is one of many search options we promote on our guides o we prominently feature our discovery tool on our guides o n/a we do not have a discovery tool 8) which statement best describes the relationship between your web content management system and your research guides? o our content management system is independent of our research guides o our content management system is integrated with our research guides o our content management system is used for both our website and our research guides o n/a we do not have a content management system 9) which of the following procedures or policies do you have in place for your research guides? o defined scope of appropriate content o required elements such as contact information, chat, pictures, etc. o style guides for consistent presentation o allowing and/or moderating user tags, comments, ratings o training for guide creators o controlled vocabulary/tagging system for managing guides o maintenance and upkeep of guides o link checking information technology and libraries | march 2012 31 o transfer of guides to another author due to separation or change in duties o none of the above 10) how do you evaluate the success or failure of your research guides? [free text] 49 book reviews writing for technical and professional journals by john h. mitchell. john wiley & sons, inc., new york, london and sydney, 1968, 405 pp. this book reprints, describes, summarizes or refers to every item in what has to be the world's largest scrapbook of material relating to professional publication. the last 240 pages (three-fifths of the total) include "sample" style guides from the ieee, management science, aibs, acs (including seven or eight pages of abbreviations used in chemical abstracts), aip, the gpo, nasa, the modem language association, the american mathematical society, the american medical association, the apa, the american sociological review, the american economic review, the hispanic american historical review, the nea, and sundry others. in almost every case, the excerpted or complete style guide is followed by an illustrative article. i would doubt that any other such compilation exists. the chapters which precede this anthology discuss more general aspects of writing for professional journals: design and approach, the collection, correlation, selection and anangement of data, and the . elements of journal articles. 
the text in these chapters is crowded with material of the most varied and unexpected kinds: disquisitions on logic, formal organization, outlining, interview techniques, information retrieval, the dewey decimal system, the ejc thesaurus, and much, much more. there is only one problem in all of this, but it is a· serious one, epitomized by the quotation from robert louis stevenson which mitchell uses as motto for his first chapter: "if a man can group his ideas, he is a good writer." this real treasury of reference material is all but inaccessible to the reader. titles of the five chapters are not very descriptive, and the index is not organized as a retrieval device. if one knows where in the book to look, he can find very useful information, but just leafing through the pages is neither efficient nor easy. it is made particularly difficult, in fact, by the striking lack of editorial judgment exercised in the design of the book. there is no differentiation between the author's comments and the examples and illustrations which he reprints (unless, as in some cases, the typography of the original has been reproduced). headings within chapters, where they exist at all, are confusing-and again, it is often difficult to determine whether they are part of mitchell's organization or part of some quoted work. as a result, it is hard to say who should buy this book and even harder to say how it might be used. professor mitchell, who "was elected teacher of the year by the students of the university of massachusetts" in 1965, is presumably able to make selections from the contents and to present them effectively in a classroom. perhaps the publishers might atone for 50 journal of library automation vol. 2/1 march, 1969 their abnegations of responsibility in preparing this book for the press by prevailing upon its author to write a supplementary, and much-needed, user's guide to its contents. a. ]. goldwyn computer peripherals & typesetting by arthur h. phillips. london, her majesty's stationary office, 1968. 665 pp. $28.80. the appearance of a comprehensive volume on computer composition is a boon to librarians as it comes at a time when progress with marc and other complex data bases calls for printing and other output capabilities which exceed those now commonly available with computers. recent advances in photocomposition technology now make possible printing of graphic arts quality at acceptable costs for certain types of com~ puter produced library publications, such as book and periodical catalogs whose basic input includes upperand lower-case and a full range of diacritical marks. with these advances librarians need no longer accept the limitations of character sets and image quality imposed by present line printers. a quality product is needed for outputs which are destined for publication. some pioneers have already made good use of this advanced technology to produce quality catalogs and lists; this book will help others to travel the same road. the volume is a comprehensive reference compendium of data on computer peripherals which is not otherwise available in convenient form. it gives special emphasis to the coding and keyboarding of alphanumeric texts and describes how the computer can be used for text processing with a typographic output. it also gives an appreciation of the problems involved and the techniques and equipment that are available to those who are preparing to enter this important field. the text is arranged in three sections. 
the first is an introduction to computer processing of alphanumeric data which is intended for printing in typographic quality. the second describes many types of computer peripherals and gives considerable attention to the various codes used for computer and printing equipment data input. the third section describes alphanumeric text composition and the available graphic arts composing equipment. the text is supplemented by many illustrations, diagrams, and tables plus an index and a glossary of terms. while much of the material in the volume will become outdated within a short time, a substantial portion of it is sufficiently basic to retain its value for a longer period. this handsome book is intelligently conceived and well-written by one of england's leading authorities on printing and ' computer typesetting. for anyone seriously interested in the subject the volume is essential and worth its price. richard de gennaro book reviews 51 coordinate indexing, by john c. costello, jr. rutgers series on systems for the intellectual organization of information, volume vii. edited by susan artandi. the rutgers university press, new brunswick, n.j., 1966. 218 pp. this paperback book is the result of ~ seminar meeting on coordinate indexing held april 28 and 29, 1966, under the sponsorship of the rutgers graduate school of library service. the volume consists of a detailed presentation of the subject by john costello of battelle memorial institute, followed by a discussion of the presentation by four panelists. the objectives of the book as given in the preface are: to offer a description, discussion, critique, and collection of facts and data on coordinate indexing as one of the systems which may be used to intellectually organize information contained in documents. basically an introductory description of the subject is offered. however, the principles of coordinate indexing are included so that the material has value for anyone interested in the topic. with examples offered primarily from metallurgy and engineering, the emphasis is on the handling of technical documents. about half of the presentation is devoted to input, with storage, searching, and output comprising the other half. discussion by the panel (dr. susan artandi, moderator; dr. charles l. bernier; dr. vincent e. giuliano; and dr. i. a. warheit) is not given verbatim, but summarized by the editor. although the table of contents is quite detailed, an index would make the book more useful. the inclusion of a selective bibliography is valuable, but unfortunately it is almost never referred to· in the text. the bibliography is of course now somewhat out-of-date. laura k. osborn libraries of the future, by j. c. r. licklider. the m.i.t. press, massachusetts institute of technology, cambridge, massachusetts, 1965. third printing, september 1966. 219 pp. $6.00. this remarkable little book is rapidly becoming a classic in the field of information science. (note that it is now in its third printing.) it analyzes the concepts and problems of libraries of the future, "future" being defined as the year 2000. the book is the culmination of a two-year research project on the future of libraries sponsored by the council on library resources. the study was conducted by bolt beranek and newman, inc. between november, 1961, and november, 1963. the first part of this book describes man's interaction with recorded knowledge in what mr. licklider calls "procognitive systems." 
the author assumes man will be reacting to segments of the entire body of recorded information within a vast hierarchical information network. he estimates the present world corpus of knowledge could be stored in 10^15 bits of computer memory. the rate of increase is 2 x 10^6 bits per second. part two explores the use of computers within the procognitive system. subjects touched upon include syntactical analysis of natural languages, quantitative aspects of the representation of information, information retrieval effectiveness, and question-answer systems. some time is spent with studies of current computer techniques. in general, part two is a trifle dated as it deals with specific techniques in a field where technological obsolescence is precipitous. mr. licklider's writing is both intellectually stimulating and delightful. in discussing the future computer console, "... the concept of 'desk' may have changed from passive to active: a desk may be primarily a display-and-control station in a telecommunication-telecomputation system, and its most vital part may be the cable (umbilical cord) that connects it, via a wall socket, into the procognitive utility net." a footnote goes on to say, "if a man wishes to get away from it all and think in peace and quiet, he will have merely to turn off the power. however, it may not be economically feasible for his employer to pay him full rate for the time he thus spends in unamplified cerebration." serious students of information or library science should consider this book required reading if for no other reason than the jolt it provides one's imagination. gerry d. guthrie

tutorial

delivering information to students 24/7 with camtasia

kathleen carlson

this article examines the selection process for and use of camtasia studio software, a screen video capture program created by techsmith. the camtasia studio software allows the author to create streaming videos, which give students 24-hour access to any topic, including how to order books through interlibrary loan.

how does one engage students in the library research process? in my brief time at the downtown phoenix campus library of arizona state university (asu) i have found a software program that allows librarians to bring the classroom to the student. screen capture programs allow you to create presentations and publish them for students to view on their own time. instead of telling students how to do something, we need to show them.1 recent studies show there are numerous benefits to using streaming video in higher education. students that receive streaming video instruction as well as traditional instruction show dramatic improvement in class.2 this article takes a look at how i selected one software program and created a streaming video using the application. i examined three software applications that help create video tutorials and presentations: cam studio, macromedia's captivate, and techsmith's camtasia studio. i first experimented with cam studio, which is open-source software. there are limitations to what you can do with software that is free. the screen size is too small and the file size it can create is limited. macromedia's captivate is good if you want to create a series of screenshots with accompanying audio. i did not choose this streaming video program because i was unsure of the software's capability, and i had no one to provide technical support.
the third choice, the open-source camtasia studio, was the software i selected. there were several reasons why i preferred this software. i had more familiarity with it, and the software is very easy to load and is user friendly. it also has the ability to record a video of everything that is happening on your computer screen.3 another reason i selected camtasia studio was because of the availability of an asu software technician who had experience editing the streaming video. most users view camtasia’s video through adobe flash, but the program also can produce windows media, quicktime, dvd-ready avi, ipod, iphone, realmedia mp3, web, cd, blog, and animated gif formats.4 camtasia performs screen captures in real time. you are able to simultaneously use slideshow software, navigate to a website, and narrate step-by-step instructions. the full version of camtasia studio runs around $300. in addition to the software program, you also must have a combination headset and microphone. a stick microphone will work, but the combination headset will help eliminate any noise that can be picked up by a stick microphone. i purchased a logitech extreme pc gaming headset for about $20. when you purchase the camtasia license online at http://www.techsmith .com/, the customer service department will e-mail you the access code along with a link from which you can download the software. the cd-rom loaded with the camtasia software arrives about ten days later. my first camtasia studio project was a tutorial on how to use the university’s interlibrary loan system. here are the basic steps i took to create a streaming video: 1. preproduction. this involves the creation of a script. 2. production. the actual capturing of the video and audio content. have all websites and programs open and minimized at the bottom of the screen in order to easily select them during the video capturing. 3. postproduction. this is the most time-consuming and involves editing the video and compressing the file for delivery to users. 4. publishing. posting the video to a web server and assessing the material’s success. to see the full 3 minute 53 second streaming video “how to order an article that asu does not own” go to http://www.asu.edu/lib/tutorials/ illiad/index.html. implementing camtasia studio once camtasia studio is installed on your computer, double click on the camtasia studio icon. it will bring up a welcome window where you can select from the following (see figure 1): n start a new project by recording the screen n start recording a powerpoint presentation n start a new project by importing media files n open an existing project i have selected “start a new project by recording the screen.” on the left hand menu there is a task list, and you can select one of the kathleen carlson (kathleen.carlson@ asu.edu) is health sciences librarian, information commons library, arizona state university, downtown phoenix campus. delivering information to students 24/7 with camtasia | carlson 155 following (see figure 2): n record the screen n record the powerpoint i have selected “start a new project by recording the screen.” this will bring up a window, “new recording wizard screen recording setup.” it asks you what you would like to record (see figure 3). n region of the screen n specific window n entire screen i have selected “entire screen.” when you click on the “next” button, it brings up a recording options window (see figure 4). 
select from the following: n record audio n record camera i have selected “record audio while recording the screen.” next you see a window that lets you choose audio settings from the following (see figure 5): n microphone n speaker audio n microphone and speaker audio n manual input selection i have selected “microphone” (see figure 6). the next window is titled “tune volume input levels.” use the input level lever to set the audio input level (see figure 7). figure 1. welcome screen and what do you want to do? figure 5. choose audio settingsfigure 3. screen recording setup figure 4. recording options figure 2. record the screen 156 information technology and libraries | september 2009 the “begin recording” window appears, which includes instructions on how to start and stop recording. you have the choice of clicking the “record” button on camtasia recorder or clicking the f9 key to start recording. to stop, click the “stop” button on camtasia recorder or click the f10 key (see figure 8). finally click on either “record the screen” or “record powerpoint.” to view your streaming video, click on the saved icon where it says clip bin or go to camtasia toolbar and click on view. then click on clip bin, then click on thumbnails. that’s all there is to it. summary i found camtasia studio to be very user friendly, although i cannot emphasize enough how important it is for librarians to collaborate with their it staff. this software enables you to bring the classroom to the student when they need it. you may have instructed a class on library research, but many of these students may have already forgotten where to begin. streaming video allows students to access presentations 24/7. here is a checklist of things to think about when selecting software: n what do you want to accomplish with the software? n what kind of access are you trying to give? n do you want audio, video, or both? n is it easy for the student to access and understand? n have you researched the software to make sure it meets your needs? n how much money do you want to spend? n what additional equipment is necessary? finally, and most importantly, work with your it staff on all phases of your project. by developing a collaborative relationship with them you will have fewer bumps in the road. use your imagination: the sky is the limit. references 1. diane murley, “tools for creating video tutorials,” law library journal 99, no. 4 (2007). 2. ron reed, “streaming technology improves achievement: study shows the use of standards-based video content, powered by new internet technology application, increases student achievement,” t.h.e. journal 30, no. 7 (2003). 3. christopher cox, “from cameras to camtasia: streaming media without the stress,” internet reference services quarterly 9 no. 3/4 (2004). 4. john d. clark and qinghua kou, “captivate/camtasia,” journal of the medical library association 96, no. 1 (2008), http://www.pubmedcentral.nih.gov/ articlerender.fcgi?artid=2212324 (accessed june 24, 2009). figure 6. audio volume levels figure 7. begin recording figure 8. camtasia recorder reproduced with permission of the copyright owner. further reproduction prohibited without permission. graphical table of contents for library collections: the application ... herrero-solana, victor;félix moya-anegón;guerrero-bote, vicente;zapico-alonso, felipe information technology and libraries; mar 2006; 25, 1; proquest education journals pg. 43 reproduced with permission of the copyright owner. further reproduction prohibited without permission. 
bibliographic retrieval from bibliographic input; the hypothesis and construction of a test

frederick h. ruecking, jr.: head, data processing division, the fondren library, rice university, houston, texas

a study of problems associated with bibliographic retrieval using unverified input data supplied by requesters. a code derived from compression of title and author information to four, four-character abbreviations each was used for retrieval tests on an ibm 1401 computer. retrieval accuracy was 98.67%.

current acquisitions systems which utilize computer processing have been oriented toward handling the order request only after it has been manually verified. systems, such as that of texas a & i university (1), have proven useful in reducing certain clerical routines and in handling fund accounting (2). lack of a larger bibliographic data base and lack of adequate computer time have prevented many libraries from studying more sophisticated acquisitions systems. at the time the marc pilot project (3) was started, the fondren library at rice university did not have operating computer applications in acquisitions, serials, or cataloging. the university administration and the research computation center provided sufficient access to the ibm 7040 to permit the study of problems associated with bibliographic retrieval using input data which has varying accuracy.

in 1966, richmond expressed the concern of many librarians about the lack of specific statements describing the techniques by which on-line retrieval could be accomplished without complicating the problems presented by the current card catalog (4). she had previously described some of the problems created by the kind and quality of data being utilized as references by library users (5). an examination of the pertinent literature indicates that most of the current work in retrieval, while related to problems of bibliographic retrieval, does not offer much assistance when the input data is suspect (6,7,8). tainiter and toyoda, for example, have described different techniques of addressing storage using known input data (9,10). one of the best-known retrieval systems is that of the chemical abstracts service, which provides a fairly sophisticated title-scan of journal articles with a surprising degree of flexibility in the logic and term structure used as input. comparable systems are used by the defense documentation center, medlars centers, and nasa technology centers. these systems have one specific feature in common: a high level of accuracy in the input data.

user-supplied bibliographic data

the reliability of bibliographic data supplied to university libraries from faculty and students has long been questioned (5). any search system which accepts such data must be designed 1) to increase the level of confidence through machine-generated search structures and variable thresholds and 2) to reduce the dependence upon spelling accuracy, punctuation, spacing and word order. the initial task of formulating an approach to this problem is to determine the type, quality, and quantity of data generally supplied by a user.
to derive a controlled set of data for this purpose, the acquisition department of the fondren library provided xerox copies of all english-language requests dated 1965 or later and a random sample of 295 requests was drawn from that file of 5000 items. this random sample was compared to the manually-verified, original order-requests to determine 1) the frequency with which data was supplied by the requestor and 2) the accuracy of the provided information. results of this study are given in table 1.

table 1. level of confidence in the input data

data elements | times given | times correct | accuracy | level of confidence
edition | 295 | 294 | 99.6 | 99.6
title | 295 | 292 | 99.0 | 99.0
author | 290 | 264 | 91.0 | 82.7
publish. | 268 | 218 | 81.3 | 73.9
date | 265 | 215 | 81.1 | 72.8

the results suggest that edition can have great significance when specified and should be used as strong supporting evidence for retrieval. it should not necessarily be a restrictive element because of the low-order magnitude of actual specification, which was five times in the sample. (unstated editions were considered as first editions, and correct.) title is the most significant and most reliable element. as richmond indicates, use of the entire title for searching would present distinct problems for retrieval systems (4). consequently, an abbreviated version of the title must be derived from the input data which will reduce the impact and significance of the problems described by richmond (5).

the hypothesis

it is hypothecated that retrieval of correct bibliographic entries can be obtained from unverified, user-supplied input data through the use of a code derived from the compression of author and title information supplied by the user. it is assumed that a similar code is provided for all entries of the data base using the same compression rules for main and added entry, title and added title information. it is further hypothecated that use of weighting factors for individual segments of the code will provide accurate retrieval in those cases when exact matching does not occur. before the retrieval methodology can be described, it is necessary to outline the compression technique to be used with author and title words.

title compression

to gain some understanding of the problems to be faced in compressing title information, a random sample of 500 titles was drawn from the first half of the initial marc i reel (about 4800 titles). each of these titles was analyzed for significant words and tabulations were made on word strings and word frequencies. the following words were considered as non-significant: a, an, and, by, if, in, of, on, the, to. the tabulated data, shown in table 2, contain some surprising attributes. approximately 90% of the titles contain less than five significant words, which suggests that four significant words will be adequate to match on title.

table 2. significant word strings in titles

length of word string | 1 | 2 | 3 | 4 | 5+ | total
number of titles | 42 | 151 | 179 | 76 | 52 | 500
percentage | 8.4 | 30.2 | 35.8 | 15.2 | 10.4 | 100.0
cumulative percentage | 8.4 | 38.6 | 74.4 | 89.6 | 100.0 |

letting n stand for the corpus of words available for title use, the random chance of duplicating any specific word in another title can be stated as 1/n. when a string of words is considered, the chance of randomly selecting the same word string may be considered as 1/n^a, where 'a' is the number of words in the string.
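a minimal sketch of the kind of tabulation described above, in python, assuming plain-text titles as input; the stop-word list is the one given in the text, while the whitespace tokenization is a simplifying assumption.

from collections import Counter

STOPWORDS = {"a", "an", "and", "by", "if", "in", "of", "on", "the", "to"}

def significant_words(title):
    """return the significant words of a title, in order."""
    return [w for w in title.lower().split() if w not in STOPWORDS]

def tabulate(titles):
    """tally significant-word string lengths (cf. table 2) and word frequencies (cf. figure 1)."""
    string_lengths = Counter()
    word_frequencies = Counter()
    for title in titles:
        words = significant_words(title)
        string_lengths[min(len(words), 5)] += 1   # bucket strings of 5+ words together
        word_frequencies.update(words)
    return string_lengths, word_frequencies

lengths, freq = tabulate(["Building Library Collections", "The Collected Works"])
print(lengths)                # Counter({3: 1, 2: 1})
print(freq.most_common(3))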
certain words are used more frequently than others, and the occurrence of such words in a given string reduces the uniqueness of that string. the curve displayed in figure 1 shows the frequency distribution of words in the sample. the mean frequency of words in the title-sample is 1.33.

fig. 1. frequency distribution of words in sample (number of words plotted against frequency).

therefore, the chance of selecting an identical word string can be more accurately expressed as (1.33)^a/n^a. an examination of word lengths, as shown in table 3, shows that 95% of the significant title words contain less than ten characters. an examination of the word list revealed that some 70% of the title words contain inflections and/or suffixes. if these suffixes and inflections are removed, approximately 43% of the remaining word stems contain less than five characters and 59% contain less than six.

table 3. distribution of character length and stem length

length in characters | total words | different words | percent | stems | percent
1 | 7 | 5 | 0.5 | 5 | 0.8
2 | 25 | 14 | 1.3 | 14 | 2.3
3 | 87 | 48 | 4.6 | 48 | 7.9
4 | 172 | 117 | 11.1 | 196 | 32.3
5 | 229 | 163 | 15.5 | 92 | 15.2
6 | 198 | 153 | 14.5 | 94 | 15.5
7 | 202 | 159 | 15.3 | 64 | 10.6
8 | 158 | 122 | 11.6 | 45 | 7.4
9 | 121 | 102 | 9.7 | 15 | 2.5
10 | 84 | 69 | 6.6 | 8 | 1.3
11 | 54 | 48 | 4.6 | 7 | 1.2
12 | 38 | 28 | 2.7 | 2 | 0.3
13 | 14 | 12 | 1.1 | 2 | 0.3
14 | 6 | 4 | 0.4 | 0 | 0.0
15 | 3 | 3 | 0.3 | 0 | 0.0
16 | 2 | 2 | 0.2 | 0 | 0.0
summary | 1400 | 1049 | | 592 |

the reduction of word length does affect the uniqueness of the individual word, merging distinct words into common word stems at a mean rate of 2.5 to 1.0. in table 3 the difference between 1049 words and 592 stems reflects the reduction of similar words into a common stem; for example: america, american, americans, americanism, etc., into amer. thus, the uniqueness of a string of title words is reduced to the following chance of duplication: (2.5 x 1.33)^a/n^a, or 3.3^a/n^a.

an analysis of consonant strings made by dolby and resnikoff provides frequencies of initial and terminal consonant strings occurring in 7000 common english words (with suffixes and inflections removed) (11,12,13). these frequency lists clearly show that the terminal string of consonants has considerable information-carrying potential in terms of word identification. the starting string also carries information potential, but significantly less than the terminal string. by combining the initial and terminal strings, it is possible to generate an abbreviation which has adequate uniqueness and reduces the influence of spelling. the high percentage of four-character word stems and the fact that the maximum terminal string contains four consonants suggest the use of a four-character abbreviation.

to compress a title word into four characters, it is necessary to specify a set of rules. the first rule will be to delete all suffixes and inflections which terminate a title word. the second rule will be to delete vowels from the stem until a consonant is located or the four-character stem is produced. the suffixes and inflections deleted in this procedure are contained in table 4. when the stem contains more than four characters, the third compression rule states that the four-character field is filled with the terminal-consonant string and remaining positions are filled from the initial-character string.
table 4. deleted suffixes and inflections

-ic, -ive, -in, -et, -ed, -ative, -ain, -est, -aged, -ize, -on, -ant, -oid, -ing, -ion, -ent, -ance, -og, -ation, -ient, -ence, -log, -ship, -ment, -ide, -olog, -er, -ist, -age, -ish, -or, -y, -able, -al, -s, -ency, -ible, -ial, -es, -ogy, -ite, -ful, -ies, -ology, -ine, -ism, -ives, -ly, -ure, -um, -ess, -ry, -ise, -ium, -us, -ary, -ose, -an, -ous, -ory, -ate, -ian, -ious, -ity

the relative uniqueness of the generated abbreviation can be calculated using the data supplied by dolby and resnikoff. for example, carter and bonk's building library collections would be abbreviated buld, libr, coct. the random chance of duplicating any abbreviation can be stated as consisting of the product of the random chance of duplicating the initial string and the random chance of duplicating the terminal string: (fi/ni) x (ft/nt) x 3.3^2. the frequencies listed by dolby and resnikoff may be substituted in the above equation, producing a chance of duplication of 324/6800 x 63/6800 x 10.89 = 1/208 for buld, with corresponding chances of 1 in 14,745 for libr and 1 in 1,041 for coct. the random chance of duplicating this string of three abbreviations can be calculated by multiplying the individual calculations, which yields the random chance of 1 in 32 x 10^8. this high uniqueness declines rapidly when the title contains less than three significant words and contains high-frequency words, such as the title collected works, for which the same uniqueness calculation produces the random chance of 1 in 44 x 10^4. to increase the level of uniqueness on short titles, like collected works, it becomes necessary to provide supporting data to the title information. it is clear that the supporting data must come from supplied author text.

author compression

the same compression algorithms can be used for both personal and corporate names with some modifications. the frequent substitution of "conference" for "congress" and "symposia" for "symposium" suggests that meeting names should be considered as a secondary sub-set of non-significant words. names of organizational divisions, such as bureau, department, ministry, and office, can be considered as part of the same sub-set. the rules which govern the deletion of inflections, suffixes and vowels can be used for corporate names, but personal author names must be carried into the compression routine without modification. only the last name of an author would be compressed into a code.

constructing the test

four, four-character abbreviations are allowed for title compression and four for author. rather than use a 32-character fixed field for these codes, the lengths of the input and main-base codes are variable, with leading control digits to specify the individual code sizes for the title and author segments. provision is made for the inclusion of date, publisher and/or edition in the search-code structure although these were not implemented in the test performed. at the time the input data is read, the existence of title, author, edition, publisher and date is indicated by the setting of indicators which control the matching mask and which, in part, control the specification of the retrieve threshold. the title indicator specifies the number of compressed words in the supplied title which must be matched by the base code. a simple algorithm is used to calculate the threshold values given in columns two through four of table 5.
columns five through seven are obtained by adding two to the calculated thresholds. each agreement within the mask adds to a retrieve counter the values indicated in the last five columns of table 5, the values of x and y being the number of matching code words in the title and author segments respectively.

table 5. values for variable threshold data

given (t a e p d) | threshold values, full-code test (title words: 3 or 4 / 2 / 1) | threshold values, individual code test (3 or 4 / 2 / 1) | agreement values (title / author / edition / publish. / date)
xy111 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy110 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy101 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy100 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy011 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy010 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy001 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy000 | 12 / 8+2y / 4+2y | 14 / 18+2y / 6+2y | 4x / 2y / 3 / 2 / 1
x0111 | 12 / 11 / 7 | 13 / 12 / 7 | 4x / 2y / 3 / 2 / 1
x0110 | 12 / 11 / 7 | 13 / 12 / 7 | 4x / 2y / 3 / 2 / 1
x0101 | 12 / 11 / 7 | 13 / 12 / 7 | 4x / 2y / 3 / 2 / 1
x0100 | 12 / 11 / 7 | 13 / 11 / 7 | 4x / 2y / 3 / 2 / 1
x0011 | 12 / 10 / 6 | 13 / 11 / 7 | 4x / 2y / 3 / 2 / 1
x0010 | 12 / 10 / 6 | 13 / not permitted | 4x / 2y / 3 / 2 / 1
x0001 | 12 / 9 / 5 | 13 / not permitted | 4x / 2y / 3 / 2 / 1
x0000 | 12 | not permitted | not permitted
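expressed as an algorithm, table 5 reduces to a weighted agreement score compared against a cutoff that depends on which data elements were supplied. the php fragment below is a hypothetical sketch of that logic, not code from the original study; the function and variable names are mine, and the threshold shown is only the simplest full-code case (a supplied title of three or four significant words), without the +2 adjustment for the individual-code test or the "not permitted" combinations.

<?php
// hypothetical sketch of the table 5 matching logic: each agreement adds a weighted
// value to a retrieve counter, and a record is reported only when the counter reaches
// the threshold implied by the supplied data. weights follow the last five columns of
// table 5: 4 per matching title code word, 2 per matching author code word, 3 for
// edition, 2 for publisher, 1 for date.
function retrieve_value($title_matches, $author_matches, $edition_ok, $publisher_ok, $date_ok) {
    return 4 * $title_matches
         + 2 * $author_matches
         + ($edition_ok ? 3 : 0)
         + ($publisher_ok ? 2 : 0)
         + ($date_ok ? 1 : 0);
}

// simplified threshold: the full-code value of 12 for a supplied title of three or four
// significant words (column two of table 5); the published table varies this with the
// title and author indicators and raises it by two for the individual-code test.
$threshold = 12;
$value = retrieve_value(3, 1, false, false, false); // three title words and the author code agree
if ($value >= $threshold) {
    echo "retrieve: value $value meets threshold $threshold\n";
} else {
    echo "reject: value $value is below threshold $threshold\n";
}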
conducting the test

as mentioned above, the initial tests of the retrieve were based upon title and author matching exclusively and required three runs on the fondren library's 1401 computer. the first loaded 2874 original order-requests, generated a search code utilizing the rules specified in this paper and created an input tape. the second run extracted title and author data from the marc i data base, and created multiple search codes for title, main entry, added title and added entry. both tapes were sorted into ascending search-code sequence. the final run was the search program which attempted to match input codes with the marc i base codes. when there was agreement based on the relationship of threshold and retrieve counter, the printer displayed threshold, short author and short title on one line, and retrieve value, input author and title on the next line, as illustrated in figure 2. the printed results were compared to validate the accuracy of the retrieve. this comparison was cross-checked against the results of the acquisition department's manual procedures. the search program also provided for an attempt to match titles on the basis of a rearrangement of title words. in such attempts the retrieve threshold was raised.

fig. 2. sample of retrieved citations.

analysis of results

the raw data obtained from this experimental run are shown in table 6. of the 2874 items represented in the input file, 48.4%, or 1392, were actually found to exist in the data base. of those actually present 90.4%, or 1200, were extracted with an overall accuracy of 98.67%. an examination of the sixteen false drops revealed several omissions in the compression routines for the input data and for the data base. one of the more significant omissions was failing to compensate for multi-character abbreviations, particularly 'st.' and 'ste.' for 'saint.' a subroutine for acceptance of such abbreviations added to the search-code generating program would increase the retrieve accuracy to 99%.

table 6. table of results

retrieve values | total hits | correct hits | false hits | percentage correct
6 | 14 | 14 | 0 | 100
8 | 0 | 0 | 0 | 0
10 | 311 | 311 | 0 | 100
12 | 264 | 248 | 16 | 93.3
14 | 232 | 232 | 0 | 100
16 | 118 | 118 | 0 | 100
18 | 260 | 260 | 0 | 100
20 | 1 | 1 | 0 | 100
totals | 1200 | 1184 | 16 | 98.7

table 7. distribution of errors

no. of codes | title error | title spelling | author lacking | author error | author spelling | other | total
1 | 2 | 3 | 10 | 12 | 27 | 4 | 58
2 | 2 | 6 | 17 | 26 | 60 | 23 | 134
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0
total | 4 | 9 | 27 | 38 | 87 | 27 | 192

the occurrence of titles with the words "selected" or "collected," etc., produced additional false drops when the title word string exceeded two words.
a modification to the search program to raise the threshold when the input data contain codes such as 'sect' and 'coct' would increase the retrieve accuracy to 99.17%. the presence of personal names in titles, such as 'charles evans hughes' and 'franklin delano roosevelt', caused seven additional false drops. at present it seems unlikely that a simple method to prevent them can be included.

conclusion

the experimental results indicate that the hypothesis suggested is valid. use of multiple codes for added entry and added title, in addition to the main entry and main title data, is clearly necessary. approximately 10% of the correctly retrieved items were produced by the existence of an added entry code. the influence of spelling accuracy was lessened by use of a compression technique. an inspection of extracted titles revealed the existence of 43 spelling errors which did not affect retrieval. thus, the search code reduced the significance of spelling by some 30%. utilizing table search followed by table look-up and linking random-access addresses should enable the search-code approach to bibliographic retrieval to provide rapid, direct access to the title sought.

acknowledgment

this study was supported in part by national science foundation grants gn-758 and gu-1153 and by the regional information and communication exchange. the assistance of the acquisitions department staff, the research computation center staff and the staff of the fondren library's data processing division is gratefully acknowledged.

references

1. morris, ned c.: "computer based acquisitions system at texas a & i university," journal of library automation, 1 (march 1968), 1-12.
2. wedgeworth, robert: "brown university library fund accounting system," journal of library automation, 1 (march 1968), 51-65.
3. u.s. library of congress: project marc, an experiment in automating library of congress catalog data (washington: 1967).
4. richmond, phyllis a.: "note on updating and searching computerized catalogs," library resources and technical services, 10 (spring 1966), 155-160.
5. richmond, phyllis a.: "source retrieval," physics today, 18 (april 1965), 46-48.
6. atherton, p.; yorich, j. c.: three experiments with citation indexing and bibliographic coupling of physics literature (new york: american institute of physics, 1962).
7. international business machines corporation: reference manual, index organization for information retrieval (ibm, 1961).
8. international business machines corporation: a unique computable name code for alphabetic account numbering (white plains, n.y.: ibm, 1960).
9. tainiter, m.: "addressing random-access storage with multiple bucket capacities," association for computing machinery journal, 10 (july 1963), 307-315.
10. toyoda, junichi; tazuka, yoshikazu; kasahara, yoshiro: "analysis of the address assignment problems for clustered keys," association for computing machinery journal, 13 (october 1966), 526-532.
11. dolby, james l.; resnikoff, howard l.: "on the structure of written english words," language, 40 (apr-june 1964), 167-196.
12. resnikoff, howard l.; dolby, james l.: "the nature of affixing in written english, part i," mechanical translation, 8 (march 1965), 84-89.
13. resnikoff, howard l.; dolby, james l.: "the nature of affixing in written english, part ii," mechanical translation, 9 (june 1966), 23-33.
a simple scheme for book classification using wikipedia | yelton 7 andromeda yelton a simple scheme for book classification using wikipedia ■■ background hanne albrechtsen outlines three types of strategies for subject analysis: simplistic, content-oriented, and requirements-oriented.3 in the simplistic approach, “subjects [are] absolute objective entities that can be derived as direct linguistic abstractions of documents.” the content-oriented model includes an interpretive step, identifying subjects not explicitly stated in the document. requirementsoriented approaches look at documents as instruments of communication; thus they anticipate users’ potential information needs and consider the meanings that documents may derive from their context. (see, for instance, the work of hjørland and mai.4) albrechtsen posits that only the simplistic model, which has obvious weaknesses, is amenable to automated analysis. the difficulty in moving beyond a simplistic approach, then, lies in the ability to capture things not stated, or at least not stated in proportion to their importance. synonymy and polysemy complicate the task. background knowledge is needed to draw inferences from text to larger meaning. these would be insuperable barriers if computers limited to simple word counts. however, thesauri, ontologies, and related tools can help computers as well as humans in addressing these problems; indeed, a great deal of research has been done in this area. for instance, enriching metadata with princeton university’s wordnet and the national library of medicine’s medical subject headings (mesh) is a common tactic,5 and the yahoo! category structure has been used as an ontology for automated document classification.6 several projects have used library of congress classification (lcc), dewey decimal classification (ddc), and similar library tools for automated text classification, but their results have not been thoroughly reported.7 all of these tools have had problems, though, with issues such as coverage, currency, and cost. this has motivated research into the use of wikipedia in their stead. since wikipedia’s founding in 2001, it has grown prodigiously, encompassing more than 3 million articles in its english edition alone as of this writing; this gives it unparalleled coverage. wikipedia also has many thesaurus-like features. redirects function as “see” references by linking synonyms to preferred terms. disambiguation pages deal with homonyms. the polyhierarchical category structure provides broader and narrower term relationships; the vast majority of pages belong to at least one category. links between pages function as related-term indicators. editor’s note: this article is the winner of the lita/ex libris student writing award, 2010. because the rate at which documents are being generated outstrips librarians’ ability to catalog them, an accurate, automated scheme of subject classification is desirable. however, simplistic word-counting schemes miss many important concepts; librarians must enrich algorithms with background knowledge to escape basic problems such as polysemy and synonymy. i have developed a script that uses wikipedia as context for analyzing the subjects of nonfiction books. though a simple method built quickly from freely available parts, it is partially successful, suggesting the promise of such an approach for future research. 
a s the amount of information in the world increases at an ever-more-astonishing rate, it becomes both more important to be able to sort out desirable information and more egregiously daunting to manually catalog every document. it is impossible even to keep up with all the documents in a bounded scope, such as academic journals; there were more than twenty-thousand peer-reviewed academic journals in publication in 2003.1 therefore a scheme of reliable, automated subject classification would be of great benefit. however, there are many barriers to such a scheme. naive word-counting schemes isolate common words, but not necessarily important ones. worse, the words for the most important concepts of a text may never occur in the text. how can this problem be addressed? first, the most characteristic (not necessarily the most common) words in a text need to be identified—words that particularly distinguish it from other texts. some corpus that connects words to ideas is required—in essence, a way to automatically look up ideas likely to be associated with some particular set of words. fortunately, there is such a corpus: wikipedia. what, after all, is a wikipedia article, but an idea (its title) followed by a set of words (the article text) that characterize that title? furthermore, the other elements of my scheme were readily available. for many books, amazon lists statistically improbable phrases (sips)— that is, phrases that are found “a large number of times in a particular book relative to all search inside! books.”2 and google provides a way to find pages highly relevant to a given phrase. if i used google to query wikipedia for a book’s sips (using the query form “site:en.wikipedia .org sip”), would wikipedia’s page titles tell me something useful about the subject(s) of the book? andromeda yelton (andromeda.yelton@gmail.com) graduated from the graduate school of library and information science, simmons college, boston, in may 2010. 8 information technology and libraries | march 2011 ■■ an initial test case to explore whether my method was feasible, i needed to try it on a test case. i chose stephen hawking’s a brief history of time, a relatively accessible meditation on the origin and fate of the universe, classified under “cosmology” by the library of congress. i began by looking up its sips on amazon.com. noticing that amazon also lists capitalized phrases (caps)—“people, places, events, or important topics mentioned frequently in a book”—i included those as well (see table 1).14 i then queried wikipedia via google for each of these phrases, using queries such as “site:en.wikipedia .org ‘grand unification theory.’” i selected the top three wikipedia article hits for each phrase. this yielded a list of sixty-one distinct items with several interesting properties: ■■ four items appeared twice (arrow of time, entropy [arrow of time], inflation [cosmology], richard feynman). however, nothing appeared more than twice; that is, nothing definitively stood out. ■■ many items on the list were clearly relevant to brief history, although often at too small a level of granularity to be good subject headings (e.g., black hole, second law of thermodynamics, time in physics). ■■ some items, while not unrelated, were wrong as subject classifications (e.g., list of solar system objects by size, nobel prize in physics). ■■ some items were at best amusingly, and at worst bafflingly, unrelated (e.g., alpha centauri [doctor who], electoral district [canada], james k. 
polk, united states men’s national soccer team). ■■ in addition, i had to discard some of the top google hits because they were not articles but wikipedia special pages, such as “talk” pages devoted to discussion of an article. this test showed that i needed an approach that would give me candidate subject headers at a higher level of granularity. i also needed to be able to draw a brighter line between candidates and noncandidates. the presence of noncandidates was not in itself distressing—any automated approach will consider avenues a human would not—but not having a clear basis for discarding low-probability descriptors was a problem. as it happens, wikipedia itself offers candidate subject headers at a higher level of granularity via its categories system. most articles belong to one or more categories, which are groups of pages belonging to the same list or topic.15 i hoped that by harvesting categories from the sixty-one pages i had discovered, i could improve my method. this yielded a list of more than three hundred categories. unsurprisingly, this list mostly comprised irrelevant because of this thesaurus structure, all of which can be harvested and used automatically, many researchers have used wikipedia for metadata enrichment, text clustering and classification, and the like. for example, han and zhao wanted to automatically disambiguate names found online but faced many problems familiar to librarians: “the traditional methods measure the similarity using the bag of words (bow) model. the bow, however, ignores all the semantic relations such as social relatedness between named entities, associative relatedness between concepts, polysemy and synonymy between key terms. so the bow cannot reflect the actual similarity.” to counter this, they constructed a semantic model from information on wikipedia about the associative relationships of various ideas. they then used this model to find relationships between information found in the context of the target name in different pages. this enabled them to accurately group pages pertaining to particular individuals.8 carmel, roitman, and zwerdling used page categories and titles to enhance labeling of document clusters. although many algorithms exist for sorting large sets of documents into smaller, interrelated clusters, there is less work on labeling those clusters usefully. by extracting cluster keywords, using them to query wikipedia, and algorithmically analyzing the results, they created a system whose top five recommendations contained the human-generated cluster label more than 85 percent of the time.9 schönhofen looked at the same problem i examine— identifying document topics with wikipedia data—but he used a different approach. he calculated the relatedness between categories and words from titles of pages belonging to those categories. he then used that relatedness to determine how strongly words from a target document predicted various wikipedia categories. he found that although his results were skewed by how wellrepresented topics were on wikipedia, “for 86 percent of articles, the top 20 ranked categories contain at least one of the original ones, with the top ranked category correct for 48 percent of articles.”10 wikipedia has also been used as an ontology to improve clustering of documents in a corpus,11 to automatically generate domain-specific thesauri,12 and to improve wikipedia itself by suggesting appropriate categories for articles.13 in short, wikipedia has many uses for metadata enrichment. 
while text classification is one of these potential uses, and one with promise, it is under-explored at present. additionally, this exploration takes place almost entirely in the proceedings of computer science conferences, often without reference to library science concepts or in a place where librarians would be likely to benefit from it. this paper aims to bridge that gap. a simple scheme for book classification using wikipedia | yelton 9 computationally trivial to do so, given such a list. (the list need not be exhaustive as long as it exhaustively described category types; for instance, the same regular expression could filter out both “articles with unsourced statements from october 2009” and “articles with unsourced statements from may 2008.”) at this stage of research, however, i simply ignored these categories in analyzing my results. to find a variety of books to test, i used older new york times nonfiction bestseller lists because brand-new books are less likely to have sips available on amazon.19 these lists were heavily slanted toward autobiography, but also included history, politics, and social science topics. ■■ results of the thirty books i examined (the top fifteen each from paperback and hardback nonfiction lists), twenty-one had sips and caps available on amazon. i ran my script against each of these phrase sets and calculated three measures for each resulting category list: ■■ precision (p): of the top categories, how many were synonyms or near-synonyms of the book’s lcshs? ■■ recall (r): of the book’s lcshs, how many had synonyms or near-synonyms among the top categories? ■■ right-but-wrongs (rbw): of the top categories, how many are reminiscent of the lcshs without actually being synonymous? these included narrower terms (e.g., the category “african_american_actors” when the lcshs included “actors—united states —biography”), broader terms (e.g., “american_folk_ singers” vs. “dylan, bob, 1941–”), related terms (e.g., “the_chronicles_of_narnia_books” vs. “lion, the witch and the wardrobe (motion picture)”), and examples (“killian_documents_controversy” vs. “united states—politics and government—2001–2009”). i considered the “top categories” for each book to be the five that most commonly occurred (excluding wikipedia administrative categories), with the following exceptions: ■■ because i had no basis to distinguish between them, i included all equally popular categories, even if that would bring the total to more than five. thus, for example, for the book collapse, the most common category occurred seven times, followed by two categories with five appearances and six categories with four. rather than arbitrarily selecting two of the six four-occurrence categories to bring the total to five, i examined all nine top categories. ■■ if there were more than five lcshs, i expanded the number of categories accordingly, so as not to candidates (“wars involving the states and peoples of asia,” “video games with expansion packs,” “organizations based in sweden,” among many others). many categories played a clear role in the wikipedia ecology of knowledge but were not suitable as general-purpose subject headers (“living people,” “1849 deaths”). strikingly, though, the vast majority of candidates occurred only once. only forty-two occurred twice, fifteen occurred three times, and one occurred twelve times: “physical cosmology.” twelve occurrences, four times as many as the next candidate, looked like a bright line. 
and “physical cosmology” is an excellent description of brief history— arguably better than lcsh’s “cosmology.” the approach looked promising. ■■ automating further test cases the next step was to test an extensive variety of books to see if the method was more broadly applicable. however, running searches and collating queries for even one book is tedious; investigating a large number by hand was prohibitive. therefore i wrote a categorization script (see appendix) that performs the following steps:16 ■■ reads in a file of statistically improbable phrases17 ■■ runs google queries against wikipedia for all of them18 ■■ selects the top hits after filtering out some common wikipedia nonarticles, such as “category” and “user” pages ■■ harvests these articles’ categories ■■ sorts these categories by their frequency of occurrence this algorithm did not filter out wikipedia administrative categories, as creating a list of them would have been prohibitively time-consuming. however, it would be table 1. sips and caps for a brief history of time sips grand unification energy, complete unified theory, thermodynamic arrow, psychological arrow, primordial black holes, boundary proposal, hot big bang model, big bang singularity, more quarks, contracting phase, sum over histories caps alpha centauri, solar system, nobel prize, north pole, united states, edwin hubble, royal society, richard feynman, milky way, roger penrose, first world war, weak anthropic principle 10 information technology and libraries | march 2011 “continental_army_generals” vs. “united states— history—revolution, 1775–1783.” ■■ weak: some categories treated the same subject as the lcsh but not at all in the same way ■■ wrong: the categories were actively misleading the results are displayed in table 2. ■■ discussion the results of this test were decidedly more mixed than those of my initial test case. on some books the wikipedia method performed remarkably well; on misleadingly increase recall statistics. ■■ i did not consider any categories with fewer than four occurrences, even if that left me with fewer than five top categories to consider. the lists of three-, two-, and one-occurrence categories were very long and almost entirely composed of unrelated items. i also considered, subjectively, the degree of overlap between the lcshs and the top wikipedia categories. i chose four degrees of overlap: ■■ strong: the top categories were largely relevant and included synonyms or near-synonyms for the lcsh ■■ near miss: some categories suggested the lcsh but missed its key points, such as table 2. results (sorted by percentage of relevant categories). book p r rbw subjective quality chronicles, bob dylan 0.2 0.5 0.8 strong the chronicles of narnia: the lion, the witch and the wardrobe official illustrated movie companion, perry moore 0.25 1 0.625 strong 1776, david mccullough 0 0 0.8 near miss 100 people who are screwing up america, bernard goldberg 0 0 0.625 weak the bob dylan scrapbook, 1956–1966, with text by robert santelli 0.2 0.5 0.4 strong three weeks with my brother, nicholas sparks 0 0 0.57 weak mother angelica, raymond arroyo 0.07 0.33 0.43 near miss confessions of a video vixen, karrine steffans 0.25 0.33 0.25 weak the fairtax book, neal boortz and john linder 0.17 0.33 0.33 strong never have your dog stuffed, alan alda 0 0 0.43 weak the world is flat, thomas l. friedman 0.4 0.5 0 near miss the tender bar, j. r. 
moehringer 0 0 0.2 wrong the tipping point, malcolm gladwell 0 0 0.2 wrong collapse, jared diamond 0 0 0.11 weak blink, malcolm gladwell 0 0 0 wrong freakonomics, steven d. levitt and stephen j. dubner 0 0 0 wrong guns, germs, and steel, jared diamond 0 0 0 weak magical thinking, augusten burroughs 0 0 0 wrong a million little pieces, james frey 0 0 0 wrong worth more dead, ann rule 0 0 0 wrong tuesdays with morrie, mitch albom no category with more than 4 occurrences a simple scheme for book classification using wikipedia | yelton 11 my method’s success with a brief history of time. i tested another technical, jargon-intensive work (n. gregory mankiw’s macroeconomics textbook), and found that the method also worked very well, giving categories such as “macroeconomics” and “economics terminology” with high frequency. therefore a system of this nature, even if not usable for a broad-based collection, might be very useful for scientific or other jargon-intensive content such as a database of journal articles. ■■ future research the method outlined in this paper is intended to be a proof of concept using readily available tools. the following work might move it closer to a real-world application: ■■ a configurable system for providing statistically improbable phrases; there are many options.23 this would provide the user with more control over, and understanding of, sip generation (instead of the amazon black box), as well as providing output that could integrate directly with the script. ■■ a richer understanding of the wikipedia category system. some categories (e.g., “all articles with unsourced statements”) are clearly useful only for wikipedia administrative purposes, not as document descriptors; others (e.g., “physical cosmology”) are excellent subject candidates; others have unclear value as subjects or require some modification (e.g., “environmental non-fiction books,” “macroeconomics stubs”). many of these could be filtered out or reformatted automatically. ■■ greater use of wikipedia as an ontology. for example, a map of the category hierarchies might help locate headers at a useful level of granularity, or to find the overarching meaning suggested by several headers by finding their common broader terms. a more thorough understanding of wikipedia’s relational structure might help disambiguate terms.24 others, it performed very poorly. however, there are several patterns here: many of these books were autobiographies, and the method was ineffective on nearly all of these.20 a key feature of autobiographies, of course, is that they are typically written in the first person, and thus lack any term for the major subject—the author’s name. biography, by contrast, is rife with this term. this suggests that including titles and authors along with sips and caps may be wise. additionally, it might require making better use of wikipedia as an ontology to look for related concepts (rather in the manner that han and zhao used it for name disambiguation).21 books that treat a single, well-defined subject are easier to analyze than those with more sprawling coverage. in particular, books that treat a concept via a sequence of illustrative essays (e.g., tipping point, freakonomics) do not work well at all. 
sips may apply only to particular chapters rather than to the book as a whole, and the algorithm tends to pick out topics of particular chapters (e.g., for freakonomics, the fascinating chapter on sudhir venkatesh’s work on “gangs_in_chicago, _illinois”22) rather than the connecting threads of the entire book (e.g. “economics—sociological aspects”). the tactics suggested for autobiography might help here as well. my subjective impressions were usually, but not always, borne out by the statistics. this is because some of the rbws were strongly related to one another and suggested to a human observer a coherent narrative, whereas others picked out minor or dissimilar aspects of the book. there was one more interesting, and promising, pattern: my subjective impressions of the quality of the categories were strongly predicted by the frequency of the most common category. remember that in the brief history example, the most common category, “physical cosmology,” occurred twelve times, conspicuously more than any of its other categories. therefore i looked at how many times the top category for each book occurred in my results. i averaged this number for each subjective quality group; the results are in table 3. in other words, the easier it was to draw a bright line between common and uncommon categories, the more likely the results were to be good descriptions of the work. this suggests that a system such as this could be used with very little modification to streamline categorization. for example, it could automatically categorize works when it met a high confidence threshold (when, for instance, the most common category has double-digit occurrence), suggest categories for a human to accept or reject at moderate confidence, and decline to help at low confidence. it was also interesting to me that—unlike my initial test case—none of the bestsellers were scientific or technical works. it is possible that the jargon-intensive nature of science makes it easier to categorize accurately, hence table 3. category frequency and subjective quality subjective quality of categories frequencies of most common category average frequency of most common category strong 6, 12, 16, 19 13.25 near miss 5, 5, 7, 10 6.75 weak 4, 5, 6, 7, 8 6 wrong 3, 4, 4, 5, 5, 5, 7, 7 5 12 information technology and libraries | march 2011 (1993): 219. 4. birger hjørland, “the concept of subject in information science,” journal of documentation 48, no. 2 (1992): 172; jenserik mai, “classification in context: relativity, reality, and representation,” knowledge organization 31, no. 1 (2004): 39; jens-erik mai, “actors, domains, and constraints in the design and construction of controlled vocabularies,” knowledge organization 35, no. 1 (2008): 16. 5. xiaohua hu et al., “exploiting wikipedia as external knowledge for document clustering,” in proceedings of the 15th acm sigkdd international conference on knowledge discovery and data mining, paris, france, 28 june–1 july 2009 (new york: acm, 2009): 389. 6. yannis labrou and tim finin, “yahoo! as an ontology— using yahoo! categories to describe documents,” in proceedings of the eighth international conference on information and knowledge management, kansas city, mo, usa 1999 (new york: acm, 1999): 180. 7. kwan yi, “automated text classification using library classification schemes: trends, issues, and challenges,” international cataloging & bibliographic control 36, no. 4 (2007): 78. 8. 
xianpei han and jun zhao, “named entity disambiguation by leveraging wikipedia semantic knowledge,” in proceeding of the 18th acm conference on information and knowledge management, hong kong, china, 2–6 november 2009 (new york: acm, 2009): 215. 9. david carmel, haggai roitman, and naama zwerdling, “enhancing cluster labeling using wikipedia,” in proceedings of the 32nd international acm sigir conference on research and development in information retrieval, boston, ma, usa (new york: acm, 2009): 139. 10. peter schönhofen, “identifying document topics using the wikipedia category network,” in proceedings of the 2006 ieee/wic/acm international conference on web intelligence, hong kong, china, 18–22 december 2006 (los alamitos, calif.: ieee computer society, 2007). 11. hu et al., “exploiting wikipedia.” 12. david milne, olena medelyan, and ian h. witten, “mining domain-specific thesauri from wikipedia: a case study,” in proceedings of the 2006 ieee/wic/acm international conference on web intelligence, 22–26 december 2006 (washington, d.c.: ieee computer society, 2006): 442. 13. zeno gantner and lars schmidt-thieme, “automatic content-based categorization of wikipedia articles,” in proceedings of the 2009 workshop on the people’s web meets nlp, acl-ijcnlp 2009, 7 august 2009, suntec, singapore (morristown, n.j.: association for computational linguistics, 2009): 32. 14. “amazon.com capitalized phrases,” amazon.com, http://www.amazon.com/gp/search-inside/capshelp.html/ ref=sib_caps_help (accessed mar. 13, 2010). 15. for more on the epistemological and technical roles of categories in wikipedia, see http://en.wikipedia.org/wiki/ wikipedia:categorization. 16. two sources greatly helped the script-writing process: william steinmetz, wicked cool php: real-world scripts that solve difficult problems (san francisco: no starch, 2008); and the documentation at http://php.net. 17. not all books on amazon.com have sips, and books that do may only have them for one edition, although many editions may be found separately on the site. there is not a readily apparent pattern determining which edition features sips. therefore ■■ a special-case system for handling books and authors that have their own article pages on wikipedia. in addition, a large-scale project might want to work from downloaded snapshots of wikipedia (via http:// download.wikimedia.org/), which could be run on local hardware rather than burdening their servers, this would require using something other than google for relevance ranking (there are many options), with a corresponding revision of the categorization script. ■■ conclusions even a simple system, quickly assembled from freely available parts, can have modest success in identifying book categories. although my system is not ready for real-world applications, it demonstrates that an approach of this type has potential, especially for collections limited to certain genres. given the staggering volume of documents now being generated, automated classification is an important avenue to explore. i close with a philosophical point. although i have characterized this work throughout as automated classification, and it certainly feels automated to me when i use the script, it does in fact still rely on human judgment. wikipedia’s category structure and its articles linking text to title concepts are wholly human-created. 
even google's pagerank system for determining relevancy rests on human input, using web links to pages as votes for them (like a vast citation index) and the texts of these links as indicators of page content.25 my algorithm therefore does not operate in lieu of human judgment. rather, it lets me leverage human judgment in a dramatically more efficient, if also more problematic, fashion than traditional subject cataloging. with the volume of content spiraling ever further beyond our ability to individually catalog documents—even in bounded contexts like academic databases, which strongly benefit from such cataloging—we must use human judgment in high-leverage ways if we are to have a hope of applying subject cataloging everywhere it is expected.

references and notes

1. carol tenopir, "online databases—online scholarly journals: how many?" library journal (feb. 1, 2004), http://www.libraryjournal.com/article/ca374956.html (accessed mar. 13, 2010).
2. "amazon.com statistically improbable phrases," amazon.com, http://www.amazon.com/gp/search-inside/sipshelp.html/ref=sib_sip_help (accessed mar. 13, 2010).
3. hanne albrechtsen, "subject analysis and indexing: from automated indexing to domain analysis," the indexer, 18, no. 4
21. han and zhao, "named entity disambiguation."
22. sudhir venkatesh, off the books: the underground economy of the urban poor (cambridge: harvard univ. pr., 2006).
23. see karen coyle, "machine indexing," the journal of academic librarianship 34, no. 6 (2008): 530. she gives as examples phraserate (http://ivia.ucr.edu/projects/phraserate/), kea (http://www.nzdl.org/kea/), and extractor (http://extractor.com/).
24. per han and zhao, "named entity disambiguation."
25. lawrence page et al., "the pagerank citation ranking: bringing order to the web," stanford infolab (1999), http://ilpubs.stanford.edu:8090/422/ (accessed mar. 13, 2010). this paper precedes the launch of google; as the title indicates, the citation index is one of google's foundational ideas.
this step cannot be automated.
18. be aware that running automated queries without permission is an explicit violation of google's terms of service. see google webmaster central, "automated queries," http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=66357 (accessed mar. 13, 2010). before using this script, obtain an api key, which confers this permission. ajax web search api keys can be instantly and freely obtained via http://code.google.com/apis/ajaxsearch/web.html.
19. "hardcover nonfiction," new york times, oct. 9, 2005, http://www.nytimes.com/2005/10/09/books/bestseller/1009besthardnonfiction.html?_r=1 (accessed mar. 13, 2010); "paperback nonfiction," new york times, oct. 9, 2005, http://www.nytimes.com/2005/10/09/books/bestseller/1009bestpapernonfiction.html?_r=1 (accessed mar. 13, 2010).
20. for the purposes of this discussion i consider the problematic million little pieces to be autobiography, as it has that writing style, and as its lcsh treats it thus.

appendix. php script for automated classification

<?php
/* the published listing begins mid-script; this opening argument check is a minimal
   reconstruction implied by the error message below and by the later use of $argv[1]
   (the sip file) and $argv[2] (the number of hits to keep per query). it is assumed,
   not taken from the original. */
if ($argv[2] > 4) {
    echo "i'm sorry; the number specified cannot be more than 4.";
    die;
}
// next, turn our comma-separated list into an array.
it is a violation of the google terms of service to run automated queries without permission. obtain an ajax api key via http://code.google.com. */ $apikey = ‘your_key_goes_here’; foreach($sip_array as $query) { /* in multiword terms, change spaces to + so as not to break the google search. */ $query = str_replace( “ “, “+”,,” $query); $googresult = “http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=site%3aen.wikipedia.org+$query&key=$apikey”; $googdata = file_get_contents($googresult); // pick out the urls we want and put them into the array $links preg_match_all(‘|” url”:” [^” ]*”|i’,, $googdata, $links); /* strip out some crud from the json syntax to get just urls */ $links[0] = str_replace( “\” url\”:\” “, “”, $links[0]); $links[0] = str_replace(“\” “, “”, $links[0]); /* here we step through the links in the page google returned to us and find the top wikipedia articles among the results */ $i=0; foreach($links[0] as $testlink) { /* these variables test to see if we have hit a wikipedia special page instead of an article. there are many more flavors of special page, but these are the most likely to show up in the first few hits. */ $filetest = strpos($testlink, ‘wiki/file:’); $cattest = strpos($testlink, ‘wiki/category:’); $usertest = strpos($testlink, ‘wiki/user’); $talktest = strpos($testlink, ‘wiki/talk:’); $disambtest = strpos($testlink, ‘(disambiguation)’); $templatetest = strpos($testlink, ‘wiki/template_’); if (!$filetest && !$cattest && !$usertest && !$talktest && !$disambtest && !$templatetest) { $wikipages[] = $testlink; $i++; } /* once we’ve accumulated as many article pages as the user asked for, stop adding links to the $wikipages array. */ appendix. php script for automated classification (continued) a simple scheme for book classification using wikipedia | yelton 15 if ($i == $argv[2]) { break; } //this closes the foreach loop which steps through $links } // this closes the foreach loop which steps through $sip_array } /* for each page that we identified in the above step, let’s find the categories it belongs to. */ $mastercatarray = array(); foreach ($wikipages as $targetpage) { // scrape category information from the article page. $wikiscrape = file_get_contents($targetpage); preg_match_all(“|/wiki/category.[^\” ]+|”,,” $wikiscrape, $categories); foreach ($categories[0] as $catstring) { /* strip out the “wiki/category:” at the beginning of each string */ $catstring = substr($catstring, 15); /* keep count of how many times we’ve seen this category. */ if (array_key_exists($catstring, $mastercatarray)) { $mastercatarray[$catstring]++; } else { $mastercatarray[$catstring] =1; } } } // sort by value: most popular categories first. arsort($mastercatarray); echo “the top categories are:\n”; print_r($mastercatarray); ?> appendix. php script for automated classification (continued) mapping for the masses: gis lite and online mapping tools in academic libraries kathleen w. weessies and daniel s. dotson information technology and libraries | march 2013 23 abstract customized maps depicting complex social data are much more prevalent today than in the past. not only in formal published outlets, interactive mapping tools make it easy to create and publish custom maps in both formal and more casual outlets such as social media. this article defines gis lite, describes three commercial products currently licensed by institutions, and discusses issues that arise from their varied functionality and license restrictions. 
introduction news outlets from newspapers to television to internet these days are filled with maps that make it possible for readers to visualize complex social data. presidential election results, employment rates, and the plethora of data arising from the census of population are just a small sampling of social data mapped and consumed daily. the sharp rise in published maps in recent years has increased consumer awareness of the effectiveness of presenting data in map format and has raised expectations for finding, making and using customized maps. not just in news media, but in academia also, researchers and students have high interest in being able to make and use maps in their work. just a few years ago even the simplest maps had to be custom made by specialists. researchers and publishers had to seek out highly trained experts to make maps on their behalf. as a result, custom maps were generally only to be found in formal publications. the situation has changed partly because geographic information system (gis) software for geographic analysis and map making is more readily available than in years past. it does, however, remain specialized and wants considerable training for users to be proficient at even a basic level.1 this gap between supply and demand has been partly filled, especially in the last five years, by the growth of internet-based “gis lite” tools. while some basic tools are freely available on the internet, several tools are subscription-based and are licensed by libraries, schools and businesses for use. college and university libraries especially are quickly becoming a major resource for data visualization and mapping tools. the aim of this article is to describe several data-rich gis lite tools available in the library market and how these products have met or failed to meet the needs of several real-life college class kathleen w. weessies (weessie2@msu.edu), a lita member, is geosciences librarian and head of the map library, michigan state university, lansing. michigan. daniel s. dotson (dotson.77@osu.edu) is mathematical sciences librarian and science education specialist, associate professor, ohio state university libraries, columbus, ohio. mailto:weessie2@msu.edu mailto:dotson.77@osu.edu mapping for the masses: gis lite & online mapping tools in academic libraries | weessies and dotson 24 situations. this is followed by a discussion of issues arising from user needs and restrictions posed by licensing and copyright. what is gis lite? students and faculty across the academic spectrum often discover that their topic has a geographic element to it and a map would enhance their work (paper, presentation, project, poster, article, book, thesis or dissertation, etc.). if their research involves data analysis, geospatial tools will draw attention to spatial patterns in the data that might not otherwise be apparent. every scholar with such needs must make a cost/benefit decision concerning gis: is his or her need greater than the cost in time and effort (sometimes money) necessary to learn or hire skills to produce map products? a full functioning gis, being a specialized system of software designed to work with geospatially referenced datasets, is designed to address all the problems above. the data may be analyzed and output into customized maps exactly to the researcher’s need. the traditional lowend solution available to non-experts, on the other hand, is colorizing a blank outline map, either with hand-held tools (markers, colored pencils, etc.) 
or on a computer using a graphic editing program. the profusion of web mapping options dangles tantalizingly with possibility, and occasionally (and increasingly) is able to provide an output that illustrates a useful point of users’ research in a professional enough manner to fill a need. in recent years the web has blossomed with map applications collectively called the “geoweb” or “geospatial web.” geoweb or geospatial web refers to the “emerging distributed global gis, which is a widespread distributed collaboration of knowledge and discovery.”2 some geoweb applications are well known street map resources such as google maps and mapquest. others are designed to deliver data from an organization, such as the national hazards support system (http://nhss.cr.usgs.gov), national pipeline mapping system (http://www.npms.phmsa.dot.gov/publicviewer), and the broadband map (http://www.broadbandmap.gov). a few tools focus on map creation and output such as arcgis online (http://www.arcgis.com/home/webmap/viewer.html) and scribble maps (http://www.scribblemaps.com). the newest subgenre of the geoweb consists of participatory mapping sites such as openstreet map (http://www.openstreetmap.org), did you feel it? (http://earthquake.usgs.gov/earthquake.usgs.gov/earthquakes/dyfi), and ushahidi (http://community.ushahidi.com/deployments). the geoweb literature is small but growing. 3 elwood reviewed published research on the geographic web.4 the geoweb literature tends to focus on creation of mappable data and delivery of geoweb services.5 in these the map consumer only appears as a contributor of data. very little has been written about users’ needs from the geoweb. the term gis lite has arisen among map and gis librarians to describe a subset of geoweb applications. gis lite is useful to library patrons lacking specialized gis training but who wish to conduct some gis and map-making activities on a lower learning curve. for the purpose of this article, gis lite will refer to applications, usually web-based, which allow users to manipulate geospatial data and create map outputs without programming skills or training in full gis software. http://nhss.cr.usgs.gov/ http://www.npms.phmsa.dot.gov/publicviewer http://www.broadbandmap.gov/ http://www.arcgis.com/home/webmap/viewer.html http://www.scribblemaps.com/ http://www.openstreetmap.org/ http://earthquake.usgs.gov/earthquake.usgs.gov/earthquakes/dyfi http://community.ushahidi.com/deployments information technology and libraries | march 2013 25 while many geoweb applications allow only low-level output options, gis lite will provide an output intended to be used in activities or rolled into a gis for further geospatial processing. in libraries, gis lite is closely allied with data and statistics resources. data and statistics librarianship have already been discussed as disciplines in the literature such as by hogenboom6 and gray.7 new technologies and access to deeper data resources such as the ones presented here have raised the bar for librarians’ responsibilities for curating, serving, and aiding patrons in its use. rather than be passive shepherds of information resources, librarians are now active participants and even information partners. librarians with map and gis skills similarly can directly enhance the quality of student scholarship across academic disciplines.8 the gis lite resources, however, need not remain specialized tools of map and gis librarians. 
librarians working in disciplines across the academic spectrum may incorporate them into their arsenal of tools to meet patron needs. data visualization tools a growing number of academic libraries have licensed access to online data providers. the following data tools contain enough gis lite functionality to aid patrons in visualizing and manipulating data (primarily social data) and creating customized map outputs. three of the more powerful commercial products described here are social explorer, simplymap, and proquest statistical datasets. social explorer licensed by oxford university press, social explorer provides selected data from the us decennial census 1790 to 2010, plus american community survey 2006 through 2010.9 the interface enables either retrieval of tabular data or visualization of data in an interactive map. as the user selects options through pull-down menus, the map automatically refreshes to reflect the chosen year and population statistics. the level of geography depicted defaults to county level data. if a user zooms in to an area smaller than a county, then data refreshes to smaller geographies such as census tracts if they are available at that level for that year. output is in the form of graphic files suitable for sharing in a computer presentation (see figure 1). one advantage of social explorer is that it utilizes historic boundaries as they existed for states, territories, counties, and census tracts for each given year. social explorer utilizes data and boundary files generated by the national historical gis (nhgis) based at the university of minnesota in collaboration with other partners. the creation of these historical boundaries was a significant undertaking and accomplishment.10 custom tables of data and the historic geographic boundaries may also be retrieved and downloaded for use from an affiliated engine through the nhgis website (http://www.nhgis.org). a disadvantage of this product is that the tool, while robust, does not completely replicate all the data available in the original paper census volumes. also, historical boundaries have not been created for city or township-level data. the final map layout is not customizable either in the location of title and legend or in the data intervals. http://www.nhgis.org/ mapping for the masses: gis lite & online mapping tools in academic libraries | weessies and dotson 26 figure 1: map depicting population having four or more years of college, 1960 (source: social explorer, 2012; image used with permission) simplymap simplymap (http://geographicresearch.com/simplymap) is a product of geographic research. this powerful interface brings together public and licensed proprietary data to offer a broad array of 75,000 data variables in the united states. us census data are available 1980–2010 normalized to the user’s choice of either year 2000 or year 2010 geographies. numerous other licensed datasets primarily focus on demographics and consumer behavior, which makes it popular as a marketing research tool. each user establishes a personal login which allows created maps and tables to persist from session to session. upon creating a map view, the user may adjust the smaller geographic unit at which the theme data is displayed and also may adjust the data intervals as desired. the user creates a layout, adjusting the location of the map legend and title before exporting as a graphic or pdf (see figure 2). data are also exportable as gis-friendly shapefiles. 
http://geographicresearch.com/simplymap information technology and libraries | march 2013 27 the great advantage of this product is the ability to customize the data intervals. this makes it possible to filter the data and display specific thresholds meaningful to the user. for instance if a user needs to illustrate places where an activity or characteristic is shared by “over half” of the population, then one may change the map to display two data categories: one for places where up to 50 percent of the population shares the characteristic and a second category for places where more than 50 percent of the population shares the characteristic. another potential advantage is that all local data have been allocated pro rata so that all variables, regardless of their original granularity, may be expressed by county boundaries, by zip code boundaries, or by census tract. a disadvantage of the product is the lack of historical boundaries to match historical data. figure 2. map depicting census tracts that have more than 50% black population (yellow line indicates cincinnati city boundary) (source: simplymap, 2012; image used with permission) mapping for the masses: gis lite & online mapping tools in academic libraries | weessies and dotson 28 proquest statistical datasets statistical datasets was developed by conquest systems inc. and is licensed by proquest. this product also mingles a broad array of several thousand public and licensed proprietary datasets, including some international data, in one interface. the user may retrieve data and view it in tabular or chart form. if the data have a geographic element, then the user may switch the view to a map interface. the resulting map may be exported as an image. the data may also be exported to a gis-friendly shapefile format. this product offers more robust data manipulation than the other products, in that the user may perform calculations between any of the data tables and create a chart or map of the created data element (see figure 3). statistical datasets, however, has more simplistic map layout capabilities than the other products. figure 3. map of sorghum production, by country, in 2010 (source: proquest statistical datasets, 2012; image used with permission) case studies the following three case studies are of college classroom situations in which students utilized maps or map making as part of the assigned course work. the above mapping options are assessed for how well they met the assignment needs. information technology and libraries | march 2013 29 case study 1 an upper level statistics course at the ohio state university requires students to create maps using sas (http://www.sas.com). while many may not associate the veteran statistical software package with creating maps, this course uses it along with sas/graph to combine statistical data with a map. the project requires data articulated at the county level in ohio, which the students then combine into multi-county regions. the end result is a map with regions labeled and rendered in 3d according to the data values. an example of the type of map that could be produced from such data using sas can be seen in figure 4. figure 4. map of observed rabbit density in ohio using sas, sas/graph, and mail carrier survey data,1998 (image used with permission) while the data are provided in this course, students could potentially seek help from the library in a traditional way to find numerical data expressed at a county level. 
the librarian would guide patrons through appropriate avenues to locate data, such as the three products listed above. all three options contain numerous data variables for ohio at the county level. because the students are further processing the data elsewhere (in this case sas), the output options of the three products are less important. ultimately the availability of data on a desired subject would be the primary determinant for choosing one of the three gis lite options discussed here. social explorer will export the data in tabular form, which can then be ingested into sas. simplymap and proquest statistical datasets would both be a bit easier, though, because both packages allow the user to export the data as shapefiles, which are directly imported into sas/graph as both boundary files and joined tabular data.

case study 2

a first-year writing class at michigan state university has a theme of the american ethnic and racial experience. assignments all relate to a student's chosen ethnic group and geographic location from approximately 1880 to 1930. assignments build upon each other to culminate in a final semester paper. students with ancestors living in the united states at that time are encouraged to examine their own family's ethnicity and how they fit in their geographic context. otherwise, students may choose any ethnic group and place of interest. maps are a required element in the assignments. maps that display historical census data help students place the subject ethnic group into the larger county, state, and national context over the time frame. the students can see, for instance, if their subject household was part of an ethnic cluster or an outlier to ethnic clusters. the parameters for finding data and maps are generous and open to each student's interpretation. the wish is for students to find social statistics and maps that are insightful to their topic and will help them tell their story. of the three statistical resources considered above, currently the only useful one is social explorer because it covers the time period studied by the class. the students may map several social indicators at the county level across several decades and compare their local area to the region and the nation. also they may save their maps and include them in their papers (properly credited).

case study 3

"the ghetto" is an elective geography class restricted to upperclassmen at michigan state university. in the semester project, students analyze the spatial organization and demographic variables of "ghetto" neighborhoods in a chosen city. a ghetto is defined as a neighborhood that has a 50 percent or higher concentration of a definable ethnic group. since black and white are the only two races consistently reported at the census tract level for all the years covered by the class (1960 through 2010), the students necessarily use that data for their projects. data needs for the class are focused and deep. the students specifically need to visualize us census data from 1960 through 2010 at the census tract level within the city limits for several social indicators. indicators include median income, median housing value, median rent, educational attainment, income, and rate of unemployment.
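a minimal sketch of the kind of two-category, 50 percent threshold map these assignments call for is shown below, written in python with geopandas and matplotlib. the file names ("tracts.shp", "city_boundary.shp") and the attribute name ("pct_black") are hypothetical placeholders, not the output of any particular product.

```python
# minimal sketch: classify census tracts at a 50 percent threshold and map the result.
# file and column names are hypothetical placeholders.
import geopandas as gpd
import matplotlib.pyplot as plt

tracts = gpd.read_file("tracts.shp")

# two data categories: under 50 percent, and 50 percent or more
tracts["category"] = tracts["pct_black"].apply(
    lambda pct: "50% or more" if pct >= 50 else "under 50%"
)

ax = tracts.plot(
    column="category", categorical=True, legend=True,
    cmap="Paired", edgecolor="gray", linewidth=0.2, figsize=(8, 8),
)

# optional overlay of a (modern) city boundary, comparable to the yellow
# outline shown in figure 2
city = gpd.read_file("city_boundary.shp")
city.boundary.plot(ax=ax, color="yellow", linewidth=1.5)

ax.set_axis_off()
ax.set_title("tracts meeting the 50 percent concentration definition")
plt.savefig("threshold_map.png", dpi=150)
```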
the instructor has traditionally required use of the paper census volumes, and students created hand-made maps that highlight tracts in the subject city that conformed to the ghetto definition and those that did not for each of the census years covered. computer-retrieved data and computer-generated maps would be acceptable, but at the time of this writing no gis lite product is able to make all the maps that meet the specific requirements of this class. social explorer covers all of the date range and provides data down to the tract level. however, it does not provide an outline of the city limits and does not provide all the data variables required in the assignment. simplymap will only work for 2000 through 2010 because tract boundaries are only available for those two years, even though the data go back to 1980. simplymap does provide two excellent features, though: it is the only product that allows an overlay of the (modern) city boundary on top of the census tract map, and it is the only product that allows manipulation of the data intervals. students may choose to break the data at the needed 50 percent mark, while the other products utilize fixed data intervals not useful to this class. proquest statistical datasets can compute the data into two categories to create the necessary data intervals; however, census data are only available beginning with census 2000.

map products for user needs

these three real-life class scenarios illustrate how the rich and seemingly duplicative resources of the library can range from perfectly suitable to perfectly useless depending on each project's exact needs. the appropriateness of any given tool can only be assessed fairly if the librarian is familiar with all the "ins and outs" of every product. the geoweb and gis lite tools mentioned throughout this article are summarized in table 1. the suitability of gis lite tools will be further affected by the following issues.

historical boundaries

the range and granularity of data tools are subject to factors sometimes at odds with what a researcher would wish to have. at this time, for instance, many historical resources provide data only as detailed as the county level. county-level data are available largely due to the efforts of the nhgis mentioned above and the newberry library's atlas of historical county boundaries project (http://publications.newberry.org/ahcbp). far fewer resources provide historical data at smaller geographies such as city, township, or census tract levels. this is because the smaller the geographies get, the exponentially more there are to create and for map interfaces to process. from the well-known resource county and city data book,11 it is easy enough to retrieve us city data. the historical boundaries of every city in the united states, however, have not been created. this is because city boundaries are much more dynamic than county boundaries and there is no centralized authoritative source for their changes over time. two of the three case studies presented here utilized historic data. this isn't necessarily a representative proportion of user needs; librarians should assess data resources in light of their own patrons' needs.

normalization

two equally valid data needs arise with any kind of time series data when geographic boundaries change over time.
census tracts, for instance, provide geographic detail roughly at the neighborhood level, designed by the bureau of census to encompass approximately 2,500 to 8,000 people.12 because people move around and the density of population changes from decade to decade, the configuration and numbering of tracts change over time. some scholars will wish to see the data values in the tracts as they were drawn at the time of issue. in this situation, a neighborhood of interest might belong to different tracts over the years or even be split between two or more tracts. other scholars focused on a particular neighborhood may wish to see many decades of census data re-cast into stable tracts in order to be directly comparable. data providers will take one approach or the other on this issue, and librarians will do well to be aware of their choice.

license restrictions

a third issue affecting use of these products is the ability to use derived map images, not only in formal outlets such as professional presentations, articles, books, and dissertations, but also in informal outlets such as blogs and tweets. for the most part gis lite vendors are willing—even pleased—to see their products promoted in the literature and in social media. the vendors uniformly wish any such use to be properly credited. the license that every institution signs when acquiring these products will specify allowed and disallowed activities. the license, fixated on disallowing abuse or resale or other commercialization of the data, might have a chilling effect on users wishing to use the images in their work. if a user is in any doubt as to the suitability of an intended use of a map, he or she should be encouraged to contact the vendor to seek permission for its use.

as data resources grow and become more readily usable, the possibility for scholarly inquiry grows. librarians with familiarity with gis lite tools may partner with their patrons and guide them to the best resources.

table 1. a selection of geoweb and gis lite tools and their output options
tool name | url | free or fee | electronic output options*

geoweb tools
atlas of historical county boundaries | http://publications.newberry.org/ahcbp/ | free | spatial data as shapefile, kmz; image as pdf
did you feel it? | http://earthquake.usgs.gov/earthquakes/dyfi/ | free | tabular data as txt, xml; image as jpg, pdf, ps
google maps | https://maps.google.com/ | free | none
mapquest | http://www.mapquest.com | free | none
national broadband map | http://www.broadbandmap.gov/ | free | image as png
national hazards support systems (usgs) | http://nhss.cr.usgs.gov/ | free | image as pdf, png
national pipeline mapping system | https://www.npms.phmsa.dot.gov/publicviewer/ | free | image as jsf
openstreetmap | http://www.openstreetmap.org/ | free | tabular data as xml; image as png, jpg, svg, pdf
ushahidi community deployments | http://community.ushahidi.com/deployments/ | free | image as jpg

gis lite tools
arcgis online | http://www.arcgis.com | limited free options; access is part of institutional site license | spatial data as arcgis 10; image as png (in arcexplorer)
proquest statistical datasets | http://cisupa.proquest.com/ws_display.asp?filter=statistical%20datasets%20overview | fee | tabular data as excel, pdf, delimited text, sas, xml; spatial data as shapefile; image may be copied to clipboard
sas/graph | http://www.sas.com/technologies/bi/query_reporting/graph/index.html | fee | image as pdf, png, ps, emf, pcl
scribble maps | http://www.scribblemaps.com/ | free | spatial data as kml, gpx; image as jpg
simplymap | http://geographicresearch.com/simplymap | fee | tabular data as excel, csv, dbf; spatial data as shapefile; image as pdf, gif

* does not include taking a screen shot of the monitor or making a durable url to the page

references

1. national research council, division on earth and life studies, board on earth sciences and resources, geographical sciences committee, learning to think spatially (washington, d.c.: national academies press, 2006): 9.
2. pinde fu and jiulin sun, web gis: principles and applications (redlands, ca: esri press, 2011): 15.
3. for good overviews of the geoweb, see muki haklay, alex singleton, and chris parker, "web mapping 2.0: the neogeography of the geoweb," geography compass 2, no. 6 (2008): 2011-2039, http://dx.doi.org/10.1111/j.1749-8198.2008.00167.x; jeremy w. crampton, "cartography: maps 2.0," progress in human geography 33, no. 1 (2009): 91-100, http://dx.doi.org/10.1177/0309132508094074.
4. sarah elwood, "geographic information science: visualization, visual methods, and the geoweb," progress in human geography 35, no. 3 (2010): 401-408, http://dx.doi.org/10.1177/0309132510374250.
5. songnian li, suzana dragićević, and bert veenendaal, eds., advances in web-based gis, mapping services and applications (boca raton, fl: crc press, 2011).
6. karen hogenboom, carissa phillips, and merinda hensley, "show me the data! partnering with instructors to teach data literacy," in declaration of interdependence: the proceedings of the acrl 2011 conference, march 30-april 2, 2011, philadelphia, pa, ed. dawn m. mueller (chicago: association of college and research libraries, 2011), 410-417, http://www.ala.org/acrl/files/conferences/confsandpreconfs/national/2011/papers/show_me_the_data.pdf.
7. ann s. gray, "data and statistical literacy for librarians," iassist quarterly 28, no. 2/3 (2004): 24-29, http://www.iassistdata.org/content/data-and-statistical-literacy-librarians.
8. kathy weimer, paige andrew, and tracey hughes, map, gis and cataloging / metadata librarian core competencies (chicago: american library association map and geography round table, 2008), http://www.ala.org/magirt/files/publicationsab/magertcorecomp2008.pdf.
9. social explorer, http://www.socialexplorer.com/pub/home/home.aspx.
10. catherine fitch and steven ruggles, "building the national historical geographic information system," historical methods 36, no. 1 (2003): 41-50, http://dx.doi.org/10.1080/01615440309601214.
11. u.s. bureau of census, county and city data book, http://www.census.gov/prod/www/abs/ccdb.html.
12. census tracts and block numbering areas, http://www.census.gov/geo/www/cen_tract.html.

acknowledgments

the authors wish to thank dr. michael fligner, dr. clarence hooker, and dr. joe darden for permission to use their courses as case studies.

john carlo bertot

public access technologies in public libraries: effects and implications

public libraries were early adopters of internet-based technologies and have provided public access to the internet and computers since the early 1990s. the landscape of public-access internet and computing was substantially different in the 1990s as the world wide web was only in its initial development. at that time, public libraries essentially experimented with public-access internet and computer services, largely absorbing this service into existing service and resource provision without substantial consideration of the management, facilities, staffing, and other implications of public-access technology (pat) services and resources. this article explores the implications for public libraries of the provision of pat and seeks to look further to review issues and practices associated with pat provision resources. while much research focuses on the amount of public access that public libraries provide, little offers a view of the effect of public access on libraries. this article provides insights into some of the costs, issues, and challenges associated with public access and concludes with recommendations that require continued exploration.

public libraries were early adopters of internet-based technologies and have provided public access to the internet and computers since the early 1990s.1 in 1994, 20.9 percent of public libraries were connected to the internet, and 12.7 percent offered public-access computers.
by 1998, internet connectivity in public libraries grew to 83.6 percent, and 73.3 percent of public libraries provided public internet access.2 the landscape of public-access internet and computing was substantially different in the 1990s, as the world wide web was only in its initial development. at that time, public libraries essentially experimented with public-access internet and computer services, largely absorbing this service into existing service and resource provision without substantial consideration of the management, facilities, staffing, and other implications of public-access technology (pat) services and resources.3

using case studies conducted at thirty-five public libraries in five geographically dispersed and demographically diverse states, this article explores the implications for public libraries of the provision of pat. the researcher also conducted interviews with state library agency staff prior to visiting libraries in each state. the goals of this article are to

■■ explore the level of support pat requires within public libraries;
■■ explore the implications of pat on public libraries, including management, building planning, staffing, and other support issues;
■■ explore current pat support practices;
■■ identify issues and challenges public libraries face in maintaining and supporting their pat infrastructure; and
■■ identify factors that contribute to successful pat practices.

this article seeks to look beyond the provision of pat by public libraries and review issues and practices associated with pat-provision resources. while much research focuses on the amount of public access that public libraries provide, little offers a view of the effect of public access on libraries. this article provides insights into some of the costs, issues, and challenges associated with public access, and it concludes with recommendations that require continued exploration.

literature review

quickly over time, public libraries increased their public-access provision substantially (see figures 1 and 2). connectivity grew from 20.9 percent in 1994 to nearly 100 percent in 2006.4 moreover, nearly all libraries that connected to the internet offered public-access internet services. simultaneously, the average number of public-access computers grew from 1.9 per public library in 1996 to 12 per public library in 2007.5 accompanying and in support of the continual growth of basic connectivity and computing infrastructure was a demand for broadband connectivity. indeed, since 1994, connectivity has progressed from dial-up phone lines to leased lines and other forms of high-speed connectivity.

figure 1. public-access internet connectivity from 1994 through 2008

figure 2. public-access internet workstations from 1996 through 2008

the extent of the growth in public-access services within public libraries is profound and substantive, leading to the development of new internet-based service roles for public libraries.6 and public access to the internet through public libraries provides a number of community benefits to different populations within served communities.7 overlaid onto the public-access infrastructure is an increasingly complex service mix that now includes access to digital content (e.g., databases and digital libraries), integrated library systems (ilss), voice over internet protocol (voip), digital reference, and a host of other services and resources—some for public access, others for back-office library operations. and patrons do use these services in increasing amounts—both in the library and in everyday life.8 in fact, 82.5 percent of public libraries report that they do not have an adequate number of public-access computers some or all of the time and have resorted to time limits and wireless access to extend public-access services.9

john carlo bertot (jbertot@umd.edu) is professor and director of the center for library innovation in the college of information studies at the university of maryland, college park.

by 2007, as connectivity and public-access computer infrastructure grew, so ensued the need to provide a range of publicly available services and resources:

■■ 87.7 percent of public libraries provide access to licensed databases
■■ 83.4 percent of public libraries offer technology training
■■ 74.1 percent of public libraries provide e-government services (e.g., locating government information and helping patrons complete online applications)
■■ 62.5 percent of public libraries provide digital reference services
■■ 51.8 percent of public libraries offer access to e-books10

the list is not exhaustive, but illustrative, since libraries do offer other services such as access to homework resources, video content, audio content, and digitized collections. as public libraries expanded these services, management realized that they needed to plan and evaluate technology-based services. over the years, a range of technology management, planning, and evaluation resources emerged to help public libraries cope with their technology-based resources—those both publicly available and for administrative operations.11 but increasingly, public libraries report the strain that pat services promulgate. this centers on four key areas:

■■ maintenance and management. the necessary maintenance and management requirements of pat place an additional burden on existing staff, many of whom do not possess technology expertise to troubleshoot, fix, and support internet-based services and resources that patrons access.
■■ staff. libraries consistently cite staff expertise and availability as a barrier to the addition, support, and management of pat. indeed, as described in previous sections, some libraries have experienced a decline in library staff.
■■ finances. there is evidence of stagnant funding for libraries at the local level as well as a shift in expenditures from staff and collections to operational costs such as utilities and maintenance.
■■ buildings. the buildings are inadequate in terms of space and infrastructure (e.g., wiring and cabling) to support additional public access.12

this article explores these four areas through a site-visit method in an effort to go beyond a quantitative assessment of pat within the public library community. though related in terms of topic area and author, this study was conducted separately from the public library internet surveys conducted since 1994 and offers insights into the provision of pat services and resources that a national survey cannot explore in such depth.

method

the researcher visited thirty-five public libraries in five geographically and demographically diverse states between october 2007 and may 2008. the states were in the west, southwest, southeast, and mid-atlantic regions.
the libraries visited included urban, suburban, rural, and native american public libraries that served populations ranging from a few hundred to more than half a million. the communities that the libraries served varied in terms of poverty, race, income, age, employment, and education demographics. prior to visiting the public library sites, the researcher conducted interviews with state library agency staff to better understand the public library context within each state and to explore overall pat issues, strategies, and other factors within the state. the following research questions guided the site visits:

■■ what are the community and library contexts in which the library provides pat?
■■ what are the pat services and resources that the library makes available to its community?
■■ what pat services and resources does the library desire to provide to its community?
■■ what is the relationship between provided and desired pat and the effect on the library (e.g., staff, finances, the building, and management)?
■■ what are the perceived benefits that the library and its community gain through pat in the library?
■■ what are the issues and barriers that the library encounters in providing pat services and resources?
■■ how does the library manage and maintain its pat?

the researcher visited each library for four to six hours. during that time, he interviewed the library director and/or branch manager and technology support staff (either a specific library position, designated library employee, or city or county it staff person), toured the library facility, and conducted a brief technology inventory. at some libraries, the researcher was able to meet with community partners that in some way collaborated with the library to provide pat services and resources (e.g., educational institutions that collaborated with libraries to provide access to broadband or volunteers who conducted technology training sessions). interviews were recorded and transcribed, and the technology inventories were entered into a microsoft excel spreadsheet for analysis. the transcripts were coded using thematic content analytic schemes to allow for the identification of key issues regarding pat areas.13 this approach enabled the researcher to use an iterative site-visit strategy that used findings from previous site visits to inform subsequent visits. to ensure valid and reliable data, the researcher used a three-stage strategy:

1. site-visit reports were completed and sent to the libraries for review. corrections from libraries were incorporated into a final site-visit report.
2. a final state-based site-visit report was compiled for distribution to state library agency staff and also incorporated their corrections. this provided a state-level reliability and validity check.
3. a summary of key findings was distributed to six experts in the public library technology environment, three of whom were public library technology managers and three of whom were technology consultants who worked with public libraries.

in combination, this approach provided three levels of data quality checks, thus providing both internal (library and state) and external (technology expert) support for the findings. the findings in this article are limited to the libraries visited and interviews conducted with public librarians and state library agency staff. however, themes emerged early during the site-visit process and were reinforced through subsequent interviews and visits across the states and libraries visited.
in addition, the use of external reviewers of the findings lends additional, but limited, support to the findings.

findings

this section presents the results of the site visits and interviews with state library agency staff and public librarians. the article presents the findings by key areas surrounding pat in public libraries.

the public-access context

public libraries have a range of pat installed in their libraries for patron use. these technologies include public-access computers, wireless (wifi) access, ilss, online databases, digital reference, downloadable audio and video, and others. many of these services and resources are also available to patrons from outside library buildings, thus extending the reach (and support issues) of the library beyond the library's walls. in addition, when libraries do not provide direct access to resources and services, they serve as access points to those services, such as online gaming and social networking. while libraries can and do deploy a number of technologies for public use, it is possible to group these technologies broadly into two overlapping categories:

■■ hardware. library pat hardware can include public-access computers, public-access computing registration (i.e., reservation) systems, self-checkout stations, printers, faxes, laptops, and a range of other devices and systems. some of these technologies may have additional devices, such as those required for persons with disabilities. within the hardware grouping are networking technologies that include a range of hardware and software to enable a range of library networks to run (e.g., routers, hubs, switches, telecommunications lines, and networking software).
■■ software. software can include device operating system software (e.g., microsoft windows, mac os, and linux), device application software (e.g., microsoft office, openoffice, graphics software, audio software, e-book readers, assistive software, and others), and functional software (e.g., web browsers, online databases, and digital reference).

in short, public libraries make use of a range of technologies that the public uses in some way. each type of technology requires skills, management, implementation, and maintenance, all of which are discussed later. in the building, all of these products and services come together at the library's public-access computers, or patron mobile device if wifi is available. moreover, patrons increasingly want to use their portable devices (e.g., usb drives, ipods, and others) with library technology. this places pressure on libraries to not just offer public-access computers, but also to support a range of technologies and services. thus the environment in which libraries offer pat is complex and requires substantial technical expertise, support, and maintenance in key areas of applications, computers, and networking. moreover, as discussed below, patrons are increasingly demanding market-based approaches to pat. these demands—which are largely about single-point access to a range of information services and resources—are often at odds with library technology that is based on stove-piped approaches (e.g., ils, e-books, and licensed resources) and that do not necessarily lend themselves to seamless integration.

external pressures on pats

the advent and increased use by the public of google, amazon, itunes, youtube, myspace, second life, and other networked services affect public libraries in a number of ways.
this article discusses these services and resources from the perspective of an information marketplace in which the public library is one entrant. interviewed librarians overwhelmingly indicated that users now expect library services to resemble those in the marketplace. users expect the look and feel, integration, service capabilities, interactivity, and personalization and customization that they experience while engaging in social networking, online searching, online purchasing, or other online activities. and within the library building, patrons expect the services to integrate at the public-access computer entry point—not distributed throughout the library in a range of locations, workstations, or devices. said differently, they expect to have a "mylibrary.com" experience that allows for seamless integration across the library's services but also facilitates the use of personal technologies (e.g., ipods, mp3 players, and usb devices). thus users expect the library's services to resemble those services offered by a range of information service providers.

importantly, however, librarians indicated that library systems on which their services and resources reside by and large do not integrate seamlessly—nor were they designed to do so. public-access computers are gateways to the internet; the ils exists for patrons to search for and locate library holdings; and online databases, e-books, audiobooks, etc., are extensions of the library's holdings but are not physical items under a library's control and thus subject to a vendor's information and business models. while library vendors and the library community are working to develop more integrated products that lead users to the information they seek, the technology is under development. there are three significant issues that libraries face because of market pressures: (1) the pressures all come together at a single point—the public-access computer; (2) users want a customized experience while using technology designed for the general public, not the individual user; and (3) users have choices in the information marketplace. one participant indicated, "if the library cannot match what users have access to on the outside, users will and do move on."

managing and maintaining public access

managing the public-access computer environment for public libraries is a growing challenge. there are a number of management areas with which public librarians contend:

■■ public-access computers—the computers and laptops (if applicable) themselves, which can include anything from keyboards and mice to troubleshooting a host of computer problems (it is important to note that these may be computers that often vary in age and composition, come from a range of vendors, run different operating systems, and often have different application software versions).
■■ peripheral management—the printers, faxes, scanners, and other equipment that are part of the library's overall public access infrastructure.
■■ public-access management software or systems—these may include online or in-building computer-based reservations (which encompasses specialized reservations such as teen machines, gaming computers, computers for seniors, and so on), time management (set to the library's decided-upon time allotment), filtering, security, logins, virtual machines, etc.
■■ wireless access—this may include logins and configurations for patrons to gain access to the library's wireless network.
■■ bandwidth management—this may include the need to allocate bandwidth differently as needs increase and decrease in a typical day.
■■ training and patron assistance—for a vast array of services such as databases, online searching, e-government (e.g., completing government forms and seeking government information), and others. training can take place formally through classes, but also through point-of-use tutorials requested by patrons.

to some extent, librarians commented that, while they do have issues with the public-access computers themselves from time to time, the real challenges that they face regard the actual management of the public-access environment—sign-ups, time limits, cost recovery for print jobs, helping patrons, and so on. one librarian commented that "the computers themselves are pretty stable. we don't really have too many issues with them per se. it's everything that goes into, out from, or around the computer that creates issues for us."

as a result of the management challenges, several libraries have adopted turn-key solutions, such as public-access management systems (e.g., comprise technology's smart access manager [http://www.comprisetechnologies.com/product_29.html]) and all-encompassing public computing management systems that include networking and desktops (e.g., userful's discoverstations [http://userful.com/libraries/]). these systems allow for an all-in-one sign-up, print cost recovery, filtering (if desired), and security approach. also, the discoverstations are a linux-based, all-encompassing public-access management environment. a clear advantage to the discoverstation approach is that the discoverstation is connected to the internet and is accessible by userful staff remotely to update software and perform other maintenance functions. they also use open-source operating and application software. while these solutions do provide efficiencies, they also can create limitations. for example, the discoverstations are a thin-client system and are dependent on the server for graphics and memory, thus limiting their ability to access gaming and social-networking sites. the smart access manager, and similar programs, can rely on smart cards or other technology that users must purchase to print. another limitation is that the time limits are fixed, and, while users get warnings as time runs out, the session can end abruptly. these approaches are by and large adopted by libraries to ease the management associated with public-access computers and let staff concentrate on other duties and responsibilities. one librarian indicated that "until we had our management system, we would spend most of the day signing people up for the computers, or asking them to finish their work for the next person in line."

planning for pat services and resources

public libraries face a number of challenges when planning for pat services and resources. this is primarily because pat planning involves more than computers.
any planning needs to encompass n building needs, requirements, limitations, and design; n technology assessment that considers the library’s existing technology, technology potential, current practices, and future trends; n planning for and supporting multiple technology platforms; n telecommunications and networking; n services and resources available in the marketplace—those specifically for libraries and those more broadly available to consumers and used by patrons; n specific needs and requirements of technology (e.g., memory, disk space, training, other); n requirements of other it groups with which the library may need to integrate, for example, city or county technology mandates; n support needs, including the need to enter into maintenance agreements for computer, network, and other equipment and software; n staff capabilities, such as current staff skill sets and their ability to handle the technologies under review or purchased; and n policy, such as requirements to filter because of local, state or federal mandates. the above list may not be exhaustive, but rather based on the main items that librarians identified during the site visits, and they serve to provide indicators of the challenges those planning library it initiatives face. 86 information technology and libraries | june 2009 n the endless upgrade and planning one librarian likened the pat environment to “being a gerbil on a treadmill. you go round and round and never really arrive,” a reference to the fact that public libraries are in a perpetual cycle of planning and implementing various pat services and resources. either hardware needs to be updated or replaced, or there is a software update that needs to be installed, or libraries are looking to the next technology coming down the road. in short, the technology planning to implementation cycle is perpetual. the upgrade and replacement cycle is further exacerbated by the funding situation in which most public libraries find themselves. increasingly, public library local and state funding, which combined can account for more than 90 percent of library funding, is flat or declining.14 the most recent series of public library internet studies indicates an increase in reliance by public libraries on fees and fines, fundraising, private foundation, and grant funding to finance collections and technology within libraries.15 this places key aspects of library operations in the realm of unreliable and one-time funding sources, thus making it difficult for libraries to develop multiyear plans for pat. n multiple support models to cope with pat management and maintenance issues, public libraries are developing various support strategies. the site visits found a number of technology-support approaches in effect, ranging from no it support to highly centralized statewide approaches. the following list describes the technology-support models encountered during the site visits: 1. no technology support. libraries in this group have neither technology-support staff nor any type of organized technology-support mechanism with existing library staff. nor do they have access to external support providers such as county or city it staff. libraries in this group might rely on volunteers or engage in ad hoc maintenance, but by and large have no formal approach to supporting or maintaining their technology. 2. internal library support without technology staff. in this model, the library provides its own technology support but does not necessarily have dedicated technology staff. 
rather, the library has designated one or more staff members to serve as the it person. usually this person has an interest in technology but has other primary responsibilities within the library. there may be some structure to the support—such as updating software (e.g., windows patches) once a week at a certain time—but it may be more ad hoc in approach. also, the library may try to provide its designated it person(s) with training to develop his or her skills further over time.

3. internal library support with technology staff. in this model, the library has at least one dedicated it staff person (part- or full-time) who is responsible for maintaining and planning the library's pat environment. the person may also have responsibilities for network maintenance and a range of technology-based services and resources. at the higher end of this approach are libraries with multiple it staff with differing responsibilities, such as networking, telecommunications, public-access computers, the ils, etc. libraries at this end of the spectrum tend to have a high degree of technology sophistication but may face other challenges (i.e., staffing shortages in key areas).

4. library consortia. over the years, public libraries have developed consortia for a range of services—shared ilss, resource sharing, resource licensing, and more. as public-library needs evolve, so too do the roles of library consortia. consortia increasingly provide training and technology-support services, and may be funded through membership fees, state aid, or other sources.

5. technology partners. while some libraries may rely on consortia for their technology support, others are seeking libraries that have more technology expertise, infrastructure, and abilities with whom to partner. this can be a fee-for-service arrangement that may involve sharing an ils, a maintenance agreement for network and public-access computer support, and a range of services. these arrangements allow the partner libraries to have some input into the technology planning and implementation processes without incurring the full expense of testing the technologies, having to implement them first, or hiring necessary staff (e.g., to manage the ils). the disadvantage to this model is that the smaller partner libraries are dependent on the technology decisions that the primary partner makes, including upgrade cycles, technology choices, migration time frames, etc.

6. city, county, or other agency it support. as city or county government agencies, some libraries receive technology support from the city or county it department (or in some cases the education department). this support ranges from a full slate of services and support available to the library to support only for the staff network and computers. even at the higher end of the support spectrum, librarians gave mixed reviews for the support received from it agencies. this was primarily because of competing philosophies regarding the pat environment, with public librarians wanting an open-access policy to allow users access to a range of information services and resources and it agency staff wanting to essentially lock down the public-access environment and thus severely limit the functionality of the public-access computers and network services (i.e., wireless). other limitations might include prescribed pat, specified vendors, and bidding requirements.

7. state library support.
one state library visited provides a high degree of service through its statewide approach to supporting public-access computing in the state’s public libraries. the state library has it staff in five locations throughout the state to provide support on a regional level but also has additional staff in the capital. these staff offer training, inhouse technical support, phone support, and can remote access the public-access computers in public libraries to troubleshoot, update, and perform other functions. moreover, this state built a statewide network through a statewide application to the federal e-rate program, thus providing broadband to all libraries. this model extends the availability of qualified technical support staff to all public libraries in the state—by phone as well as in person if need be. as a result, this enables public libraries to concentrate on service delivery to patrons. it is important to note that there are combinations of the above models in public libraries. for example, some libraries support their public-access networks and technology while the county or city it department supports the staff network and technology. it is clear, however, that there are a number of models for technology support in public libraries, and likely more than are presented in this article. the key issue is that public libraries are engaging in a broad spectrum of strategies to support, maintain, and manage their pat infrastructure. also of significance is that there are public libraries that have no technology-support services that provide pat services and resources. these libraries tend to serve populations of less than ten thousand, are rural, have fewer than five full-time equivalents (ftes), and are unlikely to be staffed by professional librarians. staff needs and pressures the study found a number of issues related to the effect of pat on library staff. this section of the findings discusses the primary factors affecting library staff as they work in the public-access context. n multiple skills needed not only is the pace of technological change increasing, but the change requires an ever-increasing array of skills because of the complexity of applications, technologies, and services. an example of such complexity is the library opac or ils. visited libraries indicated that such systems are becoming so complex and technologically sophisticated that there is a need for a full-time staff person to run and maintain the library ils. given the range of hardware, software, and networking infrastructure, as well as planning and pat management requirements, public librarians need a number of skills to successfully implement and maintain their pat environments. moreover, the skill needs depend on the librarian’s position—for example, an actual it staff person versus a reference librarian who does double duty by serving as the library’s it person. the skills required fall into technology, information literacy, service and facilities planning, management, and leadership and advocacy areas: n technology o general computer troubleshooting o basic maintenance, such as mouse and keyboard cleaning o basic computer repair, such as memory replacement, floppy drive replacement, disk defragmentation, etc. 
o basic networking, such as troubleshooting an “internet” issue versus a computer problem o telecommunications so as to understand the design and maintenance of broadband networks o integrated library systems o web design n information literacy o searching and using internet-based resources o searching and using library licensed resources o training patrons on the use of the publicaccess computers, general internet resources, and library resources o designing curriculum for various patron training courses n services and facilities planning o technology plan development and implementation (including budgeting) o telecommunications planning (including 88 information technology and libraries | june 2009 e-rate plan and application development) o building design so as to accommodate the requirements of public access technologies n management o license and contract negotiation for licensed resources, various public-access software and licenses, and maintenance agreements (service and repair agreements) o integration of pat into library operations o troubleshooting guidelines and process o policy development, such as acceptable use, filtering, filtering removal requests by patrons, etc. n leadership and advocacy o grant writing and partnership development so as to fund pat services and resources and extend out into the community that the library serves o advocacy so as to be able to demonstrate the value of pat in the library as a community good o leadership so as to build a community approach to public access with the library as one of the foundational institutions these items provide a broad cross section of the skills that public library staff may need to offer a robust pat environment. in the case of smaller, rural libraries, these requirements in general fall to the library director—along with all other duties of running the public library. in libraries that have separate technology, collections development, and other specialized staff, the skills and expertise may be dispersed throughout various areas in the library. n training public librarians receive a range of technology training— including none at all. in some cases, this might be a basic workshop on some aspect of technology at a state library association annual meeting or a regional workshop hosted by the library’s consortium. it could be an online course through webjunction (http://www.webjunction .org/). it could be a one-on-one session with a vendor representative or colleague. or it could be a formal, multiday class regarding the latest release of an ils. if available, public librarians have access to technology training that can take many forms, has a wide array of content (basic to expert), and can enhance staff knowledge about it with varying degrees of success. an issue raised by librarians was that having access to training and being able to take advantage of training are two separate things. regardless of the training delivery medium, librarians indicated that they were not always able to get release time to attend a training session. this was particularly the case for small, rural libraries that had less than five ftes spread out over several part-time individuals. for these staff to take advantage of training would require a substitute to cover public-service hours—or shut down the library. funding information technology as one might expect, there was a range of technology budgets in the public libraries visited or interviewed— from no technology budget to a substantial technology budget, and many points in between. 
some libraries had a dedicated it budget line item, others had only an operating budget out of which they might carve some funds for technology. libraries with dedicated it budgets by and large had at least one it staff person; libraries with no it budget largely relied on a staff person responsible for other library functions to manage their technology. in the smallest libraries, the library director served as the technology specialist in addition to being the general library operation manager. some libraries have established foundations through which they can raise funds for technology, among other library needs. many seek grants and thus devote substantial effort to seeking grant initiatives and writing grant proposals. some libraries held fundraisers and worked with their library friends groups to generate funds. other libraries engage in all of the above efforts to provide for their pat infrastructure, services, and resources. in short, there are several budgetary approaches public libraries use to support their pat environment. critical to note is that a number of libraries are increasingly relying on nonrecurring funds to support pats, a fact corroborated by the 2007 and 2008 public library internet surveys.16 the buildings when one visits public libraries, one is immediately struck by the diversity in design, functionality, and architecture of the buildings. public libraries often reflect the communities that they serve not only in the collection and service, but also in the facilities. this diversity serves the public library community well because it allows for a custom approach to libraries and their community. the building design, however, can also be a source of substantial challenge for public libraries. the increased integration of technology into library service places a range of stresses on buildings—physical space for workstations and other equipment and specialized furniture, power, server rooms, and cabling, for example. along with the library-based technology requirements come those of patrons—particularly the need for power so that public access technologies in public libraries | bertot 89 patrons may plug in their laptops or other devices. also important to note is that the building limitations also extend to staff and their access to computing and networked technologies. a number of librarians commented that they are “simply at capacity.” one librarian summed it up by stating that “there’s no more room at the inn. unless we start removing parts of our collection, we don’t have any more room for workstations.” another said that, “while we do have the space to add more computers, we don’t have enough power or outlets to support them. and, with our building, it’s not a simple thing to add.” in short, many libraries are reaching, or have reached, a saturation point as to just how much pat they can support. n discussion and implications over time, pat services have become essential services that public libraries provide their communities. 
with nearly all public libraries connected to the internet and offering public-access computers, the high percentage of libraries that offer internet-based services and resources, the overall usage of these resources by the public,17 and 73 percent of public libraries reporting that they are the only free provider of pat in their communities, it is clear that the provision of pat services is a key and critical service role that public libraries offer.18 it is also clear, however, that the extent to which public libraries can continue to absorb, update, and expand their pat depends on the resolution of a number of staffing, financial, maintenance and management, and building barriers. in a time of constrained budgets, it is unlikely that libraries will receive increased operational funding. indeed, reports of library funding cuts are increasing in the current economic downturn, which affects the ability of libraries to increase, or significantly update, staff—particularly in the areas of technology, licensing additional resources, procuring additional and new computers, and purchasing and offering expanded services such as digital photography, gaming, or social networking.19 moreover, the same financial constraints can affect the ability of libraries to raise capital funds for building improvements and new construction. funding also has an effect on the training that public libraries can offer or develop for their staff. and training is becoming increasingly important to the success of pat services and resources in public libraries—but not just training regarding the latest technologies. rather, there is a need for training that provides instruction on the relationship between the level of pat services and resources a library can or desires to provide and advocacy; broadband, computing, and other needs; technology planning and management; collaboration and partnering; and leadership. the public library pat environment is complex, encompasses a number of technologies, and has ties to many community services and resources. training programs need to reflect this complexity. the continued provision of pat services in public libraries is increasingly burdensome on the public library community, and the pressures to expand their pat services and resources continues to grow—particularly as libraries report their “sole provider” of free pat status in their communities. the successful libraries in terms of pat services and resources visited had staff that could n understand pat (both in terms of functionality and potential); n think creatively across the technology and library service spectrum; n integrate online content, pat, and library services; n articulate the value of pat as an essential community need and public library service; n articulate the role of the perception of the library by its community as a critical bridge to online content; n demonstrate leadership within the community and library; n form partnerships and extend pat services and resources into the community; and n raise funds and develop other support mechanisms to enhance pat services and resources in the library and throughout the community. in short, successful pat in libraries was being redefined in the context of communitywide pat service and resource provision. this approach not only can lead to a more robust community pat infrastructure, but it also lessens the library’s burden of pat service and resource provision. 
but equally important to note is that the extent to which all public libraries can engage in these activities on their own is unclear. indeed, several libraries visited were struggling to maintain basic pat service levels and indicated that increasing pat services came at the expense of other library services. "we're trying to meet demand," one librarian said, "but we have too few computers, too slow a connection, and staff don't always know what to do when things go wrong or someone comes in talking about the latest technology or website." for some libraries, therefore, quality pat services that meet community needs are simply out of reach. thus another implication and finding of the study is the need for libraries to explore other models of support for their pat environments—for example, using the services of a regional cooperative, if available; if none is available, libraries could form their own cooperative for resource sharing, technology support, and other aspects of pat service provision. the same approach could be taken within a city or county to enhance technology support throughout a region. another approach would be to outsource a library's pat support and maintenance to a nearby library with support staff in a fee-for-service arrangement. there are a number of approaches that libraries could take to support their pat infrastructure. a key point is that libraries need to consider pat service provision in a broader community, regional, or state context, and the study found some libraries doing so. the need to equip staff with the skills required to truly support pat was a recurring theme throughout the site visits. approaches and access to training varied. for example, some state libraries provided—either directly or through the hiring of consultants and instructors—a number of technology-related courses taught in regional locations. an example of this approach is california's infopeople project (http://www.infopeople.org/). some state libraries subscribed to webjunction (http://www.webjunction.org/), which provides access to online instructional content. online manuals provided by compumentor through a grant funded by the bill and melinda gates foundation aimed at helping rural libraries support their pat (www.maintainitproject.org) are another resource. beyond technology skills training, however, is the need for technology planning, effective communication, leadership, value demonstration, and advocacy. the extent to which leadership, advocacy, and library marketing, for example, can be taught remains a question. all of these issues take place against the backdrop of an economic downturn and budgetary constraints. increased operating costs created through inflation and higher energy costs place substantial pressures on public libraries simply to maintain current levels of service—much less engage in the additional levels of service that the pat environment brings. indeed, as the 2008 public library funding and technology access study demonstrated, public libraries are increasingly funding their technology-based services through nonrecurring funds such as fines and fundraising activities.20 thus, the ability of public libraries to provide robust pat services and resources is increasingly limited unless such service provision comes at the expense of other library services. alone, the financial pressures place a high burden on public libraries.
combined with the building, staffing, skills, and other constraints reported by public libraries, however, the emerging picture for library pat services and resources is one of significant challenge. ■■ three key areas for additional exploration the findings from the study point to the need for additional research and exploration of three key service areas and issues related to pat support and services: 1. develop a better understanding of success in the pat environment. this study and the 2006 study by bertot et al. point to what is required for libraries to be successful in a networked environment.21 in fact, the 2007 public libraries and the internet report contained a section entitled "the successfully networked public library," which offered a range of checklists for public libraries (and others) to consider as they planned and implemented their networked services.22 this study identified additional success factors and considerations focused specifically on the public access technology environment. together, these efforts point to the need to better understand and articulate the critical success factors necessary for public libraries to plan, implement, and update their pat given current service contexts. this is particularly necessary in the context of meeting user expectations and needs regarding networked technologies and services. 2. further identify technology-support models. this study uncovered a number of different technology-support models implemented by public libraries. undoubtedly there are additional models that require identification. but, more importantly, there is a need to further explore how each technology-support model assists libraries, under what circumstances, and in what ways. some models may be more or less appropriate on the basis of the service context of the library—and that is not clearly understood at this time. 3. levels of service capabilities. an underlying theme throughout this research, and one that is increasingly supported by the public libraries and the internet studies, is that the pat service context is essentially a continuum from low service and capability to high service and capability. there are a number of factors contributing to where libraries may lie on the success continuum—funding, management, leadership, attitude, skills, community support, and innovation, to name a few. this continuum requires additional research, and the research implications could be profound. emerging data indicate that there are public libraries that will be unable to continue to evolve and meet the increased demands of the networked environment, both in terms of staff and infrastructure. public libraries will have to make choices regarding the provision of pat services and resources in light of their ability to provide high-quality services (as defined by their service communities). for better or worse, the technology environment continually evolves and requires new technologies, management, and support. that is, and will continue to be, the nature of public access to the internet. though there are likely other issues worthy of exploration, these three are critical to further our understanding of the pat environment and public library roles and issues associated with the provision of public access. ■■ conclusion the pat environment in which public libraries operate is increasingly complex and continues to grow in funding, maintenance and management, staffing, and building demands.
public libraries have navigated this environment successfully for more than fifteen years; however, stresses are now evident. libraries rose quickly to the challenge of providing public-access services to the communities that they serve. the challenges libraries face are not necessarily insurmountable, and there is a range of tools designed to help public libraries plan and manage their public-access services. these tools, however, place the burden of public access, or assume that the burden of public access is placed, on the public library. given increased operating costs because of inflation, the continual need to innovate and upgrade technologies, staff technology skills requirements, and other factors discussed in this article, libraries may not be in a position to shoulder the burden of public access alone. thus there is a need to reconsider the extent to which pat provision is the sole responsibility of the library; perhaps there is a need to integrate and expand public access throughout a community. such an approach can benefit a community through an integrated and broader access strategy, but it can also relieve the pressure on the public library as the sole provider of public access. ■■ acknowledgement this research was made possible in part through the support of the maintainit project (http://www.maintainitproject.org/), an effort of the nonprofit techsoup web resource (http://www.techsoup.org/). references 1. charles r. mcclure, john carlo bertot, and douglas l. zweizig, public libraries and the internet: study results, policy issues, and recommendations (washington, d.c.: national commission on libraries and information science, 1994). 2. john carlo bertot and charles r. mcclure, moving toward more effective public internet access: the 1998 national survey of public library outlet internet connectivity (washington, d.c.: national commission on libraries and information science, 1998), http://www.liicenter.org/reports/1998_plinternet_study.pdf (accessed apr. 22, 2009). 3. charles r. mcclure, john carlo bertot, and john c. beachboard, internet costs and cost models for public libraries (washington, d.c.: national commission on libraries and information science, 1995). 4. charles r. mcclure, john carlo bertot, and douglas l. zweizig, public libraries and the internet: study results, policy issues, and recommendations (washington, d.c.: national commission on libraries and information science, 1994); john carlo bertot, charles r. mcclure, paul t. jaeger, and joe ryan, public libraries and the internet 2006: study results and findings (tallahassee, fla.: information institute, 2006), http://www.ii.fsu.edu/projectfiles/plinternet/2006/2006_plinternet.pdf (accessed mar. 5, 2009). 5. john carlo bertot, charles r. mcclure, carla b. wright, elise jensen, and susan thomas, public libraries and the internet 2007: study results and findings (tallahassee, fla.: information institute, 2008), http://www.ii.fsu.edu/projectfiles/plinternet/2007/2007_plinternet.pdf (accessed sept. 10, 2008). 6. charles r. mcclure and paul t. jaeger, public libraries and internet service roles: measuring and maximizing internet services (chicago: ala, 2008). 7. george d'elia, june abbas, kay bishop, donald jacobs, and eleanor jo rodger, "the impact of youth's use of the internet on the use of the public library," journal of the american society for information science & technology 58, no.
14 (2007): 2180–96; george d’elia, corinne jorgensen, joseph woelfel, and eleanor jo rodger, “the impact of the internet on public library use: an analysis of the current consumer market for library and internet services,” journal of the american society for information science & technology 53, no. 10 (2002): 802–20. 8. national center for education statistics (nces), public libraries in the united states: fiscal year 2005 [nces 2008301] (washington, d.c.: national center for education statistics, 2007); pew american and internet life, “internet activities,” http:// www.pewinternet.org/trends/internet_activities_2.15.08.htm (accessed mar. 5, 2009). 9. bertot et al., public libraries and the internet 2007. 10. ibid. 11. cheryl bryan, managing facilities for results: optimizing space for services (chicago: public library association, 2007); joseph matthews, strategic planning and management for library managers (westport, conn.: libraries unlimited, 2005); joseph matthews, technology planning: preparing and updating a library technology plan (westport, conn.: libraries unlimited, 2004); diane mayo and jeanne goodrich, staffing for results: a guide to working smarter (chicago: public library association, 2002). 12. ala, libraries connect communities: public library funding & technology access study (chicago: ala, 2008), http:// www.ala.org/ala/aboutala/offices/ors/plftas/0708report.cfm (accessed mar. 5, 2008). 13. charles p. smith, ed., motivation and personality: handbook of thematic content analysis (new york: cambridge univ. 92 information technology and libraries | june 2009 pr., 1992); klaus krippendorf, content analysis: an introduction to its methodology (beverly hills, calif.: sage, 1980). 14. ala, libraries connect communities. 15. bertot et al., public libraries and the internet 2006; bertot et al., public libraries and the internet 2007. 16. ibid. 17. nces, public libraries in the united states. 18. bertot et al., public libraries and the internet 2007. 19. american libraries, “branch closings and budget cuts threaten libraries nationwide,” nov. 7, 2008, http://www .ala.org/ala/alonline/currentnews/ newsarchive/2008/november2008/ branchesthreatened.cfm (accessed nov. 17, 2008). 20. ala, libraries connect communities. 21. bertot et al., public libraries and the internet 2006. 22. bertot et al., public libraries and the internet 2007. 44 information technology and libraries | march 2011 jennifer emanuel usability of the vufind next-generation online catalog vufind incorporates many of the interactive web and social media technologies that the public uses online, including features from online booksellers and commercial search engines. the vufind search page is simple, containing only a single search box and a dropdown menu that gives users the option to search all fields or to search by title, author, subject, or isbn/issn (see figure 1). to combine searches using boolean logic or to limit to a particular language or format, the user must use the advanced search feature (see figure 2). the recordresults page displays results vertically, with each result containing basic item information, such as title, author, call number, location, item availability, and a graphical icon displaying the material’s format. the results page also has a column on the right side displaying “facets,” which are links that allow a user to refine their search and browse results using catalog data contained within the result set (see figure 3). 
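the facet column described above is produced by the search index rather than by the underlying opac: vufind loads marc records into an apache solr index and asks solr to return facet counts alongside each result set. the following is a minimal sketch of that kind of request, not vufind's actual code; the host, port, core name, and field names (format, author, publishDate) are placeholder assumptions that vary by installation.

```python
import requests  # third-party http client: pip install requests

# placeholder endpoint; the solr host, port, and core name are assumptions
SOLR_SELECT = "http://localhost:8983/solr/biblio/select"

params = {
    "q": "climate change",                                # the user's "all fields" query
    "rows": 20,                                           # one page of results
    "wt": "json",                                         # ask for a json response
    "facet": "true",                                      # turn faceting on
    "facet.field": ["format", "author", "publishDate"],   # assumed index field names
    "facet.mincount": 1,                                  # hide facet values with zero hits
}

data = requests.get(SOLR_SELECT, params=params).json()

# solr returns each facet field as a flat [value, count, value, count, ...] list;
# a discovery layer renders these pairs as the clickable links in the facet column
for field, flat in data["facet_counts"]["facet_fields"].items():
    pairs = list(zip(flat[::2], flat[1::2]))
    print(field, pairs[:5])
```

because the counts are computed over the result set itself, narrowing by a facet is simply the same query repeated with an added filter (fq) parameter.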
vufind also contains a variety of web 2.0 features, such as the ability to tag items, create a list of favorite items, leave comments about an item, cite an item, and links to google book previews and extensive author biographies data mined from the internet. corresponding to the beginning of the vufind trial at uiuc, the university library purchased reviews, synopses, and cover images from syndetic solutions to further enhance both vufind and the existing webvoyage catalog. an additional appealing aspect of vufind was its speed; the carli installation of webvoyage is slow to load and is prone to time out while conducting searches. the uiuc library first provided vufind (http:// www.library.illinois.edu/vufind) at the beginning of the 2008 fall semester and expected it to be trialed through the end of the spring semester 2009. use statistics show that throughout the fall semester (september through december), there were approximately six thousand unique visitors each month, producing a total of more than thirty-eight thousand visits. spring statistics show use averaging more than ten thousand visitors a month, an increase most likely from word-of-mouth. librarians at both uiuc and carli were interested in what users thought about vufind, especially in relation to the usability of the interface. with this in mind, the library launched several forms of assessment during the spring semester. the first was a quantitative survey based on yale’s vufind usability testing.3 the second was a more extensive qualitative usability test that had users conducting sample searches in the interface and telling the facilitator their opinions. this article will discuss the hands-on usability portion of this study. survey responses that support the results presented herein will be reported in a separate venue. while this article only discusses vufind at a single institution, it does offer a generalized view of next-generation catalogs and how library users use such a catalog compared to a traditional online catalog. the vufind open–source, next-generation catalog system was implemented by the consortium of academic and research libraries in illinois as an alternative to the webvoyage opac system. the university of illinois at urbana-champaign began offering vufind alongside webvoyage in 2009 as an experiment in next generation catalogs. using a faceted search discovery interface, it offered numerous improvements to the uiuc catalog and focused on limiting results after searching rather than limiting searches up front. library users have praised vufind for its web 2.0 feel and features. however, there are issues, particularly with catalog data. v ufind is an open–source, next-generation catalog overlay system developed by villanova university library that was released to the public as beta in 2007 and version 1.0 in 2008.1 as of july 2009, four institutions implemented vufind as a primary catalog interface, and many more are either beta or internally testing it.2 more information about vufind, including the technical requirements and compatible opacs, is available on the project website (http://www.vufind.org). in illinois, the state consortium of academic and research libraries in illinois (carli) released a beta installation of vufind in 2008 on top of its webvoyage catalog database. the carli installation of vufind is a base installation with minor customizations to the carli catalog environment. 
some libraries in illinois utilize vufind as an alternative to their online catalog, including the university of illinois at urbana-champaign (uiuc), which currently advertises vufind as a more user friendly and faster version of the library catalog. as a part of the evaluation of nextgeneration catalog systems, uiuc decided to conduct hands-on usability testing during the spring of 2009. the carli catalog environment is very complex and comprises 153 member libraries throughout illinois, ranging from tiny academic libraries to the very large uiuc library. currently, 76 libraries use a centrally managed webvoyage system referred to as i-share. i-share is composed of a union catalog containing holdings of all 76 libraries as well as individual institution catalogs. library users heavily use the union catalog because of a strong culture of sharing materials between member institutions. carli’s vufind installation uses the records of the entire union catalog, but has library-specific views. each of these views is unique to the member library, but each library uses the same interface to view records throughout i-share. jennifer emanuel (emanuelj@illinois.edu) is digital services and reference librarian, university of illinois at urbana-champaign. usability of the vufind next-generation online catalog | emanuel 45 not simply find them.6 as a result, the past five years have been filled with commercial opac providers releasing next-generation library interfaces that overlay existing library catalog information and require an up-front investment by libraries to improve search capabilities. as these systems are inherently commercial and require a significant investment of capital, several open–source, next-generation catalog projects have emerged, such as vufind, blacklight, scriblio, and the extensible catalog project.7 these interfaces are often developed at one institution with their users in mind and then modified and adapted by other institutions to meet local needs. however, because they can be locally customized, libraries with significant technical expertise can have a unique interface that commercial vendors cannot compete against. one cannot discuss next-generation catalogs without mentioning the metadata that underlie opac systems. some librarians view the interface as only part of the problem of library catalogs and point to cataloging and metadata practices as the larger underlying problem. many librarians view traditional cataloging using machine-readable cataloging (marc), which has been used since the 1960s, as outdated because it was developed with nearly fifty-year-old technology in mind.8 however, because marc is so common and allows cataloging with a fine degree of granularity, current opac systems still utilize it. librarians have developed additional cataloging standards, such as dublin core (dc), metadata object description schema (mods), and functional requirements for bibliographic records (frbr), but none of these have achieved widespread adoption for cataloging printed materials. newly developed catalog projects, such as extensible catalog, are beginning to integrate these new metadata schemas, but currently others continue to use marc.9 many librarians also advocate to integrate folksonomy, or user tagging, into library catalogs. 
folksonomy is used by many library websites, most notably flickr, delicious, and librarything, each of which store user-submitted content that istagged with self-selected keywords that allow for easy retrieval and discovery.10 vufind integrates tagging into individual item records ■■ literature review librarians have complained about the usability of online catalogs since they were first created.4 when amazon.com became the go-to site for books and book information in the early 2000s, librarians and their users began to harshly criticize both opac interfaces and metadata standards.5 ever since north carolina state university announced a partnership with the commercial-search corporation endeca in 2006, librarians have been interested in the next generation of library catalogs and more broadly, discovery systems designed to help users discover library materials, figure 1. vufind default search figure 2. vufind advanced search figure 3. facets in vufind 46 information technology and libraries | march 2011 searching the library’s online catalog and were eager to see changes made to it. the test used was developed from a statewide usability test of different catalog interfaces usedin illinois. the test was adapted using the same sample searches, but was customized to the features and uses of vufind (see appendix). the vufind test was similar to the original test to allow a comparison of other catalog interfaces to vufind for internal evaluation purposes. i designed the test to allow subjects to perform a progressively complicated series of sample searches using the catalog while the moderator pointed out various features of the catalog interface. subjects were also asked what they thought about the search result sets and their opinions of the interface and navigation; they also were asked to perform specific tasks using vufind. the tasks were common library-catalog tasks using topics familiar at undergraduate–level students. the tasks ranged from a keyword search for “global warming” to a more complicated search for a specific compact disc by the artist prince. the tasks also included using the features associated with creating and using an account with vufind, such as adding tags and creating a favorite items list. through completing the test, subjects got an overview of vufind and were then asked to draw conclusions about their experience and compare it to other library catalogs they have used. the tests were performed in a small meeting room with one workstation set up with an install of the morae software, a microphone, and a web camera. morae is a very powerful software program developed by techsmith that records the screen on which the user is interacting with an interface, as well as environmental audio and video. although the study did not utilize all the features of the morae software, it was invaluable to the researcher to be able to review the entire testing experience with the same detail as when the test actually occurred in person. the study was carried out with the researcher sitting next to the workstation asking subjects to perform a task from the script while morae recorded all of their actions. once all fifteen subjects completed the test, the researcher watched the resulting videos and coded the answers into various themes on the basis of both broad subject categories and individual question answers. 
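as a note on the sample size chosen in the methods above, nielsen's five-user figure comes from a cumulative-discovery model in which each tester independently surfaces a given problem with probability lambda, estimated at about 31 percent on average; the share found by n testers is then 1 - (1 - lambda)^n. a quick check of the figures the methods section relies on:

```python
# nielsen/landauer model: share of usability problems found by n testers,
# assuming each tester hits a given problem with probability lam (about 0.31)
lam = 0.31

for n in (1, 5, 15):
    found = 1 - (1 - lam) ** n
    print(f"{n:>2} testers find about {found:.1%} of problems")
# roughly 31.0% for one tester, 84.4% for five, and 99.6% for fifteen,
# in line with the 85 percent and "all problems" figures cited above
```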
the researcher then gathered the codes into categories and used them to further analyze and gain insight into both the useful features of and problems with the vufind interface. ■■ analysis participants generally liked vufind and preferred it to the current webvoyage system. when asked to choose which catalog they would rather use, only one person, a faculty member, stated he would still use webvoyage. this faculty but does not pull tags from other sources; rather, users must tag items individually. additionally, next-generation catalogs offer a search mechanism that focuses on discovery rather than simply searching for library materials. users, accustomed to new ways of searching both on the internet and through commercial library indexing and abstracting databases, now search in a fundamentally different style than they did when opacs first became a part of library services. the online catalog is now just one of many tools that library users use to locate information and now covers fewer resources than it did ten to fifteen years ago. library users are now accustomed to using a single search box, such as with google; they also use nonlibrary online tools to find information about books and no longer view library catalogs as the primary place to look for books.11 as users are no longer accustomed to using the controlled language and particular searching methods of library catalogs because they have moved to discovering materials online, libraries must adapt to new way of obtaining information and focus not on teaching users how to locate library materials, but give them the tools to discover on their own.12 vufind is one option among many in the genre of next-generation or discovery-catalog tools. ■■ methods the study employed fifteen subjects who participated in individual, hands-on usability test sessions lasting an average of thirty minutes. i recruited volunteers though several methods, including posting to a university faculty and staff e-mail discussion list, an e-mail discussion lists aimed toward graduate students, and flyers in the undergraduate library. all means of recruitment stated that the library sought volunteer subjects to perform a variety of sample searches in a possible new library catalog interface. i also informed subjects that there was a gift card as a thank you for their time. all subjects had to sign a human subjects statement of informed consent approved by the university of illinois institutional review board. i sought a diverse sample, and therefore accepted the first five volunteers from the following pools: faculty and staff, graduate students, and undergraduate students. i felt that these three user groups were distinct enough to warrant having separate pools. the number of five users in each group was chosen because of jakob nielsen’s statement that five users will find 85 percent of usability problems and that fifteen users will discover all usability problems.13 although i did not specifically aim to recruit a diverse sample, the sample showed a large diversity in areas including age, library experience, and academic discipline. all subjects stated they had some experience usability of the vufind next-generation online catalog | emanuel 47 though there were questions as to how results were deemed relevant to the search statement as well as how they were ranked. participants were then asked to look at the right sidebar of the results page, which contains the facets. 
most users did not understand the term “facets,” with faculty and staff understanding the term more than graduate and undergraduate students did. one faculty member who understood the term facet noted that “facets are like a diamond with different sides or ways of viewing something.” however, when asked what term would be better to call the limiting options other than facet, several users suggested either calling the facets “categories” or renaming the column “refine search,” “narrow search,” or “sort your search.” participants were then asked to find how to see results for other i-share libraries. only two faculty members found i-share results quickly, and just half of the remaining participants were able to find the option at all. when asked what would make that option easier to find, most said they liked the wording, but the option needed to stand out more, perhaps with a different colored link or bolder type. two users thought having the location integrated as a facet would be the most useful way of seeing it. participants, however, quickly took to using the facets, as they were asked to use the climate change search results to find an electronic book published in 2008. no user had problems with this task, and several remarked that using facets was a lot easier than limiting to format and year before searching. the next task for participants was to open and examine a single record within their original climate change results (see figures 4 and 5). participants liked the layout, including the cover image with some brief title information, and a tabbed bar below showing additional information, such as more detailed description, holdings information, a table of contents, reviews, comments, and a link to request the item. several users remarked that they liked having information contained under tabs, but vufind organized each tab as a new webpage that made going back to previous tabs or the results page cumbersome. the only problem users had with the information contained within the tabs was the “staff view,” which contained the marc record information. most users looked at the marc record with confusion, including one graduate student who said, “if the staff view is of no use to the user, why even have it there?” one other useful feature that individual records in vufind contain is a link to an overlay window containing the full citation information for the item in both apa and mla formats. users were able to find this “cite this” link and liked having that information available. however, several participants noted that citation information would be much more beneficial if it could be easily exported to refworks or other bibliographic software. the next several searches used progressively higher-level member thought most of his searches were too advanced for the vufind interface and needed options that vufind did not have, such as limiting a search to an individual library or call number searching. this user did, however, specify that vufind would be easier to use for a fast and simple search. other users all responded very favorably to vufind, liking it better than any other online catalog they have used, with most stating that they wanted it as a permanent addition to the library. the most common responses to vufind were that the layout is easier on the eyes and displayed data much better than the webvoyage catalog; there were no comments about actual search results. 
several users stated that it was nice to be able to do a broad search and then have all limiting options presented to them as facets, allowing users to both limit after searching and letting them browse through a large number of search results. one user, an undergraduate student, stated she liked vufind because it “was new” and she always wants to try out new things on the internet. the first section of the usability test asked users to examine both the basic and advanced search options. users easily recognized how the interface functioned and liked having a single search box as the basic interface, noting that it looked more like a web search engine. they also recognized all of the dropdown menu options and agreed that the options included what they most often searched. however, four users wanted a keyword search. even though there is not a keyword search in webvoyage and there is an “all fields” menu option, participants seemed to think of the one box search universally as a keyword search and wanted that to be the default search option. one participant, an international graduate student, remarked that keyword is more understood by international students than the “all fields” search because, internationally, a field is not a search field but a scholarly field such as education or engineering. in the advanced search, all users thought the search options were clear and liked having icons to depict the various media formats. however, two users did remark that it would be useful to be able to limit by year on the advanced search page. the advanced search also is where the user can select one of seven languages, all of which are considered western languages, including latin and russian. two users, both international graduate students, stated that more languages would be beneficial, especially asian and more slavic languages. the university of illinois has separate libraries for asian and slavic materials, and these two participants said it would be useful to have search options that include the languages served by the libraries. the first task that participants were asked to do was an “all fields” search for “climate change.” they were instructed to look at the results page and an individual record to give feedback as to how they liked the layout and what they thought of the search results. upon looking at the results, all participants thought they were relevant, 48 information technology and libraries | march 2011 to items in which james joyce is the author, no participant had any problems, though several pointed out that there were three facets using his name—joyce, james; joyce, james avery; and joyce, j. a.—because of inconsistencies in cataloging (see figure 6). participants were next asked to search for an audio recording by the artist prince using the basic (single) search box. most participants did an “all fields” search for prince and attempted to use the facets to limit by a particular format. all but one was confident that they achieved the proper result, but there was confusion about the format. some participants were confused as to what format an audio recording was because the corresponding facet was for a music recording. a couple of users thought “audio recording” could be a spoken-word recording. most participants preferred that the format facets be more concrete toward a single actual physical format, such as a record, cassette, or a compact disc (see figure 7). 
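the author and format problems described above are, at bottom, indexing decisions: whatever marc fields the import rules read become the values shown on the results page and offered as facets. as a rough illustration of the fallback behavior participants expected for the author column, and not vufind's actual import code, here is a short pymarc sketch that prefers the 100 main entry and falls back to a 700 added entry; the input file name is a placeholder.

```python
from pymarc import MARCReader  # third-party marc library: pip install pymarc

def display_author(record):
    """prefer the 100 (main entry personal name) and fall back to the first
    700 (added entry) so that edited volumes still show a name on the
    results page instead of a blank author column."""
    for tag in ("100", "700"):
        fields = record.get_fields(tag)
        if fields:
            names = fields[0].get_subfields("a")  # subfield $a carries the name
            if names:
                return names[0]
    return None  # no personal-name entry at all

# "export.mrc" is a hypothetical file of exported bibliographic records
with open("export.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is None:  # skip records the reader could not parse
            continue
        print(display_author(record))
```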
physical formats appeared to resonate more with users than the broad cataloging term of “music recording.” a more specific format type (i.e., compact disc) is contained in the call number and should be straightforward to pull out as a facet. it appears vufind pulls the format information from marc field 245 subfield $h for medium rather than the call number (which at illinois can specify the format) or the 300 physical description field or another field such as a notes field that some institutions may use to specify the exact format. however, when participants were asked to further use facets to find prince’s first album, 1978’s for you, limitations with vufind became more apparent. each participant used a different method to search for this album, and none actually found the item either locally or in i-share, though the item has multiple copies available in both locations. most participants tried initially limiting by date because they were given that information. however, vufind’s facets focus on eras rather than specific years, which participants stated was frustrating as many items can fall under a broad era. also, the era facets brought up many more eras than one would consider an audio research skills and showed problems with both vufind and the catalog record data. the first search asked participants to do an “all fields” search for james joyce. all were able to complete the search, but there was notable confusion as to which records were written by james joyce and which were items about him. about half of the first-page results for this search did not list an author on the results page. vufind appears to pull the author field on the results page from the 100 field in the marc record, so if the 700 field is used instead for an editor, this information is not displayed on the results page. individual records do substitute the 700 field if the 100 field is not present, but this should also be the case on the initial results screen as well. several users thought it was strange that the results page often did not list the author, but an author was listed in the individual record. additionally, when asked to use the facets to limit figure 4. results set figure 5. record display figure 6. author facet figure 7. format facet usability of the vufind next-generation online catalog | emanuel 49 about both the reviews and comments that could be seen in the various records participants were asked to examine. many of the participants wanted more information as to where the reviews came from because this information was not clear. they also wanted to know whether the reviews or comments from catalog users had any type of moderation by a librarian. for the most part, participants liked having reviews inside the catalog records, but they liked having a summary even more. several users, all graduate students, expressed concern about the objectiveness of having reviews in the catalog, especially because it was not clear who did the review and feared that reviews may interject some bias that had no place in a library catalog record. one of these participants stated, “if i wanted reviews, i would just go to amazon. i don’t expect reviews, which can be subjective, to be in a library catalog—that is too commercial.” several undergraduate participants stated that reviews helped them decide whether the book was something that would be useful to them. the final task of the usability test asked participants to create an account with vufind because it is not connected to our user database. 
most users had no problems finishing this task, though they found some problems with the interface. first, it was not clear that users had to create an account and could not log in with their library number as they did in the library’s opac. second, the default field asks users for their barcode, which is not a term used at uiuc (users are assigned a library number). once logged in, participants were satisfied with the menu options and how their account information was displayed. finally, participants were asked, while logged in, to search for a favorite book and add it to their favorites list. all users liked the favorites-list feature, and many already knew of ways they could use it, but several wished they could create multiple lists and have the ability to arrange lists in folders. ■■ discussion participants thought favorably of the vufind interface and would use it again. they liked the layout of information much more than the current webvoyage interface and thought it was much easier to look at. they also had many comments that the color scheme (yellow and grey) was easier than the blues of the primary library opac. vufind also had more visual elements, such as cover images and icons representing format types that participants also commented on favorably. when asked to compare vufind to both the webvoyage catalog and amazon, only one participant indicated a preference for amazon, while the rest preferred vufind. the user who specified amazon, a faculty member, stated that that was where he always started searching for books; he would then search for specific titles in the recording, such as the 15th century. granted, the 15th century probably brings up music that originated in that era, not recorded then, but participants wanted the date to correspond to when an item was initially published or released. it appears that vufind pulls the era facet information from the subject headings and ignores the copyright or issue year. to users, the era facets are not useful for most of their search needs; users would rather limit by copyright or the original date of issue. another search that further highlighted problems searching for multimedia in vufind is the title search participants did for gone with the wind. everyone thought this search brought up relevant results, but when asked to determine whether the uiuc library had a copy of the dvd, many users expressed confusion. once again, the confusion was based on the inability to limit to a specific format. participants could use the facets to limit to a film or video, but not to a specific format. several participants stated that they needed specific formats because when they are doing a comparable search, they only want to find dvds. however, because all film formats are linked together under “film/video,” they must to go into individual records and examine the call number to determine the exact format. most participants stated clearly that “dvd” needed to be it’s own format facet and that entering a record to find the format required too much effort. participants also expressed frustration that the call number was the only place to determine specific format and believed that this information should be contained in the brief item information and not buried in the tabbed areas. the frustrations with the lack of specific formats also were evident when participants were asked to do an advanced search for a dvd on public speaking. 
all users initially thought the advanced search limiter for film/video was sufficient when they first looked at the advanced search options. however, when presented with an actual search (“public speaking”), they found that there should be more options and specific format choices up-front within the advanced search. another search that participants conducted was an author search for jack london. they then used the facets to find the book white fang. this search was chosen because the resulting records are mostly for older materials that often do not contain a lot of the additional information that newer records contain. participants looked at a specific record and then were asked what they thought of the information that was displayed. most answered that they would like as much information as you can give them, but were accepting of missing information. several participants stated that most people already know this book and thus did not need additional information. however, when pressed as to what information they would like added to the record, several users stated a summary would be the most useful. additionally, several users asked for more information 50 information technology and libraries | march 2011 the simplicity of the favorites listing feature, the difficulty of linking to other i-share library holdings, and the difficulties in using the facet categories. ■■ implications i intend to continue to perform similar usability tests on next-generation catalogs on a trial basis to examine one aspect regarding the future of online catalogs at uiuc. uiuc is looking at various catalog interfaces, of which vufind is one option, to see which best meets the needs of our users. users stated multiple times during testing that they find the current webvoyage interface to be very frustrating and will accept nearly anything that is an improvement, even if the new interface has some usability issues. vufind is not perfect for all searches, as shown by a lack of a call number search and the limitations in searching for multimedia options, but it does provide a more intuitive interface for most patrons. the future of vufind at uiuc is still open. development is currently stalled because of a lack of developer updates and internal staffing constraints both at uiuc and carli. however, because vufind is open–source, and the only ongoing cost is that of server maintenance, both carli and the library are continuing to display it as an option for searching the catalog. both carli and uiuc are closely examining other options for catalog interfaces that would provide patrons with a better search experience, but they have taken no further action to permanently adapt either vufind or to demo other options. despite its limitations, vufind is still a viable option for libraries with substantial technology expertise that are interested in a next-generation catalog interface at a low price. although it does have limitations, it has a better out-of-the-box interface than traditional opacs and should be considered alongside commercial options for any library thinking of adapting a catalog interface overlay. this usability test focused on one institution’s installation of vufind, which may or may not apply to other installations and other institutional needs. it would be interesting to study an installation of vufind at a smaller, nonresearch institution, where users have different searching needs and expectations related to a library’s opac. references 1. 
john houser, “the vufind implementation at villanova university,” library hi tech 27, no. 1 (2009): 96–105. 2. vufind, “vufind: about,” http://www.vufind.org/about .php (accessed sept. 10 2009). 3. kathleen bauer, “yale university vufind test— undergraduates,” http://www.library.yale.edu/libepub/ usability/studies/summary_undergraduate.doc (accessed mar. 20, 2010). library catalog to check availability. other participants who made comments about amazon stated that it was commercial and more about marketing materials, while the library catalog just provided the basic information needed to evaluate materials without attempting to sell them to you. several participants also stated they checked amazon for book information, but generally did not like it because of its commercial nature; because vufind provides much of the same information as amazon, they will use vufind first in the future. participants also thought amazon was for a popular and not scholarly audience, making it not useful for academic purposes. most users did not have much to say about the webvoyage opac, except it was overwhelming, had too many words on the result screen, and was not pleasantly visual. participants were also asked to look at vufind, amazon, and webvoyage from a visual preference. again, participants believed that vufind had the best layout. they liked that vufind had a very clean and uncluttered interface and that the colors were few and easy on the eye. they also commented about the visuals contained (cover art and icons) in the records and the vertical orientation of vufind (webvoyage has a horizontal orientation) to display records. they also liked how the facets were displayed, though two users thought they would be better situated on the left side of the results because they scan websites from the left to the right. the one thing that was mentioned several times was vufind’s lack of the star rating system that amazon uses to quickly rate an item. participants thought such a system might be better than reviews because it allows users to quickly scan through the item and not have to read through multiple reviews. when asked to rate the ease of use for vufind, with 1 being easy and 5 being difficult, participants rated it an average of 1.92. faculty rated the ease at 1.6, graduate students at 1.75, and undergraduates at 2.8. undergraduates were more likely to get frustrated at media searching and thought that some of the facets related to media items were confusing, which they used to explain their lower scores. however, when asked if they would rather use vufind over the current library catalog (webvoyage), all but one participant enthusiastically stated they would use vufind. most users stated that although vufind was not perfect, it was still much better than the other library catalog because of the better layout, visuals, and ability to limit results. the only user that specified they would still rather use the webvoyage catalog believed it had more options for advanced search, such as call number searching, which vufind lacked. there are, however, several changes that could make vufind more useful to our users that came out of usability testing. some of these are easy to implement on a local level, and others would improve the base build of vufind. a number of issues arose from usability testing, but the largest issues are the lack of refworks integration, usability of the vufind next-generation online catalog | emanuel 51 9. 
jennifer bowen, “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1,” information technology & libraries 27, no. 2 (2008): 6–19. 10. tom steele, “the new cooperative cataloging,” library hi tech 27, no. 1 (2009): 68–77. 11. ian rowlands and david nicholas, “understanding information behaviour: how do students and faculty find books?” journal of academic librarianship 34, no. 1 (2008): 3–15. 12. ja mi and cathy weng, “revitalizing the library opac: interface, searching, and display challengers,” information technology & libraries 27, no. 1 (2008): 5–22. 13. jakob nielsen, “why you only need to test with 5 users,” http://www.useit.com/alertbox/20000319.html (accessed mar. 20, 2010). 4. christine borgman, “why are online catalogs still hard to use?” journal of the american society for information science 47, no. 7 (1996): 493–503. 5. georgia briscoe, karne selden, and cheryl rae nyberg, “the catalog versus the home page: best practices for connecting to online resources,” law library journal 95, no. 2 (2003): 151–74. 6. kristin antelman, emily lynema, and andrew k. pace, “toward a twenty-first century library catalog,” information technology & libraries 25, no. 3 (2006): 128–39. 7. marshall breeding, “library technology guides: discovery layer interfaces,” http://www.librarytechnology. org/discovery.pl?sid=20100322930450439 (accessed mar. 2010). 8. karen m. spicher, “the development of the marc format,” cataloging & classification quaterly 21, no 3/4 (1996): 75–90. appendix. vufind usability study logging sheets i. the look and feel of vufind a. basic screen (the vufind main page) 1) is it obvious what to do? yes _____ no _____; what were you trying to do? 2) open the drop down box, examine the options. do you recognize theseoptions? yes _____ no _____ some _____ (if some, find out what the patron was expecting and get suggestions for improvement). comments: b. click on the advanced search option—take a minute to allow the participants to look around the screen 1) examine each of the advanced search options a) are the advanced search options clear? yes_____ no_____ b) are the advance search options helpful? yes_____no_____ 2) examine the limits fields, open the drop-down menu boxes a) are the limits clearly identified? yes _____ no _____ b) are the pictures helpful? yes _____ no _____ c) are the drop-down menu box options clear? yes _____ no _____ comments: ii. (back to the) basic search field a. enter the phrase—climate change (search all fields)—examine the search results 1) do the records retrieved appear to be relevant to your search statement? yes _____no _____don’t know _____ 2) what information would you like to see in the record? how should it be displayed? 3) examine the right sidebar. are the “facets” clear? yes _____no _____some, not all _____ 4) if you want to view items from other libraries in your search results, can you find the option? yes _____no _____ 5) can you find an electronic book published in 2008? yes _____no _____don’t know _____ comments: b. click on the first book record in the original climate change search results 1) is information about the book clearly represented? yes _____ no _____ 2) is it clear where to find item? yes _____ no _____ 3) look at the tags. do you understand what this feature is? yes _____ no _____ comments: c. look at the brief item information provided on the screen 1) is the information displayed useful in determining the scope and content of the item? 
yes _____no _____ 2) are the topics in the record useful for finding additional information on the topic? yes _____no _____ comments: d. click on each button below the brief record information 1) is this information useful? yes _____ no _____ 2) are the names for the tabs accurate? what should they be named? e. can you easily determine where the item is located and how to request it? yes _____no _____ comments: f. go back to the basic search box and enter the author james joyce (all fields) as a new search 1) is it easy to distinguish items by james joyce from items about james joyce? yes _____no _____ 2) using the facets, can you find only titles with james joyce as author? yes _____no _____ 3) can you find out how to cite an item? yes _____ no _____ comments: 52 information technology and libraries | march 2011 g. now try to find an audio recording by the artist prince using basic search were you successful? yes _____no _____ h. find the earliest prince recording ( “for you”; 1978). is it in the local collection? yes _____ no _____ if not, can you get a copy? comments: iii. in the advanced search screen: a. use the title drop down to find the item: gone with the wind 1) were you successful? yes _____ no _____ not sure _____ 2) can you locate a dvd of the same title? yes _____ no _____ 3) are copies of the dvd available in the university of illinois library? yes _____ no _____ comments: b. use the author drop down in the advanced search to locate titles by: jack london using the facets, find and open the record for the jack london novel, white fang. explore each of the: description, holdings, and comments tabs: 1) is this information useful? yes _____ no _____ 2) would you change the names of the tabs or the information on them? 3) other than your local library copy of white fang, can you find copies at other libraries? yes _____ no _____ comments: c. using the advanced search, find a dvd on public speaking (hint: use the limit box to select the film/video format) are there instructional videos in the university of illinois library? yes _____ no _____ 1) identify the author that’s responsible for one of the dvds 2) can you easily find other works by this author? yes _____ no _____ comments: iv. exploring the account features: a. click on login in the upper right corner of the page. on the next page, create an account. is it clear how to create an account? yes _____ no _____ b. once you have your account and are logged in to vufind, look at the menu on the right hand side. is it clear what each of the menu items are? yes _____ no _____ c. while still logged in, do a search for your favorite book and add it to your favorites list. is this tool useful, would you consider using it? yes _____ no _____ comments: v. comparing vufind to other resources: a. open three browser windows (this is easiest in firefox by entering ctrl-t for each new window) with 1) your library catalog 2) vufind 3) amazon.com enter global warming in each website in the basic search window of each. based on your initial reactions, which service appears the best for most of your uses? library catalog _____ vufind _____ amazon _____ comments: c. do you have a preference in the display formats? library catalog _____ vufind _____ amazon _____ comments: debriefing now that you have used vufind, how would you rate it—on a scale from 1–5, from easy to confusing to use? comments? how does it compare to other library catalogs you’ve used? 
if vufind and your home library catalog were available side-by-side, which would you use first? why? are you familiar with any of these other products: aquabrowser _____ googlebooks _____ microsoft live search _____ librarything _____ amazon.com _____ other preferred service _____ that's it! thank you for participating in our usability test. you will be receiving one other survey through email; we appreciate your opinions on the vufind product.

resource discovery: comparative survey results on two catalog interfaces
heather hessel and janet fransen
heather hessel (heatherhessel@yahoo.com) was interim director of enterprise technology and systems; janet fransen (fransen@umn.edu) is the librarian for aerospace engineering, electrical engineering, computer science, and history of science & technology, university of minnesota, minneapolis, mn.

abstract
like many libraries, the university of minnesota libraries-twin cities now offers a next-generation catalog alongside a traditional online public access catalog (opac). one year after the launch of its new platform as the default catalog, usage data for the opac remained relatively high, and anecdotal comments raised questions. in response, the libraries conducted surveys that covered topics such as perceptions of success, known-item searching, preferred search environments, and desirable resource types. results show distinct differences in the behavior of faculty, graduate student, and undergraduate survey respondents, and between library staff and non-library staff respondents. both quantitative and qualitative data inform the analysis and conclusions.

introduction
the growing level of searching expertise at large research institutions and the increasingly complex array of available discovery tools present unique challenges to librarians as they try to provide authoritative and clear searching options to their communities. many libraries have introduced next-generation catalogs to satisfy the needs and expectations of a new generation of library searchers. these catalogs incorporate some of the features that make the current web environment appealing: relevancy ranking, recommendations, tagging, and intuitive user interfaces. traditional opacs are generally viewed as more complex systems, catering to advanced users and requiring explicit training in order to extract useful data. some librarians and users also see them as more effective tools for conducting research than next-generation catalogs. academic libraries are frequently caught in the middle of conflicting requirements and expectations for discovery from diverse sets of searchers.

in 2002, the university of minnesota-twin cities libraries migrated from the notis library system to the aleph500™ system and launched a new web interface based on the aleph online catalog, originally branded as mncat. in 2006, the libraries contracted with the ex libris group as one of three development partners in the creation of a new next-generation search environment called primo. during the development process, the libraries conducted multiple usability studies that provided data to inform the direction of the product. participants in the usability studies generally characterized the primo interface as "clear" and "efficient."1 a year later the university libraries branded primo as mncat plus, rebranded the aleph opac as mncat classic, and introduced mncat plus to the twin cities user community as a beta service. in august 2008, mncat plus was configured as the default search for the twin cities catalog on the libraries' main website, with the libraries continuing to keep a separate link active to the aleph opac. a new organizational body called the primo management group was created in december 2008 to coordinate support, feedback, and enhancements of the local primo installation. this committee's charge includes evaluating user input and satisfaction, coordinating communication to users and staff, and prioritizing enhancements to the software and the normalization process.

when the primo management group began planning its first user satisfaction survey, the group noted that a significant number of library users seemed to prefer mncat classic. therefore, two surveys were developed in response to the group's charge. these two surveys were identical in scope and questions, except that one survey referenced mncat classic and was targeted to mncat classic searchers (appendix a), while the other survey referenced mncat plus and was targeted to mncat plus searchers (appendix b). these surveys were designed to produce statistics that could be used as internal benchmarks to gauge library progress in areas of user experience, as well as to assist with ongoing and future planning with regard to discovery tools and features.

research questions
in addition to evaluating user satisfaction and requesting user input, the primo management group also chose to question users about searching behaviors in order to set the direction of future interface work. questions directed toward searching behaviors were informed by the findings from a 2009 university of minnesota libraries report on making resources discoverable.2 the group surveyed respondents about types of items they expect to find in their searches, their interest in online resources, and the entry point for their discovery experience. the primo management group crafted the surveys to get answers to the following research questions:
■■ how often do users view their searching activity as successful?
■■ how often do users know the title of the item that they are looking for, as opposed to finding any resource relevant to their topic?
■■ what search environments do users choose when looking for a book? a journal? anything relevant to a topic?
■■ how interested are users in finding items that are not physically located at the university of minnesota?
■■ are there other types of resources that users would find helpful to discover in a catalog search?

although it can be tempting to think of the people using the catalog interfaces as a homogeneous group of "users," large academic libraries serve many types of users. as wakimoto states in "scope of the library catalog in times of transition," on the one hand, we have 'net-generation users who are accustomed to the simplicity of the google interface, are content to enter a string of keywords, and want only the results that are available online. on the other hand, we have sophisticated, experienced catalog users who understand the purpose of uniform titles and library of congress classifications and take full advantage of advanced search functions.
we need to accommodate both of these user groups effectively.3 the primo management group planned to use the demographic information to look for differences among user communities; therefore the surveys requested demographic information such as role (e.g., student) and college of affiliation (e.g., school of dentistry). in designing the surveys, the group took into account the limitations of this type of survey as well as the availability of other sources of information. for example, the primo management group chose not to include questions about specific interface features because such questions could be answered by analyzing data from system logs. the group was also interested in finding out about users’ strategies for discovering information, but members felt that this information was better obtained through focus groups or usability studies rather than through a survey instrument. research method the primo management group positioned links to the user surveys in several online locations, with the libraries’ home page providing one primary entry point. clicking on the link from the home page presented users with an intermediate page, where they were given a choice of which survey to complete: one based on mncat plus, and the other on mncat classic. if desired, users could choose to complete a separate survey for each of the two systems. links were also provided from within the mncat plus and mncat classic environments, and these links directed users to the relevant version of the survey without the intermediary page. in addition to the survey links in the online environment, announcements were made to staff about the surveys, and librarians were encouraged to publicize the surveys to their constituents around campus. the survey period lasted from october 1 through november 25, 2009. at the time of the surveys, the university of minnesota libraries was running primo version 2 and aleph version 19. because participants were self-selected, the survey results represent a biased sample, are more extreme than the norm, and are not generalizable to the whole university population. participants were not likely to click the survey link or respond to e-mailed requests unless they had sufficient incentive, such as strong feelings about one interface or the other. thirty percent of respondents provided an e-mail address to indicate that they would be willing to be contacted for focus groups or further surveys, indicating a high level of interest in the public-facing interfaces the libraries employ. in considering a process for repeating this project, more attention would be paid to methodology to address validity concerns. findings and analysis information technology and libraries | june 2012 24 findings relevant to each research question are discussed here. six hundred twenty-nine surveys contained at least one response—476 for mncat plus and 153 for mncat classic. responses by demographics as shown in table 1, graduate students were the primary respondents for both mncat plus and mncat classic, followed by undergraduates and faculty members. library staff made up 13 percent of mncat classic respondents and 4 percent of mncat plus respondents, although the actual number of library staff responding was nearly identical (twenty-one for mncat plus, twenty for mncat classic). library staff members were disproportionately represented in these survey responses and the group analyzed the results to identify categories in which library staff members differed from overall trends in the responses. 
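the percentages in table 1, and the over- and under-representation comparisons discussed below, are straightforward share calculations over the raw response counts. as a quick illustration only (the counts are transcribed from table 1; the script itself is not part of the published analysis), the arithmetic can be reproduced in a few lines of python:

    # share arithmetic behind table 1 (counts transcribed from the survey results;
    # illustrative only -- this script was not part of the published analysis)
    classic = {"graduate student": 50, "undergraduate student": 31, "library staff": 20,
               "faculty": 21, "staff (non-library)": 10, "community member": 2,
               "unspecified": 19}
    plus = {"graduate student": 176, "undergraduate student": 110, "faculty": 40,
            "staff (non-library)": 28, "library staff": 21, "community member": 11,
            "unspecified": 90}

    def shares(counts):
        """return each group's percentage of all respondents to one survey."""
        total = sum(counts.values())
        return {group: round(100 * n / total) for group, n in counts.items()}

    print(shares(classic))  # graduate ~33%, library staff ~13%, total n = 153
    print(shares(plus))     # graduate ~37%, library staff ~4%, total n = 476

    # graduate vs. undergraduate share of student respondents across both surveys,
    # for comparison with their share of the campus population discussed below
    grad = classic["graduate student"] + plus["graduate student"]                 # 226
    undergrad = classic["undergraduate student"] + plus["undergraduate student"]  # 141
    print(round(100 * grad / (grad + undergrad)))  # ~62 percent of student respondents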
questions about affiliation appeared at the end of the surveys, which may account for the high number of respondents in the “unspecified” category. mncat classic respondents frequency mncat plus respondents frequency graduate student 50 33% graduate student 176 37% undergraduate student 31 20% undergraduate student 110 23% library staff 20 13% faculty 40 8% faculty 21 14% staff (non-library) 28 6% staff (non-library) 10 7% library staff 21 4% community member 2 1% community member 11 2% (unspecified) 19 12% (unspecified) 90 19% total 153 100% total 476 100% table 1. respondents by user population a comparison of the student survey responses shows that graduate students were overrepresented, while undergraduates were underrepresented, at close to a reverse ratio. of the total number of graduate and undergraduate students, 62 percent of the respondents were graduate students, even though they accounted for only 32 percent in the larger population. conversely, undergraduates represented only 38 percent of the student respondents, even though they accounted for 68 percent of the graduate and undergraduate total. regrettably, the surveys did not include options for identifying oneself as a non-degree-seeking or professional student, so the analysis of students compared with overall population in this section includes only graduate students and undergraduates. differences were also apparent in the representation of all four categories of students within a particular college unit. at least two college units were underrepresented in the survey responses: resource discovery: comparative survey results | hessel and fransen 25 carlson school of management and the college of continuing education. one college unit was overrepresented in the survey results; 59 percent of the overall student respondents to the mncat classic survey, and 47 percent of the mncat plus students indicated that they were housed in the college of liberal arts (cla), and yet cla students only represent 32 percent of the total number of students on campus. table 2 shows the breakdown of percentages by college or unit and the corresponding breakdown by survey respondent, highlighting where significant discrepancies are evident. twin cities overall percentage of students mncat classic student survey respondents +/mncat plus student survey respondents +/ carlson school of management 9% 0% -9% 2% -7% center for allied health 0% 2% +1% 1% 0% col of educ/human development 10% 9% -1% 14% +3% col of food, agr & nat res sci 5% 4% 0% 7% +2% coll of continuing education 8% 1% -7% 1% -7% college of biological sciences 4% 6% +2% 5% 0% college of design 3% 3% 0% 3% 0% college of liberal arts 32% 59% +27% 47% +15% college of pharmacy 1% 1% 0% 0% -1% college of veterinary medicine 1% 1% 0% 1% 0% graduate school 0% 0% 0% 0% 0% humphrey inst of publ affairs 1% 1% 0% 1% 0% institute of technology (now college of science & engineering) 14% 9% -5% 10% -4% law school 2% 1% -1% 1% 0% medical school 4% 2% -3% 5% 0% school of dentistry 1% 1% 0% 0% -1% school of nursing 1% 0% -1% 0% -1% school of public health 2% 1% -1% 3% +1% table 2. student responses by affiliation information technology and libraries | june 2012 26 faculty and staff together totaled only eighty-nine respondents on the mncat plus survey and fifty-one respondents on the mncat classic survey. in keeping with graduate and undergraduate student trends, the college of liberal arts (cla) was clearly over-represented in terms of faculty responses. 
the cla faculty group represents about 17 percent of the faculty at the university of minnesota. yet over half the faculty respondents on the mncat plus survey were from cla; over 80 percent of the mncat classic faculty respondents identified themselves as affiliated with cla. faculty groups that were underrepresented include the medical school and the institute of technology.

perceptions of success
a critical area of inquiry for the surveys was user satisfaction and perceptions of success: "do users perceive their searching activity as successful?" because this question was asked in both surveys, its responses allowed the primo management group to compare respondents' perceived success between the two interfaces. results show a marked difference: while 86 percent of the mncat classic respondents reported that they are "usually" or "very often" successful at finding what they are looking for, only 62 percent of the mncat plus respondents reported the same perception of success. respondents reported very similar rates of success regardless of school, type of affiliation, or student status.

figure 1. perceptions of success: mncat plus and mncat classic (mncat classic: rarely 4%, sometimes 11%, usually 32%, very often 54%; mncat plus: rarely 14%, sometimes 24%, usually 44%, very often 18%)

these results should be interpreted cautiously. because mncat plus is the libraries' default catalog interface, mncat classic users are a self-selecting group whose members make a conscious decision to bookmark or click the extra link to use the mncat classic interface. one cannot assume that mncat users in general also would have an 86 percent perception of success were they to use mncat classic; familiarity with the tool could play a part in mncat classic users' success.

another possible factor in the reported difference in user success is the higher proportion of known-item searching—finding a book by title—occurring in mncat classic. a user's criteria for success differ when searching for a known item versus conducting a general topical search. it is easier for a searcher to determine that they have been successful in a situation where they are looking for a specific item. some features of mncat classic, such as the start-of-title and other browse indexes, are well suited to known-item searching and had no direct equivalent in mncat plus, which defaults to relevance-ranked results. (primo version 3 has implemented new features to enhance known-item searching.) comments received from users suggest that several factors played a role. one mncat classic respondent praised the "precision of the search...not just lots of random hits" and noted that mncat classic supports a "[m]ore focused search since i usually already know the title or author." in contrast, a mncat plus respondent commented that the next-generation interface was "great for browsing topics when you do not have a specific title in mind." this comment is consonant with the results from other usability testing done on next-generation catalogs.
in "next generation catalogs: what do they do and why should we care?", emanuel describes observed differences between topical and known-item searching: "during the testing, users were generally happy with the results when they searched for a broad term, but they were not happy with results for more specific searches because often they had to further limit to find what they wanted in the first screen of results."4 a common characteristic of next-generation catalogs is that they return a large result set that can then be limited using facets. training and experience may also explain some of the differences in success. mncat plus also enables functionality associated with the functional requirements for bibliographic records (frbr), which is intended to group items with the same core intellectual content in a way that is more intuitive to searchers. however, this feature is unfamiliar to traditional catalog searchers and requires an extra step to discover very specific known-items in primo. one mncat plus user expressed dissatisfaction and added, "i'm not sure if it's my lack of training/practice or that the system is not user-friendly." in focus group analyses conducted in 2008, oclc found that "when participants conducted general searches on a topic (i.e., searches for unknown items) that they expressed dissatisfaction when items unrelated to what they were looking for were returned in the results list. end users may not understand how to best craft an appropriate search strategy for topic searches."5

how often do users know the title of the item that they are looking for?
users come to the library with different goals in mind. in "chang's browsing," available in theories of information behavior, chang identified five general browsing themes,6 adapted to discovery by carter.7 for the purposes of the survey, the primo management group grouped those themes into two goals: finding an item when the title is known, and finding anything on a given topic. the primo management group had heard concerns from faculty and staff that they have more difficulty finding an item when they know the title when using mncat plus than they did with mncat classic. the group was interested in knowing how often users search for known items. to explore this topic and its impact on perceptions of success, the surveys included two questions on known-item and topical searching.

the survey results shown in table 3 indicate that a significantly higher proportion of mncat classic respondents (30 percent plus 43 percent = 73 percent) than mncat plus respondents (24 percent plus 29 percent = 53 percent) were "very often" or "usually" searching for known items. it may be that users in search of known items have learned to go to mncat classic rather than mncat plus.

                                                          rarely      sometimes    usually     very often   total
i already know the title of the item i am looking for
   mncat classic                                          7% (11)     19% (29)     30% (46)    43% (66)     152
   mncat plus                                             15% (69)    33% (151)    24% (111)   29% (132)    463
i am looking for any resource relevant to my topic
   mncat classic                                          14% (21)    32% (47)     20% (29)    34% (51)     148
   mncat plus                                             14% (62)    29% (133)    29% (133)   28% (127)    455
table 3. responses to "i already know the title of the item i am looking for"

when the primo management group considered how often researchers in different user roles searched for known items versus anything on a topic, clear patterns emerged as shown in figure 2.
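before turning to those patterns, a note on the arithmetic: the 73 percent and 53 percent figures above are simply the "usually" plus "very often" shares computed from the raw counts in table 3. the short python sketch below reproduces them and adds a two-proportion z-test; the test is our illustration only and is not part of the published analysis.

    from math import sqrt

    # "usually" + "very often" known-item shares, from the table 3 counts
    classic_known = (46 + 66) / 152   # about 0.74 (reported as 30% + 43% = 73%)
    plus_known = (111 + 132) / 463    # about 0.52 (reported as 24% + 29% = 53%)

    # two-proportion z-test (an illustration added here, not reported in the study)
    n_classic, n_plus = 152, 463
    pooled = (46 + 66 + 111 + 132) / (n_classic + n_plus)
    z = (classic_known - plus_known) / sqrt(
        pooled * (1 - pooled) * (1 / n_classic + 1 / n_plus))
    print(round(classic_known, 2), round(plus_known, 2), round(z, 1))  # 0.74 0.52 4.6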
in the mncat plus survey, only 34 percent of undergraduate mncat plus searchers “usually” or “very often” search for a particular item, versus 74 percent of faculty. conversely, 75 percent of undergraduate respondents “usually” or “very often” search for any resource relevant to a topic, versus 37 percent of faculty. graduate student respondents showed interest in both kinds of use. if successful browsing by topic is best achieved using post-search filtering, it may help to explain differences between undergraduate students and faculty. the analysis of usability testing done on other next generation catalogs described in “next generation catalogs: what do they do and why should we care?” states that “users that did not have extensive searching skills were more likely to appreciate the search first, limit later approach, while faculty members were faster to get frustrated with this technique.”8 results for all mncat classic respondents showed a preference for known item searching, but undergraduate students still indicated that they search more for anything on the topic and less for known items than faculty respondents. no significant differences were identified by discipline. resource discovery: comparative survey results | hessel and fransen 29 figure 2. searching for a known item vs. any relevant resource some qualitative comments from survey takers suggest that respondents view the library interface as a place to go to find something already known to exist, e.g., “i never want to search by topic. library catalogs are for looking up specific items.” however, with respect to discovering resources for a subject in general, both mncat classic and mncat plus respondents showed that they would also like to find items relevant to their topic (figure 2). there was no significant difference between mncat classic and mncat plus respondents on this question; in both environments, only 14 percent of the users said that they would “rarely” be interested in general results relevant to their topic. perceptions of success by specific characteristics for mncat plus, the majority of respondents “somewhat agree” or “strongly agree” that items available online or in a particular collection are easy to find. one-third of the mncat plus respondents had never tried to find an item in a particular format. over 40 percent had never tried to find an item with a particular isbn/issn. interface features may be a factor here: isbn/issn searching is not a choice in the mncat plus drop down menu, so users may not know that they can do such a search. a higher percentage of mncat classic respondents “strongly agree” that it is easy to find items by collection, available online, or in a particular format, than mncat plus respondents. figure 3 shows results based on particular characteristics. information technology and libraries | june 2012 30 figure 3. perception of success by characteristic although the surveys were primarily intended to gather reactions from end users, some interesting data emerged about usage by library staff. as demonstrated in figure 4, library staff respondents were much more likely to have performed the specific types of searches listed in this section than users generally, and reported a much higher rate of perceived success with mncat classic. figure 4. 
perception of success by characteristic: library staff resource discovery: comparative survey results | hessel and fransen 31 searching by location: local collections and other resources in a large research institution with several physical library locations and many distinct collections, users need the ability to quickly narrow a search to a particular collection. but even the largest institution cannot collect everything a researcher might need. the primo management group wondered not only whether users felt successful when they looked for an item in a particular collection but also wanted to explore whether users want to see items not owned by the institution as part of their search results. finding items among the many library locations was not a problem for either mncat plus or mncat classic respondents: 72 percent either somewhat or strongly agreed that it is easy to find items in a particular collection using mncat. furthermore, survey respondents of both interfaces agreed that they are interested in items no matter where the items are, which underlines the value of a service such as worldcat; 73 percent of mncat plus respondents and 78 percent of mncat classic respondents expressed a preference for seeing items held by other libraries, knowing they could request items using an interlibrary loan service if necessary. preferred search environments three of the survey questions asked users about their preferred search environments for different searching needs:  when looking for a particular book  when looking for a particular journal article  when searching without a particular title in mind each survey presented respondents with a list of choices and space to specify other sources not listed. respondents were encouraged to mark as many sources as they regularly use. when searching for a specific book, users of the two catalog environments identified a number of other sources. the top five sources in each survey are listed in table 4. when i am looking for a specific book, i usually search (check all that apply): mncat classic respondents (frequency) mncat plus respondents (frequency) 1. mncat classic (116) 1. mncat plus (217) 2. worldcat (50) 2. google (165) 3. amazon (50) 3. mncat classic (163) 4. google (49) 4. amazon (160) 5. google books (31) 5. google books (108) table 4. search environment for books information technology and libraries | june 2012 32 qualitative comments indicated that users like being able to connect to amazon and google books in order to look at tables of contents and reviews. they also specifically mentioned barnes and noble, as well as other local libraries. these results show that mncat plus respondents were more likely to also use mncat classic than vice-versa. the data do not suggest why this would be the case, but familiarity with the older interface may play a role. mncat classic respondents were more likely than mncat plus users to return to their search environment when searching for a particular book (82 percent versus 53 percent). one mncat plus respondent commented “i didn't know i could still get to mncat classic.” when searching for a specific journal article, users of both systems chose “other databases (jstor, pubmed, etc.)” above all the other choices. even more respondents would likely have marked this choice if not for confusion over the term “other databases.” most of the comments mentioned specific databases, even when the respondent had not selected the “other databases” choice. 
one user commented, “most of these choices would be illogical. you don't list article indexes, that's where i go first.” table 5 lists the five responses marked most often for each survey. when i am looking for a specific journal article, i usually search (check all that apply): mncat classic respondents (frequency) mncat plus respondents (frequency) 1. other databases (jstor, pubmed, etc.) (92) 1. other databases (jstor, pubmed, etc.) (232) 2. mncat classic (53) 2. google scholar (131) 3. google scholar (40) 3. e-journals list (130) 4. e-journals list (34) 4. mncat plus (110) 5. google (29) 5. mncat plus article search (101) table 5. search environment for articles. qualitative comments from respondents indicated that interfaces would be more useful if they helped users find online journal articles. this raised some questions with regard to mncat plus, which includes a tab labeled “articles” for conducting federated article searches. however, mncat plus respondents noted that they used the plus “articles” search almost as much as they did mncat plus. other plus comments included: i tried to use this for journal articles but it only has some in the database i guess and when i did my search it only found books and no articles. i don't understand it. i tried this new one and it came up with wierd [sic] stuff in terms of articles. my professor said to give up and use the regular indexes because i wasn't getting what i needed to do the paper. it wasted my time. this desire for federated search coupled with the expressions of dissatisfaction with the existing federated search platform is consistent with the mixed opinions expressed in other studies, such as sam houston state university’s assessment of use of and satisfaction with the webfeat resource discovery: comparative survey results | hessel and fransen 33 federated search tool. that study found “[f]ederated search use was highest among lower-level undergraduates, and both use and satisfaction declined as student classification rose.”9 the new search tools that contain preindexed articles, such as primo central, summon, worldcat local, and ebsco discovery service, may address the frustrations that more experienced searchers express regarding federated search technology. when researching a topic without a specific title in mind, “google” and “other databases” were nearly equal and ranked first for mncat plus respondents, while “other databases” ranked first for mncat classic respondents. table 6 lists the five responses marked most option for each survey. when i am researching a topic without a specific title in mind, i usually search (check all that apply): mncat classic respondents (frequency) mncat plus respondents (frequency) 1. other databases (jstor, pubmed, etc.) (84) 1. google (197) 2. mncat classic (76) 2. other databases (jstor, pubmed, etc.) (192) 3. google (63) 3. google scholar (155) 4. google scholar (47) 4. mncat plus (145) 5. worldcat (32) 5. mncat classic (101) table 6. search environment for topics significant differences based on school affiliation were evident in the area of preferred search environments for topical research. for example, institute of technology respondents reported using google much more often when researching without a specific title in mind than respondents in other areas. evidence from the health sciences is limited in that only seven percent of respondents in total identified themselves as being from this area. 
however, these limited results show that health sciences respondents relied more on library databases than on google. respondents in the liberal arts relied more on mncat, in either version, than did respondents in the other fields. desired resource types one feature of the primo discovery interface is its ability to aggregate records from more than one source. university libraries maintains several internal data sources that are not included in the catalog, and the possibility of including some of these in the mncat plus catalog has been considered many times since primo’s release. the primo management group was interested to hear from users whether they would find three types of internal sources useful: research reports and preprints, online media, and archival finding aids. the group also asked users to mark “online journal articles” if they would find article results helpful. the question did not specify whether journal articles would appear integrated with other search results in a mncat “books” search or information technology and libraries | june 2012 34 in a separate search such as that already provided through a metasearch on the mncat plus articles tab. the surveys asked users what kinds of resources would make mncat more useful. the results for both mncat plus and mncat classic were similar and response counts for both surveys were ordered as shown in table 7. respondents could mark more than one of the choices. i would find mncat more useful if it helped me find: mncat classic frequency mncat plus frequency online journal articles 65 255 u of m research materials (e.g., research reports, preprints) 34 149 online media (e.g., digital images, streaming audio/visual) 27 134 archival finding aids 27 90 table 7. desired resource types the primo management group noted that more mncat plus respondents chose “online journal articles” more frequently than the other categories even though the mncat plus interface includes an “articles” tab for federated searching. it is unclear whether the respondents were not seeing the “articles” tab in mncat plus because they would like to see search results integrated, or if they were using the “articles” tab and were not satisfied with the results. comments from respondents generally supported the inclusion of a wider range of resources in mncat. however, several respondents also expressed concerns about the trade-offs that might be involved in providing wider coverage. one user liked the idea of having the databases “all … in one place,” but added that “it would have to just give you the stuff that you need.” several users cited the varying quality of the material discovered through library sources. one user supported the inclusion of articles “if it included good articles and not the ones i got.” a mncat classic respondent gave the variable quality of the material he or she had found through a database search as a reason for leaving the coverage of mncat as it is: “i use the best sources depending on my needs.” another mncat classic user expressed doubt that coverage of all disciplines was feasible. in commenting on the content of mncat, respondents also mentioned specific types of material that they wanted to see (e.g. archives of various countries), as well as difficulties with particular classes of material (“the confusing world of government documents”). one mncat plus user related his or her interest in public domain items to a specific item of functionality that would enhance their discovery, namely a date sort. 
in general, the interest in university of minnesota research material was fairly high. however, faculty members ranked university of minnesota research materials last in terms of preference: only twelve faculty respondents chose the option, out of sixty-one total faculty respondents. resource discovery: comparative survey results | hessel and fransen 35 conclusions the data from two surveys, conducted concurrently in 2009 on a traditional opac (mncat classic) and next-generation catalog (mncat plus), point to differences in the use and perceptions of both systems. there appeared to be fairly strong “brand loyalty” with mncat classic, given that this interface is no longer the default search for the libraries. surveys for both systems suggest a perception of success that is lower than desirable and that there is room to improve the quality of the discovery experience. it is unclear from the data if the reported perceptions of success were the result of the systems not finding what the user wants, or if the systems did not contain what the user wanted to find. mncat classic respondents were more likely to use worldcat to find a specific book than mncat plus respondents. mncat plus respondents indicated a use of mncat classic, but not vice versa. both sets of surveys described use of amazon and google for discovery. mncat plus respondents reported lower rates of success at finding known items than mncat classic respondents. mncat classic respondents were far more likely to have a specific title in mind that they wanted to obtain; half of the mncat plus respondents reported having a specific title in mind. the team that examined the survey responses found that the data suggested several key attributes that should be present in the libraries discovery environment. further discussion of the results and suggested attributes was conducted with library staff members in open sessions. results also informed local work on improving discovery interfaces. the results suggested:  the environment should support multiple discovery tasks, including known-item searching and topical research.  support for discovery activity should be provided to all primary constituent groups, noting the significant survey response by graduate student searchers.  users want to discover materials that are not owned by the libraries, in addition to local holdings.  a discovery environment should make it easy for users to find and access resources in vendor-provided resources, such as jstor and pubmed. while the results of the 2009 surveys provided a valuable description of usage, the survey team recognized that methodological choices limit the usefulness in applying results to a larger population. the team also recognized that there were a number of questions yet unanswered. some of these outstanding questions present opportunities for future research and suggest that a variety of formats might be useful, including surveys, focus groups, and targeted interviews.  to what extent do users expect to find integrated search results among different kinds of content, such as articles, databases, indexes, and even large scale data sets?  what general search strategies do users use to navigate the complex discovery environment that is available to them, and where are the failure points?  how much of the current environment requires training and how much is truly intuitive to users? information technology and libraries | june 2012 36  how can the university libraries identify and serve users who did not complete the surveys? 
 how useful would users find targeted results based on a particular characteristic such as role, student status, or discipline? since the surveys were conducted, the university libraries upgraded to primo version 3, which included features to address some of the concerns respondents identified in the surveys, such as known-item searching. primo version 3 allows users to conduct a left-justified title search (“title begins with…”), as well as sort by fields such as title and author. once the new version has been in place long enough for users to develop some comfort with the interface, the primo management group intends to resolve methodological issues and repeat its surveys, measuring users’ reactions against the baseline data set in the 2009 surveys. acknowledgements we would like to thank the other members of the primo management group, who helped to design and implement the surveys, as well as analyze and communicate the results: chew chiat naun (chair), susan gangl, connie hendrick, lois hendrickson, kristen mastel, r. arvid nelsen, and jeff peterson. we also want to acknowledge the helpful feedback and guidance of the group’s sponsor, john butler. references 1 tamar sadeh, “user experience in the library: a case study.” new library world 109, no. 1/2 (2008): 7–24. 2 cody hanson et al., discoverability phase 1 final report (minneapolis: university of minnesota, 2009), http://purl.umn.edu/48258/ (accessed dec. 20, 2010). 3 jina choi wakimoto, “scope of the library catalog in times of transition.” cataloging & classification quarterly 47, no. 5 (2009): 409–26. 4 jenny emanuel, “next generation catalogs: what do they do and why should we care?” reference & user services quarterly 49, no. 2 (winter, 2009): 117–20. 5 karen calhoun, diane cellentani, and oclc, online catalogs : what users and librarians want: an oclc report (dublin, ohio: oclc, 2009). 6 shan-ju chang, “chang's browsing,” in theories of information behavior, ed. karen e. fisher, sandra erdelez and lynne mckechnie, 69-74 (medford, n.j.: information today, 2005). 7 judith carter, “discovery: what do you mean by that?” information technology & libraries 28, no. 4 (december 2009): 161–63. 8 jenny emanuel, “next generation catalogs: what do they do and why should we care?” reference & user services quarterly 49, no. 2 (winter, 2009): 117–20. 9 abe korah and erin dorris cassidy. “students and federated searching: a survey of use and satisfaction,” reference & user services quarterly 49, no. 4 (summer 2010): 325–32. https://purl.umn.edu/48258 resource discovery: comparative survey results | hessel and fransen 37 appendix a. mncat classic survey the library catalog is intended to help you find an item when you know its title, as well as suggest items that are relevant to a given topic. we’d like to know how often you use mncat classic for these different purposes. 1. when i visit mncat classic… very often usually sometimes rarely i already know the title of the item i am looking for     i am looking for any resource relevant to my topic     many people use tools other than the library catalog to find books, articles, and other resources. for the different situations below, please tell us what other tools you find helpful. 2. when i am looking for a specific book, i usually search (check all that apply):  amazon  mncat classic  other databases (jstor, pubmed, etc.) 
 google  mncat plus  worldcat  google books  mncat plus article search  google scholar  libraries onesearch other (please specify) _______________________________________________________ 3. when i am looking for a specific journal article, i usually search (check all that apply):  amazon  google books  mncat plus article search  citation linker  google scholar  libraries onesearch  e-journals list  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat other (please specify) ___________________________________________________ information technology and libraries | june 2012 38 4. when i am researching a topic without a specific title in mind, i usually search (check all that apply):  amazon  google scholar  libraries onesearch  e-journals list  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat  google books  mncat plus article search other (please specify) ___________________________________________________ now we’d like to know what you think of mncat classic and what new features (if any) you’d like to see. 5. when i use mncat classic very often usually sometimes rarely i succeed in finding what i’m looking for     6. it is easy to find the following kinds of items in mncat classic strongly agree somewhat agree somewhat disagree strongly disagree i haven’t looked for this with mncat classic an item that is available online      an item within a particular collection (e.g., wilson library, university archives, etc.)      an item in a particular physical format (e.g., dvd, map, etc.)      an item with a specific isbn or issn      resource discovery: comparative survey results | hessel and fransen 39 7. i would find mncat classic more useful if it helped me find (check all that apply):  online journal articles  online media (e.g., digital images, streaming audio/visual)  archival finding aids  u of m research material (e.g., research reports, preprints) other (please specify) ___________________________________________________ 8. the worldcat catalog allows you to search the contents of many library collections in addition to the university of minnesota. which of the following best describes your level of interest in this type of catalog?  yes, i am interested in what other libraries have regardless of where they are, knowing i could request it through interlibrary loan if i want it  yes, i am interested, but only if i can get the items from a nearby library  no, i am interested only in what is available at the university of minnesota libraries please share anything you particularly like or dislike about mncat classic. 9. what i like most about mncat classic is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ 10. what i like least about mncat classic is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ we want to understand how different groups of people use mncat classic, as well as other tools, for finding information. please answer the following questions to give us an idea of who you are. 11. how are you affiliated with the university of minnesota? 
 faculty  graduate student  undergraduate student  staff (non-library) information technology and libraries | june 2012 40  library staff  community member 12. with which university of minnesota college or school are you most closely affiliated?  allied health programs  food, agricultural and natural resource sciences  pharmacy  biological sciences  law school  public affairs  continuing education  liberal arts  public health  dentistry  libraries  technology (engineering, physical sciences & mathematics)  design  management  veterinary medicine  education & human development  medical school  none of these  extension  nursing 13. we are interested in learning more about how you find the materials you need. if you would be willing to be contacted for further surveys or focus groups, please provide your e-mail address: _______________________________________________ resource discovery: comparative survey results | hessel and fransen 41 appendix b. mncat plus survey the library catalog is intended to help you find an item when you know its title, as well as suggest items that are relevant to a given topic. we’d like to know how often you use mncat plus for these different purposes. 1. when i visit mncat plus… very often usually sometimes rarely i already know the title of the item i am looking for     i am looking for any resource relevant to my topic     many people use tools other than the library catalog to find books, articles, and other resources. for the different situations below, please tell us what other tools you find helpful. 2. when i am looking for a specific book, i usually search (check all that apply):  amazon  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat  google books  mncat plus article search  google scholar  libraries onesearch other (please specify) _______________________________________________________ 3. when i am looking for a specific journal article, i usually search (check all that apply):  amazon  google books  mncat plus article search  citation linker  google scholar  libraries onesearch  e-journals list  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat other (please specify) ___________________________________________________ information technology and libraries | june 2012 42 4. when i am researching a topic without a specific title in mind, i usually search (check all that apply):  amazon  google scholar  libraries onesearch  e-journals list  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat  google books  mncat plus article search other (please specify) ___________________________________________________ now we’d like to know what you think of mncat plus and what new features (if any) you’d like to see. 5. when i use mncat plus very often usually sometimes rarely i succeed in finding what i’m looking for     6. it is easy to find the following kinds of items in mncat plus strongly agree somewhat agree somewhat disagree strongly disagree i haven’t looked for this with mncat plus an item that is available online      an item within a particular collection (e.g., wilson library, university archives, etc.)      an item in a particular physical format (e.g., dvd, map, etc.)      an item with a specific isbn or issn      resource discovery: comparative survey results | hessel and fransen 43 7. 
i would find mncat plus more useful if it helped me find (check all that apply):  online journal articles  online media (e.g., digital images, streaming audio/visual)  archival finding aids  u of m research material (e.g., research reports, preprints) other (please specify) ___________________________________________________ 8. the worldcat catalog allows you to search the contents of many library collections in addition to the university of minnesota. which of the following best describes your level of interest in this type of catalog?  yes, i am interested in what other libraries have regardless of where they are, knowing i could request it through interlibrary loan if i want it  yes, i am interested, but only if i can get the items from a nearby library  no, i am interested only in what is available at the university of minnesota libraries please share anything you particularly like or dislike about mncat plus. 9. what i like most about mncat plus is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ 10. what i like least about mncat plus is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ we want to understand how different groups of people use mncat plus, as well as other tools, for finding information. please answer the following questions to give us an idea of who you are. 11. how are you affiliated with the university of minnesota?  faculty  graduate student  undergraduate student  staff (non-library) information technology and libraries | june 2012 44  library staff  community member 12. with which university of minnesota college or school are you most closely affiliated?  allied health programs  food, agricultural and natural resource sciences  pharmacy  biological sciences  law school  public affairs  continuing education  liberal arts  public health  dentistry  libraries  technology (engineering, physical sciences & mathematics)  design  management  veterinary medicine  education & human development  medical school  none of these  extension  nursing 13. we are interested in learning more about how you find the materials you need. if you would be willing to be contacted for further surveys or focus groups, please provide your e-mail address: _______________________________________________ editorial | truitt 3 marc truitteditorial marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. the catalog. love it? hate it? depending upon who is speaking, it may be cast as the ultimate portal that enables user access to all local and networked resources, or it may be a tool of byzantine complexity, comprehensible at best to but a small fraction of librarians able to navigate its bibliographic metadata encoded in an arcane 1960s-era format. it is a rich trove of structured and controlled information assembled over decades by the work of countless dedicated catalogers and others. or, it is the now-obsolete product of a labor-intensive process of description and subject analysis that has no relevance in a web-centric world where “everything” is findable via the google search-box. 
its attempt to organize knowledge provides catalogers with a raison d’etre, but sends their colleagues and many users fleeing for simpler and more all-encompassing tools. it is our alpha and omega, our yin and our yang. few topics in librarianship—perhaps with the conspicuous exception of that perennial library school favorite, our profession’s status as a profession—seem to provoke the range and depth of sentiment engendered by discussions of the place of the catalog. especially in recent years, criticism of the catalog has grown ever more strident, to the point where it has become commonplace in our profession’s literature to say that this most basic of library services “sucks.” as a consequence, librarians have increasingly fallen into one of two camps, with those critical of the catalog often simplistically characterized as favoring, and those defending it as opposing “change.” a number of initiatives have emerged in response to this ferment. some of these have focused on our bibliographic metadata, and particularly on its ability to express the relationships and interconnectedness of the bibliographic universe. as we have traditionally cataloged whatever we had “in-hand,” our cataloging codes and encoding standards have done a very good job of managing the description of bibliographic items; what they have not generally expressed well are the relationships among items. frbr and frad—the functional requirements for bibliographic records and the functional requirements for authority data—seem promising beginnings for addressing the relationship issues, although there are as-yet very few practical implementations. resource description and access (rda), the forthcoming successor to aacr2, is designed around frbr concepts; it will be interesting to see how this plays out in the “real world;” equally interesting will be to what degree the present (or a modified) marc21 is able to express rda’s frbr-based relationship model. other approaches have focused on developing systems that are able to exploit our existing investment in bibliographic metadata in new and useful ways. the pioneering and best-known example of this, of course, is the discovery tool developed by a partnership of north carolina state university libraries and endeca, which premiered in early 2006. this initiative included several innovative features not previously found in library catalogs, such as search result relevance ranking and the ability to perform faceted searching against a variety of controlledvocabulary indices (subject/topical, form/genre, date, etc.) ncsu’s endeca discovery tool spawned an entirely new product segment for the catalog: major ils vendors have scrambled to develop their own next-gen products, combining relevancy and facets with additional functionality such as web 2.0 social and collaborative tools and enhanced federated searching capabilities. the result of all this activity has been the first cross-platform growth opportunity for ils vendors since the development of resource-linking tools and the erm. we at ital have watched these trends with keen interest and have published works describing many of the major developments vis-a-vis the catalog in recent years. indeed, since late 2004, ital has published at least eleven major papers on various topics related to improving the catalog. 
with our publication of jennifer bowen’s report on the first phase outcomes of the university of rochester’s extensible catalog (xc) project in this issue of ital, we continue our commitment to publish important research in this area. the rochester project is noteworthy, both for its modular and metadata-focused approach and for its high visibility as an open source effort that has received significant support from the andrew w. mellon foundation. i predict that this paper will quickly take its place among the other ground-breaking works on the catalog that ital has published, and i’ll eagerly be awaiting the next progress report on the xc. n “must-reads” dept. okay, so i may not be the first out of the gate with this one, but for those of you who haven’t looked at it yet, trust me, you’ll want to. jonathan zittrain’s the future of the internet and how to stop it (yale university press, 2008), which divides the internet into “generative” technologies such as the pc, and proprietary appliances such as the iphone, may or may not resonate with you, but i think it could well become the next big debate about where the net is and where it should be going. grab a copy and read it today. 32 information technology and libraries | june 2007 author id box for 3 column layout column title 32 information technology and libraries | june 2008 communications michaela brenner and peter klein discovering the library with google earth libraries need to provide attractive and exciting discovery tools to draw patrons to the valuable resources in their catalogs. the authors conducted a pilot project to explore the free version of google earth as such a discover tool for portland state library’s digital collection of urban planning documents. they created eye-catching placemarks with links to parts of this collection, as well as to other pertinent materials like books, images, and historical background information. the detailed how-to-do part of this article is preceded by a discussion about discovery of library materials and followed by possible applications of this google earth project. in calhoun’s report to the library of congress, it becomes clear that staff time and resources will need to move from cataloging traditional formats, like books, to cataloging unique primary sources, and then providing access to these sources from many different angles. “organize, digitize, expose unique special collections” (calhoun 2006). in 2005, portland state university library received a grant “to develop a digital library under the sponsorship of the portland state university library to serve as a central repository of the collection, accession, and dissemination of [urban] key planning documents . . . that have high value for oregon citizens and for scholars around the world” (abbott 2005). this collection is called the oregon sustainable community digital library (oscdl) and is an ongoing project that includes literature, planning reports, maps, images, rlis (regional land information system) geographical data, and more. much of the older material is unpublished, and making it available online presents a valuable resource. most of the digitized—and, more recently, borndigital—documents are accessible through the library’s catalog, where patrons can find them together with other library materials about the city of portland. the bibliographic records are arranged in the catalog in an electronic resource management (erm) system (brenner, larsen, and weston 2006). 
additionally, these bibliographic data are regularly exported from the library catalog to the oscdl web site (http://oscdl. research.pdx.edu) and there integrated with gis (global information system) features, thus optimizing cataloging costs by reusing data in a different electronic environment. committed to not falling into the trap that clifford lynch had in mind when he wrote, “i think there is a mental picture that many of us have that digitization is something you do and you finish . . . a finite, one-time process“ (lynch 2002), and agreeing with gatenby that “it doesn’t matter at all if a user finds our opac through the ‘back door ’“ (gatenby 2007), the authors looked into further using these existing data from the library catalog by making them accessible from a popular and appealing place on the internet, a place that users are more likely to visit than the library catalog. the free version of google earth, a virtual-globe program that can be installed on pcs, lent itself to experimenting. “google earth combines the power of google search with satellite imagery, maps, terrain and 3-d buildings to put the world’s geographic information at your fingertips” (http://earth.google.com). from there, the authors provide links to the digitized documents in the library catalog. easy distribution, as well as the more playful nature of this pilot project and the inclusion of pictures, make the available data even more attractive to users. “google now reigns” “google now reigns,” claims karen markey (markey 2007), and many others agree that using google is easier and more appealing to most than using library catalogs. google’s popularity has been growing spectacularly. in august 2007, google accounted for 64 percent of all u.s. searches (avtec media group 2007). in contrast, the oclc report on how users perceive the library shows that only one percent of the respondents begin their information search on a library web site, while 84 percent use search engines (de rosa, et al. 2005). “if we [libraries] want to survive,” says stephen abram, “we must place our messages where the users are seeking answers and will trip over them. today that usually means at yahoo, msn, and google” (abram 2005). according to lorcan dempsey, in the longer run, traffic to the library catalog will come by linking from larger consolidated resources, like open worldcat and google scholar (dempsey 2005). dempsey also stressed that it becomes more and more significant to differentiate between discovery and location (dempsey 2006a). initially, users want to discover; they want to find what interests them independent from where this information is actually located and available. while there may be lots of valuable, detailed, and exceptionally well-organized bibliographic information in the library catalog, not michaela brenner (brennerm@pdx.edu) is assistant professor and database maintenance and catalog librarian at portland state university library, oregon. peter klein (peter.klein@colorado.edu) is aerospace engineering bs/ms at the university of colorado at boulder. introducing zoomify image | smith 33discovering the library with google earth | brenner and klein 33 many users (one percent) are willing to discover this information through the catalog. they may not discover what a library has to offer if “the library does not find a way to go to the user, rather than waiting for the user to come to the library” (coyle 2007). 
unless the intent is to keep our treasures buried, the library community needs to work with popular outside discovery environments— like search engines—to bring information available in libraries to users from the outside. libraries are, although sometimes reluctantly, responding. google, google scholar, and google books are open worldcat partner sites that are now or soon will be providing access to worldcat records. google book search includes “find this book in the library,” and the advanced book search also has the option to limit a search to library catalogs with access to the worldcat web record for each item. “deep linking” enables web users to link from search results in yahoo, google, or other partner sites to the “find in a library” interface in open worldcat, and then directly to the item’s record in their library’s online public access catalog (opac). simply put, “find it on google, get it from your library” (calhoun 2006). the “leveraged discovery environment” is an expression coined by dempsey that means it becomes increasingly important to leverage a “discovery environment which is outside your control to bring people back into our catalog environment (like amazon, google scholar)” (dempsey 2006b). issues in calhoun’s report to the library of congress include the question of how to get a google user from google to library collections. she quotes an interviewee saying that “data about a library’s collection needs to be on google and other popular sites as well as the library interface” (calhoun 2006). with evidence pointing to the heavy use of google for discovery and with google earth technology providing such a powerful visualization tool, the authors felt tempted to experiment with existing data from portland state library’s digital oscdl collection and make these data accessible through a virtual globe. the king’s college cultural heritage project martyn jessop from king’s college in london, united kingdom, published an article about a relatively small pilot project on providing access to a digital cultural heritage collection through a geographical information system (jessop 2005). jessop’s approach to explore different technologies and techniques to apply to existing data about unique primary sources was exactly what the authors had in mind with this project, and provided encouragement to move forward with the idea of providing additional access to the oregon sustainable community digital library (oscdl) collections through google earth. similar to jessop, the authors regard it an unaffordable luxury to put a great deal of effort into collecting, digitizing, and cataloging materials without making them available to a much broader audience through multiple access points. comparable to jessop, the goal of this project was to find a relatively simple, low-cost technological solution that could also be applied to a much wider range of data without much more investment in staff time and money. once the authors mastered the initial hurdle of understanding google earth’s programming language, they could easily identify with jessop’s notion of “project creep” as more and more possibilities arose to make the project more appealing. this, as with the king’s college project, was a valuable part of the development process, the details of which are described below. 
the portland state library oscdl-on-google-earth project
the authors chose ten portland-based oscdl sub-collections as the basis of this pilot project: harbor drive, front street, portland public market, urban studies collection, downtown, park blocks, south park blocks, pioneer courthouse square, portland city archives, and jpact (joint policy advisory committee on transportation). the programming language for google earth is kml (keyhole markup language), a file format used to display geographic data. kml is based on the xml standard and can be created with the google earth user interface or from scratch with a simple text editor. having no previous kml experience, the authors decided to use both.
[figure 1. basic placemark in google earth. figure 2. kml script for basic placemark.]
a basic placemark provided by google earth (figure 1), copied and pasted in notepad (figure 2), was the starting point. at portland state library, information technology staff routinely batch export cataloged oscdl data from the library catalog (ils) to the oscdl web site for reuse. for the google earth project, the authors had two options: either export data relevant to our collections from the ils to a spreadsheet, or use an existing excel spreadsheet containing most of the same data, including place coordinates. this spreadsheet was one of many that had been created to keep track of the digitization process as well as for creating bibliographic records for the library catalog later. using the available spreadsheet again, the following data were retained:
■■ the title of the collection
■■ longitude and latitude of the place the collection refers to
■■ a brief description of the collection
the following were added manually to the remaining spreadsheet:
■■ all the texts and urls for the collection-specific links
■■ urls for the collection-specific images
the authors extracted the placemark-specific script from figure 2 to create a template in notepad. a general description and all links that were the same for the ten collections were added to this template, and placeholders were inserted for collection-specific data (figure 3). using microsoft office word's mail merge, the authors populated the template with the data from the spreadsheet in one quick step. the result was a kml script that included all the placemark data for the ten collections (figure 4). the script was saved as plain text (.txt) first, and then renamed with the extension .kml, which represents the final file (figure 5). clicking the oscdl.kml icon on a desktop or inside a web application opens google earth. the user "flies" to portland, where ten stars represent the ten collections (figure 6). zooming in, the placemarks show the locations to which the collections refer. considering the many layers and icons available in google earth, the authors decided to use yellow stars to make them more visible. in order to avoid clutter and overlapping labels, titles only appear on mouse-over (figures 7 and 8). figure 9 shows the open placemark for portland public market. "portland state university" with the university's logo is a link that takes the user to the university's homepage. the next line is the title of the collection, followed by a brief description. the paragraph after that is the same for all collections and includes links to the portland state university library and the oscdl web site.
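returning briefly to the merge step described above: the same work can also be scripted rather than done through word's mail merge. as a rough illustration (not the authors' actual workflow), the sketch below reads a spreadsheet exported as csv and writes one placemark per collection into a single kml file; the column names, file names, and icon url are assumptions made for this example.

import csv
from xml.sax.saxutils import escape

# column names below (title, lon, lat, description, catalog_url) are assumptions
# made for this sketch, not the actual oscdl spreadsheet layout.
PLACEMARK_TEMPLATE = """  <Placemark>
    <name>{title}</name>
    <description><![CDATA[{description}<br/>
      <a href="{catalog_url}">digitized documents in the library catalog</a>]]></description>
    <styleUrl>#collectionStar</styleUrl>
    <Point><coordinates>{lon},{lat},0</coordinates></Point>
  </Placemark>
"""

def build_kml(csv_path="oscdl_collections.csv", kml_path="oscdl.kml"):
    """merge each spreadsheet row into the placemark template and wrap the
    result in a minimal kml document."""
    placemarks = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            placemarks.append(PLACEMARK_TEMPLATE.format(
                title=escape(row["title"]),
                description=row["description"],   # may contain html for the balloon
                catalog_url=row["catalog_url"],
                lon=row["lon"],
                lat=row["lat"],
            ))
    with open(kml_path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n')
        # the icon url is illustrative; any hosted star icon could be used
        out.write('  <Style id="collectionStar"><IconStyle><Icon>'
                  '<href>http://maps.google.com/mapfiles/kml/paddle/ylw-stars.png</href>'
                  '</Icon></IconStyle></Style>\n')
        out.writelines(placemarks)
        out.write('</Document>\n</kml>\n')

if __name__ == "__main__":
    build_kml()

if the finished .kml file is then posted at a public url, the same file can also be pulled into google maps by pasting that url into the search box, as the authors describe below.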
the collection-specific links that follow next go to the library catalog, where the user has access to the digitized manuscripts of this collection (figure 10). other pertinent links—in this case to a book available in the library, a public web site on the history of the market, and a historic image of the market—were added as well. to make the placemarks visually more attractive, all links are presented in the school's "psu green," and an image representative of the collection was added. the pictures can be enlarged in a new window by clicking on them. to avoid copyright issues, the authors photographed their own images. the last link opens an e-mail window for questions and comments (figure 11). this link is intended to bring some feedback and suggestions on how to improve the project and on its value for researchers and other users. the authors have been toying with the idea of including more elaborate features, such as video clips and music, in the future. one more recent feature is that kml files created in google earth can now also be viewed on the web by simply entering the url of the kml file into the search box of google maps (figure 12), thus creating google earth placemarks in google maps with different view options (figures 13 and 14). not all formatting is correctly transferred, and at this point there is no way to correct this in google maps. for example, the yellow stars were white, the mouse-over didn't work, and the size of the placemarks was imprecise. however, the content of the placemarks—except for the images, which didn't show on some computers—was fully retained, and all links worked (figure 15). although the use of the kml file in google maps is not as elegant as in google earth, it has the advantage that there is no need to install software as with google earth. this adds value to kml files and makes projects like this more versatile.
[figure 3. detail of template with variables between « double brackets ». figure 4. detail: "downtown" placemark of finished kml script. figure 5. simplified process. figure 6. ten stars representing the ten collections. figure 7. zoomed in with mouse-over placemark. figure 8. location of the pioneer courthouse square placemark. figure 9. portland public market. figure 10. access to the collection in library catalog. figure 11. ready-to-go e-mail window. figure 12. url of kml file in google maps search box. figure 13. "map" view in google maps. figure 14. "satellite" view in google maps. figure 15. portland public market placemark in google maps.]
the authors have identified several uses for the kml file:
■■ a workstation in the library can be dedicated to resources about the city of portland. an icon on the desktop of this workstation will open google earth and "fly" directly to portland where the yellow stars are displayed.
■■ professors can easily add the .kml file to webct (now blackboard) or other course management systems.
■■ the file can be e-mailed as an attachment to those interested in the development of the city of portland.
■■ a link from the wikipedia page related to the oscdl project leads to the google earth pilot project.
■■ the project was added to the google earth gallery, where many remarkable projects created by individuals and groups can be found.
n it can also be accessed through the oscdl web site, and relevant links from the records in the library catalog to google maps can be included. it may be useful to alert patrons, who actually did come to the catalog by themselves, to this visual tool. conclusion “the question now is not how we improve the catalog as such,” says dempsey. “it is how we provide effective discovery and delivery of library materials in a network environment where attention is scarce and information resources are abundant and where discovery opportunities are being centralized into major search engines and distributed to other environments” (dempsey 2006a). with this in mind, the authors took on the challenge to create another discovery tool for one of the library’s primary unique digital collections. google earth is not the web, and it needs to be installed on a workstation in order to use a kml file. on the other hand, the file created in google earth can also be used on the web more readily but less elegantly in google maps, thus possibly reaching a larger audience. similar to the king’s college project and following abram’s suggestion that “we should experiment more with pilots in specific areas” (abram 2005), this pilot project is of an exploratory, experimental nature. and as with many experiments, the authors were testing an idea, trying something different and new to find out how useful this idea might be, and useful applications for this project were identified. google earth is a sophisticated, attractive, and exciting program—and fun to play with. in a time “where attention is scarce and information resources are abundant,” as dempsey (2006a) says, we need to provide these kinds of discovery tools to attract patrons and to lure them to these valuable resources in our library’s catalog that we created with so much diligence and cost of staff time and resources. works cited abbott, carl. 2005. planning a sustainable portland: a digital library for local, regional, and state planning and policy documents. framing paper. http://oscdl.research.pdx.edu/documents/library_grant.pdf. abram, stephen. 2005. the google opportunity. library journal 130, no. 2: 34. avtec media group. 2007. search engine statistics. http://avtecmedia.com/ internet-marketing/internet-marketing-trends.htm. brenner, michaela, tom larsen, and claudia weston. 2006. digital collection management through the library catalog. information technology and libraries 25, no. 2: 65–77. calhoun, karen. 2006. the changing nature of the catalog and its integration with other discovery tools; final report, prepared for the library of congress. www.loc.gov.proxy.lib.pdx. edu/catdir/calhoun-report-final.pdf. coyle, karen. 2007. the library catalog in a 2.0 world. the journal of academic librarianship 33, no. 2: 289–291. de rosa, cathy et al. 2005. perceptions of libraries and information resources. a report to the oclc membership. www .oclc.org.proxy.lib.pdx.edu/reports/ pdfs/percept_all.pdf. dempsey, lorcan. 2006a. the library catalogue in the new discovery environment: some thoughts. ariadne 48. www.ariadne.ac.uk/issue48/dempsey. dempsey, lorcan. 2006b. lifting out the catalog discovery experience. lorcan dempsey’s weblog on libraries, services, and networks, may 14, 2006. http://orweblog .oclc.org/archives/001021.html dempsey, lorcan. 2005. making data work—web 2.0 and catalogs. lorcan dempsey’s weblog on libraries, services, and networks, october 4, 2005. http://orweblog.oclc .org/archives/000815.html gatenby, janifer. 2007. 
accessing library materials via google and other web sites. paper presented to elag (european library automation group), may 9, 2007. http://elag2007.upf. edu/papers/gatenby_2.pdf. jessop, martyn. 2005. the application of a geographical information system to the creation of a cultural heritage digital resource. literary and linguistic computing: journal of the association for literary and linguistic computing 20, no. 1: 71–90. lynch, clifford. 2002. digital collections, digital libraries, and the digitization of cultural heritage information. first monday 7, no. 5. www.firstmonday. org/issues/issue7_5/lynch. markey, karen. 2007. the online library catalog. d-lib magazine 13, no. 1/2. www .dlib.org/dlib/january07/markey/01 markey.html. lita cover 2, cover 3, cover 4 index to advertisers extending im beyond the reference desk: a case study on the integration of chat reference and library-wide instant messaging network ian chan, pearl ly, and yvonne meulemans information technology and libraries | september 2012 4 abstract openfire is an open-source instant messaging (im) network and a single unified application that meets the needs of chat reference and internal communication. in fall 2009, the california state university san marcos (csusm) library began using openfire and other jive software im technologies to simultaneously improve our existing im-integrated chat reference software and implement an internal im network. this case study describes the chat reference and internal communications environment at the csusm library and the selection, implementation, and evaluation of openfire. in addition, the authors discuss the benefits of deploying an integrated im and chat reference network. introduction instant messaging (im) has become a prevalent contact point for library patrons to get information and reference help, commonly known as chat reference or virtual reference. however, im can also offer a unique method of communication between library staff. librarians are able to rapidly exchange information synchronously or asynchronously in an informal way. im provides another means of building relationships within the library organization and can improve teamwork. many different chat-reference software packages are widely used by libraries, including questionpoint, meebo, and libraryh3lp. less commonly used is openfire (www.igniterealtime.org/projects/openfire), an open-source im network and a single unified application that uses the extensible messaging and presence protocol (xmpp), a widely adopted open protocol for im. since 2009, the california state university san marcos (csusm) kellogg library has used openfire for chat reference and internal im communication. openfire was relatively easy to set up and administer by the web development librarian. librarians and library users have found the im interface to be intuitive. in addition to helpful chat reference features such as statistics capture, queues, transfer, linking to meebo widgets, openfire offers the unique capability to host an internal im network within the library. ian chan (ichan@csusm.edu) is web development librarian, california state university san marcos, pearl ly (pmly@pasadena.edu) is access services & emerging technologies librarian, pasadena community college, pasadena, and yvonne meulemans (ymeulema@csusm.edu) is information literacy program coordinator, california state university san marcos, california. 
extending im beyond the reference desk | chan, ly, and meulemans 5 in this article, the authors present a literature review on im as a workplace communication tool and its successful use in libraries for chat reference services. a case study on the selection, implementation, and evaluation of openfire for use in chat reference and as an internal network will be discussed. in addition, survey results on the library staff use of the internal im network and its implications for collaboration and increased communication are shared. literature review although there is a great deal of literature on im for library reference services, publications on the use of im in libraries for internal communications do not appear in the professional literature. a review of library and information science (lis) literature has revealed very limited work on this aspect of instant messaging. however, a wider literature review in the fields of communications, computer science, and business, indicates there is growing interest in studying the benefits of im within organizations. instant messaging in the workplace in the workplace, im can offer a cost-effective means of connecting in real-time and may increase communication effectiveness between employees. it offers a number of advantages over email, telephone, and face-to-face that we will discuss further in the following section. within the academic library, im offers the possibility of not only improving access to librarians for research help but also provides the opportunity to enhance communication and collaboration throughout the entire organization. research findings indicate that im allows coworkers to maintain a sense of connection and context that is different from email, face-to-face (ftf), and phone conversations.1 each im conversation is designed to display as a single textual thread with one window per conversation. the contributions from each person in the discussion are clearly indicated and it is easy to review what has been said. this design supports the intermittent reconnection of conversation and in contrast to email, “intermittent instant messages were thought to be more immersive and to give more of a sense of a shared space and context than such email exchanges.”2 through the use of im, coworkers gain a highly interactive channel of communication that is not available via other methods of communication.3 phone and ftf conversations are two of the most common forms of interruption within the workplace.4 however, garrett and danziger found that “instant messaging in the workplace simultaneously promotes more frequent communications and reduces interruptions.”5 participants reported they were better able to manage disruptions using im and that im did not increase their communication time. the findings of this study revealed that some communication that otherwise may have occurred over email, by telephone, or in-person were instead delivered via im. this likely contributed to the reduced interruptions because im does not require full and immediate attention unlike a phone call or face-to-face communication. in addition, im study participants reported the ability to negotiate their availability through postponing conversations, information technology and libraries | september 2012 6 and these findings support earlier studies suggesting im is less intrusive than traditional communication methods for determining availability of coworkers.6 a number of research studies show that im improves teamwork and is useful for discussing complex tasks. 
huang, hung, and chen compared the effectiveness of email and im and the number of new ideas; they found that groups utilizing im generated more ideas than the email groups.7 they suggested that the spontaneous and rapid interchanges typical of im facilitates brainstorming between team members. the information that is uniquely visible through im and the ease of sending messages help create opportunities for spontaneous dialog. this is supported by a study by quan-haase, cothrel, and wellman, which found im promotes team interaction by indicating the likelihood of a faster response.8 ou et al. also suggest im has “potential to empower teamwork by establishing social networks and facilitating knowledge sharing among organizational members.”9 im can enhance the social connectedness of coworkers through its focus on contact lists and instant, opportunistic interactivity. the informal and personalized nature of im allows workers to build relationships while promoting the sharing of information. cho, trier, and kim suggest that the use of im as a communication tool encourages unplanned virtual hallway discussions that may be difficult for those located in different parts of a building, campus, or in remote locations.10 im can build relationships between teams and organizations where members are in physically separated locations. however, cho, trier, and kim also note that im is more successful in building relationships between coworkers who already have an existing relationship. wu et al. argue that by helping to build the social network within the organization, instant messaging can contribute to increased productivity.11 several studies have cautioned that im, like other forms of communication, requires organizational guidelines on usage and best practices. mahatanankoon suggests that productivity or job satisfaction may decrease without policies and workplace norms that guide im use.12 other research indicates that personality, employee status, and working style may affect the usefulness of im for individual employees.13 some workers may find the multitasking nature of im to work in their favor while those who prefer sequential task completion may find im disruptive. the hierarchy of work relationships and the nature of managerial styles are likely to have an impact on the use of im as well. while there are no research findings associated with the use of im for internal communication within libraries, there are articles encouraging its use. breeding writes of the potential for im to bring about “a level of collaboration that only rarely occurs with the store-and-forward model of traditional e-mail.”14 fink provides a concise introduction to the advantages of using internal im for communication between library staff.15 in addition, he provides an overview of the implementation and success of the openfire-based im network at mcmaster university. extending im beyond the reference desk | chan, ly, and meulemans 7 success of chat reference in libraries im-based chat reference gives libraries the means to more easily offer low-cost delivery of synchronous, real-time research assistance to their users, commonly referred to as “chat reference.” although libraries have used im for the last decade and many currently subscribe to questionpoint, a collaborative virtual reference service through oclc, two newer online services helped propel the growth of im-based chat reference. 
first available in 2006, the web-based meebo (www.meebo.com) made it much easier to use im for localized chat reference because library patrons were no longer required to have accounts on a proprietary network, such as aol or yahoo, to communicate with librarians.16 instead, meebo provided web widgets that allowed users to chat via the web browser. libraries could easily embed these widgets throughout their website and unlike questionpoint, meebo is free and does not require a subscription. librarians could answer questions using either their account on meebo’s website or by logging-in with a locally installed instant messaging client. in comparison to im-based chat reference, a number of libraries also found questionpoint difficult to use due to its complexity and awkward interface.17 in 2008, libraryh3lp (http://libraryh3lp.com) pushed the growth of im-based chat reference even further because it offered a low-cost, library-specific service that required little technical expertise to implement and operate. libraryh3lp improved on the meebo model by adding features such as queues, multi-user accounts, and assessment tools.18 im adds a more informal means of interaction that helps librarians build relationships with their users. several recent studies have shown that users respond positively to the use of im for chat reference. the illinois state university milner library found that switching from its older chat reference software to im increased transactions by 161 percent within one year.19 with the introduction of web-based im widgets pennsylvania state university library’s im-based chat reference grew from 20 percent to 60 percent of all virtual reference (vr), which includes email reference, in one year.20 a 2010 study of vr and im service at the university of guelph library found 71 percent user satisfaction with im compared to 70 percent satisfaction with vr overall.21 im use in academic libraries has become ubiquitous, and other types of libraries also use im to communicate with library patrons. case study california state university, san marcos (csusm) is a mid-size public university with approximately 9,500 students. csusm is a commuter campus with the majority of students living in north county san diego and offers many online or distance courses at satellite campuses. the csusm kellogg library has a robust chat reference service that is used by students on and off campus. the library has about forty-five employees including librarians, library administrators, and library assistants. the following section will discuss the meebo chat reference pilot, selection of openfire to replace meebo, implementation and customization of openfire, and evaluation of openfire for chat reference by librarians and as an internal network for all library personnel. information technology and libraries | september 2012 8 meebo chat reference pilot to examine the feasibility of using im for chat reference at csusm, the reference librarians initiated a pilot program using meebo (2008–9). a meebo widget was placed on the library’s homepage, the ask a librarian page, and on library research guides. within the first year of the pilot project, chat reference grew to more than 41 percent of all reference transactions.22 based on responses to user satisfaction surveys, 85 percent indicated they would recommend chat reference to other students, and 69 percent said they preferred it to other forms of reference services. 
chat reference is now an integral part of the library’s research assistance program, and im has become a permanent access point for students to contact reference librarians. although the new im service was successful, the pilot program uncovered a number of key shortcomings with meebo when used for chat reference; these shortcomings are documented in a case study by meulemans et al.23 these findings matched problems reported by other libraries who used meebo in their reference services.24 meebo is most suited for individual users who communicate one-to-one via im. for example, meebo chat widgets are specific to each meebo user, and it is not possible to share a single widget between multiple librarians. in addition, features such as message queues and message transfers, invaluable for managing a heavily used chat reference service, are not available in meebo. those features are essential for working with multiple, simultaneous incoming im messages, a common occurrence in virtual reference. other missing features included the lack of built-in transcript retention and lack of automated usage statistics.25 selecting openfire based on the need for a more robust chat reference system, the csusm reference librarians and the web development librarian explored other im options, especially open-source software. the web development librarian had previous experience using openfire at the university of alaska anchorage, for an internal library im network and investigated its capabilities to replace meebo as a chat reference tool. the desire to replace meebo for chat reference at csusm also provided the opportunity to pilot an internal im network. openfire, part of the suite of open-source instant messaging tools from jive software, was the only application that could easily fulfill both roles and offered a number of features that made it highly preferable when compared to other im-based chat reference systems. of its many features, one of the most valuable was the integration between openfire user accounts and our campus email system. being able to tap into the university’s email system meant automated configuration and updating of all staff accounts and contact lists. this removed the burden of individual account maintenance associated with external services such as meebo, libraryh3lp, and questionpoint. openfire supports internal im networks at educational institutions such as the university of pennsylvania, central michigan university, and university of california, san francisco. extending im beyond the reference desk | chan, ly, and meulemans 9 openfire could meet our im chat reference needs because it includes the fastpath plugin, a complete web-based chat management system available at www.igniterealtime.org/projects/openfire/plugins.jsp. this robust system incorporates important features such as message queues, message transfer, statistics, and canned messages. james cook university library in australia also chose to use openfire with fastpath plugin as its chat reference solution based on their need for those features.26 other institutions using fastpath and openfire in the role of chat reference or support include the university of texas, the oregon/ohio multistate virtual reference consortium, mozilla.com, and the university of wisconsin. when reviewing chat reference solutions, we considered the possibility of using chat modules available through drupal (http://drupal.org), the web content management system (cms) for our library website. 
the primary advantage of that option was complete integration with the library website and intranet. further analysis of the drupal option revealed that the available chat modules were too basic for our needs and that reconfiguration of our intranet and website to incorporate a workable chat reference system would require extensive time. in comparison to the implementation time associated with deploying the openfire system, using drupal-based chat modules did not provide a favorable cost-benefit ratio. while the proprietary libraryh3lp offered similar functionality for chat reference, its inability to integrate with our email system was clearly a deficit when compared to openfire. in libraryh3lp, it is necessary to create accounts for all library personnel in chat reference. fastpath does not have that requirement if you integrate openfire with your organization's lightweight directory access protocol (ldap) directory; instead, the system will automatically create accounts for all library staff. furthermore, the administrative options and interface for libraryh3lp also did not compare favorably with those of fastpath. the fastpath interface for assigning users is more intuitive, and the system generates a customizable chat initiation form for each workgroup (figures 1 and 2). oregon's l-net and ohio's knowitnow24x7 offer information about software requirements and an online demonstration of spark/fastpath.27
[figure 1. fastpath chat initiation form for csusm research help desk. figure 2. fastpath chat initiation form for csusm media library.]
for our requirements, openfire was clearly superior to the available systems for chat reference. its relatively simple deployment requirements and ease of setup helped make it our first choice for building a combined im network and chat reference system. in the following section, we will discuss the installation, customization, and assessment of our openfire implementation.
openfire installation and configuration
the openfire application is a free download from ignite realtime, a community of jive software. the program will run on any web server that has a windows, linux, or macintosh operating system. if configured as a self-contained application, openfire only requires java to be available on your web server. installation of the software is an automated process, and system configuration is through a web-based setup guide. after the initial language selection form, the next step in the server configuration process is to enter the web server url and the ports through which the server will communicate with the outside world (figure 3). the third step provides fields for selecting the type of database to use with openfire and for inputting any information relating to your selection (figure 4).
[figure 3. openfire server settings screen. figure 4. openfire database configuration form.]
openfire uses a database to store information such as im network settings, user account information, and transcripts. database options include using an embedded database or connecting to an external database server. using the embedded database is the simpler option and is helpful if you do not have access to a database server. connecting to an external database server offers more control of the data generated by openfire and provides additional backup options.
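as noted just below, the authors chose mysql for the external database. if that route is taken, the empty database and a dedicated account are typically created before openfire's setup wizard is pointed at them. a minimal sketch using the mysql-connector-python package follows; the host name, account names, and passwords are placeholders, not csusm's actual configuration.

import mysql.connector  # from the mysql-connector-python package

# all connection details and names below are illustrative placeholders
admin = mysql.connector.connect(host="db.library.example.edu",
                                user="root",
                                password="admin-password")
cur = admin.cursor()

# an empty database for openfire to populate during its web-based setup
cur.execute("CREATE DATABASE openfire CHARACTER SET utf8")

# a dedicated account so openfire does not connect with admin credentials
cur.execute("CREATE USER 'openfire'@'%' IDENTIFIED BY 'change-me'")
cur.execute("GRANT ALL PRIVILEGES ON openfire.* TO 'openfire'@'%'")
cur.execute("FLUSH PRIVILEGES")

admin.close()

the host, database name, and account prepared this way are then entered on the database configuration screen of the setup wizard (figure 4).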
openfire works with a number of the more commonly used database servers, such as mysql, postgresql, and microsoft sql server. in addition, oracle and ibm's db2 are database options with additional free plugins from these vendors. we chose to use mysql because of our experience using it with other library web applications. if using the external database option, creating and configuring the external database before installing openfire is highly recommended. after choosing a database, the openfire configuration requires the selection of an authentication method for user accounts. one option is to use openfire's internal authentication system. while the internal system is robust, it requires additional administrative support to manage the process of creating and maintaining user accounts. the recommended option is to connect openfire with your organization's lightweight directory access protocol (ldap) directory (figure 5). ldap is a protocol that allows external systems to interact with the user information stored in an organization's email system. using ldap with openfire is highly preferable because it simplifies access for your librarians and staff by automatically creating user accounts based on the information in your organization's email system. library staff simply log in with their work email or network account information; they are not required to create a new username and password.
[figure 5. openfire ldap configuration form.]
the last step in the configuration process is to grant system administrator access to the appropriate users. if using the ldap authentication method, you are able to select one or more users in your organization by entering their email id (the portion before the @ sign). the selected users will have complete access to all aspects of the openfire server. once the setup and configuration process is complete, the server is ready to accept im connections and route messages. reviewing the settings and options within the openfire system administration area is highly recommended. most libraries will likely want to adjust the configurations within the sections for server settings and archives.
connecting the im network
the second phase of the implementation process connected our library personnel with the im network using im software installed on their workstations. the openfire im server works with any multiprotocol im client ("multiprotocol" refers to support for simultaneous connections to multiple im networks) that provides options for configuring an xmpp or jabber account. some of the more popular im clients that offer this functionality include spark, trillian, miranda, and pidgin. based on our chat reference requirements, we chose to use spark (www.igniterealtime.org/projects/spark), an im client program designed to work specifically with the fastpath web chat service. spark comes with a fastpath plugin that enables users to receive and send messages to anyone communicating through the web-based fastpath chat widgets (more information on fastpath configuration is in the next section of this article). this plugin provides a tab for logging into a fastpath group and for viewing the status of the group's message queues (figure 6). spark also includes many of the features offered by other im clients, including built-in screen capture, message transfer, and group chat.
[figure 6. the fastpath plugin for spark.]
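because openfire speaks standard xmpp, the network that spark connects to can also be reached from a script, which is a convenient way to confirm that the server accepts logins before rolling spark out to staff. the sketch below uses slixmpp, one of several python xmpp libraries; the account, password, and recipient are placeholders rather than real csusm accounts.

import slixmpp

class LoginCheck(slixmpp.ClientXMPP):
    """log in, announce presence, send one test message, then disconnect."""

    def __init__(self, jid, password, recipient):
        super().__init__(jid, password)
        self.recipient = recipient
        self.add_event_handler("session_start", self.start)

    async def start(self, event):
        self.send_presence()
        await self.get_roster()
        self.send_message(mto=self.recipient,
                          mbody="test message from the openfire login check",
                          mtype="chat")
        self.disconnect()

if __name__ == "__main__":
    # account details are illustrative; with ldap integration these would be
    # existing staff accounts drawn from the organization's directory
    client = LoginCheck("testagent@im.library.example.edu", "secret",
                        "librarian@im.library.example.edu")
    client.connect()
    client.process(forever=False)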
library personnel were able to install spark on their own by downloading it from the ignite software website and launching the software's installation package. the installation process is very simple, and user-specific information is only required when spark is started for the first time. the fields required for login include the username and password of the user's organizational email and the address of the im server. as part of our implementation process, we also provided library staff with recommendations regarding the selection and configuration of optional settings that might enhance their im experience. recommendations included auto-start of spark when logging in to the computer and the activation of incoming message signals, such as sound effects and pop-ups. on our openfire server, we had also installed the kraken gateway (http://kraken.blathersource.org) plugin to enable connections to external im networks. the gateway plugin works with spark to integrate library staff accounts on chat networks such as google talk, facebook, and msn (an example of integrated networks is shown in figure 6). by integrating meebo as well, librarians were able to continue using the meebo widgets they had embedded into their research guides and faculty profile pages. this allowed them to use spark to receive im messages rather than logging on to the meebo website.
configuring the fastpath plugin for chat reference
a primary motivation for using openfire was the feature set available in the fastpath plugin. fastpath is a complete chat messaging system that includes workgroups, queues, chat widgets, and reporting. fastpath actually consists of two plugins that work together: fastpath service for managing the chat system and fastpath webchat for web-based chat widgets. both plugins are available as free downloads from the openfire plugins section of the ignite software website—www.igniterealtime.org/projects/openfire/plugins.jsp. to install fastpath, upload its packages using the form in the plugins section of the openfire administrative interface. the plugins will automatically install and add a fastpath tab to the administrative main menu. the first step in getting started with the system is to create a workgroup and add members (figure 7). within each new workgroup, one or more queues are required to process and route incoming requests, and each queue requires at least one "agent." in fastpath, the term agent refers to those who will receive the incoming chat requests.
[figure 7. workgroup setup form in fastpath.]
as workgroups are created, the system automatically generates a chat initiation form that by default includes fields for name, email, and question. administrators can remove, modify, and add any combination of field types, including text fields, dropdown menus, multiline text areas, radio buttons, and check boxes. you may also configure the chat initiation form to require completion of some, all, or none of the fields. at csusm, our form (figures 1 and 2) includes name, email, a dropdown menu for selecting the topic area of the user's research, and a field for the user to enter their question. the information in these fields allows us to quickly route incoming questions to the appropriate subject librarian.
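with the ldap integration described earlier, a workgroup member must already resolve as a user in the directory, so a quick sanity check when an agent cannot be added is to confirm that the person's entry is visible using the same directory coordinates given to openfire. the sketch below uses the ldap3 package; the server, bind account, base dn, and attribute names are placeholders, not csusm's actual directory layout.

from ldap3 import Server, Connection, ALL, SUBTREE

# all names below are illustrative placeholders
server = Server("ldap.library.example.edu", use_ssl=True, get_info=ALL)
conn = Connection(server,
                  user="cn=openfire-bind,ou=service accounts,dc=example,dc=edu",
                  password="bind-password",
                  auto_bind=True)

# look up the prospective fastpath agent by the same id used for email login
conn.search(search_base="ou=people,dc=example,dc=edu",
            search_filter="(uid=jsmith)",
            search_scope=SUBTREE,
            attributes=["displayName", "mail"])

for entry in conn.entries:
    print(entry.entry_dn, entry.displayName, entry.mail)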
fastpath includes the ability to create routing rules that use the values submitted in the form to send messages to specific queues within a workgroup. in future, we may use the dropdown menu to automatically route questions to the subject specialist based on the student’s topic. there are two methods to make the fastpath chat widget available to the public. the standard approach embeds a presence icon on your webpage and provides automatic status updates. clicking on the icon displays the chat initiation form. for our needs we choose to embed the chat initiation form in our webpages (see appendix b for sample code). when the user submits the form, openfire routes the message to the next available librarian. on the librarian’s computer, the spark program plays a notification sound and displays a pop-up dialog. the pop-up dialog remains open until the librarian accepts the message, passes it on, or the time limit for acceptance is reached, in which case the message returns to the queue for the next available librarian. evaluation of openfire for enhanced chat reference the csusm reference librarians found fastpath and openfire to be much more robust than meebo for chat reference. the ability to keep chat transcripts and to retain metadata such as time stamps, duration of chats, and topic of research for each conversation is very helpful toward analyzing the effectiveness of chat research assistance and for statistical reporting. the automated recording of transcripts and metadata saved time when compared to meebo. using meebo, transcripts were manually copied into a microsoft word document and the tracking statistics of im interactions were kept in a shared excel spreadsheet. other useful features of fastpath were the capability of transferring of patrons to other librarians and having more than one librarian monitor incoming questions. furthermore, access to the database holding the fastpath data allowed us to build an intranet page to monitor real-time incoming im messages and their responses. however, some issues were encountered with the fastpath plugin when initiating chat connections. we experienced intermittent, random instances of dropped im connections and lost messages. while many of these lost connections were likely the result of user actions (accidentally closing the chat pop-up, walking away from the computer, etc.), others appear to have been due to problematic connections between the server and the user’s browser. to address these issues, we are now asking users to provide their email when they initiate a chat session. with user emails and our real-time chat monitoring system, we are able to follow up with reference patrons that experience im connection issues and provide research assistance via email. evaluation of openfire as an internal communication tool while the adoption of im as internal communication tool was highly encouraged, its use was not mandatory for all library personnel. based on the varied technical background of our staff and librarians, we recognized that some might find im difficult to integrate within their workflow or communication style and chose a soft-launch for our network. information technology and libraries | september 2012 16 in summer 2011, we conducted a survey of csusm library personnel (44 respondents, 99 percent of total staff) to evaluate im as an internal communication tool. (see appendix a for survey questions.) 
we found that 59 percent of staff use the internal im network while 85 percent use some type of im for web-based chat for work. of those who use internal im, 30 percent used it daily. while the survey was anonymous, anecdotal discussions indicate adoption rates are higher among library units where the work is technically oriented or instructional in nature, such as library systems and the information literacy program/reference. among the respondents who use im, 45 percent of library staff indicated they use it because it allows quick communication between those in the library and 39 percent like its informal nature of communication. twenty percent of total respondents preferred im to email and phone communications. two respondents use the internal im network but were dissatisfied with it and indicated it did not work well while one found it too difficult to use. an additional survey question was geared for staff members who do not use the internal im network at all (“why do you not use the library im network?”). this question was designed to find areas of possible improvement within our system to encourage greater use. survey respondents were allowed to select more than one reason. the most common reasons given by those who do not use the library im network were that they don’t feel the need (34 percent of nonusers), they mainly communicate with staff members who are also not utilizing the im network (18 percent), im does not work for their communication style (14 percent), and privacy concerns (14 percent). we believe more in-depth analysis is necessary to learn more regarding the perceived usefulness of im within our organization and to further its adoption. conclusion through additional training and user education, we hope to promote greater use of the openfire internal im network among those who work in the library. while 100 percent adoption of im as a communication tool is not a stated goal of our project, we believe that some staff have not realized the full potential of im for collaboration and productivity due to a lack of experience with this technology. in hindsight, additional training sessions beyond the initial introductory workshop to set up the spark im client may have increased the usage of im by staff. for example, providing more information on the library’s policies regarding internal im tracking and the configuration of our system may have alleviated concerns regarding privacy. in addition, we need to lead more discussions on the benefits of im for collaboration, lowering disruptions, and increasing effectiveness in the workplace. openfire and fastpath for chat reference has brought many new features that were previously unavailable to chat reference at csusm. the addition of queues, message transfer, and transcripts has enhanced the effectiveness of this service and eased its management. compared to the prior chat reference implementations that used questionpoint and meebo, this new system is more user friendly and robust. extending im beyond the reference desk | chan, ly, and meulemans 17 furthermore, the internal im network and its connection to web-based chat widgets offer the opportunity for building a library that is more open to users. library users could feasibly contact any library staff member, not just reference librarians, via im for help. we are testing this concept with a pilot project involving the csusm media library. they are staffing their own chat workgroup and a chat widget is now available on their website. 
in the future, we also hope to employ a chat widget for circulation and ill services, another public services area that frequently works with library users. it is important to note that the success of openfire and im in the library attracted the attention of other csusm instructional and student support areas. in spring 2011, instructional and information technology services (iits), which provides campus-wide technology services for faculty, staff, and students piloted an openfire-based im helpdesk service to assist users with technology questions and problems. as of fall 2011, the “ask an it technician” service is fully implemented and available on all campus webpages. discussions on the adoption of im for other campus student services, such as financial aid and counseling, have also occurred. in addition to being a contact point for students, im has potential to improve the internal communication within the organization. references 1. hee-kyung cho, matthias trier, and eunhee kim, “the use of instant messaging in working relationship development: a case study,” journal of computer-mediated communication 10, no. 4 (2005), http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2005.tb00280.x/full (accessed aug. 1, 2011). 2. bonnie a. nardi, steven whittaker, and erin bradner, “interaction and outeraction: instant messaging in action,” in proceedings of the 2000 acm conference on computer supported cooperative work (new york, new york: acm press, 2000),79–88. 3. ellen isaacs et al., “the character, functions, and styles of instant messaging in the workplace,” in proceedings of the 2002 acm conference on computer supported cooperative work (new york, new york: acm press, 2002), 11–20. 4. victor m. gonzález and gloria mark, “constant, constant, multi-tasking craziness: managing multiple working spheres,” in proceedings of the sigchi conference on human factors in computing systems (new york, new york: acm press, 2004), 113–20. 5. r. kelly garrett and james n. danziger, “im = interruption management? instant messaging and disruption in the workplace,” journal of computer-mediated communication 13, no. 1 (2007), http://jcmc.indiana.edu/vol13/issue1/garrett.html (accessed jun. 15, 2011). 6. nardi, whittaker, and bradner, “interaction and outeraction,” 83. 7. albert h. huang, shin-yuan hung, and david c. yen, “an exploratory investigation of two internet-based communication modes,” computer standards & interfaces 29, no. 2 (2006): 238–43. http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2005.tb00280.x/full http://jcmc.indiana.edu/vol13/issue1/garrett.html information technology and libraries | september 2012 18 8. anabel quan-haase, joseph cothrel, and barry wellman, “instant messaging for collaboration: a case study of a high-tech firm,” journal of computer-mediated communication 10, no. 4 (2005), http://jcmc.indiana.edu/vol10/issue4/quan-haase.html (accessed jun. 12, 2011). 9. carol x. j. ou et al., “empowering employees through instant messaging,” information technology & people 23, no. 2 (2010): 193–211. 10. cho, trier, and kim, “instant messaging in working relationship development.” 11. lynn wu et al., “value of social network—a large-scale analysis on network structure impact to financial revenue of information technology consultants” (paper presented at winter information systems conference, salt lake city, ut, feb. 5, 2009). 12. pruthikrai mahatanankoon, “28p. exploring the impact of instant messaging on job satisfaction and creativity,” conf-irm 2010 proceedings (2010). 13. 
ashish gupta and han li, “understanding the impact of instant messaging (im) on subjective task complexity and user satisfaction,” in pacis 2009 proceedings. paper 10, http://aisel.aisnet.org/pacis2009/1; and stephanie l. woerner, joanne yates, and wanda j. orlikowski, “conversational coherence in instant messaging and getting work done,” in proceedings of the 40th annual hawaii international conference on system sciences, http://www.computer.org/portal/web/csdl/doi/10.1109/hicss.2007.152 (2007). 14. marshall breeding, “instant messaging: it’s not just for kids anymore,” computers in libraries 23, no. 10 (2003): 38–40. 15. john fink, “using a local chat server in your library,” feliciter 56, no. 5 (2010): 202–3. 16. william breitbach, matthew mallard, and robert sage, “using meebo’s embedded im for academic reference services: a case study,” reference services review 37, no. 1 (2009): 83–98. 17. cathy carpenter and crystal renfro, “twelve years of online reference services at georgia tech: where we have been and where we are going,” georgia library quarterly 44, no. 2 (2007), http://digitalcommons.kennesaw.edu/glq/vol44/iss2/3 (accessed aug. 25, 2011); and danielle theiss-white et al., “im’ing overload: libraryh3lp to the rescue,” library hi tech news 26, no. 1/2 (2009): 12–17. 18. theiss-white et al., “im’ing overload,” 12–17. 19. sharon naylor, “why isn’t our chat reference used more?” reference & user services quarterly 47, no. 4 (2008): 342–54 20. sam stormont, “becoming embedded: incorporating instant messaging and the ongoing evolution of a virtual reference service,” public services quarterly 6, no. 4 (2010): 343–59. http://jcmc.indiana.edu/vol10/issue4/quan-haase.html http://www.computer.org/portal/web/csdl/doi/10.1109/hicss.2007.152 http://digitalcommons.kennesaw.edu/glq/vol44/iss2/3 extending im beyond the reference desk | chan, ly, and meulemans 19 21. lorna rourke and pascal lupien, “learning from chatting: how our virtual reference questions are giving us answers,” evidence based library & information practice 5, no. 2 (2010): 63–74. 22. pearl ly and allison carr, “do u im?: using evidence to inform decisions about instant messaging in library reference services” (poster presented at the 5th evidence based library and information practice conference, stockholm, sweden, june 29, 2009), http://blogs.kib.ki.se/eblip5/posters/ly_carr_poster.pdf (accessed august 1, 2011). 23. yvonne nalani meulemans, allison carr, and pearl ly, “from a distance: robust reference service via instant messaging,” journal of library & information services in distance learning 4, no. 1 (2010): 3–17. 24. theiss-white et al., “im’ing overload,” 12–17. 25. meulemans, carr, and ly, “from a distance,” 14–15 26. nicole johnston, “improving the reference and information experience of students in regional areas—does an instant messaging service make a difference?” (paper presented at 4th alia new librarians symposium, december 5–6, 2008, melbourne, australia), http://eprints.jcu.edu.au/2076(accessed august 17, 2011); and alan cockerill, “open source for im reference: openfire, fastpath and spark” (workshop presented at fair shake of the open source bottle, griffith university, queensland college of art, brisbane, australia, november 20, 2009), http://www.quloc.org.au/download.php?doc_id=6932&site_id=255 (accessed august 4, 2011). 27. oregon state multistate collaboration, “multi-state collaboration: home,” http://www.oregonlibraries.net/multi-state (accessed august 16, 2011). 
appendix a. library instant messaging (im) usage survey
the information you submit is confidential. your name and campus id are not included with your response.
which of the following do you use . . . (answered separately for work and for personal use)
● library's im network (spark)
● meebo
● msn
● yahoo
● gtalk
● facebook or other website-specific chat system
● im app on my phone
● trillian, pidgin or other im aggregator
● skype
● i don't use im or web-based chat
● other
if you selected other, please describe: ____________________
on average, how often do you communicate via im or web-based chat at work?
● several times a day
● almost daily
● several times a week
● several times a month
● never
how often do you use im or web-based chat to . . . (rated on a scale of 5—often, 3—sometimes, 1—never)
● discuss work-related topic
● socialize with co-worker
● answer questions from library users
● talk about non-work related topic
● request tech support
● other
if you selected other, please describe: ____________________
if you use im to communicate at work, what do you like about it?
● allows for quick communication with others in the library
● facilitates informal conversation
● students like to use it to ask library related questions
● i prefer im over phone or email
● other:
why do you not use the library im network?
● don't feel the need
● the people i usually talk to aren't on it
● does not work well
● never get around to it . . . but would like to
● it doesn't work for my communication style
● the system is too difficult to use
● privacy concerns
● other:
additional comments? ____________________
appendix b. iframe code for embedding fastpath chat widget
[sample iframe markup not reproduced here.]
investigations into library web-scale discovery services. jason vaughan. information technology and libraries | march 2012.
abstract
web-scale discovery services for libraries provide deep discovery to a library's local and licensed content and represent an evolution—perhaps a revolution—for end-user information discovery as pertains to library collections. this article frames the topic of web-scale discovery and begins by illuminating web-scale discovery from an academic library's perspective—that is, the internal perspective seeking widespread staff participation in the discovery conversation.
this included the creation of the discovery task force, a group that educated library staff, conducted internal staff surveys, and gathered observations from early adopters. the article next addresses the substantial research conducted with library vendors that have developed these services. such work included drafting of multiple comprehensive question lists distributed to the vendors, onsite vendor visits, and continual tracking of service enhancements. together, feedback gained from library staff, insights arrived at by the discovery task force, and information gathered from vendors collectively informed the recommendation of a service for the unlv libraries.

jason vaughan (jason.vaughan@unlv.edu) is director, library technologies, university of nevada, las vegas.

introduction

web-scale discovery services, combining vast repositories of content with accessible, intuitive interfaces, hold the potential to greatly facilitate the research process. while the technologies underlying such services are not new, commercial vendors releasing such services, and their work and agreements with publishers and aggregators to pre-index content, are very new. this article frames the topic of web-scale discovery and helps illuminate some of the concerns and commendations related to web-scale discovery from one library's staff perspective—that is, the internal perspective. the second part focuses on detailed dialog with the commercial vendors, enabling the library to gain a better understanding of these services; in this sense, the second half is focused externally. given that web-scale discovery is new for the library environment, the author was unable to find any substantive published work detailing identification, research, evaluation, and recommendation related to library web-scale discovery services. it's hoped that this article will serve as the ideal primer for other libraries exploring or contemplating exploration of these groundbreaking services.

web-scale discovery services are able to index a variety of content, whether hosted locally or remotely. such content can include library ils records, digital collections, institutional repository content, and content from locally developed and hosted databases. such capabilities existed, to varying degrees, in next-generation library catalogs that debuted in the mid-2000s. in addition, web-scale discovery services pre-index remotely hosted content, whether purchased or licensed by the library. this latter set of content—hundreds of millions of items—can include items such as e-books, publisher or aggregator content for tens of thousands of full-text journals, content from abstracting and indexing databases, and materials housed in open-access repositories. for purposes of this article, web-scale discovery services are flexible services which provide quick and seamless discovery, delivery, and relevancy-ranking capabilities across a huge repository of content. commercial web-scale discovery vendors have brokered agreements with content providers (publishers and aggregators), allowing them to pre-index item metadata and full-text content (unlike the traditional federated search model). this approach lends itself to extremely rapid search and return of results ranked by relevancy, which can then be sorted in various ways according to the researcher's whim (publication date, item type, full text only, etc.).
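to make the pre-indexed versus federated-search distinction above concrete, the following deliberately simplified python sketch contrasts a single query against a pre-built central index with a federated search that must contact each remote target at query time. the names, sample records, and delays are invented for illustration only and do not represent any vendor's actual api.

from time import sleep

# toy "central index": records from many sources merged and normalized ahead of time
central_index = {
    "solar energy": [
        {"title": "solar energy policy", "source": "ils catalog"},
        {"title": "solar energy materials (journal article)", "source": "aggregator"},
    ],
}

def central_search(term):
    # one lookup against the unified, locally ranked index
    return sorted(central_index.get(term, []), key=lambda r: r["title"])

def remote_target(term):
    # stand-in for a live connector to a remote database
    sleep(0.5)  # simulated network and remote-search delay
    return [{"title": term + " (remote hit)", "source": "remote database"}]

def federated_search(term, connectors):
    # broadcast the query at search time, wait on every target, then merge
    results = []
    for connector in connectors:
        results.extend(connector(term))
    return results

print(central_search("solar energy"))                         # fast: no remote calls
print(federated_search("solar energy", [remote_target] * 3))  # slower: three round trips

the point of the toy is simply that a central-index service does its gathering and normalization before the user searches, so response time and relevancy ranking do not depend on the slowest remote source.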
by default, an intuitive, simple, google-like search box is provided (along with advanced search capabilities for those wishing this approach). the interface includes design cues expected by today's researchers (such as faceted browsing) and, for libraries wishing to extend and customize the service, embraces an open architecture in comparison to traditional ils systems.

why web-scale discovery?

as illustrated by research dating back primarily to the 1990s, library discovery systems within the networked online environment have evolved, yet continue to struggle to serve users. as a result, the library (or systems supported and maintained by the library) is often not the first stop for research—or worse, not a stop at all. users accustomed to a quick, easy, "must have it now" environment have defected, and research continues to illustrate this fact. rather than weave these research findings into a paragraph or page, below are some illustrative quotes to convey this challenge. the quotations below were chosen because they succinctly capture findings from research involving dozens, hundreds, and in some cases thousands of participants or respondents:

people do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable—so long as it requires little effort to find—rather than using information they know to be of high quality and reliable, though harder to find.1

* * *

today, there are numerous alternative avenues for discovery, and libraries are challenged to determine what role they should appropriately play. basic scholarly information use practices have shifted rapidly in recent years, and as a result the academic library is increasingly being disintermediated from the discovery process, risking irrelevance in one of its core functional areas [that of the library serving as a starting point or gateway for locating research information] . . . we have seen faculty members steadily shifting towards reliance on network-level electronic resources, and a corresponding decline in interest in using locally provided tools for discovery.2

* * *

a seamless, easy flow from discovery through delivery is critical to end users. this point may seem obvious, but it is important to remember that for many end users, without the delivery of something he or she wants or needs, discovery alone is a waste of time.3

* * *

end users' expectations of data quality arise largely from their experiences of how information is organized on popular web sites . . . 4

* * *

[user] expectations are increasingly driven by their experiences with search engines like google and online bookstores like amazon. when end users conduct a search in a library catalog, they expect their searches to find materials on exactly what they are looking for; they want relevant results.5

* * *

users don't understand the difference in scope between the catalog and a&i services (or the catalog, databases, digitized collections, and free scholarly content).6

* * *

it is our responsibility to assist our users in finding what they need without demanding that they acquire specialized knowledge or select among an array of "silo" systems whose distinctions seem arbitrary . . . the continuing proliferation of formats, tools, services, and technologies has upended how we arrange, retrieve, and present our holdings. our users expect simplicity and immediate reward and amazon, google, and itunes are the standards against which we are judged.
our current systems pale beside them.7

* * *

q: if you could provide one piece of advice to your library, what would it be?
a: just remember that students are less informed about the resources of the library than ever before because they are competing heavily with the internet.8

additional factors sell the idea of web-scale discovery. obviously, something must be discoverable for it to be used (and of value) to a researcher; ideally, content should be easily discoverable. since these new services index content that previously was housed in dozens or hundreds of individual silos, they can greatly facilitate the search process for many research purposes. libraries often spend large sums of money to license and purchase content, sums that often increase annually. any tool that holds the potential to significantly increase the discovery and use of such content should cause libraries to take notice. at time of writing, early research is beginning to indicate that these tools can increase discovery. doug way compared link-resolver-database and full-text statistics prior to and after grand valley state university's implementation of the summon web-scale discovery service.9 his research suggests that the service was broadly adopted by the university's community and that it has led to an increase in the library's electronic resource discovery and use. willamette university implemented worldcat local, and bill kelm presented results that showed an increase in both ill requests and use of the library's electronic resources.10 from another angle, information-literacy efforts focus on connecting users to "legitimate" content and providing researchers the skills to identify content quality and legitimacy. given that these web-scale discovery services include or even primarily focus on indexing a large amount of scholarly research, such services can serve as another tool in the library's arsenal. results retrieved from these services—largely content licensed or purchased by libraries—are accurate, relevant, and vetted, compared to the questionable or opinionated content that may often be returned through a web search engine query. several of the services currently allow a user to refine results to just those categorized as peer-reviewed or scholarly.

the internal academic library perspective: genesis of the unlv libraries discovery task force

the following sections of this article begin with a focus on the internal unlv library perspective—from early discussions focused on the broad topic of discovery to establishing a task force charged to identify, research, evaluate, and recommend a potential service for purchase. throughout this process, and as detailed below, communication with and feedback from a variety of library staff was essential in ensuring success. given the increasing vitality of content in electronic format, and the fact that such content was increasingly spread across multiple access points or discovery systems, in late 2008 the university of nevada las vegas (unlv) libraries began an effort to engage library staff in information discovery and how such discovery would ideally occur in the future. related to the exponential growth of content in electronic format, traditional technical-services functions of cataloging and acquisitions were changing or would soon change, not just at unlv, but throughout the academic library community.
coinciding with this, the libraries were working on drafting their 2009–11 strategic plan and wanted to have a section highlighting the importance of information discovery and delivery, with action items focused on improving this critical responsibility of libraries. in spring 2009, library staff were given the opportunity to share with colleagues a product or idea, related to some aspect of discovery, which they felt was worthy of further consideration. this event, open to unlv libraries staff and other nevada colleagues, was titled the discovery mini-summit, and more than a dozen participants shared their ideas, most in a poster-session format. one of the posters focused on serials solutions summon, an early entrant into the vendor web-scale discovery service landscape; at the time, it was a few months from public release. other posters included topics such as the flickr commons (cultural heritage and academic institutions exposing their digital collections through this popular platform), and a working prototype of a homegrown, open-source federated search approach searching across various subscribed databases.

in august 2009, the dean of the unlv university libraries charged a ten-person task force to investigate and evaluate web-scale discovery services with the ultimate goal of providing a final recommendation for potential purchase. representation on the task force included three directors and a broad cross section of staff from across the functional areas of the library, including back-of-the-house and public-service operations. the director of library technologies, and author of this article, was tasked with drafting a charge and chairing the committee; once charged, the discovery task force worked over the next fifteen months to research, evaluate, and ultimately provide a recommendation regarding a web-scale discovery service. to help illustrate some of the events described, a graphical timeline of activities is presented as appendix a; the original charge appears as appendix b. in retrospect, the initial target date of early 2010 to make a recommendation was naive, as three of the five products ultimately identified and evaluated by the task force weren't publicly released until 2010. several boundaries were provided within the charge, including the fact that the task force was not investigating and evaluating traditional federated search products. the libraries had had a very poor experience with federated search a few years earlier, and the shortcomings of the traditional federated search approach—regardless of vendor—are well known.

the remainder of this article discusses the various steps taken by the discovery task force in evaluating and researching web-scale discovery services. while many libraries have begun to implement the web-scale discovery services evaluated by this group, many more are currently at the learning and evaluation stage, or have not yet begun. many libraries that have already implemented a commercial service likely went through an evaluation process, but perhaps not at the scale conducted by the unlv libraries, if for no other reason than the majority of commercial services are extremely new. even in early 2010, there was less competition, fewer services to evaluate, fewer vendors to contact, and fewer early adopters from whom to seek references.
fortunately, the initial target date of early 2010 for a recommendation was a soft target, and the discovery task force was given ample time to evaluate the products. based on presentations given by the author in 2010, it can’t be presumed that an understanding of web-scale discovery—or the awareness of the commercial services now available—is necessarily widespread. in that sense, it’s the author’s hope and intent that information contained in this article can serve as a primer, or a recipe, for those libraries wishing to learn more about web-scale discovery and perhaps begin an evaluation process of their own. while research exists on federated search technologies within the library environment, the author was unable to find any peer-reviewed published research on the evaluation model and investigations for vendor produced web-scale discovery services as described in this paper. however, some reports are available on the open web, providing some insights into web-scale discovery evaluations led by other libraries, such as two reports provided by oregon state university. the first, dated march 2009, describes a task force whose activities included “scrutinize wcl [worldcat local], investigate other vendors’ products, specifically serials solutions’ summon, the recently announced federated index discovery system; ebsco’s integrated search; and innovative interfaces’ encore product, so that a more detailed comparison can be done,” and “by march 2010, communicate . . . whether wcl or another discovery service is the optimal purchase for osu libraries.”11 note that in 2009, encore existed as a next-generation discovery layer, and it had an optional add on called “encore harvester,” which allows for the harvesting of digital local collections. the report cites the university of michigan’s evaluation of wcl, and adds their additional observations. the march 2009 report provides a features comparison matrix for worldcat local, encore, summon, and libraryfind (an open-source search tool developed at osu that provides federated searching for selected resources). feature sets include the areas of search and retrieval, content, and added features (e.g., book covers, user tagging, etc.). the report also describes some usability testing involving wcl and integration with other local library services. a second set of investigations followed “in order to provide the task force with an opportunity to more thoroughly investigate other products” and is described in a second report provided at the end of 2009.12 at the time of both phases of this evaluation (and drafted reports) three of the web-scale discovery products had yet to enter public release. the december 2009 report focused on the two released products, serials solutions summon and worldcat local, and includes a feature matrix like the earlier report, with the added feature set of “other,” which included the features of “clarity of display,” “icons/images,” and “speed.” the latter report briefly describes how they obtained subject librarian feedback and the pros and cons observed by the librarians in looking at summon. it also mentions obtaining feedback from two early adopters of the summon product, as well as obtaining feedback from librarians whose library had implemented worldcat local. 
apart from the oregon reports, some other reports on evaluations (or selection) of a particular service, or a set of particular services, are available, such as the university of michigan's article discovery working group, which submitted a final report in january 2010.13

activity: understanding web-scale

the first activity of the discovery task force was to educate the members, and later, other library colleagues, on web-scale discovery. terms such as "federated search," "metasearch," "next-generation catalogs," and "discovery layers" had all come before, and "web-scale" was a rather new concept that wasn't widely understood. the discovery mini-summit served as a springboard that, perhaps more by chance than design, introduced unlv library staff to what would later become more commonly known as web-scale discovery, though even we weren't familiar with the term back in spring 2009. in fall 2009, the discovery task force identified reports from entities such as oclc and ithaka, as well as reports prepared for the library of congress, highlighting changing user behavior and expectations; these reports helped form a solid foundation for understanding the "whys" related to web-scale discovery. registration and participation in sponsored web-scale discovery webcasts and meetings with vendors at library conferences helped further the understanding of web-scale discovery.

after the discovery task force had a firm understanding of web-scale discovery, the group hosted a forum for all library staff to help explain the concept of web-scale discovery and the role of the discovery task force. specifically, this first forum outlined some key components of a web-scale discovery service, discussed research the task force had completed to date, and outlined some future research and evaluation steps. a summary of these steps appears in the timeline in appendix a. time was allowed for questions and answers, and then the task force broadcast several minutes of a (then recent) webcast talking about web-scale discovery. as part of its education role, the discovery task force set up an internal wiki-based webpage in august 2009 upon formation of the group, regularly added content, and notified staff when new content was added. a goal of the task force was to keep the evaluative process transparent, and over time the wiki became quite substantial. links to "live" services were provided on the wiki. given that some services had yet to be released, some links were to demo sites or to the closest approximation available; some services yet to be released were built on an existing discovery layer already in general release, and thus the look, feel, and functionality of such services was basically available for staff review. the wiki also provided links to published research and webcasts on web-scale discovery. such content grew over time as additional web-scale discovery products entered general release. in addition to materials on particular services, links were provided to important background documents and reports on topics related to the user discovery experience and user expectations for search, discovery, and delivery. discovery task force meeting notes and staff survey results were posted to the wiki, as were evaluative materials such as information on the content-overlap analysis conducted for each service. announcements of relevant vendor programs at the american library association's annual conference were also posted to the wiki.
activity: initial staff survey

as noted above, when the task force began its work, only two products (out of five ultimately evaluated) were in general release. as more products entered public release, a next step was to invite vendors onsite to show their publicly released product, or a working, developed prototype nearing initial public release. to capture a sense of the library staff ahead of these vendor visits, the discovery task force conducted the first of two staff surveys. the 21-question survey consisted of a mix of "rank on a scale" questions, multiple-choice questions, and free-text response questions. both the initial and subsequent surveys were administered through the online surveymonkey tool. respondents were allowed to skip any question they wished. the survey was broken into three broad topical areas: "local library customization capabilities," "end user aspect: features and functionality," and "content." the survey had an average response rate of 47 staff, or 47 percent of the library's 100-strong workforce. the survey questions appear in appendix c. in hindsight, some of the questions could have benefitted from more careful construction. that said, there was a conscious juxtaposition of differing concepts within the same question—the task force did not want to receive a set of responses in which all library staff felt it was important for a service to do everything—in short, to be all things to all people. forcing staff to rate varied concepts within a question could provide insights into what they felt was really important. a brief summary of some key questions for each section follows.

as an introduction, one question in the survey asked staff to rate the relative importance of each overarching aspect related to a discovery service (customization, end user interface, and content). staff felt content was the most critical aspect of a discovery service, followed by the end-user interface, followed by the ability to heavily customize the service. a snapshot of some of the capabilities library staff thought were important (or not) is provided in table 1.

web-scale capabilities | sa | a | n | d | sd
physical item status information | 81.6% | 18.4% | | |
publication date sort capability | 75.5% | 24.5% | | |
display library-specified links in the interface | 69.4% | 30.6% | | |
one-click retrieval of full-text items | 61.2% | 36.7% | 2% | |
ability to place ill / consortial catalog requests | 59.2% | 36.7% | 4.1% | |
display the library's logo | 59.2% | 36.7% | 4.1% | |
to be embedded within various library website pages | 58% | 42% | | |
full-text items first sort capability | 58.3% | 31.3% | 8.3% | 2.1% |
shopping cart for batch printing, emailing, saving | 55.1% | 44.9% | | |
faceted searching | 48.9% | 42.6% | 8.5% | |
media type sort capability | 47.9% | 43.8% | 4.2% | 4.2% |
author name sort capability | 41.7% | 37.5% | 18.8% | 2.1% |
have a search algorithm that can be tweaked by library staff | 38% | 36% | 20% | 4% | 2%
user account for saved searches and marked items | 36.7% | 44.9% | 14.3% | 4.1% |
book cover images | 25% | 39.6% | 20.8% | 10.4% | 4.2%
have a customizable color scheme | 24% | 58% | 16% | 2% |
google books preview button for book items | 18.4% | 53.1% | 24.5% | 4.1% |
tag cloud | 12.5% | 52.1% | 31.3% | 4.2% |
user authored ratings | 6.4% | 27.7% | 44.7% | 12.8% | 8.5%
user authored reviews | 6.3% | 20.8% | 50% | 12.5% | 10.4%
user authored tags | 4.2% | 33.3% | 39.6% | 10.4% | 12.5%
sa = strongly agree; a = agree; n = neither agree nor disagree; d = disagree; sd = strongly disagree

table 1. web-scale discovery service capabilities
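for readers who want to reproduce this kind of summary, the short python sketch below shows one way row percentages like those in table 1 can be tabulated from raw likert responses when respondents are allowed to skip questions. the sample data and the response encoding are invented for illustration; this is not a description of how surveymonkey or the task force actually computed the figures.

from collections import Counter

scale = ["sa", "a", "n", "d", "sd"]

# one invented capability with invented answers; None marks a skipped question
responses = {
    "faceted searching": ["sa", "a", "a", "n", "sa", None, "a"],
}

for capability, answers in responses.items():
    answered = [a for a in answers if a is not None]  # skipped answers are excluded
    counts = Counter(answered)
    percentages = {
        level: round(100 * counts[level] / len(answered), 1) for level in scale
    }
    print(capability, percentages)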
none of the results was surprising, other than perhaps the low interest or indifference in several web 2.0 community features, such as the ability for users to provide ratings, reviews, or tags for items, and even a tag cloud. the unlv libraries already had a next-generation catalog offering these features, and they have not been heavily used. even if there had been appreciable adoption of these features by end users in the next-generation catalog, they are perhaps less applicable to a web-scale discovery service—users are probably less inclined to post reviews and ratings for an article than for a monograph, and article-level content vastly outnumbers book-level content in web-scale discovery services.

the final survey section focused on content. one question asked about the incorporation of ten different information types (sources) and asked staff to rank how important it was that a service include such content. results are provided in table 2. a bit surprisingly, inclusion of catalog records was seen as most important. not surprisingly, full-text and a&i content from subscription resources were ranked very highly. it should also be noted that at the time of the survey, the institutional repository was in its infancy with only a few sample records, and awareness of this resource was low among library staff. another question listed a dozen existing publishers (e.g., springer, elsevier, etc.) deemed important to the libraries and asked staff to rank the importance that a discovery service index items from these publishers on a four-point scale from "essential" to "not important." results showed that all publishers were ranked as essential or important. related to content, 83.8 percent of staff felt that it was preferable for a service to de-dupe records such that an item appears once in the returned list of results; 14.6 percent preferred that the service not de-dupe results.

information source | rating average
ils catalog records | 1.69
majority of full-text articles / other research contained in vendor-licensed online resources | 2.54
majority of citation records for non-full-text vendor-licensed a&i databases | 4.95
consortial catalog records | 5.03
electronic reserves records | 5.44
records within locally created and hosted databases | 5.64
digital collection records | 5.77
worldcat records | 6.21
ils authority control records | 6.5
institutional repository records | 6.68

table 2. importance of content indexed in discovery service

after the first staff survey was concluded, the discovery task force hosted another library forum to introduce and "test drive" the five vendor services in front of library staff. this session was scheduled just a few weeks ahead of the onsite vendor visits to help serve as a primer to engage library staff and get them actively thinking about questions to ask the vendors. the task force distributed notecards at the forum and asked attendees to record any specific questions they had about a particular service. after the forum, the 28 specific questions collected about particular products helped inform future research on those questions for which the task force did not at the time have an answer. questions ran the gamut and collectively touched on all three areas of evaluation.
activity: second staff survey

within a month after the five vendor onsite visits, a content analysis of the overlap between unlv licensed content and content indexed by the discovery services was conducted. after these steps, a second staff survey was administered. this second staff survey had questions focused on the same three functional areas as the first staff survey: local library customization features, end user features and functionality, and content. since the vendor visits had taken place and users could now understand the questions in the context of the products, questions were asked from the perspective of each product, e.g., "please rate on a five-point likert scale whether each discovery service appears to adequately cover a majority of the critical publisher titles (worldcat local, summon, eds, encore synergy, primo central)." in addition, there were free-text questions focused on each individual product allowing colleagues to share additional, detailed thoughts. the second survey totaled 25 questions and had an average response rate of 18 respondents, or about 18 percent of library staff. several staff conducted a series of sample searches in each of the services and provided feedback on their findings. though this was a small response rate, two of the five products rose to the top, a third was a strong contender, and two were seen as less desirable. the lower response rate is perhaps indicative of several things. first, not all staff had attended the onsite vendor demonstrations or had taken the time to test drive the services via the links provided on the discovery task force wiki site. second, some questions were more appropriately answered by a subset of staff; for example, the content questions might best be matched to those with reference, collection development, or curriculum and program liaison duties. finally, intricate details emerged once a thorough analysis of the vendor services commenced. the first survey was focused more on the philosophy of what was desirable; the second survey took this a step further and asked how well each product matched such wishes. discovery services are changing rapidly with respect to interface updates, customization options, and scope of content. as such, and also reflective of the lower response rate, the author is not providing response information or analysis for this second survey within this article; however, results may be provided upon specific request to the author. the questions themselves for the second staff survey are significant, and they could help serve as a model for other libraries evaluating existing services on the market. as such, the questions appear in appendix d.

activity: early adopter references

one of the latter steps in the evaluation process from the internal academic library perspective was to obtain early adopter references from other academic library customers. a preliminary shortlist was compiled through a straw vote of the discovery task force—and the results of the vote showed a consensus. this vote narrowed down the discovery task force's list of services still in contention for a potential purchase. this shortlist was based on the growing mass of research conducted by the discovery task force and informed by the staff surveys and feedback to date. three live customers were identified for each service that had made the shortlist, and the task force successfully obtained two references for each service.
reference requests were intensive and involved a set of two dozen questions that references either responded to in writing or answered during scheduled conference calls. to help libraries conducting or interested in conducting their own evaluation and analysis of these services, this list of questions appears in appendix e. the services are so new that the live references weren't able to comprehensively answer all the questions—they simply hadn't had sufficient time to fully assess the service they'd chosen to implement. still, some important insights were gained about the specific products and, at the larger level, discovery services as a whole. as noted earlier, discovery services are changing rapidly in the sense of interface updates, customization options, and scope of content. as such, the author is not providing product-specific response information or analysis of responses for each specific product—such investigations and interpretations are the job of each individual library seriously wishing to evaluate the services to help decide which product seems most appropriate for its particular environment. several broad insights merit notice, and they are shared below.

regarding a question on implementation, nothing reached the threshold of serious concern, though a few respondents mentioned some challenges. all respondents indicated the new discovery service is already the default or primary search box on their website. one section of the early adopter questions focused on content; respondents found it difficult to provide much detail for the questions in this area. in terms of "adequately covering a majority of the important library titles," responses varied from "too early to tell" and "it covers many areas but there are some big names missing" to two of the respondents answering simply, "yes." several respondents also clearly indicated that the web-scale discovery service is not the "beginning and ending" for discovery, a fact that even some of the discovery vendors openly note. for example, one respondent indicated that web-scale discovery doesn't replace remote federated searching. a majority (not all) of the discovery vendors also have a federated search product that can, to varying degrees, be integrated with their preharvested, centralized, index-based discovery service. this allows additional content to be searched because such databases may include content not indexed within the web-scale discovery service. however, many are familiar with the limitations of federated search technologies: slow speed, poor relevancy ranking of results, and the need to configure and maintain sources and targets. such problems remain with federated search products integrated with web-scale discovery services. another respondent indicated they were targeting their discovery service at undergraduate research needs. another responded, "as a general rule, i would say the discovery service does an excellent job covering all disciplines. if you start really in-depth research in a specific discipline, it starts to break down. general searches are great . . . dive deeper into any discipline and it falls apart. for example, for a computer science person, at some point they will want to go to acm or ieee directly for deep searches." related to this, "the catalog is still important, if you want to do a very specific search for a book record, the catalog is better.
the discovery service does not replace the catalog." in terms of satisfaction with content type (newspapers, articles, proceedings, etc.), respondents seemed generally happy with the content mix. a range of responses was received, such as "doesn't appear to be a leaning one way or another, it's a mix. some of these things depend on how you set the system up, as there is quite a bit of flexibility; the library has to make a decision on what they want searched." other examples were "the vendor has been working very hard to balance content types and i've seen a lot of improvement" and "no imbalance, results seem pretty well rounded." another responded, "a common complaint is that newspapers and book reviews dominate the search results, but that is much more a function of search algorithms than the amount of content in the index."

when asked about positive or critical faculty feedback to the service, several respondents indicated they hadn't had a lot of feedback yet. one indicated they had anecdotal feedback. another indicated they'd received backlash from some users who were used to other search services (but also added that it was no greater than backlash from any other service they'd implemented in the past—and so the backlash wasn't a surprise). one indicated, "not a lot of feedback from faculty, the tendency is to go to databases directly, librarians need to instruct them in the discovery service." for student feedback, one indicated, "we have received a few positive comments and see increased usage." another indicated, "reviews are mixed. we have had a lot of feedback thanking us for providing a search that covers articles and books. they like the ability to do one search and get a mix of resources without the search taking a long time. other feedback usually centers around a bug or a feature not working as it should, or as they understand it should. in general, however, the feedback has been positive." another replied, "comments we receive are generally positive, but we've not collected them systematically." some respondents indicated they had done some initial usability testing on the initial interface, but not the most recent one now in use. others indicated they had not yet conducted usability testing, but it was planned for later in 2010 or 2011.

in terms of fellow library staff and their initial satisfaction, one respondent indicated, "somewhere between satisfied and very satisfied . . . it has been increasing with each interface upgrade . . . our instruction librarians are not planning to use the discovery service this fall [in instruction efforts] because they need more experience with it . . . they have been overall intrigued and impressed by it . . . i would say our organization is grappling more with the implications of discovery tools as a phenomenon than with our particular discovery service in particular. there seems to be general agreement that it is a good search tool for the unmediated searcher." another indicated some concerns with the initial interface provided: "if librarians couldn't figure it out, users can't figure it out." another responded that it was "a big struggle with librarians getting on board with the system and promoting the service to students. they continually compare it against the catalog. at one point, they weren't even teaching the discovery service in bib instruction. the only way to improve things is with librarian feedback; it's getting better, it has been hard.
librarians have a hard time replacing the catalog and changing things that they are used to."

in terms of local customization, responses varied; some libraries had done basically no customization to the out-of-the-box interface, others had done extensive customization. one indicated they had tweaked sort options and added widgets to the interface. another indicated they had made extensive changes to the css. one indicated they had customized the colors, added a logo, tweaked the headers and footers, and created "canned" or preconfigured search boxes searching a subset of the index. another indicated they couldn't customize the header and footer to the degree they would have liked, but were able to customize these elements to a degree. one respondent indicated they'd done a lot of customization to an earlier version of the interface, which had been rather painstaking, and that much of this broke when they upgraded to the latest version; that said, they also indicated the latest version was much better than the previous version. one respondent indicated it would be nice if the service could have multiple sources for enriched record content so that better coverage could be achieved. one respondent indicated they were working on a complete custom interface from scratch, which would be partially populated with results from the discovery service index (as well as other data sources).

a few questions asked about relevancy as a search concept and how satisfied the respondents were with the quality of returned results for queries. one respondent indicated, "we have been able to tweak the ranking and are satisfied at this point." another indicated, "overall, the relevance is good – and it has improved a lot." another noted, "known item title searching has been a problem . . . the issues here are very predictable – one word titles are more likely to be a problem, as well as titles with stopwords," and noted the vendor was aware of the issue and was improving this. one noted, "we would like to be able to experiment with the discovery service more," and observed there was "no relevancy algorithm control." another indicated they planned to investigate relevance more once usability studies commenced, and noted they had worked with the vendor to make some code changes to the default search mechanism. one noted that they'd like to be able to specify some additional fields that would be part of the algorithm associated with relevancy. another optimistically noted, "as an early adopter, it has been amazing to see how relevance has improved. it is not perfect, but it is constantly evolving and improving." a final question asked simply, "overall, do you feel your selection of this vendor's product was a good one? do you sense that your users – students and faculty – have positively received the product?" for the majority of responses, there was general agreement from the early adopters that they felt they'd made the right choice. one noted that it was still early and the evaluation is still a work in progress, but felt it has been positively received. the majority were more certain: "yes, i strongly feel that this was the right decision . . .
as more users find it, i believe we will receive additional positive feedback," "yes, we strongly believe in this product and feel it has been adopted and widely accepted by our users," and "i do feel it was a good selection."

the external perspective: dialog with web-scale discovery vendors

the preceding sections focused on an academic library's perspective on web-scale discovery services—the thoughts, opinions, preferences, and vetting activities involving library staff. the following sections focus on the extensive dialog and interaction with the vendors themselves, regardless of the internal library perspective, and highlight the thorough, meticulous research activities conducted on five vendor services. the discovery task force sought to learn as much about each service as possible, a challenging proposition given the fact that at the start of investigations, only two of five services had been released, and, unsurprisingly, very little research existed. as such, it was critical to work with vendors to best understand their services and how each service compared to others in the marketplace. broadly summarized, efforts included identification of services, drafting of multiple comprehensive question lists distributed to the vendors, onsite vendor visits, and continual tracking of service enhancements.

activity: vendor identification

over the course of a year's work, the discovery task force executed several steps to systematically understand the vendor marketplace—the capabilities, content considerations, development cycles, and future roadmaps associated with five vendor offerings. given that the task force began their work when only two of these services were in public release, there was no manual, recipe, or substantial published research to rely on. the beginning, for the unlv libraries, lay in identification of the services—one must first know the services to be evaluated before evaluation can commence. as mentioned previously, the discovery mini-summit held at the unlv libraries highlighted one product—serials solutions summon; the only released product at the time of the mini-summit was worldcat local. while no published peer-reviewed research highlighting these new web-scale discovery services existed, press and news releases did exist for the three to-be-released services. such releases shed light on the landscape of services that the task force would review—a total of five services, from the first-to-market, worldcat local, to the most recent entrant, primo central. oclc worldcat local, released in november 2007, can be considered the first web-scale discovery service as defined in this research; the experience of an early pilot partner (the university of washington) is profiled in a 2008 issue of library technology reports.14 in the uw pilot, approximately 30 million article-level items were included with the worldcat database. another product, serials solutions summon, was released in july 2009, and together these two services were the only ones publicly released when the discovery task force began its work. the task force identified three additional vendors each working on their own version of a web-scale discovery service; each of these services would enter initial general release as the task force continued its research: ebsco eds in january 2010, innovative interfaces encore synergy around may 2010, and ex libris primo central in june 2010.
while each of these three was new in terms of web-scale discovery capabilities, each was built, at least in part, on earlier systems from the vendors. eds draws heavily from the ebscohost interface (the original version of which dates back to the 1990s), while the base encore and base primo systems were next-generation catalog systems that debuted in 2007.

activity: vendor investigations

after identification of existing and under-development discovery services, a next step in unlv's detailed vendor investigations was the creation of a uniform, comprehensive question list sent to each of the five vendors. the discovery task force ultimately developed a list of 71 questions divided into nine functional areas, as follows, with an example question for each:

section 1: background. "when did product development begin (month, year)?"
section 2: locally hosted systems and associated metadata. "with what metadata schemas does your discovery platform work? (e.g., marc, dublin core, ead, etc.)"
section 3: publisher/aggregator coverage (full text and citation content). "with approximately how many publishers/aggregators have you forged content agreements?"
section 4: records maintenance and rights management. "how is your system initialized with the correct set of rights management information when a new library customer subscribes to your product?"
section 5: seamlessness & interoperability with existing content repositories. "for ils records related to physical holdings, is status information provided directly within the discovery service results list?"
section 6: usability philosophy. "describe how your product incorporates published, established best practices in terms of a customer focused, usable interface."
section 7: local "look & feel" customization options. "which of the following can the library control: color scheme; logo / branding; facet categories and placement; etc."
section 8: user experience (presentation, search functionality, and what the user can do with the results). "at what point does a user leave the context and confines of the discovery interface and enter the interface of a different system, whether remote or local?"
section 9: administration module & statistics. "describe in detail the statistics reporting capabilities offered by your system. does your system provide the following sets of statistics . . ."

all vendors were given 2–3 weeks to respond, and all vendors responded. it was evident from the uneven level of responses to the questions that the vendors were at different developmental states with their products. some vendors were still 6–9 months away from initial public release; some were not even firm on when their service would enter release. it was also observed that some vendors were less explicit in the level of detail provided, reflective of, or in some cases perhaps regardless of, development state. a refined subset of the original 71 questions appears as a list of 40 questions in appendix f. apart from the detailed question list, various sets of free and licensed information on these discovery services are available online, and the task force sought to identify and digest that information.
the charleston advisor has conducted interviews with several of the library web-scale discovery vendors on their products, including ebsco,15 serials solutions,16 and ex libris.17 these interviews, each around a dozen questions, ask the vendors to describe their product and how it differs from other products in the marketplace, and include questions on metadata and content—all important questions. an article by ronda rowe reviews summon, eds, and worldcat local, and provides some analysis of each product on the basis of content, user interface and searchability, pricing, and contract options.18 it also provides a comparison of 24 product features provided by these three services, such as "search box can be embedded in any webpage," "local branding possible," and "supports social networking." a wide variety of archived webcasts, many provided by library journal, are available through free registration, and new webcasts are being offered at time of writing; these presentations to some degree touch on discussions with the discovery vendors, and are often moderated by or include company representatives as part of the discussion group.19 several libraries have authored reports and presentations that, at least partially, discuss information on particular services gained through their evaluations, which include dialog with the vendors.20 vendors themselves each have a section on their corporate website devoted to their service. information provided on these websites ranges from extremely brief to, in the case of worldcat local, very detailed and informative. in addition, much can be gained by "test-driving" live implementations. as such, a listing of vendor website addresses providing more information, as well as a list of sample, live implementations, is provided in appendix g.

activities: vendor visits and content overlap analysis

each of the five vendors visited the unlv libraries in spring 2010. vendor visits all occurred within a nine-day span; visits were intentionally scheduled close to each other to keep things fresh in the minds of library staff, and such proximity would help with product comparisons. vendor visits lasted approximately half a day, and each visit often included the field or regional sales representative as well as a product manager or technical expert. vendor visits included a demonstration and q&a for all library staff as well as invited colleagues from other southern nevada libraries, a meeting with the discovery task force, and a meeting with technical staff at unlv responsible for website design and application development and customization. vendors were each given a uniform set of fourteen questions on topics to address during their visit; these appear in appendix h. questions were divided into the broad topical areas of content coverage, end user interface and functionality, and staff "control" over the end user interface. on average, approximately 30–40 percent of the library staff attended the open vendor demo and q&a session. shortly after the vendor visits, a content-overlap analysis comparing unlv serials holdings with indexed content in each discovery service was sought from each vendor. given that the amount of content indexed by each discovery service was growing (and continues to grow) extremely rapidly as new publisher and aggregator content agreements are signed, this content-overlap analysis was intentionally not sought at an earlier date.
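libraries wishing to run a comparable overlap check on their own can do so with a simple set comparison once both sides are reduced to normalized identifiers. the python sketch below matches a local holdings export against a vendor coverage list by issn; the file names and csv column are hypothetical, and a real analysis usually also needs matching on e-issns and title strings.

import csv

def load_issns(path, column="issn"):
    # read one issn column from a csv export and normalize it (strip hyphens, uppercase)
    with open(path, newline="") as f:
        return {
            row[column].replace("-", "").strip().upper()
            for row in csv.DictReader(f)
            if row.get(column)
        }

library_titles = load_issns("library_holdings.csv")   # hypothetical local export
vendor_index = load_issns("vendor_coverage.csv")      # hypothetical vendor title list
overlap = library_titles & vendor_index
print(len(overlap), "of", len(library_titles), "titles covered",
      "({:.1f}%)".format(100 * len(overlap) / max(len(library_titles), 1)))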
some vendors were able to provide detailed coverage information against our existing journal titles (unlv currently subscribes to approximately 20,000 e-journals and provides access to another 7,000+ open-access titles). for others, this was more difficult. recognizing this, the head of collection development was asked to provide a list of the "top 100" journal titles for unlv based on such factors as usage statistics and whether the title was a core title for part of the unlv curriculum. the remaining vendors were able to provide content coverage information against this critical title list. four of the five products had quite comprehensive coverage (more than 80 percent) of the unlv libraries' titles. while outside the scope of this article, "coverage" can mean different things for different services. driven by the publisher agreements they are able to secure, some discovery services may have extensive coverage for particular titles (such as the full text, abstracts, author-supplied keywords, subject headings, etc.), whereas other services, while covering the same title, may have "thinner" metadata, such as basic citation information (article title, publication title, author, publication date, etc.). more discussion on this topic is present in the january 2011 library technology reports on library web-scale discovery services.21

activity: product development tracking

one aspect of web-scale discovery services, and the next-generation discovery layers that preceded them, is a rapid enhancement cycle, especially when juxtaposed against the turnkey-style ils systems that dominated library automation for many years. as an example, minor enhancements are provided by serials solutions to summon approximately every three to four weeks; by ebsco to ebsco discovery service approximately every three months; and by ex libris to primo/primo central approximately every three months. many vendors unveil updates coinciding with annual library conferences, and 2010 was no exception. in late summer/early fall 2010, the discovery task force had conference calls or onsite visits with several of the vendors for a focused discussion on new enhancements and changes to services, as well as to obtain answers to any questions that had arisen since their last visit several months earlier. since the vendor visits in spring 2010, each service had changed, and two services had unveiled significantly different and improved interfaces. the discovery task force's understanding of web-scale discovery services had expanded greatly since starting its work. coordinated with the second series of vendor visits and discussions, an additional list of more than two dozen questions, recognizing this refined understanding, was sent to the majority of vendors. a portion of these questions is provided as part of the refined list of questions presented in appendix f.
this second set of questions dealt with complex discussions of metadata quality, such as what level of content publishers and aggregators were providing for indexing purposes (e.g., full text, abstracts, tables of contents, author-supplied keywords or subject headings, or particular citation and record fields), and also the vendor's stance on content neutrality, i.e., whether they are entering into exclusive agreements with publishers and aggregators, and, if the discovery service vendor is owned by a company involved with content, whether that content is promoted or weighted more heavily in result sets. other questions dealt with such topics as current install base counts and technical clarifications about how their service worked. in particular, the questions related to content were tricky for many (not all) of the vendors to address. still, the discovery task force was able to get a better understanding of how things worked in the evolving discovery environment. combined with the internal library perspective and the early adopter references, information gathered from vendors provided the necessary data set to submit a recommendation with confidence.

activity: recommendation

by mid-fall 2010, the discovery task force had conducted and had at its disposal a tremendous amount of research. recognizing how quickly these services change and the fact that a cyclical evaluation could occur, the task force members felt they had met their charge. if all things failed during the next phase—implementation—at least no one would be able to question the thoroughness of the task force's efforts. unlike the hasty decision that in part led to a less-than-stellar experience with federated search a few years earlier, the evaluation process to recommend a new web-scale discovery service was deliberate, thorough, transparent, and vetted with library stakeholders. given that the discovery task force was entering its final phase, official price quotes were sought from each vendor. each task force member was asked to develop a pro/con list for all five identified products based on the knowledge that had been gained. these lists were anonymized and consolidated into a single, extensive pro/con list for each service. some of the pros and cons were subjective (such as interface aesthetics), some were objective (such as a particular discovery service not offering a desired feature). at one of the final meetings of the task force, members reaffirmed the three top contenders, indicated the other two were no longer under consideration, and, afterward, were asked to rank their first, second, and third choices for the remaining services. while complete consensus wasn't achieved, there was a resounding first choice, second choice, and third choice. the task force presented a summary of findings at a meeting open to all library staff. this meeting summarized the research and evaluation steps the task force had conducted over the past year, framed each of the three shortlisted services by discussing some strengths and weaknesses observed by the task force, and sought to answer any questions from the library at large. prior to drafting the final report and making the recommendation to the dean of libraries, several task force members led a discussion and final question-and-answer session at a libraries' cabinet meeting, one of the high-level administrative groups at the unlv libraries.
vetting by this body represented the last step in the discovery task force's investigation, evaluation, and recommendation for purchase of a library web-scale discovery service. the recommendation was broadly accepted by the libraries' cabinet, and shortly afterward the discovery task force was officially disbanded, having met its goal of investigating, evaluating, and making a recommendation for purchase of a library web-scale discovery service. next steps the discussion above describes the research, evaluation, and recommendation model used by the unlv libraries to select a web-scale discovery service. such a model, together with the associated appendixes, could serve as a framework, perhaps with some adaptations, for other libraries considering the evaluation and purchase of a web-scale discovery service. together, the discovery task force's internal and external research and evaluation provided a substantive base of knowledge on which to make a recommendation. after the recommendation, the project progressed from a research and recommendation phase to an implementation phase. the libraries' cabinet brainstormed a list of more than a dozen concise implementation points (steps that would need to be addressed), including the harvesting and metadata mapping of local library resources, local branding and some level of customization work, and integration of the web-scale discovery search box in the appropriate locations on the libraries' website. project implementation co-managers were assigned (the director of technical services and the web technical support manager), as well as key library personnel who would aid in one or more implementation steps. in january 2011 the implementation commenced, with public launch of the new service planned for mid-2011. the success of a web-scale discovery service at the unlv libraries is a story yet to be written, but one full of promise. acknowledgements the author wishes to thank the other members of the unlv libraries' discovery task force for their work in the research and evaluation of library web-scale discovery services: darcy del bosque, alex dolski, tamera hanken, cory lampert, peter michel, vicki nozero, kathy rankin, michael yunkin, and anne zald. references 1. marcia j. bates, improving user access to library catalog and portal information, final report, version 3 (washington, dc: library of congress, 2003), 4, http://www.loc.gov/catdir/bibcontrol/2.3batesreport6-03.doc.pdf (accessed september 10, 2010). 2. roger c. schonfeld and ross housewright, faculty survey 2009: key strategic insights for libraries, publishers, and societies (new york: ithaka s+r, 2010), 4, http://www.ithaka.org/ithaka-s-r/research/faculty-surveys-2000-2009/faculty%20study%202009.pdf (accessed september 10, 2010). 3. oclc, online catalogs: what users and librarians want (dublin, oh: oclc, 2009), 20, http://www.oclc.org/reports/onlinecatalogs/fullreport.pdf (accessed september 10, 2010). 4. ibid., vi. 5. ibid., 14. 6. karen calhoun, the changing nature of the catalog and its integration with other discovery tools: final report (washington, dc: library of congress, 2006), 35, http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed september 10, 2010). 7.
bibliographic services task force, rethinking how we provide bibliographic services for the university of california: final report ([pub location?] university of california libraries, 2005), 2, http://libraries.universityofcalifornia.edu/sopag/bstf/final.pdf (accessed september 10, 2010). 8. oclc, college students' perceptions of libraries and information resources (dublin, oh: oclc, 2006), part 1, page 4, http://www.oclc.org/reports/pdfs/studentperceptions.pdf (accessed september 10, 2010). 9. doug way, "the impact of web-scale discovery on the use of a library collection," serials review, in press. 10. bill kelm, "worldcat local effects at willamette university," presentation, prezi, july 21, 2010, http://prezi.com/u84pzunpb0fa/worldcat-local-effects-at-wu/ (accessed september 10, 2010). 11. michael boock, faye chadwell, and terry reese, "worldcat local task force report to lamp," march 27, 2009, http://hdl.handle.net/1957/11167 (accessed february 12, 2012). 12. michael boock et al., "discovery services task force recommendation to university librarian," http://hdl.handle.net/1957/13817 (accessed february 12, 2012). 13. ken varnum et al., "university of michigan library article discovery working group final report," umich, january 29, 2010, http://www.lib.umich.edu/files/adwg/final-report.pdf. [access date?] 14. jennifer ward, pam mofjeld, and steve shadle, "worldcat local at the university of washington libraries," library technology reports 44, no. 6 (august/september 2008). 15. dennis brunning and george machovec, "an interview with sam brooks and michael gorrell on the ebscohost integrated search and ebsco discovery service," charleston advisor 11, no. 3 (january 2010): 62–65. 16. dennis brunning and george machovec, "interview about summon with jane burke, vice president of serials solutions," charleston advisor 11, no. 4 (april 2010): 60–62. 17. dennis brunning and george machovec, "an interview with nancy dushkin, vp discovery and delivery solutions at ex libris, regarding primo central," charleston advisor 12, no. 2 (october 2010): 58–59. 18. ronda rowe, "web-scale discovery: a review of summon, ebsco discovery service, and worldcat local," charleston advisor 12, no. 1 (october 2010): 5–10. 19. library journal archived webcasts are available at http://www.libraryjournal.com/csp/cms/sites/lj/tools/webcast/index.csp (accessed september 10, 2010). 20. boock, chadwell, and reese, "worldcat local task force report to lamp"; boock et al., "discovery services task force recommendation to university librarian"; ken varnum et al., "university of michigan library article discovery working group final report." 21. jason vaughan, "library web-scale discovery services," library technology reports 47, no. 1 (january 2011). note: appendices a–h available as supplemental files.
investigations into library web-scale discovery services: appendices a–h jason vaughan appendices appendix a. discovery task force timeline appendix b. discovery task force charge appendix c. discovery task force: staff survey 1 questions appendix d. discovery task force: staff survey 2 questions appendix e. discovery task force: early adopter questions appendix f. discovery task force: initial vendor investigation questions appendix g. vendor websites and example implementations appendix h. vendor visit questions appendix a. discovery task force timeline appendix b. discovery task force charge informed through various efforts and research at the local and broader levels, and as expressed in the libraries' 2010/12 strategic plan, the unlv libraries have the desire to enable and maximize the discovery of library resources for our patrons. specifically, the unlv libraries seek a unified solution which ideally could meet these guiding principles: • creates a unified search interface for users, pulling together information from the library catalog as well as other resources (e.g., journal articles, images, archival materials) • enhances discoverability of as broad a spectrum of library resources as possible • intuitive: minimizes the skills, time, and effort needed by our users to discover resources • supports a high level of local customization (such as accommodation of branding and usability considerations) • supports a high level of interoperability (easily connecting and exchanging data with other systems that are part of our information infrastructure) • demonstrates commitment to sustainability and future enhancements • informed by preferred starting points as such, the discovery task force advises libraries administration on a solution that appears to best meet the goal of enabling and maximizing the discovery of library resources. the bulk of the work will entail a marketplace survey and evaluation of vendor offerings. charge specific deliverables for this work include: 1. identify vendor next-generation discovery platforms, whether established and currently on the market, or publicized and at an advanced stage of development with an expectation of availability within a year's time. identify and create a representative list of other academic libraries which have implemented or purchased currently available products. 2. create a checklist / criteria of functional requirements / desires for a next-generation discovery platform. 3. create lists of questions to distribute to potential vendors and existing customers of next-generation discovery platforms. questions will focus on broad categories such as the following: a. seek to understand how content hosted in our current online systems (iii catalog, contentdm, locally created databases, vendor databases, etc.) could, would, or would not be able to be incorporated or made searchable within the discovery platform. apart from our existing online systems as we know them today, the task force will explore, in general terms, how new information resources could be incorporated into the discovery platform.
more explicitly, the task force will seek an understanding of what types of existing records are discoverable within the vendor’s next generation discovery platform, and seek an understanding of what basic metadata must exist for an item to be discoverable. b. seek to understand whether the solution relies on federated search, the creation of a central site index via metadata harvesting, or both, to enable discovery of items. c. additional questions, such as pricing, maintenance, install base, etc. 4. evaluate gathered information and seek feedback from library staff. 5. provide to the dean’s directs a final report which summarizes the task force findings. this report will include a recommended product(s) and a broad, as opposed to detailed, summary of workload implications related to implementation and ongoing maintenance. the final report should be provided to the dean’s directs by february 15, 2010. boundaries the work of the task force does not include: • detailing the contents of “hidden collections” within the libraries and seeking to make a concrete determination that such hidden collections, in their current form, would be discoverable via the new system. • conducting an inventory, recommending, or prioritizing collections or items which should be cataloged or otherwise enriched with metadata to make them discoverable. • coordination with other southern nevada nshe entities. • an ils marketplace survey. the underlying innovative millennium system is not being reviewed for potential replacement. • implementation of a selected product. [the charge concluded with a list of members for the task force] information technology and libraries | march 2012 55 appendix c. discovery task force: staff survey 1 questions “rank” means the surveymonkey question will be set up such that each option can only be chosen once, and will be placed on a scale that corresponds to the number of choices overall. “rate” means there will be a 5 point likert scale ranging from strongly disagree to strongly agree. section 1: customization. the “staff side” of the house 1. customization. it is important for the library to be able to control/tweak/influence the following design element [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  general color scheme  ability to include a unlv logo somewhere on the page.  ability to add other branding elements to the page.  ability to add one or more library specified links prominently in the interface (example: a link to the libraries’ home page)  able to customize the name of the product (meaning, the vendor’s name for the product doesn’t need to be used nor appear within the interface)  ability to embed the search box associated with the discovery platform elsewhere into the library website, such as the homepage (i.e. the user could start a search w/o having to directly go to the discovery platform 2. customization. are there any other design customization capabilities that are significantly important? please list, and please indicate if this is a high, low, or medium priority in terms of importance to you. (freetext box ) 3. search algorithms. it is important for the library to be able to change or tweak the platform’s native search algorithm to be able to promote desired items such that they appear higher in the returned list of [strongly disagree / disagree / neither agree or disagree / agree / strongly agree] [e.g. 
the library, at its option, could tweak one or more search algorithms to more heavily weight resources it wants to promote. for example, if a user searches for “hoover dam” the library could set a rule that would heavily weight and promote unlv digital collection images for hoover dam – those results would appear on the first page of results]. 4. statistics. the following statistic is important to have for the discovery platform [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  number of searches, by customizable timeframe number of item or article level records accessed (that is, a user clicks on something in the returned list of results)  number of searches generating 0 results investigations into library web-scale discovery services | vaughan 56  number of items accessed by type  number of items accessed by provider of content (that is, number of articles from particular database/fulltext vendor 5. statistics. what other statistics would you like to see a discovery platform provide and how important is this to you? (freetext box) 6. staff summary. please rank on a 1-3 scale how important the following elements are, with a “1” being most important, a “2” being 2nd most important, and a 3 being 3rd most important.  heavy customization capabilities as described in questions 1 & 2 above  ability to tweak search algorithms as described in question 3  ability for the system to natively provide detailed search stats such as described in question 4, 5. section 2. the “end user” side of the house 7. searching. which of the following search options is preferable when a user begins their search [choose one]  the system has a “google-like” simple search box  the system has a “google-like” simple search box, but also has an advanced search capability (user can refine the search to certain categories: author, journal, etc.)  no opinion 8. zero hit searches. for a search that retrieves no actual results: [choose one]  the system should suggest something else or ask, “did you mean?”  retrieving precise results is more important and the system should not suggest something else or ask “did you mean?”  no opinion 9. de-duplication of similar items. which of the following is preferable [choose one]  the system automatically de-dupes records (the item only appears once in the returned list)  the system does not de-dupe records (the same item could appear more than once in the returned list, such as when we have overlapping coverage of a particular journal from multiple subscription vendors)  no opinion information technology and libraries | march 2012 57 10. sorting of returned results. it is important for the user to be able to sort or reorder a list of returned results by . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  publication date  alphabetical by author name  alphabetical by title  full text items first  by media type (examples: journal, book, image, etc) 11. web 2.0 functionality on returned results. the following items are important for a discovery platform to have . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree] (note, if necessary, please conduct a search in the libraries’ encore system to help illustrate / remember some of the features/jargon mentioned below. in encore, “facets” appear on the left hand side of the screen; the results with book covers, “add to cart,” and “export” features appear in the middle; and a tag cloud to the right. 
note: this question is asking about having the particular feature regardless of which vendor, and not how well or how poorly you think the feature works for the encore system)  a tag cloud  faceted searching  ability to add user-generated tags to materials (“folksonomies”)  ability for users to write and post a review of an item • other (please specify) 12. enriched record information on returned results. the following items are important to have in the discovery system . . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  book covers for items held by the libraries  a google books preview button for print items held by the libraries  displays item status information for print items held by the libraries (example: available, checked out) 13. what the user can do with the results. the following functionality is important to have in the discovery system . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  retrieve the fulltext of an item with only a single click on the item from the initial list of returned results  ability to add items to a cart for easy export (print, email, save, export to refworks) investigations into library web-scale discovery services | vaughan 58  ability to place an interlibrary loan / link+ request for an item  system has a login/user account feature which can store user search information for later. in other words, a user could potentially log in to retrieve saved searches, previously stored items, or create alerts when new materials become available. 14. miscellaneous. the following feature/attribute is important to have in the discovery system . . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  the vendor has an existing mobile version of their discovery tool for use by smartphones or other small internet-enabled devices.  the vendor has designed the product such that it can be incorporated into other sites used by students, such as webcampus and/or social networking sites. such “designs” may include the use of persistent urls to embed hyperlinks, the ability to place the search box in another website, or specifically designed widgets developed by the vendor  indexing and availability of newly published items occurs within a matter of days as opposed to a week or perhaps a month.  library catalog authority record information is used to help return proper results and/or populate a tag cloud. 15. end user summary. please rank on a 1-8 scale how important the following elements are; a “1” means you think it is the most important, a “2” second most important, etc.  system offers a “google-like” simple search box only, as detailed in question 7 above  system offers a “did you mean?” or alternate suggestions for all searches retrieving 0 results as detailed in question 8 above (obviously, if you value precision of results over “did you mean” functionality, you would rank this toward the lower end of the spectrum). 
 system de-dupes similar items as detailed in question 9 above(if you believe the system should not dedupe similar items, you would rate this toward the lower end of the spectrum)  system provides multiple sort options of returned results as detailed in question 10 above  system offers a variety of web 2.0 features as detailed in question 11 above  system offer enriched record information as detailed in question 12 above  system offers flexible options for what a user can do with the results, as detailed in question 13 above  system has one or more miscellaneous features as detailed in question 14 above. section 3: content 16. incorporation of different information types. in an ideal world, a discovery platform would incorporate all of our electronic resources, whether locally produced or licensed/purchased from vendors. below is a listing of different information types. please rank on a scale of 1-10 how vital it is information technology and libraries | march 2012 59 that a discovery platform accommodate these information types (“1” is the most important item in your mind, a “2” is second most important, etc). a. innopac millennium records for unlv print & electronic holdings b. link+ records for print holdings held within the link+ consortium c. innopac authority control records d. records within oclc worldcat e. contentdm records for digital collection materials f. bepress digital commons institutional repository materials g. locally created web accessible database records (e.g. the special collections & architecture databases) h. electronic reserves materials hosted in eres i. a majority of the citation records from non fulltext, vendor licensed online index/abstract/citation databases (e.g. the “agricola” database) j. a majority of the fulltext articles or other research contained in many of our vendor licensed online resources (e.g. “academic search premier” which contains a lot of full text content, and the other fulltext resource packages / journal titles we subscribe to) 17. local content. related to item (g) in the question immediately above, please list any locally produced collections that are currently available either on the website, or in electronic format as a word document, excel spreadsheet or access database (and not currently available on the website) that you would like the discovery platform to incorporate. (freetext box) 18. particular sets of licensed resources, what’s important? please rank which of the licensed (full text or primarily full text) existing publishers below are most important for a discovery platform to accommodate. elsevier sage wiley springer american chemical society taylor & francis (informaworld) ieee american institute of physics oxford ovid nature emerald investigations into library web-scale discovery services | vaughan 60 section 4: survey summary 19. overarching survey question. the questions above were roughly categorized into three areas. given that no discovery platform will be everything to everybody, please rank on a 1-3 scale what the most important aspects of a discovery system are to you (1 is most critical, 2 is second in importance overall, etc.)  the platform is highly customizable by staff (types of things in area 1 of the survey)  the platform is highly flexible from the end-user standpoint (type of things in area 2 of the survey)  the platform encompasses a large variety of our licensed and local resources (type of things in area 3 of the survey) 20. additional input. 
the survey above is roughly drawn from a larger list of 71 questions sent to the discovery task force vendors. what other things do you think are really important when thinking about a next-generation discovery platform? (freetext input, you may write a sentence or a book) 21. demographic. what library division do you belong to? library administration library technologies research & education special collections technical services user services information technology and libraries | march 2012 61 appendix d. discovery task force: staff survey 2 question for the comparison questions, products are listed by order of vendor presentation. please mark an answer for each product. part i. licensed publisher content (e.g. fulltext journal articles; citations / abstracts) sa = strongly agree; a = agree; n= neither agree nor disagree; d = disagree; sd = strongly disagree 1. “the discovery platform appears to adequately cover a majority of the critical publisher titles.” sa a n d sd i don’t know enough about the content coverage for this product to comment ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 2. “the discovery platform appears to adequately cover a majority of the second-tier or somewhat less critical publisher titles.” sa a n d sd i don’t know enough about the content coverage for this product to comment ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 3. overall, from the content coverage point of view, please rank each platform from best to worst. worst 2nd worst middle 2nd best best ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 4. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the content coverage standpoint. unacceptable acceptable ex libris primo central investigations into library web-scale discovery services | vaughan 62 oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon part ii. end-user functionality & ease of use 5. from the user perspective, how functional do you think the discovery platform is? are the facets and/or other methods that one can use to limit or refine a search appropriate? were you satisfied with the export options offered by the system (email, export into refworks, print, etc.)? if you think web 2.0 technologies are important (tag cloud, etc.), were one or more of these present (and well executed) in this product? the platform appears to be severely limited in major aspects of end user functionality the platform appears to have some level of useful functionality, but perhaps not as much or as well executed as some competing products. yes, the platform seems quite rich in terms of end user functionality, and such functions are well executed. i can’t comment on this particular product because i didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 6. from the user perspective, for a full-text pdf journal article, how easy is it to retrieve the full-text? does it take many clicks? are there confusing choices? 
it’s very cumbersome trying to retrieve the full text of an item, there are many clicks, and/or it’s simply confusing when going through the steps to retrieve the full text. it’s somewhat straightforward to retrieve a full text item, but perhaps it’s not as easy or as well executed as some of the competing products it’s quite easy to retrieve a full text item using this platform, as good as or better than the competition, and i don’t feel it would be a barrier to a majority of our users. i can’t comment on this particular product because i didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. ex libris primo central information technology and libraries | march 2012 63 oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 7. how satisfied were you with the platform’s handling of “dead end” or “zero hit” searches? did the platform offer “did you mean” spelling suggestions? did the platform offer you the option to request the item via doc delivery / link+? is the vendor’s implementation of such features well executed, or were they difficult, confusing, or otherwise lacking? the platform appears to be severely limited in or otherwise poorly executes how it responds to a dead end or zero hit search. the platform handled dead end or zero hit results, but perhaps not as seamlessly or as well executed as some of the competing products. i was happy with how the platform handled “dead end” searches, and such functionality appears to be well executed, as good as or better than the competition. i can’t comment on this particular product because i didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, otherwise don’t have enough information. ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 8. how satisfied were you with the platform’s integration with the opac? were important things such as call numbers, item status information, and enriched content immediately available and easily viewable from within the discovery platform interface, or did it require an extra click or two into the opac – and did you find this cumbersome or confusing? the platform provides minimal opac item information, and a user the platform appeared to integrate ok with the opac in i was happy with how the platform integrated with the i can’t comment on this particular product because i didn’t see the investigations into library web-scale discovery services | vaughan 64 would have to click through to the opac to get the information they might really need; and/or it took multiple clicks or was otherwise cumbersome to get the relevant item level information terms of providing some level of relevant item level information, but perhaps not as much or as well executed as competing products. opac. a majority of the opac information was available in the discovery platform, and/or their connection to the opac was quite elegant. vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 9. 
overall, from an end user functionality / ease of use standpoint – how a user can refine a search, export results, easily retrieve the fulltext, easily see information from the opac record – please rank each platform from best to worst. worst 2nd worst middle 2nd best best ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 10. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the user functionality / ease of use standpoint. unacceptable acceptable ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon part iii. staff customization information technology and libraries | march 2012 65 11. the “out of the box” design demo’ed at the presentation (or linked to the discovery wiki page – whichever particular implementation you liked best for that product) was . . seriously lacking and i feel would need major design changes and customization by library web technical staff. middle of the road – some things i liked, some things i didn’t. the interface design was better than some competing products, worse than others. appeared very professional, clean, well organized, and usable; the appearance was better than most/all of the others products. i can’t comment on this particular product because i didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 12. all products offer some level of customization options that allow at least some changes to the “out of the box” platform. based on what the vendors indicated about the level of customization possible with the platform (e.g. look and feel, ability to add library links, ability to embed the search box on a homepage) do you feel there is enough flexibility with this platform for our needs? the platform appears to be severely limited in the degree or types of customization that can occur at the local level. we appear “stuck” with what the vendor gives us – for better or worse. the platform appeared to have some level of customization, but perhaps not as much as some competing products. yes, the platform seems quite rich in terms of customization options under our local control; more so than the majority or all of the other products. i can’t comment on this particular product because i didn’t see the vendor demo, don’t have enough information, and/or would prefer to leave this question to technical staff to weigh in on. ex libris primo central oclc worldcat local ebsco discovery services innovative encore investigations into library web-scale discovery services | vaughan 66 synergy serials solutions summon 13. overall, from a staff customization standpoint – the ability to change the interface, embed links, define facet categories, define labels, place the searchbox in a different webpage, etc., please rank each platform from best to worst. worst 2nd worst middle 2nd best best ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 14. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the staff customization standpoint. 
unacceptable acceptable ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon part iv. summary questions 15. overall, from a content coverage, user functionality, and staff customization standpoint, please rank each product from best to worst. worst 2nd worst middle 2nd best best ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon information technology and libraries | march 2012 67 16. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the overall standpoint of content coverage, user functionality, and staff customization standpoint. unacceptable acceptable ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon part v. additional thoughts 17. please share any additional thoughts you have on ex libris primo central. (freetext box) 18. please share any additional thoughts you have on oclc worldcat local. (freetext box) 19. please share any additional thoughts you have on ebsco discovery services. (freetext box) 20. please share any additional thoughts you have on innovative encore synergy. (freetext box) 21. please share any additional thoughts you have on serials solutions summon. (freetext box) investigations into library web-scale discovery services | vaughan 68 appendix e. discovery task force: early adopter reference questions author’s note: appendix e originally appeared in the january 2011 library technology reports: web scale discovery services as chapter 7, “questions to consider.” part 1 background 1. how long have you had your discovery service available to your end users? (what month and year did it become generally available to your primary user population, and linked to your public library website). 2. after you had selected a discovery service, approximately how long was the implementation period – how long did it take to “bring it up” for your end‐users and make it available (even if in ‘beta’ form) on your library website? 3. what have you named your discovery service, and is it the ‘default’ search service on your website at this point? in other words, regardless of other discovery systems (ils, digital collection management system, ir, etc.), has the new discovery service become the default or primary search box on your website? part 2 content: article level content coverage & scope “article level content” = articles from academic journals, articles from mainstream journals, newspaper content, conference proceedings, open access content 4. in terms of article level content, do you feel the preindexed, preharvested central index of the discovery platform adequately covers a majority of the titles important to your library’s collection and focus? 5. have you observed any particular strengths in terms of subject content in any of the three major overarching areas -humanities, social sciences, sciences? 6. have you observed any big, or appreciable, gaps in any of the three major overarching areas – humanities, social sciences, sciences? 7. have you observed that the discovery service leans toward one or a few particular content types (e.g. peer reviewed academic journal content; mainstream journal content; newspaper article content; conference proceedings content; academic open access content)? 8. 
are there particular publishers whose content is either not incorporated, (or not adequately incorporated), into the central index, that you’d like to see included (e.g. elsevier journal content)? 9. have you received any feedback, positive or negative, from your institution’s faculty, related to the content coverage within the discovery service? 10. taking all of the above questions into consideration, are you happy, satisfied, or dissatisfied with the scope of subject content, and formats covered, in the discovery platform’s central index? 11. in general, are you happy with the level of article level metadata associated with the returned information technology and libraries | march 2012 69 citation level results (that is, before one retrieves the complete full text). in other words, the product may incorporate basic citation level metadata (e.g. title, author, publication info), or it may include additional enrichment content, such as abstracts, author supplied keywords, etc. overall, how happy do you sense your library staff is with the quality and amount of metadata provided for a “majority” of the article level content indexed in the system? part 3 content: your local library resources 12. it’s presumed that your local library ils bib records have been harvested into the discovery solution. do you have any other local “homegrown” collections – hosted by other systems at your library or institution – whose content has been harvested into the discovery solution? examples would include digital collection content, institutional repository content, library subject guide content, or other specialized, homegrown local database content. if so, please briefly describe the content – focus of collection, type of content (images, articles, etc.), and a ballpark number of items. if no local collections other than ils bib record content have been harvested, please skip to question 15. 13. [for local collections other than ils bib records]. did you use existing, vendor provided ingestors to harvest the local record content (i.e. ingestors to transfer the record content, apply any transformations and normalizations to migrate the local content to the underlying discovery platform schema)? or did you develop your own ingestors from scratch, or using a toolkit or application profile template provided by the vendor? 14. [for local collections other than ils bib records]. did you need extensive assistance from the discovery platform vendor to help harvest any of your local collections into the discovery index? if so, regardless of whether the vendor offered this assistance for free or charged a fee, were you happy with the level of service received from the vendor? 15. do you feel your local content (including ils bib records) is adequately “exposed” during a majority of searches? in other words, if your local harvested content equaled a million records, and the overall size of the discovery platform index was a hundred million records, do you feel your local content is “lost” for a majority of end user searches, or adequately exposed? part 4 interface: general satisfaction level 16. overall, how satisfied are you and your local library colleagues with the discovery service’s interface? 17. do you have any sense of how satisfied faculty at your institution are with the discovery service’s interface? have you received any positive or negative comments from faculty related to the interface? 18. do you have any sense of how satisfied your (non-faculty) end-users are with the discovery service’s interface? 
have you received any positive or negative comments from users related to the interface? 19. have you conducted any end-user usability testing related to the discovery service? if so, can you provide the results, or otherwise some general comments on the results of these tests? 20. related to searching, are you happy with the relevance of results returned by the discovery service? have you noticed any consistent “goofiness,” or surprises with the returned results? if you could make a investigations into library web-scale discovery services | vaughan 70 change in the relevancy arena, what would it be, if anything? part 5 interface: local customization 21. has your library performed what you might consider any “major customization” to the product? or has it primarily been customizations such as naming the service, defining hyperlinks and the color scheme? if you’ve done more extensive customization, could you please briefly describe, and was the product architecture flexible enough to allow you to do what you wanted to do (also see question 22 below, which is related). 22. is there any particular feature or function that is missing or non-configurable within the discovery service that you wish were available? 23. in general, are you happy with the “openness” or “flexibility” of the system in terms of how customizable it is by your library staff? part 6: final thoughts 24. overall, do you feel your selection of this vendor’s product was a good one? do you sense that your users – students and faculty – have positively received the product? 25. have you conducted any statistics review or analysis (through the discovery service statistics, or link resolver statistics, etc.) that would indicate or at least suggest that the discovery service has improved the discoverability of some of your materials (whether local library materials or remotely hosted publisher content). 26. if you have some sense of the competition in the vendor discovery marketplace, do you feel this product offers something above and beyond the other competitors in the marketplace? if so, what attracted you to this particular product, what made it stand out? information technology and libraries | march 2012 71 appendix f. discovery task force: initial vendor investigation questions section 1: general / background questions 1. customer install base how many current customers do you have that have which have implemented the product at their institution? (the tool is currently available to users / researchers at that institution) how many additional customers have committed to the product? how many of these customers fall within our library type (e.g. higher ed academic, public, k-12)? 2. references can you provide website addresses for live implementations which you feel serve as a representative model matching our library type? can you provide references – the name and contact information for the lead individuals you worked with at several representative customer sites which match our library type? 3. pricing model, optional products describe your pricing model for a library type such as ours, including initial upfront costs and ongoing costs related to the subscription and technical support. what optional add-on services or modules (federated search, recommender services, enrichment services) do you market which we should be aware of, related to and able to be integrated with your web scale discovery solution? 4. 
technical support and troubleshooting briefly describe options customers have, and hours of availability, for reporting mission critical problems; and for reporting observed non mission-critical glitches. briefly describe any consulting services you may provide above and beyond support services offered as part of the ongoing subscription. (e.g. consulting services related to harvesting of a unique library resource for which an ingest/transform/normalize routine does not already exist). is there a process for suggesting enhancement requests for potential future incorporation into the product? 5. size of the centralized index. how many periodical titles does your preharvested, centralized index encompass? how many indexed items? 6. statistics. please describe what you feel are some of the more significant use, management or content related statistics available out-of-the-box with your system. investigations into library web-scale discovery services | vaughan 72 are the statistics counter compliant? 7. ongoing maintenance activities, local library staff. for instances where the interface and discovery service is hosted on your end, please describe any ongoing local library maintenance activities associated with maintaining the service for the local library’s clientele (e.g. maintenance of the link resolver database; ongoing maintenance associated with periodic local resource harvest updates; etc.) section 2: local library resources 8. metadata requirements and existing ingestors. what mandatory record fields for a local resource has to exist for the content to be indexed and discoverable within your platform (title, date)? please verify that your platform has existing connectors -ingest/transform/normalize tools and transfer mechanisms and/or application profiles for the following schema used by local systems at our library (e.g. marc 21 bibliographic records; unqualified / qualified dublin core, ead, etc.) please describe any standard tools your discovery platform may offer to assist local staff in crosswalking between the local library database schema and the underlying schema within your platform. our library uses the abc digital collection management software. do you have any existing customers who also utilize this platform, whose digital collections have been harvested and are now exposed in their instance of the discovery product? our library uses the abc institutional repository software. do you have any existing customers who also utilize this platform, whose digital collections have been harvested and are now exposed in their instance of the discovery product? 9. resource normalization. is content for both local and remote content normalized to a single schema? if so, please offer comments on how local and remote (publisher/aggregator) content is normalized to this single underling schema. to what degree can collections from different sources have their own unique field information which is displayed and/or figures into the relevancy ranking algorithm for retrieval purposes. 10. schedule. for records hosted in systems at the local library, how often do you harvest information to account for record updates, modifications, deletions? can the local library invoke a manual harvest of locally hosted resource records on a per-resource basis (e.g. 
from a selected resource – for example, if the library launches a new digital collection and want the records to be available in the new discovery platform shortly after they are available in our local digital collection management system, is there a mechanism to force a harvest prior to the next regularly scheduled harvest routine? after harvesting, how long does it typically take for such updates, additions, and deletions to be reflected in the searchable central index? information technology and libraries | march 2012 73 11. policies / procedures. please describe any general policies and procedures not already addressed which the local library should be aware of as relates to the harvesting of local resources. 12. consortial union catalogs. can your service harvest or provide access to items within a consortial or otherwise shared catalog (e.g. the inn-reach catalog). please describe. section 3: publisher and aggregator indexed content 13. publisher/aggregator agreements: general with approximately how many publishers have you forged content agreements with? are these agreements indefinite or do they have expiration dates? have you entered into any exclusive agreements with any publishers/aggregators (i.e. the publisher/aggregator is disallowed from forging agreements with competing discovery platform vendors, or disallowed from providing the same deep level of metadata/full text for indexing purposes). 14. comments on metadata provided. could you please provide some general comments on the level of data provided to you, for indexing purposes, by the “majority” of major publishers/aggregators with which you have forged agreements. please describe to what degree the following elements play a role in your discovery service: a. “basic” bibliographic information (article title/journal title/author/publication information) b. subject descriptors c. keywords (author supplied?) d. abstracts (author supplied?) e. full text 15. topical content strength do you feel there is a particular content area that you feel the service covers especially well or leans heavily toward (e.g. humanities, social sciences, sciences). do you feel there is a particular content type that you feel the service covers very well or leans heavily toward (scholarly journal content, mainstream journal content, newspapers, conference proceedings). what subject / content areas, if any, do you feel the service may be somewhat weak? are there current efforts to mitigate these weaknesses (e.g. future publisher agreements on the horizon)? 16. major publisher content agreements. are there major publisher agreements that you feel are especially significant for your service? if so, which publishers, and why (e.g. other discovery platform vendors may not have such agreements with those particular providers; the amount of content was so great that it greatly augmented the size and scope of your service; etc.) investigations into library web-scale discovery services | vaughan 74 17. content considered key by local library (by publisher). following is a list of some major publishers whose content the library licenses which is considered “key.” has your company forged agreements with these publishers to harvest their materials. if so please describe in general the scope of the agreement. how many titles are covered for each publisher? what level of metadata are they providing to you for indexing purposes (e.g. basic citation level metadata – title, author, publication date; abstracts; full text). a. ex. elsevier b. ex. sage c. ex. 
taylor and francis d. ex. wiley / blackwell 18. content considered key by local library (by title). following is a list of some major journal / newspaper titles whose content the library licenses which is considered “key.” could you please indicate if your central index includes these titles, and if so, the level of indexing (e.g. basic citation level metadata – title, author, publication date; abstracts; full text). a. ex. nature b. ex. american historical review c. ex. jama d. ex. wall street journal 19. google books / google scholar. do any agreements exist at this time to harvest the data associated with the google books or google scholar projects into your central index? if so, could you please describe the level of indexing (e.g. basic citation level metadata – title, author, publication date; abstracts; full text). 20. worldcat catalog. does your service include the oclc worldcat catalog records? if so, what level of information is included? the complete record? holdings information? 21. e-book vendors. does your service include items from major e-book vendors? 22. record information. given the fact that the same content (e.g. metadata for a unique article) can be provided by multiple sources (e.g. the original publisher of the journal itself, an open access repository, a database / aggregator, another database / aggregator, etc.), please provide some general comments on how records are built within your discovery service. for example: a. you have an agreement with a particular publisher/aggregator and they agree to provide you with rich metadata for their content, perhaps even provide you with indexing they’ve already done for their content, and may even provide you with the full text for you to be able to “deep index” their content. b. you’ve got an agreement with a particular publisher who happens to be the only publisher/provider of that content. they may provide you rich info, or they may provide you rather weak info. in any case, you choose to incorporate this into your service, as they are the only provider/publisher of the info. or, information technology and libraries | march 2012 75 alternately, they may not be the only publisher/provider of the info, but they are the only publisher/provider you’ve currently entered into an agreement with for that content. c. for some items appearing within your service, content for those items is provided by multiple different sources whom you’ve made agreements with. in short, there will be in some/many cases of overlap for unique items, such as a particular article title. in such cases, do you create a “merged/composite/super record” -where your service utilizes particular metadata from each of the multiple sources, creating a “strong” single record built from these multiple resources. 23. deduping. related to the question immediately above, please describe your services’ approach (or not) to deduplicating items in your central index. if your service incorporates content for a same unique item from more than one content provider, does your index retrieve and display multiple instances of the same title? or do you create a merged/composite/super record, and only this single record is displayed? please describe. section 4: open access content 24. open access content sources. does your service automatically include (out of the box, no additional charge) materials from open access repositories? if so, could you please list some of the major repositories included (e.g. 
arxiv e-prints; hindawi publishing corporation; the directory of open access journals; hathi trust materials; etc.). 25. open access content sources: future plans. in addition to the current open access repositories that may be included in your service, are there other repositories whose content you are planning to incorporate in the future? 26. exposure to other libraries’ bibliographic / digital collection / ir content. are ils bibliographic records from other customers using your discovery platform exposed for discoverability in the searchable discovery instance of another customer? are digital collection records? institutional repository records? section 5: relevancy ranking 27. relevancy determination. please describe some of the factors which comprise the determination of relevancy within your service. what elements play a role, and how heavily are they weighted for purposes of determining relevancy? 28. currency. please comment on how heavily currency of an item plays in relevancy determination. does currency weigh more heavily for certain content types (e.g. newspapers)? 29. local library influence. does the local library have any influence or level of control over the relevancy algorithm? can they choose to “bump up” particular items for a search? please describe. 30. local collection visibility. could you please offer some comments on how local content (e.g. ils bibliographic records; digital collections) remains visible and discoverable within the larger pool of content indexed by your service? for example, local content may measures a million items, and your centralized index may cover half a billion items. investigations into library web-scale discovery services | vaughan 76 31. exposure of items with minimal metadata. some items likely have lesser metadata than other items. could you please offer some comments on how your system ensures discoverability for items with lesser or minimal metadata. 32. full text searching. does your service offer the capability for the user to search the fulltext of materials in your service (i.e. are they searching a full text keyword index?) if so, approximately what percentage of items within your service are “deep indexed?” 33. please describe how your system deals when no hits are retrieved for a search. does your system enable “best-match” retrieval – that is, something will always be returned or recommended? what elements play into this determination; how is the user prevented from having a completely “dead-end” search? section 6: authentication and rights management 34. open / closed nature of your discovery solution. does your system offer an unauthenticated view / access? please describe and offer some comments on what materials will not be discoverable/visible for an unauthenticated user. a. licensed full text b. records specifically or solely sourced from abstract and indexing databases c. full citation information (e.g. an unauthenticated user may see just a title; an authenticated user would see fuller citation information) d. enrichment information (such as book image covers, table of contents, abstracts, etc.) e. other 35. exposure of non-licensed resource metadata. if one weren’t to consider and take into account any e-journal/publisher package/database subscriptions & licenses the local library pays for, is there a base index of citation information that’s exposed and available to all subscribers of your discovery service? 
this may include open access materials, and/or bibliographic information for some publisher / aggregator content (which often requires a local library license to access the full text). please describe. would a user need to be authenticated to search (and retrieve results from) this “base index?” approximately how large is this “base index” which all customers may search, regardless of local library publisher/aggregator subscriptions. 36. rights management. please discuss how rights management is initialized and maintained in your system, for purposes of determining whether a local library user should have access to the full text (or otherwise “full resolution” if a library doesn’t license the fulltext – such as resolution to a detailed citation/abstract). information technology and libraries | march 2012 77 our library uses the abc link resolver. our library uses the abc a-z journal listing service. our library uses the abc electronic resource management system. is your discovery solution compatible with one/all of these systems for rights management purposes? is one approach preferable to the other, or does your approach explicitly depend on one of these particular services? section 7: user interface 37. openness to local library customization. please describe how “open” your system is to local library customization. for example, please comment on the local library’s ability to a. rename the service b. customize the header and footer hyperlinks / color scheme c. choose which facet clusters appear d. define new facet clusters e. embed the search box in other venues f. create canned, pre-customized searches for an instance of the search box g. define and promote a collection, database, or item such that it appears at the top or on the first page of any search i. develop custom “widgits” offering extra functionality or download “widgits” from an existing user community (e.g. image retrieval widgits such as flickr integration; library subject guide widgits such as libguides integration; etc. j. incorporate links to external enriched content (e.g. google book previews; amazon.com item information) k. other 38. web 2.0 social community features. please describe some current web 2.0 social features present in your discovery interface (e.g. user tagging, ratings, reviews, etc.). what, if any, plans do you have to offer or expand such functionality in future releases? 39. user accounts. does your system offer user accounts? if so, are these mandatory or optional? what services does this user account provide? a. save a list of results to return to at a later time? investigations into library web-scale discovery services | vaughan 78 b. save canned queries for later searching? c. see a list of recently viewed items? d. perform typical ils functions such as viewing checked out items / renewals / holds? e. create customized rss feeds for a search 40. mobile interface. please describe the mobile interfaces available for your product. is it a browser based interface optimized for smallscreen devices? is it a dedicated iphone, android, or blackberry based executable application? 41. usability testing. briefly describe how your product incorporates published, established “best practices” in terms of a customer focused, usable interface. what usability testing have your performed and/or do you conduct on an ongoing basis? have any other customers that have gone live with your service completed usability testing that you’re aware of? 
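several of the questions above, particularly the rights-management question (no. 36), turn on how a discovery service hands a citation off to the library's link resolver so that the user reaches licensed full text or, failing that, a detailed citation or abstract. the fragment below is an illustrative sketch only and is not drawn from any vendor's response: it builds a standard openurl (z39.88-2004) request in python, and the resolver base url, referrer identifier, and sample citation are all hypothetical.

    # Sketch of a discovery-to-link-resolver handoff: citation metadata is encoded
    # as an OpenURL 1.0 (Z39.88-2004) key/encoded-value query and appended to the
    # library's resolver address. The base URL and rfr_id below are hypothetical.
    from urllib.parse import urlencode

    RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical link resolver

    def build_openurl(article):
        """Build an OpenURL 1.0 query string for a journal article citation."""
        params = {
            "ctx_ver": "Z39.88-2004",
            "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
            "rft.genre": "article",
            "rft.atitle": article["title"],
            "rft.jtitle": article["journal"],
            "rft.issn": article["issn"],
            "rft.volume": article["volume"],
            "rft.spage": article["start_page"],
            "rft.date": article["year"],
            "rfr_id": "info:sid/example.edu:discovery",  # identifies the referring service
        }
        return RESOLVER_BASE + "?" + urlencode(params)

    if __name__ == "__main__":
        citation = {
            "title": "Giraffe behavior during the drought season",   # hypothetical article
            "journal": "Journal of Animal Studies",                  # hypothetical journal
            "issn": "1234-5678",
            "volume": "12",
            "start_page": "45",
            "year": "2010",
        }
        print(build_openurl(citation))

whether the resolver then offers full text, an interlibrary loan form, or only the citation is exactly the rights-management decision the question asks each vendor to explain.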
information technology and libraries | march 2012 79 appendix g: vendor websites and example implementations oclc worldcat local www.oclc.org/us/en/worldcatlocal/default.htm example implementations: lincoln trails library system www.lincolntrail.info/linc.html university of delaware www.lib.udel.edu university of washington www.lib.washington.edu willamette university http://library.willamette.edu serials solutions summon www.serialssolutions.com/summon example implementations: dartmouth college www.dartmouth.edu/~library/home/find/summon drexel university www.library.drexel.edu university of calgary http://library.ucalgary.ca western michigan university http://wmich.summon.serialssolutions.com ebsco discovery services www.ebscohost.com/discovery example implementations: james madison university www.lib.jmu.edu mississippi state university http://library.msstate.edu northeastern university www.lib.neu.edu university of oklahoma http://libraries.ou.edu investigations into library web-scale discovery services | vaughan 80 innovative interfaces encore synergy encoreforlibraries.com/tag/encore-synergy example implementations: university of nebraska-lincoln http://encore.unl.edu/iii/encore/home?lang=eng university of san diego http://sallypro.sandiego.edu/iii/encore/home?lang=eng scottsdale public library http://encore.scottsdaleaz.gov/iii/encore/home?lang=eng sacramento public library http://find.saclibrarycatalog.org/iii/encore/home?lang=eng ex libris primo central www.exlibrisgroup.com/category/primocentral example implementations: (note: example implementations are listed in alphabetical order. some implementations are more open to search by an external audience, based on configuration decisions at the local library level.) brigham young university scholarsearch www.lib.byu.edu (note: choose all-in-one search) northwestern university http://search.library.northwestern.edu vanderbilt university discoverlibrary http://discoverlibrary.vanderbilt.edu (note: choose books, media, and more) yonsei university (korea) wisearch: articles + library holdings http://library.yonsei.ac.kr/main/main.do (note: choose the articles + library holdings link. the interface is available in both korean and english; to change to english, select english at the top right of the screen after you have conducted a search and are within the primo central interface) information technology and libraries | march 2012 81 appendix h. vendor visit questions content 1. please speak to how well you feel your product stacks up against the competition in terms of the licensed full-text / citation content covered by your product. based on whatever marketplace or other competitive analysis you may have done, do you feel the agreements you’ve made with publishers equal, exceed, or trail the agreements other competitors have made? 2. from the perspective of an academic library serving undergraduate and graduate students as well as faculty, do you feel that there are particular licensed content areas your product covers very well (e.g. humanities, social sciences, sciences). do you feel there are areas which you need to build up? 3. what’s your philosophy going forward in inking future agreements with publishers to cover more licensed content? are there particular key publishers your index currently doesn’t include, but whom you are in active negotiations with? 4. 
we have several local content repositories, such as our digital collections in contentdm, our growing ir repository housed in bepress, and locally developed, web-searchable mysql databases. given the fact that most discovery platforms are quite new, do you already have existing customers harvesting their local collections, such as the above, into the discovery platform? have any particular, common problems surfaced in their attempts to get their local collections searchable and exposed in the discovery platform? 5. let’s say the library subscribes to an ejournal title – journal of animal studies -that’s from a publisher with whom you don’t have an agreement for their metadata, and thus, supposedly, don’t index. if a student tried to search for an article in this journal – “giraffe behavior during the drought season,” what would happen? is this content still somehow indexed in your tool? would the discovery platform invoke our link resolver? please describe. 6. our focus is your next generation discovery platform, and not on your “traditional” federated search product which may be able to cover other resources not yet indexed in your next generation discovery platform. that said, please briefly describe the role of your federated search product vis a vis the next generation discovery platform. do you see your federated search product “going away” once more and more content is eventually indexed in your next generation discovery platform? end user interface & functionality 7. are there any particular or unique look and feel aspects of your interface that you feel elevate your product above your competitors? if so, please describe. 8. are there any particular or unique functionality aspects of your product that you feel elevate it above the competition (e.g. presearch or postsearch refinement categories, export options, etc.) 9. studies show that end users want very quick access to full text materials such as electronic journal articles and ebooks. what is your product’s philosophy in regards to this? does your platform, in your opinion, provide seamless, quick access to full text materials, with a minimum of confusion? please describe. investigations into library web-scale discovery services | vaughan 82 related to this, does your platform de-dupe results, or is the user presented with a list of choices for a single, particular journal article they are trying to retrieve? in addition, please describe a bit how your relevancy ranking works for returned results. what makes an item appear first or on the first page of results? 10. please describe how “well” your product integrates with the library’s opac (in our case, innovative’s millennium opac). what information about opac holdings can be viewed directly in the discovery platform w/o clicking into the catalog and opening a new screen (e.g. call #, availability, enriched content such as table of contents or book covers?) in addition, our opac uses “scopes” which allow a user – if they choose – to limit at an outset (prior to a search being conducted) what collection they are searching. in other words, these scopes are location based, not media type based. for our institution, we have a scope for the main library, one for each of our three branch libraries, and a scope for the entire unlv collection. would your system be able to incorporate or integrate these pre-existing scopes in an advanced search mode? and/or, could these location based scopes appear as facets which a user could use to drill down a results list? 11. 
what is your platform's philosophy in terms of "dead-end searches"? does such a thing exist with your product? please describe what happens if a user a.) misspells a word, or b.) searches for a book, journal title, or article that our library doesn't own or license, but that we could acquire through interlibrary loan.

staff "control" over the end user interface

12. how "open" is your platform to customization or interface design tweaks desired by the library? are there any particular aspects that the library can customize with your product that you feel elevate it above your competitors (e.g. defining facet categories; completely redesigning the end-user interface with colors, links, logos; etc.)? what are the major things customizable by the library, and why do you think this is something important that your product offers?

13. how "open" is your platform to porting over to other access points? in other words, provided appropriate technical skills exist, can we easily embed the search box for your product into a different webpage? could we create a "smaller," more streamlined version of your interface for smartphone access?

overarching question

14. in summary, what are some of the chief differentiators of your product from the competition? why is your product the best and most worthy of serious consideration?

michael jay (emjay@ufl.edu) is information technology expert, software unit, information technology department; betsy simpson (betsys@uflib.ufl.edu) is chair, cataloging and metadata department; and doug smith (dougsmith@uflib.ufl.edu) is head, copy cataloging unit, cataloging and metadata department, george a. smathers libraries, university of florida, gainesville.

michael jay, betsy simpson, and doug smith

catqc and shelf-ready material: speeding collections to users while preserving data quality

libraries contract with vendors to provide shelf-ready material, but is it really shelf-ready? it arrives with all the physical processing needed for immediate shelving, then lingers in back offices while staff conduct item-by-item checks against the catalog. catqc, a console application for microsoft windows developed at the university of florida, builds on oclc services to get material to the shelves and into the hands of users without delay and without sacrificing data quality. using standard c programming, catqc identifies problems in marc record files, often applying complex conditionals, and generates easy-to-use reports that do not require manual item review.

a primary goal behind improvements in technical service workflows is to serve users more efficiently. however, the push to move material through the system faster can result in shortcuts that undermine bibliographic quality.
developing safeguards that maintain sufficiently high standards but don’t sacrifice productivity is the modus operandi for technical service managers. the implementation of oclc’s worldcat cataloging partners (wcp, formerly promptcat) and bibliographic record notification services offers an opportunity to retool workflows to take advantage of automated processes to the fullest extent possible, but also requires some backroom creativity to assure that adequate access to material is not diminished. n literature review quality control has traditionally been viewed as a central aspect of cataloging operations, either as part of item-byitem handling or manual and automated authority maintenance. how this activity has been applied to outsourced cataloging was the subject of a survey of academic libraries in the united states and canada. a total of 19 percent of libraries in the survey indicated that they forgo quality control of outsourced copy, primarily for government documents records. however, most respondents reported they review records for errors. of that group, 50 percent focus on access points, 30 percent check a variety of fields, and a significant minority—20 percent—look at all data points. overall, the libraries expressed satisfaction with the outsourced cataloging using the following measures of quality supplied by the author: accuracy, consistency, adequacy of access points, and timeliness.1 at the inception of oclc’s promptcat service in 1995, ohio state university libraries participated in a study to test similar quality control criteria with the stated goals of improving efficiency and reducing copyediting. the results were so favorable that the author speculated that promptcat would herald a future where libraries can “reassess their local practices and develop greater confidence in national standards so that catalog records can be integrated into local opacs with minimal revision and library holdings can be made available in bibliographic databases as quickly as possible.”2 fast forward a few years and the new incarnation of promptcat, wcp, is well on its way to fulfilling this dream. in a recent investigation conducted at the university of arkansas libraries, researchers concluded that error review of copy supplied through promptcat is necessary, but the error rate does not warrant discontinuance of the service. the benefits in terms of time savings far outweigh the effort expended to correct errors, particularly when the focus of the review is to correct errors critical to user access. while the researchers examined a wide variety of errors, a primary consideration was series headings, particularly given the problems cited in previous studies and noted in the article.3 with the 2006 announcement by the library of congress (lc) to curtail its practice of providing controlled series access, the cataloging community voiced great concern about the effect of that decision on user access.4 the arkansas study determined that “the significant number of series issues overall (even before lc stopped performing series authority work) more than justifies our concern about providing series authority control for the shelf-ready titles.” approximately one third of the outsourced copy across the three record samples studied had a series, and, of that group, 32 percent needed attention, predominantly taking the form of authority record creation with associated analysis and classification decisions.5 the overwhelming consensus among catalogers is that error review is essential. 
as far as can be determined, an underlying premise behind such efforts seems to be that it is done with the book in hand. but could there be a way to satisfy the concerns without the book in hand? certainly, validation tools embedded in library management systems provide protections whether records are manually entered or batch loaded, and outsourced authority maintenance services (for those who can use them) offer further control. but a customizable tool that allows libraries to target specific needs, both standards-based and local, without relying on item-by-item handling can contribute to an economy of scale demanded by an environment with shrinking budgets and staff to devote to manual bibliographic scrutiny. if that tool is viewed as part of a workflow stream involving local error detection at the receiving location as well as enhancement at the network level (i.e., oclc's bibliographic record notification service), then it becomes an important step in freeing catalogers to turn their attention to other priorities, such as digitized and hidden collections.

n local setting and workflow

the george a. smathers libraries at the university of florida encompasses six branches that address the information needs of a diverse academic research campus with close to fifty thousand undergraduate and graduate students. the technical services division, which includes the acquisitions and licensing department and the cataloging and metadata department, acquires and catalogs approximately forty thousand items annually. seeking ways to minimize the handling of incoming material, beginning in 2006 the departments developed a workflow that made it possible to send shelf-ready incoming material directly to the branches after check-in against the invoice. shelf-ready items represent approximately 30 percent of the libraries' purchased monographic resources at this time. by using wcp record loads along with vendor-supplied shelf-ready processing, the time from receipt to shelf has been reduced significantly because it is no longer necessary to send the bulk of the shipments to cataloging and metadata. exceptions to this practice include specific categories of material that require individual inspection. the vendor is asked to include a flag in books that fall into many of these categories:

■■ any nonprocessed book or book without a spine label
■■ books with spine labels that have numbering after the date (e.g., vol. 4, no. 2)
■■ books with cds or other formats included
■■ books with loose maps
■■ atlases
■■ spiral-bound books
■■ books that have the words "annual," "biennial," or a numeric year in the title (these may be a serial add to an existing record or part of a series that will be established during cataloging)

to facilitate a post-receipt record review for those items not sent to cataloging and metadata, acquisitions and licensing runs a local programming tool, catqc, which reports records containing attributes cataloging and metadata has determined necessitate closer examination. figure 1 is an example of the reports generated, which are viewed using the mozilla firefox browser.
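before turning to the specific criteria, it may help to see the shape of such a check in code. catqc itself is a console application written in standard c, as described later in this article; the fragment below is only an illustrative sketch in python, using the third-party pymarc library, of the kind of record-level tests a report like this is built from. the input file name is hypothetical, and only three of the nine criteria discussed in the next section are shown (less-than-full encoding level, 245 subfield h, and 490 with first indicator 0).

    # Illustrative sketch only (not the authors' C implementation): flag a few of
    # the conditions the CATQC report looks for in a batch of WCP MARC records.
    from pymarc import MARCReader

    MINIMAL_LEVELS = set("2357ejkm")   # encoding levels treated as less-than-full copy

    def flag_record(record):
        """Return a list of human-readable flags for one MARC record."""
        flags = []
        leader = str(record.leader)
        if leader[17].lower() in MINIMAL_LEVELS:
            flags.append("minimal-level copy (leader/17 = %s)" % leader[17])
        title = record["245"]
        if title is not None and title.get_subfields("h"):
            flags.append("245 has subfield h (possible non-print format)")
        for series in record.get_fields("490"):
            if series.indicators[0] == "0":
                flags.append("untraced series: " + str(series))
        return flags

    with open("wcp_batch.mrc", "rb") as fh:          # hypothetical WCP record file
        for number, record in enumerate(MARCReader(fh), start=1):
            if record is None:                       # skip records pymarc cannot parse
                continue
            problems = flag_record(record)
            if problems:
                print("record %d: %s" % (number, "; ".join(problems)))

a production tool would, as catqc does, also rewrite the record file where local policy requires it and render the flags as a formatted report rather than console output.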
copy catalogers rotate responsibility for checking the report and revising records when necessary. retrieval of the physical piece is only necessary in the 1 percent of cases where the item needs to be relabeled.

n catqc report

catqc analyzes the content of the wcp record file and identifies records with particular bibliographic coding, which are used to detect potential problems:

1. encoding levels 2, 3, 5, 7, e, j, k, m
2. 040 with non-english subfield b
3. 245 fields with subfields h, n, or p
4. 245 fields with subfields a or b that contain numerals
5. 245 fields with subfields a or b that contain red flag keywords
6. 246 fields
7. 490 fields with first indicator 0
8. 856 fields without subfield 3
9. 6xx fields with second indicators 4, 5, 6, and 7

the numbers following each problem listed below indicate which codes are used to signal the presence of a potential problem.

minimal-level copy (1)
the library's wcp profiles, currently in place for three vendors, are set up to accept all oclc encoding levels. with such a wide-open plan, it is important to catch records with minimal-level copy to assure that appropriate access points exist and are coded correctly. the library encounters these less-than-full encoding levels infrequently.

parallel records (2)
catqc identifies foreign library records that are candidates for parallel record treatment by indicating in the report if the 040 has a non-english subfield b. the report includes a 936 field if present to alert catalogers that a parallel record is available.

volume sets (3, 4, 5)
the library does not generally analyze the individual volumes of multipart monographic sets (i.e., volume sets) even when the volumes have distinctive titles. these volumes are added to the collection under the title of the set. the june 2006 decision by lc to produce individual volume records when a distinctive title exists caused concern about the integrity of the libraries' existing open volume set records. because such records typically have enumeration indicated in the subfield n, and sometimes p, of the 245 field, the program searches for instances of those subfields. in addition, the program detects the presence of numerals in the 245 and keywords such as "volume," "part," and "number" as well as common abbreviations of those words (e.g., v. or vol.).

figure 1. an example report from catqc

serial vs. monograph treatment (4, 5)
titles owned by the library and classified as serials sometimes are ordered inadvertently as monographs, resulting in the delivery of a monographic record. a similar problem also occasionally arises with new titles. by detecting numerals, keywords, or the presence of one or more of the subfields in the 245 field, we can quickly scan a list of records with these characteristics. of course, most of the records detected by catqc are false hits because of the broad scope of the search; however, it takes only a few minutes to scan through the record list.

non-print formats (3)
the library does not receive records for any format other than print through wcp. consequently, detecting the presence of a subfield h in the 245 field is a good signal that there may be a problem with the record.

alternate titles (6)
alternate titles can be an important access point for library users. sometimes text that should properly be in subfield i (e.g., "at head of title") of the 246 field is placed in subfield a in front of the alternate title. this adversely affects user access to the title through browse searching. catqc checks for and reports the presence of a 246 field. the cataloger can then quickly confirm that it is coded correctly.

untraced series (7)
as a program for cooperative cataloging (pcc) participant, the library opted to follow pcc practice to continue to trace series despite lc's decision in 2006 to treat as untraced all series statements in newly cataloged records. because some libraries chose to follow lc in its decision, there has been an overall increase in the use of untraced series statements across all types of record-encoding levels. to address this issue, catqc searches all wcp records for 490 fields with first indicator 0. catalogers check the authority files for the series and make any necessary changes to the records. this is by far the most frequent correction made by catalogers.

links (8)
to provide users with information about the nature of the urls displayed in the catalog, catalogers insure that explanatory text is recorded in subfield 3 of the 856 field. catqc looks for the absence of subfield 3, and, if absent, displays the 856 field in the report as a hyperlink. the cataloger adds the appropriate text (e.g., full text) as needed.

subject headings with second indicators 4, 5, 6, and 7 (9)
the catqc report reviewed by catalogers includes subject headings with second indicator 4. when these headings duplicate headings already on the record, catalogers delete them from our local system. when the headings are not duplicates, the catalogers change the second indicator 4 to 0. typically, 6xx fields with second indicators 5, 6, and 7 contain non-english headings based on foreign thesauri. these headings can conflict with lc headings and, in some cases, are cross references on lc authorities. the resulting split files are not only confusing to patrons, but also add to the numbers of errors reported that require authority maintenance. for these reasons, our policy is to delete the headings from our local system. catqc detects the presence of second indicators 5, 6, or 7 and creates a modified file with the headings removed with one exception: a heading with second indicator 7 and subfield 2 of "nasat," which indicates the heading is taken from the national aeronautics and space administration thesaurus, is not removed because the local preference is to retain the "nasat" headings.

n library-specific issues

catqc resolves local problems when needed. for example, when more than one lc call number was present on the record, the wcp spine manifest sent to the vendor used to contain the second call number, which was affixed to the item. when the wcp records were loaded into the library's catalog, the first call number populated the holding. as a result, there was a discrepancy between the spine label on the book and the call number in the catalog. prior to generating the report, catqc found multiple instances of call numbers in the records in the wcp file and created a modified file with the call numbers reordered so that the correct call number was used on the holding when the record was loaded. previously, the library's opac did not display the text in subfield 3 of the 856 field, which specifies the type of material covered by the link, and to the user it appeared that the link was to a full-text resource.
this was particularly troublesome for records with lc links to table of contents, publisher descriptions, contributor information, and sample text. to prevent user frustration, catqc was programmed to move the links on the wcp records to 5xx fields. when the opac interface improved and the programming was no longer necessary, catqc was revised. n analysis to see how well catqc and oclc’s bibliographic notification service were meeting our goal of maintaining high-quality bibliographic control, 63 reports were randomly selected from the 171 reports generated by catqc between october 2007 and april 2008. catqc found no problems in twelve (19 percent) of the selected reports. these twelve were not used in the analysis, leaving fifty-one catqc reports examined with at least one potential problem flagged for review. an average of 35.6 percent of the records in the sample of reports was flagged as requiring review by a cataloger. an average of thirteen possible problems was detected per report. of these, 55 percent were potential problems requiring at least some attention from the cataloger. the action required of the cataloger varied from simply checking the text of a field displayed in the report (e.g., 246 fields) to bringing up the record in aleph and editing the bibliographic record (e.g., verifying and correcting series headings or eliminating unwanted subject headings). why the relatively high rate of false positives (45 percent)? to minimize missing serials and volumes belonging to sets, catqc is designed to err on the side of caution. two of the criteria listed earlier were responsible for the vast majority of the false positives generated by catqc: 245 fields with subfields a or b that contain numerals and 245 fields with subfields a or b that contain red-flag keywords. clearly, if every record with a numeral in the 245 is flagged, a lot of hits will be generated that are not actual problems. the list of keywords was purposefully designed to be extensive. for example, “volume,” “vol.,” and “v.” are all triggers causing a record to be flagged. therefore a bibliographic record containing the phrase “volume cost profit analysis” in the 245 field would be flagged as a potential problem. at first glance, a report filled with so many false positives may seem inefficient and burdensome for catalogers to use; however, this is largely mitigated by the excellent display format. the programmer worked closely with catcq and shelf-ready material | jay, simpson, and smith 45 the copy cataloging unit staff to develop a user-friendly report format. each record is framed separately, making it easy to distinguish from adjoining records. potential problems are highlighted with red lettering immediately alerting catalogers to what the potential problem might be. whenever a potential problem is found, the text of the entire field appears in the report so that catalogers can see quickly whether the field triggering the flag is an actual problem. it takes a matter of seconds to glance through the 245 fields of half a dozen records to see if the numeral or keyword detected is a problem. the catalogers who work with these reports estimated that it took them between two and three hours per month to both review the files and make corrections to bibliographic records. a second component of bibliographic quality maintenance is oclc’s bibliographic record notification service. this service compares newly upgraded oclc records with records held by the library and delivers the upgraded records to the library. 
because catqc flags records with encoding levels of 2, 3, 5, 7, e, j, k, and m, it was possible to determine if these records had, in fact, been upgraded in oclc. in the sample, thirty-three records were flagged because of the encoding level. no upgrade had been made to 21.2 percent of the records in oclc as of august 2008. upgrades had been made to 45.5 percent of the records. the remaining 33.3 percent of the records were manually loaded by catalogers in copy cataloging. these typically are records for items brought to copy cataloging by acquisitions and licensing because they meet one or more of the criteria for individual inspection discussed previously. when catalogers search oclc and find that the received record has not been upgraded, they search for another matching record. a third of the time, a record of higher quality than that received is found in oclc and exported to the catalog. the reason why the record of better quality is not harvested initially is not clear. it is possible that at the time the records were harvested both records were of equivalent quality and by chance one was enhanced over another. in no instance had any of the records originally harvested been upgraded (this is not reflected in the 21.2 percent of records not upgraded). encoding level 8 records are excluded from catqc reports. because of the relatively quick turnaround for upgrades of this type of copy, the library decided to rely solely on the bibliographic record notification service. n technical specifications catqc is a console application for windows. written in standard c, it is designed to be portable to multiple operating systems with little modification. no graphic interface was developed because (a) the users are satisfied with the current operating procedure and (b) the treatment of the records is predefined as a matter of local policy. the user opens a command console (cmd.exe) and types “catqc”+space+“[name of marc file]”+enter. the corrected file is generated; catqc analyzes the modified file and creates the xml report. it moves the report to a reviewing folder on a file server across the lan and indicates to the user that it is terminating. modifications require action by a programmer; the user cannot choose from a list of options. benefits include a 100 kb file size and a processing speed of approximately 1,000 records per second. no quantitative analysis has yet been done related to the speed of processing, but to the user the entire process seems nearly instantaneous. the genesis of the project was an interest in the record structure of marc files brought about in the programmer by the use of earlier local automation tools. the project was speculative. the first experiment contained the programming structure that would become catqc. one record is read into memory at a time, and there is another array held for individual marc fields. conceptually, the records are divided into three portions—leader, directory, and dataset—when the need arises to build an edited record. initially there was no editing, only the production of the report. the generation of strict, valid xml is a significant aspect of catqc. an original document type was created, along with a corresponding cascading style sheet. the reports are viewable to anyone with an xml–capable browser either through file server, web server, or e-mail. (the current version of internet explorer does not fully support the style sheet syntax.) 
this continues to be convenient for the report reviewers because they do not have to be client application operators. see appendix a for an excerpt of a document instance and appendix b for the document type definition. catqc is not currently a generalized tool such as marcedit, a widely used marc editing utility that provides a standard array of basic capabilities: field counting, field and subfield deletion (with certain conditional checks), field and subfield additions, field swapping and text replacement, and file conversion to and from various formats such as marcxml and dublin core as well as between marc-8 and utf-8 encodings.6 marcedit continues to grow and does offer programmability that relies on the windows scripting host. this requires the user to either learn vbscript or use the wizards offered by marcedit. the catqc development goal was to create a report, viewable through a lan or the internet, which alerts a group of catalogers to potential problems with specific records, often illustrating those problems. although it might have been possible to use a combination of marcedit capabilities and local programming to help achieve this goal, it likely would have been a more cumbersome route, particularly taking into consideration the multidimensional conditionals desired. it was deemed easier to write a program that addresses local needs directly in a language already familiar to the programmer. as catqc evolved, it was modified to identify more potential problems and to do more logical comparisons as well as to edit the files as necessary before generating the reports. catqc addresses a particular workflow directly and provides one solution. it is procedural as opposed to event driven or object oriented. with version 1.3, the generic functions were extracted into marclib 1.0, a common object file format library. functions specific to local workflow remain in catqc. the program is freely available to interested libraries by contacting the authors. as of this writing, the university of florida plans to distribute this utility under the gnu public license version 3 (see www.opensource.org/licenses/gpl-3.0.html) while retaining copyright.

n conclusion

catqc provides catalogers an easy way to check the bibliographic quality of shelf-ready material without the book in hand. as a result, throughput time from receipt to shelf is reduced, and staff can focus data review on problem areas—those affecting access or interfering with local processes. some of the issues addressed by catqc are of concern to all libraries while others reflect local preferences. the program could be easily modified to conform to those preferences. automation tools such as catqc are of key importance to libraries seeking ways to streamline workflows to the benefit of users.

references and notes

1. vinh-the lam, "quality control issues in outsourcing cataloging in united states and canadian academic libraries," cataloging & classification quarterly 40, no. 1 (2005): 101–22.
2. mary m. rider, "promptcat: a projected service for automatic cataloging—results of a study at the ohio state university libraries," cataloging & classification quarterly 20, no. 4 (1995): 43.
3. mary walker and deb kulczak, "shelf-ready books using promptcat and ybp: issues to consider (an analysis of errors at the university of arkansas)," library collections, acquisitions, & technical services 31, no. 2 (2007): 61–84.
4. "lc pulls plug on series authority records," cataloging & classification quarterly 43, no. 2 (2006): 98–99.
5. walker and kulczak, "shelf-ready books."
6. for more information about marcedit, see http://oregonstate.edu/~reeset/marcedit/html/index.php.

appendix a. catqc document instance excerpt

wcp file analysis: 201 records analyzed.
record: 71
oclc number: 243683394
timestamp: 20080824000000.0
245: 10 |a difference algebra /|c levin alexander.
245 h  245 n  245 p  numerals  keywords
490: 0 |a algebras and applications ;|v v. 8
. . .

appendix b. catqc document type definition

academic uses of google earth and google maps in a library setting

eva dodsworth and andrew nicholson

abstract

over the last several years, google earth and google maps have been adopted by many academic institutions as academic research and mapping tools. the authors were interested in discovering how popular the google mapping products are in the academic library setting. a survey was conducted to establish the mapping products' popularity and type of use in an academic library setting. results show that over 90 percent of the respondents use google earth and google maps either to help answer research questions, to create and access finding aids, for instructional purposes or for promotion and marketing. the authors recommend expanding the mapping products' user base to include all reference and liaison librarians.

introduction

since their launch in 2005, google maps and google earth have had an enormous impact on the way people think, learn, and work with geographic information. with easy access to spatial and cultural information, google maps/earth has provided users with the means to understand their world and their communities of interest. moreover, the customizable map features and dynamic presentation tools found in google maps and google earth make each one an attractive option for someone wanting to teach geographic information or make customized maps. for academic researchers, google mapping applications are also appealing for their powerful ability to share and host projects, create customized kml (keyhole markup language) files, and to easily communicate their own research findings in a geographic context. recognizing their potential for revitalizing map collections and geographic education, the authors felt that many academic libraries were also going to be active in using google maps/earth for a variety of purposes, from promoting their services to developing their own google kml files for users. with google earth's ease of use and visualization capabilities, it was even thought that academic libraries would be using google earth heavily in instruction classes bringing geographic information to subject areas traditionally outside of geography. as active users of google maps/earth in their roles as academic librarians at their universities, the authors became curious to know what other academic librarians were doing with google maps/earth, particularly those working with maps and/or geography subjects.
were they using eva dodsworth (edodsworth@uwaterloo.ca) is geospatial data services librarian, university of waterloo library, waterloo, and andrew nicholson (andrew.nicholson@utoronto.ca) is gis/data librarian, hazel mccallion academic learning centre, university of toronto mississauga, ontario mailto:edodsworth@uwaterloo.ca mailto:andrew.nicholson@utoronto.ca information technology and libraries | june 2012 103 the technology as part of their librarian roles on campus? how were they using it? what impacts was it having in how they delivered library services? to help answer these questions, the authors set out on a three-stage process with the aim of providing a more complete picture of google maps/earth use in academic libraries. the first stage consisted of a literature search focusing on library and information science research databases, to see what (if any) scholarly research had been written that discussed the role of google maps/earth in academic libraries. the second stage of the research had the authors examining over a dozen academic library websites to assess how they were integrating google maps/earth either through an api plug-in on their website or advertising other google maps/earth related services and collections. the third stage had the authors compile a set of twenty survey questions which were then distributed to academic librarians across canada and the united states, probing the use of google mapping products in the academic library setting. literature review despite the ubiquity of google for information searching, there was a surprising paucity of literature that documents the impact of google maps/earth in academic libraries. nevertheless, there are some articles which indicate just how much google maps can help raise the profile of library services. terry ballard, a librarian at quinnipiac university, describes in a few articles how he and colleagues were able to use google earth placemarks to promote his library’s special collections.1 the potential for “discovering the library with google earth” is also a theme in an article by brenner and klein in which the portland state university library linked their urban planning documents collection to google earth for ease of searching.2 although the focus is on public libraries, michael vandenburg documents how his library system began “using google maps as an interface for the library catalogue.” in his article, vandenburg discusses that the inspiration for such a project came about through various google maps mashups that were popular on search oriented websites such as “housing maps,” which combined realtor listings from craigslist with a google maps api. using api coding, vandenburg was able to link latitude and longitude data of countries to individual opac records enabling a visual search for items at the country level.3 while these articles focused on use of google earth as a collection discovery tool, troy swanson notes the visualization aspects of the applications and their utility for teaching information literacy. swanson has students use google earth and second life as tools to create a virtual exhibit on malcolm x. although swanson notes that the final output by the students did not meet the initial expectations, valuable learning opportunities for teaching in a 3d space were recognized and should be pursued. 
4 some of these opportunities are highlighted as case studies by lamb, noting the visualization aspects of google earth would be very useful for librarians providing instruction.5 academic uses of google earth and google maps in a library setting | dodsworth and nicholson 104 google maps/earth & academic libraries: a scan of selected library websites for the next stage, the authors performed an environmental scan of academic library websites to see how they are using and implementing google mapping technology into their services. many are doing creative and innovative project work which will, we hope, encourage and guide other libraries to consider doing something similar. mapping technology can be used in several different ways, and with internet users becoming more proficient using this technology, libraries have the opportunity to take advantage of this communication medium. any document or image that has a geographic component can be digitized and made easily accessible using online mapping technology. the following section will review some of the projects highlighted on websites. the projects can be grouped into the following categories: finding aids, collection distribution, and teaching and reference services. finding aids all collections in libraries require some sort of finding aid to locate library material—the most obvious one being the library catalog. however, there are many location-based materials that use customized finding aids such as map and air photo indexes, and geospatial data coverage maps. for several years now libraries have been trying to make access to the finding aids easier by digitizing them and offering them online. not only are online versions easily updatable, but they are quite often created using google technology, allowing for the use of modern basemaps and zoom capabilities. traditional paper indexes can be difficult to navigate, especially the historical ones, making the search process rather difficult for users and library staff. one of the most popular types of online indexes created by libraries is air photo indexes. most map libraries collect air photos, and many use similar indexes to help locate aerial photography for an area of interest. several libraries have digitized the indexes making the same information available online. users simply zoom into a geographical area and click on a point to retrieve the photo information they need in order to locate the air photo in the library collection. some libraries will even send an electronic copy of the photo to the users. the mcgill university library, for example, has made its air photo information available from their webpage in a kml format to be viewed in google earth. users can click on a point of interest to easily obtain the air photo information. mcgill library has also digitized topographic indexes, making them also available via google earth.6 the university of western ontario’s serge a. sauer’s library also provides its air photo indexes online, incorporating google maps directly into their website. placemarks representing individual photos have been inserted on a google map, along with the photo description so that when users click on the placemark, photo information is released. using google mapping technology to offer online finding aids that are searchable by location is an innovative and cost-free step towards collection accessibility. 
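as a concrete illustration of the air photo indexes described above, the short python sketch below writes a kml file containing one placemark per photograph, with the flight line and frame information in the description balloon. it uses only the python standard library; the sample photo identifiers, file name, and coordinates are invented for illustration and are not taken from any of the collections mentioned.

    # Sketch of a location-searchable finding aid: one KML placemark per air photo,
    # so users can click a point in Google Earth and read the index information.
    import xml.etree.ElementTree as ET

    KML_NS = "http://www.opengis.net/kml/2.2"

    def build_kml(photos):
        """Return an ElementTree holding a KML document of photo placemarks."""
        ET.register_namespace("", KML_NS)
        kml = ET.Element("{%s}kml" % KML_NS)
        doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
        for photo in photos:
            pm = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
            ET.SubElement(pm, "{%s}name" % KML_NS).text = photo["id"]
            ET.SubElement(pm, "{%s}description" % KML_NS).text = (
                "flight line %s, photo %s (%s)" % (photo["line"], photo["frame"], photo["year"])
            )
            point = ET.SubElement(pm, "{%s}Point" % KML_NS)
            # KML coordinates are longitude,latitude[,altitude]
            ET.SubElement(point, "{%s}coordinates" % KML_NS).text = (
                "%f,%f" % (photo["lon"], photo["lat"])
            )
        return ET.ElementTree(kml)

    photos = [  # invented sample index entries
        {"id": "A12345-067", "line": "A12345", "frame": "67", "year": "1954", "lon": -80.52, "lat": 43.47},
        {"id": "A12345-068", "line": "A12345", "frame": "68", "year": "1954", "lon": -80.50, "lat": 43.47},
    ]
    build_kml(photos).write("airphoto_index.kml", encoding="utf-8", xml_declaration=True)

the resulting file can be posted on a library webpage for download into google earth, or the same placemark data can be loaded into a google maps api page, which is the approach the website examples above take.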
what would make these types of library collections even more accessible, however, is offering users online access to digital versions of the collection items themselves. so to bring the indexing project one step forward, not only would the photo reference information be made available, but the actual image would be too, information technology and libraries | june 2012 105 thereby allowing libraries to use google mapping technology as an avenue for collection distribution and delivery. collection delivery libraries have had digital collections for quite some time. many of course do not need to digitize resources themselves as they subscribe to products such as electronic journals and books. however there are still some less common collections that are physically housed in libraries that would be much more accessible to users if they were exposed and made available online. an internet search has shed light on numerous digitization projects that use google mapping technology to search for and deliver location-based collections. examples of these types of collections include historical maps and air photos, archived photos and postcards, audio interviews, community information, textual documents like letters and diaries, and gis data. mcmaster university library is one example of a library that has digitized a historical map collection and made it available online. an index to its world war i military maps and aerial photography was created using google maps, and was embedded into its webpage.7 users can click on an area of interest to bring up the corresponding high resolution map image. likewise, brock university library has also offered its historical air photo collection online, allowing users to search using a google map, and then download photos of interest.8 additionally, yale university library has created kml indexes of its fire insurance plans, with direct links to the digitized images.9 the university of connecticut library has digitized its local historical maps and using google maps had created a map mashup which includes historic landmarks. clicking on the landmarks provides users with links to related resources. several libraries have digitized other imagery, such as postcards and photography. this is particularly popular with archival and specialized collections. the university of vermont library has embedded a google map into its website with placemarks that when clicked lead the user to the library’s long trail collection, an assortment of over 900 images of the oldest long-distance hiking trail in the united states. the images have been digitized from hand-colored lantern slides.10 cleveland state university library has also done something similar with its cleveland memory project, in which google maps were embedded into the library webpage and placemarks of local historic landmarks added. when users click on the placemarks, they are able to access a description of the landmark along with a photograph of it. clicking on “more information” will lead the user to several related resources, including the library catalog, where original documents about the location are available (e.g., images, books).11 besides digitizing their collections, some libraries have also georeferenced them so that they could not only be accurately located using an index, but so that they could be viewed in google earth (kml format). 
offering collections in kml format greatly increases exposure and use of geographic resources because google earth is one of the more popular location-based applications used by library users and the public. geographic files such as georeferenced air photos and satellite images, as well as gis data used to be only viewed in specialized gis programs. but gis technology has evolved into so many online applications, offering all computer users the benefits of geographic information and a platform to distribute information. academic uses of google earth and google maps in a library setting | dodsworth and nicholson 106 the university of waterloo map library is one example of a library that had digitized its historical air photo collection and made the images available in kml format for google earth usage.12 users can access a map index of the available photos from the map library webpage and then click on the index to download the images. the university of north carolina library has georeferenced several historical maps and made them available for viewing as an image overlay in google maps. this particular mapping project consists of around 150 thematic maps, including historical soil surveys, road and highway maps, city/county maps, and more. users can take advantage of the georeferenced maps and accurately compare historical features to modern ones with google maps’ basemap. having a preview of the dataset before it is downloaded assists the user in downloading only what is needed.13 perhaps more popular than a library’s air photo collection are libraries’ collections of geospatial data. geospatial, or gis data, has traditionally been only used by users who have access to gis programs such as esri’s arcgis, or arcview. more recently, librarians have discovered that when spatial files are converted into easy-to use file formats, such as kml, the user group is broadened and the files are used more. so it is no surprise that several libraries have converted their gis shapefiles (a spatial data file format used specifically in gis programs) into kml files and made them available for download from their webpages. university of connecticut library offers its gis files online in various formats, including kml. it also provides a sample image of the gis layer in google maps.14 baruch college at the city university of new york has made neighborhood census data available in google maps. the geographic boundary files were overlaid in google maps, and clicking on the map will lead users to the files available from the american census bureau’s fact finder. clearly, many libraries have incorporated google mapping technology into their digitization projects. the technology has proven capable of attracting collections that are not strictly locationfocused such as maps and air photos, but that have a location associated with it, such as archival photos of community landmarks or books written about a specific locale. google mapping technology makes the organization and storage of collections relatively effortless for library project managers, and it makes collection searching and distribution simple and friendly for the users. other uses of google maps/earth in libraries perhaps one of the simplest uses of google mapping technology can be illustrated by visiting several library websites. 
many libraries have embedded google maps into their website as either a webpage header15 survey: what are academic library staff doing with google maps/earth?16 following the review of the literature and academic library websites, the authors wanted to discover how academic librarians themselves were using google maps and google earth in their work, if at all. to capture this data, the authors compiled a set of survey questions targeting those in the academic library community who work with maps, gis, or geography/geology/earth science subject matter. information technology and libraries | june 2012 107 in preparing the survey questions, the authors were aware of a “survey fatigue” among the academic library community. at the time of research, many surveys were going out to librarians requesting their time and responses, so the authors wanted to keep the survey concise both in terms of number of questions, but also in the types of questions. in the end, the survey was created with twenty questions consisting of six yes/no questions, seven multiple choice, and the remaining seven questions being short answer. for distributing the survey, the authors wanted to reach as many librarians who worked with maps, geospatial data and government document subject matter as possible. the survey was then distributed on specialized map library and government publication listservs, including maps-l, govinfo, gis4lib, and carta (canadian maps & air photo systems forum). the survey was also distributed on the members’ only lists belonging to the association of canadian map libraries & archives (acmla) and the western association of map libraries (waml) listservs. the survey was made available on survey monkey for two months from december 2010 to the end of january 2011. the responses with the survey available during a quieter period of library activities, and thanks to a couple of reminder emails being sent out on the lists, our questionnaire received a total of 83 responses. who is using google maps/earth? the first couple questions dealt with the department or area of the library in which the respondent worked in, and what their position encompassed. as expected, a large majority of respondents, 81 percent, worked in “map/gis services” while 28.8 percent also had “general reference” responsibilities. other library service areas mentioned included “data services” and “it,” as well as some that fell outside library boundaries where staff worked in geography and environment science departments. not surprisingly, 52 percent of the responses indicated that their position was “librarian,” with the majority being “gis librarian” or “map librarian.” others included “reference & instructional services librarian” and “science librarian.” also received were 17 responses from gis specialists, library technicians and map assistants. what was especially noteworthy was that 12 responses were from library administrators, directors, or department heads who were finding time to work with google earth as part of their responsibilities. this number also included gis coordinators and map curators responsible for making decisions in their departments. google mapping products : what is being used, how often and for what purpose? to gain an understanding of how library staff are using google mapping products, a series of questions was asked of the respondents to determine which products were being used, how often and for which tasks. 
respondents were given a list of all the google mapping products available, and were asked to indicate which ones they had worked with. not surprisingly, the top two products used by respondents were google maps, 93 percent (71) and google earth, 91 percent (69). google maps api had been used by 40 percent (30) of the respondents, followed by google earth pro at 38 percent (29). eight percent (6) had also worked with google earth api, and 7 percent (5) had used google earth plus. interestingly, one respondent indicated that they had deployed google earth enterprise in their library. academic uses of google earth and google maps in a library setting | dodsworth and nicholson 108 figure 1. respondents’ use of google mapping products since many of these users may have simply used the products occasionally, it was important to get a sense of how often the products were being used. when asked the question “how regularly do you work with google mapping products for work-related projects?” 69 percent (54) responded that they use the products at least once a month. of those responses, 45 percent (35) use them at least weekly. specifically, eighteen percent (14) use them one to two times a week, thirteen percent (10) use them three to four times a week, and fourteen percent (11) use them even more often than that. only six percent (5) responded that they don’t use the products at all. information technology and libraries | june 2012 109 figure 2. frequency of use for work-related projects as google maps/earth can be used in many different ways and for different purposes in a library environment, the survey inquired how in fact these products were being used in their libraries. the survey question listed four possible tasks that the technology could be used for with the additional option for respondents to enter their own ‘other’ usages. respondents could check off all that applied. the options given included: • instruction • promotion/marketing • answering research questions • creating/accessing a finding aid tool (air photo map indexes, etc.) • other: (fill in answer) the majority of respondents, 82 percent (58) indicated they were using the products to answer research questions; 61 percent (43) for creating or accessing a finding aid tool; 56 percent (40) for instruction purposes; 27 percent for promotion/marketing and 20 percent (14) have used them for “other” purposes including georeferencing imagery, for use in webpages or creating learning objects. academic uses of google earth and google maps in a library setting | dodsworth and nicholson 110 figure 3. level and frequency of use in instruction are google mapping products being used for library instruction? for the authors, one of the best aspects of google maps/earth applications is their visualization capabilities. the ability to easily create and display geographic information to engage students makes google mapping applications an ideal instruction tool. in many ways, google maps and google earth have helped promote map and spatial literacy as concepts that are teachable. despite the free availability and ease of use of google mapping applications, the authors were somewhat surprised from the survey to find that 72 percent of library staff surveyed noted that their institution did not have any kind of map, spatial, or geospatial literacy policy in place. when it came time to provide instruction in the classroom, the survey found that only 31 percent (26) of the respondents had even used google earth in a classroom. 
nevertheless, in looking at the course levels, library instruction with google earth tools is actually occurring at all levels, from first year to graduate. significantly however, the frequency of the instruction seems to peak in the fourth year, where staff are using in upwards of six to nine courses. respondents were asked to give some details of these sessions, and they included a variety of class topics from environmental awareness education for first year students, to learning digitization skills in later years. has your library taken advantage of google map/earth technology for promotion or marketing purposes? information technology and libraries | june 2012 111 from our environmental scan of library websites we saw many interesting uses of google maps and google earth that were embedded directly into websites. perhaps because of this the authors were surprised to find that 55 percent of the survey respondents did not believe their library was using these technologies for promotion or marketing purposes. for those respondents who were using google maps or google earth to boost services for users, quite a few provided interesting examples of what this technology can offer. many were using google map apis to enhance map and aerial photo indexes, creating greater awareness of these resources and enhancing access. one respondent noted they had created a campus tour that highlighted all of the buildings that made up the library system, while others were using google api technology to showcase particular digitization projects such as folklore collections or geologic atlases. when asked if such activities have helped to enhance services or provided benefits to users, many responded that they had for both the users and for other library staff. greater speed and an increased familiarity of the collections were cited by several respondents, who no longer need to consult the paper indexes. does the library provide support to the wider campus community using google mapping products (not including instructional collaborations)? although many libraries are now using google maps and google earth technology, the authors were surprised that many were not actively leveraging this expertise across their campuses. almost all the respondents either skipped the question or stated that they were not providing this kind of active support. several noted that their gis services were open to all and that they were responsible for the google earth pro licences on campus, but that this was the extent of their support. working with google map/earth (kml) files in the last few years, kml files have become one of the more popular ways to display and distribute geographic information online. with its ease of use, and access, kml files have considerably broadened the user base of geographic information. kml files can be easily created in google earth, and they can be easily converted from gis files in specialized programs. it is this ease of access and usability that has popularized geographic information, hence increasing exposure to library collections and services. this survey was therefore interested in determining how libraries are using and creating kml files. when survey respondents were asked whether they work with kml files, 64 percent (47) responded they did, with 85 percent (40) claiming that they create their own kml files. 
for those who create their own, 92 percent (34) said that they created kml files by converting them from another file format using an external application, such as arcgis, earthpoint, ogr2ogr, or shp2kml software. 78 percent (29) also created them in google earth, and 32 percent (12) created kml files by writing their own xml code. the authors were most interested to know whether kml files were actually held as part of library holdings. thirty percent (13) of respondents noted that they provide access to their kml files as part of their collections, with 89 percent (8) claiming they could be located through the library website. other areas mentioned for access included libguides and specialized gis data catalogues available through the library's website. in terms of quantity, one respondent claimed a collection of 500-800 kml files, while other responses mentioned amounts ranging from 5 to 100, with some claiming that they were not sure exactly how many files made up their collection.

what other online mapping tools are used in your library apart from google maps and google earth?

although google maps and google earth are perhaps the most well-known online mapping tools available, the authors were also interested to learn if there were other products that libraries were using as part of their service offerings. as expected, many mentioned esri's arcgis online and esri's arcexplorer, while other responses included bing maps, openstreetmap, and openlayers.

discussion

google mapping applications are clearly being used for academic purposes in library settings. with such diverse capabilities made available in these programs, library professionals are using them in several different ways. google earth and google maps are popular among library staff who work with gis and/or map collections. in fact, over 90 percent of the respondents use both products, whether to help answer research questions, to create and access finding aids, for instructional purposes, or for promotion and marketing. google mapping products have also helped libraries revitalize their collections and have assisted in transferring spatial information literacy skills to students and faculty. the authors hope that readers who work in a map/gis library setting will be inspired by the many examples of online mapping projects outlined in this paper and will themselves use these online tools to the benefit of their library and their library users. google mapping products offer libraries an online platform to share information and resources in an easy, accessible, and low-cost way. the survey results also indicate that map/gis professionals in academic libraries trust and rely on google maps/earth as a solution to many academic queries and needs. since google mapping products were created for use by mainstream society, it can be suggested that fields outside of map and gis work may find the products beneficial and useful as well. google earth and google maps are very easy to learn, and users do not require any spatial or mapping skills. because this survey was limited to map/gis users, the authors do not know how, if at all, google mapping products are being used by other library staff; this will be a future area of study. the authors do, however, strongly suggest that map/gis librarians consider offering training sessions to reference staff and liaison librarians.
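such training does not need to assume any gis background: the kml format that underlies most of the projects described above is plain xml, and a minimal file can be produced with a few lines of code. the sketch below is an illustration only — it is not drawn from any survey response, and the placemark name and coordinates are invented for the example.

#!/usr/bin/perl
# illustrative sketch: write a minimal kml file containing a single placemark.
# the file name, placemark label, and coordinates are invented examples.
use strict;
use warnings;

my $name = 'Map Library Air Photo Index';   # hypothetical placemark label
my ( $lon, $lat ) = ( -80.5449, 43.4723 );  # hypothetical longitude/latitude

open my $out, '>', 'placemark.kml' or die "cannot write placemark.kml: $!";
print {$out} <<"KML";
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>$name</name>
      <description>Example placemark for illustration only.</description>
      <Point>
        <coordinates>$lon,$lat,0</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
KML
close $out;
print "wrote placemark.kml\n";

a file like this opens directly in google earth, and converting an existing shapefile with a tool such as ogr2ogr (named by several respondents) or with google earth pro produces the same kind of document, only with many more placemarks or polygons.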
as a multidisciplinary tool, many subject areas can benefit from google maps/earth, as it’s certainly not a tool for use by only gis/map librarians. with a little bit of training, all library staff can use google mapping products to assist with research questions, spatial literacy, location-based projects and library instruction. in fact, library staff members responsible for nontraditional library material such as photographs, postcards, audio recordings, original hand-written documents, etc. may want to consider using online mapping products to organize their collection. too many times such original material is lost in the library’s filing system, is irretrievable or unavailable during convenient hours. google maps/earth will organize all collections based on their geographic location and can offer access to the actual information technology and libraries | june 2012 113 material. more exposure to and training on these free and easy to use products can increase collection use, promote mapping technology, and organize the library’s holdings. references 1 terry ballard, “inheriting the earth: using kml files to add placemarks relating to the library’s original content to google earth and google maps” new library world 110 (2009): 357-65, doi: 10.1108/0307480091097579. jacobsen, mikael and terry ballard, “google maps: you are here: using google maps to bring out your library’s local collections” library journal, october 15, 2008 (accessed september 11, 2011). http://www.libraryjournal.com/article/ca6602836.html. 2 michaela brenner and peter klein, “discovering the library with google earth” information technology and libraries 27 (2008): 32-6. 3 michael vandenburg, “using google maps as an interface for the library catalogue” library hitech 26 (2008): 33-40. 4 troy swanson, “google maps and second life: virtual platforms meet information literacy” college & research libraries news 69 (2008): 610-12. 5 annette lamb, and larry johnson, “virtual expeditions: google earth, gis, and geovisualization technologies in teaching and learning” teacher librarian 37 (2010): 81-5. 6 a list of mcgill library’s air photo indexes can be viewed at http://www.mcgill.ca/library/library-findinfo/maps/airphotos/ (accessed september 8, 2011). 7 mcmaster university library map index can be found at http://library.mcmaster.ca/maps/ww1/ndx5to40.htm, (accessed september 8, 2011). 8 the brock university historical air photo collection can be accessed at: http://www.brocku.ca/maplibrary/airphoto/historical.php (accessed september 8, 2011). 9 the yale university sanborn indexes can be found at http://www.library.yale.edu/mapcoll/print_sanborn.html (accessed september 8, 2011). 10 the university of vermont library’s google map can be found at: http://cdi.uvm.edu/collections/browsecollection.xql?pid=longtrail&title=long%20trail%20p hotographs (accessed september 8, 2011). 11 the cleveland memory project can be found at: http://www.clevelandmemory.org/hlneo/ (accessed september 8, 2011). 
12 the university of waterloo map library website can be found at: http://www.lib.uwaterloo.ca/locations/umd/project/ (accessed september 8, 2011).
13 the university of north carolina library provides interactive maps at http://www.lib.unc.edu/dc/ncmaps/interactive/overlay.html (accessed september 8, 2011).
14 the university of connecticut library offers gis files online here: http://magic.lib.uconn.edu/connecticut_data.html (accessed september 8, 2011).
15 campus map examples include: yale university library at http://maps.commons.yale.edu/venice/. example maps for library locations on campus include: brock university library, http://www.brocku.ca/maplibrary/general/where-is-the-ml.php; university of north carolina, http://www.lib.unc.edu/libraries_collections.html (all accessed on september 8, 2011).
16 the full survey instrument can be found in the appendix of this document.

appendix
google maps and google earth: influences and impacts in your library

you and your library
1. what is your work position title?
2. what department/division/area of library do you work in? (click all that apply)
   o map/gis services
   o government publications
   o general reference
   o technical services
   o other (please specify):

google mapping products
3. please check all the products you have worked with?
   o google maps
   o google maps api
   o google earth
   o google earth plus
   o google earth pro
   o google earth api
   o google earth enterprise
4. how regularly do you work with google mapping products for work-related projects?
   o not at all
   o a few times a year
   o 1-3 times a month
   o 1-2 times a week
   o 3-4 times a week
   o more often than that!
   o not sure
5. for what work related tasks, have you used these products? (click all that apply)
   o instruction
   o promotion/marketing
   o answering research questions
   o creating/accessing a finding aid tool (air photo, map indexes, etc.)

library instruction using google mapping products
6. does your library have a map, or spatial, or geospatial literacy policy or program?
   o yes
   o no
7. if you are using google mapping products for instruction, what level or year of university course(s) are you using it in, and in how many courses:
   (number of courses: 1-2 | 3-5 | 6-9 | 10-14 | 15 and more)
   1st year (100 level)
   2nd year (200 level)
   3rd year (300 level)
   4th year (400 level)
   graduate level
8. please describe some of these activities?
9. does your library offer geographic awareness or gis-related training to some or all the library staff?
promotion/marketing using google mapping products 10. has your library used google mapping technology to promote, offer, or deliver a service? (for example, offering kml files for download, indexes, guides, scanned documents, placemarks/urls from google maps/earth, etc.) o yes o no 10a. if yes, please describe with as much detail as possible how your library has used google mapping technology. if possible, please provide links to the projects. 10b. if yes, how have the google mapping related projects enhanced services or benefited the library? information technology and libraries | june 2012 117 11. does the library provide support to the wider campus community using google mapping products (not including instructional collaborations)? kml/kmz collections 12. do you work with kml files? o yes o no 13. do you create your own kml files? o yes o no 14. how do you create your own kml files? o write xml code o save in google earth o convert from another file format using an external application o other (please specify) 15. does your library hold and provide access to kml or kmz files as part of its collections? o yes o no 16. if yes, approximately how many files do you currently hold? 17. how are these files findable by your patrons? o opac o library website o both 18. do you or other library staff use other online mapping tools? please list which ones and what they are used for. batch ingesting into eprints digital repository sof tware tomasz neugebauer and bin han information technology and libraries | march 2012 113 abstract this paper describes the batch importing strategy and workflow used for the import of theses metadata and pdf documents into the eprints digital repository software. a two-step strategy of importing metadata in marc format followed by attachment of pdf documents is described in detail, including perl source code for scripts used. the processes described were used in the ingestion of 6,000 theses metadata and pdfs into an eprints institutional repository. introduction tutorials have been published about batch ingestion of proquest metadata and electronic theses and dissertations (etds),1 as well as endnote library,2 into the digital commons platform. the procedures for bulk importing of etds using dspace have also been reported.3 however, bulk importing into the eprints digital repository software has not been exhaustively addressed in the literature.4 a recent article by walsh provides a literature review of batch importing into institutional repositories.5 the only published report on batch importing into the eprints platform describes perl scripts for metadata-only records import from thomson reuters reference manager.6 bulk importing is often one of the first tasks after launching a repository, so it is unsurprising that requests for reports and documentation on eprints-specific workflow have been a recurring question on the eprints tech list.7 a recently published review of eprints identifies “the absence of a bulk uploading feature” as its most significant weakness.8 although eprints’ graphical user interface for bulk importing is limited to the use of the installed import plugins, the software does have a versatile infrastructure for this purpose. leveraging eprints’ import functionality requires some perl scripting, structuring the data for import, and using the command line interface. in 2009, when concordia university launched spectrum,9 its research repository, the first task was a batch ingest of approximately 6,000 theses dated from 1967 to 2003. 
the source of the metadata for this import consisted in marc records from an integrated library system powered by innovative interfaces and proquest pdf documents. this paper is a report on the strategy and workflow adopted for batch ingestion of this content into the eprints digital repository software. import strategy eprints has a documented import command line utility located in the /bin folder.10 documents can also be imported through eprints’ graphical interface. using the command line utility for tomasz neugebauer (tomasz.neugebauer@concordia.ca) is digital projects and systems development librarian and bin han (bin.han@concordia.ca) is digital repository developer, concordia university libraries, montreal, quebec, canada. mailto:tomasz.neugebauer@concordia.ca mailto:bin.han@concordia.ca batch ingesting into eprints digital repository software| neugebauer and han 114 importing is recommended because it is easier to monitor the operation in real time by adding progress information output to the import plugin code. the task of batch importing can be split into the following subtasks: 1. import of metadata of each item 2. import of associated documents, such as full-text pdf files the strategy adopted was to first import the metadata for all of the new items into the inbox of an editor’s account. after this first step was completed, a script was used to loop through the newly imported eprints and attach the corresponding full-text documents. although documents can be imported from the local file system or via http, import of the files from the local file system was used. the batch import procedure varies depending on the format of the metadata and documents to be imported. metadata import requires a mapping of the source schema fields to the default or custom fields in eprints. the source metadata must also be converted into one of the formats supported by eprints’ import plugins, or a custom plugin must be created. import plugins are available for many popular formats, including bibtex, doi, endnote, and pubmedxml. in addition, community-contributed import plugins such as marc and arxiv are available at eprints files.11 since most repositories use custom metadata fields, some customization of the import plugins is usually necessary. marc plugin for eprints in eprints, the import and export plugins ensure interoperability of the repository with other systems. import plugins read metadata from one schema and load it into the eprints system through a mapping of the fields into the eprints schema. loading marc-encoded files into eprints requires the installation of the import/export plugin developed by romero and miguel.12 the installation of this plugin requires the following two cpan modules: marc::record and marc::file::usmarc. the marc plugin was then subclassed to create an import plugin named “concordia theses,” which is customized for thesis marc records. concordia theses marc plugin the marc plugin features a central configuration file (see appendix a) in which each marc field is paired with a corresponding mapping to an eprints field. most of the fields were configured through this configuration file (see table 1). the source marc records from the innovative interfaces integrated library system (ils) encode the physical description of each item using the anglo american cataloguing rules, as in the following example: “ix, 133 leaves : ill. 
; 29 cm." since the default eprints field for number of pages is of the type integer and does not allow multipart physical descriptions from the marc 300 field, a custom text field for these physical descriptions (pages_aacr) had to be added. the marc.pl configuration file cannot be used to map compound fields, such as author names—the fields need custom mapping implementation in perl. for instance, the marc 100 and 700 fields are transferred into the eprints author compound field (in marc.pm). similarly, marc 599 is mapped into a custom thesis advisor compound field.

marc field | eprints field
020a | isbn
020z | isbn
022a | issn
245a | title
245b | subtitle
250a | edition
260a | place_of_pub
260b | publisher
260c | date
300a | pages_aacr
362a | volume
440a | series
440c | volume
440x | issn
520a | abstract
730a | publication

table 1. mapping table from marc to eprints

helge knüttel's refinements to the marc plugin shared on the eprints tech list were employed in the implementation of a new subclass of marc import for the concordia theses marc records. in the implementation of the concordia theses plugin, concordiatheses.pm inherits from marc.pm (see figure 1).13 knüttel added two methods that make it easier to subclass the general marc plugin and add unique mappings: handle_marc_specialities and post_process_eprint. the post_process_eprint function was not used to attach the full-text documents to each eprint. instead, the strategy to import the full-text documents using a separate attach_documents script was used (see "theses document file attachment" below). import of all of the specialized fields, such as thesis type (mapped from marc 710t), program, department, and proquest id, was implemented in the function handle_marc_specialities of concordiatheses.pm. for instance, 502a in the marc record contains the department information, whereas an eprints system like spectrum stores department hierarchy as subject objects in a tree. therefore importing the department information based on the value of 502a required regular expression searches of this marc field to find the mapping into a corresponding subject id. this was implemented in the handle_marc_specialities function.

figure 1. concordia theses class diagram, created with the perl module uml::class::simple

execution of the theses metadata import

the depositing user's name is displayed along with the metadata for each eprint. a batchimporter user with the corporate name "concordia university libraries" was created to carry out the import. as a result, the public display of the imported items shows the following as a part of the metadata: "deposited by: concordia university libraries." the marc plugin requires the encoding of the source marc files to be utf-8, whereas the records are exported from the ils with marc-8 encoding. therefore marcedit software developed by reese was used to convert the marc file to utf-8.14 to activate the import, the main marc import plugin and its subclass, concordiatheses.pm, have to be placed in the plugin folder /perl_lib/eprints/plugin/import/marc/. the configuration file (see appendix a) must also be placed with the rest of the configurable files in /archives/repositoryid/cfg/cfg.d. the plugin can then be activated from the command line using the import script in the /bin folder.
a detailed description of this script and its usage is documented on the eprints wiki. the following eprints command from the /bin folder was used to launch the import:

import repositoryid --verbose --user batchimporter eprint marc::concordiatheses theses-utf8.mrc

following the aforementioned steps, all the theses metadata was imported into the eprints software. the new items were imported with their statuses set to inbox. a status set to inbox means that the imported items are in the work area of the batchimporter user and will need to be moved to live public access by switching their status to archive.

theses document file attachment

after the process of importing the metadata of each thesis is complete, the corresponding document files need to be attached. the proquest id was used to link the full-text pdf documents to the metadata records. all of the marc records contained the proquest id, while the pdf files, received from proquest, were delivered with the corresponding proquest id as the filename. the pdfs were uploaded to a folder on the repository web server using ftp. the attach_documents script (see appendix b for source code) was then used to attach the documents to each of the imported eprints in the batchimporter's inbox and to move the imported eprints to the live archive. several variables need to be set at the beginning of the attach_documents operation (see table 2).

variable | comment
$root_dir = 'bin/importdata/proquest' | this is the root folder where all the associated documents are uploaded by ftp.
$depositor = 'batchimporter' | only the items deposited by a defined depositor, in this case batchimporter, will be moved from inbox to live archive.
$dataset_id = 'inbox' | limit the dataset to those eprints with status set to inbox.
$repositoryid = 'library' | the internal eprints identifier of the repository.

table 2. variables to be set in the attach_documents script

the following command is used to proceed with file attachment, while the output log is redirected and saved in the file attachment:

/bin/attach_documents.pl > ./attachment 2>&1

the thesis metadata record was made live even if it did not contain a corresponding document file. a list of eprint ids of theses that did not contain a corresponding full-text pdf document is written at the end of the log file, along with the count of the number of theses that were made live. after the import operation is complete, all the abstract pages need to be regenerated with the following command:

/bin/generate_abstracts repositoryid

conclusions

this paper is a detailed report on batch importing into the eprints system. the authors believe that this paper and its accompanying source code are a useful contribution to the literature on batch importing into digital repository systems. in particular, it should be useful to institutions that are adopting the eprints digital repository software. batch importing of content is a basic and fundamental function of a repository system, which is why the topic has come up repeatedly on the eprints tech list and in a repository software review. the methods that we describe for carrying out batch importing in eprints make use of the command line and require perl scripting. more robust administrative graphical user interface support for batch import functions would be a useful feature to develop in the platform.
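for readers who have not worked with the marc-handling cpan modules that the import plugin builds on (marc::record and marc::file::usmarc), the following short perl sketch illustrates the kind of record traversal involved. it is a stand-alone illustration rather than an excerpt from the concordia theses plugin; it simply reads the same theses-utf8.mrc file used in the import command above and prints a few of the fields listed in table 1.

#!/usr/bin/perl
# illustrative sketch only: read a marc file and print fields similar to
# those mapped in table 1. this is not part of the concordia theses plugin.
use strict;
use warnings;
use MARC::Batch;    # distributed with MARC::Record on cpan

my $batch = MARC::Batch->new( 'USMARC', 'theses-utf8.mrc' );

while ( my $record = $batch->next() ) {
    # 245$a maps to the eprints title field in table 1
    my $title = $record->field('245') ? $record->field('245')->subfield('a') : '';
    # 260$c maps to the eprints date field
    my $date  = $record->field('260') ? $record->field('260')->subfield('c') : '';
    # 300$a holds the aacr physical description kept in the custom pages_aacr field
    my $desc  = $record->field('300') ? $record->field('300')->subfield('a') : '';
    print join( ' | ', $title // '', $date // '', $desc // '' ), "\n";
}

a real import plugin, of course, goes further than printing: it maps values such as these into eprints metadata fields and creates the eprint objects, as the marc.pl configuration in appendix a and the concordiatheses.pm subclass described above do.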
acknowledgements the authors would like thank mia massicotte for exporting the metadata records from the integrated library system. we would also like to thank alexandros nitsiou, raquel horlick, adam field, and the reviewers at information technology and libraries for their useful comments and suggestions. references 1. shawn averkamp and joanna lee, “repurposing proquest metadata for batch ingesting etds into an institutional repository,” code{4}lib journal 7 (2009), http://journal.code4lib.org/articles/1647 (accessed june 27, 2011). 2. michael witt and mark p. newton, “preparing batch deposits for digital commons repositories,” 2008, http://docs.lib.purdue.edu/lib_research/96/ (accessed june 20, 2011). 3. randall floyd, “automated electronic thesis and dissertations ingest,” 2009, https://wiki.dlib.indiana.edu/display/iusw/automated+electronic+thesis+and+dissertations+i ngest (accessed may 26, 2011). 4. eprints digital repository software, university of southampton, uk, http://www.eprints.org/ (accessed june 27, 2011). 5. maureen p. walsh, “batch loading collections into dspace: using perl scripts for automation and quality control,” information technology & libraries 29, no. 3 (2010): 117–27, http://journal.code4lib.org/articles/1647 http://docs.lib.purdue.edu/lib_research/96/ https://wiki.dlib.indiana.edu/display/iusw/automated+electronic+thesis+and+dissertations+ingest https://wiki.dlib.indiana.edu/display/iusw/automated+electronic+thesis+and+dissertations+ingest http://www.eprints.org/ information technology and libraries | march 2012 119 http://search.ebscohost.com/login.aspx?direct=true&db=a9h&an=52871761&site=ehost-live (accessed june 26, 2011). 6. lesley drysdale, “importing records from reference manager into gnu eprints,” 2004, http://hdl.handle.net/1905/175 (accessed june 27, 2011). 7. eprints tech list, university of southampton, uk, http://www.eprints.org/tech.php/ (accessed june 27, 2011). 8. mike beazly, “eprints institutional repository software: a review,” partnership: the canadian journal of library & information practice & research 5, no. 2 (2010), http://journal.lib.uoguelph.ca/index.php/perj/article/viewarticle/1234 (accessed june 27, 2011). 9. concordia university libraries, “spectrum: concordia university research repository,” http://spectrum.library.concordia.ca (accessed june 27, 2011). 10. eprints wiki, “api:bin/import,” university of southampton, uk, http://wiki.eprints.org/w/api:bin/import (accessed june 23, 2011). 11. eprints files, university of southampton, uk, http://files.eprints.org/ (accessed june 24 2011). 12. parella romero and jose miguel, “marc import/export plugins for gnu eprints3,” eprints files, 2008, http://files.eprints.org/323/ (accessed may 31, 2011). 13. agent zhang and maxim zenin, “uml:class::simple,” cpan, http://search.cpan.org/~agent/uml-class-simple-0.18/lib/uml/class/simple.pm (accessed september 20, 2011). 14. terry reese, “marcedit: downloads,” oregon state university, http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html (accessed june 27, 2011). 
http://search.ebscohost.com/login.aspx?direct=true&db=a9h&an=52871761&site=ehost-live http://hdl.handle.net/1905/175 http://www.eprints.org/tech.php/ http://journal.lib.uoguelph.ca/index.php/perj/article/viewarticle/1234 http://spectrum.library.concordia.ca/ http://wiki.eprints.org/w/api:bin/import http://files.eprints.org/ http://files.eprints.org/323/ http://search.cpan.org/~agent/uml-class-simple-0.18/lib/uml/class/simple.pm http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html batch ingesting into eprints digital repository software| neugebauer and han 120 appendix a. marc.pl configuration file # # plugin eprints::plugin::import::marc # # marc tofro eprints mappings # do _not_ add compound mappings here. $c->{marc}->{marc2ep} = { # marc to eprints '020a' => 'isbn', '020z' => 'isbn', '022a' => 'issn', '245a' => 'title', '245b' => 'subtitle', '250a' => 'edition', '260a' => 'place_of_pub', '260b' => 'publisher', '260c' => 'date', '362a' => 'volume', '440a' => 'series', '440c' => 'volume', '440x' => 'issn', '520a' => 'abstract', '730a' => 'publication', }; $c->{marc}->{marc2ep}->{constants} = { }; ################################################################### ### # # plugin-specific settings. # # any non empty hash set for a specific plugin will override the # general one above! # ################################################################### ### # # plugin eprints::plugin::import::marc::concordiatheses # $c->{marc}->{'eprints::plugin::import::marc::concordiatheses'}->{marc2ep} = { '020a' => 'isbn', '020z' => 'isbn', '022a' => 'issn', '250a' => 'edition', information technology and libraries | march 2012 121 '260a' => 'place_of_pub', '260b' => 'publisher', '260c' => 'date', '300a' => 'pages_aacr', '362a' => 'volume', '440a' => 'series', '440c' => 'volume', '440x' => 'issn', '520a' => 'abstract', '730a' => 'publication', }; $c->{marc}->{'eprints::plugin::import::marc::concordiatheses'}->{constants} = { # marc to eprints constants 'type' => 'thesis', 'institution' => 'concordia university', 'date_type' => 'submitted', }; batch ingesting into eprints digital repository software| neugebauer and han 122 appendix b. attach_documents.pl #!/usr/bin/perl -i/opt/eprints3/perl_lib =head1 description this script allows you to attach a file to an eprint object by proquest id. =head1 copyright and license 2009 adam field, tomasz neugebauer 2011 bin han this module is free software under the same terms of perl. compatible with eprints 3.2.4 (victoria sponge). =cut use strict; use warnings; use eprints; my $repositoryid = 'library'; my $root_dir = '/opt/eprints3/bin/import-data/proquest'; #location of pdf files my $dataset_id = 'inbox'; #change to 'eprint' if you want to run it over everything. 
my $depositor = 'batchimporter'; #limit import to $depositor’s inbox #global variables for log purposes my $int_live = 0; #count of eprints moved to live archive with a document my $int_doc = 0; #count of eprints that already have document attached my @array_doc; #ids of eprints that already have documents my $int_no_doc = 0; #count of eprints moved to live with no document attached my @array_no_doc; #ids of eprints that have no documents my $int_no_proid = 0; #count of eprints with no proquest id my @array_no_proid; #ids of eprints with no proquest id my $session = eprints::session->new(1, $repositoryid); die "couldn't create session for $repositoryid\n" unless defined $session; #the hash contains all the files that need to be uploaded #the hash contains key-value pairs: (pq_id => filename) my $filemap = {}; load_filemap($root_dir); #get all eprints in inbox dataset my $dataset = $session->get_repository->get_dataset($dataset_id); #run attach_file on each eprint object $dataset->map($session, \&attach_file); information technology and libraries | march 2012 123 #output log for attachment print "#### $int_doc eprints already have document attached, skip ####\n @array_doc\n"; print "#### $int_no_proid eprints doesn't have proquest id, skip ####\n @array_no_proid\n"; print "#### $int_no_doc eprints doesn't have associated document, moved to live ####\n @array_no_doc\n"; #total number of eprints that were made live: those with and without documents. my $int_total_live = $int_live + $int_no_doc; print "#### intotal: $int_total_live eprints moved to live ####\n"; #attach file to corresponding eprint object sub attach_file { my ($session, $ds, $eprint) = @_; #skip if eprint already has a document attached my $full_text_status = $eprint->get_value( "full_text_status" ); if ($full_text_status ne "none") { print "eprint ".$eprint->get_id." already has a document, skipping\n"; $int_doc ++; push ( @array_doc, $eprint->get_id ); return; } #retrieve username/userid associated with current eprint my $user = new eprints::dataobj::user( $eprint->{ session }, $eprint->get_value( "userid" ) ); my $username; # exit in case of failure to retrieve associated user, just in case. return unless defined $user; $username = $user->get_value( "username" ); # $dataset includes all eprints in inbox, so we limit to $depositor's items only return if( $username ne $depositor ); #skip if no proquest id is associated with the current eprint my $pq_id = $eprint->get_value('pq_id'); if (not defined $pq_id) { print "eprint ".$eprint->get_id." doesn't have a proquest id, skipping\n"; $int_no_proid ++; batch ingesting into eprints digital repository software| neugebauer and han 124 push ( @array_no_proid, $eprint->get_id ); return; } #remove space from proquest id $pq_id =~ s/\s//g; #attach the pdf to eprint objects and move to live archive if ($filemap->{$pq_id} and -e $filemap->{$pq_id} ) #if the file exists { #create document object, add pdf files to document, attach to eprint object, and move to live archive my $doc = eprints::dataobj::document::create( $session, $eprint ); $doc->add_file( $filemap->{$pq_id}, $pq_id . '.pdf' ); $doc->set_value( "format", "application/pdf" ); $doc->commit(); print "adding document to eprint ", $eprint->get_id, "\n"; $eprint->move_to_archive; print "eprint ".$eprint->get_id." 
moved to archive.\n"; $int_live ++; } else { #move the metadata-only eprints to live as well print "proquest id \\$pq_id\\ (eprint ", $eprint->get_id, ") does not have a file associated with it\n"; $eprint->move_to_archive; print "eprint ".$eprint->get_id." moved to archive without document attached.\n"; $int_no_doc ++; push ( @array_no_doc, $eprint->get_id ); } } #recursively traverse the directory, find all pdf files. sub load_filemap { my ($directory) = @_; foreach my $filename (<$directory/*>) { if (-d $filename) { load_filemap($filename); } #catch the file name ending in .pdf elsif ($filename =~ m/([^\/]*)\.pdf$/i) information technology and libraries | march 2012 125 { my $pq_id = $1; #add pq_id => filename pair to filemap hash table $filemap->{$pq_id} = $filename; } } } modeling a library website redesign process: developing a user-centered website through usability testing danielle a. becker and lauren yannotta information technology and libraries | march 2013 6 abstract this article presents a model for creating a strong, user-centered web presence by pairing usability testing and the design process. four rounds of usability testing were conducted throughout the process of building a new academic library web site. participants were asked to perform tasks using a talk-aloud protocol. tasks were based on guiding principles of web usability that served as a framework for the new site. results from this study show that testing throughout the design process is an effective way to build a website that not only reflects user needs and preferences, but can be easily changed as new resources and technologies emerge. introduction in 2008 the hunter college libraries launched a two-year website redesign process driven by iterative usability testing. the goals of the redesign were to: • update the design to position the library as a technology leader on campus; • streamline the architecture and navigation; • simplify the language used to describe resources, tools, and services; and • develop a mechanism to quickly incorporate new and emerging tools and technologies. based on the perceived weaknesses of the old site, the libraries’ web committee developed guiding principles that provided a framework for the development of the new site. the guiding principles endorsed solid information architecture, clear navigation systems, strong visual appeal, understandable terminology, and user-centered design. this paper will review the literature on iterative usability testing, user-centered design, and thinkaloud protocol and the implications moving forward. it will also outline the methods used for this study and discuss the results. the model used, building the design based on the guiding principles and using the testing to uphold those principles, led to the development of a strong, user-centered site that can be easily changed or adapted to accommodate new resources and technologies. we believe this model is unique and can be replicated by other academic libraries undertaking a website redesign process. danielle a. becker (dbe0003@hunter.cuny.edu) is assistant professor/web librarian, lauren yannotta (lyannotta@hotmail.com) was assistant professor/instructional design librarian, hunter college libraries, new york, new york. 
mailto:dbe0003@hunter.cuny.edu mailto:lyannotta@hotmail.com modeling a library website redesign process | becker 7 background the goals of the research were to (1) determine the effectiveness of the hunter college libraries website, (2) discover how iterative usability testing resulting in a complete redesign impacts how the students perceive the usability of a college library website, and (3) reveal student informationseeking habits. a formal usability test was conducted both on the existing hunter college libraries website (appendix a) and the following drafts of the redesign (appendix b) with twenty users over an eighteen-month period. the testing occurred before the website redesign began, while the website was under construction, and after the site was launched. the participants were selected through convenience sampling and informed that participation was confidential. the intent of the usability test was to uncover the flaws in navigation and terminology of the current website and, as the redesign process progressed, to incorporate the users’ feedback into the new website’s design to closely match their wants and needs. the redesign of the website began with a complete inventory of the existing webpages. an analysis was done of the website that identified key information, links, units within the department, and placement of information in the information architecture of the website. we identified six core goals that we felt were the most important for all users of the library’s website: 1. user should be able to locate high-level information within three clicks. 2. eliminate library jargon from navigational system using concise language. 3. improve readability of site. 4. design a visually appealing site. 5. create a site that was easily changeable and expandable. 6. market the libraries’ services and resources through the site. literature review in 2010, oclc compiled a report, “the digital information seeker,” that found 84 percent of users begin their information searches with search engines, while only 1 percent began on a library website. search engines are preferred because of speed, ease of use, convenience, and availability.1 similar studies such as emde et al., and gross and sheridan, have shown that students are not using library websites to do their research.2 gross and sheridan assert in their article on undergraduate search behavior that “although students are provided with library skills sessions, many of them still struggle with the complex interfaces and myriad of choices the library website provides.” 3 this research shows the importance of creating streamlined websites that will information technology and libraries | march 2013 8 compete for our students’ attention. in building a new website at the hunter college libraries, we thought the best way to do this was through user-centered design. web designers both inside and outside the library have recognized the importance of usercentered design. nielsen advises that website structure should be driven by the tasks the users came to the site to perform.4 he asserts the amount of graphics on webpages should be minimized because they often affect page download times and that gratuitous graphics (including text rendered as images) should be eliminated altogether. 5 he also contends it is important to ensure that page designs are accessible to all users regardless of platform or newness of technology. 6 in their article, “how do i find an article? 
insights from a web usability study,” cockrell and jayne cited instances when researchers concluded that library terminology contributed to patrons’ difficulties when using library websites, thus highlighting the importance of understandable terminology. hulseberg and monson found in their investigation of student-driven taxonomy for library website design that “by developing our websites based on student-driven taxonomy for library website terminology, features, and organization, we can create sites that allow students to get down to the business of conducting research.” 7 performing usability testing is one way to confirm user-centered design. in his book don’t make me think!, krug insists that usability testing can provide designers with invaluable input. that, taken together with experience, professional judgment, and common sense, makes design choices easier.8 ipri, yunkin, and brown, in their article “usability as a method for assessing discovery,” emphasize the important role usability testing has in capturing emotional and aesthetic responses users have to websites, along with expressions of satisfaction with the layout and logic of the site. even the discovery of basic mistakes, such as incorrect or broken links and ineffective wording, can negatively affect discovery of library resources and services. 9 in battleson, booth, and weatherford’s literature review for their usability testing of an academic library website case study, they summarize dumas and redish's discussion of the five facets of formal usability testing: (1) the goal is to improve the usability of the interface, (2) testers should represent real users, (3) testers perform real tasks, (4) user behavior and commentary are observed and recorded, and (5) data are analyzed to recognize problems and suggest solutions. they conclude that when usability testing is "applied to website interfaces, this test method not only results in a more usable site, but also allows the site design team to function more efficiently, since it replaces opinion with user-centered design."10 this allows the designers to evaluate the results and identify problems with the design being tested. 11 usability experts nielsen and tahir contend that the earlier and more frequently usability tests are conducted, the more impact the results will have on the final design of the website because the results can be incorporated throughout the design process. they conclude it is better to conduct frequent, smaller studies with a maximum of five users. they assert, “you will always have discovered so many blunders in the design that it will be better to go back to the drawing board modeling a library website redesign process | becker 9 and redesign the interface than to discover the same usability problems several more times with even more users.” 12 based on the strength of the literature, we decided to use iterative testing for our usability study. 
krug points out that testing is an iterative process because designers need to create, test, and fix based on test results, then test again.13 according to the united states department of health and human services report “research-based web design and usability guidelines,” conducting before and after studies when revising a website will help designers determine if changes actually made a difference in the usability of the site.14 manzari and trinidad-christensen found in their evaluation of user-centered design for a library website, iterative testing is when a product is tested several times during development, allowing users’ needs to be incorporated into the design. in their study, their aim was that the final draft of their website would closely match the users’ information needs while remaining consistent, easy to learn, and efficient.15 battleson, booth, and weintrop report that there is “a consensus in the literature that usability testing be an iterative process, preferably one built into a web site’s initial design.” 16 they explain that “site developers should test for usability, redesign, and test again—these steps create a cycle for maintaining, evaluating and continually improving a site.” 17 george used iterative testing in her redesign of the carnegie mellon university libraries website and concluded that it was “necessary to provide user-centered services via the web site.” 18 cobus, dent, and ondrusek used six students to usability test the “pilot study.” then eight students participated in the first round of testing; then librarians modified the prototype and tested fourteen students in the second and final round. after the second round of testing they used the results of this test to analyze the user recordings and deliver the findings and proposed “fixes” to the prototype pages to the web editor.19 mcmullen’s redesign of the roger williams university library website was able to “complete the usability-refinement cycle” twice before finalizing the website design.20 but continued refinements were needed, leading to another round of usability tests to identify and correct problem areas.21 bauer-graham, poe, and weatherford did a comparative study of a library websites’ usability via a survey and then redesigned the website after evaluating the survey’s results. they waited a semester, distributed another survey to determine the functionality of the current site. the survey had the participants view the previous design and the current design in a side-by-side comparison to determine how useful the changes made to the site were. 22 when testing participants, in the article “how do i find an article? insights from a web usability study,” cockrell and jayne suggest using a web interface to perform specified tasks while a tester observes, noting the choices made, where mistakes occur, and using a “think aloud” protocol. they found that modifying the website through an ongoing, iterative process of testing, refining, and retesting its component parts improves functionality. 23 in conducting our usability testing we used a think-aloud protocol to capture the participants’ actions. van den haak, de jong, and schellens define think-aloud protocol as relying on a method information technology and libraries | march 2013 10 that asks users to complete a set of tasks and to constantly verbalize their thoughts while working on the tasks. 
the usefulness of this method of testing lies in the fact that the data collected reflect the actual use of the thing being tested and not the participants’ judgments about its usability. instead, the test follows the individual’s thoughts during the execution of the tasks. 24 nielsen states that think-aloud protocol “may be the single most valuable usability engineering method. . . . one gets a very direct understanding of what parts of the [interface/user] dialog cause the most problems, because the thinking aloud method shows how users interpret each individual interface item.” 25 turnbow ‘s article “usability testing for web redesign: a ucla case study” states that using the “think-aloud protocol” provides crucial real-time feedback on potential problems in the design and organization of a website.26 cobus, dent, and ondrusek used the think-aloud protocol in their usability study. they encouraged participants to talk out loud as they answered the questions, audio taped their comments, and captured their on-screen navigation using camtasia.27 this information was used to successfully reorganize hunter college library’s website. method an interactive draft of hunter college libraries redesigned website was created before the usability study was conducted. in spring 2009, the authors created the protocol for the usability testing. a think-aloud protocol was agreed upon for testing both the old site and the drafts of the new site, including a series of post-test questions that would allow participants to share their demographic information and give subjective feedback on the drafts of the site. draft questions were written, and we conducted mock usability tests on each other. after several drafts we revised our questions and performed pilot tests on an mlis graduate student and two undergraduate student library assistants with little experience with the current website. we ascertained from these pilot tests that we needed to slightly revise the wording of several questions to make them more understandable to all users. we made the revisions and eliminated a question that was redundant. all recruitment materials and finalized questions were submitted to the institutional review board (irb) for review and went through the certification process. after receiving approval we secured a private room to conduct the study. participants were recruited using a variety of methods. signs were posted throughout the library, an e-mail was sent out to several hunter college distribution lists, and a tent sign was erected in the lobby of the library. participants were required to be students or faculty. participants were offered a $10.00 barnes & noble gift card as incentive. applicants were accepted on a rolling basis. twenty students participated in the web usability study (appendix c). no faculty responded to our requests for participation so a decision was made to focus this usability test on students rather than faculty because students comprise our core user base. another usability test will be conducted in the future that will focus on faculty to determine how their academic tasks differ from undergraduates when using the library modeling a library website redesign process | becker 11 website. the redesigned site is malleable, which makes revisions and future changes in the design a predicted outcome of future usability tests. tests were scheduled for thirty-minute intervals. we conducted four rounds of testing using five participants per round. 
the two researchers switched questioner and observer roles after each round of testing. each participant was asked to think aloud while they completed the tasks and navigated the website. both researchers took notes during the tests to ensure detailed and accurate data was collected. each participant was asked to review the irb forms detailing their involvement in the study, and they were asked to consent at that time. their consent was implied if they participated in the study after reading the form. the usability test consisted of fifteen task-oriented questions. the questions were identical when testing the old and new draft site. the first round tested only the old site, while the following three rounds tested only the new draft site. we tested both sites because we believed that comparing the two sites would reveal if the new site improved performance. the questions (appendix d) were not changed after they were initially finalized and remained the same throughout the entire four rounds of the usability study. participants were reminded at the onset of the test and throughout the process that the design and usability of the site(s) were being tested, not their searching abilities. the tests were scheduled for an hour each, allowing participants to take the tests without time restrictions or without being timed. as a result, the participants were encouraged to take as much time as they needed to answer the questions, but were also allowed to skip questions if they were unable to locate answers. initially the tests were recorded using camtasia software. this allowed us to record participants’ navigation trails through their mouse movements and clicks. but, after the first round of testing, we decided that observing and taking notes was appropriate documentation, and we stopped using the software. after the participants completed the tests we asked them user preference questions to get a sense of their user habits and their candid opinions of the new draft of the website. these questions were designed to elicit ideas for useful links to include on the website and also to gauge the visual appeal of the site. information technology and libraries | march 2013 12 results table 1. percent of tasks answered correctly discussion hunter college libraries’ website was due for a redesign because the site was dated in its appearance and did not allow new content to be added quickly and easily. as a result, a decision was made to build a new site using a content management system (cms) to make the site easily expandable and simple to update. this study tested the simple tasks to determine how to structure the information architecture and to reinforce the guiding principles of the redesigned website. task successes and failures the high percentage of success of participants finding books on the redesigned website using the online library catalog and easily find library hours reinforced our guiding principle of understandable terminology and clear navigational systems. krug contends that navigation educates the user on the site’s contents through its visible hierarchy. 
the result is a site that guides the user through their options and instills confidence in the website and its designers.28 we found this to be true in the way our users easily found the hours and catalog links on the prototype of our library website. the users on the old site knew where to look for this information because they were accustomed to navigating the old site. given that the prototype was a complete departure from the navigation and design of the old site, it was crucial that the labels and links were clear and understandable in the prototype or our design would fail. we made "hours" the first link under the "about" heading and "cuny+/books" the first link under the "find" heading, and as a result both our terminology and our structure were a success with participants. on the old website, users rarely used the libraries' online chat client. despite our efforts to remind students of its usefulness, the website did not place the link in a sufficiently visible location on the home page. on the old site, only 40 percent of participants located the link, as it was on the bottom left of the screen and easy to overlook. on the new site, in contrast, the "ask a librarian" link was prominently featured at the top of the screen. these results upheld the guiding principles of solid information architecture and understandable terminology. they also supported nielsen's assertion that "site design must be aimed at simplicity above all else, with as few distractions as possible and with a very clear information architecture and matching navigation tools."29 since the launch of the redesigned site, use of the questionpoint chat client has more than doubled. finding a journal article on a topic was always problematic for users of the old library website. the participants we tested were familiar with the site, and 80 percent erroneously clicked on "journal title list" when the more appropriate link would have been "databases" if they didn't have an exact journal title in mind. although we taught this in our information literacy courses, it was challenging getting the information across. in order to address this on the new site, "databases" was changed to "databases/articles" and categorized under the heading "find." the participants using the new site had greater success with the new terminology; 66 percent correctly chose "databases/articles." this question revealed an inconsistency with the guiding principles of understandable terminology and clear navigation systems on the old site. these issues were addressed by adding the word "articles" after "databases" on the new site to clarify what resources could be found in a database and also by placing the link under the heading "find" to further explain the action a student would be taking by clicking on the "databases/articles" link.
finding reference materials was challenging for the users of the old site as none of the participants clicked on the intended link "subject guides." in an effort to increase usage of the research guides, the library not only purchased the libguides tool, but also changed the wording of the link to "topic guides." as we neared the end of our study we observed that only one participant knew to click on the "topic guides" link for research assistance. the participants suggested calling it "research guides" instead of "topic guides," and we changed it. unfortunately, the usability study had been completed, and we were unable to further test the effectiveness of the rewording of this link. anecdotally, the rewording of this link appears to be more understandable to users, as the research guides are getting more usage (based on hit counts) than the previous guides. the rewording of these guides adhered to both principles of understandable terminology and user-centered design. these results supported nielsen's assertion that the most important material should be presented up front, using the inverted pyramid principle. "users should be able to tell at a glance what the page is about and what it can do for them."30 our results also supported the hhs report, which states that terminology "plays a large role in the user's ability to find and understand information. many terms are familiar to designers and content writers, but not to users."31 we concluded that rewording the link based on student feedback reduces the use of unfamiliar terminology. although librarians are "subject specialists" and "subject liaisons" and are familiar with those labels and that terminology, our students were looking for the word "research" instead of "subject," so they were not connecting with the library's libguides. as previously discussed, students of the old site thought the link "journal title list" would give them access to the library's database holdings. when asked to find a specific journal title, the correct answer to this question on the old site was "journal title list," with only 40 percent of the participants answering correctly. in another change to terminology on the new site, both links were placed under the heading "find," and, after testing of the first prototype, "journal title list" was changed to "list of journals and magazines." in the following tests 66 percent of the participants were able to answer correctly. the difference in the percentage of success in finding circulation policies between the old site and the prototype site was slight, only 7 percent. this can be attributed to the fact that participants on the old site could click on multiple links to get to the correct page, and they were familiar enough with the site to know that. in the prototype of the site there were several paths as well, some direct, some indirect. testing the wording of this link supported the understandable terminology principle more than the old website's "library policies" link did, yet to be true to our user-centered design principle, we needed to reword it once more. therefore, after the test was completed and the website was launched, we reworded the link to "checkout policies," which utilizes the same terminology that users are familiar with because they check out books at our checkout desk.
the remaining tasks, which consisted of finding books on reserve, magazines by title, library staff contact information, and branch information, were all met with higher success rates on the prototype site because in the redesign process the links were reworded to support the understandable terminology and user-centered design principles.

participant feedback: qualitative

the usability testing process informed the redesign of our website in many specific ways. if the layout of the site didn't test well with participants, we planned to create another prototype. in their evaluation of colorado state university libraries' digital collections and the western waters digital library websites, zimmerman and paschal describe the importance of first impressions of a website as the determining factor in whether users return to a website; if the impression is positive, they will return and continue to explore.32 when given an opportunity to give feedback on the design of the website, the participants commented:

• "there were no good library links at the bottom before and there wasn't the ask a librarian link either which i like a lot."
• "the old site was too difficult to navigate, new site has a lot of information, i like the different color schemes for the different things."
• "it is contemporary and has everything i need in front of me."
• "cool."
• "helpful."
• "straightforward."
• "the organization is easier for when you want to find things."
• "interactivity and rollovers make it easy to use."
• "intuitive, straight-forward and i like the simplicity of the colors."
• "more professional, more aesthetically pleasing than the old site."
• "the four menu options (about, find, services, help) break the information down easily."

additional research conducted by nathan, yeow, and murugesan claims that attractiveness (referring to the aesthetic appeal of a website) is the most important factor in influencing customer decision-making and affects the usability of the website.33 not only that, but users feel better when using a more attractive product. fortunately, the feedback from our participants revealed that the website was visually appealing, and the navigation scheme was clear and easy to understand.

other changes made to the libraries' website because of usability testing

participants commented that they expected to find library contact information on the bottom of the homepage, so the bottom of the screen was modified to include this information as well as a "contact us" link. participants did not realize that the "about," "find," "services," and "help" headings were also links, so we modified them so they were underlined when hovered over. there were also adjustments to the gray color bars on the top of the page because participants thought they were too bright, so they were darkened to make the labels easier to read. participants also commented that they wanted links to various public libraries in new york city under the "quick links" section of the homepage. we designed buttons for brooklyn public library, queens public library, and the new york public library and reordered this list to move these links closer to the top of the "quick links" section.

conclusion

conducting a usability study of hunter college libraries' existing website and the various stages of the redesigned website prototypes was instrumental in developing a user-centered design.
approaching the website redesign in stages, with guidance from iterative user testing and influenced by the participants' comments, gave the web librarian and the web committee an opportunity to incorporate the findings of the usability study into the design of the new website. rather than basing design decisions on assumptions about users' needs and information-seeking behaviors, we were able to incorporate what we'd learned from the library literature and the users' behavior into our evolving designs. this strategy resulted in a redesigned website that, with continued testing, user feedback, and updating, has aligned with the guiding principles we developed at the outset of the redesign project. the one unexpected outcome from this study is the discovery that, no matter how well a library website is designed, users will still need to be educated in how to use the site, with an emphasis on developing strong information literacy skills.

references

1. "the digital information seeker: report of the findings from selected oclc, rin, and jisc user behaviour projects," oclc research, ed. lynn silipigni-connaway and timothy dickey (2010): 6, www.jisc.ac.uk/publications/reports/2010/digitalinformationseekers.aspx.
2. judith emde, lea currie, frances a. devlin, and kathryn graves, "is 'good enough' ok? undergraduate search behavior in google and in a library database," university of kansas scholarworks (2008), http://hdl.handle.net/1808/3869; julia gross and lutie sheridan, "web scale discovery: the user experience," new library world 112, no. 5/6 (2011): 236, doi: 10.1108/03074801111136275.
3. ibid., 238.
4. jakob nielsen, designing web usability (indianapolis: new riders, 1999), 198.
5. ibid., 134.
6. ibid., 97.
7. barbara j. cockrell and elaine a. jayne, "how do i find an article? insights from a web usability study," journal of academic librarianship 28, no. 3 (2002): 123, doi: 10.1016/s0099-1333(02)00279-3.
8. steve krug, don't make me think! a common sense approach to web usability, 2nd ed. (berkeley, ca: new riders, 2006), 135.
9. tom ipri, michael yunkin, and jeanne brown, "usability as a method for assessing discovery," information technology & libraries 28, no. 4 (2009): 181, doi: 10.6017/ital.v28i4.3229.
10. brenda battleson, austin booth, and jane weintrop, "usability testing of an academic library web site: a case study," journal of academic librarianship 27, no. 3 (2001): 189–98, doi: 10.1016/s0099-1333(01)00180-x.
11. ibid.
12. jakob nielsen and marie tahir, "keep your users in mind," internet world 6, no. 24 (2000): 44.
13. steve krug, don't make me think! a common sense approach to web usability, 135.
14. research-based web design and usability guidelines, ed. ben schneiderman (washington: united states dept. of health and human services, 2006), 190.
15. laura manzari and jeremiah trinidad-christensen, "user-centered design of a web site for library and information science students: heuristic evaluation and usability testing," information technology & libraries 25, no. 3 (2006): 163, doi: 10.6017/ital.v25i3.3348.
16. battleson, booth, and weintrop, "usability testing of an academic library web site," 190.
17. ibid.
18. carole a. george, "usability testing and design of a library website: an iterative approach," oclc systems & services 21, no. 3 (2005): 178, doi: 10.1108/10650750510612371.
19. laura cobus, valeda dent, and anita ondrusek, "how twenty-eight users helped redesign an academic library web site," reference & user services quarterly 44, no. 3 (2005): 234–35.
20. susan mcmullen, "usability testing in a library web site redesign project," reference services review 29, no. 1 (2001): 13, doi: 10.1108/00907320110366732.
21. ibid.
22. john bauer-graham, jodi poe, and kimberly weatherford, "functional by design: a comparative study to determine the usability and functionality of one library's web site," technical services quarterly 21, no. 2 (2003): 34, doi: 10.1300/j124v21n02_03.
23. cockrell and jayne, "how do i find an article?," 123.
24. maaike van den haak, menno de jong, and peter jan schellens, "retrospective vs. concurrent think-aloud protocols: testing the usability of an online library catalogue," behavior & information technology 22, no. 5 (2003): 339.
25. battleson, booth, and weintrop, "usability testing of an academic library web site," 192.
26. dominique turnbow et al., "usability testing for web redesign: a ucla case study," oclc systems & services 21, no. 3 (2005): 231, doi: 10.1108/10650750510612416.
27. cobus, dent, and ondrusek, "how twenty-eight users helped redesign an academic library web site," 234.
28. krug, don't make me think! 59.
29. nielsen, designing web usability, 164.
30. ibid., 111.
31. schneiderman, research-based web design and usability guidelines, 160.
32. don zimmerman and dawn bastian paschal, "an exploratory evaluation of colorado state university libraries' digital collections and the western waters digital library web sites," journal of academic librarianship 35, no. 3 (2009): 238, doi: 10.1016/j.acalib.2009.03.011.
33. robert j. nathan, paul h. p. yeow, and sam murugesan, "key usability factors of service-oriented web sites for students: an empirical study," online information review 32, no. 3 (2008): 308, doi: 10.1108/14684520810889646.

appendix a. hunter college libraries' old website

appendix b. hunter college libraries' new website

appendix c. test participant profiles

participant | sex | academic standing | major | library instruction session? | how often in the library
1 | female | senior | history | yes | every day
2 | female | sophomore | psychology | no | every day
3 | male | junior | nursing | no | 1/week
4 | female | junior | studio art | no | 5/week
5 | female | senior | accounting | yes | 2–3/week
6 | male | freshman | undeclared | yes | 1/week
7 | female | freshman | undeclared | no | every day
8 | male | senior | music | yes | 3–4/week
9 | male | freshman | physics/english | no | every day
10 | female | senior | english lit/media studies | no | 1/week
11 | female | junior | fine arts/geography | yes | 2–3/week
12 | male | sophomore | computer science | yes | every day
13 | male | sophomore | econ/psychology | yes | 6 hours/week
14 | female | senior | math/econ | yes | 2–3/week
15 | female | senior | art | yes | every day
16 | male | n/a* | pre-nursing | no | daily
17 | female | senior** | econ | didn't remember | 3/week
18 | male | senior | pre-med | yes | 2/week
19 | female | grad | art history | yes | 3/week
20 | male | grad | education (tesol) | no | every day

note: *this student was at hunter fulfilling prerequisites and already had a bachelor of arts degree from another college. **this student had just graduated.

appendix d. test questions/tasks

• what is the first thing you noticed (or looked at) when you launched the hunter libraries homepage?
• what's the second?
• if your instructor assigned the book to kill a mockingbird what link would you click on to see if the library owns that book?
• when does the library close on wednesday night?
• if you have a problem researching a paper topic and are at home, where would you go to get help from a librarian?
• where would you click if you needed to find two journal articles on "homelessness in america"?
• you have to write your first sociology paper and wanted to know what databases, journals, and web sites would be good resources for you to begin your research. where would you click?
• does hunter library subscribe to the e-journal journal of communication?
• how long can you check out a book for?
• how would you find items on reserve for professor doyle's liibr100 class?
• does hunter library have the latest issue of rolling stone magazine?
• what is the e-mail for louise sherby, dean of libraries?
• what is the phone number for the social work library?
• you are looking for a guide to grammar and writing on the web, does the library's webpage have a link to such a guide?
• your friend is a hunter student who lives near brooklyn college. she says that she may return books she borrowed from the brooklyn college library to hunter library. is she right? where would you find out?
• this website is easy to navigate (agree, agree somewhat, disagree somewhat, disagree)?
• this website uses too much jargon (agree, agree somewhat, disagree somewhat, disagree)?
• i use the hunter library's website (agree, agree somewhat, disagree somewhat, disagree)?

martha m. yee (myee@ucla.edu) is cataloging supervisor at the university of california, los angeles film and television archive.

martha m. yee

can bibliographic data be put directly onto the semantic web?

this paper is a think piece about the possible future of bibliographic control; it provides a brief introduction to the semantic web and defines related terms, and it discusses granularity and structure issues and the lack of standards for the efficient display and indexing of bibliographic data. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of more frbrized cataloging rules than those about to be introduced to the library community (resource description and access) and in creating an rdf data model for the rules.
i am now in the process of trying to model my cataloging rules in the form of an rdf model, which can also be inspected at http://myee.bol.ucla.edu/. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is sophisticated enough yet to deal with our data. this article is an attempt to identify some of those areas and explore whether or not the problems i have encountered are soluble—in other words, whether or not our data might be able to live on the semantic web. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work.

this paper is a think piece about the possible future of bibliographic control; as such, it raises more complex questions than it answers. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of frbrized descriptive and subject-cataloging rules. here my focus will be on the data model rather than on the frbrized cataloging rules for gathering data to put in the model, although i hope to have more to say about the latter in the future. the intent is not to present you with conclusions but to present some questions about data modeling that have arisen in the course of the experiment. my premise is that decisions about the data model we follow in the future should be made openly and as a community rather than in a small, closed group of insiders. if we are to move toward the creation of metadata that is more interoperable with metadata being created outside our community, as is called for by many in our profession, we will need to address these complex questions as a community following a period of deep thinking, clever experimentation, and astute political strategizing.

■ the vision

the semantic web is still a bewitching midsummer night's dream. it is the idea that we might be able to replace the existing html–based web consisting of marked-up documents—or pages—with a new rdf–based web consisting of data encoded as classes, class properties, and class relationships (semantic linkages), allowing the web to become a huge shared database. some call this web 3.0, with hyperdata replacing hypertext. embracing the semantic web might allow us to do a better job of integrating our content and services with the wider internet, thereby satisfying the desire for greater data interoperability that seems to be widespread in our field. it also might free our data from the proprietary prisons in which it is currently held and allow us to cooperate in developing open-source software to index and display the data in much better ways than we have managed to achieve so far in vendor-developed ils opacs or in giant, bureaucratic bibliographic empires such as oclc worldcat. the semantic web also holds the promise of allowing us to make our work more efficient. in this bewitching vision, we would share in the creation of uniform resource identifiers (uris) for works, expressions, manifestations, persons, corporate bodies, places, subjects, and so on. at the uri would be found all of the data about that entity, including the preferred name and the variant names, but also including much more data about the entity than we currently put into our work (name-title and title), such as personal name, corporate name, geographic, and subject authority records.
if any of that data needed to be changed, it would be changed only once, and the change would be immediately accessible to all users, libraries, and library staff by means of links down to local data such as circulation, acquisitions, and binding data. each work would need to be described only once at one uri, each expression would need to be described only once at one uri, and so forth. very much up in the air is the question of what institutional structures would support the sharing of the creation of uris for entities on the semantic web. for the data to be reliable, we would need to have a way to ensure that the system would be under the control of people who had been educated about the value of clean and accurate entity definition, the value of choosing "most commonly known" preferred forms (for display in lists of multiple different entities), and the value of providing access under all variant forms likely to be sought. at the same time, we would need a mechanism to ensure that any interested members of the public could contribute to the effort of gathering variants or correcting entity definitions when we have had inadequate information. for example, it would be very valuable to have the input of a textual or descriptive bibliographer applied to difficult questions concerning particular editions, issues, and states of a significant literary work. it would also be very valuable to be able to solicit input from a subject expert in determining the bounds of a concept entity (subject heading) or class entity (classification).

■ the experiment (my project)

to explore these bewitching ideas, i have been conducting an experiment. as part of my experiment, i designed a set of cataloging rules that are more frbrized than is rda in the sense that they more clearly differentiate between data applying to expression and data applying to manifestation. note that there is an underlying assumption in both frbr (which defines expression quite differently from manifestation) and on my part, namely that catalogers always know whether a given piece of data applies at either the expression or the manifestation level. that assumption is open to questioning in the process of the experiment as well. my rules also call for creating a more hierarchical and degressive relationship between the frbr entities work, expression, manifestation, and item, such that data pertaining to the work does not need to be repeated for every expression, data pertaining to the expression does not need to be repeated for every manifestation, and so forth. degressive is an old term used by bibliographers for bibliographies that provide great detail about first editions and less detail for editions after the first. i have adapted this term to characterize my rules, according to which the cataloger begins by describing the work; any details that pertain to all expressions and manifestations of the work are not repeated in the expression and manifestation descriptions. this paper would be entirely too long if i spent any more time describing the rules i am developing, which can be inspected at http://myee.bol.ucla.edu. here, i would like to focus on the data-modeling process and the questions about the suitability of rdf and the semantic web for encoding our data. (by the way, i don't seriously expect anyone to adopt my rules!
they are radically different than the rules currently being applied and would represent a revolution in cataloging practice that we may not be up to undertaking in the current economic climate. their value lies in their thought-experiment aspect and their ability to clarify what entities we can model and what entities we may not be able to model.) i am now in the process of trying to model my cataloging rules in the form of an rdf model ("rdf" as used in this paper should be considered from now on to encompass rdf schema [rdfs], web ontology language [owl], and simple knowledge organization system [skos] unless otherwise stated); this model can also be inspected at http://myee.bol.ucla.edu. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is yet sophisticated enough to deal with our data. this article is an attempt to outline some of those areas and explore whether the problems i have encountered are soluble, in other words, whether or not our data might be able to live on the semantic web eventually. i have already heard from rdf experts bruce d'arcus (miami university) and rob styles (developer at talis, a semantic web technology company), whom i cite later, but through this article i hope to reach a larger community. my research questions can be found later, but first some definitions.

■ definition of terms

the semantic web is a way to represent knowledge; it is a knowledge-representation language that provides ways of expressing meaning that are amenable to computation; it is also a means of constructing knowledge-domain maps consisting of class and property axioms with a formal semantics.

rdf is a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats; an rdf metadata model is based on making statements about resources in the form of triples that consist of
1. the subject of the triple (e.g., "new york");
2. the predicate of the triple that links the subject and the object (e.g., "has the postal abbreviation"); and
3. the object of the triple (e.g., "ny").
xml is commonly used to express rdf, but it is not a necessity; it can also be expressed in notation 3 or n3, for example.1

rdfs is an extensible knowledge-representation language that provides basic elements for the description of ontologies, also known as rdf vocabularies. using rdfs, statements are made about resources in the form of
1. a class (or entity) as subject of the rdf triple (e.g., "new york");
2. a relationship (or semantic linkage) as predicate of the rdf triple that links the subject and the object (e.g., "has the postal abbreviation"); and
3. a property (or attribute) as object of the rdf triple (e.g., "ny").

owl is a family of knowledge representation languages for authoring ontologies compatible with rdf.

skos is a family of formal languages built upon rdf and designed for representation of thesauri, classification schemes, taxonomies, or subject-heading systems.

■ research questions

actually, the full-blown semantic web may not be exactly what we need. remember that the fundamental definition of the semantic web is "a way to represent knowledge." the semantic web is a direct descendant of the attempt to create artificial intelligence, that is, of the attempt to encode enough knowledge of the real world to allow a computer to reason about reality in a way indistinguishable from the way a human being reasons.
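the triple structure defined above ("new york"—subject, "has the postal abbreviation"—predicate, "ny"—object) can be made concrete with a small sketch. the following fragment is a minimal illustration using the python rdflib library; the example.org namespace and the postalAbbreviation property name are invented for illustration only and are not part of any established vocabulary.

# a minimal sketch of one rdf triple using the rdflib library (assumed installed);
# ex:postalAbbreviation is a made-up property, not an established vocabulary term.
from rdflib import Graph, URIRef, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

new_york = URIRef("http://example.org/place/new-york")   # subject
g.add((new_york, EX.postalAbbreviation, Literal("NY")))  # predicate and object

# the same statement can be serialized as rdf/xml or as notation 3 (n3)
print(g.serialize(format="xml"))
print(g.serialize(format="n3"))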
one of the research questions should probably be whether or not the technology developed to support the semantic web can be used to represent information rather than knowledge. fortunately, we do not need to represent all of human knowledge—we simply need to describe and index resources to facilitate their retrieval. we need to encode facts about the resources and what the resources discuss (what they are "about"), not facts about "reality." based on our past experience, doing even this is not as simple as people think it is. the question is whether we could do what we need to do within the context of the semantic web. sometimes things that sound simple do not turn out to be so simple in the doing. my research questions are as follows:
1. is it possible for catalogers to tell in all cases whether a piece of data pertains to the frbr expression or the frbr manifestation?
2. is it possible to fit our data into rdf? given that rdf was designed to encode knowledge rather than information, perhaps it is the wrong technology to use for our purposes?
3. if it is possible to fit our data into rdf, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (i.e., providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?
as stated previously, i am not yet ready to answer these questions. i hope to find answers in the course of developing the rules and the model. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work.

■ other relevant projects

other relevant projects include the following:
1. frbr, functional requirements for authority data (frad), functional requirements for subject authority records (frsar), and frbr-object-oriented (frbroo). all are attempts to create conceptual models of bibliographic entities using an entity-relationship model that is very similar to the class-property model used by rdf.2
2. various initiatives at the library of congress (lc), such as lc subject headings (lcsh) in skos,3 the lc name authority file in skos,4 the lccn permalink project to create persistent uris for bibliographic records,5 and initiatives to provide skos representations for vocabularies and data elements used in marc, premis, and mets. these all represent attempts to convert our existing bibliographic data into uris that stand for the bibliographic entities represented by bibliographic records and authority records; the uris would then be available for experiments in putting our data directly onto the semantic web.
3. the dc-rda task group project to put rda data elements into rdf.6 as noted previously and discussed further later, rda is less frbrized than my cataloging rules, but otherwise this project is very similar to mine.
4. dublin core's (dc's) work on an rdf schema.7 dublin core is very focused on manifestation and does not deal with expressions and works, so it is less similar to my project than is the dc-rda task group's project (see further discussion later).

■ why my project?

one might legitimately ask why there is a need for a different model than the ones already provided by frbr, frad, frsar, frbroo, rda, and dc.
the frbr and rda models are still tied to the model that is implicit in our current bibliographic data in which expression and manifestation are undifferentiated. this is because publishers publish and libraries acquire and shelve manifestations. in our current bibliographic practice, a new bibliographic record is made for either a new manifestation or a new expression. thus, in effect, there is no way for a computer to tell one from the other in our current data. despite the fact that frbr has good definitions of expression (change in content) and manifestation (mere change in carrier), it perpetuates the existing implicit model in its mapping of attributes to entities. for example, frbr maps the following to manifestation: edition statements ("2nd rev. ed."); statements of responsibility that identify translators, editors, and illustrators; physical description statements that identify illustrated editions; and extent statements that differentiate expressions (the 102-minute version vs. the 89-minute version); etc. thus the frbr definition of expression recognizes that a 2nd revised edition is a new expression, but frbr maps the edition statement to manifestation. in my model, i have tried to differentiate more cleanly data applying to expressions from data applying to manifestations.8 frbr and rda tend to assume that our current bibliographic data elements map to one and only one group 1 entity or class. there are exceptions, such as title, which frbr and rda define at work, expression, and manifestation levels. however, there is a lack of recognition that, to create an accurate model of the bibliographic universe, more data elements need to be applied at the work and expression level in addition to (or even instead of) the manifestation level. in the appendix i have tried to contrast the frbr, frad, and rda models with mine. in my model, many more data elements (properties and attributes) are linked to the work and expression level. after all, if the expression entity is defined as any change in work content, the work entity needs to be associated with all content elements that might change, such as the original extent of the work, the original statement of responsibility, whether illustrations were originally present, whether color was originally present in a visual work, whether sound was originally present in an audiovisual work, the original aspect ratio of a moving image work, and so on. frbr also tends to assume that our current data elements map to one and only one entity. in working on my model, i have come to the conclusion that this is not necessarily true. in some cases, a data element pertaining to a manifestation also pertains to the expression and the work. in other cases, the same data element is specific to that manifestation, and, in other cases, the same data element is specific to its expression. this is true of most of the elements of the bibliographic description. frad, in attempting to deal with the fact that our current cataloging rules allow a single person to have several bibliographic identities (or pseudonyms), treats person, name, and controlled access point as three separate entities or classes. i have tried to keep my model simpler and more elegant by treating only person as an entity, with preferred name and variant name as attributes or properties of that entity.
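a minimal sketch may make this last modeling choice concrete. the following fragment, again using the python rdflib library, treats the person as the only entity (one uri), with the preferred and variant names recorded as simple properties of that entity rather than as entities in their own right, as frad would have them; the skos labels and the uri are stand-ins chosen for illustration, not the properties actually defined in my model.

# a minimal, illustrative sketch: one uri for the person; preferred and variant
# names are plain properties of that entity (skos labels used only as stand-ins).
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import SKOS

g = Graph()
g.bind("skos", SKOS)

person = URIRef("http://example.org/person/mark-twain")  # invented uri
g.add((person, SKOS.prefLabel, Literal("Twain, Mark, 1835-1910")))
g.add((person, SKOS.altLabel, Literal("Clemens, Samuel Langhorne, 1835-1910")))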
frbroo is focused on the creation process for works, with special attention to the creation of unique works of art and other one-off items found in museums. thus frbroo tends to neglect the collocation of the various expressions that develop in the history of a work that is reproduced and published, such as translations, abridged editions, editions with commentary, etc. dc has concentrated exclusively on the description of manifestations and has neglected expression and work altogether. one of the tenets of semantic web development is that, once an entity is defined by a community, other communities can reuse that entity without defining it themselves. the very different definitions of the work and expression entities in the different communities described above raise some serious questions about the viability of this tenet.

■ assumptions

it should be noted that this entire experiment is based on two assumptions about the future of human intervention for information organization. these two assumptions are based on the even bigger assumption that, even though the internet seems to be an economy based on free intellectual labor, and, even though human intervention for information organization is expensive (and therefore at more risk than ever), human intervention for information organization is worth the expense.

■ assumption 1: what we need is not artificial intelligence, but a better human–machine partnership such that humans can do all of the intellectual labor and machines can do all of the repetitive clerical labor. currently, catalogers spend too much time on the latter because of the poor design of current systems for inputting data. the universal employment provided by paying humans to do the intellectual labor of building the semantic web might be just the stimulus our economy needs.

■ assumption 2: those who need structured and granular data—and the precise retrieval that results from it—to carry out research and scholarship may constitute an elite minority rather than most of the people of the world (sadly), but that talented and intelligent minority is an important one for the cultural and technological advancement of humanity. it is even possible that, if we did a better job of providing access to such data, we might enable the enlargement of that minority.

■ granularity and structure issues

as soon as one starts to create a data model, one encounters granularity or cataloger-data parsing issues. these issues have actually been with us all along as we developed the data model implicit in aacr2r and marc 21. those familiar with rda, frbr, and frad development will recognize that much of that development is directed at increasing structure and granularity in cataloger-produced data to prepare for moving it onto the semantic web. however, there are clear trade-offs in an increase in structure and granularity. more structure and more granularity make possible more powerful indexing and more sophisticated display, but more structure and more granularity are more complex and expensive to apply and less likely to be implemented in a standard fashion across all communities; that is, it is less likely that interoperable data would be produced.
any switching or mapping that was employed to create interoperable data would produce the lowest common denominator (the simplest and least granular data), and once rendered interoperable, it would not be possible for that data to swim back upstream to regain its lost granularity. data with less structure and less granularity could be easier and cheaper to apply and might have the potential to be adopted in a more standard fashion across all communities, but that data would limit the degree to which powerful indexing and sophisticated display would be possible. take the example of a personal name: currently, we demarcate surname from forename by putting the surname first, followed by a comma and then the forename. even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is surname and which part is forename in a culture unfamiliar to the cataloger. in other words, the more granularity you desire in your data, the more often the people collecting the data are going to encounter ambiguous situations. another example: currently, we do not collect information about gender self-identification; if we were to increase the granularity of our data to gather that information, we would surely encounter situations in which the cataloger would not necessarily know if a given creator was self-defined as a female or a male or of some other gender identity. presently, if we are adding a birth and death date, whatever dates we use are all together in a $d subfield without any separate coding to indicate which date is the birth date and which is the death date (although an occasional “b.” or “d.” will tell us this kind of information). we could certainly provide more granularity for dates, but that would make the marc 21 format much more complex and difficult to learn. people who dislike the marc 21 format already argue that it is too granular and therefore requires too much of a learning curve before people can use it. for example, tennant claims that “there are only two kinds of people who believe themselves able to read a marc record without referring to a stack of manuals: a handful of our top catalogers and those on serious drugs.”9 how much of the granularity already in marc 21 is used either in existing records or, even if present, is used in indexing and display software? granularity costs money, and libraries and archives are already starving for resources. granularity can only be provided by people, and people are expensive. granularity and structure also exist in tension with each other. more granularity can lead to less structure (or more complexity to retain structure along with granularity). in the pursuit of more granularity of data than we have now, rda, attempting to support rdf–compliant xml encoding, has been atomizing data to make it useful to computers, but this will not necessarily make the data more useful to humans. to be useful to humans, it must be possible to group and arrange (sort) the data meaningfully, both for indexing and for display. the developers of skos refer to the “vast amounts of unstructured (i.e., human readable) information in the web,”10 yet labeling bits of data as to type and recording semantic relationships in a machine-actionable way do not necessarily provide the kind of structure necessary to make data readable by humans and therefore useful to the people the web is ultimately supposed to serve. consider the case of music instrumentation. 
if you have a piece of music for five guitars and one flute, and you simply code number and instrumentation without any way to link "five" with "guitars" and "one" with "flute," you will not be able to guarantee that a person looking for music for five flutes and one guitar will not be given this piece of music in their results (see figure 1).11

figure 1a. extract from yee rdf model that illustrates one technique for modeling musical instrumentation at the expression level (using a blank node to group repeated number and instrument type)

figure 1b. example of encoding of musical instrumentation at the expression level based on the above model (5 guitars; 1 flute; instrumentation of musical expression; original instrumentation of musical expression—number of a particular instrument; original instrumentation of musical expression—type of instrument)

the more granular the data, the less the cataloger can build order, sequencing, and linking into the data; the coding must be carefully designed to allow the desired order, sequencing, and linking for indexing and display to be possible, which might call for even more complex coding. it would be easy to lose information about order, sequencing, and linking inadvertently. actually, there are several different meanings for the term structure:
1. structure is an object of a record (structure of document?); for example, elings and waibel refer to "data fields . . . also referred to as elements . . . which are organized into a record by a data structure."12
2. structure is the communications layer, as opposed to the display layer or content designation.13
3. structure is the record, field, and subfield.
4. structure is the linking of bits of data together in the form of various types of relationships.
5. structure is the display of data in a structured, ordered, and sequenced manner to facilitate human understanding.
6. data structure is a way of storing data in a computer so that it can be used efficiently (this is how computer programmers use the term).
i hasten to add that i am definitely in favor of adding more structure and granularity to our data when it is necessary to carry out the fundamental objectives of our profession and of our catalogs. i argued earlier that frbr and rda are not granular enough when it comes to the distinction between data elements that apply to expression and those that apply to manifestation. if we could just agree on how to differentiate data applying to the manifestation from data applying to the expression instead of our current practice of identifying works with headings and lumping all manifestation and expression data together, we could increase the level of service we are able to provide to users a thousandfold. however, if we are not going to commit to differentiating between expression and manifestation, it would be more intellectually honest for frbr and rda to take the less granular path of mapping all existing bibliographic data to manifestation and expression undifferentiated, that is, to use our current data model unchanged and state this openly. i am not in favor of adding granularity for granularity's sake or for the sake of vague conceptions of possible future use. granularity is expensive and should be used only in support of clear and fundamental objectives.
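the blank-node technique shown in figure 1 can also be sketched in code. the following fragment, using the python rdflib library, is a minimal illustration of grouping each number with its instrument type so that "five" stays attached to "guitars" and "one" to "flute"; the uris and property names are invented stand-ins for the properties in my model, not established vocabulary.

# a minimal, illustrative sketch of the blank-node grouping shown in figure 1;
# the ycr: properties and the expression uri are made up for this example.
from rdflib import Graph, URIRef, Literal, BNode, Namespace

YCR = Namespace("http://example.org/ycr/")
g = Graph()
g.bind("ycr", YCR)

expression = URIRef("http://example.org/expression/piece-for-five-guitars-and-flute")

for number, instrument in [(5, "guitar"), (1, "flute")]:
    grouping = BNode()  # one anonymous node per number-instrument pairing
    g.add((expression, YCR.instrumentation, grouping))
    g.add((grouping, YCR.numberOfInstruments, Literal(number)))
    g.add((grouping, YCR.instrumentType, Literal(instrument)))

print(g.serialize(format="n3"))

with this grouping in place, a search for five flutes would have to match number and type within the same grouping node, rather than anywhere in the description, which is exactly the linkage the prose above argues can be lost when data are atomized.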
■ the goal: efficient displays and indexes

my main concern is that we model and then structure the data in a way that allows us to build the complex displays that are necessary to make catalogs appear simple to use. i am aware that the current orthodoxy is that recording data should be kept completely separate from indexing and display ("the applications layer"). because i have spent my career in a field in which catalog records are indexed and displayed badly by systems people who don't seem to understand the data contained in them, i am a skeptic. it is definitely possible to model and structure data in such a way that desired displays and indexes are impossible to construct. i have seen it happen! the lc working group report states that "it will be recognized that human users and their needs for display and discovery do not represent the only use of bibliographic metadata; instead, to an increasing degree, machine applications are their primary users."14 my fear is that the underlying assumption here is that users need to (and can) retrieve the single perfect record. this will never be true for bibliographic metadata. users will always need to assemble all relevant records (of all kinds) as precisely as possible and then browse through them before making a decision about which resources to obtain. this is as true in the semantic web—where "records" can be conceived of as entity or class uris—as it is in the world of marc–encoded metadata. some of the problems that have arisen in the past in trying to index bibliographic metadata for humans are connected to the fact that existing systems do not group all of the data related to a particular entity effectively, such that a user can use any variant name or any combination of variant names for an entity and do a successful search. currently, you can only look for a match among two or more keywords within the bounds of a single manifestation-based bibliographic record or within the bounds of a single heading, minus any variant terms for that entity. thus, when you do a keyword search for two keywords, for example, "clemens" and "adventures," you will retrieve only those manifestations of mark twain's adventures of tom sawyer that have his real name (clemens) and the title word "adventures" co-occurring within the bounded space created by a single manifestation-based bibliographic record. instead, the preferred forms and the variant forms for a given entity need to be bounded for indexing such that the keywords the user employs to search for that entity can be matched using co-occurrence rules that look for matches within a single bounded space representing the entity desired. we will return to this problem in the discussion of issue 3 in the later section "rdf problems encountered." the most complex indexing problem has always proven to be the grouping or bounding of data related to a work, since it requires pulling in all variants for the creator(s) of that work as well. otherwise, a user who searches for a work using a variant of the author's name and a variant of the title will continue to fail (as they do in all current opacs), even when the desired work exists in the catalog.
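the kind of entity-bounded co-occurrence matching described here (and developed further in the tom sawyer example that follows) can be sketched in a few lines of python. the work uri, the variant names, and the matching rule below are illustrative assumptions rather than an implementation of any existing system; the point is only that every keyword in the search must co-occur within the pool of variants bounded by one work entity, rather than within one manifestation record.

# a minimal, illustrative sketch of entity-bounded keyword co-occurrence:
# all variant creator and title keywords for a work are pooled under the work's
# uri, and a search matches only if every keyword occurs somewhere in that pool.
work_entities = {
    "http://example.org/work/adventures-of-tom-sawyer": {
        "creator_variants": ["Mark Twain", "Samuel Langhorne Clemens"],
        "title_variants": ["The Adventures of Tom Sawyer", "Tom Sawyer"],
    },
}

def bounded_match(query, entity):
    """true if every keyword in the query occurs in the entity's bounded pool
    of creator and title variants."""
    pool = " ".join(entity["creator_variants"] + entity["title_variants"]).lower()
    return all(keyword in pool for keyword in query.lower().split())

# "clemens adventures" retrieves the work even though no single
# manifestation record contains both keywords.
hits = [uri for uri, entity in work_entities.items()
        if bounded_match("clemens adventures", entity)]
print(hits)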
if we could create a uri for the adventures of tom sawyer that included all variant names for the author and all variant titles for the work (including the variant title tom sawyer), the same keyword search described above ("clemens" and "adventures") could be made to retrieve all manifestations and expressions of the adventures of tom sawyer, instead of the few isolated manifestations that it would retrieve in current catalogs. we need to make sure that we design and structure the data such that the following displays are possible:
■ display all works by this author in alphabetical order by title with the sorting element (title) appearing at the top of each work displayed.
■ display all works on this subject in alphabetical order by principal author and title (with principal author and title appearing at top of each work displayed), or title if there is no principal author (with title appearing at top of each work displayed).
we must ensure that we design and structure the data in such a way that our structure allows us to create subgroups of related data, such as instrumentation for a piece of music (consisting of a number associated with each particular instrument), place and related publisher for a certain span of dates on a serial title change record, and the like.

■ which standards will carry out which functions?

currently, we have a number of different standards to carry out a number of different functions; we can speculate about how those functions might be allocated in a new semantic web–based dispensation, as shown in table 1. in table 1, data structure is taken to mean what a record represents or stands for; traditionally, a record has represented an expression (in the days of
all of these goals only can be met if complex, high-quality displays can be built from the data created according to the data model. indexing rules (see table 1) were once under the control of catalogers (in book and card catalogs) in that users had to navigate through headings and cross-references to find table 1. possible reallocation of current functions in a new semantic web–based dispensation function current future? data content, or content guidelines (rules for providing data in a particular element) defined by aacr2r and marc 21 defined by rda and rdf/rdfs/ owl/skos data elements defined by isbd–based aacr2r and marc 21 defined by rda and rdf/rdfs/ owl/skos data values defined by lc/naco authority file, lcsh, marc 21 coded data values, etc. defined as ontologies using rdf/ rdfs/owl/skos encoding or labeling of data elements for machine manipulation; same as data format? defined by iso 2709–based marc 21 defined by rdf/rdfs/xml data structure (i.e., what a record stands for) defined by aacr2r and marc 21; also frbr? defined by rdf/rdfs/owl/ skos schematization (constraint on structure and content) marc 21, mods, dcmi abstract model defined by rdf/rdfs/owl/ skos encoding of facts about entity relationships carried out by matching data value strings (headings found in lc/naco authority file and lcsh, issn’s, and the like) carried out by rdf/rdfs/owl/ skos in the form of uri links display rules ils software, formerly isbd– based aacr2r (“application layer”) or yee rules indexing rules ils software sparql, “application layer,” or yee rules can bibliographic data be put directly onto the semantic web? | yee 63 what they wanted; currently indexing is in the hands of system designers who prefer to provide keyword indexing of bibliographic (i.e., manifestation-based) records rather than provide users with access to the entities they are really interested in (works, authors and subjects), all represented currently by authority records for headings and cross-references. rda abdicates responsibility, pushing indexing concerns completely out of the cataloging rules. the general principle on the web is to allow resources to be indexed by any web search engines that wish to index them. current web data is not structured at all for either indexing or display. i would argue that our interest in the semantic web should be focused on whether or not it will support more data structure—as well as more logic in that data structure—to support better indexes and better displays than we have now in manifestation-based ils opacs. crucial to better indexing than we have ever had before are the co-occurrence rules for keyword indexing, that is, the rules for when a co-occurrence of two or more keywords should produce a match. we need to be able to do a keyword search across all possible variant names for the entity of interest, and the entity of interest for the average catalog user is much more likely to be a particular work than to be a particular manifestation. unfortunately, catalog-use studies only have studied so-called known-item searches without investigating whether a known-item searcher was looking for a particular edition or manifestation of a work or was simply looking for a particular work in order to make a choice as to edition or manifestation once the work was found. however, common sense tells us that it is a rare user who approaches the catalog with prior knowledge about all published editions of a given work. 
notice in table 1 the unifying effect that rdf could potentially have; it could free us from the use of multiple standards that can easily contradict each other, or at least not live peacefully together. examples are not hard to find in the current environment. one that has cropped up in the course of rda development concerns family names. presently the rules for naming families are different depending on whether the family is the subject of a work (and established according to lcsh) or whether the family is responsible for a collection of papers (and established according to rda).
■■ types of data
rda has blurred the distinctions among certain types of data, apparently because there is a perception that on the semantic web the same piece of data needs to be coded only once, and all indexing and display needs can be supported from that one piece of data. i question that assumption on the basis of my experience with bibliographic cataloging. all of the following ways of encoding the same piece of data can still have value in certain circumstances:
■■ transcribed; in rdf terms, a literal (i.e., any data that is not a uri, a constant value). transcribed data is data copied from an item being cataloged. it is valuable for providing access to the form of the name used on a title page and is particularly useful for people who use pseudonyms, corporate bodies that change name, and so on. transcribed data is an important part of the historical record and not just for off-line materials; it can be a historical record of changing data on notoriously fluid webpages.
■■ composed; in rdf terms, also a literal. composed data is information composed by a cataloger on the basis of observation of the item in hand; it can be valuable for historical purposes to know which data was composed.
■■ supplied; in rdf terms, also a literal. supplied data is information supplied by a cataloger from outside sources; it can be valuable for historical purposes to know which data was supplied and from which outside sources it came.
■■ coded; in rdf, represented by a uri. coded data would likely transform on the semantic web into links to ontologies that could provide normalized, human-readable identification strings on demand, thus causing coded and normalized data to merge into one type of data. is it not possible, though, that the coded form of normalized data might continue to provide for more efficient searching for computers as opposed to humans? coded data also has great cross-cultural value, since it is not as language-dependent as literals or normalized headings.
■■ normalized headings (controlled headings); in rdf, represented by a uri. normalized or controlled headings are still necessary to provide users with coherent, ordered displays of thousands of entities that all match the user's search for a particular entity (work, author, subject, etc.). the reason google displays are so hideous is that, so far, the data searched lacks any normalized display data. if variant language forms of the name for an entity are linked to an entity uri, it should be possible to supply headings in the language and script desired by a particular user.
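a small sketch of that last point, assuming nothing more than skos preferred labels with language tags attached to a hypothetical entity uri (python/rdflib; the uri and the labels are illustrative only):

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")   # hypothetical namespace
g = Graph()
person = EX["person/tolstoy-leo-1828-1910"]

# one uri, with preferred labels in more than one language and script
g.add((person, SKOS.prefLabel, Literal("Tolstoy, Leo, graf, 1828-1910", lang="en")))
g.add((person, SKOS.prefLabel, Literal("Толстой, Лев Николаевич, 1828-1910", lang="ru")))
g.add((person, SKOS.prefLabel, Literal("Tolstoï, Léon, 1828-1910", lang="fr")))

def heading_for(graph, entity, preferred_lang):
    """pick the display heading in the user's language, falling back to any label."""
    fallback = None
    for label in graph.objects(entity, SKOS.prefLabel):
        if label.language == preferred_lang:
            return str(label)
        fallback = str(label)
    return fallback

print(heading_for(g, person, "ru"))   # shows the cyrillic form to a russian-speaking user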
■■ the rdf model
those who have become familiar with frbr over the years will probably not find it too difficult to transition from the frbr conceptual model to the rdf model. what frbr calls an "entity," rdf calls a "subject" and rdfs calls a "class." what frbr calls an "attribute," rdf calls an "object" and rdfs calls a "property." what frbr calls a "relationship," rdf calls a "predicate" and rdfs calls a "relationship" or a "semantic linkage" (see table 2).
table 2. the frbr conceptual model translated into rdf and rdfs
■■ frbr entity: rdf subject; rdfs class.
■■ frbr attribute: rdf object; rdfs property.
■■ frbr relationship: rdf predicate; rdfs relationship/semantic linkage.
the difficulty in any data-modeling exercise lies in deciding what to treat as an entity or class and what to treat as an attribute or property. the authors of frbr decided to create a class called expression to deal with any change in the content of a work. when frbr is applied to serials, which change content with every issue, the model does not work well. in my model, i found it useful to create a new entity at the manifestation level, the serial title, to deal with the type of change that is more relevant to serials, the change in title. i also created another new entity at the manifestation level, title-manifestation, to deal with a change of title in a nonserial work that is not associated with a change in content. one hundred years ago, this entity would have been called title-edition. i am also in the process of developing an entity at the expression level—surrogate—to deal with reproductions of original artworks that need to inherit the qualities of the original artwork they reproduce without being treated as an edition of that original artwork, which ipso facto is unique. these are just examples of cases in which it is not that easy to decide on the classes or entities that are necessary to accurately model bibliographic information. see the appendix for a complete comparison of the classes and entities defined in four different models: frbr, frad, rda, and the yee cataloging rules (ycr). the appendix also shows variation among these models concerning whether a given data element is treated as a class/entity or as an attribute/property. the most notable examples are name and preferred access point, which are treated as classes/entities in frad, as attributes in frbr and ycr, and as both in rda.
■■ rdf problems encountered
my goal for this paper is to institute discussion with data modelers about which problems i observed are insoluble and which are soluble:
1. is there an assumption on the part of semantic web developers that a given data element, such as a publisher name, should be expressed as either a literal or using a uri (i.e., controlled), but never both? cataloging is rooted in humanistic practices that require careful recording of evidence. there will always be value in distinguishing and labeling the following types of data:
■■ copied as is from an artifact (transcribed)
■■ supplied by a cataloger
■■ categorized by a cataloger (controlled)
tim berners-lee (the father of the world wide web and the semantic web) emphasizes the importance of recording not just data but also its provenance for the sake of authenticity.15 for many data elements, therefore, it will be important to be able to record both a literal (transcribed or composed form or both) and a uri (controlled form). is this a problem in rdf?
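rdf itself does allow a subject to carry both a literal-valued property and a uri-valued property, so one possible pattern, sketched below with invented example.org property names, is simply to record the transcribed form and the controlled form under two different properties; how provenance would then be layered on top (named graphs and the w3c prov vocabulary are the usual candidates) is a separate modeling choice that this sketch does not settle:

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")   # hypothetical namespace and property names
g = Graph()
manifestation = EX["manifestation/tom-sawyer-1876"]
publisher = EX["corporatebody/american-publishing-company"]

# transcribed form, copied as is from the item: a literal
g.add((manifestation, EX.publisherStatementTranscribed,
       Literal("The American Publishing Company")))

# controlled form: a uri for the publisher entity, which carries its own labels
g.add((manifestation, EX.publisher, publisher))
g.add((publisher, SKOS.prefLabel, Literal("American Publishing Company")))

print(g.serialize(format="turtle"))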
as a corollary, if any data that can be given a uri cannot also be represented by a literal (transcribed and composed data, or one or the other), it may not be possible to design coherent, readable displays of the data describing a particular entity. among other things, cataloging is a discursive writing skill. does rdf require that all data be represented only once, either by a literal or by a uri? or is it perhaps possible that data that has a uri could also have a transcribed or composed form as a property? perhaps it will even be possible to store multiple snapshots of online works that change over time to document variant forms of a name for works, persons, and so on.
2. will the internet ever be fast enough to assemble the equivalent of our current records from a collection of hundreds or even thousands of uris? in rdf, links are one-to-one rather than one-to-many. this leads to a great proliferation of reciprocal links. the more granularity there is in the data, the more linking is necessary to ensure that atomized data elements are linked together. potentially, every piece of data describing a particular entity could be represented by a uri leading out to a skos list of data values. the number of links necessary to pull together all of the data just to describe one manifestation could become astronomical, as could the number of one-to-one links necessary to create the appearance of a one-to-many link, such as the link between an author and all the works of an author. is the internet really fast enough to assemble a record from hundreds of uris in a reasonable amount of time? given the often slow network throughput typical of many of our current internet connections, is it really practical to expect all of these pieces to be pulled together efficiently to create a single display for a single user? we may yet feel nostalgia for the single manifestation-based record that already has all of the relevant data in it (no assembly required). bruce d'arcus points out, however, that "i think if you're dealing with rdf, you wouldn't necessarily be gathering these data in real-time. the uris that are the targets for those links are really just global identifiers. how you get the triples is a separate matter. so, for example, in my own personal case, i'm going to put together an rdf store that is populated with data from a variety of sources, but that data population will happen by script, and i'll still be querying a single endpoint, where the rdf is stored in a relational database."16 in other words, d'arcus essentially will put them all in one place, or in one database that "looks" from a uri perspective to be "one place" where they're already gathered.
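d'arcus's scenario can be sketched very simply: the triples are harvested ahead of time, by script, into one local store, and display-time queries go to that single store rather than out to hundreds of remote uris. a minimal python/rdflib version, with placeholder urls standing in for whatever sources would actually publish the data, might look like this:

from rdflib import Graph

g = Graph()

# population happens ahead of time, by script, not at display time;
# these urls are placeholders, not real data sources
for source in ["https://example.org/authorities.ttl",
               "https://example.org/bibliographic-data.ttl"]:
    g.parse(source, format="turtle")

# at query time there is a single local "endpoint": one graph, one query
# (the property uri below is the invented one from the earlier sketches)
results = g.query("""
    SELECT ?work ?label WHERE {
        ?work <http://example.org/isCreatedBy> ?creator .
        ?work <http://www.w3.org/2004/02/skos/core#prefLabel> ?label .
    }
""")
for row in results:
    print(row.work, row.label)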
3. is rdf capable of dealing with works that are identified using their creators? we need to treat author as both an entity in its own right and as a property of a work, and in many cases the latter is the more important function for user service. lexical labels, or human-readable identifiers for works that are identified using both the principal author and the title, are particularly problematic in rdf given that the principal author is an entity in its own right. is rdf capable of supporting the indexing necessary to allow a user to search using any variant of the author's name and any variant of the title of a work in combination and still retrieve all expressions and manifestations of that work, given that author will have a uri of its own, linked by means of a relationship link to the work uri? is rdf capable of supporting the display of a list of one thousand works, each identified by principal author, in order first by principal author, then by title, then by publication date, given that the preferred heading for each principal author would have to be assembled from the uri for that principal author and the preferred title for each work would have to be assembled from the uri for that work? for fear that this will not, in fact, be possible, i have put a human-readable work-identifier data element into my model that consists of principal author and title when appropriate, even though that means the preferred name of the principal author may not be able to be controlled by the entity record for the principal author. any guidance from experienced data modelers in this regard would be appreciated. according to bruce d'arcus, this is purely an interface or application question that does not require a solution at the data layer.17 since we have never had interfaces or applications that would do this correctly, even though the data is readily available in authority records, i am skeptical about this answer! perhaps bruce's suggestion under item 9 of designating a sortname property for each entity is the solution here as well. my human-readable work identifier consisting of the name of the principal creator and uniform title of work could be designated the sortname property for the work. it would have to be changed whenever the preferred form of the name for the principal creator changed, however.
4. do all possible inverse relationships need to be expressed explicitly, or can they be inferred? my model is already quite large, and i have not yet defined the inverse of every property as i really should to have a correct rdf model. in other words, for every property there needs to be an inverse property; for example, the property iscreatorof needs to have the inverse property iscreatedby; thus "twain" has the property iscreatorof, while "adventures of tom sawyer" has the property iscreatedby. perhaps users and inputters will not actually have to see the huge, complex rdf data model that would result from creating all the inverse relationships, but those who maintain the model will have to deal with a great deal of complexity. however, since i'm not a programmer, i don't know how the complexity of rdf compares to the complexity of existing ils software.
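on the narrower question of whether inverses must all be hand-coded: owl provides owl:inverseOf, so a model can declare each pairing once and let an owl-aware reasoner, or even a short script like the sketch below (python/rdflib, with the hypothetical iscreatorof/iscreatedby properties), materialize the other direction. whether that removes enough of the maintenance burden is a question for the data modelers:

from rdflib import Graph, Namespace
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")   # hypothetical namespace
g = Graph()

# declare the two properties as inverses of one another, once
g.add((EX.isCreatorOf, OWL.inverseOf, EX.isCreatedBy))

# assert only one direction in the data
twain = EX["person/twain-mark-1835-1910"]
work = EX["work/tom-sawyer"]
g.add((twain, EX.isCreatorOf, work))

# materialize the missing direction; an owl reasoner would produce the same triples
for p1, _, p2 in list(g.triples((None, OWL.inverseOf, None))):
    for s, _, o in list(g.triples((None, p1, None))):
        g.add((o, p2, s))
    for s, _, o in list(g.triples((None, p2, None))):
        g.add((o, p1, s))

print((work, EX.isCreatedBy, twain) in g)   # True once the inverse has been inferred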
5. can rdf solve the problems we are having now because of the lack of transitivity or inheritance in the data models that underlie current ilses, or will rdf merely perpetuate these problems? we have problems now with the data models that underlie our current ilses because of the inability of these models to deal with hierarchical inheritance, such that whatever is true of an entity in the hierarchy is also true of every entity below that entity in the hierarchy. one example is that of cross-references to a parent corporate body that should be held to apply to all subdivisions of that corporate body but never are in existing ils systems. there is a cross-reference from "fbi" to "united states. federal bureau of investigation," but not from "fbi counterterrorism division" to "united states. federal bureau of investigation. counterterrorism division." for that reason, a search in any opac name index for "fbi counterterrorism division" will fail. we need systems that recognize that data about a parent corporate body is relevant to all subdivisions of that parent body. we need systems that recognize that data about a work is relevant to all expressions and manifestations of that work. rdf allows you to link a work to an expression and an expression to a manifestation, but i don't believe it allows you to encode the information that everything that is true of the work is true of all of its expressions and manifestations. rob styles seems to confirm this: "rdf doesn't have hierarchy. in computer science terms, it's a graph, not a tree, which means you can connect anything to anything else in any direction."18 of course, not all links should be this kind of transitive or inheritance link. one expression of work a is linked to another expression of work a by links to work a, but whatever is true of one of those expressions is not necessarily true of the other; one may be illustrated, for example, while the other is not. whatever is true of one work is not necessarily true of another work related to it by a related work link. it should be recognized that bibliographic data is rife with hierarchy. it is one of our major tools for expressing meaning to our users. corporate bodies have corporate subdivisions, and many things that are true for the parent body also are true for its subdivisions. subjects are expressed using main headings and subject subdivisions, and many things that are true for the main heading (such as variant names) also are true for the heading combined with one of its subdivisions. geographic areas are contained within larger geographic areas, and many things that are true of the larger geographic area also are true for smaller regions, counties, cities, etc., contained within that larger geographic area. for all these reasons, i believe that, to do effective displays and indexes for our bibliographic data, it is critical that we be able to distinguish between a hierarchical relationship and a nonhierarchical relationship. (see the sketch following item 7 below.)
6. to recognize the fact that the subject of a book or a film could be a work, a person, a concept, an object, an event, or a place (all classes in the model), is there any reason we cannot define subject itself as a property (a relationship) rather than a class in its own right? in my model, all subject properties are defined as having a domain of resource, meaning there is no constraint as to the class to which these subject properties apply. i'm not sure if there will be any fall-out from that modeling decision.
7. how do we distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location? sometimes a place is a jurisdiction and behaves like a corporate body (e.g., united states is the name of the government of the united states). sometimes place is a physical location in which something is located (e.g., the birds discussed in a book about the birds of the united states). to distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location, i have defined two different classes for place: place as jurisdictional corporate body and place as geographic area. will this cause problems in the model? will there be times when it prevents us from making elegant generalizations in the model about place per se? there is a similar problem with events. some events are corporate bodies (e.g., conferences that publish papers) and some are a kind of subject (e.g., an earthquake). i have defined two different classes for event: conference or other event as corporate body creator and event as subject.
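to make the hierarchy concern raised in item 5 concrete: one possible arrangement is to give the hierarchical link its own property and have indexing walk up that property, so that a parent body's variant names are inherited by its subdivisions. the sketch below (python/rdflib, with an invented example.org isSubdivisionOf property; it is not a claim about what rdf systems do out of the box) shows how a keyword search on "fbi counterterrorism division" could then succeed. a nonhierarchical link, such as a related-work link, would simply use a different property that the walk does not follow, preserving the distinction argued for above:

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")   # hypothetical namespace and property
g = Graph()

fbi = EX["corporatebody/us-federal-bureau-of-investigation"]
division = EX["corporatebody/us-fbi-counterterrorism-division"]

g.add((fbi, SKOS.prefLabel, Literal("United States. Federal Bureau of Investigation")))
g.add((fbi, SKOS.altLabel, Literal("FBI")))
g.add((division, SKOS.prefLabel, Literal("Counterterrorism Division")))
g.add((division, EX.isSubdivisionOf, fbi))   # the hierarchical link

def inherited_labels(graph, body, ex):
    """labels for a corporate body plus labels inherited from every parent body."""
    labels = [str(l) for l in graph.objects(body, SKOS.prefLabel)]
    labels += [str(l) for l in graph.objects(body, SKOS.altLabel)]
    for parent in graph.objects(body, ex.isSubdivisionOf):
        labels += inherited_labels(graph, parent, ex)
    return labels

# "fbi" now reaches the subdivision too, because the parent's variant name is inherited
print(inherited_labels(g, division, EX))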
8. what is the best way to model a bound-with or an issued-with relationship, or a part–whole relationship in which the whole must be located to obtain the part? the bound-with relationship is actually between two items containing two different works, while the issued-with relationship is between two manifestations containing two different works (see figure 2). is this a work-to-work relationship? will designating it a work-to-work relationship cause problems for indicating which specific items or manifestation-items of each work are physically located in the same place? this question may also apply to those part–whole relationships in which the part is physically contained within the whole and both are located in the same place (sometimes known as analytics). one thing to bear in mind is that in all of these cases the relationship between two works does not hold between all instances of each work; it only holds for those particular instances that are contained in the particular manifestation or item that is bound with, issued with, or part of the whole. however, if the relationship is modeled as a work-1-manifestation to work-2-manifestation relationship, or a work-1-item to work-2-item relationship, care must be taken in the design of displays to pull in enough information about the two or more works so as not to confuse the user.
9. how do we express the arrangement of elements that have a definite order? i am having trouble imagining how to encode the ordering of data elements that make up a larger element, such as the pieces of a personal name. this is really a desire to control the display of those atomized elements so that they make sense to human beings rather than just to machines. could one define a property such as natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of a person, given that the ideal order of these elements might vary from one person to another? could one define properties such as sorting element 1, sorting element 2, sorting element 3, etc., and assign them to the various pieces that will be assembled to make a particular heading for an entity, such as an lcsh heading for a historical period? (depending on the answer to the question in item 11, it may or may not be possible to assign a property to a property in this fashion.) are there standard sorting rules we need to be aware of (in unicode, for example)? are there other rdf techniques available to deal with sorting and arrangement? bruce d'arcus suggests that, instead of coding the name parts, it would be more useful to designate sortname properties;19 might it not be necessary to designate a sortname property for each variant name as well, for cases in which variants need to appear in sorted displays? and wouldn't these sortname properties complicate maintenance over time as preferred and variant names changed?
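d'arcus's sortname suggestion can at least be sketched: each entity (and, if need be, each variant name) carries a filing form under an invented property, and displays sort on the filing form while showing the preferred label. the sketch below (python/rdflib; the sortName property and the headings are made up for illustration) shows the basic mechanics, though it says nothing about the maintenance problem raised above:

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")   # hypothetical namespace; sortName is invented
g = Graph()

for key, display, filing in [
    ("person/le-carre-john", "le Carré, John, 1931-2020", "lecarre john 1931"),
    ("person/de-la-mare-walter", "De la Mare, Walter, 1873-1956", "delamare walter 1873"),
]:
    entity = EX[key]
    g.add((entity, SKOS.prefLabel, Literal(display)))
    g.add((entity, EX.sortName, Literal(filing)))   # filing form, distinct from display form

# build a browse display ordered by the filing form but showing the display form
entries = []
for entity, _, filing in g.triples((None, EX.sortName, None)):
    label = next(g.objects(entity, SKOS.prefLabel))
    entries.append((str(filing), str(label)))
for _, label in sorted(entries):
    print(label)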
10. how do we link related data elements in such a way that effective indexing and displays are possible? some examples: number and kind of instrument (e.g., music written for two oboes and three guitars); multiple publishers, frequencies, subtitles, editors, etc., with date spans for a serial title change (or will it be necessary to create a new manifestation for every single change in subtitle, publisher name, place of publication, etc.?). the assumption seems to be that there will be no repeatable data elements. based on my somewhat limited experience with rdf, it appears that there are record equivalents (every data element—property or relationship—pertaining to a particular entity with a uri), but there are no field or subfield equivalents that allow the sublinking of related pieces of data about an entity. indeed, rob styles goes so far as to argue that ultimately there is no notion of a "record" in rdf.20 it is possible that blank nodes might be able to fill in for fields and subfields in some cases for grouping data, but there are dangers involved in their use.21 (a sketch of this blank-node grouping follows item 13 below.) to a cataloger, it looks as though the plan is for rdf data to float around loose without any requirement that there be a method for pulling it together into coherent displays designed for human beings.
11. can a property have a property in rdf? as an example of where it might be useful to define a property of a property, robert maxwell suggests that date of publication is really an attribute (property) of the published by relationship (another property).22 another example: in my model, a variant title for a serial is a property. can that property itself have the property type of variant title to encompass things like spine title, key title, etc.? another example appeared in item 9, in which it is suggested that it might be desirable to assign sort-element properties to the various elements of a name property.
12. how do we document record display decisions? there is no way to record display decisions in rdf itself; it is completely display-neutral. we could not safely commit to a particular rdf–based data model until a significant amount of sample bibliographic data had been created and open-source indexing and display software had been designed and user-tested on that data. it may be that we will need to supplement rdf with some other encoding mechanism that allows us to record display decisions along with the data. current cataloging rules are about display as much as they are about content designation. isbd concerns the order in which the elements should be displayed to humans. the cataloging objectives concern display to users of such entity groups as the works of an author, the editions of a work, and the works on a subject.
13. can all bibliographic data be reduced to either a class or a property with a finite list of values? another way to put this is to ask if all that catalogers do could be reduced to a set of pull-down menus. cataloging is the art of writing discursive prose as much as it is the ability to select the correct value for a particular data element. we must deal with ambiguous data (presented by joe blow could mean that joe created the entire work, produced it, distributed it, sponsored it, or merely funded it). we must sometimes record information without knowing its exact meaning. we must deal with situations that have not been anticipated in advance. it is not possible to list every possible kind of data and every possible value for each type of data up front before any data is gathered. it will always be necessary to provide a plain-text escape hatch. the bibliographic world is a complex, constantly changing world filled with ambiguity.
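returning to item 10: the blank-node pattern from the rdf primer cited in note 21 is the closest thing rdf offers to a subfield-style grouping. in the sketch below (python/rdflib, with invented example.org properties), each number-plus-instrument pair lives in its own anonymous node, so the "two" stays attached to "oboes" and the "three" to "guitars"; the drawbacks noted above, chiefly that blank nodes cannot be referenced from outside the graph that contains them, still apply:

from rdflib import Graph, Namespace, BNode, Literal

EX = Namespace("http://example.org/")   # hypothetical namespace and properties
g = Graph()
work = EX["work/concerto-for-two-oboes-and-three-guitars"]

# each instrument/number pair is grouped in its own blank node,
# standing in for the subfield grouping a marc field would have provided
for count, instrument in [(2, "oboe"), (3, "guitar")]:
    slot = BNode()
    g.add((work, EX.mediumOfPerformance, slot))
    g.add((slot, EX.instrument, Literal(instrument)))
    g.add((slot, EX.numberOfInstruments, Literal(count)))

print(g.serialize(format="turtle"))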
figure 2. examples of part–whole relationships. how might these be best expressed in rdf?
issued-with relationship: a copy of charlie chaplin's 1917 film the immigrant can be found on a videodisc compilation called charlie chaplin, the early years along with two other chaplin films. this compilation was published and collected by many different libraries and media centers. if a user wants to view this copy of the immigrant, he or she will first have to locate charlie chaplin, the early years, then look for the desired film at the beginning of the first videodisc in the set. the issued-with relationship between the immigrant and the other two films on charlie chaplin, the early years is currently expressed in the bibliographic record by means of a "with" note: first on charlie chaplin, the early years, v. 1 (62 min.) with: the count – easy street.
bound-with relationship: the university of california, los angeles film & television archive has acquired a reel of 16 mm. film from a collector who strung five warner bros. cartoons together on a single reel of film. we can assume that no other archive, library, or media collection will have this particular compilation of cartoons, so the relationship between the five cartoons is purely local in nature. however, any user at the film & television archive who wishes to view one of these cartoons will have to request a viewing appointment for the entire reel and then find the desired cartoon among the other four on the reel. the bound-with relationship among these cartoons is currently expressed in a holdings record by means of a "with" note: fourth on reel with: daffy doodles – tweety pie – i love to singa – along flirtation walk.
■■ what are the next steps?
in a sense, this paper is a first crude attempt at locating unmapped territory that has not yet been explored. if we were to decide as a community that it would be valuable to move our shared cataloging activities onto the semantic web, we would have a lot of work ahead of us. if some of the rdf problems described above are insoluble, we may need to work with semantic web developers to create a more sophisticated version of rdf that can handle the transitivity and complex linking required by our data. we will also need to encourage a very complex existing community to evolve institutional structures that would enable a more efficient use of the internet for the sharing of cataloging and other metadata creation. this is not just a technological problem, but also a political one. in the meantime, the experiment continues. let the thinking and learning begin!
references and notes
1. "notation3, or n3 as it is more commonly known, is a shorthand non–xml serialization of resource description framework models, designed with human-readability in mind: n3 is much more compact and readable than xml rdf notation. the format is being developed by tim berners-lee and others from the semantic web community." wikipedia, "notation 3," http://en.wikipedia.org/wiki/notation_3 (accessed feb. 19, 2009).
2. frbr review group, www.ifla.org/vii/s13/wgfrbr/; frbr review group, franar (working group on functional requirements and numbering of authority records), www.ifla.org/vii/d4/wg-franar.htm; frbr review group, frsar (working group, functional requirements for subject authority records), www.ifla.org/vii/s29/wgfrsar.htm; frbroo, frbr review group, working group on frbr/crm dialogue, www.ifla.org/vii/s13/wgfrbr/frbr-crmdialogue_wg.htm.
3. library of congress, response to on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 24, 39, 40, www.loc.gov/bibliographic-future/news/lcwgrptresponse_dm_053008.pdf (accessed mar. 25, 2009).
4. ibid., 39.
5. ibid., 41.
6. dublin core metadata initiative, dcmi/rda task group wiki, http://www.dublincore.org/dcmirdataskgroup/ (accessed mar. 25, 2009).
7. mikael nilsson, andy powell, pete johnston, and ambjorn naeve, expressing dublin core metadata using the resource description framework (rdf), http://dublincore.org/documents/2008/01/14/dc-rdf/ (accessed mar. 25, 2009).
8. see for example table 6.3 in frbr, which maps to manifestation every kind of data that pertains to expression change with the exception of language change. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998): 95, http://www.ifla.org/vii/s13/frbr/frbr.pdf (accessed mar. 4, 2009).
9. roy tennant, "marc must die," library journal 127, no. 17 (oct. 15, 2002): 26.
10. w3c, skos simple knowledge organization system reference, w3c working draft 29 august 2008, http://www.w3.org/tr/skos-reference/ (accessed mar. 25, 2009).
11. the extract in figure 1 is taken from my complete rdf model, which can be found at http://myee.bol.ucla.edu/ycrschemardf.txt.
12. mary w. elings and gunter waibel, "metadata for all: descriptive standards and metadata sharing across libraries, archives and museums," first monday 12, no. 3 (mar. 5, 2007), http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1628/1543 (accessed mar. 25, 2009).
13. oclc, a holdings primer: principles and standards for local holdings records, 2nd ed. (dublin, ohio: oclc, 2008), 4, http://www.oclc.org/us/en/support/documentation/localholdings/primer/holdings%20primer%202008.pdf (accessed mar. 25, 2009).
14. the library of congress working group, on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 30, http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf (accessed mar. 25, 2009).
15. talis, sir tim berners-lee talks with talis about the semantic web: transcript of an interview recorded on 7 february 2008, http://talis-podcasts.s3.amazonaws.com/twt20080207_timbl.html (accessed mar. 25, 2009).
16. bruce d'arcus, e-mail to author, mar. 18, 2008.
17. ibid.
18. rob styles, e-mail to author, mar. 25, 2008.
19. bruce d'arcus, e-mail to author, mar. 18, 2008.
20. rob styles, e-mail to author, mar. 25, 2008.
21. w3c, "section 2.3, structured property values and blank nodes," in rdf primer: w3c recommendation 10 february 2004, http://www.w3.org/tr/rdf-primer/#structuredproperties (accessed mar. 25, 2009).
22. robert maxwell, frbr: a guide for the perplexed (chicago: ala, 2008).
| yee 69 entities/classes in rda, frbr, frad compared to yee cataloging rules (ycr) rda, frbr, and frad ycr group 1: work work group 1: expression expression surrogate group 1: manifestation manifestation title-manifestation serial title group 1: item item group 2: person person fictitious character performing animal group 2: corporate body corporate body corporate subdivision place as jurisdictional corporate body conference or other event as corporate body creator jurisdictional corporate subdivision family (rda and frad only) group 3: concept concept group 3: object object group 3: event event or historical period as subject group 3: place place as geographic area discipline genre/form name identifier controlled access point rules (frad only) agency (frad only) appendix. entity/class and attribute/property comparisons 70 information technology and libraries | june 2009 attributes/properties in frbr compared to frad model entity frbr frad work title of the work form of work date of the work other distinguishing characteristics intended termination intended audience context for the work medium of performance (musical work) numeric designation (musical work) key (musical work) coordinates (cartographic work) equinox (cartographic work) form of work date of the work medium of performance subject of the work numeric designation key place of origin of the work original language of the work history other distinguishing characteristic expression title of the expression form of expression date of expression language of expression other distinguishing characteristics extensibility of expression revisability of expression extent of the expression summarization of content context for the expression critical response to the expression use restrictions on the expression sequencing pattern (serial) expected regularity of issue (serial) expected frequency of issue (serial) type of score (musical notation) medium of performance (musical notation or recorded sound) scale (cartographic image/object) projection (cartographic image/object) presentation technique (cartographic image/object) representation of relief (cartographic image/object) geodetic, grid, and vertical measurement (cartographic image/ object) recording technique (remote sensing image) special characteristic (remote sensing image) technique (graphic or projected image) form of expression date of expression language of expression technique other distinguishing characteristic surrogate can bibliographic data be put directly onto the semantic web? 
| yee 71 model entity frbr frad manifestation title of the manifestation statement of responsibility edition/issue designation place of publication/distribution publisher/distributor date of publication/distribution fabricator/manufacturer series statement form of carrier extent of the carrier physical medium capture mode dimensions of the carrier manifestation identifier source for acquisition/access authorization terms of availability access restrictions on the manifestation typeface (printed book) type size (printed book) foliation (hand-printed book) collation (hand-printed book) publication status (serial) numbering (serial) playing speed (sound recording) groove width (sound recording) kind of cutting (sound recording) tape configuration (sound recording) kind of sound (sound recording) special reproduction characteristic (sound recording) colour (image) reduction ratio (microform) polarity (microform or visual projection) generation (microform or visual projection) presentation format (visual projection) system requirements (electronic resource) file characteristics (electronic resource) mode of access (remote access electronic resource) access address (remote access electronic resource) edition/issue designation place of publication/distribution publisher/distributor date of publication/distribution form of carrier numbering title-manifestation serial title item item identifier fingerprint provenance of the item marks/inscriptions exhibition history condition of the item treatment history scheduled treatment access restrictions on the item location of item attributes/properties in frbr compared to frad (cont.) 72 information technology and libraries | june 2009 model entity frbr frad person name of person dates of person title of person other designation associated with the person dates associated with the person title of person other designation associated with the person gender place of birth place of death country place of residence affiliation address language of person field of activity profession/occupation biography/history fictitious character performing animal corporate body name of the corporate body number associated with the corporate body place associated with the corporate body date associated with the corporate body other designation associated with the corporate body place associated with the corporate body date associated with the corporate body other designation associated with the corporate body type of corporate body language of the corporate body address field of activity history corporate subdivision place as jurisdictional corporate body conference or other event as corporate body creator jurisdictional corporate subdivision family type of family dates of family places associated with family history of family concept term for the concept type of concept object term for the object type of object date of production place of production producer/fabricator physical medium event term for the event date associated with the event place associated with the event attributes/properties in frbr compared to frad (cont.) can bibliographic data be put directly onto the semantic web? 
| yee 73 model entity frbr frad place term for the place coordinates other geographical information discipline genre/form name type of name scope of usage dates of usage language of name script of name transliteration scheme of name identifier type of identifier identifier string suffix controlled access point type of controlled access point status of controlled access point designated usage of controlled access point undifferentiated access point language of base access point script of base access point script of cataloguing transliteration scheme of base access point transliteration scheme of cataloguing source of controlled access point base access point addition rules citation for rules rules identifier agency name of agency agency identifier location of agency attributes/properties in frbr compared to frad (cont.) 74 information technology and libraries | june 2009 attributes/properties in rda compared to ycr model entity rda ycr work title of the work form of work date of work place of origin of work medium of performance numeric designation key signatory to a treaty, etc. other distinguishing characteristic of the work original language of the work history of the work identifier for the work nature of the content coverage of the content coordinates of cartographic content equinox epoch intended audience system of organization dissertation or theses information key identifier for work language-based identifier (preferred lexical label) variant language-based identifier (alternate lexical label) language-based identifier (preferred lexical label) for work language-based identifier for work (preferred lexical label) identified by principalcreator in combination with uniform title language-based identifier (preferred lexical label) for work identified by title alone (uniform title) supplied title for work variant title for work original language of work responsibility for work original publication statement of work dates associated with work original publication/release/broadcast date of work copyright date of work creation date of work date of first recording of a work date of first performance of a work finding date of naturally occurring object original publisher/distributor/broadcaster of work places associated with work original place of publication/distribution/broadcasting for work country of origin of work place of creation of work place of first recording of work place of first performance of work finding place of naturally occurring object original method of publication/distribution/broadcast of work serial or integrating work original numeric and/or alphabetic designations—beginning serial or integrating work original chronological designations— beginning serial or integrating work original numeric and/or alphabetic designations—ending serial or integrating work original chronological designations— ending encoding of content of work genre/form of content of work original instrumentation of musical work instrumentation of musical work—number of a particular instrument instrumentation of musical work—type of instrument original voice(s) of musical work voice(s) of musical work—number of a particular type of voice voice(s) of musical work—type of voice original key of musical work numeric designation of musical work coordinates of cartographic work equinox of cartographic work original physical characteristics of work original extent of work original dimensions of work mode of issuance of work can bibliographic data be put directly onto the semantic web? 
| yee 75 model entity rda ycr work (cont.) original aspect ratio of moving image work original image format of moving image work original base of work original materials applied to base of work work summary work contents list custodial history of work creation of archival collection censorship history of work note about relationship(s) to other works expression content type date of expression language of expression other distinguishing characteristic of the expression identifier for the expression summarization of the content place and date of capture language of the content form of notation accessibility content illustrative content supplementary content colour content sound content aspect ratio format of notated music medium of performance of musical content duration performer, narrator, and/or presenter artistic and/or technical credits scale projection of cartographic content other details of cartographic content awards key identifier for expression language-based identifier (preferred lexical label) for expression variant title for expression nature of modification of expression expression title expression statement of responsibility edition statement scale of cartographic expression projection of cartographic expression publication statement of expression place of publication/distribution/release/broadcasting for expression place of recording for expression publisher/distributor/releaser/broadcaster for expression publication/distribution/release/broadcast date for expression copyright date for expression date of recording for expression numeric and/or alphabetic designations for serial expressions chronological designations for serial expressions performance date for expression place of performance for expression extent of expression content of expression language of expression text language of expression captions language of expression sound track language of sung or spoken text of expression language of expression subtitles language of expression intertitles language of summary or abstract of expression instrumentation of musical expression instrumentation of musical expression—number of a particular instrument instrumentation of musical expression—type of instrument voice(s) of musical expression voice(s) of musical expression—number of a particular type of voice voice(s) of musical expression—type of voice key of musical expression appendages to the expression expression series statement mode of issuance for expression notes about expression surrogate [under development] attributes/properties in rda compared to ycr (cont.) 
76 information technology and libraries | june 2009 model entity rda ycr manifestation title statement of responsibility edition statement numbering of serials production statement publication statement distribution statement manufacture statement copyright date series statement mode of issuance frequency identifier for the manifestation note media type carrier type base material applied material mount production method generation layout book format font size polarity reduction ratio sound characteristics projection characteristics of motion picture film video characteristics digital file characteristics equipment and system requirements terms of availability key identifier for manifestation publication statement of manifestation place of publication/distribution/release/broadcast of manifestation manifestation publisher/distributor/releaser/broadcaster manifestation date of publication/distribution/release/broadcast carrier edition statement carrier piece count carrier name carrier broadcast standard carrier recording type carrier playing speed carrier configuration of playback channels process used to produce carrier carrier dimensions carrier base materials carrier generation carrier polarity materials applied to carrier carrier encoding format intermediation tool requirements system requirements serial manifestation illustration statement manifestation standard number manifestation isbn manifestation issn manifestation publisher number manifestation universal product code notes about manifestation titlemanifestation key identifier for title-manifestation variant title for title-manifestation title-manifestation title title-manifestation statement of responsibilities title-manifestation edition statement publication statement of title-manifestation place of publication/distribution/release/broadcasting of titlemanifestation publisher/distributor/releaser, broadcaster of title-manifestation date of publication/distribution/release/broadcast of titlemanifestation title-manifestation series title-manifestation mode of issuance notes about title-manifestation title-manifestation standard number attributes/properties in rda compared to ycr (cont.) can bibliographic data be put directly onto the semantic web? | yee 77 model entity rda ycr serial title key identifier for serial title variant title for serial title title of serial title serial title statement of responsibility serial title edition statement publication statement of serial title place of publication/distribution/release/broadcast of serial title publisher/distributor/releaser/broadcaster of serial title date of publication/distribution/release/broadcast of serial title serial title beginning numeric and/or alphabetic designations serial title beginning chronological designations serial title ending numeric and/or alphabetic designations serial title ending chronological designations serial title frequency serial title mode of issuance serial title illustration statement notes about serial title serial title issn-l item preferred citation custodial history immediate source of acquisition identifier for the item item-specific carrier characteristics key identifier for item item barcode item location item call number or accession number item copy number item provenance item condition item marks and inscriptions item exhibition history item treatment history item scheduled treatment item access restrictions attributes/properties in rda compared to ycr (cont.) 
78 information technology and libraries | june 2009 model entity rda ycr person name of the person preferred name for the person variant name for the person date associated with the person title of the person fuller form of name other designation associated with the person gender place of birth place of death country associated with the person place of residence address of the person affiliation language of the person field of activity of the person profession or occupation biographical information identifier for the person key identifier for person language-based identifier (preferred lexical label) for person clan name of person forename/given name/first name of person matronymic of person middle name of person nickname of person patronymic of person surname/family name of person natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of person affiliation of person biography/history of person date of birth of person date of death of person ethnicity of person field of activity of person gender of person language of person place of birth of person place of death of person place of residence of person political affiliation of person profession/occupation of person religion of person variant name for person fictitious character [under development] performing animal [under development] corporate body name of the corporate body preferred name for the corporate body variant name for the corporate body place associated with the corporate body date associated with the corporate body associated institution other designation associated with the corporate body language of the corporate body address of the corporate body field of activity of the corporate body corporate history identifier for the corporate body key identifier for corporate body language-based identifier (preferred lexical label) for corporate body dates associated with corporate body field of activity of corporate body history of corporate body language of corporate body place associated with corporate body type of corporate body variant name for corporate body corporate subdivision [under development] place as jurisdictional corporate body [under development] attributes/properties in rda compared to ycr (cont.) can bibliographic data be put directly onto the semantic web? 
model entity rda ycr conference or other event as corporate body creator [under development] jurisdictional corporate subdivision [under development] family name of the family preferred name for the family variant name for the family type of family date associated with the family place associated with the family prominent member of the family hereditary title family history identifier for the family concept term for the concept preferred term for the concept variant term for the concept type of concept identifier for the concept key identifier for concept language-based identifier (preferred lexical label) for concept qualifier for concept language-based identifier variant name for concept object name of the object preferred name for the object variant name for the object type of object date of production place of production producer/fabricator physical medium identifier for the object key identifier for object language-based identifier (preferred lexical label) for object qualifier for object language-based identifier variant name for object event name of the event preferred name for the event variant name for the event date associated with the event place associated with the event identifier for the event key identifier for event or historical period as subject language-based identifier (preferred lexical label) for event or historical period as subject beginning date for event or historical period as subject ending date for event or historical period as subject variant name for event or historical period as subject place name of the place preferred name for the place variant name for the place coordinates other geographical information identifier for the place key identifier for place as geographic area language-based identifier (preferred lexical label) for place as geographic area qualifier for place as geographic area variant name for place as geographic area discipline key identifier for discipline language-based identifier (preferred lexical label) (name or classification number or symbol) for discipline translation of meaning of classification number or symbol for discipline attributes/properties in rda compared to ycr (cont.) model entity rda ycr genre/form key identifier for genre/form language-based identifier (preferred lexical label) for genre/form variant name for genre/form name scope of usage date of usage identifier controlled access point rules agency note: in rda, the following attributes have not yet been assigned to a particular class or entity: extent, dimensions, terms of availability, contact information, restrictions on access, restrictions on use, uniform resource locator, status of identification, source consulted, cataloguer's note, status of identification, and undifferentiated name indicator. name is being treated as both a class and a property. identifier and controlled access point are treated as properties rather than classes in both rda and ycr. attributes/properties in rda compared to ycr (cont.)
book reviews
systematic analysis of university libraries, by jeffrey a. raffel and robert shishko. cambridge, mass.: m. i. t. press, 1969. 107 pp. $6.95. systematic analysis of university libraries is an exciting book, for it is the first report describing application of cost-benefit analysis to a library. raffel and shishko have applied the methodology of cost-benefit analysis to the m. i. t. libraries and have produced an admirable description of this
method of research that examines policy making in a system as a choice among alternatives. this work is not a cookbook providing answers derived from principles; it is an exposition of a methodology that produces data used as a basis of decision making. the book employs the case-study technique, with the m. i. t. libraries furnishing the raw material for the cases. findings cannot be extrapolated to all libraries, although they may be applicable in some. for example, raffel and shishko found that 75% of the m. i. t. libraries budget is allocated to research activities in the institution. such findings are inapplicable to small liberal arts colleges, where faculty members do little research. the purpose of systematic analysis of university libraries is to teach the application of cost-benefit analysis rather than to provide answers. it instructs in the methodology for obtaining answers. case studies presented in the book include selection, acquisitions and cataloging, among library operations. also examined are book storage, study facilities and reserve book procedures. a technique for measuring benefits by surveying users is also described. the concluding chapter presents in outline form major findings, of which only two will be given here as examples of results of this type of analysis. first, the authors found that the most effective alternate storage system, namely compact storage, saves only about one percent of annual library resources, but provokes a major loss of benefit, since compact storage limits browsing and increases retrieval time for books. a second finding of interest is that major cataloging expenses are for professional librarians doing original cataloging, and for proofreading and checking of catalog cards. that costs of original cataloging bulk largest will not be a surprise to librarians, but that the next largest cost should be proofreading and checking of catalog cards will come as a surprise to some. the book concludes with a score of research questions to be explored in the future, and it is fervently to be hoped that raffel and shishko will continue their investigation along the avenues they have delineated. frederick g. kilgour
the undergraduate library, by irene a. braden. chicago: american library association, 1970. (acrl monograph, 31). 158 pp. $7.50. the separate undergraduate library on the university campus is a phenomenon of the last two decades: harvard's lamont library was the first in 1949. more than twenty-five such libraries now exist or are in the planning or construction stage. the literature of librarianship contains descriptions of individual undergraduate libraries or philosophical essays concerning library services for undergraduates. braden, however, was the first to study more extensively and impartially this attempt to provide better services for university students. for her dissertation at the university of michigan, she collected data on six undergraduate libraries: harvard, michigan, south carolina, cornell, indiana, and texas. each library was visited in 1965/66 and interviews with librarians were conducted; documents were consulted. we here have published 25-35 page descriptions of these six pioneers. the studies range from architectural design, through the gathering of the initial collections of books and other media, to the host of services offered in the completed library. excellent statistical tables, organizational charts and floor plans illustrate the text. there are some errors.
michigan added more seats in 1965, not september, 1966 as stated on page 43. also referring to michigan on page 47: "the reference collection began with about 2000 volumes, but it soon became evident that it would have to be enlarged. the collection now numbers about 3100 volumes."83 the footnote refers to page 18 of the 1957/58 annual report of the michigan undergraduate library, but there is no mention of the number of reference volumes there. instead the 1957/58 annual report records on page 4 that there were 800 reference volumes on november 18, 1957 when the collections were moved into the new building. after presentation of the case studies, the author summarizes her conclusions on the buildings, book collections, services, staffs, and use by students. of particular value are fourteen brief guidelines formulated to assist librarians who may be contemplating an undergraduate library on their campus. the reader should be forewarned that the undergraduate library, although a most welcome publication, is now an historical document. only data through 1964/65 are presented. major changes in services and facilities have occurred in the past five years. those interested in automation would think that undergraduate libraries have done nothing. michigan, however, began an automated circulation system for reserve material in 1967 and for the main collection in 1968. billy r. wilkinson
report on the total system computer program for medical libraries, by robert e. divett and w. wayne jones. albuquerque: university of new mexico school of medicine library of the medical sciences, 1969.
424 pp. the concept of "total system" is a fairly easy one to grasp until one attempts definition of the term. then there creep in all sorts of unexpected, rather unfair practical considerations, usually related to environment. under these constraints, one man's total system becomes a very personal conditioned statement. the report is organized into three sections: a system description oriented toward the librarian; technical descriptions of the file organization and program structure for the programmer; and a set of appendices which include the source listings of all the programs. the source listings are more than three-quarters of the report, and are tiring to examine and decipher. much more useful in a report of this nature would have been the program decision tables which underlie the programs. a section on recommendations explores the future direction of the system. however, some matters of concern in the report are glossed over in a rather facile manner with little or no comment. the system has been implemented at different levels. acquisitions and cataloging are essentially translations to an on-line mode of a batch system. (it is interesting to note that a card catalog is maintained to back up this on-line operation.) on-line circulation is presented as if it is running, whereas the authors say that lack of funding prevented implementation. the really exciting work has been done with file organization, the incorporation of mesh tree structures on the file and their use for upward (to most general, not most specific) searching, and the development of an on-line interrogation procedure both for update and search of the file. one finds that hardware costs alone would be either $7728 per annum plus computer time for a batch system, or between $98,000 and $104,000 per annum plus computer time for a terminal system. but then one reads that "the terminal total computer system is the only effective, efficient way of meeting the demands of service and processing that are required by a technical library." when one is talking about a hardware cost of $100,000 per annum, what exactly do the words "only effective, efficient" mean? glyn evans
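the "upward" searching that the reviewer singles out is easier to picture with mesh tree numbers in view: each heading carries a dot-delimited code, and dropping segments from the right moves to ever broader headings. the following sketch is purely illustrative and is not taken from the divett and jones report; the tree numbers shown, the document identifiers, and the function names are hypothetical, and a simple in-memory index is assumed in place of the report's actual file organization.

```python
# illustrative sketch only -- not code from the divett/jones report.
# mesh tree numbers are dot-delimited; "C04.588.945" is narrower than
# "C04.588", which is narrower than "C04". "upward" searching moves from a
# specific heading toward its more general ancestors.

def upward_path(tree_number: str) -> list[str]:
    """return the tree number followed by each of its ancestors, most specific first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]

def upward_search(tree_number: str, index: dict[str, list[str]]) -> list[str]:
    """collect documents indexed at the heading or at any broader ancestor.
    'index' is a hypothetical mapping of tree numbers to document ids."""
    found: list[str] = []
    for node in upward_path(tree_number):
        found.extend(index.get(node, []))
    return found

if __name__ == "__main__":
    sample_index = {                      # hypothetical document ids
        "C04": ["doc-1"],
        "C04.588": ["doc-2"],
        "C04.588.945": ["doc-3"],
    }
    print(upward_path("C04.588.945"))                   # ['C04.588.945', 'C04.588', 'C04']
    print(upward_search("C04.588.945", sample_index))   # ['doc-3', 'doc-2', 'doc-1']
```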
how to manage and use technical information, by freeman h. dyke, jr. boston: industrial education institute, 1968. $15.00. freeman dyke is a veteran of the ups and downs of the information-retrieval industry and, through his association through the years with jonker, documentation, inc. (leasco), and the acm lecture circuit, has developed a wide familiarity with hardware and software used in the handling of technical information. this book is a compendium of information about equipment and systems, ranging from catalog cards to computers. a useful feature, repeated many times throughout the book, is a double list of "advantages" and "disadvantages" for the hardware or the system that has been described. thus, the advantages of uniterm cards and dual dictionaries (simplicity, low equipment and operating cost, flexibility of vocabulary, physical availability, fairly high output speed) are balanced against their disadvantages (variable search speed, low output flexibility, indirect access to information, difficulty in updating). in most cases, no bias is indicated in the descriptive sections, and the reader is more or less on his own in making a final choice of machine or technique. numerous clear illustrations (photographs, cartoons, diagrams, and other graphics) provide a helpful and interesting relief to the unjustified offset text. the lack of an index sets up serious retrieval problems. the major market for this book would seem to be business and industry, particularly companies which are planning to set up or modernize their methods for the storage and retrieval of technical information. the book might well be purchased for the business or industrial users of a library. because it is not at all oriented to the problems of library automation, it is not particularly recommended for use by the librarians themselves. a. j. goldwyn

an introduction to decision logic tables, by herman mcdaniel. new york: wiley, 1968. 96 pp. $6.95. the literature of decision tables is marked more by its absence than by its presence; before the appearance of this book, the reader was limited to brief journal articles or an infrequent technical report or two. thus, even though the author warns that the present volume makes no pretext of being an exhaustive treatment, he has nonetheless added materially to the store of knowledge of this admittedly limited field. mcdaniel carefully leads the reader through the process of developing a decision table and the simple rules of logic utilized to prove relevancy of the table elements or for eliminating irrelevant tests. of interest to all who are concerned with automation is the author's discussion of the conversion of a flow chart to a decision table. another interesting section is the use of table processors to translate decision tables into portions of computer programs. at this juncture, the author offers some evidence to support his contention that considerable programming time will be saved if the programmer works from decision tables rather than flow charts. if he is right, librarians had better get with it and learn how to construct decision tables as well as flow charts. one omission, a discussion of "and" and "or" condition statements, is unfortunate, since it appears that they merit space even in an introductory text. however, the author does provide a considerable number of exercises for the reader. these will help to sharpen the reader's understanding of decision tables. john j. miniter
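because decision logic tables may be unfamiliar, a small sketch may help fix the idea: the table itself is data (condition rows, action rows, one column per rule), and a minimal "table processor" selects the rule whose conditions match the facts at hand and carries out its actions, which is roughly what the table processors mcdaniel describes do when they emit portions of programs. the scenario, names, and rules below are invented for illustration and are not drawn from mcdaniel's book.

```python
# illustrative sketch only -- not code from mcdaniel's book.
# a decision logic table as data: condition rows, action rows, one column
# per rule. a minimal "table processor" picks the first rule whose condition
# entries all match the facts and returns the actions marked for that rule.

# hypothetical circulation scenario: what to do with a returned book.
CONDITIONS = {
    "is_reserve_item":  [True,  True,  False, False],
    "has_hold_request": [True,  False, True,  False],
}
ACTIONS = {
    "route_to_reserve_desk": [True,  True,  False, False],
    "notify_hold_patron":    [True,  False, True,  False],
    "reshelve":              [False, False, False, True],
}

def process(conditions, actions, facts):
    """apply the decision table to a set of facts and return the selected actions."""
    n_rules = len(next(iter(conditions.values())))
    for rule in range(n_rules):
        if all(conditions[name][rule] == facts[name] for name in conditions):
            return [name for name, row in actions.items() if row[rule]]
    return []  # no rule matched

if __name__ == "__main__":
    # a returned reserve item with no outstanding hold request:
    print(process(CONDITIONS, ACTIONS,
                  {"is_reserve_item": True, "has_hold_request": False}))
    # -> ['route_to_reserve_desk']
```

read column by column, each rule corresponds to one path a flow chart would trace, which is why a table processor can generate equivalent program logic directly from the table.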
computer-based library and information systems, by j. p. henley. computer monographs series. new york: american elsevier, 1970. 84 pp. $5.75. just when librarians and computer specialists were beginning to understand each other, there is published a slight monograph that effectively gaps the bridge. the book is based upon mr. henley's m.sc. work at trinity college, dublin. it bears a 1970 imprint, but appears to be about seven years out of date. one is told briefly about the king report, the information retrieval languages lisp and comit, and related ancient breakthroughs. the bibliography yields 32 dated citations with a mean date of 1963, the approximate time this work might have been considered timely. in seven slim chapters and two gratuitous appendices, the author treats such topics as "introduction to the computer", "library systems requirements", "the philosophy of a machine-based system", and even "a short note on backus normal form". some of the author's urgent allusions to old events are pure high camp, e.g., "the growing interest in mechanisation, borne out for example by . . . the recent initiation of discussions between a major publishing house and a large computer manufacturer, make it vital for the cross-fertilization of ideas between computer and library experts to proceed as quickly as possible." (p. 75.) other pronouncements are patently absurd, such as: "one common use of such real-time 'on-line' computing is the writing of a program directly at the console, instruction by instruction, instead of having to write it all beforehand and read it in from cards or paper tape." (p. 10.) it is all too easy to fault a short book for shortcomings, but other books in this same series, such as j. m. foster's list processing, have proven the excellence possible in a trim 50-shilling monograph (mr. foster's work is only 54 pages). excellence in this format appears to require focus upon a narrow subject area, and discipline in the treatment of the core elements of the area. in attempting in 84 pages to cover several subjects of encyclopedic scope (library and information systems, as well as a basic computer tutorial), the author piles pelion upon ossa and then shows us sample pebbles from the pile instead of the view from the summit.
there remains much important and exciting material to be presented to librarians and computer people about each other's work. regrettably, mr. henley, in the words of his fellow dubliner james joyce, has "speared the rod and spoiled the lightning." william r. nugent

debra a. riley-huff and julia m. rholes
librarians and technology skill acquisition: issues and perspectives

qualified individuals to fill these technology-driven librarian roles in our libraries, and if so, why? how are qualifications acquired and what are they, besides a moving target? there appear to be two major convergent trends influencing this uncertain phenomenon. the first is what is perceived as a "lack of awareness" and consensus about what the core of lis needs to be or to become in order to offer real value in a constantly changing and competitive information landscape.5 the other trend centers on the role of lis education and the continuing questions regarding its direction, efficacy, and ability to prepare future librarians for the modern information professions of now and the future. while changes are apparent, it appears many lis programs are still operating on a two-track model of "traditional librarians and information managers," and there are enough questions in this area to warrant further investigation and inquiry.6

■■ literature review

most of the literature pertaining to the readiness of librarians to work in increasingly technical environments centers on lis education. this certainly makes sense given the assumed qualifications the degree confers. scant literature focuses solely on the core of the librarians' professional identity, workplace culture, and institutional historical perspectives related to qualifications; however, allusions to "redefining" lis are often found in lis education literature. there is limited research on preprofessional or even professional in-service training, although calls for such research have been made repeatedly. a key study on lis education is the 2000 kaliper report, issued when the impact of technology in libraries was clearly reaching saturation.7 the report is the product of an analysis project with a goal of examining new trends in lis education. the report lists six trends, three of which are pertinent to the investigation of technology inclusion in lis programs. these trends note that in 2000, lis programs were beginning to address a broader range of information problems and environments, programs were increasing it content in the curriculum, and several programs were beginning to offer specializations within the curriculum, though not ones with a heavy technology focus. in a widely cited curriculum study in 2004, markey completed a comprehensive examination of 55

libraries are increasingly searching for and employing librarians with significant technology skill sets. this article reports on a study conducted to determine how well prepared librarians are for their positions in academic libraries, how they acquired their skills, and how difficult they are to hire and retain. the examination entails a close look at ala-accredited lis program technology course offerings and dovetails a dual survey designed to capture experiences and perspectives from practitioners, both library administrators and librarians who have significant technology roles.
a recent oclc report on research libraries, risk, and systemic change discusses what arl directors perceive as the highest risks to their libraries.1 the administrators reported on several high risks in the area of human resources including high-risk conditions in recruitment, training, and job pools. the oclc report notes that recruitment and retention is difficult due to the competitive environment and the reduction in the pool of qualified candidates. why precisely do administrators perceive that there is a scarcity of qualified candidates? changes in libraries, most of which have been brought on by the digital age, are reflected in the need for a stronger technological type of librarianship—not simply because technology is there to be taken advantage of, but because “information” by nature has found its dominion as the supreme commodity perfectly transported on bits. it follows, if information is your profession, you are no longer on paper. that lis is becoming an increasingly technology-driven profession is both recognized and documented. a noted trend particularly in academic libraries is a move away from simply redefining traditional or existing library roles altogether in favor of new and completely redesigned job profiles.2 this trend verifies actions by library administrators who are increasingly seeking librarians with a wider range of information technology (it) skills to meet the demands of users who are accessing information through technology.3 johnson states the need well as we need an integrated understanding of human needs and their relationships to information systems and social structures. we need unifying principles that illuminate the role of information in both computation and cognition, in both communication and community. we need information professionals who can apply these principles to synthesize human-centered and technological perspectives.4 the questions then become, is there a scarcity of debra a. riley-huff (rileyhuf@olemiss.edu) is web services librarian, university of mississippi libraries, university, miss. julia m. rholes (jrholes@olemiss.edu) is dean of libraries, university of mississippi libraries, university, mississippi. 130 information technology and libraries | september 2011 academic libraries had embarked on an unprecedented increase in filling librarian positions with professionals who do not have a master’s degree in library science.13 citing the association of research libraries annual salary statistics, among a variety of positions being filled by other professionals a substantial number are going to those in technology fields such as systems and instructional technology. in the mid 2000s, suggestions that library schools needed to work more closely with computer science departments began coming up more often. obstacles to these types of partnerships were noted as computer science departments failed to see the advantage offered by library science faculty as well as being wary of taking on a “softening” by the inclusion of what is perceived as a “soft science.”14 in response, most library schools have added courses in computing, but many still question the adequacy. more recently there have been increasing calls from within lis for more research into lis education and professional practice. 
in 2006, a study by mckinney comparing proposed “ala core competencies” to what was actually being taught in ala-accredited curricula, shed some light on what is currently offered in the core of lis education.15 the study found that the core competency required most often in ala-accredited programs were “knowledge organization” or cataloging (94.6 percent), “professional ethics” (80.4 percent), “knowledge dissemination” or reference (73.2 percent), “knowledge inquiry” or research (66.1 percent), and “technical knowledge” or technology foundations (66.1 percent).16 these courses map well to ala core competencies but the question in the digital age, is one, not even universally required, technology-related course adequate for a career in lis? the literature would seem to reflect that it is not. 2007 saw many calls for studies of lis education using methods that not only examined course curricula but that also sought evidence of outcomes by those working in the field.17 an interest in studies reporting on employers’ views, graduates’ workplace experiences, and if possible longitudinal studies have been outwardly requested.18 indications are that those in library work environments can play a vital role in shaping the future course of lis education and preprofessional training by providing targeted research, data, and evidence of where weaknesses are currently being experienced and what changes are driving new scenarios. the most current literature points out both areas of technology deficiencies and emerging opportunities in libraries. areas with an apparent need for immediate improvement are the continuing integration of third-party web 2.0 application programming interfaces (apis) and social networking platforms.19 debates about job titles and labels continue but the actuality is that the number of adequately trained digital librarians has not kept up with the demand.20 modern libraries require those in technology-related roles to have broad or ala-accredited lis programs looking for change between the years 2000 and 2002.8 markey’s study revealed that while there were improvements in the number of it-related courses offered and required throughout programs, they were still limited overall with the emphasis continuing to be on the core curriculum consisting of foundations, reference, organization, and management. one of the important points markey makes is the considerable challenge involved in retraining or acquiring knowledgeable faculty to teach relevant it courses. the focus on lis education issues came to the fore in 2004 when michael gorman released a pair of articles asserting that there was a crisis in lis education, namely an assault on lis by what gorman referred to as “information science,” “information studies” and “information technology.”9 gorman’s papers sought to establish that there is a de facto competition between information science courses, which he characterized as courses with a computational focus and lis courses, which composed core librarianship courses, those tending to be the more user focused and organizational. gorman claimed lis faculty were being marginalized in favor of information science and made further claims regarding gender roles within the profession along the alleged lis/is split. gorman also noted that there was no consensus about how “librarianship” should be defined coming from either ala or the lis graduate programs. 
the articles were not without controversy, spurring a flurry of discussion in the library community, which spawned several follow up articles. dillon and norris rallied against the library vs. information science argument as a premise, which has no bearing on the reality of what is happening in lis and does nothing but create yet another distracting disagreement over labels.10 others argued for the increasing inclusion of technology courses in lis education, as estabrook put it, librarianship without a strong linkage to technology (and it’s capacity to extend our work) will become a mastodon. technology without reference to the core library principles of information organization and access is deracinated.11 as the future of lis was being hotly debated, voices in the field were issuing warnings that obstacles were being encountered finding qualified librarians with the requisite technology skills necessary to take on new roles in the library. in 2007, johnson made the case for the increasing need for new areas of emphasis in lis, including specializations such as geographic information systems by pointing out that it is not so much the granular training that is expected of lis education but a higher level technology skill set that allows for the ability to move into these specializations, identify what is needed, assess problems, and make decisions.12 in 2006, neal noted that librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 131 by examination of course catalogs and surveys of both library administrators and technology librarians. the lis educational data was obtained by inspecting course catalogs. course catalogs and website curriculum pages from all ala-accredited lis programs in the united states, canada, and puerto rico were examined in december 2009 for the inclusion of technology-related courses. the catalogs examined were for the 2009–10 academic year. spanish and french catalogs were translated. each available course description was reviewed and those courses with a primary technology component were identified. in a secondary examination the selected courses were closely inspected for the exact technology focus and the primary subject content was noted for each course. courses were then separated into categories by areas of focus and tabulated. a targeted survey identified practicing technology librarians’ perspectives on their level of preparation and continuing skill level needs based on actual job demands. in this survey, librarians with significant technology roles was defined as “for the purposes of this survey a librarian with a significant technology role would be any librarian whose job would very likely be considered “it” if they were not in a library and whose job titles contain words like “systems, digital, web, electronic, network, database, automation, and whose job involves maintaining and/or building various it infrastructures.” the survey was posted on various library and library technology electronic discussion lists in december 2009 and was available for two weeks. library administrative perspectives were also gained through a targeted survey aimed at those with an administrative role of department head or higher. the survey was designed to capture the reported experience library administrators have had with librarians in significant technology roles, primarily as it relates to skill levels, availability, hiring, and retention. 
this survey was posted on to various library administrative and technology discussion lists in december 2009 and was also available for two weeks. both surveys included many similar questions to compare and contrast viewpoints. results were tabulated to form an overarching picture and some relevant comparisons were made. there are limitations and inherent issues with this type of research. catalog examinations when completed by qualified librarians can hold great accuracy; however, the introduction of bias or misinterpretation is always possible.26 when categorizing courses, the authors reviewed course descriptions three separate times to ensure accuracy. courses in doubt were reviewed again with knowledgeable colleagues to obtain a consensus. surveys designed to capture perspectives, views, and experiences are by nature highly subjective and provide data that is both qualitative and quantitative. tabulated data was given strictly simple numerical representation to provide a factual picture of what was reported. specialized competencies in areas such as web development, database design, and management paired with a good working knowledge of classification formats such as xml, marc, ead, rdf and dublin core. educational technology (et) has been identified as an area of expected growth opportunity for libraries and there have been suggestions that more lis programs should partner with et programs to improve lis technology offerings, skills and preprofessional training.21 lis program change, including the apparent coalescing of information technology focused education would appear to be demonstrated by the ischool or ifield caucus of ala accredited programs, however the literature is not clear on if that is actually being evidenced. the ischools organization started in as collective in 2005 with a goal of advancing information science. ischools incorporate a multidisciplinary approach and those with a library science focus are ala accredited.22 a 2009 study interestingly applied abbott’s theoretical framework used in the chaos of disciplines to the ifield.23 resulting in abstract yet relevant conclusions, abbott looks at change in a field through a sociological lens looking for patterns of fractal distinction over time. the study concluded that traditional lis education remained at the heart of the ifield movement and that the real change has been in locale, from libraries to location independent.24 hall’s 2009 study exploring the core of required courses across almost all ala accredited programs reveals that the core curriculum is still principle-centered, but it is focusing less on reference and intermediary activities with a definite shift toward research methods and information technology.25 ■■ method this research study was designed to capture a broad view of technology skill needs, skill availability, and skill acquisition in libraries, while still allowing for some areas of sharper focus on stakeholder perspectives. the four primary stakeholder groups in this study were identified as lis educators, lis students, working librarians, and library administrators. the research questions cover three main areas of technology skill acquisition and employment. one area is lis education and whether the status of all technology course offerings has changed in recent years in response to market demands. the second area is the experience of librarians with significant technology roles with regards to job availability, readiness, and technology skill acquisition. 
the third area is, the perception of library administrators regarding the availability and readiness of librarians with technology roles. to cover the research questions and provide a broad situational view, the research was triangulated and aimed at the three question areas. data collection was accomplished 132 information technology and libraries | september 2011 may arguably be considered description or cataloging. metadata was included because it is an integral part of many new digital services. the categories are presented in column 1, the total number of courses offered is presented in column 2. the number of advanced courses available within each category total is further broken out into parenthesis. some programs offered more than one course in a given category; hence the percentage of programs offering at least one course is given in column 3. additionally, although the librarian survey was targeted to “those with significant technology roles,” it would appear that the definition of “significant” seemed to vary in interpretation by the respondents. this is discussed in further detail in the findings. given the limitations of this type of research, the authors did not attempt to find definite correlations, however trends and patterns are clearly revealed. ■■ catalog findings course catalogs from all 57 ala-accredited programs in the united states, canada, and puerto rico were examined for the inclusion of technology-related courses. a total of 439 technology-related courses were offered across the 57 lis programs, including certificate program course offerings. the total number of technology-related courses offered by program ranged from 2 to 20. the mean number of courses offered per program was 7.7, the median was 10, and the mode was 4. table 1 shows the total number of technology courses being offered per program by matching them with the number of courses they offer. catalog course content descriptions were analyzed looking for a technology focus. the fifteen categories noted in table 2 were selected as representative of the technology-related courses offered. it is acknowledged that some course content may be overlapping, but each course was placed in only one category based on its primary content. note also the inclusion of “metadata markup” which table 1. number of technology-related courses being offered per program # of programs offering # of courses offered 1 offers 2 courses 6 offer 3 courses 8 offer 4 courses 6 offer 5 courses 7 offer 6 courses 5 offer 7 courses 5 offer 8 courses 1 offer 9 courses 6 offer 10 courses 1 offers 11 courses 3 offer 12 courses 2 offer 13 courses 2 offer 14 courses 1 offers 15 courses 1 offers 17 courses 1 offers 18 courses 1 offers 20 courses table 2. course content description and number of courses offered across all programs. the number of advanced courses in the total is given in parenthesis. 
course type as categorized by the course content description in the lis program catalog # of courses offered % of programs offering at least 1 course database design, development and maintenance 47 (7) 70 web architecture (web design, development, usability) 52 (11) 68 broad technology survey courses (basics of library technologies and overviews) 50 65 digital libraries 43 (4) 61 systems analysis, server management 49 (6) 60 metadata markup (dc, ead, xml, rdf) 43 (10) 50 digital imaging, audio and video production 33 (5) 47 automation and integrated library systems 21 37 networks 32 (3) 35 human computer interaction 21 (4) 29 instructional technology 12 21 computer programming languages, open source technologies 12 (2) 17 web 2.0 (social networking, virtual reality, third party api’s) 11 17 user it management (microcomputers in libraries) 6 10 geographic information systems 6 (1) 8 librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 133 ■■ perspectives on job availability, readiness and skill acquisition as previously noted in the method, two surveys were administered to collect participant viewpoint data pertinent to the study. reponses were carefully checked to determine whether they met the criteria for inclusion in the study. no attempt was made to disqualify respondents based solely on job title. it did appear that a significant number of non-target subjects did initially reply to the librarian survey, but quit the survey at the technology-related questions. final inclusion was based on either an it-related job title or if the respondent answered the technology questions regardless of job title. tables 3–5 report demographic response data. ■■ perspectives on job and candidate availability a 2009 study by matthew and pardue asked the question “what skills do librarians need in today’s world?”29 they sought to answer this question by performing a content analysis, spread over five months, of randomly selected jobs from ala’s joblist. what they found in the area of technology was a significant need for web development, an assessment of the course catalog facts reveals that there have been increases in the number of technology courses offered in lis programs, but is it enough? significant longitudinal data shows an increased emphasis in the area of metadata. a 2008 study of the total number of lis courses offering internet or electronic resources and metadata schemas, found that the number of programs offering such as being ten (17.5 percent) with only twelve metadata courses offered in total.27 current results show 43 metadata courses offered with 50 percent of lis programs offering at least one course. the lack of a solid basis in web 2.0 applications and integration as reported by aharony is confirmed by the current catalog data, with only 17 percent of programs offering a course.28 while at first glance it looks like many technology-related courses are currently being offered in lis programs, a closer inspection reveals cause for concern. many of these courses should be offered by 100 percent of lis programs and advanced courses in many areas should be offered as well. while there may be some overlap of content in some of these course descriptions, the percentages are still too low to deduce that lis graduates, without preprofessional technology experience or education, are really prepared to take on serious technology roles in academic libraries. table 3. 
response data responses administrative survey librarian survey total responses 185 382 total usable (qualified) 146 227 table 4. respondents institution by size by type administrative survey librarian survey under 5,000 37 72 5,000 10,000 25 31 10,000 15,000 18 28 15,000 20,000 11 20 20,000 25,000 13 21 25,000 30,000 16 13 30,000 35,000 4 11 35,000 40,000 5 9 more than 40,000 12 21 unknown 5 1 table 5. respondent type administrative survey: position # of responses dean, director, university librarian 46 department head 71 manager or other leadership role 29 librarian survey: general area of work # of responses public services 48 systems 42 web services 32 reporting dual roles 31 digital librarian 29 electronic resources librarian 28 emerging/instructional technologies 18 administrative 10 metadata/cataloger 9 technical services 7 distance education librarian 4 demographic data 134 information technology and libraries | september 2011 based on the difficulty rating and the classifications were then averaged by difficulty. some respondents were unsure of difficulty ratings because the searches happened before their presence at their current library and those searches were excluded. position classifications with less than five searches were excluded from averaging and are marked “na” in table 6. the difficulty rubric is as follows: 1 = easy; 2 = not too bad, pretty straightforward search; 3 = a bit tough, the search was protracted; 4 = very difficult, required more than one search; 5 = unable to fill the position. it is to be noted that almost all levels of difficulties were reported for many classifications but that the overall average hiring difficulty rating was 2.48. a comparable set of questions was posted to the librarian survey. we asked librarians to report professional level technology positions they had held in the past five years along with any current job searches. 164 responses were received by people indicating that they had held such a position or were searching for one, with the total number of positions/searches being reported at 316 with some respondents reporting multiple positions. respondents reported having between one and five different positions with the average number being 1.92 jobs per respondent (see table 7). the respondents were also asked to give the position title for each position held or positions they were applying for as well as the difficulty encountered in obtaining the position. like the administrative report, job titles were project management, systems development, and systems applications. further they suggest that some librarians are using a substantial professional it skills subset. this article’s literature review points out that there are assertions being made that some technology-related librarian positions are difficult to fill and may in fact be filled by non-mls professionals. in the associated surveys the authors sought to capture data related to actual job availability, search experiences and perspectives by both library administration and librarians. note that both mls librarians and a few professional library it staff completed the survey. the distinction is made where appropriate. the survey asked library administrators if they had hired a technology professional position in the past five years. 146 responses were received and 100 respondents indicated that they had conducted such a search, with the total number of searches being reported at 167. 
of these searches, 22 did not meet the criteria for inclusion due to other missing data such as job title. the total reported number of librarian/professional level technology positions that were posted for hire by these respondents was 145 with some respondents reporting multiple searches for the same or different positions. respondents conducting searches reported having between 1 and 5 searches total with the average number being 1.45 per respondent. the respondents were also asked to provide the position title for each search, the difficulty encountered in conducting the search, and the success rate. job titles were divided into categories to ascertain how many positions in each category reported having a relevant search conducted. each search was then assigned a point value table 6. administrative report on positions open, searches and difficulty of search (n = 145) position classification searches search difficulty systems/ automation librarian 40 2.78 digital librarian 32 2.6 emerging & instructional technology librarian 15 2.53 web services/ development librarian 33 2.51 electronic resources librarian 22 1.95 database manager 1 na network librarian/ professional 1 na table 7. librarian report on positions held or current searches and difficulty (n = 316) position classification # of positions/ searches search difficulty administrative 8 3 technical services 17 2.11 public services 57 2.1 systems/ automation librarian 76 1.89 web services/ development librarian 38 1.89 electronic resources librarian 39 1.87 digital librarian 41 1.8 metadata/cataloger 13 1.77 distance education librarian 6 1.66 emerging & instructional technology librarian 21 1.61 reporting dual roles 30 na librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 135 employment status for “newly minted” mls graduates having just entered the profession were asked in a survey “did specific information technology or computer skills lead to you getting a job?” the answer was a “resounding yes” by 66 percent of the respondents.33 experience is divided into categories to ascertain how many positions in each classification category. each position classification was then assigned a point value base on how the respondents rated the difficulty of those particular searches and the classifications were then averaged by difficulty using the same scale that was applied in the administrative survey. again, almost all levels of difficulties were reported for many classifications but that the overall average hiring difficulty rating was 1.9. to provide as accurate a picture as possible the surveys asked both groups to indicate if any well known mitigating factors contributed to complications with the job searches. these factors are shown in table 8 which stacks both groups for comparison. this particular dataset reveals some interesting patterns. those roles that were in the most demand were the also the most difficult to hire for, while these also were the easier positions for candidates to find. librarians also listed more job categories as having a significant technology component than the administrators had. perhaps most notable is the discrepancy shown between how administrators perceive the qualifications of candidates as compared to how candidates view themselves. while both groups acknowledge lack of it skills and qualifications as the number one mitigating factor, library administrators perceive the problem as being significantly more serious. 
this data backs up other recent findings that important new job categories are being defined in lis.30 the data also further support that these roles, while centering on core librarianship principles, have a different skill set.31 ■■ job readiness perspectives issues of job readiness for academic librarians need to be looked at from a number of different perspectives. job readiness can be understood in one way by a candidate and can be something different to an employer. job readiness is not only of critical concern at the beginning of a librarian’s career, clearly this attribute continues to be significant throughout an individual’s length of service in one or more roles and to one or more employers. job readiness is composed of several factors, the most important being education, experience and ongoing skill acquisition. while this is certainly true for all librarians it is of even more concern to those librarians with significant technology roles because of rapid changes in technology. a concern has been established in the literature and in this study that lis education, in the areas of technology, may be inadequate and lack the intensity necessary for modern libraries. this perception has been backed up by entrants to the profession.32 that technology skills are extremely important to library employers has been evident for at least a decade. in 2001 a case study on table 8. mitigating factors in hiring and job search (n = 93) administrative survey: mitigating factors in hiring as a percentage of respondents to the question (n = 93) % of responses we had difficulty getting an applicant pool with adequate skills 54 we are unable to offer qualified candidates what we feel is a competitive salary. 38 we are located in what may reasonably be perceived as an undesirable area to live. 23 we are located in an area with a very high cost of living. 23 we have an it infrastructure or environment that we and/or a candidate may have perceived as unacceptable. 20 the current economic climate has made hiring for these types of positions easier. 18 a successful candidate did not accept an offer of employment 13 librarian survey: mitigating factors in job search as a percentage of respondents to the question (n = 198) % of responses i suspect i may not have/had adequate skills, experience or i was otherwise unqualified. 25 i have not been able to find a position for what i consider to be a fair salary. 11 many jobs are located in what may reasonably be perceived as an undesirable area to live. 10 many jobs are located in an area with a very high cost of living. 15 some jobs have an it infrastructure or environment that i have perceived as unacceptable. 10 the current economic climate has now made finding these types of positions tougher. 22 i was a successful candidate but i could or did not accept an offer of employment. 3 136 information technology and libraries | september 2011 library technology experience they preferred from a candidate. there were 97 responses; the range of preferred experience was 0–7, the mean was 3.06, and the mode was 3. librarians were also asked how much experience they had in a technology-related library role. there were 187 responses; the range of experience was 0–39 years, the mean was 8.7, the mode was 5. when participating administrators were asked if they felt it was necessary to have an mlis librarian fill a technology-related role that is heavily user-centric, 110 administrators responded. 
also a very important factor, with one study of academic library search committees reporting committee members mentioning that “experience trumps education.”34 this study sought to gather data on possible patterns in the job readiness area. the authors wanted to know how job candidates and employers felt about the viability of new mls graduates, how experience factored into job readiness, how much experience is out there and how long term experience impacted expectations. the survey asked administrators how many years of table 9. question sets related to experience factors by group administrative survey strongly disagree disagree can’t say agree strongly agree new librarians right out of graduate school seem to be adequately prepared (n = 111) 7% 40% 24% 28% 1% librarians with undergraduate or 2nd graduate degrees in a technology/computer fields seem adequately prepared (n = 109) 1% 9% 48% 39% 4% librarians with pre-professional technologyrelated experience seem adequately prepared (n = 109) 1% 6% 47% 41% 8% librarians with some (up to 3 years) post mls technology experience seem adequately prepared (n = 111) 1% 10% 17% 62% 10% librarians with more than 3 years post mls technology experience seem adequately prepared (n = 111) 1% 3% 24% 55% 16% librarians never seem adequately prepared for technology roles (n = 111) 19% 55% 12% 7% 6% librarian survey strongly disagree disagree other agree strongly agree as a new librarian right out of graduate school i was adequately prepared (n = 187) 12% 19% no grad degree 3% 42% 8% i have an undergraduate or 2nd graduate degree in a technology/computer field that has helped me be adequately prepared (n = 187) 13% 7% no tech degree 60% 13% 6% i had pre-professional technology-related experience that helped me be adequately prepared (n = 187) 3% 7% no such experience 20% 43% 27% i have less than 3 years of post mls technology experience and i am adequately prepared (n = 180) 6% 13% na 63% 16% 1% i have more than 3 years of post mls technology experience and i am adequately prepared (n = 184) 2% 12% na 17% 48% 20% i have never felt like i am adequately prepared for technology roles (n = 186) 19% 43% neutral 23% 12% 2% librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 137 readiness of new librarians and the value of related technology degrees. areas of agreement are noted in the importance of preprofessional experience, three or more years of experience, and the generally positive attitude regarding librarians’ ability to successfully take on significant technology roles in libraries. ■■ ongoing skill acquisition and retention how librarians with significant technology roles acquire the skills needed to do their jobs and how they keep those skills current was of great interest in this study. the importance of preprofessional experience has been noted but we should also include the value of service learning in lis education as an important starting point. successful service learning experiences include practicum and partnerships with libraries in need of technology-related services. 
successful projects such as online exhibits, wireless policies, taxonomy-creation and cross-walking for contentdm are just a few of the service projects that have given lis students real-world experience.35 this responses ranged from 50 percent “yes,” 38 percent “no,” and 12 percent “unsure.” to the same question, 195 practicing technology librarians responded with 58 percent “yes,” 23 percent “no,” and 20 percent “unsure.” the administrator participants were asked if they had ever had to fill a technology-related librarian role with a non-mls hire simply because they were unable to find a qualified librarian to fill the job. of 106 responses, 22 percent reported that they hired a non-mls candidate. the librarian participants were also was asked to report on mls status; out of 194 responses, 93 percent reported holding an mls or equivalent. the survey also asked the librarian participants to report what year they graduated from their mls program as the authors felt this data was important to the inherent longitudinal perspectives reported in the study. of 162 responses, participants reported graduating between 1972–2009. the mean was 1999, the median was 2002, and the mode was 2004. table 9 shows a question set related to experience factors, which stacks both groups for comparison. there are a few notable points in this particular dataset including what appears to be an area of disagreement between administrators and librarians about the table 10. education and skill supplementation for librarians with technology roles administrative survey: in what ways have you supplemented training for your librarians or professional staff with technology-related roles? (does not include ala conferences) % we have paid for technology-related conferences and pre-conferences. 79 we have paid for or allowed time off for classes. 72 we have paid for or allowed time for off online workshops and /or tutorials 87 we have paid for books or other learning materials. 55 we have paid for some or all of a 1st or 2nd graduate degree. 12 we would like to supplement but it is not in our budget. 5 we feel that keeping up with technology is essential for librarians with technology-related roles. 73 librarian survey: in what ways have you supplemented your own education related to technology skill development in terms of your time and/or money? (not including ala conferences) % i have attended technology-related conferences and pre-conferences. 73 i have taken classes. 60 i have taken online workshops and/or tutorials 87 i have bought books or other learning materials. 77 i am getting a 1st or 2nd graduate degree. 9 we would like to supplement my own education but i can not afford it. 13 i would like to supplement my own education but i do not have time. 13 i have not had to supplement in any way. 1 i feel that keeping up with technology is essential for librarians with technology-related roles. 84 i feel that keeping up with technology is somewhat futile. 11 138 information technology and libraries | september 2011 librarians who have transitioned successfully into technology centric roles. this supports the perception that experience and on the job learning play a leading role in the development of technology skills for librarians. openended survey comments also revealed a number of staff who initially were hired in an it role and then went on to acquire an mls while continuing in their technologyfocused role. 
retention is sometimes problematic for librarians with it roles, primarily because many of them are also employable in many other settings apart from libraries. the survey asked administrators “do you know any librarians with technology roles that have taken it positions outside the library field?” and out of 111 respondents, 33 percent answered “yes.” in open-ended responses the most common reasons administrators felt retention may be a problem was salary, lack of challenges/opportunities, and risk averse cultures. the survey also asked the librarian group “do you think you would ever consider taking an it position outside the library field?” out of 190 respondents; 34 percent answered “yes,” 23 percent “yes, but only if it was education related,” and 42 percent “no.” additionally 38 percent of these librarian respondents knew a librarian who took an it position outside the library field. for the librarian participants an open response field in the survey, named work environment and lack of support for technology as the most often named reasons for this leaving a position. the surveys used in this research study covered several complicated issues. those who responded to the surveys were encouraged to leave open text comments research study asked administrators and librarians in what formal ways they supplement their ongoing education and skill acquisition. table 10 shows these results in a stacked format for comparison. also of interest in this data set is the higher level of importance librarians place on continuing skill development in the area of technology. in open ended text responses a number of librarians reported that the less formal methods of monitoring electronic discussion lists and articles was also a very important part of keeping up with technology in their area. the priority of staying educated, active and current for librarians with significant technology roles cannot be underestimated; what tennant defines as technology agility, the capacity to learn constantly and quickly. i cannot make this point strongly enough. it does not matter what they know now. can they assess a new technology and what it may do (or not do) for your library? can they stay up to date? can they learn a new technology without formal training? if they can’t they will find it difficult to do the job.36 not all librarians with technology roles start out in those positions and thus role transformation must be examined. in some cases librarians with more traditional roles such as reference and collection development have transformed their skill set and taken on technology centric roles. table 11 shows the results of the survey questions related to role transformation in a stacked format for comparison. to be noted in this data set is the large number of table 11. role transformation from traditional library roles to technology centric roles and the reverse. administrative survey (n = 104) % we have had one or more librarians make this transformation successfully. 53 we have had one or more librarians attempt this transformation with some success. 35 we have had one or more librarians attempt this transformation without success. 17 some have been interested in doing this but have not done so. 14 we do not seem to have had anyone interested in this 11 we have had one or more librarians who started out in a technology-related librarian role but have left it for a more traditional librarian role. 5 librarian survey (n = 184) % i started out in a technology-related librarian role and i am still in it. 
45 i have made a complete technology role transformation successfully from another type of librarian role. 30 i have attempted to make a technology role transformation but with only some success. 12 i have made a technology role transformation but sometimes i wish i had not. 9 i have made a technology role transformation but i wish i had not and i am interested in returning to a more traditional librarian role. 9 i am not a librarian. 4 librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 139 vary considerably from program to program and the content of individual courses appears to vary considerably as well. there appears to be a clear need for additional courses at a more advanced level. this need is evidenced by the experiences of both information technology job candidates and the administrators involved in the hiring decisions. there are clearly still difficulties in both the acquisition of needed skill sets for certain positions and in actual hiring for some information technology positions. there are also some discrepancies between how administrators perceive candidates’ qualifications as compared to how the candidates view themselves. administrators perceive the problem of a lack of it skills/qualifications as more serious than do candidates. the two groups also differ on the question of “readiness” of new professionals. the two groups do agree on the importance of preprofessional experience, and they both exhibit generally positive attitudes toward librarians’ ability to successfully take on significant technology roles in libraries. in several key areas. a large number of comments were received and many of them were of considerable length. many individuals clearly wanted to be heard, others were concerned their story would not be captured in the data, and many expressed a genuine overall interest in the topic. a few salient comments from a variety of areas covered are given in table 12. ■■ conclusion this study seeks to provide an overview of the current issues related to it staffing in academic libraries by reporting on three areas dealing with library skill acquisition and employment. with regards to the status of technology course offerings in lis programs, there has been a significant increase in the number of technologyrelated courses, but the numbers of technology courses table 12. a sample of open ended responses from the two surveys administrative survey “there is a huge need for more and adequate technology training for librarians. it is essential for libraries to remain viable in the future.” “only one library technology position (coordinator) is a professional librarian. others are professional positions without mls.” “there is a lot of competition for few jobs, especially in the current economic climate.” “we finally hired at the level of technician as none of the mls candidates had the necessary qualifications.” “if i wanted a position that would develop strategy for the library’s tools on the web or create a digitization program for special collections, i probably would want an mls with library experience simply because they understand the expectations and the environment.” “number of years of experience in technology is not as important as a willingness to learn and keep current. sometimes old dogs won’t move on to new tricks. 
sometimes new dogs aren’t interested in learning tricks.” librarian survey “i believe that because technology is constantly changing and evolving, librarians in technology-oriented positions must do the same.” “my problem with being a systems librarian in a small institution is that the job was 24/7/365. way too much stress with no down time.” “i have left the library field for a few years but came back. my motivation was a higher salary, but that didn’t really happen.” “i’m considering leaving my current position because the technology role (which i do love) was added to my position without much training or support. now that part of my job is growing so that i can’t keep up with all my duties.” “i don’t think that library school alone prepared me for my job. i had to do a lot of external study and work to learn what i did, and worked as a part-time systems library assistant while in school, where i learned the majority of what prepared me for my current job.” “library schools need to be more rigorous about teaching students how to innovate with technology, not just use tools others have built. you can’t convert “traditional” librarians into technology roles without rigorous study. otherwise, you will get mediocre and even dangerous results.” 140 information technology and libraries | september 2011 16. ibid., 53–54. 17. thomas w. leonhardt, “thoughts on library education,” technicalities 27, no. 3 (2007): 4–7. 18. thomas w. leonhardt, “library and information science education” technicalities 27, no. 2 (2007): 3–6. 19. noa aharony, “web 2.0 in u.s. lis schools: are they missing the boat?” ariande 30, no. 54 (2008): 1. 20. chuck thomas and salwa ismail patel, “competencybased training for digital librarians: a viable strategy for an evolving workforce?” journal of education for library & information science, 49, no. 4 (2008): 298–309. 21. michael j. miller, “information communication technology infusion in 21st century librarianship: a proposal for a blended core course,” journal of education for library & information science 48, no. 3 (2007): 202–17. 22. “about the ischools.” (2010); http://www.ischools.org/ site/about/ (accessed 9/1/2010). 23. laurie j. bonnici, manimegalai m. subramaniam, and kathleen burnett, “everything old is new again: the evolution of library and information science education from lis to ifield,” journal of education for library & information science 50, no. 4 (2009): 263–74; andrew abbott, the chaos of disciplines (chicago: chicago univ. pr., 2001). 24. bonnici, “everything old is new again,” 263–74. 25. russell a. hall, “exploring the core: an examination of required courses in ala-accredited,” education for information 27, no. 1 (2009): 57–67. 26. ibid., 62. 27. jane m. davis, “a survey of cataloging education: are library schools listening?” cataloging & cataloging quarterly 46, no. 2 (2008): 182–200. 28. aharony, “web 2.0 in u.s. lis,” 1. 29. janie m. mathews and harold pardue, “the presence of it skill sets in librarian position announcements,” college & research libraries 70, no. 3 (2009): 250–57. 30. “redefining lis jobs,” library technology reports 45, no. 3, (2007): 40. 31. youngok choi and edie rasmussen, “what qualifications and skill are important for digital librarian positions in academic libraries? a job advertisement analysis,” the journal of academic librarianship 35, no. 5 (2009): 457–67. 32. carla j. 
soffle and kim leeder, “practitioners and library education: a crisis of understanding,” journal of education for library & information science 46, no. 4 (2005): 312–19. 33. marta mestrovic deyrup and alan delozier, “a case study on the current employment status of new m.l.s. graduates,” current studies in librarianship 25, no. 1/2, (2001): 21–38. 34. mary a. ball and katherine schilling, “service learning, technology and lis education,” journal of education for library & information science 47, no. 4 (2006): 277–90. 35. marta mestrovic deyrup and alan delozier, “a case study on the current employment status of new m.l.s. graduates,” current studies in librarianship 25, no. 1/2 (2001): 21–38. 36. roy tennant, “the most important management decision: hiring staff for the new millennium,” library journal 123, no. 3 (1998): 102. more research is still needed to identify the key technology skills needed. case studies of successful library technology teams and individuals may reveal more about the process of skill acquisition. questions regarding how much can be taught in lis courses or practicum, and how much must be expected through on-the-job experience are good areas for more research. references 1. james michalko, constance malpas and arnold arcolio, “research libraries, risk and systematic change,” oclc research (mar. 2010), http://www.oclc.org/research/publications/ library/2010/2010-03.pdf. 2. lori a. goetsch, reinventing our work, “new and emerging roles for academic librarians,” journal of library administration 48, no. 2 (2008): 157–72. 3. janie m. mathews and harold pardue, “the presence of it skill sets in librarian position announcements,” college and research libraries 70, no. 3 (2009): 250–57. 4. peggy johnson, “from the editor’s desk,” technicalities 27, no. 3 (2007): 2–4. 5. ton debruyn, “questioning the focus of lis education,” journal of education for library & information science 48, no. 2 (2007): 108–15. 6. jacquelyn erdman, “education for a new breed of librarian,” reference librarian 47, no. 98 (2007): 93–94. 7. “educating library and information science professionals for a new century: the kaliper report,” executive summary. aliper advisory committee, alise. (reston, virginia, july 2000), http://www.si.umich.edu/~durrance/textdocs/ kaliperfinalr.pdf (accessed june 1, 2010). 8. karen markey, “current educational trends in library and information science curricula,” journal of education for library and information science 45, no. 4 (2004): 317–39. 9. michael gorman, “whither library education?” new library world 105, no. 9/10 (2004): 376–80; michael gorman, “what ails library education?” journal of academic librarianship 30, no. 2 (2004): 99–101. 10. andrew dillon and april norris, “crying wolf: an examination and reconsideration of the perception of crisis in lis education,” journal of education for library & information science 46, no. 4, (2005): 208–98. 11. leigh s. estabrook, “crying wolf: a response,” journal of education for library & information science 46, no. 4 (2005):299–303. 12. ian m. johnson, “education for librarianship and information studies: fit for purpose?” information development 23, no.1 (2007): 13–14. 13. james g. neal, “raised by wolves,” library journal 131, no. 3 (2006): 42–44. 14. sheila s. intner, “library education for the third millennium,” technicalities 24, no. 6 (2004): 10–12 15. renee d. 
mckinney, “draft proposed ala core competencies compared to ala-accredited, candidate, and precandidate program curricula: a preliminary analysis,” journal of education for library & information science 47 no.1 (2006): 52–77. editorial board thoughts: doesn’t work mark cyzyk editorial board thoughts | cyzyk 3 the proof of the pudding’s in the eating. miguel de cervantes saavedra. the ingenious hidalgo don quixote de la mancha. part i, chapter xxxvii, john rutherford, trans. about fifteen years ago i had two students from germany working for me, jens and andreas. those guys were great. they were smart and funny and interesting and always did their best. i would send them out to fix things around the library, and they would dutifully report back with success or failure. i told them that, particularly if there was a problem with a staff workstation, “if it breaks in the morning, it must be fixed by lunchtime; if it breaks in the afternoon, it must be fixed by 5:00.” they understood that if a staff workstation was down, then that probably also meant a staff member was just sitting there, waiting for it to be fixed. if we had to we could slap a sign on a broken public workstation and get back to it later—there were plenty of other working public stations after all—but staff workstations must be working at all times. insofar as we had an aged fleet of pcs whose cmos batteries were rapidly giving out, i kept jens and andreas running around the building quite a bit. on occasion, though, they would report back with the dreaded, “hey boss, doesn’t work.” this was the one thing that would raise my ire. “of course it doesn’t work, that’s why i sent you down there!” i would think. the phrase “doesn’t work” became for me a pavlovian signal that i was about to drop everything and go take a look myself. it now occurs to me, though, that this notion of “work” is precisely the point of technology, and that sometimes this gets lost for those of us employed fulltime as technologists in libraries. let me explain: in my opinion and for the most part, the proper role of the technologist in a library is that of a consultant on loan to the departments to work on projects there, embedded.1 two of the best bosses i ever had said essentially the same thing to me in our introductory first-day-on-the-job chit-chat: “you report to me, but you work for them.” such is the proper attitude in any serviceoriented profession. does this not frequently get inverted, subverted, lost? what happens is that technology starts to take on an importance undeserved. it becomes selfreferential and insular; a technology-for-technology’s-sake attitude arises. mark cyzyk (mcyzyk@jhu.edu) is scholarly communication architect in the sheridan libraries, john hopkins university. mailto:mcyzyk@jhu.edu information technology and libraries | june 2012 4 but technology-for-technology’s-sake is just wrong. technology is merely a means to an end, not an end in itself. the word itself derives from the ancient greek technê, most frequently translated into english as “craft” and frequently distinguished in the greek philosophical literature from epistêmê or (certain) knowledge.2 so it is here that the crucial distinction in the western world between practical and theoretical activities is made, and technology is clearly a practical, not theoretical activity. as such, it has by its very nature practical outcomes in the world: technology works in the world. 
technology is instrumental in achieving certain practical outcomes; its value is as a tool, instrumentally valuable, not inherently valuable. it is not for its own sake that we implement technology; we implement technology to get some sort of work accomplished in the world. our programming languages, application servers, web application frameworks, ajax libraries, integrated development environments, source-code repositories, build tools, testing harnesses, switches, routers, single-signon utilities, proxy servers, link resolvers, repositories, bibliographic management utilities, help-desk ticketing applications, and elaborate project-management protocols are all for naught if the final product of our labor, at the end of the day, doesn’t work. our product is not only literally useless, it is worse than useless because the library in which we labor has devoted precious resources to it only to result in a service or product that does not properly function, and those are precious resources that could have been spent elsewhere. hey there fellow technologists, why am i being so dismal? i would prefer the term “grave” to “dismal.” significant portions of the library budget are put toward technology each year, and as those whose duty it is to carry our local technology strategies into the future, we need to always be mindful of the fact that each and every dollar spent on technology is a dollar not available for building our collections—surely the direct center of the mission of anyone who calls himself a librarian, a.k.a., a cultural conservationist. (shouldn’t we be wearing badges that read, “to collect and preserve”?) making it work is job one for the technologist in the library. … a colleague and friend of mine once told me, a decade ago, that our fellow colleague made a snippy comment about an important and major web application i had written, “just because it works doesn’t mean it’s right.” now, admittedly, i was a very sloppy code formatter, and yet i certainly would never say that the applications i wrote were steaming plates of spaghetti. on the contrary, i think the code i wrote consisted of good, solid procedural programming. what my disgruntled colleague meant, i think, was that i failed to follow a framework, and by “framework” he naturally meant the same framework to which he’d recently hitched his own coding wagon. my response to his snippiness was, “ah, pretty-it-up all you want, organize it any-which-way, but functional code-code that works--is actually the number one criterion for being good code.” just ask your clients. editorial board thoughts | cyzyk 5 that app i wrote has been in production, happily working away as a key piece of the enterprise network infrastructure at a prominent, multi-campus, east coast university since 2002.3 references 1. and here i heartily agree with my fellow editorial board member, michael witt, when he notes that “[p]art of this process is attempting to feel our users’ pain…”, and i even extend this to the point of us technologists actively working with our users toward a common goal, literally sitting with them, among them, not merely being present to offer occasional support, not merely feeling their pain but being so invested in our common project that their pain is our pain. [did i really just suggest we take on more pain?! yep.] see: michael witt. “eating our own dogfood.” information technology and libraries 30, no. 3 (september 2011) 90. http://www.ala.org/lita/ital/sites/ala.org.lita.ital/files/content/30/3/pdf/witt.pdf 2. 
i’m no classics scholar, but this is my recollection from taking a graduate seminar many years ago on this very topic. so while i’m not pulling this entirely out of thin air, i am pulling it from the musty mists of middle-aged memory – that, and a quick scan of professor richard parry’s fine article on this topic in the stanford encyclopedia of philosophy, particularly the section on aristotle’s views. regarding my comments below on technology being instrumentally valuable, i cite parry’s words: “presumably, then, the craftsman does not choose his activity for itself but for the end; thus the value of the activity is in what is made”. see: richard parry. "episteme and techne," the stanford encyclopedia of philosophy, fall 2008 edition, edward n. zalta, editor. http://plato.stanford.edu/archives/fall2008/entries/episteme-techne/ 3. mark cyzyk, "the johns hopkins address registration system (jhars): anatomy of an application," educause quarterly 26, no. 3 (2003). https://jscholarship.library.jhu.edu/handle/1774.2/32800 http://www.ala.org/lita/ital/sites/ala.org.lita.ital/files/content/30/3/pdf/witt.pdf http://plato.stanford.edu/archives/fall2008/entries/episteme-techne/ https://jscholarship.library.jhu.edu/handle/1774.2/32800 experiences of migrating to an open source integrated library system vandana singh information technology and libraries | march 2013 36 abstract interest in migrating to open-source integrated library systems is continually growing in libraries. along with the interest, lack of empirical research and evidence to compare the process of migration brings a lot of anxiety to the interested librarians. in this research, twenty librarians who have worked in libraries that migrated to open-source integrated library system (ils) or are in the process of migrating were interviewed. the interviews focused on their experiences and the lessons learned in the process of migration. the results from the interviews are used to create guidelines/best practices for each stage of the adoption process of an open-source ils. these guidelines will be helpful for librarians who want to research and adopt an open-source ils. introduction open-source software (oss) has become increasingly popular in libraries, and every year more libraries migrate to an open-source integrated library system.1 while there many discrete opensource applications used by libraries, this paper focuses on the integrated library system (ils), which supports core operations at most libraries. the two most popular open-source ilss in the united states are koha and evergreen, and they are being positioned as alternatives to proprietary ilss. 2 as open-source software becomes more widely used, it is not enough just to identify which software is the most appropriate for libraries, but it is also important to identify best practices, common problems, and misconceptions with the adoption of these software packages. the literature on open-source ilss is usually in the form of a case study from an individual library or a detailed account of one or two aspects of the process of selection, migration, and adoption. in our interactions with librarians from across the country, we found that there are no consolidated resources for researching different open-source ilss and for sharing the experiences of the people using them. librarians who are interested in open-source ils cannot find one resource that can give them an overview of the necessary information related to open-source ilss. 
in this research, we interviewed twenty librarians from different types and sizes of libraries and gathered their experiences to create generalized guidelines for the adoption of open-source ilss. these guidelines are at a broader level than one single case study and cover all the different stages of the adoption lifecycle. the experiences of librarians are useful for people who are evaluating opensource ilss as well as those who are in the process of adoption. learning from their experiences will help librarians to not have to reinvent the wheel. this type of research helps the librarians by empowering them with the information they need; also, it helps us in understanding the current status of this popular software. vandana singh (vandana@utk.edu) is assistant professor, school of information sciences, university of tennessee, knoxville, tennessee. mailto:vandana@utk.edu experiences of migrating to an open-source integrated library system | singh 37 literature review as mentioned earlier, most of the literature on open-source ils is practitioner-based and provides case studies or single steps in the process of adoption. these research studies and resources are useful but do not address the broad information needs of the librarians who are researching the topic of open-source ilss. every library is different, so no two libraries are going to take the same path in the adoption process. the usefulness of these articles depends on whether the searcher can find one in a similar environment. another issue is the amount of information given in these resources. often these papers discuss only one aspect of moving to an open-source ils, for example choosing the open-source ils. if they do cover the whole process, there is usually not enough detail to know how they did it. for example, morton-owens, hanson, and walls organize their paper into five sections: motivation and requirements analysis, software selection, configuration, training, and maintenance. 3 however, each section includes more main points than description. another relevant stream of literature includes those that compare different opensource ilss. these range from little more than links to different open-source projects to in-depth comparisons.4 for example, muller evaluated open-source communities for different ilss on forty criteria and then compared the ils on over eight hundred functions and features.5 these types of articles are very useful for those who are trying to become acquainted with the different opensource ilss that are available and are in the evaluation phase of the process. again, they are not helpful in understanding the entire process of adoption. some best practices articles such as tennant may be a little older, but his nine tips are still valid and very useful as a good foundation for anyone thinking about making the switch to open-source ils.6 what are the factors for moving to an open-source ils? another reason why an open-source ils appeals to libraries is its underlying philosophy: “open source and open access are philosophically linked to intellectual freedom, which is ultimately the mission of libraries.” 7 the other two common reasons are cost and functionality. the literature covering the decision to move to an open-source ils makes it clear that there is a wide variety of ways that libraries come to this decision. in espiau-bechetoille, bernon, bruley, and mousin, the consortium made the decision in four parts. 
8 the article states that they initially determined that four open-source ilss met their needs (koha, emilda, gnuteca and evergreen), although it is somewhat vague as to how they determined that koha was the best for their situation. indeed, most of the article is about how the three libraries involved had to work together, coordinating and dividing responsibilities. bissels shares that money was the main reason that the complementary and alternative medicine library and information service (camlis) decided to migrate to koha.9 they explain the process of making that decision. camlis was being developed from nothing, which makes their situation different than most libraries, and hence the process is different as well. michigan is an area known for its number of evergreen libraries. much of that is due to michigan library consortium. dykhuis explains the long, involved process that led to a number of evergreen installations. 10 mlc provides services to michigan information technology and libraries | march 2013 38 libraries, such as training and support. when they started looking for an ils system that all libraries could use, the main concerns were cost and functionality, which are the two key aspects that are mentioned in any discussion about choosing an ils. kohn and mccloy state that they decided to migrate to a new ils due to frustration with their current ils and that they involved all six of their librarians in the decision-making process.11 dennison and lewis show another reason why people migrate to open-source ils.12 they say that the proprietary system they were using was much more complicated than they needed. in addition, because of staff turnover no one really understood the system. this lack of expertise combined with increasing annual costs led to the decision to move to an open-source ils. an important lesson to take from this article is that they included all six of their librarians in the decisionmaking process. for a smaller library where everyone is an expert in their area of the library, it is important to get everyone involved in order to make sure that important functions or needed capabilities are not overlooked. almost any library that chooses open-source ils will name cost as one of their primary reasons. functionality is usually what determines which ils they choose. riewe conducted a study where he asked why each library chose its current ils. 13 open source libraries responded most often with ability to customize, the freedom from vendor lock-in, portability, and cost. how does migration happen? there are two general ways to do a migration: all at once or in stages. kohn and mccloy discuss a three-phase migration.14 the reason for this method was to spread the cost over several years. they did the public website and federated catalog as phase one and did the backend part during phases two and three. when multiple libraries are involved, phased migration is more like what is described in dykhuis.15 in that case, first a pilot program was created where a few libraries migrated over to the new system. when that was successful, then more libraries migrated. in contrast to a phased migration, walls discusses a migration completed in three months.16 this time includes installation, testing, and configuration. one interesting decision they made was to migrate at the end of the fiscal year in order to limit the amount of acquisitions data to be migrated. dennison and lewis completed their migration in two months. 
in this migration, most of the work was done by the company that was hosting their system. 17 this limited the amount of expertise that the library staff needed and made the migration much smoother from their perspective. migration can also be an opportunity; for example, morton-owens, hanson, and walls mention that they used the migration to koha to synchronize circulation rules between the branches. 18 it was also used to weed out inactive patrons (anyone who had not used the library in two years). data migration can be a problem, though. in the old system, the location code had been used for where the item was within the branch library, what kind of item it was, and how it circulated, but experiences of migrating to an open-source integrated library system | singh 39 these are three separate fields in koha. however, to some extent these issues are true of any migration between different systems. the migration experience is not always of a smooth transition. one of the advantages of opensource is the ability to customize and to develop functions that are specific to your library. in the case of new york academy of medicine library (nyam) working with its consortium waldo (westchester academic library directors organization), it was the decision to have developments completed before migration that caused the problems.19 their migration schedule was delayed by a month, and even after the delay not all of the eleven key features were complete. in addition, their migration took place when liblime (a proprietary vendor) with whom they were working announced their separation from the koha open-source community, which caused additional confusion. there are a couple of lessons to take from this. first, if doing development, be sure that the time needed is built into the migration schedule. also, when choosing an ils, think about how many developments are going to be necessary to successfully run the ils in your environment. lastly, try to prioritize the developments to minimize the number needed before “going live.” what does the literature say about training? very little is available about the training process for open-source ils. in current studies, training can be done in two ways: either by buying training from a vendor, or doing it internally.20 dennison and lewis found that having staff work on the system together at first and then try it independently was the most successful. 21 they had a demonstration system to practice, which also helped. in addition to this self-training, they had onsite training done by module, which allowed staff to attend only the training that was relevant and needed for them. in all of the articles discussed in this section, only one talks about ongoing maintenance. 22 the two-paragraph section includes suggested methods and does not mention anything about the amount of time or expertise needed for ongoing maintenance. in summary, in this literature review we found that there is research about open-source ils but that there is a need for much more work in this area. it was found that research articles and practitioner pieces are available and talk about different aspects of the adoption process. the main reasons for adoption are identified. there are also a few scattered individual articles about the process of migration, training, and maintenance. there is a gap in the studies of open-source ils, and there is no comprehensive study that documents the process, explains the steps, and identifies best practices and challenges for librarians who are interested. 
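one concrete migration problem described above is a single legacy field that encodes several facts at once: morton-owens, hanson, and walls note a location code that captured where an item sat, what kind of item it was, and how it circulated, all of which map to three separate fields in koha. the sketch below illustrates that kind of field split; it is an illustration only, and the code values and field names are invented examples, not koha's actual schema.

```python
# illustrative only: splitting one combined legacy location code into the three
# separate fields a system like koha expects; code values and field names are invented
LOCATION_MAP = {
    "REF-BK-NOCIRC":  {"shelving_location": "reference",   "item_type": "book", "circ_rule": "no_circulation"},
    "MAIN-DVD-7DAY":  {"shelving_location": "main_stacks", "item_type": "dvd",  "circ_rule": "seven_day"},
    "CHILD-BK-28DAY": {"shelving_location": "childrens",   "item_type": "book", "circ_rule": "twenty_eight_day"},
}

def split_location(legacy_code: str) -> dict:
    """return the three target fields, or flag an unexpected code for manual review."""
    return LOCATION_MAP.get(
        legacy_code.strip().upper(),
        {"shelving_location": "UNMAPPED", "item_type": "UNMAPPED", "circ_rule": "UNMAPPED"},
    )

print(split_location("main-dvd-7day"))
```

a small lookup table like this also surfaces the unexpected legacy values early, which is usually where the manual cleanup effort goes.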
data sources the objective for data collection was to collect data from a variety of library types and sizes in order to collect a wide range of data. e-mail invitations for interviews were sent to koha and evergreen discussion list and to several other library-related discussion list. the e-mail requested volunteers for a telephone interview to share their experiences with open-source integrated library systems. potential participants identified themselves as being willing to be interviewed for information technology and libraries | march 2013 40 the project via e-mail and were then contacted by researchers to set up times for phone interviews. the list of interview questions was e-mailed to the participants before the interviews so that they could review the questions and had enough time to reflect on their experiences. the interviews were conducted with librarians working in a variety of libraries, including nine libraries using evergreen and one in the process of migrating to evergreen. seven libraries were using koha, two were using other open-source ilss, and one was using a proprietary ils while evaluating opensource. public libraries were the most numerous with eleven respondents, while there were also four special libraries, three academic libraries, and one school library. researchers also requested information about the size of the library collection. seven libraries owned collections of less than 100,000 items, seven had collections of 100,001–999,999 items, and four libraries owned collections of over 1,000,000 items. geographically, the respondents ranged all over the united states and included one library located in afghanistan (although the ils was installed in the united states). table 1 details the description of the data. data collection method interviews were chosen as the primary means of data collection in order to gather rich information that could be analyzed using qualitative methods. researchers sought to interview professionals from a variety of library types and sizes in order to collect a variety of different experiences regarding the selection, implementation, and ongoing maintenance of open-source ils. interviewing was the chosen methodology for several reasons. first, the goal was to go past the practitioner articles to see what kinds of trends there are in the migration process. this requires getting experiences from multiple librarians. interviews provide the in-depth “case-study description” that we were looking for.23 in addition, the most useful aspect of interviewing is the ability to follow up on an answer that the participant gives.24 this ensures that the same type of information is gathered from every interview. this is unlike surveys where sometimes participants do not respond in a way that answers what the researcher really wants to know. in our case, we used telephone interviews due to the geographic dispersion of the participants. it allowed us to talk to librarians from all over the country instead of just within our area. the interview questions are listed in appendix a. data analysis methodology interviews were transcribed, and identifying information was then removed from each of the transcribed documents. the transcripts were then uploaded into dedoose (www.dedoose.com), a web-based analysis program supporting qualitative and mixed methods research. dedoose provides online excerpt selection, coding, and analysis by multiple researchers for multiple documents. 
the research team used an iterative process of qualitatively analyzing the resulting documents. this method used multiple reviews of the data to initially code large excerpts, which were then analyzed twice more to extract common themes and ideas. researchers began by reviewing each document for quantitative information, including the library type, ils in use, number of it staff, and size of the collection. this information was added as metadata descriptors to each document in dedoose. upon review of the transcriptions and in discussions about the interview process, researchers began a content analysis of the qualitative data. codes were created based on this initial analysis to aid in categorizing the data from the interviews. two coders coded the entire dataset, assigning categories and themes to the excerpts of the interview transcriptions. all of the excerpts from each coder were used to create two tests. each coder then took the test of the other's codes by choosing their own codes for each excerpt. researchers earned scores of .96 and .95 using cohen's kappa statistic, indicating very high reliability.

table 1. description of libraries

library size (number of items in collection) | library type | ils used
under 100,000 | academic | koha
100,000–1,000,000 | public | evergreen
under 100,000 | special | proprietary—considering open-source
under 100,000 | public | koha
— | school | koha
100,000–1,000,000 | public | millennium—in process of migrating to evergreen
100,000–1,000,000 | public | evergreen
100,000–1,000,000 | special | koha
under 100,000 | public | koha
— | public | evergreen
100,000–1,000,000 | academic | evergreen-equinox
under 100,000 | special | koha
over 1,000,000 | academic | kuali ole
100,000–1,000,000 | public | evergreen-equinox
over 1,000,000 | public | evergreen
100,000–1,000,000 | public | evergreen
under 100,000 | public | koha-bywater
over 1,000,000 | public | evergreen-equinox
under 100,000 | public | evergreen
over 1,000,000 | special | collective access

results

results from the interview questions were divided into eight categories identified as stages of migration, starting with evaluation of the ils, creation of a demonstration site, data preparation, identification of customization and development needs, data migration, staff training and user testing, and going live and long-term maintenance plans. best practices and challenges for each of the stages are presented below. this section begins with some general considerations gleaned from the responses.

general considerations when migrating to an open-source ils

• create awareness about open-source culture in your library—let them know what to expect.
• develop it skills internally even if you use a vendor.
• assess your staff's abilities before committing. knowing what your staff can do will help determine whether you need to work with a vendor and to what degree, or if you can do it alone. it is also a way to determine who is going to be on your migration team.
• have a demonstration system; pre-migration, it can be used to test and train, and after migration it can be used to help find solutions to problems. this will also help develop skills internally.
• communication is key.
  o if working with a vendor either as a single library or as a consortium, have a designated liaison with the vendor so all questions go through one person. in a consortium, ensure that everybody knows what is going on.
• be prepared to commit a significant amount of staff time for testing, development, and migration, especially if you are not hiring a proprietary vendor for support.

working with vendors

• read contracts carefully. do not be afraid to ask questions and request changes. sometimes the other party has a completely different meaning for a word than you do; make sure you are on the same page.
• ensure that there is an explicit timeline and procedure for the release of usable source code.
• see that you are guaranteed and entitled to access the source code in case you need to switch developers, bring additional developers on board, or try to fix problems in-house.
• provide specific examples when reporting problems. specific examples will help the developers determine what the problem is and will help prevent any miscommunication.
• designate a liaison between library staff and developers. the liaison will have to be someone who understands, or can learn enough about, what the developers are doing so that he or she can translate any problems or complaints from one group to the other.
• set up regular meetings for those involved in the migration project. regular meetings keep everyone focused and on task. they also provide an opportunity for questions, concerns, and problems to be addressed quickly.

sample quote from interviews:

one of the main things that came up is working with equinox, it was amazing. to start with, they were very, very helpful. and i had made an assumption, and i think the rest of us had, too, that we were working with, that this was developed by librarians, and that the terminology used would be library jargon. but that was not the case. we had some stumbling points over, we would say, okay, we want this, or this is a transaction, or that's a bill, but that's not what they called it. they didn't call it a transaction, or they didn't call it a bill. and so when we wrote the contract, we wrote it so that none of the patrons' current checkout record would migrate, which is a big issue. and we didn't realize that we weren't using the right terminology in order to put that in the contract so that those current checkouts would move over with the migration and not just the record.

stage 1—evaluation

when making the decision of whether to migrate to open-source and which open-source ils is best for your library, the main things to start with are two questions: who makes the decision, and on what basis?

in practice, who makes the decision?

• if a single library, one or two people make the decision, usually the library director and whoever is serving as the tech person.
• if in a consortium, a committee makes the decision, often either the library directors or tech people.

best practice suggestion: regardless of the size of the library system, even though these are the people making the decisions, you should always try to include as many groups as possible in the decision to move to open-source.

which ils?

• make a list of requirements based on your current system and a wish list of requirements for the new system. this is one area where you can involve more than just the system staff. asking the different departments (cataloging, acquisitions, and circulation) what their needs are ensures that the final decision includes everyone.
• talk to other libraries that have made the move to open-source. they are a great resource for seeing how the system actually works, asking questions about the migration process, and providing information about open-source problems. if available, talk to a library that migrated from your current proprietary system. some systems are easier to migrate from than others, so this would be an opportunity to find out about any specific problems.
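none of the interviewed libraries describe a formal scoring exercise, but the requirements list and wish list suggested above can be turned into a simple weighted comparison so that every department's priorities are visible in the final decision. the sketch below is purely illustrative; the criteria, weights, and ratings are invented placeholders rather than data from this study.

```python
# illustrative weighted comparison of candidate systems against a requirements list;
# criteria, weights, and ratings are invented placeholders, not survey data
weights = {"acquisitions": 3, "cataloging": 5, "circulation": 5,
           "consortium support": 2, "available local expertise": 4}

# 0-5 ratings gathered from each department's review of a demo system or documentation
scores = {
    "candidate ils a": {"acquisitions": 3, "cataloging": 4, "circulation": 5,
                        "consortium support": 5, "available local expertise": 2},
    "candidate ils b": {"acquisitions": 4, "cataloging": 4, "circulation": 4,
                        "consortium support": 2, "available local expertise": 4},
}

best_possible = sum(w * 5 for w in weights.values())
for ils, rating in scores.items():
    total = sum(weights[c] * rating[c] for c in weights)
    print(f"{ils}: {total} / {best_possible}")
```

the point of the exercise is less the final number than the conversation it forces: every department has to say what it needs and how much that need weighs.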
stage 2—set up a demonstration site

• this is the most important guideline in the entire paper: create a demonstration site before making a final decision.
  o if there is still confusion in your team about which ils to use, setting up a demo site and installing koha and evergreen will be the best way to decide which one works for your situation.
  o doing at least one test migration will show what kind of data preparation needs to be done, usually by doing data mapping. data mapping is where you determine where the fields in your current system go when you move into the new system. another often-used term for data mapping is staging tables.
  o the demo site is also a good way to do staff training when needed.
  o the demo site also provides a way to determine what the best setup rules, policies, and settings are by testing them in advance.
  o it provides an opportunity to learn the processes of the different modules and how they differ from your library's current practices.
  o most importantly, it serves as a test run for migration, which will make the actual migration go smoothly.

sample quotes from interviews:

do you think that the tests with the data and doing that really helped? oh yes, we would have had a disaster if we hadn't done three tests and test loads. the pals office has done conversions multiple times before so they have it done, and we have good tech people. so they knew that the three test loads would be a good thing.

we did discover some of the tools that should be used, like for example one of the things that's recommended for evergreen patron migration is to have a staging table, so you dump all your records into a database that you can then use to create the records in the evergreen tables. and you know we found out why that was important by running into a couple, a few problems with not being able to line up the data in the multiple fields. but you know that's the sort of thing we expect. that's pretty, i classify it as pretty typical migration learning, is finding out what works one way, what doesn't the other. but you know that was a good thing because all the documents were saying, "you should use a staging table." and we had to figure out ourselves why that was such a good idea.

you should use a staging table for migration, i.e., move records into a database that is then used to create records in evergreen. it helps because some data doesn't line up in the same fields. it's a good idea to set up tables and rules far in advance in order to test before migration. it's very important to do data mapping very carefully because if you lose anything during migration it's difficult to get it back. check it to make sure that all the fields will be transported correctly, and run tests while the old system is still up to make sure everything is there.
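the staging-table approach described in stage 2 and in the quotes above can be prototyped with nothing more than a csv export and sqlite. this is a minimal sketch under assumed names: the export file, its column names, and the table layout are invented for illustration and are not evergreen's or koha's actual schemas.

```python
import csv
import sqlite3

# hypothetical export from the legacy ils; the file name and column names are assumptions
conn = sqlite3.connect("staging.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS item_staging (
        legacy_barcode TEXT PRIMARY KEY,
        title          TEXT,
        location_code  TEXT,
        item_type      TEXT,
        last_checkout  TEXT
    )
""")

with open("legacy_items_export.csv", newline="", encoding="utf-8") as f:
    rows = [(r["barcode"], r["title"], r["location_code"], r["item_type"], r["last_checkout"])
            for r in csv.DictReader(f)]
conn.executemany("INSERT OR REPLACE INTO item_staging VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# cheap spot checks before anything is loaded into the new ils
print("rows staged:", conn.execute("SELECT COUNT(*) FROM item_staging").fetchone()[0])
for code, n in conn.execute(
        "SELECT location_code, COUNT(*) FROM item_staging "
        "GROUP BY location_code ORDER BY 2 DESC LIMIT 10"):
    print(code, n)
```

loading records into a disposable database like this makes it cheap to count rows and eyeball field values before the real migration, which is exactly the kind of test load the interviewees describe.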
stage 3—data preparation

• clean up the data in advance. the better the data is, the easier it will transfer. this is also an opportunity to start fresh in a new system, so if there were inconsistencies or irritations in the old system, this is a good time to fix them.
  o weeding—if you have records (either materials or patrons) that are out of date, get rid of them. the fewer the records, the easier migration will be. in addition, vendors often charge by record, so why pay for records you do not need?
• consistency in data is key. if multiple people are working on the data, make sure they are working based on the same standards.
• do a fine amnesty when migrating to a new system. depending on the systems (current and new), it is sometimes impossible or very difficult to transfer fine data into the new ils, so doing a fine amnesty will make the process simpler.
• spot check data (in testing, during, and after migration). catching problems early means there will be less work trying to fix problems later.

sample quotes from interviews:

i would say that if you're considering converting to an ils software, that you've really got to do the data mapping very carefully with a fine-toothed comb because you don't want to lose data. it's too hard to get it back in.

the data needs to be normalized so that the numbers of fields are uniform, names are in the correct order, and data is displayed correctly. the library has had to decide whether it is worthwhile to do things like getting rid of old abbreviations, etc., to make the data more easily understood. problems occur with old data if information such as note fields has been entered inconsistently. it's important to have procedures and to make sure everyone is following them. often things are put in different places, which causes a lot of trouble.

they are doing a lot of cleanup of data, such as reducing the number of unique values in the case of some items that had a huge number of values in a drop-down list. would like to spend more time on data cleanup but need to go ahead and get data migrated.

stage 4—development/customization

• one benefit of using an open-source ils is that any development done by any library comes back to the community, so often if you want something done, someone else might have already created that functionality and you can use it.
• develop partnerships. often if you want a specific development, someone else does too. if your staff does not have the expertise, then you could provide more of the funding and the partner could provide the tech skills, or vice versa. partnerships mean the development will cost less than if you did it alone.
• grant money is also available for open-source development and may be another funding option.

sample quotes from interviews:

the library does its own minor customizations and uses equinox for major jobs. they will lay out and prepare everything, then hire equinox to write and implement new code.

the library tries not to do things on its own but always looks for partnerships when doing any customizations. that way libraries that have similar needs can share resources.

stage 5—migration process

• write workflows and policies/rules beforehand. writing these when working on the demo site should provide step-by-step instructions on how to do the final migration.
• having regular meetings during the migration process ensures that everyone stays on the same page and prevents miscommunications that will slow down the process.
• if many libraries are involved, migration in waves will make things easier. this is generally a situation with a statewide consortium. usually there is a pilot migration of four to eight libraries; then after that, each wave gets a little bigger as the system becomes more practiced. this can also be a useful model if the libraries involved in the consortium are accepting the migration at different rates.
• for a consortium that is coming from multiple ilss, having a vendor will make it easier. this is not to say that it could not be done without a vendor, but migrating from system a is going to be different than migrating from system b. this increases the complexity, which can make working with a vendor more cost effective.
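the cleanup, weeding, and spot-checking advice in stage 3 lends itself to small throwaway scripts. the sketch below is a generic illustration rather than code from any of the interviewed libraries; the patron export, its column names, and the two-year inactivity cutoff (echoing the weeding practice noted in the literature review) are all assumptions.

```python
import csv
from collections import Counter
from datetime import date, timedelta

# hypothetical patron export; the file and column names are assumptions, not a real ils schema
cutoff = date.today() - timedelta(days=2 * 365)
keep, weeded = [], 0

with open("legacy_patrons_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        last_use = date.fromisoformat(row["last_activity"] or "1900-01-01")
        if last_use < cutoff:
            weeded += 1          # weed inactive patrons instead of paying to migrate them
            continue
        row["surname"] = row["surname"].strip().title()   # normalize an obviously inconsistent field
        keep.append(row)

print(f"kept {len(keep)} patrons, weeded {weeded}")
# spot check a field that feeds a drop-down list: how many distinct values survived cleanup?
print("patron categories:", Counter(r["patron_category"] for r in keep).most_common(10))
```

running the same counts against the source export and again against the migrated data is a cheap way to catch fields that did not line up before patrons notice.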
stage 6—staff training and user testing

• who does the training? there are two main ways: by a vendor or internally.
  o if trained by a vendor, there are two options:
    - the vendor sends someone to the library to conduct training.
    - the library sends someone to the vendor for training, and then he or she comes back and trains the rest of the staff.
  o if trained internally, there are a lot of training materials available. several libraries have created their own materials and then made them available online. this is another time where having contacts with other libraries can help in using common resources.
• documentation is important for training. the best way is to find what documentation is already available and then customize it for your system.
• do training fairly close to the "go live" date.
• use a day or two for training. if a consortium is spread out geographically, use webinars and wikis.
• when doing training, have specific tasks to do. this can be done a few ways.
  o do the specific tasks at the training.
  o demonstrate the tasks at training and then give "homework" where the staff does the specific tasks independently. to implement this option, staff has to have access to a demo system.
  o have staff try the tasks on their own and use the training session for questions or problems they had.

sample quotes from interviews:

well we had, we hired equinox to come and do 2 days of training with us. so they're here and did hands-on training with us. and then we also, they provided some packets of exercises that people could do on their own. and we had the system up and running in the background so that they could play with it about a week before we actually went live to the public so that they could get used to it, figure out how things worked, and work with it a little bit so they could answer questions before the public came and said, hey, how do i find my record, and i can't get into this anymore. and the training was really good, but the hands-on was the best. and it's not a difficult system to work, but you just need a little experience with it before it makes sense.

evergreen runs a test server that anybody can download the staff client for that and work in their test server and just examine all of the records and how the system works, to figure out our workflows. we looked up documentation online—evergreen, indiana, pines, various places—copied the documentation they so graciously hosted online for everybody to use, went through it, found what worked for us. those couple staff members worked with other staff. we printed out kind of our little how-to guides for other people, depending on which worked, and told them they're going to sit down, we've got terminals set up here, sit down and learn it.
the admin person, she went through some quite detailed training. she went to atlanta and had training from equinox on a lot of aspects of evergreen. and then we also, she came back, and then she did training for all the libraries in the consortium, kind of an intensive day-long or half-day-long thing that she offered in several different central geographic locations so that all the libraries would have a chance to go and attend without having to drive too far. and we also did webinars, we got a couple webinars for the real outlying libraries. and we also have ongoing weekly webinars. and we have a wiki set up where we put all the information in the online manual and stuff like that. all the training sessions were recorded, and so we had them on cd for new people coming on board.

marketing for patrons

• most libraries have not done anything elaborate, generally just announcements through posters, local papers, flyers, and on websites.
• if the migration is greatly changing the situation for patrons, then more marketing is needed.
• set up a demo computer for patrons to try, or hold classes once the system is up.

training for patrons

• most libraries did not find this necessary. either the system is easy to use or it is set up to look like the old system.
• if training patrons, create online tutorials.

stage 7—"go live" and after

• if possible, have your old system running for a month or two until you are sure that all the data got migrated over properly.

sample quote from interviews:

check it to make sure that all the fields will be transported correctly, and run tests while the old system is still up to make sure everything is there.

maintenance—library staff (this assumes a migration being done in-house with little to no vendor support)

• staff has to have the technical knowledge (linux, sql, and coding).
• often the money saved from moving to open-source is used to pay for additional staff.
• most time is not spent on maintenance but on customization, updates, or problem-solving.

maintenance—vendor

• often libraries start with higher vendor support, which lessens as the staff learns and develops expertise.

discussion and conclusion

interviews with twenty librarians from different settings provided insight into the process of adopting an open-source ils and were used to develop the guidelines presented in this paper. these guidelines are not intended to serve as a complete guide to the process of adoption but are meant to give interested librarians an overview of the process. these guidelines can help libraries prepare themselves for the research and adoption well before they delve into the process. since these guidelines are all based on the real-life adoption experiences of libraries, they provide insight into the challenges as well as the opportunities in the process. these guidelines can be used to develop an adoption plan and requirements for the adoption process. in future research, we are working to create adoption blueprints and total cost of ownership assessments (with and without vendors) for libraries of different sizes and types. also, as part of this research we have developed an information portal that contains resources that will help librarians in each phase of the process of open-source ils adoption. the information portal, along with these guidelines, will fill a very important gap in the resources available for open-source ils adoption.
the url for the portal is not being provided in this paper to ensure anonymous review.
appendix a. interview questions
library environment
1. what is your library type (school, academic, public, special, etc.)?
2. what is your library size (how many employees, population served, and number of materials)?
evaluation (we would like as much info as possible about why the system was chosen over others, including any existing system.)
3. what open-source ils are you using and why did you choose it?
4. when choosing an open-source ils, where did you go for information (vendor/ils pages, community groups, personal contacts, etc.)?
5. who was involved in deciding which ils to use?
adoption (we would like to document specific problems or issues that could be used by other libraries to ease their installation.)
6. were there any problems during migration?
7.
what do you know now that you wish you had known before migration?
8. how long did migration take? were you on schedule?
9. if getting paid support, how did the vendors (previous and current) help with migration?
implementation (again, specific examples of the things that worked well or didn't work. how can other libraries learn from this experience?)
10. what kind of (and how much) training did your library staff receive?
11. did you do any kind of marketing to your patrons?
12. (if you haven't gotten to this part yet) what are your plans for implementation?
13. how much time did implementation take, and were you on schedule?
maintenance (this information will be especially important when compared to the library type and size as a reference for other libraries. we would like to get answers that are as specific as possible.)
14. how large is your systems staff? is it sufficient to maintain the system?
15. how much time do you spend each week doing system maintenance? how does this compare to your old system?
16. what resources (or channels) do you use to solve your technical support issues? what roles do paid vendors play in maintenance of your system?
advice for other libraries (these open-ended questions are an opportunity to learn more information that we might not have thought of asking about. responses could provide a valuable resource to other libraries as they plan their implementation.)
17. what is the best thing and worst thing about having an open-source ils?
18. are there any lessons or advice that you would like to share with other librarians who are thinking about or migrating to an open-source ils?
acknowledgment
this research was funded by an early career imls grant.
abstract
interest in migrating to open-source integrated library systems is continually growing in libraries. along with the interest, lack of empirical research and evidence to compare the process of migration brings a lot of anxiety to the interested librari...
"discovery" focus as impetus for organizational learning
jennifer l. fabbi
the university of nevada las vegas libraries' focus on the concept of discovery and the tools and processes that enable our users to find information began with an organizational review of the libraries' technical services division. this article outlines the phases of this review and subsequent planning and organizational commitment to discovery. using the theoretical lens of organizational learning, it highlights how the emerging focus on discovery has provided an impetus for genuine learning and change.
the university of nevada las vegas (unlv) libraries' focus on the concept of discovery and the tools and processes that enable our users to find information stemmed from the confluence of several initiatives. however, a significant path that is directly responsible for the increased attention on discovery leads through one unit in unlv libraries—technical services. this unit, consisting of the materials ordering and receiving (acquisitions) and bibliographic and metadata services (cataloging) departments, had been without a permanent director for three years when i was asked to take the interim post in april 2008.
while the initial expectation was that i would work with the staff to keep technical services functioning while we performed our third search for a permanent director, it became clear after three months that, because of nevada's budgetary limitations, we would not be able to go forward with a search at that time. as all personnel searches in unlv libraries were frozen, managers and staff across the divisions moved quickly to reassign staff with the aim of mitigating the effects of staff vacancies. there was division among the library administrators as to what the solution would be for technical services: split up the division—for which we had trouble recruiting and retaining a leader in the past—and divvy up its functions among other divisions in the libraries, or continue to hold down the fort while conducting a review of technical services that would inform what it might become in the future. other organizations have taken serious looks at, and provided roadmaps of, how their focus on technical services will change in the future.1 the latter route was chosen, and the review—eventually dubbed revisioning technical services—led directly to the inquiries and activities documented in this ital special issue. detailing the process of revisioning technical services and using the theoretical lens of organizational learning, i will demonstrate how the libraries' emerging focus on the concept of discovery has provided an impetus for genuine learning and change.
organizational learning
in images of organization, morgan devotes a chapter to theories of organizational development that characterize organizations using the metaphor of the brain.2 based on the principles of modern cybernetics, argyris and schön provide a framework for thinking about how organizations can learn to learn.3 while many organizations have become adept at single-loop learning—the ability to scan the environment, set objectives, and monitor their own general performance in relation to existing operating norms—these types of systems are generally designed to keep the organization "on course." double-loop learning, on the other hand, is a process of learning to learn, which depends on being able to take a "double look" at the situation by questioning the relevance of operating norms (see figure 1). bureaucratized organizations have fundamental organizing principles, including management hierarchy and subunit goals that are seen as ends in themselves, which can actually obstruct the learning process.
figure 1. single- and double-loop learning. source: learning-org discussion pages, "single and double loop learning," learning-org dialog on learning organizations, http://www.learning-org.com/graphics/lo23374singledll.jpg (accessed aug. 11, 2009).
jennifer l. fabbi (jennifer.fabbi@unlv.edu) is special assistant to the dean at the university of nevada las vegas libraries.
to become skilled in the art of double-loop learning, organizations must avoid getting trapped in single-looped processes, especially those created by "traditional management control systems" and the "defensive routines" of organizational members.4 according to morgan, cybernetics suggests that learning organizations must develop capacities that allow them to do the following:5
• scan and anticipate change in the wider environment to detect significant variations by
o embracing views of potential futures as well as of the present and the past;
o understanding products and services from the customer's point of view; and
o using, embracing, and creating uncertainty as a resource for new patterns of development.
• develop an ability to question, challenge, and change operating norms and assumptions by
o challenging how they see and think about organizational reality using different templates and mental models;
o making sure strategic development does not run ahead of organizational reality; and
o developing a culture that supports change and risk taking.
• allow an appropriate strategic direction and pattern of organization to emerge by
o developing a sense of vision, norms, values, limits, or "reference points" to guide behavior, including the ability to question the limits being imposed;
o absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation; and
o placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals.
unlv libraries' revisioning technical services process and the resulting organizational focus on discovery is outlined below, and the elements identifying unlv libraries as a learning organization throughout this process are highlighted (see appendix a).
revisioning technical services
this review of technical services was a process consisting of several distinct steps over many months, and each step was informed by the data and opinions gained in the prior steps:
phase 1: technical services baseline, focusing on the nature of technical services work at unlv libraries and in the library profession, and on factors that affect this work now and in the future
phase 2: organizational call to action, engaging the entire organization in shared learning and input
phase 3: summit on discovery, shifting significantly away from technical services and toward the concept of discovery of information and the experience of our users
technical services baseline
the first phase of the process, which i called the "technical services baseline," included a face-to-face meeting with me and all technical services staff. we talked openly about the challenges we faced, the options on the table for the division, why i thought that taking on this review would be the best course to pursue, and the goals of the review. outcomes of the process were guided by the dean of libraries, were written by me, and received input from technical services staff, resulting in the following goals:
1. collect input about the kinds of skills and leadership we would like to see in our new technical services director. (while creating these goals, we were given the go-ahead to continue our search for a new director.)
2. investigate the organization of knowledge at a broad level—what is the added value that libraries provide?
3. increase overall knowledge of professional issues in technical services and what is most meaningful for us at unlv.
4. encourage technical services staff to consider current and future priorities.
after establishing these goals, i began to document information about the process on unlv libraries’ staff website (figure 2) so that all staff could follow its progress. 166 information technology and libraries | december 2009 with the feedback i received at the face-to-face meeting and guided by the stated goals of the process, i gave technical services staff a series of three questions to answer individually: 1. what do you think the major functions of technical services are? examples are “cataloging physical materials” and “ordering and paying for all resources purchased from the collections budget.” 2. what external factors—in librarianship and otherwise—should we be paying the most attention to in terms of their effect on technical services work? examples are “the ways that users look for information” and “reduction of print book and serials budgets.” feel free to do a little research on this question and provide the sources of the information that you find. 3. what are the three highest priority/most important tasks on your to-do list right now? eighteen of twenty staff members responded to the questions. i then analyzed the twenty pages of feedback according to two specific criteria: (1) i paid special attention to phrases that indicated an individual’s beliefs, values, or philosophies to identify potential sources of conflict as we moved through the process; and (2) i looked for priority tasks listed that are not directly related to the individual’s job duties, as many of them were indicators of work stress or anxiety related to perceived impending change. during this phase, organizational learning was initiated through the process of challenging how technical services staff and others viewed technical services as a unit in the organization, and through the creation of shared reference points to guide our future actions. while beginning a dialogue about a variety of future management options for technical services work functions may have raised levels of anxiety within the organization, it also invited administration and staff to question the status quo and consider alternative modes of operation within the context of efficiency.6 in addition to thinking about current realities and external influences, staff were asked to participate in generating outcomes to guide the review process. these shared goals helped to develop a sense of coherence for what started out as a very loose assignment—a review that would inform what the unit might become in the future. organizational call to action the next phase of the process, “a call to action,” required library-wide involvement and input. while i knew that this phase would involve a library staff survey, i also desired that all staff responding to the survey had a basic knowledge of some of the issues that are facing library technical services today. 
using input from the two technical services department heads, i selected two readings for all library staff: bothmann and holmberg’s chapter on strategic planning for electronic resource management addressed many of the planning, policy, and workflow issues that unlv libraries has experienced7; and coyle’s article on information organization and the future of the library catalog offers several ideas for ensuring that valuable information is visible to our users in the information environments they are using.8 i also asked the library staff to visit the university of nebraska–lincoln’s “encore catalog search” (http://iris.unl.edu) and go through the discovery experience by performing a guided search and a search on a topic of their choice. they were then asked to ponder what collections of physical or digital resources we currently own at the libraries that are not available from the library catalog. after completing these steps, i directed library staff to a survey of questions related to the importance of several items referenced in the articles in terms of the following unlv libraries priorities: n creating a single search interface for users pulling together information from the traditional library catalog as well as other resources (e.g., journal articles, images, archival materials) n considering non–marc records in the library catalog for the integration of nontraditional library and nonlibrary resources into the catalog n linking to access points for full-text resources from the catalog n creating ways for the catalog to recommend items to users figure 2. project’s wiki page on staff website “discovery” focus as impetus for organizational learning | fabbi 167 n creating metadata for materials not found in the catalog n creating “community” within the library catalog n implementing an electronic resource management system (erms) to help manage the details related to subscriptions to electronic content n implementing federated searching so that users can search across multiple electronic resource interfaces at once n making electronic resource license information available to library staff and patrons there also were several questions asking library staff to prioritize many of the functions that technical services already undertakes to some extent: n cataloging specialized or unique materials n cataloging and processing gift collections n ensuring that full-text electronic access is represented accurately in the catalog n claiming and binding print serials n ordering and receiving physical resources n ordering and receiving electronic resources n maintaining and communicating acquisitions budget and serials data the survey asked technical services staff to “think of your current top three priority to-do items. in light of what you read and what you think is important for us to focus on, how do you think your work now will have changed in five years?” all other library staff members were asked to respond to the following: 1. please list two ways that technical services supports your work now. 2. please list two things you would like technical services to start doing in support of your work now. 3. please list two things you think technical services can stop doing now. 4. please list two things technical services will need to begin doing to support your work in the next five years. finally, the survey included ample opportunity for additional comments. fifty-eight staff members (over half of all library staff) completed the readings, activity, and survey. 
i analyzed the information to inform the design of subsequent phases of revisioning technical services. the dean of libraries’ direct reports then reviewed the design. in addition, many library staff contributed additional readings and links to library catalogs and other websites to add to the revisioning technical services staff webpage. throughout this phase, the organization was invited into the learning process through engagement with shared reference points, the ability to question the status quo, and the ability to embrace views of potential futures as well as of the present and the past.9 the careful selection of shared readings and activities created coherence among the staff in terms of thinking about the future, but these ideas also raised many questions about the concept of discovery and what route unlv libraries might take. the survey allowed library staff to better understand current practices in technical services, to prioritize new ideas against these practices, and to think about future options and their potential impact on their individual work as well as the collective work of the libraries. summit on discovery in the third phase of this process, “the discovery summit,” focus began to shift significantly from technical services as an organizational unit to the concept of discovery and what it means for the future of unlv libraries. during this half-day event, employing a facilitator from off campus, the dean of libraries and i designed a program to fulfill the following desired outcome: through a process of focused inquiry, observation, and discussion, participants will more fully understand the discovery experience of unlv libraries users. the event was open to all library staff members; however, individuals were required to rsvp and complete an activity before the day of the event. (the facilitator worked specifically with the technical services staff at a retreat designed to prepare for upcoming interviews for technical services director candidates.) participants were each sent a “summit matrix” (see appendix b) ahead of time, which asked them to look for specific pieces of information by doing the following: 1. search for the information requested with three discovery tools as your starting points: the libraries’ catalog, the libraries’ website, and a general internet search engine (like google). 2. for each discovery tool, rate the information that you were able to find in terms of “ease of discovery” on a scale of 1 (lowest ease—few results) to 5 (highest ease—best results). 3. document the thoughts and feelings you had and/ or process you went through in searching for this information. 4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to? the information that staff members were asked to search for using each discovery tool was mostly specific to the region of southern nevada, such as, “i heard that henderson (a city in southern nevada) started as a mining community. 
does unlv libraries have any books about that?” and “find any photograph of the gay 168 information technology and libraries | december 2009 pride parade in las vegas that you can look at in unlv libraries.” during the summit, the approximately sixty participants were asked to discuss their experiences searching for the matrix information, including any affective component to their experience, and they were asked to specify criteria for their definition of “ease of discovery.” next, we showed end-user usability video testing footage of a unlv professor, a human resources employee, and a unlv librarian going through similar discovery exercises. after each video, we discussed these users’ experiences—their successes, failures, and frustrations— and the fact that even our experts were unable to discover some of this information. finally, we facilitated a robust brainstorming session on initiatives we could undertake to improve the discovery experience of our users. [editor’s note: read more about this usability testing in “usability as a method for assessing discovery” on page 181 of this issue.] during the wrap-up of the discovery summit, the final phase of this initial process, the discovery miniconference was introduced. a call for proposals for library staff to introduce or otherwise present discovery concepts to other library staff was distributed. this call tied together the revisioning technical services process to date and also placed the focus on discovery to the libraries’ upcoming strategic planning process. this strategic planning process, outlining broad directions for the libraries to focus on for the next two years, would be the first time we would use our newly created evaluation framework. we focused on the concepts of discovery, access, and use, all tied together through an emphasis on the user. all library staff members were invited to submit a poster session or other visual display on various themes related to discovery of information to add to our collective and individual knowledge bases and to better understand our colleagues’ philosophies and positions on discovery. in addressing one of six mini-conference themes listed below, all drawn directly from the revisioning technical services survey results, potential participants were asked to consider the question, “what are your ideas for ways to improve how users find library resources?” n single search interface (federated searching, harvester-type platform, etc.) n open source vs. vendor infrastructure n information-seeking behavior of different users n social networking and web 2.0 features as related to discovery n describing primary sources and other unique materials for discovery n opening the library catalog for different record types and materials proposals could include any of these perspectives: n an environmental scan with a summary of what you learn n a visual representation of what you would consider improvement or success n a position for a specific approach or solution that you advocate ultimately, we had seventeen distinct projects involving twenty-four staff members for the afternoon miniconference. it was attended by approximately seventy additional staff members from unlv libraries as well as representatives from institutions who share our innovative system. we collected feedback on each project in written form and electronically after the mini-conference. miniconference content was documented on its own wiki pages and in this special issue of ital. 
during this phase of the revisioning technical services process, there was an emphasis on understanding our services from the customers’ point of view, a hallmark of a learning organization.10 during the discovery summit, we aimed to transform frustration and uncertainty over the user experience of the services we are providing into a motivation to embrace potential futures. the mini-conference utilized the discovery themes that had evolved throughout the revisioning technical services process to provide a cohesive framework for library staff members to share their knowledge and ideas about discovery systems and to question the status quo. n organizational ownership of discovery: strategic planning and beyond through the phases of the revisioning technical services process outlined above, it should be evident how the concept of discovery, highlighted during the process, moved from being focused on technical services to being owned by the entire organization. while the vocabulary of discovery had previously been owned by pockets of staff throughout unlv libraries, it has now become a common lexicon for all. the libraries’ evaluation framework, which includes discovery, had set the stage for our upcoming organizational strategic plan. just prior to the discovery summit, the dean of libraries’ direct reports group began to discuss how it would create a strategic plan for the 2009–11 biennium. it became increasingly apparent how important a focus on discovery would be in this process, and that we needed to time our planning right, allowing the organization and ourselves time to become familiar with the potential activities we might commit to in this area before locking into a strategic plan. “discovery” focus as impetus for organizational learning | fabbi 169 the dean’s direct reports group first spent time crafting a series of strategic directions to focus on in the two-year time period we were planning for. rather than give the organization specific activities to undertake, the strategic directions were meant to focus our new initiatives—and in a way to limit that activity to those that would move us past the status quo. of the sixteen directions, one stemmed directly from the organization’s focus on discovery: “improve discoverability of physical and electronic resources in empowering users to be self sufficient; work toward an interface and system architecture that incorporates our resources, internal and external, and allows the user to access them from their preferred starting point.” an additional direction also touched on the discovery concept: “monitor and adapt physical and virtual spaces to ensure they respond to and are informed by next-generation technologies, user expectations, and patterns in learning, social interactions, and research collaboration; encourage staff to experiment with, explore, and share innovative and creative applications of technology.” through their division directors and standing committees, all library staff members were subsequently given the opportunity to submit action items to the strategic plan within the framework of the strategic directions. the effort was made by the dean of libraries for this part of the process to coincide with the discovery mini-conference, a time when many library staff members were being exposed to a wide variety of potential activities that we might take as an organization in this area. 
one of the major action items that made it into the strategic plan was for the dean’s direct reports to charge an oversight task force with the investigation and recommendation of a systems or systems that would foster increased, unified discovery of library collections. the charge of this newly created discovery task force includes a set of guiding principles for the group in recommending a discovery solution that n creates a unified search interface for users pulling together information from the library catalog as well as other resources (e.g., journal articles, images, archival materials); n enhances discoverability of as broad a spectrum of library resources as possible; n is intuitive: minimizes the skills, time, and effort needed by our users to discover resources; n supports a high level of local customization (such as accommodating branding and usability considerations); n supports a high level of interoperability (easily connecting and exchanging data with other systems that are part of our information infrastructure); n demonstrates commitment to sustainability and future enhancements; and n is informed by preferred starting points of the user. in setting forth these guiding principles, the work of the discovery task force is informed by the organization’s discovery values, which have evolved over a year of organizational learning. in the timing of the strategic planning process and the emphasis of the plan, we made sure that the organization’s strategic development did not run ahead of organizational reality and also have worked to develop a culture that supports change and risk taking.11 the strategic discovery direction and pattern of organizational focus has been allowed to emerge throughout the organizational learning process. as evidenced in both the strategic plan directions and guiding principles laid out in the charge of the discovery task force, the organization has begun to absorb the basic philosophy that will guide appropriate objectives in this area and has focused more on this guiding philosophy than on the active pursuit of one right answer as it continues to learn. n conclusion using the theoretical lens of organizational learning, i have documented how unlv libraries’ emerging focus on the concept of discovery has provided an impetus for learning and change (see appendix a). our experience throughout this process supports the theory that organizational intelligence evolves over time and in reference to current operating norms.12 argyris and schön warn that a top-down approach to management focusing on control and clearly defined objectives encourages singleloop learning.13 had unlv libraries chosen a more management-oriented route at the beginning of this process, it most likely would have yielded an entirely different result. in this case, genuine organizational learning proved to be action based and ever-emerging, and while this is known to introduce some level of anxiety into an organization, the development of the ability to question, challenge, and potentially change operating norms has been worth the cost.14 i believe that while any single idea we have broached in the discovery arena may not be completely unique, it is the entire process of organizational learning that is significant and applicable to many information and technology-related areas of interest. references 1. 
karen calhoun, the changing nature of the catalog and its integration with other discovery tools (washington, d.c.: library of congress, 2006), http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed aug. 12, 2009); bibliographic services task force, rethinking how we provide bibliographic services for the university of california (univ. of california libraries, 2005), http://libraries.universityofcalifornia.edu/sopag/bstf/final.pdf (accessed aug. 12, 2009).
2. gareth morgan, images of organization (thousand oaks, calif.: sage, 2006).
3. chris argyris and donald a. schön, organizational learning ii: theory, method, and practice (reading, mass.: addison wesley, 1996).
4. morgan, images of organization, 87.
5. morgan, images of organization, 87–97.
6. ibid.
7. robert l. bothmann and melissa holmberg, "strategic planning for electronic management," in electronic resource management in libraries: research and practice, ed. holly yu and scott breivold, 16–28 (hershey, pa.: information science reference, 2008).
8. karen coyle, "the library catalog: some possible futures," the journal of academic librarianship 33, no. 3 (2007): 414–16.
9. morgan, images of organization.
10. ibid.
11. ibid.
12. ibid.
13. argyris and schön, organizational learning ii.
14. morgan, images of organization.
appendix a. tracking unlv libraries' discovery focus across characteristics of organizational learning
scan and anticipate change in the wider environment to detect significant variations by
• embracing views of potential futures as well as of the present and the past (revisioning phase 1: technical services questions);
• understanding products and services from the customer's point of view (revisioning phase 3: summit); and
• using, embracing, and creating uncertainty as a resource for new patterns of development (revisioning phase 1: meeting; phase 3: summit).
develop an ability to question, challenge, and change operating norms and assumptions by
• challenging how they see and think about organizational reality using different templates and mental models (revisioning phase 2: survey);
• making sure strategic development does not run ahead of organizational reality (strategic planning process; discovery task force charge); and
• developing a culture that supports change and risk taking (strategic planning process).
allow an appropriate strategic direction and pattern of organization to emerge by
• developing a sense of vision, norms, values, limits, or "reference points" to guide behavior, including the ability to question the limits being imposed (revisioning phase 1: outcomes; phase 2: shared readings, activity; strategic planning process; discovery task force charge);
• absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation (strategic planning process, discovery task force charge); and
• placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals (strategic planning process, discovery task force charge).
please complete the following and bring to the summit on discovery—february 24:
1. search for the information requested in each row of the table below with three discovery tools as your starting points: the libraries catalog, the libraries website, and a general internet search engine (like google).
2.
for each discovery tool, rate the information that you were able to find in terms of “ease of discovery” on a scale of 1 (lowest ease) to 5 (highest ease). 3. document the thoughts and feelings you had and/ or process you went through in searching for this information in the space provided. 4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to? appendix b. summit matrix what am i looking for? libraries catalog libraries website google thoughts, etc., on what i discovered what’s all the fuss about frazier hall? why is it important? does unlv libraries have any documents about the history of the university that reference it? it’s black history month and my professor wants me to find an oral history about african americans in las vegas that is available in unlv libraries. i heard that henderson started as a mining community. does unlv libraries have any books about that? find any photograph of the gay pride parade in las vegas that you can look at in unlv libraries. 106 information technology and libraries | september 2009 michelle frisquepresident’s message michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, north western university, chicago. b y the time you read this column i will be lita president, however, as i write this i still have a couple of weeks left in my vice-presidential year. i have been warned by so many that my presidential year will fly by, and i am beginning to understand how that could be. i can’t believe i am almost done with my first year. i have enjoyed it and sometimes been overwhelmed by it—especially when i began the process of appointing lita volunteers to committees and liaison roles. i didn’t realize how many appointments there were to make. i want to thank all of the lita members who volunteered. you really helped make the appointment process easier. as a volunteer organization, lita relies on you, and once again many of you have stepped up. thank you. during the appointment process i was introduced to many lita members whom i had not yet met. i enjoyed being introduced to you virtually, and i look forward to meeting you in person in the coming year. i also want to thank the lita office. they were there whenever i needed them. without their assistance i would not have been able to successfully complete the appointment process. over the last year i have been working closely with this year’s lita emerging leaders, lisa thomas and holly tomren. i have really enjoyed the experience. their enthusiasm and energy is contagious. i wish every lita member could have been at this year’s lita camp in columbus, ohio, on may 8. during one of the lightning round sessions, lisa went to the podium and gave an impassioned speech about the benefits of belonging to a professional organization like lita. if there was a person in the audience that was not yet a lita member, i am sure they joined immediately afterward. she really captured the essence of why i became active in lita and why i continue to stay so involved in this organization so many years later. i can honestly say that as much as i have given to lita, i have received so much more in return. that is the true benefit of lita membership. over the last year, the lita board has had some great discussions with lita members and leaders. those conversations will continue as we start the work of drafting a new strategic plan. 
i want to create a strategic plan that will chart a meaningful path for the association and its members for the next several years. i want it to provide direction but also be flexible enough to adapt to changes in the information technology association landscape. as andrew pace mentioned in his last president’s message, changes will be coming. while we still aren’t sure exactly what those changes are, we know that it is time to seriously look at the current organizational structure of lita to make sure it best fits our needs today while continuing to remain flexible enough to meet our needs tomorrow. when i think of the organizational changes we are exploring, i can’t help but think of the houses i see on my favorite home improvement shows. lita has good bones. the structure and foundation are solid and well built, and as long as the house is well cared for, should last for years to come. however, like all houses, improvements need to be made over time to keep up with the market. the lita structure and foundation will be the same. when you drive up to the house you will still recognize the lita structure. when you walk in the door my hope is that you will still get that same homey feeling you had before, maybe with a few “oohs” and “aahs” thrown in as you notice the upgrades and enhancements. as the year progresses we will know more. i will use this column and other communication avenues to keep you informed of our plans and to gather your input. i would like to close my first column by thanking you for giving me this opportunity to serve you as the lita president. i am honored and humbled by the trust you have placed in me, and i am ready to start my presidential year. i hope it does not go by too quickly. i want to savor the experience. now let’s get started! editor’s comments bob gerrity information technology and libraries | september 2012 1 g’day, mates, and welcome to our third open-access issue. ital takes on an additional international dimension with this issue, as your faithful editor has taken up residence down under, in sunny queensland, australia. the recent ala annual meeting in anaheim marked some changes to the ital editorial board that i’d like to highlight. cynthia porter and judith carter are ending their tenure with ital after many years of service. cynthia is featured in this month’s editorial board thoughts column, offering her perspective on library technology past and present. judith carter ends a long run with ital as managing editor, and i thank her for her years of dedicated service. ed tallent, director of levin library at curry college, is the incoming managing editor. we also welcome two new members of the editorial board: brad eden, the dean of library services and professor of library science at valparaiso university, and jerome yavarkovsky, former university librarian at boston college, and the 2004 recipient of ala’s hugh c. atkinson award. jerome currently co-chairs the library technology working group at the mediagrid immersive education initiative. we cover a broad range of topics in this issue. ian chan, pearl ly, and yvonne meulemans describe the implementation of the open-source instant messaging (im) network openfire at california state university san marcos, in supporting of the integration of chat reference and internal library communications. richard gartner explores the use of the metadata encoding and transmission standard (mets) as an alternative to the fedora content model (fcm) for an “intermediary” digital-library schema. 
emily morton and karen hanson present an innovative approach to creating a management dashboard of key library statistics. kate pittsley and sara memmott describe navigational improvements made to libguides at eastern michigan university. bojana surla reports on the development of a platform-independent, java-based marc editor. yongming wang and trevor dawes delve into the need for next-generation integrated library systems and early initiatives in that space. melanie schlosser and brian stamper begin to explore the effects of reposting library digital collections on flickr. in addition to the compelling new content in this issue of ital, we have compelling old content from the print archive of ital and its predecessor, journal of library automation (jola), that will soon be available online, thanks in large to the work of andy boze and colleagues at the university of notre dame. scans of all of the back issues have now been deposited onto the server that currently hosts ital, and will be processed and published online over the coming months. bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, st. lucia, queensland, australia. learning to share: measuring use of a digitized collection on flickr and in the ir melanie schlosser and brian stamper information technology and libraries | september 2012 85 abstract there is very little public data on usage of digitized library collections. new methods for promoting and sharing digitized collections are created all the time, but very little investigation has been done on the effect of those efforts on usage of the collections on library websites. this study attempts to measure the effects of reposting a collection on flickr on use of the collection in a library-run institutional repository (ir). the results are inconclusive, but the paper provides background on the topic and guidance for future efforts. introduction inspired by the need to provide relevant resources and make wise use of limited budgets, many libraries measure the use of their collections. from circulation counts and in-library use studies of print materials, to increasingly sophisticated analyses of usage of licensed digital resources, the techniques have changed even as the need for the data has grown. new technologies have simultaneously presented challenges to measuring use, and allowed those measurements to become more accurate and more relevant. in spite of the relative newness of the digital era, “librarians already know considerably more about digital library use than they did about traditional library use in the print environment.”1 arl’s libqual+,2 one of the most widelyadopted tools for measuring users’ perceptions of service quality, has recently been joined by digiqual and mines for libraries. these new statsqual tools3 extend the familiar libqual focus on users into the digital environment. there are tools and studies for seemingly every type of licensed digital content, all with an eye toward better understanding their users and making better-informed collection management decisions. those same tools and studies for measuring use of library-created digital collections are conspicuous in their absence. almost two decades into library collection digitization programs, there is not a significant body of literature on measuring use of digitized collections. a number of articles have been written about measuring usage of library websites in general; arendt and wagner4 is a recent example. 
in one of the few studies to specifically measure use of a digitized collection, herold5 uses google analytics to uncover the geographical location of users of a digitized archival image collection. otherwise, a literature search on usage studies uncovers very little. less formal communication channels are similarly quiet, and public usage data on digitized collections on library sites is virtually nonexistent. commercial sites for disseminating and sharing melanie schlosser (schlosser.40@osu.edu) is digital publishing librarian and brian stamper (stamper.10@osu.edu) is administrative associate, the ohio state university libraries, columbus, ohio. mailto:schlosser.40@osu.edu mailto:stamper.10@osu.edu information technology and libraries | september 2012 86 digital media frequently display simple use metrics (image views, for example, or file downloads) alongside content; such features do not appear on digitized collections on library sites. usage and digitization projects digitized library collections are created with an eye toward use from their early planning stages. an influential early clir publication on selecting collections for digitization written by a harvard task force6 included current and potential use of the analog and digitized collection as a criterion for selection. the factors to be considered include the quantitative (“how much is the collection used?”) and the qualitative (“what is the nature of the use?”). more than ten years later, ooghe and moreels7 find that use is still a criterion for selection of collections to digitize, tied closely to the value of the collection. facilitating discovery and use of the digitized collection is a major consideration during project development. payette and rieger8 is an early example of a study of the needs of users in digital library design. usability testing of the interface is frequently a component of site design; see jeng9 for a good overview of usability testing in the digital library environment. increasing usage of the digitized collection is also a major theme in metadata research and development. standards such as the open archives initiative’s protocol for metadata harvesting10 and object reuse and exchange11 are meant to allow discovery and reuse of objects in a variety of environments, and the linked data movement promises to make library data even more relevant and reusable in the world wide web environment.12 digital collection managers have also found more radical methods of increasing usage of their collections. inserting references into relevant wikipedia articles has become a popular way to drive more users to the library’s site.13 some librarians have taken the idea a step further and have begun reposting their digital content on third-party sites. the smithsonian pioneered one reposting strategy in 2008 when they partnered with flickr, the popular photo-sharing site, to launch flickr commons.14 the commons is a walled garden within flickr that contains copyrightfree images held by cultural heritage institutions such as libraries, archives, and museums. each partner institution has its own branded space “photostream” in flickr parlance organized into collections and sets. this model aggregates content from different organizations and locates it where users already are, but it still maintains the traditional institution/collection structure. flickr commons has been, by all measures, a very successful experiment in sharing collections with users. 
the smithsonian,15 the library of congress,16 the alcuin society,17 and the london school of economics18 have all written about their experiences with the commons. stephens19 and michel and tzoc20 give advice on how libraries can work with flickr, and garvin21 and vaughan22 take a broad view of the project and the partners. another sharing strategy is beginning to emerge, where digital collection curators contribute individual or small groups of images to thematic websites. a recent example is pets in collections,23 a whimsical tumblr photo blog created by the digital collections librarian at bryn mawr college. learning to share: measuring use of a digitized collection on flickr and in the ir| schlosser and stamper 87 the site’s description states, “come on if you work in a library, archive, or museum, you know you’ve seen at least one of these a seemingly random image of that important person and his dog or a man and a monkey wearing overalls … so now you finally have a place to share them with the world!” the site requires submissions to include only the image and a link back to the institution or repository that houses it, although submitters may include more information if they choose. although more lighthearted than most traditional library image collections, it still performs the desired function of introducing users to digital collections they may never have encountered otherwise. clearly, these creative and thoughtful strategies are not dreamed up by digital librarians unconcerned with end use of their collections, so why do stewards of digitized collections so rarely collect, or at least publicly discuss, statistics on the use of their content? the one notable exception to this may shed some light on the matter. institutional repositories (irs) have been the one area of non-licensed digital library content where usage statistics are frequently collected and publicized. dspace,24 the widely-adopted ir platform developed by mit and hewlett-packard, has increasingly sophisticated tools for tracking and sharing use of the content it hosts. digital commons,25 the hosted ir solution created by bepress, provides automated monthly download reports for scholars who use it to archive their content. the development of these features has been driven by the need to communicate value to faculty and administrators. encouraging participation by faculty has been a major focus of ir managers since the initial ‘build it and they will come’ optimism faded and the challenge of adding another task to already busy faculty schedules became clear.26 having a clear need (outreach) and a defined audience (participating scholars) has led to a thriving program of usage tracking in the ir community. the lack of an obvious constituency and the absence of pointed questions about use in the digitized collections world have, one suspects, led to the current dearth of measurement tools and initiatives. still, questions about use do arise, particularly when libraries undertake laborintensive usability studies or venture into the somewhat controversial landscape of sharing library-created digital objects on third party sites.27 anecdotally, the thought of sharing library content elsewhere on the web also raises concerns about loss of context and control, as well as a fear of ‘dilution’ of the library’s web presence. 
"if patrons can use the library's collections on other sites," a fellow librarian once exclaimed, "they won't come to the library's website anymore!" without usage data, we cannot adequately answer questions about the value of our projects or the way they impact other library services.

justification for study and research questions

there were three major motivations for this project. first, inspired by the success of the flickr commons project, we wanted to explore a method for sharing our collections more widely. an image collection and a third-party image-sharing platform were an obvious choice, since image display is not a strength of our dspace-based repository. flickr is currently a major presence in the image-sharing landscape, and the existence of the commons was an added incentive for choosing flickr as our platform. second, the collection we selected for the project (described more fully below) is not fully described, and we wanted to take advantage of flickr's annotation tools to allow user-generated metadata. since further description of the images would have required an unusual depth of expertise, we were not optimistic that we would receive much useful data, and in fact we did not. still, we lost nothing by asking, and gained familiarity with flickr's capabilities for metadata capture. the final motivation for the project, and the focus of the study, was the desire to investigate the effect of third-party platform sharing of a local collection on usage of that collection on library sites. the data gathered were meant partly to inform our local practice, but also to address a concern that may hold librarians back from exploring such means of increasing collection usage: the fear that doing so will divert traffic from library sites. we suspected that sharing collections more widely would actually increase usage of the items on library-owned sites, and the study was developed to explore the issue in a rigorous way. the research question for this study was: does reposting digitized images from a library site to a third-party image sharing site have an effect on usage of the images on the library site?

about the study

platforms

for the study, the images were submitted to two different platforms: the knowledge bank (kb),28 a library-managed repository, and flickr, a commercial image sharing site. the kb is an institutional repository built on dspace software with a manakin (xml-based) user interface. established in 2005, it holds more than 45,000 items, including faculty and student research, gray literature, institutional records, and digitized library collections. image collections like the one used in this study make up a small percentage of the items in the repository. in the kb's organizational structure, the images in the study were submitted as a collection in the library's community, under a sub-community for the special collection that contributed them. each image was submitted as an item consisting of one image file and dublin core metadata.29 the project originally called for submitting the images to flickr commons, but the commons was not accepting new partners during the study period. instead, we created a standard flickr pro account for the libraries, while following the commons guidelines in image rights and settings. in contrast to dspace's community/sub-community/collection structure, flickr images are organized in sets, sets belong to collections, and all images make up the account owner's photostream.
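the set-and-photostream structure just described can be populated programmatically. the sketch below is one plausible way to script the kind of reposting described in the next section, using the third-party flickrapi package; the api key, file path, and item list are placeholders (only the handle url is taken from this article), and response parsing may differ by library version. it is an assumption-laden illustration, not the workflow the authors actually used.

```python
# hypothetical reposting sketch using the third-party flickrapi package
# (pip install flickrapi). credentials and file paths are placeholders.
import flickrapi

API_KEY, API_SECRET = "your-key", "your-secret"   # placeholders
flickr = flickrapi.FlickrAPI(API_KEY, API_SECRET)
flickr.authenticate_via_browser(perms="write")    # one-time oauth step

# each record pairs an image file with a title and a link back to the
# repository item; only the handle url below appears in this article.
items = [
    {"path": "images/devils_auction.jpg",
     "title": "the devil's auction",
     "kb_url": "http://hdl.handle.net/1811/47633"},
]

photoset_id = None
for item in items:
    resp = flickr.upload(filename=item["path"], title=item["title"],
                         description=item["kb_url"], is_public=1)
    photo_id = resp.findtext("photoid")           # adjust to your flickrapi version
    if photoset_id is None:
        created = flickr.photosets.create(title="sample set",
                                          primary_photo_id=photo_id)
        photoset_id = created.find("photoset").get("id")
    else:
        flickr.photosets.addPhoto(photoset_id=photoset_id, photo_id=photo_id)
```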
a set was created for the images, with accompanying text giving background information and inviting users to contribute to the description of the images.30 the images were accompanied by the same metadata as the items in the kb, but the files themselves were higher resolution, to take advantage of flickr’s ability to display a range of sizes for each image. all items in the set were publicly learning to share: measuring use of a digitized collection on flickr and in the ir| schlosser and stamper 89 available for viewing, commenting, and tagging, and each image was accompanied by links back to the kb at the item, collection, and repository level. the collection the choice of a collection for the study was limited by a number of factors. first, and most obviously, it needed to be an image collection. second, it needed to be in the public domain, both to allow our digitization and distribution of the images, and also to satisfy flickr commons’ “no known copyright restrictions” requirement.31 this could be accomplished either by choosing a collection whose copyright protections had expired, or by removing restrictions from a collection to which the libraries owned the rights. third, the curator of the collection needed to be willing and able to post the images on a commercial site. this required not only an open-minded curator, but also a collection without a restrictive donor agreement or items containing sensitive or private information. finally, we wanted the collection to be of broad public interest. the collection chosen for the study was a set of 163 photographs from osu’s charles h. mccaghy collection of exotic dance from burlesque to clubs, held by the jerome lawrence and robert e. lee theatre research institute.32 the photographs, mainly images of burlesque dancers, were published on cabinet and tobacco cards in the 1890s, putting them solidly in the public domain. figure 1. "the devil's auction," j. gurney & son (studio). http://hdl.handle.net/1811/47633 (kb), http://www.flickr.com/photos/60966199@n08/5588351865/ (flickr) http://hdl.handle.net/1811/47633 learning to share: measuring use of a digitized collection on flickr and in the ir| schlosser and stamper 87 methodology phases the study took place in 2011 and was organized in three ten-week phases. for the first phase (january 31 through april 11), the images were submitted to the kb. the purpose of this phase was to provide a baseline level of usage for the images in the repository. in phase two (april 12 through june 20), half of the images were randomly selected and submitted to flickr (group a). the purpose of this phase was to determine what effect reposting would have on usage of items in the repository both on those images that were reposted, and on other images in the same collection that had not been reposted. in phase three (june 21 through august 29), the rest of the images (group b) were submitted to flickr. in this phase, we began publicizing the collection. publicity consisted of sharing links to the collection on social media and sending emails to scholars in relevant fields via email lists. these efforts led to further downstream publicity on popular and scholarly blogs.33 data collection the unit of measurement for the study was views of individual images. to understand the notion of a “view,” we must contrast two different ways that an image may be viewed in the knowledge bank. each image in the collection has an individual web page (the item page) where it is presented along with metadata describing it. 
in addition, from that page a visitor may download and save the image file itself (in this collection, a jpeg). in the former case, the image is an element in a web page, while in the latter it is an image file independent of its web context. search engines and other sources commonly link directly to such files, so it is not unusual for a visitor to download a file without ever having seen it in context. in light of this, we produced two data sets, one for visits to item pages, and another for file downloads. depending on one's interpretation, either could be construed as a "view." ultimately there was little distinction in usage patterns between the two types of measure. the data were generated by making use of dspace's apache solr-based statistics system, which provides a queryable database of usage events. for each item in the study, we made two queries: one for per-day counts of item page views, and another for per-day counts of image file downloads (called "bitstream" downloads in dspace parlance). in both cases, views that came from automated sources such as search engine indexing agents were excluded from our counts. views of the images in flickr were noted and used as a benchmark, but were not the focus of the study. unlike cumulative views, which are tabulated and saved indefinitely, flickr saves daily view numbers for only thirty days. as a result, daily view numbers for most of the study period were not available for analysis, and the discussion of the trends in the flickr data is necessarily anecdotal.

results

at the end of the study period, the data showed very little usage of the collection in the repository. this lack of usage was relatively consistent through the three phases of the study, and in rough terms translates to less than one view of each item per day. there was little distinction between the two ways of measuring an image "view": counting views of the web page where the item can be found, and counting how many times the image file was downloaded. knowledge bank item pages received between 5 and 38 views per item, while files were downloaded between 5 and 34 times. further, there were no significant differences in number of views received between the first group released to flickr and the second.

table 1. the items in the study are divided into group a and group b, depending on when the images were placed on flickr. this table shows that both groups received similar traffic over the course of the study, with items having between 5 and 38 views in both groups, with a median of 10 for both, and between 4 and 34 downloads, with a median of 9 for both groups.

                                                  kb item page views     image file downloads
                                                  min   median   max     min   median   max
group a (images released to flickr in phase ii)    5     10      35       5      9      25
group b (images released to flickr in phase iii)   6     10      38       4      9      34

the items attracted more visitors on flickr, with the images receiving between 100 and 600 views each. with a few exceptions, the items that appeared towards the beginning of the set (as viewed by a user who starts from the set home page) received more views than items towards its end. this suggests a particular usage pattern: start at the beginning, browse through a certain number of images, and navigate away. a more significant trend in the flickr data is that most views of the images came after publicity for the collection began (approximately midway through the third phase of the study). (a rough sketch of the kind of per-day solr query described in the data-collection section follows.)
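the sketch below runs a date-faceted query against a dspace solr statistics core to produce per-day counts of item page views or bitstream downloads while excluding known robots. the core url and field names (type, id, owningItem, isBot, statistics_type) follow common dspace conventions but vary by version; they are assumptions, not the exact queries used in this study.

```python
# sketch of a per-day usage query against a dspace solr statistics core.
# the core url and field names are assumptions based on common dspace
# configurations; they are not taken from the article.
import requests

SOLR = "http://localhost:8080/solr/statistics/select"   # placeholder

def daily_counts(item_id, start="2011-01-31T00:00:00Z",
                 end="2011-08-30T00:00:00Z", bitstreams=False):
    """return {day: count} of item-page views (type:2) or file downloads (type:0)."""
    fq = ["isBot:false", "statistics_type:view"]
    if bitstreams:
        fq += ["type:0", f"owningItem:{item_id}"]   # bitstream (file) downloads
    else:
        fq += ["type:2", f"id:{item_id}"]           # item-page views
    params = {
        "q": "*:*", "rows": 0, "wt": "json", "fq": fq,
        "facet": "true", "facet.range": "time",
        "facet.range.start": start, "facet.range.end": end,
        "facet.range.gap": "+1DAY",
    }
    data = requests.get(SOLR, params=params).json()
    buckets = data["facet_counts"]["facet_ranges"]["time"]["counts"]
    return dict(zip(buckets[::2], buckets[1::2]))   # solr returns [day, n, day, n, ...]
```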
again, the lack of daily usage numbers on flickr makes it impossible to demonstrate the publicity ‘bump,’ but it was dramatic. we witnessed a similar, if smaller, ‘bump’ in usage of the items in the kb after publicity started. we were also able to identify 65 unique visitors to the kb who came to the site via a link on flickr, out of 449 unique visitors overall. of those who came to the kb from flickr, 31 continued on to other parts of the kb, and the rest left after viewing a single item or image. learning to share: measuring use of a digitized collection on flickr and in the ir| schlosser and stamper 89 discussion with so little data, we cannot reliably answer the primary research question. reposting certainly does not seem to have lowered usage of the items in the kb, but the numbers of views in all phases were so small as to preclude drawing meaningful conclusions. a larger issue is the fact that much of the usage came immediately following our promotional efforts. this development complicated the research in a number of ways. first, because the promotional emails and social media messages specifically pointed users to the collection in flickr, it is impossible to know how the use may have differed if the primary link in the promotion had been to the knowledge bank. would the higher use seen on flickr simply have transferred to the kb? would the unfamiliarity and non-image-centric interface of the knowledge bank have thwarted casual users in their attempt to browse the collection? the centrality of the promotion efforts also suggests that one of the underlying assumptions of the study may have been wrong. this research project was premised on the idea that an openly available collection on a library website will attract a certain number of visitors (number dependent on the popularity and topicality of the subject of the collection) who find the content spontaneously via searching and browsing. placing that same content on a third-party site could theoretically divert a percentage of those users, who would then never visit the library’s site. the percentage of users diverted would likely depend on how many more users browse the third party site than the library site, as well as the relative position of the two in search rankings. the mccaghy collection should have been a good candidate for this type of use pattern. flickr is certainly heavily used and browsed, and burlesque, while not currently making headlines, is a subject with fairly broad popular appeal. the fact that users did not spontaneously discover the collection on either platform in significant numbers suggests that this may not be how discovery of library digitized collections works. it is not surprising that email lists and social media should drive larger numbers of users to a collection than happenstance the power of link curation by trusted friends via informal communication channels is well known. what is surprising is that it was the only significant use pattern in evidence. the primary takeaway is that promotion is key. if we do not promote our collections to the people who are likely to be interested in them, barring a stroke of luck, it is unlikely that they will be found. anecdotally, promotional efforts are often an afterthought in digital collections work a pleasant but unnecessary ‘extra.’ in our environment, the repository staff often feel that promotion is the work of the collection owner, who may not think of promoting the collection in the digital environment, nor know how to do so. 
as a result, users who would benefit from the collections simply do not know they exist. these results also suggest that librarians worried about the consequences of sharing their collections on third party sites may be worrying about the wrong thing. the sheer volume of information on any given topic makes it unlikely that any but the most dedicated researcher will information technology and libraries | september 2012 90 explore all available sources. most other users are likely to rely on trusted information sources (traditional media, blogs, social networking sites) to steer them towards the items that are most likely to interest them. instead of wondering if users will still come to the library’s site if the content is available elsewhere, perhaps we should be asking of our digital collections, “is anyone using them on any site?” and if the answer is no, the owners and caretakers of those collections should explore ways to bring them to the attention of relevant audiences. conclusion as a usage study of a collection hosted on a library site and a commercial site, this project was not a success. flawed assumptions and a lack of usable data resulted in an inability to address the primary research question in a meaningful way. however, it does shed light on the questions that motivated it. are our digitized collections being used? what effect do current methods of sharing and promotion have on that use? librarians working with digitized collections have fallen behind our colleagues in the print and institutional repository arenas in measuring use of collections, but we have the same needs for usage data. in the current climate of heightened accountability in higher education and publicly funded institutions, we need to demonstrate the value of what we do. we need to know when our efforts to promote our collections are working, and determine which projects have been most successful and merit continued development. and as always, we need to share our results, both formally and informally, with our colleagues. measuring use of digital resources is challenging, and obtaining accurate usage statistics requires not only familiarity with the tools involved, but also some understanding of the ways in which the numbers can be unrepresentative of actual use. the organizations that do collect usage statistics on their digitized collections should share their methods and their results with others to help foster an environment where such data are collected and used. next steps in this area could take the shape of further research projects, or simply more visible work collecting usage statistics on digital collections. of greatest utility to the field would be data demonstrating the relative effectiveness of different methods of increasing use. do labor-intensive usability studies deliver returns in the form of increased use of the finished site? which forms of reposting generate the most views? what types of publicity are most effective in bringing users to collections? how does use of a collection change over time? there are also more policy-driven questions to be answered. for example, should further investment in a collection or site be tied to increasing use of low-traffic collections, or capitalizing on success? differences in topic, format, and audience make it difficult to generalize in this area, but we can begin building a body of knowledge that helps us learn from each other’s successes and failures. 
references

1. brinley franklin, martha kyrillidou, and terry plum, "from usage to user: library metrics and expectations for the evaluation of digital libraries," in evaluation of digital libraries: an insight into useful applications and methods, ed. giannis tsakonas and christos papatheodorou (oxford: chandos publishing, 2009), 17-39, http://www.libqual.org/publications (accessed february 29, 2012).
2. "libqual+," accessed february 29, 2012, http://www.libqual.org/home.
3. "statsqual," accessed february 29, 2012, http://www.digiqual.org/.
4. julie arendt and cassie wagner, "beyond description: converting web site usage statistics into concrete site improvement ideas," journal of web librarianship 4, no. 1 (2010): 37-54.
5. irene m. h. herold, "digital archival image collections: who are the users?" behavioral & social sciences librarian 29, no. 4 (2010): 267-282.
6. dan hazen, jeffrey horrell, and jan merrill-oldham, selecting research collections for digitization (council on library and information resources, 1998), http://www.clir.org/pubs/reports/hazen/pub74.html (accessed february 29, 2012).
7. bart ooghe and dries moreels, "analysing selection for digitisation: current practices and common incentives," d-lib magazine 15, no. 9 (2009): 28, http://www.dlib.org/dlib/september09/ooghe/09ooghe.html.
8. sandra d. payette and oya y. rieger, "supporting scholarly inquiry: incorporating users in the design of the digital library," the journal of academic librarianship 24, no. 2 (1998): 121-129.
9. judy jeng, "what is usability in the context of the digital library and how can it be measured?" information technology & libraries 24, no. 2 (2005): 47-56.
10. "open archives initiative protocol for metadata harvesting," accessed february 29, 2012, http://www.openarchives.org/pmh/.
11. "open archives initiative object reuse and exchange," accessed february 29, 2012, http://www.openarchives.org/ore/.
12. eric miller and micheline westfall, "linked data and libraries," serials librarian 60, no. 1-4 (2011): 17-22.
13. ann m. lally and carolyn e. dunford, "using wikipedia to extend digital collections," d-lib magazine 13, no. 5/6 (2007), accessed february 29, 2012, doi:10.1045/may2007-lally.
14. "flickr: the commons," accessed february 29, 2012, http://www.flickr.com/commons/.
15. martin kalfatovic, effie kapsalis, katherine spiess, anne camp, and michael edson, "smithsonian team flickr: a library, archives, and museums collaboration in web 2.0 space," archival science 8, no. 4 (2008): 267-277.
16. josh hadro, "lc report positive on flickr pilot," library journal 134, no. 1 (2009): 23.
17. jeremiah saunders, "flickr as a digital image collection host: a case study of the alcuin society," collection management 33, no. 4 (2008): 302-309, doi:10.1080/01462670802360387.
18. victoria carolan and anna towlson, "a history in pictures: lse archives on flickr," aliss quarterly 6 (2011): 16-18.
19. michael stephens, "flickr," library technology reports 42, no. 4 (2006): 58-62.
20. jason paul michel and elias tzoc, "automated bulk uploading of images and metadata to flickr," journal of web librarianship 4, no. 4 (2010): 435-448.
21. peggy garvin, "photostreams to the people," searcher 17, no. 8 (2009): 45-49.
22. jason vaughan, "insights into the commons on flickr," portal: libraries & the academy 10, no. 2 (2010): 185-214.
23. "pets-in-collections," accessed february 29, 2012, http://petsincollections.tumblr.com/.
24. "dspace," accessed february 29, 2012, http://www.dspace.org/.
25. "digital commons," accessed february 29, 2012, http://digitalcommons.bepress.com/.
26. dorothea salo, "innkeeper at the roach motel," library trends 57, no. 2 (2008): 98-123.
27. for an example of the type of debate that tends to surround projects like flickr commons, see http://www.foundhistory.org/2008/12/22/tragedy-at-the-commons/ (accessed february 29, 2012).
28. "the knowledge bank," accessed february 29, 2012, http://kb.osu.edu.
29. "charles h. mccaghy collection of exotic dance from burlesque to clubs," accessed february 29, 2012, http://hdl.handle.net/1811/47556.
30. "charles h. mccaghy collection of exotic dance from burlesque to clubs," accessed february 29, 2012, http://flic.kr/s/ahsjua3bgi.
31. "flickr: the commons (usage)," accessed february 29, 2012, http://www.flickr.com/commons/usage/.
32. "the jerome lawrence and robert e. lee theatre research institute," http://library.osu.edu/find/collections/theatre-research-institute/; "charles h. mccaghy collection of exotic dance from burlesque to clubs," http://library.osu.edu/find/collections/theatre-research-institute/personal-papers-and-special-collections/charles-h-mccaghy-collection-of-exotic-dance-from-burlesque-to-clubs/; "loose women in tights digital exhibit," http://library.osu.edu/find/collections/theatre-research-institute/digital-exhibits-projects/loose-women-in-tights-digital-exhibit/; all accessed february 29, 2012.
33. for an example of the kind of coverage it received, see http://flavorwire.com/195225/fascinating-photos-of-19th-century-vaudeville-and-burlesque-performers (accessed february 29, 2012).

nancy m. foasberg

adoption of e-book readers among college students: a survey

academic libraries need to understand whether and how students are using e-book readers in order to respond appropriately.
as new media formats emerge, libraries must avoid both extremes: uncritical, hype-driven adoption of new formats and irrational attachment to the status quo. ■■ research context recently introduced e-reader brands have attracted so much attention that it is sometimes difficult to remember that those currently on the market are not the first generation of such devices. the first generation was introduced, to little fanfare, in the 1990s. devices such as the softbook and the rocket e-book reader are well documented in the literature, but were unsuccessful in the market.1 the most recent wave of e-readers began with the sony reader in 2006 and amazon’s kindle in 2007, and thus far is enjoying more success. barnes and noble and borders have entered the market with the nook and the kobo, respectively, and apple has introduced the ipad, a multifunction device that works well as an e-reader. amazon claims that e-book sales for the kindle have outstripped their hardcover book sales.2 these numbers may reflect price differences, enthusiasm on the part of early adopters, marketing efforts on the parts of these particular companies, or a lack of other options for e-reader users because the devices are designed to be compatible primarily with the offerings of the companies who sell them. nevertheless, they certainly indicate a rise in the consumption of e-books by the public, as the dramatic increase in wholesale e-book sales bears out.3 in the meantime, sales of the devices increased nearly 80 percent in 2010.4 with this flurry of activity have come predictions that e-readers will replace print eventually, perhaps even within the next few years.5 books have been published with such bold titles as print is dead.6 however, despite the excitement, e-readers are still a niche market. according to the 2010 pew internet and american life survey, 5 percent of americans own e-book readers. those who do skew heavily to the wealthy and well-educated, with 12 percent having an annual household income of $75,000 or more and 9 percent of college graduates owning an electronic book reader. this suggests that e-book readers are still a luxury item to many.7 to academic librarians, it is especially important to know whether e-readers are being adopted by college students and whether they can be adapted for academic use. e-readers’ virtues, including their light weight, their ability to hold many books at the same time, and the speed with which materials can be delivered, could make them very attractive to students. however, they have many limitations for academic work. most do not provide the ability to copy and paste into another document, have to learn whether e-book readers have become widely popular among college students, this study surveys students at one large, urban, four-year public college. the survey asked whether the students owned e-book readers and if so, how often they used them and for what purposes. thus far, uptake is slow; a very small proportion of students use e-readers. these students use them primarily for leisure reading and continue to rely on print for much of their reading. students reported that price is the greatest barrier to e-reader adoption and had little interest in borrowing e-reader compatible e-books from the library. p ortable e-book readers, including the amazon kindle, barnes and noble nook, and the sony reader, free e-books from the constraints of the computer screen. 
although such devices have existed for a long time, only recently have they achieved some degree of popularity. as these devices become more commonplace, they could signal important changes for libraries, which currently purchase and loan books according to the rights and affordances associated with print books. however, these changes will only come about if e-book readers become dominant. for academic libraries, the population of interest is college students. their use of reading formats drives collection development practices, and any need to adjust to e-readers depends on whether students adopt them. thus, it is important to research the present state of students’ interest in e-readers. do they own e-readers? do they wish to purchase one? if they do own them, do they use them often and regard them suitable for academic work? the present study surveys students at queens college, part of the city university of new york, to gather information about their attitudes toward and ownership of e-books and e-book readers. because only queens college students were surveyed, it is not possible to draw conclusions about college students in general. however, the data do provide a snapshot of a diverse student body in a large, urban, four-year public college setting. the goal of the survey was to learn whether students own and use e-book readers, and if so, how they use them. in the midst of enthusiasm for the format by publishers, librarians and early adopters, it is important to consult the students themselves, whose preferences and reading habits are at stake. it is also vital for academic libraries to nancy m. foasberg (nfoasberg@qc.cuny.edu) is humanities librarian, queens college, city university of new york, flushing, new york. adoption of e-book readers among college students: a survey | foasberg 109 foundation survey, internet and american life, found that e-readers were luxury items owned by the well educated and well off. in the survey, 5 percent of respondents reported owning an e-reader.12 in the ecar study of undergraduate students and information technology, 3.1 percent of undergraduate college students reported owning an e-book reader, suggesting that college students are adopting the devices at a slower rate than the general population.13 commercial market research companies, including harris interactive and the book industry study group, also have collected data on e-book adoption. the harris interactive poll found that 8 percent of their respondents owned e-readers, and that those who did claimed that they read more since acquiring it. however, as a weighted online poll with no available measure of sampling error, these results should be considered with caution.14 the book industry study group survey, although it was sponsored by several publishers and e-reader manufacturers, appears to use a more robust method. this survey, consumer attitudes toward e-book reading, was conducted in three parts in 2009 and 2010. kelly gallagher, who was responsible for the group that conducted the study, remarks that “we are still in very early days on e-books in all aspects—technology and adoption.” although the size of the market has increased dramatically, the survey found that almost half of all e-readers are acquired as a gift and that half of all e-books “purchased” are actually free. however, among those who used e-books, about half said they mostly or exclusively purchased e-books rather than print. 
the e-books purchased are mostly fiction (75 percent); textbooks comprised only 11 percent of e-book purchases.15 much of the literature on e-book readers consists of user studies, which provide useful information about how readers might interact with the devices once they have them in hand but provide no information about whether students are likely to use them of their own volition. however, these studies are of interest because they hint at reasons that students may or may not find e-readers useful, important information for predicting the future of e-books. user studies have covered small devices, such as pdas (personal data assistants);16 first-generation e-readers, such as the rocket ebook;17 and more recent e-book readers.18 the results of many recent e-reader user studies have been very similar to studies on the usability of the first generation of e-book readers: the devices offer advantages in portability and convenience but lack good note-taking features and provide little support for nonlinear navigation. amazon sponsored large-scale research on academic uses of e-book readers at universities, such as princeton, case western reserve university, and the university of virginia,19 while other universities, such as northwest missouri state university,20 carried out their own projects limited note-taking capabilities, and rely on navigation strategies that are most effective for linear reading. the format also presents many difficulties regarding library lending. many publishers rely on various forms of drm (digital rights management) software to protect copyrighted materials. this software often prevents e-books from being compatible with more than one type of e-book reader. indeed, because e-book collections in academic libraries predate the emergence of e-book readers, many libraries now own or subscribe to large e-book collections that are not compatible with the majority of these devices. furthermore, publishers and manufacturers have been hesitant to establish lending models for their books. amazon recently announced that they would allow users to lend a book once for a period of fourteen days, if the publisher gave permission.8 this very cautious and limited approach speaks volumes about publishers’ fears regarding user sharing of e-books. several libraries have developed programs for lending the devices,9 but there is no real model for lending e-books to users who already own e-readers. a service called overdrive also provides downloadable collections, primarily of popular fiction, that can be accessed in this manner. however, the collections are small and are not compatible with all devices, including the most popular, the kindle. in the united kingdom, the publisher’s association has provided guidelines under which libraries can lend e-books, which include a requirement that the user physically visit the library to download the e-book.10 clearly, we do not currently have anything resembling a true library lending model for e-reader compatible e-books, especially not one that takes advantage of the format’s strengths. despite the challenges, it is clear that if e-book readers are enthusiastically adopted by students, libraries will need to find a way to offer materials compatible with them. 
as buczynski puts it, “libraries need to be in play at this critical juncture lest they be left out or sidelined in the emerging e-book marketplace.”11 however, because the costs of participating are likely to be substantial, it is very important to discover whether students are indeed adopting the hardware. few studies have focused on spontaneous student adoption of the devices, although several mention that when students were introduced to e-readers, they appeared to be unfamiliar with the devices and regard them as a novelty. however, e-readers have become more prevalent since many of these studies were conducted. thus this study surveys students to find their attitudes toward e-book readers. ■■ literature review only a few studies have attempted to quantify the popularity of e-readers. as mentioned above, the 2010 pew 110 information technology and libraries | september 2011 their first encounter with an e-book reader.”34 while this is mere anecdote, it, along with the survey results noted above, raises the question of how popular the device really is on college campuses. finally, a third group of studies attempts to predict the future of e-readers and e-books. even before the introduction of e-readers, some saw e-books as the likely future of academic libraries.35 more recently, one report discusses the likelihood of and barriers to e-book adoption. this article concludes that “barriers to e-book adoption still exist, but signs point to this changing within the next two to five years. that, of course, has been said for most of the past 15 to 20 years.”36 still, nelson points out that technologies can become ubiquitous very quickly, using the ipod as an example, and warns libraries against falling behind.37 yet another report puts e-books in the two-tothree-year adoption range and claims that e-books “have reached mainstream adoption in the consumer sector” and that the “obstacles have . . . started to fall away.”38 ■■ method the e-reader survey was conducted as part of queens college’s student technology survey, which also covered several other aspects of students’ interactions with technology. the author is grateful to the center for teaching and learning (in particular, eva fernández and michelle fraboni) for graciously agreeing to include questions about e-readers in the survey and providing some assistance in managing the data. this survey, run through queens college’s center for teaching and learning, was hosted by surveymonkey and was distributed to students through their official e-mail accounts. participants were offered a chance to win an ipod touch as an incentive, but students who did not participate also were offered an opportunity to enter the ipod drawing. the survey was available between april and june 2010. all personally identifying information was removed from the responses to protect student privacy. rather than surveying the entire population about e-readers and e-books, the survey limited most of the questions to students with some experience with the format. of the students who responded to the survey, only 63 (3.7 percent) used e-readers. however, 338 more students identified themselves as users of e-books but did not use e-readers. all other students skipped past the e-book questions and were directed to the next part of the survey. the questions about e-readers fell into several categories. the students were asked about their ownership of devices and which devices they planned to purchase in the future. 
while they might of course change their minds about future purchases, this is a useful way of measuring whether students regard the devices as desirable. with other e-readers. other types of programs, most notably texas a&m’s kindle lending program,21 and many academic focus groups have also contributed to our knowledge of how students use e-readers. users in nearly every study have praised the portability of these devices. this can be very important to students; users in one study noted that the portability of reading devices allowed them to “reclaim an otherwise difficult to use brief period,”22 and in another, students were able to multitask, doing household chores and studying at the same time.23 adjustable text size and the ability to search for words in the text have also been popular among students, as has the novelty value of these devices. environmental concerns surrounding heavy printing have also been cited as an advantage of e-readers.24 however, the limitations of these devices, some of which are severe in an academic setting, also have been noted. the comments of students at gettysburg college are typical: they liked the e-readers for leisure reading, but found them awkward for classroom use.25 lack of note-taking support was an important drawback for many students. waycott and kukulska-hulme noted that students were much less likely to take notes while reading with a pda than they were with print.26 a study at princeton found that the same was true of students using the kindle,27 and students at northwest missouri state university said they read less with an e-textbook than with a traditional one, although they did not report changes in their study habits.28 despite the ability of many devices to search the text of a book, users in many studies also disliked the inability to skim and browse through the materials as they would with print.29 interestingly, this complaint appeared in studies of all types of e-readers, even those with larger screens. students, in a recent study with the sony reader and ipod touch, noted that these devices did a poor job of supporting pdfs, a standard format for online course materials. the documents were displayed at a very small size and the words were sometimes jumbled.30 whether these drawbacks will prevent students from adopting e-book readers remained to be seen. library and information science (lis) students in a small, week-long study reiterated the problems found in the above studies, but nevertheless found themselves using e-readers extensively and reading more books and newspapers than they had before.31 several of these user studies hint that e-readers are not currently commonplace as far as users often seemed to regard the devices with surprise and curiosity. in some studies, while users were initially attracted to the novelty value of the devices, their enthusiasm dimmed after using the devices and discovering technical problems and limitations.32 one author describes e-readers as “attention getters, but not attention keepers.”33 a study in early 2009, in which students were provided with e-readers, notes that “for the majority of the participants, this was adoption of e-book readers among college students: a survey | foasberg 111 attitudes of students in general, similar surveys should be taken across many campuses in several demographically different areas. researching e-readers is inherently difficult because the landscape is changing very quickly. 
since the survey began, apple’s ipad became available, prices for dedicated e-readers have dropped dramatically, publishers have become more willing to offer content electronically, and amazon has released a new version of the kindle and has begun taking out television advertisements for it. without a follow-up survey, it is impossible to know whether these events have changed student attitudes. ■■ results and discussion e-reader adoption of the 1,705 students who responded to the survey, 401 say that they read e-books (table 1). most students (338) who use e-books read them on a device other than an e-reader, but 63 say they use a dedicated reader for e-books (table 2). however, when students were asked about the technological devices that they own, only 56 selected e-book readers. perhaps the seven students who use e-book readers but don’t report owning one are sharing or borrowing them, or perhaps they are using a device other than the ones enumerated in the question. aside from table 3, which breaks down the e-reader brands that students own, the following data will be based upon the larger sample of 63 students. the students who read e-books on another device were asked whether they planned to buy an e-reader in the respondents were also asked about their use of e-books. this category includes questions about what kind of reading students use e-books for, how much of their reading uses e-books, and where they are finding their e-books. it was important to learn whether students considered e-book readers appropriate for academic work, and whether they considered the library a potential source for e-books. finally, to assess their attitudes toward e-book readers, students were asked to identify the main benefits and drawbacks of e-book readers. several possibilities were listed, and students were asked to respond to them along a likert scale. a field was also included in which students could fill in their own answers. after 643 incomplete surveys were eliminated, there were 1,705 responses from queens college students. this is about 8 percent of the queens college student body. e-mail surveys always run the risk of response bias, especially when they concern technology. however, students who responded were representative of queens college in terms of sex, age, class standing, major, and other demographic characteristics. the results were compared using a chi-squared test with the level of significance set at 0.05. in some cases, there were too few respondents to test significance properly and comparisons could not be made. please see appendix for the e-reader questions included in the survey instruments. they will be referred to in more depth throughout this article. ■■ survey limitations the survey results may not be generalizable because of the survey’s small sample size. in particular, the 63 respondents who use e-book readers may not be representative of student e-reader owners in general. the survey also relies on self-reporting; no direct observation of student behavior took place. students who do use e-readers may be more comfortable with technology and more likely to respond to e-mail surveys. however, the sample is representative for queens college students, and the percentage of students who own e-book readers is close to the national average at the time the survey was taken (5 percent).39 since only queens college students were surveyed, the results reflect the behavior and attitudes of students at a single large, four-year public college in new york city. 
the results do not necessarily reflect the experience of students at other types of institutions or in other parts of the united states. the other parts of the technology survey show that qc students are heavy users of technology, so they may adopt new technologies such as e-book readers more quickly than other students. to understand the table 1. e-book use among respondents e-book use number of respondents read e-books 401 (23.5%) do not read e-books 1262 (74.0%) don’t know what an e-book is 42 (2.5%) total 1705 (100%) table 2. devices used to read e-books among e-book readers device used number of respondents (% of e-book users) dedicated e-reader 63 (15.7) other device 338 (84.3) total 401 (100) 112 information technology and libraries | september 2011 desire to buy an ipad, many more than reported owning an e-reader. curiously, the e-reader owners reported that they planned to buy an ipad at the same rate as the other students. it is not clear whether these students plan to replace their e-reader or use multiple devices. in either case, while the arrival of the ipad and other tablet devices seems likely to increase the number of students carrying potential e-reading devices, some of its adopters will probably be students who already own e-readers. not surprisingly, students who used e-readers tended to be early adopters of technology in general (table 4).40 compared to the general pool of respondents, they were much more likely to like or love new technologies and much less likely to describe themselves as neutral or skeptical of them. in a chi-squared test, these differences were significant at a level of 0.001. although e-reading devices have existed since the 1990s, the newest, most popular generation of them is so recent that people who own one now are early adopters by definition. compared to the rest of the survey respondents, both e-reader owners and other e-book users were much more likely to identify as early adopters of technology in general. given this trend, the adoption rate of e-readers among students may slow once the early adopters are satisfied. uses of e-books students who used an e-book reader were asked how much of their reading they did with it and whether they used it for class, recreational, or work-related reading (table 5). students without e-readers were asked the same questions about their use of e-books. while it is likely that students who use e-book readers continue to access e-books in other ways, this distinction was made because this survey was designed to study their use of e-readers specifically. because e-reader users were not asked about their use of e-books in other formats, it is not clear whether their habits with more traditional e-book formats differ from those of other students. fewer than half the e-reader users in the study used the device for two-thirds of their reading or more. in the table below, students who did all their reading and those who did about two-thirds of their reading with e-books are combined, because so few claimed to read e-books exclusively. three students with e-readers and future. the majority had no immediate plans to buy one, with those who said they did not plan to acquire one and those who did not know combining for 62.43 percent. 23.67 percent planned to buy one either within the next year or before leaving college, and the remaining 13.91 percent planned to acquire an e-reader after graduating. 
despite ergonomic disadvantages, many more students are using e-books on some other device, such as a computer or a cell phone, than are loading them on e-readers. furthermore, a large percentage of these students do not plan to buy an e-book reader. the factors preventing these students from buying e-readers will be covered in more detail in the “attitudes toward e-readers” section below. however, it seems likely that a major factor is price, identified by both e-reader owners and non-owners as the greatest disadvantage of these devices. when asked to list the devices they owned, 56 students named some type of e-book reader. among these, the amazon kindle was the most popular (table 3). as expected, e-readers have yet to be adopted by most students at queens college. at the time of this survey, less than 4 percent of respondents owned one. while the rest of the survey shows that these students are highly wired—82 percent own a laptop less than five years old and 93 percent have high-speed internet access at home—this has not translated to a high rate of e-reader ownership. although apple’s ipad, a tablet device that functions as an e-reader among other things, was not yet released at the time of the survey, it may see wider adoption than the dedicated devices. when the survey was originally distributed, this device had been announced but not yet released. overall, 8 percent of students expressed a table 3. e-reader brands owned by students devices owned number of students (% of e-reader owners) amazon kindle 26 (46.4%) barnes & noble nook 14 (25.0%) sony reader 10 (17.9%) other 6 (10.7%) total 56 (100.0%) table 4. e-reader use and self-identification as an early adopter e-reader owners all respondents love or like new technologies 40 (63.5%) 698 (40.9%) neutral or skeptical about new technologies 23 (36.5%) 1007 (59.1%) total 63 (100.0%) 1705 (100.0%) adoption of e-book readers among college students: a survey | foasberg 113 pleasure. this finding is much more surprising, given the very slow adoption of e-books before the introduction of e-readers, and the ergonomic problems with reading from vertical screens. however, students who used e-books without e-readers were much more likely to read e-books for classes. this difference may be due to the sorts of material that are available in each format. although textbook publishers have shown interest in creating e-textbooks for use on devices such as the ipad, there is little selection available for e-readers as yet. when working without e-book readers, however, there is a wide variety of academic materials available in electronic formats, and many textbooks include an online component. academic libraries, including the one at queens college, subscribe to large e-book collections of academic materials. for the most part, these collections cannot be used on an e-reader, but they are available through the library’s website to students with an internet connection and a browser. it is also possible that the e-readers are not well suited to class readings. some past studies, cited above, have found that e-readers do not accommodate functions such as note taking, skimming, and non-sequential navigation very well. since these are important functions for academic work, and both print books and “traditional” e-books are superior in these respects, such limitations may prevent students from using e-readers for classes. 
the user behaviors reported here do not appear to herald the end of print; in fact, very few students with e-readers use them for all their reading, and over half of the students with e-readers use them for one-third of their reading or less. it is not clear whether students intentionally choose to read some materials in print and others with their e-reader. three students with e-readers and nine without said they used e-books for all their reading. very few students without e-book readers used e-books for a large proportion of their reading; indeed, 54 percent said they used e-books for less than a third of their reading. differences between the groups were tested for significance using a chi-squared test. note that percentages may not add up to 100 percent, due to rounding. since many studies of e-book readers have found them more suitable for recreational reading than for academic work, users of e-readers were asked to identify the kinds of reading for which they used e-readers and to select all options that they found applicable (table 6). since students were allowed to choose more than one option, the totals are greater than the number of participants. indeed, e-readers were much more likely to be used for recreational reading, and other types of e-books far more likely to be used for class. for other types of reading, differences between these groups were not significant. since e-readers have been marketed largely for the popular fiction market and are designed to accommodate casual linear reading, it is not surprising that students who use them are most likely to report using them for leisure reading. in this area they seem to enjoy a strong advantage over more traditional e-book formats read on another device such as a computer or a cell phone. however, the study did not control for the amount of reading that students do. students who use e-readers may be heavier leisure readers in general. further research could clarify whether heavier use of leisure e-reading is due to the devices or the tendencies of those who own them. a large proportion of the students who read e-books without e-readers (65.7 percent) do read e-books for pleasure.

table 5. amount of reading done with e-books

amount of reading         e-reader users   other users    x2     significance level   significant?
about two-thirds or all   27 (42.8%)       65 (19.2%)     16.8   0.001                yes
about a third             14 (22.2%)       90 (26.6%)     0.1    0.5                  no
less than a third         22 (34.9%)       183 (54.1%)    7.9    0.01                 yes
total                     63 (99.9%)       338 (99.9%)    -      -                    -

table 6. types of reading done with e-books

type of reading   e-reader users   other users    x2     significance level   significant?
recreational      54 (85.7%)       222 (65.7%)    9.9    0.01                 yes
class             24 (38.1%)       217 (64.2%)    14.7   0.001                yes
work              11 (17.8%)       88 (26.0%)     2.1    0.5                  no
other             3 (4.8%)         8 (2.4%)       1.1    0.5                  no
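the chi-squared values in tables 5 and 6 can be checked from the published counts. the sketch below rebuilds the 2x2 contingency table for the "about two-thirds or all" row of table 5 (27 of 63 e-reader users versus 65 of 338 other users) and computes the statistic without yates' continuity correction, which reproduces the reported 16.8; it assumes scipy is available and is an illustration of the test, not the author's original analysis code.

```python
# verify one chi-squared value from table 5 using the published counts.
# assumes scipy; correction=False (no yates correction) matches the
# reported statistic of 16.8 for this 2x2 table.
from scipy.stats import chi2_contingency

e_reader = (27, 63)    # "about two-thirds or all" readers, total e-reader users
other    = (65, 338)   # same category, total other e-book users

table = [
    [e_reader[0], e_reader[1] - e_reader[0]],   # [27, 36]
    [other[0],    other[1]    - other[0]],      # [65, 273]
]
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")   # chi2 = 16.8, p < 0.001
```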
despite the existence of a service called overdrive, which provides e-books compatible with some e-readers (excluding the kindle), circulating e-books is challenging, due to a host of technical and legal problems. given this environment, it is not surprising that students without e-readers were more likely to use their public library as a source of e-books than were e-reader users. the queens college campus library, which offers many electronic collections but none that are e-reader-friendly, fared worse; only one student claimed to have used it as a source of e-reader compatible materials. in the free comment field, students mentioned other sources of e-books such as the apple itunes store, the campus bookstore and lulu.com, an online bookseller that also provides self-publishing. several also admitted, unprompted, that they download books illegally. attitudes toward e-readers in the interests of learning what caused students to adopt e-readers or not, the survey used a series of likert-style questions to ask what the students considered the benefits and drawbacks of such devices. strikingly, e-reader owners and non-owners agreed about both the advantages and disadvantages; owning an e-reader did not seem to change most of the things that students value and dislike about it. figure 1 shows the number of students in each group who their e-reader, or whether they are limited by the materials available for the e-reader. the circumstances under which students switch between electronic and print would be an excellent area for future research; is it a matter of what is practically available, or is the e-reader better suited for some texts and reading circumstances than others? sources of e-books the major producers of e-readers are either primarily booksellers, such as amazon and barnes & noble, or are hardware manufacturers who also provide a store where users can purchase e-books, such as sony (or, after the ipad launch, apple). in both models, the manufacturers hope to sell e-books to those who have purchased their devices. they provide more streamlined ways of loading these e-books on their devices, and in some cases use drm to prevent their e-books from being used on competing devices, as well as to inhibit piracy. table 7 shows the sources from which readers with and without e-readers obtain e-books. e-reader users were much more likely than non-users to get their e-books from the official store associated it—that is, the store providing the e-reader, such as amazon, barnes and noble, or sony’s readerstore. there was no significant difference between the two groups’ use of open access or independent sources, but the students who did not use e-readers were much more likely to use e-books from their public library, and while 19.8 percent of students without e-readers used the campus library as a source of e-books, only one student with an e-reader did. since respondents were allowed to choose more than one answer, the results do not sum up to 100 percent. by a wide margin, students who own e-readers are most likely to purchase their e-reading materials from the “official” store; 86 percent cited the official store as a source of e-books. students without e-readers also use these stores more than any other source of e-books, but they are nevertheless far less likely to use them than e-reader users. because it is much easier to buy e-books table 7. sources of e-books how do you get e-books? e-reader users other users x2 significance level significant? 
store specific to popular e-readers 54 (85.7%) 154 (45.6%) 34.2 0.001 yes open access repositories 16 (25.4%) 120 (35.5%) 2.4 0.5 no public library 10 (15.9%) 99 (29.3%) 4.8 0.05 yes independent online retailer 9 (14.3%) 71 (21.0%) 1.5 0.5 no other 4 (6.3%) 39 (11.5%) n/a n/a n/a campus library 1 (1.6%) 67 (19.8%) n/a n/a n/a adoption of e-book readers among college students: a survey | foasberg 115 students with e-readers were more likely than others to rate portability and convenience as “very valuable.” as the studies cited above suggest, being able to easily download books, carry them away from the computer, and store many books on a single device are very appealing to students. only the final two features, text-to-speech and special features such as dictionaries, attracted enough “not very valuable” or “not valuable” responses for an inter-group comparison. both groups considered text-to-speech the least valuable feature, but students who did not own e-readers were significantly more likely to consider it a valuable or very valuable feature, perhaps indicating that the users to whom this is important have avoided the devices, which currently support it in a very limited fashion. perhaps, too, students with e-readers rated this feature less useful because of its current limitations. in either case, rated each feature either valuable or very valuable. if the positive features of the devices are ranked based on the percentage of respondents who considered them very valuable, the order is almost the same for students with and without e-readers. for students with e-readers, the features rank as follows: portability, convenience, storage, special functions, and text-to-speech. for those without, convenience ranks slightly higher than portability; all other features rank in the same order. tables 8 and 9 present the results of these questions in more detail. for the sake of brevity, the chi-squared results have been omitted. any differences considered significant in the discussion below are significant at least at the 0.05 level. nearly all e-reader users and a strong majority of other e-book users rated portability, convenience, and storage either “valuable” or “very valuable,” though figure 1. features rated “valuable” or “very valuable” 116 information technology and libraries | september 2011 among respondents suggests that that many of those who do not own an e-book reader are unfamiliar with the technology. since e-readers are primarily sold over the internet, many people have not had a chance to see or handle one, perhaps partly explaining this result. if they become more widespread, this may well change. not surprisingly, respondents who did not own e-readers were significantly more likely to prefer print. however, it is worth noting that even among students who did use e-readers, over a third “agree” or “completely agree” that they prefer print, with another third neither agreeing nor disagreeing. use of e-readers does not appear to indicate hostility toward print. this is consistent with the students’ self-reports of e-reader use; as reported above, over half of the students surveyed use e-readers for one-third of their reading or less. thus, it seems unlikely that most of these students plan to totally abandon print any time soon; rather, e-readers are providing another format that they use in addition to print. 
as for students who do not use e-readers, over half say they prefer print, but this is far from their most widespread concern; rather, like e-reader owners, they are most likely to cite the cost of the reader or the selection of books available as a drawback of the devices. queens college students considered price the most important drawback of e-readers. for both groups (owners and non-owners), it was the factor most likely to be identified as a concern, and the difference between the it was the only variable listed in the survey for which either the “not very valuable” and “not valuable” responses from either group amounted to a combined total of greater than 10 percent of the respondents in that group. in addition to valuing the same features, e-reader owners and non-owners had similar concerns about the device. figure 2 shows the number of respondents in each group who agreed or completely agreed that the issues listed were one of the main shortcomings of e-book readers. tables 10 and 11 give the responses in more detail. the responses with which the most respondents either agreed or completely agreed were the same: cost of e-reader, selection of e-books, and cost of e-books, in that order. although groups such as the electronic frontier foundation have raised concerns about privacy issues related to e-readers,41 these issues have made little impression on students; both e-reader users and nonusers were in agreement in putting privacy at the bottom of the list. one exception to the general agreement between e-reader users and other e-book readers was concern about eyestrain. the majority (63 percent) of those who do not use e-readers either “completely agree” or “agree” that eyestrain is a drawback, while only 29 percent of e-reader owners did. this was a major concern for early e-readers, leading the current generation of these devices to use e-ink, a technology that resembles paper and is thought to eliminate the eyestrain problem. the disparity table 8. value of e-reader features, according to e-reader users very valuable valuable somewhat valuable not very valuable not valuable at all no response portability 52 (82.54%) 10 (15.87%) 1 (1.59%) 0 (0.00%) 0 (0.00%) 0 (0.00%) convenience 46 (73.02%) 13 (20.63%) 1 (1.59%) 1 (1.59%) 1 (1.59%) 1 (1.59%) storage 42 (66.67%) 16 (25.40%) 2 (3.17%) 1 (1.59%) 0 (0.00%) 2 (3.17%) special functions 32 (50.79%) 18 (28.57%) 7 (11.11%) 3 (4.76%) 3 (4.76%) 0 (0.00%) text-speech 10 (15.87%) 13 (20.63%) 12 (19.05%) 16 (25.40%) 11 (17.46%) 1 (1.59%) table 9. value of e-reader features, according to other e-book users very valuable valuable somewhat valuable not very valuable not valuable at all no response portability 199 (58.88%) 89 (26.33%) 39 (11.53%) 4 (1.18%) 5 (1.48%) 2 (0.06%) convenience 194 (57.40%) 98 (28.99%) 34 (10.06%) 7 (2.07%) 2 (0.59%) 3 (0.89%) storage 181 (53.55%) 99 (29.28%) 40 (11.83%) 10 (2.96%) 4 (1.18%) 4 (1.18%) special functions 169 (50.00%) 82 (24.26%) 58 (17.16%) 22 (6.51%) 4 (1.18%) 3 (0.89%) text-speech 95 (28.11%) 77 (22.78%) 77 (22.78%) 50 (14.79%) 35 (10.36%) 4 (1.18%) adoption of e-book readers among college students: a survey | foasberg 117 responded, but they brought up issues such as highlighting, battery life, and the small size of the screen. another student was more confident in the value of e-readers and used this space to proclaim paper books dead. 
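the ranking of positive features for e-reader owners described above can be reproduced directly from the "very valuable" column of table 8. the small sketch below (python; counts transcribed from the table, out of the 63 e-reader users) is only an illustration of that ranking computation.

```python
# rank e-reader features by the share of e-reader owners (n = 63) who rated them
# "very valuable"; counts transcribed from table 8
very_valuable = {
    "portability": 52,
    "convenience": 46,
    "storage": 42,
    "special functions": 32,
    "text-to-speech": 10,
}
total_owners = 63

ranked = sorted(very_valuable.items(), key=lambda item: item[1], reverse=True)
for feature, count in ranked:
    print(f"{feature:18s} {count / total_owners:6.1%}")
# expected order: portability, convenience, storage, special functions, text-to-speech
```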
e-book circulation programs finally, students were asked whether they would be interested in checking out e-readers with books loaded on them from the campus library (table 12). as is often the case when a survey asks for interest in a prospective new service, the response was very positive. however, it was expected that many of the students would prefer to download materials for devices that they already own to take advantage of the convenience of e-readers. on the contrary, a high percentage of both types of students expressed interest in checking out e-book readers, but very few wished to check out e-books two groups was not significant. at the time this survey was taken, amazon’s kindle cost close to $300 and barnes and noble’s nook was priced similarly. soon after the survey closed, however, the major e-reader manufacturers engaged in a “price war,” which resulted in the prices of the best-known dedicated readers, amazon’s kindle and barnes and noble’s nook, falling to under $200. given the feeling among survey respondents that the price of the readers is a serious drawback, this reduction may cause the adoption rate to rise. it would be worthwhile to repeat this survey or a similar one in the near future to learn whether the e-reader price war has had any effect upon price-sensitive students. in the pilot survey, students had written in further responses about the drawbacks of e-readers, but not about their benefits. while some of those responses were incorporated into the final survey, a free text field was also added to catch any further comments. few students figure 2. drawbacks with which students “agree” or “completely agree” 118 information technology and libraries | september 2011 ■■ future research although this survey provides some data to help libraries think about the popularity of e-readers among students, many aspects of students’ use of e-readers remain unexplored. further research on how student adoption of e-book readers varies by location and demographics, particularly considering students’ economic characteristics, for a device of their own. even students who owned e-readers were much more likely to express interest in checking out the device than checking out materials to read on it. this preference belies the common assumption that users do not wish to carry multiple devices and prefer to download everything electronically. instead, they were interested in checking out an e-reader from the library. unless the emphasis of the question altered the results, it is somewhat difficult to account for this response. table 10. drawbacks of e-readers, according to e-reader owners completely agree agree neither agree nor disagree disagree completely disagree no response cost of reader 19 (30.16%) 23 (36.51%) 13 (20.63%) 7 (11.11%) 0 (0.00%) 1 (1.59%) selection 11 (17.46%) 26 (41.27%) 12 (19.05%) 7 (11.11%) 6 (9.52%) 1 (1.59%) cost of e-books 10 (15.87%) 20 (31.75%) 16 (25.40%) 11 (17.46%) 5 (7.94%) 1 (1.59%) prefer print 6 (9.52%) 16 (25.40%) 21 (33.33%) 11 (17.46%) 8 (12.70%) 1 (1.59%) eyestrain 7 (11.11%) 11 (17.46%) 20 (31.75%) 15 (23.81%) 9 (14.29%) 1 (1.59%) interface 7 (11.11%) 10 (15.87%) 24 (38.10%) 9 (14.29%) 8 (12.70%) 5 (7.94%) privacy 3 (4.76%) 9 (14.29%) 13 (20.63%) 26 (41.27%) 11 (17.46%) 1 (1.59%) table 11. 
drawbacks of e-readers, according to other e-book users completely agree agree neither agree nor disagree disagree completely disagree no response cost of reader 146 (43.20%) 117 (34.62%) 50 (14.79%) 14 (4.14%) 11 (3.25%) 0 (0.00%) selection 80 (23.67%) 136 (40.24%) 84 (24.85%) 27 (7.99%) 7 (2.07%) 4 (1.18%) cost of e-books 94 (27.81%) 121 (35.80%) 76 (22.49%) 37 (10.95%) 10 (2.96%) 0 (0.00%) prefer print 78 (23.08%) 99 (29.29%) 116 (34.32%) 25 (7.40%) 19 (5.62%) 1 (0.30%) eyestrain 84 (24.85%) 129 (38.17%) 80 (23.67%) 33 (9.76%) 11 (3.25%) 1 (0.30%) interface 43 (12.72%) 82 (24.26%) 145 (42.90%) 33 (9.76%) 20 (5.92%) 15 (4.44%) privacy 39 (11.54%) 65 (19.23%) 144 (42.60%) 49 (14.50%) 40 (11.83%) 1 (0.30%) table 12. interest in checking out preloaded e-readers from the library e-reader owners other e-book users would be interested in checking out e-readers 44 (70.0%) 257 (76.0%) would not be interested in checking out e-readers 4 (6.3%) 38 (11.2%) would not be interested in checking out e-readers, but would like to check out e-books to read on my own e-reader 15 (23.8%) 43 (12.7%) total 63 (100.1%) 338 (99.9%) adoption of e-book readers among college students: a survey | foasberg 119 whom would not object to using a print edition if one were available. under these circumstances, and realizing that the future popularity of e-readers is far from guaranteed, developing such models is, for now, more important than putting them into practice in the short term. references 1. nancy k. herther, “the ebook reader is not the future of ebooks,” searcher 16, no. 8 (2008): 26–40, http://search.ebsco host.com/login.aspx?direct=true&db=a9h&an=34172354&site =ehost-live (accessed dec. 22, 2010). 2. charlie sorrel, “amazon: e-books outsell hardcovers,” wired, july 20, 2010, http://www.wired.com/gadgetlab/ 2010/07/amazon-e-books-outsell-hardcovers/ (accessed dec. 22, 2010). 3. international digital publishing forum, “industry statistics,” oct. 2010, http://www.idpf.org/doc_library/indus trystats.htm (accessed dec. 22, 2010). 4. kathleen hall, “global e-reader sales to hit 6.6m 2010,” electronics weekly, dec. 9, 2010, http://www.electronicsweekly .com/articles/2010/12/09/50083/global-e-reader-sales-to -reach-6.6m-2010-gartner.htm (accessed dec. 22, 2010). 5. cody combs, “will physical books be gone in five years?” video interview with nicholas negroponte, cnn, oct. 18, 2010, http://www.cnn.com/2010/tech/innovation/10/17/negro ponte.ebooks/index.html (accessed dec. 22, 2010). 6. jeff gomez, print is dead: books in our digital age (basingstoke, uk: palgrave macmillan, 2009). 7. aaron smith, “e-book readers and tablet computers,” in americans and their gadgets (washington, d.c.: pew internet & american life project, 2010), http://www.pewinternet.org/ reports/2010/gadgets/report/ebook-readers-and-tablet -computers.aspx (accessed dec. 22, 2010). 8. alex sharp, “amazon announces kindle book lending feature is coming in 2010,” suite101, oct. 26, 2010, http:// www.suite101.com/content/amazon-announces-kindle-book -lending-feature-is-coming-in-2010-a300036#ixzz18cxanfke (accessed dec. 22, 2010). 9. karl drinkwater, “e-book readers: what are librarians to make of them?” sconul focus 49 (2010): 4–10, http://www .sconul.ac.uk/publications/newsletter/49/2.pdf (accessed dec. 22, 2010). drinkwater provides an overview and a discussion of the challenges and benefits of such programs. 10. benedicte page, “pa sets out restrictions on library e-book lending,” the bookseller, oct. 
21, 2010, http://www .thebookseller.com/news/132038-pa-sets-out-restrictions-on -library-e-book-lending.html (accessed dec. 22, 2010). 11. james a. buczynski, “library ebooks: some can’t find them, others find them and don’t know what they are,” internet services reference quarterly 15, no. 1 (2010): 11–19, doi: 10.1080/10875300903517089, http://dx.doi.org/ 10.1080/10875300903517089 (accessed dec. 22, 2010). 12. smith, “e-book readers and tablet computers,” http:// www.pewinternet.org/reports/2010/gadgets/report/ebook -readers-and-tablet-computers.aspx (accessed dec. 22, 2010). 13. shannon d. smith and judith borreson caruso, the ecar study of undergraduate students and information technology, 2010 (boulder, colo.: educause, 2010), http://net.educause. is certainly important. more research on the habits of students with e-readers would also help libraries and universities to better serve their needs. in particular, while this survey found that students tend to switch between electronic and print formats, little is yet known about when and why they move from one to the other. it will also be important to research the differences between the reading habits of students who own e-readers and those who do not, as this may prove useful in interpreting the survey data about types of reading done with different kinds of e-books. furthermore, since the e-book market changes quickly, continuing to research student adoption of e-readers is also important to monitor student reactions to new developments. ■■ conclusion while many queens college students express an interest in e-readers, and even those who do not own one believe that their portability and convenience offer valuable advantages, only a small percentage of students, many of whom are early adopters of technology in general, actually use one. furthermore, even those who own e-readers do not use them exclusively, and only a third say they prefer it to print. in light of these responses, the proper response to this technology may not be a discussion about whether “paper books are dead” (as one of the survey respondents wrote in the comment field) but how each format is used. research on when, where, and for what purposes students might choose print or electronic has already begun.42 many of the factors that contribute to the niche status of e-readers are changing. competition between manufacturers has brought down the price of the reader itself, and the selection of books available for them is improving. because these were some of the most important problems standing in the way of e-reader adoption for queens college students, e-reader ownership could increase rapidly. the lack of a significant difference between the attitudes of e-reader owners and nonowners merits further emphasis and examination, as it may indicate that price is indeed the major barrier to e-reader ownership. although the prices are lower now than they were when the survey was originally taken, this would present a major concern if e-readers became the expected format in which students read, perhaps even the possibility of a new kind of digital divide. as the future is uncertain, it is important for academic libraries to pay attention to their students’ adoption of e-readers, and to consider models under which they can provide materials compatible with them. however, it is important to remember that such materials would, at present, be accessible to only a small subset of users, many of 120 information technology and libraries | september 2011 20. jon t. 
rickman et al., “a campus-wide e-textbook initiative,” educause quarterly 32, no. 2 (2009), http://www.edu cause.edu/library/eqm0927 (accessed dec. 22, 2010). 21. dennis t. clark, “lending kindle e-book readers: first results from the texas a&m university project,” collection building 28, no. 4 (2009): 146–49, doi: 10.1108/01604950910999774, http://www.emeraldinsight.com/journals.htm?articleid=18174 06&show=abstract (accessed dec. 22, 2010). 22. marshall and rutolo, “reading-in-the-small,” 58. 23. mallett, “a screen too far?” 142. 24. “e-reader pilot at princeton.” 25. foster and remy, “e-books for academe,” 6. 26. waycott and kukulska-hulme, “students’ experiences with pdas,” 38. 27. “e-reader pilot at princeton.” 28. rickman, “a campus-wide e-textbook initiative.” 29. dennis t. clark et al., “a qualitative assessment of the kindle e-book reader: results from initial focus groups,” performance measurement and metrics 9, no. 2 (2008): 118–129, doi: 10.1108/14678040810906826, http://www.emeraldinsight .com/journals.htm?articleid=1736795&show=abstract (accessed dec. 22, 2010); james dearnley, cliff mcknight, and anne morris. “electronic book usage in public libraries: a study of user and staff reactions to a pda-based collection,” journal of librarianship and information science 36, no. 4 (2004): 175–182, doi: 10.1177/0961000604050568, http://lis.sagepub.com/content/36/4/175 (accessed dec. 22, 2010); mallett, “a screen too far?” 143; waycott and kukulska-hulme, “students’ experiences with pdas,” 36. 30. mallet, “a screen too far?” 142–43. 31. m. cristina pattuelli and debbie rabina. “forms, effects, function: lis students’ attitudes toward portable e-book readers,” aslib proceedings: new information perspectives 62, no. 3 (2010): 228–44, doi: 10.1108/00012531011046880, http://www .emeraldinsight.com/journals.htm?articleid=1863571&show=ab stract (accessed dec. 22, 2010). 32. see, for example, gil-rodriguez and planella-ribera, “educational uses of the e-book,” 58–59; and cliff mcknight and james dearnley, “electronic book use in a public library,” journal of librarianship & information science 35, no. 4 (2003): 235–42, doi: 10.1177/0961000603035004003, http://lis.sagepub .com/content/35/4/235 (accessed dec. 22, 2010). 33. rickman et al. “a campus-wide e-textbook initiative.” 34. maria kiriakova et al., “aiming at a moving target: pilot testing ebook readers in an urban academic library,” computers in libraries 30, no. 2 (2010): 20–24, http://search .ebscohost.com/login.aspx?direct=true&db=a9h&an=48757663 &site=ehost-live (accessed dec. 22, 2010). 35. mark sandler, kim armstrong, and bob nardini, “market formation for e-books: diffusion, confusion or delusion?” the journal of electronic publishing 10, no. 3 (2007), doi: 10.3998/3336451.0010.310, http://quod.lib.umich.edu/cgi/t/ text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0010.310 (accessed dec. 22, 2010). 36. mark r. nelson, “e-books in higher education: nearing the end of an era of hype?” educause review 43, no. 2 (2008), http://www.educause.edu/educause+review/ educausereviewmagazinevolume43/ebooksinhigher educationnearing/162677 (accessed dec. 22, 2010). 37. ibid. 38. l. johnson et al., the 2010 horizon report (austin, tex.: edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed dec. 22, 2010). 14. harris interactive, “one in ten americans use an e-reader; one in ten likely to get one in the next six months,” press release, sept. 22, 2010, http://www.piworld.com/com mon/items/biz/pi/pdf/2010/09/pi_pdf_harrispoll_ereaders. 
pdf (accessed dec. 22, 2010). 15. kat meyer, “#followreader: consumer attitudes toward e-book reading,” blog posting, o’reilly radar, aug. 4, 2010, http://radar.oreilly.com/2010/08/followreader-consumer-atti tudes-toward-e-book-reading.html (accessed dec. 22, 2010). 16. the following articles are all based on user studies with small form factor devices: paul lam, shun leung lam, john lam and carmel mcnaught, “usability and usefulness of ebooks on ppcs: how students’ opinions vary over time,” australasian journal of educational technology 25, no. 1 (2009): 30–44, http:// www.ascilite.org.au/ajet/ajet25/lam.pdf (accessed dec. 22, 2010); catherine c. marshall and christine rutolo, “readingin-the-small: a study of reading on small form factor devices,” in jcdl ’02 proceedings of the 2nd acm/ieee-cs joint conference on digital libraries (new york: acm, 2002): 56–64. doi: 10.1145/544220.544230, http://portal.acm.org/citation .cfm?doid=544220.544230 (accessed dec. 22, 2010); and j. waycott and a. kukulska-hulme, “students’ experiences with pdas for reading course materials,” personal ubiquitous computing 7, no. 1 (2002): 30–43, doi: 10.1007/s00779–002–0211-x, http://www .springerlink.com/content/w288kry251dd2vcd/ (accessed dec. 22, 2010). 17. some examples in an academic context: james dearnley and cliff mcknight, “the revolution starts next week: the findings of two studies considering electronic books,” information services & use 21, no. 2 (2001): 65–78, http://search .ebscohost.com/login.aspx?direct=true&db=a9h&an=5847810& site=ehost-live (accessed dec. 22, 2010); and eric j. simon, “an experiment using electronic books in the classroom,” journal of computers in mathematics & science teaching 21, no. 1 (2002): 53–66, http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml?recid= 0bc05f7a67b1790e5237dc070f466830549a60a87b3fa34bd0b8951acd 7a879da9fa151218a88252&fmt=h (accessed dec. 22, 2010). 18. eva patrícia gil-rodriguez and jordi planella-ribera, “educational uses of the e-book: an experience in a virtual university context,” in hci and usability for education and work, ed. andreas holzinger, lecture notes in computer science no. 5298 (berlin: springer, 2008): 55–62, doi: 10.1007/9783-540-89350-9-5, http://www.springerlink.com/content/ d357482823j10m96/ (accessed dec. 22, 2010); “e-reader pilot at princeton, final report,” (princeton university, 2009), http:// www.princeton.edu/ereaderpilot/ereaderfinalreportlong .pdf (accessed dec. 22, 2010); gavin foster and eric d. remy. “e-books for academe: a study from gettysburg college,” educause research bulletin, no. 22 (2009), http://www.educause .edu/resources/ebooksforacademeastudyfromgett/187196 (dec. 22, 2010); and elizabeth mallett, “a screen too far? findings from an e-book reader pilot,” serials 23, no. 2 (2010): 14–144, doi: 10.1629/23140, http://uksg.metapress.com/ media/mfpntjwvyqtggyjvudu7/contributions/f/3/2/6/ f32687v5r12n5h77.pdf (accessed july 11, 2011). 19. steve kolowich, “colleges test amazon’s kindle e-book reader as study tool,” usa today, feb. 23, 2010, http://www .usatoday.com/news/education/2010–02–23-ihe-amazon-kin dle-for-college23_st_n.htm (accessed dec. 22, 2010). adoption of e-book readers among college students: a survey | foasberg 121 question 22, and was reused in the current survey. again, the author extends thanks to michelle fraboni and eva fernández, who ran this portion of the survey at queens college and allowed the use of their data. 41. 
electronic frontier foundation, "updated and corrected: e-book buyer's guide to privacy," deeplinks blog, jan. 6, 2010, http://www.eff.org/deeplinks/2010/01/updated-and-corrected-e-book-buyers-guide-privacy (accessed dec. 22, 2010). 42. pattuelli and rabina, "lis students' attitudes." new media consortium, 2010), http://wp.nmc.org/horizon2010/chapters/electronic-books/ (accessed july 11, 2011). 39. aaron smith, "e-book readers and tablet computers," http://www.pewinternet.org/reports/2010/gadgets/report/ebook-readers-and-tablet-computers.aspx (accessed july 11, 2011). 40. this question was located in a portion of the survey not focused on e-book readers and thus does not appear in the appendix. the question derives from smith and caruso, 105,

appendix. queens college student technology survey

2 information technology and libraries | march 2008 currently we librarians seem to be hitching our wagon to the idea of library as community because in part it's what we ourselves want. we've seen that our lita members want more community from our association, so it makes sense to us that our patrons also want community. it's what pew, oclc, and other studies seem to be telling us. the business-wired side of the world is breaking their backs to create every form of virtual community they can think of as quickly as possible. apply the appropriate amounts of marketing and then our patrons want those things and expect them from all of their historically important community resources, the library being a prime player in that group. so we strive and strive and strive to not only provide the standard issue face-to-face community we've always created, but to also create that new highly desired virtual community. either we create a library-specific version, or we at the very least create a way for our patrons to access those communities. hopefully, when our patrons step into those virtual communities, we work to make it possible for them to find libraries there, too. all well and good, but do we have a plan? what's the goal? what's the end achievement? if, as studies say, patrons with a research need turn to libraries first only one percent of the time, and instead first hit up friends and family fifty or more percent of the time, then where is our significance and place in either the physical or virtual spaces? we know we serve significant numbers in many ways. we have gate counts, circulation records, holds placed, warm bodies in the building—all manners of indicators that show a well-managed and -marketed library is in demand and appreciated. as we run into the terrible head-on crash of community and technology, willy-nilly doing absolutely everything we can to accommodate everyone and everything, because we're librarians and library technologists and that's what we do, do we really have a clue why we're doing it? all fodder for deep thought and many lattes or beers and late night discussions. on the lita side, though, we're embarking on doing something about this knot when it comes to serving our members.
under the guidance of past-president bonnie postlethwaite we’ve established an assessment and research committee co-chaired by bonnie and diane bisom. to kick off the committee activities and to help them establish an agenda and direction, lita hired the research firm the wedewer group to work with the lita board and the new committee. stay tuned for reports and announcements from this committee as it works to find answers to some of those questions. and have that latte with a lita colleague as you seek to find some answers yourself. it’s all part of building community. mark beatty (mbeatty@wils.wisc.edu) is lita president 2007/2008 and trainer, wisconsin library services, madison. president’s message: doing something about life’s persistent problems? mark beatty 24 information technology and libraries | march 2011 ruben tous, manel guerrero, and jaime delgado semantic web for reliable citation analysis in scholarly publishing nevertheless, current practices in citation analysis entail serious problems, including security flaws related to the publishing process (e.g., repudiation, impersonation, and privacy of paper contents) and defects related to citation analysis, such as the following: ■■ nonidentical paper instances confusion ■■ author naming conflicts ■■ lack of machine-readable citation metadata ■■ fake citing papers ■■ impossibility for authors to control their related citation data ■■ impossibility for citation-analysis systems to verify the provenance and trust of citation data, both in the short and long term besides the fact that they do not provide any security feature, the main shortcoming of current citation-analysis systems such as isi citation index, citeseer (http:// citeseer.ist.psu.edu/), and google scholar is the fact that they count multiple copies or versions of the same paper as many papers. in addition, they distribute citations of a paper between a number of copies or versions, thus decreasing the visibility of the specific work. moreover, their use of different analysis databases leads to very different results because of differences in their indexing policies and in their collected papers.3 to remedy all these imperfections, this paper proposes a reference architecture for reliable citation analysis based on applying semantic trust mechanisms. it is important to note that a complete or partial adoption of the ideas defended in this paper will imply the effort to introduce changes within the publishing lifecycle. we believe that these changes are justified considering the serious flaws of the established solutions, and the relevance that citation-analysis systems are acquiring in our society. ■■ reference architecture we have designed a reference architecture that aims to provide reliability to the citation and citation-tracking lifecycle. this architecture is based in the use of digitally signed semantic metadata in the different stages of the scholarly publishing workflow. 
as a trust scheme, we have chosen a public key infrastructure (pki), in which certificates are signed by certification authorities belonging to one or more hierarchical certification chains.4 trust scheme the goal of the architecture is to allow citation-analysis systems to verify the provenance and trust of machinereadable metadata about citations before incorporating analysis of the impact of scholarly artifacts is constrained by current unreliable practices in cross-referencing, citation discovering, and citation indexing and analysis, which have not kept pace with the technological advances that are occurring in several areas like knowledge management and security. because citation analysis has become the primary component in scholarly impact factor calculation, and considering the relevance of this metric within both the scholarly publishing value chain and (especially important) the professional curriculum evaluation of scholarly professionals, we defend that current practices need to be revised. this paper describes a reference architecture that aims to provide openness and reliability to the citation-tracking lifecycle. the solution relies on the use of digitally signed semantic metadata in the different stages of the scholarly publishing workflow in such a manner that authors, publishers, repositories, and citation-analysis systems will have access to independent reliable evidences that are resistant to forgery, impersonation, and repudiation. as far as we know, this is the first paper to combine semantic web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing. i n recent years, the amount of scholarly communication brought into the digital realm has exponentially increased.1 this no-way-back process is fostering the exploitation of large-scale digitized scholarly repositories for analysis tasks, especially those related to impact factor calculation. the potential automation of the contribution– relevance calculation of scholarly artifacts and scholarly professionals has attracted the interest of several parties within the scholarly environment, and even outside of it. for example, one can find within articles of the spanish law related to the scholarly personnel certification the requirement that the papers appearing in the curricula of candidates should appear in the subject category listing of the journal citation reports of the science citation index.2 this example shows the growing relevance of these systems today. ruben tous (rtous@ac.upc.edu) is associate professor, manuel guerrero (guerrero@ac.upc.edu) is associate professor, and jaime delgado (jaime.delgado@ac.upc.edu) is professor, all in the departament d’arquitectura de computadors, universitat politècnica de catalunya, barcelona, spain. semantic web for reliable citation analysis in scholarly publishing | tous, guerrero, and delgado 25 might send a signed notification of rejection. we feel that the notification of acceptance is necessary because in a certain kind of curriculum, evaluations for university professors conditionally accepted papers can be counted, and in other curriculums not. the camera-ready version will be signed by all the authors of the paper, not only the corresponding author like in the paper submission. after the camera-ready version of the paper has been accepted, the journal will send a signed notification of future publication. this notification will include the date of acceptance and an estimate date of publication. 
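the certificates described above are ordinary public-key certificates; the article does not prescribe a concrete certificate format or toolkit. as a hedged illustration only, the sketch below uses python's cryptography package to issue an author certificate signed by an institutional ca, with a validity period and the author's identifier carried as an extension. all field values, the rsa key choice, and the decision to place the author uri in a subject alternative name are assumptions of this example, not part of the proposed architecture.

```python
# minimal sketch: an institutional ca (e.g., a university) issues an author certificate
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa

# keys for the ca (the author's institution) and for the author
ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
author_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

ca_name = x509.Name([
    x509.NameAttribute(NameOID.COUNTRY_NAME, "ES"),
    x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Universitat Politecnica de Catalunya"),
])
author_name = x509.Name([
    x509.NameAttribute(NameOID.COMMON_NAME, "ruben.tous"),
    x509.NameAttribute(NameOID.EMAIL_ADDRESS, "rtous@ac.upc.edu"),
])

now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    .subject_name(author_name)
    .issuer_name(ca_name)
    .public_key(author_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=365))   # validity period
    # carry the author uri (see the naming scheme below) as a subject alternative name;
    # this placement is an assumption of the sketch
    .add_extension(
        x509.SubjectAlternativeName(
            [x509.UniformResourceIdentifier("author://es_upc.dac/ruben.tous")]
        ),
        critical=False,
    )
    .sign(ca_key, hashes.SHA256())                          # signed by the ca's private key
)
print(cert.subject, cert.not_valid_after)
```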
finally, once the paper has been published, the journal will send a signed notification of publication to the author. the reason for having both notification of future publication and notification of publication is that, again, some curriculum evaluations might be flexible enough to count papers that have been accepted for future publication, while stricter ones state explicitly that they only accept published papers. once this process has been completed, a citation-analysis system will only need to import the authors' ca certificates (that is, the certificates of the universities, research centers, and companies) and the publishers' ca certificates (like acm, ieee, springer, lita, etc.) to be able to verify all the signed information. a chain of cas will be possible both with authors (for example, university, department, and research line) and with publications (for example, publisher and journal). ■■ universal resource identifiers to ensure that authors' uris are unique, they will have a tree structure similar to what urls have. the first-level element of the uri will be the author's organization (be it a university or a research center) id. this organization id will be composed of the country code top-level domain (cctld) and the organization name, separated by an underscore.5 the citation-analysis system will be responsible for assigning these identifiers and ensuring that all organizations have different identifiers. then, in the same manner, each organization will assign second-level elements (similar to departments) and so forth. author's ca_id: <cctld>_<organization name>; example: es_upc. author's uri: author://<ca_id>.<second-level element>/ . . . /<author name>; example: author://es_upc.dac/ruben.tous (in this example, "es" is the cctld for spain, upc (universitat politècnica de catalunya) is the university, and dac (departament d'arquitectura de computadors) is the department.)
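as a small illustration of the naming scheme just described, the helper below assembles and loosely checks author uris of the form author://<cctld>_<organization>.<unit>/<author name>. the exact grammar (allowed characters, depth of the organizational path) is not fully specified in the article, so this is a sketch under assumptions rather than a definitive format.

```python
# sketch: building and loosely validating author uris such as
# author://es_upc.dac/ruben.tous (structure assumed from the example in the text)
import re

def make_author_uri(cctld: str, organization: str, units: list[str], author: str) -> str:
    """Compose an author URI from a country code, an organization, organizational
    units (e.g., a department), and a dotted author name."""
    ca_id = f"{cctld.lower()}_{organization.lower()}"
    authority = ".".join([ca_id, *[u.lower() for u in units]])
    return f"author://{authority}/{author.lower()}"

AUTHOR_URI = re.compile(
    r"^author://(?P<cctld>[a-z]{2})_(?P<org>[a-z0-9-]+)"   # ca_id: cctld_organization
    r"(?:\.(?P<units>[a-z0-9.-]+))?"                        # optional unit path
    r"/(?P<author>[a-z0-9_.-]+)$"                           # author segment
)

def is_author_uri(uri: str) -> bool:
    return AUTHOR_URI.match(uri) is not None

uri = make_author_uri("es", "upc", ["dac"], "ruben.tous")
print(uri)                 # author://es_upc.dac/ruben.tous
print(is_author_uri(uri))  # True
```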
the ca will manage changes in name, e-mail, and address by generating a new certificate in which the former certificate will move to the list of former information. changes in affiliation will be managed by the new ca, which will generate a new certificate with the current information. since the new certificate will have a new uri, the ca also will generate a signed link to the previous uri. therefore the citation-analysis system will be able to recognize the contributions signed with both certificates as contributions made by the same author. it will be the responsibility of the new ca to verify that the author was indeed affiliated to the former organization (which we consider a very feasible requirement). every time an author (or group of authors) submits a paper to a conference, workshop, or journal, the corresponding author will digitally sign a metadata graph describing the paper submission event. although the paper submission will only be signed by the corresponding author, it will include the uris of all the authors. journals (and also conferences and workshops) will have a certificate that contains their related information. their ca will be the organization or editorial board behind them (for instance, acm, ieee, springer, lita, etc.). if a paper is accepted, the journal will send a signed notification of acceptance, which will include the reviews, the comments from the editor, and the conditions for the paper to be accepted. if the paper is rejected, the journal 26 information technology and libraries | march 2011 ■■ microsoft’s conference management toolkit (cmt; http://cmt.research.microsoft.com) is a conference management service sponsored by microsoft research. it uses https to provide confidentiality, but it is a service for which you have to pay. although some of the web-based systems provide confidentiality through https, none of them provides nonrepudiation, which we feel is even more important. this is so because nonrepudiation allows authors to certify their publications to their curriculum evaluators. our proposed scheme always provides nonrepudiation because of its use of signatures. curriculum evaluators don’t need to search for the publisher’s website to find the evaluated author’s paper. in addition, our proposed scheme allows curriculum evaluations to be performed by computer programs. and confidentiality can easily be achieved by encrypting the messages with the public key of the destination of the message. it should not be difficult for authors to obtain the public key for the conference or journal (which could be included in its “call for papers” or included on its webpage). and, because the paper-submission message includes the author’s public key, notifications of acceptance, rejection, and publication can be encrypted with that key. ■■ modeling the scholarly communication process citation analysis systems operate over metadata about the scholarly communication process. currently, these metadata are usually automatically generated by the citation-analysis systems themselves, generally through a programmatic analysis of the scholarly artifacts unstructured textual contents. these techniques have several drawbacks, as enumerated already, but especially regarding the fact that there is metadata that cannot be inferred from the contents of a paper, like all the aspects of the publishing process. 
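the nonrepudiation argument above rests on ordinary digital signatures over the interchanged metadata. as a minimal sketch only, the code below (python, cryptography package) signs the bytes of a serialized metadata document and verifies the signature, i.e., the simpler of the two graph-signing approaches discussed later in this article; the key handling, payload, and padding choices are assumptions of the example, not requirements of the proposed architecture.

```python
# sketch: a journal signs the bytes of a serialized metadata document (e.g., the
# rdf/xml for a notification of acceptance); an author or a citation-analysis
# system later verifies the signature with the journal's public key.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

journal_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
journal_public_key = journal_key.public_key()  # in practice taken from the journal's certificate

metadata = b"<rdf:RDF>...notification of acceptance...</rdf:RDF>"  # placeholder payload

signature = journal_key.sign(
    metadata,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

try:
    journal_public_key.verify(
        signature,
        metadata,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )
    print("signature valid: the journal cannot later repudiate this notification")
except InvalidSignature:
    print("signature invalid or metadata tampered with")
```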
to allow citation-analysis systems accessing metadata about the entire scholarly artifacts lifecycle, we suggest a metadata model that captures a great part of the scholarly domain's static and dynamic semantics. this model is based on knowledge representation techniques in semantic web, such as resource description framework (rdf) graphs and web ontology language (owl) ontologies. metadata and rdf the term "metadata" typically refers to a certain data representation that describes the characteristics of an information-bearing entity (generally another data representation such as a physical book or a digital video file). metadata plays a privileged role in the scholarly

creations' uris are built in a similar manner to authors' uris, but in this case the use of the country code as part of the publisher's id is optional. because a creation and its metadata evolve through different stages (submission and camera-ready), we will use different uris for each phase. we propose the use of this kind of uri instead of other possible schemes such as the digital object identifier (doi), because the ones proposed in this paper have the advantage of being human readable and containing the cas chain.6 of course, that doesn't mean that once published a paper cannot obtain a doi or another kind of identifier. publisher's ca_id: <publisher name> or <cctld>_<publisher name>; examples: lita and it_italianjournalofzoology. creation's uri: creation://<publisher's ca_id>.<publication>/ . . . /<creation id>; example: creation://lita.ital/vol27_num1_paper124. confidentiality and nonrepudiation nowadays, some conferences manage their paper submissions and notifications of acceptance (with their corresponding reviews) through e-mail, while others use a web-based application, such as edas (http://edas.info/). the e-mail-based system has no means of providing any kind of confidentiality; each router through which the e-mails travel can see their contents (paper submissions and paper reviews). the web-based system can provide confidentiality through http secure (https), although some of the most popular applications (such as edas and myreview) do not provide it; their developers may not have thought that it was an important feature. the following is a short list of some of the existing web-based systems:
■■ edas (http://edas.info/) is probably the most popular system. it can manage a large number of conferences and special issues of journals. it does not provide confidentiality.
■■ myreview (http://myreview.intellagence.eu/index.php) is an open-source web application distributed under the gpl license for managing the paper submissions and paper reviews of a conference or journal. myreview is implemented with php and mysql. it does not provide confidentiality.
■■ conftool (http://www.conftool.net) is another web-based management system for conferences and workshops. a free license of the standard version is available for noncommercial conferences and events with fewer than 150 participants. it uses https to provide confidentiality.
for the purpose of the reference architecture described in this paper, we do not instruct which of the two described approaches for signing rdf graphs is to be used. the decision will depend on the implementation (i.e., on how the graphs will be interchanged and processed). owl and an ontology for the scholarly context to allow modeling the scholarly communication process with rdf graphs, we have designed an owl description logic (dl) ontology.
owl is a vocabulary for describing properties and classes of rdf resources, complementing rdfs’s capabilities for providing semantics for generalization hierarchies of such properties and classes. owl enriches the rdfs vocabulary by adding, among others, relations between classes (e.g., disjointness), cardinality (e.g., “exactly one”), equality, richer typing of properties, characteristics of properties (e.g., symmetry), and enumerated classes. owl has the influence of more than ten years of dl research. this knowledge allowed the set of constructors and axioms supported by owl to be carefully chosen so as to balance the expressive requirements of typical applications with a requirement for reliable and efficient reasoning support. a suitable balance between these computational requirements and the expressive requirements was achieved by basing the design of owl on the sh family of description logics.10 the language has three increasingly expressive sublanguages designed for different uses: owl lite, owl dl, and owl full. we have chosen owl dl to define the ontology for capturing the static and dynamic semantics of the scholarly communication process. with respect to the other versions of owl, owl dl offers the most expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time). owl dl is so named because of its correspondence with description logics. figure 3 shows a simplified graphical view of the owl ontology we have defined for capturing static and dynamic semantics of the scholarly communication process. figure 4, figure 5, and figure 6 offer a (partial) tabular representation of the main classes and properties of the ontology. in owl, properties are independent from classes, but we have chosen to depict them in an object-oriented manner to improve understanding. for the same reason we have represented some properties as arrows between classes, despite this information being already present in the tables. uris do not appear as properties in the diagrams because each instance of a class will be an rdf resource, and any resource has a uri according to the rdf model. these uris will follow the rules described in the above section, “reference architecture.” it’s worth mentioning that the selection of the included properties has been based in the study of several metadata formats and standards, such as dublin communication process by helping identify, discover, assess, and manage scholarly artifacts. because metadata are data, they can be represented through any the existing data representation models, such as the relational model or the xml infoset. though the represented information should be the same regardless of the formalism used, each model offers different capabilities of data manipulation and querying. recently, a not-so-recent formalism has proliferated as a metadata representation model: rdf from the world wide web consortium (w3c).7 we have chosen rdf for modeling the citation lifecycle because of its advantages with respect to other formalisms. rdf is modular; a subset of rdf triples from an rdf graph can be used separately, keeping a consistent rdf model. it therefore can be used with partial information, an essential feature in a distributed environment. the union of knowledge is mapped into the union of the corresponding rdf graphs (information can be gathered incrementally from multiple sources). 
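to make the modeling style concrete, the sketch below uses python's rdflib to declare a couple of owl classes and a property in the spirit of the ontology described here, and then asserts a paper-submission event as instance data (roughly what figure 1 depicts). the namespace and all class and property names are placeholders assumed for this example; the actual ontology's vocabulary is only partially reproduced in the text.

```python
# sketch: a tiny owl/rdf fragment in the spirit of the scholarly ontology,
# plus one "submitted" event instance; names in the sch namespace are assumed
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS, XSD

SCH = Namespace("http://example.org/scholarly#")  # placeholder namespace
g = Graph()
g.bind("sch", SCH)

# a few schema-level statements (owl classes and an object property)
g.add((SCH.Creation, RDF.type, OWL.Class))
g.add((SCH.Event, RDF.type, OWL.Class))
g.add((SCH.Submitted, RDF.type, OWL.Class))
g.add((SCH.Submitted, RDFS.subClassOf, SCH.Event))
g.add((SCH.concernsCreation, RDF.type, OWL.ObjectProperty))

# instance data: the action of submitting a paper for publication
submission = URIRef("creation://lita.ital/vol27_num1_paper124#submission")
paper = URIRef("creation://lita.ital/vol27_num1_paper124")
author = URIRef("author://es_upc.dac/ruben.tous")

g.add((submission, RDF.type, SCH.Submitted))
g.add((submission, SCH.concernsCreation, paper))
g.add((submission, SCH.correspondingAuthor, author))   # property name assumed
g.add((submission, SCH.date, Literal("2008-05-25", datatype=XSD.date)))

print(g.serialize(format="xml"))  # rdf/xml serialization, as in the article's figure 2
```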
rdf is the main building block of the semantic web initiative, together with a set of technologies for defining rdf vocabularies like rdf schema (rdfs) and the owl.8 rdf comprises several related elements, including a formal model and an xml serialization syntax. the basic building block of the rdf model is the triple subjectpredicate-object. in a graph-theory sense, an rdf instance is a labeled directed graph consisting of vertices, which represent subjects or objects, and labeled edges, which represent predicates (semantic relations between subjects and objects). coming back to the scholarly domain, our proposal is to model static knowledge (e.g., authors and papers metadata) and dynamic knowledge (e.g., “the action of accepting a paper for publication,” or “the action of submitting a paper for publication”) using rdf predicates. the example in figure 1 shows how the action of submitting a paper for publication could be modeled with an rdf graph. figure 2 shows how the example in figure 1 would be serialized using the rdf xml syntax (the abbreviated mode). so, in our approach, we model assertions as rdf graphs and subgraphs. to allow anybody (authors, publishers, citation-analysis systems, or others) to verify a chain of assertions, each involved rdf graph must be digitally signed by the proper principal. there are two approaches to signing rdf graphs (as also happens with xml instances). the first approach applies when the rdf graph is obtained from a digitally signed file. in this situation, one can simply verify the signature on the file. however, in certain situations the rdf graphs or subgraphs come from a more complex processing chain, and one could not have access to the original signed file. a second approach deals with this situation, and faces the problem of digitally signing the graphs themselves, that is, signing the information contained in them.9 for 28 information technology and libraries | march 2011 note that instances of submitted and accepted event classes will point to the same creation instance because no modification of the creation is performed between these events. on the other hand, instances of tobepublished and published event classes will point to different creation instances (pointed by the cameraready and publishedcreation properties) because of the final editorial-side modifications to which a work can be subject. ■■ advantages of the proposed trust scheme the following is a short list of security features provided by our proposed scheme and attacks against which our proposed scheme is resilient: core (dc), dc’s scholarly works application profile, vcard, and bibtex.11 figure 4 shows the class publication and its subclasses, which represent the different kinds of publication. in the figure, we only show classes for journals, proceedings, and books. but it could obviously be extended to contain any kind of publication. figure 5 contains the classes for the agents of the ontology (i.e., the human beings that author papers and book chapters and the organizations to which human beings are affiliated or that edit publications). the figure also includes the creation class (e.g., a paper or a book chapter). finally, figure 6 has the part of the ontology that describes the different events that occur in the process of publishing a paper (i.e., paper submission, paper acceptance, notification of future publication, and publication). figure 1. 
example rdf graph

cryptography. the necessary changes do not apply only to the citation-management software, but also to all the involved parties in the publishing lifecycle (e.g., conference and journal management systems). authors and publishers would be the originators of the digitally signed evidences, thus user-friendly tools for generating and signing the rdf metadata would be required. plenty of rdf editors and digital signature toolkits exist, but we predict that conference and journal management systems such as edas could easily be extended to provide integrated functionalities for generating and processing digitally signed metadata graphs. this could be transparent to the users because the rdf documents would be automatically generated (and also signed in the case of the publishers) during the creating–editing–publishing process. because our approach is based on a pki trust scheme, we rely on a special setup assumption: the existence of cas, which certify that the identity information and the public key contained within the public key certificates of authors and publishers belong together. to get a publication recognized by a reliable citation-analysis system, an author or a publisher would need a public-key certificate issued by a ca trusted by this citation-analysis system. the selection of trusted
■■ an author can certify to any evaluation entity that will evaluate his or her curriculum the publications that he or she has done.
■■ an evaluator entity can query the citation-analysis system and get all the publications that a certain author has done.
■■ an author cannot forge notifications of publication.
■■ a publisher cannot repudiate the fact that it has published an article once it has sent the certificate.
■■ two or more authors cannot team up and make the system think that they are the same person to have more publications in their accounts (not even if they happen to have the same name).
■■ implications
the adoption of the approach proposed in this paper has certain implications in terms of technological changes but also in terms of behavioral changes at some of the stages of the scholarly publishing workflow. regarding the technological impact, the approach relies on the use of semantic web technologies and public-key

figure 2. example rdf/xml representation of graph in figure 1
figure 3. owl ontology for capturing the scholarly communication process
figure 4. part of the ontology describing publications

the citation-analysis system obtains the information or whether the information is duplicated. the proposed approach guarantees that the citation-analysis subsystem can always verify the provenance and trust of the metadata, and the use of unique identifiers ensures the detection of duplicates. our approach also implies minor behavioral changes for authors, mainly related to the management of public-key certificates, which is often required for many other tasks nowadays. a collateral benefit of the approach would be the automation of the copyright transfer procedure, which in most cases still relies on handwritten signatures.
authors would only be required to have their public-key certificate at hand (probably installed in the web browser), and the conference and journal management software would do all the work. cas by citation-analysis systems would require the deployment of the necessary mechanisms to allow an author or a publisher to ask for the inclusion of his or her institution in the list. however, this process would be eased if some institutional cas belonged to trust hierarchies (e.g., national or regional), so including some higher-level cas makes the inclusion of cas of some small institutions easier. another technological implication is related to the interchange and storage of the metadata. users and publishers should save the signed metadata coming from a publishing process digitally, and citation-analysis systems should harvest the digitally signed metadata. the metadata-harvesting process could be done in several different ways; but here raises an important benefit of the presented approach: the fact that it does not matter where figure 5. part of the ontology describing agents and creations 32 information technology and libraries | march 2011 domain, but which we have taken in consideration. in our approach, static and dynamic metadata cross many trust boundaries, so it is necessary to apply trust management techniques designed to protect open and decentralized systems. we have chosen a public-key infrastructure (pki) design to cover such a requirement. however, other approaches exist, such as the one by khare and rifkin, which combines rdf with digital signatures in a manner related to what is known as the “web of trust.”13 one aspect of any approach dealing with rdf and cryptography is how to digitally sign rdf graphs. as described above, in the section “modeling the scholarly communication process with semantic web knowledge representation techniques,” there are two different approaches for such a task, signing the file from which the graph will be obtained (which is the one we have chosen) or digitally signing the graphs themselves (the information represented in them), as described by carroll.14 ■■ conclusions the work presented in this paper describes a reference architecture that aims to provide reliability to the citation and citation-tracking lifecycle. the paper defends that current practices in the analysis of impact of scholarly artifacts entail serious design and security flaws, including nonidentical instances confusion, author-naming conflicts, fake citing, repudiation, impersonation, etc. ■■ related work as far as we know, this is the first paper to combine semantic web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing. regarding the use of ontologies and semantic web technologies for modeling the scholarly domain, we highlight the research by rodriguez, bollen, and van de sompel.12 they define a semantic model for the scholarly communication process, which is used within an associated large-scale semantic store containing bibliographic, citation, and use data. this work is related to the mesur (metrics from scholarly usage of resources) project (http://www.mesur.org) from los alamos national laboratory. the project’s main goal is providing novel mechanisms for assessing the impact of scholarly communication items, and hence of scholars, with metrics derived from use data. 
■■ related work
as far as we know, this is the first paper to combine semantic web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing. regarding the use of ontologies and semantic web technologies for modeling the scholarly domain, we highlight the research by rodriguez, bollen, and van de sompel.12 they define a semantic model for the scholarly communication process, which is used within an associated large-scale semantic store containing bibliographic, citation, and use data. this work is related to the mesur (metrics from scholarly usage of resources) project (http://www.mesur.org) from los alamos national laboratory. the project’s main goal is providing novel mechanisms for assessing the impact of scholarly communication items, and hence of scholars, with metrics derived from use data. as in our case, the approach by rodriguez, bollen, and van de sompel models static and dynamic aspects of the scholarly communication process using rdf and owl. however, contrary to that approach, our work focuses on modeling the dynamic aspects of the creation–editing–publishing workflow, whereas rodriguez, bollen, and van de sompel focus on modeling the use of already-published bibliographic resources.
regarding the combination of semantic web technologies with security aspects and cryptography, there exist several works that do not specifically focus on the scholarly domain but that we have taken into consideration. in our approach, static and dynamic metadata cross many trust boundaries, so it is necessary to apply trust-management techniques designed to protect open and decentralized systems. we have chosen a public-key infrastructure (pki) design to cover such a requirement. however, other approaches exist, such as the one by khare and rifkin, which combines rdf with digital signatures in a manner related to what is known as the “web of trust.”13 one aspect of any approach dealing with rdf and cryptography is how to digitally sign rdf graphs. as described above, in the section “modeling the scholarly communication process with semantic web knowledge representation techniques,” there are two different approaches to such a task: signing the file from which the graph will be obtained (the one we have chosen) or digitally signing the graphs themselves (the information represented in them), as described by carroll.14
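the difference between the two signing strategies can be made concrete with a small example. the sketch below is an assumption-laden illustration rather than carroll's algorithm: it shows that the same triples serialized as turtle and as rdf/xml produce different byte streams (so a file-level signature binds one particular document), while a canonical rendering of a blank-node-free graph (here simply sorted n-triples) yields the same digest for both; handling blank nodes is the harder problem that carroll's canonicalization addresses.

# sketch: file-level signing binds one serialization, while hashing a
# canonical form (sorted n-triples, no blank nodes) binds the graph itself.
import hashlib
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

def as_text(s):
    return s.decode("utf-8") if isinstance(s, bytes) else s

def graph_digest(g: Graph) -> str:
    # canonical form for a blank-node-free graph: sorted n-triples lines
    lines = sorted(l for l in as_text(g.serialize(format="nt")).splitlines() if l.strip())
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

g = Graph()
paper = URIRef("http://example.org/paper/1")          # hypothetical uri
g.add((paper, DC.title, Literal("a paper")))
g.add((paper, DC.date, Literal("2008-05-25")))

turtle_doc = as_text(g.serialize(format="turtle"))
xml_doc = as_text(g.serialize(format="xml"))

# file-level: the two documents carry the same triples but different bytes
print(hashlib.sha256(turtle_doc.encode()).hexdigest()
      == hashlib.sha256(xml_doc.encode()).hexdigest())        # False

# graph-level: a canonical rendering is the same whatever the serialization
g_ttl = Graph().parse(data=turtle_doc, format="turtle")
g_xml = Graph().parse(data=xml_doc, format="xml")
print(graph_digest(g_ttl) == graph_digest(g_xml))             # True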
■■ conclusions
the work presented in this paper describes a reference architecture that aims to provide reliability to the citation and citation-tracking lifecycle. the paper argues that current practices in the analysis of the impact of scholarly artifacts entail serious design and security flaws, including confusion of nonidentical instances, author-naming conflicts, fake citing, repudiation, and impersonation. the architecture presented in this work is based on the use of digitally signed rdf graphs in the different stages of the scholarly publishing workflow, in such a manner that authors, publishers, repositories, and citation-analysis systems can have access to independent, reliable evidence. the architecture aims to allow the creation of a reliable information space that reflects not just static knowledge but also dynamic relationships, capturing the full complexity of trust relationships between the different parties in the scholarly domain. to allow modeling the scholarly communication process with rdf graphs, we have designed an owl dl ontology. rdf graphs carrying instances of classes and properties from the ontology will be digitally signed and interchanged between parties at the different stages of the creation–editing–publishing process. citation-management systems will have access to these signed metadata graphs and will be able to verify their provenance and trust before incorporating them into their repositories. because citation analysis has become a critical component in scholarly impact-factor calculation, and considering the relevance of this metric within the scholarly publishing value chain, we contend that the value of a reliable solution justifies the effort of introducing technological changes within the publishing lifecycle. we believe that these changes, which could be easily automated and incorporated into modern conference and journal editorial systems, are justified considering the serious flaws of the established solutions and the relevance that citation-analysis systems are acquiring in our society.
■■ acknowledgment
this work has been partly supported by the spanish administration (tec2008-06692-c02-01 and tsi2007-66869-c02-01).
references and notes
1. herbert van de sompel et al., “an interoperable fabric for scholarly value chains,” d-lib magazine 12, no. 10 (2006), http://www.dlib.org/dlib/october06/vandesompel/10vandesompel.html (accessed jan. 19, 2011).
2. boletín oficial del estado (b.o.e.) 054, 04/03/2005, sec. 3, pag. 7875 a 7887, http://www.boe.es/boe/dias/2005/03/04/pdfs/a07875–07887.pdf (accessed june 24, 2010). see also thomson isi web of knowledge, http://www.isiwebofknowledge.com/ (accessed june 24, 2010); and eugene garfield, citation indexing: its theory and application in science, technology and humanities (new york: wiley, 1979).
3. judit bar-ilan, “an ego-centric citation analysis of the works of michael o. rabin based on multiple citation indexes,” information processing & management 42, no. 6 (2006): 1553–66.
4. alfred arsenault and sean turner, “internet x.509 public key infrastructure: pkix roadmap,” draft, pkix working group, sept. 8, 1998, http://tools.ietf.org/html/draft-ietf-pkix-roadmap-00 (accessed june 24, 2010).
5. internet assigned numbers authority (iana), root zone database, http://www.iana.org/domains/root/db/ (accessed june 24, 2010).
6. for information on the doi system, see bill rosenblatt, “the digital object identifier: solving the dilemma of copyright protection online,” journal of electronic publishing 3, no. 2 (1997).
7. resource description framework (rdf), world wide web consortium, feb. 10, 2004, http://www.w3.org/rdf/ (accessed june 24, 2010).
8. “rdf vocabulary description language 1.0: rdf schema, w3c working draft 23 january 2003,” http://www.w3.org/tr/2003/wd-rdf-schema-20030123/ (accessed june 24, 2010); “owl web ontology language overview, w3c recommendation 10 february 2004,” http://www.w3.org/tr/owl-features/ (accessed june 24, 2010).
9. jeremy j. carroll, “signing rdf graphs,” in the semantic web—iswc 2003, vol. 2870, lecture notes in computer science, ed. dieter fensel, katia sycara, and john mylopoulos (new york: springer, 2003).
10. ian horrocks, peter f. patel-schneider, and frank van harmelen, “from shiq and rdf to owl: the making of a web ontology language,” web semantics: science, services and agents on the world wide web 1 (2003): 10–11.
11. see the dublin core metadata initiative (dcmi), http://dublincore.org/ (accessed june 24, 2010); julie allinson, pete johnston, and andy powell, “a dublin core application profile for scholarly works,” ariadne 50 (2007), http://www.ukoln.ac.uk/repositories/digirep/index/eprints_type_vocabulary_encoding_scheme, http://www.ariadne.ac.uk/issue50/allinson-et-al/ (accessed dec. 27, 2010); world wide web consortium, “representing vcard objects in rdf/xml: w3c note 22 february 2001,” http://www.w3.org/tr/2001/note-vcard-rdf-20010222/ (accessed dec. 3, 2010); and for bibtex, see “entry types,” http://nwalsh.com/tex/texhelp/bibtx-7.html (accessed june 24, 2010).
12. marko a. rodriguez, johan bollen, and herbert van de sompel, “a practical ontology for the large-scale modeling of scholarly artifacts and their usage,” proceedings of the 7th acm/ieee joint conference on digital libraries (2007): 278–87.
13. rohit khare and adam rifkin, “weaving a web of trust,” world wide web journal 2, no. 3 (1997): 77–112.
14. carroll, “signing rdf graphs.”
bibliographic displays in web catalogs: does conformity to design guidelines correlate with user performance?
joan m. cherry, paul muter, and steve j. szigeti
joan m. cherry (joan.cherry@utoronto.ca) is a professor in the faculty of information studies; paul muter (muter@psych.utoronto.ca) is an assistant professor in the department of psychology; and steve j. szigeti (szigeti@fis.utoronto.ca) is a doctoral student in the faculty of information studies and the knowledge media design institute, all at the university of toronto, canada.
the present study investigated whether there is a correlation between user performance and compliance with screen-design guidelines found in the literature. rather than test individual guidelines and their interactions, the authors took a more holistic approach and tested a compilation of guidelines. nine bibliographic display formats were scored using a checklist of eighty-six guidelines. twenty-seven participants completed ninety search tasks using the displays in a simulated web environment. none of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines. in some cases, user performance was actually significantly slower with greater conformity to guidelines. in a supplementary study, a different set of forty-three guidelines and the user performance data from the main study were used. again, none of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines.
attempts to establish generalizations are ubiquitous in science and in many areas of human endeavor. it is well known that this enterprise can be extremely problematic in both applied and pure science.1 in the area of human-computer interaction, establishing and evaluating generalizations in the form of interface-design guidelines are pervasive and difficult challenges, particularly because of the intractably large number of potential interactions among guidelines. using bibliographic display formats from web catalogs, the present study utilizes global evaluation by correlating user performance in a search task with conformity to a compilation of eighty-six guidelines (divided into four subsets). the literature offers many design guidelines for the user interface, some of which cover all aspects of the user interface and some of which focus on one aspect of the user interface—e.g., screen design. tullis, in chapters in two editions of the handbook of human-computer interaction, reviews the work in this area.2 the earlier chapter provides a table describing the screen-design guidelines available at that time. he includes, for example, galitz, who he notes has several hundred guidelines addressing general screen design, and smith and mosier, who he notes have about three hundred guidelines addressing the display of data.3 earlier guidelines tended to be generic. more recently, guidelines have been developed for specific applications—e.g., web sites for airline travel agencies, multimedia applications, e-commerce, children, bibliographic displays, and public-information kiosks.4 although some of the guidelines in the literature are based on empirical evidence, many are based on expert opinion and have not been tested. some of the research-based guidelines have been tested in isolation or in combination with only a few other guidelines. the national cancer institute (nci) web site, research-based web design and usability guidelines, rates sixty guidelines on a scale of 0 to 5 based on the strength of the evidence.5 the more valid the studies that directly support the guideline, the higher the rating. in interpreting the scores, the site advises that scores of 1, 2, or 3 suggest that “more evidence is needed to strengthen the designer’s overall confidence in the validity of a guideline.” of the sixty guidelines on the site, forty-six (76.7 percent) fall into this group. in 2003, the united states department of health and human services web site, research-based web design and usability guidelines, rated 187 guidelines on a different five-point scale.6 eighty-two guidelines (43.9 percent) meet the criteria of having strong or medium research support. another forty-eight guidelines (25.7 percent) are rated as having weak research support. thus, there is some research support for 69.6 percent of the guidelines. in addition to the issue of the validity of individual guidelines, there may be interactions among guidelines. an interaction occurs if the effect of a variable depends on the level of another variable—e.g., an interaction occurs if the usefulness of a guideline depends on whether some other guideline is being followed. a more severe problem is the potential for high-order interactions: the nature of a two-way interaction may depend on the level of a third variable, the nature of a three-way interaction may depend on the level of a fourth variable, and so on. because of the combinatorial explosion, if there are more than a few variables the number of possible interactions becomes huge.
as cronbach stated: “once we attend to interactions, we enter a hall of mirrors that extends to infinity.”7 with a large set of guidelines, it is impractical to test all of the guidelines and all of the interactions, including high-order interactions. muter suggested several approaches for handling the problem of intractable high-order interactions, including adapting optimizing algorithms such as simplex, seeking “robustness in variation,” re-construing the problem, and pruning the alternative space.8 the present study utilizes another approach: global evaluation by correlating user performance with conformity to a set of guidelines. using this method, particular guidelines and interactions are not tested, but the set and subsets are tested globally, and some of the interactions, including high-order interactions, are captured. bibliographic displays were scored using a compilation of guidelines, divided into four subsets, and the performance of users doing a set of search tasks using the displays was measured. an attempt was made to determine whether users find information more quickly on displays that receive high scores on checklists of screen-design guidelines. the authors are aware of only two studies that have investigated conformity with a set of guidelines and user performance, and both included only ten guidelines. d’angelo and twining measured the correlation between compliance with a set of ten standards (d’angelo standards) and user comprehension.9 the d’angelo standards are in the form of principles for web-page design, based on a review of the literature.10 d’angelo and twining found a small correlation (.266) between the number of standards met and user comprehension.11 they do not report on statistical significance, but from the data provided in the paper it appears that the correlation is not significant. gerhardt-powals compared an interface designed according to ten cognitive engineering principles to two control interfaces and found that the cognitively engineered interface resulted in statistically significantly superior user performance.12 the guidelines used in the present study were based on a list compiled by chan to evaluate displays of bibliographic records in online library catalogs.13 the set of guidelines was broken down into four subsets. participants in this study were given search tasks and clicked on the requested item on a bibliographic display. the main dependent variable of interest was response time.
■■ method
participants
twenty-seven participants were recruited through the university of toronto psychology 100 subject pool. seventeen were female; ten were male. most (twenty) were in the age group 17 to 24; three were in the age group 25 to 34, and four were in the age group 35 to 44. one had never used the web; all others reported using the web one or more hours per week. participants received course credit.
design
to control for the effects of fatigue, practice runs, and the like, the order of trials was determined by two orthogonal 9 x 9 latin squares—one to select a display and one to select a book record. each participant completed five consecutive search tasks—author, title, call number, publisher, and date—in a random order, with each display-book combination. (the order of the five search tasks was randomized each time.) this procedure was repeated, so that in total each participant did ninety tasks (9 displays x 5 tasks x 2 repetitions).
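the counterbalancing scheme can be illustrated with a short sketch. the paper does not say which orthogonal squares were used, so the cyclic construction below (rows i + j mod 9 and i + 2j mod 9, which are latin and mutually orthogonal for order 9) is an assumption made purely for illustration.

# sketch: two orthogonal 9 x 9 latin squares assign a display and a book
# record to each of the nine blocks for a participant; the five search
# tasks are then shuffled within each block.
import random

N = 9
square_display = [[(i + j) % N for j in range(N)] for i in range(N)]        # latin square
square_record = [[(i + 2 * j) % N for j in range(N)] for i in range(N)]     # latin, orthogonal to the first

TASKS = ["author", "title", "call number", "publisher", "date"]

def trial_order(participant: int) -> list:
    """return the 90 trials (9 displays x 5 tasks x 2 repetitions) for one participant."""
    row = participant % N
    trials = []
    for repetition in range(2):
        for block in range(N):
            display = square_display[row][block]
            record = square_record[row][block]
            tasks = TASKS[:]
            random.shuffle(tasks)          # task order randomized each time
            trials.extend((display, record, task) for task in tasks)
    return trials

print(len(trial_order(0)))   # 90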
materials and apparatus
the study used nine displays from library catalogs available on the web. they were selected to represent a variety of systems and to illustrate the remarkable diversity in bibliographic displays in web catalogs. the displays differed in the amount of information included, the structure of the display, employment of highlighting techniques, and use of graphical elements. four examples of the nine displays are presented in figures 1a, 1b, 1c, and 1d. the displays were captured and presented in an interactive environment using active server page (asp) software. the look of the displays was retained, but hypertext links were deactivated. nine different book records were used to provide the content for the displays. items selected were those that would be readily understood by most users—e.g., books by saul bellow, norman mailer, and john updike. the guidelines were based on a list compiled by chan from a review of the literature in human-computer interaction and library science.14 the list does not include guidelines about the process of design. chan formatted the guidelines as a checklist for bibliographic displays in online catalogs. in work reported in 1996, cherry and cox modified the checklist for use with bibliographic displays in web catalogs.15 in a 1998 paper, cherry reported on evaluations of bibliographic displays in catalogs of academic libraries, based on chan’s data for twelve opacs and data for ten web catalogs evaluated by cherry and cox using a modification of the 1996 checklist for web catalogs.16 the findings showed that, on average, displays in opacs scored 58 percent and displays in web catalogs scored 60 percent. the 1996 checklist of guidelines was modified by herrero-solana and de moya-anegón, who used it to explore the use of multivariate analysis in evaluating twenty-five latin american catalogs.17 for the present study, four questions that were considered less useful were removed from the checklist used in cherry’s 1998 analysis. the checklist consisted of four sections or subsets: labels (these identify parts of the bibliographic description); text (the display of the bibliographic, holdings/location, and circulation status information); instructions (includes instructions to users, informational messages, and options available); and layout (includes identification of the screen, the organization of the bibliographic information, spacing, and consistency of information presentation). items on the checklist were phrased as questions requiring yes/no responses. examples of the items are: labels: “are all fields/variables labeled?” text: “is the text in mixed case (upper and lowercase)?” instructions: “are instructional sentences or phrases simple, concise, clear, and free of typographical errors?” and layout: “is the width of the display no more than forty to sixty characters?” the set used in the present study contained eighty-six guidelines in total, of which forty-eight were generic and could be applied to any application; the other thirty-eight were specific to bibliographic displays in web catalogs. the experiment was run on a pentium computer with a seventeen-inch sony color monitor with a standard keyboard and mouse.
figures 1a, 1b, 1c, and 1d. examples of display
procedure
participants were tested individually. five practice trials with a display and book record not used in the experiment familiarized the participant with the tasks and software. at the beginning of a trial, the message “when ready, click” appeared on the screen. when the participant clicked on the mouse, a bibliographic display appeared along with a message at the top of the screen indicating whether the participant should click on the author, title, call number, publisher, or date of publication—e.g., “current task: author.” participants clicked on what they thought was the correct answer. if they clicked on any other area, the display was shown again. an incorrect click was not defined as an error—in effect, percent correct was always 100—but an incorrect click would of course add to the response time. the software recorded the time to successfully complete each search, the identification for the display and the book record, and the search-task type. when a participant completed the five search tasks for a display, a message was shown indicating the average response time on that set of tasks. when participants completed the ninety search tasks, they were asked to rank the nine displays according to their preference. for this task, a set of laminated color printouts of the displays was provided. participants ranked the displays, assigning a rank of 1 to the display that they preferred most and 9 to the one they preferred least. they were also asked to complete a short background questionnaire. the entire session took less than forty-five minutes.
scoring the displays on screen-design guidelines
the authors’ experience has indicated that judging whether a guideline is met can be problematic: evaluators sometimes differ in their judgments. in this study, three evaluators assessed each of the nine displays independently. if there was any disagreement among the evaluators’ responses for a given question for a given display, that question was not used in the computation of the percentage score for that display. (a guideline regarding screen density was evaluated by only one evaluator because it was very time-consuming.) the total number of questions used to assess each display was eighty-six. the number of questions on which the evaluators disagreed ranged from twelve to thirty across the nine displays. all questions on which the three evaluators agreed for a given display were used in the calculation of the percentage score for that display. hence the percentage scores for the displays are based on a variable set and number of questions—from fifty-six to seventy-four.
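the scoring rule described here (count only the questions on which all three evaluators gave the same answer) is easy to state in code. the data structure below is hypothetical; it simply assumes each evaluator recorded a yes/no judgment per checklist question for a given display.

# sketch: compute a display's percentage score using only the checklist
# questions on which all three evaluators gave the same yes/no answer.
def display_score(judgments):
    """judgments: one dict per evaluator mapping question id -> True (met) / False (not met)."""
    questions = judgments[0].keys()
    unanimous = [q for q in questions
                 if len({evaluator[q] for evaluator in judgments}) == 1]
    met = sum(1 for q in unanimous if judgments[0][q])
    return 100.0 * met / len(unanimous)

# toy example with three evaluators and four questions
e1 = {"q1": True, "q2": True, "q3": False, "q4": True}
e2 = {"q1": True, "q2": False, "q3": False, "q4": True}
e3 = {"q1": True, "q2": True, "q3": False, "q4": True}
print(round(display_score([e1, e2, e3]), 1))   # q2 is dropped; 2 of 3 unanimous questions met = 66.7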
the subset of questions on which the three evaluators agreed for all nine displays was small—twenty-two questions.
■■ results
with regard to conformity to the guidelines, in addition to the overall scores for each display, which ranged from 42 percent to 65 percent, the percentage score was calculated for each subset of the checklist (labels, text, instructions, and layout). the time to successfully complete each search task was recorded to the nearest millisecond. (for some unknown reason, six of the 2,430 response times recorded [27 x 90] were 0 milliseconds. the program was written in such a way that the response-time buffer was cleared at the time of stimulus presentation, in case the participant clicked just before this time. these trials were treated as missing values in the calculation of the means.) six mean response times were calculated: author, title, call number, publisher, date, and the sum of the five response times, called all tasks. the mean of all tasks response times ranged from 13,671 milliseconds to 21,599 milliseconds for the nine formats. the nine display formats differed significantly on this variable according to an analysis of variance, f(8, 477) = 17.1, p < .001. the correlations between response times and guidelines-conformance scores are presented in table 1. it is important to note that a high correlation between response time and conformity to guidelines indicates a low correlation between user performance (speed) and conformity to guidelines.
table 1. correlations between scores on the checklist of screen design guidelines and time to complete search tasks: pearson correlation (sig. 2-tailed); n = 9 for all cells
                all tasks      author         title          call #         publisher      year
total score:    .469 (.203)    .401 (.285)    .870 (.002)    .547 (.127)    .035 (.930)    .247 (.522)
labels:         .722 (.028)    .757 (.018)    .312 (.413)    .601 (.087)    .400 (.286)    .669 (.049)
text:           -.260 (.500)   -.002 (.997)   .595 (.091)    -.191 (.623)   -.412 (.271)   -.288 (.452)
instructions:   .422 (.258)    .442 (.234)    .712 (.032)    .566 (.112)    .026 (.947)    .126 (.748)
layout:         .602 (.086)    -.102 (.794)   .383 (.308)    .624 (.073)    .492 (.179)    .367 (.332)
row 1 of table 1 contains correlations between the total guidelines score and response times; column 1 contains correlations between all tasks (the sum of the five response times) and guidelines scores. of course, the correlations in table 1 are not all independent of each other. only five of the thirty correlations in table 1 are significant at the .05 level, and they all indicate slower response times with higher conformity to guidelines. of the six correlations in table 1 indicating faster response times with higher conformity to guidelines, none approaches statistical significance. the upper left-hand cell of table 1 indicates that the overall correlation between total scores on the guidelines and the mean response time across all search tasks (all tasks) was 0.469 (df = 7, p = 0.203)—i.e., conformity to the overall checklist was correlated with slower overall response times, though this correlation did not approach statistical significance. figure 2 shows a scatter plot of the main independent variable, overall score on the checklist of guidelines, and the main dependent variable, the sum of the response times for the five tasks (all tasks). figure 3 shows a scatter plot for the highest obtained correlation: between score on the overall checklist of guidelines and the time to complete the title search task. visual inspection suggests patterns consistent with table 1: no correlation in figure 2, and slower search times with higher guidelines scores in figure 3.
figure 2. scatter plot for overall score on checklist of screen design guidelines and time to complete set of five search tasks
figure 3. scatter plot for overall score on checklist of screen design guidelines and time to complete “title” search tasks
finally, correlations were computed between preference and response times (all tasks response times and five specific-task response times) and between preference and conformity to guidelines (overall guidelines and four subsets of guidelines). none of the eleven correlations approached statistical significance.
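the statistics reported in this section are standard and straightforward to reproduce. the sketch below uses placeholder numbers (the per-display scores and mean response times are not reprinted here in full), but it shows the two computations involved: a pearson correlation across the nine display formats, and the critical value of r at the .05 level with df = 7, which is roughly .666 and explains why the title-task correlation of .870 is significant while the overall correlation of .469 is not.

# sketch: pearson correlation between guideline-conformance scores and mean
# response times for nine displays, plus the critical r for df = 7.
import math
from scipy import stats

scores = [42, 47, 50, 53, 55, 58, 60, 63, 65]                       # placeholder percentages
mean_rt = [15.1, 14.2, 16.8, 15.9, 17.4, 16.1, 18.3, 19.0, 20.5]    # placeholder seconds

r, p = stats.pearsonr(scores, mean_rt)
print(f"r = {r:.3f}, p = {p:.3f}")

# two-tailed critical r at alpha = .05 with n = 9 (df = n - 2 = 7)
df = 7
t_crit = stats.t.ppf(0.975, df)
r_crit = t_crit / math.sqrt(t_crit**2 + df)
print(f"critical r = {r_crit:.3f}")   # about .666, so .870 is significant and .469 is not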
■■ supplementary study
to further validate the results of the main study, it was decided to score the interfaces against a different set of guidelines based on the 2003 u.s. department of health and human services research-based web design and usability guidelines. this set consists of 187 guidelines and includes a rating for each guideline based on the strength of research evidence for that guideline. the present study started with the eighty-two guidelines rated as having either moderate or strong research support, as the definitions of both of these include “cumulative research-based evidence.”18 compliance with guidelines that address the process of design can only be judged during the design process, or via access to the interface designers. since this review process did not allow for that, a total of nine process-focused guidelines were discarded. this set of seventy-three guidelines was then compared with the sixty-guideline 2001 nci set, research-based web design and usability guidelines, with the intention of adding any outstanding nci guidelines supported by strong research evidence to the existing list of seventy-three. however, all of the strongly supported nci guidelines were already represented in the original seventy-three. finally, the guidelines in iso 9241, ergonomic requirements for office work with visual display terminals (vdts), part 11 (guidance on usability), part 12 (presentation of information), and part 14 (menu dialogues), were compared to the existing set of seventy-three, with the intention that any prescriptive guideline in the iso set not already included in the original seventy-three would be added.19 again, there were none. the seventy-three guidelines were organized into three thematic groups: (1) layout (the organization of textual and graphic material on the screen), (2) interaction (which included navigation or any element with which the user would interact), and (3) text and readability. all of the guidelines used were written in a manner allowing readers room for interpretation. the authors explicitly stated that they were not writing rules but, rather, guidelines, and recognized that their application must allow for a level of flexibility.20 this ambiguity creates problems in terms of assessing displays. in this study, two evaluators independently assessed the nine displays. the first evaluator applied all seventy-three guidelines and found thirty to be nonapplicable to the specific types of interfaces considered. the second evaluator applied the shortened list of forty-three guidelines. following the independent evaluations, the two evaluators compared assessments. the initial rate of agreement between the two assessments ranged from 49 percent to 70 percent across the nine displays. in cases where there was disagreement, the evaluators discussed their rationale for the assessment in order to achieve consensus.
■■ results of supplementary study
as with the initial study, in addition to the overall scores for each display, the percentage score was calculated for each subset of the checklist (layout, interaction, and text and readability). it is worth noting that the overall scores showed higher compliance with this second set of guidelines, ranging from 68 percent to 89 percent. the correlations between response times and guidelines-conformance scores are presented in table 2. again, it is important to note that a high correlation between response time and conformity to guidelines indicates a low correlation between user performance (speed) and conformity to guidelines.
table 2. correlations between scores on subset of the u.s. dept. of health and human services (2003) research-based web design and usability guidelines and time to complete search tasks: pearson correlation (sig. 2-tailed); n = 9 for all cells
                all tasks      author         title          call #         publisher      year
total score:    .292 (.445)    .201 (.604)    .080 (.839)    -.004 (.992)   .345 (.363)    .499 (.172)
layout:         -.308 (.420)   -.264 (.492)   -.512 (.159)   -.332 (.383)   .046 (.906)    -.294 (.442)
text:           .087 (.824)    -.051 (.895)   .712 (.032)    -.059 (.879)   -.095 (.808)   -.259 (.500)
interaction:    .638 (.065)    .603 (.085)    .055 (.887)    .439 (.238)    .547 (.128)    .625 (.072)
row 1 of table 2 contains correlations between the total guidelines score and response times; column 1 contains correlations between all tasks (the sum of the five response times) and guidelines scores. of course, the correlations in table 2 are not all independent of each other. only one of the twenty-four correlations in table 2 is significant at the .05 level, and it indicates a slower response time with higher conformity to guidelines. of the ten correlations in table 2 indicating faster response times with higher conformity to guidelines, none approaches statistical significance. the upper left-hand cell of table 2 indicates that the overall correlation between total scores on the guidelines and the mean response time across all search tasks (all tasks) was 0.292 (p = 0.445)—i.e., conformity to the overall checklist was correlated with slower overall response times, though this correlation did not approach statistical significance. figure 4 shows a scatter plot of the main independent variable, overall score on the checklist of guidelines, and the main dependent variable, the sum of the response times for the five tasks (all tasks). figure 5 shows a scatter plot for the highest-obtained correlation: between score on the text and readability category of guidelines and the time to complete the title search task. visual inspection suggests patterns consistent with table 2: no correlation in figure 4, and slower search times with higher guidelines scores in figure 5.
figure 4. scatter plot for subset of u.s. department of health and human services (2003) research-based web design and usability guidelines conformance score and total time to complete five search tasks
figure 5. scatter plot for text and readability category of u.s. department of health and human services (2003) research-based web design and usability guidelines and time to complete “title” search tasks
■■ discussion
in the present experiment and the supplementary study, none of the correlations indicating faster user performance with greater conformity to guidelines approached statistical significance. in some cases, user performance was actually significantly slower with greater conformity to guidelines—i.e., in some cases, there was a negative correlation between user performance and conformity to guidelines. the authors are aware of no other study indicating a negative correlation between user performance and conformity to interface-design guidelines. some researchers would not be surprised at a finding of zero correlation between user performance and conformity to guidelines, but a negative correlation is somewhat puzzling. a negative correlation implies that there is something wrong somewhere—perhaps incorrect underlying theories or an incorrect body of assumptions. such a negative correlation is not without precedent in applied science. in the field of medicine, before the turn of the twentieth century, seeing a doctor actually decreased the chances of improving health.21 presumably, medical guidelines of the time were negatively correlated with successful practice, and the negative correlation implies not just worthlessness but medical theories or beliefs that were actually incorrect and harmful.
the boundary conditions of the present findings are unknown. the present findings may be specific to the tasks employed—fairly simple search tasks. the findings may apply only to situations in which the user is switching formats frequently, as opposed to situations in which each user is using only one format. (a between-subjects design would test this possibility.) the findings may be specific to the two sets of guidelines used. with sets of ten guidelines, d’angelo and twining and gerhardt-powals found positive correlations between user performance and conformity to guidelines (though apparently not statistically significantly in the former study).22 the guidelines used in the authors’ main study and supplementary study tended to be more detailed than in the other two studies. detailed guidelines are sometimes seen as advantageous, since developers who use guidelines need to be able to interpret the guidelines in order to implement them. however, perhaps following a large number of detailed guidelines reduces the amount of personal judgment used and results in less effective designs. (designers of the nine displays used in the present study would not have been using either of the sets of guidelines used in our studies but may have been using some of the sources from which our guidelines were extracted.) as noted by cheepen in discussing guidelines for voice dialogues, sometimes a designer’s experience may be more valuable than a particular guideline.23 the lack of agreement in interpreting the guidelines was an unexpected but interesting factor revealed during the collection of data in both the main study and the supplementary study. while a higher rate of agreement had been expected, the differences raised an important point in the use of guidelines. if guidelines intentionally leave room for interpretation, what role do expert opinion and experience play in design? in the main study, the proportion of guidelines on which the evaluators disagreed ranged from 14 percent to 35 percent across the nine displays. in the supplementary study, both evaluators had experience in interface design through a number of different roles in the design process (both academic and professional). this meant the evaluators’ interpretations of the guidelines were informed by previous experience. the initial level of disagreement ranged from 30 percent to 51 percent across the nine displays. while it was possible to quickly reach consensus
on a number of assessments (because both evaluators recognized the high degree of subjectivity that is involved in design), it also led to longer discussions regarding the intentions of the guideline authors. a majority of the differences involved lack of guideline clarity (where one evaluator had indicated a meet-or-fail score, while another felt the guideline was either unclear or not applicable). does this imply that guidelines can best be applied by committees or groups of designers? the dynamic of such groups would add another complex variable to understanding the relationship between guideline conformity and user performance. future research should test other tasks and other sets of guidelines to confirm or refute the findings of the present study. there should also be investigation of other potential predictors of display effectiveness. for example, would the ratings of usability experts or graphic designers for a set of bibliographic displays be positively correlated with user performance? crawford, in response to a paper presenting findings from an evaluation of bibliographic displays using a previous version of the checklist of guidelines used in the main study, commented that the design of bibliographic displays still reflects art, not science.24 several researchers have discussed aesthetics and user-interface design. reed et al. noted the need to extend our understanding of the role of aesthetic elements in the context of user-interface guidelines and standards.25 ngo, teo, and byrne discussed fourteen aesthetic measures for graphic displays.26 norman discussed these ideas in “emotion and design: attractive things work better.”27 tractinsky, katz, and ikar found strong correlations between perceived aesthetic appeal and perceived usability.28 most empirical studies of guidelines have looked at one variable only or, at most, a small number of variables. the opposite extreme would be to do a study that examines a large number of variables factorially. for example, assuming eighty-six yes/no guidelines for bibliographic displays, it would be theoretically possible to do a factorial experiment testing all possible combinations of yes/no—2 to the 86th power. in such an experiment, all two-way interactions and higher interactions could be assessed, but such an experiment is not feasible. what the authors have done is somewhere between these two extremes. this study has the disadvantage that we cannot say anything about any individual guideline, but it has the advantage that it captures some of the interactions, including high-order interactions. despite the present results, the authors are not recommending abandoning the search for guidelines in interface design. at a minimum, the use of guidelines may increase consistency across interfaces, which may be helpful.
however, in some research domains, particularly when huge numbers of potential interactions result in extreme complexity, it may be advisable to allocate resources to means other than attempting to establish guidelines, such as expert review, relying on tradition, letting natural selection take its course, utilizing the intuitions of designers, and observing user interaction. indeed, in pure and applied research in general, perhaps more resources should be allocated to means other than searching for explicit generalizations. future research may better indicate when to attempt to establish generalizations and when to use other methods.
■■ acknowledgements
this work was supported by a social sciences and humanities research council general research grant awarded by the faculty of information studies, university of toronto, and by the natural sciences and engineering research council of canada. the authors wish to thank mark dykeman and gerry oxford, who developed the software for the experiment; donna chan, joan bartlett, and margaret english, who scored the displays with the first set of guidelines; everton lewis, who conducted the experimental sessions; m. max evans, who helped score the displays with the supplementary set of guidelines; and robert l. duchnicky, jonathan l. freedman, bruce oddson, tarjin rahman, and paul w. smith for helpful comments.
references and notes
1. see, for example, a. chapanis, “some generalizations about generalization,” human factors 30, no. 3 (1988): 253–67.
2. t. s. tullis, “screen design,” in handbook of human-computer interaction, ed. m. helander (amsterdam: elsevier, 1988), 377–411; t. s. tullis, “screen design,” in handbook of human-computer interaction, 2d ed., eds. m. helander, t. k. landauer, and p. prabhu (amsterdam: elsevier, 1997), 503–31.
3. w. o. galitz, handbook of screen format design, 2d ed. (wellesley hills, mass.: qed information sciences, 1985); s. l. smith and j. n. mosier, guidelines for designing user interface software, technical report esd-tr-86-278 (hanscom air force base, mass.: usaf electronic systems division, 1986).
4. c. chariton and m. choi, “user interface guidelines for enhancing the usability of airline travel agency e-commerce web sites,” chi ’02 extended abstracts on human factors in computing systems, apr. 20–25, 2002 (minneapolis, minn.: acm press), 676–77, http://portal.acm.org/citation.cfm?doid=506443.506541 (accessed dec. 28, 2005); m. g. wadlow, “the andrew system; the role of human interface guidelines in the design of multimedia applications,” current psychology: research and reviews 9 (summer 1990): 181–91; j. kim and j. lee, “critical design factors for successful e-commerce systems,” behaviour and information technology 21, no. 3 (2002): 185–99; s. gilutz and j. nielsen, usability of web sites for children: 70 design guidelines (fremont, calif.: nielsen norman group, 2002); juliana chan, “evaluation of formats used to display bibliographic records in opacs in canadian academic and public libraries,” master of information science research project report (university of toronto: faculty of information studies, 1995); m. c. maguire, “a review of user-interface design guidelines for public information kiosk systems,” international journal of human-computer studies 50, no. 3 (1999): 263–86.
5. national cancer institute, research-based web design and usability guidelines (2001), www.usability.gov/guidelines/index.html (accessed dec. 28, 2005).
6. u.s. department of health and human services, research-based web design and usability guidelines (2003), http://usability.gov/pdfs/guidelines.html (accessed dec. 28, 2005).
7. l. j. cronbach, “beyond the two disciplines of scientific psychology,” american psychologist 30, no. 2 (1975): 116–27.
8. p. muter, “interface design and optimization of reading of continuous text,” in cognitive aspects of electronic text processing, eds. h. van oostendorp and s. de mul (norwood, n.j.: ablex, 1996), 161–80; j. a. nelder and r. mead, “a simplex method for function minimization,” computer journal 7, no. 4 (1965): 308–13; t. k. landauer, “research methods in human-computer interaction,” in handbook of human-computer interaction, ed. m. helander (amsterdam: elsevier, 1988), 905–28; r. n. shepard, “toward a universal law of generalization for psychological science,” science 237 (sept. 11, 1987): 1317–23.
9. j. d. d’angelo and j. twining, “comprehension by clicks: d’angelo standards for web page design, and time, comprehension, and preference,” information technology and libraries 19, no. 3 (2000): 125–35.
10. j. d. d’angelo and s. k. little, “successful web pages: what are they and do they exist?” information technology and libraries 17, no. 2 (1998): 71–81.
11. d’angelo and twining, “comprehension by clicks.”
12. j. gerhardt-powals, “cognitive engineering principles for enhancing human-computer performance,” international journal of human-computer interaction 8, no. 2 (1996): 189–211.
13. chan, “evaluation of formats.”
14. ibid.
15. joan m. cherry and joseph p. cox, “world wide web displays of bibliographic records: an evaluation,” proceedings of the 24th annual conference of the canadian association for information science (toronto, ontario: canadian association for information science, 1996), 101–14.
16. joan m. cherry, “bibliographic displays in opacs and web catalogs: how well do they comply with display guidelines?” information technology and libraries 17, no. 3 (1998): 124–37; cherry and cox, “world wide web displays of bibliographic records.”
17. v. herrero-solana and f. de moya-anegón, “bibliographic displays of web-based opacs: multivariate analysis applied to latin-american catalogs,” libri 51 (june 2001): 75–85.
18. u.s. department of health and human services, research-based web design and usability guidelines, xxi.
19. international organization for standardization, iso 9241-11: ergonomic requirements for office work with visual display terminals (vdts)—part 11: guidance on usability (geneva, switzerland: international organization for standardization, 1998); international organization for standardization, iso 9241-12: ergonomic requirements for office work with visual display terminals (vdts)—part 12: presentation of information (geneva, switzerland: international organization for standardization, 1997); international organization for standardization, iso 9241-14: ergonomic requirements for office work with visual display terminals (vdts)—part 14: menu dialogues (geneva, switzerland: international organization for standardization, 1997).
20. u.s. department of health and human services, research-based web design and usability guidelines.
21. ivan illich, limits to medicine: medical nemesis: the expropriation of health (harmondsworth, n.y.: penguin, 1976).
22. d’angelo and twining, “comprehension by clicks”; gerhardt-powals, “cognitive engineering principles.”
23. c. cheepen, “guidelines for dialogue design—what is our approach? working design guidelines for advanced voice dialogues project,
paper 3,” (1996), www.soc.surrey.ac.uk/research/reports/voice-dialogues/wp3.html (accessed dec. 29, 2005).
24. w. crawford, “webcats and checklists: some cautionary notes,” information technology and libraries 18, no. 2 (1999): 100–03; cherry, “bibliographic displays in opacs and web catalogs.”
25. p. reed et al., “user interface guidelines and standards: progress, issues, and prospects,” interacting with computers 12, no. 1 (1999): 119–42.
26. d. c. l. ngo, l. s. teo, and j. g. byrne, “formalizing guidelines for the design of screen layouts,” displays 21, no. 1 (2000): 3–15.
27. d. a. norman, “emotion and design: attractive things work better,” interactions 9, no. 4 (2002): 36–42.
28. n. tractinsky, a. s. katz, and d. ikar, “what is beautiful is usable,” interacting with computers 13, no. 2 (2000): 127–45.
publishing firm. with a feeling of déjà vu i listened to an explanation of how difficult it is to develop a system for the novice; one proposed solution is to allow only the first four letters of a word to be entered (one of the search methods used at the library of congress, which does suggest some cross-fertilization). whatever the trends, the reality is that librarians and information scientists are playing decreasing roles in the growth of information display technology. hardware systems analysts, advertisers, and communications specialists are the main professions that have an active role to play in the information age. perhaps the answer is an immediate and radical change in the training offered by the library schools of today. our small role may reflect our penchant to be collectors, archivists, and guardians of the information repositories. have we become the keepers of the system? the demand today is for service, information, and entertainment. if we librarians cannot fulfill these needs, our places are not assured. should the american library association (ala) be ensuring that libraries are a part of all ongoing tests of videotex—at least in some way—either as organizers, information providers, or in analysis? consider the force of the argument given at the ala 1980 new york annual conference that cable television should be a medium that librarians become involved with for the future. certainly involvement is an important role, but we, like the industrialists and marketers before us, must make smart decisions and choose the proper niche and the most effective way to use our limited resources if we are to serve any part of society in the future.
bibliography
1. electronic publishing review. oxford, england: learned information ltd. quarterly.
2. home video report. white plains, new york: knowledge industry publications. weekly.
3. ieee transactions on consumer electronics. new york: ieee broadcast, cable, and consumer electronics society. five times yearly.
4. international videotex/teletext news. washington, d.c.: arlen communications ltd. monthly.
5. videodisc/teletext news. westport, conn.: microform review. quarterly.
6. videoprint. norwalk, conn.: videoprint. two times monthly.
7. viewdata/videotex report. new york: link resources corp. monthly.
data processing library: a very special library
sherry cook, mercedes dumlao, and maria szabo: bechtel data processing library, san francisco, california.
the 1980s are here and with them comes the ever-broadening application of the computer. this presents a new challenge to libraries. what do we do with all these computer codes? how do we index the material?
and most importantly, how do we make it accessible to our patrons or computer users? bechtel’s data processing library has met these demands. the genesis for the collection was bechtel’s conversion from a honeywell 6000 computer to a univac 1100 in 1974. all the programs in use at that time were converted to run on the univac system. it seemed a good time to put all of the computer programs from the various bechtel divisions together into a controlled collection. the librarians were charged with the responsibility of enforcing standards and control of bechtel’s computer programs. the major benefits derived from placing all computer programs into a controlled library were:
1. company-wide usage of the programs.
2. minimized investment in program development through common usage.
3. computer file and documentation storage by the library to safeguard the investment.
4. a central location for audits of program code and documentation.
5. centralized reporting on bechtel programs.
developing the collection involved basic cataloging techniques, greatly modified to encompass all the information that computer programs generate, including actual code, documentation, and listings. historically, this information must be kept indefinitely on an archival basis. the machine-readable codes themselves are grouped together and maintained from the library’s budget. finally, a reference desk is staffed to answer questions from the entire user community. documentation for programs is strictly controlled. code changes are arranged chronologically to provide only the most current release of a program to all users. historical information is kept and is crucial to satisfy the demands of auditors (such as the nuclear regulatory commission). additionally, the names of people administratively connected with the program are recorded and their responsibilities defined (valuable in situations of liability for work completed yesteryear). the backbone of the operation is a standards manual that spells out and discusses the file requirements, documentation specifications, and control forms. this standard is made readily available throughout bechtel. in addition, there are in-house education classes about the same document. indeed, the central data processing library is the repository of computer information at bechtel. the centralization and control of computer programs eliminates the chaos that can occur if too many individuals maintain and use the same computer program.
ex libris column
a partnership for creating successful partnerships
carl grant
when marc asked me to write this column i eagerly accepted because i feel strongly about libraries leveraging their role to their greater advantage in the rapidly changing information landscape. i see sponsorships and partnerships as an important tool for doing that. however, as noted in marc’s column in this issue, we’d been having a discussion about the continuing involvement of ex libris in the lita/ex libris student writing award. like many of you, we at ex libris are trying to keep our costs low in this challenging economic environment so that we can in turn keep your costs low. thus we are closely evaluating all expenditures to ensure their cost is justified by the value they return to our organization.
i won’t repeat the discussion already outlined by marc above, but will just note with great pleasure his willingness not only to listen to my concerns, but to try to address them. his invitation to write this column was part of that response, a chance for me to share my thoughts and concerns with you about sponsorships and partnerships and where they need to go in the future. to do that, i’d like to expand on some of the concepts marc and i were discussing and talk about how to make sponsorships and partnerships successful. i want to look at what successful ones consist of as well as what types are needed in our profession tomorrow.
■■ the elements of successful sponsorships and partnerships
for a sponsorship or partnership to be successful in today’s environment, it should offer at least the following components:
1. clear and shared goals. agreeing on what is to be achieved via the sponsorship or partnership is essential. furthermore, it should be readily apparent that the goals are achievable. this will happen through joint planning and execution of an agreed-upon project plan that results in that achievement. it is up to each partner to ensure that they have the resources to execute that project plan on schedule and on budget. as there will always be unplanned events and issues, there must also be ongoing, open communications throughout the life of the sponsorship or partnership. this way, surprises are avoided and issues can be dealt with before they become problems.
2. risks and rewards must be real and shared. members of a sponsorship or partnership should share risks and rewards in proportion to the role they hold. furthermore, the rewards must be seen to be real rewards to all the members. step into the other members’ shoes and look at what you’re offering. does it clearly bring value to the other organizations in the arrangement? if so, how? if not, what can be done to address that disparity? sponsorships and partnerships should not take advantage of any one sponsor or partner by allocating risks or rewards disproportionately to their contributions. rewards realized by members of the sponsorship or partnership should be proportionally shared by all the members.
3. defined time. a sponsorship or partnership is for a defined amount of time and should not be assumed to be ongoing. regular reviews of how well the sponsorship or partnership is working for the partners must be conducted and decisions made on the basis of those results. it might be that the landscape is changing and the benefits are no longer as meaningful, or there are alternatives now available that provide better benefits for one of the members. maintaining a sponsorship or partnership past its useful life will only result in the disintegration of the overall relationship.
4. write it down. organizations merge, are acquired and sold, people change jobs, and people change responsibilities. any sponsorship or partnership should have a written agreement outlining the elements above. once finalized, it should be signed by an appropriate person representing each member organization. that way, when things do change, there is a reference point and the arrangement is more likely to survive any of these precipitous events.
■■ the sponsorships and partnerships needed for tomorrow
successful sponsorships and partnerships are a necessary part of our landscape today. the world of information and knowledge has become too large, exists in too many silos, and is far too complex.
“competition, collaboration, and cooperation” defines the only path possible for navigating the landscape successfully. as the president of a company in the library automation marketplace, i continue to seek out opportunities that uniquely position our company to effectively maintain success in the marketplace and to provide value for our customers and thus our company. i believe libraries need to seek the same opportunities for their organizations. looking ahead, it seems clear that the pace of change in today’s environment will only continue to accelerate; thus the need for us to quickly form and dissolve key sponsorships and partnerships that will result in the successful fostering and implementation of new ideas, the currency of a vibrant profession. the next challenge is to realize that many of the key sponsorships and partnerships that need to be formed are not just with traditional organizations in this profession. tomorrow’s sponsorships and partnerships will be with those organizations that will benefit from the expertise of libraries and their suppliers while in return helping to develop or provide new funding opportunities and the means and places for disseminating access to their expertise and resources. likely organizations would be those in the fields of education, publishing, content creation and management, and social and community web-based software. to summarize, we at ex libris believe in sponsorships and partnerships. we believe they’re important and should be used in advancing our profession and organizations. from long experience we also have learned there are right ways and wrong ways to implement these tools, and i’ve shared thoughts on how to make them work for all the parties involved. again, i thank marc for his receptiveness to this discussion and my even deeper appreciation for trying to address the issues. it serves as an excellent example of what i discussed above.
carl grant (carl.grant@exlibrisgroup.com) is president of ex libris north america, des plaines, illinois.
people forget, but paper, the scroll, the codex, and later the book were all major technological leaps, not to mention the printing press and moveable type. . . . there is so much potential for using technology to equalize access to information, regardless of how much money you have, what language you speak, or where you live. big ideas, enthusiasm, and hope for the profession, in addition to practical technology-focused information, await the reader. enjoy the issue, and congratulations to the winner and all the finalists!
note
1. all quotations are taken with permission from private e-mail correspondence.
editorial board thoughts: eating our own dogfood
michael witt
i’ll never forget helping one of my relatives learn how to use his first computer. we ran through the basics: turning the computer and monitor on, pointing and clicking, typing, and opening and closing windows. i went away to college, and when i came back for the holidays, he happily showed off his new abilities to send emails and create spreadsheets and such. despite his well-earned pride, i couldn’t help but notice that when he reached the edge of the desk with the mouse, he would use his other hand to place a photo album up against the desk and roll the mouse onto it, in order to reach the far right-hand side of the screen with the pointer.
when i picked up his hand and the mouse and re-centered it on the desk for him, i think it blew his mind. he had been using the photo album to extend the reach of the mouse and pointer for months! it occurred to me that i should have spent more time with him, not just showing him what to do, but watching him do it. those of us working in information technology have a tremendous impact on library staff productivity by virtue of the systems we select or develop and implement. people working in most facets of library operations trust and rely on our hardware and software to accomplish their daily work, for which we bear a significant burden of responsibility. are they using the best possible tools for their work? are they using them in the best way? a great deal of effort has gone into user-centered design and improving functionality for our patrons, but in this time of reduced budgets and changing staff roles, it is important to extend similar consideration to the systems that we provision for our co-workers. at its best, information technology has the ability to save time and add value to the library by creating efficiencies and empowering people to do better and new work. whether we are evaluating new integrated library systems or choosing the default text editor for our workstations, we are presented with opportunities to learn more about how our libraries accomplish work “on the ground” and reconsider the role that technology can play in helping them. the phrase “eating your own dog food” is so common in software development circles that some have begun using it as a verb. developers engage in “dogfooding” by using new software themselves, internally, to identify bugs and improve usability and functionality before releasing it to users. this is a regular practice of companies such as microsoft1 and google2. setting aside any negative connotations for the moment (why are people eating dog food? and exactly who are the “dogs” in this scenario?), there is a lot that we can learn by putting ourselves in the place of our users and experiencing our systems from their perspective. perhaps the best way to do this is to walk around the building and spend time in each unit of the library, shadowing its staff and observing how they interact with systems to do their work. try to learn their workflow and observe the tasks they perform—both online and offline. you don’t need to become an expert, but ideally you’d be able to try to perform some of the tasks yourself. in one case, we were able to identify and enable someone to design and run their own reports, which helped their unit make more timely decisions and eliminated the need for it to run monthly reports on their behalf. if these tasks support user-facing interactions, you might get some good usability information in the process too. for example, i learned more about our library’s website by working chat reference for an hour a week than i did in two years of web development team meetings! part of this process is attempting to feel our users’ pain, too. do you use the same locked-down workstation image that you deploy to your staff desktops? there is also a tendency among it staff to keep the newest and best machines for their own use and cycle older machines to other units. 
i understand—it staff are working with databases and developing software, and so we benefit the most from higher-performing machines—but keep in mind that your co-workers likely have older, slower machines, and take the lowest common denominator hardware into account when selecting new software. by walking a mile in your users’ shoes, you may gain a deeper appreciation and understanding of the other units of the library and how they work together. because so much work is done on computers, people working in information technology can often see a broad picture of the activities of the library. we have the ability to make connections and identify potential points of integration, not only between machines but also between people and their work. references 1. g. pascal zachary, showstopper! the breakneck race to create windows nt and the next generation at microsoft (new york: free press, 1994): 129–56. 2. steven levy, “inside google+: how the search giant plans to go social,” http://www.wired.com/epicenter/2011/06/inside-google-plus-social/all/1 (accessed july 12, 2011). editorial board thoughts: eating our own dogfood michael witt (mwitt@purdue.edu) is the interdisciplinary research librarian and an assistant professor of library science at purdue university in west lafayette, indiana. he serves on the editorial board of ital. know its power, and facets can showcase metadata in new interfaces. according to mcguinness, facets perform several functions in an interface: ■■ vocabulary control ■■ site navigation and support ■■ overview provision and expectation setting ■■ browsing support ■■ searching support ■■ disambiguation support5 these functions offer several potential advantages to the user: the functions use category systems that are coherent and complete, they are predictable, they show previews of where to go next, they show how to return to previous states, they suggest logical alternatives, and they help the user avoid empty result sets as searches are narrowed.6 disadvantages include the fact that categories of interest must be known in advance, important trends may not be shown, category structures may need to be built by hand, and automated assignment is only partly successful.7 library catalog records, of course, already supply “categories of interest” and a category structure. information science research has shown benefits to users from faceted search interfaces. but do these benefits hold true for systems as complex as library catalogs? this paper presents an extensive review of both information science and library literature related to faceted browsing. ■■ method to find articles in the library and information science literature related to faceted browsing, the author searched the association for computing machinery (acm) digital library, scopus, and library and information science and technology abstracts (lista) databases. in scopus and the acm digital library, the most successful searches included the following: ■■ (facet* or cluster*) and (usability or user stud*) ■■ facet* and usability in lista, the most successful searches included combining product names such as “aquabrowser” with “usability.” the search “catalog and usability” was also used. the author also searched google and the next generation catalogs for libraries (ngc4lib) electronic discussion list in an attempt to find unpublished studies.
search terms initially included the concept of “clustering”; however, this was quickly shown to be a clearly defined, separate topic. according to hearst, “clustering refers to the grouping of items according to some measure faceted browsing is a common feature of new library catalog interfaces. but to what extent does it improve user performance in searching within today’s library catalog systems? this article reviews the literature for user studies involving faceted browsing and user studies of “next-generation” library catalogs that incorporate faceted browsing. both the results and the methods of these studies are analyzed by asking, what do we currently know about faceted browsing? how can we design better studies of faceted browsing in library catalogs? the article proposes methodological considerations for practicing librarians and provides examples of goals, tasks, and measurements for user studies of faceted browsing in library catalogs. many libraries are now investigating possible new interfaces to their library catalogs. sometimes called “next-generation library catalogs” or “discovery tools,” these new interfaces are often separate from existing integrated library systems. they seek to provide an improved experience for library patrons by offering a more modern look and feel, new features, and the potential to retrieve results from other major library systems such as article databases. one interesting feature these interfaces offer is called “faceted browsing.” hearst defines facets as “a set of meaningful labels organized in such a way as to reflect the concepts relevant to a domain.”1 labarre defines facets as representing “the categories, properties, attributes, characteristics, relations, functions or concepts that are central to the set of documents or entities being organized and which are of particular interest to the user group.”2 faceted browsing offers the user relevant subcategories by which they can see an overview of results, then narrow their list. in library catalog interfaces, facets usually include authors, subjects, and formats, but may include any field that can be logically created from the marc record (see figure 1 for an example). using facets to structure information is not new to librarians and information scientists. as early as 1955, the classification research group stated a desire to see faceted classification as the basis for all information retrieval.3 in 1960, ranganathan introduced facet analysis to our profession.4 librarians like metadata because they jody condit fagan (faganjc@jmu.edu) is content interfaces coordinator, james madison university library, harrisonburg, virginia. jody condit fagan usability studies of faceted browsing: a literature review doing so and performed a user study to inform their decision. results: empirical studies of faceted browsing the following summaries present selected empirical research studies that had significant findings related to faceted browsing or interesting methods for such studies. it is not an exhaustive list. pratt, hearst, and fagan questioned whether faceted results were better than clustering or relevancy-ranked results.11 they studied fifteen breast-cancer patients and families. every subject used three tools: a faceted interface, a tool that clustered the search results, and a tool that ranked the search results according to relevance criteria.
the subjects were given three simple queries related to breast cancer (e.g., “what are the ways to prevent breast cancer?”), asked to list answers to these before beginning, and to answer the same queries after using all the tools. in this study, subjects completed two timed tasks. first, subjects found as many answers as possible to the question in four minutes. second, the researchers measured the time subjects took to find answers to two specific questions (e.g., “can diet be used in the prevention of breast cancer?”) that related to the original, general query. for the first task, when the subjects used the faceted interface, they found more answers than they did with the other two tools. the mean number of answers found using the faceted interface was 7.80, for the cluster tool it was 4.53, and for the ranking tool it was 5.60. this difference was significant (p<0.05).12 for the second task, the researchers found no significant difference between the tools when comparing time on task. the researchers gave the subjects a user-satisfaction questionnaire at the end of the study. on thirteen of the fourteen quantitative questions, satisfaction scores for the faceted interface were much higher than they were for either the ranking tool or the cluster tool. this difference was statistically significant (p < 0.05). all fifteen users also affirmed that the faceted interface made sense, was helpful, was useful, and had clear labels, and said they would use the faceted interface again for another search. yee et al. studied the use of faceted metadata for image searching, and browsing using an interface they developed called flamenco.13 they collected data from thirty-two participants who were regular users of the internet, searching for information either every day or a few times a week. their subjects performed four tasks (two structured and two unstructured) on each of two interfaces. an example of an unstructured task from their study was “search for images of interest.” an example of a structured task was to gather materials for an art history of similarity . . . typically computed using associations and commonalities among features where features are typically words and phrases.”8 using library catalog keywords to generate word clouds would be an example of clustering, as opposed to using subject headings to group items. clustering has some advantages according to hearst. it is fully automated, it is easily applied to any text collection, it can reveal unexpected or new trends, and it can clarify or sharpen vague queries. disadvantages to clustering include possible imperfections in the clustering algorithm, similar items not always being grouped into one cluster, a lack of predictability, conflating many dimensions, difficulty labeling groups, and counterintuitive subhierarchies.9 in user studies comparing clustering with facets, pratt, hearst, and fagan showed that users find clustering difficult to interpret and prefer a predictable organization of category hierarchies.10 ■■ results the author grouped the literature into two categories: user studies of faceted browsing and user studies of library catalog interfaces that include faceted browsing as a feature. generally speaking, the information science literature consisted of empirical studies of interfaces created by the researchers. 
in some cases, the researchers’ intent was to create and refine an interface intended for actual use; in others, the researchers created the interface only for the purposes of studying a specific aspect of user behavior. in the library literature, the studies found were generally qualitative usability studies of specific library catalog interface products. libraries had either implemented a new product, or they were thinking about (figure 1. faceted results from jmu’s vufind implementation) uddin and janecek asked nineteen users (staff and students at the asian institute of technology) to use a website search engine with both a traditional results list and a faceted results list.22 tasks were as follows: (1) look for scholarship information for a master’s program, (2) look for staff recruitment information, and (3) look for research and associated faculty member information within your interested area.23 they found that users were faster when using the faceted system, significantly so for two of the three tasks. success in finding relevant results was higher with the faceted system. in the post-study questionnaire, participants rated the faceted system more highly, including significantly higher ratings for flexibility, interest, understanding of information content, and search results relevancy. participants rated the most useful features to be the capability to switch from one facet to another, preview the result set, combine facets, and navigate via breadcrumbs. capra et al. compared three interfaces in use by the bureau of labor statistics website, using a between-subjects study with twenty-eight people and a within-subjects study with twelve people.24 each set of participants performed three kinds of searches: simple lookup, complex lookup, and exploratory. the researchers used an interesting strategy to help control the variables in their study: because the bls website is a highly specialized corpus devoted to economic data in the united states organized across very specific time periods (e.g., monthly releases of price or employment data), we decided to include the us as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in our study. thus, the simple lookup tasks were constructed around a single economic facet but also included the spatial and temporal facets to provide context for the searchers. the complex lookup tasks involve additional facets including genre (e.g. press release) and/or region.25 capra et al. found that users preferred the familiarity afforded by the traditional website interface (hyperlinks + keyword search) but listed the facets on the two experimental interfaces as their best features. the researchers concluded, “if there is a predominant model of the information space, a well designed hierarchical organization might be preferred.”26 zhang and marchionini analyzed results from fifteen undergraduate and graduate students in a usability study of an interface that used facets to categorize results (relation browser++).27 there were three types of tasks: ■■ type 1: simple look-up task (three tasks such as “check if the movie titled the matrix is in the library movie collection”). ■■ type 2: data exploration and analysis tasks (six tasks essay on a topic given by the researchers and to complete four related subtasks. the researchers designed the structured task so they knew exactly how many relevant results were in the system.
they also gave a satisfaction survey. more participants were able to retrieve all relevant results with the faceted interface than with the baseline interface. during the structured tasks, participants received empty results with the baseline interface more than three times as often as with the faceted interface.14 the researchers found that participants constructed queries from multiple facets in the unstructured tasks 19 percent of the time and in the structured tasks 45 percent of the time.15 when given a post-test survey, participants identified the faceted interface as easier to use, more flexible, interesting, enjoyable, simple, and easy to browse. they also rated it as slightly more “overwhelming.” when asked to choose between the two, twenty-nine participants chose the faceted interface, compared with two who chose the baseline (n = 31). thirty-one of the thirty-two participants said the faceted interface helped them learn more, and twenty-eight of them said it would be more useful for their usual tasks.16 the researchers concluded that even though their faceted interface was much slower than the other, it was strongly preferred by most study participants: “these results indicate that a category-based approach is a successful way to provide access to image collections.”17 in a related usability study on the flamenco interface, english et al. compared two image-browsing interfaces in a nineteen-participant study.18 after an initial search, the “matrix view” interface showed a left column with facets, with the images in the result set placed in the main area of the screen. from this intermediary screen, the user could select multiple terms from facets in any order and have the items grouped under any facet. the “singletree” interface listed subcategories of the currently selected term at the top, with query previews underneath. the user could then only drill down to subcategories of the current category, and could not select terms from more than one facet. the researchers found that a majority of participants preferred the “power” and “flexibility” of matrix to the simplicity of singletree. they found it easier to refine and expand searches, shift between searches, and troubleshoot research problems. they did prefer singletree for locating a specific image, but matrix was preferred for browsing and exploring. participants started over only 0.2 percent of the time for matrix compared to 4.5 percent for singletree.19 yet the faceted interface, matrix, was not “better” at everything. for specific image searching, participants found the correct image only 22.0 percent of the time in matrix compared to 66.0 percent in singletree.20 also, in matrix, some participants drilled down in the wrong hierarchy with wrong assumptions. one interesting finding was that in both interfaces, more participants chose to begin by browsing (12.7 percent) than by searching (5.0 percent).21 of the first two studies: the first study comprised one faculty member, five graduate students, and two undergraduate students; the second comprised two faculty members, four graduate students, and two undergraduate students. the third study did not report results related to faceted browsing and is not discussed here. the first study had seven scenarios; the second study had nine.
the scenarios were complex: for example, one scenario began, “you want to borrow shakespeare’s play, the tempest, from the library,” but contained the following subtasks as well: 1. find the tempest. 2. find multiple editions of this item. 3. find a recent version. 4. see if at least one of the editions is available in the library. 5. what is the call number of the book? 6. you’d like to print the details of this edition of the book so you can refer to it later. participants found the interface friendly, easy to use, and easy to learn. all the participants reported that faceted browsing was useful as a means of narrowing down the result lists, and they considered this tool one of the differentiating features between primo and their library opac or other interfaces. facets were clear, intuitive, and useful to all participants, including opening the “more” section.31 one specific result from the tests was that “online resources” and “available” limiters were moved from a separate location to the right with all other facets.32 in a study of aquabrowser by olson, twelve subjects— all graduate students in the humanities—participated in a comparative test in which they looked for additional sources for their dissertation.33 aquabrowser was created by medialab but is distributed by serials solutions in north america. this study also had three pilot subjects. no relevance judgments were made by the researchers. nine of the twelve subjects found relevant materials by using aquabrowser that they had not found before.34 olson’s subjects understood facets as a refinement tool (narrowing) and had a clear idea of which facets were useful and not useful for them. they gave overwhelmingly positive comments. only two felt the faceted interface was not an improvement. some participants wanted to limit to multiple languages or dates, and a few were confused about the location of facets in multiple places, for example, “music” under both format and topic. a team at yale university, led by bauer, recently conducted two tests on pilot vufind installations: a subject-based presentation of e-books for the cushing/ whitney medical library and a pilot test of vufind using undergraduate students with a sample of 400,000 records from the library system.35 vufind is open-source software developed at villanova university (http://vufind.org). that require users to understand and make sense of the information collection: “in which decade did steven spielberg direct the most movies?”). ■■ type 3: (one free exploration task: “find five favorite videos without any time constraints”). the tasks assigned for the two interfaces were different but comparable. for type 2 tasks, zhang and marchionini found that performance differences between the two interfaces were all statistically significant at the .05 level.28 no participants got wrong answers for any but one of the tasks using the faceted interface. with regard to satisfaction, on the exploratory tasks the researchers found statistically significant differences favoring the faceted interface on all three of the satisfaction questions. participants found the faceted interface not as aesthetically appealing nor as intuitive to use as the basic interface. two participants were confused by the constant changing and updating of the faceted interface. the above studies are examples of empirical investigations of experimental interfaces. 
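several of the studies above report their comparisons as significance tests on per-subject measures (answers found, time on task, error rate), with results such as p < 0.05. as a rough illustration of how a within-subjects comparison of that kind can be computed, the following python sketch runs a paired t-test on hypothetical per-subject answer counts; the numbers are invented for illustration and are not data from any of the cited studies.

```python
# illustrative sketch only: hypothetical per-subject counts of answers found
# with two tools; not the analysis code of any study cited in this review.
from scipy import stats

faceted = [8, 9, 7, 8, 6, 9, 8, 7, 8, 9, 7, 8, 8, 7, 8]   # hypothetical
cluster = [5, 4, 5, 6, 4, 5, 4, 3, 5, 5, 4, 5, 4, 6, 3]   # hypothetical

# each subject used both tools, so a paired (within-subjects) test applies
t_stat, p_value = stats.ttest_rel(faceted, cluster)
print(f"mean faceted = {sum(faceted)/len(faceted):.2f}, "
      f"mean cluster = {sum(cluster)/len(cluster):.2f}, p = {p_value:.4f}")
```

a between-subjects design, where each group sees only one interface, would call for an independent-samples test (for example, scipy's ttest_ind) instead.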
hearst recently concluded that facets are a “proven technique for supporting exploration and discovery” and summarized areas for further research in this area, such as applying facets to large “subject-oriented category systems,” facets on mobile interfaces, adding smart features like “autocomplete” to facets, allowing keyword search terms to affect order of facets, and visualizations of facets.29 in the following section, user studies of next-generation library catalog interfaces will be presented. results: library literature understandably, most studies by practicing librarians focus on products their libraries are considering for eventual use. these studies all use real library catalog records, usually the entire catalog’s database. in most cases, these studies were not focused on investigating faceted browsing per se, but on the usability of the overall interface. in general, these studies used fewer participants than the information science studies above, followed less rigorous methods, and were not subjected to statistical tests. nevertheless, they provide many insights into the user experience with the extremely complex datasets underneath next-generation library catalog interfaces that feature faceted browsing. in this review article, only results specifically relating to faceted browsing will be presented. sadeh described a series of usability studies performed at the university of minnesota (um), a primo development partner.30 primo is the next-generation library catalog product sold by ex libris. the author also received additional information from the usability services lab at um via e-mail. three studies were conducted in august 2006, january 2007, and october 2007. eight users from various disciplines participated in each 62 information technology and libraries | june 2010 participants. the researchers measured task success, duration, and difficulty, but did not measure user satisfaction. their study consisted of four known-item tasks and six topic-searching tasks. the topic-searching tasks were geared toward the use of facets, for example, “can you show me how would you find the most recently published book about nuclear energy policy in the united states?”45 all five participants using endeca understood the idea of facets, and three used them. students tried to limit their searches at the outset rather than search and then refine results. an interesting finding was that use of the facets did not directly follow the order in which facets were listed. the most heavily used facet was library of congress classification (lcc), followed closely by topic, and then library, format, author, and genre.46 results showed a significantly shorter average task duration for endeca catalog users for most tasks.47 the researchers noted that none of the students understood that the lcc facet represented call-number ranges, but all of the students understood that these facets “could be used to learn about a topic from different aspects—science, medicine, education.”48 the authors could find no published studies relating to the use of facets in some next-generation library catalogs, including encore and worldcat local. 
although the university of washington did publish results of a worldcat local usability study in a recent issue of library technology reports, results from the second round of testing, which included an investigation of facets, were not yet ready.49 ■■ discussion summary of empirical evidence related to faceted browsing empirical studies in the information science literature support many positive findings related to faceted browsing and build a solid case for including facets in search interfaces: ■■ facets are useful for creating navigation structures.50 ■■ faceted categorization greatly facilitates efficient retrieval in database searching.51 ■■ facets help avoid dead ends.52 ■■ users are faster when using a faceted system.53 ■■ success in finding relevant results is higher with a faceted system.54 ■■ users find more results with a faceted system.55 ■■ users also seem to like facets, although they do not always immediately have a positive reaction. ■■ users prefer search results organized into predictable, multidimensional hierarchies.56 ■■ participants’ satisfaction is higher with a faceted system.57 the team drew test questions from user search logs in their current library system. some questions targeted specific problems, such as incomplete spellings and incomplete title information. bauer notes that some problems uncovered in the study may relate to the peculiarities of the yale implementation. the medical library study contained eight participants—a mix of medical and nursing students. facets, reported bauer, “worked well in several instances, although some participants did not think they were noticeable on the right side of the page.”36 the prompt for the faceted task in this study came after the user had done a search: “what if you wanted to look at a particular subset, say ‘xxx’ (determine by looking at the facets).”37 half of the participants used facets, half used “search within” to narrow the topic by adding keywords. sixty-two percent of the participants were successful at this task. the undergraduate study asked five participants faced with a results list, “what would you do now if you only wanted to see material written by john adams?”38 on this task, only one of the five was successful, even though the author’s name was on the screen. bauer noted that in general, “the use of the topic facet to narrow the search was not understood by most participants. . . . even when participants tried to use topic facets the length of the list and extraneous topics rendered them less than useful.”39 the five undergraduates were also asked, “could you find books in this set of results that are about health and illness in the united states population, or control of communicable diseases during the era of the depression?”40 again, only one of the five was successful. bauer notes that “the overly broad search results made this difficult for participants. again, topic facets were difficult to navigate and not particularly useful to this search.”41 bauer’s team noted that when the search was configured to return more hits, “topic facets become a confusingly large set of unrelated items. 
these imprecise search results, combined with poor topic facet sets, seemed to result in confusion for test participants.”42 participants were not aware that topics represented subsets, although learning occurred because the “narrow” header was helpful to some participants.43 other results found by bauer’s team were that participants were intrigued by facets, navigation tools are needed so that patrons may reorder large sets of topic facets, format and era facets were useful to participants, and call-number facets were not used by anyone. antelman, pace, and lynema studied north carolina state university’s (ncsu) next-generation library catalog, which is driven by software from endeca.44 their study used ten undergraduate students in a between-subjects design where five used the endeca catalog and five used the library’s traditional catalog. the researchers noted that their participants may have been experienced with the library’s old catalog, as log data shows most ncsu users enter one or two terms, which was not true of study usability studies of faceted browsing: a literature review | fagan 63 one product’s faceted system for a library catalog does not substitute for another, the size and scope of local collections may greatly affect results, and cataloging practices and metadata will affect results. still, it is important for practicing librarians to determine if new features such as facets truly improve the user’s experience. methodological best practices after reading numerous empirical research studies (some of which critique their own methods) and library case studies, some suggestions for designing better studies of facets in library catalogs emerged. designing the study ■■ consider reusing protocols from previous studies. this provides not only a tested method but also a possible point of comparison. ■■ define clear goals for each study and focus on specific research questions. it’s tempting to just throw the user into the interface and see what happens, but this makes it difficult, if not impossible, to analyze the results in a useful way. for example, one of zhang and marchionini’s hypotheses specifically describes what rich interaction would look like: “typing in keywords and clicking visual bars to filter results would be used frequently and interchangeably by the users to finish complex search tasks, especially when large numbers of results are returned.”64 ■■ develop the study for one type of user. olson’s focus on graduate students in the dissertation process allowed the researchers to control for variables such as interest of and knowledge about the subject. ■■ pilot test the study with a student worker or colleague to iron out potential wrinkles. ■■ let users explore the system for a short time and possibly complete one highly structured task to help the user become used to the test environment, interface, and facilitator.65 unless you are truly interested in the very first experience users have with a system, the first use of a system is an artificial case. designing tasks ■■ make sure user performance on each task is measurable. will you measure the time spent on a task? if “success” is important, define what that would look like. for example, english et al. defined success for one of their tasks as when “the participant indicated (within the allotted time) that he/she had reached an appropriate set of images/specific image in the collection.”66 ■■ establish benchmarks for comparison. 
one can test for significant differences between interfaces, one can test for differences between research subjects and an expert user, and one can simply measure against ■■ users are more confident with a faceted system.58 ■■ users may prefer the familiarity afforded by traditional website interface (hyperlinks + keyword search).59 ■■ initial reactions to the faceted interface may be cautious, seeing it as different or unfamiliar.60 users interact with specific characteristics of faceted interfaces, and they go beyond just one click with facets when it is permitted. english et al. found that 7 percent of their participants expanded facets by removing a term, and that facets were used more than “keyword search within”: 27.6 percent versus 9 percent.61 yee et al. found that participants construct queries from multiple facets 19 percent of the time in unstructured tasks; in structured tasks they do so 45 percent of the time.62 the above studies did not use library catalogs; in most cases they used an experimental interface with record sets that were much smaller and less complicated than in a complete library collection. domains included websites, information from one website, image collections, video collections, and a journal article collection. summary of practical user studies related to faceted browsing this review also included studies from practicing librarians at live library implementations. these studies generally had smaller numbers of users, were more likely to focus on the entire interface rather than a few features, and chose more widely divergent methods. studies were usually linked to a specific product, and results varied widely between systems and studies. for this reason it is difficult to assemble a bulleted summary as with the previous section. the variety of results from these studies indicate that when faceted browsing is applied to a reallife situation, implementation details can greatly affect user performance and user preference. some, like labarre, are skeptical about whether facets are appropriate for library information. descriptions of library materials, says labarre, include analyses of intellectual content that go beyond the descriptive terms assigned to commercial items such as a laptop: now is the time to question the assumptions that are embedded in these commercial systems that were primarily designed to provide access to concrete items through descriptions in order to enhance profit.63 it is clear that an evaluation of commercial interfaces or experimental interfaces does not substitute for an opac evaluation. yet it is a challenge for libraries to find expertise and resources to conduct user studies. the systems they want to test are large and complex. collaborating with other libraries has its own challenges: an evaluation of 64 information technology and libraries | june 2010 groups of participants, each of which tests a different system. ■❏ a within-subjects design has one group of participants test both systems. it is hoped that if libraries use the suggestions above when designing future experiments, results across studies will be more comparable and useful. designing user studies of faceted browsing after examining both empirical research studies and case studies by practicing librarians, a key difference seems to be the specificity of research questions and designing tasks and measurements to test specific hypotheses. 
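to make the measurement suggestions above concrete, the following python sketch shows one way usability-session logs could be reduced to a benchmark of the kind suggested earlier, such as "75 percent of users completed the task within five minutes." the data structure, field names, and sample values are hypothetical and are offered only as an illustration, not as part of any cited protocol.

```python
# illustrative sketch: turning hypothetical session logs into a benchmark figure.
from dataclasses import dataclass

@dataclass
class TaskResult:
    participant: str
    task: str
    success: bool     # did the participant reach the defined success state?
    seconds: float    # time on task

results = [
    TaskResult("p01", "find-formats", True, 212.0),
    TaskResult("p02", "find-formats", True, 341.5),
    TaskResult("p03", "find-formats", False, 298.0),
    TaskResult("p04", "find-formats", True, 188.2),
]

def benchmark(results, task, limit_seconds):
    """share of participants who succeeded at `task` within `limit_seconds`."""
    relevant = [r for r in results if r.task == task]
    met = [r for r in relevant if r.success and r.seconds <= limit_seconds]
    return len(met) / len(relevant)

print(f"{benchmark(results, 'find-formats', 300) * 100:.0f}% completed within five minutes")
```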
while describing a full user-study protocol for investigating faceted browsing in a library catalog is beyond the scope of this article, reviewing the literature and the study methods it describes provided insights into how hypotheses, tasks, and measurements could be written to provide more reliable and comparable evidence related to faceted browsing in library catalog systems. for example, one research question could surround the format facet: “compared with our current interface, does our new faceted interface improve the user’s ability to find different formats of materials?” hypotheses could include the following: 1. users will be more accurate when identifying the formats of items from their result set when using the faceted interface than when using the traditional interface. 2. users will be able to identify formats of items more quickly with the faceted interface than with the traditional interface. looking at these hypotheses, here is a prompt and some example tasks the participants would be asked to perform: “we will be asking you to find a variety of formats of materials. when we say formats of materials, we mean books, journal articles, videos, etc.” ■■ task 1: please use interface a to search on “interpersonal communication.” look at your results set. please list as many different formats of material as you can. ■■ task 2: how many items of each format are there? ■■ task 3: please use interface b to search on “family communication.” what formats of materials do you see in your results set? ■■ task 4: how many items of each format are there?” we would choose the topics “interpersonal communication” and “family communication” because our local catalog has many material types for these topics and because these topics would be understood by most of our students. we would choose different topics to expectations or against previous iterations of the same study. for example, “75 percent of users completed the task within five minutes.” zhang and marchionini measured error rates, another possible benchmark.67 ■■ consider looking at your existing opac logs for zeroresults searches or other issues that might inspire interesting questions. ■■ target tasks to avoid distracters. for example, if your catalog has a glut of government documents, consider running the test with a limit set to exclude them unless you are specifically interested in their impact. for example, capra et al. decided to include the united states as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in their study.68 ■■ for some tasks, give the subjects simple queries (e.g., “what are the ways to prevent breast cancer?”) as opposed to asking the subjects to come up with their own topic. this can help control for the potential challenges of formulating one’s own research question on the spot. as librarians know, formulating a good research question is its own challenge. ■■ if you are using any timed tasks, consider how the nature of your tasks could affect the result. for example, pratt, hearst, and fagan noted that the time that it took subjects to read and understand abstracts most heavily influenced the time for them to find an answer.69 english et al. found that the system’s processing time influenced their results.70 ■■ consider the implications of your local implementation carefully when designing your study. 
at yale, the team chose to point their vufind instance at just 400,000 of their records, drew questions from problems users were having (as shown in log files), and targeted questions to these problems.71 who to study? ■■ try to study a larger set of users. it is better to create a short test with many users than a long test with a few users. nielsen suggests that twenty users is sufficient.72 consider collaborating with another library if necessary. ■■ if you test a small number, such as the typical four to eight users for a usability test, be sure you emphasize that your results are not generalizable. ■■ use subjects who are already interested in the subject domain: for example, pratt, hearst, and fagan used breast cancer patients,73 and olson used graduate students currently writing their dissertations.74 ■■ consider focusing on advanced or scholarly users. la barre suggests that undergraduates may be overstudied.75 ■■ for comparative studies, consider having both between-subjects and within-subjects designs.76 ■❏ a between-subjects design involves creating two usability studies of faceted browsing: a literature review | fagan 65 these experimental studies. previous case-study investigations of library catalog interfaces with facets have proven inconclusive. by choosing more specific research questions, tasks, and measurements for user studies, libraries may be able to design more objective studies and compare results more effectively. references 1. marti a. hearst, “clustering versus faceted categories for information exploration,” communications of the acm 49, no. 4 (2006): 60. 2. kathryn la barre, “faceted navigation and browsing features in new opacs: robust support for scholarly information seeking?” knowledge organization 34, no. 2 (2007): 82. 3. vanda broughton, “the need for faceted classification as the basis of all methods of information retrieval,” aslib proceedings 58, no. 1/2 (2006): 49–71. 4. s. r. ranganathan, colon classification basic classification, 6th ed. (new york: asia, 1960). 5. deborah l. mcguinness, “ontologies come of age,” in spinning the semantic web: bringing the world wide web to its full potential, ed. dieter fensel et al. (cambridge, mass.: mit pr., 2003): 179–84. 6. hearst, “clustering versus faceted categories,” 60. 7. ibid., 61. 8. ibid., 59. 9. ibid.. 60. 10. wanda pratt, marti a. hearst, and lawrence m. fagan, “a knowledge-based approach to organizing retrieved documents,” proceedings of the sixteenth national conference on artificial intelligence, july 18–22, 1999, orlando, florida (menlo park, calif.: aaai pr., 1999): 80–85. 11. ibid. 12. ibid., 5. 13. ka-ping yee et al., “faceted metadata for image search and browsing,” 2003, http://flamenco.berkeley.edu/papers/ flamenco-chi03.pdf (accessed oct. 6, 2008). 14. ibid., 6. 15. ibid., 7. 16. ibid. 17. ibid., 8. 18. jennifer english et al., “flexible search and navigation,” 2002, http://flamenco.berkeley.edu/papers/flamenco02.pdf (accessed apr. 22, 2010). 19. ibid., 7. 20. ibid., 6. 21. ibid., 7. 22. mohammed nasir uddin and paul janecek, “performance and usability testing of multidimensional taxonomy in web site search and navigation,” performance measurement and metrics 8, no. 1 (2007): 18–33. 23. ibid., 25. 24. robert capra et al., “effects of structure and interaction style on distinct search tasks,” proceedings of the 7th acm-ieee-cs joint conference on digital libraries (new york: acm, 2007): 442–51. 25. ibid., 446. 26. ibid., 450. help minimize learning effects. 
to further address this, we would plan to have half our users start first with the traditional interface and half to start first with the faceted interface. this way we can test for differences resulting from learning. the above tasks would allow us to measure several pieces of evidence to support or reject our hypotheses. for tasks 1 and 3, we would measure the number of formats correctly identified by users compared with the number found by an expert searcher. for tasks 2 and 4, we would compare the number of items correctly identified with the total items found in each category by an expert searcher. we could also time the user to determine which interface helped them work more quickly. in addition to measuring the number of formats identified and the number of items identified in each format, we would be able to measure the time it takes users to identify the number of formats and the number of items in each format. to measure user satisfaction, we would ask participants to complete the system usability scale (sus) after each interface and, at the very end of the study, complete a questionnaire comparing the two interfaces. even just selecting the format facet, we would have plenty to investigate. other hypotheses and tasks could be developed for other facet types, such as time period or publication date, or facets related to the responsible parties, such as author or director: hypothesis: users can find more materials written in a certain time period using the faceted interface. task: find ten items of any type (books, journals, movies) written in the 1950s that you think would have information about television advertising. hypothesis: users can find movies directed by a specific person more quickly using the faceted interface. task: in the next two minutes, find as many movies as you can that were directed by orson welles. for the first task above, an expert searcher could complete the same task, and their time could be used as a point of comparison. for the second, the total number of movies in the library catalog that were directed by welles is an objective quantity. in both cases, one could compare the user’s performance on the two interfaces. ■■ conclusion reviewing user studies about faceted browsing revealed empirical evidence that faceted browsing improves user performance. yet this evidence does not necessarily point directly to user success in faceted library catalogs, which have much more complex databases than those used in 66 information technology and libraries | june 2010 53. uddin and janecek, “performance and usability testing”; zhang and marchionini, evaluation and evolution; hao chen and susan dumais, bringing order to the web: automatically categorizing search results (new york: acm, 2000): 145–52. 54. uddin and janecek, “performance and usability testing.” 55. ibid.; pratt, hearst, and fagan, “a knowledge-based approach”; hsinchun chen et al., “internet browsing and searching: user evaluations of category map and concept space techniques,” journal of the american society for information science 49, no. 7 (1998): 582–603. 56. vanda broughton, “the need for faceted classification as the basis of all methods of information retrieval,” aslib proceedings 58, no. 1/2 (2006): 49–71; pratt, hearst, and fagan, “a knowledge-based approach,” 80–85.; chen et al., “internet browsing and searching,” 582–603; yee et al., “faceted metadata for image search and browsing”; english et al., “flexible search and navigation using faceted metadata.” 57. 
uddin and janecek, “performance and usability testing”; zhang and marchionini, evaluation and evolution; hideo joho and joemon m. jose, slicing and dicing the information space using local contexts (new york: acm, 2006): 66–74.; yee et al., “faceted metadata for image search and browsing.” 58. yee et al., “faceted metadata for image search and browsing”; chen and dumais, bringing order to the web. 59. capra et al., “effects of structure and interaction style.” 60. yee et al., “faceted metadata for image search and browsing”; capra et al., “effects of structure and interaction style”; zhang and marchionini, evaluation and evolution. 61. english et al., “flexible search and navigation,” 7. 62. yee et al., “faceted metadata for image search and browsing,” 7. 63. la barre, “faceted navigation and browsing,” 85. 64. zhang and marchionini, evaluation and evolution, 183. 65. english et al., “flexible search and navigation.” 66. ibid., 6. 67. zhang and marchionini, evaluation and evolution. 68. capra et al., “effects of structure and interaction style.” 69. pratt, hearst, and fagan, “a knowledge-based approach.” 70. english et al., “flexible search and navigation.” 71. bauer, “yale university library vufind test—undergraduates.” 72. jakob nielsen, “quantitative studies: how many users to test?” online posting, alertbox, june 26, 2006 http://www.useit .com/alertbox/quantitative_testing.html (accessed apr. 7, 2010). 73. pratt, hearst, and fagan, “a knowledge-based approach.” 74. tod a. olson used graduate students currently writing their dissertations. olson, “utility of a faceted catalog for scholarly research,” library hi tech 25, no. 4 (2007): 550–61. 75. la barre, “faceted navigation and browsing.” 76. capra et al., “effects of structure and interaction style.” 27. junliang zhang and gary marchionini, evaluation and evolution of a browse and search interface: relation browser++ (atlanta, ga.: digital government society of north america, 2005): 179–88. 28. ibid., 183. 29. marti a. hearst, “uis for faceted navigation: recent advances and remaining open problems,” 2008, http://people. ischool.berkeley.edu/~hearst/papers/hcir08.pdf (accessed apr. 27, 2010). 30. tamar sadeh, “user experience in the library: a case study,” new library world 109, no. 1/2 (jan. 2008): 7–24. 31. ibid., 22. 32. jerilyn veldof, e-mail from university of minnesota usability services lab, 2008. 33. tod a. olson, “utility of a faceted catalog for scholarly research,” library hi tech 25, no. 4 (2007): 550–61. 34. ibid., 555. 35. kathleen bauer, “yale university library vufind test— undergraduates,” may 20, 2008, http://www.library.yale.edu/ usability/studies/summary_undergraduate.doc (accessed apr. 27, 2010); kathleen bauer and alice peterson-hart, “usability test of vufind as a subject-based display of ebooks,” aug. 21, 2008, http://www.library.yale.edu/usability/studies/summary _medical.doc (accessed apr. 27, 2010). 36. bauer and peterson-hart, “usability test of vufind as a subject-based display of ebooks,” 1. 37. ibid., 2. 38. ibid., 3. 39. ibid. 40. ibid., 4. 41. ibid. 42. ibid., 5. 43. ibid., 8. 44. kristin antelman, andrew k. pace, and emily lynema, “toward a twenty-first century library catalog,” information technology & libraries 25, no. 3 (2006): 128–39. 45. ibid., 139. 46. ibid., 133. 47. ibid., 135. 48. ibid., 136. 49. jennifer l. ward, steve shadle, and pam mofield, “user experience, feedback, and testing,” library technology reports 44, no. 6 (aug. 2008): 22. 50. 
english et al., “flexible search and navigation.” 51. peter ingwersen and irene wormell, “ranganathan in the perspective of advanced information retrieval,” libri 42 (1992): 184–201; winfried godert, “facet classification in online retrieval,” international classification 18, no. 2 (1991): 98–109.; w. godert, “klassificationssysteme und online-katalog [classification systems and the online catalogue],” zeitschrift für bibliothekswesen und bibliographie 34, no. 3 (1987): 185–95. 52. yee et al., “faceted metadata for image search and browsing”; english et al., “flexible search and navigation.” generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 171 from previous experience and from research in software engineering. wasted effort and poor interoperability can therefore ensue, raising the costs of dls and jeopardizing the fluidity of information assets in the future. in addition, there is a need for modeling services and data structures as highlighted in the “digital library reference model” proposed by the delos eu network of excellence (also called the “delos manifesto”);2 in fact, the distribution of dl services over digital networks, typically accessed through web browsers or dedicated clients, makes the whole theme of interaction between users important, for both individual usage and remote collaboration. designing and modeling such interactions call for considerations pertaining to the fields of human– computer interaction (hci) and computer-supported cooperative work (cscw). as an example, scenariobased or activity-based approaches developed in the hci area can be exploited in dl design. to meet these needs we developed cradle (cooperative-relational approach to digital library environments),3 a metamodel-based digital library management system (dlms) supporting collaboration in the design, development, and use of dls, exploiting patterns emerging from previous projects. the entities of the cradle metamodel allow the specification of collections, structures, services, and communities of users (called “societies” in cradle) and partially reflect the delos manifesto. the metamodel entities are based on existing dl taxonomies, such as those proposed by fox and marchionini,4 gonçalves et al.,5 or in the delos manifesto, so as to leverage available tools and knowledge. designers of dls can exploit the domain-specific visual language (dvsl) available in the cradle environment—where familiar entities extracted from the referred taxonomies are represented graphically—to model data structures, interfaces and services offered to the final users. the visual model is then processed and transformed, exploiting suitable templates, toward a set of specific languages for describing interfaces and services. the results are finally transformed into platformindependent (java) code for specific dl applications. cradle supports the basic functionalities of a dl through interfaces and service templates for managing, browsing, searching, and updating. these can be further specialized to deploy advanced functionalities as defined by designers through the entities of the proposed visual the design and development of a digital library involves different stakeholders, such as: information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. to this end, high-level, language-neutral models have to be devised. 
metamodeling techniques favor the definition of domainspecific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. this paper describes cradle (cooperative-relational approach to digital library environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. a collection of tools allows the automatic generation of several services, defined with the cradle visual language, and of the graphical user interfaces providing access to them for the final user. the effectiveness of the approach is illustrated by presenting digital libraries generated with cradle, while the cradle environment has been evaluated by using the cognitive dimensions framework. d igital libraries (dls) are rapidly becoming a preferred source for information and documentation. both at research and industry levels, dls are the most referenced sources, as testified by the popularity of google books, google video, ieee explore, and the acm portal. nevertheless, no general model is uniformly accepted for such systems. only few examples of modeling languages for developing dls are available,1 and there is a general lack of systems for designing and developing dls. this is even more unfortunate because different stakeholders are interested in the design and development of a dl, such as information architects, to librarians, to software engineers, to experts of the specific domain served by the dl. these categories may have contrasting objectives and views when deploying a dl: librarians are able to deal with faceted categories of documents, taxonomies, and document classification; software engineers usually concentrate on services and code development; information architects favor effectiveness of retrieval; and domain experts are interested in directly referring to the content of interest without going through technical jargon. designers of dls are most often library technical staff with little to no formal training in software engineering, or computer scientists with little background in the research findings of hypertext information retrieval. thus dl systems are usually built from scratch using specialized architectures that do not benefit alessio malizia (alessio.malizia@uc3m.es) is associate professor, universidad carlos iii, department of informatics, madrid, spain; paolo bottoni (bottoni@di.uniroma1.it) is associate professor and s. levialdi (levialdi@di.uniroma1.it) is professor, “sapienza” university of rome, department of computer science, rome, italy. alessio malizia, paolo bottoni, and s. levialdi generating collaborative systems for digital libraries: a model-driven approach 172 information technology and libraries | december 2010 a formal foundation for digital libraries, called 5s, based on the concepts of streams, (data) structures, (resource) spaces, scenarios, and societies. while being evidence of a good modeling endeavor, the approach does not specify formally how to derive a system implementation from the model. 
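as a rough illustration of the kind of entities the cradle metamodel names (collections, structures, services, and societies of users), the following python sketch declares simplified stand-ins and wires a search service to a collection; the class names, fields, and behavior are invented for illustration and do not reproduce the actual cradle metamodel, its visual language, or its generated java code.

```python
# hypothetical, simplified stand-ins for the metamodel entities named in the
# text (collections, structures, services, societies); not the cradle code.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Society:
    """a community of users sharing a role (e.g., librarians, patrons)."""
    name: str
    members: List[str] = field(default_factory=list)

@dataclass
class Collection:
    """an organized set of resources with a declared record structure."""
    name: str
    structure: Dict[str, str]            # field name -> field type
    items: List[dict] = field(default_factory=list)

@dataclass
class Service:
    """an operation (browse, search, update) offered to one or more societies."""
    name: str
    operates_on: Collection
    offered_to: List[Society]
    handler: Callable[[Collection, str], List[dict]]

def search(collection: Collection, query: str) -> List[dict]:
    # naive keyword match over every field of every record
    q = query.lower()
    return [item for item in collection.items
            if any(q in str(value).lower() for value in item.values())]

papers = Collection("papers", {"title": "string", "author": "string"},
                    [{"title": "faceted browsing", "author": "fagan"}])
patrons = Society("patrons", ["alice"])
search_service = Service("search", papers, [patrons], search)
print(search_service.handler(papers, "faceted"))
```

in the framework described by the authors, declarations of this kind are drawn with the visual language and then transformed by templates into interface and service code rather than written by hand.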
the new generation of dl systems will be highly distributed, providing adaptive and interoperable behaviour by adjusting their structure dynamically, in order to act in dynamic environments (e.g., interfacing with the physical world).13 to manage such large and complex systems, a systematic engineering approach is required, typically one that includes modeling as an essential design activity where the availability of such domain-specific concepts as first-class elements in dl models will make application specification easier.14 while most of the disciplines related to dls—e.g., databases,15 information retrieval,16 and hypertext and multimedia17—have underlying formal models that have properly steered them, little is available to formalize dls per se. wang described the structure of a dl system as a domain-specific database together with a user interface for querying the records stored in the database.18 castelli et al. present an approach involving multidimensional query languages for searching information in dl systems that is based on first-order logic.19 these works model metadata specifications and thus are the main examples of system formalization in dl environments. cognitive models for information retrieval, as used for example by oddy et al.,20 focus on users’ information-seeking behavior (i.e., formation, nature, and properties of a users’ information need) and on how information retrieval systems are used in operational environments. other approaches based on models and languages for describing the entities involved in a dl are the digital library definition language,21 the dspace data model22 (with the definitions of communities and workflow models), the metis workflow framework,23 and the fedora structoid approach.24 e/r approaches are frequently used for modeling database management system (dbms) applications,25 but as e/r diagrams only model the static structure of a dbms, they generally do not deal deeply with dynamic aspects. temporal extensions add dynamic aspects to the e/r approach, but most of them are not object-oriented.26 the advent of object-oriented technology calls for approaches and tools to information system design resulting in object-oriented systems. these considerations drove research toward modeling approaches as supported by uml.27 however, since the uml metamodel is not yet widespread in the dl community, we adopted the e/r formalism and complemented it with the specification of the dynamics made available through the user interface, as described by malizia et al.28 using the metamodel, we have defined a dsvl, including basic entities and language. cradle is based on the entity-relationship (e/r) formalism, which is powerful and general enough to describe dl models and is supported by many tools as a metamodeling language. moreover, we observed that users and designers involved in the dl environment, but not coming from a software engineering background, may not be familiar with advanced formalism like unified modeling language (uml), but they usually have basic notions on database management systems, where e/r is largely employed. ■■ literature review dls are complex information systems involving technologies and features from different areas, such as library and information systems, information retrieval, and hci. this interdisciplinary nature is well reflected in the various definitions of dls present in the literature. 
as far back as 1965, licklider envisaged collections of digital versions of scanned documents accessible via interconnected computers.6 more recently, levy and marshall described dls as sets of collections of documents, together with digital resources, accessible by users in a distributed context.7 to manage the amount of information stored in such systems, they proposed some sort of user-assisting software agent. other definitions include not only printed documents, but multimedia resources in general.8 however different the definitions may be, they all include the presence of collections of resources, their organization in structured repositories, and their availability to remote users through networks (as discussed by morgan).9 recent efforts toward standardization have been taken by public and private organizations. for example, a delphi study identified four main ingredients: an organized collection of resources, mechanisms for browsing and searching, a distributed networked environment, and a set of objectified services.10 the president’s information technology advisory committee (pitac) panel on digital libraries sees dls as the networked collections of digital text, documents, images, sounds, scientific data, and software that make up the core of today’s internet and of tomorrow’s universally accessible digital repositories of all human knowledge.11 when considering dls in the context of distributed dl environments, only few papers have been produced, contrasting with the huge bibliography on dls in general. the dl group at the universidad de las américas puebla in mexico introduced the concept of personal and group spaces, relevant to the cscw domain, in the dl system context.12 users can share information stored in their personal spaces or share agents, thus allowing other users to perform the same search on the document collections in the dl. the cited text by gonçalves et al. gives generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 173 education as discussed by wattenberg or zia.33 in the nsdl program, a new generation of services has been developed that includes support for teaching and learning; this means also considering users’ activities or scenarios and not only information access. services for implementing personal content delivery and sharing, or managing digital resources and modeling collaboration, are examples of tools introduced during this program. the virtual reference desk (vrd) is emerging as an interactive service based on dls. with vrd, users can take advantage of domain experts’ knowledge and librarians’ experience to locate information. for example, the u.s. library of congress ask a librarian service acts as a vrd for users who want help in searching information categories or to interact with expert librarians to search for a specific topic.34 the interactive and collaborative aspects of activities taking place within dls facilitate the development of user communities. social networking, work practices, and content sharing are all features that influence the technology and its use. following borgmann,35 lynch sees the future of dls not in broad services but in supporting and facilitating “customization by community,” i.e., services tailored for domain-specific work practices.36 we also examined the research agenda on systemoriented issues in dls and the delos manifesto.37 the agenda abstracts the dl life cycle, identifying five main areas, and proposes key research problems. 
in particular we tackle activities such as formal modeling of dls and their communities and developing frameworks coherent with such models. at the architectural level, one point of interest is to support heterogeneous and distributed systems, in particular networked dls and services.38 for interoperability, one of the issues is how to support and interoperate with different metadata models and standards to allow distributed cataloguing and indexing, as in the open archive initiative (oai).39 finally, we are interested in the service level of the research agenda and more precisely in web services and workflow management as crucial features when including communities and designing dls for use over networks and for sharing content. as a result of this analysis, the cradle framework features the following: ■■ a visual language to help users and designers when visual modeling their specific dl (without knowing any technical detail apart from learning how to use a visual environment providing diagrams representations of domain specific elements) ■■ an environment integrating visual modeling and code generation instead of simply providing an integrated architecture that does not hide technical details ■■ interface generation for dealing with different users relationships for modeling dl-related scenarios and activities. the need for the integration of multiple languages has also been indicated as a key aspect of the dsvl approach.29 in fact, complex domains like dls typically consist of multiple subdomains, each of which may require its own particular language. in the current implementation, the definition of dsvls exploits the metamodeling facilities of atom3, based on graph-grammars.30 atom3 has been typically used for simulation and model transformation, but we adopt it here as a tool for system generation. ■■ requirements for modeling digital libraries we follow the delos manifesto by considering a dl as an organization (possibly virtual and distributed) for managing collections of digital documents (digital contents in general) and preserving their images on storage. a dl offers contextual services to communities of users, a certain quality of service, and the ability to apply specific policies. in cradle we leave the definition of quality of service to the service-oriented architecture standards we employ and partially model the applicable policy, but we focus here on crucial interactivity aspects needed to make dls usable by different communities of users. in particular, we model interactive activities and services based on librarians’ experiences in face-to-face communication with users, or designing exchange and integration procedures for communicating between institutions and managing shared resources. while librarians are usually interested in modeling metadata across dls, software engineers aim at providing multiple tools for implementing services,31 such as indexing, querying, semantics,32 etc. therefore we provide a visual model useful for librarians and information architects to mimic the design phases they usually perform. moreover, by supporting component services, we help software engineers to specify and add services on demand to dl environments. to this end, we use a service component model. by sharing a common language, users from different categories can communicate to design a dl system while concentrating on their own tasks (services development and design for software engineers and dl design for librarians and information architects). 
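The service component model mentioned above can be made concrete with a small sketch. The following Java fragment is offered only as an illustration, not as CRADLE's actual API: the interface, record, and method names are assumptions introduced for this example, showing one way a pluggable service component that exchanges event and response messages might be declared.

import java.util.List;

// Hypothetical sketch of a service component contract of the kind implied by the
// CRADLE service model: services receive event messages and reply with responses.
// All names below are illustrative assumptions, not the framework's classes.
public final class ServiceComponentSketch {

    /** An event raised by an actor, e.g. "borrow" or "return". */
    record Event(String name, List<String> arguments) {}

    /** A response sent back to the raising actor, e.g. "found" or "not_found". */
    record Response(String name, String payload) {}

    /** Contract that every pluggable service component would satisfy. */
    interface ServiceComponent {
        String name();
        Response handle(Event event);   // synchronous ("wait") interaction
    }

    public static void main(String[] args) {
        ServiceComponent echo = new ServiceComponent() {
            public String name() { return "echo"; }
            public Response handle(Event e) { return new Response("ok", e.name()); }
        };
        System.out.println(echo.handle(new Event("borrow", List.of("book-42"))).payload());
    }
}

A librarian-facing design tool and a software engineer's service implementation would both refer to such a shared contract, which is the kind of common language the approach aims for.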
users are modeled according to the delos manifesto as dl end-users (subdivided into content creators, content consumers, and librarians), dl designers (librarians and information architects), dl system administrators (typically librarians), and dl application developers (software engineers). several activities have been started on modeling domain specific dls. as an example, the u.s. national science digital library (nsdl) program promotes educational dls and services for basic and advanced science 174 information technology and libraries | december 2010 ■■ how that information is structured and organized (structural model) ■■ the behavior of the dl (service model) and the different societies of actors ■■ groups of services acting together to carry out the dl behavior (societal model) figure 1 depicts the design approach supported by cradle architecture, namely, modeling the society of actors and services interacting in the domain-specific scenarios and describing the documents and metadata structure included with the library by defining a visual model for all these entities. the dl is built using a collection of stock parts and configurable components that provide the infrastructure for the new dl. this infrastructure includes the classes of objects and relationships that make up the dl, and processing tools to create and load the actual library collection from raw documents, as well as services for searching, browsing, and collection maintenance. finally, the code generation module generates tailored dl services code stubs by composing and specializing components from the component pool. initially, a dl designer is responsible for formalizing (starting from an analysis of the dl requirements and characteristics) a conceptual description of the dl using metamodel concepts. model specifications are then fed into a dl generator (written in python for atom3), to produce a dl tailored suitable for specific platforms and requirements. after these design phases, cradle generates the code for the user interface and the parts of code corresponding to services and actors interacting in the described society. a set of templates for code generation and designers ■■ flexible metadata definitions ■■ a set of interactive integrated tools for user activities with the generated dl system to sum up, cradle is a dlms aimed at supporting all the users involved in the development of a dl system and providing interfaces, data modeling, and services for user-driven generation of specific dls. although cradle does not yet satisfy all requirements for a generic dl system, it addresses issues focused on developing interactive dl systems, stressing interfaces and communication between users. nevertheless, we employed standards when possible to leave it open for further specification or enhancements from the dl user community. extensive use of xml-based languages allows us to change document information depending on implemented recognition algorithms so that expert users can easily model their dl by selecting the best recognition and indexing algorithms. cradle evolves from the jdan (java-based environment for document applications on networks) platform, which managed both document images and forms on the basis of a component architecture.40 jdan was based on xml technologies, and its modularity allowed its integration in service-based and grid-based scenarios. 
it supported template code generation and modeling, but it required the designer to write xml specifications and edit xml schema files in order to model the dl document types and services, thus requiring technical knowledge that should be avoided to let users concentrate on their specific domains. ■■ modeling digital library systems the cradle framework shows a unique combination of features: it is based on a formal model, exploits a set of domain-specific languages, and provides automatic code generation. moreover, fundamental roles are played by the concepts of society and collaboration.41 cradle generates code from tools built after modeling a dl (according to the rules defined by the proposed metamodel) and performs automatic transformation and mapping from model to code to generate software tools for a given dl model. the specification of a dl in cradle encompasses four complementary dimensions: ■■ multimedia information supported by the dl (collection model) figure 1. cradle architecture generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 175 socioeconomic, and environment dimensions. we now show in detail the entities and relations in the derived metamodel, shown in figure 2. actor entities actors are the users of dls. actors interact with the dl through services (interfaces) that are (or can be) affected by the actors preferences and messages (raised events). in the cradle metamodel, an actor is an entity with a behavior that may concurrently generate events. communications with other actors may occur synchronously or asynchronously. actors can relate through services to shape a digital community, i.e., the basis of a dl society. in fact, communities of students, readers, or librarians interact with and through dls, generally following predefined scenarios. as an example, societies can behave as query generator services (from the point of view of the library) and as teaching, learning, and working services (from the point of view of other humans and organizations). communication between actors within the same or different societies occur through message exchange. to operate, societies need shared data structures and message protocols, enacted by sending structured sequences of queries and retrieving collections of results. the actor entity includes three attributes: 1. role identifies which role is played by the actor within the dl society. examples of specific human roles include authors, publishers, editors, maintainers, developers, and the library staff. examples of nonhuman actors include computers, printers, telecommunication devices, software agents, and digital resources in general. 2. status is an enumeration of possible statuses for the actor: i. none (default value) ii. active (present in the model and actively generating events) iii. inactive (present in the model but not generating events) iv. sleeping (present in the model and awaiting for a response to a raised event) 3. events describes a list of events that can be raised by the actor or received as a response message from a service. examples of events are borrow, reserve, return, etc. events triggered from digital resources include store, trash, and transfer. examples of response events are found, not found, updated, etc. have been built for typical services of a dl environment. to improve acceptability and interoperability, cradle adopts standard specification sublanguages for representing dl concepts. 
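To make the actor entity more tangible, the following Java sketch mirrors the three attributes just listed (role, status, and raised events); the class, enumeration, and method names are assumptions made for illustration and are not the code generated by CRADLE.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the actor entity from the CRADLE metamodel: an actor has
// a role, a status drawn from the enumeration described in the text, and a list of
// events it can raise. Names are assumptions, not the framework's actual classes.
public class ActorSketch {

    /** Possible statuses, mirroring the enumeration in the metamodel description. */
    enum Status { NONE, ACTIVE, INACTIVE, SLEEPING }

    private final String role;           // e.g. "librarian", "student", "printer"
    private Status status = Status.NONE; // default value per the description
    private final List<String> events = new ArrayList<>(); // e.g. "borrow", "return"

    ActorSketch(String role) { this.role = role; }

    void raise(String event) {
        events.add(event);
        status = Status.SLEEPING;        // awaiting a response to the raised event
    }

    void receiveResponse(String response) {
        status = Status.ACTIVE;          // back to actively generating events
        System.out.println(role + " received response: " + response);
    }

    public static void main(String[] args) {
        ActorSketch student = new ActorSketch("student");
        student.raise("borrow");
        student.receiveResponse("is_available=true");
    }
}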
most of the cradle model primitives are defined as xml elements, possibly enclosing other sublanguages to help define dl concepts. in more detail, mime types constitute the basis for encoding elements of a collection. the xml user interface language (xul)42 is used to represent appearance and visual interfaces, and xdoclet is used in the libgen code generation module, as shown in figure 1.43 ■■ the cradle metamodel in the cradle formalism, the specification of a dl includes a collection model describing the maintained multimedia documents, a structural model of information organization, a service model for the dl behavior, and a societal model describing the societies of actors and groups of services acting together to carry out the dl behavior. a society is an instance of the cradle model defined according to a specific collaboration framework in the dl domain. a society is the highest-level component of a dl and exists to serve the information needs of its actors and to describe its context of usage. hence a dl collects, preserves, and shares information artefacts for society members. the basic entities in cradle are derived from the categorization along the actors, activities, components, figure 2. the cradle metamodel with the e/r formalism 176 information technology and libraries | december 2010 a text document, including scientific articles and books, becomes a sequence of strings. the struct entity a struct is a structural element specifying a part of a whole. in dls, structures represent hypertexts, taxonomies, relationships between elements, or containment. for example, books can be structured logically into chapters, sections, subsections, and paragraphs, or physically into cover, pages, line groups (paragraphs), and lines. structures are represented as graphs, and the struct entity (a vertex) contains four attributes: 1. document is a pointer to the document entity the structure refers to. 2. id is a unique identifier for a structure element. 3. type takes three possible values: i. metadata denotes a content descriptor, for instance title, author, etc. ii. layout denotes the associated layout, e.g., left frame, columns, etc. iii. item indicates a generic structure element used for extending the model. 4. values is a list of values describing the element content, e.g., title, author, etc. actors interact with services in an event-driven way. services are connected via messages (send and reply) and can be sequential, concurrent, or task-related (when a service acts as a subtask of a macroservice). services perform operations (e.g., get, add, and del) on collections, producing collections of documents as results. struct elements are connected to each other as nodes of a graph representing metadata structures associated with documents. the metamodel has been translated to a dsvl, associating symbols and icons with entities and relations (see “cradle language and tools” below). with respect to the six core concepts of the delos manifesto (content, user, functionality, quality, policy, and architecture), content can be modeled in cradle as collections and structs, user as actor, and functionality as service. the quality concept is not directly modeled in cradle, but for quality of service we support standard service architecture. policies can be partially modeled by services managing interaction between actors and collections, making it possible to apply standard access policies. from the architectural point of view, we follow the reference architecture of figure 1. 
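The struct entity can likewise be sketched as a node of the metadata tree attached to a document. The Java fragment below is only illustrative: the type names, the child list, and the helper method are assumptions, while the four attributes follow the description above.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the struct entity: a node of the metadata tree attached to
// a document, carrying the four attributes described in the text. Class and method
// names are assumptions made for this example.
public class StructSketch {

    enum Type { METADATA, LAYOUT, ITEM }   // the three values listed in the text

    final String documentLabel;            // pointer to the document entity
    final String id;                       // unique identifier of the node
    final Type type;
    final List<String> values;             // element content, e.g. title, author
    final List<StructSketch> children = new ArrayList<>();

    StructSketch(String documentLabel, String id, Type type, List<String> values) {
        this.documentLabel = documentLabel;
        this.id = id;
        this.type = type;
        this.values = values;
    }

    StructSketch addChild(StructSketch child) {
        children.add(child);
        return this;
    }

    public static void main(String[] args) {
        StructSketch root = new StructSketch("book-42", "root", Type.ITEM, List.of());
        root.addChild(new StructSketch("book-42", "title", Type.METADATA, List.of("Libraries of the Future")))
            .addChild(new StructSketch("book-42", "author", Type.METADATA, List.of("J. C. R. Licklider")));
        System.out.println("metadata nodes under root: " + root.children.size());
    }
}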
■■ cradle language and tools in this section we describe the selection of languages and tools of the cradle platform. to improve interoperability service entities services describe scenarios, activities, operations, and tasks that ultimately specify the functionalities of a dl, such as collecting, creating, disseminating, evaluating, organizing, personalizing, preserving, requesting, and selecting documents and providing services to humans concerned with fact-finding, learning, gathering, and exploring the content of a dl. all these activities can be described and implemented using scenarios and appear in the dl setting as a result of actors using services (thus societies). furthermore, these activities realize and shape relationships within and between societies, services, and structures. in the cradle metamodel, the service entity models what the system is required to do, in terms of actions and processes, to achieve a task. a detailed task analysis helps understand the current system and the information flow within it in order to design and allocate tasks appropriately. the service entity has four attributes: 1. name is a string representing a textual description of the service. 2. sync states whether communication is synchronous or asynchronous, modeled by values wait and nowait, respectively. 3. events is a list of messages that can trigger actions among services (tasks); for example, valid or notvalid in case of a parsing service. 4. responses contain a list of response messages that can reply to raised events; they are used as a communication mechanism by actors and services. the collection entity collections are sets of documents of arbitrary type (e.g., bits, characters, images, etc.) used to model static or dynamic content. in the static interpretation, a collection defines information content interpreted as a set of basic elements, often of the same type, such as plain text. examples of dynamic content include video delivered to a viewer, animated presentations, and so on. the attributes of collection are name and documents. name is a string, while documents is a list of pairs (documentname, documentlabel), the latter being a pointer to the document entity. the document entity documents are the basic elements in a dl and are modeled with attributes label and structure. label defines a textual string used by a collection entity to refer to the document. we can consider it as a document identifier, specifying a class or a type of document. structure defines the semantics and area of application of the document. for example, any textual representation can be seen as a string of characters, so that generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 177 graphs. model manipulation can then be expressed via graph grammars also specified in atom3. the general process of automatic creation of cooperative dl environments for an application is shown in figure 3. initially, a designer formalizes a conceptual description of the dl using the cradle metamodel concepts. this phase is usually preceded by an analysis of requirements and interaction scenarios, as seen previously. model specifications are then provided to a dl code generator (written in python within atom3) to produce dls tailored to specific platforms and requirements. these are built on a collection of templates of services and configurable components providing infrastructure for the new dl. 
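A compact way to read the service, collection, and document entities is as plain data holders carrying the attributes just described. The following sketch is an illustration under stated assumptions (the names and types are ours, not CRADLE's generated classes), showing how the four service attributes, the (documentName, documentLabel) pairs of a collection, and the label/structure pair of a document fit together.

import java.util.List;
import java.util.Map;

// Illustrative data holders for the service, collection, and document entities as
// described in the text: a service has a name, a sync mode (wait/nowait), and lists
// of events and responses; a collection has a name and (documentName, documentLabel)
// pairs; a document has a label and a structure. All names here are assumptions.
public class EntitySketches {

    enum Sync { WAIT, NOWAIT }

    record Service(String name, Sync sync, List<String> events, List<String> responses) {}

    record Document(String label, String structure) {}

    /** documents maps documentName -> documentLabel, mirroring the pair list. */
    record Collection(String name, Map<String, String> documents) {}

    public static void main(String[] args) {
        Service search = new Service("do_search", Sync.WAIT,
                List.of("borrow_request"), List.of("is_available"));
        Collection library = new Collection("library",
                Map.of("licklider1965", "book-42"));
        Document doc = new Document("book-42", "text");
        System.out.println(search.name() + " over " + library.name()
                + " containing " + doc.label());
    }
}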
the sketched infrastructure includes classes for objects (tasks), relationships making up the dl, and processing tools to upload the actual library collection from raw documents, as well as services for searching and browsing and for document collections maintenance. the cradle generator automatically generates different kinds of output for the cradle model of the cooperative dl environment, such as service and collection managers. collection managers define the logical schemata of the dl, which in cradle correspond to a set of mime types, xul and xdoclet specifications, representing digital objects, their component parts, and linking information. collection managers also store instances of their and collaboration, cradle makes extensive use of existing standard specification languages. most cradle outputs are defined with xml-based formats, able to enclose other specific languages. the basic languages and corresponding tools used in cradle are the following: ■■ mime type. multipurpose internet mail extensions (mime) constitute the basis for encoding documents in cradle, supporting several file formats and types of character encoding. mime was chosen because of wide availability of mime types, and standardisation of the approach. this makes it a natural choice for dls where different types of documents need to be managed (pdf, html, doc, etc.). moreover, mime standards for character encoding descriptions help keeping the cradle framework open and compliant with standards. ■■ xul. the xml user interface language (xul) is an xml-based markup language used to represent appearance and visual interfaces. xul is not a public standard yet, but it uses many existing standards and technologies, including dtd and rdf,44 which makes it easily readable for people with a background in web programming and design. the main benefit of xul is that it provides a simple definition of common user interface elements (widgets). this drastically reduces the software development effort required for visual interfaces. ■■ xdoclet. xdoclet is used for generating services from tagged-code fragments. it is an open-source code generation library which enables attribute-oriented programming for java via insertion of special tags.45 it includes a library of predefined tags, which simplify coding for various technologies, e.g., web services. the motivation for using xdoclet in the cradle framework is related to its approach for template code generation. designers can describe templates for each service (browse, query, and index) and the xdoclet generated code can be automatically transformed into the java code for managing the specified service. ■■ atom3. atom3 is a metamodeling system to model graphical formalisms. starting from a metaspecification (in e/r), atom3 generates a tool to process models described in the chosen formalism. models are internally represented using abstract syntax figure 3. cooperative dl generation process with cradle framework 178 information technology and libraries | december 2010 and (3) the metadata operations box. the right column manages visualization and multimedia information obtained from documents. the basic features provided with the ui templates are document loading, visualization, metadata organization, and management. the layout template, in the collection box, manages the visualization of the documents contained in a collection, while the visualization template works according to the data (mime) type specified by the document. 
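The choice of viewer in the document area is driven by the document's MIME type. The fragment below sketches that dispatch in Java purely for illustration; the mapping, the viewer identifiers, and the fallback are assumptions, since in CRADLE the behaviour is expressed by specialising XUL templates rather than by hand-written code.

import java.util.Map;

// Illustrative sketch of MIME-type-driven selection of a document viewer, the kind
// of decision the visualization template makes for the document area of the
// generated interface. The viewer identifiers and the mapping are assumptions.
public class ViewerDispatchSketch {

    private static final Map<String, String> VIEWER_BY_MIME = Map.of(
            "application/pdf", "pdf-plugin",
            "text/html", "html-frame",
            "image/png", "image-box");

    static String viewerFor(String mimeType) {
        // fall back to a generic viewer when the type is not recognised
        return VIEWER_BY_MIME.getOrDefault(mimeType, "generic-viewer");
    }

    public static void main(String[] args) {
        System.out.println(viewerFor("application/pdf")); // pdf-plugin
        System.out.println(viewerFor("audio/ogg"));       // generic-viewer
    }
}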
actually, by selecting a document included in the collection, the corresponding data file is automatically uploaded and visualized in the ui. the metadata visualization in the code template reflects the metadata structure (a tree) represented by a struct, specifying the relationship between parent and child nodes. thus the xul template includes an area (the metadata box) for managing tree structures as described in the visual model of the dl. although the tree-like visualization has potential drawbacks if there are many metadata items, there should be no real concern with medium loads. the ui template also includes a box to perform operations on metadata, such as insert, delete, and edit. users can select a value in the metadata box and manipulate the presented values. figure 4 shows an example of a ui generated from a basic template. service templates to achieve automated code generation, we use xdoclet to specify parameters and service code generation according to such parameters. cradle can automatically annotate java files with name–value pairs, and xdoclet provides a syntax for parameter specification. code generation is classes and function as search engines for the system. services classes also are generated and are represented as attribute-oriented classes involving parts and features of entities. ■■ cradle platform the cradle platform is based on a model-driven approach for the design and automatic generation of code for dls. in particular, the dsvl for cradle has four diagram types (collection, structure, service, and actor) to describe the different aspects of a dl. in this section we describe the user interface (ui) and service templates used for generating the dl tools. in particular, the ui layout is mainly generated from the structured information provided by the document, struct, and collection entities. the ui events are managed by invoking the appropriate services according to the imported xul templates. at the service and communication levels, the xdoclet code is generated by the service and actor entities, exploiting their relationships. we also show how code generation works and the advanced platform features, such as automatic service discovery. at the end of the section a running example is shown, representing all the phases involved in using the cradle framework for generating the dl tools for a typical library scenario. user interface templates the generation of the ui is driven by the visual model designed by the cradle user. specifically, the model entities involved in this process are document, struct and collection (see figure 2) for the basic components and layout of the interfaces, while linked services are described in the appropriate templates. the code generation process takes place through transformations implemented as actions in the atom3 metamodel specification, where graph-grammar rules may have a condition that must be satisfied for the rule to be applied (preconditions), as well as actions to be performed when the rule is executed (postconditions). a transformation is described during the visual modeling phase in terms of conditions and corresponding actions (inserting xul language statements for the interface in the appropriate code template placeholders). the generated user interface is built on a set of xul template files that are automatically specialized on the basis of the attributes and relationships designed in the visual modeling phase. the layout template for the user interface is divided into two columns (see figure 4). 
the left column is made of three boxes: (1) the collection box (2) the metadata box, figure 4. an example of an automatically generated user interface. (a) document area; (b) collection box; (c) metadata box; (d) metadata operations box. generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 179 "msg arguments.argname"> { "" , "" "" } , }; the first two lines declare a class with a name class nameimpl that extends the class name. the xdoclet template tag xdtclass:classname denotes the name of the class in the annotated java file. all standard xdoclet template tags have a namespace starting with “xdt.” the rest of the template uses xdtfield : forallfield to iterate through the fields. for each field with a tag named msg arguments.argname (checked using xdtfield : ifhasfieldtag), it creates a subarray of strings using the values obtained from the field tag parameters. xdtfield : fieldname gives the name of the field, while xdtfield : fieldtagvalue retrieves the value of a given field tag parameter. characters that are not part of some xdoclet template tags are directly copied into the generated code. the following code segment was generated by xdoclet using the annotated fields and the above template segment: public class msgargumentsimpl extends msgarguments { public static string[ ][ ] argumentnames = new string[ ][ ]{ { "eventmsg" , " event " , " eventstring " } , { " responsemsg " , " response " , " responsestring " } , }; } similarly, we generate the getter and setter methods for each field: public get () { return ; } public void set ( string value ) { based on code templates. hence service templates are xdoclet templates for transforming xdoclet code fragments obtained from the modeled service entities. the basic xdoclet template manages messages between services, according to the event and response attributes described in “cradle language and tools” above. in fact, cradle generates a java application (a service) that needs to receive messages (event) and reply to them (response) as parameters for the service application. in xdoclet, these can be attached to the corresponding field by means of annotation tags, as in the following code segments: public class msgarguments { . . . . . . /* * @msg arguments.argname name="event " desc="event_string " */ protected string eventmsg = null; /* * @msg arguments.argname name="response" * desc="response_string " */ protected string responsemsg = null; } each msg arguments.argname related to a field is called a field tag. each field tag can have multiple parameters, listed after the field tag. in the tag name msg arguments .argname, the prefix serves as the namespace of all tags for this particular xdoclet application, thus avoiding naming conflicts with other standard or customized xdoclet tags. not only fields can be annotated, but also other entities such as class and functions can have tags too. xdoclet enables powerful code generation requiring little or no customization (depending on how much is provided by the template). the type of code to be generated using the parameters is defined by the corresponding xdoclet template. we have created template files composed of java codes and special xdoclet instructions in the form of xml tags. these xdoclet instructions allow conditionals (if) and loops (for), thus providing us with expressive power close to a programming language. 
in the following example, we first create an array containing labels and other information for each argument: public class impl extends { public static string[ ][ ] argumentnames = new string[ ][ ] { " , value ) ; }< /xdtfield : ifhasfieldtag> this translates into the following generated code: public java.lang.string get eventmsg ( ) { return eventmsg ; } public void set eventmsg ( string value ) { setvalue ( "eventmsg" , value ) ; } public java.lang.string getresponsemsg ( ) { return getresponsemsg ; } public void setresponsemsg ( string value ) { setvalue ( " responsemsg " , value ) ; } the same template is used for managing the name and sync attributes of service entities. code generation, service discovery, and advanced features a service or interface template only describes the solution to a particular design problem—it is not code. consequently, users will find it difficult to make the leap from the template description to a particular implementation even though the template might include sample code. others, like software engineers, might have no trouble translating the template into code, but they still may find it a chore, especially when they have to do it repeatedly. the cradle visual design environment (based on atom3) helps alleviate these problems. from just a few pieces of information (the visual model), typically application-specific names for actors and services in a dl society along with choices for the design tradeoffs, the tool can create class declarations and definitions implementing the template. the ultimate goal of the modeling effort remains, however, the production of reliable and efficiently executable code. hence a code generation transformation produces interface (xul) and service (java code from xdoclet templates) code from the dl model. we have manually coded xul templates specifying the static setup of the gui, the various widgets and their layout. this must be complemented with code generated from a dl model of the systems dynamics coded into services. while other approaches are possible,46 we employed the solution implemented within the atom3 environment according to its graph grammar modeling approach to code generation. cradle supports a flexible iterative process for visual design and code generation. in fact, a design change might require substantial reimplementation generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 181 selecting one, the ui activates the metadata operations box—figure 6(d). the selected metadata node will then be presented in the lower (metadata operations) box, labeled “set metadata values,” replacing the default “none” value as shown in figure 6. after the metadata item is presented, the user can edit its value and save it by clicking on the “set value” button. the associated action saves the metadata information and causes its display in the intermediate box (tree-like structure), changing the visualization according to the new values. the code generation process for the do_search and front desk services is based on xdoclet templates. in particular, a message listener template is used to generate the java code for the front desk service. in fact, the front desk service is asynchronous and manages communications between actors. the actors classes are generated also by using the services templates since they have attributes, events, and messages, just like the services. 
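For readability, the following is a compilable rendering of the kind of class the message-argument template produces, with the argument-names array and the accessor methods reconstructed from the fragments quoted above. The extends clause to the annotated base class is omitted and the setValue helper is a stand-in added so that the sketch is self-contained, so this should be read as an approximation rather than the exact generated output.

// Compilable approximation of the generated message-argument class: the
// argument-names array and accessors follow the fragments quoted in the text,
// while the setValue helper is a minimal assumption added for self-containment.
public class MsgArgumentsImpl {

    public static String[][] argumentNames = new String[][] {
            { "eventMsg",    "event",    "event_string" },
            { "responseMsg", "response", "response_string" },
    };

    protected String eventMsg = null;
    protected String responseMsg = null;

    // stand-in for the framework's generic setter used by the generated code
    protected void setValue(String field, String value) {
        if ("eventMsg".equals(field)) {
            eventMsg = value;
        } else if ("responseMsg".equals(field)) {
            responseMsg = value;
        }
    }

    public String getEventMsg() { return eventMsg; }

    public void setEventMsg(String value) { setValue("eventMsg", value); }

    public String getResponseMsg() { return responseMsg; }

    public void setResponseMsg(String value) { setValue("responseMsg", value); }

    public static void main(String[] args) {
        MsgArgumentsImpl msg = new MsgArgumentsImpl();
        msg.setEventMsg("borrow");
        System.out.println(msg.getEventMsg());
    }
}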
the do_search service code is based on the producer and consumer templates, since it is synchronous by definition in the modeled scenario. a get method retrieving a collection of documents is implemented from the getter template. the routine invoked by the transformation action for struct entities performs a breadth-first exploration of the metadata tree in the visual model and attaches the corresponding xul code for displaying the struct node in the correct position within the graph structure of the ui. collections, while a single rectangle connected to a collection represents a document entity; the circles linked to the document entity are the struct (metadata) entities. metadata entities are linked to the node relationships (organized as a tree) and linked to the document entity by a metadata linktype relationship. the search service is synchronous (sync attribute set to “wait”). it queries the document collection (get operation) looking for the requested document (using metadata information provided by the borrow request), and waits for the result of get (a collection of documents). based on this result, the service returns a boolean message “is_available,” which is then propagated as a response to the librarian and eventually to the student, as shown in figure 5. when the library designer has built the model, the transformation process can be run, executing the code generation actions associated with the entities and services represented in the model. the code generation process is based on template code snippets generated from the atom3 environment graph transformation engine, following the generative rules of the metamodel. we also use pre– and postconditions on application of transformation rules to have code generation depend on verification of some property. the generated ui is presented in figure 6. on the right side, the document area is presented according to the xul template. documents are managed according to their mime type: the pdf file of the example is loaded with the appropriate adobe acrobat reader plug-in. on the left column of the ui are three boxes, according to the xul template. the collection box—figure 6(b)— presents the list of documents contained in the collection specified by the documents attribute of the library collection entity, and allows users to interact with documents. after selecting a document by clicking on the list, it is presented in the document area—figure 6(a)—where it can be managed (edit, print, save, etc.). in the metadata box—figure 6(c)—the tree structure of the metadata is depicted according to the categorization modeled by the designer. the xul template contains all the basic layout and action features for managing a tree structure. the generated box contains the parent and child nodes according to the attributes specified in the corresponding struct elements. the user can click on the root for compacting or exploding the tree nodes; by figure 5. the library model, alias the model of the library society 182 information technology and libraries | december 2010 workflow system. the release collection maintains the image files in a permanent storage, while data is written to the target database or content management software, together with xml metadata snippets (e.g., to be stored in xml native dbms). a typical configuration would have the recognition service running on a server cluster, with many dataentry services running on different clients (web browsers directly support xul interfaces). 
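The breadth-first exploration performed by the struct transformation action can be sketched as follows. The node record, the element names, and the flat output format are assumptions introduced for the example; the actual generator inserts XUL statements into hand-coded template placeholders rather than printing markup directly.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustrative sketch of the breadth-first exploration of a metadata tree that the
// struct transformation action performs, emitting one XUL-like tree item per node.
// The node record and the output format are assumptions made for this example.
public class StructToXulSketch {

    record Node(String id, String label, List<Node> children) {}

    static void emitXul(Node root) {
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {                       // breadth-first order
            Node current = queue.removeFirst();
            System.out.printf("<treeitem id=\"%s\" label=\"%s\"/>%n",
                    current.id(), current.label());
            queue.addAll(current.children());
        }
    }

    public static void main(String[] args) {
        Node title = new Node("title", "Title", List.of());
        Node author = new Node("author", "Author", List.of());
        Node root = new Node("root", "Metadata", List.of(title, author));
        emitXul(root);
    }
}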
whereas current document capture environments are proprietary and closed, the definition of an xml-based interchange format allows the suitable assembly of different component-based technologies in order to define a complex framework. the realization of the jdan dl system within the cradle framework can be considered as a preliminary step in the direction of a standard multimedia document managing platform with region segmentation and classification, thus aiming at automatic recognition of image database and batch acquisition of multiple multimedia documents types and formats. personal and collaborative spaces a personal space is a virtual area (within the dl society) that is modeled as being owned and maintained by a user including resources (document collections, services, etc.), or references to resources, which are relevant to a task, or set of tasks, the user needs to carry out in the dl. personal spaces may thus contain digital documents in multiple media, personal schedules, visualization tools, and user agents (shaped as services) entitled with various tasks. resources within personal spaces can be allocated ■■ designing and generating advanced collaborative dl systems in this section we show the use of cradle as an analytical tool helpful in comprehending specific dl phenomena, to present the complex interplays that occur between cradle components and dl concepts in a real dl application, and to illustrate the possibility of using cradle as a tool to design and generate advanced tools for dl development. modeling document images collections with cradle, the designer can provide the visual model of the dl society involved in document management and the remaining phases are automatically carried out by cradle modules and templates. we have provided the user with basic code templates for the recognition and indexing services, the data-entry plug-in, and archive release. the designer can thus simply translate the particular dl society into the corresponding visual model within the cradle visual modeling editor. as a proof of concept, figure 7 models the jdan architecture, introduced in “requirements for modeling digital libraries,” exploiting the cradle visual language. the recognition service performs the automatic document recognition and stores the corresponding document images, together with the extracted metadata in the archive collection. it interacts with the scanner actor, representing a machine or a human operator that scans paper documents. designers can choose their own segmentation method or algorithm; what is required to be compliant with the framework is to produce an xdoclet template. it stores the document images into the archive collection, with its different regions layout information according to the xml metadata schema provided by the designer. if there is at least one region marked as “not interpreted,” the dataentry service is invoked on the “not interpreted” regions. the data-entry service allows operators to evaluate the automatic classification performed by the system and edit the segmentation for indexing. operators can also edit the recognized regions with the classification engine (included in the recognition service) and adjust their values and sizes. the output of this phase is an xml description that will be imported in the indexing service for indexing (and eventually querying). 
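The hand-off between the recognition and data-entry services can be summarised in a few lines. The sketch below is illustrative only: the region type, the status strings, and the method names are assumptions, but the control flow follows the description above, invoking data entry exactly when at least one region is still marked "not interpreted."

import java.util.List;

// Illustrative sketch of the hand-off described for the jdan scenario: if any
// recognised region is still marked "not interpreted", the data-entry step is
// invoked on exactly those regions; otherwise the document goes straight to
// indexing. Region, statuses, and method names are assumptions for the example.
public class RecognitionFlowSketch {

    record Region(String id, String status) {}   // e.g. "interpreted", "not interpreted"

    static void process(List<Region> regions) {
        List<Region> pending = regions.stream()
                .filter(r -> "not interpreted".equals(r.status()))
                .toList();
        if (pending.isEmpty()) {
            System.out.println("all regions interpreted: forwarding to indexing service");
        } else {
            System.out.println("invoking data-entry service on " + pending.size() + " region(s)");
        }
    }

    public static void main(String[] args) {
        process(List.of(new Region("r1", "interpreted"),
                        new Region("r2", "not interpreted")));
    }
}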
the archive collection stores all of the basic information kept in jdan, such as text labels, while the indexing service, based on a multitier architecture, exploiting jboss 3.0, has access to them. this service is responsible for turning the data fragments in the archive collection into useful forms to be presented to the final users, e.g., a report or a query result. the final stage in the recognition process could be to release each document to a content management or figure 6. the ui generated by cradle transforming the library model in xul and xdoclet code generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 183 and metadata, but also can share information with the various committees collaborating for certain tasks. ■■ evaluation in this section we evaluate the presented approach from three different perspectives: usability of the cradle notation, its expressiveness, and usability of the generated dls. usability of cradle notation we have tested it by using the well known cognitive dimensions framework for notations and visual language design.48 the dimensions are usually employed to evaluate the usability of a visual language or notation, or as heuristics to drive the design of innovative visual languages. the significant results are as follows. abstraction gradient an abstraction is a grouping of elements to be treated as one entity. in this sense, cradle is abstraction-tolerant. it provides entities for high-level abstractions of communication processes and services. these abstractions are intuitive as they are visualized as the process they represent (services with events and responses) and easy to learn as their configuration implies few simple attributes. although cradle does not allow users to build new abstractions, the e/r formalism is powerful enough to provide basic abstraction levels. closeness of mapping cradle elements have been assigned icons to resemble their real-world counterparts (e.g., a collection is represented as a set of paper sheets). the elements that do not have a correspondence with a physical object in the real world have icons borrowed from well-known notations (e.g., structs represented as graph nodes). consistency a notation is consistent if a user knowing some of its structure can infer most of the rest. in cradle, when two elements represent the same entity but can be used either as input or as output, then their shape is equal but incorporates an incoming or an outgoing message in order to differentiate them. see, for example, the icons for services or those for graph nodes representing either a according to the user’s role. for example, a conference chair would have access to conference-specific materials, visualization tools and interfaces to upload papers for review by a committee. similarly, we denote a group space as a virtual area in which library users (the entire dl society) can meet to conduct collaborative activities synchronously or asynchronously. explicit group spaces are created dynamically by a designer or facilitator who becomes (or appoints) the owner of the space and defines who the participants will be. in addition to direct user-touser communication, users should be able to access library materials and make annotations on them for every other group to see. ideally, users should be able to act (and carry dl materials with them) between personal and group spaces or among group spaces to which they belong. 
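Personal and group spaces can be pictured as simple containers of references. The following sketch is a loose illustration of the conference example discussed above; every class, field, and identifier in it is an assumption, since the paper does not publish the corresponding code.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of personal and group spaces as described in the text: a
// personal space is owned by one user and holds references to resources, while a
// group space has an owner, participants, and shared annotations on materials.
public class SpacesSketch {

    static class PersonalSpace {
        final String owner;
        final List<String> resourceRefs = new ArrayList<>(); // e.g. collection or service ids
        PersonalSpace(String owner) { this.owner = owner; }
    }

    static class GroupSpace {
        final String owner;                                   // designer or appointed facilitator
        final Set<String> participants = new LinkedHashSet<>();
        final List<String> annotations = new ArrayList<>();   // visible to every participant
        GroupSpace(String owner) { this.owner = owner; }
    }

    public static void main(String[] args) {
        PersonalSpace chair = new PersonalSpace("conference-chair");
        chair.resourceRefs.add("submitted-papers-collection");

        GroupSpace committee = new GroupSpace("conference-chair");
        committee.participants.add("reviewer-1");
        committee.annotations.add("paper-17: accept with minor revisions");
        System.out.println(chair.owner + " shares with " + committee.participants);
    }
}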
it may also be the case, however, that a given resource is referenced in several personal or group spaces. basic functionality required for personal spaces includes capabilities for viewing, launching, and monitoring library services, agents, and applications. like group spaces, personal spaces should provide users with the means to easily become aware of other users and resources that are present in a given group space at any time, as well as mechanisms to communicate with other users and make annotations on library resources. we employed this personal and group space paradigm in modeling a collaborative environment in the academic conferences domain, where a conference chair can have a personal view of the document collections (resources) figure 7. the cradle model for the jdan framwork 184 information technology and libraries | december 2010 of “sapienza” university of rome (undergraduate students), shown in figure 5, and (2) an application employed with a project of records management in a collaboration between the computer science and the computer engineering department of “sapienza” university, as shown in figure 7. usability of the generated tools environments for single-view languages generated with atom3 have been extensively used, mostly in an academic setting, in different areas like software and web engineering, modeling, and simulation; urban planning; etc. however, depending on the kind of the domain, generating the results may take some time. for instance, the state reachability analysis in the dl example takes a few minutes; we are currently employing a version of atom3 that includes petri-nets formalism where we can test the services states reachability.49 in general, from application experience, we note the general agreement that automated syntactical consistency support greatly simplifies the design of complex systems. finally, some users pointed out some technical limitations of the current implementation, such as the fact that it is not possible to open several views at a time. altogether, we believe this work contributes to make more efficient and less tedious the definition and maintenance of environments for dls. our model-based approach must be contrasted with the programmingcentric approach of most case tools, where the language and the code generation tools are hard-coded so that whenever a modification has to be done (whether on the language or on the semantic domain) developers have to dive into the code. ■■ conclusions and future work dls are complex information systems that integrate findings from disciplines such as hypertext, information retrieval, multimedia, databases, and hci. dl design is often a multidisciplinary effort, including library staff and computer scientists. wasted effort and poor interoperability can therefore ensue. examining the related bibliography, we noted that there is a lack of tools or automatic systems for designing and developing cooperative dl systems. moreover, there is a need for modeling interactions between dls and users, such as scenario or activity-based approaches. the cradle framework fulfills this gap by providing a model-driven approach for generating visual interaction tools for dls, supporting design and automatic generation of code for dls. in particular, we use a metamodel made of different diagram types (collection, structures, service, and struct or an actor, with different colors. diffuseness/terseness a notation is diffuse when many elements are needed to express one concept. 
cradle is terse and not diffuse because each entity expresses a meaning on its own. error-proneness data flow visualization reduces the chance of errors at a first level of the specification. on the other hand, some mistakes can be introduced when specifying visual entities, since it is possible to express relations between source and target models which cannot generate semantically correct code. however, these mistakes should be considered “programming errors more than slips,” and may be detected through progressive evaluation. hidden dependencies a hidden dependency is a relation between two elements that is not visible. in cradle, relevant dependencies are represented as data flows via directed links. progressive evaluation each dl model can be tested as soon as it is defined, without having to wait until the whole model is finished. the visual interface for the dl can be generated with just one click, and services can be subsequently added to test their functionalities. viscosity cradle has a low viscosity because making small changes in a part of a specification does not imply lots of readjustments in the rest of it. one can change properties, events or responses and these changes will have only local effect. the only local changes that could imply performing further changes by hand are deleting entities or changing names; however, this would imply minimal changes (just removing or updating references to them) and would only affect a small set of subsequent elements in the same data flow. visibility a dl specification consists of a single set of diagrams fitting in one window. empirically, we have observed that this model usually involves no more than fifteen entities. different, independent cradle models can be simultaneously shown in different windows. expressiveness of cradle the paper has illustrated the expressiveness of cradle by defining different entities end relationships for different dl requisites. to this end, two different applications have been considered: (1) a basic example elaborated with the collaboration of the information science school generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 185 retrieval (reading, mass.: addison-wesley, 1999). 17. d. lucarella and a. zanzi, “a visual retrieval environment for hypermedia information systems,” acm transactions on information systems 14 (1996): 3–29. 18. b. wang, “a hybrid system approach for supporting digital libraries,” international journal on digital libraries 2 (1999): 91–110,. 19. d. castelli, c. meghini, and p. pagano, “foundations of a multidimensional query language for digital libraries,” in proc. ecdl ’02, lncs 2458 (berlin: springer, 2002): 251–65. 20. r. n. oddy et al., eds., proc. joint acm/bcs symposium in information storage & retrieval (oxford: butterworths, 1981). 21. k. maly, m. zubair et al., “scalable digital libraries based on ncstrl/dienst,” in proc. ecdl ’00 (london: springer, 2000): 168–79. 22. r. tansley, m. bass and m. smith, “dspace as an open archival information system: current status and future directions,” proc. ecdl ’03, lncs 2769 (berlin: springer, 2003): 446–60. 23. k. m. anderson et al., “metis: lightweight, flexible, and web-based workflow services for digital libraries,” proc. 3rd acm/ieee-cs jcdl ’03 (los alamitos, calif.: ieee computer society, 2003): 98–109. 24. n. dushay, “localizing experience of digital content via structural metadata,” in proc. 2nd acm/ieee-cs jcdl ’02 (new york: acm, 2002): 244–52. 25. m. 
gogolla et al., “integrating the er approach in an oo environment,” proc. er, ’93 (berlin: springer, 1993): 376–89. 26. heidi gregersen and christian s. jensen, “temporal entity-relationship models—a survey,” ieee transactions on knowledge & data engineering 11 (1999): 464–97. 27. b. berkem, “aligning it with the changes using the goal-driven development for uml and mda,” journal of object technology 4 (2005): 49–65. 28. a. malizia, e. guerra, and j. de lara, “model-driven development of digital libraries: generating the user interface,” proc. mddaui ’06, http://sunsite.informatik.rwth-aachen.de/ publications/ceur-ws/vol-214/ (accessed oct 18, 2010). 29. d. l. atkins et al., “mawl: a domain-specific language for form-based services,” ieee transactions on software engineering 25 (1999): 334–46. 30. j. de lara and h. vangheluwe, “atom3: a tool for multi-formalism and meta-modelling,” proc. fase ’02 (berlin: springer, 2002): 174–88. 31. j. m. morales-del-castillo et al., “a semantic model of selective dissemination of information for digital libraries,” journal of information technology & libraries 28 (2009): 21–30. 32. n. santos, f. c. a. campos, and r. m. m. braga, “digital libraries and ontology,” in handbook of research on digital libraries: design, development, and impact, ed. y.-l. theng et al. (hershey, pa.: idea group, 2008): 1:19. 33. f. wattenberg, “a national digital library for science, mathematics, engineering, and technology education,” d-lib magazine 3 no. 10 (1998), http://www.dlib.org/dlib/october98/ wattenberg/10wattenberg.html (accessed oct 18, 2010); l. l. zia, “the nsf national science, technology, engineering, and mathematics education digital library (nsdl) program: new projects and a progress report,” d-lib magazine, 7, no. 11 (2002), http://www.dlib.org/dlib/november01/zia/11zia.html (accessed oct 18, 2010). 34. u.s. library of congress, ask a librarian, http://www.loc society), which describe the different aspects of a dl. we have built a code generator able to produce xul code from the design models for the dl user interface. moreover, we use template code generation integrating predefined components for the different services (xdoclet language) according to the model specification. extensions of cradle with behavioral diagrams and the addition of analysis and simulation capabilities are under study. these will exploit the new atom3 capabilities for describing multiview dsvls, to which this work directly contributed. references 1. a. m. gonçalves, e. a fox, “5sl: a language for declarative specification and generation of digital libraries,” proc. jcdl ’02 (new york: acm, 2002): 263–72. 2. l. candela et al., “setting the foundations of digital libraries: the delos manifesto,” d-lib magazine 13 (2007), http://www.dlib.org/dlib/march07/castelli/03castelli.html (accessed oct 18, 2010). 3. a. malizia et al., “a cooperative-relational approach to digital libraries,” proc. ecdl 2007, lncs 4675 (berlin: springer, 2007): 75–86. 4. e. a. fox and g. marchionini, “toward a worldwide digital library,” communications of the acm 41 (1998): 29–32. 5. m. a. gonçalves et al., “streams, structures, spaces, scenarios, societies (5s): a formal model for digital libraries,” acm transactions on information systems 22 (2004): 270–312. 6. j. c. r. licklider, libraries of the future (cambridge, mass.: mit pr., 1965). 7. d. m. levy and c. c. marshall, “going digital: a look at assumptions underlying digital libraries,” communications of the acm 38 (1995): 77–84. 8. r. reddy and i. 
public libraries and internet access across the united states: a comparison by state 2004–2006
paul t. jaeger, john carlo bertot, charles r. mcclure, and miranda rodriguez
paul t. jaeger (pjaeger@umd.edu) is an assistant professor at the college of information studies at the university of maryland; john carlo bertot (bertot@ci.fsu.edu) is professor and associate director of the information use management and policy institute, college of information, florida state university; charles r. mcclure (cmcclure@ci.fsu.edu) is francis eppes professor and director of the information use management and policy institute, college of information, florida state university; and miranda rodriguez (mrodrig08@umd.edu) is a graduate student in the college of information studies at the university of maryland.
drawing upon findings from a national survey of u.s. public libraries, this paper examines trends in internet and public computing access in public libraries across states from 2004 to 2006. based on library-supplied information about levels and types of internet and public computing access, the authors offer insights into the network-based content and services that public libraries provide. examining data from 2004 to 2006 reveals trends and accomplishments in certain states and geographic regions. this paper details and discusses the data, identifies and analyzes issues related to internet access, and suggests areas for future research. this article presents findings from the 2004 and 2006 public libraries and the internet studies detailing the different levels of internet access available in public libraries in different states.1 at this point, 98.9 percent of public library branches are connected to the internet and 98.4 percent of connected public library branches offer public internet access.2 however, the types of access and the quality of access available are not uniformly distributed among libraries or among the libraries in various states. while the data at the national level paint a portrait of the internet and public computing access provided by public libraries overall, studies of these differences among the states can help reveal successes and lessons that may help libraries in other states to increase their levels of access. the need to continue to increase the levels and quality of internet and public computing access in public libraries is not an abstract problem. the services and content available on the internet continue to require greater bandwidth and computing capacity, so public libraries must address ever-increasing technological demands on the internet and computing access that they provide.3 public libraries are also facing increased external pressure on their internet and computing access. as patrons have come to rely on the availability of internet and computing access in public libraries, so too have government agencies. many federal, state, and local government agencies now rely on public libraries to facilitate citizens' access to e-government services, such as applying for the federal prescription drug plans, filing taxes, and many other interactions with the government.4 further, public libraries also face increased demands to supply public access computing in times of natural disasters, such as the major hurricanes of 2004 and 2005.5 as a result, both patrons and government agencies depend on the internet and computing access provided by public libraries, and each group has different, but interrelated, expectations of what kinds of access public libraries should provide. however, the data indicate that public libraries are at capacity in meeting some of these expectations, while some libraries lack the funding, technology-support capacity, space, and infrastructure (e.g., power, cabling) to reach the expectations of each respective group.
as public libraries (and the internet and public computing access they provide) continue to fill more social roles and expectations, a range of new ideas and strategies can be considered by public libraries to identify successful methods for providing access that is high quality and sufficient to meet the needs of patrons and community. the goals of the public libraries and the internet studies have been to help provide an understanding of the issues and needs of libraries associated with providing internet-based services and resources. the 2006 public libraries and the internet study employed a web-based survey approach to gather both quantitative and qualitative data from a sample of the 16,457 public library outlets in the united states.6 a sample was drawn to accurately represent metropolitan status (roughly equating to their designation of urban, suburban, or rural libraries), poverty levels (as derived through census data), state libraries, and the national picture, producing a sample of 6,979 public library outlets.7 the survey received a total of 4,818 responses for a response rate of 69 percent. the data in this article, unless otherwise noted, are drawn from the 2004 and 2006 public libraries and the internet studies.8 while the survey received responses from libraries in all fifty states, there were not enough responses in all states from which to present state-level findings. the study was able to provide state-level analysis for thirty-five states (including washington, d.c.) in 2004 and forty-four states at the outlet level (including washington, d.c.) and forty-two states at the system level (including washington, d.c.) in 2006. in addition, there was some variance in states with adequate responses between the 2004 and 2006 studies. a full listing of the states is available in the final reports of the 2004 and 2006 studies at http://www.ii.fsu.edu/plinternet_reports.cfm. thus, the findings below reflect only those states for which both the 2004 and 2006 studies were able to provide analysis.
■ public libraries and the internet across the states
overview of 2004 to 2006
as the public libraries and the internet studies have been ongoing since 1994, the questions asked in the biennial studies have evolved along with the provision of internet access in libraries. the questions have varied between surveys, but there have been consistent questions that allow for longitudinal analysis at the national level. the 2004 study introduced the analysis of the data at both the national and the state levels.
with both the 2004 and 2006 studies providing data at the state level, some longitudi­ nal analysis at the state level is now possible. overall, there were a number of areas of consistent data across the states from 2004 to 2006. most states had fairly similar, if not identical, percentages of library outlets offering public internet access between 2004 and 2006. for the most part, changes were increases in the percentage of library outlets offering patron access. further, the average number of hours open per week in 2004 (44.5) and in 2006 (44.8) were very similar, as were the percentages of library outlets reporting increases in hours per week, decreases in hours per week, and no changes in hours per week. while these numbers are consistent, it is not known whether this average number of hours open, or the distribution of the hours open across the week, is sufficient to meet patron needs in most communities. data across the states also indicated that physical space is the primary reason for the inability of libraries to add more workstations within the library building. there was also consistency in the findings related to upgrades and replacement schedules. changes and continuities from 2004 to 2006 while the items noted above show some areas of stability in the internet access provided by public libraries across the states, insights are possible in the areas of change for libraries overall or in the libraries that are leading in particular areas. table 1 details the states with the highest average number of hours open per public library outlet in 2004 and 2006. between 2004 and 2006, the national average for the number of hours open increased slightly from 44.5 hours per week to 44.8 hours per week. this increase is reflected in the numbers for the individual states in 2006, which are generally slightly higher than the numbers for the individual states in 2004. for example, the top state in 2006 averaged 55.7 hours per outlet each week, while the top state in 2004 averaged 54.8 hours. the top four states—ohio, new jersey, florida, and virginia—were the same in both years, though with the top two switching positions. this demonstrates a continuing commitment in these four states by state and local government to ensure wide access to public librar­ ies. these states are also ones with large populations and state budgets, presumably fueling the commitment and facilitating the ability to keep libraries open for many hours each week. while the needs of patrons in other states are no less significant, the data indicate that states with larger populations and higher budgets, not surpris­ ingly, may be best positioned to provide the highest levels of access to public libraries for state residents. the other six states in the 2006 top ten were not in the 2004 top ten. the primary reason for this is that the six states in 2006 increased their hours more than other states. note that the fifth­ranked state in 2004, south carolina, averaged 49 hours per outlet each week, which is less than the tenth­ranked state in 2006, illinois, at 49.5 hours. simply by maintaining the average number of hours open per outlet between 2004 and 2006, south carolina fell from fifth to out of the top ten. these differ­ ences are reflected in the fact that there is nearly a ten­ hour difference from first place to tenth place in 2004; yet only a six­hour discrepancy exists from first place to tenth in 2006. 
these numbers suggest that hours of operation may change frequently for many libraries, indicating the need for future evaluations of operational hours in rela­ tion to meeting patron demand. table 2 displays the states with the highest average number of public access workstations per public library in 2004 and 2006. the national averages between 2004 and 2006 also showed a slight increase from 10.4 workstations table 1. highest average number of hours open in public library outlets by state in 2004 and 2006 2004 2006 1. new jersey 54.8 1. ohio 55.7 2. ohio 54.6 2. new jersey 55.6 3. florida 52.4 3. florida 52.3 4. virginia 51.3 4. virginia 52.3 5. south carolina 49.0 5. indiana 51.9 6. utah 48.0 6. pennsylvania 50.6 7. new mexico 47.4 7. washington, d.c. 50.6 8. rhode island 47.3 8. maryland 50.0 9. alabama 46.9 9. connecticut 49.8 10. new york 46.2 10. illinois 49.5 national: 44.5 national: 44.8 in 2004 to 10.7 workstations in 2006. a key reason for this slow growth in the number of workstations appears to have a great deal to do with limitations of physical space in libraries; in spite of increasing demands, space con­ straints often limit computer capacity.9 unlike table 1, the comparisons between 2004 and 2006 in table 2 do not show across­the­board increases from 2004 to 2006. in fact, florida had the highest average of workstations per library outlet in both 2004 and 2006, but the average number decreased from 22.6 in 2004 to 21.7 in 2006. it is interesting to note that florida has a significantly higher number of workstations than the next highest state in both 2004 and 2006. in contrast, many of the states in the lower half of the top ten in 2004 had sub­ stantially lower average numbers of workstations in 2004 than in 2006. in 2004 there were an average of seven more computers in spot two than spot ten; in 2006, there were only an average of four more computers from spot two to ten. the large increases in the number of workstations in some states, like nevada, michigan, and maryland, indicate sizeable changes in budget, numbers of outlets, and/or population size. also of note is the significant drop of the average number of workstations in kentucky, declining from 18.8 in 2004 to fewer than 13 in 2006. a possible explanation is that, since kentucky libraries have been leaders in adopting wireless technologies (see table 3), the demand for workstations has decreased as libraries have added wireless access. five states appear in the top ten of both years— florida, indiana, georgia, california, and new jersey. the average number of workstations in indiana, california, and georgia increased from 2004 to 2006, while the aver­ age number of workstations in florida and new jersey decreased between 2004 and 2006. some of the decreases in workstations can be accounted for by increases in the availability of wireless access in public libraries, as librar­ ies with wireless access may feel less need to add more networked computers, relying on patrons to bring their own laptops. such a strategy, of course, will not increase access for patrons who cannot afford laptops. some libraries have sought to address this issue by having lap­ tops available for loan within the library building. the states listed in table 3 had the highest average levels of wireless connectivity in public library outlets in 2004 and 2006. the differences between the numbers in 2004 and 2006 reveal the dramatic increases in the avail­ ability of wireless internet access in public libraries. 
the national average in 2004 was 17.9 percent, but in 2006, the national average had more than doubled to 37.4 percent of public libraries offering wireless internet access. this sizeable increase is reflected in the changes in the states with the highest levels of wireless access. every position in the ratings in table 3 shows a dra­ matic jump from 2004 to 2006. the top position increased from 47 percent to 63.8 percent. the tenth position increased from 19.6 percent to 47.8 percent, an increase of nearly two­and­a­half times. these increases show how much more prominent wireless internet access has become in the services that public libraries offer to their communities and to their patrons. four states appear on both the 2004 and 2006 lists— virginia, kentucky, rhode island, and new jersey. these four states all showed increases, but the rises in some table 2. highest average number of public access workstations in public library outlets by state in 2004 and 2006. 2004 2006 1. florida 22.6 1. florida 21.7 2. kentucky 18.8 2. indiana 17.5 3. new jersey 15.5 3. nevada 15.7 4. georgia 14.0 4. michigan 14.8 5. utah 13.0 5. maryland 14.6 6. rhode island 12.6 6. georgia 14.4 7. indiana 12.3 7. arizona 14.1 8. texas 11.9 8. california 14.0 9. california 11.8 9. new jersey 13.8 10. south carolina 11.7 10. virginia 13.0 new york 11.7 national: 10.4 national: 10.7 table 3. highest levels of public access wireless internet connectivity in public library outlets by state in 2004 and 2006 2004 2006 1. kentucky 47% 1. virginia 63.8% 2. new mexico 38.6% 2. connecticut 56.6% 3. new hampshire 31.6% 3. indiana 56.6% 4. virginia 30.8% 4. rhode island 53.9% 5. texas 26.4% 5. kentucky 52.0% 6. kansas 25.8% 6. new jersey 50.9% 7. new jersey 22.8% 7. maryland 49.8% 8. rhode island 22.5% 8. illinois 48.3% 9. florida 21.9% 9. california 47.8% 10. new york 19.6% 10. massachusetts 47.8% national: 17.9% national: 37.4% 6 information technology and libraries | june 2007 public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 7 other states were significant enough to reduce kentucky from the top­ranked state in 2004 to the fifth ranked, in spite of the fact that the number of public libraries in kentucky offering wireless access increased from 47 per­ cent to 52 percent. in both years, a majority of the states in the top ten were located along the east coast. further, high levels of wireless access may be linked in some states to areas of high population density or the strong presence of technology­related sectors in the state, as in california and virginia. smaller states with areas of dense popula­ tions, such as connecticut, rhode island, and maryland, are also among the leaders in wireless access. tables 4 and 5 provide contrasting pictures regarding the number of public access internet workstations in public libraries by state in 2004 and 2006. table 4 shows the states with the highest percentages of libraries that consistently have fewer workstations that are needed by patrons, while table 5 shows the states with the highest percentages of libraries that consistently have sufficient workstations to meet patron needs. of note is the fact that, unlike the preceding three tables, there appears to be no significant geographical clustering of states in tables 4 and 5. 
nationally, the percentage of libraries that consis­ tently have insufficient workstations to meet patron needs declined from 15.7 percent in to 2004 to 13.7 percent in 2006, a change that is within the margin of error (+/­ 3.4 percent) of the question on the 2006 survey. due to the size of the change, it is not known if the national decline was a real improvement or simply a reflection of the margin of error. washington, d.c., oregon, new mexico, idaho, and california appear on the lists for both 2004 and 2006 in table 4. washington, d.c. had the highest percentage of libraries reporting insufficient workstations in both years, though there was a significant decrease from 100 percent of libraries in 2004 to 69 percent of libraries in 2006. in this case, the significant drop represents major strides forward to providing sufficient access to patrons in washington, d.c. similarly, though california features on both lists, the percentages dropped from 44.9 percent in 2004 to 22.2 percent in 2006, a decline of more than half. states like these are obviously making efforts to address the need for increased workstations. overall, eight out of ten positions in table 4 remained constant or saw a decline percentage in each position from 2004 to 2006, indicating a national decrease in libraries with insufficient workstations. in sharp contrast, fewer than 20 percent of nevada libraries in 2004 reported insufficient workstations, placing well out of the top ten. however, in 2006 nevada ranked second, with 51.5 percent of public libraries reporting insufficient workstations to meet patron demand. with nevada’s rapidly growing population, it appears that the demand for internet access in public libraries may not be keeping pace with the population growth. the percentage of public libraries reporting suffi­ cient workstations to consistently meet patron demands increased slightly at the national level from 14.1 percent in 2004 to 14.6 percent in 2006, again well within the margin of error (+/­ 3.5 percent) of the 2006 question. however, in table 5, the top ten positions in 2006 all fea­ ture lower percentages than the same positions in 2004. in 2004 the top­ranked state had 53.2 percent of libraries able to consistently meet patron needs for internet access, but the top­ranked state in 2006 had only 31 percent of libraries able to consistently meet patron access needs. table 4. public library outlet public access workstation availability by state in 2004 and 2006–consistently have fewer workstations than are needed 2004 2006 1. washington, d.c. 100% 1. washington, d.c. 69.9% 2. california 44.9% 2. nevada 51.5% 3. florida 36% 3. oregon 34.8% 4. new mexico 30.7% 4. new mexico 31.9% 5. oregon 30.4% 5. tennessee 30.4% 6. utah 29.2% 6. alaska 27.8% 7. south carolina 28.4% 7. idaho 26% 8. kentucky 24.1% 8. california 22.2% 9. alabama 21.5% 9. new york 21.4% 10. idaho 21.1% 10. rhode island 19% national: 15.7% national: 13.7% table 5. public library outlet public access workstation availability by state in 2004 and 2006—always have a sufficient number of workstations to meet demand. 2004 2006 1. wyoming 53.2% 1. louisiana 31% 2. alaska 34.9% 2. new hampshire 30.4% 3. kansas 32.2% 3. north carolina 28.4% 4. rhode island 31.4% 4. arkansas 26.2% 5. new hampshire 29.7% 5. wyoming 25.2% 6. south dakota 25.2% 6. mississippi 24.4% 7. georgia 25% 7. missouri 23.6% 8. arkansas 24.8% 8. vermont 22.2% 9. vermont 32.7% 9. nevada 20.9% 10. virginia 22.4% 10. 
pennsylvania 17.9% west virginia 17.9% national: 14.1% national: 14.6% � information technology and libraries | june 2007 four states—new hampshire, arkansas, wyoming, and vermont—appear on both the 2004 and 2006 lists. the national increase in the sufficiency of the num­ ber of workstations to meet patron access needs and decreases in all of the top­ranked states between 2004 and 2006 seems incongruous. this situation results, however, from a decrease in range of differences among the states from 2004 to 2006, so that the range is compressed and the percentages are more similar among the states. further, in some states, the addition of wireless access may have served to increase the overall sufficiency of the access in libraries, possibly leveling the differences among states. nevertheless, the national average of only 14.6 percent of public libraries consistently having sufficient numbers of workstations to meet patron access needs is clearly a major problem that public libraries must work to address. comparing the 2006 data of tables 4 and 5 demonstrates that patron demands for internet access are being met neither evenly nor consistently across the states. nationally, the percentage of public library systems with increases in the information technology budgets from the previous year dropped dramatically from 36.1 percent in 2004 to 18.6 percent in 2006. as can be seen in table 6, various national, state, and local budget crunches have significantly reduced the percentages of public library systems with increases in information technology budgets. when inflation is taken into account, a stationary information technology budget represents a net decrease in funds available in real dollar terms, so the only public libraries that are not actually having reductions in their information technology budgets are those with increases in such budgets. since internet access and the accompa­ nying hardware necessary to provide it are clearly a key aspect of information technology budgets, decreases in these budgets will have tangible impacts on the ability of public libraries to provide sufficient internet access. virtually every position on table 6 has a decrease of 20 percent to 30 percent from 2004 to 2006, with the largest decrease being from 84.2 percent in 2004 to 48.3 percent in 2006 in the second position. five states—delaware, kentucky, florida, rhode island, and south carolina—are listed for both 2004 and 2006, though every one of these states registered a decrease from 2004 to 2006. no drop was more dramatic than south carolina’s from 84.2 percent in 2004 to 31 percent in 2006. overall, though, the declining information tech­ nology budgets and continuing increases in demands for information technology access among patrons cre­ ates a very difficult situation for libraries. public libraries and the internet in 2006 along with questions that were asked on both the 2004 and 2006 public libraries and the internet studies, the sur­ vey included new questions on the 2006 study to account for social changes, alterations of the policy environment, and the maturation of internet access in public librar­ ies. several findings from the new questions on the 2006 study were noteworthy among the state data. the states listed in table 7 had the highest percentage of public library systems with increases in total operating budget over the previous year in 2006. 
nationally, 45.1 percent of public library systems had some increase in their overall budget, which includes funding for staff, physical structures, collection development, and many other costs, along with technology. at the state level, three northeastern states clearly led the way, with more than 75 percent of library systems in maryland, delaware, and rhode island benefiting from an increase in the overall operating budget. also of note is the fact that two fairly rural and sparsely populated western states—idaho and wyoming—were among the top ten.
table 6. highest levels of public library system overall internet information technology budget increases by state in 2004 and 2006
2004: 1. florida 87.5%; 2. south carolina 84.2%; 3. rhode island 67.5%; 4. delaware 64.9%; 5. new jersey 61.5%; 6. north carolina 55.5%; 7. virginia 53.6%; 8. kentucky 53.2%; 9. new mexico 49.3%; 10. kansas 49%; national: 36.1%
2006: 1. delaware 60%; 2. kentucky 48.3%; 3. maryland 47.6%; 4. wyoming 45.7%; 5. louisiana 40%; 6. florida 38%; 7. rhode island 33.3%; 8. south carolina 31%; 9. arkansas 27.5%; 10. california 27.3%; national: 18.6%
table 7. highest levels of public library system total operating budget increases by state in 2006
1. maryland 85.7%; 2. delaware 80%; 3. rhode island 76.4%; 4. idaho 74.5%; 5. kentucky 73.6%; 6. connecticut 68.6%; 7. virginia 62.8%; 8. new hampshire 62.5%; 9. north carolina 61.6%; 10. wyoming 60.9%; national: 45.1%
five of the states in the top ten in highest percentages of increases in operating budget in 2006 were also among the top ten in highest percentages of increases in information technology budgets in 2006. comparing table 7 with table 6 reveals that delaware, kentucky, maryland, rhode island, and wyoming are on both lists. in these states, increases in information technology budgets seem to have accompanied larger increases in the overall 2006 budget. an interesting point to ponder in comparing table 6 with table 7 is the large discrepancy between average increases in information technology budgets (18.6 percent) and overall budgets (45.1 percent) at the national level. as internet access is becoming more vital to public libraries in the content and services they provide to patrons, it seems surprising that a far smaller portion of library systems would receive an increase in information technology budgets than in overall budgets. one growing issue with the provision of internet access in public libraries is the provision of access at sufficient connection speeds. more and more internet content and services are complex and require large amounts of bandwidth, particularly content involving audio and video components. fortunately, as demonstrated in table 8, 53.5 percent of libraries nationally indicate that their connection speed is sufficient at all times to meet patron needs. in contrast, only 16.1 percent of public libraries nationally indicate that their connection speed is insufficient to meet patron needs at all times. georgia has the highest percentage of libraries that always have sufficient connection speed at 80.5 percent. in the case of georgia, the statewide library network is most likely a key part of ensuring the majority of libraries have sufficient access speed. many of the other states that have the highest percentages of public libraries with sufficient connection speeds are located in the middle part of the country.
table 8. highest percentages of public library outlets where public access internet service connection speed is sufficient at all times or insufficient by state in 2006
sufficient to meet patron needs at all times: 1. georgia 80.5%; 2. new hampshire 70.6%; 3. iowa 64.2%; 4. illinois 64%; 5. ohio 63.9%; 6. indiana 63.6%; 7. vermont 63.5%; 8. oklahoma 62.8%; 9. louisiana 61.7%; 10. wisconsin 61.5%; national: 53.5%
insufficient to meet patron needs: 1. virginia 35%; 2. north carolina 28.1%; 3. alaska 27.3%; 4. delaware 26.9%; 5. mississippi 26.6%; 6. missouri 24.3%; 7. rhode island 23.1%; 8. oregon 22.4%; 9. connecticut 21.5%; 10. arkansas 21.2%; national: 16.1%
the state with the highest percentage of libraries with insufficient connection speed to meet patron demands is virginia, with 35 percent of libraries. curiously, virginia consistently ranks in the top ten of tables 1–3. though virginia libraries have some of the longest hours open, some of the highest numbers of workstations, and some of the highest levels of wireless access, they still have the highest percentage of libraries with insufficient connection speed. only five states had more than 25 percent of libraries with connection speeds insufficient to meet the needs of patrons at all times. this issue is significant now in these states, as these libraries lack the necessary connection speeds. however, it will continue to escalate as an issue as content and services on the internet continue to evolve and become more complex, thus requiring greater connection speeds. comparing table 8 with table 4 (consistently have fewer workstations than are needed) and table 5 (always have a sufficient number of workstations to meet demand) reveals some parallels. alabama and rhode island are among the top ten states both for connection speed being consistently insufficient to meet patron needs (table 8) and for consistently having fewer workstations than are needed (table 4). conversely, vermont and louisiana are among the top ten states both for connection speed being sufficient to meet patron needs at all times (table 8) and for always having a sufficient number of workstations to meet demand (table 5). table 9 displays the two leading types of internet connection providers for public libraries and the states with the highest percentages of libraries using each. nationally, 46.4 percent of public libraries rely on an internet service provider (isp) for internet access. in the states listed in table 9, three-quarters or more of libraries use an isp, with more than 90 percent of libraries in kentucky and iowa using an isp. the next most common means of connection for public libraries is through a library cooperative or library network, with 26.2 percent of libraries nationally using these means. in such cases, member libraries rely on their established network to serve as the connector to the internet. the library network approach seems to be most effective in geographically small states. the top three on the list are three of the smallest of the states—rhode island, delaware, and west virginia—with more than 75 percent of libraries in each of these states connecting through a network. nationally, the remaining approximately 25 percent of libraries connect through a network managed by a nonlibrary entity or by other means. the highest percentages of public library systems receiving each kind of e-rate discount are presented in table 10.
e­rate discounts are an important source of technology funding for many public libraries across the country, with more than $250,000,000 in e­rate discounts distributed to libraries between 2000 and 2003.10 nationally in 2006, 22.4 percent of public library systems received discounts for internet connectivity, 39.6 percent for telecommunications services, and 4.4 percent for internal connection costs. mississippi and louisiana appear in the top five for each of the three types of discounts. minnesota and west virginia are each in the top five for two of the three lists. many of the states benefiting the most from e­rate funding in 2006 have large rural popu­ lations spread out over a geographically dispersed area, indicating the continuing importance of e­ rate discounts in bringing internet connections to rural public libraries. maryland and west virginia are both included in the telecommunications service column of table 10 due to proportionally large areas of these smaller states that are rural. the importance of the telecommunications dis­ counts in certain states is obviated by the fact that more than 75 percent of public library systems in all five states listed received such discounts. in comparison, only one state has more than 75 percent of library systems receiv­ ing discounts for internet connectivity, while no state has 30 percent of library systems receiving discounts for internal connection costs, with the latter reflecting the manner in which e­rate funding is calculated. in spite of the penetration of the internet into virtually every public library in the united states and the general expectations that internet access will be publicly available in every library, not all public libraries offer information technology training for patrons. nationally, 21.4 percent of public library outlets do not offer technology training. table 10 lists the states with the highest percentages of public library outlets not offering information technol­ ogy training. six of the ten states listed are located in the southeastern part of the country. the lack of resources or adequate number of staff to provide training is a leading concern in these states. not offering patron training may be strongly linked to lacking economic resources to do so. for example, the two states with the highest percentage of public libraries not offering patron training—mississippi and louisiana—are also the two states in the top five recipients of each kind of e­rate funding listed in table 10. if the libraries in states like these are economically struggling just to provide internet access, it seems likely that providing accompany­ ing training might be difficult as well. a further difficulty is that there is little public or private funding available specifically for training. n discussion of issues the similarities and differences among the states indi­ cate that the evolution of public access to the internet in public libraries is not necessarily an evenly distributed phenomenon, as some states appear to be consistent lead­ ers in some areas and other states appear to consistently trail in others. while the national picture is one primarily of continued progress in the availability and quality of internet access available to library patrons, the progress is not evenly distributed among the states. 11 libraries in different states struggle with or benefit from different issues. 
some public libraries are limited by state and local budgetary limitations, while other libraries are seeking alternate funding sources through grant writ­ ing and building partnerships with the corporate world. some face barriers to providing access due to their geo­ graphical location or small service population. it may also be the case that the libraries in some states do not per­ ceive that patrons desire increased access. other public libraries are able to provide high­end access as a result of having strong local leadership, sufficient state and local funding, well­developed networks and cooperatives, and a proactive state library. though the discussion of the “digital divide” has become much less frequent, the state data seem to indi­ cate that there are gaps in levels of access among libraries in different states. while every state has very successful individual libraries in terms of providing quality internet table �. highest levels of types of internet connection provider for public library outlets by state in 2006 internet service provider library cooperative or network 1. kentucky 93.5% 1. rhode island 84.7% 2. iowa 90.9% 2. delaware 79.5% 3. new hampshire 83.8% 3. west virginia 77.9% 4. vermont 81.1% 4. wisconsin 71.2% 5. oklahoma 80.6% 5. massachusetts 54.7% wyoming 80.6% 6. minnesota 52.5% 7. idaho 80.2% 7. ohio 48.9% 8. montana 78.9% 8. georgia 45.1% 9. tennessee 78.4% 9. mississippi 41.2% 10. alabama 74.6% 10. connecticut 38.5% national: 46.4% national: 26.2% public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 11 access and individual libraries that could be doing a better job, the state data indicate that library patrons in different parts of the country have variations in the levels and quality of access available to them. uniformity across all states clearly will never be feasible, though, as differ­ ent states and their patrons have different needs. for example, tables 1, 2, and 3 all display features that indicate high­level internet access in public librar­ ies—high numbers of hours open, high numbers of public access workstations, and high levels of wireless internet access. three states—maryland, new jersey, and virginia—appear in the top ten in these three lists for 2006. further, connecticut, florida, illinois, and indiana each appear in the top ten of two of these three lists. these states clearly are making successful efforts at the state and local levels to guarantee widespread access to public libraries and the internet access they provide. gaps in access are also evident among different regions of the country. the highest percentages of library systems with increases in total operating budgets were concentrated in states along the east coast, with seven of the states listed in table 7 being mid­atlantic or northeastern states. in con­ trast, the highest percentages of library systems relying on e­rate funding in table 10 were concentrated in the midwest and the southeast. further, the numbers in tables 6 and 7 showed far greater increases in the total operating budgets than in the information technology budgets in all regions of the country. as a result, public libraries in all parts of the united states may need to seek alternate sources of funding specifically for information technology costs. as can be seen in table 3, the leading states in adoption of wireless technology are concentrated in the northeast and mid­atlantic. 
in table 11, southern states, particu­ larly louisiana and mississippi, had many of the highest percentages of libraries not offering any internet training to patrons. it is important to note with data from the gulfstates, however, that the effects of hurricane katrina may have had a large impact on the results reported. one key difference in a number of states seems to be the presence of a state library actively working to coordi­ nate access issues. this particular study was not able to address such issues, but evidence indicates that the state library can play a significant role in ensuring sufficiency of internet access in public libraries in a state. maine, west virginia, and wisconsin all have state libraries that apply and distribute funds at the statewide level to ensure all public libraries, regardless of size or geography, have high­end connections to the internet. the state library of west virginia, for example, applied for e­rate funding for telecommunications costs on a statewide basis and received 79.1 percent funding in 2006, using such funding to cover not only connection costs for public libraries, but also to provide it and network support to libraries. another example of a successful statewide effort to provide sufficient internet access can be found in maryland. in the early 1990s, maryland public library administrators agreed to let the state library use library services and technology act (lsta) funds to build the sailor network, connecting all public libraries in the state.12 this network predates the e­rate program by a number of years, but having an established statewide network has helped the state library to coordinate table 10. highest percentages of public library systems receiving e-rate discounts by category and state in 2006 internet connectivity telecommunications services internal connection costs 1. louisiana 89.2% 1. mississippi 92.6% 1. mississippi 29.6% 2. indiana 70.8% 2. south carolina 89.4% 2. minnesota 22.6% 3. mississippi 63% 3. louisiana 79.5% 3. arizona 19.3% 4. minnesota 50.5% 4. west virginia 79.1% 4. west virginia 14.2% 5. tennessee 44.7% 5. maryland 76.2% 5. louisiana 12.3% national: 22.4% national: 39.6% national: 4.4% table 11. highest levels of public library systems not offering patron information technology training services by state in 2006 1. louisiana 48.7% 2. mississippi 40.7% 3. arkansas 39.6% 4. alaska 36% 5. arizona 34.8% 6. georgia 34.5% 7. new hampshire 32.8% 8. south carolina 31.1% 9. tennessee 30% 10. idaho 29% national: 21.4% 12 information technology and libraries | june 2007 applications, funding, and services among the libraries of the state. the state budget in maryland also provides other types of funding to support the state library, the library systems, and the library outlets in providing internet access. in states such as georgia, maryland, maine, west virginia, and wisconsin, the provision of internet access in public libraries is shaped not only by library outlets and library systems, but by the state libraries as well. in these and other states, the efforts of the state library appear to be reflected in the data from this study. a final area for discussion is the degree to which librarians understand how much bandwidth is required to meet the needs of library users, how to measure actual bandwidth that is available in the library, and how to determine the degree to which that bandwidth is suf­ ficient. 
indeed, many providers advertise that their con­ nection speeds are “up to” a certain speed when in fact they deliver considerably less.13 the authors have offered an analysis of determining the quality and sufficiency of bandwidth elsewhere.14 suffice to say that there is consid­ erable confusion as to “how good is good enough” band­ width connection quality. these types of issues frame understandings of how connected libraries in different states are and whether those connections are sufficient to meet the needs of patrons. n future research while the experience of individual patrons in particular libraries will vary widely in terms of whether the access available is sufficient to meet their information needs, the fact that the state data indicate variations in the levels and quality of access among some states and regions of the country is worthy of note. an important area of sub­ sequent research will be to investigate these differences, determine the reasons for them, and develop strategies to alleviate these apparent gaps in access. investigating these differences requires consideration of local and situational factors that may affect access in one library but perhaps not in another. for example, one public library may have access to an internet provider that offers higher speed connectivity that is not available in another location. the range of the possible local and situational factors affecting access and services is extensive. a prelimi­ nary list of the factors that contribute to being a success­ fully networked public library is described in greater detail in the 2006 study.15 however, additional investigation into the degree to which these factors affect access, quality of service, and user satisfaction needs to be continued. the personal experience of the authors in working with various state library agencies suggests the need for additional research that explores relationships among those states ranked highest in areas such as connectivity and workstations with programs and services offered by the state library agencies. one state library, for example, has a specific program that works directly with individual public libraries to assist them in completing the various e­rate forms. is there a link between that state library providing such assistance and the state’s public libraries receiving more e­rate discounts per capita than other states? this is but one example where investigating the role of the state library and comparing those roles and services to the rankings may be useful. perhaps a number of “best practices” could be identified that would assist the libraries in other states in improv­ ing access and services. in terms of research methods, future research on the topics identified in this article may need to draw upon strategies other than a national survey and on­site focus groups/interviews. the 2006 study, for the first time, included site visits and interviews and produced a wealth of data that supplemented the national survey data.16 on­site analysis of actual connection speeds in a sample of public libraries is but one example. the degree to which survey respondents know the connec­ tion speeds at specific workstations is unclear. simply because a t­1 line comes in the front door, it is not nec­ essarily the speed available at a particular workstation. other methods such as log file analysis or user­based surveys of networked services (as opposed to surveys completed by librarians) may offer insights that could augment the national survey data. 
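a rough, purely illustrative calculation (not drawn from the survey data) shows why the speed of the incoming line says little about the speed at a given workstation: a t-1 line carries about 1.544 mbps; if ten public workstations are in use at once, that is roughly 1.544 / 10 ≈ 0.15 mbps (about 150 kbps) per workstation before any staff or wireless traffic is counted, which is adequate for basic web pages but well below what audio- and video-intensive services require.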
other approaches such as policy analysis may also prove useful in better understanding access, connectiv­ ity, and services on a state­by­state basis. there has been no systematic description and analysis of state­based laws and regulations that affect public library internet access, connectivity, and services. the authors are aware of some states that ensure a minimum bandwidth will be provided to each public library in the state and pay for such connectivity. such is not true in other states. thus, a better understanding of how state­based policies and regulations affect access, connectivity, and services may identify strategies and policies that could be used in other states to increase or improve access, connectiv­ ity, and services. the data discussed in this article also point to many other important needs in future research. libraries in certain states that seem to be frequently ranking high in the tables indicate that certain states are better able to sustain their libraries in terms of finances and usage. however, additional factors may also be key in the differ­ ences among the states. future research needs to consider the internet access in public libraries in different states in relation to other services offered by libraries and to uses of the internet connectivity in libraries, including types of online content and services available, types of training public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 13 available, community outreach, other collection issues, staffing in relation to technology, and other factors. n conclusion internet and public computing access is almost univer­ sally available in public libraries in the united states, but there are differences in the amounts of access, the kinds of access, and sufficiency of the access available to meet patron demands. now that virtually every public library has an internet connection, provides internet access to patrons, and offers a range of public computing access, the attention of public libraries must refocus on ensuring that every library can provide sufficient internet and com­ puting access to meet patron needs. the issues to address include being open to the public a sufficient number of hours, having enough internet access workstations, hav­ ing adequate wireless access, and having sufficient speed and quality of connectivity to meet the needs of patrons. if a library is not able to provide sufficient access now, the situation will only continue to grow more difficult as the content and services on the internet continue to be more demanding of technical and bandwidth capacity. public libraries must also focus on increasing provi­ sion of internet access in light of federal, state, and local governments recently adding yet another significant level of services to public libraries by “requesting” that they provide access to and training in using numerous e­gov­ ernment services. such e­government services include social services, prescription drug plans, health care, disas­ ter support, tax filing, resource management, and many other activities.17 the maintenance of traditional services, the addi­ tion and expansion of public access computing and networked services, and now the addition of a range of e­government services tacitly required by federal, state, and local governments, in combination, risk stretching public library resources beyond their ability to keep up. 
to avoid such a situation, public libraries, library sys­ tems, and state governments must learn from the library outlets, systems, and states that are more successfully providing sufficient internet access to their patrons and their communities. among these leaders, there are likely models for success that can be identified for the benefit of other outlets, systems, and states. beyond the lessons that can be learned from the most connected, however, there are also practical and logistical issues that remain beyond the control of an individual library and sometimes the entire state, such as geographical and economic factors. ultimately, the analysis of state data offered here sug­ gests that much can be learned from one state that might assist another state in terms of improving connectivity, access, and services. while the data suggest a number of significant discrepancies among the various states, it may be that a range of best practices can be identified from those more highly ranked states that could be employed in other states to improve access, connectivity, and ser­ vices. staff at the various state library agencies may wish to discuss these findings and develop strategies that can then improve access nationwide. providing access to the internet is now as established a role for public libraries as providing access to books. patrons and communities, and now government orga­ nizations, rely on the fact that internet access will be available to everyone who needs it. while there are other points of access to the internet in some communities, such as school media centers and community technology centers, the public library is often the only public access point available in many communities.18 public libraries across the states must continually work to make sure the access they provide meets all of these needs. n acknowledgements the 2004 and 2006 public libraries and the internet studies were funded by the american library association and the bill & melinda gates foundation. drs. bertot, mcclure, and jaeger served as the co­principal investigators of the study. more information on these studies is available at http://www.ii.fsu.edu/plinternet/. references and notes 1. john carlo bertot, charles r. mcclure, and paul t. jaeger, public libraries and the internet 2004: survey results and findings (tallahassee, fla.: information institute, 2005), http://www.ii.fsu .edu/plinternet_reports.cfm; john carlo bertot et al., public libraries and the internet 2006: study results and findings (tal­ lahassee, fla.: information institute, 2006), http://www.ii.fsu. edu/plinternet_reports.cfm (accessed mar. 31, 2007). 2. bertot et al., public libraries and the internet 2006. 3. john carlo bertot and charles r. mcclure, “assessing the sufficiency and quality of bandwidth for public libraries,” information technology and libraries 26, no. 1 (2007): 14–22. 4. john carlo bertot et al., “drafted: i want you to deliver e­government,” library journal 131, no. 13 (2006): 34–39; john carlo bertot et al., “public access computing and internet access in public libraries: the role of pub­ lic libraries in e­government and emergency situations,” first monday 11, no. 9 (2006). http://www.firstmonday .org/issues/issue11_9/bertot/ (accessed mar. 31, 2007). 5. ibid.; paul t. jaeger et al., “the 2004 and 2005 gulf coast hurricanes: evolving roles and lessons learned for public libraries in disaster preparedness and community services,” public library quarterly (in press). 6. 
there are actually nearly 17,000 service outlets in the united states. however, the sample frame eliminated bookmobiles as 14 information technology and libraries | june 2007 well as library outlets that the study team could neither geocode nor calculate poverty measures. additional information on the methodology is available in the study report at http://www.ii.fsu .edu/plinternet/ (accessed mar. 31, 2007). 7. bertot et al., public libraries and the internet 2006. 8. bertot, mcclure, and jaeger, public libraries and the internet 2004; bertot et al., public libraries and the internet 2006. the 2004 survey instrument is available at http://www.ii.fsu.edu/pro­ jectfiles/plinternet/plinternet_appendixa.pdf. the 2006 survey instrument is available at http://www.ii.fsu.edu/projectfiles/ plinternet/2006/appendix1.pdf (accessed mar. 31, 2007). 9. bertot et al., public libraries and the internet 2006. 10. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e­rate program and libraries and library consortia, 2000­ 2004: trends and issues,” information technology and libraries 24, no. 2 (2005): 57–67. 11. bertot, mcclure, and jaeger, public libraries and the internet 2004; bertot et al., public libraries and the internet 2006; john carlo bertot, charles r. mcclure, and paul t. jaeger, “public libraries struggle to meet internet demand: new study shows libraries need support to sustain online services,” american libraries 36, no. 7 (2005): 78–79. 12. john carlo bertot and charles r. mcclure, sailor assessment final report: findings and future sailor development (bal­ timore, md.: division of library development and services, 1996). 13. matt richtel and ken belson, “not always full speed ahead,” new york times, nov. 18, 2006. 14. bertot and mcclure, “assessing the sufficiency,” 14–22. 15. bertot et al., public libraries and the internet 2006. 16. ibid. 17. bertot et al., “drafted: i want you to deliver e­govern­ ment”; bertot et al., “public access computing and internet access in public libraries”; jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 18. paul t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41. batch loading collections into dspace | walsh 117 maureen p. walsh batch loading collections into dspace: using perl scripts for automation and quality control colleagues briefly described batch loading marc metadata crosswalked to dspace dublin core (dc) in a poster session.2 mishra and others developed a perl script to create the dspace archive directory for batch import of electronic theses and dissertations (etds) extracted with a java program from an in-house bibliographic database.3 mundle used perl scripts to batch process etds for import into dspace with marc catalog records or excel spreadsheets as the source metadata.4 brownlee used python scripts to batch process comma-separated values (csv) files exported from filemaker database software for ingest via the dspace item importer.5 more in-depth descriptions of batch loading are provided by thomas; kim, dong, and durden; proudfoot et al.; witt and newton; drysdale; ribaric; floyd; and averkamp and lee. however, irrespective of repository software, each describes a process to populate their repositories dissimilar to the workflows developed for the knowledge bank in approach or source data. 
thomas describes the perl scripts used to convert marc catalog records into dc and to create the archive directory for dspace batch import.6 kim, dong, and durden used perl scripts to semiautomate the preparation of files for batch loading a university of texas harry ransom humanities research center (hrc) collection into dspace. the xml source metadata they used was generated by the national library of new zealand metadata extraction tool.7 two subsequent projects for the hrc revisited the workflow described by kim, dong, and durden.8 proudfoot and her colleagues discuss importing metadata-only records from departmental refbase, thomson reuters endnote, and microsoft access databases into eprints. they also describe an experimental perl script written to scrape lists of publications from personal websites to populate eprints.9 two additional workflow examples used citation databases as the data source for batch loading into repositories. witt and newton provide a tutorial on transforming endnote metadata for digital commons with xslt (extensible stylesheet language transformations).10 drysdale describes the perl scripts used to convert thomson reuters reference manager files into xml for the batch loading of metadata-only records into the university of glasgow's eprints repository.11 the glasgow eprints batch workflow is additionally described by robertson and nixon and greig.12 several workflows were designed for batch loading etds into repositories. ribaric describes the automatic

this paper describes batch loading workflows developed for the knowledge bank, the ohio state university's institutional repository. in the five years since the inception of the repository approximately 80 percent of the items added to the knowledge bank, a dspace repository, have been batch loaded. most of the batch loads utilized perl scripts to automate the process of importing metadata and content files. custom perl scripts were used to migrate data from spreadsheets or comma-separated values files into the dspace archive directory format, to build collections and tables of contents, and to provide data quality control. two projects are described to illustrate the process and workflows.

the mission of the knowledge bank, the ohio state university's (osu) institutional repository, is to collect, preserve, and distribute the digital intellectual output of osu's faculty, staff, and students.1 the staff working with the knowledge bank have sought from its inception to be as efficient as possible in adding content to dspace. using batch loading workflows to populate the repository has been integral to that efficiency. the first batch load into the knowledge bank was august 29, 2005. over the next four years, 698 collections containing 32,188 items were batch loaded, representing 79 percent of the items and 58 percent of the collections in the knowledge bank. these batch loaded collections vary from journal issues to photo albums. the items include articles, images, abstracts, and transcripts. the majority of the batch loads, including the first, used custom perl scripts to migrate data from microsoft excel spreadsheets into the dspace batch import format for descriptive metadata and content files. perl scripts have been used for data cleanup and quality control as part of the batch load process. perl scripts, in combination with shell scripts, have also been used to build collections and tables of contents in the knowledge bank.
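as a small illustration of that cleanup step, the sketch below trims and xml-escapes a single metadata value the way a spreadsheet cell might need to be treated before it is written into a dublin core record. it is a hypothetical helper rather than one of the knowledge bank scripts, and the sample value is invented.

#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper: tidy a metadata value pulled from a spreadsheet cell
# and escape the characters that would make the generated xml invalid.
sub clean_value {
    my ($value) = @_;
    return '' unless defined $value;
    $value =~ s/^\s+|\s+$//g;   # trim leading/trailing whitespace
    $value =~ s/\s+/ /g;        # collapse internal runs of whitespace
    $value =~ s/&/&amp;/g;      # escape xml special characters (& first)
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    return $value;
}

# example: a title as it might arrive in a vendor spreadsheet.
my $raw   = "  bandwidth & access <draft>  ";
my $clean = clean_value($raw);
print "$clean\n";   # prints: bandwidth &amp; access &lt;draft&gt;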
the workflows using perl scripts to automate batch import into dspace have evolved through an iterative process of continual refinement and improvement. two knowledge bank projects are presented as case studies to illustrate a successful approach that may be applicable to other institutional repositories. ■■ literature review batch ingesting is acknowledged in the literature as a means of populating institutional repositories. there are examples of specific batch loading processes minimally discussed in the literature. branschofsky and her maureen p. walsh (walsh.260@osu.edu) is metadata librarian/ assistant professor, the ohio state university libraries, columbus, ohio. 118 information technology and libraries | september 2010 relational database postgresql 8.1.11 on the red hat enterprise linux 5 operating system. the structure of the knowledge bank follows the hierarchical arrangement of dspace. communities are at the highest level and can be divided into subcommunities. each community or subcommunity contains one or more collections. all items—the basic archival elements in dspace—are contained within collections. items consist of metadata and bundles of bitstreams (files). dspace supports two user interfaces: the original interface based on javaserver pages (jspui) and the newer manakin (xmlui) interface based on the apache cocoon framework. at this writing, the knowledge bank continues to use the jspui interface. the default metadata used by dspace is a qualified dc schema derived from the dc library application profile.18 the knowledge bank uses a locally defined extended version of the default dspace qualified dc schema, which includes several additional element qualifiers. the metadata management for the knowledge bank is guided by a knowledge bank application profile and a core element set for each collection within the repository derived from the application profile.19 the metadata librarians at osul create the collection core element sets in consultation with the community representatives. the core element sets serve as metadata guidelines for submitting items to the knowledge bank regardless of the method of ingest. the primary means of adding items to collections in dspace, and the two ways used for knowledge bank ingest, are (1) direct (or intermediated) author entry via the dspace web item submission user interface and (2) in batch via the dspace item importer. recent enhancements to dspace, not yet fully explored for use with the knowledge bank, include new ingest options using simple web-service offering repository deposit (sword), open archives initiative object reuse and exchange (oai-ore), and dspace package importers such as the metadata encoding and transmission standard submission information package (mets sip) preparation of etds from the internet archive (http:// www.archive.org/) for ingest into dspace using php utilities.13 floyd describes the processor developed to automate the ingest of proquest etds via the dspace item importer.14 also using proquest etds as the source data, averkamp and lee described using xslt to transform the proquest data to bepress’ (the berkeley electronic press) schema for batch loading into a digital commons repository.15 the knowledge bank workflows described in this paper use perl scripts to generate dc xml and create the archive directory for batch loading metadata records and content files into dspace using excel spreadsheets or csv files as the source metadata. 
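a minimal sketch of that csv-to-archive step, assuming one csv row per item, is shown below. it is illustrative rather than one of the knowledge bank scripts: the column names, file locations, the choice of the text::csv module (the appendix e listing uses a different csv module), and the dublin core elements written are all assumptions chosen for the example.

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;                    # assumed csv parser for this sketch
use File::Basename;
use File::Copy qw(copy);
use File::Path qw(make_path);

# hypothetical csv-to-simple-archive-format sketch: one item_nnn directory
# per row, each holding dublin_core.xml, a contents manifest, and the
# content file named in the row.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $in, '<:encoding(UTF-8)', 'metadata.csv' or die "metadata.csv: $!";
my $header = $csv->getline($in);
$csv->column_names(@$header);     # assumed columns: title,creators,date,abstract,file

my $item = 0;
while (my $row = $csv->getline_hr($in)) {
    my $dir = sprintf 'archive_directory/item_%03d', $item++;
    make_path($dir);

    # escape the characters that are illegal in xml before writing the record
    my %dc = %$row;
    for my $value (values %dc) {
        next unless defined $value;
        $value =~ s/&/&amp;/g;
        $value =~ s/</&lt;/g;
        $value =~ s/>/&gt;/g;
    }

    # one dcvalue element per field; the element/qualifier choices are illustrative
    open my $xml, '>:encoding(UTF-8)', "$dir/dublin_core.xml" or die $!;
    print {$xml} qq{<dublin_core>\n};
    print {$xml} qq{  <dcvalue element="title" qualifier="none">$dc{title}</dcvalue>\n};
    print {$xml} qq{  <dcvalue element="date" qualifier="issued">$dc{date}</dcvalue>\n};
    print {$xml} qq{  <dcvalue element="description" qualifier="abstract">$dc{abstract}</dcvalue>\n};
    print {$xml} qq{  <dcvalue element="creator" qualifier="none">$_</dcvalue>\n}
        for split /;\s*/, ($dc{creators} // '');
    print {$xml} qq{</dublin_core>\n};
    close $xml;

    # copy the content file into the item directory and list it in contents
    my $file = basename($row->{file});
    copy($row->{file}, "$dir/$file") or die "copy $row->{file}: $!";
    open my $contents, '>', "$dir/contents" or die $!;
    print {$contents} "$file\n";
    close $contents;
}
close $in;

keeping the whole load set under a single archive_directory means the resulting tree can be handed straight to the dspace item importer's test mode before anything touches the repository.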
■■ background

the knowledge bank, a joint initiative of the osu libraries (osul) and the osu office of the chief information officer, was first registered in the registry of open access repositories (roar) on september 28, 2004.16 as of december 2009 the repository held 40,686 items in 1,192 collections. the knowledge bank uses dspace, the open-source java-based repository software jointly developed by the massachusetts institute of technology libraries and hewlett-packard.17 as a dspace repository, the knowledge bank is organized by communities. the fifty-two communities currently in the knowledge bank include administrative units, colleges, departments, journals, library special collections, research centers, symposiums, and undergraduate honors theses. the commonality of the varied knowledge bank communities is their affiliation with osu and their production of knowledge in a digital format that they wish to store, preserve, and distribute. the staff working with the knowledge bank includes a team of people from three osul areas—technical services, information technology, and preservation—and the contracted hours of one systems developer from the osu office of information technology (oit). the osul team members are not individually assigned full-time to the repository. the current osul team includes a librarian repository manager, two metadata librarians, one systems librarian, one systems developer, two technical services staff members, one preservation staff member, and one graduate assistant. the knowledge bank is currently running dspace 1.5.2 and the

figure 1. dspace simple archive format
archive_directory/
    item_000/
        dublin_core.xml  -- qualified dublin core metadata
        contents         -- text file containing one line per filename
        file_1.pdf       -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        file_1.pdf
    ...

■■ case studies

the issues of the ohio journal of science

ojs was jointly published by osu and the ohio academy of science (oas) until 1974, when oas took over sole control of the journal. the issues of ojs are archived in the knowledge bank with a two year rolling wall embargo. the issues for 1900 through 2003, a total of 639 issues containing 6,429 articles, were batch loaded into the knowledge bank. due to rights issues, the retrospective batch loading project had two phases. the project to digitize ojs began with the 1900–1972 issues that osu had the rights to digitize and make publicly available. osu later acquired the rights for 1973–present, and (accounting for the embargo period) 1973–2003 became phase 2 of the project. the two phases of batch loads were the most complicated automated batch loading processes developed to date for the knowledge bank. to batch load phase 1 in 2005 and phase 2 in 2006, the systems developers working with the knowledge bank wrote scripts to build collections, generate dc xml from the source metadata, create the archive directory, load the metadata and content files, create tables of contents, and load the tables of contents into dspace. the ojs community in the knowledge bank is organized by collections representing each issue of the journal. the systems developers used scripts to automate the building of the collections in dspace because of the number needed as part of the retrospective project. the individual articles within the issues are items within the collections.
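the simple archive format in figure 1 is easy to sanity-check before a load. the short sketch below walks an archive directory and reports items that are missing their dublin_core.xml or whose contents manifest lists files that are not present; it is a hypothetical helper, not part of the knowledge bank workflow, and the default directory name is an assumption.

#!/usr/bin/perl
use strict;
use warnings;

# hypothetical pre-flight check of a simple archive format tree (figure 1):
# every item_* directory should hold dublin_core.xml, a contents manifest,
# and the bitstream files the manifest names.
my $archive  = @ARGV ? $ARGV[0] : 'archive_directory';
my $problems = 0;

for my $dir (sort glob "$archive/item_*") {
    unless (-f "$dir/dublin_core.xml") {
        warn "$dir: missing dublin_core.xml\n";
        $problems++;
    }
    if (open my $fh, '<', "$dir/contents") {
        while (my $name = <$fh>) {
            chomp $name;
            next unless $name =~ /\S/;
            unless (-f "$dir/$name") {
                warn "$dir: contents lists missing file '$name'\n";
                $problems++;
            }
        }
        close $fh;
    }
    else {
        warn "$dir: missing contents file\n";
        $problems++;
    }
}
print $problems ? "$problems problem(s) found\n" : "archive looks consistent\n";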
there is a table of contents for the articles in each issue as part of the collection homepages.21 again, due to the number required for the retrospective project, the systems developers used scripts to automate the creation and loading of the tables of contents. the tables of contents are contained in the html introductory text section of the collection pages. the tables of contents list title, authors, and pages. they also include a link to the item record and a direct link to the article pdf that includes the file size. for each phase of the ojs project, a vendor contracted by osul supplied the article pdfs and an excel spreadsheet with the article-level metadata. the metadata format. this paper describes ingest via the dspace batch item importer. the dspace item importer is a command-line tool for batch ingesting items. the importer uses a simple archive format diagramed in figure 1. the archive is a directory of items that contain a subdirectory of item metadata, item files, and a contents file listing the bitstream file names. each item's descriptive metadata is contained in a dc xml file. the format used by dspace for the dc xml files is illustrated in figure 2. automating the process of creating the unix archive directory has been the main function of the perl scripts written for the knowledge bank batch loading workflows. a systems developer uses the test mode of the dspace item importer tool to validate the item directories before doing a batch load. any significant errors are corrected and the process is repeated. after a successful test, the batch is loaded into the staging instance of the knowledge bank and quality checked by a metadata librarian to identify any unexpected results and script or data problems that need to be corrected. after a successful load into the staging instance the batch is loaded into the production instance of the knowledge bank. most of the knowledge bank batch loading workflows use excel spreadsheets or csv files as the source for the descriptive item metadata. the creation of the metadata contained in the spreadsheets or files has varied by project. in some cases the metadata is created by osul staff. in other cases the metadata is supplied by knowledge bank communities in consultation with a metadata librarian or by a vendor contracted by osul. whether the source metadata is created in-house or externally supplied, osul staff are involved in the quality control of the metadata. several of the first communities to join the knowledge bank had very large retrospective collection sets to archive. the collection sets of two of those early adopters, the journal issues of the ohio journal of science (ojs) and the abstracts of the osu international symposium on molecular spectroscopy currently account for 59 percent of the items in the knowledge bank.20 the successful batch loading workflows developed for these two communities—which continue to be active content suppliers to the repository—are presented as case studies.

figure 2. dspace qualified dublin core xml (example item: "notes on the bird life of cedar point," 1901-04, griggs, robert f.)

article-level metadata to knowledge bank dc, as illustrated in table 1. the systems developers used the mapping as a guide to write perl scripts to transform the vendor metadata into the dspace schema of dc. the workflow for the two phases was nearly identical, except each phase had its own batch loading scripts.
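figure 2 shows a dspace qualified dc xml record; as a hedged illustration of that format, the perl fragment below writes a dublin_core.xml for the same example item, in the same heredoc style as the appendix e script. the element and qualifier names follow the table 1 mapping and common dublin_core.xml conventions, and the citation value simply repeats the format template given under table 1, so the file is illustrative rather than a copy of the original figure.

#!/usr/bin/perl
use strict;
use warnings;

# illustrative only: write a dspace dublin_core.xml file for the item shown
# in figure 2. element/qualifier choices follow the table 1 mapping; the
# citation value repeats the table 1 format template, not a real citation.
my $title    = 'notes on the bird life of cedar point';
my $creator  = 'griggs, robert f.';
my $issued   = '1901-04';
my $citation = '[cover title]. v[vol.], n[iss.] ([cover date]), [fpage]-[lpage]';

open my $fh, '>:encoding(UTF-8)', 'dublin_core.xml' or die $!;
print {$fh} <<"XML";
<dublin_core>
  <dcvalue element="title" qualifier="none">$title</dcvalue>
  <dcvalue element="creator" qualifier="none">$creator</dcvalue>
  <dcvalue element="date" qualifier="issued">$issued</dcvalue>
  <dcvalue element="identifier" qualifier="citation">$citation</dcvalue>
</dublin_core>
XML
close $fh;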
due to a staff change between the two phases of the project, a former osul systems developer was responsible for batch loading phase 1 and the oit systems developer was responsible for phase 2. the phase 1 scripts were all written in perl. the four scripts written for phase 1 created the archive directory, performed database operations to build the collections, generated the html introduction table of contents for each collection, and loaded the tables of contents into dspace via the database. for phase 2, the oit systems developer modified and added to the phase 1 batch processing scripts. this case study focuses on phase 2 of the project. batch processing for phase 2 of ojs the annotated scripts the oit systems developer used for phase 2 of the ojs project are included in appendix a, available on the italica weblog (http://ital-ica .blogspot.com/). a shell script (mkcol.sh) added collections based on a listing of the journal issues. the script performed a login as a selected user id to the dspace web interface using the web access tool curl. a subsequent simple looping perl script (mkallcol.pl) used the stored credentials to submit data via this channel to build the collections in the knowledge bank. the metadata.pl script created the archive directory for each collection. the oit systems developer added the pdf file for each item to unix. the vendor-supplied metadata was saved as unicode text format and transferred to unix for further processing. the developer used vi commands to manually modify metadata for characters illegal in xml (e.g., “<” and “&”). (although manual steps were used for this project, the oit systems developer improved the perl scripts for subsequent projects by adding code for automated transformation of the input data to help ensure xml validity.) the metadata.pl script then processed each line of the metadata along with the corresponding data file. for each item, the script created the dc xml file and the contents file and moved them and the pdf file to the proper directory. load sets for each collection (issue) were placed in their own subdirectory, and a load was done for each subdirectory. the items for each collection were loaded by a small perl script (loaditems. pl) that used the list of issues and their collection ids and called a shell script (import.sh) for the actual load. the tables of contents for the issues were added to the knowledge bank after the items were loaded. a perl script (intro.pl) created the tables of contents using the metadata and the dspace map file, a stored mapping of item received from the vendor had not been customized for the knowledge bank. the ojs issues were sent to a vendor for digitization and metadata creation before the knowledge bank was chosen as the hosting site of the digitized journal. the osu digital initiatives steering committee 2002 proposal for the ojs digitization project had predated the knowledge bank dspace instance. osul staff performed quality-control checks of the vendor-supplied metadata and standardized the author names. the vendor supplied the author names as they appeared in the articles—in direct order, comma separated, and including any “and” that appeared. in addition to other quality checks performed, osul staff edited the author names in the spreadsheet to conform to dspace author-entry convention (surname first). semicolons were added to separate author names, and the extraneous ands were removed. a former metadata librarian mapped the vendor-supplied table 1. 
mapping of vendor metadata to qualified dublin core

vendor-supplied metadata -> knowledge bank dublin core
file -> [n/a: pdf file name]
cover title -> dc.identifier.citation*
issn -> dc.identifier.issn
vol. -> dc.identifier.citation*
iss. -> dc.identifier.citation*
cover date -> dc.identifier.citation*
year -> dc.date.issued
month -> dc.date.issued
fpage -> dc.identifier.citation*
lpage -> dc.identifier.citation*
article title -> dc.title
author names -> dc.creator
institution -> dc.description
abstract -> dc.description.abstract
n/a -> dc.language.iso
n/a -> dc.rights
n/a -> dc.type
*format: [cover title]. v[vol.], n[iss.] ([cover date]), [fpage]-[lpage]

directories to item handles created during the load. the tables of contents were added to the knowledge bank using a shell script (installintro.sh) similar to what was used to create the collections. installintro.sh used curl to simulate a user adding the data to dspace by performing a login as a selected user id to the dspace web interface. a simple looping perl script (ldallintro.pl) called installintro.sh and used the stored credentials to submit the data for the tables of contents.

the abstracts of the osu international symposium on molecular spectroscopy

the knowledge bank contains the abstracts of the papers presented at the osu international symposium on molecular spectroscopy (mss), which has met annually since 1946. beginning with the 2005 symposium, the complete presentations from authors who have authorized their inclusion are archived along with the abstracts. the mss community in the knowledge bank currently contains 17,714 items grouped by decade into six collections. the six collections were created "manually" via the dspace web interface prior to the batch loading of the items. the retrospective years of the symposium (1946–2004) were batch loaded in three phases in 2006. each symposium year following the retrospective loads was batch loaded individually.

retrospective mss batch loads

the majority of the abstracts for the retrospective loads were digitized by osul. a vendor was contracted by osul to digitize the remainder and to supply the metadata for the retrospective batch loads. the files digitized by osul were sent to the vendor for metadata capture. osul provided the vendor a metadata template derived from the mss core element set. the metadata taken from the abstracts comprised author, affiliation, title, year, session number, sponsorship (if applicable), and a full transcription of the abstract. to facilitate searching, the formulas and special characters appearing in the titles and abstracts were encoded using latex, a document preparation system used for scientific data. the vendor delivered the metadata in excel spreadsheets as per the spreadsheet template provided by osul. quality-checking the metadata was an essential step in the workflow for osul. the metadata received for the project required revisions and data cleanup. the vendor originally supplied incomplete files and spreadsheets that contained data errors, including incorrect numbering, data in the wrong fields, and inconsistency with the latex encoding. the three knowledge bank batch load phases for the retrospective mss project corresponded to the staged receipt of metadata and digitized files from the vendor. the annotated scripts used for phase 2 of the project, which included twenty years of the osu international symposium between 1951 and 1999, are included in appendix b, available on the italica weblog.
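the kind of quality checking described above can be sketched as a short perl pass over a vendor spreadsheet saved as csv. the required column names are assumptions for the example, and the checks are simplified stand-ins for the problems listed (missing or misplaced values and numbering that falls out of sequence).

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# hypothetical quality-control pass over a vendor spreadsheet saved as csv.
my @required = qw(session_number title author year abstract);   # assumed columns

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $in, '<:encoding(UTF-8)', 'vendor_metadata.csv' or die $!;
my $header = $csv->getline($in);
$csv->column_names(@$header);

# flag required columns that are missing from the header row
my %seen = map { $_ => 1 } @$header;
for my $col (@required) {
    warn "missing required column: $col\n" unless $seen{$col};
}

my ($line, $last_num) = (1, 0);
while (my $row = $csv->getline_hr($in)) {
    $line++;
    for my $col (@required) {
        warn "row $line: empty $col\n"
            if !defined $row->{$col} || $row->{$col} !~ /\S/;
    }
    warn "row $line: year '$row->{year}' does not look like a year\n"
        if defined $row->{year} && $row->{year} !~ /^\d{4}$/;
    if (defined $row->{session_number} && $row->{session_number} =~ /^(\d+)$/) {
        warn "row $line: session number out of sequence\n" if $1 < $last_num;
        $last_num = $1;
    }
}
close $in;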
the oit systems developer saved the metadata as a tab-separated file and added it to unix along with the abstract files. a perl script (mkxml2.pl) transformed the metadata into dc xml and created the archive directories for loading the metadata and abstract files into the knowledge bank. the script divided the directories into separate load sets for each of the six collections and accounted for the inconsistent naming of the abstract files. the script added the constant data for type and language that was not included in the vendor-supplied metadata. unlike the ojs project, where multiple authors were on the same line of the metadata file, the mss phase 2 script had to code for authors and their affiliations on separate lines. once the load sets were made, the oit systems developer ran a shell script to load them. the script (import_ collections.sh) was used to run the load for each set so that the dspace item import command did not need to be constructed each time. annual mss batch loads a new workflow was developed for batch loading the annual mss collection additions. the metadata and item files for the annual collection additions are supplied by the mss community. the community provides the symposium metadata in a csv file and the item files in a tar archive file. the symposium uses a web form for latex–formatted abstract submissions. the community processes the electronic symposium submissions with a perl script to create the csv file. the metadata delivered in the csv file is based on the template created by the author, which details the metadata requirements for the project. the oit systems developer borrowed from and modified earlier perl scripts to create a new script for batch processing the metadata and files for the annual symposium collection additions. to assist with the development of the new script, i provided the developer a mapping of the community csv headings to the knowledge bank dc fields. i also provided a sample dc xml file to illustrate the desired result of the perl transformation of the community metadata into dc xml. for each new year of the symposium, i create a sample dc xml result for an item to check the accuracy of the script. a dc xml example from a 2009 mss item is included in appendix c, available on the italica weblog. unlike the previous retrospective mss loads in which the script processed multiple years of the symposium, the new script processes one year at a time. the annual symposiums are batch loaded individually into one existing mss decade collection. the new script for the annual loads was tested and refined by loading the 2005 symposium into the staging instance of the 122 information technology and libraries | september 2010 ■■ summary and conclusion each of the batch loads that used perl scripts had its own unique features. the format of content and associated metadata varied considerably, and custom scripts to convert the content and metadata into the dspace import format were created on a case-by-case basis. the differences between batch loads included the delivery format of the metadata, the fields of metadata supplied, how metadata values were delimited, the character set used for the metadata, the data used to uniquely identify the files to be loaded, and how repeating metadata fields were identified. because of the differences in supplied metadata, a separate perl script for generating the dc xml and archive directory for batch loading was written for each project. each new perl script borrowed from and modified earlier scripts. 
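the per-collection load sets and the import step can be sketched as a small perl driver. the collection handle 1811/6635 comes from the import.sh listing in appendix e, and the command line mirrors the dsrun invocation shown there; the second handle, the directory names, and the eperson address are placeholders for illustration.

#!/usr/bin/perl
use strict;
use warnings;

# hypothetical driver in the spirit of import_collections.sh: run the dspace
# item importer once per load set, one set per target collection.
my %load_set = (
    '1811/6635' => './2009xml',          # collection handle => archive directory
    '1811/6636' => './2008xml',          # placeholder second collection
);
my $eperson = 'loader@example.edu';      # assumed submitter account

for my $collection (sort keys %load_set) {
    my $source  = $load_set{$collection};
    my $base    = (split m{/}, $collection)[-1];
    my $mapfile = "./map-mss.$base";

    # same command line as the appendix e import.sh script
    my @cmd = ('/dspace/bin/dsrun', 'org.dspace.app.itemimport.ItemImport',
               '--add', "--eperson=$eperson", "--collection=$collection",
               "--source=$source", "--mapfile=$mapfile");
    system(@cmd) == 0 or die "import of $source into $collection failed\n";
}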
many of the early batch loads were firsts for the knowledge bank and the staff working with the repository, both in terms of content and in terms of metadata. dealing with communityand vendor-supplied metadata and various encodings (including latex), each of the early loads encountered different data obstacles, and in each case solutions were written in perl. the batch loading code has matured over time, and the progression of improvements is evident in the example scripts included in the appendixes. batch loading can greatly reduce the time it takes to add content and metadata to a repository, but successful knowledge bank. problems encountered with character encoding and file types were resolved by modifying the script. the metadata and files for the symposium years 2005, 2006, and 2007 were made available to osul in 2007, and each year was individually loaded into the existing knowledge bank collection for that decade. these first three years of community-supplied csv files contained author metadata inconsistent with knowledge bank author entries. the names were in direct order, uppercase, split by either a semicolon or “and,” and included extraneous data, such as an address. the oit systems developer wrote a perl script to correct the author metadata as part of the batch loading workflow. an annotated section of that script illustrating the author modifications is included in appendix d, available on the italica weblog. the mss community revised the perl script they used to generate the csv files by including an edited version of this author entry correction script and were able to provide the expected author data for 2008 and 2009. the author entries received for these years were in inverted order (surname first) and mixed case, were semicolon separated, and included no extraneous data. the receipt of consistent data from the community for the last two years has facilitated the standardized workflow for the annual mss loads. the scripts used to batch load the 2009 symposium year are included in appendix e, which appears at the end of this text. the oit systems developer unpacked the tar file of abstracts and presentations into a directory named for the year of the symposium on unix. the perl script written for the annual mss loads (mkxml. pl) was saved on unix and renamed mkxml2009.pl. the script was edited for 2009 (including the name of the csv file and the location of the directories for the unpacked files and generated xml). the csv headings used by the community in the new file were checked and verified against the extract list in the script. once the perl script was up-to-date and the base directory was created, the oit systems developer ran the perl script to generate the archive directory set for import. the import.sh script was then edited for 2009 and run to import the new symposium year into the staging instance of the knowledge bank as a quality check prior to loading into the live repository. the brief item view of an example mss 2009 item archived in the knowledge bank is shown in figure 3. figure 3. mss 2009 archived item example batch loading collections into dspace | walsh 123 proceedings of the 2003 international conference on dublin core and metadata applications: supporting communities of discourse and practice—metadata research & applications, seattle, washington, 2003, http://dcpapers .dublincore.org/ojs/pubs/article/view/753/749 (accessed dec. 21, 2009). 3. r. 
mishra et al., “development of etd repository at iitk library using dspace,” in international conference on semantic web and digital libraries (icsd-2007), ed. a. r. d. prasad and devika p. madalli (2007), 249–59. http://hdl.handle .net/1849/321 (accessed dec. 21, 2009). 4. todd m. mundle, “digital retrospective conversion of theses and dissertations: an in house project” (paper presented to the 8th international symposium on electronic theses & dissertations, sydney, australia, sept. 28–30, 2005), http://adt.caul .edu.au/etd2005/papers/080mundle.pdf (accessed dec. 21, 2009). 5. rowan brownlee, “research data and repository metadata: policy and technical issues at the university of sydney library,” cataloging & classification quarterly 47, no. 3/4 (2009): 370–79. 6. steve thomas, “importing marc data into dspace,” 2006, http://hdl.handle.net/2440/14784 (accessed dec. 21, 2009). 7. sarah kim, lorraine a. dong, and megan durden, “automated batch archival processing: preserving arnold wesker’s digital manuscripts,” archival issues 30, no. 2 (2006): 91–106. 8. elspeth healey, samantha mueller, and sarah ticer, “the paul n. banks papers: archiving the electronic records of a digitally-adventurous conservator,” 2009, https://pacer .ischool.utexas.edu/bitstream/2081/20150/1/paul_banks_ final_report.pdf (accessed dec. 21, 2009); lisa schmidt, “preservation of a born digital literary genre: archiving legacy macintosh hypertext files in dspace,” 2007, https://pacer .ischool.utexas.edu/bitstream/2081/9007/1/mj%20wbo%20 capstone%20report.pdf (accessed dec. 21, 2009). 9. rachel e. proudfoot et al., “jisc final report: increase (increasing repository content through automation and services),” 2009, http://eprints.whiterose.ac.uk/9160/ (accessed dec. 21, 2009). 10. michael witt and mark p. newton, “preparing batch deposits for digital commons repositories,” 2008, http://docs .lib.purdue.edu/lib_research/96/ (accessed dec. 21, 2009). 11. lesley drysdale, “importing records from reference manager into gnu eprints,” 2004, http://hdl.handle.net/1905/175 (accessed dec. 21, 2009). 12. r. john robertson, “evaluation of metadata workflows for the glasgow eprints and dspace services,” 2006, http://hdl .handle.net/1905/615 (accessed dec. 21, 2009); william j. nixon and morag greig, “populating the glasgow eprints service: a mediated model and workflow,” 2005, http://hdl.handle .net/1905/387 (accessed dec. 21, 2009). 13. tim ribaric, “automatic preparation of etd material from the internet archive for the dspace repository platform,” code4lib journal no. 8 (nov. 23, 2009), http://journal.code4lib.org/ articles/2152 (accessed dec. 21, 2009). 14. randall floyd, “automated electronic thesis and dissertations ingest,” (mar. 30, 2009), http://wiki.dlib.indiana.edu/ confluence/x/01y (accessed dec. 21, 2009). 15. shawn averkamp and joanna lee, “repurposing probatch loading workflows are dependent upon the quality of data and metadata loaded. along with testing scripts and checking imported metadata by first batch loading to a development or staging environment, quality control of the supplied metadata is an integral step. the flexibility of perl allowed testing and revising to accommodate problems encountered with how the metadata was supplied for the heterogeneous collections batch loaded into the knowledge bank. 
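the author-entry cleanup described above (names arriving in direct order, uppercase, split by semicolons or "and," with extraneous address data) can be illustrated with a hedged sketch. it is modeled on the description of the appendix d script rather than copied from it, and the pattern matching is deliberately simplified.

#!/usr/bin/perl
use strict;
use warnings;

# hypothetical normalization of a supplied author string such as
# "JOHN A SMITH AND MARY JONES, DEPT OF CHEMISTRY" into the repository's
# author-entry convention: surname first, mixed case, semicolon separated.
sub normalize_authors {
    my ($raw) = @_;
    $raw =~ s/,\s*dept.*$//i;                      # crude removal of trailing address data
    my @names = split /\s*(?:;|\band\b)\s*/i, $raw;
    my @out;
    for my $name (@names) {
        next unless $name =~ /\S/;
        my @parts   = map { ucfirst lc } split ' ', $name;
        my $surname = pop @parts;
        push @out, join(', ', $surname, join(' ', @parts));
    }
    return join('; ', @out);
}

print normalize_authors('JOHN A SMITH AND MARY JONES, DEPT OF CHEMISTRY'), "\n";
# prints: Smith, John A; Jones, Mary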
however, toward the goal of standardizing batch loading workflows, the staff working with the knowledge bank iteratively refined not only the scripts but also the metadata requirements for each project and how those were communicated to the data suppliers with mappings, explicit metadata examples, and sample desired results. the efficiency of batch loading workflows is greatly enhanced by consistent data and basic standards for how metadata is supplied. batch loading is not only an extremely efficient means of populating an institutional repository, it is also a valueadded service that can increase buy-in from the wider campus community. it is hoped that by openly sharing examples of our batch loading scripts we are contributing to the development of an open library of code that can be borrowed and adapted by the library community toward future institutional repository success stories. ■■ acknowledgments i would like to thank conrad gratz, of osu oit, and andrew wang, formerly of osul. gratz wrote the shell scripts and the majority of the perl scripts used for automating the knowledge bank item import process and ran the corresponding batch loads. the early perl scripts used for batch loading into the knowledge bank, including the first phase of ojs and mss, were written by wang. parts of those early perl scripts written by wang were borrowed for subsequent scripts written by gratz. gratz provided the annotated scripts appearing in the appendixes and consulted with the author regarding the description of the scripts. i would also like to thank amanda j. wilson, a former metadata librarian for osul, who was instrumental to the success of many of the batch loading workflows created for the knowledge bank. references and notes 1. the ohio state university knowledge bank, “institutional repository policies,” 2007, http://library.osu.edu/sites/ kbinfo/policies.html (accessed dec. 21, 2009). the knowledge bank homepage can be found at https://kb.osu.edu/dspace/ (accessed dec. 21, 2009). 2. margret branschofsky et al., “evolving metadata needs for an institutional repository: mit’s dspace,” 124 information technology and libraries | september 2010 appendix e. mss 2009 batch loading scripts -mkxml2009.pl -#!/usr/bin/perl use encode; # routines for utf encoding use text::xsv; # routines to process csv files. use file::basename; # open and read the comma separated metadata file. my $csv = new text::xsv; #$csv->set_sep(' '); # use for tab separated files. $csv->open_file("mss2009.csv"); $csv->read_header(); # process the csv column headers. # constants for file and directory names. $basedir = "/common/batch/input/mss/"; $indir = "$basedir/2009"; $xmldir= "./2009xml"; $imagesubdir= "processed_images”; $filename = "dublin_core.xml"; # process each line of metadata, one line per item. $linenum = 1; while ($csv->get_row()) { # this divides the item's metadata into fields, each in its own variable. my ( $identifier, $title, $creators, $description_abstract, $issuedate, $description, $description2, appendixes a–d available at http://ital-ica.blogspot.com/ quest metadata for batch ingesting etds into an institutional repository,” code4lib journal no. 7 (june 26, 2009), http://journal .code4lib.org/articles/1647 (accessed dec. 21, 2009). 16. tim brody, registry of open access repositories (roar), http://roar.eprints.org/ (accessed dec. 21, 2009). 17. duraspace, dspace, http://www.dspace.org/ (accessed dec. 21, 2009). 18. 
dublin core metadata initiative libraries working group, “dc-library application profile (dc-lib),” http://dublincore .org/documents/2004/09/10/library-application-profile/ (accessed dec. 21, 2009). 19. the ohio state university knowledge bank policy committee, “osu knowledge bank metadata application profile,” http://library.osu.edu/sites/techservices/kbappprofile.php (accessed dec. 21, 2009). 20. ohio journal of science (ohio academy of science), knowledge bank community, http://hdl.handle .net/1811/686 (accessed dec. 21, 2009); osu international symposium on molecular spectroscopy, knowledge bank community, http://hdl.handle.net/1811/5850 (accessed dec. 21, 2009). 21. ohio journal of science (ohio academy of science), ohio journal of science: volume 74, issue 3 (may, 1974), knowledge bank collection, http://hdl.handle.net/1811/22017 (accessed dec. 21, 2009). batch loading collections into dspace | walsh 125 $abstract, $gif, $ppt, ) = $csv->extract( "talk_id", "title", "creators", "abstract", "issuedate", "description", "authorinstitution", "image_file_name", "talk_gifs_file", "talk_ppt_file" ); $creatorxml = ""; # multiple creators are separated by ';' in the metadata. if (length($creators) > 0) { # create xml for each creator. @creatorlist = split(/;/,$creators); foreach $creator (@creatorlist) { if (length($creator) > 0) { $creatorxml .= '' .$creator.’’.”\n “; } } } # done processing creators for this item. # create the xml string for the abstract. $abstractxml = ""; if (length($description_abstract) > 0) { # convert special metadata characters for use in xml/html. $description_abstract =~ s/\&/&/g; $description_abstract =~ s/\>/>/g; $description_abstract =~ s/\' .$description_abstract.''; } # create the xml string for the description. $descriptionxml = ""; if (length($description) > 0) { # convert special metadata characters for use in xml/html. $description=~ s/\&/&/g; $description=~ s/\>/>/g; $description=~ s/\' .$description.''; } appendix e. mss 2009 batch loading scripts (cont.) 126 information technology and libraries | september 2010 # create the xml string for the author institution. $description2xml = ""; if (length($description2) > 0) { # convert special metadata characters for use in xml/html. $description2=~ s/\&/&/g; $description2=~ s/\>/>/g; $description2=~ s/\' .'author institution: '.$description2.''; } # convert special characters in title. $title=~ s/\&/&/g; $title=~ s/\>/>/g; $title=~ s/\:encoding(utf-8)", "$basedir/$subdir/$filename"); print fh <<"xml"; $identifier $title $issuedate $abstractxml $descriptionxml $description2xml article en $creatorxml xml close($fh); # create contents file and move files to the load set. # copy item files into the load set. if (defined($abstract) && length($abstract) > 0) { system "cp $indir/$abstract $basedir/$subdir"; } $sourcedir = substr($abstract, 0, 5); if (defined($ppt) && length($ppt) > 0 ) { system "cp $indir/$sourcedir/$sourcedir/*.* $basedir/$subdir/"; } if (defined($gif) && length($gif) > 0 ) { system "cp $indir/$sourcedir/$imagesubdir/*.* $basedir/$subdir/"; } # make the 'contents' file and fill it with the file names. appendix e. mss 2009 batch loading scripts (cont.) batch loading collections into dspace | walsh 127 system "touch $basedir/$subdir/contents"; if (defined($gif) && length($gif) > 0 && -d "$indir/$sourcedir/$imagesubdir" ) { # sort items in reverse order so they show up right in dspace. # this is a hack that depends on how the db returns items # in unsorted (physical) order. 
there are better ways to do this. system "cd $indir/$sourcedir/$imagesubdir/;" . " ls *[0-9][0-9].* | sort -r >> $basedir/$subdir/contents"; system "cd $indir/$sourcedir/$imagesubdir/;" . " ls *[a-za-z][0-9].* | sort -r >> $basedir/$subdir/contents"; } if (defined($ppt) && length($ppt) > 0 && -d "$indir/$sourcedir/$sourcedir" ) { system "cd $indir/$sourcedir/$sourcedir/;" . " ls *.* >> $basedir/$subdir/contents"; } # put the abstract in last, so it displays first. system "cd $basedir/$subdir; basename $abstract >>" . " $basedir/$subdir/contents"; $linenum++; } # done processing an item. --------------------------------------------------------------------------------------------------import.sh –#!/bin/sh # # import a collection from files generated on dspace # collection_id=1811/6635 eperson=[name removed]@osu.edu source_dir=./2009xml base_id=`basename $collection_id` mapfile=./map-dspace03-mss2009.$base_id /dspace/bin/dsrun org.dspace.app.itemimport.itemimport --add --eperson=$eperson --collection=$collection_id --source=$source_dir --mapfile=$mapfile appendix e. mss 2009 batch loading scripts (cont.) : | wang 81building an open source institutional repository at a small law school library | wang 81 fang wangcommunications v700 flatbed scanner, which was recommended by many digitization best practices in texas. for software, we had all the important basics such as ocr and image editing software for the project to start. for the following several months, i did extensive research on what digital asset management platform would be the best solution for the law library. we had options to continue displaying the digital collections through webpages or use a digital asset management platform that would provide long-term preservation as well as retrieval functions. we made the decision to go with the latter. generally speaking, there are two types of digital asset management platforms: proprietary and open source. in some rare occasions, a library chooses to develop its own system and not to use either type of the platforms if the library has designated programmers. there are pros and cons to both proprietary and open source platforms. although setting up the repository is fairly quick and easy on a proprietary platform, it can be very expensive to pay annual fees for hosting and using the service. for the open source software, it may appear to be “free” up front; however, installing and customizing the repository can be very time consuming and these solutions often lack technical and development support. there is no uniform rule for choosing a platform. it depends on what the organization wants to achieve and its own unique circumstances. i explored several popular proprietary platforms such as contentdm and digital commons. contentdm is an oclc product, which has a lot of capability and is especially good for displaying image collections. digital commons is owned of the repository is ongoing; it is valuable to share the experience with other institutions who wish to set up an institutional repository of their own and also add to the knowledgebase of ir development. institutional repository from the ground up unlike most large university libraries, law school libraries are usually behind on digital initiative activities because of smaller budgets, lack of staff, and fewer resources. although institutional repositories have already become a trend for large university libraries, it still appears to be a new concept for many law school libraries. 
at the beginning of 2009, i was hired as the digital information management librarian to develop a digital repository for the law school library. when i arrived at texas tech university law library, there was no institutional repository implemented. there were very few digital projects done at the law library. one digital collection was of faculty scholarship. this collection was displayed on a webpage with links to pdf files. another digital project, to digitize and provide access to the texas governor executive orders found in the texas register, was planned then disbanded because of the previous employee leaving the position. i started by looking at the digitization equipment in the library. the equipment was very limited: a very old and rarely used book scanner and a sheet-fed scanner. the good thing was that the library did have extra pcs to serve as workstations. i did research on the book scanner we had and also consulted colleagues i met at various digital library conferences about it. because the model is very outdated and has been discontinued by the vendor and thus had little value to our digitization project, i decided to get rid of the scanner. i then proposed to purchase an epson perfection building an open source institutional repository at a small law school library: is it realistic or unattainable? digital preservation activities among law libraries have largely been limited by a lack of funding, staffing and expertise. most law school libraries that have already implemented an institutional repository (ir) chose proprietary platforms because they are easy to set up, customize, and maintain with the technical and development support they provide. the texas tech university school of law digital repository is one of the few law school repositories in the nation that is built on the dspace open source platform.1 the repository is the law school’s first institutional repository in history. it was designed to collect, preserve, share and promote the law school’s digital materials, including research and scholarship of the law faculty and students, institutional history, and law-related resources. in addition, the repository also serves as a dark archive to house internal records. i n this article, the author describes the process of building the digital repository from scratch including hardware and software, customization, collection development, marketing and outreach, and future projects. although the development fang wang (fang.wang@ttu.edu) is digital information management librarian, texas tech university school of law library, lubbock, texas. 82 information technology and libraries | june 2011 two months later, we discovered that a preconfigured application called jumpbox for dspace was released and approved to be a much easier solution for the installation. the price was reasonable too, $149 a year (the price has jumped quite a bit since then). however, using jumpbox would leave our newly purchased red hat linux server of no use because jumpbox runs on ubuntu, therefore after some discussion we decided not to pursue it. we were a little stuck in the installation process. outsourcing the installation seemed to be a feasible solution for us at this point. we identified a reputable dspace service provider after doing extensive research including comparing vendors, obtaining references, and pursuing other avenues. after obtaining a quote, we were quite satisfied with the price and decided to contract with the vendor. 
while waiting for the contract to be approved by the university contracting office, i began designing the look and feel that is unique to the ttu school of law with some help from another library staff member. the installation finally took place at the beginning of january 2010. i worked very closely with the service provider during the installation to ensure the desired configuration for our dspace instance. our repository site with the ttu law branding became accessible to the public three days later. and with several weeks of warranty, we were able to adjust several configurations including display thumbnails for images. overall, we are very pleased with the results. after the installation, our it department maintains the dspace site and we host all the content on our own server. collection development of the ir content is the most critical element to an institutional repository. while we were waiting for our it department 66, the majority of the repositories worldwide were created using the dspace platform.2 for the installation, we looked at the opportunity to use services provided by the state digital library consortium texas digital library (tdl) and tried to pursue a partnership with the main university library, which had already implemented a digital repository. however, because of financial reasons and separate budgets, those approaches did not work out. so we decided to have our own it department install dspace. installation and customization of our dspace unlike large university libraries, smaller special libraries face many challenges while trying to establish an open source repository. after making the decision to use dspace, the first challenge we faced was the installation. dspace runs on postgresql or oracle and requires a server installation. customizing the web interface requires either the jspui (javaserver pages user interface) or xmlui (extensible markup language user interface). the staff in our it department knew little about dspace. however, another special library on campus offered their installation notes to our system administrator because they just installed dspace. although dspace runs on a variety of operating systems, we purchased red hat enterprise linux after some testing because it is the recommended os for dspace. then our system administrator spent several months trying to figure out how to install the software in addition to his existing projects. because we did not have dedicated it personnel working on the installation, the work was often interrupted and very difficult to complete. our it staff also found it very difficult to continue with the installation because the software requires a lot of expertise. by berkley press and is often used in the law library community. as a smaller law library, our budget did not allow us to purchase those platforms, which require annual fees of more than $10,000. so we had to look at the open source options. for the open source platforms, i investigated dspace, fedora, eprints and green stone. dspace is a javabased system developed by mit and hp labs. it offers a communitiescollections model and has built-in submission workflows and long-term preservation function. it can be installed “out of the box” and is easy to use. it has been widely adopted as institutional repository software in the united states and worldwide. fedora was also developed in the united states. it is more of a backend software with no web-based administration tools and requires a lot of programming effort. 
similar to dspace, eprints is another easy to set up and use ir software developed in the u.k. it is written in perl and is more widespread in europe. greenstone is a tool developed in new zealand for building and distributing digital library collections. it provides interfaces in 35 languages so it has many international users. when choosing an ir platform, it is not a question of which software is superior to others but rather which is more appropriate for the purpose and the content of the repository. our goal was to find a platform that had low costs and did not involve much programming. we also wanted a system that was capable of archiving digital items in various formats for the long term, flexible for data migration, had a widely accepted metadata scheme, decent search capability, and was easy to use. another factor we had to consider was the user base. because open source software relies on the user themselves for technical support for the most part, we wanted a software that had an active user community in the united states. dspace seemed to satisfy all of our needs. also, according to repository : | wang 83building an open source institutional repository at a small law school library | wang 83 hosted by the lubbock county bar association at the ttu law school. we made the initial announcement to the law faculty and staff and later to the lubbock county bar about the new digital initiative service we have established. we received very positive feedback from the law community. professor edgar’s family was delighted to see his collection made available to the public. following the success of the initial launch, i developed an outreach plan to promote the digital repository. to make the repository site more visible, several efforts were made: the repository site url was submitted to the dspace user registry, the directory of open access repositories (opendoar), and registry of open access repositories (roar); the site was registered with google webmaster tools for better indexing; and the repository was linked to several websites of the law school and library. the “faculty scholarship” collection and the “texas governor executive orders” collection became available shortly after. i then developed a poster of the newly established digital repository and presented it at the texas conference on digital libraries held at university of texas austin in may 2010. currently, our digital repository has more than eight hundred digital items as of august 2010. with more and more content becoming available in the repository, we plan on making an official announcement to the law community. we will also make entering first-year law students aware of the ir by including an article about the new repository in the library newsletter that is distributed to them during their orientation. our future marketing plan includes sending out announcements of new collections to the law school using our online announcement system techlawannounce and promoting the digital repository through the law library social networking pages on facebook and twitter. we also plan reviewed each year. based on the collection development policy, we made a decision to migrate the content of the old “faculty scholarship” collection from webpages into the digital repository. it was intended to include all publications of the texas tech law school faculty in the collection. 
we then hired a second-year law student as the digital project assistant and trained him on scanning, editing, and ocr-ing pdf files; uploading files to dspace; and creating basic metadata. we also brought another two student assistants on board to help with the migration of the faculty scholarship collection. the faculty services librarian checked the copyright with faculty members and publishers while i (the digital information management librarian) served as the repository manager handling more complicated metadata creation, performing quality control over student submissions, and overseeing the whole project. later development and promoting the ir during the faculty scholarship migration process, we discovered a need to customize dspace to allow active urls for publications. we wanted all the articles linked to three widely used legal databases: westlaw, lexisnexis, and hein online. because the default dspace system does not support active urls, it requires some programming effort to make the system detect a particular metadata field then render it as a clickable link. we outsourced the development to the same service provider who installed dspace for us. the results were very satisfying. the vendor customized the system to allow active urls and displayed the links as clickable icons for each legal database. in april 2010, “professor j. hadley edgar ’s personal papers” collection was made available in conjunction with his memorial service, to install dspace, we prepared and scanned two collections: the “texas governor executive orders” collection and the “professor j. hadley edgar’s personal papers” collection. the latter was a collection donated by professor edgar’s wife after he passed away in 2009. professor edgar taught at the law school from 1971 to 1991. he was named the robert h. bean professor of law and was twice voted by the student body as the outstanding law professor. the collection contains personal correspondence, photos, newspaper clippings, certificates, and other materials. many of the items have a high historic value to the law school. for the scanning standards, we used 200 dpi for text-based materials and 400 dpi for pictures. we chose pdf as our production file format as it is a common document format and smaller in size to download. after the installation was completed at the beginning of january, i drafted and implemented a digital repository collection development policy shortly after to ensure proper procedures and guidance of the repository development. the policy includes elements such as the purpose of the repository, scope of the collections, selection criteria and responsibilities, editorial rights, and how to handle challenges and withdrawals. i also developed a repository release form to obtain permissions from donors and authors to ensure open access for the materials in the repository. twelve collections were initially planned for the repository: “faculty scholarship,” “personal manuscripts,” “texas governor executive orders,” “law school history,” “law library history,” “regional legal history,” “law student works,” “audio/ video collection,” “dark archive,” “electronic journals,” “conference, colloquium and symposium,” and “lectures and presentations.” there will be changes to the collections in the future as the digital repository collection development policy will be 84 information technology and libraries | june 2011 all roads lead to rome. 
no matter what platform you choose, whether open source or not, the goal is to pick a system that best suits your organization’s needs. to build a successful institutional repository is not simply “scanning” and “putting stuff online.” various factors need to be considered, such as digitization, ir platform, collection development, metadata, copyright issues, and marketing and outreach. our experience has proven that it is possible for a smaller special library with limited resources and funding to establish an open source ir such as dspace and continue to maintain the site and build the collections with success. open source software is certainly not “free” because it requires a lot of effort. however, in the end it still costs a lot less than what we would pay to the proprietary software vendors. references 1. “the texas tech university school of law digital repository,” http://reposi tory.law.ttu.edu/ (accessed apr. 5, 2011). 2. “repository maps,” accessed http://maps.repository66.org/ (accessed aug. 16, 2010). (ssrn) links to individual articles in the faculty scholarship collection. after that, the next collections we will work on are the law school and law library history materials. we also plan to do some development on the dspace authentication to integrate with the ttu “eraider” system to enable single log-in. in the future, we want to explore the possibilities of setting up a collection for the works of our law students and engage in electronic journal publishing using our digital repository. conclusion it is not an easy task to develop an institutional repository from scratch, especially for a smaller organization. installation and development are certainly a big challenge for a smaller library with limited number of it staff. outsourcing these needs to a service provider seems to be a feasible solution. another challenge is training. we overcame this challenge by taking advantage of the state consortium’s dspace training sessions. subscribing to the dspace mailing list is necessary as it is a communication channel for dspace users to ask questions, seek help, and keep up to date about the software. on hosting information sessions for our law faculty and students to learn more about the digital repository. future projects there is no doubt that our digital repository will grow significantly because we have exciting collections planned for future projects. one of our law faculty, professor daniel benson, donated some of his personal files from an eight-year litigation representing the minority plaintiffs in the civil rights case of jones v. city of lubbock, 727 f. 2d 364 (5th cir. 1984) in which the minority plaintiffs won the case. the lawsuit changed the city of lubbock’s election system for city council members from the “at large” method to the “single member district system,” which allowed the minority candidates consistently being elected. this collection contains materials, notes, memoranda, letters, and other documents prepared and utilized by the plaintiffs’ attorneys. it has significant historical value because a texas tech law professor and five texas tech law graduates participated in that case successfully as pro bono attorneys for the minority plaintiffs. 
mark dehmlow editorial board thoughts: sharing responsibility in the digital age this topic is very resonant for me because this past year we launched a new interface to our catalog, rich with all of the features that our users have been self-trained to expect from browsing the internet. we actually launched this project in public beta for two years, t-w-o years. i should also mention that the initial implementation team was diverse, drawing from technology, public services, collections, and technical services. yet, when we launched the project into production, it was only then that we heard concerns and complaints. those concerns revolved around two things: first, there was functionality in the classic catalog that wasn't in the new one, and second, people were used to the old way of doing things and didn't know how the supposedly more intuitive interface worked (a kind of opac-holm syndrome); more importantly for librarians, they wanted to know how to exploit the system's full power. we also found that during the first semester there were few instructors teaching the new system, because they were afraid they couldn't speak authoritatively about it. people are creatures of habit, and even though something might be easier to learn if it were your first exposure to it (macs vs. pcs, anyone?), oftentimes changing from a more complex, but well understood, process is difficult. i remember, years ago at another institution i worked for, helping the organization move from a menu-driven ils interface to a graphical user interface. it required staff to actually rethink the process they were performing, because although the gui is able to make the process more efficient, it also hides many of the more mundane parts of it. with all of those concerns on the table all of a sudden, what did we do? we spent the summer after our production launch providing targeted training sessions and gathering in-person feedback from our internal stakeholders. it probably amounted to more than thirty meetings over the course of three months. we synthesized feedback, identified the biggest pain points, and spent a couple of months developing solutions. providing a more organized training program and targeted feedback sessions as a replacement for our more generalized call for input bought us a lot of goodwill internally. it also gave us some direction on what areas to focus on and opened more dialog with the rest of the library. in the end, it is really important for all areas to be responsible for trying out new systems, even if those responsible are doing more outreach than simple general calls for participation. in some ways, those deploying new systems have the greater onus in this relationship, in that they are driving much of the effort; this is especially critical for changes that have broad impact. taking a more organized and proactive approach to training and acclimating our organizations to change can go a long way toward reducing conflict and stress. everyone is extremely busy, and the tendency for people is to ignore the things that aren't directly in front of them. making efforts toward a more proactive strategy raises awareness, and by meeting in person, you show people that their input is valuable enough to make time to listen and talk to them. taking this type of approach is important even in the cases where projects are managed by committees.
liaisons don’t necessarily provide organizational saturation and oftentimes the vital information about a new system is filtered through their own sense of what is critical. a good start to determine how much communication is needed is to first gauge the potential impact—if a change affects more than a certain percentage of the library and its users, it probably means it will require a good deal more outreach so people don’t feel quite as off balance when the change is implemented. those deploying projects should add a couple of months onto the end of planning cycles to help provide training and gather feedback in a hands on way—e-mail announcements are more often ignored than not given the sheer amount of e-mail everyone gets these days. another possible strategy is to devise testing scripts for anyone trying the system to follow as opposed to just having them “try it out.” a script will give people some direction and hopefully get them into system functionality that they otherwise could miss by trying it without any specific goal. i am not so naive to think we will reach an allencompassing-kumbaya moment where communication is perfect and everyone agrees on what kinds of changes to implement in our organizations. i do think though, that teams and individuals who are implementing new systems can help alleviate anxieties if we build more time into our deployment processes to ease our organizations into change instead of hoping they learned how to swim before we all jump in. mark dehmlow (mdehmlow@nd.edu) is head, library web department, interim head, library information systems department, hesburgh libraries, university of notre dame, notre dame, indiana 16 information technology and libraries | march 2009 mathew j. miles and scott j. bergstrom classification of library resources by subject on the library website: is there an optimal number of subject labels? the number of labels used to organize resources by subject varies greatly among library websites. some librarians choose very short lists of labels while others choose much longer lists. we conducted a study with 120 students and staff to try to answer the following question: what is the effect of the number of labels in a list on response time to research questions? what we found is that response time increases gradually as the number of the items in the list grow until the list size reaches approximately fifty items. at that point, response time increases significantly. no association between response time and relevance was found. i t is clear that academic librarians face a daunting task drawing users to their library’s web presence. “nearly three-quarters (73%) of college students say they use the internet more than the library, while only 9% said they use the library more than the internet for information searching.”1 improving the usability of the library websites therefore should be a primary concern for librarians. one feature common to most library websites is a list of resources organized by subject. libraries seem to use similar subject labels in their categorization of resources. however, the number of subject labels varies greatly. some use as few as five subject labels while others use more than one hundred. in this study we address the following question: what is the effect of the number of subject labels in a list on response times to research questions? n literature review mcgillis and toms conducted a performance test in which users were asked to find a database by navigating through a library website. 
they found that participants “had difficulties in choosing from the categories on the home page and, subsequently, in figuring out which database to select.”2 a review of relevant research literature yielded a number of theses and dissertations in which the authors compared the usability of different library websites. jeng in particular analyzed a great deal of the usability testing published concerning the digital library. the following are some of the points she summarized that were highly relevant to our study: n user “lostness”: users did not understand the structure of the digital library. n ambiguity of terminology: problems with wording accounted for 36 percent of usability problems. n finding periodical articles and subject-specific databases was a challenge for users.3 a significant body of research not specific to libraries provides a useful context for the present research. miller’s landmark study regarding the capacity of human shortterm memory showed as a rule that the span of immediate memory is about 7 ± 2 items.4 sometimes this finding is misapplied to suggest that menus with more than nine subject labels should never be used on a webpage. subsequent research has shown that “chunking,” which is the process of organizing items into “a collection of elements having strong associations with one another, but weak associations with elements within other chunks,”5 allows human short-term memory to handle a far larger set of items at a time. larson and czerwinski provide important insights into menuing structures. for example, increasing the depth (the number of levels) of a menu harms search performance on the web. they also state that “as you increase breadth and/or depth, reaction time, error rates, and perceived complexity will all increase.”6 however, they concluded that a “medium condition of breadth and depth outperformed the broadest, shallow web structure overall.”7 this finding is somewhat contrary to a previous study by snowberry, parkinson, and sisson, who found that when testing structures of 26, 43, 82, 641 (26 means two menu items per level, six levels deep), the 641 structure grouped into categories proved to be advantageous in both speed and accuracy.8 larson and czerwinksi recommended that “as a general principle, the depth of a tree structure should be minimized by providing broad menus of up to eight or nine items each.”9 zaphiris also corroborated that previous research concerning depth and breadth of the tree structure was true for the web. the deeper the tree structure, the slower the user performance.10 he also found that response times for expandable menus are on average 50 percent longer than sequential menus.11 both the research and current practices are clear concerning the efficacy of hierarchical menu structures. thus it was not a focus of our research. the focus instead was on a single-level menu and how the number and characteristics of subject labels would affect search response times. n background in preparation for this study, library subject lists were collected from a set of thirty library websites in the united mathew j. miles (milesm@byui.edu) is systems librarian and scott j. bergstrom (bergstroms@byui.edu) is director of institutional research at brigham young university–idaho in rexburg. classification of library resources by subject on the library website | miles and bergstrom 17 states, canada, and the united kingdom. 
we selected twelve lists from these websites that were representative of the entire group and that varied in size from small to large. to render some of these lists more usable, we made slight modifications. there were many similarities between label names. n research design participants were randomly assigned to one of twelve experimental groups. each experimental group would be shown one of the twelve lists that were selected for use in this study. roughly 90 percent of the participants were students. the remaining 10 percent of the participants were full-time employees who worked in these same departments. the twelve lists ranged in number of labels from five to seventy-two: n group a: 5 subject labels n group b: 9 subject labels n group c: 9 subject labels n group d: 23 subject labels n group e : 6 subject labels n group f: 7 subject labels n group g: 12 subject labels n group h: 9 subject labels n group i: 35 subject labels n group j: 28 subject labels n group k: 49 subject labels n group l: 72 subject labels each participant was asked to select a subject label from a list in response to eleven different research questions. the questions are listed below: 1. which category would most likely have information about modern graphical design? 2. which category would most likely have information about the aztec empire of ancient mexico? 3. which category would most likely have information about the effects of standardized testing on high school classroom teaching? 4. which category would most likely have information on skateboarding? 5. which category would most likely have information on repetitive stress injuries? 6. which category would most likely have information about the french revolution? 7. which category would most likely have information concerning walmart’s marketing strategy? 8. which category would most likely have information on the reintroduction of wolves into yellowstone park? 9. which category would most likely have information about the effects of increased use of nuclear power on the price of natural gas? 10. which category would most likely have information on the electoral college? 11. which category would most likely have information on the philosopher emmanuel kant? the questions were designed to represent a variety of subject areas that library patrons might pursue. each subject list was printed on a white sheet of paper in alphabetical order in a single column, or double columns when needed. we did not attempt to test the subject lists in the context of any web design. we were more interested in observing the effect of the number of labels in a list on response time independent of any web design. each participant was asked the same eleven questions in the same order. the order of questions was fixed because we were not interested in testing for the effect of order and wanted a uniform treatment, thereby not introducing extraneous variance into the results. for each question, the participant was asked to select a label from the subject list under which they would expect to find a resource that would best provide information to answer the question. participants were also instructed to select only a single label, even if they could think of more than one label as a possible answer. participants were encouraged to ask for clarification if they did not fully understand the question being asked. recording of response times did not begin until clarification of the question had been given. response times were recorded unbeknownst to the participant. 
if the participant was simply unable to make a selection, that was also recorded. two people administered the exercise: one recorded response times; the other asked the questions and recorded label selections. relevance rankings were calculated for each possible combination of labels within a subject list for each question. for example, if a subject list consisted of five labels, for each question there were five possible answers. two library professionals, one with humanities expertise and the other with sciences expertise, assigned a relevance ranking to every possible combination of question and labels within a subject list. the rankings were then averaged for each question–label combination. results the analysis of the data was undertaken to determine whether the average response times of participants, adjusted by the different levels of relevance in the subject list labels that prevailed for a given question, were significantly different across the different lists. in other words, would the response times of participants using a particular list, for whom the labels in the list were highly relevant to the question, be different from those of participants using the other lists, for whom the labels in the list were also highly relevant to the question? a separate univariate general linear model analysis was conducted for each of the eleven questions. the analyses were conducted separately because each question represented a unique search domain. the univariate general linear model provided a technique for testing whether the average response times associated with the different lists were significantly different from each other. this technique also allowed for the inclusion of a covariate, relevance of the subject list labels to the question, to determine whether response times at an equivalent level of relevance were different across lists. in the analysis model, the dependent variable was response time, defined as the time needed to select a subject list label. the covariate was relevance, defined as the perceived match between a label and the question. for example, a label of "economics" would be assessed as highly relevant to the question, what is the current unemployment rate? the same label would be assessed as not relevant for the question, what are the names of four moons of saturn? the main factor in the model was the actual list being presented to the participant. there were twelve lists used in this study. the statistical model can be summarized as follows: response time = list + relevance + (list × relevance) + error. the general linear model required that the following conditions be met: first, the data must come from a random sample from a normal population; second, all variances within each of the groupings must be the same (i.e., they must have homoscedasticity). an examination of whether these assumptions were met revealed problems both with normality and with homoscedasticity. a common technique, logarithmic transformation, was employed to resolve these problems. accordingly, response-time data were all converted to common logarithms. an examination of assumptions with the transformed data showed that all questions but three met the required conditions. the three questions (5, 6, and 7) were excluded from subsequent analysis. figure 1, not reproduced here, plots the overall average of average search times for the eight questions for all experimental groups (i.e., lists).
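the model just described maps directly onto a standard statistics package. as a minimal sketch (not the authors' actual code), the following fits the same formulation in python with statsmodels, using log-transformed response times, the list as a categorical factor, and relevance as a covariate; the csv file and column names are placeholders for the study's data.

```python
# sketch of the univariate general linear model described above, fit once per
# question as the authors did. file name and column names are placeholders.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.read_csv("label_study.csv")             # columns: question, list_id, relevance, response_time
data["log_rt"] = np.log10(data["response_time"])  # common logarithms, as in the study

for q, subset in data.groupby("question"):
    # log_rt = list + relevance + (list x relevance) + error
    fit = smf.ols("log_rt ~ C(list_id) * relevance", data=subset).fit()
    print(f"question {q}")
    print(sm.stats.anova_lm(fit, typ=2))          # f-tests: list factor, covariate, interaction
```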
n conclusions the series of graphs in the appendix show the average response times, adjusted for relevance, for eight of the eleven questions for all twelve lists (i.e., experimental groups). three of the eleven questions were excluded from the analysis because of heteroscedascity. an inspection of these graphs shows no consistent pattern in response time as the number of the items in the lists increase. essentially, this means that, for any given level of relevance, the number of items of the list does not affect response time significantly. it seems that for a single question, characteristics of the categories themselves are more important than the quantity of categories in the list. the response times using a subject list with twenty-eight labels is similar to the response times using a list of six labels. a statistical comparison of the mean response time for each classification of library resources by subject on the library website | miles and bergstrom 19 group with that of each of the other groups for each of the questions largely confirms this. there were very few statistically significant different comparisons. the spikes and valleys of the graphs in the appendix are generally not significantly different. however, when the average response time associated with all lists is combined into an overall average from all eight questions, a somewhat clearer picture emerges (see figure 1). response times increase gradually as the number of the items in the list increase until the list size reaches approximately fifty items. at that point, response time increases significantly. no association was found between response time and relevance. a fast response time did not necessarily yield a relevant response, nor did a slow response time yield an irrelevant response. n observations we observed that there were two basic patterns exhibited when participants made selections. the first pattern was the quick selection—participants easily made a selection after performing an initial scan of the available labels. nevertheless, a quick selection did not always mean a relevant selection. the second pattern was the delayed selection. if participants were unable to make a selection after the initial scan of items, they would hesitate as they struggled to determine how the question might be reclassified to make one of the labels fit. we did not have access to a high-tech lab, so we were unable to track eye movement, but it appeared that the participants began scanning up and down the list of available items in an attempt to make a selection. the delayed selection seemed to be a combination of two problems: first, none of the available labels seemed to fit. second, the delay in scanning increased as the list grew larger. it’s possible that once the list becomes large enough, scanning begins to slow the selection process. a delayed selection did not necessarily yield an irrelevant selection. the label names themselves did not seem to be a significant factor affecting user performance. we did test three lists, each with nine items and each having different labels, and response times were similar for the three lists. a future study might compare a more extensive number of lists with the same number of items with different labels to see if label names have an effect on response time. this is a particular challenge to librarians in classifying the digital library, since they must come up with a few labels to classify all possible subjects. 
creating eleven questions to span a broad range of subjects is also a possible weakness of the study. we had to throw out three questions that violated the assumptions of the statistical model. we tried our best to select questions that would represent the broad subject areas of science, arts, and general interest. we also attempted to vary the difficulty of the questions. a different set of questions may yield different results. references 1. steve jones, the internet goes to college, ed. mary madden (washington, d.c.: pew internet and american life project, 2002): 3, www.pewinternet.org/pdfs/pip_college_report.pdf (accessed mar. 20, 2007). 2. louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62, no. 4 (2001): 361. 3. judy h. jeng, “usability of the digital library: an evaluation model” (phd diss., rutgers university, new brunswick, new jersey): 38–42. 4. george a. miller, “the magical number seven plus or minus two: some limits on our capacity for processing information,” psychological review 63, no. 2 (1956): 81–97. 5. fernand gobet et al., “chunking mechanisms in human learning,” trends in cognitive sciences 5, no. 6 (2001): 236–43. 6. kevin larson and mary czerwinski, “web page design: implications of memory, structure and scent for information retrieval” (los angeles: acm/addison-wesley, 1998): 25, http://doi.acm.org/10.1145/274644.274649 (accessed nov. 1, 2007). 7. ibid. 8. kathleen snowberry, mary parkinson, and norwood sisson, “computer display menus,” ergonomics 26, no 7 (1983): 705. 9. larson and czerwinski, “web page design,” 26. 10. panayiotis g. zaphiris, “depth vs. breath in the arrangement of web links,” www.soi.city.ac.uk/~zaphiri/papers/hfes .pdf (accessed nov. 1, 2007). 11. panayiotis g. zaphiris, ben shneiderman, and kent l. norman, “expandable indexes versus sequential menus for searching hierarchies on the world wide web,” http:// citeseer.ist.psu.edu/rd/0%2c443461%2c1%2c0.25%2cdow nload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/ cache/papers/cs/22119/http:zszzszagrino.orgzszpzaphiriz szpaperszszexpandableindexes.pdf/zaphiris99expandable.pdf (accessed nov. 1, 2007). 20 information technology and libraries | march 2009 appendix. 
response times by question by group. (the appendix consists of eight graphs, one for each of questions 1-4 and 8-11, plotting average response time for each experimental group from group a, 5 items, through group l, 72 items; the graphs are not reproduced here.) methods of randomization of large files with high volatility. patrick c. mitchell: senior programmer, washington state university, pullman, washington, and thomas k. burgess: project manager, institute of library research, university of california, los angeles, california. key-to-address conversion algorithms which have been used for a large, direct access file are compared with respect to record density and access time. cumulative distribution functions are plotted to demonstrate the distribution of addresses generated by each method. the long-standing practice of counting address collisions is shown to be less valuable in judging algorithm effectiveness than considering the maximum number of contiguously occupied file locations. the random access disk file used by the washington state university library acquisition sub-system is a large file with a sizable number of records being added and deleted daily. this file represents not only materials on order by the acquisitions section, but all materials which are in process within the technical services area of the library.
the size of the file currently varies from approximately 12,000 to 15,000 items and has a capacity of 18,000 items. over 40,000 items are added and purged annually. each record consists of both fixed length fields and variable length fields. fixed fields primarily contain quantity and accounting information; the variable length fields represent bibliographic data. records are blocked at 1,000 characters for file structuring purposes; however the variable length information is treated as strings of characters with delimiters. the key to the file is a 16-character structure which is developed from the purchase order number. the structure of the key is as follows: six digits of the original purchase order number, two digits of partial order and credit information, and eight digits containing the computed relative record address. proper development of this key turns out to be 80 journal of library automation vol 3/1 march, 1970 the most important factor in achieving efficiency in both file access time and record density within the file. the w.s.u. purchase order numbering system, developed from a basic six-digit purchase order number, allows up to one million entries. of these, the library currently uses four blocks: one block for standing orders, one block for orders originating from the university after the system becomes operational, another block used by the systems people in prototype testing of the system, and a fourth block which was given to one vendor who operates an approval book program. in mapping a possible million numbers into eighteen thousand disk locations, there is a high probability that the disk addresses for more than one record will be the same. disk location, also called disk address, home position, and relative record address ( rra) in this paper, refers to the computed offset address of a record in the file, relative to the starting address of the file. currently, the file resides on an ibm 2316 disk pack which can store six 1000-character records per track. thus if the starting address of the file is track 40, a record with rra = 5 would have its home position on track 40, while a record with rra = 6 would have its home position on track 41. it should be noted that routines in this system are required to calculate neither absolute track address nor relative track address and therefore the file could be moved to any direct access device supported by os/bdam without program modification. when two records map into the same address, it is called a collision. for a write statement under the ibm 360 operating system, basic direct access methods, the system locates that disk address generated and if another record is found there, it sequentially searches from that point forward until a vacant space is found and then stores the new record in that space. the sequential search is done by a hardware program in the i/ 0 channel and proceeds at the rotational speed of the device on which the file resides. the cpu is free during this period to service other users. similarily, when searching for a record, the system locates the disk address and matches keys; if they do not match, it sequentially searches forward from that point. long sequential searches sharply degrade the operating efficiency of on-line systems. in initial experimentation with this file, it was discovered that some records were 2,500 disk positions away from their computed locations. this seriously reduced response time to the terminals which were operating against those records. 
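the behavior just described, in which a record's home address is computed from its key and a collision is resolved by searching forward past occupied slots, is what makes the maximum contiguous run of occupied locations, rather than the raw collision count, the quantity worth watching. the python sketch below models that behavior for an arbitrary key-to-address function; the purchase order numbers and the simple modulo function are illustrative stand-ins, not the library's actual data or programs.

```python
# a toy model of bdam-style placement: each record's home position is computed
# from its key, and a collision is resolved by searching forward (wrapping at
# the end of the file, for simplicity) until a vacant slot is found. the figure
# of merit is the longest run of contiguously occupied slots, which bounds the
# worst-case sequential search.

from typing import Callable, Iterable, Optional

FILE_SIZE = 18_000  # capacity of the acquisitions file, from the article

def place_records(keys: Iterable[int],
                  home: Callable[[int], int],
                  size: int = FILE_SIZE) -> list[Optional[int]]:
    """store each key at its home address, probing forward on collision."""
    slots: list[Optional[int]] = [None] * size
    for key in keys:
        pos = home(key) % size
        while slots[pos] is not None:        # sequential forward search
            pos = (pos + 1) % size
        slots[pos] = key
    return slots

def longest_occupied_run(slots: list[Optional[int]]) -> int:
    """length of the longest block of contiguously occupied locations."""
    if all(s is not None for s in slots):
        return len(slots)
    best = run = 0
    for slot in slots * 2:                    # scan twice so wrapped runs are seen whole
        run = run + 1 if slot is not None else 0
        best = max(best, run)
    return best

# illustrative keys and hash; the real file used six-digit purchase order numbers.
purchase_orders = range(100_000, 113_000)
prime = 17_989                                # a prime just below the file capacity (example value)
slots = place_records(purchase_orders, lambda po: po % prime)
print("longest contiguous run:", longest_occupied_run(slots))
# the same harness can evaluate the blended modulo scheme and the
# generator-based scheme described in the following paragraphs.
```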
the necessity to develop a method for placing each record close to its calculated location became quite obvious. however, the methodology for doing this was not as clear. the upper bound delay for a direct access read/write operation can be defined as the largest number of contiguously occupied record locations within the file. the problem of minimizing this upper bound for a particular file is equivalent to finding an algorithm which maps the keys in such a way that unoccupied locations are interspersed throughout the file space. one method for doing this is to triple the amount of space required for the file. this has been a traditional approach but is unsatisfactory in terms of its efficiency in space utilization. the method first used by the library was motivated by the necessity to "get on the air." its requirements were that it be easily implemented and perform to a reasonable degree. the prime modulo scheme seemed to qualify and was selected. as this algorithm was used, the largest prime number within the file size was divided into the purchase order number and the modulo remainder was used as an address; that is, rra = [po modulo pr], where rra is the relative record address, po is the purchase order number, and pr is a prime number. during the initial period the file size grew to about 8,000 records. because the acquisitions section was converting from its manual operation, the file continued to grow in size and the collision problem became pronounced. when the file reached about 70% capacity, that is, when 70% of the space allocated for the file was being occupied by records, this method became unusable; records were then located so far from their original addresses that terminal response times became degraded and batch process routines began to have significant increases in run times. with no additional space available to expand the size of the file, it became necessary to increase the record density within the existing file bounds. therefore an adaptation of the original algorithm was developed. in addition to generating the original number by dividing a prime number into the purchase order number and keeping the modulo remainder, the purchase order number was multiplied by 300 and divided by that same prime number to get an additional modulo remainder; the latter was added to the first modulo remainder and the sum was then divided by 2: rra = [(po modulo pr) + (300 × po modulo pr)] / 2. again this scheme brought some relief, but the file continued to grow as the system was implemented, and it became obvious that this procedure would also fail because of over-crowded areas in the file. a search of the literature, using w. b. climenson's chapter on file structure (2) as a start, provided some other methods for reducing the collision problem (1, 3, 4, 5, 6). several randomization or hashing schemes were examined. however, none of these methods appeared to be particularly pertinent to the set of conditions at washington state. in order to bring relief from the continuing problem of file and program maintenance involved with changing the file-mapping algorithm, research was initiated to devise an algorithm which would, independent of the input data, map records uniformly across the available file space. the algorithm which resulted utilizes a pseudo-random number generator, rand (7), developed at the w.s.u.
computing center (randl, program 360l-13.5.004, computing center library, washington state university, pullman, washington). the normal use of rand is to generate a sequence of uniformly distributed integers over the interval [1, m], where m is a specified upper bound in the interval [1, 2^31 - 1]. in addition to m, rand has a second input parameter: n, the last number generated by rand. given m and n, rand generates a result r. rand is used by the algorithm to generate relative disk addresses by setting m to the size or capacity of the file, by setting n to the purchase order number of the record to be located, and by using r as the relative address of the record: rra = rand (po, m). in order to test the effectiveness of this algorithm and others which might be devised, a file simulation program was written (bdamsim, program 360l-06.7.008, computing center library, washington state university, pullman, washington). inputs to this program are: a) an algorithm to generate relative record locations; b) a sequential file which contains the input data for "a"; and c) various scalar values such as file capacity, approximate number of records in the file, title of output, etc. the program analyzes the numbers generated by "a" operating on "b" within the constraints of "c". the outputs of the program are some statistical results and a graphical plot showing the cumulative distribution function of the generated addresses. figures 1, 2, and 3 show the plotted output of the three algorithms operating against the current acquisitions file; the abscissas of the plots are the relative record addresses (x 10^2). (the plots themselves are not reproduced here; their captions are: fig. 1, rra = po modulo pr; fig. 2, rra = [(po modulo pr) + (300 × po modulo pr)] / 2; fig. 3, rra = rand (po, pr).) fig. 1. kanji teletypewriter keyboard of the national diet library. included on this keyboard are: kanji, 2,006; kana, 90; western alphabets, 144; numerals, 20; symbols and marks, 210; kanji patterns, 40; kanji components, 139; space, 1; total, 2,650. by using shift keys on the upper left of the keyboard, kana in both styles and alphabets in upper and lower cases can be input. for satisfactory operation, the keyers must be professionally trained, and it is said that one to three months are necessary for them to be fully trained and able to input an average of fifty to sixty kanji per minute. this is not as fast as most other methods discussed.
japanese typewriter the second of the full keyboard approaches is the japanese typewriter method, which uses a modification of the standard japanese typewriter with a tray filled with kanji printing types. the operator finds a character in the tray and punches it by moving a metal handle as the type bar is punched down to print the character. this is rather primitive and different in its operation from the english typewriter, which uses the ten-finger touch method. there are four variations: character location method. kanji are arranged on a keyboard by their codes, so that when a key is punched, the kanji is typed on regular paper as if it had been done by a regular japanese typewriter. at the same time, the code is automatically read from the location of the key and is punched on tape. code-plate scanning method. each type bar has a plate attached on its side, and the code for the character is marked on its plate . when a key is typed, the kanji is printed on paper and the code from the plate is optically scanned at the same time. coded typeface method. each typeface is made with a character on the upper half and a code for it on the lower hale when a key is typed, both the character and code are printed. the code on the bottom half is optically scanned from the printed paper. modified coded typeface method. instead of typing both characters and codes on the paper, this method prints only the characters on the front of the paper and, at the same time, prints a bar code on the back of the paper. the machine capable of doing this is complicated. the size of the character on a typeface can be bigger than in the variation above, and the bar code can be larger to make the scanning of the code easier and more precise. as the discussion of the four variations indicates, the japanese typewriter offers the advantage of being able to monitor input at the time of keying. since the japanese typewriter has been in use for a long time in offices where a quantity of official documents are dealt with, and since ordinary japanese typists can use this system without any additional training, the use of equipment similar in operation was considered advantageous . however, it should be noted that japanese typewriters have never become as prevalent as english typewriters, and the demand for computers comes from more areas than just those where japanese typewriters are used . for this reason, the use of japanese typewriters is not as advantageous as its proponents claim . an obvious 12 journal of library automation vol. 14/1 march 1981 disadvantage is its slow speed of operation-thirty to fifty characters per minute on the average. another disadvantage is that the number of characters on the keyboard is limited to about 3,000. tablet style this method, also known as pen-touch method, was recently developed . each character has a key, and characters are arranged in a certain order. the location of the characters on a matrix sheet determines the two-byte binary code, which consists of a two-digit numerical abscissa and twodigit numerical ordinate . the operator touches the key with a penshaped detector and the code for the character is punched on the paper tape. the operation is one-handed, requiring only a light touch of the key by a detector. keys are on one flat keyboard and are color-coded by sections to make it easier for the operator to locate them. light touch operation reduces operator fatigue. this method does not require special training. 
however, the number of kanji on a keyboard of reasonable size is limited to approximately 3,500. by shifting, twice as many characters can be handled, though not all characters are indicated on the keyboard. speed of input is not very high: thirty to seventy characters per minute. this system, already used in many libraries, is becoming increasingly popular because of its easy operation. three different technologies are used (electromagnetic, electrostatic, and photoelectric), but there are no differences in the actual input operation among them. component pattern input although not a full keyboard method, component pattern input is closely related to these methods. the idea behind this approach is that most kanji are composed of one or more basic component units, two or more of which can be put together into one kanji according to one predetermined pattern out of forty general patterns. the inputting device has keys for those forty patterns along with keys for individual components on a special keyboard. to compose a kanji, a key for an appropriate pattern is selected and typed, and components are chosen to fill each individually numbered block of the selected pattern, following the established order. each pattern has a code, and so does each component. when a key is typed, the code is punched on a paper tape, as shown in figure 2. there are cases where a kanji with two components can itself be a component of another kanji, as shown in the first and second examples in figure 2. a kanji is constructed by punching at least three codes: one for a pattern and at least two for components. then a kanji dictionary consisting of several thousand master-code combinations (see figure 3) is stored in a magnetic drum, and the several codes composing a kanji, punched on paper or cassette tapes, are converted through this dictionary to a two-byte binary code assigned to that particular kanji. (fig. 2, component pattern input, and fig. 3, kanji dictionary, are not reproduced here.) these are then handled as other kanji with an individual code. though this can be a stand-alone approach to inputting kanji, the principle has been adopted by the national diet library to supplement the inputting of kanji on the full keyboard kanji teletypewriter. the national diet library uses this system when inputting kanji that are not included in its keyboard. instead of having a special separate keyboard, the kanji teletypewriter of the national diet library integrates patterns and components as equivalents to other characters; its keyboard includes forty patterns and approximately 140 components. this was the most elementary approach to computerizing kanji. conceived in the early developmental stage of kanji processing, it exploited one of the characteristics of kanji, their composition from several components.
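the conversion step just described, in which several punched codes per kanji are looked up in a master-code dictionary to yield a single two-byte code, is easy to picture in code. in the sketch below, the pattern codes, component codes, and dictionary entries are invented for illustration; they are not the national diet library's actual code assignments.

```python
# illustrative model of component pattern input: a kanji is keyed as one pattern
# code followed by two or more component codes, and a dictionary of master-code
# combinations converts that sequence into a single two-byte kanji code.
# all code values below are made up for the example.

KANJI_DICTIONARY = {
    # (pattern code, component codes...) -> two-byte kanji code
    ("28", "11", "45"): 0x3021,
    ("31", "11", "45", "60"): 0x3022,
}

def convert(punched_codes: list[str]) -> int:
    """convert one keyed sequence (pattern + components) to its kanji code."""
    key = tuple(punched_codes)
    if len(key) < 3:
        raise ValueError("a kanji needs at least one pattern code and two component codes")
    try:
        return KANJI_DICTIONARY[key]
    except KeyError:
        raise KeyError(f"no master-code combination for {key}") from None

print(hex(convert(["28", "11", "45"])))   # -> 0x3021
```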
in actual situations, this technique requires at least three key strokes for one kanji and consumes time to locate the needed component on the keyboard. furthermore, it requires the complicated extra step of putting input codes through a kanji dictionary to combine component codes into a code per kanji. no library is currently using this system by itself. kana keyboard system the keyboard of a japanese syllabary typewriter has adapted the conventional english typewriter keyboard and has standard roman alphabet keys that contain katakana in shift (figure 4). since the number of katakana exceeds that of roman letters, the katakana keys extend to the keys for numerals and punctuation marks. this means that this typewriter can be used either for kana or for roman letters by changing its mode. fig. 4. kana typewriter keyboard. two-key stroke method this variation of the kana keyboard system is referred to as the two-key stroke system, and uses kana as codes, not as letters. roman letters can be used as codes, too. there are two different subvariations. location correspondence. keys are divided into two sections: one for the right hand, and the other for the left hand. if two keys are to be stroked, there are four possible combinations of key strokes: (1) left hand twice, (2) left and right, (3) right and left, and (4) right twice. the keyboard is accompanied by a kanji table in which characters are arranged in several blocks and in a certain order within each block. each block, which contains twenty-six kanji in a four-by-six arrangement, is made according to one combination of strokes: the first block is left and left; the second block is left and right; and so on. within each block, the ordinate consists of keys for the first stroke and the abscissa of keys for the second, and the kanji at the intersection indicates which keys are to be typed. when kanji a is to be typed (see figure 5), since it is in block a, indicating the stroke combination left and left, the operator types keys a and w with the left hand. if kanji b is to be typed, the operator types key a with the left hand and key p with the right. each key has a byte code, and a combination of two key strokes makes a composite, two-byte binary code for a kanji. the bit may be changed by shifting, so that additional kanji beyond those in the basic table can be typed. (fig. 5, the kanji table for the location correspondence method, shows block a, for left-left, and block b, for left-right; it is not reproduced here.) association memory method. in this method, each kanji is given two kana which usually represent a reading of that kanji. the operator associates a kanji to be input with the two kana assigned to that kanji, and types them with two strokes using the kana keys. both of the key-stroke methods are economical as well as convenient because of the wide availability of kana typewriters. mainly for that reason, both of these systems have been well accepted and are expected to grow further. since this touch method does not require the operator to look for the character on the keyboard, it is the fastest to operate and is considered suitable for input in quantity. it is possible to input 60 to 120 characters per minute.
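the location correspondence variation maps a pair of keystrokes to a two-byte code by way of the block determined by which hands struck the keys and the position within that block. the sketch below models that lookup; the key layout, block assignments, shift-bit position, and resulting codes are invented for illustration, not an actual published table.

```python
# toy model of the two-key stroke "location correspondence" method: the hand
# combination of the two strokes selects a block of the kanji table, and the
# (first key, second key) position within that block selects one kanji code.
# layouts and codes below are illustrative only.

LEFT_KEYS = set("qwertasdfgzxcvb")
RIGHT_KEYS = set("yuiophjklnm")

# block tables: (first key, second key) -> two-byte code (made-up values)
BLOCKS = {
    ("left", "left"):   {("a", "w"): 0x3021, ("q", "z"): 0x3022},
    ("left", "right"):  {("a", "p"): 0x3041},
    ("right", "left"):  {},
    ("right", "right"): {},
}

def hand(key: str) -> str:
    if key in LEFT_KEYS:
        return "left"
    if key in RIGHT_KEYS:
        return "right"
    raise ValueError(f"unknown key {key!r}")

def two_stroke_code(first: str, second: str, shift: bool = False) -> int:
    """look up the two-byte code for a keystroke pair; shifting flips a high bit."""
    code = BLOCKS[(hand(first), hand(second))][(first, second)]
    return (code | 0x8000) if shift else code   # shift selects a second table of kanji

print(hex(two_stroke_code("a", "w")))               # block "left, left"
print(hex(two_stroke_code("a", "p", shift=True)))   # shifted variant
```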
the only drawback is that the operator must get acquainted with the arrangement of kanji in the first variation, and must memorize all the associated kana spelling for many kanji in case of the second variation. in either case, the operator must be professionally trained. the japan information center for science and technology, which indexes many scientific publications, employs a vendor who uses the location correspondence variation of this system for inputting information. display selection this also uses a kana typewriter with a screen in front . when a word is typed in kana, a group of kanji with that sound are displayed on the screen. the operator chooses the right kanji with a light pen-a slow but accurate operation. the operator does not have to be specially trained for this. kana-kanji conversion in contrast to the conventional approach of full keyboard inputting, an entirely new method for inputting kanji is gaining popularity as the 16 journal of library automation vol. 14/1 march 1981 availability of sophisticated software increases. this uses a kana typewriter keyboard to input japanese in syllabary or romanized form, converting them to kanji by software. there are two ways of conversion: one that converts word by word, and the other sentence by sentence. stenotype the stenotype is a typewriterlike device. the operator must be able to take shorthand. when the stenotype is used, it punches words in paper tapes. therefore, inputting is high speed. however, the operator must receive proper training. optical character recognition this system, developing quickly and expected to gain wider use, can scan a maximum of 2,500 printed kanji. 8 one variation connects a writing tablet to a computer so that as the operator writes kanji on the tablet, the computer scans them in stroke order. this function of scanning by the stroke order is considered to be an advantage for processing some types of japanese documents. the drawbacks are that the system is still very expensive, and the number of recognizable characters is fewer than 2,000. voice recognition this is an oral-visual system, in which the human voice is read by a computer. obviously the most difficult to develop, this system is still in an experimental stage . however, a prototype has been demonstrated at various exhibitions, and the system apparently possesses great potential. summary pattern configuration and output devices for japanese characters are basically the same as those for english. however, the pattern generation of characters is mechanically more complicated than that of the roman alphabet, because kanji has a more complicated structure than the roman alphabet and the number of components is greater. each kanji is represented by a two-byte binary code rather than one byte as in roman alphabet. because of this, the efficiency of retrieval is low. presently, hard copy and typesetting for printing of hard copy are the major output forms, and very little on-line retrieval of information with kanji is in current operation. problems particular to kanji processing among numerous problems in processing kanji through computers, major ones are: (1) which kanji are to be included; (2) how many characters are to be handled; (3) what code should be assigned and how it should be arranged on the keyboard or table; and (4) how the kanji not included on the keyboard should be treated. 
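the display-selection and kana-kanji conversion approaches described above both reduce to a dictionary lookup from a kana (or romanized) string to one or more candidate kanji spellings, with either the operator or the software choosing among the candidates. a minimal word-by-word sketch follows; the dictionary entries are toy examples, not real conversion software.

```python
# word-by-word kana-to-kanji conversion in miniature: each kana word is looked
# up in a dictionary of candidate spellings. taking the first candidate mimics
# naive software conversion, while returning the full candidate list mirrors
# display selection (the operator picks with a light pen). toy entries only.

CANDIDATES = {
    "としょかん": ["図書館"],               # toshokan: library
    "こうこう": ["高校", "孝行", "航行"],    # kookoo: several homophones
}

def convert_word(kana: str) -> tuple[str, list[str]]:
    """return (chosen spelling, all candidates) for one kana word."""
    options = CANDIDATES.get(kana, [kana])   # fall back to the kana itself
    return options[0], options

chosen, options = convert_word("こうこう")
print(chosen, options)   # software picks the first; an operator could pick another
```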
in the early stage of kanji computer development, different institutions handled the problems in ways best suited to their individual needs, according to the nature of the literature covered, the amount of literature processed, and the kinds of output needed. they experimented with the then best available capabilities. as a result, the finished systems are all independent and mutually incompatible. standardization is obviously necessary for exchange of information among the systems. in order to set standards for selection of characters and assignment of codes, jis (japan industrial standard) c6226-1978 has been compiled by the japan association for development of information processing. this is a table of characters designed for information exchange (a portion of which is shown in figure 6, "code of the japanese graphic character set for information interchange," not reproduced here). it has a one-byte code as its abscissa and another as its ordinate. characters are arranged so that the intersection of abscissa and ordinate determines a kanji whose code consists of four numerals, two from the abscissa and two from the ordinate. included in the table are kana in both styles, roman, greek, and cyrillic alphabets in upper and lower cases, diacritical marks, numerals, and punctuation marks, as follows: 1. special characters, 108; 2. numerals (arabic), 10; 3. roman alphabets, 52; 4. hiragana, 83; 5. katakana, 86; 6. greek alphabets, 48; 7. cyrillic alphabets, 66; 8. kanji, 6,349; total, 6,802. in the first section of the table, numerals, alphabets, kana, and special characters are grouped. in the second section, a total of 2,965 frequently used kanji are arranged as the first priority group, and an additional 3,384 kanji are selected as the second group in the bottom half of the table. kanji are printed in the preferred style for printing typeface. this table will resolve problems 1 to 3 mentioned above. institutions that had arranged their own codes for kanji, including the national institute of japanese literature, are now automatically translating their own codes into jis codes. in cases where needed kanji are not included on the keyboard, handling varies. with the japanese typewriter, because each kanji is inscribed on a typeface, only the kanji on that typeface is printed when the type bar is stroked; therefore, only kanji that have typefaces can be input in this system, while some other handling is possible in other methods. while the number of characters that can be accommodated on keyboards is limited to 2,000 to 3,500, depending on the type of equipment,
character generators have the capability of outputting more than the number of characters on the keyboard; figure 7 shows their relationship. characters that are in the generator but not on the keyboard must be frequently processed, because the number of characters needed for most documents could reach 6,000 to 6,500. using a shift key to enter another mode is a fairly common technique for inputting uncommon kanji. the keyboard may not have a character but, if the character generator has it, the code for that character can be input by shifting. for example, if a character on the keyboard has the code 0117, a bit is changed so that the code 8117 can be typed by shifting and typing that key. if the code 8117 is assigned to another kanji not on the keyboard but indexed in the dictionary, that kanji can be input. this applies for the kanji teletypewriter, the tablet style, and the two-key stroke variations of the kana typewriter. in the kanji teletypewriter system used by the national diet library, the keyboard accommodates 2,650 characters, while its character generator has the capability for 5,717. (fig. 7, kanji creating capability, diagrams the relationship among keyboard characters, character generator capability, system capability, and what lies outside system capability; it is not reproduced here.) operators in the national diet library input kanji that are not on the keyboard by using the component pattern input method. or, if the operator finds the kanji code in the specially compiled dictionary in which codes for kanji are indexed, a shift key is used to change the bit, thus creating the code for a kanji not on the keyboard. most other tablet systems use code dictionaries. in the two-key stroke variations of kana typewriters, tables of kanji for second, third, or more shifts can be built, especially when the location association method is used. the handling of kanji that are not in character generators is more difficult. only the digital character generator, the kind that uses either dots or strokes, can add characters fairly easily. in the flying spot system, characters can be added, but it must be done professionally with an additional character cylinder and is very costly. the national diet library, which now uses flying spot, limits the addition of kanji to a minimum. because its output is solely in printed book form, the national diet library inputs a fill character for kanji not in the system. when the phototypeset masters are made, the fill characters are replaced by typeset characters. the use of a fill character suffices only when the output is phototypeset, because there is a step to replace fill characters by typeface. however, as long as the data base includes many fill characters on the magnetic tapes, the on-line retrieval of information or later utilization of the tapes remains unsatisfactory. the national institute of japanese literature uses a dot matrix and prints by wire-dot impact.
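two ideas from the preceding paragraphs, a character code built from a row coordinate and a column coordinate in the jis c6226 table, and a shift key that flips a bit to reach characters beyond the keyboard, can be sketched as follows. the helper names, the sample coordinates, and the exact bit flipped are illustrative; the actual bit assignments of any particular terminal are not reproduced here.

```python
# illustrative sketch: a jis c6226-style code pairs a one-byte row (abscissa)
# value with a one-byte column (ordinate) value, conventionally read as four
# digits (two from each axis). a shift key can flip a high bit so the same
# physical key reaches a character that is not printed on the keyboard.
# the packing and the shift-bit position below are examples, not a spec.

def code_from_coordinates(row: int, col: int) -> int:
    """pack a (row, column) pair from the code table into one two-byte value."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("rows and columns in the table run from 1 to 94")
    return (row << 8) | col

def four_digit_form(row: int, col: int) -> str:
    """the 'four numerals, two from the abscissa and two from the ordinate' reading."""
    return f"{row:02d}{col:02d}"

def shifted(code: int) -> int:
    """flip a high bit, as in the 0117 -> 8117 example (reading the codes as hexadecimal)."""
    return code ^ 0x8000

code = code_from_coordinates(16, 2)
print(four_digit_form(16, 2), hex(code), hex(shifted(code)))
```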
in addition to the problems common to any character output, such as size and number of dots, the problem of the space for kanji in relation to other characters and the choice of vertical or horizontal printing of japanese sentences with kanji must be considered. kanji have many strokes and, as mentioned before, are expressed by two-byte codes. each kanji needs a double space when displayed on screens or printed. when a kanji is used with numerals or kana, the kanji part looks fine but the numerical part has too much space between each numeral. therefore, input of kanji is done in a kanji mode, and input of kana, roman letters, and numerals is done in a kana-numerical mode. in this way a multidigit figure looks like one whole figure rather than a line of one-digit figures. some formal documents must be printed in the traditional vertical arrangement. to cope with this situation, some line printers have the capability to precompose a vertical page before printing it. there are multicolor crts on the market that can be used for the retrieval of library-related information, e.g., main entry in red, series statement in yellow.

one last problem that must be considered is that most of these systems require trained operators, or else the operation is very slow. the information is edited and compiled by the editors and prepared for input in the form of worksheets. so are the revisions. at various stages of revising the text, the information must be printed, given to the editors, and revised. further developments in simplifying input and revising texts for efficient flow are to be expected.

application of kanji systems

processing of vernacular-language materials in their own writing systems is considered vital for research libraries in this country. in adopting kanji systems in such libraries, there are three major factors that must be considered: the objectives and needs of the institution, the cost, and the personnel.

first, the institution must know what it must accomplish by means of such a system. the needs may not be the same for all institutions. is the system for retrieving catalog information, or for inputting catalog and other information? is it for internal processing or patron use? is it for a large bibliographic utility to distribute information to its subscribers, or for an individual institution to process its own information? could the system be shared by the department of asian studies in any way? the character set needs of the institution are a major factor in choosing the system. since input and output devices are different, i.e., one cannot input kanji on a crt and retrieve kanji from the same crt, the institution must consider how much it will need to input, or whether it can rely on available data bases. some institutions may not need any input equipment if they utilize available data bases.
if japan marc and other tapes are made accessible by a large bibliographic utility in this country, the institutions will be able to obtain bibliographic information in kanji on the screen. if they want only catalog cards or a com catalog, they will not need any equipment except the terminals supported by the utility. if they want to input, they must consider what form or forms of output they need and how to create the characters not included in the system, in addition to which system to choose.

second, cost is an important factor. is the expense justified in terms of the other needs of the library? what can be accomplished per dollar spent? the kanji systems are still expensive, though the cost will eventually be reduced. how much can be spent and how much continuing support can be expected are factors that modify system expectations. the budget must include not only the one-time hardware cost, but also the software, maintenance, and personnel.

third, the availability of personnel will affect the choice of system. what degree of language expertise does the system require in each stage of operation, such as inputting, maintenance, and programming? does it need terminal operators trained in those languages? what other personnel does the system need as far as language-related qualifications are concerned?

apart from the three major factors discussed above, there are some technical aspects that must be adjusted to library situations in this country. since japanese, chinese, and korean use the same chinese ideographs to different degrees and in different ways, libraries considering automated processing of these language materials are probably expected to handle all three languages with the same system, to say nothing of the other non-roman scripts. problems will arise in selecting characters for inclusion in the system. as pointed out earlier with regard to japanese character processing, there are simply too many characters for the present capacity of any computer. if korean and chinese are to be handled by the same computer, this problem multiplies. the korean alphabet, called hangul, would have to be included. chinese has more characters than japanese. worse yet is the fact that some kanji are simplified in different ways in japan and china, so that they are neither recognizable nor interchangeable between the two. it will be an enormous task to accommodate both in the same system.

another problem is the arrangement and indexing of kanji. if a full keyboard, a japanese typewriter keyboard, or a two-key stroke system, especially its location association method on the kana typewriter, is considered for japanese, chinese, and korean, the arrangement of the characters must be indexed and accessed for all three languages, in addition to the multiple readings found in japanese. for example, kanji on the japanese keyboard are usually arranged by the initial sound of the japanese reading of the kanji. this arrangement will be useless for chinese and korean, because japanese readings are not the same as chinese or korean readings. the arrangement of kanji on the keyboards must be based on some new principle common to these languages. even if kana-kanji conversion is used, and roman alphabet-kanji conversion software is adopted, software to handle those three languages must be developed. such software would have to be highly sophisticated.
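to make the arrangement problem concrete, the fragment below sketches how one ideograph carries different readings in the three languages, so an index built on the japanese reading alone gives no useful access for chinese or korean users. the readings shown are standard dictionary readings, but the data structure and the sorting approach are purely illustrative.

```python
# illustrative sketch: the same ideograph has different readings in
# japanese, chinese, and korean, so a keyboard or index arranged by the
# japanese reading is useless for the other two languages.

readings = {
    "東": {"japanese": ["tō", "higashi"], "chinese": ["dōng"], "korean": ["dong"]},
    "山": {"japanese": ["san", "yama"],  "chinese": ["shān"], "korean": ["san"]},
}

def arrange_by_reading(language: str):
    """sort characters by their first reading in the chosen language."""
    return sorted(readings, key=lambda ch: readings[ch][language][0])

print(arrange_by_reading("japanese"))  # san before tō  -> ['山', '東']
print(arrange_by_reading("chinese"))   # dōng before shān -> ['東', '山']
```

the two calls return the characters in different orders, which is exactly why an arrangement tied to one language's readings cannot simply be reused for the others.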
the presence of many homonyms in chinese will cause a great problem to the extent that the system relies on transliterated or romanized forms of the language. recognition of the many identical spellings in different language contexts will be extremely difficult.

the above discussion is based on what is currently available in japan. the combination of existing inputting, generating, and outputting equipment developed by japanese technology opens up various possibilities for us to build effective systems in this country.

acknowledgment

this article is based on a study conducted in japan as a japan foundation professional fellow, and as a visiting research fellow of the center for research on information and library science, university of tokyo.

references

1. national institute of japanese literature, implementation of a computer system and a kanji handling system at nijl (tokyo: nijl, 1978), p.16.
2. toshio ishiwata, "kanji shori kenkyū ni motomerareru mono" [requirements for study on kanji processing], computopia no. 9 (1977), p.35.
3. gendai yōgo no kiso chishiki, 1980 [basic knowledge on current terms, 1980] (tokyo: jiyūkokuminsha, 1980), p.999.
4. figures are taken from the following two sources and compiled by the author: hasegawa, jitsurō, "kanji shori sōchi" [kanji processing devices], jōhō shori [information processing] 19, no. 4: 353 (april 1978); sugai, kazurō, "kanji nyū-shutsuryoku sōchi no kaihatsu dōkō" [a trend in development of kanji input-output devices], business communication 16, no. 7: 41 (1979).
5. used for the pattern input mentioned in the following component pattern input system.
6. national diet library, library automation in the national diet library (tokyo: the library, 1979), p.4.
7. ibid., p.7.
8. asia business consultants is using an optical character recognition system that can scan handwritten kana and numerals on a small scale to input and process catalog information for a library collection.
9. "jōhō kōkan no tame no kanji fugō no hyōjunka" [standardization of kanji code for information interchange], kagaku gijutsu bunken sābisu [scientific and technical documents service] no. 50 (1978), p.29.
10. ibid., p.28.

ichiko morita is assistant professor in library administration and head, automated processing division, the ohio state university libraries.

editor's notes

most jola readers are aware of significant delays in publication in the last volume. susan k. martin, a former editor of jola, and richard d. johnson, a former editor of college & research libraries, gave freely of their time and energy to bring the journal back on schedule. mary madden, judith schmidt, and the members of the editorial board under the leadership of charles husbands all worked closely with sue and richard in this effort. this was a second time around for sue, who undertook a similar task when she assumed the jola editorship in 1972. the jola readership and this editor owe debts of gratitude to sue, richard, and all the others who helped. we do not foresee major changes in the format of the journal as established principally under the editorships of kilgour and martin. we look for increased strength in our book reviews section under the editorship of david weisbrod. the addition of tom harnish as assistant editor for video technologies indicates our recognition of the growing importance of video-based information systems. we encourage reader suggestions.
w e welcome brief communications of successes or failures that might be of interest to other readers. letters to the editor about any of our feature articles or communications are solicited. president’s message: open access/open data colleen cuddy information technologies and libraries | march 2012 1 i am very excited to write this column. this issue of information technology and libraries (ital) marks the beginning of a new era for the journal. ital is now an open-access, electronic-only journal. there are many people to thank for this transition. the lita publications committee led by kristen antelman did a thorough analysis of publishing options and presented a thoughtful proposal to the lita board; the lita board had the foresight to push for an open-access journal even if it might mean a temporary revenue loss for the division; bob gerrity, ital editor, has enthusiastically supported this transition and did the heavy lifting to make it happen; and the lita office staff worked tirelessly for the past year to help shepherd this project. i am proud to be leading the organization during this time. to see ital go open access in my presidential year is extremely gratifying. as cliff lynch notes in his editorial, “the library profession has been slow to open up access to the publications of its own professional societies, to take advantage of the greater reach and impact that such policies can offer.” as librarians challenge publishers to pursue open-access venues, myself included, i am relieved to no longer be a hypocrite. by supporting open access we are sending a strong message to the community that we believe in the benefits of open access and we encourage other library organizations to do the same. ital will now reach a much broader and larger audience. this will benefit our authors, the organization, and the scholarship of our profession. i understand that while our members embrace open access, not everyone is pleased with an online-only journal. the number of new journals being offered electronically only is growing and i believe we are beginning to see a decline in the dual publishing model of publishers and societies offering both print and online journals. my library has been cutting back consistently on print copies of journals and this year will get only a handful of journals in print. personally, i have embraced the electronic publishing world. in fact, i held off on subscribing to the new yorker until it had an ipad subscription model! i estimate that i read 95 percent of my books and all of my professional journals electronically. the revolution has happened for me and for many others. i know that our membership will adapt and transition their ital reading habits to our new electronic edition and i look forward to seeing this column and the entire journal in its new format. colleen cuddy (colleen.cuddy@med.cornell.edu) is lita president 2011-12 and director of the samuel j. wood library and c. v. starr biomedical information center at weill cornell medical college, new york, new york. mailto:colleen.cuddy@med.cornell.edu president’s message | cuddy 2 earlier this week saw the research works act die. librarians and researchers across the country celebrated this victory as we preserved an important open-access mandate requiring the deposition of research articles funded by the national institutes of health into pubmed central. this act threatened not just research but the availability of health information to patients and their families. 
as librarians, we still need to be vigilant about preserving open access and supporting open-access initiatives. i would like to draw your attention to the federal research public access act (frpaa, hr 4004). this act was recently introduced in the house, with a companion bill in the senate. as described by the association of research libraries (http://www.arl.org/pp/access/frpaa-2012.shtml), frpaa would ensure free, timely, online access to the published results of research funded by eleven u.s. federal agencies. the bill gives individual agencies flexibility in choosing the location of the digital repository to house this content, as long as the repositories meet conditions for interoperability and public accessibility, and have provisions for long-term archiving. the legislation would extend and expand access to federally funded research resources and, importantly, spur and accelerate scientific discovery. notably, this bill does not take anything away from publishers. no publisher will be forced to publish research under the bill's provisions; any publisher can simply decline to publish the material if it feels the terms are too onerous. i encourage the library community to contact their representatives to support this bill.

open access and open data are the keystones of e-science and its goals of accelerating scientific discovery. i hope that many of you will join me at the lita president's program on june 24, 2012, in anaheim. tony hey, corporate vice president of microsoft research connections and former director of the u.k.'s e-science initiative, and clifford lynch, executive director of the coalition for networked information, will discuss data-intensive scientific discovery and its implications for libraries, drawing from the seminal work the fourth paradigm. librarians are beginning to explore our role in this new paradigm of providing access to and helping to manage data in addition to bibliographic resources. it is a timely topic and one in which librarians, due to our skill set, are poised to take a leadership role. reading the fourth paradigm was a real game changer for me. it is still extremely relevant. you might consider reading a chapter or two prior to the program. it is an open-access e-book available for download from microsoft research (http://research.microsoft.com/en-us/collaboration/fourthparadigm/). i keep a copy on my ipad, right there with downloaded ital article pdfs.

mary kurtz

dublin core, dspace, and a brief analysis of three university repositories

this paper provides an overview of dublin core (dc) and dspace together with an examination of the institutional repositories of three public research universities. the universities all use dc and dspace to create and manage their repositories. i drew a sampling of records from each repository and examined them for metadata quality using the criteria of completeness, accuracy, and consistency. i also examined the quality of records with reference to the methods of educating repository users. one repository used librarians to oversee the archiving process, while the other two employed two different strategies as part of the self-archiving process. the librarian-overseen archive had the most complete and accurate records for dspace entries.

the last quarter of the twentieth century has seen the birth, evolution, and explosive proliferation of a bewildering variety of new data types and formats.
digital text and images, audio and video files, spreadsheets, websites, interactive databases, rss feeds, streaming live video, computer programs, and macros are merely a few examples of the kinds of data that can be now found on the web and elsewhere. these new dataforms do not always conform to conventional cataloging formats. in an attempt to bring some sort of order from chaos, the concept of metadata (literally “data about data”) arose. metadata is, according to ala, “structured, encoded data that describe characteristics of informationbearing entities to aid in the identification, discovery, assessment, and management of the described entities.”1 metadata is an attempt to capture the contextual information surrounding a datum. the enriching contextual information assists the data user to understand how to use the original datum. metadata also attempts to bridge the semantic gap between machine users of data and human users of the same data. n dublin core dublin core (dc) is a metadata schema that arose from an invitational workshop sponsored by the online computer library center (oclc) in 1995. “dublin” refers to the location of this original meeting in dublin, ohio, and “core” refers to that fact dc is set of metadata elements that are basic, but expandable. dc draws upon concepts from many disciplines, including librarianship, computer science, and archival preservation. the standards and definitions of the dc element sets have been developed and refined by the dublin core metadata initiative (dcmi) with an eye to interoperability. dcmi maintains a website (http://dublincore.org/ documents/dces/) that hosts the current definitions of all the dc elements and their properties. dc is a set of fifteen basic elements plus three additional elements. all elements are both optional and repeatable. the basic dc elements are: 1. title 2. creator 3. subject 4. description 5. publisher 6. contributor 7. date 8. type 9. format 10. identifier 11. source 12. language 13. relation 14. coverage 15. rights the additional dc elements are: 16. audience 17. provenance 18. rights holder dc allows for element refinements (or subfields) that narrow the meaning of an element, making it more specific. the use of these refinements is not required. dc also allows for the addition of nonstandard elements for local use. n dspace dspace is an open-source software package that provides management tools for digital assets. it is frequently used to create and manage institutional repositories. first released in 2002, dspace is a joint development effort of hewlett packard (hp) labs and the massachusetts institute of technology (mit). today, dspace’s future mary kurtz (mhkurtz@gmail.com) is a june 2009 graduate of drexel university’s school of information technology. she also holds a bs in secondary education from the university of scranton and an ma in english from the university of illinois at urbana– champaign. currently, kurtz volunteers her time in technical services/cataloging at simms library at albuquerque academy and in corporate archives at lovelace respiratory research institute (www.lrri.org), where she is using dspace to manage a diverse collection of historical photographs and scientific publications. 
dc, dspace, and a brief analysis of three university repositories | kurtz 41 is guided by a loose grouping of interested developers called the dspace committers group, whose members currently include hp labs, mit, oclc, the university of cambridge, the university of edinburgh, the australian national university, and texas a&m university. dspace version 1.3 was released in 2005 and the newest version, dspace 1.5, was released in march 2008. more than one thousand institutions around the world use dspace, including public and private colleges and universities and a variety not-for-profit corporations. dc is at the heart of dspace. although dspace can be customized to a limited extent, the basic and qualified elements of dc and their refinements form dspace’s backbone.2 n how dspace works: a contributor’s perspective dspace is designed for use by “metadata naive” contributors. this is a conscious design choice made by its developers and in keeping with the philosophy of inclusion for institutional repositories. dspace was developed for use by a wide variety of contributors with a wide range of metadata and bibliographic skills. dspace simplifies the metadata markup process by using terminology that is different from dc standards and by automating the production of element fields and xml/html code. dspace has four hierarchical levels of users: users, contributors, community administrators, and network/ systems administrators. the user is a member of the general public who will retrieve information from the repository via browsing the database or conducting structured searches for specific information. the contributor is an individual who wishes to add their own work to the database. to become a contributor, one must be approved by a dspace community administrator and receive a password. a contributor may create, upload, and (depending upon the privileges bestowed upon him by his community administrator), edit or remove informational records. their editing and removal privileges are restricted to their own records. a community administrator has oversight within their specialized area of dspace and accordingly has more privileges within the system than a contributor. a community administrator may create, upload, edit, and remove records, but also can edit and remove all records available within the community’s area of the database. additionally, the community administrator has access to some metadata about the repository’s records that is not available to users and contributors and has the power to approve requests to become contributors and grant upload access to the database. lastly, the community administrator sets the rights policy for all materials included in the database and writes the statement of rights that every contributor must agree to with every record upload. the network/systems administrator is not involved with database content, focusing rather on software maintenance and code customization. when a dspace contributor wishes to create a new record, the software walks them through the process. dspace presents seven screens in sequence that ask for specific information to be entered via check buttons, fillin textboxes, and sliders. at the end of this process, the contributor must electronically sign an acceptance of the statement of rights. because dspace’s software attempts to simplify the metadata-creation process for contributors, its terminology is different from dc’s. dspace uses more common terms that are familiar to a wider variety of individuals. 
for example, dspace asks the contributor to list an “author” for the work, not a “creator” or a “contributor.” in fact, those terms appear nowhere in any dspace. instead, dspace takes the text entered in the author textbox and maps it to a dc element—something that has profound implications if the mapping does not follow expected dc definitions. likewise, dspace does not use “subject” when asking the contributor to describe their material. instead, dspace asks the contributor to list keywords. text entered into the keyword field is then mapped into the subject element. while this seems like a reasonable path, it does have some interesting implications for how the subject element is interpreted and used by contributors. dc’s metadata elements are all optional. this is not true in dspace. dspace has both mandatory and automatic elements in its records. because of this, data records created in dspace look different than data records created in dc. these mandatory, automatic, and default fields affect the fill frequency of certain dc elements—with all of these elements having 100 percent participation. in dspace, the title element is mandatory; that is, it is a required element. the software will not allow the contributor to proceed if the title text box is left empty. as a consequence, all dspace records will have 100 percent participation in the title element. dspace has seven automatic elements, that is, element fields that are created by the software without any need for contributor input. three are date elements, two are format elements, one is an identifier, and one is provenance. dspace automatically records the time of the each record’s creation in machine-readable form. when the record is uploaded into the database, this timestamp is entered into three element fields: dc.date.available, dc.date.accessioned, and dc.date.issued. therefore dspace records have 100 percent participation in the date element. for previously published materials, a separate screen asks for the original publication date, which is then 42 information technology and libraries | march 2010 placed in the dc.date.issued element. like title, the original date of publication is a mandatory field, and failure to enter a meaningful numerical date into the textbox will halt the creation of a record. in a similar manner, dspace “reads” the kind of file the contributor is uploading to the database. dspace automatically records the size and type (.doc, .jpg, .pdf, etc.) of the file or files. this data is automatically entered into dc.format.mimetype and dc.format.extent. like date, all dspace records will have 100 percent participation in the format element. likewise, dspace automatically assigns a location identifier when a record is uploaded to the database. this information is recorded as an uri and placed in the identifier element. all dspace records have a dc.identifier.uri field. the final automatic element is provenance. at the time of record creation, dspace records the identity of the contributor (derived from the sign-in identity and password) and places this information into a dc.provenance element field. this information becomes a permanent part of the dspace record; however, this field is a hidden to users. typically only community and network/systems administrators may view provenance information. still, like date, format, and identifier elements, dspace records have automatic 100 percent participation in provenance. 
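as an illustration of how a finished record combines the two kinds of metadata, the sketch below separates contributor-supplied fields from the fields dspace fills in automatically. the field names follow the qualified dc usage discussed in this article, but the record itself, its values, and its handle are invented for illustration.

```python
# illustrative sketch of a single item record, split into the fields a
# contributor supplies through the submission forms and the fields the
# repository software generates automatically. all values are invented.

contributor_supplied = {
    "dc.title": "campus sustainability report 2009",
    "dc.contributor.author": "doe, jane",           # mapped from the "author" textbox
    "dc.subject": ["sustainability", "recycling"],   # mapped from the keyword boxes
    "dc.description.abstract": "annual report on campus sustainability efforts.",
}

system_generated = {
    "dc.date.accessioned": "2009-06-01T15:04:22Z",   # timestamp at upload
    "dc.date.available": "2009-06-01T15:04:22Z",
    "dc.date.issued": "2009",                         # from the publication-date screen
    "dc.format.mimetype": "application/pdf",          # read from the uploaded file
    "dc.format.extent": "482133 bytes",
    "dc.identifier.uri": "http://hdl.handle.net/0000/1234",  # assigned at upload
    "dc.provenance": "submitted by jdoe (hidden from public users)",
}

record = {**contributor_supplied, **system_generated}
for field, value in record.items():
    print(field, value)
```

splitting the record this way makes the pattern described above easy to see: the automatic fields are always present and uniform, while the contributor-supplied fields are where completeness and consistency can vary.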
because of the design of dspace’s software, all dspace-created records will have a combination of both contributor-created and dspace-created metadata. all dspace records can be edited. during record creation, the contributor may at any time move backward through his record to alter information. once the record has been finished and the statement of rights signed, the completed record moves into the community administrator’s workflow. once the record has entered the workflow, the community administrator is able to view the record with all the metadata tags attached and make changes using dspace’s editing tools. however, depending on the local practices and the volume of records passing through the administrator’s workflow, the administrator may simply upload records without first reviewing them. a record may also be edited after it has been uploaded, with any changes being uploaded into the database at the end of editing process. in editing a record after it has been uploaded, the contributor, providing he has been granted the appropriate privileges, is able to see all the metadata elements that have attached to the record. calling up the editing tools at this point allows the contributor or administrator to make significant changes to the elements and their qualifiers, something that is not possible during the record’s creation. when using the editing tools, the simplified contributor interface disappears, and the metadata elements fields are labeled with their dc names. the contributor or administrator may remove metadata tags and the information they contain and add new ones selecting the appropriate metadata element and qualifier from a slider. for example, during the editing process, the contributor or administrator may choose to create dc.contributor. editor or dc.subject.lcsh options—something not possible during the record-creation process. in the examination of the dspace records from our three repositories, dspace’s shaping influence on element participation and metadata quality will be clearly seen. n the repositories dspace is principally used by academic and corporate nonprofit agencies to create and manage their institutional repositories. for this study, i selected three academic institutions that shared similar characteristics (large, public, research-based universities) but which had differing approaches to how they managed their metadata-quality issues. the university of new mexico (unm) dspace repository (dspaceunm) holds a wide-ranging set of records, including materials from the university’s faculty and administration, the law school, the anderson school of business administration, and the medical school, as well as materials from a number of tangentially related university entities like the western water policy review advisory commission, new mexico water trust board, and governor richardson’s task force on ethic reform. at the time of the initial research for this paper (spring 2008), dspaceunm provided little easily accessible on-site education for contributors about the dspace record-creation process. what was offered—a set of eight general information files—was buried deep inside the library community. a contributor would have to know the files existed to find them. by summer 2009, this had changed. dspaceunm had a new homepage layout. there is now a link to “help sheets and promotional materials” at the top center of the homepage. this link leads to the previously difficult-tofind help files. the content of the help files, however, remains largely unchanged. 
they discuss community creation, copyrights, administrative workflow for community creation, a list of supported formats, a statement of dspaceunm’s privacy policy, and a list of required, encouraged, and not required elements for each new record created. for the most part, dspaceunm help sheets do not attempt to educate the contributor in issues of metadata quality. there is no discussion of dc terminology, no attempts to refer the contributor to a thesaurus or controlled vocabulary list, nor any explanation of the record-creation or editing process. this lack of contributor education may be explained in part because dspaceunm requires all new records dc, dspace, and a brief analysis of three university repositories | kurtz 43 to be reviewed by a subject area librarian as part of the dspace community workflow. thus any contributor errors, in theory, ought to be caught and corrected before being uploaded to the database. the university of washington (uw) dspace repository (researchworks at the university of washington) hosts a narrower set of records than dspaceunm, with the materials limited to the those contributed by the university’s faculty, students, and staff, plus materials from the uw’s archives and uw’s school of public and community health. in 2008, researchworks was self-archiving. most contributors were expected to use dspace to create and upload their record. there is no indication in the publicly available information about the record creation workflow if record reviews were conducted before record upload. the help link on the researchworks homepage brought contributors to a set of screen-by-screen instructions on how to use dspace’s software to create and upload a record. the step-through did not include instructions on how to edit a record once it had been created. no explanation of the meanings or definitions of the various dc elements was included in the help files. there also were no suggestions about the use of a controlled vocabulary or a thesaurus for subject headings. by 2009, this link had disappeared and the associated contributor education materials with it. the knowledge bank at ohio state university(osu) is the third repository examined for this paper. osu’s repository hosts more than thirty communities, all of which are associated with various academic departments or special university programs. like researchworks at uw, osu’s repository appears to be self-archiving with no clear policy statement as to whether a record is reviewed before it is uploaded to the repository’s database. osu makes a strong effort to educate its contributors. on the upper-left of the knowledge bank homepage is a slider link that brings the contributor (or any user) to several important and useful sources of repository information: about knowledge bank, faqs, policies, video upload procedures, community set-up form, describing your resources, and knowledge bank licensing agreement. the existence and use of metadata in knowledge bank are explicitly mentioned in the faq and policies areas, together with an explanation of what metadata is and how metadata is used (faq), and a list of supported metadata elements (policies). the describe your resources section gives extended definitions of each dspace-available dc metadata element and provides examples of appropriate metadata-element use. knowledge bank provides the most comprehensive contributor education information of any of the three repositories examined. 
it does not use a controlled vocabulary list for subject headings, and it does not offer a thesaurus.

data and analysis

i chose twenty randomly selected full records from each repository. no more than one record was taken from any one collection, to gather a broad sampling from each repository. i examined each record for the quality of its metadata. metadata quality is a semantically slippery term. park, in the spring 2009 special metadata issue of cataloging and classification quarterly, suggested that the most commonly accepted criteria for metadata quality are completeness, accuracy, and consistency.3 those criteria will be applied in this analysis.

for the purpose of this paper, i define completeness as the fill rate for key metadata elements. because the purpose of metadata is to identify the record and to assist in the user's search process, the key elements are title, contributor/creator, subject, and description.abstract, all contributor-generated fields. i chose these elements because these are the fields that the dspace software uses when someone conducts an unrestricted search. table 1 shows the fill rate for the title element is 100 percent for all three repositories. this is to be expected because, as noted above, title is a mandatory field. the fill rate for contributor/creator is likewise high: 16 of 20 (80 percent) for unm, 19 of 20 (95 percent) for uw, and 19 of 20 (95 percent) for osu. (osu's fill rates for creator and contributor were summed because osu uses different definitions for the creator and contributor element fields than do unm or uw. this discrepancy will be discussed in greater depth in the discussion of consistency below.) the fill rate for subject was more variable. unm's subject fill rate was 100 percent, while uw's was 55 percent, and osu's was 40 percent. the fill rate for the description.abstract subfield was 12 of 20 (60 percent) at unm, 15 of 20 (75 percent) at uw, and 8 of 20 (40 percent) at osu. (see appendix a for a complete list of metadata elements and subfields used by each of the three repositories.) the relatively low fill rate (below 50 percent) at the osu knowledge bank in both subject and description.abstract suggests a lack of completeness in that repository's records.
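to make the completeness criterion concrete, the sketch below shows one way such fill rates can be tallied from a sample of records. it is illustrative only; the field names and sample data are hypothetical rather than drawn from the repositories studied here.

```python
# illustrative sketch: tally fill rates for key elements across a sample
# of records. field names and sample records are hypothetical.

KEY_ELEMENTS = ["title", "contributor/creator", "subject", "description.abstract"]

sample_records = [
    {"title": "campus water use report", "contributor/creator": "doe, jane",
     "subject": "water conservation", "description.abstract": "annual summary."},
    {"title": "task force minutes", "contributor/creator": "",
     "subject": "", "description.abstract": ""},
]

def fill_rates(records, elements):
    """return the percentage of records with a nonempty value for each element."""
    rates = {}
    for element in elements:
        filled = sum(1 for r in records if r.get(element, "").strip())
        rates[element] = 100.0 * filled / len(records)
    return rates

for element, rate in fill_rates(sample_records, KEY_ELEMENTS).items():
    print(f"{element}: {rate:.0f}%")
```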
accuracy in metadata quality is the essential "correctness" of a record. correctness issues in a record range from data-entry problems (typos, misspellings, and inconsistent date formats) to the correct application of metadata definitions and data overlaps.4 accuracy is perhaps the most difficult of the metadata quality criteria to judge. local practices vary widely, and dc allows for the creation of custom metadata tags for local use. additionally, there is long-standing debate and confusion about the definitions of metadata elements even among librarians and information professionals.5 because of this, only the most egregious accuracy errors were considered for this paper. all three repositories had at least one record that contained one or more inaccurate metadata fields; two of them had four or more inaccurate records.

inaccurate records included a wide variety of accuracy errors, including poor subject information (no matter how loosely one defines a subject heading, "the" is not an accurate descriptor); mutually contradictory metadata (a record contained two different language tags, although only one applied to the content); and one record in which the abstract was significantly longer than, and only tangentially related to, the file it described. additionally, records showed confusion over the contributor versus creator elements. in a few records, contributors entered duplicate information into both element fields. this observation supports park and childress's findings that there is widespread confusion over these elements.6 among the most problematic records in terms of accuracy were those contained in uw's early buddhist manuscripts project. this collection, which has been removed from public access since the original data was drawn for this paper, contained numerous ambiguous, contradictory, and inaccurate metadata elements.7 while contributor-generated subject headings were specifically not examined for this paper, it must be noted that there was wide variation in the level of detail and vocabulary used to describe records. no community within any of the repositories had specific rules for the generation of keyword descriptors for records, and the lack of guidance shows.

consistency can be defined as the homogeneity of formats, definitions, and use of dc elements within the records. this consistency, or uniformity, of data is important because it promotes basic semantic interoperability. consistency both inside the repository itself and with other repositories makes the repository easier to use and provides the user with higher quality information. all three repositories showed 100 percent consistency in dspace-generated elements. dspace's automated creation of date and format fields provided reliably consistent records in those element fields. dspace's automatic formatting of personal names in the dc.contributor.author and dc.creator fields also provided excellent internal consistency. however, the metadata elements were much less consistent for contributor-generated information. inconsistency within the subject element is where most problems occurred. personal names used as subject headings and capitalization within subject headings both proved to be particular issues. dspace alphabetizes subject headings according to the first letter of the free text entered in the keyword box. thus the same name entered in different formats (first name first or last name first) generates different subject-heading listings. the same is true for capitalization. any difference in capitalization of any word within the free-text entry generates a separate subject heading.

another field where consistency was an issue was dc.description.sponsorship. sponsorship is a problem because different communities, even different collections within the same community, use the field to hold different information. some collections used the sponsorship field to hold the name of a thesis or dissertation advisor. some collections used sponsorship to list the funding agency or underwriter for a project being documented inside the record. some collections used sponsorship to acknowledge the donation of the physical materials documented by the record. while all of these are valid uses of the field, they are not the same thing and do not hold the same meaning for the user.

table 1. metadata fields and their frequencies

element        univ. of n.m.   univ. of wash.   ohio state univ.
title          20              20               20
creator        0               0                16
subject        20              11               8
description    12              16               17
publisher      4               4                8
contributor    16              19               3
date           20              20               20
type           20              20               20
identifier     20              20               20
source         0               0                0
language       20              20               20
relation       3               1                6
coverage       2               0                0
rights         2               0                0
provenance     **              **               **

** provenance tags are not visible to public users.

the largest consistency issue, however, came from a comparison of repository policies regarding element use and definition. unaltered dspace software maps contributor-generated information entered into the author textbox during the record-creation process into the dc.contributor.author field. however, osu's dspace software has been altered so that the dc.contributor.author field does not exist. instead, text entered into the author textbox during the record-creation process maps to dc.creator. although both uses are correct, this choice does create a significant difference in element definitions. osu's dspace author fields are no longer congruent with other dspace author fields.

conclusions

dspace was created as a repository management tool. by streamlining the record-creation workflow and partially automating the creation of metadata, dspace's developers hoped to make institutional repositories more useful and functional while at the same time providing an improved experience for both users and contributors. in this, dspace has been partially successful. dspace has made it easier for the "metadata naive" contributor to create records. and, in some ways, dspace has improved the quality of repository metadata. its automatically generated fields ensure better consistency in those elements and subfields. its mandatory fields guarantee 100 percent fill rates in some elements, and this contributes to an increase in metadata completeness. however, dspace still relies heavily on contributor-generated data to fill most of the dc elements, and it is in these contributor-generated fields that most of the metadata quality issues arise. nonmandatory fields are skipped, leading to incomplete records. data entry errors, a lack of authority control over subject headings, and confusion over element definitions can lead to poor metadata accuracy. a lack of enforced, uniform naming and capitalization conventions leads to metadata inconsistency, as do localized and individual differences in the application of metadata element definitions. while most of the records examined in this small survey could be characterized as "acceptable" to "good," some are abysmal. to address the inconsistency of the dspace records, the three universities have tried differing approaches. only unm's required record review by a subject area librarian before upload seems to have made any significant impact on metadata quality. unm has a 100 percent fill rate for subject elements in its records, while uw and osu do not. this is not to say that unm's process is perfect and that poor records do not get into the system; they do (see appendix b for an example). but it appears that for now, the intermediary intervention of a librarian during the record-creation process is an improvement over self-archiving by contributors, even with education.

references and notes

1. association for library collections & technical services, committee on cataloging: description & access, task force on metadata, "final report," june 16, 2000, http://www.libraries.psu.edu/tas/jca/ccda/tf-meta6.html (accessed mar. 10, 2007).
2. a voluntary (and therefore less-than-complete) list of current dspace users can be found at http://www.dspace.org/index.php?option=com_content&task=view&id=596&itemid=180. further specific information about dspace, including technical specifications, training materials, licensing, and a user wiki, can be found at http://www.dspace.org/index.php?option=com_content&task=blogcategory&id=44&itemid=125.
3. jung-ran park, "metadata quality in digital repositories: a survey of the current state of the art," cataloging & classification quarterly 47, no. 3 (2009): 213–28.
4. sarah currier et al., "quality assurance for digital learning object repositories: issues for the metadata creation process," alt-j: research in learning technology 12, no. 1 (2004): 5–20.
5. jung-ran park and eric childress, "dc metadata semantics: an analysis of the perspectives of information professionals," journal of information science 20, no. 10 (2009): 1–13.
6. ibid.
7. for a fuller discussion of the collection's problems and challenges in using both dspace and dc, see kathleen forsythe et al., "university of washington early buddhist manuscripts project in dspace" (paper presented at dc-2003, seattle, wash., sept. 28–oct. 2, 2003), http://dc2003.ischool.washington.edu/archive-03/03forsythe.pdf (accessed mar. 10, 2007).

appendix a. a list of the most commonly used qualifiers in each repository

university of new mexico: dc.date.issued (20), dc.date.accessioned (20), dc.date.available (20), dc.format.mimetype (20), dc.format.extent (20), dc.identifier.uri (20), dc.contributor.author (15), dc.description.abstract (12), dc.identifier.citation (6), dc.description.sponsorship (4), dc.subject.mesh (2), dc.contributor.other (2), dc.description.sponsor (1), dc.date.created (1), dc.relation.isbasedon (1), dc.relation.ispartof (1), dc.coverage.temporal (1), dc.coverage.spatial (1), dc.contributor.other (1)

university of washington: dc.date.accessioned (20), dc.date.available (20), dc.date.issued (20), dc.format.mimetype (20), dc.format.extent (20), dc.identifier.uri (20), dc.contributor.author (18), dc.description.abstract (15), dc.identifier.citation (4), dc.identifier.issn (4), dc.description.sponsorship (1), dc.contributor.corporateauthor (1), dc.contributor.illustrator (1), dc.relation.ispartof (1)

ohio state university: dc.date.issued (20), dc.date.available (20), dc.date.accessioned (20), dc.format.mimetype (20), dc.format.extent (20), dc.identifier.uri (20), dc.description.abstract (8), dc.identifier.citation (4), dc.subject.lcsh (4), dc.relation.ispartof (4), dc.description.sponsorship (3), dc.identifier.other (2), dc.contributor.editor (2), dc.contributor.advisor (1), dc.identifier.issn (1), dc.description.duration (1), dc.relation.isformatof (1), dc.description.statementofresponsibility (1), dc.description.tableofcontents (1)

appendix b. sample record

dc.identifier.uri http://hdl.handle.net/1928/3571
dc.description.abstract president schmidly's charge for the creation of a north golf course community advisory board.
dc.format.extent 17301 bytes
dc.format.mimetype application/pdf
dc.language.iso en_us
dc.subject president
dc.subject schmidly
dc.subject north
dc.subject golf
dc.subject course
dc.subject community
dc.subject advisory
dc.subject board
dc.subject charge
dc.title community_advisory_board_charge
dc.type other
| zhou 151 are your digital documents web friendly?: making scanned documents web accessible the internet has greatly changed how library users search and use library resources. many of them prefer resources available in electronic format over traditional print materials. while many documents are now born digital, many more are only accessible in print and need to be digitized. this paper focuses on how the colorado state university libraries creates and optimizes text-based and digitized pdf documents for easy access, downloading, and printing. t o digitize print materials, we normally scan originals, save them in archival digital formats, and then make them webaccessible. there are two types of print documents, graphic-based and text-based. if we apply the same techniques to digitize these two different types of materials, the documents produced will not be web-friendly. graphic-based materials include archival resources such as historical photographs, drawings, manuscripts, maps, slides, and posters. we normally scan them in color at a very high resolution to capture and present a reproduction that is as faithful to the original as possible. then we save the scanned images in tiff (tagged image file format) for archival purposes and convert the tiffs to jpeg (joint photographic experts group) 2000 or jpeg for web access. however, the same practice is not suitable for modern text-based documents, such as reports, journal articles, meeting minutes, and theses and dissertations. many old text-based documents (e.g., aged newspapers and books), should be yongli zhoututorial files for fast web delivery as access files. for text-based files, access files normally are pdfs that are converted from scanned images. “bcr’s cdp digital imaging best practices version 2.0” says that the master image should be the highest quality you can afford, it should not be edited or processed for any specific output, and it should be uncompressed.1 this statement applies to archival images, such as photographs, manuscripts, and other image-based materials. if we adopt the same approach for modern text documents, the result may be problematic. pdfs that are created from such master files may have the following drawbacks: ■■ because of their large file size, they require a long download time or cannot be downloaded because of a timeout error. ■■ they may crash a user’s computer because they use more memory while viewing. ■■ they sometimes cannot be printed because of insufficient printer memory. ■■ poor print and on-screen viewing qualities can be caused by background noise and bleedthrough of text. background noise can be caused by stains, highlighter marks made by users, and yellowed paper from aged documents. ■■ the ocr process sometimes does not work for high-resolution images. ■■ content creators need to spend more time scanning images at a high resolution and converting them to pdf documents. web-friendly files should be small, accessible by most users, full-text searchable, and have good treated as graphic-based material. these documents often have faded text, unusual fonts, stains, and colored background. if they are scanned using the same practice as modern text documents, the document created can be unreadable and contain incorrect information. this topic is covered in the section “full-text searchable pdfs and troubleshooting ocr errors.” currently, pdf is the file format used for most digitized text documents. 
while pdfs that are created from high-resolution color images may be of excellent quality, they can have many drawbacks. for example, a multipage pdf may have a large file size, which increases download time and the memory required while viewing. sometimes the download takes so long it fails because a time-out error occurs. printers may have insufficient memory to print large documents. in addition, the optical character recognition (ocr) process is not accurate for high resolution images in either color or grayscale. as we know, users want the ability to easily download, view, print, and search online textual documents. all of the drawbacks created by high-quality scanning defeat one of the most important purposes of digitizing text-based documents: making them accessible to more users. this paper addresses how colorado state university libraries (csul) manages these problems and others as staff create web-friendly digitized textual documents. topics include scanning, long-time archiving, full-text searchable pdfs and troubleshooting ocr problems, and optimizing pdf files for web delivery. preservation master files and access files for digitization projects, we normally refer to images in uncompressed tiff format as master files and compressed yongli zhou is digital repositories librarian, colorado state university libraries, colorado state university, fort collins, colorado 152 information technology and libraries | september 2010152 information technology and libraries | september 2010 factors that determine pdf file size. color images typically generate the largest pdfs and black-and-white images generate the smallest pdfs. interestingly, an image of smaller file size does not necessarily generate a smaller pdf. table 1 shows how file format and color mode affect pdf file size. the source file is a page containing black-and-white text and line art drawings. its physical dimensions are 8.047" by 10.893". all images were scanned at 300 dpi. csul uses adobe acrobat professional to create pdfs from scanned images. the current version we use is adobe acrobat 9 professional, but most of its features listed in this paper are available for other acrobat versions. when acrobat converts tiff images to a pdf, it compresses images. therefore a final pdf has a smaller file size than the total size of the original images. acrobat compresses tiff uncompressed, lzw, and zip the same amount and produces pdfs of the same file size. because our in-house scanning software does not support tiff g4, we did not include tiff g4 test data here. by comparing similar pages, we concluded that tiff g4 works the same as tiff uncompressed, lzw, and zip. for example, if we scan a text-based page as blackand-white and save it separately in tiff uncompressed, lzw, zip, or g4, then convert each page into a pdf, the final pdf will have the same file size without a noticeable quality difference. tiff jpeg generates the smallest pdf, but it is a lossy format, so it is not recommended. both jpeg and jpeg 2000 have smaller file sizes but generate larger pdfs than those converted from tiff images. recommendations 1. use tiff uncompressed or lzw in 24 bits color for pages with color graphs or for historical documents. 2. use tiff uncompressed or lzw compress an image up to 50 percent. some vendors hesitate to use this format because it was proprietary; however, the patent expired on june 20, 2003. this format has been widely adopted by much software and is safe to use. csul saves all scanned text documents in this format. 
■■ tiff zip: this is a lossless compression. like lzw, zip compression is most effective for images that contain large areas of single color. 2 ■■ tiff jpeg: this is a jpeg file stored inside a tiff tag. it is a lossy compression, so csul does not use this file format. other image formats: ■■ jpeg: this format is a lossy compression and can only be used for nonarchival purposes. a jpeg image can be converted to pdf or embedded in a pdf. however, a pdf created from jpeg images has a much larger file size compared to a pdf created from tiff images. ■■ jpeg 2000: this format’s file extension is .jp2. this format offers superior compression performance and other advantages. jpeg 2000 normally is used for archival photographs, not for text-based documents. in short, scanned images should be saved as tiff files, either with compression or without. we recommend saving text-only pages and pages containing text and/or line art as tiff g4 or tiff lzw. we also recommend saving pages with photographs and illustrations as tiff lzw. we also recommend saving pages with photographs and illustrations as tiff uncompressed or tiff lzw. how image format and color mode affect pdf file size color mode and file format are two on-screen viewing and print qualities. in the following sections, we will discuss how to make scanned documents web-friendly. scanning there are three main factors that affect the quality and file size of a digitized document: file format, color mode, and resolution of the source images. these factors should be kept in mind when scanning text documents. file format and compression most digitized documents are scanned and saved as tiff files. however, there are many different formats of tiff. which one is appropriate for your project? ■■ tiff: uncompressed format. this is a standard format for scanned images. however, an uncompressed tiff file has the largest file size and requires more space to store. ■■ tiff g3: tiff with g3 compression is the universal standard for faxs and multipage line-art documents. it is used for blackand-white documents only. ■■ tiff g4: tiff with g4 compression has been approved as a lossless archival file format for bitonal images. tiff images saved in this compression have the smallest file size. it is a standard file format used by many commercial scanning vendors. it should only be used for pages with text or line art. many scanning programs do not provide this file format by default. ■■ tiff huffmann: a method for compressing bi-level data based on the ccitt group 3 1d facsimile compression schema. ■■ tiff lzw: this format uses a lossless compression that does not discard details from images. it may be used for bitonal, grayscale, and color images. it may the next generation library catalog | zhou 153are your digital documents web friendly? | zhou 153 to be scanned at no less than 600 dpi in color. our experiments show that documents scanned at 300 or 400 dpi are sufficient for creating pdfs of good quality. resolutions lower than 300 dpi are not recommended because they can degrade image quality and produce more ocr errors. resolutions higher than 400 dpi also are not recommended because they generate large files with little improved on-screen viewing and print quality. we compared pdf files that were converted from images of resolutions at 300, 400, and 600 dpi. viewed at 100 percent, the difference in image quality both on screen and in print was negligible. 
if a page has text with very small font, it can be scanned at a higher resolution to improve ocr accuracy and viewing and print quality. table 2 shows that high-resolution images produce large files and require more time to be converted into pdfs. the time required to combine images is not significantly different compared to scanning time and ocr time, so it was omitted. our example is a modern text document with text and a black-and-white chart. most of our digitization projects do not require scanning at 600 dpi; 300 dpi is the minimum requirement. we use 400 dpi for most documents and choose a proper color mode for each page. for example, we scan our theses and dissertations in black-andwhite at 400 dpi for bitonal pages. we scan pages containing photographs or illustrations in 8-bit grayscale or 24-bit color at 400 dpi. other factors that affect pdf file size in addition to the three main factors we have discussed, unnecessary edges, bleed-through of text and graphs, background noise, and blank pages also increase pdf file sizes. figure 1 shows how a clean scan can largely reduce a pdf file size and cover. the updated file has a file size of 42.8 mb. the example can be accessed at http://hdl.handle .net/10217/3667. sometimes we scan a page containing text and photographs or illustrations twice, in color or grayscale and in black-and-white. when we create a pdf, we combine two images of the same page to reproduce the original appearance and to reduce file size. how to optimize pdfs using multiple scans will be discussed in a later section. how image resolution affects pdf file size before we start scanning, we check with our project manager regarding project standards. for some funded projects, documents are required in grayscale 8 bits for pages with black-and-white photographs or grayscale illustrations. 3. use tiff uncompressed, lzw, or g4 in black-and-white for pages containing text or line art. to achieve the best result, each page should be scanned accordingly. for example, we had a document with a color cover, 790 pages containing text and line art, and 7 blank pages. we scanned the original document in color at 300 dpi. the pdf created from these images was 384 mb, so large that it exceeded the maximum file size that our repository software allows for uploading. to optimize the document, we deleted all blank pages, converted the 790 pages with text and line art from color to blackand-white, and retained the color table 1. file format and color mode versus pdf file size file format scan specifications tiff size (kb) pdf size (kb) tiff color 24 bits 23,141 900 tiff lzw color 24 bits 5,773 900 tiff zip color 24 bits 4,892 900 tiff jpeg color 24 bits 4,854 873 jpeg 2000 color 24 bits 5,361 5,366 jpeg color 24 bits 4,849 5,066 tiff grayscale 8 bits 7,729 825 tiff lzw grayscale 8 bits 2,250 825 tiff zip grayscale 8 bits 1,832 825 tiff jpeg grayscale 8 bits 2,902 804 jpeg 2000 grayscale 8 bits 2,266 2,270 jpeg grayscale 8 bits 2,886 3,158 tiff black-and-white 994 116 tiff lzw black-and-white 242 116 tiff zip black-and-white 196 116 note: black-and-white scans cannot be saved in jpeg, jpeg 2000, or tiff jpeg formats. 154 information technology and libraries | september 2010154 information technology and libraries | september 2010 many pdf files cannot be saved as pdf/a files. if an error occurs when saving a pdf to pdf/a, you may use adobe acrobat preflight (advanced > preflight) to identify problems. see figure 2. 
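the clean-up applied to the 790-page document above — dropping blank pages and converting text-only pages from color to black-and-white — is the kind of step that can be scripted. below is a minimal sketch using pillow; the directory names and the 0.5 percent ink threshold are assumptions, and borderline pages would still need human review before conversion.

```python
# minimal sketch: convert color text pages to bitonal tiff g4 and skip
# nearly blank pages. paths and the blank threshold are assumptions.
import pathlib
from PIL import Image

src_dir = pathlib.Path("scans_color")
dst_dir = pathlib.Path("scans_bw")
dst_dir.mkdir(exist_ok=True)

BLANK_THRESHOLD = 0.005  # pages with under 0.5% dark pixels are treated as blank

for tif in sorted(src_dir.glob("*.tif")):
    bw = Image.open(tif).convert("1")   # bitonal: appropriate for text/line art only
    dark = bw.histogram()[0]            # count of black pixels in a 1-bit image
    if dark / (bw.width * bw.height) < BLANK_THRESHOLD:
        print(f"skipping blank page {tif.name}")
        continue
    bw.save(dst_dir / tif.name, format="TIFF", compression="group4")
```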
errors can be created by nonembedded fonts, embedded images with unsupported file compression, bookmarks, embedded video and audio, etc. by default, the reduce file size procedure in acrobat professional compresses color images using jpeg 2000 compression. after running the reduce file size procedure, a pdf may not be saved as a pdf/a because of a “jpeg 2000 compression used” error. according to the pdf/a competence center, this problem will be eliminated in the second part of the pdf/a standard— pdf/a-2 is planned for 2008/2009. there are many other features in new pdfs; for example, transparency and layers will be allowed in pdf/a2.5 however, at the time this paper was written pdf/a-2 had not been announced.6 portable, which means the file created on one computer can be viewed with an acrobat viewer on other computers, handheld devices, and on other platforms.3 a pdf/a document is basically a traditional pdf document that fulfills precisely defined specifications. the pdf/a standard aims to enable the creation of pdf documents whose visual appearance will remain the same over the course of time. these files should be software-independent and unrestricted by the systems used to create, store, and reproduce them.4 the goal of pdf/a is for long-term archiving. a pdf/a document has the same file extension as a regular pdf file and must be at least compatible with acrobat reader 4. there are many ways to create a pdf/a document. you can convert existing images and pdf files to pdf/a files, export a document to pdf/a format, scan to pdf/a, to name a few. there are many software programs you can use to create pdf/a, such as adobe acrobat professional 8 and later versions, compart ag, pdflib, and pdf tools ag. simultaneously improve its viewing and print quality. recommendations 1. unnecessary edges: crop out. 2. bleed-through text or graphs: place a piece of white or black card stock on the back of a page. if a page is single sided, use white card stock. if a page is double sided, use black card stock and increase contrast ratio when scanning. often color or grayscale images have bleedthrough problems. scanning a page containing text or line art as black-and-white will eliminate bleed-through text and graphs. 3. background noise: scanning a page containing text or line art as black-and-white can eliminate background noise. many aged documents have yellowed papers. if we scan them as color or grayscale, the result will be images with yellow or gray background, which may increase pdf file sizes greatly. we also recommend increasing the contrast for better ocr results when scanning documents with background colors. 4. blank pages: do not include if they are not required. blank pages scanned in grayscale or color can quickly increase file size. pdf and longterm archiving pdf/a pdf vs. pdf/a pdf, short for portable document format, was developed by adobe as a unique format to be viewed through adobe acrobat viewers. as the name implies, it is table 2. color mode and image resolution vs. pdf file size color mode resolution (dpi) scanning time (sec.) ocr time (sec.) tiff lzw (kb) pdf size (kb) color 600 100 n/a* 16,498 2,391 color 400 25 35 7,603 1,491 color 300 18 16 5,763 952 grayscale 600 36 33 6,097 2,220 grayscale 400 18 18 2,888 1370 grayscale 300 14 12 2,240 875 b/w 600 12 18 559 325 b/w 400 10 10 333 235 b/w 300 8 9 232 140 *n/a due to an ocr error the next generation library catalog | zhou 155are your digital documents web friendly? | zhou 155 able. 
this option keeps the original image and places an invisible text layer over it. recommended for cases requiring maximum fidelity to the original image.8 this is the only option used by csul. 2. searchable image: ensures that text is searchable and selectable. this option keeps the original image, de-skews it as needed, and places an invisible text layer over it. the selection for downsample images in this same dialog box determines whether the image is downsampled and to what extent.9 the downsampling combines several pixels in an image to make a single larger pixel; thus some information is deleted from the image. however, downsampling does not affect the quality of text or line art. when a proper setting is used, the size of a pdf can be significantly reduced with little or no loss of detail and precision. 3. clearscan: synthesizes a new type 3 font that closely approximates the original, and preserves the page background using a low-resolution copy.10 the final pdf is the same as a born-digital pdf. because acrobat cannot guarantee the accuracy of manipulate the pdf document for accessibility. once ocr is properly applied to the scanned files, however, the image becomes searchable text with selectable graphics, and one may apply other accessibility features to the document.7 acrobat professional provides three ocr options: 1. searchable image (exact): ensures that text is searchable and selectfull-text searchable pdfs and troubleshooting ocr errors a pdf created from a scanned piece of paper is inherently inaccessible because the content of the document is an image, not searchable text. assistive technology cannot read or extract the words, users cannot select or edit the text, and one cannot figure 1. pdfs converted from different images: (a) the original pdf converted from a grayscale image and with unnecessary edges; (b) updated pdf converted from a blackand-white image and with edges cropped out; (c) screen viewed at 100 percent of the pdf in grayscale; and (d) screen viewed at 100 percent of the pdf in black-and-white. dimensions: 9.127” x 11.455” color mode: grayscale resolution: 600 dpi tiff lzw: 12.7 mb pdf: 1,051 kb dimensions: 8” x 10.4” color mode: black-and-white resolution: 400 dpi tiff lzw: 153 kb pdf: 61 kb figure 2. example of adobe acrobat 9 preflight 156 information technology and libraries | september 2010156 information technology and libraries | september 2010 but at least users can read all text, while the black-and-white scan contains unreadable words. troubleshoot ocr error 3: cannot ocr image based text the search of a digitized pdf is actually performed on its invisible text layer. the automated ocr process inevitably produces some incorrectly recognized words. for example, acrobat cannot recognize the colorado state university logo correctly (see figure 6). unfortunately, acrobat does not provide a function to edit a pdf file’s invisible text layer. to manually edit or add ocr’d text, adobe acrobat capture 3.0 (see figure 7) must be purchased. however, our tests show that capture 3.0 has many drawbacks. this software is complicated and produces it’s own errors. sometimes it consolidates words; other times it breaks them up. in addition, it is time-consuming to add or modify invisible text layers using acrobat capture 3.0. at csul, we manually add searchable text for title and abstract pages only if they cannot be ocr’d by acrobat correctly. 
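the workflow above relies on acrobat's searchable image (exact) option. purely as a hedged alternative for readers without acrobat, the open-source tesseract engine can produce the same kind of image-plus-invisible-text-layer pdf; this sketch assumes the pytesseract wrapper and a tesseract install, it is not part of the csul process described here, and the ocr accuracy caveats discussed above apply equally.

```python
# minimal sketch, not the workflow described in the article: tesseract can
# also produce an image-plus-invisible-text-layer pdf, similar in spirit to
# acrobat's "searchable image (exact)". file names are illustrative.
from PIL import Image
import pytesseract

page = Image.open("title_page.tif")
pdf_bytes = pytesseract.image_to_pdf_or_hocr(page, extension="pdf")

with open("title_page_searchable.pdf", "wb") as f:
    f.write(pdf_bytes)
```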
the example in troubleshoot ocr error 2: could not perform recognition (ocr) sometimes acrobat generates an “outside of the allowed specifications” error when processing ocr. this error is normally caused by color images scanned at 600 dpi or more. in the example in figure 4, the page only contains text but was scanned in color at 600 dpi. when we scanned this page as blackand-white at 400 dpi, we did not encounter this problem. we could also use a lower-resolution color scan to avoid this error. our experiments also show that images scanned in black-and-white work best for the ocr process. in this article we mainly discuss running the ocr process on modern textual documents. black-and-white scans do not work well for historical textual documents or aged newspapers. these documents may have faded text and background noise. when they are scanned as blackand-white, broken letters may occur, and some text might become unreadable. for this reason they should be scanned in color or grayscale. in figure 5, images scanned in color might not produce accurate ocr results, ocred text at 100 percent, this option is not acceptable for us. for a tutorial on to how to make a full-text searchable pdf, please see appendix a. troubleshoot ocr error 1: acrobat crashes occasionally acrobat crashes during the ocr process. the error message does not indicate what causes the crash and where the problem occurs. fortunately, the page number of the error can be found on the top shortcuts menu. in figure 3, we can see the error occurs on page 7. we discovered that errors are often caused by figures or diagrams. for a problem like this, the solution is to skip the error-causing page when running the ocr process. our initial research was performed on acrobat 8 professional. our recent study shows that this problem has been significantly improved in acrobat 9 professional. figure 3. adobe acrobat 8 professional crash window figure 4. “could not perform recognition (ocr)” error figure 5. an aged newspaper scanned in color and black-and-white aged newspaper scanned in color aged newspaper scanned in black-and-white the next generation library catalog | zhou 157are your digital documents web friendly? | zhou 157 a very light yellow background. the undesirable marks and background contribute to its large file size and create ink waste when printed. method 2: running acrobat’s built-in optimization processes acrobat provides three built-in processes to reduce file size. by default, acrobat use jpeg compression for color and grayscale images and ccitt group 4 compression for bitonal images. optimize scanned pdf open a scanned pdf and select documents > optimize scanned pdf. a number of settings, such as image quality and background removal, can be specified in the optimize scanned pdf dialog box. our experiments show this process can noticably degrade images and sometimes even increase file size. therefore we do not use this option. reduce file size open a scanned pdf and select documents > reduce file size. the reduce file size command resamples and recompresses images, removes embedded base-14 fonts, and subset-embeds fonts that were left embedded. it also compresses document structure and cleans up elements such as invalid bookmarks. if the file size is already as small as possible, this command has no effect.11 after process, some files cannot be saved as pdf/a, as we discussed in a previous section. 
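acrobat's reduce file size and pdf optimizer commands are the routes used above; as a hedged aside, the open-source ghostscript interpreter offers comparable recompression and image downsampling from the command line. the following is a minimal sketch, assuming the gs binary is installed and on the path; the file names are placeholders.

```python
# minimal sketch: recompress and downsample a finished pdf with ghostscript
# instead of acrobat's built-in optimizer. input/output names are assumptions.
import subprocess

subprocess.run([
    "gs",
    "-sDEVICE=pdfwrite",
    "-dCompatibilityLevel=1.4",   # keep the output readable by older viewers
    "-dPDFSETTINGS=/ebook",       # preset: moderate image downsampling and quality
    "-dNOPAUSE", "-dBATCH",
    "-sOutputFile=optimized.pdf",
    "scanned.pdf",
], check=True)
```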
we also noticed that different versions of acrobat can create files of different file sizes even if the same settings were used. pdf optimizer open a scanned pdf and select advanced > pdf optimizer. many settings can be specified in the pdf optimizer dialog box. for example, we can downsample images from sections, we can greatly reduce a pdf’s size by using an appropriate color mode and resolution. figure 9 shows two different versions of a digitized document. the source document has a color cover and 111 bitonal pages. the original pdf, shown in figure 9 on the left, was created by another university department. it was not scanned according to standards and procedures adopted by csul. it was scanned in color at 300 dpi and has a file size of 66,265 kb. we exported the original pdf as tiff images, batch-converted color tiff images to black-and-white tiff images, and then created a new pdf using blackand-white tiff images. the updated pdf has a file size of 8,842 kb. the image on the right is much cleaner and has better print quality. the file on the left has unwanted marks and figure 8 is a book title page for which we used acrobat capture 3.0 to manually add searchable text. the entire book may be accessed at http://hdl .handle.net/10217/1553. optimizing pdfs for web delivery a digitized pdf file with 400 color pages may be as large as 200 to 400 mb. most of the time, optimizing processes may reduce files this large without a noticeable difference in quality. in some cases, quality may be improved. we will discuss three optimization methods we use. method 1: using an appropriate color mode and resolution as we have discussed in previous ~do university original logo text ocred by acrobat figure 6. incorrectly recognized text sample figure 7. adobe acrobat capture interface figure 8. image-based text sample 158 information technology and libraries | september 2010158 information technology and libraries | september 2010 grayscale. a pdf may contain pages that were scanned with different color modes and resolutions. a pdf may also have pages of mixed resolutions. one page may contain both bitonal images and color or grayscale images, but they must be of the same resolution. the following strategies were adopted by csul: 1. combine bitmap, grayscale, and color images. we use grayscale images for pages that contain grayscale graphs, such as black-and-white photos, color images for pages that contain color images, and bitmap images for text-only or text and line art pages. 2. if a page contains high-definition color or grayscale images, scan that page in a higher resolution and scan other pages at 400 dpi. 3. if a page contains a very small font and the ocr process does not work well, scan it at a higher resolution and the rest of document at 400 dpi. 4. if a page has both text, color, or grayscale graphs, we scan it twice. then we modify images using adobe photoshop and combine two images in acrobat. in figure 10, the grayscale image has a gray background and a true reproduction of the original photograph. the black-and-white scan has a white background and clean text, but details of the photograph are lost. the pdf converted from the grayscale image is 491 kb and has nine ocr errors. the pdf converted from the black-and-white image is 61kb and has no ocr errors. the pdf converted from a combination of the grayscale and black-and-white images is 283 kb and has no ocr errors. the following are the steps used to create a pdf in figure 10 using acrobat: 1. 
scan a page twice—grayscale optimizer can be found at http:// www.acrobatusers.com/tutorials/ understanding-acrobats-optimizer. method 3: combining different scans many documents have color covers and color or grayscale illustrations, but the majority of pages are textonly. it is not necessary to scan all pages of such documents in color or a higher resolution to a lower resolution and choose a different file compression. different collections have different original sources, therefore different settings should be applied. we normally do several tests for each collection and choose the one that works best for it. we also make our pdfs compatible with acrobat 6 to allow users with older versions of software to view our documents. a detailed tutorial of how to use the pdf figure 9. reduce file size example figure 10. reduce file size example: combine images the next generation library catalog | zhou 159are your digital documents web friendly? | zhou 159 help.html?content=wsfd1234e1c4b69f30 ea53e41001031ab64-7757.html (accessed mar. 3, 2010). 3. ted padova adobe acrobat 7 pdf bible, 1st ed. (indianapolis: wiley, 2005). 4. olaf drümmer, alexandra oettler, and dietrich von seggern, pdf/a in a nutshell—long term archiving with pdf, (berlin: association for digital document standards, 2007). 5. pdf/a competence center, “pdf/a: an iso standard—future development of pdf/a,” http://www. pdfa.org/doku.php?id=pdfa:en (accessed july 20, 2010). 6. pdf/a competence center, “pdf/a—a new standard for longterm archiving,” http://www.pdfa.org/ doku.php?id=pdfa:en:pdfa_whitepaper (accessed july 20, 2010). 7. adobe, “creating accessible pdf documents with adobe acrobat 7.0: a guide for publishing pdf documents for use by people with disabilities,” 2005, http://www.adobe.com/enterprise/ a c c e s s i b i l i t y / p d f s / a c ro 7 _ p g _ u e . p d f (accessed mar. 8, 2010). 8. adobe, “recognize text in scanned documents,” 2010, http:// help.adobe.com/en_us/acrobat/9.0/ s t a n d a rd / w s 2 a 3 d d 1 fa c fa 5 4 c f 6 -b993-159299574ab8.w.html (accessed mar. 8, 2010). 9. ibid. 10. ibid. 11. adobe, “reduce file size by saving,” 2010, http://help.adobe.com/en_us/ acrobat/9.0/standard/ws65c0a053 -bc7c-49a2-88f1-b1bcd2524b68.w.html (accessed mar. 3, 2010). the other 76 pages as grayscale and black-and-white. then we used the procedure described above to combine text pages and photographs. the final pdf has clear text and correctly reproduced photographs. the example can be found at http://hdl .handle.net/10217/1553. conclusion our case study, as reported in this article, demonstrates the importance of investing the time and effort to apply the appropriate standards and techniques for scanning and optimizing digitized documents. if proper techniques are used, the final result will be web-friendly resources that are easy to download, view, search, and print. users will be left with a positive impression of the library and feel encouraged to use its materials and services again in the future. references 1. bcr’s cdp digital imaging best practices working group, “bcr’s cdp digital imaging best practices version 2.0,” june 2008, http://www.bcr.org/ dps/cdp/best/digital-imaging-bp.pdf (accessed mar. 3, 2010). 2. adobe, “about file formats and compression,” 2010, http://livedocs .adobe.com/en_us/photoshop/10.0/ and black-and-white. 2. crop out text on the grayscale scan using photoshop. 3. delete the illustration on the black-and-white image using photoshop. 4. create a pdf using the blackand-white image. 5. 
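the combine-files steps above can also be reproduced without acrobat when only the merging, not the ocr, is needed. a minimal sketch with pillow follows, assuming a folder of page tiffs whose file names sort into page order; the result is still an image-only pdf and needs an ocr pass before it is full-text searchable.

```python
# minimal sketch: combine a folder of page images into a single pdf with
# pillow. folder and file names are assumptions.
import pathlib
from PIL import Image

files = sorted(pathlib.Path("scans_bw").glob("*.tif"))
pages = [Image.open(f) for f in files]

pages[0].save(
    "document.pdf",
    save_all=True,
    append_images=pages[1:],
    resolution=400.0,   # matches the 400 dpi scanning practice described above
)
```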
run the ocr process and save the file. 6. insert the color graph. select tools > advanced editing > touchup object tool. rightclick on the page and select place image. locate the color graph in the open dialog, then click open and move the color graph to its correct location. 7. save the file and run the reduce file size or pdf optimizer procedure. 8. save the file again. this method produces the smallest file size with the best quality, but it is very time-consuming. at csul we used this method for some important documents, such as one of our institutional repository’s showcase items, agricultural frontier to electronic frontier. the book has 220 pages, including a color cover, 76 pages with text and photographs, and 143 text-only pages. we used a color image for the cover page and 143 black-and-white images for the 143 text-only pages. we scanned appendix a. step-by-step creating a full-text searchable pdf in this tutorial, we will show you how to create a full-text searchable pdf using adobe acrobat 9 professional. creating a pdf from a scanner adobe acrobat professional can create a pdf directly from a scanner. acrobat 9 provides five options: black and white document, grayscale document, color document, color image, and custom scan. the custom scan option allows you to scan, run the ocr procedure, add metadata, combine multiple pages into one pdf, and also make it pdf/a compliant. to create a pdf from a scanner, go to file > create pdf > from scanner > custom scan. see figure 1. at csul, we do not directly create pdfs from scanners because our tests show that it can produce fuzzy text and it is not time efficient. both scanning and running the ocr process can be very time consuming. if an error occurs during these processes, we would have to start over again. we normally scan images on scanning stations by student employees 160 information technology and libraries | september 2010160 information technology and libraries | september 2010 or outsource them to vendors. then library staff will perform quality control and create pdfs on seperate machines. in this way, we can work on multiple documents at the same time and ensure that we provide high-quality pdfs. creating a pdf from scanned images 1. from the task bar select combine > merge files into a single pdf > from multiple files. see figure 2. 2. in the combine files dialog, make sure the single pdf radio button is selected. from the add files dropdown menu select add files. see figure 3. 3. in the add files dialog, locate images and select multiple images by holding shift key, and then click add files button. 4. by default, acrobat sorts files by file names. use move up and move down buttons to change image orders and use the remove button to delete images. choose a target file size. the smallest icon will produce a file with a smaller file size but a lower image quality pdf, and the largest icon will produce a high image quality pdf but with a very large file size. we normally use the default file size setting, which is the middle icon. 5. save the file. at this point, the pdf is not full-text searchable. making a full-text searchable pdf a pdf document created from a scanned piece of paper is inherently inaccessible because the content of the document is an image, not searchable text. assistive technology cannot read or extract the words, users cannot select or edit the text, and one cannot manipulate the pdf document for accessibility. 
once optical character recognition (ocr) is properly applied to the scanned files, however, the image becomes searchable text with selectable graphics, and one may apply other accessibility features to the document. adobe acrobat professional provides three ocr options, searchable image (exact), searchable image, and clean scan. because searchable image (exact) is the only option that keeps the original look, we only use this option. to run an ocr procedure using acrobat 9 professional: 1. open a digitized pdf. 2. select document > ocr text recognition > recognize text using ocr. 3. in the recognize text dialog, specify pages to be ocred. 4. in the recognize text dialog, click the edit button in the settings section to choose ocr language and pdf output style. we recommend the searchable image (exact) option. click ok. the setting will be remembered by the program and will be used until a new setting is chosen. sometimes a pdf’s file size increases greatly after an ocr process. if this happens, use the pdf optimizer to reduce its file size. figure 2. merge files into a single pdf figure 3. combine files dialog figure 1. acrobat 9 professional’s create pdf from scanner dialog 6 information technology and libraries | march 2010 sandra shores is [tk] sandra shores editorial board thoughts: issue introduction to student essays t he papers in this special issue, although covering diverse topics, have in common their authorship by people currently or recently engaged in graduate library studies. it has been many years since i was a library science student—twenty-five in fact. i remember remarking to a future colleague at the time that i found the interview for my first professional job easy, not because the interviewers failed to ask challenging questions, but because i had just graduated. i was passionate about my chosen profession, and my mind was filled from my time at library school with big ideas and the latest theories, techniques, and knowledge of our discipline. while i could enthusiastically respond to anything the interviewers asked, my colleague remarked she had been in her job so long that she felt she had lost her sense of the big questions. the busyness of her daily work life drew her focus away from contemplation of our purpose, principles, and values as librarians. i now feel at a similar point in my career as this colleague did twenty-five years ago, and for that reason i have been delighted to work with these student authors to help see their papers through to publication. the six papers represent the strongest work from a wide selection that students submitted to the lita/ ex libris student writing award competition. this year’s winner is michael silver, who looks forward to graduating in the spring from the mlis program at the university of alberta. silver entered the program with a strong library technology foundation, having provided it services to a regional library system for about ten years. he notes that “the ‘accidental systems librarian’ position is probably the norm in many small and medium sized libraries. as a result, there are a number of practices that libraries should adopt from the it world that many library staff have never been exposed to.”1 his paper, which details the implementation of an open-source monitoring system to ensure the availability of library systems and services, is a fine example of the blending of best practices from two professions. 
indeed, many of us who work in it in libraries have a library background and still have a great deal to learn from it professionals. silver is contemplating a phd program or else a return to a library systems position when he graduates. either way, the profession will benefit from his thoughtful, well-researched, and useful contributions to our field. todd vandenbark’s paper on library web design for persons with disabilities follows, providing a highly practical but also very readable guide for webmasters and others. vandenbark graduated last spring with a masters degree from the school of library and information science at indiana university and is already working as a web services librarian at the eccles health sciences library at the university of utah. like mr. silver, he entered the program with a number of years’ work experience in the it field, and his paper reflects the depth of his technical knowledge. vandenbark notes, however, that he has found “the enthusiasm and collegiality among library technology professionals to be a welcome change from other employment experiences,” a gratifying comment for readers of this journal. ilana tolkoff tackles the challenging concept of global interoperability in cataloguing. she was fascinated that a single database, oclc, has holdings from libraries all over the world. this is also such a recent phenomenon that our current cataloging standards still do not accommodate such global participation. i was interested to see what librarians were doing to reconcile this variety of languages, scripts, cultures, and independently developed cataloging standards. tolkoff also graduated this past spring and is hoping to find a position within a music library. marijke visser addresses the overwhelming question of how to organize and expose internet resources, looking at tagging and the social web as a solution. coming from a teaching background, visser has long been interested in literacy and life-long learning. she is concerned about “the amount of information found only online and what it means when people are unable . . . to find the best resources, the best article, the right website that answers a question or solves a critical problem.” she is excited by “the potential for creativity made possible by technology” and by the way librarians incorporate “collaborative tools and interactive applications into library service.” visser looks forward to graduating in may. mary kurtz examines the use of the dublin core metadata schema within dspace institutional repositories. as a volunteer, she used dspace to archive historical photographs and was responsible for classifying them using dublin core. she enjoyed exploring how other institutions use the same tools and would love to delve further into digital archives, “how they’re used, how they’re organized, who uses them and why.” kurtz graduated in the summer and is looking for the right job for her interests and talents in a location that suits herself and her family. finally, lauren mandel wraps up the issue exploring the use of a geographic information system to understand how patrons use library spaces. mandel has been an enthusiastic patron of libraries since she was a small child visiting her local county and city public libraries. she is currently a doctoral candidate at florida state university and sees an academic future for herself. 
mandel expresses infectious optimism about technology in libraries:

people forget, but paper, the scroll, the codex, and later the book were all major technological leaps, not to mention the printing press and moveable type. . . . there is so much potential for using technology to equalize access to information, regardless of how much money you have, what language you speak, or where you live.

big ideas, enthusiasm, and hope for the profession, in addition to practical technology-focused information await the reader. enjoy the issue, and congratulations to the winner and all the finalists!

note 1. all quotations are taken with permission from private e-mail correspondence.

sandra shores (sandra.shores@ualberta.ca) is guest editor of this issue and operations manager, information technology services, university of alberta libraries, edmonton, alberta, canada.

a partnership for creating successful partnerships, continued from page 5: looking ahead, it seems clear that the pace of change in today's environment will only continue to accelerate; thus the need for us to quickly form and dissolve key sponsorships and partnerships that will result in the successful fostering and implementation of new ideas, the currency of a vibrant profession. the next challenge is to realize that many of the key sponsorships and partnerships that need to be formed are not just with traditional organizations in this profession. tomorrow's sponsorships and partnerships will be with those organizations that will benefit from the expertise of libraries and their suppliers while in return helping to develop or provide the new funding opportunities and means and places for disseminating access to their expertise and resources. likely organizations would be those in the fields of education, publishing, content creation and management, and social and community web-based software. to summarize, we at ex libris believe in sponsorships and partnerships. we believe they're important and should be used in advancing our profession and organizations. from long experience we also have learned there are right ways and wrong ways to implement these tools, and i've shared thoughts on how to make them work for all the parties involved. again, i thank marc for his receptiveness to this discussion and my even deeper appreciation for trying to address the issues. it serves as an excellent example of what i discussed above.

automated book order and circulation control procedures at the oakland university library

lawrence auld: oakland university, rochester, michigan

automated systems of book order and circulation control using an ibm 1620 computer are described as developed at oakland university. relative degrees of success and failure are discussed briefly.

introduction

oakland university, affiliated with michigan state university and founded in 1957, offers degree programs at the bachelor's and master's levels. by september, 1967, 3,896 students were enrolled and continuing growth is anticipated in coming years. the library had holdings of 86,755 volumes and 17,908 units of microform materials on july 1, 1967. although young, oakland's library has already encountered a host of problems common to most academic libraries. in recognizing a need to automate or otherwise improve basic routines of handling book ordering and circulation control, oakland is simply another member of a growing club.
the book order system developed at oakland is noteworthy because of ·~rtain features which may be unique: a title index to the on-order file, a computer prepared invoice-voucher form, and a computer prepared voucher card which serves as input to the computer for writing payment checks. in logic the system is related, through parallel invention, to the machine aided technical processing system developed at yale university ( 1). the system developed with unit record equipment at the university of maryland is perhaps more directly related, particularly in the use of the purchase order as a vendor's report form (2,3 ). the pennsyl94 journal of library automation vol. 1/ 2 june, 1968 vania state university library design for automated acquisitions, which uses a similar purchase order, includes the capacity for an elaborate and variable method for reporting the progress of each item from initial order to completion of cataloging ( 4,5) . the ibm 357 circulation control system developed at southern illinois university, carbondale, set the pattern followed by most subsequent systems ( 6,7) . oakland's circulation control system, a variation of the ibm 357 system, is more flexible than some because it uses trigger cards to control machine operations. this paper, originally distributed to a relatively small group of persons and redrafted for a more general reading, presents a case study of how one institution in modest circumstances set about solving certain problems. it describes not systems to be copied but rather a learning process which will continue for many years to come. background during the winter of 1964/ 65, oakland university library laid out the plans and began work on a program of automation of the university library. an initial four-phase plan was conceived: 1) book order, 2) circulation control, 3) serials acquisitions, and 4) a printed book catalog. these housekeeping routines were felt to be the foundation for developing further automation in the library. their automation would liberate the staff, clerical and professional, from such nonproductive and repetitive_ tasks as alphabetizing and re-copying of bibliographic information. an early decision to learn by doing rather than attempting to design the ultimate system in advance was supported by the university administration. consensus being that a larger computer to replace the ibm 1620 would be delivered within two years, computer programs were planned to be useful for twenty-four to thirty-six months. work on developing the book order system was begun in march, 1965; perhaps an all-time speed record was achieved when the system was put into use on july 1 of the same year. work on a circulation control system was begun in august and on february 21, 1966, it too was ready. phases three and four, serials acquisitions and the printed book catalog, were by then being held in abeyance until larger computer equipment should become available to the library. at oakland university all computer and related services are provided by the computing and data processing center. the computer system includes the following pieces of equipment: ibm 1620 computer, 40k with monitor 1 and additional instructions feature (mf, tnf, tns) ibm 1622 card reader/ punch (240 cpm/ 125 cpm) two ibm 1311 disk drives with changeable disk packs ibm 1443 line printer ( 240 lpm) automated book order/ auld 95 only one of the two disk drives is available for production use because the other is committed to monitor, supervisor, and stored programs. 
a disk pack on the ibm 1620 can accommodate two million numeric or one million alphabetic characters. the computer language used for most of the library programs is 1620 sps (symbolic programming system); fortran is used for some computational work. equipment within the library consists of an ibm 026 printing keypunch, which is used for the order system, and an ibm 357 data collection device, including a time clock, with output via a second ibm 026 printing keypunch for the circulation system.

book order procedure

as may be inferred from a birdseye view of the order system (figure 1), the initial input to the computer is decklets of punched cards. output from the computer is a series of printouts: purchase orders, library of congress card orders, oakland university invoice-vouchers, a complete on-order listing with title and purchase order number indices, departmental listings, and budget summaries.

fig. 1. flow chart of book order system.

faculty and library staff submit requests for book purchases to the acquisitions department on a specially designed library book request form (figure 2). the 5x8-inch size provides adequate room for notes, checking marks, etc., and makes for improved legibility, which in turn makes for easier, faster, and more accurate keypunching.

fig. 2. book order request form.

the request form calls for the bibliographic data customarily required for book purchasing, plus date of ordering, code number for the department originating the order, and vendor number. oakland university utilizes campus-wide a five-digit vendor code system; since the library's vendor numbers are a part of the university's vendor code, this interface is one of several points where the book order system ties in with other university records and procedures. a tag number is assigned to each library book request form upon its arrival in the acquisitions department. after routine bibliographic identification is completed, decklet cards (figure 3) are keypunched. the individual cards in each decklet are kept together by the tag number, punched into columns one through five. to keep the cards in order within decklets, column six is punched to identify the type of card as 1) author, 2) title, 3) place and publisher, or 4) miscellaneous information. column seven indicates the card number within type of card. for example, code 11 in columns six and seven would be the first author card and code 12 the second.

each book has a machine readable book card (figure 7). the period for which the book normally circulates is indicated with a letter code punched into column one; column two identifies the collection within the library from which the material came; column three identifies the type of material. the call number and/or other identifying information is punched into columns four through forty-one. column forty-two is punched with an end-of-transmission code.
fig. 7. book card.

the ibm 357 data collection device will perform only one operation without special instructions. if it is to perform more than one operation, it must receive instructions for each variant operation and it must receive them each time the variant operation is performed. this limitation can be met in one of three ways: by not admitting variant operations, by using a cartridge as a carrier for some information, or by providing special instructions as they are needed via a "trigger" card. denying the existence of a variant operation was not practical, because at oakland the identification of a borrower constitutes a set of variant operations. the library's clientele includes not only oakland university students, faculty, and staff, but also residents from the surrounding communities, area high school students, and neighboring college students. the heaviest users are oakland's own students and faculty, who have machine readable plastic identification cards issued by the registrar or the personnel office. it has been impractical for the library to attempt to issue similar cards to guest borrowers. thus, the identification of a borrower is a set of variant operations. use of a cartridge to gain the borrower identification number would be possible but would leave the borrower identification badge unused. this badge card constitutes an official identification card and as such should be utilized throughout the university whenever practical.

trigger cards to instruct the 357 in the performance of variant operations were developed to control the recording of borrower identification and to identify discharging and certain charging functions. the use of trigger cards provides flexibility, in that machine instructions are carried in trigger cards and are not an integral part of the book cards. a change in machine configuration would probably not require repunching book cards for the book collection. at the same time a wide range of 357 machine functions are made possible through the use of different trigger cards. in short, the adoption of trigger cards provides the greatest degree of flexibility in operating the 357.

in the customary borrowing procedure the student brings a book to the circulation desk and presents it, along with his machine readable student id card, to the desk attendant. the attendant first inserts the book card into the ibm 357 data collection device, then retrieves the book card and inserts a "student badge trigger card", which activates the badge reader on the 357. then the badge is inserted into the badge reader, completing the transaction. by remote control this has created on an ibm 026 printing keypunch a card with the following information: typical loan period, collection from which the item came, type of material, call number, borrower type, borrower's identification number, the day of the year, and the time of day secured from an on-line clock. if the borrower does not have a machine readable badge card, an alternate method of charging a book is to use a "manual entry trigger card" which activates the manual entry unit, with which can be recorded numeric information identifying the borrower.
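purely as a modern illustration — nothing like this ran on the 1620 — the fixed-column book-card layout described above maps directly onto a small record parser, which may make the column assignments easier to follow; the class and function names are, of course, invented for the sketch.

```python
# modern illustrative sketch only: restates the book-card layout described
# above as a parser. col 1 = loan-period letter code, col 2 = collection,
# col 3 = material type, cols 4-41 = call number, col 42 = end-of-transmission.
from dataclasses import dataclass

@dataclass
class BookCard:
    loan_period: str
    collection: str
    material_type: str
    call_number: str

def parse_book_card(card: str) -> BookCard:
    if len(card) < 42:
        raise ValueError("book card image is shorter than 42 columns")
    return BookCard(
        loan_period=card[0],             # column 1
        collection=card[1],              # column 2
        material_type=card[2],           # column 3
        call_number=card[3:41].strip(),  # columns 4-41
    )
```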
with special trigger cards books can also be charged to reserve, bindery, or "missing". books are discharged by passing the book card through the 357 and following it by a "discharge trigger card".

monday through friday at closing the charge and discharge cards for the day are delivered to the computing and data processing center, where they are processed by the ibm 1620 computer system. the circulation file is maintained on a disk pack similar to that for the order system. three reports are received from the computing and data processing center: a daily cumulative listing of all books and materials in circulation (figure 8); a cumulative weekly list of all books on long-term loan; and a weekly fines-due report. in addition, overdue notices, computer printed on mailable postcard stock, are sent weekly to the library where they are audited before being mailed. the fines-due report is arranged by borrower, bringing together in one place all of the borrower's delinquencies; the books which he has neglected to return are listed here, as are the overdue books which he returned through the outdoor book return chute. for the latter the number of days overdue at the time of return is listed. subsequent refinements introduced into this system include two additional reports: a pre-notice report in call number sequence produced two days in advance of the fines-due report and a listing of books discharged each day. the pre-notice report makes it possible to search the shelves for books which have been returned but, because of time lag, may still have overdue notices generated. normal turn-around time for the system is 24 hours, but on weekends it goes to 63 hours and at certain holiday periods even higher. the daily list of discharges documents the return and discharge of each book and is used to answer the student who says, "but i returned the book."

fig. 8. example of short-term circulation report ("short term books in circulation," dated wed., jul. 13, 1966; columns: call number, borrower, day of yr due, odue).

maximum file capacity will permit up to about 9,000 charges at one time. assuming an average life of four weeks for each charge, the maximum number of transactions which can be accommodated in one year is about 115,000. the circulation control system utilizes eight programs. all are written in 1620 sps and utilize 40k storage.
(an additional computational program not included in the production package is written in fortran.) with only minor modification the programs could be made to work with 20k storage. the individual programs are described in table 2. tabk 2. lib 201 lib 202 lib 204 lib 205 lib 207 lib 209 lib 212 lib 213 circulation control system programs to update file and to print short-and long-term reports. to print overdue notices and fines-due report. phase 1 routine for lib 202. cold start program to "seed" circulation file. to restart files from one term to the next. to print pre-notice report. to print daily discharges. to print circulation file or part thereof. • automated book order/ auld 107 appraisal the book order system has been described as it was originally designed, and the circulation control system as designed and modified. a partial update together with a critical appraisal follows. implicit in the planning of both systems was the assumption that the ibm 1620 would eventually be replaced by a larger and faster machine and that both systems would be redesigned and augmented. however, the ibm 1620 is continuing in use for a maximum rather than minimum projected time. in july, 1965, oakland initiated an accelerated library development program. overnight the book budget projection for several years was available and in less than three months the book order system was consequently overloaded. with the disk ble filled and many orders waiting, drastic action was required. the most obvious solution seemed to be use of an additional changeable disk pack to expand the purchase order file, but this procedure would have been hopelessly unwieldy. to use a second pack would require either that all transactions be run against both disk packs, roughly doubling computer time and costs, or that each transaction be addressed to a particular disk pack which would necessitate extensive systems redesign. another proposed solution was to revert to a completely manual system, but the order section preferred, if at all po~sible, to retain the automated fiscal control and invoice-voucher preparation features of the order system. , the alternative finally adopted required a basic philosophical change in the system. as originally designed, the system accounted for a book from the time it was placed on order to the time it was cataloged and placed on the shelf. the disk file was one-half occupied· with items received and paid for but not yet cataloged. by purging the file of such items, an on-order file in the narrowest sense was created and a doubling of file capacity gained. now a new problem was created. how was a book to be accounted for that had been received, paid, and purged from the on-order file, but not yet cataloged? the solution was to print a second (carbon) copy of the lc card order slip which would be hand-filed into the card catalog; there it would serve as an on-order/ in-process slip until replaced by a catalog card. hand-filed slips replacing a machine-filed list further altered the philosophical basis of the system. discrepancies in entry do occur, but not so often that the expedient does not work. four months later the system was again overloaded and a routine had to be devised whereby purchase orders could be issued either manually or through the computer. however, all items were still paid via the computer and all invoice-vouchers computer prepared. fiscal control was retained even though the rationale of the system was violated . 108 journal of library automation vol. 
1/ 2 june, 1968 during the summer of 1967 a change of a different nature was implemented. as originally designed the system provided constant communication between the library and each faculty department through the departmental report. but, after the changes described above, the departmental report now included less than one-half of the items being purchased with the department's book fund allocation. it had ceased to serve any purpose and was omitted after july, 1967, with a consequent reduction of nearly two-fifths of line-printer time required for the book order system. to the question, "would it be better to return to a completely manual system for ordel'ing books?" the answer by the order section has always been "no, retention of the automated system for fiscal control and voucher preparation is preferable, even with the patched system at hand." nor should it be forgotten that the book order system as originally designed worked well until the demand on it exceeded its production capacity. also to be recognized is the gain in experience and insight by the library staff during these three years. reading about or visiting someone else's work is enlightening but day-to-day work brings an understanding for which it is difficult to obtain a substitute. acknowledgments four persons deserve special recognition for the roles they played in the foregoing: dr. floyd cammack, former university librarian, without · whose imagination and courage library automation at oakland would not have been attempted; mr. donald mann, assistant director, computing and data processing center, an outstanding systems analyst and programmer; mrs. edith pollock, head of the order section, who likes computers; mrs. nancy covert, head of circulation department, who likes students. references i. alanen, sally; sparks, david e.; kilgour, frederick g.: "a computermonitored library technical processing system," in american documentation institute: proceedings of the annual meeting, v. 3, 1966 (woodland hills, calif.: adrianne press, 1966) p. 419-26. 2. cox, carl r.: "the mechanization of acquisitions and circulation procedures at the university of maryland library," in international business machines corporation: ibm library mechanization symposium (endicott, n. y.: 1964) p. 205-35. 3. cox, carl r.: "mechanized acquisitions procedures at the university of maryland," college & research libraries, 24 (may 1965) 232-36. 4. minder, thomas l.: "automation-the acquisitions program at the pennsylvania state university library," in international business machines corporation: ibm library mechanization symposium (endicott, n. y.: 1964) p. 145-56. automated book order/ auld 109 5. minder, thomas l.; lazorick, gerald: "automation of the penn state university acquisitions department" in international business machines corporation: ibm library mechanization symposium (endicott, n. y. 1964) p. 157-63. (reprinted from american documentation institute: automation and scientific communication; short papers contributed to the theme sessions of the 26th annual meeting ... (washington: 1963) p. 455-59. ) 6. dejarnett, l. r. : "library circulation control using ibm 357's at southern illinois university," in international business machines corporation: ibm library mechanization symposium (endicott, n. y.: 1964) p . 77-94. 7. mccoy, ralph e.: "computerized circulation work: a case study of the 357 data collection system," library resources & technical services, 9 (winter 1965), 59-65. 
6 information technology and libraries | september 2008 mireia ribera turróeditorial board thoughts the june issue of ital featured a new column enti-tled editorial board thoughts. the column features commentary written by ital editorial board members on the intersection of technology and libraries. in the june issue kyle felker made a strong case for gerald zaltman’s book how customers think as a guide to doing user-centered design and assessment in the context of limited resources and uncertain user needs. in this column i introduce another factor in the library–it equation, that of rapid technological change. in the midst of some recent spring cleaning in my library i had the pleasure of finding a report documenting the current and future it needs of purdue university’s hicks undergraduate library. the report is dated winter 1995. the following summarizes the hicks undergraduate library’s it resources in 1995: [the library] has seven public workstations running eight different databases and using six different search software programs. six of the stations support a single database only; one station supports one cd-rom application and three other applications (installed on the hard drive). none of the computers runs windows, but the current programs do not require it. five stations are equipped with six-disc cd-rom drives. we do not anticipate that we will be required to upgrade to windows capability in the near future for any of the application programs. today the hicks undergraduate library’s it resources are dramatically different. as opposed to seven public workstations, we have more than seventy computers distributed throughout the library and the digital learning collaboratory, our information commons. this excludes forty-six laptops available for patron checkout and eighty-eight laptops designated for instructional use. we have moved from eight cd-rom databases to more than four hundred networked databases accessible throughout the purdue university libraries, campus, and beyond. as a result, there are hundreds of “search software programs”—doesn’t that phrase sound odd today?—including the library databases, the catalog, and any number of commercial search engines like google. today all, or nearly all, of our machines run windows, and the macs have the capability of running windows. in addition to providing access to databases, our machines are loaded with productivity and multimedia software allowing students to consume and produce a wide array of information resources. beyond computers, our library now loans out additional equipment including hard drives, digital cameras, and video cameras. the 1995 report also includes system specifications for the computers. these sound quaint today. of the seven computers six were 386 machines with processors clocking in at 25 mhz. the computers had between 640k and 2.5mb of ram with hard drives with capacities between 20 and 60mb. the seventh computer was a 286 machine probably with a 12.5 mhz processor, and correspondingly smaller memory and hard disc capacity. the report does not include monitor specifications, though, based on the time, they were likely fourteenor fifteen-inch cga or ega cathode ray tube monitors. modern computers are astonishingly powerful in comparison. according to a member of our it unit, the computers we order today have 2.8 ghz dual core processors, 3gb of ram, and 250gb hard drives. this equates to being 112 times faster, 1,200 times more ram, and hard drives that are 4,167 times larger than the 1995 computers! 
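the ratios quoted above are straightforward to reproduce; the short sketch below recomputes them and, for the moore's-law comparison drawn just below, the baseline of a doubling every two years. the 1995 and 2008 figures are taken from the column itself; everything else is illustrative arithmetic.

```python
# reproduce the comparison above: 1995 library workstations vs. the 2008
# machines, against a moore's-law baseline of a doubling every two years.
years = 2008 - 1995                # 13 years
moore_factor = 2 ** (years // 2)   # 6 full doublings -> 64x, as cited below

cpu_factor = 2.8e9 / 25e6          # 2.8 ghz vs. 25 mhz          -> 112x
ram_factor = 3_000 / 2.5           # 3 gb vs. 2.5 mb (decimal)   -> 1,200x
disk_factor = 250_000 / 60         # 250 gb vs. 60 mb (decimal)  -> ~4,167x

print(f"moore's-law baseline: {moore_factor}x")
print(f"cpu {cpu_factor:.0f}x, ram {ram_factor:.0f}x, disk {disk_factor:.0f}x")
```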
as a benchmark, consider moore's law, a doubling of transistor counts roughly every two years, or about a ninety-fold increase over a thirteen-year period. who would have thought that library computers would outpace moore's law?! today's computers are also smaller than those of 1995. our standard desktop machines serve as an example, but perhaps not as dramatically as laptops, mini-laptops, and any of the mobile computing machines small enough to fit into your pocket. monitors are smaller, though also bigger. each new computer we order today comes standard with a twenty-inch flat panel lcd monitor. it is smaller in terms of weight and overall size, but the viewing area is significantly larger. these trends are certainly not unique to purdue. nearly every other academic library could boast similar it advancements. with this in mind, and if moore's law continues as projected, imagine the computer resources that will be available on the average desktop machine— although one wonders if it will in fact be a desktop machine—in the next thirteen years. what things out on the distant horizon will eventually become commonplace? here the quote from the 1995 report about windows is particularly revealing. what things that are currently state-of-the-art will we leave behind in the next decade? what's dos? what's a cd-rom? will we soon say, what's a hard drive? what's software? what's a desktop computer? in the last thirteen years we have also witnessed the widespread adoption and proliferation of the internet, the network that is the backbone for many technologies that have become essential components of physical and digital libraries. earlier this year, i co-authored an arl spec kit entitled social software in libraries.1 the survey reports on the usage of ten types of social software within arl libraries: (1) social networking sites like myspace and facebook; (2) media sharing sites like youtube and flickr; (3) social bookmarking and tagging sites like del.icio.us and librarything; (4) wikis like wikipedia and library success: a best practices wiki; (5) blogs; (6) rss used to syndicate content from webpages, blogs, podcasts, etc.; (7) chat and instant messenger services; (8) voice over internet protocol (voip) services like googletalk and skype; (9) virtual worlds like second life and massively multiplayer online games (mmogs) like world of warcraft; and (10) widgets either developed by libraries, like facebook applications and firefox catalog search extensions, or implemented by libraries, like meebome and firefox plugins. of the 64 arl libraries that responded, a 52% response rate, 61 (95% of respondents) said they are using social software. of the three libraries not using social software, two indicated they plan to do so in the future. in combination then, 63 out of 64 respondents (98%) indicated they are either currently using or planning to use social software. as part of the survey there was a call for examples of social software used in libraries. of the 370 examples we received, we selected around 70 for publication in the spec kit. the examples are captivating and they illustrate the wide variety of applications in use today.
of the ten social software applications in the spec kit, how many of them were at our disposal in 1995? by my count three: chat and instant messenger services, voip, and virtual worlds such as text-based muds and moos. of these three, how many were in use in libraries? very few, if any. in our survey we asked libraries for the year in which they first implemented social software. the earliest applications were cu-seeme, a voip chat service at cornell university in 1996, im at the university of california riverside in 1996 as well, and interoffice chat at the university of kentucky in 1998. the remaining libraries adopted social software in year 2000 and beyond, with 2005 being the most common year with 22 responses or 34% of the libraries that had adopted social software. a look at this data shows that my earlier use of a thirteen-year time period to illustrate how difficult it is to project technological innovations that may prove disruptive to our organizations is too broad a time frame. perhaps we should scale this back to looking at five-year increments of time. using the spec kit data, in year 2003, a total of 16 arl libraries had adopted social software. this represents 25% of the total number of institutions that responded when we did our survey. this seems like a more reasonable time frame to be looking to the future.
figure 1. responses to the question, "please enter the year in which your library first began using social software" (n=61).
so, what does the future hold for it and libraries, whether it be thirteen or five years in the future? i am not a technologist by training, nor do i consider myself a futurist, so i typically defer to my colleagues. there are three places i look to for prognostications of the future. the first is lita's top technology trends, a recurring discussion group that is a part of ala's annual conferences and midwinter meetings. past top technology trends discussions can be found on lita's blog (www.ala.org/ala/lita/litaresources/toptechtrends/toptechnology.cfm) and on lita's website (www.ala.org/ala/lita/litaresources/toptechtrends/toptechnology.cfm). the second source is the horizon project, a five-year qualitative research effort aimed at identifying and describing emerging technologies within the realm of teaching and learning. the project is a collaboration between the new media consortium and educause. the horizon project website (http://horizon.nmc.org/wiki/main_page) contains the annual horizon reports going back to 2004. a final approach to projecting the future of it and libraries is to consider the work of our peers. the next library innovation may emerge from a sister institution. or perhaps it may take root at your local library first!
reference
1. bejune, matthew m. and jana ronan. social software in libraries. arl spec kit 304. washington, d.c.: association of research libraries, 2008.
mobile technologies & academics: do students use mobile technologies in their academic lives and are librarians ready to meet this challenge? angela dresselhaus and flora shrode
abstract
in this paper we report on two surveys and offer an introductory plan that librarians may use to begin implementing mobile access to selected library databases and services.
results from the first survey helped us to gain insight into where students at utah state university (usu) in logan, utah, stand regarding their use of mobile devices for academic activities in general and their desire for access to library services and resources in particular. a second survey, conducted with librarians, gave us an idea of the extent to which responding libraries offer mobile access, their future plans for mobile implementation, and their opinions about whether and how mobile technologies may be useful to library patrons. in the last segment of the paper, we outline steps librarians can take as they “go mobile.” purpose of the study similar to colleagues in all types of libraries around the world, librarians at utah state university (usu) want to take advantage of opportunities to provide information resources and library services via mobile devices. observing growing popularity of mobile, internetcapable telephones and computing devices, usu librarians assume that at least some users would welcome the ability to use such devices to connect to library resources. to find out what mobile services or vendors’ applications usu students would be likely to use, we conducted a needs assessment. the lessons learned will provide important guidance to management decisions about how librarians and staff members devote time and effort toward implementing and developing mobile access. we conducted a survey of usu’s students (approximately 25,000 undergraduates and graduates) to determine the degree of handheld device usage in the student population, the purposes for which students use such devices, and students’ interests in mobile access to the library. in addition, we surveyed librarians to learn about libraries’ current and future plans to launch mobile services. this survey was administered to an opportunistic population angela dresselhaus (aldresselhaus@gmail.com) was electronic resources librarian, flora shrode (flora.shrode@usu.edu) is head, reference & instruction services, utah state university, logan, utah. mailto:aldresselhaus@gmail.com mailto:flora.shrode@usu.edu information technology and libraries | june 2012 83 comprised of subscribers to seven e-mail lists whom we invited to offer feedback. our goal was to develop an action plan that would be responsive to students’ interests. at the same time, we aim to take advantage of the growing awareness of and demand for mobile access and to balance workloads among the library information technology professionals who would implement these services. usu is utah’s land-grant university and the merrill-cazier library is its primary library facility on the home campus in logan, utah. while usu has had satellite branches for some time, a growing emphasis on expanding online and distance education courses and degree programs has resulted in a considerable growth of its distance education programs in the last five years. mobile access to university resources makes especially good sense for the distance education population and for students who may reside close to the main usu campus but who also enroll in online courses. the library has an information technology staff of 4.5 fte professionals who support the library catalog, maintain roughly 250 computer workstations in cooperation with the director of campus student computer labs, and oversee the computing needs of library staff and faculty members. 
literature review mobile access to library resources is not a new concept; in fact, the first project designed to deliver handheld mobile access to library patrons began eighteen years ago, in 1993, the time of mainframe computers and gopher. the “library without a roof” project partners included the university of southern alabama, at&t, bellsouth cellular, and notable technologies, inc. 1 library patrons at participating institutions could search and read electronic texts on their personal digital assistants (pdas) and search the library catalog while browsing in physical collections. as reflected in the literature, interest in pda applications for libraries started to pick up around the turn of the twenty-first century. medical librarians were among the first to widely recognize the potential impact of mobile technologies on librarianship. a 2002 article in the journal of the medical library association and a monograph by colleen cuddy are among the first publications that focus on pdas. 2 a quick perusal of the medical category on the itunes store reveals several professional applications, ranging from new england journal of medicine tools to remote patient vital-sign monitors. as an example of the depth of mobile-device penetration in the medical field, in 2010 the food and drug administration approved the marketing of the airstrip suite of mobile-device applications. these apps work in conjunction with vital-sign monitoring equipment to allow instant remote access to a patient’s vital signs. 3 these examples illustrate the increasing pervasiveness of mobile technology in everyday life. mobile learning in academic areas outside of medicine has increased recently as more universities have adopted mobile technologies. 4 a sampling of current projects at academic mobile technologies & academics | dresselhaus and shrode 84 institutions is provided in the 2010 horizon report. 5 according to the 2010 educause center for applied research (ecar) study, 49 percent of undergraduates consider themselves mainstream adopters of technology. 6 locally, utah state university students have adopted smartphones at the rate of 39.3 percent and other handheld internet devices at the rate of 31.5 percent. these statistics indicate that skills are increasing and the technological landscape is changing quickly. the ecar study reports that student computing is rapidly moving to the cloud, another indication of the rapid change in the use of technology. “usb may one day go the way of the eight-track tape as laptops, netbooks, smartphones and other portable devices enable students to access their content from anywhere. they may or may not be aware of it, but many of today’s undergraduates are already cloud-savvy information consumers, and higher education is slowly but surely following their lead.” 7 similarly, usu students show interest in adopting new technology. while usu students are less likely to own mobile devices, 70.2 percent of respondents indicated that they would be likely or very likely to use library resources on smartphones if they owned capable devices and if the library provided easy access to materials. bridges, gascho rempel, and griggs published a comprehensive article, “making the case for a fully mobile library web site: from floor maps to the catalog,” detailing their efforts to implement mobile services on the oregon state university campus. 8 their paper highlights the popularity of mobile phones and smartphones/web-enabled phones. 
the authors discuss mobile phone use, library mobile websites, and mobile catalogs, and they describe the process they used to develop their mobile library site. they note that mobile services will certainly be expected in the coming years, and we have learned that usu students share this expectation.
survey research
in recent years librarians have conducted surveys on mobile technology in libraries. in a 2007 study, cummings, merrill, and borrelli surveyed library patrons to find out if they are likely to access the library catalog via small-screen devices. 9 they discovered that 45.2 percent of respondents, regardless of whether they owned a device, would access the library catalog on a small-screen device. mobile access to the library catalog was the most requested service in the usu student survey, although it accounted for only 16 percent of the responses. cummings et al. also discovered that the most frequent users of the catalog were also the least willing to access the catalog via mobile devices, an interesting observation that merits further research. their survey was completed in june of 2007, just five months after the january 9 announcement of the original iphone. the release of the iphone is significant as the point where the market demographics of mobile device users began to shift to people under thirty, the primary age group of undergraduate students. 10 librarians wilson and mccarthy at ryerson university conducted two surveys to measure the usage of their catalog's feature to send a call number via text or email (initiated in 2007) and their "fledgling mobile web site" (launched in 2008). 11 the first survey indicated that 20 percent of respondents owned internet-capable cell phones, and over half said they intended to buy this type of phone when their current contracts expired. the survey respondents indicated they wanted the following services: "booking group study rooms, checking hours and schedules, checking their borrower records and checking the catalogue." 12 the second survey was conducted a year after the library had implemented a group study room reservation system, catalog and borrower record services, and a computer/laptop availability service. results of the follow-up survey show a drastic increase in ownership of internet-capable cell phones (from 20% to 65%). respondents desired two new services: article searches and e-book access. wilson and mccarthy found that very few library patrons were accessing the mobile services, but "60% of the survey respondents were unaware that the library provided mobile services." 13 the authors conclude that advertising should be a central part of mobile technology implementation. they also detail how the library contributed expertise and leadership to their campus-wide mobile initiatives. seeholzer and salem conducted a series of focus groups in the spring of 2009 to determine the extent of mobile device use among students at kent state university. 14 notable among their findings are that students are willing to conduct research with mobile devices, and they desire to have a feature-rich interactive experience via handheld devices. students expressed interest in customizing interactions with the library's mobile site and completing common tasks such as placing holds or renewing library materials.
nationwide survey of librarians we asked colleagues who subscribe to e-mail distribution lists to respond to a survey about their libraries’ implementation of mobile applications for access to library collections and services. invitations to take the survey were sent to seven lists (acrl science & technology section, eril, information literacy instruction, liblicense-l, nasig, ref-l, and serialist), and 289 librarians and library staff members responded to the survey. the population of subscribers to the e-mail lists we used to solicit survey responses is dynamic and includes librarians and staff who work in academic and other types of settings. while our findings cannot be generalized in a statistically reliable manner, we nonetheless believe that the survey responses merit thorough analysis. we chose to conduct two surveys to avoid some of the problems we noted in a 2007 study conducted by todd spires. 15 spires’ survey questions focused on librarians’ perceptions rather than on empirical data. we developed separate surveys for librarians and students in hopes of avoiding problems that could arise from basing assumptions on perceived behavior or from the complexity of interpreting and generalizing from perceptions. a survey of library patrons should provide more accurate insight into the ways that patrons are using the library mobile technologies & academics | dresselhaus and shrode 86 via handheld devices. in the libraries that currently provide mobile access to resources, the library catalog is most commonly offered. article databases and assistance from a librarian tie as the second most frequently provided services. figure 1 shows a snapshot of the resources and services librarians reported that they provide. we also asked how long libraries have provided mobile access, and the time periods ranged from a few weeks to more than ten years. five librarians indicated that they have provided mobile access for six to ten years, and it is possible that these respondents may work in medical or health science libraries, as our literature review indicated that access to medical information and journal articles via pdas has been a reality for several years. figure 1. librarians’ responses: does your library provide mobile access to the following library resources? librarians were also asked what services and resources they believe libraries should provide via mobile devices. of one hundred seventy-eight responses, 71 percent indicated that “everything” or a variety of library resources should be made available. a few of the more interesting suggestions include a library café webcam (similar to a popular link from north carolina state university), locker reservations, a virtual suggestion box, alerts about database trials, an app that lists new books, and using ipads or other mobile devices for roving reference. roving reference with tablet pcs was evaluated by smith and pietraszewski at the west campus branch library of texas a&m. 16 as tablet computers become increasingly popular with the release of the ipad and other tablets, 17 roving reference should be reconsidered. smith and pietraszewski note that "the tablet pc proved to be an extremely useful device as well as a novelty that drew student interest (anything to make reference librarians look cool!)" 18 using the latest technology in libraries will help raise awareness that libraries are relevant and adapting to changing user preferences. 
we asked librarians to indicate who had responsibility for implementing mobile access in their library. the 184 responses are summarized here: 63 percent answered that a library systems or computing professional does this work; 26.1 percent indicated that the electronic resources librarian has this role; 17.9 percent rely on an information professional from outside of the library; and 22.8 percent chose "other," though we unfortunately did not offer a space for comments where survey respondents could tell us the job title of the person in their library who implements mobile access. the results from our sample of librarians are consistent with a larger study by library journal. 19 the lj study found that the majority of academic libraries have implemented or are planning to implement mobile technologies.
student survey
in january of 2011 we sent out a thirteen-question survey to students (questions are available in appendix a). usu's student headcount is 25,767, and 3,074 students responded, representing 11.9 percent of the student population. we asked students to identify with colleges so that we could evaluate the survey sample against the enrollment at usu. response rates by college mostly clustered between 12 and 19 percent; the lowest response rate (8 percent) came from the college of education, and the highest came from the college of humanities and social sciences. we examined survey response rates from usu undergraduate and graduate populations; 54 percent of undergraduates and 50 percent of graduate students use mobile technology for academic purposes. we believe that our sample is sufficiently representative of the overall population of usu.
figure 2. student response rates by college
in order to understand the context of survey questions that specifically address mobile access, we asked students how often they used library electronic resources. the majority of students used electronic books, the library catalog, and electronic journals/articles a few times each semester. only 34.4 percent of students never use electronic books, 19.6 percent never use the library catalog, and 17.6 percent never use electronic journals/articles. we made comparisons between disciplines and found no significant difference in electronic resource use between fields in the sciences and those in humanities. further data will be collected in fall 2011 about use of print and electronic materials.
figure 3. electronic resource use among students
students were asked how often they use a variety of handheld devices. we decided to emphasize access over ownership in order to allow for a variety of situations. responses show that 39.3 percent of our students use a smartphone with internet access on a daily basis. another 31.5 percent of students use other handheld devices like an ipod touch on a daily basis. very few students use ipads or e-book readers, with 3.9 percent and 5.4 percent indicating daily use, respectively. we view the "other handheld device" category as an important segment of the mobile technology market because of the lower cost barrier, since such devices do not require a subscription to a data plan. the ecar study also noted the possibility of cost factors influencing the decision of some students not to access the internet via a handheld device. 20
figure 4. mobile device usage
students were asked if they use their mobile device or phone for academic purposes (e.g., blackboard, electronic course reserves, etc.). this question was intentionally worded broadly in order to gather general information. we used skip logic to direct respondents to different paths through the survey based on their response to earlier questions. in response to a question about how students use their mobile devices, 54 percent of respondents indicated that they use their mobile devices for academic purposes. we analyzed the results by discipline and noted a few variances. among students responding from the school of business, 63 percent said that they use their mobile device for academic purposes, and 59 percent of engineering students use their devices for school work. the respondents from the other colleges reported use under 50 percent, most likely because of more limited adoption of mobile technology by usu faculty in those fields or lack of personal funds (or unwillingness to spend) to acquire devices and data plans. the 2010 ecar report also noted higher exposure to technology in these fields, indicating that the situation at usu is in line with results from a national study. 21
table 1. device use for academic purposes by college
we asked the students, "if library resources were easily accessible on your mobile devices, and if you had such a device, how likely would you be to use any of the following for assignments or research?" responses to this question allowed us to gauge interest without concerns about cost of technology or the current state of mobile readiness in our library. among the survey respondents, 70.2 percent are likely or very likely to use resources on a smartphone; 46.9 percent are likely or very likely to use resources on an ipad; 45.9 percent are likely or very likely to use resources on an e-book reader; 63.2 percent are likely or very likely to use resources on other devices. we included an option for respondents to select "not applicable" as distinct from "not likely" to allow for those students who may welcome use of a mobile device but who may currently use a device different from the types we specified.
figure 5. likelihood of using library resources on mobile device if easily available
we are unsure how to account for the dramatic difference in interest between smartphone and ipad usage. survey responses indicated that only a small number of students have access to an ipad, and it is possible that students have had little opportunity to see their classmates or others use ipads in an academic setting. students were asked in a free-text question to list the services the library should offer. the comments were varied and often used language different from the vocabulary that librarians typically use. in order to gain an understanding of trends and to standardize the language, we coded the survey comments. after coding, trends began to emerge. access to the library catalog was mentioned by 16 percent of respondents. mobile services in general were specified by 11 percent of survey respondents, 10 percent wanted articles, and 9 percent wanted to reserve study rooms on their mobile device. the phrase "mobile services" represents a catch-all tag designated for comments that indicated that a student desired a variety of services or all services that are possible.
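as an illustration of the coding step just described, a small script along the following lines can map free-text comments onto a controlled vocabulary and tally the results. the tags and keyword lists shown here are hypothetical examples, not the actual scheme we applied.

```python
# Illustrative only: the tags and keyword lists below are hypothetical,
# not the controlled vocabulary used in the actual study.
from collections import Counter

TAG_KEYWORDS = {
    "catalog": ["catalog", "opac"],
    "articles": ["article", "database", "journal"],
    "study rooms": ["study room", "room reservation"],
    "mobile services": ["everything", "all services", "mobile site"],
}

def code_comment(comment):
    """Return the tags whose keywords appear in a free-text comment."""
    text = comment.lower()
    return [tag for tag, keywords in TAG_KEYWORDS.items()
            if any(keyword in text for keyword in keywords)]

def tally_tags(comments):
    """Count how often each tag is assigned across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(code_comment(comment))
    return counts

# Example with made-up comments:
print(tally_tags(["I want the catalog on my phone",
                  "Let me reserve a study room",
                  "Everything the library offers"]))
```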
only 9 percent of respondents indicated they had used text messaging to contact the library, for example, and 15 percent had used instant messaging. several students indicated they might have used these services but did not know they were available, indicating a need for advertising. while we learned much about students' desires for mobile services from this important subset of comments in response to the free-text question, they did not prove especially useful to guide librarians' plans for the next stages of implementing mobile technology.
figure 6. services requested by students
as is common at many institutions, funding at usu is limited and any development in the area of mobile access implementation must be strategic. our survey indicated that usu students are using mobile devices for their academic work and would like to further integrate library resources into their mobile routine. the next section of this paper outlines the steps we are taking toward mobile implementation.
going mobile
the usu library joins many other academic libraries in the beginning stages of implementing mobile technologies. survey responses from students indicate that they use mobile devices for academic purposes, and until options to use the library with such devices are available and advertised, we will not have a clear understanding of students' preferences. klatt's article, "going mobile: free and easy," 22 outlines a way to get started with mobile services with small investments of time and money. articles by griggs, 23 back, 24 and west, 25 and books by green et al. 26 and hanson 27 also provide guidance in this area. here we offer suggestions to establish an implementation team, conduct an environmental scan, outline steps to begin the process, and shed light on advertising, assessment, and policy issues.
implementation team
for a library seeking to provide mobile access to online resources, a diverse and talented implementation team is important. public services personnel in an academic library are on the front lines and often field students' questions. they may also have the opportunity to observe how students are using mobile devices in the library. if librarians track reference interactions, they may find evidence that students are attempting to use their mobile devices to access library services. the electronic resources/collections specialist will also play a key role in mobile development. these specialists are often in contact with vendors, and their advocacy is important in encouraging mobile web development in the vendor community. a web site coordinator interested in mobile services and knowledgeable in current web standards will bring essential talent to the team. arguably, a mobile-optimized web site should become a standard level of service. web sites that are optimized or adapted specifically for mobile access are device agnostic and do not require advanced knowledge of smart phone operating systems. therefore existing web development staff can apply their current skill set to expand into mobile web design. in order to launch advanced interactive access to library resources, a programmer who is interested in developing mobile apps on a number of platforms is needed. device-specific applications allow for the use of phone features such as gps and orientation sensing via an accelerometer and provide the basis for augmented reality technologies.
environmental scan librarians can learn about mobile usage in their community by gathering information to guide future development. at usu we interpret the numbers of students who use mobile devices for academic purposes as justification for implementing mobile library access, but we have not set a benchmark for a degree of interest that would trigger more development. some of the mobile implementations described at the end of this paper required minimal time or were investigated because of the electronic resources librarian’s interest for their relevance to her role as music subject librarian. in the survey we administered to students, we considered it important to include a wide range of devices, including ipod touches and similar devices that have many of the same possibilities for academic use as smartphones but which do not require a monthly contract. laptops are also considered a mobile technology, and while we did not emphasize this class of devices, some student comments referred specifically to laptop computers. we will monitor use of the mobile applications that we implement and likely conduct a follow-up survey to assess students’ satisfaction and to find out if there are other services they would like for the library to provide. while librarians may gather useful information from a user study, there are other ways to determine if students are, in fact, using mobile devices in the library. one approach is to review logs of reference questions to determine if students are inquiring about access to library resources via mobile devices. recently, a few mobile-related questions have surfaced mobile technologies & academics | dresselhaus and shrode 94 at usu in the libstats program used to track reference interactions. this is also an area where training reference staff to recognize and record questions about mobile access could be helpful to detect demand in the library’s community. if vendors provide statistics about use of their products from mobile devices, this information could also contribute to assessing need. finally, in libraries that use vpn or other off-campus authentication methods, consulting with it support staff to see if they field questions on setting up remote access on smartphones or other devices may factor into decisions regarding mobile access. the usu information technology website provides a knowledgebase that includes entries on a variety of mobile device queries. this indicates to librarians that people in the university community are using their mobile devices for academic functions. before we conducted the survey of usu students, we knew little about the exact nature of their mobile use. getting started after identifying the needs on campus, the next step is to create a plan for mobile implementation. an important aspect of anticipating the needs of a library’s user population is to understand the likely use scenarios, goals, tasks, and context as outlined in “library/mobile: tips on designing and developing mobile web sites.” 28 building on services that incorporate tasks that people already perform in non-academic contexts provides a logical bridge for those who are familiar with everyday use of a mobile device to recognize how such devices can serve academic purposes. gathering information from each vendor that supplies content to the library is an important early step in planning. 
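one low-effort way to operationalize the log review suggested above is a small script that scans an exported log of reference questions for mobile-related terms and counts matches by month. the sketch below is illustrative only; the file name, column names, and keyword list are hypothetical and do not reflect an actual libstats export format.

```python
# Illustrative only: scan an exported log of reference questions for
# mobile-related terms and count matches by month. The file name, column
# names, and keyword list are hypothetical, not an actual libstats export.
import csv
from collections import Counter

MOBILE_TERMS = ["iphone", "android", "smartphone", "ipad", "mobile", "app"]

def mobile_questions_by_month(path):
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):          # expects "date" and "question" columns
            question = row.get("question", "").lower()
            if any(term in question for term in MOBILE_TERMS):
                counts[row.get("date", "")[:7]] += 1   # YYYY-MM
    return counts

# Example: print(mobile_questions_by_month("reference_log.csv"))
```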
this information can serve as the basis of a mobile web implementation plan and, in the case of ebsco, creating a profile is necessary in order to allow access to a mobile-formatted platform. at usu our online catalog provider has developed an application for apple's ios platform. if a library’s catalog vendor does not offer a dedicated application or mobile site, samuel liston’s comparisons of three major online catalogs on three popular mobile devices is helpful in gaining an understanding of how opacs display on smartphones. his article also outlines a procedure for testing opacs and usability. 29 at usu we can also take advantage of serials solutions’ mobile-optimized search screen and a variety of applications provided by other vendors. jensen noted that librarians should not rely solely on vendor-created applications due to vendors’ tendency to develop applications that are usable by only a segment of the overall mobile device user population. 30 he adds that libraries should also avoid developing applications for limited platforms. in addition, jensen provides a simple step-by-step process for converting articles retrieved from a vendor database to a format that can be downloaded from electronic course reserves and read on a variety of handheld devices. while using vendor-developed applications is an important strategy, most libraries will find that developing a mobile-compatible library website is necessary. information technology and libraries | june 2012 95 mobile website development can be accomplished in a variety of ways. at usu we plan to offer a version of our regular website by employing cascading style sheets (css). this method is described in the paper by bridges, et al., 31 and standard guidelines can be found in the mobile web best practices 1.0. 32 this method will allow the content to be reformatted at the point of need for a variety of platforms. results from the usu student survey indicate a desire to be able to use a mobile device for access to the library catalog, to use services like reference assistance, find articles, and make study room reservations. the library plans to include hours and location information, access to existing reference chat and text features, and links to databases with mobile friendly websites or vendor-created applications in addition to the resources requested by students. we are still unsure of the best way to provide links to applications and how to explain the various authentication methods required by each vendor. while vpn and ezproxy are possible methods to authenticate via mobile devices, vendors are content at the moment to allow students to access their resources by setting up an account that is based on an authorized e-mail domain or through a user account created on the non-mobile version of the resource. in a few cases at usu, mobile applications from vendors allow access to categories of users such as alumni because they have a usu.edu e-mail address, although the library does not typically include these patrons in our authorized remote user group. advertising, assessment, and policy creating a mobile website and offering mobile services are only the beginning of the effort to provide access to library materials for mobile users. as wilson and mccarthy found, advertising is essential; 33 students won’t use a service they don’t know about. crafting a marketing plan with both online and print materials is essential. 
educating library staff members, especially those on the public services front line, is an essential part of promoting mobile services. assessment strategies must be developed in order to focus development strategically. periodic surveys and focus groups can inform future development of mobile services and gauge the impact of currently offered services. librarians should encourage vendors to provide usage data for their mobile portals or applications, and libraries can track use data from their own information technology departments. implementation of mobile web services creates the need to develop new policies and to educate staff. privacy concerns and the complexities of digital rights management have the potential to transform the role of the library and its policies. 34 patrons will need to be aware that the library has less control over maintaining privacy when materials are accessed via third-party mobile applications. libraries will need to consider how new developments in pricing models may affect expanding mobile access; one example is harpercollins' announcement in early 2011 about a policy requiring libraries to repurchase individual e-book titles after a cap on check-outs is reached. 35 librarians' desire to offer reference services or other assistance via mobile devices follows naturally from their long-standing efforts to enable patrons to ask questions via e-mail, chat, instant messaging, or sms text. instant messaging, chat, and text lend themselves to mobile access because they are designed for the relatively short exchange that people typically use when communicating with a handheld device. offering reference services using sms text and chat in particular is relatively easy for libraries because there are many free services to support them. in some cases, a systems administrator or it expert may be helpful in navigating the setup of chat and text services and in integrating them so that, for example, when a text message arrives at a time when no one is monitoring the service, a message automatically appears in the library's e-mail account. librarians can find an enormous amount of advice on the web and in the literature about how to begin offering mobile-friendly reference, how to expand the virtual reference services they currently provide, and how to choose among free and fee-based services for their library's needs and budget. two efficient places to begin are cody hanson's special issue of library technology reports, which provides a thorough overview of mobile devices and their capabilities and straightforward suggestions for planning and implementation, and m-libraries, a section of library success: a best practices wiki. 36
conclusion
in light of trends toward more widespread use of mobile computing devices and smartphones, it makes sense for libraries to provide access to their collections and services in ways that work well with mobile devices. this case study presents the situation at the merrill-cazier library at utah state university, where students who responded to a survey indicate they are very interested in mobile access, even if they have not yet purchased a smartphone or find data plans to be too expensive at this point. as is only reasonable for any library, at usu we have begun by implementing mobile applications that are available from vendors of our online catalog and databases because these require minimal effort and no additional cost.
we present ideas for establishing an implementation team and advice for academic libraries who wish to “go mobile.” we aim to have a concrete plan for the work that will be required to optimize the library’s website for mobile access by the fall of 2011. a significant step is hiring a digital services librarian to work closely with the webmaster, electronic resources librarian, and others interested in promoting access to resources and services via mobile devices. our vision is to be on track to offer an augmented-reality experience to our patrons as the 2010 horizon report indicates will be an important trend in the next two to three years. we aim to create an environment in which students can use their mobile device to gain entry to a new layer of digital information, enhancing their experience in the physical library. information technology and libraries | june 2012 97 references 1. clifton dale foster, “pdas and the library without a roof,” journal of computing in higher education 7, no. 1 (1995): 85–93. 2. russell smith, “adapting a new technology to the academic medical library: personal digital assistants,” journal of the medical library association 90, no. 1 (2002): 93–94; colleen cuddy, using pdas in libraries: a how-to-do-it manual (new york: neal-schuman publishers, 2005). 3. andrea jackson, “wireless technology poised to transform health care,” rady business journal 3, no. 1 (2010): 24–26. 4. alan w. aldrich, “universities and libraries move to the mobile web,” educause quarterly 33, no. 2 (2010), www.educause.edu/educause+quarterly/educausequarterlymagazinevolum/univers itiesandlibrariesmoveto/206531 (accessed mar. 30, 2011). 5. larry johnson, alan levine, r. smith, and s. stone, the 2010 horizon report (austin, tx: the new media consortium, 2010), www.nmc.org/pdf/2010-horizon-report.pdf (accessed mar. 31, 2011). 6. shannon d. smith and judith borreson caruso, with an introduction by joshua kim, the ecar study of undergraduate students and information technology, 2010 (research study, vol. 6) (boulder, co: educause center for applied research, 2010), www.educause.edu/ecar (accessed mar. 31, 2011). 7. smith and caruso, the ecar study of undergraduate students and information technology, 2010. 8. laurie bridges et al., “making the case for a fully mobile library web site: from floor maps to the catalog,” reference services review 38, no. 2 (2010): 309–20. 9. joel cummings, alex merrill, and steve borrelli, “the use of handheld mobile devices: their impact and implications for library services,” library hi tech 28, no. 1 (2009): 22– 40. 10. rubicon consulting, the apple iphone: success and challenges for the mobile industry (los gatos, ca: rubicon consulting, 2008), http://rubiconconsulting.com/downloads/whitepapers/rubicon-iphone_user_survey.pdf (accessed mar. 31, 2011). 11. sally wilson and graham mccarthy, “the mobile university: from the library to the campus,” reference services review 38, no. 2 (2010): 215. 
12. ibid., 216. 13. ibid., 223. 14. jamie seeholzer and joseph a. salem, "library on the go: a focus group study of the mobile web and the academic library," college and research libraries 72, no. 1 (2011): 9–20. 15. todd spires, "handheld librarians: a survey of librarian and library patron use of wireless handheld devices," internet reference services quarterly 13, no. 4 (2008): 287–309. 16. michael m. smith and barbara a. pietraszewski, "enabling the roving reference librarian: wireless access with tablet pcs," reference services review 32, no. 3 (2004): 249–55. 17. kathryn zickuhr, generations and their gadgets (washington, d.c.: pew internet & american life project, 2011), http://pewinternet.org/reports/2011/generations-and-gadgets.aspx (accessed mar. 31, 2011). 18. smith and pietraszewski, "enabling the roving reference librarian," 253. 19. lisa carlucci thomas, "gone mobile: mobile catalogs, sms reference, and qr codes are on the rise—how are libraries adapting to mobile culture?" library journal 135, no. 17 (2010): 30–34. 20. smith and caruso, the ecar study of undergraduate students and information technology, 2010. 21. ibid. 22. carolyn klatt, "going mobile: free and easy," medical reference services quarterly 30, no. 1 (2011): 56–73. 23. kim griggs, laurie m. bridges, and hannah gascho rempel, "library/mobile: tips on designing and developing mobile web sites," code4lib 8, november 23, 2009, http://journal.code4lib.org/articles/2055 (accessed mar. 30, 2011). 24. godmar back and a. bailey, "web services and widgets for library information systems," information technology & libraries 29, no. 2 (2010): 76–86. 25. mark andy west, arthur w. hafner, and bradley d. faust, "communications—expanding access to library collections and services using small-screen devices," information technology & libraries 25, no. 2 (2006): 103. 26. courtney greene, missy roser, and elizabeth ruane, the anywhere library: a primer for the mobile web (chicago: association of college and research libraries, 2010). 27. cody w. hanson, "libraries and the mobile web," library technology reports 42, no. 2 (february/march 2011). 28. griggs, bridges, and gascho rempel, "library/mobile." 29. samuel liston, "opacs and the mobile," computers in libraries 29, no. 5 (2009): 6–47. 30. r. bruce jensen, "optimizing library content for mobile phones," library hi tech news 27, no. 2 (2010): 6–9. 31. griggs, bridges, and gascho rempel, "library/mobile."
“mobile web best practices 1.0,” worldwide web consortium (w3c), www.w3.org/tr/mobile-bp (accessed mar. 30, 2011). 33. wilson and mccarthy, “the mobile university.” 34. timothy vollmer, there’s an app for that! libraries and mobile technology: an introduction to public policy considerations (policy brief no. 3) (washington, d.c.: ala office for information technology policy, 2010), www.ala.org/ala/aboutala/offices/oitp/publications/policybriefs/mobiledevices.pdf (accessed mar. 31, 2011). 35. josh hadro, “harpercollins puts 26 loan cap on ebook circulations,” library journal, february 25, 2011, www.libraryjournal.com/lj/home/889452264/harpercollins_puts_26_loan_cap.html.csp (accessed mar. 31, 2011). 36. “m-libraries: library success: a best practices wiki,” www.libsuccess.org/index.php?title=m-libraries, (accessed mar. 31, 2011). file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.ala.org/ala/aboutala/offices/oitp/publications/policybriefs/mobiledevices.pdf file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.ala.org/ala/aboutala/offices/oitp/publications/policybriefs/mobiledevices.pdf file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.libsuccess.org/index.php%3ftitle=m-libraries, mobile technologies & academics | dresselhaus and shrode 100 appendix a. student survey questions 1. type of student? 2. age? 3. gender? 4. what is your college? 5. how often do you use the following electronic resources provided by your library? 6. do you use any of the following devices? 7. do you use your mobile device or phone for academic purposes (e.g., blackboard, electronic course reserves, etc.)? 8. please list what you use your device to do? 9. have you ever used a text message to get help using the library? 10. have you ever used instant messaging to get help using the library? 11. if library resources were easily accessible on your mobile devices and if you had such a device, how likely would you be to use any of the following for assignments or research? 12. what mobile services would you like the library to offer? 13. comments? information technology and libraries | june 2012 101 appendix b. librarian survey questions 1. type of library? 2. your job/role in the library? 3. years working in libraries? 4. does your library offer mobile device applications for the following electronic resources? 5. who in your library or on your campus is responsible for implementing or developing mobile device applications? 6. how long has your library provided access via mobile devices to electronic resources or services? 7. if you collect use data for library electronic resources, are patrons using the mobile device applications your library provides? 8. what mobile services do you believe libraries should offer? 9. comments? 26 information technology and libraries | september 2007 author id box for 2 column layout wikis in libraries matthew m. bejune wikis have recently been adopted to support a variety of collaborative activities within libraries. this article and its companion wiki, librarywikis (http://librarywikis. pbwiki.com/), seek to document the phenomenon of wikis in libraries. this subject is considered within the framework of computer-supported cooperative work (cscw). 
the author identified thirty-three library wikis and developed a classification schema with four categories: (1) collaboration among libraries (45.7 percent); (2) collaboration among library staff (31.4 percent); (3) collaboration among library staff and patrons (14.3 percent); and (4) collaboration among patrons (8.6 percent). examples of library wikis are presented within the article, as is a discussion for why wikis are primarily utilized within categories i and ii and not within categories iii and iv. it is clear that wikis have great utility within libraries, and the author urges further application of wikis in libraries. i n recent years, the popularity of wikis has skyrocketed. wikis were invented in the mid­1990s to help facilitate the exchange of ideas between computer programmers. the use of wikis has gone far beyond the domain of com­ puter programming, and now it seems as if every google search contains a wikipedia entry. wikis have entered into the public consciousness. so, too, have wikis entered into the domain of professional library practice. the purpose of this research is to document how wikis are used in librar­ ies. in conjunction with this article, the author has created librarywikis (http://librarywikis.pbwiki.com/), a wiki to which readers can submit additional examples of wikis used in libraries. the article will proceed in three sections. the first section is a literature review that defines wikis and introduces computer­supported cooperative work (cscw) as a context for understanding wikis. the second section documents the author’s research and presents a schema for classifying wikis used in libraries. the third section considers the implications of the research results. ■ literature review what’s a wiki? wikipedia (2007a) defines a wiki as: a type of web site that allows the visitors to add, remove, edit, and change some content, typically with­ out the need for registration. it also allows for linking among any number of pages. this ease of interaction and operation makes a wiki an effective tool for mass collaborative authoring. wikis have been around since the mid­1990s, though it is only recently that they have become ubiquitous. in 1995, ward cunningham launched the first wiki, wikiwikiweb (http://c2.com/cgi/wiki), which is still active today, to facilitate the exchange of ideas among computer program­ mers (wikipedia 2007b). the launch of wikiwikiweb was a departure from the existing model of web communica­ tion ,where there was a clear divide between authors and readers. wikiwikiweb elevated the status of readers, if they so chose, to that of content writers and editors. this model proved popular, and the wiki technology used on wikiwikiweb was soon ported to other online communi­ ties, the most famous example being wikipedia. on january 15, 2001, wikipedia was launched by larry sanger and jimmy wales as a complementary project for the now­defunct nupedia encyclopedia. nupedia was a free, online encyclopedia with articles written by experts and reviewed by editors. wikipedia was designed as a feeder project to solicit new articles for nupedia that were not submitted by experts. the two services coexisted for some time, but in 2003 the nupedia servers were shut down. since its launch, wikipedia has undergone rapid growth. at the close of 2001, wikipedia’s first year of operation, there were 20,000 articles in eighteen language editions. 
as of this writing, there are approximately seven million articles in 251 languages, fourteen of which have more than 100,000 articles each. as a sign of wikipedia's growth, when this manuscript was first submitted four months earlier, there were more than five million articles in 250 languages. author's note: sources in the previous two paragraphs come from wikipedia. the author acknowledges the concerns within the academy regarding the practice of citing wikipedia within scholarly works; however, it was decided that wikipedia is arguably an authoritative source on wikis and itself. nevertheless, the author notes that there were changes—insubstantial ones—to the cited wikipedia entries between when the manuscript was first submitted and when it was revised four months later.
wikis and cscw
wikis facilitate collaborative authoring and can be considered one of the technologies studied under the domain of cscw. in this section, cscw is explained and it is shown how wikis fit within this framework. cscw is an area of computer science research that considers the application of computer technology to support cooperative, also referred to as collaborative, work. the term was first coined in 1984 by irene greif (1988) and paul cashman to describe a workshop they were planning on the support of people in work environments with computers. over the years there have been a number of review articles that describe cscw in greater detail, including bannon and schmidt (1991), rodden (1991), schmidt and bannon (1992), sachs (1995), dourish (2001), ackerman (2002), olson and olson (2002), dix, finlay, abowd, and beale (2004), and shneiderman and plaisant (2005). publication in the field of cscw primarily occurs through conferences. the first conference on cscw was held in 1986 in austin, texas. since then, the conference has been held biennially in the united states. proceedings are published by the association for computing machinery (acm, http://www.acm.org/). in 1991, the first european conference on computer supported cooperative work (ecscw) was held in amsterdam. ecscw also is held biennially, in odd-numbered years. ecscw proceedings are published by springer (http://www.ecscw.uni-siegen.de/). the primary journal for cscw is computer supported cooperative work: the journal of collaborative computing. papers also appear within publications of the acm and chi, the conference on human factors in computing.
cscw and libraries
as libraries are, by nature, collaborative work environments—library staff working together and with patrons—and as digital libraries and computer technologies become increasingly prevalent, there is a natural fit between cscw and libraries. the following researchers have applied cscw to libraries. twidale et al. (1997) published a report sponsored by the british library research and innovation centre that examined the role of collaboration in the information-searching process to inform how information systems design could better address and support collaborative activity. twidale and nichols (1998) offered ethnographic research of physical collaborative environments—in a university library and an office—to aid the design of digital libraries.
they wrote two reviews of cscw as applied to libraries—the first was more comprehensive (twidale and nichols 1998) than the second (twidale and nichols 1999). sánchez (2001) discussed collaborative environments designed and prototyped for digital library environments.

classification of collaboration

technologies that facilitate collaborative work are typically classified within cscw across two continua: synchronous versus asynchronous, and co-located versus remote. if put together in a two-by-two matrix, there are four possibilities: (1) synchronous and co-located (same time, same place); (2) synchronous and remote (same time, different place); (3) asynchronous and remote (different time, different place); and (4) asynchronous and co-located (different time, same place). this classification schema was first proposed by johansen et al. (1988). nichols and twidale (1999) mapped work applications within the realm of cscw in figure 1. wikis are not present in the figure, but their absence is not an indication that they are not cooperative work technologies. rather, wikis were not yet widely in use at the time cscw was considered by nichols and twidale. the author has added wikis to nichols and twidale's graphical representation in figure 2. interestingly, wikis are border-crossers fitting within two quadrants: the upper right—asynchronous and co-located; and the lower right—asynchronous and remote. wikis are asynchronous in that they do not require people to be working together at the same time. they are both co-located and remote in that people working collaboratively may or may not be working in the same place. it is also interesting to note that library technologies also can be mapped using johansen's schema. nichols and twidale (1999) also mapped this, and figure 3 illustrates the variety of collaborative work that goes on within libraries.

figure 1. classification of cscw applications
figure 2. classification of cscw applications including wikis
figure 3. classification of collaborative work within libraries
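to make the time/place matrix concrete, the following minimal sketch—not part of the original study—models johansen's four quadrants as a small python data structure and records wikis as occupying both asynchronous cells, mirroring the author's addition in figure 2. the application names and their mapping are illustrative assumptions drawn from the figures described above, not an exhaustive inventory.

# minimal sketch (not from the article) of johansen's time/place matrix.
# each cscw application is tagged with the quadrants it occupies; wikis span
# the two asynchronous cells, as in the author's revised figure 2.
from itertools import product

TIMES = ("synchronous", "asynchronous")
PLACES = ("co-located", "remote")

# hypothetical mapping of example applications to (time, place) quadrants
applications = {
    "meeting rooms": {("synchronous", "co-located")},
    "video conferencing": {("synchronous", "remote")},
    "organizational memory": {("asynchronous", "co-located")},
    "web-based applications": {("asynchronous", "remote")},
    "wikis": {("asynchronous", "co-located"), ("asynchronous", "remote")},
}

def quadrant_contents(time, place):
    """return the applications mapped to a given cell of the matrix."""
    return sorted(name for name, cells in applications.items()
                  if (time, place) in cells)

if __name__ == "__main__":
    for time, place in product(TIMES, PLACES):
        members = ", ".join(quadrant_contents(time, place)) or "(none listed)"
        print(f"{time:>12} / {place:<10}: {members}")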
■ method

in order to discover the widest variety of wikis used in libraries, the author searched for examples of wikis used in libraries within three areas—the lis literature, the library success wiki, and within messages posted on three professional electronic discussion lists. when examples were found, they were logged and classified according to a schema created by the author. results are presented in the next section.

the first area searched was within the lis literature. the author utilized the wilson library literature and information science database. there were two main types of articles: ones that argued for the use of wikis in libraries, and ones that were case studies of wikis that had been implemented.

the second area searched was within library success: a best practices wiki (http://www.libsuccess.org/) (see figure 4), created by meredith farkas, distance learning librarian at norwich university. as the name implies, it is a place for people within the library community to share their success stories. posting to the wiki is open to the public, though registration is encouraged. there are many subject areas on the wiki, including management and leadership, readers' advisory, reference services, information literacy, and so on. there also is a section about collaborative tools in libraries (http://www.libsuccess.org/index.php?title=collaborative_tools_in_libraries), in which examples of wikis in libraries are presented. within this section there is a presentation about wikis made by farkas (2006) titled wiki world (http://www.libsuccess.org/index.php?title=wiki_world), from which examples were culled.

figure 4. library success: a best practices wiki (http://www.libsuccess.org/)

the third area that was searched was professional electronic discussion list messages from web4lib, dig_ref, and libref-l. the web4lib electronic discussion list (tennant 2005) is "for the discussion of issues relating to the creation, management, and support of library-based world wide web servers, services, and applications." the list is moderated by roy tennant and the web4lib advisory board and was started in 1994. the dig_ref electronic discussion list is a forum for "people and organizations answering the questions of users via the internet" (webjunction n.d.). the list is hosted by the information institute of syracuse, school of information studies, syracuse university, and was created in 1998. the libref-l electronic discussion list is "a moderated discussion of issues related to reference librarianship" (balraj 2005). established in 1990, it is operated out of kent state university and moderated by a group of list owners. these three electronic discussion lists were selected for two reasons. first, the author is a subscriber to each electronic discussion list and, prior to the research, had noted messages about wikis in libraries. second, based on the descriptions of each electronic discussion list stated above, the selected lists reasonably covered the discussion of wikis in libraries within the professional library electronic discussion lists.

one year of messages, november 15, 2005, through november 14, 2006, was analyzed for each list. messages about wikis in libraries were identified through keyword searches against the author's personal archive of electronic discussion list messages collected over the years. an alternative method would have been to search the web archive of each list, but the author found it easier to search within his mail client, microsoft outlook. the word "wiki" was found in 513 messages: 354 in web4lib, 91 in dig_ref, and 68 in libref-l. this approach had high recall, as discourse about wikis frequently included the use of the word "wiki," though low precision, as there were many results that were not about wikis used in libraries. common false hits included messages about the nature study (giles 2005) that compared wikipedia to encyclopedia britannica, and messages that included the word "wiki" in passing but did not describe wikis used within libraries. from the list of 513 messages, the author read each message and came up with a much shorter list of thirty-nine messages about wikis in libraries: thirty-two in web4lib, three in dig_ref, and four in libref-l.
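the winnowing step just described—a high-recall keyword pass followed by manual review for precision—could be approximated as in the following minimal sketch. this is not the author's actual outlook-based workflow; the directory layout (one plain-text message per file, under a folder per list) and names are assumptions made for illustration.

# minimal sketch of the first, high-recall pass: flag every archived message
# that mentions "wiki", grouped by list. the file layout is hypothetical;
# the author actually searched a personal microsoft outlook archive.
from pathlib import Path

ARCHIVE = Path("discussion-archive")          # hypothetical local archive
LISTS = ("web4lib", "dig_ref", "libref-l")    # the three lists studied

def messages_mentioning(keyword: str, list_name: str):
    """yield paths of messages on one list that contain the keyword."""
    for msg in (ARCHIVE / list_name).glob("*.txt"):
        if keyword.lower() in msg.read_text(errors="ignore").lower():
            yield msg

if __name__ == "__main__":
    for name in LISTS:
        hits = list(messages_mentioning("wiki", name))
        print(f"{name}: {len(hits)} candidate messages")
    # the second, high-precision pass—deciding which candidates actually
    # describe wikis used in libraries—was done by reading each message.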
■ results

classification of the results

after all wiki examples had been collected, it became clear that there was a way to classify the results. in farkas's (2006) presentation about wikis, she organized wikis in two categories: (1) how libraries can use wikis with their patrons; and (2) how libraries can use wikis for knowledge sharing and collaboration. this schema, while it accounts for two types of collaboration, is not granular enough to represent the types of collaboration found within the wiki examples identified. as such, it became clear that another schema was needed. twidale and nichols (1998) identified three types of collaboration within libraries: (1) collaboration among library staff; (2) collaboration between a patron and a member of staff; and (3) collaboration among library users. their classification schema mapped well to the examples of wikis that were identified; however, it too was not granular enough, as it did not distinguish intraorganizational from extraorganizational collaboration among library staff, the two most common types of wiki usage found in the research (see appendix). to account for these types of collaboration, which are common not only to wiki use in libraries but to all professional library practice, the author modified twidale and nichols's schema (see figure 6). the improved schema also uniformly represents entities across the categories—library staff and member of staff are referred to as "library staff," and patrons and library users are referred to as "patrons." examples of wikis used in libraries for each category are provided to better illustrate the proposed classification schema.

figure 6. four types of collaboration within libraries: (1) collaboration among libraries (extra-organizational); (2) collaboration among library staff (intra-organizational); (3) collaboration among library staff and patrons; (4) collaboration among patrons

figure 5. wiki world (http://www.libsuccess.org/index.php?title=wiki_world)

■ collaboration among libraries

the library instruction wiki (http://instructionwiki.org/main_page) is an example of a wiki that is used for collaboration among libraries (figure 7). it appears as though the wiki was originally set up to support library instruction within oregon—it is unclear if this was associated with a particular type of library, say academic or public—but now the wiki supports library instruction in general. the wiki is self-described as: a collaboratively developed resource for librarians involved with or interested in instruction. all librarians and others interested in library instruction are welcome and encouraged to contribute. the tagline for the wiki is "stop reinventing the wheel" (library instruction wiki 2006).
from this wiki, there is a list of library instruction resources that include the following: handouts, tutorials, and other resources to share; teaching techniques, tips, and tricks; class-specific web sites and handouts; glossary and encyclopedia; bibliography and suggested reading; and instruction-related projects, brainstorms, and documents. within the handouts, tutorials, and other resources to share section, the author found a wide variety of resources from libraries across the country. similarly, there were a number of suggestions to be found under the teaching techniques, tips, and tricks section. another example of a wiki used for collaboration among libraries is the library success wiki (http://www.libsuccess.org/), one of the sources of examples of wikis used in this research. adding to earlier descriptions of this wiki as presented in this paper, library success seems to be one of the most frequently updated library wikis and perhaps the most comprehensive in its coverage of library topics.

■ collaboration among library staff

the university of connecticut libraries' staff wiki (http://wiki.lib.uconn.edu/) is an example of a wiki used for collaboration among library staff (figure 8). this wiki is a knowledge base containing more than one thousand information technology services (its) documents. its documents support the information technology needs of the library organization. examples include answers to commonly asked questions, user manuals, and instructions for a variety of computer operations. in addition to being a repository of its documents, the wiki also serves as a portal to other wikis within the university of connecticut libraries. there are many other wikis connected to library units; teams; software applications, such as the libraries' ils; libraries within the university of connecticut libraries; and other university of connecticut campuses. the health sciences library knowledge base, stony brook university (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome), is another example of a wiki that is used for collaboration among library staff (figure 9). the wiki is described as "a space for the dynamic collaboration of the library staff, and a platform of shared resources" (health sciences library 2007). on the wiki there are the following content areas: news and announcements; hsl departments; projects; troubleshooting; staff training resources, working papers and support materials; and community activities, scholarship, conferences, and publications.

■ collaboration among library staff and patrons

there are only a few examples of wikis used for collaboration among library staff and patrons to cite as exemplars. one example is the st. joseph county public library (sjcpl) subject guides (http://www.libraryforlife.org/subjectguides/index.php/main_page), seen in figure 10. this wiki is a collection of resources and services in print and electronic formats to assist library patrons with subject area searching. as the wiki is published by library staff for public consumption, it has more of a professional feel than wikis from the first two categories. pages have images, and the content is structured to look like a standard web page. though the wiki looks like a web page, there still remain a number of edit links that follow each section of text on the wiki.
while these tags bear importance for those editing the wiki—library staff only in this case—they undoubtedly puzzle library patrons who think that they have the ability to edit the wiki when, in fact, they do not.

figure 7. library instruction wiki (http://instructionwiki.org/)
figure 8. the university of connecticut libraries' staff wiki (http://wiki.lib.uconn.edu/)

another example of collaboration between library staff and patrons that takes a similar approach is the usc aiken gregg-graniteville library web site (http://library.usca.edu/) in figure 11. as with the sjcpl subject guides, this wiki looks more like a web site than a wiki. in fact, the usc aiken wiki conceals its true identity as a wiki even more so than the sjcpl subject guides. the only evidence that the web site is a wiki is a link at the bottom of each page that says "powered by pmwiki." pmwiki (http://pmwiki.org/) is a content management system that utilizes the wiki technology on the back end to manage a web site while retaining the look and feel of a standard web site. it seems that the benefits of using a wiki in such a way are shared content creation and management.

■ collaboration among patrons

as there are only three examples of wikis used for collaboration among patrons, all examples will be highlighted in this section. the first example is wiki worldcat (http://www.oclc.org/productworks/wcwiki.htm), sponsored by oclc. wiki worldcat launched as a pilot project in september 2005. the service allows users of open worldcat, oclc's web version of worldcat, to add book reviews to item records. though this wiki does not have many book reviews in it, even for contemporary bestsellers, it gives a taste of how a wiki could be used to facilitate collaboration among patrons. a second example is the biz wiki from ohio university libraries (http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page) (see figure 12). the biz wiki is a collection of business information resources available through ohio university. the wiki was created by chad boeninger, reference and instruction librarian, as an alternate form of a subject guide or pathfinder. what separates this wiki from those in the third category, collaboration among library staff and patrons, is that the wiki is editable by patrons as well as librarians. similarly, butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref) is a wiki that has reviews of reference resources created by butler librarians, faculty, staff, and students (see figure 13).

figure 9. health sciences library knowledge base (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome)
figure 10. sjcpl subject guides (http://www.libraryforlife.org/subjectguides/index.php/main_page)
figure 11. usc aiken gregg-graniteville library (http://library.usca.edu/)
figure 12. ohio university libraries biz wiki (http://www.library.ohiou.edu/subjects/bizwiki)
figure 13. butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref)

full results

thirty-three wikis were identified. two wikis were classified in two categories each. the full results are available in the appendix. table 1 illustrates how wikis were not uniformly distributed across the four categories: category i had 45.7 percent, category ii had 31.4 percent, category iii had 14.3 percent, and category iv had 8.6 percent. nearly 80 percent of all examples were found within categories i and ii. as seen in some of the examples in the previous section, wikis were utilized for a variety of purposes.
here is a short list of purposes for which wikis were utilized: sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. wiki software utilization is summarized in tables 2 and 3. mediawiki is the most popular software utilized by libraries (33.3 percent), followed by unknown (30.3 percent), pbwiki (12.1 percent), pmwiki (12.1 percent), seedwiki (6.1 percent), twiki (3 percent), and xwiki (3 percent). if the values for unknown are removed from the totals (table 3), mediawiki is utilized in almost half (47.8 percent) of all library wiki applications.

table 1. classification summary
category | no. | %
i: collaboration among libraries | 16 | 45.7
ii: collaboration among library staff | 11 | 31.4
iii: collaboration among library staff and patrons | 5 | 14.3
iv: collaboration among patrons | 3 | 8.6
total | 35 | 100.0

table 2. software totals
wiki software | no. | %
mediawiki | 11 | 33.3
unknown | 10 | 30.3
pbwiki | 4 | 12.1
pmwiki | 4 | 12.1
seedwiki | 2 | 6.1
twiki | 1 | 3.0
xwiki | 1 | 3.0
total | 33 | 100.0

table 3. software totals without unknowns
wiki software | no. | %
mediawiki | 11 | 47.8
pbwiki | 4 | 17.4
pmwiki | 4 | 17.4
seedwiki | 2 | 8.7
twiki | 1 | 4.3
xwiki | 1 | 4.3
total | 23 | 100.0
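as a worked check of the percentages reported in tables 2 and 3, the shares can be recomputed directly from the raw counts. the minimal sketch below does exactly that; the counts are taken from the tables above, and everything else (names, structure) is illustrative rather than part of the original study.

# recompute the software shares reported in tables 2 and 3 from the raw counts
software_counts = {
    "mediawiki": 11, "unknown": 10, "pbwiki": 4, "pmwiki": 4,
    "seedwiki": 2, "twiki": 1, "xwiki": 1,
}

def shares(counts):
    """return each software's percentage share of the total, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

if __name__ == "__main__":
    print("all 33 wikis:", shares(software_counts))        # mediawiki -> 33.3
    known = {k: v for k, v in software_counts.items() if k != "unknown"}
    print("known software only:", shares(known))           # mediawiki -> 47.8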
■ discussion

with a wealth of examples of wikis in categories i and ii and a dearth of examples of wikis in categories iii and iv, the library community seems to be more comfortable using wikis to collaborate within the community, but less comfortable using wikis to collaborate with library patrons or to enable collaboration among patrons. the research results pose the questions: why are wikis predominantly used for collaboration within the library community? and why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another?

why are wikis predominantly used for collaboration within the library community?

this is perhaps the easier of the two questions to explain. there is a long legacy of cooperation and collaboration intraorganizationally and extraorganizationally within libraries. one explanation for this is the shared budgetary climate within libraries. all too often there are insufficient money, staff, and resources to offer desired levels of service. librarians work together to overcome these barriers. prominent examples include cooperative cataloging, interlibrary lending, and the formation of consortia to negotiate pricing. another explanation can be found in the personal characteristics of library professionals. librarianship is a service profession that consequently attracts service-minded individuals who are interested in helping others, whether they are library patrons or fellow colleagues. a third reason is the role of library associations, such as the international federation of library associations and institutions, the american library association, the special libraries association, and the medical library association, as well as many others at the international, national, state, and local levels, and the work that is done through these associations at annual conferences and throughout the year. libraries use wikis to collaborate intraorganizationally and extraorganizationally because collaboration is what they do most naturally.

why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another?

the reasons why libraries are only minimally using wikis to collaborate with patrons and for patron collaboration are more difficult to ascertain. however, given the untapped potential of wikis, the proposed answers to this question are more important and may lead to future implementations of wikis in libraries. here are four possible explanations, some more speculative than others.

first, perhaps one of the reasons is the result of the way in which libraries are conceived by library patrons and librarians alike. a strong case can be made for libraries as places of collaborative work, and the author takes this position. however, historically libraries have been repositories of information, and this remains a pervasive and difficult concept to change—libraries are frequently seen simply as places to get books. in this scenario, the librarian is a gatekeeper that a patron interacts with to get a book—that is, if the patron interacts with a librarian at all. it also is worth noting that the relationship is one-way—the patron needs the assistance of the librarian, but not the other way around. viewed in these terms, this is not a collaborative situation. for libraries to use wikis for the purpose of collaborating with library patrons, it might demand the reconceptualization of libraries by library patrons and librarians. similarly, this extreme conceptualization of libraries does not consider patrons working with one another, even though it is an activity that occurs formally and informally within libraries, not to mention with the emergence of interdisciplinary and multidisciplinary work. if wikis are to be used to facilitate collaboration between patrons, the conceptualization of the library by library patrons and librarians must be expanded.

second, there may be fears within the library community about authority, responsibility, and liability. libraries have long held the responsibility of ensuring the authority of the bibliographic catalog. if patrons are allowed to edit the library wiki, there is potential for negatively affecting the authority of the wiki and even the perceived authority of the library. likewise, there is potential liability in allowing patrons to post to the library wiki. similar concerns have been raised in the past about other collaborative technologies, such as blogs, bulletin boards, mailing lists, and so on, all aspects of the library 2.0 movement. if libraries are fully to realize library 2.0 as described by casey and savastinuk (2006), miller (2006), and courtney (2007), these issues must be considered.

third, perhaps it is due to a matter of fit. it might be the case that wikis are utilized in categories i and ii and not within categories iii and iv because the tools are better suited to support the types of activities within categories i and ii.
consider some of the activities listed earlier: supporting association work, collecting software documentation, supporting conferences, creating digital repositories, creating intranets, and creating knowledge bases. each of these illustrates a wiki that is utilized for the creation of a resource with multiple authors and readers, tasks that are well suited to wikis. wikipedia is a great example of a wiki with clear, shared tasks for multiple authors and multiple readers and a sense of persistence over time. in contrast, relationships between library staff and patrons do not typically lead to the shared creation of resources. while it is true that the relationship between patron and librarian in the context of a patron's research assignment can be collaborative depending on the circumstances, authorship is not shared but is possessed by the patron. in addition, research assignments in the context of undergraduate coursework are short-lived and seldom go beyond the confines of a particular course. in terms of patrons working together with other patrons, there is the precedent of group work; however, groups often produce projects or papers that share the characteristics of nongroup research assignments listed above. this, of course, does not mean that wikis are not suitable for collaboration within categories iii and iv, but perhaps the opportunities for collaboration are fewer, or they stretch the imagination in terms of the types and ways of doing collaborative work.

fourth, perhaps it is a matter of "not yet." while the research has shown that libraries are not utilizing wikis in categories iii and iv, this may be because it is too soon. it should be noted that wikis are still new technologies. it might be the case that librarians are experimenting in safer contexts so they will gain experience prior to trying more public projects where their expertise will be needed. if this explanation is true, it is expected that more examples of wikis in libraries will soon emerge. as they do, the author hopes that all examples of wikis in libraries, new and old, will be added to the companion wiki to this article, librarywikis (http://librarywikis.pbwiki.com/).

■ conclusion

it appears that wikis are here to stay, and that their utilization within libraries is only just beginning. this article documented the current practice of wikis used in libraries using cscw as a framework for discussion. the author located examples of wikis in three places: within the lis literature, on the library success wiki, and within messages from three professional electronic discussion lists. thirty-three examples of wikis were identified and classified using a classification schema created by the author. the schema has four categories: (1) collaboration among libraries; (2) collaboration among library staff; (3) collaboration among library staff and patrons; and (4) collaboration among patrons. wikis were used for a variety of purposes, including sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews.
by and large, wikis were primarily used to support collaboration among library staff intraorganizationally and extraorganizationally, with nearly 80 percent (45.7 percent and 31.4 percent respectively) of the examples so identified, and less so in the support of collaboration among library staff and patrons (14.3 percent) and collaboration among patrons (8.6 percent). mediawiki was the most commonly used software, accounting for almost half (47.8 percent) of the examples for which the software could be identified. it is clear that there are plenty of examples of wikis utilized in libraries, with more to be found each day. the profession is now faced with extending the use of this technology, and it remains for the future to show how wikis will continue to be used within libraries.

works cited

ackerman, mark s. 2002. the intellectual challenge of cscw: the gap between social requirements and technical feasibility. in human-computer interaction in the new millennium, ed. john m. carroll, 179–203. new york: addison-wesley.
balraj, leela, et al. 2005. libref-l. kent state university libraries. http://www.library.kent.edu/page/10391 (accessed june 12, 2007). archive is available at this link as well.
bannon, liam j., and kjeld schmidt. 1991. cscw: four characters in search of a context. in studies in computer supported cooperative work, ed. john m. bowers and steven d. benford, 3–16. amsterdam: elsevier.
casey, michael e., and laura c. savastinuk. 2006. library 2.0. library journal 131, no. 14: 40–42. http://www.libraryjournal.com/article/ca6365200.html (accessed june 12, 2007).
courtney, nancy. 2007. library 2.0 and beyond: innovative technologies and tomorrow's user (in press). westport, conn.: libraries unlimited.
dix, alan, et al. 2004. socio-organizational issues and stakeholder requirements. in human computer interaction, 3rd ed., 450–74. upper saddle river, n.j.: prentice hall.
dourish, paul. 2001. social computing. in where the action is: the foundations of embodied interaction, 55–97. cambridge, mass.: mit press.
farkas, meredith. 2006. wiki world. http://www.libsuccess.org/index.php?title=wiki_world (accessed june 12, 2007).
giles, jim. 2005. internet encyclopaedias go head to head. nature 438: 900–01. http://www.nature.com/nature/journal/v438/n7070/full/438900a.html (accessed june 12, 2007).
greif, irene, ed. 1988. computer supported cooperative work: a book of readings. san mateo, calif.: morgan kaufmann publishers.
health sciences library, state university of new york, stony brook. 2007. health sciences library knowledge base. http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome (accessed june 12, 2007).
johansen, robert, et al. 1988. groupware: computer support for business teams. new york: free press.
library instruction wiki. 2006. http://instructionwiki.org/main_page (accessed june 12, 2007).
miller, paul. 2006. coming together around library 2.0. d-lib magazine 12, no. 4. http://www.dlib.org/dlib/april06/miller/04miller.html (accessed june 12, 2007).
nichols, david m., and michael b. twidale. 1999. computer supported cooperative work and libraries. vine 109: 10–15. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/vine.html (accessed june 12, 2007).
olson, gary m., and judith s. olson. 2002. groupware and computer-supported cooperative work. in the human-computer interaction handbook: fundamentals, evolving technologies and emerging applications, ed. julie a. jacko and andrew sears, 583–95.
mahwah, n.j.: lawrence erlbaum associates, inc.
rodden, tom t. 1991. a survey of cscw systems. interacting with computers 3, no. 3: 319–54.
sachs, patricia. 1995. transforming work: collaboration, learning, and design. communications of the acm 38: 227–49.
sánchez, j. alfredo. 2001. hci and cscw in the context of digital libraries. in chi '01 extended abstracts on human factors in computing systems. conference on human factors in computing systems, seattle, wash., mar. 31–apr. 5, 2001.
schmidt, kjeld, and liam j. bannon. 1992. taking cscw seriously: supporting articulation work. computer supported cooperative work 1, no. 1/2: 7–40.
shneiderman, ben, and catherine plaisant. 2005. collaboration. in designing the user interface: strategies for effective human-computer interaction, 4th ed., 408–50. reading, mass.: addison wesley.
tennant, roy. 2005. web4lib electronic discussion. webjunction.org. http://lists.webjunction.org/web4lib/ (accessed june 12, 2007). archive is available at this link as well.
twidale, michael b., et al. 1997. collaboration in physical and digital libraries. report no. 64, british library research and innovation centre. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/bl/report/ (accessed june 12, 2007).
twidale, michael b., and david m. nichols. 1998a. using studies of collaborative activity in physical environments to inform the design of digital libraries. technical report cseg/11/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/cscw98.html (accessed june 12, 2007).
twidale, michael b., and david m. nichols. 1998b. a survey of applications of cscw for digital libraries. technical report cseg/4/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/survey.html (accessed june 12, 2007).
webjunction. n.d. dig_ref electronic discussion list. http://www.vrd.org/dig_ref/dig_ref.shtml (accessed june 12, 2007).
wikipedia. 2007a. wiki. http://en.wikipedia.org/wiki/wiki (accessed april 29, 2007).
wikipedia. 2007b. wikiwikiweb. http://en.wikipedia.org/wiki/wikiwikiweb (accessed april 29, 2007).

appendix. wikis in libraries
key: i = collaboration among libraries; ii = collaboration among library staff; iii = collaboration among library staff and patrons; iv = collaboration among patrons

category | description | location | wiki software
i | library success: a best practices wiki—a wiki capturing library success stories. covers a wide variety of topics. also features a presentation about wikis (http://www.libsuccess.org/index.php?title=wiki_world) | http://www.libsuccess.org/ | mediawiki
i | wiki for school library association in alaska | http://akasl.pbwiki.com/ | pbwiki
i | wiki to support reserves direct, free, open-source software for managing academic reserves materials developed by emory university | http://www.reservesdirect.org/wiki/index.php/main_page | mediawiki
i | sunyla new tech wiki—a place for state university of new york (suny) librarians to share how they are using information technologies to interact with patrons | http://sunylanewtechwiki.pbwiki.com/ | pbwiki
i | wiki for librarians and faculty members to collaborate across campuses; being used with distance learning instructors and small groups | message from robin shapiro on [dig_ref] electronic discussion list dated 10/18/2006 | unknown
i | discusses setting up three wikis in the last month: "one to support a pre-conference workshop, another for behind-the-scenes conferences planning by local organizers, and one for conference attendees to use before they arrived and during the sessions" (30) | fichter, darlene. 2006. using wikis to support online collaboration in libraries. information outlook 10, no. 1: 30–31 | unknown
i | unofficial wiki to the american library association 2005 annual conference | http://meredith.wolfwater.com/wiki/index.php?title=main_page | mediawiki
i | unofficial wiki to the 2005 internet librarian conference | http://ili2005.xwiki.com/xwiki/bin/view/main/webhome | xwiki
i | wiki for the canadian library association (cla) 2005 annual conference | http://wiki.ucalgary.ca/page/cla | mediawiki
i | wiki for south carolina library association | http://www.scla.org/governance/homepage | pmwiki
i | wiki set up to support national discussion about institutional repositories in new zealand | http://wiki.tertiary.govt.nz/~institutionalrepositories | pmwiki
i | the oregon library instruction wiki used for sharing information about library instruction | http://instructionwiki.org/ | mediawiki
i | personal repositories online wiki environment (prowe)—an online repository sponsored by the open university and the university of leicester that uses wikis and blogs to encourage the open exchange of ideas across communities of practice | http://www.prowe.ac.uk/ | unknown
i | lis wiki—space for collecting articles and general information about library and information science | http://liswiki.org/wiki/main_page | mediawiki
i | making of modern michigan—a wiki to support a state-wide digital library project | http://blog.lib.msu.edu/mmmwiki/index.php/main_page | unknown (behind firewall)
i | wiki used as a web content editing tool in a digital library initiative sponsored by emory university, the university of arizona, virginia tech, and the university of notre dame | http://sunylanewtechwiki.pbwiki.com/ | pbwiki
ii | wiki at suny stony brook health sciences library used as knowledge base | http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome; presentation can be found at: http://ms.cc.sunysb.edu/%7edachase/wikisinaction.htm | twiki
ii | wiki at york university used internally for committee work; exploring how to use wikis as a way to collaborate with users | message from mark robertson on web4lib electronic discussion list dated 10/13/2006 | unknown
ii | wiki for internal staff use at the university of waterloo; they utilize access control to restrict parts of the wiki to groups | message from chris gray on web4lib electronic discussion list dated 08/09/2006 | unknown
ii | wiki at the university of toronto for internal communications, technical problems, and as a document repository | message from stephanie walker on libref-l electronic discussion list dated 10/28/2006 | unknown
ii | wiki used for coordination and organization of portable professor program, which appears to be a collaborative information literacy program for remote faculty | http://tfpp-committee.pbwiki.com/ | pbwiki
ii | the university of connecticut libraries' staff wiki, which is a repository of information technology services documents | http://wiki.lib.uconn.edu/wiki/main_page | mediawiki
ii | wiki used at binghamton university libraries for staff intranet; features pages for committees, documentation, policies, newsletters, presentations, and travel reports | screenshots can be found at http://library.lib.binghamton.edu/presentations/cil2006/cil%202006_wikis.pdf | mediawiki
ii | wiki used at the information desk at miami university | described in: withers, rob. "something wiki this way comes." c&rl news 66, no. 11 (2005): 775–77 | unknown
ii | use of wiki as knowledge base to support reference service | http://oregonstate.edu/~reeset/rdm/ | unknown
ii | university of minnesota libraries staff web site in wiki form | https://wiki.lib.umn.edu/ | pmwiki
ii | wiki used to support the mit engineering and science libraries b-team; the wiki may no longer be active, but is still available | http://www.seedwiki.com/wiki/b-team | seedwiki
iii | a wiki that is a subject guide at st. joseph county public library in south bend, indiana | http://www.libraryforlife.org/subjectguides/index.php/main_page | mediawiki
iii | wiki used at the aiken library, university of south carolina, as a content management system (cms) | http://library.usca.edu/main/homepage | pmwiki
iii | doucette library of teaching resources wiki—a repository of resources for education students | http://wiki.ucalgary.ca/page/doucette | mediawiki
iv | wiki worldcat (wikid) is an oclc pilot project (now defunct) that allowed users to add reviews to open worldcat records | http://www.oclc.org/productworks/wcwiki.htm | unknown
iii and iv | wikiref lists reviews of reference resources—databases, books, web sites, etc.—created by butler librarians, faculty, staff, and students | http://www.seedwiki.com/wiki/butler_wikiref; reported in matthies, brad, jonathan helmke, and paul slater. using a wiki to enhance library instruction. indiana libraries 25, no. 3 (2006): 32–34 | seedwiki
iii and iv | wiki used as a subject guide at ohio university | http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page; presentation about the wiki: http://www.infotoday.com/cil2006/presentations/c101-102_boeninger.pps | mediawiki

index blending: enabling the development of definitive, discipline-specific resources
sam brooks and mark herrick
sam brooks (sbrooks@ebscohost.com) is the senior vice president of sales & marketing for ebsco information services. mark herrick (mherrick@ebscohost.com) is the vice president of business development for ebsco publishing.

index blending is the process of database development whereby various components are merged and refined to create a single encompassing source of information. once a research need is determined for a given area of study, existing resources are examined for value and possible contribution to the end product. index blending focuses on the quality of bibliographic records as the primary factor, with the addition of full text to enhance the end user's research experience as an added convenience. key examples of the process of index blending involve the fields of communication and mass media, hospitality and tourism, as well as computers and applied sciences. when academia, vendors, subject experts, lexicographers, and other contributors are brought together through the various factors associated with index blending, relevant discipline-specific research may be greatly enhanced.

as consumers, when we set out to make a purchase, we want the utmost in quality, and when applicable, quantity, and of course all of the other "appeal" factors that might be associated with a given product or service. these factors may include any number of categories, not the least of which is price. in other words, let it suffice to say that, as buyers, we want to have our cake and eat it, too. but how often is this a realistic approach to evaluating a given item for purchase?
we first must decide what is important to us, decipher the order of this importance as we see it, and evaluate our options. wouldn't it be much easier if one product in every situation had all of the factors that we deem important, and the appropriate price to go along with it? according to veliyath and fitzgerald in an article published in competitiveness review, firms can either position themselves at the high end, offering higher quality at higher prices, or at the lower end, offering lower quality at a lower price (or anywhere in between on the continuum of constant value for customers). customers, however, want more of what they value, such as convenience, speed, state-of-the-art design, quality, etc. competitors then try to differentiate themselves from their rivals along the same line of constant value, either by offering a higher quality at the same price or the same quality at a lower price (thereby increasing value for the customer).1 as such, and using a common example, is it possible to have the handling of a bmw sports car, the luxurious ride of a cadillac, the passenger space of a winnebago, the cargo space of an oversized pick-up truck, all for the price of an economy car? it's doubtful. but through recent developments in the electronic research database marketplace, and a process known as "index blending," we may be closer than ever to this ideal formula when it comes to web-based reference resources for academic libraries.

the phrase "index blending" is used here to describe an original concept/methodology initiated by ebsco publishing (ebsco). this is not to say that ebsco is the first vendor ever to have combined resources to create a new product, but to the authors' best knowledge, no other vendor has pursued the "blending" of resources to the same extent and with such a strong guiding directive as ebsco has. index blending is the combining of niche indexes and other important components to create a single definitive index for a particular discipline. as vendors seek to offer the most powerful research database for a given area of study, the pieces may come together through a combination of existing resources and proprietary development. in other words, in order to refine the tools used for research in a discipline, existing resources may be combined, fleshed out, further expanded upon, and enhanced to culminate in the archetypical index for the particular discipline. perhaps this represents the solution to the dilemma that "database choices become increasingly complex when multiple sources exist that cover the same discipline."2

the idea may seem elementary, but the process can be arduous. processes involved with index blending expand upon the basic development stages associated with creating a research database from "scratch," coupled with an increase in applicable factors, which become evident when several existing and emerging resources are involved and subsequently interwoven. as is always the case, the first step to building a solution is to identify the problem and/or the need. in database development, this is, in a nutshell, pinpointing a subject area of research that is lacking a corresponding definitive index, and where study patterns and research interest dictate a need for such a resource. this involves not only conducting surveys and engaging in discussion with advisory boards, librarians, subject experts, users, etc., but also taking a close look at the research resources that are currently available to determine value.
because the process begins with the fact that there is a problem (no definitive index for the particular area in question), the idea is to understand the strengths of available resources, as well as to identify weaknesses. through this research process, vendors can further identify independent elements of each resource that may provide significant benefit or value, as well as pinpoint the additional important pieces that are not represented in any of the available resources. in both cases (available and not available), these elements may represent various aspects associated with a research index, such as content coverage (both current and backfile), quality of indexing and abstracts, software/search functionality, thesauri, etc. once the identification and research has taken place, vendors should have the necessary knowledge to proceed to the production phase. figure 1 helps to illustrate how the index blending process can help to develop a new database that fuses together the strengths of existing resources while simultaneously compensating for any individual weaknesses that they may have.

if value is attributed to currently available databases, then, if appropriate, database acquisition may come into play. this is often a critical phase of the process, and may involve the acquisition of more than a single index. however, the desire by a vendor to acquire a given resource is based on several motivating factors, including the quality of the database as a whole, the depth and breadth of its coverage, and at times, the extreme quality of an intricate aspect of a database, which will eventually become that database's contribution to the process of index blending, thus representing its "mark" on the final product. that there is no authoritative resource available for a given subject area does not necessarily mean that certain aspects of existing resources are not of utmost quality. hence, utilizing strengths of existing resources makes sense so as to not "reinvent the wheel" when applicable. in a journal of academic librarianship article discussing the research environment in libraries and the simultaneous utilization of existing library resources, similar principles to those used in index blending are apparent. "properly combining library resources to function collectively as a cohesive, efficient unit is the basis of information integration."3 similar themes to those associated with information integration run through index blending. this is attributed largely to the fact that the basic goal of each is to enable the extraction and utilization of essential material pertinent to specific research so as to enhance the overall research process.

■ the process of index blending: an example

an interesting example of index blending utilized for a major area of study is in the case of communication and mass media.
an article in searcher outlined the development process and release of the database, communication & mass media complete, which may be the quintessential instance of the power brought about through index blending. in the article, the author first identifies the problem/need as such:

when a communication studies student approaches my reference desk, it can take a few moments before i choose a database to search. why the delay? well, to be perfectly blunt, the communication studies literature is all over the place. if the question relates to an aspect of the communications industry, i will often begin with a business database. if the question concerns the effects of media violence on children, i may choose to search one or more of the following: comabstracts, psychinfo [sic], sociological abstracts, eric, and even a few large aggregators, such as wilsonweb's omnifile and ebsco's academic search premier. in addition, there is the question of finding a single database that covers the communication science and disorders field and the more mass media-focused communication studies field. the result has been a searching strategy that relies on consulting multiple databases—a strategy that may not please impatient or inexperienced patrons. the need for such an assortment of databases is symptomatic of the discipline. the field of communication studies is extremely interdisciplinary. the discipline's roots began in the study of rhetoric and journalism and now encompass subjects ranging from political communication to film studies to advertising to journalism to communication disorders to digital convergence and to every manner of media. the discipline has strong roots in the social sciences, but also draws heavily on the humanities and the sciences. as some have put it, there is an aspect of communication studies in every discipline. this leaves librarians with the difficult task of finding a single database that covers this wide-ranging discipline. enter ebsco's new communication & mass media complete database.4

figure 1. the index blending process

this overview of the need for a comprehensive resource in areas related to communication and mass media is indicative of the type of information that vendors must extract when deciding their course of action for creating (or not creating) a database to meet such needs. in this instance, the need became apparent to ebsco upon conducting investigative research in this direction. there were certainly important, quality resources available covering some of the subject areas and subdisciplines, but not a single, all-encompassing resource. hence, the table was set to move forward and begin the process of database development using the process of index blending. once the need for a comprehensive communication and mass media database was established, ebsco began the phases of looking closely at available resources and gathering specific important details about what was required to develop such a database. in order to understand the finer details and make appropriate forward progress in formulating an index for a given research area, a dedicated group of subject experts (advisory board, indexers, lexicographers, etc.) must be established. in addition, aggregators must develop appropriate relationships and key partnerships.
in the case of the database communication & mass media complete, ebsco worked diligently to assemble a panel of experts to provide direction. often, suggestions made by advisory board members ultimately led to larger organizational partnerships. the first of ebsco's major partnerships for the benefit of the development of communication & mass media complete was with the national communication association (nca). nca is the oldest and largest national organization to promote communication scholarship and education. founded in 1914, the nca is a nonprofit organization of approximately 7,100 educators, practitioners, and students who work and reside in every u.s. state and more than twenty countries. the purpose of the association is to promote study, criticism, research, teaching, and application of the artistic, humanistic, and scientific principles of communication. nca is a scholarly society and, as such, works to enhance the research, teaching, and service produced by its members on topics of both intellectual and social significance. staff at the nca national office follows trends in national research, teaching, and service priorities. it relays those opportunities to its members and represents the academic discipline of communication in those national efforts.5

in addition to providing insight and advice into the areas associated with communication and mass media, nca found in ebsco an ideal partner to further the tremendous efforts the organization had put into its database, commsearch. commsearch, in its original form, was a scholarly communication database with deep, archival coverage of the journals of the nca and other major journals in the field of communication studies. the database provided bibliographic and keyword references to twenty-six journals in communication studies with coverage extending to the inaugural issue of each—some from as far back as the early decades of the twentieth century. the database also included cover-to-cover indexing of the nca's first six journals (from their first editions to the present) and author-supplied abstracts from their earliest appearance in nca journals. as ebsco's goals were in line with the nca in terms of improving scholarly research in areas surrounding communication as well as enhancing the dissemination of applicable materials, a partnership was formed, and ebsco acquired commsearch. the company acquired this database with the intent to enhance the collection through content additions such that it would take residence immediately as a core component of communication & mass media complete.

the second major database acquisition came about similarly to the commsearch arrangement; only this time, ebsco worked closely with penn state university, the developers of a database called mass media articles index. created by jack pontius and maintained by the penn state libraries since 1984, mass media articles index provided citation coverage for over forty thousand articles on mass media published in over sixty research journals, as well as major journalism reviews, recent encyclopedias, and handbooks in the area of communications studies. this database, which was once a stand-alone research tool, is a good example of how a good-quality resource can arise out of the passion and unique vision of an individual, yet never fully develop into its full potential due to a lack of funding, dedicated staff, and experience in database publishing.
seeing the incredible potential of mass media articles index, ebsco earmarked this database as the second major component in its larger communication and mass media product. as mentioned, the basic idea with index blending is to pinpoint the best and most important aspects of each database to carry forward into the final product. it is at this point that difficulty typically arises in the normalization of data. once core database components are determined, a vendor's expertise in building databases, standardizing entries, etc., comes to the forefront. furthermore, because another basic ingredient to the process of index blending revolves around additional material included by the database developer, that aggregator has the burden of taking the core building blocks of the database and elevating these raw materials to the point where their combination and refinement become the desired end result—a definitive, cohesive index to research in the subject area.

with this in mind, ebsco carefully selected the indexing components of each resource that were essential to carry forward and substantially expanded the abstracting and indexing coverage of appropriate journals in commsearch and mass media articles index. the company also added indexing and abstracts for many more of the important titles in the communication and mass media fields that were not covered by these databases. through its initial research, ebsco gained a thorough knowledge of which journals and other content sources were not covered by the two acquired databases, and worked to provide coverage for those missing sources. as such, the idea with this database was to cover all appropriate, quality titles indexed in all other currently available communication and mass media-specific databases combined, as well as other important journals not previously covered by any such database. further still, the company took the database to new levels through the creation and deployment of features such as searchable cited references and index browsing. figure 2 provides a visual interpretation of the elements associated with this particular example of index blending.

figure 2. indexing components of communication & mass media complete

often academic librarians consider aggregated full-text databases as a means for accessing full-text information quickly, but with a negative outlook toward the quality of the indexing included in these databases. however, it is ebsco's intention to create first and foremost a powerful index, such that any full text included is that much easier to locate and utilize. according to cleveland and cleveland in the book introduction to indexing and abstracting, 3rd ed., "in any retrieval system, success or failure depends on the adequacy of the indexing and the related searching procedures."6 ebsco wholeheartedly agrees with this statement. and though the company is the leader in providing full-text databases, it continues to raise the bar for these databases through not only constantly increasing the quality and quantity of full text, but also by enhancing indexing, abstracts, and associated search functionality. a database may provide the greatest collection of full text, yet it is still only as good as its underlying indexing framework that guides users to the appropriate content. index blending allows for this ideal because the development of the indexing takes place at the onset as the primary objective, and full text may be included at a later stage.
this is precisely the case with ebsco’s communication/communications database where the first iteration of the collection (communication & mass media index) did not include full text, and the complete (full­text version) was soon to follow. thus, in the case of communication & mass media complete, once the core elements for the index were in place, refined, and normalized, ebsco moved forward in the area of full­text content. in addition to the inclu­ sion of full text for all of the nca journals, which david oldenkamp refers to as “heavyweights in communication studies,” ebsco included full­text coverage for nearly 230 titles. according to oldenkamp, as of april 2004, the competing database with the next largest number of publications covered in full text included only sixteen full­text titles.7 though index blending is not the traditional way in which to build a database, and may actually be the most labor­intensive way in which to proceed, the end results can be remarkable when done properly. using this process, “ebsco has managed to create the largest and most comprehensive database serving the needs of communication studies scholars, faculty, students, and librarians.”8 in addition, a review published in the charleston advisor determined that “ebsco has brought together two reliable but atrophied resources and refreshed them with new search capabilities and added content, such as abstracts. these have been combined with a healthy dose of ‘not indexed anywhere’ new titles and interdisciplinary sources to create a comprehensive figure 2. indexing components of communication & mass media complete public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 31index blending | brooks and herrick 31 resource that will satisfy the needs of students, faculty, and researchers.”9 n another example of index blending hospitality & tourism index index blending is a concept as much as it is a process and a means to an end. much like applying a particular theory to a number of different instances, index blending is inter­disciplinary in application. thus, the area of com­ munication/communications as described previously, is simply an example of practical implementation of this concept, and a particular way in which the process was approached given the specific elements involved. another discipline to which index blending has been applied is the niche areas related to hospitality and tourism. according to professor vivienne sario, director of travel and tourism at community college of southern nevada, “on a global basis the hospitality and tourism industry employs more than 10 percent of the worldwide workforce. it contributes over $4 trillion in gross global output. this means travel and tourism is the world’s largest industry.”10 though still considered (perhaps incorrectly) a “niche” area of study, the number of hospitality and tourism programs supported in colleges and universities around the globe has also increased to the point where dozens and dozens of twoand four-year academic institutions provide related courses of study. from a business perspective, in order to justify the amount of resources that would inevitably be expended to develop a high-end, comprehensive database, the basic criteria needed for database development must first be in place. considering the economic vastness of the hospitality and tourism industry, the interest and research need is quite apparent. 
if there is at least one clearly definitive academic resource covering the subject area, in all likelihood, the decision would be made to cease exploration and development in that area. contrarily, when ebsco conducted exhaustive research to determine the need for a new index to literature in the areas of hospitality and tourism, the unanimous conclusion was to move forward in the development of a product that would go above and beyond the level of the existing resources. this is not to say that quality was not inherent in some of the existing resources. in actuality, the fact that there were already quality (albeit perhaps incomplete) resources available, paved the way for utilizing principles of index blending in the development of a more comprehensive resource. the first element of what was to become ebsco’s hospitality & tourism index was purdue university’s lodging, restaurant, & tourism index (lrti). as an indi­ cator of the level of emphasis attributed to this subject area by the university, purdue’s hospitality and tourism management undergraduate program was ranked num­ ber one nationally by a survey published in the journal of hospitality & tourism education.11 a previous survey conducted by the same journal used a different method­ ology and sample, but still ranked purdue’s hospitality and tourism management (htm) program number one in the nation.12 to provide insight into the purdue htm program, the origins and history of lrti, the need for a compre­ hensive database, and the university’s decision to work with ebsco, questions were asked of two prominent purdue faculty members: raphael kavanaugh, head, hospitality and tourism management department, and priscilla geahigan, head, consumer and family sciences library. the following is taken from e­mail cor­ respondence among one of the authors (sam brooks), kavanaugh, and geahigan: brooks: how long has purdue offered a hospitality & tourism management program? kavanaugh: the program began in 1928 as the department of institutional management. brooks: when and why did purdue decide to create the lodging restaurant & tourism index (lrti)? kavanaugh: to fill a serious void of access to relevant research conducted related to the industry. geahigan: before 1990 coverage of the hospitality industry within business indexes and databases was limited. to meet the needs of researchers and students, purdue’s restaurant, hotel, institutional, and tourism management department, an in­house indexing project, started in the purdue consumer and family sciences library in 1977. citations of articles from scholarly and trade journals were entered on index cards, filed by subject headings. in 1985 the project became more for­ malized and migrated into partnership with a few other academic institutions. a printed index titled lodging and restaurant index started. in 1987, purdue became the sole producer of the index. in 1995, the index was renamed the lodging, restaurant, and tourism index (lrti), with expanded scope and coverage. over the years, data diskettes and cd­rom formats were added to the printed version. brooks: how important are “niche” or subject­specific databases to support research in a given area such as h&t? geahigan: in contrast to earlier years, students can now get their information from a multitude of databases and 32 information technology and libraries | june 200732 information technology and libraries | june 2007 venues. at purdue, we have databases that cover all aspects of business and management. 
undergraduate students often get confused and impatient at the large number of databases offered. a subject specific database like hti gives them a place to start without feeling lost. brooks: why did purdue decide to partner with ebsco, and subsequently merge lrti in the larger hospitality & tourism index (hti)? geahigan: we realized that we do not have the resources to support a database that measures up to industry technology standards and have long decided to look for a company to take over lrti. ebsco’s offer was attrac­ tive to purdue because of their willingness to assume future indexing of the lrti journals. in addition, many purdue students are already familiar with the ebsco interface because we have numerous other ebsco hosted databases. we are pleased that lrti became the foundation of ebsco’s building of hti.13 the second foundational component of the database also came about through acquisition from an academic institution. articles in hospitality and tourism was copro­ duced by oxford brookes university and the university of surrey. bournemouth university was also a source of data for this database between the years of 1988 and 1998. this database provided details of more than forty­six thousand english­language articles selected from more than 330 relevant academic and trade journals published worldwide from 1984 to 2003.14 rounding out the list of three existing resources that were acquired by ebsco, the hospitality database (acquired from the original developers at cornell university) was also assimilated into the new hospitality and tourism database. the hospitality database evolved from the print publication bibliography of hotel management and related subjects that was originally established in the 1950s by blanche fickle, the first director of the library at cornell university’s school of hotel administration.15 this database, founded on the vision of ms. fickle, would serve as a core resource for ebsco’s new hospitality & tourism index by providing it with a foundation of quality indexing for journals related to the study of hotel adminis­ tration and management. ebsco completed the initial development of its hospitality and tourism database by reviewing applicable subscription statistics maintained by its sister company, ebsco subscription services, in order to locate other publications relevant to the various subdisciplines of hospitality and tour­ ism. any such publications that were not already indexed by the other three existing resources were targeted for inclusion in the new hospitality & tourism index. figure 3 provides a visual interpretation of the ele­ ments associated with this particular example of index blending. following the initial release of hospitality & tourism index, in order to provide an even more inclusive research experience, ebsco proceeded to develop and release a full­text version of this resource entitled hospitality & tourism complete. this new variant of the database offers users the same indexing infrastructure as hospitality & tourism index, as well as provides the additional benefit of immediate access to relevant full­text content. while the availability of full text is certainly of immense value, it is still the quality of underlying indexing that allows this database to be regarded as truly innovative. 
in fact, this same perspective was echoed in a recent review in choice where the author states that “hospitality & tourism complete indexes its specialized subject area bet­ ter than any other product currently available.”16 n the whole is greater than the sum of its parts the process of index blending not only brings together content from a variety of resources, it also has the power to increase the research value of that same content. by combining such content under the umbrella of a single comprehensive database, pertinent information can now be more efficiently accessed and cross­referenced with other relevant content. previously, the same body of information could only be explored via a highly ineffec­ tive, piecemeal research process. one last example that demonstrates this potential increase in research value is found in the computers & figure 3. indexing components of hospitality & tourism complete public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 33index blending | brooks and herrick 33 applied sciences complete database. this resource was shaped through the acquisition and merger of three distinct indexes—computer science index (csi), internet & personal computing abstracts (ipca), and information science & technology abstracts (ista)—and rounded out with addi­ tional indexed content relevant to the larger discipline. this resulted in a total of 1,100 active journals indexed back as far as 1965. then, after two years of dedicated licensing work with pub­ lishers, full text for more than 570 of those titles was added to provide more direct access to such content for researchers. figure 4 illustrates how the various subject areas (unique and shared) covered by the three original databases were merged together in the blending process. from this diagram, it is apparent that the original three databases were already quality resources in their own right and adequately rep­ resented their respective subject areas. however, it should also be apparent that, through the pro­ cess of index blending, the value of the original databases has been enhanced via the fusion of their unique, yet complementary content into a single comprehensive resource. n conclusion though the above examples of communication & mass media complete, hospitality & tourism index, and computers & applied sciences complete represent only three of sev­ eral subject­specific databases culminating from the process of index blending, most database producers (including ebsco) would likely agree that this is not a common procedure for database development. however, the knowledge that a company derives from the pro­ cess often has a significant impact on the company’s other, “nonblended” databases. index blending typically requires a high degree of refinement in order to be fully successful, so when a company engages in this rigorous developmental process, the newfound experience and expertise gained from it may spill over into the com­ pany’s other database initiatives. end users may notice improved indexing, abstracts, and other valuable com­ ponents that are now included in other more established full­text resources from the same vendor. databases that were once viewed simply as “aggregated full­text data­ bases” may be looked upon in a different light after the company adopts the process of index blending for other, unrelated database projects. 
though these databases may still provide easy access to an abundance of full­text content, they may also now be considered the definitive index for their respective subject area(s). therefore, when a company implements the practice of index blending for some of its products, the resulting effects are two­ fold. the databases created directly as a result of the index blending process are the first to benefit, and the company’s other databases (including those with full text) may also benefit from index blending in an indirect manner. in the end, however, the success of any index blending initiative is measured by the level of benefit that it provides to applicable researchers and other users of the resulting databases. references 1. rajaram veliyath and elizabeth fitzgerald, “firm capabil­ ities, business strategies, customer preferences, and hypercom­ petitive arenas: the sustainability of competitive advantages with implications for firm competitiveness,” competitiveness review 10 (2000): 56–82. 2. m. suzanne brown, jana s. edwards, and jeneen lasee­ willemssen, “a new comparison of the current index to jour­ nals in education and the education index: a deep analysis of indexing,” the journal of academic librarianship 25 (may 1999): 216–22. 3. sam brooks, “integration of information resources and collection development strategy,” the journal of academic librarianship 27 (july 2001): 316–19. 4. david oldenkamp, “ebsco’s new communication and mass media complete (cmmc) database,” searcher 12, no. 4 (apr. 2004): 40. 5. national communication association web site. http:// www.natcom.org (accessed aug. 2004). figure 4. subject areas of component databases are merged into a cohesive whole through index blending 34 information technology and libraries | june 200734 information technology and libraries | june 2007 6. donald b. cleveland and ana d. cleveland, introduction to indexing and abstracting, 3rd ed. (greenwood village, colo.: libraries unlimited, 2001): 26. 7. oldenkamp, “ebsco’s new communication and mass media complete (cmmc) database.” 8. ibid. 9. dodie owens, “advisor reviews—standard review: communication and mass media complete,” the charleston advisor 6, no. 4 (apr. 2005): 45. 10. vivienne sario, “hospitality & tourism programs,” http://www.studyusa.com/articles/hospitality.asp (accessed june 1, 2006). 11. purdue university web site. http://news.uns.purdue. edu/uns/html4ever/030130.kavanaugh.rank2003.html (accessed june 1, 2006). 12. michael g. brizek and mahmood a. khan, “ranking of u.s. hospitality undergraduate programs: 2000–01,” journal of hospitality & tourism education 14, no. 2 (2002): 4. 13. raphael kavanaugh and priscilla geahigan, e­mail mes­ sage with author sam brooks, feb. 3, 2005. 14. articles in hospitality and tourism web site (hosted by the university of surrey). http://libweb.surrey.ac.uk/aht2/about .asp (accessed june 1, 2006). 15. cornell university’s school of hotel administration web site. http://www.nestlelib.cornell.edu/history.html (accessed june 1, 2006). 16. s. c. awe, “reference­social and behavioral sciences— hospitality & tourism complete,” choice 43, no. 10 (june 2006). testing information literacy in digital environments | katz 3 despite coming of age with the internet and other technology, many college students lack the information and communication technology (ict) literacy skills necessary to navigate, evaluate, and use the overabundance of information available today. 
this paper describes the development and early administrations of ets’s iskills assessment, an internet-based assessment of information literacy skills that arise in the context of technology. from the earliest stages to the present, the library community has been directly involved in the design, development, review, field trials, and administration to ensure the assessment and scores are valid, reliable, authentic, and useful. t echnology is the portal through which we interact with information, but there is growing belief that people’s ability to handle information—to solve problems and think critically about information—tells us more about their future success than does their knowledge of specific hardware or software. these skills—known as information and communications technology (ict) literacy—comprise a twenty­first­century form of literacy in which researching and communicating information via digital environments are as important as reading and writing were in earlier centuries (partnership for 21st century skills 2003). although today’s knowledge society challenges stu­ dents with overabundant information of often dubious quality, higher education has recognized that the solution cannot be limited to improving technology instruction. instead, there is an increasingly urgent need for students to have stronger information literacy skills—to “be able to recognize when information is needed and have the ability to locate, evaluate, and use effectively the needed information” (american library association 1989)—and apply those skills in the context of technology. regional accreditation agencies have integrated information lit­ eracy into their standards and requirements (for example, middle states commission on higher education 2003; western association of schools and colleges 2001), and several colleges have begun campuswide initiatives to improve the information literacy of their students (for example, the california state university 2006; university of central florida 2006). however, a key challenge to designing and implementing effective information lit­ eracy instruction is the development of reliable and valid assessments. without effective assessment, it is difficult to know if instructional programs are paying off—whether students’ information literacy skills are improving. ict literacy skills are an issue of national and inter­ national concern as well. in january 2001, educational testing service (ets) convened an international ict literacy panel to study the growing importance of exist­ ing and emerging information and communication tech­ nologies and their relationship to literacy. the results of the panel’s deliberations over fifteen months highlighted the growing importance of ict literacy in academia, the workplace, and society. the panel called for assessments that will make it possible to determine to what extent young adults have obtained the combination of techni­ cal and cognitive skills needed to be productive mem­ bers of an information­rich, technology­based society (international ict literacy panel 2002). this article describes ets’s iskills assessment (for­ merly “ict literacy assessment”), an internet­based assessment of information literacy skills that arise in the context of technology. from the earliest stages to the pres­ ent, the library community has been directly involved in the design, development, review, field trials, and admin­ istration to ensure the assessment and scores are valid, reliable, authentic, and useful. 
■ motivated by the library community although the results of the international ict literacy panel provided recommendations and a framework for an assessment, the inspiration for the current iskills assessment came more directly from the higher educa­ tion and library community. for many years, faculty and administrators at the california state university (csu) had been investigating issues of information literacy on their campuses. as part of their systemwide information competence initiative that began in 1995, researchers at csu undertook a massive ethnographic study to observe students’ research skills. the results suggested a great many shortcomings in students’ infor­ mation literacy skills, which confirmed librarian and classroom faculty anecdotal reports. however, clearly such a massive data collection and analysis effort would be unfeasible for documenting the information literacy skills of students throughout the csu system (dunn 2002). gordon smith and the late ilene rockman, both of the csu chancellor ’s office, discussed with ets the idea of developing an assessment of ict literacy that could support csu’s information competence initiative as well as similar initiatives throughout the higher edu­ cation community. irvin r. katz irvin r. katz (ikatz@ets.org) is senior research scientist in the research and development division at educational testing service. testing information literacy in digital environments: ets’s iskills assessment � information technology and libraries | september 2007� information technology and libraries | september 2007 ■ national higher education ict literacy initiative in august 2003, ets established the national higher education ict literacy initiative, a consortium of seven colleges and universities that recognized the need for an ict literacy assessment targeted at higher educa­ tion. representatives of these institutions collaborated with ets staff to design and develop the iskills assessment. the consortium built upon the work of the international panel to explicate the nature of ict literacy in higher education. over the ensuing months, repre­ sentatives of consortium institutions served as subject­ matter experts for the assessment design and scoring implementation. the development of the assessment followed a process known as evidence­centered design (mislevy, steinberg, and almond 2003), a systematic approach to the design of assessments that focuses on the evidence (student performance and products) of proficiencies as the basis for constructing assessment tasks. through the evidence­ centered design process, ets staff (psychometricians, cognitive psychologists, and test developers) and sub­ ject­matter experts (librarians and faculty) designed the assessment by considering first the purpose of the assess­ ment and by defining the construct—the knowledge and skills to be assessed. these decisions drove discussions of the types of behaviors, or performance indicators, to serve as evidence of student proficiency. finally, simulation­ based tasks designed around authentic scenarios were crafted to elicit from students the critical performance indicators. katz et al. (2004) and brasley (2006) provide a detailed account of this design and development process, illustrating the critical role played by librarians and other faculty from higher education. 
■ ict literacy = information literacy + digital environments consortium members agreed with the conclusions of the international ict literacy panel that ict literacy must be defined as more than technology literacy. college students who grew up with the internet (the “net generation”) might be impressively technologically literate, more accepting of new technology, and more technically facile than their parents and instructors (oblinger and oblinger 2005). however, anecdotally and in small­scale studies, there is increasing evidence that students do not use technology effectively when they conduct research or communicate (rockman 2004). many educators believe that students today are less information savvy than earlier generations despite having powerful information tools at their disposal (breivik 2005). ict literacy must bridge the ideas of information literacy and technology literacy. to do so, ict literacy draws out the technology­related components of infor­ mation literacy as specified in the often­cited standards of the association of college and research libraries (acrl) (american library association 1989), focusing on how students locate, organize, and communicate information within digital environments (katz 2005). this conflu­ ence of information and technology directly reflects the “new illiteracy” concerns of educators: students quickly adopt new technology, but do not similarly acquire skills for being critical consumers and ethical producers of information (rockman 2002). students need training and practice in ict literacy skills, whether through general education or within discipline coursework (rockman 2004). the definition of ict literacy adopted by the con­ sortium members reflects this view of ict literacy as information literacy needed to function in a technological society: ict literacy is the ability to appropriately use digital technology, communication tools, and/or networks to solve information problems in order to function in an information society. this includes having the ability to use technology as a tool to research, organize, and communicate information and having a fundamental understanding of the ethical/legal issues surrounding accessing and using information (katz et al. 2004, 7). consortium members further refined this defini­ tion, identifying seven performance areas (see figure 1). these areas mirror the acrl standards and other related standards, but focus on elements that were judged most central to being sufficiently information literate to meet the challenges posed by technology. ■ ets’s iskills assessment ets’s iskills assessment is an internet­delivered assess­ ment that measures students’ abilities to research, orga­ nize, and communicate information using technology. the assessment focuses on the cognitive problem­solving and critical­thinking skills associated with using technol­ ogy to handle information. as such, scoring algorithms target cognitive decision­making rather than technical competencies. the assessment measures ict literacy through the seven performance areas identified by con­ sortium members, which represent important problem­ solving and critical­thinking aspects of ict literacy skill (see figure 1). assessment administration takes approx­ imately seventy­five minutes, divided into two sec­ tions lasting thirty­five and forty minutes, respectively. article title | author 5testing information literacy in digital environments | katz 5 figure 1. 
components of ict literacy

define: understand and articulate the scope of an information problem in order to facilitate the electronic search for information, such as by:
■ distinguishing a clear, concise, and topical research question from poorly framed questions, such as ones that are overly broad or do not otherwise fulfill the information need;
■ asking questions of a "professor" that help disambiguate a vague research assignment; and
■ conducting effective preliminary information searches to help frame a research statement.

access: collect and/or retrieve information in digital environments. information sources might be web pages, databases, discussion groups, e-mail, or online descriptions of print media. tasks include:
■ generating and combining search terms (keywords) to satisfy the requirements of a particular research task;
■ efficiently browsing one or more resources to locate pertinent information; and
■ deciding what types of resources might yield the most useful information for a particular need.

evaluate: judge whether information satisfies an information problem by determining authority, bias, timeliness, relevance, and other aspects of materials. tasks include:
■ judging the relative usefulness of provided web pages and online journal articles;
■ evaluating whether a database contains appropriately current and pertinent information; and
■ deciding the extent to which a collection of resources sufficiently covers a research area.

manage: organize information to help you or others find it later, such as by:
■ categorizing e-mails into appropriate folders based on a critical view of the e-mails' contents;
■ arranging personnel information into an organizational chart; and
■ sorting files, e-mails, or database returns to clarify clusters of related information.

integrate: interpret and represent information, such as by using digital tools to synthesize, summarize, compare, and contrast information from multiple sources while:
■ comparing advertisements, e-mails, or web sites from competing vendors by summarizing information into a table;
■ summarizing and synthesizing information from a variety of types of sources according to specific criteria in order to compare information and make a decision; and
■ re-representing results from an academic or sports tournament into a spreadsheet to clarify standings and decide the need for playoffs.

create: adapt, apply, design, or construct information in digital environments, such as by:
■ editing and formatting a document according to a set of editorial specifications;
■ creating a presentation slide to support a position on a controversial topic; and
■ creating a data display to clarify the relationship between academic and economic variables.

communicate: disseminate information tailored to a particular audience in an effective digital format, such as by:
■ formatting a document to make it more useful to a particular group;
■ transforming an e-mail into a succinct presentation to meet an audience's needs;
■ selecting and organizing slides for distinct presentations to different audiences; and
■ designing a flyer to advertise to a distinct group of users.

during this time, students respond to fifteen interactive, performance-based tasks. each interactive task presents a real-world scenario, such as a class or work assignment, that frames the information problem.
students solve information­handling tasks in the context of simulated software (for example, e­mail, web browser, library database) having the look and feel of typical applications. there are fourteen three­ to five­minute tasks and one fifteen­minute task. the three­ to five­minute tasks target a single perfor­ mance area, while the fifteen­minute tasks comprise more complex problem­solving scenarios that target multiple performance areas. the simpler tasks contribute to the overall reliability of the assessment, while the more com­ plex task focuses on the richer aspects of ict literacy performance. in the assessment, a student might encounter a sce­ nario that requires him or her to access information from a database using a search engine (see figure 2). the results are tracked and strategies scored based on how he or she searches for information, such as key words chosen, search strategies refined, and how well the information returned meets the needs of the task. the assessment tasks each contain mechanisms to keep students from pursuing unproductive actions in the simulated environment. for example, in an internet browsing task, when the student clicks on an incorrect link, he might be told that the link is not needed for the current task. this message cues the student to try an alter­ native approach while still noting for scoring purposes that the student made a misstep. in a similar way, the student who fails to find useful (or any) journal articles in her database search might receive an instant message from a “teammate” providing her with a set of journal articles to be evaluated. these mechanisms potentially keep students from becoming frustrated (for example, via a fruitless search) while providing the opportunity for the students to demonstrate other aspects of their skills (for example, evaluation skills). the scoring for the iskills assessment is completely automated. unlike a multiple­choice question, each simu­ lation­based task provides many opportunities to collect information about a student and allows for alternative paths leading to a solution. scored responses are pro­ duced for each part of a task, and a student’s overall score on the test accumulates the individual scored responses across all assessment tasks. the assessment differs from existing measures in sev­ eral ways. as a large­scale measure, it was designed to be administered and scored across units of an institution or across institutions. as a simulation­based assessment, the tasks go beyond what is possible in multiple­choice format, providing students with the look and feel of interactive digital environments along with tasks that elicit higher­order critical­thinking and problem­solving skills. as a scenario­based assessment, students become engaged in the world of the tasks, and the task scenarios describe the types of assignments students should be see­ ing in their ict literacy instruction as well as examples of workplace and personal information problems. ■ two levels of assessments the iskills assessment is offered at two levels: core and advanced. the core level was designed to assess readi­ ness for the ict literacy demands of college. it is targeted at high school seniors and first­year college students. the advanced level was designed to assess readiness for the ict literacy challenges in transitioning to higher­level college coursework, such as moving from sophomore to junior year or transferring from a two­year to a four­year institution. 
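returning briefly to the automated scoring described above: the sketch below shows how scored responses from each part of a task can accumulate into an overall score. the task structure, point values, and function name are illustrative assumptions, not ets's actual scoring algorithms.

```python
# illustrative sketch of accumulating scored responses across simulation tasks;
# the task structure and point values are assumptions, not ets's algorithms.

def score_assessment(task_responses):
    """each task yields several scored responses (full or partial credit);
    the overall score is the accumulation across all assessment tasks."""
    total = 0.0
    maximum = 0.0
    for task in task_responses:
        for earned, possible in task:   # one (earned, possible) pair per scored element
            total += earned
            maximum += possible
    return total, maximum

# e.g. a short "access" task scored on keyword choice and result selection,
# plus part of a longer integrate/evaluate task
tasks = [
    [(1, 1), (0.5, 1)],
    [(1, 1), (1, 1), (0, 1)],
]
earned, possible = score_assessment(tasks)
print(f"{earned} of {possible} points")   # overall score before any scaling
```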
the advanced level targets students in their second or third year of post­secondary study. the key difference between the core and advanced levels is in the difficulty of the assessment tasks. tasks in the core level are designed to be easier; examinees are presented with fewer options, the scenarios are more straightforward, and the reasoning needed for each step in a task is simpler. an advanced task might require an individual to infer the search terms needed from a gen­ eral description of an information need; the correspond­ ing core task would state the information need more explicitly. in a task of evaluating web sites, the core level might present a web site with many clues that it is not figure 2. in the iskills assessment, students demonstrate their skills at handling information through interaction with simulated software. in this example task, students develop a search query as part of a research assignment on earthquakes. © 2007 educational testing service. all rights reserved. article title | author 7testing information literacy in digital environments | katz 7 authoritative (a “.com” url, unprofessional look, content that directly describes the authors as students). the cor­ responding advanced task would present fewer cues of the web site’s origin (for example, a professional look, but careful reading reveals the web site is by students). ■ score reports for individuals and institutions both levels of the assessment feature online delivery of score reports for individuals and for institutions. the individual score report is intended to help guide students in their learning of ict literacy skills, aiding identifica­ tion of students who might need additional ict literacy instruction. the report includes an overall ict literacy score, a percentile score, and individualized feedback on the student’s performance (see figure 3). the percentile compares students to a reference group of students who took the test in early 2006 and who fall within the target population for the assessment level (core or advanced). as more data are collected from a greater number of institutions, these reference groups will be updated and, ideally, approach nationally representative norms. score reports are available online to students, usually within one week. high schools, colleges, and universities receive score reports that aggregate results from the test­takers at their institution. the purpose of the reports is to provide an overview of the students in comparison with a reference group. these reports are available to institutions online after at least fifty students have taken either the core or advanced level test—that is, when there are sufficient num­ bers to allow reporting of reliable scores. figure 4 shows a graph from one type of institutional report. users have the option to specify the reference group (for example, all students, all students at a four­year institution) and the subset of test­takers to compare to that group (for exam­ ple, freshmen, students taking the test within a particular timeframe). a second report summarizes the performance feedback of the individual reports, providing percentages of students who received the highest score on each aspect of performance (each of the fourteen short tasks are scored on two or three different elements). finally, institutions can conduct their own analyses by downloading the data of their test­takers, which include each student’s responses to the background questions, iskills score, and responses to institution­specified questions. 
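the percentile reported to individuals is simply the test-taker's standing relative to the chosen reference group. a rough sketch follows; the handling of tied scores and the sample numbers are assumptions, since the article does not state ets's exact convention.

```python
# rough sketch of a percentile relative to a reference group of earlier
# test-takers; the treatment of ties is an assumption, not ets's definition.
from bisect import bisect_left

def percentile_rank(score, reference_scores):
    ref = sorted(reference_scores)
    below = bisect_left(ref, score)       # count of reference scores strictly below
    return 100.0 * below / len(ref)

reference_group = [310, 355, 400, 420, 450, 475, 500, 520, 560, 610]  # hypothetical scores
print(percentile_rank(475, reference_group))   # -> 50.0
```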
■ testing the test

a variety of specialists contributed to the development of ets's iskills assessment: librarians, classroom faculty, education administrators, assessment specialists, researchers, user-interface and graphic designers, and systems developers. the team's combined goal was to produce a valid, reliable, authentic assessment of ict literacy skills. before the iskills assessment produced official scores for test-takers, these specialists—both ets and ict literacy experts—subjected the assessment to a variety of review procedures at many stages of development. these reviews ranged from weekly teleconferences with consortium members during the initial development of assessment tasks (january–july 2004), to small-scale usability studies in which ets staff observed individual students completing assessment tasks (or mockups of assessment tasks), to field trials that mirrored actual test delivery. the usability studies investigated students' comprehension of the tasks and testing environment as well as the ease of use of the simulated software in the assessment tasks. the field trials provided opportunities to collect performance data and test the automated scoring algorithms. in some cases, ets staff fine-tuned the scoring algorithms (or developed alternatives) when the scores produced were not psychometrically sound, such as when one element of students' scores was inconsistent with their overall performance.

figure 3. first page of a sample score report for an individual. the subsequent pages contain additional performance feedback.
figure 4. sample portion of an institutional score report: comparison between a user-specified reference group and data from the user's institution.

through these reviews and field trials, the iskills assessment evolved to its current form, targeting and reporting the performance of individuals who complete the seventy-five-minute assessment. in some cases, feedback from experts and field trial participants led to significant changes. for example, the iskills assessment began in 2005 as a two-hour assessment (at that time called the ict literacy assessment) that reported scores only to institutions on the aggregated performance of their participating students. some students entering higher education found the 2005 assessment excessively difficult, which led to the creation of the easier core level assessment.

table 1 outlines the participation volumes for the field trials and test administrations. during each field trial, as well as during the institutional administration, feedback was collected from students on their experience with the test via a brief exit survey. table 2 summarizes some results of the exit survey. student reactions to the test were reasonably consistent: most students enjoyed taking the test and found the tasks realistic. in written comments, students taking the institutional assessment found the experience rewarding but exhausting, and thought the amount of reading excessive. student feedback directly influenced the design of the core and advanced level assessments, including the shorter test-taking time and lighter reading load compared with the institutional assessment.

table 1. chronology of field trials and test administrations
date | administration | approximate no. of students | approximate no. of participating institutions
july–september 2004 | field trials for institutional assessment | 1,000 | 40
january–april 2005 | institutional assessment | 5,000 | 30
may 2005 | field trials for alternative individual assessment structures | 400 | 25
november 2005 | field trials for advanced level individual assessment | 700 | 25
january–may 2006 | advanced level individual assessment | 2,000 | 25
february 2006 | field trials for core level individual assessment | 700 | 30
april–may 2006 | core level individual assessment | 4,500 | 45
august–december 2006 | core level: continuous administration | 2,100 | 20
august–december 2006 | advanced level: continuous administration | 1,400 | 10
note: items in bold represent "live" test administrations in which score reports were issued to institutions, students, or both.

as shown in table 1 (bolded rows), test administrations in 2005 and early 2006 occurred within set time frames. beginning in august 2006, the core and advanced level assessments switched to continuous testing: instead of a specific testing window, institutions create testing sessions to suit the convenience of their resources and students. the tests are still administered in a proctored lab environment, however, to preserve the integrity of the scores.

table 2. student feedback from the institutional assessment and individual assessments' field trials
statement | % agreeing: institutional assessment (n=4,898) | advanced level field trials (n=736) | core level field trials (n=648)
i enjoyed taking this test. | 61 | 59 | 67
this test was appropriately challenging. | 90 | 90 | 86
i have never taken a test like this one before. | 90 | 90 | 89
to perform well on this test requires thinking skills as well as technical skills. | 95 | 93 | 94
i found the overall testing interface easy to use (even if the tasks themselves might have been difficult). | 83 | 82 | 85
my performance on this test accurately reflects my ability to solve problems using computers and the internet. | 63 | 56 | 67
i didn't take this test very seriously. | 25 | 25 | 23
the tasks reflect activities i have done at school, work, or home. | 79 | 77 | 78
the software tools were unrealistic. | n/a | 21 | 24

■ student performance

almost 6,400 students at sixty-three institutions participated during the first administrations of the core and advanced level iskills assessments between january and may 2006. (some institutions administered both the core and advanced level assessments.) test-takers consisted of 1,016 high-school students, 753 community college students, and 4,585 four-year college and university students. institutions selected students to participate based on their assessment goals. some chose to test students enrolled in a particular course, some recruited a random sample, and some issued an open invitation and offered gift certificates or other incentives. because the sample of students is not representative of all united states institutions nor all higher education students, these results do not necessarily generalize to the greater population of college-age students and should therefore be interpreted with caution. even so, the preliminary results reveal interesting trends in the ict literacy skills of participating students.

overall, students performed poorly on both the core and advanced level, achieving only about half of the possible points on the tests. informally, the data suggest that students generally do not consider the needs of an audience when communicating information. for example, they do not appear to recognize the value of tailoring material to an audience. regarding the ethical use of information, students tend not to check the "fair use" policies of information on the assessment's simulated web sites. unless the usage policy (for example, copyright information) is very obvious, students appeared to assume that they may use information obtained online. on the positive side, test-takers appeared to recognize that .edu and .gov sites are less likely to contain biased material than .com sites. eighty percent of test-takers correctly completed an organizational chart based on e-mailed personnel information. most test-takers correctly categorized e-mails and files into folders. and when presented with an unclear assignment, 70 percent of test-takers selected the best question to help clarify the assignment.

during a task in which students evaluated a set of web sites:
■ only 52 percent judged the objectivity of the sites correctly;
■ sixty-five percent judged the authority correctly;
■ seventy-two percent judged the timeliness correctly; and
■ overall, only 49 percent of test-takers uniquely identified the one web site that met all criteria.

when selecting a research statement for a class assignment:
■ only 44 percent identified a statement that captured the demands of the assignment;
■ forty-eight percent picked a reasonable but too broad statement; and
■ eight percent picked statements that did not address the assignment.

when asked to narrow an overly broad search:
■ only 35 percent selected the correct revision; and
■ thirty-five percent selected a revision that only marginally narrowed the search results.

other results suggest that these students' ict literacy needs further development:
■ in a web search task, only 40 percent entered multiple search terms to narrow the results;
■ when constructing a presentation slide designed to persuade, 12 percent used only those points directly related to the argument;
■ only a few test-takers accurately adapted existing material for a new audience; and
■ when searching a large database, only 50 percent of test-takers used a strategy that minimized irrelevant results.

■ validity evidence

the goal of the iskills assessment is to measure the ict literacy skills of students—higher scores on the assessment should reflect stronger skills. evidence for this validity argument has been gathered since the earliest stages of assessment design, beginning in august 2003. these documentation and research efforts, conducted at ets and at participating institutions, include:
■ the estimated reliability of iskills assessment scores is .88 (cronbach alpha), which is a measure of test score consistency across various administrations. this level of reliability is comparable to that of many other respected content-based assessments, such as the advanced placement exams.
■ as outlined earlier, the evidence-centered design approach ensures a direct connection between experts' view of the domain (in this case, ict literacy), evidence of student performance, design of the tasks, and the means for scoring the assessment (katz et al. 2004).
through the continued involvement of the library community in the form of the ict literacy national advisory committee and development committees, the assessment maintains the endorsement of its con­ tent by appropriate subject­matter experts. ■ in november 2005, a panel of experts (librarians and faculty representing high schools, community colleges, and four­year institutions from across the united states) reviewed the task content and scoring for the core level iskills assessment. after investigat­ ing each of the thirty tasks and their scoring in detail, the panelists strongly endorsed twenty­six of the tasks. four tasks received less strong endorsement and were subsequently revised according to the committee’s recommendations. ■ students’ self­assessments of their ict literacy skills align with their scores on the iskills assessment (katz and macklin 2006). the self­assessment measures were gathered via a survey administered before the 2005 assessment. interestingly, although students’ confidence in their ict literacy skills aligned with their iskills scores, iskills scores did not correlate with the frequency with which students reported per­ forming ict literacy activities. this result supports librarians’ claims that mere frequency of use does not translate to good ict literacy skills, and points article title | author 11testing information literacy in digital environments | katz 11 to the need for ict literacy instruction (oblinger and hawkins 2006; rockman 2002). ■ several other validity studies are ongoing, both at ets and at collaborating institutions. these stud­ ies include using the iskills assessment in pre­post evaluations of educational interventions, detailed comparisons of student performance on the assess­ ment and on more real­world ict literacy tasks, and comparisons of iskills assessment scores and scores from writing portfolios. ■ national ict literacy standards and setting cut scores in october 2006, the national forum on information literacy, an advocacy group for information literacy policy (http://www.infolit.org/), announced the formation of the national ict literacy policy council. the policy coun­ cil—composed of representatives from key policy­making, information­literacy advocacy, education, and workforce groups—has the charter to draft ict literacy standards that outline what students should know and be able to do at different points in their academic careers. beginning in 2007, the council will first review existing standards docu­ ments to draft descriptions for different levels of perfor­ mance (for example, minimal ict literacy, proficient ict literacy), creating a framework for the national ict literacy standards. separate performance levels will be defined for the corresponding target population for the core and advanced assessments. these performance­level descrip­ tions will be reviewed by other groups representing key stakeholders, such as business leaders, healthcare educa­ tors, and the library community. the council also will recruit experts in ict literacy and information­literacy instruction to review the iskills assessment and recommend cut scores corresponding to the performance levels for the core and advanced assess­ ments. (a cut score represents the minimum assessment score needed to classify a student at a given performance level.) the standards­based cut scores are intended to help educators determine which students meet the ict literacy standards and which may need additional instruction or remediation. 
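as a sketch of how such standards-based cut scores would classify examinees once adopted: the level names and numeric values below are placeholders only, since no actual cut scores had been set at the time of writing.

```python
# placeholder sketch of classifying a score against standards-based cut scores;
# the level names and numeric cut scores are hypothetical.

def classify(score, cut_scores):
    """cut_scores: list of (minimum_score, level_name), lowest threshold first."""
    level = "below minimal ict literacy"
    for minimum, name in cut_scores:
        if score >= minimum:
            level = name                 # keep the highest level whose cut score is met
    return level

advanced_cuts = [(400, "minimal ict literacy"), (500, "proficient ict literacy")]
print(classify(465, advanced_cuts))      # -> "minimal ict literacy"
```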
the council will review these recommended cut scores and modify or accept them as appropriately reflecting national ict literacy standards. ■ conclusions ets’s iskills assessment is the first nationally available measure of ict literacy that reflects the richness of that area through simulation­based assessment. owing to the 2005 and 2006 testing of more than ten thousand students, there is now evidence consistent with anec­ dotal reports of students’ difficulty with ict literacy despite their technical prowess. the results reflect poor ict literacy performance not only by students within one institution, but across the participating sixty­three high schools, community colleges, and four­year colleges and universities. the iskills assessment answers the call of the 2001 international ict literacy panel and should inform ict literacy instruction to strengthen these criti­ cal twenty­first­century skills for college students and all members of society. ■ acknowledgments i thank karen bogan, dan eignor, terry egan, and david williamson for their comments on earlier drafts of this article. the work described in this article represents con­ tributions by the entire iskills team at educational testing service and the iskills national advisory committee. works cited american library association. 1989. presidential committee on information literacy: final report. chicago: ala. available online at http://www.ala.org/acrl/legalis.html (accessed june 13, 2007). brasley, s. s. 2006. building and using a tool to assess info and tech literacy. computers in libraries 26, no. 5: 6–7, 43–48. breivik, p. s. 2005. 21st century learning and information literacy. change 37, no. 2: 20–27. dunn, k. 2002. assessing information literacy skills in the cali­ fornia state university: a progress report. journal of academic librarianship 28, no. 1/2: 26–36. international ict literacy panel. 2002. digital transformation: a framework for ict literacy. princeton, n.j.: educational testing service. available online at http://www.ets.org/media/ tests/information_and_communication_technology_lit­ eracy/ictreport.pdf (accessed june 13, 2007). katz, i. r. 2005. beyond technical competence: literacy in infor­ mation and communication technology. educational technology magazine 45, no 6: 144–47. katz, i. r., and a. macklin. 2006. information and communica­ tion technology (ict) literacy: integration and assessment in higher education. in proceedings of the 4th international conference on education and information systems, technologies, and applications, f. malpica, a. tremante, and f. welsch, eds. caracas, venezuela: international institute of informatics and systemics. katz, i. r., et al. 2004. assessing information and communications technology literacy for higher education. paper presented at the 12 information technology and libraries | september 200712 information technology and libraries | september 2007 annual meeting of the international association for educa­ tional assessment, philadelphia, pa. middle states commission on higher education. 2003. developing research and communication skills: guidelines for information literacy in the curriculum. philadelphia: middle states com­ mission on higher education. mislevy, r. j., l. s. steinberg, and r. g. almond. 2003. on the structure of educational assessments. measurement: interdisciplinary research and perspectives 1: 3–67. oblinger, d. g., and b. l. hawkins. 2006. the myth about stu­ dent competency. educause review 41, no. 2: 12–13. oblinger, d. g., and j. l. 
oblinger, eds. 2005. educating the net generation. washington, d.c.: educause, http://www. educause.edu/educatingthenetgen (accessed dec. 29, 2006). partnership for 21st century skills. 2003. learning for the 21st century: a report and mile guide for 21st century skills. washington, d.c.: partnership for 21st century skills. rockman, i. f. 2002. strengthening connections between infor­ mation literacy, general education, and assessment efforts. library trends 51, no. 2: 185–98. ———. 2004. introduction: the importance of information lit­ eracy. in integrating information literacy into the higher education curriculum: practical models for transformation. i. f. rockman and associates, eds. san francisco: jossy­bass. the california state university. 2006. information competence initiative web site. http://calstate.edu/ls/infocomp.shtml (accessed june 4, 2006). university of central florida. 2006. information fluency initiative web site. http://www.if.ucf.edu/ (accessed june 4, 2006). western association of schools and colleges. 2001. handbook of accreditation. alameda, calif.: western association of schools and colleges. available online at http://www.wascsenior .org/wasc/doc_lib/2001%20handbook.pdf (accessed dec. 22, 2006). automatic retrieval of biographical reference books cherie b. well: institute for computer research, committee on information science, university of chicago, chicago, illinois 239 a description of one of the first pro;ected attempts to automate a reference service, that of advising which biographical reference book to use. two hundred and thirty-four biographical books were categorized as to type of subjects included and contents of the uniform entries they contain. a computer program which selects up to five books most likely to contain answers to biographical questions is described and its test results presented. an evaluation of the system and a discussion of ways to extend the scheme to other forms of reference work are given. ideally the reference librarian is the "middleman between the reader and the right book" ( 1 ) , and this is what the program here described is intended to be. in the past there has been very little interest shown in automating this service, probably because it is neither urgent nor practical in current reference departments. many developments in automating other areas of libraries have indirectly benefitted reference librarians, and the literature primarily emphasizes this aspect. for instance, where circulation systems have been automated, the location of a particular volume can be quickly ascertained and librarians need not waste time searching. automation of the ordering phase provides them with information on the processing stage of a new volume. if the contents of the catalog have been put in machine readable form, special bibliographies can be rapidly produced in response to a particular request or as a regular service of selective dissemination. the development of kwic (key word in context) in240 journal of library automation vol. 1/ 4 december, 1968 dexes, which are compiled and printed by computer, has enabled publishers to provide indexes to their books much faster. computers have also been programmed to make concordances and citation indexes ( 2). the combination of paper-tape typewriters, computer and a photocomposer has introduced automation into compiling index medicus (3). changes in reference services themselves, however, may make automation of question-answering practical. 
one trend is toward larger reference collections to be shared by several libraries; some areas have already set up regional reference services. there are also cooperative reference plans whereby several strong libraries agree to specialize in certain fields and cooperate in answering questions referred by the others (4). these trends will mean two things to reference librarians: greater concentration of resources, allowing more specialized books and mechanization; and screening of questions at the local level, letting reference centers concentrate on more complex questions that utilize their specialized books. thus it seems likely that special reference centers may look increasingly toward mechanizing their services, and retrieval schemes of the type presented here will be important to consider.
basic assumptions
the categorizing system was based on two nearly universal generalizations about biographical reference books: 1) they are consistently confined to biographies of persons who have something in common: for example, being alive or dead; or having the same nationality, sex, occupation, religion, race, memberships; or possessing some combination of those attributes. these common characteristics in the people covered by a given book are herein called "exclusive categories." 2) the books generally maintain uniform entries for each subject; that is, they give the same data for each biography. these facts are referred to herein as "specifics" or "specific categories." certain assumptions were made about reference work: 1) all biographical reference books fit into the scheme and can be categorized. 2) the more limited a book's scope, the more likely it is to contain the person a user wants to find. in other words, if a user is interested in a dutch economist, he is more likely to find information in a book limited to dutch economists than in a general biographical dictionary. the user, however, does not want to miss any source that might be useful. therefore a general biographical dictionary should be given to him as a last resort, after books on dutch economists, dutchmen of all occupations, and economists of all nationalities. 3) certain requirements, the specifics, have no substitutes. for example, a book lists addresses or it does not, and if a user wants an address, books without them are useless. there is merit in suggesting to a user which book to use as opposed to giving him the direct answer to his question. probably the best argument for this assumption is that the volume of names that would have to be compiled and stored for a direct inquiry system is staggering, only a small number would ever be looked up, and it is impossible to predict which ones would be searched for. there are advantages to mechanizing this particular task of a reference librarian: good reference librarians should be freed to perform work less easily mechanized; there are not enough reference librarians who have perfect recall of their collections even to the point of knowing which exclusive categories all the books fit into; and no librarian could have complete recall as to the specifics contained in each biographical reference book in the collection.
the computer program
the program was written in the comit language, a non-numerical programming language developed for research in mechanical translation, information retrieval and artificial intelligence.
it is a high-level problem-oriented language for symbol manipulation, especially designed for dealing with strings of characters. the program could probably be converted to other list-processing languages (6) for operation at other installations. the program was run at the university of chicago computation center on an ibm 7094 having the comit system on a disk. questions were submitted and run in large batches.
the data
all biographical reference books in english, with alphabetical ordering of subjects, which are in the reference room of the university of chicago's harper library were included in the data and no other books were included. since one assumption was that all biographical reference books could be categorized by the scheme, it seemed more useful to prove the system could handle any biographical reference tool than to compile a balanced list of biographical books. there was no difficulty in categorizing the books. all books are categorized in the following way. first an arbitrary abbreviation for the book is chosen to be its entry in the file; it is referred to as a "constituent." each book is then described by determining the values of nine subscripts each constituent carries, the subscripts being sex, living, nat (nationality), occup (occupation), min (minorities), date, index, spec1 and spec2 (specifics). values of the first five subscripts, the exclusive categories, are determined first. that is, is the book limited to one sex? are all the subjects living or dead? do they all have a certain occupation? does the book include only certain nationalities? or is there another restriction; e.g., to alumni of a college, members of the nobility or a religious group? the exclusive categories for a book are determined and coded from a table of abbreviations. sex, for example, allows three values: restricted to males (m), restricted to females (f), or no restriction (z). also a value x must occur with m or f, indicating there is a restriction. therefore sex can have the following combinations: sex z, sex f x, or sex m x; the values m x and f x are both the opposite of z. next the book's date is determined by asking "at what date did the values on living (yes or no) apply?" or, if the subjects are not restricted to living or dead (living z), "when was the book up to date?" next any indexes to the biographies are noted. all the biographical books list subjects in alphabetical order by surname. lists of subjects in any other order are considered indexes even if the subjects are actually listed in some other order in the main body and the list that is alphabetic by surname is an index. finally, specific categories (spec1 and spec2) are coded for such facts as birthdate, birthplace, college attended, degrees held, hobbies, illustrations, social clubs, and marital status. when all categorizing is finished, a data item is punched in this form: dictphilbio/ index field x, living n x, occup z, sex z, nat philip asian x, spec1 dc ds fl bp l cl cm dg e i z, date 50s x, spec2 p pl r ms pd z, min z +. this represents the dictionary of philippine biography, a book limited to dead filipinos and giving for each entry: dates, career, descendants, field, birthplace, long articles, class in college, degrees, education, picture, parents, publications, references, marital status and physical description. the book has a special index to find subjects by their field of work.
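to make the categorization concrete, the example entry above can be rendered as a small data structure. this is a minimal sketch in python rather than the article's comit, and the dictionary-of-sets layout is an assumption for illustration, not the original punched-card format; the subscript names and value codes are taken from the dictionary of philippine biography entry.

# a sketch (not the original comit data format) of one categorized book.
# each subscript holds a set of value codes; "z" means no restriction.
DICT_PHIL_BIO = {
    "id":     "dictphilbio",                  # arbitrary abbreviation ("constituent")
    "sex":    {"z"},                          # no sex restriction
    "living": {"n", "x"},                     # restricted to dead subjects
    "nat":    {"philip", "asian", "x"},       # restricted by nationality
    "occup":  {"z"},                          # any occupation
    "min":    {"z"},                          # no other group restriction
    "date":   {"50s", "x"},
    "index":  {"field", "x"},                 # special index by field of work
    "spec1":  {"dc", "ds", "fl", "bp", "l", "cl", "cm", "dg", "e", "i", "z"},
    "spec2":  {"p", "pl", "r", "ms", "pd", "z"},
}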
one specific value, that for a long article, requires special mention. though most biographical reference books provide the same facts about all the subjects in list form, a few provide different facts about different subjects in a narrative form. such books carry the spec1 l, and the other specifics these books are listed as providing are not always given for every subject. for example, a book with a list format may provide the birthplace for every subject when it can be ascertained, but in a book using the narrative form, where often different authors write the articles, birthplace is not necessarily given. books in narrative form are used less for quick reference; therefore the program provides a note, when a long article is requested, that the card catalog may provide more long articles on the subject. ease of file maintenance is one advantage of this system. as data is analyzed in the first place, if a new value for a category is required, such as an occupation which is not in the list, the new value is simply added under occup for that particular book and in the list of abbreviations for future use. it is a little more complicated to make an existing value more specific. for example, to differentiate botanist, chem, physics and astron and still maintain scientist as a general category embracing them all, another short program is required to retrieve the data to be reclassified.
coding the question
a biographical question can be quickly coded. the nine required subscripts are the same as those for the data books, but only one value for each subscript is necessary. for example, "what are the publications of a living dutch economist? a current book is desired." is coded as q / sex z (or m), living y, nat dutch, occup econ, min z, index z, date 60s, spec1 z, spec2 pl +.
operation of the program
briefly, the program reads in data and then the first question. it weeds out data items that can never be suitable, discarding all but those items that have the same values as the question has on the subscripts index, spec1 and spec2. it then weeds out data items that do not have either the same values as the question, or the value z, on the subscripts occup, nat, min, sex and living. after each weeding the program checks to determine that there are data items left; if all the books have been weeded out, there are no answers. there is also a provision to allow the user to designate certain titles to be ignored on a particular question in case he has already checked them, for example. all data items left after weeding are potential answers and could simply be printed out. however, subsequent searches over the remaining items serve the purpose of rearranging them into an order in which they are more likely to produce answers. it was decided that five answers are enough to judge the types of titles chosen yet few enough to avoid very long searches. a shorter list of answers would obviously be cheaper and a longer list more likely to produce a book containing the desired subject.
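the two weeding passes can be sketched as follows, reusing the dictionary-of-sets book representation from the sketch above; the helper names, the question layout, and the treatment of z as "nothing required" are assumptions for illustration, not the original comit code.

EXACT = ("index", "spec1", "spec2")            # specifics and index have no substitutes
EXCLUSIVE = ("occup", "nat", "min", "sex", "living")

# the dutch-economist question from the text, coded as sets of values
QUESTION = {
    "sex": {"z"}, "living": {"y"}, "nat": {"dutch"}, "occup": {"econ"},
    "min": {"z"}, "index": {"z"}, "date": {"60s"}, "spec1": {"z"}, "spec2": {"pl"},
}

def survives_weeding(book, question):
    # first weeding: the book must offer every index value and specific asked for
    for field in EXACT:
        needed = question[field] - {"z"}       # z = nothing required
        if not needed <= book[field]:
            return False
    # second weeding: each exclusive category must match the question or be unrestricted (z)
    for field in EXCLUSIVE:
        wanted = question[field] - {"z"}
        if wanted and not (wanted <= book[field] or "z" in book[field]):
            return False
    return True

def weed(books, question, ignore=()):
    """drop books that can never answer the question (or that the user asked to skip)."""
    return [b for b in books if b["id"] not in ignore and survives_weeding(b, question)]

print(weed([DICT_PHIL_BIO], QUESTION))   # -> []: a philippine biography cannot answer it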
ordering proceeds as follows: first values of subscripts sex, living, min, occup, nat and date on the question as originally stated are matched to those of books in the data. the computer is at this stage searching for books that are limited in just those categories in which the question is limited. for example, the question q / sex z, living y, min z, nat dutch, occup econ, index z, date 60s, spec1 z, spec2 pl + will match only those books published in the 1960's and restricted to living dutch economists which give publications for all the subjects (or the majority), and the books cannot be restricted to a sex or to any "minority" group. the books found may or may not have additional values on the subscripts; that is, a book may also contain french economists. such books found on the first search are most likely to contain the subject the questioner is looking for. if there are fewer than five books found which are a perfect match with the question, the program begins to alter the question. to make the least significant possible change in the question, the program changes the value of the subscript judged to be the limiting factor on the fewest books in the data, namely sex. if sex has a z as its value (because the questioner did not know the sex or did not prefer a book limited to one sex) it is changed to x so that a book limited to one sex will not be overlooked. if sex does not have a z value (which means it has either m x or f x), it is changed to z. this means the questioner preferred books limited to one sex but presumably his second choice is books not limited to any sex. clearly if the question has sex f x it can never be changed to sex m x or sex x, since sex x will find books in the data classified sex m x. anything other than z changes to z, and z only changes to x. after this change is made, another search is conducted and the answers counted. until there are five books or the data is exhausted, the original question is altered and the cycle continued. alterations proceed by changing the values of one subscript at a time in the following order: sex, living, min, nat and occup. then they are changed two at a time, three at a time, four at a time, and finally all five are changed, so there are thirty-one possible changes. if at the end of the thirty-second search there are still not five answers and there are more data items, the date restriction on the question is checked. if date has a value other than z, it is changed to z, which matches all the data items, and the computer prints a note if this is done; the program will then select any book regardless of date. control returns to search and begins the cycle again, continuing until five answers are found or the data is exhausted. after searching is finished, the writing routine commences. one at a time the computer takes each answer, writes out its code for possible further reference, and then writes out the complete author, title, copyright date and library of congress call number, all of which the computer finds in a list within the program.
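the alteration cycle can be sketched on top of the weed() helper and question layout above; the flip rule, the date relaxation, and the exact search order below are simplified assumptions meant to mirror the description, not a transcription of the original program.

from itertools import combinations

ALTERABLE = ("sex", "living", "min", "nat", "occup")   # least to most significant

def flip(values):
    """z (no preference) becomes x (accept restricted books); any restriction becomes z."""
    return {"x"} if values == {"z"} else {"z"}

def variants(question):
    """the original question, then the 31 alterations (1, 2, ... 5 subscripts flipped)."""
    yield question
    for k in range(1, len(ALTERABLE) + 1):
        for fields in combinations(ALTERABLE, k):
            q = dict(question)
            q.update({f: flip(question[f]) for f in fields})
            yield q

def date_ok(book, question):
    wanted = question["date"] - {"z"}
    return not wanted or wanted <= book["date"] or "z" in book["date"]

def search(books, question, wanted=5, ignore=()):
    answers = []
    for relax_date in (False, True):           # the later searches drop the date limit
        for q in variants(question):
            for book in weed(books, q, ignore):
                if not relax_date and not date_ok(book, q):
                    continue
                if book not in answers:
                    answers.append(book)
                if len(answers) == wanted:
                    return answers
    return answers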
results
to obtain some measure of the program's accuracy, fourteen textbook questions, probably more challenging than the average patron would ask, were submitted to the computer and to a professional librarian who was especially familiar with biographical reference books (see figure 1 for sample questions and results). the librarian spent a total of an hour and a half, and found answers to eleven out of fourteen questions. on the three she could not answer she felt she had exhausted the resources. in one of the eleven she answered ("how many americans won the nobel prize in medicine between 1930 and 1950?") she found the answer in a source not specifically biographical (world almanac) and therefore not in the computer's data. no problems occurred in forming the questions for submission to the computer.
fig. 1 (sample reference questions) reproduced three of the test questions with their codings, the librarian's choices, and the computer's choices: a request for at least twenty references to biographical information about dmitri mendeleef, russian chemist; a request for the academic degrees of professor reuben l. hill; and a genealogical query about jacob billings. each title returned was rated a (it has the answer or at least part of it), b (good choice but it does not have the answer), c (reasonable choice but there are better ones), or d (poor choice).
the program found some reasonable sources in all cases. it found books containing the answer in ten out of fourteen cases, the four answers not found being those three the librarian missed and the one requiring an almanac. in all but one case there were more possibilities than the five books given per answer. some questions were rerun ignoring the first five answers, and five more titles were retrieved; even then there were more possibilities. in some cases the program did better than the librarian because she wasted time looking in sources that did not give the specifics sought. for instance, when the question asked for the pronunciation of the surname of paul and lucian hillemaker, french composers, she looked in dictionaries that do not give pronunciation. the computer found the only four possible sources immediately. in other cases the program came up with rather far-fetched answers a human would have skipped.
a question asking for biographies of franz rakoczy, an hungarian hero, retrieved in its second five sources three jewish encyclopedias and a book on composers! these were not wrong and, in cases where occupation or minority group affiliations were unknown, these might be good sources. as an answer to the nobel-prize-winner question the computer retrieved sources on american doctors, nobel winners and scientists, which are the best choices from the data and would have the answers buried in them. however, what is really required is an index to award winners, and there were none in the data. the test revealed the necessity for allowing questions to have dummy values; that is, ones not used in the data. for instance there are no books limited to botanists, so occup botanist is not allowed in a question, though occup scientist is, and chem and physics are included as more specific values under scientist. asking for occup scientist when searching for a botanist avoids getting books devoted to nonscientific occupations but also gets books devoted to chemists and physicists. since one would want these books if he did not know the scientist was a botanist, that should not be changed. if he asks for occup botanist he wants books devoted to botanists first, then scientists in general. a short-term solution is to have dummy values to stand for all these other values. for example occup other-scientist could include all scientific occupations except those specifically listed, and it would retrieve books limited to all scientists but not to specific scientific occupations mentioned in the data. a long-term solution is to use a computer language allowing tree-structured data. presently this problem does no more than cause extraneous retrievals which the person using the list can easily skip. discussion advantages of the scheme can be speculated. from the library's point of view its virtues are that it is simple and inexpensive. original implementation would not require a major block of time to be spent in human indexing or abstracting. operating costs would be low because it does not require such a large store of information in memory that several tapes must be searched, and because updating the file is simple. when a new retrieval of biographical reference books/ well 247 book is added, an experienced person could categorize it in five minutes, punch a new data card and, if required, add to the list of values in the table of abbreviations. the system could provide useful information to other departments. it could keep tallies for the acquisitions department of how often a book is given as an answer, indicating whether new editions of it or similar books would be good buys. from the user's point of view the system avoids a major pitfall of some retrieval schemes which retrieve on the basis of ambiguous terms or association chains; that is, missing relevant items. if the user resubmits the same question ignoring already retrieved books each time, he will eventually have a comprehensive list of possible sources in the data that have the index and specifics he requires. a user also wants his information as brief as possible, listed in order of importance and with no extraneous answers ( 7); this requirement could be met as the program stands by having a human simply cross out any unnecessary titles. users like to know the reliability of the information ( 7) ; this detail could be provided along with the titles. users also want speed and convenience. 
as it stands, this system could be made available to users of the university of chicago library tomorrow with no more equipment than is presently in the computation center. time delay in the present implementation could be remedied by using an on-line system. users often prefer to be given facts themselves and not just citations (7). a program that gives biographical facts directly has no connection with this scheme or classification system, but the output of this program could be used as a tool by a librarian to find the answer for a patron.
bibliographies
the most obvious area to which the retrieval scheme could be extended is that of bibliographies. like biographies, they are limited in their scopes to certain exclusive categories, and they contain the same specific facts for each entry. logical exclusive categories could be: nationality, form (with such values as drama, poetry, fiction, maps, etc.), subject (probably the most frequently used criterion on which to select books for a bibliography), and date. since there is no living with which to connect date, date here should probably have not just the most recent relevant date but as many values as necessary. for instance date 40s 50s 60s would apply to an index that began publication in the 1940's and is current. then a request for any of those dates would find it. possible specifics include number of pages, the cost, or a facsimile of the title page. arrangement would be needed, being different from index in that bibliographies, unlike biographies, cannot be assumed to have the same order (alphabetic by subject's name) plus indexes in other orders. arrangement would list as values all the ways the contents of the bibliography could be approached: by subject, author, title, chronology or a combination of these.
dictionaries
dictionaries also lend themselves well to this type of scheme; one exclusive category, subject, might even be adequate for dictionaries. dictionaries' special subjects could be broken down into field (such as chemistry or business) and type (such as slang or geography), if necessary. language would be a specific category, since there are no substitutes for the language required. other possible specifics are pronunciation, definition, etymology and illustration.
atlases
atlases are also suited to the scheme. exclusive categories that seem appropriate are area covered, special subject atlases, and the size of the scale. scale should probably act as date does in the biographical program; that is, if a particular scale is requested, that would be searched for first and, if no answer is found, a note would be given and another search made for any scale. specifics for atlases could include items like topography, rainfall, winds, cities, highways and major products. factual books (those that give the highest mountain, the first four-minute mile, the january 10th price of u.s. steel, etc.) do not lend themselves to the scheme. because these books are not uniform as to entries and subject coverage, the list of possible specifics and exclusive categories would be extremely long and the number of searches consequently prohibitive. also, since such books are far fewer in number than biographical or bibliographical works, the proper one is easier to find by browsing.
conclusion a scheme for categorizing biographical reference books by their exclusive and specific categories makes it possible to automatically retrieve titles of those which would best answer reference questions. when tested it was found acceptable, with minor refinements, and it is easily adaptable to other reference book forms. such a system seems a logical direction in which to go when automation of actual reference functions is undertaken. acknowledgment the project under discussion was undertaken in partial fulfillment of requirements for the m. a. degree at the university of chicago's graduate library school. the computer program employed is detailed in the author's thesis ( 8). the work was partially completed under the auspices of aec contract no. at(ll-1)614. retrieval of biographical reference books / well 249 references 1. university of illinois library school: the library as a community information center. papers presented at an institute conducted by the university of illinois library school september 29-0ctober 2, 1957 (champaign, illinois: university of illinois library school, 1959), p. 2. 2. shera, jesse: "automation and the reference librarian," rq, iii, 6 (july 1964), 3-4. 3. austin, charles j.: medlars 1963-1967 (bethesda, national institutes of health, 1968). 4. haas, warren j.: "statewide and regional reference service," library trends. xii, 3 (january 1964), 407-10. 5. yngve, victor: com it programmers' reference manual (cambridge, mass.: m. i. t. press, 1962). 6. hsu, r. w.: characteristics of four list-processing languages (u. s. department of commerce, national bureau of standards, sept. 1963). 7. goodwin, harry b. : "some thoughts on improved technical information service," readings in information retrieval (new york, scarecrow press, 1964) , p. 43. 8. weil, cherie b.: classification and automatic retrieval of biographical reference books (chicago: university of chicago graduate library school, 1967). kwic index to government publications margaret norden: reference librarian, rush rhees library, university of rochester, rochester, new york 139 united states and united nations publications were not efficiently processed nor readily available to the reader at brandeis university library. data processing equipment was used to make a list of this material which could be referred to by a computer produced kwic index. currency and availability to the user, and time and cost efficiencies for the library were given precedence over detailed subject access. united states and united nations classification schemes> and existing bibliographies and indexes were used extensively. collections of publications of the united states government and the united nations are unwieldy and, often, unused. orne (1), kane (2), and morehead ( 3) have acknowledged that much of the output of proliferating governmental agencies and government supported research centers is hardly accessible. successful attempts to control the literature of a particular subject field, such as the indexes to the human relations area files and the american political science review, have been compiled by kenneth janda ( 4). others ( 5,6,7,8,) have described projects which apply the kwic index method of control to industrial research reports. no similar attempt to control government publications has been reported, although at northeastern university data processing equipment has been used to list united states material. 
the index developed at brandeis university library was designed to accommodate the varied government publications held by a library which served student, faculty and researcher alike.
materials and method
brandeis became a selective united states document depository late in 1965. two years later a government documents department was created to handle all united states publications, as well as those of the united nations. about 15,000 united states publications and a smaller number of united nations publications had previously been acquired and processed as a regular part of the library collection. this material formed the nucleus of the documents collection, to which some 3,000 pieces were added yearly. the new department ordered and received all publications issued by federal government agencies and the united nations, but processed and serviced only about 80% of them. materials that had been acquired for the science library or special collections, such as reserve, were directed to regular library processing departments. the materials retained were classified and arranged according to the superintendent of documents classification and the united nations scheme wherever such numbers were available. all previously cataloged items were removed from the regular collection and scheduled for reclassification. only where superintendent of documents and united nations numbers were not available was library of congress classification retained or assigned. the collection then consisted of material arranged in three sections according to the classifications of the superintendent of documents, the united nations, and the library of congress. the kwic index included all united states and united nations publications located in the documents department. the reader was reminded that additional material issued by those government publishers, housed elsewhere in the libraries, was included in the library catalog. prefatory material included a list of symbols and abbreviations. a two-part index to issuing agencies, represented by six-letter mnemonic acronyms, was arranged alphabetically by acronym, and by bureau name. the reader was cautioned to consult the united states government organization manual and a united nations organization chart for identification of government agencies and for tracing frequent changes in their structure and nomenclature. the documents list consisted of two parts: one, an accession number listing; and two, a kwic index to part one. upon arrival at the library, publications were numbered and ibm cards were punched according to format cards that described allocation of columns:
card 1: columns 1-6, item number; column 7, card number; columns 8-13, author agency; columns 14-79, title field; column 80, blank.
card 3: columns 1-6, item number; column 7, card number; columns 8-20, procedural data; columns 21-54, holdings; columns 55-79, classification number; column 80, blank.
cards one and three were punched for all documents; however, cards two and four were punched only where data exceeded the prescribed spaces on cards one or three. columns one through six were reserved for the accession numbers. a special punch in column one was used to identify united nations documents so that they were listed after the united states sequence. column seven indicated the card number for a given document and was suppressed in the print-out. the title field included not only the title, but series and number, personal author and monographic date where this information was suitable.
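as an illustration of the fixed-column layout just described, a card one image could be unpacked as follows; the function and the sample values are hypothetical, not part of the original data processing run.

# a sketch of reading card one of the documents list
# (columns 1-6 item number, 7 card number, 8-13 author acronym, 14-79 title field, 80 blank)
def parse_card_one(card: str) -> dict:
    assert len(card) == 80, "punched-card images are 80 columns"
    return {
        "item_number":   card[0:6].strip(),     # columns 1-6
        "card_number":   card[6],               # column 7 (suppressed in the print-out)
        "author_agency": card[7:13].strip(),    # columns 8-13, six-letter acronym
        "title_field":   card[13:79].strip(),   # columns 14-79
    }

# hypothetical card image modeled on an entry in the accession list
card = "0001311JNTPUB" + "TRANSLATIONS ON COMMUNIST CHINA".ljust(66) + " "
print(parse_card_one(card))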
the flexible field was used for any information for which the librarian wished kwic cards. a cross reference or explanatory note about the location of publications of a quasi-independent agency was incorporated in the title field. the procedural data included type of publication, binding and frequency information, accounting data and similar notations. a sample of part one has been reproduced in figure 1. part two of the list, the kwic index, was produced by an ibm 1620 computer, model one with 40k memory. an excerpt has been reproduced in figure 2. only cards one and two were put into the computer along with the program and dictionary of exceptions. cards three and four were not used to produce the kwic index. the program required production of cards for author acronyms and for all keywords found in the title field. except in the cases of author acronyms and first words, a keyword was identified by the fact that it followed a blank space. blanks were not necessary in these two cases because they were incorporated in the computer program. single letters, integers, and exceptions were not considered keywords. the index was printed so that the accession number always appeared on the left, and the author agency was followed by an asterisk and a space. the wraparound format usually associated with kwic indexes was abandoned to improve visual clarity.
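the keyword rule just described can be sketched briefly; the exception list, the function names, and the print format below are illustrative assumptions rather than the original 1620 program.

# a sketch of generating kwic entries for one document: one entry for the
# author acronym and one for each title word that is not a single letter,
# an integer, or a listed exception.
EXCEPTIONS = {"the", "of", "and", "on", "to", "for", "in"}   # illustrative only

def kwic_entries(item_number, author_acronym, title):
    words = title.split()
    keywords = [author_acronym] + [
        w for w in words
        if len(w) > 1 and not w.isdigit() and w.lower() not in EXCEPTIONS
    ]
    line = f"{author_acronym} * {title}"       # author acronym marked by an asterisk
    return [(kw, item_number, line) for kw in keywords]

entries = kwic_entries("131", "jntpub", "translations on communist china")
for keyword, number, line in sorted(entries): # the printed index is sorted by keyword
    print(f"{number}  {line}")                # accession number on the left, no wraparound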
results
about eight months after its inception, 2600 items had been entered on a separate documents collection list. the list had been printed offline on three-part print-out paper interleaved with carbon. after the papers were reassembled in looseleaf binders, they were made available in the documents and reference departments and in the science library.
fig. 1 (accession number list) reproduced a page of part one: for each accession number it showed the issuing-agency acronym (e.g., jntpub, libcon), the title field, procedural data such as price and holdings, and the superintendent of documents or other classification number.
fig. 2 (kwic index) reproduced an excerpt of part two: entries sorted by keyword, each carrying the accession number on the left and the author acronym followed by an asterisk, without the wraparound format of conventional kwic indexes.
the copies were usable only on the temporary basis for which they were intended. pages ripped easily in the binders. the printing on copies two and three, which were carbon copies, smudged readily. production of more permanent copies of the list was deferred until the catalog should be more complete. because of the preliminary nature of this project, no specific time accounting was made. there was an attempt to increase student assistant duties in order to save regular staff time. the librarian annotated superintendent of documents shipping lists to indicate which items required new punched cards. she omitted, for example, journals that were entered once with an open holdings statement. after annotation, punched cards for united states depository documents were made by student assistants who had been previously introduced to the allocation of columns and to punching procedures. for non-depository united states and united nations publications, the librarian mapped cards on 80-column format sheets. in the production of part two, the kwic index, staff was involved only to make cross references. since the kwic program had been designed to make entries for all words in the title field other than the dictionary of exceptions, cross references had to be interfiled manually after the kwic entries had been made and alphabetized. the cost of materials for the addition of 100 entries to the list is tabulated below:
ibm cards (800): $0.80 + freight
print-out paper (8 sets): $0.32 + freight
ibm 1620 computer rental (4 minutes): $1.68
total: $2.80 + freight
there was no charge for the use of the keypunch (ibm 026), sorter (ibm 082), and accounting machine (ibm 407), nor was the library charged by the computer center personnel who wrote the program. for the first 100 items, all cards were duplicated in production as insurance against destruction (thus the card expense itemized above was doubled). the duplicate deck was later eliminated because the time spent in duplicating and interpreting these cards was greater than that required to repunch the deck from the list entries. storage space was available without cost, and no new storage equipment was purchased. the kwic program was written so that keyword cards were made for all words in the title field except listed exceptions, single letters and integers.
it seemed at the inception of the project that such a program, which allowed untrained assistants to punch cards with a minimum of difficulty, was preferable to one that involved tagged keywords. however, the necessary filing and removal of cross references subsequently proved an inconvenience when the list was updated and reprinted.
discussion
the productivity of government publishers has directed so much material into the library that ordinary procedures have been overtaxed. card catalog entries, for example, have become tardy, cumbersome, and incomprehensible to the library user and expensive for the library. the kwic list was designed as a substitute; however, it was useful only where the subject of a publication had been fairly reflected by its title. the possibility of incorporating descriptors in the title field of the list was considered, but rejected in the interests of speed and efficiency. the list depended upon standard reference sources for more complete subject and analytic cataloging. most often used in the case of united states publications were the superintendent of documents monthly catalog (9) and its auxiliaries such as the bureau of the census catalog (10). other sources included: wilson, popular names (11), the readers guide, the social science and humanities index, the business periodicals index, the index to legal periodicals, and the commerce clearing house index. for the united nations publications, greater use was made of the trade publications such as the periodic check list (formerly the monthly sales bulletin), the international reporter, the unesco catalogue, and the publishers trade list annual section for "unesco." the kwic index also was limited in that it covered only documents in the library's collection. while the user was convenienced by the ready availability of all items listed, he was obliged to consult reference sources for other existing documents. the new tool had advantages similar to book catalogs in terms of space saving and ease of duplicating. although originally only three copies were made, the possibility of duplication and distribution of this list to interested academic departments had been considered. it was also intended that new punched cards would be used to produce lists of new accessions, which would be duplicated and circulated. the problem of updating involved reprinting of part two, the kwic index, after previous inter-alphabetizing of entries. part one, however, was not reprinted as new entries were added successively. corrections were made by duplicating parts of cards and punching where necessary. the availability and currentness of such a list would presumably have encouraged the faculty and students to make greater use of these materials, and eliminated duplication of purchase orders. a major drawback to the list was that its arrangement, by accession numbers, bore no particular logic. a classification number arrangement would have been more meaningful to the reader; it would also have served as a shelf list and provided material for subject holdings lists. however, the ibm cards were not so arranged because neither the mechanical nor manual sorting of multi-digit and letter numbers was practical. arrangement by superintendent of documents numbers was employed at northeastern university, boston, massachusetts, and proved so inadequate that the librarian added subject headings to documents punched cards.
this extra time-consuming step, plus the need to manually file punched cards, influenced the author to abandon shelf list order. a second difficulty involved in the kwic project was the dependence of the library upon use of equipment owned by another agency. it was conceivable that alterations in the equipment, policies or personnel of the university computer center could enforce changes on the library's listing procedure. this evaluation of the kwic index excluded considerations of the separation of the documents and reference departments. this matter has been thoroughly discussed elsewhere ( 12) . two other subjective considerations appeared during the first year of operation. most serious was the estrangement between documents and reference personnel. since both departments served the public, and their material was distinguished only by publisher, each staff relied extensively upon the other. cooperation and acquaintance with library material was difficult to maintain in two separate departments. because documents staff were primarily public service personnel, their extensive involvement in technical processes was not an efficient use of staff expertise. on the other hand, complete responsibility for this portion of library holdings insured that the staff became thoroughly acquainted with the collection and were better able to serve the public. conclusion the kwic index to government publications at brandeis is difficult to evaluate before the tests of time and use have been made. the system was suitable for the university library in that it was frequently consulted by the same, relatively sophisticated, users who were eager to familiarize themselves with library material. the kwic list itself emphasized currentness and flexibility at the expense of detailed subject access. this system attempted to utilize a potential goldmine of material without major investment or upheaval in the library. it has been sufficiently resilient to withstand a complete change of department personnel and was successful enough so that the possibility of expansion is being considered. note this report described the documents department as it functioned at its inception in september, 1967. the author left brandeis universityin june of 1968. the scope of the documents list was changed in september 1968 to include all united states and united nations publications acquired by the library. any inquiries about the present system should be directed to the current documents librarian: mr. michael abaray, goldfarb library, brandeis university, waltham, massachusetts 02154. kwic index 147 references 1. "report on the sixty-ninth meeting of the association of research libraries, new orleans, la. 1/8/67," lc information bulletin, 26 (january 26, 1967), 70. 2. kane, rita: "the future lies ahead: the documents depository library of tomorrow," library journal, 92 (november 1, 1967), 39713973. 3. morehead, joe: "united states government documents-a mazeway miscellany," rq, 8 (fall1968), 47-50. 4. janda, kenneth ed.: "advances in information retrieval in the social sciences," american behavioral scientist, 10 (january and february 1967). 5. sternberg, v. a.: "miles of information by the inch at the library of the bettis atomic power laboratory, westinghouse electric corporation," pennsylvania library association bulletin, 22 (may 1967), 189-194. 6. lawson, constance : "report documentation at texas instruments, incorporated," special libraries association, texas chapter bulletin, 15, (february 1964), 14-17. 7. 
minton, ann: "document retrieval based on keyword concept," special libraries association, texas chapter bulletin, 15 (february 1964), 8-10. 8. bauer, c. b.: "practical application of automation in a scientific information center-a case study," special libraries, 55 (march 1964), 137-142. 9. united states. superintendent of documents: monthly catalog of united states government publications (washington: government printing office, 1895). 10. u. s. bureau of the census: bureau of the census catalog (washington: government printing office, 1945). 11. wilson, donald f. and william p. kilroy, comps.: popular names of united states government reports, a catalog (washington: government printing office, 1966). 12. shaw, thomas shuler, ed.: "federal, state and local government publications," library trends, 15 (july 1966), 3-194.
patrick griffis
building pathfinders with free screen capture tools
this article outlines freely available screen capturing tools, covering their benefits and drawbacks as well as their potential applications. in discussing these tools, the author illustrates how they can be used to build pathfinding tutorials for users and how these tutorials can be shared with users. the author notes that the availability of these screen capturing tools at no cost, coupled with their ease of use, provides ample opportunity for low-stakes experimentation from library staff in building dynamic pathfinders to promote the discovery of library resources.
one of the goals related to discovery in the university of nevada las vegas (unlv) libraries’ strategic plan is to “expand user awareness of library resources, services and staff expertise through promotion and technology.”1 screencasting videos and screenshots can be used effectively to show users how to access materials using finding tools in a systematic, step-by-step way. screencasting and screen capturing tools are becoming more intuitive to learn and use and can be downloaded for free. as such, these tools are becoming an efficient and effective method for building pathfinders for users. one such tool is jing (http://www.jingproject.com), freeware that is easy to download and use. jing allows for short screencasts of five minutes or less to be created and uploaded to a remote server on screencast.com. once a jing screencast is uploaded, screencast.com provides a url for the screencast that can be shared via e-mail or instant message or on a webpage. another function of jing is recording screenshots, which can be annotated and shared by url or pasted into documents or presentations. jing serves as an effective tool for enabling librarians working with students via chat or instant messaging to quickly create screenshots and videos that visually demonstrate to students how to get the information they need. jing stores the screenshots and videos on its server, which allows those files to be reused in subject or course guides and in course management systems, course syllabi, and library instructional handouts. moreover, jing’s file storage provides an opportunity for librarians to incorporate tutorials into a variety of spaces where patrons may need them in such a manner that does not require internal library server space or work from internal library web specialists. trailfire (http://www.trailfire.com) is another screen capturing tool that can be utilized in the same manner.
trailfire allows users to create a trail of webpage screenshots that can be annotated with notes and shared with others via a url. such trails can provide users with a step-by-step slideshow outlining how to obtain specific resources. when a trail is created with trailfire, a url is provided to share. like jing, trailfire is free to download and easy to learn and use. wink (http://debugmode.com/wink) was originally created for producing software tutorials, which makes it well suited for creating tutorials about how to use databases. although wink is much less sophisticated than expensive software packages, it can capture screenshots, add explanation boxes, buttons, titles, and voice to your tutorials. screenshots are captured automatically as you use your computer on the basis of mouse and keyboard input. wink files can be converted into very compressed flash presentations and a wide range of other file types, such as pdf, but do not support avi files. as such, wink tutorials converted to flash have a fluid movie feel similar to jing screencasts, but wink tutorials also can be converted to more static formats like pdf, which provides added flexibility. slideshare (http://www.slideshare.net) allows for the conversion of uploaded powerpoint, openoffice, or pdf files into online flash movies. an option to sync audio to the slides is available, and widgets can be created to embed slideshows onto websites, blogs, subject guides, or even social networking sites. any of these tools can be utilized for just-in-time virtual reference questions in addition to the common use of just-in-case instructional tutorials. such just-in-time screen capturing and screencasting offer a viable solution for providing more equitable service and teachable moments within virtual reference applications. these tools allow library staff to answer patron questions via e-mail and chat reference in a manner that allows patrons to see processes for obtaining information sources. demonstrations that are typically provided in face-toface reference interactions and classroom instruction sessions can be provided to patrons virtually. the efficiency of this practice is that it is simpler and faster to capture and share a screencast tutorial when answering virtual reference questions than to explain complex processes in written form. additionally, the fact that these tools are freely available and easy to use provides library staff the opportunity to pursue low-stakes experimentation with screen capturing and screencasting. the primary drawback to these freely available tools is that none of them provides a screencast that allows for both voice and text annotations, unlike commercial products such as camtasia and captivate. however, tutorials rendered with these freely available tools can be repurposed into a tutorial within commercial applications like camtasia studio (http://www.techsmith.com/camtasia .asp) and adobe captivate (http://www.adobe.com/ products/captivate/). patrick griffis (patrick.griffis@unlv.edu) is business librarian, university of nevada las vegas libraries. 190 information technology and libraries | december 2009 as previously mentioned, these easy-to-use tools can allow screencast videos and screenshots to be integrated into a variety of online spaces. a particularly effective type of online space for potential integration of such screencast videos and screenshots are library “how do i find . . .” research help guides. many of these “how do i find . . 
.” research help guides serve as pathfinders for patrons, outlining processes for obtaining information sources. currently, many of these pathfinders are in text form, and experimentation with the tools outlined in this article can empower library staff to enhance their own pathfinders with screencast videos and screenshot tutorials.
reference
1. “unlv libraries strategic plan 2009–2011,” http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 30, 2009): 2.
bell laboratories' library real-time loan system (bellrel) r. a. kennedy: bell telephone laboratories, murray hill, new jersey bell telephone laboratories has established an on-line circulation system linking two terminals in each of its three largest libraries to a central computer. objectives include improved service through computer pooling of collections, immediate reporting on publication availability or a borrower's record, and automatic reserve follow-up; reduced labor; and increased management information. loans, returns, reserves and many queries are handled in real time. input may be keyboard only or combined with card reading, to handle all publications with the borrower present or absent. bellrel is now being used for some 1500 transactions per day. introduction as part of a continuing program to exploit available technology to improve library service, the technical information libraries system of the bell telephone laboratories has established an on-line, real-time computer circulation network. the initial configuration links two terminals in each of the holmdel, murray hill and whippany, new jersey, libraries to a central computer at murray hill. these are the three largest libraries in bell laboratories, handling 75% of a system total of more than 300,000 loans per year. the bellrel system is designed to process loans, returns, reservations and queries with real-time speed and responsiveness; additionally, it provides a wide range of other products and information basic to the effective control and use of library resources. the libraries of bell laboratories, like many other research libraries, have experienced unprecedented growth over the past decade in facilities, collections, services and traffic. new approaches have had to be found not only to supply information services of sufficient power and diversity to meet the needs of a communications research organization of over 15,000 people, but also to cope with the expanding volume of everyday work in its eighteen library units. as elsewhere, a large component of that work is circulation in all of its ramifications: direct service, record-keeping, follow-up, resource identification, inter-unit coordination, feedback for purchase and purge decisions, etc. the bellrel system is addressed to these problems within the context of the bell laboratories. the use of computers in circulation control is no longer novel. the studies done by george fry and associates for the library technology project of the american library association emphasize the expense of implementing computer-aided circulation systems (1, 2).
despite these studies, which tend to focus more on the gross costs of substituting data processing for manual techniques than on the immediate and long-range gains for the library as an information system, a trend to the computer is clear. southern illinois (3), lehigh (4), and oakland (5) are among the many university and research libraries which have automated circulation operations using the ibm 357 data collection system and batch processing. comparable systems are in use or planned by other libraries (6, 7). latterly there is increasing evidence of serious interest in real-time circulation control. the queen's university of belfast (8) and the state university of new york at buffalo (9) are two institutions reporting studies. redstone arsenal has been demonstrating a two-terminal, on-line system for about a year as part of a comprehensive automation program (10). the bellrel system was put into regular service in march, 1968, after two months of dry-run testing at all six terminals. this paper describes the reasons for changing from a manual system; the objectives established for the new system; the alternatives evaluated; the principal elements, operations and services of the selected system; and problems and performance in the brief period of operations to date. the paper is essentially a summary description; it does not report in detail on all card, disk and tape formats, maintenance procedures, products, logical operations, etc., of the system and its fifty-plus programs. a further report on bellrel will be published when significant experience has accrued. the displaced manual system the newark self-charge-signature system has been used by bell laboratories' libraries for some forty years. in this well-known simple system, the borrower writes his name and address on a prepared book card pulled from the book pocket. for the two out of three loans at bell labs where the borrower is not present, a circulation assistant fills out the card, which is then date stamped, tabbed for date due and filed by author. minor variations on this practice are used for unbound journals and other items lacking book cards. reservations for individuals or other libraries in the network are hand posted on the charge card. files are scrutinized for overdue dates every several days (latterly, less frequently as traffic has mounted) and notices prepared by xerox copying of the charge card on an appropriate form. although standard loan periods run from one to four weeks, depending upon the item and demand, about 30% of all loans result in overdue notices. each library in the network has maintained its own circulation records, including records for the local circulation of items borrowed on inter-unit loan. inter-unit traffic is heavy, although substantial duplication of important publications exists in the various libraries. the merits of the newark self-charge system (simplicity, fast handling of borrowers, relatively low cost) are widely known. the system is a venerable one; it works. but all circulation systems have imperfections, and in the bell laboratories long-recognized deficiencies of the manual system became increasingly unacceptable when loan traffic began to approach, then exceed, 200,000 items per year. these deficiencies included: 1. an increasing number of hours spent on the tedious and uninspiring tasks of sorting, tagging, posting, slipping, checking and husbanding cards. 2.
labor, frequent delays and poor service associated with processing over 60,000 overdue notices per year. 3. inability automatically to use the pooled resources of several libraries to meet demands. 4. inability to determine quickly not merely the holdings of other copies of a title in the library system (union catalogs serve this purpose, after some steps and card handling) but the availability of loan copies at the moment of need. 5. inefficiencies in tracking down missing publications, inventory items, etc. 6. inability to identify all publications currently on loan to a borrower or used by him sometime previously. 7. inadequate information on collection use for resource management. 8. excessive service delays due to combinations of the preceding factors. new system objectives the deficiencies listed above suggest some of the characteristics defined for the new system. library management concluded early in 1965 that any replacement for the existing system must: 1. meet the long-range needs of each of the major libraries in bell laboratories and be extensible to other units in the library network as traffic, experience and costs warranted. 2. provide not merely a more effective means for handling circulation operations within the walls of any one library but also, if possible, an instrument for knocking walls down, for bringing the combined resources of a number of libraries to bear on any information need. 3. handle all types of materials, bound or unbound, and all types of requests whether in person, by mail, direct telephone or recorded message (i.e., telereference) service. 4. give immediate up-to-the-minute accounting for all items on loan or otherwise off the shelves and locate copies still available for loan. 5. hold reservations against system resources (in line with objective 2) and direct the first copy returned, wherever returned, and as automatically as practical, to the first person on the reserve queue, whatever his base location. 6. identify promptly all items currently charged to a borrower and, as required, previously borrowed by him. 7. monitor circulation traffic and generate, as necessary, overdue notices, missing item lists, high-demand lists, zero-activity reports, statistics, use analyses and other feedback fundamental to effective control and management of the collection. 8. lift the circulation staff from clerical tasks to more personal service to library users, in the interest of the "human use of human beings," to use norbert wiener's phrase. 9. integrate the loan system with other computer-aided systems in use or planned in the libraries. 10. improve the total response of the library to the user. systems evaluated in view of these objectives it will be apparent that only a computer-aided system could be seriously considered. none of the several dozen noncomputer systems surveyed in the fry report (1) could be considered a worthwhile alternative to the libraries' manual system. the essential questions therefore became: off-line or on-line access? batch or real-time processing? the demonstrated success of the ibm 357 batch processing circulation system compelled study and on-site investigation in several libraries. it was concluded, however, that while the 357 system would meet a number of the established goals, and at moderate cost, the important objectives of immediate accountability, automatic follow-up on reserves, full disclosure of copies available for loan, and automatic pooling of network resources would be seriously compromised.
further, the fact that two-thirds of all loans made in bell laboratories do not involve the presence of the borrower substantially detracted from one of the major virtues of the 357 system, i.e., the simplicity of input using a pre-punched man (identification) card submitted by the borrower. the various alternatives for coping with this situation in a 357 system, for 200,000 loans a year and a potential of over 15,000 people, were not attractive. the feasibility of on-line access has been widely demonstrated in the research and business world. remote, on-line computer processing is clearly a common course of the near future. equally predictably, it will steadily give more favorable cost/value ratios as machine costs decrease and labor costs mount. in sum, the technical information libraries concluded that an on-line system was worth the investment and that no other system was worth the price. only an on-line approach would meet the overall objectives for a new system and offer advantages sufficient to justify conversion effort at this time. as frederick ruecking has observed, "a charging system should not be selected because it is 'cheaper' than others. if the selected system does not meet the present and future needs, the choice is poor." (11) the bellrel system bellrel is a joint development of the technical information libraries and the comptroller's division of bell laboratories. the system was designed, programmed and implemented in a little over two years, beginning in late 1965. during this time, preparation of the bibliographic records, system design and programming took about seven man years. basic machine elements the initial network is illustrated in figure 1 (fig. 1. bellrel circulation system network). the two ibm 1050 terminals in each of the three libraries incorporate keyboard, printer and card reader facilities for maximum flexibility in handling all types of transactions and queries. each terminal is linked by telephone lines, using western electric 103a data-sets, to an ibm 360-40 computer in the comptroller's division at murray hill. the murray hill library is only a building away from the computer. the holmdel and whippany libraries are about thirty and twelve miles distant, respectively. the computer, in heavy daily use along with other computers for regular operations of the comptroller's division, has a 262,000 byte (character) core memory. core is partitioned, permitting effective simultaneous use of the computer for routine batch operations and the bellrel system. in addition to core requirements for the 360 operating system, core partitions include (a) the teleprocessing logic of the ibm queued teleprocessing access method (qtam), (b) message editing logic and application logic packages, including library applications, and (c) batch processing programs and operations for all purposes.
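the figure 2 flowchart that follows depicts this real-time flow: a message received from a terminal is queued, edited, handed to application logic that reads and updates the disk files, and answered with a queued response. the sketch below is an illustrative modern reconstruction of that cycle in python; bellrel itself was written in cobol and assembly, and every name here is invented for illustration.

from queue import Queue

# illustrative stand-ins only: these model the receive -> queue -> process ->
# respond cycle of the real-time logic, not the actual bellrel programs.
VALID_CODES = {"lm", "ln", "lc", "lk", "la", "lb", "ld", "lp", "lr", "lg"}

inbound = Queue()    # messages received from the 1050 terminals
outbound = Queue()   # responses queued for sending back

def edit_message(raw):
    """message-editing logic: split a keyed transaction into code and data."""
    code, data = raw[:2].lower(), raw[2:]
    if code not in VALID_CODES:
        return None, "invalid transaction code"
    return (code, data), None

def apply_logic(code, data, disk):
    """application logic: consult and update disk records, build a response."""
    # a real handler would read man and publication records and update them here.
    return f"{code} processed for {data}"

def poll_once(disk):
    """one pass of the real-time loop for a single queued message."""
    terminal, raw = inbound.get()
    parsed, error = edit_message(raw)
    reply = error if error else apply_logic(*parsed, disk)
    outbound.put((terminal, reply))

inbound.put(("mh-1", "lm43486"))
poll_once(disk={})
print(outbound.get())    # ('mh-1', 'lm processed for 43486')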
figure 2 (general flowchart of bellrel real-time programming logic) illustrates the core partitioning for (a) and (b): a message from a 1050 terminal is received and queued by the teleprocessing logic, edited by the message editing logic, and processed by the application logic, which refers to the disk files, updates them and generates a response that is queued for sending back to the terminal. in addition to the programs resident in core (portions of which can be overlaid as necessary by other real-time operations), certain programs for particular functions (e.g. loan, return, etc.) are called from disk as needed. in all, 32 real-time and 23 batch programs, together with the 360 operating system, are used by bellrel. the programs are written in cobol level f and basic assembly language. disk records publication and man records are stored on an ibm 2314 disk pack with a capacity of some 29,000,000 characters. about two-thirds of this space is in use or dedicated. the man records, which are up-dated daily from tape used for telephone directory, payroll and other purposes, cover about 19,000 people including btl employees and resident visitors, i.e., contractual people who may also use library facilities. each man record is 161 characters in length and contains such information as payroll account number, name, department number, telephone number, location, occupational class, space for three book loans, keys referring to overflow loan trailers elsewhere on disk, etc. the man file is organized by payroll account number, a five-digit number which is keyed in or read from a prepunched card for all loans, reservations and other transactions requiring it. access to man records on disk is by the ibm index sequential access method (isam). publication records vary in format, length and method of access depending upon the class of publication. five classes of publications are currently in the system: books (class 1), journals (class 2), trade catalogs (class 3), college catalogs (class 4) and dewey-classified continuations and multiple-volume titles cataloged as sets (class 5). other classes of information, e.g., documents, motion picture films, etc., will be added. each title in each class is assigned a unique six-digit identification number, the first digit of which identifies the class. a typical number for a monograph title is 127391. the punched cards and book labels for each copy of this title also indicate the holding library and its copy number, e.g., 127391mh01, 127391wh05. a sample card and label, generated by the computer, are shown in figure 3. as noted above, books fall in two classes, 1 and 5. each class provides a maximum of 100,000 title numbers, more than adequate for the predicted growth of the technical information libraries where weeding is heavy. the book collections for the three libraries now on disk total about 33,000 titles and 66,000 volumes. the disk record for each class 1 title is 188 characters in length and contains the book number, 43 characters of author-title, the call number, copies by location, the fields for file maintenance change information, three loans, two reserves, keys to loan trailers and reserve trailers, etc.
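as a concrete reading of the numbering scheme just described (a six-digit identification number whose first digit gives the class, followed on cards and labels by the holding library and copy number, e.g., 127391mh01), the short python sketch below parses such a label. the field widths are assumptions inferred from the examples in the text, not a documented format.

CLASSES = {               # first digit of the six-digit identification number
    "1": "book",
    "2": "journal",
    "3": "trade catalog",
    "4": "college catalog",
    "5": "continuation / multiple-volume set",
}

def parse_label(label):
    """split a card or label string such as '127391mh01' into its parts.

    assumed layout, inferred from the examples in the text: six-digit item
    number, two-letter holding-library code (mh, wh, ...), two-digit copy number.
    """
    item_no, library, copy_no = label[:6], label[6:8], label[8:10]
    return {
        "class": CLASSES.get(item_no[0], "unknown"),
        "item_number": item_no,
        "library": library.upper(),
        "copy": int(copy_no),
    }

print(parse_label("127391mh01"))
# {'class': 'book', 'item_number': '127391', 'library': 'MH', 'copy': 1}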
each loan field identifies borrower, date due, copy and status of the loan (e.g., overdue number, renewed, number of reserves, returned). (fig. 3, bellrel book card and label, shows the computer-generated card and label for copy 102362mh01: holton, g./science and the modern mind, 500/h75.) the identification number for each new class 1 book is assigned by the computer on update runs. numbers are sequential. disk access is direct. class 5 books (cataloged continuations and multiple-volume titles cataloged as sets) share a different kind of disk record. they could all have been entered as class 1 items, in which case each volume of a set would have had a separate record on disk, a unique (not necessarily consecutive) identification number, and a separate listing in the author, call number and identification number printed catalogs. the class 5 approach, however, permits grouping of volumes in sets and series. ten volumes of one title are handled in one disk record, 288 characters in length, under the same identification number. additional volumes, up to a total of 100, are handled in succeeding records. all of the records of the set carry the same first five digits in their identification number. disk access is by the index sequential access method (isam). in addition to grouping sets, class 5 records effect a saving on disk space and permit use statistics to be derived for the set as a whole, as well as for each volume in the set. the principal disadvantage of the approach is that all keyed messages dealing with any volume in the set must cite both the basic access number and the specific data (e.g., volume number) pertinent to the volume in question. the journal disk records cover all the 2700 journal titles held in the library system. unlike books, however, records of all copies and volumes of each title are not permanently stored on disk. instead, each 155-character journal title record contains the journal identification number and 48 characters of title, plus fields for file maintenance changes, two loans, one reserve, and keys to loan and reserve trailers. specific bound volumes or unbound issues are recorded on this record only as long as they are current loan or reserve transactions. to expedite loans and returns, punch cards and computer-printed labels have been prepared for some 10,000 bound journal volumes. additional volumes are similarly processed as circulated or bound. disk records for trade catalogs and college catalogs are also 155 characters long. access to records is also by the index sequential access method. unlike journal volumes, however, each separate catalog is specifically identified and recorded on disk. when conversion is complete, more than 5000 catalogs will be accessible on disk. the loan and reserve trailers for each publication class accommodate overflow. trailer records vary in number and length depending upon function, publication type and predicted need. for example, 5000 31-character trailer records, each handling three reserves, are available for book reserves. for journals, 800 59-character records, each handling three reserves, are provided.
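the class 5 arrangement described above (ten volumes of a set sharing one 288-character record, with succeeding records for additional volumes up to a maximum of 100) implies a simple arithmetic mapping from a volume number to its record and slot. the python sketch below illustrates that inferred mapping only; the system's actual access was through isam, and no such routine is given in the article.

def class5_location(volume_number):
    """map a volume of a class 5 set to its disk record and slot.

    inferred from the description: ten volumes share one 288-character record,
    and further records hold additional volumes up to a set maximum of 100.
    """
    if not 1 <= volume_number <= 100:
        raise ValueError("class 5 sets hold at most 100 volumes")
    record_index = (volume_number - 1) // 10    # which 288-character record
    slot_in_record = (volume_number - 1) % 10   # position within that record
    return record_index, slot_in_record

# volume 23 of a set falls in the third record (index 2), third slot (index 2)
print(class5_location(23))    # (2, 2)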
the difference between the two trailer allocations reflects the heavier book traffic and the particularly sharp peaking of reserves on new book titles. apart from the normal safety back-up files (e.g., the nightly dump to tape of the current disk records), the only remaining machine record which requires mention is the history tape. this tape, up-dated daily, is a continuing record of all completed loans which provides information necessary for statistics and use analyses. on-line transactions twenty-two different transaction codes are currently available to handle loans, returns, renewals, reservations and queries in real time. in addition, any terminal can call another terminal by a single digit code and one terminal in each library can call the other two libraries simultaneously by a 'broadcast' code. this inter-library, typewritten message facility is a highly useful component of the total system. ten of the twenty-two transaction codes handle loans, returns, reservations and renewals. these codes, their prime functions and associated data inputs are listed in table 1. the eleven lq (library query) codes for requesting information from bellrel are listed in table 2. one additional code causes the computer to print out at the query terminal a statistical log of all classes of transactions at each terminal and their totals. it also gives the number of input errors made at each terminal. the log aids in adjusting work loads and monitoring performance. let us now consider several common transactions in more detail. loans: if the borrower is present, he gives the desired book to the circulation clerk. he shows his badge or, alternatively, writes his surname and five-digit number on a card. table 1 (on-line codes for loans, etc.) groups the ten codes as follows. loans: lm, loan of 1-5 items for one man at one time (input: man no. and item no., including location and copy, usually card read); ln, overnight loan, which assigns the overnight loan period automatically and does not pick up reserves on return. returns: lc, cancel loan and charge out automatically to the first person on the reserve queue (input: item no., with location and copy, usually card read); lk, cancel loan, no automatic charge-out. reserves: la, add to the reserve queue and report reserve no., copies held and available, etc. (input: man no. and item no., less location and copy, keyed); lb, bypass the reserve queue and put the designated man first; ld, delete from the reserve queue. renewals, etc.: lp, change the loan period assigned (input: new loan period and complete item no., keyed or card read); lr, renew a loan, once; lg, force renewal irrespective of reserves, overdue status, etc. recovery from errors may be done by aborting input, repeating it correctly or, if all elements are legitimate to the message edit program, by using the appropriate on-line code to correct the record. on the first day of full operations, 10% of the input transactions were incorrect. one week's experience reduced the error rate to 3%, and further improvement is expected. the .25% error rate estimated by lazorick and herling for a system planned to function without any prepunched cards (9) appears unrealistic. non-personal codes some thirty special codes, which function like man numbers in the system, are available to handle real-time transactions involving branch libraries, outside organizations and such internal library functions as charges to recataloging, repair, new book shelf, etc.
all are three-digit codes, essentially mnemonic, e.g., al9 allentown ( pa.) library; wi9withdrawn. most of the codes generate overdue notices; the codes for binding, missing, repair and a few oth~rs do not. several require backup manual records, e.g., ala interlibrary loan forms for charges to outside libraries. batch processes and products overdue notices and daily loan lists are produced in a nightly file maintenance run which also updates the history tape. the preprinted forms used for first and second overdue notices are address-sorted for direct mailing. the third notice, triggered three days after the second and ten days after the first, is a listing with telephone numbers and other data for telephone follow-up. the daily loan list is primarily a back-up record in the event of system down-time. current loans, the number of reserves and other information are combined in one list for all three libraries. the bellrel master book catalog is run quarterly from disk records. main entry, dewey number and access number catalogs are produced. all new copy; new title and other record changes made on disk in maintenance runs are reflected in cumulative weekly catalog supplements . these runs also produce all the new or changed cards and labels required. the bellrel catalog is a precursor to a system-wide printed book catalog which will replace nearly one million catalog cards held in eighteen libraries. when completely developed, input to the circulation system will be a sub-system of the master catalog maintenance procedures. maintenance of the disk journal records for bellrel follows a comparable integrated approach: journal code numbers, title abbreviations, data changes and the like are derived from the computer routines used to prepare the serials catalog since 1962. trade catalog files for the bellrel/kennedy 141 the book and the number of copies still available for loan at each library. getting one copy into the hands of the requester is then very simple. the holding library nearest to the borrower is instructed, by telephone or terminal message, to send call number such-and-such "out." the requester's name and address are not relayed. the holding library gets the book from the shelves and cancels it, using the lc command with the card reader. although this copy was not on loan, the computer ignores this fact because someone is waiting for the book, i.e., the requester whose reserve triggered this sequence. as a consequence of the cancel operation, the requester is automatically charged with the book, the holding library is told his name and address, and mailing follows. the lc command is also used in the same way to get additional copies of a book, when purchased to meet high demands, into the hands of the requesters. the la reserve transaction is put to particularly good use in handling the 600-plus requests received within a few days each month for new books announced in the library bulletin. bulletin request forms supply both item numbers ~nd man numbers. mass input follows and the computer responds with all the signposts needed to put every copy in the system to work, with a dispatch speed hitherto impossible to achieve. as shown in table 1, two transactions permit changes in reserve queues. ld deletes a requester. lb permits the queue to be bypassed and insertion of a new name at the top of the list. queries this is a fact retrieval facility. the codes listed in table 2 are reasonably self-explanatory, and take into account the realities of on-line circulation service. 
lqc, for example, tells the status of a title at the moment of asking, an up-to-dateness not available from the backup daily loan list generated each night. typical responses to the lqc code are: copies available, mh02 wh01; title removed my68; or all 03 copies loaned, 14 reserves. similarly, lql provides a requester with an immediate, printed listing of all the items he has on loan. two query codes cause display of the complete disk record for a publication (lqd) or a person (lqe), including current loans, reserves and trailer records. error detection in any keyboard operation, mistakes will be made. bellrel attempts to signal critical errors and prevent them from affecting records. as noted previously, input man numbers and item numbers are translated by the computer into alpha characters. numerous diagnostics are also returned: e.g., invalid transaction code; invalid book id #; invalid empl #; invalid transaction, bad copy #; variable data required, etc. incorrect inputs generating these and similar messages are not accepted by the system. table 2 (on-line library query codes; all queries are keyed) covers publications and people. publications: lqc, what is the status of title . . . ? (item no., less location and copy); lqs, what is the status of copy . . . ? (item no., with location and copy); lqn, what overnight items are still out? (location symbol only); lqd, display the complete disk record for title . . . (item no., less location and copy). people: lqm, how many items are on loan to . . . ? (man no.); lql, what items are charged to . . . ? (man no.); lqq, who is first on the reserve queue for . . . ?; lqr, is man . . . on reserve? where?; lqw, who are the borrowers of title . . . ?; lqz, who is man number . . . ? (man no.); lqe, display the complete disk record for man . . . . while the borrower is writing his surname and five-digit number, the clerk hits the 'request' button on the keyboard-printer, and inserts the book card in the card reader along with an end-of-transmission card. with the keyboard 'proceed' light on (2 seconds after 'request'), the clerk returns the typewriter carriage and keys in lm (the loan code) and the man number obtained from the borrower: e.g., lm43486. input is completed by activating the card reader. the card reader reads only to the end-of-block punch (column 16 in book cards), ignoring the author-title data and call number in columns 17-78. as the book identification number is read, it is listed on the typewriter. the loan period is not punched in the card, but is assigned by the computer on the basis of the first digit of the publication's identification number. the assigned loan may be altered from the keyboard, if desired. the computer responds to the loan transaction in 3 to 5 seconds, printing back in upper-case red the first three letters (trigram) of the borrower's surname and twenty characters of the book's author-title entry. these responses provide checks against errors in keying. man numbers are usually keyed (although they may also be card read) and a book number is keyed when its punch card is not available, e.g., in posting a reserve. as noted below, a wide range of other computer responses are available to flag errors and aid diagnosis. the loan transaction is completed by inserting the punch card in the book and date stamping it. total elapsed time from the borrower's presentation of the book to date stamping averages about 23 seconds for a single loan of the type described. this compares with about 20 seconds cited for one ibm 357 system (3) and 14 seconds in bell laboratories' manual system; however, in both these systems further processing is required.
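the single-loan input just walked through (keying lm plus the borrower's five-digit man number, such as lm43486, then card-reading up to the end-of-block punch in column 16) can be mimicked in a short python sketch. the validation rules and field positions below are assumptions drawn from the narrative, not the bellrel programs.

def parse_loan_input(keyed, card_columns):
    """parse one keyed loan transaction plus a card-read item field.

    assumptions for illustration: the keyed string is a two-letter loan code
    followed by the five-digit man number (e.g. 'lm43486'); the card is read
    only to the end-of-block punch in column 16, so the first 16 columns carry
    the item number with library and copy, and author-title in 17-78 is ignored.
    """
    code, man_no = keyed[:2].lower(), keyed[2:]
    if code not in {"lm", "ln"}:
        raise ValueError("invalid transaction code")
    if not (man_no.isdigit() and len(man_no) == 5):
        raise ValueError("invalid empl #")
    item_field = card_columns[:16].strip()
    return {"code": code, "man": man_no, "item": item_field}

card = "102362mh01      holton, g./ science and the modern mind   500/h75"
print(parse_loan_input("lm43486", card))
# {'code': 'lm', 'man': '43486', 'item': '102362mh01'}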
if a borrower wishes to charge out more than one book at a time (a common occurrence which ruled out punching the end-of-transmission code on the book card), up to five books may be handled with one keyboarding. total elapsed time for multiple loans averages about 15 seconds per book. loans of bound journals, trade catalogs and other publications with prepunched cards follow the routine described. for unbound journals and other items lacking cards, it is necessary to obtain the title number from the printed catalog and to key this in with the relevant issue information. other transaction codes, as noted in table 1, deal with loan period changes and renewals. typical computer responses from the renew code (lr) include: renew; overdue; res waiting; no renew. returns the two-character return code (lc) is used with card reading or typing. five items may be discharged with one lc action. the computer responds with twenty characters of author-title, and one of the following messages, for each item: returned, i.e., the loan is complete and no one is waiting for a copy of this book. loan to . . . , i.e., send the book to the man indicated by name and address. since he was first on the reserve queue, the book is now charged out to him for the loan period shown. mail to . . . libr, i.e., this book belongs to the library shown and should be returned there. no one is waiting for it. not on loan, i.e., this book was previously cancelled or somebody borrowed it without charging it out. the loan to . . . response noted above is a particularly valuable service and time-saving feature. in effect, if any reserve exists anywhere in the system for the title, then the first copy returned is automatically charged to the first person in the queue and the next person moved up. the loan period assigned by the computer depends upon whether there is a waiting list for the book. the library does not need to take any charge-out action except to date stamp the book and address a mail envelope using the information provided. the mail to . . . libr response, calling attention to the fact that the book should be returned to its 'home' library, is coupled with automatic charging by the computer to in transit to . . . questions about the copy will receive this response during the time it takes to ship it to its home base. when the book is cancelled at the home library, any reservations made during the 'in transit' phase will cause automatic loan in the manner already described. cancellation of a loan charge without automatic follow-up on the reserve queue is sometimes desirable. for example, after a copy of a book has been charged to 'missing' and search has failed to locate it, a charge to 'lost' may be desirable for record purposes. use of the lk return code, instead of the normal lc, makes this possible without automatic pick-up of the reserve queue.
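the return handling described above, in which lc discharges an item and automatically charges the first copy returned to the head of the reserve queue while lk suppresses that pick-up, reduces to a small piece of decision logic. the python sketch below is an illustrative reconstruction of that behaviour; the record fields and their names are invented.

def process_return(item, code="lc"):
    """discharge one item and pick the response, mirroring the behaviour above.

    `item` stands in for a publication's disk record; the keys used here
    ('on_loan', 'home_library', 'returned_at', 'reserves') are invented for
    illustration and are not the bellrel field names.
    """
    if not item["on_loan"]:
        return "not on loan"
    item["on_loan"] = False
    if code == "lc" and item["reserves"]:
        next_man = item["reserves"].pop(0)     # head of the reserve queue
        item["on_loan"] = True                 # charged out again at once
        return f"loan to {next_man}"
    if item["returned_at"] != item["home_library"]:
        item["on_loan"] = True                 # charged to 'in transit'
        return f"mail to {item['home_library']} libr"
    return "returned"

book = {"on_loan": True, "home_library": "mh", "returned_at": "wh",
        "reserves": ["43486"]}
print(process_return(book))    # loan to 43486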
reservations since reserves are posted in bellrel in real time, any copy of a title returned, even seconds after the reserve is made, will be charged to the first man on reserve. reserves are input using the keyboard sequence la, man number, item number. the computer response includes the standard name trigram and publication data. if all copies of the title are on loan, the computer also responds with information to the requester on where he stands; as an example, "res #03, copies held 05". if all copies are not on loan, the response includes the call number of circulation system are similarly correlated with other existing machine processes and products. as stated earlier, much improved feedback on collection use, demand patterns, and other matters important to library management was a major goal of bellrel. the history tapes serve this purpose both for special-purpose analyses and regular system reports. the latter include circulation statistics by subject class and library, laboratory location, user department and so on. three other reports may be mentioned: 1. high-demand list: this is a weekly list focusing attention on all titles with more than a specified number of reserves. reserves and copies are shown by location. previous loan totals are also given to aid in purchase decisions to meet demands. 2. zero-loan list: this is a semiannual listing of all titles in the collection with no recorded loan activity in one or more libraries for the period surveyed. a summary of previous loans is given, to help in decisions on weeding. 3. missing items list: this is a twice-monthly, dewey-ordered list of all titles charged to 'missing.' it is used to conduct scheduled searches in all libraries until the items are converted to 'lost' and replaced or withdrawn. operating experience this paper is being written after only one month's use of bellrel in regular service. the following observations are therefore limited. circulation assistants have adapted very quickly to the input mechanics, familiarity with typewriter keyboards and the novelty of conversing with the computer being contributing factors. bellrel appears to be regarded as a powerful and perceptive colleague with the occasional off moments accepted in a friend. burdensome tasks, such as preparing overdue notices and maintaining card records, have been dropped with enthusiasm. staff members are developing new perspectives as they understand the functioning of an information network. the total system concept, embracing the resources of all participating libraries and permitting one copy of a book to serve many readers without inter-library loan, is modifying many practices. greater record accuracy, completeness and utility is also being realized, along with significant time-savings throughout the system. the query facility, which shows promise of being much used, provides immediate answers to certain questions which previously could not be asked and gives a glimpse of the eventual responsiveness of a complete on-line library catalog. customer reaction has ranged from some technical interest (technical staff members were consulted in the development of the system and information about its purposes and functions has been widely disseminated) to more common approval and enthusiasm. the increase in time to charge out a book in person in bellrel (about nine seconds more than the manual system for a single loan and two seconds more per book for multiple loans) appears to be widely accepted. whether this is due to initial tolerance of a new system, or less 'work' by the borrower in the charge operation, or an appreciation that service as a whole will be faster and more responsive, is not known.
it is expected that charging time will be reduced with program modifications and experience. it should also be recalled that in two out of three loans the borrower is not present: far from experiencing additional delay, he gets what he wants faster. the usual bugs in a complex of programs have arisen; certain trailers had to be enlarged; the 360 operating system and hardware have failed several times. down-time, under initial loads of up to 1500 on-line transactions per day, has been less than anticipated for the first month and is expected to drop sharply. about two down-times per day were experienced in the first month, about half of these being deliberate, and most recoveries have taken less than fifteen minutes. down-time logs are used to record transactions for immediate entry into the system when it becomes alive, a similar procedure being used for after-hours loans. costs the costs of operating the bellrel system are, understandably, higher than the displaced manual system, the two systems, of course, not being comparable in services and functions. in the operations which can be fairly directly correlated, bellrel permits very significant labor savings. appreciable materials savings are also anticipated as a result of collection pooling (leading to reduced duplication of resources in the individual libraries), better inventory control, and other factors. rental costs are the major component. each of the six terminals, for example, with associated data-sets and telephone lines, costs $275 a month. costs of the portion .of the transmission control unit and disk facility used by the libraries total about $1100 a month. in addition to a small amount for materials, other costs include a share of the central processing unit and core memory charges, depending upon usage. to execute 1500 real-time transactions per day appears to require less than 12 minutes of main-frame computer time, but a share of the real-time terminal polling and batch processing time must also be included. however, experience with the automated system has been far too brief to reach any precise cost figures for the whole system. in particular, although the dollar value of the largely intangible but very real benefits to library users and library staff can only, at least at this stage, be guessed at, bellrel has been implemented on the premise that these benefits are major. it should be noted that the costs of the manual (newark) system in bell laboratories differ greatly from the costs calculated by the library technology project ( l tp) for this system in an academic library ( 12). bellrel/kennedy 145 ltp cost estimates for both the newark and the ibm 357 systems do not conform to our calculations for more reasons than can he discussed here. in the main, however, environmental conditions, strongly affecting labor costs, are too different. for example, in arriving at labor costs, l tp uses the figure of 44 overdues per 1000 circulations in academic libraries; in our library system where there are no fines or long loans, overdues total about eight times this figure. again, as a result of book announcement services, discipline concentration and other factors, reserves in the bell laboratories libraries are nearly twenty times the ratio used by the library technology project for academic libraries. still further, in bell laboratories some 200,000 loans per year are made without the borrower being present to fill in the loan card. these and other factors add heavily to the cost of labor. 
few industrial organizations can obtain labor at the cost of $2.00 per hour cited in library technology reports when personnel benefits and other overhead are included. conclusion paul fasana has observed: "since cost is primarily a quantitative measure of a system, it is but one of several factors (and possibly not even the most important factor) to consider in evaluating an automated system. other factors . . . qualitative factors . . . must also be considered. . . . they include such items as operating efficiency, reliability, services rendered, and growth potential." (13) a full judgment on these factors in the bellrel system must await further experience but the following observations may be made: 1) bellrel is not an experiment; it is addressed to practical problems in an industrial library network. 2) it is not a final system; software and hardware evolution will see to that. 3) it is not a model system, transportable in toto to another context; any system of comparable complexity and investment requires careful matching to local needs and objectives. bellrel objectives, to reiterate, include improved service through computer pooling of dispersed library collections, up-to-date reporting on the status of any publication, immediate identification of all items on loan to a person and automatic follow-up on reserve queues; reduced clerical labor; better inventory control; much enriched feedback for library management; more effective realization of the information network philosophy; and experience in the new era of man-machine communication in a real-life environment. the evidence is strong that these objectives are being achieved. acknowledgments the technical information libraries gratefully acknowledge the unstinting and imaginative aid given by the comptroller's division of bell telephone laboratories in the design, development and operation of the bellrel system. bibliography 1. george fry and associates, inc.: study of circulation control systems (chicago: ala, 1961). 2. american library association, library technology project: the use of data-processing equipment in circulation control (chicago: ala, july 1965), library technology reports. 3. mccoy, ralph e.: "computerized circulation work: a case study of the 357 data collection system," library resources & technical services, 9 (winter 1965), 59-65. 4. flannery, anne; mack, james d.: mechanized circulation system, lehigh university library (center for the information sciences, lehigh univ.: nov. 1966), library systems analyses report no. 4. 5. cammack, floyd; mann, donald: "institutional implications of an automated circulation study," college & research libraries, 28 (march 1967), 129-32. 6. cuadra, carlos a., ed.: american documentation institute annual review of information science and technology, vol. 1 (new york: interscience, 1966), pp. 201-4. 7. mccune, lois c.; salmon, stephen r.: "bibliography of library automation," ala bulletin, 61 (june 1967), 674-94. 8. kimber, richard t.: "studies at the queen's university of belfast on real-time computer control of book circulation," journal of documentation, 22 (june 1966), 116-22. 9. lazorick, gerald j.; herling, john p.: "a real time library circulation system without pre-punched cards," proceedings of the american documentation institute, v. 4 (washington: adi, 1967), 202-6.
10. croxton, f. e.: on-line computer applications in a technical library (redstone scientific information center, redstone arsenal, alabama: november 1967), rsic-723. 11. ruecking, frederick, jr.: "selecting a circulation-control system: a mathematical approach," college & research libraries, 25 (sept. 1964), 385-90. 12. american library association, library technology project: three systems of circulation control (chicago: ala, may 1967), library technology reports. 13. fasana, paul j.: "determining the cost of library automation," ala bulletin, 61 (june 1967), 661. paul t. jaeger and zheng yan one law with two outcomes: comparing the implementation of cipa in public libraries and schools though the children's internet protection act (cipa) established requirements for both public libraries and public schools to adopt filters on all of their computers when they receive certain federal funding, it has not attracted a great amount of research into the effects on libraries and schools and the users of these social institutions. this paper explores the implications of cipa in terms of its effects on public libraries and public schools, individually and in tandem. drawing from both library and education research, the paper examines the legal background and basis of cipa, the current state of internet access and levels of filtering in public libraries and public schools, the perceived value of cipa, the perceived consequences of cipa, the differences in levels of implementation of cipa in public libraries and public schools, and the reasons for those dramatic differences. after an analysis of these issues within the greater policy context, the paper suggests research questions to help provide more data about the challenges and questions revealed in this analysis. the children's internet protection act (cipa) established requirements for both public libraries and public schools to—as a condition for receiving certain federal funds—adopt filters on all of their computers to protect children from online content that was deemed potentially harmful.1 passed in 2000, cipa was initially implemented by public schools after its passage, but it was not widely implemented in public libraries until the 2003 supreme court decision (united states v. american library association) upholding the law's constitutionality.2 now that cipa has been extensively implemented for five years in libraries and eight years in schools, it has had time to have significant effects on access to online information and services. while the goal of filtering requirements is to protect children from potentially inappropriate content, filtering also creates major educational and social implications because filters also limit access to other kinds of information and create different perceptions about schools and libraries as social institutions. curiously, cipa and its requirements have not attracted a great amount of research into the effects on schools, libraries, and the users of these social institutions. much of the literature about cipa has focused on practical issues—either recommendations on implementing filters or stories of practical experiences with filtering. while those types of writing are valuable to practitioners who must deal with the consequences of filtering, there are major educational and societal issues raised by filtering that merit much greater exploration.
while relatively small bodies of research have been generated about cipa’s effects in public libraries and public schools,3 thus far these two strands of research have remained separate. but it is the contention of this paper that these two strands of research, when viewed together, have much more value for creating a broader understanding of the educational and societal implications. it would be impossible to see the real consequences of cipa without the development of an integrative picture of its effects on both public schools and public libraries. in this paper, the implications of cipa will be explored in terms of effects on public libraries and public schools, individually and in tandem. public libraries and public schools are generally considered separate but related public sphere entities because both serve core educational and information-provision functions in society. furthermore, the fact that public schools also contain school library media centers highlights some very interesting points of intersection between public libraries and school libraries in terms of the consequences of cipa: while cipa requires filtering of computers throughout public libraries and public schools, the presence of school library media centers makes the connection between libraries and schools stronger, as do the teaching roles of public libraries (e.g., training classes, workshops, and evening classes). n the legal road to cipa history under cipa, public libraries and public schools receiving certain kinds of federal funds are required to use filtering programs to protect children under the age of seventeen from harmful visual depictions on the internet and to provide public notices and hearings to increase public awareness of internet safety. senator john mccain (r-az) sponsored cipa, and it was signed into law by president bill clinton on december 21, 2000. cipa requires that filters at public libraries and public schools block three specific types of content: (1) obscene material (that paul t. jaeger (pjaeger@umd.edu) is assistant professor at the college of information studies and director of the center for information policy and electronic government of the university of maryland in college park. zheng yan (zyan@uamail.albany .edu) is associate professor at the department of educational and counseling psychology in the school of education of the state university of new york at albany. one law with two outcomes | jaeger and yan 7 which appeals to prurient interests only and is “offensive to community standards”); (2) child pornography (depictions of sexual conduct and or lewd exhibitionism involving minors); and (3) material that is harmful to minors (depictions of nudity and sexual activity that lack artistic, literary, or scientific value). cipa focused on “the recipients of internet transmission,” rather than the senders, in an attempt to avoid the constitutional issues that undermined the previous attempts to regulate internet content.4 using congressional authority under the spending clause of article i, section 8 of the u.s. constitution, cipa ties the direct or indirect receipt of certain types of federal funds to the installation of filters on library and school computers. therefore each public library and school that receives the applicable types of federal funding must implement filters on all computers in the library and school buildings, including computers that are exclusively for staff use. 
libraries and schools had to address these issues very quickly because the federal communications commission (fcc) mandated certification of compliance with cipa by funding year 2004, which began in summer 2004.5 cipa requires that filters on computers block three specific types of content, and each of the three categories of materials has a specific legal meaning. the first type—obscene materials—is statutorily defined as depicting sexual conduct that appeals only to prurient interests, is offensive to community standards, and lacks serious literary, artistic, political, or scientific value.6 historically, obscene speech has been viewed as being bereft of any meaningful ideas or educational, social, or professional value to society.7 statutes regulating speech as obscene have to do so very carefully and specifically, and speech can only be labeled obscene if the entire work is without merit.8 if speech has any educational, social, or professional importance, even for embodying controversial or unorthodox ideas, it is supposed to receive first amendment protection.9 the second type of content—child pornography—is statutorily defined as depicting any form of sexual conduct or lewd exhibitionism involving minors.10 both of these types of speech have a long history of being regulated and being considered as having no constitutional protections in the united states. the third type of content that must be filtered— material that is harmful to minors—encompasses a range of otherwise protected forms of speech. cipa defines “harmful to minors” as including any depiction of nudity, sexual activity, or simulated sexual activity that has no serious literary, artistic, political, or scientific value to minors.11 the material that falls into this third category is constitutionally protected speech that encompasses any depiction of nudity, sexual activity, or simulated sexual activity that has serious literary, artistic, political, or scientific value to adults. along with possibly including a range of materials related to literature, art, science, and policy, this third category may involve materials on issues vital to personal well-being such as safe sexual practices, sexual identity issues, and even general health care issues such as breast cancer. in addition to the filtering requirements, section 1731 also prescribes an internet awareness strategy that public libraries and schools must adopt to address five major internet safety issues related to minors. it requires libraries and schools to provide reasonable public notice and to hold at least one public hearing or meeting to address these internet safety issues. requirements for schools and libraries cipa includes sections specifying two major strategies for protecting children online (mainly in sections 1711, 1712, 1721, and 1732) as well as sections describing various definitions and procedural issues for implementing the strategies (mainly in sections 1701, 1703, 1731, 1732, 1733, and 1741). section 1711 specifies the primary internet protection strategy—filtering—in public schools. specifically, it amends the elementary and secondary education act of 1965 by limiting funding availability for schools under section 254 of the communication act of 1934. 
through a compliance certification process within a school under supervision by the local educational agency, it requires schools to include the operation of a technology protection measure that protects students against access to visual depictions that are obscene, are child pornography, or are harmful to minors under the age of seventeen. likewise, section 1712 specifies the same filtering strategy in public libraries. specifically, it amends section 224 of the museum and library service act of 1996/2003 by limiting funding availability for libraries under section 254 of the communication act of 1934. through a compliance certification process within a library under supervision by the institute of museum and library services (imls), it requires libraries to include the operation of a technology protection measure that protects students against access to visual depictions that are obscene, child pornography, or harmful to minors under the age of seventeen. section 1721 is a requirement for both libraries and schools to enforce the internet safety policy with the internet safety policy strategy and the filtering technology strategy as a condition of universal service discounts. specifically, it amends section 254 of the communication act of 1934 and requests both schools and libraries to monitor the online activities of minors, operate a technical protection measure, provide reasonable public notice, and hold at least one public hearing or meeting to address the internet safety policy. this is through the 8 information technology and libraries | march 2009 certification process regulated by the fcc. section 1732, titled the neighborhood children’s internet protection act (ncipa), amends section 254 of the communication act of 1934 and requires schools and libraries to adopt and implement an internet safety policy. it specifies five types of internet safety issues: (1) access by minors to inappropriate matter on the internet; (2) safety and security of minors when using e-mail, chat rooms, and other online communications; (3) unauthorized access; (4) unauthorized disclosure, use, and dissemination of personal information; and (5) measures to restrict access to harmful online materials. from the above summary, it is clear that (1) the two protection strategies of cipa (the internet filtering strategy and safety policy strategy) were equally enforced in both public schools and public libraries because they are two of the most important social institutions for children’s internet safety; (2) the nature of the implementation mechanism is exactly the same, using the same federal funding mechanisms as the sole financial incentive (limiting funding availability for schools and libraries under section 254 of the communication act of 1934) through a compliance certification process to enforce the implementation of cipa; and (3) the actual implementation procedure differs in libraries and schools, with schools to be certified under the supervision of local educational agencies (such as school districts and state departments of education) and with libraries to be certified within a library under the supervision of the imls. 
economics of cipa the universal service program (commonly known as e–rate) was established by the telecommunications act of 1996 to provide discounts, ranging from 20 to 90 percent, to libraries and schools for telecommunications services, internet services, internal systems, and equipment.12 the program has been very successful, providing approximately $2.25 billion dollars a year to public schools, public libraries, and public hospitals. the vast majority of e-rate funding—about 90 percent—goes to public schools each year, with roughly 4 percent being awarded to public libraries and the remainder going to hospitals.13 the emphasis on funding schools results from the large number of public schools and the sizeable computing needs of all of these schools. but even 4 percent of the e-rate funding is quite substantial, with public libraries receiving more than $250 million between 2000 and 2003.14 schools received about $12 billion in the same time period.15 along with e-rate funds, the library services and technology act (lsta) program administered by the imls provides money to each state library agency to use on library programs and services in that state, though the amount of these funds is considerably lower than e-rate funds. the american library association (ala) has noted that the e-rate program has been particularly significant in its role of expanding online access to students and to library patrons in both rural and underserved communities.16 in addition to the effect on libraries, e-rate and lsta funds have significantly affected the lives of individuals and communities. these programs have contributed to the increase in the availability of free public internet access in schools and libraries. by 2001, more than 99 percent of public school libraries provided students with internet access.17 by 2007, 99.7 percent of public library branches were connected to the internet, and 99.1 percent of public library branches offered public internet access.18 however, only a small portion of libraries and schools used filters prior to cipa.19 since the advent of computers in libraries, librarians typically had used informal monitoring practices for computer users to ensure that nothing age inappropriate or morally offensive was publicly visible.20 some individual school and library systems, such as in kansas and indiana, even developed formal or informal statewide internet safety strategies and approaches.21 why were only libraries and schools chosen to protect children’s online safety? while there are many social institutions that could have been the focus of cipa, the law places the requirements specifically on public libraries and public schools. if congress was so interested in protecting children from access to harmful internet content, it seems that the law would be more expansive and focused on the content itself rather than filtering access to the content. however, earlier laws that attempted to regulate access to internet content failed legal challenges specifically because they tried to regulate content. prior to the enactment of cipa, there were a number of other proposed laws aimed at preventing minors from accessing inappropriate internet content. the communications decency act (cda) of 1996 prohibited the sending or posting of obscene material through the internet to individuals under the age of eighteen.22 however, the supreme court found the cda to be unconstitutional, stating that the law violated free speech under the first amendment. 
in 1998, congress passed the child online protection act (copa), which prohibited commercial websites from displaying material deemed harmful to minors and imposed criminal penalties on internet violators.23 a three-judge panel of the district court for the eastern district of pennsylvania ruled that copa's focus on "contemporary community standards" violated the first amendment, and the panel subsequently imposed an injunction on copa's enforcement. cipa's force comes from congress's power under the spending clause; that is, congress can legally attach requirements to funds that it gives out. since cipa is based on economic persuasion—the potential loss of funds for technology—the law can only have an effect on recipients of those funds. while regulating internet access in other venues like coffee shops, internet cafés, bookstores, and even individual homes would provide a more comprehensive shield to limit children's access to certain online content, these institutions could not be reached under the spending clause. as a result, the burdens of cipa fall squarely on public libraries and public schools. ■ the current state of filtering when did cipa actually come into effect in libraries and schools? after overcoming a series of legal challenges that were ultimately decided by the supreme court, cipa came into effect in full force in 2003, though 96 percent of public schools were already in compliance with cipa in 2001. the legal challenge by public libraries, which the court rejected in upholding the constitutionality of cipa, centered on the way the statute was written.24 the court's decision states that the wording of the law does not place unconstitutional limitations on free speech in public libraries. to continue receiving federal dollars directly or indirectly through certain federal programs, public libraries and schools were required to install filtering technologies on all computers. while the case decided by the supreme court focused on public libraries, the decision virtually precludes public schools from making the same or related challenges.25 before that case was decided, however, most schools had already adopted filters to comply with cipa. as a result of cipa, a public library or public school must install technology protection measures, better known as filters, on all of its computers if it receives ■ e-rate discounts for internet access costs, ■ e-rate discounts for internal connections costs, ■ lsta funding for direct internet costs,26 or ■ lsta funding for purchasing technology to access the internet. the requirements of cipa extend to public libraries, public schools, and any library institution that receives lsta and e-rate funds as part of a system, including state library agencies and library consortia. as a result of the financial incentives to comply, almost 100 percent of public schools in the united states have implemented the requirements of cipa,27 and approximately half of public libraries have done so.28 how many public schools have implemented cipa? according to the latest report by the department of education (see table 1), by 2005, 100 percent of public schools had implemented both the internet filtering strategy and the safety policy strategy. in fact, in 2001 (the first year cipa was in effect), 96 percent of schools had implemented cipa, with 99 percent filtering by 2002.
when compared to the percentage of all public schools with internet access from 1994 to 2005, internet access became nearly universal in schools between 1999 and 2000 (95 to 98 percent), and one can see that the internet access percentage in 2001 was almost the same as the cipa implementation percentage. according to the department of education, the above estimations are based on a survey of 1,205 elementary and secondary schools selected from 63,000 elementary schools and 21,000 secondary and combined schools.29 after reviewing the design and administration of the survey, it can be concluded that these estimations should be considered valid and reliable and that cipa has been immediately and consistently implemented in the majority of public schools since 2001.30

table 1. implementation of cipa in public schools
year           1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2005
access (%)       35    50    65    78    89    95    98    99    99   100   100
filtering (%)     —     —     —     —     —     —     —    96    99    97   100

how many public libraries have implemented cipa? in 2002, 43.4 percent of public libraries were receiving e-rate discounts, and 18.9 percent said they would not apply for e-rate discounts if cipa was upheld.31 since the supreme court decision upholding cipa, the number of libraries complying with cipa has increased, as has the number of libraries not applying for e-rate funds to avoid complying with cipa. however, unlike schools, there is no exact count of how many libraries have filtered internet access. in many cases, the libraries themselves do not filter, but a state library, library consortium, or local or state government system of which they are a part filters access from beyond the walls of the library. in some of these cases, the library staff may not even be aware that such filtering is occurring. a number of state and local governments have also passed their own laws to encourage or require all libraries in the state to filter internet access regardless of e-rate or lsta funds.32 in 2008, 38.2 percent of public libraries were filtering access within the library as a result of directly receiving e-rate funding.33 furthermore, 13.1 percent of libraries were receiving e-rate funding as a part of another organization, meaning that these libraries also would need to comply with cipa's requirements.34 as such, the number of public libraries filtering access is now at least 51.3 percent, but the number will likely be higher as a result of state and local laws requiring libraries to filter as well as other reasons libraries have implemented filters. in contrast, among libraries not receiving e-rate funds, the number of libraries now not applying for e-rate intentionally to avoid the cipa requirements is 31.6 percent.35 while it is not possible to identify an exact number of public libraries that filter access, it is clear that libraries overall have far lower levels of filtering than the 100 percent of public schools that filter access. e-rate and other program issues the administration of the e-rate program has not occurred without controversy.
throughout the course of the program, many applicants for and recipients of the funding have found the program structure to be obtuse, the application process to be complicated and time consuming, and the administration of the decision-making process to be slow.36 as a result, many schools and libraries find it difficult to plan ahead for budgeting purposes, not knowing how much funding they will receive or when they will receive it.37 there also have been larger difficulties for the program. following revelations about the uses of some e-rate awards, the fcc suspended the program from august to december 2004 to impose new accounting and spending rules for the funds, delaying the distribution of over $1 billion in funding to libraries and schools.38 news investigations had discovered that certain school systems were using e-rate funds to purchase more technology than they needed or could afford to maintain, and some school systems failed to ever use technology they had acquired.39 while the administration of the e-rate program has been comparatively smooth since, the temporary suspension of the program caused serious short-term problems for, and left a sense of distrust of, the program among many recipients.40 filtering issues during the 1990s, many types of software filtering products became available to consumers, including serverside filtering products (using a list of server-selected blocked urls that may or may not be disclosed to the user), client-side filtering (controlling the blocking of specific content with a user password), text-based content-analysis filtering (removing illicit content of a website using real-time analysis), monitoring and timelimiting technologies (tracking a child’s online activities and limiting the amount of time he or she spends online), and age-verification systems (allowing access to webpages by passwords issued by a third party to an adult).41 but because filtering software companies make the decisions about how the products work, content and collection decisions for electronic resources in schools and public libraries have been taken out of the hands of librarians, teachers, and local communities and placed in the trust of proprietary software products.42 some filtering programs also have specific political agendas, which many organizations that purchase them are not aware of.43 in a study of over one million pages, for every webpage blocked by a filter as advertised by the software vendor, one or more pages were blocked inappropriately, while many of the criteria used by the filtering products go beyond the criteria enumerated in cipa.44 filters have significant rates of inappropriately blocking materials, meaning that filters misidentify harmless materials as suspect and prevent access to harmless items (e.g., one filter blocked access to the declaration of independence and the constitution).45 furthermore, when libraries install filters to comply with cipa, in many instances the filters will frequently be blocking text as well as images, and (depending on the type of filtering product employed) filters may be blocking access to entire websites or even all the sites from certain internet service providers. 
as such, the current state of filtering technology will create the practical effect of cipa restricting access to far more than just certain types of images in many schools and libraries.46 ■ differences in the perceived value of cipa and filtering based on the available data, there clearly is a sizeable contrast in the levels of implementation of cipa between schools and libraries. this difference raises a number of questions: for what reasons has cipa been much more widely implemented in schools? is this issue mainly value driven, dollar driven, both, or neither in these two public institutions? why are these two institutions so different regarding cipa implementation while they share many social and educational similarities? reasons for nationwide full implementation in schools there are various reasons—from financial, population, social, and management issues to computer and internet availability—that have driven the rapid and comprehensive implementation of filters in public schools. first, public schools have to implement cipa because of societal pressures and the lobbying of parents to ensure students' internet safety. almost all users of computers in schools are minors, the group most vulnerable to internet crimes and child pornography. public schools in america have been the focus of public attention and scrutiny for years, and the political and social responsibility of public schools for children's internet safety is enormous. as a result, society has decided these students should be most strongly protected, and cipa was implemented immediately and most widely at schools. second, in contrast to public libraries (which average slightly less than eleven computers per library outlet), the typical number of computers in public schools ranges from one hundred to five hundred, which are needed to meet the needs of students and teachers for daily learning and teaching. since the number of computers is quite large, the financial incentives of e-rate funding are substantial and critical to the operation of the schools. this situation provides administrators in schools and school districts with the incentive to make decisions to implement cipa as quickly and extensively as possible. furthermore, the amount of money that e-rate provides for schools in terms of technology is astounding. as was noted earlier, schools received over $12 billion from 2000 to 2003 alone. schools likely would not be able to provide the necessary computers for students and teachers without the e-rate funds. third, the actual implementation procedure differs in schools and libraries: schools are certified under the supervision of local educational agencies such as school districts and state departments of education; libraries are certified within a library organization under the supervision of the imls. in other words, the certification process at schools is directly and effectively controlled by school districts and state departments of education, following the same fundamental values of protecting children. the resistance to cipa in schools has been very small in comparison to libraries. the primary concern raised has been the issue of educational equality.
concerns have been raised that filters in schools may create two classes of students—ones with only filtered access at school and ones who also can get unfiltered access at home.47 reasons for more limited implementation in libraries in public libraries, the reasons for implementing cipa are similar to those of public schools in many ways. the approximately seven thousand public libraries in the united states provide an average of 10.7 computers each, a substantial amount of technology that must be supported. the e-rate and lsta funds are vital to many libraries in the provision of computers and the internet. furthermore, with limited alternative sources of funding, the e-rate and lsta funds are hard to replace if they are not available. given that public libraries have become the guarantor of public access to computing and the internet, libraries have to find ways to ensure that patrons can access the internet.48 libraries also have to be concerned about protecting and providing a safe environment for younger patrons. while libraries serve patrons of all ages, one of the key social expectations of libraries is the provision of educational materials for children and young adults. children's sections of libraries almost always have computers in them. much of the content blocked by filters is of little or no educational value. as such, "defending unfiltered internet access was quite different from defending catcher in the rye."49 nevertheless, many libraries have fought against the filtering requirements of cipa because they believe that it violates the principles of librarianship or for a number of other reasons. in 2008, 31.6 percent of public libraries refused to apply for e-rate or lsta funds specifically to avoid cipa requirements, a substantial increase from the 15.3 percent of libraries that did not apply for e-rate because of cipa in 2006.50 in defending patrons' rights to free access, the libraries that are not applying for e-rate funds because of the requirements of cipa are forced to turn down funding that would help pay for internet access in order to preserve unfiltered community access to the internet. because many libraries feel that they cannot apply for e-rate funds, local and regional discrepancies are occurring in the levels of internet access that are available to patrons of public libraries in different parts of the country.51 for adult patrons who wish to access material on computers with filters, cipa states that the library has the option of disabling the filters for "bona fide research or other lawful purposes" when adult patrons request such disabling. the law does not require libraries to disable the filters for adult patrons, and the criteria for disabling of filters do not have a set definition in the law.
the potential problems in the process of having the filters disabled are many and significant, including librarians not allowing the filters to be turned off, librarians not knowing how to turn the filters off, the filtering software being too complicated to turn off without injuring the performance of the workstation in other applications, or the filtering software being unable to be turned off in a reasonable amount of time.52 it has been estimated that approximately 11 million low-income individuals rely on public libraries to access online information because they lack internet access at home or work.53 the e-rate and lsta programs have helped to make public libraries a trusted community source of internet access, with the public library being the only source of free public internet access available to all community residents in nearly 75 percent of communities in the united states.54 therefore, usage of computers and the internet in public libraries has continued to grow at a very fast pace over the past ten years.55 thus public libraries are torn between the values of providing safe access for younger patrons and broad access for adult patrons who may have no other means of accessing the internet. ■ cipa, public policy, and further research while the diverse implementations, effects, and levels of acceptance of cipa across schools and libraries demonstrate the wide range of potential ramifications of the law, surprisingly little consideration is given to major assumptions in the law, including the appropriateness of the requirements to different age groups and the nature of information on the internet. cipa treats all users as if they are at the same level of maturity and need the same level of protection as a small child, as evidenced by the requirement that all computers in a library or school have filters regardless of whether children use a particular computer. in reality, children and adults interact in different social, physical, and cognitive ways with computers because of different developmental processes.56 cipa fails to recognize that children as individual users are active processors of information and that children of different ages are going to be affected in divergent ways by filtering programs.57 younger children benefit from more restrictive filters while older children benefit from less restrictive filters. moreover, filtering can be complemented by encouragement of frequent positive internet usage and informal instruction to encourage positive use. finally, children of all ages need a better understanding of the structure of the internet to encourage appropriate caution in terms of online safety. the internet represents a new social and cultural environment in which users simultaneously are affected by the social environment and also construct that environment with other users.58 cipa also is based on fundamental misconceptions about information on the internet. the supreme court's decision upholding cipa represents several of these misconceptions, adopting an attitude that 'we know what is best for you' in terms of the information that citizens should be allowed to access.59 it assumes that schools and libraries select printed materials out of a desire to protect and censor rather than recognizing the basic reality that only a small number of print materials can be afforded by any school or library. the internet frees schools and libraries from many of these costs.
furthermore, the court assumes that libraries should censor the internet as well, ultimately upholding the same level of access to information for adult patrons and librarians in public libraries as for students in public schools. these two major unexamined assumptions in the law certainly have played a part in the difficulty of implementing cipa and in the resistance to the law. and this does not even address the problems of assuming that public libraries and public schools can be treated interchangeably in crafting legislation. these problematic assumptions point to a significantly larger issue: in trying to deal with the new situations created by the internet and related technology, the federal government has significantly increased the attention paid to information policy.60 over the past few years, government laws and standards related to information have begun to more clearly relate to social aspects of information technologies such as the filtering requirements of cipa.61 but the social, economic, and political ramifications of decisions about information policy are often woefully underexamined in the development of legislation.62 this paper has documented that many of the reasons for and statistics about cipa implementation are available by bringing together information from different social institutions. the biggest questions about cipa are about the societal effects of the policy decisions: ■ has cipa changed the education and information-provision roles of libraries and schools? ■ has cipa changed the social expectations for libraries and schools? ■ have adult patron information behaviors changed in libraries? ■ have minor patron information behaviors changed in libraries? ■ have student information behaviors changed in school? ■ how has cipa changed the management of libraries and schools? ■ will congress view cipa as successful enough to merit using libraries and schools as the means of enforcing other legislation? but these social and administrative concerns are not the only major research questions raised by the implementation of cipa. future research about cipa not only needs to focus on the individual, institutional, and social effects of the law; it must also explore the lessons that cipa can provide to the process of creating and implementing information policies with significant societal implications. the most significant research issues related to cipa may be the ones that help illuminate how to improve the legislative process to better account for the potential consequences of regulating information while the legislation is still being developed. such cross-disciplinary analyses would be of great value as information becomes the center of an increasing amount of legislation, and the effects of this legislation have continually wider consequences for the flow of information through society. it could also be of great benefit to public schools and libraries, which, if cipa is any indication, may play a large role in future legislation about public internet access. references 1. children's internet protection act (cipa), public law 106-554. 2. united states v. american library association, 539 u.s. 154 (2003). 3. american library association, libraries connect communities: public library funding & technology access study 2007–2008 (chicago: ala, 2008); paul t. jaeger, john carlo bertot, and charles r.
mcclure, “the effects of the children’s internet protection act (cipa) in public libraries and its implications for research: a statistical, policy, and legal analysis,” journal of the american society for information science and technology 55, no. 13 (2004): 1131–39; paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology and libraries 26, no. 2 (2007): 4–14; paul t. jaeger et al., “cipa: decisions, implementation, and impacts,” public libraries 44, no. 2 (2005): 105–9; zheng yan, “limited knowledge and limited resources: children’s and adolescents’ understanding of the internet,” journal of applied developmental psychology (forthcoming); zheng yan, “differences in basic knowledge and perceived education of internet safety between high school and undergraduate students: do high school students really benefit from the children’s internet protection act?” journal of applied developmental psychology (forthcoming); zheng yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?,” developmental psychology 42 (2006): 418–28. 4. martha m. mccarthy, “filtering the internet: the children’s internet protection act,” educational horizons 82, no, 2 (winter 2004): 108. 5. federal communications commission, in the matter of federal–state joint board on universal service: children’s internet protection act, fcc order 03-188 (washington, d.c.: 2003). 6. cipa. 7. roth v. united states, 354 u.s. 476 (1957). 8. miller v. california, 413 u.s. 15 (1973). 9. roth v. united states. 10. cipa. 11. cipa. 12. telecommunications act of 1996, public law 104-104 (feb. 8, 1996). 13. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e-rate program and libraries and library consortia, 2000–2004: trends and issues,” information technology & libraries 24, no. 2 (2005): 57–67. 14. ibid. 15. ibid. 16. american library association, “u.s. supreme court arguments on cipa expected in late winter or early spring,” press release, nov. 13, 2002, www.ala.org/ala/aboutala/hqops/ pio/pressreleasesbucket/ussupremecourt.cfm (accessed may 19, 2008). 17. kelly rodden, “the children’s internet protection act in public schools: the government stepping on parents’ toes?” fordham law review 71 (2003): 2141–75. 18. john carlo bertot, paul t. jaeger, and charles r. mcclure, “public libraries and the internet 2007: issues, implications, and expectations,” library & information science research 30 (2008): 175–184; charles r. mcclure, paul t. jaeger, and john carlo bertot, “the looming infrastructure plateau?: space, funding, connection speed, and the ability of public libraries to meet the demand for free internet access,” first monday 12, no. 12 (2007), www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ article/view/2017/1907 (accessed may 19, 2008). 19. mccarthy, “filtering the internet.” 20. leigh s. estabrook and edward lakner, “managing internet access: results of a national survey,” american libraries 31, no. 8 (2000): 60–62. 21. alberta davis comer, “studying indiana public libraries’ usage of internet filters,” computers in libraries (june 2005): 10–15; thomas m. reddick, “building and running a collaborative internet filter is akin to a kansas barn raising,” computers in libraries 20, no. 4 (2004): 10–14. 22. communications decency act of 1996, public law 104-104 (feb. 8, 1996). 23. child online protection act (copa), public law 105-277 (oct. 21, 1998). 24. united states v. 
american library association. 25. r. trevor hall and ed carter, “examining the constitutionality of internet filtering in public schools: a u.s. perspective,” education & the law 18, no. 4 (2006): 227–45; mccarthy “filtering the internet.” 26. library services and technology act, public law 104-208 (sept. 30, 1996). 27. john wells and laurie lewis, internet access in u.s. public schools and classrooms: 1994–2005, special report prepared at the request of the national center for education statistics, nov. 2006. 28. american library association, libraries connect communities; john carlo bertot, charles r. mcclure, and paul t. jaeger, “the impacts of free public internet access on public library patrons and communities,” library quarterly 78, no. 3 (2008): 285–301; jaeger et al., “cipa.” 29. wells and lewis, internet access in u.s. public schools and classrooms. 14 information technology and libraries | march 2009 30. ibid. 31. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 32. jaeger et al., “cipa.” 33. american library association, libraries connect communities. 34. ibid. 35. ibid. 36. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 37. ibid. 38. norman oder, “$40 million in e-rate funds suspended: delays caused as fcc requires new accounting standards,” library journal 129, no. 18 (2004): 16; debra lau whelan, “e-rate funding still up in the air: schools, libraries left in the dark about discounted funds for internet services,” school library journal 50, no. 11 (2004): 16. 39. ken foskett and paul donsky, “hard eye on city schools’ hardware,” atlanta journal-constitution, may 25, 2004; ken foskett and jeff nesmith, “wired for waste: abuses tarnish e-rate program,” atlanta journal-constitution, may 24, 2004. 40. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 41. department of commerce, national telecommunication and information administration, children’s internet protection act: study of technology protection measures in section 1703, report to congress (washington, d.c.: 2003). 42. mccarthy, “filtering the internet.” 43. paul t. jaeger and charles r. mcclure, “potential legal challenges to the application of the children’s internet protection act (cipa) in public libraries: strategies and issues,” first monday 9, no. 2 (2004), www.firstmonday.org/issues/issue9_2/ jaeger/index.html (accessed may 19, 2008). 44. electronic frontier foundation, internet blocking in public schools (washington, d.c.: 2004), http://w2.eff.org/censor ship/censorware/net_block_report (accessed may 19, 2008). 45. adam horowitz, “the constitutionality of the children’s internet protection act,” st. thomas law review 13, no. 1 (2000): 425–44. 46. tanessa cabe, “regulation of speech on the internet: fourth time’s the charm?” media law and policy 11 (2002): 50–61; adam goldstein, “like a sieve: the child internet protection act and ineffective filters in libraries,” fordham intellectual property, media, and entertainment law journal 12 (2002): 1187–1202; horowitz, “the constitutionality of the children’s internet protection act”; marilyn j. maloney and julia morgan, “rock and a hard place: the public library’s dilemma in providing access to legal materials on the internet while restricting access to illegal materials,” hamline law review 24, no. 2 (2001): 199–222; mary minow, “filters and the public library: a legal and policy analysis,” first monday 2, no. 
12 (1997), www .firstmonday.org/issues/issue2_12/minnow (accessed may 19, 2008); richard j. peltz, “use ‘the filter you were born with’: the unconstitutionality of mandatory internet filtering for adult patrons of public libraries,” washington law review 77, no. 2 (2002): 397–479. 47. mccarthy, “filtering the internet.” 48. john carlo bertot et al., “public access computing and internet access in public libraries: the role of public libraries in e-government and emergency situations,” first monday 11, no. 9 (2006), www.firstmonday.org/issues/issue11_9/bertot (accessed may 19, 2008); john carlo bertot et al., “drafted: i want you to deliver e-government,” library journal 131, no. 13 (2006): 34–39; paul t. jaeger and kenneth r. fleischmann, “public libraries, values, trust, and e-government,” information technology and libraries 26, no. 4 (2007): 35–43. 49. doug johnson, “maintaining intellectual freedom in a filtered world,” learning & leading with technology 32, no. 8 (may 2005): 39. 50. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities.” 51. jaeger et al., “public libraries and internet access across the united states.” 52. paul t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41. 53. goldstein, “like a sieve.” 54. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities”; jaeger and fleischmann, “public libraries, values, trust, and e-government.“ 55. bertot, jaeger, and mcclure, “public libraries and the internet 2007”; charles r. mcclure et al., “funding and expenditures related to internet access in public libraries,” information technology & libraries (forthcoming). 56. zheng yan and kurt w. fischer, “how children and adults learn to use computers: a developmental approach,” new directions for child and adolescent development 105 (2004): 41–61. 57. zheng yan, “age differences in children’s understanding of the complexity of the internet,” journal of applied developmental psychology 26 (2005): 385–96; yan, “limited knowledge and limited resources”; yan, “differences in basic knowledge and perceived education of internet safety”; yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?” 58. patricia greenfield and zheng yan, “children, adolescents, and the internet: a new field of inquiry in developmental psychology,” developmental psychology 42 (2006): 391–93. 59. john n. gathegi, “the public library as a public forum: the (de)evolution of a legal doctrine,” library quarterly 75 (2005): 12. 60. sandra braman, “where has media policy gone? defining the field in the 21st century,” communication law and policy 9, no. 2 (2004): 153–82; sandra braman, change of state: information, policy, & power (cambridge, mass.: mit pr., 2007); charles r. mcclure and paul t. jaeger, “government information policy research: importance, approaches, and realities,” library & information science research 30 (2008): 257–64; milton mueller, christiane page, and brendan kuerbis, “civil society and the shaping of communication-information policy: four decades of advocacy,” information society 20, no. 3 (2004): 169–85. 61. paul t. jaeger, “information policy, information access, and democratic participation: the national and international implications of the bush administration’s information politics,” government information quarterly 24 (2007): 840–59. 
62. mcclure and jaeger, "government information policy research."

frbrization of a library catalog: better collocation of records, leading to enhanced search, retrieval, and display

timothy j. dickey (dickeyt@oclc.org) is a post-doctoral researcher, oclc office of programs and research, dublin, ohio.

the functional requirements for bibliographic records (frbr)'s hierarchical system defines families of bibliographic relationships between records and collocates them better than most extant bibliographic systems. certain library materials (especially audio-visual formats) pose notable challenges to search and retrieval; the first benefits of a frbrized system would be felt in music libraries, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections. this report will summarize the benefits of frbr to next-generation library catalogs and opacs, and will review the handful of ils and catalog systems currently operating with its theoretical structure. editor's note: this article is the winner of the lita/ex libris writing award, 2007. the following review addresses the challenges and benefits of a next-generation online public access catalog (opac) according to the functional requirements for bibliographic records (frbr).1 after a brief recapitulation of the challenges posed by certain library materials—specifically, but not limited to, audiovisual materials—this report will present frbr's benefits as a means of organizing the database and public search results from an opac.2 frbr's hierarchical system of records defines families of bibliographic relationships between records and collocates them better than most extant bibliographic systems; it thus affords both library users and staff a more streamlined navigation between related items in different materials formats and among editions and adaptations of a work. in the eight years since the frbr report's publication, a handful of working systems have been developed. the first benefits of such a system to an average academic library system would be felt in a branch music library, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections. ■ current search and retrieval challenges the difficulties faced first, but not exclusively, by music users of most integrated library systems fall into two related categories: issues of materials formats, and issues of cataloging, indexing, and marc record structure. music libraries must collect, catalog, and support materials in more formats than anyone else; this makes their experience of the most common ils modules—circulation, reserves, and acquisitions—by definition more complicated. the study of music continues to rely on the interrelated use of three distinct information formats—scores (the notated manifestation of a composer's or improviser's thought), recordings (realizations in sound, and sometimes video, of such compositions and improvisations), and books and journals (intellectual thought regarding such compositions and improvisations)—music libraries continue to require . . . collections that integrate [emphasis mine] these three information formats appropriately.3 put a different way, "relatedness is a pervasive characteristic of music materials."4 this is why frbr's model of bibliographic relationships offers benefits that will first impact the music collection.5 at present, however, musical formats pose search and retrieval challenges for most ils users, and the problem is certainly replicated with microforms and video recordings.
the marc codes distinguish between material formats, but they support only one category for sound recordings, lumping together cd, dvd audio, cassette tape, reel-to-reel tape, and all other types.6 this single "sound recording" definition is easily reflected in opacs (such as those powered by innovative interfaces' millennium and ex libris' aleph 500) and union catalogs (such as worldcat.org).7 however, the distinction between sound recording formats is embedded in fixed character positions of the 007 field, which presently cannot be indexed by many library automation systems because the relevant positions are not adjacent. an even more central challenge derives from the fact that music sound recordings—like journals and essay collections—contain within each item more than one work. thus, for one of the central material formats collected by a music library (as well as by a public library or other academic branches), users routinely find themselves searching for a distinct subset of the item record. perversely, though music catalogers do tend to include analytic added-entries for the subparts of a cd recording or printed score, and major ils vendors are learning to index them, aacr2 guidelines set arbitrary cutoff points of about fifteen tracks on a sound recording, and three performable units within a score.8 subsets of essay collections and journal runs are routinely exposed to users' searches by indexing and abstracting services and major databases, but subsets of libraries' music collections depend upon catalogers to exploit the marc records for user access.9 in light of these pervasive bibliographic relationships, catalogers of music (again, with parallels in other subjects) have developed a distinctive approach to the marc metadata schema. in particular, they—with their colleagues in literature, fine arts, and theology—rely upon the 700t field for uniform work titles, and upon careful authority control.10 however, once again, many major ils portals have spotty records in affording access to library collections via these data. innovative interfaces' millennium, though it clearly leads other major library products in this market, frequently frustrates music librarians (it is, of course, not alone in doing so).11 its automatic authority control feature works poorly with (necessary) music authority records.12 and even though innovative has been one of the first vendors to add a database index to the 700t field, partly in response to concerns expressed to the company by the music librarians' user group, millennium apparently does not allow for an appropriate level of follow-through on searching.13 an initial search by name of a major composer, for instance, yields a huge and cluttered result set containing all indexed 700t fields.14 the results do helpfully include the appropriate see also references, but those references disappear in a subsidiary (limited) search. in addition, the subsidiary display inexplicably changes to an unhelpful arrangement of generic 245 fields ("mozart, symphonies"; "mozart, operas, excerpts").
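the indexing problem with the 007 field can be made concrete with a short sketch. the following python fragment is illustrative only: it assumes the standard 007 character-position values for sound recordings (not the behavior of any ils named above) and shows how an indexer could combine the non-adjacent positions for material designation and playing speed into one searchable carrier facet.

# illustrative sketch: derive a carrier facet for sound recordings from the
# marc 007 control field. the position codes used below (007/00 category of
# material, 007/01 specific material designation, 007/03 speed) follow the
# standard 007 definitions; the mapping itself is an assumption for
# demonstration, not any vendor's implementation.
def sound_recording_carrier(field_007: str) -> str:
    if not field_007 or field_007[0] != "s":      # 007/00: 's' = sound recording
        return "not a sound recording"
    designation = field_007[1] if len(field_007) > 1 else " "   # 007/01
    speed = field_007[3] if len(field_007) > 3 else " "          # 007/03
    if designation == "d" and speed == "f":   # sound disc played at 1.4 m/s
        return "compact disc"
    if designation == "d" and speed == "b":   # sound disc played at 33 1/3 rpm
        return "lp"
    if designation == "s":
        return "audiocassette"
    if designation == "t":
        return "reel-to-reel tape"
    return "other sound recording"

# a hypothetical 007 value for a stereo compact disc
print(sound_recording_carrier("sd fsngnnmmned"))   # -> "compact disc"

a system deriving such a facet at index time could let users limit a search to compact discs or lps, a distinction the single marc "sound recording" category hides.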
similar challenges will be faced by other parts of an academic or large public library collection, including the literature collections (for works such as shakespeare’s plays), fine arts (for images and artists’ works), and theology (for works whose uniform title is in latin). the opac interfaces of other major ils vendors fare little better. the same search (for “mozart”) on the emory university library catalog (with an ils by sirsidynix), similarly yields a rich results set of more than one thousand records, and poses similar problems in refining the search.15 in the case of this opac, an index of 700t fields also exists, but it only may be searched from the inside of a single record; as with millennium, sirsidynix’s interface will then group the next set of results confusingly by 245 fields. the library corporation’s carl-x apparently does not contain a 700t index; the simple “mozart” search returns a muchsimplified set of only 97 results organized by 245a fields, and thus offers a more concise set of results but avoids the most incisive index for audio-visual materials.16 ex libris offers a somewhat more helpful display of its more restricted results; unfortunately for the present comparison, though the detailed results set does list the “format” of all mozart-authored items, the same term— “music”—is used for sound recordings, musical scores, and score excerpts, with no attempt logically to group the results around individual works.17 no 700t index appears present. ■ the frbr paradigm: review of literature and theory from the earliest library catalogs in the modern age, the tools of bibliographic organization have sought to afford users both access to the collection and collocation of related materials. anglo-american cataloging practice has traditionally served the first function by main entries and alternate access points and the second function by classification systems. however, as knowledge increases in scope and complexity, the systems of bibliographic control have needed to evolve. as early as the 1950s, theories were developing that sought to distinguish between the intellectual content of a work, and its often manifold physical embodiments.18 the 1961 paris international conference on cataloging principles first reified within the cataloging community a work-item distinction, though even the 1988 publication of the anglo-american cataloging rules, 2nd ed., “continued to demonstrate confusion about the nature . . . 
of works."19 meanwhile, extensive research into the nature of bibliographic relationships groped toward a consensus definition of the entity-types that could encompass such relationships.20 ed o'neill and diane vizine-goetz examined some one hundred editions of smollett's the expedition of humphry clinker over a two-hundred-year span of publication history to propose a hierarchical set of definitions for entity levels.21 the theoretical entities include the intellectual content of a work—which, in the case of audio-visual works, may not even exist in any printed format—the various versions, editions, and printings in which that intellectual content manifests itself, and the specific copies of each manifestation which a library may hold.22 research has discovered such clusters of bibliographically related entities for as much as 50 percent or more of all the intellectual works in any given library catalog, and as many as 85 percent of the works in a music catalog.23 this work laid the foundation for frbr (and, once again, incidentally underscored the breadth of its applicability to, and beyond, music catalogs). the theoretical framework of frbr is most concisely set forth in the final report of the ifla study group. the long-awaited publication traces its genesis to the 1990 stockholm seminar, and the resultant 1992 founding of the ifla study group on functional requirements for bibliographic records. the study group set out to develop: a framework that identifies and clearly defines the entities of interest to users of bibliographic records, the attributes of each entity, and the types of relationships that operate between entities . . . a conceptual model that would serve as the basis for relating specific attributes and relationships . . . to the various tasks that users perform when consulting bibliographic records. the study makes no a priori assumptions about the bibliographic record itself, either in terms of content or structure.24 in other words, the intention of the group's deliberations and the final report is to present a model for understanding bibliographic entities and the relationships between them to support information organization tools. it specifically adopts an approach that defines classes of entities based upon how users, rather than catalogers, approach bibliographic records—or, by natural extension, any system of metadata.
the frbr hierarchical entities comprise a fourfold set of definitions: ■ work: "a distinct intellectual or artistic creation"; ■ expression: "the intellectual or artistic realization of a work" in any combination of forms (including editions, arrangements, adaptations, translations, performances, etc.); ■ manifestation: "the physical embodiment of an expression of a work"; and ■ item: "a single exemplar of a manifestation."25 examples of these hierarchical levels abound in the bibliographic universe, but frequently music offers the quickest examples: ■ work: mozart's die zauberflöte (the magic flute) ■ work: puccini's la bohème ■ expression: the composer's complete musical score (1896) ■ manifestation: edition of the score printed by ricordi in 1897 ■ expression: an english language edition for piano and voices ■ expression: a performance by mirella freni, luciano pavarotti, and the berlin philharmonic orchestra (october 1972) ■ manifestation: a recording of this performance released on 33¹/³ rpm sound discs in 1972 by london records ■ manifestation: a re-release of the same performance on compact disc in 1987 by london records ■ item: the copy of the compact disc held by the columbus metropolitan library ■ item: the copy of the compact disc held by the university of cincinnati. in fact, lis research has tended to demonstrate what music librarians have always understood—that relatedness among items and complexity of families is most prevalent in audio-visual collections. even before the ifla report had been penned, sherry vellucci had set out the task: "to create new catalog structures that better serve the needs of the music user community, it is important first to understand the exact nature and complexity of the materials to be described in the catalog."26 even limiting herself to musical scores alone (that is, no recordings or monographs), vellucci found that more than 94.8 percent of her sample exhibited at least one bibliographic relationship with another entity in the collection; she further related this finding to the very "inherent nature of music, which requires performance for its aural realization," as opposed to, for example, monographic book printing.27 vellucci and others have frequently commented on how the relatedness of manifestations—in different formats, arrangements, and abridgements—of musical works continues to be a problem for information retrieval in the world of music bibliography.28 musical works have been variously and industriously described by musicologists and music bibliographers. yet, in the information retrieval domain [and, i might add, under both aacr and aacr2] . . . systems for bibliographic information retrieval . . . have been designed with the document as the key entity, and works have been dismissed as too abstract . . .29 the work is the access point many users will bring—in their minds, and thus in their queries—to a system. they intend, however, to discover, identify, and obtain specific manifestations of that work.
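to make the four-level hierarchy described above more concrete, here is a minimal python sketch of the entity structure, populated with the la bohème example from the list; the class names and fields are illustrative assumptions rather than the ifla model's formal attribute set.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Work:                       # "a distinct intellectual or artistic creation"
    title: str
    creator: str
    expressions: List["Expression"] = field(default_factory=list)

@dataclass
class Expression:                 # a realization of the work (edition, performance, translation, ...)
    description: str
    manifestations: List["Manifestation"] = field(default_factory=list)

@dataclass
class Manifestation:              # a physical embodiment (a particular publication or release)
    description: str
    items: List["Item"] = field(default_factory=list)

@dataclass
class Item:                       # a single exemplar held by one library
    holding_library: str

boheme = Work("la bohème", "giacomo puccini")
performance = Expression("performance by freni, pavarotti, and the berlin philharmonic, october 1972")
boheme.expressions.append(performance)
cd = Manifestation("compact disc re-release, london records, 1987")
performance.manifestations.append(cd)
cd.items.append(Item("columbus metropolitan library"))
cd.items.append(Item("university of cincinnati"))

# collocation: from the work record, a catalog can walk down to every related
# expression, manifestation, and item, and display them as one family.
for expression in boheme.expressions:
    for manifestation in expression.manifestations:
        print(boheme.title, "|", expression.description, "|", manifestation.description)

a frbrized catalog stores or derives exactly these parent-to-child links, so that a single query on the work can collocate every edition, performance, and copy beneath it.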
very recently, research has begun to demonstrate that the frbr model can offer specific advantages to music retrieval in cases such as these: "the description of bibliographic data in a frbr-based database leads to less redundancy and a clearer presentation of the relationships which are implicit in the traditional databases found in libraries today."30 explorations of the theory in view of the benefits to other disciplines, such as audio-visual and other graphic materials, maps, oral literature, and rare books, have appeared in the literature as well.31 the admitted weakness of the frbr theory, of course, is that it remains a theory at its inception, with still precious few working applications. ■ frbr applications working implementations of frbr in catalogs, opacs, and ilss are still relatively few but promise much for the future. the frbr theoretical framework has remained an area of intense research at oclc, which has even led to some prototype applications and, very recently, deployment in the worldcat local interface.32 a scattered few other researchers have crafted frbr catalogs and catalog displays for their own ends; the library of congress has a prototype as well. innovative, the leading academic ils vendor, announced a frbr feature for 2005 release, yet shelved the project for lack of a beta-testing partner library.33 ex libris' primo discovery tool, one other complete ils (by visionary technologies for library systems, or vtls), and the national library of australia have each deployed operational frbr applications.34 the number of projects testifies to the high level of interest among the cataloging and information science communities, while the relatively small number of successful applications testifies to the difficulties faced. oclc has engaged in a number of research projects and prototypes in order to explore ways that frbrization of bibliographic records could enhance information access. oclc research frequently notes the potential streamlining of library cataloging by frbrization; in addition, they have experienced "superior presentation" and "more intuitive clustering" of search results when the model is incorporated into systems.35 work-level definitions stand behind such oclc research prototypes as audience level, dewey browser, fictionfinder, xisbn, and live search. in every case, researchers determined that, though it was very difficult to automate any identification of expressions, application of work-level categories both simplifies and improves search result sets.36 an algorithm common to several of these applications is freely available as an open source application, and now as a public interface option in oclc's worldcat local.37 the algorithm creates an author/title key to cluster worksets (often at a higher level than the frbr work, as in the case of the two distinct works that are the book and screenplay for gone with the wind). in the public search interface, the result sets may be grouped at the work level; users may then execute a more granular search for "all editions," an option that then displays the group of expressions linked to the work record. unfortunately, as the software does not use 700t fields (its intention is to travel up the entity hierarchy, and it uses the 1xx, 24x, and 130 fields), its usefulness in solving the above challenges may not be immediate.
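as a rough illustration of the kind of author/title key described above (the normalization rules below are assumptions for demonstration, not oclc's published algorithm), a few lines of python show how records sharing a normalized creator and title collapse into one workset.

import re
import unicodedata
from collections import defaultdict

def normalize(text: str) -> str:
    """fold diacritics, lowercase, strip punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def workset_key(record: dict) -> str:
    """build an author/title key from 1xx (creator) and 130/24x (title) data."""
    author = record.get("1xx", "")
    title = record.get("130") or record.get("24x", "")
    return normalize(author) + " / " + normalize(title)

def cluster(records: list) -> dict:
    """group records that share the same author/title key into worksets."""
    worksets = defaultdict(list)
    for rec in records:
        worksets[workset_key(rec)].append(rec)
    return dict(worksets)

records = [
    {"1xx": "Mozart, Wolfgang Amadeus, 1756-1791.", "24x": "Die Zauberflöte"},   # a score
    {"1xx": "Mozart, Wolfgang Amadeus, 1756-1791",  "24x": "Die Zauberflote."},  # a recording
]
print(len(cluster(records)))   # -> 1: both records fall into the same workset

in a display layer each workset would then appear as a single result line, with an "all editions" link expanding to the individual records beneath it.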
a somewhat similar application (though merrilee proffitt declares it not to be a frbr product) was redlightgreen, a user interface for the exrlg union catalog based upon quasi-frbr clustering.38 the reports from designers of other automated systems offer interesting commentaries on the process. the team building an automatically frbrized database and user interface for austlit—a new union collection of australian literature among eight academic libraries and the national library of australia—acknowledged some difficulty with non-monographic works such as poems, though the majority of their database consisted of simpler work-manifestation pairs.39 based on strongly positive user feedback (“the presentation of information about related works [is] both useful and comprehensible”), a similar application was attempted on the australian national music gateway musicaustralia; it is unclear whether the project was shelved due to difficulties in automating the frbrization process.40 one recent application created for the perseus digital library adopts a somewhat different approach.41 rather than altering previously created marc records to allow hierarchical relationships to surface, this team created new records using crosswalks between marc and, for instance, mods, for work-level records. they claim some moderate level of success; though once again, their discussion of the process is more illuminating than their product. mimno and crane successfully allowed a single manifestation-level record to link upwards to many expressions, a necessary analytic feature especially for dealing with sound recordings. they did practically demonstrate the difficulty of searching elements from different levels of the hierarchy at the same time (such as work title and translator), a complication predicted by yee.42 three ils vendors have released products that use the frbr model: portia (visualcat), ex libris (primo), and vtls (virtua).43 the first product, a cataloging utility from a smaller player in the vendor market, claims to incorporate frbr into its metadata capture, yet the information available does not explain how, nor do they offer an opac to exploit it. the 2007 release of ex libris’ primo offers what the company calls “frbr groupings” of results.44 this discovery tool is not itself an ils, but promises to interoperate with major existing ils products to consolidate search results. it remains unclear at this time how ex libris’ “standard frbr algorithms” actually group records; the single deployment in the danish royal library allows searching for more records with the same title, for instance, but does not distinguish between translations of the same work.45 vtls, on the other hand, has since 2004 offered a complete product that has the potential to modify existing marc records—via local linking tags in the 001 and 004 fields—to create frbr relationships.46 their own studies agreed with oclc that a subset, roughly 18 percent, of existing catalog records (most heavily concentrated in music collections) would benefit from the process, and they thus allow for “mixed” catalogs, with only subsets (or even individually selected records) to be frbrized. the company’s own information suggests relatively simple implementation by library catalogers, coupled with robust functionality for users, and may be the leading edge of the next generation of catalog products. 
■ frbr solutions the ifla study group, following its user-centered approach, set out a list of specific tasks that users of a computer-aided catalog should be able to accomplish: ■ to find all manifestations embodying certain criteria, or to find a specific manifestation given identifying information about it; ■ to identify a work, and to identify expressions and manifestations of that work; ■ to select among works, among expressions, and among manifestations; and ■ to obtain a particular manifestation once selected. it seems clear that the frbr model offers a framework of relationships that can aid each task. unfortunately, none of the currently available commercial solutions may be completely applicable, by itself, for a single library. the oclc work-set algorithm is open source, as well as easily available through worldcat local, but it works only to create super-work records; it also ignores the 700t field so crucial to many of the issues noted above. none of the other home-grown applications is likely to have code available to an institution. the virtua module from vtls offers a very tempting solution, but may require a change of vendor.47 either adapting one of these solutions or designing a local application, then, raises the question: what would the ideal system entail? catalog frbrization will proceed in two phases: enhancing the existing catalog so that bibliographic relationships can surface in the retrieval phase, and designing or adapting a new interface and display to reflect those relationships.48 the first task may prove the more formidable, due to the size of even a modest catalog database and the difficulties often observed in automating such a task; while the librarians constructing the austlit system found that a relatively high percentage of records could be transferred en masse, the oclc research team had difficulty automatically pinpointing expressions from current marc records.49 despite current technology trends toward users' application of tags, reviews, and other metadata, a task as specialized as adding bibliographic relationships to the catalog demands specialized cataloging professionals.50 the best approach within a current library structure may be to create a single new position to head the project and to act as liaison with cataloging staff in the various branches and with vendor staff, if applicable. each library branch may judge on its own the proportion of records to frbrize, beginning with high-traffic works and authors, those for whom search results tend to be the most overwhelming and confusing to users. each branch can be responsible for allocating cataloging staff effort to the process, and will thus have specialist oversight of subsets of the database. three technical solutions to actually changing the database structure have been attempted in the literature to date: incrementally improving the existing marc records to better reflect bibliographic relationships, adding local linking tags, and simply creating new metadata schemas. the vtls solution of adding local linking tags seems most appropriate; relationships between records are created and maintained via unique identifiers and linking statements in the 001 and 004 fields (a small sketch of this linking approach follows below).51 oclc's open source software could expedite the creation of work-level records, and the creation of expression-level records will be made easier by the large amount of bibliographic information already present in the current catalog.
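as an illustration of that linking-tag approach, here is a minimal python sketch (not vtls's implementation) in which each record carries its own identifier in an 001-style field and its parent's identifier in an 004-style field, and the work/expression/manifestation hierarchy is reassembled by following those links. the identifiers and sample records are invented for the example.

```python
# A minimal sketch of local linking tags used to express FRBR relationships:
# the 001-style field holds a record's own identifier and the 004-style field
# holds its parent's identifier; the hierarchy is rebuilt by following links.
# Identifiers and sample records are illustrative assumptions.
from collections import defaultdict

records = [
    {"001": "w100", "004": None,   "level": "work",          "label": "Also sprach Zarathustra (Strauss)"},
    {"001": "e200", "004": "w100", "level": "expression",    "label": "Performance, Berlin Philharmonic, 1974"},
    {"001": "e201", "004": "w100", "level": "expression",    "label": "Full score, critical edition"},
    {"001": "m300", "004": "e200", "level": "manifestation", "label": "Compact disc reissue, 1995"},
    {"001": "m301", "004": "e201", "level": "manifestation", "label": "Printed score, 1999"},
]

children = defaultdict(list)
by_id = {}
for rec in records:
    by_id[rec["001"]] = rec
    if rec["004"]:
        children[rec["004"]].append(rec["001"])

def print_tree(rec_id, depth=0):
    """Render the linked hierarchy as an indented tree, one node per line."""
    rec = by_id[rec_id]
    print("  " * depth + f"{rec['level']}: {rec['label']}")
    for child_id in children[rec_id]:
        print_tree(child_id, depth + 1)

for rec in records:
    if rec["004"] is None:        # start from the top (work-level) records
        print_tree(rec["001"])
```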
wherever possible, cataloging staff also should take the opportunity to verify or create links to authority files so as to enhance retrieval.52 creating a new catalog display option could be accomplished via additions to current opac coding, either by adopting worldcat local or by designing parts of a new local interface. it need not even require a complete revision; the single site (ucl) currently deploying vtls’ frbrized interface maintains a mixed catalog and offers, once again, a highly intuitive model.53 when a searcher comes across a bibliographic record for which frbr linking is available, they may click a link to open a new display screen. we should strive, however, to use simple interface statements such as “view all different kinds of holdings,” “this work has x editions, in y languages” or “this version of the work has been published z times” (both the oclc prototype and the austlit gateway offer such helpful and user-friendly statements). though the foundational work of both tillett and smiraglia focused upon taxonomies of relationships, the hierarchical structure of the ifla proposal should remain at the forefront of the display, with a secondary organization by type of relationship or type of entity. rather than adopting a design which automatically refreshes at each click, a tree organization of the display should be more user-friendly, allowing users to maintain a visual sense of the organization that they are encountering (see appendix for screenshots of this type of tree display).54 format information should be included in the display, as an indication of a users’ primary category, as well as a distinction among expressions of a work. with these changes, the library catalog will begin to afford its users better access to many of its core collections. frbrization of even part of the catalog—concentrating on high-incidence authors, as identified by subject specialists—will allow it better to reflect, and collocate, items within the families of bibliographic relationships that have been acknowledged a part of library collections for decades. this increased collocation will begin to counteract the pitfalls of mere keyword searching on the part of users, especially in conjunction with renewed authority work. finally, frbr offers a display option in a revamped opac that is at the same time simpler than current result lists, and more elegant in its reflection of relatedness among items. each feature should better 28 information technology and libraries | march 200828 information technology and libraries | march 2008 enable the users of our catalog to find, select, and obtain appropriate resources, and will bring our libraries into the next generation of cataloging practice. references and notes 1. ifla committee on the functional requirements for bibliographic records, final report (munich: k. g. saur, 1998); see also http://www.ifla.org/vii/s13/wgfrbr/bibliography.htm (accessed mar. 10, 2007). 2. this paper began as a graduate research assignment for lis 60640 (library automation), in the kent state university mlis program, march 19, 2007. my thanks to jennifer hambrick, nancy lensenmayer, and joan lippincott, for their helpful comments on earlier drafts. the curricular assignment asked for a library automation proposal in a specific library setting; the original review contained a set of recommendations concerning frbr through the lens of a (fictional) medium-sized academic library system, that of st. hildegard of bingen catholic university. 
as will be noted below, the branch music library typically serves a small population of music majors (graduate and undergraduate) within such an institution, but also a large portion of the student body that use the library’s collection to support their music coursework and arts distribution requirements. any music library’s proportion of the overall system’s holdings may be relatively small, but will include materials in a diverse set of formats: monographs, serials, musical scores, sound recordings in several formats (cassette tapes, lps, cds, and streaming audio files), and a growing collection of video recordings, likewise in several formats (vhs, laser discs, and dvd). it thus offers an early test case for difficulties with an automated library system. 3. dan zager, “collection development and management,” notes—quarterly journal of the music library association 56, no. 3 (march 2000): 569. 4. sherry l. velluci, “music metadata and authority control in an international context,” notes—quarterly journal of the music library association 57, no. 3 (mar. 2001): 541. 5. the opac for the university of huddersfield library system famously first deployed a search option for related items (“did you mean . . . ?”); http://www.hud.ac.uk/cls (accessed july 10, 2007). frbr not only offers the related item search, but also logically groups related works throughout the library catalog. 6. allyson carlyle demonstrated empirically that users value an object’s format as one of the first distinguishing features: “user categorization of works: toward improved organization of online catalog displays,” journal of documentation 55, no. 2 (mar. 1999): 184–208 at 197. 7. millennium will feature heavily in the following discussion, both because of its position leading the academic library automation market (being adopted wholesale by, for instance, the ohio statewide academic library consortium), and because it was the subject of the original paper. 8. see alastair boyd, “the worst of both worlds: how old rules and new interfaces hinder access to music,” caml review 33, no. 3 (nov. 2005), http://www.yorku.ca/caml/ review/33-3/both_worlds.htm (accessed mar. 12, 2007); michael gorman and paul w. winkler, eds., anglo-american cataloging rules, 2nd ed. (chicago: ala, 1988). 9. in the past few years, a small subset of the search literature has described technical efforts to develop search engines that can query by musical example; see j. stephen downie, “the scientific evaluation of music information retrieval systems: foundations and future,” computer music journal 28, no. 2 (summer 2004): 12–23. a company called melodis corporation has recently announced a successful launch of a query-by-humming search engine, though a verdict from the music community remains out; http://www.midomi.com (accessed jan. 31, 2007). 10. see velluci, “music metadata and authority control in an international context”; richard p. smiraglia, “uniform titles for music: an exercise in collocating works,” cataloging and classification quarterly 9, no. 3 (1989): 97–114; steven h. wright, “music librarianship at the turn of the century: technology,” notes—quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97. each author builds upon the foundational work of barbara tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (ph.d. diss., university of california at los angeles, 1987). 11. 
“at conferences, [my colleagues] are always groaning if they are a voyager client,” interview with an academic music librarian by the author, feb. 9, 2007. 12. several prominent music librarians only discovered that innovative’s system had such a feature when instances of the automatic system’s changing carefully crafted music authority records were discovered; mark sharff (washington university in st. louis) and deborah pierce (university of washington), postings to innovative music users’ group electronic discussion list, oct. 6, 2006, archive accessed feb. 1, 2007. 13. music librarians are the only subset of the millennium users to have formed their own innovate users’ group. sirsidynix has a separate users’ group for stm librarians, and ex libris hosts a law librarians’ users’ group, two other groups whose interaction with the ils poses discipline-specific challenges. 14. searches were tested on the the ohio state university libraries’ opac , http://library.osu.edu (accessed mar. 10, 2007). 15. http://www.emory.edu/libraries.cfm (accessed june 27, 2007). 16. searches performed on the library of oklahoma state university, http://www.library.okstate.edu (accessed june 27, 2007); tlc has considered making frbrization a possible feature of their product. they offer some concatenation of “intellectually similar bibliographic records,” and “tlc continues to monitor emerging frbr standards”; don kaiser, personal communication to the author, july 8, 2007. i was unable to reach representatives of sirsidynix on this issue. 17. searches performed on the mit library catalog, powered by aleph 500 http://libraries.mit.edu (accessed june 27, 2007). 18. eva verona, “literary unit versus bibliographic unit [1959],” in foundations of descriptive cataloging, ed. michael carpenter and elaine svenonius, 155–75 (littleton, colo.: libraries unlimited, 1985), and seymour lubetzky, principles of cataloging, final report phase i: descriptive cataloging (los angeles: institute for library research, 1969), are usually credited with article title | author 29frbrization of a library catalog | dickey 29 the foundational work on such theories; see richard p. smiraglia, the nature of “a work”: implications for the organization of knowledge (lanham, md.: scarecrow, 2001), 15–33, to whom the following overview is indebted. 19. anglo-american cataloging rules, cited in smiraglia, the nature of “a work,” 33. 20. among the many library and information science thinkers contributing to this body of research, the most prominent have been patrick wilson, “the second objective” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 5–16 (san diego: academic publ., 1989); edward t. o’neill and diane vizine-goetz, “bibliographic relationships: implications for the function of the catalog,” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 167–79 (san diego: academic publ., 1989); barbara ann tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (ph.d. diss, university of california, los angeles, 1987); eadem, “bibliographic relationships,” in relationships in the organization of knowledge, carol a. bean and rebecca green, eds. , 19–35 (dordrecht: kluwer, 2001) (summary of her dissertation findings on 19–20); martha m. yee, “manifestations and near-equivalents: theory with special attention to moving-image materials,” library resources and technical services 38, no. 3 (1994): 227–55. 21. 
o’neill and vizine-goetz, “bibliographic relationships”; see also edward t. o’neill, “frbr: application of the entityrelationship model to humphrey clinker,” library resources and technical services 46, no. 4 (oct. 2002): 150–59. 22. theorists in music semiotics who have more or less profoundly influenced music librarians’ view of their materials include jean-jacques nattiez, music and discourse: toward a semiology of music, trans. by carolyn abbate (princeton, n.j.: princeton univ. pr., 1990), and lydia goehr, the imaginary museum of musical works (new york: oxford univ. pr., 1992). see also smiraglia, the nature of “a work,” 64. for a concise overview of how semiotic theory has influenced thinking about literary texts, see w. c. greetham, theories of the text (oxford: oxford univ. pr., 1999), 276–325. 23. studies have found families of derivative bibliographic relationships in 30.2 percent of all worldcat records, 49.9 percent of records in the catalog of georgetown university library, 52.9 percent in the burke theological library (union theological seminary), 57.9 percent of theological works in the new york university library, and 85.4 percent in the sibley music library at the eastman school of music (university of rochester). see smiraglia, the nature of “a work,” 87, who cites richard p. smiraglia and gregory h. leazer, “derivative bibliographic relationships: the work relationship in a global bibliographic database,” journal of the american society for information science 50 (1999): 493–504; richard p. smiraglia, “authority control and the extent of derivative bibliographic relationships” (ph.d. diss., university of chicago, 1992); richard p. smiraglia, “derivative bibliographic relationships among theological works,” proceedings of the 62nd annual meeting of the american society for information science (medford, n.j.: information today, 1999): 497–506; and sherry l. vellucci, “bibliographic relationships among musical bibliographic entities: a conceptual analysis of music represented in a library catalog with a taxonomy of the relationships” (d.l.s. diss., columbia university, 1994). 24. ifla, final report, 2–3. 25. ibid, 16–23. 26. sherry l. vellucci, bibliographic relationships in music catalogs (lanham, md.: scarecrow, 1997), 1. 27. ibid, 238; 251. 28. vellucci, “music metadata”; richard p. smiraglia, “musical works and information retrieval,” notes: quarterly journal of the music library association 58, no. 4 (june 2002). patrick le boeuf notes that users of music collections often use the single word “score” to indicate any one of the four frbr entities; “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco,” in functional requirements for bibliographic records (frbr): hype or cure-all? ed. patrick le boeuf, 103–23 at 105–06 (new york: haworth, 2005). 29. smiraglia, “musical works and information retrieval,” 2. 30. marte brenne, “storage and retrieval of musical documents in a frbr-based library catalogue” (masters’ thesis, oslo university college, 2004), 79. see also john anderies, “enhancing library catalogs for music,” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; powerpoint presentation accessed mar. 12, 2007, from http://academics. hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt; boyd, “the worst of both worlds.” 31. 
see the extensive bibliography compiled by ifla, cataloging division: “frbr bibliography,” http://www.ifla.org/ vii/s13/wgfrbr.bibliography.htm (accessed mar. 10, 2007). 32. the first ils deployment of the worldcat local application using frbr is with the university of washington libraries: http://www.lib.washington.edu (accessed june 27, 2007). 33. innovative interfaces, inc., “millennium 2005 preview: frbr support,” inn-touch (june 2004), 9. interestingly, the onepage advertisement for the new service chose a musical work, puccini’s opera la bohème, to illustrate how the sorting would work. innovative interfaces booth staff at the ala national conference, washington, d.c., june 24, 2007, told the author the company has moved in a different development direction now (investing more heavily in faceted browsing). 34. denmark’s det kongelige bibliotek has been the first ex libris partner library to deploy primo, http://www.kb.dk/en (accessed july 10, 2007). the vtls system has been operating since 2004 at the université catholique de louvain, http:// www.bib.ucl.ac.be (accessed mar. 15, 2007). for austlit, see http://www.austlit.edu.au (accessed mar. 14, 2007). 35. rick bennett, brian f. lavoie, and edward t. o’neill, “the concept of a work in worldcat: an application of frbr,” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60. work-level records allow manifestation and item records to inherit labor-intensive subject classification metadata; eric childress, “frbr and oclc research,” paper presented at the university of north carolina-chapel hill, apr. 10, 2006, http://www.oclc.org/research/presentations/ childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007). 36. thomas b. hickey, edward t. o’neill, and jenny toves, “experiments with the ifla functional requirements for bibliographic records (frbr),” d-lib 8, no. 9 (sept. 2002), http://www.dlib.org/dlib/september02/hickey/09hickey.html (accessed mar. 12, 2007). 37. thomas b. hickey and jenny toves, “frbr work-set algorithm,” apr. 2005 report, http://www.oclc.org/research/ projects/frbr/default.htm (accessed mar. 12, 2007); algorithm 30 information technology and libraries | march 200830 information technology and libraries | march 2008 available at http://www.oclc.org/research/projects/frbr/algorithm.htm. on worldcat local, see above, note 32. 38. merrilee proffitt, “redlightgreen: frbr between a rock and a hard place,” http://www.ala.org/ala/alcts/alctsconted/ presentations/proffitt.pdf (accessed mar. 12, 2007). redlight green has been discontinued, and some of its technology incorporated into worldcat local. 39. http://www.austlit.edu.au (accessed mar. 14, 2007), but unfortunately a subscription database at this time, and thus unavailable for operational comparison. see marie-louise ayres, “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia,” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54, http:// www.nla.gov.au/nla/staffpaper/2005/ayres1.html (accessed mar. 12, 2007). 40. ibid. 41. see david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib 11, no. 10 (oct. 2005); http://www.dlib.org/dlib/october05/ crane/10crane.html (accessed mar. 12, 2007). 42. ibid. see also martha m. yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 
3 (2005): 77–95, http://repositories.cdlib.org/postprints/715 (accessed mar. 12, 2007). 43. portia, “visualcat overview,” http://www.portia.dk/ pubs/visualcat/present/visualcatoverview20050607.pdf (accessed mar. 14, 2007); vtls, inc., “virtua,” http://www.vtls. com/brochures/virtua.pdf (accessed mar. 14, 2007). 44. http://www.exlibrisgroup.com/primo_orig.htm (accessed july 10, 2007). 45. syed ahmed, personal communication to the author, july 10, 2007; searches run july 10, 2007, on http://www.kb.dk/en. the library’s holdings of manifestations of mozart’s singspiel opera, the magic flute, run to four different groupings on this catalog: one under the title “die zauberflöte,” one under the title “la flute enchantée: opéra fantastique en 4 actes,” and two separate groups under the title “tryllefløtjen.” 46. “vtls announces first production use of frbr,” http:// www.vtls.com/corporate/releases/2004/6.shtml (accessed mar. 14, 2007). unfortunately, though this press release indicates commitments on the part of the université catholique de louvain and vaughan public libraries (ontario, canada) to use fully frbrized catalogs, only the first is operating in this mode as of july 2007, and with only a subset of its catalog adapted. 47. virtua is not interoperable, for instance, with any of innovative’s other ils modules, which continue to dominate a number of larger academic consortia; john espley, vtls inc. director of design, personal communication to the author, mar. 15, 2007. 48. see allyson carlyle, “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays,” library resources and technical services 41, no. 2 (1997): 79–100. 49. even at the work-level, yee distinguished fully eight different places in a marc record in which the identity of a work may be located, “frbrization,” 79–80. 50. gregory leazer and richard p. smiraglia imply that cataloger-based “maps” of bibliographic relationships are inadequate; “bibliographic families in the library catalog: a qualitative analysis and grounded theory,” library resources and technical services 43, no. 4 (1999): 191–212. the cataloging failures they describe, however, are more a result of inadequacies in the current rules and practice, and do not really prove that catalogers have failed in the task of creating useful systems. 51. vinood chacra and john espley, “differentiating libraries though enriched user searching: frbr as the next dimensions in meaningful information retrieval,” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007). 52. see yee, “frbrization.” 53. http://www.bib.ucl.ac.be (accessed mar. 15, 2007). 54. not only does the ex libris primo application need clickthroughs, it creates a new window for an extra step before presenting a new group of records. bibliography anderies, john. “enhancing library catalogs for music.” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; http://academics.hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt (accessed mar. 12, 2007). ayres, marie-louise. “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia.” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54; http://www.nla.gov.au/nla/staffpaper/2005/ ayres1.html (accessed mar. 12, 2007). bennett, rick, brian f. lavoie, and edward t. o’neill. 
“the concept of a work in worldcat: an application of frbr.” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60. boyd, alistair. “the worst of both worlds: how old rules and new interfaces hinder access to music.” caml review 33, no. 3 (nov. 2005); http://www.yorku.ca/caml/review/33-3/ both_worlds.htm (accessed mar. 12, 2007). brenne, marte. “storage and retrieval of musical documents in a frbr-based library catalogue.” masters’ thesis, oslo university college, 2004. carlyle, allyson. “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays,” library resources and technical services 41, no. 2 (1997): 79–100. ______. “user categorization of works: toward improved organization of online catalog displays.” journal of documentation 55, no. 2 (mar. 1999): 184–208 chacra, vinood, and john espley. “differentiating libraries though enriched user searching: frbr as the next dimensions in meaningful information retrieval.” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007). childress, eric. “frbr and oclc research.” paper presented at the university of north carolina-chapel hill, apr. 10, 2006; http://www.oclc.org/research/presentations/ childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007). hickey, thomas b., and edward o’neill. “frbrizing oclc’s worldcat.” in functional requirements for bibliographic records article title | author 31frbrization of a library catalog | dickey 31 (frbr): hype or cure-all? ed. patrick le boeuf, 239-251. new york: haworth, 2005. hickey, thomas b., and jenny toves. “frbr work-set algorithm.” apr. 2005 report; http://www.oclc.org/research/ frbr (accessed mar. 12, 2007). hickey, thomas b., edward t. o’neill, and jenny toves, “experiments with the ifla functional requirements for bibliographic records (frbr),” d-lib 8, no. 9 (sept. 2002); http://www.dlib.org/dlib/september02/hickey/09hickey. html (accessed mar. 12, 2007). ifla study group on the functional requirements for bibliographic records. functional requirements for bibliographic records: final report. munich: k. g. saur, 1998. layne, sara shatford. “subject access to art images.” in introduction to art image access: issues, tools, standards, strategies, murtha baca, ed., 1–18. los angeles: getty research institute, 2002. leazer, gregory, and richard p. smiraglia. “bibliographic families in the library catalog: a qualitative analysis and grounded theory.” library resources and technical services 43, no. 4 (1999): 191–212. le boeuf, patrick. “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco.” in functional requirements for bibliographic records (frbr): hype or cure-all? patrick le boeuf, ed., 103–23 new york: haworth, 2005. markey, karen. subject access to visual resources collections: a model for computer construction of thematic catalogs. new york: greenwood, 1986. mimno, david, and gregory crane. “hierarchical catalog records: implementing a frbr catalog.” d-lib 11, no. 10 (oct. 2005); http://www.dlib.org/dlib/october05/crane/10crane. html (accessed mar. 12, 2007). o’neill, edward t. “frbr: application of the entity-relationship model to humphrey clinker.” library resources and technical services 46, no. 4 (oct. 2002): 150–59. o’neill, edward t., and diane vizine-goetz. “bibliographic relationships: implications for the function of the catalog.” in the conceptual foundations of descriptive cataloging. 
elaine svenonius, ed., 167–79. san diego: academic publ., 1989. proffitt, merrilee. “redlightgreen: frbr between a rock and a hard place.” paper presented at the 2004 ala annual conference, orlando, fla.; http://www.ala.org/ala/alcts/alctsconted/presentations/proffitt.pdf (accessed mar. 12, 2007). smiraglia, richard p. bibliographic control of music, 1897–2000. lanham, md.: scarecrow and music library association, 2006. ______. “content metadata: an analysis of etruscan artifacts in a museum of archaeology.” cataloging and classification quarterly, 40, no. 3/4 (2005): 135–51. ______. “musical works and information retrieval,” notes: quarterly journal of the music library association 58, no. 4 (june 2002): 747–64. ______. the nature of “a work”: implications for the organization of knowledge. lanham, md.: scarecrow, 2001. ______. “uniform titles for music: an exercise in collocating works.” cataloging and classification quarterly 9, no. 3 (1989): 97–114. tillett, barbara ann. “bibliographic relationships.” in relationships in the organization of knowledge. carol a. bean and rebecca green, eds., 19–35. dordrecht: kluwer, 2001. vellucci, sherry l. bibliographic relationships in music catalogs. lanham, md.: scarecrow, 1997. ______. “music metadata and authority control in an international context.” notes—quarterly journal of the music library association 57, no. 3 (mar. 2001): 541–54. wilson, patrick. “the second objective.” in the conceptual foundations of descriptive cataloging. elaine svenonius, ed., 5–16. san diego: academic publ., 1989. wright, h. s. “music librarianship at the turn of the century: technology.” notes: quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97. yee, martha m. “frbrization: a method for turning online public finding lists into online public catalogs.” information technology and libraries 24, no. 3 (2005): 77–95; http://repositories.cdlib.org/postprints/713 (accessed mar. 12, 2007). ______. “manifestations and near-equivalents: theory with special attention to moving-image materials.” library resources and technical services 38, no. 3 (1994): 227–55. zager, daniel. “collection development and management.” notes: quarterly journal of the music library association 56, no. 3 (2000): 567–73. 32 information technology and libraries | march 200832 information technology and libraries | march 2008 a search on also sprach zarathustra on the online public access catalog for the universite catholique de louvain, with results frbrized. (a vtls opac). selecting the first work yields the following screen: . . . which, when frbrized, yields a list of expressions. any part of the tree may be expanded, to display manifestations, and item-level records follow. appendix: examples of a frbrized tree display web services and widgets for library information systems | han 87on the clouds: a new way of computing | han 87 shape cloud computing. for example, sun’s well-known slogan “the network is the computer” was established in late 1980s. salesforce.com has been providing on-demand software as a service (saas) for customers since 1999. ibm and microsoft started to deliver web services in the early 2000s. microsoft’s azure service provides an operating system and a set of developer tools and services. google’s popular google docs software provides web-based word-processing, spreadsheet, and presentation applications. google app engine allows system developers to run their python/java applications on google’s infrastructure. sun provides $1 per cpu hour. 
amazon is well-known for providing web services such as ec2 and s3. yahoo! announced that it would use the apache hadoop framework to allow users to work with thousands of nodes and petabytes (1 million gigabytes) of data. these examples demonstrate that cloud computing providers are offering services on every level, from hardware (e.g., amazon and sun), to operating systems (e.g., google and microsoft), to software and service (e.g., google, microsoft, and yahoo!). cloud-computing providers target a variety of end users, from software developers to the general public. for additional information regarding cloud computing models, the university of california (uc) berkeley’s report provides a good comparison of these models by amazon, microsoft, and google.4 as cloud computing providers lower prices and it advancements remove technology barriers—such as virtualization and network bandwidth—cloud computing has moved into the mainstream.5 gartner stated, “organizations are switching from factors related to cloud computing: infinite computing resources available on demand, removing the need to plan ahead; the removal of an up-front costly investment, allowing companies to start small and increase resources when needed; and a system that is pay-for-use on a short-term basis and releases customers when needed (e.g., cpu by hour, storage by day).2 national institute of standards and technology (nist) currently defines cloud computing as “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. network, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”3 as there are several definitions for “utility computing” and “cloud computing,” the author does not intend to suggest a better definition, but rather to list the characteristics of cloud computing. the term “cloud computing” means that ■■ customers do not own network resources, such as hardware, software, systems, or services; ■■ network resources are provided through remote data centers on a subscription basis; and ■■ network resources are delivered as services over the web. this article discusses using cloud computing on an it-infrastructure level, including building virtual server nodes and running a library’s essential computer systems in remote data centers by paying a fee instead of running them on-site. the article reviews current cloud computing services, presents the author’s experience, and discusses advantages and disadvantages of using the new approach. all kinds of clouds major it companies have spent billions of dollars since the 1990s to on the clouds: a new way of computing this article introduces cloud computing and discusses the author’s experience “on the clouds.” the author reviews cloud computing services and providers, then presents his experience of running multiple systems (e.g., integrated library systems, content management systems, and repository software). he evaluates costs, discusses advantages, and addresses some issues about cloud computing. cloud computing fundamentally changes the ways institutions and companies manage their computing needs. libraries can take advantage of cloud computing to start an it project with low cost, to manage computing resources cost-effectively, and to explore new computing possibilities. 
s cholarly communication and new ways of teaching provide an opportunity for academic institutions to collaborate on providing access to scholarly materials and research data. there is a growing need to handle large amounts of data using computer algorithms that presents challenges to libraries with limited experience in handling nontextual materials. because of the current economic crisis, academic institutions need to find ways to acquire and manage computing resources in a cost-effective manner. one of the hottest topics in it is cloud computing. cloud computing is not new to many of us because we have been using some of its services, such as google docs, for years. in his latest book, the big switch: rewiring the world, from edison to google, carr argues that computing will go the way of electricity: purchase when needed, which he calls “utility computing.” his examples include amazon’s ec2 (elastic computing cloud), and s3 (simple storage) services.1 amazon’s chief technology officer proposed the following yan hantutorial yan han (hany@u.library.arizona.edu) is associate librarian, university of arizona libraries, tucson. 88 information technology and libraries | june 201088 information technology and libraries | june 2010 company-owner hardware and software to per-use service-based models.”6 for example, the u.s. government website (http://www.usa .gov/) will soon begin using cloud computing.7 the new york times used amazon’s ec2 and s3 services as well as a hadoop application to provide open access to public domain articles from 1851 to 1922. the times loaded 4 tb of raw tiff images and their derivative 11 million pdfs into amazon’s s3 in twenty-four hours at very reasonable cost.8 this project is very similar to digital library projects run by academic libraries. oclc announced its movement of library management services to the web.9 it is clear that oclc is going to deliver a web-based integrated library system (ils) to provide a new way of running an ils. duraspace, a joint organization by fedora commons and dspace foundation, announced that they would be taking advantage of cloud storage and cloud computing.10 on the clouds computing needs in academic libraries can be placed into two categories: user computing needs and library goals. user computing needs academic libraries usually run hundreds of pcs for students and staff to fulfill their individual needs (e.g., microsoft office, browsers, and image-, audio-, and video-processing applications). library goals a variety of library systems are used to achieve libraries’ goals to support research, learning, and teaching. these systems include the following: ■■ library website: the website may be built on simple html webpages or a content management system such as drupal, joomla, or any home-grown php, perl, asp, or jsp system. ■■ ils: this system provides traditional core library work such as cataloging, acquisition, reporting, accounting, and user management. typical systems include innovative interfaces, sirsidynix, voyager, and opensource software such as koha. ■■ repository system: this system provides submission and access to the institution’s digital collections and scholarship. typical systems include dspace, fedora, eprints, contentdm, and greenstone. ■■ other systems: for example, federated search systems, learning object management systems, interlibrary loan (ill) systems, and reference tracking systems. ■■ public and private storage: staff file-sharing, digitization, and backup. 
due to differences in end users and functionality, most systems do not use computing resources equally. for example, the ils is input and output intensive and database query intensive, while repository systems require storage ranging from a few gigabytes to dozens of terabytes and substantial network bandwidth. cloud computing brings a fundamental shift in computing. it changes the way organizations acquire, configure, manage, and maintain computing resources to achieve their business goals. the availability of cloud computing providers allows organizations to focus on their business and leave general computing maintenance to the major it companies. in the fall of 2008, the author started to research cloud computing providers and how he could implement cloud computing for some library systems to save staff and equipment costs. in january 2009, the author started his plan to build library systems “on the clouds.” the university of arizona libraries (ual) has been a key player in the process of rebuilding higher education in afghanistan since 2001. ual librarian atifa rawan and the author have received multiple grant contracts to build technical infrastructures for afghanistan’s academic libraries. the technical infrastructure includes the following: ■■ afghanistan ils: a bilingual ils based on the open-source system koha.11 ■■ afghanistan digital libraries website (http://www.afghan digitallibraries.org/): originally built on simple html pages, later rebuilt in 2008 using the content management system joomla. ■■ a digitization management system. the author has also developed a japanese ill system (http://gif project.libraryfinder.org) for the north american coordinating council on japanese library resources. these systems had been running on ual’s internal technical infrastructure. these systems run in a complex computing environment, require different modules, and do not use computing resources equally. for example, the afghan ils runs on linux, apache, mysql, and perl. its opac and staff interface run on two different ports. the afghanistan digital libraries website requires linux, apache, mysql, and php. the japanese ill system was written in java and runs on tomcat. there are several reasons why the author moved these systems to the new cloud computing infrastructure: ■■ these systems need to be accessed in a system mode by people who are not ual employees. ■■ system rebooting time can be substantial in this infrastructure because of server setup and it policy. ■■ the current on-site server has web services and widgets for library information systems | han 89on the clouds: a new way of computing | han 89 reached its life expectancy and requires a replacement. by analyzing the complex needs of different systems and considering how to use resources more effectively, the author decided to run all the systems through one cloud computing provider. by comparing the features and the costs, linode (http://www.linode.com/) was chosen because it provides full ssh and root access using virtualization, four data centers in geographically diverse areas, high availability and clustering support, and an option for month-to-month contracts. in addition, other customers have provided positive reviews. in january 2009, the author purchased one node located in fremont, california, for $19.95 per month. an implementation plan (see appendix) was drafted to complete the project in phases. the author owns a virtual server and has access to everything that a physical server provides. 
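since the migrated applications answer on several different ports (the koha opac and staff interface, the joomla site, and the tomcat-based ill system), a quick availability check of the kind implied by the migration tests described in the next paragraphs might look like the following minimal python sketch. the hostnames and port numbers are illustrative assumptions, not the project's actual configuration.

```python
# A minimal post-migration check: attempt a TCP connection to each service and
# report which ones answer. Hostnames and ports below are placeholders, not
# the actual addresses used in the project described here.
import socket

SERVICES = {
    "afghan ils opac (koha)":          ("ils.example.org", 80),
    "afghan ils staff client (koha)":  ("ils.example.org", 8080),
    "digital libraries site (joomla)": ("www.example.org", 80),
    "japanese ill system (tomcat)":    ("ill.example.org", 8080),
}

def is_up(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, (host, port) in SERVICES.items():
    status = "up" if is_up(host, port) else "DOWN"
    print(f"{name:35s} {host}:{port}  {status}")
```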
in addition, the provider and the user community provided timely help and technical support. the migration of systems was straightforward: a linux kernel (debian 4.0) was installed within an hour, domain registration was complete and the domains went active in twenty-four hours, the afghanistan digital libraries' website (based on joomla) migration was complete within a week, and all supporting tools and libraries (e.g., mysql, tomcat, and java sdk) were installed and configured within a few days. a month later, the afghanistan ils (based on koha) migration was completed. the ill system was also migrated without problem. tests were performed on all these systems to verify their usability. in summary, the migration of systems was very successful and did not encounter any barriers. it addresses the issues noted above: after the migration, ssh log-ins for users who are not university employees were set up quickly; systems maintenance is managed by the author's team, and rebooting now takes only about one minute; and there is no need to buy a new server and put it in a temperature- and security-controlled environment. the hardware is maintained by the provider. the administrative gui for the linux nodes is shown in figure 1. since migration, no downtime due to hardware or other failures caused by the provider has been observed. after migrating all the systems successfully and running them in a reliable mode for a few months, the second phase was implemented (see appendix). another linux node (located in atlanta, georgia) was purchased for backup and monitoring (see figure 2). nagios, an open-source monitoring system, was tested and configured to identify and report problems for the above library systems. nagios provides the following functions: (1) monitoring of critical computing components, such as the network, systems, services, and servers; (2) timely alerts delivered via e-mail or cell phone; and (3) reports and recorded logs of outages, events, and alerts. a backup script is also run as a prescheduled job to back up the systems on a regular basis. (figure 1. linux node administration web interface. figure 2. two linux nodes located in two remote data centers, fremont, california, and atlanta, georgia.) findings and discussions since january 2009, all the systems have been migrated and have been running without any issues caused by the provider. the author is very satisfied with the outcomes and cost. the annual cost of running two nodes is $480 per year, compared to at least $4,000 if the hardware had been run in the library.12 from the author's experience, cloud computing provides the following advantages over the traditional way of computing in academic institutions: ■■ cost-effectiveness: from the above example and literature review, it is obvious that using cloud computing to run applications, systems, and it infrastructure saves staff and financial resources. uc berkeley's report and zawodny's blog provide a detailed analysis of costs for cpu hours and disk storage.13 ■■ flexibility: cloud computing allows organizations to start a project quickly without worrying about up-front costs. computing resources such as disk storage, cpu, and ram can be added when needed.
in this case, the author started on a small scale by purchasing one node and added additional resources later. ■■ data safety: organizations are able to purchase storage in data centers located thousands of miles away, increasing data safety in case of natural disasters or other factors. this strategy is very difficult to achieve in a traditional off-site backup. ■■ high availability: cloud computing providers such as microsoft, google, and amazon have better resources to provide more up-time than almost any other organizations and companies do. ■■ the ability to handle large amounts of data: cloud computing has a pay-for-use business model that allows academic institutions to analyze terabytes of data using distributed computing over hundreds of computers for a short-time cost. on-demand data storage, high availability and data safety are critical features for academic libraries.14 however, readers should be aware of some technical and business issues: ■■ availability of a service: in several widely reported cases, amazon’s s3 and google gmail were inaccessible for a duration of several hours in 2008. the author believes that the commercial providers have better technical and financial resources to keep more up-time than most academic institutions. for those wanting no single point of failure (e.g., a provider goes out of business), the author suggests storing duplicate data with a different provider or locally. ■■ data confidentiality: most academic libraries have open-access data. this issue can be solved by encrypting data before moving to the clouds. in addition, licensing terms can be negotiated with providers regarding data safety and confidentiality. ■■ data transfer bottlenecks: accessing the digital collections requires considerable network bandwidth, and digital collections are usually optimized for customer access. moving huge amounts of data (e.g., preservation digital images, audios, videos, and data sets) to data centers can be scheduled during off hours (e.g., 1–5 a.m.), or data can be shipped on hard disks to the data centers. ■■ legal jurisdiction: legal jurisdiction creates complex issues for both providers and end users. for example, canadian privacy laws regulate data privacy in public and private sectors. in 2008, the office of the privacy commissioner of canada released a finding that “outsourcing of canada .com email services to u.s.-based firm raises questions for subscribers,” and expressed concerns about public sector privacy protection.15 this brings concerns to both providers and end users, and it was suggested that privacy issues will be very challenging.16 summary the author introduces cloud computing services and providers, presents his experience of running multiple systems such as ils, content management systems, repository software, and the other system “on the clouds” since january 2009. using cloud computing brings significant cost savings and flexibility. however, readers should be aware of technical and business issues. the author is very satisfied with his experience of moving library systems to cloud computing. his experience demonstrates a new way of managing critical computing resources in an academic library setting. the next steps include using cloud computing to meet digital collections’ storage needs. cloud computing brings fundamental changes to organizations managing their computing needs. 
as major organizations in library fields, such as oclc, started to take advantage of cloud computing, the author believes that cloud computing will play an important role in library it. acknowledgments the author thanks usaid and washington state university for providing financial support. the author thanks matthew cleveland’s excellent work “on the clouds.” references 1. nicholars carr, the big switch: rewiring the world, from edison to google web services and widgets for library information systems | han 91on the clouds: a new way of computing | han 91 (london: norton, 2008). 2. werner vogels, “a head in the clouds—the power of infrastructure as a service” (paper presented at the cloud computing and in applications conference (cca ’08), chicago, oct. 22–23, 2008). 3. peter mell and tim grance, “draft nist working definition of cloud computing,” national institute of standards and technology (may 11, 2009), http:// csrc.nist.gov/groups/sns/cloud-computing/index.html (accessed july 22, 2009). 4. michael armbust et al., “above the clouds: a berkeley view of cloud computing,” technical report, university of california, berkeley, eecs department, feb. 10, 2009, http://www.eecs.berkeley .edu/pubs/techrpts/2009/eecs-200928.html (accessed july 1, 2009). 5. eric hand, “head in the clouds: ‘cloud computing’ is being pitched as a new nirvana for scientists drowning in data. but can it deliver?” nature 449, no. 7165 (2007): 963; geoffery fowler and ben worthen, “the internet industry is on a cloud—whatever that may mean,” wall street journal, mar. 26, 2009, http://online.wsj.com/article/ sb123802623665542725.html (accessed july 14, 2009); stephen baker, “google and the wisdom of the clouds,” business week (dec. 14, 2007), http://www.msnbc .msn.com/id/22261846/ (accessed july 8, 2009). 6. gartner, “gartner says worldwide it spending on pace to supass $3.4 trillion in 2008,” press release, aug. 18, 2008, http://www.gartner.com/it/page .jsp?id=742913 (accessed july 7, 2009). 7. wyatt kash, “usa.gov, gobierno usa.gov move into the internet cloud,” government computer news, feb. 23, 2009, http://gcn.com/articles/2009/02/23/ gsa-sites-to-move-to-the-cloud.aspx?s =gcndaily_240209 (accessed july 14, 2009). 8. derek gottfrid, “self-service, prorated super computing fun!” online posting, new york times open, nov. 1, 2007, http://open.blogs .nytimes.com/2007/11/01/self-service -prorated-super-computing-fun/?scp =1&sq=self%20service%20prorated&st =cse (accessed july 8, 2009). 9. oclc online computing library center, “oclc announces strategy to move library management services to web scale,” press release, apr. 23, 2009, http://www.oclc.org/us/en/news/ releases/200927.htm (accessed july 5, 2009). 10. duraspace, “fedora commons and dspace foundation join together to create duraspace organization,” press release, may 12, 2009, http:// duraspace.org/documents/pressrelease .pdf (accessed july 8, 2009). 11. yan han and atifa rawan, “afghanistan digital library initiative: revitalizing an integrated library system,” information technology & libraries 26, no. 4 (2007): 44–46. 12. fowler and worthen, “the internet industry is on a cloud.” 13. jeremy zawodney, “replacing my home backup server with amazon’s s3,” online posting, jeremy zawodny’s blog, oct. 3, 2006, http://jeremy .zawodny.com/blog/archives/007624 .html (accessed june 19, 2009). 14. yan han, “an integrated high availability computing platform,” the electronic library 23, no. 6 (2005): 632–40. 15. 
office of the privacy commissioner of canada, “tabling of privacy commissioner of canada’s 2005–06 annual report on the privacy act: commissioner expresses concerns about public sector privacy protection,” press release, june 20, 2006, http://www.priv.gc.ca/media/ nr-c/2006/nr-c_060620_e.cfm (accessed july 14, 2009); office of the privacy commissioner of canada, “findings under the personal information protection and electronic documents act (pipeda),” (sept. 19, 2008), http://www.priv.gc.ca/cf -dc/2008/394_20080807_e.cfm (accessed july 14, 2009). 16. stephen baker, “google and the wisdom of the clouds,” business week (dec. 14, 2007), http://www.msnbc.msn .com/id/22261846/ (accessed july 8, 2009). appendix. project plan: building ha linux platform using cloud computing project manager: project members: object statement: to build a high availability (ha) linux platform to support multiple systems using cloud computing in six months. scope: the project members should identify cloud computing providers, evaluate the costs, and build a linux platform for computer systems, including afghan ils, afghanistan digital libraries website, repository system, japanese interlibrary loan website, and digitization management system. resources: project deliverable: january 1, 2009—july 1, 2009 92 information technology and libraries | june 201092 information technology and libraries | june 2010 phase i ■■ to build a stable and reliable linux platform to support multiple web applications. the platform needs to consider reliability and high availability in a cost-effective manner ■■ to install needed libraries for the environment ■■ to migrate ils (koha) to this linux platform ■■ to migrate afghan digital libraries’ website (joomla) to this platform ■■ to migrate japanese interlibrary loan website ■■ to migrate digitization management system phase ii ■■ to research and implement a monitoring tool to monitor all web applications as well as os level tools (e.g. tomcat, mysql) ■■ to configure a cron job to run routine things (e.g., backup ) ■■ to research and implement storage (tb) for digitization and access phase iii ■■ to research and build linux clustering steps: 1. os installation: debian 4 2. platform environment: register dns 3. install java 6, tomcat 6, mysql 5, etc. 4. install source control env git 5. install statistics analysis tool (google analytics) 6. install monitoring tool: ganglia or nagios 7. web applications 8. joomla 9. koha 10. monitoring tool 11. digitization management system 12. repository system: dspace, fedora, etc. 13. ha tools/applications note calculation based on the following: ■■ leasing two nodes $20/month: $20 x 2 nodes x 12 months = $480/year ■■ a medium-priced server with backup with a life expectancy of 5 years ($5,000): $1,000/year ■■ 5 percent of system administrator time for managing the server ($60,000 annual salary): $3,000/year ■■ ignore telecommunication cost, utility cost, and space cost. ■■ ignore software developer’s time because it is equal for both options. appendix. project plan: building ha linux platform using cloud computing (cont.) examining attributes of open standard file formats for long-term preservation and open access eun g.park and sam oh information technology and libraries | december 2012 44 abstract this study examines the attributes that have been used to assess file formats in literature and compiles the most frequently used attributes of file formats to establish open-standard file-formatselection criteria. 
a comprehensive review was undertaken to identify the current knowledge regarding file-format-selection criteria. the findings indicate that the most common criteria can be categorized into five major groups: functionality, metadata, openness, interoperability, and independence. these attributes appear to be closely related. additional attributes include presentation, authenticity, adoption, protection, preservation, reference, and others. introduction file format is one of the core issues in the fields of digital content management and digital preservation. as many different types of file formats are available for texts, images, graphs, audio recordings, videos, databases, and web applications, the selection of appropriate file formats poses an ongoing challenge to libraries, archives, and other cultural heritage institutions. some file formats appear to be more widely accepted: tagged image file format (tiff), portable document format (pdf), pdf/a, office open xml (ooxml), and open document format (odf), to name a few. many institutions, including the library of congress (lc), possess guidelines on file format applications for long-term preservation strategies that specify requisite characteristics of acceptable file formats (e.g., they are independent of specific operating systems, are independent of hardware and software functions, conform to international standards, etc.).1 the format descriptions database of the global digital format registry is an effort to maintain detailed representation information and sustainability factors for as many file formats as possible (the pronom technical registry is another such database).2 despite these developments, file format selection remains a complex task and prompts many questions that range from the general ("which selection criteria are appropriate?") to the more specific ("are these international standard file formats sufficient for us to ensure long-term preservation and access?" or "how should we define and implement standard file formats in harmony with our local context?"). in this study, we investigate the definitions and features of standard file formats and examine the major attributes used in assessing file formats. we discuss relevant issues from the viewpoint of open-standard file formats for long-term preservation and open access. eun g. park (eun.park@mcgill.ca) is associate professor, school of information studies, mcgill university, montreal, canada. sam oh (samoh@skku.edu) is corresponding author and professor, department of library and information science, sungkyunkwan university, seoul, korea. background on standard file formats the term file format is generally defined as that which "specifies the organization of information at some level of abstraction, contained in one or more byte streams that can be exchanged between systems."3 according to interpares 2, file format is "the organization of data within files, usually designed to facilitate the storage, retrieval, processing, presentation, and/or transmission of the data by software."4 the premis data dictionary for preservation metadata observes that, technically, file format is "a specific, pre-established structure for the organization of a digital file or bitstream."5 in general, file format can be divided into two types: an access format and a preservation format.
an access format is “suitable for viewing a document or doing something with it so that users access the on-the-fly converted access formats.”6 in comparison, a preservation format is “suitable for storing a document in an electronic archive for a long period”7; it provides “the ability to capture the material into the archive and render and disseminate the information now and in the future.”8 while the ability to ensure long-term preservation focuses on the sustainability of preservation formats, an access format emphasizes that the document should be accessible and available to users, presumably at all times. many researchers have discussed file formats and long-term preservation in relation to various types of resources. for example, folk and barkstrom describe and adopt several attributes of file formats that may affect the long-term preservation of scientific and engineering data (e.g., the ease of archival storage, ease of archival access, usability, data scholarship enablement, support for data integrity, and maintainability and durability of file formats).9 barnes suggests converting word processing documents in digital repositories, which are unsuitable for long-term storage, into a preservation format.10 the evaluation by rauch, krottmaier, and tochtermann illustrates the practical use of file formats for 3d objects in terms of long-term reliability.11 others have developed and/or applied numerous criteria in different settings. for instance, sullivan uses a list of desirable properties of a long-term preservation format to explain the purpose of pdf/a from an archival and records management perspective.12 sullivan cites device independence, self-containment, self-describing, transparency, accessibility, disclosure, and adoption as such properties. rauch, krottmaier, and tochtermann’s study applies criteria that consist of technical characteristics (e.g., open specification, compatibility, and standardization) and market characteristics (e.g., guarantee duration, support duration, market penetration, and the number of independent producers). rog and van wijk propose a quantifiable assessment method to calculate composite scores of file formats.13 they identify seven main categories of criteria: openness, adoption, complexity, technical protection mechanism, self-documentation, robustness, and dependencies (a small, hypothetical illustration of such a composite calculation is sketched below). sahu focuses on the criteria developed by the uk’s national archives, which include open standards, ubiquity, stability, metadata support, feature set, interoperability, and viability.14 a more comprehensive evaluation by the lc reveals three components—technical factors, quality, and functionality—while placing a particular emphasis on the balance between the first two.15 hodge and anderson use seven criteria for sustainability, which are similar to the technical factors of the lc study: disclosure, adoption, transparency, self-documentation, external dependencies, impact of patents, and technical protection mechanisms.16 some institutions adopt another term, standard file formats, to differentiate accepted and recommended file formats from others. according to the david project, “standard file formats owe their status to (official) initiatives for standardizing or to their widespread use.”17 standard, however, may be too general a term to specify the elements of file formats.
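rog and van wijk’s composite-score idea can be illustrated with a small weighted-sum calculation. the sketch below is purely hypothetical: the weights and the per-format scores are invented for demonstration and are not taken from their published method; only the seven criterion names follow their categories.

# illustrative only: a weighted-sum "composite score" in the spirit of
# rog and van wijk's quantifiable assessment. the weights and the scores
# for the two candidate formats are invented for demonstration.
WEIGHTS = {
    "openness": 0.20,
    "adoption": 0.20,
    "complexity": 0.10,
    "technical_protection": 0.10,
    "self_documentation": 0.15,
    "robustness": 0.15,
    "dependencies": 0.10,
}

# hypothetical scores on a 0-1 scale for two candidate formats
candidates = {
    "PDF/A-1": {"openness": 0.8, "adoption": 0.9, "complexity": 0.5,
                "technical_protection": 0.7, "self_documentation": 0.8,
                "robustness": 0.7, "dependencies": 0.8},
    "MS Word (.doc)": {"openness": 0.2, "adoption": 0.9, "complexity": 0.4,
                       "technical_protection": 0.5, "self_documentation": 0.4,
                       "robustness": 0.5, "dependencies": 0.3},
}

def composite_score(scores, weights=WEIGHTS):
    # return the weighted sum of per-criterion scores
    return sum(weights[c] * scores[c] for c in weights)

for name, scores in candidates.items():
    print(f"{name}: {composite_score(scores):.2f}")

in practice, an institution would calibrate the weights to its own preservation strategy and keep them under review, along the lines of todd’s remark quoted in the conclusion below.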
nevertheless, there is a recognition that only those file formats accepted and recommended by national or international standards organizations (such as the international organization for standardization [iso], the international imaging industry association [i3a], the world wide web consortium [w3c], etc.) are genuine standard file formats. for example, iso has announced several standard file formats for images: tiff/it (iso 12639:2004), png (iso/iec 15948:2004), and jpeg 2000 (iso/iec 15444:2003, 2004, 2005, 2007, 2008). for document file formats, pdf/a-1 (iso standard 19005-1, document file format for long-term preservation) is one example. this format is proprietary; it is intended to meet archival and records-management requirements and to preserve the visual appearance and migration needs of electronic documents. office open xml file format (iso/iec 29500-1:2008, information technology—document description and processing languages) is another open standard that can be implemented in microsoft office applications on multiple platforms. odf (iso/iec 26300:2006, information technology—open document format for office applications [opendocument] v1.0) is an xml-based open file format. despite these iso-announced standards, some errors in these file formats have been reported. for example, although pdf/a-1 is intended for long-term preservation of and access to documents, studies reveal that the feature-rich nature of pdf can create difficulties in preserving pdf information over time.18 to overcome the barriers of pdf and pdf/a-1, xml technology has become prevalent for digital resources in archiving systems and digital preservation.19 the digital repository community is treating xml technology as a panacea and converting most of its digital resources to xml (a minimal, hypothetical sketch of such an xml wrapping appears below). the netherlands institute for scientific information service (nisis) adopts another noteworthy definition of standard file formats. it observes that standard image file formats “are widely accepted, have freely available specifications, are highly interoperable, incorporate no data compression and are capable of supporting preservation metadata.”20 this definition implies specific and advanced ramifications for cost-free interoperability and metadata, which closely relate to open access. open standard is another relevant term to consider for file formats. although perspectives vary greatly among researchers, open standards can generally be acquired and used without any barrier or cost.21 in other words, open standard products are free from restrictions, such as patents, and are independent of proprietary hardware or software. since the 1990s, open standard has been broadly adopted in many fields and is now an almost compulsory feature in information services.
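the xml conversion practice mentioned above can be illustrated with a minimal, hypothetical sketch. the element names below are invented for demonstration; real repositories would normally use established schemas such as mets, mods, or premis rather than an ad hoc structure like this one.

# illustrative only: wrapping a document's text and minimal descriptive
# metadata in a self-describing xml structure using python's standard library.
import xml.etree.ElementTree as ET

record = ET.Element("preservationRecord")
meta = ET.SubElement(record, "metadata")
ET.SubElement(meta, "title").text = "Annual Report 2011"
ET.SubElement(meta, "creator").text = "Example University Library"
ET.SubElement(meta, "dateCreated").text = "2011-12-31"
ET.SubElement(meta, "sourceFormat").text = "application/msword"

content = ET.SubElement(record, "content")
content.text = "Body text of the document, extracted from the source file."

# serialize to a human-readable, platform-independent text stream
print(ET.tostring(record, encoding="unicode"))

such a wrapper is self-describing in the sense discussed under the metadata criterion below: the content, its descriptive metadata, and its structure travel together in one human-readable, platform-independent file.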
to follow the national archives’ definition, open standard formats are “formats for which the technical specifications have been made available in the public domain.”22 in comparison, folk and barkstrom approach open standards from an institutional-support perspective, relying on user communities for standards that are widely available and used.23 on a more specific level, stanescu emphasizes independence as the basic selection criterion for file formats.24 others, such as todd, propose criteria for determining whether one standard is more open than another: adoption, platform independence, disclosure, transparency, and metadata support.25 other factors considered by todd include reusability and interoperability; robustness, complexity, and viability; stability; and intellectual property (ip) and rights management.26 echoing the lc, hodge and anderson also suggest a list of selection criteria grouped under the banner of “technical factors”: disclosure, adoption, transparency, self-documentation, external dependencies, impact of patents, and technical protection mechanisms.27 researchers agree that open standard file formats are less prone to obsolescence and more reliable than proprietary formats.28 close examination of the nisis definition mentioned above reveals that standard file formats are in reality not free, nor do they allow unrestricted access to resources. the three file formats that iso has announced (pdf/a, ooxml, and odf) are proprietary and sometimes costly. they also do not prohibit charging for access to the proprietary specification, even though a standard is assumed to be free from legal and financial restrictions. the iso-announced file formats, in short, are only standard file formats, not open-standard file formats. for cultural heritage institutions, questions regarding appropriate selection criteria and the sufficiency of existing international standard file formats for long-term preservation and access remain unanswered. there exists neither a uniform method to compare the specifications of different file formats nor an objective approach to assess format specifications that would ensure long-term preservation and persistent access. objectives of the study in this study, we attempt to better define and establish open-standard file-format-selection criteria. to that end, we assess and compile the most frequently used attributes of file formats reported in the literature. method we performed a comprehensive review of published articles, institutional reports, and other literature to identify the current knowledge regarding file-format-selection criteria. we included literature that deals with the three standard file formats (pdf, pdf/a, and xml) but excluded the recently announced odf format due to the scarcity of literature on odf. of the more than thirty articles initially reviewed, only the twenty-five that use their own clear attributes were included in this study. all of the attributes that we employed are listed by frequency and grouped according to similarities in meaning (see appendix). the original definitions or descriptions that we used are listed in the second column. the file formats that we assessed by their attributes are listed in the third column.
when a study gives attributes without specific definitions or descriptions, “no definite term” is inserted. findings as illustrated in the appendix, the criteria identified by the studies vary. although the requirements and context of the studies may differ, the most common criteria can be divided into five categories: functionality, metadata, openness, interoperability, and independence. first, functionality refers to the ability of a format to do exactly what it is supposed to do.29 it is important to distinguish between two broad uses: preservation of document structure and formatting, and preservation of usable content. to preserve document formatting, a “published view” of a given piece of content is critical for distribution. other content, such as database information or device-specific documents, needs to be preserved as well. functionality criteria include various attributes related to formats and structure or to the physical and technical specifications of files (e.g., robustness, feature set, viability, color maintenance, clarity, compactness, modularity, compression algorithms, etc.). second, metadata indicates that a format allows rich descriptive and technical metadata to be embedded in files. metadata can be expressed as metadata support, self-documentation (self-documenting), documentation, content-level (as opposed to presentation-level) description, self-describing, self-describing files, formal description of format, etc. third, openness refers to specifications of a file format that are publicly available and accessible and to formats that are not proprietary. whether seen as a single definition or as a set of criteria, the characteristic that appears to be at the core of the open standard movement is its independence from outside proprietary or commercial control. openness also may refer to the autonomy of a file format, which relies on several factors. first, the document should be self-contained in terms of the content information (e.g., the text), the structural information (i.e., for those documents that are structured), the formatting information (e.g., fonts, colors, styles, etc.), and the metadata information. self-containment does not necessarily mean that an archivist will only have one document to deal with. it does mean, however, that the documents will provide all the information needed to access and process the content, structure, formatting, and metadata. openness is expressed as open availability by some researchers.30 other researchers adopt the term disclosure to express that a specification is publicly available.31 fourth is the independence of a document from proprietary or commercial hardware and software configurations, especially to prevent any issues resulting from different versions of software, hardware, and operating systems. this aspect is expressed in the appendix as open standards, open-source software or equivalent, standard/proprietary, etc. it also closely relates to independence, one of the five categories in the appendix, expressed as device independencies, independent implementations, no external dependency, no external dependencies, portability, and monitoring obsolescence. having documents in a proprietary format controlled by a third party implies that, at one time or another, this format may no longer be supported, or that a change in the user agreement may lead to restricted access, access to outdated material, or patent and copyright issues.
this fact means that the document must be freely accessible, without password restrictions or protection, and without any digital rights management scheme. blocking access to a document with a password can lead to serious problems if the password gets lost. in addition, the size and compactness of the document will influence the selection of a file format. fifth, interoperability primarily refers to the ability of a file format to be compatible with other formats and to exchange documents without loss of information.32 specifically, it refers to the ability of software to open a document without requiring any special application, plug-in, codec, or proprietary add-on. adherence to open standards is usually a good indication of the interoperability of a format. in general, an open standard is released after years of bargaining and agreement among the major players. supervision by an international standards body (such as iso or the w3c) commonly helps propagate the format. in addition to the five categories mentioned above, other attributes are often used. presentation, authenticity, adoption, protection, preservation, and reference are such examples. among these, authenticity, although listed seventh in the appendix, is one of the most important attributes in archives and records management. it refers to the ability to guarantee that a file is what it originally was, without any corruption or alteration.33 specific to authenticity is data integrity, which assesses the integrity of the file through an internal mechanism (e.g., png files include byte sequences to validate against errors). another method of validating the authenticity of a document is to look at its traceability,34 that is, the traces left by the original author and by those who modified or opened a file. one example is the difference between the creation date, modification date, and access date of any file on a personal computer. these three dates correspond to moments when someone (often a different person each time) handled the file. other mechanisms may require log information, which is external to the file (a minimal illustration of integrity and traceability checks is sketched below). another good indication of authenticity is the stability of a format.35 a format that is widely used is more likely to be stable. a stable format is also likely to cause less data loss and corruption; hence it is a better indicator of authenticity. presentation includes attributes related to presenting and rendering data, expressed as distributing a page image, normal rendering, self-containment, self-contained, and beyond normal rendering. adoption indicates how popular a file format is and how widely it has been adopted by user communities; it is also represented as popularity, widely used formats, ubiquity, or continuity. protection includes technical protection mechanisms or source verification intended to secure files. preservation means long-term preservation, institutional support, or ease of transformation and preservation. reference indicates citability or referential extensibility. among the other attributes, transparency is interesting to note because it indicates the degree to which files are open to direct analysis with basic tools and to human readability.
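the integrity and traceability signals just described can be illustrated with a small, hypothetical sketch: a cryptographic checksum detects corruption or alteration, and file-system timestamps give a rough trace of later handling. none of the reviewed studies prescribes this particular script, and real repositories record fixity values in preservation metadata rather than in ad hoc code like this; the file name used here is hypothetical.

# illustrative only: checksum-based integrity and timestamp-based traceability
import hashlib
import os
import time

def fixity_report(path):
    # compute a sha-256 checksum by reading the file in chunks
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    stat = os.stat(path)
    return {
        "sha256": digest.hexdigest(),
        "modified": time.ctime(stat.st_mtime),   # last modification
        "accessed": time.ctime(stat.st_atime),   # last access
    }

# comparing a stored checksum with a freshly computed one reveals corruption
# or alteration; unexpected timestamp changes hint at later handling.
# report = fixity_report("master_copy.tif")   # hypothetical file name
# print(report)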
another important aspect across these criteria is that the terminologies used in the studies may be quite different yet describe the same or similar concepts from different angles. for instance, rog and van wijk use openness to mean standardization and specification without restrictions,36 while several other researchers use open availability to convey the same idea.37 still others adopt the term disclosure to express that a specification is publicly available.38 discussion and conclusion functionality, metadata, openness, interoperability, and independence appear to be the most important factors when selecting file formats. when file formats for long-term preservation and open access are under discussion, cultural heritage institutions need to consider many issues. despite several efforts, it is still tricky for them to identify the most appropriate file format or even to discern acceptable formats from unacceptable ones. because the creation of new file formats can hardly be prevented, format selection is not an easy task, in either theory or practice. it is critical, however, to base the decision on a clear understanding of the purpose for which the document is preserved: access preservation or repurposing preservation. cultural heritage institutions and digital repository communities need to guarantee long-term preservation of digital resources in the selected file formats. additionally, users find it necessary to have access to digital information in these file formats. a further consideration involves the level of access users may enjoy (e.g., long-term access, permanent access, open access, persistent access, etc.). when international standard file formats are determined, open access should be taken into account, given the broad interest in it. it is necessary to develop a scale or measurement to assess open-standard format specifications to ensure long-term preservation and open access. identifying which attributes are required of an open-standard file format, and which digital format is most apt for the use and sustainability of long-term preservation, is a meaningful task. the outcome of our study provides a framework for appropriate strategies when selecting file formats for long-term preservation of and access to digital content. we hope that the criteria described in this study will benefit librarians, preservers, record creators, record managers, archivists, and users. we are reminded of todd’s remark that “the most important action is to align the recognition and weighting of criteria with a clear preservation strategy and keep them under review using risk management techniques.”39 the question of how to adopt and implement these attributes can only be answered in the local context and decisions of each cultural heritage institution.40 each institution should consider implementing a file format throughout the entire life cycle of its digital resources, with a holistic approach to the managerial, technical, procedural, archival, and financial issues involved in long-term preservation and persistent access. the criteria may change over time, as is necessary for any format to adequately serve its purpose. maintaining the quality of the selected formats may be an ongoing task that cultural heritage institutions should take into account at all times. even more importantly, cultural heritage institutions need to establish and implement a set of standard guidelines, specific to each context, for the selection of open-standard file formats. note: this research was supported by the sungkyunkwan university research fund (2010-2011).
references and notes 1. library of congress, “sustainability of digital formats: planning for library of congress collections,” www.digitalpreservation.gov/formats/intro/intro.shtml (accessed november 21, 2011). 2. global digital format registry, www.gdfr.info (accessed november 17, 2011); the technical registry pronom, www.nationalarchives.gov.uk/aboutapps/pronom (accessed november 21, 2011). 3. mike folk and bruce r. barkstrom, “attributes of file formats for long-term preservation of scientific and engineering data in digital libraries” (paper presented at the joint conference on digital libraries (jcdl), houston, tx, may 27–31, 2003), 1, www.larryblakeley.com/articles/storage_archives_preservation/mike_folk_bruce_barkstrom200305.pdf (accessed november 21, 2011). 4. interpares 2 project glossary, p. 24, www.interpares.org/ip2/ip2_term_pdf.cfm?pdf=glossary (accessed november 21, 2011). 5. premis editorial committee, premis data dictionary for preservation metadata, ver. 2.0, march 2008, p. 195, www.loc.gov/standards/premis/v2/premis-2-0.pdf (accessed november 21, 2011). 6. ian barnes, “preservation of word processing documents,” july 14, 2006, p. 4, http://apsr.anu.edu.au/publications/word_processing_preservation.pdf (accessed november 21, 2011). 7. ibid. 8. gail hodge and nikkia anderson, “formats for digital preservation: a review of alternatives and issues,” information services & use 27 (2007): 46. 9. folk and barkstrom, “attributes of file formats.” 10. barnes, “preservation of word processing documents.” 11. carl rauch, harald krottmaier, and klaus tochtermann, “file-formats for preservation: evaluating the long-term stability of file-formats,” in proceedings of the 11th international conference on electronic publishing 2007 (vienna, austria, june 13–15, 2007): 101–6. 12. susan j. sullivan, “an archival/records management perspective on pdf/a,” records management journal 16, no. 1 (2006): 51–56. 13. judith rog and caroline van wijk, “evaluating file formats for long-term preservation,” 2008, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011). 14. d. k. sahu, “long term preservation: which file format to use” (paper presented in workshops on open access & institutional repository, chennai, india, may 2–8, 2004), http://openmed.nic.in/1363/01/long_term_preservation.pdf (accessed november 21, 2011). 15. cendi digital preservation task group, “formats for digital preservation: a review of alternatives and issues,” www.cendi.gov/publications/cendi_presformats_whitepaper_03092007.pdf (accessed november 21, 2011). 16.
hodge and anderson, “formats for digital preservation.” 17. david 4 project (digital archiving, guideline and advice 4), “standards for file formats,” 1, www.expertisecentrumdavid.be/davidproject/teksten/guideline4.pdf (accessed november 21, 2011). 18. sullivan, “an archival/records management perspective on pdf/a”; john michael potter, “formats conversion technologies set to benefit institutional repositories,” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.7881&rep=rep1&type=pdf (accessed november 21, 2011). 19. eva müller et al., “using xml for long-term preservation: experiences from the diva project,” in proceedings of the 6th international symposium on electronic theses and dissertations (may 20–24, 2003): 109–16, https://edoc.hu-berlin.de/conferences/etd2003/hansson-peter/html/index.html (accessed november 21, 2011). 20. rene van horik, “image formats: practical experiences” (paper presented in erpanet training, vienna, austria, may 10–11, 2004), 22, www.erpanet.org/events/2004/vienna/presentations/erpatrainingvienna_horik.pdf (accessed november 21, 2011). 21. open standard is related to open access, which comes from the open access movement that allows resources to be freely available to the public and permits any user to use those resources (e.g., mainly electronic journals, repositories, databases, software applications, etc.) without financial, legal, or technical barriers. see amy e. c. koehler, “some thoughts on the meaning of open access for university library technical services,” serials review 32, no. 1 (march 2006): 17–21; budapest open access initiative, “read the budapest open access initiative,” www.soros.org/openaccess/read.shtml (accessed november 21, 2011). 22. national archives, “selecting file formats for long-term preservation,” 6, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011). 23. folk and barkstrom, “attributes of file formats.” 24. andreas stanescu, “assessing the durability of formats in a digital preservation environment: the inform methodology,” d-lib magazine 10, no. 11 (november 2004), www.dlib.org/dlib/november04/stanescu/11stanescu.html (accessed november 21, 2011). 25. malcolm todd, “technology watch report: file formats for preservation,” www.dpconline.org/advice/technology-watch-reports (accessed november 21, 2011). 26. ibid. 27. hodge and anderson, “formats for digital preservation.” 28. edward m.
corrado, “the importance of open access, open source, and open standards for libraries,” issues in science & technology librarianship (spring 2005), www.library.ucsb.edu/istl/05-spring/article2.html (accessed november 21, 2011); carl vilbrandt et al., “cultural heritage preservation using constructive shape modeling,” computer graphics forum 23, no. 1 (2004): 25–41; marshall breeding, “preserving digital information,” information today 19, no. 5 (2002): 48–49. 29. eun g. park, “xml: examining the criteria to be open standard file format” (paper presented at the interpares 3 international symposium, oslo, norway, september 17, 2010), www.interpares.org/display_file.cfm?doc=ip3_isym04_presentation_3–3_korea.pdf (accessed november 21, 2011). 30. adrian brown, “digital preservation guidance note: selecting file formats for long-term preservation,” www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed november 21, 2011); barnes, “preservation of word processing documents”; sahu, “long term preservation”; potter, “formats conversion technologies.” 31. stephen abrams et al., “pdf-a: the development of a digital preservation standard” (paper presented at the 69th annual meeting for the society of american archivists, new orleans, louisiana, august 14–21, 2005), www.aiim.org/documents/standards/pdf-a.ppt (accessed november 21, 2011); sullivan, “an archival/records management perspective on pdf/a”; cendi, “formats for digital preservation”; and hodge and anderson, “formats for digital preservation.” 32. the national archives, http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011); ecma international, “office open xml file formats—ecma-376,” www.ecma-international.org/publications/standards/ecma-376.htm (accessed november 21, 2011). 33. christoph becker et al., “systematic characterisation of objects in digital preservation: the extensible characterisation languages,” www.jucs.org/jucs_14_18/systematic_characterisation_of_objects/jucs_14_18_2936_2952_becker.pdf (accessed november 21, 2011); national archives, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011). 34. folk and barkstrom, “attributes of file formats.” 35. national archives, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011); rog and van wijk, “evaluating file formats for long-term preservation.” 36. rog and van wijk, “evaluating file formats for long-term preservation.” 37.
see brown, “digital preservation guidance note: selecting file formats for long-term preservation,” www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed november 21, 2011); barnes, “preservation of word processing documents”; sahu, “long term preservation”; potter, “formats conversion technologies.” 38. stephen abrams et al., “pdf-a: the development of a digital preservation standard” (paper presented at the 69th annual meeting for the society of american archivists, new orleans, louisiana, august 14–21, 2005), www.aiim.org/documents/standards/pdf-a.ppt (accessed november 21, 2011); sullivan, “an archival/records management perspective on pdf/a”; cendi, “formats for digital preservation”; and hodge and anderson, “formats for digital preservation.” 39. todd, “technology watch report,” 33. 40. evelyn peters mclellan, “selecting digital file formats for long-term preservation: interpares 2 project general study 11 final report,” www.interpares.org/display_file.cfm?doc=ip2_file_formats(complete).pdf (accessed november 21, 2011). appendix: file format attributes no. attribute definition/description assessed file format 1. functionality robustness robust against single point of failure, support for file corruption detection, file format stability, backward compatibility and forward compatibility (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (limited) microsoft word (limited) a robust format contains several layers of defense against corruption (frey, 2000). n/a feature set formats supporting the full range of features and functionality (brown, 2003) n/a not defined (sahu, 2006) n/a viability error-detection facilities to allow detection of file corruption (brown, 2003). png format (yes) not defined (sahu, 2006) n/a support for graphic effects and typography not defined (cendi, 2007; hodge & anderson, 2007) tiff_g4 (no) color maintenance not defined (cendi, 2007; hodge & anderson, 2007) tiff_g4 (limited) clarity support for high image resolution (cendi, 2007; hodge & anderson, 2007) tiff_g4 (yes) quality this pertains to how well the format fulfills its task today: (1) low space costs, (2) highly encompassing, (3) robust, (4) simplicity, (5) highly tested, (6) loss-free, (7) supports metadata (clausen, 2004).
n/a compactness to minimize storage and i/o costs (folk & barkstrom, 2003) n/a simplicity ease of implementing readers (folk & barkstrom, 2003) n/a file corruption detection to be able to detect that a file has been corrupted; to provide errorcorrection (folk & barkstrom, 2003) n/a raw i/o efficiency formats that are organized for fast sequential access (folk & barkstrom, 2003) n/a availability of readers to maintain ease of data access for readers (folk & barkstrom, 2003) n/a ease of subsetting to process only part of data files (folk & barkstrom, 2003) n/a size to transfer data in large blocks (folk & barkstrom, 2003) n/a ability to aggregate many objects in a single file to maintain as small as archive “name space” as possible (folk & barkstrom, 2003) n/a ability to embed data extraction software in the files the files come with read software embedded (folk & barkstrom, 2003). n/a ability to name file elements to work with data based on manipulating the element names instead of binary offsets, or other references (folk & barkstrom, 2003) n/a rigorous definition to be defined in a sufficient rigorous way (folk & barkstrom, 2003) n/a multilanguage implementation of library software to have multiple implementations of readers for a single format (folk & barkstrom, 2003) n/a memory some formats emphasize the presence or absence of memory (frey, 2000). tiff (yes) examining attributes of open standard file formats for long-term preservation and open access | park and oh 56 accuracy in some cases, the accuracy of the data can be decreased to save memory, e.g., through compression. in the case of a digital master, however, accuracy is very important (frey, 2000). n/a speed the ability to access or display a data set at a certain speed is critical to certain applications (frey, 2000). n/a extendibility a data format can be modified to allow for new types of data and features in the future (frey, 2000). n/a modularity a modular data set definition is designed to allow some of its functionality to be upgraded or enhanced without having to propagate changes through all parts of the data set (frey, 2000). n/a plugability related to modularity, this permits the user of an implementation of a data set reader or writer to replace a module with private code (frey, 2000). n/a interpretability not binary formats (barnes, 2006) rtf (yes) ms word (no) xml (yes) the standard should be written in characters that people can read (lesk, 1995). n/a complexity human readability, compression, variety of features (rog & van wijk, 2008; wijk & rog, 2007). n/a simple raster formats are preferred (puglia et al., 2004). n/a compression algorithms the format uses standard algorithms (puglia et al., 2004). n/a accessibility to prohibit encryption in the file trailer (sullivan, 2006) pdf/a (yes) component reuse not defined (sahu, 2006) pdf (no) html (limited) sgml (excellent) xml (excellent) repurposing not defined (sahu, 1999) pdf (limited) html (limited) sgml (excellent) xml (excellent) packaging formats in general, packaging formats should be acceptable as transfer mechanisms for image file formats (puglia et al., 2004). zip (yes) significant properties the format accommodates high-bit, high-resolution (detail), color accuracy, and multiple compression options (puglia et al., 2004). n/a processability the requirement to maintain a processable version of the record to have any reuse value (brown, 2003) conversion of a word-processed document into pdf format. 
(no) searching not defined (sahu, 2006) pdf (limited) html (good) sgml (excellent) xml (excellent) no definite term to support the automatic validation of document conversions and the evaluation of conversion quality by hierarchically decomposing documents from different sources and representing them in an abstract xml language (becker et al., 2008a; becker et al., 2008b) n/a xcl (yes) to make transferring data easy (johnson, 1999) n/a xml (yes) a format that is easy to restore and understand by both humans and machines (müller et al., 2003) n/a xml (yes) information technology and libraries | december 2012 57 inability to be backed out into a usable format (potter, 2006) pdfs (no) 2. m e t a d a t a self-documentation self-documenting digital objects that contain basic descriptive, technical, and other administrative metadata (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (yes) tiff_g4 (yes) xml (yes) metadata and technical description of format embedded (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (limited) microsoft word (limited) the ability of a digital format to hold (in a transparent form) metadata beyond that needed for basic rendering of the content (arms & fleischhauer, 2006) n/a self-documenting to contain its own description (abrams et al., 2005) n/a documentation deep technical documentation publicly and fully is available. it is maintained for older versions of the format (puglia et al., 2004). n/a metadata support file formats making provision for the inclusion of metadata (brown, 2003) tiff (yes) microsoft word 2000 (yes) not defined (kenney, 2001) fiff 6.0 (yes) gif 89a (yes) jpeg (yes) flashpix 1.0.2 (yes) imagepac, photo cd (no) png 1.2 (yes) pdf (yes) not defined (sahu, 2006) n/a metadata the format allows for self-documentation (puglia et al., 2004). n/a content-level description not presentation-level description; structural markup, not formatting (barnes, 2006) pdf (no) docbook (yes) tei (yes) xhtml (yes) xml (yes) content-level, not presentation-level, descriptions where possible, the labeling of items should reflect their meaning, not their appearance (lesk, 1995). sgml (yes) self-describing many different types of metadata are required to decipher the contents of a file (folk & barkstrom, 2003). n/a self-describing files embed metadata in pdf files (sullivan, 2006) pdf/a (adobe extensible metadata platform required) formal (bnfor xml-like) description of format to create new readers solely on the basis of formal descriptions of the file content (folk & barkstrom, 2003) n/a no definite term its self-describing tags identify what your content is all about (johnson, 1999). n/a xml (yes) a format for strong descriptive and administrative metadata and the complete content of the document (müller et al., 2003) n/a xml (yes) examining attributes of open standard file formats for long-term preservation and open access | park and oh 58 3. o p e n n e s s disclosure authoritative specification publicly available (abrams et al., 2005) pdf/a (yes) microsoft word (no) the degree to which complete specifications and tools for validating technical integrity exist and are accessible to those creating and sustaining digital content (cendi, 2007; hodge & anderson, 2007; arms & fleischhauer, 2006) pdf (yes) pdf/a (yes) tiff_g4 (yes) xml (yes) authoritative specification is publicly available (sullivan, 2006). 
pdf/a (yes) open availability no proprietary formats (barnes, 2006) odf (yes) gif (no) pdf (no) rtf (no) microsoft word (no) any manufacturer or researcher should have the ability to use the standard, rather than having it under the control of only one company (lesk, 1995). kodak photocd (no) gif (no) openness standardization, restrictions on the interpretation of the file format, reader with freely available source (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (yes) ms word (no) a standard is designed to be implemented by multiple providers and guide 5: file formats for digital masters employed by a large number of users (frey, 2000). n/a formats that are described by publicly available specifications or open-source source code can, with some effort, be reconstructed later: (1) open publicly available specification, (2) specification in public domain, (3) viewer with freely available source, (4) viewer with gpl’ed source, (5) not encrypted (clausen, 2004). n/a open-source software or equivalent to move toward obtaining open-source arrangements for all parts of the file format and associated libraries (folk & barkstrom, 2003) n/a open standard formats for which the technical specification has been made available in the public domain (brown, 2003) jpeg (yes) pdf (limited) ascii (limited) not defined (sahu, 2006) n/a standard/ proprietary not defined (kenney, 2001) fiff 6.0 (yes) gif 89a (yes) jpeg (yes) flashpix 1.0.2 (yes) imagepac, photo cd (no) png 1.2 (yes) pdf (yes) nonproprietary formats the specification is independent of a particular vendor (public records office of victoria, 2004). n/a no definite term to avoid vendor-lock (potter, 2006) odf (yes) information technology and libraries | december 2012 59 4. i n t e r o p e r a b i l i t y interoperability is the format supported by many software applications/os platforms or is it linked closely with a specific application (puglia et al., 2004)? n/a the ability to exchange electronic records with other users and it systems (brown, 2003) n/a not defined (sahu, 2006) n/a data interchange not defined (sahu, 2006) pdf (no) html (limited) sgml (excellent) xml (excellent) compatibility compatibility with prior versions of data set definitions often is needed for access and migration considerations (frey, 2000). n/a stability compatibility between versions (folk & barkstrom, 2003) n/a stable, not subject to constant or major changes over time (brown, 2003) n/a the format is supported by current applications and backward compatible, and there are frequent updates to the format or the specification (puglia et al., 2004). n/a not defined (sahu, 2006). n/a scalability the design should be applicable both to small and large data sets and to small and large hardware systems (frey, 2000). n/a markup compatibility and extensibility to support a much broader range of applications (ecma, 2008) n/a xml (yes) suitability for a variety of storage technologies the format should not be geared toward any particular technology (folk & barkstrom, 2003). n/a no definite term to allow data to be shared across information systems and remain impervious to many proprietary software revisions (potter, 2006) openoffice (yes) 5. i n d e p e n d e n c e device independencies can be reliably and consistently rendered without regard to the hardware/software platform (abrams et al., 2005) pdf/a (yes) tiff (no) static visual appearance can be reliably and consistently rendered and printed without regard to the hardware or software platform used (sullivan, 2006). 
pdf/a (yes) pdf/x (yes) this is a very important aspect for master files because they will be most likely used on various systems (frey, 2000). n/a independent implementations independent implementations help ensure that vendors accurately implement the specification (public records office of victoria, 2004). n/a externaldependency degree to which the format is dependent on specific hardware, operating system, or software for rendering or use and the complexity of dealing with those dependencies in future technical environments (arms & fleischhauer, 2006) n/a external dependencies the degree to which a particular format depends on particular hardware, operating system, or software for rendering or use and the predicted complexity of dealing with those dependencies in future technical environments (cendi, 2007; hodge & anderson, 2007) pdf (limited) pdf/a (no) tiff_g4 (no) xml (no) examining attributes of open standard file formats for long-term preservation and open access | park and oh 60 portability a format that makes extensive use of specific hardware or operating system features is likely to be unusable when that hardware or operating system falls into disuse. a format that is defined in an independent way will be much easier to use in the future: (1) independent of hardware; (2) independent of operating system; (3) independent of other software; (4) independent of particular institutions, groups, or events; (5) widespread current use; (6) little built-in functionality; and (7) single version or well-defined versions (clausen, 2004). n/a monitoring obsolescence information gathered through regular web harvesting can give us some information about what file types are approaching obsolescence, at least for the more frequently used types (clausen, 2004). n/a no definite term a human-readable text format and internationalized character sets are supported (müller et al., 2003). n/a xml (yes) not dependent on specific hardware, not dependent on specific operating systems, not dependent on one specific reader, not dependent on other external resources (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (limited) microsoft word (little) the format requires a plug-in for viewing if appropriate software is not available or relies on external programs to function (puglia et al., 2004). n/a 6. p r e s e n t a t i o n distributing page image not defined (sahu, 2006) pdf (excellent) html (good) sgml (good) xml (good) normal rendering not defined (cendi, 2007; hodge & anderson, 2007). pdf (yes) pdf/a (limited) tiff_g4 (yes) xml (yes) presentation preservation of its original look and feel (brown, 2003) n/a self-containment everything that is necessary to render or print a pdf/a file must be contained within the file (sullivan, 2006). pdf/a (yes) self-contained to contain all resources necessary for rendering (abrams et al., 2005) n/a beyond normal rendering not defined (cendi, 2007; hodge & anderson, 2007). pdf (yes) pdf/a (yes) tiff_g4 (yes) xml (limited) 7. a u t h e n t i c i t y authenticity the format must preserve the content (data and structure) of the record and any inherent contextual, provenance, referencing and fixity information (brown, 2003). 
n/a provenance traceability ability to trace the entire configuration of data production (folk & barkstrom, 2003) n/a integrity of layout not defined (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (yes) tiff_g4 (n/a) xml (yes) integrity of rendering of equations not defined (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (yes) tiff_g4 (n/a) xml (limited) integrity of structure not defined (cendi, 2007; hodge & anderson, 2007) pdf (limited) pdf/a (limited) tiff_g4 (n/a) information technology and libraries | december 2012 61 xml (yes) 8. a d o p t i o n adoption degree to which the format is already used by the primary creators, disseminators, or users of information resources (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (yes) tiff_g4 (yes) xml (yes) worldwide usage, usage in the cultural heritage sector as archival format (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (yes) microsoft word (limited) the degree to which the format is already used by the primary creators, disseminators, or users of information resources (arms & fleischhauer, 2006) n/a widespread use may be the best deterrent against preservation risk (abrams et al., 2005). tiff (yes) the format is widely used by the imaging community in cultural institutions (puglia et al., 2004). n/a flexibility of implementation to promote its wide adoption (sullivan, 2006) pdf/a (yes) popularity a format that is widely used (folk & barkstrom, 2003) n/a widely used formats it is far more likely that software will continue to be available to render the format (public records office of victoria, 2004). n/a ubiquity popular formats supported by as much software as possible (brown, 2003) n/a not defined (sahu, 2006) n/a continuity the file format is mature (puglia et al., 2004) n/a 9. p r o t e c t i o n technical protection mechanism password protection, copy protection, digital signature, printing protection and content extraction protection (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (limited) microsoft word (limited) implementation of a mechanism such as encryption that prevents the preservation of content by a trusted repository (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (no) tiff_g4 (no) xml (no) it must be able to replicate the content on new media, migrate and normalize it in the face of changing technology, and disseminate it to users at a resolution consistent with network bandwidth constraints (arms & fleischhauer, 2006). n/a no encryption, passwords, etc. (abrams et al. (2005) n/a protection the format accommodates error detection, correction mechanisms, and encryption options (puglia et al., 2004). n/a source verification cryptographic encoding of files or digital watermarks without overburdening the data centers or archives (folk & barkstrom, 2003) n/a examining attributes of open standard file formats for long-term preservation and open access | park and oh 62 10. p r e s e r v a t i o n preservation the format contains embedded objects (e.g., fonts, raster images) or links to external objects (puglia et al., 2004). n/a long-term institutional support to ensure the long-term maintenance and support of a data format by placing responsibility for these operations on institutions (folk & barkstrom, 2003) n/a ease of transformation/ preservation the format will be supported for fully functional preservation in a repository setting, or the format guarantee can currently only be made at the bitstream (content data) level (puglia et al., 2004). 
n/a no definite term to create files with either a very high or very low preservation value (becker et al., 2008a, becker et al., 2008b) pdf (no) tiff (no) 11. r e f e r e n c e citability a machine-independent ability to reference or “cite” the individual data element in a stable way (folk & barkstrom, 2003) n/a referential extensibility ability to build annotations about new interpretations of the data (folk & barkstrom, 2003) n/a no definite term an open and established notation (müller et al., 2003) n/a xml (yes) data is easily repurposed via tags or translated to any medium (johnson, 1999) n/a xml (yes) creating, using, and reusing tags is easy, making it highly extensible (johnson, 1999). n/a xml (yes) 12. o t h e r s transparency degree to which the digital representation is open to direct analysis with basic tools, such as human readability using a text-only editor (cendi, 2007, hodge & anderson, 2007). pdf (limited) pdf/a (limited) tiff_g4 (limited) xml (yes) in natural reading order (sullivan, 2006). pdf/a (yes) microsoft notepad (yes) the degree to which the format is already used by the primary creators, disseminators, or users of information resources (arms & fleischhauer, 2006) n/a amenable to direct analysis with basic tools (abrams et al., 2005) n/a ample comment space to allow rich metadata (barnes, 2006) n/a items should be labeled, as far as possible, with enough information to serve for searching or cataloging (lesk, 1995). tiff (yes) a digital format may inhibit the ability of archival institutions to sustain content in that format (arms & fleischhauer, 2006). n/a information technology and libraries | december 2012 63 table bibliography abrams, stephen et al. 2005. “pdf-a: the development of a digital preservation standard.” paper presented at the 69th annual meeting for the society of american archivists, new orleans, louisiana, august 14–21, http://www.aiim.org/documents/standards/pdf-a.ppt (accessed november 21, 2011). arms, caroline r. and carl fleischhauer. 2006. “sustainability of digital formats: planning for library of congress collections.” http://www.digitalpreservation.gov/formats/sustain/sustain.shtml (accessed november 21, 2011). barnes, ian. 2006. “preservation of word processing documents.” http://apsr.anu.edu.au/publications/word_processing_preservation.pdf (accessed november 21, 2011). becker, christoph et al. 2008. “a generic xml language for characterising objects to support digital preservation.” in proceedings of the 2008 acm symposium on applied computing, fortaleza, ceara, brazil, march 16–20. becker, christoph et al. 2008. “systematic characterization of objects in digital preservation: the extensible characterization language.” journal of universal computer science 14, no 18: 2936– 2952. brown, adams. 2003. “the national archives. digital preservation guidance note: selecting file formats for long-term preservation.” http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed november 21, 2011). cendi digital preservation task group. 2007. “formats for digital preservation: a review of alternatives and issues.” http://www.cendi.gov/publications/cendi_presformats_whitepaper_03092007.pdf (accessed november 21, 2011). clausen, lars r. 2004. “handling file formats.” http://netarchive.dk/publikationer/fileformats2004.pdf (accessed november 21, 2011). ecma. 2008. “office open xml file formats—part 1.” 2nd ed. http://www.ecmainternational.org/publications/standards/ecma-376.htm (accessed november 21, 2011). 
folk, mike, and bruce barkstrom. 2003. “attributes of file formats for long-term preservation of scientific and engineering data in digital libraries.” paper presented at the joint conference on digital libraries, houston, tx, may 27–31. http://www.hdfgroup.org/projects/nara/sci_formats_and_archiving.pdf (accessed november 21, 2011). http://www.digitalpreservation.gov/formats/sustain/sustain.shtml http://apsr.anu.edu.au/publications/word_processing_preservation.pdf http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf http://www.cendi.gov/publications/cendi_presformats_whitepaper_03092007.pdf http://netarchive.dk/publikationer/fileformats-2004.pdf http://netarchive.dk/publikationer/fileformats-2004.pdf http://www.ecma-international.org/publications/standards/ecma-376.htm http://www.ecma-international.org/publications/standards/ecma-376.htm http://www.hdfgroup.org/projects/nara/sci_formats_and_archiving.pdf examining attributes of open standard file formats for long-term preservation and open access | park and oh 64 frey, franziska. 2000. “5. file formats for digital masters.” in guides to quality in visual resource imaging, research libraries group and digital library federation. http://imagendigital.esteticas.unam.mx/pdf/guides.pdf (accessed november 21, 2011). hodge, gail and nikkia anderson. 2007. “formats for digital preservation: a review of alternatives and issues.” information services & use 27: 45–63. johnson, amy helen. 1999. “xml xtends its reach: xml finds favor in many it shops, but it’s still not right for everyone.” computerworld 33, no. 42: 76–81. lesk, michael e. 1995. “preserving digital objects: recurrent needs and challenges.” in proceedings of the 2nd npo conference on multimedia preservation. brisbane, australia. http://www.lesk.com/mlesk/auspres/aus.html (accessed november 21, 2011). müller, eva et al. 2003. “using xml for long-term preservation: experiences from the diva project.” in proceedings of the sixth international symposium on electronic theses and dissertations. berlin, may: 109–116, https://edoc.hu-berlin.de/conferences/etd2003/hanssonpeter/pdf/index.pdf (accessed december 8, 2012). potter, john michael. 2006. “formats conversion technologies set to benefit institutional repositories.” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.7881\u0026rep=rep1\u0026typ e=pdf (accessed november 21, 2011). public records office of victoria (australia). 2006. “advice on vers long-term preservation formats pros 99/007 (version2) specification 4.” department for victorian communities. http://prov.vic.gov.au/wp-content/uploads/2012/01/vers_advice13.pdf (accessed november 21, 2011). puglia, steven, jeffrey reed, and erin rhodes. 2004. “technical guidelines for digitizing archival materials for electronic access: creation of production master files—raster images.” us national archives and records administration. http://www.archives.gov/preservation/technical/guidelines.pdf (accessed november 21, 2011). rog, judith, and caroline van wijk. 2008. “evaluating file formats for long-term preservation.” national library of the netherlands. http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_metho d_27022008.pdf (accessed november 21, 2011). sahu, d.k. 2004. “long term preservation: which file format to use.” presentation at workshops on open access & institutional repository, chennai, india, may 2–8, http://openmed.nic.in/1363/01/long_term_preservation.pdf (accessed november 21, 2011). sullivan, susan j. 2006. 
“an archival/records management perspective on pdf/a.” records management journal 16, no. 1: 51–56. van wijk, caroline, and judith rog. 2007. “evaluating file formats for long-term preservation.” presentation at international conference on digital preservation, beijing, china, oct 11–12. http://ipres.las.ac.cn/pdf/caroline-ipres2007-11-12oct_cw.pdf (accessed november 21, 2011). andrew k. pace president’s message: lita forever andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. i was warned when i started my term as lita president that my time at the helm would seem fleeting in retrospect, and i didn’t believe it. i should have. i suppose most advice of that sort falls on deaf ears—advice to children about growing up, advice to newlyweds, advice to new parents. some things you just have to experience. now i am left with that feeling of having worked very hard while not accomplishing nearly enough. it’s time to buy myself some more time. my predecessor, mark beatty, likes to jokingly introduce himself in ala circles as “lita has-been” in reference to his role as lita past-president. i say jokingly because he and i both know it is not true. not only does the past-president continue in an active role on the lita board and executive committee, the past-president has the daunting task of acting as the division’s financial officer. just as mark knows well the nature of this elected (but still volunteer) commitment, so michelle frisque, my successor this july, knows that the hard work started as vice-president/president-elect has two challenging years ahead. being elected lita president is for all intents and purposes a three-year term with shifting responsibilities. add to this the possibility of serving on the board beforehand, and it’s likely that one could serve less time for knocking over a liquor store. i’m joking, of course—there’s nothing punitive about being a lita officer; it’s as rewarding as it is challenging. neither is this intended to be a self-congratulatory screed as my last hurrah in print as lita president. i’ve referred repeatedly to the grassroots success of lita’s board, interest groups, dedicated committees, and engaged volunteers. the flatness of our division is often emulated by others. i thoroughly enjoy engagement with the lita membership, face-to-face and virtual recruitment of new members and volunteers, and group meetings to discuss moving lita forward. i love that lita is fun.
fun and enjoyment, coupled with my dedication to the profession that i love, is why i plan to make the most of my time, even as a has-been. all those meetings, all that bureaucracy? well, believe it or not, i like the bureaucracy—process works when you learn to work the process—and all those meetings have actually created some excellent feedback for the lita board. changes in ala, changes in the membership, and changes suggested by committees and interest groups all suggest . . . guess what? change. “change” has been a popular theme these days. i’m in that weird minority of people who does not believe that people don’t like to change. i think if the ideas are good, if the destination is worthwhile, then change is possible and even desirable. i’m always geared up for change, for learning from our mistakes, for asking forgiveness on occasion and for permission even less. this is a long-winded way of saying that i think lita is ready for some change. change to the board, change to the committees and interest groups, and changes to our interactions with lita and ala staff. i think ala and the other divisions are anxious for change as well, and i feel confident that lita and its membership can help, even while we change ourselves. don’t ask me today what the details of these changes are. all i can say is that i will be there for them, help see them through, and will be there on the other side to asses which changes worked and which didn’t. one thing i hope does not change is the passion and dedication of the leaders, volunteers, and members of this great organization. i only hope that our ranks grow, even in times of financial uncertainty. lita provides a valuable network of colleagues and friends—this network is always valuable, but it is indispensible in times of difficulty. for many, lita represents a second or third divisional membership, but for networking and collegial support, i think we are second to none. i titled my previous column “lita now.” i think it’s safe for me to say now, “lita forever.” usability study of a library’s m obile website: an example from portland state university kimberly d. pendell and michael s. bowman usability study of a library’s mobile website | pendell and bowman 45 abstract to discover how a newly developed library mobile website performed across a variety of devices, the authors used a hybrid field and laboratory methodology to conduct a usability test of the website. twelve student participants were recruited and selected according to phone type. results revealed a wide array of errors attributed to site design, wireless network connections, as well as phone hardware and software. this study provides an example methodology for testing library mobile websites, identifies issues associated with mobile websites, and provides recommendations for improving the user experience. introduction mobile websites are swiftly becoming a new access point for library services and resources. these websites are significantly different from full websites, particularly in terms of the user interface and available mobile-friendly functions. in addition, users interact with a mobile website on a variety of smartphones or other internet-capable mobile devices, all with differing hardware and software. it is commonly considered a best practice to perform usability tests prior to the launch of a new website in order to assess its user friendliness, yet examples of applying this practice to new library mobile websites are rare. 
considering the variability of user experiences in the mobile environment, usability testing of mobile websites is an important step in the development process. this study is an example of how usability testing may be performed on a library mobile website. the results provided us with new insights on the experience of our target users. in the fall of 2010, with the rapid growth of smartphones nationwide especially among college students, portland state university (psu) library decided to develop a mobile library website for its campus community. the library’s lead programmer and a student employee developed a test version of the website. this version of the website included library hours, location information, a local catalog search, library account access for viewing and renewing checked out items, and access to reference services. it also included a “find a computer” feature displaying the availability of work stations in the library’s two computer labs. kimberly d. pendell (kpendell@pdx.edu) is social sciences librarian, assistant professor, and michael s. bowman (bowman@pdx.edu) is interim assistant university librarian for public services, associate professor, portland state university library, portland, oregon. mailto:kpendell@pdx.edu mailto:bowman@pdx.edu information technology & libraries | june 2012 46 the basic architecture and design of the site was modeled on other existing academic library mobile websites that were appealing to the development team. the top-level navigation of the mobile website largely mirrored the full library website, utilizing the same language as the website when possible. the mobile website was built to be compatible with webkit, the dominant smartphone layout engine. use of javascript on the website was minimized due to the varying levels of support for it on different smartphones, and flash was avoided entirely. figure 1. home page of library mobile website, test version we formed a mobile website team to further evaluate the test website and prepare it for launch. three out of four team members owned smartphones, either an iphone 3gs or an iphone 4. we soon began questioning how the mobile website would work on other types of phones, recognizing that hardware and software differences would likely impact user experience of the mobile website. performing a formal usability test using a variety of internet-capable phones quickly became a priority. we decided to conduct a usability test for the new mobile website in order to answer the question: how user-friendly and effective is the new library mobile website on students’ various mobile devices? literature review smartphones, mobile websites, and mobile applications have dominated the technology landscape in the last few years. smartphone ownership has steadily increased, and a large percentage of usability study of a library’s mobile website | pendell and bowman 47 smartphone owners regularly use their phone to access the internet. the pew research center reports that 52 percent of americans aged 18–29 own smartphones, and 81 percent of this population use their smartphone to access the internet or e-mail on a typical day. additionally, 42 percent of this population uses a smartphone as their primary online access point.1 the 2010 ecar study of undergraduate students and information technology found that 62.7 percent of undergraduate students own internet-capable handheld devices, an increase of 11.5 percent from 2009. 
the 2010 survey also showed that an additional 11.3 percent of students intended to purchase an internet-capable handheld device within the next year.2 in this environment academic libraries have been scrambling to address the proliferation of student owned mobile devices, thus the number of mobile library websites is growing. the library success wiki, which tracks libraries with mobile websites, shows an 66 percent increase in the number of academic libraries in the united states and canada with mobile websites from august 2010 to august 2011.3 we reviewed articles about mobile websites in the professional library science literature and found that mobile website usability testing is only briefly mentioned. in their summary of current mobile technologies and mobile library website development, bridges, rempel, and griggs state that “user testing should be part of any web application development plan. you can apply the same types of evaluation techniques used in non-mobile applications to ensure a usable interface.”4 in a previous article, the same authors also note that not accounting for other types of mobile users is easy to do but leaves a potentially large audience for a mobile website “out in the cold.”5 more recently, seeholzer and salem found the usability aspect of mobile website development to be in need of further research.6 usability evaluation techniques for a mobile website are similar to those for a full website, but the variety of smartphones and internet-capable feature phones immediately complicates standard usability testing practices. the mobile device landscape is fraught with variables that can have a significant impact on the user experience of a mobile website. factors like small screen size, processing power, wireless or data plan connection, and on-screen keyboards or other data entry methods contribute to user experience and impact usability testing. zhang and adipat note that, mobile devices themselves, due to their unique, heterogeneous characteristics and physical constraints, may play a much more influential role in usability testing of mobile applications than desktop computers do in usability testing of desktop applications. therefore real mobile devices should be used whenever possible.7 one strategy for usability testing on mobile devices is to identify device “families” by similar operating systems or other characteristics, then perform a test of the website. for example, griggs, bridges, and rempel found representative models of device families at a local retailer, where they tested the site on the display phones. the authors also recommend “hallway usability testing,” an impromptu test with a volunteer.8 zhang and adipat go on to outline two methodologies for formal mobile application usability testing: field studies and laboratory experiments. the benefit of a mobile usability field study is information technology & libraries | june 2012 48 the preservation of the mobile environment in which tasks are normally performed. however, data collection is challenging in field studies, requiring the participant to reliably and consistently selfreport data. in contrast, the benefit of a laboratory study is that researchers have more control over the test session and data collection method. 
laboratory usability tests lend themselves to screen capture or video recording, allowing researchers more comprehensive data regarding the participant’s performance on predetermined tasks.9 however, billi and others point out that there is no general agreement in the literature about the significance or usefulness of the difference between laboratory and field testing of mobile applications.10 one compromise between field studies and laboratory experiments is the use of a smartphone emulator: an emulator mimics the smartphone interface on a desktop computer and is recordable via screen capture. however, desktop emulators mask some usability problems that impact smartphones, such as an unstable wireless connection or limited bandwidth.11 in order to record test sessions of users working directly with mobile devices, jakob nielsen, the well-known usability expert, briefly mentions the use of a document camera.12 in another usability test of a mobile application, loizides and buchanan also used a document camera with recording capabilities to effectively record users working with a mobile device.13 usability attributes are metrics that help assess the user-friendliness of a website. in their review of empirical mobile usability studies, coursaris and kim present the three most commonly used measures in mobile usability testing: efficiency: degree to which the product is enabling the tasks to be performed in a quick, effective and economical manner or is hindering performance; effectiveness: accuracy and completeness with which specified users achieved specified goals in particular environment; satisfaction: the degree to which a product is giving contentment or making the user satisfied.14 the authors present these measures in an overall framework of “contextual usability” constructed with the four variables of user, task, environment, and technology. an important note is the authors’ use of technology rather than focusing solely on the product; this subtle difference acknowledges that the user interacts not only with a product, but also other factors closely associated with the product, such as wireless connectivity.15 a participant proceeding through a predetermined task scenario is helpful in assessing site efficiency and effectiveness by measuring the error rate and time spent on a task. user satisfaction may be gauged by the participant’s expression of satisfaction, confusion, or frustration while performing the tasks. measurement of user satisfaction may also be supplemented by a post-test survey. returning to general evaluation techniques, mobile website usability employs the use of task scenarios, post-test surveys, and data analysis methods, similar to full site testing. general guides such as the handbook of usability testing by rubin and chisnell and george’s user-centered library websites: usability evaluation methods provide helpful information on designing task scenarios, how to facilitate a test, post-test survey ideas, and methods of analysis.16 another usability study of a library’s mobile website | pendell and bowman 49 common data collection method in usability testing is the think aloud protocol as it allows researchers to more fully understand the user experience. participants are instructed to talk about what they are thinking as they use the site; for example, expressing uncertainty of what option to select, frustration with poorly designed data entry fields, or satisfaction with easily understood navigation. 
examples of the think aloud protocol can also be found in mobile website usability testing.17 method while effective usability testing normally relies on five to eight participants, we decided a larger number of participants would be needed in order to capture the behavior of the site on a variety of devices. therefore, we recruited twelve participants to accommodate a balanced variety of smartphone brands and models. based on average market share, we aimed to test the website on four iphones, four android phones, and four other types of smartphones or internet-capable mobile devices (e.g., blackberry, windows phones). all study participants were university students, the primary target audience of the mobile website. we used three methods to recruit participants: a post to the library’s facebook page, a news item on the library’s home page, and two dozen flyers posted around campus. each form of recruitment described an opportunity for students to spend less than thirty minutes helping the library test its new mobile website. also, participants would receive a $10 coffee shop gift card as an incentive. a project-specific email address served as the initial contact point for students to volunteer. we instructed volunteers to indicate their phone type in their e-mail; this information was used to select and contact the students with the desired variety of mobile devices. if a scheduled participant did not come to the test appointment, another student with the same or similar type of phone was contacted and scheduled. no other demographic data or screening was used to select participants, aside from a minimum age requirement of eighteen years old. we employed a hybrid field and laboratory test protocol, which allowed us to test the mobile website on students’ native devices while in a laboratory setting that we could efficiently manage and schedule. participants used their own phone for the test without any adjustment to their existing operating preferences, similar to field testing methodology. however, we used a controlled environment in order to facilitate the test session and create recordings for data analysis. a library conference room served as our laboratory, and a document camera with video recording capability was used to record the session. the document camera was placed on an audio/visual cart and the participants chose to either stand or sit while holding their phones under the camera. the document camera recorded the phone screen, the participant’s hands, and the audio of the session. the video feed was available through the room projector as well, which helped us monitor image quality of the recordings. information technology & libraries | june 2012 50 figure 2. video still from test session recording the test session consisted of two parts: the completion of five tasks using participants’ phones on our test website recorded under the document camera, and a post-test survey. participants were read an introduction and instructions from a script in order to decrease variation in test protocol and our influence as the facilitators. we also performed a walk-through of the testing session prior to administering it to ensure the script was clearly worded and easy to understand. we developed our test scenarios and tasks according to five functional objectives for the library mobile website: 1. participants can find library hours for a given day in the week. 2. participants can perform a known title search in catalog and check for item status. 3. 
participants can use my account to view checked out books.18 4. participants can use chat reference. 5. participants can effectively search for a scholarly article using the mobile version of ebscohost academic search complete. prior to beginning the test, we encouraged participants to use the “think aloud” protocol while performing tasks. we also instructed them to move between tasks however they would naturally in order to capture user behavior when navigating from one part of the site to another. the post-test survey provided us with additional data and user reactions to the site. users were asked to rate the site’s appearance, ease of use, and how frequently they might use the different website features usability study of a library’s mobile website | pendell and bowman 51 (e.g., renewing a checked out item). the survey was administered directly after the task scenario portion of the test in order to take advantage of the users’ recent experience with the website. we evaluated the test sessions utilizing the measures of efficiency, effectiveness, and satisfaction. in this study, we assessed efficiency as time spent performing the task and effectiveness as success or failure in completing the task. we observed errors and categorized them as either a user error or site error. each error was also categorized as minor, major, or fatal: minor errors were easily identified and corrected by the user; major errors caused a notable delay, but the user was able to correct and complete the task; fatal errors prevented the user from completing the task. to assess user satisfaction, we took note of user comments as they performed tasks, and we also referred to their ratings and comments on the post-test survey. before analyzing the test recordings, we normalized our scoring behavior by performing a sample test session with a library staff member unfamiliar with the mobile website. we scored the sample recording separately and then met to discuss, clarify, and agree upon each error category. each of the twelve test sessions was viewed and scored independently. once this process was completed, we discussed our scoring of each test session video, combining our data and observations. we analyzed the combined data by looking for both common and unique errors for each usability task across the variety of smartphones tested. to protect participants’ confidentiality, each video file and post-test survey was labeled only with the test number and device type. prior to beginning the study, all recruitment methods, informed consent, methodology, tasks and post-test survey were approved by portland state university human subjects research and review committee. findings our recruitment efforts were successful with even a few same-day responses from the announcement posted on the library’s facebook page. some students also indicated that they had seen the recruitment flyers on campus. a total of fifty-two students volunteered to participate; twelve students were successfully contacted, scheduled, and tested. the distribution of the twelve participants and their types of phones is shown in table 1. number of participants operating system phone model 4 android htc droid incredible 2; motorola droid; htc mytouch 3g slide; motorola cliq 2 3 ios iphone 3gs 2 blackberry blackberry 9630; blackberry curve information technology & libraries | june 2012 52 1 windows phone 7 windows phone 7 1 webos palm pixi 1 other windows kin 2 feature phone (a phone with internet capability, running kinos) table 1. 
test participants by smartphone operating system and model usability task scenarios all test participants quickly and successfully completed the first task, finding the library hours for sunday. the second task was to find a book in the library catalog and report whether the book was available for check out. nine participants completed this task; the windows phone 7 and the two blackberry phones presented a fatal system error when working with our mobile catalog software, mobilecat. these participants were able to perform a search but were not able to view a full item record, blocking them from seeing the item’s availability and completing the task. this task also revealed one minor error for iphone users: the iphone displayed the item’s ten digit isbn as a phone number, complete with touch-to-call button. many users took more time than anticipated when asked to search for a book. the video recordings captured participants slowly scrolling through the menu before choosing “search psuonly catalog.” a few participants expressed their hesitation verbally: ● “maybe not the catalog? i don't know. yeah i guess that would be the one.” ● “i don't look for books on this site anyway...my lack of knowledge more than anything else.” ● “search psu library catalog i'm assuming?” the blackberry curve participant did not recognize the catalog option and selected “databases & articles” to search for a book. she was guided back to the catalog after her unsuccessful search in ebscohost. we observed an additional delay in searching for a book when using the catalog interface. the catalog search included a pull down menu of collections options. the collections menu was included by the site developers because it is present in the full website version of the local catalog. users tended to explore the menu looking for a selection that would be helpful in performing the task; however, they abandoned the menu, occasionally expressing additional confusion. usability study of a library’s mobile website | pendell and bowman 53 figure 3. catalog search with additional “collections” menu the next task was to log into a library account and view checked out items. all participants were successful with this task, but frequent minor user errors were observed, all misspelling or numerical entry errors. most participants self-corrected before submitting the login; however, one participant submitted a misspelled user name and promptly received an error message from the site. participants were also instructed to log out of the account. after clicking “logout” one participant made the observation; “huh, it goes to the login screen. i assume i'm logged out, though it doesn't say so.” the fourth task scenario involved using the library’s chat reference service via the mobile website. the chat reference service is provided via open source software in cooperation with l-net, the oregon statewide service. usability testing demonstrated that the chat reference service did not perform well on a variety of phones. also, a significant problem arose when participants attempted to access chat reference via the university’s unsecured wireless network. because the chat reference service is managed by a third-party host, three participants were confronted with a non-mobile friendly authentication screen (see discussion of the local wireless environment below). as this was an unexpected event in testing, participants were given the option to authenticate or abandon the task. 
all three participants who arrived at this point chose to move ahead with authentication during the test session. information technology & libraries | june 2012 54 once the chat interface was available to participants, other system errors were discovered. only three out of twelve participants successfully sent and received a chat message. only one participant (htc droid incredible) experienced an error-free chat transaction. various problems encountered included: · unresponsive or slow to respond buttons, · text fields unresponsive to data entry, · unusually long page loading time, · non-mobile-friendly error message upon attempting to exit, and · non-mobile-friendly “leave a message” webpage. another finding from this task is that participants expressed concern regarding communication delays during the chat reference task. if the librarians staffing the chat service are busy with other users, a new incoming user is placed in a queue. after waiting in the chat queue for forty seconds, one participant commented, “probably if i was on the bus and it took this long, i would leave a message.” being in a controlled environment, participants looked to the facilitator as a guide for how long to remain in the chat queue, distorting the indication of how long users would wait for a chat reference transaction in the field environment. figure 4. chat reference queue usability study of a library’s mobile website | pendell and bowman 55 the last task scenario asked participants to use the mobile version of ebscohost’s academic search complete. our test instance of this database generally performed well with android phones and less well with webos phones or iphones. android participants successfully accessed, searched, and viewed results in the database. iphone users experienced delays in initiating text entry, three consecutive touches being consistently necessary to activate typing in the search field. our feature phone participant with a windows kin 2 was unable to use ebscohost because the phone’s browser, internet explorer 6, is not supported by the ebscohost mobile website. the palm pixi participant also experienced difficulty with very long page loading times, two security certificate notifications (not present on other tests), and our ezproxy authentication page. with all these obstacles, the palm pixi participant abandoned the task. another participant, blackberry 9630, also abandoned the task due to slow page loading. a secondary objective of our ebscohost search task was to observe if participants explored ebscohost’s “search options” in order to limit results to scholarly articles. our task scenario asked participants to find a scholarly article on global warming. only one participant explored the ebscohost interface, successfully identified the “search options” menu, and limited the results to “scholarly (peer reviewed) articles.” another participant included the words “peer reviewed” with “global warming” in the search field in an attempt to add the limit. a third expressed the need to limit to scholarly articles but was unable to discover how to do so. of the remaining seven participants who searched academic search complete for the topic “global warming” none expressed concern or awareness of the scholarly limit in academic search complete. it is unclear whether this was a product of the interface design, users’ lack of knowledge regarding limiting their search to scholarly sources, or if our task scenario was simply too vague. 
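taken together, the task findings above reduce to the per-task, per-device error tally described in the method section (user versus site errors, rated minor, major, or fatal, plus success or failure for each task). the sketch below is only a minimal illustration of that tabulation in python, not the authors’ actual analysis; the example records and field names are hypothetical.

```python
from collections import Counter

# hypothetical scored observations: (device, task, source, severity)
# severity "fatal" means the participant could not complete the task
observations = [
    ("blackberry 9630", "catalog search", "site", "fatal"),
    ("windows phone 7", "catalog search", "site", "fatal"),
    ("iphone 3gs", "catalog search", "site", "minor"),      # isbn rendered as a phone number
    ("motorola droid", "account login", "user", "minor"),   # self-corrected typing error
    ("palm pixi", "database search", "site", "fatal"),
]

# errors per task, broken out by source and severity
errors = Counter((task, source, severity) for _, task, source, severity in observations)
for (task, source, severity), n in sorted(errors.items()):
    print(f"{task}: {n} {severity} {source} error(s)")

# task effectiveness: a task fails on a device if any fatal error was recorded for it
fatal = {(device, task) for device, task, _, severity in observations if severity == "fatal"}
for device, task in sorted(fatal):
    print(f"task '{task}' not completed on {device}")
```

scoring each recording twice and reconciling, as the authors did, fits the same structure: two such tallies can be compared entry by entry before being merged.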
though participants’ wireless configurations, or lack thereof, was not formally part of the usability test, we quickly discovered that this variable had a significant impact on the user’s experience of the mobile website. in the introductory script and informed consent we recommended to participants that they connect to the university’s wireless network to avoid data charges. however, we did not explicitly instruct users to connect to the secure network. most participants chose to connect to the unencrypted wireless network and appeared to be unaware of the encrypted network (psu and psu secure respectively). using the unencrypted network led to authentication requirements at two different points in the test: using the chat service and searching academic search complete. other users who were unfamiliar with adding a wireless network to their phone used their cellular network connection. these participants were asked to authenticate only when accessing ebscohost’s academic search complete (see table 2). participants expressed surprise at the appearance of an authentication request when performing different tasks, particularly while connected to the on-campus university wireless network. the required data entry in a non-mobile friendly authentication screen, and the added page loading time, created an obstacle for the participant to overcome in order to complete the task. notably, three participants also explained their naivete on how to find and add a wireless network to their phone. information technology & libraries | june 2012 56 internet connection library mobile website chat reference ebscohost on campus, unencrypted wireless no authentication required authentication required authentication required on campus, encrypted wireless no authentication required no authentication required no authentication required on campus, cellular network no authentication required no authentication required authentication required off campus, any mode no authentication required no authentication required authentication required table 2. authentication requirements based on type of internet connection and resource. post -test survey each participant completed a post-test survey that asked them to rate the mobile website’s appearance and ease of use. the survey also asked participants to rank how frequently they were likely to use specific features of the website such as search for books and ask for help on a rating scale of more than weekly, weekly, monthly, less than monthly, and never. participants were also invited to add general comments about the website. the mobile website’s overall appearance and ease of use was highly rated by all participants. the straightforward design of the mobile website’s homepage also garnered praise in the comment section of the post-test survey. comments regarding the site’s design included: “very simple to navigate,” and “the simple homepage is perfect! also, i love that the site rotates sideways with my phone.” for many of the features listed on the survey participants selected an almost even distribution across the frequency of use rating scale. however, two features were ranked as having potential for very high use. nine out of twelve participants said they would search for articles weekly or more than weekly. eight out of twelve participants said they would use the “find a computer” function weekly or more than weekly. 
two participants additionally wrote in comments that “find a computer” was “very important” and would be used “every day.” at the other end of the scale, our menu option “directions” was ranked as having a potential frequency of use of never, with the exception of one participant marking less than monthly. discussion usability testing of the library’s mobile website provided the team with valuable information, leading us to implement important changes before the site was launched. we quickly decided on a usability study of a library’s mobile website | pendell and bowman 57 few changes, while others involved longer discussion. the collections menu was removed from the catalog search; this menu distracted and confused users with options that were not useful in a general search. “directions” was moved from a top level navigation element to a clickable link in the site footer. also, the need for a mobile version of the library’s ezproxy authentication page was clearly documented and has since been created and implemented. however, the team was very pleased with the praise for the overall appearance of the website and its ease of use, especially considering the significant difficulties some participants faced when completing specific tasks. the “find a computer” feature of the mobile website was very popular with test participants. the potential popularity among users is perhaps a reflection of overcrowded computer labs across campus and the continued need students have for desktop computing. unfortunately, “find a computer” has been temporarily removed from the site due to changes in computer laboratory tracking software at the campus it level. we hope to soon again have access to the workstation data for the library’s two computer labs in order to develop a new version of this feature. the hesitation participants displayed when selecting the catalog option in order to search for a book was remarkable for its pervasiveness. it’s possible that the term “catalog” has declined in use to the point of not being recognizable to some users, and it is not used to describe the search on the homepage of the library’s full website. in fact, we had originally planned to name the catalog search option with a more active and descriptive phrase, such as “find books and more,” which is used on the library’s full website. however, the full library website employs worldcat local, allowing users to make consortial and interlibrary loan requests. in contrast, the mobile website catalog reflects only our local holdings and does not support the request functionality. the team decided not to potentially confuse users further regarding the functionality of the different catalogs by giving them the same descriptive title. in the case that worldcat local’s beta mobile catalog increases in stability and functionality, we will abandon mobilecat and provide the same request options on the mobile website as on the full website. we discussed removing the chat service option from the “ask us” page. during usability testing, it was demonstrated that users would too frequently have poor experiences using this service due to slow page loads on most phones, the unpredictable responsiveness of text entry fields and buttons, and the wait time for a librarian to begin the chat. also, it could be that waiting in a virtual queue on a mobile device is particularly unappealing because the user is blocked from completing other tasks simultaneously. 
the library recently implemented a new text reference service, and this service was added to the mobile website. the text reference service is an asynchronous, non-webbased service that is less likely to pose similar usability problems as those found with the chat service. this reflects the difference between applications developed for desktop computing, such as web-based instant messaging, versus a technology that is specifically related to the mobile phone environment, like text messaging. however, tablet device users complicate matters since they might use the full desktop website or the mobile website; for this reason, chat reference is still part of the mobile website. information technology & libraries | june 2012 58 participants’ interest in accessing and searching databases was notable. during the task, many participants expressed positive reactions to the availability of the ebscohost database. the posttest survey results demonstrated a strong interest in searching for articles via the mobile website, giving their potential frequency of use as weekly or more than weekly. this evidence supports the previous user focus group results of seeholzer and salem.19 students are interested in accessing research databases on their mobile devices, despite the likely limitations of performing advanced searches and downloading files. therefore, the team decided to include ebscohost’s academic search complete along with eight other mobile-friendly databases in the live version of the website launched after the usability test. figure 5. home page of the library mobile website, updated usability study of a library’s mobile website | pendell and bowman 59 the new library mobile website was launched in the first week of fall 2011 quarter classes. in the first full week there were 569 visits to the site. site analytics for the first week also showed that our distribution of smartphone models in usability testing was fairly well matched with the users of the website, though we underestimated the number of iphone users: 64 percent of visits were from apple ios users, 28 percent from android users, 0.7percent blackberry users, and the remaining a mix of users with alternative mobile browsers and desktop browsers. usability testing with participants’ native smartphones and wireless connectivity revealed issues which would have been absent in a laboratory test that employed a mobile device emulator and a stable network connection. the complications introduced by the encrypted and unencrypted campus wireless networks, and cellular network connections, revealed some of the many variables users might experience outside of a controlled setting. ultimately, the variety of options for connecting to the internet from a smartphone, in combination with the authentication requirements of licensed library resources, potentially adds obstacles for users. general recommendations for mobile library websites that emerged from our usability test include: · users appreciate simple, streamlined navigation and clearly worded labels; · error message pages and other supplemental pages linked from the mobile website pages should be identified and mobile-friendly versions created; · recognize that how users connect to the mobile website is related to their experience using the site; · anticipate problems with third-party services (which often cannot be solved locally). 
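the second recommendation above, creating mobile-friendly versions of error and confirmation pages, can be made concrete with a small server-side check. the article does not describe the library’s web stack, so the python sketch below is only illustrative: it sniffs the user-agent string and returns a lightweight confirmation page, addressing the missing logout feedback observed during testing.

```python
import re

# rough user-agent test; a production site might rely on responsive css instead
MOBILE_UA = re.compile(r"mobile|android|iphone|ipod|blackberry|webos|iemobile", re.IGNORECASE)

def is_mobile(user_agent: str) -> bool:
    return bool(MOBILE_UA.search(user_agent or ""))

def logout_page(user_agent: str) -> str:
    """Return explicit logout feedback, with a stripped-down page for mobile clients."""
    if is_mobile(user_agent):
        # "/m/" is a hypothetical path for the mobile site
        return ("<!doctype html>"
                "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
                "<p>you have successfully logged out.</p>"
                "<p><a href=\"/m/\">return to the mobile site</a></p>")
    return "<!doctype html><p>you have successfully logged out of your library account.</p>"

# example
print(logout_page("Mozilla/5.0 (iPhone; CPU iPhone OS 4_3 like Mac OS X)"))
```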
additionally, system responses to user actions are important; for example, provide a “you have successfully logged out” message and an indicator that a catalog search is in progress. it is possible that users are even more likely to abandon tasks in a mobile environment than in a desktop environment if they perceive the site to be unresponsive. as test facilitators, we experienced three primary difficulties in keeping the testing sessions consistent. the unexpectedly poor performance of the mobile website on some devices required us to communicate with participants about when a task could be abandoned. for example, after one participant made three unsuccessful attempts at entering text data in the chat service interface, she was directed to move ahead to the next task. such instances of multiple unsuccessful attempts were considered to be fatal system errors. however, under these circumstances, it is difficult to know whether our test facilitation led participants to spend more or less time than they normally would attempting a task. secondly, the issue of system authentication led to unexpected variation in testing. some participants proceeded through these obstacles, while others either opted out or had significant enough technical difficulties that the task was deemed a fatal error. again, it is unclear how the average user would deal with this situation in the field. some users information technology & libraries | june 2012 60 might leave an activity if an obstacle appears too cumbersome, others might proceed. finally, participants demonstrated a wide range in their willingness to “think aloud.” in retrospect, as facilitators, we should have provided an example of the method before beginning the test; perhaps doing so would have encouraged the participants to speak more freely. the relatively simple nature of most of the test tasks may have also contributed to this problem as participants seemed reluctant to say something that might be considered too obvious. another limitation of our study is that the participants were a convenience sample of volunteers selected by phone type. though our selection was based loosely on market share of different smartphone brands, a preliminary investigation into the mobile device market of our target population would have been helpful to establish what devices would be most important to test. additional usability testing on more complex library related tasks, such as advanced searching in a database, or downloading and viewing files, is recommended for further research. also of interest would be a study of user willingness to proceed past obstacles like authentication requirements and non-mobile friendly pages in the field. conclusion we began our study questioning whether or not different smartphone hardware and operating systems would impact the user experience of our library’s new mobile website. usability testing confirmed that the type of smartphone does have an impact on the user experience, occasionally significantly so. by testing the site on a range of devices, we observed a wide variation of successful and unsuccessful experiences with our mobile website. the wide variety of phones and mobile devices in use makes developing a mobile website that perfectly serves all of them difficult; there is likely to always be a segment of users who experience difficulties with any given mobile website. 
however, usability testing data and developer awareness of potential problems will generate positive changes to mobile websites and alleviate frustration for many users down the road. references and notes 1. aaron smith, “35% of american adults own a smartphone: one quarter of smartphone owners use their phone for most of their online browsing,” pew research center, june 15, 2011, http://pewinternet.org/~/media/files/reports/2011/pip_smartphones.pdf (accessed oct. 13, 2011). 2. shannon d. smith and judith b. caruso, the ecar study of undergraduate students and information technology, 2010, educause, 2010, 41, http://net.educause.edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed sept. 12, 2011); shannon d. smith, gail salaway, and judith b. caruso, the ecar study of undergraduate students and information technology, 2009, educause, 2009, 49, http://www.educause.edu/resources/theecarstudyofundergraduatestu/187215 (accessed sept. 12, 2011). 3. a comparison count of u.s. and canadian academic libraries with active mobile websites, wiki page versions, august 2010 (56 listed) and august 2011 (84 listed). library success: a best practices wiki, “m-libraries: libraries offering mobile interfaces or applications,” http://libsuccess.org/index.php?title=m-libraries (accessed sept. 7, 2011). 4. laurie m. bridges, hannah gascho rempel, and kim griggs, “making the case for a fully mobile library web site: from floor maps to the catalog,” reference services review 38, no. 2 (2010): 317, doi:10.1108/00907321011045061. 5. kim griggs, laurie m. bridges, and hannah gascho rempel, “library/mobile: tips on designing and developing mobile web sites,” code4lib journal no. 8 (2009), under “content adaptation techniques,” http://journal.code4lib.org/articles/2055 (accessed sept. 7, 2011). 6. jamie seeholzer and joseph a. salem jr., “library on the go: a focus group study of the mobile web and the academic library,” college & research libraries 72, no. 1 (2011): 19. 7. dongsong zhang and boonlit adipat, “challenges, methodologies, and issues in the usability testing of mobile applications,” international journal of human-computer interaction 18, no. 3 (2005): 302, doi:10.1207/s15327590ijhc1803_3. 8. griggs, bridges, and rempel, “library/mobile.” 9. zhang and adipat, “challenges, methodologies,” 303–4. 10. billi et al., “a unified methodology for the evaluation of accessibility and usability of mobile applications,” universal access in the information society 9, no. 4 (2010): 340, doi:10.1007/s10209-009-0180-1. 11. zhang and adipat, “challenges, methodologies,” 302. 12. jakob nielsen, “mobile usability,” alertbox, september 26, 2011, www.useit.com/alertbox/mobile-usability.html (accessed sept. 28, 2011). 13. fernando loizides and george buchanan, “performing document triage on small screen devices. part 1: structured documents,” in iiix ’10: proceedings of the third symposium on information interaction in context, ed. nicholas j. belkin and diane kelly (new york: acm, 2010), 342, doi:10.1145/1840784.1840836. 14. constantinos k. coursaris and dan j.
kim, “a qualitative review of empirical mobile usability studies” (presentation, twelfth americas conference on information systems, acapulco, mexico, august 4–6, 2006), 4, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.4082&rep=rep1&type=pdf (accessed sept. 7, 2011) 15. ibid., 2. http://libsuccess.org/index.php?title=m-libraries http://journal.code4lib.org/articles/2055 file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.useit.com/alertbox/mobile-usability.html http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.4082&rep=rep1&type=pdf information technology & libraries | june 2012 62 16. jeffrey rubin and dana chisnell, handbook of usability testing: how to plan, design, and conduct effective tests, 2nd ed. (indianapolis, in: wiley, 2008); carole a. george, user-centered library web sites: usability evaluation methods (cambridge: chandos, 2008). 17. ronan hegarty and judith wusteman, “evaluating ebscohost mobile,” library hi tech 29, no. 2 (2011): 323–25, doi:10.1108/07378831111138198; robert c. wu et al., “usability of a mobile electronic medical record prototype: a verbal protocol analysis,” informatics for health & social care 33, no. 2 (2008): 141–42, doi:10.1080/17538150802127223. 18. in order to protect participants’ confidentiality a dummy library user account was created; the user name and password for the account were provided to the participant at the test session. 19. seeholzer and salem, “library on the go,” 14. 50 information technology and libraries | june 2006 author name and second author f orty years! in july 1966, the library and information technology association (lita) was officially born at the american library association (ala) annual conference in new york as the information science and automation division (isad). it was bastille day, and i’m sure for those who had worked so hard to create this new organization that it probably seemed like a revolution, a new day. the organizational meeting held that day attracted “several hundred people.” imagine! i’ve mentioned it before, i know, but the history of the first twenty-five years of lita is intriguing reading and well worth an investment of your time. stephen r. salmon’s article “lita’s first twenty-five years: a brief history” (www.lita.org/ala/lita/aboutlita/org/1st 25years.htm) offers an interesting look back in time. any technology organization that has been in existence for forty or more years has seen a lot of changes and adapted over time to a new environment and new technologies. there is no other choice. someone (who, i don’t remember; i’d gladly attribute the quote if i did) once told me that library automation began with the electric eraser. i’m sure that many of you have neither seen an electric eraser, nor can you probably imagine its purpose. ask around. i’m sure there are staff in your organization who do remember using it. there may even be one hidden somewhere in your library. a quick search of the web even finds cordless, rechargeable electric erasers today in drafting and art supply stores. the 1960s, as lita was born, was still the era of the big mainframe systems and not-so-common programming languages. machine readable cataloging (marc) was born and oclc conceived. the 1970s saw the introduction of minicomputer systems. digital equipment corporation introduced the vax, a 32-bit platform, in 1976. the roots of many of our current integrated library systems reach back to this decade. the 1980s saw the introduction of the ibm personal computer and the apple macintosh. 
the graphical interface became the norm or at least the one to imitate. the 1990s saw a shift away from hardware to communication and access as the web was unveiled and began to give life to the internet bubble. the new millennium began with y2k. the web predominates, and increasingly, the digital form dominates almost everything we touch (text, audio, video). automation and systems evolved and changed over the years, and so did libraries. automation, which had been confined to large air-conditioned and monitored rooms, moved out into the library. it increasingly appeared at circulation desks, on staff desks, and then throughout the library. information technology (it) spread into offices everywhere and into homes. libraries had products and services to deliver to users. users demanded more convenience. of course, others knew this trend as well and provided products and services that users wanted. users often liked what they saw in stores better than what the library was able to provide. each of us attempts to keep up, compete, and beat those whom we see as our competitors. it’s a moving target and one that seems to be gaining speed. all the while, during these four decades, our association and its members continually adapted to the new environment, faced new challenges, and adopted new technologies. we would not exist if we did not. i feel that we, as an association, are again facing the need to change, to transform ourselves. it, digital technology, automation (whatever term you want to use) affects the work of virtually every library staff member. everyone’s work in the library uses or contributes to the digital presence of our employer. it is not the domain of a few. lita has a wonderful history and it has great potential to better serve the profession. what do we want our association to be? what programs and services can we provide that others do not? who can we involve to broaden our reach? how can we better communicate with members and nonmembers? if we had a clean sheet of paper, what would we write? what would we dream? we need to share that dream and bring it to life. i can’t do it. the lita board can’t do it. we need your help. we need your ideas. we need your energy. we need to break out of our comfort zone. none of us wants the strategic plan (www.lita.org/ala/lita/aboutlita/org/plan.htm) we adopted last year to ring hollow. we want to accelerate change and move into a reenergized future. i welcome your aspirations, ideas, and comments. i know that the lita board does as well. please feel free to contact me or any member of the board (www.lita .org/ala/lita/aboutlita/org/litagov/board.htm). lita is your association. where should we be going? help us navigate the future. patrick mullin patrick mullin (mullin@email.unc.edu) is lita president 2005– 2006, and associate university librarian for access services and systems, the university of north carolina at chapel hill. president’s column lauren h. mandel (lmandel@fsu.edu) is a doctoral candidate at the florida state university college of communication & information, school of library & information studies, and is research coordinator at the information use management & policy institute. geographic information systems: tools for displaying in-library use data lauren h. 
mandel geographic information systems: tools for displaying in-library use data | mandel 47 in-library use data is crucial for modern libraries to understand the full spectrum of patron use, including patron self-service activities, circulation, and reference statistics. rather than using tables and charts to display use data, a geographic information system (gis) facilitates a more visually appealing graphical display of the data in the form of a map. giss have been used by library and information science (lis) researchers and practitioners to create maps that display analyses of service area populations and demographics, facilities space management issues, spatial distribution of in-library use of materials, planned branch consolidations, and so on. the “seating sweeps” method allows researchers and librarians to collect in-library use data regarding where patrons are locating themselves within the library and what they are doing at those locations, such as sitting and reading, studying in a group, or socializing. this paper proposes a gis as a tool to visually display in-library use data collected via “seating sweeps” of a library. by using a gis to store, manage, and display the data, researchers and librarians can create visually appealing maps that show areas of heavy use and evidence of the use and value of the library for a community. example maps are included to facilitate the reader’s understanding of the possibilities afforded by using giss in lis research. t he modern public library operates in a context of limited (and often continually reduced) funding where the librarians must justify the continued value of the library to funding and supervisory authorities. this is especially the case as more and more patrons access the library virtually, calling into question the relevance of the physical library. in this context, there is a great need for librarians and researchers to evaluate the use of library facility space to demonstrate that the physical library is still being used for important social and educational functions. despite this need, no model of public library facility evaluation emphasizes the ways patrons use library facilities. the systematic collection of in-library use data must go beyond traditional circulation and reference transactions to include self-service activities, group study and collaboration, socializing, and more. geographic information systems (giss) are beginning to become deployed in library and information science (lis) research as a tool for graphically displaying data. an initial review of the literature has yielded studies where a gis has been used in analyzing service area populations through u.s. census data;1 sitting facility locations;2 managing facilities, including spatial distribution of in-library book use and occupancy of library study space;3 and planning branch consolidations.4 these uses of gis are not mutually exclusive; studies have combined multiple uses of giss.5 also, giss have been proposed as viable tools for producing visual representations of measurements of library facility use.6 these studies show the capabilities of a gis for storing, managing, analyzing, and displaying in-library use data and the value of gisproduced maps for library facility evaluations, in-library use research, and library justification. n research purpose observing and measuring the use of a library facility is a crucial step in the facility evaluation process. 
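as a concrete illustration, a single seating sweep might be recorded and totaled as in the following minimal python sketch; the zone names, activity labels, and file layout are hypothetical rather than drawn from given and leckie’s instrument.

```python
import csv
from collections import Counter

# one row per patron observed during a single sweep (hypothetical values)
sweep = [
    {"sweep_time": "2010-03-02 10:15", "zone": "quiet reading room", "activity": "reading"},
    {"sweep_time": "2010-03-02 10:15", "zone": "group study tables", "activity": "group study"},
    {"sweep_time": "2010-03-02 10:15", "zone": "group study tables", "activity": "socializing"},
    {"sweep_time": "2010-03-02 10:15", "zone": "public computers", "activity": "computer use"},
]

# totals by zone are what a gis would later join to floor-plan polygons for display
per_zone = Counter(row["zone"] for row in sweep)

with open("sweep_counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["zone", "patrons_observed"])
    for zone, count in sorted(per_zone.items()):
        writer.writerow([zone, count])
```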
the library needs to understand how the facility is currently being used in order to justify the continued financial support necessary to maintain and operate it. understanding how the facility is used can also help librarians identify high-traffic areas of the library that are ideal locations to market library services and materials. this understanding cannot be reached by analyzing circulation and reference transaction data alone; it must include in-library use measures that account for all ways patrons are using the facility. the purpose of this paper is to suggest a method by which to observe and record all uses of a library facility during a sampling period, the so-called “seating sweep” performed by given and leckie, and then to use a gis to store, manage, and display the collected data on a map or series of maps that graphically depict library use.7 n significance of facility evaluation facility evaluation is a topic of vital importance in all fields, but this is especially true of a field such as public librarianship where funding is often a source of concern.8 in times of economic instability, libraries can benefit from the ability to identify uses of existing facilities and employ this information to justify the continued operation of the library facility. also, knowing which areas of the library are more frequently used than others can help librarians determine where to place displays of library materials and advertisements of library services. for a library to begin to evaluate patron use and how well the facility meets users’ needs, there must be an understanding of what users need from the library facility.9 to determine those needs, it is vital that library staff observe the facility while it is being used. this observation can be applied to the facility evaluation plan to justify the continued operation of the facility to meet the needs of the library service population. understanding how people use the public library facility beyond traditional measures of circulation statistics and reference transactions can lead to new theories of library use, an area of significant research interest for lis. additionally, the importance of this work transcends lis because it applies to other government-funded community service agencies as well. for example, recreation facilities and community centers could also benefit from a customer-use model that incorporates measures of the true use of those facilities. n literature review although much has been written on the use of library facilities, little of the research includes studies of how patrons actually use existing public library facilities and whether facilities are designed to accommodate this use.10 rather, much of the research in public library facility evaluation has focused on collection and equipment space needs,11 despite the user-oriented focus of public library accountability models.12 recent research in library facility design is beginning to reflect this focus,13 but additional study would be useful to the field. use of gis is on the rise in the modern technological world.
a gis is a computer-based tool for compiling, storing, analyzing, and displaying data graphically.14 usually this data is geospatial in nature, but a gis also can incorporate descriptive or statistical data to provide a richer picture than figures and tables can. although gis has been around for half a century, it has become increasingly affordable, allowing libraries and similar institutions to consider using a gis as a measurement and analysis tool. giss have begun to be used in lis research as a tool for graphically displaying library data. one fruitful area has been the mapping of user demographics for facility planning purposes,15 including studies that mapped library closures.16 mapping also can include in-library use data,17 in which case a gis is used to overlay collected in-library use data on library floor plans. this can offer a richer picture of how a facility is being used than traditional charts and tables can provide. using a gis to display library service area population data adkins and sturges suggest libraries use a gis-based library service area assessment as a method to evaluate their service areas and plan library services to meet the unique demographic demands of their communities.18 they discuss the methods of using gis, including downloading u.s. census tiger (topologically integrated geographic encoding and referencing) files, geocoding library locations, delineating service areas by multiple methods, and analyzing demographics. a key tenet of this approach is the concept that public libraries need to understand the needs of their patrons. this is a prevailing concept in the literature.19 preiser and wang, in reporting a method used to create a facilities master plan for the public library of cincinnati and hamilton county, ohio, offer a convincing argument for combining gis and building performance evaluation (bpe) methods to examine branch facility needs and offer individualized facilities recommendations.20 like other lis researchers,21 preiser and wang suggest a relationship between libraries and retail stores, noting the similar modern trends of destination libraries and destination bookstores. they also acknowledge the difficulty in completing an accurate library performance assessment due to the multitude of activities and functions of a library. their method is a combination of a gis-based service area and population analysis with a bpe that includes staff and user interviews and surveys, direct observation, and photography. the described multimethod approach offers a more complete picture of a library facility’s performance than traditional circulation-based evaluations. further use of giss in library facility planning can be seen from a study comparing proposed branches by demographic data that has been analyzed and presented through a gis. hertel and sprague describe research that used a gis to conduct geospatial analysis of u.s. census data to depict the demographics of populations that would be served by two proposed branch libraries for a public library system in idaho.22 a primary purpose of this research is to demonstrate the possible ways public libraries can use gis to present visual and quantitative demographic analyses of service area populations.
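the service-area workflow that adkins and sturges describe—download census tiger files, geocode library locations, delineate a service area, and summarize demographics—can be sketched with modern open-source tooling. the following is a minimal illustration under stated assumptions, not the procedure used in any of the cited studies: the shapefile name, the population field, the coordinates, and the one-mile buffer are all hypothetical, and the geopandas, shapely, and matplotlib packages are assumed to be installed.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

# hypothetical inputs: a census tract shapefile (e.g., extracted from tiger data)
# with a total-population attribute, and a manually geocoded library location
TRACTS_FILE = "census_tracts.shp"          # assumed to exist; column "POP" is assumed
LIBRARY_LON, LIBRARY_LAT = -116.20, 43.61  # illustrative coordinates only

tracts = gpd.read_file(TRACTS_FILE)

# represent the library as a one-row point layer in geographic coordinates
library = gpd.GeoDataFrame(
    {"name": ["main branch"]},
    geometry=[Point(LIBRARY_LON, LIBRARY_LAT)],
    crs="EPSG:4326",
)

# project both layers to a metric crs (a utm zone assumed suitable for the region)
# so the buffer distance is in meters, then delineate a crude circular service area
tracts_m = tracts.to_crs(epsg=32611)
library_m = library.to_crs(epsg=32611)
service_area = library_m.buffer(1609).iloc[0]   # roughly one mile

# select tracts that intersect the service area and summarize demographics
served = tracts_m[tracts_m.intersects(service_area)]
print("tracts in service area:", len(served))
print("estimated population served:", served["POP"].sum())

# a quick visual: tracts shaded by population with the service-area outline on top
ax = served.plot(column="POP", cmap="Blues", legend=True)
gpd.GeoSeries([service_area], crs="EPSG:32611").boundary.plot(ax=ax, color="red")
plt.savefig("service_area.png")
```

the circular buffer is the simplest of the delineation methods adkins and sturges mention; drive-time polygons or census-block assignment could be substituted without changing the rest of the sketch.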
hertel and sprague identify that public libraries are challenged to determine which public they are serving and the needs of that population, writing that “libraries are beginning to add customer-based satisfaction as a critical component of resource allocation decisions” and need the help of a gis to provide hard-data evidence in support of staff observations.23 this evidence could take the form of demographic data, as discussed by hertel and sprague, and also could incorporate in-library use data to present a fuller picture of a facility’s use. using gis to display in-library use data xia conducted several studies in which he collected library-use data and mapped that data via a gis. in one study designed to identify the importance of space management in academic libraries, xia suggests applications of giss in library space management, particularly his tool integrating library floor plans with feature data in a gis.24 he explains that a gis can overcome the constraints of drafting and computer-aided design tools, such as those in use at the meriam library at california state university, chico, and at the michigan state university main library. for example, giss are not limited to space visualization manipulation, but can incorporate user perceptions, behavior, and daily activities, all of which are important data for library space management considerations and in-library use research. xia also reviews the use of gis tools that incorporate hospital and casino floor plans, noting that library facilities are as complex as hospitals and casinos; this is a compelling argument that academic libraries should consider the use of a gis as a space management tool. in another study, xia uses a gis to visualize the spatial distribution of books in the library in an attempt to establish the relationship between the height of bookshelves and the in-library use of books.25 this study seeks to answer the question of how the location of books on shelves of different heights could influence user behavior (i.e., patrons may prefer to browse shelves at eye level rather than the top and bottom shelves). what is of interest here is xia’s use of a gis to spatially represent the collected data. xia remarks that a gis “is suitable for assisting in the research of in-library book use where library floor layouts can be drawn into maps on multiple-dimensional views.”26 in fact, xia’s graphics depict the use of books by bookshelf height in a visual manner that could not be achieved without the use of a gis. similarly, a gis can be used to spatially represent the collected data in an in-library use study by overlaying the data onto a representation of the library floor plan. in a third project, xia measures study space use in academic libraries as a metric of user satisfaction with library services.27 he says that libraries need to evaluate space needs on a case-by-case basis because every library is unique and serves a unique population. therefore, to observe the occupancy of study areas in an academic library, xia drew the library’s study facilities (including furniture) in a gis. he then observed patrons’ use of the facilities and entered the observation data into the gis to overlay on maps of the study areas.
there are several advantages of using gis in this way: spatial databases can store continuing data sets, the system is powerful and flexible for manipulating and analyzing the spatial dataset, there are enhanced data visualization capabilities, and maps and data become interactive. conclusions drawn from the literature a gis is a tool gaining momentum in the world of lis research. giss have been used to conduct and display service area population assessments,28 propose facility locations,29 and plan for and measure branch consolidation impacts and benefits.30 giss also have been used to graphically represent in-library use for managing facility space allocation, mapping in-library book use, and visualizing the occupancy of library study space.31 additionally, giss have been used in combination studies that examine library service areas and facility location proposals.32 these uses of giss are only the beginning; a gis can be used to map any type of data a library can collect, including all measures of in-library use. additionally, gis-based data analysis and display complements the focus in library-use research on gathering data to show a richer picture of a facility’s use and the focus in library facility design literature on building libraries on the basis of community needs.33 n in-library use research that would benefit from spatial data displays unobtrusive observational research offers a rich method for identifying and recording the use of a public library facility. a researcher could obtain a copy of a library’s floor plan, predetermine sampling times during which to “sweep” the library, and conduct the sweeps by marking all patrons observed on the floor plan.34 this data then could be entered into a gis database for spatial analysis and display. specific questions that could be addressed via such a method include the following:
n what are all the ways in which people are using the library facility?
n how many people are using traditional library resources, such as books and computers?
n how many people are using the facility for other reasons, such as relaxation, meeting friends, and so on?
n do the ways in which patrons use the library vary by location within the facility (e.g., are the people using traditional library resources and the people using the library for other reasons using the same areas of the library or different areas)?
n which area(s) of the library facility receive the highest level of use?
it is hoped that answers to these questions, in whole or in part, could begin to offer a picture of how a library facility is currently being used by library patrons. to better view this picture, the data recorded from the observational research could be entered into a gis to overlay onto the library floor plan in a similar manner as xia’s use of a gis to display occupancy of library study space.35 this spatial representation of the data should facilitate greater understanding of the actual use of the library facility. instead of a library presenting tables and graphs of library use, it would be able to produce illustrative maps that would help explain patterns of use to funding and supervising authorities. these maps would not require expensive proprietary gis packages; the examples provided in this paper were created using the free, open-source mapwindow gis package. example using gis to display in-library use data for this paper, i produced example maps on the basis of fictional in-library use data.
these maps were created using mapwindow gis software along with microsoft excel, publisher, and paint (see figure 1 for a diagram of this process). mapwindow is an open-source gis package that is easy to learn and use, but its layout and graphic design features are limited compared to the more expensive and sophisticated proprietary gis packages.36 mapwindow files are compatible with the proprietary packages, so they could be imported into other gis packages for finishing. for this paper, however, the goal was to create simple maps that a novice could replicate. therefore, publisher and paint were used for finalizing the maps instead of a sophisticated gis package. it was relatively easy to create the maps. first, i drew a sample floor plan of a fictional library computer lab in excel and imported it into mapwindow as a jpeg file. i then overlaid polygons (shapes that represent area units such as chairs and tables) onto the floor plan and saved two shapefiles, one for tables and one for computers. a shapefile is a basic storage file used in most gis packages. for each of those shapefiles i created an attribute table (basically, a linked spreadsheet) using fictitious data representing use of the tables and computers at 9 and 11 a.m. and 1, 3, 5, and 7 p.m. on a sample day. the field calculator generated a final column summing the total use of each table and computer for the fictitious sample day. i then created maps depicting the use of both tables and computers at each of the sample time periods (see figure 2) and for the total use (see figure 3).
figure 1. process diagram for creating the sample maps
figure 2. example maps depicting use of tables and computers in a fictional library computer lab, by hour
figure 3. example map depicting total use of tables and computers in a fictional library computer lab for a sample day
benefits of gis-created displays for library managers the maps presented here are not based on actual data, but are meant to demonstrate the capabilities of giss for spatially representing the use of a library facility. this could be done on a grander scale using an entire library floor plan and data collected during a longer sample period (e.g., a full week). these maps can serve several purposes for library managers, specifically regarding the marketing of library services and the justification of library funding. mapping data obtained from library “sweeps” can help identify the popularity of different areas of the library at different times of the day, different days of the week, or different times of the year. once the library has identified the most popular areas, this information can be used to market library materials and services. for example, a highly populated area would be an ideal location over which to install ceiling-mounted signs that the library could use for marketing services and programs. or the library could purchase a book display table similar to those used in bookstores and install it in the middle of a frequently populated area. the library could stock the table with seasonally relevant books and other materials (e.g., tax guidebooks in march and april) and track the circulation of these materials to determine the degree to which placement on the display table resulted in increased borrowing of those materials. in addition to helping the library market its materials and services, mapping in-library use can provide visual evidence of the library’s value.
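the same workflow can also be approximated without mapwindow, excel, publisher, or paint. the sketch below is a hypothetical python alternative (assuming the geopandas, shapely, and matplotlib packages), not the procedure actually used for the figures in this paper: it builds a few furniture polygons by hand, attaches fictitious hourly counts like those in the attribute tables described above, sums them into a daily total (the role the field calculator played), and renders simple shaded maps.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# hand-drawn stand-ins for furniture polygons in an arbitrary floor-plan
# coordinate space (units are notional, not tied to any real building)
furniture = gpd.GeoDataFrame(
    {
        "label": ["table 1", "table 2", "computer 1", "computer 2"],
        # fictitious observation counts from sweeps at 9 and 11 a.m., 1, 3, 5, 7 p.m.
        "h09": [2, 0, 1, 1],
        "h11": [3, 1, 1, 0],
        "h13": [4, 2, 1, 1],
        "h15": [1, 3, 0, 1],
        "h17": [0, 1, 1, 1],
        "h19": [2, 2, 0, 0],
    },
    geometry=[box(0, 0, 2, 1), box(3, 0, 5, 1), box(0, 3, 1, 4), box(2, 3, 3, 4)],
)

# total use per furniture item for the sample day (the "field calculator" step)
hour_cols = ["h09", "h11", "h13", "h15", "h17", "h19"]
furniture["total"] = furniture[hour_cols].sum(axis=1)

# one shaded map per sampling time, plus one for the daily total
for col in hour_cols + ["total"]:
    ax = furniture.plot(column=col, cmap="Reds", edgecolor="black", legend=True)
    ax.set_title(f"use of tables and computers: {col}")
    plt.savefig(f"use_{col}.png")
    plt.close()

# the layer can also be written out as a shapefile for finishing in a desktop gis
furniture.to_file("furniture_use.shp")
```

scaling the sketch up to a full floor plan is mostly a matter of digitizing more polygons and adding more observation columns; the summing and mapping steps stay the same.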
public libraries often rely on reference and circulation transaction data, gate counts, and programming attendance statistics to justify their existence. these measures, although valuable and important, do not include many other ways that patrons use libraries, such as sitting and reading, studying, group work, and socializing. during “seating sweeps,” the observers can record any and all uses they observe, including any that may not have been anticipated. all of these uses could then be mapped, providing a richer picture of how a public library is used and stronger justification of the library’s value. these maps may be easier for funding and supervising authorities to understand than textual explanations or graphs and charts of statistical analyses. n conclusion from a review of the literature, it is clear that giss are increasingly being used in lis research as data-analysis and display tools. giss are being used to analyze patron and materials data, as well as in studies combining multiple uses of giss. patron analysis has included service-area-population analysis and branch-consolidation planning. analysis of library materials has been used for space management, visualizing the spatial distribution of in-library book use, and visual representation of facility-use measurements. this paper has proposed collecting in-library use data according to given and leckie’s “seating sweeps” method and visually displaying that data via a gis. examples of such visual displays were provided to facilitate the reader’s understanding of the possibilities afforded by using a gis in lis research, as well as the scalable nature of the method. librarians and library staff can produce maps similar to the examples in this paper with minimal gis training and background. the literature review and example figures offered in this paper show the capabilities of giss for analyzing and graphically presenting library-use data. giss are tools that can facilitate library facility evaluations, in-library use research, and library valuation and justification. references 1. denice adkins and denyse k. sturges, “library service planning with gis and census data,” public libraries 43, no. 3 (2004): 165–70; karen hertel and nancy sprague, “gis and census data: tools for library planning,” library hi tech 25, no. 2 (2007): 246–59; wolfgang f. e. preiser and xinhao wang, “assessing library performance with gis and building evaluation methods,” new library world 107, no. 1224–25 (2006): 193–217. 2. hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 3. jingfeng xia, “library space management: a gis proposal,” library hi tech 22, no. 4 (2004): 375–82; xia, “using gis to measure in-library book-use behavior,” information technology & libraries 23, no. 4 (2004): 184–91; xia, “visualizing occupancy of library study space with gis maps,” new library world 106, no. 1212–13 (2005): 219–33. 4. preiser and wang, “assessing library performance.” 5. hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 6. preiser and wang, “assessing library performance”; xia, “library space management”; xia, “using gis to measure”; xia, “visualizing occupancy.” 7. lisa m. given and gloria j.
leckie, “‘sweeping’ the library: mapping the social activity space of the public library,” library & information science research 25, no. 4 (2003): 365–85. 8. “jackson rejects levy to reopen libraries,” american libraries 38, no. 7 (2007): 24–25; “may levy set for jackson county libraries closing in april,” american libraries 38, no. 3 (2007): 14; “tax reform has florida bracing for major budget cuts,” american libraries 38, no. 8 (2007): 21. 9. anne morris and elizabeth barron, “user consultation in public library services,” library management 19, no. 7 (1998): 404–15; susan l. silver and lisa t. nickel, surveying user activity as a tool for space planning in an academic library (tampa: univ. of south florida library, 2002); james simon and kurt schlichting, “the college connection: using academic support to conduct public library services,” public libraries 42, no. 6 (2003): 375–78. 10. given and leckie, “‘sweeping’ the library”; christie m. koontz, dean k. jue, and keith curry lance, “collecting detailed in-library usage data in the u.s. public libraries: the methodology, the results and the impact,” in proceedings of the third northumbria international conference on performance measurement in libraries and information services (newcastle, uk: university of northumbria, 2001): 175–79; koontz, jue, and lance, “neighborhood-based in-library use performance measures for public libraries: a nationwide study of majorityminority and majority white/low income markets using personal digital data collectors,” library & information science research 27, no. 1 (2005): 28–50. 11. cheryl bryan, managing facilities for results: optimizing space for services (chicago: ala, 2007); anders c. dahlgren, public library space needs: a planning outline (madison, wis.: department of public instruction, 1988); william w. sannwald and robert s. smith, eds., checklist of library building design considerations (chicago: ala, 1988). 12. brenda dervin, “useful theory for librarianship: communication, not information,” drexel library quarterly 13, no. 3 (1977): 16–32; morris and barron, “user consultation”; preiser and wang, “assessing library performance”; simon and schlichting, “the college connection”; norman walzer, karen stott, and lori sutton, “changes in public library services,” illinois libraries 83, no. 1 (2001): 47–52. 13. bradley wade bishop, “use of geographic information systems in marketing and facility site location: a case study of douglas county (colo.) libraries,” public libraries 47, no. 5: 65–69; david jones, “people places: public library buildings for the new millennium,” australasian public libraries & information services 14, no. 3 (2001): 81–89; nolan lushington, libraries designed for users: a 21st century guide (new york: neal-schuman, 2001); shannon mattern, “form for function: the architecture for new libraries,” in the new downtown library: designing with communities (minneapolis: univ. of minnesota pr., 2007), 55–83. 14. united nations, department of economic and social affairs, statistics division, handbook on geographical information systems and mapping (new york: united nations, 2000). 15. adkins and sturges, “library service planning”; bishop, “use of geographic information systems”; hertel and sprague, “gis and census data”; christie koontz, “using geographic information systems for estimating and profiling geographic library market areas,” in geographic information systems and libraries: patrons, maps, and spatial information, ed. linda c. 
smith and mike gluck (urbana–champaign: univ. of illinois pr., 1996): 181–93; preiser and wang, “assessing library performance.” 16. christie m. koontz, dean k. jue, and bradley wade bishop, “public library facility closure: an investigation of reasons for closure and effects on geographic market areas,” library & information science research 31, no. 2 (2009): 84–91. 17. xia, “library space management”; xia, “using gis to measure”; xia, “visualizing occupancy.” 18. adkins and sturges, “library service planning.” 19. bishop, “use of geographic information systems”; jones, “people places”; koontz, jue, and lance, “collecting detailed in-library usage data”; koontz, jue, and lance, “neighborhood-based in-library use”; morris and barron, “user consultation”; simon and schlichting, “the college connection”; walzer, stott, and sutton, “changes in public library services.” 20. preiser and wang, “assessing library performance.” 21. given and leckie, “‘sweeping’ the library”; christie m. koontz, “retail interior layout for libraries,” marketing library services 19, no. 1 (2005): 3–5. 22. hertel and sprague, “gis and census data.” 23. ibid., 247. 24. xia, “library space management.” 25. xia, “using gis to measure.” 26. ibid., 186. 27. xia, “visualizing occupancy.” 28. adkins and sturges, “library service planning”; hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 29. hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 30. koontz, jue, and bishop, “public library facility closure”; preiser and wang, “assessing library performance.” 31. xia, “library space management”; xia, “using gis to measure”; xia, “visualizing occupancy.” 32. hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 33. given and leckie, “‘sweeping’ the library”; koontz, jue, and lance, “collecting detailed in-library usage data”; koontz, jue, and lance, “neighborhood-based in-library use”; silver and nickel, surveying user activity; jones, “people places”; lushington, libraries designed for users. 34. given and leckie, “‘sweeping’ the library.” 35. xia, “visualizing occupancy.” 36. for more information or to download mapwindow gis, see http://www.mapwindow.org/ here again, no weighting or differentiating mechanism is included in describing the multiple elements. what is addressed is the “what” problem: what is the work of or about? metadata schemas for images and art works such as vra core and cdwa focus on specificity and exhaustivity of indexing, that is, the precision and quantity of terms applied to a subject element. however, these schemas do not address the question of how much the work is of or about the item or concept represented by a particular keyword. recently, social tagging functions have been adopted in digital library and catalog systems to help support better searching and browsing. this introduces more subject terms into the system. yet again, there is typically no mechanism to differentiate between the tags used for any given item, except for only a few sites that make use of tag frequency information in the search interfaces. as collections grow and more federated searching is carried out, the absence of weights for subject terms can cause problems in search and navigation.
the following examples illustrate the problems, and the rest of the paper further reviews and discusses the precedent research and practice on weighting, and further outlines the issues that are critical in applying a weighting mechanism. for example, the dublin core metadata element set recommends the use of controlled vocabulary to represent subject in “keywords, key phrases, or classification codes.”1 similarly, the library of congress practice, suggested in the subject headings manual, is to assign “one or more subject headings that best summarize the overall contents of the work and provide access to its most important topics.”2 a topic is only “important enough” to be given a subject heading if it comprises at least 20 percent of a work, except for headings of named entities, which do not need to be 20 percent of the work when they are “critical to the subject of the work as a whole.”3 although catalogers are aware of it when they assign terms, this weight information is left out of the current library metadata schemas and practice. a similar practice applies in non-textual object subject indexing. because of the difficulty of selecting words to represent visual/aural symbolism, subject indexing for art and cultural objects is usually guided by panofsky’s three levels of meaning (pre-iconographical, iconographical, and post-iconographical), further refined by layne in “ofness” and “aboutness” in each level. specifically, what can be indexed includes the “ofness” (what the picture depicts) as well as some “aboutness” (what is expressed in the picture) in both pre-iconographical and iconographical levels.4 in practice, vra core 4.0 for example defines subject subelements as: terms or phrases that describe, identify, or interpret the work or image and what it depicts or expresses. these may include generic terms that describe the work and the elements that it comprises, terms that identify particular people, geographic places, narrative and iconographic themes, or terms that refer to broader concepts or interpretations.5 seeing the wood for the trees: enhancing metadata subject elements with weights subject indexing has been conducted in a dichotomous way in terms of what the information object is primarily about/of or not, corresponding to the presence or absence of a particular subject term, respectively. with more subject terms brought into information systems via social tagging, manual cataloging, or automated indexing, many more partially relevant results can be retrieved. using examples from digital image collections and online library catalog systems, we explore the problem and advocate for adding a weighting mechanism to subject indexing and tagging to make web search and navigation more effective and efficient. we argue that the weighting of subject terms is more important than ever in today’s world of growing collections, more federated searching, and expansion of social tagging. such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers, but also needs to be incorporated into system functionality and metadata schemas. subjects as important access points have largely been indexed in a dichotomous way: what the object is primarily about/of or not. this approach to indexing is implicitly assumed in various guidelines for subject indexing. hong zhang, linda c.
smith, michael twidale, and fang huang gao hong zhang (hzhang1@illinois.edu) is phd candidate, graduate school of library and information science, university of illinois at urbana-champaign, linda c. smith (lcsmith@illinois.edu) is professor, graduate school of library and information science, university of illinois at urbana-champaign, michael twidale (twidale@illinois.edu) is professor, graduate school of library and information science, university of illinois at urbana-champaign, and fang huang gao (fgao@gpo.gov) is supervisory librarian, government printing office. ■■ examples of problems exhaustive indexing: digital library collections a search query of “tree” can return thousands of images in several digital library collections. the results include images with a tree or trees as primary components mixed with images where a tree or trees, although definitely present, are minor components of the image. figure 1 illustrates the point. these examples come from three different collections and either include the subject element of “tree” or are tagged with “tree” by users. there is no mechanism that catalogers or users have available to indicate that “tree” in these images is a minor component. note that we are not calling this out as an error in the professionally developed subject terms, nor indeed in the end-user-generated tags. although particular images may have an incorrectly applied keyword, we want to talk about the vast majority where the keyword quite correctly refers to a component of the image. furthermore, such keywords referring to minor components of the image are extremely useful for other queries. this kind of exhaustive indexing of images enables the effective satisfaction of search needs, such as looking for pictures of “buildings, people, and trees” or “trees beside a river.” with large image collections, such compound needs become more important to satisfy by combinations of searching and browsing. to enable them, metadata about minor subjects is essential. however, without weights to differentiate subject keywords, users will get overwhelmed with partially relevant results. for example, a user looking for images of trees (i.e., “tree” as the primary subject) would have to look through large sets of results such as a photograph of a dog with a tiny tree out of focus in the background. for some items that include rich metadata, such as a title or description, a person looking at the item’s record may very well determine from that context that the picture is primarily of, say, a dog instead of trees. that is, the subject elements have to be interpreted based on the context of other elements in the record to convey the “primary” and “peripheral” subjects among the listed subject terms. however, in a search and navigation system where subject elements are usually treated as context-free, search efficiency will be largely impaired because of the “noise” items and inability to refine the scope, especially when the volume of items grows. lack of weighting also limits other potential uses of keywords or tags.
for example, all the tags of all the items in a collection can be used to create a tag cloud as a low cost way to contribute to a visualization of what a collection is “about” overall.6 unfortunately, a laboriously developed set of exhaustive tags, although valuable for supporting searching and browsing within a large image collection, could give a very distorted overview of what the whole collection is about. extending our example, the tag “tree” may occur so frequently and be so prominent in the tag cloud that a user infers that this is mostly a botanical collection. selective indexing: lcsh in library catalogs although more extreme in the case of images in conveying the “ofness,” the same problem with multiple subjects also applies to text in terms of “aboutness.” the following example comes from an online library catalog in a faceted navigation web interface using library of congress subject headings in subject cataloging.7 the query “psychoanalysis and religion” returned 158 results, with 126 in “psychoanalysis and religion” under the topic facet. according to the subject headings manual, the first subject is always the primary one, while the second and others could be either a primary or nonprimary subject.8 this means that among these 126 books, there is no easy way to tell which books are “primarily” about “psychoanalysis and religion” unless the user goes through all of them. with the provided metadata, we do know that all books that have “psychoanalysis and religion” as the first subject heading are primarily about this topic, but a book that has this same heading as its second subject heading may or may not be primarily about this topic. there is no way to indicate which it is in the metadata, nor in the search interface. as this example shows, the library of congress manual involves an attempt to acknowledge and make a distinction between primary and nonprimary subjects. however in practice the attempt is insufficient to be really useful since apart from the first entry, it is ambiguous whether subsequent entries are additional primary subjects or nonprimary subjects. consequently, the search system and, further on, the users are not able to take full advantage of the care of a cataloger in deciding whether an additional subject is primary or not. other information retrieval systems the negative effect of current subject indexing without weighting on search outcomes has been identified by some researchers on particular information retrieval systems. in a study examining “the contribution of metadata to effective searching,”9 hawking and zobel found that the available subject metadata are “of little value in ranking answers” to search queries.10 their explanation is that “it is difficult to indicate via metadata tagging the relative importance of a page to a particular topic,”11 in addition to the problems in data quality and system implementation. the same problem : | zhang et al. 77seeing the wood for the trees | zhang et al. 
77 authors compared with the automatic indexing systems, because human indexers should be better at weighting the significance of subjects, and be more able to distinguish between important and peripheral compared with computers that base significance on term frequency.13 indeed, while various weighting algorithms have been used in automatic indexing systems to approximate the distinguishing function, there is simply no such mechanism built in human subject the particular page harder to find.12 a similar problem is reported in a recent study by lykke and eslau. in comparing searching by controlled subject metadata, searching based on automatic indexing, and searching based on automatic indexing expanded with a corporate thesaurus in an enterprise electronic document management system, the authors found that the metadata searches produced the lowest precision among the three strategies. the problem of indiscriminate metadata indexing is “remarkable” to the of multiple tags without weights is described: in the kinds of queries we have studied, there is typically one page (or at most a small number) that is particularly valuable. there are many other pages which could be said to be relevant to the query—and thus merit a metadata match—but they are not nearly so useful for a typical searcher. under the assumption that metadata is needed for search, all of these pages should have the relevant metadata tag, but this makes a. subject: women; books; dresses; flowers; trees; . . . in: victoria & albert museum (accessed aug. 30, 2010), http://collections.vam.ac.uk/item/014962/oil-painting-the-day-dream b. tags: japanese; moon; nights; walking; tree; . . . in: brooklyn museum (accessed aug. 30, 2010), http://www.brooklynmuseum.org/opencollections/objects/121725/aoi_slope_outside_toranomon_gate_no._113_from_ one_hundred_famous_views_of_edo c. tags: japanese; birds; silk; waterfall; tree; . . . in: steve: the museum social tagging project (accessed aug. 30, 2010), http://tagger.steve.museum/steve/object/15?offset=2 figure 1. example images with “tree” as a subject item 78 information technology and libraries | june 2011 anderson in niso tr021997.20 in addition, researchers have noticed the limitations of this dichotomous indexing. in an opinion piece, markey emphasizes the urgency to “replace boolean-based catalogs with post-boolean probabilistic retrieval methods,”21 especially given the challenges library systems are faced with today. it is the time to change the boolean, i.e., dichotomous, practice of subject indexing and cataloging, no matter whether it is produced by professional librarians, by user tagging, or by an automatic mechanism. indeed, as declared by svenonius, “while the purpose of an index is to point, the pointing cannot be done indiscriminately.”22 needed refinements in subject indexing the fact that weighted indexing has become more prominently needed over the past decade may be related to the shift in the continuum from subject indexing as representation/ surrogate to subject indexing as access points, which is consistent with the shift from a small number of subject terms to more subject terms. this might explain why the weighting practice is applied in the above mentioned medline/pubmed system. with web-based systems, social tagging technology, federated searching, and the growing number of collections producing more subject terms, to distinguish between them has become a prominent problem. 
in reviewing information users and use from the 1920s to the present, miksa points out the trend to “more granular access to informational objects” “by viewing documents as having many diverse subjects rather than one or two ‘main’ subjects,” no matter what the social and technical environment has been.23 in recognizing this theme in the future development of information organization and retrieval systems, we argue that the subject indexing mechanism subject indexing has been discussed in the research area of subject analysis for some time. weighting gives indexing an increased granularity and can be a device to counteract the effect of indexing specificity and exhaustivity on precision and recall, as pointed out by foskett: whereas specificity is a device to increase relevance at the cost of recall, exhaustivity works in the opposite direction, by increasing recall, but at the expense of relevance. a device which we may use to counteract this effect to some extent is weighting. in this, we try to show the significance of any particular specification by giving it a weight on a pre-established scale. for example, if we had a book on pets which dealt largely with dogs, we might give pets a weight of 10/10, and dogs, a weight of 8/10 or less.16 anderson also includes weighting as a part of indexing in the guidelines for indexes and related information retrieval devices (niso tr021997): one function of an index is to discriminate between major and minor treatments of particular topics or manifestations of particular features.17 he also notes that a weighting scheme is “especially useful in high-exhaustivity indexing”18 when both peripheral and primary topics are indicated. similarly, fidel lists “weights” as one of the issues that should be addressed in an indexing policy.19 metadata indexing without weighting is related to the simplified dichotomous assumption in subject indexing—primarily about/of and not primarily about/of, which further leads to the dichotomous retrieval result—retrieved and not retrieved. weighting as a mechanism to break this dichotomy is noted by metadata indexing even though human indexers are able to do the job much better than computers. weighting: yesterday, today, and future precedent weighting practices written more than thirty years ago, the final report of the subject access project describes how the project researchers applied weights to the newly added subject terms extracted from tables of contents and backof-the-book indexes. the criterion used in that project was that terms and phrases with a “ten-page range or larger” were treated as “major” ones.14 a similar mechanism was adopted in the eric database beginning in the 1960s, with indexes distinguishing “major” and “minor” descriptors as the result of indexing. while some search systems allowed differentiation of major and minor descriptors in formulating searches, others simply included the distinction (with an asterisk) when displaying a record. unfortunately, this distinguishing mechanism is no longer included in the later eric indexing data. a system using weighted indexing and searching and still running today is the medline/pubmed interface. a qualifier [majr] can be used with a medical subject headings (mesh) term in a query to “search a mesh heading which is a major topic of an article (e.g., thromboembolism[majr]).”15 in the search result page, each major mesh topic term is denoted by an asterisk at the end. 
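to make the idea concrete, the fragment below sketches what weighted subject metadata and a [majr]-style filter might look like. it is an illustration of the general mechanism argued for here, not an existing schema or system: the record structure, the 0–10 weight scale (echoing foskett’s example), the cutoff value, and the search function are all invented for this sketch.

```python
# hypothetical records: each subject term carries a weight on a 0-10 scale,
# where high values mark primary ("major") subjects and low values peripheral ones
records = [
    {"title": "a dog in the park",
     "subjects": {"dogs": 9, "parks": 6, "trees": 2}},
    {"title": "field guide to north american trees",
     "subjects": {"trees": 10, "identification": 7}},
    {"title": "psychoanalysis and religion: an introduction",
     "subjects": {"psychoanalysis and religion": 9, "freud, sigmund": 4}},
]

MAJOR_THRESHOLD = 7  # assumed cutoff separating major from minor subjects


def search(term, major_only=False):
    """return titles of matching records, ranked by the weight of the matched term."""
    hits = []
    for rec in records:
        weight = rec["subjects"].get(term)
        if weight is None:
            continue  # term not assigned at all
        if major_only and weight < MAJOR_THRESHOLD:
            continue  # analogous in spirit to pubmed's [majr] qualifier
        hits.append((weight, rec["title"]))
    # weight-aware ranking: primary treatments surface before peripheral ones
    return [title for weight, title in sorted(hits, reverse=True)]


print(search("trees"))                   # both tree-related records, best match first
print(search("trees", major_only=True))  # only the record primarily about trees
```

the same weights could drive a tag cloud or a faceted interface, so that a peripheral “tree” tag no longer counts as heavily as a primary one.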
weighting concept and the purpose of indexing the weighting concept is connected with the fundamental purpose of indexing. the idea of weighting in : | zhang et al. 79seeing the wood for the trees | zhang et al. 79 user tagging and machine generated metadata, such weighting becomes more important than ever if we are to make productive use of metadata richness and still see the wood for the trees. references 1. “dublin core metadata element set, version 1.1,” http://dublincore.org/docu ments/dces/ (accessed nov. 20, 2010). 2. library of congress, subject headings manual (washington, d.c.: library of congress, 2008). 3. ibid. 4. elaine svenonius, “access to nonbook materials: the limits of subject indexing for visual and aural languages,” journal of the american society for information science, 45, no. 8 (1994): 600–606. 5. “vra core 4.0 element description,” http://www.loc.gov/standards/vracore/ vra_core4_element_description.pdf (accessed mar. 31, 2011). 6. richard j. urban, michael b. twidale, and piotr adamczyk, “designing and developing a collections dashboard,” in j. trant and d. bearman (eds). museums and the web 2010: proceedings, ed. j. trant and d. bearman (toronto: archives & museum informatics, 2010). http://www .archimuse.com/mw2010/papers/urban/ urban.html (accessed apr. 5, 2011). 7. “vufind at the university of illinois,” http://vufind.carli.illinois.edu (accessed nov. 20, 2010). 8. library of congress, subject headings manual. 9. david hawking and justin zobel, “does topic metadata help with web search?” journal of the american society for information science & technology 58, no. 5 (2007): 613–28. 10. ibid. 11. ibid. 12. ibid, 625. 13. marianne lykke and anna g. eslau, “using thesauri in enterprise settings: indexing or query expansion?” in the janus faced scholar. a festschrift in honour of peter ingwersen, ed. birger larsen et al. (copenhagen: royal school of library & information science, 2010): 87–97. 14. subject access project, books are for use: final report of the subject access project to the council on library resources (syracuse, n.y.: syracuse univ., 1978). 15. “pubmed,” http://www.nlm.nih more than three categories or using continuous scales instead of category rating.24 subject indexing involves a similar judgment of relevance when deciding whether to include a subject term. more sophisticated scales certainly enable more useful ranking of results, but the cost of obtaining such information may rise. after the mechanism of incorporating weights into subject indexing/ cataloging is developed, guidelines should be provided for indexing practice to produce consistent and good quality. weights in both indexing and retrieval system adding weights to subject indexing/ cataloging needs to be considered and applied in three parts: (1) extending metadata schemas by encoding weights in subject elements; (2) subject indexing/cataloging with weight information; and (3) retrieval systems that exploit the weighting information in subject metadata elements. the mechanism will not work effectively in the absence of any one of them. conclusion this paper advocates for adding a weighting mechanism to subject indexing and tagging, to enable search algorithms to be more discriminating and browsing better oriented, and thus to make it possible to provide more granular access to information. such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers, but also needs to be incorporated into system functionality. 
as social tagging is brought into today’s digital library collections and online library catalogs, as collections grow and are aggregated, and the opportunity arises for adding more metadata from a variety of different sources, including end should provide sufficient granularity to allow more granular access to information, as demonstrated in the examples in the previous section. potential challenges while arguing for the potential value of weights associated with subject terms, it is also important to acknowledge potential challenges posed by this approach. human judgment treating assigned terms equally might seem to avoid the additional human judgment and the subjectivity of the weight levels because different catalogers may give different weight to a subject heading. we argue that assigning subject headings is itself unavoidably subjective. we are already using professional indexers and subject catalogers to create value-added metadata in the form of subject terms. assigning weights would be a further enhancement. on the other hand, adding a weighting mechanism into metadata schemas is independent of the issue of human indexing. no matter who will do the subject indexing or tagging, either professional librarians or users or possibly computers, there is a need for weight information in the metadata records. the weighting scale in terms of the specific mechanism of representing the weight rating, we can benefit from research on weighting of index terms and on the relevance of search results. for example, the three categories of relevant, partially relevant, and nonrelevant in information retrieval are similar to the major, minor, and nonpresent subject indexing method in the examples above. borlund notes several retrieval studies proposing 80 information technology and libraries | june 2011 22. svenonius, “access to nonbook materials,” 601. 23. francis miksa, “information organization and the mysterious information user,” libraries & the cultural record 44, no. 3 (2009): 343–70. 24. pia borlund, “the concept of relevance in ir,” journal of the american society for information science & technology 54, no. 10 (2003): 913–25. 18. ibid. 19. raya fidel, “user-centered indexing,” journal of the american society for information science 45, no. 8 (1994): 572–75. 20. anderson, guidelines for indexes and related information retrieval devices, 20. 21. karen markey, “the online library catalog: paradise lost and paradise regained?” d-lib magazine 13, no. 1/2 (2007). . g o v / b s d / d i s t e d / p u b m e d t u t o r i a l / 020_760.html (accessed nov. 20, 2010). 16. a. c. foskett, the subject approach to information, 5th ed. (london: library association publishing, 1996): 24. 17. james d. anderson, guidelines for indexes and related information retrieval devices. niso-tr02–1997, http:// www.niso.org/publications/tr/tr02.pdf (accessed nov. 20, 2010): 25. 214 information technology and libraries | december 2010 margaret brown-sica, jeffrey beall, and nina mchale next-generation library catalogs and the problem of slow response time and librarians will benefit from knowing what typical and acceptable response times are in online catalogs, and this information will assist in the design and evaluation of library discovery systems. this study also looks at benchmarks in response time and defines what is unacceptable and why. 
when advanced features and content in library catalogs increase response time to the extent that users become disaffected and use the catalog less, nextgen catalogs represent a step backward, not forward. in august 2009, the auraria library launched an instance of the worldcat local product from oclc, dubbed worldcat@auraria. the library’s traditional catalog—named skyline and running on the innovative interfaces platform—still runs concurrently with worldcat@auraria. because worldcat local currently lacks a library circulation module that the library was able to use, the legacy catalog is still required for its circulation functionality. in addition, skyline contains marc records from the serialssolution 360 marc product. since many of these records are not yet available in the oclc worldcat database, these records are being maintained in the legacy catalog to enable access to the library’s extensive collection of online journals. almost immediately upon implementation of worldcat local, many library staff began to express concern about the product’s slow response time. they bemoaned its slowness both at the reference desk and during library instruction sessions. few of the discussions of the product’s slow response time evaluated this weakness in the context of its advanced features. several of the reference and instruction librarians even stated that they refused to use it any longer and that they were not recommending it to students and faculty. indeed, many stated that they would only use the legacy skyline catalog from then on. therefore we decided to analyze the product’s response time in relation to the legacy catalog. we also decided to further our study by examining response time in library catalogs in general, including several different online catalog products from different vendors. ■■ response time the term response time can mean different things in different contexts. here we use it to mean the time it takes for all files that constitute a single webpage (in the case of testing performed, a permalink to a bibliographic record) to travel across the internet from a web server to the computer on which the page is to be displayed. we do not include the time it takes for the browser to render the page, only the time it takes for the files to arrive to the requesting computer. typically, a single webpage is made of multiple files; these are sent via the internet from a web response time as defined for this study is the time that it takes for all files that constitute a single webpage to travel across the internet from a web server to the end user’s browser. in this study, the authors tested response times on queries for identical items in five different library catalogs, one of them a next-generation (nextgen) catalog. the authors also discuss acceptable response time and how it may affect the discovery process. they suggest that librarians and vendors should develop standards for acceptable response time and use it in the product selection and development processes. n ext-generation, or nextgen, library catalogs offer advanced features and functionality that facilitate library research and enable web 2.0 features such as tagging and the ability for end users to create lists and add book reviews. in addition, individual catalog records now typically contain much more data than they did in earlier generations of online catalogs. 
this additional data can include the previously mentioned tags, lists, and reviews, but a bibliographic record may also contain cover images, multiple icons and graphics, tables of contents, holdings data, links to similar items, and much more. this additional data is designed to assist catalog users in the selection, evaluation, and access of library materials. however, all of the additional data and features have the disadvantage of increasing the time it takes for the information to flow across the internet and reach the end user. moreover, the code that handles all this data is much more complex than the coding used in earlier, traditional library catalogs. slow response time has the potential to discourage both library patrons from using the catalog and library staff from using or recommending it. during a reference interview or library instruction session, a slow response time creates an awkward lull in the process, a delay that decreases confidence in the mind of library users, especially novices who are accustomed to the speed of an open internet search. the two-fold purpose of this study is to define the concept of response time as it relates to both traditional and nextgen library catalogs and to measure some typical response times in a selection of library catalogs. libraries margaret brown-sica (margaret.brown-sica@ucdenver.edu) is assistant professor, associate director of technology strategy and learning spaces, jeffrey beall (jeffrey.beall@ucdenver.edu) is assistant professor, metadata librarian, and nina mchale (nina.mchale@ucdenver.edu) is assistant professor, web librarian, university of colorado denver. next-generation library catalogs | brown-sica, beall, and mchale 215 mathews posted an article called “5 next gen library catalogs and 5 students: their initial impressions.”7 here he shares student impressions of several nextgen catalogs. regarding slow response time mathews notes, “lots of comments on slowness. one student said it took more than ten seconds to provide results. some other comments were: ‘that’s unacceptable’ and ‘slow-motion search, typical library.’” nagy and garrison, on lauren’s library blog, emphasized that any “cross-silo federated search” is “as slow as the slower silos.”8 any search interface is as slow as the slowest database from which it pulls information; however, that does not make users more likely to wait for search results. in fact, many users will not even know—or care—what is happening behind the scenes in a nextgen catalog. the assertion that slow response time makes wellintentioned improvements to an interface irrelevant is supported by an article that analyzes the development of semantic web browsers. frachtenberg notes that users, however, have grown to expect web search engines to provide near-instantaneous results, and a slow search engine could be deemed unusable even if it provides highly relevant results. it is therefore imperative for any search engine to meet its users’ interactivity expectations, or risk losing them.9 this is not just a library issue. users expect a fast response to all web queries, and we can learn from studies on general web response time and how it affects the user experience. huang and fong-ling help explain different user standards when using websites. 
their research suggests that “hygiene factors” such as “navigation, information display, ease of learning and response time” are more important to people using “utilitarian” sites to accomplish tasks rather than “hedonistic” sites.10 in other words, response time importance increases when the user is trying to perform a task— such as research—and possibly even more for a task that may be time sensitive—such as trying to complete an assignment for class. ■■ method for testing response time in an assortment of library catalogs, we used the websitepulse service (http://www .websitepulse.com). websitepulse provides in-depth website and server diagnostic services that are intended to save e-business customers time and money by reporting errors and web server and website performance issues to clients. a thirty-day free trial is available for potential customers to review the full array of their services; however, the free web page test, available at http://www.website server and arrive sequentially at the computer where the request was initiated. while the world wide web consortium (w3c) does not set forth any particular guidelines regarding response time, go-to usability expert jakob nielsen states that “0.1 second is about the limit for having the user feel that the system is reacting instantaneously.”1 he further posits that 1.0 second is “about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay.”2 finally, he asserts that: 10 seconds is about the limit for keeping the user’s attention focused on the dialogue. for longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.3 even though this advice dates to 1994, nielsen noted even then that it had “been about the same for many years.”4 ■■ previous studies the chief benefit of studying response time is to establish it as a criterion for evaluating online products that libraries license and purchase, including nextgen online catalogs. establishing response-time benchmarks will aid in the evaluation of these products and will help libraries convey the message to product vendors that fast response time is a valuable product feature. long response times will indicate that a product is deficient and suffers from poor usability. it is important to note, however, that sometimes library technology environments can be at fault in lengthening response time as well; in “playing tag in the dark: diagnosing slowness in library response time,” brown-sica diagnosed delays in response time by testing such variables as vendor and proxy issues, hardware, bandwidth, and network traffic.5 in that case, inadequate server specifications and settings were at fault. while there are many articles on nextgen catalogs, few of them discuss the issue of response time in relation to their success. search slowness has been reported in library literature about nextgen catalogs’ metasearch cousins, federated search products. in a 2006 review of federated search tools metalib and webfeat, chen noted that “a federated search could be dozens of times slower than google.”6 more comments about the negative effects of slow response time in nextgen catalogs can be found in popular library technology blogs. 
on his blog, 216 information technology and libraries | december 2010 ■■ findings: skyline versus worldcat@auraria in figure 2, the bar graph shows a sample load time for the permalink to the bibliographic record for the title hard lessons: the iraq reconstruction experience in skyline, auraria’s traditional catalog load time for the page is pulse.com/corporate/alltools.php, met our needs. to use the webpage test, simply select “web page test” from the dropdown menu, input a url—in the case of the testing done for this study, the permalink for one of three books (see, for example, figure 1)—enter the validation code, and click “test it.” websitepulse returns a bar graph (figure 2) and a table (figure 3) of the file activity from the server sending the composite files to the end user ’s web browser. each line represents one of the files that make up the rendered webpage. they load sequentially, and the bar graph shows both the time it took for each file to load and the order in which the files were received. longer segments of the bar graph provide visual indication of where a slow-loading webpage might encounter sticking points—for example, waiting for a large image file or third-party content to load. accompanying the bar graph is a table describing the file transmissions in more detail, including dns, connection, file redirects (if applicable), first and last bytes, file transmission times, and file sizes. figure 1. permalink screen shot for the record for the title hard lessons in auraria library’s skyline catalog figure 2. websitepulse webpage test bar graph results for skyline (traditional) catalog record figure 3. websitepulse webpage test table results for skyline (traditional) catalog record next-generation library catalogs | brown-sica, beall, and mchale 217 requested at items 8, 14, 15, 17, 26, and 27. the third parties include yahoo! api services, the google api service, recaptcha, and addthis. recaptcha is used to provide security within worldcat local with optical character recognition images (“captchas”), and the addthis api is used to provide bookmarking functionality. at number 22, a connection is made to the auraria library web server to retrieve a logo image hosted on the web server. at number 28, the cover photo for hard lessons is retrieved from an oclc server. the files listed in figure 6 details the complex process of web browsers’ assembly of them. each connection to third-party content, while all relatively short, allows for additional features and functionality, but lengthens overall response. as figure 6 shows, the response time is slightly more than 10 seconds, which, according to nielsen, “is about the limit for keeping the user ’s attention focused on the dialogue.”12 while widgets, third-party content, and other web 2.0 tools add desirable content and functionality to the library’s catalog, they also do slow response time considerably. the total file size for the bibliographic record in worldcat@auraria—compared to skyline’s 84.64 kb—is 633.09 kb. as will be shown in the test results below for the catalog and nextgen catalog products, bells and whistles added to traditional 1.1429 seconds total. the record is composed of a total of fourteen items, including image files (gifs), cascading style sheet (css) files, and javascript (js) files. as the graph is read downward, the longer segments of the bars reveal the sticking points. 
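the sequential, per-file timing that websitepulse reports can be approximated with a short script. the sketch below is not the websitepulse tool and ignores dns breakdowns, caching, and browser parallelism; it simply fetches a permalink, collects the image, stylesheet, and script urls that the page references, times each download in turn, and compares the total against nielsen's 0.1-, 1.0-, and 10-second thresholds quoted earlier. all function and variable names are illustrative only.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# nielsen's thresholds as quoted in the text
NIELSEN_LIMITS = [(0.1, "feels instantaneous"),
                  (1.0, "flow of thought stays uninterrupted"),
                  (10.0, "attention stays focused on the dialogue")]

class AssetCollector(HTMLParser):
    """collect src/href urls of images, scripts, and stylesheets in a page."""
    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.assets.append(attrs["href"])

def timed_get(url):
    """time a single sequential http get; rendering time is deliberately excluded."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    return time.perf_counter() - start, body

def response_time(permalink):
    """sum download times for the html plus every asset it references."""
    total, html = timed_get(permalink)
    collector = AssetCollector()
    collector.feed(html.decode("utf-8", errors="replace"))
    for asset in collector.assets:
        seconds, _ = timed_get(urljoin(permalink, asset))
        total += seconds
    for limit, meaning in NIELSEN_LIMITS:
        if total <= limit:
            return total, meaning
    return total, "likely to lose the user's attention"
```

calling response_time on each permalink once a day would yield the same kind of per-record totals shown in the tables that follow, minus the per-file dns and connection detail visible in figures 3 and 6.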
in the case of skyline, the nine image files, two css files, and one js file loaded quickly; the only cause for concern is the red line at item four. this revealed that we were not taking advantage of the option to add a favicon to our iii catalog. the web librarian provided the ils server technician with the same favicon image used for the library’s website, correcting this issue. the skyline catalog, judging by this data, falls into nielsen’s second range of user expectations regarding response time, which is more than one second, or “about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay.”11 further detail is provided in figure 3; this table lists each of the webpage’s component files, and various times associated with the delivery of each file. the column on the right lists the size in kilobytes of each file. the total size of the combined files is 84.64 kb. in contrast to skyline’s meager 14 files, worldcat local requires 31 items to assemble the webpage (figure 4) for the same bibliographic record. figures 5 and 6 show that this includes 10 css files, 10 javascript files, and 8 images files (gifs and pngs). no item in particular slows down the overall process very much; the longestloading item is number 13, which is a wait for third-party content, a connection to yahoo!’s user interface (yui) api service. additional third-party content is being figure4. permalink screen shot for the record for the title hard lessons in worldcat@auraria figure 5. websitepulse webpage test bar graph results for worldcat@auraria record 218 information technology and libraries | december 2010 total time for each permalinked bibliographic record to load as reported by the websitepulse tests; this number appears near the lower right-hand corner of the tables in figures 3, 6, 9, 12, and 15. we selected three books that were each held by all five of our test sites, verifying that we were searching the same three bibliographic records in each of the online catalogs by looking at the oclc number in the records. each of the catalogs we tested has a permalink feature; this is a stable url that always points to the same record in each catalog. using a permalink approximates conducting a known-item search for that item from a catalog search screen. we saved these links and used them in our searches. the bibliographic records we tested were for these books; the permalinks used for testing follow the books: book 1: hard lessons: the iraq reconstruction experience. washington, d.c.: special inspector general, iraq reconstruction, 2009 (oclc number 302189848). permalinks used: ■■ worldcat@auraria: http://aurarialibrary.worldcat .org/oclc/302189848 ■■ skyline: http://skyline.cudenver.edu/record=b243 3301~s0 ■■ lcoc: http://lccn.loc.gov/2009366172 ■■ ut austin: http://catalog.lib.utexas.edu/record= b7195737~s29 ■■ usc: http://library.usc.edu/uhtbin/cgisirsi/ x/0/0/5?searchdata1=2770895{ckey} book 2: ehrenreich, barbara. nickel and dimed: on (not) getting by in america. 1st ed. new york: metropolitan, 2001 (oclc number 256770509). permalinks used: ■■ worldcat@auraria: http://aurarialibrary.worldcat .org/oclc/45243324 ■■ skyline: http://skyline.cudenver.edu/record=b187 0305~s0 ■■ lcoc: http://lccn.loc.gov/00052514 ■■ ut austin: http://catalog.lib.utexas.edu/record= b5133603~s29 ■■ usc: http://library.usc.edu/uhtbin/cgisirsi/ x/0/0/5?searchdata1=1856407{ckey} book 3: langley, lester d. simón bolívar: venezuelan rebel, american revolutionary. 
lanham: rowman & littlefield publishers, c2009 (oclc number 256770509). permalinks used:
■■ worldcat@auraria: http://aurarialibrary.worldcat.org/oclc/256770509
■■ skyline: http://skyline.cudenver.edu/record=b2426349~s0
■■ lcoc: http://lccn.loc.gov/2008041868
■■ ut austin: http://catalog.lib.utexas.edu/record=b7192968~s29
■■ usc: http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=2755357{ckey}
these added features slowed response time considerably, even doubling it in one case. are they worth it? the response of auraria's reference and instruction staff seems to indicate that they are not.
■■ gathering more data: selecting the books and catalogs to study
to broaden our comparison and to increase our data collection, we also tested three other non-auraria catalogs. we designed our study to incorporate a number of variables. we decided to link to bibliographic records for three different books in the five different online catalogs tested. these included skyline and worldcat@auraria as well as three additional online public access catalog products, for a total of two instances of innovative interfaces products, one voyager catalog, and one sirsidynix catalog. we also selected online catalogs in different parts of the country: worldcat local in ohio; skyline in denver; the library of congress' online catalog (lcoc) in washington, d.c.; the university of texas at austin's (ut austin) online catalog; and the university of southern california's (usc) online catalog, named homer, in los angeles. we also did our testing at different times of the day: one book was tested in the morning, one at midday, and one in the afternoon. websitepulse performs its webpage tests from three different locations in seattle, munich, and brisbane; we selected seattle for all of our tests.
we gathered the data for thirteen days in early november 2009, an active period in the middle of the semester. for each test, we recorded the response time total in seconds. we searched bibliographic records for three books in five library catalogs over thirteen days (3 x 5 x 13), for a total of 195 response time measurements. the websitepulse data is calculated to the ten-thousandth of a second, and we recorded the data exactly as it was presented. the data is displayed in tables 1–3.

table 1. response times for book 1 (response time in seconds)
day       worldcat   skyline   lc       ut austin   usc
1         10.5230    1.3191    2.6366   3.6643      3.1816
2         10.5329    1.2058    1.2588   3.5089      4.0855
3         10.4948    1.2796    2.5506   3.4462      2.8584
4         13.2433    1.4668    1.4071   3.6368      3.2750
5         10.5834    1.3763    3.6363   3.3143      4.6205
6         11.2617    1.2461    2.3836   3.4764      2.9421
7         20.5529    1.2791    3.3990   3.4349      3.2563
8         12.6071    1.3172    3.6494   3.5085      2.7958
9         10.4936    1.1767    2.6883   3.7392      4.0548
10        10.1173    1.5679    1.3661   3.7634      3.1165
11         9.4755    1.1872    1.3535   3.4504      3.3764
12        12.1935    1.3467    4.7499   3.2683      3.4529
13        11.7236    1.2754    1.5569   3.1250      3.1230
average   11.8310    1.3111    2.5105   3.4874      3.3953

table 2. response times for book 2 (response time in seconds)
day       worldcat   skyline   lc       ut austin   usc
1         10.9524    1.4504    2.5669   3.4649      3.2345
2         10.5885    1.2890    2.7130   3.8244      3.7859
3         10.9267    1.3051    0.2168   4.0154      3.6989
4         13.8776    1.3052    1.3149   4.0293      3.3358
5         10.6495    1.3250    4.5732   3.5775      3.2979
6         11.8369    1.3645    1.3605   3.3152      2.9023
7         11.3482    1.2348    2.3685   3.4073      3.5559
8         10.7717    1.2317    1.3196   3.5326      3.3657
9         11.1694    1.0997    1.0433   2.8096      2.6839
10        19.0694    1.6479    2.5779   4.3595      2.6945
11        12.0109    1.1945    2.5344   3.0848     18.5552
12        12.6881    0.7384    1.3863   3.7873      3.9975
13        11.6370    1.1668    1.2573   3.3211      3.6393
average   12.1174    1.2579    1.9410   3.5791      4.5190

table 3. response times for book 3 (response time in seconds)
day       worldcat   skyline   lc       ut austin   usc
1         10.8560    1.3345    1.9055   3.7001      2.6903
2         10.1936    1.2671    1.8801   3.5036      2.7641
3         11.0900    1.5326    1.3983   3.5983      3.0025
4         10.9030    1.4557    2.0432   3.6248      2.9285
5         12.3503    1.5972    3.5474   3.6428      4.5431
6          9.1008    1.1661    1.4440   3.4577      3.1080
7          9.6263    1.1240    2.3688   3.1041      3.3388
8         10.9539    1.1944    1.4941   2.8968      3.4224
9         11.0001    1.2805    1.3255   3.3644      2.7236
10        10.2231    1.3778    1.3131   3.3863      3.4885
11        10.1358    1.2476    2.3199   3.4552      2.9302
12        12.0109    1.1945    2.5344   3.0848     18.5552
13        11.5881    1.2596    2.5245   3.8040      3.8506
average   10.7717    1.3101    2.0076   3.4325      4.4112

table 4. averages (response time in seconds)
book      worldcat   skyline   lc       ut austin   usc
book 1    11.8310    1.3111    2.5105   3.4874      3.3953
book 2    12.1174    1.2579    1.9410   3.5791      4.5190
book 3    10.7717    1.3101    2.0076   3.4325      4.4112
average   11.5734    1.2930    2.1530   3.4997      4.1085

■■ results
the data shows the response times for each of the three books in each of the five online catalogs over the thirteen-day testing period. the raw data was used to calculate averages for each book in each of the five online catalogs, and then we calculated averages for each of the five online catalogs (table 4; see the averaging sketch below). the averages show that during the testing period, the average response time ranged from 1.2930 seconds for the skyline library catalog in denver to 11.5734 seconds for worldcat@auraria, which has its servers in ohio.
university of colorado denver: skyline (innovative interfaces)
as previously mentioned, the traditional catalog at auraria library runs on an innovative interfaces integrated library system (ils). testing revealed a missing favicon image file that the web server tries to send each time (item 4 in figure 3); however, this did not negatively affect the response time. the catalog's response time was good, with an average of 1.2930 seconds, giving it the fastest average time among all the test sites in the testing period. as figure 1 shows, however, skyline is a typical legacy catalog that is designed for a traditional library environment.
library of congress: online catalog (voyager)
the average response time for the lcoc was 2.0076 seconds.
figure 6. websitepulse webpage test table results for worldcat@auraria record
university of colorado denver: worldcat@auraria
worldcat@auraria was routinely over nielsen's ten-second limit, sometimes taking as long as twenty seconds to load all the files to generate a single webpage. as previously discussed, this is due to the high number and variety of files that make up a single bibliographic record. the files sent also include cover images, but they are small and do not add much to the total time. after our tests on worldcat@auraria were conducted, the site removed one of the features on pages for individual resources, namely the "similar items" feature. this feature was one of the most file-intensive on a typical page, and its removal should speed up page loads.
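the per-book and per-catalog averages in table 4 are plain means over the thirteen daily readings. a minimal bookkeeping sketch follows, assuming the raw websitepulse readings were exported to a csv with day, book, catalog, and seconds columns; this file layout is hypothetical, not the authors' actual worksheet.

```python
import csv
from collections import defaultdict
from statistics import mean

def catalog_averages(path="response_times.csv"):
    """average the recorded response times per (book, catalog) and per catalog."""
    by_book_catalog = defaultdict(list)   # (book, catalog) -> [seconds, ...]
    by_catalog = defaultdict(list)        # catalog -> [seconds, ...]
    with open(path, newline="") as f:
        for row in csv.DictReader(f):     # hypothetical columns: day, book, catalog, seconds
            seconds = float(row["seconds"])
            by_book_catalog[(row["book"], row["catalog"])].append(seconds)
            by_catalog[row["catalog"]].append(seconds)
    book_means = {key: round(mean(vals), 4) for key, vals in by_book_catalog.items()}
    overall_means = {cat: round(mean(vals), 4) for cat, vals in by_catalog.items()}
    return book_means, overall_means
```

run over the 195 recorded measurements, overall_means would reproduce the bottom row of table 4, for example roughly 11.57 seconds for worldcat@auraria and 1.29 seconds for skyline.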
however, worldcat@auraria had the highest average response time by far of the five catalogs tested. figure 7. permalink screen shot for the record for the title hard lessons in the library of congress online catalog figure 8. websitepulse webpage test bar graph results for library of congress online catalog record figure 9. websitepulse webpage test table results for library of congress online catalog record next-generation library catalogs | brown-sica, beall, and mchale 221 item 14 is a script, that while hosted on the ils server, queries amazon.com to return cover image art (figures 11–12). the average response time for ut austin’s catalog was 3.4997 seconds. this example demonstrates that response times for traditional (i.e., not nextgen) catalogs can be slowed down by additional content as well. university of southern california: homer (sirsidynix) the average response time for usc’s homer catalog was 4.1085 seconds, making it the second slowest after seconds. this was the second fastest average among the five catalogs tested. while, like skyline, the bibliographic record page is sparsely decorated (figure 7), this pays dividends in response time, as there are only two css files and three gif files to load after the html content loads (figure 9). figure 8 shows that initial connection time is the longest factor in load time; however, it is still short enough to not have a negative effect. total file size is 19.27 kb. as with skyline, the page itself (figure 7) is not particularly end-user friendly to nonlibrarians. university of texas at austin: library catalog (innovative interfaces) ut austin, like auraria library, runs an innovative interfaces ils. the library catalog also includes book cover images, one of the most attractive nextgen features (figure 10), and as shown in figure 12, third-party content is used to add features and functionality (items 16 and 17). ut austin’s catalog uses a google javascript api (item 16 in figure 12) and librarything’s catalog enhancement product, which can add book recommendations, tag browsing, and alternate editions and translations. total content size for the bibliographic record is considerably larger than skyline and the lcoc at 138.84 kb. it appears as though inclusion of cover art nearly doubles the response time; figure 10. permalink screen shot for the record for the title hard lessons in university of texas at austin’s library catalog figure 11. websitepulse webpage test bar graph results for university of texas at austin’s library catalog record figure 12. websitepulse webpage test table results for university of texas at austin’s library catalog record 222 information technology and libraries | december 2010 completed. added functionality and features in library search tools are valuable, but there is a tipping point when these features slow down a product’s response time to where users find the search tool too slow or unreliable. based on the findings of this study, we recommend that libraries adopt web response time standards, such as those set forth by nielsen, for evaluating vendor search products and creating in-house search products. commercial tools like websitepulse make this type of data collection simple and easy. testing should be conducted for an extended period of time, preferably during a peak period—i.e., during a busy time of the semester for academic libraries. we further recommend that reviews of electronic resources add response time as an worldcat@auraria, and the slowest among the traditional catalogs. 
this sirsidynix catalog appears to take a longer time than the other brands of catalogs to make the initial connection to the ils; this accounts for much of the slowness (see figures 14 and 15). once the initial connection is made, however, the remaining content loads very quickly, with one exception: item 13 (see figure 15), which is a connection to the third-party provider syndetic solutions, which provides cover art, a summary, an author biography, and a table of contents. while the display of this content is attractive and well-integrated to the catalog (figure 13), it adds 1.2 seconds to the total response time. also, as shown in item 14 and 15, usc’s homer uses the addthis service to add bookmarking enhancements to the catalog. total combined file size is 148.47 kb, with the bulk of the file size (80 kb) coming from the initial connection (item 1 in figure 15). ■■ conclusion an eye-catching interface and valuable content are lost on the end user if he or she moves on before a search is figure 13. permalink screen shot for the record for the title hard lessons in homer, the university of southern california’s catalog figure 14. websitepulse webpage test bar graph results for homer, the university of southern california’s catalog figure 15. websitepulse webpage test table results for homer, the university of southern california’s catalog next-generation library catalogs | brown-sica, beall, and mchale 223 4. ibid. 5. margaret brown-sica. “playing tag in the dark: diagnosing slowness in library response time,” information technology & libraries 27, no. 4 (2008): 29–32. 6. xiaotian chen, “metalib, webfeat, and google: the strengths and weaknesses of federated search engines compared with google,” online information review 30, no. 4 (2006): 422. 7. brian mathews, “5 next gen library catalogs and 5 students: their initial impressions,” online posting, may 1, 2009, the ubiquitous librarian blog, http://theubiquitouslibrarian .typepad.com/the_ubiquitous_librarian/2009/05/5-next-genlibrary-catalogs-and-5-students-their-initial-impressions.html (accessed feb. 5, 2010) 8. andrew nagy and scott garrison, “next-gen catalogs are only part of the solution,” online posting. oct. 4, 2009, lauren’s library blog, http://laurenpressley.com/library/2009/10/next -gen-catalogs-are-only-part-of-the-solution/ (accessed feb. 5, 2010). 9. eitan frachtenberg, “reducing query latencies in web search using fine-grained parallelism,” world wide web 12, no. 4 (2009): 441–60. 10. travis k huang and fu fong-ling, “understanding user interface needs of e-commerce web sites,” behaviour & information technology 28, no. 5 (2009): 461–69, http://www .informaworld.com/10.1080/01449290903121378 (accessed feb. 5, 2010). 11. nielsen, usability engineering, 135. 12. ibid. evaluation criterion. additional research about response time as defined in this study might look at other search tools, to include article databases, and especially other metasearch products that collect and aggregate search results from several remote sources. further studies with more of a technological focus could include discussions of optimizing data delivery methods—again, in the case of metasearch tools from multiple remote sources—to reduce response time. finally, product designers should pay close attention to response time when designing information retrieval products that libraries purchase. ■■ acknowledgments the authors wish to thank shelley wendt, library data analyst, for her assistance in preparing the test data. references 1. 
jakob nielsen, usability engineering (san francisco: morgan kaufmann, 1994): 135. 2. ibid. 3. ibid. 116 journal of library automation vol. 14/2 june 1981 tions only. they do not list the individual works that may be contained in publications. if an analytic catalog were to be built into a computerized system at some time in the future , the structure code would be a great help in the redesign, because it makes it easy to spot items that need analytics, namely those that contain embedded works, or codes 2, 4, 5, 6, 8, 9, 10, 11, and 13. a searcher working with such an analytic catalog could use the code to limit output to manageable stages-first all items of type c, for example; then broadening the search to include those of type d; and so forth, until enough relevant material has been found. the structure code would also be useful in the displayed output. if codes 5 or 8 appeared together with a bibliographic description on the screen, this would tell the catalog user that the item retrieved is a set of many separately titled documents. a complete list of those titles can then be displayed to help the searcher decide which of the documents are relevant for him. in the card catalog this is done by means of contents notes . not all libraries go to the trouble of making contents notes, though, and not all contents notes are complete and rtliable . the structure code would ensure consistency and completeness of contents information at all times. codes 10 and 13 in a search output, analogously, would tell the user that the item is a serial with individual issue titles. there is no mechanism in the contemporary card catalog to inform readers of those titles. codes 4 and 7 would tell that the document is part of a finite set, and so forth. it has been the general experience of database designers that a record cannot have too many searchable elements built into its format. no sooner is one approach abandoned "because nobody needs it," than someone arrives on the scene with just that requirement. it can be anticipated, then, that once the structure code is part of the standard record format, catalog users will find many other ways to work the code into search strategies. it can also be anticipated that the proposed structure code, by adding a factor of selectivity, will help catalogers because it strengthens the authority-control aspect of machine-readable catalog files. if two publications bear identical titles, for example, and one is of structure 1, the other of structure 6, then it is clear that they cannot possibly be the same items. however, if they are of structures 1 and 7, respectively, extra care must be taken in cataloging, for they could be different versions of the same work. determination of the structure of an item is a by-product of cataloging, for no librarian can catalog a book unless he understands what the structure of that book is-one or more works, one or more documents per item, open or closed set, and so forth . it would therefore be very cheap at cataloging time to document the already-performed structure analysis and express this structure in the form of a code. references l. herbert h. hoffman, descriptive cataloging in a new light: polemical chapters for librarians (newport beach, calif.: headway publications, 1976), p.43. revisions to contributed cataloging in a cooperative cataloging database judith hudson: university libraries , state university of new york at albany. introduction oclc is the largest bibliographic utility in the united states. 
one of its greatest assets is its computerized database of standardized cataloging information . the database, which is built on the principle of shared cataloging, consists of cataloging records input from library of congress marc tapes and records contributed by member libraries. oclc standards ln. order to provide records contributed by member libraries that are as usable as those input from marc tapes, it is imperative that the records meet the standards set by oclc and that the cataloging and formatting of the records be free of errors. member libraries are requested to follow the nationally accepted cataloging code (anglo-american cataloging rules, north american text, 1 • 2 for records input before december 12, 1980, and angloamerican cataloguing rules, second edition, 3 for records input later), the library of congress' application of the cataloging code, and the various marc formats in preparing records to be input. 4 • 5 the cataloging rules dictate what kind of bibliographic information should be included in the cataloging records, a prescribed system of punctuation that identifies the various fields of the cataloging record (international standard bibliographic description, isbd), which access points should be provided, and what form the entries should take. the marc formats provide a standardized method of identifying the various fields and subfields in a cataloging record and, through the use of indicators, information necessary to make the record easily manipulated by computers. in addition, fixed fields provide coded information about the cataloging records. the form of main, added, and series entries can be verified in the national union catalog to ensure that member libraries are following the library of congress' application of the cataloging code . by the same token, subject entries can be verified in the appropriate subject heading list (e.g., library of congress subject headings, sears subject headings, etc.). a study of oclc member cataloging a major problem with the use of contributed cataloging is the amount of revision needed to bring the records up to the standards described above. in 1975, a study of the quality of a group of membercontributed catalog records was conducted by c. c . ryans. 6 the first 700 monographic records input into oclc after september 1, 1975, to which kent state university attached its holdings were examined. 7 the analysis included changes in or additions to main, added, or series communications 117 entries, changes in descriptive cataloging, and changes in or additions to subject headings . the study dealt only with the revision of cataloging; revision of the formatting of records was not noted. the kent state study found that 393 revisions were necessary to 283 records. the remaining 417 records were considered to be acceptable, i.e., they adhered to aacr and isbd rules and to the oclc standards for input cataloging. recent developments relating to quality control since these records were studied, the internetwork quality control council was formed in 1977 by the oclc board of trustees. 8 its primary purpose is to identify problem areas regarding quality control and distribute information to networks concerning problems and solutions. its role is to promote quality control through education and by monitoring the implementation of standards. in addition, oclc' s documentation has steadily improved. 
the recent publication of the books format9 and the recent revision of the cataloging manual10 provide clear and specific information on oclc' s formatting requirements. with these developments in mind, it would seem likely that the quality of the contributed cataloging has improved since 1975. in order to test this assumption, a number of cataloging records were analyzed in an effort to replicate the kent state study. the analysis of these records differed from the earlier study in that differences in the treatment of series were not noted because one library's treatment of series can reasonably be expected to differ from that of another . methodology the records included in this study consist of 1,017 monographic catalog records to which the state university of new york at albany (sunya) library added its holding symbol during an eight-month period from november 1979 to july 1980. the records included only those that were entered into the oclc database after 1976. cataloging revisions that were noted 118 journal of library automation vol. 14/2 jun e 1981 consisted of changes in main and added entries to make them consistent with library of congress form of entry, and the inclusion of other added entries that were deemed necessary to provide adequate access to the material. in addition, corrections or additions to the imprint and the collation· were noted, as were typograph_ ical errors in all fields . subject headings that were changed to make them consistent with library of congress subject headings and subject headings and/or subdivisions added to provide better subject access to the material were also noted . analysis of cataloging cataloging revisions were required for 43 percent of the 1,017 records examined (596 changes or additions were made to 437 records). changes or additions to subject headings were made to 22.4 percent of all the records in the sunya sample, and represented the most common revision . changes in descriptive cataloging were made to 20 percent of the records, and changes or additions to main or added entries were made to approximately 16 percent of the records. table 1 compares the results of this analysis with the findings of the e arlier study . it should be emphasized that the two studies are not exactly comparable because the kent state study included differences in the treatment of series, while this study noted only typographical errors in series statements. the findings of this analysis do not bear out the hypothesis that the quality of member-contributed cataloging has improved since 1975. the overall percentage of records requiring cataloging revision is similar in both the kent state and the sunya samples . the percentage of changes made in the various areas of the cataloging records was similar, with the exception of added entries and subject headings . in the sunya sample , more revisions and additions were made to these two areas. this difference between the two samples may reflect variation in the cataloging policies of the two libraries rather than the presence or absence of more errors in member-contributed catalog records . analysis of oclc reportable errors and additions in the fall of 1979, oclc distributed its revised cataloging manual, which includes a chapter dealing with quality control. 11 the chapter delineates the errors and changes that are to be reported to oclc for correction or addition . the cataloging records examined in this study were also analyzed with these criteria in mind. 
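the reportable-error figures below are straightforward tallies, but the two percentage columns in table 2 use different denominators: one divides each category's count by the 1,017 records examined, the other by the total number of errors found. a small sketch of that arithmetic follows, with a hypothetical list of per-record error-category labels standing in for the review worksheets; the names here are illustrative only.

```python
from collections import Counter

def tally_reportable_errors(record_errors):
    """record_errors: one list of error-category labels per record examined (hypothetical input)."""
    total_records = len(record_errors)
    records_with_errors = sum(1 for errs in record_errors if errs)
    category_counts = Counter(err for errs in record_errors for err in errs)
    total_errors = sum(category_counts.values())
    report = {}
    for category, count in category_counts.items():
        report[category] = {
            "number": count,
            "pct_of_records": round(100 * count / total_records, 1),  # denominator: all records examined
            "pct_of_errors": round(100 * count / total_errors, 1),    # denominator: all errors found
        }
    return records_with_errors, total_errors, report
```

applied to the sunya worksheets, records_with_errors and total_errors would correspond to the 486 records and 661 errors reported in the analysis that follows.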
this analysis (table 2) revealed that 661 reportable errors or changes were found on 486 records (47.8 percent of all the records). reportable errors or changes included formatting errors or omissions such as incorrect assignment of tags, incorrect or missing indicators, subfield codes or fixed fields, and errors affecting retrieval or card printing. other types of errors included incorrect or omitted access points (added or subject entries, isbn, lc card numbers, etc.), errors in transcription of data, incorrect isbd, and the omission of needed bibliographic information.

table 1. comparison of two studies of cataloging revision
area needing revision or addition    kent state sample* (number / percentage)    sunya sample (number / percentage)
main entry                           44 / 6.2       46 / 4.5
title statement                      28 / 4.0       76 / 7.5
edition statement                     4 / 0.6        2 / 0.2
imprint                              29 / 4.4       64 / 6.3
collation                           111 / 15.9      58 / 5.7
series                               55 / 7.9        3 / 0.3
subject heading                      88 / 12.6     228 / 22.4
added entries                        44 / 6.2      119 / 11.7
total records in study              700 / 100.0   1017 / 100.0
records requiring revision          283 / 40.4     437 / 43.0
number of revisions made            393            596
*source: constance c. ryans, "a study of errors found in non-marc cataloging in a machine-assisted system," journal of library automation 11:128 (june 1978).

approximately 40 percent (408) of the records contained formatting errors, with over 29 percent (300) of the records containing incomplete or incorrect fixed fields. the apparent unconcern with fixed fields may stem from a lack of understanding of the value of correct fixed-field information. the recent addition of date and type of material as qualifiers in a search of the database is one example of the use of fixed fields. in order to underscore their importance, it might be useful for oclc to highlight this use of fixed fields and further explain to its members how other fixed fields might be used in online search strategies in the future. errors in or omission of access points were found in 222 records (21.8 percent). these errors were also noted in the study of cataloging revisions discussed above, as were errors in transcription of data, in isbd, and in omission of necessary bibliographic information.

table 2. errors and additions reportable to oclc
error or addition                               number   percentage of total records   percentage of total errors and additions
errors in transcription of data                   19       1.9       2.9
incorrect assignment of tags                       6       0.6       0.9
incorrect or missing subfield codes               13       1.3       2.0
incorrect assignment of 1st indicator             17       1.7       2.6
incorrect assignment of 2d indicator              59       5.8       8.9
incorrect fixed fields                           313      30.8      47.4
incorrect isbd                                     8       0.8       1.2
incorrect form of entry (less than lc)            87       8.6      13.2
errors affecting retrieval or card printing        3       0.3       0.5
bibliographic information missing                  1       0.1       0.2
addition of access points                        135      13.3      20.4
total number of records containing reportable errors or additions   486   47.8
total number of reportable errors or additions                       661            100.0

summary of findings
although the quality of the sunya sample seems equivalent to that of the kent state sample, an analysis by date of input of the records examined indicates a slight decrease in the percentage of records needing correction for those records input in 1979 and 1980 (table 3). perhaps this is the beginning of a trend toward more careful cataloging and formatting of records input by members. in summary, 589 of the 1,017 member-contributed records studied were found to require revision. of these, 486 records contained errors or omissions that may be reported to oclc, and 437 required cataloging revision.
it is discouraging to realize that approximately 60 percent of the member records used required revision. such a high percentage of records needing revision necessitates the review of all member records used if a library wishes to adhere to oclc standards for cataloging. this leads to tremendous duplication of effort and negates, in part, the purpose of shared cataloging.

table 3. yearly breakdown of catalog records
year of input   total number of records   records needing correction   percentage needing correction
1977            186                       115                          61.8
1978            332                       202                          60.8
1979            339                       184                          54.3
1980            160                        88                          55.0

influences for change
the implementation of aacr2 in 1981 provides the impetus for greater adherence to standards. since all catalogers have had to learn the new cataloging requirements, greater care may be used in the formulation of records by member libraries. the publication of clear and specific guidelines for reportable errors may help to alleviate the situation in two ways. first, the careful articulation of errors or desirable additions may impel member libraries to place more emphasis on the quality control of input. second, member libraries may report more errors, thus allowing oclc to correct the master records. a change in the method of correcting errors and the rate at which they are corrected might be beneficial. presently, errors on the master records can only be corrected by oclc or by the inputting library if it is the only library that has used the record. such an arrangement is clumsy and time-consuming. if other member libraries were trained and authorized to correct errors on master records, errors might be corrected as often as they are detected. in the long run, however, the responsibility for inputting catalog records that meet the standards for cataloging and formatting rests with the member libraries. oclc and the networks must develop methods of encouraging libraries to input records that are correctly formatted and cataloged. one way of alleviating the problem might be to develop training programs conducted by oclc or by network staff that are aimed at those libraries identified as having high error rates. another approach might be to give public recognition to libraries that contribute cataloging of high quality to the database. one example of this approach is the pittsburgh regional library council's fred award, which annually honors the library with the lowest error rate in the prlc network.12 through the use of peer pressure, the member libraries and networks of oclc can encourage adherence to the standards. in addition, they must continue to insist that oclc address this annoying, expensive, and seemingly perennial problem.

references
1. anglo-american cataloging rules, north american text (chicago: american library assn., 1967), 409p.
2. anglo-american cataloging rules, chapter 6 (rev. ed.; chicago: american library assn., 1974), 122p.
3. anglo-american cataloguing rules, second edition (chicago: american library assn., 1978), 620p.
4. oclc, inc., cataloging: user manual (columbus: oclc, 1979), 1v. (looseleaf).
5. oclc level i and level k input standards (columbus: ohio college library center, 1977), 1v. (looseleaf).
6. constance c. ryans, "a study of errors found in non-marc cataloging in a machine-assisted system," journal of library automation 11:125–32 (june 1978).
7. ibid., p. 127.
8. frederick g.
kilgour, "establishment of inter-network quality control council" (unpublished document, ohio college library center, 1977), 2p. 9. oclc, inc., books format (columbus: oclc, 1980), 1v. (looseleaf). 10. oclc, inc., cataloging: user manual, 1v. (looseleaf) . 11. ibid. 12. "prlc peer council cites pittsburgh theological seminary library for high cataloging standards," oclc newsletter 131:4 (sept. 1980). participatory networks | lankes, silverstein, and nicholson 17 author id box for 2 column layout column title editor the goal of the technology brief is to familiarize library decision-makers with the opportunities and challenges of participatory networks. in order to accomplish this goal the brief is divided into four sections (excluding an overview and a detailed statement of goal): ■ a conceptual framework for understanding and evaluating participatory networks; ■ a discussion of key concepts and technologies in participatory networks drawn primarily from web 2.0 and library 2.0; ■ a merging of the conceptual framework with the technological discussion to present a roadmap for library systems development; and ■ a set of recommendations to foster greater discussion and action on the topic of participatory networks and, more broadly, participatory librarianship. this summary will highlight the discussions in each of these four topics. for consistency, the section numbers and titles from the full brief are used. k nowledge is created through conversation. libraries are in the knowledge business. therefore, libraries are in the conversation business. some of those conversations span millennia, while others only span a few seconds. some of these conversations happen in real time. in some conversations, there is a broadcast of ideas from one author to multiple audiences. some conversa­ tions are sparked by a book, a video, or a web page. some of these conversations are as trivial as directing someone to the bathroom. other conversations center on the foun­ dations of ourselves and our humanity. it may be odd to start a technology brief with such seemingly abstract comments. yet, without this firm, if theoretical, footing, the advent of web 2.0, social net­ working, library 2.0, and participatory networks seems a clutter of new terminology, tools, and acronyms. in fact, as will be discussed, without this conceptual footing, many library functions can seem disconnected, and the field that serves lawyers, doctors, single mothers, and eight­year olds (among others) fragmented. the scale of this technology brief is limited; it is to present library decision­makers with the opportunities and challenges of participatory networks. it is only a single piece of a much larger puzzle that seeks to pres­ ent a cohesive framework for libraries. this framework not only will fit tools such as blogs and wikis into their offerings (where appropriate), but also will show how a more participatory, conversational approach to libraries in general can help libraries better integrate current and future functions. think of this document as an overview or introduction to participatory librarianship. readers will find plenty of examples and definitions of web 2.0 and social networking later in this article. however, to jump right into the technology without a larger frame­ work invites the rightful skepticism of a library organiza­ tion that feels constantly buffeted by new technological advances. 
in any environment with no larger conceptual founding, to measure the importance of an advance in technology or practice selection of any one technology or practice is nearly arbitrary. without a framework, the field becomes open to the influence of personalities and trendy technology. therefore, it is vital to ground any technological, social, or policy conversation into a larger, rooted concept. as susser said, “to practice without theory is to sail an uncharted sea; theory without practice is not to set sail at all.”1 for this paper, the chart will be conversation theory. the core of this article is in four sections: ■ a conceptual framework for understanding and eval­ uating participatory networks; ■ a discussion of key concepts and technologies in par­ ticipatory networks drawn primarily from web 2.0 and library 2.0; ■ a merging of the conceptual framework with the technological discussion to present a sort of roadmap for library systems development; and ■ a set of recommendations to foster greater discussion and action on the topic of participatory networks and, more broadly, participatory librarianship. it is recommended that the reader follow this order to get the big picture; however, the second section should be a useful primer on the language and concepts of partici­ patory networks. ■ library as a facilitator of conversation let us return to the concept that knowledge is created through conversation. this notion stretches back to socrates and the socratic method. however, the specific foundation for this statement comes from conversation theory, a means of explaining cognition and how people learn.2 it is not the purpose of this article to provide a r. david lankes (jdlankes@iis.syr.edu) is director and associate professor, joanne silverstein (jlsilver@iis.syr.edu) is research professor, and scott nicholson (scott@scottnicholson.com) is associate professor at the information institute of syracuse, (n.y.) syracuse university’s school of information studies. participatory networks: the library as conversation r. david lankes, joanne silverstein, and scott nicholson 18 information technology and libraries | december 200718 information technology and libraries | december 2007 detailed description of conversation theory, a task already admirably accomplished by pask. rather, let us use the theory as a structure upon which to hang an exploration of participatory networking and, more broadly, participa­ tory librarianship. the core of conversation theory is simple: people learn through conversation. different communities have different standards for conversations, from the scientific community’s rigorous formalisms, to the religious com­ munity’s embedded meaning in scripture, to the some­ times impenetrable dialect of teens. the point remains, however, that different actors establish meaning through determining common definitions and building upon shared concepts. the library has been a place where we facilitate con­ versations, though often implicitly. the concept of learn­ ing through conversation is evidenced in libraries in such large initiatives as information literacy and teaching criti­ cal thinking skills (using such meta­cognitive approaches as self­questioning), and in the smaller events of book groups, reference interviews, and speaker series. library activities such as building collections of artifacts (the tan­ gible products of conversation) inform scholars’ research through a formal conversation process where ideas are supported with evidence and methods. 
similarly, pres­ ervation efforts, perhaps of wax cylinders with spoken word content or of ancient maps that embody an ongo­ ing dialogue about the shape and nature of the physical world, seek to save, or at least document, important conversations. common use of the word “conversation” is com­ pletely in accordance with the use of the term in conver­ sation theory. the term is, however, more specifically defined as an act of communication and agreement between a set of agents. so, a conversation can be between two people, two organizations, two countries, or even within an individual. how can a conversation take place within an individual? educators and school librarians may be familiar with the term “metacogni­ tion,” or the act of reflecting on one’s learning.3 yet, even the most casual reader will be familiar with the concept of debating oneself (“if i go right, i’ll get there faster, but if i go left i can stop by jim’s . . .”). the point is that a conversation is with at least two agents trying to come to an understanding. also note that those two agents can change over time. so, while socrates and plato are dead, the conversation they started about the nature of knowl­ edge and the world is carried forward by new genera­ tions of thinkers—same conversation, different agents. people converse, organizations converse, states con­ verse, societies converse. the requirements, in the terms of conversation theory, are two cognitive systems seek­ ing agreement. the results of these conversations, what pask would call “cognitive entanglements,” are books, videos, and artifacts that either document, expand, or result from conversations.4 so, while one cannot con­ verse with a book, that book certainly can be a starting point for many conversations within the reader and within a larger community. if the theory is that conversation creates knowledge, the library community has added a corollary: the best knowl­ edge comes from an optimal information environment, one in which the most diverse and complete information is available to the conversant(s). library ethics show an implicit understanding of this corollary in the advocacy of intellectual freedom and unfettered access. libraries seek to create rich environments for knowledge and have taken the stance that they are not in the job of arbitrating the conversations that occur or the appropriateness of the information used to inform those conversations. as will be discussed later, this belief in openness of conversations will have some far­reaching implications for the library collec­ tion and is an ideal that can never truly be met. for now, the reader may take away that conversation theory is very much in line with current and past library practice, and it also shows a clear trajectory for the future. this viewpoint’s value is not just theoretical; it has real consequences and uses. for example, much of library evaluation has been based on numeric counts of tangible outputs: books circulated, collection size, reference transactions, and so on. yet this quantitative approach has been frustrating to many who feel they are count­ ing outcomes but not getting at true impact of library service. librarians may ask themselves, “which num­ bers are important . . . and why?” if libraries focused on conversations, there might be some clarity and cohesion between statistics and other outcomes. suddenly, the number of reference questions can be linked to items cat­ aloged or to circulation numbers . . . 
they are all markers of the scope and scale of conversations within the library context. this approach might enable the library com­ munity to better identify important conversations and demonstrate direct contributions to these conversations across functions. for example, a school district identifies early literacy as important. there is a discussion about public policy options, new programs, and school goals to achieve greater literacy in k–5. the library should be able to track two streams in this conversation. the first is the one libraries are accustomed to counting; that is, the library’s contribution to k–5 literacy (participation in book talks, children’s events, circulation of children’s books, reference questions, and so on). but the library also can document and demonstrate how it furthered the conversation about children’s literacy in general. it could show the resources provided to community offi­ cials. it could show the literacy pathfinders that were created. the point of this example is that the library is both participant in the conversation (what we do to pro­ mote early literacy) and facilitator of conversation (what we do to promote public discourse). article title | author 19participatory networks | lankes, silverstein, and nicholson 19 the theoretical discussion leads us to a discussion about the second topic of this technology brief: pragmatic aspects of the knowledge as conversation approach, or a participatory approach, as it will be called. as new technologies are developed and deployed in the current environment of limited resources, there must be some means of evaluating their utility. a technology’s util­ ity is appropriately measured against a given library’s mission, which is, in turn, developed to respond to the needs of the community that library serves. first, how­ ever, let us identify some of the new technologies and describe them briefly. ■ participatory networking, social networks, and web 2.0 let us now move from the theoretical to the opera­ tional. the impetus behind this article is the relatively recent emergence of a new group of internet services and capabilities. suddenly, terms such as wiki, blog, mashup, web 2.0, and biblioblogosphere have become commonplace. as with any new wave of technological creation, these terms can seem ambiguous. they also come wrapped in varying amounts of hype. they may all, however, be grouped under the phenomenon of par­ ticipatory networking. while we now have a conceptual framework to evaluate these technologies that support participatory networking (for example, do they further conversa­ tions), we still need to know the basics of the terminol­ ogy and technologies. this section outlines key concepts in the pragmatics of participatory networking. the section after this one will join the theoretical and operational to outline key chal­ lenges and opportunities for the library world. we begin with web 2.0. web 2.0 much of what we call participatory networking, at least the technological foundation of it, stems from developments in web 2.0.5 as with many buzzwords, the exact definition of web 2.0 is not clear. it is more an aggregation of concepts that range from software development (loosely coupled application programming interfaces [apis] and the ease of incorporating features across platforms) to abstrac­ tions (the user is the content). what pervades the web 2.0 approach is the notion that internet services are increas­ ingly facilitators of conversations. 
the following sections describe some of the characteristics of web 2.0. web 2.0 characteristic: social networks a core concept of web 2.0 is that people are the content of sites; that is, a site is not populated with information for users to consume. instead, services are provided to individual users for them to build networks of friends and other groups (professional, recreational, and so on). the content of a site, then, comprises user­provided infor­ mation that attracts new members of an ever­expanding network. examples include: ■ flickr. flickr (www.flickr.com) provides users with free web space to upload images and create photo albums. users then can share these photos with friends or with the public at large. flickr facilitates the creation of shared photo galleries around themes and places. ■ the cheshire public library. the teen book blog (http://cpltbb.wordpress.com) at the cheshire public library offers book reviews created only by the stu­ dents who use the library. ■ memorial hall library. the memorial hall library in andover, massachusetts, offers podcasts of poetry contests in which the content is created by students (www.mhl.org/teens/audio/index.htm). ■ libraries in myspace. myspace searches show that there are myspace sites for hundreds of individual libraries and scores of library groups. alexandrian public library (apl), for example, has established a site at myspace (www.myspace.com/teensatapl). this practice is growing among public libraries and is an attempt to reach out to users in their preferred online environments. in this venue, the more friends a library’s myspace site has, the more successful it may be considered. as of this writing, apl had sev­ enty­five friends and fifteen comments. the brooklyn college library had 2,195 friends and 270 comments. web 2.0 characteristic: wisdom of crowds there has been some research into the quality of mass decision­making.6 that research shows how remarkably accurate groups are in their judgments. web 2.0 pools large groups of users to comment on decisions. this aggregation of input is facilitated by the ready availabil­ ity of social networking sites. certainly, this approach of community organization and verification of knowledge also has its detractors. many, for example, question the wisdom seen in some entries of wikipedia. yet, recent articles have compared this mass editing process favor­ ably to traditional sources of information, such as the encyclopedia britannica.7 examples include: ■ ebay. ebay has perhaps the most studied and copied community policing and reputation systems. all buyers and sellers can be rated. the aggregation of many users’ experiences create a feedback score that is equivalent to a group credibility rating (see figure 1). these kinds of group feedback systems can now be seen in most major internet retailers. ■ librarything. librarything.com makes book recom­ 20 information technology and libraries | december 200720 information technology and libraries | december 2007 mendations based on the collective intelligence of all users of the site. the greater the pool of collective intelligence, the more information available to the user for decision­making. ■ the diary project. the diary project library (www. diaryproject.com) is a non­profit organization that encourages teens to write about their day­to­day experiences growing up. 
the goal of this site is to encourage communication among teens of all cul­ tures and backgrounds, provide peer­to­peer support, stimulate discussion, and generate feedback that can help ease some of the concerns teens encounter along the way and let them know that they are not alone. to that end, the site comprises thousands of entries in twenty­four categories. because of the great number of entries, most youth can find helpful materials. web 2.0 characteristic: loosely coupled apis an api provides a set of instructions (messages) that a programmer can use to communicate between applica­ tions. apis allow programmers to incorporate one piece of software they may not be able to directly manipulate (code) into another. for example, google maps has made a public api that allows web page designers to include satellite images into their web pages with little more than a latitude and longitude.8 apis vary in their ease of integration. loosely coupled apis allow for very easy integration using high­level scripting languages such as javascript9. examples include: ■ google maps. google maps displays street or sat­ ellite maps showing markers on specific locations provided by an external source with simple sets of longitudes and latitudes. it becomes extremely easy to create geographic information systems with little knowledge of gis principles. ■ flickr. flickr provides easy means to integrate hosted images into other web pages or applications (as with a google map that shows images taken at a specific location). ■ youtube. youtube (www.youtube.com) provides users with the capability to upload and comment upon video on the internet. it also allows for easy integration of the videos into other web pages and blogs. with a simple line of html code, anyone can access streaming video for their content. web 2.0 characteristic: mashups mashups are combinations of apis and data that result in new information resources and services.10 this ease of incorporation has led to an assumption of a “right to remix.” in the world of open source software and the creative commons, the right to remix refers to a grow­ ing expectation among internet users that they are not limited by the interfaces and uses presented to them by a single organization. examples include: ■ chicagocrime.org. an often­cited example of a mashup is chicagocrime.org, which uses google maps to plot crime data for the city of chicago. users can now see exactly which street corner had the most murders. figure 2 shows a marker at the location of every homicide in chicago from november 2, 2005, to august 2, 2006. ■ book burro. book burro (http://bookburro.org/ about.html) “is a web 2.0 extension for firefox and flock. when it senses you are looking at a page that contains a book, it will overlay a small panel which when opened lists prices at online bookstores such as amazon, buy, half (and many more) and whether the book is available at your library.” ■ library lookup. the mit library lookup greasemonkey script for firefox (http://libraries. mit.edu/help/lookup.html) searches mit’s barton catalog from an amazon book screen. web 2.0 characteristic: permanent betas the concept of a permanent beta is, in part, a realization that no software is ever truly complete so long as the user community is still commenting upon it. for example, google does not release services from beta until it has achieved a sufficient user base, no matter how fixed the underlying source code is.11 permanent beta also is a design strategy. 
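to make the loosely coupled api and mashup ideas above concrete, here is a minimal sketch of the glue code involved: one service supplies data (a hypothetical city crime feed), another supplies presentation (a hypothetical MapWidget standing in for any mapping provider's embeddable component), and a few lines combine them. the feed url, MapWidget, and addMarker are illustrative placeholders, not a real vendor api; the point is the pattern, not the product.

```typescript
// a minimal mashup: plot externally hosted incident data on a mapping provider's widget.
// MapWidget and the feed url are hypothetical stand-ins for any loosely coupled api.

interface Incident {
  description: string;
  latitude: number;
  longitude: number;
}

// stand-in for an embeddable map component; a real mashup would use the provider's own api here
class MapWidget {
  constructor(private containerId: string) {}
  addMarker(lat: number, lng: number, label: string): void {
    console.log(`[${this.containerId}] marker at ${lat}, ${lng}: ${label}`);
  }
}

async function buildCrimeMap(feedUrl: string): Promise<void> {
  // fetch raw data from one service...
  const response = await fetch(feedUrl);
  const incidents: Incident[] = await response.json();

  // ...and hand each record to another service's display component
  const map = new MapWidget("map-container");
  for (const incident of incidents) {
    map.addMarker(incident.latitude, incident.longitude, incident.description);
  }
}

buildCrimeMap("https://example.org/city-crime-feed.json").catch(console.error);
```

that so little glue code is required is exactly why the "right to remix" expectation has taken hold.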
as a design strategy, permanent beta means that large applications are broken into smaller constituent parts that can be manipulated separately. this allows large applications to be continually developed by a more diverse and distributed community (as in open source).

figure 1. a seller's profile shows a potential buyer the ebay community's current estimation of a seller's credibility.

examples include:

■ google labs. google has a site named "google labs" (http://labs.google.com) that puts out company-generated tools and services. in fact, part of a google employee's work time is dedicated to creating the resources and tools through personal projects and exploration. these tools and services remain a part of the "lab" until they are finished and have sufficient user bases. projects (see figure 3) range from the simple (google suggest, which provides a dropdown box of possible search queries as you begin to type your search terms) to the extensive (google maps, which started as a google lab project).

■ mit libraries. the mit libraries are experimenting with new technologies to help make access to information easier. the tools below are offered to the public with an appeal for feedback and for additional tools, and there is a permanent address designed just to collect feedback on the beta-phase tools, which include:

■ the new humanities virtual browsery, which highlights new books and incorporates an rss feed, the ability to comment on books, links to book reviews, availability information, and links to other books by the same author.

■ the libx—mit edition (http://libraries.mit.edu/help/libx.html), which is a firefox toolbar that allows users to search the barton catalog, vera, google scholar, the sfx fulltext finder, and other search tools; it embeds links to mit-only resources in amazon, barnes & noble, google scholar, and nyt book reviews.

■ the dewey research advisor business and economics q&a (http://libraries.mit.edu/help/dra.html), which provides starting points for specific research questions in the fields of business, management, and economics.

web 2.0 characteristic: software gets better the more people use it

an increasing number of web 2.0 sites emphasize social networks, where these services gain value only as they gain users. malcolm gladwell recounts this principle and the work of kevin kelly with an earlier telecommunications network, the network of fax machines connected to the phone system:

the first fax machine ever made . . . cost about $2,000 at retail. but it was worth nothing because there was no other fax machine for it to communicate with. the second fax machine made the first fax machine more valuable, and the third fax made the first two more valuable, and so on. . . . when you buy a fax machine, then, what you are really buying is access to the entire fax network—which is infinitely more valuable than the machine itself.12

with social networking sites, and all sites that seek to capitalize on user input (reviews, annotations, profiles, etc.), the true value of each site is defined by the number of people it can bring together. a classic example of this characteristic is amazon. amazon sells books and other merchandise, but, in reality, amazon is very much about the marketing of information. amazon gains tremendous value by allowing its users to review and rate items. the more people use amazon and the more they comment, the more visibility these active users gain and the more credibility markers they take on.
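the fax-machine principle quoted above can be made concrete with a little arithmetic: a network of n compatible machines supports n(n-1)/2 possible pairwise connections, so every new machine adds value to all of the machines already there. a throwaway sketch:

```typescript
// number of distinct pairwise connections possible among n compatible machines
const connections = (n: number): number => (n * (n - 1)) / 2;

for (const n of [1, 2, 3, 10, 100]) {
  console.log(`${n} machines -> ${connections(n)} possible connections`);
}
// 1 -> 0, 2 -> 1, 3 -> 3, 10 -> 45, 100 -> 4950: the second machine creates the first
// connection, and every later machine adds more value than the one before it.
```

the jump from 45 connections at ten machines to 4,950 at a hundred is why sites built on user input chase membership so aggressively.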
web 2.0 characteristic: folksonomies a folksonomy is a classification system created in a bottom­up fashion with no central coordination. this differs from the deductive approach of such classifica­ tions systems as the dewey decimal system, where the world of ideas is broken into ten nominal classes.13 it also differs from other means of developing classifications where some central authority determines if a term should be included. in a folksonomy, the members of a group simply attach terms (or tags) to items (such as photos or blog postings), and the aggregate of these terms is seen as the classification. what emerges is a classification scheme that prioritizes common usage (the most­used tags) over semantic clarity (if most people use “car,” but some use “cars,” they are seen as different terms, and the tag “auto­ mobile” has no real relationship within the aggregate classification). examples include: figure 2: screenshot of chicagocrime.org 22 information technology and libraries | december 200722 information technology and libraries | december 2007 ■ penntags. penntags (http://tags.library.upenn.edu/ help) is a social bookmarking tool for locating, orga­ nizing, and sharing one’s favorite online resources. members of the penn community can collect and maintain urls, links to journal articles, and records in franklin, the online catalog, and vcat, the online video catalog. once resources are compiled, users can organize them by assigning tags (free­text key­ words) or by grouping them into projects according to specific preferences. penntags also can be used collaboratively, as it acts as a repository of the varied interests and academic pursuits of the penn com­ munity, and a user can find topics and other users related to his or her own favorite online resources. ■ hillsdale teen library. the hillsdale teen library (www.flickr.com/photos/hillsdalelibraryteens) uses flickr to post pictures of events at the hillsdale teen library (figure 4). the resulting tag view is repre­ sented in figure 5. these tags allow users to easily retrieve the images in which they are interested. there are more characteristics of web 2.0, but these give some overall concepts. core new technologies: ajax and web services as we have just discussed, web 2.0 is little more than set of related concepts, albeit with a lot of value being currently attached to these concepts. these concepts are supported by two underlying technologies that have facilitated web 2.0 development and brought a substantially new (and improved) user experience to the web. the first is ajax, which allows a more desktop­like experience for users. the second is the advent of web services. these technolo­ gies are not necessary for web 2.0 concepts, but they have made web 2.0 sites much more compelling. ajax ajax stands for asynchronous javascript and xml.14 it is a set of existing web technologies brought together. at the most basic, ajax allows a browser (the part the user interacts with) and a server (where the data resides) to send data back and forth without needing to refresh the entire web page being worked on. think about the web sites you work with. you click on a link, the browser freezes and waits for the data, then draws it on the screen. early versions of such sites as mapquest would show a map. if you wanted to zoom into the map, you would press a zoom icon and wait while the new map, and the rest of the web page was redrawn. 
compare this to google maps, where you click in the middle of a map and drag left or right and the map moves dynamically. we are used to this kind of interaction in desktop applications. click and drag has become second nature on the desktop, and ajax is making it second nature on the web, too. another ajax advantage is that it is open and requires only light programming skills. javascript on the client and almost any server­side scripting language (such as active server pages or php) are easily accessible languages. this fact allows for both fast development and easier integration with existing systems. as an example, it should now be easier to bring more interactive web interfaces to existing online catalogs. web services web services allow for software­to­software interactions on the web.15 using web protocols and xml, applications exchange queries and information in order to facilitate the larger functioning of a system. one example would be a system that uses an isbn number to query multiple online catalogs and commercial vendors for availability (and price) of a book. this simple process might be part of a much larger library catalog that shows users a book and its availability. the point is, that unlike federated search systems such as z39.50, web services are small. they also tend to be lightweight (that is, limited in what they do), and are aggregated for greater functionality. this is the technological basis for the loosely coupled apis dis­ cussed previously. library 2.0 library 2.0 is a somewhat diffuse concept. walt crawford, in his extended essay “library 2.0 and ‘library 2.0,’” found sixty­two different (and often contradictory) views and seven distinct definitions of library 2.0.16 it is no wonder that people are confused. however, it is natural for emerging ideas and groups to function in an environ­ figure 3: screenshot of current google lab projects article title | author 23participatory networks | lankes, silverstein, and nicholson 23 ment of high ambiguity. for use in this technology brief, the authors see library 2.0 as an attempt to apply web 2.0 concepts (and some longstanding beliefs for greater com­ munity involvement) to the purpose of the library. in the words of ormsby, “the purpose of a library is not to . . . showcase new gadgetry . . . ; rather, it is to make possible that instant of insight when all the facts come together in the shape of new knowledge.”17 in the case of library 2.0, the new gadgetry discussed in the previous section comprises a group of software applications. how the applications are used will determine whether they support ormsby’s “instant of insight.” many libraries and librarians already are pursuing this goal. some, for instance, are using blogs to reach other librarians, their own users (on their own web sites), and potential users (using myspace and other online communities). they are using wikis to deliver reports, teach information literacy, and serve as repositories. one has developed an api that allows wordpress posts to be directly integrated into a library catalog. clearly, the internet and newer tools that empower users seem to be aligned with the library mission. after all, librarians blogging and allowing the catalog to be mashed up can be seen as an extension of current information services. but this abundance of new applications poses a challenge. given the speed with which new tools are invented, librarians may find it difficult to create strate­ gies that include all the desired services that they make possible. 
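to ground the ajax and web services descriptions above, the sketch below shows the two working together: a small browser script asks a catalog availability service about an isbn and updates a single element of the page, with no full refresh. the /availability endpoint and its response shape are assumptions made for illustration; this is the pattern described above, not any particular vendor's interface.

```typescript
// ask a (hypothetical) availability web service about an isbn and update one element in place
interface Availability {
  available: boolean;
  location: string;
}

async function showAvailability(isbn: string): Promise<void> {
  const statusElement = document.getElementById("availability-status");
  if (!statusElement) return;

  // the asynchronous request: the rest of the page stays put while we wait
  const response = await fetch(`/availability?isbn=${encodeURIComponent(isbn)}`);
  const info: Availability = await response.json();

  // update only the one element that changed, with no full page refresh
  statusElement.textContent = info.available
    ? `on the shelf at ${info.location}`
    : "checked out; place a hold?";
}

showAvailability("9780385504201").catch(console.error);
```

small services like this are cheap to assemble, which is precisely what makes the management question raised next so pressing.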
for every new application that becomes available, library administrators must decide whether it can serve the library, how to use it, and how to find additional resources to manage it (for example, "now we can do this. but why should we?"). this problem stems from focusing excessively on the technology. librarians should instead focus on the phenomena made possible by the technology. the most important of these phenomena is that the library invites participation. as chad and miller state:

library 2.0 facilitates and encourages a culture of participation, drawing upon the perspectives and contributions of library staff, technology partners and the wider community. library 2.0 means harnessing this type of participation so that libraries can benefit from increasingly rich collaborative cataloguing efforts, such as including contributions from partner libraries as well as adding rich enhancements, such as book jackets or movie files, to records from publishers and others. library 2.0 is about encouraging and enabling a library's community of users to participate, contributing their own views on resources they have used and new ones to which they might wish access. with library 2.0, a library will continue to develop and deploy the rich descriptive standards of the domain, whilst embracing more participative approaches that encourage interaction with and the formation of communities of interest.18

the carte blanche statement that user participation in the library is "good," however, is insufficient. library administrators must ask, "what is the ultimate goal?"

in summary, current initiatives in the library world to bring the tools of web 2.0 to the service of library 2.0 are exciting and innovative, and, more to the point, they are supportive of the library's purpose. they may, however, incur costs, such as monitoring blogs and wikis, and creating content and corresponding with users, that stretch already inadequate resources even further.

figure 4: hillsdale teen library
figure 5: hillsdale teen library flickr site

ultimately, the value of library 2.0 concepts requires us to answer some important questions: will they be used to further knowledge, or will they simply create more work for librarians? what does the next version of library 2.0 look like? is its mission the same, and only the tools different? what makes the library different from myspace? simply its legacy? should we incorporate new services into the current library offerings? how do we, as facilitators of conversations, point the way to the next generation of libraries? it is hoped that some of the concepts in participatory librarianship may answer these questions and help further the innovations of the library 2.0 community.

participatory networks

the authors use the phrase "participatory networking" to encompass the concept of using web 2.0 principles and technologies to implement a conversational model within a community (a library, a peer group, the general public, and so on). why not simply adopt social networking, web 2.0, or library 2.0 for that matter? let us examine each term's limitations:

■ social networking: social network sites such as myspace and facebook have certainly captured public attention. they also have proven very popular.
in their short life spans, these sites have garnered an immense audience (myspace has been ranked one of the top destination sites on the web) and drawn much atten­ tion from the press.19 some of that attention, however, has been very negative. myspace, for example, has been typified as a refuge for pedophiles and online predators. even the television show saturday night live has parodied the site for the ease with which users can create false personas and engage in risky online behaviors.20 to say you are starting a social networking site in your library may draw either enthusiastic support, vehement opposition (“social networking experiment in my library?!”), or simply confused looks. add to the potential negative con­ notations the ambiguity of the term. is a blog a social networking site? is flickr? to compound this confu­ sion, the academic domain of social network theory predates myspace by about a decade. ■ web 2.0: ambiguity also dogs the web 2.0 world. for some, it is technology (blogs, ajax, web ser­ vices, and so on). for others, it is simply a buzzword for the current crop of internet sites that survived the burst of the dot­com bubble. in any case, web 2.0 certainly implies more than just the inclusion of users in systems. ■ library 2.0: as stated before, the term library 2.0 is a vague term used by some as a goad to the library community. further, this term limits the discussion of user­inclusive web services to the library world. while this brief focuses on the library community, it also sees the library community as a potential leader in a much broader field. so, ultimately, the authors propose “participatory net­ working” as a positive term and concept that libraries can use and promote without the confusion and limitations of previous language. the phrase “participatory network” also has a history of prior use that can be built upon. it represents systems of exchange and integration and has long been used in discussions of policy, art, and government.21 the phrase also has been used to describe online communities that exchange and integrate information. ■ libraries as participatory conversations so where are we? we started with the abstract statement that knowledge is created through conversation. we then looked at the current landscape of technologies that can facilitate these conversations and showed examples of how libraries, other industries, and individuals are using these technologies. in this section we combine the larger framework with the technologies to see how libraries can incorporate participatory networks to further their knowledge mission. participatory librarianship in action let us look specifically at how participatory networks can be used in the library’s role as facilitator of knowledge through conversation. an obvious example is libraries hosting blogs and wikis for their communities, creat­ ing virtual meeting spaces for individuals and groups. indeed, these are increasingly useful functions for librar­ ies. they meet a perceived need in the community and can generate excitement both within the library and in the community. the idea of creating online sites for individu­ als and organizations makes sense for a library, although it is not without difficulties (see the section on challenges and opportunities). libraries also could use freely avail­ able (and increasingly easy to implement) open source software to create library versions of wikipedia (with or without enhanced editorial processes). 
another way for libraries to offer these services would be through a cooperative or other third­party vendor. such a service easily can be seen as a knowledge management activity capturing and providing local expertise while linking this expertise to that produced at other libraries. another reason for libraries to engage in participatory networking is that one library can more easily collaborate article title | author 25participatory networks | lankes, silverstein, and nicholson 25 with other libraries in richer dialogues. we currently have systems that connect our online catalogs and share resources through interlibrary loan. these conduits exist and can be used for the transferal of richer data, as has been proved through collaborative virtual reference sys­ tems. in our current systems, as in traditional library practice, when users are referred to other libraries, they are sent out and not brought back. in a participatory library setting, libraries would facilitate a conversation between the user, the community of the local library, and then through the developed conduits, other libraries and their communities. the end result would be a seamless web of libraries where the user can ignore the intrica­ cies of the library’s organization structure and boundar­ ies, and in which the libraries are using the best local resources to meet local needs. bringing libraries seamlessly together to participate in conversations with a single user has another sig­ nificant advantage: the library would make it easy for users to join the conversation regardless of where they are, through the presentation of a single façade. there is, for example, only one google, one amazon, and one wikipedia. why should users have to search from among thousands of libraries to find the conversations they want? participatory networking will be most effective when libraries work together, when the whole is greater than its parts. we currently see elements of the participatory library in the oclc open worldcat project. for example, users searching google may come across a listing provided by oclc. after selecting the entry for the book, the user can then jump to his or her own local library’s information about the book. users do not have to know which library to visit to find a book near them. extending this concept to conversations, one goal of these participatory networks is to make it easier for the user to enter a conversation with the library without having to work to discover their own specific entry points. however, ensuring this effective seamless access to the library will require more than simply adding ele­ ments of participatory networking around the library’s edges. adding services such as blogs and wikis may be seen merely as adjunct to current library offerings. as with any technological advance, scarce resources must be weighed against a desire to incorporate new services. do we expand the collection, improve the web site, or offer blogs to students? a better approach for making these kinds of decisions is to look at the needs of the community served in context with the commonly accepted, core tasks of a library, and see how they can be recast (and enhanced) as conversational, or participatory, tools. in point of fact, every service, patron, and access point is a starting point for a conversation. let’s start with the catalog. if the catalog is a conversation, it is decidedly formal and, more importantly, one way. think of today’s catalog as the educational equivalent of a college lecture. 
a formal system is used to serve up a series of presentations on a given topic (selected by the user). the presentations are rigid in their construction (marc, aacr2, and so on). they follow an abstract model (relevance scores, sometimes alphabetical listings), and offer the receiver of the information minimal opportunities to provide feedback or input. they provide no constructive means for the user to improve or shape the conversation. even recent advances in catalog functions (dynamic, graphical visualizations; faceted searching; simple search boxes; links to non-collection resources) do little more than make the presentation of information more varied. they are still not truly interactive because they do not allow user participation; they do not allow for conversation.

to highlight the one-way nature of the catalog, ask a simple question: what happens when the user doesn't find something? do we assume that the information is there, but that the user is simply incapable of finding it (in which case the catalog presents search tips, refers the patron to an expert librarian who is capable, or offers more information literacy instruction)? do we assume that the information does not exist (refer the patron to interlibrary loan, pass him or her on to a broader search engine)? do we assume that the catalog itself is limited (refer the user to online databases, or other finding aids)? what if we assume that the catalog is just the current place where a user is engaged in an ongoing conversation? what would that look like?

how can such a traditionally rigid system (in concept, more than in any one feature set) be made more participatory? what if the user, finding no relevant information in the catalog, adds either the information or a placeholder for someone else to fill in the missing information? possibly the user adds information from his or her expertise. however, assuming that most people go to a catalog because they don't have the information, perhaps the user instead begins a process for adding the information. the user might ask a question using a virtual reference service; at the end of the transaction, the user then has the option to add the question, along with the answer and associated materials, to the catalog. or perhaps the user simply leaves the query in the catalog for other patrons to answer, requesting to be notified when a response is posted. in that case, when a new user does a catalog search and runs across the question, he or she can provide an answer. that answer might be a textual entry (or an image, sound, or video), or simply a new query that directs the original questioner or new patrons to existing information in the catalog (user-created see-also entries in the catalog).

the catalog also can associate conversations with any data point. for example, a user pulls up the record for a book she or he feels might be relevant to an information need she or he is having. this process starts a conversation between that user and the library, its users, and authors of associated works. the user can see comments and ratings associated with this book from not only users of this library, but users of other libraries. also associated is a list of related works and the full audio of a lecture by the author. the user also might be directed to an in-person or online book group that is reading that book. a minimal sketch of the data behind such a record appears below.
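one way to picture the data behind that scenario is a conventional bibliographic surrogate wrapped in the conversational layers just described: comments and ratings, open questions awaiting answers, user-created see-also links, and pointers to related events. the field names below are illustrative only, not a proposed standard or an existing ils schema.

```typescript
// a conversational catalog record: formal metadata plus the conversation around it
interface Contribution {
  author: string;        // a patron, a librarian, or a user at a partner library
  text: string;
  rating?: number;       // optional 1-5 rating
  postedAt: Date;
  expiresAt?: Date;      // transient contributions can be weeded automatically
}

interface OpenQuestion {
  askedBy: string;
  question: string;
  answers: Contribution[]; // filled in later by staff or by other patrons
  notifyAsker: boolean;    // "tell me when someone responds"
}

interface ConversationalRecord {
  // the durable, well-groomed core
  title: string;
  author: string;
  isbn?: string;
  marcFields: Record<string, string>;

  // the conversational layers
  comments: Contribution[];
  questions: OpenQuestion[];
  seeAlso: string[];       // user-created see-also links to other records
  relatedEvents: string[]; // e.g., a book group or the audio of an author lecture
}

const example: ConversationalRecord = {
  title: "the wisdom of crowds",
  author: "james surowiecki",
  marcFields: { "245": "the wisdom of crowds / james surowiecki." },
  comments: [],
  questions: [
    { askedBy: "patron-482", question: "is there a readers' guide for this title?", answers: [], notifyAsker: true },
  ],
  seeAlso: [],
  relatedEvents: ["in-person book group, third thursdays"],
};
```

note the optional expiration on contributions: as discussed shortly, transient material can enter and leave such a record the way a withdrawn blog comment would.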
the point is that the catalog facilitates a conversation as opposed to simply presenting what it “knows” about a topic and then stepping out of the process. the catalog, then, does not simply present information, but instead helps users construct knowledge by allowing the user to participate in a conversation. there are other means of improving (and linking) systems in a conversational library. take the implicit link between the catalog and circulation. of course, these systems have always been linked in that items found in the catalog can be checked out, and checked out items have their status reflected in the catalog. but this kind of state information is a pretty meager offering. imagine using circulation data to improve the actual functionality of the catalog. take the example of a user who is search­ ing the catalog for fictional books on magic. currently, a relevance score between an item’s metadata and the query is computed and then all the items are ranked in a retrieval set. this relevance score can be computed in many ways, but is usually based on the number of times a keyword appears in the record and the placement of that keyword in the metadata record (giving preference to terms appearing in certain marc fields, such as titles). what is missing is the actual, real­world circulation of an item. wouldn’t it make sense, given such an abstract query, to present the user with harry potter first (but not exclusively)? what if we added circulation data to our relevance rankings: how many times this item has been checked out? it turns out that using a simple statistic is amazingly powerful. it is akin to google’s page rank algorithm that presents sites most linked to higher in the results. also, for those worried that users would be flooded with only popular materials, studies show that while these algorithms do change the very top ranked material, the effect quickly fades so that the user can still easily find other materials. another consideration for adjusting a search is to allow the user to tweak the algorithms used to retrieve works. in the example above, a user could turn off the popularity feature. the user also could toggle switches for currency, authority, and other facets of relevancy rankings. the conversational model requires us to rethink the catalog as a dynamic system with data of varying levels of currency and, frankly, quality, coming into and out of the system. in a conversational catalog, there is no reason that some data can’t exist in the catalog for limited dura­ tions (from years to seconds). records of well­groomed physical collections may be a core and durable collection in the catalog, but that is only one of many types of infor­ mation that could exist in the catalog space. furthermore, even this core data can be annotated and linked to (and from) more transient media. so, the user might see a review from a blog as part of a catalog record on one day, but when she or he pulls the record up again in a few days, that review might be absent, the blog writer hav­ ing withdrawn the comment. this is akin to weeding the collection; however, it would happen in a more dynamic fashion than occurs with the content on library shelves. the conversational model also can be used in other areas of the library. what do we digitize? what do we select? what programs do we offer? what do we pre­ serve? 
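returning to the circulation-weighted ranking described above, the sketch below blends a conventional keyword score with a dampened checkout count and lets the user switch the popularity signal off. the weight and the logarithmic dampening are illustrative choices, not the algorithm of any existing catalog.

```typescript
// rank catalog items by keyword relevance, optionally boosted by real-world circulation
interface CatalogItem {
  title: string;
  keywordScore: number; // e.g., from term frequency and placement in marc fields
  checkouts: number;    // how many times the item has circulated
}

function rank(items: CatalogItem[], usePopularity: boolean): CatalogItem[] {
  const score = (item: CatalogItem): number => {
    // logarithmic dampening keeps blockbuster titles from drowning out everything else
    const popularity = usePopularity ? Math.log1p(item.checkouts) : 0;
    return item.keywordScore + 0.5 * popularity;
  };
  return [...items].sort((a, b) => score(b) - score(a));
}

const results = rank(
  [
    { title: "harry potter and the sorcerer's stone", keywordScore: 1.2, checkouts: 950 },
    { title: "a scholarly history of stage magic", keywordScore: 1.6, checkouts: 12 },
  ],
  true, // the user can toggle this off to remove the popularity boost entirely
);
console.log(results.map((item) => item.title));
```

the same kind of participatory signal bears on the selection and preservation questions just raised, which is where the next paragraph picks up.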
the empowered user can participate in answering all of these questions but does not replace the expert librarian; rather, the user contributes additional and diverse information and commentary. in fact, the catalog scenario just proposed already assumes that the library catalog does more than store metadata. in order for the scenario to work, the catalog must store questions, answers, video, and audio; in essence, the catalog must be expanded and integrated with other library systems so that a final participatory library system can present a coherent view of the library to patrons. the next section lays out a sort of road map for these enhancements and mergers.

framework for integration of participatory librarianship

as has been noted, participatory networks and libraries as conversations are not brand new concepts sprung from the head of zeus. instead, they are means to integrate past and current innovations and create a viable plan forward. figure 6 provides a sort of road map of how the library might make the transition from current systems to a truly participatory system. it includes current systems, systems under development (such as federated searching), and new concepts (such as the participatory library). it seeks to capture current momentum and push the field forward to a larger view instead of getting bogged down in the intricacies of any one development activity.

along the left side of the figure are current library systems. while the terminology may differ from library to library, nearly every system can be found on today's library web sites. by showing the systems together, the problems of user confusion and library management burden become obvious. users must often navigate these systems based on their needs, and often with little help. should they search the catalogs first, or the databases? isn't the catalog really just another database? which database do they choose? in our attempts to serve users better by creating a rich set of resources and services, we have instead complicated their information-seeking lives. as one librarian puts it, "don't give me one more system i, or my patrons, have to deal with."

from the array of systems on the left side, we can see that libraries have not been doing themselves any favors either. we are already maintaining many systems, which makes the calls for yet more systems not only impractical but unwise. the answer is to integrate systems, combining the best of each while discarding the complexity of the whole. the library world is in the midst of doing just that. this section seeks to highlight promising developments in integrating library systems well beyond the library catalog and to highlight not only an ideal endpoint, but also how this ideal system is truly participatory.

merging reference and community involvement

the functional area furthest along in the integration of participatory librarianship is reference; as reference is most readily recognizable as a conversation, this comes as no surprise. over the last decade, reference services have gone online and have led to shared reference services. more importantly, reference done online creates artifacts of reference conversations: electronic files that can be cleaned of personal information, placed in a knowledge base, and used as a resource for other users.
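a small but essential step in turning those reference artifacts into a shared knowledge base is scrubbing personal information before anything is archived. the sketch below strips obvious identifiers (e-mail addresses and phone numbers) with regular expressions; a production system would need far more careful review, so treat this purely as an illustration of the idea.

```typescript
// redact obvious personal identifiers from a reference transcript before archiving it
function scrubTranscript(transcript: string): string {
  return transcript
    // e-mail addresses
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[email removed]")
    // north-american-style phone numbers, e.g. 315-555-0100 or (315) 555-0100
    .replace(/\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g, "[phone removed]");
}

const raw = "patron (jane@example.org, 315-555-0100): where can i find census data for my county?";
console.log(scrubTranscript(raw));
// -> patron ([email removed], [phone removed]): where can i find census data for my county?
```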
a new development in reference is the reference blog, in which multiple librarians and other users can be part of a question­answering community with conversations that can live on beyond a single transaction. another functional area of libraries that is already involved with participatory librarianship is community involvement. for decades, public libraries have supported local community groups through meeting spaces. some libraries now are hosting web spaces for local groups. as libraries incorporate participatory technologies into their offerings, they can create virtual places such as discussion forums, wikis, and blogs for these community groups to use. if there are standards for these discussion areas, then groups from different communities also could easily participate in shared boards; this makes sense for groups such as weight watchers or alcoholics anonymous that have local branches and national involvement. in an academic setting, these groups can be student, faculty, or staff organizations or courses. in addition to reference and hosted community con­ versations, the library has been actively creating digi­ tal collections of materials (either through digitization, leasing service from content providers, or capturing the library’s born digital items). parallel to the digital collec­ tion building of library materials is an active attempt to create institutional repositories of faculty papers, teacher lesson plans, organizational documentation, and the like. these services are participatory systems in which col­ lections come from users’ contributions, and they may evolve into digital repositories that include both user­ and librarian­created artifacts. these different conversations can be archived into a single repository, and, if properly planned, the refer­ ence conversations can live alongside, and eventually be intermingled with, the community conversations, and the digital repository (which, after all, though formal, is a community conversation) into a community repository. community repositories allow librarians to be more eas­ ily involved in the conversations of the community and capture important artifacts of these conversations for later use. merging library metadata into an enhanced catalog participatory librarianship can be supported by another functional area of the library: collections. traditionally, the collection comprises books, magazines, and other information resources paid for by the library. electronic resources, such as databases that are leased instead of purchased, make up a large portion of library expen­ ditures. more recently, web­based resources (external feeds and sites) have been selected and added to the virtual collection. several kinds of finding aids are used to locate these information resources. the catalog and databases both contain descriptions of resources and searching interfaces. in order to improve access, libraries include records for databases within the catalog. conversely, federated search­ ing tools combine the records from different databases and could allow the retrieval of both books and articles by com­ bining records from the traditional catalog and databases into one tool. if community­created resources are part of the catalog, then these resources also would be findable alongside other traditional library resources. the tools for describing information resources also can be participatory. in traditional librarianship, the librarians provide metadata that patrons then use to make selections. 
figure 6: road map of how the library might make the transition from current systems to a truly participatory system. 28 information technology and libraries | december 200728 information technology and libraries | december 2007 by examining this use data, recommender systems can be created to help users locate new materials. in participatory networking, patrons will be encouraged to add comments about items. if standards are used for these comments, then they can be shared among libraries to create larger pools of recommendations. as these comments are analyzed, they can be combined with usage databases to create stronger recommender systems to present patrons with additional choices based upon what is being explored. the end result is an enhanced catalog that allows users and libraries to find information regardless of which sys­ tem the information resides in. however, the enhanced catalog is still just that, a catalog. it contains surrogates of digital information and is managed separately from the artifacts themselves. in the case of physical items, this may be all the library systems can manage, but in the case of digital content, there is one more step that needs to be taken. namely, the artificial barrier between catalog (defined as inventory control system) and content (housed in the community repository) must come down. building the participatory library at this point in the evolution of distributed systems into a truly integrated library system, the participatory library, we have two large collections: one of resources, and one of information about the resources. the first collection of digital content, the community repository, is built by the library and its users collaboratively. the second collection, the enhanced catalog, includes metadata, both formal and user­created (such as ratings, commentary, use data, and the like). both the community repository and the enriched catalog are participatory. yet to realize the dream of a seamless system of functionality (seamless to the user and the library), these two systems must be merged, allow­ ing users to find resources and, much more importantly, conversations. furthermore, the users must be able to add to metadata (such as tags to catalog records) and content (such as articles, postings to a wiki, or personal images). the result may be conceived of as a single integrated infor­ mation resource, which, for the purposes of this conversa­ tion, is called the participatory library. users may access the participatory library directly through the library or as a series of services in google, myspace, or their own home pages. the point is that the access to the library takes place at the point of conversa­ tion, not at the point the user realizes he or she needs information from the library. conversations and preservation the conversation model highlights the need for preserva­ tion. aside from simply providing systems that facilitate conversation, libraries serve as the vital community memory. conversations construct knowledge, but some­ one must remember what has already been said and know how to access that dialog. scientific conversations, for example, are built on previous conversations (theories, studies, methods, results, and hypotheses). capturing conversations and playing them back at the right time is essential. this might mean the preservation of artifacts (maps, transcripts, blueprints, photographs), but also it means the increasingly important tasks of capturing the digital dialogs. 
this highlights the need for institutional repositories (that will later be integrated seamlessly with other library systems, as previously discussed). specifically, web sites, lectures, courseware, and articles must be kept. further, they must be kept in true conversa­ tional repositories that capture the artifacts (the papers), the methods (data, instruments, policy documents), and the process (meeting notes, conversations, presentations, web sites, electronic discussions). they must be kept in information structures that make them readily available as conversations; in other words, users must be able to search for materials and reconstruct a conversation in its entirety from one fragment. being where the conversation is imagine the conversations that are going on in your local library as you read this. imagine the physicist chatting with the gardener, and the trustee talking with the volunteer who is reading the latest best­seller. what knowledge can be gleaned from these novel interac­ tions? can you measure it? can you enhance it? can you capture it? can you recall it when it would be precisely what a user needs? note also that these conversations do not belong solely to the library. the library is only part of the con­ versation. faced with the daunting variety of resources available on the web, many organizations try to become the single point of entry into it. remember that conversa­ tions are varied in their mode, places, and players, and, more importantly, that they are intensely personal. this means that participants need to have ownership in them, and often in their locations as well. this also means that the library, as facilitator, needs to be varied in its modes and access points. in many cases, it is better to either create a personal space in which users may converse, or, increasingly, to be part of someone else’s space. what we can learn from web 2.0’s mashups is that smaller sets of limited (but easy to access) functionalities lead to greater incorporation of tools into people’s lives. in the chicagocrime–google maps mashup, combining maps from google and chicago crime statistics, it was important for the host of the site to brand the space and shape the interface for his conversation on crime. can your library functions be as easily incorporated into these types of conversations? can a user search your catalog and present the results on his or her web site? the point is that libraries need to be proactive in a new way. instead of article title | author 29participatory networks | lankes, silverstein, and nicholson 29 the mantra, “be where the user is,” we need to, “be where the conversation is.” it is not enough to be at the users’ desktops; you need to be in their e­mail program, in their myspace pages, in their instant messaging lists, and in their rss feed readers. all of these examples point to a significant mental shift that librarians will need to make in moving from delivering information from a centralized location to delivering information in a decentralized manner where the conversations of users are taking place. the catalog example presented earlier is an example of a centralized place for conversations. what if, instead of only being in a catalog, the same data were split into smaller components and embedded in the user’s browser and e­mail pro­ grams? just as google’s mail system embeds advertising based upon the content of a message, the library could provide links to its resources based upon what a user is working on. 
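a minimal sketch of that "links based on what the user is working on" idea: pull a few candidate keywords out of whatever text the user has in front of them and ask a catalog search service for matching resources. the /search endpoint and its response shape are assumptions for illustration only.

```typescript
// suggest library resources based on whatever text the user is currently working on
const STOPWORDS = new Set(["that", "this", "with", "from", "have", "about", "which", "would", "their"]);

function extractKeywords(text: string, max = 5): string[] {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z]{4,}/g) ?? []) {
    if (!STOPWORDS.has(word)) counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent terms first
    .slice(0, max)
    .map(([word]) => word);
}

async function suggestResources(workingText: string): Promise<string[]> {
  const query = extractKeywords(workingText).join(" ");
  // hypothetical catalog search service returning a flat list of matching titles
  const response = await fetch(`/search?q=${encodeURIComponent(query)}`);
  const payload: { titles: string[] } = await response.json();
  return payload.titles;
}
```

embedded in an e-mail client, a course page, or a feed reader, a call like suggestResources(currentDraft) would surface catalog links in the user's own space rather than the library's.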
by disaggregating the information within its system, the library can deliver just what is needed to a user, provide connections into mashups, and live in the space of the user instead of forcing the user to come to the space of the library. challenges and opportunities there is clearly a host of challenges in incorporating par­ ticipatory networks and a participatory model into the library. this is to be expected when we are dealing with something as fundamental as knowledge and as personal as conversations. we consider four major challenges that must be met by libraries before they can truly get into the business of participatory librarianship. technical there is a rich suite of participatory networking software that libraries can incorporate into their daily operations. implementing a blog, a wiki, or rss feeds these days is not a hard task, and they can easily be used to deliver information about library services and conversations to the user’s space. furthermore, these systems are often tested in very large­scale environments and are, in some cases, the same tools used in large participatory network­ ing sites such as wikipedia and blogger. some of these packages are commercial, but others are open source software. open source software is cheaper, easier to adapt, and, in some cases, more advanced. the downside to open source is that it requires a considerable amount of technical knowledge by the library (but not as much as one might think) and does not come with a technical support hotline. the largest technological impediment, however, may be the currently installed base of software within librar­ ies. integrated library systems have a long history and include a broad range of library functions. legacy code and near monolithic systems have restricted the easy exchange of a diverse set of information. were these sys­ tems written today, they would use modular code and loosely coupled apis and allow customers much more interface customizability. these changes may come to integrated library systems (as customers are demanding it), but it may take years. several libraries are currently attempting to pick apart these integrated systems themselves. often, libraries go to the underlying databases that hold the library metadata or create their own data structures, such as the university of pennsylvania data farm project.22 once components of this system are exposed, the catalog simply becomes another database that can be federated into new and uni­ fied interfaces. however, such integration requires a great deal of technological expertise. there is an opportunity for integrated library system vendors or large consortial groups such as oclc to move quickly into this space. in the meantime, there is an opportunity for the larger library community. this technology brief was created in response to a perceived need. whether evi­ denced in the library 2.0 community or in conversations at lita, libraries are now interested in incorporating new web technologies into their offerings and opera­ tions. the technologies under consideration here pres­ ent platforms for experimentation. rather than setting up thousands of separated experiments, however, the library community should create a participatory net­ work of its own. 
the technology certainly exists to create a test bed for libraries to set up various combinations of communication technologies (blogs, tagging, wikis), to test new web services against pooled data (catalog data, metadata repositories, and large-scale data sets), and even to incorporate new services into current library offerings (rss feeds, for example). by combining resources (money, time, expertise) in a single, large-scale test bed, libraries not only can get greater impact for their investments, but can directly experience life as a connected conversation. these connections, if built at the ground level, will then make it easier for the participatory library to come into existence. terminology can be clarified, claims tested, and best practices collaboratively developed, greatly accelerating innovation and dissemination.

operational

in addition to being in the conversation business, libraries are in the infrastructure business. one of the most powerful aspects of a library is its ability not only to develop a collection of some type of information, but to maintain it over time. sometimes infrastructure can be problematic (as in the case of legacy systems), but more often than not it provides a stable foundation from which to operate. there are many conversations going on that need infrastructure but have none (or little). think of the opportunities in your community for using the web to facilitate a conversation. it might be a researcher wanting to disseminate the results of his or her latest study. it might be a community organization seeking funding. it might be a business trying to manage its basic operational knowledge. the point is that such individuals and community organizations are not in the infrastructure business and could use a partner who is. imagine a local organization coming to the library and, within a few minutes, setting up a web site with an rss feed, a blog, and bulletin boards. the library facilitates, but does not own, that individual's or organization's conversation. it does form a strong partnership, however, that can be leveraged into resources and support. the true power of participatory networking in libraries is not to give every librarian a blog; it is in giving every community member a blog (and making the librarian a part of the community). in addition, the library can play the role of connecting these conversations to other users when appropriate.

participatory libraries allow the concept of community center (intellectual center, service center, media center, information center, meeting center) to be extended to the web. many public libraries have no problem providing meeting space to local non-profits; why not provide web meeting space in the form of a web site or web conferencing? many academic libraries attempt to capture the scholarly output of their faculties; why not help generate that output with research data stores? the answers to these questions inevitably come back to time and money. however, there is nothing in this brief that says such services have to be free. in fact, the best partnerships are formed when all partners are invested in the process. the true problem is that libraries have no idea how to charge for such services. faculty would be glad to write library support into grants (in the form of web site creation and hosting), but they need a dollar figure to include and an estimate of how long each task will take.
many libraries aren’t used to positioning their services on a per item basis, and this makes it difficult to build partnerships. sometimes it is not a lack of money, but a lack of structure to take in money that is the problem. policy as always, it is policy that presents the greatest challenges. the idea of opening the library functions to a greater set of inputs is rife with potential pitfalls. how can libraries use the technologies and concepts of facebook and myspace without being plagued by their problems? how can users truly be made part of the collection without the library being liable for all of their actions? the answers may lie in a seemingly obscure concept: identity management. conversations can range in their mode, topic, and duration. they also can vary in the conversants. the library needs to know a conversant’s status to determine policy (for example, we can only disclose this information to this person), and requires a unique identifier, such as a library card, to uphold it. in traditional libraries, that is the extent of identity management. in a participatory model, distinctions among identi­ ties become complex and graduated, and require us to consider a new approach. this new model, of patrons adding information directly to library systems, is not as radical as it may first appear. we have become very used to the idea of roles and tiered levels of authority in many other settings. most modern computer systems allow for some gradation in user abilities (and responsibilities). online communities have even introduced merit systems, by which continual high­quality contributions to a site equals greater power in the site. think about amazon, wikipedia, even ebay; as users contribute more to the community, they gain status and recognition. from par­ ticipants to editors, from readers to writers, these organi­ zations have seen membership as a sliding scale of trust, and libraries need to adopt this approach in all of their basic systems. we currently do, to a degree, in the form of librarians, paraprofessionals, and other staff. yet even these distinctions tend to be rigid and often class­based, with high walls (such as a master’s degree) between the strata. some of this is imposed by outside organizations (civil service requirements, tenure track, and so on), but a great deal is there by inertia of the field. skillful use of identity management will help librar­ ies avoid the baggage of myspace and facebook. as users grain greater access, greater responsibility, and greater autonomy, libraries need to be more certain of their identities. that is, for a user to do more requires the library to know more. knowing about a user may involve traditional identity verification or tracking an activity trail, whereby intentions can be judged in rela­ tion to actions. these concepts may be expressed as, “the more we know you, the more control you can have in valuable services such as blogging, or the catalog.” the concepts are illustrated in blogger and livejournal, both of which require some level of identity information. in another example, to join livejournal you must be invited, thus the community confers identity. the common theme is that verifying (and building) identity is community­ based. the difference between the library and myspace is that the library works in an established community with traditional norms of identity, whereas myspace is seeking to create a community (where identity is more defined by social connections than actions). 
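the sliding scale of trust described above can be pictured as a simple, graduated policy: what a participant may do in the catalog or on a hosted blog grows with verified identity and with accepted contributions. the capability names and thresholds below are arbitrary placeholders chosen only to show the shape of such a policy.

```typescript
// a graduated identity policy: capability grows with verified identity and accepted contributions
type Capability = "read" | "comment" | "tag" | "edit-records" | "moderate";

interface Participant {
  verifiedIdentity: boolean;      // e.g., holds a library card tied to a real person
  acceptedContributions: number;  // comments, tags, or answers the community has accepted
}

function capabilities(p: Participant): Capability[] {
  const granted: Capability[] = ["read"];
  if (p.verifiedIdentity) granted.push("comment", "tag");
  if (p.verifiedIdentity && p.acceptedContributions >= 25) granted.push("edit-records");
  if (p.verifiedIdentity && p.acceptedContributions >= 100) granted.push("moderate");
  return granted;
}

console.log(capabilities({ verifiedIdentity: true, acceptedContributions: 40 }));
// -> [ "read", "comment", "tag", "edit-records" ]
```

the particular thresholds matter less than the principle that, as with blogger or livejournal, doing more requires the library to know more.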
both the library and the services mentioned above, however, base their functions and services on identity. ethical as knowledge is developed through conversation, and libraries facilitate this process, libraries have a powerful impact on the knowledge generated. can librarians inter­ fere with and shape conversations? absolutely. should we? we can’t help it. our collections, our reference work, article title | author 31participatory networks | lankes, silverstein, and nicholson 31 our mere presence will influence conversations. the ques­ tion is, in what ways? by dedicating a library mission to directly align with the needs of a finite community, we are accepting the biases, norms, and priorities of the com­ munity. while a library may seek to expand or change the community, it does so from within. when internet filtering became a requirement for fed­ eral internet funding, public and school libraries could not simply quit, or ignore the fact, because they are agents of their communities. school libraries had to accept filtering with federal funding because their parent organizations, the schools, accepted filtering.23 we see, from this example, that libraries may shift from facilitating conversations to becoming active conversants, but they are always doing both. thus, the question is not whether the library shapes conversations, but which ones, and how actively? these questions are hardly new to the underlying principles of librarianship. and nothing in the participa­ tory model seeks to change those underlying principles. the participatory model does, however, highlight the fact that those principles shape conversations and have an impact on the community. ■ recommendations the overall recommendation of this article is that librar­ ies must be active participants in the ongoing conversa­ tions about participatory networking. they must do so through action, by modeling appropriate and innovative use of technologies. this must be done at the core of the library, not on the periphery. rather than just adding blogs and photosharing, libraries should adopt the princi­ ples of participation in existing core library technologies, such as the catalog. anything less simply adds stress and stretches scarce resources even further. to complement this broad recommendation, the authors make two specific proposals: expand and deepen the discussion and understanding of participatory net­ works and participatory librarianship, and create a par­ ticipatory library test bed to give librarians needed participatory skills and sustain a standing research agenda in participatory librarianship. as stated in the outset of this document, what you are reading is limited. while it certainly contains the kernel and essence of participatory networks (systems to allow users to be truly part of services) and participatory librar­ ianship (the role of librarianship as facilitators and actors in conversations in general), the focus was on technology and technology changes. already, the ideas contained in this document have been part of an active conversation. the first draft of this document was made available for public comment via a wiki, e­mail, and bulletin boards, and concepts herein presented at conferences and lec­ tures. however, there is now a need to broaden the scope and scale of the conversation. the theoretical founda­ tions of participatory librarianship need to be rigorously presented. the nontechnical components of the ideas (and the marriage of nontechnical to technical) need to be explored. 
there are curricular implications: how do we prepare participatory librarians? the nature and form of the library and participatory systems need to be discussed and examined in theoretical, experimental, and operational contexts. in order to do this, the authors propose a series of conversations to engage the ideas. these conversations, both in person and virtual, need to be within the profession and across disciplines and industries. the deeper conversations need to be documented in a series of publications that expand this document for academics and practitioners. the authors feel, however, that the first proposal must be grounded in action. to complement the more abstract exploration of participatory networks and participatory librarianship, there must be an active playground where conversants can experience firsthand the technologies discussed, and then actively shape the tools of participation. this is the test bed. this test bed would implement a participatory network of libraries, and provide a common technology platform to host blogs, wikis, discussion boards, rss aggregators, and the like. these shared technologies would be used to experiment with new technologies and to provide real services to libraries. thus, libraries could not only read about blogging applications, they could try them and even roll them out to their community members. as libraries start new community initiatives, they could rapidly add wikis and rss feeds hosted at the shared test bed. the test bed would also make all software available to the libraries so they could locally implement technologies that have proven themselves. the test bed would provide the open source software and consulting support to implement features locally. the test bed also would develop new metrics and means of evaluating participatory library services for the use of planners and policy makers. a major deliverable of the test bed, however, would be to model innovations in integrated library systems (ils). the test bed would work with libraries and ils vendors to pilot new technologies and specify new standards to accelerate ils modernization. the point of the test bed is not to create new ilss, but to make it easy to incorporate innovative technologies into vendor and open source ilss. the location and support model of the test bed are open for the library community to determine. certainly, it could be placed in existing library associations or organizations. however, it would require the host to be seen as neutral in ils issues, and to be capable of supporting a diverse infrastructure over time. the host organization also would need to be a nimble organization, able to identify new technical opportunities and implement them quickly. one model that might work is establishing a pooled fund from interested libraries. this pooled fund would support an open source technology infrastructure and a small team of researchers and developers. the team’s activities would be overseen by an advisory panel drawn from contributing members. such a model spreads this investment out into experimentation across a broad collaboration and should, ultimately, save libraries time and money. as a result, the time and money that individual libraries might spend on isolated or disconnected experiments can be invested in a common effort with greater return.
libraries have a chance not only to improve service to their local communities, but to advance the field of participatory networks. with their principles, dedication to service, and unique knowledge of infrastructure, libraries are poised not simply to respond to new technologies, but to drive them. by tying technological implementation, development, and improvement to the mission of facilitating conversations across fields, libraries can gain invaluable visibility and resources. impact and leadership, however, come from a firm and conceptual understanding of libraries’ roles in their communities. the assertion that libraries are an indispensable part of knowledge generation in all sectors provides a powerful argument for an expanded function of libraries. eventually, blogs, wikis, rss, and ajax all will fade in the continuously dynamic internet environment. however, the concept of participatory networks and conversations is durable. ■ acknowledgements the authors would like to thank the following people and groups: ken lavender, for his editing prowess. the doctoral students of ist 800 for providing input on conversation theory: johanna birkland, john d’ignazio, keisuke inoue, jonathan jackson, todd marshall, jeffrey owens, katie parker, david pimentel, michael scialdone, jaime snyder, sarah webb. the students of ist 676 for their tremendous input and for their exploration of the related concept of massive scale librarianship: marcia alden, charles bush, janet chemotti, janet feathers, gabrielle gosselin, ana guimaraes, colleen halpin, katie hayduke, agnes imecs, jennifer kilbury, min-chun ku, todd mccall, virginia payne, joseph ryan, jean van doren, susan yoo. those who commented on the draft, including karen schneider, walt crawford and john buschman, and kathleen de la peña mccook. lita for giving us a forum for feedback. carrie lowe, rick weingarten, and mark bard of ala’s oitp for their feedback and support. the institute staff, including lisa pawlewicz, joan laskowski, and christian o’brien, for logistical support. references and notes 1. cited in p. hardiker and m. baker, “towards social theory for social work,” handbook of theory for practice teachers in social work, j. lishman, ed. (london: jessica kingsley, 1991). 2. g. pask, conversation theory: applications in education and epistemology (new york: elsevier, 1976). 3. linda h. bertland, “an overview of research in metacognition: implications for information skills instruction,” school library media quarterly 15 (winter 1986): 96–99. 4. pask, conversation theory, 92. 5. tim o’reilly, “what is web 2.0: design patterns and business models for the next generation of software,” o’reilly, www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html (accessed feb. 1, 2007). 6. j. surowiecki, the wisdom of crowds (new york: doubleday, 2004). 7. “wiki’s wild world: researchers should read wikipedia cautiously and amend it enthusiastically,” nature 438, no. 7070 (dec. 2005): 890, www.nature.com/nature/journal/v438/n7070/full/438890a.html (accessed feb. 1, 2007). 8. google, “google maps api,” www.google.com/apis/maps (accessed feb. 1, 2007). 9. “javascript tutorial,” w3 schools, www.w3schools.com/js/default.asp (accessed feb. 1, 2007). 10. while the terms in web 2.0 are a bit ambiguous, many people confuse the term “mashup” with “remixes.” mashups are combining data and functions (such as mapping), whereas remixes are reusing and combining content only.
so combining a song with a piece of video to create a “new” music video would be a remix. mapping all of your videos using youtube to store the videos and google maps to plot them geographically would be a mashup. 11. for example, gmail, a very widely used, web-based e-mail service, is still considered “beta” by google. 12. malcolm gladwell, the tipping point: how little things can make a big difference (boston: back bay books, 2000), 272. 13. oclc, “introduction to dewey decimal classification,” www.oclc.org/dewey/versions/ddc22print/intro.pdf (accessed feb. 1, 2007). 14. “ajax (programming),” wikipedia, http://en.wikipedia.org/wiki/ajax_(programming) (accessed feb. 1, 2007). 15. “web services activity,” w3c, www.w3.org/2002/ws (accessed feb. 1, 2007). 16. walt crawford, “library 2.0 and ‘library 2.0,’” cites & insights 6, no. 2 (2006), http://citesandinsights.info/civ6i2.pdf (accessed dec. 13, 2007). 17. eric ormsby, “the battle of the book: the research library today,” the new criterion (oct. 2001): 8. 18. ken chad and paul miller, “do libraries matter? the rise of library 2.0: a white paper,” version 1.0, 2005, www.talis.com/downloads/white_papers/dolibrariesmatter.pdf (accessed feb. 1, 2007). 19. slashdot, “myspace #1 us destination last week,” http://slashdot.org/articles/06/07/12/0016211.shtml (accessed feb. 1, 2007); pete williams, “myspace, facebook attract online predators,” msnbc, www.msnbc.msn.com/id/11165576 (accessed feb. 1, 2007); “the myspace generation,” businessweek, dec. 12, 2005, www.businessweek.com/magazine/content/05_50/b3963001.htm (accessed feb. 1, 2007). 20. saturday night live, “sketch: myspace seminar,” nbc, www.nbc.com/saturday_night_live/segments/9166.shtml (accessed feb. 1, 2007). 21. c. stohl and g. cheney, “participatory processes/paradoxical practices,” management communication quarterly 14, no. 3 (2001): 349–407. 22. j. zucca, “traces in the clickstream: early work on a management information repository at the university of pennsylvania,” information technology and libraries 22, no. 4 (2003): 175–78. 23. to be more precise, public and school libraries that accept e-rate funding.

this paper summarizes a research program that focuses on how catalogers, other cultural heritage information workers, web/semantic web technologists, and the general public understand, explain, and manage resource description tasks by creating, counting, measuring, classifying, and otherwise arranging descriptions of cultural heritage resources within the bibliographic universe and beyond it. a significant effort is made to update the nineteenth-century mathematical and scientific ideas present in traditional cataloging theory to their twentieth- and twenty-first-century counterparts. there are two key elements in this approach: (1) a technique for diagrammatically depicting and manipulating large quantities of individual and grouped bibliographic entities and the relationships between them, and (2) the creation of resource description exemplars (problem–solution sets) that are intended to play theoretical, pedagogical, and it system design roles. to the reader: this paper presents a major re-visioning of cataloging theory, introducing along the way a technique for depicting diagrammatically large quantities of bibliographic entities and the relationships between them.
as many details of the diagrams cannot be reproduced in regularly sized print publications, the reader is invited to follow the links provided in the endnotes to pdf versions of the figures. c ataloging—the systematic arrangement of resources through their descriptions that is practiced by libraries, archives, and museums (i.e., cultural heritage institutions) and other parties1—can be placed in an advanced, twenty-first-century context by updating its preexisting scientific and mathematical ideas with their more contemporary versions. rather than directing our attention to implementation-oriented details such as metadata formats, database designs, and communications protocols, as do technologists pursuing bottom-up web and semantic web initiatives, in ronald j. murray and barbara b. tillett cataloging theory in search of graph theory and other ivory towers object: cultural heritage resource description networks this paper we will define a complementary, top-down approach. this top-down approach focuses on how catalogers, other cultural heritage information workers, web/ semantic web technologists, and the general public have understood, explained, and managed their resource description tasks by creating, counting, measuring, classifying, and otherwise arranging descriptions of cultural heritage resources within and beyond the bibliographic universe. we go on to prescribe what enlargements of cataloging theory and practice are required such that catalogers and other interested parties can describe pages from unique, ancient codices as readily as they might describe information elements and patterns on the web. we will be enhancing cataloging theory with concepts from communications theory, history of science, graph theory, computer science, and from the hybrid field of anthropology and mathematics called ethnomathematics. employing this strategy benefits two groups: ■■ workers in the cultural heritage realm, who will acquire a broadened perspective on their resource description activities, who will be better prepared to handle new forms of creative expressions as they appear, and who will be able to shape the development of information systems that support more sophisticated types of resource descriptions and ways of exploring those descriptions. to build a better library system (perhaps an n-dimensional, n-connected system?), one needs better theories about the library collections and the people or groups who manage and use them. ■■ the full spectrum of people who draw on cultural heritage resources: scholars, creatives (novelists, poets, visual artists, musicians, and so on), professional and technical workers, students, and other people or groups pursuing specific or general, long or short-term interests, entertainment, etc. to apply a multidisciplinary perspective to the processes by which resource description data (linked or otherwise) are created and used is not an ivory tower exercise. our approach draws lessons from the debates on why, what, and how to describe physical phenomena that were conducted by physicists, engineers, software developers (and their historian and philosopher of science observers) during the evolution of high-energy physics. 
during that time, intensive debates raged over theory and observational/experimental data, the roles of theorists, experimenters, and instrument builders, instrumentation, and hardware/software system design.2 accommodating the resulting scientific approaches to description, collaboration, and publishing has required the creation of information technologies that have had and continue to have world-shaking effects. ronald j. murray (rmur@loc.gov) is a digital conversion specialist in the preservation reformatting division, and barbara b. tillett (btil@loc.gov) is the chief of the policy and standards division at the library of congress. cataloging theory in search of graph theory and other ivory towers | murray and tillett 171 descriptions—accounts or representations of a person, object, or event being drawn on by a person, group, institution, and so on, in pursuit of its interests. given this definition, a person (or a computation) operating from a business rules–generated institutional or personal point of view, and executing specified procedures (or algorithms) to do so, is an integral component of a resource description process (see figure 1). this process involves identifying a resource’s textual, graphical, acoustic, or other features and then classifying, making quality and fitness for purpose judgments, etc., on the resource. knowing which institutional or individual points of view are being employed is essential when parties possessing multiple views on those resources describe cultural heritage resources. how multiple resource descriptions derived from multiple points of view are to be related to one another becomes a key theoretical issue with significant practical consequences. ■■ niels bohr’s complementarity principle and the library in 1927, the physicist niels bohr offered a radical explanation for seemingly contradictory observations of physical phenomena confounding physicists at that time.6 according to bohr, creating descriptions of nature is the primary task of the physicist: it is wrong to think that the task of physics is to find out how nature is. physics concerns what we can say about nature.7 descriptions that appear contradictory or incomparable may in fact be signaling deep limitations in language. bohr’s complementarity principle states that a complete description of atomic-level phenomena requires descriptions of both wave and particle properties. this is generally understood to mean that in the normal language these physics research facilities and their supporting academic institutions are the same ones whose scientific subcultures (theory, experiment, and instrument building) generated the data creation, management, analysis, and publication requirements that resulted in the creation of the web. in response to this development, we have come to believe that cultural heritage resource description (i.e., the process of identifying and describing phenomena in the bibliographic universe as opposed to the physical one) must now be as open to the concepts and practices of those twenty-first-century physics subcultures as it had been to the natural sciences during the nineteenth century.3 we have consequently undertaken an intensive study of the scientific subcultures that generate scientific data and have identified four principles on which to base a more general approach to cultural heritage resource description: 1. observations 2. complementarity 3. graphs 4. 
exemplars the cultural heritage resource description theory to follow proposes a more articulated view of the complex, collaborative process of making available—through their descriptions—socially relevant cultural heritage resources at a global scale. we will demonstrate that a broader understanding of this resource description process (along with the ability to create improved implementations of it) requires integrating ideas from other fields of study, reaching beyond it system design to embrace larger issues. ■■ cataloging as observation as stated in the oxford english dictionary, an observation is: the action or an act of observing scientifically; esp. the careful watching and noting of an object or phenomenon in regard to its cause or effect, or of objects or phenomena in regard to their mutual relations (contrasted with experiment). also: a measurement or other piece of information so obtained; an experimental result.4 following the scientific community’s lead in striving to describe the physical universe through observations, we adapted the concept of an observation into the bibliographic universe and assert that cataloging is a process of making observations on resources. human or computational observers following institutional business rules (i.e., the terms, facts, definitions, and action assertions that represent constraints on an enterprise and on the things of interest to the enterprise)5 create resource figure 1. a resource description modeled as a business ruleconstrained account of a person, object, or event 172 information technology and libraries | december 2011 purpose, its reformatting, and its long-term preservation must take into consideration that resource’s physical characteristics. having things to say about cultural heritage resources—and having many “voices” with which to say them—presents the problem of creating a well-articulated context for library-generated resource descriptions as well as those from other sources. these contextualization issues must be addressed theoretically before implementation-level thinking, and the demands of contextualization require visualization tools to complement the narratives common to catalogers, scholars, and other users. this is where mathematics and ethnomathematics make their entrance. ethnomathematics is the study of the mathematical practices of specific cultural groups over the course of their daily lives and as they deal with familiar and novel problems.10 an ethnomathematical perspective on cultural heritage resource description directs one’s attention to the existence of simple and complex resource descriptions, the patterns of descriptions that have been created, and the representation of these patterns when they are interpreted as expressions of mathematical ideas. a key advantage of operating from an ethnomathematical perspective is becoming aware that mathematical ideas can be observed within a culture (namely the people and institutions who play key roles in observing the bibliographic universe) before their having been identified and treated formally by western-style mathematicians. ■■ resource description as graph creation relationships between cultural heritage resource descriptions can be represented as conceptually engaging and flexible systems of connections mathematicians call graphs. 
a full appreciation of two key mathematical ideas underlying the evolution of cataloging—putting things into groups and defining relationships between things and groups of things—was only possible after the founding, naming, and expansion of graph theory, which is a field of mathematics that emerged in the 1850s, and the eventual acceptance around 1900 of set theory, a field founded amid intense controversy in 1874. between the emergence of formal mathematical treatments of those ideas by mathematicians and their actual exploitation by cataloging theorists—or by anyone capable of considering library resource description and organization problems from a mathematical perspective—lay a gulf of more than one hundred years.11 it remained for scholars in the library world to begin addressing the issue. tillett’s 1987 work on bibliographic relationships and svenonius’s 2000 definition of bibliographic entities in set-theoretic terms that physicists use to communicate experimental results, the wholeness of nature is accessible only through the embrace of complementary, contradictory, and paradoxical descriptions of it. later in his career, bohr vigorously affirmed his belief that the complementarity principle was not limited to quantum physics: in general philosophical perspective, it is significant that, as regards analysis and synthesis in other fields of knowledge, we are confronted with situations reminding us of the situation in quantum physics. thus, the integrity of living organisms, and the characteristics of conscious individuals, and most of human cultures, present features of wholeness, the account of which implies a typically complementary mode of description. . . . we are not dealing with more or less vague analogies, but with clear examples of logical relations which, in different contexts, are met with in wider fields.8 within a library, there are many things catalogers, conservators, and preservation scientists—each with their distinctive skills, points of view, and business rules—can observe and say about cultural heritage resources.9 much of what these specialists say and do strongly affects library users’ ability to discover, access, and use library resources in their original or surrogate forms. while observations made by these specialists from different perspectives may lead to descriptions that must be accepted as valid for those specialists, a fuller appreciation of these descriptions calls for the integration of those multiple perspectives into a well-articulated, accessible whole. reflecting the perspectives of the library of congress directorates in which we work, the acquisitions and bibliographic access (aba) directorate and the preservation directorate, we assert that the most fundamental complementary views on cultural heritage resources involve describing a library’s resources in terms of their availability (from an acquisitions perspective), in terms of their information content (from a cataloging perspective), and in terms of their physical properties (from a preservation perspective). for example, in the normal languages used to communicate their results, preservation directorate conservators narrate their condition assessments and record simple physical measurements of library-managed objects—while at the same time preservation scientists in another section bring instrumentation to acquire optical and chemical data from submitted materials and from reference collections of physical and digital media. 
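the idea of multiple, complementary observations on a single resource can be made concrete with a small sketch. the fragment below is illustrative only and is not drawn from the article or from any library of congress system; the resource identifier, field names, and values are all invented, and they show one way that descriptions made under different business rules might be recorded side by side and related through a shared identifier.

# a minimal sketch, with invented identifiers and values: observations of
# one resource recorded from complementary institutional points of view.
# each observation carries the perspective under which it was made, so the
# descriptions can be related to one another without being collapsed into
# a single record.

observations = [
    {"resource": "item-0001", "perspective": "acquisitions",
     "statement": {"availability": "in print", "list_price_usd": 40.00}},
    {"resource": "item-0001", "perspective": "cataloging",
     "statement": {"title": "an example title", "subjects": ["an example subject"]}},
    {"resource": "item-0001", "perspective": "preservation",
     "statement": {"condition": "brittle paper", "paper_ph": 4.8}},
]

# group the complementary descriptions of a single resource by perspective
views = {obs["perspective"]: obs["statement"]
         for obs in observations
         if obs["resource"] == "item-0001"}
print(sorted(views))   # -> ['acquisitions', 'cataloging', 'preservation']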
even though these assessments and measurements may not be comprehended by or made accessible to most library users, the information gathered possess a critical logical relationship to bibliographic and other descriptions of those same resources. key decisions regarding a library resource’s fitness for cataloging theory in search of graph theory and other ivory towers | murray and tillett 173 by the modeling technique. what is required instead is theory-based guidance of systems development, alongside theory testing and improvement through application use. if software development is not constrained by a tacit or explicit resource description theory or practice, graph or other data structures familiar to the historically less well-informed, those favored by an institution’s system designers and developers, or those familiar to and favored by implementation-oriented communities may be invoked inappropriately.18 given graph theory’s potentially overwhelming mathematical power—as evidenced by its many applications in the physical sciences, engineering, and computer science—investigations into graph theory and its history require close attention both to the history and evolving needs of the cultural heritage community.19 the unnecessary constraint on resource description theory formation occasioned by the use of e-r or oo modeling can be removed by dispensing with it system analysis tools and expressing resource description concepts in graph-theoretical terms. with this step, the very general elements (i.e., entities and relationships) that characterize e-r models and the more implementation-oriented ones in oo models are replaced by more mathematically flexible, theory-relevant elements expressed in graph-theoretical terms. the result is a “graph-friendly” theory of cultural heritage resource description, which can borrow from other fields (e.g., ethnomathematics, history of science) to improve its descriptive and predictive power, guide it system design and use, and, in response to users’ experiences with functioning systems, results in improved theories and information systems. graph theory in a cultural heritage context ever since the nineteenth century foundation of graph theory (though scholars regularly date its origins from euler’s 1736 paper)20 and its move from the backwaters of recreational mathematics to full field status by 1936, graph theory has concerned itself with the properties of systems of connections—nowadays regularly expressed as the mathematical objects called sets.21 in addition to its set notational form, graphs also are depicted and manipulated in diagrammatic form as dots/labeled nodes linked by labeled or unlabeled, simple or arrowed lines. for example, the graph x, consisting of one set of nodes labeled a, b, c, d, e, and f and one set of edges labeled ab, bd, de, ef, and fc, can be depicted in set notation as x = {{a b c d e f}, {ab bd de ef fc}} and can be depicted diagrammatically as in figure 2. when graphs are defined to represent different types of nodes and relationships, it becomes possible to create and discuss structures that can support cultural heritage resource description theory and application building. 
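the set notation for graph x translates directly into a small data structure. the fragment below is a sketch, not part of the original text: it holds the node set and edge set exactly as given above and expands them into an adjacency map so that a node’s neighbors can be looked up directly.

# graph x from the text, x = {{a b c d e f}, {ab bd de ef fc}}, held as two
# python sets and expanded into an undirected adjacency map.

nodes = {"a", "b", "c", "d", "e", "f"}
edges = {("a", "b"), ("b", "d"), ("d", "e"), ("e", "f"), ("f", "c")}

adjacency = {node: set() for node in nodes}
for left, right in edges:
    adjacency[left].add(right)
    adjacency[right].add(left)

print(sorted(adjacency["b"]))   # -> ['a', 'd'], the neighbors of node b

typed nodes and labeled relationships of the kind resource description graphs require can be carried by attaching a label to each edge tuple; the representation itself does not have to change.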
the following diagrams depict simple resource description identified those mathematical ideas in cataloging theory and developed them formally.12 then in 2009, we were able to employ graph theory (expressed in set-theoretical terms and in its highly informative graphical representation) as part of a broader historical and cultural analysis.13 cataloging theory had by 2009 haltingly embraced a new view on how resources in libraries have been described and arranged via their descriptions—an activity that in principle stretches back to catalogs created for the library of alexandria14—and how these structured resource descriptions have evolved over time, irrespective of implementation. murray’s investigation into this issue revealed that the increasingly formalized and refined rules that guided anglo-american catalogers had, by 1876, specified sophisticated systems of cross-references (i.e., connections between bibliographic descriptions of works, authors, and subjects)—systems whose properties were not yet the subject of formal mathematical treatment by mathematicians of the time.15 murray also found that library resource description structures—when teased out of their book and card and digital catalog implementations and treated as graphs—are arguably more sophisticated than those being explored in the world wide web consortium’s (w3c) library linked data initiative.16 implementation-oriented substitutes for graph theory cataloging theory has been both helped and hindered by the use of information technology (it) techniques like entity-relationship modeling (e-r, first used extensively by tillett in 1987 to identify bibliographic relationships in cataloging records) and object-oriented (oo) modeling.17 e-r and oo modeling may be used effectively to create information systems that are based on an inventory of “things of interest” and the relationships that exist between them. unfortunately, the things of interest in cultural heritage institutions keep changing and may require redefinition, aggregation, disaggregation, and re-aggregation. e-r and oo modeling as usually practiced are not designed to manage the degree and kind of changes that take place under those circumstances. when trying to figure out what is “out there” in the bibliographic universe, we assert that focus should first be placed on identifying and describing the things of interest, what relationships exist between them, and what processes are involved in the creation, etc., of resource descriptions. having accomplished this, attention can then be safely paid to defining and managing information deemed essential to the enterprise, that is, undertaking it system analysis and design. but when an it-centric modeling technique becomes the bed on which the resource description theory itself is constructed, the resulting theory will be driven in a direction that is strongly influenced 174 information technology and libraries | december 2011 of the resources they describe. figure 4’s diagrammatic simplicity becomes problematic when large quantities of resources are to be described, when the number and kinds of relationships recorded grows large, and when more comprehensive but less-detailed views of bibliographic relationships are desired. to address these problems in a comprehensive fashion, we examined similar complex description scenarios in the sciences and borrowed another idea from the physics community—paper tool creation and use. 
■■ paper tools: graph-aware diagram creation paper tools are collections of symbolic elements (diagrams, characters, etc.), whose construction and manipulation are subject to specified rules and constraints.23 berzelian chemical notation (e.g., c6h12o6) and—more prominently—feynman diagrams like those in figure 5 are familiar examples of paper tool creation and use.24 creating a paper tool resource diagram requires that the rules for creating resource descriptions be reflected in diagram elements, properties of diagram elements, and drawing rules that define how diagram/symbolic elements are connected to one another (e.g., the formula c6h12o6 specifies six molecules of carbon, twelve of hydrogen, and six of oxygen). the detailed bibliographic information in figure 4 is progressively schematized in a graphs that are based on real-world bibliographic descriptions. nodes in the graphs represent text, numbers, or dates and relationships that can be nondirectional (as a simple line), unidirectional (as single arrowed lines) or bidirectional (as a double arrowed line). the all-in-one resource description graph in figure 3 can be divided and connected according to the kinds of relationships that have been defined for cultural heritage resources. this is the point where institutional, group, and individual ways of describing resources shape the initial structure of the graph. once constructed, graph structures like this and their diagrammatic representations are then interpreted in terms of a tacit or explicit resource description theory. in the case of graphs constructed according to ifla’s functional requirements for bibliographic records (frbr) standard,22 figure 3 can be subdivided into four frbr sub-graphs, yielding figure 4. the four diagrams depict the initial graph of cataloging data as four complementary frbr wemi (w–work, e–expression, m–manifestation, and i–item) graphs. note that the item graph contains the call numbers (used here to identify the location of the copy) of three physical copies of the novel. this use of call numbers is qualitatively different from the values found in the manifestation graph in that resource descriptions in this graph apply to the entire population of physical copies printed by the publisher. the descriptions contained in figure 4’s frbr subgraphs reproduce bibliographic characteristics found useful by catalogers, scholars, other educationally oriented end users, and to varying extents the public in general. once created, resource description graphs and subgraphs (in mathematical notation or in simple diagrams like figure 4) can proliferate and link in multiple and complex ways—in parallel with or independently figure 3. library of congress catalog data for thomas pynchon’s novel gravity’s rainbow, represented as an all-inone graph labeled c figure 2. a diagrammatic representation of graph x cataloging theory in search of graph theory and other ivory towers | murray and tillett 175 6 graph is now represented explicitly by a black dot in a ring in the more schematic paper tool version. resource descriptions are then represented in fixed colors and positions relative to the resource/ring: the worklevel resource description is represented by a blue box, expression by a green box, manifestation by a yellow box, and item by a red box. depicting one aspect of the frbr way that reflects frbr definitions of bibliographic things of interest and their relevant relationships. 
as a first step, the four wemi descriptions in figure 4 are given a common identity by linking them to a c node, as in figure 6. the diagram is then further schematized such that frbr description types and relationships are represented by appropriate graphical elements connected to other elements. the result shows how a frbr paper tool makes it much easier to construct and examine complex large-scale properties of resource and resource description structures (like figure 7, right side) without being distracted by textual and linkage details. the resource described (but not shown) by the figure figure 4. the all-in-one graph in figure 3, separated into four frbr work (top-left), expression (top-right), manifestation (bottom-left), and item (bottom-right) graphs figure 5. feynman diagrams of elementary particle interactions figure 6. a frbr resource description graph 176 information technology and libraries | december 2011 expressions. the work products of scholars—especially those creations that are dense with quotations, citations, and other types of direct and derived textual and graphical reference within and beyond themselves—are excellent environments for paper tool explorations and more generally, for testing of exemplars—solutions to the potentially complex problem of describing cultural heritage resources. ■■ exemplars the fourth principle in our cultural heritage resource description theory involves exemplar identification and analysis. according to the historian of science thomas s. kühn, exemplars are sets of concrete problems and solutions encountered during one’s education, training, and work. in the sciences, exemplar-based problem finding and solving involves mastery of relevant models, builds knowledge bases, and hones problem-solving skills. every student in a field would be expected to demonstrate mastery by learning and using their field’s exemplars. change within a scientific field is manifest by the need to modify old or create new exemplars as new problems appear and must be solved.26 a cultural heritage resource description theorist would, in addition to identifying and developing exemplars from real bibliographic data and other sources, want to speculate about possible resource/description configurations that call for changes in existing information technologies. to the theorist, it would be as important to find out what can’t be done with frbr and other resource description models at library, archive, museum, and internet scales, as it is to be able to explain routine item cataloging and tagging activities. discovering system limitations is better done in advance by simulating uncommon or challenging circumstances than by having problems appear later in production systems. model graphically, the descriptions closest to the black dot resource/slot are the most concrete and those furthest away the most abstract. (readers wishing to interpret frbr paper tool diagrams without reference to color values should note the strict ordering of wemi elements: w–e–m–i–resource/ring or resource/ring–i–m–e–w.) finally, to minimize element use when pairs of wemi boxes touch, the appropriate frbr linking relationship for the relevant pair of descriptions (as explicitly shown in the expanded graph) is implied but not shown. 
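the grouping that figure 6 depicts can also be sketched as a data structure. the fragment below is illustrative only; the field names and values are invented rather than taken from the catalog record behind figures 3 and 4, and the dictionary stands in for the c node that gives the four wemi descriptions a common identity.

# an illustrative sketch, with invented field names and values: four wemi
# descriptions linked to a common identifier, the role played by the c node.

frbr_graph = {
    "c": "resource-0001",                    # the common resource identifier
    "work":          {"title": "an example work", "creator": "an example author"},
    "expression":    {"language": "english", "form": "text"},
    "manifestation": {"publisher": "an example publisher", "date": "1973"},
    "items":         [{"call_number": "call-number-copy-1"}],
}

# describing a second physical copy touches only the item level; the shared
# work, expression, and manifestation descriptions are not duplicated --
# the consolidation that figure 8 depicts for two copies of the same edition.
frbr_graph["items"].append({"call_number": "call-number-copy-2"})
print(len(frbr_graph["items"]))   # -> 2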
with appropriate diagramming conventions, the process of creating and exploring resource description complexes addresses combined issues of cataloging theory and institutional policy—and results in an ability to make better-informed judgments/computations about resource descriptions and their referenced resources. as a result, resource description graphs are readily created and transformed to serve theoretical—and with greater experience in thinking and programming along graph-friendly lines, practical—ends. one example of transformability would arise when exploring the implications of removing redundant portions of related resource descriptions as more copies of the same work are brought to the bibliographic universe. the frbr paper tool elements and the more articulated resource description graphs in figure 8 both depict the consequences of a practical act: combining resource descriptions for two copies of the same edition of the novel gravity’s rainbow.25 the top-most frbr diagram and its magnified section depict how the graph would look with a single item-level description, the call number for one physical copy. the bottom-most frbr diagram and its magnified section depict the graph with two item-level descriptions, the call numbers for two physical copies. a frbr paper tool’s flexibility is useful for exploring potentially complex bibliographic relationships created or uncovered by scholars—parties whose expertise lies in identifying, interrelating, and discussing creative concepts and influences across a full range of communicative figure 7. a frbr paper tool diagram element (left) and the less schematic frbr resource description graph it depicts (right) cataloging theory in search of graph theory and other ivory towers | murray and tillett 177 drawing diagrams. use case diagrams are secondary in use case work.28 as products of and guides for theory making, resource description exemplars have different origins and audiences than those for use cases. while use cases and exemplars offer perspectives that can support information system design, exemplars were originally introduced as theoretical entities by kühn to explain how theories and theory-committed communities can crystallize around problem-solution sets, how these sets also can serve as pedagogical tools, and why and when problem-solution sets get displaced by new ones. the proposed process of cultural heritage exemplar creation and use, followed by modification or replacement in the face of changes in the bibliographic universe draws on kühn’s and historian of science david kaiser’s interest in how work gets done in the sciences, in addition to their rejection of paradigms as eerie self-directing processes.29 exemplars are not use cases use cases are a software modeling technique employed by the w3c library linked data incubator group (lld xg) in support of requirements specification.27 kühnstyle exemplars are definitely not to be confused with use cases, which are requirements-gathering documents that contribute to software engineering projects. there is a wikipedia definition of a use case that describes its properties: a use case in software engineering and systems engineering, is a description of steps or actions between a user (or “actor”) and a software system which leads the user towards something useful. the user or actor might be a person or something more abstract, such as an external software system or manual process. . . . 
use cases are mostly text documents, and use case modeling is primarily an act of writing text and not figure 8. frbr paper tool diagram elements and the frbr resource description graphs they depict 178 information technology and libraries | december 2011 ■■ a webpage and its underlying, globally distributed, multimedia resource network, as it changes over time. such exemplars can be presented diagrammatically through the use of paper tools. this use of diagrams in support of conceptualization and information system design is deliberately patterned after professional data modeling theory and practice.31 paper tool–supported analyses of a nineteenth-century american novel (exemplar 1) and of eighteenth-century french poems drawn from state archives (exemplar 2) will be presented to illustrate how information system design and pedagogy can be informed by exemplary scholarly research and publication, combined with narrativized diagrammatic representations of bibliographic and other relationships in traditional and digital media. exemplar 1. from moby-dick to mash-ups—a print publication history and multimedia mash-up problem document the publication history of print copies of a literary work, identifying editorially driven content transfer across print editions along with content selection and transformation in support of multimedia resource creation. solution the solution to this descriptive problem relies heavily on placing resource descriptions into groups and then defining relationships within and across those groups— i.e., on graph creation. after locating a checklist that documented the publication history of the novel and after identifying key components of a moby-dick and orson welles–themed multimedia resource appropriation and transformation network, murray used the frbr paper tool along with additional connection rules to create a resource description diagram (rdd) that represented g. thomas tanselle’s documentation of the printing history (from 1851 to 1976) of herman melville’s epic novel, moby-dick.32 the resulting diagram provides a high-level view of a large set of printed materials—depicting concepts such as a creative work, the expression of the work in a particular mode of languaging (i.e., speech, sign, image), and more concrete concepts such as publications. to reduce displayed complexity, sets of frbr diagram elements were collapsed into green shaded squares representing entire editions/printings, yielding figure 9.33 the vertical axis represents the year of publication, starting with the 1851 printings at the top. connected squares the resulting network of connections in figure 9 can be interpreted in publishing terms. one line or two or more lines descending downwards from a printing’s green in addition, resource description structures specified in an exemplar can and should represent a more abstract treatment of a resource description and not just data or data structures engaged by end users. exemplars on hand and others to come cultural heritage resource description exemplars have been created over time as solutions to problems of resource description and later made available for use, study, mastery, and improvement. 
while not necessarily bound to a particular information technology, such as papyrus, parchment, index cards, database records, or rdf aggregations, resource description exemplars have historically provided descriptive solutions of physical resources whose physical and intellectual structure had originally been innovative solutions to describing, for example, ■■ a manuscript (individual and related multiples, published but host to history, imaginary, etc.); ■■ a monograph in one edition (individual and related multiples); ■■ a monograph in multiple editions (individual and related multiples); and ■■ a publication in multiple media, created sequentially or simultaneously. with the advent of electronic and then digital communications media, more complex resource description problem-solution sets have been called for as a response to enduringly or recently more sophisticated creative/ editorial decision-making and to more flexible print and digital information technology production capabilities. the most challenging problem-solution sets involve the assembly and cross-referencing of several multipart—and possibly multimedia—creative or editorially constructed works, such as the following: ■■ a work published as a monograph, but which has been reprinted and reedited; translated into numerous languages; supplemented by illustrations from multiple artists; excerpted and adapted as plays, an opera, comic books, and cartoon series; multimedia mash-ups; and has been directly quoted in paintings and other graphic arts productions, and has been the subject of dissertations, monographs, journal articles, etc. ■■ a continuing publication (individual and related multiple publications, special editions, name, publisher, editorial policy changes, etc.). ■■ a monograph whose main content is composed nearly entirely of excerpts from other print publications.30 ■■ a library-hosted multimedia resource and its associated resource description network. cataloging theory in search of graph theory and other ivory towers | murray and tillett 179 by paper tool diagram creation, analysis, and subsequent action, namely, ■■ connecting the squares (i.e., assigning at least one relationship to a printing) ensures access based on the relationship assigned; and ■■ parties located around the globe can examine a given connected or disconnected resource description network and develop strategies for enhancing its usefulness. the wealth of descriptive information available in the moby-dick exemplar illustrates how previous and future collaborative efforts between cultural heritage institutions and other parties have already generated resource descriptions that possess a network structure alongside its content. with a more graph-friendly and collaborative implementation, melville scholars, scholarly organizations,34 and enthusiasts could more effectively examine, discuss, and through their actions enhance the moby dick resource description network’s documentary, scholarly, and educational value. in its original form, the moby dick resource description diagram (and the exemplar it partially documents) only depicted full-length publications of melville’s work. 
as a test of the frbr paper tool’s ability to accommodate both traditional and modern creative expressions in individual and aggregate form—while continuing to serve theoretical, practical, and educational ends—murray added a resource description network for orson whales,35 square are interpreted to mean that the printing gave rise to one or more additional printings, which may occur in the same or later years. two or more lines converging on a green square from above indicate that the printing was created by combining texts from multiple prior printings—an editorial/creative technique similar to that used to construct the mash-ups published on the web. connecting unconnected squares tanselle’s checklist did not specify predecessor or successor relationships for each post–1851 printing. this often unavoidable, incomplete status is depicted in figure 9 as green squares that are ■■ not linked to any squares above it, i.e., to earlier printings; and/or ■■ not linked to any squares below it, i.e., to later printings; or ■■ connected islands, without a link to the larger structure. recognizing the extent of moby-dick printing disconnectedness in tanselle’s checklist and developing a strategy for dealing with it only by analyzing tanselle’s checklist would be extremely difficult. in contrast, the disconnectedness of the moby-dick resource description network, and its implications for search-based discovery based on following the depicted relationships is readily discernable in figure 9. the ease with which the disconnected condition can be assessed also hints at benefits to be gained by collaborative resource description supported figure 9. a moby-dick resource description diagram, depicting relationships between printings made between 1851–1976 (greatly reduced scale) 180 information technology and libraries | december 2011 darnton’s book can stand on its own as an exemplar for historical method, with the diagram providing additional diagrammatic support. solution 2 darnton’s analysis treated each poem found in the archives as an individual creative work,38 enabling the use of the frbr paper tool (as a bookkeeping device this time) instead of a tool designed to aggregate and describe archival materials. the resulting diagram is a more articulated frbr paper tool depiction of darnton’s poetry communication network, a section of which appears as figure 11. the depiction of the poetry communication network shown in figure 11 is composed of: ■■ tan squares that depict individuals (clerks, professors, priests, students, etc.) who read, discussed, copied, and passed along the poems. ■■ diagram elements that depict poetry written on scraps of paper (treated as resources) that were police custody, were admitted to having existed by suspects, or assumed to have existed by the police. if one’s theory and business rules permit it, paper tool drawing conventions can depict descriptions of lost and nonexistent but nonetheless describable resources. ■■ arrowed lines that represent relationships between a poem and the individuals who owned copies, those who created or received copies of the poem, etc.39 with darnton’s monograph to provide background information regarding the historical personages involved, relationships between the works and the people, document selection from archival fonds, and the point of view of the scholar, the resulting problem-solution set can: ■■ serve as enhanced documentation for darnton-style communication network analysis and discussion. 
■■ serve as an exemplar for catalogers, scholars, and alex itin’s moby-dick-themed multimedia mash-up, to the print media diagram. the four-minute long orson whales multimedia mashup contains hundreds of hand-painted page images from the novel, excerpts from the led zeppelin song “moby dick,” parts of two vocal performances by the actor orson welles, and a video clip from welles’s motion picture citizen kane. the result is shown in figure 10.36 the leftmost group of descriptions in figure 10 depicts various releases of led zeppelin’s “moby dick.” the central group depicts the sources of two orson welles audio dialogues after they had been ripped (i.e., digitized from physical media) and made available online. the grouping on the right depicts the orson whales mash-up itself and collections of digital images of painted pages created from two printed copies of the novel. exemplar 2. poetry and the police—archival content identification and critical analysis problem examine archival collections and select, describe, and document ownership and other relationships of a set of documents (poems) alleged to have circulated within a loosely defined social group. solution 1 in his 2010 work, poetry and the police: communication networks in eighteenth-century paris, historian robert darnton studied a 1749 paris police investigation into the transmission of poems highly critical of the french king, louis xv. after combing state archives for police reports, finding and identifying scraps of paper once held as evidence, and collecting other archival materials, darnton was able to construct a poetry communication network diagram,37 which, along with his narrative account, identified a number of parties who owned, copied, and transmitted six of the scandalous poems and placed their activities in a political, social, and literary context. figure 10. a resource description diagram of alex itin’s moby-dick multimedia work, depicting the resources and their frbr descriptions. cataloging theory in search of graph theory and other ivory towers | murray and tillett 181 with all of the adaptations and excerpts extant within a specified bibliographic universe (such as the cataloging records that appear in oclc’s worldcat bibliographic database). resource description diagrams, created from real-world or theoretically motivated considerations, would then provide a diagrammatic means for depicting the precise and flexible underlying mathematical ideas that, heretofore unrecognized but nonetheless systematically employed, serve resource description ends. if the structure of a well-motivated and constructed resource description diagram subsequently makes data representation and management requirements that a given information system cannot accommodate, cataloging theorists and information technologists alike will then know of that system’s limitations, will work together on mitigating them, and will embark on improving system capabilities. 
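the disconnectedness analysis described for exemplar 1 is one place where such requirements become concrete. the fragment below is a sketch only, with invented printing identifiers: given predecessor/successor links between printings, it enumerates the connected “islands” so that gaps in a recorded publication history can be listed rather than spotted by eye.

# an illustrative sketch with invented printing identifiers: find the
# connected "islands" in a predecessor/successor network of printings.

links = [("printing-1851-a", "printing-1851-b"),
         ("printing-1851-b", "printing-1855"),
         ("printing-1892", "printing-1920")]        # an isolated pair

neighbors = {}
for earlier, later in links:                        # undirected adjacency
    neighbors.setdefault(earlier, set()).add(later)
    neighbors.setdefault(later, set()).add(earlier)

def component(start, seen):
    # collect every printing reachable from `start`
    stack, found = [start], set()
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            found.add(node)
            stack.extend(neighbors.get(node, ()))
    return found

seen, islands = set(), []
for printing in neighbors:
    if printing not in seen:
        islands.append(component(printing, seen))

print(len(islands))   # -> 2: the 1851-1855 chain and the isolated 1892-1920 pair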
■■ cataloging theory, tool-making, education, and practice this modernized resource description theory offers new and enhanced roles and benefits for cultural heritage personnel as well as for the scholars, students, and those members of the general public who require support not just for searching, but also for collecting, reading, writing, collaborating, monitoring, etc.40 information systems that others who seek similar solutions to their problems with identifying, describing, depicting, and discussing as individual works documents ordinarily bundled within hierarchically structured archival fonds at multiple locations. ■■ a paper tool into a power tool there are limits to what can be done with a hand-drawn frbr paper tool. while murray was able to depict largescale bibliographic relationships that probably had not been observed before, he was forced to stop work on the moby-dick diagram because much of the useful information available could not fit into a static, hand-drawn diagram. we think that automated assistance in creating resource description diagrams from bibliographic records is required. with that capability available, cataloging theorists and parties with scholarly and pedagogical interests could interactively and efficiently explore how scholars and sophisticated readers describe significant quantities of analog and digital resources. it would then be possible and extremely useful to be able to initiate a scholarly discussion or begin a lecture by saying, “given a moby-dick resource description network . . . ” and then proceed to argue or teach from a diagram depicting all known printings of moby-dick—along figure 11. a section of darnton’s poetry communication network 182 information technology and libraries | december 2011 the value of non-euclidean geometry lies in its ability to liberate us from preconceived ideas in preparation for the time when exploration of physical laws might demand some geometry other than the euclidean.41 taking riemann to heart, we assert that the value of describing cultural heritage resources as observations organized into graphs and of enhancing and supplementing the resource description exemplars that have evolved over time and circumstance rests in opportunities for liberating the cultural heritage community from preconceived ideas about resource description structures and from longstanding points of view on those resources. having achieved such a goal, the cultural heritage community would then be ready when the demand came for resource description structures that must be more flexible and powerful than the traditional ones. given the unprecedented development of the web and the promise of bottom-up semantic web initiatives, we think that the time for the cultural heritage community’s liberation is at hand. ■■ acknowledgments the authors wish to thank beacher wiggins and dianne van der reyden, directors of the library of congress acquisitions and bibliographic access directorate and the preservation directorates, respectively, for supporting the authors’ efforts to explore and renew the scientific and mathematical foundations of cultural heritage resource description. thanks also to marcia ascher, david hay, robert darnton, daniel huson, and mark ragan, whose scholarship informed our own; and to joanne o’brienlevin for her critical eye and for editorial advice. references and notes 1. oed online, “catalogue, n.” http://www.oed.com/view dictionaryentry/entry/28711 (accessed aug. 10, 2011). 2. 
peter galison, “part ii: building data,” in image & logic: a material culture of microphysics (chicago: univ. of chicago pr., 2003): 370–431. 3. gordon mcquat, “cataloguing power: delineating ‘competent naturalists’ and the meaning of species in the british museum,” british journal for the history of science 34, no. 1 (mar. 2001): 1–28. exclusive control of classification schemes and of the records that named and described its specimens are said to have contributed to the success of the british museum’s institutional mission in the nineteenth century. as a division of the british museum, the british library appears to have incorporated classification concepts (hierarchical structuring) from its parent and elaborated on the museum’s strategies for cataloging species. 4. oed online, “observation, n.” http://www.oed.com/ viewdictionaryentry/entry/129883 (accessed july 8, 2011). couple modern, high-level understandings about how cultural heritage resources can be described, organized, and explored with data models that support linking within and across multiple points of view will be able to support those requirements. the complementarity of cosmological and quantum-level views cataloging theory formation and practice—two areas of activity that did not interest many outside of cultural heritage institutions—can now be understood as a much more comprehensive multilayered activity that is approachable from at least two distinct points of view. the approach presented in this paper represents a cosmological-level view on the bibliographic universe. this treatment of existing or imaginable large-scale configurations of cultural heritage resource descriptions serves as a complement to the quantum-level view of resource description, as characterized by it-related specificities such as character sets, identifiers, rdf triples, triplestores, etc. activities at the quantum level—the domain of semantic web technologists and others—yield powerful and relatively unconstrained information management systems. in the absence of cosmological-level inspiration or guidance, these systems have not necessarily been tested against nontrivial, challenging cultural heritage resource description scenarios like those documented in the above two exemplars. applying both views to the bibliographic universe would clearly be beneficial for all institutional and individual parties involved. if ever a model for multilevel, multidisciplinary effort was required, the history of physics is illuminated by mutually influential interactions of cosmological and quantum-level theories, practices, and pedagogy. workers in cultural heritage institutions and technologists pursuing w3c initiatives would do well to reflect on the result. ■■ ready for the future—and creating the future to explore the cultural, scientific, and mathematical ideas underlying cultural heritage resource description, to identify, study, and teach with exemplars, and to exploit the theoretical reach and bookkeeping capability of paper tool –like techniques is to pay homage to the cultural heritage community’s 170+ year-old talent for pragmatic, implementation-oriented thinking,while at the same time pointing out a rich set of possibilities for enhanced service to society. the cultural heritage community can draw inspiration from geometrician bernhard riemann’s own justification for his version of thinking outside of the box called euclidean geometry: cataloging theory in search of graph theory and other ivory towers | murray and tillett 183 18. 
the prospects for creating graph-theoretical functions that operate on resource description networks are extremely promising. for example, combinatorica (an implementation of graph theory concepts created for the computer mathematics application mathematica) is composed of more than 450 functions. were cultural heritage resource description networks to be defined using this application’s graph-friendly data format, significant quantities of combinatorica functions would be available for theoretical and applied uses; siriam pemmaraju and steven skiena, computational discrete mathematics: combinatorics and graph theory with mathematica (new york: cambridge univ. pr., 2003). 19. dénes könig, theory of finite and infinite graphs, trans. richard mccoart (boston: birkhaüser, 1990); fred buckley and marty lewinter, a friendly introduction to graph theory (upper saddle river, n.j.: pearson, 2003); oystein ore and robin wilson, graphs and their uses (washington d.c.: mathematical association of america, 1990). 20. leonhard euler, “solutio problematis ad geometriam situs pertinentis,” commentarii academiae scientarium imperalis petropolitanae no. 8 (1736): 128–40. 21. “set theory, branch of mathematics that deals with the properties of well-defined collections of objects, which may or may not be of a mathematical nature, such as numbers or functions. the theory is less valuable in direct application to ordinary experience than as a basis for precise and adaptable terminology for the definition of complex and sophisticated mathematical concepts.” quoted from encyclopædia britannica online, “set theory,” oct. 2010, http://www.britannica.com/ebchecked/ topic/536159/set-theory (accessed oct. 27, 2010). 22. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records: final report (munich: k.g. saur, 1998). this document is downloadable as a pdf from http://www.ifla.org/vii/s13/ frbr/frbr.pdf or as an html page at http://www.ifla.org/vii/ s13/frbr/frbr.htm. 23. ursula klein, ed., experiments, models, paper tools: cultures of organic chemistry in the nineteenth century (stanford, calif.: stanford univ. pr., 2003); klein, ed., tools and modes of representation in the laboratory sciences (boston: kluwer, 2001); david kaiser, drawing theories apart: the dispersion of feynman diagrams in postwar physics (chicago: univ. of chicago pr., 2005). 24. for more examples and a general description of feynman diagrams, see http://www2.slac.stanford.edu/vvc/theory/ feynman.html. 25. an enlarged version of this diagram may be found online. ronald j. murray and barbara b. tillett, “frbr paper tool diagram elements and the frbr resource description graphs they depict,” aug. 2011, http://arizona.openrepository.com/ arizona/bitstream/10150/139769/2/fig%208%20frbr%20 paper%20tool%20elements%20and%20graphs.pdf. other informative illustrations also are available. murray and tillett, “resource description diagram supplement to ‘cataloging theory in search of graph theory and other ivory towers. object: cultural heritage resource description networks,” aug. 2011, http://hdl.handle.net/10150/139769. 26. thomas s. kühn, the structure of scientific revolutions, 2nd ed. (chicago: univ. of chicago pr., 1970). 27. daniel vila suero, “use case report,” world wide web consortium, june 27, 2011, http://www.w3.org/2005/ incubator/lld/wiki/usecasereport. 5. david c. 
hay, uml and data modeling: a vade mecum for modern times (bradley beach, n.j.: technics pr., forthcoming 2011): 124–25. some scholars argue that decisions as to what the things of interest are and the categories they belong to are influenced by social and political factors. geoffrey c. bowker, susan leigh star, sorting things out: classification and its consequences (cambridge, mass.: mit pr., 1999). 6. gerald holton, “the roots of complementarity,” daedalus 117, no. 3 (1988): 151–97, http://www.jstor.org/stable/20023980 (accessed feb. 24, 2011). 7. niels bohr, quoted in aage petersen, “the philosophy of niels bohr,” bulletin of the atomic scientists 19, no. 7 (sept. 1963): 12. 8. niels bohr, “quantum physics and philosophy: causality and complementarity,” in essays 1958–1962 on atomic physics and human knowledge (woodbridge, conn.: ox bow, 1997): 7. 9. for cataloging theorists, the description of cultural heritage things of interest yields groups of statements that occupy different levels of abstraction. upon regarding a certain physical object, a marketer describes product features, a linguist enumerates utterances, a scholar perceives a work with known or inferred relationships to other works, and so on. 10. marcia ascher, ethnomathematics: a multicultural view of mathematical ideas (pacific grove, calif.: brooks/cole, 1991); ascher, mathematics elsewhere: an exploration of ideas across cultures (princeton: princeton univ. pr., 2002). 11. a timeline of events, people, and so on that have had or should have had an impact on describing cultural heritage resources is available online. seven fields or subfields are represented in the timeline and keyed by color: library & information science; mathematics; ethnomathematics; physical sciences; biological sciences; computer science; and arts & literature. ronald j. murray, “the library organization problem,” dipity .com, aug. 2011, http://www.dipity.com/rmur/libraryorganization-problem/ or http://www.dipity.com/rmur/ library-organization-problem/?mode=fs (fullscreen view). 12. barbara ann barnett tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (phd diss., university of california, los angeles, 1987); elaine svenonius, the intellectual foundation of information organization (cambridge, mass.: mit pr., 2000): 32–51. svenonius’s definition is opposed to database implementations that permitted boolean operations on records at retrieval time. 13. ronald j. murray, “the graph-theoretical library,” slideshare.net, july 5 2011, http://www.slideshare.net/ ronmurray/-the-graph-theoretical-library. 14. francis j. witty, “the pinakes of callimachus,” library quarterly 28, no. 1–4 (1958): 132–36. 15. ronald j. murray, “re-imagining the bibliographic universe: frbr, physics, and the world wide web,” slideshare .net, oct. 22 2010, http://www.slideshare.net/ronmurray/frbrphysics-and-the-world-wide-web-revised. 16. for an overview of the technology-driven library linked data initiative, see http://linkeddata.org/faq. murray’s analyses of cultural heritage resource descriptions may be explored in a series of slideshows at http://www.slideshare.net/ronmurray/. 17. pat riva, martin doerr, and maja žumer, “frbroo: enabling a common view of information from memory institutions,” international cataloging & bibliographic control 38, no. 2 (june 2009): 30–34. 184 information technology and libraries | december 2011 36. 
the multimedia mash-up in figure 10 was linked to the much larger moby-dick structure depicted in figure 9. the combination of the two yields figure 10a, which is too detailed for printout but which can be downloaded for inspection as the following pdf file: ronald j. murray and barbara b. tillett, “transfer and transformation of content across cultural heritage resources: a moby-dick resource description network covering full-length printings from 1851–1976*,” july 2011, http://arizona.openrepository.com/arizona/bitstream/10150/136270/4/fig%2010a%20orson%20whales%20 in%20moby%20dick%20context.pdf. in the figure, two print publications have been expanded to reveal their own similar mash-up structure. 37. robert darnton, poetry and the police: communication networks in eighteenth-century paris (cambridge, mass.: belknap pr. of harvard univ. pr., 2010): 16. 38. ronald j. murray in a discussion with robert darnton, sept. 20, 2010. darnton considered the poems retrieved from the archives as distinct intellectual creations, which permitted the use of frbr diagram elements for the analysis. otherwise, a paper tool with diagram elements based on the archival descriptive standard isad(g) would have been used. committee on descriptive standards, isad (g): general international standard archival description (stockholm, sweden, 1999– ). 39. the complete poetry communication diagram may be viewed at http://arizona.openrepository.com/arizona/ bitstream/10150/136270/6/fig%2011%20poetry%20commun ication%20network.pdf. 40. carole l. palmer, lauren c. teffeau, and carrie m. pittman, scholarly information practices for the online environment: themes from the literature and implications for library science development (dublin, ohio: oclc research, 2009), http://www . o c l c . o rg / p ro g r a m s / p u b l i c a t i o n s / re p o r t s / 2 0 0 9 0 2 . p d f (accessed july 15, 2011). 41. g. f. b. riemann, quoted in marvin j. greenberg, euclidean and non-euclidean geometry: development and history (new york: freeman, 2008): 371. 28. wikipedia.org, “use case,” june 13, 2011, http://en .wikipedia.org/wiki/use_case. 29. kaiser, drawing theories, 385–86. 30. prime examples being jacques derrida’s typographically complex 1974 work glas (univ. of nebraska pr.), and reality hunger: a manifesto (vintage), david shield’s 2011 textual mashup on the topic of originality, authenticity, and mash-ups in general. 31. graeme simsion, data modeling: theory and practice (bradley beach, n.j.: technics, 2007): 333. 32. herman melville, moby-dick (new york: harper & brothers; london: richard bentley, 1851). moby-dick edition publication history excerpted from g. thomas tanselle, checklist of editions of moby-dick 1851–1976. issued on the occasion of an exhibition at the newberry library commemorating the 125th anniversary of its original publication (evanston, ill.: northwestern univ. pr.; chicago: newberry library, 1976). 33. ronald j. murray, “from moby-dick to mash-ups: thinking about bibliographic networks,” slideshare.net, apr. 2011, http://www.slideshare.net/ronmurray/from-mobydick-to-mashups-revised. the moby-dick resource description diagram was presented to the american library association committee on cataloging: description and access at the ala annual conference, washington d.c., july 2010. 34. the life and works of herman melville, melville.org, july 25, 2000, http://melville.org. 35. the new york artist alex itin describes his creation: “it is more or less a birthday gift to myself. 
i’ve been drawing it on every page of moby dick (using two books to get both sides of each page) for months. the soundtrack is built from searching ‘moby dick’ on youtube (i was looking for orson’s preacher from the the [sic] john huston film) . . . you find tons of led zep [sic] and drummers doing bonzo and a little orson . . . makes for a nice melville in the end. cinqo [sic] de mayo i turn forty. ahhhhhhh the french champagne.” quoted from alex itin, “orson whales,” youtube, jan. 2011, http://www.youtube .com/watch?v=2_3-gem6o_g. 2 information technology and libraries | december 2008 andrew k. pacepresident’s message i n my first column, i mentioned that the lita board’s main objective is “to oversee the affairs of the division during the period between meetings.” of course, oversight requires communication. sometimes this is among board members, or it’s an e-mail update, or a post to the lita-l discussion list, or even the articles in this journal. regardless, i see the cornerstone of “between-meeting oversight” as keeping the membership fully (or even partially) engaged from january through june and july through december. as a mea culpa for the board, but without placing the blame on any one individual, i am willing to concede that the board has not done an adequate job of engaging the membership between american library association (ala) meetings. while ala itself is addressing this problem with recommendations for virtual participation and online collaboration, lita should be at the forefront of setting the benchmark for virtual communication, participation, education, planning, and membership development. in an attempt to posit some solutions, as opposed to finding someone to blame, i first thought of the lita committees. which one should be responsible for communicating lita opportunities and events to the membership using twenty-first-century technology? education? membership? web coordinating? program planning? publications? in the end, i was left with the choice of two evils: merge all the committees into one so that they can do everything or create a new committee to deal with the perceived problem. knowing that neither of those solutions will suffice, i’d like to put the onus back on the membership. maybe i’m trying to be a 2.0 librarian—crowdsourcing the problem, that is, taking the task that might have been done by an individual or committee and asking for more of a community-driven solution. in the past, lita focused on the necessary technologies for crowdsourcing—discussion lists, blogs, and wikis—as if the technology alone could solve the problem. the bigwig taskforce and web coordinating committee have shouldered the burden of both implementing the technology and gaining philosophical consensus on its use—a daunting task that can easily appear chaotic. now that the technology is commoditized (and generally embraced by ala at large and other divisions as well), perhaps it is time to embrace the philosophy of crowdsourcing. maybe it’s just because i have had cloud computing and web-scale architectures on the brain too much lately (having decided that it is impossible to serve two masters—job and volunteer work—i shall forever endeavor to find the overlap between the two), but i sincerely believe that repeating the mantra that lita’s strength is its membership is not mere rhetorical lipservice. ebay is better for sellers because there are so many buyers; it is better for buyers because there are so many sellers. 
googledocs works for sharing documents better than a corporate wiki or microsoft sharepoint because it breaks down the barriers of domains, allowing the participants to determine who shares responsibility for producing something. barcamps are rising in popularity not only because of a content focus on open data, open source, and open access, but because of the participatory and usergenerated style of the barcamp-style meetings. as a division of ala, lita has two challenges— leading the efforts of educating the membership, other divisions, and ala about impending sea changes in information technology, but also embracing these technologies itself. we must eat our own dog food, as the saying goes. perhaps it is more fitting to suggest that lita must not only focus on getting technology to work, but putting technology to work. in the next few months, the lita board will be tackling lita’s strategic plan, which expires in 2008. that means it is time not only to review the strategy—to educate, to serve, to reach out—but also to assess the tactics employed to fulfill that strategy. you are probably reading this column in or after the month in which the strategic plan ends, which does not mean that we will be coasting into the ala midwinter meeting. on the contrary, i sincerely hope to gather enough information from committees, task forces, members, and nonmembers in order for the lita leadership to come up with something strategically meaningful going into the next decade. one year isn’t nearly long enough to see something this big through to completion. just as national politicians begin reelection campaigns as soon as they are elected, i suspect that ala divisional presidents begin thinking about their legacy within the first couple months of office, if not before. but i hope, at least, to establish some groundwork, including a platform strategy that will allow the membership to maintain a connection with the board and with other members—to crowdsource solutions on a scale that has not been attempted in the past and that will solidify our future. and when we have a plan, you can trust that we will use all the available methods at our disposal to promote it and solicit your feedback. andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. laneconnex | ketchell et al. 31 laneconnex: an integrated biomedical digital library interface debra s. ketchell, ryan max steinberg, charles yates, and heidi a. heilemann this paper describes one approach to creating a search application that unlocks heterogeneous content stores and incorporates integrative functionality of web search engines. laneconnex is a search interface that identifies journals, books, databases, calculators, bioinformatics tools, help information, and search hits from more than three hundred full-text heterogeneous clinical and bioresearch sources. the user interface is a simple query box. results are ranked by relevance with options for filtering by content type or expanding to the next most likely set. the system is built using component-oriented programming design. the underlying architecture is built on apache cocoon, java servlets, xml/xslt, sql, and javascript. the system has proven reliable in production, reduced user time spent finding information on the site, and maximized the institutional investment in licensed resources. 
most biomedical libraries separate searching for resources held locally from external database searching, requiring clinicians and researchers to know which interface to use to find a specific type of information. google, amazon, and other web search engines have shaped user behavior and expectations.1 users expect a simple query box with results returned from a broad array of content ranked or categorized appropriately with direct links to content, whether it is an html page, a pdf document, a streaming video, or an image. biomedical libraries have transitioned to digital journals and reference sources, adopted openurl link resolvers, and created institutional repositories. however, students, clinicians, and researchers are hindered from maximizing this content because of proprietary and heterogeneous systems. a strategic challenge for biomedical libraries is to create a unified search for a broad spectrum of licensed, open-access, and institutional content.
■■ background
studies show that students and researchers will use the search path of least cognitive resistance.2 ease and speed are the most important factors for using a particular search engine. a university of california report found that academic users want one search tool to cover a wide information universe, multiple formats, full-text availability to move seamlessly to the item itself, intelligent assistance and spelling correction, results sorted in order of relevance, help navigating large retrievals by logical subsetting and customization, and seamless access anytime, anywhere.3 studies of clinicians in the patient-care environment have documented that effort is the most important factor in whether a patient-care question is pursued.4 for researchers, finding and using the best bioinformatics tool is an elusive problem.5 in 2005, the lane medical library and knowledge management center (lane) at the stanford university medical center provided access to an expansive array of licensed, institutional, and open-access digital content in support of research, patient care, and education. as at most of its peer libraries, lane users were required to use scores of different interfaces to search external databases and find digital resources. we created a local metasearch application for clinical reference content, but it did not integrate result sets from disparate resources. a review of federated-search software in the marketplace found that products were either slow or they limited retrieval when faced with a broad spectrum of biomedical content. we decided to build on our existing application architecture to create a fast and unified interface. a detailed analysis of lane website-usage logs was conducted before embarking on the creation of the new search application. key points of user failure in the existing search options were spelling errors that could easily be corrected to avoid zero results; lack of sufficient intuitive options to move forward from a zero-results search or change topics without backtracking; lack of use of existing genre or role searches; confusion about when to use the resource, openurl resolver, or pubmed search to find a known item; and results that were cognitively difficult to navigate.
studies of web search-engine and pubmed search logs concurred with our usage-log analysis: a single-term search is the most common, with three words maximum entered by typical users.6 a pubmed study found that 22 percent of user queries were for known items rather than for a general subject, confirming our own log-analysis findings that the majority of searches were for a particular source item.7 search-term analysis revealed that many of our users were entering partial article citations (e.g., author, date) in any query box, expecting that article databases would be searched concurrently with the resource database.
debra s. ketchell (debra.ketchell@gmail.com) is the former associate dean for knowledge management and library director; ryan max steinberg (ryan.max.steinberg@stanford.edu) is the knowledge integration programmer/architect; charles yates (charles.yates@stanford.edu) is the systems software developer; and heidi a. heilemann (heidi.heilemann@stanford.edu) is the former director for research & instruction and current associate dean for knowledge management and library director at the lane medical library & knowledge management center, information resources & technology, stanford university school of medicine, stanford, california.
our displayed results were sorted alphabetically, and each version of an item was displayed separately. for the user, this meant a cluttered list with redundant title information that increased their cognitive effort to find meaningful items. overall, users were confronted with too many choices upfront and too few options after retrieving results. focus groups of faculty and students were conducted in 2005. attendees wanted local information integrated into the proposed single search. local information included content such as how-to information, expertise, seminars, grand rounds, core lab resources, drug formulary, patient handouts, and clinical calculators. most of this content is restricted to the stanford user population. users consistently described their need for a simple search interface that was fast and customized to the stanford environment. in late 2005, we embarked on a project to design a search application that would address both existing points of failure in the current system and meet the expressed need for a comprehensive discovery-and-finding tool as described in focus groups. the result is an application called laneconnex.
■■ design objectives
the overall goal of laneconnex is to create a simple, fast search across multiple licensed, open-access, and special-object local knowledge sources that depackages and reaggregates information on the basis of stanford institutional roles. the content of lane's digital collection includes forty-five hundred journal titles and forty-two thousand other digital resources, including video lectures, executable software, patient handouts, bioinformatics tools, and a significant store of digitized historical materials as a result of the google books program. media types include html pages, pdf documents, jpeg images, mp3 audio files, mpeg4 videos, and executable applications. more than three hundred reference titles have been licensed specifically for clinicians at the point of care (e.g., uptodate, emedicine, stat-ref, and micromedex clinical evidence). clinicians wanted their results to reflect subcomponents of a package (e.g., results from the micromedex patient handouts).
other clinical content is institutionally managed (e.g., institutional formulary, lab test database, or patient handouts). more than 175 biomedical research tools have been licensed or selected from open-access content. the needs of biomedical researchers include molecular biology tools and software, biomedical literature databases, citation analysis, chemical and engineering databases, expertise-finding tools, laboratory tools and supplies, institutional-research resources, and upcoming seminars. the specific objectives of the search application are the following:
■■ the user interface should be fast, simple, and intuitive, with embedded suggestions for improving search results (e.g., did you mean? didn't find it? have you tried?).
■■ search results from disparate local and external systems should be integrated into a single display based on popular search-engine models familiar to the target population.
■■ the query-retrieval and results display should be separated and reusable to allow customization by role or domain and future expansion into other institutional tools.
■■ resource results should be ranked by relevance and filtered by genre.
■■ metasearch results should be hit counts and filtered by category for speed and breadth. results should be reusable for specific views by role.
■■ finding a known article or journal should be streamlined and directly link to the item or "get item" option.
■■ the most popular search options (pubmed, google, and lane journals) should be ubiquitous.
■■ alternative pathways should be dynamic and interactive at the point of need to avoid backtracking and dead ends.
■■ user behavior should be tracked by search term, resource used, and user location to help the library make informed decisions about licensing, metadata, and missing content.
■■ off-the-shelf software should be used when available or appropriate, with development focused on search integration.
■■ the application should be built upon existing metadata-creation systems and trusted web-development technologies.
based on these objectives, we designed an application that is an extension of existing systems and technologies. resources are acquired and metadata are provided using the voyager integrated library system (ils). the sfx openurl link resolver provides full-text article access and expands the title search beyond biomedicine to all online journals at stanford. ezproxy provides seamless off-campus access. webtrends provides usage tracking. movable type is used to create faq and help information. a locally developed metasearch application provides a cross search with hit results from more than three hundred external and internal full-text sources. the technologies used to build laneconnex and integrate all of these systems include extensible stylesheet language transformations (xslt), java, javascript, the apache cocoon project, and oracle.
■■ systems description
architecture
laneconnex is built on a principle of separation of concerns. the lane content owner can directly change the inclusion of search results, how they are displayed, and additional path-finding information. application programmers use java, javascript, xslt, and structured query language (sql) to create components that generate and modify the search results. the merger of content design and search results occurs "just in time" in the user's browser. we use component-oriented programming design whereby services provided within the application are defined by simple contracts.
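the next paragraph describes these components as "transformers." as a rough, hypothetical illustration of what such a contract amounts to (this is not cocoon's actual transformer api, which is sax/stream-based), a service can be defined purely by what it consumes and produces:

```java
import java.util.List;
import org.w3c.dom.Document;

// hypothetical sketch of a component contract; cocoon's real transformer
// interface is stream-based and more involved than this.
interface XmlTransformer {
    Document transform(Document input) throws Exception;
}

// components that fulfill the same contract can be chained into a pipeline,
// mirroring the pipeline-processing model described in the text.
class Pipeline {
    static Document run(Document input, List<XmlTransformer> stages) throws Exception {
        Document current = input;
        for (XmlTransformer stage : stages) {
            current = stage.transform(current);
        }
        return current;
    }
}
```

because each stage depends only on the contract, one implementation can be swapped for another without touching the rest of the pipeline, which is the property relied on when a built-in component does not do what is needed and a new one must be written.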
in laneconnex, these components (called “transformers”) consume xml information and, after transforming it in some way, pass it on to some other component. a particular contract can be fulfilled in different ways for different purposes. this component architecture allows for easy extension of the underlying apache cocoon application. if laneconnex needs to transform some xml data that is not possible with built-in cocoon transformers, it is a simple matter to create a software component that does what is needed and fulfills the transformer contract. apache cocoon is the underlying architecture for laneconnex, as illustrated in figure 1. this java servlet is an xml–publishing engine that is built upon a component framework and uses a pipeline-processing model. a declarative language uses pattern matching to associate sets of processing components with particular request urls. content can come from a variety of sources. we use content from the local file system, network file system, http, and a relational database. the xslt language is used extensively in the pipelines and gives fine control of individual parts of the documents being processed. the end of processing is usually an xhtml document but can be any common mime type. we use cocoon to separate areas of concern so things like content, look and feel, and processing can all be managed as separate entities by different groups of people with little effect on another area. this separation of concerns is manifested by template documents that contain most of the html content common to all pages and are then combined with content documents within a processing pipeline. the declarative nature of the sitemap language and xslt facilitate rapid development with no need to redeploy the entire application to make changes in its behavior. the laneconnex search is composed of several components integrated into a query-and-results interface: oracle resource metadata, full-text metasearch application, movable type blogging software, “did you mean?” spell checker, ezproxy remote access, and webtrends tracking. n full-text metasearch integration of results from lane’s metasearch application illustrates cocoon’s many strengths. when a user searches laneconnex, cocoon sends his or her query to the metasearch application, which then dispatches the request to multiple external, full-text search engines and content stores. some examples of these external resources are uptodate, access medicine, micromedex, pubmed, and md consult. the metasearch application interacts with these external resources through jakarta commons http clients. responses from external resources are turned into w3c document object model (dom) objects, and xpath expressions are used to resolve hit counts from the dom objects. as result counts are returned, they are added to an xml–based result list and returned to cocoon. the power of cocoon becomes evident as the xml– based metasearch result list is combined with a separate display template. this template-based approach affords content curators the ability to directly add, group, and describe metasearch resources using the language and look that is most meaningful to their specific user communities. for example, there are currently eight metasearch templates curated by an informationist in partnership with a target community. curating these templates requires little to no assistance from programmers. 
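as a rough sketch of the hit-count resolution described above, the following java fragment sends a query to one external resource and resolves a count from the xml response with an xpath expression. the url template and xpath are hypothetical placeholders for per-resource configuration, and the standard java.net and javax.xml apis stand in for the jakarta commons http client used in production:

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// one external resource: where to send the query and how to read a hit count back
class MetasearchSource {
    final String name;          // display name used in the results template
    final String urlTemplate;   // hypothetical: contains a {query} placeholder
    final String hitCountXPath; // xpath that resolves to the result count in the response

    MetasearchSource(String name, String urlTemplate, String hitCountXPath) {
        this.name = name;
        this.urlTemplate = urlTemplate;
        this.hitCountXPath = hitCountXPath;
    }
}

class HitCountFetcher {
    // query one source and extract a hit count from its xml response
    static int fetchHitCount(MetasearchSource source, String query) throws Exception {
        String url = source.urlTemplate.replace("{query}", URLEncoder.encode(query, "UTF-8"));
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        // turn the response into a dom document, as described in the text
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(conn.getInputStream());
        // resolve the hit count with the configured xpath expression
        Number count = (Number) XPathFactory.newInstance().newXPath()
                .evaluate(source.hitCountXPath, doc, XPathConstants.NUMBER);
        return count.intValue();
    }
}
```

counts returned this way would then be appended to the xml result list and merged downstream with a curator-maintained display template.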
in lane’s 2005 interface, a user’s request was sent to the metasearch application, and the application waited five seconds before responding to give external resources a chance to return a result. hit counts in the user interface included a link to refresh and retrieve more results from external resources that had not yet responded. usability studies showed this to be a significant user barrier, since the refresh link was rarely clicked. the initial five second delay also gave users the impression that the site was slow. the laneconnex application makes heavy use of javascript to solve this problem. after a user makes her initial request, javascript is used to poll the metasearch application (through cocoon) on the user’s behalf, popping in result counts as external resources respond. this adds a level of interactivity previously unavailable and makes the metasearch piece of laneconnex much more successful than its previous version. resource metadata laneconnex replaces the catalog as the primary discovery interface. metadata describing locally owned and 34 information technology and libraries | march 2009 licensed resources (journals, databases, books, videos, images, calculators, and software applications) are stored in the library’s current system of record, an instance of the voyager ils. laneconnex makes no attempt to replace voyager ’s strengths as an application for the selection, acquisition, description, and management of access to library resources. it does, however, replace voyager ’s discovery interface. to this end, metadata for about eight thousand digital resources is extracted from voyager ’s oracle database, converted into marcxml, processed with xslt, and stored in a simple relational database (six tables and twenty-nine attributes) to support fast retrieval speed and tight control over search syntax. this extraction process occurs nightly, with incremental updates every five minutes. the oracle text search engine provides functionality anticipated by our internet-minded users. key features are speed and relevance-ranked results. a highly refined results ranking insures that the logical title appears in the first few results. a user ’s query is parsed for wildcard, boolean, proximity, and phrase operators, and then translated into an sql query. results are then transformed into a display version. related services laneconnex compares a user’s query terms against a dictionary. each query is sent to a cocoon spell-checking component that returns suggestions where appropriate. this component currently uses the simple object figure 1. laneconnex architecture. laneconnex | ketchell et al. 35 access protocol (soap)–based spelling service from google. google was chosen over the national center for biotechnology information (ncbi) spelling service because of the breadth of terms entered by users; however, cocoon’s component-oriented architecture would make it trivial to change spell checkers in the future. each query is also compared against stanford’s openurl link resolver (findit@stanford). client-side javascript makes a cocoon-mediated query of findit@stanford. using xslt, findit@stanford responses are turned into javascript object notation (json) objects and popped into the interface as appropriate. although the vast majority of laneconnex searches result in zero findit@stanford results, the convenience of searching all of lane’s systems in a single, unified interface far outweighs the effort of implementation. 
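as an illustration of the query-to-sql translation described earlier in this section, an oracle text search with relevance ranking can be expressed roughly as follows. the table and column names are hypothetical (the real store is the six-table schema described above), and the real parser also rewrites wildcard, boolean, proximity, and phrase operators into oracle text syntax before binding the query:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

class ResourceSearch {
    // run a relevance-ranked search against the extracted resource metadata;
    // "resource" and "keywords" are hypothetical names used for illustration only
    static void search(Connection conn, String oracleTextQuery) throws Exception {
        String sql = "SELECT title, url, SCORE(1) AS relevance "
                   + "FROM resource "
                   + "WHERE CONTAINS(keywords, ?, 1) > 0 "
                   + "ORDER BY SCORE(1) DESC";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, oracleTextQuery);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s (%s) score=%d%n",
                            rs.getString("title"), rs.getString("url"),
                            rs.getInt("relevance"));
                }
            }
        }
    }
}
```

ordering by the oracle text score is what lets a single-word title surface in the first few results, with any additional title weighting applied on top of that score.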
a commercial analytics tool called webtrends is used to collect web statistics for making data-centric decisions about interface changes. webtrends uses client-side javascript to track specific user click events. libraries need to track both on-site clicks (e.g., the user clicked on “clinical portal” from the home page) and off-site clicks (e.g., the user clicked on “yamada’s gastroenterology” after doing a search for “ibs”). to facilitate off-site click capture, webtrends requires every external link to include a snippet of javascript. requiring content creators to input this code by hand would be error prone and tedious. laneconnex automatically supplies this code for every class of link (search or static). this specialized webtrends method provides lane with data to inform both interface design and licensing decisions. n results laneconnex version 1.0 was released to the stanford biomedical community in july 2006. the current application can be experienced at http://lane.stanford.edu. the figure 2. laneconnex resource search results. resource results are ranked by relevance. single word titles are given a higher weight in the ranking algorithm to insure they are displayed in the first five results. uniform titles are used to co-locate versions (e.g., the three instances of science from different producers). journals titles are linked to their respective impact factor page in the isi web of knowledge. digital formats that require special players or restrictions are indicated. the metadata searched for ejournals, databases, ebooks, biotools, video, and medcalcs are lane’s digital resources extracted from the integrated library system into a searchable oracle database. the first “all” tab is the combined results of these genres and the lane site help and information. figure 3. laneconnex related services search enhancements. laneconnex includes a spell checker to avoid a common failure in user searches. ajax services allow the inclusion of search results from other sources for common zero results failures. for example, the stanford link resolver database is simultaneously searched to insure online journals outside the scope of biomedicine are presented as a linked result for the user. production version has proven reliable over two years. incremental user focus groups have been employed to improve the interface as issues arose. a series of vignettes will be used to illustrate how the current version of 36 information technology and libraries | march 2009 the “sunetid login” is required. n user query: “new yokrer.” a faculty member is looking for an article in the new yorker for a class reading assignment. he makes a typing error, which invokes the “did you mean?” function (see figure 3). he clicks on the correct spelling. no results are found in the resource search, but a simultaneous search of the link-resolver database finds an instance of this title licensed for the campus and displays a clickable link for the user. n user query: “pathway analysis.” a post–doc is looking for information on how to share an ingenuity pathway. figure 4 illustrates the integration of the locally created lane faqs. faqs comprise a broad spectrum of help and how-to information as described by our focus groups. help text is created in the movable type blog software, and made searchable through the laneconnex application. the movable type interface lowers the barrier to html content creation by any staff member. 
more complex answers include embedded images and videos to enable the user to see exactly how to do a particular procedure. cocoon allows for the syndication of subsets of this faq content back into static html pages where it can be displayed as both category-specific lists or as the text for scroll-over help for a link. having a single store of help information insures the content is updated once for all instances. n user query: “uterine cancer kapp.” a resident is looking for a known article. laneconnex simultaneously searches pubmed to increase the likelihood of user success (see figure 5). clicking on the pubmed tab retrieves the results in the native interface; however, the user sees the pubmed@stanford version, which includes embedded links to the article based on our openurl link resolver. the ability to retrieve results from bibliographic databases that includes article resolution insures that our biomedical community is always using the correct url to insure maximum full-text article access. user testing in 2007 found that adding the three most frequently used sources (pubmed, google, and lane catalog) into our one-box laneconnex search was a significant time saver. it addresses laneconnex meets the design objectives from the user’s perspective. n user query: “science.” a graduate student is looking for the journal science. the laneconnex results are listed in relevance order (see figure 2). singleword titles are given a higher weight in the ranking algorithm to insure they are displayed in the first five results. results from local metadata are displayed by uniform title. for example, lane has three instances of the journal science, and each version is linked to the appropriate external store. brief notes provide critical information for particular resources. for example, restricted local patient education documents and video seminars note that figure 4. example of integration of local content stores. help information is managed in moveable type and integrated into laneconnex search results. laneconnex | ketchell et al. 37 the expectation on the part of our users that they could search for an article or a journal title in a single search box without first selecting a database. n user query: “serotonin pulmonary hypertension.” a medical student is looking for the correlation of two topics. clicking on the “clinical” tab, the student sees the results of the clinical metasearch in figure 6. metasearch results are deep searches of sources within licensed packages (e.g., textbooks in md consult or a specific database in micromedex), local content (e.g., stanford’s lab-test database), and openaccess content (e.g., ncbi databases). pubmed results are tailored strategies tiered by evidence. for example, the evidence-summaries strategy retrieves results from twelve clinical-evidence resources (e.g., buj, clinical evidence, and cochrane systematic reviews) that link to the full-text licensed by stanford. an example of the bioresearch metasearch is shown in figure 7. content selected for this audience includes literature databases, funding sources, patents, structures, clinical trials, protocols, and stanford expertise integrated with gene, protein, and phenotype tools. user testing revealed that many users did not click on the “clinical” tab. the clinical metasearch was originally developed for the clinical portal page and focused on clinicians in practice; however, the results needed to be exposed more directly as part of the laneconnex search. 
figure 8 illustrates the “have you tried?” feature that displays a few relevant clinical-content sources without requiring the user to select the “clinical” tab. this feature is managed by the smartsearch component of the laneconnex system. smartsearch sends the user’s query terms to pubmed, extracts a subset of articles associated with those terms, extracts the mesh headings for those articles, and computes the frequency of headings in the articles to determine the most likely mesh terms associated with the user’s query terms. these mesh terms are mapped to mesh terms associated with each metasearch resource. preliminary evaluation indicates that the clinical content is now being discovered by more users. figure 5. example of integration of popular search engines into laneconnex results. three of the most popular searches based on usage analysis are included at the top level. pubmed and google are mapped to lane’s link resolver to retrieve the full article. creating or editing metasearch templates is a curator driven task. programming is only required to add new sources to the metasearch engine. a curator may choose from more than three hundred sources to create a discipline-based layout using general templates. names, categories, and other description information are all at the curator ’s discretion. while developing new subspecialty templates, we discovered that clinicians were confused by the difference in layout of their specialty portal and their metasearch results (e.g., the cardiology portal used the generic clinical metasearch). to address this issue, we devised an approach that merges a portal and metasearch into a single entity as illustrated in figure 9. a combination of the component-oriented architecture of laneconnex and javascript makes the integration of metasearch results into a new template patterned after a portal easy to implement. this strategy will enable the creation of templates contextually appropriate to knowledge requests originating from electronic medical-record systems in the future. direct user feedback and usage statistics confirm that search is now the dominant mode of navigation. the amount of time each user spends on the website has dropped since the release of version 1.0. we speculate that the integrated search helps our users find relevant 38 information technology and libraries | march 2009 information more efficiently. focus groups with students are uniformly positive. graduate students like the ability to find digital articles using a single search box. medical students like the clinical metasearch as an easy way to look up new topics in texts and customized pubmed searches. bioengineering students like the ability to easily look up patient care–related topics. pediatrics residents and attendings have championed the development of their portal and metasearch focused on their patient population. medical educators have commented on their ability to focus on the best information sources. n discussion a review of websites in 2007 found that most biomedical libraries had separate search interfaces for their digital resources, library catalog, and external databases. biomedical libraries are implementing metasearch software to cross search proprietary databases. 
the university of california, davis is using the metalib software to federate searching multiple bibliographic databases.8 the university of south california and florida state university are using webfeat software to search clinical textbooks.9 the health sciences library system at the university of pittsburgh is using vivisimo to search clinical textbooks and bioresearch tools.10 academic libraries are introducing new “resource shopping” applications, such as the endeca project at north carolina state university, the summa project at the university of aarhus, and the vufind project at villanova university.11 these systems offer a single query box, faceted results, spell checking, recommendations based on user input, and asynchronous javascript and xml (ajax) for live status information. we believe our approach is a practical integration for our biomedical community that bridges finding a resource and finding a specific item through figure 6. integration of metasearch results into laneconnex. results from two general, role-based metasearches (bioresearch and clinical) are included in the laneconnex interface. the first image shows a clinician searching laneconnex for serotonin pulmonary hypertension. selecting the clinical tab presents the clinical content metasearch display (second image), and is placed deep inside the source by selecting a title (third image). laneconnex | ketchell et al. 39 a metasearch of multiple databases. the laneconnex application searches across digital resources and external data stores simultaneously and presents results in a unified display. the limitation to our approach is that the metasearch returns only hit counts rather than previews of the specific content. standardization of results from external systems, particularly receipt of xml results, remains a challenge. federated search engines do integrate at this level, but are usually slow or limit the number of results. true integration awaits health level seven (hl7) clinical decision support standards and national information standards organization (niso) metasearch initiative for query and retrieval of specific content.12 one of the primary objectives of laneconnex is speed and ease of use. ranking and categorization of results has been very successful in the eyes of the user community. the integration of metasearch results has been particularly successful with our pediatric specialty portal and search. however, general user understanding of how the clinical and biomedical tabs related to the genre tabs in laneconnex has been problematic. we reviewed web engines and found a similar challenge in presenting disparate format results (e.g., video or image search results) or lists of hits from different systems (e.g., ncbi’s entrez search results).13 we are continuing to develop our new specialty portal-and-search model and our smartsearch term-mapping component to further integrate results. n conclusion laneconnex is an effective and openended search infrastructure for integrating local resource metadata and full-text content used by clinicians and biomedical researchers. its effectiveness comes from the recognition that users prefer a single query box with relevance or categorically organized results that lead them to the most likely figure 7. example of a bioresearch metasearch. figure 8. the smartsearch component embeds a set of the metasearch results into the laneconnex interface as “have you tried?” clickable links. these links are the equivalent of selecting the title from a clinical metasearch result. 
the example search for atypical malignant rhabdoid tumor (a rare childhood cancer) invokes oncology and pediatric textbook results. these texts and pubmed provide quick access for a medical student or resident on the pediatric ward. figure 9. example of a clinical specialty portal with integrated metasearch. clinical portal pages are organized so metasearch hit counts can display next to content links if a user executes a search. this approach removes the dissonance clinicians felt existed between separate portal page and metasearch results in version 1.0. 40 information technology and libraries | march 2009 answer to a question or prospects in their exploration. the application is based on separation of concerns and is easily extensible. new resources are constantly emerging, and it is important that libraries take full advantage of existing and forthcoming content that is tailored to their user population regardless of the source. the next major step in the ongoing development of laneconnex is becoming an invisible backend application to bring content directly into the user’s workflow. n acknowledgements the authors would like to acknowledge the contributions of the entire laneconnex technical team, in particular pam murnane, olya gary, dick miller, rick zwies, and rikke ogawa for their design contributions, philip constantinou for his architecture contribution, and alain boussard for his systems development contributions. references 1. denise t. covey, “the need to improve remote access to online library resources: filling the gap between commercial vendor and academic user practice,” portal libraries and the academy 3 no.4 (2003): 577–99; nobert lossau, “search engine technology and digital libraries,” d-lib magazine 10 no. 6 (2004), www.dlib.org/dlib/june04/lossau/06lossau.html (accessed mar. 1, 2008); oclc, “college students’ perception of libraries and information resource,” www.oclc.org/reports/ perceptionscollege.htm (accessed mar 1, 2008); and jim henderson, “google scholar: a source for clinicians,” canadian medical association journal 12 no. 172 (2005). 2. covey, “the need to improve remote access to online library resources”; lossau, “search engine technology and digital libraries”; oclc, “college students’ perception of libraries and information resource.” 3. jane lee, “uc health sciences metasearch exploration. part 1: graduate student gocus group findings,” uc health sciences metasearch team, www.cdlib.org/inside/assess/ evaluation_activities/docs/2006/draft_gradreport_march2006. pdf (accessed mar. 1, 2008). 4. karen k. grandage, david c. slawson, and allen f. shaughnessy, “when less is more: a practical approach to searching for evidence-based answers,” journal of the medical library association 90 no. 3 (2002): 298–304. 5. nicola cannata, emanuela merelli, and russ b. altman, “time to organize the bioinformatics resourceome,” plos computational biology 1 no. 7 (2005): e76. 6. craig silverstein et al., “analysis of a very large web search engine query log,” www.cs.ucsb.edu/~almeroth/ classes/tech-soc/2005-winter/papers/analysis.pdf (accessed mar. 1, 2008); anne aula, “query formulation in web information search,” www.cs.uta.fi/~aula/questionnaire.pdf (accessed mar. 1, 2008); jorge r. herskovic, len y. tanaka, william hersh, and elmer v. bernstam, “a day in the life of pubmed: analysis of a typical day’s query log,” journal of the american medical informatics association 14 no. 2 (2007): 212–20. 7. herskovic, “a day in the life of pubmed.” 8. 
davis libraries university of california, “quicksearch,” http://mysearchspace.lib.ucdavis.edu/ (accessed mar. 1, 2008). 9. eileen eandi, “health sciences multi-ebook search,” norris medical library newsletter (spring 2006), norris medical library, university of southern california, www.usc.edu/hsc/ nml/lib-information/newsletters.html (accessed mar. 1, 2008); maguire medical library, florida state university, “webfeat clinical book search,” http://med.fsu.edu/library/tutorials/ webfeat2_viewlet_swf.html (accessed mar. 1, 2008). 10. jill e. foust, philip bergen, gretchen l. maxeiner, and peter n. pawlowski, “improving e-book access via a librarydeveloped full-text search tool,” journal of the medical library association 95 no. 1 (2007): 40–45. 11. north carolina state university libraries, “endeca at the ncsu libraries,” www.lib.ncsu.edu/endeca (accessed mar. 1, 2008); hans lund, hans lauridsen, and jens hofman hansen, “summa—integrated search,” www.statsbiblioteket.dk/ publ/summaenglish.pdf (accessed mar. 1, 2008); falvey memorial library, villanova university, “vufind,” www.vufind.org (accessed mar. 1, 2008). 12. see the health level seven (hl7) clinical decision support working committee activities, in particular the infobutton standard proposal at www.hl7.org/special/committees/dss/ index.cfm and the niso metasearch initiative documentation at www.niso.org/workrooms/mi (accessed mar 1, 2008). 13. national center for biotechnology information (ncbi) entrez cross-database search, www.ncbi.nlm.nih.gov/entrez (accessed mar. 1, 2008). acrl 5 alcts 15 lita cover 2, cover 3 jaunter cover 4 index to advertisers author name and second author b y now, most library and information technology association (lita) members and information technology and libraries (ital) readers know that 2006 is the fortieth anniversary of lita’s predecessor, the information science and automation division (isad) of the american library association (ala). and 2007 marks the fortieth birthday of ital, first published in 1967 as the journal of library automation (jola). i hope that members and readers know the vital role played by fred kilgour in the founding of the division and as jola’s founding editor. this issue marks the initiation of a two-volume celebration (volumes 25 and 26) of his role as founding editor by publishing what we hope are significant articles resulting from original research, the development of important and creative new systems, or explications of significant new technologies that will shape future information technologies. i have invited some of the authors of these articles to submit their manuscripts. others are being submitted in response to a call i published both in an earlier editorial and in a message to the lita-l discussion list. whether invited or submitted, they will receive the same double-blind refereeing that all ital articles undergo. the referees will not know which articles have been invited or submitted for this purpose. the articles will, however, be so designated when they are published. volume 25 initiates a second landmark for ital. henceforth, ital will be published simultaneously in electronic and print versions. the electronic copy will be available to lita members and ital subscribers on the ala/lita web site. equally significantly, at the 2006 ala midwinter meeting in san antonio, the lita board of directors approved a second proposal from the lita publications committee. (the ital editor and editorial board report to the publications committee.) 
after six months, the electronic issues will be open to all, not restricted to members and subscribers. put simply, if you are a member or subscriber reading this issue in print, you may also read it and volume 25, number 1 (the march 2006 issue) on the web. when volume 25, number 3 is published in september 2006, the march issue on the web will be open for anyone to read. when the december issue is published, this june e-issue will be open to all. the web version will be published in both pdf and html formats. most ital articles now include urls, and readers will be able to link to them. most figures and graphs submitted by authors are in color; from now on, these will be available to the readers of the e-copies. ala publishing allows authors to submit their articles to institutional repositories, and many authors now do so. authors will retain this option. some articles have been posted on other portals as well. martha yee's outstanding june 2005 article on how to frbrize the opac appears not only on ucla's repository site but also on the escholarship repository site of the university of california system (http://repositories.cdlib.org/escholarship), one of the few library-related articles on the site. furthermore, on november 29, 2005, it was among the top ten most popular articles on the site. recently, dlist (http://dlist.sir.arizona.edu) at the university of arizona library received permission to include it. the decisions to allow simultaneous publication of print and electronic versions and to allow open access after six months were not made lightly. the lita board members carried on extensive electronic discussions among themselves and with nancy colyar, chair of the publications committee, and me. lita president pat mullin's summary of those discussions was more than ten single-spaced pages. nancy and i also attended a meeting of the board in san antonio. publications and memberships are two chief sources of revenue for almost all professional associations. in two surveys in the past ten years, lita members have indicated they considered ital to be their most important membership benefit. lita membership fell this year, probably because of the recent dues increases by other divisions of ala. this decline was anticipated by lita's leadership. i think both the ital editorial board and the lita leadership would love to take the additional pioneering step of making our journal a full open-access publication. however, legitimate concern was expressed that opening access after six months might lead to a decrease in both members and subscribers. a significant number of lita leaders said that their membership was based on lita programs, participation, and interaction with colleagues, not just ital. i hope that all lita members feel the same. i further hope that lita members will do everything they can to discourage their libraries from canceling their subscriptions. our financial health would be enhanced if all lita members took two other steps: participating in writing and encouraging the writing of significant articles, and encouraging the many library technology vendors they work with to advertise in ital. fred kilgour and the other founders of our division were library information technology (it) pioneers. fred's leadership helped make jola and now ital vital reading for library it professionals.
i believe that by celebrating the lita/ital anniversaries with a reconfirmation of our practice of publishing articles of the highest quality and by making ital more accessible through electronic publication, we are reaffirming the scholarly and professional commitments first made by fred kilgour and his isad colleagues such a short forty years ago. john webb

john webb (jwebb@wsu.edu) is assistant director for systems and planning, washington state university libraries, pullman, and editor of information technology and libraries.

editorial: lita and ital: forty and still counting

book reviews

automation in libraries, by r. t. kimber. oxford: pergamon press, 1968. 140 pp. $6.00.

many books have been published in recent years on the subject of library automation. very few of them, however, have succeeded in making meaningful contributions to a better understanding of the subject. this volume has made a sincere effort to be one of the few. although library automation is an ambiguous term which lacks precise definition, it is used here clearly to mean the use of computers in libraries. the book is intended for those with no computer background but who are familiar with library operations. it attempts to give a good introduction to current practices in library automation and a fairly detailed account of the state of the art. in the first chapter, "libraries and automation," mr. kimber discusses the relationship between the library and the computer. seeing the computer as a means of performing human clerical functions, he points out two important attitudes that must be observed: first, one must not change to a computer system just for the sake of changing, and second, one must be willing to change if the change means improvement. the monetary worth of the computer in the library is difficult to express because the end result is not increased profit but better service. since benefits from computer operations can be expressed in time and effort saved, these are the means of monetary comparison the author suggests. he also observes that although there are many good reasons for wanting computerized operations, some of these are merely emotional. chapter ii, "introduction to computers," is written by anne h. boyd, lecturer in computation at queen's university of belfast. miss boyd gives a brief review of the development and use of computers and discusses the fundamentals of computer systems. the next four chapters by mr. kimber present computerized systems for various library activities: chapter iii, "ordering and acquisitions"; chapter iv, "circulation control"; chapter v, "periodicals listing and accessioning"; and chapter vi, "catalogues and bibliographies." each chapter, with a minimum of technical terminology, gives a good account of what is involved in automating a particular operation. his treatment is very informative on these matters. in his final chapter (chapter vii, "the present state of automation in libraries") kimber discusses current trends of library automation and gives examples of libraries which use computers. his list is admittedly not comprehensive, but it does provide a comparison to the "ideal" systems he has described in the earlier chapters. in commenting on the future of computerized library systems, he sees these systems as an escape from the problems of everyday library operations. this book should be a good addition to the current books on library automation.
one unfortunate aspect, however, appears to be an absence of treatment regarding the psychological impact of automation on librarians and users, which is certainly one important aspect to be considered when automation of a system is proposed. also, at times the author, in attempting to simplify his discussion, has made a generalized statement without fuller explanation. this could be misleading and tend to confuse the uninitiated reader. these deficiencies are not of major consequence and do not prejudice the total work, but care should be taken in reading. sul h. lee

1968 international directory of research and development scientists. philadelphia: institute for scientific information, inc., 1969. 1352 pages (approx.). $60.00.

the second issue of the "international directory of research and development scientists" (idr&ds) lists the names and organizational addresses of 152,648 authors whose papers were listed in either

implications of marc, and the library of congress systems studies. (this paper includes twenty-eight pages of appendices, mostly charts.) two additional papers include a discussion of the future of, and a tabulation of trends affecting, library automation. much of the material in these non-survey papers is reported more completely elsewhere and some of it now seems dated. the material presented in this publication must have produced a highly effective educational institute in 1967. in 1969, its value is at best as a first reader in library automation but not as the state-of-the-art review the title proclaims. charles t. payne

computers and data processing: information sources, by chester morrill, jr. an annotated guide to the literature, associations, and institutions concerned with input, throughput, and output of data. detroit: gale research co., [1969]. 275 pp. $8.75. (management information guide, 15)

this latest volume in the management information guide series should prove as useful as its predecessors, offering to those persons interested in or concerned with computers and data processing (and who now is not?) an organized and extensive survey of the basic and necessary sources of available information. thus the text is for the most part an annotated bibliography of pertinent references arranged in broad categories, each category prefaced with a paragraph or two of comment. this is in the style of mr. morrill's earlier contribution to the series, systems and procedures including office management (1967), and, in general, that of all the volumes of the series. section 7, "operating," is the largest category, some forty pages of references subdivided into "manuals," "digital computers," "data transmission," "fortran," "software," and the like. section 9, entitled "front office references," is of particular interest to the reference librarian, since it serves as a guide to desirable dictionaries, handbooks, and abstracting services in the fields of automation and data processing. individual annotations are usually brief, informative, and on occasion evaluative. they give evidence of considerable skill in the art of capsule characterization. the prefatory paragraphs and notes to each section characterize the particular topic as successfully and succinctly as do the individual annotations. the preface to section 3, "personnel," is particularly felicitous. coverage is ample not only as to the subjects chosen but also as to numbers of references under individual subjects.
an important thirty pages of appendices lists additional sources of information (associations, manufacturers, seminars, publishers, placement firms, etc.), particularly valuable to the businessman or government official as a desk or front-office reference book, although the librarian will also find it of value in providing specific information for his clientele. in all, this is a highly competent and very welcome addition to the series as well as to the ranks of special reference sources so necessary to the proper practice of the reference librarian's art. i think of crane's a guide to the literature of chemistry and white's sources of information in the social sciences and consider the author quite comfortable in their company as well as in that of his colleagues in the series. in addition, he evinces in his annotations and prefaces a wit, a turn of phrase, and a capacity for direct statement that inform and delight the user. he displays an expertise in the fields of management and computer science, and one feels one can rely on his selection and judgment. eleanor r. devlin

centralized book processing: a feasibility study based on colorado academic libraries, by lawrence e. leonard, joan m. maier, and richard m. dougherty. metuchen, n.j.: scarecrow press, 1969. 401 pp. $10.00.

in october 1966 the national science foundation awarded a grant to the university of colorado libraries and the colorado council of librarians for research in the area of centralized processing. the project was in three phases. phase i involved an examination of the feasibility of establishing a book-processing center to serve the needs of the nine state-supported college and university libraries in colorado (which range in size from the university of colorado, with 805,959 volumes as of june 30, 1967, to metropolitan state college, a new institution with 8,310 volumes). phase ii involved a simulation study of the proposed center, while phase iii involved an operational book-processing center on a one-year experimental basis. this book summarizes the results of the first two phases of the study. phase i involved a detailed time-and-cost analysis of the acquisition, cataloging, and bookkeeping procedures in the nine participating libraries, with resultant processing costs per volume which are both convincing and somewhat startling, ranging as they do from $2.67 to $7.71 per volume. the operating specifications of the proposed book-processing center are then set forth, and a mathematical model for simulating its operations under a variety of alternative conditions is prepared. the conclusions are less than surprising: "a centralized book processing center to serve the needs of the academic libraries in colorado is a viable approach to book processing." project benefits are enumerated in the areas of cost savings, time-lag reductions, and the more efficient utilization of personnel. unfortunately, while many of the conclusions are buttressed by a dazzling array of tables and mathematical formulas (how can most librarians really argue with a regression analysis correlation coefficient matrix?), some of the most important savings cited are based on simple guesses, in some cases very simple guesses. to mention just two examples: 1) we are told that "a discount advantage expected through the use of combined ordering and a larger volume of ordering is conservatively estimated at 5% ... " (perhaps, but what is this based on?)
2) in the area of time lag reduction, "the greatest savings in time will accrue when the center is able to purchase materials from a vendor who has built up his book stock to reflect the needs of academic institutions. up to now, vendors have been unwilling to do this because there is insufficient profit motive." would nine libraries combining together change this profit picture? it is unfortunate that this report could not have waited on phase iii, the completion of the one-year trial of the operational center which was to have been ready in august 1969, so that we could see just how the predictions for the center worked out in practice. as it stands, however, the book is a valuable study in library systems analysis and design, and its identification and quantification of the various technical processing activities can yield real benefits to librarians everywhere, be they ever so decentralized. norman dudley

a guide to a selection of computer-based science and technology reference services in the u.s.a. chicago: american library association, 1969. 29 pages. $1.50.

this guide is an attempt to bring together those reference publications which are also available in machine readable form. as a "selection" it is limited to eighteen sources from government, professional, and private organizations. the guide is the result of a survey undertaken in 1968 by the science and technology reference services committee of the american library association reference services division. the committee was composed of elsie bergland, john mcgowan, william page, joseph paulukonis, margaret simonds, george caldwell, robert krupp, and richard snyder. each entry is broken down into three units: 1) the characteristics of the data base, 2) the equipment configuration, and 3) the use of the file. subject headings under characteristics of the data base include subject matter, literature surveyed, types of material covered, etc. the equipment configuration section describes computer model, core, operating systems, and programming language. the use of the file section covers potential uses of the data base by the producer and the subscriber. unfortunately for publications of this sort, they become out of date rather quickly. the continuing series, the directory of computerized information in science and technology, is updated periodically and is a very useful reference tool in this field. gerry d. guthrie
orthographic error patterns of author names in catalog searches

renata tagliacozzo, manfred kochen, and lawrence rosenberg: mental health research institute, the university of michigan, ann arbor, michigan

an investigation of error patterns in author names based on data from a survey of library catalog searches. position of spelling errors was noted and related to length of name. probability of a name having a spelling error was found to increase with length of name. nearly half of the spelling mistakes were replacement errors; following, in order of decreasing frequency, were omission, addition, and transposition errors.

computer-based catalog searching may fail if a searcher provides an author or title which does not match with the required exactitude the corresponding computer-stored catalog entry (1). in designing computer aids to catalog searching, it is important to build in safety features that decrease sensitivity to minor errors. for example, compression coding techniques may be used to minimize the effects of spelling errors on retrieval (2, 3, 4). preliminary to the design of good protection devices, the application of error-correction coding theory (5, 6, 7) and data on error patterns in actual catalog searches (8, 9) may be helpful. a recent survey of catalog use at three university libraries yielded some data of the above-mentioned kind (10). the aim of this paper is to present and analyze those results of the survey which bear on questions of error control in searching a computer-stored catalog. in the survey, users were interviewed at random as they approached the catalog. of the 2167 users interviewed, 1489 were searching the catalog for a particular item ("known-item searches"). of these, 67.9% first entered the catalog with an author's or editor's name, 26.2% with a title, and 5.9% with a subject heading. approximately half the searchers had a written citation, while half relied on memory for the relevant information.

and mit mandates, and other mandates such as the one instituted at stanford's school of education, have come to pass, and the registry of open access repository material archiving policies (roarmap) lists more than 120 mandates around the world that now exist.3 while it is too early to tell whether these developments will be successful in getting faculty to deposit their work in digital repositories, they at least establish a precedent that other institutions may follow. how many institutions follow and how effective the mandates will be once enacted remains to be seen. will all colleges and universities, or even a majority, adopt mandates that require faculty to deposit their work in repositories? what of those that do not?
even if most institutions are successful in instituting mandates, will they be sufficient to obtain faculty cooperation? for those institutions that do not adopt mandates, how are they going to persuade faculty to participate in self-archiving, or even in some variation—such as having surrogates (librarians, staff, or graduate assistants) archive the work of faculty? are mandates the only way to ensure faculty cooperation and compliance, or are mandates even necessarily the best way? to begin to adequately address the problem of user resistance to digital repositories, it might help to first gain some insight into the psychology of resistance. the existing literature on user behavior with regard to digital repositories devotes scant attention to the psychology of resistance. in an article entitled "institutional repositories: partnering with faculty to enhance scholarly communication," johnson discusses the inertia of the traditional publishing paradigm. he notes that this inertia is most evident in academic faculty. this would suggest that the problem of eliciting user cooperation is primarily motivational and that the problem is more one of indifference than active resistance.4 heterick, in his article "faculty attitudes toward electronic resources," suggests that one reason faculty may be resistant to digital repositories is that they do not fully trust them. in response to a survey he conducted, 48 percent of faculty felt that libraries should maintain paper archives.5 the implication is that digital repositories and archives may never completely replace hard copies in the minds of scholars. in "understanding faculty to improve content recruitment for institutional repositories," foster and gibbons point out that faculty complain of having too much work already. they resent any additional work that contributing to a digital repository might entail. thus the authors echo johnson in suggesting that faculty resistance

the potential value of digital repositories is dependent on the cooperation of scholars to deposit their work. although many researchers have been resistant to submitting their work, the literature on digital repositories contains very little research on the psychology of resistance. this article looks at the psychological literature on resistance and explores what its implications might be for reducing the resistance of scholars to submitting their work to digital repositories. psychologists have devised many potentially useful strategies for reducing resistance that might be used to address the problem; this article examines these strategies and how they might be applied.

observing the development and growth of digital repositories in recent years has been a bit like riding an emotional roller coaster. even the definition of what constitutes a repository may not be the subject of complete agreement, but for the purposes of this study, a repository is defined as an online database of digital or digitized scholarly works constructed for the purpose of preserving and disseminating scholarly research. the initial enthusiasm expressed by librarians and advocates of open access toward the potential of repositories to make significant amounts of scholarly research available to anyone with internet access gradually gave way to a more somber appraisal of the prospects of getting faculty and researchers to deposit their work.
in august 2007, bailey posted an entry to his digital koans blog titled "institutional repositories: doa?" in which he noted that building digital repository collections would be a long, arduous, and costly process.1 the success of repositories, in his view, will be a function not so much of technical considerations as of attitudinal ones. faculty remain unconvinced that repositories are important, and there is a critical need for outreach programs that point to repositories as an important step in solving the crisis in scholarly communication. salo elaborated on bailey's post with "yes, irs are broken. let's talk about it," on her own blog, caveat lector. salo points out that institutional repositories have not fulfilled their early promise of attracting a large number of faculty who are willing to submit their work. she criticizes repositories for monopolizing the time of library faculty and staff, and she states her belief that repositories will not work without deposit mandates, but that mandates are impractical.2 subsequent events in the world of scholarly communication might suggest that mandates may be less impractical than salo originally thought. since her post, the national institutes of health mandate, the harvard

brian quinn (brian.quinn@ttu.edu) is social sciences librarian, texas tech university libraries, lubbock.

whether or not this was actually the case.11 this study also suggests that a combination of both cognitive and affective processes feeds faculty resistance to digital repositories. it can be seen from the preceding review of the literature that several factors have been identified as being possible sources of user resistance to digital repositories. yet the authors offer little in the way of strategies for addressing this resistance other than to suggest workaround solutions such as having nonscholars (e.g., librarians, graduate students, or clerical staff) serve as proxy for faculty and deposit their work for them, or to suggest that institutions mandate that faculty deposit their work. similarly, although numerous arguments have been made in favor of digital repositories and open access, they do not directly address the resistance issue.12 in contrast, psychologists have studied user resistance extensively and accumulated a body of research that may suggest ways to reduce resistance rather than try to circumvent it. it may be helpful to examine some of these studies to see what insights they might offer to help address the problem of user resistance. it should be pointed out that resistance as a topic has been addressed in the business and organizational literature, but has generally been approached from the standpoint of management and organizational change.13 this study has chosen to focus primarily on the psychology of resistance because many repositories are situated in a university setting. unlike employees of a corporation, faculty members typically have a greater degree of autonomy and latitude in deciding whether to accommodate new work processes and procedures into their existing routines, and the locus of change will therefore be more at an individual level.

■■ the psychology of user resistance

psychologists define resistance as a preexisting state or attitude in which the user is motivated to counter any attempts at persuasion. this motivation may occur on a cognitive, affective, or behavioral level.
psychologists thus distinguish between a state of not being persuaded and one in which there is actual motivation to not comply. the source of the motivation is usually an affective state, such as anxiety or ambivalence, which itself may result from cognitive problems, such as misunderstanding, ignorance, or confusion.14 it is interesting to note that psychologists have long viewed inertia as one form of resistance, suggesting paradoxically that a person can be motivated to inaction.15 resistance may also manifest itself in more subtle forms that shade into indifference, suspicion of new work processes or technologies, and contentment with the status quo.

may be attributed at least in part to motivation.6 in another article published a few months later, foster and gibbons suggest that the main reason faculty have been slow to deposit their work in digital repositories is a cognitive one: faculty have not understood how they would benefit by doing so. the authors also mention that users may feel anxiety when executing the sequence of technical steps needed to deposit their work, and that they may also worry about possible copyright infringement.7 the psychology of resistance may thus manifest itself in both cognitive and affective ways. harley and her colleagues talk about faculty not perceiving any reward for depositing their work in their article "the influence of academic values on scholarly publication and communication practices." this perception results in reduced drive to participate. anxiety is another factor contributing to resistance: faculty fear that their work may be vulnerable to plagiarism in an open-access environment.8 in "towards user responsive institutional repositories: a case study," devakos suggests that one source of user resistance is cognitive in origin. scholars do not submit their work frequently enough to be able to navigate the interface from memory, so they must reinitiate the learning process each time they submit their work. the same is true for entering metadata for their work.9 their sense of control may also be threatened by any limitations that may be imposed on substituting later iterations of their work for earlier versions. davis and connolly point to several sources of confusion, uncertainty, and anxiety among faculty in their article "institutional repositories: evaluating the reasons for non-use of cornell university's installation of dspace." cognitive problems arise from having to learn new technology to deposit work and not knowing copyright details well enough to know whether publishers would permit the deposit of research prior to publication. faculty wonder whether this might jeopardize their chances of acceptance by important journals whose editors might view deposit as a form of prior publication that would disqualify them from consideration. there is also fear that the complex structure of a large repository may actually make a scholar's work more difficult to find; faculty may not understand that repositories are not isolated institutional entities but are usually searchable by major search engines like google.10 kim also identifies anxiety about plagiarism and confusion about copyright as being sources of faculty resistance in the article "motivating and impeding factors affecting faculty contribution to institutional repositories." kim found that plagiarism anxiety made some faculty only willing to deposit already-published work and that prepublication material was considered too risky.
faculty with no self-archiving experience also felt that many publishers do not allow self-archiving,

more open to information that challenges their beliefs and attitudes and are more open to suggestion.18 thus before beginning a discussion of why users should deposit their research in repositories, it might help to first affirm the users' self-concept. this could be done, for example, by reminding them of how unbiased they are in their work or how important it is in their work to be open to new ideas and new approaches, or how successful they have been in their work as scholars. the affirmation should be subtle and not directly related to the repository situation, but it should remind them that they are open-minded individuals who are not bound by tradition and that part of their success is attributable to their flexibility and adaptability. once the users have been affirmed, librarians can then lead into a discussion of the importance of submitting scholarly research to repositories. self-generated affirmations may be even more effective. for example, another way to affirm the self would be to ask users to recall instances in which they successfully took a new approach or otherwise broke new ground or were innovative in some way. this could serve as a segue into a discussion of the repository as one more opportunity to be innovative. once the self-concept has been boosted, the threatening quality of the message will be perceived as less disturbing and will be more likely to receive consideration. a related strategy that psychologists employ to reduce resistance involves casting the user in the role of "expert." this is especially easy to do with scholars because they are experts in their fields. casting the user in the role of expert can deactivate resistance by putting that person in the persuasive role, which creates a form of role reversal.19 rather than the librarian being seen as the persuader, the scholar is placed in that role. by saying to the scholar, "you are the expert in the area of communicating your research to an audience, so you would know better why the digital repository is an alternative that deserves consideration once you understand how it works and how it may benefit you," you are empowering the user. casting the user as an expert imparts a sense of control to the user. it helps to disable resistance by placing the user in a position of being predisposed to agree to the role he or she is being cast in, which also makes the user more prone to agree with the idea of using a digital repository. priming and imagining one important discovery that psychologists have made that has some bearing on user resistance is that even subtle manipulations can have a significant effect on one's judgments and actions. in an interesting experiment, psychologists told a group of students that they were to read an online newspaper, ostensibly to evaluate its design and assess how easy it was to read. half of them read an editorial discussing a public opinion survey of youth

■■ negative and positive strategies for reducing resistance

just as the definition of resistance can be paradoxical, so too may be some of the strategies that psychologists use to address it. perhaps the most basic example is to counter resistance by acknowledging it.
when scholars are presented with a message that overtly states that digital repositories are beneficial and desirable, it may simultaneously generate a covert reaction in the form of resistance. rather than simply anticipating this and attempting to ignore it, digital repository advocates might be more persuasive if they acknowledge to scholars that there will likely be resistance, mention some possible reasons (e.g., plagiarism or copyright concerns), and immediately introduce some counter-rationales to address those reasons.16 psychologists have found that being up front and forthcoming can reduce resistance, particularly with regard to the downside of digital repositories. they have learned that it can be advantageous to preemptively reveal negative information about something so that it can be downplayed or discounted. thus talking about the weaknesses or shortcomings of digital repositories as early as possible in an interaction may have the effect of making these problems seem less important and weakening user resistance. not only does revealing negative information impart a sense of honesty and credibility to the user, but psychologists have found that people feel closer to people who reveal personal information.17 a librarian could thus describe some of his or her own frustrations in using repositories as an effective way of establishing rapport with resistant users. the unexpected approach of bringing up the less desirable aspects of repositories—whether this refers to the technological steps that must be learned to submit one's work or the fact that depositing one's work in a repository is not a guarantee that it will be highly cited—can be disarming to the resistant user. this is particularly true of more resistant users who may have been expecting a strong hard-sell approach on the part of librarians. when suddenly faced with a more candid appeal, the user may be thrown off balance psychologically, leaving him or her more vulnerable to information that is the opposite of what was anticipated and to possibly viewing that information in a more positive light. if one way to disarm a user is to begin by discussing the negatives, a seemingly opposite approach that psychologists take is to reinforce the user's sense of self. psychologists believe that one source of resistance arises when a user's self-concept—which the user tries to protect from any source of undesired change—has been threatened in one way or another. a stable self-concept is necessary for the user to maintain a sense of order and predictability. reinforcing the self-concept of the user should therefore make the user less likely to resist depositing work in a digital repository. self-affirmed users are

or even possibly collaborating on research. their imaginations could be further stimulated by asking them to think of what it would be like to have their work still actively preserved and available to their successors a century from now. using the imagining strategy could potentially be significantly more effective in attenuating resistance than presenting arguments based on dry facts. identification and liking conscious processes like imagining are not the only psychological means of reducing the resistance of users to digital repositories. unconscious processes can also be helpful.
one example of such a process is what psychologists refer to as the “liking heuristic.” this refers to the tendency of users to employ a rule-of-thumb method to decide whether to comply with requests from persons. this tendency results from users constantly being inundated with requests. consequently, they need to simplify and streamline the decision-making process that they use to decide whether to cooperate with a request. the liking heuristic holds that users are more likely to help someone they might otherwise not help if they unconsciously identify with the person. at an unconscious level, the user may think that a person acts like them and dresses like them, and therefore the user identifies with that person and likes them enough to comply with their request. in one experiment that psychologists conducted to see if people are more likely to comply with requests from people that they identify with, female undergraduates were informed that they would be participating in a study of first impressions. the subjects were instructed that they and a person in another room would each learn a little about one another without meeting each other. each subject was then given a list of fifty adjectives and was asked to select the twenty that were most characteristic of themselves. the experimenter then told the participants that they would get to see each other’s lists. the experimenter took the subject’s list and then returned a short time later with what supposedly was the other participant’s list, but was actually a list that the experimenter had filled out to indicate that either the subject had much in common with the other participant’s personality (seventeen of twenty matches), some shared attributes (ten of twenty matches), or relatively few characteristics in common (three of twenty matches). the subject was then asked to examine the list and fill out a survey that probed their initial impressions of the other participant, including how much they liked them. at the end of the experiment, the two subjects were brought together and given credit for participating. the experimenter soon left the room and the confederate participant asked the other participant if she would read and critically evaluate an eight-page paper for an english class. the results of the experiment indicated that the more the participant thought she shared in consumer patterns that highlighted functional needs, and the other half read a similar editorial focusing on hedonistic needs. the students next viewed an ad for a new brand of shampoo that featured either a strong or a weak argument for the product. the results of the experiment indicated that students who read the functional editorial and were then subsequently exposed to the strong argument for the shampoo (a functional product) had a much more favorable impression of the brand than students who had received the mismatched prime.20 while it may seem that the editorial and the shampoo were unrelated, psychologists found that the subjects engaged in a process of elaborating the editorial, which then predisposed them to favor the shampoo. the presence of elaboration, which is a precursor to the development of attitudes, suggests that librarians could reduce users’ resistance to digital repositories by first involving them in some form of priming activity immediately prior to any attempt to persuade them. for example, asking faculty to read a brief case study of a scholar who has benefited from involvement in open-access activity might serve as an effective prime. 
another example might be to listen briefly to a speaker summarizing the individual, disciplinary, and societal benefits of sharing one's research with colleagues. interventions like these should help mitigate any predisposition toward resistance on the part of users. imagining is a strategy related to priming that psychologists have found to be effective in reducing resistance. taking their cue from insurance salesmen—who are trained to get clients to actively imagine what it would be like to lose their home or be in an accident—a group of psychologists conducted an experiment in which they divided a sample of homeowners who were considering the purchase of cable tv into two groups. one group was presented with the benefits of cable in a straightforward, informative way that described various features. the other group was asked to imagine themselves enjoying the benefits and all the possible channels and shows that they might experience and how entertaining it might be. the psychologists then administered a questionnaire. the results indicated that those participants who were asked to imagine the benefits of cable were much more likely to want cable tv and to subscribe to it than were those who were only given information about cable tv.21 in other words, imagining resulted in more positive attitudes and beliefs. this study suggests that librarians attempting to reduce resistance among users of digital repositories may need to do more than merely inform or describe to them the advantages of depositing their work. they may need to ask users to imagine in vivid detail what it would be like to receive periodic reports indicating that their work had been downloaded dozens or even hundreds of times. librarians could ask them to imagine receiving e-mail or calls from colleagues indicating that they had accessed their work in the repository and were interested in learning more about it,

students typically overestimate the amount of drinking that their peers engage in at parties. these inaccurate normative beliefs act as a negative influence, causing them to imbibe more because they believe that is what their peers are doing. by informing students that almost three-quarters of their peers have fewer than three drinks at social gatherings, psychologists have had some success in reducing excessive drinking behavior by students.23 the power of normative messages is illustrated by a recent experiment conducted by a group of psychologists who created a series of five cards to encourage hotel guests to reuse their towels during their stay. the psychologists hypothesized that by appealing to social norms, they could increase compliance rates. to test their hypothesis, the researchers used a different conceptual appeal for each of the five cards. one card appealed to environmental concerns ("help save the environment"), another to environmental cooperation ("partner with us to save the environment"), a third card appealed to the advantage to the hotel ("help the hotel save energy"), a fourth card targeted future generations ("help save resources for future generations"), and a final card appealed to guests by making reference to a descriptive norm of the situation ("join your fellow citizens in helping to save the environment").
the results of the study indicated that the card that mentioned the benefit to the hotel was least effective in getting guests to reuse their towels, and the card that was most effective was the one that mentioned the descriptive norm.24 this research suggests that if users who are resistant to submitting their work to digital repositories were informed that a larger percentage of their peers were depositing work than they realized, resistance may be reduced. this might prove to be particularly true if they learned that prominent or influential scholars were engaged in populating repositories with their work. this would create a social-norms effect that would help legitimize repositories to other faculty and help them to perceive the submission process as normal and desirable. the idea that accomplished researchers are submitting materials and reaping the benefits might prove very attractive to less experienced and less well-regarded faculty. psychologists have a considerable body of evidence in the area of social modeling that suggests that people will imitate the behavior of others in social situations because that behavior provides an implicit guideline of what to do in a similar situation. a related finding is that the more influential people are, the more likely it is for others to emulate their actions. this is even more probable for high-status individuals who are skilled and attractive and who are capable of communicating what needs to be done to potential followers.25 social modeling addresses both the cognitive dimension of how resistant users should behave and also the affective dimension by offering models that serve as a source of motivation to resistant users to change

common with the confederate, the more she liked her. the more she liked the confederate and experienced a perception of consensus, the more likely she was to comply with her request to critique the paper.22 thus, when trying to overcome the resistance of users to depositing their work in a digital repository, it might make sense to consider who it is that is making the request. universities sometimes host scholarly communication symposia that are aimed not only at getting faculty interested in open-access issues but also at urging them to submit their work to the institution's repositories. frequently, speakers at these symposia consist of academic administrators, members of scholarly communication or open-access advocacy organizations, or individuals in the library field. the research conducted by psychologists, however, suggests that appeals to scholars and researchers would be more effective if they were made by other scholars and those who are actively engaged in research. faculty are much more likely to identify with and cooperate with requests from their own tribe, as it were, and efforts need to be concentrated on getting faculty who are involved in and understand the value of repositories to articulate this to their colleagues. researchers who can personally testify to the benefits of depositing their work are most likely to be effective at convincing other researchers of the value of doing likewise and will be more effective at reducing resistance. librarians need to recognize who their potentially most effective spokespersons and advocates are, which the psychological research seems to suggest is faculty talking to other faculty.
perceived consensus and social modeling the processes of faculty identification with peers and perceived consensus mentioned above can be further enhanced by informing researchers that other scholars are submitting their work, rather than merely telling researchers why they should submit their work. information about the practices of others may help change beliefs because of the need to identify with other in-group members. this is particularly true of faculty, who are prone to making continuous comparisons with their peers at other institutions and who are highly competitive by nature. once they are informed of the career advantages of depositing their work (in terms of professional visibility, collaboration opportunities, etc.), and they are informed that other researchers have these advantages, this then becomes an impetus for them to submit their work to keep up with their peers and stay competitive. a perception of consensus is thus fostered—a feeling that if one's peers are already depositing their work, this is a practice that one can more easily agree to. psychologists have leveraged the power of identification by using social-norms research to inform people about the reality of what constitutes normative behavior as opposed to people's perceptions of it. for example, college

highly resistant users that may be unwilling to submit their work to a repository. rather than trying to prepare a strong argument based on reason and logic, psychologists believe that using a narrative approach may be more effective. this means conveying the facts about open access and digital repositories in the form of a story. stories are less rhetorical and tend not to be viewed by listeners as attempts at persuasion. the intent of the communicator and the counter-resistant message are not as overt, and the intent of the message might not be obvious until it has already had a chance to influence the listener. a well-crafted narrative may be able to get under the radar of the listener before the listener has a chance to react defensively and revert to a mode of resistance. in a narrative, beliefs are rarely stated overtly but are implied, and implied beliefs are more difficult to refute than overtly stated beliefs. listening to a story and wondering how it will turn out tends to use up much of the cognitive attentional capacity that might otherwise be devoted to counterarguing, which is another reason why using a narrative approach may be particularly effective with users who are strongly resistant. the longer and more subtle nature of narratives may also make them less a target of resistance than more direct arguments.28 using a narrative approach, the case for submitting work to a repository might be presented not as a collection of dry facts or statistics, but rather as a story. the protagonists are the researchers, and their struggle is to obtain recognition for their work and to advance scholarship by providing maximum access to the greatest audience of scholars and to obtain as much access as possible to the work of their peers so that they can build on it. the protagonists are thwarted in their attempts to achieve their ends by avaricious publishers who obtain the work of researchers for free and then sell it back to them in the form of journal and database subscriptions and books for exorbitant prices. these prices far exceed the rate of inflation or the budgets of universities to pay for them.
the publishers engage in a series of mergers and acquisitions that swallow up small publishing firms and result in the scholarly publishing enterprise being controlled by a few giant firms that offer unreasonable terms to users and make unreasonable demands when negotiating with them. presented in this dramatic way, the significance of scholar participation in digital repositories becomes magnified to an extent that it becomes more difficult to resist what may almost seem like an epic struggle between good and evil. and while this may be a greatly oversimplified example, it nonetheless provides a sense of the potential power of using a narrative approach as a technique to reduce resistance. introducing a time element into the attempt to persuade users to deposit their work in digital repositories can play an important role in reducing resistance. given that faculty are highly competitive, introducing the idea not only that other faculty are submitting their work but that they are already benefiting as a result makes the their behavior in the desired direction. redefinition, consistency, and depersonalization another strategy that psychologists use to reduce resistance among users is to change the definition of the situation. resistant users see the process of submitting their research to the repository as an imposition at best. in their view, the last thing that they need is another obligation or responsibility to burden their already busy lives. psychologists have learned that reframing a situation can reduce resistance by encouraging the user to look at the same phenomenon in a different way. in the current situation, resistant users should be informed that depositing their work in a digital repository is not a burden but a way to raise their professional profile as researchers, to expose their work to a wider audience, and to heighten their visibility among not only their peers but a much larger potential audience that would be able to encounter their work on the web. seen in this way, the additional work of submission is less of a distraction and more of a career investment. moreover, this approach leverages a related psychological concept that can be useful in helping to dissolve resistance. psychologists understand that inconsistency has a negative effect on self-esteem, so persuading users to believe that submitting their work to a digital repository is consistent with their past behavior can be motivating.26 the point needs to be emphasized with researchers that the act of submitting their work to a digital repository is not something strange and radical, but is consistent with prior actions intended to publicize and promote their work. a digital repository can be seen as analogous to a preprint, book, journal, or other tangible and familiar vehicles that faculty have used countless times to send their work out into the world. while the medium might have changed, the intention and the goal are the same. reframing the act of depositing as “old wine in new bottles” may help to undermine resistance. in approaching highly resistant individuals, psychologists have discovered that it is essential to depersonalize any appeal to change their behavior. 
instead of saying, “you should reduce your caloric intake,” it is better to say, “it is important for people to reduce their caloric intake.” this helps to deflect and reduce the directive, judgmental, and prescriptive quality of the request, thus making it less likely to provoke resistance.27 suggestion can be much less threatening than prescription among users who may be suspicious and mistrusting. reverting to a third-person level of appeal may allow the message to get through without it being immediately rejected by the user. narrative, timing, and anticipation psychologists recommend another strategy to help defuse reducing psychological resistance to digital repositories | quinn 73 technological platforms, and so on. this could be followed by a reminder to users that it is their choice—it is entirely up to them. this reminder that users have the freedom of choice may help to further counter any resistance generated as a result of instructions or inducements to anticipate regret. indeed, psychologists have found that reinstating a choice that was previously threatened can result in greater compliance than if the threat had never been introduced.32 offering users the freedom to choose between alternatives tends to make them more likely to comply. this is because having a choice enables users to both accept and resist the request rather than simply focus all their resistance on a single alternative. when presented with options, the user is able to satisfy the urge to resist by rejecting one option but is simultaneously motivated to accept another option; the user is aware that there are benefits to complying and wants to take advantage of them but also wants to save face and not give in. by being offered several alternatives that nonetheless all commit to a similar outcome, the user is able to resist and accept at the same time.33 for example, one alternative option to self-archiving might be to present the faculty member with the option of an authorpays publishing model. the choice of alternatives allows the faculty member to be selective and discerning so that a sense of satisfaction is derived from the ability to resist by rejecting one alternative. at the same time, the librarian is able to gain compliance because one of the other alternatives that commits the faculty member to depositing research is accepted. options, comparisons, increments, and guarantees in addition to offering options, another way to erode user resistance to digital repositories is to use a comparative strategy. one technique is to first make a large request, such as “we would like you to submit all the articles that you have published in the last decade to the repository,” and then follow this with a more modest request, such as “we would appreciate it if you would please deposit all the articles you have published in the last year.” the original request becomes an “anchor” or point of reference in the mind of the user against which the subsequent request is then evaluated. setting a high anchor lessens user resistance by changing the user’s point of comparison of the second request from nothing (not depositing any work in the repository) to a higher value (submitting a decade of work). in this way, a high reference anchor is established for the second request, which makes it seem more reasonable in the newly created context of the higher value.34 the user is thus more likely to comply with the second request when it is framed in this way. 
using this comparative approach may also work because it creates a feeling of reciprocity in the user. when proposition much more salient. it not only suggests that submitting work is a process that results in a desirable outcome, but that the earlier one’s work is submitted, the more recognition will accrue and the more rapidly one’s career will advance.29 faculty may feel compelled to submit their work in an effort to remain competitive with their colleagues. one resource that may be particularly helpful for working with skeptical faculty who want substantiation about the effect of self-archiving on scholarly impact is a bibliography created by the open citation project titled, “the effect of open access and downloads (hits) on citation impact: a bibliography of studies.”30 it provides substantial documentation of the effect that open access has on scholarly visibility. an additional stimulus might be introduced in conjunction with the time element in the form of a download report. showing faculty how downloads accumulate over time is analogous to arguments that investment counselors use showing how interest on investments accrues and compounds over time. this investment analogy creates a condition in which hesitating to submit their work results in faculty potentially losing recognition and compromising their career advancement. an interesting related finding by psychologists suggests that an effective way to reduce user resistance is to have users think about the future consequences of complying or not complying. in particular, if users are asked to anticipate the amount of future regret they might experience for making a poor choice, this can significantly reduce the amount of resistance to complying with a request. normally, users tend not to ruminate about the possibility of future disappointment in making a decision. if users are made to anticipate future regret, however, they will act in the present to try to minimize it. studies conducted by psychologists show that when users are asked to anticipate the amount of future regret that they might experience for choosing to comply with a request and having it turn out adversely versus choosing to not comply and having it turn out adversely, they consistently indicate that they would feel more regret if they did not comply and experienced negative consequences as a result.31 in an effort to minimize this anticipated regret, they will then be more prone to comply. based on this research, one strategy to reduce user resistance to digital repositories would be to get users to think about the future, specifically about future regret resulting from not cooperating with the request to submit their work. if they feel that they might experience more regret in not cooperating than in cooperating, they might then be more inclined to cooperate. getting users to think about the future could be done by asking users to imagine various scenarios involving the negative outcomes of not complying, such as lost opportunities for recognition, a lack of citation by peers, lost invitations to collaborate, an inability to migrate one’s work to future 74 information technology and libraries | june 2010 submit their work. mandates rely on authority rather than persuasion to accomplish this and, as such, may represent a less-than-optimal solution to reducing user resistance. mandates represent a failure to arrive at a meeting of the minds of advocates of open access, such as librarians, and the rest of the intellectual community. 
understanding the psychology of resistance is an important prerequisite to any effort to reduce it. psychologists have assembled a significant body of research on resistance and how to address it. some of the strategies that the research suggests may be effective, such as discussing resistance itself with users and talking about the negative effects of repositories, may seem counterintuitive and have probably not been widely used by librarians. yet when other more conventional techniques have been tried with little or no success, it may make sense to experiment with some of these approaches. particularly in the academy, where reason is supposed to prevail over authority, incorporating resistance psychology into a program aimed at soliciting faculty research seems an appropriate step before resorting to mandates. most strategies that librarians have used in trying to persuade faculty to submit their work have been conventional. they are primarily of a cognitive nature and are variations on informing and educating faculty about how repositories work and why they are important. researchers have an important affective dimension that needs to be addressed by these appeals, and the psychological research on resistance suggests that a strictly rational approach may not be sufficient. by incorporating some of the seemingly paradoxical and counterintuitive techniques discussed earlier, librarians may be able to penetrate the resistance of researchers and reach them at a deeper, less rational level. ideally, a mixture of rational and less-conventional approaches might be combined to maximize effectiveness. such a program may not eliminate resistance but could go a long way toward reducing it. future studies that test the effectiveness of such programs will hopefully be conducted to provide us with a better sense of how they work in real-world settings. references 1. charles w. bailey jr., “institutional repositories: doa?,” online posting, digital koans, aug. 22, 2007, http://digital -scholarship.org/digitalkoans/2007/08/21/institutional -repositories-doa/ (accessed apr. 21, 2010). 2. dorothea salo, “yes, irs are broken. let’s talk about it,” online posting, caveat lector, sept. 5, 2007, http://cavlec. yarinareth.net/2007/09/05/yes-irs-are-broken-lets-talk-about -it/ (accessed apr. 21, 2010). 3. eprints services, roarmap (registry of open access repository material archiving policies) http://www.eprints .org/openaccess/policysignup/ (accessed july 28, 2009). 4. richard k. johnson, “institutional repositories: partnering the requester scales down the request from the large one to a smaller one, it creates a sense of obligation on the part of the user to also make a concession by agreeing to the more modest request. the cultural expectation of reciprocity places the user in a situation in which they will comply with the lesser request to avoid feelings of guilt.35 for the most resistant users, breaking the request down into the smallest possible increment may prove helpful. by making the request seem more manageable, the user is encouraged to comply. psychologists conducted an experiment to test whether minimizing a request would result in greater cooperation. they went door-to-door, soliciting contributions to the american cancer society, and received donations from 29 percent of households. they then made additional solicitations, this time asking, “would you contribute? even a penny will help!” using this approach, donations increased to 50 percent. 
even though the solicitors only asked for a penny, the amounts of the donations were equal to that of the original request. by asking for “even a penny,” the solicitors made the request appear to be more modest and less of a target of resistance.36 librarians might approach faculty by saying “if you could even submit one paper we would be grateful,” with the idea that once faculty make an initial submission they will be more inclined to submit more papers in the future. one final strategy that psychological research suggests may be effective in reducing resistance to digital repositories is to make sure that users understand that the decision to deposit their work is not irrevocable. with any new product, users have fears about what might happen if they try it and they are not satisfied with it. not knowing the consequences of making a decision that they may later regret fuels reluctance to become involved with it. faculty need to be reassured that they can opt out of participating at any time and that the repository sponsors will guarantee this. this guarantee needs to be repeated and emphasized as much as possible in the solicitation process so that faculty are frequently reminded that they are entering into a decision that they can reverse if they so decide. having this reassurance should make researchers much less resistant to submitting their work, and the few faculty who may decide that they want to opt out are worth the reduction in resistance.37 the digital repository is a new phenomenon that faculty are unfamiliar with, and it is therefore important to create an atmosphere of trust. the guarantee will help win that trust. ■■ conclusion the scholarly literature on digital repositories has given little attention to the psychology of resistance. yet the ultimate success of digital repositories depends on overcoming the resistance of scholars and researchers to reducing psychological resistance to digital repositories | quinn 75 20. curtis p. haugtvedt et al., “consumer psychology and attitude change,” in knowles and linn, resistance and persuasion, 283–96. 21. larry w. gregory, robert b. cialdini, and kathleen m. carpenter, “self-relevant scenarios as mediators of likelihood estimates and compliance: does imagining make it so?” journal of personality & social psychology 43, no. 1 (1982): 89–99. 22. jerry m. burger, “fleeting attraction and compliance with requests,” in the science of social influence: advances and future progress, ed. anthony r. pratkanis (new york: psychology pr., 2007): 155–66. 23. john d. clapp and anita lyn mcdonald, “the relationship of perceptions of alcohol promotion and peer drinking norms to alcohol problems reported by college students,” journal of college student development 41, no. 1 (2000): 19–26. 24. noah j. goldstein and robert b. cialdini, “using social norms as a lever of social influence,” in the science of social influence: advances and future progress, ed. anthony r. pratkanis (new york: psychology pr., 2007): 167–90. 25. dale h. schunk, “social-self interaction and achievement behavior,” educational psychologist 34, no. 4 (1999): 219–27. 26. rosanna e. guadagno et al., “when saying yes leads to saying no: preference for consistency and the reverse foot-inthe-door effect,” personality & social psychology bulletin 27, no. 7 (2001): 859–67. 27. mary jiang bresnahan et al., “personal and cultural differences in responding to criticism in three countries,” asian journal of social psychology 5, no. 2 (2002): 93–105. 28. melanie c. green and timothy c. 
brock, “in the mind’s eye: transportation-imagery model of narrative persuasion,” in narrative impact: social and cultural foundations, ed. melanie c. green, jeffrey j. strange, and timothy c. brock (mahwah, n.j.: lawrence erlbaum, 2004): 315–41. 29. oswald huber, “time pressure in risky decision making: effect on risk defusing,” psychology science 49, no. 4 (2007): 415–26. 30. the open citation project, “the effect of open access and downloads (‘hits’) on citation impact: a bibliography of studies,” july 17, 2009, http://opcit.eprints.org/oacitation -biblio.html (accessed july 29, 2009). 31. matthew t. crawford et al., “reactance, compliance, and anticipated regret,” journal of experimental social psychology 38, no. 1 (2002): 56–63. 32. nicolas gueguen and alexandre pascual, “evocation of freedom and compliance: the ‘but you are free of . . .’ technique,” current research in social psychology 5, no. 18 (2000): 264–70. 33. james p. dillard, “the current status of research on sequential request compliance techniques,” personality & social psychology bulletin 17, no. 3 (1991): 283–88. 34. thomas mussweiler, “the malleability of anchoring effects,” experimental psychology 49, no. 1 (2002): 67–72. 35. robert b. cialdini and noah j. goldstein, “social influence: compliance and conformity,” annual review of psychology 55 (2004): 591–21. 36. james m. wyant and stephen l. smith, “getting more by asking for less: the effects of request size on donations of charity,” journal of applied social psychology 17, no. 4 (1987): 392–400. 37. lydia j. price, “the joint effects of brands and warranties in signaling new product quality,” journal of economic psychology 23, no. 2 (2002): 165–90. with faculty to enhance scholarly communication,” d-lib magazine 8, no. 11 (2002), http://www.dlib.org/dlib/november02/ johnson/11johnson.html (accessed apr. 2, 2008). 5. bruce heterick, “faculty attitudes toward electronic resources,” educause review 37, no. 4 (2002): 10–11. 6. nancy fried foster and susan gibbons, “understanding faculty to improve content recruitment for institutional repositories,” d-lib magazine 11, no. 1 (2005), http://www.dlib.org/ dlib/january05/foster/01foster.html (accessed july 29, 2009). 7. suzanne bell, nancy fried foster, and susan gibbons, “reference librarians and the success of institutional repositories,” reference services review 33, no. 3 (2005): 283–90. 8. diane harley et al., “the influence of academic values on scholarly publication and communication practices,” center for studies in higher education, research & occasional paper series: cshe.13.06, sept. 1, 2006, http://repositories.cdlib.org/ cshe/cshe-13-06/ (accessed apr. 17, 2008). 9. rea devakos, “towards user responsive institutional repositories: a case study,” library high tech 24, no. 2 (2006): 173–82. 10. philip m. davis and matthew j. l. connolly, “institutional repositories: evaluating the reasons for non-use of cornell university’s installation of dspace,” d-lib magazine 13, no. 3/4 (2007), http://www.dlib.org/dlib/march07/davis/03davis .html (accessed july 29, 2009). 11. jihyun kim, “motivating and impeding factors affecting faculty contribution to institutional repositories,” journal of digital information 8, no. 2 (2007), http://journals.tdl.org/jodi/ article/view/193/177 (accessed july 29, 2009). 12. peter suber, “open access overview” online posting, open access news: news from the open access environment, june 21, 2004, http://www.earlham.edu/~peters/fos/overview .htm (accessed 29 july 2009). 13. 
see, for example, jeffrey d. ford and laurie w. ford, "decoding resistance to change," harvard business review 87, no. 4 (2009): 99–103; john p. kotter and leonard a. schlesinger, "choosing strategies for change," harvard business review 86, no. 7/8 (2008): 130–39; and paul r. lawrence, "how to deal with resistance to change," harvard business review 47, no. 1 (1969): 4–176. 14. julia zuwerink jacks and maureen e. o'brien, "decreasing resistance by affirming the self," in resistance and persuasion, ed. eric s. knowles and jay a. linn (mahwah, n.j.: lawrence erlbaum, 2004): 235–57. 15. benjamin margolis, "notes on narcissistic resistance," modern psychoanalysis 9, no. 2 (1984): 149–56. 16. ralph grabhorn et al., "the therapeutic relationship as reflected in linguistic interaction: work on resistance," psychotherapy research 15, no. 4 (2005): 470–82. 17. arthur aron et al., "the experimental generation of interpersonal closeness: a procedure and some preliminary findings," personality & social psychology bulletin 23, no. 4 (1997): 363–77. 18. geoffrey l. cohen, joshua aronson, and claude m. steele, "when beliefs yield to evidence: reducing biased evaluation by affirming the self," personality & social psychology bulletin 26, no. 9 (2000): 1151–64. 19. anthony r. pratkanis, "altercasting as an influence tactic," in attitudes, behavior and social context: the role of norms and group membership, ed. deborah j. terry and michael a. hogg (mahwah, n.j.: lawrence erlbaum, 2000): 201–26. click analytics: visualizing website use data tutorial tabatha a. farney librarians who create website content should have access to website usage statistics to measure their webpages' effectiveness and refine the pages as necessary.3 with web analytics, libraries can increase the effectiveness of their websites, and as marshall breeding has observed, libraries can regularly use website statistics to determine how new webpage content is actually being used and make revisions to the content based on this information.4 several recent studies used google analytics to collect and report website usage statistics to measure website effectiveness and improve their usability.5 while web analytics are useful in a website redesign process, several studies concluded that web usage statistics should not be the sole source of information used to evaluate a website. these studies recommend using click data in conjunction with other website usability testing methods.6 background a lack of research on the use of click analytics in libraries motivated the web services librarian to explore their potential by directly implementing them on the library's website. she found that there are several click analytics products available and that each has its own unique functionality. however, many are commercially produced and expensive. with limited funding, the web services librarian selected google analytics' in-page analytics, clickheat, and crazy egg because they are either free or inexpensive. each tool was evaluated on the library's website over a six-month period. google analytics, however, cannot discern between the same link repeated in multiple places on a webpage. furthermore, she wanted to use website use data to determine the areas of high and low usage on the library's homepage and to use this information to justify her webpage reorganization decisions.
although this data can be found in a google analytics report, the web services librarian found it difficult to easily identify the necessary information within the massive amount of data the reports contain. the web services librarian opted to use click analytics, also known as click density analysis or site overlay, a subset of web analytics that reveals where users click on a webpage.1 a click analytics report produces a visual representation of what and where visitors are clicking on an individual webpage by overlaying the click data on top of the webpage that is being tested. rather than wading through the data, libraries can quickly identify what content users are clicking by using a click analytics report. the web services librarian tested several click analytics products while reassessing the library's homepage. during this process she discovered that each click analytics tool had different functionalities that affected its usefulness to the library. this paper introduces and evaluates three click analytics tools, google analytics' in-page analytics, clickheat, and crazy egg, in the context of redesigning the library's homepage and discusses the benefits and drawbacks of each. literature review library literature indicates that libraries are actively engaged in interpreting website usage data for a variety of purposes. laura b. cohen's study encourages libraries to use their website usage data to enhance their understanding of how visitors access and use library websites.2 jeanie m. welch further recommends that all librarians who create website content have access to website usage statistics.3 editor's note: this paper is adapted from a presentation given at the 2010 lita forum. click analytics is a powerful technique that displays what and where users are clicking on a webpage, helping libraries to easily identify areas of high and low usage on a page without having to decipher website use data sets. click analytics is a subset of web analytics, but there is little research that discusses its potential uses for libraries. this paper introduces three click analytics tools, google analytics' in-page analytics, clickheat, and crazy egg, and evaluates their usefulness in the context of redesigning a library's homepage. web analytics tools, such as google analytics, assist libraries in interpreting their website usage statistics by formatting that data into reports and charts. the web services librarian at the kraemer family library at the university of colorado, colorado springs wanted to use website use data to reassess the library's homepage, which was crowded with redundant links. for example, all the links in the site's dropdown navigation were repeated at the bottom of the homepage to make the links more noticeable to the user, but this unintentionally made the page long. to determine which links the web services librarian would recommend for removal, she needed to compare the use or clicks the repetitive links received. at the time, the library relied solely on google analytics to interpret website use data. however, this practice proved insufficient. tabatha a. farney (tfarney@uccs.edu) is web services librarian, kraemer family library, university of colorado, colorado springs, colorado. for libraries, outbound links include library catalogs or subscription databases.
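as the next paragraph explains, google analytics at the time only recorded clicks on such outbound links if each link carried an extra piece of javascript. a minimal sketch of that tagging, assuming the classic asynchronous _gaq api that google analytics then used (the account id, urls, and the "/outbound/..." virtual path are placeholders rather than the library's actual values), might look like this:

    <!-- standard asynchronous google analytics page tag (placeholder account id) -->
    <script type="text/javascript">
      var _gaq = _gaq || [];
      _gaq.push(['_setAccount', 'UA-0000000-1']);
      _gaq.push(['_trackPageview']);
      (function() {
        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
      })();
    </script>

    <!-- an outbound link tagged so that the click is recorded as a virtual pageview -->
    <a href="http://catalog.example.edu/"
       onclick="_gaq.push(['_trackPageview', '/outbound/library-catalog']);">library catalog</a>

recording the click under a made-up path such as "/outbound/library-catalog" is one common convention; the same push could instead use '_trackEvent' if event reports are preferred.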
additional javascript tags must be added to each outbound link for google analytics to track that data.9 once google analytics recognizes the outbound links, their click data will be available in the in-page analytics report. the report also identifies inbound sources, links from other webpages leading visitors to that page, and outbound destinations, links that navigate visitors away from that webpage. the inbound sources and outbound destinations reports can track outbound links, which are links that have a different domain or url address from the website tracked within google analytics. in-page analytics. google analytics is a popular, comprehensive web analytics tool that contains a click analytics feature called in-page analytics (formerly site overlay) that visually displays click data by overlaying that information on the current webpage (see figure 1). site overlay was used during the library's redesign process; however, it was replaced by in-page analytics in october 2010.7 the web services librarian reassessed the library's homepage using in-page analytics and found that the current tool resolved some of site overlay's shortcomings. site overlay is no longer accessible in google analytics, so this paper will discuss in-page analytics. essentially, in-page analytics is an updated version of the site overlay (see figure 2). in addition to visually representing click data on a webpage, in-page analytics contains new features, including the ability to easily segment data. web analytics expert avinash kaushik stresses the importance of segmenting website use data because it breaks down the aggregated data into specific data sets that represent more defined groups of users.8 rather than studying the total number of clicks a link received, an in-page analytics report can segment the data into specific groups of users, such as mobile device users. in-page analytics provides several default segments, but custom segments can also be applied, allowing libraries to further filter the data in ways that are useful to them. in-page analytics also displays a complementary overview report of statistics located in a side panel next to the typical site overlay view. this overview report extracts useful data from other reports generated in google analytics without having to leave the in-page analytics report screen. the report includes the webpage's inbound sources, also called top referrals, which are links from other webpages leading visitors to that page. (figure 1. screenshot of google analytics' defunct site overlay. figure 2. screenshot of google analytics' in-page analytics.) the web services librarian uses a screen capture tool, such as the firefox add-on screengrab,13 to collect and archive the in-page analytics reports, but the process is clunky and results in the loss of the ability to segment the data. clickheat. labsmedia's clickheat is an open source heat mapping tool that visually displays the clicks on a webpage using color to indicate the amount of clicks an area receives. similar to in-page analytics, a clickheat heat map displays the current webpage and overlays that page with click data (see figure 3). instead of listing percentages or actual numbers of clicks, the heat map represents clicks using color. the warmer the color, such as yellows, oranges, or reds, the more clicks that area receives; the absence of color implies little to no click activity. each heat map has an indicator that outlines the number of clicks a color represents.
a heat map clearly displays the heavily used and underused sections on a webpage making it easy for people with little experience interpreting website usage statistics to interpret the data. however, a heat map is not about exact numbers, but rather general areas of usage. for exact numbers, a traditional, comprehensive web analytics tool is required. clickheat can stand alone or be integrated into other web analytic tools.14 to have a more comprehensive web analytics product, the web services librarian opted to use the clickheat plugin for piwik, a free, open source web analytics tool that seeks to be an alternative to google analytics.15 by itself piwik has no click analytics feature, therefore clickheat is a useful plugin. both piwik and clickheat require access to a web server for installation and knowledge of php and mysql to configure them. because the kraemer family library does not maintain its own web servers, the pages, but it is time consuming and may not be worth the effort since the data are indirectly available.11 a major drawback to in-page analytics is that it does not discern between the same links listed in multiple places on a webpage. instead it tracks redundant links as one link, making it impossible to distinguish which repeated link received more use on the library’s homepage. similarly, the library’s homepage uses icons to help draw attention to certain links. these icons are linked images next to their counterpart text link. since the icon and text link share the same url, in-page analytics cannot reveal which is receiving more clicks. in-page analytics is useless for comparing repetitive links on a webpage, but google reports that they are working on adding this capability.12 as stated earlier, in-page analytics lays the click data over the current webpage in real-time, which can be both useful and limiting. using the current webpage allows libraries to navigate through their site while staying within the in-page analytics report. libraries can follow in the tracks of website users to learn how they interact with the site’s content and navigation. the downside is that it is difficult to compare a new version of a webpage with an older version since it only displays the current webpage. for example, the web services librarian could not accurately compare the use data between the old homepage and the revised homepage within the in-page analytics report because the newly redesigned homepage replaced the old page. comparing different versions of a webpage could help determine whether the new revisions improved the page or not. an archive or export feature would remedy this problem, but in-page analytics does not have this capacity. additionally, an export function would improve the ability to share this report with other librarians without having them login to the google analytics website. currently, the web evaluation of in-page analytics in-page analytics’ advanced segmenting ability far exceeds the old site overlay functionality. segmenting click data at the link level helps web managers to see how groups of users are navigating through a website. for example, in-page analytics can monitor the links mobile users are clicking, allowing web managers to track how that group of users are navigating through a website. this data could be used in designing a mobile version of a site. in-page analytics integrates a site overlay report and an overview report that contains selected web use statistics for an individual webpage. 
although the overview report is not in visual context with the site overlay view, it combines the necessary data to determine how a webpage is being accessed and used. this assists in identifying possible flaws in a website’s navigation, layout, or content. it also has the potential to clarify misleading website statistics. for instance, google analytics top exit pages report indicates the library’s homepage is the top exit page for the site. exit pages are the last page a visitor views before leaving the site.10 having a high exit rate could imply visitors were leaving the library’s site from the homepage and potentially missing a majority of the library’s online resources. using in-page analytics, it was apparent the library’s homepage had a high number of exits because many visitors clicked on outbound links, such as the library catalog, that navigated visitors away from the library’s website. rather than finding a potential problem, in-page analytics indicated that the homepage’s layout successfully led visitors to a desired point of information. while the data from the outbound links is available in the data overview report, it is not displayed within the site overlay view. it is possible to work around this problem by creating internal redirect 144 information technology and libraries | september 2011 the precise number of clicks is available in traditional web analytics reports. installing and configuring clickheat is a potential drawback for some libraries that do not have access to the necessary technology or staff to maintain it. even with access to a web server and knowledgeable staff, the web services librarian still experienced glitches implementing clickheat. she could not add clickheat to any high trafficked webpage because it created a slight, but noticeable, lag in response time to any page it was added. the cause was an out-of-box configuration setting that had to be fixed by the campus’ information technology department.17 another concern for libraries is that clickheat is continuously being developed with new versions or patches released periodically.18 like any locally installed software, libraries must plan for continuing maintenance of clickheat to keep it current. just as with in-page analytics, clickheat has no export or archive function. this impedes the web main navigation on the homepage and opted to use links prominently displayed within the homepage’s content. this indicated that either the users did not notice the main navigation dropdown menus or that they chose to ignore them. further usability testing of the main navigation is necessary to better understand why users do not utilize it. clickheat is most useful when combined with a comprehensive web analytics tool, such as piwik. since clickheat only collects data where visitors are clicking, it does not track other web analytics metrics, which limits its ability to segment the click data. currently, clickheat only segments clicks by browser type or screen resolution. additional segmenting ability would enhance this tool’s usefulness. for example, the ability to segment clicks from new visitors and returning visitors may reveal how visitors learn to use the library’s homepage. furthermore, the heat map report does not provide the actual number of clicks on individual links or content areas since heat maps generalize click patterns. web services librarian worked with the campus’ information technology department to install piwik with the clickheat plugin on a campus web server. 
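the next paragraph notes that, once installed, piwik and clickheat each generate a javascript tag that has to be added to every page whose use will be tracked. a minimal sketch of the piwik half of such a page tag, assuming a simplified form of the piwik javascript api and placeholder values for the server path and site id (the clickheat tag, whose exact form depends on the plugin version, is indicated only by a comment), might look like this:

    <!-- piwik page tag (simplified; "example.edu/piwik/" and the site id 1 are placeholders) -->
    <script type="text/javascript" src="http://example.edu/piwik/piwik.js"></script>
    <script type="text/javascript">
      try {
        var piwikTracker = Piwik.getTracker('http://example.edu/piwik/piwik.php', 1);
        piwikTracker.trackPageView();      // records the page view in piwik
        piwikTracker.enableLinkTracking(); // records clicks on outbound links and downloads
      } catch (err) {}
    </script>
    <!-- the second, separate tag generated by the clickheat plugin would be added here;
         only pages carrying that tag produce heat maps in the piwik interface -->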
once installed, piwik and clickheat generate javascript tags that must be added to every page that website use data will be tracked. although piwik and clickheat can be integrated, the tools work separately so two javascript tags must be added to a webpage to track click data in piwik as well as in clickheat. only the pages that contain the clickheat tracking script will generate heat maps that are then stored within the local piwik interface. evaluation of clickheat in-page analytics only tracks links or items that perform some sort of action, such as playing a flash video,16 but clickheat tracks clicks on internal links, outbound links, and even nonlinked objects, such as images. hence, clickheat is able to track clicks on the entire webpage. tracking non-linked objects was unexpectedly useful in identifying potential flaws in a webpage’s design. for instance, within a week of beta testing the library’s redesigned homepage, it was evident that users clicked on the graphics that were positioned closely to text links. the images were intended to draw the user’s attention to the text link, but instead users clicked on the graphic itself expecting it to be a link. to alleviate possible user frustration, the web services librarian added links to the graphics to take visitors to the same destinations as their companion text links. clickheat treats every link or image as its own separate component, so it has the ability to compare the same link listed in multiple places on the same page. unlike in-page analytics, clickheat was particularly helpful in analyzing which redundant links received more use on the homepage. in addition, the heat map also revealed that users ignored the site’s figure 3. screenshot of clickheat’s heat map report librarians and technology skill acquisition: issues and perspectives | farney 145click analytics: visualizing website use data | farney 145 clicks that area has received with the brighter colors representing the higher percentage of clicks. the plus signs can be expanded to show the total number of clicks an item has received, and this number can be easily filtered into eleven predefined allowing crazy egg to differentiate between the same link or image listed multiple times on a webpage. crazy egg displays this data in color-coded plus signs which are located next to the link or graphic it represents. the color is based on the percentage of services librarian’s ability to share the heat maps and compare different versions of a webpage. again, the web services librarian manually archives the heat maps using a screen capture tool, but the process is not the perfect solution. crazy egg crazy egg is a commercial, hosted click analytics tool selected for this project primarily for its advanced click tracking functionality. it is a fee-based service that requires a monthly subscription. there are several subscription packages based on the number of visits and “snapshots.” snapshots are webpages that are tracked by crazy egg. the kraemer family library subscribes to the standard package that allows up to twenty snapshots at one time with a combined total of 25,000 visits a month. to help manage how those visits are distributed, each tracked page can be assigned a specific number of visits or time period so that one webpage does not use all the visits early in the month. 
once a snapshot reaches its target number of visits or its allocated time period, it automatically stops tracking clicks and archives that snapshot within the crazy egg website.19 the snapshots convert the click data into three different click analytic reports: heat map, site overlay, and something called “confetti view.” crazy egg’s heat map report is comparable to clickheat’s heat map; they both use intensity of colors to show high areas of clicks on a webpage (see figure 4). crazy egg’s site overlay is similar to in-page analytics in that they both display the number of clicks a link receives (see figure 5). unlike in-page analytics, crazy egg tracks all clicks including outbound links as well as nonlinked content, such as graphics, if it has received multiple clicks. every clicked link and graphic is treated as its own separate entity, figure 4. screenshot of crazy egg’s heat map report figure 5. screenshot of crazy egg’s site overlay report 146 information technology and libraries | september 2011 to decide which redundant links to remove from the homepage. the confetti view report was useful for studying clicks on the entire webpage. segmenting this data allowed the web services librarian to identify click patterns on the webpage from a specific group. for example, the report revealed that mobile device users would scroll horizontally on the homepage to click on content, but rarely vertically. she also focused on the time to click segment, which reports how long it took a visitor to click on something, in the confetti view to identify links or areas that took users over half a minute to click. both segments provided interesting information, but further usability testing is necessary to better understand why mobile users preferred not to scroll vertically or why it took users longer to click on certain links. crazy egg also has the ability to archive its snapshots within its profile. this is useful for comparing different versions of a webpage to discover if the modifications were an improvement or not. one goal for the library’s homepage redesign was to shorten the page so users did not have to scroll evaluation of crazy egg crazy egg combines the capabilities of in-page analytcis and clickheat in one tool and expands on their abilities. it is not a comprehensive web analytics tool like google analytics or piwik, but rather is designed to specifically track where users are clicking. crazy egg’s heat map report is comparable to the one freely available in clickheat, however, its site overlay and confetti view reports are more sophisticated than what is currently available for free. the web services librarian found crazy egg to be a worthwhile investment during the library’s homepage redesign because it provided additional context to show how users were interacting with the library’s website. the site overlay facilitated the ability to compare the same link listed in multiple locations on the library’s homepage. not only could the web services librarian see how many clicks the links received, but she could also segment and compare that data to learn which links users were finding faster and which links new visitors or returning visitors preferred. this data helped her segments that include day of week, browser type, and top referring websites. custom segments may be applied if they are set up within the crazy egg profile. the confetti view report displays every click the snapshot recorded and overlays those clicks as colored dots on the snapshot as shown in figure 6. 
the color of the dot corresponds to specific segment value. the confetti view report uses the same default segmented values used in the site overlay report but here they can be further filtered into defined values for that segment. for example, the confetti view can segment the clicks by window width and then further filter the data to display only the clicks from visitors with window widths under 1000 pixels to see if users with smaller screen resolutions are scrolling down long webpages to click on content. this information is hard to glean from crazy egg’s site overlay report because it focuses on the individual link or graphic. the confetti view report focuses on clicks at the webpage level, allowing libraries to view usage trends on a webpage. crazy egg is a hosted service like google analytics, which means all the data are stored on crazy egg’s web servers and accessed through its website. implementing crazy egg on a webpage is a two-step process requiring the web manager to first set up the snapshot within the crazy egg profile and then add the tracking javascript tags to the webpage it will track. once the javascript tags are in place, crazy egg takes a picture of the current webpage and stores that as the snapshot on which to overlay the click data reports. since it uses a “snapshot” of the webpage, the website manager needs to retake a snapshot of the webpage if there are any changes to it. retaking the snapshot requires only a click of a button to automatically stop the old snapshot and regenerate a new one based on the current webpage without having to change the javascript tags. figure 6. screenshot of crazy egg’s confetti view report librarians and technology skill acquisition: issues and perspectives | farney 147click analytics: visualizing website use data | farney 147 website. next, she will explore ways to automate the process of sharing of website use data to make this information more accessible to other interested librarians. by sharing this information, the web services librarian hopes to promote informed decision making for the library’s web content and design. references 1. avinash kaushik, web analytics 2.0: the art of online accountability and science of customer centricity (indianapolis: wiley, 2010): 81–83. 2. laura b. cohen, “a two-tiered model for analyzing library website usage statistics, part 2: log file analysis,” portal: libraries & the academy 3, no. 3 (2003): 523–24. 3. jeanie m. welch, “who says we’re not busy? library web page usage as a measure of public service activity,” reference services review 33, no. 4 (2005): 377–78. 4. marshall breeding, “an analytical approach to assessing the effectiveness of web-based resources,” computers in libraries, 28, no. 1 (2008): 20–22. 5. julie arendt and cassie wagner, “beyond description: converting web site statistics into concrete site improvement ideas,” journal of web librarianship 4, no. 1 (january 2010): 37–54; steven j. turner, “websites statistics 2.0: using google analytics to measure library website effectiveness,” technical services quarterly 27, no. 3 (2010): 261–278; wei fang and marjorie e. crawford, “measuring law library catalog web site usability: a web analytic approach,” journal of web librarianship 2, no. 2–3 (2008): 287–306. 6. ardent and wagner, “beyond description,” 51–52; andrea wiggins, “data-driven design: using web analytics to validate heuristics,” bulletin of the american society for information science and technology 33, no. 5 (2007): 20–21; elizabeth l. 
black, “web analytics: a picture of the academic library web site user,” journal of web librarianship 3, no. 1 (2009): 12–13. 7. trevor claiborne, “introducing in-page analytics: visual context for your analytics data,” google analytics blog, oct. 15, 2010, http://analytics.blogspot .com/2010/10/introducing-in-page-ana tracking abilities, however, all provide a distinct picture of how visitors use a webpage. by using all of them, the web services librarian was able to clearly identify and recommend the links for removal. in addition, she identified other potential usability concerns, such as visitors clicking on nonlinked graphics rather than the link itself. a major bonus of using click analytics tools is their ability to create easy to understand reports that instantly display where visitors are clicking on a webpage. no previous knowledge of web analytics is required to understand these reports. the web services librarian found it simple to present and discuss click analytics reports with other librarians with little to no background in web analytics. this helped increase the transparency of why links were targeted for removal from the homepage. as useful as click analytics tools are, they cannot determine why users click on a link, only where they have clicked. click analytics tools simply visualize website usage statistics. as elizabeth black reports, these “statistics are a trail left by the user, but they do not explain the motivations behind the behavior.”20 she concludes that additional usability studies are required to better understand users and their interactions on a website.21 libraries can use the click analytics reports to identify a problem on a webpage, but further usability testing will explain why there is a problem and help library web managers fix the issue and prevent repeating the mistake in the future. the web services librarian incorporated the use of in-page analytics, clickheat, and crazy egg in her web analytics practices since these tools continue to be useful to test the usage of new content added to a webpage. furthermore, she finds that click analytics’ straightforward reports prompted her to share website use data more often with fellow librarians to assist in other decisionmaking processes for the library’s down too much to get to needed links. by comparing the old homepage and the new homepage confetti reports in crazy egg, it was instantly apparent that the new homepage had significantly fewer clicks on its bottom half than the old version. furthermore, comparing the different versions using the time to click segment in the site overlay showed that placing the link more prominently on the webpage decreased the overall time it took users to click on it. crazy egg’s main drawback is that archived pages that are no longer tracking click data count toward the overall number of snapshots that can be tracked at one time. if libraries regularly retest a webpage, they will easily reach the maximum number of snapshots their subscription permits in a relatively short period. once a crazy egg subscription is cancelled data stored in the account is no longer accessible. this increases the importance of regularly exporting data. crazy egg is designed to export the heat map and confetti view reports. the direct export function takes a snapshot of the current report as it is displayed, and automatically converts that image into a pdf. 
exporting the heat map is fairly simple because the report is a single image, but exporting all the content in the confetti view report is more difficult because the report is based on segments of click data. each segment type would have to be exported in a separate pdf report to retain all of the content. in addition, there is no export option for the site overlay report so there is not an easy method to manage that information outside of crazy egg. even if libraries are actively exporting reports from crazy egg, data loss is inevitable. summary and conclusions closely examining in-page analytics, clickheat, and crazyegg reveals that each tool has different levels of click 148 information technology and libraries | september 2011 (2009): 81–84. 17. clickheat performance and optimization, labsmedia, http://www .labsmedia.com/clickheat/156894.html (accessed feb. 7, 2011). 18. clickheat, sourceforge, http:// sourceforge.net/projects/clickheat/files/ (accessed feb. 7, 2011). 19. crazy egg, http://www.crazyegg .com/, (accessed on mar. 25, 2011). 20. black, “web analytics,” 12. 21. ibid., 12–13. 13. screengrab, firefox add-ons, https://addons.mozilla.org/en-us/fire fox/addon/1146/ (accessed feb. 7, 2011). 14. clickheat, labsmedia, http:// www.labsmedia.com/clickheat/index .html (accessed feb. 7,2011). 15. piwik, http://piwik.org/ (accessed feb. 7, 2011). 16. paul betty, “assessing homegrown library collections: using google analytics to track use of screencasts and flash-based learning objects,” journal of electronic resources librarianship, 21, no. 1 lytics-visual.html (accessed feb. 7, 2011). 8. kaushik, web analytics 2.0, 88. 9. turner, “websites statistics 2.0,” 272–73. 10. kaushik, web analytics 2.0, 53–55. 11. site overlay not displaying outbound links, google analytics help forum, http://www.google.com/ support/forum/p/google+analytics/ thread?tid=39dc323262740612&hl=en (accessed feb. 7, 2011). 12. claiborne, “introducing in-page analytics.” 30 information technology and libraries | march 2010 the path toward global interoperability in cataloging ilana tolkoff libraries began in complete isolation with no uniformity of standards and have grown over time to be ever more interoperable. this paper examines the current steps toward the goal of universal interoperability. these projects aim to reconcile linguistic and organizational obstacles, with a particular focus on subject headings, name authorities, and titles. i n classical and medieval times, library catalogs were completely isolated from each other and idiosyncratic. since then, there has been a trend to move toward greater interoperability. we have not yet attained this international standardization in cataloging, and there are currently many challenges that stand in the way of this goal. this paper will examine the teleological evolution of cataloging and analyze the obstacles that stand in the way of complete interoperability, how they may be overcome, and which may remain. this paper will not provide a comprehensive list of all issues pertaining to interoperability; rather, it will attempt to shed light on those issues most salient to the discussion. unlike the libraries we are familiar with today, medieval libraries worked in near total isolation. most were maintained by monks in monasteries, and any regulations in cataloging practice were established by each religious order. 
one reason for their lack of regulations was that their collections were small by our standards; a monastic library had at most a few hundred volumes (a couple thousand in some very rare cases). the “armarius,” or librarian, kept more of an inventory than an actual catalog, along with the inventories of all other valuable possessions of the monastery. there were no standard rules for this inventory-keeping, although the armarius usually wrote down the author and title, or incipit if there was no author or title. some of these inventories also contained bibliographic descriptions, which most often described the physical book rather than its contents. the inventories were usually taken according to the shelf organization, which was occasionally based on subject, like most libraries are today. these trends in medieval cataloging varied widely from library to library, and their inventories were entirely different from our modern opacs. the inventory did not provide users access to the materials. instead, the user consulted the armarius, who usually knew the collection by heart. this was a reasonable request given the small size of the collections.1 this type of nonstandardized cataloging remained relatively unchanged until the nineteenth century, when charles c. jewett introduced the idea of a union catalog. jewett also proposed having stereotype plates for each bibliographic record, rather than a book catalog, because this could reduce costs, create uniformity, and organize records alphabetically. this was the precursor to the twentieth-century card catalog. while many of jewett’s ideas were not actually practiced during his lifetime, they laid the foundation for later cataloging practices.2 the twentieth century brought a great revolution in cataloging standards, particularly in the united states. in 1914, the library of congress subject headings (lcsh) were first published and introduced a controlled vocabulary to american cataloging. the 1960s saw a wide array of advancements in standardization. the library of congress (lc) developed marc, which became a national standard in 1973. it also was the time of the creation of anglo-american cataloguing rules (aacr), the paris principles, and international standard bibliographic description (isbd). while many of these standardization projects were uniquely american or british phenomena, they quickly spread to other parts of the world, often in translated versions.3 while the technology did not yet exist in the 1970s to provide widespread local online catalogs, technology did allow for union catalogs containing the records of many libraries in a single database. these union catalogs included the research libraries information network (rlin), the oclc online computer library center (oclc), and the western library network (wln). in the 1980s the local online public access catalog (opac) emerged, and in the 1990s opacs migrated to the web (webpacs).4 currently, most libraries have opacs and are members of oclc, the largest union catalog, used by more than 71,000 libraries in 112 countries and territories.5 now that most of the world’s libraries are on oclc, librarians face the challenge and inconvenience of discrepancies in cataloging practice due to the differing standards of diverse countries, languages, and alphabets. the fields of language engineering and linguistics are working on various language translation and analysis tools. 
some of these include machine translation; ontology, or the hierarchical organization of concepts; information extraction, which deciphers conceptual information from unorganized information, such as that on the web; text summarization, in which computers create a short summary from a long piece of text; and speech processing, which is the computer analysis of human speech.6 while these are all exciting advances in information technology, as of yet they are not intelligent enough to help us establish cataloging interoperability. it will be interesting to see whether language engineering tools will be capable of helping catalogers in the future, but for now they are ilana tolkoff (ilana.tolkoff@gmail.com) holds a ba in music and italian from vassar college, an ma in musicology from brandeis university, and an mls from the university at buffalo. she is currently seeking employment as a music librarian. the path toward global interoperability in cataloging | tolkoff 31 best at making sense of unstructured information, such as the web. the interoperability of library catalogs, which consist of highly structured information, must be tackled through software that innovative librarians of the future will produce. in an ideal world, oclc would be smoothly interoperable at a global level. a single thesaurus of subject headings would have translations in every language. there would be just one set of authority files. all manifestations of a single work would be grouped under the same title, translatable to all languages. there would be a single bibliographic record for a single work, rather than multiple bibliographic records in different languages for the same work. this single bibliographic record could be translatable into any language, so that when searching in worldcat, one could change the settings to any language to retrieve records that would display in that chosen language. when catalogers contribute to oclc, they would create the records in their respective languages, and once in the database the records would be translatable to any other language. because records would be so fluidly translatable, an opac could be searched in any language. for example, the default settings for the university at buffalo’s opac could be english, but patrons could change those settings to accommodate the great variety of international students doing research. this vision is utopian to say the least, and it is doubtful that we will ever reach this point. but it is valuable to establish an ideal scenario to aim our innovation in the right direction. one major obstacle in the way of global interoperability is the existence of different alphabets and the inherently imperfect nature of transliteration. there are essentially two types of transliteration schemes: those based on phonetic structure and those based on morphemic structure. the danger of phonetic transliteration, which mimics pronunciation, is that semantics often get lost. it fails to differentiate between homographs (words that are spelled and pronounced the same way but have different meanings). complications also arise when there are differences between careful and casual styles of speech. park asserts, “when catalogers transcribe words according to pronunciation, they can create inconsistent and arbitrary records.”7 morphemic transliteration, on the other hand, is based on the meanings of morphemes, and sometimes ends up being very different from the pronunciation in the source language. 
one advantage to this, however, is that it requires fewer diacritics than phonetic transliteration. park, whose primary focus is on korean–roman transliteration, argues that the mccune reischauer phonetic transliteration that libraries use loses too much of the original meaning. in other alphabets, however, phonetic transliteration may be more beneficial, as in the lc’s recent switch to pinyin transliteration in chinese. the lc found pinyin to be more easily searchable than wade-giles or monosyllabic pinyin, which are both morphemic. however, another problem with transliteration that neither phonetic nor morphemic schemes can solve is word segmentation—how a transliterated word is divided. this becomes problematic when there are no contextual clues, such as in a bibliographic record.8 other obstacles that stand in the way of interoperability are the diverse systems of subject headings, authority headings, and titles found internationally. resource description and access (rda) will not deal with subject headings because it is such a hefty task, so it is unlikely that subject headings will become globally interoperable in the near future.9 fortunately, twenty-four national libraries of english speaking countries use lcsh, and twelve non-english-speaking countries use a translated or modified version of lcsh. this still leaves many more countries that use their own systems of subject headings, which ultimately need to be made interoperable. even within a single language, subject headings can be complicated and inconsistent because they can be expressed as a single noun, compound noun, noun phrase, or inverted phrase; the problem becomes even greater when trying to translate these to other languages. bennett, lavoie, and o’neill note that catalogers often assign different subject headings (and classifications) to different manifestations of the same work.10 that is, the record for the novel gone with the wind might have different subject headings than the record for the movie. this problem could potentially be resolved by the functional requirements for bibliographic records (frbr), which will be discussed below. translation is a difficult task, particularly in the context of strict cataloging rules. it is especially complicated to translate among unrelated languages, where one might be syntactic and the other inflectional. this means that there are discrepancies in the use of prepositions, conjunctions, articles, and inflections. the ability to add or remove terms in translation creates endless variations. a single concept can be expressed in a morpheme, a word, a phrase, or a clause, depending on the language. there also are cultural differences that are reflected in different languages. park gives the example of how angloamerican culture often names buildings and brand names after people, reflecting our culture’s values of individualism, while in korea this phenomenon does not exist at all. on the other hand, korean’s use of formal and informal inflections reflects their collectivist hierarchical culture. another concept that does not cross cultural lines is the korean pumasi system in which family and friends help someone in a time of need with the understanding that the favor will be returned when they need it. this cannot be translated into a single english word, phrase, or subject heading. 
one way of resolving ambiguity in translations is through modifiers or scope notes, but this is only a partial solution.11 because translation and transliteration are so difficult, 32 information technology and libraries | march 2010 as well as labor-intensive, the current trend is to link already existing systems. multilingual access to subjects (macs) is one such linking project that aims to link subject headings in english, french, and german. it is a joint project under the conference of european national librarians among the swiss national library, the bibliothèque nationale de france (bnf), the british library (bl), and die deutsche bibliothek (ddb). it aims to link the english lcsh, the french répertoire d’autorité matière encyclopédique et alphabétique unifié (rameau), and the german schlagwortnormdatei/ regeln für den schlagwortkatalog (swd/rswk). this requires manually analyzing and matching the concepts in each heading. if there is no conceptual equivalent, then it simply stands alone. macs can link between headings and strings or even create new headings for linking purposes. this is not as fruitful as it sounds, however, as there are fewer correspondences than one might expect. the macs team experimented with finding correspondences by choosing two topics: sports, which was expected to have a particularly high number of correspondences, and theater, which was expected to have a particularly low number of correspondences. of the 278 sports headings, 86 percent matched in all three languages, 8 percent matched in two, and 6 percent was unmatched. of the 261 theater headings, 60 percent matched in three languages, 18 percent matched in two, and 22 percent was unmatched.12 even in the most cross-cultural subject of sports, 14 percent of terms did not correspond fully, making one wonder whether linking will work well enough to prevail. a similar project—the virtual international authority file (viaf)—is being undertaken for authority headings, a joint project of the lc, the bnf, and ddb, and now including several other national libraries. viaf aims to link (not consolidate) existing authority files, and its beta version (available at http://viaf.org) allows one to search by name, preferred name, or title. oclc’s software mines these authority files and the titles associated with them for language, lc control number, lc classification, usage, title, publisher, place of publication, date of publication, material type, and authors. it then derives a new enhanced authority record, which facilitates mapping among authority records in all of viaf’s languages. these derived authority records are stored on oai servers, where they are maintained and can be accessed by users. users can search viaf by a single national library or broaden their possibilities by searching all participating national libraries. as of 2006, between the lc’s and ddb’s authority files, there were 558,618 matches, including 70,797 complex matches (one-to-many), and 487,821 unique matches (one-to-one) out of 4,187,973 lc names and 2,659,276 ddb names. ultimately, viaf could be used for still more languages, including non-roman alphabets.13 recently the national library of israel has joined, and viaf can link to the hebrew alphabet. a similar project to viaf that also aimed to link authority files was linking and exploring authority files (leaf), which was under the auspices of the information society technologies programme of the fifth framework of the european commission. 
the three-year project began in 2001 with dozens of libraries and organizations (many of which are national libraries), representing eight languages. its website describes the project as follows: information which is retrieved as a result of a query will be stored in a pan-european “central name authority file.” this file will grow with each query and at the same time will reflect what data records are relevant to the leaf users. libraries and archives wanting to improve authority information will thus be able to prioritise their editing work. registered users will be able to post annotations to particular data records in the leaf system, to search for annotations, and to download records in various formats.14 park identifies two main problems with linking authority files. one is that name authorities still contain some language-specific features. the other is that disambiguation can vary among name authority systems (e.g., birth/death dates, corporate qualifiers, and profession/ activity). these are the challenges that projects like leaf and viaf must overcome. while the linking of subject headings and name authorities is still experimental and imperfect, the frbr model for linking titles is much more promising and will be incorporated in the soon-to-be-released rda. according to bennett, lavoie, and o’neill, there are three important benefits to frbr: (1) it allows for different views of a bibliographic database, (2) it creates a hierarchy of bibliographic entities in the catalog such that all versions of the same work fall into a single collapsible entry point, (3) and the confluence of the first two benefits makes the catalog more efficient. in the frbr model, the bibliographic record consists of four entities: (1) the work, (2) the expression, (3) the manifestation, and (4) the item. all manifestations of a single work are grouped together, allowing for a more economical use of information because the title needs to be entered only once.15 that is, a “title authority file” will exist much like a name authority file. this means that all editions in all languages and in all formats would be grouped under the same title. for example, the lord of the rings title would include all novels, films, translations, and editions in one grouping. this would reduce the number of bibliographic records, and as danskin notes, “the idea of creating more records at a time when publishing output threatens to outstrip the cataloguing capacity of national bibliographic agencies is alarming.”16 the frbr model is particularly beneficial for complex canonical works like the bible. there are a small number of complex canonical works, but they take up a the path toward global interoperability in cataloging | tolkoff 33 disproportionate number of holdings in oclc.17 because this only applies to a small number of works, it would not be difficult to implement, and there would be a disproportionate benefit in the long run. there is some uncertainty, however, in what constitutes a complex work and whether certain items should be grouped under the same title.18 for instance, should prokofiev’s romeo and juliet be grouped with shakespeare’s? the advantage of the frbr model for titles over subject headings or name authorities is that no such thing as a title authority file exists (as conceptualized by frbr). we would be able to start from scratch, creating such title authority files at the international level. 
subject headings and name authorities, on the other hand, already exist in many different forms and languages so that cross-linking projects like viaf might be our only option. it is encouraging to see the strides being made to make subject headings, name authority headings, and titles globally interoperable, but what about other access points within a record’s bibliographic description? these are usually in only one language, or two if cataloged in a bilingual country. should these elements (format, contents, and so on) be cross-linked as well, and is this even possible? what should reasonably be considered an access point? most people search by subject, author, or title, so perhaps it is not worth making other types of access points interoperable for the few occasions when they are useful. yet if 100 percent universal interoperability is our ultimate utopian goal, perhaps we should not settle for anything less than true international access to all fields in a record. because translation and transliteration are such complex undertakings, linking of extant files is the future of the field. there are advantages and disadvantages to this. on the one hand, linking these files is certainly better than having them exist only for their own countries. they are easily executed projects that would not require a total overhaul of the way things currently stand. the disadvantages are not to be ignored, however. the fact that files do not correspond perfectly from language to language means that many files will remain in isolation in the national library that created them. another problem is that cross-linking is potentially more confusing to the user; the search results on http://www.viaf.org are not always simple and straightforward. if cross-linking is where we are headed, then we need to focus on a more user-friendly interface. if the ultimate goal of interoperability is simplification, then we need to actually simplify the way query results are organized rather than make them more confusing. very soon rda will be released and will bring us to a new level of interoperability. aacr2 arrived in 1978, and though it has been revised several times, it is in many ways outdated and mainly applies to books. rda will bring something completely new to the table. it will be flexible enough to be used in other metadata schemes besides marc, and it can even be used by different industries such as publishers, museums, and archives.19 its incorporation of the frbr model is exciting as well. still, there are some practical problems in implementing rda and frbr, one of which is that reeducating librarians about the new rules will be costly and take time. also, frbr in its ideal form would require a major overhaul of the way oclc and integrated library systems currently operate, so it will be interesting to see to what extent rda will actually incorporate frbr and how it will be practically implemented. danskin asks, “will the benefits of international co-operation outweigh the costs of effecting changes? is the usa prepared to change its own practices, if necessary, to conform to european or wider ifla standards?”20 it seems that the united states is in fact ready and willing to adopt frbr, but to what extent is yet to be determined. 
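to make the frbr grouping discussed above a little more concrete, the sketch below models the four entities (work, expression, manifestation, and item) as a small hierarchy in python. it is only an illustration of the idea, not code from rda, oclc, or any integrated library system; the class names, fields, and bibliographic details are invented for the example.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:                       # a single physical or digital copy
    barcode: str
    call_number: str

@dataclass
class Manifestation:              # a particular published form (edition, format)
    publisher: str
    year: int
    carrier: str                  # "book", "film", "e-book", ...
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:                 # a realization of the work (a language, a version)
    language: str
    form: str                     # "text", "moving image", ...
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:                       # the abstract creation; one entry point in the catalog
    title: str
    creator: str
    expressions: List[Expression] = field(default_factory=list)

# all editions, translations, and adaptations collapse under one work
lotr = Work("the lord of the rings", "j. r. r. tolkien")
lotr.expressions.append(Expression("eng", "text", [
    Manifestation("allen & unwin", 1954, "book",
                  [Item("31197000000001", "pr6039.o32 l6")])]))
lotr.expressions.append(Expression("spa", "text", [
    Manifestation("minotauro", 1977, "book")]))
lotr.expressions.append(Expression("eng", "moving image", [
    Manifestation("new line cinema", 2001, "film")]))

# a catalog display can then offer a single collapsible entry for the work
for expr in lotr.expressions:
    for man in expr.manifestations:
        print(lotr.title, "|", expr.language, expr.form, "|", man.year, man.carrier)

in a catalog organized this way the title is recorded once, on the work, and every translation, film version, or new edition simply hangs beneath it, which is the economy bennett, lavoie, and o'neill describe.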
what i have discussed in this paper are some of the more prominent international standardization projects, although there are countless others, such as eurowordnet, the open language archives community (olac), and international cataloguing code (icc), to name but a few.21 in general, the current major projects consist of linking subject headings, name authority files, and titles in multiple languages. linking may not have the best correspondence rates, we have still not begun to tackle the cross-linking of other bibliographic elements, and at this point search results may be more confusing than helpful. but the existence of these linking projects means we are at least headed in the right direction. the emergent universality of oclc was our most recent step toward interoperability, and it looks as if cross-linking is our next step. only time will tell what steps will follow. references 1. lawrence s. guthrie ii, “an overview of medieval library cataloging,” cataloging & classification quarterly 15, no. 3 (1992): 93–100. 2. lois mai chan and theodora hodges, cataloging and classification: an introduction, 3rd ed. (lanham, md.: scarecrow, 2007): 48. 3. ibid., 6–8. 4. ibid., 7–9. 5. oclc, “about oclc,” http://www.oclc.org/us/en/about/default.htm (accessed dec. 9, 2009). 6. jung-ran park, “cross-lingual name and subject access: mechanisms and challenges,” library resources & technical services 51, no. 3 (2007): 181. 7. ibid., 185. 8. ibid.
9. julie renee moore, “rda: new cataloging rules, coming soon to a library near you!” library hi tech news 23, no. 9 (2006): 12. 10. rick bennett, brian f. lavoie, and edward t. o’neill, “the concept of a work in worldcat: an application of frbr,” library collections, acquisitions, & technical services 27, no. 1 (2003): 56. 11. park, “cross-lingual name and subject access.” 12. ibid. 13. thomas b. hickey, “virtual international authority file” (microsoft powerpoint presentation, ala annual conference, new orleans, june 2006), http://www.oclc.org/research/projects/viaf/ala2006c.ppt (accessed dec. 9, 2009). 14. leaf, “leaf project consortium,” http://www.crxnet.com/leaf/index.html (accessed dec. 9, 2009). 15. bennett, lavoie, and o’neill, “the concept of a work in worldcat.” 16. alan danskin, “mature consideration: developing bibliographic standards and maintaining values,” new library world 105, no. 3/4 (2004): 114. 17. ibid. 18. bennett, lavoie, and o’neill, “the concept of a work in worldcat.” 19. moore, “rda.” 20. danskin, “mature consideration,” 116. 21. ibid.; park, “cross-lingual name and subject access.” donna hirst editorial board thoughts the iowa city flood of 2008: a librarian and it professional’s perspective do you like to chase fire trucks? do you enjoy watching a raft of adventurers go over the waterfall, careening from rock to rock? well, this is a story of the iowa city flood of 2008, a flood projected to happen once every five hundred years, from the perspective of a librarian and it professional. ■ the approach of the flood the winter of 2008 was hard, and we got mounds of snow. the spring was wet that year in iowa city. it rained almost every day.
minnesota’s snow melt-off hadn’t been released from the reservoir due to the heavy rains. everyone watched the river rise, day by day. the parks were underwater; the river was creeping up toward buildings, including the university of iowa. in early june, with about a day and a half notice, library staff at the university’s main library, art library, and music library were told to evacuate. one of the first acts of evacuation was the relocation of all of the library servers to the engineering building up the hill—high and dry—literally rolling them across the street and up the sidewalk. although all servers were relocated to engineering, engineering didn’t have enough power in their server room to handle the extra capacity to run all of our machines. the five primo servers that run our discovery searching service had to stay disconnected. with the servers safe and sound, we moved our attention to staff workstations. the personal workstations of the administrative staff and the finance department were moved to the business library. the libraries’ laptops were collected and moved into the branch libraries, which would be receiving displaced staff. many staff would be expected to work from public clusters in the various library branches, locked down to specific functions. as library staff were collecting their critical possessions, the town was madly sandbagging. more than a million sandbags were piled around university buildings, private businesses, and residences. in retrospect, some of the sandbags may have made a difference, but since the flood was so much greater than anticipated, the water largely went over and around, leaving a lot of soggy sandbags. on june 13, the day before the main library was to be closed, the decision was made to move books up from the basement. there were well over 500,000 volumes in the basement, and a group of approximately five hundred volunteers moved 62,000 volumes and 37,000 manuscript boxes from the lower shelves. volunteers passed books hand to hand into the third, fourth, and fifth floors of the building. a number of the volunteers came from sandbagging teams. individuals who had never been in a boxes of manuscripts being stacked on the fifth floor photo by carol jonck moving boxes out of the basement photo courtesy of the university of iowa news service donna hirst (donna-hirst@uiowa.edu) is project coordinator, library information technology, university of iowa libraries, iowa city. 6 information technology and libraries | december 2008 library, didn’t know what a circulation desk was, or what a library of congress call number was were working hard side by side with physicians, ministers, scientists, students, and retirees. the end result was not orderly, but the collection was saved from the encroaching river. the libraries at the university of iowa are indebted to these volunteers who helped protect the collection from the expected water. n the river peaks approximately twenty university buildings were closed because of the flood, including the main library, the art building, and the music building. the university’s power plant was closed. the entire arts campus was deeply under water. most of the main roads connecting the east side of iowa city to the west side were closed, and most of the highways into iowa city were closed. interstate 80 was closed in multiple places, and no traffic was allowed from the east side of the state to the west side. many bridges in and around iowa city were closed; some had actually crumbled and floated down stream. 
so the president of the university, sally mason, closed the university for the first time in its history. most staff would not be able to get to work anyway. many individuals were struggling with residences and businesses that were under water. the university was to be closed for the week of june 15, with the university’s hospitals continuing to operate under strained conditions; continued delivery of patient services was a priority. most library staff stayed home and followed the news stories, shocked at the daily news of destruction and loss. select library it staff began working in the background to set up new work environments for library staff returning to foreign workstations or relocated work environments. at the flood’s peak, the main library took several inches of water in the basement. there was slight rusting in the compact shelving, but the collection was completely saved. a portion of the basement was lower, and the computer equipment controlling the libraries’ public computer cluster was completely ruined. this computer cluster housing more than two hundred workstations library staff and volunteers sandbagging photo by carol jonck moving books out of the basement photo courtesy of the university of iowa news service the beginning of a book chain to the fourth floor photo courtesy of the university of iowa news service editorial board thoughts | hirst 7 which had been moved on the last day before the evacuation. much of this administrative work could proceed, and during the first week at the business library our finance department successfully completed our end-ofyear rollover process on all our materials funds. staff from the music library, art library, preservation, and special collections were assigned to the business library. the engineering library adopted the main library circulation and reserve departments. the media services staff was relocated to the physics library. the media staff had cleverly pulled most of the staff development videos and made them available to staff from the physics library, thus allowing the many displaced library staff to make progress on staff development requirements. was completely out of commission. the basements and first floors of the art and music buildings were completely ruined, but the libraries for these disciplines were on higher floors. the collections were spared, but there was absolutely no access to the building. n the libraries take baby steps to resume service after a week of being completely shut down, the university opened to a first day of summer school, but things were not the same. for the nineteen university buildings that had been flooded, hordes of contractors, subcontractors, and laborers began the arduous task of reclamation. university staff could work at home when that was possible, and most of the library’s dislocated reference staff did that, developing courses for the fall, progressing on selection work, and so on. staff could take vacation, but few chose this option. approximately 160 staff from the main library and the art and music libraries were reassigned to four branch libraries that were not affected by the flood. all of central technical services (cts) and interlibrary loan staff were assigned to the hardin health science library. central shipping and facilities was also at harden library, thus the convoluted distribution of mail started from here. most of the public machines were taken by cts staff, but their routine work proceeded very slowly. 
cts did not have access to oclc until the end of their flood relocation, which seriously impacted their workflow. an early problem that had to be solved was providing telephones and printing to relocated staff. virtually none of the relocated staff had dedicated telephones, even the administration. in any given location the small number of regular branch staff graciously shared their phones with their visitors. sharing equipment tended to be true for printers as well. for a few critical phone numbers in the main library, the phone number was transferred to a designated phone in the branch. thus often, when regular staff or student workers answered a phone, they had no idea what number the originating caller was trying to call. staff were encouraged to transfer their office phone number to their cell phone. at the business library, the library administrative staff and the finance staff had their personal workstations, library staff sandbagging photo by donald baxter 8 information technology and libraries | december 2008 was closed for about four weeks. the art and music libraries may be closed for a year. when library staff returned to the main library, there were books and manuscript boxes piled on the floor and on top of all the study tables. some of the main corridors, approximately twentyone feet wide, were so filled with library materials that you almost had to walk sideways and suck in your tummy to walk down the hall. bathrooms were blocked and access to elevators was limited. every library study table on the third through fifth floors were piled three feet high or more with books. for many weeks, library staff and volunteers carefully sorted through the materials and reshelved them as required. many materials needed conservation treatment, not because of the flood, but because of age and handling. many adjustments needed to be made to resume full service. due dates for all circulation categories had to be retrospectively altered to allow for the libraries being closed and for the extraordinary situations in which our library users found themselves during the flood. library materials were returned wet and moldy, and some items were lost. during the flood, in some cases, buildings actually floated down river. the libraries’ preservation department did extensive community education regarding treatment of materials damaged in the flood. the university was very interested in documenting the affect of the flood, and thus the libraries cooperated in trying to gather statistics on the number of hours of library staff and volunteers used during the flood. record keeping was complex, since one person could be a staff person working on flood efforts but also a volunteer working evenings and weekends. n our neighbors the effect of the iowa city flood of 2008 has been extensive, but was nothing compared to the flood in cedar rapids, our neighbor to the north. the cedar rapids public library lost their entire collection of 300,000 volumes, except for the children’s collection and 26,000 volumes that were checked out to library users that week. it staff were housed throughout the newly distributed libraries complex. one it staff member was at the engineering library, one was at the health science library, and two were at the business library. several it staff were relocated to the campus computer center. 
n the libraries proceed apace despite hurdles as the water receded and workers cleaned and proceeded with air handling and mold abatement, a very limited number of library staff were allowed back into the main library, typically with escorts, for very limited periods of time. during this time it staff was able to go into the main library and retrieve barcode scanners to allow cts staff to progress with book processing. staff went back for unprocessed materials needing original cataloging since staff had the time to process materials but didn’t have the materials. it staff retrieved some of our zebra printers so that labels could be applied to unbound serials. as it staff were allowed limited access to the main library, they went around to the various staff workstations and powered them up so that relocated staff could utilize the remote desktop function. n moving back the art and music libraries were evacuated june 10. the main library was evacuated june 13. the main library passing the books up the stairs photo courtesy of the university of iowa news service the goal of this paper is to describe a design—including the hardware, software, and configuration––for an open source wireless network. the network designed will require authentication. while care will be taken to keep the authentication exchange secure, the network will otherwise transmit data without encryption. w ireless networks are an essential tool for provid­ ing service for colleges and libraries. this paper will explain the setup of a wireless network using open­source software and inexpensive commodity hardware. open­source software was employed exclu­ sively. this allowed for flexibility in design and reduction in expense while also providing a platform for students to learn more about the internal workings of the system by examining particular sections of code in which they have interest. standard commodity hardware was used as a means of saving cost. this should allow others to repeat this design with a minimum of funding. the purpose of a network, like any resource, is to provide a service for those who own it; in this case, the patrons of a library, or students, faculty, and staff at a col­ lege. to ensure that this network serves its owners, users will be required to authenticate before gaining access. once authenticated, the central captive portal can pro­ vide different levels of service for specific user groups, including guest access, if desired. for this system, ease of access for users was the primary concern; other than using the secure socket layer for authentication, the remainder of the traffic was unencrypted. other than the base nodes, the remaining access points were connected to each other using a wireless connection in order to avoid physically connecting all access points across campus and to further reduce the expense for the deployment of the network. this was accomplished using the wds (wireless distributed system) feature on the wireless routers. all access points connect to a centralized set of servers that provide: dhcp, web­caching proxy, dns caching, radius, web server, a captive portal, and logging of network traffic. n hardware requirements for the network were relatively modest, using inexpensive wireless routers along with several linux servers built upon older pentium 3 desktop systems. linksys wrt54gs routers were chosen as the access points as they are inexpensive, readily available, and possess the ability to run custom open­source firmware. 
other access points could be used; however, the configuration sugges­ tions are specific to the wrt54gs and may not apply to other hardware. the routing functions of the wrt54gs were not used in this implementation. the servers need not be anything special; older hardware will work just fine. for this implementation, decommissioned 900 mhz units with 512mb of ram and 40gb hard drives were used. n wireless router software in order to provide the functionality required, the units had their firmware flashed with an open­source, linux­ based operating system available from openwrt for the linksys routers (http://www.openwrt.org). support is also available for other wireless devices. “the firmware from openwrt provides a fully writable file system with pack­ age management. this allows developers the freedom to customize the devices by choosing only the packages and software that are necessary for their applications.”1 as the routers have limited storage, being able to hand select only the necessary components is a definite advantage. n server software for the operating system on the servers, fedora core was chosen.2 fedora provides the yellow dog updater, modified (yum), which eases the updating of all pack­ ages installed on the system, including kernel updates.3 this aids security by providing a platform for easily and frequently updating the system. fedora core is an open­ source distribution that is available for free. fedora core also comes with many other open­source packages that were used in this design, such as the apache web server. while the designers had more familiarity with fedora, other distributions are also available that provide simi­ lar benefits (suse, ubuntu, openbsd, debian, etc.). the server was run in command line mode with no graphical user interface in order to reduce the load on the server and save space on the hard drive. n captive portal in order to require authentication before gaining access to the network, a captive portal was used. some of the open source wifi hotspot implementation | sondag and feher 35 open source wifi hotspot implementation tyler sondag and jim feher jim feher (jdfeher@mckendree.edu) is an associate professor of computer science at mckendree college in lebanon, illinois. tyler sondag (tnsondag@mckendree.edu), is a senior in computer science at mckendree college. 36 information technology and libraries | june 200736 information technology and libraries | june 2007 desired features in the choice of the captive portal were: encrypted authentication, traffic logging, and the ability to provide different levels of service for different user groups. logging traffic allows the system administrators to identify accounts that have been misusing the network. those who inadvertently misuse the system or perhaps have had their accounts compromised can have their access temporarily disabled until they can be contacted with instructions concerning acceptable use of the net­ work. as the network must be shared by all, those who habitually abuse the resource can have their accounts per­ manently disabled. the captive portal should also redi­ rect web traffic to a login page that is served on the secure socket layer until the user logs in. chillispot was chosen as it possesses all of the features mentioned above.4 n server layout as can be seen in appendix a, three servers were used for this implementation. the first server was used as the main router to the internet. 
the second server ran a squid web caching server.5 it also ran a dns cach­ ing server and the freeradius server.6 the third was used for the captive portal. three servers were used for various reasons. first, this distributed the load. second, portions of the network that were not behind the cap­ tive portal could more easily use the services on the second server running squid, dns, and freeradius. it should be noted that three independent servers are not required; many of the services could be consolidated on two or even one single server to reduce the hardware requirements. the implementation depends upon the specific needs for the network. n server installation installing the operating system (fedora core) on each server is a relatively straightforward procedure. each machine was partitioned with 1024 mbs of swap space with the rest of the drive being an ext3 partition with the mount point “/”. only the minimal set of packages required were installed at this time. the first server, server #1 (router), was given three network interfaces, one for the internet connection, one to connect to a switch that then connects to server #2 (web/dns caching and radius) as well as other machines that do not connect through the captive portal, and one connecting to server #3 (captive portal machine). the second server, server #2, only needs one interface, but the third, server #3, requires two interfaces, one for the master wireless access point, and one to connect to the switch connecting this machine to the rest of the network (appendix a). ssh login for root was also disabled at this time for added security. n server #1 configuration for server #1, very little setup was required. since this server works mainly as a router, the only major items that went into its configuration were the iptables rules, which are shown and described in appendix b.7 rules were set up to: n set up network address translation; n allow traffic to flow within the network; n log the traffic from the wireless portion of the net­ work; n allow for the transparent setup of the web proxy server; and n set up port knocking before allowing users to log into the router via ssh.8 a reference to this script was placed in the /etc/rc.d/ rc.local file so that it would run when the server boots. last was the setup of the three network interfaces in the machine. this can be done during system installation or afterwards on the fedora core based server by editing the configuration files in the /etc/sysconfig/networking­ scripts/ directory. one of the configuration files used in this implementation can be seen in appendix c. of course the configuration will change as the topology of the net­ work changes. n server #2 configuration the second server required significantly more setup to configure all of the necessary services that it runs. the first service added for this implementation was the web­ caching proxy server, squid. squid’s default configura­ tion file (/etc/squid.conf) is quite large; fortunately it requires little modification to get a simple server up and running.9 the changes made for this implementation can be seen in appendix d. the most important lines in this configuration are the last few, which enable it to act as a transparent proxy server, making it invisible to the users and requiring no setup of their browsers. 
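a minimal sketch of the two pieces just described, network address translation on the router and the prerouting redirect that makes squid a transparent proxy, is shown below. the interface names and addresses are the ones used in this implementation (see appendix b); on another network they would of course differ.

# enable packet forwarding on the router (server #1)
echo "1" > /proc/sys/net/ipv4/ip_forward

# hide the internal subnets behind the router's external interface (nat)
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# silently redirect web traffic from the wireless subnet to squid on
# server #2, so client browsers need no proxy configuration
iptables -t nat -A PREROUTING -i eth2 -s 10.5.0.0/16 -p tcp --dport 80 \
    -j DNAT --to-destination 10.4.1.90:3128

the complete rule set, including the logging, subnet-forwarding, and port-knocking rules, appears in appendix b.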
as there was no need for an authoritative dns server, just dns caching for the network, dnsmasq, which is easy to configure and can handle both dhcp services as well as dns caching, was chosen.10 in this instance, the captive portal was used to provide dhcp services for the wireless clients; however dnsmasq was used for dynamic clients on the remaining portion of the network. dnsmasq public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 37open source wifi hotspot implementation | sondag and feher 37 is relatively easy to configure, requiring only one change in its default configuration file, which points to the file in which the dns server addresses are stored, in this case /etc/dnsmasq_resolv.conf. next is the configuration of freeradius server. there are two files that need to be modified for the radius server; both are in the /etc/raddb/ directory. the first is clients.conf (appendix e). in this file at least two clients must be listed, one for localhost (this machine) and one for the captive portal machine. for each machine, a pass­ word must be specified as well as the hostname for that machine. this establishes the shared key that is used to encrypt communication between the captive portal and the radius server. the second is the users file (appendix f). in this file, each user for the captive portal system must be listed with his/her password. this implementa­ tion also included a class, a session timeout (dhcp lease time), idle timeout, accounting interim interval, and the maximum upload and download speeds. if guest access is required, one or several guest accounts should be added to this file along with entries for the registered users. an entry was added for each access point so that they can obtain an ip address from the dhcp server. finally for this machine, the interface configuration file was changed according to the network specifications. for this machine the configuration is simple since it only has one interface, and the only requirement for its address is that it be on the same network as the interface on the main router server to which it is connected. n server #3 configuration the third server required the installation of the captive portal software, in this case chillispot. in order to install chillispot, if fedora was used for the base system, it may be possible to install it as a prepackaged binary in the form an rpm package manager (rpm) file. otherwise, if you find that you need to compile chillispot from source code, you may need to deviate from a minimal installa­ tion of the operating system and base components and also include the gnu compiler collection (gcc). when installing from source code, first download the code from the chillispot web site. once the code is down­ loaded, unzipped and untarred, installing the chillispot daemon is done by entering the directory containing the source files and entering the standard commands: ./configure make make install when chillispot is on the system, either by compiling from source or through an rpm file, two more files must be configured and copied to the proper directory, the main configuration file and the login file. the configuration file, chilli.conf, is located in the directory that contains the source files. move this file to the /etc/ directory and make the necessary changes. in this implementation, the file required several changes (appendix g). 
one of the more significant alterations was to change the default network range of 192.168.182.0/24, which would be limited to less than 256 addresses. the address range was for the dhcp server was also expanded to allow for more users. the lower portion of the network range was left to make room for addresses that could be assigned to the wireless access points. an entry was added to allow the access points to obtain a static ip address in that lower range. after this, settings must be changed for the dns addresses given out to clients, and the address of the radius server. there is also a setting in the chillispot configuration file that allows users to access a certain list of domains without logging in. for this implementation, the decision was to allow the users access to the campus network, as well as to the dns server. next, the “radi­ ussecret” must be set. this is the same password that was entered into the clients.conf file on the radius server for this machine. it is also necessary to set the address of the page to which users will be directed. two lines must also be added to allow authentication using the physical or media access control (mac) address for the access points. all of the access points shared a common password. chillispot passes the physical address of the access point to the radius server along with this password. a separate entry must exist in the radius configuration file for each ip/physical address combination. for this setup, the redirect page was placed on this server, therefore apache (using yum) was also installed, and this server’s address was added as the web address for the redirect page (also note that the https module may be required for apache if it does not automatically install). rather than write a new page at this time, the sample page (hotspotlogin.cgi) from the chillispot source folder was copied and modified slightly (appendix h). in addi­ tion, a secure socket layer (ssl) certificate was installed on this server. this is not necessary, but it helps to avoid the warnings that pop up when a client attempts to access the login page with a browser. a few iptables rules need to be added. the first com­ mand needs to be executed in order to utilize network address translation (nat) and have the server forward packets to the outside network. /sbin/iptables ­t nat ­a postrouting ­o eth0 \ ­j masquerade the next is used to drop all outbound traffic originating from the access points. this prevents anyone spoofing the physical address of the access point from accessing 3� information technology and libraries | june 20073� information technology and libraries | june 2007 the internet, while still allowing the access points and the chillispot server to communicate for configuration and monitoring. /sbin/iptables ­a forward ­s 192.168.182.0/24 \ ­j drop these commands need to be executed when the chillispot machine boots, so they were placed into the /etc/rc.d/rc.local file. it may also be necessary to ensure that the machine can forward network traffic. this can be accomplished with the following command, which is also found as the first executable command from the script in appendix b: echo “1” > /proc/sys/net/ipv4/ip_forward finally, the configuration files for the interfaces were set up. 
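the glue between the captive portal and the radius server is the shared secret: the radiussecret in chilli.conf has to match the secret recorded for the chillispot host in the radius server's clients.conf, and every wireless user (or guest account) needs an entry in the users file. the stripped-down sketch below uses the addresses from this implementation (appendices e through g); the secret and the user credentials are placeholders.

# /etc/chilli.conf on server #3 (excerpt)
net           192.168.176.0/20
dynip         192.168.184.0/21
statip        192.168.182.0/24
dns1          10.4.1.90
radiusserver1 10.4.1.90
radiusserver2 10.4.1.90
radiussecret  example-secret
dhcpif        eth1
uamserver     https://10.5.3.30/cgi-bin/hotspotlogin.cgi
uamallowed    10.4.1.90

# /etc/raddb/clients.conf on server #2 (excerpt)
client 10.5.3.30 {
        secret    = example-secret    # must equal radiussecret above
        shortname = chillispot
}

# /etc/raddb/users on server #2 (excerpt)
joeuser  auth-type := local, user-password == "changeme"
         session-timeout = 3600,
         idle-timeout = 600

the same users file also carries one entry per access point, keyed on its physical address, so that the macauth and macpasswd options in chilli.conf can let the access points themselves onto the network, as shown in appendices f and g.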
n openwrt installation and configuration several ways exist to replace the default linksys firmware with the openwrt firmware.11 the tftp protocol can be used with both windows and linux, and one such method can be found in appendix i.12 in addition, other methods for using the standard web interface can be found on the openwrt web site.13 there are several versions of the openwrt firmware available; the newest version that uses the squashfs filesystem was chosen because it utilizes com­ pression that frees more space on the access point. openwrt comes with a default web interface that can be used for configuration, however, ssh was enabled and a script using the nvram command was used to configure each access point (see appendix j). before ssh can be used, you must telnet into the router and change the default password (which for linksys routers is ‘admin’). note: even if you decide to use the web interface, you should still change the default password. as several services that were installed with the default configuration were not used in the implementa­ tion, they were disabled once the firmware was flashed by removing the modules that boot at startup: the web interface, dnsmasq, and the firewall. this is done by deleting their entries in the /etc/init.d directory. changes were needed to set the mode of the access point, to turn on and configure the clients needing to use wds, to set the network information for the access point and then to save these settings. all of the wireless access points that communicate with each other via a wireless connec­ tion must have their physical addresses entered using a nvram command. for example, the command used for the main access point for the library would be: nvram set w10_wds=”mac_4_lib1 mac_4_lib2” all of this is detailed in appendix j. a final set of com­ mands, which were needed for the wrt54gs, are included to allow the access point to obtain its ip address from the dhcp server. these commands may not be necessary depending upon the type of access point used. since extra wireless access points are available, if an access point fails or is having problems for some reason, it is simply a matter of running a script similar to the one found in the appendix on one of the extra routers and swapping it out. n security unfortunately this system is not very secure. only the login credentials are encrypted via ssl. general data packets are in no way encrypted, so any information being transmitted is available to anyone sniffing the channel. wep and wpa could be used for encryption, but they have known vulnerabilities. other methods exist for securing the network such as wpa with radius or the use of a virtual private network, however the client setup for such systems may not be considered trivial for the typical user. therefore it was decided that it was better to inform the users that the data was not being encrypted and let them act accordingly, rather than use encryption with known flaws or invest the time required to train the general population on how to configure their mobile units to use a more secure form of encryption. as the main goal of this particular network was connectivity and not security, it was felt that this was a fair trade­ off. as new standards for wireless communication are developed and commodity hardware that supports them becomes available, this may change so that encrypted channels can be employed more easily. n conclusion this implementation is in no way completed. it is a work in progress, with many goals still in mind. 
also, as new features are desired, parts of the system will change to accommodate these requirements. current plans for the future are first to develop scripts to check the status of the access points and display this information to a web page. these scripts will also notify network administrators when access points go offline. this will help the adminis­ trators in making sure the system is up at all times. after this, scripts will be developed to parse the log files to find abusive activity (spamming, viruses, etc). however, the current project as described is complete and has already functioned successfully for nearly a year providing con­ nectivity for the library and portions of the mckendree college campus. public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 3�open source wifi hotspot implementation | sondag and feher 3� references and notes 1. openwrt, wireless freedom. www.openwrt.org (accessed june 16, 2006). 2. the fedora project. www.fedora.redhat.com (accessed nov. 29, 2005). 3. yum: yellow dog updater, modified. www.linux.duke. edu/projects/yum (accessed july 22 2006). 4. chillispot—open source wireless lan access point controller. www.chillispot.org (accessed june 23, 2006). 5. squid web proxy cache. www.squid­cache.org (accessed june 1, 2006). 6. freeradius—building the perfect radius server. www. freeradius.org (accessed june 28, 2006). 7. netfilter/iptables project homepage—the netfilter.org project. www.netfilter.org (accessed aug. 8, 2006). 8. thomas eastep, “port knocking and other uses of ‘recent match.’” www.shorewall.net/portknocking.html (accessed aug. 11, 2006). 9. squid web proxy cache, “squid frequently asked questions: interception caching/proxying.” www.squid­cache. org/doc/faq/faq­17.html (accessed aug. 8, 2006). 10. dnsmasq—a dns forwarder for nat firewalls. www. thekelleys.org.uk/dnsmasq/doc.html (accessed june 1, 2006). 11. linksys.com. www.linksys.com (accessed dec. 15, 2005). 12. openwrtdocs/installing/tftp—openwrt. wiki.open­ wrt.org/openwrtdocs/installing/tftp?action=show&redirect =openwrtviatfp (accessed aug. 2, 2006). 13. openwrtdocs/installing—openwrt. wiki.openwrt.org/ openwrtdocs/installing (accessed aug. 2, 2006). appendix a. network configuration 40 information technology and libraries | june 200740 information technology and libraries | june 2007 appendix b. iptables script—server #1 # this particular bit must be set to one to allow the # network to forward packets echo “1” > /proc/sys/net/ipv4/ip_forward # set up path to the internal network from internet if the # internal network initiated the connection iptables ­a forward ­i eth0 ­o eth1 ­d 10.4.0.0 \ ­m state ­­state established,related ­j accept # same for the chillispot subnet iptables ­a forward ­i eth0 ­o eth2 ­d 10.5.0.0 \ ­m state ­­state established,related ­j accept # allow the internal subnets to communicate with one another iptables ­a forward ­i eth1 ­d 10.5.0.0 ­o eth2 \ ­j accept iptables ­a forward ­i eth2 ­d 10.4.0.0 ­o eth1 \ ­j accept # allow subnet containing server 2 to reach the internet iptables ­a forward ­i eth1 ­o eth0 ­j accept # chillispot – accept and forward packets iptables ­a forward ­i eth2 ­s 10.5.3.30 ­j accept # set up transparent proxy for wireless network, but allow # connections that go through to the campus network # to bypass proxy iptables ­t nat ­a prerouting ­i eth2 ! 
\ ­d 66.99.172.0/23 ­p tcp ­­dport 80 ­s 10.5.0.0/16 \ ­j dnat ­­to­destination 10.4.1.90:3128 # nat iptables ­t nat ­a postrouting ­o eth0 \ ­j masquerade # simple port knocking to allow port 22 connection adapted # from www.shorewall.net/portknocking.html1 another # excellent document can be found at # www.debian-administration.org/articles/26814 # once connection started let it continue iptables ­a input ­m state ­­state \ established,related ­j accept # if name ssh has been set, then allow connection iptables ­a input ­p tcp ­­dport 22 ­m recent \ ­­rcheck ­­name ssh ­j accept # surround the port that opens ssh so that a sequential port # scanners will end up closing it right after opening it. iptables ­a input ­p tcp ­­dport 1233 ­m recent \ –­name ssh ­­remove ­j drop iptables ­a input ­p tcp ­­dport 1234 ­m recent \ ­­name ssh ­­set ­j drop iptables ­a input ­p tcp ­­dport 1235 ­m recent \ ­­name ssh ­­remove ­j drop # drop all packets that do not match a rule above by default iptables ­a input ­j drop appendix c. server configuration for first network card (ethernet 0) # /etc/sysconfing/networking­scripts/ifcfg­eth0 ­ # server #1 # device=eth0 bootproto=static broadcast=66.128.109.63 hwaddr=00:11:22:33:44:66 ipaddr=66.128.109.60 netmask=255.255.255.248 network=66.128.109.56 onboot=yes type=ethernet appendix d. /etc/squid.conf—server #2 #default squid port http_port 3128 # settings changed to specify memory for squid cache_mem 32 mb cachedir ufs /var/spool/squid 1000 16 256 # allow assess to squid for all within our network acl all src 0.0.0.0/0.0.0.0 http_access allow all http_reply_access allow all # internal host with no externally known name so we put # our internal host name visible_hostname hostname # specifications needed for transparent proxy2 httpd_accel_port 80 httpd_accel_host virtual httpd_accel_with_proxy on httpd_accel_uses_host_header on public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 41open source wifi hotspot implementation | sondag and feher 41 appendix e. /etc/raddb/clients.conf— server #2 client 127.0.0.1 { secret = password shortname = localhost nastype = other } client 10.5.3.30 { secret = password shortname = other machine } appendix f. /etc/raddb/users—server #2 # example of an entry for a user joeuser auth­type:=local, user­password==”passwd” class = 0702345678, session­timeout = 3600, idle­timeout = 600, acct­interim­interval = 60, wispr­bandwidth­max­up = 128000, wispr­bandwidth­max­down = 512000 # example of an entry for an access point # the physical/mac address listed below is for the # lan side of the router/access point mac_address auth­type := local, user­password == “password” framed­ip­address = 192.168.182.10, acct­interim­interval = 3600, session­timeout = 0, idle­timeout = 0 appendix g. 
/etc/chilli.conf—server #3 # used to expand the network net 192.168.176.0/20 # used to expand the number of hosts that can connect # while still leaving a portion of the network for # infrastructure dynip 192.168.184.0/21 # used to give static addresses to the access points statip 192.168.182.0/24 # internal dns followed by external dns dns1 10.4.1.90 dns2 24.217.0.3 # radius server for the network radiusserver1 10.4.1.90 radiusserver2 10.4.1.90 # radius secret used radiussecret password # interface chillispot server to listens to dhcp requests dhcpif eth1 # specified default login page uamserver https://10.5.3.30/cgi­bin/hotspotlogin.cgi # addresses that users can visit without authenticating uamallowed 10.4.1.90,24.217.0.3,66.99.172.0/24 # this allows the access points to authenticate based on # mac address only, this is required to log into the access # points from the captive portal server macauth # this password corresponds with the password from the # radius users file macpasswd password 42 information technology and libraries | june 200742 information technology and libraries | june 2007 appendix h. redirection page appendix i. method for flashing firmware of linksys router the firmware can be flashed using the built­in web inter­ face or via tftp. while help is available online3 for this, the procedure outlined here may also be helpful. on newer versions of the linksys routers, an older version of the linksys firmware must be installed first that supports a bug in the ping function on the router. once the older version is installed, you can exploit a bug in the ping com­ mand on the router to enable “boot wait,” which enables the router to accept a connection to flash its firmware as it is booting. detailed instructions for this installation are as fol­ lows: n first, download an old version of a linksys firmware that supports the ping bug to enable boot wait. one is available at: ftp://ftp.linksys.com/pub/network/ wrt54gs_3.37.2_us_code.zip n download and unzip this file. n plug an ethernet patch cable into link #1 on the router (not the wan port) and the interface on your machine. set the ip address of your computer to a static ip address in the 192.168.1.x range, not 192.168.1.1, which is used by the router. n log into router by opening a browser window and putting 192.168.1.1 into the address bar. (note: this is only for factory preset routers.) username: (leave blank) password: admin n click on "administration". n click on "firmware upgrade". n click "browse" and locate the old linksys firmware on your machine. n click "upgrade". n wait patiently while it flashes the firmware…. n click "setup". n click "basic setup". public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 43open source wifi hotspot implementation | sondag and feher 43 n choose "static ip" from the first box. n for the ip address put in "10.0.0.1". n for the netmask put in "255.0.0.0". n for the gateway put in "10.0.0.2". n you can leave everything else as their default set­ tings. n choose save settings at the bottom of the page. n click on "administration". n click on "diagnostics". n click on "ping". in the “address” box put the following commands in one at a time and click on “ping”; if you see the message that the host was unreachable you have done something wrong. 
;cp${IFS}*/*/nvram${IFS}/tmp/n
;*/n${IFS}set${IFS}boot_wait=on
;*/n${IFS}commit
;*/n${IFS}show>tmp/ping.log
■■ after the last command you will see a list of all the nvram settings on the router; make sure that the line for "boot_wait" is set to on.
■■ unplug the router (the linksys router will only look for new firmware on boot).
■■ use tftp on your linux or windows machine.
■■ if the openwrt-wrt54gs-squashfs.bin file is not in this directory, copy the file to this directory.
■■ run the following commands at the prompt (below are the linux commands):
tftp 192.168.1.1
tftp> binary
tftp> rexmt 1
tftp> timeout 60
tftp> trace
tftp> put openwrt-xxx-x.x-xxx.bin
■■ the router will now reboot (it may take a very long time); when it is done rebooting, the dmz light will turn off. the new firmware is now loaded onto the router.

appendix j. nvram script for wireless routers

## server information stored as comments
##192.168.182.10 mainap 00:11:22:33:44:00
##192.168.182.11 cl202a 00:11:22:33:44:11
##192.168.182.20 lib01 00:11:22:33:44:22
##192.168.182.21 lib02 00:11:22:33:44:33
##192.168.182.22 lib03 00:11:22:33:44:44
##192.168.182.30 car01 00:11:22:33:44:55
## same for all
nvram set wl0_mode=ap
nvram set wl0_ssid=mck_wireless
nvram set wl0_channel=9
nvram set lan_proto=dhcp
## sample configuration for a few access points.
## uncomment and run for the appropriate node. make sure to
## add a line for every access point you have.
## unique for lib01
## allow connections to/from lib02 and lib03
#nvram set wl0_wds="00:11:22:33:44:33 00:11:22:33:44:44"
## unique for lib02
## allow connections to/from lib01
#nvram set wl0_wds="00:11:22:33:44:22"
## unique for lib03
## allow connections to/from lib01
#nvram set wl0_wds="00:11:22:33:44:22"
## same for all
nvram commit
## same for all
## this needed to be done to allow each wrt54gs router
## to accept an ip address from a dhcp server. this is
## only for the wrt54gs; other access points/routers
## may require something different.
# cd /etc/init.d
# rm S05nvram
# cp /rom/etc/init.d/S05nvram .
# vi S05nvram
## place a # in front of (comment out)
## nvram set lan_proto="static"

references

1. thomas eastep, "port knocking and other uses of 'recent match,'" www.shorewall.net/portknocking.html (accessed aug. 11, 2006).
2. ibid.
3. openwrtdocs/installing openwrt, wiki.openwrt.org/openwrtdocs/installing (accessed aug. 2, 2006).

animated subject maps for book collections
tim donahue
information technology and libraries | june 2013 7
tim donahue (tdonahue@montana.edu) is assistant professor/instruction librarian, montana state university, bozeman, mt.

abstract

of our two primary textual formats, articles by far have received the most fiscal and technological support in recent decades. meanwhile, our more traditional format, the book, seems in some ways to already be treated as a languishing symbol of the past. the development of opacs and the abandonment of card catalogs in the 1980s and 1990s is the seminal evolution in print monograph access, but little else has changed. to help users locate books by call number and browse the collection by subject, animated subject maps were created. while the initial aim is a practical one, helping users to locate books and subjects, the subject maps also reveal the knowledge organization of the physical library, which they display in a way that can be meaningful to faculty, students, and other community members. we can do more with current technologies to assist and enrich the experience of users searching and browsing for books. the subject map is presented as an example of how we can do more in this regard.
lc classification, books, and library stacks

during the last few decades of technological evolution in libraries, we have helped facilitate a seismic shift from print-based to digital research. our library websites are jammed with electronic resources, digital collection components, database links, virtual reference assistance, online tutorials, and mobile apps. collection budgets too have shifted from a print to an electronic focus. many libraries are now spending less than 20 percent of their material budgets on print monographs. and yet, our stacks are still filled with books that often take up more than 50 percent of our library spaces. knowledge organization schemas have also evolved in libraries. we have subject lists to help users decide which databases to select that reflect current disciplines and majors in higher education. internal database navigation continues to evolve in terms of limits, fields, and subject searching. web searching is based on the contemporary keyword approach where "everything is miscellaneous" and need not be organized, but nationwide, billions of books still sit on shelves according to dewey or library of congress classification systems that were initially developed over a century ago. some say these organizing systems are woefully antiquated and do not reflect our contemporary post-modern realities, though they still amply serve their purpose to assign call number locations for our books. we hear little of plans to update these classification schemes. why invest more time, energy, and resources on revamped organization schemes for libraries? the hathitrust now contains the scanned text of more than ten million books. google claims there are almost 130 million published titles in the world and intends to digitize all of them.1 what will happen to our physical book collections? how long will they reside on our library shelves? how long will they be located using the dewey and lc systems? is the library a shrinking organism? profession-wide, there seems to be no concrete vision in regard to the future of our book collections. there is, of course, general acknowledgement that acquisition of e-books will increase as print acquisitions decrease and that, overall, print collections will accordingly shrink to reflect the growing digital nature of knowledge consumption. but for now and into the foreseeable future these billions of monographs remain on our shelves in the same locations our call number systems assigned to them decades ago. and while online library users are now able to utilize an array of electronic access delivery systems and web technologies for their article research and consumption, book seekers still need a call number. books and articles have been our two primary textual formats for centuries. articles have moved into the digital realm more fleetly than their lengthier counterparts. their briefer length, the cyclical serial publication process, and the evolution of database containment and access have enabled, in a relatively short time, a migration from print to primarily digital access. books, however, are accessed in much the same way they were a hundred years ago.
the development of opacs in the 1980s and 1990s and abandonment of card catalogs is the seminal evolution in print monograph access, but little else has changed.2 once a call number is attained, the rest of the process remains physical, usually requiring pencil, paper, feet, sometimes a librarian, and a trip through the library until the object itself is found and pulled from the shelf. so while the process of article acquisition may employ a plethora of finding aids, keyword searching, database features, full text availability, and various delivery methods through our richly developed websites, beyond the opac and possibly a static online map, book seekers are on their own or need a librarian in what may seem a meaningless labyrinth of stacks and shelves. while the primary and most practical purpose of our classification schemes is to provide an assigned call number for book finding, these organizational outlines create an order to the layout of our stacks that maps a universe of knowledge within our library walls. this structure of knowledge reveals a meaning to our collections that includes the colocation of books by topic and proximity of related subjects. these features enhance the browsing process and often lead to the act of serendipitous discovery. to locate a book by call number, a user may consult library floor plans, which are typically limited to broad ranges or lc main classes, then rely on stack-end cards to home in on the exact stack location. to browse books by subject without using the catalog, a user typically must rely on a combination of floor plans and lc outline posters if they exist at all. often, informed browsing by subject cannot take place without a visit to the reference desk for mediation by a librarian. even then, many librarians are barely familiar with their book collection’s organizational structure and are reticent to recommend broad subject browsing. information technology and libraries | june 2013 9 purpose and description of the subject map to help users locate books by call number and browse the collection by subject, animated subject maps were created at skidmore college and montana state university. displaying overhead views of library floors, users mouse over stacks to reveal the lc sub-classes located within. alternatively, they may browse and select lc subject headings to see which stacks contain them. the lc outline contains 21 main subject classes and 224 sub-classes, corresponding to the first two elements of a book call number. on stack mouse-over, three items are displayed: the call number by range, the main subject heading, and all sub-classes contained within the stack. when using the browse by subject option, users select and click an lc main class and the stacks where this class is located are highlighted. while the initial aim is a practical one, helping users to locate books and subjects, the subject map also reveals the knowledge organization of the physical library, which it displays in a way that can be meaningful to faculty, students, and other community members. the map also provides local electronic access to the lc classification outline. at both institutions the maps are linked from prominent web locations and electronic points of need that are relevant and proximate to other book searching functions and tools. figure 1. skidmore college subject map showing stack mouse-over display. animated subject maps for book collections | donahue 10 figure 2. montana state university subject map showing stack mouse-over display. 
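the two lookups just described, stack mouse-over and browse by subject, amount to a small two-way index between stacks and lc classes. the sketch below is our own illustration of that index in python, not the actionscript used in the actual maps; the stack labels and class ranges are invented examples.

# hypothetical two-way index behind a subject map: each stack knows its
# call-number range, lc main class, and sub-classes; browse-by-subject
# simply inverts the index. labels and ranges are invented examples.
stacks = {
    "floor2-stack01": {"range": "bf1-bf990",
                       "main": "b - philosophy, psychology, religion",
                       "subclasses": ["bf"]},
    "floor2-stack02": {"range": "bl1-bx4700",
                       "main": "b - philosophy, psychology, religion",
                       "subclasses": ["bl", "bm", "bp", "br", "bs", "bt", "bv", "bx"]},
    "floor3-stack14": {"range": "qa1-qa939",
                       "main": "q - science",
                       "subclasses": ["qa"]},
}

def mouseover(stack_id):
    """the three items shown on rollover: range, main class, sub-classes."""
    s = stacks[stack_id]
    return s["range"], s["main"], s["subclasses"]

def stacks_for_class(main_class):
    """the stacks highlighted when a user clicks an lc main class."""
    return [sid for sid, s in stacks.items() if s["main"] == main_class]

print(mouseover("floor2-stack02"))
print(stacks_for_class("b - philosophy, psychology, religion"))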
design rationale and methodology the inspiration for the subject map started with a question: what if users could see on a map where individual subjects were located within the library? most library maps examined were limited to lc main classes or broad ranges denoting wide swaths of call numbers. including hundreds of lc subclasses would convolute and clutter a floor map beyond usability. but what if an online map contained each individual stack and only upon user-activation was the information revealed, saving space and avoiding clutter? such a map should be as devoid of congestion as possible and focus the user’s attention on library stack locations and lc classification. working from existing maps and architectural blueprints of the library building, a basic perimeter was rendered using adobe illustrator and indesign software. these perimeters were then imported into adobe flash and a new .fla file created. library stacks were then measured, counted, and added as a separate layer within each floor perimeter. basic location elements such as stairways, elevators, and doors were added for locational reference points. each stack was then programmed as a button with basic rollover functionality. flash actionscript was coded so that the correct call number, main class, and sub-class information appear within the interface upon rollover activation. this functionality accounts for the stack searching ability of the subject map. information technology and libraries | june 2013 11 additionally, the lc outline was made searchable within the map so that users can mouse over subjects and upon clicking, see what stacks contain those main classes. this functionality accounts for the subject searching ability of the map. left-hand navigation was built in so users can toggle between these two main search functions. maintaining visual minimalism and simplicity was a priority and inclinations to render the map more comprehensively were resisted in order to maximize attention to subject and stack information. black, white, and gray colors were chosen to enhance the contrast of the map and aid the user’s eye for quick and clear use. other relevant links and instructional context were added to the left-hand navigation including links to the catalog, official lc outline, and library homepage. finally, after uploading to the local server and creating a simple url, links to the subject map were established in prominent and meaningful points of need within the library website. user acceptance once the subject map was completed and links to it were made public, a brief demonstration was provided for reference team members who began showing it to users at the reference desk. initial reaction was enthusiastic. students thought it was “cool” and enjoyed “playing with it.” one reported, “i didn’t know the library actually made sense like that. it’s neat to see the logic about where things are.” another student said, “now i can see where all the books on buddhism are!” faculty, too, were pleased. though faculty members typically know a little about lc classification, they are not accustomed to seeing it visualized and grafted onto their institutional library’s stacks. making transparent the intellectual organization of the library for other faculty can bolster their confidence in our order and structure. professors are often pleased to see their discipline’s place within our stacks and where related subjects are located. 
the most positive praise for the subject map, however, comes from the sense of convenience it lends. many comments express appreciation for the ability to directly locate an individual book stack. because primary directional and finding elements like stairs and elevators are included in the maps, users are able to see the exact path that leads to the book they are seeking. for those not interested in browsing, in a hurry, or challenged in terms of mobility, the subject map is a time and energy saver. some users, however, have reported frustration with the sensitivity required for the mouse-over functions. others desire a more detailed level of searching beyond the sub-class level. one user pointed out that the subject map was of no help to the blind.

multiple uses and internal applications

the primary use and most obvious application of the subject map is as a reference tool. as a front-line finding aid, librarians and other public service staff at reference, circulation, or other help desks can easily and conveniently deploy the map to point users in the right direction and orient them to the book collection. in library instruction sessions, the subject map is not only a practical local resource worth pointing out, but also serves as an example of applied knowledge organization. when accompanying a demonstration of the library catalog, the map is not only a valuable finding aid, but adds a layer of meaning as well. students who understand the map are not only more able to browse and locate books, but learn that a call number represents a detailed subject meaning as well as a locational device. used in conjunction with a tour, the map reinforces the layout of library shelves and helps to bridge the divide between electronic resources and physical retrieval. the subject map facilitates a concrete and visual introduction to the lc classification outline, a knowledge of which can be applied to most college and research libraries in the united states. the subject map can also be of assistance with collection development. perusal of the map can reveal relative strengths and weaknesses within the collection. subject liaisons and bibliographers may use the map to home in on and visualize their assigned areas. circulation staff and stacks maintenance workers find the map useful for book retrieval, shifting projects, and in the training and acclimation of new workers to the library. the subject map has proven to be a useful reference for library redesign and space planning considerations. at information fairs and promotional events where devices or projection screens are available, the map has served as a talking point and promotional piece of digital outreach. the map has been demonstrated by information science professors to lis graduate students as an example of applied knowledge organization in libraries. recently, a newly hired incoming library dean commented that the map helped him "get to know the book collection" and familiarized him with the library. figure 3. skidmore college subject map showing subject search display.

issues and challenges

in some libraries, books don't move for decades. the same subjects may reside on the same shelves during an entire library's lifetime. in this case, a subject map can be designed once and never edited. but, of course, most library buildings go through changes and evolutions. in many libraries, collection shifting seems to be ongoing.
book collections wax and wane. certain subjects expand with their times, while others shrink in irrelevancy. weeding does not affect all subjects and stacks equally, and adjustments to shelves and end cards are necessary. in addition to the transitions of weeding and shifting, sometimes whole floors are reconfigured. in the library commons era of the last few decades, substantial redesigns have been commonplace as book collections make way for computer stations and study spaces. in all these cases, adjustments and updates will be necessary to keep a subject map accurate. this is easily done by going back into the master .fla file and editing as needed. in many cases only a stack or two need be adjusted, but in instances of major collection shifting some planning ahead may be necessary and more time allotted for redesign. shifting can be a complex spatial exercise and it is difficult to predict where subjects will realign exactly. subject map editing may have to wait until physical shifting is completed. it should be noted that each stack must be hand-coded separately. in libraries with hundreds of stacks this can seem a tedious and time-consuming design method. both subject maps rely on adobe flash animation technology. flash is proprietary software, so the benefits of open source software cannot be utilized with subject maps at this time. further, adobe flash reader software must be installed on a computer for the subject map to render. this has almost never been a problem, however, as the flash reader is ubiquitous and automatically installed on most public and private machines upon initial boot up. another concern, however, relating to flash technology is human assets. not every library has a flash designer or even someone who can implement the most fundamental flash capabilities. flash is not hard to learn and the subject maps utilize only its most basic functionalities, but still, for some it remains a niche software and many libraries will not have the resources to invest. reaction, though, to the live subject maps and the rollover interactivity they provide, has been so positive that more fully integrated flash maps have been proposed. why not have all physical elements of the library incorporated into one flash-enabled map? this is possible but may come at some expense to the functionality of the subject-rendering aspect of the maps. by limiting the application to stacks and lc classes, a user may remain more focused. avoiding clutter, overcrowding, and a preponderance of choice is a design strategy that has gained much credibility in recent years.3 the subject map enjoys the usability success of clean design, limited purpose, and simple rendering. while demonstrating the potential of user-activated animation for other proposed library applications, the subject map might be best maintained as a limited specialty map. a final concern regarding the long-term success of subject maps should be mentioned. how long will books remain in libraries? how long will they be organized by subject? when the physical arrangement and organization of information objects no longer exist in libraries, maps of any kind will seemingly lose all efficacy. but will libraries themselves exist in this future? whither books? whither libraries?

future developments

the most prominent and practical attribute of the subject map is its ability to show a user the exact stack where the book they are seeking is located.
but in its current state as a stand-alone application, a user must obtain a call number from a catalog search, then open the subject map by going to its independent url. investigation is underway to determine what is necessary in order to integrate the subject map with the online catalog. in this scenario, a catalog item record might also display an embedded subject map that automatically highlights the floor and stack where the call number is located. this seemingly requires .swf files and flash actionscript to be embedded in catalog coding. one potential solution is to attribute an individual url to each stack rendering so that a get url function can be applied and embedded in each catalog item record. this synthesis of subject map and catalog poses a complex challenge but promises meaningful and time-saving results for the item retrieval process. qr code technology in conjunction with subject map use is also being deployed. by fixing qr codes on stack end cards that link to relevant sections of the lc outline, a researcher may use a mobile device to browse digitally and physically within the stacks at the same time. in this way a user may conduct digital subject browsing and physical item browsing simultaneously. the urls linked to by qr coding contain detailed lc sub-levels not contained within the subject map, which is limited to the level of sub-class. the active discovery of new knowledge facilitated by exploiting preexisting lc organization inside library stacks in real time can be quite impressive when experienced firsthand. another development exploiting lc knowledge organization is in beta mode at this time. an lc search database has been created allowing users to enter words and find matching lc subject terminology. potentially, this database could be merged with the subject map, allowing users to correlate subject word search with physical locations independent of call numbers. despite its intent as a limited specialty map, possibilities are also being explored to incorporate the subject map into a more fully integrated library map. one way forward in this regard is to create map layers that could be toggled on and off by users. in this way, the subject map could exist as its own layer, maintaining its clarity and integrity when isolated but integrated when viewed with other layers. flash technology excels at allowing such layer creation.

other stack maps and related technologies

searching the web for "subject map" and related terminology such as stack, shelf, book, and lc maps does turn up various efforts and approaches to organizing and exploiting classification scheme data, but no animated, user-activated maps are found. similar searches across library and information science literature turn up some explorative research on the possibilities of mapping lc data, but again no animated stack maps are found.4 there is a product licensed by bowker inc. called stackmap that can be linked to catalog search results. when a user clicks on the map link next to a call number result, a map is displayed with the destination stack highlighted, but the information provided is locational only. stackmap is not animated or user-activated. no subject information is given and the map offers no browsing features. since the release of html5, we are beginning to see more animation on the web that is not flash-driven. steve jobs and apple's determined refusal to run flash on their mobile devices has motivated many to seek other animation options.
new html5 animation tools such as adobe edge, hippo animator, and hype offer promising starts at dislodging the flash grip on web animation, but they have far to go and do not yet offer the ease of design or the range of creative possibilities of flash. building an animated subject map with html5 alone does not seem possible at this time.

universal applicability of the subject map

so far, subject maps have been created for two very different libraries. the commonality shared between the montana state university and skidmore college libraries is their possession of hundreds of thousands of books in stacks shelved by the lc classification system. this is a trait shared by nearly all college and research libraries. subject maps can be easily structured on the dewey decimal system as well so that public libraries could benefit from their functionality, making the subject map appropriate and creatable for more than 12,000 libraries.5 of our two primary textual formats, articles by far have received the most fiscal and technological support in recent decades. article searching and retrieval continues to evolve through the rich implementation of assets such as locally constructed resource management tools, independent journal title searches, complexly designed database search interfaces, and dedicated electronic resource librarians. meanwhile, our more traditional format, the book, seems in some ways to already be treated as a languishing symbol of the past. because its future is uncertain, does that justify our neglect in the present? but has the book disappeared yet? as we make room for more student lounges, coffee bars, computer stations, writing labs, and information commons, we should carefully ask what makes a library special. good books and the focused, sustained treatment of knowledge they contain are part of the correct answer, symbolically and, as yet, practically speaking. while our books still occupy our library shelves, shouldn't they also fully benefit from the ongoing technological explosion through which we continue to evolve? opacs haven't evolved much in recent years. in fact they seem quite stymied to many librarians and users. we can do more with current technologies to assist and enrich the experience of users searching and browsing for books. the subject map is hopefully an example of how we can do more in this regard. while we have grown accustomed to increasingly look forward in order to position our libraries for the future, we should also remember to sometimes look back. our classification systems and book collections are assets built from the past that represent many decades of great labor, investment, and achievement. more than 12,000 public and academic libraries together make up one of our greatest national treasures and bulwarks of living democracy. libraries are among the dearest valued assets in any of our states. many of the most beautiful buildings in our nation are libraries. based on library insurance values and estimated replacement costs, library buildings and the collections they hold amount cumulatively to hundreds of billions of dollars of worth.6 this astounding worth is figured mainly from the buildings themselves and the books they contain. a few have commented that there is some aesthetic quality to the subject maps.
if this is true, the appeal comes from the synthesis of architectural form and the universe of knowledge revealed within, from the beauty of libraries both real and ideal, from physical and mental constructions unified. animated subject maps can help bring the physical and intellectual beauty of libraries into the digital realm, but the main appeal is a practical one: to point the user directly to the book or subject they are seeking. so in conclusion, perhaps we should measure the subject map's potential in the light of ranganathan's five laws of library science:7
1. books are for use.
2. every reader his [or her] book.
3. every book its reader.
4. save the time of the reader.
5. the library is a growing organism.
the subject maps can be found at the following urls:
skidmore college subject map: http://lib.skidmore.edu/includes/files/subjectmaps/subjectmap.swf
montana state university subject map: www.lib.montana.edu/subjectmap

references

1. google, "google books library project—an enhanced card catalog of the world's books," http://books.google.com/googlebooks/library.html, accessed november 8, 2012.
2. antonella iacono, "opac, users, web. future developments for online library catalogues," bollettino aib 50, no. 1–2 (2010): 69–88, http://bollettino.aib.it/article/view/5296.
3. geoffrey little, "where are you going, where have you been? the evolution of the academic library web site," the journal of academic librarianship 38, no. 2 (2012): 123–25, doi:10.1016/j.acalib.2012.02.005.
4. kwan yi and lois mai chan, "linking folksonomy to library of congress subject headings: an exploratory study," journal of documentation 65, no. 6 (2009): 872–900, doi:10.1108/00220410910998906.
5. american library association, "number of libraries in the united states, ala library fact sheet 1," www.ala.org/tools/libfactsheets/alalibraryfactsheet01.
6. edward marman, "a method for establishing a depreciated monetary value for print collections," library administration and management 9, no. 2 (1995): 94–98.
7. s. r. ranganathan, the five laws of library science (new delhi: ess ess, 2006), http://hdl.handle.net/2027/mdp.39015073883822.

social contexts of new media literacy: mapping libraries
elizabeth thorne-wallington
information technology and libraries | december 2013 53

abstract

this paper examines the issue of universal library access by conducting a geospatial analysis of library location and certain socioeconomic factors in the st. louis, missouri, metropolitan area. framed around the issue of universal access to internet, computers, and technology (ict) for digital natives, this paper demonstrates patterns of library location related to race and income. this research then raises important questions about library location, and, in turn, how this impacts access to ict for young people in the community.
objectives and purpose the development and diffusion of new media and digital technologies has profoundly affected the literacy experiences of today’s youth.1 young people today develop literacy through a variety of new media and digital technologies.2 the dissemination of these resources has also allowed for youth to have literacy-rich experiences in an array of different settings. ernest morrell, literacy researcher, writes, as english educators, we have a major responsibility to help future english teachers to redefine literacy instruction in a manner that is culturally and socially relevant, empowering, and meaningful to students who must navigate a diverse and rapidly changing world.3 this paper will explore how mapping and geographic information systems (gis) can help illuminate the cultural and social factors related to how and where students access and use new media literacies and digital technology. libraries play an important role in encouraging new media literacy development;4 yet access to libraries must be understood through social and cultural contexts. the objective of this paper is to demonstrate how mapping and gis can be used to provide rigorous analysis of how library location in st. louis, missouri, is correlated with socioeconomic factors defined by the us census including median household income and race. by using gis, the role of libraries in providing universal access to new media resources can be displayed statistically, both challenging and confirming previously held beliefs about library access. this analysis raises new questions about how libraries are distributed across the st. louis area and whether they truly provide universal and equal access. elizabeth thorne-wallington (ethornew@wustl.edu) is a doctoral student in the department of education at washington university in st. louis. mailto:ethornew@wustl.edu information technology and libraries | december 2013 54 literature review advances in technologies are transforming the very meaning of literacy.5 traditionally, literacy has been defined as the ability to understand and make meaning of a given text.6 the changing global economy requires a variety of digital literacies, which schools do not provide.7 instead, young people acquire literacy through a multitude of inand out-of-school experiences with new media and digital technology.8 libraries play a vital role in supporting new media literacy by offering out-of-school access and experiences. to understand the role that libraries play in offering access to new media literacy technologies, a few key concepts must be defined. first is the concept of the digital native. those born around 1980, who have essentially grown up with technology, are known as digital natives.9 digital natives are expected to have a base knowledge of technology and to be able to pick up and learn new technology quickly because of that base knowledge. digital natives have been exposed to technology from a young age and are adept at using a variety of digital technologies. the suggestion is that young people can quickly learn to make use of the new media and technology available in a specific location. key to any discussion of digital natives is the concept of the digital divide. 
the digital divide has been a central issue of education policy since the mid-1990s.10 early work on the digital divide was concerned primarily with equal access.11 more recently, however, the idea of a "binary digital divide" has been replaced by studies focusing on a multidimensional view of the digital divide.12 hargittai asserts that even among digital natives, there are large variations in internet skills and uses correlated with socioeconomic status, race, and gender.13 these variations call for a nuanced study examining social and cultural factors associated with new media literacy, including out-of-school contexts. the concept of literacy and learning in out-of-school contexts has a strong historical context. hull and schultz provide a review of the theory and research on literacy in out-of-school settings.14 a variety of studies, including self-guided literacy activities, after-school programs, and reading programs were reviewed, and the significance of out-of-school learning opportunities was supported by these studies. importantly for the research here, research has also been done on the use of digital technology in out-of-school settings. lankshear and knobel examine out-of-school practices extensively with their work on new literacies.15 lankshear and knobel also make clear the complexity of out-of-school experiences among young people. students participate in nontraditional literacy activities such as blogging and remix in a variety of out-of-school contexts, from home computers to community-based organizations to libraries. most importantly, lankshear and knobel found that the students did connect what they learned in the classroom with these out-of-school activities. the connection between out-of-school literacies and in-school learning has also been studied. education policy researcher allan luke writes, the redefined action of governments . . . is to provide access to combinatory forms of enabling capital that enhance students' possibilities of putting the kinds of practices, texts, and discourses acquired in schools to work in consequential ways that enable active position taking in social fields.16 collins writes about this relationship between in- and out-of-school literacies. collins writes in her case study that there are a variety of "imports" and "exports" in terms of practices. that is, skill transaction works in both directions, with skills learned out of school used in school, and skills learned in school used out of school.17 skerrett and bomer make this connection even more explicit when looking at adolescent literacy practices.18 their article examines how a teacher in an urban classroom drew on her students' out-of-school literacies to inform teaching and learning in a traditional literacy classroom. the authors found that the teacher in their study was able to create a curriculum that engaged students by inviting them to use literacies learned in out-of-school settings. however, the authors write that this type of literacy study was taxing and time-consuming for both the teacher and the student. still, it is clear that connections between in- and out-of-school literacies can be made. the role libraries play in making this connection has not been studied as extensively. yet it is clear that young people do use libraries to access technology. becker et al. found that nearly half of the nation's 14 to 18 year olds had used a library computer within the past year. becker et al.
additionally found that for poor children and families, libraries are a “technological lifeline.” among those below the poverty line, 61 percent used public library computers and the internet for educational purposes.19 tripp writes that libraries have long played an important role in helping people gain access to digital media tools, resources, and skills.20 tripp writes that libraries should capitalize on the potential of new media to engage young people. additionally, tripp argues that librarians need to develop skills to train young people to use new media. the idea that libraries are important in meeting the need is further supported by the recent grants, totaling $1.2 million, by the john d. and catherine t. macarthur foundation to build “innovative learning labs for teens” in libraries. this grant making was a response to president obama’s “educate to innovate” campaign, a nationwide effort to bring american students to the forefront in science and math.21 this literature review demonstrates that the body of research currently available focuses on digital natives and the digital divide, but that the research lacks the nuance needed to capture the complexity of social and cultural contexts surrounding the issue. this literature review further demonstrates both the importance of new media literacy and out-of-school learning, as well as the key role that libraries play in supporting these learning opportunities. the study provided here uses gis analysis to demonstrate important socioeconomic and cultural factors that surround libraries and library access. first, i describe the role of gis in understanding context. next, i describe the methods used in this paper. finally, i analyze the results and implications for the study. geographic information systems analysis in education there is a burgeoning body of research which uses geographic information systems (gis) to better understand socioeconomic and cultural contexts of education and literacy issues.22 information technology and libraries | december 2013 56 there are several key works that link geography and social context. lefebvre defines space as socially produced, and he writes that space embodies social relationships shaped by values and meanings. he describes space as a tool for thought and action or as a means of control and domination. lefebvre writes that there is a need for spatial reappropriation in everyday urban life. the struggle for equality, then, is central to the “right of the city.”23 the unequal distributions of resources in the city help to maintain social and economic advantaged positions, which is important to the analysis here of library access. this unequal distribution of resources continues today. de souza briggs and others write that there is clear geographical segregation in american cities today.24 this is seen in housing choice, racial attitudes, and discrimination, as well as metropolitan development and policy coalitions. in the conclusion of his book, de souza briggs writes that housing choice is limited for low-ses minorities, and these limitations produce myriad social effects. again, this finding is important to the contexts of where libraries are located. jargowsky writes of similar findings.25 like de souza briggs, jargowsky focuses on the role that geography plays in terms of neighborhood and poverty. 
jargowsky even finds social characteristics of these neighborhoods: there is a higher prevalence of single-parent families, lower educational attainment, a higher level of dropouts, and more children living in poverty. important here, though, is that all such characteristics can be displayed geographically, which means that varying housing, economic, and social conditions can be displayed with library locations. soja goes beyond the geographic analysis offered by de souza briggs and jargowsky and writes that space should be applied to contemporary social theory.26 soja found that spatiality should be used in terms of critical human geography to advance a theory of justice on multiple levels. he writes that injustice is spatially construed and that this spatiality shapes social injustice as much as social injustice shapes a specific geography. this understanding, then, shapes how i approach the study of new media literacies as influenced by cultural and social factors. these factors are particularly prevalent in the st. louis, missouri, area. colin gordon reiterates the arguments of lefbvre jargowsky and de souza briggs in arguing that st. louis is a city in decline.27 by providing maps that project housing policies, gordon is able to provide a clear link between historical housing policies such as racial covenants and current urban decline. gordon is able to show that vast populations are moving out of st. louis city and into the county, resulting in a concentration of minority populations in the northern part of the city. gordon argues that the policies and programs offered by st. louis city have only exacerbated the problem and led to greater blight.28 in terms of literacy, morrell makes the most explicit connection between literacy and mapping with a study that used a community-asset mapping activity to make the argument that teachers need to make an explicit connection between literacy at school and the new literacies experienced in the community.29 the significance of this is that gis can be used to illuminate the social and economic contexts of new media literacy opportunities as well, which in turn could help inform social dialogue about the availability of and access to informal education opportunities for new media literacy. social contexts of new media literacies: mapping libraries| thorne-wallington 57 methods and data the gis analysis performed here concerns library locations in the st. louis metropolitan area, including st. louis city and st. louis county. the st. louis metropolitan area was chosen because of past research mapping the segregation of the city, largely because the city and county are so clearly segregated racially and economically along the north–south line. this segregation is striking when displayed geographically and illuminating when mapped with library location. maps were created using tiger files (www.census.gov/geo/maps-data/data/tiger.html) and us census data (http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml), both freely available to the public via internet download. libraries were identified using the st. louis city library’s “libraries & hours” webpage (www.slpl.org/slpl/library/article240098545.asp), the st. louis county library “locations & hours” webpage (www.slcl.org/about/hours_and_locations), google maps (www.maps.google.com), and the yellow pages for the st. louis metropolitan area (www.yellowpages.com). 
the address of each library was entered into itouchmap (http://itouchmap.com) to identify the latitude and longitude of the library. a spreadsheet containing this information was then loaded into the gis software and displayed as x–y data. the maps were then displayed using median household income, african american population, and latino and hispanic population as obtained from the us census at census tract level. for median household income, the data was from 1999. for all other census data, the year was 2010. for district-level data, communication arts data from the missouri department of elementary and secondary education (modese) website (http://dese.mo.gov/dsm) was entered into microsoft excel and then displayed on the maps. the data is district level, representing all grades tested for communication arts across all district schools. the modese data was from 2008, the most recent year available at the time the analysis was performed. the communication arts data was taken from the missouri assessment program test. this test is given yearly across the state to all public school students. the state then collects the data and makes it available at the state, district, and school level. the data used here is district-level data. scores are broken into four categories: advanced, proficient, basic, and below basic. the groups for proficient and advanced were combined to indicate the district's success on the map test. these are the two levels generally considered acceptable or passing by the state.30 before looking at patterns of library location and these socioeconomic and educational factors, density analysis was performed on the library locations using esri arcgis software, version 9.0, to analyze whether clustering was statistically significant. this analysis was used to demonstrate whether libraries were clustered in a statistically significant pattern, or if location was random. the nearest neighbor tool of arcgis was used to determine if a set of features, in this case the libraries, shows a statistically significant level of clustering. this was done by measuring the distance from each library to its single nearest neighbor and calculating the average distance of all the measurements. the tool then created a hypothetical set of data with the same number of features, but placed randomly within the study area. then an average distance was calculated for these features and compared to the real data. that is, a hypothetical random set of locations was compared to the set of actual library locations. a near-neighbor index was produced, which expresses the ratio of the observed distance divided by the distance from the hypothetical data, thus comparing the two sets.31 this score was then standardized, producing a z-score, reported below in the results section.

results and conclusions

using the nearest neighbor tool produced a z-score of -3.08, showing that the data is clustered beyond the 0.01 significance level. this means that there is a less than 1 percent chance that library location would be clustered to this degree by chance.
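for readers without arcgis, the same average nearest neighbor statistic can be approximated directly from projected point coordinates. the short python sketch below is our own illustration (it is not the tool the author used) of the standard clark–evans style calculation the arcgis tool reports: the observed mean nearest-neighbor distance is compared with the distance expected for a random pattern of the same density and standardized to a z-score. the coordinates and study-area size are placeholders, not the st. louis data.

# approximate the average nearest neighbor z-score outside arcgis.
# coordinates (projected, e.g. meters) and study area below are placeholders.
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_z(points, area):
    """points: (n, 2) array of projected x-y coordinates; area: study area size."""
    n = len(points)
    tree = cKDTree(points)
    # k=2 because each point's nearest neighbor at distance 0 is itself
    dists, _ = tree.query(points, k=2)
    d_obs = dists[:, 1].mean()               # observed mean nearest-neighbor distance
    d_exp = 0.5 / np.sqrt(n / area)          # expected under complete spatial randomness
    se = 0.26136 / np.sqrt(n ** 2 / area)    # standard error used by the arcgis tool
    ratio = d_obs / d_exp                    # nearest neighbor index (< 1 means clustered)
    z = (d_obs - d_exp) / se
    return ratio, z

rng = np.random.default_rng(0)
libraries = rng.uniform(0, 10_000, size=(40, 2))   # 40 fake library points in a 10 km square
print(nearest_neighbor_z(libraries, area=10_000 * 10_000))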
knowing, then, that library location is not random, we can now examine socioeconomic patterns of the areas where libraries are located. figure 1 shows library location and population of individuals under the age of 18 at the census tract level for st. louis city and county, using data from the 2010 us census. to clarify, the city and county are divided by the bold black line crossing the middle of the map, the only such boundary in figure 1, where the county is the larger geographic area. library location is important because previous research shows that young people use informal learning environments to access new media technologies,32 and libraries are a key informal learning environment.33 this map demonstrates, however, that libraries are not located in census tracts with the highest populations of individuals under the age of 18 in st. louis city and county. in fact, for all the tracts with the highest number of individuals under the age of 18, there are zero libraries located in these tracts. this is especially concerning given that young people may have less access to transportation, so their access to facilities in neighboring census tracts may be quite limited. figure 1. number of individuals under the age of 18 by census tract and library location in st. louis city and st. louis county. source: 2010 us census. figure 2 includes maps showing library locations in st. louis city and county in terms of poverty and race by census tract level, as well as act score by district, represented by the bold lines, where st. louis city is represented by a single district, the st. louis public school district. median household income is indicated by the gray shading, with white areas not having data available. first, census tracts with low median household income are clustered in the northern part of the city and county. there are four libraries in the northern half of the city, and eleven libraries in the central and southern parts of the city. there are fewer libraries in the census tracts with low median household income. figure 2. median household income, act score, and library location, st. louis city and county. source: 2010 us census and missouri department of elementary and secondary education, 2010, www.modese.gov. while the nearest neighbor analysis has already demonstrated the libraries are significantly clustered, the maps seem to suggest the pattern of that clustering. this is especially concerning given the report by becker that 61 percent of those living below the poverty line use libraries to access the internet.34 first, in terms of median household income, it does appear that many libraries are located in higher income areas of the city and county. while the libraries appear to be clustered centrally, and particularly near major freeways, there appear to be libraries in many of the higher income census tracts. adding to the concern of location is that of access to these library locations. for those living below the poverty line, transportation is often a prohibitive cost, so access from public transportation should also be a major concern for libraries. additionally, in a pattern repeated in figure 4, the location of libraries does not appear to have any effect on act scores, but there are clearly higher act scores in wealthier areas of the city and county.
this is not to say that there is a statistical relationship between act score and library location, but rather to look at the spatial patterns of each in order to note similarities and differences in these patterns. figure 3 shows library location by race, including african american or black and hispanic or latino. first, it is important to note that patterns of race in st. louis have been carefully documented by gordon.35 the st. louis area is clearly a highly segregated region, which makes the social contexts of libraries in the st. louis area even more important. this map demonstrates that while there are many libraries in the northern parts of st. louis city and county, none of these libraries is located in the census tracts with the highest populations of those identifying themselves as african american or black in either the city or county. this raises questions about the inequality of access to the libraries. on the other hand, the densest populations of those identifying themselves as hispanic or latino are in the southern part of the city, but not the county. there is a library located in one of those tracts. it appears the areas with higher concentrations of african americans or blacks have fewer libraries, while areas with the higher concentrations of latinos or hispanics are located in the southern parts of the city that do have libraries. it is important to note, however, that the concentrations of latinos and hispanics is quite low, and those areas are majority white census tracts. as noted above, beyond location, access from public transportation is also an important issue. at the same time, the clustering and patterns shown on these maps raise key issues about access based on income and race. libraries are not located in areas with low median household income or in areas with high concentrations of african americans or blacks. this raises serious questions about why libraries are located where they are, and whether the individuals located in these areas have equal access to library resources, particularly new media technologies. social contexts of new media literacies: mapping libraries| thorne-wallington 61 figure 3. african american or black and hispanic, library location, st. louis city and county. source: 2010 us census. the final map raises a slightly different issue, one of test scores and student achievement. figure 4 shows library location by percent proficient or advanced on the missouri achievement program test by district. beyond the location of the libraries, one factor that stands out is that the areas with the lowest percent proficient or advanced are also the areas with the lowest median household income and the highest percentage of those identifying as african american or black. here an interesting pattern emerges. while there are many libraries in the city and northern part of the county, the percent proficient or advanced on the communication arts portion of exam is quite low (20–30 percent). on the other hand, in the western part of the county, there are few libraries, but the percent proficient or advanced is at its highest level. this suggests that there may not be a strong connection between achievement on the map exam and library location, similar to the lack of relationship seen in between act average score and library location in figure 2. at the same time, there does appear to be a correlation between race, income, and test scores. 
this correlation is noted throughout the literature on student achievement.36 clearly, these maps raise important questions such as how and why libraries are located in a certain area, who uses libraries in a given area, as well as what other informal learning environments and community assets exist in these areas. what is made clear by the maps, though, is that gis can be used as a tool to help understand the context of new media literacy. information technology and libraries | december 2013 62 figure 4. proficient or advanced, communication arts map by district, 2009, and library location. source: missouri department of elementary and secondary education, 2010, www.modese.gov. significance these results demonstrate that gis can be used to illuminate the social, cultural, and economic complexity that surrounds informal learning environments, particularly libraries. this can help demonstrate not only where young people have the opportunity to use new media literacy, but also the complex contextual factors surrounding those opportunities. paired with traditional qualitative and quantitative work, gis can provide an additional lens for understanding new media literacy ecologies, which can help inform dialogue about this topic. for the results of this study, there does appear to be a relationship between library location and race and income. this study illuminates the complex contextual factors affecting libraries. because of the important role that libraries can play in offering young people out of school learning opportunities, particularly in terms of access to new media resources, these contextual factors are important to ensuring equal access and opportunity for all. http://www.modese.gov/ social contexts of new media literacies: mapping libraries| thorne-wallington 63 references 1. ernest morrell, “critical approaches to media in urban english language arts teacher development,” action in teacher education 33, no. 2 (2011): 151–71, doi: 10.1080/01626620.2011.569416. 2. mizuko ito et al., hanging out, messing around, and geeking out: kids living and learning with new media (cambridge: mit press/macarthur foundation, 2010). 3. morrell, “critical approaches to media in urban english language arts teacher development.” 4. lisa tripp, “digital youth, libraries, and new media literacy,” reference librarian 52, no. 4 (2011): 329–41, doi: 10.1080/02763877.2011.584842. 5. gunther kress, literacy in the new media age (london: routledge, 2003). 6. ibid. 7. donna e. alvermann and alison h. heron, “literacy identity work: playing to learn with popular media,” journal of adolescent & adult literacy 45, no. 2 (2001): 118–22. 8. colin lankshear and michele knobel, new literacies: everyday practices and classroom learning (maidenshead: open university press, 2006). 9. john palfrey and urs gasser, born digital: understanding the first generation of digital natives (new york: perseus, 2009). 10. karin m. wiburg, “technology and the new meaning of educational equity,” computers in the schools 20, no. 1–2 (2003): 113–28, doi: 10.1300/j025v20n01_09. 11. rob kling, “learning about information technologies and social change: the contribution of social informatics,” information society 16, no. 3 (2000): 212–24. 12. james r. valadez and richard p. durán, “redefining the digital divide: beyond access to computers and the internet,” high school journal 90, no. 3 (2007): 31–44, http://www.jstor.org/stable/40364198. 13. eszter hargittai, “digital na(t)ives? 
variation in internet skills and uses among members of the 'net generation,'" sociological inquiry 80, no. 1 (2010): 92–113, doi: 10.1111/j.1475-682x.2009.00317.x.
14. glynda hull and katherine schultz, "literacy and learning out of school: a review of theory and research," review of educational research 71, no. 4 (2001): 575–611, http://www.jstor.org/stable/3516099.
15. colin lankshear and michele knobel, new literacies.
16. allan luke, "literacy and the other: a sociological approach to literacy research and policy in multilingual societies," reading research quarterly 38, no. 1 (2003): 132–41, http://www.jstor.org/stable/415697.
17. stephanie collins, "breadth and depth, imports and exports: transactions between the in- and out-of-school literacy practices of an 'at risk' youth," in cultural practices of literacy: case studies of language, literacy, social practice, and power (mahwah, nj: lawrence erlbaum, 2007).
18. allison skerrett and randy bomer, "borderzones in adolescents' literacy practices: connecting out-of-school literacies to the reading curriculum," urban education 46, no. 6 (2011): 1256–79, doi: 10.1177/0042085911398920.
19. samantha becker et al., opportunity for all: how the american public benefits from internet access at u.s. libraries (washington, dc: institute of museum and library services).
20. lisa tripp, "digital youth, libraries, and new media literacy."
21. nora fleming, "museums and libraries awarded $1.2m to build learning labs," education week (blog), december 7, 2012, http://blogs.edweek.org/edweek/beyond_schools/2012/12/museums_and_libraries_awarded_12_million_to_build_learning_labs_for_youth.html.
22. see william f. tate iv and mark hogrebe, "from visuals to vision: using gis to inform civic dialogue about african american males," race ethnicity and education 14, no. 1 (2011): 51–71, doi: 10.1080/13613324.2011.531980; mark c. hogrebe and william f. tate iv, "school composition and context factors that moderate and predict 10th-grade science proficiency," teachers college record 112, no. 4 (2010): 1096–1136; robert j. sampson, great american city: chicago and the enduring neighborhood effect (chicago: university of chicago press, 2012).
23. henri lefebvre, the production of space (oxford: blackwell, 1991).
24. xavier de souza briggs, the geography of opportunity: race and housing choice in metropolitan america (washington, dc: brookings institution press, 2005).
25. paul jargowsky, poverty and place: ghettos, barrios, and the american city (new york: russell sage foundation, 1997).
26. edward w. soja, postmodern geographies: the reassertion of space in critical social theory (new york: verso, 1989).
27. colin gordon, mapping decline: st. louis and the fate of the american city (university of pennsylvania press, 2008).
28. ibid.
29. ernest morrell, "critical approaches to media in urban english language arts teacher development."
30. missouri department of elementary and secondary education, http://dese.mo.gov/dsm/.
31. david allen, gis tutorial ii: spatial analysis workbook (redlands, ca: esri press, 2009).
32. becker et al., opportunity for all.
33. lisa tripp, "digital youth, libraries, and new media literacy."
34. becker et al., opportunity for all.
35. colin gordon, mapping decline: st. louis and the fate of the american city.
36. see mwalimu shujaa, beyond desegregation: the politics of quality in african american schooling (thousand oaks, ca: corwin, 1996); william j. wilson, the truly disadvantaged: the inner city, the underclass, and public policy (chicago: university of chicago press, 1987); gary orfield and mindy l. kornhaber, raising standards or raising barriers: inequality and high-stakes testing in public education (new york: century foundation, 2010).

r. todd vandenbark

tending a wild garden: library web design for persons with disabilities

r. todd vandenbark (todd.vandenbark@utah.edu) is web services librarian, eccles health sciences library, university of utah, salt lake city.

nearly one-fifth of americans have some form of disability, and accessibility guidelines and standards that apply to libraries are complicated, unclear, and difficult to achieve. understanding how persons with disabilities access web-based content is critical to accessible design. recent research supports the use of a database-driven model for library web development. existing technologies offer a variety of tools to meet disabled patrons' needs, and resources exist to assist library professionals in obtaining and evaluating product accessibility information from vendors. librarians in charge of technology can best serve these patrons by proactively updating and adapting services as assistive technologies improve.

in march 2007, eighty-two countries signed the united nations' convention on the rights of persons with disabilities, including canada, the european community, and the united states. the convention's purpose was "to promote, protect and ensure the full and equal enjoyment of all human rights and fundamental freedoms by all persons with disabilities, and to promote respect for their inherent dignity."1 among the many provisions for assuring respect and equal treatment of people with disabilities (pwd) under the law, signatories agreed to take appropriate measures:

(g) to promote access for persons with disabilities to new information and communications technologies and systems, including the internet; and (h) to promote the design, development, production and distribution of accessible information and communications technologies and systems at an early stage, so that these technologies and systems become accessible at minimum cost.

in addition, the convention seeks to guarantee equal access to information by doing the following:

(c) urging private entities that provide services to the general public, including through the internet, to provide information and services in accessible and usable formats for persons with disabilities; and (d) encouraging the mass media, including providers of information through the internet, to make their services accessible to persons with disabilities.2

because the internet and its design standards are evolving at a dizzying rate, it is difficult to create websites that are both cutting-edge and standards-compliant.
this paper evaluates the challenge of web design as it relates to individuals with disabilities, exploring current standards and offering recommendations for accessible development. examining the provision of it for this demographic is vital because, according to the u.s. census bureau, the u.s. public includes about 51.2 million noninstitutionalized people living with disabilities, 32.5 million of whom are severely disabled. this means that nearly one-fifth of the u.s. public faces some physical, mental, sensory, or other functional impairment (18 percent in 2002).3 because a library's mandate is to make its resources accessible to everyone, it is important to attend to the special challenges faced by patrons with disabilities and to offer appropriate services with those special needs in mind.

■ current u.s. regulations, standards, and guidelines

in 1990 congress enacted the americans with disabilities act (ada), the first comprehensive legislation mandating equal treatment under the law for pwd. the ada prohibits discrimination against pwd in employment, public services, public accommodations, and in telecommunications. title ii of the ada mandates that all state governments, local governments, and public agencies provide access for pwd to all of their activities, services, and programs. since school, public, and academic libraries are under the purview of title ii, they must "furnish auxiliary aids and services when necessary to ensure effective communication."4 though predating widespread use of the internet, the law's intent points toward the adoption and adaptation of appropriate technologies to allow persons with a variety of disabilities to access electronic resources in a way that is most effective for them. changes to section 508 of the 1973 rehabilitation act enacted in 1998 and 2000 introduced the first standards for "accessible information technology recognized by the federal government."5 many state and local governments have since passed laws applying the standards of section 508 to government agencies and related services. according to the access board, the independent federal agency charged with assuring compliance with a variety of laws regarding services to pwd, information and communication technology (ict) includes

any equipment or interconnected system or subsystem of equipment, that is used in the creation, conversion, or duplication of data or information. the term electronic and information technology includes, but is not limited to, telecommunications products (such as telephones), information kiosks and transaction machines, world wide web sites, multimedia, and office equipment such as copiers and fax machines.6

the access board further specifies guidelines for "web-based intranet and internet information and applications," which are directly relevant to the provision of such services in libraries.7 what follows is a detailed examination of these standards, with examples to assist in understanding and implementation.

(a) a text equivalent for every non-text element shall be provided. assistive technology cannot yet describe what pictures and other images look like; they require meaningful text-based information associated with each picture. if an image directs the user to do something, the associated text must explain the purpose and meaning of the image.
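the "text equivalent" requirement is also straightforward to audit. the following sketch, written in python purely for illustration (it is not part of the article or of any particular library's toolchain), scans an html page and reports img elements that lack an alt attribute; the file name is hypothetical.

    # minimal sketch: flag <img> tags with no alt attribute (section 508 (a)).
    # "page.html" is a hypothetical input file.
    from html.parser import HTMLParser

    class AltAuditor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.missing = []  # (line, column) of offending <img> tags

        def handle_starttag(self, tag, attrs):
            if tag == "img" and "alt" not in dict(attrs):
                self.missing.append(self.getpos())

    with open("page.html", encoding="utf-8") as fh:
        auditor = AltAuditor()
        auditor.feed(fh.read())

    for line, col in auditor.missing:
        print(f"img without alt text at line {line}, column {col}")

a check like this can only confirm that alternative text is present; whether the text is actually meaningful to a screen-reader user still requires human review.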
with meaningful alternative text in place, someone who cannot see the screen can understand and navigate the page successfully. this is generally accomplished by using the "alt" and "longdesc" attributes for images. however, these aids also can clutter a page when not used properly. the current versions of the most popular screen-reader software do not limit the amount of "alt" text they can read. however, freedom scientific's jaws 6.x divides the "alt" attribute into distinct chunks of 125 characters each (excluding spaces) and reads them separately as if they were separate graphics.8 this can be confusing to the end user. longer content can be put into a separate text file, and the file linked to using the "longdesc" attribute. when a page contains audio or video files, a text alternative needs to be provided. for audio files such as interviews, lectures, and podcasts, a link to a transcript of the audio file must be immediately available. for video clips such as those on youtube, captions must accompany the clip.

(b) equivalent alternatives for any multimedia presentation shall be synchronized with the presentation. this means that captions for video must be real-time and synchronized with the actions in the video, not contained solely in a separate transcript.

(c) web pages shall be designed so that all information conveyed with color is also available without color, for example from context or markup. while color can be used, it cannot be the sole source or indicator of information. imagine an educational website offering a story problem presented in black and green print, where the answer to the problem can be deciphered only from the green letters. this would be inaccessible to students who have certain forms of color-blindness as well as to those who use screen-reader software.

(d) documents shall be organized so they are readable without requiring an associated style sheet. the introduction of cascading style sheets (css) can improve accessibility because they allow the separation of presentation from content. however, not all browsers fully support css, so webpages need to be designed so any browser can read them accurately. the content needs to be organized so that it can be read and understood with css formatting turned off.

(e) redundant text links shall be provided for each active region of a server-side image map, and (f) client-side image maps shall be provided instead of server-side image maps except where the regions cannot be defined with an available geometric shape. an image map can be thought of as a geometrically defined and arranged group of links to other content on a site. a clickable map of the fifty u.s. states is an example of a functioning image map. a server-side image map would appear to a screen reader only as a set of coordinates, whereas client-side maps can include information about where each link leads through "alt" text. the best practice is to use only client-side image maps and make sure the "alt" text is descriptive and meaningful.

(g) row and column headers shall be identified for data tables, and (h) markup shall be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers. correct table coding is critical. each table should use the "table summary" attribute to provide a meaningful description of its content and arrangement.
headers should be coded using the table header ("th") tag, and its "scope" attribute should specify whether the header applies to a row or a column.

        return ($statusline, $toemail, $msg);
    } # end printstatus function

    # checks the status for the given daemon.
    # takes in ip, port to check, daemon name, and protocol (tcp/udp).
    # if given port=0 it checks for a local daemon.
    sub checkdaemon {
        my ($ip, $port, $daemon, $proto) = @_;
        my $dstat = 0;
        if ($proto !~ /local/) {
            # -sU checks for udp ports
            my $com = ($proto =~ /tcp/)
                ? ("nmap -p $port $ip | grep $port")
                : ("nmap -sU -p $port $ip | grep $port");
            open(TMP, "$com|");
            my $comout = <TMP>;
            close(TMP);
            if ($comout =~ /open/) {
                $dstat = 1; # if port is open, status is up
            }
        }
        else {
            $daemon =~ s/ +.*//g;
            # \l lowercases the first letter of $daemon
            my $com = "which \l$daemon";
            open(TMP, "$com|");
            my $comout = <TMP>;
            close(TMP);
            chomp $comout; # strip trailing newline before building the ps pipeline
            $com = "ps aux | awk '{print \$11}' | grep $comout";
            open(TMP, "$com|");
            $comout = <TMP>;
            close(TMP);
            $dstat = 1 if ($comout);
        }
        return $dstat;
    } # end checkdaemon function

    # send the output perl status file to the webserver
    sub scpfile {
        my ($filepath, $webservuname, $webservpass, $webservurl, $webservtarg) = @_;
        my $command = "scp $filepath $webservuname" . "\@$webservurl:$webservtarg";
        my $exp1 = Expect->spawn($command);
        # the first argument "30" may need to be adjusted
        # if your system has very high latency
        my $ret = $exp1->expect(30, "word:");
        print $exp1 "$webservpass\r";
        $ret = $exp1->expect(undef);
        $exp1->close();
    } # end scpfile function

    # send an email to the admin & append error to log file
    sub sendemail {
        my ($errorlist, $weboutputurl, $fromemail, $toaddresses) = @_;
        my $mailer = Mail::Mailer->new("sendmail");
        $mailer->open({ From    => "$fromemail",
                        To      => [$toaddresses],
                        Subject => "wireless problem" });
        $errorlist .= "\n\n$weboutputurl";
        print $mailer $errorlist;
        $mailer->close();
    } # end sendemail function

appendix d. script output page

appendix e. diagram of network

thmanager: an open source tool for creating and visualizing skos

javier lacasta, javier nogueras-iso, francisco javier lópez-pellicer, pedro rafael muro-medrano, and francisco javier zarazaga-soria

javier lacasta (jlacasta@unizar.es) is assistant professor, javier nogueras-iso (jnog@unizar.es) is assistant professor, francisco javier lópez-pellicer (fjlopez@unizar.es) is research fellow, pedro rafael muro-medrano (prmuro@unizar.es) is associate professor, and francisco javier zarazaga-soria (javy@unizar.es) is associate professor in the computer science and systems engineering department, university of zaragoza, spain.

the term knowledge organization systems denotes formally represented knowledge that is used within the context of digital libraries to improve data sharing and information retrieval. to increase their use, and to reuse them when possible, it is vital to manage them adequately and to provide them in a standard interchange format. simple knowledge organization systems (skos) seem to be the most promising representation for the type of knowledge models used in digital libraries, but there is a lack of tools that are able to properly manage it. this work presents a tool that fills this gap, facilitating their use in different environments and using skos as an interchange format.

unlike the largely unstructured information available on the web, information in digital libraries (dls) is explicitly organized, described, and managed. in order to facilitate discovery and access, dl systems summarize the content of their data resources into small descriptions, usually called metadata, which can be either introduced manually or automatically generated (index terms automatically extracted from a collection of documents). most dls use structured metadata in accordance with recognized standards, such as marc21 (u.s.
library of congress 2004) or dublin core (iso 2003). in order to provide accurate metadata without ter­ minological dispersion, metadata creators use different forms of controlled vocabularies to fill the content of typi­ cal keyword sections. this increase of homogeneity in the descriptions is intended to improve the results provided by search systems. to facilitate the retrieval process, the same vocabularies used to create the descriptions are usu­ ally used to simplify the construction of user queries. as there are many different schemas for modeling controlled vocabularies, the term knowledge organization systems (kos) is intended to encompass all types of schemas for organizing information and promoting knowledge management. as hodge (2000) says, “a kos serves as a bridge between the users’ information need and the material in the collection.” some types of kos can be highlighted. examples of simple types are glossaries, which are only a list of terms (usually with definitions), and authority files that control variant ver­ sions of key information (such as geographic or personal names). more complex are subject headings, classifica­ tion schemes, and categorization schemes (also known as taxonomies) that provide a limited hierarchical structure. at a more complex level, kos includes thesauri and less traditional schemes, such as semantic networks and ontologies, that provide richer semantic relations. there is not a single kos on which everyone agrees. as lesk (1997) notes, while a single kos would be advantageous, it is unlikely that such a system will ever be developed. culture constrains the knowledge classifi­ cation scheme because what is meaningful to one area is not necessarily meaningful to another. depending on the situation, the use of one or another kos has its advan­ tages and disadvantages, each one having its place. these schemas, although sharing many characteristics, usually have been treated heterogeneously, leading to a variety of representation formats to store them. thesauri are an example of the format heterogeneity problem. according to iso­2788 (norm for monolingual thesauri) (iso 1986), a thesaurus is a set of terms that describe the vocabulary of a controlled indexing language, formally organized so that the a priori relationships between con­ cepts (for example, synonyms, broader terms, narrower terms, and related terms) are made explicit. this stan­ dard is complemented with iso­5964 (iso 1985), which describes the model for multilingual thesauri, but none of them describe a representation format. the lack of a stan­ dard representation model has caused a proliferation of incompatible formats created by different organizations. so each organization that wants to use several external thesauri has to create specific tools to transform all of them to the same format. in order to eliminate the heterogeneity of represen­ tation formats, the w3c initiative has promoted the development of simple knowledge organization systems (skos) (miles et al. 2005) for its use in the semantic web environment. skos has been created to represent simple kos, such as subject heading lists, taxonomies, classifica­ tion schemes, thesauri, folksonomies, and other types of controlled vocabulary as well as concept schemes embed­ ded in glossaries and terminologies. 
although skos has been proposed only recently, the number and importance of the organizations involved in its creation process (and that publish their kos in this format) indicate that it will probably become a standard for kos representation. skos provides a rich, machine-readable language that is very useful for representing kos, but nobody would expect to have to create it manually or by just using a general-purpose resource description framework (rdf) editor (skos is rdf-based). however, in the digital library area, there are no specialized tools that are able to manage it adequately. therefore, this work tries to fill this gap, describing an open source tool, thmanager, that facilitates the construction of skos-based kos. although thmanager has been created to manage thesauri, it also is appropriate for creating and managing any other models that can be represented using the skos format. this article describes the thmanager tool, highlighting its characteristics. thmanager's layer-based architecture permits the reuse of the components created for the management of thesauri in other applications where they are also needed. for example, it facilitates the selection of values from a controlled vocabulary in a metadata creation tool, or the construction of user queries in a search client. the tool is distributed as open source software accessible through the sourceforge platform (http://thmanager.sourceforge.net/).

■ state of the art in thesaurus tools and representation models

the problem of creating appropriate content for thesauri is of interest in the dl field and other related disciplines, and an increasing number of software packages have appeared in recent years for constructing thesauri. for instance, the website of willpower information (http://www.willpower.demon.co.uk/thessoft.htm) offers a detailed review of more than forty tools. some are only available as a module of a complete information storage and retrieval system, but others also allow the possibility of working independently of any other software. among these thesaurus creation tools, one may note the following products:

■ bibliotech (http://www.inmagic.com/). this is a multiplatform tool that forms part of the bibliotech pro integrated library system and can be used to build an ansi/niso standard thesaurus (standard z39.19 [ansi 1993]).

■ lexico (http://www.pmei.com/lexico.html). this is a java-based tool that can be accessed and/or manipulated over the internet. thesauri are saved in a text-based format. it has been used by the u.s. library of congress to manage such vocabularies and thesauri as the thesaurus for graphic materials, the global legal information network thesaurus, the legislative indexing vocabulary, and the symbols of american libraries listing.
■ multites (http://www.multites.com/). this is a windows-based tool that provides support for ansi/niso relationships plus user-defined relationships and comment fields for an unlimited number of thesauri (both monolingual and multilingual).

■ termtree 2000 (http://www.termtree.com.au/). this is a windows-based tool that uses access, sql server, or oracle for data storage. it can import and export trim thesauri (a format used by the towers records information management system [http://www.towersoft.com/]), as well as a defined termtree 2000 tag format.

■ webchoir (http://www.webchoir.com/). this is a family of client-server web applications that provides different utilities for thesaurus management on multiple dbms platforms. termchoir is a hierarchical information organizing and searching tool that enables one to create and search varieties of hierarchical subject categories, controlled vocabularies, and taxonomies based on either predefined standards or a user-defined structure, which can then be exported to an xml-based format. linkchoir is another tool that allows indexers to describe information sources using terminology organized in termchoir. and seekchoir is a retrieval system that enables users to browse thesaurus descriptors and their references (broader terms, related terms, synonyms, and so on).

■ synaptica (http://www.synaptica.com/). this is a client-server web application that can be installed locally on a client's intranet or extranet server. thesaurus data is stored in a sql server or oracle database. the application supports the creation of electronic thesauri in compliance with the ansi/niso standard. the application allows the exchange of thesauri in csv (comma-separated values) text format.

■ superthes (batschi et al. 2002). this is a windows-based tool that allows the creation of thesauri. it extends the ansi/niso relationships, allowing many possible data types to enrich the properties of a concept. it can import and export thesauri in xml and tabular format.

■ tematres (http://r020.com.ar/tematres/). this is a web application specially oriented to the creation of thesauri, but it also can be used to develop web navigation structures or to manage the documentary languages in use. the thesauri are stored in a mysql database. it provides the created thesauri in zthes (taylor 2004) or in skos format.

finally, it must be mentioned that, given that thesauri can be considered as ontologies specialized in organizing terminology (gonzalo et al. 1998), ontology editors have sometimes been used for thesaurus construction. a detailed survey of ontology editors can be found in the denny study (2002). all of these tools (desktop or web-based) present some problems when used as general thesaurus editors. the main one is the incompatibility of the interchange formats that they support. these tools also present integration problems. some are deeply integrated in bigger systems and cannot easily be reused in other environments because they need specific software components to work (such as a dbms to store thesauri). others are independent tools (they can be considered general-purpose thesaurus editors), but their architecture does not facilitate their integration within other information management tools. and most of them are not open source tools, so there is no possibility of modifying them to improve their functionality.
focusing on the interchange format problem, the iso­5964 standard (norm for multilingual thesauri) is currently undergoing review by iso tc46/sc 9, and it is expected that the new modifications will include a stan­ dard exchange format for thesauri. it is believed that this format will be based on technologies such as rdf/xml. in fact, some initiatives in this direction have already arisen: ■ the adl thesaurus protocol (janée et al. 2003) defines an xml­ and http­based protocol for access­ ing thesauri. as a result of query operations, portions of the thesaurus encoded in xml are returned. ■ the language independent metadata browsing of european resources (limber) project has published a thesaurus interchange format in rdf (matthews et al. 2001). this work introduces an rdf representa­ tion of thesauri, which is proposed as a candidate thesaurus interchange format. ■ the california environmental resources evaluation system (ceres) and the nbii biological resources division are collaborating in a thesaurus partnership project (ceres/nbii 2003) for the development of an integrated environmental thesaurus and a thesau­ rus networking toolset for metadata development and keyword searching. one of the deliverables of this project is an rdf format to represent thesauri. ■ the semantic web advanced development for europe (swad­europe 2001) project includes the swad­europe thesaurus activity, which has defined the skos, a set of specifications to represent the knowledge organization systems (kos) on the semantic web (thesauri between them). the british standards bs­5723 (bsi 1987) and bs­6723 (bsi 1985) (equivalent to the international iso­2788 and iso­5964) also lack a representation format. the british standards institute idt/2/2 working group is now developing the bs­8723 standard that will replace them and whose fifth part will describe the exchange formats and protocols for interoperability of thesauri. the objec­ tive of this working group is to promote the standard to iso, to replace the iso­2788 and iso­5964. here, it is important to remark that given the direct involvement of the idt/2/2 working group with skos development; probably the two initiatives will not diverge. the new representation format will be, if not exactly skos, at least skos­based. taking into account all these circumstances, skos seems to be the most adequate representation model to store thesauri. given that skos is rdf­based, it can be created using any tool that is able to manage rdf (usually used to edit ontologies); for example, swoop (mindswap group 2006), protégé (noy et al. 2000), or triple20 (wielemaker et al. 2005). the problem with these tools is that they are too complex for editing and visualizing such a simple model as skos. they are thought to create complex ontologies, so they provide too many options not spe­ cifically adapted to the type of relations in skos. in addition, they do not allow an integrated management of collection of thesauri and other types of controlled vocabularies as needed in dl processes (for example, the creation of metadata of resources, or the construction of queries in a search system). ■ skos model skos is a representation model for simple knowledge organization systems, such as subject heading lists, tax­ onomies, classification schemes, thesauri, folksonomies, other types of controlled vocabulary, and also concept schemes embedded in glossaries and terminologies. 
this section describes the model, providing characteristics, showing the state of development, and indicating the problems found to represent some types of kos. skos was initially developed within the scope of the semantic web advanced development for europe (swad-europe 2001). swad-e was created to support w3c's semantic web initiative in europe (part of the ist-7 programme). skos is based on a generic rdf schema for thesauri that was initially produced by the desire project (cross et al. 2001), and further developed in the limber project (matthews et al. 2001). it has been developed as a draft of an rdf schema for thesauri compatible with relevant iso standards, and later adapted to support other types of kos. among the kos already published using this new format are gemet (eea 2001), agrovoc (fao 2006), adl feature types (hill and zheng 1999), and some parts of the wordnet lexical database (miller 1990), all of them available on the skos project web page. skos is a collection of three different rdf schema application profiles: skos-core, to store common properties and relations; skos-mapping, whose purpose is to describe relations between different kos; and skos-extension, to indicate specific relations and properties only contained in some types of kos. for the first step of the development of the thmanager tool, only the most stable part of skos has been considered. figure 1 shows the part of skos-core used. the rest of skos-core is still unstable, so its support has been delayed until it is approved. skos-mapping and skos-extension are still in their first steps of development and are very unstable, so their management in thmanager also has been delayed until the creation of stable versions. in skos-core, a kos (in our case, usually a thesaurus) consists of a set of concepts (labelled as skos:concept) that are grouped by a concept scheme (skos:conceptscheme). to distinguish between the different models provided, the skos:conceptscheme contains a uri that identifies it, but to describe the model content to humans, metadata following the dublin core standard also can be added. the relation of the concept scheme with the concepts of the kos is done through the skos:hastopconcept relation. this relation points at the most general concepts of the kos (top concepts), which are used as entry points to the kos structure. in skos, each concept consists of a uri and a set of properties and relations to other concepts. among the properties, skos.preflabel and skos.altlabel provide labels for a concept in different languages. the first one is used to show the label that better identifies a concept (for thesauri it must be unique). the second one is an alternative label that contains synonyms or spelling variations of the preferred label (it is used to redirect to the preferred label of the concept). the skos concepts also can contain three other properties called skos.scopenote, skos.definition, and skos.example. they contain annotations about the ways to use a concept, a definition, or examples of use in different languages. last, the skos.prefsymbol and skos.altsymbol properties are used to provide a preferred or some alternative symbols that graphically represent the concept. for example, a graphical representation is very useful to identify the meaning of a mathematical formula. another example is a chemical formula, where a graphical representation of the structure of the substance also provides valuable information to the user.
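as an illustration of the structures just described, the following sketch builds a one-concept scheme with rdflib's skos vocabulary and serializes it. it is written in python for brevity and is not part of thmanager (which is a java application); the example.org uris and labels are invented.

    # minimal sketch of the skos-core structures described above;
    # the uris and labels are hypothetical.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    EX = Namespace("http://example.org/thesaurus/")
    g = Graph()

    # a concept scheme groups the concepts and names the entry points
    g.add((EX.scheme, RDF.type, SKOS.ConceptScheme))

    # a top concept with preferred and alternative labels in two languages
    g.add((EX.water, RDF.type, SKOS.Concept))
    g.add((EX.water, SKOS.prefLabel, Literal("water", lang="en")))
    g.add((EX.water, SKOS.prefLabel, Literal("agua", lang="es")))
    g.add((EX.water, SKOS.altLabel, Literal("h2o", lang="en")))
    g.add((EX.water, SKOS.definition,
           Literal("the liquid that falls from clouds as rain", lang="en")))
    g.add((EX.scheme, SKOS.hasTopConcept, EX.water))

    print(g.serialize(format="turtle"))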
with respect to the relations, each concept indicates, by means of the skos:inscheme relation, in which concept scheme it is contained. the skos.broader and the skos.narrower relations are inverse relations used to model the generalization and specialization characteristics present in many kos (including thesauri). skos.broader relates a concept to more general concepts, and skos.narrower to more specific ones. the skos.related relation describes associative relationships between concepts (also present in many thesauri), indicating that two concepts are related in some way. with these properties and relations, it is perfectly possible to represent thesauri, taxonomies, and other types of controlled vocabularies. however, there is a problem for the representation of classification schemes that provide multiple codes for terms, as there is no place to store this information. under this category, one may find classification schemes such as iso-639 (iso 2002) (the iso standard for coding of languages), which proposes different types of alphanumeric codes (for example, two letters and three letters). for this special case, the skos working group proposes the use of the property skos.notation. although this property is not in the skos vocabulary yet, it is expected to be added in future versions. given the need to work with these types of schemes, this property has been included in the thmanager tool.

■ thmanager architecture

this section presents the architecture of the thmanager tool. this tool has been created to manage thesauri in skos, but it also is a base infrastructure that facilitates the management of thesauri in dls, simplifying their integration in tools that need to use thesauri or other types of controlled vocabularies. in addition, to facilitate its use on different computer platforms, thmanager has been developed using the java object-oriented language. the architecture of the thmanager tool is shown in figure 2. the system consists of three layers: first, a repository layer where thesauri are stored and identified by means of associated metadata describing them; second, a persistence layer that provides an api for access to thesauri stored in the repository; and third, a gui layer that offers different graphical components to visualize thesauri, to search by their properties, and to edit them in different ways. the thmanager tool is an application that uses the different components provided by the gui layer to allow the user to manage the thesauri. in addition, the layered architecture allows other applications to use some of the visualization components or the methods provided by the persistence layer to access thesauri. the main features that have guided the design of these layers have been the following: a metadata-driven design, efficient management of thesauri, the possibility of interrelating thesauri, and the reusability of thmanager components.

figure 1. skos model.

the following subsections describe these characteristics in detail.
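to make the division of responsibilities concrete, here is a sketch of what a persistence-layer entry point of this kind might look like. it is a hypothetical python rendering for illustration only; thmanager itself is written in java, and its actual api is not reproduced here.

    # hypothetical sketch of the layered design; names are invented and do not
    # reproduce thmanager's real java api.
    class ThesaurusRepository:
        """repository layer: imported thesauri and their descriptive metadata."""
        def __init__(self):
            self._items = {}  # thesaurus uri -> (metadata dict, concepts dict)

        def save(self, uri, metadata, concepts):
            self._items[uri] = (metadata, concepts)

        def metadata_records(self):
            return [meta for meta, _ in self._items.values()]

        def concepts(self, uri):
            return self._items[uri][1]

    class ThesaurusPersistence:
        """persistence layer: the single access point used by gui components
        and by any other tool that needs controlled vocabularies."""
        def __init__(self, repository):
            self._repo = repository

        def list_thesauri(self):
            return self._repo.metadata_records()  # metadata drives selection

        def get_concepts(self, uri):
            return self._repo.concepts(uri)

a gui layer, a metadata-creation tool, or a web search client would all talk to the persistence object rather than to the repository directly, which is what makes the components reusable outside the desktop application.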
metadata-driven design

a fundamental aspect of the repository layer is the use of metadata to describe thesauri. thmanager considers metadata about thesauri as basic information in the thesaurus management process, stored in the metadata repository and managed by the metadata manager. the reason for this metadata-driven design is that thesauri must be described and classified to facilitate the selection of the one that best fits the user's needs, allowing the user to search for them not only by name but also, among other things, by application domain or associated geographical area. the lack of metadata makes the identification of useful thesauri (provided by other organizations) difficult, producing a low reuse of them in other contexts. to describe thesauri in our service, a metadata profile based on dublin core has been created. the reason to use dublin core as the basis of this profile has been its extensive use in the metadata community. it provides a simple way to describe a resource using very general metadata elements, which can be easily matched with complex domain-specific metadata standards. additionally, dublin core also can be extended to define application profiles for specific types of resources. following the metadata profile hierarchy described in tolosana-calasanz et al. (2006), the thesaurus metadata profile refines the definition and domain of dublin core elements and includes two new elements (metadata language and metadata identifier) to appropriately identify the metadata records describing a thesaurus. the profile for thesauri has been described using the iemsr format (heery et al. 2005) and is distributed with the tool. iemsr is an rdf-based format created by the jisc ie metadata schema registry project to describe metadata application profiles. figure 3 shows the metadata created for the gemet thesaurus (the resource), expressed as a hedgehog graph (a reinterpretation of rdf triplets: resources, named properties, and values). the purpose of these metadata is not only to simplify thesaurus location for a user, but also to facilitate the identification of thesauri useful for a specific task in machine-to-machine communication. for instance, one may be interested only in thesauri that cover a restricted geographical area or have a specific thematic coverage.

efficient thesauri storage

thesauri vary enormously in size, ranging from hundreds of concepts and properties to millions. so the time spent on load, navigation, and search processes is a functional restriction for a tool that has to manage them. skos is rdf-based, and because reading rdf to extract the content is a slow process, the format is not appropriate for inner storage. to provide better access times, thmanager transforms skos into a binary format when a new skos file is imported. the persistence layer provides unified access to the thesaurus repository.

figure 2. kos manager architecture (repository, persistence, and gui layers; skos core and skos mapping are handled through the jena api, and the components are reused by thmanager and by other desktop tools and web services that use thesauri).

figure 3. metadata of the gemet thesaurus (among other elements: title "general multilingual environmental thesaurus," alternative title "gemet," publisher european environment agency (eea), date 2005-03-07, identifier http://www.eionet.eu.int/gemet).

this layer is used by the gui layer to access the thesauri, but it also can be employed by other tools that need to use thesauri outside a desktop environment (for example, a thematic search system accessible through the web that requires browsing a thesaurus to facilitate the construction of user queries). this layer performs the transformation of skos to the binary format when a thesaurus is imported. the transformation is provided using the jena library, a popular library for manipulating rdf documents that allows storing them in different kinds of repositories (http://jena.sourceforge.net/). jena provides an open model that can be extended with specialized modules to use other ways of storage, making it possible to easily change the storage format for another that is more efficient if needed. the data structure used is shown in figure 4. the model is an optimized representation of the information given by the rdf triplets.
the concepts map contains the concepts and their associated relations in the form of key-value pairs: the key is a uri identifying a concept, and the value is a relations object containing the properties of the concept. a relations object is a map that stores the properties of one concept in the form of (property type, value) pairs. the keys used for this map are the names of the typical property types in the skos model (for example, narrower or broader). the only special cases for encoding these property types in the proposed data structure occur when they have a language attribute (for example, preflabel, definition, or scopenote). in those cases, we propose the use of a [lang] suffix to distinguish the property type for a particular language. for instance, preflabel_en indicates a preflabel property type in english. additionally, it must be noted that the data type of the property values assigned to each key in the relations map varies with the semantics given to each property type. the data types fall into the following categories: a string for a preflabel property type; a list of strings for altlabel, definition, scope note, and example property types; a uri for a prefsymbol property type; a list of uris for narrower, broader, related, and altsymbol property types; and a list of notation objects for a notation property type. the data type used for notation values is a complex object because there may be different notation types. a notation object consists of type and value attributes. the type attribute is a uri that identifies a particular notation type and qualifies the associated notation value. additionally, and with the objective of increasing the speed of some operations (for example, navigation or search), some optimizations have been added. first, the uris of the top concepts are stored in the topconcepts list. this list contains redundant information, given that those concepts also are stored in the concepts map, but it makes their location immediate. second, to speed up the search of concepts and the drawing of the alphabetic viewer, the translations map has been added. for each language supported by the thesaurus, this map contains a translationterm object, a list of (label, concept uri) pairs ordered by preflabel. it also contains redundant information that allows the immediate creation of the alphabetic viewer for a language, simplifying the search process; as can be seen later, this does not add a big overhead in load time. in addition, if no alphabetic viewer and search are needed, this structure can be removed without affecting the hierarchical viewer. this solution has proven to be useful for managing the kind of thesauri we use (they do not surpass 50,000 concepts and about 330,000 properties), loading them into memory on an average computer in a reasonable time, and allowing immediate navigation and search (see section 6).

figure 4. persistence model (concepts map of uri-to-relations entries, topconcepts list, and per-language translations map).
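a rough python rendering of this in-memory layout is sketched below as a reading aid; it is illustrative only (thmanager implements the structure in java), and the sample uris and labels are invented.

    # illustrative sketch of the persistence model described above;
    # the uris and values are hypothetical.
    topconcepts = ["http://example.org/concept/1"]

    concepts = {
        "http://example.org/concept/1": {
            "prefLabel_en": "water",                       # one string per language
            "altLabel_en": ["h2o"],                        # list of strings
            "narrower": ["http://example.org/concept/2"],  # list of concept uris
            "notation": [("http://example.org/notation/iso", "wat")],
        },
        "http://example.org/concept/2": {
            "prefLabel_en": "groundwater",
            "broader": ["http://example.org/concept/1"],
        },
    }

    # per-language index backing the alphabetic viewer and the search box:
    # (label, concept uri) pairs kept sorted by preferred label
    translations = {
        "en": sorted(
            (props["prefLabel_en"], uri)
            for uri, props in concepts.items()
            if "prefLabel_en" in props
        )
    }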
interrelation of thesauri

the vast choice of thesauri available nowadays implies an undesired effect of content heterogeneity. although a thesaurus is usually created for a specific application domain, some of the concepts defined in thesauri from different application domains may be equivalent. in order to facilitate cross-domain classification of resources, users would benefit from the possibility of knowing the connections of a thesaurus in their application domain to thesauri used in other domains. however, it is difficult to manually detect the implicit links between those different thesauri. therefore, in order to automatically facilitate these interthesaurus connections, the persistence layer of the thmanager tool provides an interrelation function that relates a thesaurus to an upper-level lexical database (the concept core displayed in figure 2). the interrelation mechanism is based on the method presented in nogueras-iso, zarazaga-soria, and muro-medrano (2005). it is an unsupervised disambiguation method that uses the relations between concepts as disambiguation context. it applies a heuristic voting algorithm to select the most adequate sense of the concept core for each thesaurus concept. at the moment, the concept core is the wordnet lexical database. wordnet is a large english lexical database that groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms (synsets), each expressing a distinct concept. those synsets are interlinked by means of conceptual-semantic and lexical relations. the interrelation component has been conceived as an independent module that receives a thesaurus as input in skos and returns the relation with respect to the concept core using an extended version of the skos mapping model (miles and brickley 2004). this model, as commented before, is a part of skos that allows describing exact, major, and minor mappings between concepts of two different kos (in this case, between a thesaurus and the common core). skos mapping is still in an early stage of development and has been extended in order to provide the needed functionality. the base skos mapping provides the map:exactmatch, map:majormatch, and map:minormatch relations to indicate the degree of relation between two concepts. given that the interrelation algorithm cannot ensure that a mapping is 100 percent exact, only the major and minor match properties are used. the algorithm returns a list of possible mappings with the lexical database for each concept: the one with the highest probability is assigned as a major match, and the rest are assigned as minor matches. to store the interrelation probability, skos mapping has been extended by adding a blank node with the reliability of the mapping.
also, to be able to know which concepts of which thesauri are equivalent to one of the common core, the inverse relations of map:majormatch and map:minormatch have been created. an example of skos mapping can be seen in figure 5. there, concept 340 of the gemet thesaurus (alloy) is correctly mapped to the wordnet concept number 13751474 (alloy, metal) with a probability of 91.007 percent; an unrelated minor mapping also is found, but it is given a low probability (8.992 percent).

reusability of thmanager components

on top of the api layer, the gui layer has been constructed. this layer contains several graphical interfaces to provide different types of viewers, searchers, and editors for thesauri. this layer is used as the base for the construction of the thmanager tool. the tool groups a subset of the provided components, relating them to obtain a final user application that allows the management of the stored thesauri, their visualization (navigation by the concept relations), their edition, and their importation and exportation using the skos format. the thmanager tool not only has been created as an independent tool to facilitate thesauri management, but also to allow easy integration in tools that need to use thesauri. this has been done by combining the information management with specific graphical interfaces in different black-box components. among the provided components there is a hierarchical viewer, an alphabetic viewer, a list viewer, a searcher, and an editor, but more components can be constructed if needed. the use of the gui layer as a library of reusable graphical components makes it possible to create different tools that are able to manage thesauri with different user requirements with minimum effort, also allowing the integration of this technology in other applications that need controlled vocabularies to improve their functionality. for example, in a metadata creation tool, it can be used to provide the graphical component to select controlled values from thesauri and automatically insert them in the metadata. it also can be used to provide the list of possible values to use in a web search system, or to provide a thesaurus-based navigation of a collection of resources in an exploratory search system. figure 6 shows the integration process of a thesaurus visualization component in an external tool. the provided thesaurus components have been constructed following the java beans philosophy (reusable software components that can be manipulated visually in a builder tool), where a component is a black box with methods to read and change its state that can be reused when needed. here, each thesaurus component is a thesaurusbean that can be directly inserted in a graphical application to use its functionality (visualize or edit thesauri) in a very simple way. the thesaurusbeans are provided by the thesaurusbeanmanager that, given the parameters of the thesaurus to visualize and the type of visualization, returns the most adequate component to use.

■ description of thmanager functionality

the thmanager tool is a desktop application that is able to manage thesauri stored in skos. as regards the installation requirements, the application requires 100 mbs of free space on the hard disk. with respect to ram and cpu requirements, they depend greatly on the size and the number of thesauri loaded in the tool.
considering the number and size of thesauri used as testbed in section 6, ram consumption ranges from 256 to 512 mbs, and with a 3ghz cpu (for example, pentium iv), the load times for the bigger thesauri are acceptable. however, if the size of thesauri is smaller, ram and cpu requirements decrease, being able to operate on a computer with just a 1 ghz cpu (for example, pentium iii) and 128 mbs of ram. given that the management of thmanager is meta­ data oriented, the first window in the application shows a table including the metadata records describing all the thesauri stored in the system (figure 7). the selection of a record in this table indicates to the rest of the compo­ nents the selected thesaurus. the creation or deletion of thesauri also is provided here. the only operation that can be performed when no record is selected is to import a new thesaurus stored in skos. to import it, the name of the skos file must be provided. the import tool also contains the option to interrelate the imported thesaurus to the concept core. the metadata of the thesaurus are extracted from inside of the skos if they are available, or they can be provided in an associated xml metadata file. if no metadata record is provided, the application generates a new one with minimum information, using as base the name of the skos file. once the user has selected a thesaurus, it can visualize and modify its metadata or content, export it to skos, or, as commented before, delete it. with respect to the metadata describing a thesaurus, a metadata viewer visualizes the metadata in html and a metadata editor allows the editing of metadata following the thesaurus metadata profile described in the metadata­driven design section (figure 8 shows a screenshot of the metadata edi­ tor). different html views can be provided by adding more css files to the application. the metadata editor is customiz­ able. to add or delete metadata elements to the metadata edi­ tor window, it is only neces­ sary to modify the description of the iemsr profile for thesauri included in the application. the main functionality of the tool is to visualize the thesaurus structure, showing all proper­ ties of concepts and allowing the navigation by relations (see figure 9). here, different read­only viewers are provided. there is an alphabetic viewer that shows all the concepts ordered by the preferred label in one language. a hierar­ chical viewer provides navigation by broader and nar­ rower relations. additionally, a hypertext viewer shows all properties of a concept and provides navigation by all its relations (broader, narrower, and related) via hyper­ links. finally, there also is a search system that allows the typical searches needed for thesauri (equals, starts with, contains). currently, search is limited to preferred labels in the selected language, but it could be extended to allow searches by other properties, such as synonyms, defini­ tions, or scope notes. figure 5. skos mapping extension alloy ... 
figure 6. gui component integration (an external desktop tool requests a thesaurusbean from the thesaurusbeanmanager, e.g., type: tree, thesaurus: gemet)

all of these viewers are synchronized, so the selection of a concept in one of them produces the selection of the same concept in the others. the layered architecture described previously allows these viewers to be reused in many situations, including other parts of the thmanager tool. for example, in the thesaurus metadata editor described before, the thesaurus viewer is used to facilitate the selection of values for the subject section of the metadata. also, in the thesaurus editor shown later, the thesaurus viewer simplifies the selection of a concept related (by some kind of relation) to the selected one, and it provides a preview of the hierarchical viewer to help detect wrong relations. the third available operation is to edit the thesaurus structure. here, to create a thesaurus following the skos model, an editing component is provided (see figure 10). the graphical interface shows a list with all the concepts created in the selected thesaurus, allowing the creation of new ones (providing their uris) or the deletion of selected ones. once a concept has been selected, its properties and relations to other concepts are shown, allowing the creation of new ones and the deletion of others. to facilitate the creation of relations between concepts, a selector of concepts (based on the thesaurus viewer) is provided, allowing the user to add related concepts without manually typing the uri of the associated concept. also, to check whether the created thesaurus is correct, a preview of the hierarchical viewer can be shown, allowing the user to easily detect problems in the broader and narrower relations. with respect to the interrelation functionality, at the moment the mapping obtained is shown in the thesaurus viewers, but the navigation between equivalent concepts of two thesauri must be done manually by the user. however, a navigation component still under development will allow the user to jump from a concept in a thesaurus to concepts in others that are mapped to the same concept in the common core. as mentioned before, for efficiency, the format used to store the thesauri in the repository is binary, but the interchange format used is skos. so a module for thesauri importation and exportation is provided. this module is able to import from and export to skos.
in addition, if the thesaurus has been interrelated with respect to the concept core, it is able to export its mapping to the concept core using the extended version of skos mapping described above.

■ results of the work

this section shows some experiments performed with the thmanager tool for the storage and management of a selected set of thesauri. in particular, this set of thesauri is relevant in the context of the geographic information community. the increasing relevance of geographic information for decision-making and resource management in different areas of government has promoted the creation of geolibraries and spatial data infrastructures to facilitate distribution of and access to geographic information (nogueras-iso, zarazaga-soria, and muro-medrano 2005). in this context, complex metadata schemes, such as iso 19115, have been proposed for a full-detail description of resources. many of the metadata elements in these schemes are either constrained to a selected vocabulary (iso 639 for language encoding, iso 3166 for country codes, and so on), or the user is told to pick a term from the most suitable thesaurus. the problems with this second case are that typically the choice of thesauri is quite open, the thesauri are frequently large, and the exchange format of available thesauri is quite heterogeneous. in such a context, the thmanager tool has proven to be very useful for simplifying the management of the thesauri used. at the moment, eighty kos, counting thesauri and other types of controlled vocabularies, have been created or transformed to skos and managed through this tool. table 1 shows some of them, indicating their names (name column), the number of concepts (nc column), their total number of properties and relations (np and nr columns), and the number of languages in which concept properties are provided (nl column). to give an idea of the cost of loading these structures, the sizes of the skos and binary files (ss and sb columns) are provided in kilobytes (kb).

figure 7. thesaurus selector
figure 8. thesaurus metadata editor

additionally, table 1 compares the performance of thmanager with respect to other tools that load the thesauri directly from an rdf file using the jena library (time performance has been obtained using a 3 ghz pentium iv processor). for this purpose, three different load times (in seconds) have been computed. the bt column contains the load time of binary files without the cost of creating the gui for the thesauri viewers. the lt column contains the total load time of binary files (including the time of gui creation and drawing). the jt column contains the time spent by a hypothetical rdf-based editor tool to invoke jena and load the rdf skos files containing the thesauri into its memory model (it does not include gui creation). the difference between the bt and lt columns shows the time used to draw the gui once the thesauri have been loaded in memory. the difference between the bt and jt columns shows the gain in time of using binary storage instead of an rdf-based one (an illustrative sketch of how such timings might be taken appears below).
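the following is a minimal sketch, not the authors' benchmark code, of how the jena-based load time (the jt column) could be measured. it assumes a current apache jena distribution on the classpath and a local skos rdf/xml file whose name is passed on the command line; the file name shown is hypothetical.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class SkosLoadTimeSketch {
    public static void main(String[] args) {
        // hypothetical file; pass the real skos rdf/xml file as the first argument
        String skosFile = args.length > 0 ? args[0] : "file:gemet.rdf";

        // roughly what the jt column measures: building an in-memory jena model
        // from the rdf skos file, with no gui work included
        long start = System.nanoTime();
        Model model = ModelFactory.createDefaultModel();
        model.read(skosFile); // parses rdf/xml by default
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("statements loaded: " + model.size());
        System.out.println("jena load time (ms): " + elapsedMs);
        // the bt column would instead time deserialization of thmanager's binary
        // representation, and lt would add the time to build and draw the viewers
    }
}
```

timing the binary path the same way (start a clock, deserialize, stop the clock) would reproduce the bt figure, and the difference between the two runs is the gain the text describes.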
the thesauri shown in the table are the adl feature types thesaurus (adl ftt), the isoc thesaurus of geography (isoc-g), the iso 639 code list, the unesco thesaurus (unesco 1995), the ogp surveying and positioning committee code lists (epsg) (ogp 2006), the multilingual agricultural thesaurus (agrovoc), the european vocabulary thesaurus (eurovoc) (eupo 2005), the european territorial units (spain and france) (etu), and the general multilingual environmental thesaurus (gemet). they have been selected because they have different sizes and can be used to show how the load time evolves with the thesaurus size. among them, gemet and agrovoc can be highlighted: although they are provided as skos, they include nonstandard extensions that we have transformed to standard skos relations and properties. eurovoc and unesco are examples of thesauri provided in formats other than skos that we have completely transformed into skos; the former was in an xml-based format, and the latter used a plain-text format. another thesaurus transformed to skos is the european territorial units, which contains the administrative political units in spain and france. here, the original source was a collection of heterogeneous documents that contained parts of the needed information and that have been processed to generate a skos file. some classification schemes also have been transformed to skos, such as iso 639 and the different epsg codes for coordinate reference systems (including datums, ellipsoids, and projections). with respect to controlled vocabularies created in skos (by the authors) using the thmanager tool, there is an extended version of the adl feature types that includes a more detailed classification of feature types, as well as different glossaries used for resource classification.

figure 11. thesaurus load times (load time in seconds against skos file size in kb, comparing rdf loading with jena and binary loading with thmanager)

figure 11 depicts the comparison of the different load times shown in table 1 with respect to the size of the rdf skos files. the order of the thesauri in the figure is the same as in table 1. it can be seen that the time to construct the model from the binary format is almost half the time spent to create the model from an rdf file. in addition, once the binary model is loaded, the time to generate the gui is not very dependent on thesaurus size. this is possible thanks to redundant information added to facilitate access to top concepts and to speed up loading of the alphabetic viewer. this redundant information produces an overhead in the load of the model, but without it the drawing time would be much worse, as the information would have to be generated on the fly. however, in spite of the improvements, for the larger thesauri considered the load time starts to be long, given that it includes loading the whole structure of the thesaurus into memory and creating the objects used to manage it quickly once loaded. but, once a thesaurus is loaded, future accesses are immediate (quicker than 0.5 seconds). these accesses include opening it again, navigating by thesaurus relations, changing the visualization language, and searching concepts by their preferred labels. to minimize the load time, thesauri can be loaded in the background when the application is launched, reducing in that way the user's perception of the load time.

figure 9. thesaurus concept selector
figure 10. thesaurus concept editor

another interesting aspect of figure 11 is the peak at the third element.
it corresponds to the iso 639 classification scheme, which has the special characteristic of having no hierarchy and many notations. these two characteristics produce a small increase in the model load time, given that the top-concepts list contains all the concepts and the notations are more complex than other relations, but most of the time is used to generate the gui of the tree viewer. the tree viewer gets all the concepts that are top terms, and for each one it asks for the preferred label in the selected language and sorts them alphabetically to show the first level of the tree. this is fast for a few hundred concepts, but not for the 7,599 in iso 639. however, this problem could be easily solved if the metadata contained a description of the type of kos to visualize: if the tool knew that the kos does not have broader and narrower relations, it could use the structures used to visualize the alphabetic list, which are optimized to show all of the kos concepts rapidly, instead of trying to load it as a tree. the persistence approach used has the advantage of not requiring external persistence systems, such as a dbms, and of providing rapid access after loading, but it has the drawback of loading entire thesauri into memory (costly in both time and space). so, for much bigger thesauri, the use of some kind of dbms would be necessary. if this change were needed, minimal modifications would be required (a single class). however, if not all the concepts are loaded, the alphabetic viewer (which shows all the concepts) would have to be updated (for example, showing the concepts by pages), or it would become too slow to work with.

■ conclusions

this article has presented a tool for managing the thesauri needed in a digital library for creating metadata and for running search processes, using skos as the interchange format. this work reviews the tools that are available to edit thesauri, highlighting the lack of a formalized way to exchange thesauri and the difficulty of integrating those tools in other environments. it selects skos from the available interchange formats for thesauri as the most promising candidate to become a standard for thesaurus representation, and highlights the lack of tools that are able to manage it properly. the thmanager tool is offered as a solution to these problems. it is an open-source tool that can manage thesauri stored in skos, allowing their visualization and editing. thanks to the layered architecture, its components can be easily integrated in other applications that need to use thesauri or other controlled vocabularies. additionally, the components can be used to control the possible values used in a web search service to facilitate traditional or exploratory searches based on a controlled vocabulary. the performance of the tool is demonstrated through a series of experiments on the management of a selected set of thesauri. this work analyzes the features of this selected set of thesauri and compares the efficiency of this tool with respect to other tools that load the thesauri directly from an rdf file. in particular, it is shown that the internal representation used by thmanager helps to decrease the time spent on the graphical loading of thesauri, facilitating navigation of the thesaurus contents as well as other typical operations, such as sorting or changing the visualization language.
additionally, it is worth noting that the tool can be used as a library of components to simplify the integration of thesauri in other applications that require the use of controlled vocabularies. thmanager has been integrated within the open-source catmdedit tool (zarazaga-soria et al. 2003), a metadata editor tool for the documentation of geographic information resources (metadata compliant with the iso 19115 geographic information metadata standard). the thesaurusbeans provided in the thmanager library have been used to facilitate keyword selection for some metadata elements. the thmanager component library also has contributed to the development of catalog search systems guided by controlled vocabularies. for instance, it has been used to build a thematic catalog in the sdiger project (zarazaga-soria et al. 2007). sdiger is a pilot project on the implementation of the infrastructure for spatial information in europe (inspire) for the development of a spatial data infrastructure to support access to geographic information resources concerned with the european water framework directive. thanks to the thmanager components, the thematic catalog allows browsing of resources by means of several multilingual thesauri, including gemet, unesco, agrovoc, and eurovoc.

table 1. sizes of some thesauri and other types of vocabularies (ss and sb in kb; lt, bt, and jt in seconds)

name      nc      np       nr      nl   lt    bt      jt      ss      sb
adl ftt   210     210      408     1    0.4   0.047   0.062   103     41
isoc-g    5,136   5,136    1,026   1    2.4   1.063   1.797   2,796   1,332
iso 639   7,599   16,247   0       6    5.1   1.969   2.89    3,870   3,017
unesco    8,600   13,281   21,681  3    2.1   1.406   2.984   4,034   2,135
epsg      4,772   9,544    0       1    1.8   0.969   1.796   2,935   1,682
agrovoc   16,896  103,484  30,361  3    7.5   4.953   14.75   15,859  5,089
eurovoc   6,649   196,391  20,861  15   11.1  9.266   15.828  18,442  11,483
etu       44,991  89,980   89,976  2    13.3  10.625  17.844  23,828  10,412
gemet     5,244   326,602  12,750  21   13.7  11.828  25.61   28,010  15,048

future work will enhance the functionalities provided by thmanager. first, the ergonomics will be improved to show connections between different thesauri. currently, these connections can be computed and annotated, but the gui does not allow the user to navigate them; as the base technology already has been developed, only a graphical interface is needed. second, the tool will be enhanced to support data types other than text (for example, images, documents, or other multimedia sources) for the encoding of concepts’ property values. third, it has been noted that thesaurus concepts can evolve with time; thus, a mechanism for managing the different versions of thesauri will be necessary in the future. finally, improvements in usability also are expected. thanks to the component-based design of thmanager widgets (thesaurusbeans), new viewers or editors can be readily created to meet the needs of specific users.

■ acknowledgments

this work has been partially supported by the spanish ministry of education and science through projects tin2006-00779 and tic2003-09365-c02-01 from the national plan for scientific research, development, and technology innovation. the authors would like to express their gratitude to juan josé floristán for his support in the technical development of the tool.

references

american national standards institute (ansi). 1993. guidelines for the construction, format, and management of monolingual thesauri. ansi/niso z39.19-1993. revision of z39.19.
batschi, wolf-dieter et al. 2002.
superthes: a new software for construction, maintenance, and visualisation of multilingual thesauri. http://www.t-reks.cnr.it/docs/st_enviroinfo_2002.pdf (accessed sept. 6, 2007).
british standards institute (bsi). 1985. guide to establishment and development of multilingual thesauri. bs 6723.
british standards institute (bsi). 1987. guide to establishment and development of monolingual thesauri. bs 5723.
ceres/nbii. 2003. the ceres/nbii thesaurus partnership project. http://ceres.ca.gov/thesaurus/ (accessed june 12, 2007).
cross, phil, dan brickley, and traugott koch. 2001. rdf thesaurus specification. technical report 1011, institute for learning and research technology. http://www.ilrt.bris.ac.uk/discovery/2001/01/rdf-thes/ (accessed june 12, 2007).
denny, michael. 2002. ontology building: a survey of editing tools. xml.com. http://xml.com/pub/a/2002/11/06/ontologies.html (accessed june 12, 2007).
european environment agency (eea). 2004. general multilingual environmental thesaurus (gemet). version 2.0. european environment information and observation network. http://www.eionet.europa.eu/gemet/rdf (accessed june 12, 2007).
european union publication office (eupo). 2005. european vocabulary (eurovoc). publications office. http://europa.eu/eurovoc/ (accessed june 12, 2007).
food and agriculture organization of the united nations (fao). 2006. agriculture vocabulary (agrovoc). agricultural information management standards. http://www.fao.org/aims/ag%20alpha.htm (accessed june 12, 2007).
gonzalo, julio, et al. 1998. applying eurowordnet to cross-language text retrieval. computers and the humanities 32, no. 2/3 (special issue on eurowordnet): 185–207.
heery, rachel, et al. 2005. jisc metadata schema registry. in 5th acm/ieee-cs joint conference on digital libraries, 381–81. new york: acm pr.
hill, linda, and qi zheng. 1999. indirect geospatial referencing through place names in the digital library: alexandria digital library experience with developing and implementing gazetteers. in asis ’99: proceedings of the 62nd asis annual meeting: knowledge: creation, organization, and use, 57–69. medford, n.j.: information today, for the american society for information science.
hodge, gail. 2000. systems of knowledge organization for digital libraries: beyond traditional authority files. washington, d.c.: the digital library federation.
international organization for standardization (iso). 1985. guidelines for the establishment and development of multilingual thesauri. iso 5964.
international organization for standardization (iso). 1986. guidelines for the establishment and development of monolingual thesauri. iso 2788.
international organization for standardization (iso). 2002. codes for the representation of names of languages. iso 639.
international organization for standardization (iso). 2003. information and documentation—the dublin core metadata element set. iso 15836:2003.
janée, greg, satoshi ikeda, and linda l. hill. 2003. the adl thesaurus protocol. http://www.alexandria.ucsb.edu/~gjanee/thesaurus/ (accessed june 12, 2007).
lesk, michael. 1997. practical digital libraries: books, bytes, and bucks. san francisco: morgan kaufmann.
matthews, brian m., et al. 2001. internationalising data access through limber.
in third international workshop on internationalisation of products and systems, 1–14. milton keynes (uk). http://epubs.cclrc.ac.uk/bitstream/401/limber_iwips.pdf (accessed june 12, 2007).
miles, alistair, and dan brickley, eds. 2004. skos mapping vocabulary specification. w3c. http://www.w3.org/2004/02/skos/mapping/spec/2004-11-11.html (accessed june 12, 2007).
miles, alistair, brian matthews, and michael wilson. 2005. skos core: simple knowledge organization for the web. in 2005 dublin core annual conference—vocabularies in practice, 5–13. madrid: universidad carlos iii de madrid.
miller, george a. 1990. wordnet: an on-line lexical database. international journal of lexicography 3: 235–312.
mindswap group. 2006. swoop: a hypermedia-based featherweight owl ontology editor. maryland information and network dynamics lab, semantic web agents project. http://www.mindswap.org/2004/swoop/ (accessed june 12, 2007).
nogueras-iso, javier, francisco javier zarazaga-soria, and pedro rafael muro-medrano. 2005. geographic information metadata for spatial data infrastructures—resources, interoperability, and information retrieval. new york: springer verlag.
noy, natalie f., ray w. fergerson, and mark a. musen. 2000. the knowledge model of protégé-2000: combining interoperability and flexibility. in knowledge engineering and knowledge management: methods, models, and tools: 12th international conference, ekaw 2000, juan-les-pins, france, october 2–6, 2000: proceedings, 1–20 (lecture notes in computer science, 1937). new york: springer.
ogp surveying & positioning committee. 2006. surveying and positioning. http://www.epsg.org/ (accessed june 12, 2007).
semantic web advanced development for europe (swad-europe). 2001. semantic web advanced development for europe thesaurus activity. http://www.w3.org/2001/sw/europe/reports/thes (accessed june 12, 2007).
taylor, mike. 2004. the zthes specifications for thesaurus representation, access, and navigation. http://zthes.z3950.org/ (accessed june 12, 2007).
tolosana-calasanz, r., et al. 2006. semantic interoperability based on dublin core hierarchical one-to-one mappings. international journal of metadata, semantics, and ontologies 1, no. 3: 183–88.
united nations educational, scientific, and cultural organization (unesco). 1995. unesco thesaurus: a structured list of descriptors for indexing and retrieving literature in the fields of education, science, social and human science, culture, communication and information. paris: unesco publ.
u.s. library of congress, network development and marc standards office. 2004. marc standards. http://www.loc.gov/marc/ (accessed june 12, 2007).
wielemaker, jan, guus schreiber, and bob wielinga. 2005. using triples for implementation: the triple20 ontology-manipulation tool (lecture notes in computer science, 3729): 773–85. new york: springer.
zarazaga-soria, francisco javier, et al. 2003. a java tool for creating iso/fgdc geographic metadata. in geodaten- und geodiensteinfrastrukturen—von der forschung zur praktischen anwendung: beiträge zu den münsteraner gi-tagen, 26./27. juni 2003 (ifgiprints, 18). münster, germany: institut für geoinformatik, universität münster.
zarazaga-soria, francisco javier, et al. 2007. providing sdi services in a cross-border scenario: the sdiger project use case. in research and theory in advancing spatial data infrastructure concepts, 113–26. redlands, calif.: esri.
president’s column
bonnie postlethwaite

many things happen on the national front that affect libraries and their use of technology. legislative action, national policy, and standards development are all arenas in which ala and lita both take an active role. lita has articulated in its strategic plan the need to pursue active involvement in providing its expertise on national issues and standards development. lita achieves these important objectives in a variety of ways. lita has several committees, interest groups, and representatives to ala standing committees that address legislation, regulation, and national policy issues that pertain to technology. the charge of the lita legislation and regulations committee reads: “the legislation and regulation committee monitors legislative and regulatory developments in the areas of information and communications technologies; identifies relevant issues affecting libraries and assists in developing appropriate strategies for responding to these issues.” as its educational mission, the committee publicizes issues and strategies on the lita web site. the chairperson of this committee serves as the lita representative to the ala legislation assembly, which advises ala on positions to take regarding legislative and regulatory action. lita also has a representative to the ala office of information technology policy advisory committee who works closely with the legislation and regulation committee on it policy issues that may cross over into the legislative realm. lita also appoints a representative to the ala intellectual freedom committee whose purpose is “to recommend such steps as may be necessary to safeguard the rights of library users, libraries, and librarians, in accordance with the first amendment to the united states constitution and the library bill of rights.” much has happened on the national front in the past few years that provides plenty of work for these lita and ala committees. the patriot act, calea, net neutrality, dopa, ada compliance, and debates over copyright and intellectual property rights in an electronic world are all examples of issues that require technological control or affect systems and network solutions. they also touch the heart of what librarians have always stood for: protection of intellectual property, personal privacy, and intellectual freedom. library technologists expend enormous time and effort protecting the privacy of patron records through data retention policies, system controls, and strong authentication systems, all while providing authorized access to intellectual property according to copyright or licensing restrictions. keeping lita members apprised of all of these issues and of the technologies required to abide by legal requirements is an enormous task for the committees and interest groups. these groups do this through programming, publications, and postings to the lita web site. lita has always been very active on the standards-development front. from the start, lita was involved with the marc standards through the hard work of henriette avram. the number of standards that affect libraries has mushroomed. there are standards for all aspects of technology—data formats, hardware and firmware, and networking. ala regularly calls on lita to provide expertise on developing standards that pertain to library technology. lita has a standards interest group and shares membership with alcts and rusa on the marbi committee.
most lita interest groups deal with standards of some sort at least occasionally. the lita board felt that lita’s work on developing standards was so important that in 2006 a new standards coordinator position was created, and diane hillman, cornell university, was appointed as the first person in this role. the standards coordinator identifies lita experts to assist in calls for review of developing standards and seeks input from the membership. the standards coordinator also works closely with the standards interest group to help educate the membership. because of the nature of digital information, networks, and the standards that enable the distribution of digital information and services, it has become impossible for any one person to understand all the standards that affect the library technologist. as standards proliferate, it becomes more important for lita to provide educational opportunities alongside its involvement in the development of these standards that so impact our daily lives. the lita web site provides a wealth of information about standards. a new means of contributing to the dialogue about developing standards is to participate in the lita wiki, where diane hillman will be leading the way in posting information about various library technology standards. also, a great place to learn about various standards is right here in ital: practically every issue has at least one article about one standard or another. lita’s participation in technological developments on the national front is critical to all libraries. policy, regulation, and standards form the infrastructure of technological implementation and are the cornerstone of library technology. lita is the place where you can learn more about these developments and participate in the dialogue about them.

bonnie postlethwaite (postlethwaiteb@umkc.edu) is lita president 2006/2007 and associate dean of libraries, university of missouri–kansas city.

searchable signatures: context and the struggle for recognition
gina schlesselman-tarango

abstract

social networking sites made possible through web 2.0 allow for unique user-generated tags called “searchable signatures.” these tags move beyond the descriptive and act as means for users to assert online individual and group identities. this paper presents a study of searchable signatures on the instagram application, demonstrating that these types of tags are valuable not only because they allow for both individuals and groups to engage in what social theorist axel honneth calls the “struggle for recognition,” but also because they provide contextual use data and sociohistorical information so important to the understanding of digital objects. methods for the gathering and display of searchable signatures in digital library environments are also explored.

introduction

a comparison of user-generated tags with metadata traditionally assigned to digital objects suggests that social network platforms provide an intersubjective space for what social theorist axel honneth has termed the “struggle for recognition.”1 social network users, through the creation of identity-based tags—or what can be understood as “searchable signatures”—are able to assert and perform online selves and are thus able to demand, or struggle for, recognition within a larger social framework.
baroncelli and freitas cogently argue that web 2.0, or the interactive online social arena, in fact functions as a “recognition market in which contemporary individuals . . . trade personal worth through displays and exchanges of . . . self-presentations.”2 a comparison of a metadata schema used in yale university’s digital images database with user-generated tags accompanying shared photographs on the social networking platform instagram demonstrates that searchable signatures are unique to social networking sites. as phenomena that allow for public presentations of disembodied selves, searchable signatures thus provide specific information about the context of the digital images with which they are associated. capturing context remains a challenge for those working with digital collections, but searchable signatures allow viewers to derive valuable use data and sociohistorical information to better understand the world in which digital images originated and exist.

gina schlesselman-tarango (gina.schlesselman@du.edu) holds a master of social sciences from the university of colorado denver and is currently an mlis candidate at university of colorado.

literature review

web 2.0 identities and recognition theory

while web 2.0 can be imagined as a highly collaborative space where social actors are able to communicate to the world new identities, some warn that this communication is somehow engineered and performed. van dijck, in an analysis of social media, argues that it is indeed “publicity strategies [that] mediate the norms for sociality and connectivity,” and baroncelli and freitas note that web 2.0 allows people to make themselves visible through modes of spectacularization.3 though his focus is on the spectacle in fin de siècle france, clark provides some insight into the effects of spectacularization on the individual.4 working within a historical materialist framework, clark points out that with the growth of capitalism, the individual has become colonized.5 clark further describes this colonization as “massive internal extension of the capitalist market—the invasion and restructuring of whole areas of free time, private life, leisure, and personal expression . . . the making-into-commodities of whole areas of social practice which had once been referred to casually as everyday life.”6 here, web 2.0 is not a liberatory tool but instead a space where users are colonized to the extent that they create selves exchanged through social networking sites owned by capitalist enterprises. web 2.0, then, has created a situation in which personal time and identification can be successfully commodified. baroncelli and freitas conclude, “from that formula, personal life becomes a capital to be shared with other people—preferably, with a large audience.”7 the problem, then, is that one’s existence is defined simply “by being seen by others” and can no longer be understood as authentic.8 despite the sophistication of the argument detailed above, there are some who view the online self, created through web 2.0, as a legitimate and authentic identity. in an account of the online self, hongladarom summarizes this position, noting that both offline and virtual identities are constructed in social environments.9 for hongladarom, these identities are not different in essence because “what it is to be a person . . .
is constituted by external factors.” 10 the online world as an external factor has the ability to affirm one’s existence, regardless of whether that existence is physical or virtual. in sum, it is the social other and not a material existence that is the authenticating factor in identity formation. there are others who validate the role that spectacle—or what also can be understood as performance—plays in identity formation. pearson calls on the work of goffman to argue, “identity-as-performance is seen as part of the flow of social interaction as individuals construct identity performances fitting their milieu.” 11 for pearson, the identity is always performed, be it through web 2.0 or otherwise. there is nothing particularly worrisome, then, about the effects of web 2.0 on the self, nor does web 2.0 threaten the authenticity of the self. identity is always performed and is in a sense a spectacle—this does not mean, however, that identity in itself is spurious. it is with this perspective of the online self as a performed albeit authentic identity that this paper further develops. before a thorough analysis of the searchable signature as an online self can be conducted, a deeper understanding of honneth’s theory of recognition is in order. information technology and libraries | september 2013 7 in his 1995 work the struggle for recognition: the moral grammar of social conflicts, honneth sets out to develop a social theory based on what he calls “morally motivated struggle.” 12 based on the habermasian concept of communicative action, honneth contends that it is through mutual recognition that “one can develop a practical relation-to-self [and can] view oneself from the normative perspective of one’s partners in interaction, as their social addressee.” 13 relation-toself is key for honneth, and he argues that a healthy relation-to-self, or what can be thought of as self-esteem, is developed when one is seen as valuable by others. beyond self-esteem, honneth points that the success of social life itself depends on “symmetrical esteem between individualized (and autonomous) subjects.” 14 for honneth, this “symmetrical esteem” can lead to solidarity between individuals. “relationships of this sort,” he explains, “can be said to be cases of ‘solidarity’ because they inspire not just passive tolerance but felt concern for what is individual and particular about the other person.” 15 that is to say that felt concern for another allows one to see the specific traits of the other as valuable in working towards common goals, and honneth imagines that in situations of “symmetrical esteem . . . every subject is free from being collectively denigrated, so that one is given the chance to experience oneself to be recognized, in light of one’s own accomplishments and abilities, as valuable for society.” 16 until this ideal is realized, however, individuals must find sites in which to struggle to be recognized as valuable social assets. according to baroncelli and freitas, it is in fact web 2.0 that provides the arena where “the contemporary demand for the visibility of the self” is able to flourish. 
17 they position this argument within honneth’s framework, asserting that the visibility of self is “directed towards a quest for recognition,” and they thus conclude that web 2.0 can be understood as a “recognition market.” 18 context and its importance capturing and integrating markers of context into records, according to chowdhury, still present a challenge for many.19 “there is now a general consensus that the major challenge facing a digital library as well as a digital preservation program is that it must describe its content as well as the context sufficiently well to allow its correct interpretation by the current and future generations of users,” he contends.20 context in itself is difficult to define, let alone its myriad facets that might or might not facilitate better understanding of digital objects. dervin, in her exploration of the meaning of context, points that it is often conceptualized as the “container in which the phenomenon resides.” 21 she points that the list of factors that constitute the container and might be considered contextual is in fact “inexhaustible”—items on this list, for example, might include the gender, race, and ethnicity of those involved in a phenomenon. 22 in an indexing or digital collection environment, the goal is to determine which of these many factors ought be included in a record to best allow for discovery and use. searchable signatures: context and the struggle for recognition | schlesselman-tarango 8 others imagine context as a fluid, ever-changing process rather than as a static container of data. “in this framework,” dervin writes, “reality is in a continuous and always incomplete process of becoming.” 23 this understanding of context as changing is helpful for those working with objects that live in digital environments, especially web 2.0. certainly the interactive nature of the web has created room for a variety of users to create, share, appropriate, comment on, tag, reject, celebrate, and ultimately understand images in a multitude of contexts that might be different from one moment to the next. there are many reasons to include contextual information in records of digital objects. lee argues that by providing context, or what he describes as the “social and documentary” world “in which [a digital object] is embedded,” future users will be able to better understand the “details of our current lives.” 24 further, lee contends that context is helpful in that is illustrates the ways in which a digital object is related to other materials: relationships to other digital objects can dramatically affect the ways in which digital objects have been perceived and experienced. in order for a future user to make sense of a digital object, it could be useful for that user to know precisely what set of . . . representations—e.g. titles, tags, captions, annotations, image thumbnails, video keyframes—were associated with a digital object at a given point in time. 25 the user-generated tag, then, is a valuable representation that provides contextual information surrounding the perception and experience of the image with which it is directly related. discussion user-generated tags and traditional metadata user-generated tags have been hailed as an important stage in the evolution of image description and are said to have the potential to shape controlled vocabularies used in traditional metadata schemas. for example, in a comparison of flickr tags and index terms from the university of st. 
andrews library photographic archive, rorissa stresses the importance of exploring similarities and differences between indexers’ and users’ language, noting that “social tagging could serve as a platform on which to build future indexing systems.”26 like others, rorissa hopes that continued research into user-generated social tags will be able to “bridge the semantic gap between indexer-assigned terms and users’ search language.”27 in fact, some are currently utilizing social tags in an effort to describe and facilitate access to collections. one such organization is steve: the museum social tagging project, “a place where you can help museums describe their collections by applying keywords, or tags, to objects.”28 the organization allows users to not only view traditional metadata associated with cultural objects, but also tags generated by others. in an effort to better understand the similarities and differences between user-generated tags and the language used in traditional metadata schemas, one must compare the two systems. yale university’s digital images database provides a glimpse at the ways in which traditional metadata schemas are typically used to describe images in digital library settings. most of the images included in the database are accompanied by descriptive, structural, and administrative metadata. for example, an item entitled “boy sitting on a stoop holding a pole” (see figure 1) from the university’s collection of 1957–90 andrews st. george papers provides a digital copy of the image, the image number, name of the creator, date of creation, type of original material, dimensions, copyright information, manuscript group name and number, box and folder numbers, and a credit line.29 the image is further described by the following: “man in the shed is making homemade bombs. the boy and man are also in image 45350.”30

figure 1. “boy sitting on a stoop holding a pole” from yale university’s digital images database collection of 1957–90 andrews st. george papers, november 2012.

certainly, such information is useful in library environments and provides users with helpful and formatted data to best guide the information discovery process. the finding aid for the andrews st. george collection is additionally helpful in that it includes information about provenance, access, processing, associated materials, and the creator; it also contains descriptive information about the collection by box and folder number.31 however, if additional use data and sociohistorical information specific to this individual item were available, it would be most helpful in assisting users in determining the image’s greater context. a study of modes of participation on social networking sites suggests that it is now possible to supply such contextual information for digital objects that live in interactive online environments. a useful site for exploring user-generated tags associated with images is instagram, a social application designed for iphone and android.32 instagram users are able to upload and edit photos, and other users can then view, like, and comment on the shared photos. instagram users are able to follow other users and search for photos by the creator’s username or by accompanying tags.
instagram, owned by facebook, is interoperable with other social networking sites, and users have the ability to share their photos on facebook, flickr, tumblr, and twitter. as of july 2012, it was reported that instagram had 80 million users, and in september 2012, the new york times reported that 5 billion photos were shared through the application.33 users are limited to 30 tags per photo, and instagram suggests that users be as specific as possible when describing an image with a tag so that communities of users with similar interests can form.34 many tags, like the information included in traditional metadata schemas, aim to best describe an image by explaining its content; for example, one user assigned the tags #kids, #nieces, #nephews, and #family to a photograph of a group of smiling children (see figure 2). like the information accompanying the photograph in the yale university digital images database, such tags provide users and viewers with tools to better determine the “aboutness” of the image at hand. information technology and libraries | september 2013 11 figure 2. photo shared on instagram assigning both descriptive tags and the searchable signature #proudaunt, november 2012. however, instagram users are repurposing the tagging function in a way that is unique to social networking sites. in addition to the descriptive tags assigned to the image of the children described above, the user also tagged the photo with the term #proudaunt (see figure 2). there is, however, no aunt (what can be assumed to be an adult female) in the photograph. this tag, then, functions to further identify the user who created or shared the photograph and does not describe the content of the image at hand. a search of the same tag, #proudaunt, demonstrates that this user is not alone in identifying as such: in november 2012, this search returned 40,202 images with the same tag and more than 58,000 images with tags derived from the same phrase (#proudaunty, #proudauntie, #proudaunties, #proudauntiemoment, and #proudaunti) (see figure 3). figure 3. list of results from #proudaunt hashtag search on instagram, november 2012. this type of user-generated tag—one that identifies the creator or sharer of the photograph yet is not necessarily meant to describe the content of the image—can be understood as a searchable signature. such identity-based tags are not found within yale university’s digital images database; the closest relative of the searchable signature is the creator’s name. while searchable, this name is not alternative, or secondary, and it was not created and does not exist in a social environment. searchable signatures: context and the struggle for recognition | schlesselman-tarango 12 currently, born-digital objects are often created and shared in a technological milieu that allows for the assignment of user-generated tags. consequently, the integration of the searchable signature into the presentation of digital objects has become part of accepted social practice and offers unique opportunities for digital library curators and users alike. until quite recently, most materials—be they photographs, manuscripts, or government documents—were not born in digital environments. however, digitization projects have been undertaken to ensure that such historical materials are more widely and eternally available. these reborn digital objects, then, have been and can be integrated into dynamic social environments. 
steve: the museum social tagging project, mentioned earlier in this paper, is one example of an organization that has capitalized on the social practice of user-generated tagging and is using descriptive tags along with traditional metadata to better describe reborn digital objects. it is important, then, to explore what (if any) implications the application of the searchable signature, a unique type of user-generated tag, has for historical objects that are later integrated into digital environments. searchable signatures associated with born digital images on social networking sites contain valuable information about their creators, users, and the images’ context. one cannot ignore that users will, if given the chance, also likely apply signatures to reborn digital objects in similar ways that they do to objects that have always existed in social environments. since the searchable signature is used to identify not only digital image creators, but also sharers, and if these signatures do in fact provide important insight into the sharers and their motivations, then these signatures are not to be ignored. rather than focusing on the creating, the lens through which to understand the searchable signature for reborn digital objects can be shifted to the social act of sharing: by whom, when, in which social environments, and for what purposes. a deeper analysis of the presentation of self through the searchable signature and the role that the signature plays in providing valuable contextual information for both bornand reborn-digital objects is developed below. searchable signatures and the struggle for recognition if web 2.0 indeed functions as a recognition market, then social media and social networking sites might appear to be tables at such a market. placing oneself behind a table—be it facebook, twitter, or instagram—the user is able to perform his or her online identity to passersby and effectively struggle to be recognized as a unique individual or as a member of a social group. these performances, which could be deemed narcissistic in nature, can alternatively be read as healthy attempts to self-actualize and connect to larger society.35 one such “table” in the recognition market is instagram. beyond instagram’s social nature that allows participants to interact with and follow one another, the specific role of the searchable signature is of interest to those who are concerned with struggles for recognition. rather than describing shared images, searchable signatures reflect performative yet authentic user identities. information technology and libraries | september 2013 13 mccune, in a case study of consumer production on instagram, acknowledges the potential of the tag to not only facilitate image exchange but to communicate users’ positions as members of social groups.36 through a simple search of tags, users who identify as, for example, “cat ladies,” are able to validate their identities when they see that there are many others who use the same or similar language in demonstrations of the self (see figure 4). other signatures such as #proudaunt, while not necessarily playful, still function to provide viewers with additional information about the instagram user that cannot be determined through the photo itself. the ability to find images based on these searchable signatures allows users to find others who identify in a like manner and to imagine themselves as part of a larger social group. 
in effect, searchable signatures allow users to be recognized as social addressees of like-minded others. positioning oneself within a group must be understood as a struggle for recognition, for to imagine oneself as part of the social fabric is also to see oneself as valuable.

figure 4. list of results from #catlady hashtag search on instagram, november 2012.

enabled by web 2.0, searchable signatures contain potential for marginalized peoples or groups to assert online selves to be seen and ultimately heard in a truly intersubjective landscape. it is not too much of a leap to imagine that searchable signatures might make possible the organization of individuals and groups for political purposes. in fact, in a discussion of social groups, honneth notes that “the more successful social movements are at drawing the public sphere’s attention to the neglected significance of the traits and abilities they collectively represent, the better their chances of raising the social worth, or indeed, the standing of their members.”37 here, searchable signatures might provide such movements with a venue to capture the public’s attention and to effectively struggle for and gain recognition.

searchable signatures and context

as markers of individual and group identities, searchable signatures are unique in that they provide a snapshot of the multitude of social, historical, political, individual, and interpersonal relationships that ontologize the images with which they are paired. it is this very contextual information that is at times lacking in traditional indexing environments. by examining searchable signatures, experts and users are able to understand which individuals and groups create, use, and identify with certain images. thus, as markers of self, searchable signatures provide use data for scholars to better investigate which images are important to online individual or group identities. if the searchable signature is used in a political fashion, historians and sociologists might be able to study which types of images, for example, marginalized groups rally around, identify with, and use in their struggles for recognition. such use data also illuminates how and by whom certain digital images have been appropriated over time. for example, if a picture of a cat is first created or shared via instagram by an animal rights activist, the image might be accompanied by the searchable signature #humanforcats. this same image, shared by another user months later, might be accompanied by the #catlady signature. those interested will be able to examine how the same image has been historically used for different purposes and will be better able to grasp the evolving nature of its digital context. in addition to use data, the searchable signature provides insight into the sociohistorical context surrounding digital images. for those who perceive “reality . . . as accessible only (and always incompletely) in context, in specific historicized moments in time space,”38 the searchable signature clarifies and makes more accessible that reality surrounding the digital image.
in a traditional library setting, a photo of a cat might be indexed with descriptive subject headings such as “cat,” “persian cat,” or “kitten—behavior.” however, the searchable signature #catladyforlife provides additional information on how the cat has become, for a certain social group in a specific moment in time, a trope of sorts for those who are proud not only of their relationships with their domestic pets, but of their shared values and lifestyles as well. if a historian were to dig deeper, he or she also might see that “cat lady” has historically been used in a derogatory manner to mark single, unattractive women thought to be crazy and unable to care for the great number of cats they own, and that, by (re)claiming this title, women might be engaging in a struggle for recognition that extends beyond mere admiration for felines.39 chowdhury, in a continued discussion of challenges facing the digital world, asks whether it is “possible to capture the changing context along with the content of each information resource, because as we know the use and importance . . . changes significantly with time.”40 additionally, he asks, “will it be possible to re-interpret the stored digital content in the light of the changing context and user community, and thereby re-inventing the importance and use of the stored objects?”41 it is here that the searchable signature offers use data and sociohistorical information to illuminate the (changing) value digital images have for individuals, communities, and society.

conclusion

clark argues that representation must be understood within the confines of what he calls “social practice.”42 social practice, among other things, can be understood as “the overlap and interference of representations; it is their rearrangement in use.”43 representation of self also must be understood within current social practice, and an important facet of today’s practice is web 2.0. as a social space, web 2.0 allows for the creation of disembodied self-representations. one type of such representation, the searchable signature, is a phenomenon unique to social networking sites. while many acknowledge the potential of descriptive, user-generated tags to inform or even to be used in conjunction with metadata schemas or controlled vocabularies, instagram users have created an additional, alternative use for the tag. rather than simply using tags to describe shared images, they have successfully created a route to online identity formation and recognition. searchable signatures demonstrate the power of the online self, as they allow users to struggle to be recognized as unique individuals or as parts of larger social groups. these signatures, too, might act as platforms on which social groups can assert their value and thus demand recognition. additionally, searchable signatures provide contextual information that reflects the social practice in which digital images live. while the capture and integration of such information remains a challenge for those engaged in traditional indexing, web 2.0 allows for this unique type of user-generated tag and thus provides better understanding of the context surrounding digital images.
as to the question of whether searchable signatures can be integrated into existing metadata schemas or be used to inform controlled vocabularies in library environments, it is not unreasonable to suggest that digital objects be accompanied by their supplemental yet valuable representations (e.g., searchable signatures and the like). many methods exist through which these signatures might be both gathered and displayed. certainly, a full exploration of such practices is the stuff of future research; however, some initial ideas are detailed below. one method of gathering identity-based tags would involve the active hunting down of searchable signatures. locating objects on social networking sites that are also in one's digital collection, the indexer would identify and track associated user-generated searchable signatures. this method would require extreme diligence, high levels of comfort navigating and using web 2.0, a clear idea of which social networking sites yield the most valuable searchable signatures, and likely one or more full-time staff members devoted to such activities. even if feed systems were employed for individual digital objects, this method demands much of indexers and would likely not be sustainable over time. a more passive yet efficient way of gathering searchable signatures would simply be to build on methods that have been shown to be successful. by creating interactive digital environments that encourage users to assign not only descriptive but also identity-based tags, indexers are freed of the time-consuming task of hunting for searchable signatures on the web. since searchable signatures have come to be part of online social practice, assigning them would likely be familiar to users—initially, libraries might need to prompt users to share signatures or provide them with examples. this gathering tactic could be used to harvest signatures for items that are already part of the library's digital collection (telling us about signatures used by potential sharers) or as a means to incorporate new digital objects into the collection (telling us about signatures used by both creators and sharers). in both gathering scenarios, indexers might choose to display only the most frequently occurring or what they deem to be the most relevant searchable signatures, or they might choose to display all such tags; decisions such as these will ultimately depend on each institution's mission and resources. of course, if a library integrates a born-digital image into its collection and can identify the searchable signatures originally assigned to it via social networking sites or otherwise, this information should also be recorded. here, users will be able to get a glimpse of the image in its pre-library life. providing associated usernames, dates posted, and the names of the social networking sites will also assist in providing a more complete picture of the individuals or groups linked to the image. this information can provide valuable data about the information creators and sharers who use specific social platforms. the aim of this paper is to lay the theoretical groundwork to better understand the role of searchable signatures in today's digital environment as well as the signature's unique ability to provide context for digital images. surely, further research into the phenomenon of the searchable signature would demonstrate how it is currently used outside of instagram or as a political tool.
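as a concrete illustration of the two gathering scenarios just described, the sketch below records harvested signatures with their provenance (username, date posted, platform) and supports displaying either every tag or only the most frequently occurring ones. it is only a sketch under assumed conventions: the SearchableSignature structure, the record fields, and the example data are hypothetical, and any real harvesting would still depend on what each social networking site exposes.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass
class SearchableSignature:
    """one identity-based tag harvested for a digital object."""
    tag: str        # e.g. "#catlady"
    username: str   # account that assigned the tag
    posted: date    # when the tagged post appeared
    platform: str   # e.g. "Instagram"

def record_signatures(record: dict, harvested: list) -> None:
    """append harvested signatures to a collection record, keeping provenance
    (username, date, platform) so the image's pre-library life stays visible."""
    record.setdefault("searchable_signatures", []).extend(
        {"tag": s.tag, "username": s.username,
         "posted": s.posted.isoformat(), "platform": s.platform}
        for s in harvested
    )

def display_signatures(record: dict, top_n=None) -> list:
    """return all harvested tags, or only the most frequently occurring ones,
    depending on local display policy."""
    counts = Counter(s["tag"] for s in record.get("searchable_signatures", []))
    return [tag for tag, _ in counts.most_common(top_n)]

# example: a born-digital image entering the collection
image_record = {"identifier": "img-001", "subjects": ["Cats"]}
record_signatures(image_record, [
    SearchableSignature("#humanforcats", "activist_a", date(2012, 3, 1), "Instagram"),
    SearchableSignature("#catlady", "user_b", date(2012, 11, 5), "Instagram"),
    SearchableSignature("#catlady", "user_c", date(2012, 11, 9), "Instagram"),
])
print(display_signatures(image_record, top_n=1))  # ['#catlady']
```

whether an institution shows only the top-ranked tags or the full list is, as noted above, a local decision about mission and resources; the code merely makes the two options explicit.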
others might consider examining the username as another arena in which individuals or groups construct and perform online identities and thus engage in struggles for recognition. usernames also might provide contextual use data and sociohistorical information that inevitably support greater understanding of digital objects. finally, further research is needed to identify how libraries could utilize the searchable signature in promotional activities and to build and cater to user communities. references 1. axel honneth, the struggle for recognition: the moral grammar of social conflicts (cambridge: mit press, 1995). 2. lauane baroncelli and andre freitas, “the visibility of the self on the web: a struggle for recognition,” in proceedings of 3rd acm international conference on web science, 2011, accessed august 12, 2013, www.websci11.org/fileadmin/websci/posters/191_paper.pdf. information technology and libraries | september 2013 17 3. jose van dijck, “facebook as a tool for producing sociality and connectivity,” television & new media 13, no. 2 (2012): 160–76; baroncelli and freitas, “the visibility of the self.” 4. t. j. clark, introduction to the painting of modern life: paris in the art of manet and his followers (princeton, nj: princeton university press, 1984), 1–22. 5. ibid. 6. ibid., 9. 7. baroncelli and freitas, “the visibility of the self.” 8. ibid. 9. soraj hongladarom, “personal identity and the self in the online and offline world,” minds & machines 21 (2011): 533–48. 10. ibid., 541. 11. erika pearson, “all the world wide web’s a stage: the performance of identity in online social networks,” first monday 14 (2009), accessed november 9, 2012, www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm; erving goffman, the presentation of self in everyday life (garden city, ny: doubleday, 1959). 12. honneth, the struggle for recognition, 1. 13. jurgen habermas, the theory of communicative action (boston: beacon, 1984); honneth, the struggle for recognition, 92. 14. honneth, the struggle for recognition, 129. 15. ibid. 16. ibid., 130. 17. baroncelli and freitas, “the visibility of the self.” 18. ibid. 19. gobinda chowdhury, “from digital libraries to digital preservation research: the importance of users and context,” journal of documentation 66, no. 2 (2010): 207–23, doi: 10.1108/00220411011023625. 20. ibid., 217. 21. brenda dervin, “given a context by any other name: methodological tools for taming the unruly beast,” in information seeking in context, ed. pertti vakkari et al. (london: taylor graham, 1997), 13–38. searchable signatures: context and the struggle for recognition | schlesselman-tarango 18 22. ibid., 15. 23. ibid., 18. 24. christopher a. (cal) lee, “a framework for contextual information in digital collections,” journal of documentation 67 (2011): 95–143. 25. ibid., 100. 26. abebe rorissa, “a comparative study of flickr tags and index terms in a general image collection,” journal of the american society for information science and technology 61, no. 11 (2010): 2230–42. 27. ibid., 2239. 28. “steve central: social tagging for cultural collections,” steve: the museum social tagging project, accessed december 16, 2012, http://tagger.steve.museum. 29. “yale university library manuscripts & archives department,” yale university manuscripts & archives digital images database, last modified april 19, 2012, accessed december 3, 2012, http://images.library.yale.edu/madid. 30. ibid. 31. “andrew st. 
george papers (ms 1912)," manuscripts and archives, yale university library, accessed april 30, 2013, http://drs.library.yale.edu:8083/fedoragsearch/rest. 32. "faq," instagram, accessed november 10, 2012, http://instagram.com/about/faq. 33. emil protalinski, "instagram passes 80 million users," cnet, july 6, 2012, accessed november 13, 2012, http://news.cnet.com/8301-1023_3-57480931-93/instagram-passes-80-million-users; jenna wortham, "it's official: facebook closes its acquisition of instagram," new york times, september 6, 2012, accessed november 13, 2012, http://bits.blogs.nytimes.com/2012/09/06/its-official-facebook-closes-its-acquisition-of-instagram. 34. "tagging your photos using #hashtags," instagram, accessed november 10, 2012, http://help.instagram.com/customer/portal/articles/95731-tagging-your-photos-using-hashtags; "instagram tips: using hashtags," instagram, accessed november 10, 2012, http://blog.instagram.com/post/17674993957/instagram-tips-using-hashtags. 35. andrew l. mendelson and zizi papacharissi, "look at us: collective narcissism in college student facebook photo galleries," in a networked self: identity, community and culture on social network sites, ed. zizi papacharissi (new york: routledge, 2010), 251–73. 36. zachary mccune, "consumer production in social media networks: a case study of the 'instagram' iphone app" (master's dissertation, university of cambridge, 2011), accessed december 20, 2012, http://thames2thayer.com/portfolio/a-study-of-instagram. 37. honneth, the struggle for recognition, 127. 38. dervin, "given a context by any other name," 17. 39. kiri blakeley, "crazy cat ladies," forbes, october 15, 2009, accessed december 4, 2012, www.forbes.com/2009/10/14/crazy-cat-lady-pets-stereotype-forbes-woman-time-felines.html; crazy cat ladies society & gentlemen's auxiliary homepage, accessed december 4, 2012, www.crazycatladies.org. 40. chowdhury, "from digital libraries to digital preservation," 219. 41. ibid. 42. clark, introduction to the painting of modern life, 6. 43. ibid. acknowledgments many thanks to erin meyer and dr. krystyna matusiak at the university of denver for their feedback and guidance. academic web site design and academic templates | peterson academic web site design continues to evolve as colleges and universities are under increasing pressure to create a web site that is both hip and professional looking. many colleges and universities are using templates to unify the look and feel of their web sites. where does the library web site fit into a comprehensive campus design scheme? the library web site is unique due to the wide range of services and content available.
based on a poster session presented at the twelfth annual association of college and research libraries conference in minneapolis, minnesota, april 2005, this paper explores the prevalence of university-wide academic templates on library web sites and discusses factors libraries should consider in the future. c ollege and universities have a long history with the web. in the early 1990s, university web sites began as piecemeal projects with varying degrees of complexity—many started as informational sites for various technologically advanced departments on campus. over the last decade, these web sites have become a vital part of postsecondary institutions and one of their most visible faces. academic web sites communicate the brand and mission of an institution. they are used by prospective students to learn about an institution and then used later to apply. current students use them to pay tuition bills, register for classes, access course materials, participate in class discussions, take tests, get grades, and more. online learning and course-management software programs, such as blackboard, continue to increase the use of web sites. they are now an important learning tool for the entire campus community and the primary communication tool for current students, parents, alumni, the community, donors, and funding organizations. web site standards have developed since the 1990s. usability and accessibility are now important tenets for web site designers, especially for educational institutions. as a result, campus web designers or outside consultants are often responsible for designing large parts of the academic web site. as web sites have grown, ongoing maintenance is an important workload issue. databases and other technologies are used to simplify daily updates and changes to web sites. this is where the academic template fits in. an academic template can be defined as a common or shared template used to control the formatting of web pages in different departments on a campus. generally, administrators will mandate the use of a specific template or group of templates. this mandate includes guidelines for such things as layout, design, color, font, graphics, and navigation links to be used on all web pages. often, the templates are administered using content management systems (cmss) or web development software such as macromedia’s contribute. these programs give different levels of editing rights to individuals, thus keeping tight control over particular web pages or even parts of web pages. academic templates give the web site administrator the ability to change the template and update all pages with a single keystroke. for example, the web site administrator may give editing rights to content editors, such as librarians, to edit only the center section of the web page. the remaining parts of the page such as the top, sides, and bottom are locked and cannot be edited. the result of using templates is that the university web site is very unified and consistent. this is particularly important in creating a brand for the university. well-branded institutions have the opportunity to increase revenue, improve administration and faculty staffing, improve retention, and increase alumni relationships.1 but what about the library? libraries are one of the most visited web pages on a university’s web site.2 thus, the design of the library page can be crucial to a well-designed academic web site. 
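a minimal sketch of the locked-template idea described above follows: the campus-wide template supplies the banner, navigation, and footer, local editors supply only the center content, and a single change to the template re-skins every page. the template text, page identifiers, and render function are invented for illustration and do not correspond to any particular cms.

```python
# one campus-wide template; only the {content} slot is editable by local staff
ACADEMIC_TEMPLATE = """\
[university banner + seal]
[top navigation: admissions | academics | libraries | athletics]
{content}
[footer: university address | privacy statement | legal disclaimers]"""

# page editors may supply only the center content; header, nav, and footer are locked
library_pages = {
    "library-home": "welcome to the library. search the catalog, find databases, ask a librarian.",
    "library-hours": "main library: 8am to midnight; branch libraries: 9am to 9pm.",
}

def render(page_id: str) -> str:
    """assemble a page by wrapping the editable center section in the locked chrome."""
    return ACADEMIC_TEMPLATE.format(content=library_pages[page_id])

print(render("library-home"))
# editing ACADEMIC_TEMPLATE once re-skins every page the next time it is rendered,
# which is the "update all pages with a single keystroke" effect described above
```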
the library web site can set a tone for an institution and help prospective students get a feel for the campus. belanger, mount, and wilson contend it is important for the image of an institution to match the reality.3 if there is discord between the two, students may choose an inappropriate college and quickly drop out, lowering a campus’s retention data. the library web site can also be important in the recruitment of new faculty members. in addition, libraries use their web sites for marketing, public relations, and fund-raising for the library.4 library web sites are crucial to delivering data, research tools, and instruction to students, faculty, staff, and community patrons. more than 90 percent of students access the library from their home computers, and 78 percent prefer this form of access.5 today, the web site connects users with article citations and databases, library catalogs, full-text journals, magazines, newspapers, books, videos, dvds, e-books, encyclopedias, streaming music and video, and more. users access subject-specific research guides, library tutorials, information-literacy instruction, and critical evaluation tools. services such as interlibrary loan (ill), reference management programs such as endnote or refworks, and print and electronic reserves are also used via the web. users get help with doing research by e-mail and virtual chat. in addition, libraries are digital repositories for a growing number of digital historic documents and archives. academic web site design and academic templates: where does the library fit in? kate peterson kate peterson (katepeterson@gmail.com) is an information literacy librarian at capella university, minneapolis, minnesota. 218 information technology and libraries | december 2006 how common are academic templates in library web sites? what effect do they have on the content and services provided by libraries? ■ methods for the purposes of this study, a list of doctoral, master’s, and bachelor of arts (ba) institutions (private and public) based on the carnegie classification of institutions of higher education was created and a random number table was used to select a sample of web pages (n=216).6 home pages, admissions pages, departmental pages, and library web pages were analyzed. a similarly sized sample of each type was selected to give a broad overview of trends—18 percent of doctoral institutions (n=47), 19 percent of master’s institutions (n=115), and 23 percent of ba institutions (n=54). the following questions were asked: ■ does the college or university web site use an academic template? ■ if yes, is the library using the template, and for how much of the library web site? ■ to what extent is the template being used? primarily, a web site was determined to be using an academic template based on the look of the site. for example, if the majority of the web elements (top banner, navigation) all matched, then the web site was counted as using some sort of template. use and nonuse of content management system (cms) software behind the web site was not considered in this study—only the look of the web site. ■ results a majority of college and university web sites (94 percent) use an academic template. fifty percent of the libraries surveyed use the academic template for at least the library’s home page. of that number, about 34 percent of libraries use the template on a majority of the library pages. 
roughly 44 percent of the total libraries surveyed did not use the academic template, and approximately 5 percent of academic web sites do not use any sort of unified academic template. smaller ba institutions are more likely to use the academic template on multiple library pages than doctoral institutions, which tend to have their own library design or template (see table 1). for those libraries that did not use the academic template on every library page, the most commonly used elements template were the top header (which often has the university seal or an image of the university), the top navigation bar (with university-wide links), and the bottom footer, which often contains the university address, privacy statement, or legal disclaimers. less frequently used elements were the bottom navigation bar, and the left or right navigation bar with university-wide links (see tables 2–3). ■ discussion while many colleges and universities use academic templates, only about half of their libraries follow suit. libraries using the template often use selected parts of the template, or only use the template on their home page. though not considered in this study, there may be a correlation between institution size and template use, as larger institutions are more likely to have library web designers and thus use the academic template only on the library’s home page. while academic templates can cause libraries many problems, there are also many benefits to be considered. ■ problems with academictemplates on library web sites the primary concern with any template is how much space is available for content. for example, there may be a very small box for the page content while images, banner bars, and large navigation links may take up most of the real estate on the page. this problem can be exacerbated for libraries because there are so many different types of content such as the library catalog, databases, tutorials, forms, ill, and other library services delivered via the web. libraries can be caught between the design imposed by the academic template and the rigid size requirements from outside vendors such as database companies, ill or reserve modules, federated search products, or others. academic templates are usually mandated by administrators without a full understanding of the specific content and uses of the library web site. many problems can occur when trying to fit an existing library web site into a poorly designed academic template. it can be very difficult to modify the template effectively for the library’s purposes. an example of one specific problem is confusing links on the template, where a link on every page to the “university catalog” links to the course catalog and not the library catalog, which is very confusing for users. another example is a search box as part of the academic template—what are users searching? the university web site? the library web site? the library catalog? the world wide web? another drawback to using academic templates for library web sites can be the time involved in training librarians, staff, and library web site administrators. the existing academic web site design and academic templates | peterson 219 content must be fit into the new template—a huge project, given that many library web sites contain one thousand pages or more. generally, a decision to use a template is accompanied by a decision to use a cms or new web-page editor. this takes yet more time to train individuals on the new software in addition to the new template. 
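as a quick arithmetic check on the survey figures reported above, the sketch below derives the table 1 percentages from the raw counts in table 2 (both tabulated just below); the dictionary keys are shorthand labels rather than the paper's exact column headings.

```python
# raw counts of library template use by institution type (from table 2)
counts = {
    "bachelor of arts": {"no template": 2, "library not using": 20, "top page only": 7,  "majority of pages": 25, "total": 54},
    "master's":         {"no template": 7, "library not using": 55, "top page only": 14, "majority of pages": 39, "total": 115},
    "doctoral":         {"no template": 3, "library not using": 21, "top page only": 13, "majority of pages": 10, "total": 47},
}

for inst, row in counts.items():
    total = row["total"]
    # rounded percentages reproduce the table 1 values (e.g., 25/54 is about 46%)
    pcts = {k: round(100 * v / total) for k, v in row.items() if k != "total"}
    print(inst, pcts)
# bachelor of arts -> 4 / 37 / 13 / 46
# master's         -> 6 / 48 / 12 / 34
# doctoral         -> 6 / 45 / 28 / 21
```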
table 1. percentages of occurrences of academic templates
(columns: no academic template % / library not using template % / library using template—transition or top page % / library using template—majority of pages %)
bachelor of arts: 4 / 37 / 13 / 46
master's: 6 / 48 / 12 / 34
doctoral: 6 / 45 / 28 / 21
table 2. occurrence of templates in academic and library web sites
(columns: no academic template / library not using template / library using template—transition or top page / library using template—majority of pages / total sites analyzed)
bachelor of arts: 2 / 20 / 7 / 25 / 54
master's: 7 / 55 / 14 / 39 / 115
doctoral: 3 / 21 / 13 / 10 / 47
total: 12 / 96 / 34 / 74 / 216
table 3. percentages of occurrence for institutions using the academic-wide template for first page of library web site or libraries using modified academic template
(columns: ba % / master's % / doctoral % / all colleges and universities %)
top header (no navigation): 100 / 94 / 94 / 91
top navigation: 75 / 82 / 82 / 76
bottom header (no navigation): 83 / 65 / 76 / 72
bottom navigation: 25 / 18 / 18 / 20
left navigation: 42 / 18 / 18 / 24
right navigation: 8 / 0 / 0 / 2
■ benefits of using academic templates one of the benefits for libraries using an academic template is the ability to exploit the expertise of the web site designers who created the template. the academic template often incorporates images, logos, and branding that the library may not be able to design otherwise. many libraries do not have professional web designers on staff; even if they do, there often is no one person who designs and maintains the entire library web site. instead, different parts of a library web site are designed and maintained by different individuals with varying degrees of web site ability. as a result, many library web sites are a mix of styles, which can be disorienting for students who are familiar with the university's "look." web site uniformity has a positive effect on usability since familiarity with one part of the web site helps students, faculty, and staff navigate other parts of the web site. even web site basics such as knowing the color and style of the links and how to navigate to different pages can be helpful.8 another benefit is that academic templates are generally ada compliant as required under section 508 of the rehabilitation act of 1973.9 as usability and usability testing become more prevalent, academic template designers may also test the template and navigation for usability. such testing will improve the template and thus the library web site as well. ■ trends in academic and library web sites colleges and universities are responding to a new generation of students, the majority of whom have grown up with computers. in trying to meet their needs and desires, many academic web sites have high-quality photographs, quotes, and testimonials from the universities' students on their home pages. more and more materials are being placed online to allow both prospective and current students to do what they need to do twenty-four hours a day, from registering for classes to handing in research papers. many web sites have interactive elements such as instant polls or quizzlets or use instant messaging to connect with tech-savvy students. for example, prospective students can chat with admissions staff members or current students about what it is like to attend a particular university. a large number of sites also highlight weblogs written by current students or those studying abroad.
these features allow students to use the technology they are comfortable with to maximize their academic experience. numerous library web sites are changing as well, featuring a library catalog, article database, or federated search box on the home page to allow users to search instantly. additionally, library sites are beginning to include images of students using the library, external or internal shots of the building, flash graphics, icons, and sound. many incorporate screen captures to help users navigate specific databases or forms. in addition, an increasing number of libraries use weblogs to give more of a dynamic quality with daily library news and announcements. ■ strategies for using academic templates based on comments received in april 2005 during the poster session, and in recent electronic discussion list postings, many academic libraries are dealing with these issues. libraries should work on creating a mission statement and objectives for their web sites that expand upon the library’s mission, the institutional web site’s mission, and the institution’s overall mission and brand. librarians must be knowledgeable about web site usability and trends in web site design in order to communicate effectively to designers and administrators. librarians should also become members of campus web committees and be a voice for library users during the design process. teaching administrators and campus web designers about the library and the library web site’s prominence are important tools to successfully deal with any proposed university-wide academic templates. for example, a librarian could mock-up a few pages, conduct informal usability testing, and invite administrators to learn firsthand about potential problems library users could experience with a template. librarians could also propose a modified template that uses a few key elements from the academic template. this would maintain the brand but retain enough space for important library content. connecting with other librarians and learning from each other’s successes and failures will also help bring insight into this academic template issue. ■ conclusion the use of academic templates is only going to increase as institutional web sites grow in complexity and importance. libraries are an important part of institutions both physically—on campus—and virtually—as part of the campus web site. academic templates are part of a unified design scheme for colleges and universities. librarians must work with both library and university administrators to create a well-designed but usable library web site. they must advocate for library users and continue to help students and faculty access the rich resources and services available from the library. library administrators need to allocate resources and staff time to improve their web sites and to work in concert with academic web site designers to merge the best of the academic template to the best of the library site while not sacrificing users’ needs. the result will be highly used, highly usable library web sites that attract students and keep them coming back to access the fantastic world of information available in today’s academic libraries. ■ references 1. robert sevier, “university branding: 4 keys to success,” university business 5, no. 1 (2002): 27–28. 2. mignon adams and richard m. dougherty, “how useful is your homepage? a quick and practical approach to evaluating a library’s web site,” college & research libraries news 63, no. 8 (2002): 590–92. 
3. charles belanger, joan mount, and mathew wilson, "institutional image and retention," tertiary education and management 8, no. 3 (2002): 217. 4. jeanie m. welch, "the electronic welcome mat: the academic library web site as a marketing and public-relations tool," the journal of academic librarianship 31, no. 3 (2005): 225–28. 5. oclc, "how academic librarians influence students' web-based information choices," in oclc online computer library center database online, (2002), 5, http://www5.oclc.org/downloads/community/informationhabits.pdf (accessed march 10, 2005). 6. carnegie foundation, carnegie classification of institutions of higher education, 2000 edition, http://www.carnegiefoundation.org/classification/ (accessed jan. 8, 2005). 7. beth evans, "the authors of academic library home pages: their identity, training, and dissemination of web construction skills," internet research 9, no. 4 (1999): 309–19. 8. oclc, 6. 9. u.s. department of justice, section 508 home page, in united states department of justice database online, (2004), 1, http://www.usdoj.gov/crt/508/ (accessed july 3, 2005). emily g. morton-owens editorial and technological workflow tools to promote website quality everard and galletta performed an experimental study with 232 university students to discover whether website flaws affected perception of site quality and trust. their three types of flaws were incompleteness, language errors (such as spelling mistakes), and poor style in terms of "ambiance and aesthetics," including readable formatting of text.
they discovered that subjects’ perception of flaws influenced their judgment of a site being highquality and trustworthy. further, they found that the first perceived error had a greater negative impact than additional problems did, and they described website users as “quite critical, negative, and unforgiving.”5 briggs et al. did two studies of users’ likelihood of accepting advice presented on a website. of the three factors they considered—credibility, personalization, and predictability—credibility was the most influential in predicting whether users would accept or reject the advice. “it is clear,” they report, “that the look and feel of a web site is paramount in first attracting the attention of a user and signaling the trustworthiness of the site. the site should be . . . free of errors and clutter.”6 though none of these studies focuses on libraries or academic websites and though they use various metrics of trustworthiness, together they point to the importance of quality. text quality and functional usability should be important to library website managers. libraries ask users to entrust them to choose resources, answer questions, and provide research advice, so projecting competence and trustworthiness is essential. it is a challenge to balance the concern for quality with the desire to update the website frequently and with librarians’ workloads. this paper describes a solution implemented in drupal that promotes participation while maintaining quality. the editorial system described draws on the author’s prior experience working in book publishing at penguin and random house, showing how a system that ensures quality in print publishing can be adjusted to fit the needs of websites. ■■ setting editing most people think of editing in terms of improving the correctness of a document: fixing spelling or punctuation errors, fact-checking, and so forth. these factors are probably the most salient ones in the sense that they are editor’s note: this paper is adapted from a presentation given at the 2010 lita forum library websites are an increasingly visible representation of the library as an institution, which makes website quality an important way to communicate competence and trustworthiness to users. a website editorial workflow is one way to enforce a process and ensure quality. in a workflow, users receive roles, like author or editor, and content travels through various stages in which grammar, spelling, tone, and format are checked. one library used a workflow system to involve librarians in the creation of content. this system, implemented in drupal, an opensource content management system, solved problems of coordination, quality, and comprehensiveness that existed on the library’s earlier, static website. t oday, libraries can treat their websites as a significant point of user contact and as a way of compensating for decreases in traditional measures of library use, like gate counts and circulation.1 websites offer more than just a gateway to journals; librarians also can consider instructional or explanatory webpages as a type of public service interaction.2 as users flock to the web to access electronic resources and services, a library’s website becomes an increasingly prominent representation of the library. at the new york university health sciences libraries (nyuhsl), for example, statistics for the 2009–10 academic year showed 580,980 in-person visits for all five locations combined. by comparison, the website received 986,922 visits. 
in other words, the libraries received 70 percent more website visits than in-person visits. many libraries conduct usability testing to determine whether their websites meet the functional needs of their users. a concern related to usability is quality: users form an impression of the library partly based on how it presents itself via the website. as several studies outside the library arena have shown, users’ experience of a website leads them to attribute characteristics of competence and trustworthiness to the sponsoring organization. tseng and fogg, discussing non-web computer systems, present “surface credibility” as one of the types of credibility affecting users. they suggest that “small computer errors have disproportionately large effects on perceptions of credibility.”3 in another paper by fogg et al., “amateurism” is one of seven factors in a study of website credibility. the authors recommend that “organizations that care about credibility should be ever vigilant—and perhaps obsessive—to avoid small glitches in their websites. . . . even one typographical error or a single broken link is damaging.”4 emily g. morton-owens (emily.morton-owens@med.nyu.edu) is web services librarian, new york university health sciences libraries, new york. 92 information technology and libraries | september 2011 happens when a page moves from one state to another. the very simple workflow in figure 1 shows two roles (author and editor) and three states (draft, approval, and published). there are two transitions with permissions attached to them. only the author can decide when he or she is done working and make the transition from draft to approval. only the editor can decide when the page is ready and make the transition from approval to published. (in these figures, dotted borders indicate states in which the content is not visible to the public.) a book publishing workflow involves perhaps a dozen steps in which the manuscript passes between the author, his or her agent, and various editorial staff. a year can pass between receiving the manuscript and publishing the book. the reason for that careful, conservative process is that it is very difficult to fix a book once thousands of copies have been printed in hardcover. by contrast, consider a newspaper: a new version appears every day and contains corrections from previous editions. a newspaper workflow is hardly going to take a full year. a website is even more flexible than a newspaper because it can be fixed or improved at any time. the kind of multistep process used for books and newspapers is effective, but not practical for websites. a website should have a workflow for editorial quality control, but it should be proportional to the format in terms of the number of steps, the length of the process, and the number of people involved. alternate workflow models this paper focuses on a contributor/editor model in which multiple authors create material that is vetted by a central authority: the editor. other models could be implemented with much the same tools. for example, in a peer-review system as is used for academic journals, there is a reviewer role, and an article could have states like “published,” “under review,” “conditionally accepted,” and so forth. most noticeable when neglected. editors, however, have several other important roles. for example, they select what will be published. in book publishing, that involves rejecting the vast majority of material that is submitted. 
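the very basic workflow of figure 1 can be sketched as a small state machine in which each transition carries the role allowed to perform it. the code below is an illustrative sketch only, not drupal's actual workflow api; the state names follow the figure, and the move helper is invented for the example.

```python
# states a page can be in, and which role may perform each transition
STATES = {"draft", "approval", "published"}

TRANSITIONS = {
    ("draft", "approval"): "author",      # author decides the draft is done
    ("approval", "published"): "editor",  # editor decides the page is ready
    ("approval", "draft"): "editor",      # editor can send it back for revision
}

def move(page: dict, new_state: str, acting_role: str) -> dict:
    """advance a page through the workflow if the acting role is allowed to."""
    allowed_role = TRANSITIONS.get((page["state"], new_state))
    if allowed_role is None:
        raise ValueError(f"no transition from {page['state']} to {new_state}")
    if acting_role != allowed_role:
        raise PermissionError(f"only the {allowed_role} may make this transition")
    page["state"] = new_state
    return page

page = {"title": "new database trial", "state": "draft"}
move(page, "approval", acting_role="author")    # author submits the draft
move(page, "published", acting_role="editor")   # editor publishes it
# an author trying to publish directly would raise PermissionError
```

a peer-review or moderation model of the kind mentioned above would use the same machinery with different state names and roles, which is why the transition table, rather than the code around it, is the part worth designing carefully.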
in many professional contexts, however, it means soliciting contributions and encouraging authors. either way, the editor has a role in deciding what topics are relevant and what authors should be involved. additionally, editors are often involved in presenting their products to audiences. in book publishing, that can mean weighing in on jacket designs or soliciting blurbs from popular authors. on websites, it might mean choosing templates or fonts. editors want to make materials attractive and accessible to the right audience. together, correctness, choice, and presentation are the main concerns of an editor and together contribute to quality. each of these ideas can be considered in light of library websites. correctness means offering information that is current and free of errors, contradictions, and confusing omissions. it also means representing the organization well by having text that is well written and appropriate for the audience. writing for the web is a special skill; people reading from screens have a tendency to skim, so text should be edited to be concise and preferably organized into short chunks with “visible structure.”7 there is also good guidance available about using meaningful link words, action phrases, and “layering” to limit the amount of information presented at once.8 of course, correctness also means avoiding the kind of obvious spelling and grammar mistakes that users find so detrimental. choice probably will not involve rejecting submissions to the website. instead, in a library context it could mean identifying information that should appear on the website and writing or soliciting content to answer that need. presentation may or may not have a marketing aspect. a public library’s website may advertise events and emphasize community participation. as an academic medical library, nyuhsl has in some sense a captive audience, but it is still important to communicate to users that librarians understand their unique and highlevel information needs and are qualified to partner with them. workflow a workflow is a way to assign responsibility for achieving the goals of correctness, choice, and presentation. it breaks the process down into steps that ensure the appropriate people review the material. it also leaves a paper trail that allows participants to see the history and status of material. workflow can alleviate the coordination problems that prevent a website from exhibiting the quality it should. a workflow is composed of states, roles, and transitions. pages have states (like “draft” or “published”) and users have roles (like “contributor” or “editor”). a transition figure 1. very basic workflow editorial and technological workflow tools to promote website quality | morton-owens 93 effect was on the quality of the website, which contained mistakes and confusing information. ■■ methods nyuhsl workflow and solutions to resolve its web management issues, nyuhsl chose to work with the drupal content management system (cms). the ability to set up workflow and inventory content by date, subject, or author was a leading reason for that decision. other reasons included usability of the backend for librarians, theming options, the scripting language the cms uses (php), and drupal’s popularity with other libraries and other nyu departments.9 nyuhsl’s drupal environment has four main user roles: 1. anonymous: these are visitors to the nyuhsl site who are not logged in (i.e., library users). they have no permissions to edit or manage content. 
they have no editorial responsibilities. 2. library staff: this group includes all the staff content authors. their role is to notice what content library users need and to contribute it. staff have been encouraged to view website contributions as something casual—more akin to writing an e-mail than writing a journal article. 3. marketing team: this five-member group checks content that will appear on the homepage. their mandate is to make sure that the content is accurate about library services and resources and represents the library well. its members include both librarians and staff with relevant experience. 4. administrators: there are three site admins; they have the most permissions because they also build the site and make changes to how it works. two of the three admins have copyediting experience from prior jobs, so they are responsible for content approvals. they copyedit for spelling, grammar, and readability. admins also check for malformed html created by the wysiwyg (what you see is what you get) interface provided for authors, and they use their knowledge of other material on the site to look out for potential conflicts or add relevant links. returning to the themes of correctness, choice, and presentation, it could be said that librarian authors are responsible for choice (deciding what to post), the marketing team is responsible for choice and presentation, and the administrators are responsible for all three. an important thing to understand is that each person in a role has the same permissions, and any one of in an upvoting system like reddit (http://reddit .com), content is published by default, any user has the ability to upvote (i.e., approve) a piece of content, and the criterion for being featured on the front page is the number of approvals. in a moderation system, any user can submit content and the default behavior is for the moderator to approve anything that is not outright offensive. the moderator never edits, just chooses the state “approved” or the state “denied.” moderation is often used to manage comments. another model, not considered here, is to create separate “staging” and “production” websites. content and features are piloted on the staging site before being pushed to the live site. (nyuhsl’s workflow occurs all on the live site.) still, even in a staging/production system the workflow is implicit in choosing someone who has the permission and responsibility to push the staging site to the production site. problems at nyuhsl in 2007, the web services librarian position at nyuhsl had been open for nearly a year. librarians who needed to post material to the website approached the head of library systems or the “sysadmin.” both of them could post pages, but they did not proofread. pages that became live on the website stayed: they were never systematically checked. if a librarian or user noticed a problem with a page, it was not clear who had the correct information or was responsible for fixing it. often, pages that were found to be out-of-date would be delinked from other pages but were left on the server and thus findable via search engines or bookmarks. because only a few people had ftp access to the server, but authored little content, the usernames shown on the server were useless for determining who was responsible for a page. similarly, timestamps on the server were misleading; someone might fix one link on a page without reviewing the rest of it, so the page could have a recent timestamp but be full of outdated information. 
even after a new web services librarian started in 2007, problems remained. the new librarian took over sole responsibility for posting content, which made the responsibility clearer but created a bottleneck, for example, if she went on vacation. furthermore, in a library with five locations and about sixty full-time employees, it was hard for one person to do justice to all the libraries’ activities. if a page required editing, there was no way to keep track of whose turn it was to work on the document. there also was no automatic notification when a page was published. this made it possible for content to go astray and be forgotten. these problems added up to frustration for would-be content authors, a time drain for systems staff, and less time to create new content and sites. the most significant 94 information technology and libraries | september 2011 at the top of the homepage. their appearance should not be delayed, so any staff author can publish one. class sessions are specific dates, times, and locations that a class is being offered. these posts are assembled from prewritten text, so there is no way to introduce errors and no reason to route them through an approval step. figure 2 illustrates the main steps of the three cases. the names of the states are shown with arrows indicating which role can make each transition. unlabeled arrows mean that any staff member can perform that step. figure 3 shows how, at each approval step, content can be sent back to the author (with comments) for revision. although this happens rarely, it is important to have a way to communicate with the author in a way that is traceable by the workflow. figure 4 illustrates the concept of retirement. nyuhsl needed a way to hide content from library users and search engines, but it is dangerous to allow library staff to delete content. also, old content is sometimes useful to refer to or can even be republished if the need arises. any library staff user can retire content if they recognize it as no longer relevant or appropriate. additionally, library staff can resurrect retired content by resetting it to the draft state. that is, they cannot directly publish retired content (because they do not have permission to publish), but they can put it back on the path to being published by saving it as a draft, editing, and resubmitting for approval. figure 5 shows that library staff do not really need to understand the details of workflow. for any new content, they only have two options: keep the content in the draft state or move it on to whatever next step is available. all them can perform an action. the five marketing team members do not vote on the content, nor do they all have to approve it; instead, any one of them, who happens to be at his workstation when they get a notification, is sufficient to perform the marketing team duty. also, the marketing team members and administrators do not “self-approve”—no matter how good an editor someone may be, he or she is rarely good at editing her own work. nyuhsl’s workflow considers three cases: 1. most types of content are reviewed by one of the administrators before going live. 2. content types that appear on the homepage (i.e., at higher visibility) are reviewed by a member of the marketing team before being reviewed by an administrator. 3. two types of content do not go through any workflow. alerts are urgent messages that appear in red figure 2. approval steps figure 3. returning contents for edits figure 4. 
retirement editorial and technological workflow tools to promote website quality | morton-owens 95 this may sound like a large volume of e-mail, but it does not appear to bother library staff. the subject line of every e-mail generated by the system is prefaced with “[hsl site]” for easy filtering. also, every e-mail is signed with “love, the nyuhsl website.” this started as a joke during testing but was retained because staff liked it so much. one described it as giving the site a “warm, fuzzy feeling.” drupal modules nyuhsl developers used a number of different drupal modules to achieve the desired workflow functionality. a simple system could be achieved using fewer modules; the book using drupal offers a good walkthrough of workflow, actions, and trigger.10 of course, it also would be possible to implement these ideas in another cms or in a homegrown system. this list does not describe how to configure each module because the features are constantly evolving; more information is available on the drupal website.11 the drupal modules used include: ■■ workflow ■■ actions ■■ trigger ■■ token ■■ module grants ■■ wysiwyg, imce, imce wysiwyg api bridge ■■ node expire ■■ taxonomy role ■■ ldap integration ■■ rules ■■ results participation figure 6 shows the number of page revisions per person from july 14, 2009, to november 4, 2010. since many pages are static and were created only once, but need to be updated regularly, a page creation and a page update count equally in this accounting, which was drawn from the node_revisions table in drupal. it gives a general sense of content-related activity. a reasonable number of staff have logged in, including all of the librarians and a number of staff in key positions (such as branch managers). the black bars represent the administrators of the website. it is clear that the workflow system, while broadening participation, has hardly diffused primary responsibility of managing the website. the web services librarian and web manager have by far the most page edits, as they both write new content and edit content written by all other users. of the other options are hidden because staff do not have permission to perform them. the status of content in the workflow can be checked by clicking on the workflow tab of each page, but it also is tracked by notification e-mails. when the content enters a state requiring an approval, each person in that approving role gets an e-mail letting them know something needs their attention. the e-mail includes a link directly to the editing page. for example, if a librarian writes a blog post and changes its state from “draft” to “ready for marketing approval,” he or she gets a confirmation e-mail that the post is in the marketing approval queue. the marketing team members each get an e-mail asking them to approve the post; only one needs to do so. once someone has performed that approval, the marketing team members receive an e-mail letting them know that no further action is required. now the content is in the “ready for approval” state and the author gets another e-mail notification. the administrators get a notification with a link to edit the post. once an administrator gives the post final approval, the author gets an e-mail indicating that the post is now live. the nyuhsl website workflow system also includes reminders. each piece of content in the system has an author (authorship can be reassigned, so it is not necessarily the person who originally created the page). 
the author receives an e-mail every four months reminding him or her to check the content, revise it if necessary, and re-save it so that it gets a new timestamp. if the author does not do so, he or she will continue to get reminders until the task is complete. also, the site administrators can refer to a list of content that is out of date and can follow up in person if needed. note that reminders only apply to static content types like pages and faqs, not to blog posts or event announcements, which are not expected to have permanent relevance. figure 5. workflow choices for library staff users 96 information technology and libraries | september 2011 check the status by clicking on the workflow tab. this eliminates the discouraging mystery of having content get lost on the way to being published. ■■ identifying “problem” content: the node expire module has been modified to send e-mail reminders about stale content; as a result, this “problem” figure 7 shows the distribution of content updates once the web team members have been removed. it is clear that a small number of heroic contributors are responsible for the bulk of new content and updates, with other users logging on sporadically to address specific needs or problems. how editorial workflow addresses nyuhsl’s problems different aspects of the nyuhsl editorial workflow address different website problems that existed before the move to a cms. together, the workflow features create a clearly defined track that marches contributed content along a path to publication while always making the history and status of that content clear. ■■ keeping track of who wrote what when: this information is collected by the core drupal software and visible on administrative pages. (drupal also can be customized to display or sort this information in more convenient ways.) ■■ preventing mistakes and inconsistencies: this requires a human editor, but drupal can be used to formalize that role, assign it to specific people, and ensure nothing gets published without being reviewed by an editor. ■■ bottlenecks: nyuhsl eliminated bottlenecks that stranded content waiting for one person to post it by creating roles with multiple members, any one of whom can advance content to the next state. there is no step in the system that can be performed by only one person. ■■ knowledge: the issue of having too much going on in the library for one person to report on was addressed by making it easier for more people to contribute. drupal encourages this through its usability (especially a wysiwyg editor), and workflow makes it safe by controlling how the contributions are posted. ■■ “lost” content: when staff contribute content, they get e-mail notifications about its status and also can figure 6. number of revisions by user each user is indicated by their employee type rather than by name. figure 7. number of revisions by user, minus web team each user is indicated by their employee type rather than by name editorial and technological workflow tools to promote website quality | morton-owens 97 places web content in the context of other communication methods, like e-mail marketing, press releases, and social media.12 in her view, it is not enough to consider a website on its own; it has to be part of a complete strategy for communicating with an organization’s audience. libraries embarking on a website redesign would benefit from contemplating this larger array of strategic issues in addition to the nitty-gritty of creating a process to ensure quality. 
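the four-month reminder loop described earlier in this section can be sketched as a periodic check over static content. the function names, node fields, and example dates below are hypothetical and stand in for what the modified node expire module does in drupal, while the subject prefix and sign-off echo the conventions mentioned above.

```python
from datetime import datetime, timedelta

REVIEW_INTERVAL = timedelta(days=120)   # roughly four months
STATIC_TYPES = {"page", "faq"}          # reminders apply only to static content types

def stale_content(nodes: list, now: datetime) -> list:
    """return static pages whose last save is older than the review interval."""
    return [n for n in nodes
            if n["type"] in STATIC_TYPES
            and now - n["last_saved"] > REVIEW_INTERVAL]

def reminder_email(node: dict) -> dict:
    """draft the reminder that would go to the page's current author."""
    return {
        "to": node["author_email"],
        "subject": f"[hsl site] please review: {node['title']}",
        "body": ("this page has not been re-saved in over four months. "
                 "please check it, revise if necessary, and re-save it.\n\n"
                 "love, the nyuhsl website"),
    }

nodes = [
    {"type": "page", "title": "interlibrary loan", "author_email": "librarian@example.edu",
     "last_saved": datetime(2011, 1, 15)},
    {"type": "blog", "title": "new e-books", "author_email": "librarian@example.edu",
     "last_saved": datetime(2010, 6, 1)},   # blog posts are never flagged
]
for node in stale_content(nodes, now=datetime(2011, 7, 1)):
    print(reminder_email(node)["subject"])  # [hsl site] please review: interlibrary loan
```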
■■ conclusions nyuhsl differs from other libraries in its size, status as an academic medical library, level of it staffing, and other ways. some aspects of nyuhsl’s experience implementing editorial workflow will, however, likely be applicable to other libraries. it does not necessarily make sense to assign editorial responsibility to it staff; instead, there may be someone on staff who has editorial or journalistic experience and could serve as the content approver. many universities offer short copyediting courses, and a prospective website editor could attend such a course. implementing a workflow system, especially in drupal, requires a lot of detailed configuration. developers should make sure the workflow concept is clearly mapped out in terms of states, roles, and transitions before attempting to build anything. workflow can seem complicated to users too, so developers should endeavor to hide as much as possible from nonadministrators. small mistakes in drupal settings and permissions can cause confusing failures in the workflow system. for example, a user may find him or herself unable to advance a blog post from “draft” to “ready for approval,” or a state change from “ready for approval” to “live,” and may not actually cause the content to be published. it would save time in the long run to thoroughly test all the possibilities with volunteers who play each role before the site is in active use. finally, when the workflow is in place, the website’s managers may find themselves doing less writing and fewer content updates. they have a new role, though: to curate the site and support staff who use the new tools. the concept of editing is not yet consistently applied to websites unless the site represents an organization that already relies on editors (like a newspaper)—but it is gaining recognition as a best practice. if the website is the most readily available public face of an institution, it should receive editorial attention just as a brochure or fundraising letter would. workflow is one way that libraries can promote a higher level of quality and perceived competence and reliability through their website presence. content is usually addressed by library staff without the administrators/editors doing anything at all. the administrators also can access a page that lists all the content that has been marked as “expired” so they know with whom to follow up. ■■ outdated content: some content may be outdated and undesirable to show the public or be indexed by search engines, but be useful to librarians. it also is not safe to allow staff to delete content, as they may do so by accident. these issues are addressed by the notion of “retiring” content, which hides content by unpublishing it but does not delete it from the system. ■■ future work the workflow system sets up an environment that achieves nyuhsl’s goals, structurally speaking, but social (nontechnology) considerations prevent it from living up to its full potential. not all of the librarians contribute regularly. this is partly because they are busy, and writing web content is not one of their job requirements. another reason is that some staff are more comfortable using the system than others, a phenomenon that reinforces itself as the expert users spend more time creating content and become even more expert. a third cause is that not all librarians may perceive that they have something useful to say. reluctant contributors have no external motivation to increase their involvement. 
it would be helpful to formalize the role of librarians as content contributors. there is presently no librarian at nyuhsl whose job description includes writing content for the website; even the web services librarian is charged only with "coordinating, designing, and maintaining" sites. ideally, every librarian job description would include working with users and would mention writing website content as an important forum for that. that said, it is not clear what metric could be used to judge the contributions fairly. it also is important to continue to emphasize the value of content contributions so that librarians are motivated and feel recognized. even librarians whose specialties are not outreach-oriented (e.g., systems librarians) have expert knowledge that could be shared in, say, a short article on how to set up rss feeds.

workflow is part of a group of concerns being called "content strategy." this concept, which has grown in popularity since 2008, includes editorial quality alongside issues like branding/messaging, search engine optimization, and information architecture. a content strategist would be concerned with why content is meaningful in addition to how it is managed. in her brief, useful book on the topic, kristina halvorson places web content in the context of other communication methods, like e-mail marketing, press releases, and social media.12 in her view, it is not enough to consider a website on its own; it has to be part of a complete strategy for communicating with an organization's audience. libraries embarking on a website redesign would benefit from contemplating this larger array of strategic issues in addition to the nitty-gritty of creating a process to ensure quality.

■■ conclusions

nyuhsl differs from other libraries in its size, status as an academic medical library, level of it staffing, and other ways. some aspects of nyuhsl's experience implementing editorial workflow will, however, likely be applicable to other libraries. it does not necessarily make sense to assign editorial responsibility to it staff; instead, there may be someone on staff who has editorial or journalistic experience and could serve as the content approver. many universities offer short copyediting courses, and a prospective website editor could attend such a course. implementing a workflow system, especially in drupal, requires a lot of detailed configuration. developers should make sure the workflow concept is clearly mapped out in terms of states, roles, and transitions before attempting to build anything. workflow can seem complicated to users too, so developers should endeavor to hide as much of it as possible from nonadministrators. small mistakes in drupal settings and permissions can cause confusing failures in the workflow system. for example, a user may find him or herself unable to advance a blog post from "draft" to "ready for approval," or a state change from "ready for approval" to "live" may not actually cause the content to be published. it would save time in the long run to thoroughly test all the possibilities with volunteers who play each role before the site is in active use. finally, when the workflow is in place, the website's managers may find themselves doing less writing and fewer content updates. they have a new role, though: to curate the site and support staff who use the new tools. the concept of editing is not yet consistently applied to websites unless the site represents an organization that already relies on editors (like a newspaper), but it is gaining recognition as a best practice. if the website is the most readily available public face of an institution, it should receive editorial attention just as a brochure or fundraising letter would. workflow is one way that libraries can promote a higher level of quality and perceived competence and reliability through their website presence.

■■ acknowledgments

thank you to jamie graham, karen hanson, dorothy moore, and vikram yelanadu.

references

1. charles martell, "the absent user: physical use of academic library collections and services continues to decline 1995–2006," journal of academic librarianship 34 (2008): 400–407.
2. jeanie m. welch, "who says we're not busy? library web page usage as a measure of public service activity," reference services review 33 (2005): 371–79.
3. b. j. fogg and hsiang tseng, "the elements of computer credibility" (paper presented at chi '99, pittsburgh, pennsylvania, may 15–20, 1999): 82.
4. b. j. fogg et al., "what makes web sites credible? a report on a large quantitative study" (paper presented at sigchi '01, seattle, washington, mar. 31–apr. 4, 2001): 67–68.
5. andrea everard and dennis f. galletta, "how presentation flaws affect perceived site quality, trust, and intention to purchase from an online store," journal of management information systems 22 (2005–6): 79.
6. pamela briggs et al., "trust in online advice," social science computer review 20 (2002): 330.
7. patrick j. lynch and sarah horton, "online style," web style guide, 3rd ed., http://webstyleguide.com/wsg3/9-editorial-style/3-online-style.html (accessed dec. 1, 2010).
8. janice (ginny) redish, letting go of the words: writing web content that works (san francisco: morgan kaufman, 2007).
9. emily g. morton-owens, karen l. hanson, and ian walls, "implementing open-source software for three core library functions: a stage-by-stage comparison," journal of electronic resources in medical libraries 8 (2011): 1–14.
10. angela byron et al., using drupal (sebastopol, calif.: o'reilly, 2008).
11. all drupal modules can be found via http://drupal.org/project/modules.
12. kristina halvorson, content strategy for the web (berkeley, calif.: new riders, 2010).

the development and administration of automated systems in academic libraries

richard de gennaro: harvard university library, cambridge, mass.

the first part of this paper considers three general approaches to the development of an automation program in a large research library.
the library may decide simply to wait for developments; it may attempt to develop a total or integrated system from the start; or it may adopt an evolutionary approach leading to an integrated system. outside consultants, it is suggested, will become increasingly important. the second part of the paper deals with important elements in any program regardless of the approach. these include the building of a capability to do automation work, staffing, equipment, organizational structure, selection of projects, and costs.

since most computer-based systems in academic libraries at the present time are in the developmental or early operational stages when improvements and modifications are frequent, it is difficult to make a meaningful separation between the developmental function and the administrative or management function. development, administration, and operations are all bound up together and are in most cases carried on by the same staff. this situation will change in time, but it seems safe to assume that automated library systems will continue to be characterized by instability and change for the next several years. in any case, this paper will not attempt to distinguish between developmental and administrative functions but will instead discuss in an informal and non-technical way some of the factors to be considered by librarians and administrators when their thoughts turn, as they inevitably must, to introducing computer systems into their libraries or to expanding existing machine operations. alternative approaches to library automation will be explored first. there will follow a discussion of some of the important elements that go into a successful program, such as building a capability, a staff, and an organization. the selection of specific projects and the matter of costs will also be covered briefly.

approaches to library automation

devising a plan for automating a library is not entirely unlike formulating a program for a new library building. while there are general types of building best suited to the requirements of different types of library, each library is unique in some respects, and requires a building which is especially designed for its own particular needs and situation. as there are no canned library building programs, so there are no canned library automation programs, at least not at this stage of development; therefore the first task of a library administration is to formulate an approach to automation based on a realistic assessment of the institution's needs and resources.

certain newly-founded university libraries such as florida atlantic, which have small book collections and little existing bibliographical apparatus, have taken the seemingly logical course of attempting to design and install integrated computer-based systems for all library operations. certain special libraries with limited collections and a flexible bibliographical apparatus are also following this course. project intrex at m.i.t. is setting up an experimental library operation parallel to the traditional one, with the hope that the former will eventually transform or even supersede the latter. several older university libraries, including chicago, washington state, and stanford, are attempting to design total systems based on on-line technology and to implement these systems in modules.
many other university libraries (british columbia, harvard, and yale to name only a few) approach automation in an evolutionary way and are designing separate, but related, batch-processing systems for various housekeeping functions such as circulation, ordering and accounting, catalog input, and card production. still other libraries (princeton is a notable example) expect to take little or no action until national standardized bibliographical formats have been promulgated, and some order or pattern has begun to emerge from the experimental work that is in progress. only time will tell which of these courses will be most fruitful. meanwhile the library administrator must decide what approach to take; and the approach to automation, like that to a building program, must be based on local requirements and available resources (1,2). for the sake of this discussion the principal approaches will be considered under three headings: 1) the wait-for-developments approach, 2) the direct approach to a total system, and 3) the evolutionary approach to a total system. the use of outside consultants will also be discussed.

the wait-for-developments approach

this approach is based on the premise that practically all computer-based library systems are in an experimental or research-and-development stage with questionable economic justification, and that it is unnecessary and uneconomical for every library to undertake difficult and costly development work. the advocates of this approach suggest that library automation should not be a moon race and say that it makes sense to wait until the pioneers have developed some standardized, workable, and economical systems which can be installed and operated in other libraries at a reasonable cost. for many libraries, particularly the smaller ones, this is a reasonable position to take for the next few years. it is a cautious approach which minimizes costs and risks. for the larger libraries, however, it overlooks the fact that soon, in order to cope with increasing workloads, they will have to develop the capability to select, adapt, implement, operate, and maintain systems that were developed elsewhere. the development of this capability will take time and will be made more difficult by the absence of any prior interest and activity in automation within the adapting institution. the costs will be postponed and perhaps reduced because the late-starters will be able to telescope much of the process, like countries which had their industrial revolution late. however, it will take some courage and political astuteness for a library administrator to hold firmly to this position in the face of the pressures to automate that are coming from all quarters, both inside and outside the institution (3).

a major error in the wait-for-developments approach is the assumption that a time will come when the library automation situation will have shaken down and stabilized so that one can move into the field confidently. this probably will not happen for many years, if it happens at all, for with each new development there is another more promising one just over the horizon. how long does one wait for the perfect system to be developed so that it can be easily "plugged in," and how does one recognize that system when one sees it? there is real danger of being left behind in this position, and a large library may then find it difficult indeed to catch up.
the direct approach to a total system

this approach to library automation is based on the premise that, since a library is a total operating unit and all its varied operations are interrelated and interconnected, the logic of the situation demands that it be looked upon as a whole by the systems designers and that a single integrated or total system be designed to include all machinable operations in the library. such a system would make the most efficient and economical use of the capabilities of the computer. this does not require that the entire system be designed and implemented at the same time, but permits treating each task as one of a series of modules, each of which can be implemented separately, though designed as part of a whole. several large libraries have chosen this method and, while a good deal of progress is being made, these efforts are still in the early development stage. the university of chicago system is the most advanced (4).

unlike the evolutionary approach, which assumes that much can be done with local funds, home-grown staff, batch processing and even second generation computers, the total systems approach must be based on sophisticated on-line as well as batch-processing equipment. this equipment is expensive; it is also complex, requiring a trained and experienced staff of systems people and expert programmers to design, implement, and operate it effectively. since the development costs involved in this approach are considerable, exceeding the available resources of even the larger libraries, those libraries that are attempting this method have sought and received sizable financial backing from the granting agencies.

the total systems approach has logic in its favor: it focuses on the right goal and the goal will ultimately be attainable. the chief difficulty, however, is one of timing. the designers of these systems are trying to telescope the development process by skipping an intermediate stage in which the many old manual systems would have been converted to simple batch-processing or off-line computer systems, and the experience and knowledge thus acquired utilized in taking the design one step further into a sophisticated, total system using both on-line and batch-processing techniques. the problem is that we neither fully understand the present manual systems nor the implications of the new advanced ones. we are pushing forward the frontiers of both library automation and computer technology. it may well be that the gamble will pay off, but it is extremely doubtful that the first models of a total library system will be economically and technically viable. the best that can be hoped for is that they will work well enough to serve as prototypes for later models. while bold attempts to make a total system will unquestionably advance the cause of library automation in general, the pioneering libraries may very well suffer serious setbacks in the process, and the prudent administrator should carefully weigh the risks and the gains of this approach for his own particular library.

the evolutionary approach to a total system

this approach consists basically of taking a long-range, conservative view of the problem of automating a large, complex library. the ultimate goal is the same as that of the total systems approach described in the preceding section, but the method of reaching it is different.
in the total systems approach, objectives are defined, missions for reaching those objectives are designed, and the missions are computerized, usually in a series of modules. in the evolutionary approach, the library moves from traditional manual systems to increasingly complex machine systems in successive stages to achieve a total system with the least expenditure of effort and money and with the least disruption of current operations and services (5).

in the first stage the library undertakes to design and implement a series of basic systems to computerize various procedures using its own staff and available equipment. this is something of a bootstrap operation, the basic idea of which is to raise the level of operations (circulation, acquisitions, catalog input, etc.) from existing manual systems to simple and economical machine systems until major portions of the conventional systems have been computerized. in the process of doing this, the library will have built up a trained staff, a data processing department or unit with a regular budget, some equipment, and a space in which to work: in short, an in-house capability to carry on complex systems work. during this first stage the library will have been working with tried and tested equipment and software packages, probably of the second generation variety; meanwhile, third generation computers with on-line and time-sharing software are being debugged and made ready for use in actual operating situations.

at some point the library itself, computer hardware and software, and the state of the library automation art will all have advanced to a point where it will be feasible to undertake the task of redesigning the simple stage-one systems into a new integrated stage-two system which builds upon the designs and operating experience obtained with the earlier systems. these stage-one systems will have been, for the most part, mechanized versions of the old manual systems; but the stage-two systems, since they are a step removed from the manual ones, can be designed to incorporate significant departures from the old way of doing things and take advantage of the capabilities of the advanced equipment and software that will be used. the design, programming, and implementation of these stage-two systems will be facilitated by the fact that the library is going from one logical machine system to another, rather than from primitive unformalized manual systems to highly complex machine systems in one step. because existing manual systems in libraries produce no hard statistical data about the nature and number of transactions handled, stage-one machine systems have had to be designed without benefit of this essential data. however, even the simplest machine systems can be made to produce a wide variety of statistical data which can be used to great advantage by the designers of stage-two systems. the participation of non-library-oriented computer people in stage-two design will also be facilitated by the fact that they will be dealing with formalized machine systems and records in machine readable form with which they can easily cope. while the old stage one of library automation was one in which librarians almost exclusively did the design and programming, it is doubtful that stage-two systems can or should be done without the active aid of computer specialists.
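as an aside, the point above about statistical data from stage-one systems can be made concrete with a small sketch. it is purely illustrative; the transaction file name and its columns are invented for the example and are not drawn from any system the paper describes. it simply shows how even a flat batch circulation file yields the transaction counts that the old manual files never recorded and that stage-two designers need.

```python
# illustrative only: a stage-one batch file of circulation transactions can be
# summarized to give stage-two designers basic workload statistics.
import csv
from collections import Counter

def summarize(path):
    """Count transactions per month and per borrower category from a flat file."""
    by_month, by_category = Counter(), Counter()
    with open(path, newline="") as f:
        # assumed columns: date (yyyy-mm-dd), action, borrower_category
        for row in csv.DictReader(f):
            by_month[row["date"][:7]] += 1          # e.g. "1968-03"
            by_category[row["borrower_category"]] += 1
    return by_month, by_category

# by_month, by_category = summarize("circulation_transactions.csv")
```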
in stage one it was easier for librarians to learn computing and to do the job themselves than it was to teach computer people about the old manual systems and the job to be done to convert them. this may no longer be the case in dealing with redesign of old machine systems into very complex systems to run on third or fourth generation equipment in an on-line, time-sharing environment. there is now a generation of experienced computer-oriented librarians capable of specifying the job to be done and knowledgeable enough to judge the quality of the work that has been done by the experts. there is no reason why a team of librarians and computer experts should not be able to work effectively together to design and implement future library systems. as traditional library systems are replaced by machine systems, the specialized knowledge of them becomes superfluous, and it was this type of knowledge that used to distinguish the librarian from the computer expert. just as there is a growing corps of librarians specializing in computer work, so there is a growing corps of computer people specializing in library work. it is with these two groups working together as a team that the hope of the future lies. the question of who is to do library automation (librarians or computer experts) is no longer meaningful; library automation will be done by persons who are knowledgeable about it and who are deeply committed to it as a specialty; whether they have approached it through a background of librarianship or technology will be of little consequence. experience has shown that computer people who have made a full-time commitment to the field of library automation have done some of the best work to date.

stage-two, or advanced integrated library systems, may be built by a team of library and computer people of various types working as staff members of the library, as has been suggested in the preceding discussion, but this approach also has its weaknesses. for example, let us assume that a large library has finally brought itself through stage one and is now planning to enter the second stage. it may have acquired a good deal of the capability to do advanced work, but its staff may be too small and too inexperienced in certain aspects of the work to undertake the major task of planning, designing, and implementing a new integrated system. additional expert help may be needed, but only on a temporary basis during the planning and design stages. such people will be hard to find, and also hard to hire within some library salary structures. they will be difficult to absorb into the library's existing staff, administrative, and physical framework. they may also be difficult to separate from the staff when they are no longer needed.

use of outside consultants

there are alternative approaches to creating advanced automated systems. the discussion that follows will deal with one of the most obvious: to contract much of the work out to private research and development firms specializing in library systems. what comes to mind here is an analogy with the employment of specialized talents of architects, engineers, and construction companies in planning and building very large, complex and costly library buildings, which are then turned over to librarians to operate.
when a decision has been made to build a new building, the university architect is not called in to do the job, nor is an architect added to the library staff, nor are librarians on the staff trained to become architects and engineers qualified to design and supervise the construction of the building. most libraries have on their staffs one or two librarians who are experienced and knowledgeable enough to determine the over-all requirements of the new building, and together they develop a building program which outlines the general concept of the building and specifies various requirements. a qualified professional architect is commissioned to translate the program into preliminary drawings, and there follows a continuing dialogue between the architect and the librarians which eventually produces acceptable working drawings of a building based on the original program. for tasks outside his area of competence, the architect in turn engages the services of various specialists, such as structural and heating and ventilating engineers. both the architect and the owners can also call on library consultants for help and advice if needed. the architect participates in the selection of a construction company to do the actual building and is responsible for supervising the work and making sure that the building is constructed according to plans and contracts. upon completion, the building is turned over to the owners, and the librarians move in and operate it and see to its maintenance. in time, various changes and additions will have to be made. minor ones can be made by the regular buildings staff of the institution, but major ones will probably be made with the advice and assistance of the original architect or some other.

in the analogous situation, the library would have its own experienced systems unit or group capable of formulating a concept and drawing up a written program specifying the goals and requirements of the automated system. a qualified "architect" for the system would be engaged in the form of a small firm of systems consultants specializing or experienced in library systems work. their task, like the architect's, would be to turn the general program into a detailed system design with the full aid and participation of the local library systems group. this group would be experienced and competent enough to make sure that the consultants really understood the program and were working in harmony with it. after an acceptable design had emerged from this dialogue, the consultant would be asked to help select a systems development firm which would play a role similar to that of the construction company in the analog: to complete the very detailed design work and to do the programming and debugging and implementation of the system. the consultant would oversee this work, just as the architect oversees the construction of a building. the local library group will have actively participated in the development and implementation of the system and would thus be competent to accept, operate, maintain and improve it.

success or failure in this approach to advanced library automation will depend to a large extent on the competence of the "architect" or consultant who is engaged. until recently this was not a very promising route to take for several reasons.
there were no firms or consultants with the requisite knowledge and experience in library systems, and the state of the library automation art was confused and lacking in clear trends or direction. it was generally felt that batch-processing systems on second and even third generation computing equipment could and should be designed and installed by local staff in order to give them necessary experience and to avoid the failures that could come from systems designed outside the library. library automation has evolved to a point where there is a real need for advanced library systems competence that can be called upon in the way that has been suggested, and individuals and firms will appear to satisfy that need. it is very likely, however, that the knowledge and the experience that is now being obtained in on-line systems by pioneering libraries such as the university of chicago, washington state university and stanford university, will have to be assimilated before we can expect competent consultants to emerge.

the chief difficulty with the architect-and-building analog is that while the process of designing and constructing library buildings is widely understood, there being hundreds of examples of library buildings which can be observed and studied as precedents, the total on-line library system has yet to be designed and tested. there are no precedents and no examples; we are in the position of asking the "architect" to design a prototype system, and therein lies the risk. after this task has been done several times, librarians can begin to shop around for experienced and competent "architects" and successful operating systems which can be adapted to their needs. the key problem here, as always in library automation, is one of correct timing: to embark on a line of development only when the state of the art is sufficiently advanced and the time is ripe for a particular new development.

building the capability for automation

regardless of the approach that is selected, there are certain prerequisites to a successful automation effort, and these can be grouped under the rubric of "building the capability." to build this capability requires time and money. it consists of a staff, equipment, space, an organization with a regular budget, and a certain amount of know-how which is generally obtained by doing a series of projects. success depends to a large extent on how well these resources are utilized, i.e. on the overall strategy and the nature and timing of the various moves that are made. much has already been said about building the capability in the discussion on the approaches to automation, and what follows is an expansion of some points that have been made and a recapitulation of others.

staff

since nothing gets done without people, it follows that assembling, training, and holding a competent staff is the most important single element in a library's automation effort. the number of trained and experienced library systems people is still extremely small in relation to the ever-growing need and demand. to attract an experienced computer librarian and even to hold an inexperienced one with good potential, libraries will have to pay more than they pay members of the staff with comparable experience in other lines of library work. this is simply the law of supply and demand at work. to attract people from the computer field will by the same token require even higher salaries.
in addition, library systems staff, because of the rate of development of the field and the way in which new information is communicated, will have to be given more time and funds for training courses and for travel and attendance at conferences than has been the case for other library staff.

the question of who will do library automation (librarians or computer experts) has already been touched upon in another context, but it is worth emphasizing the point that there is no unequivocal answer. there are many librarians who have acquired the necessary computer expertise and many computer people who have acquired the necessary knowledge of library functions. the real key to the problem is to get people who are totally committed to library automation whatever their background. computer people on temporary loan from a computing center may be poor risks, since their professional commitment is to the computer world rather than that of the library. they are paid and promoted by the computing center and their primary loyalty is necessarily to that employer. computer people, like the rest of us, give their best to tasks which they find interesting and challenging, and by and large, they tend to look upon the computerization of library housekeeping tasks as trivial and unworthy of their efforts. on the other hand, a first-rate computer person who has elected to specialize in library automation and who has accepted a position on a library staff may be a good risk, because he will quickly take on many of the characteristics of a librarian yet without becoming burdened by the full weight of the conventional wisdom that librarians are condemned to carry. the ideal situation is to have a staff large enough to include a mixture of both types, so that each will profit by the special knowledge and experience of the other. to bring in computer experts inexperienced in library matters to automate a large and complex library without the active participation of the library's own systems people is to invite almost certain failure. outsiders, no matter how competent, tend to underestimate the magnitude and complexity of library operations; this is true not only of computing center people but also of independent research and development firms.

a library automation group can include several different types of persons with very different kinds and levels of qualifications. the project director or administrative head should preferably be an imaginative and experienced librarian who has acquired experience with electronic data processing equipment and techniques, and an over-all view of the general state of the library automation art, including its potential and direction of development. there are various levels of library systems analysts and programmers, and the number and type needed will depend on the approach and the stage of a particular library's automation effort. the critical factor is not numbers but quality. there are many cases where one or two inspired and energetic systems people have far surpassed the efforts of much larger groups in both quality and quantity of work. some of the most effective library automation work has been done by the people who combine the abilities of the systems analyst with those of the expert programmer and are capable of doing a complete project themselves. a library that has one or two really gifted systems people of this type and permits them to work at their maximum is well on the way to a successful automation effort.
as a library begins to move into development of on-line systems, it will need specialist programmers in addition to the systems analysts described above. these programmers need not be, and probably will not be, librarians. other members of the team, again depending on the projects, will be librarians who are at home in the computer environment but who will be doing the more traditional types of work, such as tagging and editing machine catalog records. in any consideration of library automation staff, it would be a mistake to underestimate the importance of the role of keypunchers, paper tape typists, and other machine operators; it is essential that these staff members be conscientious and motivated persons. they are responsible for the quality and quantity of the input, and therefore of the output, and they can frequently do much to make or break a system. a good deal of discussion and experimentation has gone into the question of the relative efficiency of various keyboarding devices for library input, but little consideration is given to the human operators of the equipment. experience shows that there can be large variations in the speed and accuracy of different persons doing the same type of work on the same machine.

equipment

one of the lessons of library automation learned during the last few years is that a library cannot risk putting its critical computer-based systems onto equipment over which it has no control. this does not necessarily mean that it needs its own in-house computer. however, if it plans to rely on equipment under the administrative control of others, such as the computer center or the administrative data processing unit, it must get firm and binding commitments for time, and must have a voice in the type and configuration of equipment to be made available. the importance of this point may be overlooked during an initial development period, when the library's need for time is minimal and flexible; it becomes extremely critical when systems such as acquisitions and circulation become totally dependent on computers. people at university computing centers are generally oriented toward scientific and research users and in a tight situation will give the library's needs second priority; those in administrative data processing, because they are operations oriented, tend to have a somewhat better appreciation of the library's requirements. in any case, a library needs more than the expressed sympathy and goodwill of those who control the computing equipment; it needs firm commitments.

for all but the largest libraries, the economics of present-day computer applications in libraries make it virtually impossible to justify an in-house machine of the capacity libraries will need, dedicated solely or largely to library uses. even the larger libraries will find it extremely difficult to justify a high-discount second generation machine or a small third generation machine during the period when their systems are being developed and implemented a step or a module at a time. eventually, library use may increase to a point where the in-house machine will pay for itself, but during the interim period the situation will be uneconomical unless other users can be found to share the cost. in the immediate future, most libraries will have to depend on equipment located in computing or data processing centers.
the recent experience of the university of chicago library, which is pioneering on-line systems, suggests that this situation is inevitable, given the high core requirements and low computer usage of library systems. experience at the university of missouri (6) suggests that the future will see several libraries grouping to share a machine dedicated to library use; this may well be preferable to having to share with research and scientific users elsewhere within the university. a clear trend is not yet evident, but it seems reasonable to suppose that in the next few years sharing of one kind or another will be more common than having machines wholly assigned to a single library; and that local situations will dictate a variety of arrangements. while it is clear that the future of library automation lies in third-generation computers, much of their promise is as yet unfulfilled, and it would be premature at this point to write off some of the old, reliable, second-generation batch-processing machines. the ibm 1401, for example, is extremely well suited for many library uses, particularly printing and formatting, and it is a machine easily mastered by the uninitiated. this old workhorse will be with us for several more years before it is retired to majorca along with obsolete paris taxis.

organization

when automation activity in a library has progressed to a point where the systems group consists of several permanent professionals and several clericals, it may be advisable to make a permanent place for the group in the library's regular organizational structure. the best arrangement might be to form a separate unit or department on an equal footing with the traditional departments such as acquisitions, cataloging, and public services. this systems department would have a two-fold function: it would develop new systems and operate implemented systems; and it would bring together for maximum economy and efficiency most of the library's data processing equipment and systems staff. it will require adequate space of its own and, above all, a regular budget, so that permanent and long-term programs can be developed and sustained on something other than an ad hoc basis.

there are other advantages to having an established systems department or unit. it gives a sense of identity and esprit to the staff; and it enables them to work more effectively with other departments and to be accepted by them as a permanent fact of life in the library, thereby diminishing resistance to automation. let there be no mistake about it: the systems group will be a permanent and growing part of the library staff, because there is no such thing as a finished, stable system. (there is a saying in the computer field which goes "if it works, it's obsolete.") the systems unit should be kept flexible and creative. it should not be allowed to become totally preoccupied with routine operations and submerged in its day-to-day workload, as is too frequently the case with the traditional departments, which consequently lose their capacity to see their operations clearly and to innovate. part of the systems effort must be devoted to operational systems, but another part should be devoted to the formulation and development of new projects. the creative staff should not be wasted running routine operations.
there has never been any tradition for research and development work in libraries; they were considered exclusively service and operational institutions. the advent of the new technology is forcing a change in this traditional attitude in some of the larger and more innovative libraries, which are doing some research and a good deal of development. it is worth noting that a concomitant of research and development is a certain amount of risk but that, while there is no such thing as change without risk, standing pat is also a gamble. not every idea will succeed and we must learn to accept failures, but the experiments must be conducted so as to minimize the effect of failure on actual library operations.

automated systems are never finished; they are open-ended. they are always being changed, enlarged, and improved; and program and system maintenance will consequently be a permanent activity. this is one of the chief reasons why the equipment and the systems group should be concentrated in a separate department. the contrary case, namely dispersion of the operational aspects among the departments responsible for the work, may be feasible in the future as library automation becomes more sophisticated and peripheral equipment becomes less expensive, but the odds at this time appear to favor greater centralization.

the harvard university library has created, with good results, a new major department along the lines suggested above, except that it also includes the photo-reproduction services. the combination of data processing and reprography in a single department is a natural and logical relationship and one which will have increasingly important implications as both technologies develop concurrently and with increasing interdependence in the future. even at the present time, there is sufficient relationship between them so that the marriage is fruitful and in no way premature. while computers have had most of the glamour, photographic technology in general, and particularly the advent of the quick-copying machine, during the last seven years has so far had a more profound and widespread impact on library resources and services to readers than the entire field of computers and data processing. within the next several years, computer and reprographic technology will be so closely intertwined in libraries as to be inseparable. it would be a mistake to sell reprography short in the coming revolution.

project selection

no academic library should embark on any type of automation program without first acquiring a basic knowledge of the projects and plans of the library of congress, the national library of medicine, the national library of agriculture, and certain of their joint activities, such as the national serials data program. as libraries with no previous experience with data processing systems move into the field of automation, they frequently select some relatively simple and productive projects to give experience to the systems staff and confidence in machine techniques to the rest of the library staff. precise selection will depend on the local situation, but projects such as the production of lists of current journals (not serials check-in), lists of reserve books, lists of subject headings, circulation, and even acquisitions ordering and accounting systems are considered to be the safest and the most productive type of initial projects.
since failures in the initial stage will have serious psychological effects on the library administration and entire staff, it is best to begin with modest projects. until recently it was fashionable to tackle the problem of automating the serials check-in system as a first project on the grounds that this was one of the most important, troublesome, and repetitive library operations and was therefore the best area in which to begin computerization. fortunately, a more realistic view of the serials problem has begun to prevail: that serial receipts is an extremely complex and irregular library operation and one which will probably require some on-line updating capabilities, and complex file organization and maintenance programs. in any case, it is decidedly not an area for beginners.

a major objection to all of the projects mentioned is that they do not directly involve the catalog, which is at the heart of library automation. now that the marc ii format has been developed by the library of congress and is being widely accepted as the standardized bibliographical and communications format, the most logical initial automation effort for many libraries will be to adapt to their own environments the input system for current cataloging which is now being developed by the library of congress. the logic of beginning an integrated system with the development of an input sub-system for current cataloging has always been compelling for this author, far more compelling than beginning in the ordering process, as so many advocate. the catalog is the central record, and the conversion of this record into machinable form is the heart of the matter of library automation. it seems self-evident that systems design should begin here with the basic bibliographical entry upon which the entire system is built. having designed this central module, one can then turn to the acquisitions process and design this module around the central one. circulation is a similar secondary problem. in other words, systems design should begin at the point where the permanent bibliographical record enters the system and not where the first tentative special-purpose record is created. unfortunately, until the advent of the standardized marc ii format, it was not feasible, except in an experimental way, for libraries to begin with the catalog record, simply because the state of the art was not far enough advanced.

the development and acceptance of the marc ii format in 1967 marks the end of one era in library automation and the beginning of another. in the pre-marc ii period every system was unique; all the programming and most of the systems work had to be done by a library's own staff. in the post-marc ii period we will begin to benefit from systems and programs that will be developed at the library of congress and elsewhere, because they will be designed around the standard format and for at least one standard computer. as a result of this, automation in libraries will be greatly accelerated and will become far more widespread in the next few years (7). an input system for current cataloging in the marc ii format will be among the first packages available. it will be followed shortly by programs designed to sort and manipulate the data in various ways.
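for readers who have not seen a marc record, the sketch below suggests roughly what a standardized bibliographical record looks like in machine-readable form: numbered field tags, indicators, and coded subfields. it is a simplified illustration, not the marc ii specification itself (which also defines a leader, a record directory, and fixed fields), and the sample data and helper function are invented.

```python
# simplified illustration of a MARC-style record: tagged variable fields, each
# with indicators and coded subfields (not the full MARC II specification).
record = [
    # (tag, indicators, [(subfield code, value), ...])
    ("100", "1 ", [("a", "De Gennaro, Richard.")]),
    ("245", "14", [("a", "The development and administration of automated systems.")]),
    ("260", "  ", [("b", "Journal of Library Automation,"), ("c", "1968.")]),
]

def subfield_values(record, tag, code):
    """Pull one subfield out of every occurrence of a field, e.g. the 245 $a title."""
    return [value
            for t, _indicators, subfields in record
            if t == tag
            for c, value in subfields
            if c == code]

print(subfield_values(record, "245", "a"))  # programs can sort and manipulate records by tag
```

a local input system, in these terms, is simply the apparatus for creating, correcting, and adding to records of this shape.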
a library will require a considerable amount of expertise on the part of its staff to adapt these procedures and programs to its own uses (we are not yet at the point of "plugging-in" systems), but the effort will be considerably reduced and the risks of going down blind alleys with homemade approaches and systems will be nearly eliminated for those libraries that are willing to adopt this strategy. the development and operation of a local marc ii input system with an efficient alteration and addition capability will be a prerequisite for any library that expects to learn to make effective use of the magnetic tapes containing the library of congress's current catalog data in the marc ii format, which will be available as a regular subscription in july, 1968. in addition to providing the experience essential for dealing with the library of congress marc data, a local input system will enable the library to enter its own data both into the local systems and into the national systems which will begin to emerge in the near future. since the design of the marc ii format is also hospitable to other kinds of library data, such as subject-headings lists and classification schedules, the experience gained with it in an input system will be transferable to other library automation projects.

costs

the price of doing original development work in the library automation field comes extremely high, so high that in most cases such work cannot be undertaken without substantial assistance from outside sources. even when grants are available, the institution has to contribute a considerable portion of the total cost of any development effort, and this cost is not a matter of money alone; it requires the commitment of the library's limited human resources. in the earlier days of library automation attention was focused on the high cost of hardware, computer and peripheral equipment. the cost of software, the systems work and programming, tended to be underestimated. experience has shown, however, that software costs are as high as hardware costs or even higher. the development of new systems, i.e., those without precedents, is the most costly kind of library automation, and most libraries will have to select carefully the areas in which to do their original work. for those libraries that are content to adopt existing systems, the costs of the systems effort, while still high, are considerably less and the risks are also reduced. these costs, however, will probably have to be borne entirely by the institution, as it is unlikely that outside funding can be obtained for this type of work.

the justification of computer-based library systems on the basis of the costs alone will continue to be difficult because machine systems not only replace manual systems but generally do more and different things, and it is extremely difficult to compare them with the old manual systems, which frequently did not adequately do the job they were supposed to do and for which operating costs often were unknown. generally speaking, and in the short run at least, computer-based systems will not save money for an institution if all development and implementation costs are included. they will provide better and more dependable records and systems, which are essential to enable libraries simply to cope with increased intake and workloads, but they will cost at least as much as the inadequate and frequently unexpansible manual systems they replace.
the picture may change in the long run, but even then it seems more reasonable to expect that automation, in addition to profoundly changing the way in which the library budget is spent, will increase the total cost of providing library service. however, that service will be at a much higher level than the service bought by today's library budget. certain jobs will be eliminated, but others will be created to provide new services and services in greater depth; as a library becomes increasingly successful and responsive, more and more will be demanded of it.

conclusion

the purpose of this paper has been to stress the importance of good strategy, correct timing, and intelligent systems staff as the essential ingredients for a successful automation program. it has also tried to make clear that no canned formulas for automating an academic library are waiting to be discovered and applied to any particular library. each library is going to have to decide for itself which approach or strategy seems best suited to its own particular needs and situation. on the other hand, a good deal of experience with the development and administration of library systems has been acquired over the last few years and some of it may very well be useful to those who are about to take the plunge for the first time. this paper was written with the intention of passing along, for what they are worth, one man's ideas, opinions, and impressions based on an imperfect knowledge of the state of the library automation art and a modest amount of first-hand experience in library systems development and administration.

references

1. wasserman, paul: the librarian and the machine (detroit: gale, 1965). a thoughtful and thorough review of the state of the art of library automation, with some discussion of the various approaches to automation. essential reading for library administrators.
2. cox, n. s. m.; dews, j. d.; dolby, j. l.: the computer and the library (newcastle upon tyne: university of newcastle upon tyne, 1966). american edition published by archon books, hamden, conn. extremely clear, well-written and essential book for anyone with an interest in library automation.
3. dix, william s.: annual report of the librarian for the year ending june 30, 1966 (princeton: princeton university library, 1966). one of the best policy statements on library automation; a comprehensive review of the subject in the princeton context, with particular emphasis on the "wait-for-developments" approach.
4. fussler, herman h.; payne, charles t.: annual report 1966/67 to the national science foundation from the university of chicago library; development of an integrated, computer-based, bibliographical data system for a large university library (chicago: university of chicago library, 1967). appended to the report is a paper given may 1, 1967, at the clinic on library application of data processing conducted by the graduate school of library science, university of illinois. mr. payne is the author, and the paper is entitled "an integrated computer-based bibliographic data system for a large university library: progress and problems at the university of chicago."
5. kilgour, frederick g.: "comprehensive modern library systems," in the brasenose conference on the automation of libraries, proceedings (london: mansell, 1967), 46-56. an example of the evolutionary approach as employed at the yale university library.
6. parker, ralph h.: "not a shared system: an account of a computer operation designed specifically and solely for library use at the university of missouri," library journal, 92 (nov. 1, 1967), 3967-3970.
7. annual review of information science and technology (new york: interscience publishers), 1 (1966). a useful tool for surveying the current state of the library automation art and for obtaining citations to current publications and reports is a chapter on automation in libraries which appears in each volume.
information discovery insights gained from multipac, a prototype library discovery system
alex a. dolski
at the university of nevada las vegas libraries, as in most libraries, resources are dispersed into a number of closed “silos” with an organization-centric, rather than patron-centric, layout. patrons frequently have trouble navigating and discovering the dozens of disparate interfaces, and any attempt at a global overview of our information offerings is at the same time incomplete and highly complex. while consolidation of interfaces is widely considered to be desirable, certain challenges have made it elusive in practice.
multipac is an experimental “discovery,” or metasearch, system developed to explore issues surrounding heterogeneous physical and networked resource access in an academic library environment. this article discusses some of the reasons for, and outcomes of, its development at the university of nevada las vegas (unlv).
the case for multipac
fragmentation of library resources and their interfaces is a growing problem in libraries, and unlv libraries is no exception.
electronic information here is scattered across our innovative webpac; our main website, our three branch library websites; remote article databases, local custom databases, local digital collections, special collections, other remotely hosted resources (such as libguides), and others. the number of these resources, as well as the total volume of content offered by the libraries, has grown over time (figure 1), while access provisions have not kept pace in terms of usability. in light of this dilemma, the libraries and various units within have deployed finding and search tools that provide browsing and searching access to certain subsets of these resources, depending on criteria such as n the type of resource; n its place within the libraries’ organizational structure; n its place within some arbitrarily defined topical categorization of library resources; n the perceived quality of its content; and n its uniqueness relative to other resources. these tools tend to be organization-centric rather than patron-centric, as they are generally provisioned in relative isolation from each other without placing as much emphasis on the big picture (figure 2). the result is, from the patron’s perspective, a disaggregated mass of information and scattered finding tools that, to varying degrees, each accomplishes its own specific goals at the expense of macro-level findability. currently, a comprehensive search for a given subject across as many library resources as possible might involve visiting a half-dozen interfaces or more—each one predicated upon awareness of each individual interface, its relation to the others, and figure 1. “silos” in the library figure 2. organization-centric resource provisioning alex a. dolski (alex.dolski@unlv.edu) is web & digitization application developer at the university of nevada las vegas libraries. information discovery insights gained from multipac | dolski 173 the characteristics of its specific coverage of the corpus of library content. our library website serves as the de facto gateway to our electronic, networked content offerings. yet usability studies have shown that findability, when given our website as a starting point, is poor. undoubtedly this is due, at least in part, to interface fragmentation. test subjects, when given a task to find something and asked to use the library website as a starting point, fail outright in a clear majority of cases.1 multipac is a technical prototype that serves as an exploration of these issues. while the system itself breaks no new technical ground, it brings to the forefront critical issues of metadata quality, organizational structure, and long-term planning that can inform future actions regarding strategy and implementation of potential solutions at unlv and elsewhere. yet it is only one of numerous ways that these issues could be addressed.2 in an abstract sense, multipac is biased toward principles of simplification, consolidation, and unification. in theory, usability can be improved by eliminating redundant interfaces, consolidating search tools, and bringing together resource-specific features (e.g., opac holdings status) in one interface to the maximum extent possible (figure 3). taken to an extreme, this means being able to support searching all of our resources, regardless of type or location, from a single interface; abstracting each resource from whatever native or built-in user interface it might offer; and relying instead on its data interface for querying and result-set gathering. 
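to make the phrase "relying on its data interface" concrete, here is a minimal php sketch of that idea. everything in it is hypothetical: the endpoint url, the query parameters, and the field names are invented for illustration and are not part of multipac. the point is simply that a resource is queried through a machine-readable (here, json) interface and its hits are reduced to small normalized records, rather than being scraped out of its html search pages.

<?php
// minimal sketch: query a hypothetical resource's json data interface
// instead of its html user interface, and normalize the results.
// the url, parameters, and field names are invented for illustration.
$query = 'las vegas history';
$url = 'http://example.org/api/search?' . http_build_query(array(
    'q'    => $query,
    'rows' => 10,
));

$raw  = file_get_contents($url);   // fetch the machine-readable response
$data = json_decode($raw, true);   // decode json into a php array

$results = array();
foreach ($data['items'] as $item) {
    // reduce each hit to a small, resource-independent record
    $results[] = array(
        'title'       => isset($item['title']) ? $item['title'] : '(untitled)',
        'description' => isset($item['description']) ? $item['description'] : '',
        'url'         => isset($item['link']) ? $item['link'] : null,
    );
}

print_r($results);
?>

a production adapter would add error handling and per-resource quirks, but the essential move, treating each resource as data rather than as a web page, is the one described above.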
thus multipac is as much a proof-of-concept as it is a concrete implementation. n background: how multipac became what it is multipac came about from a unique set of circumstances. from the beginning, it was intended as an exploratory project, with no serious expectation of it ever being deployed. our desire to have a working prototype ready for our discovery mini-conference meant that we had just six weeks of development time, which was hardly sufficient for anything more than the most agile of table 1. some popular existing library discovery systems name company/institution commercial status aquabrowser serials solutions commercial blacklight university of virginia open-source (apache) encore innovative interfaces commercial extensible catalog university of rochester open-source (mit/gpl) libraryfind oregon state university open-source (gpl) metalib ex libris commercial primo ex libris commercial summon serials solutions commercial vufind villanova university open-source (gpl) worldcat local oclc commercial table 2. some existing back-end search servers name company/institution commercial status endeca endeca technologies commercial idol autonomy commercial lucene apache foundation open-source (apache) search server microsoft commercial search server express microsoft free solr (superset of lucene) apache foundation open-source (apache) sphinx sphinx technologies open-source (gpl) xapian community open-source (gpl) zebra index data open-source (gpl) 174 information technology and libraries | december 2009 development models. the resulting design, while foundationally solid, was limited in scope and depth because of time constraints. another option, instead of developing multipac, would have been to demonstrate an existing open-source discovery system. the advantage of this approach is that the final product would have been considerably more advanced than anything we could have developed ourselves in six weeks. on the other hand, it might not have provided a comparable learning opportunity. n survey of similar systems were its development to continue, multipac would find itself among an increasingly crowded field of competitors (table 1). a number of library discovery systems already exist, most backed by open-source or commercially available back-end search engines (table 2), which handle the nitty-gritty, low-level ingestion, indexing, and retrieval. these lists of systems are by no means comprehensive and do not include notable experimental or research systems, which would make them much longer. n architecture in terms of how they carry out a search, meta-search applications can be divided into two main groups: distributed (or federated search), in which searches are “broadcast” to individual resources that return results in real time (figure 4); and harvested search, in which searches are carried out against a local index of resource contents (figure 5).3 both have advantages and disadvantages beyond the scope of this article. multipac takes the latter approach. it consists of three primary components: the search server, the user interface, and the metadata harvesting system (figure 6). figure 4. the federated search process figure 5. the harvested search process figure 6. the three main components of multipac figure 3. 
patron-centric resource provisioning information discovery insights gained from multipac | dolski 175 n search server after some research, solr was chosen as the search server because of its ease of use, proven library track record, and http–based representational state transfer (rest) application programming interface (api), which improves network-topological flexibility, allowing it to be deployed on a different server than the front-end web application—an important consideration in our server environment.4 jetty—a java web application server bundled with solr—proved adequate and convenient for our needs. the metadata schema used by solr can be customized. we derived ours from the unqualified dublin core metadata element set (dcmes),5 with a few fields removed and some fields added, such as “library” and “department,” as well as fields that support various multipac features, such as thumbnail images, and primary record urls. dcmes was chosen for its combination of generality, simplicity, and familiarity. in practice, the solr schema is for finding purposes only, so whether it uses a standard schema is of little importance. n user interface the front-end multipac system is written in php 5.2 in a model-view-controller design based on classical object design principles. to support modularity, new resources can be added as classes that implement a resource-class interface. the multipac html user interface is composed of five views: search, browse, results, item, and list, which exist to accommodate the finding process illustrated in figure 7. each view uses a custom html template that can be easily styled by nonprogrammer web designers. (needless to say, judging by figures 8–12, they haven’t been.) most dynamic code is encapsulated within dedicated “helper” methods in an attempt to decouple the templates from the rest of the system. output formats, like resources, are modular and decoupled from the core of the system. the html user interface is one of several interfaces available to the multipac system; others include xml and json, which effectively add web services support to all encompassed resources—a feature missing from many of the resources’ own built-in interfaces.6 n search view search view (figure 8) is the simplest view, serving as the “front page.” it currently includes little more than a brief introduction and search field. the search field is not complicated; it is, in fact, possible to include search forms on any webpage and scope them to any subset of resources on the basis of facet queries. for example, a search form could be scoped to las vegas–related resources in special collections, which would satisfy the demand of some library departments for custom search engines tailored to their resources without contributing to the “interface fragmentation” effect discussed in the introduction. (this would require a higher level of metadata quality than we currently have, which will be discussed in depth later.) because search forms can be added to any page, this view is not essential to the multipac system. to improve simplification, it could be easily removed and replaced with, for example, a search form on the library homepage. n browse view browse view (figure 9) is an alternative to search view, intended for situations in which the user lacks a “concrete target” (figure 7). as should be evident by its appearance, figure 7. the information-finding process supported by multipac figure 8. 
the multipac search view page 176 information technology and libraries | december 2009 this is the least-developed view, simply displaying facet terms in an html unordered list. notice the facet terms in the format field; this is malprocessed, marc– encoded information resulting from a quick-and-dirty extensible stylesheet language (xsl) transformation from marcxml to solr xml. n results view the results page (figure 10) is composed of three columns: 1. the left column displays a facet list—a feature generally found to be highly useful for results-gathering purposes.7 the data in the list is generated by solr and transformed to an html unordered list using php. the facets are configurable; fields can be made “facetable” in the solr schema configuration file. 2. the center column displays results for the current search query that have been provided by solr. thumbnails are available for resources that have them; generic icons are provided for those that do not. currently, the results list displays item title and description fields. some items have very rich descriptions; others have minimal descriptions or no descriptions at all. this happens to be one of several significant metadata quality issues that will be discussed later. 3. the right column displays results from nonindexed resources, including any that it would not be feasible to index locally, such as google, our article databases, and so on. multipac displays these resources as collapsed panes that expand when their titles are clicked and initiate an ajax request for the current search query. in a situation in which there might be twenty or more “panes” to load, performance would obviously suffer greatly if each one had to be queried each time the results page loaded. the on-demand loading process greatly speeds up the page load time. currently, the right column includes only a handful of resource panes—as many as could be developed in six weeks alongside the rest of the prototype. it is anticipated that further development would entail the addition of any number of panes—perhaps several dozen. the ease of developing a resource pane can vary greatly depending on the resource. for developerfriendly resources that offer a useful javascript object notation (json) api, it can take less than half an hour. for article databases, which vendors generally take great pains to “lock down,” the task can entail a two-day marathon involving trial-and-error http-request-token authentication and screen-scraping of complex invalid html. in some cases, vendor license agreements may prohibit this kind of use altogether. there is little we can do about this; clearly, one of multipac’s severest limitations is its lack of adeptness at searching these types of “closed” remote resources. n item view item view (figure 11) provides greater detail about an individual item, including a display of more metadata fields, an image, and a link to the item in its primary context, if available. it is expected that this view also would include holdings status information for opac resources, although this has not been implemented yet. the availability of various page features is dependent on values encoded in the item’s solr metadata record. for example, if an image url is available, it will be displayed; if not, it won’t. an effort was made to keep the view logic separate from the underlying resource to improve code and resource maintainability. the page template itself does not contain any resource-dependent conditionals. 
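as a rough illustration of how a front end like this can talk to solr over its http rest api, the hedged php sketch below issues a faceted, scoped query and walks the json response. the solr host, core location, and field names ("format," "library," "title") are assumptions suggested by the schema description in this article, not multipac's actual configuration.

<?php
// minimal sketch of a faceted, scoped solr query over http.
// the solr url and the field names are assumed for illustration only.
$params = array(
    'q'           => 'nevada mining',
    'wt'          => 'json',      // ask solr for a json response
    'rows'        => 20,
    'facet'       => 'true',
    'facet.field' => 'format',    // a field made "facetable" in the solr schema
    'fq'          => 'library:"special collections"', // scope the search, as a scoped search form might
);
$url      = 'http://localhost:8983/solr/select?' . http_build_query($params);
$response = json_decode(file_get_contents($url), true);

// center column: the result documents (title assumed to be single-valued)
foreach ($response['response']['docs'] as $doc) {
    echo $doc['title'] . "\n";
}

// left column: facet terms, returned as a flat list alternating term and count
$facets = $response['facet_counts']['facet_fields']['format'];
for ($i = 0; $i < count($facets); $i += 2) {
    echo $facets[$i] . ' (' . $facets[$i + 1] . ")\n";
}
?>

the same kind of request, with different fq values, is what would let individual departments embed search forms scoped to their own materials without adding yet another interface.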
n list view list view (figure 12), essentially a “favorites” or “cart” view, is so named because it is intended to duplicate the list feature of unlv libraries’ innovative millennium figure 9. the multipac browse view page information discovery insights gained from multipac | dolski 177 opac. the user can click a button in either results view or item view to add items to the list, which is stored in a cookie. although currently not feature-rich, it would be reasonable to expect the ability to send the list as an e-mail or text message, as well as other features. n metadata harvesting system for metadata to be imported into solr, it must first be harvested. in the harvesting process, a custom script checks source data and compares it with local data. it downloads new records, updates stale records, and deletes missing records. not all resources support the ability to easily check for changed records, meaning that the full record set must be downloaded and converted during every harvest. in most cases, this is not a problem; most of our resources (the library catalog excluded) can be fully dumped in a matter of a few seconds each. in a production environment, the harvest scripts would be run automatically every day or so. in practice, every resource is different, necessitating a different harvest script. the open archives initiative protocol for metadata harvesting (oai-pmh) is the protocol that first jumps to mind as being ideal for metadata harvesting, but most of our resources do not support it. ideally, we would modify as many of them as possible to be oai–compliant, but that would still leave many that are out of our hands. either way, a substantial number of custom harvest scripts would still be required. for demonstration purposes, the multipac prototype was seeded with sample data from a handful of diverse resources: 1. a set of 16,000 marc records from our library catalog, which we converted to marcxml and then to solr xml using xsl transformations 2. our locally built las vegas architects and buildings database, a mysql database containing more than 10,000 rows across 27 tables, which we queried and dumped into xml using a php script 3. our locally built special collections database, a smaller mysql database, which we dealt with the same way 4. our contentdm digital collections, which we downloaded via oai-pmh and transformed using another custom xsl stylesheet there are typically a variety of conversion options for each resource. because of time constraints, we simply chose what we expected would be the quickest route for each, and did not pay much attention to the quality of the conversion. n how multipac answers unlv libraries’ discovery questions multipac has essentially proven its capability of solving interface multiplication and fragmentation issues. figure 10. the multipac results view page 178 information technology and libraries | december 2009 by adding a layer of abstraction between resource and patron, it enables us to reference abstract resources instead of their specific implementations—for example, “the library catalog” instead of “the innopac catalog.” this creates flexibility gains with regard to resource provision and deployment. this kind of “pervasive decoupling” can carry with it a number of advantages. first, it can allow us to provide custom-developed services that vendors cannot or do not offer. second, it can prevent service interruptions caused by maintenance, upgrades, or replacement of individual back-end resources. 
third, by making us less dependent on specific implementations of vendor products—in other words, reducing vendor “lock-in”—it can potentially give us leverage in vendor contract negotiations. because of the breadth of information we offer from our website gateway, we as a library are particularly sensitive about the continued availability of access to our resources at stable urls. when resources are not persistent, patrons and staff need to be retrained, expectations need to be adjusted, and hyperlinks—scattered all over the place—need to be updated. by decoupling abstract resources from their implementations, multipac becomes, in effect, its own persistent uri system, unifying many library resources under one stable uri schema. in conjunction with a url rewriting system on the web server, a resource-based uri schema (figure 13) would be both powerful and desirable.8 n lessons learned in the development of multipac the lessons learned in the development of multipac fall into three main categories, listed here in order of importance. metadata quality considerations quality metadata—characterized by unified schemas; useful crosswalking; and consistent, thorough description—facilitates finding and gathering. in practice, a surrogate record is as important as the resource it describes. below a certain quality threshold, its accompanying resource may never be found, in which case it may as well not exist. surrogate record quality influences relevance ranking and can mean the difference between the most relevant result appearing on page 1 or page 50 (relevance, of course, being a somewhat disputed term). solr and similar systems will search all surrogates, including those that are of poor quality, but the resulting relevancy ranking will be that much less meaningful. figure 13. example of an implementation-based vs. resource-based uri implementation-based http://www.library.unlv.edu/arch/archdb2/index.php/projects/view/1509 resource-based (hypothetical) http://www.library.unlv.edu/item/483742 figure 11. the multipac item view page figure 12. the multipac list view page information discovery insights gained from multipac | dolski 179 metadata quality can be evaluated on several levels, from extremely specific to extremely broad (figure 14). that which may appear to be adequate at one level may fail at a higher level. using this figure as an example, multipac requires strong adherence to level 5, whereas most of our metadata fails to reach level 4. a “level 4 failure” is illustrated in table 3, which compares sample metadata records from four different multipac resources. empty cells are not necessarily “bad”— not all metadata elements apply to all resources—but this type of inconsistency multiplies as the number of resources grows, which can have negative implications for retrieval. suggestions for improving metadata quality the results from the multipac project suggest that metadata rules should be applied strictly and comprehensively according to library-wide standards that, at our libraries, have yet to be enacted. surrogate records must be treated as must-have (rather than nice-to-have) features of all resources. resources that are not yet described in a system that supports searchable surrogate records should be transitioned to one that does; for example, html webpages should be transitioned to a content management system with metadata ascription and searchability features (at unlv, this is planned). 
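one concrete, low-cost step toward the library-wide metadata standards recommended above is an automated completeness audit run as part of each harvest. the php sketch below is illustrative only; the required-field list and the input format (one json record per line) are assumptions, not part of the multipac harvesting system.

<?php
// minimal sketch: audit harvested surrogate records for missing fields.
// the required-field list and the input format (one json record per line)
// are assumptions for illustration, not multipac's actual harvest output.
$required   = array('title', 'description', 'format', 'library');
$problems   = array();
$lineNumber = 0;

foreach (file('harvested_records.jsonl') as $line) {
    $lineNumber++;
    $record = json_decode($line, true);
    if ($record === null) {
        $problems[] = "line $lineNumber: unparseable record";
        continue;
    }
    foreach ($required as $field) {
        $value = isset($record[$field]) ? $record[$field] : '';
        if (is_array($value)) {
            $value = implode('', $value);  // multi-valued fields pass if any value is non-empty
        }
        if (trim($value) === '') {
            $problems[] = "line $lineNumber: missing or empty '$field'";
        }
    }
}

echo count($problems) . " problems found\n";
foreach ($problems as $problem) {
    echo $problem . "\n";
}
?>

a report like this makes metadata gaps visible to the contributing units at harvest time, instead of letting them surface later as weak relevance ranking or unfindable items.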
however, it is not enough for resources to have high-quality metadata if not all schemas are in sync. there exist a number of resources in our library that are well-described but whose schemas do not mesh well with other resources. different formats are used; different descriptive elements figure 14. example scopes of metadata application and evaluation, from broad (top) to specific table 3. comparing sample crosswalked metadata from four different unlv libraries resources library catalog digital collections special collections database las vegas architects & buildings database title goldfield: boom town of nevada map of tonopah mining district, nye county, nevada 0361 : mines and mining collection flamingo hilton las vegas creator paher, stanley w. booker & bradford call number f849.g6p34 contents (item-level description of contents) format digital object photo collections database record language eng eng eng coverage tonopah mining district (nev.) ; ray mining district (nev.) description (omitted for brevity) publisher nevada publications university of nevada las vegas libraries unlv architecture studies library subject (lcsh omitted for brevity) (lcsh omitted for brevity) 180 information technology and libraries | december 2009 are used; and different interpretations, however subtle, are made of element meanings. despite the best intentions of everyone involved with its creation and maintenance, and despite the high quality of many of our metadata records when examined in isolation, in the big picture, multipac has demonstrated—perhaps for the first time—how much work will be needed to upgrade our metadata for a discovery system. would the benefits make the effort worthwhile? would the effort be implementable and sustainable given the limitations of the present generation of “silo” systems? what kind of adjustments would need to be made to accommodate effective workflows, and what might those workflows look like? these questions still await answers. of note, all other open-source and vendor systems suffer from the same issues, which is a key reason that these types of systems are not yet ascendant in libraries.9 there is much promise in the ability of infrastructural standards like frbr, skos, rda, and the many other esoteric information acronyms to pave the way for the next generation of library discovery systems. organizational considerations electronic information has so far proved relatively elusive to manage; some of it is ephemeral in existence, most of it is constantly changing, and all of it is from diverse sources. attempts to deal with electronic resources—representing them using catalog surrogate records, streamlining website portals, farming out the problem to vendors—have not been as successful as they have needed to be and suffer from a number of inherent limitations. multipac would constitute a major change in library resource provision. our library, like many, is for the most part organized around a core 1970s–80s ils–support model that is not well adapted to a modern unified discovery environment. next-generation discovery is trending away from assembly-line-style acquisition and processing of primarily physical resources and toward agglomerating interspersed networked and physical resource clouds from onand offsite.10 in this model, increasing responsibilities are placed on all content providers to ensure that their metadata conforms to site-wide protocols that, at our library, have yet to be developed. 
n conclusion in deciding how to best deal with discovery issues, we found that a traditional product matrix comparison does not address the entire scope of the problem, which is that some of the discoverability inadequacies in our libraries are caused by factors that cannot be purchased. sound metadata is essential for proper functioning of a unified discovery system, and descriptive uniformity must be ensured on multiple levels, from the element level to the institution level. technical facilitators of improved discoverability already exist; the responsibility falls on us to adapt to the demands of future discovery systems. the specific discovery tool itself is only a facilitator, the specific implementation of which is likely to change over time. what will not change are library-wide metadata quality issues that will serve any tool we happen to deploy. the multipac project brought to light important library-wide discoverability issues that may not have been as obvious before, exposing a number of limitations in our existing metadata as well as giving us a glimpse of what it might take to improve our metadata to accommodate a next-generation discovery system, in whatever form that might take. references 1. unlv libraries usability committee, internal library website usability testing, las vegas, 2008. 2. karen calhoun, “the changing nature of the catalog and its integration with other discovery tools.” report prepared for the library of congress, 2006. 3. xiaoming liu et al., “federated searching interface techniques for heterogeneous oai repositories,” journal of digital information 4, no. 2 (2002). 4. apache software foundation, apache solr, http://lucene .apache.org/solr/ (accessed june 11, 2009). 5. dublin core metadata initiative, “dublin core metadata element set, version 1.1,” jan. 14, 2008, http://dublincore.org/ documents/dces/ (accessed june 25, 2009). 6. lorcan dempsey, “a palindromic ils service layer,” lorcan dempsey’s weblog, jan. 20, 2006, http://orweblog.oclc .org/archives/000927.html (accessed july 15, 2009). 7. tod a. olson, “utility of a faceted catalog for scholarly research,” library hi tech 4, no. 25 (2007): 550–61. 8. tim berners-lee, “hypertext style: cool uris don’t change,” 1998, http://www.w3.org/provider/style/uri (accessed june 23, 2009). 9. bowen, jennifer, “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1,” information technology and libraries 2, no. 27 (june 2008): 6–19. 10. calhoun, “the changing nature of the catalog.” evaluation of the new jersey digital highway | jeng 17 judy jeng evaluation of the new jersey digital highway the aim of this research is to study the usefulness of the new jersey digital highway (njdh, www.njdigitalhigh way.org) and its portal structure. the njdh intends to provide an immersive and user-centered portal for new jersey history and culture. the research recruited 145 participants and used a web-based questionnaire that contained three sections: for everyone, for educators, and for curators. the feedback on the usefulness of the njdh was positive and the portal structure was favorable. the research uncovered several reasons why some collections did not want to or could not participate. the findings also suggested priorities for further development. this study is one of the few on the evaluation of cultural heritage digital library. 
t he new jersey digital highway (njdh, www .njdigitalhighway.org) is a digital library for new jersey history and culture, including collections of new jersey libraries, museums, archives, and historical societies. the njdh, funded in part by the 2003 national leadership grant of the institute for museum and library services, is a joint project by new jersey state library, the new jersey division of archives and records management at rutgers university libraries, the new jersey historical society, and the american labor museum. as part of the project, the njdh identifies 686 cultural heritage institutions (public libraries, archives, historical societies, and museums). as of november 2007, there are more than ten thousand objects (pictures, records, and oral histories) in the repository. more are being added daily. the njdh, at this writing, is still very much a work in process. the principal investigator of this project continues to extend opportunities to more communities to link their sites and scan their images.1 the njdh provides portals for four different groups of people: everyone, educators, students, and librarians and curators. its mission is to develop an immersive, user-centered information portal and to support the new jersey learner through a collaboration among cultural heritage institutions that supports preservation of the past, new access strategies for the future, and active engagement with resources at the local and the global level for shared access and local ownership. the njdh uses fedora (flexible extensible digital object repository architecture) as a platform to mount participating institutions’ digital objects and metadata. fedora is developed jointly by cornell university and the university of virginia and is currently supported through an andrew w. mellon foundation grant that is customizable and allows local institutions to have true control over what they digitize and post.2 fedora is built on xml with core standards that support flexibility and interoperability such as mets (metadata encoding and transmition standard, www.loc.gov/standards/ mets) and oai-pmh (open archives initiative protocol for metadata harvesting, www.openarchives.org) functions. fedora is chosen for the njdh because it can effectively accommodate and manage a broad array of information sources with the flexibility to integrate with other information repositories. the njdh uses a metadata structure based on mods (metadata open description schema, www.loc.gov/ standards/mods), mets, niso, and premis (preservation metadata, www.loc.gov/standards/premis) metadata standards to support preservation of digital objects, to ensure scalability for projects and interoperability with other systems through oai-pmh. this hybrid approach enables njdh collection managers and metadata creators to provide information through multiple presentation standards in a schema easily understood within distinctive cultural heritage organization communities. mods is used for descriptive metadata, provides and retains standard bibliographic cataloging principles, and is therefore easily mapped to marc. the njdh therefore includes a mapping utility that allows the export of records from the njdh to online catalogs for any organization that wants to make its digital objects accessible within its integrated library system. 
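the export path just described, descriptive mods records mapped out toward marc so a contributing organization can load them into its integrated library system, can be sketched in a few lines of php. the sketch below is not the njdh mapping utility; it covers only a handful of elements, and the field correspondences follow the general mods-to-marc conventions (title to 245, personal name to a 100 or 700 field, topical subject to 650, abstract to 520) rather than any njdh-specific profile.

<?php
// minimal sketch: pull a few descriptive elements from a mods record and
// print them as marc-like field/subfield pairs for export.
// illustrative only; this is not the njdh's actual mapping utility.
$mods = simplexml_load_file('record-mods.xml');   // hypothetical input file
$mods->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');

// a few common mods-to-marc correspondences
// (simplified: every personal name is mapped to 100 here)
$map = array(
    '245 $a' => '//mods:titleInfo/mods:title',
    '100 $a' => '//mods:name[@type="personal"]/mods:namePart',
    '650 $a' => '//mods:subject/mods:topic',
    '520 $a' => '//mods:abstract',
);

foreach ($map as $marcField => $xpath) {
    foreach ($mods->xpath($xpath) as $node) {
        echo $marcField . ' ' . trim((string) $node) . "\n";
    }
}
?>

a real crosswalk would also handle indicators, repeated and added entries, and encoding details, but the basic element-to-field mapping is the core of any such export.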
additionally, there are four other types of metadata in njdh: source metadata describes provenance, condition, and conservation of analog source materials such as photographs, books, maps, audio, and video; technical metadata describes born digital images and provides information about the digital master files that will be maintained for long-term preservation and access; rights metadata identifies the rights holder(s) for each information source, identifies the permissions for use including any restrictions, and documents the copyright status of each work; digital provenance metadata provides a digital “audit trail” of any changes to the metadata.3 the use of the njdh has steadily grown and has some three thousand unique visitors a month averaging eight to ten thousand visits per month.4 n prior cultural heritage digital library evaluations literature review indicates that few researchers have investigated the usability or the evaluation of cultural judy jeng (jjeng@njcu.edu) is head of collection services, new jersey city university, new jersey. 18 information technology and libraries | december 2008 heritage digital libraries. the minerva (ministerial network for valorising activities) project proposed a number of criteria and principles specifically for usability evaluations of cultural web applications, including visibility, affordance, natural mapping, constraints, conceptual models, feedback, safety, flexibility, the scope and aim of the site, meaningful organization of the website’s functions, quality of content (for example, consistency, completeness, conciseness, accuracy, objectivity), design of functional layout, consistent use of graphics and multimedia components, as well as provision for navigation tools and search mechanisms.5 in addition, vaki, dallas, and dalla proposed sixteen usability guidelines for cultural applications.6 garoufallou, siatri, and balatsoukas reported their research on the user interface of the veriagrid application.7 the veriagrid system (www.theveriagrid.org) is a platform based on digital cartography that supports a vector map of the city of veria organized by layers and linked to multimedia objects such as text, images, photos, and video clips. the researchers were interested in learnability, errors, and satisfaction. n usefulness as the primary evaluation criterion for the njdh the njdh aims to serve heterogeneous communities and information needs. like other digital cultural services, it is not easy to address usability issues. lynch has said that digital libraries of cultural heritage don’t really have natural communities around them and that digital materials find their own unexpected user communities.8 garoufallou, siatri, and balatsoukas said that “different types of users, such as students and scholars or tourists and travelers look at these services from different angles (for example, scholarly or recreational needs). thus, the provision of accessible and user-friendly systems is important for the wider use and acceptance of these services.”9 the aim of this evaluation was to assess usefulness of the njdh from the perspectives of general users, educators, and cultural heritage professionals. 
usefulness is one of the criteria of usability with a focus on “did it really help me?” and “was it worth the effort?” usefulness differs from usableness in that usableness refers to functions such as “can i turn it on?” or “can i invoke that function?” usefulness can also mean “serving an intended purpose.” in the technology acceptance model (tam) developed by davis and his colleagues, perceived usefulness refers to the extent to which an information system will enhance a user’s performance.10 in addition to usefulness and usableness, jeng has gathered a comprehensive collection of usability criteria such as effectiveness, efficiency, satisfaction, learnability, ease of use, memorability, mistake recovery, and interface effectiveness.11 usability is a multidimensional construct and has a theoretical root in human–computer interaction. although usefulness may be an important evaluation criterion, thomas and jeng report that usefulness is an often overlooked criterion of usability.12 literature review indicates that usefulness has been used as either the primary or one of the criteria in the following evaluations of digital libraries: elibraryhub, the digital work environment, grow (geotechnical, rock, and water engineering, www.grow.arizona.edu), mcmaster university library’s gateway, the miguel de cervantes virtual library, minnesota’s foundations project, and the moving image collections.13 this paper reports the evaluation of the njdh. n research method a web-based online survey was conducted in september– december 2006. the questionnaire was designed, collected, and analyzed using web-based software called surveymonkey. convenience sampling method was used in this study. subjects were recruited by posting a link on the njdh website, by posting announcements on a number of electronic discussion lists for educators and cultural heritage professionals, and by word-of-mouth invitations. the participants were asked to complete a two-part questionnaire. the first part gathered demographic data such as gender, age, ethnic background, educational background, the county they live in, and how they learned about the njdh. the second part contained three sections: one for everyone, one for educators, and one for cultural heritage professionals. the section for everyone contained twenty-six questions, including seven-point likert scales and open-ended questions with a focus on the digital library’s usefulness, navigation, design, terminology, and user lostness. in addition to this general section, educators were asked to complete another fifteen questions pertaining specifically to the educators’ portal; the cultural heritage professionals had another thirteen questions regarding the librarians and curators’ portal. a total of 145 individuals participated in the survey, of which 32 were educators (22%) and 28 (20%) were cultural heritage professionals. the participants were mostly white (127 respondents or 89%), mostly female (118 respondents or 81%), and most had a master’s or doctoral degree (114 respondents or 79%). in terms of age distribution, more than half of the participants were over 50 (79 respondents or 55%) (see table 1). nearly all (136 respondents or 94%) were residents of new jersey. 
evaluation of the new jersey digital highway | jeng 19 among the educators that participated in this survey who evaluated the educators’ portal, 56% (18 respondents) worked at colleges or universities, 16% (5 respondents) worked at high schools, 13% (4 respondents) worked at elementary or middle schools, and 6% (2 respondents) identified themselves as specialists in museums, libraries, or archives. roughly a third (10 respondents or 31%) were teachers, 3% (1 respondent) was a teaching assistant, 13% (4 respondents) were school administrators, and 28% (9 respondents) were school library media specialists or librarians (see table 2). in terms of what they teach, 27% (7 respondents) teach new jersey history, 23% (6 respondents) teach social studies, 12% (3 respondents) teach civics, 8% (2 respondents) teach geography, and 8% (2 respondents) teach popular culture. as to the survey participants who identified themselves as cultural heritage professionals, 61% (17 respondents) worked at libraries, 11% (3 respondents) worked at museums, 11% (3 respondents) worked at historical societies, and 4% (1 respondent) worked with archives. in terms of their roles at those organizations, 61% (17 respondents) said they were faculty or staff, 18% (5 respondents) were administrators, one was a consultant, one was a librarian, and one was a volunteer (see table 3). n findings how do users find out about the njdh and will they come back? the survey found that more than half of the respondents (58 participants or 40%) learned about the njdh from their colleagues or friends, 19 participants (13%) learned through attending conferences, 16 participants (11%) were linked from other websites (see figure 1). the njdh digital library intends to build rich and “one stop shop” digital collections of new jersey history and culture. cultural heritage digital library plays a particularly important role for students of the humanities because the digital library is the humanist’s laboratory, its resources are the scholar’s primary data.14 it is important to enhance users’ awareness of this digital library among new jerseyans and even promote this cultural heritage digital library to users at global level. table 1. demographic data (n = 145) total % gender male 27 18.6 female 118 81.4 age 18–24 1 0.7 25–49 63 44.1 50–64 74 51.7 65+ 5 3.5 ethnic background white 127 89.4 african american 5 3.5 asian 6 4.2 hispanic 3 2.1 native american 1 0.7 education high school 5 3.4 associate’s degree 7 4.8 bachelor’s degree 19 13.1 master’s or phd degree 114 78.6 in terms of the purposes of visiting the njdh, the study found 72 respondents (76%) were just browsing and 23 respondents (24%) were looking for specific information such as a specific county information, history, and family genealogy (see figure 2). seventy-two respondents (74%) replied that they will come back to use the njdh again (see figure 3). those who said “no” gave reasons such as their doubts on whether the information in the njdh is reliable and authoritative, the depth and breadth of content in this digital library, and the inconsistency of fonts and font sizes. n navigation navigation has been reported in literature as a common problem in a digital library. users could accidentally leave the digital library, following the links to other web-based resources, and were unaware that they were no longer using the digital library. 
brinck, gergle, and wood report that disorientation is among the biggest frustrations for web users.15 20 information technology and libraries | december 2008 average 2.54 on a 7-point likert scale, 1 being easy to navigate and 7 being difficult to navigate). twentythree participants (25%) marked 1 on the likert scale, 28 participants (30%) marked 2, and 26 participants (28%) marked 3. these brought the total of the top three points to 83%. the overall response regarding user lostness was also not a problem (response average 2.42 on a 7-point likert scale, 1 being not lost at all and 7 being very lost). only two participants expressed they were very lost and one expressed lost. the reasons that could lead to user lostness include the lack of material in the collections so far, the need for explanation of how relevance is ranked, the home page being text heavy and cluttered, the photos not being legible, the lack of author information in documents, no indication of a trail of how one got there, lengthy urls, the need for better chosen direct links instead of layered links, and patrons’ unfamiliarity with icons and their functions. n layout the rating for the layout of the njdh was very positive (response average 2.54 on a 7-point likert scale, 1 being good and 7 being bad). however, the site may improve its appearance in the following areas: there is currently too much text per page (the font is too small and the use of typography, informational hierarchy, and white space must be improved); more important information needs to go at the top of pages; and more colors need to be used. n terminology the degree to which users interact with a digital library depends on how well users understand the terminology displayed on the system interface. literature review has indicated that the inappropriate use of jargon has been a common problem in digital library design. hartson, shivakumar, and pérez-quinones report from their usability inspection of the networked computer science technical reference library (www.ncstrol.org) that problems with wording accounted for 36% of the digital library’s usability problems.16 system designers often assume too much about the extent of user knowledge. the precise use of words in a user interface is one of the utmost important design considerations for usability. table 2. educators’ demographic data (n = 32) total % institutions university or college 18 56 high school 5 16 elementary or middle school 4 13 museums and others 2 6 no answer 3 9 total 32 100 roles teacher 10 31 teaching assistant 1 3 administrator 4 13 librarian 9 28 no answer 8 25 total 32 100 table 3. cultural heritage professional’s demographic data (n = 28) total % institutions library 17 61 museum 3 11 historical society 3 11 archives 1 4 others or no answer 4 14 roles faculty or staff 17 61 administrator 5 18 consultant 1 4 librarian 1 4 volunteer 1 4 no answer 3 11 this survey found the overall response regarding the navigation of the njdh was very positive (response evaluation of the new jersey digital highway | jeng 21 this research found that the overall response regarding terminology and labeling in the njdh was positive (response average 2.34 on a 7-point likert scale, 1 being clear and 7 being not clear). n usefulness usefulness was the fundamental research focus of this study. this research investigated whether the njdh was useful to the general public, educators, and students. 
the responses were overwhelmingly positive: 73% of the respondents gave 1–3 ratings on the 7-point likert scale (1 being useful and 7 being not useful)—30% (29 respondents) marked 1, 33% (32 respondents) marked 2, and 12% (12 respondents) marked 3. the average response was 2.63. this was a very positive response. when it comes to the specific section for educators to evaluate the educator’s portal, the rating was also positive (response average 3.04). those educators felt that the most useful information was the “how to” information for teaching with digital resources, research genealogy, developing an oral history, and so on. twelve respondents (44%) indicated they would encourage their students to use the njdh site for term papers or homework assignments. thirteen respondents (50%) indicated they would make their own lesson plans using the resources and information from the njdh. regarding the student’s portal, those educators who responded to the survey indicated that, from their perspectives, the most useful information for students was the general information about new jersey, including a directory of cultural heritage organizations, places to visit, etc. as for the librarians and curators’ portal, those cultural heritage professionals identified the librarians and curators’ resource center as the most useful resource in the njdh, followed by the digital highway collections roadmap and associated guidelines, calendar, the searching capabilities of new jersey cultural heritage organizations, and new jersey information. sixteen respondents (67%) said they would recommend this digital library to their patrons, two respondents (8%) won’t, and six respondents (25%) were not sure. it is obvious that the njdh administrators need to work harder in this area to enhance usefulness for cultural heritage professionals and their patrons. figure 1. where did you hear about njdh? the survey asked all respondents to suggest what themes should be enriched in the njdh collections. the suggestions were, in this order: new jersey history, new jersey state and county documents, new jersey culture, genealogy, everyday life in new jersey, new jersey industry, more immigration resources, education in new jersey, new jersey in wartime, and transportation. regarding the librarians and curators’ portal, the respondents suggested the contents of this particular portal should be enhanced in the following priority order: (1) more links to other websites with history resources and activities, (2) access to mentors experienced in digitizing and metadata who can provide one-to-one assistance, (3) a discussion list or blog where users can ask questions or share ideas with others, (4) information about training sessions around new jersey on digitization and metadata, (5) more resources on digital preservation and metadata, (6) educational activities that users can share with their patrons, (7) a tool for users to create their own interactive activities using the njdh resources, and (8) more information about helping patrons to use the njdh more effectively. n portal structure the njdh provides four portals for different target users: everyone, educators, students, and librarians and curators. each portal provides different interface and packages different information for a different type of user. the survey found 80% of the subjects understood the purpose of the four portals (by marking 1 or 2 on the 7-point likert scale) and only 4 participants (4%) found this type of portal structure confusing. 
the survey further found 65% of participants felt this kind of portal structure helpful to them. 22 information technology and libraries | december 2008 n why not contributing to the njdh collections? the respondents indicated that the barriers for them to contribute collections or resources to the njdh were, in this order: (1) lack of staff or time, (2) lack of funding, (3) lack of knowledge, and (4) copyright concerns. n statistical analyses the study found demographic factors, such as age, gender, ethnic background, and educational level, do not have significant effects on a number of areas: (1) how the participants ranked usefulness of the digital library, (2) usefulness evaluation of the four-portal structure, (3) understanding of terminology, (4) ease of navigation, and (5) lostness. the study found the correlation between navigation and lostness was statistically significant: r (66) = .83, p < .001. when a user felt the system easy to navigate, the user felt less lost. the study also found usefulness of the digital library has a statistically significant effect on a user ’s return decision. a one-way analysis of variance was conducted. the analysis of variance was significant, f (2, 59) = 20.42, p < .001. the strength of relationship between usefulness ranking and the decision of whether to revisit the digital library, as assessed by n2, was strong, with the usefulness factor accounting for 41% of the variance of the return decision. because the overall f test was significant, follow-up tests were conducted to evaluate pairwise differences among the means. using the turkey test, the pairwise comparisons yes vs. no and yes vs. not sure were significant. the pairwise comparison no vs. not sure was not significant. n conclusions usability evaluation is a user-centered evaluation to learn from users’ needs, expectations, and satisfaction. this research studied usefulness, navigation, user lostness, terminology, and layout. the overall response was positive, and the finding was that the njdh was useful in providing new jersey history and culture information. designers of the njdh learned from the study the priorities of adding various new jersey themes to the collections and how to make the site easier to use. as a result of the study, lifelong learners are identified as an important target audience. this research provided insights on why people came to use this particular digital library, their pleasure of using it, how to improve ease-of-use, navigation, website appearance, and the use of terminology and labeling. the front page of the website was redesigned to address the overuse of text on each page. the study also helped to discover what components of the site were more useful and why. furthermore, it investigated why some museums or collections in new jersey have not participated in this digital library development project. as a result of the study, more emphasis has been placed on building tools figure 2. purpose of the most recent visit figure 3. will you use njdh again? evaluation of the new jersey digital highway | jeng 23 to increase independent collection contribution by museums and archives. the observations of this study may help the development of other academic digital libraries because the barriers found in the study are common obstacles. after eighteen months of the study, the njdh governance planning committee still uses the evaluation report to address more complex and fundamental changes and the reorganization of the digital library. 
the study confirmed that users of this digital library appreciated the idea of providing different portals for different users. the study did not find demographic factors (age, gender, ethnic background, and educational level) play statistically significant roles in the usefulness rankings of the digital library or portal structure, terminology, ease of use, or user lostness. the study found there was a strong correlation between ease of navigation and user lostness. users don’t have feelings of lostness when a system is easy to navigate. the study also found users will come back to revisit a digital library when they find the site is useful. n acknowledgments judy jeng and grace agnew were the codesigners of the questionnaire for this study. judy served as the evaluation consultant for the njdh. grace agnew, the associate university librarian for digital library systems at rutgers university, was the principal investigator of the njdh. the njdh received funding from institute of museum library services grant lg30-03-0269-03. references 1. linda langschied, “history and high-tech intersect on the new jersey digital highway,” www.imls.gov/profiles/ nov07.shtm (accessed aug. 12, 2008). 2. linda langschied and ann montanaro, “the new jersey digital highway: a next-generation approach to statewide digital library development,” microform & imaging review 34, no. 4 (2005): 167–73. 3. the new jersey digital highway: final report on imls grant #lg30-03-0269-03, www.njdigitalhighway.org/documents/ njdh-final_report_www_version.pdf (accessed aug. 12, 2008). 4. ibid. 5. minerva working group 5, handbook for quality in cultural web sites improving quality for citizens: version 1.2—draft. (2003), www.minervaeurope.org/publications/ qualitycriteria1_2draft/qualitypdf1103.pdf (accessed aug. 12, 2008). 6. elina vaki, costis dallas, and christina dalla, calimera: cultural applications: local institutions mediating electronic resources: deliverable d 18: usability guidelines, www.calimera .org/lists/resources%20library/the%20end%20user%20 experience,%20a%20usable%20community%20memory/ usability%20guidelines.pdf (accessed aug. 12, 2008). 7. emmanouel garoufallou, rania siatri, and panagiotis balatsoukas, “virtual maps—virtual worlds: testing the usability of a greek virtual cultural map,” journal of the american society for information science and technology 59, no. 4 (2008): 591–601. 8. clifford lynch, “digital collections, digital libraries and the digitization of cultural heritage information,” first monday 7, no. 5 (2002), www.firstmonday.org/issues/issue7_5/lynch/ (accessed aug. 12, 2008). 9. garoufallou, siatri, and balatsoukas, “virtual maps— virtual worlds,” 591–601. 10. fred d. davis, “perceived usefulness, perceived ease of use, and user acceptance of information technology,” mis quarterly 13, no. 3 (1989): 319–40; fred d. davis, richard p. bagozzi, and paul r. warshaw, “user acceptance of computer technology: a comparison of two theoretical models,” management science 35, no. 8 (1989): 982–1003. 11. judy jeng, “usability of the digital library: an evaluation model” (phd diss., rutgers university, 2006): 10–19; judy jeng, “usability assessment of academic digital libraries: effectiveness, efficiency, satisfaction, and learnability,” libri: international journal of libraries and information services 55, no. 2/3 (2005): 96–121; judy jeng, “what is usability in the context of the digital library and how can it be measured?” information technology and libraries 24, no. 2 (2005): 47–56. 12. 
rita leigh thomas, “elements of performance and satisfaction as indicators of the usability of digital spatial interfaces for information-seeking: implications for isla” (phd diss., univ. of southern california, 1998); judy jeng, “usability of the digital library: an evaluation model” (phd diss., rutgers university, 2006): 33. 13. yin-leng theng, mei-yee chan, ai-ling khoo, and raju buddharaju, “quantitative and qualitative evaluations of the singapore national library board’s digital library,” in design and usability of digital libraries: case studies in the asia pacific, ed. yin-leng theng and schubert foo (hershey, pa.: information science publishing, 2005): 334–49.; n. meyyappan, schubert foo, and g. g. chowdhury, “design and evaluation of a taskbased digital library for the academic community,” journal of documentation 60, no. 4 (2004): 449–75; janice lodato, “creating an educational digital library: grow a national civil engineering education resource library,” (paper presented at the conference on human factors in computing systems, vienna, austria, apr. 24–29, 2004), in the acm digital library, http://portal.acm.org/citation.cfm?id=985942&coll=portal&dl =acm&cfid=32427354&cftoken=28824529 (accessed aug. 12, 2008); brian detlor et al., fostering robust library portals: an assessment of the mcmaster university library gateway (hamilton, ont.: michael g. degroote school of business, mcmaster university, 2003); álvaro quijano-solís and raúl novelo-peña, “evaluating a monolingual multinational digital library by using usability: an exploratory approach from a developing country,” the international information & library review 37, no. 4 (2005): 329–36; eileen quam, “informing and evaluating a metadata initiative: usability and metadata studies in minnesota’s foundations project,” government information quarterly 18, no. 24 information technology and libraries | december 2008 3 (2001): 181–94; judy jeng, “metadata usefulness evaluation of the moving image collections” (paper presented at the new jersey library association annual conference, long branch, new jersey, apr. 23–25, 2007), www.njla.org/conference/2007/ presentations/metadata.pdf (accessed aug. 12, 2008). 14. gregory crane and clifford wulfman, “towards a cultural heritage digital library,” proceedings of the 3rd acm/ ieee-cs joint conference on digital libraries, in the acm digital library, http://delivery.acm.org/10.1145/830000/827150/p75 -crane.pdf?key1=827150&key2=9784876911&coll=acm&dl=a cm&cfid=8598346&cftoken=44546164 (accessed aug. 12, 2008). 15. tom brinck, darren gergle, and scott d. wood, designing web sites that work: usability for the web (san francisco: morgan kaufmann, 2002). 16. h. rex hartson, priy a. shivakumar, and manuel a. pérez-quinones, “usability inspection of digital libraries: a case study,” international journal on digital libraries 4, no. 2 (2004): 108–23. lib-mocs-kmc364-20131012113626 236 news and announcements programmers discussion group meets: pl/1, the marc format, and holdings twenty-two computer programmers, analysts, and managers met on june 29 in san francisco for the formative meeting of the lit a/isas programmers discussion group. in an informal and informative hour, the group established ground rules, started a mailing list, planned the topic for midwinter 1982, and found out more about practice<> in fifteen library-related installations. programming language usage what programming languages are used, and used primarily, at the installations? 
nine languages turned up, excluding database management systems (and lumping all "assembly" languages together), but one language accounted for more than one-half of the responses:

language                        users   primary
pl/1                               14        13
assembler/assembly languages        8         5
cobol                               4         2
pascal                              3         1
basic                               1         1
c                                   1         1
mils (a mumps dialect)              1
fortran                             0
snobol                              0

(note: some installations use more than one "primary" language.) a second round of hands showed only four users with no use of pl/1. marc format usage these questions are asked on an agency-by-agency basis. one agency made no use of the marc communications format. none of those receiving marc-format tapes were unable to recreate the format. eight of the fifteen agencies made significant internal-processing use of the marc-communications-format structure, including the leader, directory, and character storage patterns; this question was made more explicit to try to narrow the answers. thus, the marc communications format is used as a processing format in a significant number of institutions. only three agencies use ascii internally; most use of marc takes place within ebcdic. (all but three agencies were using ibm 360/370-equivalent computers; the parallel is clear.) computer usage as noted, all but three agencies use ibm equivalents in the mainframe range; three of those use plug-compatible equipment such as magnuson and amdahl. the other major computers are cdc, dec/vax, and data general eclipse systems. smaller computers in use include dc, dec 11/70, datapoint, and ibm series/1 units. home terminals and computers four of those present currently have home terminals. three have home computers. future plans for the discussion group the midwinter 1982 topic will be "holdings," with some emphasis on dealing with holdings formats in various technical processing systems (such as oclc, utlas, wln, rlin). an announcement and mailing list will go to all those on the mailing list, as will an october/november mailing with questions sent to the chair. those interested should send their names and addresses to walt crawford, rlg, jordan quad, stanford, ca 94305. it is anticipated that papers on the topic may be ready by midwinter; questions and comments are welcomed. note: there will be no set speakers or panelists; this will be a true discussion group. the topic for the philadelphia meeting will be set at midwinter 1982.-walt crawford, chair, the research libraries group, inc. channel 2000 a test of a viewdata system called channel 2000 was conducted by oclc in columbus, ohio, during the last quarter of 1980. an outgrowth of the oclc research department's home delivery of library services program, channel 2000 was developed and tested to investigate technical, business, market, and social issues involved in electronic delivery of information using videotex technology. data collection throughout the test, data were collected in three ways. transaction logs were maintained, recording keystrokes of each user during the test, thus allowing future analyses and reconstruction of the test sessions. questionnaires requesting demographic information, life-style, opinion leadership, and attitudes toward channel 2000 were collected from each user in each household before, during, and after the test. six focus-group interviews were held and audiotaped to obtain specific user responses to the information services. attitudes toward library services forty-six percent of the respondents agreed that channel 2000 saved time in getting books from the library.
responding to other questions, 29 percent felt that they would rather go to a traditional library than order books through channel 2000, and 38 percent of the users felt that channel 2000 had no effect on their library attendance. forty-one percent of the channel 2000 test group felt that their knowledge of library services increased as a result of the channel 2000 test. in addition, 16 percent of the respondents stated that they spent more time reading books than they did before the test. eighty-two percent of the respondents felt that public libraries should spend tax dollars on services such as channel 2000. although this might suggest that library viewdata services should be tax-based, subsequent focus-group interviews indicated that remote use of these services should be paid for by the individual, whereas on-site use should be "free." sixty-three percent of the test population stated that they would probably subscribe to and pay for a viewdata library service if the services were made available to them off-site. purchase intent respondents were asked to rank-order the seven channel 2000 services according to the likelihood that they would pay money to have that service in their home. a mean score was calculated for each channel 2000 service, and the following list shows the rank order of preference:

1. video encyclopedia: locate any of 32,000 articles in the new academic american encyclopedia via one of three easy look-up indexes
2. video catalog: browse through the videocard catalog of the public libraries of columbus and franklin county, and select books to be mailed directly to your home
3. home banking: pay your bills; check the status of your checking and savings accounts; look up the balance of your visa credit card; look up your mortgage and installment loans; get current information on bank one interest rates
4. public information: become aware of public and legislative information in ohio
5. columbus calendar: check the monthly calendar of events for local educational and entertainment happenings
6. math that counts!: teach your children basic mathematics, including counting and simple word problems
7. early reader: help your children learn to read by reinforcing word relationships

the final report, mailed to all oclc member libraries, was published as channel 2000: description and findings of a viewdata test conducted by oclc in columbus, ohio, october-december 1980. dublin, ohio: research department, online computer library center, inc., 1981. 21p. notis software available at the 1981 ala annual conference in san francisco, the northwestern university library announced the availability of version 3.2 of the notis computer system. intended for medium and large research libraries or groups of libraries, notis provides comprehensive online integrated-processing capabilities for cataloging, acquisitions, and serials control. patron access by author and title has been in operation for more than a year, and version 3.2 adds subject-access capability as well as other new features. an improved circulation module and other enhancements are under development for future release. although notis, which runs on standard ibm or ibm-compatible hardware, has been in use by the national library of venezuela for several years, northwestern only recently decided to actively market the software, and provided a demonstration at the ala conference.
a contract has been signed with the university of florida, and several other installations are expected within a few months. further information on notis may be obtained from the northwestern university library, 1935 sheridan rd., evanston, il 60201. bibliographic access & control system the washington university school of medicine library announces its computerbased online catalog/library control system known as the bibliographic access & control system (bacs). the system is now in operation and utilizes marc cataloging records obtained from oclc since 1975, serials records from philsom serials control network, and machine-readable patron records. features of interest in the system are: 1. patron access by author, title, subject, call number, or combination of keywords. the public-access feature has been in operation since may 1981. online instructions support system use, minimizing staff intervention. user survey indicates a high degree of satisfaction with the system. 2. low cost public access terminal with a specially designed overlay board. 3. barcode-based circulation system featuring the usual functions, including recalls for high demand items, overdue notices, suspension of circulation privileges, etc. 4. cataloging records loaded from oclc marc records by tape and from a microcomputer interface at the oclc printer port. authority control available on three levels: (a) controlled authority, i.e. , mesh or lc, (b) library-specific assigned authority, and (c) word list available to user. 5. full cataloging functions online, including editing, deleting, and entering records. 6. serials control from philsom system. philsom is an online distributed computer network that currently controls serials for sixteen medical school libraries. philsom features rapid online check-in, claims, fiscal control, union lists, and management reports. 7. five possible displays of the basic bibliographic record, varying from a brief record for the public access terminal to complete information for cataloging and reference staff. 8. two levels of documentation available online. the software is available to interested libraries, bibliographic utilities, or commercial firms. contact: washington university school of medicine library, 4580 scott, st. louis, mo 63110; (314) 454-3711. editorial | truitt 55 a recent library journal (lj) story referred to “the palpable hunger public librarians have for change . . . and, perhaps, a silver bullet to ensure their future” in the context of a presentation at the public library association’s 2010 annual conference by staff members of the rangeview (colo.) library district. now, lest there be any doubt on this point, allow me to state clearly from the outset that none of the following ramblings are in any way intended as a specific critique of the measures undertaken by rangeview. far be it from me to second-guess the rangeview staff’s judgment as to how best to serve the community there.1 rather, what got my attention was lj’s reference to a “palpable hunger”for magic ammunition, from whose presumed existence we in libraries seem to draw comfort. in the last quarter century, it seems as though we’ve heard about and tried enough silver bullets to keep our collective six-shooters endlessly blazing away. here are just a few examples that i can recall off the top of my head, and in no particular order: ■■ library cafes and coffee shops. ■■ libraries arranged along the lines of chain bookstores. 
■■ general-use computers in libraries (including information/knowledge commons and what-have-you) ■■ computer gaming in libraries. ■■ lending laptops, digital cameras, mp3 players and ipods, e-book readers, and now ipads. ■■ mobile technology (e.g., sites and services aimed at and optimized for iphones, blackberries, etc.) ■■ e-books and e-serials. ■■ chat and instant-message reference. ■■ libraries and social networking (e.g., facebook, twitter, second life, etc.). ■■ “breaking down silos,” and “freeing”/exposing our bibliographic data to the web, and reuse by others outside of the library milieu. ■■ ditching our old and “outmoded” systems, whether the object of our scorn is aacr2, lcsh, lcc, dewey, marc, the ils, etc. ■■ library websites generally. remember how everyone—including us—simply had to have a website in the 1990s? and ever since then, it’s been an endless treadmill race to find the perfect, user-centric library web presence? if sisyphus were to be incarnated today, i have little doubt that he would appear as a library web manager and his boulder would be a library website. ■■ oh, and as long as we’re at it, “user-centricity” generally. the implication, of course, is that before the term came into vogue, libraries and librarians were not focused on users. ■■ “next-gen” catalogs. i’m sure i’m forgetting a whole lot more. anyway, you get the picture. each of these has, at one time or another, been positioned by some advocate as the necessary change—the “silver bullet”—that would save libraries from “irrelevance” (or worse!), if we would but adopt it now, or better yet, yesterday. well, to judge from the generally dismal state of libraries as depicted by some opinionmakers in our profession—or perhaps simply from our collective lack of self-esteem—we either have been misled about the potency of our ammunition, or else we’ve been very poor markspersons. notwithstanding the fact that we seem to have been indiscriminately blasting away with shotguns rather than six-shooters, our shooting has neither reversed the trends of shrinking budgets and declining morale nor staunched the ceaseless dire warnings of some about “irrelevance” resulting from ebbing library use. to stretch the analogy a bit further still, one might even argue that all this shooting has done damage of its own, peppering our most valuable services with countless pellet-sized holes. at the same time, we have in recent years shown ourselves to be remarkably susceptible to the marketingfocused hyperbole of those in and out of librarianship about technological change. each new technology is labeled a “game-changer”; change in general is either— to use the now slightly-dated, oh-so-nineties term—a “paradigm shift” or, more recently, “transformational.” when did we surrender our skepticism and awareness of a longer view? what’s wrong with this picture?2 i’d like to suggest another way of viewing this. a couple of years ago, alan weisman published the world without us, a book that should be required reading for all who are interested in sustainability, our own hubris, and humankind’s place in the world. the book begins with our total, overnight disappearance, and asks (1) what would the earth be like without us? and (2) what evidence of our works would remain, and for how long? the bottom line answers for weisman are (1) in the long run, probably much better off, and (2) not much and not for very long, really. 
so, applying weisman’s first question to our own, much more modest domain, what might the world be like if tomorrow librarians all disappeared or went on to work doing something else—became consultants, perhaps?— and our physical and virtual collections were padlocked? would everything be okay, because as some believe, marc truitteditorial: no more silver bullets, please marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 56 information technology and libraries | june 2010 think we need to be prepared to turn off the lights, lock the doors, and go elsewhere, because i hope that what we’re doing is about more than just our own job security. and if the far-fetched should actually happen, and we all disappear? i predict that at some future point, someone will reinvent libraries and librarians, just as others have reinvented cataloguing in the guise of metadata. notes and references 1. norman oder, “pla 2010 conference: the anythink revolution is ripe,” library journal, mar. 26, 2010, http://www .libraryjournal.com/article/ca6724258.html (accessed mar. 30, 2010). there, i said it! a fairly innocuous disclaimer added to one of my columns last year seemed to garner more attention (http:// freerangelibrarian.com/2009/06/13/marc-truitts-surprising -ital-editorial/) than did the content of the column itself. will the present disclaimer be the subject of similar speculation? 2. one of my favorite antidotes to such bloated, short-term language is embodied in michael gorman’s “human values in a technological age,” ital 20, no. 1 (mar. 2000): 4–11, http:// www.ala.org/ala/mgrps/divs/lita/ital/2001gorman.cfm (accessed apr 12, 2010)—highly recommended. the following is but one of many calming and eminently sensible observations gorman makes: the key to understanding the past is the knowledge that people then did not live in the past—they lived in the present, just a different present from ours. the present we are living in will be the past sooner than we wish. what we perceive as its uniqueness will come to be seen as just a part of the past as viewed from the point of a future present that will, in turn, see itself as unique. people in history did not wear quaintly oldfashioned clothes—they wore modern clothes. they did not see themselves as comparing unfavorably with the people of the future, they compared themselves and their lives favorably with the people of their past. in the context of our area of interest, it is particularly interesting to note that people in history did not see themselves as technologically primitive. on the contrary, they saw themselves as they were—at the leading edge of technology in a time of unprecedented change. it’s all out there on the web anyway, and google will make it findable? absent a few starry-eyed bibliophiles and newly out-of-work librarians—those who didn’t make the grade as consultants—would anyone mourn our disappearance? would anyone notice? if a tree falls in the woods . . . in short, would it matter? and if so, why and how much? the answer to the preceding two questions, i think, can help to point the way to an approach for understanding and evaluating services and change in libraries that is both more realistic and less draining than our obsessive quest for the “silver bullet.” what exactly is our “valueadd”? what do we provide that is unique and valuable? 
we can't hope to compete with barnes and noble, starbucks, or the googleplex; seeking to do so simply diverts resources and energy from providing services and resources that are uniquely ours. instead, new and changed services and approaches should be evaluated in terms of our value-add: if they contribute positively and are within our abilities to do them, great. if they do not contribute positively, then trying to do them is wasteful, a distraction, and ultimately disillusioning to those who place their hopes in such panaceas. some of the "bullets" i listed above may well qualify as contributing to our value-add, and that's fine. my point isn't to judge whether they are "bad" or "good." my argument is about process and how we decide what we should do and not do. understanding what we contribute that is uniquely ours should be the reference standard by which proposed changes are evaluated, not some pie-in-the-sky expectation that pursuit of this or that vogue will magically solve our funding woes, contribute to higher (real or virtual) gate counts, make us more "relevant" to a particular user group, or even raise our flagging self-esteem. in other words, our value-add must stand on its own, regardless of whether it actually solves temporal problems. it is the "why" in "why are we here?" if, at the end of the day, we cannot articulate that which makes us uniquely valuable—or if society as a whole finds that contribution not worth the cost—then i camilla fulton web accessibility, libraries, and the law as a typical student, you are able to scan the resources and descriptions, familiarize yourself with the quiz's format, and follow the link to
the quiz with no inherent problems. everything on the page flows well for you and the content is broken up easily for navigation. now imagine that you are legally blind. you navigate to the webpage with your screen reader, a software device that allows you to surf the web despite your impairment. ideally, the device gives you equal access to webpages, and you can navigate them in an equivalent manner as your peers. when you visit your teacher’s webpage, however, you start experiencing some problems. for one, you cannot scan the page like your peers because the category titles were designed with font tags instead of heading tags styled with cascading style sheets (css). most screen readers use heading tags to create the equivalent of a table of contents. this table of contents function divides the page into navigable sections instead of making the screen reader relay all page content as a single mass. second, most screen readers also allow users to “scan” or navigate a page by its listed links. when you visit your teacher’s page, you get a list of approximately twenty links that all read, “search this resource.” unfortunately, you are unable to differentiate between the separate resources without having the screen reader read all content for the appropriate context. third, because the resources are separated by hard returns, you find it difficult to differentiate between each listed item. your screen reader does not indicate when it approaches a list of categorized items, nor does it pause between each item. if the resources were contained within the proper html list tags of either ordered or unordered (with subsequent list item tagging), then you could navigate through the suggested resources more efficiently (see figures 1, 2, and 3). finally, the video tutorial’s audio tract explains much of the quiz’s structure; however, the video relies on image-capture alone for page orientation and navigation. without a visual transcript, you are at a disadvantage. stylistic descriptions of the page and its buttons are generally unhelpful, but the page’s textual content, and the general movement through it, would better aid you in preparation for the quiz. to be fair, your teacher would already be cognizant of your visual disability and would have accommodated your class needs appropriately. the individuals with disabilities education act (idea) mandates educational institutions to provide an equal opportunity to education.1 your teacher would likely avoid posting any class materials online without being certain that the content was fully accessible and usable to you. unlike educational institutions, however, most libraries are not legally bound to the same law. idea does not command libraries to provide equal access to information through with an abundance of library resources being served on the web, researchers are finding that disabled people oftentimes do not have the same level of access to materials as their nondisabled peers. this paper discusses web accessibility in the context of united states’ federal laws most referenced in web accessibility lawsuits. additionally, it reveals which states have statutes that mirror federal web accessibility guidelines and to what extent. interestingly, fewer than half of the states have adopted statutes addressing web accessibility, and fewer than half of these reference section 508 of the rehabilitation act or web content accessibility guidelines (wcag) 1.0. 
regardless of sparse legislation surrounding web accessibility, librarians should consult the appropriate web accessibility resources to ensure that their specialized content reaches all. i magine you are a student. in one of your classes, a teacher and librarian create a webpage that will help the class complete an online quiz. this quiz constitutes 20 percent of your final grade. through the exercise, your teacher hopes to instill the importance of quality research resources found on the web. the teacher and librarian divide their hand-picked resources into five subject-based categories. each resource listing contains a link to that particular resource followed by a paragraph of pertinent background information. the list concludes with a short video tutorial that prepares students for the layout of the online quiz. neither the teacher nor the librarian has extensive web design experience, but they both have basic html skills. the library’s information technologists give the teacher and librarian web space, allowing them to freely create their content on the web. unfortunately, they do not have a web librarian at their disposal to help construct the page. they solely rely on what they recall from previous web projects and visual layouts from other websites they admire. as they begin to construct the page, they first style each category’s title with font tags to make them bolder and larger than the surrounding text. they then separate each resource and its accompanying description with the equivalent of hard returns (or line breaks). next, they place links to the resources within the description text and label them with “search this resource.” finally, they create the audiovisual tutorial with a runtime of three minutes. camilla fulton (cfulton2@illinois.edu) is web and digital content access librarian, university of illinois, urbana-champaign. web accessibility, libraries, and the law | fulton 35 providing specifics on when those standards should apply. for example, section 508 of the rehabilitation act could serve as a blueprint for information technology guidelines that state agencies should follow. section 508 states that federal employees with disabilities [must] have access to and use of information and data that is comparable to the access and use by federal employees who are not individuals with disabilities, unless an undue burden would be imposed on the agency.4 section 508 continues to outline how the declaration should be met when procuring and managing software, websites, telecommunications, multimedia, etc. section 508’s web standards comply with w3c’s web content accessibility guidelines (wcag) 1.0; stricter compliance is optional. states could stop at section 508 and only make web accessibility laws applicable to other state agencies. section 504 of the rehabilitation act, however, provides additional legislation to model. in section 504, no disabled person can be excluded from programs or activities that are funded by federal dollars.5 section 504 further their websites. neither does the federal government possess a carte blanche web accessibility law that applies to the nation. this absence of legislation may give the impression of irrelevance, but as more core components of librarianship migrate to the web, librarians should confront these issues so they can serve all patrons more effectively. this article provides background information on the federal laws most frequently referenced within web accessibility cases. 
additionally, this article tests three assumptions: ■■ although the federal government has no web accessibility laws in place for the general public, most states legalized web accessibility for their respective state agencies. ■■ most state statutes do not mention section 508 of the americans with disabilities act (ada) or acknowledge world wide web consortium (w3c) standards. ■■ most libraries are not included as entities that must comply with state web accessibility statutes. further discussion on why these issues are important to the library profession follows. ■■ literature review no previous study has systematically examined state web accessibility statutes as they relate to libraries. most articles that address issues related to library web accessibility view libraries as independent entities and run accessibility evaluators on preselected library and university websites.2 those same articles also evaluate the meaning and impact of federal disability laws that could drive the outcome of web accessibility in academia.3 in examining state statutes, additional complexities may be unveiled when delving into the topic of web accessibility and librarianship. ■■ background with no definitive stance on public web accessibility from the federal government, states became tasked with figure 1. these webpages look exactly the same to users, but the html structure actually differs in source code view. 36 information technology and libraries | march 2011 title ii, section 201 (1) defines “public entity” as state and local governments, including their agencies, departments, and districts.9 title iii, section 302(a) builds on title ii and states that in the case of commercial facilities, no individual shall be discriminated against on the basis of disability in the full and equal enjoyment of the goods, services, facilities, privileges, advantages, or accommodations of any place of public accommodation by any person who owns, leases . . . or operates a place of public accommodation.10 delineates specific entities subject to the auspice of this law. though section 504 never mentions web accessibility specifically, states could freely interpret and apply certain aspects of the law for their own use (e.g., making organizations receiving state funds create accessible websites to prevent the exclusion of disabled people). if states wanted to provide the highest level of service to all, they would also consider incorporating the most recent w3c recommendations. the w3c formed in 1994 to address the need for structural consistency across multitudinous websites and web browsers. the driving principle of the w3c is to make the benefits of the web accessible to all, “whatever their hardware, software, network infrastructure, native language, culture, geographical location, or physical or mental ability.”6 the most recent w3c guidelines, wcag 2.0, detail web accessibility guidelines that are simpler to understand and, if followed, could improve both accessibility and usability despite browser type. alternatively, states could decide to wait until the federal government mandates an all-encompassing law on web accessibility. the national federation of the blind (nfb) and american council of the blind (acb) have been trying commercial entities in courts, claiming that inaccessible commercial websites discriminate against disabled people. 
the famous nfb lawsuit against target provided a precedent for other courts to acknowledge; commercial entities should provide an accessible means to purchase regularly stocked items through their website (if they are already maintaining one).7 these commercial web accessibility lawsuits are often defended with title ii and title iii of the ada. title ii, section 202 states, subject to the provisions of this title, no qualified individual with a disability shall, by reason of such disability, be excluded from participation in or be denied the benefits of the services, programs, or activities of a public entity, or be discriminated by any such entity.8 figure 2. here we see distinct variances in the source code. the image at the top (inaccessible) reveals code that does not use headings or unordered lists for each resource. the image on the bottom (accessible) does use semantically correct code, maintaining the same look and feel of the headings and list items through an attached cascading stylesheet. web accessibility, libraries, and the law | fulton 37 accessibility believe that section 301(7) specifically denotes places of physical accommodation because the authors’ original intent did not include virtual ones.13 settling on a definition for “public accommodation” is so divisive that three district courts are receptive to “public accommodation” referring to nonphysical places, four district courts ruled against the notion, and four have not yet made a decision.14 despite legal battles within the commercial sector, state statute analysis shows that states felt compelled to address web accessibility on their own terms. ■■ method this study surveys the most current state statute web presences as they pertain to web accessibility and their connection to libraries. using georgia institute of technology’s state e&it accessibility initiatives database and golden’s article on accessibility within institutions of higher learning as starting points, i searched each state government’s online statutes for the most recently available code.15 examples of search terms used include “web accessibility,” “information technology,” and “accessibility -building -architecture -health.” “building,” for example, excluded statute results that pertained to building accessibility. i then reviewed each statute to determine whether its mandates applied to web accessibility. some statutes excluded mention of web accessibility but outlined specific requirements for an institution’s software procurement. when statutes on web accessibility could not be found, additional searches were conducted for the most recently available web accessibility guidelines, policies, or standards. using a popular web search engine and the search terms “[state] web accessibility” usually resulted in finding the state’s standards online. if the search engine did not offer desirable results, then i visited the appropriate state government’s website. the term “web accessibility” was used within the state government’s site search. the following results serve only as a guide. because of the ever-changing nature of the law, please consult legal advisors within your institution for changes that may have occurred post article publication. 
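before turning to the survey results, the markup contrast described in the opening scenario and in figures 1 and 2 can be made concrete. the sketch below is illustrative only (the resource names and urls are invented placeholders, not content from any actual course page); it shows how the same visual layout can be produced either with font tags, hard returns, and repeated generic link text, or with semantic headings, list markup, and descriptive links that a screen reader can announce and navigate:

<!-- inaccessible version: font tags for titles, hard returns between resources, generic link text -->
<font size="5" color="#003366"><b>new jersey history resources</b></font><br>
background information about the first resource.
<a href="http://example.org/resource1">search this resource</a><br><br>
background information about the second resource.
<a href="http://example.org/resource2">search this resource</a><br><br>

<!-- accessible version: a real heading, a real list, and link text that makes sense out of context -->
<h2>new jersey history resources</h2>
<ul>
  <li>background information about the first resource.
      <a href="http://example.org/resource1">search the state archives collection</a></li>
  <li>background information about the second resource.
      <a href="http://example.org/resource2">search the historical newspaper index</a></li>
</ul>

styled with css, the second version can look identical to the first, but the heading lets a screen reader build a table-of-contents view, the list items are announced one at a time, and the rewritten link text remains meaningful when links are read in isolation. the same principle extends to images and to the video tutorial described earlier: alt attributes and a caption track or text transcript give nonvisual users an equivalent path through the content.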
■■ results “although the federal government has no web accessibility laws in place for the general public, most states legalized web accessibility for its respective state agencies.” false—only seventeen states have codified laws ensuring web accessibility for their state websites.16 four this title’s proclamation seems clear-cut; however, legal definitions of “public accommodation” differ. title iii, section 301(7) defines a list of acceptable entities to receive the title of “public accommodation.”11 among those listed are auditoriums, theaters, terminals, and educational facilities. courts using title iii in defense for web accessibility argue that the web is a place, and therefore cannot discriminate against those with visual, motor, or mental disabilities.12 those arguing against using title iii for web figure 3. fangs (http://www.standards-schmandards.com/ projects/fangs/) visually emulates what a standard screen reader outputs so that designers can take the first steps in creating more accessible content on the web. 38 information technology and libraries | march 2011 classified institutions with library websites found that less than half of each degree-producing division was directed by their institution to comply with the ada for web accessibility.24 some may not recognize the significance of providing accessible library websites, especially if they do not witness a large quantity of accommodation requests from their users. coincidentally, perceived societal drawbacks could keep disabled users from seeking the assistance they need.25 according to american community survey terminology, disabilities negatively affecting web accessibility tend to be sensory and self-care based.26 the 2008 american community survey public use microdata sample estimates that 10,393,100 noninstitutionalized americans of all ages live with a hearing disability and 6,826,400 live with a visual disability.27 according to the same survey, an estimated 7,195,600 noninstitutionalized americans live with a self-care disability. in other words, nearly 24.5 million people in the united states are unable to retrieve information from library websites unless web authors make accessibility and usability their goal. as gatekeepers of information and research resources, librarians should want to be the first to provide unrestricted and unhindered access to all patrons despite their ability. nonetheless, potential objections to addressing web accessibility can deter improvement: learning and applying web accessibility guidelines will be difficult. there is no way we can improve access to disabled users in a way that will be useful. actually, more than 90 percent of sensory-accessibility issues can be resolved through steps outlined in section 508, such as utilizing headings properly, giving alternative image descriptions, and providing captions for audio and video. granted, these elements may be more difficult to manage on extensive websites, but wisely applied web content management systems could alleviate information technology units’ stress in that respect.28 creating an accessible website is time consuming and resource draining. this is obviously an “undue burden” on our facility. we cannot do anything about accessibility until we are given more funding. the “undue burden” clause seen in section 508 and several state statutes is a real issue that government officials needed to address. however, individual institutions are not supposed to view accessible website creation as an isolated activity. 
“undue burden,” as defined by the code of federal regulations, relies upon the overall budget of the program or component being developed.29 claiming an “undue burden” means that the institution must extensively document why creating an accessible website would cause a burden.30 the institution would also have to provide disabled users an alternative means of access to information provided online. of these seventeen extended coverage to include agencies receiving state funds (with no exceptions).17 though that number seems disappointingly low, many states addressed web accessibility through other means. thirtyone states without web accessibility statutes posted some form of standard, policy, or guideline online in its place (see appendix). these standards only apply to state entities, however, and have no legal footing outside of federal law to spur enforcement. at the time of article submission, alaska and wyoming were the only two states without an accessibility standard, policy, or guideline available on the web. “most state statutes do not mention section 508 of the americans with disabilities act or acknowledge world wide web consortium (w3c) standards” true—interestingly, only seven of the seventeen states with web accessibility statutes reference section 508 or wcag 1.0 directly within their statute text (see appendix).18 minnesota is the only state that references the more current wcag 2.0 standards.19 these numbers may seem minuscule as well, but all states have supplemented their statutes with more descriptive guidelines and standards that delineate best practices for compliance (see appendix). within those guidelines and standards, section 508 and wcag 1.0 get mentioned with more frequency. “most libraries are not included as entities that must comply with state web accessibility statutes.” true—from the perspective of a librarian, the above data means that forty-eight states would require web accessibility compliance for their state libraries (see appendix). four of those states (arkansas, california, kentucky, and montana) require all libraries receiving state funds to maintain an accessible website.20 an additional four states (illinois, oklahoma, texas, and virginia) explicitly hold universities, and therefore their libraries, to the same standards as their state agencies.21 despite the commendable efforts of eight states pushing for more far-reaching web accessibility, thousands of k–12, public, and academic libraries nationwide escape these laws’ reach. ■■ discussion and conclusion without legal backing for web accessibility issues at all levels, “equitable access to information and library services” might remain a dream.22 notably, researchers have witnessed web accessibility improvements in a four-year span; however, as of 2006, even libraries at institutions with ala-accredited library and information science programs did not average an accessibility validation of 70 percent or higher.23 additionally, a survey of carnegie web accessibility, libraries, and the law | fulton 39 9. 42 u.s.c. §12131. 10. 42 u.s.c. §12182. 11. 42 u.s.c. §12181. 12. carrie l. kiedrowski, “the applicability of the ada to private internet web sites,” cleveland state law review 49 (2001): 719–47; shani else, “courts must welcome the reality of the modern word: cyberspace is a place under title iii of the americans with disabilities act,” washington & lee law review 65 (summer 2008): 1121–58. 13. ibid. 14. nikki d. 
kessling, “why the target ‘nexus test’ leaves disabled americans disconnected: a better approach to determine whether private commercial websites are ‘places of public accommodation,’” houston law review 45 (summer 2008): 991–1029. 15. state e & it accessibility initiatives workgroup, “state it database,” georgia institute of technology, http://acces sibility.gtri.gatech.edu/sitid/state_prototype.php (accessed jan. 28, 2010); nina golden, “why institutions of higher education must provide access to the internet to students with disabilities,” vanderbilt journal of entertainment & technology law 10 (winter 2008): 363–411. 16. arizona revised statutes §41-3532 (2010); arkansas code of 1987 annotated §25-26-201–§25-26-206 (2009); california government code §11135–§11139 (2010); colorado revised statutes §24-85-101–§24-85-104 (2009); florida statutes §282.601– §282.606 (2010); 30 illinois complied statutes annotated 587 (2010); burns indiana code annotated §4-13.1-3 (2010); kentucky revised statutes annotated §61.980–§ 61.988 (2010); louisiana revised statutes §39:302 (2010); maryland state finance and procurement code annotated §3a-311 (2010); minnesota annotated statutes §16e.03 subdivisions 9-10 (2009); missouri revised statutes §191.863 (2009); montana code annotated §185-601 (2009); 62 oklahoma statutes §34.16, §34.28–§34.30 (2009); texas government code §2054.451–§2054.463 (2009); virginia code annotated §2.2-3500–§2.2-3504 (2010); west virginia code § 18-10n-1–§18-10n-4 (2009). 17. arkansas code of 1987 annotated §25-26-202(7) (2009); california government code §11135 (2010); kentucky revised statutes annotated §61.980(4) (2010); montana code annotated §18-5-602 (2009). 18. arizona revised statutes §41-3532 (2010); california government code §11135(d)(2) (2010); burns indiana code annotated §4-13.1-3-1(a) (2010); florida statutes §282.602 (2010); kentucky revised statutes annotated §61.980(1) (2010); minnesota annotated statutes §16e.03 subdivision 9(b) (2009); missouri revised statutes §191.863(1) (2009). 19. minnesota annotated statutes §16e.03 subdivision 9(b) (2009). 20. arkansas code of 1987 annotated §25-26-202(7) (2009); california government code §11135 (2010); kentucky revised statutes annotated §61.980(4) (2010); montana code annotated §18-5-602 (2009). 21. 30 illinois complied statutes annotated 587/10 (2010); 62 oklahoma statutes §34.29 (2009); texas government code §2054.451 (2009); virginia code annotated §2.2-3501 (2010). 22. american library association, “alahead to 2010 strategic plan,” http://www.ala.org/ala/aboutala/missionhistory/ plan/2010/index.cfm (accessed jan. 28, 2010). 23. comeaux and schmetzke, “accessibility trends.” no one will sue an institution focused on promoting education. we will just continue providing one-on-one assistance when requested. in 2009, a blind student, backed by the nfb, initiated litigation against the law school admissions council (lsac) because of the inaccessibility of its online tests.31 in 2010, they added four law schools to the defense: university of california hastings college of the law, thomas jefferson school of law, whittier law school, and chapman university school of law.32 these law schools were added because they host their application materials on the lsac website.33 assuredly, if instructors and students are encouraged or required to use library webpages for assignments and research, those unable to use them in an equivalent manner as their peers may pursue litigation for forcible change. 
ultimately, providing accessible websites for library users should not be perceived as a hassle. sure, it may entail a new way of thinking, but the benefits of universal access and improved usability far outweigh the frustration that users may feel when they cannot be self-sufficient in their web-based research.34 regardless of whether the disabled user is in a k–12, college, university, or public library, they are paying for a service that requires more than just a physical accommodation.35 federal agencies, state entities, and individual institutions are all responsible (and important) in the promotion of accessible website construction. lack of statutes or federal laws should not exempt libraries from providing equivalent access to all; it should drive libraries toward it. references 1. individuals with disabilities education act of 2004, 40 u.s.c. §1411–§1419. 2. see david comeaux and axel schmetzke, “accessibility trends among academic library and library school web sites in the usa and canada,” journal of access services 6 (jan.–june 2009): 137–52; julia huprich and ravonne green, “assessing the library homepages of copla institutions for section 508 accessibility errors: who’s accessible, who’s not and how the online webxact assessment tool can help,” journal of access services 4, no. 1 (2007): 59–73; michael providenti and robert zai iii, “web accessibility at kentucky’s academic libraries,” library hi tech 25, no. 4 (2007): 478–93. 3. ibid.; michael providenti and rober zai iii, “web accessibility at academic libraries: standards, legislation, and enforcement,” library hi tech 24, no. 4 (2007): 494–508. 4. 29 u.s.c. §794(d); 36 code of federal regulations (cfr) §1194.1. 5. 29 u.s.c. § 794. 6. world wide web consortium, “w3c mission,” http:// www.w3.org/consortium/mission.html (accessed jan. 28, 2010). 7. national federation of the blind v. target corp., 452 f. supp. 2d 946 (n.d. cal. 2006). 8. 42 u.s.c. §12132. 40 information technology and libraries | march 2011 special needs, vol. 5105, lecture notes in computer science (linz, australia: springer-verlag, 2008) 454–61; david kane and nora hegarty, “new site, new opportunities: enforcing standards compliance within a content management system,” library hi tech 25, no. 2 (2007): 276–87. 29. 28 cfr §36.104. 30. ibid. 31. sheri qualters, “blind law student sues law school admissions council over accessibility,” national law journal (feb. 20, 2009), http://www.law.com/jsp/nlj/pubarticlenlj .jsp?id=1202428419045 (accessed jan. 28, 2010). follow the case at the county of alameda’s superior court of california, available online (search for case number rg09436691): http://apps .alameda.courts.ca.gov/domainweb/html/index.html (accessed sept. 20, 2010). 32. ibid. 33. ibid. after finding the case, click on “register of actions” in the side navigation menu. these details can be found on page 10 of the action “joint case management statement filed,” uploaded june 30, 2010. 34. jim blansett, “digital discrimination: ten years after section 508, libraries still fall short of addressing disabilities online,” library journal 133 (aug. 2008): 26–29; drew robb, “one site fits all: companies are working to make their web sites comply with accessibility guidelines because the effort translates into more customers,” computerworld (mar. 28, 2005): 29–32. 35. the united states department of justice supports title iii’s application of “public accommodation” to include virtual web spaces. see u.s. 
department of justice, “settlement agreement between the united states of america and city of missoula county, montana under the americans with disabilities act,” dj# 204-44-45, http://www.justice.gov/crt/foia/mt_1.php and http://www.ada.gov/missoula.htm (accessed jan. 28, 2010). 24. ruth sara connell, “survey of web developers in academic libraries,” journal of academic librarianship 34, no. 2 (2008): 121–29. 25. patrick m. egan and traci a. guiliano, “unaccommodating attitudes: perceptions of students as a function of academic accommodation use and test performance” north american journal of psychology 11, no. 3 (2009): 487–500; ramona paetzold et al., “perceptions of people with disabilities: when is accommodation fair?” basic & applied social psychology 30 (2008): 27–35. 26. u.s. census bureau, american community survey, puerto rico community survey: 2008 subject definitions (washington, d.c.: government printing office, 2009). hearing disability pertains to deafness or difficulty in hearing. visual disability pertains to blindness or difficulty seeing despite prescription glasses. self-care disability pertains to those whom have “difficulty dressing or bathing.” 27. u.s. census bureau, data set: 2006–2008 american community survey (acs) public use microdata sample (pums) 3-year estimates (washington, d.c.: government printing office, 2009). for a more interactive table, with statistics drawn directly from the american community survey pums data files, see the database created and maintained by the employment and disability institute at cornell university: m. j. bjelland, w. a. erickson, and c. g. lee, disability statistics from the american community survey (acs), cornell university rehabilitation research and training center on disability demographics and statistics (statsrrtc), http://www.disabilitystatistics.org (accessed jan. 28, 2010). 28. sébastien rainville-pitt and jean-marie d’amour, “using a cms to create fully accessible web sites,” journal of access services 6 (2009): 261–64; laura burzagli et al., “using web content management systems for accessibility: the experience of a research institute portal,” in proceedings of the 11th international conference on computers helping people with appendix. library website accessibility requirements, by state state libraries included? code online state statutes online statements/policies/ guidelines ala. n/a n/a n/a http://isd.alabama.gov/isd/statements .aspx alas. n/a n/a n/a n/a ariz.* state and statefunded (with exceptions) arizona revised statutes §413532 http://www.azleg.state.az.us/ arizonarevisedstatutes.asp? title=41 http://az.gov/polices_accessibility.html ark. state and state-funded arkansas code annotated §2526-201 thru §25-26-206 http://www.arkleg.state.ar.us/assembly/ arkansascodelargefiles/title%2025%20 state%20government-chapter%2026%20 information%20technology.htm and http:// www.arkleg.state.ar.us/bureau/publications/ arkansas%20code/title%2025.pdf http://portal.arkansas.gov/pages/policy .aspx web accessibility, libraries, and the law | fulton 41 state libraries included? code online state statutes online statements/policies/ guidelines calif.* state and state-funded california government code §11135 thru §11139 http://www.leginfo.ca.gov/calaw.html http://www.webtools.ca.gov/accessibility/ state_standards.asp colo. state colorado revised statutes §2485-101 thru §24-85-104 http://www.state.co.us/gov_dir/leg_dir/ olls/colorado_revised_statutes.htm www.colorado.gov/colorado/accessibility .html conn. 
conn. | n/a | n/a | n/a | http://www.access.state.ct.us/
del. | n/a | n/a | n/a | http://gic.delaware.gov/information/access_central.shtml
fla.* | state | florida statutes §282.601 thru §282.606 | http://www.leg.state.fl.us/statutes/ | http://www.myflorida.com/myflorida/accessibility.html
ga. | n/a | n/a | n/a | http://www.georgia.gov/00/static/0,2085,4802_0_0_accessibility,00.html
hawaii | n/a | n/a | n/a | http://www.ehawaii.gov/dakine/docs/ada.html
idaho | n/a | n/a | n/a | http://idaho.gov/accessibility.html
ill. | state and university | 30 illinois compiled statutes annotated 587 | http://www.ilga.gov/legislation/ilcs/ilcs.asp | http://www.dhs.state.il.us/page.aspx?item=32765
ind.* | state and local government | burns indiana code annotated §4-13.1-3 | http://www.in.gov/legislative/ic/code/title4/ar13.1/ch3.html | http://www.in.gov/core/accessibility.htm
iowa | n/a | n/a | n/a | http://www.iowa.gov/pages/accessibility
kans. | n/a | n/a | n/a | http://www.kansas.gov/about/accessibility_policy.html
ky.* | state and state-funded | kentucky revised statutes annotated §61.980 thru §61.988 | http://www.lrc.ky.gov/krs/titles.htm | http://technology.ky.gov/policies/webtoolkit.htm
la. | state | louisiana revised statutes §39:302 | http://www.legis.state.la.us/ | http://www.louisiana.gov/government/policies/#webaccessibility
maine | n/a | n/a | n/a | http://www.maine.gov/oit/accessibility/policy/webpolicy.html
md. | state and (possibly) community college | maryland state finance and procurement code annotated §3a-311 | http://www.michie.com/maryland/ and http://www.dsd.state.md.us/comar/comar.aspx | http://www.maryland.gov/pages/accessibility.aspx
mass. | n/a | n/a | n/a | http://www.mass.gov/accessibility and http://www.mass.gov/?pageid=mg2utilities&l=1&sid=massgov2&u=utility_policy_accessibility
mich. | n/a | n/a | n/a | http://www.michigan.gov/som/0,1607,7–192–26913–2090—,00.html
minn.** | state | minnesota annotated statutes §16e.03 subdivisions 9–10 | https://www.revisor.mn.gov/pubs/ | http://www.starprogram.state.mn.us/accessibility_usability.htm
miss. | n/a | n/a | n/a | http://www.mississippi.gov/access_policy.jsp
mo.* | state | missouri revised statutes §191.863 | http://www.moga.mo.gov/statutes/statutes.htm | http://oa.mo.gov/itsd/cio/standards/ittechnology.htm
mont. | state and state-funded | montana code annotated §18-5-601 | http://data.opi.mt.gov/bills/mca_toc/index.htm | http://mt.gov/discover/disclaimer.asp#accessibility
neb. | n/a | n/a | n/a | http://www.webmasters.ne.gov/accessibilitystandards.html
nev. | n/a | n/a | n/a | http://www.nitoc.nv.gov/psps/3.02_standard_webstyleguide.pdf
n.h. | n/a | n/a | n/a | http://www.nh.gov/wai/
n.j. | n/a | n/a | n/a | http://www.state.nj.us/nj/accessibility.html
n.m. | n/a | n/a | n/a | http://www.newmexico.gov/accessibility.htm
n.y. | n/a | n/a | n/a | http://www.cio.ny.gov/policy/nys-p08–005.pdf
n.c. | n/a | n/a | n/a | http://www.ncsta.gov/docs/principles%20practices%20standards/application.pdf
n. dak. | n/a | n/a | n/a | http://www.nd.gov/ea/standards/
ohio | n/a | n/a | n/a | http://ohio.gov/policies/accessibility/
okla. | state and university | 62 oklahoma statutes §34.16, §34.28 thru §34.30 | http://www.lsb.state.ok.us/ | http://www.ok.gov/accessibility/
ore. | n/a | n/a | n/a | http://www.oregon.gov/accessibility.shtml
pa. | n/a | n/a | n/a | http://www.portal.state.pa.us/portal/server.pt/community/it_accessibility/10940
r.i. | n/a | n/a | n/a | http://www.ri.gov/policies/access.php
s.c. | n/a | n/a | n/a | http://sc.gov/policies/accessibility.htm
s. dak. | n/a | n/a | n/a | http://www.sd.gov/accpolicy.aspx
tenn. | n/a | n/a | n/a | http://www.tennesseeanytime.org/web-policies/accessibility.html
tex. | state and university | texas government code §2054.451 thru §2054.463 | http://www.statutes.legis.state.tx.us/ | http://www.texasonline.com/portal/tol/en/policies
utah | n/a | n/a | n/a | http://www.utah.gov/accessibility.html
va. | state, university, and commonwealth | virginia code annotated §2.2-3500 thru §2.2-3504 | http://leg1.state.va.us/000/src.htm | http://www.virginia.gov/cmsportal3/about_virginia.gov_4096/web_policy.html
vt. | n/a | n/a | n/a | http://www.vermont.gov/portal/policies/accessibility.php
wash. | n/a | n/a | n/a | http://isb.wa.gov/webguide/accessibility.aspx
w. va. | state | west virginia code §18-10n-1 thru §18-10n-4 | http://www.legis.state.wv.us/wvcode/code.cfm | http://www.wv.gov/policies/pages/accessibility.aspx
wis. | n/a | n/a | n/a | http://www.wisconsin.gov/state/core/accessibility.html
wyo. | n/a | n/a | n/a | n/a

*these states mention section 508 of the rehabilitation act within statute text
**this state mentions wcag 2.0 within its statute text
note: most states with statutes on web accessibility also have statements, policies, and guidelines that are more detailed than the statute text and may contain references to section 508 and wcag 2.0. all webpages were visited between january 1, 2010, and february 12, 2010.

public libraries, values, trust, and e-government
paul t. jaeger and kenneth r. fleischmann
paul t. jaeger (pjaeger@umd.edu) is an assistant professor and director of the center for information policy and electronic government at the college of information studies of the university of maryland, college park. kenneth r. fleischmann (kfleisch@umd.edu) is an assistant professor at the college of information studies of the university of maryland, college park.

as public libraries are becoming e-government access points relied on by both patrons and government agencies, it is important for libraries to consider the implications of these roles. while providing e-government access serves to reinforce the tremendously important role of public libraries in the united states social infrastructure, it also creates new demands on libraries and opens up significant new opportunities. drawing upon several different strands of research, this paper examines the nexus of public libraries, values, trust, and e-government, focusing on the ways in which the values of librarianship and the trust that communities place in their public libraries reinforce the role of public libraries in the provision of e-government. the unique values embraced by public libraries have not only shaped the missions of libraries, they have influenced popular opinion surrounding public libraries and fostered the confidence that communities place in them as a source of trusted information and assistance in finding information. as public libraries have embraced the provision of internet access, these values and trust have become intertwined with their new social role as a public access point for e-government both in normal information activities and in the most extreme circumstances. this paper explores the intersections of these issues and the relation of the vital e-government role of public libraries to library funding, public policy, library and information science education, and research initiatives.

public libraries have always been valued and trusted institutions within society.
due to recent advances in technology and changes in united states society, public libraries now also play a unique and critical role by offering free public internet access. with the increasing reliance on the internet as a key source of news, social capital, and access to government services and information, the free access provided by public libraries is an invaluable resource. as a result, a significant proportion of the u.s. population, including people who have no other means of access, people who need help using computers and the internet, and people who have lower quality access, rely on the internet access and computer help available in public libraries. federal, state, and local government agencies now also rely on public libraries to provide citizens with access to and guidance in using e-government web sites, forms, and services; many government agencies simply direct citizens to the nearest public library for help. this confluence of events has created a major new social role for public libraries—guarantors of internet and e-government access. though public libraries are not the only points of free internet access in many communities, they have made the strongest commitment to providing access and help for all. by providing not only access to technology but also help in using that technology, libraries became internet access points, while community technology centers, which usually did not offer the same level of available assistance, failed in the late 1990s and early 2000s. further, as libraries provide not only internet access but free computer access as well, they attract the people who do not own computers and do not benefit from a city's or coffee shop's free wi-fi. the compelling combination of free computer access, free internet access, the availability of assistance from knowledgeable librarians, the value that public librarians place on serving their local communities, and the historical trust that society places in public libraries has made libraries a critical part of the u.s. social infrastructure. without public libraries, large segments of the population would be cut off from access to the internet and e-government. while the provision of internet access for those who have no other access parallels the role of public libraries as providers of access to print materials, the maturation of public libraries into internet and e-government access hubs has profound implications for the roles that public libraries are being expected to play in their communities. public libraries are trusted by their communities as places that community members can turn to for unfettered information access and as places to go for information in times of need. combining this trust with the power of internet access and support makes public libraries even more critical within their local communities. the trust placed in libraries is also important in balancing the lack of confidence that many citizens place in other government institutions as well as in the internet. clearly, e-government, which exists at this intersection, has its trustworthiness bolstered by the role of public libraries in its use. as patrons are able to access e-government through the library—a place that is trusted—they may have greater confidence in the government services they use through library computers and with the assistance of librarians. the important role of libraries in providing citizens with access to the internet, and especially to e-government, makes natural sense given the values of the public library.
these new services reflect the values traditionally upheld by public libraries, such as equal access to information, literacy and learning, and democracy. indeed, these values likely have played a significant role in developing and sustaining public trust in public libraries as institutions. thus, to understand how public libraries have come to serve as the default site for e-government access, it is important to consider how this role builds on and reflects the public library's enduring values. drawing upon several different strands of research, this article explores the intersections of public libraries, values, trust, and e-government. the article first examines the values of public libraries and the role that these values play in influencing popular opinion surrounding public libraries. next, the article focuses on the trust that communities place in public libraries, which builds upon the values that libraries uphold. after that, the article explores the reasons why public libraries became and remain the public access point for e-government, providing examples from the 2004 and 2005 hurricane seasons that illustrate this point in the most extreme circumstances. the article then examines the nexus of public libraries, values, trust, and e-government, further examining how the values of librarianship and the confidence that communities place in their public libraries reinforce the role of public libraries in the provision of e-government. finally, the article explores how the e-government role of public libraries could be cultivated to improve library services through involvement in research and educational initiatives.

■ public libraries and values

values can be seen as "evaluative beliefs that synthesize affective and cognitive elements to orient people to the world in which they live."1 in other words, values tie together how individuals think about the world and how they feel about the world. following this definition, values are situated within individuals. although they are a result of social interaction and may be shared among individuals, values are a highly individualized and personalized phenomenon. thus, values arise at the intersection of the individual and the social, with some scholars now making a case for increasing the emphasis placed on values in the social sciences.2 recently, many scholars and commentators have focused on the values of libraries, most notably former ala president michael gorman, who has written extensively on the topic.3 gorman focuses on library values in response to what he views as a disconnect between library practitioners and academics.
he argues that library­science programs are becoming increasingly detached from reality, and that one way to ground library science, as well as the library profession, is through an emphasis on the values of librar­ ianship, which demonstrate the core, enduring values of the profession.4 he explains that values, on the one hand, should provide a foundation for interaction and mutual understanding among members of a profession; on the other hand, they should not be viewed as immutable, but rather as sufficiently flexible to match the changing times. he lists eight central values of librarianship that he views as particularly salient at present: stewardship, service, intellectual freedom, rationalism, literacy and learning, equity of access to recorded knowledge and information, privacy, and democracy. frances groen echoes gorman’s sentiments and argues that one of the major limitations of library­science programs is their lack of attention to values.5 she argues that library and information science (lis) programs place almost all of their educational emphasis on what librar­ ians do and how they do it, and almost none on the rea­ sons why they do what they do and why such activities are important. she identifies three fundamental library values: access to information, universal literacy, and preservation of cultural heritage, all of which she argues are also characteristics of liberal democratic societies. this argument parallels the observation that increases in information access within a society are essential to increasing the inclusiveness of the democratic process in that society.6 library historian toni samek focuses on another aspect of library values that is no longer as strongly emphasized—attempts to achieve neutrality in libraries.7 neutrality often was advocated as a cherished value, in the sense of providing equal access to all information and sources. however, samek demonstrates that libraries, on the contrary, were more likely to emphasize mainstream information sources and thus privilege them over alter­ native sources. not only has the value of neutrality been problematic in terms of how it has been implemented and mobilized in public libraries in the 1960s and 1970s, but it also is perhaps impossible to ever achieve in reality.8 the fact that neither gorman nor groen include neutrality in their listings of fundamental library values demonstrates how library values have continued to evolve as public libraries have developed as social institutions. as library values have developed, they have served to unite librarians and establish the role of public libraries in their communities. the values of librarianship have been encoded in the american library association’s (ala) library bill of rights, which strongly asserts the values of equal access and service for all patrons, nondiscrimina­ tion, diversity of viewpoint, and resistance to censorship and other abridgments of freedom of expression.9 the values of libraries and librarianship are one of the fac­ tors that lead communities to trust public libraries, as the following section explores. 
overall, further study of the role of values in libraries is essential, especially given the increasing role of technology in public libraries.10

■ public libraries and trust

exactly one half of the respondents to a 2007 pew research center study agreed with the statement "you can't be too careful in dealing with people."11 however, even in a climate where trust can be a precious commodity, public libraries are trusted by their communities. carr argues that libraries have come to earn the trust of their communities because of four obligations that librarians strive to meet: to provide user-centered service, to actively engage in helping users, to connect information seekers to unexplored information sources, and to take the goal of helping users as a professional duty that is controlled first and foremost by the library user.12 similarly, jaeger and burnett argue that, because of its traditional defense of commonly accepted and popular values—such as free access to and exchange of information, providing a diverse range of materials and perspectives to users from across society, and opposition to government intrusions into personal reading habits—the public library has come to be seen by members of the populace as a trusted source of information in the community.13 gorman argues for a direct link between the values of libraries and the trust that is instilled within them by the public, stating that one important mission for ensuring the survival of libraries and librarianship is "assuring the bond of trust between the library and the society we serve by demonstrating our stewardship and commitment, thus strengthening the mutuality of the interests of librarians and the broader community."14 further, a 2006 study conducted by public agenda found that "public libraries seem almost immune to the distrust that is associated with so many other institutions."15 in specific terms of the internet, the public library "is a trusted community-based entity to which individuals turn for help in their online activities—even if they have computers and internet access at home or elsewhere."16 in a large-scale national survey, 64 percent of respondents, including both users and non-users of public libraries, asserted that providing public access to the internet should be one of the highest priorities for public libraries.17 thus, trust in public libraries seems to carry over from other library services to the provision of internet access and training. however, challenges to trust in public libraries seem to be growing in the internet age.
the trusted role of protecting users' personal information may create conflicts with the other social responsibilities of public libraries.18 as a result of a lack of preparedness of some librarians to deal with privacy issues, it is possible that "the trust that research shows users place in libraries is not fully repaid."19 a 2005 oclc study suggests that, indeed, user trust in public libraries shows signs of weakening, as the majority of citizens place as much trust in internet search engines as they do in public libraries.20 further, the changes in the law following the 9/11 terror attacks that have increased the ability of the federal government to track patron activities in public libraries, such as through the usa patriot act, have raised serious concerns about privacy and freedom of expression among many public library patrons and librarians.21

trust in libraries also has been challenged by the imposition of filters for public libraries that receive e-rate funding due to the children's internet protection act.22 while internet access is no longer unfettered in libraries that have to comply with the law, public libraries have been able to prevent this law from eroding their role as trusted internet provider through ala's vigorous legal challenge to the constitutionality of the law and the rejection of e-rate funds by a large number of libraries after the supreme court upheld the constitutionality of the law.23 thus, the trusting relationships that public libraries have built with their communities are valuable commodities that can be transferred under some circumstances from one particular service to another, yet they are not inalienable rights granted to public libraries. rather, public trust is something that libraries must work hard to maintain. trust in public libraries also has served as an important cause and effect of the role of libraries in providing access to e-government.

■ public libraries and e-government

public libraries are not only trusted as a means of access to the internet in general, they are trusted as a provider of access to e-government. with nearly every united states public library now connected to the internet and offering free public access, libraries can fill a community need by ensuring that all citizens have access to e-government and assistance in using e-government services.24 indeed, public libraries and the internet have both improved public access to government information.25 this social role also is embraced by all levels of government, with government agencies often directing people with questions about their online materials to public libraries for help.26 as such, government agencies also trust public libraries to serve as key providers of e-government access and training. public libraries could not have foreseen becoming the default social access point for e-government when they began to provide free public internet access in the mid-1990s, due in great part to the largely separate evolution of internet access in libraries and e-government. however, they now fill this role in society, ensuring access for those who have no other means of reaching e-government and providing a safety net of training and assistance for those who have access but need help using e-government. public libraries have developed into the social source of e-government for two reasons.
the first is simply that libraries committed to the provision of public internet access in the early 1990s and have continued to grow and improve that access so that virtually all public libraries in the united states provide free public internet access.27 however, presence of access alone does not account for the current role of the public library, as most public schools and government offices have internet access, and community technology centers were origi­ nally funded to create an environment that would provide computer access. a key difference in public libraries is that they are historically trusted as providers of information, including government information, to all segments of society. “the public library is one place that is culturally ingrained as a trusted source of free and open information access and exchange.”28 a key part of the provision of internet access in pub­ lic libraries also has been providing help. as heanue explains, “even if americans had all the hardware they needed to access every bit of government information they required, many would still need the help of skilled librarians whose job it is to be familiar with multiple systems of access to government systems.”29 not only is the information trusted because of the source, the help is trusted because the librarians are part of the library. as e­government has developed and the complexity has grown, this trusted help has become invaluable to many people who need to use e­government but do not feel able to on their own. in a 2001 study of both public library and internet users, the key preferences identified for public libraries included the ease of use, accuracy of informa­ tion available, and help provided by library staff.30 these perceptions have carried over into e­government, as the staff members not only provide help using e­government; their guidance directs users to the correct e­government sites and forms and makes using the sites an easier expe­ rience than it otherwise would be. in the era of e­government, governments internation­ ally are showing a strong preference for delivering ser­ vices via the internet, particularly as a means of boosting cost­efficiency and reducing time spent on direct interac­ tions with citizens.31 however, citizens show a strong preference for phone­based or in­person interactions with government representatives when they have questions or are seeking services.32 e­government services generally are limited by difficulties in searching for and locating the desired information, as well as lack of availability of computers and internet access to many segments of the general population.33 such problems are exacerbated by general lack of familiarity of the structure of government and which agencies to contact as well as many citizens’ attitudes toward technology and government.34 also, as e­government sites give more emphasis to presenting political agendas rather than promoting democratic par­ ticipation, users are less trusting of the sites themselves.35 finally, perhaps the most compelling reason for the reli­ ance on public libraries to provide access to and help with e­government is that public libraries provide support equally to all members of a community—and that free services are of most relative value to those who have the fewest resources of their own. 
as a result of the reliance of patrons and government agencies on the public library as a center for e-government access and assistance, public librarians have had to become de facto experts on e-government, ranging from medicare prescription plans to fema forms to immigration registration to water management registration.36 in one case, the involvement of a librarian who specialized in government information was necessary in a community planning process to sort through the related e-government materials and information sources.37 one area where the social roles as provider of e-government and as trusted provider of information were notably intertwined was during the 2004 and 2005 hurricane seasons along the gulf coast.

■ public libraries as trusted provider of e-government

public libraries have become vital access points and communication hubs for many communities and, in times of emergency, are vital in helping their communities cope with the crisis.38 this role proved especially important in communities along the gulf coast during the unprecedented 2004 and 2005 hurricane seasons, with public libraries employing their internet access to assist their communities in hurricane recovery in numerous ways. the public libraries in that region described five major roles for public library internet access in communities after a hurricane:

■ finding and communicating with dispersed and displaced family members and friends;
■ completing fema forms, which are online only, and insurance claims;
■ searching for news about conditions in the areas from which they had evacuated;
■ trying to find information about the condition of their homes or places of work, including checking news sites and satellite maps; and
■ helping emergency service providers find information and connect to the internet.39

the provision of e-government information and assistance in filling out e-government forms was a central function of these libraries in helping their communities. the level of assistance was astounding—one mississippi library completed more than forty-five thousand fema applications for patrons in the first month after katrina struck—despite the fact that the libraries were not specifically prepared to offer such a service and that few library systems planned for this type of situation.40 furthermore, while libraries helped many communities, they could not meet the enormous needs in the affected communities. the events along the gulf coast in 2004 and 2005 revealed a serious need for the integration of local and state public entities that have large-scale coordination plans to work with the libraries.41 most of the functions that community organizations played in the most ravaged areas after katrina, rita, wilma, dennis, ivan, and the other major storms were completely ad hoc and unplanned.42 the federal government was of little help in the immediate aftermath of many of these situations.43 as such, it was the local community organizations, particularly public libraries, that used information technology (at least what was still working) to try to pick up the pieces, get aid, find the missing, and perform other vital functions.
consider the following quotes from local government officials explaining the role computers and internet access in public libraries played in providing information to devastated communities:

our public access computers have been the only source of communicating with insurance carriers, the federal emergency management agency and other sources of aid.

the greatest impact has been access to information such as fema forms and job applications that are only available via internet. this was highly visible during the aftermath of hurricanes rita & katrina. overall access to information in this rural community has been outstanding due to use of the internet.

relief workers were encouraged to use the library to keep in touch with family and friends through email. . . . the library provided a fema team with local maps and help in locating areas that potentially suffered major damage from the storm.

during the immediate aftermath of katrina, our computers were invaluable in locating missing family, applying for fema relief (which could only be done online) and other emergency needs. for that time—the computers were a godsend.

we have a large number of displaced people who are coming to rely upon the library in ways many of them never expected. i've had so many people tell me that they had never been to a library before they had to find someplace to file a fema application or insurance claim. many of these people knew nothing about computers and would have been totally lost without the staff's help.44

along with e-government access, one of the greatest effects of internet access involved searches for lost family, friends, and pets, with many libraries creating lists of individuals who had been to the library and who were being sought, to help establish contacts between people. as one librarian stated, "our computers were invaluable in locating a missing family."45 searches were conducted by patrons and by librarians helping them to locate evacuees and search for information about those who stayed behind. internet access also allowed patrons to have "contact with family members outside of the disaster area," "communicate with family and friends," and "stay in touch with family and friends due to lack of telephone service."46 libraries used their internet access to help rescue personnel communicate with their agencies, and even to direct emergency responders with directions, maps, and information about where people most needed help.47

the level of local libraries' success in meeting the needs of their communities after the hurricanes varied widely, though. many were simply overwhelmed by the numbers of people in need and limited by the fact that they had never expected to have to act as a community lifeline in this way.48 the libraries that fared the best were usually in florida; they have a greater familiarity with dealing with hurricanes and thus were more prepared and had more established ties between local libraries, county governments, and state agencies.49 having internet access and expertise is clearly not enough. planning, coordination, experience, and government support and funding all influenced how different public libraries were able to respond after the major hurricanes.
public libraries also may be able to play a role in ongoing emergency response efforts, such as the development of large-scale community response grids that coordinate citizens and emergency responders in emergencies.50 the greatest lesson, however, may be that public libraries, as trusted providers of information technology access, particularly access to e-government, are the most local line of response in communities. the national government failed shatteringly and completely to help people after hurricane katrina, while little public libraries in and on the edges of the devastation hummed along. the local nature of the response that libraries could provide managed to reach communities and members of those communities much better than national or state level responses. such local response to crises, while vital, is becoming much harder to find outside of public libraries.

■ the nexus of public libraries, values, trust, and e-government

the democratically oriented core values of public libraries and the trust that communities place in their public libraries have the potential to significantly enhance and strengthen the role of public libraries in the provision of e-government. citizens who access e-government using computers in public libraries, and with the expert assistance of librarians, may have more confidence in the e-government information and services they are using as a result of their high regard for public libraries. as patrons trust that librarians will help them reach the information they need, patrons' awareness of and confidence in e-government will increase as they learn from librarians about the types of information and services available from e-government. further, by teaching patrons what is available from and how to use e-government, librarians are serving to increase the number of e-government users. because e-government is still at an early stage in its development, such positive associations could play a critical role in encouraging and facilitating its widespread acceptance and adoption.

just as e-government is still in its formative stages, research on e-government also is just getting started. to date, research on e-government has focused more on technical than social aspects. for example, a meta-analysis of 110 peer-reviewed journal articles related to e-government revealed that the relationship between e-government and values is an important, yet to date understudied, topic.51 it is important to consider not only bandwidth and markup languages, but also values and trust in developing and analyzing e-government. it also is important to consider the relationship between trust in e-government and the potential for increasingly participatory democracy. trust can be seen as "centrally positioned at the nexus between the primarily internally driven administrative reforms of e-government's architecture and the related, more externally rooted pressures for e-governance reflected in widening debates on openness and engagement."52 similarly, "citizen engagement can help build and strengthen the trust relationship between governments and citizens."53 e-government can facilitate citizen participation in government through the bidirectional interactive potential of the internet, making it possible to move toward strong democracy.54 greater faith in democracy can potentially significantly increase citizen trust in e-government.
at the same time that we consider all of these impor­ tant issues related to e­government, it is important not to lose sight of the critical role that public libraries play in the provision of e­government. further, it is necessary to make certain that public libraries receive credit and support for the work that they do in providing access to and help with e­government. as demonstrated above, public libraries are uniquely and ideally situated to ensure access to and assistance in using e­government information and services. however, this activity is not sustainable without the recognition and resources that must accompany this role. the conclusion addresses this important point in more detail. ■ conclusions and future directions the evolution of the public library into an e­government access point has occurred without the direct intention of public libraries and without their involvement in policy decisions related to these new social roles. as with the need to become more active in encouraging the develop­ ment of technologies to help libraries fulfill these social expectations, public libraries also must become more involved in the policy­making process and in seeking financial and other support for these activities. public libraries have to demand a voice not only to better con­ vey their critical role in the provision e­government, but to help shape the direction of the policy­making process to ensure more government support for the access to and help with e­government that they provide. public libraries have taken on these responsibilities without receiving additional funding. while the provi­ sion of internet access alone is a major expense for public libraries, the reliance of government agencies on public libraries as the public support system for e­government adds very significant extra burdens to libraries.55 in a 2007 survey of florida public libraries, for example, 98.7 percent indicated that they receive no support from an outside agency to support the e­government services the library provides, despite the fact that 83.3 percent of responding libraries indicated that the use of e­govern­ ment in the library had increased overall library usage.56 this lack of outside support has resulted in public librar­ ies in different parts of the country having widely varying access to the internet.57 the reality is that public libraries are expected by patrons and government agencies to fulfill this social role, whether or not any support—financial, staffing, or training—is provided for this role. the vital roles that public libraries played in the aftermath of the major hur­ ricanes of the 2004 and 2005 seasons may have perma­ nently cemented the public and government perception of public libraries as hubs for e­government access.58 while public libraries have become the unofficial uni­ versal access point for e­government and are trusted to serve as a vital community response and recovery agency during emergencies, they do not receive funding or other forms of external assistance for these functions. public libraries need to become involved in and encourage plans and programs that will serve to sustain these essential and inextricably linked activities, while also bringing some level of financial, training, and staffing support for these roles. 
the tremendous efforts and successes of public libraries in the aftermath of the 2004 and 2005 hurricanes have earned libraries a central position in e-government and emergency planning at local, state, and federal levels. in those emergency situations, public libraries were able to serve their communities in a capacity that was far beyond the traditional image of the role of libraries, but these emergency response roles are as significant as anything else libraries could do for their communities. in order to continue fulfilling these roles and adequately performing other expected functions, public libraries need to push not only for financial support, but also for a greater role in planning and decision-making related to e-government services as well as emergency response and recovery at all levels of government.

if strategic plans and library activities have a consistent message about the need for support, the interrelated roles of trusted source of local information, e-government access provider, and community-response information and coordination center can make a compelling argument for increases in funding, support, and social standing of public libraries. the most obvious source of further support for these activities would be the federal government. amazingly, federal government support accounts for only about 1 percent of public library funding.59 given that federal government agencies are already relying on public libraries to ensure access to e-government and foster community response and recovery in times of emergencies, federal support for these social roles of the public library clearly can and should be increased significantly. state libraries, cooperatives, and library networks already work to coordinate funding and activities related to certain programs, such as the e-rate program.60 these same library collectives may be able to work together to promote the need for additional resources and coordinate those resources once they are attained. private and public partnerships offer another potential means of support for these library activities. with its strong historical and current connections to technology and libraries, the bill and melinda gates foundation might be a very important partner in funding and facilitating the increased role that public libraries play in providing access to and help with e-government. the search for additional funding to support e-government provision should not only focus on funds for access and training, but also on funds for research about how to better meet individual and community e-government needs and the effects of e-government provision by public libraries on individuals and communities.

regardless of what approaches are taken to finding greater support, however, public libraries must do a better job of communicating their involvement in the provision of e-government to governments and private organizations in order to increase support. such communications will need to be part of a larger strategy to define a place within public policy that gives public libraries a voice in e-government issues. if public libraries are going to fulfill this social role, they must become a greater presence in the national policy discourse surrounding e-government.
to increase their support and standing in policy discourse, libraries must not be hesitant in reminding the public and government officials of their successes after emergencies and in providing the social infrastructure for e-filing of taxes, enrolling in medicare prescription drug plans, and myriad other routine e-government activities.

in many societies, e-government has come to be seen by many citizens and governments as a force that will enhance democratic participation, more closely link citizens and their representatives, and help disadvantaged populations become more active participants in government and in society.61 e-government is seen by many as having "the potential to fundamentally change a whole array of public interactions with government."62 while the e-government act of 2002 and the president's e-government management agenda have emphasized the transformative effect of e-government, thus far it has primarily been used as a way to make information available, provide forms and electronic filing, and distribute the viewpoints of government agencies.63 however, many citizens do look to e-government as a valuable source of information, considering e-government sites to be "objective authoritative sources."64 currently, the primary reason that people use e-government is to gather information.65 in the united states, 58 percent of internet users believe e-government to be the best source for government information, 65 percent of americans expect that information they are seeking will be on a government site, and 26 million americans seek political information online every day.66

public satisfaction with the e-government services available, however, is limited. as commercial sites are developing faster and provide more innovative services than e-government sites, public satisfaction with government web sites is declining.67 public confidence in government web sites also has declined as much of the public policy related to e-government since 9/11 has been to reduce access to information through e-government.68 the types of information that have been affected include many forms of socially useful information, from scientific information to public safety information to information about government activities.69 for these and other reasons, the majority of citizens, even those with a high-speed internet connection at home, seeking government information and services prefer to speak to a person directly in their contacts with the government.70 in many cases, people turn to public librarians to serve as the person involved in e-government contacts.

further, when people struggle with, become frustrated by, or reject e-government services, they turn to public libraries. every year, public libraries deal with huge numbers of patrons needing help with online taxes, and the medicare prescription drug plan sign-up period resulted in an influx of seniors to public libraries seeking help in using the online registration system.71 for example, during the 2006 tax season, virginia discontinued the distribution of free print copies of tax forms to encourage use of the online system. instead, citizens of the state flooded public libraries, assuming that libraries could find them print copies of the forms, which of course the libraries did.
it seems unlikely, however, that the same government officials pushing the use of e­government are aware of the roles of public libraries in helping citizens with day­to­day e­government use. further, the enormous social roles of public libraries in emergency response in communities, such as during the 2004 and 2005 hurricane seasons, are far from widely known among government officials. to encourage the provision of external funding, the develop­ ment of targeted support technologies, and policy sup­ port for these social roles, public libraries must make the government and the public better aware of these roles and what is needed to ensure that the roles can be fulfilled. similarly, there is an extremely important role for lis programs in ensuring public libraries can meet community expectations for e­government provision. lis program graduates need to be prepared to help patrons access and use e­government information and services. as govern­ ment activities move primarily or exclusively online, patrons will increasingly seek help with e­government from public libraries. lis programs must ensure that grad­ uates are ready to serve patrons in this capacity. in 2007, the college of information studies at the university of maryland became the first ala­accredited school to offer a concentration in e­government as part of the master of library science program.72 the goal of this concentration is to prepare future librarians who wish to specialize in e­government, which will be an area of increasing and sig­ nificant need as more government information and services move online and more government agencies rely on public libraries to ensure access to e­government. lis programs need to prioritize finding ways to incorporate the teaching of issues related to e­government in public libraries as new concentrations or courses, or into existing courses. the provision of e­government is an important role of public libraries that is likely to increase significantly, and gradu­ ates of lis programs need to be prepared to meet patrons’ e­government information needs. further, lis faculties also can support public libraries in their e­government access and training roles by focusing more research on the intersections of public libraries and e­government. ultimately, the role of the trusted and valued public provider of e­government access creates many financial and staffing obligations and social responsibilities, but it also is a tremendous opportunity for public libraries. fighting against censorship efforts in the 1950s estab­ lished the public perception of libraries as guardians of the first amendment during the mccarthy era.73 working to ensure access and the ability to use e­government is creating new public perceptions of libraries as guardians of equal access in new but just as socially meaningful ways. rather than needing to ponder whether the emer­ gence of the internet will limit or remove the relevance of public libraries, the advent of e­government has created a brand new and very significant role that public libraries can play in serving their communities. given the empha­ sis that governments are placing on moving information and services online, patrons will continue to need access to and assistance in using e­government. the trust and values that have long been associated with public libraries are evolving to include the social expectations of the provision of access to and training for e­government by public libraries. 
in the same ways that patrons have learned to trust public libraries to provide equal access to print information sources, they now have learned to trust that libraries can provide equal access to e­government information. it seems that citizens will regu­ larly be turning to public libraries for help with mundane e­government activities, such as finding forms and filing taxes, as well as with the most pressing e­government activities, as was demonstrated in the aftermath of hur­ ricanes katrina and rita. because the trust in and values of public libraries have set the stage for the emerging role of libraries in e­government, public libraries need to work to ensure the availability of the support, education, and policy decisions that they need to serve their communities in this new and vital role in situations ranging from every­ day information needs to the most extreme circumstances. in spite of the costs associated with serving as the public’s e­government access center, acting as the social guarantor of equal access to e­government emphatically demonstrates that public libraries will continue to be a central part of the infrastructure of society in the internet age. public libraries now must learn to articulate better the social roles they are playing and the types of support they need from lis programs, funding agencies, and gov­ ernment agencies to continue playing these roles. ■ acknowledgment the authors of this paper have worked with several col­ leagues on projects related to the ideas discussed in this paper. the authors would particularly like to thank john carlo bertot, lesley a. langa, charles r. mcclure, jennifer preece, yan qu, ben shneiderman, and philip fei wu. references and notes 1. margaret mooney marini, “social values and norms,” encyclopedia of sociology, edgar f. borgatta and marie l. borgatta, eds., 2828 (new york: macmillan, 2000). 42 information technology and libraries | december 200742 information technology and libraries | december 2007 2. steven hitlin and jane allyn piliavin, “values: reviv­ ing a dormant concept,” annual review of sociology 30 (2004): 359–93. 3. michael gorman, our singular strengths: meditations for librarians (chicago: ala, 1997); michael gorman, our enduring values: librarianship in the 21st century (chicago: ala, 2000); michael gorman, our own selves: more meditations for librarians (chicago: ala, 2005). 4. gorman, our enduring values. 5. frances k. groen, access to medical knowledge: libraries, digitization, and the public good (lanham, md.: scarecrow, 2007). 6. elizabeth smith, “equal information access and the evo­ lution of american democracy,” journal of educational media and library sciences 33, no. 2 (1995): 158–71. 7. toni samek, intellectual freedom and social responsibility in american librarianship, 1967–1974 (jefferson, n.c.: mcfarland, 2001). 8. pam scott, evelleen richards, and brian martin, “cap­ tives of controversy: the myth of the neutral social researcher in contemporary scientific controversies,” science, technology, and human values 15 (1990): 474–94. 9. american library association, “library bill of rights,” www.ala.org/ala/oif/statementspols/statementsif/librarybill­ rights.htm (accessed may 19, 2007). 10. kenneth r. fleischmann, “digital libraries with embed­ ded values: combining insights from lis and science and technology studies,” library quarterly (in press); kenneth r. 
fleischmann, “digital libraries and human values: human­ computer interaction meets social informatics,” proceedings of the 70th annual conference of the american society for infor­ mation science and technology, milwaukee, wisc., 2007. 11. pew research center, americans and social trust: who, where, and why (washington, d.c.: pew research center, 2007), http://pewresearch.org/assets/social/pdf/socialtrust.pdf, 2. 12. david wildon carr, “an ethos of trust in information service,” in ethics and electronic information: a festschrift for stephen almagno, barbara rockenbach and tom mendina, eds., 45–52 (jefferson, n.c.: mcfarland, 2003). 13. paul t. jaeger and gary burnett, “information access and exchange among small worlds in a democratic society: the role of policy in redefining information behavior in the post­ 9/11 united states,” library quarterly 75, no. 4 (2005): 464–95. 14. gorman, our enduring values, 66. 15. public agenda, long overdue: a fresh look at public and leadership attitudes about libraries in the 21st century (new york: public agenda, 2006), 11, www.publicagenda.org/research/ pdfs/long_overdue.pdf (accessed may 19, 2007). 16. john carlo bertot et al., “public access computing and internet access in public libraries: the role of public librar­ ies in e­government and emergency situations,” first monday 11, no. 9 (2006), www.firstmonday.org/issues/issue11_9/bertot (accessed may 19, 2007). 17. public agenda, long overdue. 18. nancy zimmerman and feili tu, “it is not just a matter of ethics ii: an examination of issues related to the ethical provi­ sion of consumer health services in public libraries,” ethics and electronic information: a festschrift for stephen almagno, barbara rockenbach and tom mendina, eds., 119–27 (jefferson, n.c.: mcfarland, 2003). 19. paul sturges and ursula iliffe, “preserving a secret garden for the mind: the ethics of user privacy in the digital library,” ethics and electronic information: a festschrift for stephen almagno, barbara rockenbach and tom mendina, eds., 74–81 (jefferson, n.c.: mcfarland, 2003), 81. 20. online computer library center, inc. (oclc), perceptions of libraries and information resources: a report to the oclc membership (dublin, ohio: oclc, 2005). 21. jaeger and burnett, “information access and exchange among small worlds in a democratic society”; paul t. jaeger et al., “the usa patriot act, the foreign intelligence surveil­ lance act, and information policy research in libraries: issues, impacts, and questions for library researchers,” library quarterly 74, no. 2 (2004): 99–121. 22. children’s internet protection act, public law 106–554. 23. paul t. jaeger, john carlo bertot, and charles r. mcclure, “the effects of the children’s internet protection act (cipa) in public libraries and its implications for research: a statistical, policy, and legal analysis,” journal of the american society for information science and technology 55, no. 13 (2004): 1131–39; paul t. jaeger et al., “cipa: decisions, implementation, and impacts,” public libraries 44, no. 2 (2005): 105–09. 24. bertot et al., “public access computing and internet access in public libraries”; john carlo bertot et al., “drafted: i want you to deliver e­government,” library journal 131, no. 13 (2006): 34–39; john carlo bertot et al., public libraries and the internet 2006: study results and findings (tallahassee, fla.: infor­ mation institute, 2006), www.ii.fsu.edu/plinternet_reports.cfm (accessed may 19, 2007). 25. 
nancy kranich, “libraries, the internet, and democracy,” libraries & democracy: the cornerstones of liberty, nancy kranich, ed., 83–95 (chicago: ala, 2001). 26. bertot et al., “public access computing and internet access in public libraries”; bertot et al., “drafted.” 27. bertot et al., public libraries and the internet 2006. 28. jaeger and burnett, “information access and exchange among small worlds in a democratic society,” 487. 29. anne heanue, “in support of democracy: the library role in public access to government,” information, libraries, and democracy: the cornerstones of liberty, nancy kranich, ed. (chi­ cago: ala, 2001), 124. 30. george d’elia et al., “the impact of the internet on public library uses: an analysis of the current consumer market for library and internet services,” journal of the american society for information science and technology 53, no. 10 (2002): 802–20; eleanor jo rodger, george d’elia, and corrine jorgensen, “the public library and the internet: is peaceful coexistence pos­ sible?,” american libraries 31, no. 5 (2001): 58–61. 31. w. e. ebbers, w. j. pieterson, and h. n. noordman, “elec­ tronic government: rethinking channel management strate­ gies,” government information quarterly (in press). 32. ibid. 33. awdhesh k. singh and rajendra sahu, “integrating inter­ net, telephones, and call centers for delivering better quality e­governance to all citizens,” government information quarterly (in press). 34. paul t. jaeger and kim m. thompson, “e­government around the world: lessons, challenges, and new directions,” government information quarterly 20, no. 4 (2003): 389–94; paul t. jaeger and kim m. thompson, “social information behavior article title | author 43public libraries, values, trust, and e-government | jaeger and fleischmann 43 and the democratic process: information poverty, normative behavior, and electronic government in the united states,” library & information science research 26, no. 1 (2004): 94–107. 35. paul t. jaeger, “deliberative democracy and the con­ ceptual foundations of electronic government,” government information quarterly 22, no. 4 (2005): 702–19; paul t. jaeger, “information policy, information access, and democratic partic­ ipation: the national and international implications of the bush administration’s information politics,” government information quarterly (in press). 36. bertot et al., “public access computing and internet access in public libraries”; bertot et al., “drafted.” 37. aimee c. quinn and laxmi ramasubramanian, “infor­ mation technologies and civic engagement: perspectives from librarianship and planning,” government information quarterly (in press). 38. bertot et al., public libraries and the internet 2006; paul t. jaeger et al., “the 2004 and 2005 gulf coast hurricanes: evolv­ ing roles and lessons learned for public libraries in disaster preparedness and community services,” public library quarterly (in press). 39. bertot et al., “drafted.” 40. jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 41. ibid. 42. ibid. 43. michael arnone, “storm watch 2006: ready or not,” federal computer week, june 5, 2006, www.fcw.com/print/12_20/ news/94711­1.html (accessed may 19, 2007). 44. jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 45. bertot et al., “public access computing and internet access in public libraries.” 46. jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 47. ibid. 48. ibid. 49. bertot et al., “public access computing and internet access in public libraries.” 50. paul t. 
jaeger et al., “911.gov: harnessing e­government, mobile communication technologies, and social networks to promote community participation in emergency response,” telecommunications policy (in press); ben shneiderman and jenny preece, “911.gov: community response grids,” science 315 (2007): 944. 51. kim viborg andersen and helle zinner henriksen, “e­government research: capabilities, interaction, orientation, and values,” current issues and trends in e-government research, donald f. norris, ed., 269–88 (hershey, pa.: cybertech, 2007). 52. jeffrey roy, “e­government in canada: transition or trans­ formation?” current issues and trends in e-government research, donald f. norris, ed., 44–67 (hershey, pa.: cybertech, 2007), 51. 53. oecd e­government studies, the e-government imperative (danvers, mass.: organization for economic co­operation and development, 2005), 45. 54. bruce barber, strong democracy (berkeley, calif.: univ. of california pr., 1984). 55. bertot et al., public libraries and the internet 2006. 56. charles r. mcclure et al., e-government and public libraries: current status, meeting report, findings, and next steps (tallahassee, fla.: information use management and policy institute, 2007), www.ii.fsu.edu/announcements/e­gov2006/ egov_report.pdf (accessed may 19, 2007). 57. paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology and libraries 26, no. 2 (2007): 4–14. 58. jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 59. bertot et al., “drafted.” 60. jaeger et al., “public libraries and internet access across the united states.” 61. beth simone noveck, “designing deliberative democracy in cyberspace: the role of the cyber­lawyer,” boston university journal of science and technology 9 (2003): 1–91. 62. s. h. holden and l. i. millett, “authentication, privacy, and the federal e­government,” information society 21 (2005): 367. 63. e­government act of 2002, p.l. 107–347; jaeger, “delibera­ tive democracy and the conceptual foundations of electronic government”; e-government strategy: implementing the president’s management agenda for e-government (washington, d.c.: egov, 2003), www.whitehouse.gov/omb/egov/2003egov_strat.pdf (accessed may 19, 2007). 64. anderson office of government services, a usability analysis of selected federal government web sites (anderson office of government services: washington, d.c., 2002), 1. 65. christopher g. reddick, “citizen interaction with e­gov­ ernment: from the streets to servers?,” government information quarterly 22, no. 1 (2005): 338–57. 66. john b. horrigan, politics online (washington, d.c., pew internet & american life project, 2006); john b. horrigan and lee rainie, counting on the internet (washington, d.c., pew internet & american life project, 2002). 67. stephen barr, “public less satisfied with government websites,” washington post, mar. 21, 2007, www.washingtonpost. com/wp­dyn/content/article/2007/03/20/ar2007032001338. html (accessed may 19, 2007). 68. lotte e. feinberg, “foia, federal information policy, and information availability in a post­9/11 world,” government information quarterly 21 (2004): 439–60; elaine l. halchin, “electronic government: government capability or terrorist resource,” government information quarterly 21 (2004): 406–19: harold c. relyea and elaine l. halchin, “homeland security and information management,” the bowker annual: library and trade almanac 2003, d. 
bogart, ed., 231–50 (medford, n.j.: infor­ mation today, 2003). 69. jaeger, “information policy, information access, and democratic participation.” 70. john b. horrigan, how americans get in touch with government (washington, d.c., pew internet & american life project, 2004). 71. bertot et al., “public access computing and internet access in public libraries”; bertot et al., “drafted.” 72. the description of the university of maryland’s e­gov­ ernment master’s program is available at www.clis.umd.edu/ programs/egov.shtml. 73. jaeger and burnett, “information access and exchange among small worlds in a democratic society.” 54 information technology and libraries | june 2010 tinuing education opportunities for library information technologists and all library staff who have an interest in technology. 2. innovation: to serve the library community, lita expert members will identify and demonstrate the value of new and existing technologies within ala and beyond. 3. advocacy and policy: lita will advocate for and participate in the adoption of legislation, policies, technologies, and standards that promote equitable access to information and technology. 4. the organization: lita will have a solid structure to support its members in accomplishing its mission, vision, and strategic plan. 5. collaboration and outreach: lita will reach out and collaborate with other library organizations to increase the awareness of the importance of technology in libraries, improve services to existing members, and reach out to new members. the lita executive committee is currently finalizing the strategies lita will pursue to achieve success in each of the goal areas. it is my hope that the strategies for each goal are approved by the lita board of directors before the 2010 ala annual conference in washington, d.c. that way the finalized version of the lita strategic plan can be introduced to the committee and interest group chairs and the membership as a whole at that conference. this will allow us to start the next fiscal year with a clear road for the future. while i am excited about what is next, i have also been dreading the end of my presidency. i have truly enjoyed my experience as lita president, and in some way wish it was not about to end. i have learned so much and have met so many wonderful people. thank you for giving me this opportunity to serve you and for your support. i have truly appreciated it. a s i write this last column, the song “my way” by frank sinatra keeps going through my head. while this is definitely not my final curtain, it is the final curtain of my presidency. like sinatra i have a few regrets, “but then again, too few to mention.” there was so much more i wanted to accomplish this year; however, as usual, my plans were more ambitious than the time i had available. being lita’s president was a big part of my life, but it was not the only part. those other parts—like family, friends, work, and school—demanded my attention as well. i have thought about what to say in this final column. do i list my accomplishments of the last year? nah, you can read all about that in the lita annual report, which i will post in june. tackle some controversial topic? while i can think of a few, i have not yet thought of any solutions, and i do not want to rant against something without proposing some type of solution or plan of attack. i thought instead i would talk about where i have devoted a large part of my lita time over the last year. 
as i look back at the last year, i am also thinking ahead to the future of lita. we are currently writing lita's strategic plan. we have a lot of great ideas to work with. lita members are always willing to share their thoughts both formally and informally. i have been charged with the task of taking all of those great ideas, gathered at conferences, board meetings, hallway conversations, surveys, e-mail, etc., to create a roadmap for the future. after reviewing all of the ideas gathered over the last three years, i was able to narrow that list down to six major goal areas. with the assistance of the lita board of directors and the lita executive committee, we whittled the list down to five major goal areas of the lita strategic plan: 1. training and continuing education: lita will be nationally recognized as the leading source for con- michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago. michelle frisque president's message: the end and new beginnings editorial board thoughts: technology and mission: reflections of a first-year college library director ed tallent information technology and libraries | december 2012 3 as i reflect on my first year as director for a small college library, several themes are clear to me, but perhaps none resonates as vibrantly as the challenges in managing technology, technology planning, and the never-ending need for technology integration, both within the library and the college. it is all-encompassing, involving every library activity and initiative. while my issues will naturally have a contextual flavor unique to my place of employment, i imagine they reflect issues that all librarians face (or have already faced). what is perhaps less unique is how these issues of library technology intersect with some very high priority college initiatives and challenges. and, given myriad reports on students' ongoing ambivalent attitudes toward libraries (after everything we have done for them!), it still behooves us to keep working at this integration of the library into the learning and teaching process and to hitch our wagon to larger strategic missions. so, what issues have i faced? the campus portal vs. library web site: this issue is neither new nor unique, but it is still a tangled web of conflicting priorities and attitudes, campus politics and technology vision, the extent and location of technology support, and the flexibility of the campus portal or content management system (cms) and the people who direct it. it is not a question of any misunderstandings, as the need to market the library via the campus web site is obvious and the goal of personalized service is laudable. yet, marrying the external marketing needs with the internal support needs is a difficult balance to achieve. the web offers a more dramatic entrée to the library than a portal/intranet, and portal technology is not perfect, as jakob nielsen highlights in a recent post. the goal obviously is further complicated by the fact that the support needed to maintain a quality web presence--one that is graphically interesting, vibrant, and intuitive--is significant when one considers library web sites are rarely used as a place to begin research by students and faculty. ed tallent (edtallent@curry.edu) is director, levin library, curry college, milton, massachusetts. 
http://www.useit.com/alertbox/intranet-usability.html editorial board thoughts: technology and mission | tallent 4 the portal, on the other hand, promises a personalized approach and easier maintenance, but lacks the level of operability that would be desirable. the web presence can support both user needs and offer visitors a sense of the quality services and collections the library provides. so, at this writing, what we have is a litany of questions not yet resolved. mobile, tablets, and virtual services: the questions also abound in these areas. should we build our own mobile services, or contract out the development? do we (can we) focus on creating a leadership role for the library in the area of emerging technology, or wait for a coordinated institutional vision and plan to emerge? in the area of tablets, we are about to commence circulating ipads, and anyone who has gone through the labyrinthine process just to load apps will know that the process gives one pause as to the value of such an initiative, and that is before they circulate and need to be managed. still, it is a technology initiative that demands review of library work flows, security, student training, and collection access. virtual services were at a fairly nascent state upon my arrival and have grown slowly, as they are being developed in a culture that stressed individual, hands-on, and personalized services. virtual services can be all that, but that needs to be demonstrated not only to the user but to the people delivering the service. the added value here is that the work engages us in valuable reflections on the way in which we work or should work. value of the library: i began my new position at a time when the college was deeply engrossed in the issue of student recruitment, retention, and success. for my employer these are significant institutional identity issues, and the library is expected to document its contributions to student outcomes and success. not nearly enough has been done, though a working relationship with a new director of institutional research is developing, and critical issues such as information literacy, integrated student support, learning spaces, learning analytics, and the need for a data warehouse will be incorporated into the college's strategic plan. the opportunity is there for the library to link with major college initiatives, for example, and make information literacy more than a library issue. citation management: now, here is a traditional library activity, the bane of many a reference service interaction and the undergraduate's last-minute nightmare. a combination of technical, service, and fiscal challenges revolves around the campus climate on the use of technology to respond to this quandary. what to do with faculty who believe strongly that the best way to learn this skill is by hand, not with any system that aims for interoperability and a desire to save the time of the user? for others, which tool should be used? should we not just go with a free one? while discipline differences will always exist, the current environment does present opportunities for the library to take a leadership role in defining what the possibilities are and ideally connecting the approach to appropriate and measurable learning outcomes and to the larger issue of academic integrity. 
information technology and libraries | december 2012 5 e-books, pda, article tokens: one of the unforeseen benefits of my moving to a small college library is that there is not the attachment to a print collection that exists in many/most research libraries. there is remarkable openness to experimenting with and committing to various methods of digital delivery of content. thus, we have been able to test myriad possibilities, from patron driven book purchasing, tokens for journal articles, and streaming popular films from a link in the library management system. this blurring of content, delivery, and functionality presents numerous opportunities for librarians to have conversations with departments of the future of collections. connecting with alumni: this is always an important strategic issue for colleges and universities and it seems as though there are promising emerging options for libraries to deliver database content to alumni, as vendors are beginning to offer more reasonable alumni-oriented packages. my library will be working with the appropriate campus offices next year to develop a plan for funding targeted library content for alumni as part of the college’s broader strategic activities to engage alumni. web design skills: while i understand the value that products like libguides can bring to the community, allowing content experts (librarians) to quickly and easily create template-driven web-based subject guides, i remain troubled by the lack of design skills librarians possess, and by the lack of recognition that good design can be just as important as good content. this is not a criticism, as we are not graphic designers. we have a sense of user needs, knowledge about content, and a desire to deliver, but i believe that products like this lead librarians to believe that good design for learning is easy. i do not claim to be an expert, but i know this is not the case. this approach does not translate into user friendly guides that hold to consistent standards. i think we need to recognize that we can benefit from non-librarian expertise in the area of web design. one opportunity that i want to investigate along these lines is to create student internships that would bring design skills and the student perspective to the work. a win-win, as this also supports the college’s desire for more internships and experiential learning for students. there is neither time nor space to address an even broader library technology issue on the near horizon, which will be another campus engagement moment, the future ils for the library. yet, maybe that should have been addressed first, since what i have read and heard, the new ilss will solve all of the above problems! 
editor's comments bob gerrity information technology and libraries | september 2013 3 bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia. this month's issue in this month's issue, we welcome back the president's message column, with incoming lita president cindi trainor describing upcoming lita events, priorities, and opportunities for members. university of denver mlis candidate gina schlesselman-tarango contributes a compelling piece describing the background, use, and potential library application of searchable signatures in web 2.0 applications such as instagram. jenny emanuel from university of illinois reports on the complex relationship that millennial academic librarians have with technology. kristina l. southwell and jacquelyn slater from university of oklahoma present the findings of a study evaluating the accessibility of special collections finding aids to screen readers for visually impaired users. ping fu from central washington university and moira fitzgerald from yale university look at the potential effects of cloud-based next-generation library services platforms on staffing models for systems and technical-services departments. visiting the discovery side of library services, megan johnson from appalachian state university reports on usability testing of appalachian's "one box" integrated articles and catalog search, using innovative interfaces' encore discovery service. speaking of usability, i had the chance recently to observe a usability testing session for my library's website, and was reminded of the importance of designing library websites and delivering web-based library services that will actually be of value to our users, delivered with their context in mind rather than ours. my library, like many others, has a website rich in content and complexity and organized around our structure. to the user i was observing, the complexity and library-centric organization clearly were obstacles to the rich content we offer. an undergraduate art history major, she was primarily interested in library resources and services that were directly connected to her coursework and that were accessible from the university's learning management system (lms). she valued the convenience of direct access from the lms to library-managed course readings and past exam papers. but, when asked to navigate to the same resources using the library homepage as a starting point, rather than the lms, she quickly became frustrated and confused by the overload of search options with (to her) confusing labels. she was further stymied by our proclivity to make things more complex than they need to be (or should be). a simple example: a common occurrence at the beginning of semester is that students with outstanding library fines/fees are blocked from registering for classes. rather than providing a simple, direct "resolve my library fees" link, with clear instructions on how to fix their problem, as 
editor’s comments bob gerrity editor’s comments | gerrity 4 quickly as possible, we instead provide pages of information about how and why the fines/fees were calculated, with no link to a solution to the problem at hand. my takeaways from the session were that (1) our website needs to be radically simplified and (2) we should be focussing on designing and delivering services that can be embedded in the context of the user’s natural workflows, not the library’s. easier said than done, of course. reviewers needed the ital editorial board has room for a couple of additional members, to help us keep up with incoming article submissions. if you have a passion for library technology, a willingness to undertake a few reviews each year, and are a member of lita (or willing to join), please send me an e-mail indicating your interest and area(s) of expertise. as always, suggestions and feedback on ital are welcome, at the e-mail address above. content management systems: trends in academic libraries ruth sara connell information technology and libraries | june 2013 42 abstract academic libraries, and their parent institutions, are increasingly using content management systems (cmss) for website management. in this study, the author surveyed academic library web managers from four-year institutions to discover whether they had adopted cmss, which tools they were using, and their satisfaction with their website management system. other issues, such as institutional control over library website management, were raised. the survey results showed that cms satisfaction levels vary by tool and that many libraries do not have input into the selection of their cms because the determination is made at an institutional level. these findings will be helpful for decision makers involved in the selection of cmss for academic libraries. introduction as library websites have evolved over the years, so has their role and complexity. in the beginning, the purpose of most library websites was to convey basic information, such as hours and policies, to library users. as time passed, more and more library products and services became available online, increasing the size and complexity of library websites. many academic library web designers found that their web authoring tools were no longer adequate for their needs and turned to cmss to help them manage and maintain their sites. for other web designers, the choice was not theirs to make. their institution transitioned to a cms and required the academic library to follow suit, regardless of whether the library staff had a say in the selection of the cms or its suitability for the library environment. the purpose of this study was to examine cms usage within the academic library market and to provide librarians quantitative and qualitative knowledge to help make decisions when considering switching to, or between, cmss. in particular, the objectives of this study were to determine (1) the level of saturation of cmss in the academic library community; (2) the most popular cmss within academic libraries, the reasons for the selection of those systems, and satisfaction with those cmss; (3) if there is a relationship between libraries with their own dedicated information technology (it) staff and those with open source (os) systems; and (4) if there is a relationship between institutional characteristics and issues surrounding cms selection. 
ruth sara connell (ruth.connell@valpo.edu) is associate professor of library services and electronic services librarian, christopher center library services, valparaiso university, valparaiso, in. mailto:ruth.connell@valpo.edu content management systems: trends in academic libraries | connell 43 although this study largely focuses on cms adoption and related issues, the library web designers who responded to the survey were asked to identify what method of web management they use if they do not use a cms and asked about satisfaction with their current system. thus, information regarding cms alternatives (such as adobe’s dreamweaver web content editing software) is also included in the results. as will be discussed in the literature review, cmss have been broadly defined in the past. therefore, for this study participants were informed that only cmss used to manage their primary public website were of interest. specifically, cmss were defined as website management tools through which the appearance and formatting is managed separately from content, so that authors can easily add content regardless of web authoring skills. literature review most of the library literature regarding cms adoption consists of individual case studies describing selection and implementation at specific institutions. there are very few comprehensive surveys of library websites or the personnel in charge of academic library websites to determine trends in cms usage. the published studies including cms usage within academic libraries do not definitively answer whether overall adoption has increased. in 2005 several georgia state university librarians surveyed web librarians at sixty-three of their peer institutions, and of the sixteen responses, six (or 38 percent) reported use of “cms technology to run parts of their web site.” 1 a 2006 study of web managers from wide range of institutions (associates to research) indicated a 26 percent (twenty-four of ninety-four) cms adoption rate.2 a more recent 2008 study of institutions of varying sizes resulted in a little more than half of respondents indicating use of cmss, although the authors note that “people defined cmss very broadly,” 3 including tools like moodle and contentdm, and some of those libraries indicated they did not use the cms to manage their website. a 2012 study by comeaux and schmetzke differs from the others mentioned here in that they reviewed academic library websites of the fifty-six campuses offering ala-accredited graduate degrees (generally larger universities) and used tools and examined page code to try to determine on their own if the libraries used cmss, as opposed to polling librarians at those institutions to ask them to self-identify if they used cmss. they identified nineteen out of fifty-six (34 percent) sites using cmss. the authors offer this caveat, “it is very possible that more sites use cmss than could be readily identified. this is particularly true for ‘home-grown’ systems, which are unlikely to leave any readily discernible source code.” 4 because of different methodologies and population groups studied in these studies, it is not possible to draw conclusions regarding cms adoption rates within academic libraries over time using these results. as mentioned previously, some people define cmss more broadly than others. one example of a product that can be used as a cms, but is not necessarily a cms, is springshare’s libguides. many libraries use libguides as a component of their website to create guides. 
however, some libraries have utilized the product to develop their whole site, in effect using it as a cms. a case study by information technology and libraries | june 2013 44 two librarians at york college describes why they chose libguides as their cms instead of as a more limited guide creation tool.5 several themes recurred throughout many of the case study articles. one common theme was the issue of lack of control and problems of collaboration between academic libraries and the campus entities controlling website management. amy york, the web services librarian at middle tennessee state university, described the decision to transition to a cms in this way, “and while it was feasible for us to remain outside of the campus cms and yet conform to the campus template, the head of the it web unit was quite adamant that we move into the cms.” 6 in a study by bundza et al., several participants who indicated dissatisfaction with website maintenance mentioned “authority and decision-making issues” as well as “turf struggles.” 7 other articles expressed more positive collaborative experiences. morehead state university librarians kmetz and bailey noted, “when attending conferences and hearing the stories of other libraries, it became apparent that a typical relationship between librarians and a campus it staff is often much less communicative and much less positive than [ours]. because of the relatively smooth collaborative spirit, a librarian was invited in 2003 to participate in the selection of a cms system.” 8 kimberley stephenson also emphasized the advantageous relationships that can develop when a positive approach is used, “rather than simply complaining that staff from other departments do not understand library needs, librarians should respectfully acknowledge that campus web developers want to create a site that attracts users and consider how an attractive site that reflects the university’s brand can be beneficial in promoting library resources and services.” 9 however, earlier in the article she does acknowledge that the iterative and collaborative process between the library and their university relations (ur) department was occasionally contentious and that the web services librarian notifies ur staff before making changes to the library homepage.10 another common theme in the literature was the reasoning behind transitioning to a cms. one commonly cited criterion was access control or workflow management, which allows site administrators to assign contributors editorial control over different sections of the site or approve changes before publishing.11 however, although this feature is considered a requirement by many libraries, it has its detractors. kmetz and bailey indicated that at morehead state university, “approval chains have been viewed as somewhat stifling and potentially draconian, so they have not been activated.” 12 these studies greatly informed the questions used and development of the survey instrument for this study. method in designing the survey instrument, questions were considered based on how they informed the objectives of the study. to simplify analysis, it was important to compile as comprehensive a list of content management systems: trends in academic libraries | connell 45 cmss as possible. this list was created by pulling cms names from the literature review, the web4lib discussion list, and the cmsmatrix website (www.cmsmatrix.org). 
in order to select institutions for distribution, the 2010 carnegie classification of institutions of higher education basic classification lists were used.13 the author chose to focus on three broad classifications: 1. research institutions consisting of the following carnegie basic classifications: research universities (very high research activity), research universities (high research activity), and dru: doctoral/research universities. 2. master’s institutions consisting of the following carnegie basic classifications: master's colleges and universities (larger programs), master's colleges and universities (medium programs), master's colleges and universities (smaller programs). 3. baccalaureate institutions consisting of the following carnegie basic classifications: baccalaureate colleges—arts & sciences and baccalaureate colleges—diverse fields. the basic classification lists were downloaded into excel with each of the three categories in a different worksheet, and then each institution was assigned a number using the random number generator feature within excel. the institutions were then sorted by those numbers creating a randomly ordered list within each classification. to determine sample size for a stratified random sampling, ronald powell’s “table for determining sample size from a given population” 14 (with a .05 degree of accuracy) was used. each classification’s population was considered separately, and the appropriate sample size chosen from the table. the population size of each of the groups (total number of institutions within that carnegie classification) and the corresponding sample sizes were • research: population = 297, sample size = 165; • master’s: population = 727, sample size = 248; • baccalaureate: population = 662, sample size = 242. the total number of institutions included in the sample size was 655. the author then went through the list of selected institutions and searched online to find their library webpages and find the person most likely responsible for the library’s website. during this process, there were some institutions, mostly for-profits, for which a library website could not be found. when this occurred, that institution was eliminated and the next institution on the list used in its place. in some cases, the person responsible for web content was not easily identifiable; in these cases an educated guess was made when possible, or else the director or a general library email address was used. the survey was made available online and distributed via e-mail to the 655 recipients on october 1, 2012. reminders were sent on october 10 and october 18, and the survey was closed on october 26, 2012. out of 655 recipients, 286 responses were received. some of those responses http://www.cmsmatrix.org/ information technology and libraries | june 2013 46 had to be eliminated for various reasons. if two responses were received from one institution, the more complete response was used while the other response was discarded. some responses included only an answer to the first question (name of institution or declination of that question to answer demographic questions) and no other responses; these were also eliminated. once the invalid responses were removed, 265 remained, for a 40 percent response rate. before conducting an analysis of the data, some cleanup and standardization of results was required. for example, a handful of respondents indicated they used a cms and then indicated that their cms was dreamweaver or adobe contribute. 
these responses were recoded as non-cms responses. likewise, one respondent self-identified as a non-cms user but then listed drupal as his/her web management tool and this was recoded as a cms response. demographic profile of respondents for the purposes of gathering demographic data, respondents were offered two options. they could provide their institution’s name, which would be used solely to pair their responses with the appropriate carnegie demographic categories (not to identify them or their institution), or they could choose to answer a separate set of questions regarding their size, public/private affiliation, and basic carnegie classification. the basic carnegie classification of the largest response group was master’s with 102 responses (38 percent); then baccalaureate institutions (94 responses or 35 percent), and then research institutions (69 responses or 26 percent). this correlates pretty closely with the distribution percentages, which were 38 percent master’s (248 out of 655), 37 percent baccalaureate (242 out of 655), and 25percent research (165 out of 655). of the 265 responses, 95 (36 percent) came from academic librarians representing public institutions and 170 (64 percent) from private. of the private institutions, the vast majority (166 responses or 98 percent) were not-for-profit, while 4 (2 percent) were for-profits. to define size, the carnegie size and setting classification was used. very small institutions are defined as less than 1,000 full-time equivalent (fte) enrollment, small is 1,000–2,999 fte, medium is 3,000–9,999 fte, and large is at least 10,000 fte. the largest group of responses came from small institutions (105 responses or 40 percent), then medium (67 responses or 25 percent), large (60 responses or 23 percent), and very small (33 responses or 12 percent). results the first question asking for institutional identification (or alternative routing to carnegie classification questions) was the only question for which an answer was required. in addition, because of question logic, some people saw questions that others did not based on how they answered previous questions. thus, the number of responses varies for each question. one of the objectives of this study was to identify if there were traits among institutional characteristics and cms selection and management. the results that follow include both content management systems: trends in academic libraries | connell 47 descriptive statistics and statistically significant inferential statistics discovered using chi-square and fisher’s exact tests. statistically significant results are labeled as such. the responses to this survey show that most academic libraries are using a cms to manage their main library website (169out of 265 responses or 64 percent). overall, cms users expressed similar (although slightly greater) satisfaction levels with their method of web management (see table 1.) table 1 satisfaction by cms use use a cms to manage library website yes no user is highly satisfied or satisfied yes 79 responses or 54% 41 responses or 47% no 68 responses or 46% 46 responses or 53% total 147 responses or 100% 87 responses or 100% non-cms users non-cms users were asked what software or system they use to govern their site. by far, the most popular system mentioned among the 82 responses was adobe dreamweaver, with 24 (29 percent) users listing it as their only or primary system. 
some people listed dreamweaver as part of a list of tools used; for example “php / mysql, integrated development environments (php storm, coda), dreamweaver, etc.,” and if all mentions of dreamweaver are included, the number of users rises to 31 (38 percent). some version of “hand coded” was the second most popular answer with 9 responses (11 percent), followed by adobe contribute with 7 (9 percent). many of the “other” responses were hard to classify and were excluded from analysis. some examples include: • ftp to the web • voyager public web browser ezproxy • excel, e-mail, file folders on shared drives among the top three non-cms web management systems, dreamweaver users were most satisfied, selecting highly satisfied or satisfied in 15 out of 24 (63 percent) cases. hand coders were highly satisfied or satisfied in 5out of 9 of cases (56 percent), and adobe contribute users were only highly satisfied or satisfied in 3 out of 7 (43 percent) cases. respondents not using a cms were asked whether they were considering a move to a cms within the next two years. most (59 percent) said yes. research libraries were much more likely to be planning such a move (81percent) than master’s (50 percent) or baccalaureate (45 percent) libraries (see table 2.) a chi-square test rejects the null hypothesis that the consideration of a move to cms is independent of basic carnegie classification; this difference was significant at the p = 0.038 level. information technology and libraries | june 2013 48 table 2 non-cms users considering a move to a cms within the next two years by carnegie classification* baccalaureate master’s research total no 11 responses or 55% 11 responses or 50% 4 responses or 19% 26 responses or 41% yes 9 responses or 45% 11 responses or 50% 17 responses or 81% 37 responses or 59% total 20 responses or 100% 22 responses or 100% 21 responses or 100% 63 responses or 100% chi-square=6.526, df=2, p=.038 *excludes “not sure” responses non-cms users were asked to provide comments related to topics covered in the survey, and here is a sampling of responses received: • cmss cost money that our college cannot count on being available on a yearly basis. • the library doesn't have overall responsibility for the website. university web services manages the entire site, i submit changes to them for inclusion and updates. • we are so small that the time to learn and implement a cms hardly seems worth it. so far this low-tech method has worked for us. • the main university site was moved to a cms in 2008. the library was not included in that move because of the number of pages. i hear rumors that we will be forced into the cms that is under consideration for adoption now. the library has had zero input in the selection of the new cms. cms users when respondents indicated their library used a cms, they were routed to a series of cms related questions. the first question asked which cms their library was using. of the 153 responses, the most popular cmss were drupal (40); wordpress (15); libguides (14), which was defined within the survey as a cms “for main library website, not just for guides”; cascade server (12); ektron (6); and modx and plone (5 each). these users were also asked about their overall satisfaction with their systems. among the top four cmss, libguides users were the most satisfied, selecting highly satisfied or satisfied in 12 out of 12 (100 percent) cases. 
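as a side note on the statistics reported above, the chi-square result for table 2 can be reproduced directly from the printed cell counts. the short sketch below is illustrative only (python with scipy, which is not the software used in the study); the counts are copied from table 2, and the computation recovers the reported chi-square of 6.526 with 2 degrees of freedom and p ≈ .038.

from scipy.stats import chi2_contingency

# observed counts from table 2: rows are "no" and "yes" (non-cms users considering
# a move to a cms within two years); columns are baccalaureate, master's, research.
observed = [
    [11, 11, 4],   # no
    [9, 11, 17],   # yes
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")
# prints: chi-square = 6.526, df = 2, p = 0.038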
the remaining three systems’ satisfaction ratings (highly satisfied or satisfied) were as follows: wordpress (12out of 15 cases or 80 percent), drupal (26out of 38 cases or 68 percent), and cascade server (3 out of 11 cases or 27 percent). when asked whether they would switch systems if given the opportunity, most (61out of 109 cases or 56 percent) said no. looking at the responses for the top four cmss, responses echo the content management systems: trends in academic libraries | connell 49 satisfaction responses. libguides users were least likely to want to switch (0 out of 7 cases or 0 percent), followed by wordpress (1 out of 5 cases or 17 percent), drupal (8out of 23 cases or 26 percent), and cascade server (3 out of 7 or 43 percent) users. respondents were asked whether their library uses the same cms as their parent institution. most (106 out of 169 cases or 63 percent) said yes. libraries at large institutions (over 10,000 fte) were much less likely (34 percent) than their smaller counterparts to share a cms with their parent institution (see table 3.) a chi-square test rejects the null hypothesis that sharing a cms with a parent institution is independent of size: at a significance level of p = 0.001, libraries at smaller institutions are more likely to share a cms with their parent. table 3 cms users whose libraries use the same cms as their parent institution by size large medium small very small total no 23 responses (66%) 15 responses (33%) 19 responses (27%) 6 responses (35%) 63 responses (37%) yes 12 responses (34%) 31 responses (67%) 52 responses (73%) 11 responses (65%) 106 responses (63%) total 35 responses (100%) 46 responses (100%) 71 responses (100%) 17 responses (100%) 169 responses (100%) chi-square=15.921, df=3, p=.001 not surprisingly, a similar correlation holds true for comparing shared cmss and simplified basic carnegie classification. baccalaureate and master’s libraries were more likely to share cmss with their institutions (69 percent and 71 percent respectively) than research libraries (42 percent) (see table 4.) at a significance level of p = 0.004, a chi-square test rejects the null hypothesis that sharing a cms with a parent institution is independent of basic carnegie classification. table 4 cms users whose libraries use the same cms as their parent institution, by carnegie classification baccalaureate master’s research total no 19 responses (31%) 18 responses (29%) 26 responses (58%) 63 responses (37%) yes 43 responses (69%) 44 responses (71%) 19 responses (42%) 106 responses (63%) total 62 responses (100%) 62 responses (100%) 45 responses (100%) 169 responses (100%) chi-square = 11.057, df = 2, p = .004 information technology and libraries | june 2013 50 when participants responded that their library shared a cms with the parent institution, they were asked a follow up question about whether the library made the transition with the parent institution. most (80 out of 99 cases or 81 percent) said yes, the transition was made together. however, private institutions were more likely to have made the switch together (88 percent) than public (63 percent) (see table 5.) a fisher’s exact test rejects the null hypothesis that transition to cms is independent of institutional control: at a significance level of p = 0.010, private institutions are more likely than public to move to a cms in concert. 
table 5 users whose libraries and parent institutions use the same cms: transition by public/private control* private public total switched independently 9 responses (13%) 10 responses (37%) 19 responses (19%) switched together 63 responses (88%) 17 responses or (63%) 80 responses (81%) total 72 responses (101%)** 27 responses (100%) 99 responses (100%) fisher’s exact test: p = .010 * excludes responses where people indicated “other” ** due to rounding, total is greater than 100% similarly, a relationship existed between transition to cms and basic carnegie classification. baccalaureate institutions (93 percent) were more likely than master’s (80 percent), which were more likely than research institutions (53 percent) to make the transition together (see table 6.) a chi-square test rejects the null hypothesis that the transition to cms is independent of basic carnegie classification: at a significance level of p = 0.002, higher degree granting institutions are less likely to make the transition together. table 6 users whose libraries and parent institutions use the same cms: transition by carnegie classification* baccalaureate master’s research total switched independently 3 responses (7%) 8 responses (21%) 8 responses (47%) 19 responses (19%) switched together 40 responses (93%) 31 responses (80%) 9 responses (53%) 80 responses (81%) total 43 responses (100%) 39 responses (101%)** 17 responses (100%) 99 responses (100%) chi-square = 12.693, df = 2, p = .002 *excludes responses where people indicated “other” **due to rounding, total is greater than 100% content management systems: trends in academic libraries | connell 51 this study indicates that for libraries that transitioned to a cms with their parent institution, the transition was usually forced. out of the 88 libraries that transitioned together and indicated whether they were given a choice, only 8 libraries (9 percent) had a say in whether to make that transition. and even though academic libraries were usually forced to transition with their institution, they did not usually have representation on campus-wide cms selection committees. only 25 percent (22 out of 87) respondents indicated that their library had a seat at the table during cms selection. when comparing cms satisfaction ratings among libraries that were represented on cms selection committees versus those that had no representation, it is not surprising that those with representation were more satisfied (13 out of 22 cases or 59 percent) than those without (21 out of 59 cases or 36 percent). the same holds true for those libraries given a choice whether to transition. those given a choice were satisfied more often (6out of 8 cases or 75 percent) than those forced to transition (21 out of 71 cases or 30 percent). respondents who said that they were not on the same cms as their institution were asked why they chose a different system. many of the responses indicated a desire for freedom from the controlling influence of either it and marketing arms of the institution : • we felt drupal offered more flexibility for our needs than cascade, which is what the university at large was using. i've heard more recently that the university may be considering switching to drupal. • university pr controls all aspects of the university cms. we want more freedom. • we are a service-oriented organization, as opposed to a marketing arm. we by necessity need to be different. 
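the fisher's exact result reported for table 5 can be checked in the same way; the sketch below is again illustrative only (python with scipy, not the software used in the study), with the cell counts copied from table 5. the article reports p = .010 for this comparison.

from scipy.stats import fisher_exact

# observed counts from table 5: rows are private and public institutions;
# columns are "switched independently" and "switched together".
table5 = [
    [9, 63],    # private
    [10, 17],   # public
]

odds_ratio, p_value = fisher_exact(table5, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.3f}, two-sided p = {p_value:.3f}")
# the article reports p = .010 for table 5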
cms users were asked to provide a list of three factors most important in their selection of their cms and to rank their list in order of importance. the author standardized the responses, e.g., “price” was recorded as “cost.” the factors listed first, in order of frequency, were ease of use (15), flexibility (10), and cost (6). ignoring the ranking, 38 respondents listed ease of use somewhere in their “top three,” while 23 listed cost, and 16 listed flexibility. another objective of this study was to determine if there was a positive correlation between libraries with their own dedicated it staff and those who chose open source cmss. therefore, cms users were asked if their library had its own dedicated it staff, and 66 out of 143 libraries (46 percent) said yes. then the cmss used by respondents were translated into two categories, open source or proprietary systems (when a cms listed was unknown it was coded as a missing value), and a fisher’s exact test was run against all cases that had values for both variables to see if a correlation existed. although those with library it had open source systems more frequently than those without, the difference was not significant (see table 7).

table 7. libraries with their own it personnel, by open source cms (number of responses, with column percentages)
                          library has own it   no own it    total
cms is open source        37 (73%)             32 (57%)     69 (65%)
cms is not open source    14 (28%)             24 (43%)     38 (36%)
total                     51 (101%)*           56 (100%)    107 (101%)*
fisher’s exact test: p = .109
* due to rounding, total is greater than 100%

in another question, people were asked to self-identify if their organization uses an open source cms, and if so were asked whether they have outsourced any of its implementation or design to an outside vendor. most (61 out of 77 cases or 79 percent) said they had not outsourced implementation or design. one person commented, “no, i don’t recommend doing this. the cost is great, you lose the expertise once the consultant leaves, and the maintenance cost goes through the roof. hire someone fulltime or move a current position to be the keeper of the system.” one of the advantages of having a cms is the ability to give multiple people, regardless of their web authoring skills, the opportunity to edit webpages. therefore, cms users were asked how many web content creators they have within their library. out of 152 responses, the most frequent range cited was 2–5 authors (72 responses or 47 percent), followed by a single author (33 responses or 22 percent), 6–10 authors (20 responses or 13 percent), 21–50 authors (16 responses or 11 percent), 11–20 authors (6 responses or 4 percent), and over 50 authors (5 responses or 3 percent). because this question was an open-ended response and responses varied greatly, including “over 100 (over 20 are regular contributors)” and “1–3,” standardization was required. when a range or multiple numbers were provided, the largest number was used. respondents were asked whether their library uses a workflow management process requiring page authors to receive approval before publishing content. of the 131 people who responded yes or no, most (88 responses or 67 percent) said no. cms users were asked to provide comments related to topics covered in the survey. many comments mentioned issues of control (or lack thereof), while another common theme was concerns with specific cmss. here is a sampling of responses received:
• having dedicated staff is a necessity.
there was a time when these tools could be installed and used by a techie generalist. those days are over. a professional content person and a professional cms person are a must if you want your site to look like a professional site... i’m shocked at how many libraries switched to a cms yet still have a site that looks and feels like it was created 10 years ago.
• since the cms was bred in-house by another university department, we do not have control over changing the design or layout. the last time i requested a change, they wanted to charge us.
• our university marketing department, which includes the web team, is currently in the process of switching [cmss]. we were not invited to be involved in the selection process for a new cms, although they did receive my unsolicited advice.
• we compared costs for open source and licensed systems, and we found the costs to be approximately equivalent based on the development work we would have needed in an open source environment.
• the library was not part of the original selection process for the campus’ first cms because my position didn’t exist at that time. now that we have a dedicated web services position, the library is considered a “power user” in the cms and we are often part of the campus wide discussions about the new cms and strategic planning involving the campus website.
• we currently do not have the preferred level of control over our library website; we fought for customization rights for our front page, and won on that front. however, departments on campus do not have permission to install or configure modules, which we hope will change in the future.
• there’s a huge disconnect between it/administration and the library regarding unique needs of the library in the context of web-based delivery of information.

discussion

comparing the results of this study to previous studies indicates that cms usage within academic libraries is rising. the 64 percent cms adoption rate found in this survey, which used a narrower definition of cms than some previous studies cited in the literature review, is higher than the adoption rates in any of those studies. as more libraries make the transition, it is important to know how different cmss have been received among their peers. although cms users are slightly more satisfied than non-cms users (54 percent vs. 47 percent), the tools used matter. so if a library using dreamweaver to manage its site is given the option of moving with its institution to a cms and that cms is cascade server, it should strongly consider sticking with its current non-cms method, based on the respective satisfaction levels reported in this study (63 percent vs. 27 percent). satisfaction levels are important, but should not be considered in a vacuum. for example, although libguides users reported very high satisfaction levels (100 percent were satisfied or very satisfied), those users were mostly (11 out of 14 users or 79 percent) from small or very small schools, while the remaining three (21 percent) were medium schools. no large schools reported using libguides as their cms. libguides may be wonderful for a smaller school without need of much customization or, in some cases, without access to technical expertise, but may not be a good cms solution for larger institutions. one of the largest issues raised by survey respondents was libraries’ control, or lack thereof, when moving to a campus-selected cms.
given the complexity of academic libraries websites, library representation on campus-wide cms selection committees is warranted. not only are libraries more satisfied with the results when given a say in the selection, but libraries have special needs when it comes to website design that other campus units do not. including library representation ensures those needs are met. some of the respondents’ comments regarding lack of control over their sites are disturbing to libraries being forced or considering a move to a campus cms. clearly, having to pay another campus department to make changes to the library site is not an attractive option for most libraries. nor should libraries have to fight for the right or ability to customize their home pages. developing good working relationships with the decision makers may help prevent some of these problems, but likely not all. this study indicates that it is not uncommon for academic libraries to be forced into cmss, regardless of the cmss acceptability to the library environment. conclusion the adoption of cmss to manage academic libraries’ websites is increasing, but not all cmss are created equal. when given input into switching website management tools, library staff have many factors to take into consideration. these include, but are not limited to, in-house technical expertise, desirability of open source solutions, satisfaction of peer libraries with considered systems, and library specific needs, such as workflow management and customization requirements. ideally, libraries would always be partners at the table when campus-wide cms decisions are being made, but this study shows that this does not happen in most cases. if a library suspects that it is likely to be required to move to a campus-selected system, its staff should be alert for news of impending changes so that they can work to be involved at the beginning of the process to be able to provide input. a transition to a bad cms can have long-term negative effects on the library, its users, and staff. a library’s website is its virtual “branch” and vitally important to the functioning of the library. the management of such an important component of the library should not be left to chance. references 1. doug goans, guy leach, and teri m. vogel, “beyond html: developing and re-imagining library web guides in a content management system,” library hi tech 24, no. 1 (2006): 29–53, doi:10.1108/07378830610652095. 2. ruth sara connell, “survey of web developers in academic libraries,” the journal of academic librarianship 34, no. 2 (march 2008): 121–129, doi:10.1016/j.acalib.2007.12.005. http://dx.doi.org/10.1016/j.acalib.2007.12.005 content management systems: trends in academic libraries | connell 55 3. maira bundza, patricia fravel vander meer, and maria a. perez-stable, “work of the web weavers: web development in academic libraries,” journal of web librarianship 3, no. 3 (july 2009): 239–62. 4. david comeaux and axel schmetzke, “accessibility of academic library web sites in north america—current status and trends (2002–2012).” library hi tech 31, no. 1 (january 28, 2013): 2. 5. daniel verbit and vickie l. kline, “libguides: a cms for busy librarians,” computers in libraries 31, no. 6 (july 2011): 21–25. 6. amy york, holly hebert, and j. michael lindsay, “transforming the library website: you and the it crowd,” tennessee libraries 62, no. 3 (2012). 7. bundza, vender meer, and perez-stable, “work of the web weavers: web development in academic libraries.” 8. 
tom kmetz and ray bailey, “migrating a library’s web site to a commercial cms within a campus-wide implementation,” library hi tech 24, no. 1 (2006): 102–14, doi:10.1108/07378830610652130. 9. kimberley stephenson, “sharing control, embracing collaboration: cross-campus partnerships for library website design and management,” journal of electronic resources librarianship 24, no. 2 (april 2012): 91–100. 10. ibid. 11. elizabeth l. black, “selecting a web content management system for an academic library website,” information technology & libraries 30, no. 4 (december 2011): 185–89; andy austin and christopher harris, “welcome to a new paradigm,” library technology reports 44, no. 4 (june 2008): 5–7; holly yu , “chapter 1: library web content management: needs and challenges,” in content and workflow management for library web sites: case studies, ed. holly yu (hersey, pa: information science publishing, 2005), 1–21; wayne powel and chris gill, “web content management systems in higher education,” educause quarterly 26, no. 2 (2003): 43– 50; goans, leach, and vogel, “beyond html.” 12. kmetz and bailey, “migrating a library’s web site.” 13. carnegie foundation for the advancement of teaching, 2010 classification of institutions of higher education, accessed february 4, 2013, http://classifications.carnegiefoundation.org/descriptions/basic.php. 14. ronald r. powell , basic research methods for librarians (greenwood, 1997). http://classifications.carnegiefoundation.org/descriptions/basic.php j costs of library catalog cards produced by computer 121 frederick g. kilgour: ohio college library center, columbus, ohio production costs of 79,831 cards are analyzed. cards were produced by four variants of the columbia-harvard-yale procedure employing an ibm 870 document writer and an ibm 1401 computer. costs per card ranged from 8.8 to 9.8 cents for completed cards. . early in september, 1964, the yale medical library.put into routine operation the columbia-harvard-yale computerized technique for catalog card manufacture ( 1), and during the following three · years yale produced over 87,000 cards. the principal objective of the chy project was an on-line, computerized, bibliographic information retrieval system. however, the route selected for attaining the objective included manufacture of cards from machine readable data to keep up the manual catalog while machine readable records were being inexpensively accumulated for computerized subject retrieval. catalog cards were only one product of the system, but their production was designed to be as efficient as possible within constraints of the system. nevertheless, this paper will examine chy card production costs as though this segment of the system were an isolated procedure, yielding but one product, as is the case in classical library procedures. costing will disregard other benefits, such as accession lists and machine readable data produced for little, or no, additional expense. the columbia medical library and harvard medical library also installed ibm 870 document writers and tested the programs for card production, but neither library routinely produced cards. however, co122 journal of library automation vol. 1/ 2 june, 1968 lumbia produced its acquisitions lists until october, 1966, using chy techniques. harvard issued a similar list, but for a shorter period of time, and it was harvard's withdrawal early in 1966 that brought about the collapse of the project. 
nevertheless, other institutions adopted the chy procedure for catalog card production, among them the medical library at the university of rochester, which used the programs for two years following february, 1966. e. r. squibb & sons at east brunswick, new jersey, also uses the programs. at the university of kentucky an 870 document writer types catalog cards, but new programs were written to run on an ibm 7040 computer that recently have been recoded in cobol for an ibm 360/50. similarly, the library at philip morris, inc., richmond, virginia, rewrote the programs to run on an ibm 1620 computer which punches cards that drive an 870. the korean social science bibliography project of the human relations area files has elaborated the chy technique into its automated bibliographic system ( 2), which in turn is the base for another bibliographic system for mrican studies. the machine readable cataloging record of the chy mechanized system eventually became the great-grandfather of the marc ii format and contributed about as much to marc ii as would have been the case had their relationship been truly biological. although the columbia-harvard-yale project never did develop and activate its proposed bibliographic information retrieval system, r. k. summit working entirely independently has brought into successful operation his excellent dialog system ( 3) which is essentially the system that chy had in design stage. moreover, summit's system is definitely superior because it has several useful functions not contemplated in chy. nearly all reports on catalog card production limit study of costs to reproduction of cards and neglect other costs involved in preparing cards for the catalog. an exception is p. j. fasana's 1963 investigation wherein he found that library of congress cards, in seven copies and ready to be filed into a catalog, cost 16.6 cents per card; cards produced by a machine method consisting of a tape typewriter and a very small special purpose computer cost 9.9 cents ( 4). fasana used an hourly salary rate of $2.00. a study of early experience with chy production yielded 12.5 cents per card ( 1) whereas the present study shows that costs range between 8.8 and 9.8 cents per card, cards being ·in completed form, arranged in packs for individual catalogs, and ready for bursting before alphabetizing for filing. methods · during the course of the three years in which the chy programs were in operation, four variant techniques were used for card production. the first three with their limitations have been described · elsewhere ( 5). briefly, the initial system consisted of keypunching from worksheets, _listing the punch cards on an ibm 870 document writer, proofreading and costs of library catalog cards/ kilgour 123 correcting, processing the proofread and corrected punch cards on an ibm 1401 computer which produced punch card output that, in tum, was used to drive the 870 document writer for production of catalog cards on oneup forms. in the next arrangement, printing of cards on one-up forms was accomplished on an ibm 1401 computer driving an upperand lowercase print chain. in the third procedure, a two-up card form replaced the one-up form. finally, the medical library returned the 870 document writer to the manufacturer, and the 1401 was programmed to do the prooflisting in upper and lower case. the yale bibliographic system (6) replaced the chy routines on 25 july 1967. 
the keypuncher kept time records for the various activities listed in table 1 throughout the period of this study. during the first two months of operation, design for recording data was inadequate. subsequently an individual would, albeit infrequently, fail to record time elapsed, so that production of 7,630 cards was omitted from the study, leaving a total of 79,831 to be included. on several occasions during the fourth part of the study, the second proofreading was suspended, and only correction carried out. hence, time expended in this category is less than in the previous three periods. at first an ibm 1401 computer in the yale computer center was used, the center being located about a mile from the medical library. subsequently, another 1401 modified to drive an upper- and lower-case print chain and located in the medical school was employed. later this machine was transferred to the administrative data systems computer center, which moved to a new location not long after it assumed operation of the 1401. still later, the 1401 was again transferred, this time to the yale computer center. as can be seen from the computer charges in table 1, these wanderings about new haven appear to have had no effect on operating efficiency. time recorded for each computer run was actual time clocked by the operator. other times were recorded by the individual performing the operation. salaries used in the cost calculation were salaries being paid in june, 1967, which were, of course, appreciably higher than those in the autumn of 1964; hourly rate for the first proofreader in table 1 was $2.62 and for the second $2.21. hourly rental for the 870 document writer was $.78. rate of computer charges employed in the calculation was $20 per hour, a rate that had existed during the last year or so during which data was collected. initially, computer charges had been $75 an hour, but they dropped precipitously during the first two years. costs for catalog card stock were the lowest cost charged for the two types of forms. since these forms were not standard items during the years of the study, their prices varied considerably depending upon the amount ordered.

results

table 1 contains cost figures for catalog card production by the four variant techniques. since salaries and computer charges can vary widely, particularly among countries, time per card produced is also included in the table to facilitate comparison with other systems.

table 1. per-card costs of computer-produced catalog cards. variant 1: one-up form on 870, proof on 870; variant 2: one-up form on 1401, proof on 870; variant 3: two-up form on 1401, proof on 870; variant 4: two-up form on 1401, proof on 1401. each pair of columns gives dollars and hours per card.
                               variant 1        variant 2        variant 3        variant 4
                               $       hrs      $       hrs      $       hrs      $       hrs
keypunching                    .0219   .0099    .0218   .0099    .0222   .0101    .0235   .0106
keypunch                       .0029   .0099    .0030   .0099    .0030   .0101    .0032   .0106
ibm 870-proof                  .0033   .0043    .0036   .0046    .0039   .0051
ibm 1401-proof                                                                    .0091   .0046
proofreaders (2):
  proofreading                 .0115   .0044    .0113   .0043    .0118   .0045    .0116   .0044
  proofreading and correcting  .0120   .0055    .0122   .0055    .0119   .0054    .0091   .0041
ibm 1401                       .0149   .0085    .0313   .0156    .0231   .0116    .0245   .0112
ibm 870-card typing            .0104
card stock                     .0149            .0149            .0125            .0125
total                          .0918            .0981            .0884            .0935
number of cards                15,149           9,343            27,210           28,129
number of titles               1,655            990              2,920            3,130
cards per title                9.2              9.4              9.3              9.0

of course, amounts of time calculated by dividing elapsed time by amount of product are not directly comparable with results of time and motion studies such as henry voos’ helpful study (7). however, two different methods of comparing the input costs in table 1 with those johnson (8) published for the stanford book catalog gave divergences of only 2 and 6 per cent. source of the increase in costs of six-tenths of a cent from the first procedure to the second is entirely the increase in computer charges when the 1401 replaced the 870 to print cards. when the two-up form was employed on the computer in variant three, charges then dropped to less than the combined 1401 and 870 costs in the first procedure. costs rose again in procedure four. here the principal cause of the increase was the substitution of computer-produced proof listings after the 870 document writer had been returned to the manufacturer. although there is no reason to think that preparation of cataloging copy on a worksheet is either more or less expensive than older techniques, coding a worksheet constitutes additional work for which there is no equivalent in classical procedures. coding costs were examined between 9 march and 11 may 1965, when six individuals, ranging from professional catalogers to a student assistant, recorded time required to code 725 worksheets. time per final catalog card produced was three seconds; in other words, $.003 for a cataloger receiving $7500 a year, or $.001 for a student assistant earning $1.50 an hour. if total coding cost, rather than a portion of it, were to be charged to card production, costs reported in table 1 could rise one- to three-tenths of a cent.

discussion

the accurate comparison of costs would be with those of systems similar to the chy system that produce more than one product. for instance, the chy system also produced monthly accession lists from the same punch-card decklets that produced catalog cards. the accession list was produced mechanically at a cost far less than that for the previous manual preparation. the decklets also constituted machine readable information available for other purposes, most of which have not yet been realized. system costing would assign only a portion of keypunching and proofreading costs to card production. another saving was the appreciable shortening of time required for catalog cards to appear in the catalog. in procedures one through three, usually three or four days elapsed from the day on which the cataloger completed cataloging to the day on which cards were filed into the catalog. however, in procedure four, the computer, which was then a mile distant from the medical library, was used on two separate occasions for each batch of decklets, so that elapsed time rose to at least a week. even though other benefits are not reflected in comparative costs, it is clear from fasana’s findings that the chy computer-produced cards cost far less than do lc cards, and have a similar cost to those produced mechanically on which fasana reported.
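the totals in table 1 can be cross-checked by summing the per-card components, and the worksheet-coding figures quoted above follow from simple rate arithmetic. a minimal python sketch of both checks (the language, and the assumption of roughly 2,080 paid hours in a work year, are mine rather than the article’s):

```python
# per-card component costs in dollars for the four chy variants (taken from table 1)
components = {
    "variant 1 (one-up on 870, proof on 870)":   [.0219, .0029, .0033, .0115, .0120, .0149, .0104, .0149],
    "variant 2 (one-up on 1401, proof on 870)":  [.0218, .0030, .0036, .0113, .0122, .0313, .0149],
    "variant 3 (two-up on 1401, proof on 870)":  [.0222, .0030, .0039, .0118, .0119, .0231, .0125],
    "variant 4 (two-up on 1401, proof on 1401)": [.0235, .0032, .0091, .0116, .0091, .0245, .0125],
}
for variant, costs in components.items():
    # totals reported in table 1: .0918, .0981, .0884, .0935
    print(f"{variant}: {sum(costs):.4f}")

# worksheet coding: three seconds of coding per finished catalog card
cataloger_hourly = 7500 / 2080          # assumed ~2,080 paid hours per year
student_hourly = 1.50
print(round(3 / 3600 * cataloger_hourly, 3))   # about $.003 per card
print(round(3 / 3600 * student_hourly, 3))     # about $.001 per card
```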
although there appears to be no published evidence that photocopying techniques can produce finished catalog cards at less expense than 9 cents, it is possible that some photoreproduced cards may be less expensive than those described in this article. however, it must be pointed out that photo-reproduced cards are products . of single-product procedures, whereas the chy cards are one of several system products. increase in cost between procedure three and procedure four was due to increase in cost of prooflisting in upper and lower case on the 1401 computer as compared to prooflisting on the 870 document writer. this cost increase was not detected until calculations were done for this investigation, and therein lies a moral. it was the policy at the yale library for all programming to be done by library programmers, since various inefficiences, and indeed catastrophes, had occasionally been observed when non-library personnel had prepared programs for library operations. the single exception to this policy was the proof program, which this investigation reveals used an exhorbitant amount of time-one-third of that required for subsequent card production. since it had been felt that writing and coding a prooflisting program. was perfectly straightfmward, an outside programmer of recognized ability was employed to write and code the program. because the program was simple, and because the programmer had high competence, efficiency of the program was never checked as it should have been. this episode raises the question that if even the wary can be trapped, how can the tmwary avoid pitfalls? there is no satisfactory answer, but it would appear that some difficulties could be avoided by review of new programs by experienced library programmers, of which there are unfortunately far too few. comparison with data such as that in table 1 will also be helpful, but not definitive, in evaluating new programs. of course, when widely used library computer programs of recognized efficiency are generally available, magnitude of the pitfalls will have been greatly reduced. concl"qsion computer-produced catalog cards, even when they are but one of several system products, can be prepared in finished form for a local catalog less expensively and with less delay than can library of congress printed cards. computer card production at 8.8 to 9.8 cents per completed card appears to be competitive with other procedures for preparing catalog cards. however, undetected inefficiency in a minor program increased costs, thereby emphasizing need to insure efficiency in programs used routinely. costs of library catalog cards/ kilgour 127 acknowledgements the author is most grateful to mrs. sarah boyd, keypuncher extraordinary, who maintained the record of the data used in this study. national science foundation grant no. 179 supported the chy project in part. references 1. kilgour, frederick g.: "mechanization of cataloging procedures," bulletin of the medical library association, 53 (aprill965), 152-162. 2. koh, hesung c.: "a social science bibliographic system; computer adaptations," the american behavioral scientist, 10 (jan. 1967), 2-5. 3. summit, roger k.: "dialog; an operational on-line reference retrieval system," association for computing machinery, proceedings of 22nd national conference, (1967), 51-56. 4. fasana, p.j.: "automating cataloging functions in conventional libraries," library resources & technical services, 7 ( fall1963), 350-365. 5. 
kilgour, frederick g.: "library catalogue production on small computers," american documentation, 17 (july 1966), 124-131. 6. weisbrod, david l.: "an integrated, computerized, bibliographic system for libraries," (in press). 7. voos, henry: standard times for certain clerical activities in technical processing (ann arbor, university microfilms, 1965). 8. johnson, richard d.: "a book catalog at stanford~" journal of library automation, 1 (march 1968), 13-50. ----------------------editor’s comments bob gerrity information technology and libraries | december 2012 1 past and present converge with the december 2012 issue of information technology and libraries (ital), as we also publish online the first volume of ital’s predecessor, the journal of library automation (jola), originally published in print in 1968. the first volume of jola offers a fascinating glimpse into early days of library automation, when many things were different, such as the size (big) and capacity (small) of computer hardware, and many things were the same (e.g., richard johnson’s description of the book catalog project at stanford, where “the major achievement of the preliminary systems design was to establish a meaningful dialogue between the librarian and systems and computer personnel.” plus ça change, plus c'est la meme. there are articles by luminaries in the field: richard de gennaro describes approaches to developing an automation program in a large research library, frederick kilgour, from the ohio bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia. http://ejournals.bc.edu/ojs/index.php/ital/issue/view/312 editor’s comments bob gerrity editor’s comments | gerrity 2 college library center (now oclc), analyzes catalog-card production costs at columbia, harvard, and yale in the mid 1960s (8.8 to 9.8 cents per completed card), and henriette avram from the library of congress describes the successful use of the cobol programming language to manipulate marc ii records. the december 2012 issue marks the completion of ital’s first year as an e-only, open-access publication. while we don’t have readership statistics for the previous print journal to compare with, download statistics for the e-version appear healthy, with more than 30,000 full-text article downloads for 2012 content so far this year, plus more than 10,000 downloads for content from previous years. based on the download statistics, the topics of most interest to today’s ital readers are discovery systems, web-based research guides, digital preservation, and digital copyright. this month’s issue takes some of these themes further, with articles that examine the usability of autocompletion features in library search interfaces (ward, hahn, and feist), reveal patterns of student use of library computers (thompson), propose a cloud-based digital library storage solution (sosa-sosa), and summarize attributes of open standard file formats (park, oh). happy reading. 20 information technology and libraries | june 2008 an assessment of student satisfaction with a circulating laptop service louise feldmann, lindsey wess, and tom moothart since may of 2000, colorado state university’s (csu) morgan library has provided a laptop computer lending service. in five years the service had expanded from 20 to 172 laptops. although the service was deemed a success, users complained about slow laptop startups, lost data, and lost wireless connections. 
in the fall of 2005, the program was formally assessed using a customer satisfaction survey. this paper discusses the results of the survey and changes made to the service based on user feedback. colorado state university (csu) is a land-grant insti-tution located in fort collins, colorado. the csu libraries consist of the morgan library, the main library on the central campus; the veterinary teaching branch hospital library at the veterinary hospital campus; and the atmospheric branch library at the foothills campus. in 1997, morgan library completed a major renovation and expansion which provided a designated space for public desktop computers in an information commons environment. the library called this space the electronic information center (eic). due to the popularity of the eic ,and with the intent of expanding computer access without expanding the computer lab, library staff began to explore the implementation of a laptop checkout service in 2000. library staff used heather lyle’s (1999) article “circulating laptop computers at west virginia university” as a guide in planning the service. development funds were used to purchase twenty laptop computers, and the 3com corporation donated fifteen wireless network access points. the laptops were to be used in morgan library on a wireless network maintained by the library technology services department. these computers were to be circulated from the loan desk, the same desk used to check out books. although the building is open to the public, use of the laptops was limited to university students and staff and for library in-house use only. all the public desktop computers and laptops use microsoft windows and microsoft office. maintaining the security of the libraries’ network and students’ personal data in a wireless environment was paramount. to maintain a secure computing environment and present a standardized computing experience in the library, an application of windows xp group policies was used. currently, the laptop software is updated at least every semester using symantec ghost. ghost copies a standardized image to every laptop even when the library owns a variety of computer models from the same manufacturer. additionally, due to concerns over wireless computer security, morgan library implemented cisco’s virtual private network (vpn) in 2004. the laptop service was launched in may 2000. more than 22,000 laptop transactions occurred in the initial year. since its inception, the use of the morgan library laptop service and the number of laptops available for checkout has steadily grown. using student technology funds, the service had grown to 172 laptops and ten presentation kits consisting of a laptop, projector, and a portable screen. circulation during the fall 2005 semester totaled 30,626 laptops and 102 presentation kits. in fiscal year 2005, 66,552 laptops and presentation kits were checked out. based on the high circulation statistics and anecdotal evidence, the service appeared to be successful. although morgan library replaced laptops every three years and upgraded the wireless network, laptop support staff noted that users complained of slow laptop startups, lost data, and lost wireless connections. the researchers also noted that large numbers of users queued at the circulation desk at 5:00 p.m. even though large numbers of desktop computers were available in the eic. a customer service satisfaction survey was developed to assess the service and test library staff’s assumptions about the service. 
csu had a student population of 25,616 students at the time of the survey. n literature review much of the published literature discussing laptop services focuses on the implementation of laptop lending programs and was published from 2001 to 2003, when many libraries were beginning this service (allmang 2003; block 2001; dugan 2001; myers 2001; oddy 2002; vaughan and burnes 2002; williams 2003). these articles deal primarily with topics such as how to deal with start-up technological, staffing, and maintenance issues. they have minimal discussion of the service post-implementation. researchers who have surveyed users of university laptop lending services include direnzo (2002), lyle (1999), jordy (1998), block (2001), oddy (2002), and monash university’s caulfield library (2004). direnzo from the university of akron only briefly discusses a survey they conducted with some information about additional software added as a result of their user comments. lyle from west virginia university discusses the percentage of respondents to particular questions such louise feldmann (louise.feldmann@colostate.edu) is the business and economics librarian at colorado state university libraries. she serves as the college liaison librarian to the college of business. lindsey wess (lindsey.wess@colostate. edu) coordinates assistive technology services and manages the information desk and the electronic information center at colorado state university libraries. tom moothart (tmoothar@ library.colostate.edu) is the coordinator of on-site services at colorado state university libraries. student satisfaction with circulating laptop service | feldmann, wess, and moothart 21 as what applications were used, problems encountered, and overall satisfaction with the service. jordy’s report provides in-depth analysis of the survey results from the university of north carolina at chapel hill, but the focus of his survey is on the laptop service’s impact on library employee work flow. monash university’s caulfield library survey focuses on wireless access and awareness of the program by patrons. other survey results found on university library web sites include southern new hampshire university library (west 2005) and murray state university library (2002). additionally, the monmouth university library web site (2003) provides discussion and written analysis of a survey they conducted prior to implementation of their service, a survey which was used to gather information and assess patron needs in order to aid in the construction and planning of their service. from the survey results discussed in the literature and posted on web sites, overall comments from users are very consistent with one another. most users indicate that they use a loaned laptop computer rather than desktop computer for privacy and portability (lyle 1999; oddy 2002; west 2005). in addition, the responses from patrons are overwhelmingly positive and users appreciated having the service made available (lyle 1999; jordy1998; west 2005). both west virginia university and the university of north carolina at chapel hill surveys found that 98 percent of respondents would check out a laptop again (lyle 1999; jordy 1998). southern university of new hampshire’s survey indicated that 88 percent of those responding would check one out again (west 2005). many respondents stated that a primary drawback of using the laptops was the slowness of connectivity (lyle 1999; monash 2004; murray state 2002). 
the primary use of the laptops, reported in the surveys, was microsoft word (lyle 1999; jordy 1998; oddy 2002). there is a lack of published literature regarding laptop lending customer satisfaction surveys and analysis. this could be due to the relative newness of many programs, the lack of university libraries that provide laptops, or the reliance on circulation statistics solely to assess the program. articles that discuss circulation and usage statistics as an assessment indicator to judge the popularity of their programs include direnzo (2002), dugan (2001), and vaughan and burnes (2002). based on high circulation statistics and positive anecdotal evidence, it may appear that library users are pleased with laptop programs, and perhaps there has been a hesitation to survey users on a program that is perceived by those in the library as successful. n results with the strong emphasis on assessment at colorado state university, it was decided to formally survey laptop users on their satisfaction with the program. the survey was distributed by the access services staff when the laptops were checked out from october 28, 2005, to november 28, 2005. this was a voluntary survey and the respondents were asked to complete one survey. users returned 173 completed surveys. undergraduates are the predominant audience for the laptop service; of the 173 returned surveys, 160 identified themselves as undergraduates. as shown in table 1, the responses indicated that the library has a core of regular laptop users, with 33 percent using the laptops at least daily and 82 percent using the laptops at least weekly. only 3 percent indicated that they were using a laptop for the first time. many laptop users also utilized the eic with 67 percent responding that they use the information commons at least weekly (see table 2). the laptops were initially purchased with the intent that they would be used to support student team projects. presentation kits with a laptop, projector, and portable screen were an extension of this idea and were also made available for checkout. surprisingly, only 15 percent of table 1. how often do you use a library laptop? frequency percentage more than once a day 3% daily 30% weekly 49% monthly 15% my first time 3% n=172 table 2. how often do you use a library pc? frequency percentage more than once a day 3% daily 20% weekly 44% monthly 20% never 13% n=169 22 information technology and libraries | june 2008 the respondents noted that they were using the laptop with a group. during evenings, it was observed by staff that students were regularly queuing and waiting for a laptop even though pcs were available in the library computer lab. figure 1 shows an hourly use statistics for the desktop and laptop public computers. the usage of the desktop computer drops in the late afternoon, just as the use of the laptop computer increases. students were asked why they chose a laptop rather than a library pc and were allowed to choose from multiple answers. as can be seen in table 3, most students noted the advantages of portability and privacy. five respondents wrote in the “other” category that they were able to work better in quieter areas, and ten mention that the computer lab workspace is limited. the dense use of space in the library computer lab has been noted by morgan library staff and students. the desktop surrounding each library pc only provides about three feet of workspace. 
one respondent explained the choice of laptop over pc was because “i can take it to a table and spread out my notes vs. on a library pc.” for many users, the desktops are too crowded to spread research material, and the eic is too noisy for contemplative thought. as can be noted from the use statistics, the public laptop program has been a very popular library service. prior to the survey, the perception of the morgan library staff was that students were waiting in the evening for extended periods of time for a laptop. when the library expanded the laptop pool from 20 in 2000 to 172 in 2005, it had seemingly no effect on reducing the number of students waiting to use them. as can be seen in table 4, when asked how long they had waited for a laptop, 74 percent of the students said they had access to a laptop immediately, and 15 percent waited less than a minute. the survey was administered during the second busiest time of the year for the library, the month before thanksgiving break. in the open comments, one respondent stated that it was possible to wait fortyfive minutes to an hour for a laptop and another noted that “during finals weeks it is almost impossible to get one.” even with the limited waiting time recorded by the page 1 of 1 feldmann figures.doc 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 7:30 am 8:30 am 9:30 am 10:30 am 11:30 am 12:30 pm 1:30 pm 2:30 pm 3:30 pm 4:30 pm 5:30 pm 6:30 pm 7:30 pm 8:30 pm 9:30 pm 10:30 pm 11:30 pm time of day p er ce nt ag e of u se r desktop computers checkout laptops figure 1. computer use statistics for may 1, 2006. figure 1. computer use statistics for may 1, 2006. table 3. why did you choose to use a laptop rather than a library pc? response number portability 41 privacy 12 easier to work with a group 7 portability and privacy 54 portability and easier to work with a group 10 portability, privacy, and easier to work with a group 12 student satisfaction with circulating laptop service | feldmann, wess, and moothart 23 respondents, when asked how the library could improve the laptop service many respondents requested that more laptops be purchased to decrease the wait. the library is struggling to determine the appropriate number of laptops to have available during peak use periods to reduce or eliminate wait times. the library laptops are more problematic than the library desktop computers to support. the laptops are more fragile than the desktop computers and have the added complication of connecting to the wireless network. every morning the morgan library’s technology staff retrieves non-functioning laptops; library technicians regularly retrieve lost data due to malfunctioning laptops and unsophisticated computer users. the addition of the virtual private network (vpn) connection to the laptop startup script files has slowed the boot-up to the wireless network. an effort has been made to ameliorate wireless “dead zones,” but users still complain of being dropped from the wireless network. with these problems in mind, users were asked about the technical complications they have experienced with the library laptops. the survey responses in tables 5 and 6 indicate a much lower percentage of users reporting technical problems than was anticipated. the technical staff’s large volume of technical calls may reflect the volume of users rather than systematic problems with the laptop service. surprisingly, 79 percent of the users reported rarely or never returning a non-functioning laptop. 
in addition, the library technicians have reported that no problems have been found on some of the laptops returned for repair. some of the returned computers may be due to frustration with the slow connection to the wireless network. forty-five percent of respondents reported at least occasionally having problems connecting to the wireless network. from the inception of the laptop program, the library has experienced problems with the wireless technology. from its original fifteen wireless access points to its current twenty-nine, the library has struggled to meet the demand of additional library laptops and users’ personal laptops. many written comments on the surveys complained about the slow connection speed of the wireless network such as, “find a way to make the boot-up process faster. i need to wait about five minutes for it to be totally booted and ready to use.” even with the slow connection to the wireless network, 41 percent of students responding to the survey rated their satisfaction with the library’s laptop service as excellent and 49 percent rated their satisfaction as good (see table 7). n discussion even with 90 percent of our users rating the laptop service as good or excellent, the survey noted some problems that needed attention. the morgan library laptops seamlessly connect to a wireless network through a login script when the computer is turned on. a new script was written to table 4. how long did you wait before you were able to check out your laptop? response percentage i did not wait 74% less than one minute 15% one to four minutes 11% five to ten minutes 2% more than ten minutes 0% n=171 table 5. how often have you experience problems saving files, connecting to the wireless network, or had a laptop that locked up or crashed? frequency saving files wireless connection locked up or crashed often <1% 5% <1% occasionally 8% 40% 17% rarely 33% 32% 35% never 58% 24% 49% n= 165 165 163 table 6. how often have you returned a library laptop that was not working properly? frequency percentage often 4% occasionally 18% rarely 30% never 49% n=165 24 information technology and libraries | june 2008 allow the connection and authentication to the cisco virtual private network (vpn) client. during testing it was found that some laptops took as long as ten minutes to connect to the wireless network, which resulted in numerous survey respondents commenting on our slow wireless network. to help correct this problem, the library’s network staff changed each laptop’s user profile from a mandatory roaming profile to a local profile and simplified the login script. the laptops connected faster to the wireless network with the new script, but they still did not meet the students’ expectations. in the fall of 2006, the library network staff moved the laptops from vpn to wi-fi protected access (wpa) wireless security, and laptop login time to the wireless network dropped to under two minutes. the number of customer complaints dropped dramatically after implementing wpa. additional access points were purchased to improve connectivity in morgan library’s wireless “dead zones.” in january 2006, the university’s central computing services audited the wireless network after continued wireless connectivity complaints. the audit recommended reconfiguring the access points channel assignments. in many cases it was found that the same channel had been assigned to access points adjacent to each other, ultimately compromising laptop connectivity. 
the audit also discovered noise interference on the wireless network from a 2.4-ghz cordless phone used by the loan desk staff. the phone was replaced with a 5.8-ghz one, which has resulted in fewer dropped connections near the loan desk. supporting almost 200 laptops has introduced several problems in the library. the morgan library building was not designed to support the use of large numbers of laptops. because it is impractical for the loan desk to charge nearly 200 laptop batteries throughout the day, laptops available for checkout must be connected to electrical outlets. these are seldom near study tables, and students are forced to crawl underneath tables to locate power or stretch adapter cords across aisles. a space plan for the morgan library is being developed that will increase the number of outlets near study tables. in the meantime, 100 power strips were added to tables used heavily by laptop users. the loan desk staff is very efficient at circulating, but has less success at troubleshooting technical problems. when the laptop service was first implemented, large numbers of laptops were not available due to servicing reasons. the public laptop downtime was lowered by hiring additional library technology students. a one-day onsite repair service agreement was purchased from the manufacturer which resulted in many equipment repairs being completed within 48 hours. in order to reduce the downtime further, a plan to replace some loan desk student workers with library technology students is being evaluated. the technology students will be able to troubleshoot connectivity and hardware problems with the users when they return the defective computers to the loan desk. if a computer needs additional service, it can be handled immediately, which will allow more laptops for checkout since fewer will be removed for repair. when the laptop service was first envisioned, it was seen as a great service for those working in groups. as can be seen in table 3, very few students are using the laptops in a group setting. in survey written comments, students emphasize that they enjoy the portability and privacy enabled by using a laptop. the morgan library eic is cramped and noisy, with the configuration allowing very little room for students to spread out research materials and notes for writing. the morgan library space plan takes these issues into consideration and recommends reconfiguring the eic to lessen the noise and provide writing space near computers. this is intended to improve the student library experience and encourage students to use the desktop computers during the evenings when lines form for the laptops. in order to decrease the current laptop queue at the loan desk, more laptops will be added. as a result of survey comments requesting apple computers, five mac powerbooks were added to the library’s laptop fleet. in addition, as morgan library adds more checkout laptops and the number of students arriving on campus with wireless laptops increases, the wireless infrastructure will need to be upgraded. upgrading the wireless access points to standard 802.11g has been implemented. updating each laptop with a new hardrive image has become problematic as the number of laptops has increased. the wireless network capacity is not large enough for the ghost software to transmit the image to multiple laptops, and so each laptop must be physically attached to the library network. 
initially, when library technology services attempted imaging many laptops at once, it took six to eight hours and required up to eight staff members. this method of large-scale laptop imaging was so network intensive that it had to be performed when the library was closed to avoid disrupting table 7. please rate your satisfaction with the laptop service. response percentage excellent 41% good 49% neutral 7% poor very poor 2% <1% n=166 student satisfaction with circulating laptop service | feldmann, wess, and moothart 25 public internet use. now imaging the laptop fleet is done piecemeal, twenty to thirty laptops at a time, in order to minimize complications with the ghost process and multicasting through the network switches. due to the staff time required, laptop software is not updated as often as the users would like. technological solutions continue to be investigated that will decrease the labor and network intensity of imaging. n conclusion the morgan library laptop service was established in 2000 and has been a very popular addition to the library’s services. as an example of its popularity, in fiscal year 2005 the laptops circulated 66,552 times. student government continues to support the use of student technology fees to support and expand the fleet of laptops. this survey was an attempt to assess users’ perceptions of the service and identify areas that need improvement. the survey found that students rarely wait more than a few minutes for a laptop, and in open-ended survey questions, students noted that they waited for computers only during peak use periods. while relatively few survey respondents experienced technical difficulties with the laptops and wireless network, slow wireless connection time was a concern that students noted in the open comments section of the survey. overall, the students gave the laptop service a very high rating. when asked to suggest improvements to the service, many respondents recommended purchasing more laptops. the libraries made several changes to improve the laptop service based on survey responses. changes have been made to the login script files, wireless network, and security protocol to speed and stabilize the wireless connection process. additional wireless access points will be added to the building and all access points will be upgraded to the 802.11g standard. in addition, five mac powerbooks have been added to the fleet of windowsbased laptops. the library continues to investigate new service models to circulate and maintain the laptops. works cited allmang, nancy. 2003. our plan for a wireless loan service. computer in libraries 23, no. 3: 20–25. block, karla j. 2001. laptops for loan: the experience of a multilibrary project. journal of interlibrary loan, document delivery, and information 12, no. 1: 1–12. direnzo, susan. 2002. a wireless laptop-lending program: the university of akron experience. technical services quarterly 20, no. 2: 1–12. dugan, robert e. 2001. managing laptops and the wireless network at the mildred f. sawyer library. journal of academic librarianship 27, no. 4: 295–298. jordy, matthew l. 1998. the impact of user support needs on a large academic workflow as a result of a laptop check-out program. master’s thesis, university of north carolina. lyle, heather. 1999. circulating laptop computers at west virginia university. information outlook 3, no. 11: 30–32. myers, penelope. 2001. laptop rental program, temple university libraries. 
journal of interlibrary loan, document delivery, and information supply 12, no. 1: 35–40. monash university caulfield library. 2004. laptop users and wireless network survey. www.its.monash.edu.au/staff/networks/wireless/review/caul-lapandnetsurvey.pdf (accessed june 8, 2005). monmouth university. 2003. testing the wireless waters: a survey of potential users before the implementation of a wireless notebook computer lending program in an academic library. http://bluehawk.monmouth.edu/~hholden/wwl/wireless_survey_results.html (accessed june 8, 2005). murray state university. 2002. library laptop computer usage survey results. www.murraystate.edu/msml/laptopsurv. htm (accessed june 8, 2005). oddy, elizabeth carley. 2002. laptops for loan. library and information update 1, no. 4: 54–55. vaughn, james b., and brett burnes. 2002. bringing them in and checking them out: laptop use in the modern academic library. information technology and libraries 21, no. 2: 52–62. west, carol. 2005. librarians pleased with results of student survey. southern new hampshire university. www.snhu. edu/3174/asp (accessed june 8, 2005). williams, joe. 2003. taming the wireless frontier: pdas, tablets, and laptops at home on the range. computers in libraries 23, no. 3: 10–12, 62–64. 128 information technology and libraries | september 2010 lynne weber and peg lawrence authentication and access: accommodating public users in an academic world in cook and shelton’s managing public computing, which confirmed the lack of applicable guidelines on academic websites, had more up-to-date information but was not available to the researchers at the time the project was initiated.2 in the course of research, the authors developed the following questions: ■■ how many arl libraries require affiliated users to log into public computer workstations within the library? ■■ how many arl libraries provide the means to authenticate guest users and allow them to log on to the same computers used by affiliates? ■■ how many arl libraries offer open-access computers for guests to use? do these libraries provide both open-access computers and the means for guest user authentication? ■■ how do federal depository library program libraries balance their policy requiring computer authentication with the obligation to provide public access to government information? ■■ do computers provided for guest use (open access or guest login) provide different software or capabilities than those provided to affiliated users? ■■ how many arl libraries have written policies for the use of open-access computers? if a policy exists, what is it? ■■ how many arl libraries have written policies for authenticating guest users? if a policy exists, what is it? ■■ literature review since the 1950s there has been considerable discussion within library literature about academic libraries serving “external,” “secondary,” or “outside” users. the subject has been approached from the viewpoint of access to the library facility and collections, reference assistance, interlibrary loan (ill) service, borrowing privileges, and (more recently) access to computers and internet privileges, including the use of proprietary databases. 
deale emphasized the importance of public relations to the academic library.3 while he touched on creating bonds both on and off campus, he described the positive effect of “privilege cards” to community members.4 josey described the variety of services that savannah state college offered to the community.5 he concluded his essay with these words: why cannot these tried methods of lending books to citizens of the community, story hours for children . . . , a library lecture series or other forum, a great books discussion group and the use of the library staff in the fall of 2004, the academic computing center, a division of the information technology services department (its) at minnesota state university, mankato took over responsibility for the computers in the public areas of memorial library. for the first time, affiliated memorial library users were required to authenticate using a campus username and password, a change that effectively eliminated computer access for anyone not part of the university community. this posed a dilemma for the librarians. because of its federal depository status, the library had a responsibility to provide general access to both print and online government publications for the general public. furthermore, the library had a long tradition of providing guest access to most library resources, and there was reluctance to abandon the practice. therefore the librarians worked with its to retain a small group of six computers that did not require authentication and were clearly marked for community use, along with several standup, open-access computers on each floor used primarily for searching the library catalog. the additional need to provide computer access to high school students visiting the library for research and instruction led to more discussions with its and resulted in a means of generating temporary usernames and passwords through a web form. these user accommodations were implemented in the library without creating a written policy governing the use of open-access computers. o ver time, library staff realized that guidelines for guests using the computers were needed because of misuse of the open-access computers. we were charged with the task of drafting these guidelines. in typical librarian fashion, we searched websites, including those of association of research libraries (arl) members for existing computer access policies in academic libraries. we obtained very little information through this search, so we turned to arl publications for assistance. library public access workstation authentication by lori driscoll, was of greater benefit and offered much of the needed information, but it was dated.1 a research result described lynne webber (lnweber@mnsu.edu) is access services librarian and peg lawrence (peg.lawrence@mnsu.edu) is systems librarian, minnesota state university, mankato. authentication and access | weber and lawrence 129 providing service to the unaffiliated, his survey revealed 100 percent of responding libraries offered free in-house collection use for the general public, and many others offered additional services.16 brenda johnson described a one-day program in 1984 sponsored by rutgers university libraries forum titled “a case study in closing the university library to the public.” the participating librarians spent the day familiarizing themselves with the “facts” of the theoretical case and concluded that public access should be restricted but not completely eliminated. 
a few months later, consideration of closing rutgers’ library to the public became a real debate. although there were strong opposing viewpoints, the recommendation was to retain the open-door policy.17 jansen discussed the division between those who wanted to provide the finest service to primary users and those who viewed the library’s mission as including all who requested assistance. jansen suggested specific ways to balance the needs of affiliates and the public and referred to the dilemma the university of california, berkeley, library that had been closed to unaffiliated users.18 bobp and richey determined that california undergraduate libraries were emphasizing service to primary users at a time when it was no longer practical to offer the same level of service to primary and secondary users. they presented three courses of action: adherence to the status quo, adoption of a policy restricting access, or implementation of tiered service.19 throughout the 1990s, the debate over the public’s right to use academic libraries continued, with increasing focus on computer use in public and private academic libraries. new authorization and authentication requirements increased the control of internal computers, but the question remained of libraries providing access to government information and responding to community members who expected to use the libraries supported by their taxes. morgan, who described himself as one who had spent his career encouraging equal access to information, concluded that it would be necessary to use authentication, authorization, and access control to continue offering information services readily available in the past.20 martin acknowledged that library use was changing as a result of the internet and that the public viewed the academic librarian as one who could deal with the explosion of information and offer service to the public.21 johnson described unaffiliated users as a group who wanted all the privileges of the affiliates; she discussed the obligation of the institution to develop policies managing these guest users.22 still and kassabian considered the dual responsibilities of the academic library to offer internet access to public users and to control internet material received and sent by primary and public users. further, they weighed as consultants be employed toward the building of good relations between town and gown.6 later, however, deale indicated that the generosity common in the 1950s to outsiders was becoming unsustainable.7 deale used beloit college, with an “open door policy” extending more than 100 years, as an example of a school that had found it necessary to refuse out-of-library circulation to minors except through ill by the 1960s.8 also in 1964, waggoner related the increasing difficulty of accommodating public use of the academic library. he encouraged a balance of responsibility to the public with the institution’s foremost obligation to the students and faculty.9 in october 1965, the ad hoc committee on community use of academic libraries was formed by the college library section of the association of college and research libraries (acrl). this committee distributed a 13-question survey to 1,100 colleges and universities throughout the united states. 
the high rate of response (71 percent) was considered noteworthy, and the findings were explored in “community use of academic libraries: a symposium,” published in 1967.10 the concluding article by josey (the symposium’s moderator) summarized the lenient attitudes of academic libraries toward public users revealed through survey and symposium reports. in the same article, josey followed up with his own arguments in favor of the public’s right to use academic libraries because of the state and federal support provided to those institutions.11 similarly, in 1976 tolliver reported the results of a survey of 28 wisconsin libraries (public academic, private academic, and public), which indicated that respondents made a great effort to serve all patrons seeking service.12 tolliver continued in a different vein from josey, however, by reporting the current annual fiscal support for libraries in wisconsin and commenting upon financial stewardship. tolliver concluded by asking, “how effective are our library systems and cooperative affiliations in meeting the information needs of the citizens of wisconsin?”13 much of the literature in the years following focused on serving unaffiliated users at a time when public and academic libraries suffered the strain of overuse and underfunding. the need for prioritization of primary users was discussed. in 1979, russell asked, “who are our legitimate clientele?” and countered the argument for publicly supported libraries serving the entire public by saying the public “cannot freely use the university lawn mowers, motor pool vehicles, computer center, or athletic facilities.”14 ten years later, russell, robison, and prather prefaced their report on a survey of policies and services for outside users at 12 consortia institutions by saying, “the issue of external users is of mounting concern to an institution whose income is student credit hour generated.”15 despite russell’s concerns about the strain of 130 information technology and libraries | september 2010 be aware of the issues and of the effects that licensing, networking, and collection development decisions have on access.”35 in “unaffiliated users’ access to academic libraries: a survey,” courtney reported and analyzed data from her own comprehensive survey sent to 814 academic libraries in winter 2001.36 of the 527 libraries responding to the survey, 72 libraries (13.6 percent) required all users to authenticate to use computers within the library, while 56 (12.4 percent) indicated that they planned to require authentication in the next twelve months.37 courtney followed this with data from surveyed libraries that had canceled “most” of their indexes and abstracts (179 libraries, or 33.9 percent) and libraries that had cancelled “most” periodicals (46 libraries or 8.7 percent).38 she concluded that the extent to which the authentication requirement restricted unaffiliated users was not clear, and she asked, “as greater numbers of resources shift to electronic-only formats, is it desirable that they disappear from the view of the community user or the visiting scholar?”39 courtney’s “authentication and library public access computers: a call for discussion” described a follow-up with the academic libraries participating in her 2001 survey who had self-identified as using authentication or planning to employ authentication within the next twelve months. 
her conclusion was the existence of ambivalence toward authentication among the libraries, since more than half of the respondents provided some sort of public access. she encouraged librarians to carefully consider the library’s commitment to service before entering into blanket license agreements with vendors or agreeing to campus computer restrictions.40 several editions of the arl spec kit series showing trends of authentication and authorization for all users of arl libraries have been an invaluable resource in this investigation. an examination of earlier spec kits indicated that the definitions of “user authentication” and “authorization” have changed over the years. user authentication, by plum and bleiler indicated that 98 percent of surveyed libraries authenticated users in some way, but at that time authentication would have been more precisely defined as authorization or permission to access personal records, such as circulation, e-mail, course registration, and file space. as such, neither authentication nor authorization was related to basic computer access.41 by contrast, it is common for current library users authenticate to have any access to a public workstation. driscoll’s library public access workstation authentication sought information on how and why users were authenticated on public-access computers, who was driving the change, how it affected the ability of federal depository libraries to provide public information, and how it affected library services in general.42 but at the time of driscoll’s survey, only 11 percent of surveyed libraries required authentication on all computers and 22 percent required it only on selected terminals. cook and shelton’s managing public computing the reconciliation of material restrictions against “principles of freedom of speech, academic freedom, and the ala’s condemnation of censorship.”23 lynch discussed institutional use of authentication and authorization and the growing difficulty of verifying bona fide users of academic library subscription databases and other electronic resources. he cautioned that future technical design choices must reflect basic library values of free speech, personal confidentiality, and trust between academic institution and publisher.24 barsun specifically examined the webpages of one hundred arl libraries in search of information pertinent to unaffiliated users. 
she included a historic overview of the changing attitudes of academics toward service to the unaffiliated population and described the difficult balance of college community needs with those of outsiders in 2000 (the survey year).25 barsun observed a consistent lack of information on library websites regarding library guest use of proprietary databases.26 carlson discussed academic librarians’ concerns about “internet-related crimes and hacking” leading to reconsideration of open computer use, and he described the need to compromise patron privacy by requiring authentication.27 in a chapter on the relationship of it security to academic values, oblinger said, “one possible interpretation of intellectual freedom is that individuals have the right to open and unfiltered access to the internet.”28 this statement was followed later with “equal access to information can also be seen as a logical extension of fairness.”29 a short article in library and information update alerted the authors to a uk project investigating improved online access to resources for library visitors not affiliated with the host institution.30 salotti described higher education access to e-resources in visited institutions (haervi) and its development of a toolkit to assist with the complexities of offering electronic resources to guest users.31 salotti summarized existing resources for sharing within the united kingdom and emphasized that “no single solution is likely to suit all universities and colleges, so we hope that the toolkit will offer a number of options.”32 launched by the society of college, national and university libraries (sconul), and universities and colleges information systems association (ucisa), haervi has created a best-practice guide.33 by far the most useful articles for this investigation have been those by nancy courtney. “barbarians at the gates: a half-century of unaffiliated users in academic libraries,” a literature review on the topic of visitors in academic libraries, included a summary of trends in attitude and practice toward visiting users since the 1950s.34 the article concluded with a warning: “the shift from printed to electronic formats . . . combined with the integration of library resources with campus computer networks and the internet poses a distinct threat to the public’s access to information even onsite. it is incumbent upon academic librarians to authentication and access | weber and lawrence 131 introductory letter with the invitation to participate and a forward containing definitions of terms used within the survey is in appendix a. in total, 61 (52 percent) of the 117 arl libraries invited to participate in the survey responded. this is comparable with the response rate for similar surveys reported by plum and bleiler (52 of 121, or 43 percent), driscoll (67 of 124, or 54 percent), and cook and shelton (69 of 123, or 56 percent).45 1. what is the name of your academic institution? the names of the 61 responding libraries are listed in appendix b. 2. is your institution public or private? see figure 1. respondents’ explanations of “other” are listed below. ■❏ state-related ■❏ trust instrument of the u.s. people; quasigovernment ■❏ private state-aided ■❏ federal government research library ■❏ both—private foundation, public support 3. are affiliated users required to authenticate in order to access computers in the public area of your library? see figure 2. 4. 
if you answered “yes” to the previous question, does your library provide the means for guest users to authenticate? see figure 3. respondents’ explanations of “other” are listed below. all described open-access computers. ■❏ “we have a few “open” terminals” ■❏ “4 computers don’t require authentication” ■❏ “some workstations do not require authentication” ■❏ “open-access pcs for guests (limited number and function)” ■❏ “no—but we maintain several open pcs for guests” ■❏ “some workstations do not require login” 5. is your library a federal depository library? see figure 4. this question caused some confusion for the canadian survey respondents because canada has its own depository services program corresponding to the u.s. federal depository program. consequently, 57 of the 61 respondents identified themselves as federal depository (including three canadian libraries), although 5 of the 61 are more accurately members of the canadian depository services program. only two responding libraries were neither a member of the u.s. federal depository program nor of the canadian depository services program. 6. if you answered “yes” to the previous question, and computer authentication is required, what provisions have been made to accommodate use of online government documents by the general public in the library? please check all that touched on every aspect of managing public computing, including public computer use, policy, and security.43 even in 2007, only 25 percent of surveyed libraries required authentication on all computers, but 46 percent required authentication on some computers, showing the trend toward an ever increasing number of libraries requiring public workstation authentication. most of the responding libraries had a computer-use policy, with 48 percent following an institution-wide policy developed by the university or central it department.44 ■■ method we constructed a survey designed to obtain current data about authentication in arl libraries and to provide insight into how guest access is granted at various academic institutions. it should be noted that the object of the survey was access to computers located in the public areas of the library for use by patrons, not access to staff computers. we constructed a simple, fourteen-question survey using the zoomerang online tool (http://www .zoomerang.com/). a list of the deans, directors, and chief operating officers from the 123 arl libraries was compiled from an internet search. we eliminated the few library administrators whose addresses could not be readily found and sent the survey to 117 individuals with the request that it be forwarded to the appropriate respondent. the recipients were informed that the goal of the project was “determination of computer authentication and current computer access practices within arl libraries” and that the intention was “to reflect practices at the main or central library” on the respondent’s campus. recipients were further informed that the names of the participating libraries and the responses would be reported in the findings, but that there would be no link between responses given and the name of the participating library. the survey introduction included the name and contact information of the institutional review board administrator for minnesota state university, mankato. potential respondents were advised that the e-mail served as informed consent for the study. the survey was administered over approximately three weeks. 
we sent reminders three, five, and seven days after the survey was launched to those who had not already responded. ■■ survey questions, responses, and findings we administered the survey, titled “authentication and access: academic computers 2.0,” in late april 2008. following is a copy of the fourteen-question survey with responses, interpretative data, and comments. the 132 information technology and libraries | september 2010 ■❏ “some computers are open access and require no authentication” ■❏ “some workstations do not require login” 7. if your library has open-access computers, how many do you provide? (supply number). see figure 6. a total of 61 institutions responded to this question, and 50 reported open-access computers. the number of open-access computers ranged from 2 to 3,000. as expected, the highest numbers were reported by libraries that did not require authentication for affiliates. the mean number of open-access computers was 161.2, the median was 23, the mode was 30, and the range was 2,998. 8. please indicate which online resources and services are available to authenticated users. please check all that apply. see figure 7. ■❏ online catalog ■❏ government documents ■❏ internet browser apply. see figure 5. ■❏ temporary user id and password ■❏ open access computers (unlimited access) ■❏ open access computers (access limited to government documents) ■❏ other of the 57 libraries that responded “yes” to question 5, 30 required authentication for affiliates. these institutions offered the general public access to online government documents various ways. explanations of “other” are listed below. three of these responses indicate, by survey definition, that open-access computers were provided. ■❏ “catalog-only workstations” ■❏ “4 computers don’t require authentication” ■❏ “generic login and password” ■❏ “librarians login each guest individually” ■❏ “provision made for under-18 guests needing gov doc” ■❏ “staff in gov info also login user for quick use” ■❏ “restricted guest access on all public devices” figure 3. institutions with the means to authenticate guests figure 4. libraries with federal depository and/or canadian depository services status figure 2. institutions requiring authentication figure 1. categories of responding institutions authentication and access | weber and lawrence 133 11. does your library have a written policy for use of open access computers in the public area of the library? question 7 indicates that 50 of the 61 responding libraries did offer the public two or more open-access computers. out of the 50, 28 responded that they had a written policy governing the use of computers. conversely, open-access computers were reported at 22 libraries that had no reported written policy. 12. if you answered “yes” to the previous question, please give the link to the policy and/or summarize the policy. twenty-eight libraries gave a url, a url plus a summary explanation, or a summary explanation with no url. 13. does your library have a written policy for authenticating guest users? out of the 32 libraries that required their users to authenticate (see question 3), 23 also had the means to allow their guests to authenticate (see question 4). fifteen of those libraries said they had a policy. 14. if you answered “yes” to the previous question, please give the link to the policy and/or summarize the policy. eleven ■❏ licensed electronic resources ■❏ personal e-mail access ■❏ microsoft office software 9. 
please indicate which online resources and services are available to authenticated guest users. please check all that apply. see figure 8. ■❏ online catalog ■❏ government documents ■❏ internet browser ■❏ licensed electronic resources ■❏ personal e-mail access ■❏ microsoft office software 10. please indicate which online resources and services are available on open-access computers. please check all that apply. see figure 9. ■❏ online catalog ■❏ government documents ■❏ internet browser ■❏ licensed electronic resources ■❏ personal e-mail access ■❏ microsoft office software figure 5. provisions for the online use of government documents where authentication is required figure 6. number of open-access computers offered figure 7. electronic resources for authenticated affiliated users (n = 32) number of libraries number of librariesnumber of libraries number of libraries figure 8. resources for authenticating guest users (n = 23) 134 information technology and libraries | september 2010 ■■ respondents and authentication figure 10 compares authentication practices of public, private, and other institutions described in response to question 2. responses from public institutions outnumbered those from private institutions, but within each group a similar percentage of libraries required their affiliated users to authenticate. therefore no statistically significant difference was found between authenticating affiliates in public and private institutions. of the 61 respondents, 32 (52 percent) required their affiliated users to authenticate (see question 3) and 23 of the 32 also had the means to authenticate guests (see question 4). the remaining 9 offered open-access computers. fourteen libraries had both the means to authenticate guests and had open-access computers (see questions 4 and 7). when we compare the results of the 2007 study by cook and shelton with the results of the current study (completed in 2008), the results are somewhat contradictory (see table 1).46 the differences in survey data seem to indicate that authentication requirements are decreasing; however, the literature review—specifically cook and shelton and the 2003 courtney article—clearly indicate that authentication is on the rise.47 this dichotomy may be explained, in part, by the fact that of the more than 60 arl libraries responding to both surveys, there was an overlap of only 34 libraries. the 30 u.s. federal depository or canadian depository services libraries that required their affiliated users to authenticate (see questions 3 and 5) provided guest access ranging from usernames and passwords, to open-access computers, to computers restricted to libraries gave the url to their policy; 4 summarized their policies. ■■ research questions answered the study resulted in answers to the questions we posed at the outset: ■■ thirty-two (52 percent) of the responding arl libraries required affiliated users to login to public computer workstations in the library. ■■ twenty-three (72 percent) of the 32 arl libraries requiring affiliated users to login to public computers provided the means for guest users to login to public computer workstations in the library. ■■ fifty (82 percent) of 61 responding arl libraries provided open-access computers for guest users; 14 (28 percent) of those 50 libraries provided both open-access computers and the means for guest authentication. ■■ without exception, all u.s. 
federal depository or canadian depository services libraries that required their users to authenticate offered guest users some form of access to online information. ■■ survey results indicated some differences between software provided to various users on differently accessed computers. office software was less frequently provided on open-access computers. ■■ twenty-eight responding arl libraries had written policies relating to the use of open-access computers. ■■ fifteen responding arl libraries had written policies relating to the authorization of guests. figure 9. electronic resources on open access computers (n = 50) figure 10. comparison of library type and authentication requirement number of libraries authentication and access | weber and lawrence 135 ■■ one library had guidelines for use posted next to the workstations but did not give specifics. ■■ fourteen of those requiring their users to authenticate had both open-access computers and guest authentication to offer to visitors of their libraries. other policy information was obtained by an examination of the 28 websites listed by respondents: ■■ ten of the sites specifically stated that the open-access computers were for academic use only. ■■ five of the sites specified time limits for use of openaccess computers, ranging from 30 to 90 minutes. ■■ four stated that time limits would be enforced when others were waiting to use computers. ■■ one library used a sign-in sheet to monitor time limits. ■■ one library mentioned a reservation system to monitor time limits. ■■ two libraries prohibited online gambling. ■■ six libraries prohibited viewing sexually explicit materials. ■■ guest-authentication policies of the 23 libraries that had the means to authenticate their guests, 15 had a policy for guests obtaining a username and password to authenticate, and 6 outlined their requirements of showing identification and issuing access. the other 9 had open-access computers that guests might use. the following are some of the varied approaches to guest authentication: ■■ duration of the access (when mentioned) ranged from 30 days to 12 months. ■■ one library had a form of sponsored access where current faculty or staff could grant a temporary username and password to a visitor. ■■ one library had an online vouching system that allowed the visitor to issue his or her own username and password online. ■■ one library allowed guests to register themselves by swiping an id or credit card. ■■ one library had open-access computers for local resources and only required authentication to leave the library domain. ■■ one library had the librarians log the users in as guests. ■■ one library described the privacy protection of collected personal information. ■■ no library mentioned charging a fee for allowing computer access. government documents, to librarians logging in for guests (see question 6). numbers of open-access computers ranged widely from 2 to more than 3,000 (see question 7). eleven (19 percent) of the responding u.s. federal depository or canadian depository services libraries that did not provide open-access computers issued a temporary id (nine libraries), provided open access limited to government documents (one library), or required librarian login for each guest (one library). all libraries with u.s. federal depository or canadian depository services status provided a means of public access to information to fulfill their obligation to offer government documents to guests. 
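the percentages and descriptive statistics reported above are simple tabulations of the raw survey responses. the short python sketch below shows how figures of this kind could be reproduced; the list of open-access computer counts is a hypothetical stand-in, since the individual per-library responses to question 7 are not published in this article.

from statistics import mean, median, mode

# hypothetical per-library counts of open-access computers (question 7);
# the real responses, not reproduced here, yield the published mean of 161.2,
# median of 23, mode of 30, and range of 2,998.
open_access_counts = [2, 12, 23, 30, 30, 45, 160, 3000]

print("mean:", round(mean(open_access_counts), 1))
print("median:", median(open_access_counts))
print("mode:", mode(open_access_counts))
print("range:", max(open_access_counts) - min(open_access_counts))

# the reported proportions are plain ratios, e.g. 32 of the 61 responding
# libraries required affiliated users to authenticate (52 percent).
print(f"requiring authentication: {32 / 61:.0%}")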
figure 11 shows a comparison of resources available to authenticated users and authenticated guests and offered on open-access computers. as might be expected, almost all institutions provided access to online catalogs, government documents, and internet browsers. fewer allowed access to licensed electronic resources and e-mail. access to office software showed the most dramatic drop in availability, especially on open-access computers. ■■ open-access computer policies as mentioned earlier, 28 libraries had written policies for their open-access computers (see question 11), and 28 libraries gave a url, a url plus a summary explanation, or a summary explanation with no url (see question 12). in most instances, the library policy included their campus’s acceptable-use policy. seven libraries cited their campus’s acceptable-use policy and nothing else. nearly all libraries applied the same acceptable-use policy to all users on all computers and made no distinction between policies for use of open-access computers or computers requiring authentication. following are some of the varied aspects of summarized policies pertaining to open-access computers: ■■ eight libraries stated that the computers were for academic use and that users might be asked to give up their workstation if others were waiting. table 1. comparison of findings from cook and shelton (2007) and the current survey (2008) authentication requirements 2007 (n = 69) 2008 (n = 61) some required 28 (46%) 23 (38%) required for all 15 (25%) 9 (15%) not required 18 (30%) 29 (48%) 136 information technology and libraries | september 2010 ■■ further study although the survey answered many of our questions, other questions arose. while the number of libraries requiring affiliated users to log on to their public computers is increasing, this study does not explain why this is the case. reasons could include reactions to the september 11 disaster, the usa patriot act, general security concerns, or the convenience of the personalized desktop and services for each authenticated user. perhaps a future investigation could focus on reasons for more frequent requirement of authentication. other subjects that arose in the examination of institutional policies were guest fees for services, age limits for younger users, computer time limits for guests, and collaboration between academic and public libraries. ■■ policy developed as a result of the survey findings as a result of what was learned in the survey, we drafted guidelines governing the use of open-access computers by visitors and other non-university users. the guidelines can be found at http://lib.mnsu.edu/about/libvisitors .html#access. these guidelines inform guests that openaccess computers are available to support their research, study, and professional activities. the computers also are governed by the campus policy and the state university system acceptable-use policy. guideline provisions enable staff to ask users to relinquish a computer when others are waiting or if the computer is not being used for academic purposes. while this library has the ability to generate temporary usernames and passwords, and does so for local schools coming to the library for research, no guidelines have yet been put in place for this function. figure 11. online resources available to authenticated affiliated users, guest users, open-access users authentication and access | weber and lawrence 137 these practices depend on institutional missions and goals and are limited by reasonable considerations. 
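several of the policies summarized above combine a session time limit (30 to 90 minutes at the sites that stated one) with the provision that guests are asked to give up a workstation only when others are waiting or the machine is not being used for academic purposes. purely as an illustration of that policy logic, and not as any responding library's actual system, a minimal python sketch might look like the following; the 60-minute limit and the waiting flag are hypothetical.

from datetime import datetime, timedelta

# hypothetical session-limit check for an open-access workstation; in practice
# the libraries cited reservation systems, sign-in sheets, or pc-management software.
SESSION_LIMIT = timedelta(minutes=60)  # reported limits ranged from 30 to 90 minutes

def should_ask_to_relinquish(session_start: datetime, others_waiting: bool) -> bool:
    """ask a guest to give up the workstation only when the time limit has been
    exceeded and someone else is waiting, mirroring the policies described above."""
    elapsed = datetime.now() - session_start
    return others_waiting and elapsed > SESSION_LIMIT

# example: a session that began 75 minutes ago while another user is waiting
started = datetime.now() - timedelta(minutes=75)
print(should_ask_to_relinquish(started, others_waiting=True))  # prints True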
in the past, accommodation at some level was generally offered to the community, but the complications of affiliate authentication, guest registration, and vendor-license restrictions may effectively discourage or prevent outside users from accessing principal resources. on the other hand, open-access computers facilitate access to electronic resources. those librarians who wish to provide the same level of commitment to guest users as in the past as well as protect the rights of all should advocate to campus policy-makers at every level to allow appropriate guest access to computers to fulfill the library’s mission. in this way, the needs and rights of guest users can be balanced with the responsibilities of using campus computers. in addition, librarians should consider ensuring that the licenses of all electronic resources accommodate walk-in users and developing guidelines to prevent incorporation of electronic materials that restrict such use. this is essential if the library tradition of freedom of access to information is to continue. finally, in regard to external or guest users, academic librarians are pulled in two directions; they are torn between serving primary users and fulfilling the principles of intellectual freedom and free, universal access to information along with their obligations as federal depository libraries. at the same time, academic librarians frequently struggle with the goals of the campus administration responsible for providing secure, reliable networks, sometimes at the expense of the needs of the outside community. the data gathered in this study, indicating that 82 percent of responding libraries continue to provide at least some open-access computers, is encouraging news for guest users. balancing public access and privacy with institutional security, while a current concern, may be resolved in the way of so many earlier preoccupations of the electronic age. given the pervasiveness of the problem, however, fair and equitable treatment of all library users may continue to be a central concern for academic libraries for years to come. references 1. lori driscoll, library public access workstation authentication, spec kit 277 (washington, d.c.: association of research libraries, 2003). 2. martin cook and mark shelton, managing public computing, spec kit 302 (washington, d.c.: association of research libraries, 2007): 16. 3. h. vail deale, “public relations of academic libraries,” library trends 7 (oct. 1958): 269–77. 4. ibid., 275. 5. e. j. josey, “the college library and the community,” faculty research edition, savannah state college bulletin (dec. 1962): 61–66. ■■ conclusions while we were able to gather more than 50 years of literature pertaining to unaffiliated users in academic libraries, it soon became apparent that the scope of consideration changed radically through the years. in the early years, there was discussion about the obligation to provide service and access for the community balanced with the challenge to serve two clienteles. despite lengthy debate, there was little exception to offering the community some level of service within academic libraries. early preoccupation with physical access, material loans, ill, basic reference, and other services later became a discussion of the right to use computers, electronic resources, and other services without imposing undue difficulty to the guest. current discussions related to guest users reflect obvious changes in public computer administration over the years. 
authentication presently is used at a more fundamental level than in earlier years. in many libraries, users must be authorized to use the computer in any way whatsoever. as more and more institutions require authentication for their primary users, accommodation must be made if guests are to continue being served. in addition, as courtney’s 2003 research indicates, an ever increasing number of electronic databases, indexes, and journals replace print resources in library collections. this multiplies the roadblocks for guest users and exacerbates the issue.48 unless special provisions are made for computer access, community users are left without access to a major part of the library’s collections. because 104 of the 123 arl libraries (85 percent) are federal depository or canadian depository services libraries, the researchers hypothesized that most libraries responding to the survey would offer open-access computers for the use of nonaffiliated patrons. this study has shown that federal depository libraries have remained true to their mission and obligation of providing public access to government-generated documents. every federal depository respondent indicated that some means was in place to continue providing visitor and guest access to the majority of their electronic resources— whether through open-access computers, temporary or guest logins, or even librarians logging on for users. while access to government resources is required for the libraries housing government-document collections, libraries can use considerably more discretion when considering what other resources guest patrons may use. despite the commitment of libraries to the dissemination of government documents, the increasing use of authentication may ultimately diminish the libraries’ ability and desire to accommodate the information needs of the public. this survey has provided insight into the various ways academic libraries serve guest users. not all academic libraries provide public access to all library resources. 138 information technology and libraries | september 2010 identify yourself,” chronicle of higher education 50, no. 42 (june 25, 2004): a39, http://search.ebscohost.com/login.aspx?direct =true&db=aph&an=13670316&site=ehost-live (accessed mar. 2, 2009). 28. diana oblinger, “it security and academic values,” in luker and petersen, computer & network security in higher education, 4, http://net.educause.edu/ir/library/pdf/pub7008e .pdf (accessed july 14, 2008). 29. ibid., 5. 30. “access for non-affiliated users,” library & information update 7, no. 4 (2008): 10. 31. paul salotti, “introduction to haervi-he access to e-resources in visited institutions,” sconul focus no. 39 (dec. 2006): 22–23, http://www.sconul.ac.uk/publications/ newsletter/39/8.pdf (accessed july 14, 2008). 32. ibid., 23. 33. universities and colleges information systems association (ucisa), haervi: he access to e-resources in visited institutions, (oxford: ucisa, 2007), http://www.ucisa.ac.uk/ publications/~/media/files/members/activities/haervi/ haerviguide%20pdf (accessed july 14, 2008). 34. nancy courtney, “barbarians at the gates: a half-century of unaffiliated users in academic libraries,” journal of academic librarianship 27, no. 6 (nov. 2001): 473–78, http://search.ebsco host.com/login.aspx?direct=true&db=aph&an=5602739&site= ehost-live (accessed july 14, 2008). 35. ibid., 478. 36. nancy courtney, “unaffiliated users’ access to academic libraries: a survey,” journal of academic librarianship 29, no. 1 (jan. 
2003): 3–7, http://search.ebscohost.com/login.aspx?dire ct=true&db=aph&an=9406155&site=ehost-live (accessed july 14, 2008). 37. ibid., 5. 38. ibid., 6. 39. ibid., 7. 40. nancy courtney, “authentication and library public access computers: a call for discussion,” college & research libraries news 65, no. 5 (may 2004): 269–70, 277, www.ala .org/ala/mgrps/divs/acrl/publications/crlnews/2004/may/ authentication.cfm (accessed july 14, 2008). 41. terry plum and richard bleiler, user authentication, spec kit 267 (washington, d.c.: association of research libraries, 2001): 9. 42. lori driscoll, library public access workstation authentication, spec kit 277 (washington, d.c.: association of research libraries, 2003): 11. 43. cook and shelton, managing public computing. 44. ibid., 15. 45. plum and bleiler, user authentication, 9; driscoll, library public access workstation authentication, 11; cook and shelton, managing public computing, 11. 46. cook and shelton, managing public computing, 15. 47. ibid.; courtney, unaffiliated users, 5–7. 48. courtney, unaffiliated users, 6–7. 6. ibid., 66. 7. h. vail deale, “campus vs. community,” library journal 89 (apr. 15, 1964): 1695–97. 8. ibid., 1696. 9. john waggoner, “the role of the private university library,” north carolina libraries 22 (winter 1964): 55–57. 10. e. j. josey, “community use of academic libraries: a symposium,” college & research libraries 28, no. 3 (may 1967): 184–85. 11. e. j. josey, “implications for college libraries,” in “community use of academic libraries,” 198–202. 12. don l. tolliver, “citizens may use any tax-supported library?” wisconsin library bulletin (nov./dec. 1976): 253. 13. ibid., 254. 14. ralph e. russell, “services for whom: a search for identity,” tennessee librarian: quarterly journal of the tennessee library association 31, no. 4 (fall 1979): 37, 39. 15. ralph e. russell, carolyn l. robison, and james e. prather, “external user access to academic libraries,” the southeastern librarian 39 (winter 1989): 135. 16. ibid., 136. 17. brenda l. johnson, “a case study in closing the university library to the public,” college & research library news 45, no. 8 (sept. 1984): 404–7. 18. lloyd m. jansen, “welcome or not, here they come: unaffiliated users of academic libraries,” reference services review 21, no. 1 (spring 1993): 7–14. 19. mary ellen bobp and debora richey, “serving secondary users: can it continue?” college & undergraduate libraries 1, no. 2 (1994): 1–15. 20. eric lease morgan, “access control in libraries,” computers in libraries 18, no. 3 (mar. 1, 1998): 38–40, http://search .ebscohost.com/login.aspx?direct=true&db=aph&an=306709& site=ehost-live (accessed aug. 1, 2008). 21. susan k. martin, “a new kind of audience,” journal of academic librarianship 24, no. 6 (nov. 1998): 469, library, information science & technology abstracts, http://search.ebsco host.com/login.aspx?direct=true&db=aph&an=1521445&site= ehost-live (accessed aug. 8, 2008). 22. peggy johnson, “serving unaffiliated users in publicly funded academic libraries,” technicalities 18, no. 1 (jan. 1998): 8–11. 23. julie still and vibiana kassabian, “the mole’s dilemma: ethical aspects of public internet access in academic libraries,” internet reference services quarterly 4, no. 3 (1999): 9. 24. clifford lynch, “authentication and trust in a networked world,” educom review 34, no. 4 (jul./aug. 1999), http://search .ebscohost.com/login.aspx?direct=true&db=aph&an=2041418 &site=ehost-live (accessed july 16, 2008). 25. 
rita barsun, “library web pages and policies toward ‘outsiders’: is the information there?” public services quarterly 1, no. 4 (2003): 11–27. 26. ibid., 24. 27. scott carlson, “to use that library computer, please authentication and access | weber and lawrence 139 appendix a. the survey introduction, invitation to participate, and forward dear arl member library, as part of a professional research project, we are attempting to determine computer authentication and current computer access practices within arl libraries. we have developed a very brief survey to obtain this information which we ask one representative from your institution to complete before april 25, 2008. the survey is intended to reflect practices at the main or central library on your campus. names of libraries responding to the survey may be listed but no identifying information will be linked to your responses in the analysis or publication of results. if you have any questions about your rights as a research participant, please contact anne blackhurst, minnesota state university, mankato irb administrator. anne blackhurst, irb administrator minnesota state university, mankato college of graduate studies & research 115 alumni foundation mankato, mn 56001 (507)389-2321 anne.blackhurst@mnsu.edu you may preview the survey by scrolling to the text below this message. if, after previewing you believe it should be handled by another member of your library team, please forward this message appropriately. alternatively, you may print the survey, answer it manually and mail it to: systems/ access services survey library services minnesota state university, mankato ml 3097—po box 8419 mankato, mn 56001-8419 (usa) we ask you or your representative to take 5 minutes to answer 14 questions about computer authentication practices in your main library. participation is voluntary, but follow-up reminders will be sent. this e-mail serves as your informed consent for this study. your participation in this study includes the completion of an online survey. your name and identity will not be linked in any way to the research reports. clicking the link to take the survey shows that you understand you are participating in the project and you give consent to our group to use the information you provide. you have the right to refuse to complete the survey and can discontinue it at any time. to take part in the survey, please click the link at the bottom of this e-mail. thank you in advance for your contribution to our project. if you have questions, please direct your inquiries to the contacts given below. thank you for responding to our invitation to participate in the survey. this survey is intended to determine current academic library practices for computer authentication and open access. your participation is greatly appreciated. below are the definitions of terms used within this survey: ■■ “authentication”: a username and password are required to verify the identity and status of the user in order to log on to computer workstations in the library. ■■ “affiliated user”: a library user who is eligible for campus privileges. ■■ “non-affiliated user”: a library user who is not a member of the institutional community (an alumnus may be a nonaffiliated user). this may be used interchangeably with “guest user.” ■■ “guest user”: visitor, walk-in user, nonaffiliated user. ■■ “open access computer”: computer workstation that does not require authentication by user. 140 information technology and libraries | september 2010 appendix b. 
responding institutions 1. university at albany state university of new york 2. university of alabama 3. university of alberta 4. university of arizona 5. arizona state university 6. boston college 7. university of british columbia 8. university at buffalo, state university of ny 9. case western reserve university 10. university of california berkeley 11. university of california, davis 12. university of california, irvine 13. university of chicago 14. university of colorado at boulder 15. university of connecticut 16. columbia university 17. dartmouth college 18. university of delaware 19. university of florida 20. florida state university 21. university of georgia 22. georgia tech 23. university of guelph 24. howard university 25. university of illinois at urbana-champaign 26. indiana university bloomington 27. iowa state university 28. johns hopkins university 29. university of kansas 30. university of louisville 31. louisiana state university 32. mcgill university 33. university of maryland 34. university of massachusetts amherst 35. university of michigan 36. michigan state university 37. university of minnesota 38. university of missouri 39. massachusetts institute of technology 40. national agricultural library 41. university of nebraska-lincoln 42. new york public library 43. northwestern university 44. ohio state university 45. oklahoma state university 46. university of oregon 47. university of pennsylvania 48. university of pittsburgh 49. purdue university 50. rice university 51. smithsonian institution 52. university of southern california 53. southern illinois university carbondale 54. syracuse university 55. temple university 56. university of tennessee 57. texas a&m university 58. texas tech university 59. tulane university 60. university of toronto 61. vanderbilt university 54 information technology and libraries | june 2011 recreation, law enforcement and public safety, and social services available in the community ■■ access to electronic encyclopedias, local libraries’ catalogs, full-text articles online, and document delivery.”2 at the time we were asking the question, will an information infrastructure be built? the answer? most assuredly. indeed, librarians stepped up to the table and ensured that the public had access to information-related services at their local library. the information the public asked for in 1994, as listed above, is widely available today. there are numerous examples in which librarians and libraries have served as leaders in the ongoing sustainablity of local, regional, and national information networks. it was pointed out at the time, and remains true today, that in an era of ever-shrinking resources, libraries cannot and should not compete with telecommunications, entertainment, and computer companies. they need to “join them as equals in the information arena.”3 lita has a viable role in the development of the twentyfirst-century skills that will firmly put the information infrastructure into place. a lita member is appointed as a liaison to the office for information technology policy (oitp) and serves on the lita technology and access committee, which addresses similar issues. the lita transliteracy interest group explores, develops, and promotes the role of libraries in all aspects of literacy. working with the oitp provides lita membership with the opportunity to participate in current issues, such as digital literacy. the information infrastructure has come a long way in the last twenty some years. there is still much to be done. 
robert bocher, technology consultant with the wisconsin state library and oitp fellow, will present “building the future: addressing library broadband connectivity issues in the 21st century” at the lita president’s program from 4 p.m. to 5:30 p.m. on sunday, june 26, at the ala annual conference in new orleans. i look forward to seeing you at the program and to hear about the successes and the work that remains to be done to address the broadband needs we all face in the country. references 1. federal communications commission, the national broadband plan: chapter 2: goals for a high performance america, http://www.broadband.gov/plan/2 -goals-for-a-high-performance-america/ (accessed apr. 2, 2011). 2. karen starr, “the american public, the public library, and the internet; an ever-evolving partnership” in the cybrarian’s manual, ed. pat ensor (chicago: ala, 1997): 23–24. 3. ibid., 31. t wenty years ago, librarians became involved in the implementation of the internet for the use of the public across the country. those initiatives were soon followed by the bill and melinda gates foundation projects supporting public libraries, which included funding hardware grants to implement public computer labs and connectivity grants to support high-speed internet connections. in 2008, the institute of museum and library services (imls) convened a task force to define twentyfirst-century skills for museums and libraries, which became an ongoing national initiative (http://www.imls .gov/about/21stcskills.shtm). the one year anniversary of the release of the national broadband plan was march 16, 2011. as described on broadband.gov, the plan is intended “to create a high-performance america—a more productive, creative, efficient america in which affordable broadband is available everywhere and everyone has the means and skills to use valuable broadband applications.”1 in 1994, the idaho state library’s development division cosponsored eight focus groups in which 179 people participated. the participants were asked several questions, including the types of information they would like to see on the internet. the results reflected the public’s interest at that time in the following: ■■ “expert advice on a variety of topics including medicine, law, car repair, computer technology, animal husbandry, and gardening ■■ economic development, investment, bank rates, consumer product safety, and insurance ■■ community-based information such as events, volunteers, local classified advertisements, special interest groups, housing information, public meetings, transportation schedules, and local employment opportunities ■■ computer training, foreign language programs, homework service, teacher recertification, school activities, school scheduling, and adult education ■■ electronic mail and the ability to transfer files locally as well as worldwide ■■ access to public records, voting records of legislators, absentee voting, the ability to renew a driver’s license, the rules and regulations from governmental agencies, and taxes ■■ information about hunting and fishing, environmental quality, the local weather, road advisories, sports, karen j. starr (karen.j.starr@gmail.com) is lita president 2010-11 and assistant administrator for library and development services, nevada state library and archives, carson city. karen j. 
starr president’s message: 21st century skills, 21st century infrastructure editorial the authors of “the state of rfid applications in libraries,” that appeared in the march 2006 issue, inadvertently included two sentences that are near quotations from a commentary by peter warfield and lee tien in the april 8, 2005 issue of the berkeley daily planet. on page 30 immediately following footnote 24, the authors wrote: “the eugene public library reported ‘collision’ problems on very thin materials and on videos as well as false readings from the rfid security gates. collision problems mean that two or more tags are close enough to cancel the signals, making them undetectable by the rfid checkout and security systems.” warfield and lien wrote: “the eugene (ore.) public library reported ‘collision’ problems on very thin materials and on videos as well as ‘false readings’ from the rfid security gates. (collision problems mean that two or more tags are close enough to ‘cancel the signals,’ according to an american library association publication, making them undetectable by the rfid checkout and security systems.)” (accessed may 16, 2006, www .berkeleydailyplanet.com/article.cfm?archivedate=04-08 -05&storyid=21128). the authors’ research notes indicated that it was a near quotation, but this fact was lost in the writing of the article. the article referee, the copy editors, and i did not question the authors because earlier in the same paragraph they wrote about the eugene public library experience and referred (footnote 23) to an earlier article in the berkeley daily planet. the authors and i apologize for this unfortunate error. **** july 1, 2006 marked the merger of rlg and oclc. by the time this editorial appears, many words will already have been spoken and written about this monumental, twentyfirst century library event. i know what i think the three very important immediate effects of the merger will be. first, it is a giant step toward the realization of a global library bibliographic database. second, taking advantage of rlg’s unique and successful programs and integrating them and their development philosophy as “rlgprograms,” while working alongside oclc research, seems a step so important for the future development of library technology that it cannot be overemphasized. third, and very practically, incorporating redlightgreen into open worldcat will give the library world a product that users might prefer over a search of google books or amazon. i requested and received quotes about the merger from the principals that i might put into this editorial that won’t appear until four months after the may 3 announcement. jay jordan, president and ceo, oclc, remarked: “we have worked cooperatively with rlg on a variety of projects over the years. since we announced our plans to combine, staff from both organizations have been working together to develop plans and strategies to integrate systems, products, and services. over the past several months, staff members have demonstrated great mutual respect, energy, and enthusiasm for the potential of our new relationship and what it means for the organizations we serve. there is much work to be done as we complete this transition. clearly, we are off to a good start.” betsy wilson, chair, oclc board of trustees, and dean of libraries, university of washington, wrote: “the response from our constituencies has been overwhelmingly supportive. 
over the past several months, we have finalized appointments for the twelve-person program council, which reports to . . . oclc through a standing committee called the rlg board committee. we are starting to build agendas for our new alliance. the members of this group from the rlg board are: james neal, vice president for information services and university librarian, columbia university; nancy eaton, dean of university libraries and scholarly communication, penn state university (and former chair of the oclc board); and carol mandel, dean of libraries, new york university. from oclc the members are elisabeth niggeman, director, deutschesbibliothek; jane ryland, senior scientist, internet 2; and betsy wilson, dean of university libraries, university of washington.” and from james michalko, currently president and ceo of rlg, and by the time you read this, vice president, rlg-programs development, oclc: “we are combining the practices of rlg and oclc in a very powerful way— by putting together the traditions of rlg and oclc we are creating a robust new venue for research institutions and new capacity that will provide unique and beneficial outcomes to the whole community.” by now, all lita members and ital readers know that in 1967, fred kilgour founded oclc; and was the founding editor of the journal of library automation (jola—vol. 1, no. 1 was published in march, 1968), which, with but a mild outcry from serials librarians, changed its title to information technology and libraries in 1982. this afternoon (6/15/06), i called fred. he and his wife eleanor reminisced about the earliest days, and then i asked him for his comments on the oclc-rlg merger. because he had had the first words about both oclc and jola, as it were, i told him that i would like for him to have the last. and this is what he said, “at long last!” fred kilgour died on july 31, 2006, aged 92. a tribute posted by alane wilson of oclc may be read at http:// scanblog.blogspot.com/2006/07/frederick-g-kilgour -1914-2006.html editorial: a confession, a speculation, and a farewell john webb john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university and editor of information technology and libraries. 
an algorithm for variable-length proper-name compression james l. dolby: r & d consultants company, los altos, california viable on-line search systems require reasonable capabilities to automatically detect (and hopefully correct) variations between request format and stored format. an important requirement is the solution of the problem of matching proper names, not only because both input specifications and storage specifications are subject to error, but also because various transliteration schemes exist and can provide variant proper name forms in the same data base. this paper reviews several proper name matching schemes and provides an updated version of these schemes which tests out nicely on the proper name equivalence classes of a suburban telephone book.
an appendix lists the corpus of names used for algorithm test. a viable on-line search system cannot reasonably assume that each user will invariably provide the proper input information without error. human beings not only make errors, but also expect their correspondents, be they human or mechanical, to be able to cope with these errors, at least at some reasonable error-rate level. many of the difficulties in implementing computer systems in many areas of human activity stem from failure to recognize, and plan for, routine acceptance of errors in the systems. indeed, computing did not become the widespread activity it is now until the so-called higher-level languages came into being. although it is customary to think of higher-level languages as being "more english-like," the height of their level is better measured by the brevity with which various jobs can be expressed (for brevity tends to reduce errors) and the degree of sophistication of their automatic error detection and correction procedures. the processing of catalog information for the purposes of exposing and retrieving information presents at least two major areas for research in automatic error detection and correction. at the first stage, the data bank must be created, updated and maintained. methods for dealing with input errors at this level have been derived by a number of groups and it seems reasonable to assert that something in the order of 60% of the input errors can be detected automatically (1, 2, 3). with the possibility of human proofreading and error detection through actual use, it is reasonable to expect a mature data base to have a very low over-all error rate. at the second stage, however, when a user approaches the data base through a terminal or other on-line device, the errors will be of a recurring nature. each user will generate his own error set and, though experience will tend to minimize the error rate for a particular user, there will be an essentially irreducible minimum error rate even for an experienced user. if the system is to attract users other than professional interrogators, it must respond intelligently at this minimal error level. this paper explores certain problems associated with making "noisy matches" in catalog searches. because preliminary information indicates that the most likely source of input errors is in the keyboarding of proper names, the main emphasis of the paper is on the problem of algorithmically compressing proper names in such a way as to identify similar names (and likely misspellings) without over-identifying the list of possible authors.
existing name-compression algorithms
the problem of providing equivalence classes of proper names is hardly new. library catalogs, telephone directories and other major data bases have made use of "see-also"-type references for many years. some years ago remington-rand derived an alphanumeric name compression algorithm, soundex, that could be applied either by hand or by machine for such purposes (4). perhaps the most widely used on-line retrieval system presently in existence, the airline reservation system (such as sabre), makes use of such an algorithm (5). the closely related problem of compressing english words (either to establish noisy matches, to eliminate misspelled words, or simply to achieve data bank compression) has also received some attention (6, 7, 8). implementation of such algorithms has been described (9, 10, 11, 12, 13).
although english word structure differs from proper-name structure in some important respects (e.g., the existence of suffixes), three of the algorithms are constructed by giving varying degrees of attention to the following five areas of word structure: 1) the character in word initial position; 2) the character set: (a, e, i, o, u, y, h, w); 3) doubled characters (e.g., tt); 4) transformation of consonants (i.e., all alphabetic characters other than those in 2 above) into equivalence classes; 5) truncation of the residual character string. the word-initial character receives varying attention. soundex places the initial consonant in the initial position of the compressed form and then transforms all other consonants into equivalence classes with numeric titles. sabre maintains the word-initial character even if it is a vowel. in the armour research foundation scheme (arf), the word-initial character is also retained as is. both soundex and sabre eliminate all characters in the set 2) above. the arf scheme retains all characters in shorter words and deletes vowels only, to reduce the compressed form to four characters, deleting the "u" after "q," the second vowel in a vowel string, and then all remaining vowels. all three systems delete the second letter of a double-letter string. sabre goes a step further and deletes the second letter of a double-letter string occurring after the vowels have been deleted. thus, the second "r" of "bearer" would be deleted. soundex maps the eighteen consonants into six equivalence classes: 1) b, f, p, v; 2) c, g, j, k, q, s, x, z; 3) d, t; 4) l; 5) m, n; 6) r. sabre and arf do not perform any transformations on these eighteen consonants. finally, all three systems truncate the remaining string of characters to four characters. for shorter forms, padding in the form of zeros (soundex), blanks (sabre), or hyphens (arf) is added so that all codes are precisely four characters long. variable-length coding schemes have been considered but generally rejected for implementation on major systems because of the attendant difficulties of programming and the fact that code compression is enhanced by fixed-length codes where no interword space is necessary. although fixed-length schemes of length greater than four have been considered, no definitive data appears to be available as to the enhanced ability of compressed codes to discriminate by introduction of more characters. the sabre system does add a fifth character but makes use of the person's first initial for added discrimination. tukey (14) has constructed a personal author code for his citation indexing and permuted title studies on an extensive corpus of the statistical literature. in this situation the author code is a semi-mnemonic code in a tag form to assist the user in identification rather than to be used as a basic entry point. however, tukey does note that in his corpus a three-character code of the surname, plus two initials, is superior to a five-character surname code for purposes of unique identification.
measuring algorithmic performance
one of the main problems in constructing linguistic algorithms is to decide on appropriate measures of performance and to obtain data bases for implementing such measures. in this case it is clear that certain improvements in existing algorithms can be made, particularly by using more sophisticated transformation rules for the consonants, and that
the problems of implementing such changes are not so great in today's context as they were when the systems noted above were originally derived. improvements in processing speeds and programming languages, however, do not remove the need for keeping "linguistic frills" to a minimum. ideally, it would be desirable to have a list of common errors in keyboarding names as a test basis for any proposed algorithms. unfortunately, no such list of sufficient size appears to be available. lacking this, one can speculate that certain formal properties of the predictability of language might be useful in deriving an algorithm. at the english word level, some effort has been made to exploit measures of entropy as developed by shannon in this direction (6, 7). however, there is good reason to question whether entropy, at least when measured in the usual way, is strongly correlated with actually occurring errors (15). as an alternative, one can study existing lists of personal-name equivalence classes to derive such algorithms and then test the algorithm against such classes, measuring both the degree of over-identification and the degree of under-identification. clearly, such tests will carry more weight if they are conducted under economic forcing conditions where weaknesses in the test set will lead to real and measurable expense to the organization publishing the list. the sabre system operates under strong economic forcing conditions in the sense that airline passengers frequently have a number of competitive alternatives available to them and lost reservations can cause sufficient inconvenience for them to consider these alternatives. however, the main application of the sabre system is to rather small groups of persons (at least when compared to the number of personal authors in a typical library catalog), so that errors of over-identification are essentially trivial in cost to the airlines. a readily available source of "see-also"-type equivalence classes of proper names is given in the telephone directory system. here, the economic forcing system is not so strong as in the airline situation, but it is measurable in that failure to provide an adequate list will lead to increased user dependence on the information operator, with consequent increased cost to the telephone company. as a test of the feasibility of using such a set of equivalence classes, the 451 classes found in the palo alto-los altos (california) telephone directory were copied out by hand and used in deriving and testing the algorithm given in the next section and the soundex algorithm. there remains the question of deciding what is to constitute proper agreement between any algorithm and the set of equivalence classes chosen as a data base. at the grossest level it seems reasonable to argue that over-identification is less serious than under-identification. false drops only tend to clog the line. lost reference points, on the other hand, lead to lost information. investigation of other applications of linguistic algorithms, such as algorithms to hyphenate words, identify semantically similar words through cutting off of suffixes, and so forth, indicates that it is usually possible to reduce crucial error (in this case under-identification) to something under 5%, while preserving something in the order of 80% of the original distinctions (or efficiency) of the system.
efforts to improve materially on the "five-and-eighty" rule generally lead to solutions involving larger context and/or extensive exception dictionaries. in this study efforts are directed at achieving a "five-and-eighty" solution.
a variable-length name-compression scheme
in light of the fact that no definitive information is available on the problems of truncating errors in name-compression algorithms, it is convenient to break the problem into two pieces. first is derivation of a variable-length algorithm of the required accuracy and efficiency and then determination of the errors induced by truncation. a study of the set of equivalence classes given in the palo alto-los altos telephone directory made fairly clear that with minor modifications of the basic five steps used in the other algorithms noted above, it would not be too difficult to provide a reasonably accurate match without requiring too much over-identification. the main modifications made consisted of maintaining the position of the first vowel and using local context to make transformations on the consonants. the algorithm is given below. (the rules given must be applied in the order given both with respect to the rules themselves and to the order of the lists within the rules, as the precedence relations are important to the performance of the algorithm.)
a spelling equivalent abbreviation algorithm for personal names
1) transform: "mcg" to "mk", "mag" to "mk", "mac" to "mk", "mc" to "mk".
2) working from the right, recursively delete the second letter from the following letter pairs: "dt", "ld", "nd", "nt", "rc", "rd", "rt", "sc", "sk", "st".
3) transform: "x" to "ks", "ce" to "se", "ci" to "si", "cy" to "sy", "consonant-ch" to "consonant-sh"; all other occurrences of "c" to "k", "z" to "s", "wr" to "r", "dg" to "g", "qu" to "k", "t" to "d", "ph" to "f" (after the first letter).
4) delete all consonants other than "l", "n", and "r" which precede the letter "k" (after the first letter).
5) delete one letter from any doubled consonant.
6) transform "pf#" to "p#", "#pf" to "#f", "vowel-gh#" to "vowel-f#", "consonant-gh" to "consonant-g", and delete all other occurrences of "gh". ("#" is the word-beginning and word-ending marker.)
7) replace the first vowel in the name by the symbol "•".
8) delete all remaining vowels.
9) delete all occurrences of "w" or "h" after the first letter in the word.
the vowels are taken to be (a, e, i, o, u, y). the remaining literal characters are treated as consonants. the algorithm splits 22 (4.9%) of the 451 equivalence classes given by the phone directory. on the other hand, the algorithm provides 349 distinct classes (not counting those classes that were broken off in error) or 77.4% of the 451 classes in the telephone directory data base. thus a reasonable approximation has been achieved to the "five-and-eighty" performance found in other linguistic problem areas. to give a proper appreciation of the nature of these under-identification errors, they are discussed below individually.
1) the name bryer is put in the same equivalence class with a variety of spellings of the name bear. the algorithm fails to make this identification.
2) blagburn is not equated to blackburn.
3) the name davison is equated to davidson in its various forms.
the algorithm fails to make this identification and this appears to be one of a modest class of difficulties that occur prior to the -son, -sen names.
4) the class of names dickinson, dickerson, dickison, and dickenson are all equated by the directory but kept separate, except for the two forms of dickinson, by the algorithm.
5) the name holm is not equated with the name home.
6) the name holmes is not equated with the name homes.
7) the algorithm fails to equate jaeger with two forms of yaeger.
8) the algorithm fails to equate lamb with lamn.
9) the algorithm incorrectly assumes that the final "gh" of leigh should be treated as an "f". treating final "gh" either as a null sound or an "f" leads to about the same number of errors in either direction.
10) the algorithm fails on the pairing of leicester and lester. the difficulty is an intervening vowel.
11) the algorithm fails to equate the various forms of lindsay with the forms of lindsley.
12) the algorithm fails to equate the various forms of mclaughlin with mclachlan.
13) the algorithm fails to equate mccullogh with mccullah. this is again the final "gh" problem.
14) the algorithm fails to equate mccue with mchugh (again the final "gh" problem).
15) the algorithm fails to equate moretton with morton. this is an intervening vowel problem.
16) the algorithm fails to equate rauch with roush.
17) the algorithm fails to equate robinson with robison (another -son type problem).
18) the algorithm incorrectly assumes that the interior "ph" of shepherd is an "f".
19) the algorithm fails to equate speer with speier.
20) the algorithm fails to equate stevens with stephens.
21) the algorithm fails to equate stevenson with stephenson.
22) the algorithm fails to equate the various forms of the word thompson (an -son problem).
in several of the errors noted above it may be questioned whether the telephone directory is following its own procedures with complete rigor. setting these aside, the primary errors occur with the final "gh," the words ending in "son," and the words with the extraneous interior vowels. each of these problems can be resolved to any desired degree of accuracy, but only at the expense of noticeable increases in the degree of complexity of the algorithm.
the truncation problem
simple truncation does not introduce errors of under-identification; it can only lead to further over-identification. examination of the results of applying the algorithm to the telephone directory data base shows that no new over-identification is introduced if the compressed codes are all reduced to the leftmost seven characters. further truncation leads to the following results:
code length: 7, 6, 5, 4
cumulative over-identification losses: 0, 1, 6, 45
thus there is a strong argument for maintaining at least five characters in the compressed code. however, there is no real need for restriction to simple truncation. following the procedures used in the arf system, further truncation can be obtained by selectively removing some of the remaining characters. the natural candidate for such removal is the vowel marker. if the vowel marker is removed from all the five-character codes, only six more over-identification errors are introduced. removal of the vowel markers from all of the codes would have introduced 17 more errors of over-identification. the utility of the vowel marker is in the short codes.
this in turn suggests that introduction of a second vowel marker in the very short codes may have some utility, and this is indeed the case. if the conception of vowel marker is generalized as marking the position of a vowel-string (i.e., a string of consecutive vowels), where for these purposes a vowel is any of the characters (a, e, i, o, u, y, h, w), and these markers are maintained as "padding" in the very short words, 18 errors of over-identification are eliminated at the cost of two new errors of under-identification. in this way the following modification to the variable-length algorithm is derived:
1) mark the position of each of the first two vowel strings with an "•", if there is more than one vowel.
2) truncate to six characters.
3) if the six-character code has two vowel markers, remove the right-hand vowel marker. otherwise, truncate the sixth character.
4) if the resulting five-character code has a vowel marker, remove it. otherwise remove the fifth character.
5) for all codes having less than four characters in the variable-length form, pad to four characters by adding blanks to the right.
measured against the telephone directory data base, this fixed-length compression code provides 361 distinct classes (not counting improper class splits as separate classes) or 80% of the 451 given classes. twenty-four (5.3%) of the classes are improperly split. by way of comparison, the soundex system improperly splits 135 classes (30%) and provides only 287 distinct classes (not counting improperly split classes), or 63.8% of the telephone directory data base.
acknowledgments
this research was carried out for the institute of library research, university of california, under the sponsorship of the office of education, research grant no. oeg-1-7-071083-5068. the author would like to thank ralph m. shoffner and kelley l. cartwright for suggesting the problem and for a number of useful comments on existing systems. allan j. humphrey was kind enough to program the variable-length version of the algorithm for test purposes.
appendix: corpus of names used for algorithm test
a list of personal-name equivalence classes from the palo alto-los altos telephone directory is arranged according to the variable-length compression code (with the vowel marked "•" treated as an "a" for ordering). names whose compressed codes do not match the one given in the first column (and hence represent weaknesses in the algorithm and/or the directory groupings) are given in italics. a small number of directory entries that do not bear on the immediate problem have been deleted from the list: bell's see also bells; co-op see also co-operative; st. see also saint; etc.
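before the corpus listing that follows, it may help to see the scheme in executable form. the following is a rough python sketch, prepared for illustration only, of the nine variable-length rules given above, together with a small scoring helper in the spirit of the "five-and-eighty" measure (split classes as under-identification, distinct codes as retained distinctions). it is not dolby's original program: the function names, the use of "*" in place of the "•" marker, the exact recursion in rule 2, and the simplified "gh"/"pf" handling in rule 6 are assumptions made for this sketch.

```python
# illustrative reconstruction of the variable-length compression rules above;
# "*" stands in for the "•" vowel marker used in the paper.
import re

def compress(name: str) -> str:
    w = re.sub(r"[^a-z]", "", name.lower())
    if not w:
        return w
    # 1) initial "mcg", "mag", "mac", "mc" -> "mk"
    w = re.sub(r"^(mcg|mag|mac|mc)", "mk", w)
    # 2) working from the right, recursively delete the second letter of the
    #    pairs dt, ld, nd, nt, rc, rd, rt, sc, sk, st
    pairs = ("dt", "ld", "nd", "nt", "rc", "rd", "rt", "sc", "sk", "st")
    changed = True
    while changed:
        changed = False
        for i in range(len(w) - 2, -1, -1):
            if w[i:i + 2] in pairs:
                w = w[:i + 1] + w[i + 2:]
                changed = True
                break
    # 3) context-sensitive consonant transformations
    w = w.replace("x", "ks")
    w = re.sub(r"c(?=[eiy])", "s", w)             # ce, ci, cy -> se, si, sy
    w = re.sub(r"(?<=[^aeiouy])ch", "sh", w)      # consonant-ch -> consonant-sh
    w = (w.replace("c", "k").replace("z", "s").replace("wr", "r")
          .replace("dg", "g").replace("qu", "k").replace("ph", "f"))
    w = w[0] + w[1:].replace("t", "d")            # t -> d after the first letter
    # 4) delete consonants other than l, n, r that precede "k" (after 1st letter)
    w = w[0] + re.sub(r"[^aeiouylnr](?=k)", "", w[1:])
    # 5) delete one letter from any doubled consonant
    w = re.sub(r"([^aeiouy])\1", r"\1", w)
    # 6) simplified "pf" / "gh" handling
    w = re.sub(r"pf$", "p", re.sub(r"^pf", "f", w))
    w = re.sub(r"(?<=[aeiouy])gh$", "f", w)
    w = w.replace("gh", "")
    # 7)-8) mark the first vowel with "*" and delete the remaining vowels
    m = re.search(r"[aeiouy]", w)
    if m:
        i = m.start()
        w = w[:i] + "*" + re.sub(r"[aeiouy]", "", w[i + 1:])
    # 9) delete "w" and "h" after the first letter
    return w[0] + re.sub(r"[wh]", "", w[1:])

def score(classes, fn):
    """count split classes (under-identification) and distinct codes retained."""
    split = sum(1 for names in classes if len({fn(n) for n in names}) > 1)
    distinct = len({fn(n) for names in classes for n in names})
    return split, distinct

if __name__ == "__main__":
    groups = [["kelley", "kelly"], ["hoffman", "huffman"],
              ["schmidt", "smith", "smyth"], ["stevens", "stephens"]]
    for g in groups:
        print(g, [compress(n) for n in g])
    print("split classes, distinct codes:", score(groups, compress))
```

running the sketch on a few of the directory groups shows, for example, that kelley/kelly, hoffman/huffman, and schmidt/smith collapse to common codes, while stevens/stephens do not, which is consistent with under-identification error 20 above.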
0 bl abel, abele, abell, able 0 brms abrahams, abrams 0 brmsn abrahamson, abramson •d eddy, eddie 0 dmns edmonds, edmunds 0 dmnsn edmondson, edmundson 0 dms adams, addems 0 gn eagen, egan, eggen 0 gr jaeger, yaeger, yeager °kn aiken, aikin, aitken °kns adkins, akins °kr okr ·ks 0 lbrd ·ln 0 ln 0 lsn 0 lvr •ms 0 ngl 0 nl 0 nrs 0 nrsn •ns 0 rksn 0 rl 0 rn •rns •rs 0 rvn 0 rvng 0 sbrn b•n b•ns b°kmn b0 l b0 l b0 l b0 l b.l b 0 ln b·m b 0 mn b•n b0 nd b·r b0 r b•r b•r b 0 rbr b•rc b 0 rgr b 0 rk b 0 rn algorithm for name compressionjdolby 265 acker, aker eckard, eckardt, eckart, eckert, eckhardt oakes, oaks, ochs albright, allbright elliot, elliott allan, allen, allyn ohlsen, olesen, olsen, olson, olsson oliveira, olivera, olivero ames, eames engel, engle, ingle o'neal, o'neil, o'neill andrews, andrus andersen, anderson, andreasen ennis, enos enrichsen, erickson, ericson, ericsson, eriksen earley, early erwin, irwin aarons, ahrends, ahrens, arens, arentz, arons ayers, ayres ervin, ervine, irvin, irvine erving, irving osborn, osborne, osbourne, osburn beatie, beattie, beatty, beaty, beedie betts, betz bachman, bachmann, backman bailey, baillie, bailly, baily, bayley beal, beale, beall, biehl belew, ballou, bellew buhl, buell belle, bell bolton, boulton baum, bohm, bohme bauman, bowman bain, bane, bayne bennet, bennett baer, bahr, baier, bair, bare, bear, beare, behr, beier, bier, bryer barry, beare, beery, berry bauer, baur, bower bird, burd, byrd barbour, barber berg, bergh, burge berger, burger boerke, birk, bourke, burk, burke burn, byrne 266 journal of library automation vol. 3/4 december, 1970 b 0 rnr b 0 rns b 0 rnsn b0 rs bl°kbrn bl 0 m br 0 d br 0 n br 0 n d 0 ds d°f d 0 gn d°k n•knsn n•ksn n•l n•l n•l d 0 mn n•n n•n n•n n•n n•n d0 nl d.r n•r d 0 rm d 0 vdsn n•vs dr0 sl f• f°fr f 0 gn f 0 l f 0 l f 0 lknr f 0 lps f 0 ngn f 0 nl f0 rl f 0 rr f 0 rr f 0 rs bernard, bernhard, bernhardt, bernhart berns, bims, burns, byrns, byrnes bernstein, bornstein bertsch, birch, burch blackburn, blagburn blom, bloom, bluhm, blum, blume brode, brodie, brody braun, brown, browne brand, brandt, brant diezt, ditz duffie, duffy dougan, dugan, duggan dickey, dicke dickenson, dickerson, dickinson, dickison dickson, dixon, dixson dailey, daily, daley, daly dahl, dahle, dall, doll deahl, deal, diehl diamond, dimond, dymond dean, deane, deen denney, denny donahoo, donahue, donoho, donohoe, donohoo, donohue, dunnahoo downey, downie dunn, dunne donley, donnelley, donnelly daugherty, doherty, dougherty dyar, dyer derham, durham davidsen, davidson, davison davies, davis driscoll, driskell fay, fahay, fahey fifer, pfeffer, pfeiffer fagan, feigan, fegan feil; pfeil feld, feldt, felt faulkner, falconer philips, phillips finnegan, finnigan finlay, finley farrell, ferrell ferrara, ferreira, ferriera foerster, forester, forrester, forster forrest, forest f 0 rs f 0 rs f 0 sr fl 0 n fl 0 ngn fr0 fr0 dmn fr0 drksn fr°k fr0 ns fr0 ns fr 0 s fr0 sr g0 d g0 ds g°f g0 l g0 lmr g0 lr g0 ms g0 nr g 0 nsls g0 nslvs g0 rd c•rn g 0 rn g 0 rnr c•rr g 0 s gr 0 gr.fd gr0 n gr•s h•n h°f h°fmn h0 g h 0 gn h°k h°ksn h 0 l h•l h•l h0 l h 0 ld algorithm for name compressionjdolby 267 faris, farriss, ferris, ferriss first, fuerst, furst fischer, fisher flinn, flynn flanagan, flanigan, flannigan frei, frey, fry, frye freedman, friedman frederickson, frederiksen, fredickson, fredriksson franck, frank france, frantz, franz frances, francis freeze, freese, fries fraser, frasier, frazer, frazier good, goode getz, goetz, goetze goff, gough gold, 
goold, gould gilmer, gilmore, gilmour gallagher, gallaher, galleher gomes, gomez guenther, gunther gonzales, gonzalez consalves, gonzalves garratt, garrett garrity, geraghty, geraty, gerrity gorden, gordohn, gordon gardiner, gardner, gartner garrard, gerard, gerrard, girard gauss, goss gray, grey griffeth, griffith green, greene gros, grose, gross hyde, heidt hoff, hough, huff hoffman, hoffmann, hofman, hofmann, huffman hoag, hoge, hogue hagan, hagen hauch, hauck, hauk, hauke hutcheson, hutchison holley, holly holl, hall halley, haley haile, hale holiday, halliday, holladay, holliday i 268 journal of libra1·y automation vol. 3/4 december, 1970 h 0 lg h 0 lm h 0 lms h 0 ln h0 m h 0 mr h 0 n h 0 n h0 nn h 0 nrks h 0 nrksn h0 ns h0 ns i-jonsn h 0 r h 0 r h 0 r h 0 r h 0 rmn h 0 rmn h 0 rmn h0 rn h 0 rn h 0 rn h 0 rngdn h 0 s h 0 s h 0 s h 0 sn h 0 vr r tfr rfrs tkb rkbsn rks rl rms rmsn rnsn rs ko k°f k°fmn helwig, hellwig holm, home holmes, homes highland, hyland ham, hamm hammar, hammer hanna, hannah hahn, hahne, harm, haun hanan, hannan, hannon hendricks, hendrix, henriques hendrickson, henriksen, henrikson heintz, heinz, heinze, hindes, hinds, hines, hinze haines, haynes henson, hansen, hanson, hanssen, hansson, hanszen herd, heard, hird, hurd hart, hardt, harte, heart hare, hair hardey, hardie, hardy hartman, hardmen, hardman, hartmann herman, hermann, herrmann harman, harmon heron, herrin, herron hardin, harden hom, horne herrington, harrington haas, haase, hasse howes, house, howse hays, hayes houston, huston hoover, hover jew, jue jeffery, jeffrey jefferies, jefferis, jefferys, jeffreys jacobi, jacoby jacobsen, jacobson, jackobsen jacques, jacks, jaques jewell, juhl jaimes, james jameson, jamieson, jamison jahnsen, jansen, jansohn, janssen, jansson, janzen, jensen, jenson joice, joyce kay, kaye coffee, coffey coffman, kauffman, kaufman, kaufmann k°k k0 l k0 l k0 lmn k0 lr k0 mbrln k 0 mbs k0 mp k0 mps k0 n k0 n k0 n k0 n k0 n k0 n k0 n k 0 nl k 0 nr k0 ns k0 p k0 pl k0 r k0 r k0 r k0 r k0 r k 0 rd k0 rln k 0 rn k0 rsnr k0 s k0 s k0 s k0 sl k0 slr k 0 sr kl 0 n kl.,rk kl 0 sn kr 0 kr 0 gr kr.,mr kr 0 n kr 0 s kr 0 s algor·ithm. for name compressionfdolby 269 cook, cooke, koch, koche cole, kohl, koll kelley, kelly coleman, cohnan koehler, koeller, kohler, koller chamberlain, chamberlin combs, coombes, coombs camp, kampe, kampf campos, campus cahn, conn, kahn cahen, cain, caine, cane, kain, kane chin, chinn chaney, cheney coen, cohan, cohen, cohn, cone, koehn, kahn coon, kuhn, kuhne kenney, kenny, kinney conley, conly, connelly, connolly conner, connor coons, koontz, kuhns, kuns, kuntz, kunz coop, co-op, coope, coupe, koop chapel, chapell, chappel, chappell, chappelle, chapple carrie, carey, cary corey, cory carr, kar, karr kurtz, kurz kehr, ker, kerr cartwright, cortright carleton, carlton carney, cerney, kearney kirschner, kirchner chace, chase cass, kass kees, keyes, keys cassel, cassell, castle kesler, kessler, kestler kaiser, kayser, keizer, keyser, kieser, kiser, kizer cline, klein, kleine, kline clark, clarke claussen, clausen, clawson, closson crow, crowe krieger, kroeger, krueger, kruger creamer, cramer, kraemer, kl·amer, kremer craine, crane christie, christy, kristee crouss, kraus, krausch, krause, krouse 270 journal of library automation vol. 
3/4 december, 1970 kr 0 s kr 0 s kr 0 snsn lo lo l 0 d l 0 dl l 0 drmn l°k l°ks l 0 ln l 0 lr l 0 mb l 0 mn l 0 mn l0 n l0 n l0 n l0 n l 0 ng l 0 nn l 0 ns l 0 r l 0 rns l 0 rns l 0 rsn l 0 s l 0 s l 0 sr l0 v l 0 vd l 0 vl l 0 vn m 0 d m 0 dn m0 ds m 0 dsn m°kl m°km m°ks m°ks m 0 ln m 0 ln m 0 lr m 0 lr cross, krost crews, cruz, kruse christensen, christiansen, christianson loe, loewe, low, lowe lea, lee, leigh lloyd, loyd litle, littell, little, lytle ledterman, letterman leach, leech, leitch lucas, lukas laughlin, loughlin lawler, lawlor lamb, lamm lemen, lemmon, lemon layman, lehman, lehmann lind, lynd, lynde lion, lyon lin, linn, lynn, lynne lain, laine, laing, lane, layne lang, lange london, lundin lindsay, lindsey, lindsley, linsley lawry, lowery, lowrey, lowry lawrence, lowrance laurence, lawrance, lawrence, lorence, lorenz larsen, larson lewis, louis, luis, luiz lacey, lacy leicester, lester levey, levi, levy leavett, leavitt, levit lavell, lavelle, leavelle, loveall, lovell lavin, levin, levine mead, meade m oretton, morton mathews, matthews madison, madsen, matson, matteson, mattison, mattson michael, michel meacham, mechem marques, marquez, marquis, marquiss marcks, marks, marx maloney, moloney, molony mullan, mullen, mullin mallery, mallory moeller, moller, mueller, muller m0 lr m 0 ls m 0 n m0 nr m0 nr m0 nsn m 0 r m 0 r m0 r m0 r m0 r m0 rf m0 rl m 0 rn m 0 rs m0 rs mk0 mk0 mk0 mk 0 mk 0 l mk 0 lf mk 0 lm mk 0 n mk 0 nr mk 0 ns mk0 ns mk0 r mk0 r mkd 0 nl mkf 0 rln mkf 0 rsn mkl 0 d mkl 0 kln mkl 0 ln mkl 0 n mkl•n mkl 0 s mkm 0 ln mkn°l mkr•o n°kl n°kls n°kls algorithm for name compressionjdolby 271 millar, miller miles, myles mahan, mann miner, minor monroe, munro monson, munson murray, murrey maher, maier, mayer mohr, moor, moore meyers, myers meier, meyer, mieir, myhre murphey, murphy merrell, merrill marten, martin, martine, martyn meyers, myers maurice, morris, morse mccoy, mccaughey magee, mcgee, mcgehee, mcghie mackey, mackay, mackie, mckay mccue, mchugh magill, mcgill mccollough, mccullah, mccullough mccallum, mccollum, mccolm mckenney, mckinney macintyre, mcentire, mcintire, mcintyre mackenzie, mckenzie maginnis, mcginnis, mcguinness, mcinnes, mcinnis maguire, mcguire mccarthy, mccarty macdonald, mcdonald, mcdonnell macfarland, macfarlane, mcfarland, mcfarlane macpherson, mcpherson macleod, mccloud, mcleod maclachlan, maclachlin, mclachlan, mclaughlin, mcloughlin mcclellan, mcclelland, mclellan mcclain, mcclaine, mclain, mclane maclean, mcclean, mclean mccloskey, mcclosky, mccluskey macmillan, mcmillan, mcmillin macneal, mcneal, mcneil, mcneill magrath, mcgrath nichol, nicholl, nickel, nickle, nicol, nicoll nicholls, nichols, nickels, nickles, nicols nicholas, nicolas 272 journal of library automation vol. 
3/4 d ecember, 1970 n°klsn n°ksn n°l n°lsn n°mn n°rs n°sbd p•n p 0 drsn p•c p 0 lk p0 lsn p•n p•r p•r p0 rk p 0 rks p•rs r•rs p•rs p 0 rsn pr°kr pr 0 ns pr 0 r r• r• r 0 bnsn r•n r•n r 0 d r 0 dr r•ns r 0 gn r•gr r°k r°k r°kr n•l r0 mngtn r0 mr n•ms n•n r0 nr r•s nicholsen, nicholson, nicolaisen, nicolson nickson, nixon neal, neale, neall, neel, neil, neill neilsen, neilson, nelsen, nelson, nielsen, nielson, nilson, nilssen, nilsson neumann, newman norris, nourse nesbit, nesbitt, nisbet pettee, petty peterson, pederson, pedersen, petersen, petterson page, paige polak, pollack, pollak, pollock polson, paulsen, paulson, poulsen, poulsson paine, payn, payne parry, perry parr, paar park, parke parks, parkes pierce, pearce, peirce, piers parish, parrish paris, parris pierson, pearson, pehrson, peirson prichard, pritchard prince, prinz prior, pryor roe, rowe rae, ray, raye, rea, rey, wray robinson, robison rothe, roth rudd, rood, rude reed, read, reade, reid rider, ryder rhoades, rhoads, rhodes regan, ragon, reagan rodgers, rogers richey, ritchey, ritchie reich, reiche reichardt, richert, rickard reilley, reilly, reilli, riley remington, rimington reamer, reimer, riemer, rimmer ramsay, ramsey rhein, rhine, ryan reinhard, reinhardt, reinhart, rhinehart, rinehart reas, reece, rees, reese, reis, reiss, ries r0 s r0 s r0 s r•vs s•br s°fl s•fn s°fns s°fnsn s°fr s°fr s•cl s 0 glr s•k s•ks s•l s•l s•lr s•ls s•lv s•lvr s 0 mkr s 0 mn s 0 mn s•mrs s·ms s•n s 0 n s 0 nr s0 nrs s 0 pr s·r s·r s·r s 0 r s0 r s•rl s 0 rlng s•rmn s0 rn s•rr sos sm 0 d algorithm for name compressionjdolby 273 rauch, rausch, roach, roche, roush rush, rusch russ, rus reaves, reeves seibert, siebert schofield, scofield stefan, steffan, steffen, stephan, stephen steffens, stephens, stevens steffensen, steffenson, stephenson, stevenson schaefer, schaeffer, schafer, schaffer, schafer, shaffer, sheaffer stauffer, stouffer siegal, sigal sigler, ziegler schuck, shuck sachs, sacks, saks, sax, saxe seeley, seely, seley schell, shell schuler, schuller schultz, schultze, schulz, schulze, shults, shultz silva, sylva silveira, silvera, silveria schomaker, schumacher, schumaker, shoemaker, shumaker simon, symon seaman, seemann, semon somers, sommars, sommers, summers simms, sims stein, stine sweeney, sweeny, sweney senter, center sanders, saunders shepard, shephard, shepheard, shepherd, sheppard stahr, star, starr stewart, stuart storey, story saier, sayre schwartz, schwarz, schwarze, swartz schirle, shirley sterling, stirling scheuermann, schurman, sherman stearn, stem scherer, shearer, sharer, sherer, sheerer sousa, souza smith, smyth, smythe 274 journal of library automation vol. 
3/4 december, 1970 sm 0 d sn°dr sn°l sp 0 lng sp 0 r sp 0 r sr 0 dr sr0 dr t0 d t 0 msn t0 rl tr 0 s v·l v·l v·r w•o w 0 dkr w·nl w·nmn w 0 dr w 0 drs w 0 gnr w 0 l w 0 l w 0 l w 0 lbr w 0 lf w 0 lkns w 0 lks w 0 ln w 0 lr w 0 lrs w 0 ls w 0 ls w 0 ls w 0 lsn w 0 n w 0 r w 0 r w 0 rl w 0 rnr w 0 s w·smn schmid, schmidt, schmit, schmitt, smit schneider, schnieder, snaider, snider, snyder schnell, snell spalding, spaulding spear, speer, speirer spears, speers schroder, schroeder, schroeter schrader, shrader tait, tate thomason, thompson, thomsen, thomson, tomson terrel, terrell, terrill tracey, tracy vail, vaile, vale valley, valle vieira, vierra white, wight whitacre, whitaker, whiteaker, whittaker whiteley, whitley whitman, wittman woodard, woodward waters, watters wagener, waggener, wagoner, wagner, wegner, waggoner willey, willi wiley, wylie wahl, wall wilber, wilbur wolf, wolfe, wolff, woolf, woulfe, wulf, wulff wilkens, wilkins wilkes, wilks whalen, whelan walter, walther, wolter walters, walthers, wolters wallace, wallis welch, welsh welles, wells willson, wilson winn, wynn, wynne worth, wirth ware, wear, weir, wier wehrle, wehrlie, werle, worley warner, werner weis, weiss, wiese, wise, wyss weismann, weissman, weseman, wiseman, wismonn, wissman algorithm for name compressionjdolby 275 references 1. cox, n.s.m.; dolby, j. l.: "structured linguistic data and the automatic detection of errors." in advances in computer typesetting (london: institute of printing, 1966), pp. 122-125. 2. cox, n.s.m.; dews, j. d.; dolby, j. l.,: the computer and the library (hamden, conn.: archon press, 1967). 3. dolby, j. l.; forsyth, v. j.; resnikoff, h. l.: computerized library catalogs: their growth, cost and utility (cambridge, massachusetts: the m.i.t. press, 1969) . 4. becker, joseph; hayes, robert m. : information storage and retrieval (new york: wiley, 1963 ), p. 143. 5. davidson, leon: "retrieval of misspelled names in airlines passenger record system," communications of the acm, 5 (1962), 169-171. 6. blair, c. r.: "a program for correcting spelling errors," information & control, 3 ( 1960), 60-67. 7. schwartz, e. s.: an adaptive information transmission system employing minimum redundancy word codes (armour research foundation report, april 1962). (ad 274-135). 8. bourne, c. p.; ford, d.: "a study of methods for systematically abbreviating english words and names," journal of the acm, 8 ( 1961), 538-552. 9. kessler, m. m., "the "on-line" technical information system at m.i.t.", in 1967 ieee international convention record. (new york: institute of electrical and electronic engineers, 1967), pp. 40-43. 10. kilgour, f. g.: "retrieval of single entries from a computerized library catalog file," american society for information science, proceedings, 5 ( 1968), 133-136. 11. nugent, w. r.: "compression word coding techniques for information retrieval," journal of library automation, 1 (december 1968), 250-260. 12. rothrock, h. i.: computer-assisted directory search; a dissertation in electrical engineering. (philadelphia: university of pennsylvania, 1968). 13. ruecking, f. h.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-238. 14. tukey, j. w.: a tagging system for journal articles and other citable items: a status report (princeton, n.j.: statistical techniques research group, princeton university, 1963). 15. resnikoff, h. l.; dolby, j. 
l.: a proposal to construct a linguistic and statistical programming system (los altos, cal.: r & d consultants company, 1967). the next generation integrated library system: a promise fulfilled? yongming wang and trevor a. dawes information technology and libraries | september 2012 76 abstract the adoption of integrated library systems (ils) became prevalent in the 1980s and 1990s as libraries began or continued to automate their processes. these systems enabled library staff to work, in many cases, more efficiently than they had in the past. however, these systems were also restrictive—especially as the nature of the work began to change—largely in response to the growth of electronic and digital resources that they were not designed to manage. new library systems—the second (or next) generation—are needed to effectively manage the processes of acquiring, describing, and making available all library resources. this article examines the state of library systems today and describes the features needed in a next-generation library system. the authors also examine some of the next-generation library systems currently in development that purport to fill the changing needs of libraries. introduction since the late 1980s and early 1990s, the library automation system has gone from inception to rapid implementation to near ubiquitous adoption. but after two decades of changes in information technology, and especially in the last decade, the library has seen itself facing tremendous changes in terms of both the resources and the services it provides. on the resource side, print material and physical items are no longer the dominant collections; electronic resources are fast outpacing physical materials to become the dominant library resources, especially in academic and special libraries. in addition, many other digital format resources, such as digital collections, institutional repositories, and e-books, have taken root. on the service front, library users—accustomed to immediate and instant searching, finding, and accessing information in the google age—demand more and more instant and easy access to library resources and services. but the library automation system, also called the integrated library system (ils), has not changed much for the past two decades. it finds itself uneasily handling the ever-changing library environment and workflow. library staff become ever more frustrated with the ils, noting its inadequacy in dealing with their daily jobs. library users are confused by the many interfaces and complexity of library applications and systems. it is obvious that we are at the tipping point for a dramatic change in the area of library automation systems. the library literature has been referring to these new systems as second-generation library automation systems or next-generation library systems.1 two pillars of the second-generation library automation system are: (1) it will manage library resources in a comprehensive and unified way regardless of resource format and location; and (2) it will break away from the traditional ils models and build on the service-oriented architecture (soa) model. yongming wang (wangyo@tcnj.edu) is systems librarian for the college of new jersey library, ewing township, and trevor dawes (tdawes@princeton.edu) is access services & circulation librarian, princeton university libraries, princeton, new jersey. we are at the beginning of a new era of library automation systems.
some library system vendors have realized the need to change and have started to develop and implement the secondgeneration library automation system. we believe that the concept and implementation of the new library automation system will catch on quickly among the all types of libraries. it will change how the library conducts its business and will benefit both library staff and users. literature review there is not much research literature on the subject to date. after more than a decade of library automation development and implementation, starting in the late 1990s, libraries have been facing the challenges ushered in by rapidly evolving internet and web 2.0 technologies in addition to the growing number of savvy web users. libraries found themselves lagging behind other sources (such as internet search engines) in meeting users’ information needs, and library staff members are generally frustrated by the lack of flexibility of traditional library systems. as early as 2007, marshall breeding pointed out that “as librarians continue to operate with sparse resources, performing ever more services with ever more diverse collections—but with no increases in staff—it’s more important than ever to have automation tools that provide the most effective assistance possible.”2 in his 2009 article, he deliberately says that “dissatisfaction with the current slate of ils products runs high. the areas of concern lie in their inability to manage electronic content and with their user interfaces that do not fare well against contemporary expectations of the web.”3 so what are the trends in libraries for the last decade in terms of library resources, collections, services, and resource discoveries? according to breeding, there are three trends: “1. increased digital collections; 2. changed expectations regarding interfaces; 3. shifted attitudes toward data and software.”4 andrew pace notes that “web-based content, licensed resources, born-digital documents, and institutionally significant digital collections emerged rapidly to overtake the effort required to maintain print collections, especially in academic libraries.”5 another noticeable trend in the library technology field is occurring along with a similar trend in the general information technology field, that is, the open-source software movement. as pace states, “open source software (oss) efforts such as the open archive initiative (oai), dspace, and koha—just to name a few, as an exhaustive list would overwhelm the reader—challenged commercial proprietary systems, not only for market share but often in terms of sophistication and functionality.”6 as for the infrastructure and features of the second-generation library automation system, both breeding and pace have their respective visions. breeding writes that “the next generation of library automation systems needs to be designed to match the workflows of today’s libraries, which manage both digital and print resources.”7 “one of the fundamental assumptions of the next generation library automation would involve a design to accommodate the hybrid physical and digital existence that libraries face today.”8 pace specifically requires that the next-generation library automation system should use the web as a platform to fulfill the notion of software-as-aservice (saas), or further, platform-as-a-service (paas). the technical advantages of such systems would include the ability to “1. develop, test, deploy, host, and maintain on the same integrated environment; 2. 
user experience without compromise; 3. build-in scalability, reliability, and information technology and libraries | september 2012 78 security; 4. build-in integration with web services and databases; 5. support collaboration; 6. deep application instrumentation.”9 also as early as october 2007, computers in libraries invited ellen bahr to survey a number of library technology experts regarding what features and functionality they want to see built into ilss soon. the experts included roy tennant, kristin antelman, ross singer, andrew pace, john blyberg, stephen abram, and h. frank cervone. they identified the following key functionality for future ilss: • direct, read-only access to data, preferably through an open source database management system like mysql. • a standard way to communicate with the ils, preferably through an application programming interface. • standards-compliant systems including better security and more complete documentation. • the ability to run the ils on hardware that the library selects and on servers that the library administers. • greater interoperability of systems, pertaining to the systems within the library (including components from vendors, open source communities, and homegrown systems) and beyond (enterprise-level systems such as courseware and university portals, and shared library systems such as oclc). • greater distinction between the ils (which needs to efficiently manage a library’s business processes) and the opac (which needs to be a sophisticated finding tool). • better user interfaces, making use of the most current technologies available and providing a single interface to all of the library’s holdings, regardless of format.10 four aspects of next-generation ils there are four distinguishing characteristics of the next-generation ils we believe are critical. they are comprehensive library resources management; a system based on service-oriented architecture; the ability to meet the challenge of new library workflow; and a next-generation discovery layer. comprehensive library resources management comprehensive library resources management requires that next-generation ilss should be able to manage all library materials regardless of format or location. current ilss are built around the traditional library practice of print collections and services designed around these collections, but the last ten to fifteen years have seen great shifts in both library collections and services. print and physical materials are no longer the dominant resources. actually, in many libraries, especially in academic and research libraries, the building of electronic and digital collections have taken a larger role in library collection development. the traditional ils has not been able to handle ever-growing electronic and digital resources—either in terms of their acquisition or management. therefore a variety of either commercial or open-source the next generation library system: a promise fulfilled? | wang and dawes 79 electronic resources management systems (erm systems) have been developed over the years to address this management gap, but two problems exist: first, most erm systems, whether commercial or open-source, have not been able to truly integrate the acquisition process into the acquisitions workflow of the current ils systems, causing a messy and redundant workflow for the library staff. in libraries where an erm is deployed, staff generally track workflows in both the erm and the ils. 
if the library’s workflows have not been revised, miscommunication between the traditional acquisitions staff and the electronic resources staff can cause confusion, delay, and may even lead to disruption of services to library patrons. second, erm systems, by design, don’t take current library workflows into account. while it is true that these resources may need to be processed differently, library staff generally are used to traditional processes and want systems that function in familiar ways. many libraries, particularly academic libraries, still have relatively large serials departments responsible for the management of print journals. some have only recently begun to develop the personnel and the skills required to manage the influx of electronic and digital resources. because of these problems with existing erm systems, it is important that the next-generation ilss fully integrate the key features of erm systems, enabling the library to streamline and efficiently manage resources and staff. full integration of e-resource management would not only include acquisitions functionality but also the ability to manage licenses—a critical component of e-resource management—and the ability to manage the various packages, databases, and vendors. describing and providing access to e-resources are two aspects of the e-resources management process. these two features of the erm system should also be integrated with the description and metadata management component of the next-generation ils. centrally managing the metadata of e-resources enables easier discovery of resources by library users and has the advantage of shifting some of the management workflow to the metadata (or cataloging) staff. system based on service-oriented architecture next-generation ilss should be designed based on service-oriented architecture (soa). what is soa? a service-oriented architecture (soa) is an architecture for building business applications as a set of loosely coupled distributed components linked together to deliver a well-defined level of service. these services communicate with each other, and the communication involves data exchange or service coordination. soa is based on web services. broadly, soa can be classified into two aspects: services and connections, described below. services: a service is a function or some processing logic or business processing that is welldefined, self-contained, and does not depend on the context or state of other services. an example of a service is loan processing services, which can be a self-contained unit for processing loan applications. another example is weather services, used to get weather information. any application on the network can use the services of the weather service to get the weather information for a local area or region. in the library field, an example of a well-defined service is a check-in or check-out service. information technology and libraries | september 2012 80 connections: connections are the links connecting these self-contained distributed services with each other. they enable client-to-services communication. in case of web services, simple object access protocol (soap) is frequently used to communicate between services. there are many benefits of soa in the next-generation ils. these include the ability to be platform independent, therefore allowing libraries to use the software and hardware of their choice. there is no threat of being locked in to a single vendor, as many libraries are now with their current ilss. 
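to make the services-and-connections idea concrete, the sketch below exposes a single self-contained check-out operation as a small web service: any client that can send an http request and read the json reply can use it, whatever platform it runs on. the listing is hypothetical; the endpoint name, the barcode and patron_id fields, and the in-memory item table are invented for the example, and it exchanges json over http rather than soap simply to keep the code short.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# toy in-memory state standing in for the circulation database
ITEMS = {"30123456789012": {"title": "walden", "checked_out_to": None}}

class CheckoutService(BaseHTTPRequestHandler):
    # one well-defined, self-contained operation: check an item out to a patron
    def do_POST(self):
        if self.path != "/checkout":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        item = ITEMS.get(request.get("barcode"))
        if item is None or item["checked_out_to"]:
            reply, status = {"ok": False, "error": "item unavailable"}, 409
        else:
            item["checked_out_to"] = request.get("patron_id")
            reply, status = {"ok": True, "due": "2012-10-15"}, 200  # toy due date
        body = json.dumps(reply).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), CheckoutService).serve_forever()
```

a check-in service, a fines service, and so on would be separate, equally self-contained components; that loose coupling is what lets a vendor or a library replace one piece without rebuilding the rest.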
soa also enables incremental development, deployment, and maintenance. the vendors can use the existing software (investment) and use soa to build applications without replacing existing applications. as breeding described, the potential of web services (soa) for libraries includes • real-time interaction between library-automation systems and business systems of a library’s parent organization; • real-time interaction between library-automation systems and library suppliers or other business partners; • blending of library services into campus or municipal portal environments; • insertion of library services and content into courseware-management systems or other learning environments; • blending of content from external sources into library interfaces; and • delivery of library services and content to library users through nontraditional channels. 11 meet the challenge of the new library workflow the library systems in use today are, in general, aging—most were developed at least ten to fifteen years ago. they have been updated with software patches and new releases, but they still demand that staff work in the manner in which the systems were originally designed. although changes in our library operations have been realized in many organizations, these systems have not been able to adequately adapt to how library staff now want to—or need to—operate. the inability to keep pace with the move from largely print to increasingly electronic resources in our libraries is one of the reasons our existing systems fail. copeland et al. present a stunning visual of the typical workflow involved in acquiring and making available an electronic resource in the print-based library management system.12 their graphic depicts five possible starting points, nine decision points, and close to twenty steps involved in the process. this process may not be typical, but it is illustrative of the complex nature of our new workflows that simply cannot be accommodated by existing ilss. as early as 1997, the sirsi corporation recognized the need to modify systems; they introduced workflows, which is designed to streamline library operations.13 workflows, which introduced a graphical user interface to the sirsi unicorn system, was intended to allow staff a certain amount of flexibility and customization, depending on the tasks they typically perform. the new systems that are being developed and deployed today promise even more flexibility and propose to enable staff to work more efficiently irrespective of the format of the material being processed. but these systems will require staff to think about workflows in entirely different ways. not only will the method used to perform tasks be different (now web-based, hosted services as the next generation library system: a promise fulfilled? | wang and dawes 81 opposed to client-server-based tools) but the functionality has been enhanced to be more efficient. we cannot say how these new systems will be welcomed or resisted by staff. nor can we say how much staff savings will be realized because these systems are still too new and have not yet been implemented on a wide enough scale for a thorough assessment. but they are at least starting to address the issue. on the one hand, they will open a new window for further study and exploration of how to shape the next-generation ilss to suit the new library workflow. on the other hand, the library will benefit by changing some of their out-of-date practices and workflows around the new system. 
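breeding's list above can also be read from the client side: a campus portal or courseware plugin that wants to blend in library services only needs to post a message to the relevant service, as sketched below. the endpoint url, field names, and patron identifier are invented for the illustration (the service being called could be the check-out sketch shown earlier).

```python
import json
from urllib.request import Request, urlopen

# hypothetical endpoint exposed by the library's automation system
SERVICE = "https://library.example.edu/api/checkout"

def portal_checkout(barcode, patron_id):
    """what a campus portal or courseware plugin might call in order to
    reuse the library's check-out service instead of re-implementing it."""
    payload = json.dumps({"barcode": barcode, "patron_id": patron_id}).encode()
    request = Request(SERVICE, data=payload,
                      headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.loads(response.read())

print(portal_checkout("30123456789012", "P-0042"))
```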
next-generation discovery layer current library opacs, like the ilss themselves, are more than ten years old and generally have shown no improvement in search capability, navigability, or discovery. meanwhile, search technology has radically improved in the past decade. frustrations with the opacs’ limitations on the part of both librarians and library users eventually motivated many libraries to seek alternatives. libraries want to take advantage of the advances in search and discovery technology by implementing “nextgen” opacs or library discovery services. given the vast range of resources available in libraries—local print holdings, specialized databases, and commercial databases to name only a few—libraries want a service that would make as many of them as discoverable as possible. the ideal system would have a unified search interface with a single search box, but with relevance ranking, faceted search, social tagging of records, persistent links to records, rss feeds for searches, and the ability to easily save searches or export selected records to standard bibliographic management software programs. the ideal system would also integrate with the library’s opac, overlaying its current interface with a more nimble and navigable interface that still allows real-time circulation status and provides as much support as possible for foreign language fonts. it would also be as customizable as possible. numerous options for discovery currently exist, and these include summon from serials solutions, primo from ex libris, worldcat local from oclc, ebsco discovery service, and encore from innovative interfaces. as these services are not the focus of this article, they will not be discussed in detail, but the next-generation ilss should have the ability to integrate seamlessly with these discovery services. analysis of two examples 1. alma development in early 2009, ex libris (owner of aleph and voyager) began discussions with several institutions (boston college, princeton university, and katholieke universiteit leuven; purdue university joined later) to develop what they then termed the unified resource management system (urm). the urm was to replace the existing ilss and the subsequent add-ons that provided functionality not inherently available, such as the electronic resources management (erm) tools. the “backend” operations would also be de-coupled from the user interface as described elsewhere in this paper. information technology and libraries | september 2012 82 through a series of in-person and online meetings with the development partners, ex libris staff developed the conceptual framework and functional requirements for the urm (later named alma) and began development of the product. alma was delivered to the partners in a series of releases, each with more functionality, and the feedback was used to enhance or further develop the product. alma uses the concept of a shared metadata repository (the metadata management system) to which libraries would contribute, through which records would be shared, and from which records would be downloaded and edited with local information. selection and acquisitions functions would be integrated not only within alma, but within the discovery layer to allow patrons, as well as staff, the ability to suggest items for addition to the library’s holdings. 
with “smart fulfillment,” the workflows for delivering materials to patrons will also be seamless.14 one of the major changes planned for alma is the ability to manage the types of resources that cannot be effectively managed in current ilss—specifically electronic and digital resources. these resources are currently managed with the use of add-on products that interact with varying degrees of success with the ilss. this lack of integration has been a source of frustration for library staff, particularly as library electronic and digital collections continue to steadily grow. the development partners have presented extensively at various conferences about the development process and have been mostly positive about the product. dawes and lute described princeton university’s participation in a presentation at the 2011 acrl conference in philadelphia.15 at princeton, an executive committee was created to oversee that partner’s process. other staff members were then involved in testing each of the partner releases as the functionality increased and was made available to them. the princeton university team then provided feedback to ex libris via regular telephone calls, after which they would see changes based on their feedback, or a status update from ex libris about the particular issue reported. the staff members at princeton believe that their participation in the development of alma has given them an opportunity to closely examine their workflows to see where efficiencies can be made. 2. kuali ole project in 2008 a group of nine libraries formed the open library environment (ole) project, later called kuali ole. kuali is a community of higher education institutions that came together to build enterprise-level and open-source applications for the higher education community. these systems include some core applications such as kuali financial system, kuali people management, and other campus-wide applications. the kuali ole is its most recent endeavor. the purpose of the kuali ole project is to build an enterprise-level, open-source, and next-generation ils. the goal of kuali ole, taken from its website (http://kuali.org/ole), is to “develop the first system designed by and for academic and research libraries for managing and delivering intellectual information.” there are six principal objectives of the project: • to be built, owned, governed by the academic and research library community • to supports the wide range of resources and formats of scholarly information • to interoperate and integrate with other enterprise and network-based systems the next generation library system: a promise fulfilled? | wang and dawes 83 • to support federation across projects, partners, consortia, and institutions • to provide workflow design and management capabilities • to provides information management capabilities to nonlibrary efforts the funding is provided by a contribution from the andrew w. mellon foundation and the nine partner institutions. kuali ole will be built based on the soa model, on top of the kuali middleware application, kuali rice, the core component of the kuali suite of applications. kuali rice “provides an enterprise class middleware suite of integrated products that allows for applications to be built in an agile fashion. this enables developers to react to end user business requirements in an efficient and productive manner, so that they can produce high quality business applications.”16 version 1.0 of kuali ole is scheduled to be released to the public in december 2012. 
a stepping and testing version (0.3) was released in november 2011, which covers some core acquisitions features such as “select” and “acquire” processes. we believe that the kuali ole software will not only provide an alternative solution of the ils for academic and research libraries, but will change the way the library conducts its business, and will also have implications for staffing. these changes will result from the comprehensive management of library materials and resources, and the system’s interoperability with other college-level enterprise applications. conclusion after about two decades of library automation system history, both libraries and vendors have begun to realize that a revolutionary change is needed in designing and developing the nextgeneration ils. the system, built on the model of soa, should enable the library to comprehensively and effectively manage all library resources and collections, should accommodate a more flexible library workflow, and should enable the library to provide better services to library users. it is encouraging to see that, in both the commercial and open-source arenas, concrete steps are being taken to develop these systems that will manage all library resources. alma and kuali ole are but two of the next-generation ilss in development. in 2011, serials solutions announced their intent to develop a system using the same principles as described. so have innovative interfaces and oclc, the latter of which has already released an early version of their product to some institutions. since these products are still in development and implementation is not yet widespread, their success in meeting the needs of the library community is still to be seen. references 1. marshall breeding, “next generation library automation: its impact on the serials community,” the serials librarian 56, no. 1–4 (2009): 55–64. 2. marshall breeding, “it’s time to break the mold of the original ils,” computers in libraries 27, no. 10 (2007): 39–41. 3. breeding, “next generation library automation information technology and libraries | september 2012 84 4. breeding, “it’s time to break the mold of the original ils.” 5. andrew pace, “21st century library systems,” journal of library administration 49, no. 6 (2009): 641–50. 6. ibid. 7. breeding, “it’s time to break the mold of the original ils.” 8. breeding, “next generation library automation.” 9. dave mitchell, “defining platform-as-a-service, or paas,” bungee connect developer network, 2008, http://bungeeconnect.wordpress.com/2008/02/18/defining-platform-as-a-service-orpaas (accessed jan. 28, 2012). 10. ellen bahr, “dreaming of a better ils,” computers in libraries 27, no. 9 (2007): 10–14. 11. marshall breeding, “web services and service oriented architecture,” library technology reports 42, no. 3 (2006): 3–42. 12. jessie l. copeland et al., “workflow challenges: does technology dictate workflow?” serials librarian 56, no. 1–4 (2009): 266–70. 13. “sirsi introduces workflows to streamline library operations,” information today 14, no. 7 (1997): 52. 14. ex libris, “ex libris alma: the next generation library services framework,” 2011, www.exlibrisgroup.com/category/almaoverview (accessed jan. 3, 2012). 15. acrl virtual conference, “princeton university discusses ex libris alma,” 2011, www.learningtimes.net/acrl/2011/906 (accessed jan. 3, 2012). 16. kuali rice website, http://www.kuali.org/rice (accessed sept. 10, 2012). 
http://bungeeconnect.wordpress.com/2008/02/18/defining-platform-as-a-service-or-paas http://bungeeconnect.wordpress.com/2008/02/18/defining-platform-as-a-service-or-paas http://www.exlibrisgroup.com/category/almaoverview http://www.kuali.org/rice a file storage service on a cloud computing environment for digital libraries victor jesús sosa-sosa and emigdio m. hernandez-ramirez information technology and libraries | december 2012 34 abstract the growing need for digital libraries to manage large amounts of data requires storage infrastructure that libraries can deploy quickly and economically. cloud computing is a new model that allows the provision of information technology (it) resources on demand, lowering management complexity. this paper introduces a file-storage service that is implemented on a private/hybrid cloud-computing environment and is based on open-source software. the authors evaluated performance and resource consumption using several levels of data availability and fault tolerance. this service can be taken as a reference guide for it staff wanting to build a modest cloud storage infrastructure. introduction the information technology (it) revolution has led to the digitization of every kind of information.1 digital libraries are appearing as one more step toward easy access to information spread throughout a variety of media. the digital storage of data facilitates information retrieval, allowing a new wave of services and web applications that take advantage of the huge amount of data available.2 the challenges of preserving and sharing data stored on digital media are significant compared to the print world, in which data “stored” on paper can still be read centuries or millennia later. in contrast, only ten years ago, floppy disks were a major storage medium for digital data, but now the vast majority of computers no longer support this type of device. in today’s environment, selecting a good data repository is important to ensure that data are preserved and accessible. likewise, defining the storage requirements for digital libraries has become a big challenge. in this context, it staff—those responsible for predicting what storage resources will be needed in the medium term—often face the following scenarios: • prediction of storage requirements turn out to be below real needs, resulting in resource deficits. • prediction of storage requirements turn out to be above real needs, resulting in expenditure and administration overhead for resources that end up not being used. in these situations, considering only an efficient strategy to store documents is not enough.3 the acquisition of storage services that implement an elastic concept (i.e., storage capacity that can be victor jesús sosa-sosa (vjsosa@tamps.cinvestav.mx) is professor and researcher at the information technology laboratory at cinvestav, campus tamaulipas, mexico. emigdio m. hernandez-ramirez (emhr1983@gmail.com) is software developer, svam international, ciudad victoria, mexico. information technology and libraries | december 2012 35 increased or reduced on demand, with a cost of acquisition and management relatively low) becomes attractive. cloud computing is a current trend that considers the internet as a platform providing on-demand computing and software as a service to anyone, anywhere, and at any time. 
digital libraries naturally should be connected to cloud computing to obtain mutual benefits and enhance both perspectives.4 in this model, storage resources are provisioned on demand and are paid according to consumption. services deployment in a cloud-computing environment can be implemented three ways: private, public, or hybrid. in the private option, infrastructure is operated solely for a single organization; most of the time, it requires an initial strong investment because the organization must purchase a large amount of storage resources and pay for the administration costs. the public cloud is the most traditional version of cloud computing. in this model, infrastructure belongs to an external organization where costs are a function of the resources used. these costs include administration. finally, the hybrid model contains a mixture of private and public. a cloud-computing environment is mainly supported by technologies such as virtualization and service-oriented architectures. a cloud environment provides omnipresence and facilitates deployment of file-storage services. it means that users can access their files via the internet from anywhere and without requiring the installation of a special application. the user only needs a web browser. data availability, scalability, elastic service, and pay-per-use are attractive characteristics found in the cloud service model. virtualization plays an important role in cloud computing. with this technology, it is possible to have facilities such as multiple execution environments, sandboxing, server consolidation, use of multiple operating systems, and software migration, among others. besides virtualization technologies, emerging tools that allow the creation of cloud-computing environments also support this type of computing model, providing dynamic instantiation and release of virtual machines and software migration. currently, it is possible to find several examples of public cloud storage, such as amazon s3 (http://aws.amazon.com/en/s3), rackspace (http://www.rackspace.com/cloud/public/files), and google storage (https://developers.google.com/storage), each of which provide high availability, fault tolerance, and services and administration at low cost. for organizations that do not want to use a third-party environment to store their data, private cloud services may offer a better option, although the cost is higher. in this case, a hybrid cloud model could be an affordable solution. organizations or individual users, can store sensitive or frequently used information in the private infrastructure and less sensitive data in the public cloud. the development of a prototype of a file-storage service implemented on a private and hybrid cloud environment using mainly free and open-source software (foss) helped us to analyze the behavior of different replication techniques. we paid special attention to the cost of the system implementation, system efficiency, resource consumption, and different levels of data privacy and availability that can be achieved by each type of system. http://aws.amazon.com/en/s3 http://www.rackspace.com/cloud/public/files https://developers.google.com/storage a file storage service on a cloud computing environment for digital libraries | sosa-sosa 36 infrastructure description the aim of this prototyping project was to design and implement scalable and elastic distributed storage architecture in a cloud-computing environment using free, well-known, open-source tools. 
this architecture represents a feasible option that digital libraries can adopt to solve financial and technical challenges when building a cloud-computing environment. the architecture combines private and public clouds by creating a hybrid cloud environment. for this purpose, we evaluated tools such as kvm and xen, which are useful for creating virtual machines (vm).5 open nebula (http://opennebula.org), eucalyptus (http://www.eucalyptus.com), and openstack (http://www.openstack.org) are good, free options for managing a cloud environment. we selected open nebula for this prototype. commodity hard drives have a relatively high failure rate, hence our main motivation to evaluate different replication mechanisms, providing several levels of data availability and fault tolerance. figure 1(a) shows the core components of our storage architecture (the private cloud), and figure 1(b) shows a distributed storage web application named distributed storage on the cloud (disoc), used as a proof of concept. the private cloud also has an interface to access a public cloud, thus creating a hybrid environment. figure 1. main components of the cloud storage architecture the core components and modules of the architecture are the following: • virtual machine (vm). we evaluated different open-source were evaluated, such as kvm and xen, for the creation of virtual machines.6 some performance tests were done, and kvm showed a slightly higher performance than xen. we selected kvm as the main virtual machine manager (vmm) for the proposed architecture. vmms also are called http://opennebula.org/ http://www.eucalyptus.com/ http://www.openstack.org/ information technology and libraries | december 2012 37 hypervisors. each vm has a linux operating system that is optimized to work in virtual environments and requires a minimum consumption of disk space. the vm also includes an apache web server, a php module, and some basic tools that were used to build the disoc web application. every vm is able to transparently access a pool of disks through a special data access module, which we called dam. more details about dam follow. • virtual machine manager module (vmmm). this has the function of dynamic instantiation and de-instantiation of virtual machines depending on the current load on the infrastructure. • data access module (dam). all of the virtual disk space required by every vm was obtained through the data access module interface (dam-i). dam-i allows vms to access disk space by calling dam, which provides transparent access to the different disks that are part of the storage infrastructure. dam allocates and retrieves files stored throughout multiple file servers. • load balancer module (lbm). this distributes the load among different vms instantiated on the physical servers that make up the private cloud. • load manager (lm). this monitors the load that can occur in the private cloud. • distributed storage on the cloud (disoc). this is a web-based file-storage system that is used as a proof of concept and was implemented based on the proposed architecture. replication techniques high availability is one of the important features offered in a storage service deployed in the cloud. the use of replication techniques has been the most useful proposal to achieve this feature. dam is the component that provides different levels of data availability. it currently includes the following replication policies: no-replication, total-replication, mirroring, and ida-based replication. • no-replication. 
this replication policy represents the data availability method with the lowest level of fault tolerance. in this method, only the original version of a file is stored in the disk pool. it follows a round-robin allocation policy whereby load assignment is made based on a circularly linked list, taking into account disk availability. this policy prevents all files from being allocated to the same server, providing minimal fault tolerance in case of a server failure.
• mirroring. this replication technique is a simple way to ensure higher availability without high resource consumption. in this replication, every time a file is stored on a disk, the dam creates a copy and places it on a different disk.
• total-replication. this represents the highest data availability approach. in this technique, a copy of the file is stored on all of the file servers available. total-replication also requires the highest consumption of resources.
• ida-based replication. to provide higher data availability with less impact on the consumption of resources, an alternative approach based on information-dispersal techniques can be used. the information dispersal algorithm (ida) is an example of this strategy.7 when a file (of size |f|) is required to be stored using the ida, the file is partitioned into n fragments of size |f|/m, where m < n; any m of the n fragments are sufficient to reconstruct the original file.
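the three simpler policies are straightforward to picture in code. the fragment below is only an illustrative sketch of the round-robin, mirroring, and total-replication behavior described above; the class and method names (diskpool, store) are invented rather than taken from the disoc implementation, and ida-based dispersal is left out because it requires an erasure-coding step that would not fit in a few lines.

```python
from itertools import cycle

class DiskPool:
    """Illustrative sketch only; names do not come from the DiSOC code."""

    def __init__(self, disks):
        self.disks = {d: {} for d in disks}   # disk name -> {file name: bytes}
        self.available = set(disks)           # disks currently reachable
        self._ring = cycle(disks)             # circular list for round-robin

    def _next_disk(self, exclude=()):
        # walk the ring until an available disk outside `exclude` turns up
        for _ in range(len(self.disks)):
            d = next(self._ring)
            if d in self.available and d not in exclude:
                return d
        raise RuntimeError("no disk available")

    def store(self, name, data, policy="no-replication"):
        first = self._next_disk()
        self.disks[first][name] = data
        if policy == "mirroring":              # one extra copy on a different disk
            mirror = self._next_disk(exclude={first})
            self.disks[mirror][name] = data
        elif policy == "total-replication":    # a copy on every available disk
            for d in self.available:
                self.disks[d][name] = data
        return first

pool = DiskPool(["disk1", "disk2", "disk3"])
pool.store("thesis.pdf", b"%PDF-1.4 ...", policy="mirroring")
```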
fig. 3. output of cobol language program using marc ii data. marc ii and cobol / avram and droz 271
table 8. manpower expenditure
activity                     man weeks
analysis and programming     1
debugging and checkout       2
total                        3
since the processing time of a print program is usually a function of the speed of the printer, no accurate internal processing times were recorded. however, there was no noticeable time difference between this program and other marc print programs written at the library of congress in assembly language. communication format processing the aforementioned techniques are equally adaptable for use with the marc ii communications format (3) with the following changes in format conventions: 1) the communication format has a 24-character leader rather than 92 characters of fixed length items in the processing format. in the program, under the "working-storage section", the group item labelled "fixed-marc" would have to be redefined to conform with the 24-character leader. the cobol statements that are noted with " 0 0 " would require a change of their value from "92" to "24". 2) the communication format has no total count of entries in the record directory. a calculation would have to be made to arrive at the total count and that figure stored in a new hold area labelled "directory-count". the base address of the data in the communication format is not relative to the first position of the record as defined in the processing format, but to the first position of the first variable field. this base address is carried in the record leader, and is available for the calculation required for the directory entry count ((base address - 24) / 12). in the program, after the record directory had been searched and the proper entry placed in the work area, the "move-data" sub-routine would move the appropriate field to the work area for processing with the one alteration noted below with an asterisk.
move-data.
    move zeros to tsub.
    move spaces to hold-data.
    move d-address to dsub.
*   add base-address to dsub.
    perform move-a d-length times.
move-a.
    add 1 to dsub.
    add 1 to tsub.
    move marc-byte (dsub) to d-hold (tsub).
programming techniques naturally are dependent on the processing required and the format characteristics at the individual institution. if the marc ii communications format were to be manipulated in the form 272 journal of library automation vol.
1/ 4 december, 1968 in which it is received (each byte equal to a character with a 24-character leader followed by 12-character directory entries) an alternate approach to that suggested above could be to work in the record area and not move data to a work area. conclusion the only marc ii data available to users up to the writing of this article (october 1968) has been the marc ii test tape released by the library of congress in august 1968. therefore, it is probable that most people expressing doubts about the use of cobol with marc records have done so without the experience of actually using the language. we now have this experience at the libary of congress. cobol was successfully used for the computer processing of marc records. the complexity of the record did not detract from ease in programming. although the programs written were for a report function, the data accessing modules of cobol nevertheless can be used for many other functions. file maintenance and retrieval algorithms could be defined and programmed in cobol with facility equal to that in programming the subject function. references 1. griffin, hillis: "automation of technical processes in libraries," in annual review of information science and technology, edited by carlos a. cuadra (chicago: encyclopaedia britannica) 3 (1968), 241-262. 2. u. s. library of congress, information systems office: subscriber's guide to the marc distribution service (washington, d. c.: library of congress, 1968). 3. avram, henriette d.; knapp, john f.; rather, lucia j.: the marc ii format: a communications format for bibliographic data (washington, d . c.; library of congress, 1968 ), pp. 1, 2, 10. microsoft word december_ital_kiscaden_final.docx creating  a  current  awareness  service   using  yahoo!  pipes  and  libguides             elizabeth  kiscaden     information  technology  and  libraries  |  december  2014         51   abstract   migration  from  print  to  electronic  journals  brought  an  end  to  traditional  current  awareness  services,   which  primarily  used  print  routing.  the  emergence  of  real  simple  syndication,  or  rss  feeds,  and  email   alerting  systems  provided  users  with  alternative  services.  to  assist  users  with  adopting  these   technologies,  a  service  utilizing  aggregate  feeds  to  the  library’s  electronic  journal  content  was  created   and  made  available  through  libguides.  libraries  can  reestablish  current  awareness  services  using   existing  technologies  to  increase  awareness  and  usage  of  library-­‐provided  electronic  journal  content.   the  current  awareness  service  presented  is  an  example  of  how  libraries  can  build  basic  current   awareness  services  utilizing  freely  accessible  technologies.     current  awareness  services   library  current  awareness  services,  commonly  referred  to  as  “table  of  contents”  services,  historically   involved  the  dissemination  of  information  in  the  form  of  print  journals  or  photocopied  journal   contents  routed  to  library  users  subscribed  to  the  service.1,2  these  services  have  been  particularly   popular  among  corporate,  law,  and  hospital  libraries,  which  routinely  route  serials  to  primarily   internal  clients.  while  these  paper-­‐based  services  are  still  offered  at  some  libraries,  most  shifted  to   an  electronic  model  of  service  with  the  migration  to  electronic  journals.   
as  libraries  adopted  electronic  journals,  many  paper-­‐based  current  awareness  services  transitioned   to  an  electronic  table  of  contents  service  utilizing  email  alerts  or  referred  users  to  rss  feeds  made   available  by  publishers  and  database  vendors.3  a  common  challenge  to  a  library-­‐managed  electronic   table  of  contents  service  is  the  complexity  of  managing  alerts  for  hundreds  of  electronic  journals  for   multiple  patrons.  more  often,  libraries  make  individual  users  responsible  for  subscribing  to  email   alerts  or  rss  feeds  on  their  own,  effectively  transferring  the  responsibility  of  subscribing  to,  filtering,   and  managing  incoming  information  to  the  user.   a  drawback  to  this  migration  is  that  library  users  often  don’t  possess  a  clear  understanding  of  what   tools  are  available  to  create  their  own  service.4  formerly,  journals  may  have  arrived  on  a  user’s  desk   for  perusal,  yet  now  users  are  required  to  seek  out  information  independently.  additionally,  despite   the  number  of  discovery  tools  available,  library  users  are  often  unaware  of  journals  available  in  an   electronic  format  through  their  library.5  information  management  tools  have  become  necessary  in   our  current  information  environment;  with  the  abundance  of       elizabeth  kiscaden  (elizabeth-­‐kiscaden@uiowa.edu),  former  library  director  at  waldorf  college,  is   head,  library  services,  hardin  library  for  the  health  sciences,  university  of  iowa,  iowa  city.       creating  a  current  awareness  service  using  yahoo!  pipes  and  libguides  |  kiscaden   52   information  available,  keeping  up-­‐to-­‐date  with  new  information  in  a  discipline  can  be  overwhelming.   therein  exists  an  opportunity  for  libraries—academic,  special,  and  public—to  revitalize  current   awareness  services  and  build  information  management  tools  using  aggregate  feeds.     design  and  description  of  the  service   at  waldorf  college,  the  luise  v.  hanson  library  created  a  current  awareness  service  utilizing  rss   feeds,  with  the  intent  to  assist  faculty  with  keeping  up-­‐to-­‐date  with  newly  published  content  in  the   library’s  electronic  journal  collection.  the  service,  dubbed  info  sos,  was  designed  to  overcome  two   barriers  to  patron  participation  in  feed  services:  the  chore  of  subscribing  to  and  curating  multiple   feeds  and  the  lack  of  awareness  of  feeds  and  feed  reader  technology.  info  sos  was  piloted  to  faculty   during  the  spring  of  2014  and  was  accompanied  by  an  informal  questionnaire  to  collect  feedback.   info  sos  is  built  on  rss,  or  “really  simple  syndication”  technology,  one  of  the  most  prevalent  tools  for   keeping  current  with  new  information  published  electronically.  rss  has  been  available  for  more  than   a  decade,6  and  many  users—both  patrons  and  library  professionals—are  using  this  technology.   however,  while  powerful  and  freely  accessible,  rss  feeds  have  their  limitations.  subscribing  to  and   curating  multiple  feeds  can  become  a  burden.     to  eliminate  the  chore  of  managing  multiple  feeds,  info  sos  displays  feed  aggregates  created  using   yahoo!  pipes  http://pipes.yahoo.com/pipes/).  
aggregate  feeds,  or  feeds  comprising  multiple  rss   feeds,  can  be  created  using  many  tools  available  freely  online,  such  as  feed  stitch,  feed  informer,   feedburner,  and  more.  yahoo!  pipes  was  chosen  for  this  service  primarily  because  it  requires  limited   coding  knowledge,7  yet  the  software  provides  a  number  of  advanced  functions  for  sorting  and   combining  large  groups  of  feeds.  these  advanced  features  became  essential  when  building  aggregate   feeds  for  content  from  journal  aggregators.     yahoo!  pipes  requires  a  user  account  (free  of  charge)  before  constructing  pipes.  the  software   combines  and  sorts  information  using  a  visual  editor  that  resembles  virtual  plumbing,  which  is   presumably  why  the  software  is  called  pipes.  to  construct  the  aggregate  feeds  composing  info  sos,   librarians  used  the  fetch  feed  operator  to  combine  individual  rss  feeds  into  a  single  feed.  once   combined,  the  service  uses  the  sort  operator,  which  sorts  the  aggregated  content  by  date.  from  the   sort  operator,  the  content  is  connected  to  the  pipe  output,  from  which  a  single  rss  feed  is  generated.     the  strength  of  yahoo!  pipes  lies  in  the  advanced  tools  available  for  manipulating  feed  content.  for   example,  pipes  sorts  feed  content  from  database  vendors  by  the  date  it  is  published  to  the  feed,  not   the  publication  date  of  the  article.  if  desired,  aggregate  feed  creators  can  use  the  rename  and  regex   operators  to  remove  the  article  publication  date  from  the  description  field  and  use  it  to  sort  the  feed   content.  another  useful  tool  is  the  union  operator,  which  allows  creators  to  string  together  larger   bundles  of  feeds.       information  technology  and  libraries  |  december  2014   53     figure  1.  fetch  feed  and  sort  operator  in  yahoo!  pipes       figure  2.  image  of  yahoo!  pipe  using  advanced  tools     creating  a  current  awareness  service  using  yahoo!  pipes  and  libguides  |  kiscaden   54   lack  of  awareness  is  a  barrier  to  user  adoption  of  rss  feeds;  many  users  have  an  unclear   understanding  of  what  a  rss  feed  is.  if  unfamiliar  with  rss  feeds,  it  is  safe  to  assume  that  users  are   unfamiliar  with  rss  reader  technology  as  well.  at  waldorf  college,  this  was  confirmed  by  the   questionnaire  distributed  during  the  pilot  of  this  service.  of  the  twenty-­‐eight  faculty  respondents,   more  than  70  percent  had  never  used  an  rss  feed  before  using  info  sos.  it  is  safe  to  assume  that   these  faculty  would  not  have  a  subscription  to  a  feed  reader.   recognizing  the  need  for  an  interface  to  deliver  content,  librarians  used  the  libguides  software  to   display  content  from  these  aggregate  feeds.  the  software  contains  a  tool  for  adding  feed  content,  and   allows  for  the  application  of  an  institution’s  proxy  prefix  to  the  url,  creating  seamless  access  on  and   off  campus.  the  info  sos  resource  contains  tabbed  pages  designated  for  individual  fields  (biology,   psychology,  library  sciences,  etc.)  displaying  aggregate  feeds  for  journals  in  each  subject  area.  
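for readers who want to see the aggregation step spelled out rather than drawn in the pipes editor, the fragment below does in python what the fetch feed, union, and sort operators do: fetch several feeds, pool their entries, and order them newest first by date. it illustrates the behavior only, not how yahoo! pipes works internally; the feed urls are invented, and the sketch assumes the third-party feedparser package.

```python
import time
import feedparser  # third-party: pip install feedparser

# hypothetical journal feed urls; in practice these come from the database vendor
FEED_URLS = [
    "https://example.org/journals/physics-a/rss",
    "https://example.org/journals/physics-b/rss",
]

def aggregate(urls, limit=25):
    """emulate fetch feed + union + sort: pull every feed, pool the entries,
    and order them newest first by publication date."""
    entries = []
    for url in urls:
        entries.extend(feedparser.parse(url).entries)
    entries.sort(
        key=lambda e: e.get("published_parsed") or time.gmtime(0),
        reverse=True,
    )
    return entries[:limit]

for entry in aggregate(FEED_URLS):
    print(entry.get("published", "undated"), "-", entry.get("title", "untitled"))
```

the fallback in the sort key is where the limitation noted above shows up: some vendor feeds carry the article's real publication date only in the description field, so sorting on published_parsed reflects when the item hit the feed rather than when the article was published.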
for   example,  the  physics  page  contains  aggregate  feeds  for  new  articles  published  in  the  library’s  full-­‐ text  physics  journals,  as  displayed  in  the  figure  below.       figure  3.  aggregate  physics  feeds  in  libguides   user  feedback   info  sos  remains  a  relatively  new  service  to  library  users  at  the  luise  v.  hanson  library,  but   preliminary  feedback  has  been  positive.  the  service  was  advertised  to  faculty  via  email  and     information  technology  and  libraries  |  december  2014   55   accompanied  by  a  feedback  survey  created  using  google  forms.  as  stated  previously,  librarians   received  twenty-­‐eight  responses  to  the  survey,  a  relatively  strong  response  considering  the  limited   number  of  faculty  at  the  college.   of  the  respondents,  more  than  70  percent  had  never  used  an  rss  feed  previously,  instead  using  a   variety  of  other  tools  to  stay  current  with  their  field.  of  those  other  tools,  18  percent  of  faculty   subscribed  to  table  of  contents  alerts,  27  percent  browsed  new  issues  of  print  journals,  25  percent   visited  association  websites,  and  23  percent  conducted  periodic  searches  for  information  in  the   library  databases.  it  was  of  some  concern  that  of  these  tools,  only  faculty  using  databases  and   subscribing  to  table  of  contents  alerts  would  be  connecting  with  the  library’s  electronic  journal   collection.   when  presented  with  info  sos  and  asked  whether  faculty  would  find  this  tool  useful,  more  than  70   percent  responded  that  they  would.  faculty  were  solicited  for  suggestions  for  improving  the   resource,  and  librarians  received  many  suggestions  for  expanding  the  content.  this  feedback  was   valuable  in  that  it  provided  justification  for  continuing  the  service  beyond  the  pilot  and  a  list  of   potential  subject  areas  to  begin  expanding  the  service.  the  intended  outcome  of  the  service  is  to   assist  faculty  in  keeping  current  with  literature  in  their  field  and  utilizing  the  library’s  resources  in   the  process.   limitations  and  challenges   generating  feeds  from  popular  library  databases,  such  as  ebscohost  and  proquest,  is  limited  in  that   the  publication  dates  for  articles  are  contained  in  the  description  field.  this  can  make  the  sort   operator  in  yahoo!  pipes  somewhat  inaccurate  because  it  would  be  sorting  by  the  date  they  were   published  to  the  feed,  not  by  actual  publication  date  of  the  journal  article.  if  necessary,  this  issue  can   be  corrected  using  the  rename  and  regex  operators  by  copying  the  item  description  as  the   publication  date.     an  additional  challenge  regarding  vendor-­‐created  feeds  relates  to  the  issue  of  expiring  feeds  created   from  library  databases.  a  library  profile  was  required  for  each  database,  such  as  ebscohost  or   proquest,  to  create  and  save  feeds.  this  allows  for  the  renewal  of  expiring  feeds;  the  email  account   attached  to  the  profile  receives  an  invitation  to  renew  expiring  feeds.  most  vendors  allow  for  feeds  to   be  created  at  the  database  without  a  profile,  but  those  feeds  will  automatically  expire  if  not  used   within  a  period  of  time.  
the  potential  of  feeds  expiring  may  add  an  element  of  maintenance  to  the   current  awareness  service.       future  developments   yahoo!  pipes  offers  the  unique  ability  to  publish  pipes  that  others  may  share  and  “clone.”  for   libraries  interested  in  creating  aggregate  feeds  for  popular  ebscohost  journals,  the  pipes  created  for   info  sos  are  available  to  clone  at  http://pipes.yahoo.com/infosos.  a  search  of  published  pipes   available  in  yahoo!  pipes  reveals  pipes  created  by  many  public  and  academic  libraries,  all  of  which   are  available  to  clone  and  edit.  the  ability  to  share  pipes  with  other  institutions  introduces  the   possibility  of  current  awareness  services  shared  between  library  consortia  or  associations.     creating  a  current  awareness  service  using  yahoo!  pipes  and  libguides  |  kiscaden   56   as  information  becomes  more  abundant,  tools  and  services  to  manage  incoming  information  will   continue  to  be  a  corresponding  need.  creating  and  sharing  services  that  utilize  technology  common   to  libraries  presents  us  with  the  opportunity  to  collaborate  with  one  another  and  revitalize  library-­‐ engineered  current  awareness  services.  these  services  offer  a  value  that  is  twofold:  library  users   benefit  from  the  ability  to  stay  current  with  publications  in  their  field,  and  libraries  have  the   potential  of  increased  usage  of  their  purchased  content.  with  no  financial  investment,  an  aggregate   feed-­‐based  service  is  a  value  that  a  variety  of  libraries  can  implement  with  the  investment  of  only   limited  personnel  time.   references     1.    g.  mahesh  and  dinesh  kumar  gupta,  “changing  paradigm  in  journals  based  current  awareness   services  in  libraries,”  information  services  &  use  28,  no.  1  (2008):  59–65,   http://dx.doi.org/10.3233/isu-­‐2008-­‐0555.       2.    stephen  m.  johnson,  andrew  osmond,  and  rebecca  j.  holz,  “developing  a  current  awareness   service  using  really  simple  syndication  (rss),”  journal  of  the  medical  library  association  97,  no.  1   (2009):  51–53,  http://dx.doi.org/10:3163/1536-­‐5050.97.1.011.   3.    mahesh  and  gupta,  “changing  paradigm  in  journals  based  current  awareness  services  in   libraries.”   4.    m.  kathleen  kern  and  cuiying  mu,  “the  impact  of  new  technologies  on  current  awareness  tools   in  academic  libraries,”  reference  &  user  services  quarterly  51,  no.  2  (2011):  92–97.     5.    sandra  j.  weingart  and  janet  a.  anderson,  “when  questions  are  answers:  using  a  survey  to   achieve  faculty  awareness  of  the  library’s  electronic  resources,”  college  &  research  libraries  61,   no.  2  (2000):  127–34,  http://dx.doi.org/10.5860/crl.61.2.127.   6.    jim  doree,  “rss:  a  brief  introduction,”  journal  of  manual  &  manipulative  therapy  15,  no.  1  (2007):   57–58.     7.    bill  dyszel,  “create  no-­‐code  mashups  with  yahoo!  pipes,”  pc  magazine  26,  no.  21/22  (2007):  103– 5.   
editorial board thoughts: libraries as makerspace? tod colegrove information technology and libraries | march 2013 2 recently there has been tremendous interest in "makerspace" and its potential in libraries: from middle school and public libraries to academic and special libraries, the topic seems very much top of mind. a number of libraries across the country have been actively expanding makerspace within the physical library and exploring its impact; as head of one such library, i can report that reactions to the associated changes have been quite polarized. those from the supported membership of the library have been uniformly positive, with new and established users as well as principal donors immediately recognizing and embracing its potential to enhance learning and catalyze innovation; interestingly, the minority of individuals who recoil at the idea have been either long-term librarians or library staff members. i suspect the polarization may be more a function of confusion over what makerspace actually is. this piece offers a brief overview of the landscape of makerspace—a glimpse into how its practice can dramatically enhance traditional library offerings, revitalizing the library as a center of learning. been happening for thousands of years . . . dale dougherty, founder of make magazine and maker faire, at the "maker monday" event of the 2013 american library association midwinter meeting framed the question simply, "whether making belongs in libraries or whether libraries can contribute to making." more than one audience member may have been surprised when he continued, "it's already been happening for hundreds of years—maybe thousands."1 the o'reilly/darpa makerspace playbook describes the overall goals and concept of makerspace (emphasis added): "by helping schools and communities everywhere establish makerspaces, we expect to build your makerspace users' literacy in design, science, technology, engineering, art, and math. . . . we see making as a gateway to deeper engagement in science and engineering but also art and design. makerspaces share some aspects of the shop class, home economics class, the art studio and science lab. in effect, a makerspace is a physical mashup of these different places that allows projects to integrate these different kinds of skills."2 building users' literacies across multiple domains and a gateway to deeper engagement? surely these are core values of the library; one might even suspect that to some degree libraries have long been makerspace. patrick "tod" colegrove (pcolegrove@unr.edu), a lita member, is head of the delamare science & engineering library at the university of nevada, reno, nevada. a familiar example of maker activity in libraries might include digital media: still/video photography and audio mastering and remixing. youmedia network, funded by the macarthur
foundation through the institute of museum and library services, is a recent example of such effort aimed at creating transformative spaces; engaged in exploring, expressing, and creating with digital media, youth are encouraged to "hang out, mess around, and geek out." a more pedestrian example is found in the support of users with first-time learning or refreshing of computer programming skills. as recently as the 1980s, the singular option the library had was to maintain a collection of print texts. through the 1990s and into the early 2000s, that support improved dramatically as publishers distributed code examples and ancillary documents on accompanying cd or dvd media, saving the reader the effort of manually typing in code examples. the associated collections grew rapidly, even as the overhead of maintaining and weeding a collection that became obsolete ever more quickly grew along with them. today, e-book versions combined with the ready availability of computer workstations within the library, and the rapidly growing availability of web-based tutorials and support communities, render a potent combination that customers of the library can use to quickly acquire the ability to create or "make" custom applications. with the migration of the supporting print collections online, the library can contemplate further support in the physical spaces opened up. open working areas and whiteboard walls can further amplify the collaborative nature of such making; the library might even consider adding popular hardware development platforms to its collection of lendable technology, enabling those interested to check out a development kit rather than purchase one on their own. after all, in a very real sense that is what libraries do—and have done, for thousands of years: buy sometimes expensive technology tailored to the needs and interests of the local community and make it available on a shared basis. makerspace: a continuum along with outreach opportunities, the exploration of how such examples can be extended to encompass more of the interests supported by the library is the essence of the maker movement in libraries. makerspace encompasses a continuum of activity that includes "co-working," "hackerspace," and "fab lab"; the common thread running through each is a focus on making rather than merely consuming. it is important to note that although the terms are often incorrectly used as if they were synonymous, in practice they are very different: for example, a fab lab is about fabrication. fully realized, it is a workshop designed around personal manufacture of physical items—typically equipped with computer-controlled equipment such as laser cutters, multiple-axis computer numerical controlled (cnc) milling machines, and 3d printers. in contrast, a "hackerspace" is more focused on computers and technology, attracting computer programmers and web designers, although interests begin to overlap significantly with the fab lab for those interested in robotics. co-working space is a natural evolution for participants of the hackerspace: a shared working environment offering much of the social and collaborative benefit of the informal hackerspace while maintaining a focus on work. as opposed to the hobbyist who might be attracted to a hackerspace, co-working space attracts independent contractors and professionals who may work from home.
information technology and libraries | march 2013 4 it is important to note that it is entirely possible for a single makerspace to house all three subtypes and be part hackerspace, fab lab, and co-working space. can it be a library at the same time? to some extent, these activities are likely already ongoing within your library, albeit informally; by recognizing and embracing the passions driving those participating in the activity, the library can become central to the greater community of practice. serving the community’s needs more directly, opportunities for outreach will multiply even as it enables the library to develop a laser-sharp focus on the needs of that community. depending on constraints and the community of support, the library may also be well-served by forming collaborative ties with other local makerspace; having local partners can dramatically improve the options available to the library in day-to-day practice, and better inform the library as it takes well-chosen incremental steps. with hackerspace/co-working/fab lab resources aligned with the traditional resources of the library, engagement with one can lead naturally to the other in an explosion of innovation and creativity. renaissance in addition to supporting the work of the solitary reader, “today's libraries are incubators, collaboratories, the modern equivalent of the seventeenth-century coffeehouse: part information market, part knowledge warehouse, with some workshop thrown in for good measure.”3 consider some of the transformative synergies that are already being realized in libraries experimenting with makerspace across the country: • a child reading about robots able to go hands-on with robotics toolkits, even borrowing the kit for an extended period of time along with the book that piqued the interest; surely such access enables the child to develop a powerful sense of agency from early childhood, including a perception of self as being productive and much more than a consumer. • students or researchers trying to understand or make sense of a chemical model or novel protein strand able not only to visualize and manipulate the subject on a two-dimensional screen, but to relatively quickly print a real-world model to be able and tangibly explore the subject from all angles. • individuals synthesizing knowledge across disciplinary boundaries able to interact with members of communities of practice in a non-threatening environment; learning, developing, and testing ideas—developing rapid prototypes in software or physical media, with a librarian at the ready to assist with resources and dispense advice regarding intellectual property opportunities or concerns. the american libraries association estimates that as of this printing there are approximately 121,169 libraries of all kinds in the united states today; if even a small percentage recognize and begin to realize the full impact that makerspace in the library can have, the future looks bright indeed. editorial board thoughts: libraries as makerspace? | colegrove 5 references 1. dale dougherty, “the new stacks: the maker movement comes to libraries” (presentation at the midwinter meeting of the american library association, seattle, washington, january 28, 2013). http://alamw13.ala.org/node/10004. 2. michele hlubinka et al., makerspace playbook, december 2012, accessed february 13, 2012, http://makerspace.com/playbook. 3. 
alex soojung-kim pang, "if libraries did not exist, it would be necessary to invent them," contemplative computing, february 6, 2012, http://www.contemplativecomputing.org/2012/02/if-libraries-did-not-exist-it-would-be-necessary-to-invent-them.html. president's column it is with great pleasure that i bring you greetings as your lita president. it is an honor to be the lita president and to follow the very productive term of tom wilson. as you know, this column is an opportunity for the president to increase communication with the membership. as lita president, i plan to concentrate my efforts on continuing to capitalize on the association's many strengths. last year at lita's 2004 town meeting at the ala midwinter meeting in san diego, we built on the planning efforts of tom wilson. to assist in the development of goals to support lita's developing vision statement, i gathered additional information for the next planning phase. i asked the town meeting's more than eighty attendees to consider these three questions. 1. what do you like about lita, its organizational structure, and its programs? 2. what services or products does lita currently offer that you value? 3. what new services or products would increase lita's value to you? the attendees gathered at small tables and discussed the three questions. after the discussions, each table shared their answers. here's what they thought: 1. what do you like about lita, its organizational structure, and its programs? • easy to become involved • inviting • great networking • forward thinking • lack of bureaucracy • flexibility • enthusiasm of members • encourages discussion • openness to new members • open structure 2. what services or products does lita currently offer that you value? • lita-l • regional institutes • top tech trends • ter • national forum • ital • lita publications • interest groups • programming • networking with knowledgeable people 3. what new services or products would increase lita's value to you? • coordination with other divisions • liaison at state levels • mentoring (partnering with nmrt) • leadership in advising libraries • best practices and competencies • more standards involvement • more access for those not attending conference • webcasts in one-to-two-hour sessions • partnerships with vendors • more diversity in the organization • conference reports • use more leading-edge technology • more content on the web site • more focus on technical issues • blogging • online newsletter • announcement service • mechanism for sharing information • another electronic discussion list for tech how-to's • tech reviews and recommendations • online community support • rss feed on the web site • stronger voice on technology and policy • semiannual e-mail messages to all lita members as you can tell from these comments, lita provides many valuable services to our members, and to the association at large. however, there are many opportunities for us to do more. to accomplish the goal of expanding our services and setting priorities, we have continued our emphasis on strategic planning activities. along with mary taylor, lita's executive director, i attended several ala ahead planning sessions last fall.
our participation in these meetings reinforced our commitment to the planning processes we used to draft lita’s goals and strategies for the next five years. this draft plan is scheduled for review by the lita board during the 2005 ala midwinter meeting. additionally, lita’s strategic plan will be discussed at the 2005 lita town meeting, also during midwinter meeting. we look forward to finalizing our strategic plan by the 2005 ala annual conference. i believe this plan will help us achieve our goals and be used to gauge our successes. in addition to our strategic planning process, lita has made great strides in a number of significant areas. the lita web advisory task force, chaired by zoe stewart marshall, has been working to “establish policies governing the lita web site’s content, responsibilities for its management, and an approval process for posting content 2 information technology and libraries | march 2005 president’s column colby mariva riggs colby mariva riggs is lita president and project coordinator, library systems, university of california–irvine. (continued on page 31) online.” they have implemented several process improvements already and will complete their work by the 2005 ala annual conference. this past fall, michelle frisque, lita web manager, conducted a survey of our members about the lita web site. michelle and the web coordinating committee are already working on a new look and feel for the lita web site based on the survey comments, and the result promises to be phenomenal. on top of all of the current activities, new vision statement, strategic planning, and the lita web site redesign, mary taylor and the lita board worked with a graphic designer to develop a new lita logo. after much deliberation, the new logo debuted at the 2004 lita national forum with great enthusiasm. many members commented that the new logo expresses the “energy” of lita and felt the change was terrific. with your help, lita had a very successful conference in orlando. although there were weather and transportation difficulties, the lita programs and discussions were of the highest quality, as always. the program and preconference offerings for the upcoming annual conference in chicago promise to be as strong as ever. don’t forget, lita also offers regional institutes throughout the year. check the lita web site to see if there’s a regional institute scheduled in your area. lita held another successful national forum in fall 2004 in st. louis, “ten years of connectivity: libraries, the world wide web, and the next decade.” the threeday educational event included excellent preconferences, general sessions, and more than thirty concurrent sessions. i want to thank the wonderful 2004 lita national forum planning committee, chaired by diane bisom, the presenters, and the lita office staff who all made this event a great experience. the next lita national forum will be held at the san jose marriott, san jose, california, september 29–october 2, 2005. the theme will be “the ubiquitous web: personalization, portability, and online collaboration.” thomas dowling, chair, and the 2005 lita national forum planning committee are preparing another “must attend” event. next year marks lita’s fortieth anniversary. 2006 will be a year for lita to celebrate our history, future, and our many accomplishments. we are fortunate to have lynne lysiak leading the fortieth anniversary task force activities. i know we all will enjoy the festivities. 
i look forward to working with many of you as we continue to make lita a wonderful and vibrant association. i encourage you to send me your comments and suggestions to further the goals, services, and activities of lita. information technology and libraries | september 2009 success factors and strategic planning: rebuilding an academic library digitization program cory lampert and jason vaughan cory lampert (cory.lampert@unlv.edu) is digitization projects librarian and jason vaughan (jason.vaughan@unlv.edu) is director, library technologies, university of nevada las vegas. this paper discusses a dual approach of case study and research survey to investigate the complex factors in sustaining academic library digitization programs. the case study involves the background of the university of nevada, las vegas (unlv) libraries' digitization program and elaborates on the authors' efforts to gain staff support for this program. a related survey was administered to all association of research libraries (arl) members, seeking to collect baseline data on their digital collections, to understand their respective administrative frameworks, and to gather feedback on both the obstacles and the positive inputs affecting their success. results from the survey, combined with the authors' local experience, point to several potential success factors, including staff skill sets, funding, and strategic planning. establishing a successful digitization program is a dialog and process already undertaken or currently underway at many academic libraries.
in 2002, according to an institute of museum and library services report, “thirty-four percent of academic libraries reported digitization activities within the past 12 months.” nineteen percent expect to be involved in digitization work in the next twelve months, and forty-four percent beyond twelve months.1 more current statistics from a subsequent study in 2004 reflected that digitization work has both continued and expanded, with half of all academic libraries performing digitization activities.2 fifty-five percent of arl libraries responded to a survey informing part of the 2006 association of research libraries (arl) study managing digitization activities; of these, 97 percent of the respondents indicated engagement in digitization.3 the 2008 ithaka study key stakeholders in the digital transformation in higher education found that nearly 80 percent of large academic libraries either already have or plan to have digital repositories.4 with digitization becoming the norm in many institutions, the time is right to consider what factors contribute to the success and rapid growth of some library digitization programs while other institutions find digitization challenging to sustain. the evolution of digitization at the unlv libraries is doubtless a journey many institutions have undertaken. over the past couple of years, those responsible for such a program at the unlv libraries have had the opportunity to revitalize the program and help collaboratively address some key philosophical questions that had not been systematically asked before, let alone answered. associated with this was a concerted focus to engage other less involved staff. one goal was to help educate them on academic digitization programs. another goal was to provide an opportunity for input on key questions related to the programs’ strategic direction. as a subsequent action, the authors conducted a survey of other academic libraries to better understand what factors have contributed to their programs’ own success as well as challenges that have proven problematic. many questions asked of our library staff in the planning and reorganization process were asked in the survey of other academic libraries. while the unlv libraries have undertaken what is felt are the proper structural steps and have begun to author policies and procedures geared toward an efficient operation, the authors wanted to better understand the experiences, key players, and underlying philosophies of other institutional libraries as theses pertain to their own digitization program. the following article provides a brief context relating the background of the unlv libraries’ digitization program and elaborates on the authors’ efforts toward educating library colleagues and gaining staff buy-in for unlv’s digitization program—a process that countless other institutions have no doubt experienced, led, or suffered. the administered survey to arl members dealt with many topics similar to those that arose during the authors’ initial planning and later conversations with library staff, and as such, survey questions and responses are integrated in the following discussion. the authors administered a 26-question survey to the 123 members of the arl. the focus of this survey was different from the previously mentioned arl study managing digitization activities, though several of the questions overlapped to some degree. 
in addition to demographic or concrete factual types of questions, the unlv libraries digitization survey had several questions focused on perceptions—that is, staff support, administrative support, challenges, and benefits. areas of overlap with the earlier arl survey are mentioned in the appropriate context. though unlv isn't a member of the arl, we consider ourselves a research library, and, regardless, it was a convenient way to provide some structure to the survey. survey responses were collected for a forty-five-day period from mid-june to late july, 2008. by visiting each arl library's website, the authors identified the individuals who appeared to be the "leaders" of the arl digitization programs, with instructions to forward the message to a colleague if they themselves had been incorrectly identified. this was very tricky, and revealed numerous program structures in place, differences between institutions in promoting their collections, and so on. the authors didn't necessarily start with the presumption that all arl libraries even have a digitization program, but most (though not all) either seemed to have a formally organized digitization program with staffing, or at least had digitized and made available something, even if only a single collection. we e-mailed a survey announcement and a link to the survey to the targeted individuals, with a follow-up reminder a month later. responses were anonymous, and respondents were allowed to skip questions; thus the number of responses for the twenty-six questions making up the survey ranged from a low of thirty (24.4 percent) to a high of forty-four (35.8 percent). the average number of responses for each of the questions was 39.8, yielding an overall response rate of 32.4 percent. questions were of three types: multiple choice (select one answer), multiple choice (mark all that apply), and open text. in addition, some of the multiple choice questions allowed additional open text comments. survey responses appear in appendix a. ■ context of the unlv libraries' digitization program "digital collection," for the purpose of the unlv library digitization survey, was defined as a collection of library or archival materials converted to machine-readable format to provide electronic access or for preservation purposes; typically, digital collections are library-created digital copies of original materials presented online and organized to be easily searched. they may offer features such as full text search, browsing, zooming and panning, side-by-side comparison of objects, and export for presentation and reuse. one question the survey asked was "what year do you feel your library published its first 'major' digital collection?" responses ranged from 1990 to 2007; the general average of all responses was 2001. the earlier arl study found 2000 as the year most respondents began digitization activities.5 mirroring this chronology, the unlv libraries have been active in designing digital projects and digitizing materials from library collections since the late 1990s.
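the response-rate figures reported above reduce to simple arithmetic against the 123-member arl population. the short python sketch below reproduces them; the counts are taken directly from the text, and the sketch itself is illustrative only—it was not part of the survey instrument.

ARL_MEMBERS = 123        # survey population: all arl member libraries
LOW_RESPONSES = 30       # fewest responses received for any single question
HIGH_RESPONSES = 44      # most responses received for any single question
AVG_RESPONSES = 39.8     # mean responses across the twenty-six questions

def response_rate(count, population=ARL_MEMBERS):
    """return a response count as a percentage of the survey population."""
    return 100.0 * count / population

print(f"lowest-answered question:  {response_rate(LOW_RESPONSES):.1f}%")   # 24.4%
print(f"highest-answered question: {response_rate(HIGH_RESPONSES):.1f}%")  # 35.8%
print(f"overall response rate:     {response_rate(AVG_RESPONSES):.1f}%")   # 32.4%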
technical web design expertise was developed in the cataloging unit (later renamed bibliographic and metadata services), and some of the initial efforts were to create online galleries and exhibits of visual materials from special collections, such as the jeanne russell janish (1998) exhibit.6 subsequently, the unlv libraries purchased the contentdm digital collection management software, providing both back-end infrastructure and front-end presentation for digital collections. later, the first digitization project with search functionality was created in partnership with special collections and was funded by a unlv planning initiative award received in 1999. the early las vegas (2003) project focused on las vegas historical material and was designed to guide users to search, retrieve, and manipulate results using contentdm software to query a database.7 unlv’s development corresponds with regional developments in utah in 2001, when “the largest academic institutions in utah were just beginning to develop digital imaging projects.”8 data from the 2004 imls study showed that, in the twelve months prior to the study release in 2004, the majority of larger academic libraries had digitized between one and five hundred images for online presentation.9 in terms of staffing, digitization efforts occur in a wide variety of configurations, from large departments to solo librarians managing volunteers. for institutions with recognized digitization staff, great variations exist between institutions in terms of where in the organizational chart digitization staff are placed. boock and vondacek’s research revealed that, of departments involved in digitization, special collections, archives, technical services, and newly created digital library units are where digitization activities most commonly take place.10 a majority of respondents to the arl study indicated that some or all activities associated with digitization are distributed across various units in the library.11 in 2003, the unlv libraries created a formal department within the knowledge access management division—web and digitization services (wds)—initially comprising five staff focused on the development of the unlv libraries’ public website, the development of web-based applications and databases to manage and efficiently present information resources, and the digitization and online presentation of library materials unique to the unlv libraries’ collections and of potential interest to a wider audience. augmenting their efforts were individuals in other departments helping with metadata standards, content selection, and associated systems technical support. the unlv library digitization survey showed that the majority (78 percent) of libraries that responded have at least one full-time staff member whose central job responsibility is to support digitization activities. this should not imply the existence of a fully staffed digitization program; the 2006 imls study found that 74.1 percent of larger academic libraries described themselves as lacking in sufficiently skilled technology staff to accomplish technology-related activities.12 central to any digitization program should be some structure in terms of how projects are proposed and subsequently prioritized. to help guide the priorities 118 information technology and libraries | september 2009 of unlv’s infant wds department, a digital projects advisory committee was formed to help solicit and prioritize project ideas, and subsequently track the development of approved projects. 
this committee’s work could be judged as having mixed success partly because it met too infrequently, struggled with conflicting philosophical thoughts on digitization, and was confronted with the reality that staff that were needed to help bring approved ideas to fruition simply weren’t in place because of too many other library priorities drawing attention away from digitization. an evaluation of the lessons learned from these early years can be found in brad eden’s article.13 the unlv library digitization survey had several questions related to management and prioritization for digital projects and shows that despite the challenges of a committee-based decisionmaking structure, when a formal process is in place at all, 42.1 percent of survey respondents used a committee versus a single decision maker (23.7 percent) for determining to whom projects are proposed for production. a follow-up question asked “how are approved projects ultimately prioritized?” the most popular response (54.1 percent) indicated “by a committee for review by multiple people,” followed by “no formal process” (27 percent). “by a single decision maker” was selected by 18.9 percent of the respondents. the earlier arl study asked a somewhat related question: “who makes decisions about the allocation of staff support for digitization efforts? check all that apply.” out of seven possible responses, the three most popular were “head of centralized unit,” “digitization team/committee/working group,” and “other person”; the other person was most often in an administrative capacity, such as a dean, director, or department head.14 administrative support for a program was another variable the unlv library digitization survey investigated. the survey asked respondents to rate, on a scale of one to five, “how would you characterize current support for digitization by your library’s administration?” more than 40 percent of responses indicated “consistent support,” followed by 31 percent of respondents indicating “very strong support, top priority,” 14.3 percent ranking support as neutral, and 14.2 percent claiming “minimal support” or “very little support, or some resistance.” it was also clear from some of the other questions’ responses that the dean or director’s support (or lack thereof) can have dramatic effects on the digitization program. 2005 brought change to the unlv libraries in the form of a new dean. well-suited for the digitization program, she came from california, a state very heavily engaged and at the forefront of digitization within the library and larger academic environment. one of her initiatives was a retooling of the digitization program at the unlv libraries, and her enthusiasm reflects a growing awareness of administrators regarding the benefits of digitization. n reorganization, library staff engagement, and decision making in 2006, two new individuals joined unlv libraries’ web and digitization services department, the digitization projects librarian (filling a vacancy), and the web technical support manager (a new position). a bit later, the systems department (providing technical support for the web and digitization servers, among other things), and the wds department were combined into a single unit and renamed library technologies. collectively, these changes brought new and engaged staff into the digitization program and combined under one division many of the individuals responsible for digital collection creation and support. 
perhaps more subtlety, this arrangement also provided formal acknowledgement of the importance and desire of publishing digital collections. with the addition of new staff and a reorganization, a piece still missing was a resuscitation of library stakeholders to help solicit, prioritize, and manage the creation of digital collections and an overall vision guiding the program. while the technical expertise, knowledge of metadata and imaging standards, and deep-rooted knowledge of digitization programs and concepts existed within the library technologies staff, other knowledge didn’t—primarily in-depth knowledge of the unlv libraries’ special collections and a track record of deep engagement with college faculty and the educational curriculum. similar to other organizations, the unlv libraries had not only created a new unit, but was also poised to introduce cross-departmental project groups that would collaborate on digitization activities. in their study of arl and greater western library association (gwla) libraries, book and vondracek found that this was the most commonly used organizational structure.15 knowledge of the concepts of a digitization program and what is involved in digitizing and sustaining a collection was not widespread among other library colleagues. acknowledged, but not guaranteed up front for the unlv libraries, was the likely eventual reformation of a group of interested and engaged library stakeholders charged to solicit, prioritize, and provide oversight of the unlv libraries’ digitization program. for various reasons, the authors wanted to garner staff buy-in to the highest degree possible. apart from wanting less informed colleagues to understand the benefits of a digitization program, it was also likely that such colleagues would help solicit projects through their liaison work with programs of study across campus. one unlv library digitization survey question asked, “how would you characterize support for digitization in your library by the majority of those providing content for digitization projects?” “consistent support” was indicated by 65.9 percent of respondents; 15.9 percent indicated “very strong support, top priority,” 13.6 percent indicated neutrality, and 4.6 success factors and strategic planning | lampert and vaughan 119 percent indicated either minimal support or even some resistance. to help garner staff buy-in and set the stage for revitalizing the unlv libraries’ digitization efforts, we began laying the groundwork to educate and engage library staff in the benefits of a digitization program. this work included language successfully woven into the unlv libraries’ strategic plan and an authored white paper posing engaging questions to the larger library audience related to the strategic direction of the program. finally, we planned and executed two digitization workshops for library staff. n the strategic plan one unlv library digitization survey question asked, “is the digitization program or digitization activities referenced in your library’s strategic plan?” a total of 63.4 percent indicated yes, with an additional 22 percent indicating no specific references, but rather implied references. only 7.3 percent indicated that the digitization program was not referenced in any manner in the strategic plan, while, surprisingly, 3 responses (7.3 percent) indicated that their library doesn’t have a strategic plan. 
the unlv libraries’ strategic plan is an important document authored with wide feedback from library staff, and it exemplifies the participatory decision-making process in place in the library. the current iteration of the strategic plan covers 2007–9 and includes various goals with supporting strategies and action items.16 in addition, all action items have associated assessment metrics and library staff responsible for championing the action items. departmental annual reports explicitly reference progress toward strategic plan goals. as such, if goals related to the digitization program appear in the strategic plan, that’s a clear indication, to some degree, of staff buy-in in acknowledging the significance of the digitization program. fortunately, digitization efforts figure prominently in several goals, strategies, and action items, including the following: n increasingly provide access to digital collections and services to support instruction, research, and outreach while improving access to the unlv libraries’ print and media collections. n provide greater access to digital collections while continuing to build and improve access to collections in all formats to meet the research and teaching needs of the university. identify collections to digitize that are unique to unlv and that have a regional, national, and international research interest. create digital projects utilizing and linking collections. develop and adapt metadata and scanning standards that conform to national standards for all formats. provide content and metadata for regional and national digital projects. continue to develop expertise in the creation and management of digital collections and information. collaborate with faculty, students, and others outside the library in developing and presenting digital collections. n be a comprehensive resource for the documentation, investigation, and interpretation of the complex realities of the las vegas metropolitan area and provide an international focal point for the study of las vegas as a unique urban and cultural phenomenon. facilitate real and digital access to materials and information that document the historical, cultural, social, and environmental setting of las vegas and its region by identifying, collecting, preserving, and managing information and materials in all formats. identify unique collections that strengthen current collections of national and international significance in urban development and design, gaming, entertainment, and architecture. develop new access tools and enhance the use of current bibliographic and metadata utilities to provide access to physical and digital collections. develop web-based digital projects and exhibits based upon the collections. an associated capital campaign case statement associated with the strategic plan lists several gift opportunities that would benefit various aspects of the unlv libraries; several of these include gift ideas related to the digitization of materials. n the white paper another important step in laying the groundwork for the digitization program was a comprehensive white paper authored by the recently hired digitization projects librarian. the finished paper was originally given to the dean of libraries and thereafter to the administrative cabinet, and eventually distributed to all library staff. the outline of this white paper is provided as appendix b. the purpose of the white paper was multifaceted. 
after a brief historical context, the white paper addressed perhaps the single most important aspect of a digitization program—program planning—developing the strategic goals of the program, selecting and prioritizing projects though a formal decision-making process, and managing initiatives from idea to reality through efficient project teams. this first topic addressing the core values of the program had a strong educational purpose for the entire library staff—the ultimate audience of the paper. as part of its educational goal, the white paper enumerated the various strengths of digitization and why an institution 120 information technology and libraries | september 2009 would want to sustain a digitization program (providing greater worldwide access to unique materials, promoting and supporting education and learning when integrated with the curriculum, etc.). it defined distinctions between an ephemeral digital exhibit and a long-term published and maintained collection. it discussed the various components of a digital collection—images, multimedia, metadata, indexing, thematic presentation (and the preference to be unbiased), integration with other digital collections and the library website, etc. it posited important questions on sustenance and assessment, and defined concepts such as refreshing of data and migration of data to help set the stage for future philosophical discussions. given the myriad reasons one might want to publish a digital collection, checked by the reality that all the reasons and advantages may not be realized or given equal importance, the white paper listed several scenarios and asked if each scenario was a strong underlying goal for our program—in short, true or false: n “the libraries are interested in digitizing select unique items held in our collection and providing access to these items in new formats.” n “the libraries are interested in digitizing whole runs of an information resource for access in new formats.” n “the libraries should actively pursue funding to support major digitization initiatives.” n “the libraries should take advantage of the unique publicity, promotion, and marketing opportunities afforded by a digital project/program.” continuing with a purpose of defining boundaries of the new program, the paper asked questions related to audience, required skill sets, and resources. the second primary topic introduced the selection and prioritization of the items and ideas suggested for digitization. it posed questions related to content criteria (why does this idea warrant consideration? would complex or unique metadata be required from a subject specialist?) and listed various potential evaluative measures of project ideas (should we do this if another library is already doing a very similar project?). technical criteria considerations were enumerated, touching on interoperability of collections in different formats, technical infrastructure considerations, and so on. multiple simultaneous ideas beg for prioritization, and the white paper proposed a formal review process and the library staff and skill sets that would help make such a process successful. the third primary topic focused on the details of carrying an approved idea to reality, and strengthened the educational purpose of the white paper. 
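as a concrete illustration of the metadata component named above, a single item record in a digital collection typically carries a set of descriptive elements; dublin core is the schema commonly used by digital collection software such as contentdm. the python sketch below is hypothetical—the element names follow unqualified dublin core, but every value, including the identifier, is invented for illustration rather than drawn from an actual unlv collection.

# hypothetical item-level record using unqualified dublin core element names;
# all values below are invented for illustration.
item_record = {
    "dc:title": "fremont street at night, ca. 1948",
    "dc:creator": "photographer unknown",
    "dc:subject": ["las vegas (nev.)", "neon signs", "street photography"],
    "dc:description": "black-and-white photograph of fremont street after dark.",
    "dc:date": "1948",
    "dc:type": "image",
    "dc:format": "image/tiff",
    "dc:identifier": "demo_0001",   # hypothetical identifier
    "dc:rights": "for research and educational use only",
}

# the kind of controlled-vocabulary check a metadata specialist might apply
ALLOWED_TYPES = {"image", "text", "sound", "moving image", "physical object"}
assert item_record["dc:type"] in ALLOWED_TYPES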
it described the general planning steps for an approved project and included a list of typical steps involved with most digital projects—scanning; creating metadata, indexes, and controlled vocabulary; coding and designing the web interface; loading records into unlv libraries’ contentdm system; publicizing the launch of the project; and assessing the project after completion. one unlv library digitization survey question was related to thirteen such skills the unlv libraries identified as critical for a successful digitization program. the question asked respondents to rate skill levels possessed by personnel at their library, based on a five-point scale (from one to five: “no expertise,” “very limited expertise,” “working knowledge/enough to get by,” “advanced knowledge,” and “tremendous expertise”). neither “no expertise” nor “very limited expertise” garnered the highest number of responses for any of the skills. the overall rating average of all thirteen skills was 3.79 out of 5. the skills with the highest rating averages were “metadata creation/cataloging” 4.4 and “digital imaging/document scanning/post image processing/photography” with 4.27. the skills with the lowest rating averages were “marketing and promotion” with 2.95 followed by “multimedia formats” with 3.33. the unlv libraries’ white paper contained several appendixes that likely provided some of the richest content of the white paper. with the educational thrust completed, the appendixes drew a roadmap of “where do we want to go from here?” this roadmap suggested the revitalization of an overarching digital projects advisory committee, potential members of the committee, and functions of the committee. the committee would be responsible for soliciting and prioritizing ideas and tracking the progress of approved ideas to publication. the appendixes also proposed project teams (which would exist for each project), likely members of the project teams, and the functions of the project team to complete day-to-day digitization activities. the liaison between the digital projects advisory committee and the project team would be the digitization projects librarian, who would always serve on both. the last page of the white paper provided an illustration highlighting the various steps proposed in the lifecycle of a digital project—from concept to reality. n digitization workshops several months after the white paper had been shared, the next step in restructuring the program and building momentum was sponsoring two forums on digitization. the first one occurred in november 2006 and included two speakers brought in for the event, roy tennant (formerly user services architect with the california digital library and now with oclc) and ann lally (head of the digital initiatives program at the university of washington libraries). this session consisted of a success factors and strategic planning | lampert and vaughan 121 two-hour presentation and q&a to which all library staff were invited, followed by two breakout sessions. all three sessions were moderated by the digitization projects librarian. questions from these sessions are provided in appendix c. the breakout sessions were each targeted to specific departments in the unlv libraries. the first focused on providing access to digital collections (definitions of digital libraries, standards, designing useful metadata, accessibility and interoperability, etc.). 
the second focused on components of a well-built digital library (goals of a digitization program, content selection criteria, collaboration, evaluation and assessment, etc.). colleagues from other libraries in nevada were invited, and the forum was well attended and highly praised. the sessions were recorded and later made available on dvd for library staff unable to attend. this initial forum accomplished two important goals. first, it was an allstaff meeting offering a chance to meet, explore ideas, and learn from two well-known experts in the field. second, it offered a more intimate chance to talk about the technical and philosophical aspects of a digitization program for those individuals in the unlv libraries associated with such tasks. as a momentum-building opportunity for the digitization program, the forum was successful. the second workshop occurred in april 2007. to gain initial feedback on several digitization questions and to help focus this second workshop, we sent out a survey to several dozen library staff—those that would likely play some role at some point in the digitization program. the survey contained questions focused on several thematic areas: defining digital libraries, boundaries to the digitization program, users and audience, digital project design, and potential projects and ideas. it contained thirteen questions consisting of open-ended response questions, questions where the respondent ranked items on a five-point scale, and “select all that apply”–type questions. we distributed the survey to invitees to the second workshop, approximately three dozen individuals; of those, eighteen (about 50 percent) responded to most of the questions. the survey was closely tied to the white paper and meant to gauge early opinions on some of the questions posed by that paper. whereas the first workshop included some open q&a, the second session was structured as a hands-on workshop to answer some of the digitization questions and to illustrate the complexity of prioritizing projects. the second workshop began with a status update on the retooling of the unlv libraries’ digitization program. this was followed by an educational component that focused on a diagram that detailed the workflow of a typical digitization project and who was involved and that emphasized the fact that there is a lot of planning and effort needed to bring an idea to reality. in addition, we discussed project types and how digital projects can vary widely in scope, content, and purpose. finally, we shared general results from the aforementioned survey to help set the stage for the structured hands-on exercises. the outline for this second workshop is provided in appendix d. one question of the unlv library digitization survey asked, “on a scale of 1 to 5, how important are each of the factors in weighing whether to proceed with a proposal for a new digital collection project, or enhancement of an existing project?” eight factors were listed, and the fivepoint scale was used (from one to five: “not important,” “less important,” “neutral,” “important,” and “vitally important”). the average rating for all eight factors was 3.66. the two most important factors were “collection includes unique items” (4.49 average rating) and “collection includes items for which there is a preservation concern or to make fragile items more accessible to the public” (3.95 average rating). 
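the importance ratings just described, like the skill ratings reported earlier, are simple averages over a five-point scale. a minimal python sketch of that calculation follows; the scale labels are those given above, but the response tally is invented for illustration and does not reproduce actual survey data.

# five-point scale from the survey question described above
SCALE = {1: "not important", 2: "less important", 3: "neutral",
         4: "important", 5: "vitally important"}

tally = {1: 2, 2: 4, 3: 9, 4: 15, 5: 9}   # hypothetical response counts for one factor

def rating_average(tally):
    """weighted mean of a likert tally: sum(point * count) / total responses."""
    total = sum(tally.values())
    return sum(point * count for point, count in tally.items()) / total

print(f"average rating: {rating_average(tally):.2f} out of {max(SCALE)}")   # 3.64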
the factors with the lowest average ratings were “collection includes integration of various media into a themed presentation” (2.54 average rating) followed by “collection involves a whole run of an information resource (i.e., such as an entire manuscript, newspaper run, etc.” (3.39 average rating). the earlier arl survey asked a somewhat related question, “what is/has been the purpose of these digitization efforts? check all that apply.” of the six possible responses (which differed somewhat from those in the unlv library digitization survey), the most frequent responses were “improved access to library collections,” “support for research,” and “preservation.”17 the earlier survey also asked the question, “what are the criteria for selecting material to be digitized? check all that apply.” the most frequent responses were “subject matter,” “material is part of a collection being digitized,” and “rarity or uniqueness of the item(s).”18 the first exercise of the second digitization workshop focused on digital collection brainstorming. the authors provided a list of ten project examples and asked each of the six tables (with four colleagues each) to prioritize the ideas. afterward, a speaker from each table presented the prioritizations and defended their rankings. this exercise successfully illustrated to peers in attendance that different groups of people have different ideas about what’s important and what constitutes prime materials for digitization. the rankings from the varying tables were quite divergent. a related question asked of the arl libraries in the unlv library digitization survey was “from where have ideas originated for existing, published digital collection at your library?” and offered six choices. respondents could mark multiple items. the most chosen answer (92.7 percent) was “special collections, archives, or library with a specialized collection or focus.” the least chosen answer (51.2 percent) was “an external donor, friend of the library, community user, etc.” for the second part of the workshop exercise, each table came up with their own digital collection ideas, defined the audience and content of the proposal, and defended and 122 information technology and libraries | september 2009 explained why they thought these were good proposals. fourteen unique and varied ideas were proposed, most of which were tightly focused on las vegas and nevada, such as “history of las vegas,” “unlv yearbooks,” “las vegas gambling and gamblers,” and “african american entertainers in las vegas.” other proposals were less tied to the area, such as a “botany collection,” “movie posters,” “children’s literature,” “architecture,” and “federal land management.” this exercise successfully showed that ideas for digital collections stretch across a broad spectrum, as broad as the individual brainchilden themselves. finally, in the last digitization workshop exercise, each table came up with specialties, roles, and skills of candidates who could potentially serve on the proposed committee, and defended their rationale—in other words, committee success factors. this exercise generated nineteen skills seen as beneficial by one or more of the group tables. at the end of the workshop, we asked if others had alternate ideas to the proposed committee. none surfaced, and the audience thought such a committee should be reestablished. 
this second workshop concluded with a brief discussion on next steps—drafting a charge for the committee, choosing members, and a plug for the expectation of subject liaisons working with their respective areas to help better identify opportunities for collaboration on digital projects across campus. n toward the future digital projects currently maintained by the unlv libraries include both static web exhibits in the tradition of unlv’s first digitization efforts, as well as several searchable contentdm–powered collections. the unlv libraries have also sought to continue collaborative efforts, participating as project partners for the western waters digital library (phase 1) and continuing in a regional collaboration as a hosting partner in the mountain west digital library. partnerships were shown in the unlv library digitization survey to garner increased buy-in for projects, with one respondent commenting that faculty partnerships had been “the biggest factor for success of a digital library project.” institutional priorities at unlv libraries reflect another respondent’s comment regarding “interesting archival collections” as a success factor. one recently launched unlv collection is the showgirls collection (2006), focused on a themed collection of historical material about las vegas entertainment history.19 another recently launched collection, the nevada test site oral history project (2008), recounts the memories of those affiliated with and affected by the nevada test site during the era of cold war nuclear testing and includes searchable transcripts, selected audio and video clips, and scanned photographs and images.20 with general library approval, the restructured digitization projects advisory committee was established in july 2007 with six members drawn from library technologies, special collections, the subject specialists, and at large. the advisory committee has drafted and gained approval for several key documents to help govern the committee’s future work. this includes a collection development policy for digitization projects and a project proposal form to be completed by the individual or group proposing an idea for a digital collection. at the time of writing, the committee is just now at the point of advertising the project proposal form and process, and time will tell how successful these documents prove. in the unlv library digitization survey, 65.4 percent responded that a digitization mission statement or collection development policy was in place at their institution. one goal at unlv is to “ramp up” the number of simultaneous digitization projects underway at any one time at unlv. many items in the special collections are ripe for digitization. many of these are uncataloged, and digitizing such collections would help promote these hidden treasures. related to ramping up production, one unlv library digitization survey question asked, “on average over the past three years, approximately how many new digital collections are published each year?” responses ranged from zero new collections to sixty. the average number of new collections added each year was 6.4 for the 32 respondents who gave exact numerical answers. while this is perhaps double the unlv libraries’ current rate of production, it illustrates that increasing production is an achievable goal. staffing and funding for the unlv libraries’ digitization program have both seen increases over the past several years. 
a new application developer was hired, and a new graphics/multimedia specialist filled an existing vacancy. together, these staff have helped with projects such as modifying contentdm templates, graphic design, and multimedia creation related to digital projects, in addition to working on other web-based projects not necessarily related to the digitization program. another position has had its job focus shifted toward usability for all things web-based, including digitization projects. in terms of funding, the two most recent projects at the unlv libraries are both the result of successful grants. the recently launched nevada test site oral history project was the result of two grants from the u.s. departments of education and energy. subsequently, a $95,000 lsta grant proposal seeking to digitize key items related to the history of southern nevada from 1900 to 1925 was funded for 2008–9, with the resulting digital collection publicly launched in may 2009. this collection, southern nevada: the boomtown years, contains more than 1,500 items from several institutions, focused on the heyday of mining town life in southern nevada during the early twentieth century.21 this grant funded four temporary positions: a metadata specialist, an archivist, a digital projects intern, and an education consultant to help tie the digitized collection into the k–12 curriculum. grants will likely play a large role in the unlv libraries' future digitization activities. the unlv library digitization survey asked, "has your institution been the recipient of a grant or gift whose primary focus was to help efforts geared toward digitization of a particular collection or to support the overall efforts of the digitization program?" the question sought to determine whether grants had played a role, and if so, whether it was primarily large grants (defined as > $100,000), small grants (< $100,000), or both. the largest share of responses (46.2 percent) indicated that a combination of both small and large grants had been received in support of a project or the program. an additional 25.6 percent indicated that large grants had played a role, and 23.1 percent indicated that one or more small grants had played a role. two respondents (5.1 percent) indicated that no grants had been received or that they had not applied for any grants. the earlier arl survey asked the question, "what was/is the source of the funds for digitization activities? check all that apply." of seven possible responses, "grant" was the second most frequent response, trailing only "library."22 with an eye toward the future, the survey administered to arl libraries asked two blunt questions summarizing its overall thrust. one of the final open-ended survey questions asked, "what are some of the factors that you feel have contributed to the success of your institution's digitization program?" forty respondents offered answers, ranging from a single item to several. several responses along the same general themes surfaced, which could be organized into rough clusters.
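before walking through those clusters, a minimal illustration of how such open-ended answers might be tallied into rough themes is sketched below. the theme labels, keywords, and sample responses are hypothetical and purely illustrative; the published clusters appear to have been grouped by the authors' own reading of the responses, not by any automated process.

# illustrative only: a simple keyword tally over free-text survey answers.
# theme labels, keywords, and sample responses are hypothetical.
from collections import Counter

themes = {
    "administrative support": ["administration", "dean", "administrative"],
    "faculty collaboration": ["faculty", "provost", "partner"],
    "staff commitment": ["staff", "team", "commitment", "dedication"],
    "content strength": ["content", "collection", "unique"],
}

def tally(responses):
    counts = Counter()
    for text in responses:
        lowered = text.lower()
        for theme, keywords in themes.items():
            # count each theme at most once per response
            if any(k in lowered for k in keywords):
                counts[theme] += 1
    return counts

sample = ["support of the dean", "strong collaboration with faculty partners"]
print(tally(sample))  # Counter({'administrative support': 1, 'faculty collaboration': 1})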
in general, support from library administration was mentioned by a dozen respondents, with such statements as “consistent interest on the part of higher level administration,” “having support for the digitization program at an administrative level from the very beginning,” “good support from the library administration,” “support of the dean,” and, mentioned multiple times in the same precise language, “support from library administration.” faculty collaboration and interest across campus was mentioned by ten respondents, evidenced by statements such as “strong collaboration with faculty partners,” “support of faculty and other partners,” “interest from faculty,” “heavily involving faculty in particular . . . ensures that we can have continued funding since the faculty can lobby the provost’s office,” and “grant writing partnerships with faculty.” passionate individuals involved with the program and/or support from other staff in the libraries were mentioned by ten respondents, with comments such as “program management is motivated to achieve success,” “a strong department head,” “individual staff member ’s dedication to a project,” “commitment of the people involved,” “team work, different departments and staff willing to work together,” and “supportive individuals within the library.” having “good” content to digitize was mentioned by seven respondents, with statements such as “good content,” “collection strength,” “good collections,” and “availability of unique source materials.” strategic plan or goals integration was mentioned in several responses, such as “strong financial commitment from the strategic plan” and “mainstreaming the work of digital collection building into the strategic goals of many library departments.” successful grants and donor cultivation were mentioned by four respondents. other responses were more unique, such as one respondent’s one-word response—“luck”—and other responses such as “nimbleness, willingness, and creativity,” and “a vision for large-scale production, and an ability to achieve it.” the final unlv library digitization survey question asked, “what are the biggest challenges for your institution’s digitization program?” thirty-nine respondents provided feedback, and again, several variations on a theme emerged. 
the most common response, unsurprisingly, was "not enough staffing," mentioned by eighteen respondents, with responses such as "lack of support for staffing at all necessary levels," "the real problem is people, we don't have enough staff," "limited by staff," and "we need more full-time people." following this was a likely related response, "funding," mentioned by another nine respondents, with statements such as "funding for external digitization," "identifying enough funding to support conversion," "we could always use more money," and, succinctly, "money." related to staffing specifically, six responses focused on technical staff or support from technical staff, such as "need more it (information technology) staff," "need support from existing it staff," "not enough application development staff," and "limited technical expertise." prioritization and demand issues surfaced in six responses, such as "prioritizing efforts now that many more requests for digital projects have been submitted," "prioritization," "can't keep up with demand," and "everyone wants to digitize everything." workflow was mentioned in four responses, such as "workflow bottlenecks," "we need to simplify the process of getting materials into the repository," and "it takes far longer to describe an object than to digitize it, thus creating bottlenecks." "not enough space" was mentioned by three respondents, and "maintaining general librarywide staff support for the program" was mentioned by two respondents. the unlv libraries will keep in mind the experiences of our colleagues, as few, if any, libraries are likely immune to similar issues. ■ conclusions the unlv library digitization survey revealed, not surprisingly, that not all libraries, even those of high stature, are created equal. many have struggled to some extent in growing and sustaining their digitization programs. many have numerous published projects; others have few or perhaps even none. administrative and colleague support varies, as does funding. additional questions remain to be tackled at the unlv libraries. how precisely will we define success for the digitization program? by the number of published collections? by the number of successful grants executed? by the number of image views or metadata record accesses? by the frequency of press coverage and word-of-mouth praise from colleagues? ideas abound, but no definitive answers exist as of yet. at a larger level, other questions are looming. as libraries continue to promote themselves as relevant in the digital age, and as a (or the) central partner in student learning, to what degree will libraries' digital collections be tied into the educational curriculum, whether at their own affiliated institutions or with k–12 in their own states and beyond? clearly the profession is changing, with library schools creating courses and certificate programs in digitization. discussions about the integration of various information silos, metadata crosswalking, and item exposure in other online systems used by students will continue. libraries' digitized collections are primary resources in such discussions. while these questions persist, it is hoped that, at a minimum, the unlv libraries have established the foundational structure to foster what we hope will be a successful digitization program. references
1. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2002 report," may 23, 2002, www.imls.gov/publications/techdig02/2002report.pdf (accessed mar. 1, 2009). 2. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2006 report," jan. 2006, www.imls.gov/resources/techdig05/technology%2bdigitization.pdf (accessed mar. 1, 2009). 3. rebecca mugridge, managing digitization activities, spec kit 294 (washington, d.c.: association of research libraries, 2006): 11. 4. ross housewright and roger schonfeld, "ithaka's 2006 studies of key stakeholders in the digital transformation in higher education," aug. 18, 2008, www.ithaka.org/research/ithakas%202006%20studies%20of%20key%20stakeholders%20in%20the%20digital%20transformation%20in%20higher%20education.pdf (accessed mar. 1, 2009). 5. ibid. 6. university of nevada, las vegas university libraries, "jeanne russell janish, botanical illustrator: landscapes of china and the southwest," oct. 17, 2006, http://library.unlv.edu/speccol/janish/index.html (accessed mar. 1, 2009). 7. university of nevada, las vegas university libraries, "early las vegas," http://digital.library.unlv.edu/early_las_vegas/earlylasvegas/earlylasvegas.html (accessed mar. 1, 2009). 8. kenning arlitsch and jeff jonsson, "aggregating distributed digital collections in the mountain west digital library with the contentdm multi-site server," library hi tech 23, no. 2 (2005): 221. 9. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2006 report." 10. michael boock and ruth vondracek, "organizing for digitization: a survey," portal: libraries and the academy 6, no. 2 (2006), http://muse.jhu.edu/journals/portal_libraries_and_the_academy/v006/6.2boock.pdf (accessed mar. 1, 2009). 11. mugridge, managing digitization activities, 12. 12. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2006 report." 13. brad eden, "managing and directing a digital project," online information review 25, no. 6 (2001), www.emeraldinsight.com/insight/viewpdf.jsp?contenttype=article&filename=html/output/published/emeraldfulltextarticle/pdf/2640250607.pdf (accessed mar. 1, 2009). 14. mugridge, managing digitization activities, 32–33. 15. boock and vondracek, "organizing for digitization: a survey." 16. university of nevada, las vegas university libraries, "university libraries strategic goals and objectives," june 1, 2005, www.library.unlv.edu/about/strategic_goals.pdf (accessed mar. 1, 2009). 17. mugridge, managing digitization activities, 20. 18. ibid., 48. 19. university of nevada, las vegas university libraries, "showgirls," http://digital.library.unlv.edu/showgirls/ (accessed mar. 1, 2009). 20. university of nevada, las vegas university libraries, "nevada test site oral history project," http://digital.library.unlv.edu/ntsohp/ (accessed mar. 1, 2009). 21. university of nevada, las vegas university libraries, "southern nevada: the boomtown years," http://digital.library.unlv.edu/boomtown/ (accessed may 15, 2009). 22. mugridge, managing digitization activities, 40. appendix a. unlv library digitization survey responses 1. is the digitization program or digitization activities referenced in your library's strategic plan?
answer options (41 responses total) response percent response count yes 63.4 26 no 7.3 3 not specifically, but implied 22.0 9 our library doesn’t have a strategic plan 7.3 3 2. how would you characterize current support for digitization by your library’s administration? answer options (42 responses total) response percent response count very strong support, top priority 31.0 13 consistently supportive 40.5 17 neutral 14.3 6 minimal support, 7.1 3 very little support, or some resistance 7.1 3 3. how would you characterize support for digitization in your library by the majority of those providing content for digitization projects (i.e., regardless of whether those providing content have as a primary or a minor responsibility provisioning content for digitization projects)? answer options (44 responses total) response percent response count very strong support, top priority 15.9 7 consistently supportive 65.9 29 neutral 13.6 6 minimal support 2.3 1 very little support, or some resistance 2.3 1 126 information technology and libraries | september 2009 4. what year do you feel your library published its first “major” digital collection? major is defined as this was the first project deemed as having permanence and which would be sustained; it has associated metadata, etc. if you do not know, you may estimate or type “unknown.” responses ranged from 1990 to 2007. 5. to date, approximately how many digital collections has your library published? (please do not include ephemeral exhibits that may have existed in the past but no longer are present or sustained.) responses ranged from 1 to 1,000s. the great majority of responses were under 100; four responses were between 100 and 200, and one response was “1,000s.” success factors and strategic planning | lampert and vaughan 127 6. on average over the past 3 years, approximately how many new digital collections are published each year? all but two responses ranged from 0 to 10. one response was 13, one was 60. 7. what hosting platform(s) do you use for your digital collections (e.g., contentdm, etc.)? 8. does your institution have an institutional repository (e.g., dspace)? answer options (41 responses total) response percent response count yes 73.2 30 no 26.8 11 9. if the answer was “yes” in question 5, is your institutional repository using the same software as your digital collections? answer options (30 responses total) response percent response count yes 26.7 8 no 73.3 22 128 information technology and libraries | september 2009 10. is there an individual at your library whose central job responsibility is the development, oversight, and management of the library’s digitization program? (for purposes of this survey, central job responsibility means that 50 percent or more of the employee’s time is dedicated to digitization activities.) answer options (38 responses total) response percent response count yes 78.9 30 no 21.1 8 11. are there regular, full-time staff at your library who have as their primary or one of their primary job responsibilities support of the digitization program? for this question, a primary job responsibility means that at least 20 percent of their normal time is spent on activities directly related to supporting the digitization program or development of a digital collection. 
(mark all that apply) answer options (39 responses total) response percent response count digital imaging/document scanning, post-image processing, photography 82.1 32 metadata creation/cataloging 79.5 31 archival research of documents included in a collection(s) 28.2 11 administration of the hosting server 53.8 21 grant writing/donor cultivation/program or collection marketing 23.1 9 project management 61.5 24 multimedia formats 25.6 10 database design and data manipulation 53.8 21 maintenance, customization, and/or configuration of digital asset management software or features within that software (e.g., contentdm) 64.1 25 programming languages 30.8 12 web design and development 71.8 28 usability 25.6 10 marketing and promotion 28.2 11 none of the above 2.6 1 12. approximately how many individuals not on the full-time library staff payroll (i.e., student workers, interns, fieldworkers, volunteers) are currently working on digitization projects? answers ranged from 0 to “approximately 46.” the majority of responses (24) fell between 0 and 10 workers; twelve responses indicated more than 10; several responses indicated “unknown.” success factors and strategic planning | lampert and vaughan 129 13. has your library funded staff development, training, or conference opportunities that directly relate to your digitization program and activities for one or more library staff members? answer options (41 responses total) response percent response count yes, frequently, one or more staff have been funded by library administration for such activities 48.8 20 yes, occasionally, one or more staff have been funded by library administration for such activities 51.2 21 no, to the best of my knowledge, no library staff member has been funded for such activities 0.0 0 14. where does the majority of digitization work take place? answer options (41 responses total) response percent response count centralized in the library (majority of content digitized using library staff and equipment in one department) 48.8 20 decentralized (majority of content digitized in multiple library departments or outside the library by other university entities) 12.2 5 through vendors or outsourcing 7.3 3 hybrid of approaches depending on project 31.7 13 15. on a scale of 1 to 5 (1 being least important and 5 being vitally important), how important are each of the factors in weighing whether to proceed with a proposal for a new digital collection project or enhancement of an existing project? answer options (41 responses total) not important less important neutral important vitally important rating average response count collection includes item(s) for which there is a preservation concern or to make fragile item(s) more accessible to the public 0 1 9 22 9 3.95 41 collection includes unique items 0 0 1 19 21 4.49 41 collection involves a whole run of an information resource (e.g., an entire manuscript, newspaper run, etc.) 
2 5 11 21 2 3.39 41 130 information technology and libraries | september 2009 answer options (41 responses total) not important less important neutral important vitally important rating average response count collection includes the integration of various media (i.e., images, documents, audio) into a themed presentation 7 11 17 6 0 2.54 41 collection has a direct tie to educational programs and initiatives (e.g., university courses, statewide education programs, or k–12 education) 3 3 6 17 12 3.78 41 collection supports scholarly communication and/or management of institutional content 1 4 7 21 8 3.76 41 collection involves a collaboration with university colleagues 1 3 9 18 10 3.83 41 collection involves a collaboration with entities external to the university (e.g., public libraries, historical societies, museums) 2 4 11 19 5 3.51 41 16. from where have ideas originated for existing, published digital collections at your library? in other words, have one or more digital collections been the brainchild of one of the following? (mark all that apply) answer options (41 responses total) response percent response count library subject liaison or staff working with teaching faculty on a regular basis 75.6 31 library administration 65.9 27 special collections, archives, or library with a specialized collection or focus 92.7 38 digitization program manager 63.4 26 university staff or faculty member outside the library 68.3 28 an external donor, friend of the library, community user, etc. 51.2 21 (continued from previous page) success factors and strategic planning | lampert and vaughan 131 17. to whom are new projects first proposed to be evaluated for digitization consideration? answer options (38 responses total) response percent response count to an individual decision-maker 23.7 9 to a committee for review by multiple people 42.1 16 no formal process 34.2 13 18. how are approved projects ultimately prioritized? answer options (37 responses total) response percent response count by a single decision-maker 18.9 7 by a committee for review by multiple people 54.1 20 by departments or groups outside of the library 0.0 0 no formal process 27.0 10 19. are digitization program mission statements, selection criteria, or specific prioritization procedures in use? answer options (40 responses total) response percent response count yes, one or more of these forms of documentation exist detailing process 67.5 27 yes, some criteria are used but no formal documentation exists 25.0 10 no documented process in use 7.5 3 20. what general evaluation criteria do you employ to measure how successful a typical digital project is? (mark all that apply) answer options (39 responses total) response percent response count log analysis showing utilization/record views of digital collection items 69.2 27 analysis of feedback or survey responses associated with the digital collection 38.5 15 publicity generated by, or citations referencing, digital collection 46.2 18 e-commerce sales or reproduction requests for digital images 12.8 5 we have no specific evaluation measures in use 33.3 13 132 information technology and libraries | september 2009 21. has your institution been the recipient of a grant or gift whose primary focus was to help efforts geared toward digitization of a particular collection or to support the overall efforts of the digitization program? 
answer options (39 responses total) response percent response count we have received one or more smaller grants or donations (each of which was $100,000 or less) to support a digital collection/program 23.1 9 we have received one or more larger grants or donations (each of which was greater than $100,000) to support a digital collection/program 25.6 10 we have received a mix of small and large grants or donations to support a digital collection/program 46.2 18 we have been unsuccessful in receiving grants or have not applied for any grants—grants and/or donations have not played any role whatsoever in supporting a digital collection or our digitization program 5.1 2 22. how would you rate the overall level of buy-in for collaborative digitization projects between the library and external partners (an external partner is someone not on the full-time library staff payroll, such as other university colleagues, colleagues from other universities, etc.)? answer options (41 responses total) response percent response count excellent 41.5 17 good 39.0 16 neutral 4.9 2 minimal 7.3 3 low or none 0.0 0 not applicable—our library has not yet published or attempted to publish a collaborative digital project involving individuals outside the library 7.3 3 23. when considering the content available for digitization, which of the following statements apply? (mark all that apply) answer options (40 responses total) response percent response count at my institution, there is a lack of suitable library collections for digitization 0.0 0 content providers regularly contact the digitization program with project ideas 52.5 21 the main source of content for new digitization projects comes from special collections, archives, other libraries with specialized collections (maps, music, etc.), or local cultural organizations (historical societies, museums) 87.5 35 success factors and strategic planning | lampert and vaughan 133 answer options (40 responses total) response percent response count the main source of content for new digitization projects comes from born digital materials (such as dissertations, learning objects, or faculty research materials) 32.5 13 content digitization is mainly limited by available resources (lack of staffing, space, equipment, expertise) 47.5 19 obtaining good content for digitization can be challenging 7.5 3 24. various types of expertise are important in collaborative digitization projects. please rate the level of your local library staff’s expertise in the following areas (1–5 scale, with 1 having no expertise and 5 having tremendous expertise). 
answer options (41 responses total) no expertise very limited expertise working knowledge/ enough to “get by” advanced knowledge tremendous expertise n/a rating average response count digital imaging/ document scanning, post image processing, photography 0 1 3 21 16 0 4.27 41 metadata creation/ cataloging 0 0 2 20 18 0 4.40 40 archival research of documents included in a collection 0 2 6 15 16 2 4.15 41 administration of the hosting server 1 2 7 16 15 0 4.02 41 grant writing/ donor cultivation 1 4 13 13 8 2 3.59 41 project management 0 1 9 23 8 0 3.93 41 multimedia formats 0 5 21 10 4 1 3.33 41 database design and data manipulation 0 4 9 14 13 1 3.90 41 (continued from previous page) 134 information technology and libraries | september 2009 answer options (41 responses total) no expertise very limited expertise working knowledge/ enough to “get by” advanced knowledge tremendous expertise n/a rating average response count digital asset management software (e.g., contentdm) 3 0 5 21 11 0 3.93 40 programming languages 4 3 14 9 11 0 3.49 41 web design and development 2 1 13 10 15 0 3.85 41 usability 1 7 12 13 8 0 3.49 41 marketing and promotion 2 11 17 7 3 1 2.95 41 25. what are some of the factors that you feel have contributed to the success of your institution’s digitization program? survey responses were quite diverse because respondents were speaking to their own perceptions and institutional experience. the general trend of responses are discussed in the body of the paper. 26. what are the biggest challenges for your institution’s digitization program? survey responses were quite diverse because respondents were speaking to their own perceptions and institutional experience. the general trend of responses are discussed in the body of the paper. appendix b. white paper organization i. introduction ii. current status of digitization projects at the unlv libraries iii. topic 1: program planning a. are there boundaries to the libraries digitization program? what should the program support? b. what resources are needed to realize program goals? c. who is the user or audience? d. when selecting and designing future projects, how can high-quality information be presented in online formats incorporating new features while remaining un-biased and accurate in service provision? e. to what degree do digitization initiatives need their own identity versus heavily integrating with the libraries’ other online components, such as the general website? f. how do the libraries plan on sustaining and evaluating digital collections over time? g. what type of authority will review projects at completion? how will the project be evaluated and promoted? iv. topic 2: initiative selection and prioritization a. project selection: what content criteria should projects fall within in order to be considered for digitization and what is the justification for conversion of the proposed materials? (continued from previous page) success factors and strategic planning | lampert and vaughan 135 b. project selection: what technical criteria should projects fall within in order to be considered for digitization? c. project selection: how does the project relate to, interact with, or complement other published projects and collections available globally, nationally, and locally? d. project selection and prioritization: after a project meets all selection criteria, resources may need to be evaluated before the proposal reaches final approval. 
what information needs to be discussed in order to finalize the selection process, select between qualified project candidates, and begin the prioritization process for approved proposals? e. project prioritization: should we develop a formal review process? v. topic 3: project planning a. what are the planning steps that each project requires? b. who will be responsible for the different steps in the project plan and department workload? c. how can the libraries provide rich metadata and useful access points? d. what type of web design will each project require? e. what type of communication needs to exist between groups during the project? vi. concluding remarks vii. related links and resources cited viii. white paper appendixes a. working list of advisory committee functions and project workgroup functions b. contentdm software: roles and expertise c. project team workflow d. contentdm elements appendix c. first workshop questions general questions 1. how do you define a digital library? do the terms “repository,” “digital project,” “exhibit,” or “online collection” connote different things? if so, what are the differences, similarities, and boundaries for each? 2. what factors have contributed to a successful digitization program at your institution? did anything go drastically wrong? were there any surprises? what should new digitization programs be cautious and aware of? 3. what is the role, specifically, of the academic library in creating digital collections? how is digitization tied to the mission of your institution? 4. why digitize and for whom? do digital libraries need their own mission statement or philosophy because they differ from physical collections? should there be boundaries to what is digitized? 5. what standards are most widely in use at this time? what does the future hold? are there new standards you are interested in? technical questions, metadata questions 1. what are some of the recommended components of digital library infrastructure that should be in place to support a digitization program (equipment, staff, planning, technical expertise, content expertise, etc?) 2. what are the relationships between library digitization initiatives, the library website, the campus website or portal, and the web? in what ways do these information sources overlap, interoperate, or require boundaries? 3. how do you decide on what technology to use? what is the decision-making process when implementing a new technology? 4. standards are used in various ways during digitization. what is the importance of using standards, and are there areas where standards should be relaxed, or not used at all? how do digitization programs deal with evolving standards? 5. preservation isn’t talked about as much as it used to be. what’s your solution or strategy to the problem of preserving digital materials? 6. will embedded metadata ever be the norm for digital objects, or will we continue to rely on collection management like contentdm to link digital objects to their associated metadata? 136 information technology and libraries | september 2009 appendix d. second workshop outline 1. introduction—purpose/focus of the meeting a. to talk about next steps in the digitization program b. quick review of the current status and where the program has been c. serve to further educate participants on the steps involved in taking a project idea to reality d. 
goals for participants: understand types of projects and project prioritization; engage in activities on ideas and prioritization; talk about process and discuss committee; open forum 2. staff digitization survey discussion a. “defining digital libraries” b. “boundaries to the digitization program” c. “users and audience” d. “digital project design” e. “potential projects and ideas” 3. first group exercise: digital project idea ranking and defense of ranking 4. second group exercise: digital project idea brainstorming and defense of ideas brainstormed 5. concept/proposal for a digitization advisory committee 6. conclusion and next steps collections and design questions 1. how do you decide what should be included in a digital library? does the digital library need a collection development policy and if so, what type? how are projects prioritized at your institution? 2. how do you decide who your user is? are digital libraries targeting mobile users or other users with unique needs? what value-added material compliments and enhances digital collections (i.e., item-level metadata records, guided searches, narrative or scholarly content, teaching material, etc.)? 3. how should digital libraries be assessed and evaluated? how do you gauge the success of a digital collection, exhibit, or library? what has been proven and disproved in the short time that libraries have been doing digital projects? 4. what role do digital libraries play in marketing the library? how do you market your digital collections? are there any design criteria that should be considered for the web presence of digital libraries (should the digital library look like the library website, the campus website, or have a unique look and feel)? 5. do you have any experience partnering with teaching faculty to create digital collections? how are collaborations initiated? are such collaborations a priority? what other types of collaborations are you involved in now? how do you achieve consensus with a diverse group of collaborators? to what degree is centralization important or unnecessary? 2 information technology and libraries | march 2009 andrew k. pace president’s message: lita now andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. a t the time of this writing, my term as lita president is half over; by the time of publication, i will be in the home stretch—a phrase that, to me, always connotes relief and satisfaction that is never truly realized. i hope that this time between ala conferences is a time of reflection for the lita board, committees, interest groups, and the membership at large. various strategic planning sessions are, i hope, leading us down a path of renewal and regeneration of the division. of course, the world around us will have its effect—in particular, a political and economic effect. first, the politics. i was asked recently to give my opinion about where the new administration should focus its attention regarding library technology. i had very little time to think of a pithy answer to this question, so i answered with my gut that the united states needs to continue its investment in it infrastructure so that we are on par with other industrialized nations while also lending its aid to countries that are lagging behind. furthermore, i thought it an apt time to redress issues of data privacy and retention. 
the latter is often far from our minds in a world more connected, increasingly through wireless technology, and with a user base that, as one privacy expert put it, would happily trade a dna sample for an extra value meal. i will resist the urge to write at greater length a treatise on the bill of rights and its status in 2008. i will hope, however, that lita’s technology and access and legislation and regulation committees will feel reinvigorated post–election and post–inauguration to look carefully at the issues of it policy. our penchant for new tools should always be guided and tempered by the implementation and support of policies that rationalize their use. as for the economy, it is our new backdrop. one anecdotal view of this is the number of e-mails i’ve received from committee appointees apologizing that they will not be able to attend ala conferences as planned because of the economic downturn and local cuts to library budgets. libraries themselves are in a paradoxical situation—increasing demand for the free services that libraries offer while simultaneously facing massive budget cuts that support the very collections and programs people are demanding. what can we do? well, i would suggest that we look at library technology through a lens of efficiency and cost savings, not just from a perspective of what is cool or trendy. when it comes to running systems, we need to keep our focus on end-user satisfaction while considering total cost of ownership. and if i may be selfish for a moment, i hope that we will not abandon our professional networks and volunteer activities. while we all make sacrifices of time, money, and talent to support our profession, it is often tempting when economic times are hard to isolate ourselves from the professional networks that sustain us in times of plenty. politics and economics? though i often enjoy being cynical, i also try to make lemonade from lemons whenever i can. i think there are opportunities for libraries to get their own economic bailout in supporting public works and emphasizing our role in contributing to the public good. we should turn our “woe-are-we” tendencies that decry budget cuts and low salaries into championed stories of “what libraries have done for you lately.” and we should go back to the roots of it, no matter how mythical or anachronistic, and think about what we can do technically to improve systemwide efficiencies. i encourage the membership to stay involved and reengage, whether through direct participation in lita activities or through a closer following of the activities in the ala office of information technology policy (oitp, www.ala.org/ala/aboutala/offices/oitp) and the ala washington office itself. there is much to follow in the world that affects our profession, and so many are doing the heavy lifting for us. all we need to do sometimes is pay attention. make fun of me if you want for stealing a campaign phrase from richard nixon, but i kept coming back to it in my head. in short, library information technology— now more than ever. 5 tails wagging dogs a funny thing happened on the way to the form. in the past decade, many libraries believed they were developing or using automated systems to produce catalog cards, or order slips, or circulation control records. the trauma of aacr2 implementation has helped many to realize belatedly that they have, in fact, been building data bases. libraries must relate their own machine-readable records to each other in a new way as they face new applications. 
further methods of relating and using records from different libraries, and even different networks, are becoming necessities in our increasingly interdependent world. a narrow view of the process of creating records has often resulted in the introduction of nonstandard practices that provide the required immediate result but create garbage in the data base. in effect, letting the tails wag the dogs. for many years, john kountz and the tesla (technical standards for library automation) committee addressed this issue forcefully, but were as voices in the wilderness. the problems created are the problems of success. the expectations libraries have developed have outstripped their practices. many libraries are only now seriously addressing the practices they have used to create data bases that already contain hundreds of thousands of records. precisely because of its success, the oclc system is a useful case in point. in general, oclc has adhered closely to marc standards. in call number and holding fields, national standards have been late in coming, and libraries have often improvised. meeting the procrustean needs of catalog cards has ofttimes blinded libraries to the long-term effects of their practices. multiple subfield codes to force call number "stacking" and omission of periods from lc call numbers are two examples of card-driven practice. not following the recommended oclc practice of fully updating the record at each use has created archive tapes requiring significant manual effort to properly reflect library holdings. variant branch cataloging practices create dilemmas. some malpractices have resulted from attempts to beat pricing algorithms. some, like retaining extraneous fields or accepting default options when they are incorrect, merely reflect laziness or shortsighted procedures. while implementing systems in the present, libraries must keep a weather eye to the future. what new requirements will future systems place on records being created today? brian aveney helping the hacker? library information, security, and social engineering samuel t. c. thompson (sthompson@collier-lib.org) is a public service librarian at the collier county public library, naples, florida. social engineering is the use of nontechnical means to gain unauthorized access to information or computer systems. while this method is recognized as a major security threat in the computer industry, little has been done to address it in the library field. this is of particular concern because libraries increasingly have access to databases of both proprietary and personal information. this tutorial is designed to increase the awareness of library staff in regard to the issue of social engineering. one morning the phone rings at the circulation desk; the assistant, joyce, answers. "seashore branch public library, how may we help you?" she asks, smiling. "my wife and i recently moved and i wanted to confirm that you had our current address," a pleasant male voice responds. "could you give me your name please?" "the card is in my wife's name, jennifer greene. we've been so busy with the move that she hasn't had a chance to catch up with everything." "okay, i have her information here. 123 main street, apartment 2b. is that correct?" "thank you so much, that's it. do you have our new number or is it still 555-555-1234 in your records?" "let me see . . . no, i think we have your new number." "could you read it back to me?" "sure . . . 555-555-6789, is that right?" "555-555-6789 . . . that's right. thank you very much, you've been very helpful." "no problem, that's what we're here for." what just happened?
what happened to joyce may have been exactly what it appeared to be—a conscientious spouse trying to make sure information was updated after a move. but what else could it have been—research for an identity theft, or a stalker trying to get personal information? we have no way of knowing. all reasons except for the first, innocent, reason are covered by the term social engineering. in the language of computer hackers, social engineering is a nontechnical hack. it is the use of trickery, persuasion, impersonation, emotional manipulation, and abuse of trust to gain information or computer-system access through the human interface. regardless of an institution's commitment to computer security through technology, it is vulnerable to social engineering. recently, the institute of management and administration (ioma) reported social engineering as the number-one security threat for 2005. according to ioma, this method of security violation is on the rise due to continued improvements in technical protections against hackers.1 why and how does social engineering work? the first thing to keep in mind about social engineering is that it does work. kevin mitnick, possibly the best-known hacker of recent decades, carried out most of his questionable activities through the medium of social engineering.2 he did not need to use his technical expertise because it was easier to just ask for the information he wanted. he discovered that people, when questioned appropriately, would give him the information he wanted. social engineering succeeds because most people work under the assumption that others are essentially honest. as a pure matter of probability, this is true; the vast majority of communications that we receive during the day are completely innocent in character. this fact allows the social engineer to be effective. by making seemingly innocuous requests for information, or by making requests in a way that seems reasonable at the time, the social engineer can gather the information that he or she is looking for. methods of social engineering the arsenal of the social engineer is large and very well established, mainly because social engineering amounts to a variation on confidence trickery, an art that goes back as far as human history can recall. one might argue that homer's iliad contains the first record of a social engineering attack in the form of the trojan horse. direct requests many social-engineering methods are complex and require significant planning. however, there is a simpler approach that is often just as effective: the social engineer contacts his or her target and simply asks for the information. preying on trust and emotion social engineering is a method of gaining information through the persuasion of human sources, based on the abuse of trust and the manipulation of emotion. in his book the art of deception, mitnick makes the argument that once a social engineer has established the trust of a contact, all security is effectively voided and the social engineer can gather whatever information is required. the most common method of targeting computer end-users is through the manipulation of gratitude.
in these cases, a social engineer, usually impersonating a technician, contacts a user and states that there is something wrong on the victim’s end, and that the social engineer needs a few pieces of information to “help” the user. appreciative of the assistance, the victim provides the necessary information to the helpful caller or carries out the requested actions. predictably, no problem ever existed and the victim has now provided the social engineer either access to a computer system or with the information needed to gain that access. a counterpoint to the manipulation of gratitude is the manipulation of sympathy. this method is most often used on information providers such as help-desk personnel, technicians, and library staff members. in this scenario, a social engineer contacts a victim and claims to have either lost information, is out of contact with a normal source, or is simply ignorant of something that he or she should know. as anyone can empathize with this plea, the victim is often all too willing to provide the information sought by the social engineer. using these methods—taking advantage of the gratitude, sympathy, and empathy of their victims—social engineers are able to achieve their aims. impersonation because forming trust relationships with their victims is critical to a socialengineering attack, it is not surprising that social engineers often pretend to be someone or something that they are not. two of the major tools of impersonation are (1) speaking the language of the victim institution and (2) knowledge of personnel and policy. to allay suspicion, a social engineer needs to know and be able to use an institution’s terminology. being unable to do so would cause the victim to suspect, rather than trust, the social engineer. with a working knowledge of an organization’s particular vocabulary, a social engineer can phrase his or her request in terms that will not rouse alarm with the intended victim. the other major goal of a social engineer in preparing a successful impersonation is to develop a familiarity with the “lay of the land,” i.e., the specifics of and personnel within an organization. for instance, a social engineer needs to discover who has what authority within an organization so as to understand for whom he or she needs to claim to speak. research to establish trust in their victims, social engineers use research as a tool. this comes in two forms, background research and cumulative research. background research is the process by which a social engineer uses publicly available resources to learn what to ask for, how to ask for it, and whom to ask it of. while the intent and goal of this research differs from the techniques used by students, librarians, and other members of the population, the actual process is the same. cumulative research is the process by which a social engineer gathers the information that he or she needs to make more critical requests of their victims. the facts that a social engineer seeks through cumulative research may seem without value to the casual observer, but put together properly, they are anything but that. questions can include names of staff, internal phone numbers, procedures, or seemingly minor technical details about the library’s network (e.g., what operating system are you running?). late in the afternoon the phone at the reference desk rings. marcy, the librarian on duty answers, “reference desk.” “hi there, this is dave simpson calling from information services at the main branch. 
sorry about the echo, i'm working in the cabling closet at the moment, so i'm calling you on my cell phone." "no problem, i can hear you fine. what can i do for you?" "thanks. a lot of the branches have been having network problems over the last few days. has everything been okay at the seashore branch reference desk?" "i think so." "okay, that's good. i'm running a test right now on the network and needed to find a terminal that was behaving itself. could you log off and let me know if any messages come up?" "no problem." marcy logs off the reference computer; nothing strange happens. "just the usual messages." "good. now start logging back on. what user are you going in as? i mean, which login name are you using?" "searef. okay, i'm logged on now." "no strange messages?" "nothing." "that's great. look, our problem might be kids hacking into the system, so i need you to change the password. do you know how to do that?" "i think so." "well, let me walk you through it." dave spends a couple of minutes walking marcy through changing the system password. the password is now changed to 5ear3f, a moderately secure password. "thanks, marcy. you've been a great help. we have your new password logged into the system. could you pass on the new password to the other reference personnel?" "sure." "wonderful. just remember not to give the password out to anyone who doesn't need it, and don't write it down where anyone who shouldn't have it can get at it. have a great day." "you too." why are libraries vulnerable? libraries are vulnerable to social-engineering attacks for two major reasons: (1) ignorance and (2) institutional psychology. the first of these difficulties is the easiest to address. the ignorance of library professionals in this matter is easily explained—there is very little literature to date about the issue of social engineering directed at library personnel. what exists is usually mixed into larger articles on general security issues and receives little focus. this lack of concern about social engineering can also be seen in the computer professional literature, where it is dwarfed by the volume of articles concerning technical security issues. this is a curious gap, considering the high rate of occurrence of this kind of attack. is it because many technical professionals are less comfortable with a social issue—one that can only be solved through people—than with a technical security issue that can be solved through the development or implementation of proper software?3 unfortunately, not knowing about a method of security violation leaves one vulnerable to that method. it is incumbent on librarians, computer administrators, and security professionals to be aware of these issues. the second factor is harder to address but equally important. unlike almost any other profession, librarians are expected to fulfill their patrons' informational needs without question or bias. this laudable goal makes librarians vulnerable to social-engineering attacks because the inquiries made by a social engineer about the information resources available at a library may be used for nefarious purposes. a reference interview over these issues may be very successful from the point of view of both parties involved, as the librarian fills the open-ended inquiries of the social engineer, and the social engineer receives much, if not all, of the information that he or she needs to violate the library's internal information systems.
why libraries can be targets at this point, it is relevant to ask why security violators would even bother with library computer networks. what do libraries have that is worth possibly committing a crime to get? personal information is probably the most tempting target in a library computer system. libraries possess databases of names, addresses, and other personal data about library cardholders. this information is valuable, and not all of it is easily available from public sources. as may be seen in the section of this article on techniques, such information could be used as an end unto itself or as a stepping stone to security violations in other systems. subscriptions to proprietary databases are quite expensive, as any acquisitions librarian will explain. given the high prices and limited licensing, a hacker may want to gain access to these information resources. this could be a casual hacker who wants access to a library-only resource from his or her home computer, or it may be a criminal who wishes to steal intellectual property from a database provider. libraries often have broadband access designed for a large network (e.g., t1). as these lines are very expensive, few individuals can afford them. at the same time, these broadband lines have immense capacity for downloading information from other networks. there are many reasons why a hacker would seek to use such a resource illicitly. for instance, a casual hacker may want to download a large number of bootlegged movie files, or a criminal may wish to download a corporate database. with access to a library's high-bandwidth internet line, these actions can be carried out quickly and with a minimized chance of detection. libraries possess large numbers of computers due to their increasing automation. these computer resources can, if compromised, be used as anonymous remote computers by hackers. called "zombies," compromised computers could be used to deliver illegal spam, to launch distributed denial-of-service (ddos) attacks, or to serve as distribution points for illegal materials. if library computers are used in this way, there is a potential for a library to face legal responsibility for the actions of its computers or for the questionable materials found on them. prevention the tools needed to prevent social engineering from succeeding are awareness, policy, and training. these tools feed into one another—we become aware of the possibility of social-engineering attacks, develop policy to communicate these concerns to others, and then train others in these policies to protect them and their libraries from social engineering. libraries should have a simple set of policies to help prevent social engineering from affecting them. this policy need not be long; ideally, it should be a small page of bullet points that are easy to remember or to post near telephones. what is important is that it is easy to remember and implement when a call or e-mail comes in.4 basic guidelines for protection against social engineering ■ be suspicious of unsolicited communications asking about employees, technical information, or other internal details. ■ do not provide passwords or login names over the phone or via e-mail, no matter who claims to be asking. ■ do not provide patron information to anyone but the patron in person, and only upon presentation of the patron's library card or other proper identification. ■ if you are not sure whether a request is legitimate, contact the appropriate authorities.
■ trust your instincts. if you feel suspicious about a question or communication, there is probably a good reason. ■ document and report suspicious communications. in closing social engineering is an immensely effective method of breaching computer and network security. it is, however, entirely dependent on the ability of the social engineer to persuade staff members to provide information or access that they should not provide. with care and good information policies, we can prevent social engineering from working. after all, do we really want to be helping the hacker? the circulation desk phone rings. joyce answers, "seashore branch public library, how may we help you?" "hi there, i'm worried that i haven't turned in all the books i have out, and i really don't want to get stuck with a fine. could you tell me what i have out?" "no problem. what is your name?" "sean grey." joyce brings up sean grey's circulation records, then remembers the library's information policy and decides to ask another question: "could you give me your library card number?" "i don't have that with me. i really don't want to get stuck with those fines." "i'm sorry, mr. grey, to preserve patron privacy we can only give out circulation information if you give us your card number or if you are here in person with your card or id." "but i just want to avoid a fine. can't you help?" "don't worry; if you are late by accident on occasion, we are willing to forgive a fine." "so you can't give me my records?" "i'm sorry, but we have to protect patron privacy. i'm sure you understand." "i guess so. goodbye." "have a good day." ■ references 1. institute of management & administration, "six security threats that will make headlines in '05," ioma's security director's report 5, no. 1 (2004): 1–14. 2. k. manske, "an introduction to social engineering," security management practices (nov./dec. 2000): 53–59. 3. m. mcdowell, "cyber-security tip st04-014," 2005, http://www.us.cert.gov/cas/tips/st04-014.html (accessed june 5, 2005). 4. k. mitnick and w. simon, the art of deception (indianapolis: wiley, 2002). marc truitt editorial: the space in between, or, why ital matters marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. ■■ the space in between in my opinion, ital has an identity crisis. it seems to try in many ways to be scholarly like jasist, but lita simply isn't as formal a group as asist. on the other end of the spectrum, code4lib is very dynamic, informal and community-driven. ital kind of flops around awkwardly in the space in between. —comment by a respondent to ital's reader survey, december 2009 last december and january, you, the readers of information technology and libraries, were invited to participate in a survey aimed at helping us learn your likes and dislikes about ital, and where you'd like to see this journal go in terms of several important questions. the responses provide rich food for reflection about ital, its readers, what we do well and what we don't, and our future directions. indeed, we're still digesting and discussing them, nearly a year after the survey. i'd like to use some of my editorial space in this issue to introduce, provide an overview of, and highlight a few of the most interesting results. i strongly encourage you to access the full survey results, which i've posted to our weblog italica (http://ital-ica.blogspot.com/); i further invite you to post your own thoughts there about the survey results and their meaning. we ran the survey from mid-december to mid-january.
a few responses trickled in as late as mid-february. the survey invitation was sent to the 2,614 lita personal members; nonmembers and ital subscribers (most of whom are institutions) were excluded. we ultimately received 320 responses—including two from individuals who confessed that they were not actually lita members—for a response rate of 12.24 percent. thus the findings reported below reflect the views of those who chose to respond to the survey. the response rate, while not optimal, is not far from the 15 percent that i understand lita usually expects for its surveys. as you may guess, not all respondents answered all questions, which accounts for some small discrepancies in the numbers reported. who are we? in analyzing the survey responses, one of the first things one notices is the range and diversity of ital’s reader base, and by extension, of lita’s membership. the largest groups of subscribers identify themselves either as traditional systems librarians (58, or 18.2 percent) or web services/development librarians (31, or 9.7 percent), with a further cohort of 7.2 percent (23) composed of those working with electronic resources or digital projects. but more than 20 percent (71) come from the ranks of library directors and associate directors. nearly 15 percent (47) identify their focus as being in the areas of reference, cataloguing, acquisitions, or collection development. see figure 1. the bottom line is that more than a third of our readers are coming from areas outside of library it. a couple of other demographic items: ■■ while nearly six in ten respondents (182, or 57.6 percent) work in academic libraries, that still leaves a sizable number (134, or 42.3 percent) who don’t. more than 14 percent (45) of the total 316 respondents come from the public library sector. ■■ nearly half (152, or 48.3 percent) of our readers indicated that they have been with lita for five years or fewer. note that this does not necessarily indicate the age or number of years of service of the respondents, but it’s probably a rough indicator. still, i confess that this was something of a surprise to me, as i expected larger numbers of long-time members. and how do the numbers shake out for us old geezers? the 6–10 and greater-than-15-years cohorts each composed about 20 percent of those responding; interestingly, only 11.4 percent (36) answered that they’d been lita members for between 11 and 15 years. assuming that these numbers are an accurate reflection of lita’s membership, i can’t help but wonder about the explanation for this anomaly.” see figure 2. how are we doing? question 4 on the survey asked readers to respond to several statements: “it is important to me that articles in ital are peerreviewed.” more than 75 percent (241, or 77.2 percent) answered that they either “agreed” or “strongly agreed.” “ital is timely.” more than seven in ten respondents (228, or 73.0 percent) either “agreed” or “strongly agreed” that ital is timely. only 27 (8.7 percent) disagreed. as a technology-focused journal, where time-to-publication is always a sensitive issue, i expected more dissatisfaction on this question (and no, that doesn’t mean that i don’t worry about the nine percent who believe we’re too slow out of the gate). marc truitt editorial: the space in between, or, why ital matters marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 
"i use information from ital in my work and/or i find it intellectually stimulating." by a nearly identical margin to that regarding timeliness, ital readers (226, or 72.7 percent) either "agreed" or "strongly agreed" that they use ital in their work or find its contents stimulating.

"ital is an important benefit of lita membership." an overwhelming majority (248, or 79.78 percent) of respondents either "agreed" or "strongly agreed" with this statement.1 this perception clearly emerges again in responses to the questions about whether readers would drop their lita membership if we produced an electronic-only or open-access ital (see below).

where should we be going?

several questions sought your input about different options for ital as we move forward. question 7, for example, asked you to rank how frequently you access ital content via several channels, with the choices being "print copy received via membership," "print copy received by your institution/library," "electronic copy from the ital website," or "electronic copy accessed via an aggregator service to which your institution/library subscribes (e.g., ebsco)." the choice most frequently accessed was the print copy received via membership, at 81.1 percent (228).

question 8 asked about your preferences in terms of ital's publication model. of the 307 responses, 60.6 percent (186) indicated a preference for continuance of the present arrangement, whereby we publish both paper and electronic versions simultaneously. four in ten respondents preferred that ital move to publication in electronic version only.2 of those who favored continued availability of paper, the great majority (159, or 83.2 percent) indicated in question 9 that they simply preferred reading ital in paper. those who advocate moving to electronic-only do so for more mixed reasons (question 10), the most popular being cost-effectiveness, timeliness, and the environmental friendliness of electronic publication.

a final question in this section asked that you respond to the statement "if ital were to become an electronic-only publication i would continue as a dues-paying member of lita." while a reassuring 89.8 percent (273) of you answered in the affirmative, 9.5 percent (29) indicated that you would likely quit lita, with narrative explanations that clearly underscore the belief that ital—especially a paper ital—is viewed by many as an important benefit of membership. the following comments are typical:

■■ "lita membership would carry no benefits for me."
■■ "dues should decrease, though." [from a respondent who indicated he or she would retain lita membership]
■■ "ital is the major benefit to me as we don't have funds for me to attend lita meetings or training sessions."
■■ "the paper journal is really the only membership benefit i use regularly."
■■ "actually my answer is more, 'i don't know.' i really question the value of my lita membership. ital is at least some tangible benefit i receive. quite honestly, i don't know that there really are other benefits of lita membership."

question 12 asked about whether ital should continue with its current delayed open-access model (i.e., the latest two issues embargoed for non-lita members), or go completely open-access. by a three-to-two margin, readers favored moving to an open-access model for all issues. in the following question that asked whether respondents would continue or terminate lita membership were ital to move to a completely open-access publication model, the results were remarkably similar to those for the question linking print availability to lita membership, with the narrative comments again suggesting much the same underlying reasoning.

in sum, the results suggest to me more satisfaction with ital than i might have anticipated; at the same time, i've only scratched the surface in my comments here. the narrative answers in particular—which i have touched on in only the most cursory fashion—have many things to say about ital's "place," suggestions for future articles, and a host of other worthy ideas. there is as well the whole area of crosstabbing: some of the questions, when analyzed with reference to the demographic answers in the beginning of the survey, may highlight entirely new aspects of the data. who, for instance, favors continuance of a paper ital, and who prefers electronic-only?

but to come back to that reader's comment about ital and "the space in between" that i used to frame this discussion (indeed, this entire column): to me, the demographic responses—which clearly show ital has a substantial readership outside of library it—suggest that that "space in between" is precisely where ital should be. we may or may not occupy that space "awkwardly," and there is always room for improvement, although i hope we do better than "flop around"! the results make clear that ital's readers—who would be you!—encompass the spectrum from the tech-savvy early-career reader of code4lib journal (electronic-only, of course!) to the library administrator who satisfies her need for technology information by taking her paper copy of ital along when traveling. elsewhere on that continuum, there are reference librarians and catalogers wondering what's new in library technology, and a traditional systems librarian pondering whether there is an open-source discovery solution out there that might breathe some new life into his lipstick-on-a-pig ils. somewhere else there's a library blogger who fends off bouts of insomnia by reading "wonky" ital papers in the wee hours of the morning. and that ain't the half of it, as they say. in short—in terms of readers, interests, and preferences—"the space in between" is a pretty big niche for ital to serve. we celebrate it. and we'll keep trying our best to serve it well.

figure 1. professional position of lita members
figure 2. years of lita membership

■■ departures

as i write these lines in late september, it's been a sad few weeks for those of us in the ital family. in mid-august, former ital editor jim kopp passed away following a battle with cancer. last week, dan marmion—jim's successor as editor (1999–2004)—and a dear friend to many of us on the current ital editorial board—also left us, the victim of a malignant brain tumor. i never met jim, but lita president karen starr eulogized him in a posting to lita-l on august 16, 2010.3 i noted dan's retirement due to illness in this space in march.4

i first met dan in the spring of 2000, when he arrived at notre dame as the new associate director for information systems and digital access (i think the position was differently titled then) and, incidentally, my new boss. dan arrived only six weeks after my own start there. things at notre dame were unsettled at the time: the libraries had only the year before successfully implemented exlibris' aleph500 ils, the first north american site to do so. while exlibris moved on to implementations at mcgill and the university of iowa, we at notre dame struggled with the challenges of supporting and upgrading a system then new to the north american market. it was not always easy or smooth, but throughout, dan always maintained an unflappable and collegial manner with exlibris staff and a quiet but supportive demeanor toward those of us who worked for him. i wish i could say that i understood and appreciated this better at the time, but i can't. i still had some growing ahead of me—i'm sure that i still do.

dan was there for me again as an enthusiastic reference when i moved on, first to the university of houston in 2003 and then to the university of alberta three years later. in these jobs i'd like to think i've come to understand a bit better the complex challenges faced by senior managers in large research libraries; in the process, i know i've come to appreciate dan's quiet, knowledgeable, and hands-off style with department managers. it is one i've tried (not always successfully) to cultivate.

while i was still at notre dame, dan invited me to join the editorial board of information technology and libraries, a group which over the years has come to include many "friends of dan," including judith carter (quite possibly the world's finest managing editor), andy boze (ital's webmaster), and mark dehmlow. while dan left ital in 2004, i think that he left the journal a wonderful and lasting legacy in these extremely capable and dedicated folks.

my fondest memories of dan concern our shared passion for model trains. i remember visiting a train show in south bend with him a couple of times, and our last time together (at the ala midwinter meeting in denver two years ago) was capped by a snowy trek with exlibris' carl grant, another model train enthusiast, to the mecca of model railroading, caboose hobbies. three boys off to see their toys—oh, exquisite bliss!

i don't know whether ital or its predecessor jola have ever reprinted an editorial, but while searching the archives to find something that would honor both jim and dan, i found a piece that i hope speaks eloquently of their contributions and to ital's reason for being. dan's editorial, "why is ital important?" originally published in our june 2002 issue, appears again immediately following this column. i think its message and the views expressed therein by jim and dan remain as valid today as they were in 2002. they also may help to frame my comments concerning our reader survey in the previous section. farewell, jim and dan. you will both be sorely missed.

notes and references

1. a number of narrative answers to the survey make it clear that ital readers who are lita members perceive a link between membership and receiving the journal. many of them appear to infer that a portion of their lita dues, then, are earmarked for the publication and mailing of ital. sadly, this is not the case. in years past, ital's income from advertising paid the bills and even generated additional revenue for lita coffers. today, the shoe is on the other foot because of declining advertising revenue, but ital is still expected to pay its own way, which it has failed to do in recent years. but to those who reasonably believe that some portion of their dues is dedicated to the support of ital, well, t'ain't so. bothered by this? complain to the lita board.

2. as a point of comparison, consider the following results from the 2000 ital reader survey. respondents were asked to rank several publishing options on a scale of 1 to 3 (with 1 = most preferred option and 3 = least preferred option): ital should be published simultaneously as a print-on-paper journal and an electronic journal (n = 284): 1 = 169 (59.5%); 2 = 93 (32.7%); 3 = 22 (7.7%). ital should be published in an electronic form only (n = 293): 1 = 55 (18.8%); 2 = 61 (20.8%); 3 = 177 (60.4%). in other words, then as now, about 60% of readers preferred paper and electronic to electronic-only.

3. karen starr, "fw: [libs-or] jim kopp: celebration of life," online posting, aug. 16, 2010, lita-l, http://lists.ala.org/sympa/arc/lita-l/2010-08/msg00079.html (accessed sept. 29, 2010).

4. marc truitt, "dan marmion," information technology & libraries 29 (mar. 2010): 4, http://www.ala.org/ala/mgrps/divs/lita/ital/292010/2901mar/editorial_pdf.cfm (accessed sept. 29, 2010).

news and announcements

redi or not . . . "public libraries and the remote electronic delivery of information (redi)," a working meeting, was held in columbus, ohio, on monday and tuesday, march 23 and 24, 1981. the meeting, jointly sponsored by the public library of columbus and franklin county (ohio) and oclc, inc., considered the issues that public libraries must examine before becoming involved in electronic information services. subjects explored included technology, communications, information providers, information users, social implications, and financial, legal, and regulatory responsibilities. tom harnish, program director of oclc's home delivery of library services program, was moderator of the two-day event.
participants at the conference represented a variety of public libraries from throughout the u.s., including new york, georgia, texas, california, colorado, and illinois. don hammer represented lita at the meeting; mary jo lynch of the ala office for research also attended. "geographic distances," said harnish, "were the only points of separation among the meeting participants. there was an overwhelming agreement on the concerns for the future of libraries and universal access to information in the electronic age."

on the second day of the conference it became apparent that the redi agenda could not be properly dealt with in two days. "we need an organization which will address these issues on an ongoing basis," said richard sweeney, executive director of plcfc. "librarians at the conference agreed to promote and lead the development of the electronic library. to that end, this group is seeking recognition by ala as a membership initiative group with a special interest in the electronic library."

the group's founders prepared the following mission statement for the membership initiative group: to ensure that information delivered electronically remains accessible to the general public, the electronic library association shall promote participation and leadership in the remote electronic delivery of information* (redi) by publicly supported libraries and nonprofit organizations. goals of the organization are to:

• identify services and information that are best suited to remote electronic delivery;
• plan, fund, and develop working demonstrations of library redi services;
• communicate the availability of electronic library services to the user community;
• inform the library profession of trends, specific events, and future directions of redi;
• create coalitions with organizations in allied fields of interest.

public libraries and nonprofit organizations with information interests, such as information and referral groups, are invited to join the electronic library association. the group plans to meet at the ala annual conference in san francisco. meeting details will be announced as soon as they are available. it was the goal of the "public libraries and the remote electronic delivery of information" meeting to provide the framework within which to address the myriad issues in redi. the electronic library group will validate the role of libraries in technology. ... redi or not, here we come.

*information delivered electronically where and when it is needed, in the library and elsewhere (home/office/off-site).

arl adopts plan for improving access to microforms

a plan aimed at improving bibliographic access to materials in microform by building a nationwide database of machine-readable records for individual titles in microform sets was approved in principle by the arl board of directors on january 30, 1981. the plan concentrates on monograph collections, and is aimed at providing records for individual titles in both current and retrospective sets. records added to the database will also aid cooperative efforts in preservation microfilming.
elements of the plan include:

• inputting of records conforming to accepted north american standards to the major bibliographic utilities by libraries and microform publishers;
• development of "profile matching" by the bibliographic utilities, permitting the cataloging of all individual titles in a series or microform collection with a single operation;
• cooperative cataloging of current and retrospective microform sets by libraries and publishers;
• compensation for publishers who input acceptable bibliographic records to the bibliographic utilities to offset loss of revenue from card set sales.

cooperation among libraries, publishers, networks, and others has been stressed throughout the development of the plan, and initiatives on a number of fronts are necessary and encouraged in order to accomplish the goal of improved bibliographic access to microforms. arl will seek outside funding for a program coordinator to facilitate implementation of the elements outlined above, and recruitment for the one-year position will begin shortly. the coordinator, advised by a committee of librarians (from arl and non-arl institutions) and microform publishers, will work with libraries, publishers, and the bibliographic utilities to help get the plan off the ground.

the plan is the result of a one-year study funded by a grant from the national endowment for the humanities and conducted for arl by richard boss of information systems consultants, inc. during the course of the year, he interviewed librarians, microform publishers, representatives of the bibliographic utilities, and others interested in bibliographic access to microforms, gradually building the plan from elements on which there was agreement and discarding ideas that were not widely accepted. the effort to build a consensus among the various interested parties was aided by the advisory committee, comprising both arl librarians and microform publishers, which assisted and advised throughout the course of the project. arl will publish the study this spring.

arl sponsorship of this project and its follow-up reflects the long-standing commitment the association has had to improving access to microforms. two earlier arl studies on improving bibliographic access contributed to the development of standards for descriptive cataloging of microforms, reinforced the importance of microforms for preserving and disseminating scholarly materials, and identified some of the problem areas that the current study has addressed. today, as the amount of materials in microform in arl libraries continues to grow (arl libraries hold more than 146,660,000 units of microform), improving access to these materials has taken on even greater urgency.

the association of research libraries is an organization of major research libraries in the united states and canada. members include the larger university libraries, the national libraries of both countries, and a number of public and special libraries with substantial research collections. there are at present 111 institutional members.

battelle studies using computers to access unpublished technical information

engineers may be able to use computers to store, call up, and otherwise display some technical information not currently published in professional journals as a result of a study recently begun by battelle's columbus laboratories.
in a four-month study sponsored by the american society of mechanical engineers (asme), battelle researchers are examining ways to use computers as an alternative to publications for communicating with the technical community. asme is a technical and educational organization with a membership of 100,000 individuals, including 17,000 student members. it conducts one of the largest technical publishing operations in the world, which includes codes, standards, and operating principles for industry.

according to battelle's gabor j. kovacs, certain types of information traditionally are not covered in monthly or quarterly technical journals, yet they often have widespread appeal among engineers. "recent advances in computer and telecommunications technologies, coupled with rapidly rising publication costs and postal rates, have created an ideal environment for organizations to consider using computers as an alternative mode of communication," kovacs said. "data bases can be used to maintain information that is impractical for conventional publication, and it is now possible to use them for many other types of communication as well."

during the study, researchers will determine the feasibility of using a computer database to disseminate to asme members such information as short articles dealing with design and applications data, catalog data, and teleconference messages. with the help of the asme, battelle specialists will define the information requirements for such a system. while technology is sufficiently advanced to accommodate virtually any type of information, costs can become prohibitive unless practical compromises are made, kovacs said. as part of the study, battelle researchers also will analyze the costs associated with systems of varying capabilities. researchers then will define several alternative database systems, which will include such attributes as:

• online, interactive retrieval features
• simple-to-use retrieval language
• user-aid features
• a minimum of seventy-five simultaneous users
• ability to send, store, and broadcast messages
• compatibility with a variety of hard copy and crts (cathode ray tube terminals)
• sixteen or more hours per day availability to accommodate different time zones
• a minimum of thirty-characters-per-second transmission rates

two of these alternative system designs (one representing a minimum capability and the other a maximum capability) then will be selected for further evaluation by battelle and the asme.

editorial: farewell and thank you

john webb

this issue of information technology and libraries (ital), december 2007, marks the end of my term as editor. it has been an honor and a privilege to serve the lita membership and ital readership for the past three years. it has been one of the highlights of my professional career. editing a quarterly print journal in the field of information technology is an interesting experience. my deadlines for the submission of copy for an issue are approximately three and a half months prior to the beginning of the month in which the issue is published; for example, my deadline for the submission of this issue to ala production services was august 15. therefore, most articles that can appear in an issue were accepted in final form at least five months before they were published. some are older; one was a baby at only four months old.
when one considers the rate of change in information technologies today, one understands the need for blogs, wikis, lists, and other forms of professional discourse in our field. what role does ital play in this rapidly changing environment? for one, unlike these newer forms, it is double-blind refereed. published articles run a peer review gauntlet. this is an important distinction, not least to the many lita members who work for academic institutions. it may be crass to state it so baldly, but publication in ital can help one earn tenure, an old-fashioned fact of life. it is indexed or abstracted in nineteen published sources, not all of them in english. many of its articles appear in various digital repositories and archives, and these also are harvested or indexed or both. in addition, its articles are cataloged in worldcat local. many of lita's most prominent members—your distinguished peers—have published articles in ital. the journal also serves as a source for the wider dissemination of sponsored research, a requirement of most grants. and you can read it on the bus or at the beach (heaven forbid!), in the brightest sunlight, or with a flashlight under the covers (though there are no reports of this ever having been observed).

i am amazed at how quickly these three years have passed, though that may be at least as much a function of my advanced age as of the fun and pleasure i have had as editor. certainly, these past three years have hosted some notable landmarks in our history. lita and ital both celebrated their fortieth anniversaries. sadly, the death of one of lita's founders and ital's first editor, frederick g. kilgour, on july 31, 2006, at age ninety-two, was a landmark in the passing of an era. oclc and rlg's merger, which fred lived to witness, was a landmark of a different sort—one of maturity, we hope. ital is now an electronic as well as a print journal. this conversion has had some rough passages, but i trust these will have been ironed out by the time you read this.

when i became editor, i had a number of goals for the journal, which i stated in my first editorial in march 2005. reading that editorial today, i realize that we successfully accomplished the concrete ones that were most important to me then: increasing the number of articles from library and i-school faculty; increasing the number that result from sponsored research; increasing the number that describe any relevant research or cutting-edge advancements; increasing the number of articles with multiple authors; and finding a model for electronic publication of the journal. the accomplishment of the most abstract and ambitious goal, "to make ital a destination journal of excellence for both readers and authors," only you, the readers and authors, can judge.

i thank mary taylor, lita executive director, and her staff for all of the support they provided to me during my term. i owe a debt that i can never repay to all of the staff of ala production services who worked with me these past three years. their patience with my sometimes bumbling ways was award-winning. thank all of you. the lita presidents and other officers and board members were unfailingly supportive, and i thank you all. in the lita organizational structure, the ital editor and the editorial board report to the lita publications committee, and the editor is a member of that body. i thank all of the chairs and other members of that committee for their support.
once more, and sadly for the last time, i thank all of the members of the ital editorial board who served during my term for their service and guidance. they perform more than their share of refereeing, but more importantly, as i have written before, they are the junkyard dogs who have kept me under control and prevented my acting on my worst instincts. i say again, you, the lita membership and ital readership, owe them more than you can ever guess. trust me.

to marc truitt, ital managing editor and the incoming ital editor for the 2008–2010 volume years, i must say, "thank you, thank you, thank you!" marc and the ala production services staff were responsible for the form, fit, and finish of the journal issues you received in the mail, held in your hands, and read under the covers. finally, most of all, thank you, authors whose articles, communications, and tutorials i have had the privilege to publish, and you whose articles have been accepted and await publication.

john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university, and editor of information technology and libraries.

not only is this the end of my term as editor, but i also have retired. from now on, my only role in the field of library and information technology will be as a user. those of you who have seen the movie the graduate probably remember the early scene when benjamin, the dustin hoffman character, receives the single word of advice regarding his future: "plastics." (i don't know if that scene is in the novel from which the movie was adapted.) my single word of advice to those of you too young or too ambitious to retire from our field is: "handhelds." i am surprised that my treo is more valuable to me now in retirement than it was when i was working. (i'm not surprised that my ipod video is, nor that word thinks that treo and ipod are misspellings.) i just wish that more of the web was as easily accessible on my treo as are google maps and almost all of yahoo!. handhelds. trust me.

rural public libraries and digital inclusion: issues and challenges

brian real, john carlo bertot, and paul t. jaeger

information technology and libraries | march 2014

abstract

rural public libraries have been relatively understudied when compared to public libraries as a whole. data are available to show that rural libraries lag behind their urban and suburban counterparts in technology service offerings, but the full meaning and effect of such disparities is unclear. the authors combine data from the public library technology and access study with data from smaller studies to provide greater insight to these issues. by filtering these data through the digital inclusion framework, it becomes clear that disparities between rural and nonrural libraries are not merely a problem of weaker technological infrastructure. instead, rural libraries cannot reach their full customer service potential because of lower staffing (but not lower staff dedication) and funding mechanisms that rely primarily on local monies. the authors suggest possible solutions to these disparities while also discussing the barriers that must be overcome before such solutions can be implemented.

introduction

despite their large numbers, rural public libraries in the united states are surprisingly understudied, particularly in terms of technology access.
the american library association (ala) and other professional organizations consider a public library to be small or rural if its population of legal service area is 25,000 or less. when viewed through this lens, rural public libraries1

• have on average less than one (.75) librarian with a master's degree from an ala-accredited institution;
• have an average of 1.9 librarians, defined as employees holding the title of librarian;
• have an average total of 4.0 staff, including both full- and part-time employees;
• have a median annual income (from all sources) of $118,704.50;
• have an average of 41,425 visits annually; and
• typically have one building or branch that is open an average of 40 hours/week.

brian real (breal@umd.edu) is a phd candidate in the college of information studies, john carlo bertot (jbertot@umd.edu) is co-director of the information policy and access center and professor in the college of information studies, and paul t. jaeger (pjaeger@umd.edu) is co-director of the information policy and access center and associate professor and diversity officer of the college of information studies, university of maryland, college park, maryland.

while these data suggest rural libraries operate on a smaller and less financially robust scale than their suburban and urban counterparts, the full effect of these discrepancies on service levels is unclear. this article uses various information sources to analyze the effect of these discrepancies on the ability of rural libraries to offer technology-based services. since the advent of the internet in the mid-1990s, public libraries have been key internet-access and technology-training providers for their communities. the ability to offer internet access alongside support and training for patrons using such technology is a primary indicator of libraries' value to their communities. by analyzing data from the 2012 public library funding and technology access survey (plftas), the authors found that rural libraries, on average, have weaker technological infrastructure (such as fewer average numbers of computers and slower broadband connections) and are able to offer fewer support services, such as training classes, than urban and suburban public libraries. with public libraries being the only source of broadband access for many patrons in rural communities, limitations for rural libraries may affect patrons' ability to fully participate in employment, education, government, and other central aspects of society. through analysis of the plftas data2 about technology access in rural public libraries in conjunction with other studies of rural libraries and librarians, this article explores the causes and effects of the relatively more limited technological and support infrastructures for rural patrons and communities.

method

as documented since 1994,3 public libraries were early adopters of internet-based technologies. the purpose of the plftas survey, and its previous iterations, is to identify public library internet connectivity; propose and promote public library internet policies at the federal level; maintain selected longitudinal data as to the connectivity, services, and deployment of the internet in public libraries; and provide national estimates regarding public library internet connectivity.
through changes in funding sources and frequency of administration over the past two decades, the survey has maintained core longitudinal questions (e.g., numbers of public access workstations, bandwidth), but consistently explored a range of emerging topics (e.g., jobs assistance, e-government, emergency roles). the survey's method has evolved over time to meet changing survey data goals. the 2012 survey provides both national and state estimates of public library internet connectivity, public access technologies, and internet-enabled services and resources.

the survey used a stratified "proportionate to size" sample to ensure a proportionate national sample, using the fy2009 imls public library dataset (formerly maintained by the us national center for education statistics) to draw its sample. strata included the states in which libraries resided and metropolitan status (urban, suburban, rural) designations. bookmobile and books by mail service outlets were removed from the file, leaving 16,776 library outlets. the study team drew a sample with replacement of 8,790 outlets, stratified and proportionate by state and metropolitan status.4 the survey received 7,252 responses for a response rate of 82.5%. using weighted analysis, the responses are used to generate national and state estimates for all public library outlets (minus bookmobiles and books by mail), both in the aggregate and by metropolitan status designation. unless otherwise noted, all data discussed in the article are from the 2012 study. that study, along with all previous public libraries and the internet and public library funding and technology access studies, additional analysis, and data products, are available at http://www.plinternetsurvey.org.

digital inclusion and the value of public libraries

digital inclusion is a useful framework through which one can understand the importance of ensuring individuals have access to digital technologies as well as the means to learn how to use them.5 digital inclusion comprises policies and actions that mitigate the significant, interrelated problems of the digital divide and digital literacy:

• digital divide implies the gap—whether based in socioeconomic status, education, geography, age, ability, language, or other factors—between individuals for whom internet access is readily available and those for whom it is not. indeed, even those with basic, dial-up internet access are losing ground as internet and computer technologies continue to advance, using increasing bandwidth and demanding high-speed ("broadband") internet access.
• digital literacy encompasses the skills and abilities necessary for access once the technology is available, including understanding the language and component hardware and software required to successfully navigate the technology.
• digital inclusion refers to policies developed to close the digital divide and promote digital literacy. it marries high-speed internet access (as dial-up access is no longer sufficient) and digital literacy in ways that reach various audiences, many of whom parallel those mentioned within the digital divide debate. to match the current policy language, digital inclusion will signify outreach to unserved and underserved populations.

since virtually every public library in the united states offers public internet access, these institutions are invaluable in promoting digital inclusion.
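to make the weighted analysis described in the method section above a little more concrete, the sketch below shows one simple way estimates can be produced from a stratified sample. this is a minimal, hypothetical illustration rather than the study's actual weighting procedure: it assumes plain design weights (outlets in a stratum divided by respondents in that stratum), and the function name and toy data are invented for the example.

# a minimal, hypothetical sketch of design-weighted estimation from a
# stratified sample; not the plftas study's actual weighting procedure
from collections import defaultdict

def weighted_estimates(population_counts, responses):
    """population_counts: {stratum: total library outlets in that stratum}
    responses: iterable of (stratum, answered_yes) pairs from respondents."""
    respondents = defaultdict(int)
    yeses = defaultdict(int)
    for stratum, answered_yes in responses:
        respondents[stratum] += 1
        if answered_yes:
            yeses[stratum] += 1
    estimated_yes = 0.0
    estimated_total = 0.0
    for stratum, n_outlets in population_counts.items():
        if respondents[stratum] == 0:
            continue  # no respondents in this stratum; a real study would adjust for this
        weight = n_outlets / respondents[stratum]  # each respondent "stands in" for this many outlets
        estimated_yes += yeses[stratum] * weight
        estimated_total += respondents[stratum] * weight  # equals n_outlets
    return estimated_yes, estimated_yes / estimated_total

# toy usage: estimate how many outlets offer a given service, and the share that do
population = {("md", "rural"): 120, ("md", "urban"): 60}  # invented counts
answers = [(("md", "rural"), True), (("md", "rural"), False), (("md", "urban"), True)]
count, share = weighted_estimates(population, answers)
print(round(count), round(share, 2))  # 120 0.67

real survey weighting typically also adjusts for nonresponse and other design features, so this should be read only as an outline of the general idea.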
however, the plftas data shows that not all libraries are equal, with rural public libraries lagging behind libraries in more populated areas in providing technology services. therefore this article focuses on the following issues and questions: • digital divide: why do rural individuals have less access to broadband technologies than their suburban and urban counterparts? how are rural libraries currently compensating for this deficit? rural public libraries and digital inclusion | real, bertot, and jaeger 9 • digital literacy: why do rural libraries offer less digital literacy training and patron support? how do rural libraries compare to libraries in more populated areas on key issues in digital literacy, such as employment and government information? • digital inclusion: what policies have been developed to help rural libraries close the digital divide and promote digital literacy, and what policies—including funding structures and decisions—hinder these libraries from adequately addressing these concerns? what governmental and extra-governmental policies can be enacted to help rural libraries to better promote digital inclusion? the following section describes the differences between rural libraries and their urban and suburban counterparts, combining plftas data with information from other studies to demonstrate how rural libraries are more essential in bridging the digital divide yet are seemingly doing less to promote digital literacy. following this, the authors discuss why rural libraries trail suburban and urban libraries in these areas, with studies suggesting the issue is a result of inadequate resources, not a lack of staff dedication. finally, the authors present a review of some of the initiatives that are attempting to bridge these divides, including suggestions that may help rural librarians to act as better advocates for their patrons’ needs. rural challenges to digital inclusion numerous studies, including plftas, show that rural libraries offer less technology access with slower connection speeds than libraries in more populated areas. these libraries also offer comparatively less formalized digital literacy training, although rural libraries still provide invaluable informal training in this area. this section highlights discrepancies between rural libraries and those in more populated areas. technology and service disparities between rural and nonrural libraries while almost every public library offers patrons internet access, 70.3% of rural libraries are the only free internet and computer terminal access providers in their service communities, compared to 40.6% of urban and 60.0% of suburban libraries.6 the disparity between these categories becomes more striking when one considers the difference between home broadband adoption in rural and nonrural areas. according to the pew research center’s home broadband 2010 survey, only 50% of rural homes have broadband internet access, compared to 70% of nonrural homes.7 this disparity is due in large part to the greater difficulty and cost of creating the infrastructure to support broadband internet access in more sparsely populated areas.8 with broadband access provided primarily by for-profit companies, little profit motive exists to expand services to areas where the infrastructure cost would not allow for a quick and efficient recouping of costs. 
the us government has attempted to address this problem in a numerous ways, including dedicating $7.2 billion to improving broadband access throughout the country through grants (broadband information technology and libraries | march 2014 10 technology opportunity programs; btop) and loans (broadband infrastructure projects; bip) as part of the american recovery and reinvestment act (arra) of 2009.9 expanding this infrastructure will take time, and at this time it is unknown as to the extent to which broadband access in rural communities, both in general and for public libraries, will increase. as the arra projects near completion, it will be important to conduct follow-up analysis of the effect in terms of access to broadband in the home and in anchor institutions such as public libraries, as well as the extent to which broadband subscriptions increased. at present, however, public libraries—and rural public libraries in particular—are still the primary source of broadband access for many americans, and this will likely remain true for large portions of the population for the foreseeable future. individuals in need of internet access have few options in many communities. though there are increasing free wireless (wi-fi) internet access sources in communities (e.g., coffee shops, food outlets), one needs to have a device (e.g., tablet, laptop) to use these options. in two-thirds of american communities, the public library is the only source of freely available public internet access inclusive of public access computers.10 specific government efforts to increase internet access, broadband networks, and digital literacy of the population, however, fail to involve public libraries in a meaningful way, if at all.11 to be fair, public libraries were eligible to compete for the grants or submit loan applications for the arra broadband funding initiatives, and public libraries in states such as alaska, arizona, colorado, idaho, maine, montana, nebraska, and others have benefited from this, primarily through inclusion in applications with multiple beneficiaries.12 since btop works as a grantmaking process, relatively few us public libraries (approximately 20%) have benefited from btop funding, but the results have been encouraging. for example, 85 libraries in mostly rural nebraska have upgraded their broadband capacity using btop funds, with broadband capacity for these locations increasing from an average of 2.9 mbps to an average of 18.2 mbps. other states have tried innovative ideas, such as the idaho department of labor’s youth corps program to train high school and college students to work as digital literacy coaches, and then deploy them to libraries around the state. indeed, the btop program has certainly created some encouraging results, but this is not a permanently funded program and it targets a limited number of libraries, so it cannot be considered as a primary, widespread solution to the digital inclusion gap between rural and more populated areas. the authors of a recent btop report note, “unless strategic investments in u.s. public libraries are broadened and secured, libraries will not be able to provide the innovative and critical services their communities need and demand.”13 thus btop may provide a good model to addressing gaps in digital inclusion, but it was never designed to be a permanent solution. 
this role of ensuring digital inclusion in communities has accelerated at a time of unprecedented austerity nationally and at the state and local levels of government in particular. based on bureau of labor statistics (http://www.bls.gov) data, the united states lost 584,000 public-sector jobs between june 2009 and april 2012, or 2.5% of the local, state, and federal government jobs that rural public libraries and digital inclusion | real, bertot, and jaeger 11 existed before the prolonged economic downturn began. according to the center on budget and policy priorities, state budget shortfalls have ranged from $107 billion to $191 billion between 2009 and 2012, and current projections place state budget shortfalls at $55 billion for 2013.14 the prolonged economic downturn, in part, has driven up library usage in some communities.15 even before the downturn began, public libraries in the rural areas typically had the oldest computing equipment, the slowest internet access speeds, and the lowest support levels from the federal government.16 as a part of becoming the main source of digital literacy training and digital inclusion, public libraries have also become a primary training provider for in-demand, technology-based job skills.17 the resulting situation forces public libraries to balance reduced support, increased demand, and a growing centrality in helping their communities recover from the economic downturn. at the center of both increased demand and increased support of digital literacy and inclusion lies sufficient internet access. in a survey of rural librarians in tennessee, respondents reported that their patrons’ most critical information need was broadband internet access.18 the respondents also ranked access to recent hardware technology and software, technology training, and help with specific tasks like applying for jobs or government benefits as highly critical. by comparison, the respondents ranked traditional services such as book loaning as the least critical duty, significantly trailing the abovementioned and other technology services. despite rural librarians viewing technology-based services as their most important function, however, rural libraries lack the resources to meet the same service quality as nonrural libraries. the ensuing section discusses the nature of those disparities. technology infrastructure and technology training virtually all public libraries offer their patrons access to the internet. there is no statistical difference between rural, suburban, and urban libraries in this regard.19 likewise, rural libraries only lag slightly in wireless internet availability, which is becoming increasingly important with the ubiquity of mobile technology devices; 86.3% of rural libraries have wireless access available for patrons, compared to an average of 90.5% across all three categories.20 and, in one of the few technological areas where rural libraries lead their nonrural counterparts, 42.3% of rural libraries reported they had sufficient public access computer terminals at all times, compared to 33.5% of suburban and 12.9% of urban libraries. 
while the number of rural library computer terminals may be adequate in many locations, hardware quality suffers; 69.5% of rural libraries replace their public access computer terminals as needed, while 66.4% of urban libraries have a technology replacement schedule.21 for many small libraries with only a single full-time librarian, that employee also serves as the it specialist for the location.22 therefore many rural libraries have less up-to-date technologies and less technical support than their nonrural counterparts. even if the librarians who also provide it support for their locations are qualified to fulfill this role, the greater issue is limited time for librarians to work on these issues in addition to other duties. in addition to less recent hardware, rural libraries also have limited bandwidth; 31.1% of rural libraries operate on bandwidths of 1.5 mbps (t1) or less, compared to only 18.3% of suburban libraries and a mere 9.7% of urban libraries.23

the greatest issues facing rural libraries are not well represented by the broader categories of internet access but instead in the implementation of services to make these technologies highly useful and effective for patrons. only 31.8% of rural libraries offer formal technology training classes, as compared to 63.2% of urban and 54.0% of suburban libraries.24 this comparison alone does not present a problem, since more populated areas have larger customer bases that justify training patrons in groups rather than in one-on-one sessions. however, rural libraries also trail significantly in offering one-on-one technology training, with only 30.1% of rural libraries providing such programs, compared to 43.4% of urban and 37.9% of suburban libraries. only 21.9% of rural libraries have online training materials, compared to 36.3% and 33.7% of urban and suburban libraries, respectively. in fact, 12.5% of rural libraries do not offer planned technology training at all, compared to a mere 5.1% of urban libraries and 8.0% of suburban libraries. therefore, while most patrons in nonrural areas who have limited technology skills can go to their local library and acquire such skills for free, such access to the resources for personal advancement is drastically limited by comparison in rural areas. since many rural residents do not have internet access in their homes, many of these individuals do not own computers and have limited technology skills resulting from limited technology exposure. this makes the technology training disparity between rural and nonrural libraries quite problematic, since most americans need these skills to maintain a high standard of living and employment.

employment assistance

while public libraries in all areas saw adequate staffing as a statistically similar problem for helping patrons find jobs—51.9% of rural librarians agreed this was a challenge, only slightly exceeding the overall average of 49.8%—the greater issue is the disparity of confidence levels in assisting patrons in employment matters.25 nearly half (48.3%) of rural survey respondents agreed a lack of staff expertise was a challenge to helping patrons find and apply for jobs online, compared to 27.9% of urban and 37.7% of suburban libraries. the internet has become essential for many people who wish to gain employment, thus rural public librarians' inability to support rural residents with limited technology skills is problematic.
many government agencies, hospitals, and private employers—including walmart, the largest employer in the united states—will no longer accept paper applications, but instead insist potential employees submit applications via the internet.26 this can be especially challenging for individuals who have recently lost jobs they have held for decades, as they simultaneously need to refresh basic application and interviewing skills while learning how to use unfamiliar information technologies to find and apply for jobs. librarians can offer critical assistance in these cases, especially for individuals who do not own a computer or have internet access in their homes. however, inequities in staffing between rural rural public libraries and digital inclusion | real, bertot, and jaeger 13 and nonrural libraries can prevent rural residents from having equal access and aid in finding careers. government service access rural libraries also lag behind libraries in more populated areas in providing support for accessing online government services. there is no statistically significant difference between public libraries in staff providing assistance to patrons who need help filling out forms, with 96.6% of all libraries offering this service.27 however, only 45.6% of rural libraries assist customers in understanding government programs and services, compared to 57.8% of urban and 52.9% of rural libraries. rural libraries are also far less likely to have formal guides to help customers understand these government services, with only 15.3% of rural libraries offering such products as compared to 33.6% of urban and 22.2% of suburban counterparts. just 6.2% of rural libraries offer formal training classes for using government websites, understanding government programs, and completing forms. roughly a one-fourth (24.5%) of urban and 11.9% of suburban libraries offer such services. in terms of staff expertise, 20.0% of rural libraries reported having at least one staff member with significant knowledge and expertise of government services, compared to 31.4% of urban and 25.0% of suburban libraries. therefore, while most public libraries help patrons access government services, rural libraries lag substantially in the type of formal planning that may make patrons more aware of government services that would improve their quality of life. important services such as voter registration, motor vehicle services, payment of taxes, and school enrollment for children can now be done either only or much more efficiently online.28 these online services are more convenient for many americans, but “while many members of the public may struggle with accessing or using egovernment information and services, government agencies have come to focus on it as a means of cost savings rather than increasing access for members of the public.”29 government agencies have for the most part not taken many americans' lack of digital literacy into account when shifting their primary means of service to the digital realm, nor have they considered the effect this shift has on public libraries as the primary internet provider for many americans. this has led to extra responsibilities for rural public libraries but not a direct increase in resources. one might consider that rural libraries offer fewer of these services, or have less expertise in providing digital government services, in part because such services are not in demand by patrons. 
however, government services have steadily moved online, and the pace is accelerating toward an e-only means of interacting with government. the open government movement,30 combined with the federal government's release of the technology and services blueprint, signals the further use of technologies to offer innovative and operational digital government services—both through more traditional web-based services and mobile applications.31 state and local governments are also increasingly engaging in e-government services such as unemployment and social service benefits, taxation, licensing, and more. in short, federal, state, and local governments are moving rapidly to a range of e-services that will require librarians to be fluent with technologies, government services, and government information to better help their communities navigate the challenges of e-government.

government intervention in digital literacy

although most government agencies have not considered the effect their shift to primarily digital services has on individuals who lack basic digital literacy, the federal communications commission launched two programs that could help with the digital literacy problem. the first of these, digitalliteracy.gov, is designed to provide individuals with tools to facilitate digital inclusion, helping users to acquire skills that will make them more capable in the modern information environment. the challenge with this approach is that many resources on the website are designed for individuals who need such skills and who therefore probably do not have access to the internet or possess the skills to fully engage the resources. moreover, most of these resource links point to external sites, which are organized by arbitrary user ratings rather than skill level and relevance.32 likewise, educator resources, which should be most valuable in helping librarians educate patrons, are presented as links to external sites with limited information about each resource. these resources may be able to help patrons, but a collaborative effort that includes public librarians in creating resources could better target particular patron needs in a public access setting. a newer project, connect2compete, has demonstrated more promising progress in this area. connect2compete is a partnership between the fcc and private businesses to provide low-cost internet and computers to low-income families, digital literacy training, and other services.33 its partners also publicize the digital literacy divide, working with the ad council and other organizations to promote this issue.34 the website allows users to search for places where they can receive digital literacy training, with the search results primarily displaying local public library branches. however, despite pointing users to public libraries for such training, connect2compete currently only helps to fund such training in limited cases. while this program provides a strong model for raising awareness about digital inclusion, it is unlikely to provide infrastructure resources to fully bridge the gap between rural and nonrural communities in the near future. while the fcc has been innovative in soliciting private funds so that connect2compete does not use any taxpayer funds, these private funds will not replace the need for government funds for public libraries throughout the nation, nor is private funding likely to continue indefinitely.
indeed, "while governments at all levels are relying on public libraries to ensure digital inclusion, the same governments are reducing the funding of the very libraries that are being relied on."35 the following section will detail how decreasing funding and limited resources have contributed to the digital divide between rural and nonrural libraries.

rural libraries and barriers to promoting digital inclusion

when the internet was emerging in the 1990s, "public libraries essentially experimented with public-access internet and computer services, largely absorbing this service into existing service and resource provision without substantial consideration of the management, facilities, staffing, and other implications of public-access technology services and resources."36 while some libraries have increased their funding levels to match these challenges, most funding agencies have not recognized the costs or value of additional services that public libraries now offer in a wired nation. this section discusses the reasons why rural libraries have not been able to offer the same level of service nonrural library patrons routinely expect.

funding inadequacies for rural libraries

rural libraries face challenges from their problematic funding structure. sin noted that for public libraries, "on average, the local government provided 76.6% of the funding; the state, 8.7%; the federal government, 0.7%; and other sources, 13.9%."37 this is a particular problem for rural libraries since, as holt explained, "if cities and suburbs had to survive on the extraordinarily low taxes on agricultural property, the urban/suburban public sector would have service levels so low that most officials would turn away in disgust."38 this lack of local revenue for all public services—including libraries—in rural areas is exacerbated by the continuing population decrease in small towns and the desirability of such locales for retiring seniors, who prefer to live in areas with low taxes because of limited incomes.39 in other words, public library funding structures that place local governments at the forefront of budgeting plans put rural libraries at a serious disadvantage and promote a digital divide between rural and nonrural areas. holt notes, "it is the legitimate function of state government to make things right. state governments, after all, are of a size and scale that historically allows them to perform as equity agencies for locales."40 indeed, the averages for funding sources cited above can vary, and state and federal governments have attempted to dampen the funding inequities between rural and nonrural libraries. one example is the federal e-rate program, established under the telecommunications act of 1996 to provide schools, libraries, and healthcare providers with a discounted "education rate" for communication technologies, including internet technologies.41 while this has subsidized part of the internet service costs for libraries throughout the nation, many libraries do not apply because they do not know they are eligible or the application process is too complicated. some rural libraries have had the advantage of their state library systems applying on their behalf, but even when funding is provided this only covers parts of the libraries' connection and equipment costs.
and, according to the plftas survey, only 61.5% of rural libraries received e-rate funds, compared to 75.0% of urban libraries, showing the program does not favor the class of libraries with the greatest connectivity issues.42 likewise, as noted above, the federal government designated $7.2 billion from the american recovery and reinvestment act of 2009 for improving broadband access throughout the nation, with funding designated for rural areas and public libraries in general. these improvements will take time, though, and will not fully compensate for the lack of local funds for rural libraries or for rural libraries not receiving nearly as much in nongovernmental funds as nonrural libraries.43 additionally, while local governments in some areas have created their own broadband infrastructure to compensate for corporate providers' unwillingness to expand to some areas due to inadequate predicted profits, nineteen state governments banned such practices due to lobbying efforts from the broadband industry.44 the corporations that lobbied for these laws feared that if this became common practice, local governments could offer low enough pricing to compete against for-profit services. while this may be a legitimate concern, the end result of this legislation is local governments—including rural governments—in some states being legally blocked from allocating funds to solve the market failure that has prevented corporate providers from adequately expanding into rural areas. therefore public libraries' funding and resource structures are inherently stacked against rural institutions. while e-rate and other federal and state programs may mitigate the problem, the ultimate solution needs to be a restructuring of library funding models that takes the primary burden off struggling local governments or at least increases state and federal contributions. in a seminal article on rural libraries and technology written in 1995, vavrek noted that "public libraries cannot survive by only appealing to those who are least likely to be able to pay to support the library. while visions of the homeless person using the internet to locate information is both compassionate and within the social role of the public library, can the library afford to provide this access?"45 beyond patrons not being open to assisting less fortunate individuals, vavrek suggested attempts to diversify library services—including introducing internet technology services, which was novel at the time—could distract resources from libraries' established services that have traditionally appealed to all income classes and, with this, erode public support for these institutions. the pew home broadband 2010 survey shows vavrek's thoughts on this matter were prescient, as 53% of survey respondents believed the government either should not support broadband expansion or that this should not be a very important priority.46 the benefits of greater broadband access and relevant service support may seem obvious to those who are intimate with this matter, but much of the public does not see the importance of expanding such services. if rural librarians cannot fight these perceptions and convince traditional library users and the general public of the importance of these services, then they will probably not be able to reverse these negative trends. unfortunately, rural libraries lack the time, resources, and data to lobby the public on these matters.
staffing and training problems for rural librarians

a lack of funding and resources affects not only rural public libraries, but also rural public librarians. in a study that illustrated such issues, flatley and wyman surveyed a random sample of libraries in extremely rural areas, with their service population baseline being 2,500 as opposed to the 25,000 threshold noted above.47 while the data they collected are somewhat dated (the survey was conducted in 2007), this study still deserves special attention because similar data have not been collected more recently or by other authors. the authors found that 80% of rural libraries have only a single full-time employee, and 50% have two or fewer paid employees when full- and part-time employees are considered.48 these employees are underpaid compared to the national average, with 72% reporting they earned $12.99 or less per hour.49 when asked why they believed their pay was relatively low, more than half (53%) of rural librarians responded it was because their communities lacked funds, demonstrating that the structure of local funding matters more to librarians' salaries than state and federal funding.50 flatley and wyman also found that only 14% of these employees held mls degrees, with 32% having achieved bachelor's degrees and 37% having completed only a high school diploma.51 as one would expect given that most rural librarians lack professional training before entering the field, many of these individuals applied for their first library position because they saw it advertised at their local library and it offered better pay than most other local jobs. while many rural librarians entered the profession for reasons other than a desire to become librarians, the data suggest these individuals are capable and enthusiastic about their jobs. almost half (47%) of rural librarians had worked in the field for more than a decade, with an additional 22% having been librarians for six to ten years.52 two-thirds (66%) of survey respondents stated they intended to remain librarians until retirement age, and 97% responded they were very satisfied or somewhat satisfied with their careers.53 additionally, despite the relatively low pay for library positions, this was not the most common complaint rural librarians had about their jobs. instead, while 27% found low pay to be the greatest issue they faced, 29% felt a lack of funds for new materials was a greater problem.54 therefore, while certain technological issues in rural public libraries—such as the lack of technological training courses for patrons—can be framed accurately as a problem involving rural librarians, these problems should not be framed as the librarians' fault. with current staffing levels, rural librarians do not have as much available staff time to provide training courses and one-on-one training as their suburban and urban counterparts. these librarians may also lack the knowledge and experience to train others in technological skills, and their libraries may lack the funds to help them acquire these abilities.
these factors are outside of these librarians' control, however, and "no matter how hard lis professionals try, one cannot expect public library systems (especially those in less-advantaged neighborhoods) to bridge the information gap when the libraries are themselves underfunded and understaffed."55 considering typical rural librarians' high dedication levels, one can assume they would be willing to remedy information gaps if they first had the resources to fix their libraries' skill, funding, and staffing gaps.

possible solutions

rural libraries face the dual issue of a lack of resources to allow librarians enough time to advocate for their branches and a lack of data that advocates can use to show funders these libraries' value to their communities. as a solution for the latter problem, sin suggested that library and information science (lis) scholars and other prominent figures in the field begin a dialogue with underfunded libraries—including rural institutions—to work with librarians to gather, process, and interpret data on libraries' needs and libraries' effects on their communities.56 this would have the dual benefit of giving librarians better information with which they could focus their services for maximum value and providing graduate-student and professional-level researchers with a stronger understanding of their field. the authors of this article would like to expand on this slightly to suggest that any researchers who draft scholarly papers and presentations from data collected from work with underfunded libraries should feel obligated to assist libraries in using these data for their own benefit. scholars are likely to be in a better position to advocate for libraries with which they collaborate than time- and resource-strapped librarians, and they should feel an ethical responsibility to do so after reaping the benefits of research. more rural librarians also need the skills to empower them to lead technological training courses for patrons, gather data to better understand how to best optimize their services, and lobby for greater funding at the local level. mehra et al. of the school of information sciences at the university of tennessee attempted to remedy this problem to a limited degree with a program they launched in june 2010 with funding from the imls laura bush 21st century librarian program.57 the researchers used this funding to provide full scholarships—including laptop computers and funds for books—to sixteen rural librarians already working in the south and central appalachia regions, allowing them to earn an mls degree in two years of part-time study. the researchers had previously conducted a qualitative survey of rural librarians in tennessee to determine the training and resource needs of rural librarians,58 and they used these data to form a customized mls program for the scholarship students. this included courses focusing on strong technical competencies, service evaluation, grant writing, and other courses of particular relevance to the rural environment.
likewise, georgia uses state funds to pay the salaries of many experienced librarians with mls degrees throughout the state, thereby lifting the burden of affording such individuals off cash-strapped counties and municipalities.59 however, as this system develops in georgia, state funding is still limited and there have been state funding cuts to other areas, such as materials and infrastructure, to allow for an increase in state-funded professional librarians.60 therefore, while this appears to be a promising model that can be of particular benefit to rural residents of the state, further study is needed to determine its overall effects. with an estimated more than 8,000 rural public libraries operating in the united states,61 it would be impossible to find the resources to provide the large majority of librarians without an mls at these locations with the full training needed to earn the degree. even if such funding were available, a large portion—if not the majority—of these resources could be put to better use by improving rural libraries' technological infrastructure, increasing salaries, and growing collections. therefore, while the mls may remain the gold standard for library professionalism, it is not a realistic goal for many experienced and dedicated librarians throughout the country. instead, a more realistic program on a larger scale may be to provide rural librarians with targeted online and in-person training to enhance the skills they feel they need to be more successful. faculty and graduate students in lis academic programs are perhaps the most capable people to lead such training, and they are likely more capable of writing grant proposals to cover the costs of such programs than the rural librarians they could assist. mehra et al. have shown promising progress in this direction,62 and by removing the mls goal (or only expecting it in limited cases), their work could easily be emulated to help lis educators empower librarians throughout the nation. connect2compete, as detailed above, also has the potential to provide a training model for public librarians. the organization plans to create a "digital literacy corps," comprising individuals who will help train portions of the public in basic digital literacy skills.63 while this program is still in its early phases, the organization plans to include librarians among this corps, training them to be better able to train others. once again, this will be achieved through private funds donated by corporate partners. this is certainly a noble effort and will likely benefit many libraries and their patrons, but "having access to training and being able to take advantage of training are two separate things."64 connect2compete, digitalliteracy.gov, and other organizations already provide some resources to help rural librarians understand digital literacy issues and provide better training, but librarians have limited time to familiarize themselves with these sources when dealing with their daily duties. for librarians to use current, future, or more refined training resources, the problem of low staffing—and its cause, low funding—must be addressed. since many rural librarians lack the skill or, more importantly, time to lobby for their own libraries, this is a significant area where partner organizations can help. whether these partners are university departments as envisioned by mehra et al.
and sin or individuals funded by private donations in the connect2compete model is inconsequential. the important issue is that if these partner groups want to truly help rural libraries bridge the digital divide, these groups will have to contribute a significant portion of their efforts to lobbying to increase library funding enough to improve infrastructure and increase staffing—and, through this, staff time—for training and assisting patrons. as discussed above, the btop program has had success both in increasing technological infrastructure and human infrastructure, with grant funding being used in some cases to bring in temporary staff capable of training patrons in digital literacy and to increase training opportunities for patrons using existing staff. given the information above, btop's holistic approach is certainly encouraging, and the program's use of federal funds has shown how resources from above the local level can serve as an equalizing force. the temporary nature and limited funding of this program, however, make it important to remember that this cannot be considered the primary solution to the digital inclusion problem.

conclusion

many rural public libraries are the only providers of free broadband internet service and computer terminals for their communities, and these communities have the lowest average proportion of homes with broadband connections. with the internet being essential for receiving important government services and for applying for jobs with some of the largest and most ubiquitous employers throughout the nation, the value of the services offered by these libraries cannot be overstated. the basic public library funding structure needs to be modified to close the digital inclusion gap between rural and more populated areas. even if local governments remain the primary funding source for public libraries, this contribution cannot remain grossly disproportionate when compared to state and federal support. state and federal governments are already seeing savings by moving access to government services and information online, and these governments will benefit from the better employment rates and better employee competency that come with a digitally inclusive society. since these governments share in the benefits of digital inclusion, they must also share in the costs. some programs have shown promising results in bolstering rural public libraries and, through this, improving the nation's digital inclusion. these results range from large-scale programs such as btop to smaller programs such as the mls education program initiated by mehra et al. a common element of many of these programs, though, is their temporary nature, showing that funders are not recognizing that as technological innovation continues, new problems in digital inclusion will emerge. for government decision makers to understand the ongoing nature of the digital inclusion problem, rural public librarians and their allies—including academics and other stakeholders—will need to gather better data and provide better advocacy.

references

1. "fy2011 public library (public use) data files," institute of museum and library services, http://www.imls.gov/research/pls_data_files.aspx.
2. john carlo bertot et al., 2011–2012 public library funding and technology access survey: survey findings and results (college park, md: information policy and access center, 2012), http://ipac.umd.edu/sites/default/files/publications/2012_plftas.pdf.
3.
the studies began as the public libraries and the internet survey series, funded through various sources until 2006, at which time they became part of the public library funding and technology access study (http://www.ala.org/plinternetfunding), funded by the american library association and the bill & melinda gates foundation.
4. john carlo bertot et al., “public libraries and the internet: an evolutionary perspective,” library technology reports 47, no. 6 (2011): 7–8.
5. paul t. jaeger et al., “the intersection of public policy and public access: digital divides, digital literacy, digital inclusion, and public libraries,” public library quarterly 31, no. 1 (2012): 1–20.
6. bertot et al., 2011–2012 public library funding and technology access survey.
7. aaron smith, home broadband 2010 (washington, dc: pew research center, 2010): 8, http://www.pewinternet.org/~/media/files/reports/2010/home%20broadband%202010.pdf.
8. federal communications commission, connecting america: the national broadband plan (washington, dc: federal communications commission, 2009): xi–xiii, http://download.broadband.gov/plan/national-broadband-plan.pdf.
9. aaron smith, home broadband 2010, 5.
10. john carlo bertot, charles r. mcclure, and paul t. jaeger, “the impacts of free public internet access on public library patrons and communities,” library quarterly 78, no. 3 (2008): 286; bertot et al., “public libraries and the internet,” 12–13.
11. jaeger et al., “the intersection of public policy and public access,” 1–20.
12. us public libraries and the broadband technology opportunities program (btop) (washington, dc: american library association, 2013): 1–2, http://www.districtdispatch.org/wp-content/uploads/2013/02/ala_btop_report.pdf.
13. ibid., 18.
14. “states continue to feel recession’s impact,” center on budget and policy priorities, last modified june 27, 2012, http://www.cbpp.org/cms/index.cfm?fa=view&id=711.
15. deanne w. swan et al., public libraries survey: fiscal year 2010 (imls-2013–pls-01) (washington, dc: institute of museum and library services, 2010).
16. paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology & libraries 26, no. 2 (2007): 4–14, http://dx.doi.org/10.6017/ital.v26i2.3277.
17. natalie greene taylor et al., “public libraries in the new economy: 21st century skills, the internet, and community needs,” public library quarterly 31, no. 3 (2012): 191–219.
18. bharat mehra et al., “what is the value of lis education? a qualitative study of the perspectives of tennessee’s rural librarians,” journal of education for library & information science 52, no. 4 (2011): 272.
19. bertot et al., 2011–2012 public library funding and technology access survey, 15.
20. ibid., 22.
21. ibid., 46.
22. bertot, “public access technologies in public libraries,” 88.
23. bertot et al., 2011–2012 public library funding and technology access survey, 21.
24. ibid., 29.
25. ibid., 42–45.
26. mehra et al., “what is the value of lis education?” 271–72.
27. bertot et al., 2011–2012 public library funding and technology access survey, 36.
28. paul t. jaeger and john carlo bertot, “responsibility rolls down: public libraries and the social and policy obligations of ensuring access to e-government and government information,” public library quarterly 30, no. 2 (2011): 91–116.
29. ibid., 100.
30. the obama administration’s commitment to open government: a status report (washington: government printing office, 2013): 4–7, http://www.whitehouse.gov/sites/default/files/opengov_report.pdf.
31. barack obama, digital government: building a 21st century platform to better serve the american people (washington, dc: office of management and budget, 2012), http://www.wh.gov/digitalgov/pdf.
32. “find educator tools,” digitalliteracy.gov, http://www.digitalliteracy.gov/content/educator.
33. “about us,” everyoneon, http://www.everyoneon.org/c2c.
34. ad council, “ad council & connect2compete launch nationwide psa campaign to increase digital literacy for 62 million americans,” press release, march 21, 2013, http://www.adcouncil.org/news-events/press-releases/ad-council-connect2compete-launch-nationwide-psa-campaign-to-increase-digital-literacy-for-62-million-americans.
35. jaeger et al., “public libraries and internet access,” 14.
36. bertot, “public access technologies in public libraries,” 81.
37. sei-ching joanna sin, “neighborhood disparities in access to information resources: measuring and mapping u.s. public libraries’ funding and service landscapes,” library & information science research 33, no. 1 (2011): 45.
38. glenn e. holt, “a viable future for small and rural libraries,” public library quarterly 28, no. 4 (2009): 288.
39. ibid., 288–89.
40. ibid., 289.
41. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e-rate program and libraries and library consortia, 2000–2004: trends and issues,” information technology & libraries 24, no. 2 (2005): 57–67.
42. bertot et al., 2011–2012 public library funding and technology access survey, 61.
43. sin, “neighborhood disparities in access,” 51.
44. olivier sylvain, “broadband localism,” ohio state law journal 73, no. 4 (2012): 20–24.
45. bernard vavrek, “rural information needs and the role of the public library,” library trends 44, no. 1 (1995): 26.
46. aaron smith, home broadband 2010, 2.
47. robert flatley and andrea wyman, “changes in rural libraries and librarianship: a comparative survey,” public library quarterly 28, no. 1 (2009): 25–26.
48. ibid., 34.
49. ibid., 35.
50. ibid., 28.
51. ibid., 33.
52. ibid., 26.
53. ibid., 29.
54. ibid., 30.
55. sin, “neighborhood disparities in access,” 50.
56. ibid., 51.
57. bharat mehra et al., “collaborations between lis education and rural libraries in the southern and central appalachia: improving librarian technology literacy and management training,” journal of education for library & information science 52, no. 3 (2011): 238–47.
58. mehra et al., “what is the value of lis education?”
59. “state paid position guidelines,” last updated august 2013, http://www.georgialibraries.org/lib/stategrants_accounting/official_state_paid_position_guidelines-updated-august-2013.pdf.
60. bob warburton, “georgia tweaks state funding formula to prioritize librarians,” library journal, february 2, 2014, http://lj.libraryjournal.com/2014/02/budgets-funding/georgia-tweaks-state-funding-formula-to-prioritize-librarians.
61. bertot et al., 2011–2012 public library funding and technology access survey, 14.
62. mehra et al., “collaborations between lis education and rural libraries”; mehra et al., “what is the value of lis education?”
63. institute of museum and library services, “imls announces grant to support libraries’ roles in national broadband adoption efforts,” press release, june 14, 2012, http://www.imls.gov/imls_announces_grant_to_support_libraries_roles_in_national_broadband_adoption_efforts.aspx.
64. bertot, “public access technologies in public libraries,” 88.

book reviews

epub 3: best practices, by matt garrish and markus gylling. sebastopol, ca: o'reilly. 2013. 345 pp. isbn: 978-1-449-32914-3. $29.99.

there is much of value in this book—there aren't really that many books out right now about the electronic book markup framework, epub 3—yet i have a hard time recommending it, especially if you're an epub novice like me. so much of the book assumes a familiarity with epub 2. if you aren't familiar with this version of the specification, then you will be playing a constant game of catch-up. also, it's clear that the book was written by multiple authors; the chapters are sometimes jarringly disparate with respect to pacing and style. the book as a whole needs a good edit. this is surprising since o'reilly is almost uniformly excellent in this regard. the first three chapters form the core of the book. the first chapter, "package document and metadata," illustrates how the top level container of any epub 3 book is the "package document." this document contains metadata about the book as well as a manifest (a list of files included in the package as a whole), a spine (a list of the reading order of the files included in the book), and an optional list of bindings (a lookup list similar to the list of helper applications contained in the configurations of most modern web browsers).
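to make the package document structure described above concrete, here is a minimal sketch, not taken from the book, that assembles the skeleton of an epub 3 package document with python's standard library; the element and attribute names follow the epub 3 specification, but the identifier, title, and file names are invented, and a valid publication would need further required metadata (for example a dcterms:modified property).

import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("", OPF)
ET.register_namespace("dc", DC)

package = ET.Element("{%s}package" % OPF,
                     {"version": "3.0", "unique-identifier": "pub-id"})
metadata = ET.SubElement(package, "{%s}metadata" % OPF)
ident = ET.SubElement(metadata, "{%s}identifier" % DC, {"id": "pub-id"})
ident.text = "urn:uuid:00000000-0000-0000-0000-000000000000"  # placeholder value
ET.SubElement(metadata, "{%s}title" % DC).text = "an example title"

# manifest: every file in the package, including the navigation document
manifest = ET.SubElement(package, "{%s}manifest" % OPF)
ET.SubElement(manifest, "{%s}item" % OPF,
              {"id": "nav", "href": "nav.xhtml",
               "media-type": "application/xhtml+xml", "properties": "nav"})
ET.SubElement(manifest, "{%s}item" % OPF,
              {"id": "ch1", "href": "chapter1.xhtml",
               "media-type": "application/xhtml+xml"})

# spine: the reading order of the content documents
spine = ET.SubElement(package, "{%s}spine" % OPF)
ET.SubElement(spine, "{%s}itemref" % OPF, {"idref": "ch1"})

print(ET.tostring(package, encoding="unicode"))

printing the element tree shows the same manifest/spine skeleton the chapter describes, with one navigation document and one content chapter listed in reading order.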
the second chapter, "navigation," addresses and illustrates the creation of a proper table of contents, a list of landmarks (sort of an abbreviated table of contents), and a page list (useful for quickly navigating to a specific print-equivalent page in the book). the third chapter, "content documents," is the heart of the core of the book. this chapter addresses markup of actual chapters in a book, pointing out that epub 3 markup here is mostly a subset of html5, but also pointing out such things as the use of mathml for mathematical markup, svg (scalable vector graphics), page layout issues, use of css, and the use of document headers and footers. after reading these first three chapters, my sense is that one is ready to dive into a markup project, which is exactly what i did with my own project. that said, i think a reread of these core chapters is due, which i intend to do presently. the rest of the book is devoted to specialty subjects such as how to embed fonts, use of audio and video clips, "media overlays" (epub 3 supports a subset of smil, the synchronized multimedia integration language, for creating synchronized text/audio/video presentations), interactivity and scripting (with javascript), global language support, accessibility issues, provision for automated text-to-speech, and a nice utility chapter on validation of epub 3 xml files. of these, the chapter on global language support i found to be fascinating. for us native english speakers, it's not immediately obvious some of the problems one will inevitably encounter when trying to create an electronic publication that can work in non-western languages. just consider languages that read vertically and from right to left, for one! as an epub novice, my greatest desire would be for the book to provide, maybe in an appendix, a fairly comprehensive example of an epub 3 marked-up book. maybe this is a tall order? nevertheless, i would love to see an example of marked up text including bidirectional footnotes, pagination, a table of contents, etc.; simple, foundational things, really. examples of each of these are included in the book, but not in one place. having such an example in one place would be something that could be used as a quick-start template for us epub beginners. to be fair, code examples of all of this are up on the accompanying website, and i am using these examples as i learn to code epub 3 for my own project. but having a single, relatively comprehensive example as an appendix to the book would be very useful. as i read this book, something kept bothering me. epub 2 and epub 3 are so very different, with reading systems designed to render epub 3 documents being fairly rare at this point. so if different versions of the same spec are so different, with no guarantee that a future reading system will be able to read documents adhering to a previous version, then the prospect of reading epub documents into the future is pretty sketchy. are e-books, then, just convenient and cool mechanisms for currently reading longish narrative prose—convenient and cool, but transitory? mark cyzyk is the scholarly communication architect in the sheridan libraries, johns hopkins university, baltimore, maryland, usa.

compression word coding techniques for information retrieval

william r. nugent: vice president, inforonics, inc., cambridge, massachusetts

a description and comparison is presented of four compression techniques for word coding having application to information retrieval.
the emphasis is on codes useful in creating directories to large data files. it is further shown how differing application objectives lead to differing measures of optimality for codes, though compression may be a common quality.

introduction

cryptographic studies have documented much useful language data having application to retrieval coding. because unclassified cryptographic studies are few, fletcher pratt's 1939 work (1) remains the classic in its field. gaines (2) has the virtue of being in print, and the more recent cryptographic history of kahn (3), while comprehensive, lacks the statistical data that made the earlier works valuable. the word coding problem for language processing, as opposed to cryptography, has been extensively studied by nugent and vegh (4). information theorists have contributed the greatest volume of literature on coding and have added to its mathematical basis, largely from the standpoint of communications and error avoidance. a brief discussion of compression codes and their objectives is here presented, and then a description of a project encompassing four compression codes having application to retrieval directories. two of the codings are newly devised. one is transition distance coding, a randomizing code that results in short codes of high resolving power. the second is alphacheck. it combines high readability with good resolution, and permits simple truncation to be used by means of applying a randomized check character that acts as a surrogate of the omitted portion. it appears to have the greatest potential, in directory applications, of the codes considered here. recursive decomposition is a selected letter code devised by the author several years ago (4). it has been tested and has the advantages of simple derivation and high resolution. soundex (5) is the only compression code that has achieved wide usage. it was devised at remington rand for name matching under conditions of uncertain spelling.

objectives of compression coding

it is desired to transform sets of variable length words into fixed length codes that will maximally preserve word to word discrimination. in the final directories to be used, the codes for several elements will be accessible to enable the matching of several factors before a file record is selected. the separate codes for differing factors need not be the same length, though each type of code will be of uniform length; nor need the codes for differing factors be derived by the same process. what we loosely call codes must be formally designated ciphers. that is, they must be derivable from the data words themselves, and not require "code books" to determine equivalences. this is so because the file directories must be derivable from file items, entries in directory form must be derivable from an input query, and these two directory items must match when a record is to be extracted. the ciphers need not be decipherable for the application under consideration, and in general are not. fixed length codes, which provide the rough equivalent and simplicity of a margin entry in a paper directory, are generally desirable for machine directories. the functions of the codes will determine their form, and a code or file key designed to meet one objective will generally not be satisfactory for any other objective.
the following typical objectives serve as four examples:

(1) create a file key for extraction of records in approximate file order, as is required for the common sorting and printout problem. a typical code construction rule is to take the first six letters.

johnsen → johnse
johnson → johnso
johnston → johnst
johnstone → johnst

(2) create a file key for extraction of records under conditions of uncertainty of spelling (airline reservation problem). a typical code construction rule is vowel elimination or soundex. a typical matching rule is best match.

vowel elimination / soundex:
johnsen → jhnsn / j525 → j52
johnson → jhnsn / j525 → j52
johnston → jhnstn / j5235 → j52
johnstone → jhnstn / j5235 → j52

(3) create a file key for extraction of records from accurate input, with the objective of maximum discrimination of similar entries (cataloging search problem). typical code construction rules are recursive decomposition coding or transition distance coding.

recursive decomposition / transition distance:
johnsen → jhnsen / bftz
johnson → jhnson / dnwu
johnston → jhston / ziky
johnstone → jhsone / ecrc

for the file keys of primary concern, accurate input data is assumed and the objective is maximum discrimination. desirably, a code would be as discriminating as transition distance coding and be as readable as truncation coding. this can be achieved to some degree by combining the two codes into one, with an initial portion truncated and a final check character representing the remainder via a compressed transition distance code: alphacheck.

(4) create a file key for human readability and high word to word discrimination. possible code construction rules are alphacheck, and simple truncation plus a terminal check character.

johnsen → johnsv
johnson → johnsx
johnston → johnsd
johnstone → johnss

methods

the algorithms for creating the preceding codes are described in the following sections. it is axiomatic that randomizing codes give the greatest possible discrimination for a given code space. the whole trick of creating a good compression code is to eliminate the natural redundancy of english orthography, and preserve discrimination in a smaller word size. letter-selection codes can only half accomplish this, due to the skewed distribution of letter usage. they can eliminate the higher-frequency components, but they cannot increase the use of the lower-frequency components. randomizing codes—often called "hash" codes, properly quasi-random codes—can equalize letter usage and hence make best use of the code space. prime examples here are the variants of godel coding devised by vegh (4) in which the principle of obtaining uniqueness via the products of unrepeated primes is exploited, as it is in the randomizing codes considered here. the problem in design of a randomizing code is that the results can be skewed rather than uniformly distributed due to the skewed nature of the letters and letter sequences that the codes operate on. in transition distance coding, the natural bias of letters and letter sequences is overcome by operating on a word parameter that is itself semi-random in nature.
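as a concrete illustration of the letter-selection rules above, the following short python sketch, ours rather than nugent's, implements simple truncation, vowel elimination, and a basic soundex variant; the digit table is the standard soundex mapping, and the padding behavior is left optional so the outputs match the unpadded examples shown above.

# standard soundex digit table
SOUNDEX_MAP = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_MAP[ch] = digit

def truncate6(word):
    # objective (1): first six letters as a fixed-length file key
    return word.lower()[:6]

def strip_vowels(word):
    # objective (2): vowel elimination, keeping the initial letter
    w = word.lower()
    return w[0] + "".join(c for c in w[1:] if c not in "aeiou")

def soundex(word, length=None):
    # objective (2): keep the first letter, code later consonants, collapse
    # repeated digits; length=None leaves the code unpadded, as in the
    # examples above (the standard form pads or truncates to four characters)
    w = word.lower()
    code, prev = w[0], SOUNDEX_MAP.get(w[0], "")
    for ch in w[1:]:
        digit = SOUNDEX_MAP.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not reset the previous digit
            prev = digit
    return (code + "000")[:length] if length else code

for name in ("johnsen", "johnson", "johnston", "johnstone"):
    print(truncate6(name), strip_vowels(name), soundex(name))

run as written, the three functions reproduce the johnsen/johnson/johnston/johnstone truncation, vowel-elimination, and soundex codes listed above.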
the following principle, not quite a theorem, applies: "considering letters in their normal ordinal alphabetic position, and considering letter transitions to be unidirectional and cyclic, the distribution of transition distances in english words is essentially uniform." in view of the fact that letter usage has an extremely skewed distribution, with a probability ratio in excess of 170 to one for the extremes, it is seen that the more uniform parameter of transition distances is a superior one for achieving randomized codes. the relative uniformity of transition distance needs further investigation, but one typical letter digram sample from gaines (2) with 9999 transitions (mean number of occurrences of each distance = 385) yielded a mean deviation of 99 and a standard deviation of 123, and an extreme probability ratio of 3.3 to one for the different transition distances from 0 to 25. the distribution can be made more uniform by letter permutation. permutation is used in the algorithm for transition distance coding but not in alphacheck.

algorithm

the method of transition distance coding is used to operate on a variable length word to achieve fixed length alphabetic or alphanumeric codes that exhibit quasi-random properties. the code is formed from the modulo product of primes associated with transition distances of permuted letters. the method is intended strictly for computer operation, as it is a simple program but an extremely tedious manual operation. there are five steps:

(1) permute characters of natural language word. this breaks the digram dependency that could make the transition distances less uniformly distributed. this step might be dispensed with if the resulting distributions prove satisfactory without it. the permutation process consists of taking the middle letter (or letter right of middle for words with an even number of letters), the first, the last, the second, the next-to-last, etc., until all letters have been used. that is, for a letter sequence

a1, a2, ... ai ..., an

the following permutation is taken:

a(int(n/2)+1), a1, an, a2, an-1, ... a
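the python sketch below is our reconstruction, under stated assumptions, of step (1) and of the general prime-product idea described above; the permutation follows the verbal rule given in the text, while the way the prime product is folded into a fixed-length alphabetic code is our own illustrative choice, so the outputs will not reproduce the bftz/dnwu examples printed earlier.

from string import ascii_lowercase

# one prime per transition distance 0-25
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def permute(word):
    # middle letter (letter right of middle for even lengths), then first,
    # last, second, next-to-last, ... until all letters are used
    w = word.lower()
    mid = len(w) // 2
    rest = w[:mid] + w[mid + 1:]
    out, i, j = [w[mid]], 0, len(rest) - 1
    while i <= j:
        out.append(rest[i])
        if j != i:
            out.append(rest[j])
        i, j = i + 1, j - 1
    return "".join(out)

def transition_code(word, code_length=4):
    # multiply the primes for each cyclic letter-transition distance of the
    # permuted word, reduce modulo 26**code_length, and write the residue
    # in base 26 as letters (this final mapping is our assumption)
    w = permute(word)
    product = 1
    for a, b in zip(w, w[1:]):
        distance = (ascii_lowercase.index(b) - ascii_lowercase.index(a)) % 26
        product *= PRIMES[distance]
    residue = product % (26 ** code_length)
    letters = []
    for _ in range(code_length):
        residue, r = divmod(residue, 26)
        letters.append(ascii_lowercase[r])
    return "".join(reversed(letters))

for name in ("johnsen", "johnson", "johnston", "johnstone"):
    print(name, "->", transition_code(name))

because the code space is used quasi-randomly, similar names such as johnston and johnstone map to dissimilar four-letter keys, which is the discrimination property the article attributes to randomizing codes.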
if the table's content is complex, it may be necessary to provide an alternative presentation of the information. it is best to rely on css for page layout, taking into consideration the directions in subparagraph (d) above.

(i) frames shall be titled with text that facilitates frame identification and navigation. frames are a deprecated feature of html, and their use should be avoided in favor of css layout.

(j) pages shall be designed to avoid causing the screen to flicker with a frequency greater than 2 hz and lower than 55 hz. lights with flicker rates in this range can trigger epileptic seizures. blinking or flashing elements on a webpage should be avoided until browsers provide the user with the ability to control flickering.

(k) a text-only page, with equivalent information or functionality, shall be provided to make a web site comply with the provisions of this part, when compliance cannot be accomplished any other way. the content of the text-only page shall be updated whenever the primary page changes. complex content that is entirely visual in nature may require a separate text-only page, such as a page showing the english alphabet in american sign language. this requirement also serves as a stopgap measure for existing sites that require reworking for accessibility. some consider this to be the web's version of separate-but-equal services, which should be avoided.9 offering a text-only alternative site can increase the sense of exclusion that pwd already feel. also, such versions of a website tend not to be equivalent to the parent site, leaving out promotions or advertisements. finally, a text-only version increases the workload of web development staff, making it more costly than creating a single, fully accessible site in the first place.

(l) when pages utilize scripting languages to display content, or to create interface elements, the information provided by the script shall be identified with functional text that can be read by assistive technology. scripting languages such as javascript allow for more interactive content on a page while reducing the number of times the computer screen needs to be refreshed. if functional text is not available, the screen reader attempts to read the script's code, which outputs as a meaningless jumble of characters. using redundant text links avoids this result.

(m) when a web page requires that an applet, plug-in, or other application be present on the client system to interpret page content, the page must provide a link to a plug-in or applet that complies with [subpart b: technical standards] §1194.22(a) through (i). web developers need to ascertain whether a given plug-in or applet is accessible before requiring their webpage's visitors to use it. when using applications such as quicktime or realaudio, it is important to provide an accessible link on the same page that will allow users to install the necessary plug-in.

(n) when electronic forms are designed to be completed on-line, the form shall allow people using assistive technology to access information, field elements, and functionality required for completion and submission of the form, including all directions and cues. if scripts used in the completion of the form are inaccessible, an alternative method of completing the form must be made immediately available. each element of a form needs to be labeled properly using the
$serviceup";
# if last two status' were down
if ($oldstatusfile =~ m/\($service\)-0--->/){
    $msg = "$service back up at $currenttime\n";
    # if service has owner & not already in mail list,
    # add owner to mail list
    $toemail .= ", \'$owner\'" if ($owner && (!($toemail =~ $owner)));
    }
}
# else current status is down
else{
    $statusline .= "down\">down";
    # if last status was down & before that status was up
    if ($oldstatusfile =~ m/\($service\)-0-1-->/){
        $msg = "$service down at $currenttime\n";
        # if service has owner & not already in mail list,
        # add owner to mail list
        $toemail .= ", \'$owner\'" if ($owner && (!($toemail =~ $owner)));
    }
}
$statusline .= "
retrieval

two search methods were used: direct probabilistic retrieval. an in-house implementation was used of a probabilistic full-text retrieval algorithm developed at berkeley.7 this search engine takes a free-form text query and returns a ranked list of captions of tables ranked according to their relevance scores.

figure 1. query interface for search-term recommender system for the north american industry classification system
figure 2. display of naics code search-term recommendations for “car”
figure 3. display of numeric data retrieved using selected naics code

for example, the five top-ranked captions returned to the query “public libraries in california” were:

1. library statistics, statewide summary by type of library california, 1992–93 to 1997–98 table f6.
2. library statistics, statewide summary by type of library california, 1993–94 to 1998–99 table f6yr0-0.
3. number of california libraries, 1989 to 1999 table f5yr00
4. number of california libraries, 1989 to 1998, as of september table f5.
5. california public schools, grades k–12, 1989 to 1998 table f4.

each entry in the retrieved set list is linked to a numeric table maintained at the counting california web site and, by clicking on the appropriate link, a user can display the table as an ms excel file or as a pdf file. mediated search. from the same extracted records the words in the captions were used to create an evi to the subtopics in the topic classification using the method already described. as an example, the query “personal individual income tax,” when submitted to the evi, generated the following ranked list of subtopics:

1. income
2. government earnings and tax revenues
3. personal income
4. property tax
5. personal income tax
6. corporate income tax
7. per capita income

a user can click on any selected subtopic to retrieve the captions of tables assigned that subtopic. for example, clicking on the fifth subtopic, personal income tax, retrieves:

■ personal income tax returns: number and amount of adjusted gross income reported by adjusted gross income class california, 1998 taxable year. table d10yr00
■ personal income tax returns: number and amount of adjusted gross income reported by adjusted gross income class california, 1997 taxable year. table d9
■ personal income statistics by county, california 1997 taxable year. table d10
■ personal income statistics by county, california 1998 taxable year. table d11yr00

■ transverse searching between text and numeric-data series

to demonstrate the searching capability from a bibliographic record to numeric-data sets, the first step is to retrieve and display a bibliographic record from an online catalog. a web-based interface for searching online catalogs was implemented using an in-house implementation of the z39.50 protocol. besides the z39.50 protocol, an important component that makes searching remote online catalogs feasible is the gateway between the http (hypertext transfer protocol) and the z39.50 protocol. while http is a connectionless-oriented protocol, the z39.50 is a connection-oriented protocol. the gateway maintains connections to remote z39.50 servers. all search requests to any remote z39.50 server go through the gateway.

searching from catalog records to numeric data sets

having selected some text (for the purposes of this study, a catalog record), how could one identify the facts or statistics in a numeric database that are most closely related to the topic?
clicking on a “formulate query” button placed at the end of a displayed full marc record creates a query for searching a numeric database. the initial query will contain the words extracted from the title, subtitle, and the subject headings and is placed in a new window where the user can modify or expand the query before submitting it to the search engine for a numeric database. so, for example, the following text extracted from a catalog record: library laws of the state of california, library legislation. california. public libraries when submitted as a query, retrieves a ranked list of table names, of which two, covering different time periods, are entitled library statistics, statewide summary by type of library, california.

searching from numeric data sets to catalog records

transverse search in the other direction, starting from a data table, is achieved by forwarding the caption of a table to the word-to-lcsh evi to generate a prompt list of the seven top-ranked lcshs, any one of which can be used as a query submitted to the catalog.

■ architecture

figure 4 shows the structure of the implementation. the boxes shown in the figure are:

1. a search interface for accessing bibliographic/textual resources through a word-to-lcsh evi.
2. a word-to-lcsh evi.
3. a ranked list of lcshs closely associated with the query.
4. an online catalog.
5. results of searching the online catalog using an lcsh.
6. a full marc record displayed in tagged form.
7. a new query formed by extracting the title and subject fields from the displayed full marc record.
8. a numeric database.
9. a list of captions of numeric tables ranked by relevance score to the query.
10. numeric table displayed in pdf or ms excel format.
11. a search interface for numeric databases based on a probabilistic search algorithm.

a user can start a search using either interface (boxes 1 or 11) and, from either starting point, find records on the same topic of interest in a textual (here bibliographic) database and a socioeconomic database.

■ conclusions and further work

enhanced access to numeric data sets

the descriptive texts associated with numeric tables, such as the caption, headers, or row labels, are usually very short. they provide a rather limited basis for locating the table in response to queries, or describing a data cell sufficiently to form a usefully descriptive query from it. sometimes the title (caption) of a table may be the only searchable textual description about the content of the table, and the titles are sometimes very general. for example, one of the titles, library statistics, statewide summary by type of library california, 1992–93 to 1997–98, is so general that neither the kinds of statistics nor the types of libraries are revealed. if a user posed the question, “what are the total operating expenditures of public libraries in california?” to a query system that indexes table titles only, the search may well be ineffective since the only word in common between the table title and the user’s query is “california” and, if the plurals of nouns have been normalized to the singular form, “library.” table column headings and row headings provide additional information about the content of a numeric table. however, the column and row headings are usually not directly searchable. for example, a table named “language spoken at home” in counting california databases consists of rows and columns.
the column headings list the languages spoken at home, while the row headings show the county names in california. each cell in the table gives the number of people, five years of age and older, who speak a specific language at home. to answer questions such as “how many people speak spanish at home in alameda county, california?” using the table title alone may not retrieve the table that contains the answer to the example question. it is recommended that the textual descriptions of numeric tables be enriched. automatically combining the table title and its column and row headings would be a small but practical step toward improved retrieval. geographic search socioeconomic numeric data series refer to particular areas and, in contrast to text searching, the geographical aspect ordinarily has to be specified. to match the geographical area of the numeric data, a matching text search may also have to specify the same place. the authors found that this was hard to achieve for several reasons. place names are ambiguous and unstable: a search for data relating to trinidad might lead to trinidad, west indies, instead of trinidad, california, for example. the problem is compounded because, in numeric data series, specialized geopolitical divisions, such as census tracts and counties, are commonly used. these divisions do not match conveniently with searchers’ ordinary use of place names. also, the granularity of geographical coverage may not match well. data relating to berkeley, for example, may be available only in aggregated data for alameda county. it was eventually concluded that reliance on the names of places could never work satisfactorily. the only effective path to reliable access to data relating to places would be to use geospatial coordinates (latitude and longitude) to establish unambiguously the identity and location of any place and the relationship between places. this means that gazetteers and map visualizations become important. gazetteers relate named places to defined spaces, and thereby reveal spatial relationships between places, e.g., the city of alameda is on alameda island within alameda county. this problem has been addressed in a subsequent figure 4. architecture of the prototype search across different media | buckland, chen, gey, and larson 187 study entitled “going places in the catalog: improved geographical access.”8 temporal search searches of text files and of socioeconomic numeric data series also differ substantially with respect to time periods: numeric data searches ordinarily require the years of interest to be specified; text searches rarely specify the period. an additional difficulty arises because in text, as in speech, a period is commonly referred to by a name derived metaphorically from events used as temporal markers, rather than by calendar time, as in “during vietnam,” “under clinton,” or “in the reign of henry viii.” named time periods have some of the characteristics of place names: they are culturally based and tend to be multiple, unstable, and ambiguous. it appears that an analogous solution is indicated: directories of named time periods mapped to calendar definitions, much as a gazetteer links place names to spatial locators. 
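the authors stop short of showing such a directory of named time periods; a minimal sketch of the idea in python follows, in which the period names and calendar ranges are illustrative entries chosen for this example rather than data from the study.

```python
# A minimal sketch of a named-time-period directory, the temporal analogue of
# a gazetteer: it maps culturally based period names to calendar spans so that
# a text query and a numeric-data series can be matched on time.
PERIOD_DIRECTORY = {
    "reign of henry viii": (1509, 1547),
    "clinton administration": (1993, 2001),
    "great depression": (1929, 1939),
}

def calendar_span(period_name):
    """Translate a named period into a (start_year, end_year) pair."""
    return PERIOD_DIRECTORY[period_name.lower()]

def year_in_period(period_name, year):
    """True when a data series for the given year falls inside the named period."""
    start, end = calendar_span(period_name)
    return start <= year <= end

print(calendar_span("Reign of Henry VIII"))             # (1509, 1547)
print(year_in_period("Clinton administration", 1998))   # True
```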
this problem is being addressed in a subsequent study entitled “support for the learner: what, where, when, and who.”9 media forms the paradox, in an environment of digital “media convergence,” that it appears impossible to search directly across different media forms invites closer attention to concepts and terminology associated with media. a view that fits and explains the phenomena as the authors understand them, distinguishes three aspects of media: ■ cultural codes: all forms of expression depend on some shared understandings, on language in a broad sense. convergence here means cultural convergence or interpretation. ■ media types: different types of expression have evolved: texts, images, numbers, diagrams, art. an initial classification can well start with the five senses of sight, smell, hearing, taste, and feel. ■ physical media: paper; film; analog magnetic tape; bits; . . . being digital affects directly only this aspect. anything perceived as a meaningful document has cultural, type, and physical aspects, and genre usefully denotes specific combinations of code, type, and physical medium adopted by social convention. genres are historically and culturally situated. convergence can be understood in terms of interoperability and is clearly seen in physical media technology. the adoption of english as a language for international use in an increasingly global community promotes convergence in cultural codes. nevertheless, the different media types are fundamentally distinct. metadata as infrastructure it is the metadata and, in a very broad sense, “bibliographic” tools that provide the infrastructure necessary for searches across and between different media—thesauruses, mappings between vocabularies, place-name gazetteers, and the like. in isolation, metadata is properly regarded as description attached to documents, but this is too narrow a view. collectively, the metadata forms the infrastructure through which different documents can be related to each other. it is a variation on the role of citations: individually, references amplify an individual document by validating statements made within it; collectively, as a citation index, references show the structure of scholarship to which documents are attached. ■ summary a project was undertaken to demonstrate simultaneous search of two different media types (socioeconomic numeric data series and text files) without ingesting these diverse resources into a shared environment. the project objective was eventually achieved, but proved harder than expected for the following reasons: access to these different media types has been developed by different communities with different practices; the systems (vocabularies) for topical categorization vary greatly and need interpretative mappings (also known as relative indexes, searchterm recommender systems, and evis); specification of geographical area and time period are as necessary for search in socioeconomic data series and, for this, existing procedures for searching text files are inadequate. ■ acknowledgement this work was partially supported by the institute of museum and library services through national library leadership grant no. 178 for a project entitled “seamless searching of numeric and textual resources,” and was based on prior research partially supported by darpa contracts n66001-97-c-8541; ao# f477: “search support for unfamiliar metadata vocabularies” and n66001-00-18911, to# j290: “translingual information management using domain ontologies.” references 1. michael k. 
buckland, fredric c. gey, and ray r. larson, seamless searching of numeric and textual resources: final report on institute of museum and library services national leadership 188 information technology and libraries | december 2006 grant no. 178 (berkeley, calif.: univ. of california, school of information management and systems, 2002), http:// metadata.sims.berkeley.edu/papers/seamlesssearchfinal report.pdf (accessed july 18, 2006); michael buckland et al., “seamless searching of numeric and textual resources: friday afternoon seminar, feb. 14, 2003,” http://metadata.sims .berkeley.edu/papers/seamlessfri.ppt (accessed july 18, 2006). 2. michael buckland et al., “mapping entry vocabulary to unfamiliar metadata vocabularies,” d-lib magazine 5, no. 1 (jan. 1999), www.dlib.org/dlib/january99/buckland/01buckland .html (accessed july 18, 2006); michael buckland, “the significance of vocabulary,” 2000, http://metadata.sims.berkeley .edu/vocabsig.ppt (accessed july 18, 2006); fredric c. gey et al., “entry vocabulary: a technology to enhance digital search,” in proceedings of the first international conference on human language technology, san diego, mar. 2001 (san francisco: morgan kaufmann, 2001), 91–95, http://metadata.sims.berkeley.edu/ papers/hlt01-final.pdf (accessed july 18, 2006). 3. los angeles times, july 12, 1995: d1. 4. michael buckland, “vocabulary as a central concept in library and information science,” in digital libraries: interdisciplinary concepts, challenges, and opportunities. proceedings of the third international conference on conceptions of library and information science (colis3), dubrovnik, croatia, may 23–26, 1999, ed. t. arpanac et al. (lokve, croatia: benja pubs., 1999), 3–12, www .sims.berkeley.edu/~buckland/colisvoc.htm (accessed july 18, 2006); buckland et al., “mapping entry vocabulary.” 5. counting california, http://countingcalifornia.cdlib.org (accessed july 18, 2006). 6. “factsheet: unified medical language system,” www .nlm.nih.gov/pubs/factsheets/umls.html (accessed july 18, 2006). 7. william s. cooper, aitao chen, and fredric c. gey, “fulltext retrieval based on probabilistic equations with coefficients fitted by logistic regression,” in d. k. harman, ed., the second text retrieval conference (trec-2), march 1994, 57–66 (gaithersburg, md.: national institute of standards and technology, 1994), http://trec.nist.gov/pubs/trec2/papers/txt/05.txt (accessed july 18, 2006). 8. “going places in the catalog: improved geographical access,” http://ecai.org/imls2002 (accessed jul. 18, 2006). 9. vivien petras, ray larson, and michael buckland, “time period directories: a metadata infrastructure for placing events in temporal and geographic context,” in opening information horizons: joint conference on digital libraries (jcdl), chapel hill, n.c., june 11–15, 2006, forthcoming, http://metadata.sims .berkeley.edu/tpdjcdl06.pdf (accessed july 18, 2006); “support for the learner: what, where, when, and who,” http://ecai .org/imls2004 (accessed july 18, 2006). search across different media | buckland, chen, gey, and larson 189 appendix: statistical association methodology a statistical maximum likelihood ratio weighting technique was used to construct a two-way contingency table relating each natural-language term (word or phrase) with each value in the metadata vocabulary of a resource, e.g., lcsh, lccns, u.s. 
patent classification numbers, and so on.1 an associative dictionary that will map words in natural languages into metadata terms can also, in reverse, return words in natural language that are closely associated with a metadata value. training records containing two different metadata vocabularies can be used to create direct mappings between the values of the two metadata vocabularies. for example, u.s. patents contain both u.s. and international patent classification numbers and so can be used to create a mapping between these two quite different classifications. multilingual training sets, such as catalog records for multilingual library collections, can be used to create multilingual natural language indexes to metadata vocabularies and, also, mappings between natural language vocabularies. in addition to the maximum likelihood ratio-based association measure, there are a number of other association measures, such as the chi-square statistic, mutual information measure, and so on, that can be used in creating association dictionaries. the training set used to create the word-to-lcsh evi was a set of catalog records with at least one assigned lcsh (i.e., at least one 6xx field). natural language terms were extracted from the title (field 245a), subtitle (245b), and summary note (520a). these terms were tokenized; the stopwords were removed; and the remaining words were normalized. a token here can contain only letters and digits. all tokens were then changed to lower case. the stoplist has about six hundred words considered not to be content bearing, such as pronouns, prepositions, coordinators, determiners, and the like. the content words (those not treated as stopwords) were normalized using a table derived from an english morphological analyzer.2 the table maps plural nouns into singular ones; verbs into the infinitive form; and comparative and superlative adjectives to the positive form. for example, the plural noun printers is reduced to printer, and children to child; the comparative adjective longer and the superlative adjective longest are reduced to long; and printing, printed, and prints are all reduced to the same base form print. when a word belonging to more than one part-of-speech category can be reduced to more than one form, it is changed to the first form listed in the morphological analyzer table. as an example, the word saw, which can be a noun or the past tense of the verb to see, is not reduced to see. subject headings (field 6xxa) were extracted without qualifying subdivisions. the inclusion of foreign words (alcoholismo, alcoolisme, alkohol, and alcool), derived from titles in foreign languages, demonstrate that the technique is language independent and could be adopted in any country. it could also support diversity in u.s. libraries by allowing searches in spanish or other languages, so long as the training set contains sufficient content words. evis are accessible at http://metadata. sims.berkeley.edu/prototypesi.html. fuller descriptions of the project methodology can be found in the literature.3 ■ references 1. ted dunning, “accurate methods for the statistics of surprise and coincidence,” computational linguistics 19 (march 1993): 61–74. 2. daniel karp et al., “a freely available wide coverage morphological analyzer for english,” in proceedings of coling-92, nantes, 1992 (morristown, n.j.: association for computational linguistics, 1992), 950–55, http://acl.ldc.upenn .edu/c/c92/c92-3145.pdf (accessed july 18, 2006). 3. michael k. buckland, fredric c. 
gey, and ray r. larson, seamless searching of numeric and textual resources: final report on institute of museum and library services national leadership grant no. 178 (berkeley, calif.: univ. of california, school of information management and systems, 2002), http://metadata.sims .berkeley.edu/papers/seamlesssearchfinalreport.pdf (accessed jul. 18, 2006); youngin kim et al., “using ordinary language to access metadata of diverse types of information resources: trade classification and numeric data,” in knowledge: creation, organization, and use. proceedings of the american society for information science annual meeting, oct. 29–nov. 4, 1999 (medford, n.j.: information today, 1999), 172–80. microsoft word march_ital_tharani_tc proofread.docx linked  data  in  libraries:  a  case  study     of  harvesting  and  sharing  bibliographic   metadata  with  bibframe     karim  tharani     information  technology  and  libraries  |  march  2015             5   abstract   by  way  of  a  case  study,  this  paper  illustrates  and  evaluates  the  bibliographic  framework  (or   bibframe)  as  means  for  harvesting  and  sharing  bibliographic  metadata  over  the  web  for  libraries.   bibframe  is  an  emerging  framework  developed  by  the  library  of  congress  for  bibliographic   description  based  on  linked  data.  much  like  semantic  web,  the  goal  of  linked  data  is  to  make  the   web  “data  aware”  and  transform  the  existing  web  of  documents  into  a  web  of  data.  linked  data   leverages  the  existing  web  infrastructure  and  allows  linking  and  sharing  of  structured  data  for   human  and  machine  consumption.   the  bibframe  model  attempts  to  contextualize  the  linked  data  technology  for  libraries.  library   applications  and  systems  contain  high-­‐quality  structured  metadata,  but  this  data  is  generally  static   in  its  presentation  and  seldom  integrated  with  other  internal  metadata  sources  or  linked  to  external   web  resources.  with  bibframe  existing  disparate  library  metadata  sources  such  as  catalogs  and   digital  collections  can  be  harvested  and  integrated  over  the  web.  in  addition,  bibliographic  data   enriched  with  linked  data  could  offer  richer  navigational  control  and  access  points  for  users.  with   linked  data  principles,  metadata  from  libraries  could  also  become  harvestable  by  search  engines,   transforming  dormant  catalogs  and  digital  collections  into  active  knowledge  repositories.  thus   experimenting  with  linked  data  using  existing  bibliographic  metadata  holds  the  potential  to   empower  libraries  to  harness  the  reach  of  commercial  search  engines  to  continuously  discover,   navigate,  and  obtain  new  domain  specific  knowledge  resources  on  the  basis  of  their  verified   metadata.   the  initial  part  of  the  paper  introduces  bibframe  and  discusses  linked  data  in  the  context  of   libraries.  the  final  part  of  this  paper  outlines  and  illustrates  a  step-­‐by-­‐step  process  for  implementing   bibframe  with  existing  library  metadata.   introduction   library  applications  and  systems  contain  high-­‐quality  structured  metadata,  but  this  data  is  seldom   integrated  or  linked  with  other  web  resources.  this  is  adequately  illustrated  by  the  nominal   presence  of  library  metadata  on  the  web.1  libraries  have  much  to  offer  to  the  web  and  its  evolving   future.  
making  library  metadata  harvestable  over  the  web  may  not  only  refine  precision       karim  tharani  (karim.tharani@usask.ca)  is  information  technology  librarian  at  the  university   of  saskatchewan  in  saskatoon,  canada.     information  technology  and  libraries  |  march  2015   6   and  recall  but  has  the  potential  to  empower  libraries  to  harness  the  reach  of  commercial  search   engines  to  continuously  discover,  navigate,  and  obtain  new  domain  specific  knowledge  resources   on  the  basis  of  their  verified  metadata.  this  is  a  novel  and  feasible  idea,  but  its  implementation   requires  libraries  to  both  step  out  of  their  comfort  zones  and  to  step  up  to  the  challenge  of  finding   collaborative  solutions  to  bridge  the  islands  of  information  that  we  have  created  on  the  web  for   our  users  and  ourselves.     by  way  of  a  case  study,  this  paper  illustrates  and  evaluates  the  bibliographic  framework  (or   bibframe)  as  means  for  harvesting  and  sharing  bibliographic  metadata  over  the  web  for  libraries.   bibframe  is  an  emerging  framework  developed  under  the  auspices  of  the  library  of  congress  to   exert  bibliographic  control  over  traditional  and  web  resources  in  an  increasingly  digital  world.   while  bibframe  has  been  introduced  as  a  potential  replacement  for  marc  (machine-­‐readable   cataloging)  in  libraries;2  however,  the  goal  of  this  paper  is  to  highlight  the  merits  of  bibframe  as   a  mechanism  for  libraries  to  share  metadata  over  the  web.   bibframe  and  linked  data   while  the  impetus  behind  bibframe  may  have  been  replacement  of  marc,  “it  seems  likely  that   libraries  will  continue  using  marc  for  years  to  come  because  that  is  what  works  with  available   library  systems.”3  despite  its  uncertain  future  in  the  cataloging  world,  bibframe  in  its  current   form  provides  fresh  and  insightful  mechanism  for  libraries  to  repackage  and  share  bibliographic   metadata  over  the  web.  bibframe  utilizes  the  linked  data  paradigm  for  publishing  and  sharing   data  over  the  web.4  much  like  semantic  web,  the  goal  of  linked  data  is  to  make  the  web  “data   aware”  and  transform  the  existing  web  of  documents  into  a  web  of  data.  linked  data  utilizes   existing  web  infrastructure  and  allows  linking  and  sharing  of  structured  data  for  human  and   machine  consumption.  in  a  recent  study  to  understand  and  reconcile  various  perspectives  on  the   effectiveness  of  linked  data,  the  authors  raise  intriguing  questions  about  the  possibilities  of   leveraging  linked  data  for  sharing  library  metadata  over  the  web:     although  library  metadata  made  the  transition  from  card  catalogs  to  online  catalogs   over  40  years  ago,  and  although  a  primary  source  of  information  in  today’s  world  is  the   web,  metadata  in  our  opacs  are  no  more  free  to  interact  on  the  web  today  than  when   they  were  confined  on  3"  ×  5"  catalog  cards  in  wooden  drawers.  what  if  we  could  set   free  the  bound  elements?  that  is,  what  if  we  could  let  serial  titles,  subjects,  creators,   dates,  places,  and  other  elements,  interact  independently  with  data  on  the  web  to  which   they  are  related?  
what  might  be  the  possibilities  of  a  statement-­‐based,  linked  data   environment?  5       linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       7     figure  1.  the  bibframe  model6   bibframe  provides  the  means  for  libraries  to  experiment  with  linked  data  to  find  answers  to   these  questions  for  themselves.  this  makes  bibframe  both  daunting  and  delighting   simultaneously.  it  is  daunting  because  it  imposes  a  paradigm  shift  in  how  libraries  have   historically  managed,  exchanged,  and  shared  metadata.  but  embracing  linked  data  also  leads  to  a   promise  land  where  metadata  within  and  among  libraries  can  be  exchanged  seamlessly  and   economically  over  the  web.  bibframe  (http://bibframe.org)  consists  of  a  model  and  a  vocabulary   set  specifically  designed  for  bibliographic  control.7  the  model  identifies  four  main  classes,  namely,   work,  instance,  authority,  and  annotation  (see  figure  1).  for  each  of  these  classes,  there  are  many   hierarchical  attributes  that  help  in  describing  and  linking  instantiations  of  these  classes.  these   properties  are  collectively  called  the  bibframe  vocabulary.     philosophically,  linked  data  is  based  on  the  premise  that  more  links  among  resources  will  lead  to   better  contextualization  and  credibility  of  resources,  which  in  turn  will  help  in  filtering  irrelevant   resources  and  discovering  new  and  meaningful  resources.  at  a  more  practical  level,  linked  data   provides  a  simple  mechanism  to  make  connections  among  pieces  of  information  or  resources  over   the  web.  more  specifically,  it  not  only  allows  humans  to  make  use  of  these  links  but  also  machines   to  do  so  without  human  intervention.  this  may  sound  eerie,  but  one  has  to  understand  the  history   behind  the  origin  of  linked  data  not  to  think  of  this  as  yet  another  conspiracy  for  machines  to  take   over  the  world  (wide  web).     in  1994  tim  berners-­‐lee,  the  inventor  of  the  web,  put  forth  his  vision  of  the  semantic  web  as  a   “web  of  actionable  information—information  derived  from  data  through  a  semantic  theory  for     information  technology  and  libraries  |  march  2015   8   interpreting  the  symbols.  the  semantic  theory  provides  an  account  of  ‘meaning’  in  which  the   logical  connection  of  terms  establishes  interoperability  between  systems.”8  while  the  idea  of   semantic  web  has  not  been  fully  realized  for  a  variety  of  functional  and  technical  reasons,  the   notion  of  linked  data  introduced  subsequently  has  made  the  concept  much  more  accessible  and   feasible  for  a  wider  application.9  once  again,  it  was  tim  berners-­‐lee  who  put  forth  the  ground   rules  for  publishing  data  on  the  web  that  are  now  known  as  the  linked  data  principles.10  these   principles  advocate  using  standard  mechanisms  for  naming  each  resource  and  their  relationships   with  unique  universal  resource  identifiers  (uris);  making  use  of  the  existing  web  infrastructure   for  connecting  resources;  and  using  resource  description  framework  (rdf)  for  documenting  and   sharing  resources  and  their  relationships.     
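because linked data relies on nothing more than the ordinary web protocol, a client can obtain the machine-readable description of a resource simply by dereferencing its uri and asking for rdf. the short python sketch below assumes the requests library and a publisher that honours content negotiation; the uri in the usage comment is a placeholder to be replaced with any resource published as linked data.

```python
# A sketch of dereferencing a linked data URI over plain HTTP.
# The Accept header asks for RDF/XML instead of the HTML page a browser
# would normally receive from the same address.
import requests

def fetch_rdf(resource_uri):
    """Return the RDF/XML description of a resource, if the server offers one."""
    response = requests.get(
        resource_uri,
        headers={"Accept": "application/rdf+xml"},
        timeout=30,
    )
    response.raise_for_status()
    return response.text

# Example (placeholder URI): fetch_rdf("http://example.org/resource/1")
```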
a  uri  serves  as  a  persistent  name  or  handle  for  a  resource  and  is  ideally  independent  of  the   underlying  location  and  technology  of  the  resource.  although  often  used  interchangeably,  a  uri  is   different  from  a  url  (or  universal  resource  locator),  which  is  a  more  commonly  used  term  for   web  resources.  a  url  is  a  special  type  of  uri,  which  points  to  the  actual  location  (or  the  web   address)  of  a  resource,  including  the  file  name  and  extension  (such  as  .html  or  .php)  of  a  web   resource.  being  more  generic,  the  use  of  uris  (as  opposed  to  urls)  in  linked  data  provides   persistency  and  flexibility  of  not  having  to  change  the  names  and  references  every  time  resources   are  relocated  or  there  is  a  change  in  server  technology.  for  example  if  an  organization  switches  its   underlying  web-­‐scripting  technology  from  active  server  pages  (asp)  to  java  server  pages  (jsp),  all   the  files  on  a  web  server  will  bear  a  different  extension  (e.g.,  .jsp)  causing  all  previous  urls  with   old  extension  (e.g.,  .asp)  to  become  invalid.  this  technology  change,  however,  may  have  no  impact   if  uris  are  used  instead  of  urls  because  the  underlying  implementation  and  location  details  for  a   resource  are  masked  from  the  public.  thus  the  uri  naming  scheme  within  an  organization  must   be  developed  independent  of  the  underlying  technology.  there  are  diverse  best  practices  on  how   to  name  uris  to  promote  usability,  longevity,  and  persistence.11  the  most  important  factors,   however,  remain  the  purpose  and  the  context  for  which  the  resources  are  being  harvested  and   shared.     use  of  rdf  is  also  a  requirement  of  using  linked  data  for  sharing  data  over  the  web.  much  like   how  html  (hypertext  markup  language)  is  used  to  create  and  publish  documents  over  the  web,   rdf  is  used  to  create  and  publish  linked  data  over  the  web.  the  format  of  rdf  is  very  simple  and   makes  use  of  three  fundamental  elements,  namely,  subject,  predicate,  and  object.  similar  to  the   structure  of  a  basic  sentence,  the  three  elements  make  up  the  unit  of  description  of  a  resource   known  as  a  triple  in  the  rdf  terminology.  unsurprisingly,  rdf  requires  all  three  elements  to  be   denoted  by  uris  with  the  exception  of  the  object,  which  may  also  be  represented  by  constant   values  such  as  a  dates,  strings,  or  numbers.12  as  an  example,  consider  the  work  divine  comedy.  the   fact  this  work,  also  known  as  divina  commedia,  was  created  by  dante  alighieri  can  be  represented   by  the  following  two  triples  (using  n-­‐triples  format):       linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       9              .         “divina  commedia”.   in  the  first  triple  of  this  example,  the  work  divine  comedy  (subject)  is  being  attributed  to  a  person   called  dante  alighieria  (object)  as  the  creator  (predicate).  in  the  second  triple  the  use  of  sameas   predicate  asserts  that  both  divine  comedy  and  divina  commedia  refer  to  the  same  resource.  
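the two example triples themselves do not survive in this copy of the text; the fragment below reconstructs them with the rdflib library, using illustrative uris that stand in for whatever identifiers the original example printed.

```python
# A reconstruction of the Divine Comedy example: two triples, one asserting
# the creator relationship and one asserting that the two names refer to the
# same resource. All URIs below are illustrative stand-ins, not those used in
# the article.
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS, OWL

work  = URIRef("http://example.org/work/divine-comedy")
dante = URIRef("http://example.org/person/dante-alighieri")
alt   = URIRef("http://example.org/work/divina-commedia")

g = Graph()
g.add((work, DCTERMS.creator, dante))  # Divine Comedy was created by Dante Alighieri
g.add((work, OWL.sameAs, alt))         # Divine Comedy and Divina Commedia name the same resource

print(g.serialize(format="nt"))        # emit the graph in N-Triples form
```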
thus   using  uris  makes  the  resources  and  relationships  persistent  whereas  use  of  rdf  makes  the   format  discernible  by  humans  and  machines.  this  seemingly  simple  idea  allows  data  to  be   captured,  formatted,  shared,  transmitted,  received,  and  decoded  over  the  web.  use  of  the  existing   web  protocol  (http  or  hypertext  transfer  protocol)  for  exchanging  and  integrating  data  saves   the  overhead  of  putting  additional  agreements  and  infrastructure  in  place  among  parties  willing   or  wishing  to  exchange  data.  this  ease  and  freedom  to  define  relationships  among  resources  over   the  web  also  makes  it  possible  for  disparate  data  sources  to  interact  and  integrate  with  each  other   openly  and  free  of  cost.     why  is  this  seemingly  simple  idea  so  significant  for  the  future  of  the  web?  from  a  functional   perspective,  what  this  means  is  that  linked  data  facilitates  “using  the  web  to  create  typed  links   between  data  from  different  sources.  these  may  be  as  diverse  as  databases  maintained  by  two   organisations  in  different  geographical  locations,  or  simply  heterogeneous  systems  within  one   organisation  that,  historically,  have  not  easily  interoperated  at  the  data  level.”13  the  notion  of   typed  linking  refers  to  the  facility  and  freedom  of  being  able  to  have  and  name  multiple   relationships  among  resources.  from  a  technical  point  of  view,  “linked  data  refers  to  data   published  on  the  web  in  such  a  way  that  it  is  machine-­‐readable,  its  meaning  is  explicitly  defined,  it   is  linked  to  other  external  data  sets,  and  can  in  turn  be  linked  to  from  external  data  sets.”14  in  a   traditional  database,  relationships  between  entities  or  resources  are  predefined  by  virtue  of  tables   and  column  names.  moreover,  data  in  such  databases  become  part  of  the  deep  web  and  not   readily  accessed  or  indexed  by  search  engines.  15   the  use  of  uris  to  name  relationships  allows  data  sources  to  establish,  use,  and  reuse   vocabularies  to  define  relationships  between  existing  resources.  these  names  or  vocabularies,   much  like  the  resources  they  describe,  have  their  own  dedicated  uris,  making  it  possible  for   resources  to  form  long-­‐term  and  reliable  relationships  with  each  other.  if  resources  and   relationships  have  and  retain  their  identities  by  virtue  of  their  uris,  then  links  between  resources   add  to  the  awareness  of  these  resources  both  for  humans  and  machines.  this  is  a  key  concept  in   realizing  the  overall  mission  of  linked  data  to  imbue  data  awareness  and  transforming  the   existing  web  of  documents  into  a  web  of  data.  consequently  various  institutions  and  industries     information  technology  and  libraries  |  march  2015   10   have  established  standard  vocabularies  and  made  them  available  for  others  to  use  with  their  data.   for  example,  the  library  of  congress  has  published  its  subject  headings  as  linked  data.  
the   impetus  behind  this  gesture  is  that  if  data  from  multiple  organizations  is  “typed  link”  using  lcsh   (library  of  congress  subject  headings)  with  linked  data,  then  libraries  and  others  gain  the  ability   to  categorize,  collocate,  and  integrate  data  from  disparate  systems  over  the  web  by  virtue  of  using   a  common  vocabulary.  as  more  and  more  resources  link  to  each  other  through  established  and   reusable  vocabularies,  the  more  data  aware  the  web  becomes.  recognizing  this  opportunity,  the   library  of  congress  has  also  developed  and  shared  its  vocabulary  for  bibliographic  control  as  part   of  the  bibframe  framework.16     implementing  bibframe  to  harvest  and  share  bibliographic  metadata   nowadays,  systems  like  catalogs  and  digital  collection  repositories  are  commonplace  in  libraries,   but  these  source  systems  often  operate  as  islands  of  data  both  within  and  across  libraries.  the   goal  of  this  case  study  is  to  explore  and  evaluate  bibframe  as  a  viable  approach  for  libraries  to   integrate  and  share  disparate  metadata  over  the  web.  as  discussed  above,  the  bibframe  model   attempts  to  contextualize  the  use  of  linked  data  for  libraries  and  provides  a  conceptual  model  and   underlying  vocabulary  to  do  so.  to  this  end,  a  unique  collection  of  ismaili  muslim  community  was   identified  for  the  case  study.  the  collection  is  physically  housed  at  the  harvard  university  library   (hul)  and  the  metadata  for  the  collection  is  dispersed  across  multiple  systems  within  the  library.   an  additional  objective  of  this  case  study  has  been  to  define  concrete  and  replicable  steps  for   libraries  to  implement  bibframe.  the  discussion  below  is  therefore  presented  in  a  step-­‐by-­‐step   format  for  harvesting  and  sharing  bibliographic  metadata  over  the  web.     1. establishing  a  purpose  for  harvesting  metadata   the  harvard  collection  of  ismaili  literature  is  first  of  its  kind  in  north  america.  “the  most   important  genre  represented  in  the  collection  is  that  of  the  ginans,  or  the  approximately  one   thousand  hymn-­‐like  poems  written  in  an  assortment  of  indian  languages  and  dialects.”17  the   feasibility  of  bibframe  was  explored  in  this  case  study  by  creating  a  thematic  research  collection   of  ginans  by  harvesting  existing  bibliographic  metadata  at  hul.  the  purpose  of  this  thematic   research  collection  is  to  make  ginans  accessible  to  researchers  and  scholars  for  textual  criticism.   historically  libraries  have  played  a  vital  role  in  making  extant  manuscripts  and  other  primary   sources  accessible  to  scholars  for  textual  criticism.  the  need  for  having  such  a  collection  in  place   for  ginans  was  identified  by  dr.  ali  asani,  professor  of  indo-­‐muslim  and  islamic  religion  and   cultures  at  harvard  university:     perhaps  the  greatest  obstacle  for  further  studies  on  the  ginan  literature  is  the  almost   total  absence  of  any  kind  of  textual  criticism  on  the  literature.  thus  far  merely  two  out  of   the  nearly  one  thousand  compositions  have  been  critically  edited.  naturally,  the   availability  of  reliably  edited  texts  is  fundamental  to  any  substantial  scholarship  in  this   field.  .  .  .  
for  the  scholar  of  post-­‐classical  ismaili  literature,  recourse  to  this  kind  of     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       11   material  has  become  especially  critical  with  the  growing  awareness  that  there  exist   significant  discrepancies  between  modern  printed  versions  of  several  ginans  and  their   original  manuscript  form.  fortunately,  the  harvard  collection  is  particularly  strong  in  its   holdings  of  a  large  number  of  first  editions  of  printed  ginan  texts—a  strength  that  should   greatly  facilitate  comparisons  between  recensions  of  ginans  and  the  preparation  of   critical  editions.18   2. modeling  the  data  to  fulfill  functional  requirements   historically,  the  physicality  of  resources  such  as  book  or  compact  disc  has  dictated  what  is   described  in  library  catalogs  and  to  what  extent.  the  issue  of  cataloging  serials  and  other  works   embedded  within  larger  works  has  always  been  challenging  for  catalogers.  for  this  case  study  as   well,  one  of  the  major  implementation  decisions  revolved  around  the  granularity  of  defining  a   work.  designating  each  ginan  as  a  work  (rather  than  a  manuscript  or  lithograph)  was  perhaps  an   unconventional  decision,  but  one  that  was  highly  appropriate  for  the  purpose  of  the  collection.   thus  there  was  a  conscious  and  genuine  effort  to  liberate  a  work  from  the  confines  of  its  carriers.   fortuitously,  bibframe  does  not  shy  away  from  this  challenge  and  accommodates  embedded  and   hierarchal  works  in  its  logical  model.  but  bibframe,  like  any  other  conceptual  model,  only   provides  a  starting  point,  which  needs  to  be  adapted  and  implemented  for  individual  project   needs.       figure  2.  excerpt  of  project  data  model     information  technology  and  libraries  |  march  2015   12   the  data  model  for  this  case  study  (see  figure  2)  was  designed  to  balance  the  need  to   accommodate  bibliographic  metadata  with  the  demands  of  linked  data  paradigm.  central  to  the   project  data  model  is  the  resources  table  where  information  on  all  resources  along  with  their  uris   and  categories  (work,  instance,  etc.)  are  stored.  resources  relate  to  each  other  with  use  of   predicates  table,  which  captures  relevant  and  applicable  vocabularies.  the  namespace  table  keeps   track  of  all  the  set  of  vocabularies  being  used  for  the  project.  in  the  triples  table,  resources  are   typed  linked  using  appropriate  predicates.  once  the  data  model  for  the  project  was  finalized,  a   database  was  created  using  mysql  to  house  the  project  data.   3. planning  the  uri  scheme     in  general  the  uri  scheme  for  this  case  study  conformed  to  the  following  intuitive  nomenclature:   .     this  uri  naming  scheme  ensures  that  a  uri  assigned  to  a  resource  depends  on  its  class  and   category  (see  table  1).  while  it  may  be  customary  to  use  textual  identifiers  in  the  uris,  the  project   used  numeric  identifiers  to  account  for  the  fact  that  most  of  the  ginans  (works)  are  untitled  and   transliterated  into  english  from  various  indic  languages.  
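the four tables named in figure 2 and the category-based uri pattern can be sketched together in a few lines; the column names below are inferred from the description above, and sqlite3 stands in for the project's mysql database so that the fragment is self-contained.

```python
# A sketch of the project data model (resources, predicates, namespaces,
# triples) and of minting URIs of the form <base>/<category>/<identifier>.
# Table and column names are assumptions made for illustration.
import sqlite3

SCHEMA = """
CREATE TABLE namespaces (prefix TEXT PRIMARY KEY, base_uri TEXT NOT NULL);
CREATE TABLE predicates (id INTEGER PRIMARY KEY,
                         prefix TEXT REFERENCES namespaces(prefix),
                         term TEXT NOT NULL);
CREATE TABLE resources  (id INTEGER PRIMARY KEY,
                         uri TEXT UNIQUE NOT NULL,
                         category TEXT NOT NULL,
                         label TEXT);
CREATE TABLE triples    (subject_id   INTEGER REFERENCES resources(id),
                         predicate_id INTEGER REFERENCES predicates(id),
                         object_id    INTEGER REFERENCES resources(id),
                         object_value TEXT);
"""

BASE = "http://domain.com"  # base address used in the article's table 1 examples

def mint_uri(category, numeric_id):
    """Build a URI whose path depends on the resource category, e.g. /ginan/1."""
    return f"{BASE}/{category}/{numeric_id}"

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO resources (uri, category) VALUES (?, ?)",
           (mint_uri("ginan", 1), "work"))
db.commit()
```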
generally  support  for  using  uris  is  either   already  built-­‐in  or  added  on  depending  on  the  server  technology  being  used.  this  case  study   utilized  the  lamp  (linux,  apache,  mysql,  and  php)  technology  stack,  and  the  uri  handler  for  the   project  was  added  on  to  the  apache  webserver  using  url-­‐rewriting  (or  mod_rewrite)  facility.19     resource  types   bibframe  category   uri  example   organizations   annotation   http://domain.com/organization/1   collections   annotation   http://domain.com/collection/1   items   instance   http://domain.com/item/1     ginan   work   http://domain.com/ginan/1     subjects   authority   http://domain.com/subject/1     table  1.  uri  naming  scheme  and  examples   4. using  standard  vocabularies     bibframe  provides  the  relevant  vocabulary  and  the  underlying  uris  to  implement  linked  data   with  bibliographic  data  in  libraries.  while  not  all  attributes  may  be  applicable  or  used  in  a  project,   the  ones  that  are  identified  as  relevant  must  be  referenced  with  their  rightful  uri.  for  example,   the  predicate  hasauthority  from  bibframe  has  a  persistent  uri   (http://bibframe.org/vocab/hasauthority)  enabling  humans  as  well  as  machines  to  access  and   decode  the  purpose  and  scope  of  this  predicate.  other  vocabulary  sets  or  namespaces  commonly   used  with  linked  data  include  resource  description  frameowrk  (rdf),  web  ontology  language   (owl),  friend  of  a  friend  (foaf),  etc.  in  rare  circumstances,  libraries  may  also  choose  to  publish   their  own  specific  vocabulary.  for  example,  any  unique  predicates  for  this  case  study  could  be     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       13   defined  and  published  using  the  http://domain.com/vocab  namespace.   5. identifying  data  sources     the  bibliographic  metadata  used  for  this  case  study  was  obtained  from  within  hul.  as  mentioned   above,  the  data  pertained  to  a  unique  collection  of  religious  literature  belonging  to  the  ismaili   muslim  community  of  the  indian  subcontinent.  this  collection  was  acquired  by  the  middle  eastern   department  of  the  harvard  college  library  in  1980.  the  collection  comprises  28  manuscripts,  81   printed  books,  and  11  lithographs.  in  1992,  a  book  on  the  contents  of  this  collection  was  published   in  1992  by  dr.  asani  and  was  titled  the  harvard  collection  of  ismaili  literature  in  indic  languages:   a  descriptive  catalog  and  finding  aid.  the  indexes  in  the  book  served  as  one  of  the  sources  of  data   for  this  case  study.     subsequent  to  the  publication  of  the  book,  the  harvard  collection  of  ismaili  literature  was  also   made  available  through  harvard’s  opac  (online  public  access  catalog)  called  hollis  (see  figure  3).   the  catalog  records  were  also  obtained  from  the  library  for  the  case  study.  some  of  the  120  items   from  the  collection  were  subsequently  digitized  and  shared  as  part  of  the  harvard’s  islamic   heritage  project.  the  digital  surrogates  of  these  items  were  shared  through  the  harvard   university  library  open  collections  program.  and  the  library  catalog  records  were  also  updated  to   provide       figure  3.  
hollis:  harvard  university  library’s  opac   direct  access  to  the  digital  copies  where  available.  additional  metadata  for  the  digitized  items  was   also  developed  by  the  library  to  facilitate  open  digital  access  through  harvard  library’s  page   delivery  service  (pds)  to  provide  page-­‐turning  navigational  interface  for  scanned  page  images   over  the  web.  data  from  all  these  sources  was  leveraged  for  the  case  study.       information  technology  and  libraries  |  march  2015   14     6. transforming  source  metadata  for  reuse   etl  (extract,  transform,  and  load)  is  an  acronym  commonly  used  to  refer  to  the  steps  needed  to   populate  a  target  database  by  moving  data  from  multiple  and  disparate  source  systems.  extraction   is  the  process  of  getting  the  data  out  of  the  identified  source  systems  and  making  it  available  for   the  exclusive  use  of  the  new  database  being  designed.  in  the  context  of  the  library  realm,  this  may   mean  getting  marc  records  out  from  a  catalog  or  getting  descriptive  and  administrative  metadata   out  of  a  digital  repository.  format  in  which  data  is  extracted  out  of  a  source  system  is  also  an   important  aspect  of  the  data  extraction  process.  use  of  xml  (extensible  markup  language)  format   is  fairly  common  nowadays  as  most  library  source  systems  have  built-­‐in  functionality  to  export   data  into  a  recognized  xml  standard  such  as  marcxml  (marc  data  encoded  in  xml),  mods   (metadata  object  description  schema),  mets  (metadata  encoding  and  transmission  standard),   etc.  in  certain  circumstances,  data  may  be  extracted  using  csv  (comma-­‐separated  values)  format.   transformation  is  the  step  in  which  data  from  one  or  more  source  systems  is  massaged  and   prepared  to  be  loaded  to  a  new  database.  the  design  of  the  new  database  often  enforces  new  ways   of  organizing  source  data.  the  transformation  process  is  responsible  to  make  sure  that  the  data   from  all  source  systems  is  integrated  while  retaining  its  integrity  before  being  loaded  to  the  new   database.  a  simplistic  example  of  data  transformation  may  be  that  the  new  system  may  require   authors’  first  and  last  names  to  be  stored  in  separate  fields  rather  than  in  a  single  field.  how  such   transformations  are  automated  will  depend  on  the  format  of  the  source  data  as  well  as  the   infrastructure  and  programming  skills  available  within  an  organization.  since  xml  is  becoming   the  de  facto  standard  for  most  data  exchange,  use  of  xslt  (extensible  stylesheet  language   transformations)  scripts  is  common.  with  xslt,  data  in  xml  format  can  be  manipulated  and   given  different  structure  to  aid  in  the  transformation  process.     the  loading  process  is  responsible  for  populating  the  newly  minted  database  once  all   transformations  have  been  applied.  one  of  the  major  considerations  in  this  process  is  maintaining   the  referential  integrity  of  the  data  by  observing  the  constraints  dictated  by  the  data  model.  this  is   achieved  by  making  sure  that  records  are  correctly  linked  to  each  other  and  are  loaded  in  proper   sequence.  
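as a concrete illustration of the transform and load steps just described, a short script along the following lines could pull the needed elements out of a marcxml export and emit rows for loading in dependency order; the marcxml element paths are standard, but the choice of fields and the shape of the output rows are simplifying assumptions.

```python
# A sketch of the "transform" stage: read a MARCXML export, extract the title
# (field 245, subfields a and b) and topical subjects (field 650, subfield a),
# and yield one flat row per record, ready to load into the target database.
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def subfield_values(record, tag, codes):
    values = []
    for field in record.findall(f"marc:datafield[@tag='{tag}']", MARC_NS):
        for sf in field.findall("marc:subfield", MARC_NS):
            if sf.get("code") in codes and sf.text:
                values.append(sf.text.strip())
    return values

def transform(marcxml_path):
    """Yield one dictionary of extracted values per bibliographic record."""
    root = ET.parse(marcxml_path).getroot()
    for record in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        yield {
            "title": " ".join(subfield_values(record, "245", {"a", "b"})),
            "subjects": subfield_values(record, "650", {"a"}),
        }

# Loading then proceeds in dependency order, with parent records inserted
# before the records that reference them, so that links between rows resolve.
```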
for  instance,  to  ensure  referential  integrity  of  items  and  their  annotations,  it  may  be   necessary  to  load  the  items  first  and  then  the  annotations  with  correct  reference  to  the  associated   item  identifiers.   for  this  case  study,  records  from  source  systems  were  obtained  in  marcxml  and  mets  formats,   and  specific  scripts  were  developed  to  extract  desired  elements  and  transform  them  into  the   required  format.  a  somewhat  unconventional  mechanism  was  used  to  capture  and  reuse  the  data   from  dr.  asani’s  book,  which  was  only  available  in  print.  the  entire  book  was  scanned  and   processed  by  an  ocr  (optical  character  recognition)  tool  to  glean  various  data  elements.  once  the   data  was  cleaned  and  verified,  the  information  was  transformed  into  a  csv  data  file  to  facilitate     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       15   database  loading.   7. generating  rdf  triples   the  rdf  triples  can  be  written  or  serialized  using  a  variety  of  formats  such  as  turtle,  n-­‐triples,   json,  as  well  as  rdf/xml,  among  others.  the  traditional  rdf/xml  format,  which  was  the  first   standard  to  be  recommended  for  rdf  serialization  by  the  world  wide  web  consortium  (w3c),   was  used  for  this  case  study  (see  figure  4).  the  format  was  chosen  for  its  modularity  in  preserving   the  context  of  resources  and  their  relationships  as  well  as  its  readability  for  humans.  generating   rdf  may  be  a  simple  act  if  the  data  is  already  stored  in  a  triplestore,  which  is  a  database   specifically  designed  to  store  rdf  data.  but  given  that  this  project  was  implemented  using  a   relational  database  management  system  (rdbms),  i.e.,  mysql,  the  programming  effort  to  generate   rdf  data  was  complex.  the  complications  arose  in  identifying  and  tracking  the  hierarchical  nature   of  the  rdf  data,  especially  in  the  chosen  serialization  format.  several  server-­‐side  scripts  were   developed  to  aid  in  discerning  the  relationships  among  resources  and  formatting  them  to  generate   triples.  in  hindsight  generating  triples  would  have  been  easier  using  the  n-­‐triples  serialization  but   that  would  have  also  required  more  complex  programming  for  rebuilding  the  context  for  the  user   interface  design.   figure  4.  a  sample  of  triples  serialized  for  the  project   8. formatting  rdf  triples  for  human  and  machine  consumption   the  raw  rdf  data  is  sufficient  for  machines  to  parse  and  process,  but  humans  typically  require   intuitive  user  interface  to  contextualize  triples.  in  this  case  study,  xsl  was  extensively  used  for   formatting  the  triples.  while  xslt  and  xsl  (extensible  stylesheet  language)  are  intricately   related,  they  serve  different  purposes.  xslt  is  a  scripting  language  to  manipulate  xml  data   whereas  xsl  is  a  formatting  specification  used  in  presentation  of  xml,  much  like  how  css   (cascading  style  sheets)  are  used  for  presenting  html.  a  special  routing  script  was  also   developed  to  detect  whether  the  request  for  data  was  intended  for  machine  or  human   consumption.  
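a routing decision of this kind amounts to content negotiation on the http accept header; the sketch below illustrates the idea, with the function names and the trivial html rendering being assumptions rather than the project's actual script.

```python
# A sketch of content negotiation between machine and human consumers of the
# same triples: serve raw RDF/XML when RDF is requested, otherwise wrap the
# data for display in a browser (the project itself used XSL for this step).
import html

RDF_TYPES = ("application/rdf+xml", "text/rdf", "application/xml")

def wants_rdf(accept_header):
    """True when the client, typically a machine agent, asks for RDF."""
    accept = (accept_header or "").lower()
    return any(t in accept for t in RDF_TYPES) and "text/html" not in accept

def render_html(rdf_xml):
    """Minimal stand-in for the XSL formatting used for human-readable display."""
    return "<pre>" + html.escape(rdf_xml) + "</pre>"

def respond(accept_header, rdf_xml):
    if wants_rdf(accept_header):
        return "application/rdf+xml", rdf_xml      # machine request: raw triples
    return "text/html", render_html(rdf_xml)       # human request: formatted page

print(respond("application/rdf+xml", "<rdf:RDF/>")[0])  # application/rdf+xml
```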
for  machine  requests,  the  triples  were  served  unformatted  whereas  for  human   requests,  the  triples  were  formatted  to  display  in  html.       information  technology  and  libraries  |  march  2015   16     figure  5.  formatted  triples  for  human  consumption   discussion   models  are  tools  of  communicating  simple  and  complex  relations  between  objects  and  entities  of   interest.  effectiveness  of  any  model  is  often  realized  during  implementation  when  the  theoretical   constructs  of  the  models  are  put  to  test.  the  challenge  faced  by  bibframe,  like  any  new  model,  is   to  establish  its  worthiness  in  the  face  of  the  existing  legacy  of  marc.  the  existing  hold  of  marc  in   libraries  is  so  strong  that  it  may  take  several  years  for  bibframe  to  be  in  a  position  to  challenge   the  status  quo.  historically  bibliographic  practices  in  libraries  such  as  describing,  classifying,  and   cataloging  resources  have  primarily  catered  to  tangible,  print-­‐based  knowledge  carriers  such  as   books  and  journals.20  bibframe  challenges  libraries  to  revisit  and  refresh  their  traditional  notion   of  text  and  textuality.   although  initially  introduced  as  a  replacement  for  marc,  bibframe  is  far  from  being  an  either-­‐or   proposition  given  the  marc  legacy.  nevertheless,  bibframe  has  made  linked  data  paradigm   much  more  accessible  and  practical  for  libraries.  rather  than  perceiving  bibframe  as  a  threat  to   existing  cataloging  praxis,  it  may  be  useful  for  libraries  to  allow  bibframe  to  coexist  within  the   current  cataloging  landscape  as  a  means  for  sharing  bibliographic  data  over  the  web.  libraries   maintain  and  provide  authentic  metadata  about  knowledge  resources  for  their  users  based  on   internationally  recognized  standards.  this  high  quality  structured  metadata  from  library  catalogs   and  other  systems  can  be  leveraged  and  repurposed  to  fulfill  unmet  and  emerging  needs  of  users.   with  linked  data,  library  metadata  could  become  readily  harvestable  by  search  engines,   transforming  dormant  catalogs  and  collections  into  active  knowledge  repositories.   in  this  case  study  seemingly  disparate  library  systems  and  data  were  integrated  to  provide  a   unified  and  enabling  access  to  create  a  thematic  research  collection.  it  is  also  possible  to  create   such  purpose-­‐specific  digital  libraries  and  collections  as  part  of  library  operations  without  having   to  acquire  additional  hardware  and  commercial  software.  it  was  also  evident  from  this  case  study   that  digital  libraries  built  using  bibframe  offer  superior  navigational  control  and  access  points     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       17   for  users  to  actively  interact  with  bibliographic  data.  any  linked  data  predicate  has  the  potential   to  become  an  access  point  and  act  as  a  pivot  to  provide  insightful  view  of  the  underlying   bibliographic  records  (see  figure  6).  
with  advances  in  digital  technologies  “richer  interaction  is   possible  within  the  digital  environment  not  only  as  more  content  is  put  within  reach  of  the  user,   but  also  as  more  tools  and  services  are  put  directly  in  the  hands  of  the  user.”21  developing  capacity   to  effectively  respond  to  the  informational  needs  of  users  is  part  and  parcel  of  libraries’   professional  and  operational  responsibilities.  with  the  ubiquity  of  the  web  and  increased  reliance   of  users  on  digital  resources,  libraries  must  constantly  reevaluate  and  reimagine  their  services  to   remain  responsive  and  relevant  to  their  users.       figure  6.  increased  navigational  options  with  linked  data   conclusion   just  as  libraries  rely  on  vendors  to  develop,  store,  and  share  metadata  for  commercial  books  and   journals,  similar  metadata  partnerships  need  to  be  put  in  place  across  libraries.  the  benefits  and   implications  of  establishing  such  a  collaborative  metadata  supply  chain  are  far  reaching  and  can   also  accommodate  cultural  and  indigenous  resources.  library  digital  collections  typically   showcase  resources  that  are  unique  and  rare,  and  the  metadata  to  make  these  collections   accessible  must  be  shared  over  the  web  as  part  of  library  service.     as  the  amount  of  data  on  the  web  proliferates,  users  find  it  more  and  more  difficult  to  differentiate   between  credible  knowledge  resources  and  other  resources.  bibframe  has  the  potential  to   address  many  of  the  issues  that  plague  the  web  from  a  library  and  information  science  perspective,   including  precise  search,  authority  control,  classification,  data  portability,  and  disambiguation.   most  popular  search  engines  like  google  are  gearing  up  to  automatically  index  and  collocate   disparate  resources  using  linked  data.22  libraries  are  particularly  well  positioned  to  realize  this   goal  with  their  expertise  in  search,  metadata  generation,  and  ontology  development.  this  research   looks  forward  to  further  initiatives  by  libraries  to  become  more  responsive  and  make  library     information  technology  and  libraries  |  march  2015   18   resources  more  relevant  to  the  knowledge  creation  process.     references     1.     tim  f.  knight,  “break  on  through  to  the  other  side:  the  library  and  linked  data,”  tall   quarterly  30,  no.  1  (2011):  1–7,  http://hdl.handle.net/10315/6760.   2.     eric  miller  et  al.,  “bibliographic  framework  as  a  web  of  data:  linked  data  model  and   supporting  services,”  november  11,  2012,  http://www.loc.gov/bibframe/pdf/marcld-­‐report-­‐ 11-­‐21-­‐2012.pdf.   3.     angela  kroeger,  “the  road  to  bibframe:  the  evolution  of  the  idea  of  bibliographic   transition  into  a  post-­‐marc  future,”  cataloging  &  classification  quarterly  51,  no.  8  (2013):   873–89,  http://dx.doi.org/10.1080/01639374.2013.823584.   4.     eric  miller  et  al.,  “bibliographic  framework  as  a  web  of  data:  linked  data  model  and   supporting  services,”  november  11,  2012,  http://www.loc.gov/bibframe/pdf/marcld-­‐report-­‐ 11-­‐21-­‐2012.pdf.   5.     nancy  fallgren  et  al.,  “the  missing  link:  the  evolving  current  state  of  linked  data  for  serials,”   serials  librarian  66,  no.  
1–4  (2014):  123–38,   http://dx.doi.org/10.1080/0361526x.2014.879690.   6.     the  figure  has  been  adapted  from  eric  miller  et  al.,  “bibliographic  framework  as  a  web  of   data:  linked  data  model  and  supporting  services,”  november  11,  2012,   http://www.loc.gov/bibframe/pdf/marcld-­‐report-­‐11-­‐21-­‐2012.pdf.   7.     “bibliographic  framework  initiative  project,”  library  of  congress,  accessed  august  15,  2014,   http://www.loc.gov/bibframe.   8.     nigel  shadbolt,  wendy  hall,  and  tim  berners-­‐lee,  “the  semantic  web  revisited,”  intelligent   systems  21  no.  3  (2006):  96–101,  http://dx.doi.org/10.1109/mis.2006.62.   9.     sören  auer  et  al.,  “introduction  to  linked  data  and  its  lifecycle  on  the  web,”  in  reasoning   web:  semantic  technologies  for  intelligent  data  access,  edited  by  sebastian  rudolph  et  al.,  1– 90  (heidelberg:  springer,  2011),  http://dx.doi.org/10.1007/978-­‐3-­‐642-­‐23032-­‐5_1.   10.    tim  berners-­‐lee,  “linked  data,”  design  issues,  last  modified  june  18,  2009,   http://www.w3.org/designissues/linkeddata.html.   11.    danny  ayers  and  max  völkel,  “cool  uris  for  the  semantic  web,”  world  wide  web  consortium   (w3c),  last  modified  march  31,  2008,  http://www.w3.org/tr/cooluris.   12.    tom  heath  and  christian  bizer,  linked  data:  evolving  the  web  into  a  global  data  space   (morgan  &  claypool,  2011),  http://dx.doi.org/10.2200/s00334ed1v01y201102wbe001.     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       19     13.    christian  bizer,  tom  heath,  and  tim  berners-­‐lee,  “linked  data—the  story  so  far,”   international  journal  on  semantic  web  and  information  systems  5,  no.  3  (2009):  1–22,   http://dx.doi.org/10.4018/jswis.2009081901.   14.    ibid.     15.    tony  boston,  “exposing  the  deep  web  to  increase  access  to  library  collections”  (paper   presented  at  the  ausweb05,  the  twelfth  australasian  world  wide  web  conference,   queensland,  australia,  2005),   http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1224/1509.   16.      “bibliographic  framework  initiative,”  bibframe.org,  accessed  august  15,  2014,     http://bibframe.org/vocab;  “bibliographic  framework  initiative  project,”  library  of  congress,   accessed  august  15,  2014,  http://www.loc.gov/bibframe.   17.    ali  asani,  the  harvard  collection  ismaili  literature  in  indic  languages:  a  descriptive  catalog   and  finding  aid  (boston:  g.k.  hall,  1992).   18.    ibid.   19.    ralf  s.  engelschall,  “url  rewriting  guide,”  apache  http  server  documentation,  last  modified   december,  1997,  http://httpd.apache.org/docs/2.0/misc/rewriteguide.html.   20.    yann  nicolas,  “folklore  requirements  for  bibliographic  records:  oral  traditions  and  frbr,”   cataloging  &  classification  quarterly  39,  no.  3–4  (2005):  179–95,   http://dx.doi.org/10.1300/j104v39n03_11.   21.    lee  l.  zia,  “growing  a  national  learning  environments  and  resources  network  for  science,   mathematics,  engineering,  and  technology  education:  current  issues  and  opportunities  for   the  nsdl  program,”  d-­‐lib  magazine  7,  no.  3  (2001),   http://www.dlib.org/dlib/march01/zia/03zia.html.     22.    
thomas  steiner,  raphael  troncy,  and  michael  hausenblas,  “how  google  is  using  linked  data   today  and  vision  for  tomorrow”  (paper  presented  at  the  linked  data  in  the  future  internet   at  the  future  internet  assembly  (fia  2010),  ghent,  december  2010),   http://research.google.com/pubs/pub37430.html. 142 lc/marc on molds; an experiment in computer-based, interactne bibliographic storage, search, retrieval, and processing pauline atherton, associate professor, school of library science, and karen b. miller, research associate, syracuse university, syracuse, new york a project at syracuse university utilizing molds, a generalized computer-based interactive retrieval program, with a portion of the library of congress marc pilot project tapes as a data base. the system, written in fortran, was used in both a batch and an on-line mode. it formed part of a computer laboratory for library science students during 1968-1969. this report describes the system and its components and points out its advantages and disadvantages. introduction the somewhat intimidating title of this report becomes less so when translated from jargon into more familiar phrases. the lc/marc on molds experimental project conducted at syracuse university school of library science utilizes a computer: 1) to store bibliographic reference (library catalog) data, 2) to search the data for items that meet a searcher's criteria, 3) to retrieve items the searcher wishes retrieved, and 4) to process or manipulate items as required. a dialog or interaction between man and his data, via the machine, is established when a searcher makes a request in a query language and the computer responds immediately to the request. the lc/marc on molds system consists of two major components. the first is the data base, which is a slightly modified subset of the library of congress marc pilot project records ( 1). the second component is the computer programming system written in fortran known as molds (acronym for management on-line lc marc on molds/atherton and miller 143 data system). molds provides the computer routines required to store and maintain the data base, and the query language (also known generally as molds) that a searcher uses to interact with his data stored in the computer. the lc/ marc on molds system was originally implemented in april 1968 on the ibm 360/ 50 at the syracuse university computing center. this system is part of an experiment to determine how on-line interactive retrieval systems could be used to greatest advantage in the information gathering process. the molds system, developed in 1966 by the syracuse university research corporation (2) for management purposes, was readily available for use in the research reported in this paper. molds has been used with several data bases, including the marc records. the system has not been made available to a large user population. preliminary work with the system and a few demonstrations to students have already provided considerable insight into the desirable and undesirable features in both the marc data base and the molds query language, an insight that has already resulted in both data-base and querylanguage modification. work with the system on the computer at syracuse university has raised many crucial questions extending beyond the original research plan about system and data base design-questions for which there are as yet no answers. 
even at its early stage of experimentation the work should be of interest to librarians because of its use of the marc pilot project records and its use of an available retrieval program with features suitable for reference retrieval. to the authors' knowledge, this is the first computer-based project in which the library of congress marc records were used in an interactive retrieval environment. the query language (molds) was not specifically designed for reference retrieval, but its design features make its use for this purpose quite feasible. it differs from the usual interactive system designed for bibliographic reference retrieval and therefore deserves attention for comparative purposes. molds gives a user the ability to process as well as retrieve data, something very few search and retrieval systems are designed to do. the contribution of lc/marc on molds to the world of information retrieval, promising though it appears, cannot be assessed until all experiments are run. this report on its features, both good and bad, is offered in order to make those concerned with the design and application of interactive systems aware of its unique aspects and potential. hopefully, this work will contribute another ingredient to the synthesis of ideas and methods that will bring the state of the art ever closer to the optimum and ideal.

table 1. some features of interactive retrieval systems (circa 1968)
[the table compares nine systems (audacious from aip, bold and colex from sdc, grins from lehigh university, multilist, marc/molds, nasa/recon, tip from mit, and the suny biomedical communication network) on the size and structure of each data base, its access points, on-line access to authority files, related terms or cross-references in the query language, the number of commands, computer instruction in language use, computer-aided query formulation, root-word searching, and the communication link (crt, teletype, or console).]

background

a number of interactive retrieval systems have been designed and implemented within the last few years.
the features and potential of lc/marc on molds are best viewed in relation to what has been done in the field up to now. to gain some perspective, the major features of the data base structures and query languages of other interactive systems are summarized in table 1. this table presents those features of most interest to librarians who may wish to compare searching on a computer with searching in the card catalog or other bibliographic reference tools. references 3-12 document sources for the data in this table.

molds data base structure

the general structure of the data base with which molds operates is, in comparison with the threaded lists and inverted indexes found in many retrieval systems, extremely simple and unsophisticated. the data base can be composed of from one to ten distinct files of 1000 records each. a record is equal to the bibliographic description on a card in a library catalog. each record may be up to 300 computer words (1200 characters) long and may be subdivided into 80 blocks. originally there was a 200-word (800-character) limitation on record size, but this has now been expanded. the total file size (a limit of 10,000 records) is adequate for testing purposes, but expansion beyond the present limitations is planned in order to make the system more practical for actual use.

the structure of a file is essentially a simple matrix. each row contains all the elements of a single complete record; each column contains all like discrete items of all the records in the file. the columns are called blocks in the molds system, block and field being used synonymously in this report. for example, a library catalog card for one publication would be a record in a file composed of library catalog records. the main entries in the file constitute a block, and the dates of publication constitute another block. figure 1 illustrates the data base structure as of 1968. in this illustration the maximum number of files is 10 (1000 records each) and the maximum number of blocks is 80. each file and each block in a file is given a name and/or number. a user can reference or call up any file or data block within a file by using its name or number in a molds query language command. there are as many access points to a file as there are blocks in that file. this is in contrast to a conventional card catalog, for example, where the only access points are filing entries: main entry, title, subject(s), added entries, series, and analytics.

no specific provision is made within the molds system for the storage of authority files, cross-reference lists, or other intermediate keys to the records. such files are not absolutely necessary for effective operation of the system, since every block can be accessed and can serve as its own authority file. for more efficient system operation, however, it is intended to explore the possibility of creating authority files as part of the data base, beginning with portions of the seventh edition of the library of congress list of subject headings.

figure 1. section of general molds data base structure. [the figure shows the data base as up to ten files, each a matrix of up to 1000 records by up to 80 named blocks, as of 1968.]
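the cell-matrix organization just described can be made concrete with a short sketch. the python below is an illustration added for this discussion; the original system was written in fortran iv, and the file name, block names, and record values here are invented.

# illustrative sketch of the molds cell-matrix data base: each file is a simple
# matrix of records (rows) by named blocks (columns), and every block is an
# access point. file and block names below are hypothetical.
MAX_RECORDS_PER_FILE = 1000
MAX_BLOCKS = 80


class MoldsFile:
    def __init__(self, name, block_names):
        if len(block_names) > MAX_BLOCKS:
            raise ValueError("a molds file may define at most 80 blocks")
        self.name = name
        self.block_names = list(block_names)
        self.records = []                      # each record is a dict block -> value

    def add_record(self, record):
        if len(self.records) >= MAX_RECORDS_PER_FILE:
            raise ValueError("a molds file holds at most 1000 records")
        self.records.append({b: record.get(b, "") for b in self.block_names})

    def block(self, block_name):
        """return the whole column for one block: its own de facto authority file."""
        return [r[block_name] for r in self.records]


medicine = MoldsFile("MARC", ["MAIN", "TITL", "DATE", "LANG", "BIB"])
medicine.add_record({"MAIN": "SMITH, JOHN", "TITL": "CLINICAL CHEMISTRY",
                     "DATE": "1966", "LANG": "ENG", "BIB": "X"})
medicine.add_record({"MAIN": "JONES, ANN", "TITL": "HOSPITAL PLANNING",
                     "DATE": "1967", "LANG": "ENG", "BIB": ""})
print(medicine.block("DATE"))   # ['1966', '1967']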
provision is made for temporary user storage areas in which the user places the results of his retrieval and processing operations. data in the user area is retained only during the session in which it is created. although it cannot be saved for use at a later date, all or part of it can be printed out on the on-line printer for the user's later reference. while the general structure of the data base is formalized within the molds system, the content and specific organization of a particular data base is determined by its originator. this feature, plus the simplicity of molds' own structure, introduces a great deal of flexibility into the data base and the use that can be made of it. the originator of the data base may designate as a block any discrete data item he wishes. if the user population is dissatisfied with results using one content and arrangement of blocks, the base can be reformatted and restructured in a fairly simple maintenance run. no problems of linking records or modifying authority lists arise, as neither is part of the system. the first version of the lc/marc data base has in f.act been modified by addition of three blocks and division of one block in half to form two blocks, giving access to smaller units of data. 148 journal of library automation vol. 3/2 june, 1970 the lc/marc data base in molds format library of congress marc pilot project tapes containing some 40,000 records of english language books cataloged in 1966-67 became available for this project in the fall of 1967. because of the molds data base limitations, a subset of these catalog records was selected for use with molds. the original plan was to have each file in the data base consist of as complete a set as possible of all marc pilot project records from a single library of congress classification schedule. the candidate for the first file was class r (medicine) which contained just under 1000 records. later molds files were formed for two other lc classes: t (technology) and z (bibliography and library science). in mid-1969 two stratified sample files of the marc data base were created, one in the humanities, another in the social sciences. in all, syracuse has a marc/molds data base of 10,000 records. the record format of the marc tape was first analyzed to determine which fields should be included in the data base, and which might be omitted. the criterion for selection was probable usefulness to searchers of the data base, a conception that should undoubtedly be modified as searches are monitored. appropriate changes would not be difficult. toward the end of january 1969, a programming project was begun which entailed the design and implementation of a computer program to perform format conversion of the library of congress marc i bibliographic file to satisfy molds data base requirements. the project represented a three man-month effort and was completed by june 1969. the data-base converter program represents an attempt to provide a user-oriented facility for creating a molds data base from marc information. 
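before turning to the details, the following python sketch (an illustration added here, not the original pl/i converter) shows the kind of per-file description the converter accepts, as enumerated in the list that follows; the field names, widths, and selection rule are hypothetical stand-ins.

# hypothetical per-file description for a marc-to-molds conversion: fixed field
# widths, source marc fields, a selection rule, and per-field options for
# stripping diacritics and forcing upper-case codes.
import unicodedata

FILE_SPEC = {
    "file_name": "SS01",
    "select": lambda marc: marc.get("lc_class", "").startswith("R"),  # e.g. class r
    "fields": [
        # (molds name, width in chars, marc source, strip diacritics, force upper)
        ("MAIN", 68, "main_entry", True, True),
        ("TITL", 80, "title",      True, True),
        ("DATE",  4, "date",       False, False),
    ],
}


def to_fixed(value, width):
    """pad or truncate a variable-length marc value to a fixed molds field."""
    return value[:width].ljust(width)


def strip_diacritics(value):
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def convert(marc_record, spec):
    if not spec["select"](marc_record):
        return None                            # record not selected for this file
    molds = {}
    for name, width, source, strip, upper in spec["fields"]:
        value = marc_record.get(source, "")
        if strip:
            value = strip_diacritics(value)
        if upper:
            value = value.upper()              # molds requires all-upper-case codes
        molds[name] = to_fixed(value, width)
    return molds


print(convert({"lc_class": "R123", "main_entry": "Métraux, Alfred",
               "title": "Voodoo in Haiti", "date": "1966"}, FILE_SPEC))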
essentially, the user of the program describes each molds file to be produced by specifying: 1) the number of (fixed) fields per molds record; 2) the name and size (in characters) of each field in the molds record; 3) the name of the marc i field from which the data are to be taken; 4) selection criteria according to which marc i records are to be chosen for conversion; 5) for any marc i field, a data conversion procedure to be applied prior to transferring the information to the appropriate molds field; 6) whether or not diacritical codes should be stripped from the marc i field prior to transferring the information to the molds field; 7) whether or not character translation from lower-case to upper-case codes should be performed on the data prior to transfer from the marc i to the molds field. although the program has not yet been refined to the extent originally intended, nevertheless it contains all the features indicated above and has lc marc on molds/atherton and miller 149 been used to create ten molds files since its completion. the program is written in pl/i and more fully documented in a report available from the national auxiliary publication service of asis. molds requires fixed-field input for its data base, but many of the fields or data blocks on the marc tape are variable in length. therefore, the field lengths of 200 records in the class r (medicine) subset were examined to determine the maximum size which would produce a molds record within the original 200 computer-word (boo-character) limitation and still retain all the desired data. this limitation was easily expanded to 300 words, allowing addition of new fields and expansion of existing fields as new marc/molds files were generated. a record whose original variable length was 500 characters or less expanded to about 800 characters when converted to fixed-field form. in the first data base only records of 500 characters or less were considered for inclusion, which gave a total of 620 records in the first marc/molds file. by mid-1969 this data base was greatly enlarged using the program described above. the names of the present marc/molds files are: ss01, ss02, ss03, ss04, ssoz, and ssoh. the first files generated were called marc and marz. the marc/molds format now in use is given in table 2. the additions made to the original format are noted. marc/molds block names can be used instead of block numbers; for ease of searching both name and number are given in the table. the molds block number corresponds to marc pilot project field tags whenever possible. after this second revision had been completed, marc ii ( 13) format with new field tags appeared . interestingly, there were remarkably few differences. creating an information retrieval system from other data bases can present some major headaches. during the first test session with the marc/ molds data base, it was discouraging to find that successful retrieval operations could not be performed on such vital items as subject or main entry (blocks main and suba, respectively) . the problem lay in the fact that the lower-case character codes employed on the marc tape had not been converted to the all-upper-case-codes required by molds. once discovered, the problem was easily remedied. other problems were not so easy to solve. the marc data base had been received in a "raw form", i.e., there were typographical errors in the original tapes and irregular spacing; and incorrect punctuation, spelling and abbreviations. 
there was no way to detect these typographical errors and irregularities in the tapes, and the retrieval program would only work on direct matches of query and document information elements. the molds language (to be discussed subsequently) required a good deal of standardization and regularity of the records to take full and effective advantage of its retrieval capabilities.

table 2. marc/molds data base format
[the table lists each field in the marc/molds record: the fixed-field indicators (lc card number, type of main entry, form of work, bibliography, illustration, map, conference, and juvenile indicators, languages, publication dates, height, and uniform-title and series tracing indicators), the place and publisher codes, the lc and dewey class numbers, and the main entry, title, subtitle, edition, imprint, collation, series, note, subject tracing, personal and corporate author tracing, and lc card suffix fields. for each field it gives the molds block name and number, the field width in characters, and the marc i source field or fixed-field position; the fixed-field marc/molds record totals 848 characters.]

the molds system

functionally, the molds system consists of utility routines to store a data base, a well-defined query language, a language interpreter, and a set of logical procedures which allow the user to operate on a data base. the molds system is a set of fortran iv subroutines which perform the maintenance functions, interpret the commands in the query language, and perform the desired logical procedures. the subroutines render the system modular and open. it is therefore relatively easy for a programmer skilled in fortran iv to add, modify, and delete commands and functions as required. this feature of the system is quite desirable. user feedback invariably points up weaknesses in the language or suggests useful features which might be incorporated. molds was continually modified in response to user requirements, and each modification was implemented within a short time without requiring major programming changes throughout the system. the system has already grown since it was first implemented with the marc data base, and commands have been added or modified as required.

hardware configuration

marc/molds was run at the syracuse university computing center on an ibm 360/50 computer. originally, the on-line mode required full dedication of the computer during execution. the molds system requires some 150,000 bytes of main memory and a disk storage unit to hold the entire data base, as well as intermediate data generated by the user. the molds system has been implemented on other computers (2).
interaction with the system in the on-line version was carried on through an ibm 2260 display station consisting of a keyboard and crt (cathode ray tube) display screen. although two or more consoles have not as yet been operated simultaneously, the system is intended to be time-shared. effort was made to alter the system to operate in a 50,000 ( 50k) upper partition, so that it could be accessible at all times rather than on a scheduled basis. this involved reorganizing the program into an overlay structure in which the basic or root segments are resident in a fixed portion of memory throughout execution, while the remainder of the program is divided into a set of smaller segments which can overlay each other, being brought into memory only when needed. this task 152 ] ournal of library automation vol. 3/ 2 june, 1970 required a careful analysis of each subroutine for its dependence upon others, breaking the program into mutually exclusive segments, while ensuring that any given set of segments which occupied memory simultaneously did not exceed 50k bytes of storage. many of the larger segments which had to be further subdivided required considerable reprogramming. the first attempt at executing the new overlay version failed. due to a general lack of experience with the 2260 display units, it had not been anticipated that system software would not allow the console to be accessed from outside of the root segment, and the 2260 software package had been placed in an overlay area. as a result the original overlay configuration had to be altered. the console input/ ouput (i/0) package was moved into the root segment, increasing its size by several hundred bytes and similarly decreasing the amount of storage available for the overlay portions. therefore, it was necessary to develop yet another configuration to conform to these new storage limitations. while the necessary changes were being made, the computing center began operating a limited time-sharing system which itself required full dedication of the 360/50 machine. projected dates for returning to normal computer operations within a multi-partition environment were far enough in the future to suggest the efficacy of creating a new version of molds which could function off line, with cards and printer instead of the 2260 consoles. in this batch, or off-line, mode molds jobs could be submitted through the regular queue and run by computer center staff during batch processing time. with the on-line source program as a starting point, all references to 2260's were replaced with card reader and printer statements and the molds language instructions deleted which depended on the console for their use. mter all changes had been made and compilation was completed successfully, the off-line molds was exercised against a sample data base until it was satisfactorily debugged. since it was known that the computing center would eventually return to r artitioned operation, it was next undertaken to overlay the off-line molds into a 50k partition. this was accomplished with little difficulty since the problems encountered in working with the on-line version were largely due to the consoles. the end result of the entire task, therefore, was an off-line molds which could operate either in core or in overlay structure at the discretion of the user. the molds query language the molds query language includes some 34 distinct commands which must be entirely formulated by the user according to precise syntactical rules. 
the large number of commands is in part a reflection of the fact that this system provides the user with the ability to perform more operations of a greater variety on a data base than other interactive inforlc marc on molds/ atherton and miller 153 mation retrieval systems. it provides for retrieval of records from the data base according to data value descriptors, processing of data values by arithmetic and logical operations, sorting of retrieval records, and display of retrieval records in full or in part. operationally, the molds system regards a file of records as a set of parallel lists of blocks (figure 1). with the marc data base, these blocks were the 38 fields of catalog data (such as dewey class number, title, author, etc.). the commands in the molds query language are geared to list processing operations. in general, most of the molds commands will result in the formation of lists which are either identical in format to the original file, or are an independent list of alpha or numeric constants not subdivided into blocks. despite its surface complexity, the query language was designed specifically for users with absolutely no computer experience. the fixed format commands are easy to learn and use, even for the novice in computer based systems. they are mnemonic enough so that a little use soon brings an easy familiarity with them. commands in the molds query language there are six categories of commands in the language: retrieval, processing, display, storage, utility, and language augmentation. the commands are listed below with a brief explanation of each. retrieval commands: find: extract fetch define chain select forms a temporary subfile consisting of records from the data base for which the value in a specified block is equal, not equal, greater, greater or equal, less, less or equal to an input value. forms a temporary subfile consisting of records from an argument subfile for which the value in a specified block is equal, not equal, greater, greater or equal, less, less or equal to an input value. forms a temporary file which duplicates an existing file in the data base (added to original molds commands during this project) . forms a temporary subfile from two argument subfiles based on logical relationships and, or, not. forms a temporary subfile consisting of records from an argument subfile for which the value in a specified block is equal to any of the values in a specified block from a second argument subfile. forms a temporary subfile consisting of records from an argument subfile for which the value in a specified block is equal to any of the values in an argument list. 154 journal of library automation vol. 3/2 june, 1970 these six retrieval commands allow the user to extract selected data from the data base. selection is based on 1) a simple algebraic relationship (e.g., equal, not equal, greater than, etc.) between block values and a value specified by the user in the command (value may be alphanumeric or numeric), or 2) a simple logical relationship (e.g., and, or, not) between block values in two lists. all retrievals from molds files are based on exact-match correspondences between input descriptors and data values as they occur in records. each file is treated as distinct regardless of the fact that for the marc/ molds data base the second file may simply be a continuation of the first, etc. any block in a file may be used as an argument in a retrieval process. 
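as a rough illustration of how these retrieval commands operate on blocks, the python sketch below (added for this discussion; it is not molds syntax or code, and the record values are invented) implements find, define, and select as simple list-processing functions.

# rough analogues of the find, define, and select retrieval commands:
# a subfile is just a list of records, and any block can be the argument.
import operator

RELATIONS = {"E": operator.eq, "NE": operator.ne, "G": operator.gt,
             "GE": operator.ge, "L": operator.lt, "LE": operator.le}


def find(file_records, block, relation, value):
    """form a temporary subfile of records whose block satisfies the relation."""
    test = RELATIONS[relation]
    return [r for r in file_records if test(r[block], value)]


def define(subfile_a, subfile_b, logic):
    """combine two subfiles with and / or / not."""
    key = lambda r: tuple(sorted(r.items()))
    b_keys = {key(r) for r in subfile_b}
    if logic == "AND":
        return [r for r in subfile_a if key(r) in b_keys]
    if logic == "OR":
        a_keys = {key(r) for r in subfile_a}
        return subfile_a + [r for r in subfile_b if key(r) not in a_keys]
    if logic == "NOT":
        return [r for r in subfile_a if key(r) not in b_keys]
    raise ValueError(logic)


def select(subfile, block, values):
    """keep records whose block value equals any value in an argument list."""
    return [r for r in subfile if r[block] in values]


marc = [{"LANG": "ENG", "BIB": "X", "SUBJ": "PRINTING"},
        {"LANG": "ENG", "BIB": "",  "SUBJ": "TYPE-SETTING"},
        {"LANG": "FRE", "BIB": "X", "SUBJ": "PRINTING"}]
bibl = find(marc, "BIB", "E", "X")
engl = find(marc, "LANG", "E", "ENG")
both = define(bibl, engl, "AND")
print(select(both, "SUBJ", ["PRINTING", "TYPE-SETTING", "TYPE-FOUNDING"]))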
thus, the usual range of access points (author, title, subject, classification number) is considerably extended to include such unorthodox access points as juvenile literature, language, illustrations, and bibliographies. for example, one can retrieve all documents on a given subject or subjects which are juvenile books with bibliographies and illustrations published by a given publisher in 1966. the user can define his search limits with a degree of specificity not found in most interactive systems. however, the price he must pay is exactness in specifying the values used as retrieval criteria. the system will not retrieve on root words or key letter combinations, although such capability could be added. the block values must, therefore, be consistent and the user must have a precise knowledge of what they may be. this knowledge can be gained by examining the values and having them printed out as needed. (molds does have the capability of selecting unique values from a list, ordering them, and printing them out at any time during system operation. processing commands: count counts the number of records in an argument subfile or items in an argument list. order (reverse) maximum (minimum) total average arranges the records of an argument subfile in ascending (descending) order according to the values in a specified block or similarly sorts the values in an argument list. may be applied to alphabetic, numeric, and chronological data. selects the record containing the maximum (minimum) value in a specified block from an argument subfile, or the maximum (minimum) value in an argument list. may be applied to numeric or chronological data. calculates the sum of the values in a specified block of an argument subfile or of a list of numbers. calculates the average of the values in a specified block of an argument subfile or of a list of numbers. lc marc on molds/atherton and miller 155 median variance squareroot difference add (subtract multiply divide) calculates the median of the values in a specified block of an argument subfile or of a list of numbers. calculates the variance (standard deviation squared) of the values in a specified block of an argument subfile or of a list of numbers. calculates the square root of each value in a block of an argument subfile or of a list of numbers. calculates successive differences in the values of a specified block in an argument subfile or of a list of numbers. adds (subtracts, multiplies, divides ) the values from a specified block from an argument file (or list) to the corresponding values from a specified block from a second argument file (or list) . firstelement selects the first record from an argument subfile or reduce compress list. deletes the first record from an argument subfile or list. forms a temporary list composed of all the unique values in a specified block of an argument subfile or in an argument list. the eighteen processing commands allow the user to manipulate the data in the lists he has retrieved. he may count the number of elements in a list, arrange them in ascending or descending order, form the sum, average, variance, median and square root of a list of numbers; add, subtract, multiply, and divide one list by another, and select all unique elements from a list. the ability to process data as well as retrieve it may be unique to molds as compared to other interactive systems, and gives the language a useful added power. 
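a few of the processing commands can be sketched the same way; the python below is an added illustration with invented records, not a rendering of the molds subroutines.

# compact analogues of count, order, maximum, total, average, and compress,
# operating on a subfile (list of records) or one of its blocks.
def count(subfile):
    return len(subfile)

def order(subfile, block, reverse=False):
    return sorted(subfile, key=lambda r: r[block], reverse=reverse)

def maximum(subfile, block):
    return max(subfile, key=lambda r: r[block])

def total(subfile, block):
    return sum(r[block] for r in subfile)

def average(subfile, block):
    return total(subfile, block) / count(subfile)

def compress(subfile, block):
    """a temporary list of all the unique values in one block."""
    seen, unique = set(), []
    for r in subfile:
        if r[block] not in seen:
            seen.add(r[block])
            unique.append(r[block])
    return unique

books = [{"HITE": 24, "DATE": "1966"}, {"HITE": 28, "DATE": "1967"},
         {"HITE": 22, "DATE": "1966"}]
print(count(books), maximum(books, "HITE")["HITE"], average(books, "HITE"))
print(compress(books, "DATE"))   # ['1966', '1967']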
display commands display show print outputs on the crt (cathode ray tube) each complete record in an argument subfile (added to original molds commands during this project). outputs in columnar fashion on the crt selected blocks from up to three argument subfiles or lists (deleted in batch or off -line mode) . outputs in columnar fashion on the printer selected blocks from up to three argument subfiles or lists (added to original molds commands during this project) . the three display commands allow the user to display entire documents, or display selected books of information or records in columnar format. in 156 journal of library automation vol. 3/2 june, 1970 the on-line version of molds this may be done on the crt, or a printout made of selected blocks or lists of documents on the high speed printer. there is much flexibility and versatility in output format which is completely determined by the user. the command, show, is not used in the batch mode of molds. storage commands: set stores a single numeric value. store stores an alphabetic, chronological, or numeric list of arbitrary length. the two storage commands allow the user to insert independent lists of constants into the storage area. such lists do not become part of the data base, but are used in conjunction with retrieval and processing commands. utility commands: clear delete dump recall list deletes from storage a temporary subfile or list created during the session. deletes from storage all temporary subfiles or lists created during the session. displays on the crt in tabular fashion the names, file origins, and number of items in each subfile and list created by the user during the session (deleted in batch or off-line mode). displays on the crt the command which resulted in the creation of a specified temporary subfile or list (added to original molds commands during this project). produces printed copy of all commands issued during the session. may be used with stop at end of search (added to original molds commands during this project). the five utility commands allow the user to perform housekeeping operations, such as the clearing of storage areas, reinitialization of the system, and termination of execution. the command dump is not used in the batch mode of molds. language augmentation command: program allows the user to create new commands consisting of a sequence of basic commands and to store them for future sessions. the language augmentation command program, is one of the most important features of the language. it allows the user to create new commands tailormade to his own needs. this is shown in the first molds search query which follows. lc marc on molds/atherton and miller 157 search request formulation in marc/molds molds search query-example 1 (batch mode) program tally a/ count b a/ print b// end find zny ssoz/plcd/e/nyny/ tally zny/ print zny/plce/ plcd/pucd/ i find p67 ssoz/ date/e/1967 i tally p67/ define ny67 zny/and/p67/ tally ny67/ average avht ny67 /hite/ print avht/ stop the above example shows an off-line or batch-mode search. this sequence of commands would be keypunched and submitted as a job deck in the regular queue and run by the computer center staff, the searcher receiving the results as a printout from the high speed printer. ssoz is the name of one of the marc/molds files. this particular interaction shows the use of the operator program to augment the language in the subsequent search by adding tally to the list of commands. 
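the language augmentation idea can be shown with a small sketch: a user-defined command such as tally is stored as a sequence of basic commands and replayed on demand. the python below is an added illustration; the storage model, the abridged command set, and the $ARG placeholder are simplifications, not molds behavior.

# sketch of the program (language augmentation) command: tally is defined once
# as count-then-print and can then be used like any other command.
store = {}                       # temporary user storage: label -> result
macros = {}                      # user-defined commands


def count_cmd(label, operand):
    store[label] = len(store[operand])


def print_cmd(label, operand):
    print(operand, "->", store[operand])


BASIC = {"COUNT": count_cmd, "PRINT": print_cmd}


def program(name, steps):
    """define a new command as a stored sequence of basic commands."""
    macros[name] = steps


def run(name, argument):
    for command, label, operand in macros.get(name, []):
        BASIC[command](label, operand if operand != "$ARG" else argument)


# program tally a/ count b a/ print b//  -- roughly as in example 1 above
program("TALLY", [("COUNT", "B", "$ARG"), ("PRINT", None, "B")])
store["ZNY"] = ["rec1", "rec2", "rec3"]      # pretend this came from a find command
run("TALLY", "ZNY")                          # prints: B -> 3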
the following example shows a search query which is a sequence of some typical molds commands along with an explanation of the effect of each. each command has three parts. the first part (find, define, etc.) is the imperative which tells what operation is to be performed. the second part ( bibl, engl, both, etc.) is the label of the place in storage where the result of the operation is to be stored. this label is made up by the user when he gives a command. the third part of the command is the operand. in some cases the operand gives the criteria for retrieval (as in find, define) . it always gives the name or label of the list to be operated on, and in some cases specifies a particular block of that list. the request shown in this example was handled by molds to retrieve, display, and process all english language books on printing, or typesetting, or type founding which have bibliographies. the sequence illustt·ates the flexibility of molds, the many types of processing which can be done, the relatively easy way to use command format. this particular sequence was performed in the on-line version with chance for usersystem interaction after each command. 158 journal of library automation vol. 3/2 june, 1970 molds search query-example 2 (on-line mode) molds commands: find bibl marc/bib/e/x/ explanation: find all records in the file named marc for which the block named bib contains a value equal to (e) x (x in the block indicates presence of bibliographies). the list of selected records is to be stored in a location called bibl. find engl marc/lang/e/eng/ find all documents in the file named marc for which the block named lang contains a value equal to (e) eng, i.e. english language books. the list of selected records is to be !itored in a location called engl. define both bibl/ and/engl/ store subs 3/ alpha/13/ element 1 = printing/ element 2 = type-setting/ element 3 = define a new list called both which consists of the documents common to both bibl and engl, i.e., all english language books with bibliographies. inform the system that the user wishes to store, via the console, a list of values which will be called subs. the list will contain 3 elements which will be alphanumeric (alpha) as opposed to strictly numeric. the longest element will not exceed 13 characters. (system responds with these words.) user inserts first value by typing it on the console. (system responds with these words.) user inserts second value. (system responds with these words.) lc marc on molds/atherton and miller 159 type-founding/ select all both/subj/subsi count no. all/ show no.// print all/main/titl/lcno i i all/publ/plce/ i maximum big all/hite/ average ave all/hite/ user inserts third value. user has now created an independent list of three distinct valuesprinting, typesetting, type-founding and stored them in a location called subs. select all records from the list called both for which the values in the block named subj are equal to any of the values in the list called subs, i.e. those records for which the subject heading is printlng, type-setting, or type-founding. the selected records are stored in a location called all. count the number of records in the list called all. the count is stored in a location called no. display the contents of no. on the crt. produce a 5-column printed listing consisting of the values in the blocks ·named main (main entry), titl (title), lcno (library of congress classification number), publ (publisher), plce (place of publication) from each record of the list called all. 
from the list called all, select the record containing the maximum value in the block named hite (height) . the record is stored in a location called big. calculate the average of the values in the block named hite (height) of the list called all. the value is stored in a location called ave. 160 journal of library automation vol. 3/2 june, 1970 the following example records another interaction and the results in the off-line or batch mode. notice the error message which did not interrupt the search. this result also includes a report on the length of central processing unit (cpu) time each operation takes in hours, minutes, seconds and tenths of seconds. any line preceded by c indicates that the line was printed by the computer; any line minus the c indicates that the information was typed in by the user. molds retrieval-example 3 (batch mode) c please enter your program c line 1 oooooooopauline athertonooooooooo c invalid command name c set in at 185 day of 1969 16-01-17.1 c line 1 program tally a/ c line 1 count b a/ c line 2 print b// c line 3 end c set in 185 day of 1969 16-01-17.5 c line 2 find d2 ssoz/dew2/ne/o? c set in at 185 day of 1969 16-02-38.7 c line 3 find d1 ssoz/dewl/ne/ i c set in at 185 day of 1969 16-03-56.7 c line 4 tally d2/ c 950.00 c set in at 185 day of 1969 16-03-57.3 c line 5 tally d1/ c 905.00 c line 6 stop comments on marc/molds thus far this report has been confined to a more or less factual description of the components of the marc/molds system. no doubt the reader has asked himself many questions about the system, and made his own critical comparisons between this system and others. what follows are preliminary and necessarily subjective comments based on a lc marc on molds/atherton and miller 161 few demonstrations given to students in the school of library science and on the authors' own observations and reflections. system design response time response time (i.e. the time between transmission of a command in the on-line version and its execution) has been on the order of 90 seconds for a search of 620 records, to 20 seconds for an arithmetic operation involving the same number of records. when one thinks of these times in comparison with the time required to perform the same operations manually, they seem rapid. however, 90 seconds appears to be an unreasonably long period of time in a computer-based interactive retrieval environment. viewers of demonstrations often asked why it took the computer "so long" to perform a search. a user's tolerance for delay appears to vary a great deal with the type of retrieval system he is using. this has been observed on other occasions, but no determination has yet been made of tolerable limits in different environments, a determination that would be important in designing computer-based systems. · man-system interaction a design goal of most other existing interactive retrieval systems seems to be to give the computer certain anthropomorphic qualities and make it into a teacher or a responsive friend. such systems offer computeraided query formulation and/or a friendly conversation with the computer. the molds on-line system does not include either of these features. the user must first master a marc/molds manual which is an explanation of the system and the data base. he then goes on line and gives his command. molds responds by performing that command or by putting out a brief error message if the command format was improper. 
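as an added illustration of the three-part command format and the brief error reply, the toy parser below (python, not the molds interpreter; the command list is abridged) splits a line into imperative, result label, and slash-delimited operand.

# toy parser for the three-part command format shown in the examples above;
# the error reply imitates the terse "invalid command name" style of response.
KNOWN = {"FIND", "EXTRACT", "FETCH", "DEFINE", "CHAIN", "SELECT",
         "COUNT", "PRINT", "SHOW", "STORE", "SET", "STOP", "PROGRAM"}


def parse(line):
    parts = line.strip().rstrip("/").split(None, 2)
    if not parts or parts[0].upper() not in KNOWN:
        return None, "INVALID COMMAND NAME"
    imperative = parts[0].upper()
    label = parts[1] if len(parts) > 1 else None
    operand = parts[2].split("/") if len(parts) > 2 else []
    return (imperative, label, operand), None


for line in ["find bibl marc/bib/e/x/", "oooooooopauline athertonooooooooo"]:
    command, error = parse(line)
    print(error if error else command)
# ('FIND', 'bibl', ['marc', 'bib', 'e', 'x'])
# INVALID COMMAND NAME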
apparently the objective of conversation with the computer as found in most systems is to make it easier for the user to achieve desired results or to make him feel more at ease with the system. the person who plays with an interactive system once or twice probably finds conversations with a computer amusing, novel, and helpful in his first attempts. however, for a serious and steady user, carrying on the same conversation with the computer during each and every session can be tedious, repetitive, time consuming and sometimes circular. the optimum mix of computer-aided and independent user-formulated query is yet to be studied and found. perhaps molds, because it is a poor conversationalist, could aid in this search. at any rate, the automatic assumption of conversational features as a design goal for computer-based retrieval systems may not be based on sound knowledge of what suits the serious user. molds repertory of commands the processing commands in the molds query language are a wei162 journal of library automation vol. 3/2 june, 1970 come and valuable addition to the usual repertory of search and display commands common to most interactive systems. although the marc data base does not lend itself to a great deal of processing, we have found some commands useful, particularly count, order, maximum, minimum, and compress. processing times when individual commands of a single search take seconds of cpu time, it is certain that a retrieval system will be expensive if it is employed by a great many users as a general purpose system. some of the molds commands operating on the marc data base took whole minutes of cpu time! the authors have learned a great deal about interactive retrieval systems by using molds experimentally, but because of the excessive cost of certain runs, may not be able to continue research with it. modifications will have to be made to make it more efficient (i.e. cheaper to run) before it could be recommended for general use in the syracuse university library school or anywhere else. if the molds system can be designed to yield good results for certain types of searches with a realistic file size, it will be a boon to the library or educational institution seeking to automate some part of its searching procedures. data base noah prywes ( 14) has commented, "the effectiveness in retrieving documents is highly dependent on the amount of labor and processing invested in the storage of documents." the minimum amount of processing done on the marc tapes has, in fact, limited the effectiveness of retrieval. the extreme simplicity of the general molds data base structure is worthy of study. the efficiency and cost of retrieval using this structure needs to be compared very carefully with more sophisticated threaded lists. one extremely important factor to consider will undoubted· ly be the effect of increasing the size of the file. as pointed out before, the molds system requires an exact match of punctuation and spelling between retrieval criteria and stored data items, a match difficult to achieve. to be sure, this is partially a limitation in the molds system that may be relaxed by incorporating a capability to search for root words and key letter combinations. however, the many inconsistencies in abbreviations, punctuation, and spelling that appear in bibliographic records when information on title pages is transcribed, as on the marc tapes, can enormously complicate effective retrieval. 
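the difference between the exact-match rule molds used and the root-word matching it lacked can be shown in a few lines; the python below is an added illustration and does not reflect any capability that was actually implemented.

# contrast of the exact-match rule with a looser root-word (prefix) rule that
# tolerates differences in case, punctuation, and word endings.
import re


def exact_match(query, value):
    return query == value


def root_word_match(query, value):
    """true when every query token begins some token of the stored value."""
    value_tokens = re.findall(r"[a-z0-9]+", value.lower())
    return all(any(tok.startswith(q) for tok in value_tokens)
               for q in re.findall(r"[a-z0-9]+", query.lower()))


stored = "Type-founding -- Hist. & criticism."
print(exact_match("TYPE-FOUNDING", stored))        # False: punctuation and case differ
print(root_word_match("type found", stored))       # True: matches on word roots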
marc or non-marc bibliographic records will always contain some "author" variations that such a system as molds may have to accommodate. this is a very knotty problem. these comments are not to be construed as a criticism of the fine work the library of congress has done in its marc pilot project. the marc lc marc on molds/atherton and miller 163 pilot project record format, with sometimes indistinct data elements ( special punctuation marks and symbols), was not specifically designed for computer-based interactive search systems. hopefully, the use herein described to which the marc data base has been put, and the experience derived from that use, will be of value as future modifications of the marc format are made. mter all, reference retrieval, using bibliographic information, automated or manual, is natural to libraries and is, indeed, one of the purposes for which that information is recorded in the first place. since one of the true values of a computer-based file lies in making multiple use of the records, it becomes imperative to test the various uses to which these records can be put. the future use of marc/molds at syracuse university the marc/molds system has undergone continual modification in data base structure and query language during the first year of work on it. a computer-based system must be capable of such flexibility, for changes should be accomplished easily and smoothly. no system is perfect, especially in its early days, least of all molds. it is intended to continue investigation into information-seeking behavior, and to use marc/ molds occasionally along with other retrieval systems. another paper describes use of the marc file with the ibm/ document processing system ( 15). summary this report has tried to describe, not sell, marc/molds as fairly as possible in the belief that some of its features should be considered by persons designing interactive systems, and by those responsible for refinement of the marc format. the searching capability is valuable as it increases the access points to the data. the arithmetic and logical operations provide an opportunity to perform certain studies of the marc data base. the marc files will eventually have many applications beyond technical processing functions in libraries. these applications would be more practically implemented if the marc format were modified to accommodate them and if librarians would use systems such as molds during their exploration of alternatives. marc/molds as a computer-based system has many wealmesses. outnumbering and to some extent overshadowing the concrete statements about its faults is its great potential. many questions have been raised which remain unanswered. questions dealing with the basic design of the system and data base are indicative of the development and experimentation which must be done before computer-based interactive retrieval in libraries is a practical reality. acknowledgments the work on this project has been supported by rome air develop164 journal of library automation vol. 3/2 june, 1970 ment center (contracts. u. no. af30 (602)-4283). related work, supported by a grant from the u. s. office of education, provided an education in understanding of the marc tapes. the authors gratefully acknowledge the comments made by phyllis a. richmond and frank martel on the original manuscript. mrs. 
sharon stratakos, programmer most responsible for molds, contributed a great deal to the authors' understanding of this retrieval program and its potential use with a bibliographic reference file such as marc. program microfiches and photocopies of the following may be obtained from national auxiliary publications service of asis: "rome project program description: molds support package" (naps 00884). references 1. avram, henriette: the marc pilot project, final report (washington, d. c.: library of congress, 1968). 2. a user-oriented on-line data system (syracuse, n. y.: syracuse university research corp., 1966). 2 v. 3. freeman, robert r.; atherton, pauline: audacious-an experiment with an on-line interactive reference retrieval system using the universal decimal classification as the index language in the field of nuclear science (new york: american institute of physics, april 25, 1968) (aip/udc-7). 4. burnaugh, h . p.; et al: the bold user's manual (revised) (santa monica, cal.: jan. 16, 1967) ( tm-2306/004/01). 5. cegala, l.; waller, e.: colex user's manual (falls church, va.: system development. feb., 1969) (tm-wd-(l)-405/000/00). 6. smith, j. l.; micro: a strategy for retrieving ranking and qualifying document references (santa monica, cal.: jan. 15, 1966) (sp 2289). 7. green, james sproat: grins : an on-line structure for the negotiation of inquiries (bethlehem, pa.: lehigh university, center for the information sciences, september 1967) . 8. computer command and control company: description of the multilist system (philadelphia, pa.: july 31, 1967. 9. national aeronautics and space administration, scientific and technical information division: nasa/recon user's manual (washington, d. c.: october 1966). 10. kessler, m. m.: tip user's manual (cambridge, mass.: massachusetts institute of technology, dec. 1, 1965). 11. biomedical communication network: user's training manual (syracuse, new york : december 1968). 12. welch, noreen 0. : a survey of five on-line retrieval systems (washington, d. c.: mitre corp., august 1968) (mtp-322). lc marc on molds/atherton and miller 165 13. avram, henriette d.; knapp, john f.; rather, lucia j.: the marc ii format (washington, d. c.: library of congress, 1968). 14. prywes, noah s.: on-line information storage and retrieval (philadelphia, pa.: university of pennsylvania, moore school of electrical engineering, june 1968). 15. atherton, p.; wyman, j.: "searching marc project tapes using ibm/document processing system," proceedings of american society for information science, 6 ( 1969), 83-88. 90 oclc search key usage patterns in a large research library kunj b. rastogi: oclc; and ichiko t. morita: ohio state university, columbus. many libraries use the oclc online union catalog and shared cataloging subsystem to perform various library functions, such as acquisitions and cataloging of library materials. as an initial part of the operations, users must search and retrieve a bibliographic record for the desired item from the large oc lc database. various types of derived search keys are available for retrieval. this study of actual search keys entered by users of the oclc online system was conducted to determine the types of search keys users prefer for performing various library operations and to find out whether the preferred search keys are effective. introduction in the last decade, many information systems have been developed that use search keys to retrieve bibliographic records from large databases. 
the oclc online union catalog and shared cataloging subsystem in particular is one of the larger of these systems. 1--u there are currently more than 7 million bibliographic records in the oclc database. the oclc online system uses search keys to access various index files that locate bibliographic records in the database. index files are maintained for name/title, personal author, corporate author, coden, isbn, and lccn indexes. the first four of the above index files contain search keys that are derived from information (e. g., author, title) present in the piece or citation. search keys in these four indexes are in general not unique, because the derived key could be the same for different bibliographic records. the last three indexes (coden, isbn, and lccn) contain search keys or identifiers that are unique in general. a user enters a search key consisting of characters (letters, numbers, symbols, commas, hyphens) formatted according to specific rules that identify to the system which index file to search. for example, to search the name/title index, the user enters a search key consisting of the first four characters of the author's last name and the first four characters of manuscript received october 1980; .accepted december 1980. search key usage!rastogi and morita 91 the first nonarticle word of the title of the work, separated by a comma. to search the title index, the user enters a search key consisting of the first three characters of the first nonarticle word in the title, the first two characters of the second word, the first two characters of the third word, and the first character of the fourth word, each separated by a comma. 7 the system compares the user-entered search key with the search keys contained in that index file. this comparison results in one of three possible cases: l. only one index file search key matches the user-entered search key . 2 . more than one index file search key matches the user-entered search key. 3. no index file search key matches the user-entered search key. in the first case, the system retrieves the unique bibliographic record corresponding to the search key and displays it on the user's terminal screen. in the second case, the system retrieves all records that correspond to the search key, prepares truncated entries (consisting of author, title, imprint data, etc.) for those records, and displays the truncated entries on the user's terminal screen . the user then selects the truncated entry that corresponds to the desired record and requests the system to display the full record for that item. in the third case, the system responds with the reply that a record matching the user-entered search key was not present (a "not found" response) in the index. in the oclc online system, 2,500 member libraries ·using 3,800 terminals search the oclc database to perform various library functions such as acquisitions, monograph cataloging, and serials cataloging. users can choose to enter any type of search key from the various types of search keys permitted by the system. users' preferences to enter a particular type of search key will depend in part upon the kind of information they have about the item to be searched and the type of library function they wish to perform. if users receive a "not found" response after entering a particular type of search key, they may then try a different type of search key that they consider next best. 
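the derived-key rules described above are easy to sketch; the python below is an added illustration, and its handling of nonarticle words is reduced to a tiny stop list rather than the oclc system's actual rules.

# sketch of the derived search keys: the 4,4 name/title key and the 3,2,2,1 title key.
ARTICLES = {"a", "an", "the"}


def _words(title):
    return [w for w in title.lower().split() if w not in ARTICLES]


def name_title_key(author_last_name, title):
    """first four characters of the author's surname, comma, first four of the title."""
    return f"{author_last_name.lower()[:4]},{_words(title)[0][:4]}"


def title_key(title):
    """first 3, 2, 2, and 1 characters of the first four nonarticle title words."""
    words = _words(title) + ["", "", "", ""]         # pad short titles
    return ",".join(w[:n] for w, n in zip(words, (3, 2, 2, 1)))


print(name_title_key("Melville", "The Confidence-Man"))   # melv,conf
print(title_key("The Old Man and the Sea"))               # old,ma,an,s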
the purpose of this study was to determine what types of search keys are preferred to perform various library functions and whether the preferred search keys are effective. the study also investigated what type of search key is used next when particular types of search keys are unable to retrieve the desired record to determine if there are any discernible search patterns. materials and methods for conducting this study, data were needed on the pattern of searchkey use in oclc member libraries. further, the data had to include the actual time of day when work was performed for a particular library 92 journal of library automation vol. 14/2 june 1981 function on a specific terminal. this requirement would permit identification in the online system use data collected by oclc of search keys entered to perform specific library functions. ideally, a library with several oclc terminals, each used exclusively for only one library function, was desired. the ohio state university (osu) library met this requirement. the osu library has eleven terminals: two of the eleven terminals are used exclusively for performing acquisition functions, seven are used for monographic cataloging, and one terminal each is used for serials cataloging and public use. the terminal assigned for serials cataloging is used for monograph cataloging after 5 p.m. library staff at osu use all the terminals exclusively, except for the public-use terminal. this public-use terminal can be used by anyone, including faculty, students, and library staff. two full days' transactions for each of the osu terminals were obtained from the oclc online system use statistics (olsus) file. during the online operation, the system writes a record on the olsus file for each message entered by the user. this record includes the institution number, a number identifying the terminal from which the message came, the time of the transaction, and the first nonblank sixteen characters of the message . if the user-entered message is a search key, the system response is either a "not found" response or a "found" response. with the "found" response, the system displays the bibliographic record (if unique) or displays a truncated entry screen. however, a "found" response does not necessarily mean that the truncated entry screen includes information about the bibliographic record the user was actually seeking. for the study, a program was written to scan the records in the olsus file for two full days in october 1978. the program extracted all the records for messages that came from the eleven osu terminals and wrote the records on two tapes--one for each day's activity. these tapes were sorted first by the terminal number and then within each terminal number by the time of transaction. each sorted tape was fed to another program that printed, for each terminal, the actual messages in chronological order and the associated system response. from this printout, it was possible manually to go through the complete sequence of messages entered to search a single bibliographic item. the printout for an entire day's activity for each terminal was thus divided into sections, each section containing all transactions that were performed to search for a single item. for each section, the type of search key first entered and the system response was noted. in case of a "not found" response, the type of search key next entered (if the search process was continued for the item) also was noted. 
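the manual grouping of transactions into per-item search sequences can be approximated in code; the python sketch below is an added illustration whose transaction layout and key-type labels are assumptions, not the olsus record format.

# group one terminal's time-ordered transactions into per-item search sequences,
# then note the primary key type and any secondary choice after a not-found reply.
def sessions(transactions):
    """each sequence runs until a 'found' reply or until searching is abandoned."""
    current, out = [], []
    for key_type, response in transactions:
        current.append((key_type, response))
        if response == "found":
            out.append(current)
            current = []
    if current:
        out.append(current)              # search abandoned after not-found replies
    return out


def primary_and_secondary(session):
    primary = session[0][0]
    secondary = session[1][0] if len(session) > 1 else None
    return primary, secondary


terminal_log = [("lccn", "not found"), ("name/title", "found"),
                ("name/title", "found"),
                ("title", "not found"), ("title", "not found")]
for s in sessions(terminal_log):
    print(primary_and_secondary(s))
# ('lccn', 'name/title')   ('name/title', None)   ('title', 'title')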
the results were combined for all the terminals used to perform a specific library function (e.g., acquisitions) and for the two days. results and discussion table 1 and figure 1 show the different types of search keys used as the first choice to perform various library functions. note that at the time of data collection for this study, the interlibrary loan subsystem was not operational.

table 1. different types of searches for various applications

                    acquisitions         monograph cataloging   serials cataloging     public use
type of search    items      % of       items      % of        items      % of        items      % of
                  searched   total      searched   total       searched   total       searched   total
name/title           111      37.5         313      51.7          15       15.9          77       48.7
title                 49      16.6          48       7.9          72       76.6          44       27.8
personal author        0       0.0           9       1.5           0        0.0          16       10.1
lccn                 122      41.2         201      33.2           1        1.1          13        8.2
isbn                  14       4.7          34       5.6           1        1.1           3        1.9
issn                   0       0.0           0       0.0           5        5.3           3        1.9
coden                  0       0.0           0       0.0           0        0.0           2        1.3
total                296     100.0         605     100.0          94      100.0         158      100.0

[figure 1. number of different types of search keys for various applications.]

during the two-day period, a total of 605 items were searched for monograph cataloging, 296 items were searched for acquisitions operations, and 94 items were searched for serials cataloging. a total of 158 items were searched on the public-use terminal. most types of search keys were used to some extent. the use of isbn and issn search keys was quite limited for all types of library functions. the coden search key was used only twice, and both times through the public-use terminal. the corporate author search key was not used at all. the use of the personal-author search key was much smaller than expected. this was probably because at the time of the study the system did not permit use of personal-author keys during peak hours (9 a.m. to 5 p.m.) of online system operation. for the acquisitions function, the lccn search key was used most often, followed by the name/title key. these two types of keys together were used for about 80 percent of the acquisitions items searched. for the monograph cataloging function, the most frequently used search key was the name/title key. this key was entered for about 52 percent of items searched. the next most frequently used key for monograph cataloging was the lccn key, used for about 33 percent of the items searched. for the serials cataloging function, the title key was used most often, for more than 75 percent of the items searched. searches performed through the public-use terminal included all types of search keys. the name/title key was used most frequently, followed by the title key. before performing an actual search, a user must choose, from among the various types of search keys available in the oclc system, the particular search key to use. if the search key used for a first try (primary choice of search key) results in a "not found" response from the system, a second key may be entered (secondary choice of search key). this sequence may continue through many search-key choices until the user retrieves the desired record ("found" response) or decides to abandon the search at some point upon obtaining a "not found" response. for this study, the investigation was confined to only primary and secondary choices of search keys.
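from per-item sequences like those sketched earlier, the tallies reported in tables 2 through 5 below can be derived: the primary key choice, its hit rate, and the secondary key (or abandonment) after a first "not found" response. the sketch below is a hedged illustration, assuming each message already carries a key_type classified from the format of the entered key; the published tables were compiled manually.

```python
# illustrative sketch only: tally primary-choice counts, hit rates, and the
# secondary choice (or abandonment) after a first not-found response, in the
# shape of tables 2-5. assumes each message dict carries a 'key_type' inferred
# from the format of the entered key; the original tables were compiled by hand.
from collections import Counter, defaultdict

def tally(sequences):
    primary = Counter()                 # items searched, by first key type
    primary_found = Counter()           # found on the first try
    after_miss = defaultdict(Counter)   # first key type -> second key type or 'abandoned'
    for _terminal, item in sequences:
        first = item[0]
        primary[first["key_type"]] += 1
        if first["response"] == "found":
            primary_found[first["key_type"]] += 1
        elif len(item) > 1:
            after_miss[first["key_type"]][item[1]["key_type"]] += 1
        else:
            after_miss[first["key_type"]]["abandoned"] += 1
    hit_rate = {k: primary_found[k] / primary[k] for k in primary}
    return primary, hit_rate, after_miss
```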
the results of the "found" responses for the primary choice of key and for the secondary search key entered after receiving the first "not-found" response are presented in tables 2 through 5.

table 2. number of primary and secondary choices of search keys for acquisitions

type of key        items     found      % of     not-found   type of search key used after the first not-found response           search discontinued
used first        searched  responses   found    responses   name/title    title        pers. author  lccn        isbn           after first not-found
name/title           111       57       51.3        54       17 (31.5%)   22 (40.7%)   0 (0.0%)      1 (1.9%)    0 (0.0%)       14 (25.9%)
title                 49       17       34.7        32        6 (18.8%)   11 (34.4%)   0 (0.0%)      2 (6.2%)    1 (3.1%)       12 (37.5%)
personal author        0        -          -         -            -            -           -             -           -               -
lccn                 122      109       89.3        13        5 (38.4%)    1 (7.7%)    0 (0.0%)      2 (15.4%)   1 (7.7%)        4 (30.8%)
isbn                  14        1        7.1        13        8 (61.5%)    3 (23.1%)   0 (0.0%)      0 (0.0%)    0 (0.0%)        2 (15.4%)
issn                   0        -          -         -            -            -           -             -           -               -
coden                  0        -          -         -            -            -           -             -           -               -
total                296      184       62.2       112       36 (32.1%)   37 (33.0%)   0 (0.0%)      5 (4.5%)    2 (1.8%)       32 (28.6%)

note: to calculate the percentage given in parentheses, the number of "types of search key used after the first not-found response" was divided by the number of "not-found responses."

for the acquisitions function (table 2), the most frequently used primary search key was the lccn key, which retrieved the desired record about 89 percent of the time. when the lccn key could not retrieve the record, the users mostly chose the name/title key as their secondary choice or abandoned the search. the next most frequently used primary search key was the name/title key, which retrieved the desired record about 51 percent of the time. when the name/title key was unsuccessful, the users entered as their secondary search key a title key about 41 percent of the time, or a different name/title key about 31 percent of the time. approximately 26 percent of the time they abandoned the search. it seems that acquisitions users mostly try the lccn key first if it is available (the lccn is not present in all records) and the name/title key first if the lccn is not available. thus, users adopted the right approach, since the lccn key has the highest hit rate. furthermore, the lccn key is more efficient than other keys because it results, on the average, in fewer replies. for the monograph cataloging function (table 3), the name/title key was used most often as the primary search key, resulting in retrieval of the desired record about 57 percent of the time. when the name/title key could not retrieve the record, the users next attempted a title key (52 percent of the time) or a different name/title key (21 percent of the time). about 23 percent of the time they discontinued the search. the lccn key was the second most frequently used primary search key and successfully retrieved the record about 79 percent of the time. when the lccn key was unsuccessful, the users tried the name/title key (58 percent of the time) as their secondary choice or abandoned the search. unlike the search-key usage pattern for acquisitions, the use of the lccn key for monograph cataloging was lower than use of the name/title key, although here also the hit rate was highest for the lccn key. the reason lccn use was lower is that ohio state university, being a research institution, processes a large number of items from various sources other than regular acquisitions channels, and many of these sources do not have lccn information.
table 3. number of primary and secondary choices of search keys for monograph cataloging

type of key        items     found      % of     not-found   type of search key used after the first not-found response           search discontinued
used first        searched  responses   found    responses   name/title    title        pers. author  lccn        isbn           after first not-found
name/title           313      180       57.5       133       28 (21.1%)   69 (51.9%)   1 (0.7%)      4 (3.0%)    1 (0.7%)       30 (22.6%)
title                 48       24       50.0        24        9 (37.5%)    2 (8.3%)    1 (4.2%)      3 (12.5%)   2 (8.3%)        7 (29.2%)
personal author        9        3       33.3         6        4 (66.6%)    0 (0.0%)    0 (0.0%)      0 (0.0%)    1 (16.7%)       1 (16.7%)
lccn                 201      158       78.6        43       25 (58.1%)    4 (9.3%)    0 (0.0%)      2 (4.7%)    1 (2.3%)       11 (25.6%)
isbn                  34        3        8.8        31       20 (64.5%)    4 (12.9%)   1 (3.2%)      1 (3.2%)    3 (9.7%)        2 (6.5%)
issn                   0        -          -         -            -            -           -             -           -               -
coden                  0        -          -         -            -            -           -             -           -               -
total                605      368       60.8       237       86 (36.3%)   79 (33.3%)   3 (1.3%)     10 (4.2%)    8 (3.4%)       51 (21.5%)

note: to calculate the percentage given in parentheses, the number of "types of search key used after the first not-found response" was divided by the number of "not-found responses."

for the serials cataloging function (table 4), the title key was the first primary choice and retrieved the desired records 44 percent of the time. if this key failed to retrieve the desired records, the users entered as their secondary key a different title key 55 percent of the time and a name/title key 17 percent of the time. approximately 23 percent of the time, users decided to discontinue the search. although for serials cataloging the title key was used most frequently, its hit rate was less than 45 percent. on the other hand, the issn key was used very little, but its hit rate was as high as 80 percent. the use of the issn key is likely to increase in the future, however, because the united states postal service now requires the issn to be present on serials.8 therefore, the issn will be more readily available to the user.

table 4. number of primary and secondary choices of search keys for serials cataloging

type of key        items     found      % of     not-found   type of search key used after the first not-found response           search discontinued
used first        searched  responses   found    responses   name/title    title        pers. author  lccn        isbn           after first not-found
name/title            15        3       20.0        12        6 (50.0%)    4 (33.3%)   1 (8.3%)      0 (0.0%)    0 (0.0%)        1 (8.3%)
title                 72       32       44.4        40        7 (17.5%)   22 (55.0%)   2 (5.0%)      0 (0.0%)    0 (0.0%)        9 (22.5%)
personal author        0        -          -         -            -            -           -             -           -               -
lccn                   1        0        0.0         1        0 (0.0%)     1 (100.0%)  0 (0.0%)      0 (0.0%)    0 (0.0%)        0 (0.0%)
isbn                   1        0        0.0         1        1 (100.0%)   0 (0.0%)    0 (0.0%)      0 (0.0%)    0 (0.0%)        0 (0.0%)
issn                   5        4       80.0         1        0 (0.0%)     1 (100.0%)  0 (0.0%)      0 (0.0%)    0 (0.0%)        0 (0.0%)
coden                  0        -          -         -            -            -           -             -           -               -
total                 94       39       41.5        55       14 (25.5%)   28 (50.9%)   3 (5.4%)      0 (0.0%)    0 (0.0%)       10 (18.2%)

note: to calculate the percentage given in parentheses, the number of "types of search key used after the first not-found response" was divided by the number of "not-found responses."

among the searches performed through the public-use terminal (table 5), the most frequently used primary search key was the name/title key, which resulted in a successful search about 29 percent of the time. when patrons encountered a "not found" response, they tried as their secondary choice a different name/title key 29 percent of the time, or a title key 29 percent of the time. they abandoned the search 38 percent of the time. as mentioned earlier, the public-use terminal can be used by anyone, including faculty and students.
table 5. number of primary and secondary choices of search keys for public use

type of key        items     found      % of     not-found   type of search key used after the first not-found response           search discontinued
used first        searched  responses   found    responses   name/title    title        pers. author  lccn        isbn           after first not-found
name/title            77       22       28.6        55       16 (29.1%)   16 (29.1%)   0 (0.0%)      2 (3.6%)    0 (0.0%)       21 (38.2%)
title                 44       20       45.4        24       11 (45.8%)    9 (37.5%)   0 (0.0%)      0 (0.0%)    0 (0.0%)        4 (16.7%)
personal author       16        5       31.3        11        0 (0.0%)     0 (0.0%)    3 (27.3%)     0 (0.0%)    0 (0.0%)        8 (72.7%)
lccn                  13        5       38.5         8        2 (25.0%)    2 (25.0%)   0 (0.0%)      1 (12.5%)   1 (12.5%)       2 (25.0%)
isbn                   3        2       66.7         1        0 (0.0%)     0 (0.0%)    0 (0.0%)      0 (0.0%)    1 (100.0%)      0 (0.0%)
issn                   3        1       33.3         2        0 (0.0%)     0 (0.0%)    0 (0.0%)      0 (0.0%)    0 (0.0%)        2 (100.0%)
coden                  2        0        0.0         2        0 (0.0%)     1 (50.0%)   0 (0.0%)      0 (0.0%)    0 (0.0%)        1 (50.0%)
total                158       55       34.8       103       29 (28.2%)   28 (27.2%)   3 (2.9%)      3 (2.9%)    2 (1.9%)       38 (36.9%)

note: to calculate the percentage given in parentheses, the number of "types of search key used after the first not-found response" was divided by the number of "not-found responses."

the hit rate for the name/title key at this terminal was rather low. from this study, it is not possible to say whether this was due to patrons' lack of knowledge in key construction or lack of sufficient information needed for the construction of the key. summary and conclusions among the various types of search keys available to users, the name/title, lccn, and title search keys were entered most frequently. the use of personal-author, isbn, issn, and coden search keys was very limited for all library functions. corporate-author search keys were not used at all. for the acquisitions function, system users most frequently entered the lccn key, followed by the name/title key. for monograph cataloging, the users entered the name/title key most frequently, followed by the lccn key. for serials cataloging, the use of the title key was the most common. persons using public-use terminals entered mostly name/title and title search keys. for the acquisitions and monograph cataloging functions, the lccn key was most successful in retrieving the desired records. the next most successful key was the name/title key. for both of these functions, when the name/title key failed to retrieve the record, users next tried the title key most of the time. for serials cataloging, the title key was used most frequently but was not very successful in retrieving serial records. on the other hand, the issn key was the most successful, but it was used very little. individual identifiers such as lccn, issn, isbn, and coden are very efficient search keys because they retrieve, on the average, far fewer replies than other types of search keys. with the exception of the lccn, the individual identifiers were used only to a small extent. from this study, it is not possible to answer questions such as: why weren't individual-identifier search keys used more often? did a searcher use a name/title key even when the lccn was available?
to answer such questions, data will have to be collected concerning what kind of information is available to the searcher when constructing the search keys. acknowledgments the authors wish to thank william h. hochstettler for programming assistance, and peggy zimbeck for editorial assistance with the manuscript. references 1. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7:79-82 (1970). 2. f. g. kilgour and others, "title-only entries retrieved by truncated search keys," journal of library automation 4:207-10 (dec. 1971). 3. p. l. long and f. g. kilgour, "a truncated search key title index," journal of library automation 5:17-20 (march 1972). 4. a. l. landgraf and f. g. kilgour, "catalog records retrieved by personal author using derived search keys," journal of library automation 6:103-8 (june 1973). 5. a. l. landgraf, k. b. rastogi, and p. l. long, "corporate author entry records retrieved by use of derived truncated search keys," journal of library automation 6:151-61 (sept. 1973). 6. j. d. smith and j. e. rush, "the relationship between author names and author entries in a large on-line union catalog as retrieved using truncated keys," journal of the american society for information science 28, no. 2:115-20 (march 1977). 7. oclc, inc., searching the on-line union catalog (columbus, ohio: oclc, inc., 1979). 8. library of congress information bulletin 37:35 (1 sept. 1978). kunj b. rastogi is a research scientist at oclc. ichiko morita is assistant professor at the ohio state university libraries.
deborah l. macpherson digitizing the non-digital: creating a global context for events, artifacts, ideas, and information
deborah l. macpherson (debmacp@gmail.com) is projects director, accuracy&aesthetics (www.accuracyandaesthetics.com) in vienna, virginia.
this paper discusses some of the problems associated with search and digital-rights management in the emerging age of interconnectivity. an open-source system called context driven topologies (cdt) is proposed to create one global context of geography, knowledge domains, and internet addresses, using centralized spatial databases, geometry, and maps. the same concept can be described by different words, the same image can be interpreted a thousand ways by every viewer, but mathematics is a set of rules to ensure that certain relationships or sequences will be precisely regenerated. therefore, unlike most of today's digital records, cdts are based on mathematics first, images second, words last. the aim is to permanently link the highest quality events, artifacts, ideas, and information into one record documenting the quickest paths to the most relevant information for specific data, users, and tasks. a model demonstration project using cdt to organize, search, and place information in new contexts while protecting the authors' intent is also introduced. ■ statement of the problem human history is composed of original events, artifacts, ideas, and information translated into records that are subject to deciphering and interpretation by future generations (figure 1). it's like putting together a puzzle, except that each person assembling bits and pieces of the same information may end up with a different picture. we are at a turning point in the history of humanity's collective knowledge and expertise. we need more precise ways to structure questions and more interactive ways to interpret the results.
today, there is nearly unlimited access to online knowledge collections, information services, and research or educational networks to preserve and interpret records in more efficient and creative ways.1 there is no reason digital archiving and dissemination techniques could not also be used to streamline redundancies between collections and build cross-references more methodically.2 content should be presented and techniques utilized according to orderly specifications. this will help to document work more responsibly, making shared records more correct, interesting, and complete. the open-source system proposed, context driven topologies (cdt), packs and unpacks ideas and information in themes similar to museum exhibitions, using specifications created by each author and network. data layers are formed by registering unique combinations of geography, knowledge domains, and internet addresses to create multidimensional shapes showing where data originate, where they belong, and how they relate to similar information over time. the topologies can be manipulated to consolidate and compare multiple sources to identify the most reliable source, block out repetitious or irrelevant background information, and broadcast precise combinations of ideas and information to and from particular places. "places," in this sense, means geographic region and cultural background, knowledge domain and education level, and all of their corresponding online resources. modern information must be searchable on multiple and simultaneous levels.3 today's searches occur for a number of reasons that did not exist when most current collections, repositories, and publications were created. digital records have the potential to reach far broader audiences than original events, artifacts, and ideas. therefore, digitized items and the acts of publishing and referencing over networks could theoretically serve a longer-term and more expanded purpose than most individual collections, repositories, or publications are designed to serve. there is no shortage of interesting work to look at. we live in a complex world that is just recently being digitized, mapped, analyzed, and broadcast over the internet in fine detail and compelling overall relationships. many of these relationships require mathematics, images, and maps to explain them.
[figure 1. 50 word word-search-puzzle (courtesy of kevin lightner)]
we need more than keywords to explore and reference all that has been documented, but we have formed the habit of using keywords and machine-based classification schemes. the entire digital world is in a mire of conflicting priorities, funding opportunities, and intellectual quests toward the future. to advance humanity's collective curiosity and knowledge, and to coordinate similar efforts across disciplines and cultures, we need one form of record keeping. one global context to show: 1. where ideas and information begin; 2. if the original is non-digital (e.g., an artifact or real world event), and if so, the location where the artifact resides or the time and place of the event; and 3.
a marking system to keep track of the ways information has been exchanged, reinterpreted, and reused to create a more comprehensive and simplified guide to humanity's collective knowledge and expertise. digitizing the non-digital is a concept to address three issues: ■ tools to assemble the bigger pictures needed to document the best paths to the most relevant information in sets rather than retrieving results item by item; ■ placeholders for information that has not been digitized or was never recorded; and ■ distribution to and from specific places according to the ways it is used, the kind of information it is, and the types of people who are able to understand it. there is currently little distinction between all data that have been collected or exist, versus the data and techniques selected to draw conclusions. there are no tools to differentiate between information under rigorous discussion by a discipline or culture versus random bits and pieces. there is a need to develop the equivalent of interpretive exhibits to instruct and inspire the general public. there is currently no way to herd information into crowded areas to be consolidated, compressed, and prioritized by its relationship to similar ideas and information. citation patterns are able to show connections or structure related information.4 however, they currently do not show whether the reference is for or against the other work. there are very few big pictures.5 there is no way to trace where an idea has led over time. the global context proposed is not like the ancient library of alexandria or large-scale contemporary initiatives. the envisioned process looks beyond the quest to digitize or publish every available event, artifact, and idea. it is not about each item itself. it is being able to make sense of the ways the same information can be viewed in different contexts, and being able to construct a reliable process to search and document the results. having bigger pictures will allow researchers, curators, and others to see what is missing or decide which archival works should be converted into digital form. we do not have the time, resources, or reasons to digitize every item in every collection. the aim is to gradually identify what the most telling examples are in different areas so someone new to an event, artifact, idea, or information can see it in various contexts and automatically be shown the most compelling or instructive sequences first (figure 2).
[figure 2. photomosaic®: thousands of miniature images of the civil war combine to make one large portrait. (courtesy of robert silvers)]
a coordinated effort to overlap and see all archives and publications by ranking accuracy and appeal to the public in relationship to all knowledge will make it possible for entirely new lines of inquiry to be established. it will help researchers coordinate work across disciplines. an example of this principle today is the international virtual observatory alliance (ivoa).6 ivoa is a coordinated effort by astronomers worldwide to document our universe more efficiently by systematizing their records; showing where they originate; indicating how they were collected; meeting their rigorous mathematical standards; and deciding themselves how and where their records belong in relationship to each other, and which ones are most important. only astronomers are qualified to do this. the same is true in any area of humanity's specialized knowledge and expertise.
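a minimal sketch can make the "marking system" of item 3 above concrete: a record that registers geography, knowledge domain, and internet address together and carries a provenance trail forward each time the item is reused or reinterpreted. every class and field name below is hypothetical and assumed for illustration; none of it is a published cdt schema.

```python
# illustrative sketch only: a minimal "registration record" of the kind the cdt
# proposal describes, binding geography, knowledge domain, and internet address
# so an item's origin and context travel with it. all names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CDTRecord:
    title: str
    latitude: float            # where the original event or artifact resides
    longitude: float
    knowledge_domain: str      # e.g., "sculpture", "astronomy"
    url: str                   # internet address of the digital surrogate
    is_digital_original: bool  # false for artifacts or events digitized later
    derived_from: List[str] = field(default_factory=list)  # provenance trail

    def reinterpret(self, new_url: str) -> "CDTRecord":
        """mark a reuse or reinterpretation while keeping the provenance trail."""
        return CDTRecord(self.title, self.latitude, self.longitude,
                         self.knowledge_domain, new_url,
                         is_digital_original=True,
                         derived_from=self.derived_from + [self.url])

david = CDTRecord("david", 43.7767, 11.2593, "sculpture",
                  "http://example.org/david-scan", is_digital_original=False)
model = david.reinterpret("http://example.org/david-center-of-mass")
```

a record of this kind is the sort of thing that could bind, for instance, a restoration model to the digital history of the original sculpture, so that each later reuse keeps a checkable path back to the original.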
the most difficult aspect of creating a global context is accommodating and expressing each area in its unique way as created from within, while still being able to get the most descriptive examples from all areas to fit together in a sensible and appealing overview. until digital archives and publications can be deeply searched on a global level using simpler tools and predetermined pathways accessible by anyone, two researchers in different geographic or academic areas may be investigating the same topic from different points of view and will not know it. there is no way to be led to the best internet resources. today, as so much information surrounds us, it is hard to believe that common lines of inquiry could be discovered by accident. context of the place, time, idea, or education level should be able to drive internet topologies to the most appropriate online resources. constructing a reliable and beautiful digital history of all events—both natural and man-made—artifacts, ideas, and information means contributing to and combining a wide range of knowledge, expertise, networks, archives, and tools. mapping digital knowledge to historical knowledge means arguing about and perfecting an entirely new set of checks and balances. historical and digital knowledge are different. historical knowledge is fluid, continuous, and held by traditionally separated cultures and disciplines. digital knowledge goes everywhere that can be marked and traced by the times and places it was created, captured, and distributed. trying to visualize what is happening and relating it to working practices and the types of information that came before it is not like tracing the history of the human race back to adam and eve or the universe back to the big bang, where substantial guesswork beyond our memory or experience is involved. the entire conversion into the networked age is happening before our eyes in less than one generation without the benefit of reflection, careful review, and storytelling. we’re collecting everything indiscriminately over and over again while all datasets are rapidly expanding. we need to step back, slow down, and acknowledge that many current digitization and publication methods do not consistently generate reflective or reviewed results that are able to tell a story. we do not currently have one shared map, context, mathematical record, language, or set of symbols to interpret from different points of view for a variety of purposes over time. we do not currently mark the original versus subsequent interpretations of the same information as an integral component of most digital records. there is no financial support for one single shared storage space to preserve only the highest-resolution, most agreed-upon versions because we may never be able to agree on what they are. therefore, there is also not one system that can be fine-tuned to discover research and results that may be accidentally overlapping. instead, unusual approaches get watered down by constrained words designed to fit metadata requirements developed by archivists and engineers rather than the original authors. links get broken, web sites are no longer maintained, trends change. 
there are currently very few feasible ways to pick up on a line of inquiry previously initiated by others without sorting through and regenerating the same information again.7 a simplified version of the work needs to be preserved on the network, able to be referenced by others even if they are far away, live in a different time, or are more or less advanced in their ways of thinking. if digital information is reliable, someone in a remote place or in the future should not need to collect the same information again or unintentionally retrieve out-of-date or duplicate results. searches in the public domain should not be boring. they should be as easy to click through as tv channels, with more directions to go and better content. all searchers should not have to start at the top like everyone else on the first page of google, citeseer, or arxiv with a blank white space and a box to enter key words. investigators should be able to outline the facts they know, dial in measurements, specify relationships, and generally be able to use their own knowledge and expertise to isolate and extract entire ideas over broad spectrums or select only relevant portions of archives and publications to reintegrate into larger bodies of work for further discussion. digital objects are able to depict more than the unaided eye can see. an example is the evaluation of the center of mass of michelangelo’s david performed for david’s restoration by the visual computing lab based on a 3d model of the statue built by stanford university (figure 3).8 the digital david does not have mass. the original david is a beautiful object sculpted of a known and predictable material. the model makes it possible to test restoration techniques without permanent damage in ways no one would dare attempt on the irreplaceable original without first knowing more. the documentation process is an enhanced original that should be permanently bound to the digital history of the original sculpture. the evaluation method could be applied to other objects, but this model belongs with this object and this type of research. a global context built upon a solid, mathematically linked foundation would mean this conscientious work would not be lost or need to be repeated. digital records are not being used nearly to their full potential. so many influences on humanity’s intellectual evolution could be examined as history takes shape over 98 information technology and libraries | june 2006 time. concurrent and conflicting interpretations can take on more meaning than the original by itself. for example, how could the internet and legal citations be used to map the subsequent interpretations of the u.s. constitution from the time, place, and reasons where it was written to every supreme court case and related citation since the original context? what would this map look like (figure 4)? the impact that these four pages of ink on paper have had to the united states and the entire world cannot currently be examined in one volume to see where the most contentious and useful passages are. similar dynamics in wikipedia are shown in history flow by martin wattenberg at ibm research.9 what if techniques developed in one field could be applied to content from another area? for example, what if computer models created to track storms and hurricanes could be used to arrange and watch the evolution and real world impact of all the documents and actions associated with a war? 
being able to see how originals evolve in their interpretation and impact on society over time is practical because not all records are worth keeping. even worse, mundane or meaningless events, artifacts, ideas, or information may seem more important than they actually were if they are not translated into digital form or distributed in the right way.10 the task today is to make the most advanced ways of thinking and working more approachable and appealing to someone new, which is everyone outside a particular discipline or culture, while traversing a map of humanity’s collective knowledge and expertise. because shared memories of this magnitude would be so far-reaching and complex, the record itself needs to be able to show every user how to use it. every unique purpose for looking around, publishing, or referencing work, and adding to or taking away from a collaborative global context should be geared toward improvement and simplification. while millions and millions of people are accessing enormous numbers of files and collections, some paths are better than others. in order to sort and choose the best parts of vast collections, documenting everyone going in and out of various semantic places can ultimately identify the best paths to information everyone understands. what if someone who does not care at all about paintings makes an inquiry—which ten should they be shown to get them interested? there is also the issue of gearing the internet to provide more efficient pathways to widely accessed preapproved and curated information. every mouse click could accumulate to document the most reliable pathways in and out of shared information spaces to generate an assortment of scenarios for looking at the same information in different ways (figure 5).11 we think there is far too much information to consolidate into one big picture, that our ideas and methods are too incompatible to coexist comfortably in one space, but perhaps this is not really the case. perhaps we can understand what is happening more clearly by working backwards. ■ proposed solution and design for a running prototype even though many networks are in place and countless computers have been manufactured, technology advances rapidly. there are very few reasons to repair obsolete equipment or maintain outdated web resources. therefore, why not go back to the drawing board on all of it? we may have completely new computers and networks within ten years, anyway. a record-keeping and referencing system this ambitious needs to incorporate every type of record, classification scheme, symbol, style, and quirk. when visiting a new place outside your comfort zone, it needs to be obvious what the best local techniques are to filter and understand the results. people new to an area need to have the option of using tools they can invent or already know. figure 3. david’s center of mass (courtesy of the visual computing lab and stanford university) article title | author 99digitizing the non-digital | macpherson 99 the visualization of cdt’s model demonstration project will bring together research scientists, artists, integrators, and institutions to develop a running prototype. the purpose is to establish and record a series of planned and spontaneous situations in different parts of the world across a range of disciplines and existing networks so that these situations can be mapped. the project will be a group of people thinking together to confront the roadblocks in assembling incompatible ideas and information into one context. 
the group will collaborate in larger and smaller groups in roughly three-month intervals as participants continue with their existing work. the development of this system has to be dynamic, changing piece by piece both from the bottom up and the top down while everyone’s regular work continues. therefore, the system will be geared toward sample sets of active work products, rather than the record-keeping system by itself. the current objective is to establish a network of ten art museums, ten scientific research institutes, and ten new media/new technology efforts in ten cities that speak different natural languages (for example: english, german, french, italian, hindi, mandarin, ga [belonging to the cluster of kwa languages in ghana], uzbek, spanish, and arabic). the overall intent is to use mathematics, art, and individual ways of knowing to develop a series of professional sketches to serve as shortcuts between languages and key words in the search process. the first step is to map the background of each of the project participants’ previous work by time, location, and discipline. the database will include scientific visualizations, art objects, performances, algorithms, mathematical formulae, musical recordings, and many other forms of creative and scholarly expression. the next steps will be to hold a series of interactive workshops. at the first workshop, the research scientists will explain the mathematics and images they use in their work. two sets of artists will isolate the aesthetics to render their own map through the scientists’ ideas. two traveling exhibits will be created, one to be experienced in person, the other to be presented through a new media and online exhibit. both will be tracked physically and conceptually using cdt. the results will be generated and interpreted using gis, matlab, photoshop, and flow visualization software. for more information, please contact the author. a survey of individual and institutional requirements will be undertaken to define practical ways to move and organize ideas and information into a unified sample map of previously unrelated content and techniques. for example, at one institute, perhaps only two participants and four local professors will understand what that part of the map is showing. another part may only have meaning to one artist. a unified map for everyone, with built-in copyright protection for the participating artists, scientists, and institutions, will be presented to nonspecialist general publics around the world for feedback and further change within specified limits. the participating publics will be people interested in contemporary art, cutting-edge scientific research, new media, and events where all three communities can interact. each part of the prototype will be able to be examined in groups to compare and contrast different elements against different backgrounds. some arrangements will be assisted by the computer and network. the project will map everything with which each event, idea, and artifact has ever been associated in scale, proportion, and relative placement in the record overall. for example, if the records in question are paintings, any group could figure 5. thick and thin (courtesy of the artist john simon) figure 4. the constitution of the united states (courtesy of the u.s. national archives and records administration [nara]) 100 information technology and libraries | june 2006 be gathered together into the same reference window without copying the images. 
the assembly window has a built-in scale for the items it is showing, so they will be displayed in the correct proportion to each other. the system binds images of physical objects with their dimensions and the times and places they were created while this information is known—so a user does not ever have to guess later when looking back at any part of the record. any group of paintings can be automatically arranged chronologically, by size, culture, or any number of comparisons and curatorial issues. a sample sequence is: 1. a zoomed-in map showing a group of paintings in an exhibit. each painting links to its history. 2. within the map of all paintings shown in an intricate collage. 3. inside the map of all human endeavor shown as an appealing landscape. higher levels can then be used to reorganize a theme, for example, “only germany 2005 to 2007,” and drilling back down to generate other exhibitions. this would lead to other paintings and other curators’ conclusions, which would provide a more complete representation of each painting, exhibition, museum, curator, culture, and era. when the records in question are scientific visualizations, problems of presenting unrelated files together are more complex. the records may not share a common scale or system of reference. it may only be possible to place mathematical constructs in contexts based on where they originate geographically and by knowledge domain. an important part of the work will be determining the best contexts by which to introduce ideas or information to untrained viewers and devising methods to start deeper in the records using mathematical, cultural, or other prior knowledge and preferences. the same concept can be described by different words, the same image can be interpreted a thousand ways by every viewer, but mathematics is a set of rules to ensure that certain relationships or sequences will be precisely regenerated. therefore, unlike most of today’s digital records, cdts are based on mathematics first, images second, words last. ideas and information will be encoded to persist over specified periods of time. better examples will find higher placement by connecting to more background information and showing stronger relationships to larger numbers of open questions. cycles will be implemented to return to the same idea later and remove information that is never referenced or has not changed the course of the record’s flow. out-of-date, irrelevant, or rarely used information has to either be compressed or be thrown away, a new type of identity and a process to assemble and eliminate information will be created in thirty prototype forms showing the intertwined history of the events, artifacts, ideas, and information generated by the project and all it branches out to when connecting back to the publications, exhibits, ideas, artifacts, and other information generated by the participating individuals and institutions. the cdt model will relate and join tables to display all the different forms together in one map. each piece of information and the patterned space around it will be documented a special way to generate drawings leading back to originals reliably structured to transfer to other computers and networks. they will transfer without ambiguity because the transactions and paths to the internet addresses are based on mathematical relationships that can be checked. 
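the assembly-window behavior described above, binding an image to its physical dimensions and the time and place it was created so that a group can be shown in true proportion and re-sorted on demand, can be sketched as follows. the record and function names are hypothetical illustrations, not the cdt prototype's actual data model.

```python
# illustrative sketch of the "assembly window" idea above: bind an image to its
# physical dimensions and the time/place it was created, so a group can be shown
# in true proportion and re-sorted without copying the images. names are hypothetical.
from dataclasses import dataclass

@dataclass
class PaintingRecord:
    title: str
    image_url: str
    width_cm: float
    height_cm: float
    year: int
    culture: str

def arrange(paintings, by="year"):
    """return display entries sorted chronologically, by size, or by culture,
    with pixel sizes scaled so the largest canvas fits a 1000-px window."""
    keys = {"year": lambda p: p.year,
            "size": lambda p: p.width_cm * p.height_cm,
            "culture": lambda p: p.culture}
    ordered = sorted(paintings, key=keys[by])
    scale = 1000.0 / max(max(p.width_cm, p.height_cm) for p in ordered)
    return [(p.title, round(p.width_cm * scale), round(p.height_cm * scale))
            for p in ordered]
```

because the dimensions travel with the image reference, the same group can be rearranged chronologically, by size, or by culture without copying the images, as described above.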
each contributor has the first opportunity to place his or her ideas in context and define the limits of how their originals can be referenced, changed, and presented. at the end of the project, the set will be closed so that it can be cleaned of information that was only temporary, placeholders can be examined, and the entire model can be manipulated as one whole. for more information, please see www.contextdriventopologies.org the more specified a single piece or set of information, the easier it will be to define its history and place it in context. each unique placement and priority assigned by each individual or institution may not agree with the priorities and placements envisioned by others, but sooner or later, there will begin to be correspondence and everyone will be looking very generally at the same emerging map. ■ conclusions there will be innumerable contexts to create, discover, and remark upon in the future by creating a shared pace of curiosity and knowledge acquisition. a global context could be used to extrapolate new knowledge from trends that occur over longer periods of time in more places than we currently share or document. as the envisioned system is fine-tuned, it will become an ideal place to test an idea that is only partially complete to see where the idea fits or to determine if it has already been done. the results could be immediately applied to improve education. in today’s frantic information overload, we should not forget that digital information—and even cold, hard, raw data—is more than ones and zeros. they represent peoples’ work, their fingerprints; people are attached to their data. one wishes networks of computers could understand one’s ideas and work, but we only show them the boring parts. the proposed system will capture beauty so computers can help to find where it is hidden inside all the repositories, publications, and collections through which no person has the time to sort. the system will allow article title | author 101digitizing the non-digital | macpherson 101 users to specify how they think their information relates to the rest of the world so their intended context can be traced in the future. one hopes that using networks and computers to compare ideas and works on larger levels will restore craftsmanship and attention spans to make users want to spend more time with better information. a shared visual language driven by mathematical relationships that can be checked will allow future historians to see where records simply will not harmonize. users will be able to analyze why different ways of looking can shape and divide knowledge and history as it changes. visiting online archives and publications will change. developing processes to pre-organize searches and results for public viewing can change now by creating a system for curators and others to develop sets of information, rather than publishing individual items on their web sites. library facilities can change, and research rooms can become multimedia centers. networks can broadcast content and techniques in one package. there is not one clearly defined reason why being able to see these kinds of overviews or make these types of comparisons can be useful. the internet is a worldwide invention being constructed for a variety of purposes. a perfectly legitimate reason to capture the history of transactions across it in a simple form is just to see what might happen with the objective of increasing our understanding and respect for each other. 
the most important reason for establishing a global context is to allow users to transfer and update complex histories, thoughts, images, studies, visualizations, drawings, flow diagrams, sequences, transformations, cultural objects, stories, expressions, and purely mathematical or dynamic relationships without depending on constrained keywords or illegible codes that do not describe this information as well as the information can describe itself. all cultures and disciplines would be able to construct their parts of the record precisely the way they prefer. we would finally be able to use computers to show why and how we think information is related—a huge leap forward in the world of digital record keeping. references and notes 1. citeseer, 2005, http://citeseer.ist.psu.edu (accessed apr. 6, 2006); internet2, 2005, www.internet2.edu (accessed apr. 6, 2006); jane’s information group, 2005, www.janes.com (accessed apr. 6, 2006); machine learning network online information service (mlnetois), 2005, www.mlnet.org (accessed apr. 6, 2006); national technical information service, 2005, www.ntis .gov (accessed apr. 6, 2006); smithsonian institution libraries, galaxy of knowledge, 2005, www.sil.si.edu/digitalcollections (accessed apr. 6, 2006); thompson scientific, isi web of knowledge, 2005, www.thomsonisi.com (accessed apr. 6, 2006); visual collections, david rumsey collections, 2005, www .davidrumsey.com/collections (accessed apr. 6, 2006); world health organization, statistical information system, 2005, www3.who.int/whosis/menu.cfm (accessed apr. 6, 2006). 2. g. ammons et al., “debugging temporal specifications with concept analysis,” in proceedings of the acm sigplan 2003 conference on programming language design and implementation (new york: association for computing machinery, june 2003). 3. w. huyer and a. neumaier, “global optimization by multilevel coordinate search,” in global optimization 14 (1999): 331–55 4. a. bagga and b. baldwin, (workshop paper), in colingacl ‘98: 36th annual meeting of the association for computational linguisitics and 17th international conference on computational linguisitics, aug. 10–14, 1998, montréal, quebec, canada: proceedings of the conference (new brunswick. n.j.: acl; san francisco: morgan kaufmann, 1998); s. deerwester et al., “indexing by latent semantic analysis,” journal of the american society for information science 41, no. 6 (1990): 391–07; a. mccallum and b. wellner, “toward conditional models of identity uncertainty with application to proper noun coreference,” in proceedings of the ijcai workshop on information integration on the web (mountain view, calif: research institute for advanced computer science, 2003), 79–84; t. nisonger, “citation autobiography: an investigation of isi database coverage in determining author citedness,” college and research libraries 65, no. 2 (mar. 2004): 152–63; k. van deemter and r. kibble, “on coreferring: coreference in muc and related annotation schemes,” computational linguistics 26, no. 4 (dec. 2000); k. boyack, “mapping all of science and technology at the paper level,” presented at the session mapping humanity’s knowledge and expertise in the digital domain as part of the 101st annual meeting of the association of american geographers (denver: association of american geographers, 2005): 54; metacarta, 2005, www.metacarta.com. 5. j. burke, “knowledgeweb project, 2005.” www.k-web .org (accessed apr. 
6, 2006); visual browsing in web and non-web databases, iowa state university, www.public.iastate .edu/~cyberstacks/bigpic.htm (accessed apr. 6, 2006). 6. international virtual observatory alliance, 2005, www .ivoa.net (accessed apr. 6, 2006). 7. s. bradshaw, “charting excursions through the literature to manage knowledge in the biological sciences,” presented at the session mapping humanity’s knowledge and expertise in the digital domain, as part of the 101st annual meeting of the association of american geographers (denver: association of american geographers, 2005): 56, project paper available from http://dollar .biz.uiowa.edu/~sbradsha/beedance/publications.html (accessed apr. 6, 2006). 8. m. callieri et al., “visualization and 3d data processing in the david restoration,” ieee computer graphics and applications 24, no. 2 (mar./apr., 2004): 16–21. 9. m. wattenberg, “history flow,” 2005, http://research web.watson.ibm.com/history (accessed apr. 6, 2006). 10. k. börner, “semantic association networks: using semantic web technology to improve scholarly knowledge and expertise management,” in visualizing the semantic web, 2nd ed. vladmire geroimenko and chaomei chen, eds., (london: springer verlag, 2006) 99–115. 11. g. sidler, a. scott, and h. wolf, “collaborative browsing in the world wide web,” in proceedings of 8th joint european networking conference, edinburgh, scotland (new york: elsevier, 102 information technology and libraries | june 2006 1997); j. thomas, “meaning and metadata: managing information in a visual resource reference collection,” in proceedings of association for computers and the humanities and the association for literary and linguistic computing meeting (charlottesville, va.: university of virginia, 1999); h. yu and a. vahdat, “design and evaluation of a conit-based continuous consistency model for replicated services,” in acm transactions on computer systems 20, no. 3 (aug. 2002): 239–82. 12. visualization of context driven topologies/cdt model demonstration project, 2005, www.contextdriventopologies.org (accessed apr. 6, 2006). image acknowledgments: 50-word word-search puzzle www.synthfool.com/puzzle.gif permission: kevin lightner, synthesizer enthusiast. wrightwood, california abraham lincoln www.photomosaic.com/samples/large/abrahamlincoln.jpg permission: from the artist robert silver. david’s center of mass http://vcg.isti.cnr.it/projects/davidrestoration/restaurodavid.htm http://graphics.stanford.edu/projects/mich/book/book.html permission: roberto scopigno, visual computing lab, isti-cnr, via g. moruzzi, 1, 56124 pisa italy and marc levoy, stanford computer graphic lab, gates computer science bldg. stanford, ca 94305 u.s. constitution www.archives.gov/ repository: national archives building, washington, d.c. permission: nara government records are in the public domain. thick and thin www.numeral.com/drawings/plotter/thickandthin.html 1995 11" × 15" ink on paper. permission: from the artist john simon, new york city. specializing in algorithms and conceptual art. editorial i think that writing editorials in my job as the new editor of information technology and libraries (ital) is going to be a real piece of cake. 
all i have to do, dear readers, is to quote (with proper attribution) walt crawford, the title of whose book i repeat as the title of this, my inaugural editorial.1 and then quote other sages of our profession, using only as many of their words as is fitting and proper to make my editorials relevant to the concerns of our membership and readers and as few of my own words as i can to repay the confidence that the library information and technology association (lita) has placed in me— and to avoid muddling the ideas of those to whom i shall be indebted. those of you reading this will note that i have already fallen prey to the conceit of all scholarly journal editors: that their readers, of course, after surveying the tables of contents, dive wide-eyed first into the editorials. of course. to paraphrase a technologist of an earlier era, “when in the course of human events, it becomes necessary for” a new editor to take on the responsibility for the stewardship of ital, “a decent respect to the opinions of mankind requires that” he “should declare the causes which impel” him to accept that responsibility and, further, to write editorials. i quote, of course, from the first paragraph of the declaration of independence adopted by the “thirteen united states of america” july 4, 1776. in this, my first editorial, i, too, shall put forth for the examination of the members of lita and the readers of ital my goals and hopes for the journal that i am now honored to lead. these goals and hopes are shared by the members of the ital editorial board, whose names appear in the masthead of this journal. ital is a double-blind refereed journal that currently has a manuscript acceptance rate of 50 percent. it began in 1968 as the journal of library automation (jola), the journal of the information science and automation division (isad) of ala, and its first editor was fred kilgour. in 1978 isad became lita, and in 1982, the journal title was changed to reflect the expanding role of information technology in libraries, an expansion that continues to accelerate so that ital is no longer the only professional journal within ala whose pages are now dominated by our accelerating use of information technologies as tools to manage the services we provide our users and as tools we use ourselves to accomplish our daily duties. i write part of this editorial in the skies over the middle section of the united states as i return home from the seventh national lita forum held in st. louis, october 7–10. at the forum, i heard presentations, visited poster sessions, and talked with colleagues from forty-four states and six countries who had something to say and said it well. i hope that some of them may submit manuscripts to ital so that all the members of lita and all the readers of the journal will profit as well from some of what the attendees of the forum heard and saw. i attended the forum forewarned by previous ital editors to carry plenty of business cards, and i went armed with a pocketful. i think i distributed enough that, if pieced together, their blank sides would provide sufficient writing space for at least one manuscript! in an attempt to fulfill the jeffersonian promise above, i hereby list a few of my goals for the beginning of my term as editor. i must emphasize that these goals of mine supplement but do not supplant the purposes of the journal as stated on the first page and on the ital web site (www.ala.org/lita/litapublications/ital/italinformation. 
htm); likewise, they do not supplant the goals of my predecessors. in no particular order: i hope to increase the number of manuscripts received from our library and information schools. their faculty and doctoral students are some of the incubators of new and exciting information technologies that may bear fruit for future library users. however, not all research turns up maps on which "x marks the spot." exploration is interesting, even vital, for the journey, for the search itself, and our graduate faculties and students have something to say. i hope to increase the submission of manuscripts that describe relevant sponsored research. in the earlier volumes, jola had an average of at least one article per issue, maybe more, describing the results of funded research. ital can and should be a source that information-technology researchers consider as a vehicle for the publication of their results. two articles in this issue result from sponsored research. in fact, i hope to increase the number of manuscripts that describe any relevant research or cutting-edge developments. much of the exploration undertaken by librarians improving and strengthening their services involves research or problems solved on both small scales and large. neither the officers of lita, the referees, the readers, nor i are interested in very many "how i run my library good" articles. we all want to read a statement of the problem(s), the hypotheses developed to explore the issues surrounding the problem(s), the research methods, the results, the assessment of the outcomes, and, when feasible, a synthesis of how the research methods or results may be generalized. i hope to increase the number of articles with multiple authors. libraries are among society's most cooperative institutions, and librarians are members of one of the most cooperative of professions. the work we do is rarely that of solitary performers, whether it be research or the design and implementation of complex systems to serve our users. writing about that should not be solitary either.
john webb (jwebb@wsu.edu) is assistant director for digital services/collections, washington state university libraries, pullman, and editor of information technology and libraries.
i hope to publish think-pieces from leaders in our field. i hope to publish more articles on the management of information technologies. i hope to increase the number of manuscripts that provide retrospectives.
libraries have always been users of information technologies, often early adopters of leading-edge technologies that later become commonplace. we should, upon occasion, remember and reflect upon our development as an information-technology profession. i hope to work with the editorial board, the lita publications committee, and the lita board to find a way, and soon, to facilitate the electronic publication of articles without endangering—but in fact enhancing—the absolutely essential financial contribution that the journal provides to the association. in short, i want to make ital a destination journal of excellence for both readers and authors, and in doing so reaffirm the importance of lita as a professional division of ala. to accomplish my goals, i need more than an excellent editorial board, more than first-class referees to provide quality control, and more than the support of the lita officers. i need all lita members to be prospective authors, prospective referees, and prospective literary agents acting on behalf of our profession to continue the almost forty-year tradition begun by fred kilgour and his colleagues, who were our predecessors in volume 1, number 1, march 1966, of our journal. reference 1. walt crawford, first have something to say: writing for the library profession (chicago: ala, 2003). wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 21 75 video technologies: neologism or library trend? converging factors are shaping a new environment for libraries, and, as a consequence, the present is full of opportunity. technical and social changes provide libraries with a host of alternatives for service, growth, and innovation. in this new environment libraries will, undoubtedly, continue to promote the availability of books and other materials, continue to increase their efforts to furnish patrons with information, and continue to broaden the range of activities offered so that patrons can receive personalized service. patron information seeking and searching methods we have known, however, will give way to new methods based on computers and telecommunications. a host of new technologies is growing out of the evolutionary pathway marked by telegraph, telephone, radio, and television . broadband communications (that's the cable that today brings you predominantly entertainment television), satellite, videotex, teletext, videodisc, videotape, large-screen television, and computer displays (some are as large as the side of a building) are available either today or within the next year or two . each of these technologies is a new medium within its own inherent capabilities and limitations. each has the promise of providing faster and more cost-efficient information services than some present forms of printed communication. and each requires a different approach and different knowledge for effective and efficient use, and integration into library operations. in a growing number of locations, cable communications for delivery of library se rvices have already been made available virtually free of charge . other technologies, such as videotex, will grow significantly . estimates suggest that in five years more than 8 million american homes will be able to obtain extensive, automated information services from commercial, private, and government sources . probably a larger number will receive limited information services over the broadcast airwaves via teletext. 
dramatic new services will combine television, computer, telephone, satellite, and cable into home entertainment and information centers ... potential extensions of libraries. some sources suggest more than 50 percent of the american gross national product results from the collection, processing, and dissemination of information, much of which involves new technologies. inevitably, this technological trend also will occur in libraries and, in this light, the relatively low level of involvement of computers in providing patron services today is notable. by their natural inertia, individuals and organizations in the library community will be opposed to the acceptance of cable services, videotex, online catalogs, information retrieval, and other video technologies simply because they represent change. but these technologies are technically feasible and are becoming an economic reality. the point of demarcation between computer and library may well become a terminal in a patron's home. whether or not the service provided is a library's or a commercial competitor's depends to a great extent on how libraries define their role in this environment, and on the degree of library participation in the evolutionary process that's now taking place. something besides inertia opposes the acceptance of new technologies, however. to some degree, lack of awareness of technological trends is a factor, but more significant is a lack of clear understanding (both by the proponents of the technology and by librarians) of how new technology can be integrated into the library setting. understanding the value a technology offers for increased service or decreased cost, for example, should be paramount, but frequently the technology seems to be offered as an end in itself. internal and external factors must be considered to guide the application of technology toward meeting library and patron needs. financial concerns, social forces, and the consumer/patron appear to be major factors leading libraries toward a future deeply involved in video technologies. whether the outcome will result from external pressure or internal plan remains to be seen. it's incumbent upon libraries to be informed and active participants in directing their own future in this kind of an environment. what are the implications of this technical evolution and these internal/external factors? one thing is sure: it's a massive industry growing at a very rapid rate, and it is going to grow even faster. libraries have the opportunity to grow with this trend through application of the technologies to existing technical services, increased availability of patron services, and development of innovative services. if there is a common thread that can identify those libraries which will grow and prosper, that thread is flexibility: the capacity of library management and staff to adapt the library to the new environment and integrate technology into their library. readers and contributors of jola are the people who can either have an integral part in defining the future direction of libraries, or passively watch patron needs outstrip services. library schools and people involved in library-related research must play a key role in assessing the value of video technologies and defining how to integrate them into the business and service of libraries.
what is going to preserve and enhance the role of libraries in the 1980s will not only be flexibility but another very critical element: foresight dedicated to patron needs. many libraries have met this technological revolution head-on and are intimately involved in testing, developing, and providing innovative library services. in this and forthcoming issues, we hope to bring a perspective on these changes that is valuable and cogent to the library community. readers of jola and practitioners in all areas of video technology are called upon to describe their efforts and share their results drawn from this rapidly changing field through contributions to this journal. thomas d. harnish editor's notes jola will continue to be interesting and useful to its readers to the extent that its readers are willing to expend the efforts to also be its writers. the authors in this issue are all as busy as you and i. they have made time in their already full schedules to write down ideas and information they hope will be useful and provocative. they and we of the jola staff hope you are pleased with the results. so what's new by you? how have your patrons reacted to your new online catalog? what do the costs of your acquisitions system look like? how about that idea you had about a new way to do whatever? do you think the fuss over authority control is worth it? if you have ideas, perceptions, or stories to tell that you feel are of interest to your fellow readers, please write and let them know. editorial board thoughts: the promise of immersive libraries jerome yavarkovsky information technology and libraries | december 2013 immersive technologies—interactive 3d graphics, simulation, and gaming technologies—have much to offer higher education by collapsing geography and by providing a richer learning environment. over the past forty years, through digitization and internet services, librarians have brought technology to bear on making it easier to find and use information, even to the point where people can find and use library resources without coming to the library. information space—the space between user and literature—has been collapsed through digital access. now there is potential to collapse the space between users themselves as they work together from different locations. in recent decades, learning has gone from a predominantly independent and competitive process for students to one that makes greater use of collaboration, cooperation, and group study. the library as a place for students and researchers to work individually with their literature has become a collaborative workspace where students work together on research projects and shared class assignments. this presents a challenge to libraries and learning institutions with limited space for students to meet, share ideas, do coursework together, work on joint projects, and practice presentations. for more than five years, advances in the development of virtual meeting space and workspace have enabled librarians to provide immersive, 3d virtual world services that give a sense of presence that is lacking in conference calls, text chat, and web conferencing.1 as a result, not only is the individual's physical distance from library materials eliminated, but also the distance is eliminated between individuals who work with each other using library materials.
immersive technologies offer the promise of 3d virtual world libraries where students and their teachers can work together in virtual space with library materials and tools—search engines, online catalogs, media, text, etc. students would sit at their computer wherever they are and work together with classmates in a shared environment, using library materials as well as productivity tools—word processor, spreadsheet, web, or blog development software. this would be a boon for real-world institutions lacking sufficient physical space, but also for distance education and international education. for example, students might study abroad but take a class at their home institution, take classes with classmates at foreign institutions, learn jerome yavarkovsky (jeromey@bc.edu) is emeritus university librarian, boston college, and founding co-chair of the libraries and museums technology working group under the immersive education initiative of the media grid. editorial board thoughts | yavarkovsky 6 languages and experience foreign cultures more directly, or take classes in locales not accessible to professors. and now, with the advent of massive open online courses, the opportunity lies within the immersive library to serve large numbers of students who have limited or no library facilities. these students would be able to use library resources as well as to communicate and work with classmates, teaching assistants, tutors, and others whose support would be augmented through their virtual presence rather than via email or text chat. as current research material is born digital, and legacy material is digitized at accelerating rates and is delivered digitally, they are perfect for use in immersive environments. however, immersive library resources are not limited to traditional materials. they include virtual world learning objects and environments, and virtual representations of books—walk-in books or educational simulations—that are interactive experiences with literature.2 further, in virtual space, physical research objects can be part of the study environment and be brought into an active relationship with the information resources that pertain to them. for example, if you were studying 3d models of mayan pottery in a virtual library workspace, you would have access to historical accounts, conference papers, current periodical articles, photographic archives, dissertations and other material pertinent to your studies. through this, not only would distance be eliminated between individuals and library materials, but also physical research objects from pottery to architecture, from monuments to molecular models could be represented in virtual space and related to the information that pertains to them.3 so whether the collaborators are students, teachers, librarians, researchers, we can see a time, even now, when they will no longer be bound by physical workspace. in addition to the immersive library as repository and collaborative study space, the immersive library can be expected to offer enhanced services to users. for example, immersive information literacy programs, immersive research and course consultations, virtual interlibrary document management, and document delivery are just a few possibilities. the immersive workspace is a logical setting for instructing students in digital literacy, the evaluation of resources, and the use of information tools. 
the goal here would be to bring into the immersive environment the rich body of course designs and curriculum materials pertaining to the educated use of information. we know that among the most significant lifelong benefits of higher education are information skills—finding, evaluating, and using information for work or for personal enrichment. research and course consultations are further academic services that would be valuable applications in the immersive library. librarians, library assistants, and docents have all helped individuals in second life and opensim libraries by providing advice, information, and guidance. these services have further potential in 3d virtual environments through enhancement with photographs, documents and virtual structures. add to these the tools and resources of the library collaborative workspace, and the potential for student and faculty consultation and advisory information technology and libraries | december 2013 7 services is even greater. imagine the phd candidate or student with a writing assignment in the library space with the librarian and all the tools needed for help with dissertation research. as immersive libraries realize their potential and grow in number and scale, the challenge of managing them will grow as well. the tools of 3d virtual workspaces hold promise here also to facilitate the work of the library and improve librarian productivity. librarians work across geographic boundaries regionally and nationally in consortia and multitype systems for resource sharing, collaborative research and development projects, digitization initiatives, staff development programs, and any number of efforts to economize and improve performance. the very management of the library enterprise should benefit from 3d virtual reality tools brought to bear on the day-to-day work and communication of the library. for any who might want to learn more, the ala virtual communities and libraries member initiative group maintains an ala connect site (http://connect.ala.org/node/66325). communication on libraries in virtual environments is also available through the acrl virtual worlds interest group and its google group, acrlinsl (http://groups.google.com/group/acrlinsl). references 1. lori bell and rhonda b. trueman, eds., virtual worlds, real libraries: librarians and educators in second life and other multi-user virtual environments (medford nj: information today, 2008); tom peters, “librarianship in virtual worlds,” library technology reports 44, no. 7 (october 2008). 2. aaron griffiths, ma education in virtual worlds: immersive literature, “presenting the novel night, by elie wiesel, as an immersive literature discussion space,” youtube video, www.youtube.com/watch?v=i-ijpjcwtxa&feature=player_embedded#t=0, blog post at f/xual education services, may 17, 2013, http://fxualeducation.wordpress.com/2013/05/17/immersivelit/; bernadette daly swanson/daisyblue hefferman, “second life: bradburyville virtual experience, fahrenheit 451,” youtube video, 2008, http://www.youtube.com\\watch?v=yyhqo0q2m_g. 3. luís miguel sequeira and leonel caseuri morgado, “virtual archaeology in second life and opensimulator,” journal of virtual worlds research 6, no. 1 (april 2013), http://journals.tdl.org/jvwr/index.php/jvwr/article/view/7047. 
editor's comments bob gerrity information technology and libraries | march 2013 with this issue, information technology and libraries (ital) begins its second year as an open-access, e-only publication. there have been a couple of technical hiccups related to the publication of back issues of ital previously only available in print: the publication system we're using (open journal system) treats the back issues as new content and automatically sends notifications to readers who have signed up to be notified when new content is available. we're working to correct that glitch, but hope that the benefit of having the full ital archive online will outweigh the inconvenience of the extra e-mail notifications. overall though, ital continues to chug along and the wheels aren't in danger of falling off any time soon. thanks go to mary taylor, the lita board, and the lita publications committee for supporting the move to the new model for ital. readership this year appears to be healthy—the total download count for the thirty-three articles published in 2012 was 42,166, with 48,160 abstract views. unfortunately we don't have statistics about online use from previous years to compare with. the overall number of article downloads for 2012, for new and archival content, was 74,924. we continue to add to the online archive: this month the first issues from march 1969 and march 1981 were added. if you haven't taken the opportunity to look, the back issues offer an interesting reminder of the technology challenges our predecessors faced. in this month's issue, ital editorial board member patrick "tod" colegrove reflects on the emergence of the makerspace phenomenon in libraries, providing an overview of the makerspace landscape. lita members danielle becker and lauren yannotta describe the user-centered website redesign process used at the hunter college libraries. kathleen weessies and daniel dotson describe gis lite and provide examples of its use at the michigan state university libraries. vandana singh presents guidelines for adopting an open-source integrated library system, based on findings from interviews with staff at libraries that have adopted open-source systems. danijela boberić krstićev from the university of novi sad describes a software methodology enabling sharing of information between different library systems, using the z39.50 and sru protocols. beginning with the june issue of ital, articles will be published individually as soon as they are ready. ital issues will still close on a quarterly basis, in march, june, september, and december. by publishing articles individually as they are ready, we hope to make ital content more timely and reduce the overall length of time for our peer-review and publication processes.
suggestions and feedback are welcome, at the e-mail address below. bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia. library network analysis and planning (lib-nat) maryann duggan: director, industrial information services program, southern methodist university, dallas, texas. a preliminary report on planning for network design undertaken by the reference round table of the texas library association and the state advisory council to library services and construction act title iii texas program. necessary components of a network are discussed, and network transactions of eighteen dallas area libraries analyzed using a methodology and quantitative measures developed for this project. to be a librarian in 1969 is to stand at the crossroads of change, with a real opportunity to put libraries and professional experience to work on immediate problems of today's world. in mobilizing total library resources for effective service to a variety of patron groups in a variety of ways, the librarian has at hand an exciting new tool of great potential and equally great challenge: the library network. library networks and reference services. networks and all that they imply are simply an extension of good reference services as they have been practiced for years, but their existence and potential capability require redefinition of the reference function, which, being no longer limited to one collection, has been given new dimensions of time, depth and breadth. networks, and the inter-library cooperation they require, offer an opportunity to combine materials, services and expertise in order to achieve more than any one library can do alone. in this case, the whole is greater than the sum of its parts, for each library can offer its particular patron group the total capability of the network, including outside resources not previously available. with the new tool of library networks, it is possible to provide responsive, personalized, in-depth reference service, and to provide it so rapidly that a patron can receive a pertinent bibliography covering his desired topic within an hour of his original inquiry. the reference librarian becomes an expert in resources and resource availability at the national level. his reference desk becomes a switching center, at which he receives and analyzes inquiries, decides the level of service required, identifies available sources or resources that match an inquiry, transmits the latter (restructured to be compatible with the network language), conducts a dialog with the source, receives the response and interprets it to the patron. this procedure is not markedly different from what has been done for years in any reference library, but with greater potential the process must be more formalized and structured. networks do require new expertise and crystallizing the reference philosophy. clarification is needed as to 1) types or levels of reference services, and unit operations in reference services; 2) the role of in-depth subject analysis of reference queries; 3) decisions on alternate choices of sources and of communications links; 4) structuring of large blocks of resources to permit fast access; and 5) the role of each library in the network and its responsibility to the network.
approach to network design the reference round table of the texas library association and the state advisory council to library services and construction act title iii texas program have been struggling with the challenge of inter-library network design for the past two years. this paper is written to share with reference librarians some of their preliminary findings and to urge the involvement of reference librarians in planning and developing networks and network parameters. for identification the project herein described is referred to as lib-nat, for library network analysis theory. although only the author can be blamed for any faults of this "theory," many persons have contributed to the development of it. the reference round table of the texas library association has provided the forum for exploring and developing ideas on inter-library cooperation. title iii of the library services and construction act has provided the legal and financial impetus enabling the field testing of some of those ideas. texas chapter, special libraries association, has sparked and catalyzed ideas and clarified needs. the state technical services act provided the vehicle for experimental development of new approaches to reference services. southern methodist university provided the haven and ivory tower from which these new approaches could be tried under the cloak of academic respectability. but, of greatest importance of all, individual librarians, with vision and desire to be of service and willingness to try new things, have been the driving force in helping to develop new concepts of library use and purpose in the texas area. lib-nat 159 the basic philosophy back of lib-nat is simply that any person anywhere in the state of texas should have access to any material in any library anywhere in the state through a planned, orderly, effective system that will preserve the autonomy of each library while serving the needs of all the citizens of the state. particular needs of special user groups (such as the blind or the accelerated student or the industrial researcher) should also be identified and provided for in a cooperative mode through local libraries throughout the state. network components in the process of developing lib-nat, twelve critical components were identified that are essential to orderly, planned development of the objectives stated above. as a minimum, such a network must have the following: 1) organizational structure that provides for fiscal and legal responsibility, planning, and policy formulation. it must require commitment, operational agreement and common purpose. 2) collaborative development of resources, including provision for cooperative acquisition of rare and research material and for strengthening local resources for recurrently used material. the development of multi-media resources is essential. 3) identification of nodes that provide for designation of role specialization as well as for geographic configuration. 4) identification of primary patron groups and provision for assignment of responsibility for library service to all citizens within the network. 5) identification of levels of service that provide for basic needs of patron groups as well as special needs, and distribution of each service type among the nodes. there must be provision for "referral" as well as "relay" and for "document" as well as "information" transfer. 
6) establishment of a bi-directional communication system that provides "conversational mode" format and is designed to carry the desired message/document load at each level of operation. 7) common standard message codes that provide for understanding among the nodes on the network. 8) a central bibliographic record that provides for location of needed items within the network. 9) switching capability that provides for interfacing with other networks and determines the optimum communication path within the network. 10) selective criteria of network function, i. e., guidelines of what is to be placed on the network. 11 ) evaluation criteria and procedures to provide feedback from users and operators and means for network evaluation and modification to meet specified operational utility. 160 journal of library automation vol. 2/ 3 september, 1969 12) training programs to provide instruction to users and operators of the system, including instruction in policy and procedures. the foregoing components of the ideal inter-library network (one so designed that any citizen anywhere in the state can have access to the total library and information resources of the state through his local library) may be considered the conceptual model, or the floor plan from which the network of the program can be constructed. although these twelve components might be labeled "ideal," they are achievable and they are within reach of the present capability of all libraries today. they have also weathered the unrelenting critique of 288 reference librarians in the march 27, 1969, tla reference round table ("the 1969 reference round table pre-conference institute: an overview," texas library journal, vol. 45 (summer 1969), no. 2.). during that reference round table the twelve components were tested in a simulated network, using 42 cases. in this behavioral model actual, current inter-library practices were observed during game-playing in the simulated network. the experience verified that the components outlined above are essential to the development of planned, cooperative, inter-library systems. analysis of network transactions as part of the lsca title iii project, and to test the twelve components, exploration was instituted into the existing inter-library relations among eighteen libraries of all types in the dallas area to see how current practices compared with the ideal conceptual model the essential minimum requirement of a library is document transfer, i. e., the ability to supply a known item on request; and on-going inter-library loan transactions are a valid indicator of emerging network patterns in the current environment. this microscopic study of 1967 individual library loans among eighteen libraries of different types has provided a wealth of insight into network developments. as a pilot model it has offered a means of observing and studying existing practices, identifying problems, and experimentally evaluating the effect of changes in the system or environment. more must be known about on-going inter-library transactions for the design of improved networks. in the attempt to find out who was attempting to borrow what from whom and how successfully requests were filled, the following variables were considered: 1) type of library, both borrowing and lending, such as academic, public, special, or public school. 2) type of message format, i. e., telephone, twx, telex, letter, or interlibrary loan. 3) type of item requested in the transaction, such as monograph, serial, map, document. 
4 ) geographic location of borrowing and lending library, i. e., local, area, state, regional, national or international. lib-nat 161 the complexity of even a small pilot model required the formulation of some rigor in the analysis and the development of analytical tools and symbolic models. figure 1, for example, is a symbolic model that permits comparison of two variables simultaneously, e. g., the type of library participating in the transactions and the geographic level of the participants. for modeling purposes, it was assumed all libraries fall into one of four 1 = local 3 = state 2 = area 4 = re ion switching centers fig. 1. symbolic model of inter-library networks. classes represented by the quadrants in figure 1. also it was assumed that each library can be identified as to a specific geographic level, as indicated by the numbers 1 through 6. in the analysis of the pilot model data it was observed that transactions occur among libraries of the same type and at the same geographic level, and between libraries of different types at different geographic levels. figure 1 provides a symbolic model for conceptualizing these various types of transactions. switching centers, represented on figure 1 by the circles around the geographic numbers, participate in transactions at varying geographic levels, as well as between and among various types of library sectors. the role and the location of switching centers is an important aspect of lib-nat. 162 journal of library automation vol. 2/ 3 september, 1969 within the framework of the symbolic model, the simple form of interlibrary loan may be represented as a two-body transaction between the borrowing library and the lending library, as shown in figure 2. applying these transactions on the symbolic model of figure 1 and considering both a b fig. 2. two-body transaction. type of library and geographic level, four general classes of two-body transactions can be identified: 1 ) homogeneous vertical, i. e., between two libraries of the same type but at different geographic levels (pt _..,.. p~; st _..,.. sa) ; 2) heterogeneous horizontal, i. e., between two different types of libraries at different levels ( pt _..,.. a1; st _..,.. p1); 3) heterogeneous vertical, i. e., between two different types of libaries at different levels (pt _ ..,.. a4; sl _..,.. pg); 4) homogeneous horizontal, i.e., between two libraries of the same type and the same geographic level (pt _..,.. pt; s2 .... s2). the formulas serve as a shorthand symbolic representations of some typical transactions of these four classes. the final report on lib-nat will contain statistical data on distribution of pilot model transactions by type and by geographic level, showing type interdependency and geographic dependency or self-sufficiency. further analysis of the pilot model data revealed another type of transaction, the three-body transaction, in which a third agent becomes involved. the third agent may act as a referral center, as illustrated in figure 3, or as a relay center, as illustrated in figure 4 ( sw indicates switching center) . part of the lib-nat theory specifies that there is a distinction between referral and relay, and that the latter is a valid function of a true switching center. figure 5 illustrates the various types of possible three-body transactions with different geographic levels of switching among the different types of libraries. 
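the four two-body classes described above follow mechanically from two comparisons: are the two libraries of the same type, and are they at the same geographic level? the sketch below, with hypothetical type and level codes, illustrates the classification; it is only an illustration of the scheme, not code from the lib-nat project.

```python
# classify a two-body inter-library loan transaction into one of the four
# classes described above, from the type and geographic level of the
# borrowing and lending libraries. type and level codes are hypothetical.

def classify_two_body(borrower_type, borrower_level, lender_type, lender_level):
    kind = "homogeneous" if borrower_type == lender_type else "heterogeneous"
    direction = "horizontal" if borrower_level == lender_level else "vertical"
    return f"{kind} {direction}"

# levels: 1 = local, 2 = area, 3 = state, 4 = region; types: p = public,
# a = academic, s = special.
print(classify_two_body("p", 1, "p", 2))  # homogeneous vertical
print(classify_two_body("p", 1, "a", 1))  # heterogeneous horizontal
print(classify_two_body("p", 1, "a", 4))  # heterogeneous vertical
print(classify_two_body("s", 2, "s", 2))  # homogeneous horizontal
```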
which of these transactions is the most efficient or has the greatest utility is one of the basic design parameters needing further analysis. it should be noted that the variable of message format, that is, the channel of communication or type of communication link, has not yet been investigated in the symbolic modeling of these transactions. fig. 3. three-body transaction: referral. fig. 4. three-body transaction: relay. fig. 5. three-body transactions at various geographic levels. network configuration another very important design parameter is the network configuration or organizational hierarchy specifying the communication channels and message flow pattern. figure 6 illustrates symbolically a non-directed configuration of communication. if each dot represents a node in the network (i.e., a participating library), and each line represents a communication link, it can be seen that each node can communicate directly with every other node, providing (or requiring) a total of fifteen links among the six nodes: c = n(n-1)/2 = 15. fig. 6. non-directed network. by contrast, figure 7 illustrates a directed configuration in which the six nodes are interconnected through a switching center, requiring only six channel links (c = n = 6). in like manner, if a non-directed network desires to interface with a specialized center, such as the library of congress or a special bibliographic center or search center, a total of twenty-one channels is required (figure 8), whereas a directed network can interface with a specialized center via only seven channels, as illustrated in figure 9. fig. 7. directed network. fig. 8. non-directed network including specialized center. fig. 9. directed network including specialized center. as local or area networks begin to develop, there will be a need for tying together two area networks to develop larger units of service. the interfacing of an original network of six libraries in one area with an adjoining area network of six libraries will result in the network configuration shown in figure 10 in the case of a non-directed network, and sixty-six communication links among twelve nodes will be required. whereas, if two directed networks of six libraries each desire to interface, a type of linkage requiring only thirteen channels may be envisioned (figure 11). which is the best type of network configuration? what are the decision parameters that should be considered in designing or planning network configuration? how can alternate configurations be evaluated? alternate channel requirements? and alternate geographic levels of switching? in the pilot model study, a mathematical model has been devised which can be used for simulating various configurations and channel capacities, thereby permitting some desired criteria function of network performance to be maximized or optimized. fig. 10. interface of two non-directed networks. fig. 11. interface of two directed networks.
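the channel counts cited above (fifteen, six, twenty-one, seven, sixty-six, and thirteen) all follow from two simple rules: a non-directed network of n nodes needs n(n-1)/2 links, while a directed network needs only one link per node into its switching center. the short sketch below reproduces those figures; it illustrates the arithmetic only and is not part of the lib-nat mathematical model.

```python
# channel (link) requirements for the network configurations of figures 6-11.

def nondirected_links(n):
    # every node communicates directly with every other node
    return n * (n - 1) // 2

def directed_links(n):
    # every node communicates only with a single switching center
    return n

n = 6
print(nondirected_links(n))          # 15 -- figure 6
print(directed_links(n))             # 6  -- figure 7
print(nondirected_links(n) + n)      # 21 -- figure 8: each node also linked to a specialized center
print(directed_links(n) + 1)         # 7  -- figure 9: only the switching center links to the specialized center
print(nondirected_links(2 * n))      # 66 -- figure 10: two six-node networks fully interconnected
print(2 * directed_links(n) + 1)     # 13 -- figure 11: two directed networks joined through their switching centers
```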
the details of the mathematical model will be published as part of the final report on lib-nat; in the meantime it can be said that this is a fascinating area of network analysis which will be useful to any group of libraries planning network configurations. the mathematical model, a multi-commodity, multi-channel, capacitated network model developed by dr. richard nance at southern methodist university as part of the title iii project, promises to have a high potential application in network design and performance evaluation. it does require that the librarian make some hard-nosed decisions on operational and performance parameters of the inter-library systems discussed in the preceding article, but this is part of the challenge of lib-nat. measures of participation it is obvious that types of libraries, geographic level, types of transactions, various network configurations, alternate communication links and switching levels are all important in planning inter-library systems. next it is necessary to take an in-depth look at the relationship between the individual participating library and the total network. in the pilot model study of eighteen libraries a noticeable difference appeared in the magnitude and type of participation. in surveying only the two-body transactions, it was observed that some libraries were primarily borrowers and others primarily lenders, and some were heavy and some light. in pursuit of a quantitative method of representing these relationships, some formulae were evolved which are helpful in understanding node/network dynamics. starting with the individual library or node, let bn equal the number of borrowing transactions originating at that node and ln equal the number of lending transactions; then ln plus bn will equal the total number of all transactions at that particular node. in like manner, looking at the total network (in this case all eighteen participating libraries), let bt equal the total number of borrowing transactions originating in the network and lt the total number of lending transactions; then bt plus lt will equal the total number of both types of transactions in the network. in the analysis of node/network dynamics, it was felt there should be some way of quantitatively expressing the individual node's dependency on the total network and also a way of expressing the relative degree of activity of each node. in other words, a participating library that was a net borrower (compared to its lending) was obviously more dependent on the network than would be a library that borrowed very little compared to its lending. the extent of dependency can be expressed as a node dependency coefficient, calculated as bn / (bn + ln), the relative amount of borrowing compared to total node transactions. among its other uses, the dependency coefficient of a node may give some insight into the extent to which it should share in network expenses, but the dependency coefficient alone should not be a final criterion, since magnitude of activity is of equal importance.
for developing a method of quantitatively expressing activity of a node compared to total activity of the network, a factor called the node activity coefficient may be calculated as (bn + ln) / (bt + lt), the relative activity of both types at one node compared to total activity in the total network. then, to quantitatively express the dependency of a given node on the network, one can calculate the node/network dependency coefficient, as shown in figure 12. fig. 12. node dependency coefficient, b / (b + l), plotted against total node transactions, b + l; values greater than 0.5 indicate a net borrower, values less than 0.5 a net lender.
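the two coefficients just defined are simple ratios of transaction counts and can be computed directly from interlibrary loan tallies. the sketch below uses made-up counts for three hypothetical nodes rather than the pilot-model data, and is offered only as an illustration of the calculation.

```python
# node dependency and node activity coefficients for a small, made-up network.
# borrow[i] and lend[i] are the borrowing and lending counts at node i.

borrow = {"node a": 120, "node b": 40, "node c": 15}
lend   = {"node a": 30,  "node b": 95, "node c": 15}

# bt + lt: total of both transaction types across the whole network
network_total = sum(borrow.values()) + sum(lend.values())

for node in borrow:
    bn, ln = borrow[node], lend[node]
    dependency = bn / (bn + ln)            # bn / (bn + ln): above 0.5 means a net borrower
    activity = (bn + ln) / network_total   # (bn + ln) / (bt + lt): the node's share of all traffic
    print(f"{node}: dependency = {dependency:.2f}, activity = {activity:.2f}")
```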
since the calculated q value is greater than the critical q value of 0.2241, the null hypothesis is rejected and it must be accepted that 15,123 is an outlier. once it is determined with statistical certainty that the suspected outlier is indeed an outlier, it needs to be replaced with the median calculated from all values found in dataset 2. for the case of polymer, the median was calculated to be 27 from all values in table 2. replacing an outlier with the median to accommodate the data has been proven to be quite effective in dealing with outliers by introducing less distortion to that dataset.39 extreme values are therefore replaced with values more consistent with the rest of the data.40

              jan  feb  mar  apr  may  jun  jul  aug  sep  oct  nov  dec
polymer 2009   27   14   35   22   15   28   24   19   11    8   13    7
polymer 2010   12   15   26   33   38   64   39    5   13   27  109   44
polymer 2011  113  159  638  345   52   57   94   70   39   36  221   65
polymer 2012  130    4   98   24   27   18   13   16   18   25    9    5

table 3. the identified outlier is replaced with the median (highlighted in bold).

table 3 represents the number of full-text articles downloaded for polymer after the outlier had been replaced with the median. the confirmed outlier of 15,123 articles downloaded recorded in october 2010 is replaced with the median of 27, highlighted in bold. this then becomes the accepted value for the number of articles downloaded from polymer in october 2010. the outlier is discarded. the new value of 27 articles downloaded in october 2010 replaces the extreme value of 15,123 in the original 2010 jr1 report (see table 4). this is the final step.

                                               jan  feb  mar  apr  may    jun  jul  aug  sep    oct    nov  dec
polymer                                         12   15   26   33   38     64   39    5   13     27    109   44
surface and coatings technology                  3    1    2    1   22     17   17    0   12  3,771  5,428  601
international journal of radiation oncology    11   18   35   22   17  6,436  176   13   25     29     24   19
journal of catalysis                             0    1    5    1    2      2   16    4    0      2  6,693    1

table 4. sample from a 2010 jr1 counter-compliant report indicating the number of articles downloaded per journal over a twelve-month period. polymer's identified outlier is replaced with the median calculated from table 2 (highlighted in bold).

once the first outlier is corrected, the same procedures need to be followed for the other suspected outliers highlighted in table 1. if it is determined that they are outliers, they are replaced with their associated median values. although the steps and calculations used to identify and correct for outliers are relatively simple to follow, it is admittedly a very lengthy and time-consuming process. but in the end, it is well worth the effort.
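as a rough illustration of the test-and-replace step just described, the sketch below applies the simplest form of the dixon ratio (the gap between the suspect value and its nearest neighbour, divided by the range) to the polymer values and substitutes the median when the ratio exceeds the critical value. the 0.2241 critical value is simply the figure quoted above; in practice the appropriate dixon variant and critical value for the sample size should be taken from the published tables cited in the references.

```python
import statistics

# polymer full-text downloads, 2009-2012 (dataset 2), with the suspect
# october 2010 value of 15,123 still in place.
polymer = {
    2009: [27, 14, 35, 22, 15, 28, 24, 19, 11, 8, 13, 7],
    2010: [12, 15, 26, 33, 38, 64, 39, 5, 13, 15123, 109, 44],
    2011: [113, 159, 638, 345, 52, 57, 94, 70, 39, 36, 221, 65],
    2012: [130, 4, 98, 24, 27, 18, 13, 16, 18, 25, 9, 5],
}

values = sorted(v for year in polymer.values() for v in year)
suspect = values[-1]                        # 15,123, the largest value
gap = suspect - values[-2]                  # distance to its nearest neighbour (638)
q_calc = gap / (values[-1] - values[0])     # simple dixon ratio: gap / range

Q_CRITICAL = 0.2241                         # critical value quoted in the text

if q_calc > Q_CRITICAL:
    # the suspect is an outlier: replace it with the median of dataset 2 (27)
    replacement = statistics.median(values)
    corrected_2010 = [replacement if v == suspect else v for v in polymer[2010]]
    print(f"q = {q_calc:.3f} > {Q_CRITICAL}: outlier replaced with {replacement}")
    print(corrected_2010)
else:
    print(f"q = {q_calc:.3f} <= {Q_CRITICAL}: value retained")
```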
results and discussion table 5 details the changes in the overall number of articles downloaded from j. n. desmarais library e-journals that resulted from the elimination of outliers. the column titled "recorded downloads" details the number of articles downloaded between 2000 and 2012, inclusively, prior to outlier testing. the column titled "corrected downloads" represents the number of articles downloaded during the same period of time but after the outliers had been positively identified and the data cleaned. the affected values are highlighted in bold.

year   recorded downloads   corrected downloads
2000         806                  806
2001        1034                 1034
2002        1015                 1015
2003        4890                 4890
2004       72841                72841
2005      251335               251335
2006      640759               640759
2007      731334               731334
2008      710043               710043
2009      725019               725019
2010      857360               757564
2011      869651               696973
2012      716890               716890

table 5. comparison of the recorded number of articles downloaded to the corrected number of articles downloaded, over a thirteen-year period.

all data from all available years were tested for outliers. only data recorded in 2010 and 2011 tested positive for outliers. replacing outliers with the median values for those affected journal titles dramatically reduced the total number of downloaded articles (see table 5). between 2007 and 2009, inclusively, the actual number of full-text articles downloaded recorded from the library's e-journal collection totaled between 731,334 and 725,019 annually (see table 5). the annual average for those three years is 722,132 articles downloaded. but in 2010 that number dramatically increased to 857,360 downloaded articles, which was followed by 869,651 downloaded articles in 2011 (see table 5). the elimination of outliers from the 2010 data resulted in the number of downloads dropping from 857,360 to 757,564, a difference of 99,796 downloads, or 12 percent. similarly, in 2011, the number of articles downloaded decreased from 869,651 to 696,973 once outliers were replaced with median values. this represents a reduction of 172,678 downloaded articles, or 20 percent. a staggering 20 percent of articles downloaded in 2011 can therefore be considered as erroneous and, in all likelihood, the result of illicit downloading. figure 1 is a graphical representation of the change in the number of articles downloaded before and after the identification of outliers and their replacement by median values. the line "recorded downloads" clearly indicates a surge in usage between 2010 and 2011, with usage returning to levels recorded prior to the 2010 increase. the line "corrected downloads" depicts a very different picture: the plateau in usage that began in 2007 continues through 2012. evidently, the observed spike in usage was artificial and the result of the presence of outliers in certain datasets. if the data had not been tested for outliers, it would have appeared that usage had substantially increased in 2010 and it would have been incorrectly assumed that usage was on the rise once more. instead, the corrected data bring usage levels for 2010 and 2011 back in line with the plateau that had begun in 2007 and reflect a more realistic picture of usage rates at laurentian university. figure 1. comparing the recorded number of articles downloaded to the corrected number of articles downloaded over a thirteen-year period. accuracy in any data gathering is always extremely important, but accuracy in e-resource usage levels is critical for academic libraries. academic libraries having e-journal subscription rates based either entirely or partly on usage can be greatly affected if usage numbers have been artificially inflated. it can lead to unnecessary increases in cost. since it was determined that outliers were present only during the period in which the library had found itself under "attack," it can be assumed that the vast majority, if not all, of the extreme usage values were a result of illegal downloading.
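as a quick check, the 12 percent and 20 percent figures quoted above follow directly from the recorded and corrected totals in table 5:

```python
# quick check of the reductions reported above, from the table 5 totals.
recorded  = {2010: 857_360, 2011: 869_651}
corrected = {2010: 757_564, 2011: 696_973}

for year in recorded:
    diff = recorded[year] - corrected[year]
    share = 100 * diff / recorded[year]
    print(f"{year}: {diff:,} downloads removed ({share:.0f}% of the recorded total)")
# 2010: 99,796 downloads removed (12% of the recorded total)
# 2011: 172,678 downloads removed (20% of the recorded total)
```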
it would therefore be a shame to need to pay higher costs because of inappropriate or illegal downloading of licensed content. accurate usage data is also important for academic libraries that integrate usage statistics into their collection development policy for the purpose of justifying the retention or cancellation of a particular subscription. the j. n. desmarais library is such a library. as indicated earlier, if the cost-per-download of a subscription is consistently greater than the cost of an interlibrary loan for three or more years, it is marked for cancellation. at the j. n. desmarais library, the average cost of an interlibrary loan had been previously calculated to be approximately can$15.00.42 therefore, subscriptions recording a “cost-per-download” greater than the can$15.00 target for more than three years can be eliminated from the collection. information technology and libraries | june 2014 40 any artificial increase in the number of downloads would have as result to artificially lower the cost-per-use ratio. this would reinforce the illusion that a particular subscription was used far more than it really was and lead to the false belief that it would be less expensive to retain rather than rely on interlibrary loan services. the true cost-per-use ratio may be far greater than initially calculated. the unnecessary retention of a subscription could prevent the acquisition of another, more relevant, one. for example, after adjusting the number of articles downloaded from sciencedirect in 2011, the cost-per-download ratio increased from can$0.74 to can$1.59, a 53 percent increase. for the j. n. desmarais library, this package was obviously not in jeopardy of being cancelled. but a 53 percent change in the cost-per-use ratio for borderline subscriptions would definitely have been affected. it must also be stated that none of the library’s subscriptions having experienced extreme downloading found themselves in the position of being cancelled after the usage data had been corrected for outliers. regardless, it is important to verify all usage data prior to any data analysis to identify and correct for outliers. once the outlier detection investigation has been completed and any extreme values replaced by the median, there would be no further need to manipulate the data in such a fashion. the identification of outliers is a one-time procedure. the corrected or cleaned datasets would then become the official datasets to be used for any further usage analyses. conclusions outliers can have a dramatic effect on the analysis of any dataset. as demonstrated here, the presence of outliers can lead to the misrepresentation of usage patterns. they can artificially inflate average values and introduce severe distortion to any dataset. fortunately, they are fairly easy to identify and remove. the following steps were used to identify outliers in jr1 countercompliant reports: 1. identify possible outliers: visually inspect the values recorded in a jr1 report dataset (dataset 1) and mark any extreme values. 2. for each suspected outlier identified, take the usage values for the affected e-journal title and incorporate them into a separate blank spreadsheet (dataset 2). incorporate into dataset 2 all other usage values for the affected journal from all available years. it is important that dataset 2 contain only those values for the affected journal. 3. test for the outlier: perform dixon q test on the suspected outlier to confirm or disprove existence of the outlier. 4. 
if the suspected outlier tests as positive, calculate the median of dataset 2. 5. replace the outlier in dataset 1 with the median calculated from dataset 2. 6. perform steps 1 through 5 for any other suspected outliers in dataset 1. 7. the corrected values in dataset 1 will become the official values and will be used for all subsequent usage data analysis. the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 41 the identification and removal of outliers had a noticeable effect on the usage statistics for j. n. desmarais library’s e-journal collection. outliers represented over 100,000 erroneous downloaded articles in 2010 and nearly 200,000 in 2011. a total of 20 percent of recorded downloads in 2011 were anomalous, and in all likelihood a result of illicit downloading after laurentian university’s ezproxy server was breached. new technologies have made digital content easily available on the web, which has caused serious concern for both publishers43 and institutions of higher learning, which have been experiencing an increase is illicit attacks.44 the history of napster supports the argument that users “will freely steal content when given the opportunity.”45 since web robot traffic will continue to grow in pace with the internet, it is critical that this traffic be factored into the performance and protection of any web servers.46 references 1. victoria j. hodge and jim austin, “a survey of outlier detection methodologies,” artificial intelligence review 85 (2004): 85–126, http://dx.doi.org/10.1023/b:aire.0000045502.10941.a9; patrick h. menold, ronald k. pearson, and frank allgöwer, “online outlier detection and removal,” in proceedings of the 7th mediterranean conference on control and automation (med99) haifa, israel—june 28-30, 1999 (haifa, israel: ieee, 1999): 1110–30. 2. hodge and austin, “a survey of outlier detection methodologies,” 85–126. 3. vic barnett and toby lewis, outliers in statistical data (new york: wiley, 1994). 4. hodge and austin, “a survey of outlier detection methodologies,” 85–126; r. s. witte and j. s. witte, statistics (new york: wiley, 2004); menold et al., “online outlier detection and removal,” 1110–30. 5. menold et al., “online outlier detection and removal,” 1110–30. 6. hodge and austin, “a survey of outlier detection methodologies,” 85–126. 7. laurentian university (sudbury, canada) is classified as a medium multi-campus university. total 2012 full-time student population was 6,863, of which 403 were enrolled in graduate programs. in addition, 2012 part-time student population was 2,652 with 428 enrolled in graduate programs. also in 2012, the university employed 399 full-time teaching and research faculty members. academic programs cover a multiple of fields in the sciences, social sciences, and humanities and offers 60 undergraduate, 17 master’s, and 7 doctoral degrees. 8. alain r. lamothe, “factors influencing usage of an electronic journal collection at a mediumsize university: an eleven-year study,” partnership: the canadian journal of library and information practice and research 7, no. 1 (2012), https://journal.lib.uoguelph.ca/index.php/perj/article/view/1472#.u36phvmsy0j. https://journal.lib.uoguelph.ca/index.php/perj/article/view/1472#.u36phvmsy0j information technology and libraries | june 2014 42 9. ben tremblay, “web bot—what is it? 
can it predict stuff?” daily common sense: scams, science and more (blog), january 24, 2008, http://www.dailycommonsense.com/web-botwhat-is-it-can-it-predict-stuff/. 10. derek doran and swapna s. gokhale, “web robot detection techniques: overview and limitations,” data mining and knowledge discovery 22 (2011): 183–210, http://dx.doi.org/10.1007/s10618-010-0180-z. 11. c. lee giles, yang sun, and isaac g. councill, “measuring the web crawler ethics,” in www 2010 proceedings of the 19th international conference on world wide web (raleigh, nc: international world wide web conferences steering committee, 2010): 1101–2, http://dx.doi.org/10.1145/17772690.1772824. 12. shinil kwon, kim young-gab, and sungdeok cha, “web robot detection based on patternmatching technique,” journal of information science 38 (2012): 118–26, http://dx.doi.org/10.1177/0165551511435969. 13. david watson, “the evolution of web application attacks,” network security (2007): 7–12, http://dx.doi.org/10.1016/s1353-4858(08)70039-4. 14. eric kin-wai lau, “factors motivating people toward pirated software,” qualitative market research 9 (2006): 404–19, http://dx.doi.org/1108/13522750610689113. 15. huan-chueh wu et al., “college students’ misunderstanding about copyright laws for digital library resources,” electronic library 28 (2010): 197–209, http://dx.doi.org/10.1108/02640471011033576. 16. ibid. 17. ibid. 18. emma mcculloch, “taking stock of open access: progress and issues,” library review 55 (2006): 337–43; c. patra, “introducing e-journal services: an experience,” electronic library 24 (2006): 820–31. 19. wu et al., “college students’ misunderstanding about copyright laws for digital library resources,” 197–209. 20. ibid. 21. vincent j. calluzzo and charles j. cante, “ethics in information technology and software use,” journal of business ethics 51 (2004): 301–12, http://dx.doi.org/10.1023/b:busi.0000032658.12032.4e. 22. s. l. solomon and j. a. o’brien “the effect of demographic factors on attitudes toward software piracy,” journal of computer information systems 30 (1990): 41–46. 23. j. n. desmarais library, “collection development policy” (sudbury, on: laurentian university, 2013), http://www.dailycommonsense.com/web-bot-what-is-it-can-it-predict-stuff/ http://www.dailycommonsense.com/web-bot-what-is-it-can-it-predict-stuff/ http://dx.doi.org/10.1007/s10618-010-0180-z http://dx.doi.org/10.1145/17772690.1772824 http://dx.doi.org/10.1177/0165551511435969 http://dx.doi.org/10.1016/s1353-4858(08)70039-4 http://dx.doi.org/1108/13522750610689113 http://dx.doi.org/10.1108/02640471011033576 http://dx.doi.org/10.1023/b:busi.0000032658.12032.4e the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 43 http://biblio.laurentian.ca/research/sites/default/files/pictures/collection%20development %20policy.pdf. 24. lamothe, “factors influencing usage”; alain r. lamothe, “electronic serials usage patterns as observed at a medium-size university: searches and full-text downloads,” partnership: the canadian journal of library and information practice and research 3, no. 1 (2008), https://journal.lib.uoguelph.ca/index.php/perj/article/view/416#.u364kvmsy0i. 25. martin zimerman, “e-books and piracy: implications/issues for academic libraries,” new library world 112 (2011): 67–75, http://dx.doi.org/10.1108/03074801111100463. 26. ibid. 27. 
27. peggy hageman, "ebooks and the long arm of the law," econtent (june 2012), http://www.econtentmag.com/articles/column/ebookworm/ebooks-and-the-long-arm-of-the-law--82976.htm.
28. "dataset, n.," oed online (oxford, uk: oxford university press, 2013), http://www.oed.com/view/entry/261122?redirectedfrom=dataset; "dataset—definition," ontotext, http://www.ontotext.com/factforge/dataset-definition; w. paul vogt, "data set," dictionary of statistics and methodology: a nontechnical guide for the social sciences (london, uk: sage, 2005); allan g. bluman, elementary statistics—a step by step approach (boston: mcgraw-hill, 2000).
29. david b. rorabacher, "statistical treatment for rejection of deviant values: critical values of dixon's 'q' parameter and related subrange ratios at the 95% confidence level," analytical chemistry 63 (1991): 139–45; r. b. dean and w. j. dixon, "simplified statistics for small numbers of observations," analytical chemistry 23 (1951): 636–38, http://dx.doi.org/10.1021/ac00002a010.
30. surenda p. verma and alfredo quiroz-ruiz, "critical values for six dixon tests for outliers in normal samples up to sizes 100, and applications in science and engineering," revista mexicana de ciencias geologicas 23 (2006): 133–61.
31. robert r. sokal and f. james rohlf, biometry (new york: freeman, 2012); j. h. zar, biostatistical analysis (upper saddle river, nj: prentice hall, 2010).
32. "null hypothesis," accessscience (new york: mcgraw-hill education, 2002), http://www.accessscience.com.
33. ibid.
34. "critical value," accessscience (new york: mcgraw-hill education, 2002), http://www.accessscience.com.
35. ibid.
36. verma and quiroz-ruiz, "critical values for six dixon tests for outliers," 133–61.
37. rorabacher, "statistical treatment for rejection of deviant values," 139–45.
38. ibid.
39. jaakko astola and pauli kuosmanen, fundamentals of nonlinear digital filtering (new york: crc, 1997); jaakko astola, pekka heinonen, and yrjö neuvo, "on root structures of median and median-type filters," ieee transactions on acoustics, speech, and signal processing 35 (1987): 1199–201; l. ling, r. yin, and x. wang, "nonlinear filters for reducing spiky noise: 2-dimensions," ieee international conference on acoustics, speech, and signal processing 9 (1984): 646–49; n. j. gallagher and g. wise, "a theoretical analysis of the properties of median filters," ieee transactions on acoustics, speech, and signal processing 29 (1981): 1136–41.
40. menold et al., "online outlier detection and removal," 1110–30.
41. ibid.
42. lamothe, "factors influencing usage"; lamothe, "electronic serials usage patterns."
43. paul gleason, "copyright and electronic publishing: background and recent developments," acquisitions librarian 13 (2001): 5–26, http://dx.doi.org/10.1300/j101v13n26_02.
44. tena mcqueen and robert fleck jr., "changing patterns of internet usage and challenges at colleges and universities," first monday 9 (2004), http://firstmonday.org/issues/issue9_12/mcqueen/index.html.
45. robin peek, "controlling the threat of e-book piracy," information today 18, no. 6 (2001): 42.
46. gleason, "copyright and electronic publishing," 5–26.

assessing sufficiency and quality of bandwidth for public libraries

john carlo bertot and charles r. mcclure

john carlo bertot (jbertot@fsu.edu) is the associate director of the information use management and policy institute and professor at the college of information, florida state university; charles r. mcclure (cmcclure@ci.fsu.edu) is the director of the information use management and policy institute (www.ii.fsu.edu) and francis eppes professor of information studies at the college of information, florida state university.

based on data collected as part of the 2006 public libraries and the internet study, the authors assess the degree to which public libraries provide sufficient and quality bandwidth to support the library's networked services and resources. the topic is complex due to the arbitrary assignment of a number of kilobits per second (kbps) used to define bandwidth. such arbitrary definitions of bandwidth sufficiency and quality are not useful. public libraries are indeed connected to the internet and do provide public-access services and resources. it is, however, time to move beyond connectivity type and speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries. a secondary but important issue is the extent to which libraries, particularly in rural areas, have access to broadband telecommunications services.

the biennial public libraries and the internet studies, conducted since 1994, describe public library involvement with and use of the internet.1 over the years, the studies showed the growth of public-access computing (pac) and internet access provided by public libraries to the communities they serve. internet connectivity rose from 20.9 percent to essentially 100 percent in less than ten years; the average number of public access computers per library increased from an average of two to nearly eleven; and bandwidth rose to the point where 63 percent of public libraries had connection speeds of greater than 769kbps (kilobits per second) in 2006. this dramatic growth occurred amid related information technology challenges as well as the budgetary and staffing challenges that public libraries face in maintaining traditional services alongside networked services.

one challenge is the question of bandwidth sufficiency and quality. the question is complex because typically an arbitrary number describes the number of kbps used to define "broadband." as will be seen in this paper, such arbitrary definitions of bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term "high speed" for connections of 200kbps in at least one direction.2 there are three problematic issues with this definition:
1. it specifies unidirectional bandwidth, meaning that a connection with a 200kbps download but a much slower upload (e.g., 56kbps) would fit this definition;
2. regardless of direction, bandwidth of 200kbps is neither high speed nor sufficient to allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly; and
3. the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high-use, multiple-workstation public-access context.

in addition to connectivity speed, there are many questions related to public library pac and internet access that can affect bandwidth sufficiency—from budget and sustainability, staffing and support, to the services public libraries offer through their technology infrastructure, and the impacts of connectivity and pac on the communities that libraries serve. one key question, however, is what is quality pac and internet bandwidth for public libraries? and, in attempting to answer that question, what are measures and benchmarks of quality internet access? this paper provides data from the 2006 public libraries and the internet study to foster discussion and debate around determining quality pac and internet access.3 bandwidth and connectivity data at the library outlet or branch level are presented in this article. the bandwidth measures are not systemwide but rather at the point of service delivery in the branch.

■ the bandwidth issue

there are a number of factors that affect the sufficiency and quality of bandwidth in a pac and internet service context. examples of factors that influence actual speed include:
■ number of workstations (public-access and staff) that simultaneously access the internet;
■ provision of wireless access that shares the same connection;
■ ultimate connectivity path—that is, a direct connection to the internet that is truly direct, or one that goes through regional or other local hops (that may have aggregated traffic from other libraries or organizations) out to the internet;
■ type of connection and bandwidth that the telecommunications company is able to supply the library;
■ operations (surfing, e-mail, downloading large files, streaming content) being performed by users of the internet connection;
■ switching technologies;
■ latency effects that affect packet loss, jitter, and other forms of noise throughout a network;
■ local settings and parameters, known or unknown, that impede transmission or bog down the delivery of internet-based content;
■ range of networked services (databases, videoconferencing, interactive/real-time services) to which the library is linked;
■ if networked, the speed of the network on which the public-access workstations reside; and
■ general application resource needs, protocol priority, and other general factors.
thus, it is difficult to precisely answer "how much bandwidth is enough" within an evolving and dynamic context of public access, use, and infrastructure. putting public-access internet use into a more typical application-and-use scenario, however, may provide some indication of adequate bandwidth. for example:
■ a typical three-minute digital song is 3mb;
■ a typical digital photo is about 2mb; and
■ a typical powerpoint presentation is about 10mb.

if one person in a public library were to e-mail a powerpoint presentation at the same time that another person downloaded multiple songs, and another was exchanging multiple pictures, even a library with a t1 line (1.5mbps—megabits per second) would experience a temporary network slowdown during these operations. this does not take into account many other new high-bandwidth-consuming applications such as cnn's streaming-video channel; uploading and accessing content on a wiki, blog, or youtube.com; or streaming content such as cbs's webcast of the 2006 ncaa basketball tournament. an increasingly used technology in various settings is two-way internet-based video conferencing. with an installed t1 line, a library could support two 512kbps or three 384kbps videoconferences, depending on the amount of simultaneous traffic on the network—which, in a public access context, would be heavy. indeed, the 2006 public libraries and the internet study indicated near continuous use of public-access workstations by patrons (only 14.6 percent of public libraries indicated that they always had a sufficient number of workstations available for patron use). public libraries increasingly serve as access points to e-government services and resources (e.g., social services, disaster relief, health care).4 these services can range from the simple completion of a web-based form (low bandwidth consumption) to more interactive services (high bandwidth consumption). and, as access points to continuing education and online degree programs, public libraries need to offer adequate broadband to enable users to access services and resources that increasingly depend on streaming technologies that consume greater bandwidth.

■ bandwidth and pac in public libraries today

as table 1 demonstrates, public libraries continue to increase their bandwidth, with 63.3 percent of public libraries reporting connection speeds of 769kbps or greater. this compares to 47.7 percent of public libraries reporting connection speeds of greater than 769kbps in 2004. there are disparities between rural and urban public libraries, with rural libraries reporting substantially fewer instances of connection speeds of greater than 1.5mbps in 2006. on the one hand, the increase in connectivity speeds between 2004 and 2006 is a positive step. on the other, 16.1 percent of public libraries report that their connection speeds are insufficient to meet patron demands all of the time, and 29.4 percent indicate that their connection speeds are insufficient to meet patron demands some of the time. thus, nearly half of public libraries indicate that their connection speeds are insufficient to meet patron demands some or all of the time. in terms of public access computers, the average number of workstations that public libraries provide is 10.7 (table 2). urban libraries have an average of 17.1 workstations, as compared to rural libraries, which report an average of 7.1 workstations.
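editor's illustration: to make the arithmetic behind the t1-line scenario above concrete, the short sketch below converts the quoted file sizes into transfer times over a t1 connection and checks how many fixed-rate videoconference streams the same line can carry. the file sizes and stream rates come from the paragraphs above; the t1 capacity of roughly 1.544 megabits per second and the unit conversions are general networking facts, not figures from the study.

```python
# Editorial sketch: unit conversions behind the T1-line scenarios above.
# A T1 line carries about 1.544 megabits per second (Mbps); file sizes are quoted
# in megabytes (MB), and each megabyte is 8 megabits.

T1_MBPS = 1.544          # T1 capacity, megabits per second
MBITS_PER_MB = 8         # megabits per megabyte

files_mb = {"three-minute song": 3, "digital photo": 2, "PowerPoint presentation": 10}

for name, size_mb in files_mb.items():
    seconds = size_mb * MBITS_PER_MB / T1_MBPS
    print(f"{name}: {size_mb} MB takes about {seconds:.0f} s on an otherwise idle T1")

# If all three transfers run at once they contend for the same 1.544 Mbps, so the
# link stays saturated for the combined duration -- the "temporary slowdown" noted above.
total_mb = sum(files_mb.values())
print(f"all three together: about {total_mb * MBITS_PER_MB / T1_MBPS:.0f} s of saturated link")

# Videoconferencing: raw number of fixed-rate streams that fit on one T1.
# The article quotes two 512 kbps or three 384 kbps conferences because it leaves
# headroom for other simultaneous traffic on the same connection.
for stream_kbps in (512, 384):
    streams = int(T1_MBPS * 1000 // stream_kbps)
    print(f"{stream_kbps} kbps streams that fit on a dedicated T1: {streams}")
```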
a closer look at bandwidth and pac for the next sections, the data offer two key views for analysis purposes: (1) workstations—divided into libraries with ten or fewer public­access workstations and libraries with more than ten public­access worksta­ tions (given that the average number of public­access workstations in libraries is roughly ten); and (2) band­ width—divided into libraries with 769kbps or less and libraries with greater than 769kbps (an arbitrary indicator of broadband for a public library context). in looking across bandwidth and public­access work­ stations (table 3), overall 31.8 percent of public libraries have connection speeds of less than 769kbps while 63.3 percent have connection speeds of greater than 769kbps. a majority of public libraries—68.5 percent—have ten or fewer workstations, while 30.9 percent have more than ten workstations. in general, rural libraries have fewer workstations and lower bandwidth as compared to sub­ urban and urban libraries. indeed, 75.2 percent of urban 16 information technology and libraries | march 200716 information technology and libraries | march 2007 libraries with fewer than ten workstations have connec­ tion speeds of greater than 769kbps, as compared to 45.2 percent of rural libraries. when examining pac capacity, it is clear that public libraries have capacity issues at least some of the time in a typical day (tables 4 through 6). only 14.6 percent of public libraries report that they have sufficient numbers of workstations to meet patron demands at all times (table 6), while nearly as many, 13.7 percent, report that they consistently are unable to meet patron demands for public­access workstations (table 4). a full 71.7 percent indicate that they are unable to meet patron demands during certain times in a typical day (see table 5). in other words, 85.4 percent of public libraries report that they are unable to meet patron demand for public­access workstations some or all of the time during a typical day—regardless of number of workstations available and type of library. the disparities between rural and urban libraries are notable. in general, urban libraries report more difficulty in meeting patron demands for public­access workstations. of urban public libraries, 27.8 percent report that they consistently have difficulty in meeting patron demand for workstations, as compared to 11.0 percent of suburban and 10.6 percent of rural public libraries (table 4). by contrast, 6.6 percent of urban libraries report sufficient workstations to meet patron demand all the time as compared to 18.9 percent of rural libraries (table 6). when reviewing the adequacy of speed of connectiv­ ity data by the number of workstations, bandwidth, and metropolitan status, a more robust and descriptive pic­ table 1. 
public library outlet maximum speed of public-access internet services by metropolitan status and poverty metropolitan status poverty level maximum speed urban suburban rural low medium high overall less than 56kbps 0.7% ±0.8% (n=18) 0.4% ±0.6% (n=17) 3.7% ±1.9% (n=275) 2.0% ±1.4% (n=245) 2.7% ±1.6% (n=61) 2.6% ±1.6% (n=5) 2.1% ±1.4% (n=311) 56kbps– 128kbps 2.5% ±1.6% (n=67) 5.4% ±2.3% (n=264) 15.2% ±3.6% (n=1,132) 9.9% ±3.0% (n=1,237) 9.5% ±2.9% (n=216) 5.3% ±2.2% (n=10) 9.8% ±3.0% (n=1,463) 129kbps– 256kbps 2.7% ±1.6% (n=72) 6.8% ±2.5% (n=332) 11.1% ±3.1% (n=829) 8.5% ±2.8% (n=1,067) 7.3% ±2.6% (n=166) 8.2% ±2.8% (n=1,233) 257kbps–768kbps 9.1% ±2.9% (n=241) 10.4% ±3.1% (n=504) 13.4% ±3.4% (n=1,002) 12.5% ±3.3% (n=1,557) 8.4% ±2.8% (n=190) 11.7% ±3.2% (n=1,747) 769kbps– 1.5mbps 33.6% ±4.7% (n=889) 40.0% ±4.9% (n=1,945) 31.0% ±4.6% (n=2,310) 34.3% ±4.8% (n=4,286) 34.6% ±4.8% (n=788) 38.1% ±4.9% (n=70) 34.4% ±4.8% (n=5,144) greater than 1.5mbps 49.4% ±5.0% (n=1,304) 31.6% ±4.7% (n=1,533) 19.9% ±4.0% (n=1,488) 27.4% ±4.5% (n=3,423) 35.5% ±4.8% (n=808) 50.5% ±5.0% (n=93) 28.9% ±4.5% (n=4,324) don’t know 1.9% ±1.4% (n=50) 5.4% ±2.3% (n=263) 5.7% ±2.3% (n=427) 5.5% ±2.3% (n=685) 2.1% ±1.4% (n=48) 3.5% ±1.8% (n=6) 4.9% ±2.2% (n=739) weighted missing values, n=1,497 table 2. average number of public library outlet graphical publicaccess internet terminals by metropolitan status and poverty* poverty level metropolitan status low medium high overall urban 14.7 20.9 30.7 17.9 suburban 12.8 9.7 5.0 12.6 rural 7.1 6.7 8.1 7.1 overall 10.0 13.3 26.0 10.7 * note that most library branches defined as “high poverty” are in general part of library systems with multiple branches and not single building systems. by and large, library systems connect and provide pac and internet services systemwide. article title | author 17assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 17 ture emerges. while overall, 53.5 percent of public librar­ ies indicate that their connection speeds are adequate to meet demand, some parsing of this figure reveals more variation (tables 7 through 10): ■ libraries with connection speeds of 769kpbs or less are more likely to report that their connection speeds are insufficient to meet patron demand at all times, with 24.0 percent of rural libraries, 25.8 percent of suburban libraries, and 25.4 percent of urban libraries so reporting (table 7). ■ libraries with connection speeds of 769kpbs or less are more likely to report that their connection speeds are insufficient to meet patron demand at some times, with 35.0 percent of rural libraries, 38.1 per­ cent of suburban libraries, and 53.4 percent of urban libraries so reporting (table 8). ■ libraries with connection speeds of greater than 769kbps also report bandwidth­sufficiency issues, with 12.0 percent of rural libraries, 10.5 percent of suburban libraries so reporting; and 14.0 percent of urban librar­ ies indicating that their connection speeds are insuf­ ficient all of the time (table 7); 20.3 percent of rural libraries, 29.5 percent of suburban libraries, and 30.0 percent of urban libraries indicating that their connec­ tion speeds are insufficient some of the time (table 8). ■ libraries that have ten or fewer workstations tend to rate their bandwidth as more sufficient at either 769kbps or less or greater than 769kbps (tables 7, 8, and 10). 
thus, in looking at the data, it is clear that libraries with fewer workstations indicate that their connection speeds are more sufficient to meet patron demand. table 3. public library public-access workstations and speed of connectivity by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 48.4% n=2,929 45.2% n=2,737 30.1% n=891 63.2% n=1,872 21.6% n=269 75.2% n=937 more than 10 workstations 22.0% n=307 75.5% n=1,053 12.0% n=225 85.1% n=1,595 9.6% n=130 89.8% n=1,221 total 43.4% n=3,242 50.9% n=3,802 23.0% n=1,116 71.6% n=3,474 15.1% n=399 83.0% n=2,194 missing: 7.6% (n=1,239) table 4. fewer public library public-access workstations than patrons wishing to use them by metropolitan status rural suburban urban total 10 or fewer workstations 10.5% n=681 10.8% n=339 23.6% n=300 12.1% n=1,321 more than 10 workstations 10.8% n=158 11.4% n=220 31.2% n=430 16.9% n=808 total 10.6% n=845 11.0% n=562 27.8% n=748 13.7% n=2,157 missing: 2.9% (n=473) table 5. fewer public library public-access workstations than patrons wishing to use them at certain times during a typical day by metropolitan status rural suburban urban total 10 or fewer workstations 68.8% n=4,444 74.5% n=2,347 69.1% n=880 70.5% n=7,670 more than 10 workstations 78.1% n=1,139 80.2% n=1,548 62.8% n=866 74.5% n=3,553 total 70.5% n=5,605 76.7% n=3,905 65.6% n=1,764 71.7% n=11,273 missing: 2.9% (n=473) table 6. sufficient public library public-access workstations available for patrons wishing to use them by metropolitan status rural suburban urban total 10 or fewer workstations 20.6% n=1,331 14.7% n=464 7.4% n=94 17.4% n=1,889 more than 10 workstations 11.0% n=161 8.4% n=163 6.0% n=83 8.5% n=406 total 18.9% n=1,501 12.3% n=627 6.6% n=177 14.6% n=2,304 missing: 2.9% (n=473) 18 information technology and libraries | march 200718 information technology and libraries | march 2007 table 7. public library connection speed insufficient to meet patron needs by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 25.4% n=668 12.1% n=297 27.4% n=233 9.8% n=173 15.4% n=34 10.2% n=90 more than 10 workstations 11.6% n=34 11.4% n=108 19.2% n=41 11.3% n=168 25.4% n=32 17.1% n=199 total 24.0% n=705 12.0% n=408 25.8% n=274 10.5% n=341 18.7% n=72 14.0% n=293 table 8. public library connection speed insufficient to meet patron needs at some times by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 34.1% n=898 19.3% n=474 37.1% n=315 29.0% n=511 50.0% n=130 27.0% n=238 more than 10 workstations 43.2% n=127 22.5% n=214 42.3% n=90 30.3% n=450 60.3% n=76 32.0% n=374 total 35.0% n=1,025 20.3% n=694 38.1% n=405 29.5% n=961 53.4% n=206 30.0% n=626 table �. public library connection speed is sufficient to meet patron needs by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 38.9% n=1,025 68.3% n=1,675 35.0% n=297 60.2% n=1,062 34.6% n=90 62.9% n=556 more than 10 workstations 45.2% n=133 66.1% n=628 38.5% n=82 54.9% n=817 14.3% n=18 50.9% n=594 total 39.5% n=1,158 67.5% n=2,306 35.7% n=379 57.9% n=1,886 28.0% n=108 56.0% n=1,168 table 10. 
public library connection speed insufficient to meet patron needs some or all of the time by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 59.5% n=1,566 31.4% n=771 64.6% n=549 38.8% n=684 65.4% n=170 37.1% n=328 more than 10 workstations 54.8% n=161 33.9% n=322 61.5% n=131 41.6% n=618 85.7% n=108 49.1% n=573 total 24.0% n=1,025 32.3% n=1,102 64.0% n=680 40.0% n=1,302 72.0% n=278 44.0% n=919 article title | author 1�assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 1� ■ discussion and selected issues the data presented point to a number of issues related to the current state of public library pac and internet­access adequacy in terms of available public access computers and bandwidth. the data also provide a foundation upon which to discuss the nature of quality and sufficient pac and internet access in a public library environment. while public libraries indicate increased ability to meet patron bandwidth demand when providing fewer publicly avail­ able workstations, public libraries indicate that they have difficulty in meeting patron demand for public access computers. growth of wireless connections in 2004, 17.9 percent of public library outlets offered wire­ less access, and a further 21.0 percent planned to make it available. outlets in urban and high­poverty areas were most likely to have wireless access. the majority of librar­ ies (61.2 percent), however, neither had wireless access nor had plans to implement it in 2004. as table 11 demon­ strates, the number of public library outlets offering wire­ less access has roughly doubled from 17.9 percent to 36.7 percent in two years. furthermore, 23.1 percent of outlets that do not currently have it plan to add wireless access in the next year. thus, if libraries follow through with their plans to add wireless access, 61.0 percent of public library outlets in the united states will have it by 2007. the implications of the rapid growth of the public library’s provision of wireless connectivity (as shown in table 11) on bandwidth requirements are significant. either libraries added wireless capabilities through their current overall bandwidth, or they obtained additional bandwidth to support the increased demand created by the service. if the former, then wireless access created an even greater burden on an already problematic band­ width capacity and may have actually reduced the overall quality of connectivity in the library. if the latter, libraries then had to shoulder the burden of increased expendi­ tures for bandwidth. either scenario required additional technology infrastructure, support, and expenditures. sufficient and quality connections the notion of sufficient and quality public library con­ nection to the internet is a moving target and depends on a range of factors and local conditions. for purposes of discussion in this paper, the authors used 769kbps to differentiate “slower” from “faster” connectivity. if, how­ ever, 1.5mbps or greater had been used to define faster connectivity speeds, then only 28.9 percent of public libraries would meet the criterion of “faster” connectiv­ ity (see table 1). and in fact, simply because 28.9 percent of public libraries report connection speeds of 1.5mbps or faster does not also mean that they have sufficient or quality bandwidth to meet the computing needs of their users, their staff, their vendors, and their service provid­ ers. 
some public libraries may need 10mbps to meet the pac needs of their users as well as the internal staff and management computing needs. the library community needs to become more edu­ cated and knowledgeable about what constitutes sufficient and quality connectivity in their library for the communi­ ties that they serve. a first step is to understand clearly the nature and type of the connectivity of the library. the next step is to conduct an internal audit that minimally: ■ identifies the range of networked services the library provides both to users as well as for the operation of the library; ■ identifies the typical bandwidth consumption of these services; ■ determines the demands of users on the bandwidth in terms of services they use; ■ determines peak bandwidth­usage times; ■ identifies the impact of high­consumption networked services used at these peak­usage times; ■ anticipates bandwidth demands of newer services and resources that users will want to access through the library’s infrastructure—myspace.com, youtube. com—regardless of whether or not the library is the direct provider of such services; and ■ determines what broadband services are available to the library, the costs of these services, and the “fit” of these services to the needs of the library. based on this and related information from such an audit, library administration can better determine the degree to which the bandwidth is sufficient in speed and quality. ■ planning for sufficient and quality bandwidth knowing the current condition of existing bandwidth in the library is not the same as successful technology plan­ ning and management to ensure that the library has, in fact, bandwidth that is sufficient in speed and quality. once an audit such as has been suggested is completed, careful planning for bandwidth deployment in the library is essential. it appears, however, that currently much of the management and planning for networked services is based first on what bandwidth is available as opposed to the bandwidth that is needed to provide the necessary services and resources in a networked environment. this stance puts public libraries in a reactive condition rather than a proactive condition regarding provision of net­ worked services. 20 information technology and libraries | march 200720 information technology and libraries | march 2007 most public library planning approaches stress the importance of conducting some type of needs assessment as a precursor to any type of planning.5 further, technology plans should include such things as goals, objectives, ser­ vices provision, and evaluation as they relate to bandwidth and the appropriate bandwidth needed. recent library technology planning guides, however, give little attention to the management, planning, and evaluation of band­ width as it relates to provision of networked services. it must be noted that some public libraries may be prevented from accessing higher bandwidth due to high cost, lack of availability of bandwidth alternatives, or other local factors that determine access to advanced telecommunications in their areas. in such circumstances, the audit may serve to inform the public service/utilities commissions, fcc, and others of the need for deploy­ ment of advanced telecommunications services in these areas. ■ bandwidth planning in a community context the audit and planning processes that have been described are critical activities for libraries. it is essential, however, for these processes to occur in the larger community con­ text. 
investments in technology infrastructure are increas­ ingly a community­wide resource that services multiple functions—emergency services, community access, local government agencies, to name a few. it is in this larger context that library pac and internet access occurs. moreover, there is a convergence of technology and service needs. for example, public libraries increasingly serve as agents of e­government and disaster­relief providers.6 first responders rely on the library’s infrastructure when theirs is destroyed, as hurricane katrina and other storms demonstrated. local, state, and federal government agen­ cies rely on broadband and pac and internet access (wired or wireless) to deliver e­government services. thus, at their core, libraries, emergency services, gov­ ernment agencies, and others have similar needs. pooling resources, planning jointly, and looking across needs may yield economies of scale, better service, and a more robust community technology infrastructure. emergency providers need access to reliable broadband and commu­ nications technologies in general, and in emergency situ­ ations in particular. libraries need access to high­quality broadband and pac technologies. both need access to wireless technologies. as broadcast networks relinquish ownership of the 700 mhz frequency used for analog television in february 2009, and this frequency is distributed to municipali­ ties for emergency services, now is an excellent time for libraries to engage in community technology planning for e­government, disaster planning and relief efforts, and pac and internet services. by working with the larger community to build a technology infrastructure, the library and the entire community benefit. ■ availability to high-speed connectivity one key consideration not known at this time is the extent to which public libraries—particularly those in rural areas—even have access to high­speed connec­ tions. many rural communities are served not by the large telecommunications carriers, but rather by small, privately owned­and­run local exchange carriers. iowa and wisconsin, for example, are each served by more than eighty exchange carriers. as such, public libraries are limited in capacity and services to what these exchange table 11. public-access wireless internet connectivity availability in public library outlets by metropolitan status and poverty metropolitan status poverty level provision of public-access wireless internet services urban suburban rural low medium high overall currently available 42.9% ± 4.9% (n=1,211) 42.5% ± 4.9% (n=2,240) 30.7% ± 4.6% (n=2,492) 38.0% ± 4.8% (n=5,165) 28.1% ±4.5% (n=679) 53.8% ± 5.0% (n=99) 36.7% ± 4.8% (n=5,943) not currently available and no plans to make it available within the next year 23.1% ± 4.2% (n=651) 29.7% ± 4.6% (n=1,562) 49.2% ± 5.0% (n=3,988) 37.4% ± 4.8% (n=5,091) 44.4% ± 4.9% (n=1,072) 21.0% ± 4.1% (n=39) 38.3% ± 4.9% (n=6,201) not currently available, but there are plans to make it available within the next year 30.6% ± 4.6% (n=864) 26.0% ± 4.4% (n=1,369) 18.6% ± 3.9% (n=1,509) 22.5% ± 4.2% (n=3,063) 26.2% ± 4.4% (n=633) 25.3% ± 4.4% (n=46) 23.1% ± 4.2% (n=3,742) article title | author 21assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 21 carriers offer and make available. thus, in some areas, dsl service may be the only form of high­speed connec­ tivity available to libraries. 
and, as suggested earlier, dsl may or may not be considered high speed given the needs of the library and the demands of its users. communities that lack high­quality broadband ser­ vices by telecommunications carriers may want to con­ sider building a municipal wireless network that meets the community’s broadband needs for emergency, disas­ ter, and public­access settings. as a community engages in community­wide technology planning, it may become evident that local telecommunications carriers do not meet the broadband needs of the community. such com­ munities may need to build their own networks, based on identified technology­plan needs. ■ knowledge of networked services connectivity needs patrons may not attempt to use high­bandwidth services at the public library because they know from previous visits that the library cannot provide acceptable connec­ tivity speeds to access that service—thus, they quit trying to access that service, limiting the usefulness of the pub­ lic library. in addition, librarians may have inadequate knowledge or information to determine when bandwidth is or is not sufficient to meet the demands of their users. indeed, the survey and site visits revealed that some librarians did not know the connection speeds that linked their library to the internet. consequently, libraries are in a dilemma: increase both the number of workstations and the bandwidth to meet demand; or provide less service in order to operate within the constraints of current connectivity infrastruc­ ture. and yet, roughly 45 percent of public libraries indi­ cate that they have no plans to add workstations within the next two years; the average number of workstations has been around ten for the last three surveys (2002, 2004, and 2006); and 80 percent of public libraries indicate that space limitations affect their ability to add workstations.7 hence, for many libraries, adding workstations is not an option. ■ missing the mark? the networked environment is such that there are multi­ ple uses of bandwidth within the same library—for exam­ ple, public internet access, staff access, wireless access, integrated library system access. we are now in the web 2.0 environment, which is an interactive web that allows for content uploading by users (e.g., blogs, mytube.com, myspace.com, gaming). streaming content, not text, is increasingly the norm. there are portable devices that allow for text, video, and voice messaging. increasingly, users desire and prefer wireless services. this is a new environment in which libraries provide public access to networked services and resources. it is an enabling environment that puts users fully in the content seat—from creation to design to organization to access to consumption. and users have choices, of which the public library is only one, regarding the information they choose to access. it is an environment of competition, advanced applications, bandwidth intensity, and high­quality com­ puters necessary to access the graphically intense content. the impacts of this new and substantially more com­ plex environment on libraries are potentially significant. as user expectations rise, combined with the provision of high­quality services by other providers, libraries are in a competitive and service­ and resource­rich informa­ tion environment. 
providing “bare minimum” pac and internet access can have two detrimental effects in that they: (1) relegate libraries to places of last resort, and (2) further digitally divide those who only have public­access computers and internet access through their public librar­ ies. it is critical, therefore, for libraries to chart a high­end course regarding pac and internet access, and not access that is merely perceived to be acceptable by the librarians. ■ additional research the context in which issues regarding quality pac and sufficient connectivity speeds to internet access reside is complex and rapidly changing. research questions to explore include: ■ is it possible to define quality pac and internet access in a public library context? ■ if so, what are the attributes included in the defini­ tion? ■ can these attributes be operationalized and mea­ sured? ■ assuming measurable results, what strategies can the library, policy, research, and other interested communities employ to impact public library move­ ment toward quality pac and internet access? ■ should there be standards for sufficient connectivity and quality pac in public libraries? ■ how can public librarians be better informed regard­ ing the planning and deployment of sufficient and quality bandwidth? ■ what is the role of federal and state governments in supporting adequate bandwidth deployment for public libraries?8 ■ to what extent is broadband deployment and avail­ ability truly universal as per the universal service 22 information technology and libraries | march 200722 information technology and libraries | march 2007 (section 254) of the telecommunications act of 1996 (p.l. 104­104)? these questions are a beginning point to a larger set of activities that need to occur in the research, practitioner, and policy­making communities. ■ obtaining sufficient and quality public-library bandwidth arbitrary connectivity speed targets, e.g., 200kbps or 769kbps, do not in and of themselves ensure quality pac and sufficient connectivity speeds. public libraries are indeed connected to the internet and do provide public­ access services and resources. it is time to move beyond connectivity­type and ­speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries. given the widespread connectivity now provided from most public libraries, there continue to be increased demands for more and better networked services. these demands come from governments that expect public libraries to support a range of e­government services, from residents who want to use free wireless connectivity from the public library, to patrons who need to download music or view streaming videos (to name but a few). simply providing more or better connectivity will not, in and of itself, address all of these diverse service needs. increasingly, pac support will require additional public librarian knowledge, resources, and services. sufficient and quality bandwidth is a key component of those services. the degree to which public libraries can provide such enhanced networked services (requiring exceptionally high bandwidth that is both sufficient and of high quality) is unclear. mounting a significant effort now to better understand existing bandwidth use and plan for future needs and requirements in individual public libraries is essential. in today’s networked envi­ ronment, libraries must stay competitive in the provision of networked services. 
such will require sufficient and high-quality connectivity and bandwidth.

■ acknowledgements

the authors gratefully acknowledge the support of the bill & melinda gates foundation and the american library association for support of the 2006 public libraries and the internet study. data from that study have been incorporated into this paper.

references

1. information institute, public libraries and the internet (tallahassee, fla.: information use management and policy institute, 2006). all studies conducted since 1994 are available at http://www.ii.fsu.edu/plinternet (accessed march 1, 2007).
2. u.s. federal communications commission, high speed services for internet access: status as of december 31, 2005 (washington, d.c.: fcc, 2006), available at http://www.fcc.gov/bureaus/common_carrier/reports/fcc-state_link/iad/hspd0604.pdf (accessed mar. 1, 2007).
3. j. c. bertot et al., public libraries and the internet 2006 (tallahassee, fla.: information use management and policy institute, forthcoming), available at http://www.ii.fsu.edu/plinternet (accessed mar. 1, 2007).
4. j. c. bertot et al., "drafted: i want you to deliver e-government," library journal 131, no. 13 (aug. 2006): 34–37.
5. c. r. mcclure et al., planning and role setting for public libraries: a manual of options and procedures (chicago: ala, 1987); e. himmel and w. j. wilson, planning for results: a public library transformation process (chicago: ala, 1997).
6. j. c. bertot et al., "drafted: i want you to deliver e-government"; p. t. jaeger et al., "the policy implications of internet connectivity in public libraries," government information quarterly 23, no. 1 (2006): 123–41.
7. j. c. bertot et al., public libraries and the internet 2006.
8. jaeger et al., "the policy implications of internet connectivity in public libraries."

checking out facebook.com: the impact of a digital trend on academic libraries

laurie charnigo and paula barnett-ellis

laurie charnigo (charnigo@jsu.edu) is an education librarian and paula barnett-ellis (pbarnett@jsu.edu) is a health, science, and nursing librarian at the houston cole library, jacksonville state university, alabama.

while the burgeoning trend in online social networks has gained much attention from the media, few studies in library science have addressed the topic in depth. this article reports on a survey of 126 academic librarians concerning their perspectives toward facebook.com, an online network for students. findings suggest that librarians are overwhelmingly aware of the "facebook phenomenon." those who are most enthusiastic about the potential of online social networking suggested ideas for using facebook to promote library services and events. few individuals reported problems or distractions as a result of patrons accessing facebook in the library. when problems have arisen, strict regulation of access to the site seems unfavorable. while some librarians were excited about the possibilities of facebook, the majority surveyed appeared to consider facebook outside the purview of professional librarianship.

during the fall of 2005, librarians noticed something unusual going on in the houston cole library (hcl) at jacksonville state university (jsu). students were coming into the library in droves. patrons waited in lines with photos to use the public-access scanner (a stack of discarded pictures quickly grew). library traffic was noticeably busier than usual, and the computer lab was constantly full, as were the public-access terminals. the hubbub seemed to center around one particular web site.
once students found available computers, they were likely to stay glued to them for long stretches of time, mesmerized and lost in what was later determined to be none other than “facebook addiction.” this addic­ tion was all the more obvious the day the internet was down. withdrawal was severe. soon after the librarians noticed this curious behavior, an article in the chanticleer, the campus newspaper for jsu, dispelled the mystery surrounding the web­site brouhaha. a campus reporter broke the exciting news to the jsu community that “after months of waiting and requests from across the country, it’s finally here. jsu is officially on the facebook.”1 the library suddenly became a popular hangout for students in search of computers to access facebook. apparently jsu jumped on the bandwagon relatively late. the facebook craze had already spread throughout other colleges and universities since the web site was founded in february 2004 by mark zuckerberg, a former student at harvard university. the creators of facebook vaguely define the site as “a social utility that connects you with the people around you.”2 although originally created to allow students to search for other students at colleges and universities, the site has expanded to allow individuals to connect in high schools, companies, and within regions. recently, zuckerberg has also announced plans to expand the network to military bases.3 currently, students and alumni in more than 2,200 colleges and uni­ versities communicate, connect with other students, and catch up with past high school classmates daily through the network. students who may never physically meet on campus (a rather serendipitous occurrence in nature) have the opportunity to connect through facebook. establishing virtual identities by creating profiles on the site, students post photographs, descriptions of academic and personal interests such as academic majors, campus organizations of which they are members, political orientation, favorite authors and musicians, and any other information they wish to share about themselves. facebook’s search engine allows users to search for students, faculty, and staff with similar interests by keyword. it would be hard to gauge how many of these students actually meet in person after connecting through facebook. the authors of this study have heard students mention that either they or their friends have made dates with other students on campus through facebook. many of the “friends” facebook users first add when they initially establish their accounts are the ones they are already acquainted with in the physical world. when facebook made its debut at jsu, it had become the “ninth most highly trafficked web site in the u.s.”4 one source estimated that 85 percent of college students whose institutions are registered in facebook’s directory have created personal profiles on the site.5 membership for the university network requires a university e­mail address, and an institution cannot be registered in the directory unless a significant number of students request that the school be added. currently, more than nine mil­ lion people are registered on facebook.6 soon after jsu was registered on facebook’s direc­ tory, librarians began to receive questions regarding use of the scanner and requests for help uploading pictures to facebook profiles. students seemed surprisingly open about showing librarians their profiles, which usually contained more information than the librarians wanted to know. 
however, not all students were enthusiastic about facebook. complaints began to surface from students awaiting access to computers for academic work while classmates "tied up" computers on facebook. some students complained about the distraction facebook caused in the library's computer lab, a complaint that eventually reached the president of jsu. currently, the administration at jsu has decided to block access to facebook in the computer labs on campus, including the lab in the library. opinions of faculty and staff in the library about facebook vary. some librarians scoff at this new trend, viewing the site primarily as just another dating service. others have created their own facebook accounts just to see how it works, to connect with students, and to keep up with the latest internet fad.7

■ study rationale

prompted by the issues that have arisen at hcl as a result of heavy patron use of facebook, the authors surveyed academic librarians throughout the united states to find out what impact, if any, the site has had on other libraries. the authors sought information about the practical effect facebook has had on libraries, as well as librarians' perspectives on, perceived roles associated with, and awareness of internet social trends and their place in the library. online social networking, like e-mail and instant messaging, is emerging as a new method of communication. recently, the librarians have heard facebook being used as a verb (e.g., "i'll facebook you"). few would probably disagree that making social connections and friends (and facebook revolves around connecting friends) is an important aspect of the campus experience. much of the attraction students and alumni have toward college yearbooks (housed in the library) stems from the same fascination that viewing photos, student profiles, and searching for past and present classmates on facebook inspires. emphasis in this study centers on librarians' awareness of, experimentation with, and attitudes toward facebook and whether or not they have created policies to regulate or block access to the site on public-access computers. however trendy an individual web site such as facebook may appear, online social networking, the category facebook falls within, has become a new subject of inquiry for marketing professionals, sociologists, communication scholars, and library and information scientists. downes defines social networks as a "collection of individuals linked together by a set of relations."8 according to downes, "social networking web sites fostering the development of explicit ties between individuals as 'friends' began to appear in 2002."9 facebook is just one of many popular online social network sites (myspace, friendster, flickr), and survey respondents often asked why questions focused solely on facebook. the authors decided to investigate it specifically because it is currently the largest online social network targeted for the academic environment.
librarians are also increasingly exploring the use of what have loosely been referred to as “internet 2.0” com­ panies and services, such as facebook, to interact with and reach out to our users in new and creative ways. the term internet 2.0 was coined by o’reilly media to refer to internet services such as blogs, wikis, online social net­ working sites, and types of networks that allow users the ability to interact and provide feedback. o’reilly lists the core competencies that define internet 2.0 services. one of these competencies, which might be of particular inter­ est to librarians, is that internet 2.0 services must “trust the users” as “co­developers.”10 as librarians struggle to develop innovative ways to reach users beyond library walls, it seems logical to observe online services, such as facebook and myspace, which appeal to a huge portion of our clientele. from a purely evaluative standpoint of the site as a database, the authors were impressed by several of the search features offered in facebook. graph­theory algo­ rithms and other advanced network technology are used to process connections.11 some of the more interesting search options available in facebook include the ability to: ■ search for students by course field, class number, or section; ■ search for students in a particular major; ■ search for students in a particular student organiza­ tion or club; ■ create “groups” for student organizations, clubs, or other students with common interests; ■ post announcements about campus or organization events; ■ search specifically for alumni; and ■ block or limit who may view profiles, providing users with built­in privacy protection if the user so wishes. since the authors finished the study, the site has added a news feed and a mini feed, features that allow users to keep track of their friends’ notes, messages, profile changes, friend connections, and group events. in response to negative feedback about the news feeds and mini feeds by users who felt their privacy was being violated, facebook’s administrators created a way for users to turn off or limit information displayed in the feeds. the addition of this technology, however, provides a sophisticated level of connectivity that is a benefit to users who like to keep abreast of the latest happenings in their network of friends and groups. the pulse, another feature on the site, keeps daily track of popular interests (e.g., favorite books) and member demographics (number of members, political orientation) and compares them with overall facebook member averages. the authors were pleasantly surprised to discover that the beatles and led zeppelin, beloved bands of the baby boomers, article title | author 25checking out facebook.com | charnigo and barnett-ellis 25 continue to live on in the hearts of today’s students. these groups were ranked in the top ten favorite bands by stu­ dents at jsu. as of october 2006, the top campaign issues expressed by facebook users were: reducing the drinking age to eighteen (go figure) and legalization for same­sex marriage. arguably, much of the information provided by facebook is not academic in nature. however, an evaluation or review of facebook might provide useful information to instruction librarians and database ven­ dors regarding interface design and search capabilities that appeal to students. 
provitera­mcglynn suggests that facilitating learning among millennials, who “represent 70 to 80 million people” born after 1992 (a large percent­ age of facebook members) involves understanding how they interact and communicate.12 awareness of students’ cultural and social interests, and how they interact online, may help older generations of academic librarians better connect with their constituents. ■ the literature on online social networks although social networks have been the subject of study by sociologists for years and social network theories have been established to describe how these networks func­ tion, the study of online social networks has received little attention from the scholarly community. in 1997, garton, haythornthwaite, and wellman were among the first to describe a method, social network analysis, for studying online social networks.13 their work was published years before online social networks similar to facebook evolved. currently, the literature on these networks is predominantly limited to popular news pub­ lications, business magazines, occasional blurbs in library science and communications journals, and numerous student newspapers.14 privacy issues and concerns about sexual predators lurking on facebook and similar sites have been the focus of most articles. in the chronicle of higher education, read details numerous arrests, suspensions, and schol­ arship withdrawals that have resulted from police and administrators searching for incriminating information students have posted in facebook.15 read discovered that, because students naively reveal so much informa­ tion about their activities, some campus police were regularly trolling facebook, finding it “an invaluable ally in protecting their campuses.”16 students may feel a false sense of security when they post to facebook, regarding it as their private space. however, read warns that “as more and more colleges offer alumni e­mail accounts, and as campus administrators demonstrate more internet savvy, students are finding that their conversations are playing to a wider audience than they may have antici­ pated.”17 privacy concerns expressed about facebook appear to revolve more around surveillance than stalk­ ers. in a web seminar on issues regarding facebook use in higher education, shawn mcguirk, director of judicial affairs, mediation, and education at fitchburg state college, massachusetts, recommends that administrators and others concerned with students posting potentially incriminating, embarrassing, or overtly personal infor­ mation draft a document similar to the one created by cornell university’s office of information technologies, which advises students on how to safely and responsibly use online social networking sites similar to facebook.18 after pointing out the positive benefits of facebook and reassuring students that cornell university is proud of its liberal policy in not monitoring online social networks, the essay, entitled “thoughts on facebook,” provides poignant advice and examples of privacy issues revolv­ ing around facebook and similar web sites.19 the golden rule of this essay states: don’t say anything about someone else that you would not want said about yourself. and be gentle with your­ self too! 
what might seem fun or spontaneous at 18, given caching technologies, might prove to be a liability to an on­going sense of your identity over the longer course of history.20 a serious concern discussed in this document is the real possibility that potential employers may scan facebook profiles for the “real skinny” on job candidates. however, unless the employer uses an e­mail issued from the same school as the candidate, he or she is unable to look at the individual’s full profile without first request­ ing permission from the candidate to be added as a “friend.” all the employer is able to view is the user’s name, school affiliation, and picture (if the user has posted one). unless the user has posted an inappropriate picture or is applying for a job at the college he or she is attending, the threat of employers snooping for informa­ tion on potential candidates in facebook is minimal. the same, however, cannot be said of myspace, which is much more open and accessible to the public. additionally, three pilot research studies have also focused on privacy issues specifically relating to facebook, including those of stutzman, gross and acquisti, and govani and pashley. results from all three studies revealed strikingly close findings. individuals who participated in the studies seemed willing to dis­ close personal information about themselves—such as photos and sometimes even phone numbers and mailing addresses—on facebook profiles even though students also seemed to be aware that this information was not secure. in a study of fifty carnegie mellon university undergraduate users, govani and pashley concluded that these users “generally feel comfortable sharing their per­ sonal information in a campus environment. participants said they “had nothing to hide” and “they don’t really 26 information technology and libraries | march 200726 information technology and libraries | march 2007 care if other people see their information.”21 a separate study of more than four thousand facebook members at the same institution by gross and acquisti echoed these findings.22 comparing identity elements shared by members of facebook, myspace, friendster, and the university of north carolina directory, stutzman discov­ ered that a significant number of users shared personal information about themselves in online social networks, particularly facebook, which had the highest level of campus participation.23 gross and acquisti provide a list of explanations suggesting why facebook members are so open about sharing personal information online. three explanations that are particularly convincing are that “the perceived benefit of selectively revealing data to strang­ ers may appear larger than the perceived costs of possible privacy invasions”; “relaxed attitudes toward (or lack of interest in) personal privacy”; and “faith in the network­ ing service or trust in its members.”24 in public libraries, concern has primarily centered on teenagers accessing myspace.com, an online social net­ working site much larger than facebook. 
myspace, whose membership, unlike facebook, does not require an .edu e­mail address, has a staggering 43 million users, a num­ ber that continues to rise.25 julian aiken, a reference librar­ ian at the new haven free public library, wrote about the unpopular stance he took when his library decided to ban access to myspace due to the hysterical hype of media reports exposing the dangers from online predators lurking on the site.26 for aiken, the damage of censorship policies in libraries far outweighs the potential risk of sex crimes. furthermore, he suggests that there are even edu­ cational benefits of myspace, observing that “[t]eenagers are using myspace to work on collaborative projects and learn the computer and design skills that are increasingly necessary today.”27 what is apparent is that whether facebook continues to rise in popularity or fizzles out among the college crowd, the next generation of college students, who now constitute the largest percentage of myspace users, are already solidly entrenched and adept at using online social networks. librarians in institutions of higher education might need to consider what implica­ tions the communication style preferences of these future students could have, if any, on library services. while most of the academic attention regarding online social networks has centered on privacy concerns, perhaps the business sector has done a more thorough investiga­ tion of user behavior and students’ growing attraction towards these types of sites. business magazines have naturally focused on the market potential, growth, and fluctuating popularity of various online social networks. advertisers and investors have sought ways to capital­ ize on the exponential growth of these high­traffic sites. business week reported that as of october 2005, facebook .com had 4.2 million members. more than half of those members were between the ages of twelve and twenty­ four.28 while some portended that the site was losing momentum, as of august 2006, membership on facebook had expanded beyond eight million.29 marketing experts have closely studied, apparently more so than com­ munication scholars, the behavior of users in online social networks. in a popular business magazine, hempel and lehman describe user behavior of the “myspace generation”: “although networks are still in their infancy, experts think they’re already creating new forms of social behavior that blur the distinctions between online and real­world interactions.”30 the study of user behavior in online social networks, however, has yet to be addressed in length by those outside the field of marketing. although evidence of interest in online social net­ works is apparent in librarian weblogs and forums (many librarians have created facebook groups for their libraries), actual literature in the field of library and information science is scarce.31 dvorak questions the lack of interest displayed by the academic community toward online social networks as a focus of scholarly research. calling on academics to “get to work,” he argues “aca­ demia, which should be studying these phenomena, is just as out of the loop as anyone over 30.”32 this discon­ nect is also echoed by michael j. 
bugeja, director of the greenlee school of journalism and communication at iowa state university, who writes, "while i'd venture to say that most students on any campus are regular visitors to facebook, many professors and administrators have yet to hear about facebook, let alone evaluate its impact."33 the lack of published research articles on these types of networks, however, is understandable given the newness of the technology.
a few members of the academic community have suggested opportunities for using facebook to communicate with and reach out to students. in a journal specifically geared toward student services in higher education, shier considers the impact of facebook on campus community building.34 although she cannot identify an academic purpose for facebook, she describes how the site can contribute to the academic social life of a campus. facebook provides students with a virtual campus experience, particularly in colleges where students are commuters or are in distance education. shier writes, "as the student's definition of community moves beyond the geographic and physical limitations, facebook.com provides one way for students to find others with common interests, feel as though they are part of a large community, and also find out about others in their classes."35 furthermore, facebook membership extends beyond students to faculty, staff, and alumni. shier cites examples of professors who used facebook to connect or communicate with their students, including the president of the university of iowa and more than one hundred professors at duke university. professors who teach online courses make themselves seem more human or approachable by establishing facebook profiles.36
greeting students on their own turf is exactly the direction staff at washington university's john m. olin library decided to take when they hired web services librarian joy weese moll to communicate and answer questions through a variety of new technologies, including facebook.37 brian mathews, information services librarian at georgia institute of technology, also created a facebook profile in order to "interact with the students in their natural environment."38 mathews decided to experiment with the possibilities of using facebook as an outreach tool to promote library services to 1,700 students in the school of mechanical engineering after he discovered that 1,300 of these students were registered on facebook. advising librarians to become proactive in the use of online social networks, mathews reported that overall, his experience helped him to effectively "expand the goal of promoting the library."39 bill drew was among the first librarians to create an account and profile for his library, the suny morrisville library. as of september 2006, nearly one hundred librarians had created profiles or accounts for their libraries on facebook. one month later, however, the administration at facebook began shutting down library accounts on the grounds that libraries and institutions were not allowed to represent themselves with profiles as though they were individuals. in response, many of these libraries simply created groups for their libraries, which is completely appropriate, similar to creating a profile, and just as searchable as having an account. the authors of this study created the "houston cole library users want answers!" group, which currently has ninety-one members.
library news and information of interest about the library is announced in the group.40 in this study, one trend the authors will try to identify is whether other librarians have considered or are already using facebook in similar ways that moll, mathews, and drew have explored as avenues for com­ municating with students or promoting library services. ■ the survey in february 2006, 244 surveys were mailed to reference or public service librarians (when the identity of those per­ sons could be determined). these individuals were chosen from a random sample of the 850 institutions of higher education classified by the carnegie classification listing of higher education institutions as “master’s colleges and universities (i and ii)” and “doctoral/ research universities (extensive and intensive).”41 the sample size provided a 5.3 percent margin error and a 95 percent confidence level. one hundred twenty­six surveys were completed, providing a response rate of 51 percent. fifteen survey questions (appendix a) were designed to target three areas of inquiry: awareness of facebook, practical impact of the site on library services, and perspectives of librarians toward online social networks. awareness of facebook a series of questions on the survey queried respondents about their awareness and degree of knowledge about facebook. the overwhelming majority of librarians were aware of facebook’s existence. out of 126 librarians, 114 had at least heard of facebook; 24 were not familiar with the site. as one individual wrote, “i had not heard of facebook before your survey came, but i checked and our institution is represented in facebook.” universities registered in facebook are easily located through a search­by­region on facebook’s home page. thirty­eight colleges and universities for alabama (jsu’s location) are registered in facebook. (in comparison, 143 academic institutions in california are listed.) out of those librar­ ians who had heard of the site, 27 were not sure whether their institutions were registered in facebook’s directory. sixty survey participants were aware that their institu­ tions were registered in the directory, while fifteen librar­ ians reported that their universities were not registered (figure 1). several comments at the end of the survey indicated that some of the institutions surveyed did not issue school e­mail accounts, making membership in facebook impossible for their university. interestingly, out of the sixty individuals who could claim that their universities were in the directory, 34 percent have created their own personal facebook accounts and two libraries have individual profiles (figure 2). one individual who established an account on the site wrote, “personally, i’m a little embarrassed by having an account because it’s such a teeny­bopper kind of thing and i’m a little old for it. but it’s an interesting cultural phenomenon and academic librarians need to get on the bandwagon with it, if only to better understand their constituents.” another survey respondent with an individual profile on the site reported a group created by his or her institution on facebook titled “i totally want to have sex in the library.” this individual wanted to make it clear, however, that the students—not the librarians—created this group. 
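the sampling figures reported above can be checked with the standard worst-case margin-of-error formula. this is a minimal sketch, and it assumes the reported 5.3 percent was computed on the 244 mailed surveys (p = 0.5, 95 percent confidence) with a finite-population correction against the 850-institution frame; the article does not spell out the exact formula used.

```python
import math

N = 850   # institutions in the carnegie sampling frame
n = 244   # surveys mailed
z = 1.96  # z-score for a 95 percent confidence level
p = 0.5   # worst-case proportion

standard_error = math.sqrt(p * (1 - p) / n)
fpc = math.sqrt((N - n) / (N - 1))        # finite-population correction
margin_of_error = z * standard_error * fpc

print(f"margin of error: {margin_of_error:.1%}")  # about 5.3%
print(f"response rate: {126 / 244:.0%}")          # about 52%, reported as 51 percent
```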
a particularly help­ ful participant went so far as to poll the reference col­ leagues in all nine of the libraries at his/her institution and found that “only a few had even heard of facebook.” that librarians will become increasingly aware of online social networks was the sentiment expressed by another individual who wrote, “most librarians at my institu­ tion are unaware of social software in general, much less facebook. however, i think this will change in the future as social software is mentioned more often in traditional media (such as television and newspapers).” according to survey responses, it does not appear 28 information technology and libraries | march 200728 information technology and libraries | march 2007 that use of facebook by students has been as noticeable or distracting in other libraries as it has been at hcl. when asked to describe their observation of student use of library computers to access facebook, 56 percent of those surveyed checked “rarely to never.” only 20 percent indicated “most of the time” to “all of the time” (table 1). however, it is important to remember that only sixty individuals could verify that their institutions are regis­ tered on facebook. through comments, some librarians hinted that “snooping” or keeping mental notes of what students view on library computers is frowned upon. it simply is not our business. “we do not regulate or track student use of computers in the library,” wrote one indi­ vidual. several librarians noted that students were using facebook in the libraries, but more so on personal laptops than public­access computers. practical impact of facebook another goal of this study was to find out whether facebook has had any real impact on library services, such as an increase in bandwidth, library traffic, and noise, or in use of public­access computers, scanners, or other equipment. student complaints about monopolization of computers for use of facebook led administrators to block the site from computer labs at jsu. access to facebook on public­access terminals, however, was not regulated. survey responses revealed that facebook has had minimal impact on library services elsewhere. only one library was forced to develop a policy for specifically addressing computer­use concerns as a result of facebook use. one individual mailed the sign posted on every computer terminal in the library, which states, “if you are using a computer for games, chat, or other recreational activity, please limit your usage to thirty minutes. computers are primarily intended for academic use.” another librarian reported that academic computing staff had to shut down access to facebook on library computers due to band­ width and access issues. this individual, however, added, “interestingly, no one has complained to the library staff about its absence!” given a list of possible effects facebook may have had on library services and operations, 10 per­ cent of respondents indicated that facebook has increased patron use of computers. seven percent agreed that it has increased patron traffic, and only 2 percent reported that the site has created bandwidth problems or slowed down internet access. only four individuals received patron complaints about other users “tying up” the computers with facebook (figure 3). since the advent of facebook, the public scanner has become one of the hottest items in hcl. 
figure 1. institutions added to the facebook directory
figure 2. involvement with facebook
table 1. student use of library computers to access facebook (based on observation)
■ never: 23 (32 percent)
■ rarely: 17 (24 percent)
■ some of the time: 17 (24 percent)
■ all the time: 7 (10 percent)
■ most of the time: 7 (10 percent)
librarians at jsu know that use of the scanner has increased tremendously due to facebook because the scanner used by students to upload photos is attached to a public workstation next to the general reference desk. students often ask questions about uploading pictures to their facebook profiles as well as how to edit photos (e.g., resizing and cropping). one survey question asked whether scanner use had increased as a result of facebook. of the sixty-two respondents who answered this question (it was indicated that only those libraries that provide public access to scanners should answer the question), 77 percent reported that scanner use had not increased. furthermore, only two librarians have assisted students with the scanner or provided any other type of assistance, for that matter, with facebook. the assistance the two librarians gave included scanning photographs, editing photos, uploading photos to facebook profiles, and creating accounts. however, in a separate question, 21 percent of participants agreed that librarians should be responsible for helping students, when needed, with questions about facebook. no librarian has added additional equipment such as computers or scanners as a result of facebook. only one individual reported future plans by his/her library to add additional equipment as a result of heavy use of the site.
perspectives toward facebook
one of the main goals of the study was to obtain a snapshot of the perspectives and attitudes of librarians toward facebook and online social networks in general. most of the librarians surveyed were neither enthusiastic nor disdainful of facebook. a small group of the respondents, however, when given the chance to comment, were extremely positive and excited about the possibilities of online social networking. twenty-one individuals saw no connection between libraries and facebook. sixty-seven librarians were in agreement that computer use for academic purposes should take priority, when needed, over use of facebook. however, fifty-one respondents indicated that librarians needed to keep up with internet trends, such as facebook, even when such trends are not academic in nature (table 2). out of 126 librarians who completed the survey, only 23 reported that facebook has generated discussion among library faculty and staff about online social networks. on the other hand, few individuals voiced negative opinions toward facebook. only 5 percent of those surveyed indicated that facebook annoyed faculty and staff. one individual wrote, "i don't like facebook or most social networking services. they encourage the formation of cliques and keep users from meeting and accepting those who are different than themselves." comments like this, however, were rare. although the majority of librarians seemed fairly apathetic toward facebook, few individuals expressed negative comments toward the site. few librarians indicated that facebook should be addressed or regulated in library policy. most individuals viewed the site as just another communication tool similar to instant messaging or cell phones.
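note that the percentages in table 1 above are not computed against all 126 respondents; judging by the counts in the table, only the 71 librarians who answered the observation question form the base. a small sketch that rederives the reported figures from those counts:

```python
# counts reported in table 1 (observed student use of library computers for facebook)
observations = {
    "never": 23,
    "rarely": 17,
    "some of the time": 17,
    "all the time": 7,
    "most of the time": 7,
}

answered = sum(observations.values())  # 71 librarians answered this question
for response, count in observations.items():
    print(f"{response}: {count} ({count / answered:.0%})")

# the combined figures cited in the text
print(f"rarely to never: {(23 + 17) / answered:.0%}")        # about 56%
print(f"most to all of the time: {(7 + 7) / answered:.0%}")  # about 20%
```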
table 2. access, assistance, and awareness of facebook and similar trends: perspectives (respondents were allowed to check any or all responses that applied)
■ computer use for academic purposes should take priority, when needed, over use of facebook: 67 (53 percent)
■ librarians need to "keep up" with internet trends, such as facebook, even when these trends are not academic in nature: 51 (40 percent)
■ library resources should not be monopolized with use of facebook: 35 (28 percent)
■ librarians should help students, when able, with questions regarding facebook: 27 (21 percent)
■ there is no connection between libraries and facebook: 21 (17 percent)
■ student use of facebook on library computers should not be regulated: 15 (12 percent)
■ library computers should be available for access to facebook, but librarians should not feel that it is their responsibility to assist students with questions regarding the site: 11 (9 percent)
figure 3. patron complaints about facebook
in fact, while most librarians did not express much interest in facebook, many were quite vocal about not regulating its use. the following comment by one survey participant captures this sentiment: "attempts to restrict use of facebook in the library would be futile, in my opinion, in the same way it is now impossible to ban use of usb drives and aim in academic libraries." while most individuals agreed that academic use of computers should take priority over recreational use, a polite request that a patron using facebook allow another student to use the computer for academic purposes, when necessary, appears preferable to the creation and enforcement of strict policies. as one librarian put it, "i don't want students to see the library as a place where they are 'policed' unnecessarily."
when asked if facebook serves any academic purpose, 54 percent of those surveyed indicated that it does not, while 34 percent were "not sure." twelve percent of the librarians identified academic potential or possible benefits of the site (figure 4). the authors were surprised to find that 46 percent of those surveyed were not completely willing to dismiss facebook as pure recreation. some librarians found facebook to be a distraction to academics: "maybe i'm old fashioned, but when do students find time for this kind of thing? i wonder about the impact of distractions like this on academic pursuits. there's still only twenty-four hours in a day." another individual asked two students who were using facebook in the library what they thought of the site and they admitted that it was "frequently a distraction from academic work." for the 34 percent who were not sure whether facebook has any academic value, there were comments such as "i am continuing to observe and will decide in the future." academic uses for facebook included suggestions that it be used as a communication tool for student collaboration in classes (facebook allows students to search for other students by course and section number). one individual suggested it could be used as an "online study hall," but then wondered if this might lead to plagiarism. some thought instructors could somehow use facebook for conducting online discussion forums, with one participant observing "it's 'cooler' than using blackboard." "building rapport" with students through a communication medium that many students are comfortable with was another benefit mentioned.
respondents who were enthusiastic about facebook thought it most beneficial as a virtual extension of the campus. facebook could potentially fill a void where face-to-face connections are absent in online and distance-education classes. several librarians suggested that facebook has had a positive influence in fostering collegiate bonds and school spirit. as one individual wrote, "[t]he academic environment is not only responsible for scholarly growth, but personal growth as well. this is just one method for students to interact in our highly technological society." facebook could provide students who are not physically on campus with a means to connect with other students at their institutions who have similar academic and social interests.
some librarians were so enthusiastic about facebook that they suggested libraries use the site to promote their services. using the site to advertise library events and creating online library study groups and book clubs for students were some of the ideas expressed. one librarian wrote: "facebook (and other social networking sites) can be a way for libraries to market themselves. i haven't seen students using facebook in an academic manner, but there was a time when librarians frowned on e-mail and aim too. if it becomes a part of students' lives, we need to welcome it. it's part of welcoming them, too." more librarians, however, felt that facebook should serve as a space exclusively for students and that librarians, professors, administrators, police, and other uninvited folks should keep out. furthermore, as one individual noted, it is not "an appropriate venue" for librarians to promote their services.
while the review of literature demonstrates that much has been made of online social networks and privacy issues, the librarians surveyed were not particularly concerned about privacy. only 19 percent indicated that they were concerned about privacy issues related to facebook. however, some librarians voiced concerns that many students are ignorant about the risks of posting personal information and photographs on facebook and do not seem fully aware of the possibility that individuals outside their social sphere might also have reason to access the site. one individual mentioned that the librarians at her institution have begun to emphasize this to students during library instruction sessions on internet research and evaluation.
figure 4. finds conceivable academic value in facebook
■ limitations
several limitations to this study must be noted when attempting to reach any type of conclusion. participants who had never heard of facebook obviously could not answer any questions except that they were not familiar with the site. some questions required respondents to "guesstimate." unless librarians have access to their institution's internet usage statistics, it would be hard for them to really know how much bandwidth is being used by students accessing facebook. librarians, having been trained in a profession that places a high value on freedom of access, might also be wary of activities that suggest any type of censorship. therefore, it is conceivable that some of the librarians surveyed do not know whether students are using facebook in the library because they make a point not to snoop or make note of individual web sites that students view.
■ discussion while online education is growing at a rapid rate across the united states, so is the presence of virtual academic social communities. although facebook might prove to be a passing fad, it is one of the earliest and largest online social networking communities geared specifically for students in higher education. it represents a new form of communication that connects students socially in an online environment. if online academics have evolved and continue to do so, then it is only natural that online academic social environments, such as facebook, will continue to evolve as well. while traditionally considered the heart of the campus, one is left to ponder the library’s presence in online academic social networks. what role the library will serve in these environments might largely depend on whether librarians are proactive and experi­ mental with this type of technology or whether they simply dismiss it as pure recreation. emerging technolo­ gies for communication should provoke, at the very least, an interest in and knowledge of their presence among library and information science professionals. this survey found that librarians were overwhelmingly aware of and moderately knowledgeable about facebook. some librarians were interested in and fascinated with facebook, but preferred to study it as outsiders. others had adopted the technology, but more for the purpose, it would seem, of having a better understanding of today’s students and why facebook (and other online social net­ working sites) appeals to so many of them. it is apparent from this study that there is a fine line between what now constitutes “academic” activity and “recreational” activity in the library. sites like facebook seem to blur this line fur­ ther and librarians do not seem eager or find it necessary to distinguish between the two unless absolutely pressed (e.g., asking a student to sign out of facebook when other patrons are waiting to use computers for academic work). one area of attention this study points to is a lack of con­ cern among librarians toward the internet and privacy issues. some individuals surveyed suggested that librari­ ans play a larger role in making students aware that people outside their society of friends—namely, administrative or authority figures—have the ability to access the informa­ tion they post online to social networks. participants were most enthusiastic about facebook’s role as a space where students in the same institution can connect and share a common collegiate bond. librarians who have not yet “checked out” facebook might consider one individual’s description of the site as “just another ver­ sion of the college yearbook that has become interactive.”42 among the most cherished books in hcl that document campus life at jsu are the mimosa yearbooks. alumni and students regularly flip through this treasure trove of pho­ tographs and memories. no administrator or librarian would dare weed this collection or find its presence irrele­ vant. while year books archive campus yesteryears, online social networks are dynamically documenting the here and now of campus life and shaping the future of how we communicate. as casey writes, “libraries are in the habit of providing the same services and the same programs to the same groups. 
we grow comfortable with our provision and we fail to change."42 by exploring popular new types of internet services such as facebook instead of quickly dismissing them as irrelevant to librarianship, we might learn new ways to reach out and communicate better with a larger segment of our users.
■ acknowledgements
the authors would like to acknowledge stephanie m. purcell, student worker at the houston cole library, for her excellent editing suggestions and insight into online social networks from the student's point of view, and john-bauer graham, head of public services at the houston cole library, for his encouragement.
references and notes
1. angela reid, "finally . . . the facebook," the chanticleer, sept. 22, 2005, 4.
2. facebook.com, http://www.facebook.com/about.php (accessed dec. 2, 2005).
3. angus loten, "the great communicator," inc.com, june 6, 2006, http://www.inc.com/30under30/zuckerberg.html (accessed dec. 4, 2005).
4. adam lashinsky, "facebook stares down success," fortune, nov. 28, 2005, 4.
5. michael arrington, "85 percent of college students use facebook," techcrunch (sept. 7, 2005), http://www.techcrunch.com/2005/09/07/85-of-college-students-use-facebook (accessed dec. 2, 2005).
6. http://www.facebook.com/about.php.
7. facebook us! if you are a registered member of facebook, do a global search for "laurie charnigo" or "paula barnett-ellis."
8. stephen downes, "semantic networks and social networks," the learning organization 12, no. 5 (2005): 411.
9. ibid.
10. tim o'reilly, "what is web 2.0?" http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html (accessed aug. 6, 2006).
11. http://www.facebook.com/about.php.
12. angela provitera mcglynn, "teaching millennials, our newest cultural cohort," the education digest 71, no. 4 (2005): 13.
13. laura garton, caroline haythornthwaite, and barry wellman, "studying online social networks," journal of computer mediated communication 31, no. 4 (1997).
14. facebook.com's "about" page archives a collection of college newspaper articles about facebook: http://www.facebook.com/about.php (accessed dec. 4, 2005).
15. brock read, "think before you share," the chronicle of higher education, jan. 20, 2006, a38–a41.
16. ibid., a41.
17. ibid., a40.
18. shawn mcguirk, "facebook on campus: understanding the issues," magna web seminar presented live on june 14, 2006. transcripts available for a fee from magna pubs: http://www.magnapubs.com/catalog/cds/598755-1.html (accessed aug. 2, 2006).
19. tracy mitrano, "thoughts on facebook" (apr. 2006), cornell university office of information technologies, http://www.cit.cornell.edu/oit/policy/memos/facebook.html (accessed june 22, 2006).
20. ibid., "conclusion."
21. tabreez govani and harriet pashley, "student awareness of the privacy implications when using facebook," unpublished paper presented at the "privacy poster fair" at the carnegie mellon university school of library and information science, dec. 14, 2005, 9, http://lorrie.cranor.org/courses/fa05/tubzhlp.pdf (accessed jan. 15, 2006).
22. ralph gross and alessandro acquisti, "information revelation and privacy in online social networks," paper presentation at the acm workshop on privacy in the electronic society, alexandria, va., nov. 7, 2005, 79, http://portal.acm.org/citation.cfm?id=1102214 (accessed nov. 30, 2005).
23. frederic stutzman, "an evaluation of identity-sharing behavior in social network communities," paper presentation at the idmaa and ims code conference, oxford, ohio, april 6–8, 2006, 3–6, http://www.ibiblio.org/fred/pubs/stutzman_pub4.pdf (accessed may 23, 2006).
24. gross and acquisti, "information revelation and privacy in online social networks," 73.
25. "myspace: design anarchy that works," business week, jan. 2, 2006, 16.
26. julian aiken, "hands off myspace," american libraries 37, no. 7 (2006): 33.
27. ibid.
28. jessi hempel and paula lehman, "the myspace generation," business week, dec. 12, 2005, 94.
29. http://www.facebook.com/about.php.
30. hempel and lehman, "the myspace generation," 87.
31. the authors created the "librarians and facebook" group on facebook to discuss issues concerning facebook and librarianship, such as censorship issues, policies, and ideas for connecting with students through facebook. this is a global group. if you have a facebook account, we invite you to do a search for "librarians and facebook" and join our group.
32. john c. dvorak, "academics get to work!" pc magazine online, http://www.pcmag.com/article2/0,1895,1928970,00.asp (accessed feb. 21, 2006).
33. michael j. bugeja, "facing the facebook," the chronicle of higher education, jan. 27, 2006, c1–c4; ibid.
34. maria tess shier, "the way technology changes how we do what we do," new directions for student services 112 (winter 2005): 83–84.
35. ibid., 84.
36. shier, "the way technology changes how we do what we do," 112; j. duboff, "'poke' your prof: faculty discovers thefacebook.com," yale daily news, mar. 24, 2005, http://www.yaledailynews.com/article.asp?aid=28845 (accessed jan. 15, 2006); mingyang liu, "would you friend your professor?" duke chronicle online, feb. 25, 2005, http://www.dukechronicle.com/media/paper884/news/2005/02/25/news/would.you.friend.your.professors-1472440.shtml?norewrite&sourcedomain=www.dukechronicle.com (accessed jan. 15, 2006).
37. brittany farb, "students can 'check out' new librarian on the facebook," student life (washington univ. in st. louis), feb. 27, 2006, http://www.studlife.com/home/index.cfm?event=displayarticle&ustory_id=5914a90d-53b (accessed feb. 27, 2006).
38. brian s. mathews, "do you facebook? networking with students online," college & research libraries news 37, no. 5 (2006): 306.
39. ibid., 307.
40. view the "houston cole library users want answers!" group by doing a search for the group title on facebook.
41. nces compare academic libraries, http://nces.ed.gov/surveys/libraries/compare/peervariable.asp (accessed dec. 2, 2005). the random sample was chosen using the research randomizer available online, http://www.randomizer.org/form.htm (accessed dec. 2, 2005).
42. michael e. casey and laura c. savastinuk, "library 2.0," library journal 131, no. 14 (2006): 40.
appendix a: survey on the impact of facebook on academic libraries
1. has your institution been added to the facebook directory?
□ yes
□ no (skip to questions 10, 11, and 12)
□ not sure (skip to questions 10, 11, and 12)
□ i am not familiar with facebook (skip all questions and submit)
2. which best describes your involvement with facebook?
□ i have a personal account
□ my library has an account
□ no involvement
3. which best describes your observation of student use of library computers to access facebook?
□ all the time
□ most of the time
□ some of the time
□ rarely
□ never
4. has your library added additional equipment such as computers or scanners as a result of facebook use?
□ yes
□ no
□ no, but we plan to in the future
5. have patrons complained about other patrons using library computers for facebook?
□ yes
□ no
□ not sure
6. has your library had to develop a policy or had to address computer use concerns as a result of facebook use?
□ yes
□ no
□ not sure
7. if your library provides public access to a scanner, has patron use of scanners increased due to the use of facebook?
□ yes
□ no
8. have you assisted students with the library's scanner for facebook?
□ yes
□ no
9. if you have provided assistance to students with facebook, please check all that apply:
□ creating accounts
□ scanning photographs or offering advice on where students can access a scanner
□ editing photographs (e.g., resizing photos or use of a photo editor)
□ uploading photographs to facebook profiles
□ other __________________________________
10. check the responses that best describe your opinion about the responsibilities of librarians in assisting students with facebook questions and access to the web site:
□ student use of facebook on library computers should not be regulated.
□ library resources should not be monopolized with facebook use.
□ computer use for academic purposes should take priority, when needed, over use of facebook.
□ librarians should help students, when able, with facebook questions.
□ librarians need to "keep up" with internet trends, such as facebook, even if they are not academic in nature.
□ there is no connection between librarians, libraries, and facebook.
□ library computers should be available for facebook use, but librarians should not feel that they need to assist students with facebook questions.
11. would you consider facebook to be a relevant academic endeavor?
□ yes
□ no
□ not sure
12. if you answered "yes" to question 11, please describe how facebook could be considered an academic endeavor.
______________________________________________
13. please check all answers that best describe what effect, if any, use of facebook in the library has had on library services and operations:
□ has increased patron traffic
□ has increased patron use of computers
□ has created computer access problems for patrons
□ has created bandwidth problems or slowed down internet access
□ has generated complaints from other patrons
□ annoys library faculty and staff
□ interests library faculty and staff
□ has generated discussion among library faculty and staff about facebook
14. is privacy a concern you have about students using facebook in the library?
□ yes
□ no
□ not sure
please list any observations, concerns, or opinions you have regarding facebook use in libraries.
editorial continued from page 3
extracted the paragraphs from my palm to my desktop, and saved that document and the tocs on a universal serial bus (usb) key. today, i combined them in a new document on my laptop and keyed the remaining paragraphs in my room at an inn on a pier jutting into commencement bay in tacoma on southern puget sound. i sought inspiration from the view out my window of the water and the fall color, from old crow medicine show on my ipod, and from early sixties beyond the fringe skits on my treo.
fred kilgour was committed to delivering information to users when and where they wanted it. libraries must solve that challenge today, and i am confident that we shall.
metasearching and beyond: implementation experiences and advice from an academic library
gail herrera (gherrera@olemiss.edu) is assistant dean for technical services & automation and associate professor at the university of mississippi.
in march 2003 the university of mississippi libraries made our metasearch tool publicly available. after a year of working with this product and integrating it into the library web site, a wide variety of libraries interested in our implementation process and experiences began to call. libraries interested in this product have included consortia, public, and academic libraries in the united states, mexico, and europe. this article was written in an effort to share the recommendations and concerns given. much of the advice is general and could be applied to many of the metasearch tools available. google scholar and other open web initiatives that could impact the future of metasearching are also discussed.
many libraries are looking for ways to facilitate the discovery process for users. implementing a one-stop search product that does not require database-specific knowledge is one of the paths libraries are choosing.1 as these search engines are made available to patrons, the burden of design falls to the library as well as to the product developers. most library users may be familiar with a few databases, but the vast majority of electronic resources remain unrevealed. using a metasearch product, a single search is broadcast out to similar and divergent electronic resources, and search results are returned and typically mixed together. metasearch results are returned in real time and link the user to the native interface. although there are many products that support one-stop searching, the university of mississippi libraries chose to purchase innovative interfaces' metafind product because it tied into a digital initiative partnership with innovative. some of the types of resources you can search include:
■ library catalogs
■ licensed databases
■ locally created databases
■ full text from journals and newspapers
■ digital collections
■ selected web sites
internet search engines
the simplicity of google searching is very appealing to users. in fact, users have come to expect this kind of empowering tool. at the university of mississippi, students use and have been using google for research. as google scholar went public, it became evident that university faculty also use it for the same reasons. it was apparent from the university of mississippi libraries' 2003 libqual+ survey results that users would like more personal control than the library was offering (table 1). unintentionally elaborate mazes are created and users become lost in a quagmire of choices.
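the broadcast-and-merge model described above can be sketched in a few lines. the code below is illustrative only and assumes nothing about innovative interfaces' metafind internals: the connector functions are invented stand-ins for vendor search interfaces, the query is sent to all of them in parallel, and the merged hits keep a source label so a user could link back to the native interface.

```python
from concurrent.futures import ThreadPoolExecutor

# invented connectors; a real metasearch tool would call each vendor's search interface
def search_catalog(query):
    return [{"source": "library catalog", "title": f"record matching '{query}'"}]

def search_fulltext_database(query):
    return [{"source": "full-text database", "title": f"article matching '{query}'"}]

def search_digital_collections(query):
    return [{"source": "digital collections", "title": f"item matching '{query}'"}]

CONNECTORS = [search_catalog, search_fulltext_database, search_digital_collections]

def metasearch(query):
    """broadcast one query to every connector and merge whatever comes back."""
    with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
        result_sets = pool.map(lambda connector: connector(query), CONNECTORS)
    return [hit for results in result_sets for hit in results]

for hit in metasearch("james meredith"):
    print(hit["source"], "->", hit["title"])
```

the parallel broadcast is why a single box can reveal databases the user never thought to open, and also why bandwidth and response-time issues surface at the library rather than at any one vendor.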
as indicated by our libqual+ survey results, our users want easy-to-use tools that allow them to find information on their own, and they want information to be easily accessible for independent use. these are clearly two areas that many libraries are struggling to improve for their patrons. the question is how to go about it. based on several changes made between 2003 and 2005, which included implementing a metasearch tool, the adequacy mean improved for both questions and for undergraduates as well as graduate students and faculty (table 2). the adequacy mean compares the minimum level of service that a user expects with the level of service that they perceive. in table 1, the negative adequacy mean figures indicate that the library was not meeting users' minimum level of service for these two questions or that the perceived level of service was lower than the minimal level of service. table 2 compares the adequacy mean from 2005 with 2003 and indicates a notable, positive change in adequacy mean for each question and with each group.
■ design perspectives and tension
generally, there are conflicts within libraries regarding the question of how to improve access for patrons and allow for independent discovery. for those leading a metasearch implementation, these tensions are important to understand. in implementing new technologies, there are key development issues that may decrease internal acceptance until they are addressed. however, one may also find that there are some underlying fears regarding this technology. although the following cross-subculture comparisons simply do not do justice to each of the valid perspectives, these brief descriptions highlight the types of perspectives one might encounter when considering or implementing a metasearch product.
expert searchers prefer native interfaces and all of the functionalities of the native interface. they are typically unhappy with the "dumbed-down" or clunky searching of a metasearch utility. they would prefer for patrons to be taught the ins and outs of the database they should be using for their research. this presupposes that the students either know which database to use, will spend time investigating each database on their own, or that they will ask for assistance. however, there are clearly native interface functionalities—such as limiting to full text—that, while wonderful to patrons, are not consistent across resources or a part of the metasearch standard. users would certainly benefit if limiting to full text were ubiquitous among vendors and if there were some way to determine full-text availability within metasearch tools. results ranking is another issue that expert searchers may bring to the table. currently, there is a niso metasearch initiative that is striving to standardize metasearching.2 another downside for the expert searcher is that there is no browse function.
those who are in administrative or managerial positions working with electronic resources see metasearching as an opportunity to reveal these resources to users who might not otherwise discover them.
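for readers unfamiliar with libqual+ arithmetic, the adequacy mean is simply the average perceived-service rating minus the average minimum-acceptable rating, so a negative value means the library is falling below users' minimum expectations. the sketch below uses invented 1-to-9 ratings, since the article reports only the resulting means.

```python
# invented 1-to-9 libqual+ ratings for one survey question
minimum_expected = [6, 7, 5, 8, 6]   # lowest service level each user would accept
perceived        = [6, 6, 5, 7, 7]   # service level each user felt was delivered

def adequacy_mean(minimum, perceived):
    """mean perceived rating minus mean minimum-acceptable rating."""
    gaps = [p - m for m, p in zip(minimum, perceived)]
    return sum(gaps) / len(gaps)

print(f"adequacy mean: {adequacy_mean(minimum_expected, perceived):+.2f}")
# a negative result (as in table 1 for 2003) means the minimum expectation was not met
```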
for example, many users have learned to search ebsco’s academic search premier not realizing that key articles on a local civil rights figure such as james meredith are also available in america: history & life, jstor, and lexisnexis. metasearching removes the need for the user to spend additional time choosing databases that seem relevant and searching them indi­ vidually. from a financial perspective, if a library is pay­ ing for these electronic resources, they should be using them as much as possible. and while the university of mississippi libraries generally target the undergraduate audience with our metasearch tool, the james meredith search is a good example of how a metasearch tool might reveal other databases with information that a serious researcher could then further investigate by link­ ing through the citation to the native interface. those associated with library instruction may also be uncomfortable with metasearching. in fact within a short time of implementing the product, several instructors conveyed their fear that in making searching so simple, they would no longer have a job as the product developed. generally, it seems that users are always in need of instruc­ tion although the type of instruction and the tools continue to change. it is an understandable fear and one that would be wise to acknowledge for those embarking on a metasearch implementation. while metasearch can be an empowering tool for users, you may also encounter some emotional reactions among library employees. from an information literacy point of view, frost has noted that metasearching is “a step backward” and “a way of avoiding the learning process.”3 it is true that in providing an easy search tool, the library is not endeavoring to teach all students intermedi­ ate or advanced information retrieval knowledge or skills. however, it is important to provide tools that meet users at their level of expertise and as previously noted, this is an area identified in need of improvement. for those working at public service points such as the reference desk, metasearching is an adjustment. many times those working with patrons tend to use databases with which they are more familiar or in which they feel more confident. federated search tools may reveal resources that are typically less used and therefore unfa­ miliar to library employees. training may then become an issue worthy of addressing not just for the metasearch interface and design but also for the less­used resources. for those involved in technical support, this product may range from exciting to exasperating. the amount of time your technical support personnel have to dedicate to your metasearch project should be a major factor when investigating the available products. just like any other technological investment, you are either going to (1) purchase the technology and outsource manage­ ment or (2) obtain a lesser price from a vendor for the tool and invest in developing it yourself. there is also a middle ground, but this cost­shifting is important to keep in mind. regardless of your approach, it is critical to include the technical support person on your imple­ mentation team and to keep in mind the kind of time investment that is available when reviewing prices. along with developing this product, one may also find oneself investing additional time and money into infra­ structural upgrades such as the proxy server, network equipment, or dns servers. 
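the cost-shifting just described can be made concrete with a rough comparison of total cost under the two approaches. the dollar figures and staff-hour estimates below are invented for illustration; only the buy-versus-build framing comes from the discussion above.

def total_cost(license_per_year, staff_hours_per_year, hourly_rate, years=3):
    # simple three-year total: subscription plus local staff time
    return years * (license_per_year + staff_hours_per_year * hourly_rate)

# hypothetical figures: a pricier vendor-managed product needing little local
# work versus a cheaper license that shifts development time onto local staff
vendor_managed = total_cost(license_per_year=25000, staff_hours_per_year=100, hourly_rate=30)
self_developed = total_cost(license_per_year=12000, staff_hours_per_year=600, hourly_rate=30)

print(f"vendor-managed, 3 years:  ${vendor_managed:,.0f}")
print(f"self-developed, 3 years:  ${self_developed:,.0f}")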
table 1. 2003 libqual adequacy mean (undergrad / grad / faculty)
  easy-to-use access tools that allow me to find things on my own: -.10 / -.30 / -.29
  making information easily accessible for independent use: .37 / -.09 / .03

table 2. positive change in libqual adequacy mean from 2003 to 2005 (undergrad / grad / faculty)
  easy-to-use access tools that allow me to find things on my own: .53 / .46 / .24
  making information easily accessible for independent use: .22 / .22 / .45

in addition to these perspectives, there is a general tension in library web site design philosophies between how librarians would like patrons to use their services and what patrons want. the traditional design based on educating users and having users navigate to information "our way" has definitely been curtailed over the past several years, with attention being paid increasingly to usability. as usability studies give librarians increasing information, libraries are moving toward designing for our users based on their approaches and needs rather than how librarians would have them work. depending on where one's library is in this spectrum of design philosophy, implementing a metasearch tool may be harder or easier. judy luther surmised the situation well, "for many searchers, the quality of the results matter less than the process—they just expect the process to be quick and easy."4 moving toward this lofty goal is to some extent dictated by the abilities and inabilities of the technologies chosen. as a technologist, the general rule seems to be that the easier navigation is made for our users, the more complex the technical structure becomes.

n metasearch categories

in arranging categories of searches for a metasearch product, some libraries group their electronic resources by subject, and others use categories that reflect full-text availability. the university of mississippi libraries use both. the most commonly used category is our full-text category. this full-text category was set as the default on our most popular search box located on our articles and databases web page (figure 1). since limiting to full-text materials is not a standard, the category was defined by the percentage of full text each resource contains. this is an important distinction to understand because a user may receive results that are not full-text, but the majority of results will likely be full-text. at our library, if the resource contains more than 50 percent full-text, it is included in the full-text category. other categories included in this implementation are ready reference, library catalogs, digital collections, limited resources, publicly available databases, and broad subject categories. one electronic resource may be included in the full-text category, a broad subject category such as "arts and humanities," and also have its own individual category in order to mix and match individual resources on subject guides using a tailor-made search box. the limited resource category contains resources that should be searchable using the metasearch tool but that have a limited number of simultaneous users. if such a resource were included in the heavily used default full-text category, it would be tied up too much. investigating resources with only one or two simultaneous users at the beginning of the project may help you avoid error messages and user frustration. a brief sketch of this category logic is shown below.
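the resource names and full-text percentages in the sketch are hypothetical; the one rule taken directly from this implementation is that a resource joins the default full-text category when more than 50 percent of its content is full text, while free and seat-limited resources are kept in their own categories and a resource may also appear in one or more subject categories.

resources = {
    "academic search premier": {"full_text_pct": 80, "subjects": ["social science"]},
    "jstor": {"full_text_pct": 100, "subjects": ["arts and humanities"]},
    "pubmed": {"full_text_pct": 10, "subjects": ["science and engineering"], "free": True},
    "small society index": {"full_text_pct": 60, "subjects": ["social science"],
                            "simultaneous_users": 2},
}

def build_categories(resources):
    categories = {"full text": [], "publicly available": [], "limited resources": []}
    for name, info in resources.items():
        if info.get("free"):
            # free resources return fast and would crowd the top of the result
            # list, so they get their own category instead of the default
            categories["publicly available"].append(name)
        elif info.get("simultaneous_users"):
            # resources with one or two seats stay out of the default full-text
            # category so every search does not tie them up
            categories["limited resources"].append(name)
        elif info["full_text_pct"] > 50:
            categories["full text"].append(name)
        for subject in info["subjects"]:
            categories.setdefault(subject, []).append(name)
    return categories

for category, members in build_categories(resources).items():
    print(f"{category}: {', '.join(members)}")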
one might wonder, “why profile limited resources then?” there may be specific search boxes on subject guides where librarians decide to add that individual but limited resource. it might also be necessary to shorten the time­out period for limited user resources. along those same lines, having pay­per­search resources profiled could also be expensive and is not recommended. since the initial implementation, migrating away from per­ search resources has become a priority. within the first few months of implementation, the free resources such as pubmed and askeric were moved to a new “publicly available” category. the reason is that since there is not any authentication involved, these results return very quickly and are always the first results a user sees. while they are important resources, our intent was really to reveal our subscription resources. this approach allows users to search these resources if specifically chosen but they are not included in the default full­text category. this approach does still allow subject librarians to mix and match these free individual resources on subject guide search boxes. n response time of all of the issues with our metasearch tool, response time has been the most challenging. there are so many issues when it comes to tracking down sluggish response that it can be extremely difficult to know where to start. if one’s metasearch software is not locally hosted, response time could involve the library network, campus network, off­campus network provider, and the vendor’s network, not to mention the networks of all the electronic resources users are searching. when one adds the other variable of authentication, the picture becomes even more over­ whelming and difficult to troubleshoot. for authentication, the university of mississippi libraries purchased innovative’s web access management module (wam), which is based on the figure 1. metasearch tailored search box with full text category selected metasearching and beyond | herrera 47 ezproxy software. as the use of our electronic resources from on­campus and off­campus has grown, the inci­ dence of increasing network issues has risen. in work­ ing with our campus telecommunications group, the pursuit of ever­greater bandwidth has become a priority. troubleshooting has included tracking down trouble­ some switch settings, firewall settings, as well as campus dns and vendor dns issues. if your network adminis­ trators use packet shapers, this may be another hurdle. clearly, our metasearch product has placed a significant load increase on the proxy server. in looking at proxy statistics, 24 percent of total proxy hits were from the metasearch product (figure 2). with this in mind, one may find the load on one’s proxy server increasing very dramatically during peak usage and may need to plan for upgrades accordingly. even with improvements and tweaks along the way, response time is still an issue and one of the highest hurdles in selling a metasearch product internally and externally. one metasearch statistical module includes response time information for individual resources along with usage data. the response time information would be very helpful in troubleshooting and in working with electronic resource vendors. usage tracking is another criterion to consider in reviewing metasearch products. n response time and tailored search boxes during implementation, one of the first discussions to have is who will be the target audience for this product. 
at this institution, undergraduates were the target audi­ ence and more specifically, those looking for three to five articles for a paper. while our metasearch software has a master screen showing all of the resources divided into the main categories, facing users with over sixty check boxes was not a good solution (figure 3). this master screen is good for demonstrating categories to library staff, overall functionality of the technology, and also for quickly checking all of your resources for connectivity errors. from early conversations with students, keeping basic users far away from this busy screen is a good goal. remember, the purpose is to give them an easy starting point. the best way to keep users in a simple search box is to construct search boxes and hand­pick either individual resources or categories keep­ ing in mind the context of the web page. for example, the articles and databases page has a simple search box that searches for articles. subject guide boxes search individual electronic resources selected by the subject librarian. the university of mississippi libraries also have a large col­ lection from the american institute of certified public accountants (aicpa). the search box on that page searches our catalog, which contains aicpa books along with the aicpa digital collection. some libraries are interested in developing a standard metasearch box to display as a widget or standing content area throughout their web site. this is interesting and worth considering. however, matching the web page content with appropri­ ate resources has been our approach. as the standards and technology develop, this may be worth further con­ sideration depending on usability findings. for the most commonly used search box on the articles and databases page (figure 1), the default category checked is the full­ text articles category. donna fyer stated that, “for the average end user, the less decision making, the better.”5 this certainly rings true for our users. originally, a simple metasearch search box was placed on the library homepage. the library catalog and the basic metasearch box were both displayed. this seemed confusing for users since both products have search capabilities. with the next web site redesign, the basic metasearch box moved from the library homepage to the articles and journals web page. this was a success­ ful place for the article quick search box to reside since the default was set to search the full­text category. there were some concerns that users might be typing journal titles into the search box but these were rare instances and not necessarily inappropriate uses. the next rede­ sign eventually moved this search box to the articles and databases page, where it remains. for the articles and databases pages, the simple search box (figure 1) by default searches the full­text category and searches the title keyword index. the index category with the label, “article citations,” can also be checked by the user. the majority of metasearches begin with this search box and figure 2. total proxy hits vs. metafind proxy hits 4� information technology and libraries | june 2007 most users do not change the default settings for the resources or the index. n subject guide search boxes in addition to the “article quick search” box, subject librarians slowly became interested in a search box for their subject guides as the possibili­ ties were demonstrated. 
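a tailored search box of the kind subject librarians asked for can be sketched as a small html-generating function. the form field names and target url below are placeholders and not metafind's actual parameters; the point is simply that each box hand-picks resources, marks some as checked by default, and sets a default index.

def render_search_box(action_url, resources, default_index="title keyword"):
    """resources is a list of (resource_id, label, checked_by_default)."""
    lines = [f'<form action="{action_url}" method="get">',
             '  <input type="text" name="query" size="40">',
             f'  <input type="hidden" name="index" value="{default_index}">']
    for resource_id, label, checked in resources:
        checked_attr = " checked" if checked else ""
        lines.append(f'  <label><input type="checkbox" name="resource" '
                     f'value="{resource_id}"{checked_attr}> {label}</label>')
    lines.append('  <input type="submit" value="search">')
    lines.append('</form>')
    return "\n".join(lines)

# hypothetical subject-guide box: two resources, one checked by default
guide_box = render_search_box(
    "/metasearch",  # placeholder url, not the product's real endpoint
    [("res_a", "database a", True),
     ("res_b", "database b", False)])
print(guide_box)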
in order to do this, the ven­ dor was asked to profile each resource with its own unique value in order to mix and match individual resources. while the idea of searching resources by subject category sounds useful and appealing, sometimes universal design begets universal dis­ cord. even with a steering committee involved, it is hard for everyone to agree what resources should be in each of the main subject categories: arts and humanities, science and engineering, business and economics, and social science. some libraries have put a lot of time and effort into creating a large number of subject categories. the master search screen (figure 3) displays several of this library’s categories but not the broad subject categories noted above. these general sub­ ject categories are brought out in the multipurpose interface called the “library search engine” (figure 4). the library search engine design is a collection of the categories and resources showing the full functionality of our metasearch tool. the subject categorization approach within our metasearch interface is a good way to show the multifunction­ ality of the product but remains relatively unused by patrons. by giving each resource its own value, subject librarians have the flexibility to select spe­ cific resources and/or categories for their subject guides. it is worth noting that it required additional setup from our vendor and was not part of the original implementation. after a few months of testing with the initial implemen­ tation, willing subject librarians chose individual resources for their tailored search boxes. once a simple search box has been constructed, it can be easily copied with minor modi­ fications to make search boxes for those requesting them. while progress was slow to add these boxes to subject guides, after about a year there was growing interest. in setting these up, subject librarians have several choices to make. first of all, they choose the resources that will be searched. for example, the biology subject guide search box searches academic search premier, bioone, and jstor by default. basicbiosis and pubmed are also avail­ able but are not checked by default. users can check these search boxes if they also wish to search these resources. choosing the resources to include in the search box as well as setting what resources are checked by default is the most important decision. the subject librarian is also encour­ aged to assist in evaluating the number of hits per resource returned. with response time being a critical factor, deter­ mining the number of hits per resource should involve testing and take into consideration the overall number of resources being searched. n relevance selecting the default index is another decision in setting up search boxes. again, users are google­oriented and tend to go with whatever is set as the default option. out of the box, our metasearch tool defaults to the keyword index or keyword search. the issue of relevancy is a hot topic for metasearch products. this issue typically comes up in metasearch discussions. it is also listed as an issue in the niso metasearch initiative. from the technical side of the equation, results are displayed to the user as soon as they are retrieved. this allows users to begin immediately exam­ figure 3. master screen display (partial screenshot) figure 4. library search engine subject categories metasearching and beyond | herrera 4� ining the results. 
adding a relevancy algorithm as a step would mean all of the results would have to be returned, ranked, and then displayed. with response time being a key issue, a faster response is more important than relevance. another consideration is if the metasearch results are displayed to the user as interfiled or by electronic resource where the resource is returning results based on its own relevancy rankings. one way to increase relevance is to change the default index from keyword to title keyword. for our students, bringing back keywords in the title made the results more relevant. this is the default index used for our article search on the articles and database web page. subject librarians have the choice of indexes they prefer when blending resources. one caveat in using title keyword is that there are resources that do not support title keyword searching. for other resources, title keyword is not an appropriate index. for example, wilson biographies does not have a title keyword search. it makes perfect sense that a biography database would not support title keyword searching. in these cases, the search may fail and note that the index is not supported. to accommodate this type of exception, the profile for wilson biographies needed to be changed to have the title keyword search­mapped to a basic keyword search. while this does not make the results as relevant as the other search results, it keeps any errors from appearing and allows results to be retrieved. n results per source and per page for metafind, there are also two minor controls that can work as hidden values unseen by the patron or as compo­ nents within the search box for users to manipulate. the first control is the number of hits to return per resource. if a subject librarian is only searching two or three resources in his tailored search box, he probably will want to set this number higher. if there are many resources, this number should be lower in order to keep response time reasonable. the second control is the number of results to return per page. in general, it is important to adjust these controls after testing the response for the resources selected. while users typically use the default settings, showing these two con­ trols gives the user a visual clue that the metasearch tool is not retrieving all of the results from the resource. instead, it is only retrieving the first twenty­five, for example. n implementation advice one of the most important pieces of advice is that it is extremely important to have a date in one’s contract or rfp for all of the profiling to be completed if the vendor is doing the resource profiling. from this library’s experi­ ence, the profiling of a resource can take a very long time, and this is a critical point to include in the contract. one might also consider adding cost and turn­around time for new resources after the initial implementation to the contract. the more resources profiled, the more useful the product. however, one also needs to pay attention to response time. if the plan is to profile one’s own resources or connectors, librarians should be mindful of the time involved and ask other libraries with the same product about time investments. being able to work with vendors who will provide an opportunity to evaluate the product “live” is preferable. in deciding who to target for an implementation team, consider representatives from reference, collection development, and systems. 
it is also very important to include whoever manages electronic resource access/ subscriptions and a web manager. in watching other pre­ sentations, exclusion of any of these representatives can seriously undermine the implementation. buy­in is essen­ tial to success. additionally, giving librarians as many options as possible, such as control over what types of resources are in their search boxes as well as the number of hits per resource makes the product more appealing. n questions to ask once the implementation team is set, interviewing refer­ ences for the products under consideration is an impor­ tant part of the process. unstructured conversations with references really allow librarians to explore together what the group wants and how its needs fit with the services the vendor offers. a survey of questions via e­mail is another possibility. in choosing this method, be sure to leave some room for open comments. regardless of the approach, it is important to spend some time asking ques­ tions. provided are a list of recommended questions: n who is responsible for setting up each resource—the vendor or you? n how much time does it typically take to set up a new resource and what is the standard cost to add a new resource? n is there a list or database of already­established pro­ files for electronic resources for this product? n how much time would you estimate that it took to implement the product? n will you be able to edit all of the public web pages yourself or will you be using vendor support staff to make changes? if the vendor support staff has to make some of the changes, how responsive are they? 50 information technology and libraries | june 2007 n can you easily mix and match individual resources for subject guides, departmental pages, or other kinds of web pages? or do you only have the option to set up global categories? n is your installation local or does the vendor host it? are there response issues? n is there an administrative module to allow you to maintain categories, resource values, and configura­ tion options? n how much time goes into managing the product monthly? and who manages the product at your library? n what kind of statistical information does the vendor provide? n how satisfied are you with the training, implementa­ tion support, and technical documentation? n how does the vendor handle broken resources or subscription changes? as with most technologies, there are upfront and hid­ den costs. it is important to determine what hidden costs are involved and if you have the resources to support all of the costs. sometimes libraries choose the least expen­ sive product. however, this approach can lead librar­ ies down the path of hidden costs. for example, if the product is less expensive but your library is responsible for setting up new electronic resources, managing all of the pages, and finding ways to monitor and troubleshoot performance outside of the tools provided, the hidden expenditures in time and training may be more costly in the end than purchasing the premium metasearch tool. in essence, one must pay for the product one way or another. the big question is, where are the resources to support the product? if one’s library has more it/web personnel than money, the lower­costing product may be the way to go, but be sure to check with other librar­ ies to see if they have been able to successfully clear this hurdle. 
additionally, if your library has more one­time money than yearly subscription money, this may dictate the details of the rfp, and your library may lean toward a purchase rather than an annual subscription. n metasearch summary clearly, students want a simple starting place for their research. implementing a metasearch tool to meet this need can be a hard sell internally for many reasons. at this institution, response time has been the overriding critical issue. response has lagged due to server and network issues that have been difficult to track down and improve. however, authentication is truly the most time­ consuming and complex part of the equation. some fed­ erated search tools are actually searching locally stored information, which helps with response. while these are not truly metasearch tools and are not performing real­ time searches, this approach may yield more stability with faster response. over the years in implementing new services such as the library web site, illiad, electronic resources, and off­ campus authentication, new services are often adopted at a much faster rate by library users than by library employees. typically, there will be early adopters who use the services immediately based on need. it then takes general users about a year to adopt a new service. iii’s metasearch technology has been available for the past four years. however, our implementation is evolving with each web site redesign. still, it is used regularly. the university of mississippi libraries has been pro­ viding access to its electronic resources in two distinct ways: (1) providing urls on web pages to the native interface of the electronic resource and (2) metasearching. as the library moves forward in developing digital col­ lections and the number of electronic resources profiled for metasearching increases, it is possible that this kind of global discovery tool will compete in popularity with the library catalog. providing such information mining tools to patrons will cause endless frustration for the library literate. response times, record retrieval order, as well as licensing and profiling issues, are all obstacles to pro­ viding a successful metasearch infrastructure. retrieval inconsistency and ad hoc retrieval order of records is very unsettling for librarians. however, this is the kind of tool to which web users have become accustomed and certainly seems to fill a need that to date has been lacking where library electronic resources are concerned. n open web developments one other trend appearing is scholarly research discovery tools on the open web. enter google scholar along with other similar initiatives such as windows live academic search. google scholar beta was released in november 2004 and very soon after began an initiative to work with libraries and their openurl resolvers.6 this bridging between an open web tool and libraries is an interest­ ing development. a fair amount has been written about google scholar to date although the project is still in its beta phase. what does google scholar have to do with metasearching? good question. it remains to be seen how much scholarly information will become search­ able via google scholar. for now, the jury is still out as to whether google scholar will begin to encroach upon the traditional territory of the indexing and abstracting world. 
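the openurl linking mentioned above can be illustrated with a short sketch that turns a citation into a resolver link. the resolver base url is a placeholder for a library's own link resolver, and the key names follow the common openurl key/encoded-value convention rather than any particular vendor's syntax; the sample citation reuses the luther article cited in the references, with the issn supplied only as an illustration.

from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # placeholder address

def openurl_for_article(citation, sid="metasearch:example"):
    # build the key/value pairs a link resolver typically expects for an article
    params = {
        "sid": sid,
        "genre": "article",
        "atitle": citation["article_title"],
        "title": citation["journal_title"],
        "volume": citation["volume"],
        "issue": citation["issue"],
        "spage": citation["start_page"],
        "date": citation["year"],
        "issn": citation["issn"],
    }
    return RESOLVER_BASE + "?" + urlencode(params)

citation = {
    "article_title": "trumping google? metasearching's promise",
    "journal_title": "library journal",
    "volume": "128", "issue": "16", "start_page": "36",
    "year": "2003", "issn": "0000-0000",  # issn is an illustrative placeholder
}
print(openurl_for_article(citation))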
if sufficient content becomes available on the open web, whether from publishers or vendors allowing their metasearching and beyond | herrera 51 content to be included, then the authentication piece that directly effects response time may be overcome. in using google scholar or other such open web portals, search­ ing happens instantly. when a user uses the openurl resolver to get to the full­text, that is where authentication enters into the picture and removes the negative impact on searching. the tradeoff is that there are many issues involved in openurl linking and the standardization of the metadata needed to provide consistent access. there are many parallels between what google scholar is attempting to offer and what the promises of metasearching have been. for metasearching, under­ graduate students looking for their three to five articles for a paper are considered our target audience. for in­ depth searching, metasearching does have limitations, but for the casual searcher looking for a few full­text articles, it works well. interestingly, similar recommen­ dations are being made for google scholar.7 however, opinions differ on this point. roy tennant went so far as to indicate it is a step forward in access to those users without access to licensed databases, but remained reserved in his opinion regarding the usefulness for those with access.8 google scholar also throws in a few bonuses. while providing access to open access (oa) materials in our opac for specific collections such as the directory of open access journals, these same resources have not been included in our metasearch discovery tool. google scholar is searching these open repositories of scholarly informa­ tion, although there is some concern over the automatic inclusion of materials such as syllabi and undergraduate term papers within the institutional repositories.9 google scholar also provides a useful citation feature and rel­ evancy. google scholar recognizes the user’s preference for full­text access and provides a visual cue from the brief results when article full­text is available. this func­ tionality is not currently available from our metasearch software but would be extremely helpful to users. on the downside, some of google scholar’s linking policies make it difficult for libraries to extend services beyond full­ text articles to their users. another notable development among subscription indexing services is the ability to reveal content to web search engines. ebsco’s initiative is called ebscohost connection.10 in implementing metasearching, libraries have debated about providing access to free versus subscrip­ tion resources. for our purposes, free resources were not included in the most commonly used search in the full­ text category. there are those who would argue against this decision, and they have very good points. in fact, it has already been noted that some libraries use google scholar to verify incomplete interlibrary loan citations quickly.11 in watching the development of google scholar, it seems possible that this free tool that uncovers free open access resources and institutional repository mate­ rials may not necessarily be a competitive product, but may be a very complementary one. n impact on the opac what will this mean for the “beloved” opac? for a very long time, users have expected more of the library catalog than it has provided. 
while the library catalog is typically appreciated by library personnel, its usefulness for finding materials other than books has been hard for general users to understand. many libraries including the university of mississippi have been loading records from their electronic resources in hopes of making the library catalog more useful. the current conversation regarding digital library creation also begs the question, “what is the library catalog?” although the library catalog serves as a searchable inventory of what the library owns, it is simply a pointing mechanism, whether it points the user to a shelf, a building, or a url. in our endeavor to provide instant gratification and full­text, as well as the user’s desire for information regardless of format, the library catalog is beginning to take a backseat. it was clear four years ago in plan­ ning digital collections that a metasearch tool would be needed to tie together subscription resources, digital collections, publicly available resources, and the library catalog. it will be interesting to see whether patrons choose to use the formal tools provided by the library or the informal tools developing on the open web, such as google scholar, to perform their research. more than likely, discovery and access will happen through many avenues. while this may complicate the big picture for those in library instruction, it is important to meet users on the open web. one’s best intentions and designs are presented to users but they may choose unintended paths. librarians should watch the paths they are taking and build upon them. sometimes even one’s best attempts fall short, as pointed out clearly in karen schneider’s latest series, “how opacs suck.”12 still it is important to acknowl­ edge design shortcomings and keep forging ahead. dale flecker, who spoke at the taiga forum, recommended not to spend years trying to “get it right” before imple­ menting, but instead to consider ourselves in perpetual beta and simply implement and iterate.13 in other words, do not try to make the service perfect before implement­ ing it. most libraries do not have the time and resources to do this. instead, find ways to gain continual feedback and constantly adjust and develop. students are familiar with internet search engines and do not want to choose between resources. access to a simple resource discovery tool is an important service for users. unfortunately, authentication, product design 52 information technology and libraries | june 2007 and management, and licensing restrictions tend to be stumbling blocks to providing fast and comprehen­ sive access. regarding the metasearch tool used at the university of mississippi libraries, development part­ nerships have already been formed between the vendor and a few libraries to improve upon many of the issues discussed. innovative is developing a next­generation metasearch product called research pro that leverages ajax technology. while efforts are made to participate in discussions and develop our already­existing tools, it is also impor­ tant to pay attention to other developments such as google scholar. at this point, google scholar is in beta but this kind of free searching could turn the current infra­ structure on its ear to the benefit of patrons. the efforts to meet users on the open web and reveal scholarly content are definitely worth keeping an eye on. references 1. roland dietz and kate noerr, “one­stop searching bridges the digital divide,” information today 21, no. 7 (2004): s24. 2. 
niso metasearch initiative, http://www.niso.org/committees/ms_initiative.html (accessed may 8, 2006). 3. william j. frost, "do we want or need metasearching?" library journal 129, no. 6 (2004): 68. 4. judy luther, "trumping google? metasearching's promise," library journal 128, no. 16 (2003): 36. 5. donna fyer, "federated search engines," online 28, no. 2 (2004): 19. 6. jill e. grogg and christine l. ferguson, "openurl linking with google scholar," searcher 13, no. 9 (2005): 39–46. 7. mick o'leary, "google scholar: what's in it for you?" information today 22, no. 7 (2005): 35–39. 8. roy tennant, "is metasearching dead?" library journal 130, no. 12 (2005): 28. 9. o'leary, "google scholar." 10. what is ebscohost connection?, http://support.epnet.com/knowledge_base/detail.php?id=2716 (accessed may 10, 2006). 11. laura bowering mullen and karen a. hartman, "google scholar and the library web site: the early response by arl libraries," college & research libraries 67, no. 2 (2006): 106–22. 12. karen g. schneider, "how opacs suck," ala techsource, http://www.techsource.ala.org/blog/karen+g./schneider/100003/ (accessed may 10, 2006). 13. dale flecker, "my goodness, life is different," presentation to the taiga forum, mar. 27–28, 2006, http://www.taigaforum.org/pres/fleckerlifeisdifferenttaiga20060327.ppt (accessed may 10, 2006).

the recon pilot project: a progress report, october 1970-may 1971
henriette d. avram and lenore s. maruyama: marc development office, library of congress, washington, d.c.

synopsis of three progress reports on the recon pilot project submitted by the library of congress to the council on library resources covering the period october 1970-may 1971. progress is reported in the following areas: recon production, foreign language editing test, format recognition, microfilming, input devices, and tasks assigned to the recon working task force.

introduction

with the implementation of the marc distribution service in march 1969, the library of congress and the library community have had available in machine readable form the catalog records for english language monographs cataloged since 1969. most libraries, however, also need to convert their older cataloging records, and the library of congress attempted to meet these needs by establishing the recon pilot project in august 1969. during the two-year period of the pilot project, various techniques for conversion of retrospective bibliographic records have been tested, and a useful body of catalog records is being converted to machine readable form. the pilot project is being supported with funds from the library of congress, the council on library resources, and the u.s. office of education. earlier articles in the journal of library automation have described the progress through september 1970 (1, 2, 3). this article covers the period october 1970 through may 1971.

progress, october 1970 through may 1971

recon production

the conversion of 8476 records in the 1969 and 7-series of card numbers that had not been included in the marc distribution service was completed, and these records were sent to 47 subscribers of the marc distribution service. the subscribers were not charged for these records but were asked to send a tape reel to the library for the duplication process.
at present, the recon data base consists of 25,206 records in the 7, 1969, and 1968 series of card numbers. records in the 1968 series that were part of the data base for the marc pilot project are being converted by program from the marc i format to the marc ii format, proofed, and updated. to date, 7551 out of 7583 marc i records have been processed. prior to the implementation of the marc distribution service, records were input for test purposes, and the resulting practice tapes contain data requiring correction or updating to correspond with the present specifications of the marc ii format. of the 8340 titles on the practice tapes, 3460 have been updated and reside on the recon master file. these updated machine readable records will be distributed with the recon titles in the 1968 card series. foreign languages editing experiment a foreign language editing experiment was conducted to test the accuracy of marc/recon editors in editing french and german language records. records used for this test included 1180 of the 5000 recon research titles. at least 50 percent accuracy was expected since half of the task of editing a marc record involves being able to read the language of the record. the other half involves identifying the data elements by their location in the record. the three editors used in the experiment had studied french in high school, one having had an additional year in college; none had studied german. each editor was required to edit approximately 200 records in each language. statistics on the number of records edited per hour and the number of errors made, when compared with the same editors' statistics for editing english language records, showed that each editor maintained an approximately equal rate of speed in editing foreign language records as in editing english. the error rate for each editor, however, was more than tripled on foreign records, and each made approximately as many errors in french (the language studied) as in german. each editor averaged more than 12 errors per batch in french and 12 in german. since the marc editorial office has established a standard of 2.5 errors per batch ( 20 records comprising a batch ) as being acceptable for trained marc editors, this error rate would have to be lowered in a production environment. the majority of errors occurred in the title field, which is a portion of the recon pilot projectjavram and maruyama 161 the record that must be read for content in order to be edited correctly. the second largest number of errors occurred in the fixed fields, which are also dependent upon a reading knowledge of the language of the record for accurate coding. the number of errors made in each batch of records by each editor was tabulated to determine if any improvement was made during the course of the experiment. in no case was improvement noted. statistics were also kept on the number of times an editor consulted various sources for help: e.g., dictionaries, the editing manual, the lc official catalog, the reviser, or a language specialist. dictionaries were consulted frequently, and the reviser and language specialists rarely. typing statistics (number of errors) were also recorded for 181 french and 185 german records. the error rate for typing foreign language material was lower than for typing english. the english language statistics, however, were combined for several typists, and the foreign language statistics were for one typist only. 
charts showed that there was no improvement in the number of typing errors made at the end of the test. the primary conclusion drawn from the results of the experiment is that in order to edit foreign language records with an acceptable degree of accuracy, it would be necessary for the editor to have a good knowledge of the language as well as the editing procedures. f orrnat recognition format recognition is a technique that allows the computer to process unedited bibliographic records by analyzing data strings for certain keywords, significant punctuation, and other clues to determine proper identification of data fields. the library of congress has been developing this technique since early 1969 in order to eliminate substantial portions of the manual editing process, which in turn should represent a considerable savings in the cost of creating machine readable records. the recon report, which was written prior to the completion of the first format recognition feasibility study, concluded that "partial editing combined with format recognition processing is a promising alternative to full editing." ( 4) since that time, the emphasis in the deve1opment of the programs has been shifted to no editing prior to format recognition processing. the programs are in the final stages of acceptance testing, and it is expected that 75% of the records can be processed without errors created by the format recognition programs. preliminary estimates show that it takes approximately half a second of machine time to process one record by format recognition ; the manual editing process, on the other hand, takes approximately six minutes per record. the total amount of core storage required is approximately 120k: 80k for the programs and 40k for the keyword lists. although the keyword lists are maintained as a separate data set on a 2314 disk pack, they are loaded into memory during processing. the format recognition programs have been written 162 journal of library automation vol. 4/3 september, 1971 in assembler language for the library's ibm 360/40 under dos. the logical design of the format recognition process, with detailed flow charts needed for implementation of computer programming, has been published as a worki~;tg document by the american library association so that the technical content would be available to assist librarians in their automation projects ( 5). workflow for format recognition begins with the input of unedited catalog records via the mt /st following the typing specifications created for format recognition. mter being processed by the format recognition programs, these records are proofed by the editors (the first instance in which they see the records), and the necessary corrections or verifications made. correction procedures for format recognition records are the same as those used for regular marc records. figures 1, 2, and 3 are examples of the printed card used for input, the mt /st hard copy, anq the proofsheet of the record created by format recognition. initial use of the format recognition programs is for input of approximately 16,000 recon records in the 1968 card series. input of current marc records via format recognition will begin at a later date. recon records were chosen for large-scale testing because they are not required for an actual production operation such as the marc distribution service. in addition, work has begun on the expansion of format recognition to foreign languages. 
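the format recognition programs themselves were written in assembler and driven by stored keyword lists, but the general idea named above (using punctuation and keyword clues in an unedited record to guess which fields the pieces belong to) can be illustrated with a toy sketch. the rules and field split below are simplified inventions for illustration, not the published logical design.

import re

def recognize(card_text):
    """guess main entry, title, imprint, and collation from one card."""
    fields = {}
    parts = [p.strip() for p in card_text.split(". ") if p.strip()]
    # heuristic 1: a leading "surname, forename" string is a personal main entry
    if parts and re.match(r"^[A-Z][a-z]+, [A-Z]", parts[0]):
        fields["100 (main entry)"] = parts[0]
        parts = parts[1:]
    # heuristic 2: the next sentence-like chunk is treated as the title statement
    if parts:
        fields["245 (title)"] = parts[0]
        parts = parts[1:]
    rest = ". ".join(parts)
    # heuristic 3: "place, publisher, date" with a four-digit year looks like an imprint
    imprint = re.search(r"([A-Z][\w .]+,\s+[\w .]+,\s+\[?\d{4}[^.]*)", rest)
    if imprint:
        fields["260 (imprint)"] = imprint.group(1).strip()
    # heuristic 4: pagination and size keywords ("p.", "cm") mark the collation
    collation = re.search(r"(\d+\s*p\..*?cm)", rest)
    if collation:
        fields["300 (collation)"] = collation.group(1).strip()
    return fields

# sample card text adapted (and simplified) from the record shown in the figures
card = ("Ewart, Andrew. The world's greatest love affairs. "
        "London, Odhams, 1968. 287 p. 8 plates, illus., ports. 22 cm.")
for tag, value in recognize(card).items():
    print(f"{tag}: {value}")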
analysis is being done on german and french monograph records, and eventually spanish, for new or expanded keyword lists and some changes to the algorithms.

figure 1. input for format recognition (the lc printed card for ewart, andrew, "the world's greatest love affairs," london, odhams, 1967 [i.e. 1968]; lc card number 68-97457).
figure 2. mt/st hard copy of the same record.
figure 3. proofsheet of the format recognition record.

microfilming

for a full-scale retrospective conversion project at the library of congress, it is likely that records for input would be microfilmed from the card division record set and updated from the corresponding records in the library's official catalog. a subset of the record set, such as the catalog cards for a given year, would be microfilmed and then the appropriate records, i.e., english language monographs, german monographs, etc., would be selected after filming. costs were calculated for a base figure of 100,000 records for the year 1965, and four different methods of microfilming have been estimated as follows by the library's photoduplication service: 1) microfilming for a direct-read optical character reader ($2000); 2) microfilming for reader/printer specifications ($2350); 3) microfilming for reader specifications ($400); and 4) microfilming for a xerox copyflo printout of a card overlaid on an 8 x 10 1/2 worksheet ($7000). the differences in cost are primarily attributable to the type of camera used (rotary or planetary) and the kind of feed mechanism (manual or automatic). other factors need to be considered, such as the fact that film suitable for ocr requirements could not be used on xerox copyflo or even for contact printing to positive film. since a readable copy of the original printed card is necessary for updating and proofing, microfilming for direct-read ocr would not be a viable alternative.

input devices

the monitoring of existent input devices was continued with an investigation of dissly systems' scan data optical character reader. scan data has been modified, via software, to read 55 different type fonts which are recognized by a "best compare" technique using six stored fonts to match against the remaining 49. according to the manufacturer, direct-reading is accomplished with approximately 95% level of accuracy. errors are recorded during a proofing cycle and corrected in the machine readable data base.
the scan data equipment does not have a transport for a 3 x 5 document, so that a number of 3 x 5 cards must be attached to an 8 x 14 document for scanning, and therefore these cards would not be returned to the library by the manufacturer. under these conditions, cards to be read by scan data equipment would have to be obtained from stock rather than from the card division record set. unfortunately, many cards are out of stock; and of those that are in stock many may be cards reprinted several times by photo-offset methods and consequently have a poor image. therefore the use of this device would be severely hampered. fifty good quality cards were submitted to dissly systems for an experiment that was run without any modifications to the existing machine and software. five of the 50 cards were returned to the library with a matching printout. the results were not encouraging because many lines of text were missed and many characters misread.

recon working task force

the recon working task force has compiled work statements for contractual support for two of its research projects. these projects involve investigations on the implications of a national union catalog in machine readable form and the possible utilization of machine readable data bases other than that of the library of congress for use in a national bibliographic store. preliminary tasks related to these projects have been described in earlier progress reports (6, 7). the first part of the work statement deals with the products that could be derived from the machine readable national union catalog: a bibliographic register, indexes by name, title, and subject, and a register of locations. these indexes would provide multiple access points to the records in the national union catalog. the bibliographic register will contain a full bibliographic record on each title covered. the indexes will contain partial records which are associated with the full records in the register, and a given index file will carry one or more partial records for every record in the register. for each title in the register, the register of locations lists those libraries where copies of the title are held. the assumption is made that the indexes under consideration will contain the following data elements (the numeric designations and subfield codes are those used in the marc format fields):

name index: name (100, 110, 111, 400, 410, 411, 600, 610, 611, 700, 710, 711, 800, 810, 811); short title (245); main entry in abbreviated form; date (fixed field date 1); language (fixed field language code); lc card number; register number.

title index: short title (130, 240, 241, 245, 440, 630, 730, 740, 840); main entry in abbreviated form; date (fixed field date 1, or may be omitted if in heading); language (fixed field language code, or may be omitted if in heading); lc card number; register number.

subject index: subject heading (650, 651); main entry (100, 110, or 111); short title (245); date (fixed field date 1); language (fixed field language code); lc card number; register number.

the abbreviated form of main entry noted above is to be included in the record of the name or title index unless the name itself is carried in the main entry of that record; the exact form of the abbreviation is spelled out following the sketch below.
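a minimal sketch of deriving the three kinds of index entries listed above from a full register record follows. the record here is a simplified python dictionary keyed by marc tag rather than an actual marc communications-format record, and the register number is assigned arbitrarily; the abbreviated main entry is handled only loosely here, since its exact form is defined in the text that follows.

NAME_TAGS = ["100", "110", "111", "400", "410", "411", "600", "610", "611",
             "700", "710", "711", "800", "810", "811"]
TITLE_TAGS = ["130", "240", "241", "245", "440", "630", "730", "740", "840"]
SUBJECT_TAGS = ["650", "651"]

def common_elements(record, register_number):
    # elements shared by every partial index record
    return {"short title": record["245"]["a"],
            "date": record["fixed"]["date1"],
            "language": record["fixed"]["lang"],
            "lc card number": record["010"],
            "register number": register_number}

def index_entries(record, register_number):
    base = common_elements(record, register_number)
    main_entry = record.get("100", record.get("110", {})).get("a", "")
    name_ix = [{**base, "name": record[t]["a"]} for t in NAME_TAGS if t in record]
    # the abbreviated main entry is appended in brackets after the short title;
    # the fuller rule (including corporate names) is given in the text below
    title_ix = [{**base, "short title": f'{record[t]["a"]} [{main_entry}]'}
                for t in TITLE_TAGS if t in record]
    subject_ix = [{**base, "subject heading": record[t]["a"], "main entry": main_entry}
                  for t in SUBJECT_TAGS if t in record]
    return name_ix, title_ix, subject_ix

# illustrative record, not an actual lc register entry
record = {
    "fixed": {"date1": "1968", "lang": "eng"},
    "010": "68-97457",
    "100": {"a": "Ewart, Andrew."},
    "245": {"a": "The world's greatest love affairs."},
    "650": {"a": "Love."},
}
names, titles, subjects = index_entries(record, register_number=1)
print(names[0]["name"], "|", subjects[0]["subject heading"])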
it is defined as follows: 1) for a personal name, a conference, or a uniform title heading-subfield "$a" is appended in brackets after the title; and 2) for a corporate name-subfield "$a" plus the first "$b" subfield are appended, within a single set of brackets, after the title. 166 journal of library automation vol. 4/3 september, 1971 the specific objective of this project is to define and investigate alternative processing schemes associated with an automated national union catalog. this study will explore and examine these processing schemes and the following components: 1) techniques for introducing the necessary input into the automated nuc svstem. the considerations to be covered include the relationship to' marc input, use of the format recognition programs, and the problems of language in terms of selection of input. 2) techniques for structuring or organizing the data contained in the register and the various indexes to establish and maintain the relationships among the records contained in these data bases. 3) techniques and procedures connected with the production of the products listed above. this investigation will also cover any selection and sorting procedures necessary. 4) analysis of the format, i.e., graphic design and printing, size, style, typographic variation, condensation, etc. 5) examination of alternative cumulation patterns associated with the products of the system. in this connection, items such as number of characters in an average entry, average number of entries on a page, expected rate of increase of number of entries in catalog, and segmentation of catalog are to be taken into consideration. 6) feasibility of producing a register through automation techniques. if this can be accomplished, further investigation will be directed toward the feasibility and cost of segmenting the register into three sections: one produced from machine readable records (english and whatever roman alphabet language records are in machine readable form); one produced from roman alphabet language records which are only in printed form; and one produced from non-roman alphabet language records which are only in printed form. the costs associated with the various techniques and procedures enumerated above as well as with their components will be calculated. from these figures an average total cost per title cataloged is to be determined for each alternative processing scheme. these cost values (one per alternative scheme ) are to be compared with those associated with a purely manual processing scheme. included in this cost analysis will be the associated costs for different forms of hard copy as well as for the use of com (computer output microfilm). from any one index and the register of locations, the maximum number of alphabetic and numeric lists (registers of location ordered by register number) will be determined, taking into account ease of usage and technical and economic feasibility. the intent is to have as few lists as possible and still keep the cost within reasonable bounds. supplements to the indexes should be issued monthly; supplements to the register of locations may be issued monthly or quarterly. the recon pilot pro;ectfavram and maruyama 167 the second project is a continuation of a previous investigation on the possible utilization of machine readable data bases other than that produced by the library of congress for use in a national bibliographic store. 
the results of this project should determine if the use of other data bases is economically and technically feasible. using three or four data bases selected by the recon working task force, the study will determine the following: 1) method and cost of acquiring these other data bases in machine readable form. 2) analysis of the kinds of programs capable of converting records from a number of these data bases into the marc format. different level data bases might require different kinds of programs. if such an effort is deemed feasible, a cost estimate for such a program or array of programs will be calculated. 3) method and cost of printing the records for examination, corrections, etc. 4) method and cost of eliminating records already in the marc data base. 5) method and cost of comparing these records against the lc official catalog and making the necessary changes in the data or content designators. 6) cost for input of additions and corrections. 7) method and cost of incorporating the additions and corrections in the machine readable file. 8) cost of providing means by which these records would not be input again by any future lc retrospective conversion effort. a result of this project should be a determination as to whether high potential or medium potential files, or both, are suitable for conversion. a determination will be made of the minimum yield or the minimum number of titles needed to justify writing the programs to convert these data bases. a factor to be considered is that the number of unique titles will decrease as more data bases are converted for this pool of records. it was decided that the research tasks to study the problems in distributing name and subject cross reference control files would be dropped because of limitations of time and funds. an additional task, however, has been added that can be performed within the time limits of the pilot project. during the past year, the library of congress card division has recorded information about card orders in machine readable form. this information will be analyzed as to the year and language of the most frequent orders because it is assumed that the most popular card orders bear a relationship to the potential use of a data base in machine readable form by libraries in the field. this study involves the following: 1) analysis of a frequency count of lc card orders for a one-year period and preparation of a distribution curve for card series. 168 journal of libmry automation vol. 4/3 september, 1971 2) analysis of a sample of frequently ordered cards to determine with fair reliability the proportion of english language titles in this group. the sample will be large enough to give an indication of other language groups that might be significant for any recon effort. 3) preparation of distribution curves for english language and nonenglish titles by card series. 4) mathematical analysis of the results of 1) -3) above to arrive at a table to show the anticipated utility of converting specified subsets of the lc card set. outlook research in input devices has not uncovered any equipment that offers a significant technical and cost improvement over the mt /st currently used in the library of congress. on-line correction and verification of marc/recon records will, however, speed conversion and will offer relief in the flow of documents and paper work required in a purely batch operation. 
since marc/recon records will be corrected and verified in one operation rather than by the cyclic process of the present system, · cost savings should be realized. the library of congress will have this on-line capability through the multiple use marc system. this new system is still in the design phase, and a projected date for implementation has not yet been set. to date investigations in the use of direct-read optical character readers have demonstrated that there are no devices currently available capable of scanning the lc printed card. the format recognition programs are operational, and recon titles in the 1968 card series are being converted without any prior editing of the records. procedures are being implemented to gather the necessary data to compare costs of the format recognition technique with costs of conversion with human editing. production statistics have shown that retrospective records are more costly to convert than current records. this higher cost is attributed to the additional tasks in recon of selecting the subset for input from the lc record set and comparing the records with the lc official catalog for updating. since cards in the lc record set do not necessarily reflect the latest changes made to the cards in the lc official catalog, the official catalog comparison is necessary to ensure that recon records are as up-to-date as the cards in the official catalog. although the recon report ( 8) recommended conversion in reverse chronological order with highest priority given to the last ten years of english language monograph cataloging, the working task force study on the card division popular titles may reveal that selective conversion is a more practical approach. the orderliness of chronological conversion by language does mean that records in machine readable form can be ascertained easily. it is interesting, however, to speculate on the use of the recon pilot project/ avram and maruyama 169 these records compared with popular titles which may cross many years and languages. the marc/recon titles constitute the data base for the phase ii card division mechanization project, and close liaison continues to be maintained between both projects. it is recognized that the distribution of cards and marc records requires the same computer based bibliographic files and has similar hardware and software requirements. plans are presently underway to transfer the duplication of tapes for ~.iarc subscribers from the library's ibm 360/40 to the card division's spectra 70 when the phase ii system is operational. the recon pilot project does not officially end until august 1971. in an attempt to make information available as rapidly as possible, the preparation of the final report will begin this summer, since several aspects of the project are complete enough to be documented. the final report will be published by the library of congress, and its availability will be announced in the lc information bulletin and in professional journals. acknowledgments the authors wish to thank the staff members associated with the recon pilot project in the marc development office, the marc editorial office, the technical processes research office, and the photoduplication service of the library of congress for their contributions to the project and, therefore, to this report. special thanks are due to patricia e. parker of the marc development office for her work on the foreign language editing experiment and for writing that section of this article. references 1. 
avram, henriette d.: "the recon pilot project: a progress report," journal of library automation, 3 (june 1970), 102-114. 2. avram, henriette d.; guiles, kay d.; maruyama, lenore s.: "the recon pilot project: a progress report, november 1969-april 1970," journal of librm·y automation, 3 (september 1970), 230-251. 3. avram, henriette d.; maruyama, lenore s.: "recon pilot project: a progress report, april-september 1970," jow·nal of library automation, 4 ( march 1971 ) , 38-51. 4. recon working task force: conversion of retrospective catalog records to machine-readable form: a study of the feasibility of a national bibliographic service (washington, d.c.: library of congress, 1969 ), 179. 5. u. s. library of congress. information systems office. format recognition process for marc records: a logical design (chicago, american library association, 1970 ). 6. avram , guiles, maruyama, op. cit., 248-249. 7. avram, maruyama, op. cit., 49-51. 8. recon working task force, op. cit., 11. conversion of bibliographic information to machine readable form using on-line computer terminals 217 frederick m. balfour: information systems engineer, technical information dissemination bureau, state university of new york, buffalo, new york a description of the first six months of a profect to convert to machine readable form the entire shelf list of the libraries of the state university of new york at buffalo. ibm datatext~ the on-line computer service which was used for the conversion, provided an upperand lowercase typewriter which transmitted data to disk storage of a digital computer. output was a magnetic tape containing bibliographic information tagged in a· modified marc i format. typists performed all tagging at the console. au information except diacriticals and non-roman alphabets was converted. direct costs for the first six months were $.55 per title. several recent articles have reported on methods and related costs to convert library bibliographic information to machine readable form. chapin ( 1) compared keypunching, paper tape, and optical character recognition. keypunching was also described by hammer ( 2), and black (3) . buckland (4) described paper tape conversion, and johns hopkins university ( 5) reported on optical character recognition. online computer terminals have been proposed ( 6), but have hitherto not been tried in a large library. without attempting to discuss the various techniques, this paper presents a detailed report of converting with on-line computer terminals. it is hoped that the experiences reported here and in the cited articles will 218 journal of library automation vol. 1/ 4 december, 1968 provide suitable information to a library administration considering largescale conversion. background in 1965 a systematic program of automation was begun in the libraries of the state university of new york at buffalo. the general goals of the program were to improve services to patrons and streamline internal operations. there are three general areas usually considered for automation in a library: acquisitions and accounting, the card catalog, and circulation control. an analysis of the system indicated that conversion of the card catalog to machine readable form would provide the greatest improvement in library services and operations. the reasons for the decision were as follows. first, the university libraries are growing rapidly; in one year the shelf list will increase by 60,000 to 100,000 titles, or about 15 to 25 per cent. 
second, suny buffalo is currently planning a new campus which will be completed in five to ten years. in the interim, the university will be spread over three major campus locations, with many smaller offices and departments located throughout the city, and the libraries must provide some form of bibliographic index for each location. the conversion of the shelf list to machine readable form will allow this distribution of the bibliographic information at a very low cost per title. finally, the project will provide experience in using magnetic tape for the handling of bibliographic information, so that when the library of congress' marc project begins to produce magnetic tapes, suny buffalo will be able to utilize them immediately. selecting the conversion hardware in 1966, a proposal for converting the shelf list to machine readable form ( 7) was presented to the library administration. it pointed out the many improvements in patron services, the advantages to the library staff, both professional and clerical, and the monetary savings to be realized by such a conversion. it discussed the four methods of file conversion then feasible: punched cards, optical scanners, punched paper tape, and magnetic tape-keyed data converters (as exemplified by tl1e mohawk data sciences equipment) ( 8). the proposal recommended using the magnetic tape-keyed data converters because of their input speed, ease of entry, and elimination of handling cards or paper tape. during the first quarter of 1967, a fifth method of conversion was considered, an ibm product called datatext (9). it required the rental of an ibm 27 41 communications terminal (essentially a typewriter), a western electric 103a data-set, and a voice-grade telephone line to the nearest ibm installation, which was cleveland, ohio. a customer may buy time in six-hour blocks called datatext agreements. an agreeconversion of bibliographic information/ balfour 219 ment covered a time segment from 7:00a.m. to 1:00 p.m., or from 1:30 p.m. to 7:30p.m., five days a week. datatext provided everything that the magnetic tape converters did with some important additions. first, it had upperand lower-case alphabet using a shift character (the library administration had seen only the mohawk upper-case converter). second, the typewriter gave a typed copy which was easy to proofread. third, corrections were much easier because of the text-editing capabilities of the on-line computer. text-editing can best be illustrated by describing a typical datatext job. a typist working from source material produces a typewritten page; at the same time, the ibm 27 41 she is using transmits the data being typed to the computer in an area called "working storage". when typing is completed, the clerk gives the appropriate command and the information is stored in an area called "permanent storage", a computer manipulation which can be compared to taking a page from the typewriter and placing it in a folder in a file cabinet. when the typist wishes to make changes to the information, she can give a command to recall it from permanent storage to working storage. she can then manipulate it in several ways. during original entry, the computer automatically assigned numbers to each line. using these line numbers, a typist can move information within the text, can add or delete information, and can correct errors. commands are very simple and concise; for example, it takes four keystrokes to move a new line into the text. 
in making a correction, the typist merely types the incorrect word and the correct word; the computer then types the complete line to show that the correction has been properly executed. (this instant replay, or on-line interaction, is a benefit unique to the on-line terminal.) after any change, the computer automatically renumbers lines and reformats the entire text. a sample of typed input is illustrated and discussed later in the article. in april 1967, it was decided to test the datatext service because of its powerful correction capability, and because it could be installed and working within three weeks. in may the console was delivered, the telephone equipment installed, and a long-distance line to cleveland rented. a one-month test of datatext proving successful, three more consoles, data sets and telephone lines were added, and the conversion project was fully underway. training the typists the majority of the typing and proofreading staff were drawn from existing personnel in the cataloging department. individuals chosen had a background in either catalog card typing or file maintenance, and consequently a good working knowledge of information on a catalog card. it was anticipated that with a minimum of further training, the typists could identify and tag information as they were typing it at the console. this assumption was critical to the success of the project, since the li.... ----------------~---220 journal of library automation vol. 1/ 4 december, 1968 brary could not afford the professional time necessary for complete pretagging of bibliographic information. typists involved in the one-month test were given several hour-long training sessions on tagging before the console arrived. when the project got underway, a list of all possible tags was posted near the console, and a librarian was nearby to answer questions. mter three weeks of operation, it was obvious that the typists could tag at the console, thus making this part of the test run a success. the tagging system used was developed from the marc i pilot project ( 10). most of the original tags were retained and several additional ones designed to meet specific local needs. tape files created were formatted according to marc i specifications, although fixed fields were left blank. the tagging system is outlined in a reference manual prepared for typists and proofreaders ( 11). operation of an on-line console requires special training. ibm sent a datatext instructor to buffalo on several occasions to provide typist training. for the major training session, which occurred in june, the ibm representative came for a full week. ten typists were trained; five specialized in entering information, and five specialized in retrieving, correcting, and transmitting information. by the end of the week both groups were skilled in their respective specialities, and many typists were able to perform well in both areas. later, typists were trained in several sessions by one of the library's typing staff. during the first three months, the author was near the terminals at all times to answer questions on terminal operation, to collect data for measuring and controlling performance, and to act as supervisor. a librarian was on call for questions on complex library problems, and the programmer-analyst was available to help solve problems regarding input format and tagging. at the end of this period, appropriate clerical staff had been trained to supervise minute-to-minute operation. 
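before turning to the conversion procedures themselves, the editing behavior described above is easy to model. the short python sketch below is purely illustrative -- the class names, method names, and command spellings are invented, and datatext's real command language is only partially reproduced in this article -- but it captures the mechanics the text describes: lines numbered automatically as they are typed, a word-for-word correction that echoes the full corrected line, and a store/retrieve cycle between working and permanent storage.

    # illustrative model of the datatext editing cycle described above.
    # all names are hypothetical; only the behavior follows the article.

    class Document:
        """a 'document' held in working storage, one typed line per entry."""
        def __init__(self):
            self.lines = []

        def type_line(self, text):
            self.lines.append(text)          # the computer assigns the next number
            return len(self.lines)           # line numbers start at 1 in this sketch

        def correct(self, number, wrong, right):
            # the 'line number, incorrect word, tab, correct word' command:
            # substitute the word and echo the complete corrected line.
            self.lines[number - 1] = self.lines[number - 1].replace(wrong, right, 1)
            return f"{number:>3} {self.lines[number - 1]}"

        def delete(self, number):
            del self.lines[number - 1]       # later lines are renumbered automatically,
                                             # since numbering follows list position

    class PermanentStorage:
        """the disc 'file cabinet' that typed documents are stored into."""
        def __init__(self):
            self.folders = {}

        def store(self, name, doc):
            self.folders[name] = list(doc.lines)

        def retrieve(self, name):
            doc = Document()
            doc.lines = list(self.folders[name])
            return doc

    # a correction typist retrieves a stored document and fixes a misspelling
    disc = PermanentStorage()
    d = Document()
    d.type_line("20t an introduction to library automaton.")
    disc.store("doc-0421", d)
    revised = disc.retrieve("doc-0421")
    print(revised.correct(1, "automaton", "automation"))
    # prints:   1 20t an introduction to library automation.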
conversion procedures

the general method of conversion (figure 1) was as follows. a typist typed into "working storage" for an hour, inputting 15 to 30 shelf list cards. she instructed the computer to store this "document" in a permanent storage location on disc. she then placed the typed copy and cards in a proofreading bin, cleared working storage, and started another document. a proofreader compared typed copy with original cards and indicated any errors. the corrected document then went to a correction typist who "retrieved" the document from permanent storage to working storage, performed the corrections, and transmitted the corrected document to magnetic tape. the original uncorrected document was left in permanent storage overnight and deleted the following day. documents were transmitted to tape for about two weeks and the accumulation returned to the library via the mails. (ibm saved all permanent storage records for one week as a security measure. if a library typist inadvertently deleted a document, it could be retrieved by the computer operator.)

[figure 1. shelf list conversion information flow -- the diagram shows 3 x 5 shelf list cards and hard copy circulating through the buffalo proofreading operation and file, typed data flowing to computer disc storage in cleveland, and accumulated tape returning to the library by mail.]

figure 2 shows a sample of typed input and subsequent correction. line numbers, as they are stored on the disc, are included on the right margin for ease of explanation. lines typed in capitals are computer responses to commands, the first entry being the command to clear working storage. the computer responds and then indicates that the console is in one of two general input modes. all cards are typed in "automatic" mode, for which the typist gives the appropriate command. when the computer responds the typist asks for the next line number, which is 3, and begins to input the card. in line 4, the typist makes an error and realizes it before throwing the carriage. she hits the "attention" key, causing the underscore, rolls the platen down, back spaces, and retypes the correct word. the computer then corrects the error. in line 6 the typist misspells "cambridge", but does not realize it before throwing the carriage. the correction is shown at the bottom although the input typist could not have performed it herself; it would have gone through proofreading and back to the correction typist. the correction is made by typing the line number, in this case "6", the incorrect word, "dambridge", tab, and the correct word. the computer responds by typing out the complete line showing the correction.

[figure 2. sample input and correction of one shelf list card. the transcript below is an approximate reproduction; computer responses are shown in capitals, and the right-margin line numbers (3 through 9) and the strikeovers visible on the hard copy could not be reliably realigned here.]

    proc
    CLEARED
    UNCONTROLLED MODE
    a
    AUTOMATIC MODE
    n
    NEXT NUMBER - 3
    90t bs2575.3.a7
    10t bible. n.t. matthew. english. 1963. new english.
    20t the gospel according to matthew. commemen
        tary by a.w. argyle.
    30a cambridge 30b university press 30c 1963
    40t 227 p. maps. 20 cm.
    50t the dambridge bible commentary: new english bible
    70t bible. n.t. matthew - commentaries.
    71t argyle, aubrey william, 1910-
    73t title.
    60z
    92t 226.207
    94t 63-23728
    n
    NEXT NUMBER - 10
    6    dambridge    cambridge
    50t the cambridge bible commentary: new english bible

except for a brief period, the shelf list was converted in alphabetic order, and by december 1 shelf list drawers through the e's were completed. early in the project, some of the literature classification, p and pq, was converted.
foreign languages in the pq's gave no particular problems, and typing rates did not drop. all cards were converted in shelf list order except for those having non-western alphabets. when possible, these were transliterated and entered. otherwise their input was delayed. since the 2741 console has no diacritical marks, these were left out; however each card having them was entered and given a special tag to permit retrieval at a later date when diacritical marks could be added by special coding such as used by marc. conversion consoles and shelf list were in the same building. each day, several inches of cards were removed from the drawer being processed and a marker inserted indicating where the cards had gone. in general operation, cards were returned and refiled in less than a day so that inconvenience to staff was minimal. as a card was proofread, it was marked on the back with a "c" and the upper right hand corner received a very small notch with a mcbee punch. thus, newly cataloged cards filed with cards already converted are recognizable by the unnotched corner.

costs

table 1 gives a statistical summary of the conversion project from july 31 through december 1, 1967. the term "l.c. card" refers to a complete bibliographic entry for a title and may include more than one physical card, or may include writing on the back of a card. input and correction functions are reported separately and then totaled to give a realistic input rate per hour for corrected cards. supervisor cost reflects wages of clerical supervisors only. those of the programmer-analyst, the librarian and the systems analyst assigned to the project are not included. a breakdown of monthly equipment costs per console is given in table 2. installation costs were $150 for each terminal, and $50 for each leased telephone line. when the project operated four consoles, the monthly equipment cost was $4,472.

table 1. conversion project statistics (july 31-dec. 1, 1967)

    input, proofreading and correction
      total l.c. cards input                                   49,348
      typist hours input                                        3,035
      typist hours correcting                                     492
      total typist hours                                        3,527
      proofreading hours                                        1,235
      number of errors per l.c. card                              .42
      l.c. card input rate per hour                              16.3
      l.c. card correction rate per hour                          100
      overall conversion rate (input & correction), cards per hour 14
      proofreading rate, cards per hour                            40

    costs
      labor cost @ $1.75 per hour                           $ 8,078.00
      equipment and supervisors                              18,995.00
      total cost                                            $27,073.00
      cost per card converted                                    $0.55

    utilization of console time
      hours typed                            3,381              81.4%
      hours consoles down                      245               5.9%
      hours computer down                       91               2.2%
      hours lost time                          438              10.5%
      total                                  4,155             100.0%

table 2. monthly operational costs per terminal

      ibm 2741 communications terminal                         $ 85.00
      western electric 103a data set                             27.50
      24-hour voice-grade lease line to cleveland,
        plus local telephone costs                              385.50
      2 datatext agreements @ $310                              620.00
      total                                                  $1,118.00

"hours typed" is time that consoles were actually being used to input or correct cards. this is slightly less than "typist hours worked" because some correction had been delayed, but it was included in hours worked to give a true representation of input rates. "hours consoles down" reflects time lost due to console breakdown. during the early part of the period, two consoles were failing often. however, as operating problems were solved, console down-time dropped far below the average 5.9 per cent shown.
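the derived figures in table 1 are internally consistent, which a short arithmetic check confirms before the remaining utilization notes resume; the calculation below uses only values taken from the table itself.

    # reproducing the derived figures in table 1 from the raw counts
    cards       = 49_348     # total l.c. cards input
    input_hrs   = 3_035      # typist hours input
    correct_hrs = 492        # typist hours correcting
    proof_hrs   = 1_235      # proofreading hours
    typed_hrs   = 3_381      # console hours actually typed
    wage        = 1.75       # hourly labor cost

    print(round(cards / input_hrs, 1))               # 16.3 cards input per hour
    print(round(cards / correct_hrs))                # 100 cards corrected per hour
    print(round(cards / (input_hrs + correct_hrs)))  # 14 cards converted per hour overall
    print(round(cards / proof_hrs))                  # 40 cards proofread per hour
    print((typed_hrs + proof_hrs) * wage)            # 8078.0 dollars of labor
    print(round((8_078.00 + 18_995.00) / cards, 2))  # 0.55 dollars per card converted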
"hours computer down" was also greater during early weeks of the project. however, for each hour down, ibm credited the library with $12.00 ( $3.00 per terminal for four terminals). "hours lost time" reflects periods when a working console could not be manned because of personnel breaks or operator absence. all times are given in console-hours, four consoles operating for one hour being recorded as four hours. the error rate of .42 errors per card is very low. allowing 350 characters per shelf list card, typists were making one error for every 830 keystrokes. this translates to about 3 errors per typewritten page of 50 characters per line, 50 lines per page. the office of secretarial studies of suny at buffalo indicates that this rate is well within the tolerance for "normal" typing, as in a typing pool. when it is considered that typists were tagging and inputting complicated bibliographic information, rate of accuracy was commendably high. typists used in the project included the lowest salary grade of civilservice typists, part-time hourly workers, and students. an acceptable input rate for civil service typists was 18 cards per hour, which is equivalent to 21 5-character words per minute. the faster typists, at 26 cards per hour, were typing at 30 words per minute. again, let it be mentioned that the material was complex and that typists were required to tag each piece of information. conclusions several points can be made about converting with datatext. it was easy to implement and received excellent support from ibm. the ibm information marketing staff in cleveland provided constant assistance during the early part of the installation and visited often once the project was successfully underway. ibm sent the datatext instructor as often as needed and provided free computer time during teaching sessions. the four long-distance telephone lines and data sets proved reliable. there was only one instance during the period when a line was inoperable and it was repaired in three hours. the liaison and support from new york bell telephone was very good. datatext costs would have been lower had the ibm installation been nearer. cleveland is 173 miles from buffalo giving a 24-hour leaseline cost of $342 per month. (datatext service will soon include a uniform long-distance-lines cost.) verification or correction on datatext does not require human retyping of each line of entry. only the word in error and its replacement need be typed; the console then types the corrected line to show that the error was deleted and the replacement inserted. consequently correction costs are low and corrections accurate. 226 journal of library automation vol . l / 4 december, 1968 average rates and costs given in table i reflect learning during the first six months of the project. towards the end of the reported period, rates were improving and costs decreasing. since december 1967, the project has added three more consoles and uses a datatext service provided by a campus computer. costs have dropped below $.45 per card, a figure which will increase somewhat when diacriticals are added. potentially cost per title for complete conversion is under $.50. references 1. chapin, richard e.; pretzer, dale h.: "comparative costs of converting shelf list records to machine readable form," journal of library automation, 1 (march 1968), 66-7 4. 2. hammer, donald p.: "problems in the conversion of bibliographic dataa keypunching experiment," american documentation, 19 (january 1968), 12-17. 3. black, donald v. 
: "creation of computer input in an expanded character set," journal of library automation, 1 (june 1968), 110-120. 4. buckland, l. f.: recording of library of congress bibliographical data in machine readable form (rev. ed.; washington, d.c.: council on library resources, 1965). 5. the johns hopkins university. milton s. eisenhower library: progress report on an operations research and systems engineering study of a university library (baltimore: johns hopkins, 1965). 6. international business machines corporation. federal systems division : report of a pilot protect for converting the pre-1952 national union catalog to a machine readable record (rockville, maryland: ibm, 1965). 7. lazorick, gerald j.; herling, john; atkinson, hugh: conversion of shelf list bibliographic information to machine readable form and production of book indexes to shelf list (buffalo, n.y.: state university of new york at buffalo, technical information dissemination bureau, 1966). 8. mohawk data sciences corp.: datagram no. 35, 1181 twk correspondence data-recorder, (herkimer, n.y., mohawk data sciences corp., 1967). 9. international business machines corporation: datatext operators instruction guide, form # j20-0010-1 (ibm, white plains, n.y., 1967). 10. u.s. library of congress, information systems office: a preliminary report on the marc (machine readable catalog) pilot protect (washington, d.c.: library of congress, 1966). 11. michael m. coffey: reference manual for typists and proofreaders. sunyab shelf list conversion project (buffalo, n.y. : suny at buffalo, technical information dissemination bureau, 1968). a hybrid access method for bibliographic records abraham bookstein: the university of chicago graduate library school, chicago, illinois. 97 this paper defines an access method for bibliographic reco1'ds that combines features of the sea1'ch key app1'oach and the inverted file approach. it is a refinement of the search key technique that permits its extension to la1'ge files. a method by which this approach can be efficiently implemented is suggested. introduction a major problem in the development of computerized files of bibliographic records is the creation of a convenient and economical mechanism to access the records. as the problem of organizing a file for efficient access is a general one, a number of structural devices have been suggested. hsiao and harary propose an abstract model for file structure that encompasses those that are discussed most frequently. 1 lefkovitz discusses these techniques in more detail and considers the advantages of each for implementation, while dodd and knuth describe the data structures needed in implementing such files. 2-4 these works reveal the interrelation between a file's organization and its retrieval capability, but the determination of which routes of access to provide must be the task of those responsible for creating the file. such a determination may involve consideration of both the intrinsic structure of the items represented by the file and the conditions under which the file is to be used. they will influence which file organization should be chosen. because of the complexity inherent in collections of bibliographic " items, the problem of determining suitable access routes to library files has been a challenging one. almost any datum may, on some occasion, be a useful means of entering the file. 
dimsdale and heaps, in their discussion of a file structure for an on-line catalog, explicitly propose words from the title, authors, and library of congress call numbers. 5 in this paper we shall consider the problem of accessing a known item by means of information contained in the author and title field. we shall concentrate on two approaches that have received much attention-the use 98 i ottrnal of library automation vol. 7/2 june 197 4 of a truncated search key, referred to simply as search key, and the use of boolean expressions of key words from the title. both of these are intended to allow a user simple entry into the file when the full field of information is long, complicated, or, perhaps, incompletely known by the user. the authors and titles of books often share these characteristics. each of the approaches, taken by itself, has its strengths and its weaknesses. we will discuss each technique in tum, and then suggest an elaboration of the search key technique that incorporates some features of the boolean search technique; this combination of techniques should enable systems that are committed to the use of search keys as a primary access route to extend this technique to large files. it introduces into the search key approach some of the flexibility of the key word approach. search keys this approach defines at least one special field, the search key, for each item represented in the file, and allows retrieval of the record for an item by inputting the value of its search key. 68 the search key should be constructed so as to allow its evaluation from data that are available at the time of access. the main advantage of this approach as it is usually implemented has been its great simplicity-for a broad variety of materials, the key can be readily evaluated and quickly entered into the system. the most heavily discussed defect of this approach is that it will sometimes retrieve a considerable number of records to a single request. consider, for example, these works: 1. ramsay, blanche margaret. relation of various climactic factors to the growth and development of sugar beets, and 2. ramsey, ian thomas. religious language. the popular ( 3, 3) search key, constructed by concatenating the first three letters of the author's name and the first three letters of the first significant word of the title, would represent each of these by the key ram, rel. this defect becomes particularly severe with certain corporate entries and works such as conference proceedings. furthermore, this difficulty can be expected to become aggravated as the file increases in size or, equivalently, as some items are given multiple search key values; the latter may be required in order to alleviate the problems inherent in having to access items with ambiguous or multiple forms of titles. attempts to remedy the difficulty of multiple retrievals have resulted in increasingly complex keys, defeating the purpose for which this technique was originally proposed. a more complex key makes greater demands on the user, encourages mistakes on entry, and also might increase the likelihood of two individuals deriving different keys for the same item. inverted files in this approach, a user attempts to retrieve a record by forming a hyb1·id access methodjbookstein 99 boolean expression of key words taken from various fields of the desired record. 9• 10 stanford university's ballots, for example, allows the user to enter the file by means of words taken from the title of a book. 
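before weighing the two approaches against each other, a minimal sketch makes the collision problem of the (3,3) key described above concrete. the derivation rule used here -- lowercase the author, skip leading articles in the title, take three letters from each part -- is an assumed reading of the key as described, not the code of any particular system.

    # illustrative (3,3) search key: first three letters of the author's name
    # plus first three letters of the first significant word of the title.
    ARTICLES = {"a", "an", "the"}          # assumed stop list for illustration

    def search_key(author, title):
        first = next(w for w in title.lower().split() if w not in ARTICLES)
        return author.lower()[:3] + "," + first[:3]

    print(search_key("Ramsay, Blanche Margaret",
                     "Relation of various climactic factors to the growth "
                     "and development of sugar beets"))             # ram,rel
    print(search_key("Ramsey, Ian Thomas", "Religious language"))   # ram,rel
    # both works reduce to the same key, the ambiguity discussed above

key word access through an inverted file sidesteps this ambiguity, as the comparison that follows brings out.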
two advantages of this approach as compared to the search key are that: (a) the user need not know the information required to form a search key, for example the first word of the title; and (b) the user is able to enter the system by what appears to him to be the most distinctive terms in the title, thereby minimizing false drops. users of ballots have found that because of the speed at which computers operate, usually the indexes can be manipulated and a record retrieved immediately, or in a very short period of time. fayollat gives an estimate of two to five seconds as the response time. the most direct way to implement this approach would be to access each record in the file and compare it to the request. for any but the smallest files this would be unreasonably costly in computer time. an alternative, and customary, implementation involves maintenance of indexes of key words. while experience with this approach, as at the ballots project, recommends this as a workable implementation, it can be costly in terms of the computer costs involved with upkeeping the indexes. hybrid approach we offer for consideration an elaboration of the search key approach that incorporates aspects of the key word approach. it is intended as an alternative to developing increasingly complex keys for systems adopting a search key approach, but for which a simple search key retrieves too many items; possibly this approach can be selectively applied to the more troublesome parts of the file, such as to items with corporate authors. this approach associates a search key with each record, hopefully one that is simple and easily derived. a user would begin by entering into the system the search key. if the system finds that the number of items that would be retrieved exceeds a preset threshold, it would output a message requesting that the user enter a set of key words taken from various fields in the records; the title would be very useful in this regard. the system first generates a subfile of records having the desired search key. if a hashing technique is used, constructing this subfile can be accomplished quickly and at relatively little cost in space for tables. 11 once the smaller file is formed, a complete search of the full records can be made for the key words. since the system operates in two phases, it is less sensitive to the number of records the search key retrieves as far as user considerations are concerned. ease of use becomes the dominating objective in designing the search key. experience to date suggests that even a very simple search key will almost always produce less than thirty records with files having in the order of 100,000 records. however, a complete search of a reduced file of thirty records should be feasible; in fact, usually the subfile will be no larger than two or three records. from one point of view, in the hybrid system, we can 100 journal of library automation vol. 7/2 june 1974 think of the search key not as an access mechanism, as earlier, but rather as a file reduction mechanism. this system trades the cost of maintaining and storing large indexes for an increase in costs of computer processing; only relatively easily maintained hash tables for fixed length search keys need be maintained. an accurate assessment of these costs can be made only after the statistical characteristics of various search keys have been explored. observations if it should be desired to implement a hybrid system, the following observations would be in order: 1. 
among the current concerns of facilities with large bibliographic files is file compaction. if records will have to be searched for key words, this consideration will influence planning of compaction techniques. for example, a technique such as cop ack, which completely scrambles the bits in a record, would not be permissible. 12 use of variable length codes for characters, such as in hoffman coding, would allow searches for key words; most likely such a search would be implemented by attempting to match substrings of bits rather than matching on the full word level.13 another common compaction technique, bigram coding, would also complicate the separation of words unless the blank were prevented from combining with other characters; because of the frequency with which the blank occurs with other characters, this restriction would interfere considerably with the efficacy of the technique.u a different approach would be to recognize that each word could have only two "spellings," depending on what happened to the blank preceding the word, and both spellings could be tested. (a brief survey of the above compaction techniques has been conducted by fouty.15 ) 2. though a complete search for key words would be feasible on a small file, it is possible to expedite the search considerably by means of a technique devised by malcolm harrison, which involves adding a fixed number of bits, or signatures, to each field on which a search can take place; these additional bits are derived in a well-defined way from the original field. 16• 17 this subfield is a fixed-size representation of the full field in a form that can be used to very rapidly eliminate most records which would not pass the key word matching test. it is stored in the index to the file along with the address of the record. though this preliminary test is not foolproof, it could considerably reduce the size of the subfile that requires a more costly complete search, thereby reducing the number of disc accesses. if this procedure is adopted, a possible sequence of events would be as follows: (a) a user inputs a search key and, perhaps, a couple of key words. these may be words he is certain are in the title, although the hybrid access methodjbookstein 101 name of a series, the author, or subject headings would also represent candidates. (b) on the basis of the search key the system creates a sub file of record addresses and signatures taken from the index-if the user is unfortunate the subfile would have a large number of records. (c) a rapid preliminary search of the signatures using the harrison technique is made of the reduced file to test whether the key words could possibly be part of a record. this pass eliminates a number of records; how efficient this technique is will depend on the number of bits the system associates with each representative field. (d) finally, the full records of the remaining items are retrieved and a full search is made. at any point, if the subfile is too large, the system may request additional key words. example of technique implementation how to create a signature for a record is best explained by means of an example. many variants are possible, and we have chosen a simple one for the purposes of illustration. the signature we shall create will consist of one word of thirty-two bits. we proceed as follows: 1. list all the substantive words of the title, e.g., relation, various, climactic, ... , beets, if we consider one of the titles mentioned above. 2. 
truncate each word to, say, the first four characters: rela, vari, ... , beet. other truncation sizes, or no truncation at all, may be elected. 3. for each string of characters produced in this way, form the two consecutive strings of three characters. for example, "vari" contributes "var" and "ari." since the first word is already represented in the search key, we may use only the second three-letter string for that word-here "rela" is represented only by "ela." implicit in this implementation is the assumption that if a user remembers anything about a word, he will correctly remember at least its first three characters, and that the first four characters go a long way toward giving the word away. 4. finally, we turn on a bit in the signature for each three-character string, essentially creating a hash code of thirty-two bits. the code should incorporate information from all three characters. for purposes of illustration, the following method will suffice: (a) for each letter in a three-letter string, substitute the rank of that letter in the alphabet, beginning with 01 for a-thus "ela" becomes 05,12,01; (b) consider the string of digits as a single six-digit number, and multiply that number by 1111-thus "ela" becomes 51201 and then 56884311; (c) divide by 32 and use the remainder as the address of the bit which is to be turned on. the string "ela" is thus associated with bit number 23, where the leftmost bit is the oth bit. as the algorithm is 102 journal of library automation vol. 7/2 june 1974 applied to each three-character string, the signature is formed. the book by blanche margaret ramsay is accordingly represented by: 01000011100100011000010100100101 similarly the book by ian thomas ramsey is represented by: 00000000000000010000000001000010 suppose a patron, or a cataloger, wishes to see the record associated with mr. ramsey's book on religious language. he would enter the search key, ram,rel, and, say, the word "language." among the index entries reb·ieved by the search key will be the desired book, and also the book by ramsay, dealing with sugar beets. the signature for the word "language" has bits numbered 30 and 25 turned on. since the ramsay book does not have both of these turned on (in this case neither bit is turned on), it is immediately eliminated; the actual records retrieved from the file will be only those for which both bits are on. though it is quite possible that false drops can be incurred in this way, clearly many incorrect records are easily eliminated. note also that the user need input only as much of the word as he has confidence in, provided that at least three characters are produced. use of the above technique leaves a number of decisions that still must be made by the system designer. among these are: 1. should a signature be associated with each item, or only a part of them, for example, with corporate authors? 2. how much truncation is appropriate, if any? if no truncation is used, then the user can input fragments of words, including fragments taken from the middle of a word, as well as full words. on the other hand, as the signature fills up, the probability of a false drop increases. earlier research contains a formula that allows us to estimate this effect.18 consider a title with six significant words. 
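the signature construction in steps 1 through 4 can be written out directly. the sketch below is an illustration of the procedure as described (letter ranks, multiplication by 1111, remainder modulo 32, leftmost bit numbered 0), with an assumed list of non-substantive words; it reproduces bit 23 for "ela" and the bit pattern shown above for the ramsey title.

    # signature construction following steps 1-4 above (illustrative sketch)

    def bit_for(trigram):
        # letter ranks (a=01 ... z=26) -> six-digit number, times 1111, mod 32
        n = int("".join(f"{ord(c) - ord('a') + 1:02d}" for c in trigram))
        return (n * 1111) % 32

    STOP = {"a", "an", "and", "of", "the", "to"}   # assumed non-substantive words

    def signature(title):
        sig = 0
        words = [w for w in title.lower().split() if w not in STOP]
        for i, word in enumerate(words):
            stem = word[:4]                        # truncate to four characters
            trigrams = [stem[:3], stem[1:4]]       # the two consecutive three-letter strings
            if i == 0:
                trigrams = trigrams[1:]            # first word is already in the search key
            for t in (t for t in trigrams if len(t) == 3):
                sig |= 1 << (31 - bit_for(t))      # leftmost bit is the 0th bit
        return sig

    print(bit_for("ela"))                          # 23, as in the worked example
    print(f"{signature('religious language'):032b}")
    # 00000000000000010000000001000010 -- the pattern given above for the ramsey book

with the construction in hand, the false-drop estimate for the six-word title just posed continues below.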
fayollat has found that in a file of biomedical serials, about 83 percent of all items will be of this size or less.19 similarly, let us assume that the average word in the title is made up of eight characters, a modal number of characters in fayollat's data base. if the user requests a term composed also of eight characters, then table 1 estimates the probability of a false drop as a function of the truncation size.

table 1. probability of false drops as a function of truncation size.

    truncation length    probability of false drop
    3                    .17
    4                    .10
    5                    .08
    6                    .08
    7                    .08
    8                    .09

it is seen that for this typical case, the method eliminates about 90 per cent of the false drops. it must be understood that longer titles, or titles made up of longer words, will be more likely to be erroneously retrieved; on the other hand, the user can increase his precision by inputting a larger number of terms. the above calculation assumes that terms in the request and in the title are independent; of course, all items having the same search key as the request and sharing the discriminant word will be retrieved; presumably the user will minimize this effect by choosing distinctive words. fayollat finds that 50 percent of the words appearing in his titles occur only once.

conclusion

in conclusion, we propose a technique for entering a bibliographic data base that retains the simplicity of search keys while also including some of the flexibility that boolean expressions of key words have for uniquely defining an item. in such a system, the only indexes that must be maintained are the hash tables; the other indexes, such as title words, are replaced by the search algorithms. if a signature, the supplementary field described above, is also stored in the index, this approach reduces the number of disc accesses. a major limitation of this approach is that a user must be able to provide a search key; this is shared, however, with systems depending exclusively on search keys. furthermore, since the system is capable of handling larger numbers of returns on the search key, there is greater inducement to associating more search key values for each item. thus such a hybrid system allows groups that find search keys an attractive access technique to extend this approach to file sizes which strain the capacities of the direct approach.

references

1. d. hsiao and f. harary, "a formal system for information retrieval from files," communications of the acm 13:67-73 (feb. 1970).
2. d. lefkovitz, file structures for on-line systems (new york: spartan books, 1969).
3. g. dodd, "elements of data management systems," computing surveys 1:117-35 (june 1969).
4. d. knuth, fundamental algorithms, the art of computer programming, vol. 1 (new york: addison-wesley, 1968).
5. j. j. dimsdale and h. s. heaps, "file structure for an on-line catalog of one million titles," journal of library automation 6:37-55 (march 1973).
6. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries for a name-title catalog by use of truncated search keys," proceedings of the asis 7:79-82 (1970).
7. p. l. long and f. g. kilgour, "a truncated search key title index," journal of library automation 5:17-20 (march 1972).
8. a. landgraf, k. rastogi, and p. long, "corporate author entry records retrieved by use of derived truncated search keys," journal of library automation 6:156-61 (sept. 1973).
9. james fayollat, "on-line serials control system in a large bio-medical library. 104 journal of library automation vol.
7/2 june 1974 part ii. evaluation of retrieval features," journal of the asis 23:353-58 (nov.dec. 1972). 10. a. h. epstein et al., articles in proceedings of the asis 10 ( 1973). 11. a. bookstein, "double hashing," journal of the asis 23:402-5 (nov.dec. 1972). 12. b. a. matton and p. a. d. de maine, "automatic data compression," communications of the acm 10:711-15 (nov. 1967). 13. w. d. maurer, "file compression using hoffman coding," in computing metho.ds in optimization problems 2, from second international conference on computing methods in optimization problems (new york: academic press, 1969), p.242-56. 14. w. d. schieber and g. w. thomas, "compaction of alphanumeric data," journal of library automation 4:198-206 (dec. 1971). 15. gary fouty, unpublished master's thesis, university of chicago. 16. m. harrison, "implementation of the substring test by hashing," communications of the acm 14:777-79 (dec. 1971). 17. a. bookstein, "on malcolm harrison's subsb·ing testing technique," communications of the acm 16:180-81 (march 1973). 18. ibid. 19. fayollat, "on-line serials control system." 20 file organization of library records i. a. warheit: international business machines corporation, san jose, california library records and their utilization are described and the various types of file organization available are examined. the serial file with a series of inverted indexes is preferred to the simple serial file or a threaded list file. it is shown how various records should be stored, according to their utilization, in the available storage devices in order to achieve optimum cost-performance. one of the problems data processing people are beginning to face is the organization of library files. these are some of the largest and most voluminous files that will have to be organized, maintained and searched. they range in size from the national union catalog of the library of congress, which has over sixteen million records with an average of three hundred characters each, down to the hundreds of small college catalogs of 100,000 records. there are more than fifty universities whose holdings range from one million to over eight million volumes. the average holdings of library systems serving cities of 500,000 or more exceed two million volumes, although the actual number of titles is less. since the tum of the century the university libraries have been growing exponentially and at present are doubling, on the average, every fifteen years. , also the abstracting-indexing services, whose records are very similar to library catalog records and are used in much the same way, have grown very large. chemical abstracts which has been operating since 1907, now has over three and a half million citations. it provides data on file organization of library records/w arheit 21 some three million compounds and is today adding over a quarter of a million citations each year. if the present rate of growth continues, it will be adding 400,000 citations a year by 1971. index medicus and biological abstracts are very similar and there are a number of other somewhat smaller bibliographic services in the field of metals, engineering, physics, petroleum, urban renewal, atomic energy, meteorology, geology, aerospace, and so on. in addition, library-type file maintenance, organization and search are being applied to medical records, adverse drug reaction reports, intelligence files, engineering drawings, museum catalogs · and the like, and these too, represent very large information retrieval files. 
in other words, library files are very widespread and are beginning to become a problem for data processing. characteristics of files the aforementioned library files have certain common characteristics. first, as already noted, they are large. in the next ten or fifteen years there will probably be several hundred libraries with holdings exceeding one million volumes each. second, the records themselves are alphabetic and tend to be voluminous. they range from two hundred characters in an index journal, to three hundred characters for the standard catalog card up to two thousand characters for the abstract journals. in 1962 the library of congress, for example, estimated that it would need a file exceeding 9 x 108 bits to do its normal library processing and to store the serial records; it would need a file of 1.3 x 109 bits to store the circulation records and location directory and monitor the use of the collection, and would need a file of 1012 bits for the central catalog and the catalog authority files ( 1) . on the basis of library experience since 1962, these figures are generally considered too low. third, file records are variable in length. the librarian cannot control his inputs. the world's publica,tions appear in every shape, form and identity and they must be recorded the way they have appeared so that they can be properly identified. artificial identification such as book numbers, call numbers, coden numbers for journals and the like are simply parochial conveniences and do not replace the actual bibliographic record. records in a large catalog file are generally stable and not dynamic. if there is a new edition of a document, a new bibliographic record is made. if the old document is retained along with the new edition, the old catalog record is also retained. the record is discarded only if the document is discarded and, in the large research library, this occurs very infrequently. new indexing or cataloging is seldom applied to old records. in contrast, the smaller item record file used for acquisition and processing, the circulation file, and the serials records file, all ranging from 10,000 to 100,000 records, are dynamic records requiring many and frequent changes, additions and deletions. _ 22 journal of library automation vol. 2/1 march, 1969 each record item must have a number of different access points, since a single class or access point which everyone will accept is an impossibility. at present, with conventional library cataloging, card catalogs and printed indexes provide about five or six access points or records per title. however, computer systems, with their greater opportunity to do deeper indexing, are providing from ten to twenty keys or access points per title. distribution of index tenns is very uneven and not predictable. a few terms have a great many postings or addresses, while many terms, notably author entries, have only one or two postings. file segmentation by subject class has been proposed by some data processing personnel, but inter-disciplinary needs are such that subject segmentation is not considered very seriously. file segmentation by date, especially for the abstract services, is increasing in popularity. it is generally thought that major activity, in the technologies especially, is concentrated in current records; this is less true, however, in the sciences and even less in the humanities. 
public library and undergraduate library personnel may not object to segmenting their files, but those librarians responsible for major research collections that cover all disciplines do not look with favor on segmented files. although circulation records do provide some clues as to the activity of the various parts of a library's collection, no one really knows what the search activity in the catalog is, or how it is distributed across the various records used. therefore, since every record is considered permanent in libraries, major effort has been expended on input processing which has included the recording of much material whose utility is questionable. a user wants to access files in open language, and wants to receive response in open language; he will not use codes and so-called machine language and will tolerate only a minimum of training on methods to interrogate the file. he prefers to engage in an actual dialogue with the file and if he cannot do this will ask a reference librarian or reader's advisor to find the references for him. he also wants real-time response. if he doesn't get fairly prompt answers, he will go elsewhere to satisfy his informational needs. types of files the librarian must work with a number of files: 1) the item record file is the record of an item, book, journal, report. etc., that is being ordered, is on order, is being received, or is being processed by the cataloger. 2) the catalog file is the permanent bibliographic and subject record of the item that has been processed by the cataloger. 3) the serial~ record file, which is in two parts, is the record of holdings of completed volumes both bound and unbound, and the check-in record of currently received periodical issues. 4) the circulation control file keeps the record of all items loaned or otherwise charged out. 5) the catalog authority file organization of library records/w arheit 23 file is the thesaurus-like vocabulary control which indexers and catalogers use as their authority list and guide in assigning index terms. it is also used to "normalize" the inquiries of a searcher and convert them to legitimate index terms. the librarian is also concerned with a number of indexed abstracts produced by various discipline oriented institutions which are used in libraries. he also uses a number of special files: borrower or patron file, special collection files, location files, vendor files, and the like. except for a few comments about the item record, this discussion is confined primarily to the catalog file, which is by far the largest file and, for the librarian and the general user, the most important. as already noted, in most respects it is very similar to the indexed abstract file and, in fact, in certain special libraries, these two files are combined. in process file the in process, or item record, file consists of records of all items which the library is acquiring and processing. it is not a very large file, or, at least if properly policed, should not be. unfortunately, because in manual systems it is difficult continuously to follow up outstanding orders, a lot of deadwood accumulates and files become unnaturally large and difficult to handle. in a well controlled file, however, the number of records does not grow appreciably, for, although new items are added, processed titles are removed when they are added to the catalog file. 
· in addition to providing such normal bibliographic access points as personal author, corporate author, title, report nmj}ber and the like, the item record may also be searched by a number of specialized keys: order number, vendor, publisher, journal code, contract number, fund, requester. the item record is very dynamic. information available to the librarian when the order for an item is placed may be faulty. new information will be coming in about the item, such as price, shipping costs, invoice number, change in vendor, and change in title. various funds have to be charged and obligations changed, payments authorized, funds decremented, receipt notices prepared and sent to requesters, flags in various files changed to prevent duplicate orders and the bibliographic record transmitted to the cataloging staff. however, once an item has been received and cataloged, only the bibliographic information (author, title, place, publisher, date, pagination) are retained and the rest of the information is retired to an historical file. ( 2). because it would provide greater flexibility as new and unexpected demands are generated, the best way to handle this dynamic file would be with a generalized data management system rather than with a tailormade acquisitions and processing program. although present data management systems are really not suitable, because of variable length records in item record files and because terminals will be used, it appears that some could be adapted. 24 journal of library automation vol. 2/1 march, 1969 catalog file the tendency today, however, is to build a single master file with various functional fields where bibliographic information, ordering, and purchasing data, loan records, location information and other item control data are stored. how should this very large master catalog file be organized so that it will be easy and economical to maintain and provide all the desired search capabilities? there are three basic file organization schemes in use today for information retrieval: the serial file, the inverted file and the list process file ( 3,4,5). actually, from a technical point of view, both the inverted file and the list process file represent two different classes of list structures and are, therefore, sometimes referred to as the inverted list system and the threaded list system. serial file organization although the serial file is the easiest and cheapest to maintain, the librarian obviously cannot accept purely serial searching of his catalog. the file is much too big and the real time requirements are such as to rule out any but the shortest, simplest serial or sequential search. as will be pointed out later, the librarian does need some serial searching capability, and of course he does need it if he wants to do any browsing. however, if he is to provide any kind of useful service, he must use direct-access storage devices and access to his records individually. threaded list file organization for a while there was some interest in using a threaded list file organization for the catalog file. here, the searcher is first directed through a dictionary or directory to the latest record associated with a term. this record also contains the chain address of the previous record having the same descriptor, so that a user can run through a "chain" or "list" until he reaches the oldest or last record, or comes back full circle to the starting record. 
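the chain mechanism just outlined can be sketched in a few lines. the fragment below is an illustration of the general threaded-list idea rather than of any particular system: the directory holds, for each descriptor, the address of the newest record posted under it, and each record carries one backward link per descriptor.

    # illustrative threaded-list organization: directory -> newest record,
    # each record chains back to the previous record with the same descriptor.

    class ThreadedFile:
        def __init__(self):
            self.records = {}      # record number -> (data, {descriptor: link to previous})
            self.directory = {}    # descriptor -> number of the latest record posted

        def add(self, number, data, descriptors):
            links = {d: self.directory.get(d) for d in descriptors}
            self.records[number] = (data, links)
            for d in descriptors:
                self.directory[d] = number       # this record becomes the head of the list

        def search(self, descriptor):
            # walk one chain from the newest record back to the oldest
            number = self.directory.get(descriptor)
            while number is not None:
                data, links = self.records[number]
                yield number, data
                number = links[descriptor]

    f = ThreadedFile()
    f.add(1, "title a", ["agriculture", "sugar beets"])
    f.add(2, "title b", ["agriculture"])
    print(list(f.search("agriculture")))         # [(2, 'title b'), (1, 'title a')]

a request on two descriptors, by contrast, must walk a complete chain before their intersection is known, a limitation the discussion returns to below.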
each record belongs to a number of lists, one for each descriptor used to describe it, and there are as many lists as there are descriptors. such a system seems economical of storage space in that a secondary or separate index does not have to be stored, but, since storage space for the chain or link address has to be provided, the actual savings are very small. there are several possible refinements of this list file organization which reduce storage costs. some involve elimination of redundant information; a term, or any other searchable piece of information, is stored just once, sometimes in the form of a table. each record that contains searchable information has a pointer to the term itself. there have to be, of course, pointers from every term back to the records as well. insofar as the pointers may require fewer bits than the terms or addresses themselves, there is a saving in storage space. it does cost some additional processing time and file maintenance is somewhat complicated. ( 6). file organization of library records/w arheit 25 another economy measure is provided by what is generally called a multilist system which groups several-usually three-descriptors into one super key with one chain address. a multilist not only saves space but also speeds both file posting and searching by processing multiple descriptors simultaneously. ( 7,8,9,10,11). such a system, to be workable, must permit grouping of various descriptors into mutually exclusive groups, and within each group there must be some equitable distribution of descriptors posted to records. in normal library information retrieval applications, a very large percentage of the descriptors are used just for one or two documents and only a few descriptors are used to identify a large number of document records. in other words, most of the so-called super keys end up having just a single real descriptor, which is equivalent to establishing a separate list for each descriptor. in a test made with the defense document center collection it turned out that about ninety percent of the super keys had only single descriptors. ( 12,13). there are, in addition, special modifications of multilist files which essentially involve segmenting the multilist to fit the hardware, for example, the track length or cylinder size. (14). a fragmented sub-list, sometimes referred to as a cellular multilist, may even contain all the link addresses in the directory, thus becoming indistinguishable from an inverted file. any list process file organization, however, does pose serious file maintenance problems, especially where individual records must be changed or deleted. also special precautions must be taken to avoid broken chains and provision made to repair breaks, although some advocates of list process files claim it is easier to maintain thread~d lists than inverted lists. of course, if multilists are used, a special effort must be made to build the super keys. · it must not be forgotten that a threaded list directory can only provide the search statistics for a single term and, unlike the inverted list, can only provide intersection statistics upon completion of a total search. the few librarians who have been exposed to threaded list file organization have not reacted favorably. a few have been interested in applying this technique to do hierarchical searches and other relationship connections in their authority lists or thesauri, but have not seriously considered using it for their catalog files. 
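the chained organization described above, a directory pointing to the most recent record posted under each descriptor, with each record carrying the address of the previous record under the same descriptor, can be sketched briefly. this is an illustrative python sketch of the threaded-list idea only; the record layout is an assumption, not any particular system cited here.

```python
# minimal sketch of a threaded (chained) list file: the directory holds, for
# each descriptor, the address of the most recently added record; each record
# holds, for each of its descriptors, the address of the previous record
# posted under that descriptor (or None at the end of the chain).

class ThreadedListFile:
    def __init__(self):
        self.records = []    # record "addresses" are list positions
        self.directory = {}  # descriptor -> address of latest record

    def add(self, title, descriptors):
        address = len(self.records)
        links = {d: self.directory.get(d) for d in descriptors}
        self.records.append({"title": title, "links": links})
        for d in descriptors:
            self.directory[d] = address  # new record becomes the head of the chain
        return address

    def search(self, descriptor):
        # follow the chain from the newest record back to the oldest
        address = self.directory.get(descriptor)
        while address is not None:
            record = self.records[address]
            yield record["title"]
            address = record["links"][descriptor]


if __name__ == "__main__":
    f = ThreadedListFile()
    f.add("book one", ["chemistry", "kinetics"])
    f.add("book two", ["chemistry"])
    print(list(f.search("chemistry")))  # ['book two', 'book one']
```

deleting or changing a record in such a file means finding and repairing every chain that passes through it, which is exactly the maintenance problem noted above.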
inverted file organization the traditional library file organization as exemplified by the standard card catalog has been based on a serial main file plus an inverted file. here a normal serial file is "inverted" and the file sequenced by index entry or key. the record itself is duplicated under each of its keys, which librarians call tracings. by strictly limiting the number of tracings or keys applied to each record, the librarian can keep the card catalog down to a reasonable size. however, as deeper indexing is applied to the documents, more keys or tracings are used and the file becomes very large. 26 ] ournal of library automation vol. 2/1 march, 1969 furthermore, storage costs in the mechanized file are appreciably higher than in an ordinary manual card file. the full record, therefore, in a mechanized system cannot be economically stored behind each term. only the document or record number or file address of the master record is recorded after each term; in other words, the inverted file is just an index to the record file. the main record file itself is a simple serial file where each record is complete in itself, the tracings or keys in the record and the address of the record being duplicated on the inverted file. the catalog file, therefore, is made up of two parts: a serially organized main or master record file, and an inverted index to the main file. ( 15). maintenance of an inverted index is expensive. tracings and the addresses to which they refer have to be duplicated, requiring costly additional storage space. new terms and new addresses cannot simply be added to the end of a file but must be distributed and interfiled throughout tl1e index, causing a number of file maintenance problems. the inverted index and main serial file must be kept in phase, with changes in one being reflected in the other. to maintain these files, separate inputs should not be prepared; instead the inverted index should be generated from the main record file update by program control. ( 16,17,18,19). although the combined file organization of a serial record file and an inverted index does cost more to maintain than serial or list file organization, it provides such superior search capabilities that it has become the favored library catalog file organization. since the inverted file is organized by subject headings or descriptors and since a search request is specified by listing the desired descriptors and their logical relationships, the search programs need only examine the items filed behind each selected descriptor or subject heading. it is unnecessary to look at all the records, as it is with the serial file. the inverted file search, in · its basic form, takes the request descriptors, obtains the list of record addresses or items under each relevant descriptor, makes the specified logical connections, and produces all items satisfying the request. the search procedure examines only potentially pertinent records, ignoring the rest of the file. in other words, the file is organized every time a search is made to suit the requirements of the search. thus, the file and the request are compatible and utilization of the file is essentially independent of its size. an inverted index provides a very special capability to a searcher who is using a terminal, on-line system. he can test both individually and collectively the effectiveness of the terms of his search statement without having to make a complete search of the master record, simply by examining the inverted index. 
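the combined organization just described, a serially organized master record file plus an inverted index of tracings pointing at record addresses, can be sketched briefly. this is an illustrative python sketch with assumed field names and sample data; it is not drawn from any of the systems cited here.

```python
# minimal sketch of a serial master file plus an inverted index generated from
# it: the index maps each tracing (subject heading, author, etc.) to the set of
# master-file addresses carrying that tracing, so a boolean search examines
# only the index, never the whole master file.

from collections import defaultdict

def build_inverted_index(master_file):
    index = defaultdict(set)
    for address, record in enumerate(master_file):
        for tracing in record["tracings"]:
            index[tracing].add(address)
    return index

def search(index, all_of=(), any_of=()):
    """records bearing every term in all_of and, if given, at least one term in any_of."""
    result = None
    for term in all_of:
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    if any_of:
        union = set().union(*(index.get(t, set()) for t in any_of))
        result = union if result is None else result & union
    return result or set()


master = [
    {"title": "liver diseases", "tracings": ["medicine", "diagnosis"]},
    {"title": "management science", "tracings": ["medicine", "operations research"]},
]
idx = build_inverted_index(master)
print(len(idx["medicine"]))                            # entries under a single term
print(search(idx, all_of=["medicine", "diagnosis"]))   # intersection of terms
```

the size of the intersection computed from the index alone is the upper limit on the number of hits that, as the next paragraph notes, can be returned to the terminal before the master records are touched.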
the system will tell him, for example, the number of entries under a term. it will tell him how many entries several terms share in common so that he can test the intersections, that is, the conjunction and disjunction of the terms. the count of addresses that results from the list intersection can be returned immediately to the terminal as an upper limit of the number of hits. in effect, decoding of the boolean expression takes place in the inverted index, which is a very compact list, and hence the response time is fast. it is true that some additional calculations and comparisons in the record itself may reduce the number of hits, but they will never increase them. sitting at a terminal, a searcher can ask the system what the maximum number of hits will be in response to a search statement. he can change the parameters of his search statement and see immediately what effect that will have on the response of the system. it is primarily because of this capability of the user to have a dialog with the machine that every terminal-oriented library information retrieval system of which the author is aware is adopting an inverted file organization. in order to reduce storage costs, not every search term need be carried on an inverted index. those search terms or index entries that are practically never searched alone, but are used rather in conjunction with another term or tracing, are carried only in the main file and not on the inverted index. in a library catalog these terms are usually the place and date of publication, publisher, language of the book, level of the publication (i.e., adult, children, youth), number and type of illustrations, and so on. these terms appear on almost every record and some of them are high-density terms; that is, they are heavily posted. for example, in a typical u.s. library, some eighty percent of the books are identified as being in english. form headings (bibliography, essay, poem, biography, map, etc.), geographic headings, and numerics that are used in conjunction with what are called main headings also do not appear on the inverted index, but can be searched in the main file. in the very unlikely event that a search is required to be made only for a term not on the inverted file, then, of course, a serial search can be made of the master file. in some systems, a very compact serial file of data may simplify serial searching of the master file.

physical organization

a basic understanding of how a library's records are used is necessary to a proper plan for their physical organization. in a manual system, logical organization and physical organization of a library's records are identical. furthermore, all files are physically the same, usually on 3x5 catalog cards or, in a few cases, in printed book or sheaf catalogs. in a computer system, however, because of varying capacities, speeds, and storage costs of different direct access devices, it is extremely important that the various records and segments of records be stored in those devices which will give the best cost-performance for the application. this means that the rate of utilization of the various records and parts of records, as well as the size of the records, will determine what types of devices will be used as physical files.
in a library operation there is very heavy use of index terms, or subject headings and author entries, to search the files; records for these entries can be very short. borrower records and charge-out records in circulation control systems are also very actively used. there is less use made of the bibliographic record or journal citation. these records are somewhat longer than the subject and author tracings, and hence require more storage, but do not need such rapid access. notes, abstracts, and other explanatory material can require an enormous amount of storage space but, as a rule, are used only infrequently. patron registration, as contrasted with borrower records, is used much less frequently, unless, of course, the two types of records are combined. since serials holdings records do not change very frequently, printouts are quite satisfactory as finding tools and the records are usually kept off-line. journal check-in, however, requires a great number of accesses every day. in view of the requirements generated by the above uses, the present thinking for on-line library systems, in terms of current hardware, runs something like this: in a combined file system as described above, with the bibliographic record on the serially organized main file and the index in an inverted file arrangement, the inverted file, which must be accessed many more times than the main file, would best be carried on disk files. the bibliographic record itself, being much more voluminous and accessed less frequently, is stored in a larger, slower, more economical file like the ibm 2321, the data cell. abstracts and other seldom-used bulk records might well be on tape, off line. actually, though, as libraries build up their record files to control their total collections, they will, of course, exceed the capacity of the present data cells and will have to go to future mass memory devices similar to the ibm photo-digital storage system. then it may be economical to put even abstracts and notes of the bibliographic record on line. if there is a separate item record file of in-process or acquisitions data, it can be handled in the same way as the catalog file, that is, all access points as an inverted file on disk with the record itself on the data cell strips. if, however, the total item record file is not too big, it might well be stored on disk. circulation control records are carried on disk, but patron registration, if it is to be kept on line, would be more economically stored in the data cell. the authority list or thesaurus really has two functions. it is heavily used to validate and convert all inputs and all search requests. it is also used to store all cataloging and indexing decisions and to provide guides to users as to the formulation of search queries. the necessary data makes for long records that are either infrequently used or available as printouts. therefore, a condensed form of the authority list or thesaurus, a form which carries only the terms and their equivalents, is best stored on disk, whereas the full-blown authority list, which is used primarily for printing the thesaurus and its supplements, can be carried off-line on tape, or in the cheapest, biggest, and slowest direct access device which is available. in order to achieve economical, compact storage, the subject headings, descriptors, or index terms would not be stored in open language but in numeric codes.
by using, for example, the decimal code as used in a dewey decimal system, numeric codes would also make it possible economically to build hierarchies or class tables with the descriptors. it would be necessary, therefore, in every transaction, to translate from open language to code when interrogating the system and to translate from code to open language when outputing from the system. translations would have to be very fast to accommodate the traffic of a large number of terminals. the translation job, using a stored table, might have to be done in an auxiliary, large core storage, which is very fast but more expensive than disk files. as a general rule, what is being proposed is that for very large files the index and the bibliographic record are not to be stored in the same device. one might start this way until the file and the traffic into it are built up and the system becomes fully operational. however, the system should be so structured that indexes could be stored in files that are faster than the bulk storage devices used for the records. the translation files, that is, the tables that convert from open language to stored codes on input and the reverse on output, can be stored in the fastest available exemal storage. ( 20). it is extremely doubtful that hardware development in the' immediate future will change these principles of library file organization very much. as storage costs drop, total capacities increase, and _access times become shorter, more and more libraries will find it practical and economical to put their files on line in order to provide the improved services that users demand. references 1. u. s. library of congress: automation and the library of congress (washington, government printing office, 1963), p. 74. 2. batts, n. c.: "data analysis of science monograph order/cataloging fmms," special libraries, 57 (october, 1966), 583-586. 3. "corporate data file design," edp analyzer. 4 (december, 1966). 4. climenson, w. d.: "file organization and search techniques," annual review of information science and technology. 1 (new york: interscience, 1966), p. 50. 5. borko, h.: "design of information systems and services," annual review of information science and technology, 2 (new york: interscience, 1967), p. 50. 6. castner, w. g., et al.: "the mecca systema modified list processing application for library collections," proceedingsa. c. m. national meeting ( 1966), pp. 489-498. 30 journal of library automation vol. 2/1 march, 1969 7. prywes, n. s., et al.: the multi-list system (philadelphia, moore school of electrical engineering, university of pennsylvania technical status report no. 1 under contract nonr 551(40), november, 1961). 8. prywes, n. s.; gray, h. j.: "the multi-list system for real time storage and retrieval," ifip congress proceedings. 1962, pp. 112-116. 9. university of pennsylvania, moore school of electrical engineering: the tree as a stratagem for automatic information handling (report of work under ... contract nonr 551 ( 40) and ... af 30 ( 602)2832, moore school report no. 63-15, 15 december 1962). 10. lefkovitz, d.: automatic stratification of descriptors (philadelphia, moore school of electrical engineering, university of pennsylvania, technical report under contract nonr 551 ( 40), moore school report no. 64-03, 15 september 1963). 11. landauer, i.: "the balanced tree and its utilization in information retrieval," ieee transactions on electric computers (december, 1963), pp. 863-871. 12. 
univac division of sperry rand corporation: multi-list systems: preliminary report of a study into automatic attribute group assignment; technical status report no. 1-2, 3#ad 609 709, 4#ad 609 710 ( 1963-1964). 13. univac division of sperry rand corporation: optimization and standardization of information retrieval language and systems; final report (ad 630-797, 1966). 14. lefkovitz, d.: file st1'uctures for on-line systems (class syllabus). 15. curtice, r. m.: magnetic tape and disc file organizations for retrieval (lehigh university, center for information sciences, july, 1966). 16. warheit, i. a.: "the direct access search system," afips conference proceedings, 24 ( 1963), pp. 167-172. 17. warheit, i. a.: the combined file search system. a case study of system design fm· information retrieval (paper presented at the f. i. d. meeting in washington, d. c., october 15, 1965; abstract, 1965 congress, international federation for documentation ( fid), washington, d. c., u. s. a. 10-15, october 1965), p. 92. 18. prentice, d. d.: the combined file search system (san jose, california: ibm june 15, 1964). 19. 1401 information storage and retrieval systemversion ii; the combined file search system, no. 10.3.047 (hawthorne, new york: ibm, may 1, 1966) . 20. warheit, i. a.: file organization for libraries; report to project intrex, mit, cambridge, massachusetts, march 14, 1968. lib-mocs-kmc364-20131012115244 150 the british library's approach to aacr2* lynne brindley: british library, bibliographic services division, london, england. the formal commitment of the british library to aacr2 and dewey 19 entailed substantial changes to the u.k. marc format, the blaise filing rules, and a variety of products produced for the british library itself and for other libraries, including the british national bibliography. the british library file conversion involved not only headings but also algorithmic conversion of the descriptive cataloguing. along with the u.s. library of congress and the national libraries of australia and canada, the british library was formally committed to the adoption of the anglo-american cataloguing rules, second edition (aacr2) and decimal classification, 19th edition (dc19) in 1981. this entailed fairly substantial changes to the marc format as published in the u.k. marc manual, 2nd edition as well as the implementation of the new and more sophisticated blaise (british library automated information service) filing rules. 1 there is, of course, never an ideal time for making major changespolitically, economically, or technically; and the bibliographic services division (bsd) found itself having a large number of preexisting separate systems, particularly for our batch processing work, which had grown up over a long period of time and had in most cases been tailor-made to the individual products. whilst relatively small, bsd is nonetheless responsible for a multiplicity of products and services, almost all of which were to be affected to some extent by the change toaacr2/dc19. briefly, then, a comment on the different services and the degree to which they were affected, thus setting the scene for our decisions on machine conversion. *based on a talk given at the library association seminar "library automation and aacr2," held in london on january 28, 1981. the views expressed in this paper do not necessarily represent those of the british library or the bibliographic services division. manuscript received june 1981; accepted june 1981. 
services and impacts

printed publications

the major printed publication of the division is the british national bibliography. it is arguable that for the printed publications (especially the weeklies) there would have been little justification for retrospective conversion. the files could have been cut off at the end of 1980 and started afresh for 1981; it might, however, have precluded, or certainly made more messy, the possibility of any multiannual cumulations across this period.

microform products

these are mostly individual com catalogues, both within the bl, especially the reference division, and externally, provided through locas (bsd's local catalogue service) to some sixty libraries in the u.k. in many ways those libraries that plunged into automation early, building up files of records derived from central u.k. and lc marc, were likely to be worst affected. individual machine-readable files had grown very large and exploited not only relatively current cataloguing data, but also full retrospective u.k. holdings back to 1950. also, we foresaw no lessening of use by libraries taking our catalogue service of the u.k. retrospective 1950-80 file after aacr2 implementation. therefore the grounds for attempting automatic retrospective conversion of records were indisputable.

tape services

u.k. exchange tapes, either as a weekly service or through the selective record service, are supplied to nearly one hundred organisations. the same argument, that there will be continuing selection from the retrospective files, applies; therefore, for compatibility and ease of use we needed to consider conversion. the weekly exchange tape service makes a clean aacr1/aacr2 break, but obviously libraries have back files of aacr1 records. mindful of our responsibility to other organisations and agencies utilising our records, we decided to make our own converted tapes of lc and u.k. marc records available to tape-service customers to aid their own conversions.

online services

regarding the blaise online information retrieval system for u.k. and lc marc, our concern was to ensure continued easy searching and printing across the total span of files. without automatic conversion it would have been difficult, if not impossible, to ensure consistency in search elements and index entries (e.g., in u.k. marc, series fields 400, 410, and 411 no longer exist, so without conversion a searcher would have to remember specific search qualifiers for pre-1981 records, and different ones thereafter). without conversion the searcher would need a lot more knowledge of marc and the history of cataloguing practices to formulate effective strategies.

outside users of marc

last and very much not least was a consideration of what we could do to help the now large community of u.k. marc users in coping with the changeover. this is now a very large and diverse group relying on bsd for the provision of bibliographic records for whatever purpose. our own conversion enabled us to provide a multiplicity of aids to libraries. of particular note are (1) u.k. and lc exchange tapes of converted records, and (2) machine-readable and microfiche versions of our own name conversion file, which is being used as the basis for the new name authority fiche. so, in the context of the variety of our services, the case for conversion was strong.

retrospective conversion

the extent of the retrospective conversion exercise is discussed below.
in conjunction with this work we were faced with the necessity of rationalising our com and print product software (library software package), both to enable it to drive each of the previously separate print applications and to ensure that it had sufficiently sophisticated output facilities to cope with the complexity of aacr2/u.k. marc 2 records, with their increase in numbers of subfields, their repeatability, all or some, and varying sequences, to produce the specified layout and punctuation across our services. extent of conversion we are now in a position to discuss the retrospective conversion exercise. having decided in principle to become involved with conversion, the extent of our involvement had to be established. british libraries have never had the tradition of building and utilising name authority files, and certainly the concepts fit more easily in the north american primarily online system context rather than in the predominantly batch cataloguing systems established in the u.k. the bl therefore found itself without a machinereadable authority file and began to create one from scratch to enable the important heading changes required by aacr2 to be handled automatically. again because of the overriding importance of com catalogues in the u.k., considerable attention was paid not only to automatic heading changes but also to automatic marc coding and text conversions bringing the descriptive cataloguing elements also into line with aacr2/u.k. marc 2, so that catalogue records could be consistent on output whether derived from the conversion or newly created . the third consideration for conversion was our library of congress file british library!brindley 153 (books all1968), used in the u.k. as part of our cataloguing services and as a file in the blaise online system. we had always performed certain conversions on lc records to bring them more into line structurally with the u.k. marc format. however, u.k. libraries using these records for cataloguing purposes still had to undertake substantial editing. it was therefore decided to use the opportunity to enhance this conversion and bringlc records into line with u.k. marc 2 to make them of maximum use to british librarians. to summarise, then, the retrospective conversion comprised three main parts: 1. that part which utilised information stored in the name conversion file, which records the aacr2 and aacrl forms of names. this enabled the automatic conversion of major, commonly occurring personal and corporate headings. 2. automatic marc coding and text conversions-this consisted of specifications at marc tag and subfield level of algorithms for automatic marc coding and scme bulk text conversions. it resulted in records being converted to a pseudo-aacr2/u.k. marc 2jormat, so that all output specifications, whether by profile or by online inversion, had only to cater for the new format. these two parts of the conversion are inexorably linked, both conceptually and in programming terms , with frequent references to alternative courses of action dependent on whether a match has been found on ncf. the details of conversion are in "specification for retrospective conversion of the uk marc files 1950-1980,"2 prepared in the computer services department. 3 . the third facet of conversion was to our library of congress files (books all1968), to bring records in line with u.k. marc 2 as far as possible. 
only conversions of tags, indicators, subfield marks, punctuation, and order of data elements have been included; no attempt has been made to bring textual data into conformity with bsd practice. the converted records are therefore in aacr2 form to the extent that lc applies aacr2 to a particular record. the next section highlights major points of each part of the conversion, commenting particularly on aspects of programming and testing.

name conversion

the name conversion file was built up by bsd's descriptive cataloguing section over nine months of 1980 and comprises authenticated aacr2 headings with the aacr1 form where different. it will form the basis of an authority file of headings and references for future bsd cataloguing and will be the first publicly available u.k. authority file. the file was maintained using existing locas facilities. pseudo-marc records were created recording the aacr1 and aacr2 forms of headings in the format shown in example 1.

example 1. name conversion file record
field 001 (control number)
049 (source code)
110.1 $a great britain $c accidents investigation branch (name heading in aacr2 form)
710.1 $a great britain $c department of trade $c accidents investigation branch (name heading in aacr1 form)
910.1 $a great britain $c department of trade $c accidents investigation branch $x see $a great britain $x accidents investigation branch (reference for aacr2 name heading)

the file being used for conversion comprised some 12,000 records, of which 4,000 had aacr2 heading changes. the remaining records were authenticated by bsd as correct aacr2 headings without alteration. of the changed headings most were prolific personal and corporate (particularly u.k. government) headings. the first stage of the conversion process for u.k. marc records (1950-80) involved all records being processed against the name conversion file to replace aacr1 with aacr2 headings and associated references. in programming terms, the name conversion was relatively easy; relatively, that is, in the context of bibliographic programming. the matching program used was not particularly sophisticated. it took each ncf record, identified the 7xx (aacr1) field, created a key of fifty characters stripping out all blanks, embedded punctuation, and diacriticals, and then tried to match the key against each 1xx heading in whatever file was being converted. if there was a match on the key, then the program proceeded to match character by character through the data looking for an exact match. if this was not found, then the ncf record was not processed. example 2 shows this procedure more clearly.

example 2. name conversion matching
ncf record:
710 (aacr1) $a great britain $c civil service department $c central computer agency#
110 (aacr2) $a central computer agency#
910 (aacr2) $a great britain $c civil service department $c central computer agency $x see $a central computer agency#
key: 10$agreatbritain$ccivilservicedepartment$ccentralc
matching on data: would match "central computer agency"; would not match "central cataloguing agency"
n.b. the key equals 50 characters (upper case)

ncf record:
700 (aacr1) $a walker $h david esdaile#
100 (aacr2) $a walker $h david e. $q david esdaile $r 1907- #
900 (aacr2) $a walker $h david $c 1907- $x see $a walker, david e.#
key: 10$awalker$hdavidesdaile
book record before: 100 walker $h david esdaile#
after: 100 $a walker $h david e. $q david esdaile $r 1907- #
       900 $a walker $h david $c 1907- $x see $a walker, david e. $z 100#
n.b. addition of new reference

of course, this file has not converted all aacr1 headings, but it has ensured that the majority of headings likely to recur (i.e., of any significance in catalogue collocation of headings) have been automatically changed.
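the matching procedure just described, a fifty-character key built by stripping blanks, embedded punctuation, and diacriticals from the aacr1 heading, followed by a character-by-character confirmation, can be sketched as follows. this is a hedged python illustration of the logic as described above, not the actual bsd program; the record layout and helper names are assumptions made for the example.

```python
import unicodedata

KEY_LENGTH = 50

def make_key(heading):
    # build the 50-character match key: fold to upper case and strip blanks,
    # punctuation, and diacriticals, keeping subfield marks and alphanumerics.
    decomposed = unicodedata.normalize("NFKD", heading)
    kept = [c for c in decomposed if c.isalnum() or c == "$"]
    return "".join(kept).upper()[:KEY_LENGTH]

def convert_headings(ncf_records, bib_records):
    """replace aacr1 (1xx) headings with aacr2 forms where the ncf matches."""
    # index the ncf by the key of its 7xx (aacr1) heading
    ncf_by_key = {make_key(r["aacr1_7xx"]): r for r in ncf_records}
    changed = 0
    for rec in bib_records:
        ncf = ncf_by_key.get(make_key(rec["1xx"]))
        if ncf is None:
            continue  # no key match: leave the record alone
        # key match found: confirm with a full comparison before substituting
        if rec["1xx"].strip().lower() == ncf["aacr1_7xx"].strip().lower():
            rec["1xx"] = ncf["aacr2_1xx"]  # substitute the aacr2 heading
            rec.setdefault("references", []).extend(ncf.get("references", []))
            changed += 1
    return changed
```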
automatic marc coding and text conversions

this is commonly known as the format conversion program and forms the bulk of the "specification for retrospective conversion." the original specification was extremely complex, particularly bearing in mind the tight time scales that we were working to. the major difficulty throughout all parts of this facet of conversion was having to specify procedures to accommodate the variety of usage of marc across thirty years, including previously automatically converted 1950-68 u.k. marc records; it has been almost impossible to verify absolutely that any of the automatic changes would cover all cases. not surprisingly, this was an extremely complex program. it had to allow for manipulating nonstandard and variable data in fairly precise ways, and had to be designed to cope with occurrences in many different combinations. the programmer had to code for these combinations, some of which may possibly never have been used. it is probably the case that certain combinations do not exist, but this could not be guaranteed over such a large number of records until the total file had been converted. a good example of the complex logic of this kind of processing is found in the 245 field, where seven complex conditions were allowed for (the details of each condition are left blank in the published skeleton):

field 245
(1) if $e ___ then ___ else ___
(2) if $f ___ then ___ else ___
(3) if $d or $e ___ or ___ or ___ or ___ or ___ or ___ or ___ then ___ else if $d or $e ___ or ___ or ___ or ___ or ___ or ___ or ___ then ___
(4) if tags ___ then ___
(5) if 008 and ___ or ___ or ___ then ___
(6) if $h ___ then ___ and ___
(7) if $e ___ then ___ else if first $e ___ then ___ else ___ else ___
repeat for all levels of 245.

another variation on this theme is that the specification catered for what it expected to find. again, because of the volume and span of data, the expected was not always found. for example, a lot of processing of references is dependent on the presence of a $x. what do you do when you find a record accidentally without one? a third problem was that of interdependency of fields and subsequent actions. a good example of this is found in 110s and related 910s. if a 110 is changed, you may have to create a 910, replace a 910 with another one, or reorganise existing subfields. then you may have to reorder the field and also flag the action to come back to later in the program. hence you are switching back and forth across fields throughout the program; you cannot simply start at field one, process sequentially, and then stop. clearly this makes program testing that much more complicated. however, those were the problems, and really a very small percentage of the whole. from all that has been seen of the converted files so far, it has been a highly successful exercise. all of the major marc changes and many of less significance have been converted automatically by this program (treaties, laws, statutes, series, conferences, multipart works), the resulting records being consistent in marc tagging structure and in significant headings and areas of text.
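to make the shape of this processing concrete, the sketch below expresses a couple of conversion conditions as data-driven rules, including the kind of cross-field dependency described for 110 and related 910 fields. it is a hypothetical python illustration with assumed record and rule layouts; it is not the bsd specification or program, and the specific conditions shown are invented stand-ins.

```python
# hypothetical sketch: per-field conversion rules applied to a record held as a
# dict of tag -> list of field strings. rules may inspect and modify other
# fields, mimicking the 110/910 interdependency described above.

def rule_245_example(record):
    # stand-in for one of the 245 conditions: if any 245 carries a $e,
    # flag the record for an alternative processing path.
    for value in record.get("245", []):
        if "$e" in value:
            record.setdefault("flags", []).append("245 has $e")

def rule_110_creates_910(record):
    # if a 110 heading was replaced, make sure a "see" reference (910) exists
    # pointing from the old form to the new one.
    old = record.get("110_old")
    if old and record.get("110"):
        reference = f"{old} $x see {record['110'][0]}"
        if reference not in record.get("910", []):
            record.setdefault("910", []).append(reference)

CONVERSION_RULES = [rule_245_example, rule_110_creates_910]

def convert(record):
    for rule in CONVERSION_RULES:
        rule(record)
    return record


if __name__ == "__main__":
    rec = {
        "110": ["$a central computer agency"],
        "110_old": "$a great britain $c civil service department $c central computer agency",
        "245": ["$a annual report $e edited by a. n. other"],
    }
    print(convert(rec)["910"])
```

keeping the conditions as separate, named rules is one way to cope with the "unusual combinations" problem the text describes: each rule can be tested in isolation and then exercised against volume test files in combination.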
library of congress file conversion it has already been stressed that the automatic marc coding and text conversions for u.k. marc were very complex programs. perhaps even more complicated was the conversion program written to transform lc into u.k. marc format. the main reason for this is that the u.k. and ncf conversions are one-off programs and a great number of the manipulations could be hard-coded. however, it is intended that the lc conversion program will be used on an ongoing basis against each weekly lc tape. thus each conversion has been treated as a separate parameter to the british library!brindley 157 program so that it is general purpose and easily alterable in the light of changes of practice by lc. to give you some idea of the complexity, there are well over 600 separate parameters to the program. i say separate, but in fact they are interrelated parameters, so that if a minor change is made to one it can potentially affect many others. many of the problems relating to this program could again only be really apparent in volume testing, not in writing. each parameter written and tested in isolation was satisfactory, but when they began to be put together in modular form, then the problem of unusual combinations began to show. although the conversion parameters for lc records are extensive, they cannot touch the cataloguing data, certainly not nearly as much as in the u.k. marc conversion. there are added problems in the fact that the records coming to us from lc do not show the clean aa crl/ aacr2 break that bsd is adopting. we are having to allow for mixed records from lc at least in the foreseeable future. details of the lc-to-u.k. marc conversion are published in a detailed specification. 3 common issues in conversion testing it is possible to draw out common problems applicable across all the conversion work, particularly in testing. they are as follows: 1. variability of records; 2. complexity of records; 3. volume of data; 4. nonstandard data; 5. repercussions throughout system. variability this is an obvious problem in the handling of marc records, but particularly pertinent when trying to do such complex manipulations. the record format itself is of course variable-there are very few essential fields or data elements; most need not be present at all; if they are present, they can be there once or ten times. standards of cataloguing, and therefore marc coding, have changed considerably over the period in question, adding to the variability. in some exceptional cases bsd practices are different from those prescribed in the marc manual, e. g., nonstandard use of title references. all of this results in additional difficulties from specification, through programming and testing. on average we found that one conversion process took two to three times the amount of coding required for more normal computer processing. complexity this is linked with variability and was manifest particularly in the fact that it was extremely difficult to ensure that the programs catered for all 158 journal of library automation vol. 14/3 september 1981 conditions. we found that testing threw up oddities not allowed for in the original specification. in an ideal situation with no time constraints a totally tailored and comprehensive test file should have been drawn up for each facet of conversion. this exercise alone would have taken a good year and would still not have catered for the unexpected data problems. 
in practice, whilst bsd's descriptive cataloguing staff were able to provide several hundred records that tested the majority and most important of the conversions, we always faced the possibility of coming across exceptions. this soon became apparent when volume testing commenced and each new file threw up another combination and a different program route not previously tested. volume the third major factor adding to the complexity of the whole operation was the sheer volume of data to be processed. approximate figures are as follows : u.k. marc 0.7 million records lc marc 1.4 million records locas 2.5 million records the combination of these three factors-variability, complexity, and volume of datamade testing extremely difficult and expensive in machine terms, in that large test batches of material had to be processed. nonstandard data like any large file, u.k. marc has its share of incorrect data, most of it of no particular significance. however, some problems arose in conversion testing resulting occasionally in corrupted records. one example that springs to mind was the incorrect spelling of months in treaties, giving problems in the 110 $b conversion to 240 . repercussions throughout system a cautionary note, really: we made a decision that postconversion records should not be put back and overwrite existing master files until they had been through validation programs (i.e., those used for validating new input for bnb and locas); it was felt that this was a necessary safeguard against reintroducing any structurally incorrect records postconversion. it was here again that testing threw up timely reminders of just how much the validation programs had been upgraded and changed since many of the original records had been input through the system. scheduling the scheduling of such a large, complex exercise was extremely difficult, with interdependency of processing related to the success or otherwise of overnight runs . a lot of time was spent before the conversion period in british library/brindley 159 discussion with our computer bureau to ensure maximum cooperation throughout the difficult time. they were extremely helpful in ensuring operator coverage throughout weekends and priority for our work. one of the problems we encountered was having to forecast the approximate number of machine hours that would be required throughout january 1981 when the bulk of conversion work was carried out. at the time the figures were needed we were still in early stages of programming so no volume tests could be run. equally, although we were experienced in large-volume processing it was difficult to draw any direct comparisons with production work. additionally, we had to allow for a heavier than normal production work load towards the end of the year, which always sees annual volumes, cumulations, online file reorganisation, and so on. scheduling therefore was a fine art to ensure correct priorities for production, the bureau's own work, and conversion , and to minimise contentions for files and peripherals. staffing of interest is a picture of the human resources involved in this project . what is striking is the magnitude of the task achieved by very few people. the overall management of the project was taken on by existing line management within bsd's computer services department. two project leaders were appointed, one a librarian and one a systems analyst. 
the librarian had a team of four temporarily seconded staff who were totally responsible for all output profile specifications (printed products and com), testing, and implementation. they also did a considerable amount of checking of test file conversion runs . the systems analyst was a project leader for three analyst-programmers and one jcl writer. between them they were responsible for lc and u.k. conversion programming and the new filing rules. existing operations staff and others as appropriate within the division were called upon for other tasks. disruption to services whilst disruption to our normal production services was kept to an absolute minimum, it was decided that it would be necessary to temporarily suspend certain services through the month of january 1981 while the bulk of the file conversion took place. throughout the period, the blaise online information retrieval system continued to be operational : associated online facilities that would normally allow the despatch of marc records to catalogue files were suspended to avoid any non-aacr2 or nonconverted records inadvertently updating converted locas files. the production of com catalogues through locas was suspended for a single month, and the first issue of bnb for 1981 was not scheduled until early in february. the schedule for the conversion exercise was adhered to with no major slippage except in the case of our lc file conversion; this exercise 160 journal of library automation vol. 14/3 september 1981 stretched on into the spring for a variety of technical reasons largely concerned with the characteristics of the lc data. conclusions having been so closely involved in this project it is difficult to draw out general conclusions as yet. however, there are some already obvious benefits both for bsd and the wider library community: the rationalisation of our software for com/printed products will lead to easier maintenance and future upgrading; the introduction of the blaise filing rules across all our products is an improvement; the new lc conversion will make our lc files much more easily usable by the british library community; we have the basis of a u.k. name authority file for the first time. this was a vast and sophisticated conversion exercise and will result in u.k. marc files probably more uniform in structure than they have ever been. it forms an excellent basis for the continuation of bsd services, especially those based on utilising records across the whole time span, e. g., blaise information retrieval, selective record and cataloguing services. equally, because our conversion has been so extensive we have been able to share it: the specification, the name conversion file, and the converted u.k. and lc files were all available at minimal cost to libraries in the u.k. of course, it is not the 100 percent solutionit was never intended to beso of course if you look hard enough you will find inconsistencies. however, it has proved that very extensive automatic conversion is possible even with today's state of the art of computing and that bsd had led the way, indeed eased the path of transition to aacr2 for british libraries. references 1. british library, filing rules committee, blaise filing rules (bl , 1980). 2. british library, bibliographic services division, computer services department, "specification for retrospective conversion of the uk marc files 19501980" (unpublished with limited distribution). 3. 
british library, bibliographic services division, "specification for conversion of lc marc records to uk marc" (unpublished with limited distribution). lynne brindley is head of c ustomer services for the british library automated information service (blaise). lib-s-mocs-kmc364-20141005043703 68 book reviews indiana seminar on information networks (isin). proceedings. compiled by donald p. hammer and gary c. lelvis. west lafayette, indiana: purdue university libraries, 1972. 91 p. (available at no charge from the extension division, indiana state library, 140 north senate avenue, indianapolis, indiana 46204 as long as the supply lasts). the indiana seminar on information networks (october 26-28, 1971) was an attempt to introduce indiana librarians to the benefits (and presumably problems) of library networking. papers included in the proceedings are introduction to networks (maryann duggan), library of congress marc & recon (lucia j. rather), nelinet (ronald f. miller), an on-line interlibrary circulation and bibliographic searching demonstration (gary c. lelvis and donald p. hammer), ohio college library center (frederick g. kilgour), user response to the facts (facsimile tran.smission system) netu1ork (lynn r. hard), indiana twx network discussion (margaret d. egan & abbie d. h eitger), and how does the n etwork serve the researcher? (irwin h. pizer) . as with any collection of written papers or oral presentations, the quality is mixed. the papers are introductory in nature, the pizer article being the exception. the majority report "case studies" of particular automated operations and/ or networks (marc & recon, nelinet, oclc, facts) . the facts article is the most interesting of these "case studies" because it moves beyond simply reporting "how we done it good" into an evaluation of why the network did not succeed (the network did not meet a real and/or consciously recognized need of the libraries it was proposing to serve) and emphasizes the importance of careful planning. any wouldbe network planner should read this article; there are many lessons to be learned. although the collected papers have all of the disadvantages usually associated with a collection of oral presentations (material is loosely organized and lacks continuity, introductory and oversimplified, repetitive, and out of date), they are a valuable addition to the growing body of literature dealing with networks both from the idealized conceptual view and, perhaps more importantly, from the practical reality view of existing networks. kenneth j. bierman systems librarian virginia polytechnic institute computers and systems; an introduction for librarians, by john eyre and peter tonks. hamden, connecticut, linnet books (shoe string press), 1971. 127 p. $5. 75. isbn: 0-208-01073-4. at last an inexpensive introductory text specifically written for librarians and library students! not since n. s. m. cox's the computer and the library have we had such a short, easy to read, yet comprehensive, description of the essentials. complementing the text are twenty-nine figures illustrating everything from batch and real-time processors, disc drives, program process, and systems flowcharts to data elements, formats, and input procedures, marc ii records on magnetic tapes, and sample pages from a computer-produced author catalog. the text reads like a well-organized glossary, treats the subjects of library use of computers and systems analysis in a way at once simple and informative. 
the authors had tested the material with students in courses at the school of librarianship of the polytechnic of north london. thanks to the british-american cooperation surrounding marc efforts, this book will be as useful in our library school classes as it is in theirs. the index d eserves a special note because it was compiled after the style of precis developed by the british national bibliography. it is a facet analysis of the text featuring access to "activity:thing: type:aspect" in a prescribed permuted order. although there is not much emphasis in such a text on subject access or information retrieval, this is not entirely overlooked and this index serves as an excellent example of what could be done by computer. truly an excellent introduction to computers and systems analysis for librarians! a two-page bibliography contains suggestions for further reading on the topic or for an expanded reading of various applications of computers in libraries. pauline atherton school of library science syracuse university isis: integrated scientific information system; a general description of an ap· proach to computerised bibliographical control, by william schieber. geneva: international labour office, 1971. 115p. $1.50. this document is a well-written description of the computerized library system developed at the international labour office. planning and development for the system began in 1963. it has been implemented and is now in operation within the central library and documentation bmnch of the ilo. the isis bibliographic control system is a large file system for storing, processing, and retrieving bibliographic information. the ilo data base consists of some 45,000 records of books, periodical articles, and other documents. each record consists of conventional bibliographic data (with less detailed definitions than marc data, however) plus an abstract. in form, the abstract appears to be written in natural language, but all descriptor words used in the abstract are taken from a controlled vocabulary and, in fact, provide subject indexing. on-line terminals are used for ide searches. the search system allows searches by subject descriptors, language, and date of publication. sequential formulation of the search allows control of the number of responses to a desirable size. records are also indexed on various data fields, such book reviews 69 as author and title. display of records and browsing are handled on line, but printing of lists or bibliographies is handled through subsequent batch printing jobs. regularly scheduled outputs of the system include printed catalogs, indexes, and authority lists. two other systems have been developed at the ilo using some programs and files of the bibliographic control system. one is for controlling loans of library books, the other is for serials data and includes a subsystem for routing library periodicals. these three major systems are described in some detail in this report. a fourth section deals with system monitoring and control. costs are discussed here. the isis system is an interesting and unique one even though the system is geared primarily to a special library environment. it is evident that much careful thought and attention to detail went into the system design and development. the integrated use of programs and files as described here and the details of some design elements make this a useful document. the report itself is well done. describing a complex system for a varied audience is a difficult task. the author, william d. 
schieber, has put together an excellent example of a systems report document. charles t. payne systems development office university of chicago library title derivative indexing techniques: a comparative study, by hilda feinberg. metuchen, n.j.: the scarecrow press, 1973. x+297p.; index and bibliography. this book is primarily a survey of key word indexes, with some discussion of issues in indexing. the survey is quite good, but already out of date. the discussion is unfortunate. the survey covers a wide range of computer-based article title key word indexes, including extreme cases such as permuterm. sample pages are included for fifty-six indexes, and thirteen lists of excluded words ("stopwords") are· given. reproduction of samples is generally ex70 journal of library automation vol. 6/ 1 march 1973 cellent, and this portion is valuable in showing the virtues and defects of various approaches to key word indexing. since this survey, at least three major libraries have begun publication of key word indexes to serial titles, a type of index with different problems which is likely to be more common in the future. the discussion suffers from a lack of focus. there are no clear standards for key word indexes or the traditional tools they complement or replace, and studies of user preference and convenience have been limited and inconclusive. it is difficult to say what makes a key word index more or less workable, and this book seems to cloud the issues even more. ms. feinberg makes some questionable and unsupported assumptions about what users think, want, and need, and a number of recommendations which are at best only applicable to indexes of article titles in scientific fields. take three major recommendations: plural and singular forms should be interfiled, synonyms and similar words should be interfiled, and foreign titles should be translated. the university of california (berkeley) library found "college," "university," "company" and "papers" to be good exclusion words, while "colleges," "universities," "companies," and "paper" are good subject words. synonym control increases homonym problems, makes for longer (and thus more difficult to use) lists, and entails difficult decisions as to what con~titute true synonyms. translation raises the qm ..., tion of whether a user should be guided to a publication he may not be able to read. in sum, these and similar decisions should depend much more on the field of study and user population than on this type of general treatment. there are other problems reflecting deficiencies in the areas of technical background, understanding of typography, and appreciation of some reasons for key word indexing. ms. feinberg comes out strongly in favor of "title enrichment"-adding artificial titles to improve indexing. this, however, adds cost and time to the key word approach, and subtracts from its clear advantages. a large section is devoted to an experimental study of different indexing programs, with the result that different programs produce different in dexes. generally, the discussion detracts from the survey. finally, the title chosen seems unfortunate. "key word indexing" may not be an ideal term, but it is fairly well known; must we introduce yet another vague, polysyllabi<, phrase, "title derivative indexing"? walt crawford university of california berkeley accountability: systems planning in education. leon lessinger & associates. creta d. sabine, editor. homewood, ill.: etc publications, 1973. 242 pages. 
"accountability" has become a rallying cry in many educational circles of late: for the public in its demand for visible results for educational dollars, and for educators as they attempt to define and defend new programs. this well-sequenced collection of nine papers on this subject addresses the problem of accountability at all levels of the educational 'enterprise. first is a conceptualization of systemsplanning through an explanation of the systems approach, cost effectiveness, and cost analysis. next are specific methods of systems-planning at the classroom, community college, university, and state 0 ....... 0 1:: '"'t ~ ~ .q. t"-t .... ~ '"'t ~ ~ e0 ~ .,... .... 0 ~ < 0 ~ c.:> .......... ~ s: ill '"i n .?"' ~ (0 ~ 0 e~tus hconstruct •c? hatcr message; ____j \.__} \.__} l.__1 \.__} i """ no ~ ~ ~ co co -· ;:$ q"q .q.. add 1 to h construct ~ print h skip a skip i a:: a read a read > delete delete msg old new old new ~ counter ~iessage ht":ti ~~~ d~a d-a (j ~ ~ ~ co construct i ~ral .......... invalid to matcr msg 1-; trj !:lo ~ > z ill ::s 0.. to roe fig. 2 continued. c:: tr1 ,j:>.. ....... ~ ~ ...... c subtract i 'r·-··-~ 2048 from ...... length c -1:"'4 .... add 1 to c:.-neii·count ~ exit ) ~ ~ .... c ;:! ~ .... s· ;: < c :-c.:> subtract i ......... ~ subtract i v \ r~d i 1 2048 from i 2048 from length ~ length cj no >; ~ no i . \ .?" exit ~ (!) exit ~ fig. 2 continued. add 1 to new-count fig. 2 continued. subtract 2048 i froh length ~ i exit ) ud-ne\1 hove hivalues to old-compare ~ area -----' hove hi· values to h nell-compare area ~~ j exit '"i; ~ ~ <.':) ~ ~0 ~ ::x:l (") ~ ~ ) <.':) ex it ~ b:j 1-c tr1 !:d ~ > z § 0.. b:j t-t c tr1 ..... (,:) 00886nam 2200205 0010013000000080041000130500021000540820018000751110093000932450119001862600 ~7ft03053000033003425000089c037550400290046465000320049365000240052570000460054970000460059571000400 0641& 67026007 &690324s1968 moua · b 10100 engo &0 sarc847sb.a67 1966& $4616.3/62/0755&20 saapplied seminar 0~ the laboratory diagnosis of liver diseases,scwashington, d.c.,$01966.&1 $alabor atory diagnosis of liver oiseases.$ccompiled and edited by f. william suno f. r~an and f. william sunde rman, jr.&o sast. louis,sbw. h. greensc*c1968*& saxl[[, 542 p.sbillus.sc27 cm.& sahelo under the a uspicf.s of the association of clinical scientists, nov. 10•13, 1966.& $al~cludes bibliographies.&oo saliversxdiseasessxdiagnosis.&oo$ameoicine, clinical.&10sasunderman, frederick williamtsd1898•seeo.& 10~asunderman, frederick william,sd1931•seed.&20saassociation of clinical scientists.* 00778nam 2200169 0010013000000080041000130500019000540820010000731000017000832450295001202600 04600415300002600461500003800487500002800525652002800553740002800581& 6702a617 &690324r19681846mdu c c 00000 f~go &0 saf93sb.h65 1968& sa929.3&10sahinman, royal ralph,$01785•1868.&1 saa catal ogtw of the names of the first puritan settlers of the colony of connecticut,sbwith the time of thei r a~rival in the colony, and their standing in society, together with thfir place of residence, as f ar as can be discovered by the records.sccollected from the state and town records.&o sabaltimore,sb genealogical pub. 
co.,sc1968.& sa336 p.sbport.sc23 cm.& $aon spine* first puritan settlers.& sare print of the 1646 ed.&oosaconnecticutsxgenealogy.&olsafirst puritan settlers.* 00896nam 2200193 00100130000000800410u0130500017000540820010000711000021000812450128001022600 0500023030000320020049c005800312500013300370504003100503650003100534710006100565810007700626& 6703 0030 &690324s1968 nyua b 00010 engo &0 sara395.a3sbu4& sa362.1£10saullmann, john e.t1 sat he application of management science to thf evaluation and design of regional hf.alth services,scedit ed by john e. ullmann.&o sa*hempsteao, n.y.,sbhofstra university*sc1968.& saiii, 346 p.sbillus.$c28 cm.&1 sahofstra university yearbook of business, ser. 5 0 v. 2& 'a**this* ~fport results from the c ontinulng series of m.b.a. seminars conducted by the school of business of hofsfra university.*& sa bibliographical footnotes.&oosacommunitv health services.&20 sahofstra university, hempstead, n.y.sbs chool of business.&2 sahofstra university, hempstead, n.y.styearbook of buslnfss,svser. 5, v. 2* 00844nam 2200217 00100130000000a0041000130410011000540500018000650 8 20014000831000027000972450 0940012426000580021830000490027635000100032549000730033550400810040865000260048965000330051584000270 054884000s200575& 67031114 &690328s1968 njua 8 00100 engo &1 shengfrf.&o san7b32sb.g6613& sa704.948/2&10sagrabar, anor=e,$dl896•&1 sachrlstian iconography*sba study of its origins.sc*trans lateo from french by terry grabar.&o saprinceton, n.j.*sbprinceton univepsity presssc*c196r*& sal, 174, *203* p.sbillus. ipart col.)sc27 cm.& sa15.00&1 sabollingen series, ~s. the a. w. mellon lectu res in the fine arts, 10& sabibliography* p. 149•158 12d group) *illustrations** p. *1*•*203* (30 g roupl&oosaartt early christian.&oosachr.istian art and symbolism.& sabolltngfn seriesrsv35.& sathe a. w. mellon lectures in the fine artsrlv10* fig. 3. print record program output. ..,. ..,. i ....... -q.. t"i & ~ ~ ..... c ~ ..... c;· ;:s ~ !"""' (;:) ........... ...... a:: ~ '"i pi-' cd c3 processing of marc tapes j bierman and blue 45 drop and transfer records program this is a utility program that enables any number of lc card numbers to be entered on cards, with the option in each case of dropping the record entirely or transferring it to another tape for future action. it has proven useful for removing out-of-sequence records, purging files, etc. inputs are two in number: 1) any tape in marc code and format (sequence is not checked) ; and 2) detail cards, each of which contains a 12-position lc card number and a code indicating if this marc record is to be dropped or transferred to another tape. these cards must be in sequence. there are three outputs: 1) an updated tape containing all marc records on which no action was taken; 2) transferred tape containing, in sequence, all records transferred; and 3) a listing showing the lc number and the action taken, which is useful for verification of results. print record program this program prints in readable form any tape in marc code and format. the translation table, which produces a form of upper-case ebcdic, is the same as that used for other department of libraries programs. it is a character-for-character translation, which, for the present, is useful for many and varied applications. input is any tape in marc code and format. output is an upper-case ebcdic translation of the tape. figure 3 shows a sample output. 
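the translation just described (and elaborated below for figure 4) is a straight character-for-character mapping with fallback characters for whatever the print chain lacks. the following is a minimal, hypothetical sketch of that idea in python rather than the original cobol; the printable set and the diacritic list are illustrative stand-ins, not the actual odl translation table.

```python
def translate_for_printer(text,
                          printable=set(" $.,()/:;'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
                          diacritics=set("~^`")):
    """Character-for-character fallback translation for a limited print chain.

    Rules follow the in-house table described in the text: lower-case letters
    print as their upper-case equivalents, diacriticals and foreign-language
    symbols print as "=", and rarely used punctuation not on the printer
    prints as a bullet.  The character sets here are assumptions for the
    example, not the real 1404 print chain.
    """
    out = []
    for ch in text:
        ch = ch.upper()                      # fold lower case to upper case
        if ch in printable:
            out.append(ch)
        elif ch in diacritics or ord(ch) > 127:
            out.append("=")                  # diacriticals, foreign symbols
        else:
            out.append("•")                  # punctuation the printer lacks
    return "".join(out)
```

the production table mentioned in the text differs only in mapping missing punctuation to its closest printable equivalent or to blank instead of to a bullet.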
figure 4 shows how the oklahoma department of libraries is handling the marc expanded character set with a small printer (ibm 1404, 48 characters). simply stated, the problem is that there are many more characters coded in the marc ascii character set than are available on the particular printer that the department of libraries is using. (this is a local limitation of the printer that happens to be available; it is not a limitation of computer technology, as printers with expanded character sets are readily available.) in general, rarely used punctuation and special punctuation marks not in the printer's character set print as an "•", the lower-case letters print as their upper-case equivalents, and diacriticals and foreign language symbols print as "=". this translation table is used for in-house lists (for checking purposes, etc.). for production purposes, a slightly different translation table is used. characters, particularly punctuation marks, not available on the printer are translated to their closest equivalent or left blank, whichever is more appropriate. at the oklahoma department of libraries, all translations at this time are internal and do not affect the marc tapes, which are being left in the original ascii code. it seemed unreasonable to centrally translate the tapes to ebcdic until agreement among all the users could be reached as to a mutually useful translation table. there is a good possibility that in the near future the information and management services division will make available an off-line printer with an expanded character set (upper- and lower-case letters, additional punctuation, etc.). if this does happen, then print-outs in an expanded character set would be economically possible.

fig. 4. conversion table.

fig. 5. library of congress number listing.

fig. 6. print card numbers program detail flowchart.

retrieval sub-system

withdrawing records program

this program withdraws records selected by lc card number and copies the complete marc ii records onto another tape. a library sends the department of libraries a magnetic tape containing the lc card numbers for the records it wants copied from the data base. the data base is searched and the requesting library is sent back three tapes and three hard copies. the tapes are: 1) the original finder tape, 2) an item tape containing the records which matched, and 3) a tape containing the lc card numbers of the records which did not match.
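the match itself is a classic two-file sequential comparison: the finder records and the marc master are both sorted on the twelve-position lc card number, so a single pass over each file is enough to split the finders into matched and unmatched groups. the original programs were written in cobol; the sketch below is only a hypothetical illustration of that matching logic in python, and none of the names are taken from the odl source.

```python
def match_finders(master_records, finder_numbers):
    """Merge-match two sequences sorted on the 12-position LC card number.

    master_records: iterable of (lc_number, marc_record) pairs, ascending.
    finder_numbers: iterable of LC card number strings, ascending.
    Returns (matched, unmatched) - the records for the item tape and the
    finder numbers for the unmatched-finders tape described in the text.
    """
    matched, unmatched = [], []
    master_iter = iter(master_records)
    current = next(master_iter, None)
    for finder in finder_numbers:
        # advance the master file until its key reaches or passes the finder key
        while current is not None and current[0] < finder:
            current = next(master_iter, None)
        if current is not None and current[0] == finder:
            matched.append(current)      # copied to the item tape
        else:
            unmatched.append(finder)     # listed for resubmission next cycle
    return matched, unmatched
```

a finder number that matches contributes one record to the item tape; one that does not is carried on the unmatched-finders tape so that it can simply be merged into the next cycle's finder records, as described later in the article.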
the three hard copies are: 1) a list in lc card number order of the records which matched, containing on the first line information from the finder tape and on the second line information from the marc tape; 2) a listing of the card numbers and other information on the finder tape which did not match any card number in the data base; 3) a listing of card numbers and other information on the finder tape that were invalid.

there are three inputs to the system, the first being a marc master, which is the latest merged master at the department of libraries; its records are in the original code and format. the second consists of finder records, which come from the individual library. input is originally on card in the format specified in table 1, then put on tape, blocked 5, and sorted (no tape labels are used at this time) on all 12 positions of the lc number. the tapes are unlabeled upper-case ebcdic, 1600 bpi. the third is a card that enters the appropriate date and library code into the system.

table 1. original card input format to odl-05

card columns    field contents and special instructions
1               local library code (assigned by dept. of libraries)
2-4             lc card number prefix (upper case alpha or blank)
5-12            lc card number (numeric)
13              lc card supplement indicator (may be blank)
14-28           local use (may be blank)
29-48           local use or first 20 positions of author (may be blank)
49-76           local use or first 28 positions of title (may be blank)
77-80           local use or publication date (may be blank)

the system gives the following five outputs:

1) matched records, a listing of records that matched and were transferred to the individual library's item tape. this listing shows all information from the finder record, and immediately below, the following information from the marc record: lc card number, the first 20 characters of the author, the first 28 characters of the title, and the publication date. information pulled is as follows: author (first tag beginning with 1), which will usually be 100 or 110; title, which will always be 245; and date, which will be the 7-10 positions under tag 008. figure 7 shows a sample of output. the first line is data from the finder tape and the second line data from the marc master tape.

fig. 7. matched records listing.

2) items tape, containing all records requested from the master tape. they are in marc format and code, and the number of logical records should match the matched record count.

3) unmatched finders listing, showing all valid finder records that did not match the marc master tape. figure 8 shows sample output.

4) unmatched finders tape, containing all valid finder records that did not match the marc master tape.

fig. 8. unmatched records listing.

fig. 9. errors listing.

5) errors listing, showing all 80 columns of invalid finder records and the appropriate error message. finder records are invalid if one of the following errors occurs: 1) blank or invalid library code; 2) prefix contains any character other than a blank or upper-case alpha; 3) lc card number not pure numeric. invalid finder records are not processed but are placed on an error listing. figure 9 shows sample output. no edits will be made on columns 14-80, which are for local use entirely; all data from these fields will be transmitted to printed listings for any desired local use or for verification. record counts are included at numerous points to facilitate accurate record control.
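the three edits above are simple column checks on the 80-column finder card, using the field positions of table 1. the following is a small, hypothetical sketch of those edits in python (the production program was cobol); the error messages are paraphrased from the sample listing, and the library-code edit is reduced to a blank test because the table of assigned codes is not reproduced here.

```python
def edit_finder_card(card):
    """Apply the three finder-card edits described in the text.

    card: an 80-column card image.  Columns 14-80 are local use and are not
    edited.  Returns a list of error messages; an empty list means valid.
    Column positions follow table 1 (column 1 is index 0).
    """
    card = card.ljust(80)
    library_code = card[0]       # column 1: local library code
    prefix = card[1:4]           # columns 2-4: lc card number prefix
    lc_number = card[4:12]       # columns 5-12: lc card number

    errors = []
    # rule 1: blank or invalid library code (the real edit also checks the
    # code against the list of assigned codes, which is not given here)
    if not library_code.strip():
        errors.append("invalid library code")
    # rule 2: prefix may contain only blanks or (upper-case) alphabetics
    if not all(c == " " or c.isalpha() for c in prefix):
        errors.append("invalid lc prefix")
    # rule 3: the lc card number itself must be pure numeric
    if not lc_number.isdigit():
        errors.append("invalid lc number")
    return errors
```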
for the purposes of this particular program, counts should check as follows: matched records + errors + unmatched records - generated errors = original count. matched records appear at the end of the listing of the same name, errors appear at the end of the listing of the same name, and unmatched records appear at the end of the listing of the same name. generated errors appear at the end of the matched records listing. a generated error indicates more than one error in a single card, and this count is included only for control purposes. the original count is expected to be maintained by the submitting library for maximum accuracy. these counts are checked immediately, and any discrepancies cleared up as soon as possible. figure 10 gives the overall view of the program and figure 11 a detailed flowchart.

the odl-05 program was written to provide the greatest flexibility possible to the user libraries. the only information absolutely required for the finder tape is the local library code and the complete lc card number. however, the remaining 67 card columns are available to the local library for any use it may wish to make of them. if the local library would like a quick method of sight checking to make sure that the records copied were the records wanted, it can keypunch the first twenty characters of the author in columns 29-48, the first 28 characters of the title in columns 49-76, and the date of publication in columns 77-80. if this is done, the matched records listing will contain the author, title, and date from the finder tape, immediately followed underneath in the same position on the page by the corresponding information from the marc record. figure 7 shows sample output. thus, the library can quickly sight check what it thought it was getting at the time of request against what it actually got from the marc record. of course, the local library is free to put no information, or other information, in columns 29-80; the operation of the system will not be affected, and whatever information is included in columns 29-80 will appear on the three output listings (matched records, unmatched finders, and errors).

fig. 10. withdrawing records program system flowchart.

fig. 11. withdrawing records program detail flowchart.

another convenience for the local library is that it has to do no original programming to use the system. all that is needed are standard sort, merge, and card-to-tape programs. any of the programs written by the department of libraries is available to users on demand. they may find the merge or lc card number print programs useful.
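the record-control balance stated at the start of this section is easy to express as a single check. the fragment below is a hypothetical illustration in python, with invented sample figures; it is not part of the odl programs, which handled the counts on the printed listings themselves.

```python
def counts_balance(matched, errors, unmatched, generated_errors, original_count):
    """Verify the record-control equation from the text:
    matched records + errors + unmatched records - generated errors = original count.

    A generated error marks a second (or later) error found on a card that is
    already counted once among the errors, which is why it is subtracted out.
    """
    return matched + errors + unmatched - generated_errors == original_count

# invented example: 15 matched, 14 errors (2 of them generated), 9 unmatched,
# against an original deck of 36 finder cards
assert counts_balance(15, 14, 9, 2, 36)
```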
another consideration for the user is the ease with which invalid finder records and unmatched finder records can be resubmitted into the system. to correct finder records in error, the library simply repunches cards from the error listing, with necessary corrections, and resubmits them in the next cycle with new cards. unmatched finder records can be merged with any new finder records in the next cycle and resubmitted, no repunching being necessary. 60 journal of library automation vol. 3/ 1 march, 1970 what is presently being done the variety of applications for marc presently being worked on in oklahoma libraries is most interesting. central state college, edmond, oklahoma, is currently subscribing to the weekly marc tapes and producing an index of available materials which cumulates for two months and then drops off the older entries. the library is receiving its own subscription to the marc tapes for this purpose but does not plan to maintain a complete file of marc records. the tulsa city-county library system, tulsa, oklahoma, is currently using marc records from the state data base for bibliographic information for its machine produced book catalog. it originally had a subscription to the marc tape service, but with the operation of the state-wide data base, is dropping it. the university of oklahoma, oklahoma state university, and oklahoma county libraries have no immediate plans for utilization of the marc records as distributed by the library of congress; however, when they do move in this area it will probably be for use in their technical processing departments and the state marc data base will form a basis for their use. computer and language used the computer being used for the department of libraries marc program is an ibm 360/ 30 located in the state budget bureau but under the administrative control and operation of the information and management services division of the board of affairs (the centralized state computer center for the capitol complex) . the computer has 32k core size, one on-line card read/ punch, model 2540, four magnetic tape drives, model 2415, two magnetic disk drives, model 2311, and one on-line printer model 1404. the programs are written in cobol for the 360/ 30, operating under dos, with a cobol compiler. very little modification would be required to operate under os. the merge program ( odl-01) requires three tape drives. the withdrawing program ( odl-05) requires four tape drives but could be modified to operate with only three tape drives. in agreement with henriette avram and julius droz ( 4), the department of libraries has found that cobol can easily be used to process marc records. the information and management services division has assigned a programmer to the department of libraries who has done, and will do, all the marc programming. she is actually employed by the imsd and the department of libraries contracts with them for her services. presently, the department is being charged about $7.00 an hour for programming time. the planning, system design, actual programming, and production are all closely supervised by the data processing coordinator of the library, and he is on the department of libraries' staff. processing of marc tapes/ bierman and blue 61 the relationship between the imsd and the odl has been extremely beneficial for the library. thus far, the centralized computer center has provided fast and excellent service at a minimum cost. 
having a fulltime data processing coordinator on the staff of the library has negated the communication barrier which so often exists between a computer service center and a user library.

cost

cost figures for use of marc are very difficult to find. few of the marc i participants (3) give anything but a fleeting reference to cost. the reason is clear: cost figures are difficult to determine and even more difficult to evaluate meaningfully. table 2 is a breakdown of the charges to the department of libraries for programming and machine time; it does not include department of libraries' staff time or overhead costs. the figures are accurate through the end of february 1970.

table 2. costs

system design                                                    $1,102.00
programming                                                       2,467.00
machine cost for program testing and debugging, and machine
and operator cost for merging through 2/28/70                     2,026.00
total                                                            $5,595.00

for the first year, the department of libraries is absorbing all the costs of merging and maintaining the marc master file, as well as the costs of all programming, as a form of state aid to libraries. the machine costs of comparing a finder tape with the master file, copying the desired records, and printing the various hard-copy lists are being absorbed by the user library. the user also supplies the two blank tapes which are needed for each run. the machine time costs are based on the rate of $80.00 an hour of cpu time.

plans for the future of the state-wide marc master file

two major problems are apparent in the system as it is now set up. the system was initially created as a sequential tape system because this was the easiest and quickest way to establish a working system, and because it was felt that this would be practical for at least the first year of operation. one problem is that the sequential file will become expensive to maintain and does not allow direct access to a particular record without a sequential search. another problem is that the present system allows entry into the file only by lc card number and does not allow entry directly with bibliographic information. in accordance with present plans, in march 1970 work will begin on converting the storage medium from tape to a direct access device (disk or data cell) as the recon study suggests (5). at that time the file will cease to be maintained in lc card number order and will be maintained in the order in which the records are received from the library of congress. various indices to the marc data base will be produced; author and title indices will enable the data base to be searched by bibliographic information when the lc card number is not known. in this way, only the indices (which would be comparatively much smaller), and not the complete data base, would have to be merged and searched. in terms of the data base itself, this will be the next major change. in the long run, it will be desirable for libraries that want access to the marc data base to have such access directly via terminals. at the present time, the cost of this kind of access is not worth the increased speed of access, nor is the money presently available; however, in the future, the cost of such a system will surely be reduced by technological improvements and the increased importance of instantaneous access to the data base. when need balances with cost, such a set-up will be feasible.
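the planned author and title indices amount to small inverted files keyed back to the lc card number, so that only the index, rather than the full data base, has to be merged and searched. the sketch below is a hypothetical modern rendering of that idea in python; the choice of fields (author from the first 1xx tag, title from tag 245) follows the description of the matched records listing earlier in the article, and nothing here is taken from the odl programs themselves.

```python
def build_indices(records):
    """Build author and title indices that point back to LC card numbers.

    records: iterable of dicts with keys "lc_number", "author", "title"
    (author from the first 1xx field, title from field 245).  Returns two
    dicts mapping a normalized key to the LC card numbers that carry it.
    """
    author_index, title_index = {}, {}
    for rec in records:
        author_key = rec["author"].strip().lower()
        title_key = rec["title"].strip().lower()
        author_index.setdefault(author_key, []).append(rec["lc_number"])
        title_index.setdefault(title_key, []).append(rec["lc_number"])
    return author_index, title_index
```

a request arriving without an lc card number could then be resolved against the much smaller index, and only the matching records pulled from the direct access file.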
the geographical expansion of the system is a possibility. economically, this is most desirable, because the more ways the cost of maintaining the data base is split, the cheaper it is for all involved. some preliminary investigation along these lines with bordering states is being made and hopefully at some time in the future there will be a regional data base which many libraries can use. plans for future cooperative use of marc the cooperative use of marc thus far in oklahoma only affects the larger libraries which have access to computers and automation personnel. essentially, each library is autonomous and is free to use marc in any manner it wishes. it will remain true in oklahoma that individual libraries will always be free to use the data base to retrieve part or all of the data base for any purpose. however, plans are under way for more cooperative use of marc with libraries that do not have automation capabilities that would result in useful hard copy products for such libraries. two such cooperative plans have been proposed for immediate implementation. the first of these is a current awareness service. selected subjects would be compared against the data base on a bi-weekly (or other period) basis and complete bibliographic information for books representing the selected subjects would be printed as a personalized current awareness service. for example, all law titles on the marc tapes for two weeks could be pulled and listed, and the listing distributed to the county and state law offices, attorney firms, the law school library, etc., for selection and order purposes. the same could be done with library processing of marc tapes/ bierman and blue 63 science or any other subject. subject lists of interest to various agencies of state government could be produced and sent to them. another possibility is a profile of a legislative session by subject and then weekly or monthly lists of current materials available on these subjects for ordering by the department of libraries and possible lists to be made available to the legislative members. there are many possible uses for such a system which could be done fairly inexpensively. work began on this project in october 1969, and the service became operational on a cost basis in february 1970. a second possibility is catalog card and processing aids production. this would probably be done as a pilot project with several libraries throughout the state and then, if successful, expanded to any library in the state wanting to use the service. catalog card sets with subject headings printed at the top, and call numbers printed if the library accepts lc or lc dewey classification (there would be several options available within the system), spine labels, and book and circulation card labels would be provided. a by-product of such a state-wide operation would be the maintenance of book location information in machine readable form in a central place for future use as a basis for a machine readable state-wide union catalog. a project not in the immediate future but certainly being considered is that of cooperative retrospective conversion. that is, several libraries in the state would like to have bibliographic information in marc format for all books in their collections. whether the department of libraries would go ahead with such an ambitious project or wait for it to be done nationally ( recon study) would depend on timeliness on the national scene, need on the local scene, and available financial resources. 
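the current awareness service proposed above is essentially a subject-profile match run against each period's incoming marc records. the sketch below is a hypothetical python illustration of that selection step; the profile names and record fields are invented for the example and do not describe how the odl service was actually programmed.

```python
def current_awareness(new_records, profiles):
    """Select each period's new records for each subject profile.

    new_records: iterable of dicts with keys "lc_number", "title", "subjects".
    profiles: dict mapping a recipient (e.g. "state law offices") to a set of
    subject terms of interest.  Returns a dict of recipient -> matching
    records, ready to be printed as a personalized current awareness list.
    """
    selections = {name: [] for name in profiles}
    for rec in new_records:
        rec_subjects = {s.lower() for s in rec["subjects"]}
        for name, wanted in profiles.items():
            if rec_subjects & {w.lower() for w in wanted}:
                selections[name].append(rec)
    return selections
```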
eventually, oklahoma would like to have in machine readable form a complete union catalog of the entire library resources of the state that could be used for cooperative acquisitions programs, for strengthening subjects which are weak within the state, and as a location tool for interlibrary loan. such a data base would later be used also for reference functions. needless to say, such an ambitious project as this is not in the immediate future.

conclusion

early in the game, oklahoma libraries learned that the most economical means to library automation was cooperative automation. the creation of a state-wide marc data base is an important step toward cooperative library automation, while still allowing each local library to maintain its individuality for uses of the data. many areas of cooperation still remain untouched. the future success of library automation in oklahoma lies in the imaginative and creative projects that could be designed and implemented cooperatively to the mutual cost savings and benefit of all.

programs

copies of the programs mentioned in this paper may be obtained from the national auxiliary publications service of asis as follows:
1) "a program to merge all marc ii tapes received from the library of congress onto a single tape" (naps 00815);
2) "a program to drop given records or to transfer them to a separate tape" (naps 00816);
3) "a program to print marc tapes in readable form" (naps 00817);
4) "a program to pull selected records from the marc master tape for a single library" (naps 00818); and
5) "a program to print a listing of all library of congress card numbers on a given marc tape" (naps 00819).

references

1. nugent, william r.: "nelinet: the new england library information network." a paper presented at the international federation for information processing, ifip congress 68, edinburgh, scotland, august 6, 1968. (cambridge, mass.: inforonics, inc., 1968), 4 pp.
2. pulsifer, josephine s.: "washington state library." in avram, henriette d.: the marc pilot project; final report on a project sponsored by the council on library resources, inc. (washington: library of congress, 1968), pp. 149-165.
3. avram, henriette d.: the marc pilot project; final report on a project sponsored by the council on library resources, inc. (washington: library of congress, 1968), pp. 89-183.
4. avram, henriette d.; droz, julius r.: "marc ii and cobol," journal of library automation, 1 (december 1968), 261-72.
5. recon working task force: conversion of retrospective catalog records to machine-readable form; a study of the feasibility of a national bibliographic service. (washington, d.c.: library of congress, 1969).

reports and working papers

cable library survey results

public service satellite consortium: washington, d.c.

the following paper was distributed to pssc members in may 1981, and is reproduced here to bring it to the attention of a wider audience.

background

the public service satellite consortium (pssc) conducted a survey of academic libraries in july 1980 to study their data communications needs and services.
results of that study, coupled with library interest generated by that study, convinced pssc that: (1) libraries have a wide variety of communications needs which could be addressed with appropriate uses of telecommunications; (2) all types of libraries are affected, not just academic libraries; and (3) data transfer was but one of many types of library services in need of better communications. this information motivated pssc to take a broader look at library communications. that second look resulted in the identification of the "cable library" (catvlib) phenomenon and video library services. in december 1980, pssc launched a second survey directed to cable libraries; that is, libraries of all types which are connected to local cable companies. this study was aimed at determining to what extent, if any, a national satellite cable library network might be already in technical existence. how many libraries are presently connected to cooperative cable companies with satellite hardware and excess satellite receiver capacity? and of that number, how many cable libraries would be interested in participating in satellite-assisted library services and video-teleconferences? to answer these questions, pssc mailed questionnaires to 101 libraries that had been identified as potential cable libraries. in order to allow the participation of unidentified cable libraries, pssc also advertised the survey in various library periodicals, including american libraries, cable-libraries, and lola. that ad resulted in an additional 97 cable libraries requesting to participate in the survey, raising the total number of libraries receiving the questionnaire to 198. as of april 1981, 86 libraries have responded, yielding a 43 % return. follow-up phone calls have indicated that more surveys are forthcoming, or that the questionnaire proved to be irrelevant to present library conditions. in some cases, copies of the survey were requested and distributed for informational purposes only. the survey instrument the questionnaire incorporated explanations of terminology and was eight pages long. additional enclosures furnished more specific information about pssc and videoteleconferencing. the respondent was not only questioned about his/her library facilities, but also was asked to interview thecable company for necessary technical information. though contributing to slower returns, this two-tiered approach did succeed in establishing contact between the library and the cable company, as well as provide all the data required to profile each library as a potential network participant. survey participants since a national network is being pursued, an attempt was made to reach as many of the states as possible. thirty-seven states received copies of the survey, while thirty-one had at least one responding library. all types of libraries were surveyed. those surveyed included elementary school libraries, high school libraries, vocational school libraries, academic libraries, public libraries, regional library networks, state libraries, library systems, special libraries, and libraries that also double as their local community access center for cable television. of the 86 who responded, 63 were public, 18 were academic, 4 were school, and one was a special library. responding libraries have been categorized according to their ability to be an active member of the network: uf usable facility-those libraries that have met all the technical requirements for network participation. 
the library must be currently connected to an operational cable system which has a satellite receiving station and excess receiver capacity. in addition, the cable system and the library must have indicated an interest in participating in and hosting occasional satellite-transmitted events.

nxc no excess ro capacity - libraries that meet all technical cable connectivity requirements, but whose cable system cannot presently accommodate any more activity on its satellite receiver(s), are grouped here. should time become available in the future, these libraries are then technically able to advance to the usable facility group.

nro no catv ro - here are placed those libraries that are connected to an operational cable system. however, the cable system has no satellite receiving station and, therefore, no satellite access. in order to become a usable facility, these cable systems must install a satellite receiving station and be able to offer excess receiver capacity.

ncc no catv connection - while a cable system with all the satellite hardware requirements may be operating in the library's area, these libraries are not connected to the cable system. reasons given in the survey are varied, including logistics, economics, and disinterest. depending upon the technical status of the cable system, a simple link may be all that is needed for the library to become a usable facility.

nca no catv in area - libraries in this group are located in areas that presently have no operational cable system. some areas are now in the franchising process, some have awarded franchises but are not operational, and others have no idea if and when cable service will come to their areas. libraries here have the advantage of knowing what requirements are necessary for network participation and can use this information when franchising negotiations begin.

ni no interest - here are grouped those libraries that are at various stages of technical capability, but have no desire to participate in a national satellite cable library network.

table 1 illustrates responses according to geographical location. (numbers refer to the quantity of libraries from each state that fit into the above defined categories.) exactly half of these respondents are usable facilities. the largest hindrance to network participation is lack of connectivity between the library and the cable system.

table 1. survey responses by state and category. column totals: 86 responding libraries; uf 43, nxc 7, nro 9, ncc 14, nca 8, ni 5.

library/cable connectivity

part one of this survey established the degree of connectivity between libraries and their local cable companies. pssc's major concern was to find libraries wired to at least receive cable programming. pssc also discovered that the highest percentage of libraries had two-way connection, usually for the purpose of cablecasting. connectivity among the 86 respondents was broken down as follows (all percentages have been rounded off):

33 (39%)  two-way interconnection (transmit and receive video)
29 (34%)  one-way catv drop (receive only - regular subscriber)
14 (16%)  no catv connection
9 (10%)   no catv in my area, or none presently operational in my area
1 (1%)    no answer to question

other questions in this section profiled the technical capabilities of the cable system. specific hours of each day of the week a satellite receiver was available for occasional use were charted.
weekday mornings proved to be the most available time block. it is also imperative for pssc to know which transponders (channels) of the satellite the cable systems can access. there are twenty-four transponders on satcom i, the main satellite used by cable. when pssc coordinates a satellite telecast, time on a satellite transponder must be secured. each transponder is leased to someone, such as home box office (hbo), ted turner's cable news network, or the appalachian community service network (acsn), to name a few, for the carriage of their programming. time needed by pssc for a two-hour satellite event, for example, can be sublet from a transponder lessee, subject to availability. however, finding time slots on satcom i transponders is becoming increasingly difficult as many lessees are expanding the number of hours of their own programming. as a result, pssc must know which transponders each cable system can receive so that an attempt can be made, where possible, to accommodate the majority of survey facilities. the ideal situation is for catvs to own "frequency agile" satellite receivers; that is, receivers that can access any of the transponders. some receivers can get only even-numbered transponders or odd-numbered transponders; others can access only certain individual transponders. transponder accessibility is usually related to the type of programming the cable operator offers or plans to offer to the local cable subscribers, or to the age of the system. (older systems often use twelve-channel receivers, tunable to only even- or odd-numbered transponders on satcom i.) for example, if a cable operator does not anticipate offering anything besides hbo now or in the future from satcom i, often he/she cannot justify the need for a frequency agile receiver. table 2 outlines transponder accessibility for usable facilities only.

table 2.

transponder #     # of facilities able to access
1                 2
2                 2
3                 1
4                 1
5                 1
6                 4
7                 3
8                 3
9                 6
10                3
11                0
12                2
13                1
14                3
15                0
16                3
17                1
18                1
19                0
20                2
21                3
22                4
23                0
24                5
frequency agile   30
not sure          4

note: these figures are for transponder accessibility on satcom i. numbers for the specific transponders were tabulated from those surveys that indicated their satellite receivers were not frequency agile, but rather could access only those transponders they had listed.

this abundance of frequency agile receivers will provide the connected libraries with a greater amount of flexibility in receiving programming, since their participation will not be dependent upon a certain transponder. another question probed the availability of provisions for closed-circuit, discrete delivery of satellite transmissions from the cable system's receiver into the library. being able to provide closed-circuit capabilities would ensure the privacy of a satellite telecast. some pssc clients insist that their transmissions be safe-guarded through closed-circuit delivery. as expected, closed-circuit arrangement does not exist between very many libraries and their catvs. unless part of an institutional cable loop, most libraries cannot presently be singled out for closed-circuit cable reception. under normal conditions, what is transmitted from the head end of the cable system travels to everyone subscribing to the cable service. eleven of the forty-three usable facilities claimed that closed-circuit capabilities are currently available.
those thirty-two without described what technical considerations must be present before such provisions could be offered. these technical requirements included scrambling devices, mid-band channel usage, modulators and demodulators. such upgrading of the cable company's hardware was quoted as costing from hundreds to several thousands of dollars. no catv indicated willingness to assume the expenses for such special capabilities, but a few did offer to investigate the possibility of temporary special links on a per-occasion basis.

library facilities

the survey also asked about the library's facilities. information in part two centered on library accommodations and equipment. answers here provided a description of each library, which gave pssc an idea of how adaptable to hosting satellite teleconferences each might be. a basic satellite program viewing facility consists of the viewing area, equipped with chairs and tables, at least one television monitor (wired to receive the cable programming), and, for interactive programs, a telephone. survey libraries reported they had conference rooms, auditoriums, and classrooms available for viewing satellite telecasts. the number of viewers able to be accommodated at one time ranged from 6 to 400, with the average facility holding 75 people. some libraries could provide simultaneous viewing in more than one room, which increased the total number of people they could accommodate for a single event. a majority of the libraries had more than one monitor; some as many as fifteen monitors. three libraries indicated they owned a large-screen television projector. forty-four percent of the usable facilities have no phones in the viewing rooms, but many explained that phones were either nearby or could be temporarily installed for an interactive event. in response to a question about the location and accessibility of the library within its community, the general comments described the majority of the libraries as being in a convenient part of town, with ample parking and barrier-free design. when given enough advance notice, most libraries were willing to schedule an event at any time, even during hours and on days the library was normally closed to the public. traditionally, as a part of its standard networking service, pssc rents viewing facilities for the client, whether they are public television stations, hotels, or other facilities. libraries, as another type of viewing resource, would be entitled to receive payment for use of their facilities. obviously, this fact treads on controversial "fee or free" waters. being aware of this, pssc asked the libraries whether they could accept money for these purposes; and, if not, whether they might have some other mechanism, such as a "friends of the library" group, to which the money could be given instead. those libraries that said they could accept money directly for the use of their facilities numbered thirty-four.
oddly enough, thirtyfour libraries also said they could not accept money directly for the use of their facilities. of that group, thirty-one indicated they did have a "friends of the library" or similar group to which money could be given for indirect channeling back into the library. eighteen libraries did not answer this question (many due to libraries not completing the entire survey once they felt the cable information made them technically ineligible for participation). only three libraries might have a problem with financial arrangements for an event. program interests the final section of the survey (part three) gave each respondent the opportunity to list topics of interest to the library and community that could be presented via a satellite video-teleconference. general comments identified continuing education, organizational conferences, training, seminars, workshops, media distribution, and information dissemination as major activities suitable for satellite-assisted delivery and distribution. special target audiences included the following: 1. senior citizens 2. handicapped 3. minorities 4. the disadvantaged (economically, educationally, socially) 5. the abused (drug addicts and alcoholics; abused children and spouses, teachers and students; victims of crime; and the sexually harrassed) 6. the institutionalized (in hospitals, prisons, nursing homes, mental health centers, hospices) these special patrons are often served through outreach programs and were named here as potential beneficiaries of satellite programming. the most frequently named special population was the elderly, with suggestions for retirement, social services, nursing-home care, insurance, and other senior-oriented programming. three major classes of other potential users of satellite video-teleconferencing in the library were identified: 1. education-oriented: preschool and nursery students; elementary, middle, junior high, and high school students; postsecondary and graduate students; vocational, technical, extension, and cooperative education students; special education students; adult and continuing education students; educational administrators, faculties, and staff 2. government-oriented: federal, regional, state, county, and local government officials and employees 3. employment-oriented: professional! nonprofessional; salaried/hourly; union /nonunion; management/staff; public/private sectors; employed/ unemployed; full /part-time; permanent/ temporary; big/small business; human services/ trade particular topics of interest felt to be ideal satellite program areas within each library's community included the following (appearing in no rank order): energy (solar and natural resources) consumerism community services environment historic preservation/oral history legal aid librarianship computers, data processing technology communications/telecommunications fund raising safety recreation , physical education, sports, parks language (bilingual, sign, foreign, literacy) economics and finance (investment, banking, inflation, budgeting) conservation genealogy religion business and industry civil defense agriculture and forestry health and medicine mental health arts and humanities curriculum sharing therapy and rehabilitation real estate several local associations, who have affiliates or branches located nationally, were listed as potential users of satellite videoteleconferencing (in order of popularity): 1. american association of retired persons 2. league of women voters 3. historical societies 4. 
american library association 5. chamber of commerce 6. american association of university women 7. parent/teacher associations 8. councils of government 9. jaycees 10. boy scouts 11. friends of the library

three questions concerning interest and ability to participate in future satellite video-teleconferencing activities were asked. the questions, vital to the outcome of this survey, are reiterated here with their respective answers:

1. would you be interested in helping set up one or more of these specialized teleconferences?
yes 63 (73%)   no 10 (12%)   maybe 5 (6%)   no answer 8 (9%)

2. would you be interested in doing a local follow-up program after a national teleconference that is of interest to your community?
yes 65 (76%)   no 6 (7%)   maybe 8 (9%)   no answer 7 (8%)

3. periodically, nationally based organizations sponsoring teleconferences or special programs enlist promotional and site arrangement support from local site facilitators. would you like to be listed as available to provide this support?
yes 54 (63%)   no 18 (21%)   maybe 3 (3%)   no answer 11 (13%)

the interest of the libraries surveyed is well documented in questions one and two. however, their ability to presently participate is limited by financial and personnel resources, as demonstrated by question three's responses.

general conclusions and recommendations

the majority of surveyed libraries recognize the need for libraries to expand their community service roles through some use of telecommunications. many of the 86 libraries indicated the concept of libraries becoming satellite program viewing facilities through their cable connectivity was an idea so new to them that they could not fully
-coordinate local follow -up activities? provide refreshments? coordinate advance publicity within the community? once the catvlib has determined whether or not it is able and desires to offer their services, the ca tvlib would be recorded as a satellite program "receive site." theca tv lib will then assume the degree of local responsibility requested and contracted by the requesting organization, including all negotiations necessary with the cable system. while there were survey indications of general support for such a national satellite cable library network, what are the pros and cons of its operation? pros pre-existing conditions. ca tvlibs need no investment for hardware, but merely take advantage of pre-existing cable connectivity. community service. such ca tvlib participation potentially offers service to every member of the community. outreach to new patrons. those community residents not previously using the library may find this new service applicable to their needs. economics. catvlibs could recoup any charges incurred through this service, as well as expect payment as a rented receive site. program interaction. live satellite programming has the advantage over taped programming of allowing the option of offering viewers the opportunity to interact with the program's presenter(s). resource-sharing potential. this service has the future potential of providing catvlibs with an alternative method of accessing new information resources and data bases. human resources can be shared now through this service. potential catv expansion. more catvs are expanding and upgrading their satellite access capabilities as usage of satellites by cable programming vendors increases. some catv s have already purchased west ar iii hardware in addition to their satcom i hardware. future implications. if satellite-related services become valued by the community, the residents might decide the catv lib should have its own satellite hardware so that the community could take advantage of more programming available directly from satellite. cons lack of sa tcom i occasional time. it is becoming increasingly difficult to sublease transponder time on this satellite for occasional satellite programs. dependency. the catvlib must depend entirely on the cable system to be able to be a network participant and offer this service. ca tvlib participation is dependent upon the cable system's satellite access capabilities, which generally means satcom i only. lack of cctv. generally, most ca tvlibs cannot offer closed-circuit capability, so absolute privacy cannot be guaranteed to the program's sponsor. catvlib policies. some catvlibs will have to make decisions about various controversial items, such as: -accepting money for use of facilities. -allowing some clients the right to limit viewing to only registrants. -hosting controversial groups. range of catvlib capabilities. the survey demonstrated that ca tvlibs cannot all offer the same degree of service due to the wide range of technical capabilities. at present, each satellite event would have to be judged individually to determine which catvlibs were equipped to participate. a glance at the pros and cons of marrying libraries and satellite communications through cable connectivity suggests a national satellite catv lib network is a presently available and usable resource with potential for future expanded capabilities and unlimited programming uses. 
the obstacles imposed by the cons, however, are cause for a serious and objective look at the present and future viability of such a network. popular present uses of satellite videoteleconferencing are for telecasting continuing education and organizational conference interactive programming to special audiences. some pssc clients will often request to: -charge his/her special audience for participating (course or conference fees, for example). -have the satellite-transmitted event reports and working papers 311 closed-circuit telecasted to the receiving locations only. -reach specific geographical locations (often large urban areas, such as new york or los angeles). charging special audiences for closed-circuit satellite event the first two client requests are often related. if the client intends to charge the registrant-viewer a fee, he/she often expects the program to be viewed only at designated receive sites that are hosting the paying participants. (why should a viewer pay if heishe could watch the same program at home on a cable channel for free?) obviously, those clients interested in a "box office" approach to their event, that is, to make a profit rather than offer a service, are not suited for catvlib network use. however, how can the ca tvlibs accommodate those public service groups which must recoup expenses in order to offer such satellite program services? client-designed incentives such as giving the phone number for viewer interaction in a program only to the ca tvlibs rather than displaying or announcing the number during the program; requiring participants to have special materials and/ or integrating local preor postevent activities in the catvlibs with the program; even offering course credit to registrants only are manageable alternatives for those catvlibs that cannot terminate the program in their facilities only. some catvlibs may be able to negotiate whh their catv for the provision of the necessary equipment to provide closed-circuit capabilities. however, this survey did not identify many catvs that were willing to cooperate with the libraries to that extent. for those catvlibs whose policies restrict their involvement with financial transactions, particularly money exchange among library patrons, advance registration fees paid directly to the client could enable the libraries to avoid being required by the client to "collect at the door." most libraries, however, by their very nature, cannot prohibit anyone from viewing a program within their facilities, thereby making it generally impossible for them to guar312 journal of library automation vol. 14/4 december 1981 antee the client their requested selective audience. size, location, and distribution of receive sites video-teleconference users generally want to reach as many of their members or special populations as possible, yet they must pay to rent each receive site. economics influence their attempt to reach more people at fewer locations, not necessarily those most in need of the program. therefore, it is no surprise that popular receive sites are located in heavily populated cities. while cable television is finally coming to urban areas, present conditions find a lack of operational catvs available. the typical catvlib now is located in a smaller city or rural area. large states, such as california and texas, have little or no catvlib representation. only twentythree states currently have a usable catvlib facility, which makes the network descriptor "national" not quite accurate. 
expanding the catvlib network to include more and larger cities and all states is a must to make it competitive with other satellite networks available to a client. but even if the network is able to expand, the previously mentioned inability of catvlibs to provide closed-circuit capabilities will lessen its desirability as a resource when that capability is offered by another satellite ground facility in the same city. one competitive alternative a catv lib can consider is rental cost. clients expect to pay a reasonable rate for the use of each facility. this rate differs among different types of satellite networks, and even within the same network. for example, renting a public television station is generally less expensive than booking a hotel. yet the rate for two public television stations can vary in the hundreds of dollars. if a catvlib chooses to offer its facilities for free, asking only for compensation os any expenses it might incur because of the satellite event or charges a minimal amount, their facility becomes economically attractive. one factor the ca tvlibs must not overlook when contemplating such a decision is the cable system. will the cable system expect remuneration for its services, especially if the catvlib is receiving payment? libraries must remember they have entered into a cooperative arrangement with their catvs in order to become a satellite program viewing facility. toward future independence while a skeletal cable library network does technically exist, it is imperative that libraries work toward their own future independence before they can truly establish themselves as a viable satellite network. evolution of a catvlib network to a satellite library network might include the following two steps: l. expanded catvljb network. the survey instrument should now evolve into an interview tool for profiling additional libraries to become part of this network. efforts should be made to encourage libraries within poorly represented states to join the network if technically feasible. expansion is urged for two main reasons: to allow libraries the opportunity to experience being a satellite program viewing facility without financial obligations. -to allow community residents the opportunity to experience a library service with great potential for all local population segments. once the library is regarded as the logical place for community communications, it will be much ea~ier to begin a community drive toward supporting the outfitting of the library with the proper hardware necessary to function in that capacity. requirements for becoming part of the expanded catvlib network include: -at least one-way connectivity between the library and the catv. (a typical subscription for basic service will suffice.) -the catv must have a satellite receiving station. -the catv must have excess capacity available on its satellite receiver. -catv must be willing to cooperate with the library in providing satellite reception of occasional satellite telecasts. library must have at least one viewing room available to seat those viewing the satellite program. library must have at least one television monitor, wired to receive cable programming, available in the viewing room. library must be willing to assume role of community contact to extent requested by client. (need is for library interest in participating in these occasional satellite telecasts; degree of local responsibility can be negotiated.) 
even though this network is designed to be a temporary method of allowing library participation in satellite communications, future implications could find these libraries expanding, improving, or beginning cablecasting on a library-designated cable channel. thus, libraries deciding whether they should become involved with a temporary network might contemplate the related activities available from library/cable system cooperation. 2. satellite library network. at some point in the not too distant future, libraries will be faced with the decision of becoming independent from their cable system and obtaining their own satellite hardware. a library with its own satellite receiving station will become more desirable to more users as a receive site for a satellite video-teleconference since it will be more flexible and autonomous. besides satellite video-teleconferences, libraries could investigate other uses of their satellite hardware, including: -direct satellite access (with permission recommended) for cable television fare; -reception of nationwide satellite distribution of taped video programming for library use; -facilitation of various library data communications. if the library is able to prove to the residents the value and practicality of having community satellite access capabilities located at its facilities, through participation in the catvlib network, local funding of a satellite library project might be realistic. if corporations are made aware of how such a satellite library facility could benefit their own communications needs, a corporate grant could prove to be another funding route. other sources of support must also be explored. final word: as a result of this survey, pssc has profiled cable libraries of all technical capabilities for input into a database of network resources. however, the limitations of a catvlib network have been noted. effort will be made by pssc, where appropriate, to use this network for client satellite telecasts. pssc will continue to profile interested cable libraries for addition to the network, upon request of the library.
48 information technology and libraries | march 2007 zoomify image is a mature product for easily publishing large, high-resolution images on the web. end users view these images with existing web-browser software as quickly as they do normal, downsampled images. a flash-based zoomifyer client asynchronously streams image data to the web browser as needed, resulting in response times approaching those of desktop applications using minimal bandwidth. the author, a librarian at cornell university and the principal architect of a small, open-source company, worked closely with zoomify to produce a cross-platform, open-source implementation of that company's image-processing software and discusses how to easily deploy the product into a widely used web-publishing environment. limitations are also discussed, as are areas of improvement and alternatives. zoomifyer from zoomify (www.zoomify.com) enables users to view large, high-resolution images within existing web-browser software while providing a rich, interactive user experience. a small zoomifyer client, authored in macromedia flash, is embedded in an html page and makes asynchronous requests to the server to stream image data back to the client as needed. by streaming the image data in this way, the image renders as quickly as a normal, downsampled image, even for images that are gigabytes in size. as the user pans and zooms, the response time approaches that of desktop applications while using the smallest possible bandwidth necessary to render the image. and because flash has 98.3 percent browser saturation, viewing "zoomified" images is seamless for most users and allows them to view images interactively in much greater detail than would otherwise be practical or even possible.1 zoomify image (sourceforge.net/projects/zoomifyimage) was created at cornell university in collaboration with zoomify to create an open-source, cross-platform, and scriptable version of the processing software that creates the image data displayed in a zoomifyer client. this work was immediately integrated into an innovative content-management system that was being developed within the zope application server, a premier web application and publishing platform. authors in this system can add high-resolution images just as they normally add downsampled images, and the image is automatically processed on the server by zoomify image and displayed within a zoomifyer client. zoomify image is now in its second major release on sourceforge and contains user-contributed software to easily deploy it in other environments such as php. zoomifyer has been used in a number of applications in many fields, and can greatly enhance many research and instructional activities. applying zoomifyer to digital-image collections is obvious, allowing libraries to deliver an unprecedented level of detail in images published to the web.
new applications also suggest themselves, such as serving high-resolution images taken from tissue samples in a medical lab or using zoomifyer in advanced geospatial image applications, particularly when advanced client features such as annotations are used. the zoomifyer approach also has positive implications for preservation and copyright protection. zoomify image generates cached derivatives of master image files so the image masters are never directly accessed in the application or sent over the internet. image data are stored and transmitted to the client in small chunks so that end users do not have access to the full data of the original image. deploying zoomify image: dependencies and installation. zoomify image was designed initially to be a faithful, cross-platform port of zoomify's image-processing software. it was developed in close cooperation with zoomify to provide a scriptable method for invoking the image-preparation process for zoomifyer clients so this technology could be used in more environments. zoomify image is written in the python programming language and uses the third-party python imaging library (pil) with jpeg support, both of which are also open source and cross-platform. it has been tested in the following environments: ■ python 2.1.3 and pil 1.1.3, and ■ python 2.4.3 and pil 1.1.4. installers for python and pil exist for all major platforms and can be obtained at python.org and www.pythonware.com/products/pil. the installation documentation that comes with pil will help you locate the appropriate jpeg libraries if they are missing from your system. for macosx, you can find pre-built binary installers for python, pil, and zope at sourceforge.net/projects/mosxzope. introducing zoomify image. adam smith (ajs17@cornell.edu) is a systems librarian at cornell university library, ithaca, new york. the "ez" version of the zoomifyer client, a flash-based applet with basic pan and zoom functionality, is packaged with zoomify image for convenience so the software can be used immediately once installed. the ez client is covered by a separate license and can be easily replaced with more advanced clients from zoomify at www.zoomify.com. (a description of how to upgrade the zoomifyer client is included in this paper.) after python and pil with jpeg support are installed, download the zoomify image software from sourceforge.net/projects/zoomifyimage and decompress it. using zoomify image from the command line: begin exploring zoomify image by invoking it on the command line: python [path to zoomify image]/zoomifyfileprocessor.py [image file] or, to process more than one file at a time: python [path to zoomify image]/zoomifyfileprocessor.py [image file 1] [image file 2] ... the file formats of the images input to zoomify image are typically either tiff or jpeg, but can be any of the many formats that pil can read.2 an image called "test.jpg" is included in the zoomify image distribution and is of sufficient size and complexity to provide an interesting example. during processing, zoomify image creates a new directory to hold the converted image data in the same location as the image file being processed. the name of this directory is based on the file name of the image being processed, so that, for example, an image called "test.jpg" would have a corresponding folder called "test" containing the converted image data used by the zoomifyer client.
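because the converter is an ordinary python script, it can also be driven from another script for batch work. the following is a minimal sketch under stated assumptions, not code from the zoomify image distribution: it simply shells out to zoomifyfileprocessor.py (the path used here is a placeholder for wherever the distribution was decompressed) for every jpeg and tiff file in a hypothetical "incoming" directory, mirroring the multi-file command line shown above.

import glob
import subprocess

# assumption: adjust this placeholder to wherever the distribution was decompressed
processor = "/path/to/zoomifyimage/zoomifyfileprocessor.py"

# gather the images to convert; tiff and jpeg are the typical input formats
images = glob.glob("incoming/*.jpg") + glob.glob("incoming/*.tif")

# one invocation can take any number of image files, just as on the command line;
# each image gains a sibling directory of converted tile data (e.g., test.jpg -> test/)
if images:
    subprocess.call(["python", processor] + images)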
if the image file has no file extension, the directory is named by appending “_data” to the image name, so that an image file named “test” would have a corresponding directory called “test_data.” if the process is re­run on the same images, any previously generated data are automatically deleted before being regenerated. zoomify provides substantial documentation and sample code on its web site that demonstrates how to use the data generated by zoomify image in several environments. user­ contributed code is bundled with zoomify image itself, further dem­ onstrating how to dynamically incor­ porate this conversion process into several environments. an example of the use of zoomify image within the zope application server is given. incorporating zoomify image into the zope application server the popular zope application server contains a number of built­in services including a web server, ftp and webdav servers, plug­ins for access­ ing relational databases, and a hier­ archical object­oriented database that uses a file­system metaphor for stor­ age. this object database provides a unique opportunity to incorporate zoomifyer into zope seamlessly. to use zoomify image with zope, the distribution must be decom­ pressed into your zope products directory. for versions 2.7.x and up, this is at: /products/ in zope versions prior to the 2.7.x series, the products directory is at: /lib/python/ products/ restart zope and now within the web­based zope management interface (zmi), the ability to add zoomify image objects appears. after selecting this option, a form is presented that is identical to the form used for adding ordinary image objects within zope. when an image is uploaded using this form, zope automatically invokes the zoomify image conversion process on the server and links the generated data to the default zoomifyer client that comes with the distribution. if the image is subsequently edited within zmi to upload a new version, any existing conversion data for that image are automatically deleted, and the new conversion data are gener­ ated to replace them, just as when invoked on the command line. again, the uploaded image can be in any format that zope recognizes as having a content­type of “image/...” and that pil can read. the only potential “gotcha” in this process is that in the versions of the zoomifyer client the author has tested, zoomify image objects that have file names (in zope terminology, the file name is the object’s “id” property) with extensions other than “.jpg” are not displayed properly by the zoomifyer client. so, when uploading a tiff image, for example, the id given to the zoomify image object should either not contain an extension, or it should be changed from image.tif to something like image_tif. this bug has been reported to zoomify and may be fixed in newer versions of the flash­based viewing software at the time of publication. to view the image within the zoomifyer client, simply call the “view” method of the object from within a browser. so, for a zoomify image object uploaded to: http:///test/test.jpg go to this url: http:///test/test. jpg/view or, to include this view of the image within a zope page template 50 information technology and libraries | march 200750 information technology and libraries | march 2007 (zpt), simply call the tag method of the zoomify image just as you would a normal image object in zope. 
it is possible that the zoomify image conversion process will not have had time to complete when someone tries to view the image. the zoomify image object will attempt to degrade gracefully in this situation by trying to display a downsampled version of the image that is generated part way through the conversion process, or, if that is also not available, finally informing the user that the image is not yet ready to be viewed. this logic is built into the tag method. to add larger images more efficiently, or to add images in bulk, the zoomify image distribution contains detailed documentation to quickly configure zope to accept images via ftp or webdav and automatically process them through zoomify image when they are uploaded. finally, the default zoomifyer client can be overridden by uploading a custom zoomifyer client into a location where the zoomify image object can "acquire" it, and giving it a zope id of "zoomifyclient.swf". how it works: to be viewed by a zoomifyer client, an image must be processed to produce tiles of the image at different scales, or tiers. an xml file that describes these tiles is also necessary. zoomify image provides a cross-platform method of producing these tiled images and the xml file that describes them. beginning at 100-percent scale, the image is successively scaled in half to produce each tier, until both the width and height of the final tier are, at most, 256 pixels each. each tier is further divided into tiles that are, at most, 256 pixels wide by 256 pixels tall, as seen in figure 1. these tiles are created left to right, top to bottom. tiles are saved as images with the naming convention indicated in figure 2. the numbering is zero-based, so that the smallest tier is represented by one tile that is at most 256 x 256 pixels with the name "0-0-0.jpg." tiles are saved in directories in groups of 256, and those directories also follow a zero-based naming convention starting with "tilegroup0." lower-numbered tile groups contain lower-numbered tiles, so 0-0-0.jpg is always in tilegroup0. zoomifyer clients understand this tile-naming scheme and only request tiles from the server that are necessary to stitch together the portion of the image being viewed at a particular scale. [figure 1. tiers and tiles for a 2048 x 2048 pixel image. figure 2. tile image naming scheme.] limitations: zoomify image was developed to meet two goals: 1. to provide a cross-platform port of the zoomifyer converter for use in unix/linux systems, and 2. to make the converter scriptable, and ultimately integrate it into open-source content-management software, particularly zope. this zoomifyer port was written in python, a mature, high-level programming language with an execution model similar to java. although zoomify image continues to be optimized, compared to the official zoomify conversion software it is slower and more limited in the sizes of images it can reasonably process. anecdotally, zoomify image has been used effectively on images hundreds of megabytes large, but significant performance degradation has been reported in the multi-gigabyte range. because of these limitations in zoomify image, the official zoomify image-processing software is recommended for converting very large images manually in a windows or macintosh environment.
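before turning to the circumstances in which zoomify image is recommended, it may help to make the tiling scheme from the "how it works" section concrete. the short python sketch below is illustrative only, not code from the zoomify image distribution; it simply applies the halving-until-256-pixels rule to count tiers and 256 x 256 tiles for an image of a given size.

import math

def tier_sizes(width, height):
    # halve the full-resolution dimensions repeatedly until both fit in one
    # 256-pixel tile, then list the scales from smallest (tier 0) to full size
    sizes = [(width, height)]
    while max(sizes[-1]) > 256:
        w, h = sizes[-1]
        sizes.append(((w + 1) // 2, (h + 1) // 2))
    return list(reversed(sizes))

total = 0
for tier, (w, h) in enumerate(tier_sizes(2048, 2048)):  # the image of figure 1
    cols = int(math.ceil(w / 256.0))
    rows = int(math.ceil(h / 256.0))
    print("tier %d: %d x %d pixels, %d tiles" % (tier, w, h, cols * rows))
    total += cols * rows

# tiles are stored 256 to a directory ("tilegroup0", "tilegroup1", ...),
# so the number of tilegroup directories needed is:
print("tilegroups: %d" % int(math.ceil(total / 256.0)))

for the 2048 x 2048 pixel example of figure 1 this gives four tiers of 1, 4, 16, and 64 tiles, or 85 tiles in all, which fit within a single tilegroup0 directory; the lone tile of tier 0 is the "0-0-0.jpg" mentioned above.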
the zoomify image product is recommended in the following circumstances: ■ the conversion must be per­ formed on a unix/linux machine. ■ the conversion process must be scriptable, such as for batch pro­ cessing or being run dynamically. ■ images sizes are not in the multi­ gigabyte range. if a scriptable, cross­platform version of the zoomifyer converter is needed, but performance is an issue, several things can be done to extend the current limits of the soft­ ware. obviously, upgrading hard­ ware, particularly ram, is effective and relatively inexpensive. running the latest versions of python and pil will also help. each new version of python makes significant perfor­ mance improvements, and this was a primary goal of version 2.5, which was released in september 2006. the author believes that the cur­ rent weak link in the performance chain is related to how zoomify image is loading image data into memory with pil during processing. in the current distribution, a python script contributed by gawain avers, which is based partially on the zoomify image approach, uses imagemagick instead of pil for image manipula­ tion and is better able to process multi­gigabyte images. the author would like to add the ability to des­ ignate the image library at runtime in future versions of zoomify image. future development beyond improving the performance of the core­processing algorithm, the author would also like to explore opportunities for more efficiently processing images within zope, such as spawning a background thread for processing images so the zope web server can immediately respond to the client’s image­submission request. the author would also like to improve the tag method to display data more flexibly in the zoomifyer client and ensure consistent behav­ ior with zope’s default image tag method. finally, zoomify image could also benefit from the addi­ tion of a simple configuration file to control such runtime properties as image quality and which third­party image­processing library to use, for example. conclusion zoomify image is mature, open­ source software that makes it pos­ sible to publish large, high­resolution images to the web. it is designed to be convenient to use in a variety of architectures and can be viewed within existing browser software. download it for free, begin using it in minutes, and explore its unique possibilities. references 1. adobe systems, macromedia flash player statistics, http://www.adobe.com/ products/player_census/flashplayer/ (accessed march 1, 2007). 2. pythonware, python imaging library handbook: image file formats, http:// www.pythonware.com/library/pil/ handbook/formats.htm (accessed aug. 6, 2006). resources macromedia flash player statistics (http://www.adobe.com/ products/player_census/flash­ player/) (accessed jan. 2, 2007). python imaging library (pil) (http:// www.pythonware.com/products/ pil/) (accessed jan. 2, 2007). python programming language official web site (http://www.python.org/) (accessed jan. 2, 2007). zoomify image (http://sourceforge.net/ projects/zoomifyimage/) (accessed jan. 2, 2007). zoomify (http://www.zoomify.com/) (accessed jan. 2, 2007). zope community (http://www.zope .org/) (accessed jan. 2, 2007). zope installers for macosx (http:// sourceforge.net/projects/ mosxzope/) (accessed jan. 2, 2007). lib-mocs-kmc364-20140103102106 27 personnel aspects of library automation david c. 
weber: director of libraries, stanford university, stanford, california personnel of an automation project is discussed in terms of talents needed in the design team, their qualifications and organization, the attitudes to be fostered, and the communication and documentation that is important for effective teamwork. discussion is based on stanford university's experience with protect ballots and includes comments on some specific problems which have personnel importance and may be faced in major design efforts. no operation is any better than its rersonnel. the selection, encouragement, motivation and advancement o the individuals who operate libraries or library automation programs are the critical elements in the success of automation. the following observations are based upon experience at stanford university over the past eight years in applying data processing to libraries, and particularly in the large scale on-line experience of project ballots (an acronym standing for bibliographic automation of large library operations using a time sharing system) supported by the u. s. office of education bureau of research during the past three years. the first par! of the paper treats of five key personnel aspects: the automation team, thetr qualifications, their organization, the climate for effort, and documentation. 28 journal of library automation vol. 4/1 march, 1971 the team experts are required for the design of any computer system or system based on other sophisticated equipment and they must emphatically form a "team" to be effective. the group may include a statistician and/or financial expert, a systems analyst, a systems designer, a systems programmer, a computer applications programmer, and a librarian. there may be several persons of each type, or one person may assume more than one responsibility. a few universities have librarians who have received training in systems analysis or in programming. the computer related professions are, however, demanding in themselves, and especially so when the programming language may change with each generation of computers. it is therefore usual for the head librarian to work with experts located in a systems office, an administrative datajrocessing center, or a computation center. except for the librarians, few · any of the experts may be on the library payroll, although in a very large project all may be financed from one or two accounts in the library. the team must cover the variety of functions encompassed in a formal system development process. these functions are enumerated in detail in stanford's project documentation ( 1), but a brief summary of typical functions performed by the team may indicate its diversity. there is the analysis of existing library operations, conceptual design of what is desired under an automated system, form and other output design, review of published literature and on-site analysis of selected efforts of a related nature; determination of machine configuration to support the system design, study of machine efficiency, and reliability of main frame plus peripheral equipment; choice of programming language, checkout and debugging of programs; cost effectiveness study, study of present manpower conversion, analysis of space requirements and equipment changes; staff training programs with manuals or computer aided instruction, system documentation and publicity; systems programming and applications programming, and project management. 
the total effort is collaborative; the system is designed by and with the users of it (i.e., library staff), not for them, and a tremendous contribution of local staff time is essential to success. in many instances an institution will have some, but not all, of these resources and capabilities in adequate amount. if the amount is insufficient, the project director must determine how, through consultants or change of project course, a needed talent can be obtained or bypassed. the consequences of each mix of talent and change of strategy need assessment at frequent intervals; reassessment must be done with the full participation of the most senior library officers, including the director of libraries, as well as certain other key university officers. at stanford, the group has for three years comprised diverse talent and worked reasonably well as a team. the library has recently delegated to the director of the computation center the immediate project management of ballots and spires (stanford public information retrieval system) (2). thus the current combined staff of twenty-three, which should reach a peak of twenty-five during 1971, reports to the ballots-spires project director. he in turn reports both to the director of the computation center in a direct relationship and, under his second hat as chief of the library automation department, to the assistant director of libraries for bibliographic operations in a dotted-line relationship. see table 1 for stanford's diversity of staff.
table 1. staff of project ballots-1970
title or classification          age   degree        years of experience   years on project
project director                 36    bs, ce        15                    1
special assistant                40    bs            12                    2
senior system programmer         37    ba            8                     1
system programmer                36    bs            14                    3
manager technical development    29    bs            5                     2
system services manager          30    ba            8                     2
librarian ii/system analyst      28    ba, mls       3                     3
librarian/system analyst         27    ba, mls       2                     <1
project documentation editor     35    ba, mls       3                     1
assistant                        26    ma            3                     <1
system analyst                   27    ba, ma        5                     1
junior system analyst            25    ba            2                     2
programmer trainee               26    -             1                     1
programmer                       30    aa            7                     3
programmer                       26    ba            4                     1
programmer                       32    bs            11                    <1
research assistant               27    bs, ms, phd   4                     3
research assistant               28    ba, llb       8                     2
research assistant               22    ba            3                     2
research assistant               24    ba            4                     2
senior secretary                 27    -             8                     1
secretary                        19    -             1                     1
in development of library automation or of any sophisticated data processing system, it is essential to utilize librarians and other system users to the utmost in constructing the design. there is evidence that an effective program of library automation results from on-campus development: that is, using a local staff with librarians working on a daily basis with system analysts, programmers, and information scientists. librarians most definitely should not try to do it all themselves; that would be sheer folly and would reveal a lamentable lack of appreciation of the highly complex skills of the other professionals working in the information sciences. team qualifications: a qualified and enthusiastic team with strong backing from the library administration is the most important single element in a library's automation efforts. this requires that the library administrator have a grasp of the intricacies, although he himself will probably not understand all details involved in the system design.
it also requires consideration of the desire for advancement of those in computer refated professions and the various characteristics of their career/attems, including training, experience, job market, salary potentials, an mobility. the team will need to be selected with care and joint ehort by computer stah and library stah management. people are needed who can teach and learn from one another. they must be tolerant, and interested in problems and details, for they will be changing traditional systems, altering people's work habits, and probably shaking their self-confidence. security comes from knowing the facts and being able to work on the new system-to be in part responsible for one's own future. team harmony of ehort can be promoted by the so-called "bridge professional", or what the sociologists call a "marginal professional", meaning one who is able to assist those in one profession to converse and work ehectively with those in another. at stanford the librarian/analysts and the project editor have been ehective in such a capacity. those in the computer related professions, along with all on the library stah, need a sense of purpose, a sense of achievement, and recognition of their contributions by superiors as well as peers. the automation team needs a competent, experienced, technically knowledgeable, and tactful captain. he must manage with an appreciation for communication, a knack for touching base with various groups having interests in the ehort, the judgment to assign reasonable tasks, and the realism to set and achieve feasible time schedules-all within budget limitations. if the leader is less than this paragon, others in the organization must provide these qualities, all of which are required. for at least another decade it is likely that the expert analyst andjrogrammer will receive as high a salary as a librarian division hea or assistant department chief, and a highly qualified systems designer may well earn more than any chief and perhaps as much as the assistant director of libraries. the scale is not irrational or unjust; it merely recognizes the scarcity of particular talents and their importance to major library automation programs. designing an on-line library system requires a person of proven competence in on-line systems. a salary oher shaved here may well lead to regret. experience in project ballots points up problems with the selection of personnel who are not library trained. some persons may be excellent in theoretical development but poor as managers, or some may play a "campus politics" game in order to move into senior positions in the computation center. computer specialists have diherent career goals than do librarians, and rarely see the library as a permanent career commitment personnel aspects of automation/weber 31 by which to promote library automation; rather their commitment is toward automation and computer applications, not a particular section of the university. a project manager also needs to take great care that research does not become an end in itself, a particular tendency of graduate students doing system development. implementation must be the goal of library automation; automated operations must be sound, efficient, dependable, and economical. some of the special needs and working conditions for personnel in an automated program are outlined by allen b. veaner (3). 
team organization: the organizational unit of an automation program may be first an office, then later a division when the group is larger and the function more permanent. the staff of a major project should have a departmental status equal to that of the acquisition or cataloging department. these latter two departments may be combined with an automation department under an assistant or associate director for technical processing. however, it is a rare individual who can give adequate attention to both the complexities of a major traditional library function and the direction of a major research and development program. thus the initial organizational pattern may be one of separate but equal status, and at some point in time the units may be combined under one administrator. see figure 1 for stanford's new organization adopted after three years of effort, as it entered the production-engineered phase. units may best be combined when a research and development project begins to take on a significant amount of operational work. the reason is that the person in charge of the system development may need to oversee its implementation in order to assure that standards are followed for data preparation, coding, and the details of forms; and that feedback of experience for system improvement is secured. this combination of units should not be achieved when the project is still in the development stage, but it should also not wait until operations are well under way. some anticipation is desirable. in the medium-scale program such combination of units may be possible after a year of operation, or the continuing production may be assumed by a traditional department and the systems office left free for further experimentation and development work. production is normally the responsibility of traditional departments from the day of implementation; the automation department responsibility is for instructing in system use, debugging of programs, and fine tuning of the system. in a large project striving toward an integrated system for all technical processes and public services, the transfer of responsibilities to traditional departments may come in no less than three years and perhaps as many as five years from the origin of the project because of constant developments in software and hardware, developments which library users cannot control but to which they must be responsive.
[figure 1. ballots/spires organization, 1970: an organization chart linking the director of the stanford university libraries, the ballots principal investigator (the assistant director of libraries for bibliographic operations), the library systems design committee, the vice president for research, the director of the stanford computation center, and the spires/ballots project director, with the project's library systems analysts, system programmers, applications programmers, graduate student assistants, and documentation editors grouped beneath the project director.]
personnel aspects of automation/weber 33 automation division or systems office would remain to take care of the refinements, maintenance, and development of further applications which are a result of the open-ended nature of a major automation program. the climate for effort if the librarian is to work effectively with all of the previously mentioned experts, he must become more than superficially familiar with the equipment and with the software which instructs it. the librarian who carries the responsibility for major mechanized data processing programs will probably have taken at least half a dozen courses in various aspects of data processing in order to be able to state reasonable requirements, to comprehend economic and technical limitations, discuss file organization problems with the systems designer, and be sufficiently informed to help explain the new system to the library staff that will operate or make use of it. this type of specialized training will also be necessary for other team members who will work with different parts of the system. a number of librarians will need to take several short courses selected for their early relevance to the work at hand. staff may take courses offered in the university computer science department, by the computation center, or by a local computer firm. various clerical personnel will need briefing sessions, and it will be necessary to train some typists to serve as skilled terminal operators. indeed, training will be needed on a continuing basis as more staff use the system; manuals are important unless self-instruction is built in. these efforts are desirable because the employee needs assurance that his talents will not be outdated and he be laid off as a consequence; rather that he will be retrained to the new system, shown that its function is not totally different from the previous one, and shown that it can actually serve him and lead to enhanced satisfaction and improved salary in his library employment. computer based systems are far more likely to upgrade librarianship than to make it obsolete. they will enhance the profession by eliminating its routine drudgery, and thus more sharply identify its really professional nature. don r. swanson has commented on this point: "those librarians who have some kind of irrational antipathy toward mechanization per se (not just toward some engineers who have inappropriately oversold mechanization) i regard with some suspicion because i think they do not have sufficient respect for their profession. they may be afraid that librarianship is going to be exposed as being intellectually vacuous, which i don't think is so. even in a completely mechanized library there would still be need for skilled reference librarians, bibliographers, catalogers, acquisitions specialists, administrators, and others. those librarians in the future who regard mechanization, not with suspicion, but as a subject to be mastered will be those who will plan our future libraries and who will plan the things that machines are going to do. there will be no doubt of their professional status." ( 4) 34 journal of library automation vol. 4/1 march, 1971 persons who have inhibitions about machine based systems will not be effective members of the design and development group. those receptive to the change will benefit by having their job horizons enlarged and their prospects for improved salary and personnel classification enhanced. they will also share in the enthusiasm inspired by a bold new enterprise. 
this is not to say that all library staff members will enjoy the exacting refinements of a machine system, just as not everyone has talent to be a first-rate cataloger. it is not suited to everyone, and therefore the nature and purpose of the system must be clearly explained or demonstrated to anyone interested in such an assignment lest he accept it and then become disenchanted with the work. the importance cannot be overstated of telling the entire library staff what is being done in regard to automation-and why. disquieting rumors will abound in the absence of full and candid communication. staff meetings should be held to review progress and outline next steps. staff bulletins should publish summaries of the program and reports on its current status, information that can also be useful for faculty and staff outside the library. it must not be forgotten that the card catalog, the manual circulation system, and common order forms are familiar to all students and faculty. most students will have seen these in their high school or public libraries, yet few will have seen a sophisticated machine system, and will often be skeptical about its efficiency and dependability. faculty members may well wonder whether it is worth the cost. the effort to explain a program concisely but clearly to the library staff, students, faculty, and other university staff can be highly rewarding in understanding, and in moral and financial support. columbia university's experience with library automation has led them to state that .. though the hardware and software programs associated with computer technology are formidable, they are not the only (and possibly not even the most important) problems in an automation effort. two areas often overlooked or grossly underestimated are: 1 ) creating an environment hospitable to change [and] especially important in this area is staff training and organization. 2) describing and analyzing existing manual procedures sufficiently before attempting to design automated systems." (5) documentation the documentation of any new system is of singular importance. there is an oral tradition in most libraries; techniques of filing or searching are passed on by the supervisor, although libraries use staff manuals to formalize some of the techniques. however, in a system where absolute exactitude is demanded and where costs of system development are high, methodical recording of principles and procedures is obviously necessary. especially vital are details of design and programming, for purposes of debugging, maintenance, and transfer to others. personnel aspects of automation/weber 35 critical personnel issues in an important statement from massachusetts institute of technology's project mac in 1968, professor f. j. corbat6 outlines fifteen critical issues ranging from technical to managerial that affect the complexity and difficulty of constructing computer systems to serve multiple users ( 6). seven of the fifteen have substantial personnel aspects; experience with project ballots provides the basis for the following comments on them. 1) "the first danger signal is when the designers of the system won't document. they don't want to be bothered trying to write out in words what they intend to do." stanford's experience might not put this as a first critical issue, yet it is evident that without adequate and clear documentation the advancement of any research or development project is jeopardized. 
one expert, an invaluable member of the ballots team, has full responsibility for this very important task. the position requires adequate clerical support; there are one-and-a-half assistants on the ballots team. 2) "the second danger signal is when designers won't or can't implement. what is referred to here is the lofty designer who sketches out on a blackboard one day his great ideas and then turns the job over to coders to finish many months later." stanford has experienced some of the seductiveness of design innovations, especially on the part of graduate student research assistants. (yet these assistants have done excellent work and it is wished they were all full time on the project. ) without constant review and the use of pert charts or other scheduling, shying away from implementation can be a real hazard. there will be dark days when the design team cannot surmount some intractable but crucial obstacle, and tne project manager and stah librarians working with the team must be sympathetic, encouraging and patient. 3) "the next danger signal is when the design needs more than ten people. this doesn't mean that all the support people . . . must add up to no more than ten. but when the crucial kernel of the design team is more than ten people, a larger scale project is coming into being. this is the point where communication problems begin to develop." stanford has flirted with that particular danger point. with acquisition and cataloging staff included, the ballots design group is over ten and there is a communication problem, but one due not so much to size as to different backgrounds, vocabulary and scheduling of effort. the need for communication has been intensified because the main library is over half a mile from the computation center. it has required monthly staff meetings at early stages of design, and late stages of development, and at other times weekly staff meetings of the design group with the librarians who are ~etting the design criteria. failure of constant and accurate communication m a research and development effort is a threat to its effective progress. 4) "if a project cannot be finished or made use of in one year, there is potential trouble, because the chances of underestimation are strong (and ) a personnel turnover of roughly 20% per year must be assumed." stanford's 36 journal of library automation vol. 4/1 march, 1971 experience would bear this out. there was some time and cost underestimation. turnover during 1969-70 was 17%; the year before it was 50%. obviously documentation then becomes a more critical element in progress, and turnover may lead the librarian to feel that it is sometimes one step backwards for every two steps forward. turnover may be minimized by generous salary increases, not only once a year but perhaps at other times also when merit deserves reward and as responsibilities increase. in contrast to customary operations, an automation design effort is constantly changing in nature and emphasis; this fact requires flexibility in personnel management and frequently deserves immediate response in salary and classification administration. to keep a qualified research team in an area of specialization in demand, one must pay the price. let there be no misunderstanding, a good system of library automation cannot be finished in one year-nor in three; and it is costly. 5) "another danger signal is when a system is not a line-of-sight system. 
this means that all of the terminals, consoles, or what-have-you are not in the same room within shouting distance of the operator." any on-line system like ballots cannot be line-of-sight. terminals are brought to the users, not users to the terminals. since an on-line system requires total file recovery through use of log tapes, a facility not available on the prototype system, stanford has experienced problems when the machine goes down; it takes time to rerun a program or mount a different disk pack; a file was once wiped out; and there are many other users of the central facility, which puts a premium on scheduling, advance notice, backup, and the like. if a design team is not housed in adjacent space, it will take more personnel or time than in a line-of-sight arrangement to achieve the same accomplishment. ballots systems analysts were in the main library througbout the early design phases and the systems designers were near the computation center. lack of line-of-sight was a sufficiently severe problem that all of the ballots staff were collocated near the computation center last winter as the production engineered phase began. 6) "a somewhat related danger signal is when there are over ten system maintainers. here i am talking about an on-line system that is actually being maintained on-line." at stanford no more than one person has worked at one time on the program maintenance of stanford's four-yearold computer produced undergraduate library book catalog. there have been some complexities due to staff changes, changes in the operating system, and an off-campus contract for reprogramming to third-generation equipment, but the problems have not resulted because of the scale of the project. ballots, on the other hand, is twenty to fifty times as large a system, and it is expected that two or three programmers will be needed to maintain the systems software and a similar number to maintain and make minor revisions to the applications software. 7) "the last danger signal is when the system requires the ability to permit combinations of sharing, privacy and control." at stanford, assignpersonnel aspects of automation/ weber 37 ment of authority for file access has become a problem-who is permitted to update an acquisition record or authorize payment? the requirement for security also enters in any system which has salary data or other personnel information in files. a whole order of complexity is added. as in many of the above problems, complexity is accentuated when one is developing an on-line interactive system which serves multiple users. security must be designed to the file level and, later, to the record or even data element level. security requires control of access to file, of writing in a file, and of updating data through three types of checks: access allowable from a given terminal, from the file password, or from an individual password. such problems do not exist in off-line systems. conclusion for successful automation of library operations, it is of fundamental importance to choose a task that is appropriate in timing, magnitude of effort, funding, and personnel. the ballots experience demonstrates that one must devote great thought, care, and analysis to choosing the right automation project at the right time, and base it on having well qualified people to direct and accomplish the task. given suitable conditions it will be a most exciting and fruitful endeavor. the system that works well is a thing of beauty, and people make it so. references 1. 
stanford university, spires/ballots project: project control notebook, may 1970. section 1.4 "system development process." 2. parker, edwin b.: spires (stanford physics information retrieval system) 1969-70 annual report to the national science foundation. (stanford university: institute for communication research, june 1970). 3. veaner, allen b.: "major decision points in library automation," college & research libraries, 31 (september 1970), 299-312. 4. swanson, don r.: "design requirements for a future library." in markuson, barbara evans, ed.: libraries and automation. (washington: library of congress, 1964), p. 21. 5. columbia university libraries : progress report [to the national science foundation on library automation] for jan. 1968-dec. 1969 (nsf-gn-694). p. 14. 6. corbat6, fernando j.: sensitive issues in the design of multi-use systems (waltham, massachusetts: honeywell edp technology center, technical symposium on advances in software technology, february 1968). 17 pp. project mac internal memo. mac-m-383. microsoft word 5485-10835-5-ce.docx negotiating  a  text  mining  license  for   faculty  researchers       leslie  a.  williams,     lynne  m.  fox,     christophe  roeder,     and  lawrence  hunter       information  technology  and  libraries  |  september  2014           5     abstract   this  case  study  examines  strategies  used  to  leverage  the  library’s  existing  journal  licenses  to  obtain  a   large  collection  of  full-­‐text  journal  articles  in  xml  format,  the  right  to  text  mine  the  collection,  and   the  right  to  use  the  collection  and  the  data  mined  from  it  for  grant-­‐funded  research  to  develop   biomedical  natural  language  processing  (bnlp)  tools.  researchers  attempted  to  obtain  content   directly  from  pubmed  central  (pmc).  this  attempt  failed  because  of  limits  on  use  of  content  in  pmc.   next,  researchers  and  their  library  liaison  attempted  to  obtain  content  from  contacts  in  the  technical   divisions  of  the  publishing  industry.  this  resulted  in  an  incomplete  research  data  set.  researchers,  the   library  liaison,  and  the  acquisitions  librarian  then  collaborated  with  the  sales  and  technical  staff  of  a   major  science,  technology,  engineering,  and  medical  (stem)  publisher  to  successfully  create  a   method  for  obtaining  xml  content  as  an  extension  of  the  library’s  typical  acquisition  process  for   electronic  resources.  our  experience  led  us  to  realize  that  text-­‐mining  rights  of  full-­‐text  articles  in   xml  format  should  routinely  be  included  in  the  negotiation  of  the  library’s  licenses.   introduction   the  university  of  colorado  anschutz  medical  campus  (cu  anschutz)  is  the  only  academic  health   sciences  center  in  colorado  and  the  largest  in  the  region.  annually,  cu  anschutz  educates  3,480   full-­‐time  students,  provides  care  during  1.5  million  patient  visits,  and  receives  more  than  $400   million  in  research  awards.1  cu  anschutz  is  home  to  a  major  research  group  in  biomedical  natural   language  processing  (bnlp),  directed  by  professor  lawrence  hunter.  natural  language  processing   (also  known  as  nlp  or,  more  colloquially,  “text  mining”)  is  the  development  and  application  of   computer  programs  that  accept  human  language,  usually  in  the  form  of  documents,  as  input.  
bnlp takes as input scientific documents, such as journal articles or abstracts, and provides useful functionality, such as information retrieval or information extraction. cu anschutz's health sciences library (hsl) supports hunter's research group by providing a reference and instruction librarian, lynne fox, to participate on the research team. hunter's group is working on computational methods for knowledge-based analysis of genome-scale data.2 as part of that work, his group is devising and implementing text-mining methods that extract relevant information from biomedical journal articles, which is then integrated with information from gene-centric databases and used to produce a visual representation of all of the published knowledge relevant to a particular data set, with the goal of identifying new explanatory hypotheses. hunter's research group demonstrated the potential of integrating data and research information in a visualization to further new discoveries with the "hanalyzer" (http://hanalyzer.sourceforge.net). their test case used expression data from mice related to craniofacial development and connected that data to pubmed abstracts using gene or protein names. "copying of content that is subject to copyright requires the clearing of rights and permissions to do this. for these reasons the body of text that is most often used by researchers for text mining is pubmed."3 the resulting visualization allowed researchers to identify four genes involved in mouse craniofacial development that had not previously been connected to tongue development, with the resulting hypotheses validated by subsequent laboratory experiment.4 the knowledge-based analysis tool is open access. to continue the development of the bnlp tools for the knowledge-based analysis system, three things were required: a large collection of full-text journal articles in xml format, the right to text mine the collection, and the right to store and use the collection and the data mined from it for grant-funded research. the larger the dataset, the more robust the visual representations of the knowledge-based analysis system, so hunter's research group sought to compile a large corpus of relevant literature, beginning with journal articles. the text that is mined can start in many formats; however, xml provides a computer-ready format for text mining because it is structured to indicate parts of the document.
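as a brief illustration of what "computer-ready" means in practice, the sketch below pulls the major sections out of a single tagged article using python's standard library. the element names (article-title, abstract, body) follow the jats convention used by many stem publishers; the tag set in any particular publisher's dtd may differ, so the names here are illustrative assumptions rather than a description of the files hunter's group actually received.

```python
# minimal sketch: extract article sections from a tagged xml file.
# the element names are hypothetical, jats-like stand-ins.
import xml.etree.ElementTree as ET

def extract_sections(xml_path):
    """return the title, abstract, and body text of one tagged article."""
    root = ET.parse(xml_path).getroot()

    def text_of(tag):
        node = root.find(f".//{tag}")
        # itertext() gathers text nested inside inline markup (italics, links, etc.)
        return " ".join(node.itertext()).strip() if node is not None else ""

    return {
        "title": text_of("article-title"),
        "abstract": text_of("abstract"),
        "body": text_of("body"),
    }

# usage: sections = extract_sections("example-article.xml")
```

a program built this way can walk an entire directory of delivered xml files and hand each section to downstream mining code, which is why direct access to the xml is so much more useful than text recovered from pdf.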
xml  is  “called  a  ‘markup  language’  because  it  uses  tags  to  mark   and  delineate  pieces  of  data.  the  ‘extensible’  part  means  that  the  tags  are  not  pre-­‐defined;  users   can  define  them  based  on  the  type  of  content  they  are  working  with.”5,6   xml  has  been  adopted  as  a  standard  for  content  creation  by  journal  publishers  because  it   provides  a  flexible  format  for  electronic  media.7  xml  allows  the  parts  of  a  journal  article  to  be   encoded  with  tags  that  identify  the  title,  author,  abstract,  and  other  sections,  allowing  the  article  to   be  transmitted  electronically  between  editor  and  publisher  and  to  be  easily  formatted  and   reproduced  into  different  versions  (e.g.,  print,  online).  xml  can  also  indicate  significant  content  in   the  text,  such  as  biological  terms  or  concepts.  xml  allowed  hunter’s  research  group  to  write   computer  programs  that  can  make  sense  of  each  article  by  using  the  xml  tags  as  indicators  of   content  and  placement  within  the  article.  products  have  been  developed,  such  as  la-­‐pdftext,  to   extract  text  from  pdf  documents.8  however,  direct  access  to  xml  provides  more  useful  corpora     information  technology  and  libraries  |  september  2014   7   because  the  document  markup  saves  time  and  improves  the  accuracy  of  results  extracted  from   xml.     once  the  sections  and  content  of  an  article  are  identified,  text-­‐mining  techniques  are  applied  to  the   article.  “text  mining  extracts  meaning  from  text  in  the  form  of  concepts,  the  relationships  between   the  concepts  or  the  actions  performed  on  them  and  presents  them  as  facts  or  assertions.”9  text-­‐ mining  techniques  can  be  applied  to  any  type  of  information  available  in  machine-­‐readable  format   (e.g.,  journal  article,  e-­‐books).  a  dataset  is  created  when  the  text-­‐mined  data  is  aggregated.  using   bnlp  tools,  hunter’s  research  group’s  knowledge-­‐based  analysis  system  analyzed  the  dataset  and   produced  visual  representations  of  the  knowledge  that  have  the  potential  to  lead  to  new   hypotheses.  text  mining  and  bnlp  techniques  have  the  potential  to  build  relationships  between   the  knowledge  contained  in  the  scholarly  literature  that  lead  to  new  hypothesis  resulting  in  more   rapid  advances  in  science.   literature  review   hunter  and  cohen  explored  “literature  overload”  and  its  profoundly  negative  impact  on  discovery   and  innovation.10  with  an  estimated  growth  rate  of  3.1  percent  annually  for  pubmed  central,  the   us  national  library  of  medicine’s  repository,  researchers  struggle  to  master  the  new  literature  of   their  field  using  traditional  methods.  yet  much  of  the  advancement  of  biological  knowledge  relies   on  the  interplay  of  data  created  by  protein,  sequence,  and  expression  studies  and  the   communication  of  information  and  discoveries  through  nontextual  and  textual  databases  and   published  reports.11  how  do  biomedical  researchers  capitalize  on  and  integrate  the  wealth  of   information  available  in  the  scholarly  literature?  
“the  common  ground  in  the  area  of  content   mining  is  in  the  shared  conviction  that  the  ever  increasing  overload  of  information  poses  an   absolute  need  for  better  and  faster  analysis  of  large  volumes  of  content  corpora,  preferably  by   machines.”12   bnlp  “encompasses  the  many  computational  tools  and  methods  that  take  human-­‐generated  texts   as  input,  generally  applied  to  tasks  such  as  information  retrieval,  document  classification,   information  extraction,  plagiarism  detection,  or  literature-­‐based  discovery.”13  bnlp  techniques   accomplish  many  tasks  usually  performed  manually  by  researchers,  including  enhancing  access   through  expanded  indexing  of  content  or  linkage  to  additional  information,  automating  reviews  of   the  literature,  discovering  new  insights,  and  extracting  meaning  from  text.14  text  mining  is  just   one  tool  in  a  larger  bnlp  toolbox  of  resources  used  to  read,  reason,  and  report  findings  in  a  way   that  connects  data  to  information  sources  to  speed  discovery  of  new  knowledge.15  according  to   pioneering  text-­‐mining  researcher  marti  hearst,  “text  mining  is  the  discovery  by  computer  of  new,   previously  unknown  information,  by  automatically  extracting  information  from  different  written   resources.  a  key  element  is  the  linking  together  of  the  extracted  information  together  to  form  new   facts  or  new  hypotheses  to  be  explored  further  by  more  conventional  means  of     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   8   experimentation.”16  biomedical  text  mining  uses  “automated  methods  for  exploiting  the  enormous   amount  of  knowledge  available  in  the  biomedical  literature.”17   recent  reports,  commissioned  by  private  and  governmental  interest  groups,  discuss  the  economic   and  societal  value  of  text  mining.18,19  the  mckinsey  global  institute  estimates  the  worth  of   harnessing  big  data  insights  in  us  health  care  at  $300  billion.  the  report  concludes  that  greater   sharing  of  data  for  text  mining  enables  “experimentation  to  discover  needs,  expose  variability,  and   improve  performance”  and  enhances  “replacing/supporting  human  decision  making  with   automated  algorithms,”  among  other  benefits.  
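to make the idea of extracting "concepts and the relationships between the concepts" concrete, the toy sketch below finds gene names in a set of abstracts with a simple dictionary lookup and counts how often pairs of genes are mentioned together. this shows only the shape of the computation, not hunter's group's actual pipeline; real bnlp systems use tokenizers, synonym expansion, and statistical or machine-learning models, and the gene symbols listed here are hypothetical placeholders.

```python
# toy sketch: dictionary-based concept matching plus co-occurrence counting.
from itertools import combinations
from collections import Counter

gene_dictionary = {"shh", "bmp4", "msx1", "pax9"}  # hypothetical gene symbols

def genes_in(text):
    # crude normalization: strip punctuation and lowercase each token
    tokens = {tok.strip(".,;()").lower() for tok in text.split()}
    return sorted(gene_dictionary & tokens)

def cooccurrence(abstracts):
    pair_counts = Counter()
    for abstract in abstracts:
        for pair in combinations(genes_in(abstract), 2):
            pair_counts[pair] += 1
    return pair_counts

# usage:
# cooccurrence(["BMP4 and MSX1 interact during tongue development.", ...])
```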
furthermore,  the  mckinsey  report  points  out  that   north  america  and  europe  have  the  greatest  potential  to  take  advantage  of  innovation  because  of   a  well-­‐developed  infrastructure  and  large  stores  of  text  and  data  to  be  mined.20  however,  these   new  and  evolving  technologies  are  challenging  the  current  intellectual-­‐property  framework  as   noted  in  an  independent  report  by  ian  hargreaves,  “digital  opportunity:  a  review  of  intellectual   property  and  growth,”  resulting  in  lost  opportunity  for  innovation  and  economic  growth.21  in  “the   value  and  benefits  of  text  mining,”  jisc  finds  copyright  restrictions  limit  access  to  content  for  text   mining  in  the  biomedical  sciences  and  chemistry  and  that  costs  for  access  and  infrastructure   prevent  entry  into  text-­‐mining  research  for  many  noncommercial  organizations.22  despite   copyright  barriers,  organizations  surveyed  pointed  out  the  risks  associated  with  failing  to  use  text-­‐ mining  techniques  to  further  research  include  financial  loss,  loss  of  prestige,  opportunity  lost,  and   the  brain  drain  of  having  talented  staff  seek  more  fulfilling  work.  jisc  explores  a  research  project’s   workflow  and  finds  a  lack  of  access  to  text  mining  delayed  the  publication  of  an  important  medical   research  study  by  many  months,  or  the  time  the  research  team  spent  analyzing  and  summarizing   relevant  research.23  both  reports  advocate  an  exception  to  intellectual  property  rights  for   noncommercial  text-­‐mining  research  to  balance  the  protection  of  intellectual  property  with  the   access  needs  of  researchers.  a  centrally  maintained  repository  for  text  mining  has  been  proposed,   although  its  creation  would  face  significant  challenges.24   scholarly  journal  content  is  the  raw  “ore”  for  text  mining  and  bnlp.  the  lack  of  access  to  this  ore   creates  a  bottleneck  for  researchers.  “new  business  models  for  supporting  text  mining  within  the   scholarly  publishing  community  are  being  explored;  however,  evidence  suggests  that  in  some   cases  lack  of  understanding  of  the  potential  is  hampering  innovation.”25  bnlp  and  machine-­‐ learning  research  products  are  more  accurate  and  complete  when  more  content  is  available  for   text  mining.  “knowledge  discovery  is  the  search  for  hidden  information.  .  .  .  hence  the  need  is  to   start  looking  as  widely  as  possible  in  the  largest  set  of  content  sources  possible.”26  however,  as   noted  in  a  nature  article,  “the  question  is  how  to  make  progress  today  when  much  research  lies   behind  subscription  firewalls  and  even  ‘open’  content  does  not  always  come  with  a  text-­‐mining   license.”27  large  scientific  publishers  are  facing  economic  challenges,  and  potentially  diminished   economic  returns,  as  the  tension  over  the  right  to  use  licensed  content  heats  up.  
nature,  the   flagship  of  a  major  scientific  publisher,  predicted  “trouble  at  the  text  mine”  if  researchers  lack   access  to  the  contents  of  research  publications.28  and  a  2012  investment  report  predicted  slower     information  technology  and  libraries  |  september  2014   9   earnings  growth  for  elsevier,  the  largest  stem  publisher,  if  it  blocked  access  to  licensed  content   by  text-­‐mining  researchers.  the  review  predicted,  “if  the  academic  community  were  to  conclude   that  the  commercial  terms  imposed  by  elsevier  are  also  hindering  the  progress  of  science  or  their   ability  to  efficiently  perform  research,  the  risk  of  a  further  escalation  of  the  acrimony  [between   elsevier  and  the  academic  community]  rises  substantially.”29  with  open  access  alternatives   proliferating,  including  making  federally  funded  research  freely  accessible,  stem  publishers  are   under  increased  pressure  to  respond  to  market  forces.  “the  greatest  challenge  for  publishers  is  to   create  an  infrastructure  that  makes  their  content  more  machine-­‐accessible  and  that  also  supports   all  that  text-­‐miners  or  computational  linguists  might  want  to  do  with  the  content.”30  on  the  other   end  of  the  spectrum,  researchers  are  struggling  to  gain  legal  access  to  as  much  content  as  possible.     academic  libraries  have  long  excelled  at  serving  as  the  bridge  between  researchers  and  publishers   and  can  expand  their  roles  to  include  navigating  the  uncharted  territory  of  obtaining  text-­‐mining   rights  for  content.  increasing  the  library’s  role  in  text  mining  and  other  associated  bnlp  and   machine-­‐learning  methods  offers  tremendous  potential  for  greater  institutional  relevance  and   service  to  researchers.31  at  cu  anschutz’s  hsl,  fox  and  williams,  an  acquisitions  librarian,  found   natural  opportunities  for  collaboration  including  negotiating  rights  to  content  more  efficiently   through  expanded  licensing  arrangements  and  facilitating  the  secure  transfer  and  storage  of  data   to  protect  researchers  and  publishers.   method   hunter  and  fox  began  working  in  2011  to  obtain  a  large  corpus  of  biomedical  journal  articles  in   xml  format  to  create  a  body  of  text  as  comprehensive  as  possible  for  bnlp  experimentation  that   would  further  advance  hunter’s  research  group’s  knowledge-­‐based  analysis  system.  the  desired   result  was  an  aggregated  collection  obtained  from  multiple  publishers,  stored  locally,  and   available  on  demand  for  the  knowledge-­‐based  analysis  system  to  process.  hunter  and  fox  soon   realized  that  “the  process  of  obtaining  or  granting  permissions  for  text  mining  is  daunting  for   researchers  and  publishers  alike.  researchers  must  identify  the  publishers  and  discover  the   method  of  obtaining  permission  for  each  publisher.  most  publishers  currently  consider  mining   requests  on  a  case  by  case  basis.”32  they  pursued  a  multifaceted  strategy  to  build  a  robust   collection  and  to  determine  which  strategy  proved  most  fruitful  because,  during  a  grant  review,   national  library  of  medicine  staff  wanted  evidence  of  access  to  an  xml  collection  before  awarding   a  grant.     
fox  first  approached  two  open-­‐access  publishers,  biomed  central  (bmc)  and  public  library  of   science  (plos),  to  request  access  to  xml  text  from  journals  in  the  subjects  of  life  and  biomedical   science.  fox  had  existing  contacts  within  both  organizations  and  an  agreement  was  reached  to   obtain  xml  journal  articles.  letters  of  understanding  were  quickly  obtained  as  both  publishers   were  excited  about  exploring  new  ways  for  their  research  publications  to  be  accessed  and  the   potential  to  increase  the  use  of  their  journals.  possible  journal  titles  were  identified  and     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   10   arrangements  were  made  to  transfer  and  store  files  locally  from  bmc  and  plos  to  hunter’s   research  group.   hunter  approached  staff  at  pubmedcentral  (pmc)  to  request  access  to  articles  and  discovered   they  could  only  be  made  available  with  permission  from  publishers.  a  wiley  research  and  product   development  executive  granted  hunter  permission  to  access  wiley  articles  in  pmc.  the  wiley   executive  was  interested  in  learning  what  impact  text  mining  might  have  on  wiley  products.   hunter’s  research  group  planned  to  transfer  document  type  definition  (dtd)  format  files  from   pmc.  unfortunately,  when  hunter’s  research  group  staff  requested  file-­‐transfer  assistance  from   pmc,  no  pmc  staff  were  available  to  provide  the  technical  help  needed  because  of  budget   reductions.  pmc  staff  could  accurately  evaluate  their  time  commitment  because  they  had  a  clear   understanding  of  the  xml  access  and  transfer  process,  and  knew  they  could  not  allocate  resources   to  the  effort.     hunter  then  began  to  leverage  his  professional  network  connections  to  obtain  content  from  a   major  stem  vendor.  research  and  development  division  directors  within  the  company  were   familiar  with  the  work  of  hunter’s  research  group  and  were  willing  to  provide  assistance  in   acquiring  content.  however,  when  the  research  group  began  to  perform  research  using  this  data,   further  investigation  determined  that  the  contents  were  not  adequate  for  the  research.  follow-­‐up   between  fox,  the  research  group,  and  the  vendor  revealed  that  the  group’s  needs  were  not   communicated  in  the  vendor’s  vernacular,  resulting  in  the  group  not  clearly  understanding  what   content  the  vendor  was  providing.  this  disconnect  occurred  in  the  communication  flow  from  the   research  group  to  the  vendor’s  research  and  development  staff  to  the  vendor’s  sales  staff  (who   identified  the  content  to  be  shared).  it  was  a  like  a  game  of  telephone  tag.   after  the  initial  strategies  produced  mixed  results,  hunter’s  research  group  hypothesized  that  they   could  harvest  materials  through  hsl’s  journal  subscriptions.  hunter’s  research  group  attempted   to  crawl  and  download  journal  content  being  provided  by  hsl’s  subscription  to  a  major  chemistry   publisher.  
since publishers monitor for web crawling of their content, the chemistry publisher became aware of the unusual download activity, turned off campus access, and notified the library that there may have been an unauthorized attempt to access the publisher's content. researchers are often unaware of complex copyright and license compliance requirements. in fact, librarians sometimes become aware of text-mining projects only after automated downloads of licensed content prompt vendors to shut off campus access.33 libraries can prevent interruption of campus-wide access to important resources by suggesting more effective content-access methods. williams, an hsl acquisitions librarian, investigated the interruption in access and discovered hunter's research group's efforts to obtain journal articles to text mine for their research. she offered to use her expertise in acquiring content to help hunter's research group obtain the dataset needed for their research. initially, hunter and fox had not included an acquisitions librarian because that position was vacant. after williams became involved, the effort focused on acquiring content through negotiation and licensing with individual publishers. results "there are a large number of resources to help the researcher who is interested in doing text mining" but "no similar guide to obtaining the necessary rights and permissions for the content that is needed."34 at cu anschutz, this vacuum was filled by williams, who is knowledgeable about the acquisition of content, and fox, who is knowledgeable about hunter's research, serving as the bridge between the research group and the stem publisher. by working together and capitalizing on each other's expertise, williams and fox were able to facilitate the collaboration that developed a framework for purchasing a large collection of full-text journal articles in xml format. as the collaboration progressed, three major elements of the framework surfaced: a pricing model, a license agreement, and the dataset and delivery mechanism. researchers interested in legally text mining journal content often find themselves having to execute a license agreement and pay a fee.35 what should the fee be based on to create a fair and equitable pricing model? publishers establish pricing for library clients on the basis of not only the content but many value-added services, such as the breadth of titles aggregated and made available for purchase in a single product, the creation of a platform to access the journal titles, the indexing and searching functionality within the platform, and the production of easily readable pdf versions of articles. these value-added services are not required for text-mining endeavors. rather, the product is the raw journal content that has been peer-reviewed, edited, and formatted in xml, before the addition of any value-added services.
therefore  the  pricing  should  not  be   equivalent  to  the  cost  of  a  library’s  subscription  to  a  journal  or  package  of  journals.  in  the  end,   after  lengthy  negotiations,  the  pricing  model  for  the  hunter’s  research  group  collection  of  full-­‐text   journal  articles  in  xml  format  consisted  of   • a  cost  per  article;   • a  minimum  purchase  of  400,000  articles  for  one  sum  on  the  basis  of  the  cost  per  article;   • an  annual  subscription  for  the  minimum  purchase  of  400,000;   • the  ability  to  subscribe  to  additional  articles  in  excess  of  400,000  in  quantities  determined   by  hunter’s  research  group;   • a  volume  discount  off  the  per  article  price  for  every  article  purchased  in  excess  of  400,000;   • inclusion  of  the  core  journal  titles  purchased  via  the  library’s  subscription  at  no  charge;     • inclusion  of  the  core  journal  titles  purchased  by  the  university  of  colorado  boulder  at  no   charge  because  of  hunter’s  joint  appointment  at  both  cu  boulder  and  cuanschutz   campuses;  and   • a  requirement  for  hsl  to  maintain  its  subscription  to  the  vendor’s  product  at  its  current   level.     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   12   “where  institutions  already  have  existing  contracts  to  access  particular  academic  publications,  it  is   often  unclear  whether  text  mining  is  a  permissible  use.”36  from  the  beginning,  common  ground   was  easily  found  on  the  subject  of  core  titles  purchased  by  the  two  campuses’  libraries.  core  titles   are  typically  those  journals  that  libraries  pay  a  premium  for  to  obtain  perpetual  rights  to  the   content.  most  of  the  negotiation  focused  on  access  titles,  which  are  journals  that  libraries  pay  a   nominal  fee  to  have  access  to  without  any  perpetual  rights  included.   the  final  challenge  related  to  cost  was  determining  how  to  process  and  pay  for  the  product.   hunter’s  research  group  operates  on  major  grant  funding  from  federal  government  agencies.  the   university  of  colorado  requires  additional  levels  of  internal  controls  and  approvals  to  expend   grant  funds  as  well  as  to  track  expenditures  to  meet  reporting  requirements  of  the  funding   agencies.  also,  grant  funding  of  this  type  often  spans  multiple  fiscal  years  whereas  the  library’s   budget  operates  on  a  single  fiscal  year  at  a  time.  therefore  it  was  decided  that  hunter  would   handle  payment  directly  rather  than  transferring  funds  to  hsl  to  make  payment  on  their  behalf.   “libraries  as  the  licensee  of  publishers’  content  are  from  that  perspective  interested  in  the  legal   framework  around  content  mining.”37  during  price  negotiations,  williams  recommended   negotiating  a  license  agreement  similar  to  those  libraries  and  publishers  execute  for  the  purchases   of  journal  packages.  a  license  agreement  would  offer  a  level  of  protection  for  all  parties  involved   while  clearly  outlining  the  parameters  of  the  transaction.  hunter  and  the  stem  publisher  readily   agreed.     
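the pricing structure negotiated above can be summarized as a small calculation. the per-article price and volume-discount rate below are invented placeholders, since the article does not disclose the negotiated figures; only the shape of the model (a paid minimum of 400,000 articles, a discounted rate for articles beyond that, and core titles contributed at no additional charge) comes from the case study.

```python
# back-of-the-envelope sketch of the tiered pricing model described above.
# the numeric values are hypothetical placeholders, not the negotiated terms.
MINIMUM_ARTICLES = 400_000
PRICE_PER_ARTICLE = 0.10      # hypothetical cost per article, in dollars
VOLUME_DISCOUNT = 0.25        # hypothetical discount beyond the minimum

def annual_cost(articles_requested):
    """estimate the annual subscription cost for a requested corpus size."""
    base = MINIMUM_ARTICLES * PRICE_PER_ARTICLE       # minimum purchase is always paid
    extra = max(0, articles_requested - MINIMUM_ARTICLES)
    discounted_price = PRICE_PER_ARTICLE * (1 - VOLUME_DISCOUNT)
    return base + extra * discounted_price

# usage: annual_cost(400_000) -> 40000.0; annual_cost(600_000) -> 55000.0
```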
the  final  license  agreement  contained  ten  sections  including  definitions;  subscription;  obligations;   use  of  names;  financial  arrangement;  term;  proprietary  rights;  warranty,  indemnity,  disclaimer,   and  limitation  of  liability;  and  miscellaneous.  while  the  license  agreement  was  similar  to   traditional  license  agreements  between  libraries  and  publishers  for  journal  subscriptions,  there   were  some  notable  differences.  first,  in  the  definitions  section,  users  were  defined  and  limited  to   hunter  and  his  research  team.  this  limited  the  users  to  a  specific  group  of  individuals  unlike   typical  library–publisher  license  agreements  that  license  content  for  the  entire  campus.     second,  the  subscription  section  covered  how  the  data  can  be  used  in  detail  and  allowed  the   dataset  to  be  installed  locally.  this  was  important  to  make  the  dataset  available  on  demand  to   researchers;  to  allow  researchers  to  manipulate,  segment,  and  store  the  data  in  multiple  ways   instead  of  as  one  large  dataset;  and  to  allow  the  researchers  the  ability  to  access  and  use  the  large   dataset  efficiently  and  quickly.  because  the  dataset  would  be  manipulated  so  extensively,  the   license  gave  permission  to  create  a  backup  copy  and  store  it  separately.  the  subscription  section   also  required  the  dissemination  of  the  research  results  to  occur  in  such  a  way  that  the  dataset   could  not  be  extracted  and  used  by  others.  this  was  significant  because  prof.  hunter  releases  the   bnlp  software  applications  they  develop  as  open  source  software  so  that  the  applications  can  be   open  to  peer  review  and  attempts  at  reproduction.  ideally,  someone  could  download  the  open   source  software,  obtain  the  same  corpus  as  input,  and  see  the  same  output  mentioned  in  the  paper.     information  technology  and  libraries  |  september  2014   13   third,  the  obligations  section  was  radically  different  from  traditional  library–publisher  license   agreements  because  even  though  “publishers  are  still  working  out  how  to  take  advantage  of  text   mining  .  .  .  none  wants  to  miss  out  on  the  potential  commercial  value.”38  this  interest  prompted   the  crafting  of  an  atypical  obligations  section  in  the  license  agreement  that  included  an  option  for   hunter  to  collaborate  with  the  stem  publisher  to  develop  and  showcase  an  application  on  the   vendor’s  website  and  included  a  commitment  for  hunter  to  meet  quarterly  with  the  vendor’s   representatives  to  discuss  advances  in  research.  furthermore,  the  obligations  section  specified  a   request  for  hunter  and  the  university  of  colorado  to  recognize  the  vendor  where  appropriate  and   a  right  for  the  stem  publisher  to  use  any  research  software  application  released  as  open  source.   up  to  this  point,  williams  had  been  collaborating  with  the  university  of  colorado  in-­‐house  counsel   to  review  and  revise  the  license  agreement.  when  the  stem  publisher  requested  the  right  to  use   the  software  application,  williams  was  required  to  submit  the  license  agreement  to  the  university   of  colorado‘s  technology  transfer  office  for  review  and  approval.  
approval  was  prompt  in  coming,   primarily  because  prof.  hunter  releases  his  software  applications  as  open  source.   fourth,  the  license  agreement  included  a  “use  of  names”  section,  which  is  not  found  in  typical   library–publisher  agreements.  this  section  authorized  the  vendor  to  use  factual  information   drawn  from  a  case  study  in  market-­‐facing  materials  and  a  requirement  that  the  vendor  request   written  consent,  as  required  from  the  university  of  colorado  system,  for  information  in  the  case   study  to  be  released  for  market  facing  materials.  the  vendor  also  agreed  not  to  use  the  university   of  colorado’s  trademark,  service  mark,  trade  name,  copyright,  or  symbol  without  prior  written   consent  and  to  use  these  items  in  accordance  with  the  university  of  colorado  system’s  usage   guidelines.     fifth,  the  vendor  agreed  not  to  represent  in  any  way  that  the  university  of  colorado  or  its   employees  endorse  the  vendor’s  products  or  services.  this  is  extremely  important  because  the   university  of  colorado’s  controller  does  not  allow  product  endorsements  because  of  the  federal   unrelated  business  income  tax.  exempt  organizations  are  required  to  pay  this  tax  if  engaged  in   activities  that  are  regularly  occurring  business  activities  that  do  not  further  the  purpose  of  the   exempt  organization.39     finally,  the  license  agreement  stated  all  items  would  be  provided  in  xml  format  with  a  unique   digital  object  identifier  (doi)  number,  essential  for  linking  xml  content  to  real-­‐world  documents   that  researchers  using  hunter’s  research  group’s  knowledge-­‐based  analysis  system  would  want  to   access.   after  a  pricing  model  and  license  agreement  were  finalized,  the  focus  turned  to  the  last  major   element  of  the  framework:  the  dataset  and  delivery  mechanism.  elements  such  as  quality  of  the   corpora  contents,  file  transfer  time,  and  storage  capacity  are  all  important.  in  other  words,  “the   need  is  to  start  looking  as  widely  as  possible  in  the  largest  set  of  content  sources  possible.  this   need  is  balanced  by  the  practicalities  of  dealing  with  large  amounts  of  information,  so  a  choice     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   14   needs  to  be  made  of  which  body  of  content  will  most  likely  prove  fruitful  for  discovery.  text  mines   are  dug  where  there  is  the  best  chance  of  finding  something  valuable.”40   when  building  an  xml  corpora  for  research,  hunter’s  research  group  wanted  to  maximize  their   return  on  investment,  so  a  pilot  download  was  conducted  to  assure  that  the  most  beneficial   content  could  be  transferred  smoothly  to  a  local  server.  “permissions  and  licensing  is  only  a  part   of  what  is  needed  to  support  text  mining.  the  content  that  is  to  be  mined  must  be  made  available   in  a  way  that  is  convenient  for  the  researcher  and  the  publisher  alike.”41  this  pilot  phase  allowed   hunter’s  researchers  and  the  vendor’s  technical  personnel  to  clarify  the  requirements  of  the   dataset  and  to  efficiently  deliver  and  accurately  invoice  for  content.  
one of the initial obstacles was that a filter for the delivery mechanism didn't exist. letters to the editor, errata, and more were all counted as articles. hunter's researchers quickly determined that research articles were most important at this point in the development of the knowledge-based analysis system. how should a useful or minable article be defined: by its length, by xml tags indicating content type, or by some other criteria? roeder, a software engineer, used article attributes and characteristics embedded in xml tags to define an article as including all of the following:
• an abstract
• a body
• at least 40 lines of text
• none of the following tags: corrigendum, erratum, book review, editorial, introduction, preface, correspondence, or letter to the editor
(a minimal sketch of such a filter appears below.) in the end, hunter's research group and the vendor agreed to transmit everything and allow the group fifteen business days to evaluate the content. the research group would then notify the vendor of how many "articles" were received. this process would continue until 400,000 "articles" were received. after more than a year spent developing a structure to purchase a large corpus of journal articles to text mine, and just as hunter's research group was ready to execute the license, remit payment, and receive the articles, their federal grant expired, stalling the purchase. in retrospect, this unfortunate development was the catalyst for a shift in philosophy and strategy for the researchers and librarians at cu anschutz. discussion xml text-mining efforts will continue to expand, leading to increased demand on libraries and librarians to play a role in securing content. publishers, researchers, and libraries see the potential commercial and research value of text mining journal content and are driving the rapid evolution of this arena, in part, because "there is increasing demand from public and charitable funders that maximum value is leveraged from their substantial investment and this includes making outputs accessible and usable. . . . text mining offers the potential for fuller use of the existing publicly-funded research base."42
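the filter criteria listed in the results section above translate naturally into a short program. the sketch below assumes jats-like element names and an article-type attribute; the vendor's actual markup, and the exact rule for counting "lines of text," are not specified in the article, so both are assumptions.

```python
# sketch of the article filter described above: keep a document only if it
# has an abstract, a body, at least 40 lines of text, and is not one of the
# excluded article types. element and attribute names are hypothetical.
import xml.etree.ElementTree as ET

EXCLUDED_TYPES = {
    "corrigendum", "erratum", "book-review", "editorial",
    "introduction", "preface", "correspondence", "letter-to-the-editor",
}

def is_minable_article(xml_path, minimum_lines=40):
    root = ET.parse(xml_path).getroot()
    article_type = (root.get("article-type") or "").lower()
    if article_type in EXCLUDED_TYPES:
        return False
    if root.find(".//abstract") is None or root.find(".//body") is None:
        return False
    # approximate "lines of text" as non-empty lines in the body text;
    # the original counting rule is not documented, so this is an assumption
    body = root.find(".//body")
    body_lines = [ln for ln in "".join(body.itertext()).splitlines() if ln.strip()]
    return len(body_lines) >= minimum_lines
```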
the  various   types  of  “costs  are  currently  borne  by  researchers  and  institutions,  and  are  a  strong  hindrance  to   text  mining  uptake.  these  could  be  reduced  if  uncertainty  is  reduced,  more  common  and   straightforward  procedures  are  adopted  across  the  board  by  license  holders,  and  appropriate   solutions  for  orphaned  works  are  adopted.  however,  the  transaction  costs  will  still  be  significant  if   individual  rights  holders  each  adopt  different  licensing  solutions  and  barriers  inhibiting  uptake   will  remain.”45   in  a  survey  of  libraries,  findings  indicated  that  librarians  anticipate  a  new  role  as  facilitators   between  researchers  and  publishers  to  enable  text  mining.46  librarians  are  a  natural  fit  for  this   role  because  they  already  have  expertise  in  navigating  copyright,  requesting  copyright   permissions,  and  negotiating  license  agreements  for  journal  content.  “advice  and  guidance  should   be  developed  to  help  researchers  get  started  with  text  mining.  this  should  include:  when   permission  is  needed;  what  to  request;  how  best  to  explain  intended  work  and  how  to  describe   the  benefits  of  research  and  copyright  owners.”47   after  their  experience  with  developing  a  framework  to  license  and  purchase  a  large  corpora  of   journal  articles  in  xml  format  to  be  text  mined,  fox  and  williams  came  to  believe  that,  in  addition   to  providing  copyright  expertise,  librarians  should  assist  in  reducing  transaction  costs  by   developing  model  license  clauses  for  text  mining  and  routinely  negotiating  for  these  rights  when   the  library  purchases  journals  and  other  types  of  content.  adopting  this  philosophy  and  strategy   led  williams  and  fox  to  successfully  advocate  for  the  inclusion  of  a  text-­‐mining  clause  in  the   license  agreement  for  the  stem  publisher  in  this  case  study  at  the  time  of  the  library’s   subscription  renewal.  this  occurred  at  a  regional  academic  consortium  level,  making  text  mining   easier  at  fourteen  academic  institutions.  furthermore,  the  university  of  colorado  libraries,  which   includes  five  libraries  on  four  campuses,  is  now  working  on  drafting  a  model  clause  to  use  when   purchasing  journal  content  as  the  university  of  colorado  system  and  to  put  forth  for  consideration   by  the  consortiums  that  facilitate  the  purchase  of  our  major  journal  packages.  given  that   incorporating  text  mining  clauses  into  library–publisher  license  agreements  for  scholarly  journals   is  in  its  infancy,  there  are  few  resources  available  to  assist  librarians  adopting  this  new  role.  model   clauses  include  the  following:     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   16   • british  columbia  electronic  library  network’s  model  license  agreement48   o clause  3.11.  “data  and  text  mining.  members  and  authorized  users  may  conduct   research  employing  data  or  text  mining  of  the  licensed  materials  and  disseminate   results  publicly  for  non-­‐commercial  purposes.”     • california  digital  library’s  standard  license  agreement49   o section  iv.  authorized  use  of  licensed  materials.  “text  mining.  
authorized  users   may  use  the  licensed  material  to  perform  and  engage  in  text  mining/data  mining   activities  for  legitimate  academic  research  and  other  educational  purposes.”     • jisc’s  model  license  for  journals50   o clause  3.1.6.8.  “use  the  licensed  material  to  perform  and  engage  in  text   mining/data  mining  activities  for  academic  research  and  other  educational   purposes  and  allow  authorised  users  to  mount,  load  and  integrate  the  results  on  a   secure  network  and  use  the  results  in  accordance  with  this  license.”   o clause  9.3.  “for  the  avoidance  of  doubt,  the  publisher  hereby  acknowledges  that  any   database  rights  created  by  authorised  users  as  a  result  of  textmining/datamining  of   the  licensed  material  as  referred  to  in  clause  3.1.6.8  shall  be  the  property  of  the   institution.”   publishers  are  also  beginning  to  break  down  barriers  perhaps,  in  part,  because  of  the  sentiment   that  “privately  erected  barriers  by  copyright  holders  that  restrict  text  mining  of  the  research  base   could  be  increasingly  regarded  as  inequitable  or  unreasonable  since  the  copyright  holders  have   borne  only  a  small  proportion  of  the  costs  involved  in  the  overall  process;  furthermore,  they  do   not  have  rights  or  ownership  of  the  inherent  facts  or  ideas  within  the  research  base.”51  biomed   central  and  plos  both  offer  services  that  allow  researchers  to  access  xml  text  collections.  biomed   central  makes  content  readily  accessible  by  providing  a  website  for  bulk  download  of  xml  text.52   plos  requires  contact  with  a  staff  member  for  download  of  xml  text.53  in  december  2013,   elsevier  also  announced  that  it  would  create  a  “big  data”  center  at  the  university  college  london   to  allow  researchers  to  work  in  partnership  with  mendeley,  a  knowledge  management  and   citation  application  now  owned  by  elsevier.  while  this  is  a  positive  step,  the  partnership  does  not   appear  to  make  the  data  available  to  research  groups  beyond  the  university  college  london.54     however,  there  is  still  a  long  way  to  go  before  publishers  and  librarians  are  routinely   collaborating  on  opening  up  the  scholarly  literature  to  be  mined.  for  example,  a  2012  nature   editorial  states  “nature  publishing  group,  which  also  includes  this  journal,  says  that  it  does  not   charge  subscribers  to  mine  content,  subject  to  contract.”55  repeated  attempts  by  williams  to   obtain  more  information  from  nature  publishing  group  and  a  copy  of  the  contract  have  proved   fruitless.     in  january  2014,  elsevier  announced  that  “researchers  at  academic  institutions  can  use  elsevier’s   online  interface  (api)  to  batch-­‐download  documents  in  computer-­‐readable  xml  format”  after     information  technology  and  libraries  |  september  2014   17   signing  a  legal  agreement.  elsevier  will  limit  researchers  to  accessing  10,000  articles  per  week.56,57   for  small-­‐scale  projects  with  a  narrow  scope,  this  limit  will  suffice.  for  example,  mining  the   literature  for  a  specific  gene  that  plays  a  known  role  in  a  disease  could  require  a  text  set  under   30,000  articles.  
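the practical consequence of a 10,000-articles-per-week cap is easiest to see as arithmetic. a minimal sketch, assuming only the weekly limit reported above and the corpus sizes discussed in this article:

```python
# how long a corpus takes to assemble under a fixed weekly download cap
ARTICLES_PER_WEEK = 10_000

def weeks_to_build(corpus_size, per_week=ARTICLES_PER_WEEK):
    # round up: a partial week still has to be waited out
    return -(-corpus_size // per_week)

print(weeks_to_build(30_000))    # 3 weeks for a narrowly scoped project
print(weeks_to_build(400_000))   # 40 weeks, most of a year of elapsed time
```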
at  elsevier’s  current  rate  of  article  transfer,  a  30,000  article  text  set  could  be   created  in  roughly  three  weeks.  however,  for  large-­‐scale  projects  such  as  hunter’s  research   group’s  knowledge-­‐based  analysis  system  that  require  a  text  set  of  400,000  articles  (or  much   more,  if  not  limited  by  budget  constraints),  nearly  a  year  of  time  would  be  required  to  build  the   corpora.  time  is  one  of  the  most  valuable  commodities  in  computational  biology.  the  elapsed  time   required  to  transfer  articles  at  the  rate  of  10,000  articles  per  week  represents  a  bottleneck  that   most  grant-­‐funded  research  cannot  afford.  speed  of  transfer  will  also  be  a  factor.  researchers   require  flexibility  to  maximize  available  central  processing  unit  (cpu)  hours  because  documents   can  take  from  a  few  seconds  to  a  full  minute  each  to  transfer  to  the  storage  destination.   monopolizing  peak  hours  in  high  performance  computing  (hpc)  settings  may  mean  that   computing  power  is  not  available  for  other  tasks,  although  many  hpc  centers  have  learned  to   allocate  cpu  use  more  efficiently  to  high  volumes.  furthermore,  the  terms  and  conditions  set  by   elsevier  for  output  limits  excerpting  from  the  original  text  to  200  characters.58  this  is  roughly   equivalent  to  two  lines  of  text  or  approximately  forty  words.  this  may  be  insufficient  to  capture   important  biological  relationships  necessary  to  evaluate  the  relevance  of  the  article  to  the   research  being  represented  by  the  hanalyzer  knowledge-­‐based  analysis  system.     conclusion   forging  a  partnership  between  a  library,  a  research  lab,  and  a  major  stem  vendor  requires   flexibility,  patience,  and  persistence.  our  experience  strengthened  the  existing  relationship   between  the  library  and  the  research  lab  and  demonstrated  the  library’s  willingness  and  ability  to   support  faculty  research  in  a  nontraditional  method.  librarians  are  encouraged  to  advocate  for   the  inclusion  of  text-­‐mining  rights  in  their  library’s  license  agreements  for  electronic  resources.   what  the  future  holds  for  publishers,  researchers,  and  libraries  involved  in  text  mining  remains  to   be  seen.  however,  what  is  certain  is  that  without  cooperation  between  publishers,  researchers,   and  libraries,  breaking  down  the  existing  barriers  and  achieving  standards  for  content  formats   and  access  terms  will  remain  elusive.   references     1.     university  of  colorado  anschutz  medical  campus,  university  of  colorado  anschutz  medical   campus  quick  facts,  2013,   http://www.ucdenver.edu/about/whoweare/documents/cuanschutz_facts_041613.pdf.     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   18     2.     sonia  m.  leach  et  al.,  “biomedical  discovery  acceleration,  with  applications  to  craniofacial   development,”  plos  computational  biology  5,  no.  3  (2009):  1–19,   http://dx.doi.org/10.1371/journal.pcbi.1000215.   3.     jonathan  clark,  text  mining  and  scholarly  publishing  (publishing  research  consortium,  2013).   4.     corie  lok,  “literature  mining:  speed  reading,”  nature  463  (2010):  416–18,   http://dx.doi.org/10.1038/463416a.   5.     
hong-­‐jie  dai,  yen-­‐ching  chang,  richard  tzong-­‐han  tsai,  wen-­‐lian  hsu,  "new  challenges  for   biological  text-­‐mining  in  the  next  decade,"  journal  of  computer  science  and  technology  25,   no.1  (2010):  169-­‐179,  doi:  10.1007/s11390-­‐010-­‐9313-­‐5.     6.     anne  hoekman,  “journal  publishing  technologies:  xml,”   http://www.msu.edu/~hoekmana/wra%20420/ismte%20article.pdf.   7.     alex  brown,  "xml  in  serial  publishing:  past,  present  and  future,"  oclc  systems  &  services  19,   no.  4,  (2003):149-­‐154,  doi:  10.1108/10650750310698775.   8.     cartic  ramakrishnan  et  al.,  “layout-­‐aware  text  extraction  from  full-­‐text  pdf  of  scientific   articles,”  source  code  for  biology  and  medicine  7,  no.  7  (2012),   http://dx.doi.org/10.1186/1751-­‐0473-­‐7-­‐7.   9.     ibid.   10.    lawrence  hunter  and  k.  bretonnel  cohen,  “biomedical  language  processing:  perspective   what’s  beyond  pubmed?”  molecular  cell  21,  no.  5,  (2006):  589–94.   11.    martin  krallinger,  alfonso  valencia,  and  lynette  hirschman,  “linking  genes  to  literature:  text   mining,  information  extraction,  and  retrieval  applications  for  biology,”  genome  biology  9,   supplement  2  (2008):  s8.1–s8.14,  http://dx.doi.org/10.1186/gb-­‐2008-­‐9-­‐s2-­‐s8.   12.    eefke  smit  and  maurits  van  der  graaf,  “journal  article  mining:  the  scholarly  publishers’   perspective,”  learned  publishing  25,  no.  1  (2012):  35–46,   http://dx.doi.org/10.1087/20120106.   13.    hunter  and  cohen,  “biomedical  language  processing,”  589.   14.    clark,  text  mining  and  scholarly  publishing.   15.    leach  et  al.,  “biomedical  discovery  acceleration.”   16.    marti  hearst,  “what  is  text  mining?”  october  17,  2003,   http://people.ischool.berkeley.edu/~hearst/text-­‐mining.html.     information  technology  and  libraries  |  september  2014   19     17.    k.  bretonnel  cohen  and  lawrence  hunter,  “getting  started  in  text  mining,”  plos   computational  biology  4,  no.  1  (2008):  1–3,  http://dx.doi.org/10.1371/journal.pcbi/0040020.   18.    jisc,  “the  model  nesli2  licence  for  journals,”  2013,  http://www.jisc-­‐collections.ac.uk/help-­‐ and-­‐information/how-­‐model-­‐licences-­‐work/nesli2-­‐model-­‐licence-­‐/.   19.    ian  hargreaves,  “digital  opportunity:  a  review  of  intellectual  property  and  growth,”  may   2011,  http://www.ipo.gov.uk/ipreview-­‐finalreport.pdf.     20.    james  manyika  et  al.,  “big  data:  the  next  frontier  for  innovation,  competition,  and   productivity,”  mckinsey  &  company,  may  2011,   http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_inn ovation.   21.   hargreaves,  “digital  opportunity.”     22.    diane  mcdonald  and  ursula  kelly,  “the  value  and  benefits  of  text  mining  to  uk  further  and   higher  education,”  jisc,  2012,  http://www.jisc.ac.uk/reports/value-­‐and-­‐benefits-­‐of-­‐text-­‐ mining.   23.    jisc,  “the  model  nesli2  licence  for  journals.”   24.    smit  and  van  der  graaf,  “journal  article  mining.”   25.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   26.    clark,  text  mining  and  scholarly  publishing.   27.    “gold  in  the  text?”  nature  483  (march  8,  2012):  124,  http://dx.doi.org/10.1038/483124a.   28.    richard  van  noorden,  “trouble  at  the  text  mine,”  nature  483  (march  8,  2012):  134–35.   29.    
claudio  aspesi,  a.  rosso,  and  r.  wielechowski.  reed  elsevier:  is  elsevier  heading  for  a  political   train-­‐wreck?  2012.   30.    clark,  text  mining  and  scholarly  publishing.   31.    jill  emery,  “working  in  a  text  mine:  is  access  about  to  go  down?”  journal  of  electronic   resources  librarianship  20,  no.  3  (2009):135–38,   http://dx.doi.org/10.1080/19411260802412745.   32.    clark,  text  mining  and  scholarly  publishing:  14.   33. van  noorden,  “trouble  at  the  text  mine.” 34.   ibid. 35.    ibid.     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   20     36.    jisc,  “the  model  nesli2  licence  for  journals.”   37.    smit  and  van  der  graaf,  “journal  article  mining.”   38.    van  noorden,  “trouble  at  the  text  mine.”   39.    internal  revenue  service,  “unrelated  business  income  defined,”     http://www.irs.gov/charities-­‐&-­‐non-­‐profits/unrelated-­‐business-­‐income-­‐defined.   40.    clark,  text  mining  and  scholarly  publishing:  10.   41.    ibid:  14.   42.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   43.    smit  and  van  der  graaf,  “journal  article  mining.”   44.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   45.    ibid.   46.    smit  and  van  der  graaf,  “journal  article  mining.”   47.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   48.    british  columbia  electronic  library  network,  bc  eln  database  licensing  framework,     http://www.cdlib.org/services/collections/toolkit/.   49.      “licensing  toolkit,”  california  digital  library,   http://www.cdlib.org/services/collections/toolkit/.   50.    jisc,  “the  model  nesli2  licence  for  journals.”   51.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   52.    “using  biomed  central’s  open  access  full-­‐text  corpus  for  text  mining  research,”     http://www.biomedcentral.com/about/datamining.   53.    “help  using  this  site,”  plos,  http://www.plosone.org/static/help.   54.    iris  kisjes,  “university  college  london  and  elsevier  launch  ucl  big  data  institute,”  elsevier   connect,  press  release,  december  18,  2013,  http://www.elsevier.com/connect/university-­‐ college-­‐london-­‐and-­‐elsevier-­‐launch-­‐ucl-­‐big-­‐data-­‐institute.   55.    “gold  in  the  text?”   56.    richard  van  noorden,  “elsevier  opens  its  papers  to  text-­‐mining,”  nature  506  (february  2,   2014):  17.   57.    sciverse,  content  apis,  http://www.developers.elsevier.com/cms/content-­‐apis.     information  technology  and  libraries  |  september  2014   21     58.    “text  and  data  mining,”  elsevier,  ,  http://www.elsevier.com/about/universal-­‐access/content-­‐ mining-­‐policies.   108 information technology and libraries | june 2006 tutorial writing your first scholarly article: a guide for budding authors in librarianship scott nicholson this series of questions and answers is designed to help you take the first steps toward the successful production of a scholarly article in librarianship. you may find yourself in a library position that requires writing or you may have just decided that you are ready to share your findings, experiences, and knowledge with the current and future generations of librarians. 
while following the guidelines listed here will not guarantee that you will be successful, these steps will take you closer to discovering the thrill of seeing your name in print and making a difference in the field. what should i write about? perhaps you already have an idea based upon your experiences and expertise, or perhaps you aren’t sure which of those ideas you should write about. the best way to start writing is to read other articles! many scholarly articles end with a future research section that outlines other projects and questions that the article suggests. it is useful to contact the author of a piece that holds a future research seed to ensure that the author has not already taken on that challenge. sometimes, the original author may be interested in collaborating with you to explore that next question. how do i start? scholarship is an iterative process, in that works that you produce are bricks in an ever-rising wall. your brick will build upon the works of others and, once published, others will build upon your work. because of this, it is essential to begin with a review of related literature. search in bibliographic and citation databases as well as web search tools to see if others have done similar projects to your own. the advantage of finding related literature is that you can learn from the mistakes of others and avoid duplicating works (unless your plan is to replicate the work of others). starting with the work of others allows you to place your brick on the wall. if you do not explicitly discuss how your scholarship relates to the scholarship of others, only those having familiarity with the literature will be able to understand how your work fits in with that of previous authors. in addition, it’s easier to build upon your work if those who read it have a better idea of the scholarly landscape in which your work lives. as you go out and discover literature, it is crucial to keep citation information about each item. much of what you will cite will be book chapters or articles in journals, and you will save yourself time and trouble later if you make a printed copy of source items and record bibliographic information on that copy. recording the title of the work, the full names (including middle initials) of authors and editors, page range, volume, issue, date, publisher and place of publication, url and date accessed, and any other bibliographic information at the time of collection will save you headaches later when you have to create your references list. as different journals have different citation requirements, having all of this information allows you the flexibility of adapting to different styles. one type of scholarship produced by libraries is the “how our library did something well” article. while a case study of your library can be an appropriate area of discussion, it is critical to position these pieces within the scholarship of the field. this allows readers to better understand how applicable your findings are to their own libraries. the concept illustrates the difference between the practice of librarianship and library science. library science is the study of librarianship and includes the generalization of library practice in one setting to other settings. before starting your writing, talk about your idea with your colleagues, which will help you refine your ideas. it will also generate some excitement and publicity about your work, which can help inspire you to continue in the writing process. 
colleagues can help you consider different places where similar works may already exist and might even open your eyes to similar work in another discipline. you may find a colleague who wants to coauthor the piece with you, which can make the project easier to complete and richer through the collaborative process. another important early step is to consider the journals you would like to be published in. many times, it can be fruitful to publish in the journal that has published works that are in your literature review. considering the journal at this point will allow you to correctly focus the scope, length, and style of your article to the requirements of your desired journal. your article should match the length and tone of other articles in that journal. most journals provide instructions to authors in each issue or on the web; the information page for ital authors is at www.ala.org/ala/ lita/litapublications/ital/information authors.htm. how can i find funding for my research? some projects can’t be easily done in your spare time and require resources for surveys, statistical analysis, travel, or other research costs. you will find that successful requests for funding scott nicholson (srnichol@syr.edu) is an assistant professor in the school of information studies, syracuse university, new york. writing your first scholarly article | nicholson 109 start with a literature review and a research plan. developing these before requesting funding will make your request for funding much stronger, as you will be able to demonstrate how your work will sit within a larger context of scholarship. you will need to develop a budget for your funding request. this budget will come together more easily if you have planned out your research. it may be useful or even required for you to develop a set of outcomes for your project and how you will be assessing those outcomes (find more information on outcome-based evaluation through the imls web site at www.imls.gov/grants/current/ crnt_obe.htm). developing this plan will give you a more concrete idea of what resources you will need and when, as well as how you can use the results of your work. resources for research may come from the inside, such as the library or the parent organization of the library, or from an external source, such as a granting body or a corporate donor. in choosing an organization for selection, you should consider who would most benefit from the research, as the request for funding should focus on the benefit to the granting body. many libraries and schools do have small pots of money available for research that will benefit that institution and that, many times, go untapped due to a lack of interest. granting organizations put out formal calls for grant proposals. these can result in a grant that would carry some prestige but would require a detailed formal application that can take months of writing and waiting. another approach is to work with a corporate or nonprofit organization that gives grants. if your organization has a development office, this office may be able to help connect you with a potential supporter of your work. how do i actually do the research? just as the most critical part of a dissertation is the proposal, a good research plan will make your research process run smoothly. before you start the research, write the literature review and the research plan as part of an article. it can be useful to create tables and charts with dummy data that will show how you plan to present results. 
doing this allows you to notice gaps in your data-collection plan well before you start that process. in many research projects, you only have a single chance to collect data; therefore, it’s important to plan out the process before you begin. how do i start writing the paper? the best way to start the writing process is to just write. don’t worry about coming up with a title; the title will develop as the work develops. you can skip over the abstract and introduction; these can be much easier to write after the main body of the article is complete. if you’ve followed the advice in this paper, then you’ve already written a literature review and perhaps a research plan; these make a good starting point for your article. one way to develop the body of the article is to develop an outline of headings and subheadings. starting with this type of outline forces you to think through your entire article and can help you identify holes in your preparation. once you have the outline completed, you can then fill in the outline by adding text to the headings and subheadings. this approach will keep your thinking organized in a way typically used in scholarly writing. scholarly writing is different than creative writing. many librarians with a humanities background face some challenges in transitioning to a different writing style. scholarly writing is terse; strunk and white’s the elements of style (2000) focuses on succinct writing and can help you refresh your writing skills.1 if you are having difficulty finding the time to write, it can be useful to set a small quota of writing that you will do every day. a quota such as four paragraphs a day is a reasonable amount to fit into even a busy day, but it will result in the completion of your first draft in only a few weeks. i’m finished with my first complete draft! now what? while you will be excited with the completion of the draft, it’s not appropriate to send that off to a journal just yet. take a few days off and let your mind settle from the writing, then go back and reread your article carefully. examine each sentence for a subject and a verb, and remove unneeded words, phrases, sentences, paragraphs, or even pages. try to tighten and clean your writing by replacing figures of speech with statements that actually say what you mean in that situation and removing unneeded references to firstand second-person pronouns. working through the entire article in this way greatly improves your writing and reduces the review and editing time needed for the article. after this, have several colleagues read your work. some of these might be people with whom you shared your original ideas, and others may be new to the concepts. it can be useful to have members of different departments and with different backgrounds read the piece. ask them if they can read your work by a specific date, as this type of review work is easy to put off when work gets busy. these colleagues may be people who work in your institution or may be people you have met online. if you know nobody who would be appropriate, consider putting out a request for assistance on a library discussion list focused on your research topic. dealing with the comments from others requires you to set aside your 110 information technology and libraries | june 2006 defenses. you did spend a lot of time on this work and it can be easy to slip into a defensive mode. attempt to read their comments from an objective viewpoint. 
remember—these people are spending their time to help you, and a comment you disagree with at first blush may make more sense if you consider the question “why would someone say this about my work?” putting yourself into the reader’s shoes can aid you in the creation of a piece that speaks to many audiences. what goes on when i submit my work? at this point, your readers have looked at the piece, and you have made corrections on it. now you’re ready to submit your work. follow the directions of the target journal, including length, citation format, and method of submission. if submission is made by e-mail, it would be appropriate to send a follow-up e-mail a few days after submission to ensure the work was received; it can be very frustrating to realize, after a month of waiting, that the editor never got the work. once you have submitted your work, the editor will briefly review it to ensure it is an appropriate submission for the journal. if it is appropriate, then the editor will pass the article on to one or more reviewers; if not, you will receive a note fairly quickly letting you know that you should pick another journal. if the reviewing process is “blind,” then you will not know who your reviewers are, but they may know your identity. if the process is “double-blind,” neither reviewer nor author will know the identity of the other. the reviewers will read the article and then submit comments and a recommendation to the editor. the editor will collect comments from all of the reviewers and put them together, and send those comments to you. this will always take longer than you would prefer; in reality, it will usually take two to six months, depending upon the journal. after a few months, it would be appropriate for you to contact the editor and ask about the progress on the article and when you should expect comments. do not expect to have your article accepted on the first pass. the common responses are: ■ reject. at this point, you can read the comments provided, make changes, and submit it to another journal. ■ revise and resubmit. the journal is not making a commitment to you, but they are willing to take another look if you are willing to make changes. this is a common response for first submissions. ■ accept with major changes. the journal is interested in publishing the article, but it will require reworking. ■ accept with minor changes. you will be presented with a series of small changes. some of these might be required and others might be your choice. ■ accept. the article is through the reviewing process and is on to the next stage. this is an iterative process. you will most likely go through several cycles of this before your article is accepted, and staying dedicated to the process is key to its success. it can be disheartening to have made three rounds of changes only to face another round of small changes. ideally, each set of requested changes should be smaller (and take less time) until you reach the acceptance level. do not submit your work to multiple journals at the same time. if you choose to withdraw your work from one journal and submit it to another, let the editor know that you are doing this (assuming they have not rejected your work). my article has been accepted. when will it come out? once your article is accepted, it will be sent into a copyediting process. the copy editor will contact you with more questions that focus more on writing and citation flaws than on content. 
after making more corrections, you will receive a proof to review (usually with a very tight deadline). this proof will be what comes out in the journal, so check important things like your name, institutions, and contact information carefully. the journal will usually come out several months after you see this final proof. the process from acceptance to publication can take from six months to two years (or more), depending on how much of a publication queue the journal has. the editor should be able to give you an estimate as to when the article will come out after full acceptance. can i put a copy of my article online? it depends upon the copyright agreement that you sign. many publishers will allow you to put a copy of your article on a local or institutional web site with an appropriate citation. some allow you to put up a preprint, which would be the version after copyediting but not the final proof version. if the copyright agreement doesn’t say anything about this, then ask the editor of the journal about the policy of authors mounting their own articles on a web site. conclusion writing an article and getting it published is akin to having a child. your child will have a life of its own, and others may notice this new piece of knowledge and build upon it to improve their own library services writing your first scholarly article | nicholson 111 or even make their own works. it is a way to make a difference that goes far beyond the walls of your own library, to extend your professional network, and to engage other scholars in the continued development of the knowledge base of our field. reference 1. w. strunk jr. and e. b. white, the elements of style (boston: allyn & bacon, 2000). for more information: w. crawford, first have something to say: writing for the library profession (chicago: ala, 2003). r. gordon, the librarian’s guide to writing for publication (lanham, md.: scarecrow, 2004). l. hinchliffe and j. dorner, eds., how to get published in lis journals: a practical guide (san diego: elsevier, 2003), www .elsevier.com/framework_librarians/lib raryconnect/lcpamphlet2.pdf, (accessed feb. 8, 2006). adventure code camp: library mobile design in the backcountry david ward , james hahn, and lori mestre information technology and libraries | september 2014 45 abstract this article presents a case study exploring the use of a student coding camp as a bottom-up mobile design process to generate library mobile apps. a code camp sources student programmer talent and ideas for designing software services and features. this case study reviews process, outcomes, and next steps in mobile web app coding camps. it concludes by offering implications for services design beyond the local camp presented in this study. by understanding how patrons expect to integrate library services and resources into their use of mobile devices, librarians can better design the user experience for this environment. introduction mobile applications offer an exciting opportunity for libraries to expand the reach of their services, to build new connections, and to offer unique, previously unavailable services for their users. mobile apps not only provide the ability to present library services through mobile views (e.g., the library catalog and library website), but they can tap into an ever-increasing list of mobile-specific features. 
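the article itself contains no code, so the following is only an illustrative sketch of what "mobile-specific features" can mean in a library web app: the standard browser geolocation api used to rank a list of branches by proximity. the branch names, coordinates, and helper functions are hypothetical placeholders, not anything built at the camp.

```javascript
// illustrative sketch only: use the standard browser geolocation api to find
// the nearest library branch. branch data and helper names are hypothetical.
var branches = [
  { name: "main library", lat: 40.1046, lon: -88.2291 },
  { name: "engineering library", lat: 40.1125, lon: -88.2268 }
];

// rough squared-distance comparison; adequate for ranking nearby points
function distanceSquared(a, b) {
  var dLat = a.lat - b.lat;
  var dLon = a.lon - b.lon;
  return dLat * dLat + dLon * dLon;
}

function showNearestBranch() {
  if (!navigator.geolocation) {
    console.log("geolocation is not available on this device");
    return;
  }
  navigator.geolocation.getCurrentPosition(function (position) {
    var here = { lat: position.coords.latitude, lon: position.coords.longitude };
    var nearest = branches.slice().sort(function (x, y) {
      return distanceSquared(here, x) - distanceSquared(here, y);
    })[0];
    console.log("nearest branch: " + nearest.name);
  }, function (error) {
    console.log("could not get a location fix: " + error.message);
  });
}

showNearestBranch();
```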
by understanding how patrons expect to integrate library services and resources into their use of mobile devices, librarians can better design the user experience for this environment. by adjusting the normal app production workflow to directly involve students during the formative stages of mobile app conception and design, libraries have the potential to generate products that more accurately anticipate real-life student needs. this article details one such approach, which sources student talent to code apps in a fast-paced, collaborative setting. as part of a two-year institute of museum and library services (imls) grant, an academic library– based research team investigated three different methods for involving users in the app development process—a student competition, integrated computer science class projects, and the coding camp described in this article. the coding camp method focuses on a trend in mobile software development of having intensive two-to-three-day coding events that result in working prototypes of applications (e.g., iphonedevcamp, http://www.iphonedevcamp.org/). coders typically work in groups to simultaneously learn how new software works, and also develop a functioning app that matches an area of personal interest. camps promote collaboration, which provides additional networking and social outcomes to attendees. additionally, camps provide an david ward (dh-ward@illinois.edu) is reference services librarian, james hahn (imhahn@illinois.edu) is orientation services & environments librarian, and lori mestre (lmestre@illinois.edu) is head, undergraduate library and professor of library administration, university of illinois at urbana-champaign. http://www.iphonedevcamp.org/ mailto:dh-ward@illinois.edu mailto:imhahn@illinois.edu mailto:lmestre@illinois.edu adventure code camp: library mobile design in the backcountry | ward, hahn, and mestre 46 opportunity for software makers to promote their services and products, and they can result in new code and ideas on which to base future products. for academic libraries, a camp environment provides an educational opportunity for students, particularly those in a field with a computing or engineering focus, to learn new coding languages and techniques and to gain experience with a professional software production process that runs the full timeline from conception to finished product. coding camps offer a chance for librarians to get direct student feedback on their own software development goals. the resulting applications provide potential benefits to both groups—students have a functional prototype to enhance their classroom experiences and a codebase to build on for future projects, and the librarians gain an insight into students’ desires for the content of mobile apps, code to integrate into existing apps, and direct student input into the iterative design process. this article presents the results of a mobile application coding camp held in fall 2013. the camp method was tested as a way to explore a less timeand staff-intensive process for involving students in the creation of library mobile apps. three specific research questions framed this investigation: 1. what library and course-related needs do students believe would benefit from the development of a mobile application? 2. is the library providing access to data that is relevant to these needs, and is it available in a format (e.g., restful apis) that end users can easily adopt into their own application design process? 3. 
how viable is the coding camp method for generating usable mobile app prototypes? literature review in line with efforts in academic libraries to operationalize participatory design for novel service implementation,1 the library approach to code camps included sourcing student technical expertise in line with other tech companies’ approaches to quickly iterating prototypes that may advance or enhance company services. while coding camps happen in corporate settings, other types of camps try to publicize technologies, like a programming language, while others still are directed toward a specific cohort.2 the departure point for the library was in understanding other ways the library might consider organizing and pairing its resources of apis with other available campus services. a few highly visible and notable corporate “hackfests” or “hackdays” include the facebook hackdays, in which facebook timeline was developed (http://www.businessinsider.com/facebook-timeline-began-as-a-hackathon-project-2012-1). the mobile app company twitter also has monthly hackfests where employees from across the company work for a sustained period (a weekend or friday) on new ideas putting together prototypes that may transition into new services for the company. http://www.businessinsider.com/facebook-timeline-began-as-a-hackathon-project-2012-1 information technology and libraries | september 2014 47 an example of code camps from academia are the mhacks camps at the university of michigan (http://mhacks.challengepost.com/), among the largest code camps for university students in the midwest. these camps are notable for their funding from corporations and for their support of student travel from colleges around the country to participate at the university of michigan. at each event, coders are encouraged to make use of the corporate apis that student programmers may make use of once they graduate or form companies after graduation. on the professional front, digital library code meet-ups (such as that of the code4lib preconference: http://code4lib.org/) are an opportunity for library technologists to share strategies and new directions in software using hands-on group coding sessions that last a half or full day. a recent digital event for the digital public library of america (dpla) hosted hackfests to demonstrate interface and functional possibilities with the source content in the dpla. similarly, the hathi trust research center organized a developer track for api coaching at their conference so that participants would have hands-on opportunities to use the hathi trust api (http://www.hathitrust.org/htrc_uncamp2013). goals of coding camps include development of new services or creation of value-added services on already existing operations. code is not required to be complete, but functional prototypes help showcase new ways of approaching problems or novel solutions. recently, mhacks issued the call to form new businesses at their winter hackathon (http://www.mhacks.org). libraries are typically less interested in new businesses, but rather seek new service ideas and new principles for organizing content via mobile and to do so in such a way that will source student preferences for location specific services, a key focus for the research team’s student/library collaborative imls grant. method while the camp itself took only two days, there was a significant amount of lead-time needed to prepare. 
in addition to obtaining standard campus institutional-review-board permissions for the study, it was also necessary to consult the office of technology management to devise an assignment agreement covering the software generated by the camp. the research team chose a model that gave participating students the option to assign co-ownership of the code they developed to the library. this meant that both students and the library could independently develop applications using the code generated during the camp. marketing for the camp specifically targeted departments and courses where students with interest and skills for mobile application development were likely to be found, particularly in computer science and engineering. individual instructors were contacted, as well as registered student organizations, to help promote the camp. attendees were directed to an online application form, where they were asked to provide information on their coding skills and details on their interest in mobile application development. http://mhacks.challengepost.com/ http://code4lib.org/ http://www.hathitrust.org/htrc_uncamp2013 http://www.mhacks.org/ adventure code camp: library mobile design in the backcountry | ward, hahn, and mestre 48 ten students were ultimately selected from the pool and, of those, six attended the camp. a precamp package was sent to these students to help them prepare for the short, intense timeframe the event entailed. this package included details on library data that were available to base applications on through web apis, as well as brief tutorials on the coding languages and data formats participants needed to be familiar with (e.g., javascript, json, xml, etc.). participants were also provided with information on parking and other logistics for the event. the research team consisted of librarians and academic professionals involved in public services and mobile design, and student coders employed by the library to serve as peer mentors. the team designed the camp as a two-day experience occurring over a weekend (friday evening to saturday late afternoon). the first day was scheduled as an introduction to the camp, with details on library and related apis that could be used for apps and an opportunity for participants to brainstorm app ideas and form design teams. the day ended with some preliminary coding and consultation with camp organizers about planned directions and needs for the second day. the second day of the camp mostly for coding, with breaks scheduled for food, presentations of work-in-progress, and an opportunity to ask questions of the research team. the day ended with each team presenting their app, describing their motivation in designing it and the functionality they had been able to code into it. given the brief turnaround time, the research team put a heavy focus during the orientation session on clearly articulating the need to develop apps germane to student library needs. examples from the student mobile app design competition conducted in february 2013 were provided as starting points for discussion, as these reflected known student desires for mobile library applications.3 after the camp ended, students who elected to share their code with the library were given details on how and where to deposit the code. post-camp debriefing interviews (lasting 30 to forty-five minutes each) were scheduled individually with all participants to get their feedback on the setup of the event as well as what they felt they learned from the experience. 
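the pre-camp package is described only in general terms, so the snippet below is an assumed illustration of the json familiarity it aimed at: a made-up study-room record, parsed the way a browser-based app would handle an api response. the field names are invented and do not reflect the library's actual schema.

```javascript
// assumed example of the kind of json payload covered in the pre-camp package;
// the record shape is invented, not the library's actual api response.
var responseText =
  '{ "room": "room 291", "building": "undergraduate library",' +
  '  "capacity": 8, "available": true,' +
  '  "equipment": ["whiteboard", "lcd display"] }';

// a browser app usually receives the payload as text and parses it
var record = JSON.parse(responseText);

if (record.available) {
  console.log(record.room + " in " + record.building +
              " seats " + record.capacity + " and is free right now");
}

// the equivalent xml (e.g. <room capacity="8" available="true">room 291</room>)
// would instead be handled with the browser's DOMParser
```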
discussion researcher observations and feedback from students, both during the camp and in individual interviews afterwards, led to several insights about what sorts of outcomes libraries might anticipate from running camps, how to best structure library coding camps, what outcomes students anticipate from participating in a sponsored camp environment, and what features and preferences students have for mobile apps designed to support their academic endeavors. a key student assumption, which emerged from comments at the event and through subsequent student interviews, was that students anticipated completing a fully functioning mobile app by the end of camp. instead, the two student teams each finished with an app that, while it included some of the features they desired, still required additional coding to be fully realized. several information technology and libraries | september 2014 49 suggestions were made for how this need might be met at future events. the most consistent feedback from the students was that they would have liked an additional day of coding (three total camp days), so that they could have gotten further on the implementation of their app ideas. during the exit interviews, one student noted that the two-day timeframe really only allowed for sketching out an idea for an app, not coding it from scratch. a pair of related suggestions from students included having templates for mobile apps available to review to get up to speed on new frameworks (particularly jquery), and secondly, a longer meetand-greet for teams prior to beginning work during which they could compare available coding skills and have some extended brainstorming of app ideas. students were somewhat mixed in their desire for assistance in developing app ideas—some appreciated the open-endedness of the camp, but others wanted a more organizer-driven approach. some students suggested having time to work with library staff after the camp to finish or polish their apps. this observation suggests the enthusiasm students had for the camp itself, and specifically for having a social, structured, and mentored opportunity to develop their coding skills. based on these requests, the research team created “office hours” on the fly after the camp ended to support this request. research team members and coding staff communicated times when team members could come into the library and get additional help with developing their apps. the students had very similar themes for app features to those that the research team observed in an earlier student mobile app competition study. notable categories included the following: • identify and build connections with peers from courses. • discover campus resources and spaces. • facilitate common activities such as studying and meeting for group work. students remarked that the camp was an opportunity to both meet people with similar coding interests as well as to learn more about specific functional areas of app development (specific coding languages, user interface design, etc.) in which they had little experience. jquery and javascript for user-facing designs were particular areas of interest. many students had some indepth background working on pieces of a finished software product but had not previously done start-to-finish software design; this was a big selling point for the camp. the collaborative nature of the camp also matched students’ preferences to work in teams and to learn from peers. 
while the research team had coders on hand to assist with both the library apis, as well as jquery basics, most teams did the majority of their work themselves, and preferred self-discovery. each team did eventually ask for coding advice, but this occurred toward the end of the camp, once their apps were largely coded and they needed assistance overcoming particular sticking points. the other piece of advice students asked organizers about concerned identifying apis for locations of campus maps, and other related resources to serve as data sources powering their apps. in the course of assisting with these requests, researchers discovered another key issue facing library mobile app development—the lack of campus standards for presenting information across adventure code camp: library mobile design in the backcountry | ward, hahn, and mestre 50 different colleges and departments. in particular, maps of rooms inside campus buildings were not provided in a consistent or comprehensive way. this was particularly frustrating to the team that was attempting to develop an app featuring turn-by-turn navigation and directions to meeting rooms and computer labs. in addition to sharing information on known apis and data sources, camp organizers also learned about previously unknown data sources from the student teams. one example was a json feed for the current availability of computers in labs provided by the college of engineering. while this feed was beneficial to starting work on an app for one team, it also led to frustration because feeds for other campus computer labs did not exist, and the team was limited to designing around the specific labs that did have this information available. observed student discussions about the randomness of data availability also highlighted one of the key themes of student-centered design—the conceptualization of a university as a single entity, the various parts of which combine and come in and out of focus depending on the current student task. related student feedback from one of the post-event interviews described a strong desire to create integrated, multifunction apps to meet student needs as opposed to a variety of apps that each did one thing. the siloed nature of campus infrastructures frustrates this desire to some extent but also creates opportunities for students to build a tool that meets a real need among their peers to comprehend and organize their academic environment. this observation also matches those found during the aforementioned student competition. conclusion and future directions student feedback on the camp, as a whole, was very positive, and in the individual interviews, students noted they would like to participate in another camp if it was offered. on the library side, the research team felt that the camp was useful to their ongoing mobile app development process, partially for the code generated but primarily for the direct feedback on what types of apps students wanted to see. the start-up time and costs for the project were low, as expected, and the insights into student mobile preferences seemed proportionate to this outlay. the camp method should be reproducible in a variety of library environments. the key assets other libraries will need to have in place to run a camp include staff with knowledge of client-side api use (in particular jquery, cors, or related skills), and knowledge of campus data sources that students may wish to pull from. 
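as a concrete (and hedged) illustration of the client-side skills named above, the sketch below requests a hypothetical cross-origin json feed of lab workstation availability with jquery and renders it into a page. the url and field names are invented; the real engineering feed mentioned earlier may differ, and the serving host would need to send cors headers for a browser app to read it. the snippet assumes jquery is loaded and an empty <ul id="labs"> element exists in the page.

```javascript
// minimal sketch, assuming a hypothetical cross-origin lab-availability feed;
// the url and field names are invented. the feed's host must send
// access-control-allow-origin (cors) headers for a browser app to read it.
var feedUrl = "https://labs.example.edu/api/availability.json"; // hypothetical

$.getJSON(feedUrl)
  .done(function (data) {
    // assume data.labs is an array like [{ name: "...", open: 12, total: 40 }]
    $.each(data.labs, function (i, lab) {
      $("#labs").append(
        $("<li>").text(lab.name + ": " + lab.open + " of " + lab.total + " seats open")
      );
    });
  })
  .fail(function (jqXHR, textStatus) {
    // a blocked cross-origin request or network error ends up here
    $("#labs").append($("<li>").text("availability feed unavailable: " + textStatus));
  });
```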
third-party apis with bibliographic data (e.g., good reads) could also be used as placeholders for libraries that do not have access to apis for their own catalogs or discovery systems. student suggestions for extending the camp by a day, and their ideas for how to structure it for student success, were very specific and actionable and provided excellent guidance. one of their ideas was to develop tutorials and templates that could be introduced as a pre-camp meeting. this would not add too much prep time. another idea for a future camp would be to develop a specific theme for teams, which would allow for more documentation of and practice with specific apis. information technology and libraries | september 2014 51 the low attendance was a concern, so for the next camp twice the number of desired participants will be invited to ensure both a variety of coding skills and interests as well as opportunities for more teams to be formed. additionally, partnerships with student coding groups or related classes should help to drive up attendance. the biggest difficulty moving forward will be developing campus standards for data that can be made available to students about resources, spaces, and services. as noted above, students typically do not design a “library app,” rather they look to build a “student app” that pulls in a variety of data from across campus. functions of apps are therefore more oriented toward common student activities like studying, socializing, and learning. a related challenge will be to provide adequate format and delivery mechanisms for access to supporting data feeds. cognizant of the silo issue, noted above, as libraries present their own data for student consumption, these tendencies towards a unified view need to be taken into account. completion of an assignment is more than identifying three scholarly sources; it might involve identifying a space to do the research, locating peers or mentors for either the research or writing process, locating suitable technology to complete an assignment, and a variety of other needs. the features and information presented on a library’s website should be designed as modular building blocks that can fit into other campus services in a similar way to how course reserves are sometimes presented in campus learning management services alongside syllabi and assignments. separating library content (e.g., full-text articles, room information, research services) from library process can help with freeing information about what libraries have to offer and can facilitate broader discovery of services and resources at point of need. key to this process is recognizing the student desire to shape the resources they need into a comprehensible format that matches their workflow rather than forcing students to learn a specific, isolated, and inflexible path for each part of the projects they work on. this study has shown that a collaborative process in technology design can yield insights into students’ conceptual models about how spaces, resources, and services can be implemented. while the traditional model of service development often leaves these considerations until the very end in a summative assessment of service, the coding camp and collaborative methods presented here provide librarians a new tool for adding depth to service design and implementation, ultimately resulting in services and platforms that are informed by a more wellrounded and deeper understanding of the student mobile-use experience. 
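a minimal sketch, assuming hypothetical endpoints, of the "modular building block" idea discussed above: a small lookup function whose data source can be swapped between a local catalog api and a third-party bibliographic placeholder (in the spirit of the goodreads suggestion) without changing the code that renders results. neither url nor response shape comes from the article, and like the previous sketch it assumes jquery is loaded.

```javascript
// sketch of a swappable data source behind one ui-facing function; both
// endpoints and the response shape are hypothetical, not from the article.
function searchBooks(query, backend) {
  var endpoints = {
    catalog: "https://catalog.example.edu/api/search?q=",   // assumed local api
    placeholder: "https://books.example.com/api/search?q="  // assumed third party
  };
  return $.getJSON(endpoints[backend] + encodeURIComponent(query))
    .then(function (data) {
      // normalize whatever the backend returns into one simple shape
      return (data.results || []).map(function (item) {
        return { title: item.title, author: item.author, available: !!item.available };
      });
    });
}

// the rendering code only ever sees the normalized records
searchBooks("data structures", "placeholder").then(function (books) {
  books.forEach(function (b) {
    console.log(b.title + " / " + b.author + (b.available ? " (available)" : ""));
  });
});
```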
in that regard, the initial research questions that framed this study could also be used by other libraries as they explore the library and course-related needs that students could benefit from through the development of mobile applications, as well as to determine whether their library provides access to data that is relevant to those needs. the results from this study have affirmed that, at least for the library in this study, the coding camp method is viable for generating usable mobile app prototypes. it also affirmed that, by directly involving students during the formative stages of mobile app conception and design, the resulting apps more accurately reflect real-life student needs.
references
1. council on library and information resources (clir), participatory design in academic libraries: methods, findings, and implementations (washington, dc: clir, 2012), http://www.clir.org/pubs/reports/pub155/pub155.pdf.
2. "hackathon," wikipedia, 2014, http://en.wikipedia.org/wiki/hackathon.
3. david ward, james hahn, and lori mestre, "designing mobile technology to enhance library space use: findings from an undergraduate student competition," journal of learning spaces (forthcoming).
book reviews
theory and application of information research. edited by ole harbo and leif kajberg. london: mansell publishing, 1980. 235p. £16.00. isbn: 0-7201-1513-2.
this book reproduces twenty-one papers presented at the second international research forum on information science, which was held at the royal school of librarianship in copenhagen during august of 1977. the title of this work may be misleading, since the majority of the papers could better be described as the foundations of information science. the papers that advanced the theory of information science were the exception, and the contributions dealing with practical applications were even rarer. the contributors included many familiar names: kathleen t. bivins, anthony debons, william goffman, manfred kochen, allan d. pratt, and hans h. wellisch from the united states; nicholas j. belkin, j. m. brittain, b. c. brookes, robert a. fairthorne, j.-m. griffiths, m. h. heine, s. e. robertson, b. c. vickery, and t. d. wilson from the united kingdom; and many names from europe that may be less familiar on this side of the atlantic. the forum was organized into five sessions: general models of information science, information science in relation to other scientific disciplines, measurement, the information retrieval process, and the future tasks of information scientists in europe. within the book, the distinction between these sessions generally is not obvious. appendixes give the forum program, summarize the discussions of the papers, and report on group discussions. in the introduction, it was stated that it was hoped that the forum would bridge the gap between theory and research on one side and practice on the other. the book does not fulfill this hope, but it does present a good collection of papers dealing with a variety of aspects of information science. the view that the main problems of information science are cognitive rather than technical is evident in many of the papers. however, bradford's law, shannon's theory, and the epidemic model are addressed in several of the papers.
with a few exceptions, the papers are quite readable and do not require a mathematical background to be understood and appreciated. the summaries and group discussions are disappointing, possibly because several of the authors were unable to attend the forum. kathleen bivius was the only american contributor present. there is no index, although one would have been helpful. the book is valuable and should be part of any library collection covering information science. anyone interested in information science should be able to find several highly relevant papers. however, only a limited number of scholars will find it necessary to read the entire work.-edward t. o'neill, matthew a. baxter school of information and library science, case western reserve university, cleveland, ohio. personal documentation for professionals-means and methods, by v. stibic. amsterdam: north-holland pub!. co., 1980. 214p. $29.25 (dfl 60.00). isbn: 0-444-85480-0. while there have been many a number of books written on the design, development, and use of large-scale database systems, there have been few that focus on the control of one's own personal collect ion of reprints, memoranda, reports, drafts, slides, and related miscellanea, which accumulate so rapidly in any professional " information -handler's" office. stibic's book addresses this problem in a thoroughly professional and competent manner. his first two chapters introduce the general nature of the problem, and discuss professionals' information needs and sources. the third, "document description," covers the record structure, abstracting, subject descriptions, keywords and classification methods, and their various combinations. the fourth chapter details the various technical means for storage of original documents, microfilm, and such control meobanisms as card indexes, peek-a-boo cards, and computer-supported indexes. all of these chapters draw on the experience and practices familiar to users of large-scale systems. stibic recommends the use of iso and other standardized practices, and endeavors to emphasize the need for constructing one's own system in accord with generally accepted design principles. stibic is careful to point out, however, that if one is in fact designing a personal documentation system, then personal idiosyncrasies and preferences can be built into it. it is not necessary to use an established and standardized vocabulary or classification system without modification. one may alter it to suit one's own purposes. however, the structure of the system (whether descriptors, classification numbers, or other means} must be controlled; otherwise the system will become useless. the next four chapters are case studies of different systems. the first is a card index technique used by an individual. the second describes a computerized index to support the documentation needs of a project team. (essentially an augmented kwic index, published quarterly.) the third case study is one of particular interest to many professionals at the moment-the use of a personal computer as an indexing control system. the system, though not explicitly identified, is roughly comparable to many of those available in the u.s.; a microcomputer with 64k ram, a display of 80x24 lines, two floppy disks with 512k bytes/disk, and an socharacter-per-line printer. the indexing is done via a faceted classification system of about 250 terms, which are hierarchically linked, providing automatic up-posting from specific to generic terms. 
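stibic's own implementation is not given in the review, so the following is only a sketch of what "automatic up-posting from specific to generic terms" means in practice: assigning a narrow term to a document also posts every broader term above it, so a search on the generic term still retrieves the document. the tiny vocabulary and document id are invented for illustration.

```javascript
// sketch (not stibic's code): each term points at its broader term, and indexing
// a document under a specific term automatically posts the generic terms above it.
var broaderTerm = {
  "floppy disks": "storage devices",
  "storage devices": "computer hardware",
  "microcomputers": "computer hardware"
};

// walk up the hierarchy from a specific term to the top
function upPost(term) {
  var posted = [];
  for (var t = term; t; t = broaderTerm[t]) {
    posted.push(t);
  }
  return posted; // e.g. ["floppy disks", "storage devices", "computer hardware"]
}

var index = {}; // term -> list of document ids

function indexDocument(docId, terms) {
  terms.forEach(function (term) {
    upPost(term).forEach(function (t) {
      (index[t] = index[t] || []).push(docId);
    });
  });
}

indexDocument("doc-042", ["floppy disks"]);
console.log(index["computer hardware"]); // ["doc-042"], found via up-posting
```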
a hashcoding technique is used to minimize the storage space required on the disk, and searching is performed by simple serial book reviews 125 searching of the index records. the fourth case study is an examination of the upgrading of the manual card index described in the first study to a system supported by a large main-frame computer, using a terminal in the professional's office. a combination of automatic keyword extraction and manual classincation is used for indexing. complex boolean searches are possible with this system. stibic concludes with a chapter on future prospects, touching briefly on such things as internal and public viewdata/teletext systems. he also provides a checklist of desirable features of "a multi-purpose personal work station." such a station is not merely a special-purpose device used to aid in some parts of one's work, such as retrieval, but is an integral part of all of one's work; computer, calculator, textprocessor, mail-dispatch system, calendar, in/out box, and so forth. the author, a scientist of long standing with philips in holland, has provided a valuable guide to this area. there are two relatively minor points of criticism, however. whether it was the author's or the publisher's choice is not clear, but there is an excessive use of italics throughout the text. this lavish use seems more appropriate to teenagers' romantic novels than to a serious work. in this case, it is more distracting than helpful. secondly, but more understandably, the extensive references stibic gives are frequently to documents not easily available in the u.s. some are oecd papers, some refer to the german din standards, and some to internal philips technical reports. these are minor points, however, regarding an excellent book. it is recommended not only for the information professional, but for anyone who is seriously concerned with the problem of keeping track of what one needs to know.-allan d. pratt, university of arizona graduate library school, tucson. viewdata revolution, by sam fedida and rex mahle a halsted press book. new york: wiley, 1979. l86p. $34.95. lc: 7923869. isbn: 0-470-26879-4. sam fedida is the inventor of prestel, 126 journal of library automation vol. 14/2 june 1981 the british post office's viewdata system . with this as his license, he and rex malik have written a 186-page volume explaining the prestel system. prestel is a series of databases, which are accessed by a keypad similar to a calculator. the common television takes on the characteristic of a crt for viewin·g alphabetical and numerical information. the connection to the computer is by telephone, and, in britain, the post office is in charge of the telephones . overall, in spite of several printing errors, this book does provide information about the system. the authors explain the types of information that will be available on the pres tel system, such as "buying a car," "houses for sale," "entertainment," "education," "an evening out," and "news . " they have also devoted individual chapters to electronic mail, electronic funds transfer, and education, explaining how each works in the system. the authors stress the benefits and attributes of their system almost to the point of redundancy . in each of the chapters, the manner in which the information is going to be accessed is repeated. despite the repetition, the primary focus is what prestel will do for the betterment of mankind. the uniqueness of prestel is the simplicity of its access process. 
according to the authors, being able to access the information in one's own home will make prestel a major tool for dissemination of information for many agencies and businesses. at times, the "hard sell" is very obvious throughout the volume. however, the diagrams are good and help to explain the authors' points. the problems fedida and malik anticipate in the electronic mail and protocols are realistic. in the chapters "future i" and "future ii," the authors go off on a tangent, using a time line, on what they see in the future. again, it is basically a repetition of what was said in the previous chapters, only from a futuristic point of view. here, the reader gets a distinct feeling of what is really bothering them now in the system; that is, government bureaucracy . they cite the different groups trying to control the information by means of legislation. they delve into the problem of uniformity of standards. television is an example . what will be standard for convertors and adapters for the computer hookup? this is a real problem that was well explored throughout the work. this volume is good for librarians who are interested in cable, telecommunications, and computers . however, be aware of its poor organization. there are numerous printing errors that affect its readability. nevertheless, if a person can wade through these errors and the repetition of ideas, he/she can obtain some useful information from this text. there is a distinct feeling throughout this work that it was put together hastily . nonetheless, there is a dearth of information on this subject, and this book will serve some useful purpose for libraries .-robert miller, memphis/shelby county public library and information center, memphis, tennessee . ala filing rules. filing committee, resources and technical services division, american library association . chicago: american library assn., 1980. 50p. $3.50. lc: 80-22186. isbn: 0-8389-3255-x. library of congress filing rules. prepared by john c . rather and susan c. biebel. washington, d. c.: library of congress, 1980. ll1p . $5. lc: 80-607944 isbn: 0-8444-0347-4. available from customer services section, cataloging distribution service, library of congress, washington, dc 20541. these two works represent the culmination of over a decade of effort within the library profession to overhaul the techniques by which entries are arranged to form catalogs. the impetus for this work came from recognition that computer technology would soon be enlisted to perform the arrangement of entries for the production of catalogs, and that filing rules current at the time would be impossible to implement in their entirety on the computer. although the original intention was to develop rules appropriate for the arrangement of entries by computer, those at the library of congress and the ala committee working on the problem soon realized that, from the point of view of catalog users, it would be very undesirable to have different sets of filing rules in operation depending on the physical medium of the catalog. therefore, the scope of the effort was broadened to rules that could be applied both manually and by machine using headings that were formulated according to more than one set of cataloging rules. now that we have these new rules, the question arises whether they are better than what preceded them . the criteria for "better" ought to be whether the rules make entries easier to find both for known-item searches and browsing within the complex device called a library catalog. 
or to state the same criteria negatively: it should be more difficult to lose an entry in the catalog if it has been filed according to the rules . the evaluation of these rules against other possible approaches to catalog arrangement ought to be centered on observation of the needs of a variety of both experienced and unsophisticated catalog users and on measurement of the effectiveness of the alternative approaches to meet these needs. the complex problems of filing clearly exemplify the need for research as recently expressed by herb white in his columns in american libraries. lacking any empirical data on which to base an evaluation, we must rely on our professional judgment and personal biases to argue the case for the new rules. to this reviewer, it seems that common sense supports a set of rules that are simple, consistent, and easy to explain to library users. the need for simplicity and consistency directly implies the "file-as-is" principle (i.e., file exactly as the heading is visually constructed, not by some interpretation of it), which should be applied even at the cost of having to search in more than one place in the arrangement; e .g., numeric digits and numeric words , mac and me, muller and mueller. the file-as-is principle has been more consistently applied in the ala rules than the lc rules, the latter undoubtedly a result of the anticipated complexity and size book reviews 127 of lc's catalogs, although there is no justification argued for these departures from the basic principle. of specific interest to readers of the journal is whether these rules can be implemented for computer sorting of catalog entries . do the rules succeed in meeting their original objective? the ala rules certainly appear to be amenable to very straightforward systems analysis and programming. for this the committee and its chairperson, joe rosenthal, need to be commended. from some sources there are already claims of systems that fully implement the new ala rules, which certainly could be the case . however, it would be interesting to know how these systems deal with the follow ing, which seem to be potentially troublesome: • the lack of consistent support in the marc format for handling initial articles when the rules call for ignoring initial articles in corporate names other than personal or place names, title subheadings ($t subfield), and subject headings. the english articles obviously present no problem, but the table of articles in appendix 2 shows more than thirty words that can be both an article and the cardinal numeral l. in addition, the footnote , "in h awaiian, the '0 emphatic' must be carefully distinguished from the preposition 0, but 0 also serves. the h awaiian language as a noun and a verb (each with several meanings), an adverb, and a conjunction," must surely give pause to the diligent systems designer. the recent library of congress practice of dropping nonfiling initial articles from heading fields still does not solve the problem of initial articles in the several million marc records that already exist in library catalogs . • the requirement that roman numerals be filed numerically presents an opportunity to construct an interesting but not overly complex algorithm . however, although the marc format makes the identification of roman numerals in heading fields fairly straightforward (the $b subfield), the identification of roman numerals embedded in a long title is much more ambiguous . for example, does iv mean "4" or "intravenous"? 
128 journal of library automation vol. 14/2 june 1981 • the rules require that punctuation in an arabic numeral that is included to increase its readability is to be ignored in filing, but decimal points are significant in determining the numeric value of the number (i.e . , .003 files before 1) . how does one specify an algorithm to deal with the title, "5.000 kilometres dans le sud"? using european practice, this number is obviously 5,000, but why not 5 according to the computer algorithm? • the special rule for nonroman alphabets (rule 7) is interesting: "if, in the arrangement of bibliographic records, it is necessary to distinguish access points containing characters in different nonroman alphabets, scripts and syllabaries (cf. rule 1, order of characters) the following order of precedence is used. . .. " there follows a table beginning with amharic and ending in tibetan. that is the entire rule. systems designers who have implemented this rule clearly have transcendent skills! reliance on the marc language code in the 008 field has both theoretical and practical problems. • the introductory text advises libraries to include in the file information notes and references that explain filing practices to catalog users . however, the rules do not specify where these references are to file in relation to other headings. admonishment to provide these at "appropriate points" is not much help. • the ampersand is ignored in filing (for which we should be grateful) . but, by including the optional rule 1.3, which allows filing the ampersand "as its spelled-out language equivalent," the ala committee has put systems designers in the position of having to explain why this rule cannot be implemented on the computer-at least not until the marc format includes a code for language of the field (not a likely development, and even then not all ambiguity would be eliminated). interestingly, the library of congress treats all ampersands as a character filing between blank and the letter a . • the optional rule 9.1, which allows the inclusion of "the role of a person or a corporate body in a legal action in arranging access points," presents a problem when the rule requires suppression of all other relators . how is the computer programmed to recognize a legal action? is there a finite list of such relator words? differences between aacr2 and previous cataloging practices further complicate the use of this option . admittedly, many of these problems are marginal in terms of the number of entries in a catalog affected, but to a systems designer, even though there is only one instance, it must be accounted for in the computer programs if the system can claim a "full" implementation of the rules. clearly, full implementation will require some changes in the marc format before all rules can be applied absolutely consistently and unambiguously . the library of congress rules, although applying similar principles, depart significantly from the ala rules in detail and complexity. a full analysis of the implementation problems would require much more space than this review will allow. suffice it to say that although the library's libsked program has been under development for twelve years, and its strengths and limitations have undoubtedly influenced the development of these filing rules, there are elements in these rules that have not yet been implemented in libsked, and several where no one has yet figured out how to do it. 
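to make the roman-numeral point in the list above more concrete, here is a hedged sketch of how a filing program might turn a clearly tagged numeral (for example, a marc $b numeration such as "xiv") into a numeric sort key so that "ii" files before "xiv" in a plain string comparison. it deliberately handles only tagged numerals; deciding whether "iv" buried in a title means 4 or "intravenous" is exactly the ambiguity the reviewer flags, and nothing here resolves it.

```javascript
// hedged sketch of a numeric filing key for a clearly tagged roman numeral
// (e.g. a marc $b numeration); ambiguous numerals inside titles are not handled.
var romanValues = { i: 1, v: 5, x: 10, l: 50, c: 100, d: 500, m: 1000 };

function romanToInteger(numeral) {
  var chars = numeral.toLowerCase().split("");
  var total = 0;
  for (var k = 0; k < chars.length; k++) {
    var value = romanValues[chars[k]];
    if (value === undefined) return null; // not a roman numeral
    var next = romanValues[chars[k + 1]];
    if (next !== undefined && value < next) {
      total += next - value; // subtractive pair such as iv, ix, xl
      k++;
    } else {
      total += value;
    }
  }
  return total;
}

// zero-pad so that a plain character-by-character sort orders keys numerically
function filingKey(subfieldB) {
  var n = romanToInteger(subfieldB);
  if (n === null) return subfieldB;
  var key = String(n);
  while (key.length < 6) key = "0" + key;
  return key;
}

console.log(filingKey("ii"));  // "000002"
console.log(filingKey("xiv")); // "000014"
```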
although the work on these rules is complete, there are two more projects the profession should undertake that would be most useful for those concerned with catalog development. in both sets of rules, there is mention in the introduction of the need for a brief version of the essential rules, which could be handed out to catalog users. why did the committee not develop such a brief guide and include it as an appendix to the rules? those of us who work on computers are familiar with the reference cards for programming languages put out by computer manufacturers. a similar format for the filing rules would be very useful. another more difficult but equally useful project would be the publication of a standard design implementation of the ala filing rules expressed in terms of the marc format. such a design would include the marc fields and subfields necessary for each possible entry from a bibliographic record and a description of any special processing required for particular data elements. the design would be expressed at a level that is independent of programming languages and computer hardware. we need a standard reference that translates the filing rules into the language of the marc format. the ala rules, in some tantalizingly brief instances, begin this process. both sets of filing rules are significant improvements over those previously available to systems analysts. reference librarians should find these rules easy to explain to beleaguered catalog users. for their simplicity and relatively slight departure from the "file-as-is" principle, the ala rules are to be recommended. the library of congress rules, in their attempt to retain the classificatory structures that support the browsing user, further complicate the task of the user performing a known-item search. library research has indicated that the preponderance of catalog searches in research libraries are known-item searches.-john f. knapp, ringgold management systems, beaverton, oregon.
highlights of lita board meetings
the highlights of lita board meetings are published here to inform division members of the activities of their board. the highlights are not the official minutes of the meetings. 1981 ala midwinter meeting, washington, d.c. first session, february 1, 1981. the meeting was called to order by s. michael malinconico, president. the following board members were present: s. michael malinconico, barbara evans markuson, brigitte l. kenney, nancy l. eaton, kenneth j. bierman, ronald f. miller, bonnie k. juergens, marilyn j. rehnberg, heike kordish, and donald p. hammer, lita executive director. staff: laura stewart. the minutes of the 1980 annual meetings were approved and adopted with the correction that brigitte kenney be reported as present at the wednesday, july 2, 1980, meeting. marbi committee report (report by eleanor montague). montague reported that the marbi committee is continuing its work, and that the members do not feel that the value of their work has been lessened by the new arrangement with the library of congress.
the committee has discussed changing its mode of operations by introducing teleconferencing and by establishing a steering committee, but these things may be in the future . board discussion took place on the value of ala input to the marc format and whether or not lita should support a representative to the two lcsponsored meetings. montague requested a budget of $2200 to support that representative and the board decided to vote on that matter when it considers the 1981182 lit a budget later in the week. new lita button and new avs brochure (report by donald hammer). the new lita button, "litaship is for everyone," was introduced, and copies of the audiovisual section's membership recruitment brochure "who says ala doesn't do anything about a v?" were distributed to the board members . joint litairtsd board meeting. malinconico announced and discussed the joint lita/rtsd board meeting to take place later in the week. he pointed but that there are many areas of joint interest and many activities the two divisions could cooperate in . he mentioned specifically discussion groups, highlights of meetings 131 cosponsorship of programs, z39 representation, problems concerning ala policies , the coming five-year review of the isbd, and other things. telecommunications committee report (report by joan maier). the board was brought up to date by maier on the preconference the telecommunications committee plans to sponsor at san francisco. it will be concerned with the "office in the home" concept and the support the library should provide to that "electronic cottage" mode of operating. the second day will consist of a tour of "silicon valley's" mission college where that college will demonstrate its new approach to education and its use of automation . additionally, the silicon valley electronic manufacturers will demonstrate their technology . joyce capell, who represented mission college, gave the board information about the college and the potential exhibitors from silicon valley. vacancy in division councilor position. hammer reported that the request was made to the ala bylaws committee to ask council to change the ala bylaws to allow for an alternate councilor to be elected by each division, and for that alternate councilor to have the vote if a division's councilor cannot complete the term of office. lit a will elect an alternate councilor in the coming ala election and it is expected that the ala bylaws committee will present their proposal to council this week. proposed increase in ala overhead charges . hammer reported that the ala controller has proposed that ala raise its overhead charges from 13 percent to 16.5 percent. this is the ala charge against institutes , preconferences, and other special activities. · the board decided to establish a task force to determine ho~ these overhead · charges are arrived at and exactly what items are included in them . . the task force will consist of ronald f . miller, chairperson; barbara markuson, bonnie juergens, and donald hammer, resour~e person. the following motion was made by ronald miller, seconded by kenneth bierman , and passed: that a task force be formed to obtain additional information about overhead charges which are assessed the division . 
toward that end, the task force will accomplish the following : (1 ) describe in writing the steps required for determination and approval and adoption of an overhead rate; (2) define the component costs included in the overhead rate ; (3) suggest services which overhead covers which might be contracted for in other ways . the members of the task force are : ronald f. miller, chairman barbara e. markuson bonnie k. juergens donald p. hammer, resource person the dates for accomplishment of the three items are: (1 ) may 1, 1981 (2) june 1, 1981 (3) ala annual meeting 132 journal of library automation vol. 14/2 june 1981 report on free ]olas. to date, twenty-six requests have been received from lita members for free copies of back issues of jola. this offer was approved by the board at the last annual conference as a means to reduce the supply of back issues of jola. it was suggested that new members of lita should be notified that these issues are available. report on funds allocated for san francisco programs . the ala conference program committee allocated to the lit a units the following funds for programs at the san francisco annual conference . isas/tesla "technical standards: the good, the bad, the missing" vccs "use of video by and for the deaf" vccs "viewdata-the electronic delivery of information" end of first session. second session february 2, 1981 $100.00 $350.00 $700.00 the meeting was called to order by s. michael malinconico, president . the following board members were present: s. michael malinconico, brigitte l. kenney, barbara e. markuson, nancy l. eaton, kenneth j. bierman, ronald f. miller, bonnie k. juergens, marilyn j. rehnberg, heike kordish, and donald p. hammer, lita executive director. staff: laura stewart . lita standards committee. a problem has arisen concerning an overall standards committee in lita in that those seeking information about technical standards have no one or no unit within lita to contact except tesla, which is concerned only with computer and data processing standards. an example of the situation is that of steve salmon who was appointed liaison to lita from the ala standards committee . he can only contact tesla and has nowhere to go concerning standards in any of the other areas of interest to lita. there is also no unit in lita empowered to establish standards policy for the entire division. after discussion, the board asked the lita executive director to contact mr. salmon and discuss the matter with him to determine what, if any, problems he felt the present arrangement made for him . the board will later reconsider the matter. rtsd catalog form, function, and use committee. this is a committee rtsd is proposing that would be an interdivisional committee concerned with the evolving and the proliferation of library catalogs and with development of programs and workshops " to inform and develop professional thinking on the form, function, and use of library catalogs." it was decided to bring the matter up at the lita/rtsd joint board meeting and to ask for additional information at that time. lita legislation and regulation committee (report by judith sessions). the legislation and regulation committee has made arrangements to hold a reception in the russell senate office building at which librarians will be able to meet their legislator and/or the legislators' staff members. the rehighlights of meetings 133 sponse to the invitations has been excellent as about one hundred rsvps have been received from legislators and their staff members. 
a report was given on the revision of the communications act of 1934 and the provisions that librarians should be working to have included. the copyrigh~, law was also discussed, especially the lack of a clear definition for "fair use. information bill of rights. about a year ago the information industry association compiled and published a statement called the "information bill of rights." the lita executive director brought the statement to the attention of the board because it was felt that the statement was written from the aspect of the profit-making organization only and perhaps should be broadened . the board decided that this was not in its province and asked the lita executive director to forward the matter to the ala office for intellectual freedom for any action they feel is warranted. marc users & library automation discussion groups. the marc users discussion group has decided that it would like to merge with the library automation discussion group (formerly cola) but would like to retain the four-hour time slot it has· had for many years. a motion was made by kenney, seconded by ron miller, and passed: that the lita board permit the merger of the marc users discussion group (mudg) and the library automation discussion group (ladg) and that they be called library and information technology discussion group. a motion was made by kenney, seconded by juergens, and passed: that the discussion groups (mudg & ladg) after they merge retain the four-hour time slot for the combined new group. a motion was made by juergens, seconded, and passed: that the chair of the library and information technology discussion group be instructed to contact the lit a program planning committee chair for coordination of discussion topics prior to each litdg meeting. a motion was made by kenney, seconded by ron miller, and passed: that the library and information technology discussion group elect a deputy chair to assist the chair from now on. end of second session. third session february 2, 1981 the meeting was called to order by s. michael malinconico, president. the following board members were present: s. michael malinconico, brigitte l. kenney, barbara e. markuson, nancy l. eaton, kenneth j. bierman, angie w. leclercq, helen cyr, bonnie k. juergens, marilyn j. rehnberg, heike kordish, charles husbands, and donald p . hammer, lita executive director. staff: laura stewart. apple education foundation grants. brigitte kenney reported that the apple foundation had been flooded with grant requests and that they 134 journal of library automation vol. 14/2 june 1981 have decided to restrict their grants to software development only. president malinconico asked kenney to determine exactly what the limitations are before the board considers the matter further. honoraria paid to lita speakers. the question raised was whether or not the people on the lit a board -of directors or any of the lit a program planning committees should be paid honoraria when they serve as speakers at lita institutes. a motion was made by juergens, seconded by helen cyr, and passed: that lita will not pay honoraria to lita board members or lita members of program committees for participation in institute programs . (this will take effect after end of next annual conference.) (this will take effect immediately for board members.) ala survey of priorities of membership (report by ron miller). 
five priorities the ala membership priorities committee has determined are access to information, legislation and funding, intellectual freedom, public awareness, and professional and staff development. some board members expressed surprise that some of the areas of concern to the white house conference were not included as ala priorities and that one of the expressed priorities (legislation) is only a means to an end. malinconico asked the board members to send their comments by april 1 to miller, who will then distribute a proposed amendment to the board. a task force of three, barbara markuson, brigitte kenney, and ron miller, was appointed by consensus to write the proposed amendment. it was suggested by juergens that the lita statement be published in american libraries as a letter to the editor in the same issue that the proposed ala priorities are published. lita teleconferencing system. a representative, john sehnert, from the source, gave a presentation of that system to the board. after a long discussion about the capabilities of such systems and the needs of the board, it was decided that by march 1 the recommendations for a pilot project would be provided, by march 15 the system should be operational, by april 1 a set of criteria should be made up to evaluate the project, and by the next board meeting (in san francisco) an evaluation should be held with a decision made as to the permanency of the system. it was suggested by barbara markuson that the "electronic mail" system should be demonstrated during the lit a president's program and input should be sought from the members as to what their needs are along this line. program planning committee (report by kaye capen). the committee is in the midst of a transition of the chair. sue tyner will be the new chairperson. capen discussed the joint rtsd/lita/rasd preconference on online catalogs to be held in philadelphia. national conference plans. berna heyman reported on plans for the lita national conference planned for baltimore in the spring of 1983. if for highlights of meetings 135 any reason it cannot be held in the spring of 1983, she stated that the fall of 1984 would be their second choice. the maryland library association would be interested in cosponsorship. the committee is considering asking for help from the council on state governments. the conference format would include exhibits, workshops prior to the conference, invited papers, contributed papers, a poster session, and panel sessions. a survey of the lita members is being considered in order to get ideas on subjects of interest. discussion followed, but no action was necessary. end of third session. joint lita/rtsd board meeting february 2, 1981 introductions. both boards, guests, and staff introduced themselves. agenda . there was no set agenda. karen horny, rtsd president, suggested that one topic that might be discussed or at least recognized is that both lita and rtsd have retrospective conversion discussion groups . both appeared to have different focuses on their discussion of retrospective conversion. background . michael malinconico, lita president, gave background on the reason for the joint board meeting. there has been created an uneasy sort of division between technology, application of technology, and technical services systems. this uneasy division is thinking of technology as the form in which library services are delivered, thinking of the technical-services interests as reflected in rtsd as concerned primarily with the content of that service. 
the distinction between form and content obviously falls apart very rapidly. in previous lita discussions, barbara markuson p"ointed out that there are perhaps three stages of implementation of technology . in the first phase, there is exploration of the potential of technology. that is the domain of lita. in the second phase, there is an implementation and a certain amount of acclimatization that is ne<;essary. this is the gray area. the third phase is where the technology becomes integrated into the operation of a library. this is the concern of the traditional technical services. the gray area needs to be addressed. with automated cataloging systems in particular, they are beginning to mature, and it is no longer clear who should be concerned and addressing the problems. thus there is overlap. we need to meet together to consider ways to make more efficient use of time that is expended at ala meetings. currently there are a number of joint ventures: cosponsorship of the catalogs preconference, 1982; the establishment of a joint committee on catalog form, function, and use; cosponsorship of a program in san francisco on union lists of serials. three things should be considered: l. how to organize to take joint action on matters that concern us mutually. 2. think of the joint programs as pilot ventures and attempt to set up a structure that can be used for future joint ventures. 136 journal of library automation vol. 14/2 june 1981 3. consider what other projects we might want to do jointly. bill gosling, rtsd past president, stated that one of the points in terms of overlap is the factor of growth ofboth divisions . the factor of growth is related to two things: (1) interest and (2) the desire or need to have official affiliation with the association . this is not unique to rtsd and lita. ala, as well as the divisions, is growing; more and more people are involved and want to be involved. michael malinconico stated that we should let the growth be a result of conscious action, it should not be something that happens without our conscious intent or control. it may be that there are instances where overlap is necessary and desirable. let the overlap occur as something done by intention. norman dudley stated that ala does have mechanisms for resolving overlap, which we are just beginning to use . we can never identify the gray areas because of the very nature of the technology. every new application of technology presents us with new or possible gray areas . what is needed is the sensitivity, willingness, and ability to approach the other unit and ask for cosponsorship. michael malinconico stated that the divisions had experimented with liaisons to their boards. the meetings often conflict, so this seems an enormously inefficient method of communication. another example of perhaps peripheral interest to lita is the isbd fiveyear program. there might be some value in having a joint review of the isbds . arnold hirshon suggested that the division executive directors exchange minutes or summary board minutes . at this time rtsd does not do summary minutes. the rtsd newsletter reports rtsd board action as well as section and committee reports, however. bill gosling, rtsd past president, stated that when talking about units of a division, even as an officer, it is difficult to ensure information communication . an orientation session is very important. if two or three people miss this, the information has to be picked up by sitting in meetings. 
for programming, a mechanism to be used is a screening for all programs . the planners have to include what affiliation is appropriate and what contact has been made. perhaps , to return to mike malinconico's point about structure, we should charge our organization committees, who review recommendations for new committees, also to look at possible affiliation. this happened with the catalog form, function, and use committee. it was suggested that rtsd and lita ought to exchange representatives to the organization and bylaws committees. michael malinconico suggested expanding this to exchanging representatives to the division level programming committees . rtsd does not have one as yet. bill gosling agreed that when the structure becomes defined, this is another area of exchange . michael malinconico suggested two other areas that lita and rtsd could explore--the proposed increase in the ala overhead rate for workshops, institutes, preconferences, etc., and the difficulty of getting publicity for forthcoming programs in american libraries. lita has formed a task force to look into the proposed change in the overhead rate to look at what ala central provides for the overhead charges and to identify those things that might be more economical to contract for separately. highlights of meetings 137 this is perhaps another area for cooperation. the sense of the lita board was that they would like rtsd participation in the task force. nancy williamson agreed to sit in on the task force as an rtsd observer. the task force's function statement has three aspects: 1. to identify the procedural steps that a dues increase would have to go through and how to effect those steps. 2. determine what it is we get for the overhead we pay. 3. determine those things that we get that might more economically be contracted for separately. student dues and graduated dues. the lit a board acted in support of student dues. rtsd had a concern about the impact of student dues on publications as the current $7.50 fee from the $15.00 membership fee does not cover the cost of publications. rtsd on the proposed graduated dues structure for new members, felt that it was difficult to assess the impact of new division members until they saw the effects on the ala general membership. the lita board was in favor of the graduated dues structure for new division members. interdivisional committee on catalog form, function, and use. this committee would replace the book catalog committee. currently rtsd is receiving responses from other divisions on their interest in forming such an interdivisional committee . ala/coo would have to look at this committee. michael malinconico stated that the formation of this committee would be one way of addressing some of rtsd and lita's mutual concerns. on-line preconference. the division executive directors were charged with writing an agreement on the responsibilities of each division with respect to this program and then circulate it to the respective boards. the joint board meeting was adjourned at 5:42 p.m. fourth session february 3, 1981 the meeting was called to order by s. michael malinconico, president. the following board members were present : s. michael malinconico, brigitte l. kenney, barbara evans markuson, nancy l. eaton, kenneth j. bierman, angie w. leclercq, helen cyr, bonnie k. juergens, marilyn j. rehnberg, heike kordish, charles husbands, donald p. hammer, lita executive director. staff: laura stewart. discussion of marbi committee request for funds. 
a discussion of the marbi committee and lita representation at its meetings took place. the following motion was made by barbara e. markuson, seconded by bonnie k. juergens, and passed : the lit a board approves the expenditure of up to $2200 to cover expenses for one lita representative at the two 1981 marbi meetings held 138 journal of library automation vol. 14/2 june 1981 outside the two annual ala meetings . this matter will be reviewed again at the next midwinter meeting. (amended by s. michael malinconico, and approved unanimously.) lita's place in standards setting. a long discussion took place on the past contribution of lit a in standards setting and what its position should be now and in the future in the standards field. the place of marbi, tesla (isas), and the isas international mechanization consultation committee was considered. the discussion culminated in a decision to ask the executive director of lita to write a background paper on the history of lita's involvement with all standards activities, including actions with other groups, and what results were achieved . the report is to be available at the next annual conference. national conference report (report by berna heyman) . it was reported that the national conference program committee would ask the ala executive board to approve a conference for lita in the spring of 1983. a discussion took place on the audience at which the conference would be aimed. concern was expressed for the inclusion of beginning-level programs and papers as well as activities for the more knowledgeable. the tutorial approach to all aspects of areas of interest to lita members and others was advocated by several board members. after a discussion on the registration fees and on the individuals who should be present to represent the lita board before the ala executive board, a motion was made by bonnie k. juergens, seconded by brigitte kenney, and passed: the board approves the request of the program planning committee to proceed with current plans to hold a lita conference entitled "informationffechnology : lita brings it all together." such approval includes a vote of appreciation to the committee for the effort that has gone into this plan. end of fourth session. fifth session february 3, 1981 the meeting was called to order by s. michael malinconico, lita president . funds allocated by ala to lita for san francisco programs. hammer reported that $900.00 was allocated by ala to each division to be distributed by their boards for san francisco conference programs. also---$100.00 was given to tesla by ala. also---$350.00 was given to lita vccs "video for the deaf" program. also---$700.00 was given to lita/vccs "viewdata" program; vccs requests at least $300.00 more for this . a motion was made by kenneth j. bierman, seconded by bonnie k. juergens, and passed : that tesla be awarded $550.00 and vccs program planning committee be awarded $350.00 for additional support for their programs for the san highlights of meetings 139 francisco conference. these funds are to come from the "regular conference program funds." bylaws and organization committee (report by heike kordish). no action items to report . program planning committee (report by sue tyner). no report given as sue tyner had just recently become chair. telecommunications committee (report by joan maier). after discussion it was decided to double the number of registrants expected for the "office in the home" preconference from 150 to 300. a revised budget was presented to the board. 
the lit a telecommunications committee would like to publish in the lit a newsletter a listing of electronic mail systems, a listing of paperless information technology consultants, and a dial-order-type services listing. these items have been turned down by the editor of ]ola, but have been accepted by the editor of the lita newsletter . the lita board gave its enthusiastic endorsement. nominating committee . malinconico reported on the 1981 elections slate as follows: vice-president/president-elect: kevin hegarty, carolyn gray director-at-large: hugh atkinson, emma cohn council: bonnie juergens, george abbott, lynne bradley lita section reports: isas (report by bonnie k. juergens). no action items to report. publications committee (report by charles husbands): brian aveney has augmented the jola staff by getting david weisbrod to be book review editor, and tom harnish to be an assistant editor for video communicationswhich we think will bring some new focus to those areas. as the committee has begun to organize the division's publications program, it is questioning whether or not an editorial board and a publications committee are needed . the committee proposes that lita have a publications committee, and that the journal and the newsletter have editorial boards. the committee recommends the chairperson of the publications committee be an ex officio member of each of the editorial boards, and that the chief editor of each of those publications be an ex officio member of the publications committee. the newsletter editorial board would consist only of the staff, i.e ., the chief editor, and the section editors; and the journal editorial board would consist of the chief editor, the various assistant editors, and additional people to serve as a core of reviewers (but not necessarily limited to that function) . the relationship to the lita board is something of a question. the bylaws state that the jola editor is a member, ex officio, of the lita board . the roster shows that charles husbands, as chair of the publications committee, is the ex officio member. there is a question as to whether either needs to be a member of the lita board . the publications committee feels that there should be only one ex officio member on the lita board-the chair of the publications committee-and asks the lita board to resolve this question. 140 journal of library automation vol. 14/2 june 1981 after discussion it was moved by ronald f . miller, seconded by brigitte l. kenney, and passed : that the board officially recognizes and approve s the establishment of two editorial boards; the first for the association's newsletter, the second for its journal. furthermore, the chairperson of the publications committee should appoint a liaison to the lita board for reportorial purposes, and the bylaws shall be amended to delete the journal editor as an ex officio member of the board. the publications committee recommends that the lita budget be published each year in the newsletter. the committee suggests changes in the form of the lita budget that would more accurately and/or more specifically indicate expenditures. the publications committee also suggested that some narrative be included with the budget to explain various aspects of it. no board action was necessary . the suggestion came up, in reference to items for the newsletter, that it might be interesting to try getting the headlines from the newsletter into some kind of electronic distribution. nexis was suggested. 
tom harnish suggested the source as another possibility, keeping in mind legal and copyright considerations . the lita board is considering an electronic mail pilot project, but the lita telecommunications committee is already in the process of setting up such a project of its own at this time . the board asked the newsletter editorial board to draw up a proposal for the lit a board to consider at the san francisco annual conference . american national standards institute committees . hammer brought to the board's attention two recent problems concerning lit a's representation on american national standards institute (ansi) committees . 1. ansi sent lita an invoice for $50.00 for membership in ansi. when it was pointed out to ansi that lita is a division of ala and ala is a member of ansi, the $50.00 charge was dropped. 2. as was reported to the board at the last annual conference meetings, the computer & business equipment manufacturers association (cbema) billed lita for $1,125.00 for a partial-year 1980 membership on x4 ($1 ,500.00 for a full year), and later information revealed that membership on x3 would cost $2,500.00. x3 and x4 have now been combined, but no information has been received on what the dues are for the "new" x3 committee . the problem is that letters to cbema asking what provision has been made for representation from nonprofit users groups are ignored . lita, therefore, no longer has any representation on the computer-standards-setting committees. after discussion , it was suggested that the lita executive director continue to try to communicate with cbema. sponsorship of lita institutes by outside organizations . bonnie juergens, chair of isas, brought up the matter of outside organizations asking to hold lita institutes for their members. the specific incident concerned is that of the law library association's request for sponsoring the "data processing specifications and contracting" workshop as a preconference workshop prior to their conference in june. the board indicated a willingness to allow such arrangements, but felt that lit a should gain some financial return from them. highlights of meetings 141 in this case, the board, by consensus, indicated that the law library association should be asked for 20 percent of the costs (with 15 percent being least acceptable) as remuneration to lita. lita bibliography . juergens brought up the question of continuing the lit a bibliography on library automation . the last one published included the years 1973--1977. she wanted board reaction as to whether or not it is a viable project and whether or not isas should prepare a working plan and a budget to be presented to the board at the next annual conference. the board, by consensus, asked isas to proceed to develop a plan . lita representative to ifla ( international federation of library associations) . ifla representative nominations. kenney presented a statement concerning the need of, and the requirements for, nominees to ifla. her recommendations for nominees were fred kilgore, susan martin, russell shank, and dick degennaro . members of the lita board were invited to submit additional names of possible nominees, especially as there is no limit to the number of nominees. ala operating agreement with divisions . at one of the copes meetings, there emerged a new ala operating agreement for the divisions written by robert wedgeworth, now being discussed by all units . there was a negative reaction to the vagueness of the document as it now stands. 
the president of the board suggested that the board members put their comments in writing to send to him around march 1.

student membership dues proposal. ron miller asked the board if it wanted to reconsider its approval of the reduced student dues proposal in the light of recent discussions and actions in ala council. after discussion, the board confirmed its approval of reduced student dues and also took the position of being in favor of "local," i.e., divisional, control of dues.

ala membership promotion task force (reported by blanche woolls). the membership promotion task force is going to arrange special discounts for members of ala to go to museums and so forth in san francisco. lita might want to mention in the lita newsletter places of interest and things to do that the members might not otherwise know about. more specifically, lita might want to highlight the technology that exists in the san francisco area that lita members might be interested in going to see on their own.

membership committee (blanche woolls). the lita membership committee recommends that lita prepare information for ala members who are not members of lita, to suggest that they should belong to lita by stressing those areas of the division that could attract individual participation in the association, such as the discussion groups and programs. it was moved by brigitte l. kenney, seconded by kenneth j. bierman, and passed: that the lita board authorize up to $700.00 for a mailing to ala members who are not lita members. the membership committee requests support of the lita board for student chapters. though there is only one, university of michigan, there should be a letter sent to welcome the students into lita. woolls offered to write the "greetings" letter. bringing into lita people who are not librarians was presented to the board and discussed. aect is having their national meeting in philadelphia in april. as a member of aect as well as lita, blanche woolls would like authorization to arrange a very small reception at this meeting to attract members to lita. it was moved by kenneth j. bierman, seconded by brigitte l. kenney, and passed: that the lita board authorize up to $300.00 to the membership committee for a reception for the aect national convention, april 5-9. the purpose of this reception is to encourage new members for lita. the membership committee is going to have a microcomputer in the lita booth at ala with a "lita game" on it, telling what lita is all about. they are aiming at zero cost to lita for both the microcomputer and the game.

lita oral history task force. s. michael malinconico suggested that board members read the report on this subject that was made to the board by robert miller.

avs & vccs proposed merger. brigitte l. kenney announced that both the av section and the vccs section have expressed an interest in merging into one section and in expanding the telecommunications interests in lita into another separate section. s. michael malinconico suggested that the av section and vccs meet in san francisco and, in a joint meeting, discuss this matter and see that their memberships are informed of the results of that meeting. end of fifth session.

lita board of directors meetings: record of votes, 1981 midwinter motions (in order of appearance in the "highlights")

board member              1  2  3  4  5  6  7  8  9  10 11 12 13
s. michael malinconico    y  y  y  y  y  y  y  y  y  y  y  y  y
brigitte l. kenney        0  y  y  y  y  y  y  y  y  y  y  y  y
barbara e. markuson       y  y  y  y  y  y  y  y  y  y  y  y  y
nancy l. eaton            y  y  y  y  y  y  y  y  y  y  y  y  y
kenneth j. bierman        y  y  y  y  y  y  y  y  y  y  y  y  y
ronald f. miller          a  y  y  y  y  y  y  y  y  y  y  y  y
angie w. leclercq         0  0  0  0  0  y  y  y  y  y  y  y  y
helen cyr                 0  0  0  0  0  y  y  y  y  y  y  y  y
bonnie k. juergens        y  y  y  y  y  y  y  y  y  y  y  y  y
marilyn j. rehnberg       y  y  y  y  y  y  y  y  y  y  y  y  y

key: y = yes, n = no, a = abstain, 0 = absent

instructions to authors

the journal of library automation welcomes manuscripts related to all aspects of library and information technology. some specific topics of interest are mentioned on the masthead page. feature articles, communications, letters to the editor, and news items are all considered for inclusion in the journal. feature articles are refereed; other items generally are not. all material is edited as necessary for clarity or length. manuscripts must be typewritten and submitted in original and one duplicate. do not use onion skin. all text must be double spaced, including footnotes and references. manuscripts should conform to a manual of style, 12th ed., rev. (chicago: university of chicago press, 1969). illustrations should be prepared carefully as camera-ready copy, neatly drawn in a professional manner on separate sheets of paper. manuscript pages, bibliographic references, tables, and figures should all be numbered consecutively.

feature articles consist of original research, state-of-the-art reviews, or comprehensive and in-depth analyses. they may be from ten to twenty-five pages in length. an abstract of 100 words or less should accompany the article on a separate sheet. headings should be used to identify major sections. authors are encouraged to relate their work to other research in the field and to the larger context of economic, organizational, or management issues surrounding the development, implementation, and use of particular technologies. communications consist of brief research reports, technical findings, and application notes. these may be up to ten pages in length; an abstract need not be included. letters to the editor may offer corrections, clarifications, and additions to previously published material, or may be independent expressions of opinion or fact related to current matters of concern in the interest area of the journal. a letter commenting on an article in the journal is shared with the author, and a response from the author may appear with the letter. letters should be no more than three pages in length. news items may announce publications, conferences, meetings, products, services, or other items of note. these should be limited to two pages in length. book reviews are assigned by the book review editor. readers wishing to review books for the journal are invited to contact the book review editor, indicating their special areas of interest and expertise. names and addresses of the journal editors may be found in paragraph three on the masthead page. in all correspondence please include your own name, institutional affiliation, mailing address, and phone number.

nated, volume-oriented, resource-sharing electronic ordering process. for information relative to bisac transmission formats or bisac membership, write to: book industry systems advisory committee, 160 fifth ave., suite 604, new york, ny 10010. for input to bisac purchase order formats, write to: j. k. long, chairman, bisac p.o. subcommittee, c/o oclc, inc., 6565 frantz rd., dublin, oh 43017. (mr.
long is also the library or network representative on the isbn advisory council.) for input to the ansi z39 p.o. transmission formats, write to: mr. e. muro, chairman, subcommittee u, c/o baker & taylor co., 6 kirby ave., somerville, nj 08876. for problems with the isbn and san, write to: mr. emory i koltay, international standard book numbering agency, 1180 avenue of the americas, new york, ny 20036.

microcomputer backup to online circulation

sheila intner: emory university, atlanta, georgia.

our primary objective in purchasing microcomputer systems for the great neck library was to provide a better alternative to paper and pencil checkouts when our minicomputer-based clsi libs 100 automated circulation system was down. two difficult and lengthy downtime periods occurring shortly after going online convinced the administration that public service should not be jeopardized because of system failure. after investigation of the backup systems vended by computer translation, inc.,1 two of them were purchased in november 1980.

computer translation, inc. (cti) sells a turnkey backup system based on an apple ii plus microcomputer, with two mini-disk drives using 5 1/4-inch floppy diskettes, a tv monitor, and a switching system connecting the apple to the libs 100 console and terminals. software designed to interface with the clsi system is part of the package. the backup collects and stores data for check-ins and checkouts and then dumps them into the database by simulating a terminal when the mini-mainframe is operational again. this requires dedicating a terminal to this process until it is complete. it can also be used alone as a portable unit for circulation purposes, or with any of the many applesoft packages available, or with an applesoft program of the user's own design.

our initial experience in great neck was with a borrowed demonstration system, set up by a sympathetic cti representative on the spur of the moment, in tandem with and connected to the main library checkout station's crt laser terminal after several days of downtime. the circulation staff cheered as the familiar prompts appeared on both screens. they used the clsi equipment which they were accustomed to operating, and the computer room staff learned to operate the cti system. the ease with which the apple could be transported to different locations in the building, and the immediate relief it gave wherever it was connected, sometimes one checkout station, sometimes another, led us to put off deciding on a permanent installation at first. we thought it might be more advantageous to keep it on a rolling cart and use it wherever a terminal was down, or wherever the traffic appeared to be heaviest. we continued in this manner for a while even after both of our own apple systems were delivered. it soon became apparent that the apple and its accompaniments, especially the switching system with its dangling cables, were a nuisance at the checkout counter. people with piles of books or records tended to nudge it dangerously close to the edge or jiggle its connections loose. the circulation staff didn't like waiting until someone from the computer room could be spared to bring up the system, secure the connections, and turn on the apple. also, although the apple is a very reliable instrument which has given us negligible downtime, bumpy rides over various floors, carpets, lintels, and textured tiles occasionally loosened its chips and rendered it, too, inoperative.
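the store-and-forward pattern the cti unit implements in hardware can be sketched in a few lines. the following is a purely illustrative python sketch, assuming json files in place of the apple's diskettes and a caller-supplied send routine in place of the simulated terminal; none of the names correspond to the actual cti software, and the exception file simply mirrors the error log described later in this communication.

    import json
    from pathlib import Path

    QUEUE = Path("transactions.jsonl")      # stands in for the apple's diskette
    EXCEPTIONS = Path("exceptions.jsonl")   # transactions the host rejects

    def record_offline(kind, patron_id, item_barcode):
        """while the circulation host is down, append each checkout or
        check-in to local storage instead of sending it."""
        with QUEUE.open("a") as f:
            f.write(json.dumps({"kind": kind, "patron": patron_id,
                                "item": item_barcode}) + "\n")

    def dump_to_host(send):
        """replay stored transactions in chronological order once the host
        is back, as if keyed at a terminal; send is any callable that
        submits one transaction and raises on an error response."""
        if not QUEUE.exists():
            return 0
        replayed = 0
        with QUEUE.open() as f, EXCEPTIONS.open("a") as errs:
            for line in f:
                txn = json.loads(line)
                try:
                    send(txn)
                    replayed += 1
                except Exception as exc:    # set aside for later manual entry
                    errs.write(json.dumps({**txn, "error": str(exc)}) + "\n")
        QUEUE.unlink()                       # queue fully processed
        return replayed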
cti representatives were called in to make a more permanent installation for the apple in our computer room, a simple operation requiring some additional cable. selection of the terminals to be attached as alternate backup or dumping sites was not so easy, however. the choice of the primary backup site was not a problem, since one of the two checkout stations flanking the main door was fairly obvious. but the second terminal, which would be preempted for dumping, was a more difficult decision. dumping sessions vary in length depending on the number of records to be processed and the activity on the rest of the libs 100 system. in our library, we find it takes about an hour to dump 100 to 150 transactions. this appears to be slower than average and may well be due to the extremely high level of system activity. thus, dumping 1,000 transactions would take a full working day. we had been online for such a short time that great backlogs of patron and material data entry from new registrants and unconverted books had developed and were a high-priority item. neither the circulation department, which was handling registrations, nor technical services, which was handling materials, felt they could afford to lose much terminal time for dumping. thus, the reference department's information desk terminal was reluctantly chosen as the alternate terminal, on the grounds that they only did inquiries for materials which borrowers could locate by means of searching the catalog and making trips to the shelves. if necessary, information desk personnel could step across the aisle to the circulation department and use a terminal there.

the permanent installation was set up in this way for one backup system, while the other one remained mobile in the event we wanted to use it at one of our three branches. only the switching box and cables were really unmovable; the apple, drives, and monitor could still be disconnected and moved about at will. experience over the last few months with this arrangement demonstrated that, all things considered, it is unwise to attach two public service terminals to one apple, in spite of the pressure it puts on behind-the-scenes operations, which lose terminal time in the event of an extensive dump. the reaction of the public to being told that a terminal that usually helped them was inoperative has been so negative that it outweighed the delays in data entry. therefore, a change in the current configuration will soon be made.

meanwhile, we realized the second backup system was not being used to greatest advantage. when the libs 100 was down, the next most pressing demand after main library checkouts was checkouts at the largest branch, located near the railroad station. we were collecting about thirty transactions an hour or less at other locations in the main building while the station branch staff were writing down twice that amount or more and explaining to their public that the computer was down. it seemed important to pursue the possibility of connecting one of the station branch's terminals to the second apple while keeping the apple itself in the computer room in the main library. not only was there even less space in the branch for another piece of hardware on their counter, but staff training and hardware control presented a greater problem since many more part-time people were employed there. cti worked on the problem for about two months, resolving it through the addition of a modem to the basic configuration.
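the dump-rate arithmetic reported above is worth making explicit. the following is a minimal sketch in python, assuming only the throughput of 100 to 150 transactions per hour quoted in this communication; the function name is illustrative and is not part of the cti package.

    def dump_hours(transactions, low_rate=100, high_rate=150):
        """return the (best-case, worst-case) hours needed to replay stored
        transactions into the libs 100 at the observed throughput."""
        return transactions / high_rate, transactions / low_rate

    best, worst = dump_hours(1000)
    print(f"1,000 transactions: {best:.1f} to {worst:.1f} hours")  # about 6.7 to 10 hours

at those rates a backlog of 1,000 transactions does indeed occupy a dumping terminal for roughly a full working day.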
in this new installation, which we did ourselves with phone assistance from cti, and which has been operational for three weeks as of this writing, the dedicated phone line connector for the branch terminal is removed from its port on the libs 100 console and inserted into one of the switching box connectors. the apple is turned on as usual and the crt laser terminal at station branch appears to operate normally. in fact, it operates so closely to its usual libs 100 mode that staff members forget they are not online with the libs and call up to find out why inquiries don't work.

we are still experiencing a significant amount of downtime with our libs 100. some of this is attributable to our relatively full storage, requiring us to perform housekeeping routines frequently, but the rest is a result of system failure. now, however, because of the apples, this causes far less anguish in the circulation department. when the libs 100 goes down, the permanently connected backups are switched on in the computer room by their staff and circulation clerks continue checking materials out on their regular clsi equipment in the main library and station branch. on days when housekeeping chores are scheduled, the console operator's job includes turning on the apples so we can begin serving the public when the doors open at 9:00 a.m. unless downtime persists for more than a day, no other routines are done except checkouts. under some circumstances, certain materials might be checked in on the apple, but it is not desirable to do this for newer materials on which holds may have been placed.

when the libs 100 is online again, the checkout station is switched back to normal mode and the apple takes over the information desk's port for dumping, rendering that terminal inoperative. dumping continues around the clock until all transactions have been processed from both apples. normal activities proceed at all other terminals. diskettes are dumped in chronological order. as the dumping process operates, a file of transactions eliciting error or exception messages from the libs 100 is created on the apple diskette. this file is available for attention at a later time for manual entry into the database.

the chief asset of the dumping process is the accuracy achieved by automatic inputting. when we used paper and pencil, not only was the original writing time-consuming, but manual data entry was difficult because of illegible handwriting, inaccurate transcription of the numbers, inaccurate inputting into the database, and lack of available personnel for the job. the cti system resolves all of these difficulties, but a price is paid in the loss of the dumping terminal's services. the public may be less disturbed if a terminal in a nonpublic area is used. but to the department involved, access to the database is a central part of their work and its loss severely limits their output. in fact, dependence on the automated circulation system by all departments in the library has been swift and universal, even though we originally assumed the terminals outside the circulation department would be used sparingly.

plans are being made to store personnel records in machine-readable form on diskettes. other developments are being put on a back burner until we have less frequent need for the apples as backups. however, levels, great neck library's youth department, has several apples of its own on which budding "computerniks" practice their art.
for them there are few limits to possible applications, perhaps only the outermost boundaries of imagination.

reference

1. joseph covino and sheila intner, "an informal survey of the cti computer backup system," journal of library automation 14:108-10 (june 1981).

computer-to-computer communication in the acquisition process

sandra k. paul: skp associates, new york city.

in the 1970s, we entered the period of computer-to-computer communication; we now appear to have reached the second stage of development. today more than seventy publishers are equipped to receive computer tape orders and input them directly to their order fulfillment systems; twenty-six publishers can produce computer invoices and credits for their customers; six are capable of sending monthly updating information about titles, prices, publication dates, and books declared out of print. all of this, however, is based on a system through which computer tapes are sent from buyer to seller and back via the united states mail. the next step, computer-to-terminal or computer-to-computer communication, is just around the corner.

historical perspective

how did this happen? it started in september 1974 when dewitt c. ("bud") baker, newly appointed president of the baker & taylor company, envisioned the savings his company could find if their customers provided the international standard book number (isbn) on their orders. he also believed that the volume of paper created by the computer was expensive and time-consuming for publishers to handle.

journal of library automation vol. 5/2, june 1972

analysis of search key retrieval on a large bibliographic file

gerry d. guthrie and steven d. slifko: research & development division, the ohio state university libraries, columbus, ohio

two search keys (4,5 and 3,3) are analyzed using a probability formula on a bibliographic file of 857,725 records. assuming random requests by record permits the creation of a predictive model which more closely approximates the actual behavior of a search and retrieval system as determined by a usage survey.

introduction

systems planners are hard pressed to accurately predict the access characteristics of search keys on large on-line bibliographic files when so little is known about user requests. this paper presents a realistic model for analyzing different search keys and, in addition, the results are compared to actual request data gathered from a usage survey of the ohio state university libraries circulation system. a number of papers are available in the literature concerning search key effectiveness; however, all of these were done on relatively small data bases (1-5). of particular importance to this paper is kilgour's article on truncated search keys (6).

purpose

the purposes of this study are (1) to determine the comparative effectiveness of the 4,5 and 3,3 search keys, (2) to compare two predictive models, and (3) to test the results with an actual usage survey.

method

the ohio state university libraries circulation system contained at the time of this study 857,725 titles representing over 2.6 million volumes in the osu collection. the data base used for this study was the search key index file, which contained one search key for each title in the master file. the search key is composed of the first four letters of the author's last name and the first five letters of the first word of the title excluding nonsignificant words (4,5 key).
title words are passed against a stop-list to determine significance. the stop-list contains the words: a, an, and, annual, bulletin, conference, in, international, introduction, journal, of, on, proceedings, report, reports, the, to, yearbook. the search key file is in sequence by search key. for comparative purposes, a second search key file was created and sorted which contained a 3,3 key (the first three characters of the author's last name and the first three characters of the first significant word of the title). the two files of sorted search keys were then processed by a statistical analysis computer program. this program created a frequency distribution table of identical keys, i.e., how many keys were unique, duplicated once, duplicated twice, etc. from this table two models were compared.

model 1: file entry was viewed as a random process with the choice of any unique search key equiprobable. this model has been suggested in the literature mentioned earlier. it states that if x_i is the number of keys that will return i matches, then the probability of a file search returning i matches may be written

p(i) = x_i / K_u

where K_u is the total number of unique file keys. likewise, the cumulative probability for I or fewer matches is

P(I) = \sum_{i=1}^{I} p(i) = ( \sum_{i=1}^{I} x_i ) / K_u

model 2: file entry is viewed as a random process with the choice of any record equiprobable. thus,

p(i) = i x_i / R_T

where R_T is the total number of file records. correspondingly,

P(I) = \sum_{i=1}^{I} p(i) = ( \sum_{i=1}^{I} i x_i ) / R_T

survey: the ohio state university libraries automated circulation system includes a telephone center to which patrons may telephone requests for library holdings information and for checking out and renewing books. telephone operators, sitting at cathode ray tube (crt) terminals, translate the patron's author-title request into a 4,5 search key and proceed with a file search. by having the telephone operators treat telephone calls as random input to the system and recording the number of matches returned for each search used, results can be generated in the same form that both of the models take, i.e., I or fewer matches have been returned P(I) x 100 percent of the time. this is a relatively easy survey to conduct since the output list of matching records for any particular key entry is headed with the exact number of matches which follow. the sample size was 1000 information requests recorded over two one-week periods separated by one month. before these two subsamples were merged, statistical analysis on their individual means (for percent of 10 or fewer matches) signified they were identical at the 99 percent confidence level.

results

the results predicted by the two models for both a 4,5 and a 3,3 search key for 1-10 matches appear in tables 1 and 2. the figures pertaining to the 4,5 key can be compared directly to the data received from the survey conducted through the osu library's telephone center. this comparison is shown in table 1 for 1-10 matches.

table 1. file access comparisons (4,5 search key).
(percent of time I or fewer matches returned)

 I    actual survey    model 1 (random key)    model 2 (random record)
 1        35.9               81.3                     55.7
 2        53.8               92.9                     71.6
 3        66.0               96.3                     78.5
 4        73.1               97.7                     82.4
 5        78.5               98.4                     84.9
 6        81.3               98.8                     86.6
 7        83.8               99.1                     87.8
 8        85.6               99.3                     88.8
 9        86.6               99.4                     89.6
10        87.8               99.5                     90.2

to acquire a 99 percent upper confidence limit on the percent of requests returning 10 or fewer matches, the normal distribution was used as an approximation to the binomial distribution (n = 1000, p = .878), producing an upper limit of 90.2 percent.

table 2. file access comparisons (3,3 search key). (percent of time I or fewer matches were returned)

 I    model 1 (random key)    model 2 (random record)
 1          64.3                     28.0
 2          81.0                     42.5
 3          87.9                     51.7
 4          91.6                     58.0
 5          93.7                     62.7
 6          95.1                     66.3
 7          96.1                     69.3
 8          96.8                     71.8
 9          97.3                     73.9
10          97.7                     75.7

discussion

in table 1 the results of the survey show that 87.8 percent of all searches recorded returned 10 or fewer titles. in model 1, assuming that requests of the file are random with respect to search key, it is predicted that 99.5 percent of all searches will return 10 or fewer titles. all predicted percentages for model 1 are consistently higher than observed results. the predicted response in model 2 more closely approximates the observed behavior of the system as the number of responses increases. however, model 2 is also consistently higher than the actual survey. comparing model 1 and model 2 only, it is apparent that assuming a random record request more accurately reflects the true usage of a library collection. the lower percentages recorded in the actual survey may be attributable to a number of variables not taken into consideration in this study. clustering due to common english word titles and common names may account for the greater part of this difference. table 2 shows the results of predicted response for a 3,3 search key. in this table, model 2 predicts that only 75.7 percent of requests will return 10 or fewer titles. equally important, only 28.0 percent of the requests will return a single record.

conclusion

in predicting the expected behavior of an information retrieval system, it is more accurate to assume random requests by record than to assume random requests by search key. probability predictions are deceptively high for assumed random key requests and do not reflect actual usage of the file. even assuming random requests by record will produce higher-than-observed results. data calculated using model 2 should be considered as an upper limit or "ideal" performance indicator. regarding the results of the random record model as the upper limit on effectiveness of the search key, the data gathered indicate that, as the search key is shortened from 4,5 to 3,3, the deviation between the random key and random record models is considerably heightened. the 4,5 search key is more efficient for retrieval of 10 or fewer records from a large file than the 3,3 key (90.2 versus 75.7 percent). based on these data, the osu libraries decided to retain the 4,5 search key and not reduce it to 3,3. additional studies should be undertaken to determine the effects of common word usage, common names, and their relation to book usage. secondly, the data presented here could be systematically and randomly reduced in size to predict the behavior of various search key combinations on varying file sizes.

references

1. philip l. long and frederick g.
kilgour, "a truncated search key title index," journal of library automation 5:17-20 (mar. 1972 ). 2. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l. landgraf, "title-only entries retrieved by use of truncated search keys," journal of library automation 4:207-10 (dec. 1971 ). 3. frederick g. kilgour, "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science 5: 133-36 ( 1968) . 4. frederick h. ruecking, jr., "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," j ournal of library automation 1:227-38 ( dec. 1968). 5. william l. newman and edwin j. buchinski, "entry / title compression code access to machine readable bibliographic files," journal of library automation 4:72-85 (june, 1971 ). 6. frederick g. kilgour, philip l. long, and eugene b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7:79-81 ( 1970). .. starr ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ 
the new york public library automated book catalog subsystem

s. michael malinconico: assistant chief, systems analysis and data processing office, and james a. rizzolo: chief, systems analysis and data processing office, the new york public library.

a comprehensive automated bibliographic control system has been developed by the new york public library. this system is unique in its use of an automated authority system and highly sophisticated machine filing algorithms. the primary aim was the rigorous control of established forms and their cross-reference structure. the original impetus for creation of the system, and its most highly visible product, is a photocomposed book catalog. the book catalog subsystem supplies automatic punctuation of condensed entries and contains the ability to produce cumulation/supplement book catalogs in installments without loss of control of the cross-referencing structure.

background

in 1965 studies confirmed what much of the new york public library's administration had long felt: the public card catalog of the research libraries, containing entries dating back to 1857, was rapidly deteriorating.1 it was estimated that 29 percent of the cards were illegible, damaged, or in some other way unusable. further, cataloging and card filing arrearages were monotonically increasing at an alarming rate. increases in labor costs were eroding all efforts to cope with these problems manually. in addition, the deputy director at that time (now director), john m. cory, realized that a wider base of support was absolutely essential to the survival of the new york public library as an institution. as a result of these disquieting observations, three logical conclusions followed. first, the existing card catalog would have to be closed off, rehabilitated, and photographically preserved. second, available technology should be explored as a possible solution to some of the spiraling arrearage problems. in particular the applicability of computer technology was to be explored. this exploration appeared to offer some most attractive long-term solutions. the capture of all future cataloging in a machine-readable form would obviate for all time the deterioration problem.
this strategy could also provide a basis for a check against spiraling costs, since traditionally unit costs have tended to increase in manual and decrease in automated systems.2 seen within the context of the marc project at the library of congress (lc), the economies were becoming manifestly obvious. the long-term benefits to the entire library community of a national network of shared machine-readable bibliographic data could not be denied. capture of data in machine-readable form for use by information retrieval systems which might become economically feasible in the near future had to be viewed as a matter of great value. third, wider access to the resources of the new york public library had to be provided if a wider base of support for the library's operation was to be sought.

the solution decided upon was the development of an automated bibliographic control system capable of producing photocomposed book catalogs. the book catalog would then serve as the prospective catalog and augment the retrospective card catalog, which would also appear in book form following photographic duplication of the cards.3 this solution, at one stroke, addressed itself to all three of the major problems, and showed great promise as a future investment. reproducible book catalogs could be widely distributed. a machine-based system would eliminate manual filing, would take full advantage of cataloging available from marc, and would begin at the earliest possible time the establishment of an invaluable machine-readable bibliographic data base. photographic techniques had already been employed in producing book catalogs, e.g. the national union catalog, the book catalog of the free library of philadelphia, and the enoch pratt free library catalogs, among others.4 computer-produced book catalogs embodying various techniques (computer line printing, photo-typesetting, etc.) and levels of sophistication were being produced by many institutions, e.g. harvard university's widener library shelflist, stanford university's undergraduate library catalog, and baltimore county public library's catalog, among others.5-7 an extensive review of various types of book catalogs, including typical pages of each, is given by hilda feinberg.8

following extensive studies conducted by messrs. henderson, rosenthal, and nantier of the nypl research libraries, the systems analysis and data processing office (sadpo) was formed, staffed by edp and library specialists, to be completely dedicated to the solution of problems of automated bibliographic control and library automation. from the beginning it was decided that if edp technology were to be utilized, it should be utilized in a manner which took full advantage of the properties of the medium. the computer was not to be used as an ultrasophisticated and costly printing press. the application of new technology to a field will invariably lead to waste and awkward results if the intrinsic properties of the technology are not fully utilized. the fundamental properties of edp technology lie in its abilities to:

1. reorganize and combine data;
2. select items meeting a set of predefined conditions;
3. maintain a permanent but flexible correlation between items;
4. transform a set of conditions into data;
5. perform all of the above with remarkable speed and accuracy;
6. perform all operations with a merciless consistency.
thus, it was realized, at the outset of the project at nypl, that technology could provide a great deal more than the maintenance of a machine-readable record and its reorganization for display. a rigorous control of bibliographic data was possible, and would extract maximum utility from any investment in edp technology. it was with these ideas in mind that machine-based authority control and filing systems were developed. the authority control file provides the fundamental utility of the system. control of data usage has always been of paramount concern to the professional bibliographer. it becomes even more important in a machine-based system in which the data lie in an essentially invisible form until a fairly complex display operation is performed.

advantages of an authority file

another bibliographic aid which the computer could provide through an authority control system was the maintenance and integrity of a cross-reference structure. in addition, one of the classical functions of cross-referencing could be eliminated: it would no longer be necessary to direct a user from one classification which has been used extensively to a newer one when terminology changes. consider the problems which might arise if the library of congress were to change its current usage of the heading aeroplane to airplane. it would be virtually impossible, under a manual system, for a library to attempt to locate, alter, and refile all cards bearing the tracing aeroplane. with a central authority file the problem is reduced to a single transaction and a fraction of a second of effort by the computer. the change is effected with an accuracy unattainable in a manual system. finally, the common nuisance of a cross-reference leading to yet another cross-reference is automatically obviated.

the presence of a machine-readable authority file and the ability to verify use of all forms against this central authority, with machine accuracy, eliminates all clerical errors in the usage of names and headings to which a manual system is susceptible. the problem of consistent usage is greatly compounded in a machine-based system which does not provide mechanical verification. inconsistencies in any automated system generally tend to diminish its utility, and invariably lead to ludicrous results. nonetheless, inconsistencies of usage in an automated system are more readily corrected than those in a manual system. the existence of a central authority file, however, reduces the operation to maximum simplicity and allows no deviation from established standards. while maximum rigor in machine control was attempted, an attempt was also made to shield the professional librarian, who would be using the system, from as much of the tyranny imposed by the machine as possible. in the system finally adopted, the librarian need only exercise care when establishing a form. following establishment of the form, the cataloger need not be concerned with any of the details of the entry, such as punctuation, accent marks, marc delimiting, or categorization. the authority subsystem supplies all such details. in short, the cataloger is only required to spell the form correctly. the machine will identify any incorrect usage; thus a great deal of tedious and time-consuming (and thereby costly) manual searching is eliminated.
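the kind of verification the authority file makes possible can be suggested by the following much-simplified sketch in modern python. the class and field names are ours, and the nypl system itself was written in 360 assembler against tape and disk files, not in python; the sketch only illustrates the two ideas described above: a proposed heading is accepted only if it matches an established form with a compatible category, and a global change of terminology is a single transaction against one authority record.

    # illustrative model of authority verification; all names are ours.
    class AuthorityFile:
        def __init__(self):
            # established form -> (control number, category)
            self.forms = {}
            # variant form -> established form ("see" references)
            self.see_refs = {}

        def establish(self, form, number, category, variants=()):
            self.forms[form] = (number, category)
            for v in variants:
                self.see_refs[v] = form

        def verify(self, proposed, use_as):
            """accept a heading for a bibliographic record only if it matches an
            established form and is used consistently with its category."""
            if proposed in self.see_refs:
                raise ValueError(f"use established form {self.see_refs[proposed]!r}, not {proposed!r}")
            if proposed not in self.forms:
                raise ValueError(f"{proposed!r} is not an established form")
            number, category = self.forms[proposed]
            if category != use_as:
                raise ValueError(f"{proposed!r} is categorized as {category}, not {use_as}")
            return number          # only the control number is carried in the record

        def rename(self, old_form, new_form):
            """a change such as aeroplane -> airplane touches one record only;
            bibliographic records keep just the control number, so nothing else moves."""
            self.forms[new_form] = self.forms.pop(old_form)
            for variant, target in self.see_refs.items():
                if target == old_form:
                    self.see_refs[variant] = new_form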
at the same time that work began on the automated system at nypl, extensive activity in library automation was also in progress in many other parts of the country, involving virtually all areas of library operation: cataloging, acquisitions, serials control, circulation, and reference services (information retrieval). since, at nypl, it was assumed that the bibliographic data base and its control would form the cornerstone of each of these systems, cataloging was given first priority. this approach differed from that taken at other institutions; others, columbia university for example, chose to develop an acquisitions system first.9 still others developed highly sophisticated circulation systems, ohio state university being notable among these.10

even among those institutions which chose to address themselves to the problems of automated cataloging, important differences in approach were evident. these differences were largely a result of attempts to solve different types of problems related to cataloging. among the many projects initiated at that time two will be mentioned, as they are representative of the differences in approach to automated cataloging. the first is represented by the university of california union book catalog project, undertaken by the institute of library research (ilr). this system is characterized by an attempt to minimize, via computer programming, manual intervention in data preparation. employing the technique of automatic format recognition, the ilr staff attempted to find the most economical means of rendering a vast amount of retrospective data into machine-readable form.11 in converting such a large amount of data they had also to concern themselves with the statistical error levels to be expected from keying. having decided that extensive manual edit was too time-consuming and costly, and itself prone to statistical error, they attempted to create computer programs which would use the massive amounts of data as a self-editing device. in a sense, ilr used the nature of the problem as its own solution. the goal of the project was the production of a book catalog representing a five-year cumulation (1963-1967) of materials on the nine university of california campuses, and a marc-like data tape clean enough for print purposes. nypl, on the other hand, decided to consider only prospective materials in a continuously published catalog, and the creation of a marc-like record which would approach in completeness, as closely as was economically feasible, that created by the library of congress. to this end manual tagging and editing were absolutely essential.

the second system to be considered is the shared cataloging system developed by the ohio college library center.12 the primary emphasis here is on the economy to be derived from instantaneous access to the combined cataloging efforts of a cooperating group of libraries. at oclc the primary emphasis was placed on on-line bibliographic data input and access. the major bibliographic product to be produced was a computer-printed card set. the overriding consideration of oclc was the sharing of resources among many users, while at nypl the major concern was the content integrity of a single user's file.

advantages of a book form catalog

a book form catalog has several advantages over a card form catalog: it is portable, compact, more readily scanned, and extremely simple to reproduce.
when coupled with an automated system for maintenance and production the advantages are greatly magnified, as manual filing is virtually eliminated. the format, sequencing, and usage of terms in a book catalog may be varied at will to accommodate users' needs and library service policies. advantages and disadvantages of book catalogs are summarized in the introduction to tauber and feinberg's collection of articles on book catalogs.13 comparisons of book versus card catalogs are presented by catherine macquarrie and irwin pizer in articles reprinted in the work cited above.14, 15

the most obvious advantage of the book catalog is its portability. wide availability of the catalog of a library's collection makes possible a level of service not economically feasible under any other system. access to the complete collection of a library system can be made available economically to every educational institution in the region served by the system. access to a highly valuable research collection can be made available to a much wider geographic region than was hitherto possible. the concept of a union catalog for a region becomes much more viable, making possible regional cooperation in acquisitions policies and relieving the burden of heavily duplicated collections currently borne by library systems within manageable geographic regions. such cooperative ventures allow the cost of maintaining the catalog to be defrayed among the various members of the consortium. thus, a book form catalog would appear to provide groups of libraries with the possibility of operating economies, while increasing the overall level of service to the public they serve.

the utility of a book form union catalog has already been demonstrated by the experience of the mid-manhattan libraries in new york. mid-manhattan, a central circulating library, consists essentially of five libraries in two locations. provision of complete bibliographic access with a traditional card catalog would require the manual maintenance of five individual and two union catalogs. the utility of the mid-manhattan catalog has been further increased with the inclusion of the entire nypl branch library system in january 1973.

a library's internal operation benefits by wide availability of the catalog, as individual copies of the catalog can be made available to the acquisition division, the cataloging division, and each of a library's special collection administrators, making references to the traditional official catalog more efficient; such has been the experience of nypl. baltimore county public library reports a similar finding.16

a perhaps hidden advantage of a book catalog lies in its compactness. a book catalog requires neither the space nor the expensive furniture required by a card catalog. the problem of space becomes more and more acute as the "information explosion" continues to mushroom. an ironic squeeze is encountered in that the collection yearns more and more for the space occupied by the catalog, while the catalog, in growing, continues to make its own demands on available space.

description of the nypl bibliographic system files

before attempting to describe the book catalog subsystem, we shall briefly describe the nature of the files from which the bibliographic data are drawn. the complete bibliographic system consists of four major files and computer programs for their control and maintenance. (footnote: the system actually consists of three independent sets of such files, marc being common to all: one each for the research libraries, the branch libraries, and the dance collection.) the files are:
1. complete marc data base (updated weekly with all changes and additions) from which cataloging may be drawn;
2. bibliographic master file;
3. authority master file;
4. bibliographic/authority linkage file.

for the purpose of this discussion we shall take the existence and maintenance of these files for granted, and concern ourselves solely with their use in the production of photocomposed book catalogs.

bibliographic master file

this file contains unit records for each bibliographic item in the collection; books and book-like materials, monographs, serials, analytics, and indexing items are included. (footnote: separate data bases are maintained for the research and branch library systems. the research libraries file contains all book and certain book-like material added to its collections since january 1971. the branch libraries' file contains all holdings of books and book-like materials of the mid-manhattan library collections. this file currently duplicates to a large extent the holdings of the rest of the branch system, and will eventually encompass the entire system.) (footnote: at nypl a distinction is made between analysis and indexing of a work. the latter refers to selective analysis, used when it is desired to provide, for example, subject access to a significant article in a periodical without creation of the series added entry. there are two types of indexing provided by the nypl system. the first creates only a subject tracing; such treatment might be accorded an article of topical significance by a staff writer of a popular periodical. the second would create both an author and subject entry; this might be used in the case of an author of note writing on a significant subject in a popular periodical, e.g. norman mailer writing on political conventions for esquire magazine.)

the information content is identical to that of marc records. tagging and delimiting adhere to the marc conventions except in those cases in which it was necessary to expand delimiting in order to enhance the functional utility of the marc coding structure. some data distinctions which marc has since dropped, but which are nonetheless useful, have been retained. the expansions consist of the addition of several delimiters not used by marc in order to provide filing forms (which are automatically generated, but which may be manually overridden) for titles, and sequencing information for volume numbers of series and serials. transformations from a marc ii communications format to the nypl format and vice versa are possible due to the isomorphism of the two records. the transformation of marc ii format records into nypl processing format is carried out in the normal course of processing, in which marc records are selected for addition to the nypl files.

authority master file

this file is the central repository of all established forms. names (personal, corporate, and place), series titles, uniform titles, conventional titles, and topical subject headings are all established on this file. categorization of each form is controlled by this file. no form is accepted for use in a bibliographic record unless it matches a form already established on the authority file, and is used consistently with the categorization assigned to it, e.g. a form categorized as a topical subject is never permitted as an author, a series title may only match a form categorized as a title, etc. the cross-reference and note structures are maintained on this file. an additional heading employed by nypl, which falls conceptually halfway between a cross-reference and a subject heading, the dual entry, is also controlled here. the dual entry heading serves to bring together, under a non-lc heading, bibliographic items which nypl considers unique by virtue of the nature of its collection. an example might be found in the genealogy division, which contains a very extensive collection dealing with new york city.
use of the dual entry allows a sequencing under both a subject heading indirectly regionalized to new york city (the lc heading) and, at the same time, a drawing together of all items about new york city into a single sequence headed by new york city. take, for example, the lc established heading elections - new york (city); nypl automatically causes all items traced to the above heading to appear under both the lc heading and the dual entry new york (city) - elections (figure 1). the dual entry merely provides an alternate form of organization for display. no bibliographic tracing is permitted directly to a dual entry. the additional entry point is automatically created when a catalog is printed. manual effort by the cataloger in order to provide the additional entry point is prevented; in addition, the bibliographic record remains rigorously marc-compatible. automatic control of cross-references, dual entries, and the en masse alteration of classification are facilitated by the authority subsystem together with the correlative and reorganizational capabilities of the computer. there is some irony in the relative ease with which the computer allows such individualized organization of data to be effected and the computer's reputation, richly deserved, for imposing a bland uniformity on its victims.

[figure 1 (reproduced catalog pages omitted): the nypl research libraries dictionary catalog, july 1972, pages 201 and 297; dual entries under new york (city) are shown on the right-hand page. this catalog was produced in 6 and 8 pt. type set on an 8 pt. body.]
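a sketch of the dual-entry idea, in our own simplified notation, may help; the field names and the string layout of headings below are ours, not the nypl record format.

    def expand_entries(lc_heading, dual_entry_place):
        """return the display headings generated for one tracing.

        the bibliographic record is traced only to the lc heading, e.g.
        "elections - new york (city)"; if the authority record carries a dual-entry
        flag for "new york (city)", a second, transposed heading is produced when
        the catalog is printed."""
        entries = [lc_heading]
        topic, _, place = lc_heading.partition(" - ")
        if place == dual_entry_place:
            entries.append(f"{place} - {topic}")
        return entries

    # expand_entries("elections - new york (city)", "new york (city)")
    #   -> ["elections - new york (city)", "new york (city) - elections"]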
the authority file provides one other invaluable service: it controls, in a single location, filing forms to be associated with a heading. consistency of filing is assured and, again, extreme simplicity of alteration is possible. only one record need be changed in order to alter the filing of the entire body of material associated with a heading. filing forms are automatically generated, with provision made for a manual override. automatic filing has been found to be correct in better than 95 percent of the cases currently in use; the remaining 5 percent required manual intervention. the machine filing algorithms are based on language and on marc categorization and delimiting.18 initial articles are dropped in each of thirty-eight languages, including the major languages transliterated into a romanized alphabet (those employing cyrillic alphabets, oriental languages, hebrew, and yiddish). chronological subdivisions are filed automatically, observing rules regarding inclusiveness of dates, etc. important chronological periods (currently fifty-four such periods) are recognized and filed automatically, e.g. american revolutionary and civil wars, french revolutions, chinese dynasties, middle ages, etc. roman enumeration is automatically filed in correct decimal sequence.

bibliographic/authority linkage file

the basic function of the bibliographic/authority linkage file is to provide a communications channel between the two major files by assigning to each authority form a neutral unique number. the linkage file then provides access to the established form regardless of the metamorphoses which it may have undergone since its original use (the number remains inviolate). each authority, upon addition to the file, is assigned a unique number; however, the authority file is sequenced by an alphabetic sort key. this sort key bears no logical relationship to the filing form of the heading; it is constructed by dropping punctuation and accent marks, converting to upper case, dropping multiple blanks, and appending a hash total. the linkage file maintains the correspondence between authority control number and alphabetic sort key. only the authority control numbers, determined by the first bibliographic/authority file match for each field, are carried in the bibliographic records. in addition, information is provided to the book catalog subsystem regarding changes in the authority file (alteration of established forms, etc.) which would cause an entry exhibiting such alterations to be immediately regenerated for inclusion in a book catalog supplement. appropriate action is taken against the bibliographic file when activity to an authority heading is sensed by the book catalog subsystem. the presence of a dual entry form, which will require the creation of an additional entry under the associated variant form, is also indicated here.
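the sort-key construction just described can be sketched as follows; this is an illustration only, and in particular the hash total used here (a simple sum of character codes, modulo 1000) is our own stand-in, since the article does not specify nypl's algorithm.

    import unicodedata

    def alphabetic_sort_key(heading):
        """drop punctuation and accent marks, convert to upper case, collapse
        multiple blanks, and append a hash total of the result."""
        # strip accents by decomposing and discarding combining marks
        decomposed = unicodedata.normalize("NFKD", heading)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        # keep letters, digits, and blanks only; fold case
        kept = "".join(c if c.isalnum() or c == " " else " " for c in stripped).upper()
        collapsed = " ".join(kept.split())
        hash_total = sum(ord(c) for c in collapsed) % 1000
        return f"{collapsed}#{hash_total:03d}"

    # the linkage file then pairs this key with the permanent authority control number:
    # linkage = {alphabetic_sort_key("Élections -- New York (City)"): 104217}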
alternative input files

it should be mentioned that the full set of files described above is not a mandatory requirement for creation of a book catalog. a bibliographic file in a marc ii communications format alone will suffice. we have performed tests using both another library's data file and the marc file as sole input to the system. using unmodified file update software we have generated from these marc ii format data bases complete authority files, and thence book catalogs. no cross-references or scope notes are possible in this mode of operation, since marc makes no provision for them. a further experiment was performed using another library's data base (in marc ii format) in combination with the cross-reference structure of the nypl authority file. this led to highly satisfactory results, demonstrating that a photocomposed catalog could be created, and exhibiting the utility of the input file enhanced by cross-references. (footnote: the september 1972 hennepin county public library book catalog was published with a bibliographic data base produced by the hennepin county library combined with the nypl research libraries' authority file.)

the photocomposed book catalog subsystem

the system for production of book catalogs represents only the visible tip, albeit a large and complex tip, of the entire bibliographic system. it consists, in all, of ten computer programs and several score modules. the system was designed with thought toward production of catalogs with a variety of output options. in most cases, these options can be attained by the elimination of entire programs or modules. space does not permit a consideration of all possible variations; the most important will be mentioned in the course of the discussion. one consideration which was deemed of paramount importance was to remain as independent of photocomposition hardware as possible. photocomposition is yet in its infancy; hence, an inextricable commitment to a particular device, it was decided, was to be avoided. the final approach taken was the design, by sadpo, of generalized photocomposition software which is responsive to device-independent typographic commands. the only function of this software is to accept, as input, completely defined text data and typographic instructions from which it generates formatted pages. this task is accomplished via a translation of device-independent into device-particular commands in the form of a photocomposition device driver tape. should a new or more desirable photocomposition device become available, or significant advantage be found in employing a different photocomposition vendor, only one program need be altered. the photocomposition software is completely generalized and can be used to generate anything from book catalogs to typeset prose, in virtually any format (see the section on the pagination program for a discussion of the formatting options provided). figures 2, 3, and 4 demonstrate some of the possibilities. the creation, organization, and control of data to appear in the catalog was undertaken as a completely distinct set of programming tasks.

design objectives of the book catalog system

before embarking upon a discussion of the technical aspects of each processing step, we shall state the objectives which we set out to meet, and the constraints (generally economic) under which they were met.
[figure 2 (reproduced catalog page omitted): the nypl research libraries dictionary catalog supplement, november 1972, a-z page 1; produced in 6 and 7 pt. type set on a 7 pt. body in a three-column format, with captions in 8 pt. type.]

[figure 3 (reproduced catalog page omitted): the mid-manhattan names catalog, april 1972, page 20; a divided catalog produced in a two-column format utilizing 6 and 8 pt. type set on an 8 pt. body.]
[figure 4 (reproduced catalog page omitted): the mid-manhattan titles catalog supplement, july 1972. this page was created as a test utilizing 4 pt. type on a 4 pt. body; the actual supplement created for use by the public was set in 6 and 8 pt. type.]

method of publication

as it is economically impractical to publish the entire catalog on a very frequent basis, a cumulation/supplement scheme was adopted.
two basic types of supplements are possible: (1) a supplement containing only new items for the period represented; or (2) a cumulative supplement containing all items new to the system since the last appearance of a cumulation of the entire collection, automatically replacing all previous supplements. the latter is more costly than the former. the economic desirability of the former was eschewed in favor of convenience to the user. under the scheme adopted, a user has, at any time, only three sources to consider: the retrospective catalog, the prospective cumulation, and the cumulative supplement. (footnote: all material in the card catalog has become known as the retrospective collection, and all material entered into the automated system after january 1972 has become known as the prospective collection.)

we have derived several optimization formulae for reaccumulation schedules.19 application of these formulae indicated a reaccumulation cycle of approximately one year, assuming that supplements would appear monthly. the formulae also indicated that a small premium would have to be paid for the administrative convenience of spreading the printing and processing load of the cumulation over the span of the entire reaccumulation period, compared to the cost of a complete printing at the beginning of each period. the adopted publication scheme calls for the publication each month of one-twelfth of the cumulation, together with a supplement containing all items which have not yet appeared in the cumulation and those which have been altered since their appearance in a cumulation. the division into twelve segments is table-controlled; the number of segments may be varied from one to sixteen. for example, in january a cumulation is published for the alphabetic span a-b; a supplement is published for the remaining letters of the alphabet. a similar situation would occur the following month, etc. thus, at any given time the public is presented with a set of volumes representing the cumulated catalog and a supplement which contains all material not found in the former. the public is unaware of the fact that the cumulation is being cyclically updated. they are only aware of the fact that they have no more than three sources to consult: (1) the old card catalog, (2) the basic cumulative book catalog, and (3) the cumulative supplement. the fact that entries are migrating from the supplement to the basic cumulation each month is of no consequence from the standpoint of catalog usage.

the decision governing representation of an item in a cumulation or supplement is made on an entry-by-entry basis. for example, one of the subject added entries may have migrated into the cumulation; hence, it will no longer appear in a supplement. however, the main, and all other added entries, falling into different filing ranges, will continue to appear in a supplement until they too can be absorbed into the cumulation. similarly, alterations to a bibliographic record will cause only those entries whose text or sequencing is affected to reappear in a supplement. a change to or an addition of a subject tracing will cause only that subject added entry to be regenerated for inclusion in a supplement. the main, and all other added entry citations, which remain unaltered, need not reappear in a supplement (assuming they have previously migrated into the cumulation).
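the cycling scheme can be sketched as follows. the alphabetic spans in the table below are illustrative only (the article gives just the january example, a-b), and the real segment table was a parameter of the system, variable from one to sixteen segments; the entry dictionary keys are likewise ours.

    SEGMENTS = ["a-b", "c-d", "e-f", "g-h", "i-j", "k-l",
                "m-n", "o-p", "q-r", "s-t", "u-v", "w-z"]   # twelve monthly segments

    def this_months_publication(month_index, entries):
        """split entries into the cumulation segment republished this month and the
        cumulative supplement (everything not yet absorbed, or altered since)."""
        segment = SEGMENTS[month_index % len(SEGMENTS)]
        cumulation, supplement = [], []
        for e in entries:
            if e["segment"] == segment:
                cumulation.append(e)              # absorbed into the recumulated segment
            elif not e["in_cumulation"] or e["altered"]:
                supplement.append(e)              # still migrating, or changed since migrating
        return segment, cumulation, supplement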
condensed added entries

in order to keep printing costs to a minimum, all added entries are condensed; title page extension, publisher, and bibliographic notes do not appear under any of the added entries, the assumption being that the user who is interested in such data will take the trouble to refer to the main entry, which contains the complete bibliographic citation. this type of back-and-forth reference, while quite awkward in a card environment, is extremely simple in a book catalog. economic considerations also led to the decision to suppress tracings from the main entry. the system was designed so that these decisions would not be irreversible: the choice of data which are to appear with an entry is governed by a set of tables which may be readily altered should it be desired to change the format or content of an entry. punctuation of condensed entries is accomplished automatically. this is not a trivial problem, and one that only a cataloger can truly appreciate. consider, for a moment, the myriad ways in which bracketing may occur within the title or imprint statement, and the ways in which these may span the two fields. add to these factors the rules which do not permit the appearance of double punctuation. we have found that punctuation of added entries is effected correctly in 98 percent of catalog entries. in those instances in which ala punctuation rules are observed in the complete record, correct punctuation is assured (this is not true of cataloging obtained from european sources).

control of cross-references

it is in the realm of cross-references that the mindless consistency of the computer is most effectively employed. the goal to which we addressed ourselves was the absolute integrity of cross-referencing. under no circumstances (short of erasing a cross-reference from a previously published catalog) were cross-references to refer the user to a heading which did not have an associated bibliographic citation. all meaningful cross-references providing alternate access points to a citation must appear. by the same token, in order to minimize costs, cross-references which appear in a cumulation available to the public are not to be repeated in a supplement. cross-references to a heading would be considered valid entry points to the catalog when bibliographic citations appear under a subdivision of that heading. for example, the appearance of bibliographic citations under negro art - exhibitions would cause all cross-references to negro art to be generated (figure 5). the same rules concerning appearance in supplements and cumulations are observed for these secondary cross-references. alterations to cross-references which have appeared in a cumulation will cause the altered forms to reappear immediately in a supplement, provided the referenced heading is still in use in the catalog. similarly, alteration of the referenced heading would cause the reference to the new form to be automatically generated.
fig. 5. the nypl research libraries dictionary catalog supplement, october 1972: pages 120 and 121. these pages demonstrate the generation of the cross-reference negroes-art see negro art even though only subdivisions of negro art appear in the catalog.

a further consideration extends to cross-references which have migrated into a cumulation. when a cumulation segment is updated, all cross-references which previously appeared in it should continue to appear if, and only if, the referenced heading is still in use in either the same segment of the cumulation, another segment of the cumulation, or a supplement; if not, its use is discontinued. subsequent use of the referenced heading would then call up the cross-reference for reuse. each of the above desiderata requires rather intricate logic when the cumulation is being produced in monthly installments, as any of the following is possible:

1. cross-reference in a supplement, referenced heading in a supplement;
2. cross-reference in a supplement, referenced heading in a cumulation;
3. cross-reference in a cumulation, referenced heading in a supplement;
4. cross-reference in a cumulation, referenced heading in a cumulation.

in each case, the cross-reference must be suppressed whenever the referenced heading disappears from the catalog available to the public, but must be retained when it refers to a heading existing in any part of the catalog. the cross-reference and referenced heading may easily appear in catalog segments published as much as eleven months apart, making it absolutely essential that both the authority and book catalog subsystems maintain strict control of the cross-reference structure.
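as a rough illustration of the integrity rule, the python sketch below checks whether a cross-reference may continue to appear, with simplified in-memory sets standing in for the published catalog; the function names and record layout are assumptions, not the nypl file design.

def heading_in_public_catalog(heading, cumulation_headings, supplement_headings):
    """true if the heading, or any subdivision of it, appears in a published part."""
    for h in cumulation_headings | supplement_headings:
        if h == heading or h.startswith(heading + "-"):
            return True
    return False

def cross_reference_action(to_heading, appeared_in_cumulation,
                           cumulation_headings, supplement_headings):
    """decide what happens to one cross-reference in the next installment."""
    if not heading_in_public_catalog(to_heading, cumulation_headings,
                                     supplement_headings):
        return "suppress"                    # nothing left to refer the user to
    if appeared_in_cumulation:
        return "retain in cumulation only"   # never repeat it in a supplement
    return "print in supplement"

# example: citations under a subdivision keep the reference alive
print(cross_reference_action("negro art", False,
                             {"negro art-exhibitions"}, set()))   # print in supplement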
control of hierarchies

it was decided that the appearance of cataloging under a subdivision of a heading which contains associated notes should cause the higher level heading with its attendant notes to appear. such a heading would be forced to appear regardless of whether or not it itself headed a bibliographic citation, under the assumption that notes concerning a heading might be valuable to a user interested in a subdivision of that heading (see figure 6 for an example).

fig. 6. the nypl research libraries dictionary catalog supplement, august 1972: page 2. the heading actors is caused to appear due to the presence of a scope note and the use of a subdivision of the heading.

dictionary and divided catalogs

the same system was required to serve two divisions of the new york public library, each of which has different traditions and philosophies of service to identifiably different users. therefore, an additional flexibility was required of the system: the ability to produce both dictionary form and divided catalogs. the research libraries, which have traditionally used a dictionary form of catalog, wished to continue that practice. the branch libraries, on the other hand, felt that their public could be better served by a divided catalog, separated into titles, subjects, and names. the system was designed in such a manner that the modification of a single parameter in the final sort would produce either form of catalog.

book catalog subsystem-technical description

the entire subsystem consists of ten separate programs, each of which will be described below. the flow charts in figures 7, 8, and 9 depict the processing flow of the subsystem. the system was designed to operate on an ibm 360 model 40 (which has since been replaced with a 370 model 145) with 256k bytes of core storage. the programs were written exclusively in bal for a dos configuration. a conversion to full os has recently been completed. each processing step described below is executed sequentially. significant peripheral devices required are: five tape drives, one disk drive in addition to those required by the operating system, and a line printer. please refer to figures 7, 8, 9, and 10 for the programs and files referenced by symbols p1, t1, d1, etc.

fig. 7. subsystem flow chart (p1: explode catalog entries, generate responses, select headings and update cross-reference linkage; p3.1: create requests for cross-references, dual entries, and higher level headings).
fig. 8. subsystem flow chart (p4: eliminate duplicate heading requests; p5: locate higher level headings and dual entries).
fig. 9. subsystem flow chart (p3.2: create requests for secondary cross-references, write headings; p6: locate cross-references, update cross-reference indicators).
fig. 10. subsystem flow chart (p7: insert authority text data into skeleton catalog entries; p8: pagination).

entry explosion and construction-program p1

this program serves as the driver for the entire subsystem. in this step entries are selected for inclusion in a supplement or cumulation segment.
requests for data required from the authority file are initiated. the format and data content of each entry are defined by this program via a set of tables. these tables may be altered at will, allowing redefinition of the format and content of any entry. the bibliographic master file is updated to indicate the appearance of an entry in a cumulation, preventing its subsequent appearance in a supplement. in addition, this program is charged with accepting communication of activity to the authority file and taking the appropriate action with respect to the bibliographic file. this activity may take several forms: alteration of a heading, change of delimiting, change to a filing form, posting or removal of a cross-reference or dual entry, change of categorization, or the complete transfer of all cataloging from one valid heading to another. evidence of activity to an authority heading is carried on the authority/bibliographic linkage file (d0). when such activity has affected a heading used by a bibliographic record as an authority field, the field is tagged for verification by the authority file in the next file update/authority-interface run. the indicator for the field in question, denoting previous appearance in a cumulation, is turned off. at the same time, the indicators for all other catalog tracings which require that authority field as data are turned off. when a transfer from one heading to another has occurred, the new linkage number is inserted into the authority directory of the bibliographic record. this is not absolutely necessary, as the authority/bibliographic linkage file provides the link via a chain when a transfer has occurred. nonetheless, the insertion of the true authority control number into the bibliographic file eliminates the necessity of a chained search in all future accesses of the tracing, space on the linkage file is conserved, and no additional indicators are required to make note of the fact that the entry has been caused to reappear in a supplement as a result of the transfer. in all cases of activity to an authority record, reverification is forced for the associated tracing field in order to guarantee correct usage of the altered authority. each bibliographic record is examined to determine whether it will contribute to the catalog. this is done on an entry by entry basis. each field of the bibliographic record capable of defining a catalog entry is examined.
all fields which define a catalog entry (tags 1-, 245, 4-, 6-, 7-) carry a set of indicators denoting appearance in the cumulation, and a number defining the cumulation segment into which the entry should file. an additional indicator for authority fields denotes the presence (or absence) of an associated dual entry on the authority file. appearance in the cumulation and filing segment number of the dual entry are also carried in the bibliographic record, allowing independent control of the dual entry citation. as may be readily seen, the dual entry acts as a phantom tracing in the bibliographic record and will thus not be specifically mentioned in the discussion of selection criteria below. an entry is selected for construction on the basis of the following criteria:

1. the bibliographic record is in a valid status, i.e. has passed all editing tests, and sufficient time for proofreading has elapsed.
2. all authority fields required for construction of the entry have been verified against the authority file in the weekly bibliographic file update/interface production runs.
3. it files in the segment being produced that month.
4. the indicator denoting appearance in the cumulation is not set.

thus, any alteration to the content of a bibliographic record, warranting immediate reappearance of an entry, may be communicated to the book catalog subsystem by the extinction of the cumulation indicator. both cumulation and supplement entries are created in the same run. the entries are separately collated by causing the highest level of the final sort to be a code denoting supplement or cumulation. it will prove fruitful at this point to draw a distinction between a catalog entry (the printed bibliographic citation) and the machine record which is created by the system prior to phototypesetting. the machine record is nothing more than a highly organized print record. the final merging of such print records from various processing steps completely defines the text, typography, and sequencing of the final printed catalog. the machine print records created by the system up to step p8 will be referred to as text entry (te) records. when an entry is to be included in a particular month's catalog segment or supplement, a table for the particular type of entry is consulted in order to determine the data and the typographic commands which will govern the entry's format. at this point only a skeleton text entry record is constructed, as all authority data will be obtained from the authority file. the sequencing information is contained in the sort key of each te record, which defines six levels of sorting:

1. collation-catalog or supplement. this is further refined when a divided catalog is being produced.
2. level i sort, and sort code.
3. level ii sort, and sort code.
4. level iii sort, and sort code.
5. publication date.
6. publisher.

in the case of certain series entries, levels ii and iii may be split into two half-size levels by the program in order to further refine the sort sequence. as an example of the use of sort levels i, ii, and iii, we might consider a subject added entry. in that case, the level i sort is defined by the filing form of the subject tracing, level ii by the filing form of the author's name, and level iii by the filing form of the title of the work. the sort codes are used to separate entries which would result in the same sort keys but are conceptually different, e.g.
a name which might simultaneously define a title added entry, a main entry, and a subject added entry. a similar situation exists at the second sort level, where conventional titles are to be separated from titles or subject title entries. sort key levels, as all other data elements required in a te record, will be directly inserted into the record under construction if they consist of nonauthority data, and will be identified by linkage codes for later insertion when the filing form data is returned from the authority file. the final te record will not be completed until step p7, to be described below. following construction of the sort key (or indications to complete a sort key), typographic commands and text data are inserted into the te record. the typographic commands are contained as binary bit settings in a record directory. the directory also defines the location and length of each data element, or gives a linkage code when the data are to be obtained from the authority file and hence cannot be inserted until program p7. the order of entries in the directory defines the printing sequence of text data. thus, when text data are available, true locations and lengths are provided in the record. when they are not, linkage codes replace them in the directory. these linkage codes are simply replaced by true locations and lengths when the authority text is added to the end of the record by another program (p7). it will suffice at this point to mention that all typographic commands are present in the record. the function of the commands will be discussed in detail below when the pagination program (p8) is discussed. having constructed a set of skeleton te records, the program initiates requests to the authority file for authority text data and filing forms. requests are also made to the authority file for headings which are to print above the bibliographic citations. these headings will be constructed in the same manner as catalog entries, i.e. as te records. they will then be merged with the respective te records as citation entries. these heading requests also initiate a sequence of processing steps culminating in the location and formatting of all relevant cross-references. the necessary cross-references are formatted into te records, and are likewise merged to form the complete catalog. when an entry is chosen for inclusion in a cumulation segment, indicators to that effect are set in the bibliographic master record; it is then written onto the updated bibliographic master file.
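to make the sort key structure concrete, here is a minimal python sketch of the six-level key carried by each te record. the tuple representation, the particular sort-code values, and the field names are illustrative assumptions; the actual records were fixed-format structures built in bal.

from dataclasses import dataclass

# sort codes keep conceptually different entries apart when their filing forms collide
MAIN_ENTRY, TITLE_ADDED_ENTRY, SUBJECT_ADDED_ENTRY = 1, 2, 3

@dataclass
class SubjectAddedEntry:
    subject_filing_form: str
    author_filing_form: str
    title_filing_form: str
    publication_date: str
    publisher: str

def te_sort_key(entry: SubjectAddedEntry, in_supplement: bool) -> tuple:
    collation = 1 if in_supplement else 0                # highest level: cumulation vs. supplement
    return (
        collation,
        entry.subject_filing_form, SUBJECT_ADDED_ENTRY,  # level i sort and sort code
        entry.author_filing_form, MAIN_ENTRY,            # level ii sort and sort code
        entry.title_filing_form, TITLE_ADDED_ENTRY,      # level iii sort and sort code
        entry.publication_date,
        entry.publisher,
    )

sorting a batch of te records with this key collates supplement and cumulation entries separately, and files each subject entry by subject, then author, then title, as the text describes.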
locate authority data and select headings-program p2

all inquiries to the authority file are sorted into authority sort key sequence and matched with the authority file. all inquiries will result in a match to a valid authority record. a match for each inquiry is assured by the weekly file update/interface processing programs. inquiries to the authority file result in any combination of the following actions: (1) authority text and filing data are supplied, via a response record, to program p7 for the completion of te records created by program p1; (2) authority records are selected to serve as headings above bibliographic citations (these same records will also cause cross-references to be selected); (3) authority records are selected in order to initiate a search for the associated dual entry, as per instructions contained in the inquiry record.

the selected headings consist of complete authority records with instructions regarding their eventual use and routing. headings are routed, via a collation code, into cumulation segments or supplements. since a single authority heading may appear as both a main entry and subject heading, indicators are set defining its eventual use as one, the other, or both. these indicators will be called usage indicators. usage decisions made by p1 are passed to this step as part of the inquiry records. the results of these decisions are then transmitted as a set of codes inserted into the selected authority records.

this program is further charged with the responsibility of keeping current the catalog status indicators for cross-references by maintaining two binary indicators with every cross-reference. a cross-reference record with multiple see fields will have a pair of indicators for each see field. the first binary indicator denotes prior appearance of a cross-reference in a cumulation segment. the second indicates that the referenced heading currently appears in some part of the catalog. in passing through the entire authority file, this program will note that a heading which falls in the current month's filing range has had no requests for its use lodged against it. when this is the case, transactions are created for every cross-reference, defined by see froms in the heading record, extinguishing the second binary indicator described above. the cross-reference will then not be used again until it is required. the need for this operation will become more evident when we discuss program p6. the maintenance of the physical linkage between cross-references and headings is performed by the authority file update subsystem. this subsystem guarantees that the linkage is kept current regardless of alterations to headings and cross-references. hence, all see froms are guaranteed to refer to a cross-reference (direct see) record on the file.

explode hierarchies, cross-references and dual entries-program p3.1

the selected authority records are examined for the presence of see from fields. if any are found, they are used to create further inquiries to the authority file for cross-references. a similar operation is performed for dual entries, with the exception that the dual entry inquiry is not created unless it was requested by program p1. the request is passed via indicators in the inquiry record (as discussed above in the description of program p2). all records which are subdivisions of headings, e.g. sculpture-technique, will cause inquiries for all significant higher level headings (sculpture in this case) to be created. higher level headings will supply additional entry points via cross-references to them, or may themselves appear if they contain notes. cross-reference requests are separated for later processing. they will be processed with requests for secondary cross-references to be generated by program p3.2 below.

exclude duplicate headings and separate inquiries-program p4

this program is nothing more than a sort with exits. the input tape of selected headings and higher level heading requests is sorted, and if a request for a higher level heading has already been filled by a heading selected in p2, the request is dropped. all usage information carried by the request is logically added to the matching heading. when multiple requests for the same higher level heading are discovered, all but the first are
dropped. usage information from all duplicates is added to the retained request by a logical or operation. the authority records which were selected by p2 for use as headings are formatted into complete text entry (te) records for later input to the pagination program. te heading records are formatted by a single module invoked by this step and again in p5. the surviving hierarchy requests, and all dual entry requests, are separated for processing in the next step.

format headings module

all heading records selected for print are processed by this module, which converts the input text and filing data of authority records into te records. at times quasi-duplicates of the te record are constructed with different filing and typography codes for use as main entry and subject headings. at times portions of the data are encoded as nonprinting because it is known that the print data will be provided by other heading records. this is the case with author/conventional title records. the author heading is assured because of the explosion of higher level headings; hence, a simple method is provided for insuring its appearance only once regardless of the number of associated conventional titles. when a subject heading record is created, the heading is made to appear twice in the record, once in upper case for printing, and once in its normal upper and lower case form, encoded as nonprinting, for possible use as a dictionary heading by the pagination program. the conversion to upper case is effected via a translate table, because of the presence of control information within the text for floating diacritics. also, diacritics and many special characters do not have a simple upper case equivalent due to the use of the complete ala character set. punctuation of cross-references is effected in this module. the complexities by no means approach those encountered in punctuating condensed added entries; nonetheless, they do exist. for example, terminal periods in headings referenced in a cross-reference must be replaced with semicolons when more than one heading is referenced, a blank must be inserted following the hyphen and preceding the semicolon in open ended dates, and the final referenced heading in a string must end in a period unless it terminates with a hyphen, quote mark, exclamation point, question mark, parenthesis, etc. typographic codes which apply to headings, notes associated with headings, and phrases in cross-references are inserted by this program when te records are created.
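a minimal python sketch of those punctuation rules follows, assuming each referenced heading arrives as a plain string; the function name and the exact set of terminal characters are illustrative simplifications of the module just described.

NO_PERIOD_AFTER = set('-"\'!?)')   # characters after which no final period is added

def punctuate_references(headings: list[str]) -> str:
    parts = []
    for i, h in enumerate(headings):
        h = h.rstrip()
        last = i == len(headings) - 1
        if not last:
            # interior headings: a terminal period becomes a semicolon
            h = h[:-1] if h.endswith(".") else h
            # open-ended dates such as "1826-" get a blank before the semicolon
            h = h + " ;" if h.endswith("-") else h + ";"
        elif not h.endswith(".") and h[-1] not in NO_PERIOD_AFTER:
            # the final heading ends in a period unless it already ends in a
            # hyphen, quote mark, exclamation point, question mark, parenthesis, etc.
            h = h + "."
        parts.append(h)
    return " ".join(parts)

# e.g. punctuate_references(["pim, bedford clapperton trevelyan, 1826-", "negro art."])
# -> "pim, bedford clapperton trevelyan, 1826- ; negro art."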
locate hierarchies and dual entries-program p5

all heading requests are applied to the authority master file. when the heading corresponding to a request is located, the entire authority record is written onto an output file for further processing. this process is similar to that executed when the original heading requests were processed in program p2. higher level headings are encoded for use in accordance with their categorization and filing form. when a requested dual entry heading is located, a te record is written for later processing by the pagination program. a response record containing the filing form of the dual entry is also written onto an indexed sequential disk file. a direct access file is necessary since the catalog record contains only a link to the primary heading, and all requests for the dual entry come via a request against the primary heading in program p2. rather than attempting a complex scheme for keeping track of all bibliographic items requiring the dual entry data, only one copy of the dual entry response is isolated and indexed by the control number of the primary heading. it is then retrieved on that basis when needed.

explode secondary cross-references, separate and select hierarchical headings-program p3.2

this program is simply a phase of program p3.1 described above. the major difference lies in its handling of the authority records which it accepts as input. they are written out as te records, but only if they meet one of two conditions: if the authority record matching the heading request contains notes, it is selected for eventual formatting into a heading; or if it represents an author, required for an author/conventional title combination. in all other instances higher level headings are not selected for printing. the format headings module is invoked by this step for all higher level headings selected for print. if secondary cross-references are not desired, the explosion module which creates the requests is simply bypassed. similarly, higher level headings may be suppressed. no further attempt is made to generate higher level headings, as they have all been exploded in p3.1. the exploded cross-reference requests are separated in this program, just as they were in p3.1.

locate cross-references-program p6

prior to execution of this step tapes t3.1, t3.2, and t3.3 are sort/merged into a single tape t3.4 (figure 9). t3.4 now contains all of the transactions generated by program p2, and all cross-reference requests. recall that p2 has created transactions extinguishing the indicator carried by cross-reference headings, denoting that the referenced heading appears somewhere in the catalog. the sort causes all of these transactions to be applied before any cross-reference requests are processed. it might appear a bit paradoxical that a request should be made to a cross-reference whose referenced heading was not selected in p2; however, recall that a cross-reference may be invoked as the result of the use of a subdivision of the referenced heading (secondary cross-reference). at this point some discussion of the cross-reference record is in order. a cross-reference may point to several headings simultaneously, e.g. animals see aardvarks/ bears/ cats/ ... zebras. each referenced heading is controlled individually. only the required references are extracted as needed. in the example above, if aardvarks and cats appeared in the catalog, those two references would have been selected, and no others. hence, the discussion which follows will be greatly simplified if we consider each cross-reference transaction to apply to only a single reference. this is effected operationally by carrying the control number of the heading which gave rise to the cross-reference request within the request. following the application of transactions, if any, to extinguish indicators, the selection for print logic is executed. cross-references are selected for printing when the indicator specifies that the cross-referenced heading appears somewhere in the catalog available to the public, regardless of whether there is a specific request for it, and the cross-reference is filed in the segment being produced. a request for a cross-reference which already appears in a cumulation segment currently in use is ignored. a request for a cross-reference which is not already in the catalog is honored.
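the end result of that selection can be sketched as follows in python, with simplified status flags standing in for the indicators carried on the cross-reference record; the real program also had to apply the extinguishing transactions and handle multiple referenced headings per record.

from dataclasses import dataclass

@dataclass
class CrossRefStatus:
    referenced_heading_in_use: bool   # second binary indicator, maintained by p2
    in_cumulation: bool               # first binary indicator: already in a segment in use
    filing_segment: int               # segment in which the cross-reference files

def select_for_print(xref: CrossRefStatus, current_segment: int):
    """where, if anywhere, the cross-reference prints in this month's installment."""
    if not xref.referenced_heading_in_use:
        return None                    # suppress: nothing in the public catalog to refer to
    if xref.filing_segment == current_segment:
        return "cumulation"            # its segment is being reprinted this month
    if xref.in_cumulation:
        return None                    # already available to the public; do not repeat it
    return "supplement"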
the actual logic is somewhat complex; however, the end result is as described above. cross-references to be printed are routed to either a supplement or cumulation installment depending upon the filing range in which they fall. when a divided catalog is being produced, cross-references are further routed into the appropriate catalog on the basis of categorization. following the selection of, or refusal to select, a heading, the indicators denoting prior appearance in the catalog and linkage to a heading in use are updated. continuing integrity of the cross-reference structure for future printings of the catalog is thus assured.

complete citation text entry records-program p7

prior to execution of this processing step, response records emanating from p2 are sorted into bibliographic item number sequence. sequencing is necessary since the skeleton te records are in the same sequence as the bibliographic master file. identification of authority response data required by a te record is via bibliographic item number and a sequence number assigned to each authority field within a bibliographic record. subfields of a response record are identified by delimiter. response records are matched to skeleton te records bearing the same item number. following the match, all required data are inserted into the skeleton te record. codes are carried in the te record directing this program to perform certain formatting functions not possible in step p1. these functions include insertion of certain combinations of parentheses and brackets required by series notes, addition of a series note to certain call numbers, and the replacement of the author portion of an author-title combination series note with his:, her:, in his:, in her:, etc. none of the above could have been accomplished in a typographically acceptable manner in program p1. dual entry data are obtained from the indexed sequential file (d1). the identification of such data is via the authority control number of the primary lc subject heading carried in the bibliographic record. this number is used to access file d1 for the required text and filing data.
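a minimal sketch of this matching step follows, assuming both streams are already in bibliographic item number sequence; python dictionaries stand in for the tape files and the fixed-format record directory, and the field names are illustrative.

def complete_te_records(skeleton_records: list[dict], responses: list[dict]) -> list[dict]:
    """insert authority text and filing forms into skeleton te records (program p7)."""
    # index responses by (bibliographic item number, authority field sequence number)
    by_key = {(r["item_no"], r["seq_no"]): r for r in responses}
    for te in skeleton_records:
        for slot in te["directory"]:
            linkage = slot.get("linkage")
            if linkage is not None:                          # data to come from the authority file
                response = by_key[(te["item_no"], linkage)]
                slot["text"] = response["text"]              # authority text data
                slot["filing_form"] = response["filing_form"]  # completes the sort key
                slot["linkage"] = None                       # true location and length now known
    return skeleton_records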
pagination-program p8

prior to execution of this step a set of page initialization records is created for the particular type of catalog being produced. these records are prepared by a program not shown in the subsystem flow. initialization records govern the overall format of the book to be produced. there are six such initialization records, all of which must appear at the beginning of the input tape. they may also appear embedded anywhere among the te records in various combinations. the first initialization record, known as a page dimension (pd) record, defines the physical dimensions of the page to be printed. parameters carried in this record also determine the dimensions of inner and outer page margins, head and foot margins (independently for recto and verso pages), number and width of columns, body size on which to set type, and spacing between entries. when an embedded pd record is encountered the program will terminate any page currently being formatted, begin a new page, and continue formatting in accordance with the redefined dimensions. the second initialization record defines the starting page number, and indicates whether paging is to start with a recto or verso page. the pagination program may also be directed via this record to place a black square at the edge of a page, at a location defined by the record, to serve as a thumb index. this record may also appear anywhere else on the tape. when it does appear as an embedded record it commands the program to terminate the page being formatted at that point, to begin a new page, and possibly provide a number of blank pages. this allows volumes to be broken at predefined sort points. in this manner we may separate alphabetic segments, the various volumes of a divided catalog, or cumulation and supplement volumes, and move the thumb index. subsequently, four records define caption and legend text (independently for recto and verso pages). any one or combination of these records may also occur elsewhere on the tape. when they do occur as embedded records, the program terminates the page currently being formatted, alters the appropriate caption and/or legend text, and continues to format text. interfiling of these records with te records allows captions to be changed automatically between volumes of a divided catalog, or between supplement and cumulation volumes, or at any other desired sort point.

the six records described above control those aspects of page format which are common to a large class of entries. individual te records carry typographic commands which are specific to the entry, or to an element of the entry. a code carried by each te record (entry format code) defines typographical rules for the entry as a whole. this code is used to identify data to be used in the formation of dictionary and column headings when page breaks occur. certain widow rules affecting the entire entry are specified, e.g. entry may not span columns, entry may not form the last line of a column, etc. line advance commands, defining the amount of space (if any) to be left between entries, are carried in this code. data elements within an entry may require different typographic rules. format codes for each such element are carried within a record directory. the directory also serves to identify the location and length of text data to be typeset in accordance with the typography specified by element format codes. element format codes consist of 32-bit fullwords. groups of bits within the word define separate typographic rules. these bits may be set in any combination, defining a complete spectrum of typography. the major typographic parameters governed by these bit settings are:

1. starting indention ("continue on the previously used line" is included).
2. overflow indention to be used if the element must be continued onto another line.
3. space to be left on a line before adding any additional text to a previously used line.
4. justification-left, right, center of column, and center of page.
5. type size height.
6. type size width relative to height.
7. type face-bold or light.
8. type style-roman or italic.
9. element widow rules-restrictions which do not allow text to: span columns, form the first line of a column, span from a verso to a recto page, or span from a recto to a verso page.
10. line break-indicating whether lines may be broken at blanks only, or may be broken at blanks and certain special characters. line break decisions observe a hierarchy of rules, e.g. if the indicator is set to break at blanks only and no blanks are found within the entire line, the program automatically reverts to the second option (break at blanks and special characters); should that also fail, the line will be broken arbitrarily at the last character which fits on the line.
11. hyphenation indicator-due to the great number of foreign languages used in the nypl catalog, no hyphenation routine is employed. allowance has been made, however, for the inclusion of a hyphenation module should it be desired in the future, and an indicator is provided in order to invoke it.

other rules of lesser importance exist, but space does not warrant their discussion.
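as a concrete illustration of packing independent typographic rules into one fullword, the python sketch below assigns bit positions and widths of its own choosing; only the general idea of grouped bits within a 32-bit element format code comes from the text, not these particular assignments.

JUSTIFY_LEFT, JUSTIFY_RIGHT, JUSTIFY_CENTER_COL, JUSTIFY_CENTER_PAGE = range(4)

def pack_format_code(start_indent: int,     # 0-15 (4 bits)
                     overflow_indent: int,  # 0-15 (4 bits)
                     justification: int,    # 2 bits
                     point_size: int,       # 0-31 (5 bits)
                     bold: bool,
                     italic: bool,
                     no_span_columns: bool) -> int:
    code = 0
    code |= (start_indent & 0xF)
    code |= (overflow_indent & 0xF) << 4
    code |= (justification & 0x3) << 8
    code |= (point_size & 0x1F) << 10
    code |= int(bold) << 15
    code |= int(italic) << 16
    code |= int(no_span_columns) << 17
    return code

def is_bold(code: int) -> bool:
    return bool((code >> 15) & 1)

# e.g. a bold, left-justified, 8-point element that may not span columns:
code = pack_format_code(2, 4, JUSTIFY_LEFT, 8, True, False, True)
print(is_bold(code))   # True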
the entire ala character set plus several additional characters specified by nypl may be typeset via this program on an iii videocomp. diacritics are floated onto the characters they accent. the coding structure adopted by nypl consists of two unique codes preceding a pair of characters to be overprinted. the first code indicates to all processing programs that the data to follow must be interpreted in a unique manner. the second defines the unique treatment to be accorded. we currently employ only two such function codes; both imply a form of overprint. coding in this manner allows unlimited expansion of the character set. a function code has been assigned but not yet utilized for overprinting of triplets. this would be necessary in handling doubly accented characters, such as are found in vietnamese. function codes have been assigned defining escapes to nonroman alphabets. the character set includes two blanks in addition to the normal word space. one of these will provide a word space on printed output but will fail line break tests. such a character is of great utility as a separator in abbreviations and as a word space preceding such terminal characters as a close parenthesis. conversion of the nypl data base to utilize this super blank will be effected following definition of sufficiently reliable rules for its automatic generation at input. the second blank is a zero set width character. this character, when present in a machine record, is assigned a null width by the phototypesetting device. its utility lies in areas in which it is required to remove only one or two characters from a record, but it is not desired to expend the programming or processing time in restructuring the record. all of the input text data and format codes are translated into commands to an iii videocomp 830 and written onto a driver tape. the driver tape is then delivered to a photocomposition vendor who mounts it on a videocomp to produce camera ready copy for catalog pages. the camera ready copy is then delivered to a printer who produces multilith plates, and thence, pages which are bound into monthly supplements and cumulation segments.

conclusion

photocomposed book catalogs have been in use at nypl since january 1971. the effectiveness of the system can, perhaps, best be judged by the only adverse reaction received thus far: in the case of material which must pass through the bindery after cataloging, entries appear in the catalog before the materials reach the shelves, thereby causing annoyance to users. judged by more serious criteria, the system has been proven an operational success. the processing budget for the research libraries is now insignificantly higher than it was under the manual system, but cataloging volumes have increased dramatically: 7,500 titles/mo.
cataloged vs. 5,500 titles/mo. under the old manual system. the increase in productivity cannot be solely attributed to the automated system. some of it is attributable to the revision, by the head of preparation services, of manual procedures.

expansion of book catalog coverage

the entire bibliographic system is currently in the final stages of revision for production of a multimedia catalog of the dance collection of the research library of the performing arts.20 the organization of citations referring to material in diverse media will be accomplished by providing separate sequences under appropriate headings, denoting: works by, works about, visual works, music, audio materials. listed under each of these headings will be the following types of materials:

1. works by-written works by an author.
2. works about-written works about an author, performer, etc. (the subheading is not used under topical subjects.)
3. visual works-photographs (original and indexed), prints and original designs, motion pictures and videotapes, filmstrips and slides.
4. music-music scores.
5. audio materials-phono records and phonotape.

these headings are not as specific as those suggested by riddle, et al.; however, they do provide the early warning function discussed by virginia taylor.21,22 this catalog is due for publication in early 1974. pending the success of this venture, a study will be made of the means of extending the scope of the research libraries' catalog to include nonbook materials.

in late fall 1973, an extremely exciting and bold step will be taken by the jewish division of the research libraries. they will begin data input of material in hebrew, using the recently defined ansi correspondence scheme for hebrew characters.23 within this scheme roman and special keyboard characters have been assigned to each character of the hebrew alphabet. book catalog display of hebrew text will utilize these characters in a left to right print mode until such time as development money is found for the digitization of hebrew character fonts, and for modifications to the pagination program in order to display mixed roman and hebrew text. all hebrew entries will be filed in accordance with conventions for sequencing hebrew text. the hebrew entries will be interfiled with entries in romanized forms by conceptually assuming the sequencing alphabet to contain 57 characters: blank, a, b, . . . , z, 0, 1, . . . , 9, followed by the letters of the hebrew alphabet from aleph to tav. if we have an author who has written several titles in roman alphabet languages, and others in hebrew, we would create a sequence of main entries under his name interfiled according to the alphabetic sequence shown above. all hebrew or variant title added entries would be found in a sequence starting at the end of the roman alphabet. the primary reasons for adopting such a scheme, as opposed to the more traditional romanization, are:

1. a nationally endorsed correspondence schedule has been provided by ansi.
2. it is desired to enter this data into the automated system and end the manual operation at the earliest possible time.
3. it is desired not to have to revise all cataloging when true hebrew text may be economically displayed. it is virtually impossible to recover the true form of nonroman text from its romanized form.
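the interfiling scheme described above can be sketched as a simple collation key in python; the exact inventory of hebrew characters below (base letters only, final forms omitted) and the titles in the example are illustrative assumptions, not nypl's actual filing routine or data.

import string

HEBREW = "אבגדהוזחטיכלמנסעפצקרשת"   # assumed inventory, aleph through tav
SEQUENCE = " " + string.ascii_lowercase + string.digits + HEBREW

RANK = {ch: i for i, ch in enumerate(SEQUENCE)}

def filing_key(text: str) -> tuple:
    """map a filing string to ranks in the conceptual sequencing alphabet."""
    return tuple(RANK.get(ch.lower(), len(SEQUENCE)) for ch in text)

# romanized and hebrew titles under one author interfile, with all hebrew
# entries sorting after the roman alphabet and the digits:
titles = ["dance", "מחול"]
print(sorted(titles, key=filing_key))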
these two areas, nonroman alphabet display and inclusion of nonbook materials, represent the only areas in which further development of the book catalog system is planned. future efforts will be directed to conversion of the batch-oriented processing system to one with on-line file maintenance capability. it should be stressed again that the primary aim of the bibliographic system is not production of book catalogs. the system was designed to create a highly controlled data base which could be used in conjunction with whatever display medium is technologically and economically feasible. online access to the catalog will require extreme control of the data, as automated retrieval techniques require very precise definition of access points. the problems of data organization become greatly magnified when crt display devices are used, as the visual scan range produced is severely limited. the extensive development effort to produce book catalogs was undertaken at nypl since it was felt that for at least the next decade book catalogs in printed or microform would provide the only economically viable form of access to the collection. book catalogs will, no doubt, also serve as backup forms of display for a considerable time after introduction of electronic access techniques.

references

1. seoud makram matta, the card catalog in a large research library: present conditions and future possibilities in the new york public library, submitted in partial fulfillment of the requirements for the degree of doctor of library science (new york: columbia university, school of library service, 1965).
2. i. a. warheit, "automation of libraries-some economic considerations," presented to: canadian association of information science, ottawa, ontario, canada, 27 may 1971.
3. james w. henderson and joseph a. rosenthal, eds., library catalogs: their preservation and maintenance by photographic and automated techniques (mit report no. 14) (cambridge, mass.: mit press, 1968).
4. margaret c. brown, "a book catalog at work (free library of philadelphia)," library resources and technical services 8:349-58 (fall 1964).
5. richard de gennaro, "harvard university's widener library shelflist conversion and publication program," college & research libraries 31:318-33 (september 1970).
6. richard d. johnson, "a book catalog at stanford," journal of library automation 1:13-50 (march 1968).
7. paula kieffer, "the baltimore county public library book catalog," library resources and technical services 10:133-41 (spring 1966).
8. hilda feinberg, "sample book catalogs and their characteristics." in: book catalogs by maurice f. tauber and hilda feinberg (metuchen, n.j.: the scarecrow press, 1971), p. 381-511.
9. paul j. fasana and heike kordish, the columbia university libraries integrated technical services system. part ii: acquisitions. (a) introduction (new york: columbia university libraries systems office, 1970). 62 p.
10. gerry d. guthrie, "an on-line remote access and circulation system." in: american society for information science. annual meeting. 34th, denver, colorado, 7-11 november 1971. proceedings 8:305-9, communications for decision-makers (westport, connecticut: greenwood publishing corp., 1971).
.kilgour, "initial design for the ohio college library center: a case history." in : clinic on library applications of data processing, 1968. proceedings (urbana: university of illinois, graduate school of library science, 1969) , p. 54-78. 13. maurice f. tauber and hilda s. feinberg, book catalogs (metuchen, n. j.: the scarecrow press, 1971). 14. catherine 0. macquarrie, "library catalogs: a comparison," hawaii library association ]ournal21:18-24 (august 1965). 15. irwin h. pizer, "book catalogs versus card catalogs," medical library association bulletin 53: 225-38 (april 1965). 16. kieffer, "the baltimore county public library," p.l33--41. 17. james a. rizzolo, "the nypl book catalog system: general systems flow," the larc reports 3:87-103 ( falll970). 18. edward duncan, "computer filing at the new york public library," the larc r eports 3:66-72 (fall1970). 19. s. michael malinconico, "optimization of publication schedules for an automated book catalog," the larc reports 3:8185 (fall 1970) . 20. dorothy lourdou, "the dance collection automated book catalog," the larc reports 3: 1738 (fall 1970). 21. jean riddle, shirley lewis, and janet macdonald, n on-book materials: the organization of integrate d collections. prelim. ed. (ottawa, ont.: canadian library association, 1970). 22. virginia taylor, "media designators," library resources and technical services 1:60-65 (winter 1973) . 23. edward a. goldman, et al., "transliteration and a 'computer-compatible' semitic alphabet," hebrew union college annual 42:251-78 (1971). lib-mocs-kmc364-20140103102512 64 predicting the need for multiple copies of books robert s. grant, presently at hope college, holland, michigan. an industrial inventory technique adapted to a university library's computer based circulation system as one aid in identifying heavily used books for multiple-copy purchase. the university of windsor has approximately 5,000 students. the university library's open stacks contain more than 300,000 volumes, 100,000 of which are non-circulating (bound periodicals and reference books). there are approximately 200,000 books available for circulation, a booksto-student ratio of 40:1. nevertheless, a perennial student complaint is: "why is it that every time i need a book, someone else has already checked it out?" to help mitigate this problem, the library decided several years ago to embark upon a programme of purchasing multiple copies of much used books. the question then became one of determining which books would need duplicating, and how many more copies of each title would need to be bought. suggestions of titles to be duplicated were at £rst solicited from the faculty, but ever-increasing demands on them prevented their being more than minimally cooperative. three years ago, in an effort to increase the availability of books to undergraduates, the library changed its circulation period for undergraduates from two weeks to one week, with unlimited renewals. at the same time there was instituted a system whereby a student £lied out a reserve card requesting that he be allowed to check out a book upon its predicting need for multiple copies/grant 65 return. when there were five or more such requests, then a copy of the book was to be purchased. although this . system of ordering multiple copies was very cumbersome, it was better than nothing. an article by william l. 
leffler (1) suggested a system of adapting industrial inventory techniques to the problem of identifying books to be duplicated that would be compatible with the library's computer based circulation system and also could be expected to be simpler and more thorough than the above method of buying multiple copies. without rehearsing leffler's arguments, the basic formula used in this project can be simply stated as:

nbooks = (n x n95%) / t

where
nbooks = the number of copies of a single title necessary to meet at least 95% of student demand for that title;
t = number of days of observation, i.e., the number of days in the academic year in which students are permitted to check out books (a constant of 273 in this formula, being the number of days in the period from 1 september to 31 may);
n = total number of times a title circulated during t;
n95% = a + 2s, where
a = the average length of time a title was on loan, i.e., the total number of days in which a title was in circulation divided by the number of times (n) the title circulated;
s = standard deviation, which is computed as the square root of [ (sum of (ai - a)^2) / n ].

ai is the length of time, in terms of days, that a single title was off the shelves each time it circulated, and is not to be confused with a, which is the average length of time (over the academic year) that the same title was on loan. the sum of all the ai's was used earlier to calculate a:

a = (a1 + a2 + a3 + . . . etc.) / n

for example, if a book circulated three times during the academic year (the first time for 18 days, the second time for 20 days, and the third time for 3 days) then a (the average length of time the book was on loan) would be calculated as (18 + 20 + 3) / 3, or 13.66.

at this point it should be noted that although the library continues to accept request cards for books presently on loan (and to reserve books for the requestors), these requests are not used as part of the data in determining the number of copies necessary to meet at least 95% of the demand. for one thing, there is no way of knowing how long the person making a request will want to keep a book out, and time is an important element in the formula. but more importantly, the formula, as it now stands, attempts to account for unsatisfied requests. it assumes that in at least some instances there will be more requests for a title than there are copies in the library. by providing an analysis of the present circulation profile of each book, the formula attempts to predict the number of copies of each title the library would need to have in order to more adequately accommodate unsatisfied demand.

fig. 1. programme logic (calculate t and n, count the copies circulating, accumulate the days each title was on loan, compute the average loan period and standard deviation, compute the projected need, and print the report).

the programme for performing the calculations is written in pl/1 and is run on an ibm 360/50 (figure 1). the execution time for 140,000 circulation records (each time a book circulates the data on its circulation is considered a single record) is 15 minutes. the historical record file, the source of data for the programme, is incremented each time a book in circulation is returned. figure 2 shows the format of this file. the file itself is a sequential file stored on magnetic tape, updated daily to include the previous day's circulation data. entries are arranged in lc call number-accession number order.

fig. 2. format of historical record file.

field                                  length   accumulative length
card type                              1        1
lc call number                         29       30
author                                 15       45
accession number                       6        51
spare                                  1        52
card sequence number                   6        58
spare                                  2        60
borrower's id code                     1        61
borrower's id number                   6        67
spare                                  3        70
action code                            1        71
due date (mmddyy) (mo.-day-yr.)        6        77
spare                                  3        80
indicator                              1        81
date charged out (yyddd) (yr.-day)     5        86
date returned (yyddd) (yr.-day)        5        91
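the arithmetic above can be sketched in a few lines of python; this is only an illustration of the formula (the windsor programme was written in pl/1), and the function name and data layout are assumed.

import math

DAYS_OF_OBSERVATION = 273   # 1 september through 31 may

def projected_copies(loan_lengths: list[int], t: int = DAYS_OF_OBSERVATION) -> float:
    """copies estimated to satisfy at least 95% of the demand for one title."""
    n = len(loan_lengths)                       # times the title circulated during t
    if n == 0:
        return 0.0
    a_bar = sum(loan_lengths) / n               # average length of loan
    s = math.sqrt(sum((a - a_bar) ** 2 for a in loan_lengths) / n)   # standard deviation
    n_95 = a_bar + 2 * s                        # loan period covering about 95% of loans
    return n * n_95 / t

# the worked example from the text (loans of 18, 20, and 3 days) has an
# average loan period of about 13.66 days; its projected need is well below 1.00.
print(round(projected_copies([18, 20, 3]), 2))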
results

after the calculations described above have been performed for every title circulated during the academic year, a print-out of the results is produced (figure 3). in order to limit paperwork, only those results under "projected need" which were ≥ 1.00 appear on the print-out; any results less than 1.00 were suppressed. the column labelled "transactions" is simply the number of times the book was checked out and checked back in again. the column "average loan period" is the a described in the formula above. and the column "copies circulated" is the number of books with the same classification number as listed in the left-hand column, but with different accession numbers, checked out during the year. this figure is not the number of copies of the book that the library owns, which could, in some instances, be more copies than were actually circulated. the column labelled "projected need" should, according to the calculations, indicate the number of copies of a title which could accommodate the demand for that title with 95% certainty.

fig. 3. circulation history analysis report. (sample page listing, for each classification number and author, the projected need, number of transactions, average loan period, and copies circulated; the entry for b.72.j6, for example, shows a projected need of 2.84 from 8 transactions, an average loan period of 21.75 days, and 2 copies circulated.)

in order to find out whether or not the library should purchase more copies of a particular title, the number listed in this column is simply checked against the number of
copies listed for this classification number in the official shelf list. for example, the book classified as b.72.j6 shows a "projected need" of 2.84. therefore if the library had three copies of this book, and the book's circulation pattern did not change significantly in the immediate future, then the library would be able to fill 95% of the requests. the official shelf list, however, indicates that the library only owns two copies of this title, suggesting that at least one more copy should be purchased to meet present demand. these calculations do not anticipate future demand on the book. also, doubling the number of copies can never succeed in doubling circulation, a fact demonstrated by leimkuhler (2). this print-out, therefore, can only serve as one guide to multiple-copy purchase.

precautions and pitfalls

in using the results of these computations as a guide to the purchase of multiple copies, the librarian should be aware of several factors which may have distorted the results. one is that the student who checks out the only copy of a book and keeps checking it out all year, in lieu of buying his own copy, creates a false "demand" for the book. it may be that he is the only person in the university interested in it, and when he graduates this book may sit out its life on the shelves completely unused. however, since the historical record file contains the borrower's id number, it is possible to distinguish between an original loan and a renewal. the first time the borrower's id number appears on the book's circulation record indicates the original loan. each additional and consecutive time the same borrower's id number appears on the same circulation record indicates a renewal. although the pilot project did not contain provisions for obviating this problem, it would have been simple enough to build into the programme a mechanism for suppressing the unwanted data. a faculty member who assigns parts of books for students to read, but does not place the books on reserve, forces competition for them on the open shelves. this too creates a demand which may not exist after the professor leaves the university or stops teaching a particular course. the librarian should be aware of such possible short-lived demands that may never recur. the circulation analysis programme was executed at the end of one academic year in order to provide the university of windsor librarians with guidelines for purchase of multiple copies of books to be used in the next academic year. if it were known that a particular book receiving heavy use one year would not receive equally heavy use in the next (because, for example, the particular course requiring that book would no longer be taught; or the book would be placed on a "two-hour reserve" for the coming academic year; or the book circulated frequently in one year only because it was on the "best-seller list"), then it would be folly to purchase three or four additional copies of the book just because the computer print-out indicated that a number of additional copies were needed. other factors, therefore, although not included in the input data, are certainly relevant in determining the need for multiple copies. at the university of windsor library, a book that needs to be re-bound because of heavy use or mutilation is charged out to the bindery department. it then shows up on the historical record file, just as though it had been charged out.
at the university of windsor library, a book that needs to be re-bound because of heavy use or mutilation is charged out to the bindery department. it then shows up on the historical record file, just as though it had been charged out. but since the "borrower's id number" for books charged to the bindery department consists of all zeroes, it would be simple enough to identify and suppress these particular records as unwanted data.
by-products
in addition to providing a list of books to be considered for duplication, the historical record file upon analysis revealed several other interesting facts about the university library's circulation. most noteworthy is the fact that, although there were more than 200,000 circulating books sitting on the open shelves at the time of this pilot project, only 40,205 different titles circulated for a total of 134,276 times. assuming there were only 100,000 different titles among the 200,000 books, this would mean that nearly 60% of the collection was probably not used by the students. of the 40,205 different works which did circulate, the calculations indicated that only 3,257 titles required one or more copies in order to fill 95% of the requests. of this latter number, only 570 titles were in need of duplication. (that is to say, the number of copies listed under projected need exceeded the number of copies actually owned by the library as indicated by the shelf list.) a random sample comprising one-third of these 570 titles was checked to see whether or not the books were in print. indications were that 38% of the titles in need of duplication were no longer in print.
conclusions
a close examination of the 570 titles apparently in need of duplication reveals that, with very few exceptions, students are apparently checking out only books that are curriculum oriented in the most narrow sense, i.e., books which they need to use in writing term papers. nevertheless, one can appreciate the fact that these books are in demand by the student, and if the library is to be responsive to users' demands on its facilities, it will need to spend part of the book budget each year purchasing multiple copies of the most heavily used books. unfortunately, even with these good intentions and the sophisticated assistance of the computer, students' demands for books will still be frustrated (at least one out of three times) because books which need to be duplicated are no longer in print.
programme
a print-out copy of the circulation analysis programme described above is available from mrs. jean griffiths, computer centre, university of windsor, windsor, ontario, canada.
acknowledgments
the initial impetus and continuous guidance for this project was provided by albert v. mate, assistant librarian for public services at the university of windsor. dr. martin basic, faculty of business administration, acted as consultant. systems analyst was mrs. jean griffiths, and programmer was mrs. lillian jin, both at the university computer centre.
references
1. leffler, william l.: "a statistical method for circulation analysis," college and research libraries, 25 (1964), 488-490.
2. leimkuhler, ferdinand f.: "systems analysis in university libraries," college and research libraries, 27 (1966), 13-18.
the evolution of an online acquisitions system
jenko lukac: lewis and clark college library, portland, oregon.
about two years ago a home-grown online acquisitions system was developed and implemented at pacific university. the program, written in basic for the data general nova computer, performs all the necessary functions such as ordering, receiving, fund accounting, etc.
1 this program was offered to the library community, and about one hundred libraries from around the world have availed themselves of it . one of the libraries that obtained and adopted pacific's electronic acquisitions system (peas) was the watzek library at lewis and clark college. the advantage of a home-grown system is that it can be freely modified to suit the evolving needs of a particular library. this communication describes some of the changes made by lewis and clark college to the peas program, in order to illustrate how software developed at one institution can be "imported" into and enhanced by another institution . although matters were particularly simplified by having the same person who developed peas at pacific be responsible for the enhancements at lewis and clark, the procedure and conclusions are still generally applicable. the first change made to the peas program was to rename it clas-the computerized library acquisitions system . the most important change, however, was to translate it from data general basic to digital equipment corporation basic, since the computer at lewis and clark is a dec vax-11 . (each hardware manufacturer implements a slightly different version of a programming language.) the translation requires changing things such as square brackets to parentheses, the word read to get, the word write to put, etc. these changes would have to have been done repeatedly throughout the program, but, in fact, were quite easily accomplished by using a text editor-a metaprogram that can be instructed to change all occurrences of, for example, the word read to the word get in a single pass. clas retained all of the features of peas , and became fully operational at lewis and clark in february of 1980. since then, new features have been added as the staff expressed a need for them. some are minor, such as having the computer recognize initial articles in titles . others are more significant : 1. searching for records in clas by author and title makes use of unlimited rightand left-handed truncation. this makes possible subject searching through k~y words in the title . for this purpose an extra terminal is provided at the reference desk. 2. clas permits the file to be searched by the name of the faculty member who requested the item, in addition to the eight other access points available in peas. 3. clas provides an activity report for any given period showing, for each fund, the amount ordered, the amount received, and the average cost per item . 4. clas can produce vendor reports showing for each vendor the average discount and the delivery schedule. 5. clas asks the operator to verify the cost of an item if the list price and cost differ by more than 30 percent. 6. clas allows the receipt of partial shipments. some of the enhancements to clas involved successive modifications. for example, one of the features of peas was the prevention of duplicate orders by matching new orders being input with records already in the database. a potential duplicate is reported if there is a match on both the author and the title fields . it was decided at the time of implementation at lewis and clark that this criterion was too restrictive, and clas was programmed to report a duplicate if only the title fields matched . after some months of experience, it turned out that even this requirement was excessively restrictive: a slight variation in the way a title was input would prevent a duplicate from showing up. 
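the duplicate check being tuned here can be thought of as a single match predicate with interchangeable criteria. a minimal sketch, with hypothetical title and author fields, of the two criteria described so far (the author-plus-title match of peas as delivered, and the first clas relaxation to a title-only match):

    def is_potential_duplicate(new_order, existing, criterion="author_and_title"):
        # normalize case and runs of whitespace before comparing fields
        def norm(s):
            return " ".join(s.lower().split())

        title_match = norm(new_order["title"]) == norm(existing["title"])
        author_match = norm(new_order["author"]) == norm(existing["author"])

        if criterion == "author_and_title":   # peas as delivered
            return title_match and author_match
        if criterion == "title_only":         # first clas relaxation
            return title_match
        raise ValueError("unknown criterion: " + criterion)

the further relaxations and tightenings described next amount to adding further criteria to the same predicate, without touching the rest of the ordering routine.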
the criterion was then further relaxed to signal duplicates if either the title or the author's last name matched. this, however, was too broad a net : although no duplicates were missed, ordering a book by wilson or smith produced a tedious list of potential duplicates. hence , the requirement was tightened slightly to look for a match in either the title or the author's last name and first initial. this final criterion is currently serving well the needs of the watzek library. what is important about this evolutionary process is that it illustrates the dynamic way in which a library can "fine-tune" an automated system that is receptive to user modifications. since peas is supposed to be a selfexplanatory system, it lacks any documentation. clas is still a self-explanatory system, but nevertheless a manual has been produced to describe all its features and to record programming information such as the structure of the files . one version of the documentation is kept in machine-readable form so that it can be easily updated to correspond to developments in the program . in conclusion, it can be stated that a library-application software package has been successfully transplanted from one institution to another, from one hardware environment to another, and in doing so has matured into a fuller and more flexible system, which it is hoped will, in turn, benefit other libraries contemplating the automation of their acquisitions operation .2 references 1. jenko lukac, "a no cost online acquisicommunications 101 tions system for a medium-size library," library journal 107:684-85 (march 15, 1980). 2. interested libraries can request a copy of the clas program ($80) or manual ($40) directly from the author. the significance of information in the ordinary conduct of life* robert newhard: torrance public library, torrance, california. the information benefit provided to the general public by the developing telecommunications systems will be highly dependent upon the provider's perception of the current and potential role of information in the ordinary interests of life. as sessing this role cannot easily be done by standard questionnaire or survey methods because information does not have a conscious function in people's lives. some paradigms from the past and present may, therefore, be of use in articulating the everyday importance of information. the tool paradigm: information as a link between man and his tools or repairing a lost confidence prior to the industrial revolution, most production was carried on in the home, using tools either made or repaired mainly at home. in this cottage industry, each person was very close to and secure in the use of his tools . with the advent of the industrial revolution and the factory system, the worker no longer owned his tools, but went to one place to use someone else's tools. man and his tools began to separate. many used the tools, fewer understood them. this process began to create the "expert." today most of the tools we use-the automobile, telephone, computer termi* a version of this paper was delivered at the meeting on "public libraries and the remote electronic delivery of information (redi)," columbus, ohio, march 23-24, 1981. editorial board thoughts: a considerable technology asset that has little to do with technology mark dehmlow information technology and libraries | march 2014 4 for this issue’s editorial, i thought i would set aside the trendy topics like discovery, the clo ud, and open . . . 
well, everything—source, data, science—and instead focus on an area that i think has more long-term implications for technologists and libraries. for technologists in libraries, probably any industry really, i believe our most important challenges aren’t technical at all. for the average “techie,” even if an issue is complex, it is often finite and ultimately traceable to a root cause—the programmer left off a semi-colon in a line of code, the support person forgot to plug in the network cable, or the systems administrator had a server choke after a critical kernel error. debugging people issues, on the other hand, is much less reductive. people are nothing but variables who respond to conflict with emotion and can become entrenched in their perspectives (right or wrong). at a minimum, people are unpredictable. the skill set to navigate people and personalities requires patience, flexibility, seeing the importance of the relationship through the 1s and 0s, and often developing mutual trust. working with technology benefits from one’s intelligence (iq), but working with people requires a deeper connection to perception, self-awareness, body language, and emotions, all parts of emotional intelligence (eq). eq is relevant to all areas of life and work, but i think particularly relevant to technology workers. of particular importance are eq traits related to emotional regulation, self-awareness, and the ability to pick up social queues. my primary reasoning for this is that technology is (1) fairly opaque to people outside of technology areas and (2) technology is driving so much of the rapid change we are experiencing in libraries. it units in traditional organizations have a significant challenge because many root issues in technology are not well understood, and change is uncomfortable for most, so it is easy to resent technology for being such a strong catalyst for change. as a result, it is becoming more incumbent upon us in technology to not only instantiate change in our organizations but also to help manage that change through clear communication, clear expectation setting, defining reasonable timeframes that accommodate individuals’ needs to adapt to change, a commitment to shift behavior through influence, and just plain old really good listening. i would like to issue a bit of a challenge to technology managers as you are making hiring decisions. if you want the best possible working relationships with other functional areas in the library, especially traditional areas, spend time evaluating candidates for soft skills like a relaxed demeanor; patience; clear, but not condescending, communication; and a personal commitment to mark dehmlow (mdehmlow@nd.edu), a member of lita and the ital editorial board, is director, information technology program, hesburgh libraries, university of notre dame, south bend, indiana. editorial board thoughts: a considerable technology asset | dehmlow 5 serving others. these skills are very hard to teach. they can be developed if one is committed to developing them, but more often than not, they are innate. if a candidate has those traits as a base but also has an aptitude for understanding technology, that individual will likely be the kind of employee people will want to keep, certainly much more so than someone who has incredible technical skill but little social intelligence. 
for those who are interested in developing their eq, there are many of tools available—a million management books on team building, servant leadership, influencing coworkers, providing excellent service, etc. personally, i have found that developing a better sense of self-awareness is one of the best ways to increase one’s eq. tests such as the meyers briggs type indicator ,1 the strategic leadership type indicator ,2 and the disc,3 which categorize your personality and work-style traits, can be very effective tools for understanding how you approach your work and how your work style may affect your peers. combined with a willingness to flex your style based on the personalities of your coworkers, these can be very powerful tools for influencing outcomes. most importantly, i have found putting the importance of the relationship above the task or goal can make a remarkable difference in cultivating trust and collaboration. self-awareness and flexible approaches not only have the opportunity to improve internal relationships between technology and traditional functional areas of the library, but between techies and end users. we are using technology in many new creative ways to support end users, meaning techies are more and more likely to have direct contact with users. in many ways, our reputation as a committed service profession will be affected by out tech staffs’ ability to interact well with end users, and ultimately, i believe the proportion of our tech staff that have a high eq could be one the strongest predictor s of the long-term success for technology teams in libraries. references 1. “my mbti personality type,” the myers briggs foundation, http://www.myersbriggs.org/mymbti-personality-type/mbti-basics. 2. “strategic leadership type indicator —leader’s self assessment,” hrd press, http://www.hrdpress.com/slti. 3. “remember that boss who you just couldn’t get through to? we know why…and we can help,” everything disc, http://www.everythingdisc.com/disc-personality-assessment-about.aspx. http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/ http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/ http://www.hrdpress.com/slti http://www.everythingdisc.com/disc-personality-assessment-about.aspx lib-mocs-kmc364-20131012113323 velopment, which has recently seen the implementation of a new batch retrospectiveconversion subsystem, and added com catalog options and online authority verification during input/edit. while not the only bibliographic system to be successfully replicated, the wln computer system is becoming the most systematically replicated main-frame facility, with a broad range of future possibilities, including that of a truly turnkey system. wln's experience indicates that, if a system is designed for ease of maintenance at perhaps some sacrifice of efficiency, it will be readily transportable and allow others to obtain the benefits of a highly sophisticated bibliographic capability without the everincreasing cost of original development and, more importantly, without having to support the ongoing maintenance of a unique system. a general planning methodology for automation richard w. meyer, beth ann reuland, francisco m. diaz, and frances colburn: clemson university, clemson, south carolina. introduction a workable planning methodology is the logical starting place for the successful implementation of automation in libraries. 
an automation plan may develop on the basis of an informal arrangement or from the efforts of one individual, but just as often, automation plans are developed by committees. an automation planning committee must determine and execute some kind of planning methodology and is more likely to be successful if it starts with clear guidelines, good leadership, and a thoroughly proven approach. as a summary review of the literature will bear out, many libraries have developed their own planning techniques inhouse. some of these, which are addressed to the issues of cataloging rule changes and public-access catalogs, have been very well thought out .1 however, these techniques are generally not directed to planning for communications 205 library-wide automation, and are usually designed to meet the specific needs of an individual library. although the pattern for these studies is often similar, they do not seem to be based upon any general automation design methodology. neither, in addition, does there seem to be a general methodology available through any external library agency. the office of library management studies of the association of research libraries has developed a number of programs designed to assist libraries with their planning efforts, some of which appear to be useful in automation development. 2 but for many libraries, these programs may be too broad, too time-consuming or too expensive. as an alternative, some libraries will need to look elsewhere for a general automation planning methodology. this problem was addressed by the administration of the clemson library, and was resolved in a unique way. background the robert muldrow cooper library of clemson university has the responsibility of acquiring, preserving, and making available for use the many materials needed by faculty and students in their research and instructional efforts. at a typical landgrant institution like clemson, the amount of scholarly publishing and the pressure to develop research proposals has risen sharply in recent years . the increased needs of users working with an expanding and diversified collection have resulted in a doubling of circulation activity, and have required the growth of library staff by 70 percent over the last decade . furthermore, acquisition, processing, and access problems are compounded by the high inflation rate of materials, particularly serial publications, and manpower costs. even though user demands heavily burdened the traditional manual systems, the extent of library automation at clemson had been limited to a batch circulation system, a simple serials-listing capability, and the use of bibliographic utilities. although it had been generally accepted for some time that the acquisitions and fund-control functions at clemson were in need of automation, no concrete approach to develop206 journal of library automation vol. 14/3 september 1981 ing a system had been established. in addition, there was some concern that the development of an automated acquisitions system shouldn't be initiated without a clear understanding of how such an effort would affect the rest of the functions in the library. with this in mind, and as an initial part of planning, the library administration decided to implement a programmed study to determine specific needs and problems of the whole library at clemson and to determine the attendant costs and benefits of their resolution. 
since developing the methodology for this kind of study effort inhouse has been shown by experience elsewhere to be both expensive and time-consuming, a planning methodology was sought which could be brought in from outside the library and applied in a timely fashion. the international business machines corporation (ibm), through their local marketing representative, volunteered to supply that methodology by means of an education industry application transfer team (att} study. in order to implement the study, a team was organized consisting of representatives from the library, from the university's division of administrative programming services (daps), and from the ibm corporation. the purpose, approach, and results of that study constitute the rest of this paper. purpose the application transfer team methodology was implemented to fulfill a fourfold purpose. • first, it was necessary to act on the recognized need for a library-wide automation plan with something tangible that library and university administrators could use in the decision-making process. • second, basic objectives and implementation estimates were required to provide groundwork to the development of systems specifications and evaluation. • third, the planning process needed to provide a forum for meaningful participation by a number of library staff and users. • fourth, the planning needed to be accomplished rather quickly. the att met all these requirements. although the att study technique is generalized for work on any problem in the education arena, it seems particularly well suited to the library environment because it is oriented toward developing applications that solve production problems. the application transfer team methodology was developed by the ibm corporation for customer use. the a tt methodology evolved from ibm's business system planning function, which has been operational since the early 1970s. although the methodology has been used several times in the academic environment, this is the first time, to our knowledge, that it has been used in a library operation. the strength of the att is that it helps members of a team with diverse backgrounds to understand the environment under study. its final goal was "to improve operational productivity, provide better service to students, and provide information which can enhance management planning and decision making."3 put to work, the methodology is straightforward and effective. from beginning to end, the a tt process took clemson slightly more than three months elapsed time. total work time (including all report writing) for library staff was approximately one thousand man hours. as the initial step with the a tt methodology, it was necessary to engage a sponsor and to select a team . for this study, the sponsor chosen was the dean of graduate studies, who reported directly to the vicepresident for academic affairs. in turn, the director of mmpnting and the director of the division of administrative programming services (daps) reported to the dean of graduate studies. although it was not critical that the sponsor be intimately involved in the project, his level of authority within the university administration would help to secure acceptance of the study's recommendation. the sponsor also provided cogent advice along the way, based upon his understanding of institutional resources, and he served as a communication link with other university administrative offices. 
the study team was chosen by the library administration with the intention of getting diverse involvement and expertise. library staff included the associate director, the head of circulation, the serials cataloger, and a reference librarian. although only the associate director brought significant experience in library automation development, the head of circulation contributed substantial practical experience with automation systems. the cataloger offered specifics of bibliographic problems, cataloging rule changes, and serials control issues, and the reference librarian contributed a comprehensive knowledge of informationretrieval concerns. outside staff included the director of daps, who furnished details on the clemson computing environment, and an ibm marketing representative, who provided appropriate help with hardware capabilities, the att metnodology, and legwork. in addition, clemson was also able to engage the help of a representative of ibm's education industry division to guide the a tt efforts on the basis of his experience in the use of the methodology. from time to time, other ibm and daps staff were involved in assisting with interviews and report writing. the associate director served as team chair in order to act as spokesperson, to coordinate team effort, and to edit the final report. methodology the application transfer team methodology is applied in six phases. ibm recommends that these phases be conducted sequentially, and that they last from five to sixteen weeks, depending on the size of the problem. throughout the process, verbal reviews were conducted by the team with the sponsor and with the library staff. the first phase involved an organizational session. following the introduction of team members, the ibm education industry division representative presented an overview of the methodology and explained the mechanics of the a tt study process. the team then established the scope of the study by choosing an application area on which to focus and by determining the general objectives of the final system to be implemented. since part of the purpose of the project was to develop a plan for librarywide automation, it was quickly recognized by the team that the application area should be an integrated library information system. however, the ibm representative suggested that this scope was too broad for the study and that one functional area such as communications 207 acquisitions be chosen, with other functions reserved for subsequent att studies. given time constraints, a compromise arrangement was made in which serials control was determined as the scope. since serials control is a single functional area, but encompasses nearly all bibliographic issues, it served as a microcosm of overall library operations. therefore, it was generally accepted that a plan that effectively accommodated serials would constitute an integrated system plan. the organizational phase continued by determining who to interview during the data-collections phase and by setting up an interview schedule. this phase was concluded by developing an outline of the final report and by assigning writing responsibilities to individual team members. the data-gathering effort constituted phase two. this involved structured interviews of representative staff of each unit of the library who were involved in routine interactions with any phase of serials control at clemson. 
interviews were conducted with staff from acquisitions, cataloging, circulation, reference units, and branch libraries as well as the university business office, students, and faculty. following an outline in the att, each person interviewed was asked for specific details of his work with serial publications regarding (1) interfaces (or points of interaction), (2) concerns or needs, (3) suggested improvements, (4) expected values or benefits of improvements, (5) work volume, and (6) cycles. data gathered in each of these interview sessions were immediately documented in a letter to the interviewees. these letters were reviewed by those interviewed for corrections and added detail. data from completed and documented interviews were consolidated during the third phase of the study into a matrix of each of the six questions plotted against operational areas of the library, graphically designating areas of the greatest concern to the largest part of the library. this composite was analyzed to separate problems that could be reasonably handled by an integrated automation system from those that needed the attention of administrative policy and direction. functions for automation consideration were then examined in a "blue sky" session of the committee to envision what system would accommodate the specifications for serials control and access that each library unit and serials user required. from this session a synthesis emerged of the architecture for an integrated system.4 this architecture included a description of the basic relationships of functional modules of the system, a list of the various files needed to contain system information, and a list of data elements required for bibliographic holdings, acquisition, and patron records in the system database. phase four called for the translation of the architecture and general system requirements into modules on basic access, acquisition or processing functions, and into the individual programs needed to execute each module. the team divided into two parts. the ibm and daps personnel, with the associate director, listed the modules and programs and formulated descriptions of each. part of the description effort involved drafting approximate flowcharts of each program. using algorithms developed by ibm, these descriptions were used to assign estimates of person hours required to create the necessary modules. in order to determine the overall cost of system development the person-hour figures were converted to dollars using an average hourly cost for clemson daps personnel. committee members not involved in program/module design formed a group to evaluate anticipated benefits defined in the interviews, to collect data from library staff to support these expectations, and to assign a value to them. benefits from reduced file maintenance, processing, and tracking time were valued as person hours saved by the new system. additional improvements were projected for the system's capability for better fund control, more complete and immediate on-order, claiming, and in-process information, and statistical collection development/use data. these benefits were assigned the value of estimated duplicate and inappropriate material acquired under the present system. a value was not assigned to user benefits. faculty and student satisfaction is intangible, and variable from case to case. enhanced user service was recognized as a substantial benefit of the proposed system, but was not quantified.
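the arithmetic behind phase four, and the consolidation that follows in phase five, is straightforward and can be stated compactly. a sketch, in which every input is an assumption supplied by the planning team rather than a figure taken from the study:

    def cost_benefit_summary(module_hours, hourly_cost,
                             hours_saved_per_year, purchases_avoided_per_year):
        # development cost: estimated person hours per module times average hourly cost
        development_cost = sum(module_hours.values()) * hourly_cost
        # annual benefit: staff hours saved, valued at the same hourly rate, plus the
        # value of duplicate or inappropriate purchases avoided
        annual_benefit = (hours_saved_per_year * hourly_cost
                          + purchases_avoided_per_year)
        return {"development cost": development_cost,
                "annual benefit": annual_benefit}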
the cost factors determined in phase four were consolidated with derived benefit values to form a cost/benefit analysis, which constituted phase five. in the sixth and final phase an implementation plan was formulated. this plan, along with recommended target dates, was presented orally to library staff and university administration. in addition, the entire process, recommendations, and plan of action were documented in a written report. 5 results within the a tt report were a description of the current library environment, objectives and description of the proposed system, implementation considerations, a cost/benefits analysis, and recommendations for a plan of action. although care was taken to "walk through" the function of each module of the described system, the report was not intended to provide detailed computer program specifications ready to be coded by a programmer. it described a useful and powerful integrated serials system in sufficient detail to be a working tool in the hands of a knowledgeable systems analyst to match (or revise) already available systems and programs to the library's specifications. the report itself also served as an effective communication link with the university administration, setting out library concerns and giving rational solutions to the pervasive problem of serials control and, in the long term, to an integrated library information system. the timing of the a tt study was fortunate for the clemson library. the university was on the eve of an accreditation selfstudy. as often happens with the examination of any organization, a host of related, but unacknowledged , problems surfaced in the course of the att study. during the interviews, staff members felt free to bring up matters of unclear policies, misunderstood hierarchical arrangements, and staffing inadequacies throughout the library. the number and importance of nonautomation concerns was significant enough that an administrative report was written to articulate these problems to the university administration. 6 it is interesting to note also that, while in every instance the team received enthusiastic cooperation from all those interviewed , there was fear among some staff members that any automation project would necessarily cut staff positions. once this worry was identified, the study team was able to allay those fears by explaining the study's purpose. one of the greatest contributions of the att study has been the direction it has given the library for future goals and priorities. by focusing on the problems of serials control, the team evaluated a microcosm of library problems. investigating these problems in the environment of more limited budgets, possible future closing or freezing of the card catalog, and increased user demands for services has helped the library develop a course of action, a resolve of mission, and a direction for future growth. the staff of daps and the library are conducting a review of existing software and systems potentially appropriate for a comprehensive serials control system. the att study was the tool successfully used to elicit university support for library automation. the university has given its approval, and supplied funding, to proceed with the determination of available systems and with the development of a request for quotation. communications 209 references l. for example: university of rochester, river campus libraries, task force on access systems, report (rochester, n.y.: univ. 
of rochester, 1980), university of california, berkeley, general library, committee on bibliographic control, future of the general library catalogs of the university of california at berkeley (berkeley: univ. of california, 1977); pennsylvania state university libraries, systems development department, remote catalog access system: general system specifications (university park: pennsylvania state univ., 1977). 2. association of research libraries, office of management studies, annual report, 1979 (was hington, d.c.: the association, 1979). 3. international business machines corporation, application transfer teams: application description (white plains, n.y.: the corporation, 1977), p.1 ; international business machines corporation, application transfer teams: realizing your computing syste1ns' potential (white plains, n.y.: the corporation, 1977). 4. inte rnational busine'is machines corporation , business systems planning: information systems planning guide (white plains, n.y.: the corporation, 1975), p.49. 5. richard w. meyer and others, total integrated library information system: a report on the general design phase (syracuse, n.y.: eric clearinghouse on information resources, 1980), ed 191446. 6. richard w. meyer, cooper library: status and agenda. a report on fy 1979-80 (clemson, s.c.: clemson univ., 1980). lib-s-mocs-kmc364-20140601051731 58 book reviews descriptive cataloguing; a student's introduction to the anglo-american cataloguing rules 1967. by james a. tait and douglas anderson. second ed.; rev. and enl. hamden, conn.: linnet books, 1971, 122p. $5.00 this second edition contains some corrections to the errors made in the 1968 edition, and includes the changes and clarifications brought out by the aacr amendment bulletin. the number of exemplary title pages has been increased from twenty-five to forty, thus giving the student more practice in determining entries and doing descriptive cataloging. this reviewer believes that a more exact title would be "descriptive cataloging and determining entries and headings," because this introductory text not only covers descriptive cataloging as defined and explained in "part iidescriptive cataloging" of the anglo-american cataloguing rules, but also includes some of the basic rules for determining entries and headings in aacr's "part !-entry and heading." there are three distinct sections: descriptive cataloging; determining entries and headings; and facsimile title pages for student practice. descriptive cataloging is covered in just thirteen pages, but all the basic elements are there. the explanations are clear and examples are shown, but not in the context of a full card. (unfortunately only one full catalog card is illustrated in the entire book.) it is in this section, more than in any other, where the differences between british and american cataloging become obvious. british descriptive cataloging varies in so many ways from its american counterpart that a beginning student in an american library school would be quite confused by these variations. the next section consists of twenty-five pages and is devoted to the basic rules on entries and headings. examples are used to illustrate the rules and the authors point out some differences between the british and american texts of the aacr. the remaining seventy pages contain the forty reproduced title pages which are followed by some commentary and a key corresponding to each title page. 
these title pages give the student a wide range of experience in transcribing the proper information onto the card and in determining main and added entries. even though this book is an excellent introduction to the rudiments of descriptive cataloging and the determination of main and added entries, its use of british descriptive cataloging precludes its being widely adopted in beginning cataloging courses in american library schools.
donald j. lehnus
centralized processing for academic libraries. by richard m. dougherty and joan m. maier. metuchen, n.j.: scarecrow press, 1971. 254p. $10.00
this is the final report of the colorado academic libraries book processing center (calbpc) two-part study investigating centralized processing. phase i, reported by laurence leonard, maier, and dougherty in centralized book processing, scarecrow, 1969, was basically a feasibility study, whereas this final report describes the beginning six months of operations that tested the phase i recommendations. partially funded by the national science foundation, the experiment measured anticipated time and cost savings, monitored acquisitions and cataloging operations, and tested product acceptability for six libraries participating in the 1969 six-month study. even though centralized book processing might hold little appeal for the reader, this volume nonetheless is valuable to technical service heads because of its above average sophistication in applying a systems analysis approach to technical services problems. the authors objectively report their findings, outlining in detail the mistakes, the unanticipated problem areas, and what they believed to be the successes. from the start the authors encountered problems with scheduling. by the time the experiment began most participants had a large portion of their book money encumbered, and the center was forced to accept cataloging arrearages in addition to book order requests. those who did send in orders did not conform to patterns predicted in phase i. instead, the center was used as a source of obtaining more difficult materials, including foreign language items. it was discovered that in actual practice calbpc had no impact on discounts received from vendors. the vendor performance study lacked relevancy because it was based upon the date invoices were cleared for payment rather than the date books were received in house. in evaluating the total processing time, four libraries reduced their time lag by participating in the center's centralized processing, and the cost of processing the average book was reduced from $3.10 to $2.63. the product acceptance study showed that the physical processing was only partially accepted with most of the libraries modifying a truncated title that was printed on the book card and book pocket as a by-product of the automated financial subsystem. other local modifications were made on books processed by the center but that cost or local error correction costs were not reported in the study. calbpc's automated financial subsystem was besieged with many problems resulting from lack of programming foresight and adequate consulting by those who had previously designed such systems. individuals interested in the automation of acquisitions should read this section of the report. calbpc's problems were typically those of building exceptions to exceptions in order to accommodate unanticipated program omissions.
simply not recognizing that books could be processed before invoices were paid caused delays and bottlenecks of such magnitude that procedures had to be devised to circumvent requirements of the automated subsystem. many recommendations were particularly relevant to cooperative ventures. in formulating processing specifications such as call number format and abbreviation standardization, calbpc had not anticipated the infinite local variations they would have to accommodate. they quickly recognized the need for both greater quality control to minimize errors within the system and better communications and educational programs for participants. a reoccurring message was that librarians emphasized the esthetics of catalog cards rather than the content, thus a recommendation was made to investigate whether a positive correlation exists between the esthetics of the product and the quality of the library service. the authors emphasized that a cooperative program depends more upon competencies and willingness of individuals than the technical aspects of the operations. some diversification of services was called for but no mention was made of the possibilities of an on-line system. it was felt that in future operations the center should accept orders for out-of-print and audiovisual materials. those libraries participating in approval programs had received no benefit by having books sent first to the center, thus it was suggested that the center forward those libraries a bibliographic packet only and that the approval books bypass the center. this well-documented study, half of which is devoted to charts and appendix materials, concluded its recommendations with a positive evaluation of the service the center had performed and suggested that public and school libraries should also be participants. 
ann allan mccrory
orthographic error patterns of author names in catalog searches
renata tagliacozzo, manfred kochen, and lawrence rosenberg: mental health research institute, the university of michigan, ann arbor, michigan
an investigation of error patterns in author names based on data from a survey of library catalog searches. position of spelling errors was noted and related to length of name. probability of a name having a spelling error was found to increase with length of name. nearly half of the spelling mistakes were replacement errors; following, in order of decreasing frequency, were omission, addition, and transposition errors.
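the four error categories named in the abstract can be made concrete with a small classifier. a sketch, assuming the cited form of a name differs from the correct form by a single error of one of these kinds:

    def classify_error(correct, cited):
        # returns "replacement", "omission", "addition", or "transposition",
        # or None when the names match or differ by more than one such error
        if correct == cited:
            return None
        lc, li = len(correct), len(cited)
        if lc == li:
            diffs = [k for k in range(lc) if correct[k] != cited[k]]
            if len(diffs) == 1:
                return "replacement"
            if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                    and correct[diffs[0]] == cited[diffs[1]]
                    and correct[diffs[1]] == cited[diffs[0]]):
                return "transposition"
        elif li == lc - 1:
            # cited name omits one letter of the correct form
            for k in range(lc):
                if correct[:k] + correct[k + 1:] == cited:
                    return "omission"
        elif li == lc + 1:
            # cited name carries one extra letter
            for k in range(li):
                if cited[:k] + cited[k + 1:] == correct:
                    return "addition"
        return None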
computer-based catalog searching may fail if a searcher provides an author or title which does not match with the required exactitude the corresponding computer-stored catalog entry (1). in designing computer aids to catalog searching, it is important to build in safety features that decrease sensitivity to minor errors. for example, compression coding techniques may be used to minimize the effects of spelling errors on retrieval (2, 3, 4). preliminary to the design of good protection devices, the application of error-correction coding theory (5, 6, 7) and data on error patterns in actual catalog searches (8, 9) may be helpful. a recent survey of catalog use at three university libraries yielded some data of the above-mentioned kind (10). the aim of this paper is to present and analyze those results of the survey which bear on questions of error control in searching a computer-stored catalog. in the survey, users were interviewed at random as they approached the catalog. of the 2167 users interviewed, 1489 were searching the catalog for a particular item ("known-item searches"). of these, 67.9% first entered the catalog with an author's or editor's name, 26.2% with a title, and 5.9% with a subject heading. approximately half the searchers had a written citation, while half relied on memory for the relevant information. paradoxically, though most known-item searchers tried to match primarily an author and only secondarily a title, there were in the sample of searches many more cases of exact title citation than of exact author citation.
imperfect recall of author name
of the 1489 "known-item" searches, 1356 could be verified against the actual item. from the total number of searches (1260) in which the catalog user had provided an author's (or editor's) name, those works were subtracted which did not have a personal authorship (208) or had multiple authors or multiple editors (127). this left 925 searches, of which 470 had complete and correct author entries, while 455 contained various degrees of imperfection in the author citation. table 1 gives the distribution of incorrect and/or incomplete author citations. in the study an author's name was defined as incomplete when the first name, or the two initials, or one out of two initials was missing.

table 1. incorrect and/or incomplete author names
                                      categories
university of michigan libraries     i      ii     iii    total
general library                     144     25      6     175
undergraduate library                94     35      4     133
medical library                     110     27     10     147
total                               348     87     20     455

in category i (the most numerous) the author's last name was correct, but the author citation as a whole was either incomplete or incorrect; i.e., there were mistakes and/or omissions in the first and middle name or initials. most of the searches in category i were incomplete rather than incorrect. since in category i there is nothing wrong with the author's last name, the searcher's ability to gain access to the right location in the catalog is presumably not impaired as long as the last name is not too common. once the searcher has entered the catalog, he will make use of other clues, such as title or knowledge of the topic, to identify the right item. but if the name is smith or brown or johnson, and the catalog is a large one, to have an incomplete author's name may be equivalent to having no name at all. (in the university of michigan general library catalog, which contains over four million cards, the entry "smith" extends over eight drawers, and the entries "brown" and "johnson" over four drawers each.)
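the "compression coding" safeguard mentioned at the opening of this article can be illustrated with soundex, one widely known coding of this family (the techniques treated in references 2-4 differ in detail). a sketch of a simplified soundex key:

    _SOUNDEX = {**{c: "1" for c in "bfpv"},
                **{c: "2" for c in "cgjkqsxz"},
                **{c: "3" for c in "dt"},
                "l": "4",
                **{c: "5" for c in "mn"},
                "r": "6"}

    def soundex(name):
        # collapse a name to its first letter plus up to three digits, so that
        # similarly spelled (and similar-sounding) names share one key
        letters = [c for c in name.lower() if c.isalpha()]
        if not letters:
            return ""
        first, rest = letters[0], letters[1:]
        digits = []
        prev = _SOUNDEX.get(first, "")
        for c in rest:
            code = _SOUNDEX.get(c, "")
            if code and code != prev:
                digits.append(code)
            if c not in "hw":       # vowels reset the run of repeats; h and w do not
                prev = code
        return (first.upper() + "".join(digits) + "000")[:4]

names that differ by a minor slip, such as a doubled or dropped letter, frequently collapse to the same key, so a file keyed on the code rather than on the exact spelling is less sensitive to the kinds of errors analyzed below, though it cannot absorb every error.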
in an automated catalog it is easy to limit the set of entries from which the right item has to be selected by intersecting the last name of the author with some other clues. incompleteness of the author name may then not be a serious handicap.

category iii includes all searches in which the searcher had an author that turned out to be wrong. the error in this case was not in incompleteness or misspelling of the author's name, but in the identity of the author. no further analysis of this group was conducted. category ii is the one which forms the object of the present report. the analysis concerns mainly position and type of errors, and the incidence of errors as related to name length.

position of errors in author names

the location of errors in the author citation is important for manual systems, such as traditional library card catalogs, as well as for automated systems. table 2 shows the distribution of e in the sample of incorrect author citations from all three libraries, where e is the position of the letter, counting from left to right, in which an error appeared. in the fourteen cases in which more than one error occurred in the same name, only the first error was considered. in a few cases the error involved a string of letters (e.g., friedman for friedberg). in such cases the position of the first letter of the string determined the location of the error.

table 2. position of error in last name of author

   e     no.      %    cumulative %
   1      2      2.3        2.3
   2     11     12.6       14.9
   3     11     12.6       27.6
   4     19     21.8       49.4
   5     13     14.9       64.4
   6     12     13.8       78.2
   7      7      8.0       86.2
   8      6      6.9       93.1
   9      3      3.4       96.6
  10      2      2.3       98.9
  11      1      1.1      100.0
total    87

table 2 shows that about half the incorrect author names had errors in one of the first four letters, while the other half had errors in one of the following letters, from the fifth to the eleventh position. the most frequently misspelled is the fourth letter, which is responsible for 21.8% of the total number of errors occurring in the sample. the ordinal number indicating the position of the error is not, by itself, a sufficient indicator of the area where the error occurred. an error in the third letter, for instance, is close to the beginning of the name if the name is 9 letters long, but close to the end if the name is 4 letters long. in table 3, l indicates the length (the number of letters) of the author name and pe the location of the error, i.e., the position of the first letter, counting from left to right, where an error appears. the incorrect author names of the sample (87) have a length of between 3 and 12 letters. the column on the right of the table, el, indicates the distribution of names of a given length. the row at the bottom of the table gives the distribution of errors occurring in a given position. mistakes are shown to occur anywhere from the first letter to the eleventh letter. when the error consists in the addition of a letter to the end of the correct name, pe is beyond the name itself. the figures which appear next to the diagonal line, on the right, indicate mistakes of this sort.

table 3. position of error (pe) vs. length of name (l). [cross-tabulation of error position (columns 1-11) against name length (rows 3-12) for the 87 incorrect names; the column totals are those given in table 2, and the row totals (el) are the frequencies of incorrect names of each length given in table 4.]
a summary inspection of the table produces the impression that errors are clustered toward the end of the names, or at least that they are more prevalent in the second half of the name than in the first half. this seems to be a direct consequence of the fact that the first column of the table (errors in position 1) is almost empty. it is tempting to say that errors very rarely occur in the first letter of a proper name. but is this really so? it is true that english-speaking people place particular emphasis on initials, to the extent that initials are often sufficient for identifying well-known figures. the special attention given to the first letter of a name would certainly contribute to the scarcity of errors in such a letter. but it is also possible that when errors in the first letter occur, they so transform the name that it becomes unrecognizable. several such authors may have ended up in the category of non-verified authors necessarily excluded from the analysis. it would be interesting to verify whether the "serial-position effect" that some authors found in the spelling of common nouns is present also in the spelling of proper names. according to jensen and to kooi et al., the distribution of spelling errors in relation to letter position closely approximates the serial-position curve for errors found in serial rote learning (11, 12). to ascertain if this is the case for author names, a data base much larger than that used for this study would be needed.

distribution of errors and length of names

is the probability of a catalog searcher misspelling the name of an author dependent to any extent on the length of the name? table 3 shows the frequency of occurrence of names of a given length in the 87 misspelled names (column el). the next step was to calculate the distribution of the length of author names in the whole group of verified author citations provided by the catalog searchers. this group, it should be remembered, does not include multiple authors, multiple editors or nonpersonal authors. the ratio of the corresponding figures in the two distributions will give the percentage of names of a given length having spelling mistakes (table 4).

table 4. probability of errors in recall of author names of a given length

length of name    frequency of incorrect names    frequency of all names    percentage of incorrect names
      2                        -                             1                          -
      3                        1                             9                        11.1%
      4                        5                            87                         5.7%
      5                        7                           169                         4.1%
      6                       21                           215                         9.8%
      7                       19                           191                         9.9%
      8                       16                           127                        12.6%
      9                        8                            59                        13.6%
     10                        7                            36                        19.4%
     11                        2                            26                         7.7%
     12                        1                             5                        20.0%
   total                      87                           925

(grouped: short names, 3-5 letters, 4.9%; medium-length names, 6-8 letters, 10.5%; long names, 9-12 letters, 14.3%)

there is an observable trend toward an increase of mistakes with length of name. of course, the two extremes of the length distribution are scarcely represented, and this is probably responsible for inconsistencies in the percentage distribution. grouping names into three length categories (i.e., short names, middle-length names, and long names) makes the differences in percentages of incorrect names more apparent. the differences are significant at the .01 level of confidence.
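the grouped percentages in table 4 follow directly from its two frequency columns. a minimal sketch of the computation (the figures are those of table 4; the three length groups are the authors' own):

# frequency of misspelled names and of all verified names, by name length (table 4)
incorrect = {3: 1, 4: 5, 5: 7, 6: 21, 7: 19, 8: 16, 9: 8, 10: 7, 11: 2, 12: 1}
all_names = {2: 1, 3: 9, 4: 87, 5: 169, 6: 215, 7: 191, 8: 127,
             9: 59, 10: 36, 11: 26, 12: 5}

groups = {"short (3-5 letters)": range(3, 6),
          "medium (6-8 letters)": range(6, 9),
          "long (9-12 letters)": range(9, 13)}

for label, lengths in groups.items():
    bad = sum(incorrect.get(l, 0) for l in lengths)
    total = sum(all_names.get(l, 0) for l in lengths)
    print(f"{label}: {bad}/{total} = {100 * bad / total:.1f}% incorrect")

# output: short 13/265 = 4.9%, medium 56/533 = 10.5%, long 18/126 = 14.3%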
type of error in author names

errors which occurred in the spelling of the last names of authors were grouped into four broad categories: replacement errors, omission errors, addition errors, and transposition errors. while it is true, especially in badly mangled words, that an error can often be said to be of any of several types, it was generally easy to identify the simplest necessary transformation of the letters, and to assign the incorrect name to the type of error corresponding to that kind of transformation. in some cases this meant adding a string of letters or replacing one string by another. altogether the sample of 87 incorrect authors contained 104 errors. eleven names exhibited two errors each, three had three errors, and the remaining names just one error.

of the 104 errors, 50 were replacement errors; these are cases in which one letter or string of letters of the correct name has been replaced by a different letter or string of letters (e.g., hoiser for hoijer, friedman for friedberg). the most common replacement errors appear in table 5, in order of decreasing frequency.

table 5. single-letter replacement errors

no. of errors    correct letter    incorrect letter
      6                o           a, a, a, a, p, r
      5                            a, e, y, y, y
      4                y           a, i, u, z
      3                a           i, o, o
      3                s           c, r, z
      3                v           b, f, w
      2                e           i, o
      2                g           c, r
     28

not included in the table are the 10 letters which were each replaced just once and the 12 strings of letters. in four cases, the replaced letter was the second of a double letter.

there were 34 omission errors in all. four of these involved a string of letters; all the rest were single-letter omissions. eleven single-letter omissions occurred in the last letter of the name (e.g., abbot instead of abbott), and 19 in the middle of the name (e.g., brent instead of brendt). table 6 gives the frequency distribution of the omitted letters. the asterisk indicates that the omitted letter was the second of a double letter.

table 6. single-letter omission errors. [frequency of omitted letters, by middle vs. final position, for the 30 single-letter omissions: e was omitted 8 times, a and t 4 times each, n 3 times, h, i, l, and s twice each, and c, d, and r once each.]

addition errors totaled 18. in one case the addition consisted of a string of letters, while in the others only one letter was added. addition errors can occur in the middle of a name (e.g., berelison for berelson) or at the end of it (e.g., haller for halle). in the latter case, the added letter is found beyond the last letter of the correct name (these were the errors on the right of the diagonal in table 3). the distribution of addition errors is shown in table 7. the asterisk indicates that the added letter duplicated the previous letter.

table 7. single-letter addition errors. [frequency of added letters, by middle vs. final position, for the 17 single-letter additions: s was added 5 times, c, e, and i twice each, and a, f, l, m, n, and z once each.]

there were two transposition errors: ie for ei and ai for ia. in cases of second and third errors in the name, there were five replacement errors, seven omission errors, and five addition errors.
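for single-letter mistakes, the "simplest necessary transformation" can be identified mechanically. a minimal sketch of such a classifier, handling only single-character edits and ignoring the string-level cases discussed above (the name pairs are the examples quoted in the text):

def classify(correct: str, written: str) -> str:
    """return the simplest single-character edit turning `correct`
    into `written`: replacement, omission, addition, or transposition."""
    if len(correct) == len(written):
        diffs = [i for i, (a, b) in enumerate(zip(correct, written)) if a != b]
        if len(diffs) == 1:
            return f"replacement at position {diffs[0] + 1}"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and correct[diffs[0]] == written[diffs[1]]
                and correct[diffs[1]] == written[diffs[0]]):
            return f"transposition at position {diffs[0] + 1}"
    if len(correct) == len(written) + 1:          # one letter dropped
        for i in range(len(correct)):
            if correct[:i] + correct[i + 1:] == written:
                return f"omission at position {i + 1}"
    if len(written) == len(correct) + 1:          # one letter added
        for i in range(len(written)):
            if written[:i] + written[i + 1:] == correct:
                return f"addition at position {i + 1}"
    return "more than one simple edit"

for correct, written in [("hoijer", "hoiser"), ("abbott", "abbot"),
                         ("brendt", "brent"), ("halle", "haller"),
                         ("berelson", "berelison")]:
    print(written, "->", classify(correct, written))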
table 8 summarizes the types of errors encountered in the sample of incorrect author names. figures in this table include strings as well as single letters, and second and third errors as well as first errors.

table 8. distribution of types of errors

                        middle position    final position    total
replacement errors            44                  6            50
omission errors               21                 13            34
addition errors               10                  8            18
transposition errors           2                  -             2
total                                                         104

conclusion

four trends could be observed:

1) vowels usually replaced vowels, and consonants usually replaced consonants. apparently the probability of misspelling a single letter was slightly higher for vowels than for consonants. with the latter, there is some indication that the substitution was guided by phonetic similarity (e.g., "v" is replaced by "b", or "f", or "w").

2) most omissions in which the correct name had a double letter occurred at the end of the word.

3) replacement errors tended to come earlier in words than did omissions and additions. (this is not due to the fact that addition and omission errors contained a disproportionately high number of final errors; even when these final errors are excluded, replacement errors still come earlier than other types.)

4) second and third errors in a name have comparatively few replacement errors.

acknowledgment

this work was supported in part by the national science foundation, grant gn 716.

references

1. kilgour, f. g.: "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science, 5 (1968), 133-136.
2. nugent, william r.: "compression word coding techniques for information retrieval," journal of library automation, 1 (december 1968), 250-260.
3. ruecking, frederick h., jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-238.
4. dolby, james l.: "an algorithm for noisy matches in catalog searching." in: a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley, cal.: institute of library research, university of california, march 1969), 119-136.
5. peterson, william w.: error correcting codes (new york: wiley, 1961).
6. alberga, cyril n.: "string similarity and misspellings," communications of the acm, 10 (1967), 302-313.
7. galli, enrico j.; yamada, hisao m.: "experimental studies in computer-assisted correction of unorthographic text," ieee transactions on engineering writing and speech, ews-11 (august 1968), 75-84.
8. tagliacozzo, r., et al.: "patterns of searching in library catalogs." in: integrative mechanisms in literature growth. vol. iv (university of michigan, mental health research institute, january 1970). report to the national science foundation, gn 716.
9. university of chicago graduate library school: requirements study for future catalogs (chicago: university of chicago graduate library school, 1968).
10. tagliacozzo, renata; rosenberg, lawrence; kochen, manfred: access and recognition: from users' data to catalog entries (ann arbor, mich.: the university of michigan, mental health research institute, october 1969, communication no. 257).
11. jensen, arthur r.: "spelling errors and the serial-position effect," journal of educational psychology, 53 (june 1962), 105-109.
12. kooi, beverly y.; schutz, richard e.; baker, robert l.: "spelling errors and the serial-position effect," journal of educational psychology, 56 (1965), 334-336.

president's message

cindi trainor

information technologies and libraries | september 2013

it's fall already, and for lita that means some exciting things!
the program planning committee is hard at work evaluating the sixty-plus program proposals that have been submitted. the new conference format specified by ala means that we have 20 slots for lita programs at annual, including the president's program and top tech trends. the ppc certainly has its work cut out!

well before midwinter in philadelphia will be national forum 2013. i'm so excited that this year's forum is going to be in my home state of kentucky. join us from november 7-10 in "luhvuhl" (louisville) for great keynotes, preconference workshops, concurrent sessions, and of course, networking opportunities. travis good from make magazine will deliver our opening keynote, nate hill from chattanooga public library is up on saturday, and emily gore from the digital public library of america closes out the forum on sunday. i hope you'll join us for an exciting forum in the bluegrass state.

in governance news, the board of directors identified three goal areas on which we will concentrate this year: stabilizing the budget, engaging members, and growing membership. we are eagerly awaiting the final report of the financial stability task force, led by tom wilson and andrew pace; they presented preliminary findings at the board meeting in chicago in july. we are currently updating the strategic plan, which was last updated in 2010; watch the lita-l list and the blog for more. we would love your input on them!

if you're interested in lita governance, check out the board's space on ala connect. many discussion posts are open, and we welcome your comments! you can also complete this form (http://www.ala.org/lita/about/board/contact) to reach out to the board anytime. the board will be having several meetings this fall: the executive committee is tentatively meeting september 30; the forum steering committee will meet after forum 2013; the budget review committee will be meeting at the ala joint boards meeting in late october; and the entire board will have an online meeting before we convene in person in philadelphia. watch lita-l for the announcements; we welcome you as guests to all our meetings.

finally, for those of you interested in leadership but not necessarily ready to run for board, i want to point you to documents put together by the lita emerging leaders team in 2013 (http://connect.ala.org/node/197839). our team of three ala emerging leaders, margaret heller, zach coble, and katie heidgerken-greene, surveyed lita leaders, worked with committee chairs coordinator michelle frisque and ig chairs coordinator paul keith, and synthesized tons of information and many documents into the leadership guide for new chairs of committees and interest groups (http://connect.ala.org/node/209032). they also created a sample leadership game (http://www.gloriousgeneralist.com/leadership.html) to test our leadership knowledge, and presented their work at the emerging leaders poster session in chicago. the leadership guide will inform future orientation activities for committee and ig chairs and will then be handed over to the bylaws & organization committee to be incorporated into the lita manual.

thank you for being a lita member!
i hope to see you at a future event. my fellow board members and i welcome your comments and suggestions--for program ideas, workshop or online class ideas, and for how we can keep lita awesome. :)

cindi trainor (cindiann@gmail.com) is lita president 2013-14 and community specialist & trainer for springshare, llc.

history of library computerization

frederick g. kilgour: director, ohio college library center, columbus, ohio

the history of library computerization from its initiation in 1954 to 1970 is described. approximately the first half of the period was devoted to computerization of user-oriented subject information retrieval and the second half to library-oriented procedures. at the end of the period on-line systems were being designed and activated.

this historical scrutiny seeks the origins of library computerization and traces its development through innovative applications. the principal evolutionary steps following upon a major application are also depicted. the investigation is not confined to library-oriented computerization, for it examines mechanization of the use of library tools as well; indeed, the first half-dozen years of library computerization were devoted only to user applications.

the study reveals two major trends in library computerization. first, there are those applications designed primarily to benefit the user, although few, if any, applications have but one goal. the earliest such applications were machine searches of subject indexes employing post-coordination of uniterms. nearly a decade later, the first of the bookform catalogs appeared that made catalog information far more widely available to users than do card catalogs. finally, networks are under development that have as their objective availability of regional resources to individual users. the second trend is employment of computers to perform repetitive, routine library tasks, such as catalog production, order and accounting procedures, serials control, and circulation control. this type of mechanization is extremely important as a first step toward an increasingly productive library technology, which must be an ultimate goal if libraries are to be economically viable in the future (1, 2).

historical studies of library computerization have not yet appeared, although some reports beginning with that of l. r. bunnow (3) in 1960 contain valuable literature reviews. both editions of literature on information retrieval and machine translation by c. f. balz and r. h. stanwood (4, 5) are extremely useful. in addition, j. a. speer's libraries and automation (6) is a valuable, retrospective bibliography of over three thousand entries.

origins

the origins of library computerization were in engineering libraries newly established in the 1950's and employing the uniterm coordinate indexing techniques of mortimer taube on collections of report literature. the technique of post-coordination of simple index terms proved most suitable for computerization, particularly when the size of a file caused manual manipulation to become cumbersome. harley e. tillitt presented the first report, albeit unpublished at the time, on library computerization at the u.s. naval ordnance test station (nots), now the naval weapons center at china lake, california. the report, entitled "an experiment in information searching with the 701 calculator" (7), was given at an ibm computation seminar at endicott, new york, in may 1954.
the system was extended and improved in 1956, and a published report appeared in 1957 (8). tillitt subsequently published an evaluation (9). the nots system mimicked manual use of a uniterm card file. this noteworthy system could add new information, delete information related to discarded documents, match search requests against the master file, and produce a printout of document numbers selected. search requests were run in batches, thereby producing inevitable delays that caused user dissatisfaction. when the user did receive results of his search, he had a host of document numbers that he had to take to a shelf list file to obtain titles. subsequent system designers also found that a computerized system could cause user dissatisfaction if it did not speed up and make more thorough practically all tasks. because use of the system dwindled, it was not reprogrammed for an ibm 704 that replaced the 701 in 1957. however, a couple of years later, when an ibm 709 became available, the system was reprogrammed and improved so that the user received a list of document titles (10). tillitt, bracken, and their colleagues deserve much credit for their pioneer computerization of a subject information retrieval system. the application required considerable ingenuity, for the ibm 701 did not have built-in character representation. therefore it was necessary to develop subroutines that simulated character representation (11). moreover, the 701 had an unreliable electrostatic core memory. on some machines the mean time between failures was less than twenty minutes (12).

in september 1958, general electric's aircraft gas turbine division at evendale, ohio, initiated a system on an ibm 704 computer (13) that was similar to the nots application. mortimer taube and c. d. gull had installed a uniterm index system at evendale in 1953 (14, 15). the ge system was an improvement over the then-existing nots system because it printed out author and title information for a report selected, as well as an abstract of the report. like the nots system, however, the ge application provided only for boolean "and" search logic.

the celebrated medlars system (16) encompassed the first major departure in machine citation searching. the original medlars had two principal products: 1) composition of index medicus; and 2) machine searching of a huge file of journal article citations for production of recurrent or on-demand bibliographies. the system became operational in 1964. the nots and ge systems coordinated document numbers as listed under descriptors. medlars departed from this technique by searching a compressed citation file in which each citation had its descriptors or subject headings associated with it. the medlars system also provides for boolean "and," "or," and "not" search logic. the next major development was dialog (17), an on-line system for machine subject searching of the nasa report file. queries were entered from remote terminals.

the suny biomedical communication network constitutes an important development in operation of machine subject searching and production of subject bibliographies of traditional library materials. the suny network went into operation in the autumn of 1968 with nine participating libraries (18). its principal innovation is on-line searches from remote terminals of the medlars journal article file to which book references have been added. the suny network eliminates the two major dissatisfactions with the nots system and all subsequent batch systems, in that it provides the user with an immediate reply to his search query.
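the post-coordination underlying the nots, ge, and medlars searches amounts to set operations on the document numbers listed under each descriptor. a minimal sketch, assuming a small in-memory inverted file; the descriptors and document numbers are hypothetical and do not come from those systems.

# inverted file: uniterm descriptor -> set of document numbers (hypothetical)
index = {
    "propulsion": {101, 104, 110, 230},
    "turbine":    {104, 110, 512},
    "corrosion":  {110, 230, 615},
}

def search(all_of=(), any_of=(), none_of=()):
    """post-coordinate descriptors with boolean 'and', 'or', and 'not'."""
    universe = set().union(*index.values())
    hits = universe.copy()
    for term in all_of:                       # boolean 'and'
        hits &= index.get(term, set())
    if any_of:                                # boolean 'or'
        hits &= set().union(*(index.get(t, set()) for t in any_of))
    for term in none_of:                      # boolean 'not'
        hits -= index.get(term, set())
    return sorted(hits)

print(search(all_of=["propulsion", "turbine"]))               # [104, 110]
print(search(all_of=["propulsion"], none_of=["corrosion"]))   # [101, 104]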
catalog production

in 1960, l. r. bunnow prepared a report for the douglas aircraft company (3) in which he recommended a computerized retrieval system like the nots and ge systems that would also include catalog card production. bunnow's proposal was perhaps the first to contain the concept of production of a single machine readable record from which multiple products could be obtained, such as printed catalog cards and subject bibliographies produced by machine searching. catalog card production began in may 1961 (19), the cards having a somewhat unconventional format and being printed all in upper-case characters as shown in figure 1. cards were mechanically arranged in packs for individual catalogs, and alphabetized within packs, an early sophistication.

[figure 1. sample catalog card: an upper-case card for a douglas aircraft company status report (g. w. koriagin and l. r. bunnow, january 1962), carrying the uniterm descriptors information retrieval, libraries, computer, searching, ibm 7090, and ibm 1401.]

accompanying the production of catalog cards was production of accession lists from the same machine readable data. the next development in catalog card production occurred at the air force cambridge research laboratory library, which began to produce cards mechanically in upper- and lower-case in 1963 (20). a special computer-like device called a crossfiler manipulated a single machine readable cataloging record on paper tape to produce a complete set of card images punched on paper tape. this paper-tape product drove a friden flexowriter that mechanically typed the cards in upper- and lower-case. two years later, yale began to produce catalog cards in upper- and lower-case directly on a high-speed computer printer (21). the yale cards were also arranged in packs, as had been those at douglas, but were not alphabetized within packs.

the new england library information network, nelinet, demonstrated in a pilot operation in 1968 a batch processing technique servicing requests from new england state university libraries, via teletype terminals, for production of catalog card sets, book labels, and book pockets from a marc i catalog data file (22). the nelinet system became operational in the spring of 1970 employing the marc ii data base. also in 1968 the university of chicago library brought into operation catalog card production with data being input remotely on terminals in the library, and cards being printed in batches on a high-speed computer printer centrally (23).

bookform catalogs began to appear in the early 1960's, and it appears that the information center of the monsanto company in st. louis, missouri, published the earliest report on a bookform catalog that it had produced by computer in 1962 (24, 25). the center discontinued its card catalog in the same year. book catalogs can increase availability of cataloging information to users while reducing library work, and the monsanto book catalog is an example of such an achievement, for it provides a union catalog of the holdings of seven monsanto libraries, and is produced in over one hundred copies. as would be expected, the catalog appeared all in upper-case.
however, in september 1964 the library at florida atlantic university produced a bookform catalog in upper- and lower-case (26), and the university of toronto library put out the first edition of its upper- and lower-case onulp catalog on 15 february 1965 (27, 28). the monsanto catalog format called for author and call number on one line, with title and imprint on a second, or second and third, line. both florida atlantic and toronto catalogs were essentially catalogs of catalog cards. under the leadership of mortimer taube, documentation, inc. was first to produce a bookform catalog in upper- and lower-case, with a format like that of bookform catalogs in the nineteenth century (29); documentation, inc., prepared the catalog for the baltimore county public library. entries were made once, with titles listed under an entry if there were more than one. the stanford bookform catalog appeared late in 1966, introducing a new type of unit record, whose first element is the title paragraph.

h. p. luhn proposed selective dissemination of information (sdi) in 1958 (30), and perhaps the first library application of sdi was in the spring of 1962 at the ibm library at owego (31), where special processing was given to new acquisitions for input into the sdi system. at about the same time, the library of the douglas missile & space systems division instituted an sdi system that employed as input a single machine readable record from which catalog cards and accessions lists were also produced (32). the introduction of sdi into library operation is a major, historic innovation, for sdi is a routine but personalized service in contradistinction to the depersonalized library service characteristic of all but the smallest libraries. selective dissemination of information is one of the few examples of library computerization that takes full advantage of the computer's ability to treat an individual as a person and not as one of a horde of users.

circulation

the picatinny arsenal reported the first computerized circulation system (33). the picatinny application produced a computer printed loan record, lists of reserves, overdues, lists of books on loan to borrowers, and statistical analysis, in a system that began operation in april 1962. the charge card at picatinny was an ibm punch card into which was punched the bibliographic data and data concerning the borrower each time the book was charged. in the fall of 1962, the thomas j. watson research center (34) activated a circulation system much like the picatinny system, except that bibliographic data was punched into a book card by machine, but information about the borrower was manually punched.

the next step forward occurred at southern illinois university (35), where a circulation system like the two just described began limited operation in the spring of 1964 employing an ibm 357 data collection system. by using the 357, it was possible to have a machine punched book card and a machine readable borrower's identification card that could be read by the 357, thereby eliminating manual punching. the southern illinois system became fully operational at the beginning of the fall term of 1964, as did a similar 357 system at florida atlantic university (26).
batch processed circulation systems periodically producing a listing of books on loan have a built-in source of dissatisfaction, particularly in academic libraries, for current records are unavailable on the average for half the period of the frequency of the printout. such delay can be eliminated in an on-line system, wherein information about the loan is available immediately after recording the loan. however, not all circulation systems with remote terminals operate interactively. in an on-line system introduced at the illinois state library in december 1966 (36) the transactions were recorded on an ibm 1031 terminal located at the circulation desk, data transmitted from the terminal being accumulated daily and processed into the file nightly. as first activated, the system did not permit querying the file to determine books charged out, but this capability was added in 1969. also in december 1966, the redstone scientific information center brought into operation a pilot on-line book circulation system based on a converted machine readable catalog consisting of brief catalog entries. this pilot system remained in operation until october 1967, and was capable of recording loans, discharging loans, putting out overdues, maintaining reserves, and locating the record in the file (37).

the bellrel real time loan system went into operation at bell laboratories library in march 1968 (38). bellrel has a data base consisting of converted catalog records, so that in effect it also is a remote catalog access system. bellrel serves three libraries remotely from two ibm 1050 terminals in each library. bellrel is a sophisticated on-line, real time circulation system that not only records and discharges books, but also replies to inquiries as to the status of a title, and the status of a copy, and will display the full record for a title, as would be required for remote catalog access.

serials

the library of the university of california, san diego, activated the first computerized serials control system (39). this system has as its objective production of a complete holdings list, lists of current receipts, binding lists, claims, nonreceipt lists, and expiration of subscription lists. checking in was accomplished by manual removal from a file of a prepunched card for a specific title and issue. the check-in clerk sent this card to the computer center for processing and the journal issue to the shelves. this technique of prepunching receipt cards has generated new problems in some libraries, for professional advice is often needed as to action to be taken when the issue received does not match the prepunched card. nevertheless, the san diego system still operates, albeit with modifications. the washington university school of medicine library activated a serials control system in 1963 (40) that was essentially like that at san diego. a series of symposia held at washington university, with the first in the autumn of 1963, widely publicized the system and led to its adoption elsewhere. the university of minnesota biomedical library introduced a technique of writing in receipts of individual journal issues on preprinted check-in lists (41). check-in data was then keypunched from the lists. this system obviated the problem generated by prepunched cards that did not match received issues, but, of course, reintroduced manual procedures.
difficulties with check-in procedures, and delays in receipt of printed lists of holdings, made it clear that an on-line, real time serials control system would be superior to the batch systems described in the previous paragraph. laval university in quebec introduced the first on-line, real time system in 1969 (42). in september 1969 the laval on-line file held 16,335 titles. access to the file from cathode ray tube terminals is by accession number, and the file, or sections thereof, can be listed. the system also produces operating statistics and contains the potential for automatic claiming. the kansas union list of serials (43), which appeared in 1965, was the first computerized union list to contain holdings of several institutions. the kansas union list recorded holdings for nearly 22,000 titles in eight colleges and universities. reproduced photographically from computer printout and printed three columns on a page, this legible and easy-to-use list set the style for many subsequent union lists.

acquisitions

the national reactor testing station library was first to use a computer in ordering processes (44). a multiple-part form was produced for library records and for dealers. the library of the thomas j. watson research center activated a more sophisticated system in 1964 that produced a processing information list containing titles of all items in process, a shelf list card, a book card, and a book pocket label (45). the pennsylvania state university library put a computerized acquisition system into operation in 1964 (46). this system produced a compact, line-a-title listing of each item in process, together with an indication of the status of the item in processing. a small decklet of punch cards was produced for each item on a keypunch, and one of these cards was sent to the computer center for processing each time its associated item changed status. the pennsylvania system also produced purchase orders.

in june 1964, the university of michigan library (47) introduced a computerized acquisitions procedure more sophisticated than its predecessors. the michigan system produced a ten-part purchase order fanfold, an in-process listing, and computer produced transaction cards to update status of items in process, and carried out accounting for encumbrance and expenditure of book funds. in addition, the system produced periodic listings of "do-not-claim" orders, listings of requests for quotation, and of "third claims" for decision as to future action on such orders. in 1966, the yale machine aided technical processing system began operation (48). it produced daily and weekly in-process lists arranged by author, a weekly order number listing, weekly fund commitment registers, and notices to requesters of status of request. subsequently, claims to dealers were added, as well as management information reports on activities within the system. like the pennsylvania and michigan systems, its in-process list recorded the status of the item in processing.

the washington state university library brought the first on-line acquisition system into operation in april 1968 (49). access to the system was by purchase order number, with records arranged in a random access file under addresses computed by a random number generator (50). the stanford university libraries on-line acquisition system began operation in 1969 (51), and employed a sequential file of entries having an index of words in author and title elements of the entry. the stanford system calculated addresses of index words by employing a division hashing technique on the first three letters of the word.
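division hashing of the kind used at stanford can be sketched in a few lines. the table size and the numeric coding of the first three letters below are assumptions for illustration only, not the stanford system's actual parameters.

TABLE_SIZE = 997          # a prime table size (hypothetical)

def bucket(word: str) -> int:
    """compute a file address from the first three letters of an index word
    by coding them as base-26 digits and taking the remainder (division hashing)."""
    prefix = (word.lower() + "aa")[:3]        # pad very short words
    value = 0
    for ch in prefix:
        value = value * 26 + (ord(ch) - ord("a"))
    return value % TABLE_SIZE

# index words from author and title elements of an entry (hypothetical)
for word in ["library", "automation", "catalog", "cat"]:
    print(word, "->", bucket(word))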
standardization

by 1965, a dozen or more libraries had a dozen or more formats for machine readable bibliographic records, and an impenetrable thicket of such records was evolving. fortunately, the library of congress, with the help of the council on library resources, took the initiative in standardization of format of bibliographic records and produced the now familiar marc format (52). just as standardization of catalog card sizes enabled interchange of catalog records, so has marc made possible interchange of machine readable catalog records. this standardization has encouraged developments of networks, such as the suny biomedical network, nelinet, the washington state libraries network, and that of the ohio college library center. with each of these regional networks employing the marc bibliographic record, it will be possible to integrate these regional nodes into a future national network.

substance and sum

the first half of the first decade and a half of library computerization was confined almost entirely to two major mechanizations of mortimer taube's uniterm coordinate indexing. the computerization of single descriptors with attendant document numbers was a relatively easy task. the first breakaway from computerized subject searching came at the douglas aircraft corporation, where the technique of producing one machine readable record from which multiple products could be obtained was introduced in 1961. the last half of library automation's decade and a half has been largely consumed with efforts to automate existing library procedures. although notable departures have occurred that take advantage of the computer's powerful qualities, on-line, real time techniques introduced at the very end of the historical period under review began again to use individual words as words, not unlike the logic in which the first applications employed uniterms; and it seems likely that the immediate future will witness increasing degrees of computerization based on individual words in bibliographic descriptions rather than on the record as a whole.

acknowledgments

the author is grateful to sheila bertram for identifying, searching out, and gathering most of the references used in this paper. cloyd dake gull furnished in correspondence invaluable information about events of the fifties and early sixties, and various librarians supplied photocopies of early documents.

references

1. kilgour, frederick g.: "the economic goal of library automation," college & research libraries, 30 (july 1969), 307-311.
2. baumol, william j.: "the costs of library and informational services." in libraries at large (new york: r. r. bowker co., 1969), pp. 168-227.
3. bunnow, l. r.: study of and proposal for a mechanized information retrieval system for the missiles and space systems engineering library (santa monica, california: douglas aircraft co., 1960).
4. balz, charles f.; stanwood, richard h.: literature on information retrieval and machine translation (international business machines corp., november 1962).
5. balz, charles f.; stanwood, richard h.: literature on information retrieval and machine translation, 2d ed. (international business machines corp., january 1966).
6. speer, jack a.: libraries and automation; a bibliography with index (emporia, kansas: teachers college press, 1967).
7.
tillitt, harley e.: "an experiment in information searching with the 701 calculator," journal of library automation, 3 (sept. 1970 ), 202-206. 8. bracken, r. h. ; tillitt, h. e.: "information searching with the 701 calculator," journal of the a ssociation for computing machinery, 4 ( april 1957 ), 131-136. 9. tillitt, harley e. : "an application of an electronic computer to information retrieval." in boaz, martha : modern trends in doc'lrlm entation (new york: pergamon press, 1959), pp. 67-69. history of library computedzationjkilgour 227 10. zaharias, jerome l.: lizards; libmry irlformation search and retrieval data system (china lake, california: u. s. naval ordnance test station, 1963). 11. bracken, robert h.; oldfield, bruce g.: "a general system for handling alphameric information on the ibm 701 computer," journal of the association for computing machinery, 3 (july 1956), 175-180. 12. rosen, saul: "electronic computers: a historical survey," computing surveys, 1 (march 1969), 7-36. 13. barton, a. r.; schatz, v. l.; caplan, l. n.: information retrieval on a high speed computer (evendale, ohio: general electric co., 1959), p· 8. 14. gull, c. d.: personal communication, (22 august 1969) . 15. dennis, b. k.; brady, j. j.; dovel, j. a., jr.: "five operational years of inverted index manipulation and abstract retrieval by an electronic computer," journal of chemical documentation, 2 (october 1962 )) 234-242. 16. austin, charles j.: medlars; 1963-1967 (bethesda, maryland: national library of medicine, 1968). 17. summit, roger k.: "dialog: an operational on-line reference retrieval system." in association for computing machinery: proceedings of 22nd national conference. (washington, d. c.: thomson, 1967), pp. 51-56. 18. pizer, irwin: "regional medical library network," bulletin of the medical libmry association, 51 (april1969), 101-115. 19. koriagin, gretchen w .: "library information retrieval program," journal of chemical documentation, 2 (october 1962 ) 242-248. 20. fasana, paul j.: "automating cataloging functions in conventional libraries," 7 (fall 1963), 350-365. 21. kilgour, frederick g.: "library catalogue production on small computers," american documentation, 17 (july 1966), 124-131. 22. nugent, william r.: "nelinet-the new engjand information network." in congress of the international federation for information processing, 4th, edinburgh, 5-10 august, 1968: proceedings (amsterdam: north-holland publishing co., 1968), pp. g 28-g 32. 23. payne, charles t.: "the university of chicago's book processing system." in proceedings of a conference held at stanford university libraries, october 4-5, 1968 (stanford, califomia: stanford university libraries, 1969). 24. wilkinson, w . a.: personal communication (november 1969). 2.5. wilkinson, w. a.: "the computer-produced book catalog: an appli · cation of data processing at monsanto's information center." in university of illinois graduate school of library science: proceedings of the 1965 clinic on library applications of data processing (champaign, illinois: illini union bookstore, 1966), pp. 92-111. 228 journal of library automation vol. 3/3 september, 1970 26. heiliger, edward: "florida atlantic university library." in university of illinois graduate school of library science: proceedings of the 1965 clinic on library applications of data processing (champaign, illinois: illini union bookstore, 1966), pp. 92-111. 27. bregzis, ritvars: personal communication (november 1969 ) . 28. 
bregzis, ritvars: "the ontario universities library project-an automated bibliographic data control system," college & research libraries, 26 (november 1965), 495-508. 29. robinson, charles w.: "the book catalog: diving in," wilson library bulletin, 40 (november, 1965), 262-268. 30. luhn, h. p.: "a business intelligence system," ibm journal of research and development, 2 (october 1958), 315-319. 31. stanwood, richard h.: "the merge system of information dissemination, retrieval and indexing using the ibm 7090 dps ." in association for computing machinery: digest of technical papers (1962), pp. 38-39. 32. young, e. j.; williams, a. s.: historical development and present status-douglas aircraft company computerized library program (santa monica, california: douglas aircraft co., 1965). 33. haznedari, i.; voos, h.: "automated circulation at a government r & d installation," special libraries, 55 (february 1964), 77-81. 34. gibson, r. w., jr.: randall, g. e.: "circulation control by computer," special libraries, 54 (july-august 1963), 333-338. 35. mccoy, ralph e.: "computerized circulation work: a case study of the 357 data collection system," library resources & technical services, 9 (winter 1965), 59-65. 36. hamilton, robert e.: "the illinois state library 'on-line' circulation control system." in university of illinois graduate school of library science: proceedings of the 1968 clinic on library applications of data processing (urbana, illinois: graduate school of library science, 1969), pp. 11-28. 37. "redstone center shows on-line library subsystems," datamation, 14 (february 1968), 79, 81. 38. kennedy, r. a. : "bell laboratories' library real-time loan system (bellrel)," journal of library automation, 1 (june 1968), 128-146. 39. university of california, san diego, university library: report on serials computer project; university library and ucsd computer center (la jolla, california: university library, july 1962). 40. pizer, irwin h.; franz, donald r.; brodman, estelle: "mechanization of library procedures in the medium-sized medical library: i. the serial record," bulletin of the medical library association, 51 (july 1963) , 313-338. 41. strom, karen c.: "software design for bio-medical library serials control system." in american society for information science, annual meeting, columbus, 0., 20-240ct.1968: proceedings, 5 (1968) , 267-275. history of l-ibrary computerizationjkilgour 229 42. varennes, rosario de : "on-line serials system at laval university library," journal of library automation, 3 (june 1970). 43. kansas union list of serials ( lawrence, kansas: university of kansas libraries, 1965 ), 357 pp. 44. griffin, hillis l.: "electronic data processing applications to technical processing and circulation activities in a technical library." in university of illinois graduate school of library science: p-roceedings of the 1963 clinic on library applications of data process'ing (champaign, illinois: illini union bookstore, 1964) , pp. 96-108. 45. randall, g. e.; bristol, roger p.: "pil (processing information list ) or a computer-controlled processing record," special libraries, 55 (feb. 1964), 82-86. 46. minder, thomas l.: "automation-the acquisitions program at the pennsylvania state university library." in international business machines corporation: ibm library mechanization symposium, endicott, new york, may 25, 1964, pp. 145-156. 47. 
dunlap, connie: "automated acquisitions procedures at the university of michigan library," library resources & technical services, 11 (spring 1967), 192-206.
48. alanen, sally; sparks, david e.; kilgour, frederick g.: "a computer-monitored library technical processing system." in american documentation institute, 1966 annual meeting, october 3-7, 1966, santa monica, california: proceedings, pp. 419-426.
49. burgess, t.; ames, l.: lola; library on-line acquisitions subsystem (pullman, wash.: washington state university library, july 1968).
50. mitchell, patrick c.; burgess, thomas k.: "methods of randomization of large files with high volatility," journal of library automation, 3 (march 1970).
51. parker, edwin b.: "developing a campus information retrieval system." in proceedings of a conference held at stanford university libraries, october 4-5, 1968 (stanford, california: stanford university libraries, 1969), pp. 213-230.
52. "preliminary guidelines for the library of congress, national library of medicine, and national agricultural library implementation of the proposed american standard for a format for bibliographic information interchange on magnetic tape as applied to records representing monographic materials in textual printed form (books)," journal of library automation, 2 (june 1969), 68-83.

providing bibliographic services from machine-readable data bases: the library's role

richard de gennaro: director of libraries, university of pennsylvania, philadelphia

libraries will play a key role in providing access to data bases, but not by subscribing to tape services and establishing local processing centers as is commonly assumed. high costs and the nature of the demand will make this approach unfeasible. it is more likely that the library's reference staff will develop the capability of serving as a broker between the local campus user and the various regional or specialized retail distribution centers which exist or will be established.

this brief paper will attempt to counter the widely held view that the larger research libraries will soon need to begin subscribing to the growing number of data bases in machine-readable form and providing current awareness and other services from them for their local users.* it will speculate on how this field might develop and will suggest a less expensive and more feasible strategy which libraries may use to gain access to these increasingly important bibliographic services. the key question of who will pay for these new services, the user or the institution, will also be discussed.

* this paper was developed from a talk by the author on a panel entitled "library management of machine-readable data bases." the program was jointly sponsored by cola, isad, and acrl and took place at the ala conference in las vegas, june 24, 1973.

while it is clearly outside the scope of this paper to review the state-of-the-art of data base services, reference to a few key works and a brief introduction to the subject may be helpful. the most comprehensive and authoritative review of the state-of-the-art of the field and its literature is the excellent chapter entitled "machine-readable bibliographic data bases" by marvin c. gechman in the 1972 volume of the annual review of information science and technology (1). a useful selection of readings is key papers on the use of computer-based bibliographic services edited by stella keenan and published jointly
by the american society for information science and the national federation of abstracting and indexing services in 1973 (2). a study of six university-based information systems made by the national bureau of standards is essential and contains in convenient form comparative and descriptive information about these pioneering centers which are sponsored by the national science foundation (3).

some of the most useful and important data bases available are those that have been developed by the indexing and abstracting services as byproducts of their efforts to automate the production of their regular printed publications. like the publications, the tapes come in a wide variety of incompatible formats. among the important producers are: chemical abstracts service, biosciences information service, engineering index inc., american institute of physics, and the american geological institute. ccm information corporation (pandex) and the institute for scientific information are two examples of major commercial suppliers. several of the scientific societies received substantial grants from the national science foundation and other sources in the 1960s for this automation effort, and it was generally expected that an important new market for the by-product tapes would develop among researchers in universities and in industry.

imaginative and forward-looking librarians and computer people at various universities applied for and received grants to establish centers where these new data tapes could be used to provide current awareness and retrospective search services to users. the national aeronautics and space administration established a network of regional dissemination centers at six universities, including the universities of connecticut, indiana, and new mexico, the north carolina science and technology research center, university of pittsburgh, and the university of southern california. the national science foundation has been supporting centers at the university of georgia, lehigh university, university of california at los angeles, ohio state university, and stanford university. other centers have been established at the illinois institute of technology research institute and the university of florida. it is worth noting that nearly all centers provide services free to their own institutional users and continue to be heavily subsidized. all seem eager to expand their markets to include paying customers from a larger region.

the latest entry into this field is the new england board of higher education's northeast academic science information center (nasic) sponsored by nsf. nasic's approach is basically different from the unitary centers that have been named. it will attempt to become a broker between the various existing centers and its own members, facilitating their access to existing services elsewhere. it will serve a ten-state region and is expected, perhaps somewhat optimistically, to become self-supporting after the three-year grant period ends.

the number of data bases available in the united states is now over a hundred and is growing rapidly, apparently without benefit of firm standards. a parallel development is taking place in europe. as the number of available data bases increases, and as the activity at these centers expands, more and more librarians become interested in and concerned about how they are going to provide these new, important, and expensive services on their own campuses.
interest among librarians in data base services is running high. a session at the association of research libraries conference in the spring of 1973 was devoted to it, and a program at the annual meeting of the american library association in las vegas on the subject was jointly sponsored by the cola discussion group, the information science and automation division, and the association of college and research libraries. while this interest is commendable and should be stimulated, it is also important that it be tempered and put into perspective by a realistic consideration of some of the costs and problems involved in providing these services. this is what the remainder of this paper will attempt to do.

the title of the ala program was "library management of machine-readable reference data bases." implied in that title are two basic assumptions that are widely accepted: one is that libraries will play a key role in providing access to information in machine-readable data bases on their campuses. the other is that in order to provide this access they will have to acquire and maintain these data bases and develop the capability of searching and manipulating them for their local users. the first assumption is valid; libraries will be responsible for assisting users in gaining access to information in this new form. the second assumption is highly questionable, if not invalid. it is extremely unlikely that many individual libraries will be able to afford to establish centers to acquire and process these machine-readable data bases. while it may appear that a straw man is being set up that can be easily demolished, the idea that academic libraries must and will begin acquiring and servicing many large and expensive data bases, and even statistical data banks, is still widely enough held that it ought to be put to rest.

how did this idea gain such currency? perhaps it was because the first available data bases were from the indexing and abstracting services and contained machine-readable versions of their printed indexes. since libraries subscribed to the printed editions, it followed that they should also subscribe to the tape editions. the same is true for the census tapes. libraries were the chief repositories for printed census publications, so it was natural to assume that they would have to subscribe to and make available the machine-readable census data as well. we now know better about the census tapes; the problem was simply beyond our resources, and they are being made available from specialized centers. a similar solution may well emerge for the bibliographical data tapes of the indexing and abstracting services.

to help put matters into perspective, it might be useful to review a few other ideas we had in the last two decades on how certain technological developments would be implemented in the library. take microfilm, for example. back in the 1950s when microfilm came of age for library use, many librarians thought that every major library would require its own laboratory where large quantities of film could be produced and processed under the direction of a new breed of librarian called a documentalist. several major libraries did establish such laboratories for a time, but the only remaining ones of any significance are at the library of congress and a few other large libraries.
most of the others were put out of business by the copying machine, the local service bureau, and commercial micropublishers-and the documentalists became information scientists. library automation provides other interesting examples. many of us recall that in the 1960s it was a commonly held view that each major library would have to automate its operations, and that librarians would learn to master the computer that was soon to be installed in every library basement, or see themselves replaced by computer experts. as we all know, it did not happen that way. librarians will probably end up with computer terminals or minicomputers, with software packages supplied by library cooperatives or commercial vendors. when the marc tapes were first made available, it was assumed (and this is what the marc i experiment was all about) that each library would have to subscribe to the tapes and design, implement, and operate its own system to use the data in its cataloging operations. again, it did not happen that way. marc data are being used by libraries, but indirectly through cooperative centers such as oclc, or through commercial vendors of card services such as information design or josten's, inc. individual libraries are not subscribing to marc tapes, as we had thought would be the case. the point of citing these few examples is to suggest that it is extremely difficult in the early stage of a new technology to predict with any confidence how it will be introduced and implemented, and what effects it will have. we seem to have a natural tendency first to try to cope with each new technological development on a do-it-yourself individual library level, and when experience teaches us that implementing the particular technology is more difficult and more expensive than we thought, we regroup and try a broader-based approach. this is approximately where we are with data base services; it is time for a broader-based approach. again, it is unlikely that libraries will provide access to machine-readable data by setting up their own campus information centers to acquire and process data bases. anyone who takes the time to look at a list of data bases available and their annual subscription rates will understand that research library book budgets will not be large enough to cover these additional subscription costs. in fact, the subscriptions are only a minor element in the total cost of providing these services. the data bases must be cumulated and maintained. programs to manipulate and access them in their many nonstandard formats and contents must be written or adapted. the cost of administering and marketing the services and interfacing with the users will be high. perhaps the most critical question to be answered is: will the individual user be charged for the services he uses or will the costs be absorbed by the university? the answer to that question will determine how and to what extent the machine-based services will be used in the future. if they are offered free, as are traditional library services, then one can assume with some confidence that a substantial demand for them will materialize. this has in fact been the early experience of the centers at the university of georgia and ohio state and others where use has been totally subsidized by grant money.3,4 on the other hand, if the individual user is asked to pay for these services out of his own pocket or even out of departmental or grant funds, the market for them will be severely limited.
it is extremely unlikely that large numbers of faculty and other researchers in universities will be seriously interested in becoming paying users of machine-based information services. the experience of c. c. parker at the university of southampton may prove to be typical.5 he reported a drop from forty-seven to five users of an sdi service after charges were introduced. it was not that the users could not pay the charges, but that they preferred to use their resources for other more important needs. the national library of medicine recently instituted user charges in the medline system in order to effect a needed reduction in the number of users. the case for giving these services to users free is theoretically sound in the traditional library context, but there are practical difficulties. first, these services will be expensive and they will require a net addition to library budgets rather than a transfer from one activity to another; the prospects for such budget increases seem dim in the next few years. second, if the services are offered free, there will be no natural or automatic mechanism for controlling their use, and such control is essential to limit costs. once users get on a free subscription list they will tend to stay on it whether they actually use the products or not. this happens in many libraries where current accessions lists are regularly sent to faculty, most of whom discard them unread. on the other hand, there is ample precedent for charging a modest fee for certain services in libraries. the best example is the almost universal charge for photocopies. in those instances where libraries offered free copies, the service was abused and charges had to be reinstated. it seems likely that a combination of institutional subsidy and individual charges will evolve as the dominant method of paying for machine-readable services. in order to recover some costs and prevent abuses, an appropriate system of charges will have to be instituted in spite of the logic of the argument for free services. incidentally, the case for free computer time in universities is perhaps equally valid, but it has never been accepted by the responsible budget officers. regardless of who pays, these services will have to be advertised and marketed aggressively to reach the limited number of potential users on each campus. it will not be enough to announce their availability and wait for customers. but even the best salesman on the most research-oriented campus will probably fail to find enough users to justify the high costs of providing the extensive and diverse subject coverage that every university will require. the solution, of course, lies in the establishment of a small number of comprehensive regional or even national information processing centers, possibly backed up by a much larger number of specialized centers or services for particular subject or mission-oriented fields such as physics, chemistry, medicine, pollution, urban studies, census data, etc. libraries will play a key role in facilitating access to data bases by functioning as the interface or broker between the users on campus and these regional and special processing and distribution centers. this means that they must develop a new kind of information or data services librarian on their reference staffs whose function it will be to publicize these services and maintain extensive files of information on their scope, contents, cost, and availability.
these reference specialists will also guide users to the most appropriate services, help them to build and maintain their interest profiles, and provide assistance with the business aspects of dealing with vendors. (the university of pennsylvania library recently established a data services office based on this concept, with encouraging early results.) after an initial start-up period, this function should and doubtless will become a fully integrated part of the regular reference service, and the need for specialists will disappear as this knowledge becomes a part of every reference librarian's repertoire. the available data base services fall into two main categories: off-line batch and on-line interactive services. the most commonly available up to now have been regular off-line current awareness (sdi) services based on an interest profile; these have been supplemented by occasional requests for retrospective searches of the older files. the results of these off-line searches are delivered to the subscriber by conventional mail. on-line services permit the user or the reference specialist to access a portion of the data base directly via terminals and telephone lines and perform the search in an interactive mode. some results are immediately displayed on the terminal and others are sent by mail. the lockheed information retrieval service and systems development corporation have recently begun offering interactive searching with online computer terminals of a large selection of the most useful bibliographic data bases. with this capability commercially available from leased terminals on a fee-per-use basis, it will be difficult for a university or even some existing centers to justify subscribing to and maintaining these data bases for their own limited use. if lockheed, sdc, and other vendors can develop the market and operate these services at a profit, they may be able to satisfy a very substantial portion of the need for these new bibliographic services. medline, toxline, recon, and the new york times information bank provide other models for specialized and centralized interactive services. some authorities assert that this trend toward on-line interactive searching will accelerate and eventually supersede tape searching.6 others argue that the cost of maintaining and searching on-line the really large data bases is prohibitive and will remain so for several years to come. it seems most likely to this author that the trend will be toward on-line systems covering a limited period of time, probably the latest three to five years, with supporting off-line services for retrospective searches. if this proves to be the case, libraries will find it practical and convenient to make terminals available at or near reference desks. a close look at the several centers which now exist on individual campuses would probably show that they are heavily subsidized by grant or other outside funds, and that they are trying to expand to serve their states or even wider regions in order to achieve greater cost effectiveness. these centers deserve the credit that is always due pioneers. they are in the process of developing the patterns for providing these services in the future. one of the chief lessons they may have already taught us is that a single university, or even possibly a single state or region, is not a large enough market base upon which to build this activity.
these centers will require a large volume of business to justify their high overhead and operating costs and they will seek and welcome additional paying customers. to summarize and conclude, libraries will play a key role in providing access to machine-readable data bases, but they will generally not do it by acquiring and managing these data bases in local campus centers because of the high costs involved. these high costs and the limited market will restrict the number of processing centers to several regional or even national centers, supplemented by a larger number of specialized discipline and mission-oriented services. many data bases and services will be available on a fee-for-service basis either through existing centers or directly from professional societies, government agencies, and commercial vendors with the library serving as facilitator or broker. it seems likely that a combination of institutional subsidies and individual charges will emerge as the pattern for paying for these new computer-based bibliographical services. references 1. marvin c. gechman, "machine-readable bibliographic data bases," in annual review of information science and technology, v. 7 (washington, d.c.: asis, 1972), p. 323-78. 2. stella keenan, ed., key papers on the use of computer-based bibliographic services (washington, d.c.: asis, 1973). 3. b. marron, and others, a study of six university-based information systems (washington, d.c.: national bureau of standards, 1973 [nbs technical note 781]). 4. james l. carmon, "a campus-based information center," special libraries 64:65-69 (feb. 1973). 5. c. c. parker, "the use of external current awareness services at southampton university," aslib proceedings 25:4-17 (jan. 1973). 6. m. cerville, l. d. higgins, and francis j. smith, "interactive reference retrieval in large files," information storage and retrieval 7:205-10 (dec. 1971). digital collection management through the library catalog. michaela brenner, tom larsen, and claudia weston. michaela brenner (brennerm@pdx.edu) and tom larsen (larsent@pdx.edu) are database maintenance and catalog librarians, and claudia weston (westonc@pdx.edu) is assistant university librarian for technical services, portland state university. digitization has bestowed upon librarians and archivists of the late 20th and early 21st centuries the opportunity to reexamine how they access their collections. it draws these two traditional groups together with it specialists in order to collaborate on this new great challenge. in this paper, the authors offer a strategy for adapting a library system to traditional archival practice. "the librarian and the archivist . . . both collect, preserve, and make accessible materials for research; but significant differences exist in the way these materials are arranged, described, and used."1 among the items usually collected by libraries are: published books and serials, and in more recent times, commercially available sound recordings, films, videos, and electronic resources of various types. archives, on the other hand, tend to collect original records of an organization, unique personal papers, as well as other effects of individuals and families. each type of institution, given its particular emphasis, has its own traditions and its own methods of dealing with its collections. most mid- to large-sized automated libraries in the united states and abroad use machine-readable cataloging (marc) records to form the basis of their online catalogs. bibliographic records, including those in the marc format, generally represent an individually published item, or "information product,"2 and describe the physical characteristics of the item itself.
the basic unit of archival description, however, is a much more complex entity than the basic unit of bibliographic description and often involves multiple hierarchical levels that may or may not extend down to the level of individual items. at portland state university (psu) the authors examined whether the capabilities of their present integrated library system could be expanded to capture the hierarchical structure of traditional archival finding aids. ■ background as early as 1841, the cataloging rules established by panizzi were geared toward locating individual published items. panizzi based his rules on the idea that any person looking for any particular book should be able to find it through the catalog.3 this tradition has continued over time up through current standards such as the anglo-american cataloguing rules and reaffirmed in marc, the standard for the representation and exchange of bibliographic information that has been widely used by libraries for over thirty years.4 archival description, on the other hand, is generally based on the fonds, that is, the entire collection of materials in any medium that were created, accumulated, and used by a particular person, family, or organization in the course of that creator's activities and functions.5 thus, the basic unit of archival description, usually a finding aid, is a much more complex entity than the basic unit of bibliographic description, often involving multiple hierarchical levels of description that may or may not extend down to the level of individual items. before archival description begins, the archivist identifies related groups of materials and determines their proper arrangement. once the arrangement is determined, then the description of the materials reflects both their provenance and their original order.6 the first explicit statement of the levels of arrangement in an archival collection was by holmes and has since been elevated to the level of dogma in the archival community.7 a more recent statement in describing archives: a content standard (dacs) indicates that the actual levels of arrangement may differ for each collection. by custom, archivists have assigned names to some, but not all, levels of arrangement. the most commonly identified are collection, record group, series, file (or filing unit), and item. a large or complex body of material may have many more levels. the archivist must determine for practical reasons which groupings will be treated as a unit for purposes of description.8 rephrasing holmes, the five levels of arrangement can be defined as: 1. the collection level, which holmes called the depository level—the breakdown of the depository's complete holdings into a few major divisions based on the broadest common denominator 2. the record group level—the fonds or complete collection of the papers of a particular administrative division or branch of an organization or of a particular individual or family 3. the series level—the breakdown of the record group into natural series and the arrangement of each series with respect to the others 4. the filing unit level—the breakdown of each series into unit components, which are usually fairly obvious if the documents are kept in file folders 5. the document level—the level of individual items
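as a simple illustration of how these nested levels of arrangement might be represented in software, the following sketch models them as a small recursive data structure. the level names follow the list above; the sample collection, record group, series, file, and item are invented for the example and do not describe any actual holdings.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArchivalUnit:
    # one node in the hierarchy: collection, record group, series, file, or item
    level: str
    title: str
    children: List["ArchivalUnit"] = field(default_factory=list)

# an invented example showing collection > record group > series > file > item
holdings = ArchivalUnit("collection", "city planning records", [
    ArchivalUnit("record group", "office of the planning director", [
        ArchivalUnit("series", "correspondence, 1960-1975", [
            ArchivalUnit("file", "waterfront redevelopment folder", [
                ArchivalUnit("item", "memo of 3 january 1968"),
            ]),
        ]),
    ]),
])

def walk(unit, depth=0):
    # print the arrangement with indentation reflecting the level of description
    print("  " * depth + unit.level + ": " + unit.title)
    for child in unit.children:
        walk(child, depth + 1)

walk(holdings)
```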
the end result of archival description is usually a finding aid that ideally presents an accurate representation of the items in an archival collection so that users can, as independently as possible, locate them.9 building on the print finding aid, the archival community has explored a number of mechanisms for disseminating information on the availability of items in their collections. in 1983, the usmarc format for archival and manuscript control (marc-amc) was released and subsequently sanctioned for use as one possible standard data structure and communication protocol in the saa descriptive standard archives, personal papers, and manuscripts (appm) and its successor, dacs.10 its adoption, however, has been somewhat controversial among archivists.11 the difficulty in capturing the hierarchical nature of collections through the marc format is one factor that has limited the use of marc by the archival community. while it is possible to encode this hierarchical description in marc using notes and linking fields, few archivists in practice have actually made use of these linking fields.12 thus, in archival cataloging, marc records have been used primarily for collection-level description, allowing users to search and discover only general information about archival collections in online catalogs while the finding aid has remained the primary tool for detailed data at all levels of description. in 1995, the encoded archival description (ead) emerged as a new standard for encoding descriptions of archival collections. the ead standard, like the marc standard, allows for the electronic storage and exchange of archival information; but unlike marc, it is based on the finding aid. ead is well suited for encoding the hierarchical relationships between the different parts of the collection and displaying them to the user, and it has become more widely adopted by the archival community. as outlined, the standards and systems chosen by an institution are dictated by the needs and traditions of that institution. the archival community relies heavily on finding aids and, with increasing frequency, on ead, their electronic extension; whereas the library community heavily relies on the online public access catalog (opac) and marc records. new trends capitalizing on the strengths of both traditions are evolving as libraries and archives seek ways to improve access to their archival and digital collections. ■ access to digital archival collections in libraries when searching the web for collections of information, one frequently encounters separate interfaces for traditional library, archival, and digital collections even though these collections may be owned, sponsored, hosted, or licensed by a single institution. descriptive records for traditional library materials reside in the opac and are constructed according to standard library practice, while finding aids for the archival and digital collections increasingly appear on specially designed web sites.
this, of course, means that users searching the opac may miss relevant materials that are described only in the archival and digital documents database or web site. similarly, users searching the archival and digital documents database or web site may miss relevant materials that are described only in the opac. in other instances, libraries, such as the library of congress, selectively add records to their opacs for individual items in their archival and digital document collections. this incorporation allows users more complete access to items within the library's collections. authority control and the assignment of descriptors further enhance access to the item-level records. to minimize processing costs, however, libraries frequently create brief descriptive records for items, thereby limiting their value to patrons.13 by creating descriptive records for the items only, libraries also obscure the hierarchical relationships among the items and the collections in which they reside. these relationships can provide the user with a useful context for the individual items and are an essential part of archival description. still other libraries, such as the university of washington, include collection-level marc records in the opac for their archival and digital document collections. these are searchable in the opac in the same way as bibliographic records for other materials. these collection-level records can then in turn be linked to finding aids that describe the collections more fully.14 collection-level records often are used in libraries where library resources may be insufficient for cataloging large collections of materials at the item level.15 the guidelines for collection-level records in appm and dacs, however, allow for additional fields that are not ordinarily used in library bibliographic records. these include such things as descriptions of the organization and arrangement of the collection, citations for published descriptions of the collection and links to the finding aid, and acknowledgment of the donors, as well as ample subject access to the collection. despite their potential for detail, collection-level records cannot provide the same degree of access to individual items as full item-level records. ■ an approach taken at portland state university library in many ways, archival and digital-document collections are continuing resources. a continuing resource is defined as ". . . a bibliographic resource that is issued over time with no predetermined conclusion. continuing resources include serials and ongoing integrating resources."16 like published continuing resources, archival and digital collections generally are created over time with no predetermined conclusion. in fact, some archival collections continue to grow even after part of the collection has been accessioned by a library or archive. thus, even though many of the individual items in the collection might be properly treated as monographic (not unlike serial analytics), it would not be unreasonable to treat the entire collection as a continuing resource. with this in mind, the authors examined whether their electronic-resource management system could be adapted to accommodate evolving collections of digitized and born-digital material. more specifically, the present system was examined to determine whether its capabilities could be expanded to capture the hierarchical structure found in traditional archival finding aids.
the electronic resource management system in use by psu library is innovative interfaces’ electronic resource management (erm) product. according to innovative interfaces inc.’s (iii) marketing literature, “[erm] effectively controls subscription and licensing information for licensed resources such as e-journals, abstracting and indexing (a&i) databases, and full-text databases.”17 to control and provide improved access to these resources, erm stores details about purchase orders, aggregators and publishers, subscription terms, licensing conditions, breadth of holdings, internal and external contact information, and other aspects of these resources that individual libraries consider relevant. for increased security and data integrity, multilevel permissions restrict viewing and editing of data to the appropriate level of staff or patron. the ability of erm to replicate the two-level hierarchical relationships between aggregators or publishers and the electronic and print resources they provide was of particular interest to the authors. through erm and iii’s batch record load capabilities, bibliographic and resource records can be loaded into the iii system using delimited source files such as those provided by serials solutions. resource records are the mechanisms used by iii to describe digital resources at a collection, subcollection, or title level, thereby enabling the capture of descriptive information not permitted by standard bibliographic records. iii uses holdings records to document serial holdings statements. according to the marc 21 formats for holdings data, a holdings statement is the “record of the location(s) and bibliographic units of a specific bibliographic item held at one or more locations.”18 iii holdings records may also contain a url for connecting to an electronic resource. in figure 1, for example, the resource record shows that psu library provides limited access to a number of journal titles through its springer journals online resource. as seen in figure 2, the display of a holdings record embedded in a bibliographic record provides more specific information on the availability of a title through the library’s collection. in this particular example, the information display reveals that print volumes are available for this title but that psu only has this title available as a part of the springer-verlag electronic collection accessible by clicking on the hotlink. more information on the springer collection can be discovered by clicking on the about resource button to retrieve the springer journals online resource record. this example, then, represents a two-level hierarchy where the resource springer journals online is analogous to an archival collection and abdominal imaging is analogous to an archival series. adaptation of erm for library-created digital collections was explored through work being done to fulfill the requirements of a grant received in 2005 by psu library. the goal of this grant was “to develop a digital library under the sponsorship of the portland state university library to serve as a central repository for the collection, accession, and dissemination of key planning documents and reports, maps, and other ephemeral materials that have high value for oregon citizens and for scholars around the world.”19 the overall collection is called the oregon sustainable community digital library (oscdl). 
in addition to having its own web site, it was decided to make this collection accessible through the psu library catalog so that patrons could find digitized original documents about the city of portland together with other library materials. bibliographic records would be added to the database with hyperlinks to the digitized original documents using existing staff and tools. these bibliographic marc records would be as complete as possible. initially, attention was focused on documents originating from four different sources: ernest bonner, a former portland city planner; the city of portland archives; metro (the regional government for the portland, oregon, metropolitan area); and trimet (the portland metropolitan public transportation system). along with the documents, metadata was received from various databases. these descriptions ranged from almost nothing to detailed archival descriptions. unlike the challenge of shifting titles and holdings with typical serials collections, the challenge of this project was to reflect the four hierarchical levels of psu library's collection (figure 3). innovative's system structure was manipulated in order to accomplish this. at the core of iii's erm module are resource records (rr) created to reflect the peculiarities of a particular collection. linked to these resource records are holdings records (hr) containing hyperlinks to the actual digitized documents (doc h1 – doc h3) as well as to their respective bibliographic records (bib doc h1 – bib doc h3) containing additional information on the individual items within the collection (figure 4). first, resource records were manually created for three of the subcollections within the bonner collection. these subcollections contained documents reflecting the development of harbor drive, front street, and the park blocks. the fields defined for the resource records include the resource title; type (digitized documents) and format (pdf) of the resource; a hyperlink to the new oscdl web site; content and systems contact names; a brief description of the resource; and, most importantly, the resource id used to connect holding records for individual documents to the corresponding resource record. next, the batch-loading function in erm was used to create bibliographic and holding records and associate them with the resource records. taking advantage of tracking data produced during the digitization process (figure 5), spreadsheets were created for each collection reflecting the data assigned to each individual digitized document. the document title, the date the document was created, number of pages, and summaries were included. coordinates for the streets mentioned in the documents were also included. because erm uses issn numbers and titles as match points for record loads, "issn" numbers were also manufactured for each document and included in the spreadsheet. these homemade numbers were distinguished by using pdx as a prefix followed by collection and document numbers or letters, for example, pdx0022090 or pdxhdcoll. fortunately, erm accepted these dummy issns (figure 6). from this data spreadsheet, the system-required comma delimited coverage load file (*.csv) was also created. for this file, the system only allows a limited number of fields, and is very particular about the right terms, including correct capitalization, for the header row. individual document titles, the made-up issn numbers, individual urls to the documents, and a collection-specific resource id (provider) that connects all the documents from a collection to their respective resource record were included. the resource id is the same for all documents in one collection (figure 7).
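to make the coverage-load-file step concrete, the following is a minimal sketch of how such a file might be generated from a collection tracking spreadsheet. the spreadsheet column names, the output header terms, the numeric pattern of the dummy issns, and the file names are illustrative assumptions based on the description above; the exact header row required by iii's erm load profile is not reproduced in this paper, and this is not the script the authors used.

```python
import csv

RESOURCE_ID = "pdxhdcoll"  # hypothetical collection-level resource id ("provider")

def dummy_issn(collection_number, document_number):
    # manufacture an issn-like match point with the pdx prefix,
    # e.g. collection 2, document 2090 -> "pdx0022090"
    return "pdx" + str(collection_number).zfill(3) + str(document_number).zfill(4)

with open("harbor_drive_tracking.csv", newline="", encoding="utf-8") as src, \
     open("harbor_drive_coverage.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)      # tracking data from the digitization process
    writer = csv.writer(dst)
    # header row: the actual terms and capitalization are dictated by the load profile
    writer.writerow(["title", "issn", "url", "provider"])
    for number, row in enumerate(reader, start=1):
        writer.writerow([
            row["document_title"],
            dummy_issn(2, number),    # 2 is an invented collection number
            row["document_url"],      # link to the digitized document
            RESOURCE_ID,              # identical for every document in the collection
        ])
```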
in the first attempt, the system was set up to produce holdings and bibliographic records automatically, using the data from the spreadsheets. for the bibliographic records, a system-provided template was created that included some general subject headings, genre headings, an author field, and selected fixed fields, such as language, bibliographic level, and material type (figure 8). records for the harbor drive collection were loaded, and the system created brief bibliographic and holdings records and linked them to the harbor drive resource record. the records were globally updated to add the general material designator (gmd) "electronic resource" to the title as well as the phrase "digitized document" as a local "call number" to make these documents more visible in the browse screen of the online catalog (opac) (figure 9). the digitized documents now could be found in the library catalog by author, subject, or keyword. the brief bibliographic records (figure 10) allow the user to go either to the digitized document via url or to the resource record with more information on the resource itself and links to other items in the same collection. the resource record then provides links either to the new oscdl web site (via the oregon sustainable community digital library link at the bottom of the resource record), to the bibliographic description of the individual document, or to the digitized document (figure 11). however, the quality of the brief bibliographic records that had been batch generated through the system-provided template was not satisfactory (figure 8). it was decided that more document-specific data like summaries, number of pages, the dates the documents were created, geographical information, and document-level local subject headings should be included. these data were already available from the original spreadsheets. with limited time and staff resources, full bibliographic marc records were batch created using the spreadsheets, detailed templates adjusted slightly to each collection, microsoft mail merge, and finally, the marcedit program created by terry reese of oregon state university (http://oregonstate.edu/~reeset/marcedit/html/index.html). this gave maximum control over the data to be included and the way they would be included. it also eliminated the need to clean up the data following the record load (figure 12).
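the authors did this with collection-specific templates, microsoft mail merge, and marcedit; purely as an alternative illustration of the same batch step, here is a minimal sketch using the pymarc library (pymarc 5.x is assumed). the choice of marc tags, the spreadsheet column names, and the file names are assumptions made for the example, not the authors' actual template.

```python
import csv
from pymarc import Record, Field, Subfield

def make_record(row):
    # build one full bibliographic record from a spreadsheet row;
    # the tag selection below is illustrative only
    rec = Record()
    rec.add_field(
        Field(tag="022", indicators=[" ", " "],
              subfields=[Subfield("a", row["dummy_issn"])]),
        Field(tag="245", indicators=["0", "0"],
              subfields=[Subfield("a", row["document_title"]),
                         Subfield("h", "[electronic resource]")]),
        Field(tag="260", indicators=[" ", " "],
              subfields=[Subfield("c", row["date_created"])]),
        Field(tag="300", indicators=[" ", " "],
              subfields=[Subfield("a", row["pages"] + " p.")]),
        Field(tag="520", indicators=[" ", " "],
              subfields=[Subfield("a", row["summary"])]),
        Field(tag="856", indicators=["4", "0"],
              subfields=[Subfield("u", row["document_url"])]),
    )
    return rec

with open("harbor_drive_tracking.csv", newline="", encoding="utf-8") as src, \
     open("harbor_drive_full.mrc", "wb") as out:
    for row in csv.DictReader(src):
        out.write(make_record(row).as_marc())
```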
subsequently, full bibliographic records were created for the subcollections harbor drive, front street, and park blocks, to connect them to the next higher level, the bonner collection (figure 3). these records were also contributed to worldcat. mimicking the process used at the document level, a resource record was created for the bonner collection and the holdings records for the three subcollections were connected with their corresponding bibliographic records (figure 13). resource records with their corresponding item-level records for trimet, the city archives, and metro followed. the final step was then to add the resource record and the bibliographic record for the whole oscdl collection (figure 14). since this last bibliographic record is not connected to a collection above it, there is only a hyperlink to the oscdl resource record (figure 15). more subcollections and their corresponding digital documents are continually being added to oscdl. structures in psu library's opac are adjusted as these collections change. ■ conclusion according to salter, "digitizing, the current challenge that straddles the 20th and 21st centuries, has given archivists and librarians pause to reconsider access to their collections. the world of digitization is the catalyst for it people, librarians, and archivists to unify the way they do things."20 in this paper, a strategy has been offered for adapting a library system to traditional archival practice. by making use of some of the capabilities of the module in psu library's integrated library system that was originally designed for managing electronic resources, a method was developed for managing digital archival collections in a way that incorporates some of the features of a traditional finding aid. the contents of the various hierarchical levels of the collection are fully represented through the manipulation of the record structures available through psu's system. this technique provides for enhanced access to the individual items of a collection by giving the context of the item within the collection. links between the hierarchical levels facilitate navigation between the levels. although the records created for traditional library systems are not as rich as those found in traditional finding aids, or in ead, their electronic equivalent; and the visual arrangements are not as intriguing as a well-planned web site, the ability to show how items fit within the greater context of their respective collection(s) is a step toward reconciling traditional library and archival practices. enabling the library user to virtually browse through the overall resources offered by the library and then, if desired, through the various levels of a collection for relevant resources enhances the opportunities presented to the user for finding relevant information. references and notes 1. society of american archivists, "so you want to be an archivist: an overview of the archival profession," 2004, www.archivists.org/prof-education/arprof.asp (accessed apr. 24, 2006). 2. kent m. haworth, "archival description: content and context in search of structure," journal of internet cataloging 4, no. 3/4 (2001): 7–26. 3. antonio panizzi, "rules for the compilation of the catalogue," the catalogue of the british museum 1 (1841): v–ix. 4. joint steering committee for revision of aacr, anglo-american cataloguing rules, 2nd ed., 2002 revision (chicago: ala, 2002). 5. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004). 6. haworth, "archival description." 7. oliver w. holmes, "archival arrangement: five different operations at five different levels," american archivist 27, no. 1 (1964): 21–41; terry abraham, "oliver w. holmes revisited: levels of arrangement and description of practice," american archivist 54, no. 3 (1991): 370–77. 8. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004), xiii. 9. haworth, "archival description." 10. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004); steven l.
hensen, comp., archives, personal papers, and manuscripts, 2nd ed. (chicago: society of american archivists, 1989). 11. peter carini and kelcy shepherd, "the marc standard and encoded archival description," library hi tech 22, no. 1 (2004): 18–27; steven l. hensen, "archival cataloging and the internet: the implications and impact of ead," journal of internet cataloging 4, no. 3/4 (2001): 75–95. 12. abraham, "oliver w. holmes revisited." 13. elizabeth j. weisbrod and paula duffy, "keeping your online catalog from degenerating into a finding aid: considerations for loading microformat records into the online catalog," technical services quarterly 11, no. 1 (1993): 29–42. 14. carini and shepherd, "the marc standard and encoded archival description." 15. see, for example, margaret f. nichols, "finding the forest among the trees: the potential of collection-level cataloging," cataloging & classification quarterly 23, no. 1 (1996): 53–71; and weisbrod and duffy, "keeping your online catalog from degenerating into a finding aid." 16. joint steering committee for revision of aacr, anglo-american cataloguing rules, d-2. 17. innovative interfaces inc., "electronic resources management," 2005, www.iii.com/pdf/lit/eng_erm.pdf (accessed apr. 24, 2006). 18. library of congress, marc 21 format for holdings data: including guidelines for content designation (washington, d.c.: cataloging distribution service, library of congress, 2000), appendix e–glossary. 19. carl abbott, "planning a sustainable portland: a digital library for local, regional, and state planning and policy documents—framing paper," 2005, http://oscdl.research.pdx.edu/framing.php (accessed apr. 24, 2006). 20. anne a. salter, "21st-century archivist," newsletter, 2003, www.lisjobs.com/newsletter/archives/sept03asalter.htm (accessed apr. 24, 2006). appendix. figures
figure 1. example of resource record from the psu library catalog (search conducted nov. 4, 2005)
figure 2. example of a bibliographic record for a journal title from the psu library catalog (search conducted nov. 4, 2005)
figure 3. partial diagram of the hierarchical levels of the collection
figure 4. resource record harbor drive with linked holdings records, bibliographic records, and original documents
figure 5. spreadsheet for tracking data
figure 6. data spreadsheet
figure 7. comma delimited coverage load file (*.csv)
figure 8. bibliographic records template
figure 9. browse screen in opac
figure 10. system-created brief bibliographic record in opac
figure 11. resource record with various links
figure 12. full bibliographic record in opac
figure 13. bonner resource record with linked holdings records, bibliographic records, and original documents
figure 14. outline of linked records in the collection
figure 15. bibliographic record for the oscdl collection
book reviews
proceedings of the 1968 clinic on library applications of data processing, edited by dewey e. carroll. urbana: university of illinois, 1969.
235 pp. $3.00. for all except inveterate institute participants, it must be difficult to decide to spend yet another week listening to a widely mixed series of papers and discussions on data processing in libraries, in the hope of finding something new or useful. to attract a wide audience, the offerings tend to range from simple introductions to technical discussions of specific programs or projects. the value of gathering the papers of such institutes into volumes of proceedings is questionable. material from the introductory papers would certainly find greater use in a comprehensive monograph, while the papers which report new developments or technical problems would have a better chance of reaching their proper audiences if published in journals. the repetitive "how-we-did-it" reports might best be left unpublished. the proceedings of the 1968 illinois clinic does have a number of articles which deserve wide readership. frederick g. kilgour's paper on initial system design for the ohio college library center is excellent, not so much for solutions, but because he raises the questions on the purpose of college libraries and the nature of regional systems which need to be raised before embarking on design. those who have had experience with automated operations will appreciate lawrence auld's listing of ten categories of library automation failure. (he omits one of the most common: lack of computer stability.) a technical article of considerable interest is alan r. benenfeld's paper on generation and encoding of the data base for intrex. those looking for reports of successful computer applications may find useful information in the papers by robert hamilton, of the illinois state library, on circulation; by james w. thomson and robert h. muller, of the university of michigan, on the u. of m. order system; by michael m. reynolds, of indiana university, on centralized technical processing for the university's regional campus libraries; by john p. kennedy, of georgia tech, on production of catalog cards; and by robert k. kozlow, of the university of illinois, on a computer-produced serials list. melvin j. voigt. planning library services. proceedings of a research seminar held at the university of lancaster, 9-11 july, 1969. edited by a. graham mackenzie and ian m. stuart. lancaster, england: university of lancaster library, 1969. 30 shillings. this volume offers fifteen papers presented in six sessions; each session had one or more papers and some discussion. the papers range from very general mathematical models to local problems of british legal codes and re-organization of local governments. the first session introduces the problems and some theoretical notions of how to deal with them. the next three sessions deal with analysis techniques. morely introduces some simple techniques of maximizing benefits for given resources. brookes presents a good quick introduction to statistics and distributions which occur frequently in information science. leimkuhler develops cost models for storage policies and woodburn analyzes the costs in hierarchical library systems. the mathematics in these latter papers, although not difficult, will probably put off a good many librarians and administrators. both are practitioners and impressed by results, not complex models; the equations developed by leimkuhler or woodburn are probably too complex to be successfully used by most librarians.
this might reflect the state of the librarian and not of the art, however. to quote cloote (from the paper by duchesne): "with only a very few notable exceptions, successful models have been so simple that an operational research specialist would disown them." the fifth session covers data collection and evaluation. duchesne comments on management information systems and operations research for librarians. conventional techniques of data collection are reviewed by ford, including sample forms and a note of warning about too many surveys. in the final session leimkuhler presents an overview which includes several choice comments on progress (or lack of it) in libraries. during the discussion period, mackenzie suggests that libraries should use up to five percent of their budgets for research. this reviewer feels that unless this suggestion is taken more seriously, most of the theory will never find an application. these proceedings would make an excellent companion to burkhalter's case studies in library systems analysis as more theoretically oriented readings for a course in operations research or administration in librarianship. some of the techniques presented could be adapted for immediate application in analyzing present systems. thus this collection of papers can be useful to both student and practitioner interested in research and development of library systems. arvo tars. libraries at large, edited by douglas m. knight and e. shepley nourse. new york: r. r. bowker, 1969. 664 p. $14.95. libraries at large is based on the materials which the national advisory commission on libraries employed in its deliberations. the commission appraised the adequacy of libraries and made recommendations designed "to ensure an effective, efficient library system for the nation." these materials are also useful to those engaged in the enrichment of present library programs and to those developing new library projects. the materials consist of papers and reports written for the commission and include essays, original investigations, and literature reviews, as well as reprints of material that has appeared elsewhere. some papers are of top quality; some are poor. nevertheless, the appearance of these materials in one volume adds a convenient source of information that will be useful to librarians for years to come. approximately half the book is devoted to problems related to the use of libraries and to the users of libraries. the second half contains discussions of government relationships of libraries and a series of useful appendixes. perhaps the most novel section of the book is william j. baumol's "the cost of library and informational services." this study investigates the economics of libraries in depth and the results are of great interest. this chapter on economics contains new material and brings together that which existed heretofore, so that it constitutes the major resource on library economics. this chapter alone is so valuable as to justify the recommendation that all libraries and most librarians should acquire libraries at large. the section on copyright is equally important, for it brings together data on a topic possessing cataclysmic potentials for librarianship. verner clapp's "copyright: a librarian's view" is the best statement that has appeared on the subject, and it is hoped that clapp's dissertation will awaken librarians to the peril that confronts them.
on the other hand, the chapter entitled "some problems and potentials of technology as applied to library informational services" is somewhat less than satisfying. the section starts off with mathews and brown's "research libraries and the new technology," which originally appeared in on research libraries. it is still an inadequate exposition. there follows a reprint of "the impact of technology on the library building," which educational facilities laboratories published in 1967. the statement is adequate, but more useful information exists. the last section of the chapter is a study, "technology in libraries," which the system development corporation produced. this paper is a useful review of technologies employed by libraries and recommends five important network and systems projects to be undertaken. the chapters on government relationships include discussions of those with the federal government and those at local, state and regional levels. germaine krettek and eileen d. cooke have provided a worthwhile appendix listing and abstracting library-related legislation at the national level. libraries at large is indeed a resource book, and those papers containing original investigations and literature reviews are of such high quality as to insure usefulness of this work to all thoughtful librarians. frederick g. kilgour. computers and their potential applications in museums. a conference sponsored by the metropolitan museum of art. new york: arno press, 1968. 402 pp. $12.50. computers and their potential applications in museums contains the published proceedings of a conference which was held in new york, 1968. sponsored by the metropolitan museum of art and supported by ibm, the conference was another attempt to involve art and related fields in computer technology. this book covers a broad range of issues and problems from information retrieval to creativity. experts from museums, educators, librarians and computer specialists discussed the possible uses and the implications of computers for the museum field. the diversity of the participants seems to represent the components of an exceedingly complex problem which is as monumental as the museum field itself. as an overall document it gives evidence of concern and insight into the many technical problems which some researchers have encountered. in many instances the non-technical experts were too global in their thinking, while the technologists were too local in their area of concern to communicate to anyone but technologists. this disparity between approaches, with the obvious difficulties presented, is a typical one whenever non-technical groups attempt to make use of computer technology. ambitious in scope, the conference had excellent participants, and several of the papers were stimulating and provocative. the interaction among the people who attended the conference may have been useful and it may have generated important ideas. for a reader of the published proceedings, one wishes there had been a final chapter which could have provided some guidelines for research and education in this field. there was an opportunity for the organizers of the conference or a small group of the participants to summarize the problems and to give some direction to solutions. several years and many conferences later we in the humanities have made little progress in use of the computer. it seems that we are still better at rhetoric than at problem solving. charles csuri
books for junior college libraries. pirie, james w., comp. chicago: american library assoc., 1969. 452 pp. $35.00. during the recent period of rapid growth and development of junior and community colleges, a bibliographic guideline has been long awaited. james w. pirie's books for junior college libraries, with its healthy potential for developing many basic collections and extending and updating others, fills that void. though it does not boast to be the single ideal bibliographic tool, it is a welcome addition to (and perhaps replacement for some of) its predecessors: frank bertalan's books for junior college libraries; charles l. trinkner's basic books for junior college libraries; hester hohman's readers adviser; helen wheeler's a basic book collection for the community college library; brodart foundation's the junior college library collection, edited by dr. bertalan; and the ever-present subject guide to books in print and books in print, from bowker. books for junior college libraries represents the cooperative efforts of some 300 expert consultants-subject specialists, faculty members and librarians-charged with the responsibility of producing a publication to serve as a book selection guide for new or established junior and community college libraries. approximately 20,000 titles are arranged by subject, broadly interpreted, with entries consisting of author, title, subtitle, edition (if other than the first), publisher, place of publication, date, pagination, price, and l.c. number. easy access is provided by the inclusion of an author and subject index. a comparative "table of subject coverage" appearing in the preface, tabulating the percentage of subject distribution to total volume for the lamont, michigan, and the more recent books for college libraries lists, indicates that books for junior college libraries maintains a comparable subject percentage distribution to total volume. only book titles have been included; foreign entries have been limited to a few major works, and out-of-print titles, in favor of titles readily available. paperbacks were listed, in the absence of card copy. though limited in its coverage of terminal and vocational courses, with emphasis toward the transfer or liberal arts program, books for junior college libraries does embrace all fields of knowledge that tend to be challenging and useful for the general education programs. it has been endorsed by the joint committee on junior colleges of aajc, ala, and the junior college section of acrl, and moves toward the recommendations of the ala standards for junior college libraries. this bibliographic guideline for junior college libraries should be welcomed by public schools as well as junior and community colleges for its assistance in developing new collections, as well as expanding and updating old collections, with quantity, quality, and economy working together. james i. richey. agricultural sciences information network development plan. educom research report, august 1969. 74 pp. the national agricultural library wants to implement its old plan of an agricultural science information network "based on the assumption that the land-grant libraries in the states are the natural nodes to this network." educom undertook a study which was submitted to and discussed by a symposium held in washington, d.
c., on february 10-12, 1970, with the participation of all agricultural libraries interested in "new and improved ways of exchanging information in support of agricultural research and education." the goal is "to develop a long-range plan for strengthening information, communication, and exchange among the libraries of land-grant institutions and the nal." according to the report, the network concept would constitute a "network of networks" and three basic components are envisioned: 1) land-grant libraries, 2) information analysis centers, and 3) telecommunications. all these components have their own aims and objectives described in this report. "nal's first course of action in the establishment of a system of information analysis centers is to develop a directory of existing analysis centers of interest to the agricultural community. the directory should be supported with a catalog detailing the services and products offered by these centers. nal should then establish cooperative agreements with these centers which would make them responsive to the needs of the users of the agricultural sciences information network. this should be supported with the installation of communications equipment to encourage and facilitate the use of a center." no doubt, the participants of the symposium will have thoroughly investigated and discussed this plan with serious consideration to its practical implementation. a new approach and improvement of information exchange is not only a necessity, but also long overdue, for those in agriculture. this information development plan would provide service for research workers at the experiment stations, scientists and teachers at the colleges, agricultural extension people at the land-grant institutions, and, last but not least, for the farmers who provide us with food and fibers in order to bring a fuller and better life on the farm and in rural and city homes. a detailed analysis of the performance, an evaluation and revision of this gigantic scientific information system, can only be made after it has been in operation for a few years. it is very promising that the national agricultural library-among its many objectives-has again taken the initiative. john de gara. cornell university libraries. manual of cataloging procedures. 2d ed. ithaca, n.y.: cornell university libraries, 1969. $18.00. editor robert b. slocum and his associates have produced a valuable manual useful to catalogers and persons involved in the administration of policies and procedures in technical services. as stated in the preface the manual is a supplement, not a substitute, for the anglo-american cataloging rules and its predecessors, lc list of subject headings and the lc classification schedules. the following directive is basic: "the revisers are always open for consultation on particularly difficult problems, but it must be assumed that a professional cataloger will have a thorough knowledge of the basic tools of his profession. . . . if this knowledge is in any way lacking, the cataloger has the obvious responsibility of acquiring it through diligent study and experience. he should not come to the reviser with questions whose answers are available in the aforementioned tools and in this manual." the format is loose-leaf, so that additions and revisions may be made easily to reflect new developments and techniques.
the sections include pre-cataloging procedures; general cataloging and classification procedures; recataloging and reclassification; cornell university college and department libraries-special collections and special catalogs; . . . serials and binding department; files and filing; typing, card production, book preparation; statistics; appendix (including abbreviations, romanization tables, etc. ) ; and index. the procedures and practices described are those adopted by a research library "conscious of the need for both quality and quantity in the work of its staff." this publication, weighing five pounds, is a great achievement and with its full index an indispensable contribution to the collection of worthwhile cataloging manuals. descriptions of local procedures may seem detailed but basic principles and policies are well covered. the final touch is the inclusion of a catalog card for the manual! margaret oldfather barnettellis 22 information technology and libraries | march 2005 the metascholar initiative of emory university libraries, in collaboration with the center for the study of southern culture, the atlanta history center, and the georgia music hall of fame, received an institute of museum and library services grant to develop a new model for library-museum-archives collaboration. this collaboration will broaden access to resources for learning communities through the use of the open archives initiative protocol for metadata harvesting (oaipmh). the project, titled music of social change (mosc), will use oai-pmh as a tool to bridge the widely varying metadata standards and practices across museums, archives, and libraries. this paper will focus specifically on the unique advantages of the use of oaipmh to concurrently maximize the exposure of metadata emergent from varying metadata cultures. t he metascholar initiative of emory university libraries, in collaboration with the center for the study of southern culture, the atlanta history center, and the georgia music hall of fame, received an institute of museum and library services grant to develop a new model for library-museum-archives collaboration to broaden access to resources for learning communities through the use of the open archives initiative protocol for metadata harvesting (oai-pmh).1 the collaborators of the project, entitled music of social change (mosc), are creating a subject-based virtual collection concerning music and musicians associated with social-change movements such as the civil-rights struggle. this paper will specifically focus on the advantages offered by oai-pmh in amalgamating and serving metadata from these institutional sources that are significantly different in kind.2 there has been a great deal of discussion within the library community as to the possibilities oai-pmh holds for harvesting, aggregating, and then disseminating research metadata. however, in reality, only a few of institutions (be they museum, archives, or libraries) have actually begun to utilize oai-pmh to this end. there are some practical, historical barriers to implementing any shared system for distributing metadata across institutions that are, more than in degree, different in kind. one of these significant differences is of metadata cultures and practices. libraries have traditionally incrementally assigned metadata at an item level within their collection(s). the strength of this model is that at least a minimal amount of metadata is assigned to a very high percentage of items within the collection. 
the challenge of such a system is that for such metadata records to interoperate within a shared database and through a common interface (for example, the traditional union catalog), the metadata fields have been quite rigidly defined compared to those within archival and museum environments. due to tradition as well as the sheer volume of items collected by libraries, metadata at an item level are not greatly detailed or contextualized. often, items within library collections lack robust relational mapping to other items within or outside of the collection, as is done, for example, in archival processing. content contextualization is highly valued by archival metadata practices and culture as the central tenet of metadata creation. items at a subcollection level almost always have metadata derivative from and deferential to that of the collection-level metadata. the great benefit of archival practices in metadata assignment is a contextualization of content that reflects the background, the topographic place in time and space of a given portion of a collection and its organic, emergent relationship to the whole. the weaknesses of this model are a great inconsistency in description details and variables (at the collection and subcollection levels), as well as very disparate levels of granularity within the hierarchy of the structure of a collection at which metadata are assigned. such disparities among institutional types feed an unnecessary level of misunderstanding by libraries of the metadata culture and aims of archives as well as those of museums. museums often have very skeletal documented (as opposed to undocumented) metadata about their collections or objects therein. often museums are not funded to make metadata on their collections freely available. it is common, in fact, for curatorial staff to view metadata as intellectual property to which they serve as gatekeepers, reflecting a professional value placed upon contextualizing materials for users. this is done on a user-by-user or exhibition-by-exhibition basis, depending on user background or the thesis of a given exhibition. additionally, museums perceive information on the aboutness of their collections to be a class of capital with which they can always potentially cost-recover or generate income. within the culture of museums, staff have traditionally been disinclined to make their collections available in an unmediated manner. additionally, there has been resistance to documenting information about collections in a systematic way. there is even greater resistance to adhering to any prescriptions on metadata as would be required for compliance with even the most minimally structured database. such regulation would discriminate against the nuanced information required for each and every object within a collection. the mosc project: using the oai-pmh to bridge metadata cultural differences across museums, archives, and libraries. eulalia roel. eulalia roel (eulalia.roel@gmail.com) is coordinator of information resources at the federal reserve, atlanta. why oai-pmh to bridge these cultures? oai-pmh was selected by the mosc project as a means to bridge some of these substantial disparities. the protocol is often mistakenly assumed to function only with metadata expressed as unqualified dublin core (dc). in fact, the protocol functions with any metadata format expressed by extensible markup language (xml); this is the minimal requirement for content to serve metadata through oai-pmh.
this includes those formats that have been well received by institutions other than libraries, such as xml encoded archival description (ead) as it is used in archives. as per 4.2 of the oai-pmh guidelines for repository implementers, communities are able to develop their own collection description xml schemas for use within description . . . elements. if all that is desired is the ability to include an unstructured textual description, then it is recommended that repositories use the dublin core description element. seven existing schemes are: dublin core, encoded archival description (ead), the eprints schema, rslp collection description schema, uddi/wsdl, marc21, and the branding schema.3 the oai protocol has often been partnered with unqualified dc metadata, as this is the most minimal metadata structure necessary for participation in an oai harvesting system. not only are these dc fields unqualified, no fields are actually required. no structure or regulations are codified outside of requiring metadata contributors to adhere to this unqualified metadata schema. therefore, the oai protocol requires minimal technology support and resources at any given contributing site (such support varying more widely across institutions than even their metadata practices themselves). this maximizes flexibility in metadata contribution, as well as maximizing interoperability between the collective data pool from which a user can search. granted, this unregulated framework does come at a cost of inconsistency in metadata detail and quality. however, the great advantage of such nominal requirements is that they enable contributors with minimal metadata-encoding practices to participate in the metadata collaborative. following is an example of a record as it may appear in the mosc collection:
<record>
  <header>
    <identifier>oai:atlantahistorycenter.com:10</identifier>
    <datestamp>2003-03-31</datestamp>
    <setSpec>south:blues</setSpec>
    <setSpec>south:mississippi-delta-region</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>long hall recordings</dc:title>
      <dc:creator>morris, william</dc:creator>
      <dc:subject>blues</dc:subject>
      <dc:description>comment: sound amateur recording</dc:description>
      <dc:date>2003-05-16</dc:date>
      <dc:type>sound recording</dc:type>
      <dc:identifier>http://atlantahistorycenter.com/porcelain/10</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
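to make the harvesting side of the protocol concrete, the following minimal python sketch (not part of the mosc project's own software; the repository url is hypothetical) shows how a harvester might issue a listrecords request for unqualified dc and read back fields like those in the record above. a production harvester would also re-request with any resumptiontoken returned in order to page through large result sets.

import urllib.request
import xml.etree.ElementTree as ET

# hypothetical oai-pmh endpoint of a contributing repository
BASE_URL = "http://repository.example.org/oai"

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_records(base_url=BASE_URL):
    """Issue one ListRecords request for unqualified DC and yield (identifier, fields)."""
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    with urllib.request.urlopen(url) as response:
        root = ET.parse(response).getroot()
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        fields = {}
        for element in record.iter():
            if element.tag.startswith(DC_NS):          # dc:title, dc:creator, ...
                name = element.tag[len(DC_NS):]
                fields.setdefault(name, []).append((element.text or "").strip())
        yield identifier, fields
    # note: a real harvester would follow any resumptionToken here

if __name__ == "__main__":
    for identifier, fields in list_records():
        print(identifier, fields.get("title"), fields.get("creator"))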
additionally, with no fields required by the dc schema, institutions can have absolute discretion as to what metadata are exposed if this is a concern (as may be for privacy considerations for archives or for intellectual-property concerns for museums). however, one of the great strengths of implementing oai-pmh is that, while the threshold for regulating metadata is low, the protocol can also handle any metadata format expressed by xml, including data formats significantly more structured than dc; for example, ead, text encoding initiative (tei), and tei lite-defined documents. scholars are then able to access these scholarly objects via one point, while still being able to collectively access and utilize all metadata objects available in all collections, from the most to the least robust. the aim of the mosc project participants in selecting oai-pmh is to maximize participation from fairly disparate kinds of organizations, with equally disparate kinds of metadata cultures and practices. in comparison to other, currently available methods of metadata aggregation, oai-pmh is maximally forgiving of discordant metadata suppliers. thereby, the hope is, metadata contributions are maximized. concurrently, the protocol allows for highly robust metadata formats. in some cases, metadata objects are stripped down as the cost of inclusion in aggregated systems; this need is eliminated when oai-pmh is utilized. the use of the protocol allows for the inclusion of objects consisting of the most skeletal unqualified dublin core elements, while still accommodating the most complicated metadata objects. optimally, this is a means to achieve a critical mass of contributed resources that will enable end users to utilize the mosc project as the premier site and a primary resource for information on materials about music and musicians associated with social-change movements. acknowledgment the author would like to express her sincerest gratitude to the institute of museum and library services for funding the music of social change project. references 1. "metascholar: an emory university digital library research initiative," emory university libraries web site. accessed sept. 1, 2004, http://metascholar.org/; "the center for southern culture," university of mississippi web site. accessed sept. 1, 2004, www.olemiss.edu/depts/south/; "atlanta history center," atlanta history center web site. accessed sept. 1, 2004, www.atlantahistorycenter.com/; "georgia music hall of fame," georgia music hall of fame web site. accessed sept. 1, 2004, www.gamusichall.com/home.html; "institute of museum and library services: library-museum collaboration," institute of museum and library services web site. accessed sept. 1, 2004, www.imls.gov/grants/l-m/index.htm. 2. "implementation guidelines for the open archives initiative protocol for metadata harvesting," open archives initiative web site. accessed sept. 1, 2004, www.openarchives.org/oai/openarchivesprotocol.html#introduction. 3. "4.2 collection and set descriptions," open archives initiative web site. accessed sept. 1, 2004, www.openarchives.org/oai/2.0/guidelines-repository.htm#setdescription. wikiwikiwebs: new ways to communicate in a web environment. chawner, brenda; lewis, paul h. information technology and libraries; mar. 2006; 25(1); proquest education journals, pg. 33.
bailey 116 information technology and libraries | september 2006 three critical issues—a dramatic expansion of the scope, duration, and punitive nature of copyright laws; the ability of digital rights management (drm) systems to lock down digital content in an unprecedented fashion; and the erosion of net neutrality, which ensures that all internet traffic is treated equally—are examined in detail and their potential impact on libraries is assessed. how legislatures, the courts, and the commercial marketplace treat these issues will strongly influence the future of digital information for good or ill. editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital. blogs. digital photo and video sharing. podcasts. rip/mix/burn. tagging. vlogs. wikis. these buzzwords point to a fundamental social change fueled by cheap personal computers (pcs) and servers, the internet and its local wired/wireless feeder networks, and powerful, low-cost software. citizens have morphed from passive media consumers to digital-media producers and publishers. libraries and scholars have their own set of buzzwords: digital libraries, digital presses, e-prints, institutional repositories, and open-access (oa) journals, to name a few. they connote the same kind of change: a democratization of publishing and media production using digital technology. it appears that we are on the brink of an exciting new era of internet innovation: a kind of digital utopia. gary flake of microsoft has provided one striking vision of what could be (with a commercial twist) in a presentation entitled “how i learned to stop worrying and love the imminent internet singularity,” and there are many other visions of possible future internet advances.1 when did this metamorphosis begin? it depends on who you ask. let’s say the late 1980s, when the internet began to get serious traction and an early flowering of noncommercial digital publishing occurred. in the subsequent twenty-odd years, publishing and media production went from being highly centralized, capital-intensive analog activities with limited and well-defined distribution channels, to being diffuse, relatively low-cost digital activities with the global internet as their distribution medium. not to say that print and conventional media are dead, of course, but it is clear that their era of dominance is waning. the future is digital.
nor is it to say that entertainment companies (e.g., film, music, radio, and television companies) and information companies (e.g., book, database, and serial publishers) have ceded the digital-content battlefield to the upstarts. quite the contrary. high-quality, thousand-page-per-volume scientific journals and hollywood blockbusters cannot be produced for pennies, even with digital wizardry. information and entertainment companies still have an important role to play, and, even if they didn’t, they hold the copyrights to a significant chunk of our cultural heritage. entertainment and information companies have understood for some time that they must adapt to the digital environment or die, but this change has not always been easy, especially when it involves concocting and embracing new business models. nonetheless, they intend to thrive and prosper—and to do whatever it takes to succeed. as they should, since they have an obligation to their shareholders to do so. the thing about the future is that it is rooted in the past. culture, even digital culture, builds on what has gone before. unconstrained access to past works helps determine the richness of future works. inversely, when past works are inaccessible except to a privileged minority, future works are impoverished. this brings us to a second trend that stands in opposition to the first. put simply, it is the view that intellectual works are property; that this property should be protected with the full force of civil and criminal law; that creators have perpetual, transferable property rights; and that contracts, rather than copyright law, should govern the use of intellectual works. a third trend is also at play: the growing use of digital rights management (drm) technologies. when intellectual works were in paper (or other tangible forms), they could only be controlled at the object-ownership or object-access levels (a library controlling the circulation of a copy of a book is an example of the second case). physical possession of a work, such as a book, meant that the user had full use of it (i.e., the user could read the entire book and photocopy pages from it). when works are in digital form and are protected by some types of drm, this may no longer be true. for example, a user may only be able to view a single chapter from a drm-protected e-book and may not be able to print it. the fourth and final trend deals with how the internet functions at its most fundamental level. the internet was designed to be content-, application-, and hardware-neutral. as long as certain standards were met, the network did not discriminate. one type of content was not given preferential delivery speed over another. one type of content was not charged for delivery while another was free. one type of content was not blocked (at least by the network) while another was unhindered. in recent years, network neutrality has come under attack. the collision of these trends has begun in courts, legislatures, and the marketplace. it is far from over. as we shall see, its outcome will determine what the future of digital culture looks like. strong copyright + drm + weak net neutrality = digital dystopia? charles w. bailey jr. charles w. bailey jr. (cbailey@digital-scholarship.com) is assistant dean for digital library planning and development at university of houston libraries. stronger copyright: 1790 versus 2006 copyright law is a complex topic. it is not my intention to provide a full copyright primer here.
(indeed, i will assume that the reader understands some copyright basics, such as the notion that facts and ideas are not covered by copyright.) rather, my aim is to highlight some key factors about how and why united states copyright law has evolved and how it relates to the digital problem at hand. three authors (lawrence lessig, professor of law at the stanford law school; jessica litman, professor of law at the wayne state university law school; and siva vaidhyanathan, assistant professor in the department of culture and communication at new york university) have done brilliant and extensive work in this area, and the following synopsis is primarily based on their contributions. i heartily recommend that you read the cited works in full. the purpose of copyright let us start with the basis of u.s. copyright law, the constitution’s “progress clause”: “congress has the power to promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.”2 copyright was a bargain: society would grant creators a time-limited ability to control and profit from their works before they fell into the public domain (where works are unprotected) because doing so resulted in “progress of science and useful arts” (a social good). regarding the progress clause, lessig notes: it does not say congress has the power to grant “creative property rights.” it says that congress has the power to promote progress. the grant of power is its purpose, and its purpose is a public one, not the purpose of enriching publishers, nor even primarily the purpose of rewarding authors.3 however, entertainment and information companies can have a far different view, as illustrated by this quote from jack valenti, former president of the motion picture association of america: “creative property owners must be accorded the same rights and protections resident in all other property owners in the nation.”4 types of works covered when the copyright act of 1790 was enacted, it protected published books, maps, and charts written by living u.s. authors as well as unpublished manuscripts by them.5 the act gave the author the exclusive right to “print, reprint, publish, or vend” these works. now, copyright protects a wide range of published and unpublished “original works of authorship” that are “fixed in a tangible medium of expression” without regard for “the nationality or domicile of the author,” including “1. literary works; 2. musical works, including any accompanying words; 3. dramatic works, including any accompanying music; 4. pantomimes and choreographic works; 5. pictorial, graphic, and sculptural works; 6. motion pictures and other audiovisual works; 7. sound recordings; 8. 
architectural works.”6 rights in contrast to the limited print publishing rights inherent in the copyright act of 1790, current law grants copyright owners the following rights (especially notable is the addition of control over derivative works, such as a play based on a novel or a translation): ฀ to reproduce the work in copies or phonograph records; ฀ to prepare derivative works based upon the work; ฀ to distribute copies or phonograph records of the work to the public by sale or other transfer of ownership, or by rental, lease, or lending; ฀ to perform the work publicly, in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works; ฀ to display the copyrighted work publicly, in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work; and ฀ in the case of sound recordings, to perform the work publicly by means of a digital audio transmission.7 duration the copyright act of 1790 granted authors a term of fourteen years, with one renewal if the author was still living (twenty-eight years total).8 now the situation is much more complex, and, rather than trying to review the details, i’ll provide the following example. for a personal author who produced a work on or after january 1, 1978, it is covered for the life of the author plus seventy years.9 so, assuming 118 information technology and libraries | september 2006 an author lives an average seventy-five years, the work would be covered for 144 years, which is approximately 116 years longer than in 1790. registration registration was required by the copyright act of 1790, but very few eligible works were registered from 1790 to 1800, which enriched the public domain.10 now registration is not required, and no work enriches the public domain until its term is over, even if the author (or the author’s descendants) have no interest in the work being under copyright, or it is impossible to locate the copyright holder to gain permission to use his or her works (creating so-called “orphan works”). drafting of legislation by 1901, copyright law had become fairly esoteric and complex, and drafting new copyright legislation had become increasingly difficult. consequently, congress adopted a new strategy: let those whose commercial interests were directly affected by copyright law deliberate and negotiate with each other about copyright law changes, and use the results of this process as the basis of new legislation.11 over time, this increasingly became a dialogue among representatives of entertainment, high-tech, information, and telecommunications companies; other parties, such as library associations; and rights-holder groups (e.g., ascap). since these parties often had competing interests, the negotiations were frequently contentious and lengthy. the resulting laws created a kind of crazy quilt of specific exceptions for the deals made during these sessions to the ever-expanding control over intellectual works that copyright reform generally engendered. since the public was not at the table, its highly diverse interests were not directly represented, and, since stakeholder industries lobby congress and the public does not, the public’s interests were often not well served. (there were some efforts by special interest groups to represent the public on narrowly focused issues.) 
frequency of copyright term legislation with remarkable restraint, congress, in its first hundred years, enacted one copyright bill that extended the copyright term and one in its next fifty; however, starting in 1962, it passed eleven bills in the next forty years.12 famously, jack valenti once proposed that copyright “last forever less one day.”13 by continually extending copyright terms in a serial fashion, congress may grant him his wish. licenses in 1790, copyrighted works were sold and owned. today, many digital works are licensed. licenses usually fall under state contract law rather than federal copyright law.14 licensed works are not owned, and the first-sale doctrine is not in effect.15 while copyright is the legal foundation of licenses (i.e., works can be licensed because licensors own the copyright to those works), licenses are contracts, and contract provisions trump user-favorable copyright provisions, such as fair use, if the licensor chooses to negate them in a license. criminal and civil penalties in 1790 there were civil penalties for copyright infringement (e.g., statutory fines of “50 cents per sheet found in the infringer ’s possession”).16 now there are criminal copyright penalties, including felony violations that can result in a maximum of five years of imprisonment and fines as high as $250,000 for first-time offenders; civil statutory fines that can range as high as $150,000 per infringement (if infringement is “willful”), and other penalties.17 once the copyright implications of digital media and the internet sunk in, entertainment and information companies were deeply concerned: digital technologies made creating perfect copies effortless, and the internet provided a free (or low-cost) way to distribute content globally. congress, primarily spurred on by entertainment companies, passed several laws aimed at curtailing perceived digital “theft” through criminal penalties. under the 1997 no electronic theft (net) act, copyright infringers face “up to 3 years in prison and/or $250,000 fines,” even for noncommercial infringement.18 under the 1998 digital millennium copyright act (dmca), those who defeat technological mechanisms that control access to copyrighted works (a process called “circumvention”) face a maximum of five years in prison and $500,000 in fines.19 effect of copyright on average citizens in 1790, copyright law had little effect on citizens. the average person was not an author or publisher, private use of copyrighted materials was basically unregulated, the public domain was healthy, and many types of works were not covered by copyright at all. in 2006, ฀ virtually every type of work imaginable is under automatic copyright protection for extended periods of time; ฀ private use of digital works is increasingly visible and of concern to copyright holders; ฀ the public domain is endangered; and ฀ ordinary citizens are being prosecuted as “pirates” under draconian statutory and criminal penalties. digital dystopia | bailey 119 regarding this development, lessig says: for the first time in our tradition, the ordinary ways in which individuals create and share culture fall within the reach of the regulation of the law, which has expanded to draw within its control a vast amount of culture and creativity that it never reached before. the technology that preserved the balance of our history—between uses of our culture that were free and uses of our culture that were only upon permission—has been undone. 
the consequence is that we are less and less a free culture, more and more a permission culture.20 how has copyright changed since the days of the founding fathers? as we have seen, there has been a shift in copyright law (and social perceptions of it) from ฀ promoting progress to protecting intellectual property owners’ “rights”; ฀ from covering limited types of works to covering virtually all types of works; ฀ from granting only basic reproduction and distribution rights to granting a much wider range of rights; ฀ from offering a relatively short duration of protection to offering a relatively long (potentially perpetual) one; ฀ from requiring registration to providing automatic copyright; ฀ from drafting laws in congress to drafting laws in work groups of interested parties dominated by commercial representatives; ฀ from making infrequent extensions of copyright duration to making frequent ones; ฀ from selling works to licensing them; ฀ from relatively modest civil penalties to severe civil and criminal penalties; and ฀ from ignoring ordinary citizens’ typical use of copyrighted works to branding them as pirates and prosecuting them with lawsuits. (regarding lawsuits filed by the recording industry association of america against four students, lessig notes: “if you added up the claims, these four lawsuits were asking courts in the united states to award the plaintiffs close to $100 billion—six times the total profit of the film industry in 2001.”)21 complicating this situation further is intense consolidation and increased vertical integration in the entertainment, information, telecommunications, and other high-tech industries involved in the internet.22 this vertical integration has implications for what can be published and the free flow of information. for example, a company that publishes books and magazines, produces films and television programs, provides internet access and digital content, and provides cable television services (including broadband internet access) has different corporate interests than a company that performs a single function. these interrelated interests may affect not only what information is produced and whether competing information and services are freely available through controlled digital distribution channels, but corporate perceptions of copyright issues as well. one of the ironies of the current copyright situation is this: if creative works are by nature property, and stealing property is (and has always been) wrong, then some of the very industries that are demanding that this truth be embodied in copyright law have, in the past, been pirates themselves, even though certain acts of piracy may have been legal (or appeared to be legal) under then-existing copyright laws.23 lessig states: if “piracy” means using the creative property of others without their permission—if “if value, then right” is true—then the history of the content industry is a history of piracy. every important sector of “big media” today—film, records, radio, and cable tv—was born of a kind of piracy so defined. the consistent story is how last generation’s pirates join this generation’s country club—until now.24 let’s take a simple case: cable television. early cable television companies used broadcast television programs without compensating copyright owners, who branded their actions as piracy and filed lawsuits. 
after two defeats in the supreme court, broadcast television companies won a victory (of sorts) in congress, which took nearly thirty years to resolve the matter: cable television companies would pay, but not what broadcast television companies wanted; rather they would pay fees determined by law.25 of course, this view of history (big media companies as pirates in their infancy) is open to dispute. for the moment, let’s assume that it is true. put more gently, some of the most important media companies of modern times flourished because of relatively lax copyright control, a relatively rich public domain, and, in some cases, a societal boon that allowed them to pay statutory license fees— which are compulsory for copyright owners—instead of potentially paying much higher fees set by copyright owners or being denied use at all. today, the very things that fostered media companies’ growth are under attack by them. the success of those attacks is diminishing the ability of new digital content and service companies to flourish and, in the long run, may diminish even big media’s ability to continue to thrive as a permission culture replaces a permissive culture. several prominent copyright scholars have suggested copyright reforms to help restore balance to the copyright system. james boyle, professor of law at the duke university law school, recommends a twenty-year copyright term with “a broadly defined fair use protection for journalistic, teaching, and parodic uses—provided that those uses were not judged to be in bad faith by a jury applying the ‘beyond a reasonable doubt’ standard.”26 120 information technology and libraries | september 2006 william w. fisher iii, hale and dorr professor of intellectual property law at harvard university law school, suggests that “we replace major portions of the copyright and encryption-reinforcement models with . . . a governmentally administered reward system” that would put in place new taxes and compensate registered copyright owners of music or films with “a share of the tax revenues proportional to the relative popularity of his or her creation,” and would “eliminate most of the current prohibitions on unauthorized reproduction, distribution, adaptation, and performance of audio and video recordings.”27 lessig recommends that copyright law be guided by the following general principles: (1) short copyright terms, (2) a simple binary system of protected/not protected works without complex exceptions, (3) mandatory renewal, and (4) a “prospective” orientation that forbids retrospective term extensions.28 (previously, lessig had proposed a seventy-five-year term contingent on five-year renewals). he suggests reinstating the copyright registration requirement using a flexible system similar to that used for domain name registrations. he favors works having copyright marks, and, if they are not present, he would permit their free use until copyright owners voice their opposition to this use (uses of the work made prior to this point would still be permitted). 
litman wants a copyright law “that is short, simple, and fair,” in which we “stop defining copyright in terms of reproduction” and recast copyright as “an exclusive right of commercial exploitation.”29 litman would eliminate industry-specific copyright law exceptions, but grant the public “a right to engage in copying or other uses incidental to a licensed or legally privileged use”; the “right to cite” (even infringing works); and “an affirmative right to gain access to, extract, use, and reuse the ideas, facts, information, and other public-domain material embodied in protected works” (including a restricted circumvention right).30 things change in two hundred-plus years, and the law must change with them. since the late nineteenth century, copyright law has been especially impacted by new technologies. the question is this: has copyright law struck the right balance between encouraging progress through granting creators specific rights and fostering a strong public domain that also nourishes creative endeavor? if that balance has been lost, how can it be restored? or is society simply no longer striving to maintain that balance because intellectual works are indeed property, property must be protected for commerce to prosper, and the concept of balance is outmoded and no longer reflects societal values? ฀ drm: locked-up content and fine-grained control noted attorney michael godwin defines drm as “a collective name for technologies that prevent you from using a copyrighted digital work beyond the degree to which the copyright owner (or a publisher who may not actually hold a copyright) wishes to allow you to use it.”31 like copyright, drm systems are complex, with many variations. there are two key technologies: (1) digital marking (i.e., digital fingerprints that uniquely identify a work based on its characteristics, simple labels that attach rights information to content, and watermarks that typically hide information that can be used to identify a work), and (2) encryption (i.e., scrambled digital content that requires a digital key to decipher it).32 specialized hardware can be used to restrict access as well, often in conjunction with digital marking and encryption. the intent of this article is not to provide a technical tutorial, but to set forth an overview of the basic drm concept and discuss its implications. what is of interest here is not how system a-b-c works in contrast to system x-y-z, but what drm allows copyright owners to do and the issues related to drm. to do so, let’s use an analogy, understanding that real drm systems can work in other ways as well (e.g., digital watermarks can be used to track illegal use of images on the internet without those images being otherwise protected). for the moment, let’s imagine that the content a user wishes to access is in an unbreakable, encrypted digital safe. the user cannot see inside the safe. by entering the correct digital combination, certain content becomes visible (or audible or both) in the safe. that content can then be utilized in specific ways (and only those ways), including, if permitted, leaving the safe. if a public domain work is put in the safe, access to it is restricted regardless of its copyright status. 
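the encryption half of that analogy can be sketched in a few lines of python. this is purely illustrative and is not a description of any actual drm product; it assumes the third-party "cryptography" package. the point is simply that content locked with a key is unreadable until the same key is produced, whatever the work's copyright status.

# illustrative "digital safe": symmetric encryption with the cryptography package
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # the "combination", held by the rights holder
safe = Fernet(key)

# even a public-domain text becomes unreadable once it is locked in the safe
plaintext = b"a public-domain sentence placed inside the safe"
locked = safe.encrypt(plaintext)

print(locked[:32])                 # ciphertext: useless without the key
print(safe.decrypt(locked))        # with the key, the content is released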
bill rosenblatt, bill trippe, and stephen mooney provide a very useful conceptual model of drm rights in their landmark drm book, digital rights management: business and technology, summarized here.33 there are three types of content rights: (1) render rights, (2) transport rights, and (3) derivative-works rights. render rights allow authorized users to view, play, and print protected content. transport rights allow authorized users to copy, move, and loan content (the user retains the content if it is copied and gets it back when a loan is over, but does not keep a copy if it is moved). derivative-works rights allow authorized users to extract pieces of content, edit the content in place, and embed content by extracting some of it and using it in other works. each one of these individual rights has three attributes: (1) consideration, (2) extents, and (3) types of users. in the first attribute, consideration, access to content is provided for something of value to the publisher (e.g., money or personal information). content can then be used to some extent (e.g., for a certain amount of time or a certain number of times). the rights and attributes users have are determined by their user types. digital dystopia | bailey 121 for example, an academic user, in consideration of a specified license payment by his or her library, can view a drm-protected scholarly article—but not copy, move, loan, extract, edit, or embed it—for a week, after which it is inaccessible. we can extend this hypothetical example by imagining that the library could pay higher license fees to gain more rights to the journal in question, and the library (or the user) could dynamically purchase additional article-specific rights enhancements as needed through micropayments. this example is extreme; however, it illustrates the fine-grained, high level of control that publishers could potentially have over content by using drm technology. godwin suggests that drm may inhibit a variety of legitimate uses of drm-protected information, such as access to public-domain works (or other works that would allow liberal use), preservation of works by libraries, creation of new derivative works, conduct of historical research, exercise of fair-use rights, and instructional use.34 the ability of blind (or otherwise disabled) users to employ assistive technologies may also be prevented by drm technology.35 drm also raises a variety of privacy concerns.36 fair use is an especially thorny problem. rosenblatt, trippe, and mooney state: fair use is an “i’ll know it when i see it” proposition, meaning that it can’t be proscriptively defined. . . . just as there is no such thing as a “black box” that determines whether broadcast material is or isn’t indecent, there is no such thing as a “black box” that can determine whether a given use of content qualifies as fair use or not. anything that can’t be proscriptively defined can’t be represented in a computer system.37 no need to panic about scholarly journals—yet. your scholarly journal publisher or other third-party supplier is unlikely to present you with such detailed options tomorrow. but you may already be licensing other digital content that is drm-protected, such as digital music or e-books that require a hardware e-book reader. as the recent sony bmg “rootkit” episode illustrated, creating effective, secure drm systems can be challenging, even for large corporations.38 again, the reasons for this are complex. 
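before the academic-user example that follows, the shape of this rights model can be sketched as a small data structure; the class and field names below are invented for illustration and are not taken from rosenblatt, trippe, and mooney.

# a sketch of the render/transport/derivative-works rights model described above;
# all names are invented for illustration
from dataclasses import dataclass, field
from typing import List

@dataclass
class Grant:
    right: str           # e.g. "view", "print", "copy", "loan", "extract", "embed"
    consideration: str   # what the publisher receives, e.g. "annual license fee"
    extent: str          # how far the right goes, e.g. "7 days" or "3 renderings"
    user_type: str       # e.g. "academic", "public", "corporate"

@dataclass
class License:
    work_id: str
    grants: List[Grant] = field(default_factory=list)

    def permits(self, right: str, user_type: str) -> bool:
        """True if some grant covers this right for this class of user."""
        return any(g.right == right and g.user_type == user_type for g in self.grants)

article = License("journal-article-0042", [
    Grant("view", "annual library license fee", "7 days", "academic"),
])
print(article.permits("view", "academic"))   # True
print(article.permits("print", "academic"))  # False: no grant covers printing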
in very simple terms, it boils down to this: assuming that the content can be protected up to the point it is placed in a drm system, the drm system has the best chance of working if all possible devices that can process its protected content either directly support its protection technology, recognize its restrictions and enforce them through another means, or refuse access.39 anything less creates “holes” in the protective drm shell, such as the well-known “analog hole” (e.g., when drm-protected digital content is converted to analog form to be played, it can then be rerecorded using digital equipment without drm protection).40 ideally, in other words, every server, network router, pc and pc component, operating system, and relevant electronic device (e.g., cd player, dvd player, audiorecording device, and video-recording device) would work with the drm system as outlined previously or would not allow access to the content at all. clearly, this ideal end-state for drm may well never be realized, especially given the troublesome backwardcompatibility equipment problem.41 however, this does not mean that the entertainment, information, and hightechnology companies will not try to make whatever piecemeal progress that they can in this area.42 the trusted computing group is an important multiple-industry security organization, whose standards work could have a strong impact on the future of drm. robert a. gehring notes: but a drm system is almost useless, that is from a content owner’s perspective, until it is deployed broadly. putting together cheap tc components with a marketdominating operating system “enriched” with drm functionality is the most economic way to provide the majority of users with “copyright boxes.”43 seth schoen argues computer owners should be empowered to override certain features of “trusted computing architecture” to address issues with “anti-competitive and anti-consumer behavior” and other problems.44 drm could potentially be legislatively mandated. there is a closely related legal precedent, the audio home recording act, which requires that digital audiotape equipment include special hardware to prevent serial copying.45 there is currently a bill before congress that would require use of a “broadcast flag” (a digital marker) for digital broadcast and satellite radio receivers.46 last year, a similar fcc regulation for broadcast digital television was struck down by a federal appeals court; consequently, the current bill explicitly empowers the fcc to “enforce ‘prohibitions against unauthorized copying and redistribution.’”47 another bill would plug the analog-to-digital video analog hole by putting “strict legal controls on any video analog to digital (a/d) convertors.”48 whether these bills become law or not, efforts to mandate drm are unlikely to end. dmca strongly supports drm by prohibiting both the circumvention of technological mechanisms that control access to copyrighted works (with some minor exceptions) and the “manufacture of any device, composition of any program, or offering of any service” to do so.49 what would the world be like if all newly published (or released) commercially created information was in digital form, protected by drm? what would it be like if all old works in print and analog formats were only reissued in digital form, protected by drm? what would it be like if all hardware that could process that digital information had to support the information’s drm scheme or block any access to it because this was mandated by law? 
what would it be 122 information technology and libraries | september 2006 like if all operating systems had direct or indirect built-in support for drm? would “progress of science and useful arts” be promoted or squashed? ฀ weaker net neutrality lessig identifies three important characteristics of the internet that have fostered innovation: (1) edge architecture: software applications run on servers connected to the network, rather than on the network itself, ensuring that the network itself does not have to be modified for new or updated applications to run; (2) no application optimization: a relatively simple, but effective, protocol is utilized (internet protocol) that is indifferent to what software applications run on top of it, again insulating the network from application changes; and (3) neutral platform: the network does not prefer certain data packets or deny certain packets access.50 lessig’s conceptual model is very useful when thinking about net neutrality, a topic of growing concern. educause’s definition of net neutrality aptly captures these concerns: “net neutrality” is the term used to describe the concept of keeping the internet open to all lawful content, information, applications, and equipment. there is increasing concern that the owners of the local broadband connections (usually either the cable or telephone company) may block or discriminate against certain internet users or applications in order to give an advantage to their own services. while the owners of the local network have a legitimate right to manage traffic on their network to prevent congestion, viruses, and so forth, network owners should not be able to block or degrade traffic based on the identity of the user or the type of application solely to favor their interests.51 for some time, there have been fears that net neutrality was endangered as the internet became increasingly commercialized, a greater percentage of home internet users migrated to broadband connections not regulated by common carrier laws, and telecommunications mergers (and vertical integration) accelerated. some of these fears are now appearing to be realized, albeit with resistance by the internet community. for example, aol has indicated that it will implement a two-tier e-mail system for companies, nonprofits, and others who send mass mailings: those who pay bypass spam filters, those who don’t pay don’t bypass spam filters.52 critics fear that free e-mail services will deteriorate under a two-tier system. facing fierce criticism from the dearaol.com coalition and many others, aol has relented somewhat on the nonprofit issue by offering special treatment for “qualified” nonprofits. a second example is that an analysis of verizon’s fcc filings reveals that “more than 80% of verizon’s current capacity is earmarked for carrying its service, while all other traffic jostles in the remainder.”53 content-oriented net companies are worried: leading net companies say that verizon’s actions could keep some rivals off the road. as consumers try to search google, buy books on amazon.com, or watch videos on yahoo!, they’ll all be trying to squeeze into the leftover lanes on verizon’s network. . . . “the bells have designed a broadband system that squeezes out the public internet in favor of services or content they want to provide,” says paul misener, vice-president for global policy at amazon.com.54 a third example is a comment by william l. 
smith, bellsouth ‘s chief technology officer, who “told reporters and analysts that an internet service provider such as his firm should be able, for example, to charge yahoo inc. for the opportunity to have its search site load faster than that of google inc.,” but qualified this assertion by indicating that “a pay-for-performance marketplace should be allowed to develop on top of a baseline service level that all content providers would enjoy.”55 about four months later, at&t announced that it would acquire bellsouth, after which it “will be the local carrier in 22 states covering more than half of the american population.”56 finally, in a white paper for public knowledge, john windhausen jr. states: this concern is not just theoretical—broadband network providers are taking advantage of their unregulated status. cable operators have barred consumers from using their cable modems for virtual private networks and home networking and blocked streaming video applications. telephone and wireless companies have blocked internet telephone (voip—voice over the internet protocol) traffic outright in order to protect their own telephone service revenues.57 these and similar examples are harbingers of troubled days ahead for net neutrality. the canary in the net neutrality mine isn’t dead yet, but it’s getting very nervous. the bottom line? noted oa advocate peter suber analyzes the situation as follows: but now cable and telecom companies want to discriminate, charge premium prices for premium service, and give second-rate service to everyone else. if we relax the principle of net neutrality, then isps could, if they wanted, limit the software and hardware you could connect to the net. they could charge you more if you send or receive more than a set number of emails. they could block emails containing certain keywords or emails from people or organizations they disliked, and block traffic to or from competitor web sites. they could make filtered service the default and force users to pay extra for the digital dystopia | bailey 123 wide open internet. if you tried to shop at a store that hasn’t paid them a kickback, they could steer you to a store that has. . . . if companies like at&t and verizon have their way, there will be two tiers of internet service: fast and expensive and slow and cheap (or cheaper). we unwealthy users—students, scholars, universities, and small publishers—wouldn't be forced offline, just forced into the slow lane. because the fast lane would reserve a chunk of bandwidth for the wealthy, the peons would crowd together in what remained, reducing service below current levels. new services starting in the slow lane wouldn't have a fighting chance against entrenched players in the fast lane. think about ebay in 1995, google in 1999, or skype in 2002 without the level playing field provided by network neutrality. or think about any oa journal or repository today.58 is net neutrality a quaint anachronism of the internet’s distant academic/research roots that we would be better off without? would new internet companies and noncommercial services prosper better if it was gone, spurring on new waves of innovation? would telecommunications companies (who may be part of larger conglomerates), free to charge for tiered-services, offer us exciting new service offerings and better, more reliable service? 
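whatever the answers, the mechanics of a tiered network are easy to picture. the toy python sketch below (the sender names are invented) simply dequeues traffic from paying senders ahead of everyone else, which is the "fast lane/slow lane" behavior described above; it is an illustration of the concept, not of any carrier's actual equipment.

# toy illustration of tiered delivery: traffic from paying senders is dequeued first
import heapq

PREMIUM = {"bigsearch.example", "bigvideo.example"}   # hypothetical paying senders

arrivals = [
    ("oa-repository.example", "article chunk"),
    ("bigvideo.example", "movie chunk"),
    ("library.example", "catalog record"),
    ("bigsearch.example", "results page"),
]

queue = []
for order, (sender, payload) in enumerate(arrivals):
    tier = 0 if sender in PREMIUM else 1              # 0 = fast lane, 1 = slow lane
    heapq.heappush(queue, (tier, order, sender, payload))

while queue:
    tier, _, sender, payload = heapq.heappop(queue)
    print("fast" if tier == 0 else "slow", sender, payload)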
฀ defending the internet revolution sixties icon bob dylan’s line in “the times they are achangin’”—“then you better start swimmin’ or you’ll sink like a stone”—couldn’t be more apt for those concerned with the issues outlined in this paper. here’s a brief overview of some of the strategies being used to defend the freewheeling internet revolution. 1. darknet: j. d. lasica says: “for the most part, the darknet is simply the underground internet. but there are many darknets: the millions of users trading files in the shady regions of usenet and internet relay chat; students who send songs and tv shows to each other using instant messaging services from aol, yahoo, and microsoft; city streets and college campuses where people copy, burn, and share physical media like cds; and the new breed of encrypted dark networks like freenet. . .”59 we may think of the darknet as simply fostering illegal file swapping by ordinary citizens, but the darknet strategy can also be used to escape government internet censorship, as is the case with freenet use in china.60 2. legislative and legal action: there have been attempts to pass laws to amend or reverse copyright and other laws resulting from the counter-internet-revolution, which have been met by swift, powerful, and generally effective opposition from entertainment companies and other parties affected by these proposed measures. the moral of this story is that these large corporations can afford to pay lobbyists, make campaign contributions, and otherwise exert significant influence over lawmakers, while, by and large, advocates for the other side do not have the same clout. the battle in the courts has been more of a mixed bag; however, there have been some notable defeats for reform advocates, especially in the copyright arena (e.g., eldred v. ashcroft), where most of the action has been. 3. market forces: when commercial choices can be made, users can vote with their pocketbooks about some internet changes. but, if monopoly forces are in play, such as having a single option for broadband access, the only other choice may be no service. however, as the oa movement (described later) has demonstrated, a concerted effort by highly motivated individuals and nonprofit organizations can establish viable new alternatives to commercial services that can change the rules of the game in some cases. companies can also explore radical new business models that may appear paradoxical to pre-internet-era thinking, but make perfect sense in the new digital reality. in the long run, the winners of the digital-content wars may be those who are not afraid of going down the internet rabbit hole. 4. creative commons: copyright is a two-edged sword: it can be used as the legal basis of licenses (and drm) to restrict and control digital information, or it can be used as the legal basis of licenses to permit liberal use of digital information. by using one of the six major creative common licenses (ccl), authors can retain copyright, but significantly enrich society’s collective cultural repository with works that can be freely shared for noncommercial purposes, used, in some cases, for commercial purposes, and used to easily build new derivative creative works. 
for example, the creative commons attribution license requires that a work is attributed to the author; however, a work can be used for any commercial or noncommercial purpose without permission, including creating derivative works.61 there are a variety of other licenses, such as the gnu free documentation license, that can be used for similar purposes.62 5. oa: scholars create certain types of information, such as journal articles, without expecting to be paid to do so, and it is in their best interests for these works to be widely read, especially by specialists in their fields.63 by putting e-prints (electronic preprints or post-prints) of articles on personal home pages or in various types of digital archives (e.g., institutional repositories) in full compliance with copyright law and, if needed, in compliance with publisher policies, scholars can provide free global access to these works with minimal effort and at no (or little) cost to themselves. further, a new generation of free e-journals are being published on the internet that are being funded by a variety of business models, such as advertising, author fees, library membership fees, and supplemental products. these oa strategies make digital 124 information technology and libraries | september 2006 scholarly information freely available to users across the globe, regardless of their personal affluence or the affluence of their affiliated institutions. ฀ impact on libraries this paper’s analysis of copyright, drm, and network neutrality trends holds no good news for libraries. copyright the reach of copyright law constantly encompasses new types of materials and for an ever-lengthening duration. as a result, copyright holders must explicitly place their works in the public domain if the public domain is to continue to grow. needless to say, the public domain is a primary source of materials that can be digitized without having to face a complex, potentially expensive, and sometimes hopeless permission clearance process. this process can be especially daunting for media works (such as films and video), even for the use of very short segments of these works. j. d. lasica recounts his effort to get permission to use short music and film segments in a personal video: five out of seven music companies declined; six out of seven movie studios declined, and the one that agreed had serious reservations.64 the replies to his inquiry, for those companies that bothered to reply at all, are well worth reading. for u.s. libraries without the resources to deal with complicated copyright-related issues, the digitization clock stops at 1922, the last year we can be sure that a work is in the public domain without checking its copyright status and getting permission if it is under copyright.65 what can we look forward to? lessig says: “thus, in the twenty years after the sonny bono act, while one million patents will pass into the public domain, zero copyrights will pass into the public domain by virtue of the expiration of a copyright term.”66 (the sonny bono term extension act was passed in 1998.) digital preservation is another area of concern in a legal environment where most information is automatically copyrighted, copyright terms are lengthy (or endless), and information is increasingly licensed. simply put, a library cannot digitally preserve what it does not own unless the work is in the public domain, the work’s license permits it, or the work’s copyright owner grants permission to do so. or can it? 
impact on libraries

this paper's analysis of copyright, drm, and network neutrality trends holds no good news for libraries.

copyright. the reach of copyright law constantly expands to encompass new types of materials, and for an ever-lengthening duration. as a result, copyright holders must explicitly place their works in the public domain if the public domain is to continue to grow. needless to say, the public domain is a primary source of materials that can be digitized without having to face a complex, potentially expensive, and sometimes hopeless permission clearance process. this process can be especially daunting for media works (such as films and video), even for the use of very short segments of these works. j. d. lasica recounts his effort to get permission to use short music and film segments in a personal video: five out of seven music companies declined; six out of seven movie studios declined, and the one that agreed had serious reservations.64 the replies to his inquiry, for those companies that bothered to reply at all, are well worth reading. for u.s. libraries without the resources to deal with complicated copyright-related issues, the digitization clock stops at 1922, the last year we can be sure that a work is in the public domain without checking its copyright status and getting permission if it is under copyright.65 what can we look forward to? lessig says: "thus, in the twenty years after the sonny bono act, while one million patents will pass into the public domain, zero copyrights will pass into the public domain by virtue of the expiration of a copyright term."66 (the sonny bono term extension act was passed in 1998.) digital preservation is another area of concern in a legal environment where most information is automatically copyrighted, copyright terms are lengthy (or endless), and information is increasingly licensed. simply put, a library cannot digitally preserve what it does not own unless the work is in the public domain, the work's license permits it, or the work's copyright owner grants permission to do so. or can it? after all, the internet archive does not ask permission ahead of time before preserving the entire internet, although it responds to requests to restrict information. and that is why the internet archive is currently being sued by healthcare advocates, which says that it "is just like a big vacuum cleaner, sucking up information and making it available."67 if it is not settled out of court, this will be an interesting case for more digitally adventurous libraries to watch. as the cost of the hardware and software needed to do so effectively continues to drop, faculty, students, and other library users will increasingly want to repurpose content, digitizing conventional print and media materials, remixing digital ones, and/or creating new digital materials from both. with the "information commons" movement, academic libraries are increasingly providing users with the hardware and software tools to repurpose content. given that the wording of the u.s. copyright act section 108(f)(1) is vague enough that it could be interpreted to include these tools when they are used for information reproduction, is the old "copyright disclaimer on the photocopier" solution enough in the new digital environment? or—in light of the unprecedented transformational power of these tools to create new digital works, and their widespread use both within libraries and on campus—do academic libraries bear heavier responsibilities regarding copyright compliance, permission-seeking, and education? similar issues arise when faculty want to place self-created digital works that incorporate copyrighted materials in electronic reserves systems or institutional repositories. end-user contributions to "library 2.0" systems that incorporate copyrighted materials may also raise copyright concerns.

drm. as libraries realize that they cannot afford dual formats, their new journal and index holdings are increasingly solely digital. libraries are also licensing a growing variety of "born digital" information. the complexities of dealing with license restrictions for these commercial digital products are well understood, but imagine if drm were layered on top of license restrictions. as we have discussed, drm will allow content producers and distributors to slice, dice, and monetize access to digital information in ways that were previously impossible. what may be every publisher/vendor's dream could be every library's nightmare. aside from a potential surge of publisher/vendor-specific access licensing options and fees, libraries may also have to contend with publisher/vendor-specific drm technical solutions, which may:

■■ depend on particular hardware/software platforms,
■■ be incompatible with each other,
■■ decrease computer reliability and security,
■■ eliminate fair or otherwise legal use of drm-protected information,
■■ raise user privacy issues,
■■ restrict digital preservation to bitstream preservation (if allowed by license),
■■ make it difficult to assess whether to license drm-protected materials,
■■ increase the difficulty of providing unified access to information from different publishers and vendors,
■■ multiply user support headaches, and
■■ necessitate increased staffing.

drm makes solving many of these problems both legally and technically impossible. for example, under dmca, libraries have the right to circumvent drm for a work in order to evaluate whether they want to purchase it. however, they cannot do so without the software tools to crack the work's drm protection.
but the distribution of those tools is illegal under dmca, and local development of such tools is likely to be prohibitively complex and expensive.68 fostering alternatives to restrictive copyright and drm given the uphill battle in the courts and legislatures, ccls (or similar licenses) and oa are particularly promising strategies to deal with copyright and drm issues. copyright laws do not need to change for these strategies to be effective. it is not just a question of libraries helping to support oa by paying for institutional memberships to oa journals, building and maintaining institutional repositories, supporting oa mandates, encouraging faculty to edit and publish oa journals, educating faculty about copyright and oa issues, and encouraging them to utilize ccls (or similar licenses). to truly create change, libraries need to “walk the talk” and either let the public-domain materials they digitize remain in the public domain, or put them under ccls (or similar licenses), and, when they create original digital content, put it under ccls (or similar) licenses as well. as the oa movement has shown, using ccls does not rule out revenue generation (if that is an appropriate goal), but it does require facilitating strategies, such as advertising and offering fee-based add-on products and services. net neutrality there are many unknowns surrounding the issue of net neutrality, but what is clear is that it is under assault. it is also clear that internet services are more likely to require more, not less, bandwidth in the future as digital media and other high-bandwidth applications become more commonplace, complex, and interwoven into a larger number of internet systems. one would imagine that if a corporation such as google had to pay for a high-speed digital lane, it would want it to reach as many consumers as possible. so, it may well be that libraries’ google access would be unaffected or possibly improved by a two-tier (or multi-tier) internet “speed-lane” service model. would the same be true for library-oriented publishers and vendors? that may depend on their size and relative affluence. if so, the ability of smaller publishers and vendors to offer innovative bandwidth-intensive products and services may be curtailed. unless they are affluent, libraries may also find that they are confined to slower internet speed lanes when they act as information providers. for libraries engaged in digital library, electronic publishing, and institutional repository projects, this may be problematic, especially as they increasingly add more digital media, large-data-set, or other bandwidth-intensive applications. it’s important to keep in mind that net neutrality impacts are tied to where the chokepoints are, with the most serious potential impacts being at chokepoints that affect large numbers of users, such as local isps that are part of large corporations, national/international backbone networks, and major internet information services (e.g.,yahoo!). it is also important to realize that the problem may be partitioned to particular network segments. for example, on-campus network users may not experience any speed issues associated with the delivery of bandwidth-intensive information from local library servers because that network segment is under university control. 
remote users, however, including affiliated home users, may experience throttled-down performance beyond what would normally be expected due to speed-lane enforcement by backbone providers or local isps controlled by large corporations. likewise, users at two universities connected by a special research network may experience no issues related to accessing the other university’s bandwidth-intensive library applications from on-campus computers because the backbone provider is under a contractual obligation to deliver specific network performance levels. although the example of speed lanes has been used in this examination of potential net neutrality impacts on libraries, the problem is more complex than this, because network services, such as peer-to-peer networking protocols, can be completely blocked, digital information can be blocked or filtered, and other types of fine-grained network control can be exerted. ฀ conclusion this paper has deliberately presented one side of the story. it should not be construed as saying that copyright law should be abolished or violated, that drm can serve no useful purpose (if it is possible to fix certain critical deficiencies and if it is properly employed), or that no one has to foot the bill for content creation/marketing/distribution and ever-more-bandwidth-hungry internet applications. 126 information technology and libraries | september 2006 nor is it to say that the other side of the story, the side most likely to be told by spokespersons of the entertainment, information, and telecommunications industries, has no validity and does not deserve to be heard. however, that side of the story is having no problem being heard, especially in the halls of congress. the side of the story presented in this paper is not as widely heard—at least, not yet. nor does it intend to imply that executives from the entertainment, information, telecommunications, and other corporate venues lack a social conscience, are fully unified in their views, or are unconcerned with the societal implications of their positions. however, by focusing on short-term issues, they may not fully realize the potentially negative, long-term impact that their positions may have on their own enterprises. nor has this paper presented all of the issues that threaten the internet, such as assaults on privacy, increasingly determined (and malicious) hacking, state and other censorship, and the seemingly insolvable problem of overlaying national laws on a global digital medium. what this paper has said is simply this: three issues—a dramatic expansion of the scope, duration, and punitive nature of copyright laws; the ability of drm to lock-down content in an unprecedented fashion; and the erosion of net neutrality—bear careful scrutiny by those who believe that the internet has fostered (and will continue to foster) a digital revolution that has resulted in an extraordinary explosion of innovation, creativity, and information dissemination. these issues may well determine whether the much-touted information superhighway lives up to its promise or simply becomes the “information toll road” of the future, ironically resembling the pre-internet online services of the past. references and notes 1. gary flake, “how i learned to stop worrying and love the imminent internet singularity,” http://castingwords.com/ transcripts/o3/5073.html (accessed may 2, 2006). 2. 
lawrence lessig, free culture: the nature and future of creativity (new york: penguin, 2005), 130, www.free-culture.cc/ (accessed may 2, 2006). 3. ibid., 131. 4. ibid., 117–18. 5. william f. patry, copyright law and practice (washington, d.c.: bureau of national affairs, 2000), http://digital-law -online.info/patry (accessed may 2, 2006). 6. u.s. copyright office, copyright basics (washington, d.c.: u.s. copyright office, 2000), www.copyright.gov/circs/circl/ html (accessed may 2, 2006). 7. ibid. 8. lessig, free culture, 133. 9. barbara m. waxer and marsha baum, internet surf and turf revealed: the essential guide to copyright, fair use, and finding media (boston: thompson course technology, 2006), 17. 10. patry, copyright law and practice; lessig, free culture, 133. 11. jessica litman, digital copyright (amherst: prometheus books, 2001), 35–63. 12. lessig, free culture, 134. 13. ibid., 326. 14. association of american universities, the association of research libraries, the association of american university presses, and the association of american publishers, campus copyright rights & responsibilities: a basic guide to policy considerations (association of american universities, the association of research libraries, the association of american university presses, and the association of american publishers, 2006), 8, www.arl.org/info/ frn/copy/campuscopyright05.pdf (accessed may 2, 2006). 15. george h. pike, “the delicate dance of database licenses, copyright, and fair use,” computers in libraries 22, no. 5 (2002): 14, http://infotoday.com/cilmag/may02/pike .htm (accessed may 2, 2006). 16. patry, copyright law and practice. 17. computer crime and intellectual property section criminal division, u.s. department of justice, “prosecuting intellectual property crimes manual,” www.cybercrime.gov/ipmanual .htm (accessed may 2, 2006); u.s. copyright office, copyright law of the united states of america and related laws contained in title 17 of the united states code (washington, d.c.: u.s. copyright office, 2003), www.copyright.gov/title17/circ92.pdf (accessed may 2, 2006). 18. recording industry association of america, “copyright laws,” www.riaa.com/issues/copyright/laws.asp (accessed may 2, 2006). 19. kenneth d. crews, copyright law for librarians and educators: creative strategies and practical solutions, 2nd ed. (chicago: ala, 2006), 94. 20. lessig, free culture, 8. 21. ibid., 51. 22. lawrence lessig, the future of ideas: the fate of the commons in a connected world (new york: vintage bks., 2002), 165–66, 176. 23. lessig, free culture, 53–61. 24. ibid., 53. 25. ibid., 59–61. 26. james boyle, shamans, software, and spleens: law and the construction of the information society (cambridge: harvard univ. pr., 1996), 172. 27. william w. fisher iii, promises to keep: technology, law, and the future of entertainment (stanford, calif.: stanford univ. pr., 2004), 202. 28. lessig, free culture, 289–93. 29. litman, digital copyright, 179–80. 30. ibid., 181–84. 31. michael godwin, digital rights management: a guide for librarians (washington, d.c.: office for information technology policy, ala, 2006), 1, www.ala.org/ala/washoff/woissues/ copyrightb/digitalrights/drmfinal.pdf (accessed may 2, 2006). digital dystopia | bailey 127 32. ibid., 10–18. 33. bill rosenblatt, bill trippe, and stephen mooney, digital rights management: business and technology (new york: m&t bks., 2002), 61–64. 34. godwin, digital rights management: a guide for librarians, 2. 35. 
david mann, “digital rights management and people with sight loss,” indicare monitor 2, no. 11 (2006), www .indicare.org/tiki-print_article.php?articleid=170 (accessed may 2, 2006). 36. julie e. cohen, “drm and privacy,” communications of the acm 46, no. 4 (2003): 46–49. 37. rosenblatt, trippe, and mooney, digital rights management: business and technology, 45. 38. j. alex halderman and edward w. felten, “lessons from the sony cd drm episode,” feb. 14, 2006, http://itpolicy.princeton .edu/pub/sonydrm-ext.pdf (accessed may 2, 2006). 39. godwin, digital rights management: a guide for librarians, 18–36. 40. wikipedia, “analog hole,” http://en.wikipedia.org/ wiki/analog_hole (accessed may 2, 2006). 41. godwin, digital rights management: a guide for librarians, 18–20. 42. ibid., 36. 43. robert a. gehring, “trusted computing for digital rights management,” indicare monitor 2, no. 12 (2006), www.indicare .org/tiki-read_article.php?articleid=179 (accessed may 2, 2006). 44. seth schoen, “trusted computing: promise and risk,” www.eff.org/infrastructure/trusted_computing/20031001 _tc.php (accessed may 2, 2006). 45. pamela samuelson, “drm {and, or, vs.} the law,” communications of the acm 46, no. 4 (2003): 43–44. 46. declan mccullagh, “congress raises broadcast flag for audio,” cnet news.com, mar. 2, 2006, http://news.com .com/congress+raises+broadcast+flag+for+audio/2100-1028 _3-6045225.html (accessed may 2, 2006). 47. ibid. 48. danny o’brien, “a lump of coal for consumers: analog hole bill introduced,” eff deeplinks, dec. 16, 2005, www.eff .org/deeplinks/archives/004261.php (accessed may 2, 2006). 49. siva vaidhyanathan, copyrights and copywrongs: the rise of intellectual property and how it threatens creativity (new york: new york univ. pr., 2001), 174–75. 50. lessig, the future of ideas, 36–37. 51. educause, “net neutrality,” www.educause.edu/ c o n t e n t . a s p ? pa g e _ i d = 6 4 5 & pa r e n t _ i d = 8 0 7 & b h c p = 1 (accessed may 2, 2006). 52. electronic frontier foundation, “dearaol.com coalition grows from 50 organizations to 500 in one week,” mar. 7, 2006, www.eff.org/news/archives/2006_03.php#004461 (accessed may 2, 2006). 53. catherine yang, “is verizon a network hog?” businessweek, feb. 13, 2006, 58, www.businessweek.com/technology/ content/feb2006/tc20060202_061809.htm (accessed may 2, 2006). 54. ibid. 55. jonathan krim, “executive wants to charge for web speed,” washington post, dec. 1, 2005, d05, www.washingtonpost .com/wp-dyn/content/article/2005/11/30/ar2005113002109 .html (accessed may 2, 2006). 56. harold furchtgott-roth, “at&t, or another telecom takeover,” the new york sun, mar. 7, 2006. www.nysun.com/ article/28695 (accessed may 2, 2006). (see also: www.furchtgott -roth.com/news.php?id=87 (accessed may 2, 2006). 57. john windhausen jr., good fences make bad broadband: preserving an open internet through net neutrality (washington, d.c.: public knowledge, 2006), www.publicknowledge.org/ content/papers/pk-net-neutrality-whitep-20060206 (accessed may 2, 2006). 58. peter suber, “three gathering storms that could cause collateral damage for open access,” sparc open access newsletter, no. 95 (2006), www.earlham.edu/~peters/ fos/newsletter/ 03-02-06.htm#collateral (accessed may 2, 2006). 59. j. d. lasica, darknet: hollywood’s war against the digital generation (new york: wiley, 2005), 45. 60. john borland, “freenet keeps file-trading flame burning,” cnet news.com, oct. 28, 2002, http://news.com.com/2100 -1023-963459.html (accessed may 2, 2006). 61. 
creative commons, “attribution 2.5,” http://creativecommons.org/licenses/by/2.5/ (accessed may 2, 2006). 62. lawrence liang, “a guide to open content licenses,” http://pzwart.wdka.hro.nl/mdr/research/lliang/open_content_guide (accessed may 2, 2006). 63. peter suber, “open access overview: focusing on open access to peer-reviewed research articles and their preprints,” www.earlham.edu/~peters/fos/overview.htm (accessed may 2, 2006); charles w. bailey jr., “open access and libraries,” in mark jacobs, ed., electronic resources librarians: the human element of the digital information age (binghamton, n.y.: haworth, 2006), forthcoming, www.digital-scholarship.com/cwb/oalibraries.pdf (accessed may 2, 2006). 64. lasica, darknet, 72–73. 65. waxer and baum, internet surf and turf revealed, 17. 66. lessig, free culture, 134–35. 67. joe mandak, “internet archive’s value, legality debated in copyright suit,” mercury news, mar. 31, 2006, www.mercurynews.com/mld/mercurynews/news/local/states/california/northern_california/14234638.htm (accessed may 2, 2006). 68. arnold p. lutzker, primer on the digital millennium: what the digital millennium copyright act and the copyright term extension act mean for the library community (washington, d.c.: ala washington office, 1999), www.ala.org/ala/washoff/woissues/copyrightb/dmca/dmcaprimer.pdf (accessed may 2, 2006). the chamberlain group inc. v. skylink technologies inc. decision offers some hope that authorized users of drm-protected works could legally circumvent drm for lawful purposes if they had the means to do so (see: crews, copyright law for librarians and educators: creative strategies and practical solutions, 96–97).

copyright © 2006 by charles w. bailey jr. this work is licensed under the creative commons attribution-noncommercial 2.5 license. to view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to creative commons, 543 howard st., 5th floor, san francisco, ca, 94105, usa.

appendix a: ncsu libraries catalog usability test tasks

known-item questions

1. “your history professor has requested you to start your research project by looking up background information in a book titled civilizations of the ancient near east.” a. “please find this title in the library catalog.” b. “where would you go to find this book physically?”

2. “for your literature class, you need to read the book titled gulliver’s travels written by jonathan swift. find the call number for one copy of this book.”

3. “you’ve been hearing a lot about the physicist richard feynman, and you’d like to find out whether the library has any of the books that he has written.” a. “what is the title of one of his books?” b. “is there a copy of this book you could check out from d. h. hill library?”

4. “you have the citation for a journal article about photosynthesis, light, and plant growth. you can read the actual citation for the journal article on this sheet of paper.” alley, h., m. rieger, and j.m. affolter. “effects of developmental light level on photosynthesis and biomass production in echinacea laevigata, a federally listed endangered species.” natural areas journal 25.2 (2005): 117–22. a. “using the library catalog, can you determine if the library owns this journal?” b. “do library users have access to the volume that actually contains this article (either electronically or in print)?”

topical questions
5. “please find the titles of two books that have been written about bill gates (not books written by bill gates).”

6. “your cat is acting like he doesn’t feel well, and you are worried about him. please find two books that provide information specifically on cat health or caring for cats.”

7. “you have family who are considering a solar house. does the library have any materials about building passive solar homes?”

8. “can you show me how you would find the most recently published book about nuclear energy policy in the united states?”

9. “imagine you teach introductory spanish and you want to broaden your students’ horizons by exposing them to poetry in spanish. find at least one audio recording of a poet reading his or her work aloud in spanish.”

10. “you would like to browse the recent journal literature in the field of landscape architecture. does the design library have any journals about landscape architecture?”

marc format simplification

d. kaye capen: university of alabama, university. this is a summary of a paper written on the consideration of the feasibility as well as the benefits, disadvantages, and consequences of simplification of the marc formats for bibliographic records.1 the original paper was commissioned in june 1981, by the arl task force on bibliographic control as one facet in exploring the perceived high costs of cataloging and adhering to marc formats in arl libraries. the conclusions and recommendations, however, are entirely those of the author, and the opinions and judgments stated here result from a wide-ranging canvass of technical services people, computer people, and/or library administrators. because the marc format has so many uses, the paper is divided into five perspectives from which the marc format can be viewed: history, standards, and codes; present purposes; library operations; computer operations; and online catalogs. the library of congress has already begun a review of the marc format and has distributed a draft document.2 the general thrust of that review is a close examination of the marc format in an attempt to begin to lay the foundation on which revised marc formats can firmly stand, particularly in regard to content designation (tags, indicators, and subfield codes used to identify and characterize the data explicitly). as that review deals with the very specific, this paper aims generally at attempting to paint with broad strokes a picture of today's marc in its many relationships, benefits, costs, and what the impact would be to the whole from any change to the part. perspective: marc history, standards, and codes relationships the original marc format document established conventions for encoding data for monographs. though it was understood that early applications were going to relate to the production of catalog cards, the marc designers looked ahead to an increasing emphasis on data retrieval applications. other design considerations included, for example, the necessity for providing for complex computer filing, allowance for a variety of data processing equipment, and an attempt to provide for some analytical work (more specific description of contents notes or other types of analysis).
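to make the notion of content designation concrete, the sketch below shows how a single bibliographic field is identified by a three-character tag, two one-character indicators, and coded subfields. the field chosen (245, the title statement, with subfields $a, $b, and $c and a second indicator giving the number of nonfiling characters) follows standard marc practice; the python representation and helper function are illustrative assumptions, not any particular system's internal format.

```python
# a minimal, illustrative representation of one marc variable field.
# tag: identifies the kind of data; indicators: two single characters that
# qualify it; subfields: code/value pairs introduced by a delimiter.
title_field = {
    "tag": "245",
    "indicators": ("1", "4"),   # 1 = make a title added entry; 4 = skip "the " when filing
    "subfields": [
        ("a", "the future of ideas :"),
        ("b", "the fate of the commons in a connected world /"),
        ("c", "lawrence lessig."),
    ],
}

def render_field(field: dict) -> str:
    """print the field the way cataloging documentation usually displays it,
    with $ standing in for the subfield delimiter character."""
    subfields = " ".join(f"${code} {value}" for code, value in field["subfields"])
    return f'{field["tag"]} {"".join(field["indicators"])} {subfields}'

print(render_field(title_field))
# 245 14 $a the future of ideas : $b the fate of the commons ... $c lawrence lessig.
```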
later the single marc ii format was transformed into a series of formats, and as time passed, those formats became inextricably tied to other developments at the national and international levels: the international standard bibliographic descriptions, the anglo-american cataloguing rules , 2d ed., unimarc, the national level bibliographic records, and the national and international communications standards; e.g., ansi z39.2-1979 and iso 2709. benefits the benefits of the marc formats and other standards and codes have been substantial both philosophically and pragmatically. the sharing of cataloging records through the computer-based, online networks have been shown in a variety of cost studies to have contained the rate of rise of per unit cost. a further benefit of the marc formats is the momentum its creation gave to the steady movement toward standardization which can benefit individuallibraries in a number of ways: first, bibliographic information can be exchanged among libraries and countries. second, in recent years we have moved steadily toward creating an environment in which the library of congress would become one of many authoritative libraries thus enhancing the shareability of records. costs the early costs of the development and implementation of the marc formats were borne by lc (aided by council on library resources funds). lc continues to bear most of the costs of marc formats, such as new marbi proposals, duplication and distribution of documentation, and so forth. direct investment of library dollars came through the purchase of the marc tapes and the development of systems to receive, process, and output data in marc formats. impact of change throughout the years of its use, the marc format content designation and content rules have been augmented or modified. in the beginning, however, databases were small and changes could be absorbed more readily. the number and complexity of the formats have increased, as have the interrelationships of the marc formats with other standards and codes resulting in a present environment in which the impact of change is felt more strenuously. perspective: present relationships and constraints relationships today's close interrelationships between the marc formats and other codes and standards affect both library and computer operations. though, for example, the general international standard bibliographic description was implemented by the library community prior to the adoption of aacr2, the second edition of the rules has firmly incorporated the isbds. when this format description system is combined with the machine-based marc formats, some isbd information will be supplied by humans and some generated by programmed machine manipulations. communications 287 as a second example, in the last couple of years, the library of congress has spearheaded the development of national level bibliographic record(s) which define the specific data elements that should be included by any organization creating cataloging records which may also be shared with other organizations or be acceptable for contribution to a national database. as the logical idea of a national database comes to fruition, it is necessary for the marc format to provide for greater specificity in the coding of originating library, modifying library, and so forth. benefits the benefits of the use of the marc format continue to lie in the ease with which bibliographic information can be shared and the concomitant beneficial impact on cost control. 
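the communications standards mentioned above (ansi z39.2 and iso 2709) are what make a marc record exchangeable regardless of local software: a fixed 24-character leader, a directory of 12-character entries (3-character tag, 4-character field length, 5-character starting position), and then the variable fields themselves. the python sketch below is a simplified reading of that layout; it assumes a single well-formed record and ignores character-set questions, and is not taken from any particular system's code.

```python
FIELD_TERMINATOR = "\x1e"    # ends the directory and each variable field
SUBFIELD_DELIMITER = "\x1f"  # introduces each subfield code within a field

def parse_marc_record(raw: str) -> list[tuple[str, str]]:
    """illustrative iso 2709 / z39.2 reader: return (tag, field content) pairs."""
    leader = raw[:24]
    base_address = int(leader[12:17])          # where the variable fields begin
    directory = raw[24:raw.index(FIELD_TERMINATOR)]
    fields = []
    for i in range(0, len(directory), 12):     # each directory entry is 12 characters
        entry = directory[i:i + 12]
        tag = entry[0:3]
        length = int(entry[3:7])
        start = int(entry[7:12])
        content = raw[base_address + start : base_address + start + length]
        fields.append((tag, content.rstrip(FIELD_TERMINATOR)))
    return fields
```

in practice each field's content is then split on the subfield delimiter into the coded subfields described above; the point here is only that the exchange structure itself is small and mechanical.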
in addition, the marc format supports a host of other standards and codes, and the benefit from these relationships has been consistency in and fostering of standards development. in the bibliographic arena, the more that standards are developed locally, regionally, nationally, and internationally, the more we will be able to transmit and share bibliographic data, thus controlling the costs of original cataloging. on the other hand, we also "pay" when we standardize. cost the two costs associated with increased standardization are the additional time and thus cost required to meet standards, and the increased expense of maintaining local practices, which may often be idiosyncratic. in relation to the latter, while many local idiosyncrasies are often unnecessary and counterproductive, there are generally some which have become an integral part of a large catalog database or upon which a major procedural activity is based. but, to benefit from compliance with standards, increasingly we will move away from local practices. in terms of the time required to adhere to the marc format, it is possible to continue to utilize the format (or participate in systems that use it) and yet control the amount of complexity with which one has to deal. both aacr2 and national level bibliographic record documents allow for "levels of description" which provide for more or less description; and various online networks allow, in a similar manner, for limited input standards. as we view the array of standards and codes which together make up today's bibliographic scene, we can see that each of the separate elements is consistent within itself, is understandable, and counts for only a portion of the costs associated with the cataloging process. the combination of elements, however, begins an accretion of complexity that for most requires an effort of organization and education in order to control work flow and meet standards. impact of change because the marc format is closely interwoven with a number of national and international codes and standards, changes to the format would have implications far beyond the local library. at the very least, discussions would have to involve a host of individuals and groups, all at different stages of development and implementation based upon the present marc format. perspective: library operations relationships in the library-operations perspective, any operations related to the marc format have to be viewed as only one of many elements which must be interfaced with daily work flow. let us look, for example, at the amount of time which might be expended in a typical large academic library by cataloging personnel in training and ongoing work activities required in marc-related operations. in those libraries which obtain access to cataloging databases as members of networks, contact with the marc format is filtered through the standards, requirements, marc implementation design, documentation, and other related training facilities of the network. libraries which maintain their own databases do the same kind of filtering, though staff may have somewhat more control of the user cordiality of the interface. the shared networking environment, however, generally seems to imply more standards and requirements because of the attempt to guarantee as much "shareability" as possible.
libraries participating in oclc, for example, must train staff in the following codes: aacri; aacr2; standard subject heading codes; standard classification codes; oclc/marc formats for each type of material being cataloged; oclc bibliographic input standards; oclc level i and level k input standards; oclc systems users guides; in some instances, input standards documents for regional or special-interest cooperatives; local library interpretations, procedures, and standards. any close review of the time library staff expend in the use of these tools for either training or ongoing operations reveals that marc per se requires only a limited proportion of a typical library staff person's day. while training may be intensive at either the beginning of a person's job or at the beginning of work with a new type/format of material, this portion of the cataloging unit cost is small. benefits, costs in the cataloging activity, the benefits from the use of the marc formats are at least two: first, the marc format as part of an online cataloging system permits the machine-production of catalog cards at a major savings over manual production. second, access to a shared cataloging database permits the use of "clerical" catalogers at an estimated unit cost saving per book of twenty dollars when compared to "original" cataloging.3 third, depending upon the information available in the cataloging record, the time required for decision making during the cataloging process can be decreased significantly. impact of change it was the general consensus of the technical services people i contacted that simplification of the formats through the consistent assignment of tags would make training and introduction to new formats somewhat easier, but that any savings of time would probably be trivial. there was no consensus that either simplification or shortening would result in any significant time or cost savings. to a certain extent, the use of the very specific marc formats has made the descriptive cataloging process (and the training to undertake it) clearer in that the logical relationships and description of the data elements are so clearly exposed through the assignment of tags and other codes. also, once initial familiarity with the format(s) is achieved, ongoing use becomes second nature. it is also possible for cataloging staff to control the complexity with which they will deal through the use of less than "full," but still nationally acceptable levels of cataloging and, hence, marc coding. finally, most technical services people believe that cataloging and maintenance activities in libraries have always been complex, requiring long and detailed procedures and intricate work flow . while membership in networks requires new skills and knowledge, it is the sum of the whole rather than the difficulty of any single portion which affects unit costs today. changing the marc format through either simplification or shortening would have only a slight effect on the total technical services operation and costs. perspective: the computer operations environment relationships in looking at computer operations, there are at least two major subdivisions: operations that serve only one client (e.g., alibrary system serving itself) or operations that serve many clients (e.g., rlin or blackwell/north america). the constraints differ for each operation and are further complicated by whether or not the computer operation must be able to produce as well as accept bibliographic records in a marc format. 
each computer facility, for example, can have distinct operating software depending upon the type and mix of computing equipment used. in addition, each computing facility translates the marc-formatted records into an internal processing format which may differ extensively from marc. too, further tailoring may be done for batch processing as opposed to online operations, and computer operations which serve a single user may not have to re-create records in the marc format and may even more radically redesign the marc-formatted records for internal use. as changes to the marc format occur over the years, each computer system will write additional software to incorporate those changes into the then existing system. in some instances, it may be too difficult to attempt to convert old databases to reflect changes in marc coding, and there will then exist an "old" database and a "new" database for that particular marc field or subfield. since changes have occurred in many fields, most databases are an amalgam of new and old interpretations (this is true in relation to cataloging codes, too) of marc coding, and original internal software design may reflect the same type of patchwork quilt. operating these computer systems is complicated, in addition, by the fact that a wide range of user library needs and desires must be accommodated. indeed, a report prepared by hank epstein for the conference to explore machine-readable bibliographic interchange (cembi) revealed, after an exhaustive review of the use of marc data elements, that there was no data element not used by someone!4 benefits benefits that accrue to computing operations as a result of the marc format include the use of what was called "a pretty decent general communications format," which facilitates communications, card/com production, and online information retrieval. as a communications format it is as coherent as any other structure for carrying bibliographic data. because the format allows for a very specific level of detail in description, computing operations can supply a variety of products to fill a variety of needs. costs while specific cost information was not available for inclusion in this paper, discussion does reveal some widely held generalizations. first, the marc format does not seem to be any more complex or costly to use than other variable field communications formats. beginning programmers are generally introduced first to the internal communications format of their particular computing system, and when they come to the marc tags they rapidly become familiar with the coding through experience. indeed, if the programmers know the structure of and have a specification for the format, they can work with that format even though they may be unfamiliar with it from the users' point of view. thus, the format itself, and training in its use, does not seem to be significantly costly. second, every change in the marc format requires some programming effort and may or may not require concomitant changes in the database. the consensus of the computer people with whom i spoke was that the sophistication and specificity of the marc formats was a good thing, but the inconsistencies among formats are problematical. the benefits of consistency can be important, but to justify changes financially, the major changes should be done at one time.
indeed, most individuals doubted whether or not there was sufficient capital in these straitened times to be able to implement consistently a major marc format changeand this is from the perspective of both the operations serving one and many users. impact of change without a philosophical and practical framework (or benchmark) against which to compare the benefits and costs of alternative solutions to marc format maintenance issues and without a better and more comprehensive description of the requirements of the internal processing formats of the computer operations, it is difficult to assess clearly the costs and benefits of marc format changes. it does seem to be the case presently that, once established, computer operations can deal with the complexity and specificity of the marc format without undue ongoing financial investment. the strength of the marc format for computer operations lies in its specificity. for the batch processing environment especially, the marc format is a reasonably efficient format and one that facilitates development. its inefficiencies are not drastic and its specificity buys valuable flexibility. severe cuts or major simplifications would be a mistake since discontinuing specificity is a one-way street-once it is gone, it cannot be retrieved. the ability of the machine to assist in editing is weakened by the loss of specificity and it then becomes more difficult to edit out poor data. simplification through consistency, rather than shortening, would produce the most beneficial impact-though it must be done carefully to be cost beneficial. perspective: online catalogs relationship the major difficulties facing us when we attempt to discuss the relationship of the marc format to online catalogs is that, first, we know so little about how people think when they use our card catalogs; and, second, we have so little experience with how those thought and use patterns might change when the online catalog replaces the card catalog. another aspect of online library system development is the combination of subsystems such as acquisitions, serials control, or authority control with the online catalog and the implications of such a combination for system design, the internal processing format, and compatibility with the marc format. the index design of most large online catalogs or information retrieval systems today relies upon precoordinated search keys in order to facilitate the large sorting activities that have to occur. the second indicator in the 700 field, for example, is designed for the purpose of formulating search keys, filing added entries or for selecting alternative secondary added entries. this type of specificity is necessary for both card production and online retrieval. taken together, all of these considerations make most systems and library technical people hesitate to recommend any major changes to the marc format at this time. benefits at this time, therefore, in terms of information retrieval, there does not seem to be any major force toward either simplifying or shortening the marc format to facilitate retrieval. this becomes an even more cogent sentiment when we consider that major development efforts have already been begun in the areas of online catalog access and information retrieval. delays in these development efforts now caused by ........ changes in the marc formats could be enormously wasteful of the time and effort already invested, and could postpone urgently needed implementation of new, easily maintainable online systems. 
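the precoordinated search keys mentioned above were typically "derived keys" built from fixed-length fragments of headings; oclc's 3,2,2,1 title key and 4,3,1 personal-name key are the usual historical examples. the python sketch below is an illustrative reconstruction of that general technique under stated assumptions (the function name and the normalization choices are mine), not the exact algorithm of any particular system:

```python
import re

def derived_key(heading: str, pattern: tuple[int, ...]) -> str:
    """build a derived search key by taking a fixed number of characters
    from each successive word of a normalized heading."""
    words = re.sub(r"[^a-z0-9 ]", "", heading.lower()).split()
    parts = []
    for size, word in zip(pattern, words):
        parts.append(word[:size])
    # pad with empty segments if the heading has fewer words than the pattern
    while len(parts) < len(pattern):
        parts.append("")
    return ",".join(parts)

# a 3,2,2,1-style title key and a 4,3,1-style personal-name key
print(derived_key("the future of ideas", (3, 2, 2, 1)))   # the,fu,of,i
print(derived_key("lessig, lawrence", (4, 3, 1)))         # less,law,
```

the design point such keys illustrate is the one made above: because the keys are precoordinated at indexing time, the specificity of the underlying marc coding (which heading goes in which field, with which indicators) determines what can be retrieved later.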
costs there is no firm cost data to guide us in considering the impact of marc format changes in the information retrieval environment. generally accepted assumptions are, however, that because of our lack of knowledge and experience in this area, it is simply too risky and potentially costly to experiment. impact of change overall, without more experience in this area, it is the general opinion that the fullest level of descriptive specificity of the marc format might be required to design and implement online catalogs/information retrieval systems which can be responsive to the needs of a variety of users and levels of information. interaction with other subsystems and formats is also incomplete, thus clouding our vision of the impact of change over the breadth of the library community. summary and conclusions the original purpose of the marc format is still a cogent and necessary one-that of allowing for a great variety of individual library needs for products, practices, and policies via a standardizing communications format. both catalog card production and online retrieval necessitate the same level of specificity, though particular tags, indicators, and subfield codes may vary. as we look toward a variety of authoritative cataloging sources the marc format, in addition to a specific coding of bibliographic information, might also have to specify descriptions of cataloging actions so that the greatest degree of "shareability" might exist. some of this related authoritytype information will either be carried as part of the marc format or in some manner as linked records. the computer operations that utilize the marc formats exist under the constraints of a variety of internal processing formats and design constraints. for each internal processing system, however, the specificity of the marc format offers flexibility and communications 291 efficiency for a number of different processes and products. taken by itself, the marc format is no more difficult to work with than any other standard or technique for both librarians and computer people. while it might be useful for librarians to implement training aids such as online documentation, access to library manuals (particularly that of the library of congress), and so forth, the benefits of aids such as these are trivial since the coding can be learned rather quickly through experience. for computing people, on the other hand, changes in the formats can be very expensive and disruptive. there is general agreement, moreover, that over the long term we have got to be able to maintain the marc format in response to experience with retrieval and other theoretical and technical advances. the main thrust of maintenance in the computing realm is consistency across formats, but approaching this type of simplification requires a number of preliminary steps if it is to be implemented effectively. we need to develop a vocabulary for jointly discussing the elements of the problem. in addition, a major review needs to be undertaken of the internal processing formats and design constraints of the major computer operations-both to serve as a benchmark for measuring the impact of format changes, and as a guideline for newly developing systems to assist in avoiding mistakes in the development of new computer operations. someone needs to be thinking about and designing the ultimate, comprehensive marc format-not to be implemented, but to serve as a springboard for discussion and for consideration of system design. 
we need to establish limitations on what we will handle with the marc formats and where we will begin to rely on underlying formats instead. the development of a comprehensive marc conceptualization would also provide a protocol for undertaking the improvement of marc and would serve as a benchmark against which local systems could be compared. at the very least, the steps described here would facilitate the consideration and implementation of making the formats consistent across types of material a goal which is seen by all to be highly desirable. 292 journal of library automation vol. 14/4 december 1981 we need a format which is consistent, easily maintainable without being uncontrollably disruptive, and responsive to changing needs which are likely to accelerate as we gain experience with online systems. rather than recommending or supporting the implementation of specific changes to the marc format, it is essential that the library community begin to establish the framework and benchmarks necessary to maintain the marc formats over the long term as well as to guide short-term considerations. arl and others can play an important role in undertaking and encouraging a broader approach to this pressing problem. such an approach will not only reduce the risk of decision making, but will also assist in the development of the cost/benefit data needed to enhance consideration of format changes. references 1. d. kaye capen, simplification of the marc format: feasibility, benefits, disadvantages, consequences (washington, d.c.: association of research libraries, 1981), 22p. 2. "principles of marc format content designation,'" draft (washington, d.c.: library of congress, 1981), 66p. 3. ichikot. morita and d. kaye capen, "a cost analysis of the ohio college library center on-line shared cataloging system in the ohio state university libraries," library resources & technical services 21:286302 (summer 1977). 4. council on library resources bibliographic interchange committee, bibliographic interchange report, no. i (washington, d.c.: the council, 1981). comparing fiche and film: a test of speed terence crowley: division of library science, san jose state university, san jose, california. introduction for more than a decade librarians have been responding to budget pressures by altering the format of their library catalogs from labor-intensive card formats to computer-produced book and microformats. studies at bath, 1 toronto, 2 texas, 3 eugene, 4 los angeles, 5 and berkeley, 6 have compared the forms of catalogs in a variety of ways ranging from broad-scale user surveys to circumscribed estimates of the speed of searching and the incidence of queuing. the american library association published a state-of-the-art reporf as well as a guide to commercial computer-output microfilm (com) catalogs pragmatically subtitled how to choose; when to buy. 8 in general, com catalogs are shown to be more economical and faster to produce and to keep current, to require less space, and to be suitable for distribution to multiple locations. primary disadvantages cited are hardware malfunctions, increased need for patron instruction, user resistance (particularly due to eyestrain), and some machine queuing. the most common types of library com catalogs today are motorized reel microfilm and microfiche, each with advantages and disadvantages. 
microfilm offers file-sequence integrity and thus is less subject to user abuse, i.e., theft, misfiling, and damage; in motorized readers with "captive" reels it is said to be easier to use. disadvantages include substantially greater initial cost for motorized readers; limits on the capacity of captive reels necessitating multiple units for large files; inexact indexing in the most widespread commercial reader; and eyestrain resulting from high-speed film movement. microfiche offers a more nearly random retrieval, much less expensive and more versatile readers, and unlimited file size. conversely, the file integrity of fiche is lower and the need for patron assistance in use of machines is said to be greater than for self-contained motorized film readers. the problem one of the important considerations not fully researched is that of speed of searching. the toronto study included a self-timed "look-up" test of thirty-two items "not in alphabetical order" given to thirty-six volunteers, of whom thirty finished the test. the researchers found the results "inconclusive" but noted that seven of the ten librarians found film searching the fastest method. "average" time reported for searching in card catalogs was 37.3 min

$c this subfield will contain all but the first character (or all but the first if a longer escape sequence is used) of every escape sequence found in the record. if the same escape sequence occurs more than once, it will be given only once in this subfield. the subfield is repeatable. this subfield does not identify the default character sets. example: $c)w identifies a record containing the iso extended cyrillic character set; $c)w$c)x identifies a record containing both the iso greek and extended cyrillic character sets. 3.4 discussion-other details. when a field has an indicator to specify the number of leading characters to be ignored in filing and the text of the field begins with an escape sequence, the length of the escape sequence will not be included in the character count. when fields contain escape sequences to languages written from right to left, the field will still be given in its logical order. for example, the first letter of a hebrew title would be the eighth character in a field (following the indicators, a delimiter, a subfield code, and a three-character escape sequence). the first letter would not appear just before the end-of-field character and proceed backwards to the beginning of the field. a convention exists in descriptive cataloging fields that subfield content designation generally serves as a substitute for a space. an escape sequence can occur within a word, after a subfield code, or between two words not at a subfield boundary. for simplicity, the convention that an escape sequence does not replace a space should be adopted. one other convention is also advocated: when a space, subfield code, or punctuation mark (except open quote, parenthesis, or bracket) is adjacent to an escape sequence, the escape sequence will come last. wayne davison of rlin raised the following issue. after the library of congress has prepared and distributed an entirely romanized cataloging record for a russian book, a library with access to automated cyrillic input and display capability will create a record for the same book with the title in the vernacular. (since aacr2 says to give the title in the original script "wherever practicable," the library could be said to be obligated to do so.)
in such an event the local record could have all the authoritative library of congress access points. to keep this record current when the library of congress record is revised and redistributed, it would be necessary to carry the lc control number in the local record. most automated systems are hypersensitive to the presence of two records with the same control number. the two records can be easily distinguished: in the library of congress record, the modified record byte in field 008 will be set to "o" and it will not have any 066, character sets present field. a comparison of oclc, rlg/rlin, and wln university of oregon library the following comparison of three major bibliographic utilities was prepared by the university of oregon library's cataloging objectives committee, subcommittee on bibliographic utilities. members of the subcommittee were elaine kemp, acting assistant university librarian for technical services; rod slade, coordinator of the library's computer search service; and thomas stave, head documents librarian. the subcommittee attempted to produce a comparison that was concise and jargonfree for use with the university community in evaluating the bibliographic utilities under consideration. the university faculty library committee was enlisted to review this document in draft form and held three meetings with the subcommittee for that purpose. the document was also shared with library faculty and staff in order to elicit suggestions for revision. 216 journal of library automation vol. 14/3 september 1981 a copy of the draft was sent to each utility with a request for suggestions for correction and/or clarification of the report. each of the utilities responded promptly, and their recommendations were reviewed by the subcommittee and have been incorporated into the report as it appears here. in reading this report two considerations should be kept in mind: (1) the information is current as of december 1980, and (2) the efforts at brevity and jargon-free comparison may have resulted in oversimplification in some areas. this report is one aspect of the sixmonths-long decision-making process that led the university of oregon library to select oclc, inc. (now the online computer library center). introduction an online bibliographic utility provides computer services to member libraries who, in turn, contribute computer-readable records to a common database. the database is a collection of catalog records input by the members and other sources such as the library of congress, the government printing office, and the national library of medicine. use of the database is online, meaning that each member library accesses the computer directly and carries out its work in an interactive, conversational manner through a computer terminal located in the library. communications with the central computer are carried over a leased long-distance telephone line. the bibliographic utility produces two primary products-catalog cards and magnetic tapes of a library's catalog records-and offers many other services for processing and bibliographic control in libraries. 
in addition to providing the products and services of a bibliographic utility through the research libraries information network (rlin), the research libraries group (rlg) has three other goals: (1) to provide a structure through which common research library problems can be addressed, (2) to provide scholars and others with increasingly sophisticated access to bibliographic and other forms of information, and (3) to promote, develop, and operate cooperative programs in collection development, preservation of library materials, and shared access to research materials. the purpose of this report is to provide an overview of considerations in selecting an online bibliographic utility and a comparison of the three utilities being reviewed by the university of oregon library. each consideration is accompanied by a brief definition or explanation, and a summary of each utility's capability in providing the necessary services or products. an attempt has been made to distinguish between currently available services and those that are planned for the future, but technological and organizational changes in the utilities have complicated this task and, in some cases, made it difficult for the subcommittee members to distinguish between operational and projected capabilities. basic characteristics history oclc oclc, inc., was founded in 1967 by the ohio college association as the ohio college library center, to be the first online shared cataloging network. it has since expanded beyond the confines of the state of ohio and is currently used by nearly 2,400 member libraries in the united states and abroad. in 1977 it adopted its present name. rlg/rlin the research libraries group, inc., was established in 1974 by four major research libraries. in 1978 it acquired from stanford university the ballots bibliographic data system, which became the foundation for rlin (research libraries information network), rlg's wholly owned bibliographic utility. besides being the basis for rlg's cooperative processing activities, rlin supports its other three programs: shared resources, cooperative collection development, and preservation. rlg presently has 23 owner-members. wln in 1975 the washington library network began testing its online system using as its base a computerized bibliographic database that several washington libraries had been building since 1972. wln is a project of the washington state library and presently has over 60 members, primarily in the northwest. membership configuration oclc oclc had 2,392 member libraries in early 1981, including about 1,300 college and university libraries, 330 public libraries, 250 federal libraries, 145 special libraries, 77 law libraries, 71 members of the association of research libraries, 168 medical libraries, 37 state libraries, and at least 48 art and architecture libraries. rlg/rlin in december 1980, there were 23 owner-members (21 university libraries, the new york public library, and the american antiquarian society), two associate members, two affiliate members, and several museum and three law library special members. libraries which formerly contracted for ballots cataloging services from stanford university are still being served by rlin. these include 52 libraries using rlin for online cataloging and 136 libraries using rlin on a search-only basis.
wln wln had 65 members in early 1981, including 34 college and university libraries, 21 public libraries, two special libraries, three state libraries, five law libraries, and the pacific northwest bibliographic center. governance methods of governance are of concern to libraries considering membership inasmuch as they determine to a great extent the responsiveness of the utilities to the needs of their members and the ability of members to participate in setting the direction and priorities for the utility. oclc a 15-member board of trustees holds the powers and performs the duties necessary for governance (including filling management vacancies and approving policy and budgets). a users' council, elected by the members, participates in the election of trustees and represents the interests of the membership in an advisory capacity. it also must ratify amendments to the oclc code of regulations and articles of incorporation. of the 69 delegates to the council, 44 are from academic libraries. various advisory groups exist representing the interests of special groups within the membership, including a research libraries advisory group. twenty regional networks contract with oclc to provide services to their members. oclc libraries in oregon participate through the oclc western service center, claremont, ca, and are served by oclc's portland office. rlg/rlin rlg/rlin operates through a board of governors consisting of one representative from each full member institution with the president as chief operating officer. standing committees for collection management, public services, preservation, and library technical systems & bibliographic control; and program committees for east asia, art, law, theology, and music are composed of appointees from member institutions and report to the president. wln an 11-member computer services council is elected directly by the online participant libraries. legal responsibility for wln resides with the washington state library commission. financial stability an indicator of a utility's financial stability is its proven ability to generate sufficient revenues to cover expenses with the least recourse to outside funding sources. financial stability in a utility is a concern to a library considering membership not only from the standpoint of a utility's mere survival, but because of its implications for future system developments, possible dramatic fee increases should outside funding evaporate, and maintenance of high quality services and products. oclc oclc, inc., is a not-for-profit corporation, with tax-exempt status having been granted under section 501(c)(3) of the internal revenue code. it is self-supporting, receiving no government or private subsidies and issuing no stock. its revenues alone support existing operations, expansion, and research and development activities. revenues result from fees charged member libraries for products and services. oclc's estimated assets for fiscal year 1980 were over $55 million and its revenues approximately $24 million. its revenue base is its 2,400 member institutions. rlg/rlin the research libraries group, inc., is a tax-exempt corporation owned by its 23 owner-member institutions. revenues result from fees charged members for use of the rlin database.
rlg currently must supplement this income with foundation grants and loans from stanford university, because of relatively high development costs and relatively low revenues. as of this year, nearly $5.25 million has been received in grants and a $2.2 million loan was obtained, to be repaid by august 1986. rlg has projected that in 1982-83 ongoing operating costs will be met by fee-generated income. rlg's board of governors recently approved a new income/expense structure to take effect september 1, 1981: "operating expenses matched by rates for services; system development matched by grants and loans; program and administration matched by a program partnership fee." this new program partnership fee will be a flat annual rate for full members in the range of $20,000 to $25,000. a decline in the number of units cataloged by member libraries (due in part to decreased acquisitions budgets), which is the basis for fees charged, forced the board to institute this new fee. rlg is encouraging member libraries to seek these additional funds from institutional sources outside the libraries' own budgets. the new financial structure appears to reflect a recognition of the need for outside resources to provide for research and development for at least the immediate future, and at the same time an effort to reconcile income and expense in the areas of operating expenses and program administration. its revenue base is its membership of 23 institutions. in the past rlg has estimated that financial stability would be reached when membership reached 35, but it is unclear how the new rate structure will affect that projection. wln the washington library network receives revenues in the form of fees for services and products. as a division of the washington state library, it also receives some funding from the state of washington. wln has been the recipient of some outside grants, but does not appear to rely heavily upon grant monies to meet ongoing expenses or system development costs. wln would like to lessen its dependency upon the state of washington, and has taken the first step by broadening the base of its advisory committee to include out-of-state members. its revenue base is its membership of approximately 60 libraries. the committee preparing this report does not have information as to the proportion of revenues generated by fees. however, a recent (july 1, 1980) 10% increase in service rates was put into effect for these stated purposes, among others: "to recover the cost of operation of the computer service" and to "allow a modest margin to insure stability." track record in meeting past system development deadlines past success or failure in meeting announced deadlines for system developments may be indicative of future performance in this regard. all three utilities are heavily engaged in research and development and, while we are primarily interested in the features that are presently available, it is also important to try to gauge what each system will look like several years from now. the amount of information available to the committee varied according to the utility, so these columns are not directly comparable, but merely suggestive. oclc oclc tries not to attach dates to its projections because of early failures to meet announced deadlines. however, its interlibrary loan system was implemented one year early and its searching improvements are claimed to be ahead of schedule.
the planned acquisitions subsystem had been scheduled for completion in summer 1980, and is currently being tested by a small number of member libraries. the conversion of oclc's database to accommodate the new cataloging rules and include new forms of names was completed on schedule in december 1980. the serials union listing capability was also completed on time. (see the serials check-in section below.) rlg/rlin a study dated august 1978 performed for the university of california listed planned ballots system developments with projected completion dates. this list follows, with actual completion dates or revised projections added (since 1978 the rlg board of governors has determined the order of priorities for research and development): • network file system (now called the "reconfigured database" by rlin): projected january 1979; revised projection april 1981. • serials cataloging: projected january 1979; actual completion late 1979. • authority control system, phase 1: projected january 1979; revised projection spring 1981. • authority linking and control, phase 2: projected fall 1979; revised projection spring 1981. • generalized acquisitions: projected fall 1979; revised projection (in two phases) june 1981 and october 1981. • serials control: projected 1980; revised projection post-1982. • library management information system: projected 1979; no projected date, no resources allocated. • book/com catalog interface: projected 1980; revised projection 1981. wln wln's present online system was one year late, and its acquisitions module was also late. the processing of retrospective conversion tapes, which had been three months behind, was current by early 1981, with the exception of two special projects. large-scale system adjustments to accommodate new cataloging rules were completed on schedule, as was implementation of roll-microfilm catalogs. database size and components the size and makeup of the utility's database is of concern to libraries considering membership because those factors have the greatest bearing on the library's likelihood of obtaining a large portion of its cataloging information from the system. oclc size. over 7.1 million bibliographic records (february 1981); books: 4.9 million (october 1979); serials: 341,000 (october 1979); other: 340,000 (october 1979); name authority records: 500,000 (est. by 1981). formats available. books, serials, films (av), maps, manuscripts, music recordings, music scores. sources of data. member-contributed records; library of congress-produced machine-readable cataloging records (marc) (1968 to date); government printing office-produced records (cataloged directly into oclc by gpo); conser records (conversion of serials, a project of 15 major libraries to produce machine-readable serials cataloging records; data are entered directly into oclc, then authenticated by the library of congress and the national library of canada); national library of medicine-produced records. additional sources include the following databases: canadian marc serials, minnesota union list of serials, pittsburgh regional library center serials. rlg/rlin size. over 3 million bibliographic records (june 1980); books: 2.5 million (june 1980); serials: 460,000 (june 1980); authority records: 1.6 million (early 1981). formats available. books, serials, films (av), maps, music recordings, music scores. sources of data.
member-contributed records; marc (excluding 1968-1972); gpo records (to be added spring 1981); conser records. cataloging records from columbia and yale universities and university of minnesota biomedical libraries, previously put into machine-readable form, have been added to rlin. records from the new york public library, northwestern and pennsylvania state universities will be added in the near future. additional sources include the avery index to architectural periodicals. wln size. 2 million bibliographic records (january 1981); authority records: 2.3 million (january 1981); holdings records: 2.3 million (december 1980). formats available. books, serials, films (av), music recordings, and music scores (music recordings and scores are awaiting implementation by the library of congress). sources of data. member-contributed records; marc (1968 to date); gpo records; conser records (except those not yet authenticated by the library of congress). machine-readable records from the university of illinois will be added to wln's database on a weekly basis by mid-1981. records from certain libraries in the southeastern library network (solinet) will be added in the future, as part of an arrangement whereby wln made its computer software package available for use by illinois and solinet. resource sharing interlibrary loan (ill) ill is the process by which library materials are lent and borrowed by libraries in the u.s. and foreign countries. a bibliographic utility provides two tools to aid in this process: an online union catalog used to determine which library owns the needed material, and a message switching system used to communicate among libraries and to carry out the transaction. ill at the university of oregon library is currently accomplished using a large number of printed union catalogs and is communicated by mail or western union teletype. a bibliographic utility will not completely replace ill transactions carried out in this manner. the number of requests for materials from the library collection will probably increase due to the "visibility" gained in the online union catalog. oclc the oclc database provides the largest online union catalog through a holdings record listed with each catalog entry. the ill message system transfers records from the database to the lending library in a request form, automatically sends the request to up to five libraries, generates records on the status of each request, and provides statistics on ill transactions. oclc ill transactions are generally faster than traditional methods of interlibrary loan because of the ability to move data directly from the online union catalog to the request form without re-typing and the ability to have requests automatically forwarded if a library is unable to fill the request immediately. oclc's ill subsystem has been in operation for a year and participating libraries have reported general satisfaction with its performance. rlg/rlin the rlin database provides an online union catalog through a holdings record listed with each catalog entry. materials not located in the rlin database may be referred to the bibliographic center at yale university for further manual searching through printed union catalogs. the rlg message system may be used to create and send ill requests to other rlg libraries, though this system is not specifically designed as a comprehensive ill support system.
the shared resources program committee has recently formed a task force charged with the responsibility to create a functional specification for an automated interlibrary loan system, and to determine the priority for its implementation. rlg resource sharing policy requires members to give priority to ill requests from other rlg members, to suspend fees to members, to provide on-site access to users from members' libraries' institutions, and to provide free photocopies of non-circulating materials. wln the wln database provides an online union catalog through a holdings record listed with each catalog entry. this online union catalog includes the local library call number and, for serials, the specific holdings of the library. the wln resource directory is a microfiche listing of the bibliographic and holdings information in the database. wln offers no message switching system for ill, though this is their highest priority for future development. in cooperation with pacific northwest bibliographic center, wln is planning experiments with a message switching system for interim use until the comprehensive ill system is developed. cooperative acquisitions cooperation in purchasing library materials is done in order to minimize the duplication of expensive purchases and to ensure that important works are easily available to users of the library, whether they are actually owned or not. oclc member libraries may search the database to determine the holdings of particular items by other member libraries, in order to avert undesirable duplicative purchases. rlg/rlin members actively coordinate purchases of certain categories of materials in designated fields in order to avoid extensive duplication and to ensure that at least one copy of every item of research value be acquired by a member institution. in support of this effort is an automated "cooperative purchase file," containing limited bibliographic information and acquisition decisions of rlg members for all new serials on order and for all expensive items ($500 or more). member institutions agree to develop conspectuses reflecting their level of holdings and development in certain fields (subjects, language, and formats). these conspectuses are time-consuming to develop. a survey of holdings in chinese, japanese, and korean languages has been finished by 12 members. older members have completed language and literature, fine arts, philosophy, and religion. history is expected by march 1981, to be followed by the hard sciences. based upon these conspectuses, rlg members will build a system-wide collection development policy. new members are expected to begin work on their conspectuses as soon as possible, but not necessarily immediately after joining rlg. wln members may search the database to determine the holdings of particular items by other member libraries, in order to avert undesirable duplicative purchases. libraries may also search the in-process file to determine if items are on order by one of the 23 libraries using wln's acquisitions subsystem. support for collection development activities a bibliographic utility is potentially useful for collection development in that it provides a large file of bibliographic records that may be searched to assist in a) determining the existence of published materials in specified categories (on a particular subject, by a particular author, in a particular series, for example), and b) obtaining
correct bibliographic information about specific items to assist in ordering them. important features in a utility in this regard are database size and variety of access points (subject, author, series titles, etc.). oclc useful access points by which the database may be searched include: • personal author • corporate author • title • series title • variant names (e.g., clemens or twain) • conference names. the database must be searched using a "search key" (a code based upon a sequence of initial letters in the words to be searched), not real words. rlg/rlin useful access points by which the database may be searched include: • personal author • corporate author • conference names • title • series title • subject heading or call number range (excluding items cataloged by the library of congress) • publisher, using a truncated isbn (international standard book number) [restricted to items cataloged by the library of congress]. a search of rlin is likely to produce multiple records for particular items because an item held by more than one member will be displayed for as many libraries as have cataloged it through the system. it is projected that by april 1981, rlin's "reconfigured database" will have solved that problem by attaching holdings information to one unified record. it will also have merged the two bibliographic subfiles (library of congress and member cataloging) so that access by subject heading, call number range, and isbn will be available for the entire database. wln useful access points by which the database may be searched include: • personal author • corporate author or corporate author keyword (keyword searching permits the user to search for items using either the full heading, american society for information science, or words from the heading, "society" and "information"; this capability is useful when the complete phrase is not known) • title • corporate or conference author/title series (keyword) • series title or truncated series title • subject heading and/or subdivision or truncated subject heading • corporate and conference name subject headings (keyword). preservation of library materials all bibliographic utilities, because of their function as a union catalog of their members' machine-readable cataloging information, have some usefulness for libraries making decisions about preservation priorities. a library may, for example, choose to give preservation treatment to item a rather than item b because item b is owned by several other libraries in the vicinity, whereas item a appears to be unique. it must be remembered, however, that many older items will not appear at all, because they were cataloged long before the utilities came into existence. oclc members may search holdings information in the database to determine the relative rarity of an item that is a candidate for preservation treatment. rlg/rlin members may search holdings information in the database to determine the relative rarity of an item that is a candidate for preservation treatment. a computerized list of members' micropreservation activities is provided. experimental programs are conducted to test new preservation technologies and applications of existing processes. preservation microfilming is being done for members by staff at yale and princeton. funds are provided to members for preservation activities. these activities are part of rlg's preservation program, one of its four major programs.
wln members may search holdings information in the database to determine the relative rarity of an item that is a candidate for preservation treatment. technical processing acquisitions the steps by which the library purchases books and other materials include: 1. pre-order searching to determine that a requested item is not already owned by the library or on order. 2. selecting a dealer likely to be able to supply the desired item. 3. placing the order. 4. receiving the item. 5. clearing the order records. 6. processing the invoice for payment. 7. maintaining precise accounting of all book funds. 8. inquiring about the status of items which are not received when expected. 9. cancelling orders and adjusting accounting records when items are not available. at the uo most acquisitions forms and files are created and maintained manually. in an automated acquisitions system the placing of the initial order generates an acquisition record for each item, which is updated as the item moves through the cycle outlined above. this eliminates the need for maintaining separate files according to the status of an order. oclc operational. oclc has an online name-address directory which presently can be searched while using other oclc subsystems. this file contains information about publishing, educational, library, and professional organizations and associations. this information will be automatically transferrable to forms being produced online. planned. oclc's acquisitions subsystem, which is presently being tested by selected member libraries, is projected to be generally available in spring 1981. when operational the acquisitions subsystem will permit users to: place orders for all types of bibliographic materials (forms generated will be sent directly to the supplier with a copy to the library); renew subscriptions; request publications or price quotations; create deposit account orders; send prepaid orders; cancel orders; create and adjust fund records; and receive periodic fund reports. rlg/rlin operational. rlin does not have an operational acquisitions subsystem. stanford university is continuing to use a system developed as part of ballots. planned. the rlg board of governors has approved functional specifications for an acquisitions subsystem to be introduced in two phases. by june 1981, rlin plans to have a centralized in-process file which will contain records of all new orders, gifts, subscriptions, etc. of members, and will be able to support non-accounting aspects of the acquisitions process. the capability to store and maintain an online book fund accounting system will be achieved in october 1981. rlin expects to be able to support all files, processing, and products necessary to establish, coordinate, and monitor materials acquisitions from the point of selection decision, request, order, or receipt through completion of technical processing activity. wln operational. wln's acquisitions subsystem, which has been operational since may 1978, is comprised of four files: 1. in-process file which supports the majority of acquisitions activities. 2. standing orders file which has records for subscriptions and other items which are renewed or reordered on a continuing or periodic basis. 3. name and address file which contains names and addresses of book dealers and other vendors, main libraries, branch libraries, etc. 4. account status file which provides the capability to maintain up-to-date accounting.
information keyed into the terminal during the day is entered against the accounts nightly and is reflected in the account totals available online the following day. records of completed transactions are transferred to a magnetic tape history file and can be used for generating statistical and other reports. with each step of the order cycle, appropriate forms and reports are generated. special system reports reflecting the status of the four files may be generated on request. instructions entered at the time of the initial order provide for automatic generation of notification forms for individuals requesting the specific item being ordered or inquiry notices for materials not received after a specified period. planned. further refinements of the procedures and capabilities of the system. cataloging the creation of a cataloging record involves: i. describing an item 2. assigning headings for names of persons or organizations and titles by which the user might be expected to seek the item in the catalog 3. assigning a unique call number which will place the item with others of a similar nature, and 4. assigning subject headings which reflect the content of the item. because most libraries collect many of the same materials, the concept of sharing the responsibility for cataloging was developed which makes materials available more quickly at reduced cost. with the establishment of national and international cataloging rules and standards, and the growth of large online computerized databases, it is becoming increasingly feasible to have each item cataloged only once with that cataloging information available for all libraries to use. the library of congress catalogs approximately 250,000 titles per year into machine-readable form . this cataloging is available through each of the bibliographic utilities and may be used for the creation of local catalogs. when the library of congress has not yet cataloged a specific item, a utility member library may prepare the cataloging according to specified standards and enter its cataloging into the database for use by other member libraries and for its own catalog. another aspect of the cataloging activity is the creation of a local database which can be used as the basis of not only the local library catalog, but also of a local circulation, acquisitions, and serials system, as well as for regional union catalogs. in order to provide total access to a library's collection in this machine-readable database, information concerning every item in the library must be entered into the system. this process is called retrospective conversion. during the retrospective conversion process the library can choose to eliminate existing inconsistencies in the treatment of library materials including reclassifying books so that most materials are retained in one main classification system. the university of oregon library has as a long-term goal completing total retrospective conversion of its collection so that all materials can be searched and located in an online catalog. oclc operational. oclc's online cataloging subsystem has been operational since 1971. based on the experience of similar libraries, the university of oregon library might expect to find entries in oclc's database for over 90 percent of the items searched . • these cataloging records can be modified online or accepted as is. the local library's symbol is added to indicate that it has used the cataloging record and then presorted, alphabetized catalog cards are ordered. 
the cards are printed overnight and shipped on a daily basis. many oclc libraries print their call number labels by means of a printer attached to their terminal. once a cataloging transaction has been completed, it is not possible to retrieve your local modifications online in the oclc system. the record of your transaction is stored and sent to your library on magnetic tape on a periodic basis. these magnetic archive tapes can be used by a vendor or local computing center to generate a local microform or online catalog, run a circulation system, etc. it is presently possible to catalog most types of materials in the oclc system including books, serials, microforms, motion pictures, music, sound recordings, maps, and manuscripts. increased emphasis has been placed on quality control and adherence to specified standards in the creation of cataloging records, but there is no official editing of cataloging records by oclc staff. in 1979-80 nearly 45 percent of the activity on oclc's cataloging subsystem was related to retrospective conversion. oclc's large database, extended hours of service, and special pricing schedules for retrospective conversion and reclassification make it attractive for these activities. oclc charges 60 cents per retrospective conversion record during hours of peak system activity (prime time) and five cents per retrospective conversion record during less busy hours (non-prime time). planned. oclc continues to explore means of improving quality control. after moving their central facility to new quarters in early 1981, oclc will reconsider the possibility of storing and displaying the number and location of local copies of a title. rlg/rlin operational. at this time the university of oregon might expect to find cataloging available for 70 to 90 percent of its ongoing work in rlin (see the note on search hit rates at the end of this section). a search of rlin's database retrieves multiple records because each library's records are stored separately. the library selects the desired record, modifies or accepts it, enters the library's symbol, and orders cards which are printed nightly and sent in presorted, alphabetized batches. no call number labels are produced, and it is not presently possible to print labels from the terminal. local library modifications are accessible online. magnetic tapes of cataloging transactions may be purchased and used to create local online or microform catalogs. most materials may be cataloged with rlin including books, serials, microforms, motion pictures, music, sound recordings, and maps. member libraries agree to catalog in conformity with rlin standards, but there is no formal editing of records by rlin staff on an ongoing basis. sample quality checking is the responsibility of a newly-created position of quality assurance specialist. with only 23 owner-members, rlg must carefully consider the impact on the system of allowing individual members to undertake retrospective conversion projects. each project must be approved by the board of governors, and members are encouraged to seek outside financial support rather than asking rlin for reduced rates. rlin has just received a $1.25 million grant including $600,000 to support retrospective conversion projects. rlin does not charge for retrospective records which are completely recataloged and upgraded with the book in hand. the prices for other levels of retrospective conversion cataloging range from fifty-five cents to $1.85 per record. planned.
in april 1981, rlin plans to reformat its database so that there will be only one copy of each cataloging record. member libraries' symbols and local cataloging information will be displayed with the appropriate records. wln operational. based on the experience of others, the university of oregon library might currently expect to find cataloging records available for 50 to 70 percent of its ongoing work in the wln database (see the note on search hit rates at the end of this section). libraries search wln's database, accept or modify the cataloging records, and order cards and labels which are printed nightly and shipped weekly. (card sets are not presorted for filing.) local cataloging information is accessible online through the library's wln terminal. magnetic tapes of a library's cataloging transactions may be purchased to run a local online or microform catalog. wln also provides microform catalogs on either microfilm or microfiche. books, serials, and audio-visual materials, but not music, sound recordings, and maps, may be cataloged on wln's system. libraries cataloging in wln must conform to well-defined wln standards. new cataloging records go through an edit cycle and are reviewed by central wln staff before being added to the wln database. presently this review takes about two weeks. during this period, the cataloging record may not be retrieved online. the wln batch retrospective conversion subsystem has been operational since august 1980. using this system a library enters brief cataloging records which are collected by the system and searched later as a unit through the wln database. records for which a match is found are billed at six cents. records not matched are billed at one cent and may be searched again at a later date. over 30 wln libraries are using this capability, which can be made available to non-members under special circumstances. planned. wln is considering dispersing among selected member libraries responsibility for editing member-created cataloging records. wln will make music cataloging available within the near future. a note on search hit rates: a wide range of success rates for searching each system is cited in the literature, each dependent on the sample procedures used. the university of oregon library had 100 items searched against each database. this sample excluded books with printed library of congress card numbers, and included books, serials, microforms, music scores, recordings, documents, and non-book materials. of this sample oclc found 96, rlin found 65, and wln found 38. the range of figures cited in this report allows for variation between studies cited in the literature, word-of-mouth reports from librarians using these systems, and the university of oregon library's own sample. an analysis of this sample is being prepared. recent comparisons of searching success are found in the following: linking the bibliographic utilities: benefits and costs, submitted to the council on library resources ... by donald a. smalley [and others], columbus, ohio: battelle, 1980; matthews, joseph r., "the four online bibliographic utilities: a comparison," library technology reports 15:6 (november-december 1979), p. 665-838; tracy, joan i. and remmerde, barbara, "availability of machine-readable cataloging: hit rates for ballots, bna, oclc, and wln for the eastern washington university library," library research 1:3 (fall 1979), p. 227-81.
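the per-record conversion charges and the hit rates quoted above differ enough across the three utilities that a small worked example may be useful. the sketch below is illustrative only: it uses the prices reported in this section (oclc at 60 cents prime time and five cents non-prime, wln at six cents per matched record and one cent per miss, and the midpoint of rlin's 55-cent to $1.85 range), together with the hit rates from the university of oregon sample in the note above. the function names, the 100,000-title backlog, and the assumption of one search per title are ours, not the utilities'.

```python
# illustrative cost sketch only; prices and hit rates are taken from the figures
# quoted in this report, and the simplifying assumptions are our own.

def oclc_cost(titles, hit_rate, non_prime_share=1.0):
    """oclc: 60 cents per converted record in prime time, 5 cents non-prime."""
    hits = titles * hit_rate
    return hits * (0.60 * (1 - non_prime_share) + 0.05 * non_prime_share)

def rlin_cost(titles, hit_rate, per_record=(0.55 + 1.85) / 2):
    """rlin: 55 cents to $1.85 per record; the midpoint is assumed here."""
    return titles * hit_rate * per_record

def wln_cost(titles, hit_rate):
    """wln batch conversion: 6 cents per matched record, 1 cent per miss."""
    hits = titles * hit_rate
    return hits * 0.06 + (titles - hits) * 0.01

if __name__ == "__main__":
    titles = 100_000  # hypothetical retrospective conversion backlog
    print(f"oclc (96% hit, all non-prime): ${oclc_cost(titles, 0.96):,.0f}")
    print(f"rlin (65% hit, midpoint price): ${rlin_cost(titles, 0.65):,.0f}")
    print(f"wln  (38% hit, batch rates):   ${wln_cost(titles, 0.38):,.0f}")
```

such a comparison only captures the utilities' charges; records that miss in any database must still be cataloged or keyed locally, which is usually the larger cost.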
serials check-in serials are publications issued in successive parts bearing numerical or chronological designations which are intended to be continued indefinitely. they include periodicals; newspapers; annual reports and yearbooks; journals, memoirs, proceedings, and transactions of societies; and numbered series. the average research library will have between 15,000 and 20,000 such titles. precise data must be maintained to enter each issue as received, to discover missing issues, to request replacements for missing issues, to monitor accounting information, to renew or cancel subscriptions, and to maintain binding information. serials files contain such information as title, relationship to earlier publications, name and address of publisher, volumes the library owns, call number and location, date, volume, and number of each issue, date each issue was received, subscription dates, price, etc. at the university of oregon library all of this information is maintained in manual files. once the serials check-in operation is computerized, it is possible to generate a wide variety of serials finding lists, analyses of serials subscriptions by subject, location, department, etc., and to provide current serials information online. oclc operational. oclc introduced its serials control subsystem in 1976 and improvements to the system in 1979. participants create online local data records with information necessary to monitor and control each issue of each serial received by the library. institutions can check in currently received issues online. a recent ancillary to this system is the ability to create and maintain online a cooperative record of serials owned by any group of institutions (a union list of serials). planned. oclc plans to continue upgrading the capabilities of its serials control subsystem as needed. rlg/rlin operational. none. planned. automated serials check-in is one of several items listed for consideration after current development activities are released, probably in late 1982. no resources are presently committed to this project. wln operational. while wln has no current serials check-in capabilities, it does support maintenance of serials subscriptions in the acquisitions subsystem, including automatic renewal and reorder reminders. wln also produces union lists of serials. planned. wln is investigating existing commercially-created check-in systems to see whether they can purchase an existing system to incorporate into wln's services. management information precise up-to-date information concerning library operations can be very useful in planning improvements in library services and in attaining efficient utilization of available personnel, resources, and materials. without the computer, the laborious record-keeping necessary to obtain useful management information almost negates the benefits of having the information. oclc operational. oclc produces cataloging, interlibrary loan, and serials check-in system use and system performance statistics on a regular basis. libraries can make local arrangements to create additional analyses of the information stored on subscription archival tapes of their local cataloging activity. oclc offers semimonthly, monthly, or quarterly accession lists of new materials cataloged by each library. these lists may be in call number or subject sequence. oclc has produced some special studies for institutions based on their cataloging records. planned.
when the acquisitions subsystem is operational, libraries may choose to receive a cumulative, monthly fund activity report and a periodic, cumulative fund commitment register. these reports will provide institutions with current financial control data. oclc plans to continue to develop its ability to provide management information. rlg/rlin operational. system use statistics are provided in the form of the monthly invoice, which may be used to monitor cataloging and public service activity, and may be broken down into appropriate accounts by pre-planning. lists in call number order of materials cataloged by a library into rlin could be produced from local printers attached to the terminal. planned. the generation of management information is a future development project; no special management reports are prepared presently. among the management reports included in the specifications for the acquisitions subsystem, projected for implementation by october 1981, are status reports on in-process files, materials awaiting receipt, materials received, and book fund balances. wln operational. wln produces aggregate system activity reports monthly, but does not analyze the cataloging activity or subject holdings. wln's acquisitions subsystem can be used to produce acquisitions-related management reports concerning account transactions, account history, standing orders, renewals and reorders, receipts, detailed encumbrances, etc. a microform accession list by title is available. a general-purpose text-editing facility may be used by management to maintain data not derived from wln operations and to produce formatted reports of this data. planned. wln is developing the capability to store and maintain detailed collection information for each library online, including copy numbers and location symbols for each copy of a title owned by a library. no specific management information plans have been outlined at this point. public services reference use of the utility's terminal a bibliographic utility has potential for use in library reference services in three major areas: 1. verification of bibliographic information. the utility's database may be searched for cataloging information not in the uo library catalog. a verification search is made to locate a complete catalog description of a specific, known item and is carried out most easily using one of the unique numbers assigned to a publication (library of congress card number, international standard book number, etc.). if one of these is not known, a combination of author and title words, or a "search key" based on author and title, is used to retrieve the information. verification places a greater reliance on the quality of bibliographic information in the utility's database than on search techniques used to locate the information. 2. compilation of subject bibliographies. the utility's database is searched through words in the titles and subject headings in a bibliographic record in order to produce a list of materials on a given subject. this subject query can be modified using the logical relationships and, or, and not to indicate, respectively, limitations, synonyms, or exclusions in the search. the ability to obtain a printed list of references is convenient, if not required. 3. compilation of author bibliographies. the database is searched to find all material created by a particular individual or corporate body.
the size of the utility's database is a major consideration, as is the source of the cataloging found in an author search. again, a printed list is necessary. oclc the oclc database can be searched in a variety of ways to support reference services, though there is no subject search capability in the system. (a search key is a code based on a certain number of characters drawn from a particular element in the bibliographic reference. for instance, to find a record for william manchester's american caesar, an author/title search key using the first four letters of the author's name and the first four letters in the title would be manc,amer. various combinations of letters are used to search author names, titles, or author/title combinations. a search key may not necessarily be unique to a given item, and may retrieve other items beside the one desired.) the following access points may be used in a search: 1. lc card number 2. international standard book number (isbn) 3. international standard serial number (issn) 4. coden (an abbreviation developed by chemical abstracts service for designating periodical titles) 5. government documents number 6. oclc identification number 7. personal author (search key, not full words) 8. corporate author (search key) 9. performer (search key) 10. title (search key) 11. author/title (search key) 12. series title (search key) 13. variant names (search key) 14. conference names (search key). searches may be restricted by year or by type of material, such as books, manuscripts, maps, etc. the logical operators and, or, and not are not used in oclc. the oclc search system is primarily based on search keys and is best utilized to locate a known item. local printing is available on any oclc terminal so equipped. there is one standard print format offered. rlg/rlin the following access points may be used in a search of the rlin database, though not all are currently active in each subfile of the database: 1. lc card number 2. isbn 3. issn 4. coden 5. government documents number 6. rlin identification number 7. call number (complete or truncated) 8. recording label number 9. personal author 10. corporate authors or conference names (keyword or phrase) 11. title words 12. subject headings (keyword or phrase) 13. music publisher. truncation (searching of partial entries) is available to aid in searching incomplete entries, and the logical operators and, or, and not may be used to broaden or restrict a search. local printers may be attached to the rlin terminals. a variety of print formats is offered. plans include unified search access points for all subfiles of the database as of april 1981. wln the following access points may be used to search the wln database: 1. lc card number 2. isbn 3. issn 4. wln identification number 5. personal author 6. corporate authors or conference names 7. title words 8. series title (complete or truncated) 9. corporate or conference author/title series (keyword) 10. subject headings (complete or truncated). for a variety of reasons, the wln search system is the most powerful of the three utilities. truncation is available and the logical operators and, or, and not may be applied to broaden or restrict a search. records may be printed locally in a variety of formats on any wln terminal so equipped. wln will also provide printing at the central computer for reference bibliographies. wln search software may be purchased for local database management applications (see the section on online public catalogs).
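because the derived search key is central to how oclc searching is described above, a short sketch may make the idea concrete. this is an illustration of the manc,amer example only, not oclc's actual key algorithm; the real system used several fixed key formats and its own normalization and stop-word rules, and the function name derive_author_title_key below is our own.

```python
import re

def _prefix(text, length):
    """keep letters and digits only, then take the first `length` characters."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    return cleaned[:length]

def derive_author_title_key(author_surname, title, author_len=4, title_len=4):
    """build a derived author/title search key like the manc,amer example.

    illustrative only: the production key formats and their handling of
    leading articles and punctuation differed in detail.
    """
    # skip leading articles in the title, as most derived-key schemes did
    words = [w for w in title.lower().split() if w not in ("a", "an", "the")]
    first_title_word = words[0] if words else ""
    return f"{_prefix(author_surname, author_len)},{_prefix(first_title_word, title_len)}"

# william manchester's "american caesar" -> "manc,amer"
print(derive_author_title_key("Manchester", "American Caesar"))
```

the trade-off the report notes follows directly from this design: a short fixed-length key is quick to type and index, but it is not guaranteed to be unique, so a search may retrieve several items besides the one wanted.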
links to other computerized services there are presently over 150 reference databases available through commercial computerized reference service vendors. during the last ten to fifteen years, standard bibliographic indexing and abstracting publications such as chemical abstracts, historical abstracts, and dissertation abstracts international have used computerized methods to organize and print references to periodical articles, reports, dissertations, conference papers, etc. the vendor creates a computer-searchable version of the reference database and makes it available to libraries for a fee based on their use of the computerized search system. membership in a bibliographic utility can provide two benefits in the use of other computerized reference services: 1. discounts on fees through membership in a large group contract administered by the utility. 2. access to the reference vendor's computer through the utility's terminal and communication network. oclc oclc's affiliated online services program provides access at discounted rates to the information services of bibliographic retrieval service (brs), lockheed information systems (lis), and the new york times information bank. oclc's communications network does not yet permit users to link to the hosts using an oclc terminal, though this capability is anticipated in the near future. rlg/rlin rlin does not offer a formal program in this area, though the rlg 40 terminal is compatible with other information retrieval systems. wln wln does not offer a program in this area, but anticipates offering access to brs, lis, and the new york times information bank. circulation none of the bibliographic utilities under consideration currently support circulation functions on their computers. however, each system can provide a machine-readable archive tape of our cataloging information to be used in developing a computerized circulation system. in order to keep track of circulation transactions, it is necessary to have complete retrospective conversion of the uo library catalog. another important consideration is the transferability of data between the utility's computer and the circulation computer. oclc oclc anticipates offering support for local circulation systems on their computer for member libraries and will demonstrate their system in mid-1981. oclc data has been successfully transferred to many local circulation systems. rlg/rlin rlin does not anticipate offering local circulation services for member libraries. rlin data has been successfully transferred to several local circulation systems. wln wln does not anticipate offering local circulation systems on their computer for member libraries. wln data has been successfully transferred to local circulation systems and an agreement has been reached with dataphase, a computerized circulation system vendor, to discount purchase of their system by wln member libraries.
oclc oclc does not currently encourage public access to their database and does not support use of local online catalogs on their computer due to the tremendous demand for computer resources exerted by 2,400 member libraries. oclc and rlg/rlin are participating in a study of user requirements for a public online catalog. oclc data has been successfully transferred to several local online catalogs, including eugene public library's circulation and online catalog system, ulisys. rlg/rlin rlin anticipates being able to offer public access to their database. they are participating in a study with oclc of user requirements for such a system, but no date has been announced for the development of this capability in rlin. rlin data has been successfully transferred to a local public online catalog at northwestern university. wln wln does not believe that a local online patron-accessed catalog should be provided through the wln computer, even though they anticipate having such a capability within one year. instead, they encourage libraries to develop local systems for public access to the online computerized catalog and to obtain data from the wln cataloging system. the university of illinois is adapting the wln computer search and database management software to provide a local online catalog and computer-assisted instruction in its use for the public.
checklist for cassette recorders connected to crts prepared by lawrence a. woods, purdue university libraries, west lafayette, indiana, for the technical standards for library automation committee, information science and automation section, library and information technology association. introduction a data cassette recorder connected to a printer port is an effective, low-cost method of collecting data in machine-readable form from display terminals such as the oclc 100/105. it is important that a data recorder be used rather than an audio recorder, although the cassette itself can be a good-quality audio tape. it is also important to note that the data recorded on the tape are not the same as the data originally transmitted to the display terminal, but are simply a line-by-line image of what appears on the screen. a typical installation will have a minimum of two devices: one attached to the display terminal to collect data, and one attached to a printer or an input device to another computer for playback of the data. there are more than 150 various data re
editor's comments bob gerrity library discovery circa 1974 our ongoing project to digitize back issues of information technology and libraries (ital) and its predecessor, journal of library automation (jola), provides frequent reminders of what's changed (and what hasn't) in library technology in the past several decades. the image above is from a 1974 advertisement in jola for the "rom ii book catalog on microfilm" from information design in menlo park, ca. the ad copy speaks for itself: all the advantages of a printed book catalog…none of the disadvantages. your staff and patrons can use the catalog simultaneously in many different locations. the user can scan a number of related titles on the same page, in contrast to the one-at-a-time viewing of catalog cards in trays. manual filing routines and maintenance are eliminated. easy to use…requires no instruction. an automatic index pointer shows your patron his position in the file. at the touch of a button he can scan forward or back at high speed. average look-up time is about twelve seconds. a staff member can insert an updated catalog totally cumulated on a single reel of microfilm in about one minute. your patrons never touch the film—your complete library catalog "locked-in"! bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia. my favorite bit is the sign on the front of the machine, proudly proclaiming: these are all the books in the library. this month's issue of ital looks at the current state of library discovery from a number of angles. will owen and sarah michalak describe efforts at unc chapel hill and partners within the triangle research libraries network to enhance the utility of the library catalog as a core tool for research, taking advantage of web-based search technologies while retaining many of the unique attributes of the traditional catalog. joseph deodato provides a useful step-by-step guide to evaluating web-scale discovery services for libraries. david nelson and linda turney analyze faceted navigation capabilities in library discovery systems and offer suggestions for improving their usefulness and potential. julia bauder and emma lange describe a new approach to subject searching, using an interactive, visual approach. yan quan liu and sarah briggs report on the current state of mobile services among the top 100 us university libraries. unrelated to discovery but certainly relevant to issues around library provision of access to information, jill ellern, robin hitch, and mark stoffan report on user authentication policies and practices at academic libraries in north carolina.
today's large academic libraries struggle, there is, nonetheless, room for criticism of library priorities. this study must be viewed as only a first step (largely tentative and exploratory) in relating automation with service attitudes. it suggests that online systems may be associated with managers more positive in their view of the management role and more positive in their attitudes toward users than batch- and manual-system managers. further research would be useful at this point to compare levels of automation (manual, batch, and online) with circulation-staff service attitudes or those of patrons using the systems. references 1. laurence miller, "changing patterns of circulation services in university libraries" (ph.d. dissertation, florida state university, 1971), p.iii. 2. ibid., p.149. 3. robert oram, "circulation," in allen kent and harold lancour, eds., encyclopedia of library and information science, v.5 (new york: marcel dekker, 1971), p.1. 4. william h. scholz, "computer-based circulation systems: a current review and evaluation," library technology reports 13:237 (may 1977). 5. robert oram, "circulation," p.2. 6. james robert martin, "automation and the service environment of the circulation manager" (ph.d.
dissertation, florida state university, 1980), p.22. statistics on headings in the marc file sally h. mccallum and james l. godwin: network development office, library of congress, washington, d.c. in designing an automated system, it is important to understand the characteristics of the data that will reside in the system. work is under way in the network development office of the library of congress (lc) that focuses on the design requirements of a nationwide authority file. in support of this work, statistics relating to headings that appear on the bibliographic records in the lc marc ii files were gathered. these statistics provide information on characteristics of headings and on the expected sizes and growth rates of various subsets of authority files. this information will assist in making decisions concerning the contents of authority files for different types of headings and the frequency of update required for the various file subsets. the national commission on libraries and information science supported this work. use of these statistics to assist in system design is largely system-dependent; however, some general implications are given in the last section of this paper. in general, counts were made of the number of bibliographic records, headings that appear in those records, and distinct headings that appear on the records. the statistics were broken down by year, by type of heading, and by file. in this paper, distinct headings are those left in a file after removal of duplicates. distinctness will not be used to imply that a heading appears only once in a source bibliographic file, although distinct headings may in fact have only a single occurrence. thus, a file of records containing the distinct headings from a set of bibliographic records is equivalent in size to a marc authority file of the headings in those bibliographic records. methodology these statistics were derived from four marc ii bibliographic record files maintained internally at lc: books, serials, maps, and films. the files contain updated versions of all marc records that have been distributed by lc on the books, serials, maps, and films tapes from 1969 through october 1979, and a few records that were then in the process of distribution. the files do not contain cip records. a total of 1,336,182 bibliographic records were processed, including 1,134,069 from the books file, 90,174 from the serials file, 60,758 from the maps file, and 51,176 from the films file. a file of special records, called access point (ap) records, was created that contains one record for the contents of each occurrence of the following fields in the bibliographic records:
type of heading       heading fields
personal name         100, 700, 400, 800, 600
corporate name        110, 710, 410, 810, 610
conference name       111, 711, 411, 811, 611
topical subject       650
geographic subject    651
uniform title         130, 730, 830, 630
only the 6xx subject fields that contained lc subject headings (i.e., second indicator = 0) were selected as ap records. the main entry data string was substituted for the pronoun in the series (4xx) fields that contained pronouns. the ap records also contained information from the bibliographic records that assisted in making the counts, such as the date of entry of the record on the file, the identity of the type of bibliographic file, and the language of the bibliographic record. a third file was derived from the ap file that contained a normalized character string for each ap record heading.
these normalized ap records were used to produce the counts of distinct headings by clustering like data strings. normalization included conversion of all characters to uppercase, and masking of diacritics, marks of punctuation, and other characters that do not determine the distinctness of a heading, but would interfere with machin~ determination of uniqueness. the subhelds included in the normalized string, hence used for all heading comparisons, are given below. only use-dependent subfields, such as the relator subfield, and those that belonged to title clusters in author/title headings were excluded. examples of the ap file field contents and the normalized forms are: ap field contents: chuang-tzu chuang-tzu [blaeu,joan] 1596-1673 blaeu, joan. 1596-1673 blaeu,joan, 1596-1673 byron, george gordon noel byron, baron, 1788-1824 byron, george gordon noel byron, baron, 1788-1824 byron, george gordon noel byron, baron, 1788-1824 byron, george gordon noel byron, baron, 1788.1824 communications 195 normalized forms: chuang tzu blaeu joan 1596 1673 byron george gordon noel byron baron 17881824 distinct headings for this study were determined by comparing on the following subfields: type of heading personal name corporate name sub fields a, b,c,d a, b, k, f, p, s, g conference name a, q, e topical subject a, b, x, y, z geographic subject a, b, x, y, z all occurrences of repeating subfields were included. the relator data of subfields were dropped from personal and corporate name headings as were the title subfields in author/title headings. a separate study will examine the occurrence of author/title headings. approximately 8 percent of the name headings in the files carry title subfields: 6 percent are series and 2 percent are author/title subjects or added entries. two types of distinct heading counts were generated for topical and geographic subject headings. one takes account only of main terms, the a and b subfields, excluding all subject subdivisions. the other compared the complete heading strings, including subject subdivisions. characteristics of the files the four bibliographic files from which the statistics were derived were begun in different years and are of unequal size. table 1 presents the number of bibliographic records added to each of the marc files by the year that the record was first entered into the file. the records added in the first months of 1979 have been eliminated from tables 1-3, thus the total number of records under consideration is 1,210,809. in the combined file, the records for books dominate the contributions from other forms of materials, representing 85 percent of the combined file records. after the addition of the films and serials records in 1972 and 1973 the total number of records added each year leveled off to around 115,000 but jumped to an average of slightly more than 150,000 records per year following the ad196 journal of library automation vol. 14/3 september 1981 table 1. number of records added to each file by year year entered book serial map film total 1968 11,812 0 0 0 11,812 1969 43,874 0 1, 104 0 44,978 1970 86,004 0 3,467 0 89,978 1971 105,390 0 8,857 6,280 114 ,247 1972 73,437 0 4,665 6,280 84,382 1973 92,512 3,720 5,566 8,929 110,727 1974 99,004 10,682 6,246 8,457 124,389 1975 86,527 15,866 6,721 8,604 117,718 1976 120,106 19,098 6,876 5,432 151,512 1977 140,011 17,999 7,011 4 ,797 169,818 1978 169,044 12,643 5,584 4,464 191,735 total 1,027,721 80,008 56,117 46,963 1,210,809 table 2. 
numbers of headings and distinct name headings added to all files by year number of headin gs number of distinct headin gs year personal corporate conference personal corporate conference entered names names names names names. names 1968 14,526 3,138 155 12,620 2,139 143 1969 53, 134 21,206 1,027 39,184 9,364 909 1970 104,365 42 ,798 2,175 63,037 14,286 1,769 1971 129,617 57,496 2,742 64,029 15,216 2,158 1972 91,040 45,768 1,942 41,246 9,891 1,402 1973 118,188 57,847 2,625 48,703 12,653 1,862 1974 127,588 73,303 2,972 51,623 17,129 1,983 1975 113,622 76,417 2,519 50,291 18,135 1,742 1;}76 154 ,7 18 88,207 3,454 73,182 23,120 2,306 1977 182,860 87,985 3,487 89,353 23,906 2,333 1978 218,535 97,042 4,192 99,780 24,280 2,831 total 1,308, 193 651,207 27,290 633,048 170, 119 19,438 table 3. numbers of subject headings and distinct subject headings added to all files by year number of distinct headings number of headings first terms only full headings year topical geographic topical geographic topical geographic entered subjects subjects subjects subjects subjects subjects 1968 10,615 1,857 4,390 489 7,775 1,512 1969 45,161 9,047 8,104 1,980 23,617 5,426 1970 89,304 21,054 8,170 4,263 34 ,526 10,179 1971 115,220 31,278 6,853 5,417 36,689 12,862 1972 92,247 20,760 4,236 2,597 26,201 7,074 1973 121 , 161 27,890 4,460 3,105 33,061 9,819 1974 137,843 31,814 4,524 3,553 39,262 11 ,4 13 1975 130,980 30,650 4,203 3,417 40,129 11 ,818 1976 168,840 39,886 5, 125 4,142 55,468 15,472 1977 185,331 44,973 5,718 4,194 59,529 16,676 1978 222,565 49,923 7,151 4,034 69,856 17,855 t otal 1,319,267 309,132 62,934 37,191 426, 113 120, 106 clition of major non-english roman alphabet language records in 1976. the increase is noticeable primarily in the books and serials files since the maps file had been adding those languages since 1969 and only a limited number of non-english-language audiovisual materials are cataloged. the unusually large number of records added to the books file in 1971 resulted from a special project to add retrospective titles to the file. the large increase in books records in 1978 was due to the co marc project in which retrospective lc records that had been converted to machine-readable form by other libraries were contributed to the lc marc file. approximately 12,000 comarc records were added in 1977 and 28,000 in 1978. the fall in numbers of film records produced in 1976-1978 reflects a general fall in production of instructional films in the united states. counts of items cataloged that are compiled by lc processing services from catalogers' statistics sheets show that lc cataloged approximately 225,000 titles in 1978; thus, approximately 73 percent of lc cataloging is currently going into machinereadable form. the principal exclusions are records for most nonroman material (only nonroman records for maps have been transliterated and added since 1969) and a few records for music, sound recordings, incunabula, and microforms. the portion being put into machine-readable form should rise significantly as the romanized records for items in several nonroman alphabets are added in the next year. name headings table 2 presents the number of occurrences of name headings in the marc bibliographic files and the number of distinct name headings, both by type of heading and by year. the number of distinct headings that were new to the file in a year was determined by comparing the headings added in a given year against those added in all previous years. 
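the distinct-heading counts rest on two operations described above: normalizing each heading string and then checking whether the normalized string has already appeared in any earlier year. the sketch below reuses the simplified ap-record shape from the earlier sketch; the string cleanup is only an approximation, since the exact masking rules applied at lc are not fully specified here, and the subfield lists follow the comparison table given in the text.

```python
import string
import unicodedata
from collections import defaultdict

# subfields used for heading comparison, following the table above
COMPARE_SUBFIELDS = {
    "personal":   set("abcd"),
    "corporate":  set("abkfpsg"),
    "conference": set("aqe"),
    "topical":    set("abxyz"),
    "geographic": set("abxyz"),
}

def normalize(ap_record):
    """build a normalized comparison string: keep only the comparison subfields,
    convert to uppercase, and mask diacritics and punctuation (an approximation
    of the masking described in the text)."""
    keep = COMPARE_SUBFIELDS[ap_record["type"]]
    text = " ".join(value for code, value in ap_record["subfields"] if code in keep)
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))      # drop diacritics
    text = "".join(" " if ch in string.punctuation else ch for ch in text)  # mask punctuation
    return " ".join(text.upper().split())

def new_headings_by_year(ap_records):
    """count headings that are new to the file in each year, i.e. normalized strings
    not seen in any earlier year -- the comparison described above."""
    by_year = defaultdict(set)
    for ap in ap_records:
        by_year[ap["year_entered"]].add((ap["type"], normalize(ap)))
    seen, new_counts = set(), {}
    for year in sorted(by_year):
        new_counts[year] = len(by_year[year] - seen)
        seen |= by_year[year]
    return new_counts

example = {"type": "personal", "subfields": [("a", "Blaeu, Joan,"), ("d", "1596-1673")]}
print(normalize(example))   # BLAEU JOAN 1596 1673, matching the normalized form shown above
```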
it is not surprising to find that 66 percent of name-heading occurrences are personal names, 33 percent are corporate, and only 1.4 percent are conference. the figures shift when considering the distinct names, where 77 percent are percommunications 197 sonal and only 21 percent are corporate. looking at ~he total figures in table 2, while 1 ,308,193 of the headings that appeared on the records were personal names, only 633,048 or 48 percent of these were distinct. of the rest, 52 percent were duplicates of the distinct headings. similarly, 26 percent of corporate names were distinct, with 74 percent being duplicates; and 71 percent of conference names were distinct, with only 29 percent being duplicates. in 1968, 87 percent and 68 percent of personal and corporate names, respectively, were distinct, i.e., 13 percent and 32 percent "had been used previously" when they appeared on a bibliographic record during the year. as the base file of names grows, the percentage of names appearing on new records but which "had been used previously" rises, to 60 percent and 77 percent in 1974. while the figures reported in table 2 indicate that the percentage of headings used that were repeats fell slightly again in 1977 (51 percent and 73 percent), this is probably due to the influx of new names with the addition of new languages in 1976-77. additional statistics gathered on english-language items show the percentage of repeating headings becoming steady after 1974. subject headings statistics concerning distinct topical and geographical subject headings were collected for main terms, excluding subdivisions, and for full subject heading strings. table 3 gives the numbers of headings and the numbers of distinct headings of each type found in the marc file. looking at the total figures, only 4.8 percent of topical first terms are distinct, the rest are duplicates. this indicates an average occurrence of 20.8 times for each first term. slightly more, 12 percent, of the geographic first terms are distinct. when the full headings with topical, period, form, and geographic subdivisions are considered, the percentage of headings that are distinct rises to 32.3 percent for topical subjects and 38.8 percent for geographic subjects. thus, 67.3 percent of topical and 61.2 percent of geographic are duplicates of existing headings. in the yearly figures, sub198 journal of library automation vol. 14/3 september 1981 ject headings show the same tendency as name headings in that the percentages of headings that appear on new records but which "had been previously used" rises as the stock of headings increases and then levels off. subjects were also affected by the addition of other roman alphabet languages in 1976-77 but not to a very large degree. for all access points, name headings and full string subject headings, name headings account for 55 percent of the headings that occur in the bibliographic records, with only 45 percent attributable to topical and geographical headings. it should be noted that 12 percent of the name headings that appear on the bibliographic records are names used as subjects. frequencies of occurrence counts were also made of the frequency with which name headings occurred in the bibliographic files. table 4 summarizes the frequency data: 66 percent of distinct personal names, 62 percent of distinct corporate names, and 84 percent of distinct conference names occur only once in the files. 
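the distinct-versus-duplicate percentages quoted in this section follow directly from the table 2 totals; a few lines of arithmetic reproduce them, along with the average number of occurrences per distinct heading.

```python
# figures as given in the table 2 totals above
totals = {
    "personal":   {"occurrences": 1_308_193, "distinct": 633_048},
    "corporate":  {"occurrences":   651_207, "distinct": 170_119},
    "conference": {"occurrences":    27_290, "distinct":  19_438},
}

for kind, t in totals.items():
    distinct_share = t["distinct"] / t["occurrences"]
    avg_occurrences = t["occurrences"] / t["distinct"]
    print(f"{kind:11s} distinct {distinct_share:5.1%}, duplicates {1 - distinct_share:5.1%}, "
          f"about {avg_occurrences:.1f} occurrences per distinct heading")
```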
the percent of corporate names with single occurrences is surprisingly close to that for personal; however, the percent of names having multiple occurrences falls more slowly for corporate than for personal names. while 5.47 percent of corporate names occur ten or more times, only 1.92 percent of personal names occur ten or more times. the figures for personal names roughly correspond to those obtained by william potter from a sample taken from the main catalog at the university of illinois at urbana-champaign. that study showed 63.5 percent of personal names occurred onlyonce. 1 the number of occurrences of different types of headings are compared in figure 1. the bars show the numbers of personal, corporate, conference, topical, and geographic headings that appear in the bibliographic files. the shaded areas represent the number of headings that are distinct, thus the upper part of each bar represents additional occurrences of the headings from the shaded area. for personal, corporate, and conference headings a further distinction is made between distinct headings that occur only once, the crosshatched area, and those that have multiple occurrences. thus the multiple occurrences of corporate names may be seen to come from a small table 4. frequency of occurrence of name headings in all files distinct distinct distinct number of personal names corporate names conference names occurrences number percent number percent number percent 1 456,328 65.65 116,250 62.02 18,02 1 83.90 2 119,68 1 17.22 30,185 16.10 2,049 9.54 3 46,247 6.65 11,563 6.17 587 2.73 4 23,951 3.45 6,814 3.64 289 1.35 5 13,820 1.99 4,109 2.19 163 .76 6 8,790 1.26 2,958 1.58 98 .46 7 5,827 .84 2,175 1.16 56 .26 8 4,056 .58 1,673 .89 48 .22 9 2,998 .43 1,395 .74 36 . 17 10 2,153 .31 10 ,037 .55 18 .08 11-13 4,116 .59 2,180 1.16 44 .20 14-20 3,748 .54 2,632 1.40 41 .19 2150 2,678 .39 2,901 1.55 23 .11 51-100 448 .06 936 .50 4 .02 101-200 149 .02 374 .20 2 .01 201-300 47 .01 109 .06 1 .00 301400 19 .00 46 .02 0 .00 401-500 11 .00 21 .01 0 .00 5011000 5 .00 53 .03 0 .00 1001 + 2 .00 18 .01 0 .00 total 695,074 99.99 187,429 99.98 21,480 100.00 number of distinct corporate headings, as was indicated by the slow decrease of the multiple-heading occurrence rate (i.e., a small group of corporate names have a very large number of occurrences). file growth as a bibliographic file grows and the stock of names and subjects that are contained in the associated authority file increases, the number of new-to-the-file 1400 1200 1000 "' <:> 800 z i5 : .. 0 a: w 600 id ::;: "' z 400 200 1,444,726 personal names corporate names communications 199 headings that are required for the new bibliographic records would be expected to fall. figure 2 illustrates that tendency and shows that there is a leveling off of the number of new-to-the-file headings per new bibliographic record after the bibliographic file reaches a certain size. for example, after approximately 700,000 bibliographic records are in the file, for every additional 100 bibliographic records approximately 298 name and subject headings 30,417 conference names 1.468,804 topical subjects geographic subjects d distinct headings distinct headings that occur -only once fig. 1. number of headin gs by type. 200 journal of library automation vol. 
14/3 september 1981 will be assigned, and, of these, approximately 53 will be new personal names, 14 new corporate names, 2 new conference names, 35 new topical subjects, and 10 new geographic subjects; the remaining 184 headings used will already be established in the authority file. thus after a certain bibliographic file size is reached, the growth of the authority file is approximately a linear function of the growth of the bibliographic file. implications the reoccurrence frequency of headings in a bibliographic file is often cited as a factor in designing bibliographic and authority-file configurations. discussion 1.2 ii i 0 0 .9 a: 0 u w a: .8 ~ ~ .7 z 5 :.\ " .6 ~ z 0 .5 a: w "' ~ . 4 z .3 centers on the necessity of carrying authority records for headings that occur only once in a bibliographic file . with reference to the name-heading data in table 4 and figure 1, carrying authority records only for headings that occur more than once could 'potentially reduce the size of the authority file from that indicated by the whole shaded area (including shaded and crosshatched) to the plain shaded area, i.e., from 903,983 records to 310,123, a 66 percent decrease. controlling multiple occurrences of a heading is, however, only one role of the authority record. more important perhaps is the control of cross-references connected with the heading. preliminary work with a • persona l names ---9 top~cal su8jects ... corporate names 2~ ----------~----~---------& geographi ca l subj ects 'y con ference n ames » ~~~~r=~~~~~==~==~==~~~==~==~==~==~-100 200 300 400 500 600 700 800 900 1000 11 00 1200 1300 number of bibliographic records cthousands) fig. 2 . n umber of n ew headings p er r eco rd for all files. random sample of personal names in the lc file indicates that less than 17 percent of personal names require cross-references. thus the personal name headings that occur only once but would require authority records because of cross-references could be less than 17 percent. the frequency data combined with reference structure data could have a significant impact on design. out of a total of 695,074 personal names in the authority files associated with the marc bibliographic files examined here, 456, 328, or 66 percent, occur only once. of these, fewer than 77,575 would be expected to have cross-references, thus the nameauthority file for personal names could be reduced in size from 695,074 records to 316,321, a 55 percent decrease. if separate authority records are a system requirement, the occurrence figures might then be useful for defining configurations that employ machine-generated provisional records for single-occurrence headings that do not have reference structures or that simplify in other ways the treatment of these headings. these figures may also be useful in making decisions on the addition of retrospective authority records to the automated files. reference 1. william gray potter, "when names collide: conflict in the catalog and aacr2," library resources & technical services 24:7 (winter 1980). rlin and oclc as reference tools douglas jones: university of arizona, tucson. the central reference department (social science, humanities, and fine arts) and the science-engineering reference department at the university of arizona library are currently evaluating the oclc and rlin systems as reference tools, to see if their use can significantly improve the effectiveness and efficiency of providing reference service. 
a significant number of the questions received by our librarians, and presumably by librarians elsewhere, incommunications 201 volve incomplete or inaccurately cited references to monographs, conference proceedings, government documents, technical reports, and monographic serials. if by using a bibliographic utility a librarian can identify or verify an item not found in printed sources, then effectiveness has been improved. once a complete and accurate description of the item is found, it is a relatively simple task to determine whether or not the library has the item, and if not, to request it through interlibrary loan. additionally, if the efficiency of the librarian can be improved by reducing the amount of time required to verify or identify a requested item, then the patron, the library, and, in our case, the taxpayer, have been better served. the promise of nearimmediate response from a computer via an online interactive terminal system is clearly beguiling when compared to the relatively time-consuming searching required with printed sources, which frequently provide only a limited number of access points and often become available weeks, months, or even years after the items they list. we realize, of course, that the promise of instantaneous electronic information retrieval is limited by a va):'iety of factors, and presently we view access to rlin and oclc as potentially powerful adjuncts tonot replacements for-printed reference sources. given that rlin and oclc have databases and software geared to known-item searches for catalog card production, our evaluation attempts to document their usefulness in reference service. a preliminary study conducted during the spring semester of 1980-81 indicated that approximately 50 percent of the questionable citations requiring further bibliographic verification could be identified on oclc or rlin. the time required was typically five minutes or less. successful verification using printed indexes to identify the same items ranged from 20 percent in the central reference department to 50 percent in science-engineering. time required per item averaged approximately fifteen minutes. based on our findings, we plan a revised and more thorough test during the fall semester of 1981-82, which will include an assessment of the enhancements to the hutchinson this study focuses on the adoption and use of wireless technology by medium-sized academic libraries, based on responses from eighty-eight institutions. results indicate that wireless networks are already available in many medium-sized academic libraries and that respondents from these institutions feel this technology is beneficial. w ireless networking offers a way to meet the needs of an increasingly mobile, tech-savvy student population. while many research libraries offer wireless access to their patrons, academic libraries serving smaller populations must heavily weigh both the potential benefits and disadvantages of this new technology. will wireless networks become essential components of the modern academic library, or is this new technology just a passing fad? prompted by plans to implement a wireless network at the houston cole library (hcl) (jacksonville state university’s [jsu’s] library), which serves a student enrollment close to ten thousand, this study was conducted to gather information about whether libraries similar in size and mission to hcl have adopted wireless technology. 
the study also sought to find out what, if any, problems other libraries have encountered with wireless networks and how successful they have perceived those networks to be. other questions addressed include level of technical support offered, planning, type of equipment used to access the network, and patron-use levels. � review of literature a review of the literature on wireless networks revealed a number of articles on wireless networks and checkout programs for laptop computers at large research institutions. seventy percent of major research libraries surveyed by kwon and soules in 2003 offered some degree of wireless access to their networks.1 no articles, however, specifically addressed the use of wireless networks in medium-sized academic libraries. many articles can also be found on wireless-network use in medical libraries and other institutions. library instruction using wireless classrooms and laptops has been another subject of inquiry as well. breeding wrote that there are a number of successful uses for wireless technology in libraries, and a wireless local area network (wlan) can be a natural extension of existing networks. he added that since it is sometimes difficult to install wiring in library buildings, wireless is more cost effective.2 a yearly survey conducted by the campus computing project found that the number of schools planning for and deploying wireless networks rose dramatically from 2002 to 2003. “for example, the portion of campuses reporting strategic plans for wireless networks rose to 45.5 percent in fall 2003, up from 34.7 percent in 2002 and 24.3 percent in 2001.”3 the use of wireless access in academia is expected to keep growing. according to a summary of a study conducted by the educause center for applied research (ecar), the higher-education community will keep investing in the technology infrastructure, and institutions will continue to refine and update networks. the move toward wireless access “represents a user-centered shift, providing students and faculty with greater access than ever before.”4 in an article on ubiquitous computing, drew provides a straightforward look at how wlans work, security issues, planning, and the uses and ramifications of wireless technology in libraries. he suggests, “perhaps one of the most important reasons for implementing wireless networking across an entire campus or in a library is the highly mobile lifestyle of students and faculty.” the use of wireless will only increase with the advent of new portable devices, he added. wireless networking is the best and least expensive way for students, faculty, and staff to take their office with them wherever they go.5 the circulation of laptop computers is a frequent topic in the available literature. the 2003 study by kwon and soules primarily focused on laptop-lending services in academic-research libraries. fifty percent of the institutions that responded to their survey provided laptops for checkout. the majority indicated moderate-to-high use of laptop services. positive user response and improved “public reputation, image, and relations” were the greatest advantages reported with laptop circulation. the major disadvantages associated with these services were related to labor and cost.6 a study of laptop checkout service at the mildred f. sawyer library at suffolk university in boston revealed that laptop usage was popular during the fall semester of 1999. students checked out the computers to work on group projects. 
a laptop area was set aside on one library floor to provide wired internet access for eight users. however, students wanted to use the laptops anywhere, not one designated place. the wired laptop areas were not popular, dugan wrote, adding that “few students used the wired area and the wires were repeatedly stolen or intentionally broken.” an interim phase involved providing wireless network cards for checkout wireless networks in medium-sized academic libraries: a national survey paula barnett-ellis and laurie charnigo paula barnett-ellis (pbarnett@jsucc.jsu.edu) is health and sciences librarian, and laurie charnigo (charnigo@jsucc .jsu.edu) is education librarian at houston cole library, jacksonville state university, alabama. wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 13 14 information technology and libraries | march 2005 to encourage patrons to use their own laptops, and, when a wireless network was put into place in the fall of 2000, demand exceeded the number of available laptops for checkout.7 � method a survey (see appendix) was designed to find out how many libraries similar in size and mission to hcl have adopted wireless networks, the experiences they have encountered in offering wireless access, and, most importantly, whether they felt the investment in wireless technology has been worth the effort.8 the national center for education statistic’s academic library peer comparison tool, a database composed of statistical information on libraries throughout the united states, was used to select institutions for this study. a search on this database retrieved eighty-eight academic libraries that met two criteria: full-time enrollments of between five thousand and ten thousand, and classification by the carnegie classification of higher education as master’s colleges and universities i.9 the survey was administered to those thought most likely to be responsible for systems in the library; they were selected from staff listings on library web sites (library systems administrator, information tech-nology [it] staff). if such a person could not be identified, the survey was sent to the head of library systems or to the library director. the survey was divided into the following sections: implementation of wireless network, planning and installation stages, user services, technical problems, and benefits specific to use of network. surveys were mailed out in march 2004. an internet address was provided in the cover letter if participants wished to take the survey online rather than return it by mail. an e-mail reminder with a link to the online survey was sent out three weeks after the initial survey was mailed. all letters and e-mails were personalized, and a self-addressed stamped envelope and a ballpoint pen with the jsu logo were included with the mail surveys. in the e-mail reminder, the authors offered to share the results of the project with anyone who was interested, and received several enthusiastic responses. � results a total of fifty-three completed surveys were returned, resulting in a response rate of 60 percent. the overwhelming majority (85 percent) responded that their library offered wireless-network access. even if the thirty-five surveys that were not returned had reported that wireless networks were not available, more than 50 percent would still have offered wireless networks. survey results also pointed to the newness of the technology. 
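the statement about non-respondents is simple arithmetic: even if every unreturned survey were counted as a "no," the libraries known to offer wireless access would still be a majority of all eighty-eight institutions surveyed. a short check:

```python
surveyed, returned = 88, 53
offering = round(0.85 * returned)   # roughly 45 libraries reported offering wireless access

print(f"response rate: {returned / surveyed:.0%}")                     # about 60%
print(f"known wireless share of all 88: {offering / surveyed:.0%}")    # about 51%, still a majority
```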
only four of the fifty-three institutions have had wireless networks for more than three years. the majority (73 percent) has implemented wireless networks just within the last two years. when asked to identify the major reasons for offering wireless networks to their patrons, the three responses most chosen were: (1) to provide greater access to users; (2) the flexibility of a network unfettered by the limitations of tedious wiring; and (3) to keep up with technological innovation (see table 1). least significant factors in the decision to implement wireless networks were cost; use by library faculty and staff; to aid in bibliographic instruction; and use for carrying out technical services (taking inventory). somewhat to the authors’ surprise, wireless use in bibliographic instruction was not high on the list of reasons for installing a wireless network, identified by only 9 percent of respondents. the benefits of wireless for library instruction was stressed in the literature by mathias and heser and patton.10 in addition to obtaining an instrument for gauging how many libraries similar in scope and size to hcl have implemented wireless networks and why they chose to do so, questions on the survey were also designed to gather information on planning and implementation, user services, technical problems, and perceived benefits. � planning and implementation although tolson mentions that some schools have used committees composed of faculty, staff, and students to look into the adoption of wireless technology, responses from this survey indicated that the majority (60 percent) of the libraries did not form committees specifically for the planning of their wireless networks.11 in addition, 49 percent of the libraries took fewer than six months to plan for implementation of a network, 37 percent required six months to one year, and 15 percent reported more than one to two years. actual time spent on installation and configuration of wireless networks was relatively short, 98 percent indicating less than one year (see table 2 for specific times). one of the most important issues to consider when planning to implement a wireless network is extent of coverage—where wireless access will be available. survey responses revealed varying degrees of wireless coverage among institutions. twenty percent had campus-wide access, 55 percent had some level of coverage throughout the entire library, 37 percent provided a limited range of coverage outside the building, and 20 percent offered access only in certain areas within the library. according to a bulletin published by ecar, institutions vary in their approaches to networking depending on enrollment. smaller colleges and universities with fewer than ten thousand students are “more likely to implement campuswide wireless networks from the start. larger institutions are more likely to implement wireless technology in specific buildings, consistent with a desire to move forward at a modest pace, as resources and comfort with the technology grow.”12 questions on the survey also queried respondents about the popularity of spaces in the library where users access the library’s wireless network. answers revealed that the most popular areas for wireless access are study carrels, tables, and study rooms. nineteen percent indicated that accessing wireless networks in the stacks is popular. of particular concern to hcl, a thirteen-story building, was how the environment of the library would accommodate a wireless network. 
a thorough site survey is important to locate the best spots within the library to install access points and to determine whether there are architectural barriers in the building that might interfere with access. the majority of survey respondents indicated that the site survey conducted in their library for a wireless network was carried out by their academic institution’s it staff (59 percent). while library staff conducted 35 percent of site surveys, only 17 percent were conducted by outside companies. � user services an issue to be addressed by libraries deciding to go wireless is whether laptop computers should also be provided for checkout in the library. after all, it might be hard to justify the usefulness of a wireless network if users do not have access to laptops or other hardware with wireless capabilities. while one individual reported working at a “laptop university” in which campuswide wireless networking exists and all students are required to own laptops, not all college students will have that luxury. in order to provide more equal access to students, checking out laptops has become an increasingly common service in academic libraries. seventy percent of this survey’s respondents whose institutions offered wireless access also made laptops available for checkout. comments made throughout the survey seemed to imply that while checking out laptops to patrons is an invaluable complement to offering wireless access, librarians should be prepared for a myriad of hassles that accompany laptop checkout. wear and tear of laptops, massive battery use, cost of laptops, and maintenance were some of the biggest problems reported. one participant, whose institution decided to stop offering laptops for checkout to patrons in the library, wrote, “it required too much staff time to maintain and we decided the money was better spent elsewhere. the college now encourages students to purchase a laptop [instead of] a full-sized pc.” one participant worried that the rising use of laptops in his library would lead to the obsolescence of its more than one hundred wired desktops, writing, “our desktops are very popular and we think having them is one of the reasons our gate count has increased in recent years. what happens when everyone has a laptop?” the number of laptops checked out in the libraries varied. the majority of libraries had purchased between one and thirty laptops available for checkout (see table 3). three institutions had more than forty-one laptops available for checkout. one library could boast that it had sixty laptops available for checkout with twelve pagers to notify students waiting in line to use laptops. when asked about the use of laptops in libraries, 46 percent table 1. 
main reasons for implementing a wireless network in absolute numbers and percentages reasons for implementing total number of percent of responses a wireless network responses out of total number provide greater access to users 36 67 flexibility (no wires, ease in setting up) 29 54 to keep up with or provide technological innovation 28 52 campuswide initiative 21 39 requests expressed by users 16 30 provide greater online access due to shortage of computers-per-user in the library 15 28 other 7 13 offer network access outside the library building 6 11 aid in bibliographic instruction 5 9 for use by library faculty and staff 5 9 low cost 5 9 to carry out technical services (such as inventory) 4 7 wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 15 16 information technology and libraries | march 2005 observed moderate use, while 32 percent reported heavy use of laptops. only 3 percent indicated that they hardly ever noticed use of laptops in the library. for those students who chose to bring their own laptop to access the library’s wireless network, half of the institutions surveyed required students to purchase their own network-interface cards for their laptops, while 19 percent allowed students to check them out from the library. in addition to laptops, personal digital assistants, (pdas) were listed by 37 percent of respondents as devices that may access wireless networks. one librarian indicated that cell phones could access the wireless network in his library. fiftysix percent of respondents indicated that users are able to print to a central printer in the library from their wireless device. an important consideration for implementing a wireless network is how users will authenticate. authentication protocol is defined by the microsoft encyclopedia of networking as “any protocol used for validating the identity of a user to determine whether to grant the user access to resources over a network.”13 authentication methods listed by the institutions surveyed varied greatly and the authors could not identify all of them. methods mentioned were lightweight directory access protocol (ldap), virtual private network (vpn), and media access control (mac) addresses, bluesocket, remote authentication dial in user service (radius), pluggable graphical identification and authentication (pgina), protected extensive authentication protocol (peap), and e-mail logins. out of the thirty-nine responses to this question, seven individuals indicated that they do not require any type of authentication at the present. although some individuals noted that they are planning to enable some type of authentication in the future, one participant suggested that there were ethical issues involved in requiring users to authenticate. this person argued that “anonymous access to information is valued” and praised his institution’s current policy of allowing “anyone who can find the network” to use it. a concern about offering wireless network access in the library is how library staff will be prepared to handle the flood of technical questions that are likely to ensue. the level of technical support offered to users varied among the institutions surveyed. more than half of the respondents indicated that users receive help specifically from it staff or from the campus computer center. thirtynine percent of users received help from the reference desk, while 19 percent received help from circulation staff. 
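returning for a moment to the authentication methods listed above, filtering on media access control (mac) addresses is the simplest to illustrate. the fragment below is a conceptual sketch only, not the configuration syntax of any particular access point, controller, or product, and the addresses shown are made up.

```python
# conceptual illustration of mac-address filtering; real deployments configure this
# on the access point or wireless controller, not in application code.

ALLOWED_MACS = {
    "00:1a:2b:3c:4d:5e",
    "00:1a:2b:3c:4d:5f",
}

def may_associate(client_mac):
    """grant network access only to clients whose hardware address is registered."""
    return client_mac.lower() in ALLOWED_MACS

print(may_associate("00:1A:2B:3C:4D:5E"))   # True  -- registered device
print(may_associate("66:77:88:99:aa:bb"))   # False -- unknown device is refused
```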
thirty-three percent of the responding institutions offered technical help from a web site, while 7 percent indicated that they did not offer any type of technical support to users. technical problems the technical problems most often encountered with wireless networks centered on architectural barriers that cause black-outs or slow-spots where wireless access fails. this confirms the importance of carrying out thorough site surtable 2. total length of time taken to completely configure and install the wireless network time to install and total number of percent of responses configure wireless network responses out of total number less than one month 12 28 one to two months 11 26 more than two months to four months 10 23 more than four months to six months 4 9 more than six months to one year 5 12 more than one year 1 2 table 3. total number of laptops available for checkout in the library total laptops total number of percent of responses available for checkout responses out of total number one to five 8 26 six to ten 5 16 eleven to fiften 1 3 sixteen to twenty 5 16 twenty-one to thirty 8 26 thirty-one to forty 1 3 more than forty 3 10 veys and testing prior to installation of access points. site surveys may be carried out by companies specially equipped and trained to determine where access points should be installed, the most appropriate type of antennae (directional or omnidirectional), and how many access points are needed to provide the greatest amount of coverage. configuration of the network was the second most highly reported problem associated with installing wireless networks, seeming to suggest the need for librarians to coordinate their efforts and rely on the knowledge provided by the it coordinator (or similar type of personnel) within their institution. lack of technical support available to users, slow speed, and authentication were also indicated as technical problems most encountered (see table 4). integrating the wireless network with the existing wired network was the least-mentioned problem associated with wireless networks. although security problems, particularly concerning wired equivalency protocol (wep) vulnerabilities, have been pointed out as one of the major drawbacks of a wireless network, the majority of users had not as yet experienced security problems. although one participant wrote, “don’t be too casual about the security risks,” another individual wrote, “talk to your networking department,” as many of them are overly worried about security. perceived benefits respondents reported that the number-one benefit of offering wireless access was user satisfaction. giving patrons the ability to use their laptops anywhere in the library and do multiple tasks from one machine is simply becoming what more and more users expect. the secondlargest benefit revolved around flexibility and ease of use due to the lack of wires. thirty-five percent indicated that allowing students to roam the stacks while accessing the network was a significant benefit. although a few studies have suggested the promise of wireless networks for aiding bibliographic instruction, only 9 percent of respondents indicated this as a benefit of wireless technology. use of wireless technology for instruction, it might be recalled, was not a significant factor noted by respondents in the decision to implement a wireless network. likewise, use of this type of network to carry out technical services (such as inventory) was also low on the scale of benefits. 
seventy-three percent of users claimed that wireless networks have thus far been worth the cost-benefit ratio. while 70 percent indicated moderate to heavy use of the wireless network, 27 percent reported low usage. when asked what advice they would give to others considering adopting wireless networks in their libraries, the overwhelming majority of responses were positive, recommending that hcl take the plunge. as one individual wrote, “offer it and they will come. it has really increased the usage of our library.” other individuals noted that it is simply necessary to offer wireless access to keep up with technological innovation, and that students expect it. the most significant warning, however, revolved around checkout and maintenance of laptops, which, from the results of this survey, seems be both a big advantage and a headache. several individuals echoed the importance of doing site surveys to test bandwidth limitations and access. one particularly energized participant, using multiple exclamations for emphasis, shared a plethora of advice. “throttle connection speeds! allow only http access! block ports and unnecessary protocols! secure your network and disallow unauthenticated users! use access control lists! establish policies that describe wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 17 table 4. technical problems encountered problems total number of percent of responses encountered responses out of total number architectural barriers 15 28 configuration problems 12 22 not enough technical help available to users when needed 10 19 slow speed 10 19 authentication problems 10 19 blackouts 6 11 problems installing drivers 6 11 security problems 6 11 difficulty signing on 6 11 problems with operating systems 5 9 other 3 6 problems integrating the wireless network with an existing wired network 2 4 18 information technology and libraries | march 2005 [wireless fidelity] wi-fi risks and liabilities on your part!” useful advice on wireless-access implementation gleaned from this survey fell under the following categories: � be aware of slower speed � create a policy and guide for users � do it because more users are going wireless, it is necessary to keep up with technological innovation, and because students love it � provide plenty of access points � install access points in appropriate places � ensure continuous connectivity by allowing overlap between access points � purchase battery chargers and heavy-duty laptops with extended warranties � get support from it staff for planning and maintenance � offering wireless will increase library usage � perform or have an expert perform a careful site survey and do lots of testing to locate dead or slow spots in the library due to architectural barriers � enable some type of authorization � be aware of security concerns � although the majority of participants’ networks (70 percent) support 802.11b (which allows for throughput up to 11 megabits per second), a few participants suggest using the 802.11g standard (up to 54 megabits per second) because it is “the fastest” and “backwards compatible to 802.11b” � conclusion though it is a relatively new technology, this study found that a surprisingly large number of medium-sized academic libraries are already offering wireless access. not only are they offering wireless access, but they are also providing patrons with laptops for checkout in the library. 
although actual use of the network by patrons was not determined through survey responses (as individuals were only asked about their observations of network use), the comments and answers were overwhelmingly positive and enthusiastic about this new technology. problems that have been encountered with wireless networks largely revolve around configuration, slow speed, and laptop checkout. although much of the literature focuses on security issues that accompany wireless networking, few individuals reported problems with security. college and university students, like the rest of society, are becoming increasingly mobile. more often, they want access to library networks and the internet wherever they happen to be studying or working on group projects, not merely in computer labs or designated study areas. the majority of the libraries in this study are accommodating these students’ needs by offering wireless access. according to breeding, wireless networking is a rapidly growing niche in the networking world, and mobile computer users will become a larger and larger part of any library’s clientele.14 to encourage patrons to continue visiting them, academic libraries, large and small, should attempt to meet the demand for wireless access if at all possible. references and notes 1. myoung-ja lee kwon and aline soules, laptop computer services: spec kit 275 (washington, d.c.: association of research libraries office of leadership and management services, 2003), 11. 2. marshall breeding, “the benefits of wireless technologies,” information today 19, no. 3 (mar. 2002): 42–43. 3. kenneth c. green, “the campus computing project.” accessed mar. 3, 2004, www.campuscomputing.net/. 4. educause center for applied research, “respondent summary: wireless networking in higher education in the u.s. and canada.” accessed dec. 4, 2003, www.educause.edu/ ir/library/pdf/ecar_so/ers/ers0202/ekf0202.pdf. 5. wilfred drew, “wireless networks: new meaning to ubiquitous computing,” journal of academic librarianship 29, no. 2 (mar. 2003): 102–106. 6. kwon and soules, laptop computer services, 11, 15–17. 7. robert e. dugan, “managing laptops and the wireless networks at the mildred f. sawyer library,” journal of academic librarianship 27, no. 4 (jul. 2001): 295–98. 8. questions on the survey did not distinguish as to whether wireless network installations were initiated by it or library personnel. 9. national center for education statistics, “compare academic libraries.” accessed mar. 10, 2004, http://nces.ed.gov/ surveys/libraries/academicpeer/. 10. molly susan mathias and steven heser, “mobilize your instruction program with wireless technology,” computers in libraries 22, no.3 (mar. 2002): 24–30; janice k. patton, “wireless computing in the library: a successful model at st. louis community college,” community & junior-college libraries 10, no. 3 (mar. 2001): 11–16. 11. stephanie diane tolson, “wireless laptops and local area networks.” accessed dec. 11, 2003, www.thejournal.com/ magazine/vault/articleprintversion.cfm?aid=3536. 12. raymond boggs and paul arabasz, “research bulletin: the move to wireless networking in higher education.” accessed dec. 4, 2003, www.educause.edu/ir/library/pdf/erb0207.pdf. 13. mitch tulloch, microsoft encyclopedia of networking (redmond, wash.: microsoft pr., 2002), 122. 14. marshall breeding, “a hard look at wireless networks,” library journal 127, no. 12 (summer 2002): 14–17. 1. has a wireless network been implemented in your library? __yes __no 2. 
if your library has not adopted wireless networking, are you currently planning or seriously considering it for the near future? __yes (please skip to question 4) __no (please fill out questions 2 and 3 only) 3. what are your primary concerns about implementing a wireless network? check all that apply. __the technology is still new __unsure of its benefits __no need for one __questions regarding security __cost __would not be able to provide technical support that might be needed __funds must primarily support other types of technology at the moment __have not noticed many users with laptops in the library __slow speed of wireless networks __other 4. how long has a wireless network been implemented in your library? __fewer than 6 months __6 months to 1 year __more than 1 to 2 years __more than 2 to 3 years __more than 3 years 5. what were the main reasons for implementing a wireless network? check all that apply. __provide greater access to users __campuswide initiative __offer network access outside the library building __provide greater online access due to shortage of computers per user in the library __flexibility (no wires, ease in setting up) __requests expressed by users __low cost __to keep up with or provide technological innovation __to carry out technical services (such as inventory) __aid in bibliographic instruction __for use by library faculty and staff __other 6. please describe the coverage of your network. check all that apply. __campuswide __library building and limited range outside the library building __inside the library (all areas) __select areas within the library 7. what areas of the library are most popularly used for access to the wireless network? check all that apply. __reference and computer media center areas __in the stacks __librarians and staff offices __carrels, tables, reading or study rooms __area outside the library building 8. please list standards your wireless network supports. check all that apply. __802.11b __802.11a __802.11g __bluetooth __other planning and installation 1. was a committee established to plan the implementation and service of the wireless network? __yes __no 2. how long did it take to plan for implementation of the wireless network? __fewer than 6 months __6 months to 1 year __more than 1 to 2 years __more than 2 years 3. how long did it take to install and configure the network? __less than a month __1 to 2 months __more than 2 to 4 months __more than 4 to 6 months __more than 6 months to 1 year __more than 1 year 4. who performed the site survey? check all that apply. __an outside company or contractor appendix. survey: implementation of wireless networks wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 19 20 information technology and libraries | march 2005 __institution’s own information technology coordinator or computer staff __library staff with technical expertise __no site survey was conducted 5. if the site surveyor was an outside company or contractor, please list their company name and whether you would recommend them. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ user services 1. how are users authenticated? 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2. does the library check out laptops to users (for either wired or wireless use)? __yes __no 3. if laptops are available for checkout, do they have wireless capability? __yes __no 4. how many laptops do you have for checkout? __one to five __six to ten __eleven to fifteen __sixteen to twenty __twenty-one to thirty __thirty-one to forty __more than forty 5. how would you describe use of laptops in your library on the average day? __heavy—very noticeable use of laptops __moderate use of laptops __low use of laptops __not sure __hardly even notice laptops are used 6. how do users obtain wireless cards for the network? check all that apply. __check out from library __purchase from library __purchase from the campus computer center __must purchase on their own 7. if the library checks out wireless cards, how many were purchased for checkout? __one to five __six to ten __eleven to fifteen __sixteen to twenty __twenty-one to twenty-five __twenty-six to thirty __more than thirty 8. what type of technical support does the library provide to users? check all that apply. __help from reference or help desk __help from the information technology staff or campus computer center __circulation staff __other library staff __from a web site __no technical help is provided to users 9. has the library created a policy for the use of wireless networks? __yes __no 10. are users able to print from the wireless network in the library? __yes __no 11. which of the following may access the wireless network? check all that apply. __laptops __desktop computers __pdas __cell phones __other technical problems 1. what technical problems have you or your users encountered? check all that apply. __blackouts __architectural barriers __slow speed __problems integrating the wireless network with an existing wired network __configuration problems __security problems __authentication problems __problems with operating systems __difficulty signing on __not enough technical help available to users when needed __problems installing drivers __other 2. have you experienced security problems with the network? check all that apply. __have not experienced any security problems __problems with unauthorized people accessing the internet through the wireless network __problems with restricted parts of the network being accessed by unauthorized users __other 3. how were security problems resolved? benefits of use of network 1. what have been the biggest benefits of wireless technology? check all that apply. __user satisfaction __increased access to the internet and online sources __flexibility and ease due to lack of wires __has improved technical services (use for library functions) __has aided in bibliographic instruction __provides access beyond the library building __allows students to roam the stacks while accessing the network __other 2. how would you describe current usage of the network? __heavy __moderate __low 3. in your opinion, has this technology been worth the benefit-cost ratio thus far? __yes __no __not sure 4. what advice would you give to librarians considering this technology? (editorial continued from page 3) design and implementation of complex systems to serve our users. writing about that should not be solitary either. i hope to publish think-pieces from leaders in our field. 
i hope to publish more articles on the management of information technologies. i hope to increase the number of manuscripts that provide retrospectives. libraries have always been users of information technologies, often early adopters of leading-edge technologies that later become commonplace. we should, upon occasion, remember and reflect upon our development as an information-technology profession. i hope to work with the editorial board, the lita publications committee, and the lita board to find a way, and soon, to facilitate the electronic publication of articles without endangering—but in fact enhancing—the absolutely essential financial contribution that the journal provides to the association. in short, i want to make ital a destination journal of excellence for both readers and authors, and in doing so reaffirm the importance of lita as a professional division of ala. to accomplish my goals, i need more than an excellent editorial board, more than first-class referees to provide quality control, and more than the support of the lita officers. i need all lita members to be prospective authors, prospective referees, and prospective literary agents acting on behalf of our profession to continue the almost forty-year tradition begun by fred kilgour and his colleagues, who were our predecessors in volume 1, number 1, march 1966, of our journal. reference 1. walt crawford, first have something to say: writing for the library profession (chicago: ala, 2003). wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 21 lib-mocs-kmc364-20140103103252 scope : a cost analysis of an automated serials record system 129 michael e. d. koenig, alexander c. finlay, joann g. cushman : technical information department, pfizer inc. , groton, conn., and james m. detmer: detmer systems co., new canaan, conn. a computerized serials record and control system developed in 1968/69 for the technical information department of pfizer inc. is described and subjected to a cost analysis. this cost analysis is conducted in the context of an investment decision, using the concept of net present value, a method not previously used in library literature. the cost analysis reveals a positive net present value and a system life break-even requirement of seven years at a 10% cost of capital. this demonstrates that such an automated system can be economically justifiable in a library of relatively modest size ( approx. 1,100 serial and periodical titles). it may be that the break-even point in terms of collection size required for successful automation of serial records is smaller than has been assumed to date. introduction the field of librarianship has in general not been characterized by an abundance of cost analysis articles. this is by no means a novel observation ( 1,2,3). library automation has been no exception, despite its more quantitative aura. in particular there has been an almost complete lack of any analysis of the cost of an automated system as an investment decision. 130 journal of library automation vol. 4/3 september, 1971 the bulk of material that has been written regarding costs and cost analysis has concentrated upon costs per unit of productivity of a functioning system, or upon comparison of such costs among various systems ( 4,5,6) . though still perhaps underrepresented, there is a growing core of such articles. indeed, jacob's article on standardized costs ( 7) indicates that a certain level of maturity has been reached. 
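the net-present-value framing used in the scope analysis can be summarized in a few lines. the cash-flow figures below are hypothetical placeholders, chosen only so that the break-even point echoes the seven-year figure quoted in the abstract; the article's actual development costs and yearly savings are not reproduced here.

```python
# a minimal sketch of evaluating an automation project as an investment decision.

def npv(rate, cash_flows):
    """net present value of yearly cash flows, where cash_flows[0] is the year-0
    amount (here, the negative one-time development cost)."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

development_cost = -20_000     # hypothetical one-time investment in the system
annual_saving    =   4_300     # hypothetical net yearly saving over the manual routine
rate             =   0.10      # the 10% cost of capital used in the analysis

# shortest system life for which the investment pays for itself
for life in range(1, 16):
    flows = [development_cost] + [annual_saving] * life
    if npv(rate, flows) >= 0:
        print(f"positive net present value at a system life of {life} years")
        break
```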
the analysis of library automation in terms of its justifiability as an investment decision is not an appropriate area for benign neglect. librarians, whether they be special, academic, or public, typically must justify their budgets to some higher authority, and the decision to automate must almost invariably be an investment decision, requiring an expenditure of funds above the normal operating budget. if librarians hope to be successful in justifying their pleas for an investment in automation, an "investment in the library's future," they should be prepared to justify their requests in terms of what they represent: investment decisions. the cost analysis described below is an example of such an analysis. it is an after-the-fact analysis, but the principle remains the same.

methods and materials

the scope (systematic control of periodicals) system was implemented in 1968 by the technical information department of pfizer, inc., at the medical research laboratories in groton, connecticut. the system is not radically different from others described in the literature (8, 9, 10). it is reasonably sophisticated in its handling of such features as claiming, binding, and budgeting. the basic design element of the system is the computer generation each month of a deck of ibm cards corresponding to anticipated receipts for that month. as an item is received, the corresponding card is pulled from the anticipated deck and is used to inform the system of the receipt of the item. this "tub file" feature, first used by the university of california at san diego (11), is the major design difference between scope and the university of minnesota bio-medical library system described by grosch (12) and strom (13), with which scope seems most comparable in terms of system sophistication and capability.

system description

the system was originally written in fortran iv for an ibm 1800 computer with two tape drives. a total of twelve programs were written. two of these programs are quite large (the weekly update and the monthly generation program), comprising about 600 statements each; the remainder average 200 statements. since that time the programs have been revised to operate on an ibm 360/30 computer using two 2400 tape drives and two 2311 disk drives. several more programs have also been written. fortran iv was chosen as a program language to render the system relatively immune to hardware changes and has fully justified itself. a listing of programs follows.

program number | function | core requirements (bytes)
epc01 | weekly update | 17060
epc02 | monthly card deck generation | 15992
epc03 | vendor listing | 5648
epc04 | periodical title evaluation & budget listing | 6916
epc05 | holdings listing | 6992
epc06 | scope file print | 6852
epc07 | psn file swap (to reassign psn & realphabetize) | 7638
epc08 | daily receipt listing | 1768
epc09 | binding listing | 2480
epc10 | short title vs. full title thesaurus | 3876
epc11 | skeleton binding punch | 3024
epc12 | copy tape file | 2008
epc13 | general skeleton punch | 2920
epc14 | cross index punch | 3300
epc15 | receipt edit | 1444
epc16 | purchase order analysis | 2796
epc17 | discipline analysis | 3024

file design

scope maintains a magnetic tape file in which each periodical is recorded in sequence by its periodical sequence number (psn). appearing once in the file for every psn are records giving title, cross-reference, holdings, and journal control information, including, for instance, "separate index." records for one or more copies then follow this basic information.
each copy within a psn consists of records for all current expected receipts ( xrs ), binding units ( bus ) not yet complete, as well as a trailer ( tl) summary. a file print program is provided which enables the library staff to inspect every item of data in the file. "anticipated" deck scope generates monthly a deck of approximately 2,500 80-column hollerith cards to be used for posting periodicals as received. a card is made for each receipt expected within the succeeding five weeks. for all regular known publication schedules, these cards are complete as to volume, issue ( including separate index) and publication date. for irregular or unknown publication schedules, one or more incomplete cards are provided in the deck. upon receipt of an issue, the proper card is pulled from the "anticipated" deck, the actual date of receipt is punched and the card used to prepare the daily receipts listing. the card is also used to update the tape file on a weekly cycle. unexpected issues require that a card be prepared manually by the library staff. issues which are omitted by the publisher require that the card be returned to the system as a "throwback." if an issue is 132 journal of library automation vol. 4/3 september, 1971 unexpectedly divided into two or more parts, separate cards are manually prepared and the original card deleted. claims in order to issue claims on a current basis, the tape file is updated weekly with receipts. every receipt will find a copy of itself on the scope tape (generated when the "anticipated" deck was produced) and a received code (r) and the current date will be posted to the record. consequently, any item not marked received becomes a claim as soon as the "claim delay" period is exceeded. a card to be used for claiming will be punched on the weekly cycle first exceeding "lag" and "claim delay," and once again every four weeks thereafter until resolved either by receipt or transfer to the missing issue file. the "lag" is the period in weeks lapsing between formal date of publication and earliest anticipated date of receipt. the "claim delay" period is calculated as the weeks elapsing between earliest anticipated date of receipt and latest normal date of receipt. "lag" and "claim delay" may be modified for each publication based on experience. binding binding units are created within the scope file during the monthly generation run. a unit is punched when all the issues comprising it are received or claimed (that is, when none of them is yet to be anticipated). if the unit is complete (no claims) it will be dropped from the tape file at the time it is punched and will not be punched again. binding units are formed whenever a volume changes or whenever the "issues per bind" factor is satisfied. receipts having been accumulated in the file from week to week are dropped at the time of the monthly generation after being counted for binding. from the binding unit cards a listing is prepared that is used by the library staff to make up bundles of periodicals for the binders. the binding unit card accompanies the shipment and is used by the binder. it includes information on issues included, indexes, color of binding, etc. file maintenance in addition to receipts and "throwbacks" the weekly update procedure allows add, change, and delete transactions to affect the scope file on a record-for-record basis. 
such transactions are needed to handle new periodicals, additional copies, closed series, discontinued copies, name changes, publication schedule changes, revised costs, vendor changes, and the like. the update operation is ordered by psn, copy number, record type, and (for xrs) volume and issue, in that order. an entire publication schedule may be added to the file in such cases as when the schedule is known but highly irregular (frequency code 99). after the receipt cards are processed by the update each week, they are filed in the "manual receipt file" together with copies of claims sent to ....... scope: a cost analysis/koenig, et al. 133 vendors. as binding units are created, copies of binding cards are filed in the same file, and receipt cards representing binding cards are discarded, as are earlier binding cards. this manual file corresponding to 1,000 journals requires about 5,000 cards and occupies three card file drawers. it is filed by psn and is therefore in order alphabetically by journal title. discards and additions to the manual file are about equal and hence it does not increase substantially in size. it permits rapid manual examination of the current status of each periodical. holdings list a program is provided that lists the complete scope file showing full title and abbreviated holdings statement for each psn. in addition, any cross reference/ history data and any desired holdings detail will be printed. since the file maintenance process insures an accurately updated file, this listing may be run at any time to provide an accurate reflection of library holdings. periodical title evaluation (scrutiny) a program is provided that lists all copies in the scope file requiring annual review prior to renewal. this procedure is controlled by the "value code" assigned individually to each copy within a psn. in addition to full title and abbreviated holdings statement, the listing shows by whom abstracted, the discipline codes associated with the periodical, and the annual cost. given this information, library users are requested to vote for retention of items for the next year. those not receiving sufficient votes are not renewed. separate programs not part of the scope system are used to prepare vote cards and tabulate results. budget list the program that prepares the periodical title evaluation list can be used to prepare lists by "department charged," a convenient budgetary tool used each fall to plan purchases for the following year. the lists may, of course, be run at any time. vendor order list a program is provided to prepare from the scope file a listing of all non-terminated copies associated with each requested vendor. a threecharacter vendor abbreviation is used to control this process and is coded into each copy control record. in addition to the short title, the list gives vendor reference (his identifier for the periodical ), pfizer purchase order number and date, and the estimated annual cost. each different condition (form of publication, such as periodical, microfilm) is listed with the number of copies ordered. although prices are not firm at the time of ordering, this listing nevertheless provides the detail needed for purchasing documents. as price 134 journal of library automation vol. 4/3 september, 1971 change information is made available and updated into the fil e, the listing may be rerun for checking out final billings from the vendor. 
similar lists can be produced by purchase order number, a convenient tool for resolving those financial complexities which inevitably occur. discipline list this program is used to prepare lists by discipline/subject, as microbiology, immunology, etc., a useful tool for maintaining collection balance, and for assuaging patrons' fears that their disciplines may not be adequately represented. system capacity present counts indicate approximately 9,000 tape records in the system, representing approximately 1,100 journals. about 200 issues are posted weekly. there are no restrictions on future expansion of the system as presently implemented. method the method of cost analysis used was the "net present value method." perhaps the clearest most readily available description of this concept is to be found in chapters 19 and 20 of shillinglaw's cost accounting, analysis and control ( 14). briefly the idea is that of comparing a given investment decision with what might reasonably be expected from an alternative use of that same money for another investment. an investment is typically defined as "an expenditure of cash or its equivalent in one time period or periods in order to obtain a net inflow of cash or its equivalent in some other time period or periods." ( 14, p. 564). the librarian typically thinks of investing in automation now in order to make possible a lessened expenditure in the future-at least a lessened expenditure in comparison to what would be necessary to accomplish the same level of operations in a non-automated fashion. conceptually these are the same; investment now in order to reap some future benefit. future savings can be treated as a future cash inflow. the concept of net present value is rather simple; it consists of converting all present and expected future cash flows (or their equivalents) to a present value and examining that value in comparison to alternative uses for the resources invested. the process of conversion is that of relating time and money. time does of course influence the worth of money. a dollar a year from now is worth less than a dollar today, for the dollar today can be invested and a year from now it will be worth more than a dollar, or at least the mathematical expectation of its worth is more than a dollar. the question is at what rate future cash flows should be discounted. business firms typically use their "cost of capital" (the cost which the business must pay to obtain capital) as the discount rate. a business d~cision should yield a positive net present value when the appropriate future cash flows are discounted at the cost of capital. if not, the investscope: a cost analysis/koenig, et al. 135 ment is a losing proposition, and the business would have been better off by not obtaining the capital, or by investing it elsewhere. the calculation of an appropriate cost of capital is a complicated exercise involving such things as debt capital, equity capital, etc. the figure of 10% is often cited as a good rule of thumb; happily it is appropriate in the case at hand and is the one used here. to the obvious question "is there any relevance in this net-presentvalue/cost-of-capital idea to an academic or a public library which does not obtain its funds in the same way, or have any explicit cost of capital?" the response is "yes." if a decision to automate, when analyzed in this fashion in comparison with alternative methods, should result in a negative net present value, then that decision is demonstrably poor. 
for if the money invested in automation were instead invested in the market, it could supply the alternative system's future greater operating costs with money left over to utilize elsewhere. this latter course might not be an option in fact, but the mere presence of its theoretical preferability would cast doubt on the desirability of any decision to automate. conversely, a positive net present value would argue for the desirability of automation, regardless of the source of the funds. the cost analysis that follows is expressed in terms of set-up cost outlays (investment) and projected savings (cash inflow). the investment expenses are of course reasonably well documented. the operational savings are based on 18 months' successful experience with the system.

set-up costs (including 1968 and 1969 parallel running costs)
systems analysis and programming (fees paid to consultant): $10,450
keypunching: 2,000
conversion reprogramming (ibm 1800 to ibm 360/30): 500
computer time: 4,000
personnel, opportunity costs (asst. librarian $4,000; tech. info. mgr. $6,000): 10,000
total set-up costs: $26,950

yearly running costs
system maintenance (retainer to detmer systems): $500
computer time (full costing): 5,000
allowance for machine conversion (based on an expectation of conversion at 3-yr. intervals at a cost of $750 each time): 250
total: $5,750

operational savings, 1970, per year (in comparison with continued running of the previous manual system)
posting (based on a saving of 8 hours per week of clerical work): $1,400
claiming (based on a saving of 10% of an assistant librarian's time): 1,050
binding (based on elimination of approximately 450 hours of overtime, clerk and assistant librarian, and 150 hours regular time per year): 2,700
replacement costs (represents decreased replacement costs due to rapid binding and consequent lower loss rate): 400
production of holdings list (based on a savings of 50 hours per year of assistant librarian's time): 250
ordering/bookkeeping (based on a savings of 250 hours per year of assistant librarian's time): 1,250
total: $7,050

savings resulting from control of the collection not previously practicable (see discussion below)
space saving per year: $750
subscription saving per year: 2,000
incremental overhead saving per year: 1,500
total: $4,250

total yearly savings: $11,300
yearly running costs: $5,750
difference (realized savings): $5,500

results

the net present value at the end of 1970, based on 10% cost of capital and a 15-year life expectancy, follows. the present value of one unit one year ago is 1.1052 at 10% cost of capital (assuming for simplicity that the 1968-70 set-up prices were paid in a lump one year prior to the end of 1970); 7.7688 is the present value of an annuity of one unit per year for 15 years at 10% cost of capital.

net present value
1968-1970 set-up costs: ($26,950) x (1.1052) = -$29,785
yearly savings, commencing 1970: ($5,500) x (7.7688) = +$43,117
net present value = $13,332

these findings indicate that the crude payback period ≈ 4.9 years (commencing january 1971). the system life required to break even at 10% cost of capital = 7 years. another way of looking at the matter is to calculate the discounted rate of return, that is, the rate of discount at which the sum of the positive present values equals the sum of the negative present values. in this case, the discounted rate of return = 17%.
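to make the arithmetic above easier to follow, here is a minimal javascript sketch of the same net-present-value test. it assumes the continuous-compounding convention implied by the quoted factors (e^0.10 ≈ 1.1052 for the lump-sum set-up cost and (1 - e^(-0.10 x 15))/0.10 ≈ 7.7688 for fifteen years of savings); the variable names are illustrative, and the rounded totals it prints may differ slightly from the figures printed above.

// sketch: net-present-value test for the scope investment, illustrative figures only
const costOfCapital = 0.10;   // 10% discount rate used in the article
const lifeYears = 15;         // assumed system life
const setUpCost = 26950;      // 1968-70 set-up costs, treated as a lump sum paid one year back
const yearlySaving = 5500;    // realized yearly savings (total savings minus running costs)

// factor that carries a payment made one year ago forward to the present (about 1.1052)
const lumpSumFactor = Math.exp(costOfCapital);
// present value of a stream of one unit per year for lifeYears (about 7.7688 under this convention)
const annuityFactor = (1 - Math.exp(-costOfCapital * lifeYears)) / costOfCapital;

const netPresentValue = yearlySaving * annuityFactor - setUpCost * lumpSumFactor;
const crudePaybackYears = setUpCost / yearlySaving;   // about 4.9 years

console.log("lump-sum factor", lumpSumFactor.toFixed(4));
console.log("annuity factor", annuityFactor.toFixed(4));
console.log("net present value $" + Math.round(netPresentValue));
console.log("crude payback " + crudePaybackYears.toFixed(1) + " years");

a positive value at the 10% rate is what argues for the investment; the break-even system life is simply the shortest life for which this value stays positive.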
in other words, since the discounted rate of return ( 17%) is significantly above that available for alternative uses of the resources (say 10%), this is a reasonable candidate for investment. discussion the net present value method has two inputs in addition to the raw data. the first one, already discussed, is the cost of capital. most large businesses can supply such a figure, or at least inform the librarian or information manager what approximation is used by that company (though surprisingly many otherwise sophisticated businesses do not use this method ) . in an academic environment, advice can usually be obtained from someone in the economics department or in the business school. in any case, 10% is a good rule of thumb. the second input is the expected life span. this is not as crucial as one might suppose, for the farther distant the cash flow, the less its net present value. the net present value factor in this case for 15 years' life expectancy was 7.7688; for ten years it would have been 6.3213, for 20 years 8.6466-not a great difference. as is invariably the case, many of the effects of scope were difficult to quantify. the most difficult were those in the sections "savings resulting from control of the collection not previously praclkable." since the collection can now be easily analyzed and scrutinized with only a minimum expenditure of research staff time, the rate of growth of the collection has been considerably tamed, while maintaining customer satisfaction. prior to scope, new subscriptions had been added at the rate of about 90 a year. when scope was implemented, this fell to 10, and has now risen to approximately 30. during its first year of operation, scope apparently resulted in 80 fewer periodical subscriptions, the second year, 60 fewer. continuing this progression, 80, 60, 40, 20, 0, one would arrive at the conclusion that a long-range reduction in collection size of 200 subscriptions was achievable. to be conservative, the calculation has been based 138 ]ourool of library automation vol. 4/3 september, 1971 on an estimate of a reduction of 100 subscriptions/year. even this estimate represents a saving of over $4,000 per year. the resulting space savings were based on a cost of $10 per square feet per year (standard occupancy charges adjusted for stack use) and a ten-year cycle in stack space enlargement. this scrutiny might have been done manually at a justifiable cost, but it had not been done, and more importantly probably would not have been done. the operational savings may be open to some criticism because, as is probably obvious to an experienced serials record librarian, the previous manual system was not strikingly efficient. it can well be argued that the most efficient possible manual system rather than the previous system should have been the alternative against which scope was evaluated. from the point of view of the organization, however, the relevant comparison is to actuality, not to what is theoretically possible, but in generalizing the results this specificity must be borne in mind. somewhat mitigating this circumstance, however, is the fact that the running costs of scope are probably overestimated. 
the computer cost is based on full costing, inappropriately high for the following reasons: 1) it includes programming overhead, but since scope was programmed externally, the scope project is being doubly charged for its programming; 2) the same double charging applies to program maintenance; 3) the costing makes no distinction between high priority jobs, and relatively low priority jobs such as scope, and presumably low priority is less expensive. since the distortions in the two paragraphs above are difficult to estimate and since they are to a degree counterbalancing, they are simply noted rather than quantified. the yearly operational savings ( $7,050) still intuitively appear surprisingly high. one's initial reaction is that even with overhead included, this is not a great deal less than the yearly cost of one library assistant. in point of fact, one library assistant has been transferred from the library to the rapidly expanding computer based information section (computer based sdi and retrospective searching ), with no apparent deterioration of library services. the library is in fact handling a greater work load than previously, with one less person. this cannot be entirely attributed to scope, as some other rationalization of library operations has b een introduced, but it does indicate that the calculated savings are not a grossly distorted reflection of reality. conclusion as pointed out in the introduction, almost any significant attempt at library automation will require an investment decision. librarians should be prepared to make analyses of their proposals in terms of their justifiability as investment decisions, both for reasons of politics and for their own satisfaction and confidence. the net present value method is a powerful, convenient, and useful tool for such analyses. it is hoped that this scope: a cost analysis/koenig, et al. 139 article will serve as a reasonable case study for the application of this technique to the problems of library automation. an automated serial records system for a relatively modest ( 1,100 serial and periodical titles ) special library has run successfully and achieved its objectives for more than a year and a half. one of the major objectives was to produce a system that allowed clerical help to be substituted for a librarian's scarce and costly time, thus allowing more effective utilization of the professional librarian's skills. this objective has been met. furthermore, a complete turnover of the personnel interfacing with the system has been accomplished easily and painlessly. no small part of the credit goes to the originators who designed and documented the system for such turnover. jt is an old chestnut, but well worth repeating-"design the systems not for yourself, but for the person who will be chosen to replace you." the cost analysis of the operations of the system indicate that its design, implementation, and operation are economically justified, and that capital investment will be paid off in approximately seven years. (the crude payback period was less than five years. ) the major implication of this economic justification lies in the relatively modest size of the library's operation. it may well be that the break-even point in terms of collection size required for successful and cost-effective automation of serial records is smaller than has heretofore been assumed. references 1. 
dougherty, richard m.: "cost analysis studies in libraries: is there a basis for comparison," library resources & technical services, 13 (winter 1969), 136-141. 2. fasana, paul j.: "determining the cost of library automation," a. l. a. bulletin, 61 (june 1967 ) 656-661. 3 . griffin , hillis l.: "estimating data processing costs in libraries," college and research libraries, 25 (sept. 1964), 400-403, 431. 4. kilgour, frederick g.: "costs of library catalog cards produced by computer," journal of library automation, 1 (june 1968), 121-127. 5. chapin, richard e.; pretzer, dale h.: "comparative costs of converting shelf list records to machine readable form," journal of library automation, 1 (march 1968), 66-74. 6. black, donald v.: "creation of computer input in an expanded character set," ] ournal of library automation, 1 (june 1968), 110-120. 7. jacob, m. e. l.: "standardized costs for automated library systems," ] ournal of library automation, 3 (september 1970), 207-217. 8. lebowitz, abraham 1.: "the aec library serial record: a study in library mechanization," special libraries, 53 (march 1967), 149-153. 9. scoones, m.: "the mechanization of serial records with particular reference to subscription control," as lib proceedings, 19 (february 1967)' 45-62. 140 journal of library automation vol. 4/3 september, 1971 10. pizer, irwin h.; franz, donald r. ; brodman, estelle: "mechanization of library procedures in the medium-sized medical library: the serial record," medical library association bulletin, ll (july 1963 ), 313-338. 11. university of california, san diego, university library: report on serials computer project (la jolla, cal., university library, 1962). 12. grosch, audrey n.: university of minnesota bio-medical library serials control system. comprehensive report (minneapolis, university of minnesota libraries, 1968) 91 p. 13. strom, karen d.: "software design for bio-medical library serials control system." in american society for information service, annual meeting, 20-24 oct. 1968, proceedings, vol. 5. (new york, greenwood publishing corp. 1968), 267-275. 14. shillinglaw, gordon: cost accounting analysis and control (homewood, illinois, richard d . irwin inc. 1967) 913 p. assignfast: an autosuggest-based tool for fast subject assignment rick bennett, edward t. o’neill, and kerre kammerer information technology and libraries | march 2014 34 abstract subject assignment is really a three-phase task. the first phase is intellectual—reviewing the material and determining its topic. the second phase is more mechanical—identifying the correct subject heading(s). the final phase is retyping or cutting and pasting the heading(s) into the cataloging interface along with any diacritics, and potentially correcting formatting and subfield coding. if authority control is available in the interface, some of these tasks may be automated or partially automated. a cataloger with a reasonable knowledge of faceted application of subject terminology (fast)1,2 or even library of congress subject headings (lcsh)3 can quickly get to the proper heading but usually needs to confirm the final details—was it plural? am i thinking of an alternate form? is it inverted? etc. this often requires consulting the full authority file interface. assignfast is a web service that consolidates the entire second phase of the manual process of subject assignment for fast subjects into a single step based on autosuggest technology. 
background faceted application of subject terminology (fast) subject headings were derived from the library of congress subject headings (lcsh) with the goal of making the schema easier to understand, control, apply, and use while maintaining the rich vocabula ry of the source. the intent was to develop a simplified subject heading schema that could be assigned and used by nonprofessional cataloger or indexers. faceting makes the task of subject assignment easier. without the complex rules for combining the separate subdivisions to form an lcsh heading, only the selection of the proper heading is necessary. the now-familiar autosuggest4,5 technology is used in web search and other text entry applications to help the user enter data by displaying and allowing the selection of the desired text before typing is complete. this helps with error correction, spelling, and identification of commonly used terminology. prior discussions of autosuggest functionality in library systems have focused primarily on discovery rather than on cataloging.6-11 rick bennett (rick_bennett@oclc.org) is a consulting software engineer in oclc research , edward t. o’neill (oneill@oclc.org) is a senior research scientist at oclc research and project manager for fast, and kerre kammerer (kammerer@oclc.org) is a consulting software engineer in oclc research, dublin, ohio. http://www.oclc.org/research/activities/fast.html http://www.loc.gov/catdir/cpso/lcc.html mailto:rick_bennett@oclc.org mailto:oneill@oclc.org mailto:kammerer@oclc.org information technology and libraries | march 2014 35 the literature often uses synonyms for autosuggest, such as autocomplete or type-ahead. since assignfast can lead to terms that are not being typed , autosuggest seems most appropriate and will be used here. the assignfast web service combines the simplified subject choice capabilities of f ast with the text selection features of autosuggest technology to create an in -interface subject assignment tool. much of a full featured search interface for the fast authorities, such as searchfast ,12 can be integrated into the subject entry field of a cataloging interface. this eliminates the need to switch screens, cut and paste, and make control character changes that may differ between the authority search interface and the cataloging interface. as a web service, assignfast can be added to existing cataloging interfaces. in this paper, the actual operation of assignfast is described , followed by how the assignfast web service is connected to an interface, and finally by a description of the web service construction. assignfast operation an authority record contains the established heading, see headings, and control numbers that may be used for linking or other future reference. the relevant fields of the fast record for motion pictures are shown here: control number fst01027285 established heading motion pictures see cinema see feature films -history and criticism see films see movies see moving-pictures in fast, the facet of each heading is known. motion pictures is a topical heading. the see references are unauthorized forms of the established heading. if someone intended to enter cinema as a subject heading, they would be directed to use the established heading motion pictures. 
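for readers who prefer to see the record as data, the following is a minimal sketch of how the motion pictures entry above might be represented in javascript; the field names and the small lookup helper are illustrative assumptions, not the actual marc or assignfast serialization.

// sketch only: one fast authority entry, with field names chosen for illustration
const motionPictures = {
  controlNumber: "fst01027285",
  facet: "topical",
  establishedHeading: "motion pictures",
  seeReferences: [
    "cinema",
    "feature films--history and criticism",
    "films",
    "movies",
    "moving-pictures"
  ]
};

// a cataloger who types any see reference should be steered to the established heading
function resolveHeading(record, input) {
  const normalized = input.trim().toLowerCase();
  if (record.seeReferences.some(ref => ref.toLowerCase() === normalized)) {
    return record.establishedHeading;
  }
  return input; // already the established form, or not covered by this record
}

console.log(resolveHeading(motionPictures, "cinema")); // -> "motion pictures"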
for a typical workflow, the subject cataloger would need to leave the cataloging interface, search for “cinema” in an authority file interface, find that the established heading was motion pictures, and return to the cataloging interface to enter the established heading. the figure below shows the same process when assignfast is integrated into the cataloging interface. without leaving the cataloging interface, typing only “cine” shows both the see term that was initially intended and the established heading in a selection list. assignfast: an autosuggest-based tool |bennett, o’neill, and kammerer 36 figure 1. assignfast typical selection choices. selecting “cinema use motion pictures” enters the established term, and the entry process is complete for that subject. figure 2. assignfast selection result. the text above the entry box provides the fast id number and facet type. information technology and libraries | march 2014 37 as a web service, assignfast headings can be manipulated by the cataloging interface software after selection and before they are entered into the box. for example, one option available in the assignfast demo is marcbreaker format.13 marcbreaker combines marc field tagging and allows diacritics to be entered using only ascii characters. using marcbreaker output, assignfast returns the following for “ ”: =651 7$abrazil$zs{tilde}ao paulo$0(ocolc)fst01205761$2fast in this case, the output includes marc tagging of 651 (geographic), as well as subfie ld coding ($z) that identifies the city within brazil, that it’s a fast heading, and the fast control number. the information is available in the assignfast result to fill one or multiple input boxes and to reformat as needed for the particular cataloging interface. addition to web browser interfaces as a web service, assignfast could be added to any web-connected interface. a simple example is given here to add assignfast functionality to a web browser interface using javascript and jquery (http://jquery.com). these technologies are commonly used, and other implementation technologies would be similar. example files for this demo can be found on the oclc developers network under assignfast.14 the example uses the jquery.autocomplete function.15 first, the script packages jquery.js, jqueryui.js, and the style sheet jquery-ui.css are required. version 1.5.2 of jquery and version 1.8.7 for jquery-ui was used for this example, but other compatible versions should be fine. these are added to the html in the script and link tags. the second modification to the cataloging interface is to surround the existing subject search input box with a set of div tags.
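the markup itself did not survive in this copy, so the following is only a plausible reconstruction of what such a wrapper might look like; the id values existingBox and extraInformation are taken from the javascript shown next, while the element names, label, and class are assumptions.

<!-- sketch: the subject input box wrapped in div tags so the autosuggest script can attach to it -->
<div class="ui-widget">
  <label for="existingBox">subject heading:</label>
  <input id="existingBox" type="text" name="subject" />
  <div id="extraInformation"></div>
</div>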
the final modification is to add javascript to connect the assignfast web service to the search input box. this function should be called when the page is loaded:

function setupPage() {
  // connect the autosubject to the input areas
  jQuery('#existingBox').autocomplete({
    source: autoSubjectExample,
    minLength: 1,
    select: function(event, ui) {
      jQuery('#extraInformation').html("FAST ID " + ui.item.idroot +
        " Facet " + getTypeFromTag(ui.item.tag));
    } //end select
  }).data("autocomplete")._renderItem = function(ul, item) {
    formatSuggest(ul, item);
  };
} //end setupPage()

the source: autoSubjectExample setting tells the autocomplete function to get its data from the autoSubjectExample function, which in turn calls the assignfast web service; this function is in the assignFASTComplete.js file (a sketch of such a callback appears at the end of this section). in the select function, the extraInformation text is rewritten with additional information returned with the selected heading; in this case, the fast number and facet are displayed. the generic _renderItem of the jquery autocomplete function is overwritten by the formatSuggest function (also found in assignFASTComplete.js) to create a display that differentiates the see headings from the authorized headings returned in the search. the version used for this example shows

see heading use authorized heading

when a see heading is returned, or simply the authorized heading otherwise.

web service construction

the autosuggest service for a fast heading was constructed a little differently than the typical autosuggest. for a typical autosuggest of the term motion picture from the example given above, you would index just that term. as the term was typed, motion picture and other terms starting with the text entered so far would be shown until you resolved the desired heading. for example, typing in "mot" might give

motion pictures
motion picture music
employee motivation
diesel motor
mothers and daughters

for the typical autosuggest, the term indexed is the term displayed and is the term returned when selected. for assignfast, both the established and see references are indexed. however, when typing resolves a see heading, both the see heading and its established heading are displayed. only the established heading is selected, even if you are typing the see heading. for assignfast, the "mot" result now becomes

features (motion pictures) use feature films
motion pictures
motorcars (automobiles) use automobiles
motion picture music
background music for motion pictures use motion picture music
motion pictures for the hearing impaired use films for the hearing impaired
documentaries, motion picture use documentary films
mother of god use mary, blessed virgin, saint

the headings in assignfast are ranked by how often they are used in worldcat, so headings that are more common appear at the top. to place the established heading above the see heading when they are similar, the established heading is also ranked higher than the see heading for the same usage. assignfast can also be searched by facet, so if only topical or geographic headings are desired, only headings from those facets will be displayed. the web service uses a solr16 search engine running under tomcat.17 this provides full text search and many options for cleaning and manipulating the terms within the index.
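the autoSubjectExample callback itself ships in assignFASTComplete.js and is not reproduced in this excerpt, so the following is only a rough sketch of how such a jquery ui source callback might query the assignfast service; the request parameters follow the service description below (table 1), but the mapping code, field choices, and row limit are assumptions.

// sketch only: a source callback for jQuery UI autocomplete that queries AssignFAST over JSONP
function autoSubjectExample(request, response) {
  jQuery.ajax({
    url: "http://fast.oclc.org/searchfast/fastsuggest",
    dataType: "jsonp",            // the service accepts a jsonp callback parameter
    data: {
      query: request.term,        // whatever the cataloger has typed so far
      queryindex: "suggestall",   // search all facets
      queryreturn: "suggestall,idroot,auth,tag,type",
      suggest: "autosubject",
      rows: 20
    },
    success: function(data) {
      // map each returned document to the fields the select and render callbacks expect
      var items = jQuery.map(data.response.docs, function(doc) {
        return {
          label: doc.auth,        // text shown in the drop-down
          value: doc.auth,        // text placed in the input box on selection
          idroot: doc.idroot,
          tag: doc.tag,
          type: doc.type
        };
      });
      response(items);
    }
  });
}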
the particular option used for assignfast is the edgengramfilter.18 this option is used for autosuggest and has each word indexed one letter at a time, building to its entire length. the index of "cinema" would then contain "c," "ci," "cin," "cine," "cinem," and "cinema." solr handles utf-8 encoded unicode for both input and output. the assignfast indexes and queries are normalized using fast normalization19 to remove punctuation, diacritics, and capitalization. fast normalization is very similar to naco normalization, although in fast normalization the subfield indicator is replaced by a space and no commas are retained. assignfast is accessed using a rest request.20 rest requests consist of urls that can be invoked via either http post or get methods, either programmatically or via a web browser.

http://fast.oclc.org/searchfast/fastsuggest?&query=[query]&queryindex=[queryindex]&queryreturn=[queryreturn]&suggest=autosuggest&rows=[numrows]&callback=[callbackfunction]

where
query: the query to search.
queryindex: the index corresponding to the fast facet. these include suggestall (all facets), suggest00 (personal names), suggest10 (corporate names), suggest11 (events), suggest30 (uniform titles), suggest50 (topicals), suggest51 (geographic names), and suggest55 (form/genre).
queryreturn: the information requested, as a comma-separated list. these include idroot (fast number); auth (authorized heading, formatted for display with "--" as subfield separator); type (alt or auth, indicating whether the match on the queryindex was to an authorized or a see heading); tag (marc authority tag number for the heading: 100 = personal name, 150 = topical, etc.); raw (authorized heading with subfield indicators; blank if identical to auth, i.e., no subfields); breaker (authorized heading in marcbreaker format; blank if identical to raw, i.e., no diacritics); and indicator (indicator 1 from the authorized heading).
numrows: headings to return; maximum restricted to 20.
callback: the callback function name for jsonp.

table 1. assignfast web service results description.

example response:

http://fast.oclc.org/searchfast/fastsuggest?&query=hog&queryindex=suggestall&queryreturn=suggestall%2cidroot%2cauth%2ctag%2ctype%2craw%2cbreaker%2cindicator&suggest=autosubject&rows=3&callback=testcall

yields the following response:

testcall({
  "responseheader":{
    "status":0,
    "qtime":148,
    "params":{
      "json.wrf":"testcall",
      "fl":"suggestall,idroot,auth,tag,type,raw,breaker,indicator",
      "q":"suggestall:hog",
      "rows":"3"}},
  "response":{"numfound":1031,"start":0,"docs":[
    {
      "idroot":"fst01140419",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"swine",
      "raw":"",
      "breaker":"",
      "suggestall":["hogs"]},
    {
      "idroot":"fst01140470",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"swine--housing",
      "raw":"swine$xhousing",
      "breaker":"",
      "suggestall":["hog houses"]},
    {
      "idroot":"fst00061534",
      "tag":100,
      "indicator":"1",
      "type":"auth",
      "auth":"hogarth, william, 1697-1764",
      "raw":"hogarth, william,$d1697-1764",
      "breaker":"",
      "suggestall":["hogarth, william, 1697-1764"]}]
  }})

table 3. typical assignfast json data return.

the first response heading is the use for heading hogs, which has the authorized heading swine. the second is the use for heading hog houses, which has the authorized heading swine--housing.
this authorized heading is also given in its raw form, including the $x subfield separator, which is unnecessary for the first heading. the third response matches the authorized heading for hogarth, william, 1697-1764, which is also given in its raw form. the breaker (marcbreaker) format is only added if it differs from the raw form, which is only when diacritics are present.

conclusions

subject assignment is a combination of intellectual and manual tasks. the assignfast web service can be easily integrated into existing cataloging interfaces, greatly reducing the manual effort required for good subject data entry and increasing the cataloger's productivity.

references
1. lois mai chan and edward t. o'neill, fast: faceted application of subject terminology, principles and applications (santa barbara, ca: libraries unlimited, 2010), http://lu.com/showbook.cfm?isbn=9781591587224.
2. oclc research activities associated with fast are summarized at http://www.oclc.org/research/activities/fast.
3. lois m. chan, library of congress subject headings: principles and application (westport, ct: libraries unlimited, 2005).
4. "autocomplete," wikipedia, last modified on october 1, 2013, http://en.wikipedia.org/wiki/autocomplete.
5. tony russell-rose, "designing search: as-you-type suggestions," ux magazine, article no. 828, may 16, 2012, http://uxmag.com/articles/designing-search-as-you-type-suggestions.
6. david ward, jim hahn, and kirsten feist, "autocomplete as research tool: a study on providing search suggestions," information technology & libraries 31, no. 4 (december 2012): 6-19.
7. jon jermey, "automated indexing: feeding the autocomplete monster," indexer 28, no. 2 (june 2010): 74-75.
8. holger bast, christian w. mortensen, and ingmar weber, "output-sensitive autocompletion search," information retrieval 11 (august 2008): 269-286.
9. elías tzoc, "re-using today's metadata for tomorrow's research: five practical examples for enhancing access to digital collections," journal of electronic resources librarianship 23, no. 1 (january-march 2011).
10. holger bast and ingmar weber, "type less, find more: fast autocompletion search with a succinct index," sigir '06: proceedings of the 29th annual international acm sigir conference on research and development in information retrieval (new york: acm, 2006), 364-71.
11. demian katz, ralph levan, and ya'aqov ziso, "using authority data in vufind," code4lib journal 11 (june 2011).
12. edward t. o'neill, rick bennett, and kerre kammerer, "using authorities to improve subject searches," in "beyond libraries—subject metadata in the digital environment and semantic web," special issue, cataloging & classification quarterly 52, no. 1/2 (in press).
13. "marcmaker and marcbreaker user's manual," library of congress, network development and marc standards office, revised november 2007, http://www.loc.gov/marc/makrbrkr.html.
14. "oclc developers network—assignfast," submitted september 28, 2012, http://oclc.org/developer/services/assignfast [page not found].
15. "jquery autocomplete," accessed october 1, 2013, http://jqueryui.com/autocomplete.
16. "apache lucene—apache solr," accessed october 1, 2013, http://lucene.apache.org/solr.
17. "apache tomcat," accessed october 30, 2013, http://tomcat.apache.org.
18.
"solr wiki—analyzers tokenizers tokenfilters," last edited october 29, 2013, http://wiki.apache.org/solr/analyzerstokenizerstokenfilters.
19. thomas b. hickey, jenny toves, and edward t. o'neill, "naco normalization: a detailed examination of the authority file comparison rules," library resources & technical services 50, no. 3 (2006): 166-72.
20. "representational state transfer," wikipedia, last modified on october 21, 2013, http://en.wikipedia.org/wiki/representational_state_transfer.

opensearch and sru | levan 151

not all library content can be exposed as html pages for harvesting by search engines such as google and yahoo!. if a library instead exposes its content through a local search interface, that content can then be found by users of metasearch engines such as a9 and vivísimo. the functionality provided by the local search engine will affect the functionality of the metasearch engine and the findability of the library's content. this paper describes that situation and some emerging standards in the metasearch arena that choose different balance points between functionality and ease of implementation. editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital.

the content provider's dilemma

consider the increasingly common situation in which a library wants to expose its digital content to its users. suppose it knows that its users prefer search engines that search the contents of many sites simultaneously, rather than site-specific engines such as the one on the library's web site. in order to support the preferences of its users, this library must make its contents accessible to search engines of the first type. the easiest way to do this is for the library to convert its contents to html pages and let harvesting search engines such as google and yahoo! collect those pages and provide searching on them. however, a serious problem with harvesting search engines is that they place limits on how much data they will collect from any one site. google and yahoo! will not harvest a 3-million-record book catalog, even if the library can figure out how to turn the catalog entries into individual web pages. an alternative to exposing library content to harvesting search engines as html pages is to provide a local search interface and let a metasearch engine combine the results of searching the library's site with the results from searching many other sites simultaneously. users of metasearch engines get the same advantage that users of harvesting search engines get (i.e., the ability to search the contents of many sites simultaneously), plus those users get access to data that the harvesting search engines do not have. the issue for the library is determining how much functionality it must provide in its local search engine so that the metasearch engine can, in turn, provide acceptable functionality to its users. the amount of functionality that the library provides will determine which metasearch engines will be able to access the library's content.
metasearch engines, such as a9 and vivísimo, are search engines that take a user’s query, send it to other search engines, and integrate the responses.1 the level of integration usually depends on the metasearch engine’s ability to understand the responses it receives from the various search engines it has queried. if the response is html intended for display on a browser, then the metasearch engine developers have to write code to parse through the html looking for the content. in such a case, the perceived value of the content determines the level of effort that the metasearch engine developers put into the parsing task; low-value content will have a low priority for developer time and will either suffer from poor integration or be excluded. for metasearch engines to work, they need to know how to send a search to the local search engine and how to interpret the results. metasearch engines such as vivísimo and a9 have staffs of programmers who write code to translate the queries they get from users into queries that the local search engines can accept. metasearch engines also have to develop code to convert all the responses returned by the local search engines into some common format so that those results can be combined and displayed to the user. this is tedious work that is prone to breaking when a local search engine changes how it searches or how it returns its response. the job of the metasearch engine is made much simpler if the local search engine supports a standard search interface such as sru (search and retrieve url) or opensearch. ฀ what does a metasearch engine need in order to use a local search engine? the search process consists of two basic steps. first, the search is performed. second, records are retrieved. to do a search, the metasearch engine needs to know: 1. the location of the local search engine 2. the form of the queries that the local search engine expects 3. how to send the query to the local search engine to retrieve records, the metasearch engine needs to know: 4. how to find the records in the response 5. how to parse the records opensearch and sru: a continuum of searching ralph levan ralph levan (levan@oclc.org) is a research scientist at oclc online computer library center in dublin, ohio. 152 information technology and libraries | september 2006 ฀ four protocols this paper will discuss four search protocols: opensearch, opensearch 1.1, sru, and the metasearch xml gateway (mxg).2 opensearch was initially developed for the a9 metasearch engine. it provides a mechanism for content providers to notify a9 of their content. it also allows rss (really simple syndication) browsers to display the results of a search.3 opensearch 1.1 has just been released. it extends the original specification based on input from a number of organizations, microsoft being prominent among them. sru was developed by the z39.50 community.4 recognizing that their standard (now eighteen years old) needed updating, they simplified it and created a new web service based on an xml encoding carried over http. the mxg protocol is the product of the niso metasearch initiative, a committee of metasearch engine developers, content providers, and users.5 mxg uses sru as a starting place, but eases the requirement for support of a standard query grammar. ฀ functionality versus ease of implementation a library rarely has software developers. the library’s area of expertise is, first of all, the management of content and, secondarily, content creation. 
librarians use tools developed by other organizations to provide access to their content. these tools include the library’s opac, the software provided to search any licensed content, and the software necessary to build, maintain, and access local digital repositories. for a library, ease of adoption of a new search protocol is essential. if support for the search protocol is built into the library’s tools, then the library will use it. if a small piece of code can be written to convert the library’s existing tools to support the new protocol, the library may do that. similarly, the developers of the library’s tools will want to expend the minimum effort to support a new search protocol. the tool developer’s choice of search protocol to support will depend on the tension between the functionality needed and the level of effort that must be expended to provide and maintain it. if low functionality is acceptable, then a small development effort may be acceptable. high functionality will require a greater level of effort. the developers of the search protocols examined here recognize this tension and are modifying their protocols to make them easier to implement. the new opensearch 1.1 will make it easier for some local search-engine providers to implement by easing some of the functionality requirements of version 1.0. similarly, the niso metasearch committee has defined mxg, a variant of sru that eases some of the requirements of sru.6 ฀ search protocol basics once again, the five basic pieces of information that a metasearch engine needs in order to communicate effectively with a local search engine are: (1) local search engine location, (2) the query-grammar expected, (3) the request encoding, (4) the response encoding, and (5) the record encoding. the four protocols provide these pieces of information to one degree or another (see table 1). the four protocols expose a site’s searching functionality and return responses in a standard format. all of these protocols have some common properties. they expect that the content provider will have a description record that describes the search service. all of these services send searches via http as simple urls, and the responses are sent back as structured xml. to ease implementation, opensearch 1.1 allows the content provider to return html instead of xml. all four protocols use a description record to describe the local search engine. the opensearch protocols define what a description record looks like, but not how it is retrieved. the location of the description record is discovered by some means outside the protocol (a priori knowledge). the description record specifies the location of the local search engine. the sru protocols define what a description record looks like and specifies that it can be obtained from the local search engine. the location of the local search engine is provided by a means outside the protocol (a priori knowledge again). each protocol defines how to formulate the search url. opensearch does this by having the local search-engine provider supply a template of the url in the description record. sru does this by defining the url. opensearch and mxg do not define how to formulate the query. the metasearch engine can either pass the user’s query along to the local search engine unchanged or reformulate the query based on information about the local search engine’s query language that it has gotten by outside means (more a priori knowledge). 
in the first case, the metasearch engine has to hope that some magic will happen and the local search engine will do something useful with the query. in the latter case, the metasearch engine’s staff has to develop a query translator. sru specifies a standard query grammar: cql (common query language).7 this means that the metasearch engine only has to write one translator for all the sru local search engines in the world. but it also means that all the sru local search engines have to support the cql query grammar. since there are no local search engines that support cql as their native query grammar, the content provider is left with the task of translating cql queries into their native query grammar. the query translation task has moved from the metasearch engine to the content provider. opensearch and sru | levan 153 opensearch 1.0, mxg, and sru define the structure of the query response. in the case of opensearch, the response is returned as an rss message, with a couple of extra elements added. mxg and sru define an xml schema for their responses. opensearch 1.1 allows the local search engine to return the response as unstructured html. this moves the requirement of creating a standard response from the content provider and leaves the metasearch engine with the much tougher task of finding the content embedded in html. if the metasearch engine doesn’t write code to parse the response, then all it can do is display the response. it will not be able to combine the response from the local search engine with the responses from other engines. sru and mxg require that records be returned in xml and that the local search engine must specify the schema for those records in the response. this leaves the content provider with the task of formatting the records according to the schema of their choice, a task that the content provider is probably best able to do. in turn, the metasearch engine can convert the returned records into some common format so that the records from multiple local search engines can be combined into a single response. because the records are encoded in xml, it is assumed that standard xml formatting tools can be used for the conversion. opensearch does not define how records should be structured. the opensearch response has a place for the title of the record and a url that points to the record. the structure of the record is undefined. this leaves the metasearch engine with the task of parsing the record that is returned. again, the effort moves from the content provider to the metasearch engine. if the metasearch engine does not or cannot parse the records, then it can at least display the records in some context, but it cannot combine them with the records from another local search engine. ฀ conclusion these protocols sit on a spectrum of complexity, trading the content provider’s complexity for that of the search engine. however, with lessened complexity for the metasearch engine comes increased functionality for the user. metasearch engines have to choose what content providers they will search. those that provide a high level of functionality can be easily combined with their existing local search engines. content providers with a lower level of functionality will either need additional development by the metasearch engine or will not be searched. not all metasearch engines require the same level of functionality, nor will they be prepared to accept content with a low level of functionality. 
content providers, such as digital libraries and institutional repositories, will have to choose the functionality they need to support to reach the metasearch engines they desire.
references and notes
1. joe barker, "meta-search engines," in finding information on the internet: a tutorial (u.c. berkeley: teaching library internet workshops, aug. 23, 2005 [last update]), www.lib.berkeley.edu/teachinglib/guides/internet/metasearch.html (accessed may 8, 2006).
2. a9.com, "opensearch specification," http://opensearch.a9.com/spec/ (accessed may 8, 2006); a9.com, "opensearch 1.1," http://opensearch.a9.com/spec/1.1/ (accessed may 8, 2006).
3. mark pilgrim, "what is rss?" o'reilly xml.com, dec. 18, 2002, www.xml.com/pub/a/2002/12/18/dive-into-xml.html (accessed may 8, 2006).
4. the library of congress network development and marc standards office, "z39.50 maintenance agency page," www.loc.gov/z3950/agency/ (accessed may 8, 2006).
5. national information standards organization, "niso metasearch initiative," www.niso.org/committees/ms_initiative.html (accessed may 8, 2006).
6. niso metasearch initiative task group 3, "niso metasearch xml gateway implementors guide, version 0.2," may 16, 2005, [microsoft word document] www.lib.ncsu.edu/nisomi/images/0/06/niso_metasearch_initiative_xml_gateway_implementors_guide.doc (accessed may 8, 2006); the library of congress, "sru: search and retrieve via url; sru version 1.1, 13 february 2004," www.loc.gov/standards/sru/index.html (accessed may 8, 2006).
7. the library of congress, "common query language; cql version 1.1, 13 february 2004," [web page] www.loc.gov/standards/sru/cql/index.html (accessed may 8, 2006).

table 1. comparison of requirements of four metasearch protocols for effective communication with local search engines

protocol feature               opensearch 1.1   opensearch 1.0   mxg        sru
local search engine location   a priori         a priori         a priori   a priori
request encoding               defined          defined          defined    defined
response encoding              none             rss              xml        xml
record encoding                none             none             xml        xml
query grammar                  none             none             none       cql

we do not have an information-prone society. when faced with a problem or interest, i suggest, we are more prone to ask, "what do i have to do?" rather than, "what do i have to know?" part of this reaction is probably due to the fact that when we ask "what do i have to know?" we are faced with another problem in addition to the initial one, i.e., where to get the information. this added effort simply confirms in us our indifference to information, and we take our best shot at solving the problem through decision and action. i sometimes think we have made a virtue of this information incapacity by the way we laud decision making as an indicator of ability. if the foregoing examples are reasonably accurate, we are then faced with a situation in which information is fundamentally important to societal and individual wellbeing, but is not perceived to be so by people in the conduct of their daily affairs. computer-supported telecommunications systems can be the instrument for accelerating information control by a few (this has been much of the trend so far, as indicated by corporate, research, and technical use of these systems), or they can be used to build information confidence, use, and desire throughout society. this option, i suggest, is central to the significance of telecommunications systems for a democratic society.
if the latter option is to be obtained, i suggest that information will have to be packaged and targeted so well on people's everyday problems and interests that it will be easier and more productive to say "what do i have to know?" before saying "what do i have to do?" a basic approach to articulating an information service of this kind consists of the following steps:
1. determine and prioritize the individual and societal problems and interests of a given community.
2. ascertain the information parameters of those problems and interests.
3. locate and obtain the information necessary to address those problems and interests.
4. organize this information so as to optimally target the specified problem or interest and to be as easily retrievable as possible. this requires an understanding of the context in which the information is used so that it is optimally relevant, and an understanding of the language and problem articulation common to the individuals in the community in order to ensure rapid retrieval.
a lesson in interactive television programming: the home book club on qube w. theodore bolton: oclc, inc., columbus, ohio. on december 1, 1977, warner communications christened what has become the most publicized and talked about technological development in the field of cable television: qube, its two-way interactive cable system. publicity posters claimed that this would be "a day you'll tell your grandchildren about," and broadcasters added the word "interactive" to their cocktail-party vocabulary. academicians who ten years ago forecast a technological revolution initiated by the marriage of computer to cable television smugly grinned and saw their dreams turn into reality. response to qube, however, has been mixed. participatory television brings, to some, futuristic images of instant democracy; others warn of its potential demagogic power.1 regardless of your critical persuasion, there now exists what former cbs executive turned warner amex2 consultant mike dann calls "a whole new utility."3 this whole new utility, whether in the form of qube cable television, or some other combination of computer, cable television, telephone, and standard over-the-air broadcasting, will change the way we conduct our lives and interact with other people. the history of the home book club early in 1979, the oclc, inc., research staff appraised the nature and context of the qube facilities (located in columbus, ohio, only five miles away). discussions, which at times centered around far-fetched and lofty ideas, eventually led to realistic and inventive concepts that made use of qube's interactive technology. the most promising of these concepts was a book discussion program where the audience determined the content and direction of the discussion itself. hoping to take advantage of this new technology, and at the same time expand library services available to the general public, oclc proposed a book discussion program to qube. in a previously released statement, qube vice-president harlan kleiman had stated that the polling capabilities of the qube system should be treated like a "time bomb."4 yet oclc's proposal indicated an interest in exploring these very same devices. this factor, coupled with qube's "closed door" policy toward outside researchers and scholars, seemed to indicate that the home book club research proposal would be rejected.
but qube executives did the unexpected: they agreed to air six home book club programs, one each month. and so, on july 18, 1979, at 7 p.m., the home book club premiered. an interactive book discussion what makes qube unique is its two-way, or upstream, capability. the qube technology is made up of three complementary computers that are used for monitoring, tabulation, and billing purposes. each qube console in a viewer's home has thirty channels to choose from and five response buttons to press when answering questions posed to home viewers on qube programs. by monitoring and tabulating data that show which tv sets are on, which programs viewers are watching, and which response buttons they last touched, qube therefore has a virtually error-free system of audience research. this allows a staggering amount of audience data to be compiled, theoretically every six seconds. apart from the thirty-channel capability of standard television, community programs, and pay-per-view feature films, the most intriguing aspect of qube is its five response buttons. oclc felt that the use of these buttons should be emphasized and the concept of interaction should be fully incorporated into the home book club. at the beginning of each home book club program, home viewers were asked to select, from three alternatives, the opening topic of conversation about the book. after the home viewers had "touched in" their preference on one of the prespecified buttons, the qube polling computer tallied and displayed the results. once the book discussion was under way, the home viewers were given additional opportunities to "democratically" determine whether the panelists should continue in a particular topic area, or move on to new topic areas. if a controversial issue emerged within the course of a discussion, the home book club panelists were encouraged to spontaneously pose interactive questions to home viewers. this form of instantaneous polling was extended to telephone participants, who were also periodically incorporated into the book discussion. a sampling of these opinion-type questions included: from the wifey program, "should sandy have left norman?"; from the metropolitan life program, "is this book too subjective for non-new yorkers?"; from the eye of the needle program, "was the violence portrayed a necessary part of this book?"; from the world according to garp program, "was this a feminist novel?" toward the end of each one-hour home book club program the qube system broke new ground in interactive television history: home viewers selected, from five alternatives, the book to be discussed on next month's program. in addition, home viewers were able to request a copy of the book to be sent to their home at no charge from the public library of columbus and franklin county (plcfc). these two transactions took place with a mere touch of the prespecified button on the qube console. plcfc provided a major contribution to the home book club. once the qube computers had compiled the names and addresses of those viewers who requested next month's book (earlier, all home viewers had been told that their names would be entered in the qube computer if they responded to a book request), the qube computer printed the names on mailing labels. these labels were forwarded to the plcfc books-by-mail office, which then filled each request. the total time from "touch-in request" to "in-home mail delivery" was usually two to three days.
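at bottom, the touch-in mechanism described above is a tally over five buttons. the short python sketch below shows the kind of computation the qube polling computer performed when viewers chose an opening topic or next month's book; the button assignments and response counts are invented for illustration and are not data from the home book club.

    from collections import Counter

    # hypothetical button presses captured during one polling cycle,
    # one entry per responding household (buttons are numbered 1-5)
    responses = [1, 3, 3, 2, 5, 3, 1, 4, 3, 2, 3, 1]

    # the five alternatives offered on screen for this question
    choices = {
        1: "chesapeake",
        2: "the world according to garp",
        3: "eye of the needle",
        4: "wifey",
        5: "metropolitan life",
    }

    tally = Counter(responses)
    total = len(responses)

    # display the percentage for each alternative, as the polling computer did
    for button, title in choices.items():
        count = tally.get(button, 0)
        print(f"{title:30s} {count:3d} ({100 * count / total:.0f}%)")

    # majority rules: the winning title is discussed next month
    winner = choices[tally.most_common(1)[0][0]]
    print("next month's selection:", winner)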
indeed, a form of electronic catalog ordering actually took place each time the home book club program was cablecast in columbus. it should be noted that home book club viewers were also given the opportunity to order the alternative book choices. who watched the home book club? an additional use of qube's two-way capability was also incorporated into the first six home book club programs. prior to selecting and ordering the next month's books, home viewers were asked to respond to a series of demographic-type questions. from these questions, a profile of the typical home book club viewer was compiled for plcfc and qube management. this portion of the program also provided the oclc research department with data with which to explore the market-research potential of an interactive television system. from the beginning of the home book club research project, a few obvious limitations of interactive polling became apparent. first, not all home viewers made use of, or were willing to participate in, qube's interactive technology. response rates ranged from 20 to 85 percent, with an approximate mean rate of 55 percent. second, only one viewer in a multiple-person household could respond. third, it can be logically assumed that certain kinds of people will and did interact more often than others. taking these limitations into consideration, a few generalizations could still be made regarding the home book club audience. the demographic data traced over the first six programs showed the audience to be primarily composed of younger (below thirty-nine years of age), college-educated (65 percent had college or postgraduate degrees), middle- to upper-income (60 percent earning $25,000 or more per year) females (approximately 70 percent of the interacting audience). these figures should not surprise anyone who is either familiar with previous profiles of general library users or who may in passing conjure a guess as to what kind of person might be interested in viewing a televised interactive book discussion. a closer inspection of the instantaneous audience demographics, however, led to some disappointing implications. can a democratic television program survive? as was pointed out earlier, home viewers were permitted to select the next month's book at the conclusion of a program. this was strictly a democratic process where the majority ruled. the world according to garp, the premier home book club book, was followed by eye of the needle and wifey for programs two and three respectively. the qube computer indicated that each of these programs was viewed by approximately 175 households, or almost 420 individuals. in a competitive structure where there are twenty-nine television program alternatives from which a viewer can choose, qube, oclc, and the plcfc felt that a successful programming concept had been born. qube management enthusiastically reported that the home book club had achieved audience levels that at times rivaled their more extravagant and broad-based entertainment/interview program, "columbus alive." this enthusiasm was short-lived as audience-level figures from program four came in. at the end of program three (wifey), the audience selected james michener's weighty novel chesapeake for the next month's program. the respectable figure of approximately 375 viewers for wifey dwindled to slightly less than 210 viewers for chesapeake. and to make matters worse, the audience-level figures did not improve for programs five and six.
there are several alternative and sometimes complementary explanations for this substantial loss in audience. first, many viewers may not have been able to get through the some one thousand pages of "maryland's eastern shore" history in chesapeake, and thus chose not to participate in the home book club. second, the new fall syndicated programs offered at that time by local network affiliates may have led many viewers to choose alternative programming. additional hypotheses can also be gleaned from the interactive demographic data: whereas in programs one through three approximately 40 percent of the audience indicated their educational level to be either some college or below, only 20 percent of the chesapeake audience (program four) fell into this category. this statistic remained constant for programs five and six of the home book club. in the democratic television environment that the home book club provides, what happens to the minority interest group? could this democratic television system be systematically eliminating specific viewer types? it might be that the outvoted minority-group book reader can withstand being overruled just so many times before ceasing to participate. what recourse does this minority interest group have other than to be dominated by higher-educated viewers who heavily stuff the electronic ballot box in favor of their own book preferences? quite clearly the recourse for the minority interest group was to select a competing television program, as evidenced by the declining audience-level figures. the loss of these viewers becomes especially disheartening because this particular audience segment may represent a group of individuals who never before participated in a book discussion. the future of the home book club given the somewhat disappointing results of the home book club reported thus far, one would expect the program to be recorded in history as a noble, but unsuccessful, attempt at interactive television programming. the books-by-mail program did send out some 760 paperback books as a result of the home book club (a 79 percent overall increase), and twenty-six new library cards (not a prerequisite) were issued to home book club viewers. but the fact remains that a for-profit company such as warner amex most definitely cannot justify the continuation of a program with audience ratings as low as the home book club's. ... or can it? not only has the home book club been continued (it is now in its twentieth month), but a morning edition of the home book club premiered in june 1980. what explanations can account for this somewhat bewildering corporate behavior? on a very idealistic level, warner amex could be fulfilling its obligation to serve all facets of the columbus community. the home book club certainly offers a viewing alternative to an often neglected segment of the viewing population. oclc, inc., and public libraries throughout the united states applaud this kind of responsible programming. on a more practical level, there may be other strategies behind the renewal of the home book club contract. a 1978 study completed by the argus research corporation concluded that "no profits are expected from qube until the system is successfully replicated in cities other than columbus, and at considerably lower costs."5 to replicate the qube system, warner amex must expand its cable territory into new communities throughout the united states.
this can at times be a very difficult task. the right for a company such as warner amex to wire a local municipality to its qube system is determined by local government. normally, a city council reviews and contrasts alternative cable systems in terms of the services each system proposes in return for franchising rights. the final decision usually is based on costs, the programming made available, and, most importantly, the kind of community service the cable system proposes to extend to its viewers. one definition of extended community service might be a televised book discussion program that involves the local public libraries. the alluring notion of an interactive book discussion may even be more appealing to community-minded city council members. in fact, qube is currently using an edited composite tape of home book club highlights in its franchising efforts. the success of such efforts remains to be seen. whether warner amex's motives are community- or commercial-minded, the fact remains that other communities may have the opportunity to develop a program of this kind. since local governments can legally specify what services the cable company must provide, the inclusion of a televised book discussion program could become part of a contract fulfillment. advice for those interested in developing alternative television programs for special-interest groups: don't be caught napping when your national cable representatives come knocking on your city council door. as for the home book club, qube and the public library of columbus and franklin county are working at reestablishing a solid baseline audience. as is the case for any television program, promotion is a key ingredient for success. when viewers were asked where they first found out about the home book club, more than half indicated they obtained program information through the free qube program guide. approximately 15 percent heard from a friend and 12 percent found information at the public library. a coordinated promotional effort is highly recommended for a public-service program of this nature. the future of interactive television qube must be thought of as more than just a two-way television system. in fact, it is more than interactive television. qube is actually a computer hooked to a cable communication system. that cable communication system is a network providing a pathway for a wide variety of services from a central facility to home subscribers. in the future, not only will systems such as qube provide "local loop" communications for these services, but they undoubtedly will be interconnected by satellite with other similar systems throughout the country and indeed the world. the five buttons on the existing qube consoles are just the first evidence of the future possibilities of interactive broadband communications systems currently delivering television. because the early applications of cable were to provide entertainment television, and more often than not were provided by people in the television business, cable television is naturally oriented toward the entertainment business. but the future of these broadband communications systems is in interactive retrieval of information as much as it is in entertainment. this goes far beyond the simple polled system so frequently used in a two-way mechanism: the talk show host asks how many people have read a particular book, the audience responds, and the net result has no effect on the program itself.
it is also a lot more than interactive television: the host asks what you want to discuss, the audience says the plot of the book, and the answer has an effect on the outcome of the show. in fact, these broadband communications systems have the potential for placing at the fingertips of americans a vast storehouse of information services about, for example, the best auto routes to your favorite spots, baby care, banking, buying a house, dressmaking, good buys, hobbies, jobs, legal facts, properties for sale or rent, sports scores, technology, and wine. as qube expands into its qube iii system with more than a hundred channels of services, it will be technically positioned to support all aspects of this burgeoning information age.6 besides simple information retrieval, a qube subscriber will be able to conduct banking and shopping transactions, to provide information such as who is on what side of community issues, and also (incidentally) to watch television. if all of that does not seem like enough, remember that cable "is really a very large pipe through which any variety of electronic information can be pushed." passive home security, fire alarm, and energy management are also services either in existence or contemplated by a number of cable operators. for that matter, there is no reason to believe that computer processing services can't be made available to individual subscribers. a subscriber could call up the program to balance his checkbook, to perform his small-business payroll calculations, or to complete a statistical analysis of data for a school project. most people thought (as we initially did) that interactive cable (qube) means interactive television. but oclc's research has shown that interactive television programs:
1. serve as an initial introduction to naive audiences of what a truly interactive system is all about;
2. are difficult to implement;
3. really aren't democratic;
4. are basically polling devices.
it has been said that railroads went out of business because they insisted that they were in the railroad business and wouldn't admit that they were in the transportation business. if cable operators insist that they are in the television business, they may well miss the opportunities that are possible in the communications business or, in fact, in the information business. by the same token, if libraries miss the significance of what cable television is bringing to their business, their role in the community will be diminished and libraries may go the way of railroads. modern communications and computers offer an opportunity for libraries to become the information choice in their community. in the near future, applications such as the home book club may well be a way to provide increased accessibility of library services to library patrons, and to "condition" those patrons to the coming electronic nature of libraries. over the long term, libraries, if they have the courage and the foresight, can be the focus of the coming information and telecommunications revolution. the message is quite clear: opportunities abound.
references
1. john wicklein, "wired city, u.s.a.: the charms and dangers of two-way tv," atlantic monthly 243:35-42 (feb. 1979).
2. warner amex represents a newly formed corporation resulting from the merger of warner communications and american express.
3. jonathan black, "brave new world of television," new times 11:41 (24 july 1978).
4. ibid., p.49.
"warner cable's qube: exploring the outer reaches of two-way tv," broadcasting 95:28 (31 july 1978). 6. "two-way converters hot ticket at ncta exhibits," broadcasting 97:72 (26 may 1980). an informal survey of the cti computer backup system joseph covino and sheila intner: great neck library, great neck, new york. in order to help decide whether or not to purchase computer backup systems from computer translation, inc. (cti), * for use when the clsi libs 100 automated circulation system is not operating, great neck library conducted an informal survey of libraries using both systems . eleven institutions, including both public and academic libraries, responded to a brief questionnaire. they were asked what size cti system they had purchased and why, how easily it was installed, how well it performed, how it was maintained, and if clsi acknowledged that the addition of the backup did not affect their libs 100 maintenance agreements . before summarizing the responses, the structure of the two systems and how they interact should be outlined. clsi libs 100 the clsi automated circulation system consists of a stand -alone minicomputer console with local and/or remote terminals connected to it through individual ports by means of electrical and/or dedicated telephone line hookups. when it operates, the terminals are online and interactive with the database, which is stored on one or more multiplatter disc packs. cti backup the cti backup system is based on an apple ii microcomputer with two minidisc drives, which take 5 1/4-inch floppy discs, a tv monitor, and a switching system that can be connected to the libs 100 console or its terminals . the cti system can also be used alone. when the libs 100 is down (inoperative), the cti system is connected to a terminal, and data is recorded on its discs for later dumping (data entry) into the database via a port connection . it *cti is a profit-making company wholly owned by brigham young university. the cti backup system was originally developed to support the clsi"installation at byu. lib-mocs-kmc364-20131012125101 174 oclc's database conversion: a user's perspective arnold wajenberg and michael gorman: university of illinois library, urbana-champaign this article describes the experience of a large academic library with headings in the oclc database that have been converted to aacr2 form. it also considers the use of lc authority records in the database. specific problems are discussed, including some resulting from lc practices. nevertheless, the presence of the authority records, and especially the conversion of about 40 percent of the headings in the bibliographic file, has been of great benefit to the library, significantly speeding up the cataloging operation. an appendix contains guidelines for the cataloging staff of the university of illinois, urbana-champaign in the interpretation and use of lc authority records and converted headings. the library of the university of illinois, urbana-champaign, is the largest library of a publicly supported academic institution, and the fifth largest library of any kind, in the united states. in the last year for which figures are available (1979-80), the library added more than 180,000 volumes representing more than 80,000 titles. the library is currently cataloging more than 8,000 titles a month; more than 80 percent of the records for these titles are derived from the oclc database (library of congress and oclc member copy). 
because our cataloging is of such volume and because we are actively engaged in the development of an online catalog, we decided to use the second edition of the anglo-american cataloguing rules (aacr2) earlier than the "official" starting date of january 1981. we began to use aacr2 for all our cataloging in november 1979. this early use of aacr2 has led to two consequences. first, we now have oclc archival tapes representing about 150,000 titles cataloged according to aacr2. this represents a valuable and continuously growing bibliographic resource that can be used without modification in our future online catalog. second, we have a considerable and unique collective experience in the practical application of aacr2. the minor problems of working with aacr2 in an aacr1 plus superimposition environment (until january 1981) were more than compensated for by these two positive results. oclc conversion with our practical background in the use of aacr2 and our continuing need for a high volume of cataloging, we were, naturally, keenly interested in the (to our mind) progressive decision of oclc to use machine matching techniques to convert the form of name and title headings in its database, the online union catalog (oluc), to conform to aacr2. we recognized the limitations of the project, essentially those defined by the capabilities of the computer for matching character by character, but felt that this was a major venture that would, when completed, produce major benefits. what follows is an assessment and analysis of the results of the project in the light of the experience of a library that is dedicated to achieving high-volume, quality cataloging. we deal with the lc authority file as well as the oclc headings because the lc file was the basis of the project and because, from the practical point of view, the two files are complementary aspects of the same service. the greatest value of the conversion, and its greatest claim to uniqueness, lies in the sheer size of the project in terms of headings checked and changed. our catalogers, and others who work with current materials, estimate that more than 40 percent of the name and title fields we use in our current cataloging have a w subfield indicating that the name or title has been changed to its aacr2 form. since oclc estimates that 39 percent of the name and title fields were affected by the conversion, it would appear that the headings that were changed are the headings that we are more likely to use. in other words, the project has brought us more than a 39 percent benefit. we are also greatly encouraged to find that the number of headings coded dn (meaning aacr2 "compatible," or, more bluntly, lc's modifications of the provisions of aacr2) is a very tiny minority of all converted headings. this means that when, in the future, this policy of "compatibility" is lessened or dropped, there will be relatively few changes to be made. lc authority records we also benefit from the presence of lc authority records in the oclc database when we establish headings that are new to our catalogs. there is one problem with the use of these records, which was revealed by a sample of new university of illinois authority records (see table 1). this sample of 368 new university of illinois records reveals that lc authority records are available relatively rarely for new headings.
this is not surprising as these new headings are established most often as part of the process of original cataloging, which, almost without exception, occurs in our library only when oclc copy is not available.

table 1. recently established headings

                                      no authority   record     record     record
                                      record         coded c*   coded d*   coded n*
given name headings                   13             5          0          1
single surname headings               212            26         2          2
  (initialisms expanded)              (132)          (7)        (1)        (2)
compound surname headings             29             12         0
  (initialisms expanded)              (2)            (0)        (0)        (0)
single surnames plus uniform titles   3              0          0          0
general corporate headings            34             12         0          0
general headings with subdivisions    7              2          0          0
government headings                   4              2          0
total                                 302            59         2          5

*key: c, in subfield w, indicates an aacr2 form as established by the library of congress; d indicates an aacr2 "compatible" form as established by the library of congress; n indicates that the input operator could not determine which set of rules governed the form of the heading.

it seems to us to be unfortunate that member libraries cannot contribute their authority records to the oclc database. our experience suggests that the online authority file would grow very rapidly if that were the case. to put it another way, the oclc conversion provides an enormous and valuable resource of aacr2 headings. it did not, and could not, provide new authority information. oclc will be complementing its valuable work in upgrading the retrospective file when it devises and implements a scheme for making available authority records for new headings derived from a wide range of sources. since so many headings were converted to aacr2, it may seem churlish and ungrateful to complain that more was not done. the following descriptions are not intended to form part of an attack on oclc's project or to minimize its achievement. form subdivisions the project failed to delete form subdivisions (such as "liturgy and ritual" and "laws, statutes, etc.") from added entry headings and subjects. the program correctly deleted them from main entry headings, but the inconsistencies resulting from their retention elsewhere make the job of ensuring consistency in a large copy cataloging operation that much harder. this inconsistency in treatment is illustrated by examples 1 and 2. example 1 originally was entered under
110 10 illinois. $k laws, statutes, etc.
the program correctly changed the main entry heading to
110 10 illinois
and added a subfield w, coded mn (the m indicates a conversion by machine to the aacr2 form; the n means "not applicable," and indicates that there is no title element in the heading). example 2 has as main entry
110 20 illinois community college board
but has as added entry
710 10 illinois. $k laws, statutes, etc. $t illinois public community college act
under aacr2, the subfield $k, "laws, statutes, etc.," should not be present in the heading. unfortunately, the program looked only at 110 fields, not at 710 fields, and so the heading was not corrected in the conversion. it must therefore be edited manually by every library that uses the record. program problems our direct use of the online authority file is somewhat hampered by the programming oversight that makes it impossible to search uniform titles.
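the manual cleanup of added entries described under "form subdivisions" above is the sort of change that today could be scripted. the sketch below, in python, assumes the pymarc library and a hypothetical file of oclc-derived records; it drops the "laws, statutes, etc." form subdivision ($k) from 710 fields, the edit the conversion program performed only on 110 main entries. the file names and the decision to rewrite every record are assumptions for illustration, not part of the oclc project.

    from pymarc import MARCReader, MARCWriter

    INPUT = "converted_records.mrc"    # hypothetical file of oclc-derived records
    OUTPUT = "cleaned_records.mrc"

    with open(INPUT, "rb") as infile, open(OUTPUT, "wb") as outfile:
        writer = MARCWriter(outfile)
        for record in MARCReader(infile):
            if record is None:
                continue  # skip records that fail to parse
            # look at corporate added entries only; main entries (110)
            # were already handled by the conversion program
            for field in record.get_fields("710"):
                # drop the form subdivision, as aacr2 requires
                while field.get_subfields("k"):
                    field.delete_subfield("k")
            writer.write(record)
        writer.close()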
of course, uniform titles that are accompanied by a 100 field (notably in music) can be retrieved by an author search, but those without 100 fields (anonymous classics, sacred scriptures, etc.) are virtually inaccessible. there were a handful of specific instances in which the specifications were inadequate or the programs seem to have malfunctioned. these resulted in some oddities such as the conversion of the subject "jesus christ" to "sermon on the mount" and the (surely not politically motivated) switch from "u.s. department of state" to "voice of america." oclc has been scrupulous in identifying and publicizing these errors. they are few in number and, though conspicuous, have rarely caused us many problems. as can be seen, the problems caused by what we see as failures on oclc's part are few and affect few cataloging circumstances. the remaining problems either result from the decisions and actions of the library of congress and, hence, are wholly or mostly out of oclc's control, or are of such a nature that they cannot be solved by computer matching techniques without extensive editorial intervention. whether such human intervention is possible and, if possible, cost-beneficial is not for us to say, though it must be recognized that to transform the oluc to pure aacr2 conformity would be a herculean task. that task would undoubtedly involve many of the hundreds of thousands of records that are seldom or never used. serials the most troublesome example of the kind of problem that cannot be resolved by machine matching is that of serials. the oclc conversion project was, quite properly, not concerned with choice of entry (aacr2, chapter 21). this seems a simple and clearly defined decision. when we come to consider serials, this clear distinction between choice and form of entry becomes blurred. the major change brought about by aacr2 (rule 21.1b2) is that many serials previously entered under the heading for a corporate body are to be entered under their titles. in fact, the great majority of serials will now be entered under title. the upshot of this is that the citation (or form of heading) for a serial changes from, for example,
national society for medical research. bulletin
to
bulletin / national society for medical research
the restriction of the oclc project to forms of heading means that most serials in oluc will be found under headings whose form may be correct but which are inappropriate for citations. this problem, which, of course, cannot be resolved by computer matching, has led to difficulties for us in copy cataloging, because a degree of expertise is needed to apply aacr2 rule 21.1b2 and to distinguish between the majority of serials where the 110 field should be changed to a 710 and the small minority where the 110 field should remain as it is. since most serials are to be entered under their titles, it occurs to us to suggest that the oclc conversion project could have changed all 110 fields to 710 fields in records identified as serials. by that method, the majority of serials would be correctly entered and the potential for mistaken citations greatly reduced. multiple personal names persons who write under more than one name (real names, pseudonyms, etc.) and who are not primarily identified by one of those names (aacr2 22.2c3) pose a special problem. under the provisions of aacr2, such persons are to be represented in the catalog (and the database) under two or more names.
despite the fact that "creasey, john" and "marric, j. j ." and" ashe, gordon" are all names used by the same man, they will appear as separate headings from now on. under aacrl plus superimposition one of those names ("creasey, john") was used as the heading for all works. within the confines of the oclc project, there was no method available to distribute the various records under the various headings. it occurs to us that some method based on matching the name found in the 245 $c subfield with the 100 field might, at least, have resulted in the project recognizing probable cases calling for multiple headings. for example: 100 a hibbert, eleanor 245 a bride of satan i $c jean plaidy 180 journal of library automation vol. 14/3 september 1981 could alert the system to a case for change. we recognize that this would call for more sophisticated computer matching techniques and that it would call for editorial intervention. a good example of the problem this has caused for us is the case of the danish author karen blixen. she wrote under that name and under the pseudonyms isak dinesen and pierre andrezel. records in the database that were added before 1981 will use "blixen, karen, 1885-1962" as the heading for all her works including those published under pseudonyms. since the blixen heading is a perfectly acceptable aacr2 form, the conversion program codes it as an aacr2 heading, which it is for the blixen books but is not for those published under other names. the authority record (example 3) includes a note identifying both pseudonyms as valid aacr2 headings, but, of course, the programs as written cannot interpret such a note and match them with appropriate records. corporate name changes corporate bodies present a similar problem when one is dealing with those that have changed their name. until1967, the library of congress used the latest name of such bodies with see references from the earlier names. both editions of aacr require that works issued under the earlier names be entered under those names and works issued under the latest name be entered under that name, the various names being connected by see also references. however, records in the oluc for earlier works cataloged before 1967 will show those works entered under a later name. for bib record enter t>1b display recd :o: end r~c: stat : n entrd : 80 11 :::1 u!;eooj : 80 1121 t·,p~ : = b1b lvl : g~vt a9n : ~ang : suurce : ~· t tt:o : 004 inlc: :.. en-: lvl : n h~ad ref: a h~ad : c•: i i~ ;;\ d s t ol. t •j s : ;1_ n-3-n~~.? : o:t mod h:~ c : a•j t h status : a 1 0 10 n 7~0077 1 9 '2 1(•0 10 bl1 ~~'' t:-:·~r:"no d t:::=:3c.j '=~62 . w r.001790::1~-:t.·'l.c.:t.nn----n r.n n :..; 4(h) 10 andrb'i-::'1?1· pj."'r' \!' w rp;.jo:::790::15a•:ht.snn----nnnd 4 4 0c"i t ll d1 n,~ '='1::n. ls·"l ~ w nl"l•) 37"''021 ~;,:toln-tnr.----nr.r.d 5 /;..(.7 th.: f-(.•11 r.• w lnq ps~ •jd•:•n•r•s. ~r i.· val l •j aa(r 2 h~adlrt:a~ : a ar.,jrbe,;:el, r" l ·:?r 1"' €'. u:::::5l ctt,~·~ .a d l l1(•!5-er. ~ i !.•-:\1 1 1f::35j '="162 w n 0047902 15>i<:ln0l nr. -·--r•nr.n no holding~. i n uiu for iioldh~c·. en ter .1h depress disf'lay recd ~·end f'\~ o: r t .:... t: ,en t r d: 7507 11 used : 8 10725 t,pe : ·"i:l btb lvl : m c• o vt r ~•ji• : _ l~r,9 : ..:tn9 ::::.;:, tj rce : ij i ll•js.: r~::pr: [i•·= 1·11: r c.:onf p•jb: _t-rr. : __ oat tp : _ m/f/b : _ _ _ ind ;.. : _mod rec : f~s.ts•: t-.r: _ c·:.or.t : d-?~·: : lr:.t lvl: 0dt~~ = 196:?.. _ l 0 1 0 63-11618 2 040 ~ orl 1 oc.l ~ ~.~: . ::: 0'::·0 0 pz 3. 
because those later names are valid aacr2 headings in terms of their form, they are coded cn (i.e., aacr2 validated) by the program, even though they may not be the right headings for the records to which they are attached. a good example of this problem is that of the "lutheran church-missouri synod." an earlier name of this religious body is "deutsche evangelisch-lutherische synode von missouri, ohio, und andern staaten." unfortunately, the authority record (example 4) does not even show that the earlier name is valid according to aacr2. the conversion program, on encountering the earlier name used as a heading, would change it to the later name and code that form as being the aacr2 heading.
[example 4. lc authority record for the lutheran church-missouri synod.]
another example of the problem is:
chamber of commerce of the united states of america. international department
this is identified as the aacr2 form (example 5) but, in fact, the department has changed its name to "international division."
[example 5. oclc record showing the heading chamber of commerce of the united states of america. international department.]
lc practice another problem we have encountered is that of the literal-mindedness of the computer programs in matching like with like. this problem is compounded by inconsistencies resulting from variations in lc cataloging practice. an example of this problem is that of the nigerian author chinua achebe. the heading "achebe, chinua" is marked as being aacr2 despite the fact that the authority record shows that he was born in 1930.
university" (the pre-aacr2 form) . since the aacr2 form as established by lc looks very much like the new name, "university of illinois at urbana-champaign," the authority card is very difficult to understand. nothing short of revising the note, and/or the use by lc of a less confusing qualifier than "(urbana-champaign campus)," will make the authority record intelligible. an example of how lc practice has affected the oclc program adversely is in the area of the so-called compatible headings. these are instances of when lc has chosen to depart from the provisions of aacr2 for one reason or another. leaving aside the utility and morality of such a policy, it presents a considerable problem to those of us who use oclc oclc's database conversion/w ajenberg and gorman 183 records. the example that follows is of the worst of these "compatible" practices. lc has decided to ignore the common form of name for persons who are not "famous or published under an american imprint." 1 thus, the writer p. c. boeren would be recorded as "boeren, p . c. (petrus cornelis), 1909" under the provisions of aacr2, but, because boeren is neither famous nor american, the "compatible" heading will be "boeren, petrus cornelis, 1909." this heading is not acceptable in an aacr2 catalog. scr-e-er. 1 qf 4 for bib record enter btb display recd senli rec stat: c entrd : 801122 us~d : 810718 fype: z bib lvl : ~ govt a9r : lang: source: site: 038 inl~: ~ enc lvl: n h@ad r~f: a he~d: ~ ~ie~d status : a name : a mod rec : auth ~t~tus: a 1 010 n 7904c•l04 2 110 20 untvt:-rsit-.~ of illtnc•ls chi•: a9o cir·cl€, artd the univer·sit -..-· .:·f illjr .. :.ts at th et-tedto:·a.l center~ o1e-r·f=' reorganized into equal administr·ative ca~puses with1n a university s . stem with a •:entral admintstr·attve staff in llr· ba,r,a . a wor•-s p•jbllshed by th~s~ b(•dles after the reorgantzatton tn 1966 are found under a un1v~rs1tv of llltnots at urbana-champat9n . a un1vers1ty of 11 l1no1s at ch1ca9o c trcle. a un1vers1ty ot lll1nots at the medical center . a untver·st t. of llltr.ots (srstem) a subject entry: wor• s about these bod1es are ~nte-red •jnder th ~?name o:·r· n>3m~s tn e .:..::: 1stence d•jrln~ the 1ate5-t period fc•r wht•:t. sijt•je:t c.·.vera9-e ts 91v~r.. in the c ase wher~ the required name is represent~d 1n tht~ ~ at~lo~ onl, un·j~r ~ later form of th e• rtame. ~r.tr··r 1s ma.j-e un·j ~r· tht.:o la t ~r f.:·r·m. w n010:3 t061c·a'lnur.n--nnrtn 14 667 llltno1s indus t r1al university. w n004790s=a4dnann----nnnn example6 184 journal of library automation vol. 14/3 september 1981 more, it is quite possible that if boeren' s works are published in america or if lc suddenly decides that boeren is "famous," the heading will be changed. this is an infrequently encountered problem for us but one where lc's peculiar policies have created problems that have nothing to do with oclc or aacr2. conclusion the problems that we have cited above are real but not numerically significant (except in the case of serials and multiple personal namesneither of which are under oclc's control). they are far outweighed by the tremendous value of the more than 40 percent of oclc headings that have been converted to their aacr2 form. the oclc conversion has made it possible for us to do aacr2 cataloging more quickly than in the period november 1979-december 1980. we have issued guidelines to our professional, paraprofessional, and clerical cataloging staff who deal with all the headings we encounter in using oclc (see appendix). 
problems such as those we have described are dealt with in our guidelines, and in practical terms now in day-to-day work. they may take some extra time, but overall our cataloging operation has been greatly speeded by oclc's conversion . reference 1. cataloging service bulletin , no.6:6(falll979) appendix university of illinois library at urbana champaign copy cataloguing guidelines authority records lc authority records, now available on oclc, can be very helpful in determining the correct aacr 2 form of headings, and should be cited on authority cards we prepare, when we use them in establishing headings. the tag numbers used on authority records sometimes have different meanings from the numbers used on bibliographic records. the meanings are: lxx heading 4xx see reference (i.e. from the form in this field to the form in the lxx field) 5xx see also reference (i.e. from the form in this field to the form in the lxx field) 6xx notes (e.g. the authority used by the lc cataloguer) each field concludes with a w subfield, consisting of 24 characters indicating in coded form various types of information about the heading. the 13th character, the 3rd past the six-character date, consists of one of five letters indicating the rules governing the form of heading in that field. the codes are: oclc's database conversion!w ajenberg and gorman 185 c aacr2 d compatible with aacr 2 b aacr, 1967 ed. a earlier rules (e.g., ala rules of 1949, etc.) n not applicable or not applied here is an example of an lc authority record, omitting the fixed field and some of the references: 010 n 790558820 110 20 state university of new york at buffalo. w n008801115aacann----nnnn 410 10 buffalo. b university w n002791105aaaann----nnna 41010 new york (state). b state university, buffalo. w n009801115aaaann----nnna 667 the following heading for an earlier name is a valid aacr 2 heading: university of buffalo. w n007791105aanann----nnnn when oclc carried out its aacr 2 conversion project, the data about the rules encoded in subfield w was added to headings in bibliographic records, if those headings were altered by the conversion. for bibliographic records in oclc, subfield w contains 2 characters, each of which must be one of the following: c (for aacr 2 heading) d (for accr 2 compatible heading) m (for machine converted heading) n (not applicable or not applied) the first character applies to the name portion of the heading; the second, to the title portion. obviously, in many cases there is no title portion, in which case the second character will ben. the code m (machine converted heading) is used when a heading is altered directly by program, rather than being extracted from an authority record. an example would be the elimination of subfield k laws, statutes, etc. 1. use of subfield win cataloguing since oclc does not want member libraries to apply the letter codes in subfield w for their original input, the presence of a cord in subfield w should always indicate an lc decision identifying an aacr 2 or aacr 2 compatible heading. supply subfield w for all cataloguing to be added to oclc's data base. 
the codes to be used are given in illinet's information bulletin #92, from which this table is copied: 1 aacr 2 form found in on-line lc name-authority file 2 aacr 2 compatible form in on-line lc name-authority file 3 aacr 2 form supplied by inputting institution with copy in hand and piece not in hand 4 aacr 2 form supplied by inputting institution with piece in hand 5 author or title portion of heading not converted to aacr 2 form. this subfield (#w) is always the last subfield in the field. it must contain a two character code. the first character applies to the name portion of the heading; the second character applies to the title portion of the heading. if the heading is a name heading and does not include a title portion, use "n" as the second part of the code. if the heading is a uniform title heading, use "n" as the first part of the code. examples: 700 10 day lewis, c. #q (cecil), #d 1904-1972 #win 600 10 schmidt, h. r. #q (heinrich rudolf) #w 4n 130 00 bible. #p n.t. #s authorized. #f 1974. #w n4 accept headings coded c in subfield was correct aacr 2 headings, unless the heading is for an author entered under surname who writes in a non-roman alphabet language. for such 186 journal of library automation vol. 14/3 september 1981 authors, use the form given only if it is a standard romanization of the name in the original alphabet. if a form other than the standard romanization is used, substitute the standard romanization, and trace an x ref. from the form coded c. 2. lc author headings without dates lc recently announced that it will not add dates to a heading already established without dates, unless the dates are needed to resolve a conflict. when there is no conflict, the dates will be recorded in the authority record in a 6xx field , but will not be added to the heading. dates will be routinely added to newly established headings at the time the headings are established, if the information is readily available. lc codes such headings c, not d, because aacr 2 does not require that a date be added to the heading, except to resolve a conflict. if such an lc authority record is available when a heading is being established, use the lc form , without adding dates to the heading, unless dates are needed to resolve a conflict in the new catalogue. record the dates on an authority card. if lc authority is not available wh(;ln a heading is being established, use dates in the heading if the information is readily available. if, later , lc authority is found that omits date from the heading, do not change the heading as already established for the uiuc new catalogue. since records in oclc may contain headings without dates for persons we have established with dates, some conflicts will be generated. these should be resolved by catalogue maintenance staff, who will add dates in pencil to headings on new cards that lack dates, but are otherwise identical with headings in the new catalogue. such conflicts in the machine record will be cleaned up gradually, after fbr is up . 3. acceptable dn forms headings coded d in authority records (dn in bibliographic records) are the aacr 2 "compatible" forms. in many cases, the difference from aacr 2 is trivial, and the form can therefore be used. in such cases, if lc authority is available, use the form as established by lc, and record the information on an authority card. if lc authority is not available when a heading is being established, follow aacr 2. 
if, later, lc authority is found that establishes a "compatible" form , do not change the form in the uiuc new catalogue to the lc "compatible" form. it will sometimes happen that "compatible" forms will be found on records in oclc (coded dn, usually) . such headings may be used only if they fall into one of the categories listed below . this will sometimes result in "compatible" forms and true aacr 2 forms both being used in the new catalogue. in some cases, the two forms can be interfiled; in other cases, catalogue maintenance staff will need to correct "compatible" headings in pencil. acceptable dn form s are: a . lc will omit hyphens between forenames if the heading has been established without hyphens, even though rule 22.102 would require hyphens. use the lc form , if found . catalogue maintenance will interfile headings identical except for the presence or absence of hyphens. b. lc will continue to place the abbreviation ca. after a date in the heading for a person, if the heading has already been established in that form , even though rule 22.18 specifies that the abbreviation should precede the date. use the lc form, if found . catalogue maintenance will interfile headings identical except for the placement of the abbreviation ca. c. lc will not correct the language of an addition to a personal name heading; i. e . will not change to the language used in the person's works. (e.g., a heading already established as louis antoine, father will not be changed to louis antoine, pere, even though the latter is the author's usage.) use the lc form, if found . catalogue maintenance will correct conflicts in pencil, to the lc form. d. lc will not change a personal name heading to a fuller form of the name, even if the shorter form is not predominant. use the lc form , if found. catalogue maintenance will correct conflicts created by personal name headings that vary in fullness to the form to which a "see" reference has been made. if there is no "see" reference, catalogue oclc's database conversion!w ajenberg and gorman 187 maintenance will refer the conflict to the appropriate cataloguing service. e. lc will continue to use additions to surname headings supplied by cataloguers, for headings already established with such additions. use the lc form, if available. catalogue maintenance will resolve conflicts by adding qualifiers in pencil to headings that are otherwise identical with the forms with qualifiers. f. lc will continue to use titles of honor, address, or nobility with headings that have already been established with such titles, even though the authors do not use such titles. use the lc form , if found. catalogue maintenance will resolve conflicts by adding qualifiers in pencil to headings that are otherwise identical to the forms with the qualifiers. g. lc will not use initial articles in uniform title and corporate headings, even when they are required by aacr 2. we will follow lc practice in this, and use the lc form when found. catalogue maintenance will interfile uniform title and corporate headings that are identical except for the presence or absence of initial articles. h . lc will continue to use the abbreviations bp. and abp. for personal name headings that have already been established with those abbreviations used as qualifiers, instead of spelling out the qualifiers in full. use the lc form, if found. otherwise, follow aacr 2 and spell out "bishop" and "archbishop". catalogue maintenance will resolve conflicts by correcting in pencil to the form spelled out in full. i. 
lc will not add terms of incorporation to corporate headings already established without them, nor delete them from corporate headings already established with them, even though lc interpretation of aacr 2 would require such adjustment. use the lc form, if available. otherwise, retain terms of incorporation in corporate name headings only if the term is an integral part of the name, or if, without the term, it would not be apparent that the heading is the name of a corporate body. catalogue maintenance will resolve conflicts by adding, in pencil, terms of incorporation to headings identical to established forms except for the absence of such terms. j. lc will not add geographic qualifiers to corporate headings established previously without such qualifiers, even though they have chosen to apply the option in rule 24.4 that allows qualifiers to be added when there is no conflict. use the lc form, if available. catalogue maintenance will resolve conflicts by adding qualifiers in pencil to headings identical to established headings except for the absence of such qualifiers. k. lc will not reduce the hierarchy of far eastern corporate headings, established before 1981, even though aacr 2 rules would require that intervening superior bodies would be omitted from the heading. use the lc form , if available. catalogue maintenance will refer conflicts to the appropriate cataloguing agency for resolution. the asian library cataloguer is the final authority for such headings. l. lc will not change the capitalization of acronyms and initialisms to conform to the usage of the corporate body, if the acronym has already been established with a different capitalization. use the lc form, if available. catalogue maintenance will resolve conflicts by interfiling acronyms and initialisms that are identical except for variations in capitalization. m. lc will not supply quotation marks around elements in a corporate heading that has already been established without quotation marks, even though this varies from the usage of the body. use the lc form, if available. catalogue maintenance will resolve conflicts by interfiling headings identical except for the presence or absence of quotation marks. n. if lc is attempting to resolve a conflict (i.e. two different people with identical author statements), and neither dates nor expanded initials are available to resolve the c:onflict, lc will add an unused name in parentheses to the heading if the information is available. e.g.: established heading: smith, elizabeth new author: elizabeth smith 188 journal of library automation vol. 14/3 september 1981 (new author's full name, ann elizabeth smith, is available) lc heading: for new author: smith, elizabeth (ann elizabeth) use lc forms if found in name authority file. catalogue maintenance will refer problems to the appropriate cataloguing agency. 4. unacceptable dn forms in a few cases, the aacr 2 "compatible" forms, coded d in authority records and dn in bibliographic records, are unacceptable in the uiuc library. instead, we will follow aacr 2 in constructing these headings, and record the lc form on authority cards when they are found. we will also make references from the lc forms, if they would file differently from the forms we use. for many of these, catalogue maintenance will have to refer conflicts to the appropriate cataloguing agency. in a few cases, catalogue maintenance can make the corrections on the cards. the unacceptable dn forms are: a. 
lc will sometimes, but not always, continue to use headings established prior to 1981 with names spelled out in full , when the authors represent some of those names with initials. follow aacr 2 in constructing headings for these names. use initials in conformity with the authors' usage, and add the corresponding full names in parentheses, in subfield q, when the information is available. whenever an element in a compound surname or a first forename is represented by an initial, make a reference from the fuller form. usually, a reference will not be needed if a forename other than the first is represented by an initial. b. lc will continue to add " pseud." to personal name headings already established with that qualifier. do not use the qualifier "pseud." when establishing personal name headings, and delete the term from oclc records that use it, including records added by lc. catalogue maintenance will resolve conflicts by lining out the qualifier "pseud." in headings. c. lc will continue to add 20th century fl. dates to personal name headings already established with such dates. do not use 20th century fl. dates when establishing personal name headings, and delete such dates from oclc records that use it, including recorded added by lc. catalogue maintenance will resolve conflicts by lining out 20th century fl. dates in headings. 5. 87x fields one part of the aacr 2 conversion project by oclc was the addition of fields tagged 870, 871, 872, or 873. these fields contain the pre-aacr 2 forms of headings that were changed by the conversion. oclc participants can add 87x fields to records they enter into the data base. however, we will not supply these fields in our cataloguing. 6. authority cards prepare authority cards whenever references are needed, and whenever an lc authority record for the heading is found , even if we do not use the lc form. citation of the authority record takes the form: "lc auth. rec." followed by the record number and the indication, in parentheses, of the code for rules given in subfield w. example: akademie der wissenschaften und der literatur (mainz, germany) lc auth . rec. 80076417 (en) if the lc form differs from the form used as the heading in muc, give the lc form in parentheses, following the sub field w code. example: abrahamson , max w. (max william) lc auth. rec. 78064817 ( dn) (ab-rahamson, max william) it will sometimes happen , when establishing the heading for a corporate body, that an lc oclc's database conversion/w ajenberg and gorman 189 authority record for a subdivision of the body you are establishing will give you the aacr 2 form of the body you are setting up. precede the citation to the authority record with the word "from". example: united states. environmental protection agency. region v. from lc auth . rec. 80159375 (en) (the lc authority record is for the water division of region v) 7. references the basic rule for making references is given in aacr 2, rule 26.1: "whenever the name of a person or corporate body or the title of a work is, or may reasonably be, known under a form .that is not the one used as a name heading or uniform title, refer from that form to the one that has been used. do not make a reference, however, if the reference is so similar to the name heading of uniform title or to another reference as to be unnecessary." ultimately, this decision depends on the cataloguer's judgement. usually, make a reference only if it would file differently from the established heading and from all other references. 
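the interfiling and reference decisions described above all turn on whether two character strings would occupy the same place in the card file. as a rough illustration only, the following python sketch shows one way such a filing-key test could be expressed; the normalizations applied (ignoring hyphens, the abbreviation ca., initial articles, capitalization, and quotation marks) are drawn from the acceptable dn categories in section 3, but the function names and the key construction are assumptions made for this example, not the uiuc library's actual filing specification.

```python
import re
import unicodedata

def filing_key(heading: str) -> str:
    # reduce a heading to a crude filing key so that forms differing only in
    # hyphens, placement of "ca.", initial articles, capitalization, or
    # quotation marks produce the same key and therefore interfile.
    key = unicodedata.normalize("NFKD", heading)
    key = "".join(c for c in key if not unicodedata.combining(c))    # ignore diacritics
    key = key.lower().replace('"', "").replace("-", " ")
    key = re.sub(r"\bca\.\s*", "", key)                              # ignore the abbreviation ca.
    key = re.sub(r"^(the|a|an|der|die|das|le|la|les)\s+", "", key)   # drop initial articles
    key = re.sub(r"[^\w\s]", " ", key)                               # drop remaining punctuation
    return re.sub(r"\s+", " ", key).strip()

def needs_reference(variant: str, established: str, traced: list[str]) -> bool:
    # trace a reference only if the variant would file differently from the
    # established heading and from all references already traced (the rule
    # of aacr 2, 26.1, as summarized above).
    v = filing_key(variant)
    if v == filing_key(established):
        return False
    return all(v != filing_key(r) for r in traced)

# headings identical except for hyphens (category a) interfile, so no reference is needed:
print(needs_reference("smith, jean paul", "smith, jean-paul", []))   # False
# a genuinely different form still warrants a reference:
print(needs_reference("schmidt, j. p.", "smith, jean-paul", []))     # True
```

a test of this kind is only a screening aid; the pencil corrections described above still depend on the cataloguer's judgement.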
refer from variant forms found in works catalogued for this library, and in standard reference sources. lc authority records will often suggest useful references. however, we may need references not traced by lc, and we may not need all of the references lc traces. notice especially that lc authority records will often give a reference from the pre-aacr 2 form, even when it would file with the aacr 2 form. for example, the authority record for akademie der wissenschaften und der literatur (mainz, germany) traces a reference from adakemie der wissenschaften und der literatur, mainz-the pre-aacr 2 form. these two forms would file together, so we do not need the reference. we will trace "see also" references from forms that can legitimately be used as headings, whether or not they have been used yet in the uiuc library. we will no longer observe the former restriction, which allowed "see also" references to be made only if both headings had been used. for further information on authority records and references, see the cataloguing manual, section a79. aw:lgo arnold wajenberg is principal cataloger and michael gorman is director, technical services, at the university of illinois library. 102 the recon pilot project: a progress report henriette d. a vram: project director, information systems office, library of congress, washington, d. c. a synthesis of the progress report submitted by the library of congress to the council on library resources under an officers grant to initiate the recon pilot project that gives an overview of the project and the progress made from august-november 1969 in the following areas: training, selection of material to be converted, investigation of input devices, and format recognition. introduction the recon pilot project is an effort to analyze the problems of largescale conversion of retrospective catalog records through the actual conversion of approximately 85,000 non-current records. this project has grown directly out of the implementation of the marc distribution service. libraries considering the use of machine readable records for their current materials have naturally begun to consider conversion of their older records as well. some libraries have even begun such conversion projects. since the library of congress is also interested in the feasibility of converting its own retrospective records, it seemed appropriate to explore the possibility of centralized conversion of retrospective cataloging records and their distribution to the entire library community from a central source. a proposal having been submitted by the library of congress to the council on library resources, inc. ( clr), the council granted funds for a study of this problem. an advisory committee was appointed to provide guidance, and direct responsibility for the study and report ( 1) was assigned to a working task force. recon pilot project/ avram 103 a recommendation of the working task force was the implementation of a pilot project to test the techniques suggested in the report in an operational environment. since any feasibility report, no matter how detailed, refers to a theoretical model, the recommended techniques should be tested to determine a most efficient method for a large-scale conversion activity. the advisory committee concurred with this recommendation. 
the library of congress submitted a proposal for a pilot project (hereinafter referred to as recon) to clr, and received an officer's grant in august 1969 to initiate recon while the council continued its evaluation of the full-sc'ale pilot project. . a progress report was submitted to clr by the library covering the period from mid-august to november 1, 1969. so that clr might have a clear understanding of the work in progress, the report addressed itself to both the areas of recon supported by the council and those activities supported by the library of congress. in december 1969, clr awarded the library the funds requested for the entire pilot project. to make the library community cognizant of recon as quickly as possible, clr granted permission to modify the progress report for publication.· overview of the recon pilot project the pilot project is concerned with the conversion and distribution of an estimated 85,000 english language titles: 22,000 titles cataloged in 1969 and not included in the marc distribution service, and 63,000 titles from 1968. the creation of this data base partially satisfies the conclusions and specific recommendations of the recon working task force as stated in the report ( 2) : 1) there should be no conversion of any category (language or form of material) of retrospective records until that category is being currently converted; 2) the initial conversion effort should be limited to english language monograph records issued from 1960 to date and converted into machine readable form in reverse chronological order. (marc distribution service covers current english language monographs cataloged by the library of congress) . in order to explore the problems encountered in encoding and converting cataloging records for older english language monographs, and monographs in other roman alphabet languages, 5,000 additional titles will be selected and converted. the library further intends to investigate, through the design and implementation of a format recognition program, the use of the computer to assist in the editing of cataloging records. this technique should significantly reduce the manpower needs of the present method of conversion and therefore have an impact on any future library of congress conversion activity, either of currently cataloged or retrospective titles. recon will include experimentation with microfilming and producing hard copy from the lc record set. the record set in the lc card division consists of a master copy of the latest version of every lc printed card, arranged by card series and, 104 journal of library automation vol. 3/2 june, 1970 within each series, by card number. although a specific time period can be selected for conversion, the primary disadvantage of the record set for this purpose is the fact that not all changes in cataloging made to the lc official catalog are reflected in the record set. after considering all the alternatives, the recon working task force recommended (3) that the record set be used for selection of titles, but that the titles be compared with the official catalog and updated to insure bibliographic accuracy and completeness. since the record set is in constant use by card division personnel, the selected titles for conversion must be reproduced, and the original file reconstituted, as quickly as possible. the state of the art of direct-read optical character recognition devices suitable for large-scale conversion will be monitored and experimentation will be conducted with a variety of input devices. 
recon is closely related to the lc card division mechanization project, which is based upon the availability of records in machine readable form. recon will be closely coordinated with the card division project, both in the design of specifications for implementation and in the investigation of a common hardware/software configuration. the project was organized during august 1969. the first group of records being edited are those cataloged by the library of congress in 1969. in june 1970, the editing of the 1968 records will begin. since these records will have to be compared with the lc official catalog to record any changes, present thinking includes the design of a print program (referred to as a two-up print program) to cut printing time by providing a listing with records arranged in card number sequence (the order of input) and in alphabetic sequence by main entry on the same page. the records will be arranged by main entry to reduce the effort of checking them against the official catalog and the changed records will be inserted in their proper place in sequence by lc card number. the process of manual editing may be greatly reduced, or perhaps even eliminated, by october 1970, when the format recognition program is scheduled for completion. after this time, the records will be input with little or no prior tagging and further editing will be performed by the computer. the resulting records will be examined by the marc editors both for accuracy in transcription and for correctness in the assignment of marc tags, indicators, and subfield codes. the duration of the pilot project will be twenty-four calendar months, august 1969-august 1971. it is anticipated that by november 1970 enough data should be available to determine whether a full-scale conversion project should be undertaken. an early evaluation of the project is advantageous in order to explore the funding possibilities of a conversion effort if the results of the pilot are affirmative. figure 1 is a calendar indicating the major milestones of recon as postulated during august 1969. [fig. 1. recon calendar: a month-by-month timeline, august 1969 through august 1971, of the major milestones, including the start of the project, hiring of production staff, organization of the iso staff, receipt of the 1969 and 1968 cards from the card division, investigation of input devices and of a common recon/card division hardware/software configuration, training of editors, the print index, the study of reproduction methods for catalog records, analysis and editing of the research titles, organization of cards for recon input, full editing of the 1969 titles (16,000 records), analysis of the system to convert the 1968 titles, full editing of the 1968 titles, ordering and hiring of new mtst typists, design and implementation of format recognition, use of format recognition on the remainder of the 1968 titles, conversion of marc i and interim marc ii records to marc ii, evaluation of the pilot project, planning for continuation of the project, and writing of the final report.] essentially the same advisory committee and working task force selected for the recon feasibility study have agreed to serve in their respective capacities for recon. the implementation of the library of congress' marc distribution service and the initiation of recon are providing the nucleus of a national bibliographic data base.
creation of this data base is not in itself a panacea for libraries but, in fact, amplifies the need to explore some of the larger issues at this time to provide the direction for future cohesive library systems. certain aspects of the problems were discussed in general terms in the recon report but time did not permit full analysis. during the two-year period of recon, the working task force will consider some of those issues (defined as four tasks listed below) under the grant from clr. the ability to complete all of the tasks described will be dependent on additional funding, which, it is hoped, may be available early in 1970. 1) any national data store should have a data base in which all records are consistent. it is possible, and highly probable, that libraries may convert bibliographic records for local use, which may not require the detail of a marc ii record. it is imperative that before levels of completeness of marc records are defined with respect to content and content designation, the implications of these definitions to future library networks be thoroughly explored. 2) any consideration of a national bibliographic data store in machine readable form should include the possibility of recording titles and holdings from other libraries. although the resolution of the problems associated with a machine readable national union catalog are enormous, it is time to begin an exploration of the problems to provide guidance for future design efforts. 3) several institutions have begun the conversion of their cataloging records into machine readable form. the possibility of utilizing these records in building a national bibliographic data store should be investigated. this will involve evaluating the difficulty and cost of converting and upgrading records converted by others to a marc format as opposed to preparing original records. 4) the library of congress maintains, and is considering the conversion into machine readable form, of its name and subject authority files. many libraries have expressed interest in receiving these records in the present marc distribution service. little thought has been given to the storage and maintenance of these large files in each library subscribing to marc distribution service. a library may not have in its collections a bibliographic record requiring either a name or subject cross reference record distributed by the library of congress. however, the library will keep the cross reference record because it cannot predict when a title will be added to the collection that does require the cross reference structure. the result will be the eventual storage and maintenance of the ~-==--------------------------------.... recon pilot project/ avram 107 entire lc name and subject reference files in each library. this problem should be explored to determine if there is a possible efficient method of libraries accessing these files from either a centralized source or several regional sources. 
progress-august 1969 to november 1969 organization the recon staff is divided into two sections: 1) the production section, responsible for the actual editing and keying of the records; and 2) the research and development section, responsible for liaison with the production section, determination of the criteria for the selection of the 1968 and 1969 titles, actual selection of the 5,000 research titles, investigation of input devices and photocopying techniques, liaison with the card division mechanization project, and the design and coding of special computer programs unique to recon. in addition, staff members of the marc project team in the information systems office (iso) are working in areas of format recognition and marc system programming that will affect recon. training the marc experience at the library of congress has demonstrated that staff members assigned to the editorial process of preparing catalog records for conversion to machine readable form must be exposed to cataloging fundamentals. phase i of the training program for the recon editors was a twoweek cataloging class conducted by the supervisor of the production section, a professional librarian with experience in teaching cataloging principles at the library of congress. each day was formally structured into reading, discussion, and practice. the editor-trainees applied the angloamerican cataloging rules ( 4) to practice problems and to actual cataloging of books. experience in using the lc subject heading list, filing rules, and classification schedules was provided to a lesser extent. in order to insure that the editor-trainees would have a wider range of experience in examining cataloging copy, the mnemonic marc tags and the more simple indicators and subfield codes were taught and used to identify explicitly cataloging elements on lc proofslips. phases ii and iii of the training, marc editing and correction procedures, were also taught by professional librarians. the editing class, which lasted two weeks, was divided into lecture sessions and laboratory sessions. each lecture period was from two to three hours; then, during the laboratory session, the instructions given in the lectures were applied to practice worksheets. the course covered input of variable and fixed fields, assignment of bibliographic codes for language and place of publication, and identification of diacritical marks included in the lc character set. phase iii of the training program, on correction procedures, 108 journal of library automation vol. 3/2 june, 1970 was a one-week class covering the addition, deletion, and conection of entire records or data elements at the field level. the training period was followed by an intensive practice period using marc input worksheets, which were reviewed by the experienced editors. selection of cards the actual selection of the 1968 and 1969 titles is a joint effort by the card division staff and the recon staff. the procedures for the selection of cards from the card division for recon differ from those described in the original report. since only cards for 1968 and 1969 titles are being selected, it is more expedient to draw the cards from the card division card stock than to microfilm the record set. these cards will include all titles cataloged by the library of congress during 1968 and 1969 regardless of language or form of material, which will yield approximately 250,000 cards. 
the cards are forwarded to the production section from the card division, where each record is inspected to determine whether it meets the criteria established for recon, i.e., all english language monographs with an lc catalog card number representing works cataloged by lc in 1968 and 1969 that are not already in machine readable form. the determination as to whether or not an item is in english is based upon the text, not the title page. an anthology of literature in spanish with a title page in english would not be included in recon; a book with text in english but title page in french would be included. if a book is multilingual (complete text in more than one language), the language of the first title determines inclusion or exclusion for recon. atlases are included, but not single maps or set maps. music or music scores are excluded, but books about music are included. records representing film strips, moving pictures, serials, and other kinds of materials not regarded as monographs are excluded. once the cards eligible for recon are selected and arranged in lc card number sequence, the cards are compared with the print index listing all records already in machine readable form. those records not in machine readable form are photocopied onto the input worksheet for editing and keying. to date, 60,000 cards have been selected by card division staff and forwarded to the production staff for further processing. selection of research titles an integral part of recon is the conversion of 5,000 titles to machine readable form for research purposes. ideally, these titles should serve not only the needs of recon but also be useful for some other purpose in the library of congress. these titles would include english language monographs cataloged before 1950, and foreign language material using the roman alphabet, and would be used to test various methods of input recon pilot projectjavram 109 and certain aspects of the format recognition program. the older material would represent records cataloged under earlier cataloging rules and would reveal problems in conversion in an area in which little information exists. two sources were initially considered for the selection of research titles: 1) titles in the main reading room collection for conversion into machine readable form for the production of book catalogs, and 2) the popular titles (cards ordered most frequently) of the card division mechanization project. a decision was made to study the titles in both sources with priority given to solution of conversion problems and to determine: 1) if overlap existed in records for both projects that would also serve the needs of recon; 2) if overlap did not exist, which titles (main reading room collection or card division popular titles) best served the needs of recon; and 3) if the titles in neither project were suitable, the method of selection to be used from the card division record set. the first task was a study of the characteristics of the main reading room collection. the collection consists of approximately 14,000 titles, and printed cards have been collected to compile a complete shelf-list catalog. these cards represent a wide range of material cataloged from 1900 to date. approximately one-fourth to one-third represent serials. 
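as a brief aside, the inclusion test that the production section applies to each 1968 and 1969 card, described at the start of this selection procedure, can be summarized in a short sketch; the record fields and values below are assumptions invented for the illustration and do not reflect the layout of the actual recon worksheets or the print index.

```python
from dataclasses import dataclass

@dataclass
class Card:
    # a hypothetical, minimal card record; field names are assumptions for this sketch.
    lc_card_number: str
    catalog_year: int               # year the title was cataloged by lc
    text_language: str              # judged from the text, not the title page;
                                    # for multilingual works, the language of the first title
    material_type: str              # "monograph", "atlas", "map", "score", "serial", "film", ...
    already_in_machine_form: bool   # true if the number appears in the print index

def eligible_for_recon(card: Card) -> bool:
    if card.already_in_machine_form:
        return False
    if card.catalog_year not in (1968, 1969):
        return False
    if card.text_language != "english":
        return False
    if card.material_type == "atlas":           # atlases are included
        return True
    return card.material_type == "monograph"    # maps, scores, serials, films, etc. are excluded

cards = [
    Card("68-004321", 1968, "english", "monograph", False),
    Card("68-009876", 1968, "spanish", "monograph", False),   # anthology in spanish: excluded
    Card("69-001234", 1969, "english", "monograph", True),    # already in machine readable form
]
print([c.lc_card_number for c in cards if eligible_for_recon(c)])   # ['68-004321']
```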
the collection includes material in most of the roman alphabet languages currently processed at the library, the more common non-roman alphabet languages, such as russian, japanese, hebrew, etc., and a number of "difficult" titles, such as encyclopedias, dictionaries, etc., that would present a variety of cataloging and editing problems. the second task was a study of the popular titles from the card division. the card division provided a printout of card numbers for titles with 25 or more orders. there were 4,765 such card numbers listed with their corresponding number of orders. only 210 of these were for pre-1950 cards, and 97 of the 210 cards were for serial titles. only 15 out of the 210 cards were for "difficult" titles. another list was produced which contained card numbers for titles with ten or more orders. this list (with 39,148 card numbers) did produce more titles that would meet the research needs of recon. a sampling technique was designed by the technical processes research office to determine the percentage of overlap of this list with the titles in the main reading room reference collection. the estimated number of matches (15.5%) indicated that not enough overlap existed to consider a selection of titles that would serve the needs of both projects (main reading room collection and card division) and recon. therefore, the research titles are being selected from records for the reference collection. iso is working closely with staff members of the reference department on this project. the reference department is providing local information (e.g., local call number to locate the item in the reference collection as opposed to the lc call number which locates the item in the general collection) for all titles. as this process is completed, the responsible recon staff member is selecting the research titles. to date, "local" information has been added to 2,000 records, and 400 recon titles have been selected from this group of records. computer programs the only computer program implemented to date is the print index program. this program was required to check the records meeting the manual selection criteria for inclusion in recon against records in existing machine readable data bases to avoid duplicate input. print index lists by card number all records in machine readable form in either the marc i or marc ii data bases. at a later date, the 1968 titles found on the marc i data base will be processed by a subset of the format recognition program and converted to the marc ii processing format. the print index program is made up of two routines. the lc catalog card number routine reads each record, extracts the lc card number and creates a magnetic tape file of numbers (called print index tape). the tape created contains a card number right justified for machine sorting, a card number in the same form (zeros deleted) as the number on the printed card, and a data base code indicating the file in which the record originally resided (e.g., marc ii data base, marc ii practice tape, marc i data base). a parameter card is used to indicate which format and data base is to be processed. the ibm sort is used to arrange the output of the lc catalog card number routine into the following order: all 6x-series numbers, all 6x-series numbers with alphabetic prefixes (by year of cataloging, i.e., 1968 followed by 1969), all 7-series numbers (disregarding the check digit, the second digit in the number).
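the card-number handling just described (a right-justified form for machine sorting, a zeros-deleted printed form, a data base code, and the three-group sort order) might look roughly like the following; the parsing of prefixes and check digits is deliberately simplified and is an assumption for illustration, not the print index program's actual logic.

```python
def split_number(card_number: str):
    # separate an optional alphabetic prefix, the year portion, and the serial portion.
    num = card_number.strip().lower()
    i = 0
    while i < len(num) and num[i].isalpha():
        i += 1
    prefix, rest = num[:i], num[i:]
    year, _, serial = rest.partition("-")
    return prefix, year, serial

def sort_key(card_number: str):
    prefix, year, serial = split_number(card_number)
    if year.startswith("7"):
        group = 2                       # 7-series numbers file last,
        year = year[0] + year[2:]       # disregarding the check digit (the second digit)
    elif prefix:
        group = 1                       # 6x-series numbers with alphabetic prefixes, by year
    else:
        group = 0                       # plain 6x-series numbers file first
    return (group, year, prefix, serial.rjust(12, "0"))   # right justified for machine sorting

def tape_entry(card_number: str, data_base_code: str) -> dict:
    prefix, year, serial = split_number(card_number)
    return {
        "sort_form": sort_key(card_number),
        "printed_form": f"{prefix}{year}-{serial.lstrip('0')}",   # zeros deleted, as on the printed card
        "data_base": data_base_code,    # e.g. marc i data base, marc ii data base, practice tape
    }

numbers = ["68-004321", "agr68-000123", "79-0012345", "69-000007"]
for n in sorted(numbers, key=sort_key):
    print(tape_entry(n, "marc ii"))
```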
the lc card number print routine prints the card numbers, which are in numeric sequence as described in the preceding paragraphs, from the print index tape. each page of the listing contains a heading, a running index, a date, and a page number. the program prints 200 card numbers and data base codes per page. the numbers are in ascending order, top to bottom, in four columns of 50 numbers each. format recognition the experience of the library in the creation of machine readable cataloging records during the marc pilot project and the marc distribution service has clearly demonstrated that the highest cost factor of conversion is the human editing and proofing. the editing presently consists of assigning tags and codes to the bibliographic record to explicitly identify the content of the record for machine manipulation. the library has completed a format recognition feasibility study which concluded that the probability of success of automatically assigning tags and codes by computer is high. since the format recognition feasibility study was only concerned with cataloging records for current english language monographs, the study must be extended to cover other roman alphabet languages and, as part of recon, records which were created according to different rules and conventions. although the progress report submitted to clr included the definition and status of each of the tasks that make up the format recognition program, these have been omitted to avoid duplication with an article recently published in the journal of library automation (5) describing format recognition concepts in some detail and elaborating on the tasks completed and projected at that time. investigation of input devices the investigation of input devices and the testing of several selected devices in an operational mode will continue throughout recon. a study of the use of a mini-computer operating in an on-line mode for input, editing, and formatting of marc records is in progress at the library and will supplement the recon effort and provide additional data. a preliminary investigation was begun of optical character readers commercially available and in the developmental phases. only those readers capable of reading numerous characters on many lines (page reader) as opposed to a limited number of characters or lines per document (document reader) were included in the study. the machines evaluated were considered as possible candidates if they were capable of processing upper- and lower-case alphabetic characters, numerals, standard punctuation, and some special symbols. each manufacturer has specifications for the type of paper required and the font style which can be recognized. paper handling is a major drawback of optical character readers. excessive handling of the paper or any type of smear, crease, or crinkle could cause rejection of a character or conversion of a character to some specified symbol indicating an invalid character. error rates for the devices considered range from one to 35 characters per 10,000 characters, and 80% of the errors are caused by paper handling. typewriters used to prepare the source document must be constantly cleaned and ribbons changed to keep impact keys free of dirt. frequent jamming appears to be a characteristic of most machines; unjamming these machines can be difficult and is highly dependent upon the skill of the operator.
ten companies that have various types of optical character recognition equipment commercially available were considered in the first study. five were immediately rejected because their devices did not meet the criteria as specified above. 112 journal of libmry automation vol. 3/2 june, 1970 the devices remaining had the following characteristics: control data corporation 915 page reader. accepts 2.5x4 to 12x14inch paper; ocr-a standard type font; recognizes upper-case alphas, numerals, and standard punctuation; through programming and use of special symbols, lower-case alphas can be coded. farrington model 3030. accepts 4.5x5.5 to 8.5x13.5inch paper; ocr-a standard and 12l (farrington) type fonts; recognizes uppercase alphas, numerals, standard punctuation and special symbols; through programming and use of special symbols, lower-case alphas can be coded. scan-data models 100/300. accepts 8.5xll-inch paper; multi-type fonts; recognizes upperand lower-case alphas, numerals, standard punctuation, and special symbols; has programmable unit for formatting. philco-ford general purpose reader. accepts 5.7x8.5x11 inch paper; multi-type fonts; recognizes upper-case alphas, numerals, standard punctuation and special symbols; through programming and use of special symbols, lower-case alphas can be coded. recognition equipment retina. accepts 3.25x4.88 to 14.14-inch paper; multi-type fonts; recognizes upperand lower-case alphas, numerals, standard punctuation, and special symbols; has a programmable unit for formatting. the possibility exists of using any of these five machines for the input of english language material. the keying of an extraneous character is required with the farrington and control data corporation equipment for lower-case and some special symbols. this is not necessary with philco-ford, scan-data, and recognition equipment machines. since the number of special symbols vary by machine, each machine must be studied to determine a method of coding the entire library character set as developed by the library of congress and this method must be evaluated in terms of the burden placed on the typist. with the added feature of lower-case recognition, the price of the machine increases substantially. adequate information has not been obtained from these companies to give an accurate accounting of cost. it should be noted that the rental price for the majority of optical character readers is high, a factor which will have to be taken into consideration at the time of selection of an input device. the most economic route to recon pilot project/ avram 113 conversion may be through a service bureau, depending on the volume of records to be converted. outlook it is too early in the life of the project to predict the outcome or to describe any factual conclusions. the library of congress is greatly encouraged by the interest expressed in the project and the assistance offered by the members of the advisory committee and the working task force. the scope of the assignments and the fact that all members of the working task force have responsible positions in their own institutions are clear evidence of the spirit of cooperation that has been exhibited by the working task force members and their parent organizations. other members of the library community have been and will continue to be contacted throughout the project for their expertise in certain facets of the many problems under exploration. 
several developing regional networks were requested to describe their plans in the hope that smaller scale efforts would shed some light on the problems involved on a national level. those organizations contacted have responded, and a continuing liaison will be maintained not only to avoid duplication of effort but, more important, to attain a better understanding of how to approach the requirements of future library systems in terms of what is possible today. the report submitted to clr described progress made to november 1, 1969. since that time, the recon production staff has selected all the 1969 titles from the card stock to be included in recon, 5,200 records have been edited, and the first 250 have been forwarded to a service bureau to test its procedures for keying. the staff has begun the selection of the 1968 titles, and out of approximately 26,000 records received to date from the card division 19,000 are recon candidates. the production section continues its training by the proofing of marc records until the recon records are processed through the marc system to provide the required diagnostics for the proofing process. procedures were set up for typing records without any editing and in accordance with the requirements for the format recognition program. sample records selected for testing the procedures were of above-average difficulty in order to include all types of data that might be encountered. the procedures will be continually evaluated until some optimal method is determined. the format recognition algorithms are being evaluated by having recon staff simulate a computer and follow through the logic of the algorithms on actual data. results of the simulation will provide the necessary feedback to adjust the algorithms prior to the coding of the computer programs. detailed design work has begun on the expansion of the marc system to include random access capability and on-line correction. this effort is being coordinated with the card division mechanization project and is considering the requirements of a large-scale conversion activity. although it has a long way to go, recon is on schedule, and for any project concerned with automation, that is an encouraging note. for the moment the future looks bright. acknowledgment the author wishes to thank the recon staff members of the library of congress for their respective reports which were incorporated into the progress report submitted to the council on library resources, inc., and as such, are significant contributions to this paper. without the aid of the council on library resources the recon project would not have become a reality. through three important grants the council has made a major contribution to the project: 1) the first was a grant in support of the recon feasibility study and the working task force that resulted in the recon report; 2) an officer's grant enabling the establishment of the recon production unit to create additional machine readable records not included in the marc distribution service; and 3), most importantly, a grant providing full funding for the two-year pilot project. references 1. library of congress, recon working task force: conversion of retrospective catalog records to machine readable form (washington: library of congress, 1969). 2. ibid., pp. 10-11. 3. ibid., pp. 20-38. 4. anglo-american cataloging rules (chicago: american library association, 1967). 5.
avram, henriette d., et al.: "marc program research and development: a progress report," journal of library automation, 2 (december 1969), 242-265. news and announcements first use of catvlib network: american red cross satellite telecast on may 21, 1981, the american red cross celebrated their one-hundredth birthday by ending their annual conference in washington, d.c., with a special two-hour nationwide satellite telecast. the pssc coordinated distribution of the telecast, which originated from constitution hall in washington, d.c., from 10 a.m. to noon. the program was carried on satcom i, transponder 16 (appalachian community service network), and made available to all cable systems able to receive this transponder. those areas not able to schedule the live program were offered a satellite-transmitted taped feed later in the day. the american red cross had encouraged all its local chapters to initiate program reception in their communities by approaching the local cable system about carrying the event. since the american red cross was offering a free program and trying to saturate as much of the united states as possible, use of the catvlib network in conjunction with this telecast was appropriate. pssc contacted 53 libraries in 23 states that were interested in assuming local coordination for bringing this event to their communities. as the local coordinator, the catvlibs' minimum responsibilities included alerting the cable systems to schedule receiving this program (if the local red cross chapter had not already approached the catv) and contacting the local red cross chapter to offer the catvlibs' facilities for their group viewing and concomitant local celebration. of these fifty-three catvlibs, only seven could not participate because of technical problems. schedule conflicts; lack of catv, red cross, or community interest; and red cross alternative plans were the major factors in prohibiting twelve others from directly participating in hosting the satellite-transmitted program. the remaining thirty-four catvlibs did host community residents in their facilities. evaluation forms revealed a variety of degrees of catvlib participation in coordinating their first satellite event. several catvlibs (though none came to the library for viewing) were instrumental in getting the program into the community and available to all local cable subscribers. advance publicity, birthday cakes and refreshments, sing-alongs, taping for multiple showings, and joint library/chapter pre- and post-event activities are but a few of the ways the individual catvlibs participated. all of the evaluation forms indicated that the catvlibs wanted to be contacted as a potential local site for future satellite events. the following list names the fifty-three catvlibs that were initially contacted to be local coordinators for the red cross one-hundredth birthday satellite telecast. though not all were successful, each catvlib made an effort to bring the program to its community.
colorado boulder public library, boulder connecticut thomaston public library, thomaston florida tarpon springs public library, tarpon springs georgia tri-county regional library, rome idaho pocatello public library, pocatello illinois pekin public library, pekin rockford public library, rockford indiana fort wayne public library, fort wayne monroe county public library, bloomington iowa kirkwood community college telecommunications center, cedar rapids iowa city public library, iowa city kansas abilene public library, abilene newton public library, newton kentu cky lexington public library, lexington louisville public library, louisville camden-carroll library, morehead state university , morehead massachusetts greenfield community college library, greenfield south hadley library system, south hadley minn esota anoka county library, fridley cloquet public library, cloquet crow river regional library, willmar international falls public library, international falls minnesota valley regional library, mankato marshall-lyon county library system, marshall western plains library system, montevideo rochester public library , rochester st. cloud public library, st. cloud missouri st. charles city county library, st. peters new j ersey burlington county college library , pemberton new york albany public library, albany amherst public library , willia msville bethlehem public library, delmar chautauqua-cattaraugus library system, jamestown gates public library, rochester mid-york library system, utica ridge road elementary school library, horseheads north carolina davidson county community college library, lexington ohio greene county district library, xenia public library of columbus and franklin county, columbus news and announcements 315 university of toledo library, toledo pennsylvania altoona area public library, altoona lancaster county library, lancaster monroeville public library , monroeville tennessee memphis/shelby county public library & information center, memphis utah merrill library and learning resources program , utah state university, logan weber county library, ogden virginia arlington county department of libraries, arlington washington edmonds community college library, lynnwood lynnwood public library, lynnwood mountlake terrace public library, mountlake terrace seattle public library, seattle wisconsin middleton public library, middleton nicolet college learning resource center, rhinelander who's who and what's what in library video and cable for librarians interested in who is doing what in video in libraries, or in how to do it themselves, a guidebook has been published by the video a nd cable communications section of the libra ry a nd information technology association . it is the 461-page video and cable guidelines. edited by leslie c hamberlin burk and roberto esteves-two of the most active libra rians in the video field-the book includes papers from donald sager, kandy brandt, arlene farber sirkin, anne hollingsworth , and by burk and esteves. among the topics covered are a description ofthe present operation, future plans, problems, and benefits of video in 250 libraries in the u.s. and canada. the book is spiral-bound and can be used conveniently as a manual for staff development programs. its price is $9. 75 . for additional information, or to order copies (prepaid orders only, please), contact lit a, ala, 50 e. huron st., chicago, il 60611 ; (3 12)944-6780. 316 journal of library automation vol. 
14/4 december 1981 elmig electronic mail arrives the "new arrival" to the library association family this summer is the electronic library membership initiative group. elmig is an organization of individuals established to ensure that electronically delivered information remains accessible to the general public. elmig promotes participation and leadership in the remote electronic delivery of information by publicly supported libraries and nonprofit organizations. the group's efforts are coordinated by richard sweeney, director of the public library of columbus and franklin county; neal kaske, director of oclc's office of research; and kenneth dow lin, director of the pikes peak library district. the first founding goals of elmig are: • identifying services and information best suited for the remote electronic access to and delivery of information; • planning, funding, and developing working demonstrations of library electronic information services; • communicating the availability of electronic library services to the community; • informing the library profession of trends, specific events, and future directions in remote electronic delivery of information; • creating coalitions with organizations in allied fields of interest. organizers of elmig are working within ala to foster interest in , and facilitate the needs of, the electronic library. ala has established a membership initiative group to address the concerns of this group. the electronic library membership initiative group will meet during the ala midwinter meeting in denver. interested individuals are encouraged to attend the meeting scheduled for monday, january 25, 1982, at 2 p.m. in room 2e of the auditorium. interest in elmig/ela has surfaced quickly. the membership group was formed in march, and gathered the 200 signatures needed for official recognition at the ala annual conference in san francisco. some 150 people met at that conference to discuss topics of concern. they decided to continue these discussions at the 1982 midwinter meeting and plan for an elmig program to be presented at philadelphia. elmig aims to address the issues concerning the electronic library on a continuing basis through ongoing interaction of its members. to facilitate this interaction, elmig will use an electronic mail system. further information on elmig and its members is available from richard sweeney at the public library of columbus and franklin county, 28 s. hamilton rd., columbus, oh 43213. see page 317 for subscriber agreement form. heynen to head arl microform project the association of research libraries has hired jeffrey heynen to head a two-year program designed to improve bibliographic access to microform collections in american and canadian libraries. the association has received $20,000 from the council on library resources to initiate the project, and additional funds are anticipated from other sources. heynen brings an extensive background in micrographics and publishing to the project as well as a long-standing commitment to improving the treatment , use, and bibliographic control of microforms in libraries. he has served as chair of the american library association's reproduction of library materials section, and was a participant in earlier groups that laid the foundation for the current arl project. currently president of information interchange corporation, heynen has held executive positions with congressional information service, greenwood press, and redgrave information resources. 
these positions have all included responsibility for the creation of large microform collections. heynen holds memberships in numerous standards-making bodies, including the international organization for standardization (iso), the american national standards institute, and the national micrographics association, and is a lecturer at the university of maryland college of library and information services. the arl microform project is based upon a planning study conducted for the association by richard boss of information systems consultants, inc. its purpose is to stimulate and coordinate the work of libraries, microform publishers, bibliographic utilities, and regional networks in providing bibliographic access to millions of monographic titles in microform that are now inadequately or insufficiently cataloged. since the development of the plan during 1980, there has been keen interest both in the elements of the plan and in the cooperative efforts needed to achieve them. a number of libraries, both arl and non-arl members, are planning to begin or are already entering catalog records for individual titles in microform sets into bibliographic databases. for example, three arl libraries have recently been awarded grants under title ii-c of the higher education act, strengthening research library resources, to catalog major microform sets, entering the resulting records into one of the major utilities. all three libraries (stanford university, university of utah, and indiana university) will be coordinating their efforts with the goals of the arl program. key to these efforts, however, is coordination to ensure that national standards are accepted and followed, to distribute the work load so that as many sets as possible are covered and duplication of effort is avoided, and to ensure that the records are available to all libraries that want to use them. the arl microform project will emphasize building on existing resources, coordinating efforts among the library and publishing communities and the bibliographic utilities, and, where possible, facilitating cooperative projects already planned or under way. heynen will be assisted by an advisory committee composed of representatives of both arl and non-arl libraries, the major bibliographic utilities, and microform publishers. the arl project will operate out of the office of information interchange corporation, 503 11th st., se, washington, dc 20003; (202)544-0291. libraries and publishers interested in participating in the project are urged to contact the project office. subscriber agreement: electronic library membership initiative group ____________ (ala member), applies for membership in the electronic library membership initiative group, electronic mail system, and states that: recitals: a. elmig is an association of individuals whose mission is to ensure that information delivered electronically remains accessible to the general public; and b. elmig seeks to promote participation and leadership in remote electronic delivery of information by publicly supported libraries and nonprofit organizations. now therefore, the above member and oclc agree that: 1. member will deposit with oclc a $100 contribution toward the cost of electronic mail service and attendant expenses for the first year of operation, which is to commence january 1, 1982. the member recognizes that the initial member contribution may not be sufficient to pay for a year of operation and agrees, when invoiced, to make additional payments of $100, or other agreed upon sums, to oclc for the continuation of service. 2. oclc agrees that by accepting member deposits, it will secure electronic mail service for the members of elmig; and 2.1 will place member deposits in a separate elmig account from which oclc will pay the cost of the electronic mail service, u.s. postal mailings, and any other expenses incurred in the administration of ems. 2.2 will provide a year-end accounting of contributions and expenditures to members within a reasonable time after december 31, 1981, and each year-end thereafter. member: by ____________ title ____________ date ____________
nominations sought for lita award nominations are being sought for the library and information technology association's award for achievement. the award is intended to recognize distinguished leadership, notable development or application of technology, superior accomplishments in research or education, or original contributions to the literature of the field. the award may be given to an individual or to a small group of individuals working in collaboration. organized institutions or parts of organized institutions are not eligible. nominations for the award may be made by any member of the american library association and should be submitted by january 15, 1982, to hank epstein, lita awards committee chairperson, 1992 lemnos dr., costa mesa, ca 92626. are these books on your shelf? the special library role in networks: proceedings of a conference, robert w. gibson, jr., ed., 296 p., 1980, isbn 0-87111-279-5, $10.50. reports on the current state of networking and presents a creative approach to special library involvement in network participation and management. special libraries, special issue on information technology and special libraries, april 1981, vol. 72, no. 2, $9.00. the entire issue of this journal is devoted to the technological transformation of the information industry. topics discussed are such advances as computer and telecommunications components, software developments, linking, and modes of access to information systems. bibliographic utilities: a guide for the special librarian, james k. webster, ed., 32 p., 1980, isbn 0-87111-280-7, $3.75. a comparative study of the services offered by the four major north american online bibliographic utilities. orders (prepaid for individuals) may be sent to the special libraries association, order department, box jla, 235 park avenue south, new york, new york 10003. assessing the treatment of patron privacy in library 2.0 literature michael zimmer information technology and libraries | june 2013 abstract as libraries begin to embrace web 2.0 technologies to serve patrons, ushering in the era of library 2.0, unique dilemmas arise regarding protection of patron privacy.
the norms of web 2.0 promote the open sharing of information—often personal information—and the design of many library 2.0 services capitalizes on access to patron information and might require additional tracking, collection, and aggregation of patron activities. thus embracing library 2.0 potentially threatens the traditional ethics of librarianship, where protecting patron privacy and intellectual freedom has been held paramount. as a step toward informing decisions to implement library 2.0 while adequately protecting patron privacy, we must first understand how such concerns are being articulated within the professional discourse surrounding these next-generation library tools and services. the study presented in this paper aims to determine whether and how issues of patron privacy are introduced, discussed, and settled, if at all, within trade publications utilized by librarians and related information professionals.

introduction

in today's information ecosystem, libraries are at a crossroads: several of the services traditionally provided within their walls are increasingly made available online, often by non-traditional sources, both commercial and amateur, thereby threatening the historical role of the library in collecting, filtering, and delivering information. for example, web search engines provide easy access to millions of pages of information; online databases provide convenient gateways to news, images, and videos, as well as scholarship; and large-scale book digitization projects appear poised to make roaming the stacks seem an antiquated notion. further, the traditional authority and expertise enjoyed by librarians have been challenged by the emergence of automated information filtering and ranking systems, such as google's algorithms or amazon's recommendation system, as well as amateur, collaborative, and peer-produced knowledge projects, such as wikipedia, yahoo! answers, and delicious. meanwhile, the professional, educational, and social spheres of our lives are increasingly intermingled through online social networking spaces such as facebook, linkedin, and twitter, providing new interfaces for interacting with friends, collaborating with colleagues, and sharing information. michael zimmer, phd (zimmerm@uwm.edu), a lita member, is assistant professor, school of information studies, and director, center for information policy research, university of wisconsin-milwaukee.

libraries face a key question in this new information environment: what is the role of the library in providing access to knowledge in today's digitally networked world? one answer has been to actively incorporate features of the online world into library services, thereby creating "library 2.0." conceptually, library 2.0 is rooted in the global web 2.0 discussion, and the professional literature often links the two concepts.
according to o’reilly, web 2.0 marks the world wide web’s shift from a collection of individual websites to a computing platform that provides applications for end users and can be viewed as a tool for harnessing the collective intelligence of all web users.1 web 2.0 represents a blurring of the boundaries between web users and producers, consumption and participation, authority and amateurism, play and work, data and the network, reality and virtuality.2 its rhetoric suggests that everyone can and should use new internet technologies to organize and share information, to interact within communities, and to express oneself. in short, web 2.0 promises to empower creativity, to democratize media production, and to celebrate the individual while also relishing the power of collaboration and social networks. library 2.0 attempts to bring the ideology of web 2.0 into the sphere of the library. the term is generally attributed to casey,3 and while over sixty-two distinct viewpoints and seven different definitions of library 2.0 have been advanced,4 there is general agreement that implementing library 2.0 technologies and services means bringing interactive, collaborative, and user-centered web-based technologies to library services and collections.5 examples include • providing synchronous messaging (through instant message platforms, skype, etc.) to allow patrons to chat with library staff for real-time assistance; • using blogs, wikis, and related user-centered platforms to encourage communication and interaction between library staff and patrons; • allowing users to create personalized subject headings for library materials through social tagging platforms like delicious or goodreads; • providing patrons the ability to evaluate and comment on particular items in a library’s collection through rating systems, discussion forums, or comment threads; • using social networking platforms like facebook or linkedin to create online connections to patrons, enabling communication and service delivery online; and • creating dynamic and personalized recommendation systems (“other patrons who checked out this book also borrowed these items”), similar to amazon and related online services. launching such library 2.0 features, however, poses a unique dilemma in the realm of information ethics, especially patron privacy. traditionally, the context of the library brings with it specific norms of information flow regarding patron activity, including a professional commitment to patron privacy (see, for example, american library association’s privacy policy, 6 foerstel,7 gorman,8 and morgan 9). in the library, users’ intellectual activities are protected by decades of established norms and practices intended to preserve patron privacy and confidentiality, most assessing the treatment of patron privacy in library 2.0 literature | zimmer 31 stemming from the ala’s library bill of rights and related interpretations.10 as a matter of professional ethics, most libraries protect patron privacy by engaging in limited tracking of user activities, having short-term data retention policies (many libraries actually delete the record that a patron ever borrowed a book once it is returned), and generally enable the anonymous browsing of materials (you can walk into a public library, read all day, and walk out, and there is no systematic method of tracking who you are or what you’ve read). these are the existing privacy norms within the library context. library 2.0 threatens to disrupt these norms. 
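to make concrete why a feature such as "other patrons who checked out this book also borrowed these items" cuts against these norms, the sketch below shows a minimal item-to-item co-occurrence recommender. it is purely illustrative (the loan-history data structure and function names are hypothetical, not drawn from any vendor's system), but it shows that even the simplest version of this feature only works if the library retains per-patron borrowing histories instead of deleting loan records at return.

```python
from collections import Counter, defaultdict

# hypothetical retained borrowing histories: patron id -> set of item ids.
# a library that deletes loan records at return could not build this table.
loan_histories = {
    "patron-001": {"item-a", "item-b", "item-c"},
    "patron-002": {"item-a", "item-c"},
    "patron-003": {"item-b", "item-c", "item-d"},
}

def cooccurrence_counts(histories):
    """count how often pairs of items appear in the same patron's history."""
    counts = defaultdict(Counter)
    for items in histories.values():
        for item in items:
            for other in items:
                if other != item:
                    counts[item][other] += 1
    return counts

def recommend(item_id, histories, top_n=3):
    """'patrons who borrowed this item also borrowed...' suggestions."""
    counts = cooccurrence_counts(histories)
    return [other for other, _ in counts[item_id].most_common(top_n)]

print(recommend("item-a", loan_histories))  # e.g. ['item-c', 'item-b']
```

the same dependence on accumulated patron data holds, in varying degrees, for the other examples listed above, which is the dilemma the next section takes up.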
in order to take full advantage of web 2.0 platforms and technologies to deliver library 2.0 services, libraries will need to capture and retain personal information from their patrons. revisiting the examples provided above, each relies on some combination of robust user accounts, personal profiles, and access to flows of patrons’ personal information: • providing synchronous messaging might necessitate the logging of a patron's name (or chat username), date and time of the request, e-mail or other contact information, and the content of the exchange with the librarian staff member. • library-hosted blogs or wikis will require patrons to create user accounts, potentially tying posts and comments to patron ip addresses, library accounts, or identities. • implementing social tagging platforms would similarly require unique user accounts, possibly revealing the tags particular patrons use to label items in the collection and who tagged them. • comment and rating systems potentially link patrons’ particular interests, likes, and dislikes to a username and account. • using social networking platforms to communicate and provide services to patrons might result in the library gaining unwanted access to personal information of patrons, including political ideology, sexual orientation, or related sensitive information. • creating dynamic and personalized recommendation systems requires the wholesale tracking, collecting, aggregating, and processing of patron borrowing histories and related activities. across these examples, to participate and benefit from library 2.0 services, library patrons could potentially be required to create user accounts, engage in activities that divulge personal interests and intellectual activities, be subject to tracking and logging of library activities, and risk having various activities and personal details linked to their library patron account. while such library 2.0 tools and services can greatly improve the delivery of library services and enhance patron activities, the increased need for the tracking, collecting, and retaining of data about patron activities presents a challenge to the traditional librarian ethic regarding patron privacy.11 despite these concerns, many librarians recognize the need to pursue library 2.0 initiatives as the best way to serve the changing needs of their patrons and to ensure the library’s continued role in information technology and libraries | june 2013 32 providing professionally guided access to knowledge. longitudinal studies of library adoption of web 2.0 technologies reveal a marked increase in the use of blogs, sharing plugins, and social media between 2008 and 2010.12 in this short amount of time, library 2.0 has taken hold in hundreds of libraries, and the question before us is not whether libraries will move towards library 2.0 services, but how they will do it, and, from an ethical perspective, whether the successful implementation of library 2.0 can take place without threatening the longstanding professional concerns for, and protections of, patron privacy. research questions recognizing that library 2.0 has been implemented, in varying degrees, in hundreds of libraries,13 and is almost certainly being considered at countless more, it is vital to ensure that potential impacts on patron privacy are properly understood and considered. 
as a step toward informing decisions to implement library 2.0 while adequately protecting patron privacy, we must first understand how such concerns are being articulated within the professional discourse surrounding these next-generation library tools and services. the study presented in this paper aims to determine whether and how issues of patron privacy are introduced, discussed, and settled—if at all—within trade publications utilized by librarians and related information professionals. specifically, this study asks the following primary research questions:

rq1. are issues of patron privacy recognized and addressed in literature discussing the implementation of library 2.0 services?

rq2. when patron privacy is recognized and addressed, how is it articulated? for example, is privacy viewed as a critical concern, as something that we will need to simply "get over," or as a non-issue?

rq3. what kind of mitigation strategies, if any, are presented to address the privacy issues related to library 2.0?

data analysis

the study combines content and textual analyses of articles published in professional publications (not peer-reviewed academic journals) between 2005 and 2011 discussing library 2.0 or related web-based services, retrieved through the library, information science, and technology abstracts (lista) and library literature & information science full text databases. the discovered texts were collected in winter 2011 and coded to reflect the source, author, publication metadata, audience, and other general descriptive data. in total, 677 articles were identified discussing library 2.0 and related web-based library services, appearing in over 150 different publications. of the articles identified, 50 percent appeared in 18 different publications, which are listed in table 1.

table 1. top publications with library 2.0 articles (2005–2011)

publication: count
computers in libraries: 51
library journal: 51
information today: 21
library and information update: 21
incite: 20
scandinavian public library quarterly: 18
american libraries: 16
electronic library: 15
online: 14
school library journal: 14
information outlook: 13
mississippi libraries: 13
college & research library news: 12
library hi tech news: 12
library media connection: 12
csla journal (california school library association): 10
knowledge quest: 10
multimedia information and technology: 8

each of the 677 source texts was then analyzed to determine if a discussion of privacy was present. full-text searches were performed on word fragments to ensure the identification of variations in terminology. for example, each text was searched for the fragment "priv" to include hits on both the terms "privacy" and "private." additional searches were performed for word fragments related to "intellectual freedom" and "confidentiality" in order to capture more general considerations related to patron privacy. of the 677 articles discussing library 2.0 and related web-based services, there were a total of 203 mentions of privacy or related concepts in 71 articles. these 71 articles were further refined to ensure the appearances of the word "privacy" and related terms were indeed relevant to the ethical issues at hand (eliminating false positives for mentions of "private university," for example, or mention of a publication's "privacy policy" that happened to be provided in the pdf searched).
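the fragment-based screening described above can be expressed in a few lines of code. the sketch below is a minimal illustration, not the author's actual procedure: the article texts are placeholders, and beyond "priv" the exact fragment strings for intellectual freedom and confidentiality are assumptions based on the method described in the text.

```python
import re

# placeholder corpus: article id -> full text (in the study, 677 texts retrieved
# from lista and library literature & information science full text).
articles = {
    "article-1": "the library's privacy policy requires limited tracking...",
    "article-2": "students at a private university used the new chat service...",
    "article-3": "tagging and blogs were added to the catalog last year...",
}

# word fragments screened for, following the method described above.
fragments = ["priv", "intellectual freedom", "confidential"]

def screen(texts, fragments):
    """return, per article, how many fragment matches occur (0 = no mention)."""
    hits = {}
    for art_id, text in texts.items():
        lowered = text.lower()
        hits[art_id] = sum(len(re.findall(re.escape(frag), lowered))
                           for frag in fragments)
    return hits

hits = screen(articles, fragments)
flagged = [a for a, n in hits.items() if n > 0]  # candidates for manual review
print(hits, flagged)
# false positives such as "private university" still require manual refinement,
# which is why the 71 flagged articles were reduced to a smaller relevant set.
```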
the final analysis yielded a total of 39 articles with relevant mention of patron privacy as it relates to library 2.0, amounting to only 5.8 percent of all articles discussing library 2.0 (see table 2). a full listing of the articles is in appendix a.

table 2. article summary

total articles discussing library 2.0: 677
articles with a hit in "priv" and related text searches: 71 (10.5 percent)
articles with relevant discussion of privacy: 39 (5.8 percent)

the majority of these articles were authored by practicing librarians in both public and academic settings and present arguments for the increased use of web 2.0 by libraries or highlight successful deployment of library 2.0 services. of the 39 articles, only 4 focus primarily on challenges faced by libraries hoping to implement library 2.0 solutions.14 a textual analysis of the 39 relevant articles was performed to assess how privacy was discussed in each. two primary variables were evaluated: the length of discussion and the level of concern. length of discussion was measured qualitatively as high (concern over privacy is explicit or implicit in over 50 percent of the article's text), moderate (privacy is discussed in a substantive section of the article), and minimal (privacy is mentioned, but not given significant attention). the level of concern was measured qualitatively as high (indicated privacy as a critical variable for implementing library 2.0), moderate (recognized privacy as one of a set of important concerns), and minimal (mentioned privacy largely in passing, giving it no particular importance). results of these analyses are reported in table 3.

table 3. length of discussion and level of concern

length of discussion: high 3, moderate 8, minimal 28
level of concern: high 9, moderate 13, minimal 16

of the 39 relevant articles, only three had lengthy discussions of privacy-related issues. as early as 2007, coombs recognized that the potential for personalization of library services would force libraries to confront existing policies regarding patron privacy.15 anderson and rethlefsen similarly engage in lengthy discussions of the challenges faced by libraries wishing to balance patron privacy with new web 2.0 tools and services.16 these three articles represent less than 1 percent of the 677 total articles identified that discussed library 2.0. while only three articles dedicate lengthy discussions to issues of privacy, over half the articles that mention privacy (21 of 39) indicate a high or moderate level of concern. for example, cvetkovic warns that while "privacy is a central, core value of libraries…the features of web 2.0 applications that make them so useful and fun all depend on users sharing private information with the site owners."17 and casey and savastinuk's early discussion of library 2.0 puts these concerns in context for librarians, warning that "libraries should remain as vigilant with protecting customer privacy with technology-based services as they are with traditional, physical library services."18 while 21 articles indicated a high or moderate level of concern over patron privacy, less than half of these provided any kind of solution or strategy for mitigating the privacy concerns related to implementing library 2.0 technologies. overall, 14 of the 39 relevant articles provided privacy solutions of one kind or another.
breeding, for example, argues that librarians must “absolutely respect patron privacy,” 19 and suggests any library 2.0 tools that rely on user data should only be implemented if users must explicitly “opt-in” to having their information collected, a solution also offered by wisniewski in relation to protecting patron privacy with location-based tools.20 rethlefsen goes a step further, proposing libraries take steps to increase the literacy of patrons regarding their privacy and the use of library 2.0 tools, including the use of classes and tutorials to help educate patrons and staff alike. 21 conversely, cvetkovic argues that “the place of privacy in our culture is changing,” and that while “in many ways our privacy is diminishing, but many people…seem not too concerned about it.” 22 as a result, while she argues for only voluntary participation in library 2.0 services, cvetkovic takes a position that information sharing is becoming the new norm, weakening any absolute position regarding protecting patron privacy above all. discussion rq1 asks if issues of patron privacy are recognized and addressed within literature discussing library 2.0 and related web-based library services. of the 677 articles published for professional audiences that discuss library 2.0, only 39 contained a relevant discussion of the privacy issues that stem from this new family of data-intensive technologies, and only 11 of these discussed the issue beyond a passing mention. rq2 asks how the privacy concerns, when present, are articulated. of the 39 articles with relevant discussions of privacy, only 11 make more than a minimal mention of privacy concerns. however, the discussion in 22 of the articles reveals a high or moderate level of concern. this suggests that while privacy might not be a primary focus of discussion, when it is mentioned, even minimally, its importance is recognized. finally, rq3 seeks to understand if any solutions or mitigation strategies related to the privacy concerns are articulated. with only 14 of the 39 articles providing a means for practitioners to address privacy issues, readers of library 2.0 publications are more often than not left with no real solutions or roadmaps for dealing with these vital ethical issues. taken together, the results of this study reveal minimal mention of privacy alongside discussions of library 2.0. less than 6 percent of all 677 articles on library 2.0 include mention of privacy; of these, only 11 make more than a passing mention of privacy, representing less than 2 percent of information technology and libraries | june 2013 36 all articles. of the 39 relevant articles, 22 express more than a minimal concern, but of these, only 9 provide any mitigation strategy. these results suggest that while popular publications targeted at information professionals are giving significant attention to potential for library 2.0 to be a powerful new option for delivering library content and services, there is minimal discussion of how the widespread adoption and implementation of these new tools might impact patron privacy and even less discussion of how to address these concerns. consequently, as the interest in, and adoption of, library 2.0 services increase, librarians and related information practitioners seeking information regarding these new technologies in professional publications will not likely be confronted with the possible privacy concerns, nor learn of any strategies to deal with them. 
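the few mitigation strategies that do surface in this literature, such as breeding's and wisniewski's insistence on explicit opt-in and the traditional practice of deleting loan records at return, can be made concrete with a small sketch. the function and field names below are hypothetical and are not drawn from any particular ils; the sketch only illustrates the pattern of collecting usage data solely for patrons who have consented and discarding it as soon as it is no longer needed.

```python
from datetime import datetime

# hypothetical patron preference store: patron id -> explicit opt-in flag.
opt_in = {"patron-001": True, "patron-002": False}

# loan log used to power personalized features (recommendations, history, etc.).
loan_log = []

def record_loan(patron_id, item_id):
    """log a loan only if the patron has explicitly opted in to data collection."""
    if opt_in.get(patron_id, False):          # default is no collection
        loan_log.append({"patron": patron_id,
                         "item": item_id,
                         "when": datetime.now().isoformat()})

def purge_on_return(patron_id, item_id):
    """mirror the traditional practice of deleting the loan record at return."""
    loan_log[:] = [rec for rec in loan_log
                   if not (rec["patron"] == patron_id and rec["item"] == item_id)]

record_loan("patron-001", "item-a")   # logged: patron opted in
record_loan("patron-002", "item-b")   # silently skipped: no consent
purge_on_return("patron-001", "item-a")
print(loan_log)                       # []
```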
this absence of clear guidance for addressing patron privacy in the library 2.0 era resembles what computer ethicist jim moor would describe as a “policy vacuum”: a typical problem in computer ethics arises because there is a policy vacuum about how computer technology should be used. computers provide us with new capabilities and these in turn give us new choices for action. often, either no policies for conduct in these situations exist or existing policies seem inadequate. a central task of computer ethics is to determine what we should do in such cases, that is, formulate policies to guide our actions. 23 given the potential for the data-intensive nature of library 2.0 technologies to threaten the longstanding commitment to patron privacy, these results show that work must be done to help fill this vacuum. education and outreach must be increased to ensure librarians and information professionals are aware of the privacy issues that typically accompany attempts to implement library 2.0, and additional scholarship must take place to help understand the true nature of any privacy threats and to come up with real and useful solutions to help find the proper balance between enhanced delivery of library services through web 2.0-based tools and the traditional protection of patron privacy. acknowledgements this research was supported by a ronald e. mcnair postbaccalaureate achievement program summer student research grant,and a uw-milwaukee school of information studies internal research grant. the author thanks kenneth blacks, jeremy mauger, and adriana mccleer for their valuable research assistance. assessing the treatment of patron privacy in library 2.0 literature | zimmer 37 references 1. tim o’reilly, “what is web 2.0? design patterns and business models for the next generation of software,” 2005, www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web20.html. 2. michael zimmer, “preface: critical perspectives on web 2.0,” first monday 13, no. 3 (march 2008), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2137/1943. 3. michael casey, “working towards a definition of library 2.0,” librarycrunch (october 21, 2005), www.librarycrunch.com/2005/10/working_towards_a_definition_o.html. 4. walt crawford, “library 2.0 and ‘library 2.0,’” cites & insights 6, no 2 (midwinter 2006): 1–32, http://citesandinsights.info/l2a.htm. 5. michael casey and laura savastinuk, “library 2.0: service for the next-generation library,” library journal 131, no. 14 (september 1, 2006): 40–42; michael casey and laura savastinuk, library 2.0: a guide to participatory library service (medford, nj: information today, 2007).; nancy courtney, library 2.0 and beyond: innovative technologies and tomorrow’s user (westport, ct: libraries unlimited, 2007). 6. american library association, “policy on confidentiality of library records,” www.ala.org/offices/oif/statementspols/otherpolicies/policyconfidentiality. 7. herbert n. foerstel, surveillance in the stacks: the fbi’s library awareness program (new york: greenwood, 1991). 8. michael gorman, our enduring values: librarianship in the 21st century (chicago: american library association, 2000). 9. candace d. morgan, “intellectual freedom: an enduring and all-embracing concept,” in intellectual freedom manual. (chicago: american library association, 2006). 10. 
library bill of rights, american library association, www.ala.org/advocacy/intfreedom/librarybill; american library association, “privacy: an interpretation of the library bill of rights,” www.ala.org/template.cfm?section=interpretations&template=/contentmanagement/conten tdisplay.cfm&contentid=132904 11. rory litwin, “the central problem of library 2.0: privacy,” library juice (may 22, 2006), http://libraryjuicepress.com/blog/?p=68. http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2137/1943 http://www.librarycrunch.com/2005/10/working_towards_a_definition_o.html http://citesandinsights.info/l2a.htm http://www.ala.org/offices/oif/statementspols/otherpolicies/policyconfidentiality http://www.ala.org/advocacy/intfreedom/librarybill http://www.ala.org/template.cfm?section=interpretations&template=/contentmanagement/contentdisplay.cfm&contentid=132904 http://www.ala.org/template.cfm?section=interpretations&template=/contentmanagement/contentdisplay.cfm&contentid=132904 http://libraryjuicepress.com/blog/?p=68 information technology and libraries | june 2013 38 12. zeth lietzau and jamie helgren, u.s. public libraries and the use of web technologies, 2010 (denver: library research service, 2011), www.lrs.org/documents/web20/webtech2010_closerlookreport_final.pdf. 13. ibid. 14. sue anderson, “libraries struggle to balance privacy and patron access,” alki 24, no. 2 (july 2008): 18–28; karen coombs, “privacy vs. personalization,” netconnect (april 15, 2007): 28; milica cvetkovic, “making web 2.0 work–from ‘librarian habilis’ to ‘librarian sapiens,’” computers in libraries 29, no. 9 (october 2009): 14–17, www.infotoday.com/cilmag/oct09/cvetkovic.shtml;, melissa l. rethlefsen, “tools at work: facebook’s march on privacy,” library journal 135, no. 12 (june 2010): 34–35. 15. coombs, “privacy vs. personalization.” 16. anderson, “libraries struggle to balance privacy and patron access.”; melissa l rethlefsen, “facebook’s march on privacy,” library journal 135, no. 12 (2010): 34–35. 17. cvetkovic, “making web 2.0 work.” 18. casey and savastinuk, “library 2.0: service for the next-generation library.” 19. marshall breeding, “taking the social web to the next level,” computers in libraries 30, no. 7 (september 2010): 34–37, www.librarytechnology.org/ltg-displaytext.pl?rc=15053. 20. jeff wisniewski, “location, location, location,” online 33, no. 6 (2009): 54–57. 21. rethlefsen, “tools at work: facebook’s march on privacy.” 22. cvetkovic, “making web 2.0 work,” 17. 23. james moor, “what is computer ethics?” metaphilosophy 16, no. 4 (october 1985): 266–75. http://www.lrs.org/documents/web20/webtech2010_closerlookreport_final.pdf http://www.infotoday.com/cilmag/oct09/cvetkovic.shtml http://www.librarytechnology.org/ltg-displaytext.pl?rc=15053 assessing the treatment of patron privacy in library 2.0 literature | zimmer 39 appendix a: articles with relevant mention of patron privacy as it relates to library 2.0 anderson, sue. “libraries struggle to balance privacy and patron access.” alki 24, no. 2 (july 2008): 18–28. balnaves, edmund. “the emerging world of open source, library 2.0, and digital libraries.” incite 30, no. 8 (august 2009): 13. baumbach, donna j. “web 2.0 and you.” knowledge quest 37, no. 4 (2009): 12–19. breeding, marshall. “taking the social web to the next level.” computers in libraries 30, no. 
7 (september 2010): 34–37. casey, michael e. and laura savastinuk. “library 2.0: service for the next-generation library.” library journal 131, no. 14 (september 1, 2006): 40–42. cohen, sarah f. “taking 2.0 to the faculty why, who, and how.” college & research libraries news 69, no. 8 (september 2008): 472–75. coombs, karen. “privacy vs. personalization.” netconnect (april 15, 2007): 28. coyne, paul. “library services for the mobile and social world.” managing information 18, no. 1 (2011): 56–58. cromity, jamal. “web 2.0 tools for social and professional use.” online 32, no. 5 (october 2008): 30–33. cvetkovic, milica. “making web 2.0 work—from ‘librarian habilis’ to ‘librarian sapiens.’” computers in libraries 29, no. 9 (october 2009): 14–17. eisenberg, mike. “the parallel information universe.” library journal 133, no. 8 (may 1, 2008): 22–25. gosling, maryanne, glenn harper, and michelle mclean. “public library 2.0: some australian experiences.” electronic library 27, no. 5 (2009): 846–55. han, zhiping, and yan quan liu. “web 2.0 applications in top chinese university libraries.” library hi tech 28, no. 1 (2010): 41–62. harlan, mary ann. “poetry slams go digital.” csla journal 31, no. 2 (spring 2008): 20–21. hedreen, rebecca c., jennifer l. johnson, mack a. lundy, peg burnette, carol perryman, guus van den brekel, j. j. jacobson, matt gullett, and kelly czarnecki. “exploring virtual librarianship: second life library 2.0.” internet reference services quarterly 13, no. 2–3 (2008): 167–95. information technology and libraries | june 2013 40 horn, anne, and sue owen. “leveraging leverage: how strategies can really work for you.” in proceedings of the 29th annual international association of technological university libraries (iatul) conference, auckland, nz (2008): 1–10, http://dro.deakin.edu.au/eserv/du:30016672/horn-leveragingleveragepaper-2008.pdf. huwe, terence. “library 2.0, meet the ‘web squared’ world.” computers in libraries 31, no. 3 (april 2011): 24–26. “idea generator.” library journal 134, no. 5 (1976): 44. jayasuriya, h. kumar percy, and frances m. brillantine. “student services in the 21st century: evolution and innovation in discovering student needs, teaching information literacy, and designing library, 2.0-based student services.” legal reference services quarterly 26, no. 1–2 (2007): 135–70. jenda, claudine a., and martin kesselman. “innovative library 2.0 information technology applications in agriculture libraries.” agricultural information worldwide 1, no. 2 (2008): 52–60. johnson, doug. “library media specialists 2.0.” library media connection 24, no.7 (2006): 98. kent, philip g. “enticing the google generation: web 2.0, social networking and university students.” in proceedings of the 29th annual international association of technological university libraries (iatul) conference, auckland, nz (2008), http://eprints.vu.edu.au/800/1/kent_p_080201_final.pdf. krishnan, yyvonne. “libraries and the mobile revolution.” computers in libraries 31, no. 3 (april 2011): 5–9. li, yiu-on, irene s. m. wong, and loletta p. y. chan. “mylibrary calendar: a web 2.0 communication platform.” electronic library 28, no. 3 (2010): 374–85. liu, shu. “engaging users: the future of academic library web sites.” college & research libraries 69, no. 1 (january 2008): 6–27. mclean, michelle. “virtual services on the edge: innovative use of web tools in public libraries.” australian library journal 57, no. 4 (november 2008): 431–51. oxford, sarah. 
“being creative with web 2.0 in academic liaison.” library & information update 5 (may 2009): 40–41. rethlefsen, melissa. “facebook’s march on privacy.” library journal 135, no. 12 (2010): 34–35. schachter, debbie. “adjusting to changes in user and client expectations.” information outlook 13, no. 4 (2009): 55. http://dro.deakin.edu.au/eserv/du:30016672/horn-leveragingleveragepaper-2008.pdf http://eprints.vu.edu.au/800/1/kent_p_080201_final.pdf assessing the treatment of patron privacy in library 2.0 literature | zimmer 41 shippert, linda crook. “thinking about technology and change, or, ‘what do you mean it’s already over?’” pnla quarterly 73, no. 2 (2008): 4, 26. stephens, michael. “the ongoing web revolution.” library technology reports 43, no. 5 (2007): 10–14. thornton, lori. “facebook for libraries.” christian librarian 52, no. 3 (2009): 112. trott, barry and kate mediatore. “stalking the wild appeal factor.” reference & user services quarterly 48, no. 3 (2009): 243–46. valenza, joyce kasman. “a few new things.” lmc: library media connection 26, no. 7 (2008): 10– 13. widdows, katharine. “web 2.0 moves 2.0 quickly 2.0 wait: setting up a library facebook presence at the university of warwick.” sconul focus 46 (2009): 54–59. wisniewski, jeff. “location, location, location.” online 33, no. 6 (2009): 54–57. woolley, rebecca. “book review: information literacy meets library 2.0: peter godwin and jo parker (eds.).” sconul focus 47, (2009): 55–56. wyatt, neal. “2.0 for readers.” library journal 132, no. 18 (2007): 30–33. a comparative analysis of the effect of the integrated library system on staffing models in academic libraries ping fu and moira fitzgerald information technology and libraries | september 2013 47 abstract this analysis compares how the traditional integrated library system (ils) and the next-generation ils may impact system and technical services staffing models at academic libraries. the method used in this analysis is to select two categories of ilss—two well-established traditional ilss and three leading next-generation ilss—and compare them by focusing on two aspects: (1) software architecture and (2) workflows and functionality. the results of the analysis suggest that the nextgeneration ils could have substantial implications for library systems and technical staffing models in particular, suggesting that library staffing models could be redesigned and key librarian and staff positions redefined to meet the opportunities and challenges brought on by the next-generation ils. introduction today, many academic libraries are using well-established traditional integrated library systems (ilss) built on the client-server computing model. the client-server model aims to distribute applications that partition tasks or workloads between the central server of a library automation system and all the personal computers throughout the library that access the system. the client applications are installed on the personal computers and provide a user-friendly interface to library staff. however, this model may not significantly reduce workload for the central servers and may increase overall operating costs because of the need to maintain and update the client software across a large number of personal computers throughout the library. 1 since the global financial crisis, libraries have been facing severe budget cuts, while hardware maintenance, software maintenance, and software licensing costs continue to rise. 
the technology adopted by the traditional ils was developed more than ten years ago and is evidently outdated. the traditional ils does not have sufficient capacity to provide efficient processing for meeting the changing needs and challenges of today’s libraries, such as managing a wide variety of licensed electronic resources and collaborating, cooperating, and sharing resources with different libraries.2 ping fu (pingfu@cwu.edu), a lita member, is associate professor and head of technology services in the brooks library, central washington university, ellensburg, wa. moira fitzgerald (moira.fitzgerald@yale.edu), a lita member, is access librarian and assistant head of access services in the beinecke rare book and manuscript library, yale university, new haven, ct. mailto:pingfu@cwu.edu mailto:moira.fitzgerald@yale.edu a comparative analysis of the effect of the integrated library system on staffing models in academic libraries | fu and fitzgerald 48 today’s libraries manage a wide range of licensed electronic resource subscriptions and purchases. the traditional ils is able to maintain the subscription records and payment histories but is unable to manage details about trial subscriptions, license negotiations, license terms, and use restrictions. some vendors have developed electronic resources management system (erms) products as standalone products or as fully integrated components of an ils. however, it would be more efficient to manage print and electronic resources using a single, unified workflow and interface. to reduce costs, today’s libraries not only band together in consortia for cooperative resource purchasing and sharing, but often also want to operate one “shared ils” for managing, building, and sharing the combined collections of members.3 such consortia are seeking a new ils that exceeds traditional ils capabilities and uses new methods to deliver improved services. the new ils should be more cost effective, should provide prospects for cooperative collection development, and should facilitate collaborative approaches to technical services and resource sharing. one example of a consortium seeking a new ils is the orbis cascade alliance, which includes thirty-seven universities, colleges, and community colleges in oregon, washington, and idaho. as a response to this need, many vendors have started to reintegrate or reinvent their ilss. library communities have expressed interest in the new characteristics of these next-generation ilss; their ability to manage print materials, electronic resources, and digital materials within a unified system and a cloud-computing environment is particularly welcome.4 however, one big question remains for libraries and librarians, and that is what implications the next-generation ils will have on libraries’ staffing models. little on this topic has been presented in the library literature. this comparative analysis intends to answer this question by comparing the nextgeneration ils with the traditional ils from two perspectives: (1) software architecture, and (2) workflows and functionality, including the capacity to facilitate collaboration between libraries and engage users. scope and purpose the purpose of the analysis is to determine what potential effect the next-generation ils will have on library systems and technical services staffing models in general. two categories of ilss were chosen and compared. the first category consists of two major traditional ilss: ex libris’s voyager and innovative interfaces’ millennium. 
the second category includes three nextgeneration ilss: ex libris’s alma, oclc’s worldshare management services (wms), and innovative interfaces’ sierra. voyager and millennium were chosen because they hold a large portion of current market shares and because the authors have experience with these systems. yale university library is currently using voyager, while central washington university library is using millennium. alma, wms, and sierra were chosen because these three next-generation ilss are produced by market leaders in the library automation industry. the authors have learned about these new products by reading and analyzing literature and vendors’ proposals, as well as information technology and libraries | september 2013 49 attending vendors’ webinars and product demonstrations. in the long run, yale university library must look for a new library service platform to replace voyager, verde, metalib, sfx, and other add-ons. central washington university library is affiliated with the orbis cascade alliance mentioned above. the alliance is implementing a new library management service to be shared by all thirty-seven members of the consortium. ex libris, innovative interfaces, oclc, and serials solutions all bid for the alliance’s shared ils. after an extensive rfp process, in july 2012 the orbis cascade alliance decided to choose ex libris’s alma and primo as their shared library services platform. the system will be implemented in four cohorts of approximately nine member libraries each over a two-year period, beginning in january 2013. the central washington university library is in the forth migration cohort, and their new system will be live in december 2014. it is important to emphasize that the next-generation ils has no local online public access catalog (opac) interface. vendors use additional discovery products as the discovery-layer interfaces for their next-generation ilss. specifically, ex libris uses primo as the opac for alma, while oclc’s worldcat local provides the front-end interface for wms. innovative interfaces offers encore as the discovery layer for sierra. as front-end systems, these discovery platforms provide library users with one-stop access to their library resources, including print materials, electronic resources, and digital materials. while these discovery platforms will also impact library organization and librarianship, they will have more impact on the way that end-users, rather than library staff, discover and interact with library collections. in this analysis, we focus on the effects that back-end systems such as alma, wms, and sierra will have on library organizational structure and staffing, rather than the end-user experience. as our sample only includes five ilss, the scope of the analysis is limited, and the findings cannot be universal or extended to all academic libraries. however, readers will gain some insight into what challenges any library may face when migrating to a next-generation ils. literature review a few studies have been published on library staffing models. patricia ingersoll and john culshaw’s 2004 book about systems librarianship describes vital roles that systems librarians play, with responsibilities in the areas of planning, staffing, communication, development, service and support, training, physical space, and daily operations. 5 systems librarians are the experts who understand both library and information technology and can put the two fields together to context. 
they point out that system librarians are the key players who ensure that a library stays current with new information technology. the daily and periodic operations for systems librarians include ils administration, server management, workstation maintenance, software and applications maintenance and upgrades, configuration, patch management, data backup, printing issues, security, and inventory. all of these duties together constitute the workloads of systems librarians. ingersoll and culshaw also emphasize that systems librarians must be proactive in facing constant changes and keep abreast of emerging library technologies. a comparative analysis of the effect of the integrated library system on staffing models in academic libraries | fu and fitzgerald 50 edward iglesias et al., based on their own experiences and observations at their respective institutions, studied the impact of information technology on systems staff.6 their book covers concepts such as the client-server computing model, web 2.0, electronic resource management, open-source, and emerging information technologies. their 2004 studies show that, tough there are many challenges inherent in the position, there are also many ways for system staff to improve their knowledge, skills, and abilities to adapt to the changing information technologies. janet guinea has also studied the roles of systems librarians at an academic library.7 her 2003 study shows that systems librarians act as bridge-builders between the library and other university units in the development of library-initiated projects and in the promotion of information technology-based applications across campus. another relevant study was conducted by marshall breeding at vanderbilt university in an investigation of the library automation market. his 2012 study compares the well-established, traditional ilss that dominate the current market (and are based on client-server computing architecture developed more than a decade ago) to the next-generation ilss deployed through multitenant software-as-a-service (saas) models, which are based on service-oriented architecture (soa).8 through this comparison, breeding indicates that next-generation ilss will differ substantially from existing traditional ilss and will eliminate many hardware and maintenance investments for libraries. the next-generation ils will bring traditional ils functions, erms, digital asset management, link resolvers, discovery layers, and other add-on products together into one unified service platform, he argues.9 he gave the next-generation ils a new term, library services platform.10 this term signifies that a conceptual and technical shift is happening: the next-generation ils is designed to realign traditional library functions and simplify library operations through a more inclusive platform designed to handle different forms of content within a unified single interface. 
breeding’s findings conclude that the next-generation ils provides significant innovations, including management of print and electronic library materials, reliance on global knowledge bases instead of localized databases, deployment through multitenant saas based on a service-oriented architecture, and the provision of a suite of application programming interfaces (apis) that enable greater interoperability and extensibility.11 he also predicts that the next-generation ils will trigger a new round of ils migration.12 method our method narrowed down the analysis for the implications of ilss on library systems and technical services staffing models to two major aspects: (1) software architecture, and (2) workflows and functionality, including facilitation of collaborations between libraries and user engagement. first, we analyzed two traditional ilss, voyager and millennium, which are built on a client-server computing model, deliver modular workflow functionality, and are implemented in our institutions. through the analysis, we determined how these two aspects affect library organizational structure and librarian positions designed for managing these modular tasks. then, information technology and libraries | september 2013 51 based on information we collected and grouped from vendors’ documents, rfp responses, product demonstrations, and webinars, we examined the next-generation ilss alma, wms, and sierra— which are based on soa and intended to realign traditional library functions and simplify library operations—to evaluate how these two factors will impact staffing models. to provide a more in-depth analysis, particularly for systems staffing models, we also gathered and analyzed online systems librarian job postings, particularly for managing the voyager or millennium system, for the past five years. the purpose of this compilation is to cull a list of typical responsibilities of systems librarians and then determine what changes may occur when they must manage a next-generation ils such as alma, wms, or sierra. data on job postings were gathered from online job banks that keep an archive of past listings, including code4lib jobs, ala joblist, and various university job listing sites. duplicates and reposts were removed. the responsibilities and duties described in the job descriptions were examined for similarities to determine a typical list. the data from all sources were gathered together in a single database to facilitate its organization and manipulation. specific responsibilities, such as administering an ils, were listed individually, while more general responsibilities for which descriptions may vary from one posting to another were grouped under an appropriate heading. to ensure complete coverage, all postings were examined a second time after all categories had been determined. we also used our own institutions as examples to support the analysis. the implications of ils software architecture on staffing models voyager and millennium are built on client-server architecture. libraries that use these ilss also use add-ons, such as erms and link resolvers, to manage their print materials and licensed electronic resources. the installation, configuration, and updates of the client software require a significant amount of work for library it staff. many libraries must allocate substantial staff effort and resources to coordinating the installation of the new software on all computers throughout the library that access the system. 
those libraries that allow staff to work remotely have experienced additional costs and it challenges. in addition, server maintenance, backups, upgrades, and disaster recovery also require excessive time and effort of library it staff. administering ilss, erms, and other library hardware, software, and applications is one of the primary responsibilities for a library systems department. positions such as systems librarian, electronic resource librarian, and library it specialist were created to handle this complicated work. at a very large library, such as yale university library, the systems group of library it is only responsible for voyager’s configuration, operation, maintenance, and troubleshooting. two other it support groups—a library server support group and a workstation support group—are responsible for installation, maintenance, and upgrade of the servers and workstations. specifically, the library server support group deals with the maintenance and upgrade of ils servers and the software and relational database running on the servers, while the workstation support group takes care of the installation and upgrade of the client software on hundreds of a comparative analysis of the effect of the integrated library system on staffing models in academic libraries | fu and fitzgerald 52 workstations throughout twenty physical libraries. at a smaller library, such as central washington university library, on the other hand, one systems librarian is responsible for the administration of millennium, including configuration, maintenance, backup, and upgrade on the server. another library it staff member helps install and upgrade the millennium client on about forty-five staff computers throughout its main library and two center campus libraries. comparatively, the next-generation ilss alma, wms, and sierra have a saas model designed by soa principles and deployed through a cloud-based infrastructure. oclc defines this model as “web-scale management services.”13 using this innovation, service providers are able to deliver services to their participating member institutions on a single, highly scalable platform, where all updates and enhancements can be done automatically through the internet. the different participating member institutions using the service can configure and customize their views of the application with their own brandings, color themes, and navigational controls. the participating member institutions are able to set functional preferences and policies according to their local needs. web-scale services reduce the total cost of ownership by spreading infrastructure costs across all the participating member institutions. the service providers have complete control over hardware and software for all participating member institutions, dramatically eliminating capital investments on local hardware, software, and other peripheral services. service providers can centrally implement applications and upgrades, integration across services, and system-wide infrastructure requirements such as performance reliability, security, privacy, and redundancy. thus participating member institutions are relieved from this burdensome responsibility that has traditionally been undertaken by their it staff.14 from this perspective, the next-generation ils will have a huge impact on library organizational structure, staffing, and librarianship. 
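the multitenant idea behind these web-scale platforms can be sketched conceptually: a single vendor-hosted application serves every participating institution, and each institution supplies only its own branding and policy settings rather than running its own servers and client installs. the structure below is a simplified illustration under that assumption; the setting names are hypothetical and do not correspond to actual alma, wms, or sierra configuration options.

```python
# one shared, vendor-hosted deployment; each tenant holds only local preferences.
TENANTS = {
    "university-a": {"branding": "crimson", "loan_period_days": 21, "fine_per_day": 0.25},
    "college-b":    {"branding": "forest",  "loan_period_days": 14, "fine_per_day": 0.10},
}

def loan_policy(institution_id):
    """resolve circulation policy for one tenant on the shared platform."""
    return TENANTS[institution_id]

def apply_platform_upgrade(new_feature):
    """an upgrade applied once by the vendor is visible to every tenant;
    there are no local servers to patch and no client installs to coordinate."""
    for settings in TENANTS.values():
        settings.setdefault("features", []).append(new_feature)

apply_platform_upgrade("unified-erm-workflow")
print(loan_policy("college-b"))
```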
since the next-generation ils is implemented through the cloud-computing model, there is no requirement for local staff to perform the functions traditionally defined as "systems" staff activities, such as server and storage administration, backup and recovery administration, and server-side network administration. for example, the entire interfaces of alma and wms are served via web browser; there is no need for local staff to install and maintain clients on local workstations. therefore, if an institution decided to migrate to a next-generation ils, the responsibilities and roles of systems staff within the institution would need to be readdressed or redefined. we have learned from attending oclc's webinars and product demonstrations that library systems staff would be required to prepare and extract data from their local systems during new systems implementation. they also would be required to configure their own settings, such as circulation policies. however, after the migration, a systems staff member would likely serve as a liaison with the vendor. this would require, according to oclc's proposal, only 10 percent of the systems staff's time on an ongoing basis. through attending ex libris's webinars and product demonstrations, we have learned that a local system administrator may be required to take on basic management processes, such as record loading or integrating data from other campus systems. similarly, we have learned from innovative interfaces' webinars and product demonstrations that sierra would still need local systems expertise to perform the installations of the client software on staff workstations. sierra would require library it staff to perform administrative tasks like user account administration and to support sierra in interfacing with local institution-specific resources. in general, as shown in table 1, local systems staff could be freed from the burdensome responsibility of administering the traditional ils because of the software architecture of the next-generation ils.

table 1. systems librarian responsibilities comparison for traditional ils and next-generation ils

systems librarian responsibility | workload percentage | traditional ils | next-gen ils
managing ils applications, including modules and the opac | 10 | x |
managing associated products such as discovery systems, erms, link resolver, etc. | 10 | x |
day-to-day operations, including management, maintenance, troubleshooting, and user support | 10 | x | x
server maintenance, database maintenance, and backup | 10 | x |
customizations and integrations | 5 | x | x
configurations | 5 | x | x
upgrades and enhancements | 5 | x |
patches or other fixes | 5 | x |
design and coordination of statistical and managerial reports | 5 | x | x
overall staff training | 5 | x | x
primary representative and contact to the designated library system vendors | 5 | x | x
keeping abreast of developments in library technologies to maintain current awareness of information tools | 5 | x | x
engaging in scholarly pursuit and other professional activities | 10 | x | x
serving on various teams and committees | 5 | x | x
reference and instruction | 5 | x | x
total | 100 | 100% | 60%

note: the systems librarian responsibilities and the approximate percentage of time devoted to each function are slightly readjusted based on the compiled descriptions of the systems librarian job postings we collected and analyzed from the internet and from vendors' claims.
a total of 47 position descriptions were gathered. the workload percentage is adopted from the job description of the systems librarian position at one of our institutions. our analysis shows that systems staff might reduce their workload by approximately 40 percent. therefore, library systems staff could use their time to focus on local applications development and other library priority projects. however, it is important to emphasize that library systems staff should reengineer themselves by learning how to use the apis provided by the next-generation ils so that they will be able to support the customization of their institutions' discovery interfaces and the integration of the ils with other local enterprise systems, such as financial management systems, learning management systems, and other local applications.

the implications of ils workflows and functionality on staffing models

the typical workflow and functionality of both voyager and millennium are built on a modular structure. major function modules, called client modules, include systems administration, cataloging, acquisitions, serials, circulation, and statistics and reports. additionally, the traditional ils provides an opac interface for library patrons to access library materials and manage their accounts. millennium has an erms module built in as a component of its ils, while ex libris has developed an independent erms as an add-on to voyager. the systems administration module is used to add system users and to set up locations, patron types, material types, and other library policies. the cataloging module supports the functions of cataloging resources, managing the authority files, tagging and categorizing content, and importing and exporting bibliographic records. the sophistication of the cataloging module depends primarily on the ils. the acquisitions module helps in the tracking of purchases and acquisition of materials for a library by facilitating ordering, invoicing, and data exchange with serial, book, and media vendors through electronic data interchange (edi). the circulation module is used to set up rules for circulating materials and for tracking those materials, allowing the library to add patrons, issue borrowing cards, and define loan rules. it also automates the placing of holds, interlibrary loan (ill), and course reserves. self-checkout functionality can be integrated as well. the serials module is essentially a cataloging module for serials. libraries are often dependent on the serials module to help them track and check in serials. the statistics and reports module is used to generate reports such as circulation statistics, age of collection, collection development, and other customized statistical reports. a typical traditional ils comprises a relational database, software to interact with that database, and two graphical user interfaces—one for patrons and one for staff. it usually separates software functions into discrete modules, each of them integrated with a unified interface. the traditional ils's modular design was a perfect fit for a traditional library organizational structure.
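a toy schema makes the "relational database plus discrete modules" design concrete. this is a deliberately simplified sketch, with invented table and column names rather than any vendor's actual schema, showing how cataloging, circulation, and acquisitions each operate on their own slice of one relational store through their own client module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cataloging module owns bibliographic and item records
    CREATE TABLE bib  (bib_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, bib_id INTEGER, barcode TEXT);
    -- circulation module owns patrons and loans
    CREATE TABLE patron (patron_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE loan   (item_id INTEGER, patron_id INTEGER, due_date TEXT);
    -- acquisitions module owns orders
    CREATE TABLE purchase_order (po_id INTEGER PRIMARY KEY, bib_id INTEGER, vendor TEXT);
""")

# each staff client module exposes functions over its own tables only
def catalog_title(title):
    cur = conn.execute("INSERT INTO bib (title) VALUES (?)", (title,))
    return cur.lastrowid

def check_out(item_id, patron_id, due_date):
    conn.execute("INSERT INTO loan VALUES (?, ?, ?)", (item_id, patron_id, due_date))

bib_id = catalog_title("our enduring values")
conn.execute("INSERT INTO item (bib_id, barcode) VALUES (?, ?)", (bib_id, "0001"))
check_out(1, 1, "2013-10-01")
print(conn.execute("SELECT COUNT(*) FROM loan").fetchone())  # (1,)
```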
the staff at central washington university library, for example, under the library administration, are organized into three major groups: public services, including the reference and circulation departments; technical and technology services, including the cataloging, collection development, serials and electronic resources, and systems departments; and other library services and centers, including the government documents department, the music library, two center campus libraries, the academic and research commons, and the rare book collection and archive. each department has at least one professional librarian and other library staff members responsible for its daily operations. for example, the collection development librarian is responsible for the acquisition of print monographs and serials, while the electronic resources librarian is responsible for purchasing and managing licensed databases and e-journals. the next-generation ils, however, significantly enhances and reintegrates the workflow of traditional ils functions. its functionality is quite different from the traditional ils's modular structure: the design stresses two principles, modularity and extensibility. it brings together the selection, acquisition, management, and distribution of the entire library collection and provides a centralized data-services environment for unified workflows across all types of library assets. one of the big enhancements of the next-generation ils is the acquisitions module, which enables the management of both print and electronic materials within a single unified interface, with no need to move between modules or multiple systems for different formats and related activities. for example, according to oclc, wms streamlines selection and acquisition processes via built-in access to worldcat records and publisher data. vendor, local, consortium, and global library data share the same workflows. wms automatically creates holdings for both physical and electronic resources, the worldcat knowledge base simplifies electronic resource management and delivery, and order data from external systems can be uploaded automatically. for consortium users, wms's unified workflow and interface foster efficient resource sharing between institutions whose holdings share a common format. similarly, ex libris's alma has an integrated central knowledge base (ckb) that describes available electronic resources and packages, so there is no need to load additional descriptive records when acquiring electronic resources based on the ckb. the purchasing workflow manages orders for both print and electronic resources in a very similar way and handles aspects unique to electronic resources, such as license management and the identification of an access provider. staff users can start the ordering process by searching the ckb directly and ordering from there; this search is integrated into the repository search, allowing a staff user to search both his or her own institution and the community zone, which holds the ckb. the next-generation ils provides unified data services and workflows and a single interface to manage all physical, electronic, and digital materials. this will require libraries to rethink their acquisitions staffing models: small libraries, for example, could merge the acquisitions librarian and electronic resources librarian positions or reorganize the two departments.
another functionality enhancement of the next-generation ils is the ability for consortial users to manage local holdings and collections as well as shared resources. for example, wms's single shared knowledge base eliminates the need for each library to maintain a copy of a knowledge base locally, because all consortium members can easily see what is licensed by other members. cataloging records are shared at the consortium and global levels in real time, so each institution immediately benefits from original cataloging records added to the system and from enhancements to existing records. authority control is built into worldcat, so there is no need to do authority processing against local bibliographic databases, and with real-time circulation between libraries' collections there is no need to re-create bibliographic and item data in separate local systems. similarly, sierra enhances traditional technical services workflows by providing a shared bibliographic database: whenever a member library performs selection or ordering, it can determine whether other consortium members have already selected, ordered, and cataloged the title. this may influence a local selection, allowing consortium members to develop their individual collections more collectively and reduce duplication. alma's centralized metadata management service (mms) takes a very similar approach to wms and sierra, allowing several options for local control and shared cataloging, depending on an institution's needs, while ex libris maintains the authority files. very large institutions, for example, might manage some records in the local catalog and most records in a shared bibliographic database, while smaller institutions might manage all of their records in the shared bibliographic database. all of these approaches require more collaboration and cooperation between consortium members. according to the vendors' proposals to the orbis cascade alliance, small institutions might not need a professional cataloger, since the cataloging process is simplified and it is therefore easier for paraprofessional staff to copy bibliographic records from the knowledge bases of these ilss. in addition, the next-generation ils also allows library users to engage actively with ils software development. for example, by adding opensocial containers to the product, wms allows library developers to use apis to build social applications called gadgets and add these gadgets to wms. one example highlighted by oclc is a gadget in the acquisitions area of wms that shows the latest new york times best sellers and how many copies the library has available for each of those titles. similarly, sierra's open developer community will allow library developers to share ideas, reference code samples, and build a wide range of applications using sierra's web services, and sierra will provide a centralized online resource, the sierra developer sandbox, offering a comprehensive library of documented apis for library-developed applications. all these enhancements provide library staff with new opportunities to redefine their roles in a library.
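as a rough illustration of the best-seller gadget idea described above, the python sketch below checks a list of popular titles against library holdings through a web api. the best-seller feed, the holdings endpoint, and the field names are all hypothetical placeholders, and actual wms gadgets are built as opensocial javascript/xml components; the sketch only shows the underlying "look up each title and count available copies" logic.

import requests

# hypothetical endpoints; a real gadget would use the vendor's documented web services
BESTSELLER_FEED = "https://example.org/api/bestsellers/current"
HOLDINGS_API = "https://ils.example.edu/api/v1/holdings"

def available_copies(isbn):
    """ask the (hypothetical) ils holdings api how many copies are available."""
    resp = requests.get(HOLDINGS_API, params={"isbn": isbn, "status": "available"}, timeout=30)
    resp.raise_for_status()
    return len(resp.json()["items"])

def bestseller_report():
    """pair each current best seller with the number of available local copies."""
    feed = requests.get(BESTSELLER_FEED, timeout=30).json()
    return [(book["title"], available_copies(book["isbn"])) for book in feed["books"]]

if __name__ == "__main__":
    for title, copies in bestseller_report():
        print(f"{title}: {copies} copies available")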
conclusions and arguments

in summary, compared to the client-server architecture and modular design of the traditional ils, the next-generation ils has an open architecture and is more flexible and unified in its workflow and interface, which will have a huge impact on library staffing models. the traditional ils specifies clear boundaries between staff modules and workflows, while the next-generation ils has blurred these boundaries. the integration and enhancement of the functionality of the next-generation ils will help libraries streamline and automate workflows and processes for managing both print and electronic resources. it will increase libraries' operational efficiency, reduce the total cost of ownership, and improve services for users. in particular, it will free approximately 40 percent of library systems staff time from managing servers, software upgrades, client application upgrades, and data backups. moreover, the next-generation ils provides a new way for consortial libraries to collaborate, cooperate, and share resources. in addition, the web-scale services provided by the next-generation ils allow libraries to access an infrastructure and platforms that enable them to reach a broad, geographically diverse community while simultaneously focusing their services on meeting the specific needs of their end users. thus the more integrated workflows and functionality allow library staff to work with more modules, play multiple roles, and back each other up, which will bring changes to traditional staffing models.

however, the next-generation ils also brings libraries new challenges along with its clear advantages. librarians and library staff might have concerns about their job security and can be fearful of new technologies. they may feel anxious about how to reengineer their business processes, how to get training, how to improve their technological skills, and how to prepare for a transition. we argue here that library directors should think about these staff frustrations and find ways to address their concerns. libraries should provide staff more opportunities and training to help them improve their knowledge and skills. redefining job descriptions and reorganizing library organizational structures might be necessary to better adapt to the changes brought about by the next-generation ils. systems staff might invest more time in local application development, other digital initiatives, website maintenance, and other library priority projects. technical staff might reconsider their workflows and cross-train to expand their knowledge and improve their work efficiency; they might spend more time on data quality control and special collection development or interact more with faculty on book and e-resource selection. we hope this analysis will provide useful information and insights for libraries planning to move to the next-generation ils. the shift will require academic libraries to reconsider their organizational structures and rethink their manpower distribution and staffing optimization to better focus on library priorities, projects, and services critical to their users.

references

1. marshall breeding, "a cloudy forecast for libraries," computers in libraries 31, no. 7 (2011): 32–34.

2. marshall breeding, "current and future trends in information technologies for information units," el profesional de la información 21, no. 1 (2012): 11.
3. jason vaughan and kristen costello, "management and support of shared integrated library systems," information technology & libraries 30, no. 2 (2011): 62–70.

4. marshall breeding, "agents of change," library journal 137, no. 6 (2012): 30–36.

5. patricia ingersoll and john culshaw, managing information technology: a handbook for systems librarians (westport, ct: libraries unlimited, 2004).

6. edward g. iglesias, an overview of the changing role of the systems librarian: systemic shifts (oxford, uk: chandos, 2010).

7. janet guinea, "building bridges: the role of the systems librarian in a university library," library hi tech 21, no. 3 (2003): 325–32.

8. breeding, "agents of change," 30.

9. ibid.

10. ibid., 33.

11. ibid., 33.

12. ibid., 30.

13. sally bryant and grace ye, "implementing oclc's wms (web-scale management services) circulation at pepperdine university," journal of access services 9, no. 1 (2012): 1.

14. gary garrison et al., "success factors for deploying cloud computing," communications of the acm 55, no. 9 (2012): 62–68.

on-line acquisitions by lolita

frances g. spigai: former information analyst, oregon state university library; and thomas mahan: research associate, oregon state university computer center, corvallis, oregon.

the on-line acquisition program (lolita) in use at the oregon state university library is described in terms of development costs, equipment requirements, and overall design philosophy. in particular, the record format and content of records in the on-order file, and the on-line processing of these records (input, search, correction, output) using a cathode ray tube display terminal, are detailed.

the oregon state university library collection has grown by 15,000–20,000 new titles per year (corresponding to 30,000–35,000 volumes per year) for the past three years, to a total of approximately 275,000 titles (600,000 volumes); continuing serials account for a large percentage of annual "volume" growth. these figures indicate an average input of 60–80 new titles per day. on average, a corresponding number of records are removed each day upon completion of the processing cycle, and a like number of records are updated when books and invoices are received. in addition, approximately 200 searches per day are made to determine whether an item is being ordered or to determine the status of an order. since the mid-1960s, and with the introduction of time-sharing, a handful of academic libraries (1, 2, 3) and several library networks (4, 5, 6) have introduced the advantages (7) of on-line computer systems to library routines. most of the on-line library systems use teletypewriter terminals. use of visual displays for library routines has been limited, although stanford anticipates using visual displays with ibm 2741 typewriter terminals in a read-only mode (1), and the library of the ibm advanced systems development division at los gatos, sharing an ibm 360/50, uses an ibm 2260 display for ordering and receiving (8). in addition, an institute of library research study, focusing on on-line maintenance and search of library catalog holdings records, has concluded that even with the limited number of characters available on all but the most expensive display terminals,
"... the high volume of data output associated with bibliographic search makes it desirable to incorporate crt's as soon as possible, in order to facilitate testing on a basis superior to that achievable with the mechanical devices" (9). many academic libraries, during shelflist conversion or input of acquisition data, use a series of tags for bibliographic information. some of these tags are for in-house use, while others presumably are used to aid in the conversion of marc tape input to the library's own input format. the number of full-time staff required to design and operate automated systems in individual academic libraries typically ranges from seven to fifteen. this does not seem to be an inordinate range, since most departments of a medium-large to large academic library require a similar-size staff for operational purposes alone.

lolita (library on-line information and text access) is the automated acquisition system used by the oregon state university library. it operates in an on-line, time-shared, conversational mode, using a cathode ray tube (cdc-210) or a 35-ksr teletype as a terminal, depending upon the operation required. both types of equipment are in the acquisitions department of the library; each interacts with the university's main computer (cdc-3300, 91k core, 24-bit words), which in turn accesses the mass storage disk (cdc-814, capable of storing almost 300 million characters) through the use of lolita's programs in conjunction with the executive program, os-3 (10). under the os-3 time-sharing system, lolita shares the use of the central computer memory and processor with up to 59 other concurrent users; the use of the mass storage disk is also shared with other users of the university's computer center. (lolita will require approximately 11 million characters of disk storage.) lolita's programs are written in fortran and in the assembly language, compass, and are composed of two sets: those which maintain the outstanding order file, and those which produce printed products and maintain the accounting and vendor files.

several key factors have shaped the design of lolita. an on-line, time-sharing system has been operating at osu since july 1968, and on-line capabilities have been available for test purposes since the summer of 1967. programming efforts could be concentrated exclusively on the design of lolita and an earlier pilot project (11), for no time was needed to design, debug, or redesign the operating system software, as was necessary at washington state university and the university of chicago (2, 12). heavy reliance was put on assembly language coding for the usual reasons, plus the knowledge that the computer center's next computer is to be a cdc-3500, with an instruction set identical to that which the library now uses. in short, neither the os-3 operating system nor the assembly language will change for the next few years. an added motivation influencing program design was the desire to minimize response time for the user. in view of the transient nature of a university library's student and civil service staff, the need for an easily learned and maintained system is paramount. the flexible display format of the crt allows a machine-readable worksheet with a built-in, automatic tagging scheme; it obviates the need for a paper worksheet and thus eliminates a time-consuming, tedious, and error-prone conversion process. the book request slip contains the source information for input.
proofreading and correction are done on-line at time of input; alterations can be made at any later time as well. lolita has used from 1.5 to 3.0 fte through the period of design to operation. after an initial testing and data base buildup period, anticipated to last about six months, during which lolita will be run in parallel with the manual system, it is expected that the on-order/in-process, vendor, and accounting files will be maintained automatically and that reports and forms currently output by the acquisitions department staff will be generated automatically. specifically, records comprising three files will be kept on-line: 1) the outstanding order file (a slight misnomer, since it includes and will include three types of book request data: outstanding orders, desiderata of high priority, and in-process material); 2) names and addresses for those vendors of high use (approximately 200 of 2,500, or about 8 percent), and codes and use-frequency counts for all vendors; and 3) accounting data for all educational resource materials purchased by the oregon state university library. it should be kept in mind that, although lolita is designed for book order functions, the final edited record, after the item has been cataloged, will be captured on magnetic tape as a complete catalog record. thus, all statistics and information, except circulation data, will be available for future book acquisitions. this project is being undertaken for two reasons: 1) the oregon state university library is concerned that librarians achieve their potential as productive professionals through the use of data processing equipment for routine procedures, and that cost savings may be realized as the library approaches a total system encompassing all of the technical services routines; and 2) a uniquely receptive computer center and a successful on-line time-sharing facility are available.

record format and content

each book request is described by 27 data elements, which are grouped into three logical categories and displayed as three logical "pages" on the crt screen. the categories are 1) bibliographic information, 2) accounting information, and 3) inventory information; figures 1, 2, and 3 list the data elements in the same sequence as they appear on the crt screen. though most data elements listed are self-explanatory, eight require some description.

fig. 1. bibliographic information: order number, flag word, author, title, edition, id number, publisher, year published, notes.

fig. 2. accounting information: order number, date requested, date ordered, estimated price, number of copies, account number, vendor code, vendor invoice number, invoice date, actual price, date received, date 1st claim sent, date 2nd claim sent.

fig. 3. inventory information: order number, bib cit, date cataloged, volume, issue, location code, lc class number.

flag word. this data element indicates the status of a request. the normal order procedure needs no flag word; exceptions are dealt with automatically by entering an appropriate flag word. as more requests are added to the system, and as more exceptional instances are uncovered, more flag words will undoubtedly be added. to date there are twelve flag words, plus one data element which serves both as a data element and as a status signal. the flag words and the procedures they activate are described below.
conf.: confirming orders for materials ordered by phone or letter, and for unsolicited items which are to be added to the collection. the order form is not mailed, but is used only for processing internal to the library. accounting routines are activated.

gift: for gift or exchange items, a special series number prefixed by a "g" is assigned, and the printed purchase order is used internally only. this flag word also acts as a signal so that accounting routines will not encumber any money. the primary reason for assigning a purchase order number is to provide a record indexing mechanism (this is also true for held orders).

held: selected second-priority orders being held up for additional book budget funds. these order records are kept on line and are assigned a special series of purchase order numbers prefixed by an "h." no accounting procedures accompany these orders, although a purchase order is generated and manually filed by purchase order number.

live: held orders which have been activated. this word causes a reassignment of purchase order numbers to the next number in the main sequence (instead of an "h"-prefixed number) and sets up the natural chain of accounting events. the new purchase order number is then written or typed on the order form, the order date added, and the order mailed.

cash: orders for books from vendors who require advance payment. an expenditure, instead of an encumbrance, is recorded.

rush: used for books which are to be rush ordered and/or rush cataloged. rush will also be rubber-stamped on the purchase order for emphasis. no special procedures are activated within the computer programs; rush is an instruction for people.

docs: used when ordering items from vendors with whom the osu library maintains deposit accounts (e.g., the government printing office). this causes a zero encumbrance in the accounting scheme; cash is used to put additional money into deposit accounts.

canc: cancelled orders. unencumbers monies and credits accounts for cash orders.

reis: used to reissue an order for an item which has been cancelled. a new purchase order containing a new order number, vendor, etc., will automatically be issued. re-input is not necessary; however, changes in vendor number, etc., can be made.

part: denotes a partial shipment for one purchase order. no catalog date can be entered while part appears as the flag word. invo will replace part when the final shipment has been received; canc will replace part if the final shipment is not received, and the order is reissued for the portion received.

invo: when invoice information is entered into the file, invo is typed in as the flag word. this causes accounting information (purchase order number, vendor code, invoice number, actual price, invoice date, account number) to be duplicated in the accounting file.

kill: used to remove an inactive record from the file (cf. date cataloged).

date cataloged: a value entered for this data element signals the end of processing. the record is removed from the main file and transferred to magnetic tape. changes and additions to inventory and bibliographic data elements are anticipated at this final point, to bring the record into line with those of the catalog department.

author(s). all authors are to be included in this data element: corporate authors, joint authors, etc. the entry form is last name first (e.g., smith, john a.). for compound authors, a slash is used as the delimiter separating names (e.g., smith, john a. / jones, john paul).
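the flag words above form a small status vocabulary that drives the accounting routines. the following python sketch is our illustration of the dispatch just described, not lolita's actual fortran/compass code: each flag word maps to the accounting action it triggers, and flag words such as rush deliberately trigger nothing, since they are instructions for people rather than for the program.

def encumber(order):
    # normal orders, confirming orders, and activated held orders encumber the estimated price
    print("encumber", order["price"], "against account", order["account"])

def expend(order):
    # cash orders record an expenditure instead of an encumbrance
    print("record expenditure of", order["price"])

def unencumber(order):
    # cancellations release the encumbrance (and credit the account for cash orders)
    print("release encumbrance for", order["po_number"])

def copy_to_accounting(order):
    # invo duplicates invoice data into the accounting file
    print("copy invoice data for", order["po_number"], "to the accounting file")

def no_accounting(order):
    # gift, held, and docs orders deliberately touch no money
    pass

ACCOUNTING_ACTIONS = {
    None: encumber, "conf": encumber, "live": encumber,
    "cash": expend, "canc": unencumber, "invo": copy_to_accounting,
    "gift": no_accounting, "held": no_accounting, "docs": no_accounting,
}

def process(order):
    """apply the accounting side effect, if any, for the order's current flag word."""
    ACCOUNTING_ACTIONS.get(order.get("flag"), no_accounting)(order)

process({"flag": "cash", "po_number": "70-125", "price": 4.95, "account": "30-1061-6-20"})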
id number. standard book number, vendor catalog number, etc.

order number. the order number is automatically assigned to one of three series depending on the flag word: the main number series, with the fiscal year as prefix; the held order series, with an "h" prefix (stored in the order number index as 101; the "h" is what is printed on the order forms); and the gift series, with a "g" prefix (likewise stored in the order number index as 102).

vendor code. a sample of 18 months of invoice data (obtained from the comptroller's office) for the library resource account number indicates the use of 2,200 vendors during that period. by sorting by invoice frequency and dollar amount, about 200 vendors were identified who either invoiced the library more than 12 times during this period (since the invoices tended to contain more than one item for frequently used vendors, the number of purchase orders issued could easily be several times this amount) or whose invoices totalled over $110.00. of these, 171 have been selected for on-line storage. they will be assigned code numbers 1 to 171, and the names and addresses of these vendors will be included on the computer-generated purchase orders. authority files for all vendors are kept on rolodex units; one set is arranged alphabetically by vendor name, the other by vendor code.

account number. the library account to which the book is charged. the number is divided into four sections: 1) a two-digit prefix identification for osu, 2) a four-digit identification for osu library resource expenditures, 3) a one- or two-digit identification of the particular library resource fund account to be charged (e.g., science, humanities, serials, binding, etc.), and 4) a one- or two-digit code identifying the subject which most closely describes the request. from this data, statistics will be derived which describe expenditures by subject as well as by fund allocation. this will provide a powerful tool for collection building and may also be a political aid in governing departmental participation in book selection.

bibcit. bibliographic citation code, recording where acquisitions department personnel located the bibliographic data (l.c. copy, etc.). this information is included on the catalog work slip (fourth copy of the purchase order) so that duplicate searching by the catalog department can be avoided.

lc classification number. refers to the call number as it is assigned by the osu catalog department.

file organization

on-order record. the operating system for oregon state university's on-line, time-sharing system reads into memory a quarter page (or file block) of 510 computer words at a time. to best use this system, each on-order (outstanding order) record is composed of a block of 51 computer words (204 6-bit characters), or of linked lists of such blocks; each quarter page is thus divided into ten physical records of 51 computer words apiece. for records requiring more than one block, the nearest available block of 51 words within the same 510-word file block is used; if none is vacant within the same file block, the first available 51-word block in the file is used, and if none is free the file is lengthened to provide more blocks. a bit array is used to keep track of the status (in use, vacant) of records in the main file. in the bit array, each of 20 bits of each 24-bit computer word corresponds to a 51-word block in the main file.
as in figure 4, the 13th bit has a zero value, indicating a vacancy in the 13th 51-word block of the main file; the 14th bit has a value of 1, indicating that the 14th 51-word block in the on-order file is in use. a total of 10,120 block locations can be monitored by each file block of the bit array. records in this file are logically ordered by purchase order number, the arrangement effected by pointers which string the blocks together.

fig. 4. bit array monitor of record block use in the on-order file (each one-word entry of the bit array uses 20 bits, with 4 bits unused, to map the blocks of a 510-word file block).

access points

order number. the order number index is arranged by the main portion of the order number and, within that, in prefix number sequence. the sequence in figure 5 illustrates the order number index arrangement (as well as the logical arrangement of the on-order file). the order number index allows quick access to selected points within the main file. conceptually, the ordered main file is segmented into strings of records whose order numbers fall into certain ranges. more specifically, items whose sequence numbers range from 0 to 4 (ignoring the prefix of the order number) comprise the first segment, 5 to 9 the second, etc. the index itself merely contains pointers to the leading record in each (conceptual) segment. thus, among the records whose purchase order numbers are shown in figure 5, there would be pointers to the second (69-124) and sixth (70-125), but not to the others. to reach the fourth (101-124), one follows the index to the second and then follows the block pointers through the third to the fourth.

fig. 5. order number index sequence: 102-118, 69-124 (fiscal year 1969, order number 124), 70-124 (fiscal year 1970, order number 124), 101-124 (held order number 124 for the current year), 102-124 (gift order number 124 for the current year), 70-125, 102-125, 70-126. (note: the prefix "h," which is printed on the purchase orders, is represented as the number 101 for internal computer processing; likewise 102 represents the prefix "g.")

fig. 6. "on order" record organization: p.o. number, forward and backward pointers, time of last update, title forward and backward pointers, pointers to author(s), title, date of request, date ordered, encumbered price, number of copies, account number (two words), vendor number, flag word, publisher, date of publication, notes, edition, id number, bibcit, lc classification number, volume number, issue, location code, vendor's invoice number, invoice date, actual price, date received, date first claim sent, date second claim sent.

author(s). the author index is in the form of a multi-tiered inverted tree. the lowest tier is an inverted index containing the only representation of the authors' names (they are not stored in the on-order record, figure 6) and, for each author, pointers to the records of each of his books (figure 7). the entries for several authors may be packed into a single 51-word block, if space permits. each higher tier serves to direct the indexing mechanism to the proper block in the next tier below, and to this end as much as needed of an author's name is filed upwards into the higher tiers; this method is described in more detail by lefkovitz (13) as "the unique truncation variable length key-word key."
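to make the block-allocation scheme concrete, here is a small python sketch of our own (not lolita's compass code) of a bit-array free-block tracker along the lines described above: each word devotes 20 of its 24 bits to marking 51-word record blocks as in use or vacant. note that 510 words of 20 bits would cover 10,200 blocks; the article quotes 10,120, presumably because a few words of each bit-array file block are reserved for housekeeping.

BITS_PER_WORD = 20   # 20 of the 24 bits in each computer word are used as in-use flags

class BlockMap:
    """track which 51-word record blocks are in use (1) or vacant (0)."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.words = [0] * ((n_blocks + BITS_PER_WORD - 1) // BITS_PER_WORD)

    def _locate(self, block):
        return divmod(block, BITS_PER_WORD)   # (word index, bit index within the word)

    def mark_in_use(self, block):
        w, b = self._locate(block)
        self.words[w] |= 1 << b

    def mark_vacant(self, block):
        w, b = self._locate(block)
        self.words[w] &= ~(1 << b)

    def first_vacant(self):
        """return the index of the first vacant block, lengthening the file if it is full."""
        for block in range(self.n_blocks):
            w, b = self._locate(block)
            if not self.words[w] & (1 << b):
                return block
        # no vacancy anywhere: lengthen the file by one block, as the article describes
        self.n_blocks += 1
        if self.n_blocks > len(self.words) * BITS_PER_WORD:
            self.words.append(0)
        return self.n_blocks - 1

blocks = BlockMap(10_120)
blocks.mark_in_use(13)          # the 14th block (index 13) is in use, as in the figure 4 example
print(blocks.first_vacant())    # -> 0, the first vacant 51-word block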
fig. 7. author index organization and access to the on-order file: the lowest tier is an inverted author index whose entries carry a control word (number of characters in the record, number of characters in the author's full name, and number of titles in the on-order file) plus pointers into the on-order file; an author index directory above it steers the search to the proper block.

title. not yet programmed.

on-line record processing

record creation. after a number of new book requests have been searched to determine their absence from osu's collection, and after they have been bibliographically identified, they are batched for vendor assignment and readied for entry into the on-line file of book requests via the crt (figure 8).

fig. 8. book request processing (flowchart).

lolita's starting page is obtained by typing the word lolita on the crt screen. the text illustrated in figure 9 is then displayed on the screen of the crt. when "1" is typed in, indicating a wish to create a record, the first data element of the first page of input appears (figure 10). (since the majority of records do not need a flag word upon input, the flag word fill-in line appears only on a redisplay of this page, and the flag word may be inserted at that time.)

fig. 9. "starting" page of function choices (main file, please indicate a choice): 1. create a new entry; 2. locate an existing entry; 9. terminate all processing.

fig. 10. first data element displayed in the new record creation process: author(s), with examples: jones; dequincey, thomas; washington, booker t.; adams, john quincy / doe, john; american medical association.

at this point the user can go in one of two directions. the first page of input information may be entered one data element at a time, each element being requested in a tutorial fashion by lolita; alternatively, all of the first-page data may be input at once, with data elements separated by delimiters. the user can switch from one method to the other at any point. a control key (return) is the delimiter used to signal the end of each data element; at the same time, return repositions the cursor (which indicates the position of the next character to be typed on the crt screen) to the location of the next data element to be filled in. another control key (send) 1) serves as a terminal delimiter, 2) transmits the data on the screen to the computer, and thereby 3) triggers the continuation of processing until the next screen display is generated. thus, with page one, data elements are displayed, filled in, and sent one at a time in the tutorial approach, or all seven data elements are typed in at once, with a return following items 1-6 and a send after the last data element. return or send must be used with each data element, even those for which there is no information. this secures the sequence of element input, thus providing an easy (for the user) and automatic way of tagging elements for any future tape searches to provide statistics or analytical reports. in particular, this process obviates all content restrictions on variable (i.e., free-form) items. each of the pages is redisplayed after input, and corrections can be made at this time.
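the positional tagging idea described above (every element is terminated with a return or send even when empty, so its position alone identifies the field) can be illustrated with a short python sketch of our own; lolita itself implemented this in fortran and compass. the article does not list the seven operator-entered elements explicitly, so this sketch assumes they are the page-one fields other than the system-assigned order number and the flag word, which appears only on redisplay.

# assumed operator-entered page-one elements, in the fixed order they are typed on the crt
PAGE_ONE_FIELDS = ["author", "title", "edition", "id number", "publisher", "year published", "notes"]

def tag_page_one(raw_elements):
    """pair each free-form input string with its field name by position alone.

    empty strings stand for elements the operator skipped with a bare return,
    so no content restrictions or embedded tags are needed in the data itself.
    """
    if len(raw_elements) != len(PAGE_ONE_FIELDS):
        raise ValueError("every element needs its return/send, even when empty")
    return dict(zip(PAGE_ONE_FIELDS, raw_elements))

record = tag_page_one(["steinbeck, john", "the grapes of wrath", "", "", "viking", "1939", ""])
print(record["title"])   # -> the grapes of wrath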
the crt is used for all input, and its write-over capabilities are used for corrections, as compared with the "read-only" use planned for the crt displays of stanford's ballots (1). except for the flag word, all the data elements on the first page are variable in length and unrestricted as to content. the data elements on pages 2 and 3 (figures 2 and 3) are more fixed in length; with these pages, a whole page at a time is always filled in and sent, and the tutorial function is inherent in the display. the concluding display is shown in figure 11.

fig. 11. review option: "send if all done, type 1-3 to review pages."

because batched searching and input are assumed, when one search or input is finished the program recycles to continue searching or inputting without going back to the starting page (figure 9) each time.

record search. searching programs have been completed which will search by order number and by author. title searching will be implemented within the next few months, although a satisfactory scheme for title searching (improving on manual methods, yet economical) has not been uncovered; methods suggested or used by ames, kilgour, ruecking, and spires have been noted (14, 15, 16, 17). the procedure for searching within the outstanding order file begins with the display of choices shown in figure 9. one types a "2," indicating a desire to locate an existing entry, and the text shown in figure 12 is displayed on the crt screen.

fig. 12. display of search options: two fill-in lines, one for order number and one for author, with the instruction "supply one of the above (start on the appropriate line)."

at this point one chooses to search either by order number or by author. if one enters a valid order number representing a request record, the first page of that record, containing bibliographic information, is displayed, followed by the display shown in figure 11 so that accounting and inventory information may also be reviewed. for the user's convenience, the order number is displayed in the upper right-hand corner of each of the three pages, both upon record input and upon search redisplay. to search by author, one types the author's name on the second line of figure 12, using the same format as that used in record creation. if the author has only one entry in the outstanding order file, the first page of the entry will appear, as in the order number search above. if the author entered has more than one entry in the on-line file, information like that depicted in figure 13 is displayed on the screen of the crt.

fig. 13. display of multiple titles on file for one author: "enter number or 'nf' (not found)," followed by the numbered titles, e.g., 1. night of the iguana; 2. the milk-train doesn't stop here anymore; 3. cat on a hot tin roof; ... n. the glass menagerie.

if the requested title is one of the titles displayed, one types its number and the record for that title is displayed. if the title is not among those displayed, typing nf results in a redisplay of the text in figure 12 so that searching may continue. for personal authors, variant forms of the name may be located using the following procedure: the word others is entered at the top of the screen after an unsuccessful author search, so that a search for author j. p. jones would find all documents by john paul jones, joseph p. jones, j. peter jones, etc., as well as by j. p. jones, and a search for john p. jones would find all documents by j. p. jones, john jones, and j. peter jones as well as by john p. jones.
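the article does not spell out how the others search decides that two name forms match, so the following python sketch is only our guess at a plausible rule: a search name and a stored name are compatible when their surnames agree and each forename or initial of one is consistent with the corresponding forename or initial of the other.

def _name_parts(entry):
    """split 'jones, john paul' into ('jones', ['john', 'paul'])."""
    surname, _, rest = entry.partition(",")
    return surname.strip().lower(), [p.strip(". ").lower() for p in rest.split()]

def _compatible(a, b):
    """'j' is compatible with 'john'; 'john' is not compatible with 'joseph'."""
    return a.startswith(b) or b.startswith(a)

def variants_match(query, stored):
    """one plausible rule for the 'others' search described above (our assumption)."""
    q_surname, q_given = _name_parts(query)
    s_surname, s_given = _name_parts(stored)
    if q_surname != s_surname:
        return False
    # compare forenames/initials pairwise; a missing forename matches anything
    return all(_compatible(q, s) for q, s in zip(q_given, s_given))

authors_on_file = ["jones, john paul", "jones, joseph p.", "jones, j. peter", "jones, j. p."]
print([a for a in authors_on_file if variants_match("jones, j. p.", a)])
# -> all four name forms, as in the article's first example
print([a for a in authors_on_file if variants_match("jones, john p.", a)])
# -> matches john paul, j. peter, and j. p., but not joseph p., as in the second example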
record changes. additions and corrections to the original record are made by first locating the record (by order number, author, or, eventually, title), adding to the data elements or writing over them (for corrections), and transmitting the information. examples of this procedure include 1) entering the date received, 2) recording the vendor invoice number, invoice date, and actual price, and 3) inserting or changing a flag word. in addition, after an item has been cataloged, the record is revised to include catalog data and to exclude extraneous order notes.

output

aside from the crt displays, output is in three forms: off-line tape, printed forms, and on-line files (figure 14). examples of output are library purchase orders, accounting reports, vendor data, and records of cataloged items. the number of potential reporting uses is limited only by money and imagination.

fig. 14. output from the on-line on-order file.

fig. 15. library purchase order form, with fields for order number, date, id number, author, title, publisher, vendor name and address, volumes, edition, estimated price, number of copies, vendor code, account, date of publication, flag word, gift or held order number, and bibcit.

the purchase order, shown in figure 15, is composed of four copies: 1) the vendor's copy, to be retained by him; 2) a vendor "report" copy; 3) the copy which is kept as a record in the osu library; and 4) a catalog work slip to be forwarded to the catalog department with the book. purchase orders are printed on the library's teletype, which is equipped with a sprocket feed. orders can also be printed on the line printer in the computer center; while this is a slightly cheaper data processing procedure, since no terminal costs are incurred, convenience and security have produced a victory of "economics over economies" (18), and the librarian's time has been considered in the total scheme. for gift items, purchase orders are produced as the cheapest means of preparing a catalog work slip. held purchase orders are produced and manually filed in purchase order number sequence, but when their status is changed to live, the old numbers are automatically replaced by purchase order numbers in the main series. these new numbers are written onto the purchase orders, along with any other changes, and the orders are mailed; the flag word live also activates accounting procedures. there are two sets of accounting reports. the first is generated when the purchase orders are issued and contains tabulated information for the library's bookkeeper, the head of business records in the acquisitions department, and the comptroller of the oregon state system of higher education. the second, a summary report, is issued after the book and invoice have been received and contains additional information pertinent to the invoicing procedure; this report has the same distribution as the first. periodic reports are planned for the library's subject divisions summarizing expenditures by account number, reference area, and subject; programming for this has not yet been done. a frequency count will be stored with each vendor code, and periodic listings will be printed for use in retaining vendors.
after an item has been cataloged, the catalog work slip and a slip equivalent to a main-entry catalog card are sent to acquisitions, and all remaining information and changes are recorded in the on-line record. this record is then transferred to a file from which it is dumped onto magnetic tape. this off-line file will be used for statistical analyses and will be the start of a machine-readable data base. future plans will, of course, depend on funding; however, two logical steps which could follow immediately and require no additional conversion are 1) additional computer-generated paper products (charge cards, catalog cards, book spine labels, new-book lists, etc.) and 2) a management information system using acquisition and cataloging data. the construction of a central serial record in machine-readable form would produce many valuable by-products. a program for the translation of the marc ii test tape has been written which causes these records to be printed on the computer center's line printer; and since a subscription to the marc tapes is now available to osu for test purposes, its advantages and compatibility with lolita will be investigated as time permits.

unsolved problems, aside from those which everyone working in a data processing environment faces (e.g., system and hardware breakdown, continued project funding, and lengthy delivery times for hardware), include 1) the widely varying system response times (commonly from a fraction of a second up to 60 seconds; usually 2-15 seconds); 2) the lack of personnel skilled in both data processing and library techniques; 3) the limited print train currently available on the line printer (62-character set); and 4) bureaucratic policy, which can render the most sophisticated plans for automation unfeasible if properly applied. it is recognized that all these problems can be solved by money, time, and priorities. meanwhile, the period of in-parallel operation will be valued as a time to educate, to test, to gather statistics, and to further refine the programs and procedures which comprise lolita.

evaluation

preliminary input samples indicate that a daily average of from 8 hours, 20 minutes, to 10 hours and 45 minutes will be necessary for input, searches, updating, and corrections using the crt. an additional 3 hours per day of terminal time using the teletype will be required to produce the purchase orders, answer rush search questions if the crt is busy, and activate the daily batch programs (accounting reports, etc.). the sad economic plight of most libraries causes librarians to cast an especially suspicious eye on the costs of automation; a few words on osu's data processing costs may be of interest. the cost of the total development effort to produce lolita is under $90,000 (though considerably less was actually expended), or an average annual cost of $30,000 over a three-year period. this compares favorably with average annual incomes of from $50,000 to over $300,000 in federal funds alone for other on-line library acquisition projects in universities (19, 20, 21, 22). a total of 6.75 man-years was required to design lolita. the 6.75 man-years comprises 2.5 years of programming, 3.25 years of systems analysis, coordination, and documentation, and 1.0 year of clerical work, and represents the efforts of four students and six professional workers.
this total does not include the time spent by acquisitions department personnel in reviewing lolita's abilities or in learning to use the terminals. current data processing rates charged by the computer center include the following: crt rental, $100/mo.; cpu time, $300/hr.; terminal time, $2.00/hr.; on-line storage, 15 cents per 2,040 characters per month. the teletype has been purchased, thus only local phone line charges are incurred. the on-line system is available for use from 7:30 a.m. to 11:00 p.m. each weekday, and from 7:30 a.m. to 5:00 p.m. on saturday, which more than covers the 8-5 schedule of the acquisitions department.

acknowledgments

the work on which this paper is based was supported by the administration, the computer center, and the library of oregon state university. special mention is due robert s. baker, systems analyst, osu library, and lawrence w. s. auld, head, technical services, osu library, for their extensive participation in the lolita project and for their many suggestions which benefitted the final version of this paper. hans weber, head, business records, osu library, also contributed much to lolita's design.

references

1. veaner, allen b.: project ballots: bibliographic automation of large library operations using a time-sharing system. progress report, march 27, 1969–june 26, 1969 (stanford, california: stanford university libraries, 29 july 1969), ed-030 777.

2. burgess, thomas k.; ames, l.: lola: library on-line acquisition subsystem (pullman, washington: washington state university, systems office, july 1968), pb-179 892.

3. payne, charles: "the university of chicago's book processing system." in stanford conference on collaborative library systems development: proceedings, stanford, california, october 4–5, 1968 (stanford, california: stanford university libraries, 1969), ed-031 281, 119–139.

4. pearson, karl m.: marc and the library service center: automation at bargain rates (santa monica, california: system development corporation, 12 september 1969), sp-3410.

5. nugent, william r.: "nelinet: the new england library information network." in congress of the international federation for information processing (ifip), 4th: proceedings, edinburgh, august 5–10, 1968 (amsterdam: north holland publishing co., 1968), g28–g32.

6. blair, john r.; snyder, ruby: "an automated library system: project leeds," american libraries, 1 (february 1970), 172–173.

7. warheit, i. a.: "design of library systems for implementation with interactive computers," journal of library automation, 3 (march 1970), 68–72.

8. overmyer, lavahn: library automation: a critical review (cleveland, ohio: case western reserve university, school of library science, december 1969), ed-034 107.

9. cunningham, jay l.; schieber, william d.; shoffner, ralph m.: a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley: university of california, institute of library research, march 1969), ed-029 679, pp. 13–14.

10. meeker, james w.; crandall, n. ronald; dayton, fred a.; rose, g.: "os-3: the oregon state open shop operating system." in american federation of information processing societies: proceedings of the 1969 spring joint computer conference, boston, mass., may 14–16, 1969 (montvale, new jersey: afips press, 1969), 241–248.
11. spigai, frances; taylor, mary: a pilot: an on-line library acquisition system (corvallis, oregon: oregon state university, computer center, january 1968), cc-68-40, ed-024 410.

12. university of chicago library: development of an integrated, computer-based, bibliographical data system for a large university library (chicago, illinois: university of chicago library, 1968), pb-179 426.

13. lefkovitz, david: file structures for on-line systems (new york: spartan books, 1969), pp. 98–104.

14. ames, james lawrence: an algorithm for title searching in a computer based file (pullman, washington: washington state university library, systems division, 1968).

15. kilgour, frederick g.: "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science, 5 (new york: greenwood publishing corp., 1968), 133–136.

16. ruecking, frederick h., jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227–238.

17. parker, edwin b.: spires (stanford physical information retrieval system), 1967 annual report (stanford, california: stanford university, institute for communication research, december 1967), 33–39.

18. kilgour, frederick g.: "effect of computerization on acquisitions," program, 3 (november 1969), 100–101.

19. "university library systems development projects undertaken at columbia, chicago and stanford with funds from national science foundation and office of education," scientific information notes, 10 (april–may 1968), 1–2.

20. "grants and contracts," scientific information notes, 10 (october–december 1968), 14.

21. "university of chicago to set up total integrated library system utilizing computer-based data-handling processes," scientific information notes, 9 (june–july 1967), 1.

22. "washington state university to make preliminary library systems study," scientific information notes, 9 (april–may 1967), 6.

an automated music programmer (musprog)

david f. harrison, music director, wsui-ksui, and randolph j. herber, applications programmer, university computer center, the university of iowa, iowa city, iowa.

a system to compile programs of recorded music for broadcast by the university of iowa's radio stations. the system also provides a permanent catalog of all recorded music holdings and accurate inventory control. the program, which operates on an ibm 360/65, is available in fortran iv, cobol, and pl/1, with assembly language subroutines and external maintenance programs.

the state university of iowa (iowa city) owns and operates two broadcasting stations: wsui, at 910 kc, and ksui, at 91.7 mc. wsui was the first educational radio station in operation west of the mississippi and ranks among the oldest stations in the country; ksui was among the earliest of the frequency modulation outlets in the area to offer programming in multiplex stereo. in the spring of 1967, when it became necessary to completely reorganize the stations' recorded music libraries, an investigation was simultaneously under way to determine the feasibility of using automated data processing (a.d.p.) techniques in the discographic operations of the stations. at the time there were several working bibliographic applications (1), ranging from relatively simple record-keeping (where is . . . ?) to more ambitious cross-referencing and indexing operations, one of which uses the kwic (keyword-in-context) computer program to classify musical recordings (1).
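kwic indexing, mentioned above, rotates each title so that every significant keyword in turn leads a line with its surrounding context. the following short python sketch is a generic illustration of the technique, not the cited classification program, and the stop-word list and column width are arbitrary choices of ours.

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}   # arbitrary example list

def kwic(titles, width=30):
    """produce (keyword, rotated context) lines, one per significant word of each title."""
    lines = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            # rotate the title so the keyword leads, keeping the rest as trailing context
            rotated = " ".join(words[i:] + words[:i])
            lines.append((word.lower(), rotated.ljust(width)))
    return sorted(lines)                      # alphabetical by keyword

for key, context in kwic(["concerto for orchestra", "the art of the fugue"]):
    print(f"{key:12} {context}")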
on the basis of an awareness of these applications, and a belief that the intrinsic principles could be utilized and extended to cover somewhat different needs, it was proposed that the facilities of the university computer center be employed in the selection and updating of recorded music programs. in designing a coded set of instructions to perform these tasks, it was deemed necessary that any attempt at the selection or compilation of a series of music programs should be made in accordance with certain criteria supplied to the system by the user, and that these selection specification parameters should closely parallel those which would be employed were such an extraction from the total libraries to be performed manually. additional requisites were that provision be made for updating and enlarging the master file as new items were acquired, and that the coding of the programmed instructions should be sufficiently flexible to permit inclusion of supplemental criteria as they became desirable.

the above proposal met with a certain degree of opposition, the main bone of contention being that such an application would necessarily "dehumanize" music programming. there have been, and will continue to be, similar objections raised by those who are unaware of the advantages offered by a.d.p. and concomitantly unaware of the mental processes which result in what is commonly referred to as "artistic judgment." it is not the purpose of this article to attempt an exhaustive analysis of such processes, nor to castigate the objectors; it is rather simply to bring forth several basic observations dealing with the problem under discussion. a contemporary composer-theorist interested in the application of a.d.p. techniques to the process of musical composition has observed that no paradoxical "almighty force" exists in science, which, in actual fact, progresses by discrete steps that are at once limited but unpredictable (3, 4). the following conclusions, although relating specifically to the problems of machine-"created" music, find no less an application to the current problem. creative human thought is an aggregate of restrictions or choices in all fields of human activity, including the arts. certain aspects of these judgments can be mechanized and simulated by physical mechanisms currently extant, including the computer. the rapidity of calculation or decision by computer frees human beings from the long and arduous task of manually selecting, compiling, and checking programmed works; the time thus saved can be better spent on such amenities as scripting, with complete performance information and record data, and the always-too-necessary pronouncing aids. moreover, the computer program can be "exported" to any place similarly equipped, to be used by other individuals or altered by other programmers to meet their specific needs.

the automated music programmer (musprog) was conceived as a series of steps, the first of which specifies that complete music programs are to be selected in accordance with a table of specifications introduced as data, each card containing information pertinent to a discrete program.
the second step requires that each and every entry in the catalog be checked for availability to any program in the tables established in the preceding step, this status to be determined on the basis of a satisfactory comparison with the individual criteria supplied on the selection specification card. among these are "tests" (note that a failure to meet the requirements in any step disqualifies the item) to determine when the item was last selected, as well as the number of times selected; a check for allowable time length; a check for duplication of composer and/or title; a requirement that stereo recordings be used only for fm; a check for acceptable period, style, and type of composition; and the decision to update the master file. in the final operation of the program, each duplicate title of a work selected is also updated, simulating selection to prevent its selection during the next month. if each duplicate were given the date factor of the item actually selected, the latter would tend to appear much more frequently than its companions, because the program would continue to select the longest available item, and it is reasonably safe to assume that the selected item is the longest version of the title in question. it was necessary, therefore, to devise a means by which each version of a given work (indicated by both title and composer) be given equal weight for "fair" selection. a unidimensional array called item was constructed with ten positions as follows: item(10) / '0', '0', '0', '0', '1', '1', '1', '2', '2', '3' /. the index of the array was then selected by referencing a routine which generates random, positive integers in the range one through ten. the contents of that position in item are added to the date factor of the record selected, and the result is placed in the corresponding field of the duplicate title under scrutiny. thus there exists a 40% probability that the duplicate will have the same "weight" as the selected item, a 30% chance that the duplicate will be "pushed back" one month, 20% for two months, and a 10% probability that the date factor of the duplicate title will be increased by three months. when all the titles have been thus read or updated, the run concludes. figure 1 is the flowchart of the basic design of musprog, from which the computer program was coded. the program runs on an ibm 360/65 and is available in fortran iv, cobol, and pl/1 with assembly language subroutines and external maintenance programs. copies of these programs may be obtained from the national auxiliary publications service (naps #00278).

the machine-readable catalog system currently employed by the university's radio stations is, on the whole, independent of a record's origin or manufacture (the catalog number could be considered as nothing more than an indication of a discrete shelf space). the system was designed to make maximally efficient use of the 80 columns available on a punch card. by utilizing two alphabetic and two decimal characters, ranging from a00 through zz99, provision is made for identification of records and tapes in quantities somewhat in excess of seventy thousand individual discs or reels; the total of actual single titles possible to catalog in this manner is at least twice that number.
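the duplicate-title weighting described earlier in this section is easy to restate in code. the sketch below is a python paraphrase of ours, not the fortran/cobol/pl/1 source distributed through naps; it reproduces the 40/30/20/10 percent probabilities by drawing a random index into the same ten-position item array.

import random
from collections import Counter

# the ten-position item array from the article: four 0s, three 1s, two 2s, one 3
ITEM = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]

def push_back_duplicate(selected_date_factor):
    """return the date factor to store on a duplicate title of the work just selected.

    40% of the time the duplicate keeps the selected item's weight (offset 0),
    30% it is pushed back one month, 20% two months, and 10% three months.
    """
    index = random.randint(1, 10)          # random positive integer in the range 1..10
    return selected_date_factor + ITEM[index - 1]

# quick check of the probabilities over many trials
offsets = Counter(push_back_duplicate(0) for _ in range(100_000))
print({offset: round(count / 100_000, 2) for offset, count in sorted(offsets.items())})
# expected roughly {0: 0.40, 1: 0.30, 2: 0.20, 3: 0.10}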
the total of actual single titles possible to catalog in this manner is at least twice that number. the card catalog is made up along more or less standard, triple-reference lines on the familiar 3x5-inch card. these cards remain in the master card file, but are actually used only for reference purposes, rather than for actual selection. the "real" master library exists in the form of punched cards (later transferred to magnetic tape). each card image contains the following information, with blank columns separating contiguous fields:
columns 1-10 composer, or first ten characters if abbreviation is necessary.
columns 12-27 title, abbreviations standardized.
columns 29-33 duration of work in seconds.
columns 35-37 period of composition.
columns 39-40 type of composition.
columns 42-45 catalog number.
columns 47-57 physical location of item on cataloged disc or tape.
columns 59-64 date fields, used for updating and usage factors.
columns 66-69 seasonal key, a blank indicating general usefulness.
columns 71-80 field used by musprog for internal record-keeping.
operation
selection of music by the system is performed in accordance with a table of program specifications which includes information pertinent to the length of the desired program and the maximum permissible length of any single work within it, the type of music desired, and additional information such as the date, time and title of the program to be aired and an indication of the station for which the program is to be selected. all the selections for ksui (fm) are required to be stereophonic. classification into stereophonic and monophonic groups is a function of the catalog number, a00 through z99 being stereophonic and aa00 through zz99 being monophonic. a program selection card contains the following data:
column 1 station code: w for wsui, blank for ksui.
columns 2-6 duration of program in seconds.
columns 7-11 maximum duration of each item to be selected (0 or blank indicates the program may consist of but a single work equivalent in length to the program duration).
column 12 number of types being specified.
columns 13-27 three three-plus-two character fields to specify period and type (modia equals "twentieth-century, orchestra"). if any field is blank, musprog assumes anything acceptable.
columns 28-79 title of program to be selected, day and time.
as an example, the following specifications were made for a program called "aubade" which was aired at 10:00 a.m. on tuesday, july 30, by wsui. program duration was to be 3400 seconds (56:40), allowing 3:20 for continuity. maximum length of any single work within the program was to be 900 seconds (15:00). music could be chosen from the contemporary orchestral repertoire, any instrumental work from the classic period, or any type "3" work, i.e., soloist and piano, or chorus a cappella. figure 2 shows a printout of selections for two programs.
fig. 2. printout of selections (evening concert programs for july 25 and july 30, listing the works selected with catalog numbers, locations, and cumulative timings).
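the fixed-column card image described above maps naturally onto a small parser. the following python sketch is illustrative only: the column boundaries follow the layout given in the text (converted to zero-based string slices), while the sample card content and the helper names are made up for the example.

```python
# column layout of the master-file card image (1-based columns from the text,
# here converted to python slices on an 80-character card string).
FIELDS = {
    "composer":  (0, 10),   # columns 1-10
    "title":     (11, 27),  # columns 12-27
    "duration":  (28, 33),  # columns 29-33, seconds
    "period":    (34, 37),  # columns 35-37
    "type":      (38, 40),  # columns 39-40
    "catalog":   (41, 45),  # columns 42-45
    "location":  (46, 57),  # columns 47-57
    "dates":     (58, 64),  # columns 59-64, updating and usage factors
    "seasonal":  (65, 69),  # columns 66-69, blank = generally useful
    "internal":  (70, 80),  # columns 71-80, musprog record keeping
}

def parse_card(card: str) -> dict:
    """split one 80-column card image into named fields."""
    card = card.ljust(80)
    record = {name: card[start:end].strip() for name, (start, end) in FIELDS.items()}
    record["duration"] = int(record["duration"] or 0)
    return record

def make_card(values: dict) -> str:
    """inverse helper for the example: place field values into their columns."""
    card = [" "] * 80
    for name, (start, end) in FIELDS.items():
        text = str(values.get(name, ""))[: end - start]
        card[start:start + len(text)] = list(text)
    return "".join(card)

# hypothetical card image; the period and type codes are invented for the demo
sample = make_card({"composer": "beethoven", "title": "symphony 7",
                    "duration": "2280", "period": "cla", "type": "1a",
                    "catalog": "ab17", "location": "s1-2/e"})
print(parse_card(sample))
```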
an additional feature of musprog is provision for a periodic summary of library usage, affording the librarian a concise account of frequently played items, as well as an indication of those works which have been selected infrequently or ignored altogether. this report allows the programmer to assess more accurately the maximum number of times a selection may be programmed before it is declared unacceptable. the system also puts out printed lists of works extracted from the library in accordance with a user-specified table of reference fields: e.g., all symphonies, all works by bach, all works of under ten minutes in length, all christmas music; or conceivably, any symphonies by bach which are suitable for christmas and less than ten minutes long. this latter step could also include, with minor alterations in the computer program, provision for performances by one specific ensemble or artist only. an external program allows adding items to the master tape, deleting those no longer needed, and correcting any of the various fields within individual records; thus if mis-timings or other inaccuracies are noted, it becomes a relatively simple matter to correct them.
discussion
it can readily be seen that "the machine" neither possesses nor displays "taste" in any conventional sense of that word, since it can select only those types of music which the programmer has declared acceptable. it does not, indeed cannot, show any predilection toward certain types of music to the detriment or exclusion of others, save those which have been removed from the list of potential selections by the programmer. it performs no independent judgments. without doubt, then, there is no logical basis for the cry of "dehumanization," since the program was originally designed by human minds and is, at each step of the process of selection, governed by the human-designed control parameters and program specifications; therefore it cannot select music willy-nilly, but must be told what to do and how to do it. it also has been found that specifications cannot be "plugged in" at random, for the programs thus selected would prove little more than a conglomerate of sundry works bearing no relation to one another. organization and logic must be designed into each program if any coherent programming is to result. the machine does not "know" what to do unless told. it should be brought out that, because of a built-in logic and the order of titles on the master file, the program will tend to select the longest works available to fill the specified program time, making up the difference, if any, with progressively shorter pieces until the time is filled, or until no work of acceptable type and sufficient brevity can be located.
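the longest-first behavior described in the preceding sentence can be made concrete with a small sketch. this is not the published fortran; it is a simplified python illustration in which the catalog is a list of (title, duration) pairs and the selection criteria are reduced to a maximum item length - the real program also applies the period/type, stereo, duplication, and usage tests described earlier.

```python
def fill_program(catalog, program_seconds, max_item_seconds=None):
    """greedy fill: repeatedly take the longest acceptable work that still
    fits in the remaining time, until the program is full or nothing fits."""
    remaining = program_seconds
    chosen = []
    # work through the catalog from longest to shortest duration
    for title, duration in sorted(catalog, key=lambda rec: rec[1], reverse=True):
        if max_item_seconds and duration > max_item_seconds:
            continue                      # fails the allowable-length test
        if duration <= remaining:
            chosen.append((title, duration))
            remaining -= duration
        if remaining == 0:
            break
    return chosen, remaining              # remaining > 0 means unused time

# hypothetical catalog entries and a 3400-second program specification
catalog = [("symphony", 2400), ("tone poem", 1100), ("overture", 600),
           ("song cycle", 900), ("prelude", 300)]
programme, unused = fill_program(catalog, program_seconds=3400, max_item_seconds=2500)
print(programme, "unused:", unused)
```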
since the longer works tend to occur among certain types and/or styles of music, there may be some tenuous grounds for a suspicion of bias. it will be observed that musprog does not include information pertinent to performer, conductor, etc. one of the several reasons for this apparent oversight is that such information would, at the outset, have required the use of one to four additional data cards per title. since this information was not deemed absolutely essential to the immediate functions of the program, it was decided to postpone inclusion of such a refinement to some future date.
conclusion
musprog has been utilized by the state university of iowa since march, 1968, and has resulted in considerable time-saving. for example, the july, 1968, programming required one hundred and two programs varying in length from thirty minutes to somewhat over four hours, consisting of a variety of musical styles and representing a diversity of programming difficult to achieve efficiently by ordinary means. in three minutes and twelve seconds, musprog selected the programs, updated the catalog, checked for duplication of selections, timed each program, and printed out the resultant copy properly headed. at an approximate cost of $250.00 per hour of computer time, this comes to less than fifteen dollars per month to perform tasks which might normally require two persons, at perhaps two or three dollars per hour, to work an entire week or more. it is doubtful that even then each catalog entry could be examined and an accurate record of usage kept.
acknowledgments
a staff research grant from the graduate college, university of iowa, partially supported development and operation of this system. dean duane c. spriestersbach of the graduate college, professor gerard p. weeg, chairman of the department of computer science, and program supervisor robert e. irwin gave generous support and encouragement to the development of musprog.
references
1. wilhoit, g. cleveland: "computerized indexing for broadcast music libraries," journal of broadcasting, 11 (fall, 1967), 325-337.
2. brook, barry s.: "rilm, repertoire internationale de la litterature musicale," notes: the quarterly journal of the music library association, 23 (march, 1967), 462-467.
3. xenakis, iannis: "in search of a stochastic music," gravesano review, 11 (1958).
information technology and libraries | september 2006
usability testing of a large, multidisciplinary library database: basic search and visual search
jody condit fagan (faganjc@jmu.edu) is digital services librarian at carrier library, james madison university, harrisonburg, virginia.
visual search interfaces have been shown by researchers to assist users with information search and retrieval. recently, several major library vendors have added visual search interfaces or functions to their products. for public service librarians, perhaps the most critical area of interest is the extent to which visual search interfaces and text-based search interfaces support research. this study presents the results of eight full-scale usability tests of both the ebscohost basic search and visual search in the context of a large liberal arts university. like the web, online library research database interfaces continue to evolve. even with the smaller scope of library research databases, users can still suffer from information overload and may have difficulty in processing large results sets.
web search-engine research has shown that the number of searchers viewing only the first results page has increased from 29 percent in 1997 to 73 percent in 2002 for united states-based web searchengines users.1 additionally, the mean number of results viewed per query in 2001 was 2.5 documents.2 this may indicate either increasing relevance in search results or an increase in simplistic web interactions. visual alternatives to search interfaces attempt to address some of the problems of information retrieval within large document sets. while research and development of visual search interfaces began well before the advent of the web, current research into visual web interfaces has continued to expand.3 within librarianship, the most visual interface research seems to focus on those that could be applied to large-scale digital library projects.4 although library products often have more metadata and organizational structure than the web, search engine-style interfaces adapted for field searching and boolean operators are still the most frequent approach to information retrieval.5 yet research has shown that visual interfaces to digital libraries offer great benefit to the user. zaphiris emphasizes the advantage of shifting the user’s mental load “from slow reading to faster perceptual processes such as visual pattern recognition.”6 according to borner and chen, visual interfaces can help users better understand search results and the interrelation of documents within the result set, and refine their search.7 in their discussion of the function of “overviews” in visual interfaces, greene and his colleagues say that overviews can help users make better decisions about potential relevance, and “extract gist more accurately and rapidly than traditional hit lists provided by search engines.”8 several library database vendors are implementing visual interfaces to navigate and display search results. serials solutions’ new federated search product, centralsearch, uses technology from vivisimo that “organizes search results into titled folders to build a clear, concise picture for its users.”9 ulrich’s fiction connection web site has used aquabrowser to help one “discover titles similar to books you already enjoy.”10 the queens library has also implemented aquabrowser to provide a graphical interface to its entire library’s collections.11 xreferplus maps search results to topics by making visual connections between terms.12 comabstracts, from cios, uses a similar concept map, although one cannot launch a search directly from the tool. groxis chose a circular style for its concept-mapping software, grokker. partnerships between groxis and stanford university began as early as 2004, and grokker is now being implemented at stanford university libraries academic and information resources.13 ebsco and groxis announced their partnership in march 2006.14 the ebscohost interface now features a visual search tab as an option that librarians can choose to leave on (by default) or turn off in ebsco’s administrator module. figure 1 shows a screenshot of the visual search interface. within the context of library research databases, visual searching likely provides a needed alternative from traditional, text-based searching. to test this hypothesis, james madison university libraries (jmu libraries) decided to conduct eight usability sessions with ebscohost’s new visual search, in coordination with ebsco and groxis. 
while this is by no means the first published usability test of vendor interfaces, the literature understandably reveals a far greater number of usability tests on in-house projects such as library web sites and customized catalog interfaces than on library database interfaces.15 it is hoped that by observing users try both the ebsco basic search and visual search, more understanding will be gained about user search behavior and the potential benefits of a visual approach.
■ method
the usability sessions were conducted at jmu, a large liberal arts university whose student population is mostly drawn from virginia and the northeastern region. only 10 percent of the students are from minority groups. jmu requires that all freshmen pass the online information skills seeking test (isst) before becoming a sophomore, and the libraries developed a web tutorial, "go for the gold," to prepare students for the isst. therefore, usability-test participants were largely white, from the northeastern united states, and had exposure to basic information literacy instruction. jmu libraries' usability lab is a small conference room with one computer workstation equipped with morae software.16 audio and video recordings of user speech and facial expressions, along with "detailed application and computer system data," are captured by the software and combined into a searchable recording session for the usability tester to review. a screenshot of the morae analysis tool is shown in figure 2. the usability test script was developed in collaboration with representatives of ebsco and groxis. ebsco provided access to the beta version of visual search for the test, and groxis provided financial incentives for student participants. the test sessions and the results analysis, however, were conducted solely by the researcher and librarian facilitators. the visual search development team was provided with the results and video clips after analysis. usability study participants were recruited by posting an announcement to the jmu students' web portal. a $25 gift certificate was offered as an incentive, and more than 140 students submitted a participation interest form. these were sorted by the number of years the students had been at jmu to try to get as many novice users as possible. because so much of today's student work is conducted in groups, four groups of two, as well as four individual sessions, were scheduled, for a total of twelve students. jmu librarians who had received both human-subjects training and an introduction to facilitation served as facilitators for the usability sessions. their role was to watch the time and ask open-ended questions to keep the student participants talking about what they were doing. the major research question it was hoped the tests would answer was, "to what extent does ebsco's basic search interface and visual search interface support student research?" since the tests could not evaluate the entire research process, it was decided to focus on the development of the research topic.
specifically, the goal was to find out how well each interface supported the intellectual process of the students in coming up with a topic, narrowing their topic, and performing searches on their chosen subtopics. an additional goal was to determine how well users were able to find and use the interface widgets and how satisfied the students felt after using the interfaces. the overall session was structured in this order: a pretest survey about the students’ research experience; a series of four tasks performed with ebscohost’s basic search; a series of three tasks performed with ebscohost’s visual search; and a posttest interview. both basic and visual search interfaces were used with academic search premier. each of the eight sessions was recorded in entirety by the morae software, and each recording was viewed in entirety. to try to gain some quantitative data, the researcher measured the time it took to complete each task. however, due to variables such as facilitator involvement and interaction between group members, the numbers did not lend themselves to comparison. also, it would not have been clear whether greater numbers indicated a positive or negative sign. taking longer to come up with subtopics, for example, could as easily be a sign of exploration and interested inquiry as it might be of frustration or failure. as such, the data are mostly qualitative in nature. figure 1. screenshot of ebscohost’s visual search figure 2. screenshot of morae recorder analysis tool 142 information technology and libraries | september 2006 ฀ results the student participants were generally underclassmen. two of the students, group 2, were in their third year at jmu. all others were in their first or second year. while students were drawn from a wide variety of majors, it is regrettable that there was not stronger representation from the humanities. when asked, “what do you normally use to do research?” six students answered an unqualified “google.” three other students mentioned internet search engines in their response. only two students gave the brand or product names of library research databases: one said, “pubmed, wilsonomnifile, and ebsco,” while the other, a counseling major, mentioned psycinfo and cinahl. when shown a screenshot of basic search, half of the students said they had used an ebsco database before. all of the participants said they had never before used a visual search interface. the full results from the individual pretest interviews are shown in figures 3 and 4. to begin the usability test, the facilitator started internet explorer and loaded the ebscohost basic search, which was set to have a single input box. the scripts for each task are listed in figure 5. note that task 4 was only featured in the basic search portion of the test. for task 1 on the basic search—coming up with a general topic—all of the participants began by using their own topics rather than choosing from the list of ideas. also, although they were asked to “spend some time on ebsco to come up with a possible general topic,” all but group 6 fulfilled this by simply thinking of a topic (sometimes after some discussion within the groups of two) and typing it in. with the exception of group 6, the size of the result set did not inspire topic changes. figure 6 summarizes the students’ searches and relative success on task 1. in retrospect, the tests might have yielded more straightforward findings if the students had been directed to choose from the provided list of topics, or even to use the same topic. 
however, part of the intention was to determine whether either interface was helpful in guiding the students’ topic development. it was hoped that by defining the scenario as writing a paper for class, their topic selection would reflect the realities of student research. however, it probably would have been better to have used the same topic for each session. task 2 asked participants to identify three subtopics, and task 3 asked them to refine their search to one subtopic and limit it to the past two years. a summary of these tasks appears in figure 7. a surprising finding during task 2 was that students did go past the first page of results. four groups went past the first page of results, while two groups did not get enough results for more than one page. the other two groups did not choose to look past the first page of results. this contrasts with jansen and spink’s findings, figure 3. results from pretest interview, groups 1–4 figure 4. results from pretest interview, groups 5–8 usability testing of a large, multidisciplinary library database | fagan 143 in which 73 percent of web searchers only view the first results page.17 another pleasant surprise was that students spent some time actually reading through results when they were searching for ways to narrow their topic. five groups scanned through both titles and abstracts, which requires clicking on the article titles to display the citation view. one of these five additionally chose to open full-text articles and look at the references to determine relevance. two groups scanned through the results pages only, but looked at both article titles and the subjects in the left-hand column. group 5 seemed to only scan the titles in the results list. this user behavior is also quite different than that found with web search-engine users. in one recent study by jansen and spink, more than 90 percent of the time, search-engine users viewed five or fewer documents per query.18 the five groups that chose to view the citation/abstract view by clicking on the title (groups 1, 2, 3, 4, and 6) identified subtopics that were significantly more interesting and plausible than the general topic they had come up with. from looking at their results, these groups were clearly identifying their subtopics from reading the abstracts and titles rather than just brainstorming. although group 2 had the weakest subtopics, going from the world baseball classic to specific players’ relationships to the classic and the home-run derby, they were working with a results set of but eleven items. the three groups that relied on scanning only the results list succeeded to an extent, but as a whole, the new subtopics would be much less satisfying to the scenario’s hypothetical professor. after scanning the titles on two pages of results, group 5 (an individual) ended up brainstorming her subtopics (prevention, intervention, and what an eating disorder looks like) based on her knowledge of the topic rather than drawing from the results. group 7 (a group of two) identified their subtopic (sand dunes) from the lefthand column on the results list. group 8 (an individual) picked up his subtopics (steroids in sports, president bush’s stance on steroids, and softball) from reading keywords in the article titles on the first page of results. since the subjects in the left-hand column were a new addition to basic search, the use of this area was also noted. four groups used the subjects in the left-hand column without prompting. 
two groups saw the subjects (i.e., ran the mouse over them) but did not use them. the remaining two groups made no action related to the subjects. a worrisome finding of tasks 2 and 3 was that most students had trouble with the default search being set to phrase-searching rather than to a boolean and. this can easily be seen in looking at the number of results the students came up with when they tried to refine their topics (figure 7). even though most students had some limiter still in effect (full text, last two years) when they first tried their new refined search, it was the phrasesearching that really hurt them. luckily, this figure 6. task 1, coming up with a general topic using basic search figure 5. tasks posed for each portion of the usability test. 144 information technology and libraries | september 2006 is a customizable setting in ebsco’s administrator module, and it is recommended that libraries enable the “proximity” expander to be set “on” by default, which will automatically combine search terms with and. task 4, finding a “recent article in the economist about the october earthquake in kashmir,” was designed to test the usability of the ebscohost publication search and limiter. it was listed as optional in case the facilitator was worried that time was an issue. four of the student groups—1, 2, 5, and 7—were posed the task. of these four groups, three relied entirely on the publication limiter on the refine search panel. group 1 chose to use the publication search. all four groups quickly and successfully completed this task. ฀ ฀additional questions during basic search tasks at various points during the three tasks in ebsco’s basic search, the students were asked to limit their results set to only full-text results, to find one peer-reviewed article, and to limit their search to the past two years. seven out of the eight student groups had no problem finding and using the ebscohost “refine search” panel, including the full-text check box, date limiter, and peerreviewed limiter. group 7 did not find the refine search panel or use its limiters until specifically guided by the facilitator near the end. this group had found other ways to apply limits: they used the “books/monographs” tab on the results list to limit to full text, and the results-list sorting function to limit to the past two years. after having seen the refine search panel, group 7 did use the “peer reviewed” check box to find their peer-reviewed article. toward the end of the basic search portion, students were asked to “save three of their results for later.” three groups demonstrated full use of the folder. an additional three groups started to use the folder and viewed the folder but did not use print, save, or e-mail. it is unclear whether they knew how to do so and just did not follow through, or whether they thought they had safely stored the items. two students did not use the folder at all, acting individually on items. one group used the “save” function but did not save each article. ฀ visual search similar to task 1, when using the basic search, students did not discover general topics by using the interface, but simply typed in a topic of interest. only two groups, 1 and 8, chose to try the same topic again. in the interests of processing time, visual search limits the search to the first 250 results retrieved. since jmu has set the default sort results to display in chronological order, the most recent 250 results were returned during these usability tests. 
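to make concrete the phrase-searching problem noted a few paragraphs back, the sketch below contrasts the two query interpretations. it is a generic python illustration, not ebsco's actual matching logic: under phrase semantics the terms must appear contiguously and in order, while the boolean-and default recommended above only requires every term to appear somewhere in the record.

```python
def matches_phrase(query: str, text: str) -> bool:
    """phrase semantics: the query terms must appear as one contiguous run.
    (substring matching on normalized whitespace is a deliberate simplification.)"""
    return " ".join(query.lower().split()) in " ".join(text.lower().split())

def matches_all_terms(query: str, text: str) -> bool:
    """boolean-and semantics: every query term must appear somewhere."""
    words = text.lower().split()
    return all(term in words for term in query.lower().split())

# hypothetical record title echoing one of the student topics above
record = "prevention programs for college students with eating disorders"
query = "eating disorders prevention"

print(matches_phrase(query, record))     # False - terms are not contiguous
print(matches_all_terms(query, record))  # True - all terms are present
```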
figure 8 shows the students’ original search terms using visual search, the actions they took while looking for subtopics, and the subtopics they identified. additionally, if the subtopics they identified matched words on the screen, the location of those words is noted. three of the groups (1, 2, and 5) identified subtopics when looking at the labels on topic and subtopic circles. group 3 identified subtopics while looking at article titles as well as the subtopic circles. the members of group 6 identified subtopics while looking at the citation view and reading the abstract and full text, as well as rolling over article titles with their mice. it was not entirely clear where the student in group 4 got his subtopics from. two of the three subtopics did not seem to figure 7. basic search, task 2 and 3, coming up with subtopics. usability testing of a large, multidisciplinary library database | fagan 145 be represented in the display of the results set. his third subtopic was one of the labels from a subtopic circle. groups 7 and 8 both struggled with finding their subtopics. group 7 simply had a narrow topic (“jackalope”), and group 8 misspelled “steroids” and got few results for that reason. lacking many clusters, both groups tried typing additional terms into the title keyword box on the filter panel, resulting in fewer or zero results. for task 3, students were asked to limit their search to the last two years and to refine their search to a chosen subtopic (figure 9). particularly because the results set is limited to 250, it would have been better to have separated these two tasks: first to have them limit the content, then perhaps the date of the search. three groups, all groups of two, used the date limit first (2, 6, and 8). three groups (1, 3, and 6) narrowed the content of their search by typing a new search or additional keywords into the main search box. groups 2 and 4 narrowed the content of their search by clicking on the subtopic circles. note that this does not change the count of the number of results displayed in the filter panel. groups 5 and 7 tried typing keywords into the title keyword filter panel and also clicking on circles. both groups fared better with the latter approach. group 8 typed an additional keyword into the filter panel box to narrow his search. while five of the groups announced the subtopic to which they wanted to narrow their search before beginning to narrow their topic, groups 2, 7, and 8 began to interact with the interface and experiment with subtopics before choosing one. while groups 2 and 8 arrived at a subtopic and identified it, group 7 tried many experiments, but since their original topic (jackalope) was already narrow, they were not ultimately successful in identifying or searching on a subtopic. as with basic search, students were asked to save three articles for later. five of the groups (2, 4, 5, 6, and 8) used the “add to folder” function which appears in the citation view on the right-hand side of the screen. of these, three groups proceeded to “folder has items.” of these groups, two chose the “save” function. two groups used either “save” or “e-mail” to preserve individual items, rather than using the folder. one group experienced system slowness and was not able to load the full-record view in time to determine whether they would be able to save items for later. a concern that students may not realize is that in folder view or individually, the “save” button really just formats the records. 
the user must still use a browser function to save the formatted page. no student performed this function. figure 8. visual search, task 1 and 2, coming up with a general topic figure 9. visual search, task 3, searching on subtopic (before date limit, if possible) 146 information technology and libraries | september 2006 several students had some trouble with the mechanics of the filter panel, shown in figure 10. seven of the eight groups found and used the filter panel, originally hidden from view, without assistance. however, some users were not sure how the title keyword box related to the main search box. at least two groups typed the same search string into the title keyword box that they had already entered into the main search box. also, users were not sure whether they needed to click the search button after using the date limiter. however, in no case was a student unable to quickly recover from these areas of confusion. ฀ results of posttest interview at the end of the entire usability session, participants were asked several questions while looking at screenshots of each interface. a full list of posttest interview questions can be found in figure 11. when speaking about the strengths of basic search, seven of eight groups talked about the search options, such as field searching and limiters. the individual in group 1 mentioned “the ability to search in fields, especially for publications and within publications.” one of the students in group 3 mentioned that “i thought it was easier to specify the search for the full text and the peer reviewed—it had a separate page for that.” the student in group 4 added, “they give you all the filter options as opposed to the other one.” five of the eight groups also mentioned familiarity with the type of interface as a strength of basic search. since jmu has only had access to ebsco databases for less than a year, and half of the students admitted they had not used ebsco, it seemed their comments were with the style of interface more than their experience with the interface. the student in group 1 commented, “seems like the standard search engine.” group 2 noted, “it was organized in a way that we’re used to more,” and group 3 said, “it’s more traditional so it’s more similar to other programs.” half of the groups mentioned that basic search was clear or organized. group 6 explained, “it was nice how it was really clearly set out . . . like, everything’s in a line.” not surprisingly, visual search’s strengths surrounded the grouping of subtopics: seven of eight groups made some comment about this. the student in group 4 said, “it groups the articles for you better. it kinda like gives you the subtopics when you get into it and search it and that’s pretty cool.” the student in group 8 stated, “you can look and see an outline of where you want to go . . . it’s easy to pinpoint it on screen like that’s where i want to go with my research.” some of the other strengths mentioned about visual search were: showing a lot of information on one screen without scrolling (group 7) and the colorful nature of the interface. 
a student in group 2 added, “i like the circles and squares—the symbols register easily.” the only three weaknesses listed for basic search in response to the first question were: “not having a spot to put in words not to search for” (group 1); that, like internet search engines, basic search should have “a clip from the article that has the keyword in it, the line before and the line after” (group 6); and that basic search might be too broad, because “unless you narrow it, [you have to] type in keywords to narrow it down yourself” (group 7). figure 10. visual search filter panel figure 11. posttest interview questions usability testing of a large, multidisciplinary library database | fagan 147 with regard to weaknesses of visual search, half of the groups had some confusion about the content, partially due to the limited number of results. a student from group 7 declared, “it may not have as many results. . . . if you typed in ‘school’ on the other one, it might have . . . 8,000 pages [but] on this you have . . . 50 results.” the student in group 5 agreed, saying that with visual search, “they only show you a certain number of articles.” the student in group 1 said, “it’s kind of confusing when it breaks it up into the topics for you. it may be helpful for some other people, but for the way my mind works i like just having all my results displayed out like on the regular one.” half of the groups also made some comment that they were just not used to it. six of the groups were asked which one they would choose if they had class in one hour. (it is not clear why the facilitator did not ask this question of groups 3 and 8.) four groups (1, 2, 5, and 7) indicated basic search. one student in group 2 said, “i think it’s easier to use, but i don’t trust it.” the other in group 2 added, “it’s new and we’re not quite sure because every other search engine is you just type in words and it’s not graphical.” both students in group 7 commented that the familiarity of basic search was the reason they would use it for class in one hour. both groups 2 and 7 would later say that they liked the visual search interface better. two groups (4 and 6) chose visual search for the “class in one hour” scenario. the student in group 4 commented, “because it does cool things for you, makes it easier to find. otherwise you’re going through by title.” both these groups would later also say that they liked the visual search interface better. the students were also asked to describe two scenarios, one in which they would use basic search and one in which they would use visual search. four of the groups (1, 3, 5, and 6) said they would use basic search when they knew what information they needed. seven of the eight groups said they would use visual search for broad topics. all the students’ responses are given in figure 12. when asked which interface they preferred, the groups split evenly. comments from the four who preferred basic search (1, 3, 5, and 8) centered on the familiarity of the interface. the student in group 5 added, “the regular one . . . i like to get things done.” all four of these students had said they had used an ebsco database before. the two students who could list library research databases by name were both in this group. of the four who preferred visual search (2, 4, 6, and 7), three groups had never used ebsco before, though one of the students in group 7 thought he’d used it in the library web tutorial. group 2 commented, “it seemed like it had a lot more information . . . cool . . . 
futuristic.” the student in group 4 said, “it’s kind of like a little game. . . . like you’re trying to find the hidden piece.” group 7 commented that visual search was colorful and intriguing. the students in group 6 both stated “the visual one” in unison. one student said that visual search was more “[eye-catching] . . . it keeps you focused at what you are doing, i felt, instead of . . . words . . . you get to look at colors” and added later that it was “fun.” the other students in group 6 said, “i’m a very visual learner. so to see instead of having to read the categories, and say oh this is what makes sense, i see the circles like ‘abilities test’ or ‘academic achievement’ and i automatically know that’s what it is . . . and i can see how many articles are in it . . . and you click on it and it zooms in and you have all of them there.” the second student went on to add, “i’ve been teaching my mom how to use technology and the visual search would be so much easier for her to get, because its just looks like someone drew it on there like this is a general category and then it breaks it down.” other suggestions given during the free-comment portion of the survey were to have the filters from basic search appear on visual search (especially peer-reviewed); curiosity about when visual search would become available (at the time it was in beta test); and a suggestion to have generaleducation writing students write their first paper using visual search. figure 12. examples of two situations: one in which you would be more likely to use visual search, and one in which you would be more likely to use ebsco 148 information technology and libraries | september 2006 ฀ discussion this evaluation is limited both because most students chose different topics for each search interface, and because they only had time to research one topic in each interface. therefore, there could be an infinite number of scenarios in which they would have performed differently. however, this study does show that, for some students, or for some search topics, visual search will help students in a way that basic search may not. one hypothesis of this study was that within the context of library research databases, visual searching would provide a needed alternative from traditional, text-based searching. the success of the students was observed in three areas: the quality of the subtopics they identified after interacting with their search results; the improvement of the chosen subtopic over their chosen general topic, and the quality of the results they found for their subtopic search. the researcher made a best effort to compare topics and results sets and decide which interface helped the student groups to perform better. in addition, qualities that each interface seemed to contribute to the students’ search process were noted (figure 13). these qualities were determined by reviewing the video recordings and examining the ways in which either interface seemed to support the attitudes and behaviors of the students as they conducted their research tasks. when considering all three of these areas, four groups did not, overall, require visual search as an alternative to basic search (1, 3, 4, and 7). two of these groups (4 and 7) seemed to benefit from more focus when using the basic search interface. although visual search lent them more interaction and exploration (which may be why they said they preferred visual search), it seems the focus was more important to their performance. 
for the other two groups (1 and 3), basic search really supported the depth of inquiry and high interest in finding results. these two groups confirmed that they preferred basic search. for two groups (6 and 8), visual search seemed an equally viable alternative to basic search. for group 6, both interfaces seemed to support the group’s desire to explore; they said they preferred visual search. for the student in group 8, basic search seemed to orient him to the goal of finding results, while visual search supported a more exploratory approach. since, in his case, this exploratory approach did not turn out well in the area of finding results, it is not surprising that he ended up preferring basic search. the remaining two groups (2 and 5) performed better with visual search, upholding the hypothesis that an alternate search is needed. group 2 seemed bored and uninterested in the search process when using basic search even though they chose a topic of personal interest: “world baseball classic.” visual search caught their attention and sparked interest in the impersonal topic “global warming.” group 2 spent more time exploring while using the visual search interface, and in the posttest survey admitted that they preferred the visual search interface. the student in group 5 said she preferred basic search, and as a selfdescribed psycinfo user, seemed comfortable with the interface. yet for this test scenario, visual search made her think of new ideas and supported more real exploration during the search process. within each of the three areas, basic search appeared to have the upper hand for both the quality of the subtopics identified by the students, and in the improvement of the chosen subtopics over the general topics. this is at least partially explained by the limitation of visual search to the most recent 250 results. that is, as the students explored the visual search results, choosing subtopics would not relaunch a search on that subtopic, which would have engendered more and perhaps better subtopics. in the third area, the quality of the results set for the chosen topic, visual search seemed to have the upper hand if only because of the phrase-searching limitation present in jmu’s administrative settings for basic search. that is, students were often finding few or no results on their chosen subtopics in basic search. this study also had findings that seem to transcend figure 13: strengths of basic search and visual search in quality of subtopics, most improved topic, and result sets usability testing of a large, multidisciplinary library database | fagan 149 these interfaces and the underlying database. first, libraries should strongly consider changing their database default searching from phrase searching to a boolean and, if possible. (this is possible in ebsco using the administrative module.) second, most students did not have trouble finding or using the interface widgets to perform limiting functions, with the one exception being some confusion about the relationship between the visual search filters and main search box. unlike some research into web search behavior, students may well travel beyond the first page of results and view more than just a few documents when determining relevance. finally, the presence of subject terms in both interfaces proved to be an aid to understanding results sets. this study also pointed out some improvements that could be made to visual search. 
first, it would be great if visual search returned more than 250 results in the initial set, or at least provided an overview of the size, type, and extent of objects using available metadata.19 however, even with today’s high-speed connections, result-set size will need to be balanced with performance. perhaps, as students click on subtopics, the software could rerun the search so that the results set does not stay limited to the original 250. on a minor note, for both basic and visual search, greater care should be taken to make sure users understand how the save function works and alert users to the need to use the browser function to complete the process. it should be noted that ebsco has not stopped developing visual search, and many of these improvements may well be on their way. ebsco says it will be adding more support for limiters, display preferences, and contextual text result-list viewing at some point in the future. these feature sets can currently be viewed on grokker.com. an important area for future research is user behavior in library subscription databases. while these usability tests provide a qualitative evaluation of a specific interface, it would be worthwhile to have a more reliable understanding about students’ searching behavior in library databases across similar interfaces. since public service librarians deal primarily with users who have self-identified as needing help, their experience does not always describe the behavior of all users. furthermore, studies of web search behavior may not apply directly to searching in research databases. specifically, students’ use of subject terms in both interfaces could be explored. half of the student groups in this study chose to use the basic search subject clusters in the left-hand column on the results page, despite the fact that they had never seen them before (this was a beta-test feature). is this typical? would this strategy hold up to a variety of research topics? another interesting question is the use of a single search box versus several search boxes arrayed in rows (to assist in constructing boolean and field searching). in the ebsco administrative module, librarians can choose either option. based on research rather than anecdotal evidence, which is best? another option is the default sort: historically, at jmu libraries, this has been a chronological sort. does this cause problems for relevance-thinking students? finally, the issue of collaboration in student research using library research databases would be a fascinating topic. certainly, these usability recordings could be reviewed with a mind to capturing the differences between individuals and groups of two, but there may be better designs for a more focused study of this topic. ฀ conclusion if you take away one conclusion from this study, let it be this: do not hesitate to try visual search with your users! information providers must balance investments in cutting-edge technology with the demands of their users. libraries and librarians, of course, are a key user group for information providers. a critical need in librarianship is to become familiar with the newest technology solutions, particularly with regard to searching, in order to provide vendors with informed feedback about which technologies to pursue. by using and teaching new visual search alternatives, librarians will be poised to influence the further development of alternatives to text-based searching. references and notes 1. bernard j. 
jansen and amanda spink, “how are we searching the world wide web? a comparison of nine search engine transaction logs,” special issue, information processing and management 42, no. 1 (2006): 257. 2. bernard j. jansen and amanda spink, “an analysis of web documents retrieved and viewed,” in proceedings of the 4th international conference on internet computing (las vegas, 2003), 67. 3. aravindan veerasamy and nicholas j. belkin, “evaluation of a tool for visualization of information retrieval results,” sigir forum (acm special interest group on information retrieval) (1996): 85–93; katy börner and javed mostafa, “jodl special issue on information visualization interfaces for retrieval and analysis,” international journal on digital libraries 5, no. 1 (2005): 1–2; ozgur turetken and ramesh sharda, “clustering-based visual interfaces for presentation of web search results: an empirical investigation,” information systems frontiers 7, no. 3 (2005): 273–97. 4. stephen greene et al., “previews and overviews in digital libraries: designing surrogates to support visual information seeking,” journal of the american society for information science 51, no. 4 (2000): 380–93; panayiotis zaphiris et al., “exploring the use of information visualization for digital libraries,” new review of information networking 10, no. 1 (2004): 51–69. 5. katy börner and chaomei chen eds., visual interfaces to digital libraries, 1st ed. (berlin; new york: springer, 2003), 243. 150 information technology and libraries | september 2006 6. zaphiris et al., “exploring the use of information visualization for digital libraries,” 51–69. 7. börner and chen, visual interfaces to digital libraries, 243. 8. greene et al., “previews and overviews in digital libraries,” 380–93. 9. “vivisimo corporate profile,” in vivisimo, http://vivi simo.com/html/about (accessed apr. 19, 2006). 10. “aquabrowser library—fiction connection,” www.fic tionconnection.com/ (accessed apr. 19, 2006). 11. “queens library—aquabrowser library,” http://aqua .queenslibrary.org/ (accessed apr. 19, 2006). 12. “xrefer—research mapper,” www.xrefer.com/research (accessed apr. 19, 2006). 13. “stanford ‘groks,’” http://speaking.stanford.edu/back _issues/ soc67/library/stanford_groks.html (accessed apr. 19, 2006); “grokker at stanford university,” http://library.stan ford.edu/catdb/grokker/ (accessed apr. 19, 2006). 14. “ebsco has partnered with groxis to deliver an innovative visual search feature as part of ebsco,” www.groxis .com/service/grokker/pr29.html (accessed apr. 19, 2006). 15. michael dolenko, christopher smith, and martha e. williams, “putting the user into usability: developing customer-driven interfaces at west group,” in proceedings of the national online meeting 20 (medford, n.j.: learned information, 1999), 81–90; e. t. morley, “usability testing: the silverplatter experience,” cd-rom professional 8, no. 3 (1995); ron stewart, vivek narendra, and axel schmetzke, “accessibility and usability of online library databases,” library hi tech 23, no. 2 (2005): 265–86; nicholas tomaiuolo, “deconstructing questia: the usability of a subscription digital library,” searcher 9, no. 7 (2001): 32–39; b. hamilton, “comparison of the different electronic versions of the encyclopaedia britannica: a usability study,” electronic library 21, no. 6 (2003): 547–54; heather l. 
munger, "testing the database of international rehabilitation research: using rehabilitation researchers to determine the usability of a bibliographic database," journal of the medical library association (jmla) 91, no. 4 (2003): 478–83; frank cervone, "what we've learned from doing usability testing on openurl resolvers and federated search engines," computers in libraries 25, no. 9 (2005): 10–14; alexei oulanov and edmund f. y. pajarillo, "usability evaluation of the city university of new york cuny+ database," electronic library 19, no. 2 (2001): 84–91; steve brantley, annie armstrong, and krystal m. lewis, "usability testing of a customizable library web portal," college & research libraries 67, no. 2 (2006): 146–63; carole a. george, "usability testing and design of a library web site: an iterative approach," oclc systems & services 21, no. 3 (2005): 167–80; leanne m. vandecreek, "usability analysis of northern illinois university libraries' web site: a case study," oclc systems & services 21, no. 3 (2005): 181–92; susan goodwin, "using screen capture software for web-site usability and redesign buy-in," library hi tech 23, no. 4 (2005): 610–21; laura cobus, valeda frances dent, and anita ondrusek, "how twenty-eight users helped redesign an academic library web site," reference & user services quarterly 44, no. 3 (2005): 232–46. 16. "morae usability testing for software and web sites," www.techsmith.com/morae.asp (accessed apr. 19, 2006). 17. jansen and spink, "an analysis of web documents retrieved and viewed," 67. 18. ibid. 19. greene et al., "previews and overviews in digital libraries," 381.
the recon pilot project: a progress report, november 1969–april 1970
henriette d. avram, kay d. guiles, lenore s. maruyama: marc development office, library of congress, washington, d.c.
a synthesis of the second progress report submitted by the library of congress to the council on library resources under a grant for the recon pilot project. an overview of the progress made from november 1969 to april 1970 in the following areas: production, official catalog comparison, format recognition, research titles, microfilming, investigation of input devices. in addition, the status of the tasks assigned to the recon working task force is briefly described.
introduction
an article was published in the june 1970 issue of the journal of library automation (1) describing the scope of the recon pilot project (hereafter referred to as recon) and summarizing the first progress report submitted by the library of congress (lc) to the council on library resources (clr). recon is supported by the council, the u.s. office of education, and the library of congress. in order that all aspects of the project might be brought together as a meaningful whole, the various segments, regardless of the source of support, were covered in the second progress report and have been included in this article. in some instances, it has been necessary to introduce a section by repeating some aspects already reported in the june 1970 article in order to add clarity to the content of that section.
progress: november 1969 to april 1970
recon production
the production operations of the recon pilot project are being handled by the recon production unit in the marc editorial office of the lc processing department.
printed cards with 1968, 1969, and 7-series card numbers have been provided from the card division stock for recon input, and approximately 99,550 cards in the 1969 and 7-series have been received. using prescribed selection criteria the recon editors have sorted these cards and obtained approximately 27,150 eligible for recon input. approximately 150,000 cards in the 1968 series have also been received. the recon editors have sorted 60,000 of these cards and obtained approximately 24,000 records eligible for recon input. a large number of cards in these three series is already out of print, and replacement cards are being sent by the card division as soon as reprints are made. each card eligible for recon input from the above-mentioned selection process is also checked against a computer produced index of card numbers for records in machine readable form. each number in the print index has a corresponding code to show on which machine readable data base the record resides. the source codes are as follows: m1-marc i data base m2-marc ii, 1st practice tape m3-marc ii, 2nd practice tape m4-marc ii data base m5-marc ii residual data base (the two practice tapes contain records converted before the implementation of the marc distribution service to test the programs and input techniques.) the print index used for the final selection of the 1969 and 7-series card numbers contained only the records from m2-m5 (the marc i data base consists of the records converted during the marc pilot project which ended in june 1968). for the selection of the 1968 records, another print index had been produced which contains numbers for records on all five data bases. if the recon editors find a match on the print index, the appropriate source code is added to the printed card; these printed cards are then maintained in a separate file. (later in the project, the records in the data bases identified as m1 to m3 will be updated to conform with the current marc ii format and added to the recon data base.) the remaining cards for recon are reproduced on input worksheets and edited. to date, approximately 9,750 records in the 1969 and 7-series have been edited for recon. recon records in the 1969 and 7-series are being input by a service bureau. the contractor uses ibm selectric typewriters equipped with an ocr typing mechanism, and the hard-copy sheets are run through an 232 journal of library automation vol. 3/3 september, 1970 optical scanner. the output from the scatmer is a magnetic tape which is processed by the contractor's programs to produce a tape in the marc pre-edit format. this tape is then sent to lc and processed by the marc system programs to produce a full marc record. since the input for the retrospective conversion effort will be printed cards (or copies of printed cards from the card division record set), it will be necessary to compare these with their counterparts in the lc official catalog. the printed card for each main entry in the official catalog will show if any changes have been made which did not warrant reprinting these cards to incorporate these changes. items on a printed card that could be noted in this fashion include changed subject headings, added entries, and call numbers. since these will be important access points in a machine readable catalog record, it was felt that such revisions should be reflected in the recon records. 
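the card-number screening against the print index described above amounts to a simple lookup keyed on the lc card number. the python sketch below illustrates the idea; the index contents, card numbers, and function names are invented for the example, while the source codes m1 through m5 are those listed in the text.

```python
# source codes from the text: which machine readable data base holds a record
SOURCE_CODES = {
    "m1": "marc i data base",
    "m2": "marc ii, 1st practice tape",
    "m3": "marc ii, 2nd practice tape",
    "m4": "marc ii data base",
    "m5": "marc ii residual data base",
}

# hypothetical print index: lc card number -> source code
print_index = {"68-12345": "m4", "79-54321": "m2", "68-67890": "m5"}

def screen_cards(card_numbers, index):
    """split candidate cards into those already in machine readable form
    (kept in a separate file, tagged with their source code) and those
    still needing recon input worksheets."""
    already_converted, needs_input = [], []
    for number in card_numbers:
        code = index.get(number)
        if code:
            already_converted.append((number, code, SOURCE_CODES[code]))
        else:
            needs_input.append(number)
    return already_converted, needs_input

done, todo = screen_cards(["68-12345", "68-99999", "79-54321"], print_index)
print("already converted:", done)
print("edit for recon input:", todo)
```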
the recon report (2) contains a lengthy discussion of the various factors involved in the catalog comparison process, such as the percentage of change in relation to the age of the record, the difficulty in ascertaining any changes because of language, interpretation of cataloging rules, etc. to determine the most efficient and least costly method of catalog comparison, two recon editors were assigned to conduct an experiment to test eight different methods as follows:

1) print-out checked in alphabetic order - single group of 200 records.
2) proofsheets (already proofed) checked in worksheet (card number) order - group of 200 records in batches of 20.
3) proofsheets (not proofed) checked in worksheet (card number) order - group of 200 records in batches of 20.
4) proofsheets (already proofed) checked by mental alphabetization - group of 200 records in batches of 20.
5) proofsheets (not proofed) checked by mental alphabetization - group of 200 records in batches of 20.
6) worksheets before editing (not input) checked by mental alphabetization - group of 200 records in batches of 20.
7) worksheets before editing (not input) checked in alphabetical order - group of 200 records in batches of 20.
8) worksheets before editing (not input) checked in worksheet (card number) order - group of 200 records in batches of 20.

mental alphabetization means the searching of all the entries in a batch beginning with "a," then all the entries beginning with "b," etc., even though the batch is not in alphabetical order. each editor used 200 records for each method, made the necessary corrections, and recorded the time required as well as the number of corrections made.

figure 1 shows the average number of records checked in an hour using the eight different methods of catalog comparison. tables 1 and 2 give the estimated cost per record for each of the methods.

[figure 1. average number of records checked per hour for each of the eight catalog comparison methods; the bar labels repeat the method descriptions listed above.]

[table 4. input devices. the table compares keyboard input devices (key-to-cassette, key-to-computer-compatible-magnetic-tape, key-to-tape, key-to-disk, and related systems) from manufacturers including cybercom, data action, ibm, sycor, tycore, viatron, burroughs, honeywell, keymatic, mai, mohawk, motorola, potter, sangamo, vanguard, computer entry systems, computer machinery, general computer systems, inforex, penta associates, logic systems engineering, and logic corp., by keyboard configuration, display, record length in characters, purchase price, and monthly rental, with remarks on converters, poolers, and station counts. for the keymatic, the quoted price covers the basic 88 keys; 256 unique keys are available, as well as an optional printer. legend: k/t = key to magnetic tape system; k/d = key to disk system; k/c = key to cassette; k/m = key to computer compatible magnetic tape; kp = key punch; t/kp = typewriter or key punch; t = typewriter. backlight = a matrix consisting of all individual characters that can be keyed; each character, as keyed, is displayed one at a time in its particular position in the matrix. projection and light-emitting diodes = a one-character-position dot matrix; each character, as keyed, is displayed one at a time in the same position. bcd (bit) = lights displaying the bit positions (on, off) of individual characters; each character, as keyed, is displayed one at a time. (the prices quoted and the characteristics given for each device reflect the best information that could be obtained by the recon staff.)]

... could be assigned to single keys and translated to their proper value by software, thus reducing the amount of keystroking required. the keymatic appears worth further investigation; therefore, the library may rent a device for several months for testing and evaluation. a typist will be trained in current marc/recon procedures and assigned to the keymatic as soon as her training period has been completed. the first month will be spent training on the keymatic prior to the actual input of recon records to obtain production and error rates and cost evaluation for comparison purposes.

serious consideration was also given in the recon report to direct-read ocr equipment; however, at that time no equipment existed that offered the technical capability to perform the conversion of the lc record set. since then, preliminary investigation of the model 370 compuscan universal optical character reader proved interesting enough to continue further exploration of the device. the model 370 compuscan is a computer-directed flying-spot scanner which matches the scanned portion of a character with a character described in the core memory of the computer.
the manufacturer has examined a sample of lc printed cards selected at random over a period of twenty years and has concluded that although the hardware is sufficient to read the record set optically, significant software effort would be required. the results of the sampling indicated that the record set is not constituted entirely of "mint" cards, i.e., cards printed from the metal of the original linotype composition, but is composed of originals and reprints of the original. when the stock of the original printing is close to depletion, the card is reprinted by photographing the card, and duplicates are made by a photo-offset process. as this cycle is repeated, the card for any one title could be several generations removed from the original. in some instances, a microscopic examination of the cards seems to indicate that the matrices used in the linotype composition were worn. because of these factors, what might appear as the same character to the naked eye would represent different pattern configurations to the scanner's core memory.

the coarseness of the card surface may also cause variations in the same characters. lc cards have a high rag content in order to meet the archival standards required by libraries. the roughness of the surface does not affect the readability for the human but may cause variations in a given character when read by an optical scanner. another significant problem with lc cards concerns characters which touch, i.e., connections between what are intended to be distinct characters but are read by the scanner as one. for example, if a lower case "n" were next to a lower case "t" and the cross bar on the "t" touched the "n," the scanner would consider the combination of the "n" and the "t" as one character.

software must be written to handle the variant character and the touching character problems. in the case of the touching characters, the machine must recognize some allowable limit of reading a single character, and when this limit is exceeded, the pattern read must be divided and matched against single-character patterns held in core. programs can be written so that if either of the above conditions occurs, the output on magnetic tape will be flagged for later spot checking, permitting the scanner to continue to operate at throughput speeds without human intervention. the resultant magnetic tape would serve as input to the library's format recognition programs to reformat the scanner's output into the marc ii format. it has been estimated that the throughput speed of compuscan would be in the vicinity of 1800 cards per hour.

the lc record set will be microfilmed according to the specifications required by the scanner. since the scanner operates with negative film, a very dark background with a very clear, white image is necessary. a tentative cost estimate of the microfilming and reading has been computed at approximately fifty cents per 1000 characters output on magnetic tape (approximately three lc cards). this price does not include the cost of the software. original printed "mint" cards will be used to test the device without implementing the required software, and depending on the results, investigation may be continued.

the keying of the 1969 recon records has been performed by a contractor using an ibm selectric typewriter with the resulting hard copy fed through a farrington optical character reader. as part of the contractor's services to the library, production rates were monitored and reported.
this gave lc the basis to compare two devices, the key-to-cassette used at the library of congress for the marc distribution service and the equipment used by the contractor for recon records. to make the comparison in table 5, it was necessary to determine the costs for each method using the techniques developed in the recon report (9). some modifications of cost were made to the original recon estimates because actual figures are now available. marc costs were obtained by dividing the costs of the man-hours for typing and proofing in a given period by the number of records added to the marc master file in the same period. the equipment cost per record was also based on the number of records added to the master file. production rates associated with particular tasks were not used. the manpower figures supplied by the contractor were limited to hourly production rates; therefore, to obtain the cost per record for ocr typing it was necessary to project the hourly rate to cover a man-year. the estimated annual production of a typist was then divided into the annual salary of a gs-4 (step 1) typist incremented by 8.5% for fringe benefits. the ocr equipment costs were computed on the basis of figures supplied by the contractor, assuming ownership of the ocr-font typewriter and service bureau rental of the scanner.

table 5. input costs per record

1. manpower

key to cassette method:
typing $.45
proofing $.70
total $1.15

ocr method:
typing rate of contractor: 1,000 records in 104 hours, or 9.6 records per hour.
typing cost at lc: ($5,522 + 8.5% of $5,522) / (9.6 x 1,338) = $.466
proofing rate of recon editors at lc: 1,534 records proofed in 173 hours, or 8.9 records per hour; less 20% = 7.1 records per hour.
proofing cost at lc: ($6,882 + 8.5% of $6,882) / (7.1 x 1,338) = $.786
typing $.466
proofing $.786
total $1.25

2. equipment (costs do not include maintenance where applicable)

key to cassette:
key to cassette monthly rental $100.00
converter monthly rental prorated over 10 key-to-cassettes $26.00
total $126.00
hourly cost (assumes 132 hours a month) $.955
effective production rate of key to cassette: average weekly marc output of 1,005 records over 120 key-to-cassette hours (4 units) = 8.4 records/hour
record cost of key to cassette and converter: $.955 / 8.4 = $.114

ocr method:
ocr-font typewriter purchase price $500.00; 40-month amortization, $12.50/month; hourly cost (assumes 132 hours use) $.095
effective production rate of ocr typewriter: (9.6 records/hour x 1,338 hours) / (132 hours x 12 months) = 8.1 records/hour
record cost of ocr typewriter: $.095 / 8.1 = $.012
ocr scanner, service bureau hourly rental $50.00; at 10,000 lines/hour (approximately 18 lines per record), 555 records/hour
record cost of ocr scanner: $.09
total record cost for equipment: $.012 + $.09 = $.102

the cost of proofing in the ocr method was based on the recon experience at lc modified by contractor experience. in actual practice, ocr records are proofed and corrected by the contractor before they are proofed by recon editors. it was assumed that double proofing is unnecessary but that allowance should be made for the added difficulty of reading copy with a higher proportion of errors. (a preliminary study of errors on recon proofsheets has shown that there are fewer typographical errors on recon proofsheets than on current marc proofsheets.) for this reason, the number of recon records proofed in an hour has been decreased by 20% in the calculations.
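the per-record figures in table 5 follow from simple rate arithmetic. the short script below reproduces them as a sketch only; the salary, fringe, hours, and production figures are the ones quoted above, and the function and variable names are invented for the example:

```python
# reproduce the table 5 cost-per-record arithmetic from the figures quoted above.
FRINGE = 0.085        # 8.5% fringe benefits
ANNUAL_HOURS = 1338   # productive hours in a man-year, as used in the report

def manpower_cost_per_record(annual_salary, records_per_hour):
    """loaded annual salary divided by annual record output."""
    loaded = annual_salary * (1 + FRINGE)
    return loaded / (records_per_hour * ANNUAL_HOURS)

# ocr method, manpower (gs-4 typist $5,522; proofer $6,882)
typing = manpower_cost_per_record(5522, 9.6)    # contractor rate: 9.6 records/hour
proofing = manpower_cost_per_record(6882, 7.1)  # 8.9 records/hour less 20%
print(round(typing, 3), round(proofing, 3))     # 0.466 0.786

# ocr method, equipment
typewriter_hourly = (500 / 40) / 132                # $500 amortized over 40 months, 132 hours/month
typewriter_rate = 9.6 * ANNUAL_HOURS / (132 * 12)   # effective rate, about 8.1 records/hour
scanner_per_record = 50 / 555                       # $50/hour service bureau, 555 records/hour
equipment = typewriter_hourly / typewriter_rate + scanner_per_record
print(round(equipment, 3))                          # 0.102
```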
on the basis of the calculations in table 5, the comparative input costs are summarized as follows:

table 6. estimated input cost per record

                    key-to-cassette    ocr
manpower: typing         $.45          $.47
          proofing        .70           .78
equipment                 .11           .10
totals                  $1.26         $1.35

the final figures indicate that the two methods are very close in cost. as presently calculated, the key-to-cassette method is less expensive than the ocr method. it is easy to see that a slight change in any cost or production rate could make the ocr method less expensive. if the proofing rate of 8.9 records per hour were maintained instead of decreasing to 7.1 per hour, the ocr proofing cost would drop to $.63, and the total price for this proposed method would be $1.20. one way to test the assumption of the added difficulty of a single proofing would be to obtain uncorrected records from the contractor as a means of determining the actual proofing rate under that condition.

recon tasks

the four tasks that have been identified for study by the working task force are: 1) levels of completeness of marc records; 2) implications of a national union catalog in machine readable form; 3) conversion of existing data bases in machine readable form for use in a national bibliographic service; and 4) study of problems involved in any future distribution of name and subject cross reference control files. progress to date on the first three tasks is described in the following paragraphs.

task 1 has been completed, and an article summarizing the results of a report submitted to clr has been published in the journal of library automation, june 1970 (10). the following conclusions reached by this study are quoted from the article:

1) the level of a record must be adequate for the purposes it will serve.
2) in terms of national use, a machine readable record may function as a means of distributing cataloging information and as a means of reporting holdings to a national union catalog.
3) to satisfy the needs of diverse installations and applications, records for general distribution should be in the full marc ii format.
4) records that satisfy the nuc function are not necessarily identical with those that satisfy the distribution function.
5) it is feasible to define the characteristics of a machine readable nuc report at a lower level than the full marc ii format.

task 2 consists of an investigation of the implications of a national union catalog in machine readable form. a design of such a system is needed, and although the implementation of such a project is beyond the purview of the working task force, some of the technical and cost factors should be examined and defined for possible future research. as a framework for discussion purposes, a future reporting system for the national union catalog was postulated based on the present reporting system as follows:

contributor: lc. present report form: printed cards. future report form: lc marc data (for all records).
contributor: outside libraries. present report form: locally produced cards and lc cards. future report form: marc data (for all records), or records submitted to nuc to be keyed as machine readable records.

the problems of the control number and library location symbols were considered, but a tentative decision was made that recommendations should be forthcoming when the american national standards institute sectional committee z39 has completed its work on library identification codes.
the indicators and subfield codes to be included in the machine readable nuc records would depend on the optimum file arrangement of the suggested bibliographic listings. the library of congress is presently engaged in a filing rules study which should influence the inclusion or exclusion of particular content designators. task 2 is still in progress.

task 3 is the investigation of the possible utilization of other machine readable data bases for use in a national bibliographic store. the task was divided into several subtasks as follows: 1) identification of useful data bases for the purposes described (content and bibliographic completeness); 2) cost of the conversion from a local format to a marc ii record; 3) cost of updating records not already in the lc data base for consistency and missing data by comparing the records with the library of congress official catalog; 4) cost of comparing the records against the existing lc machine readable records to eliminate duplicate records.

to satisfy the first subtask, a questionnaire was sent to 42 organizations. the information requested included:

1) availability of data bases - maintained by library or service bureau, and permission to copy data base.
2) use of the data base - for acquisitions, production of book catalog, circulation system, etc.
3) composition of data base - monographs, serials, technical reports, etc.
4) composition of data base - number of titles, imprint dates (primarily current, retrospective, etc.), language of records.
5) source of catalog data - marc distribution service, lc catalog card, local cataloging.
6) data elements for monographs.
7) format used in identifying data elements - marc i format, marc ii format, etc.
8) character set used.

the results from this survey were analyzed, and a follow-up letter was sent to 22 of the organizations, requesting further information as follows:

1) an estimate of the number of monographs added to the data base each year.
2) representative group of twenty-five entries for monographs including both fiction and non-fiction.
3) details on the character set used in the machine readable data base.
4) detailed specifications of monographic record format.

responses from this last letter have been received and analyzed. this analysis should identify a limited number of machine readable data bases that will be subjected to further content and cost analysis.

outlook

the recon project continues to be on schedule. the working task force has met several times for deliberations on the assigned tasks; in addition, members have been briefed on the progress of the pilot project and their advice has been sought. thus, individuals interested in the problems of bibliographic conversion guide the project throughout its development. the library of congress recon staff continues to maintain liaison with individuals and organizations working in any facet of the project's scope, hoping to bring all expertise possible to bear on the problems involved.

it is significant, although not fully recognized at the onset of the recon project, that the solution to many of the problems under exploration will have impact on current conversion as well as retrospective conversion. this is evident at the library of congress where marc and recon, although staffed separately in the production area, share staff in the information systems office, and the project is known as marc/recon. coordination continues between the recon project and the card division mechanization project.
the recon project director is the technical adviser for the card division project, and under her general direction, a computer analyst in the information systems office has been assigned full time to the project. the analyst has been given a detailed orientation to the procedures and computer programs for marc/recon and the specifications for the card division project. this exposure is necessary to guarantee that there is no duplication of effort between the two projects and that the design work for the card division project includes the possibility of a future national service for machine readable cataloging, both current and retrospective. (the marc distribution service is such a national service for english language monograph cataloging data, but what is assumed here is a service of a much broader scope.)

although progress has been made in many of the tasks included in recon, several methods of input described in the recon report can only be fully evaluated when the format recognition programs are implemented. according to present estimates, this should take place toward the end of 1970. much remains to be accomplished. the library of congress will continue to make its progress known as rapidly as possible, because the results of the pilot project will have great ramifications for the entire library community.

acknowledgments

the authors wish to thank the staff members associated with the recon pilot project in the technical processes research office and the marc editorial office in the library of congress processing department, and those in the information systems office, for their respective reports, which were incorporated into the progress report submitted to the council on library resources and which provided significant contributions to this paper.

references

1. avram, henriette d.: "the recon pilot project: a progress report," journal of library automation, 3 (june 1970).
2. recon working task force: conversion of retrospective records to machine-readable form (washington, d.c.: library of congress, 1969), pp. 32-33.
3. avram, henriette d., et al.: "marc program research and development: a progress report," journal of library automation, 2 (december 1969), 250-253.
4. recon working task force: op. cit., p. 31.
5. national microfilm association: glossary of terms for microphotography and reproductions made from micro-images. 4th rev. ed. (annapolis, md.: national microfilm association, 1966), p. 8.
6. ibid.
7. ibid., p. 52.
8. hawken, william r.: copying methods manual (chicago: library technology program, american library association, 1966), p. 243.
9. recon working task force: op. cit., pp. 58-59, 86, 93.
10. recon working task force: "levels of machine-readable records," journal of library automation, 3 (june 1970).

design principles for a comprehensive library system

tamer uluakar, anton r. pierce, and vinod chachra: virginia polytechnic institute and state university, blacksburg, virginia. (manuscript received july 1980; accepted february 1981.)

this paper describes a project that takes a step-by-step or incremental approach to the development of an online comprehensive system running on a dedicated computer. the described design paid particular attention to present and predicted capabilities in computing as well as to trends in library automation. the resultant system is now in its second of three releases, having tied together circulation control, catalog access, and serial holdings.

perspective

the use of computers in libraries is no longer a speculative venture for the daring few.
rather, library automation has become the accepted prerequisite for effective library service. the question faced is not "if," but rather "how" and "when." the reasons for this evolution are diverse, but fundamental is the recognition of online computer processing as the most effective means of simultaneously handling inventory control, information retrieval, and networking of large, complex, and volatile stores of data.

most areas of current library practice could now benefit from effective computer-based control. mature and proven systems exist for cataloging, circulation, serials control, acquisitions, catalog access, and "reader guidance"; the latter by virtue of online literature searching facilities such as dialog, medlars, or brs. the challenge is to find or develop an optimal mix of capabilities.

two common limitations from which library automation projects suffer are the use of nonstandardized, incomplete records and the lack of functional integration of different tasks. in most cases these limitations are due to historic circumstances. the pioneering systems, say, those online systems introduced between 1967 and 1975, had to conserve carefully the available computing resources. a decade ago it was unthinkable for any library to store a million marc records online. mass storage costs alone precluded that option. to best realize the benefits of automation, short records, usually of fixed length, were employed.

there is little question that systems based on short records were helpful to their users. however, one characteristic of these systems was their proliferation within a particular library. after the first system was shown to be a success, it became compelling to try another. the problem was that these separate systems were usually not communicating directly with each other because of limitations imposed by program complexity and load on available resources. thus, the use of incomplete records breeds isolated, noncommunicating systems. however, system users have come to demand that all relevant data be available at a single terminal from a single system. it is not enough to know that a particular title is due back in twenty-five days; the user must also know that copy two has just been received, and that copy three is expected to arrive from the vendor in one week. that is, the functions of catalog access, circulation, and acquisitions must be brought together at a single place: the user's terminal. and while the importance of functional integration has been recognized for some time, only a very few report successful implementations.1,2 the kafkaesque alternative to functional integration becomes the library that has been "well computerized" but where the librarian must use five different terminals, one for each task.

as computer-based systems have grown to maturity, increasing stress has been placed on standardization. in library automation the measure of standardization is wide-scale use of the marc formats for documents and authorities; the use of bibliographic "registry" entries such as isbn, issn, or coden; the use of standard bibliographic description; and so forth. however, the application of common languages and standardized protocols, data description, and definition has been less pervasive. we find many applications that eschew use of the common high-level languages, database management systems, and standard "off-the-shelf" or general-purpose hardware.
the emergence of powerful and easy-to-use database management systems, the spectacular price reductions in hardware, and the concomitant, and equally spectacular, improvements in system capabilities have made it clear that it is practical to think ambitiously. perhaps the major articulation of these developments has been the pervasive shift from a central computer shared with nonlibrary users to the utilization of dedicated minicomputers.3

our analysis of the requirements of a comprehensive system led to recognition of the key role played by serials in research libraries. serials form the most critical factor in automating library service because of the complexity of their bibliographic, order, and inventory records, and because of their importance to research.4 a fundamental error in designing a comprehensive library system would involve focusing on the requirements of monographs and/or other "one-shot" forms of the literature. the reason is, simply, that monographs and other such publications can be treated as an easy limiting case of a continuing set of publications. this observation is borne out by christoffersson, who reports an application that extends the idea of seriality and develops a means to provide useful control and access to all classes of material.5

design philosophy

the concerns outlined above mean that a viable library system should meet the following design criteria:

functional integration. functional integration is simply the ability to conduct all appropriate inquiries, updates, and transactions on any terminal. this envisages a cradle-to-grave system wherein a title is ordered, has its bibliographic record added to the database, is received and paid, has its bibliographic record adjusted to match the piece, is bound, found by author, title, subject, series, etc., charged out, and, alas, flagged as missing. in this way a terminal linked to the system will be a one-stop place to conduct all the business associated with a particular title, subject, series, order, claim, vendor, or borrower.

completeness of data. if the system is to be functionally integrated, it is clear that it must carry the data required to support all functions. in particular, data completeness is required to satisfy the access and control functions. consider, for example, the problems associated with the cataloging function. a book is frequently known by several titles or authors. creating these additional access points is a large portion of the cataloger's responsibility. only systems that allow the user access to these additional entries utilize the effort spent in building the catalog record. such system capabilities must be present to allow the labor-intensive card catalog to be closed and, more important, to allow maintenance of the catalog within the system.

use of standardized data and networking. in an excellent article, silberstein reminds us that, in general, the primary rationale for adhering to standards is interchangeability.6 we give great importance to being able to project our data to whatever systems may develop in the future. we believe this consideration is of the highest priority because, fundamentally, the only thing that will be preserved into the future is the data itself.* without interchangeability of data, sharing of resources is impossible.

*this state of affairs seems to be true for all computer-based systems because their lifetime is, typically, no greater than ten years.
data interchangeability is, of course, a basic assumption that has been made in speculation concerning the national bibliographic network7 developing from the bibliographic utilities, notably oclc, inc., the research libraries group's rlin facility, the washington library network, and the university of toronto's utlas facility. today, nearly all research libraries participate in some utility. while their participation is primarily directed to utilization of the cataloging support services, we find an increasing amount of interest and use of additional capabilities, notably interlibrary loan. we expect a steady and continual growth of these library networking capabilities.

however, networking is not problem free. perhaps the biggest single problem in using the network is the misalignment between the record as found on the bibliographic database and the requirements of individual libraries. while such variability between the resource database record and the user's needed version is well understood,8 the local library frequently has a difficult time adjusting records to meet local needs. one example is oclc's inability to "remember" in the online database a particular library's version of a record. another example is the conser project's practice of "locking" very dynamic records as soon as they are authenticated. this locking frequently means that required updates cannot be made and users cannot share with one another corrections to the base record. after locking, each must, independently, go about bringing the record up to date. thus, as roughton notes, "the next library to call up the record loses the benefit of the previous library's work."9 this inhospitable state of affairs forces individual libraries to maintain their own records if they wish to change bibliographic records after initial entry.

the problem of local adjustment of bibliographic records in no way conflicts with the goal of standardized bibliographic data. standardized data provides a quick means of delivering an intelligible package to a variety of users who will adapt the package to meet their particular needs. standardization does not mean making adaptation inefficient or more costly than it need be; rather, standards provide a framework around which the details are filled in. these observations on standardized data formats imply that the library's data must be based on marc records for books, serials, authorities, etc., and on the ansi standards for summary serials holdings notation, book numbers, library addresses, and so forth.

microscopic data description. at this point, system administrators face a fundamental problem: many of the library's important records have no standard format. the most conspicuous example involves the notation for detailed serials holdings.10 the only alternative one has when trying to build a system without standardized formats is to rely on "microscopic" description. that is, each and every distinct type of data element that makes up (or can make up) a field in a record must be accounted for and uniquely tagged. in this way, whatever standard format is ultimately set, it will be possible, in principle, to assemble by algorithm the data elements into an arrangement that will be in conformity with the standard.
only if the library is using microscopic data description will the library be able to maintain its independence of particular lines of hardware or software. we are convinced that the use of untagged, free-form input will, in the long run, spell disaster.

use of general-purpose hardware and software. many strategies in dealing with library automation involve redesigning standard hardware or software. for example, one vendor has reported an interesting design of mass storage units that improved access time.11 we feel that future applications should, as much as possible, steer clear of such customized implementations because the standard capabilities of most affordable systems allow sufficient processing power and storage economies even if these capabilities are suboptimal for a particular application. the use of general-purpose hardware and system software promotes system sharing between different installations. moreover, an application based on general-purpose hardware and system software will be easier to maintain and far less vulnerable to changes in personnel. for turnkey installations, the greater the degree of use of general-purpose hardware and software, the better shielded will the installation be against changes in product line or the vendor's ultimate demise. a noteworthy application of this principle of compatibility is seen in the system being developed by the national library of medicine.12

system description

the functional capabilities of the virginia tech library system (vtls) have been developed in two software releases, with the third release soon to appear. the initial release met the needs associated with circulation control and also provided rudimentary access to the catalog and serials holdings. the present release has benefited from the use of the marc format, and allows vastly improved catalog access and control. release iii, the comprehensive library system now being developed, will draw together acquisitions, authority control, and serials control with the current capabilities.

vtls release i

the initial release of the system was developed in 1976 to meet needs generated by rapid library growth. circulation transactions had been increasing at about 10 percent annually for the previous decade and were straining the manually maintained circulation files beyond acceptable limits. the main library* at virginia tech is organized in subject divisions, each essentially "owning" one floor of a 100,000-square-foot facility. a 100,000-square-foot addition to the library had been approved. because virginia tech's library has only one card catalog, some means was necessary to distribute catalog information throughout a facility that was to double its size. after reviewing the alternative means of distributing the catalog, e.g., a duplicate card catalog, photographic reproduction of the catalog, or a com catalog, it was decided to attack both problems, circulation control and remote catalog access, within a single online system. vtls was installed on a full-time basis in august 1976.

*only two quite small branch libraries (architecture and geology) exist on campus. in addition there is a reserve collection located in the washington, d.c., area that supports off-campus graduate programs in the areas of education, business administration, and computer science. all these sites are linked to the system.
its first release ran continuously on the library's dedicated hewlett-packard 3000 minicomputer until december 1979. at that time the system held brief bibliographic data for approximately 325,000 monographs and 25,000 journals and other serial titles, records for about half the collection. while the first release ably met its goals, it became clear that it would prove to be an unsuitable host for additional modules involving acquisitions and serials control, primarily because of the brief, fixed-length bibliographic records. as a result of highly favorable price reductions in computer hardware and improvements in capability, it was possible to think in terms of storing one million marc records online as well as supporting the additional terminals required for a comprehensive library system.

vtls release ii

vtls runs under a single online program for all real-time transactions. the major goals in the design of this program were the following:

1. two conflicting requirements had to be accommodated: first, the program had to be easy to use for library patrons. this is requisite for a system that will eventually replace the card catalog. second, the program had to be practical, efficient, and versatile for its professional users. the keystrokes required had to be minimal, and related screens had to be easily accessible from one to another.
2. the response time had to be good, especially for more frequent transactions.
3. the contents of all screens had to be balanced to provide enough information without being overcrowded and difficult to read or comprehend. further, each screen of vtls had to follow some logical arrangement of the data it contains; for most screens this meant alphabetical sorting of the data according to ala rules.
4. the format of all screens, especially those to be viewed by the patrons, had to be visually pleasing. thus, the use of special symbols (which are so abundant on many computer system displays), nonstandard abbreviations, and locally (and often quite arbitrarily) defined terms were unacceptable.
5. the program had to have security provisions to restrict certain classes of users from addressing particular modules of the program.

considerable effort was spent to satisfy these goals. the first goal was achieved by the "network of screens" approach. the second goal, prompt system response, necessitated the use of the "data buffer method," which, in turn, proved to have other uses (both of these techniques are discussed below). to satisfy goals three and four, a committee of librarians and analysts spent months drafting and reviewing each screen until it was finally approved by the design group. goal five, security provisions, was reached without much difficulty.

network of screens

vtls's data-access system is designed to be used as easily as a road map. this is accomplished by the use of a "network of screens." the network of screens is much like a road map in which a set of related data (a screen displayed in one or more pages) acts as a "city," and the commands that lead from one set to another act as "highways." vtls has nineteen screens including various menu screens, bibliographic screens (see "the data buffer method" below), serial holdings screens, item (physical piece) screens, and screens for patron-related data. the user can "drive" from one "city" to another using system commands. the system commands are either "global" or "local."
global commands, as the name implies, may be entered at any point during the execution of the online program. a local command is peculiar to a given screen. global commands are of two types: search commands and processing commands. search commands are used to access the database by author, title, subject, added entries, call number, lc card number, isbn, issn, patron name, etc. processing commands, on the other hand, initiate procedures such as check-out, renewal, or check-in of items. the user first enters a global (search) command to access one of the screens in the network. from there, local commands that are specific to the current screen can be used. there are three different types of local commands: commands that take the user from one screen to another; commands that page within the current screen; and commands that update data related to the screen.

for example, it is possible to start by entering an author search command to access the network and then proceed not only to find what books the author has in the system but also the availability of each of the books. if the books are checked out, information about the patrons who have them can also be reached. this display is called the patron screen. from the patron screen, one can "drive" to the patron activity screen, which displays circulation information about the patrons. thus, each displayed screen leads to another. in fact, the searches can start at ten different screens and proceed in many different ways through the network.

database design

image/3000, hewlett-packard's database management system used by vtls, is designed to be used with fixed-length records. this fact, coupled with the need to sort entries on most screens, created serious problems in the early stages of the system design. but various techniques were devised to overcome these apparent roadblocks. figure 1 illustrates the breakdown of the bibliographic record in the database and the way it is linked with piece-specific data. bibliographic data are stored in three distinct groups for subsequent retrieval:

1. controlled vocabulary terms. (authority data set)
2. title and title-like data. (title data set)
3. all remaining bibliographic data, i.e., data that is not indexed. (marc-other data set)

this grouping of the marc record extends to subfields, thus splitting mixed fields such as author-title added entries. when individual fields are parsed in this way, a single field may contribute more than one access point, such as variant forms of author, title, series name, subject, and added entries. access by the standard bibliographic control numbers is effected by use of inverted files (not shown in the figure). a fundamental characteristic of this layout involves the storage of controlled vocabulary terms (i.e., authors and subjects). regardless of the number of references made to an authority term from different bibliographic records, the controlled vocabulary term is stored only once. the system assigns a unique number (authority id) to each such term and uses this number to keep records of the references made to it in a separate data set (authority-bibliographic linkage data set). this particular structure makes an authority control subsystem possible, speeds up online retrieval and display, and economizes mass storage.

the data buffer method

the system displays bibliographic records in two different formats.
[fig. 1. bibliographic layout of the vtls database (simplified), showing the authority, title, and marc-other data sets and the authority-bibliographic linkage data set.]

if the terminal used is designated for librarians, the records are displayed in the marc format (the resulting screen is referred to as the marc screen); otherwise, they are displayed in a screen that is formatted similar to a catalog card. before displaying these screens, the online program collects and formats the data to be displayed and stores it in one of the two "buffer" data sets. the records stored in the buffer data sets are called buffer records. buffer records can be edited, as required, by adding new lines, deleting, or modifying existing character strings. these updates can be executed quickly and without placing much load on the system since they involve little, if any, analysis, indexing, and sorting. thus, the buffer data sets store all bibliographic updates and new data entry of the day. at night, these records are transferred to the rest of the database by a batch program.

the data buffer method has had several pronounced effects on the system. by transferring periods of heavy resource demand to off-hours, the system can work with full marc records in a library that has a heavy real-time load of data entry, inquiry, and circulation. the data buffer approach also improves access efficiency because once a buffer record is prepared for a screen, subsequent searches for the same record are satisfied by the buffer record.

data entry and the oclc interface

the most frequently encountered method of entering marc records into a local computer involves use of tape in the marc ii communications format. alternative methods include the use of microprocessors or digital recorders which "play back" a marc-tagged screen image from oclc or some other bibliographic utility. these alternative methods have the strong advantage of shortening the delay introduced while waiting for a tape to be delivered. we have been able to link the utility's terminal to the data buffer.13 data flows from the utility to the buffer in real time. no intervention in the utility's terminal was required for the local processor to be able to capture the marc-tagged screen. batch programs running on the hp 3000 read records from printer ports of oclc terminals and pass them directly to the data buffer. once a record gets into the data buffer, it is accessible by oclc number so that subsequent editing and linkage to piece-specific data or serial holdings can be made right away in the local system. buffer records can also be created by direct keyboarding of the full array of fixed and variable fields using the vtls terminals.

circulation

as with most other online circulation systems, vtls uses machine-sensible bar-code labels to identify books and borrowers to the system. all efforts have been made to humanize the system. one consequence is that the system does not make decisions better made by responsible staff. thus, two kinds of circulation stations reside side by side. the first is staffed by students who typically work a ten-to-twenty-hour week and historically have shown high turnover. their circulation stations only deal with inquiries and with heavily used but nondiscretionary transactions: check-out, renewal, and check-in.
should problems arise, the borrower is directed to the adjacent station staffed by a full-time employee who, using the system, can articulate circulation policy to borrowers and make decisions with regard to any questions concerning fines, lost books, or reinstatement of invalidated or blocked privileges.

start-up

we found system start-up to be a relatively easy task. it was convenient to use the so-called rolling conversion in which items were labeled upon their initial circulation through the system. the greatest benefit was seen in the first year when the probability that items brought to the circulation desk were already known to the system increased exponentially. after six months this probability had risen to 65 percent with only 10 percent of the circulating collection having been labeled. at the end of the year the probability increased linearly at 0.7 percent per month. after three years of operation, the probability was 90 percent, with approximately 50 percent of the circulating collection having been labeled.

reference use

the ability to distribute catalog access as well as circulation information provides a powerful information tool. a subset of all functions previously described is available to the nonlibrarian users of the system through user-cordial screens. a "help" function may also be initiated at any screen to guide users through the network of screens.

current development

critical to the overall design of vtls is the system's ability to treat serials and continuations. without this capability, the modules being developed to support acquisitions, serials check-in and claiming, and binding will not function satisfactorily. equally important, the design lays the foundation for authority control by virtue of its use of a dictionary for all controlled vocabulary terms. thus a name or subject entry is carried internally as a four-byte code, which is translated to the authority entry upon display. another internally coded data element, the bib-id, is designed to handle many of the linkage problems associated with serials and continuations. the bib-id is unique for each marc record.

prior to establishing the serials control modules governing receipt, claiming, and binding, the coded holdings module must be functioning. this module will allow automatic identification of volume (or binding unit) closure and automatic identification of gaps in holdings or overdue receipts. thus, highest priority has been given to the development of this module so that these other modules can, in turn, develop. the holdings module serves two functions: first, it allows the detailed recording of serials holdings consistent with the principle stated earlier concerning microscopic data description; and second, these microscopic data are coded so that the system can recognize (and predict) particular pieces or binding units in terms of enumerative and chronological data.

the next three areas of development are modules for acquisitions and fund control, serials receipts and binding, and authority control. the final development will be comprehensive management reports. it should be noted that each one of these developments will result in a specific benefit to the user community. the project is incremental in that the development of area a does not mean that area b must be developed for a to have lasting value.
this incremental approach offers designers and administrators the advantages associated with an orderly growth in complexity and budget requirements. further, the capabilities of the host hardware and software are stressed in smaller steps than would be the case if the comprehensive system were written and then turned on. the key move appears to be predefining the scope and capabilities of each stage so that a useful product emerges at its completion, and so that it lays a foundation for the next.

references

1. velma veneziano and james s. aagaard, "cost advantages of total system development," in proceedings of the 1976 clinic on library applications of data processing (urbana, ill.: university of illinois press, 1976), p.133-44.
2. charles payne and others, "the university of chicago data management system," library quarterly 47:1-22 (jan. 1977).
3. audrey n. grosch, minicomputers in libraries (new york: knowledge industry press, 1979), 142p.
4. richard degennaro, "wanted: a mini-computer serials system," library journal 102:878-79 (april 15, 1977).
5. john g. christoffersson, "automation at the university of georgia libraries," journal of library automation 12:23-38 (march 1979).
6. stephen m. silberstein, "standards in a national bibliographic network," journal of library automation 10:142-53 (june 1977).
7. network technical architecture group, "message delivery system for the national library and information service network: general requirements," in david c. hartmann, ed., library of congress network planning paper, no. 4, 1978, 35p.
8. arlene t. dowell, cataloging with copy (littleton, colo.: libraries unlimited, 1976), 295p.
9. michael roughton, "oclc serials records: errors, omissions, and dependability," journal of academic librarianship 5:316-21 (jan. 1980).
10. tamer uluakar, "needed: a national standard for machine-interpretable representation of serial holdings," rtsd newsletter 6:34 (may/june 1981).
11. c.l. systems, inc., "the libs 100 system: a technological perspective," clsi newsletter, no. 6 (fall/winter 1977).
12. lister hill national center for biomedical communications, national library of medicine, "the integrated library system: overview and status" (lhc/ctb internal documentation, bethesda, md., october 1, 1979), 55p.
13. francis j. galligan to pierce, 11 feb. 1980.

tamer uluakar is manager of the virginia tech library automation project. anton r. pierce is planning and research librarian at the university libraries. vinod chachra is director of computing resources and associate professor of industrial engineering.

a scatter storage scheme for dictionary lookups

d. m. murray: department of computer science, cornell university, ithaca, new york

scatter storage schemes are examined with respect to their applicability to dictionary lookup procedures. of particular interest are virtual scatter methods which combine the advantages of rapid search speed and reasonable storage requirements. the theoretical aspects of computing hash addresses are developed, and several algorithms are evaluated. finally, experiments with an actual text lookup process are described, and a possible library application is discussed.

a document retrieval system must have some means of recording the subject matter of each document in its data base. some systems store the actual text words, while others store keywords or similar content indicators.
the smart system (1) uses concept numbers for this purpose, each number indicating that a certain word appears in the document. two advantages are apparent. first, a concept number can be held in a fixed-sized storage element. this produces faster processing than if variable-sized keywords were used. second, the amount of storage required to hold a concept number is less than that needed for most text words. hence, storage space is used more efficiently.

smart must be able to find the concept numbers for the words in any document or query. this is done by a dictionary lookup. there are two reasons why the lookup must be rapid. for text lookups, a slow scheme is costly because of the large number of words to be processed. for handling user queries in an on-line system, a slow lookup adds to the user response time.

storage space is also an important consideration. even for moderate-sized subject areas the dictionary can become quite large, too large for computer main memory, or so large that the operation of the rest of the retrieval system is penalized. in most cases a certain amount of core storage is allotted to the dictionary, and the lookup scheme must do the best possible job within this allotment. this usually means keeping the overhead for the scheme as low as possible, so that a large portion of the allotted core is available to hold dictionary words. the rest of the dictionary is placed in auxiliary storage and parts of it are brought in as needed. obviously the number of accesses to auxiliary storage must be minimized.

this paper presents a study of scatter storage schemes for application to dictionary lookup, methods which appear to be fast and yet conservative with storage. the next two sections describe scatter storage schemes in general. they are followed by a section presenting the results of various experiments with hash coding algorithms and a section discussing the design and use of a practical lookup scheme. the final sections deal with extensions and conclusions.

basic scatter storage method

a basic scatter storage scheme consists of a transformation algorithm and a table. the table serves as the dictionary and is constructed as follows: given a natural language word, the algorithm operates on its bit pattern to produce an address, and the concept number for the word is placed in the table slot indicated by this address. this process is repeated for every word to be placed in the dictionary. the generated addresses are called hash addresses; and the table, a hash table.

there are many possible algorithms for producing hash addresses (2,3,4). some of the most common are: 1) choosing bits from the square of the integer represented by the input word; 2) cutting the bit pattern into pieces and adding these pieces; 3) dividing the integer represented by the input word by the length of the hash table and using the remainder.

collisions

in an ideal situation every word placed in the dictionary would have a unique hash address. however, as soon as a few slots in the hash table have been filled, the possibility of a collision arises: two or more words producing the same hash address. to differentiate among collided entries, the characters of the dictionary words must be stored along with their concept numbers. during lookup, the input word can then be compared with the character string to verify that the correct table entry has been located.
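as a concrete illustration of the basic scheme, here is a minimal sketch using the third transformation listed above (division by the table length, taking the remainder). it is not the smart implementation; the table size, words, and concept numbers are invented for the example, and collisions are simply reported rather than resolved, since collision handling is taken up next.

```python
# basic scatter storage: a transformation algorithm turns a word into a hash
# address; the concept number is stored together with the word's characters so
# that colliding entries can be told apart at lookup time.

TABLE_SIZE = 101                     # illustrative hash table length
hash_table = [None] * TABLE_SIZE     # each slot holds (word, concept) or None

def hash_address(word):
    """division method: treat the word's bytes as an integer and take the
    remainder on division by the table length."""
    return int.from_bytes(word.encode("ascii"), "big") % TABLE_SIZE

def enter(word, concept):
    slot = hash_address(word)
    if hash_table[slot] is not None and hash_table[slot][0] != word:
        raise ValueError("collision -- must be resolved (see the chaining methods below)")
    hash_table[slot] = (word, concept)

def lookup(word):
    entry = hash_table[hash_address(word)]
    # compare the stored character string to verify the correct entry was found
    if entry is not None and entry[0] == word:
        return entry[1]
    return None                      # word never entered in dictionary

enter("library", 17)
enter("automation", 42)
print(lookup("library"), lookup("retrieval"))   # 17 None
```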
the problem of where to store the collided items has several methods of solution (3,5). the linear scan method places a collided item in the first free table slot after the slot indicated by the hash address; the scan is circular over the end of the table. the random probe method uses a crude algorithm to generate random offsets r(i) in the interval [1,h], where h is the length of the hash table. if the colliding address is a, slot a + r(1) mod h is examined, and the process is repeated until an empty slot is found. both of these methods work best when the hash table is lightly loaded, that is, when the ratio between the number of words entered and the number of table slots is small. in such cases the expected length of scan or average number of random probes is small.

chaining methods provide a satisfactory method of resolving collisions regardless of the load on the hash table. however, they require a second storage table, a bump table, for holding the collided items. when a collision occurs, both entries are linked together by a pointer and placed in the bump table. a pointer to this collision chain is placed in the hash table along with an identifying flag. further colliding items are simply added to the end of the collision chain.

table layout and search procedure

in the virtual scatter storage system described later, the hash table has a high load factor. hence the chained method (or rather a variation of it) is used to resolve collisions, and further discussion involves only scatter storage systems using collision chains. with this restriction, then, a scatter storage system consists of a hash table, a bump table, and the associated algorithm for producing hash addresses. a dictionary entry consists of a concept number and the character string for the word it represents. these entries are placed in the hash-bump table as described above. consequently there are three types of slots in the hash table: slots that are empty, slots holding a single dictionary entry, and slots containing a pointer to a collision chain held in the bump table. figure 1 is a typical table layout.

[figure 1. typical table layout: empty slots, single dictionary entries (concept number plus characters), and pointers to collision chains in the bump table.]

one of the advantages of scatter storage systems is that the search strategy is the same as the strategy for constructing the hash-bump tables. a word being given, its hash address is computed and the tables searched to find the proper slot. during construction, dictionary information is placed in the slot; during lookup, information is extracted from the slot. the basic search procedure is illustrated by the flow diagram in figure 2. the construction procedure is similar.

[figure 2. flow diagram for the lookup procedure in basic scatter storage systems: compute the hash address of the input text word, examine the addressed slot, follow the bump table chain if necessary, and either return the concept number or report that the word was never entered in the dictionary.]

theoretical expectations

an ideal transformation algorithm produces a unique hash address for each dictionary word and thereby eliminates collisions. from a practical point of view, the best algorithms are those which spread their addresses uniformly over the table space. producing a hash address is simply the process of generating a uniform random number from a given character string.
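the hash-table and bump-table organization of figures 1 and 2 can be sketched in a few lines. the following hypothetical python fragment is an illustration, not the paper's implementation: the bump table is a python list, chains are linked by list indices, new colliding items are appended to the end of the chain, and the hash function is a stand-in.

```python
H = 101                      # hash table size (illustrative)
hash_table = [None] * H      # None = empty, ("entry", word, concept),
                             # or ("chain", index of chain head in bump table)
bump_table = []              # each element: [word, concept, index of next or None]

def hash_address(word: str) -> int:
    # stand-in transformation algorithm (division method)
    return int.from_bytes(word.encode("ascii"), "big") % H

def insert(word: str, concept: int) -> None:
    a = hash_address(word)
    slot = hash_table[a]
    if slot is None:                         # empty slot becomes a single entry
        hash_table[a] = ("entry", word, concept)
    elif slot[0] == "entry":                 # first collision: move both to the bump table
        bump_table.append([slot[1], slot[2], None])
        head = len(bump_table) - 1
        bump_table.append([word, concept, None])
        bump_table[head][2] = len(bump_table) - 1
        hash_table[a] = ("chain", head)
    else:                                    # later collisions: append to end of chain
        i = slot[1]
        while bump_table[i][2] is not None:
            i = bump_table[i][2]
        bump_table.append([word, concept, None])
        bump_table[i][2] = len(bump_table) - 1

def lookup(word: str):
    a = hash_address(word)
    slot = hash_table[a]
    if slot is None:
        return None                          # word never entered in dictionary
    if slot[0] == "entry":
        return slot[2] if slot[1] == word else None
    i = slot[1]
    while i is not None:                     # walk the collision chain
        w, concept, nxt = bump_table[i]
        if w == word:
            return concept
        i = nxt
    return None
```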
if the addresses are truly random, a probability model may be used to predict various facts about the storage system. suppose a hash table has h slots and that n words are to be entered in the hash-bump tables. let h_i be the expected number of hash table slots with i entries, for i = 0, 1, ..., n. in other words, h_0 is the expected number of empty slots, h_1 is the expected number of single entries, and h_2, h_3, ..., h_n are the expected numbers of slots with various numbers of colliding items. even though collided items are physically located in the bump table, they may be considered to "belong" to the same slot in the hash table. it is expected that:

1) h = h_0 + h_1 + ... + h_n

2) n = h_1 + 2*h_2 + 3*h_3 + ... + n*h_n

now let x_ij = 1 if exactly i items occur in the jth slot and x_ij = 0 otherwise, for j = 1, 2, ..., h. then h_i = e[x_i1 + x_i2 + ... + x_ih], the sum of e[x_ij] over the h slots. assume that any chosen table slot is independent of the others, so that the probability of a given item landing in that slot is 1/h. then the probability of getting exactly i items in the slot is

3) p_i = c(n,i) (1/h)^i (1 - 1/h)^(n-i)

where c(n,i) is the binomial coefficient. then e[x_ij] = 1*p_i + 0*(1 - p_i) = p_i, and substituting into the above,

4) h_i = h*p_i = h c(n,i) (1/h)^i (1 - 1/h)^(n-i), for i = 0, 1, ..., n.

for the cases of interest h and n are large, and the poisson approximation can be used in equation 3: p_i ≈ e^(-n/h) (n/h)^i / i!. the ratio n/h is the load factor mentioned previously; it is usually designated by a, so that

5) h_i = h e^(-a) a^i / i!, for i = 0, 1, ..., n.

equation 5 is sufficient to describe the state of the scatter storage system after the entry of n items. most of the statistics of interest can be predicted using this expression; a few of them are listed in table 1.

the time required for a single lookup using a hash scheme depends on the number of probes into the table space, that is, on how many slots must be examined. suppose the word is actually found: if it is a single entry, only one probe is required; if the word is located in a collision chain, the number of probes is one (for the hash table) plus one additional probe for each element of the collision chain that must be examined. suppose instead that the word is not in the dictionary: if its hash address corresponds to an empty table slot, again only one probe is needed; if the address points to a collision chain, the number is one plus the length of the chain. for words found in the dictionary, the average number of probes per lookup is

6) p = 1 + (1/n) [ (0)h_1 + (1+2)h_2 + (1+2+3)h_3 + ... + (1+2+...+n)h_n ] probes,

and carrying out the sum with equation 5 gives the closed form p = 2 + a/2 - e^(-a) listed in table 1.

table 1. expected storage and search properties for basic scatter storage schemes (h = number of hash table slots; n = number of words to be entered)

load factor: a = n/h
number of empty table slots: h_0 = h e^(-a)
number of single entries: h_1 = n e^(-a)
number of collision chains with i entries: h_i = h e^(-a) a^i / i!, for i = 2, 3, ..., n
expected sums: h = h_0 + h_1 + ... + h_n; n = h_1 + 2*h_2 + ... + n*h_n
fraction of hash table empty: f_0 = h_0 / h = e^(-a)
fraction of table filled with single entries: f_1 = h_1 / h = a e^(-a)
fraction of hash table slots with i entries: f_i = h_i / h = e^(-a) a^i / i!, for i = 2, 3, ..., n
expected sums: f_0 + f_1 + ... + f_n = 1; f_1 + 2*f_2 + ... + n*f_n = a
number of collisions: n_c = h_2 + h_3 + ... + h_n = h - h_0 - h_1
number of entries in the bump table: b = n - h_1
total table slots required: s = h + b
average lookup time (probes): p = 2 + a/2 - e^(-a)
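as a quick numerical check on equation 5 and table 1, the hypothetical python snippet below (not from the paper) computes the expected storage and search figures for a given table size and word count. run with h = 2^15 slots and the 7,822-word adi dictionary used later in the paper, its output can be compared with the "expected" column reported later in table 5.

```python
import math

def expected_properties(h: int, n: int) -> dict:
    """expected storage/search figures for a chained scatter storage scheme
    (equation 5 and table 1), assuming uniformly random hash addresses."""
    a = n / h                                   # load factor
    empty = h * math.exp(-a)                    # h_0
    single = n * math.exp(-a)                   # h_1
    chains = h - empty - single                 # slots with two or more entries
    bump = n - single                           # entries pushed to the bump table
    probes = 2 + a / 2 - math.exp(-a)           # average probes, successful lookup
    return {"load factor": a, "empty slots": empty, "single entries": single,
            "collision chains": chains, "bump table entries": bump,
            "average probes": probes}

# adi dictionary (7822 words) in a 2**15-slot hash table, as in the paper's
# practical scheme; the printed values fall close to the expected column of table 5.
for name, value in expected_properties(2**15, 7822).items():
    print(f"{name}: {value:.3f}")
```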
virtual scatter storage method

from table 1, the expected number of collisions is n_c = h - h_0 - h_1 = h (1 - e^(-n/h) - (n/h) e^(-n/h)). for a fixed n, this number decreases as h increases. at the same time the number of empty hash table slots, h_0 = h e^(-n/h), increases as h increases. both of these results are expected: as the hash addresses are spread over a larger and larger table space (h slots), the number of collisions should decrease and the number of empties increase for a fixed number of entries (n).

a virtual scatter storage scheme tries to balance these opposing trends by combining hash coding with a sparse storage technique. large, or virtual, hash addresses are used to obtain the collision properties associated with a very large hash table, and the storage technique is used to achieve the storage and search properties of a reasonably sized hash table. if the virtual hash address is taken large enough, the expected number of collisions can be reduced to essentially zero. with no expected collisions, it is possible to dispense with verifying that a query word and the dictionary word are the same; it is enough to check that they produce the same virtual address. hence the character strings need not be stored in the hash-bump tables at all.

to implement the virtual scheme a large hash address is computed, say in the range (0,v), and the address is split into a major and a minor part. the major portion is used just as before, as an index on a hash table of size h. the minor portion is stored in the hash or bump table, in place of the character string. with this difference, the virtual scheme works just as the basic scheme does. the lookup procedure is identical, but the minor portions are used for comparison rather than character strings. all the results of the previous section apply as storage and timing estimates.

the advantage of virtual scatter storage systems is economy of storage space: the minor portion is much smaller than the character string it replaces. it is true that the virtual scheme assigns the same concept number to two different words if they have the same virtual address. this need not be disastrous for document retrieval applications. presumably v is chosen large enough to keep the number of collisions small. on the one hand, errors could be neglected because of their low probability of occurrence and their small effect on the total performance of the retrieval system. on the other hand, it is always possible to resolve detected collisions even in a virtual scheme. collisions may be detected during dictionary construction or updating, and the characters for the colliding words appended to the bump table; the hash or bump table entry must then contain a pointer to these characters along with an identifying flag. collisions occurring during actual lookups cannot be detected.
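a hypothetical python sketch of the major/minor split follows. the widths (a 29-bit virtual address cut into a 15-bit major and a 14-bit minor, as in the practical scheme described later) are taken from the paper, but the hash function is a stand-in and collision chains are kept inline in the slot rather than in a separate bump table, purely for brevity. no character strings are stored; lookups compare minors only.

```python
MAJOR_BITS = 15              # index into a 2**15-slot hash table
MINOR_BITS = 14              # stored in place of the character string
H = 1 << MAJOR_BITS

hash_table = [None] * H      # None, ("entry", minor, concept), or ("chain", [...])

def virtual_address(word: str) -> int:
    """stand-in for a good 29-bit hash; the paper uses a multiply-and-center
    algorithm on the s/360, here an arbitrary multiplicative mix is used."""
    key = int.from_bytes(word.encode("ascii"), "big")
    return (key * 2654435761) % (1 << (MAJOR_BITS + MINOR_BITS))

def split(word: str):
    v = virtual_address(word)
    return v >> MINOR_BITS, v & ((1 << MINOR_BITS) - 1)   # (major, minor)

def insert(word: str, concept: int) -> None:
    major, minor = split(word)
    slot = hash_table[major]
    if slot is None:
        hash_table[major] = ("entry", minor, concept)
    elif slot[0] == "entry":
        hash_table[major] = ("chain", [(slot[1], slot[2]), (minor, concept)])
    else:
        slot[1].append((minor, concept))

def lookup(word: str):
    major, minor = split(word)
    slot = hash_table[major]
    if slot is None:
        return None
    if slot[0] == "entry":
        return slot[2] if slot[1] == minor else None      # compare minors only
    for m, concept in slot[1]:
        if m == minor:
            return concept
    return None
```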
collision problem

in order to use a virtual hash scheme, the virtual table must be large enough to reduce the expected number of collisions to an acceptable level. from a practical point of view, a collision may be considered to involve only two words, rather than three, four, or more; it is assumed that the probability of these other types of collisions is negligible. let v be the size of the virtual hash table. then the expected number of collisions is simply n_c = h_2 = (v/2) a^2 e^(-a), where a = n/v. in this case v >> n, so that a is small and e^(-a) is approximately 1:

7) n_c = (v/2) a^2 = n^2 / (2v)

suppose, for example, the dictionary has n = 2^13 words. if the size of the virtual hash table is chosen to be v = 2^26, then the expected number of collisions is n_c = (2^13)^2 / (2 * 2^26) = 1/2.

suppose further that this table size is adopted for the dictionary, and that the hash code algorithm produces three collisions. the question arises whether the algorithm is a good one, that is, whether it produces uniform random addresses. the answer is found by extending the previous probability model. consider a virtual scatter storage scheme in which the virtual table size is v, and n items are to be entered into the hash-bump tables. again assume that collisions involve only two items. let p(i) = prob[i collisions] = prob[i table slots have 2 items and n - 2i slots have 1 item]. the number of ways of choosing the i pairs of colliding words is n! / (2^i i! (n - 2i)!), and there are v(v-1)...(v - n + i + 1) ways of placing the resulting n - i distinct groups in the hash table, out of v^n equally likely arrangements, so that

8) p(i) = [ n! / (2^i i! (n - 2i)!) ] * [ v(v-1)...(v - n + i + 1) / v^n ], for i = 0, 1, ..., floor(n/2).

in a form for hand computation,

9) p(0) = (1 - 1/v)(1 - 2/v) ... (1 - (n-1)/v)
   p(i) = p(i-1) * (n - 2i + 2)(n - 2i + 1) / (2i (v - n + i)), for i = 1, 2, ...

these results are exact, but the following approximations can be used with accuracy: log p(0) = sum of log(1 - j/v) for j = 1, ..., n-1, which is approximately -n^2 / (2v). let β = n^2 / (2v). terms linear in n may be neglected in equation 9, giving p(0) = e^(-β) and p(i) = (β/i) p(i-1). this is again a poisson distribution:

10) p(i) = e^(-β) β^i / i!, for i = 0, 1, 2, ..., floor(n/2).

this equation gives the approximate probability of i collisions for a virtual scatter storage scheme. it may be used to form a confidence interval around the expected number of collisions n_c = β. for the previous example, in which v = 2^26, n = 2^13, and n_c = 1/2, the following table of values can be made:

i = 0: p(i) = .607, cumulative = .607
i = 1: p(i) = .303, cumulative = .910
i = 2: p(i) = .076, cumulative = .986
i = 3: p(i) = .012, cumulative = .998

the probability is .986 that the number of collisions is less than or equal to 2. since the algorithm gave 3 collisions, it appears to be a poor one. the results for the collision properties are summarized in table 2.

table 2. expected collision properties for virtual scatter storage systems (v = virtual hash table size; n = number of words to be entered)

collision factor: β = n^2 / (2v)
expected number of collisions: n_c = β
probability of i collisions: p(i) = e^(-β) β^i / i!, for i = 0, 1, ..., floor(n/2)
probability that the number of collisions c lies in [a,b]: prob = p(a) + p(a+1) + ... + p(b)

experiments with algorithms for generating hash addresses

any scatter storage scheme depends on a good algorithm for producing hash addresses. this is especially true for virtual schemes in which collisions are to be eliminated. in these experiments three basic algorithms are evaluated for use in virtual schemes. the words in two dictionaries, the adi wordform and the cran 1400 wordform, are used. the hash-bump tables are filled using these words and the resulting collision and storage statistics compared with the expected values.

dictionaries

the adi wordform contains 7822 words pertaining to the field of documentation. it contains 206 common words (previously judged) averaging 3.93 characters. the remaining 7616 noncommon words average 8.00 characters. in all there are 61,712 characters. the cran 1400 wordform contains 8926 words dealing with aeronautics.
the common word list consists of that of the adi, plus four additional entries. the 8716 noncommon words average 8.40 characters. there is a total of 74,074 characters. figures 3 and 4 show the distribution of the length of the words versus percentage of collection. the abrupt end to the curves in figure 3 is due to truncation of words to 18 characters. both dictionaries have approximately the same size and proportions of words of various length. however, their vocabularies are considerably different. a good hash scheme should work equally well on both dictionaries. 184 journal of library automation vol. 3/3 september, 1970 1/) 0 common words "e ~ ~ adi >0 cran 1400 ~ 0 c 0 += 0 ·0 -0 -c q) ~ 8 q) a.. 0 2 4 6 8 10 14 word length fig. 3. distribution of dictionary words according to their lengths. >. '0 c .q -u 0 -0 q) .~ -a :; e ::j u \ scatter storage for dictionary lookups/murray 185 0 common words 6 adi 0 cran 1400 0 2 4 6 8 10 12 14 16 18 20 word length fig. 4. cumulative distribution of dictionary words according to th eir l engths. 186 journal of library automation vol. 3/3 september, 1970 hash coding algorithms by their nature, hash coding algorithms are machine dependent. the computer representation of the alphabetic characters, the way in which arithmetic operations are done, and other factors all affect the randomness of the generated address. the algorithms described below are intended for use on the ibm s /360. words are padded with some character to fill an integral number of s /360 full words. then the full words are combined in some manner to form a single fullword key, and the final hash address is computed from this key. in the experiments which follow, the blank is used as a fill character. this is an unfortunate choice because of the binary representation of the blank 01000000. in some algorithms the zeroes may propagate or otherwise affect the randomness. a good fill character is one that 1) is not available on a keypunch or teletype, 2) will not propagate zeroes, 3) will generate a few carries during key formation, and 4) has the majority of its bits equal to 0, so their positions may be filled. a likely candidate for the s/360 is 01000101. three basic methods of generating virtual hash addresses-addition, multiplication, and division-are studied. the first and second provide contrasting ways of forming the single fullword keys. the second and third differ in the way the hash address is computed from the key. variations of each basic method are also tested to try to improve speed, programming ease, or collision-storage properties. l. addition methods ac-addition and center the fullwords of characters are logically added to form the key. the key is squared and the centermost bits are selected as the major. the minor is obtained from bits on both sides of the major. as-addition with shifting same as ac, except the second, third, etc. fullwords are shifted two positions to the left before their addition in forming the key. (an attempt to improve collision-storage properties) am-addition with masking same as ac, except the second, third, etc. fullwords have certain nonsignificant bits altered by masks before their addition in forming the key. (an attempt to improve collision-storage properties) 2. multiplication methods mc-multiply and center the fullwords of characters are multiplied together to form the key. the center bits of the previous product are saved as the multiplier for the next product. the key is squared and the centermost bits selected as the major. 
the minor is obtained from the bits on both sides of the major. scatter storage fo1' dictionary lookups/murray 187 msl-multiply and save left same as mc, but during formation of the key, the high order bits of the products, rather than the center, are used as successive multipliers. (an attempt to improve speed) mlm-multiply with left major same as mc, but taking the major from the left half of the square of the key and the minor from the right half. (an attempt to improve speed) 3. division methods dp-divide by prime the fullwords of characters are multiplied together to form the key. the center bits of the previous product are saved as the multiplier for the next product. the key is divided by the length of the virtual hash table-a prime number in this case-and the remainder used as the virtual hash address. the major is drawn from the left end of the virtual address and the minor from the right. do-divide by odd number same as dp, except using a hash table whose length is odd. (an attempt to provide more flexibility of hash table sizes ) dt -divide twice same as dp, except two divisions are made. the major is produced by dividing the key by the actual hash table size. the minor results from a second division. primes are used throughout as divisors. (an attempt to improve storage-collision properties) evaluation in the experiments to evaluate each variation of the above hash schemes, the size of the virtual hash table varies from 220 to 228 slots. the actual hash table varies in size from 212 to 214 slots. bump table space is used as needed. the tables are filled by the words from either the adi or cran dictionaries and the collision and storage statistics taken. because good collision properties are most important, they are examined first. the storage properties are dealt with later. the number of collisions obtained from each scheme versus the virtual table length is plotted in figures 5 to 8. the adi dictionary is shown in figures 5 and 7, and the cran in figures 6 and 8. the circled lines correspond to curves generated from equations 7 and 10. the horizontal one shows the expected number of collisions and the lines above and below it enclose a 95% confidence interval about the expected curve. in other words, if an algorithm is generating random . addresses, the probability is 95% that the curve for that scheme lies between the heavy lines. consider figures 5 and 6 showing the results for all the addition methods and the mc variation of the multiplication variation. the ac and mc algorithms differ only in that addition is used in forming the key in the 188 journal of library automation vol. 3/3 september, 1970 -0 ooooooo theoretical curves (equations (7) and ( 1 0) experimental curves ---interpolated curves virtual hash table size (power of two) fig. 5. collisions in the adi dictionary for addition and multiplication hash schemes. first one and multiplication in the second one. yet the curves are spectacularly different. the result seems to have the following explanation. the purpose of a hash address computation is to generate a random number from a string of characters. if the bits in the characters are as varied as possible, then the algorithm has a headstart in the right direction. however, the s/360 bit patterns for the alphabet and numbers are: a to i 1100 xxxx j to r 1101 xxxx s to z 1110 xxxx 0 to 9 1111 xxxx scatter storage for dictiona1·y lookups/murray 189 en c: 0 en 0 (.) -0 ... 
q) .0 e ::t z 20 ooooooo theoretical curves (equations (7) and (l 0) experimental curves --interpolated curves 26 virtual hash tobl e size (power of two) fig. 6. collisions in the gran dictionary for addition and multiplication hash schemes. c 28 in each case the two initial bits of a character are l's, so that in any given word one-fourth of the bits are the same. in forming a key, the successive additions in the ac algorithm may obscure these nonrandom bits if a sufficient number of carries are generated. however, the number of additions performed is usually small-2 or 3and it appears that the pattems are not broken sufficiently. the mc algorithm uses multiplication to form its keys, which involves many additions-certainly enough to make the resulting key random. the multiplications in the mc algorithm are costly in terms of computation time. therefore the as and am algorithms are tried. these addition 190 journal of library automation vol. 3/3 september, 1970 en c 0 ·~ 0 u .... 0 ooooo theoretical curves experimental curves interpolated curves 22 20 virtual hosh table size (power of two} fig. 7. collisions in the adi dictionary for division and multiplication hash schemes. variants try to hasten the breakup of the nonrandom bits by shifting and masking respectively. although these variants reduce the number of collisions somewhat, none of the addition schemes could be called random. typically a few words are singled out at some point and continue to collide regardless of the length of the virtual address. several collision pairs are listed below. note the similarities between the words. count worth tolerated wheel -sound -forty -telemeter -sheet in 1: 0 ·;;; 0 0 ... 0 ... cl) .d e :i z 20 scatter storage for dictionary lookups/murray 191 0000000 \ \ ,---\ \ \ \ 26 theoretical curves (equations (7) and (1 0) experimental curves interpolated curves 28 virtual hash table size (power of two) fig. 8. collisions in the gran dictionary for division and multiplication hash schemes. consider the multiplication algorithms. during key formation, the process of saving the center of successive products adds to the computation time. the msl variation attempts to remedy this by saving only the high order bits between multiplications (on the s /360 this means saving the upper 32 bits of the 64-bit product) . this method is so inferior that its collision graph could not be included with the others. the poor results stem from the fact that characters at the end of fullwords have little effect on the key and that the later multiplications swamped the effects of the earlier ones. examples of collision pairs are given below. for convenience the fullwords are separated by blanks. 192 journal of library automation vol. 3/3 september, 1970 certainty prevented heaving expe nse charter certainly -presented -heat lng -expanse -chapter the mc and mlm variants are identical with respect to collision properties. in general these algorithms produce good results, reducing the number of collisions to zero in both dictionaries. the collision curve is always beneath the expected one. consider figures 7 and 8 showing the results for all division methods and the mc method. all of the division algorithms display a distinct rise in the number of collisions when the virtual table size is near 224-regardless of the dictionary. the majority of the colliding word pairs are 4-character words having the same two middle letters. this brings to light a curious fact about division algorithms. 
for virtual tables, the divisor of the key is large and the initial few bits determine the quotient, leaving the rest for the remainder. for words of less than 4 characters (which require no multiplications during key formation), dividing by 224 is equivalent to selecting the last 3 characters of the word as the hash address. because the divisors are not exactly equal to 224, only the two middle characters tend to be the same. examples are: deal -bear took -soon held -cell verb -term this phenomenon apparently continues for table sizes around 226 and 228, but there are few or no words of 4 characters or less which agree in 26 or 28 bits. for divisors smaller than 22 \ a larger part of the key determines the quotient and apparently breaks up the pattern. because the above effect occurs only for v = 22 \ these points are passed over on the graphs. in general, the dt algorithm is superior to the rest of the division methods, mostly because each of its two divisors is smaller than those used in other methods. prime numbers seem to produce better results than other divisors. on the basis of collision properties, the mc, mlm, dt, and possibly as algorithms are the best. storage-search evaluations are included for these methods only. the experiments with each hash coding method also include counting the frequency of various length of collision chains. here a collision chain refers to chains of words producing the same major. the frequency counts are compared with the expected counts given by equation 5. the comparison is in terms of a chi-square goodn ess-of-fit test with a 10 % level of 0 8 :;::: t/) -0 -(j) q) ~ 0 ;:) ct (j) i .c u 2 0 scatter storage for dictionary lookups/murray 193 x·---x·---x or----dt / dt a~as mlm as ----as,dt mlm--mlm mlm mc mc-mc ~c ----mc virtual hash table size (power of two) xcurve for 10% level of significance fig. 9. deviations of storage-search properties from expected values for selected hash schemes using the adi dictionary. significance. figures 9 and 10 show the results of this test for each dictionary. included in the graphs is the line corresponding to the 10% level of significance. if the major portions of the hash addresses are really random, there is a probability of 0.90 that the 10% line will lie above the curve for the algorithm tested. consider the mc and mlm algorithms which differ only in that the major is selected from the center and left of the virtual address. from the graphs, it is clear that the multiplication methods produce their most 194 journal of library automation vol. 3/3 september, 1970 .~ -.!:!! -0 -cj) ~ 0 8 ~ c:1' cj) i ..c (.) 6 4 mlm x--dt virtual hash table size (power of two) xcurve for 10% level of significance fig. 10. deviations of storage-search properties from expected values for selected i-iash schemes using the gran dictionary. random bits in the center of their product. this is somewhat as expected, because the center bits are involved in more additions than other bits. the division algorithm, which had fairly good collision properties, seems to have rather mediocre storage properties. this is probably due to the scatter storage for dictionary lookups/murray 195 same causes as the collision problems, but working at a lower level, and not affecting the results as much. the as curve is included simply for completeness. the scheme displays a well behaved storage curve, but it has poor collision properties. in summary, the mc scheme seems to be the best for both dictionaries in terms of collision and search properties. 
in terms of computing time, the method is more time consuming than the addition methods, but less expensive than the division methods. the difference in computation times is not an extremely big factor. all methods required from 35 to 55 microseconds for an 8-character word on the s/360/65. the routines are coded in assembly language and called from a fortran executive. the times above include the necessary bookkeeping for linkage between the routines. a practical lookup scheme general description the lookup scheme described below is designed for use with dictionaries of about 21 :. words. the virtual table size selected is 229 and the actual table size is 216• on the basis of the results presented in previous sections, when the dictionary is full, it is expected that 1) 36.8% of the hash table will be empty, 2) 36.8% of the hash table will be single entries, 3) the bump table will require ( 0.632 )215 entries, 4) 1 collision is expected, 5) the probability of 5 or fewer collisions is 0.999, and 6) the average lookup will require 2.13 probes. table layout in all previous discussions a dictionary entry has included a minor and a concept number. a concept number is simply a unique number assigned to each word. the hash address of a word is also unique, and hence can be used. there is no need to store and use a previously assigned concept number. a dictionary entry contains a 14-bit minor and a single bit indicating whether the word is common or noncommon: 1 2 15 ic minor c = 0 implies the word is common; c = 1 implies the word is noncommon. a hash table entry contains 16 bits arranged as : 0 1 15 i flag i information flag = 0 implies that the information is a dictionary entry; flag = 1 implies that the information is a pointer to the bump table. words that have the same major are stored in a block of consecutive 196 journal of library automation vol. 3/3 september, 1970 locations in the bump table. this eliminates the need for pointers in the collision "chains". a bump table entry also has 16 bits structured as: 0 1 2 w i end i c minor end= 0 implies that the entry is not the last in the collision block; end = 1 implies that the entry is the last in the block. some convention must be adopted to signify an empty hash table slot. a zero is most convenient in the above scheme. unfortunately a zero is also a legitimate minor. however, to cause trouble the word generating the zero minor would have to be a common word and a single table entry (zero minors in the bump table are no problem). hopefully this occurs rarely because of the size of the minor ( 14 bits) and the small number of common words. however, even if this combination of circumstances occurs, the common word could be placed in the bump table anyway. in designing the tables, it is important to make the hash table entries large enough to accommodate the largest pointer anticipated for the bump table. for the above scheme, the expected bump table size is less than 215 so that the 15 bits allocated for pointers is sufficient. search considerations the number of probes needed to locate any given word depends on the place that the word occupies in a collision block. the average search time is improved if the most common words occupy the initial slots in each block. a study of adi text yields the statistics given in tables 3 and 4. table 3. division of words by categm·y. number of words percent of total 17270 total words 100.0 8716 common words 50.5 8554 noncommon words 49.5 table 4. distribution of l engths. 
number of all common noncharacters words percent words percent common percent words 1-4 10145 58.8 8057 92.5 2097 24.5 5-8 4630 26.8 627 7.2 4003 46.8 9-12 2249 13.0 32 0.3 2217 25.9 13-16 221 1.3 0 0.0 221 2.6 17-20 11 0.1 0 0.0 11 0.1 21-24 5 0.0 0 0.0 5 0.1 totals 17270 100.0 8716 100.0 8554 100.0 av. length 6.3 4.3 8.3 scatter storage for dictionary lookups/murray 197 using the categorical information, it appears that in filling the hash-bump tables, the common words should be entered first. within each category, all words should be entered in frequency order if such information is known. if frequency information is not available, the distribution by lengths can be used as an approximation to it. for common words, this means entering the shorter words first. for noncommon words, the words of 5 to 8 characters should be entered first. the greater the number of single entries, the greater the average search speed. figure 11 shows the fraction of single entries ( f 1) and fraction of empty slots ( f o) for various load factors. the fraction of single entries .l: iji 0 i -0 c 0 -(.j 0 ~ u. 0 .4 .8 load factor fig. 11 . theoretical hash table usage. 0 fraction empty slots a fraction of single entries 1.6 198 journal of library automation vol. 3/3 september, 1970 f1=ae-a reaches a maximum for a= 1, but since the slope of the curve is small around this point, the load factor in the interval ( 0.8, 1.2 ) is practically the same. table usage is better, however, for the larger values of a. these facts imply that scatter storage schemes make most efficient use of space and time for a=l. most text words can be assumed to be in the dictionary. thus the order of comparisons during lookup should be: hash table scan 1) check minor assuming the text word is a common word 2) check minor assuming the word is non common 3) check if the entry is a pointer to the bump table 4) check if the entry is empty first bump table entry (must be at least two) 5) check minor assuming the word is a common word 6) check minor assuming the word is non common other bump table entries 7) check minor assuming the word is non common 8) check minor assuming the word is common 9) check if at end of collision block. the search pattern can be varied to take advantage of the storage conditions. for example, if all common words are either single entries or the first element of a collision block, then step 8 may be eliminated. performance the lookup system described above has been implemented and tested on the ibm s/360/65. a modified form of the mc algorithm is used to compute a 29-bit virtual address and divide it into a 15-bit major and a 14bit minor. the modification is the inclusion of a single left shift of the fullwords of characters during key formation. this breaks up certain types of symmetries between words such as wingtail and tailwing. without this, such words will always collide. the hash-bump tables were filled with entries from the adi dictionary-common words first, followed by noncommon words. the shortest words were entered first. table 5 gives comparison of the expected and actual results. table 5. lookup system results. 
a=.239 number of empty table slots number of single entries number of collision blocks longest collision block average length of collision blocks size of bump table number of collisions average probes per lookup expected 25810 6161 797 4 2.1 1663 .06 1.33 actual 25762 6250 756 4 2.1 1572 0 1.33 scatter storage for dictionary lookups/murray 199 to obtain the actual lookup times 627 words were processed. the words were read from cards and all punctuation removed. each word was passed to the lookup program as a continuous string of characters with the proper number of fill characters added. the resulting times are given in table 6 (in microseconds); a larger sample of the category of "not-found" words processed with less accurate timings indicates that the average time for words in this category is about 62 microseconds (standard deviation 26). table 6. lookup times category number of words of words all 627 common 288 noncommon 338 not found 1 percent of total 100.0 45.9 53.9 0.2 average time 57.9 49.9 64.7 53.1 standard deviation 11.7 6.7 10.7 0.0 average probes 1.18 1.12 1.24 1.00 the time to compute a hash address depends on the length of the word . let n be the number of s /360 full words needed to hold these characters. the time to form the initial address is i ( n) = 34.5 + 10.2 ( n-1) microseconds. the average total lookup time, then, is t = i(n) + cp where c is the average time per probe into the table space and p is the average number of probes. for the words in the experiment n = 2.32 (average), i ( n) = 40.3, and t = 57.9, so that each probe required about 15 microseconds. c ompadsons timing information for other lookup schemes is difficult to obtain. a treestructured dictionary is used for a similar purpose at harvard. published information indicates 6pq microseconds are needed to process p words in a dictionary of q entries. this time is for the ibm 7094. translating this time to the s/360/65, which is roughly four times faster, and using the adi dictionary ( q = 7822), it appears that each lookup averages 11,000 microseconds. exactly how much computation and input-output this includes is unknown. extensions larger dictionaries as more words are added to the dictionary, the size of the virtual address must increase in order to prevent collisions. as a result, the number of bits per table slot must also increase in order to accommodate the larger minors and pointers that are used. for a fixed-sized hash table, the number of entries in the bump table grows as new words are added. at some point the space required for tables will exceed the amount of core allotted for 200 journal of library automation vol. 3/ 3 september, 1970 dictionary use. to salvage the scheme, it may be possible to split the buinp table into parts-one part for more frequently used words and one for words in rather rare usage. during dictionary construction common words are entered first, then noncommon, then rare. when a rare word must be placed in a collision block, a marker is stored instead, and the item is placed in the secondary bump table. presumably the nature of the words in the second bump table will make its usage rather infrequent, thus saving access to auxiliary storage to fetch it. suffix removal many dictionary schemes store only word stems; the lookup attempts to match only the stem, disregarding suffixes in the process. this is not easily done with scatter storage schemes. one solution is to try to remove the suffix after an initial search has failed. 
each of the various possible stems must be looked up independently until a match is found. another solution is to use a table of correspondences between the various forms of a word and its stem. the concept number could be used as an index on th is table containing pointers to information about the actual stem. a thesauru s lookup can be handled the same way. application to library files library fil es-characterized by a large number of entries, personal and corporate names, foreign language excerpts, etc.-present special problems to lookups. with regard to size, there is no particular reason that scatter storage cannot b e extended to such files. the only genuine requirement is the ability to compute a virtual address long enough to insure a reasonably low number of collisions. as mentioned previously, table space can become a problem. for really large files, a two-stage process looks most promising. a small hash table is used to address high frequency items and a larger hash table is used for addressing all other data. lookup starts with the small tables and continues to the larger ones if the initial search fails. the same virtual address can be used in both lookups by shifting a few bits from the high-frequency minor to the low-frequency major. this two-stage technique should keep the amount of table shufbing to a minimum and provide rapid lookup for all textual data in titles, abstracts, etc. with respect to bibliographic information, personal and corporate names are bothersome because they can occur in several forms. unfortunately, scatter storage schemes do not guarantee that dictionary entries for r. a. jones and robert a. jones are near each other, so that if an initial lookup fails, the rest of the search can be confined to a local area of the file. there are two approaches to the problem : ( 1 ) standardization of names before input or ( 2) repeated lookups using variants of a name as it occurs in text. standardization, along with delimiting and formatting bibliographic data, is probably the most effective and least expensive approach. in addition, it reduces the amount of redundant data in the file. scatter storage for dictionary lookups/murray 201 phrases in foreign languages present a difficulty, since the character sets on most computing equipment are limited to english letters and symbols. however, if an encoding for such symbols is used, lookup can proceed normally. the problem of obtaining the dictionary entry for an english equivalent of a foreign word is a completely different matter and will not be dealt with here. conclusions virtual scatter storage schemes are well suited for dictionaries, having both rapid lookup and economy of storage. the rapid lookup is due to the fact that the initial table probe limits the search to only a few items. the space savings come from the fact that the actual character strings for words are not part of the dictionary. the schemes depend heavily on a good algorithm for producing random hash addresses. the theory developed in the first two sections of this paper gives a basis for judging the worth of proposed algorithms. for any particular application, the table organization may vary to suit different needs and to store different information. however, the advantages of scatter storage schemes are still present. references 1. salton, g.: "a document retrieval system for man-machine interaction." in association for computing machinery. proceedings of the 19th national conference, philadelphia, pennsylvania, august 25-27, 1964, pp. 
l2.3-l-l2.3-20. 2. mcilroy, m. d.: dynamic storage allocation (bell telephone laboratories, inc., 1965). 3. morris, r.: "scatter storage techniques," communications of the acm (january, 1968 ). 4. maurer, w. d.: "an improved hash code for scatter storage," communications of the acm (january, 1968) . 5. johnson, l. r.: "indirect chaining method for addressing on secondary keys," communications of the acm (may, 1961). reproduced with permission of the copyright owner. further reproduction prohibited without permission. using gis to measure in-library book-use behavior xia, jingfeng information technology and libraries; dec 2004; 23, 4; proquest pg. 184 that development rests with the local site. for the nwda consortium, this development , using the code base, ha s been manageable. the current stat e of interface dev elop ment for the nwda project can be reviewed at http :// nwda. wsulibs .wsu.edu / project_info /. conclusion in se lecting an ead searc handretrieval system, one important qu es tion for th e consortium was, which software solution had the best prosp ects for migration in the futur e? because of the inherent strength s of nativ e xml technology in comp ari son to the other product categories list ed in tabl e 1, a nati ve xml databas e appeared to be the best appro ach, and textml provided the best combination of licensi ng costs, software capabilities, and support. it is important to note that the distinctions betw een nativ e xml databas es and databases that support xml throu gh extensions (xmlenabled databa ses) ma y b eco me more difficult to dis cern over time, in part du e to the existi ng exper tise and in vestme nts in rdbms technologies. 16 nevertheless, capabilities central to native xml, such as the us e of an xml-based query language, are integral to th e success of such hybrid syst ems . references and notes 1. daniel pitti , "encoded archival de scriptio n: the development of a n encoding standard for archival finding aids ," the american archivist 60, no. 3 (summ er 1997): 269. 2. daniel pitti, "encod ed archival des cription: an introducti on and overvi ew," 0-lib magazine 5, no. 11 (nov. 1999). accessed nov. 2, 2004, www.dlib. org / dlib / november99 / 11 pitti.html. 3. daniel v. pitti and wendy m. duff (ed s.), "introduction," in encoded archival description on the internet (binghamton, n.y.: haworth, 2001), 3. 4. james m . roth, "serv ing up ead: an exp lorat o ry study on the deployment and utilization of encod ed archival description finding aids," the american archivist 64, no. 2 (fall /win ter 2001): 226. 5. sarah l. shreeves et al., "har ves ting cultural heritage metadata using the oai protocol," library hi tech 21, no. 2 (2003): 161. 6. nan cy fleck and michael sead le, "ead harvesting for the national ga llery of th e spoken word" (pap er present ed at th e coa liti on for netw orke d information fall 2002 task force meeting, san anton io, tex., dec. 2002). accessed nov. 2, 2004, www.cni .org/tfms/2002b. fall/handouts/h-ead-fleckseadle.doc. 7. anne j. gilliland -swe tland , "po pularizi ng th e finding aid : exploiting ead to enhance online discover y and retrieval," in encoded archival description on the internet (bing h a mton , n.y.: haworth, 2001), 207. 8. ibid, 210-14. 9. charlott e b. brown and brian e. c. schottlaender, "the onlin e arch ive of california: a consortia! approach to encode d archival descrip tion, " in encoded archival description on the internet (bingham ton, n .y.: haworth, 2001), 99. 10. ibid , 103-5. oac ava ilable at: www . 
o ac.c dlib.org / . accessed nov. 2, 2004. 11. christopher j. prom and thomas habing, "using the open archiv es initiative protocols w ith ead," in proceed ings of th e second acm/ieee-cs joint confe rence on digit al librari es (portland, ore., july 2002). accessed nov . 2, 2004, http:// dli .grai ng er.uiu c.edu / publications / jcdl20 02/ p14prom.pdf. 12. marc cyrenne, "go ing n at ive: wh en should you use a native xml database?" aim e-doc magazine 16, no . 6 (nov./dec. 2002), 16. accessed nov. 2, 2004, www. edo cmaga zine.com/ article_ n ew.as p?id=2 5421. 13. product categor y decisions ba sed upon definiti ons and classifications available from: ronald bourret, "xml database products." accessed nov. 2, 2004, www. rpbourret .com/ x ml / xmlda ta base prods .htm. 14. cyrenn e, "going native," 18. 15. bill stockting, "ead in a2a," microsoft power point pres entation. accessed n ov. 2, 2004, www.agad.a rchiwa . gov.pl/ ead /s tocking.pp t. 16. uwe hohenstein, "supp orting xml in oracle9i ," in akm a l b. chaudhri , 184 information technology and libraries i december 2004 awais rashid, and rob erto zic ar i (eds.), xml data management: native xml and xml-enabled database systems (boston: addison-wesley, 2003), 123-4. using gis to measure in-library book-use behavior jingfeng xia this article is an attempt to develop geographic information systems (gis) technology into an analytical tool for examining the relationships between the height of the bookshelves and the behavior of library readers in utilizing books within a library. the tool would contain a database to store book-use information and some gis maps to represent bookshelves. upon analyzing the data stored in the database, differen t frequencies of book use across bookshelf layers are displayed on the maps. the tool would provide a wonderful means of visualization through which analysts can quickly realize the spatial distribution of books used in a library. this article reveals that readers tend to pull books out of the bookshelf layers that are easily reachable by human eyes and hands, and thus opens some issues for librarians to reconsider the management of library collections. several years ago, when working as a library assistant reshelving books in a university librar y, the author noted that the majority of books used inside the library were from the mid-range laye rs of bookshelves. that is, b y proportion , few books pulled out by library readers were from the top or bottom layers. books on the layers that were easily reachable by readers were frequentl y utilized . such a book-us e distribution patt ern made the job of reshelving books easy, but created some inquiries: how could book locati ons influ ence th e choices of readers in selecting books? if this was not an isolated observation, it must have exposed an inter es ting reproduced with permission of the copyright owner. further reproduction prohibited without permission. phenomenon that librarians needed to pay attention to . then , by finding out the reasons , librarians might becom e capable of guiding, to some extent , us ers' selectiv eness on library books by deliberately arranging collections at design ated heights on book sh elves. a research study was designed to develop geographical information systems (gis) into an analytical tool to examine former casual observations by the author. the study was conducted in the mackimmie library at the university of calgary. 
thi s paper highlights th e results of the study that aimed at assessing th e behavior of library readers in pulling out books from bookshelves . thes e book s, when not checked out, are categoriz ed as "pickup books" becau se they are usually discarded inside a library after use and then picked up by library assistants for reshelving. like many other libraries , the mackimmie library does not encourage reasd ers to reshelve books th emse lves. arcview, a gis software, was selected to develop th e tool for this study because gis ha s the functions of dynamicall y analyzing and di splayin g spatial data. the research on library readers pullin g out books involv es the measur emen ts of bookshelf heights, and thu s deals with spatial coordinates. with the capability of presenting book shelves in different views on map s, gis is able to provide readers with an easy und erstanding of the anal ytical results in visual forms, which make any textu al description s wordy . at the same time, some gis products are available now in most academic libraries, thus giving develop ers convenient access to use. hypothesis when library users decide to check books out of a library, the se books are what the y think of as useful. peopl e are usually hesitant to carry home books that are of little or uncertain use, not only because of the limit on the numb er of check-out books , but also bec ause of the physical work required for carrying them. moreover, some items, such as periodicals and multimedia materials, are either designated as "refe rence only" or have a very short loan period . it is reasonable to beli eve that user s carefully select what they want from library collections and keep these book s for handy use outside the library. by contrast , in-library book use repre sents a different category of library readers' behavior . there are two general categories of in-library book us e: readers bringin g their own books into a library for use, and readers pulling out book s from bookshelves inside a librar y. the former is commonly seen when students study textbook s for examinations (not the topic of this study), whil e the latter is a little more complex. 1 as library users approach bookshelves to extract book s, th ey may or may not hav e a definit e target. when coming with call numb ers, peopl e will deliberately draw the books they want for reading, photoc opyi ng , or referencing. ho wever, there are time s when user s on ly wander in bookshelf aisles of desired collections, uncertain about singling out specific books . th ey may simply shelf-shop to randomly select whatever is interesting to them, or they may locate a subject of need and go to the storage position(s) to look for whatever books are there. no matter what these readers' intention s are, they roam among collections, pick book s for quick u se, and leave them inside the library after use, although some materials may also be checked out. because of such arbitrary selections from library collections , physical con venie nce sometimes influence s library users in takin g books from booksh elves-they ma y look around for books on bookshelf layers that are at a reach able height. the standard library bookshelf is hi gher than the average person's height and is structured to have five to eigh t layers. in aca demic libraries, "wood shelving is available in three heights: 82 in. (2050 mm), with a bottom shelf and six ad justabl e shelves; 60 in. (1500 mm), with a b ottom shelf and four adjustable she lves; and 42 in. 
(1050 mm), with a bottom shelf and two adjustable shel ves ." 2 for regular collections in mo st academic libraries, bookshelve s are usually about eightytwo inches high and hav e seven layers. books on the top lay er are out of reach for many reader s, requiring them to use a ladder to draw a book from it. many users are hesitant to use ladders. even worse, a reader will have to bend over or squ at down to view the contents of books on the bottom layer of a bookshelf . hence , the hypothe sis is that books used inside a library are primarily distribut ed among the mid-ranged layers of bookshelves. specifically, if a bookshelf ha s seven lay ers, books placed on layers two through six are most frequently consulted. this is the subject of this research paper . background a considerable number of studies have investigated the utilization of books that are checked out of a library. an esti mate made in 1967 pointed out that over seven hundred research results pertained to this topic. ' how ever, the situation of books used inside a library has not been given enough attention. one of the reasons for this seeming neglect comes from the belief that the records of library book s in circulation provide similar info rma tion as those of books used within libraries." thi s misunderstanding wa s lately criticized by other researchers who discov ere d the differences in use behavior between jingfeng xia (jxia@email.arizona.edu) is a student at the school of information resources and library science at the university of arizona, tucson. using gis to measure in-library book-use behavior i xia 185 reproduced with permission of the copyright owner. further reproduction prohibited without permission. libr ary readers takin g books h ome and those using books inside libraries. 5 research ers hav e now recog ni zed that correlations between the two sets of data are n ot as strong as they seemed to be. such reco gnition, unfortunately, ha s not resulted in mor e consequ ent work to explor e the issu e of in-lib rary book use. this is probabl y due to th e difficulties of co llecting data or the la ck of appropriate research methods .6 also, the majority of rele va nt surv eys w ere conducted several de cades ago and focu sed primaril y on exp loring a go od method of sam plin g in-library book us e.7 am ong the se studies , fu ssler and simon preferr ed to carry out researc h by distributing questionnaires am ong library reader s; drott u sed randomsampling m et hods to statisti ca lly examine th e importance of librar ybook use; and jain, as well as salv erso n, emphasized dividing th e survey time s into differ ent investi gation units when conducting res earch. simil a rly, m orse point ed out the compl ex ity of measurin g lib rarybook u se a t wo rk , advocating an involv ement of computerized operation s in librar y-book man ag ement. the sampling strategies and analy tical methods implemented in pa st studie s are still applicable to curr ent res earc h. non etheless, because many new technol ogie s ha ve come into view since th en, it is quite likel y tha t som e new ways of obtaining and analy zing th e d ata of in-library book use can now be developed. th e n ew app roac hes must have the capability of providing not only accurate m easurem ent of the data but also the me ans for easy manipulation . th eir result s must be able to enhance th e und ers tandin g of us er behavio r in expl ori ng th e reso urc es of existing collection inv entorie s . one of th e solutions is an analytic al tool. 
an analytical tool can control data collection and anal ysis by computerizati on . if the system is ab le to accumul ate const antly upd ated records ov er time, it will remedy the probl em of poor sampling th at man y researchers hav e encount ered, be cause an alysis will then b e done on all the data rather th an w ith certain isolated samples. the development of m odern technologi es makes such data collection and storage po ssible and easier than ever before. on e exampl e of the technologi es is the radio freque ncy identification (rfid) tag system that ha s been adopted b y some public and acade mic librar ies recently.8 thi s system stores a tag in each librar y item with the item's biblio gra phic information, and uses an antenna to keep tr ack of th e tag. by automatically communicating with dat a stored in the tags, the system can collect dat a on all librar y collections in a timel y manner and export them into pred esigned d atabases for easy man ag ement. data an a lys is and pres enta tion comprise ano ther p ar t of the an aly tical mechani sm. researc hers h ave to carefully evaluate existing technologies in order to select prop er produ cts or de ve lop parti cular pro gra ms to integrate with rfid (if used) and th e databases. it is fortunate th at gis techno log y is available with numerous functi ons for analyzing and demonstrating data , especiall y spatial data. da ta visuali za tion through gis produ cts has been very good, which giv es them advantages over other analytic al, stati stical, or repor tin g produ cts. combining rfid and gis into one system would seem to be th e perfect solution-the former can effective ly carry out dat a collection and th e latter can efficiently perfo rm data analysis and presentation. h ow ever, while gis products h ave been u sed in libraries in the unit ed states for more th an a dec ade, mo st academic libraries are hesitant to invest in rfid because of its high costs . gis technology alone, however , can still provide sufficient functions to be dev eloped into such an analytical tool. up to n ow, tho se librarie s that have provid ed gis serv ices only use the software that assists in the utilization of geospatial data and map186 information technology and libraries i december 2004 ping te chnologie s for users .9 gis is not expl oited enou gh to aid the manage ment of librari es them selves and the res earch of librar y collections. some commercial gis software, such as "lib rary decision" by civic-technologi es, has be en recently marketed to support the analysis of libraryuser d a ta for public libraries. 10 ho wev er, it only wor ks w ell on the data of conventional geographical nature, that is, th e distributi on and location of librari es and th eir users with the mapping of city bl ocks and streets . it does not app ly to a librar y an d its books, and especiall y not to the distribution of books us ed insid e th e librar y. such products are also not ap plicabl e to acad emic librari es that do not always concentrate on the ana lysis of geog rap hical area s of their us ers. even so, gis h as all the function s that such a propos ed analytical tool demands. it is suit able for assisting in the research of in-library book us e where library floor layout s or other facilities can be d raw n into maps on multiple-dimensional views. 
at the same time, bookshelves with individual layers can be treated as an innovative form of map by gis technology (see figure 1), making visible the relationship of book use to the height of the bookshelf. as soon as the presentation mechanism is linked to databases, any updates on book use will be mirrored visually.
method
this project is one of a series of projects for developing gis into a tool to manage and analyze the usage characteristics of library books. the other projects include using gis to measure book usability for the development of collection inventories; to assist in the management of library physical space and facilities; and to locate library items.11 in order to make gis workable for the subject of this paper, the focus was placed only on the exploration of correlations between bookshelf heights and book-use frequencies in an academic library environment.
figure 1. the front view of one bookshelf rack on the fifth floor of the university of calgary mackimmie library. eight bookshelves assemble the range. here, different shades of color represent the numbers of books used on each individual layer. the display is only for demonstration and not to actual scale.
there are two major steps to conducting this research: collecting data and developing a gis analytical tool. since mackimmie library did not invest in rfid at the time this research was undertaken, personal observations were made to record book-use data.12 the development of the gis tool involves creating a small database to store data and facilitate data analysis. it also requires creating several bookshelf and shelf-range maps to present analytical results in visualized forms. arcview, the most popular gis product in the world, was utilized for the development. this paper presents only a portion of collection areas at mackimmie library. part of the fifth floor, where some collections of humanities and social sciences are stored, was selected because this floor is among the busiest of the floors used by readers. it is filled with sixty-eight ranges of bookshelves containing books from call numbers b to du. the terms used in this paper include bookshelf, referring to one unit of furniture fitted with horizontal shelves to hold books; rack, which includes more than one bookshelf standing together in a line; and range, composed of two racks standing back-to-back. bookshelves on the fifth floor are arranged to surround a group of facility rooms in the central area. study corridors are set between bookshelves and the wall. each bookshelf range consists of two bookshelf racks, each of which in turn has eight individual bookshelves. all of the bookshelves are about eighty-two inches high and have seven layers. the layers, except for the top ones that are open, are equal in height, width, and length.
data collection
personal surveys were taken by the author to note down each call number of books that were not in their original positions on the shelves, but instead were found discarded on the floor, tables, chairs, sofas, or on top or in front of other stocked books. books on the shelving carts are also accounted for.
the surveys were separately conducted three times a day, morning, afternoon, and evening, in order to catch as many books used in a day as possible. to avoid recording the same book more than once, no duplicate call numbers were accepted for any single day even though the same book was found in different locations on that day. on the other hand, the same call number could be entered into the records on the second day although it was recorded the day before and remained in the same place without being picked up by library assistants. (this duplicate recording was very rare because of the routine work of book pickup by library assistants.) a period of two weeks was designated for the survey in the first half of december 2002. the final examination week was planned because it represents a week of heavy book use, although previous research found that readers in this week tended to use library collections less than their own study materials.13 a supplementary survey that also lasted two weeks, including a final examination week, was conducted in the library in late spring 2004. to simplify the research, some exceptions were established for data collection. periodicals were excluded because they have a very short loan period (generally one day). library users may prefer to read articles in journals within the library and thus will have a clear idea as to what materials to read.14 books belonging to other floors of the library, or books belonging to the fifth floor but found outside the area, were not included in the analysis. furthermore, due to the nature and time limit of these observations, books pulled out of targeted bookshelves were not distinguished from books taken from bookshelves at random. this information can only become available through interviews with library users, which can be another research project. each bookshelf layer was recorded with and signified by two call numbers: the start and end numbers of books. for example, the call numbers "bf1999 .k54" to "bh21 .b35 1965," representing books stored on a particular layer, were recorded to identify that layer. because book shifting can happen from time to time, such recording of start and end call numbers for individual bookshelf layers only reflects the conditions when this research was undertaken and may need updates whenever changes occur.
data manipulation and visualization
using a bookshelf layer as the recording unit is essential for the analysis of the relationship between book use and bookshelf height. each book used can be classified to fit in one unit according to the call number of the book. therefore, building a database with a table for layers will be an important part in the development of such an analytical tool. the layers table includes a data field as an identifier to stand for the sequence of each layer (1 for the top layer, 2 for the next layer down, and so on), in addition to storing the start and end call numbers of books for each layer. if more than one bookshelf in the library has seven layers, layer identifiers will iterate from bookshelf to bookshelf.
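the classification step just described, matching each recorded call number against a layer's start and end call numbers, can be pictured with a short sketch. the snippet below is a minimal illustration only, not part of the original study: the layer list, the sample call numbers, and the plain string comparison are all assumptions, and a production version would need proper lc call-number normalization before comparing.

    # hypothetical sketch: assign each observed call number to the bookshelf layer
    # whose recorded start/end call numbers bracket it. plain string comparison is
    # used here for illustration; real lc call numbers need normalization first.
    layers = [
        # (shelf_id, layer_id, start_call_no, end_call_no) -- illustrative values only
        ("shelf-01", 1, "BF1 .A1",     "BF1998 .Z9"),
        ("shelf-01", 2, "BF1999 .K54", "BH21 .B35 1965"),
    ]

    def assign_layer(call_no, layers):
        """return (shelf_id, layer_id) for the layer whose range contains call_no."""
        for shelf_id, layer_id, start, end in layers:
            if start <= call_no <= end:
                return shelf_id, layer_id
        return None  # book belongs to another floor or range and is excluded

    observed = ["BF2050 .M3", "BG100 .T7"]          # call numbers noted during a survey pass
    counts = {}
    for call_no in observed:
        unit = assign_layer(call_no, layers)
        if unit is not None:
            counts[unit] = counts.get(unit, 0) + 1  # one use tallied per layer
    print(counts)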
therefore, this table will also need an identifier for each individual bookshelf with which layers are associated. the database will also contain such information as bookshelf ranges, bookshelf racks, and books, all of which are individual database tables that are joined with each other by relational keys. among them, the ranges table is simply characterized by its identifier, and is designed to represent two racks of bookshelves that stand back to back. the bookshelves table is identified by the call numbers of the start and end books stored across individual bookshelves rather than on individual layers. furthermore, the books table is primarily filled with the data of individual book call numbers as well as book pickup times and book discard locations. gis has limited ability for organizing database structure. if necessary, other database management systems, such as microsoft access, can be incorporated. query codes are built to get summarized information for specific purposes, and the aggregated data are exported into gis databases for further spatial analysis or convenient visual presentation. data visualization can be shown at different levels: by layer, bookshelf, rack, and range. the first attempt at making a visual demonstration of this research is for the area of individual bookshelves at layer level (see figure 1). the following query will return necessary summarized information:
select count(b.call_no) as total_num, l.layer_id, l.shelf_id from (books b inner join layers l on b.some_id = l.some_id) where b.call_no > l.start_no and b.call_no < l.end_no group by l.layer_id, l.shelf_id order by l.shelf_id, l.layer_id.
at the same time, another attempt is made to demonstrate book numbers per layer, at bookshelf level, across multiple bookshelf ranges. this demonstration provides a better visualization in the gis display so that an overall view of the height distributions of book usage over certain collection areas can be presented (see figures 2 and 3). to achieve such visualization, data must be compared in order to get information about which layer of a bookshelf contains the most frequently used books and which holds those that are rarely visited. this demonstration indicates that any alternative selection of analytical-display units can be easily performed by making modifications on the query that works on aggregating data. technically, data visualization can be presented by using any gis software, although arcview is used here because it has been available in the systems of many academic libraries. bookshelf ranges on mackimmie library's fifth floor were drawn into map features. in order to show them with a three-dimensional view, each of the seven layers was given a sequential number as its height value, and all bookshelves were treated as having the same height. these height values are treated as the z values in any three-dimensional analysis. then, by associating the numbers of books from the database with the heights of layers on the map, arcview is able to sketch the height distributions of in-library book use in new perspectives, dramatically improving the understanding of book use. in order to implement the visualization of all layers across a bookshelf range, layers were drawn as map features (see figure 1). layer heights and widths are in appropriate proportion.
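because the paper names the tables and fields only in prose, the following sketch is one possible reading of that design, not the author's actual schema: the table and column names (layers, books, shelf_id, layer_id, start_no, end_no, call_no) are assumptions, and the count-per-layer query simply restates the aggregation described above in a form that can be exported to the gis display.

    import sqlite3

    # hypothetical sketch of the layers/books tables described in the text and the
    # layer-level aggregation that feeds the gis display; all names are illustrative.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    create table layers (
        shelf_id text, layer_id integer,   -- 1 = top layer, 2 = next layer down, ...
        start_no text, end_no text         -- start/end call numbers stored on the layer
    );
    create table books (
        call_no text, pickup_time text, discard_location text
    );
    """)
    con.executemany("insert into layers values (?,?,?,?)", [
        ("shelf-01", 1, "B1 .A1",  "B9999 .Z9"),
        ("shelf-01", 2, "BC1 .A1", "BD9999 .Z9"),
    ])
    con.executemany("insert into books values (?,?,?)", [
        ("B1190 .M3", "2002-12-02 10:15", "table"),
        ("BC71 .K4",  "2002-12-02 14:40", "floor"),
        ("BC108 .P7", "2002-12-03 19:05", "shelving cart"),
    ])

    # count of used books per layer, per bookshelf -- the figure the maps are shaded by
    rows = con.execute("""
        select l.shelf_id, l.layer_id, count(*) as total_num
        from books b join layers l
          on b.call_no between l.start_no and l.end_no
        group by l.shelf_id, l.layer_id
        order by l.shelf_id, l.layer_id
    """).fetchall()
    print(rows)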
(individual books on each layer are for demonstration only, and thus are not in the exact shape and number.) figure 1 shows how a bookshelf rack has been presented as a gis map, which is a totally new idea in the applications of gis visualization. the database and visualization mechanism constitute what is referred to in this paper as the analytical tool. one will find that the development is relatively easy and the tool is incredibly simple. however, it is a dynamic device. if expanded into other parts of the library collections, this tool will become an integrated system that is able to assist in the management of library book use and
figure 2. a three-dimensional view of bookshelf ranges on the fifth floor at the mackimmie library. the height of each bookshelf represents the corresponding height of the layer from which most books were removed. this display is not to actual scale.
automation of acquisitions/carter
ming, because the computer center did not have the full-time personnel to support a major new effort. this was resolved by hiring a programmer on a special three-month contract running from april 15 to july 15, 1971. prior to implementation, the library was forced to rely on the availability of keypunch machines at the computer center. in september 1971, an ibm model 129 keypunch and verifier was installed in the technical services department of the library. a model 129 was chosen for the library in conformance with the initial requirement set by the director of the computer center: that all library data for the computer be verified. this has proven to be a wise decision, as we have had relatively limited problems with invalid or erroneous data.
requirements specification phase (analysis)
three weeks were allowed for identification and specification of all output desired from the initial system. many of these requirements were alluded to in the preliminary list of criteria for the system. to meet the library's needs we decided that the system must produce: purchase orders, individual order cards (including a copy used to order catalog cards from the library of congress), budget statements including all encumbrances and payments as well as other financial data, lists of all books on order or in process or cancelled, notices to vendors regarding items on order more than 120 days, notices to each faculty member of the additions to the collection of items they requested complete with call number, and a monthly accession list of all newly cataloged items that could be circulated to all faculty members.
development steps required | time | date to start | date to complete
i. requirements specifications | 3 weeks | feb. 15 | march 5
ii. detailed design-system flow | 3 weeks | march 8 | march 26
iii. detailed design-programming specifications | 10 weeks | march 29 | june 4
iv. programming-acquisitions | 10 weeks | april 15 | june 23
v. programming-materials accessioned | 3 weeks | june 24 | july 14
vi. computer program system test-acquisitions & materials accessioned | 2½ weeks | july 1 | july 26
vii. implementation | | july 1971 |
fig. 2.
time estimate for automation of acquisitions at parkland college as submitted in january 1971. a beginning and ending date for each phase is indicated and the actual time in weeks required is shown.
once it was known what forms were required, orders were placed for the necessary pre-printed forms. with some outside advice in the matter of forms suppliers, specifications for three new forms were delineated, two of which would be for use on the computer. the first form encountered in outlining the acquisitions process was a request form. the request form is used to make a record of all items ordered and to serve as a checklist in the searching process (see figure 3). later, it is stamped with a six-position control number and serves as the source document for keypunching new orders, which require three input cards per item ordered. the request form is then retained in control-number sequence until the item has completed its way through the technical services process. specifications for the purchase orders were drawn up by parkland's business manager. the machine-generated purchase orders used by parkland are almost identical to the conventional manual purchase orders used throughout the college. in this case, automation of the library's purchase orders is a likely precursor to automation of the purchase orders for the remainder of the college. the most complicated form to design, from the library's viewpoint, was the individual order form. this was required in five parts, including a copy complying with library of congress specifications for use with ocr equipment. (this is illustrated in figure 4.)
fig. 3. request form, used as a control record for each item ordered. (the form carries searching checkboxes plus fields for fund, vendor, format code, author, title/vol., publisher, year, no. of copies, series/edition, lc card no., requester, control no., order code, price, and sbn.)
fig. 4. copies one and two of the multiple-part order form.
it was important to determine forms requirements early, as it was anticipated that several months' time would elapse before they would be received.
naturally, it was desired that the forms be on hand by the time the programs would be ready for testing, which was planned for late june or early july. one of the most critical parts of the requirements specification phase was the determination of data elements to be included in the master records. perhaps the most perplexing of those possibilities considered was subject headings. since we wanted an open-ended system which would leave us some room for future development, without major modifications, a decision was made to include three 50-character subject headings in each record. here we were limited because of the decision made (for purposes of simplicity of design and programming) to confine the system to fixed-length records. it was considered desirable for storage purposes to keep the master record length within 400 characters. while the decision on subject headings may prove to be adequate in the long run, it does give parkland's library a good starting point for some projects using subject headings, such as developing bibliographies on demand. despite possible future modifications to the data base, all items going into the history (master) file included headings as defined above. additional determinations made in the initial phase regarded files to be maintained. here a crucial factor was the physical limitations of the college's computer system. as only two tape drives and two disk drives comprised the primary storage facilities, the capability for performing sorts was limited. in fact, one of the disk drives was reserved strictly for systems programs, and could not be utilized directly by the library. this contributed to the decision to maintain separate on-order and in-process files, as well as a history file on tape. the college vendor file and the library budget file are maintained on disk. a final area of effort in the initial phase was developing codes to be utilized throughout the system. naturally, many conditions would be indicated in the computer records by the use of a one- or two-position code. one example is the format code, a one-position code, which indicates the types of items used, such as: b=book, r=record, and s=filmstrip.
design phase-system flow
three weeks were allotted to developing the overall systems flow chart. this time was spent working out each separate program that would be required, and flow-charting the entire series of programs. a flow chart of the system (without minor additions dating after september 1971) is shown in figure 5. however, it does not necessarily indicate the sequence in which programs are run. in general, maintenance of each of the separate files is run prior to new data. this procedure has proved to work well.
fig. 5. system flow chart.
in most cases, pre-sorting of card input is provided. this decision was not based on optimum efficiency but on the compatibility with routine procedures and facilities in the computer center.
design phase-program specifications
one of the most significant parts of the development of parkland's automated library acquisitions system is the exhaustive documentation provided by detailed written specifications for each program in the system.
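the fixed-length master record described earlier in this section lends itself to a brief sketch before moving on to the program specifications. the layout below is hypothetical, not parkland's actual record design: the field names, widths, and order are invented for demonstration, and only the constraints stated in the article (three 50-character subject headings, a one-position format code, a total length held within 400 characters) are taken from the text; the sample values echo the order shown in figure 4.

    # hypothetical fixed-length record layout; field names and widths are invented,
    # except for the documented constraints (three 50-char subject headings,
    # one-position format code, total length kept within 400 characters).
    FIELDS = [
        ("control_no", 6), ("format_code", 1),            # e.g. b=book, r=record, s=filmstrip
        ("author", 40), ("title", 60),
        ("vendor", 20), ("fund", 12), ("order_date", 5),  # julian date, e.g. 72014
        ("subject_1", 50), ("subject_2", 50), ("subject_3", 50),
    ]
    RECORD_LEN = sum(width for _, width in FIELDS)
    assert RECORD_LEN <= 400, "master record must stay within 400 characters"

    def pack(record):
        """pad or truncate each field to its fixed width and concatenate."""
        return "".join(str(record.get(name, "")).ljust(width)[:width]
                       for name, width in FIELDS)

    def unpack(line):
        """slice a fixed-length line back into named fields."""
        out, pos = {}, 0
        for name, width in FIELDS:
            out[name] = line[pos:pos + width].rstrip()
            pos += width
        return out

    row = pack({"control_no": "103921", "format_code": "b",
                "author": "westheimer, david", "title": "lighter than a feather",
                "order_date": "72014", "subject_1": "fiction"})
    assert unpack(row)["title"] == "lighter than a feather"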
each program, including utilities such as sorts, was assigned a job number and then described under each of the following topics: purpose, frequency, definitions (any unusual terms), input, output, and method. a format was provided for each input and output, whether it was a card, tape, disk, list, or other printed report or form. these accompanied each individual program specification. the method section is particularly important. here the librarian-analyst stated the procedure used to arrive at the given output based on the given input. any necessary constants were defined. because the librarian-analyst has had programming training, these specifications are detailed to the point where the programmer does not have to do much more than code the problem, making it possible for programming to proceed quickly. this thorough problem definition for each program by the librarian-analyst was one of the major factors (perhaps the primary key) in our success in acquisitions being accomplished rapidly and efficiently. it had the advantage of obviating the need for a senior programmer, or for having someone from the computer center become highly involved in the analysis of library details. furthermore, and perhaps most important, is the fact that it provides the detailed documentation of the system. there should be no doubt as to the procedures within each program. an example of a specification for one of the programs in the parkland college library acquisition series is presented in the appendix. it should be mentioned that most of the programs are written in cobol. there are a few in assembler, and some minimal use is made of rpg.
testing of the program
the original plans called for testing with test data which would proceed simultaneously with programming. however, as things developed, most coding was done prior to very much testing. as a result, the period originally devoted to live-data testing of the whole system was instead devoted to testing the programs with test data. thus, in early july, we were about two weeks behind the original time estimate, and that is where it ended up. the usual problems showed up in testing with test data. moreover, during the first week of july, it was learned that the business office was changing the length of the account numbers from 9 to 11 positions. fortunately, space had been planned for up to a 12-position field, so the lengthened number could be easily accommodated by the system. however, the changing of numbers required modification of any program which edited data for valid account numbers. this was a minor problem and easily resolved. on july 15 the programmer completed the job for which he was hired, i.e., to complete a programming and systems test utilizing live data and to make appropriate changes as identified during testing. since not even test-data testing was complete on july 15, he stayed until july 20 and finished that work. meanwhile, the director of the computer center had already selected the individual to be the operator when the library's jobs were being run on a regular basis. this employee would also provide program maintenance. on july 21, this permanent staff member took over programming. for the next two weeks, while summer school classes were in session, most of the trial runs of the library series had to be done during evenings, nights, and on weekends. by the end of july, most of the major bugs appeared to be out of the programs.
impact on technical services
success on the first usable purchase order and order cards came on august 3. within the next day or two, a workable budget statement was produced along with a wits list (work in technical services). by august 13, when the vacation time came, nearly one thousand books had been ordered via the automated system. while a few bugs remained to be dealt with in september, the system was accomplishing its basic mission essentially on time. it took less than eight months to identify requirements, and design, program, and test a system consisting of twenty-seven programs in its original design! during the remainder of 1971, various bugs were found, and, it is to be hoped, eliminated from the system. more bugs occurred in the budget series than in any other single segment of the system. over a period of several months, these were worked out; as of march 1972, the budget sequence of programs worked smoothly.
implementation
following the implementation of the automated technical services system, several effects were evident. an obvious effect was the saving of two to three days per month formerly spent on bookkeeping. on the other hand, one permanent staff member was added to technical services because of the keypunching workload. this addition had two causes: the keypunching load, and the fact that many more books were ordered directly from publishers with a consequent major increase in processing in-house. therefore, much of what was expended in salary for the extra clerk was saved by eliminating most prepaid processing costs. for several months after implementation, some duplication of effort was required, especially by acquisitions personnel. thus, the total effect on changing the nature of work was not immediately obvious. by march 1972, duplication was essentially phased out, and more realistic assessments of the impact of automation in changing the nature of the workload are now being made. one of the most obvious changes is the increased number of bills to be approved for payment. by utilizing the computer to batch purchase orders and order cards, almost all materials are now ordered directly from publishers, rather than pre-processed from a jobber. although the speed by which items are received and processed has increased substantially, there has been a corresponding increase in paper work in this regard.
additional services
besides the immediate effects of the automation of acquisitions within technical services, other parts of the library and the college felt the impact. this is especially true of reference, which now has a weekly updated listing of all items on order, in process, or cataloged within the last month, in both author/main entry and title sequence. budget statements are now available to the director of the learning resource center and other personnel on a weekly rather than monthly basis. not only are they received sooner, but they provide more information than is present in the statement originating from the computer center. a useful fringe benefit is the availability of overdue notices to vendors when items have been on order more than 120 days. a computer-generated notice is sent each week to faculty members regarding items requested, cancelled, or cataloged. the response of the library staff and the rest of the faculty to the automated system has been very favorable.
cost
at this date (march 1972), costs are difficult to assess, but certainly seem minimal.
the only direct costs are the installation of a 129 keypunch, which rents for $170 per month, plus the salary of the extra staff member for keypunching. however, the extra salary is compensated for by no longer ordering items pre-processed at an average cost of $2.05 per item. naturally, there is some local cost for processing materials such as pockets and labels, but it is minor on a per-volume basis. in addition, by being processed locally, materials are available to the users much more rapidly. among other costs, the learning resource center had to pay a three-month salary for a programmer. other computer support, whether personnel or machine time, has not been directly billed to the library. analyst time is absorbed, in part, in general library salaries, as the librarian-analyst is also head of technical services and is responsible for original cataloging. about one-half of her time is devoted to automation activities. as an indirect cost of automation, it is reasonable to include the cost of a special summer project contract of about $1500 for the reference librarian to catalog a-v materials. this was necessary because the librarian-analyst was directly involved with automation, thus not able to keep up with all media of materials to be cataloged. purchase-order forms previously covered by the business office budget cost the library $900. however, it was a two-year supply which was paid for by money the college, if not the library, would have expended anyway. the multiple-order forms for computer use exceed the cost of more standard forms by several hundred dollars per year. the library also expends about $400 per year to buy punch cards and magnetic tape. some direct savings resulted from what are by-products of the automated system, but which were previously done manually. these include production of a monthly accession list and notices to faculty members of items they requested which were ordered, cancelled, or cataloged. the accession list was previously compiled by xeroxing in ten copies the shelflist card for all items added to the collection during a month. this involved both xerox charges and student assistant time. notices to faculty were previously sent out by both the order and processing sections. now these notices are consolidated, which produces savings in addressing time, as well as eliminating manual production of each notice. overall, in calculating costs and savings, direct and indirect, it appears at this point that parkland has automated many library routines very inexpensively, although specific cost figures remain to be determined. with the availability of a similar computer, many other libraries should be able to undertake automation of certain basic functions without large expenditures of either money or personnel time.
problems
as with all automated efforts, some problems were encountered at almost every stage of development. taken as a whole, these were minor and, for the most part, few hitches were encountered. however, so that others may profit from the library automation experience at parkland, those problems will be discussed. the major problem was the original programmer of the series. this person was not a regular employee of parkland and was not concerned with being retained. since he was not part of the staff, he worked erratically and frequently was hard to get hold of.
we were working on a tight time schedule, and it was very important to maintain close supervision of the progress being made, although sometimes this was difficult. in addition, even though it was strongly desired that tests be conducted throughout the three-month period, the programmer waited until all coding and compiling was completed before beginning even test-data testing with most programs. fortunately, it worked out satisfactorily, as the regular staff member of the computer center, who presently runs our jobs and does program maintenance, took over in mid-july and was available for live-data tests. all staff members directly involved with automation worked very hard the last two weeks of july and the first week of august to complete testing with live data. the programs were further refined during august and september, and most of the bugs were out by early fall. naturally, changes in specifications continued to be made, and our acquisitions system is definitely not static. the lesson we learned from the experience with the initial programmer is that, if a regular staff member of the institution can be assigned to the development of programs for the library, avoiding other assignments during that time period, a more satisfactory response can be achieved from the programmer. also, in such an operation it would be possible to monitor progress on a more regular basis. another group of problems arose in connection with the new forms required for the automated system. fortunately, these were not serious. the forms arrived later than they were promised, and, without exception, their cost was about 25 percent more than the original estimates. because custom forms can take a long time to be completed, it is wise to identify output requirements early in the development of an automated system, so that the forms can be completed and delivered when the system is ready for final testing and implementation. a few minor problems revolved around decisions made in file design. for conserving space and holding down the size of the master record, it was decided to pack numerical fields. this would have been satisfactory if packing had been limited to such fields as the julian date, such as 72001 rather than 01-01-72. (this form of the date was used to provide easy computation when calculating overdue orders.) unfortunately, fields such as the numerical part of the lc card number and the parkland college account numbers were also packed. no problem existed except when the lc card number was blank at order time; then the lc number printed as zeros. of course, these could be suppressed once the problem was identified, although it was decided to make space to unpack the field. it was learned that packed fields always print zero when unpacked, unless this is specifically suppressed, and also that it is impossible to debug packed fields on routine file dumps that are requested with provisions for unpacking and reformatting the dump. this is because packed fields print blank when they are dumped. other minor difficulties included: 1. the print chain did not print colons or semi-colons, except as zero; therefore, the library's records all contain commas instead. 2. in the midst of programming the account numbers, all the college's funds were changed, thus requiring the change of constants and edit criteria in many programs. 3.
as originally specified for input, the lc classification number did not sort in shelf-list order; for instance, bf 8 sorted after bf 21. this was eventually remedied by left-justifying the letters and right-justifying the numbers within separate fixed fields (a small sketch of this sort-key scheme follows at the end of this passage). 4. routine delays for machine repair and maintenance were a concern, since it is necessary to adhere to a tight schedule in systems development.
future development
as is so frequently the case, now that parkland is committed to automated functions within the library, more and more applications are seen. even the former skeptics on the staff are enthusiastic, and all the professionals have made suggestions for the future. several additions to the acquisitions system were made in the first six months following implementation of the system. these included a list of purchase orders sequenced by vendor and enlarging the machine-generated notices to faculty requestors to cover items ordered and cancelled. various additions have been made in several programs originally part of the system, which expand the services the system can provide for the library staff. many more minor modifications and supplementary features in acquisitions have been identified for inclusion in the system, and will be added as time permits. the first additional area to benefit directly by the computer availability has been periodicals. without involving complicated programming, the periodicals holdings have been converted to a card file which is then listed directly, card by card, without changes, except for suppression of a control and sequence number. nothing more is planned for periodicals in the near future, because the new card file enables the master holdings list of 800 titles to be updated in technical services by the periodicals assistant, who also keypunches one-half time. the time-consuming retyping of the holdings list is now eliminated, and multiple copies of up-to-date holdings lists can be produced more frequently with less effort. another new area for which programming specifications were released in december 1971 is reference. in this system it is hoped that subject bibliographies and holdings lists, based on library of congress classification, can be produced. this system will have a multitude of purposes, one of the primary ones being to give better service to our faculty members. we get many requests for copies of portions of our shelflist or other extracts of holdings. rather than filling these requests by xeroxing cards or tedious typing, a few extract specifications will permit computerized retrieval and printing. also, search time in the catalog will be cut down considerably. in the subject bibliographies, the library plans to be able to extract on any heading, stem of a heading, or any part of a heading, thus getting much more flexibility than in manual use of the card catalog. programming for this is currently under way, and after the system has been completed and is operational, some interesting results should be identified. by including three subject headings of fifty characters in our original file design, it was possible to design and program the reference series as a spin-off of the acquisitions-technical services system with a minimum of additional effort.
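the sort fix described in problem 3 above can be shown with a short sketch. it is a hypothetical reconstruction of the idea, not parkland's code (which was written in cobol for fixed fields): letters are left-justified and numbers right-justified in separate fixed-width fields, so that a plain character sort yields shelf-list order and bf 8 collates before bf 21.

    # hypothetical sketch of the call-number sort fix: split the class letters from
    # the class number, left-justify the letters and right-justify the number in
    # fixed-width fields, so plain character sorting yields shelf-list order.
    import re

    def sort_key(class_no, letter_width=3, number_width=6):
        m = re.match(r"([A-Z]+)\s*(\d+)", class_no.upper())
        letters, digits = m.group(1), m.group(2)
        return letters.ljust(letter_width) + digits.rjust(number_width)

    calls = ["BF 21", "BF 8", "B 72", "BF 199"]
    print(sorted(calls))                  # naive sort: ['B 72', 'BF 199', 'BF 21', 'BF 8']
    print(sorted(calls, key=sort_key))    # shelf-list order: ['B 72', 'BF 8', 'BF 21', 'BF 199']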
even if it is eventually decided to lengthen either the number or size of the subject headings contained in parkland's file, useful services will have been provided under the original design, as well as simply having provided a base for further decisions and developments. other projects which are being considered for future action are serials holdings (in parkland's case, mostly annuals and yearbooks which get cataloged), including an anticipation list, and management statistics consisting of holdings percentages by class letter versus collection additions and circulation figures by class letter. circulation itself will undoubtedly not be designed prior to actual residence on the permanent campus (anticipated for fall 1973), but all of the above are possibilities and some will receive attention in the immediate future. by building a data base which includes subject headings and call numbers, many future projects will be practical to consider, as the file maintenance programs and the data base will already exist. these, of course, may be modified from time to time to meet changing conditions and requirements. additionally, parkland's library staff has been following cooperative library automation efforts involving other libraries, and would happily consider participation in appropriate cooperative ventures.
conclusion
in the opinion of both the library and computer staff, the automation of acquisitions is a success. it was accomplished rapidly and essentially on time and economically, with few costs higher than originally anticipated. now that the system is operating smoothly, with only an occasional bug cropping up, the extra workload caused by parallel operations has been phased out and the total efficiency of the system should continue to improve. the system to date has been running on a weekly basis, and this has proved satisfactory to both the computer center personnel and the library. the library is among the first parts of parkland to be on a regular weekly schedule using the computer. most other processing is on a monthly and quarterly cycle. in approaching any automated systems development, a general attitude of flexibility combined with thoroughness is very important and will probably bring the best long-term results. by being flexible and open-ended, regardless of what portion of a library's functions were originally automated, the way will be paved to provide a data nucleus for other applications in areas of the library. thoroughness in design and attention to initial detail are also important, as sometimes it is harder to find the time to make the changes than was expected. there is probably a tendency to get along with an operational system as it is, rather than making minor non-crucial modifications in it, although such changes do get worked in as time permits. nonetheless, it is very important that in the initial stages a system be as comprehensively planned as feasible. the parkland college learning resource center is fortunate in that original specifications (on the whole) were well thought out and provided a cohesive unit, which is also characterized by built-in flexibility, and as a result is adaptable to future growth.
acknowledgments
numerous individuals have participated in and supported library automation efforts at parkland college. david l. johnson, director of the learning resource center, provided the initial inspiration and determination. robert o.
carr, director of the computer center, welcomed the library's commitment to automation and provided the technical advice where necessary. sandra lee meyer, acquisitions librarian, gave full cooperation, including tireless aid in clarification of requirements and debugging test results. since late july 1971, bill abraham has been the programmer-operator for the library system and has consistently given more than one hundred percent effort. jim whitehead from western illinois university contributed valuable advice based on his prior experience in acquisitions automation. finally, kathryn luther henderson, an inspirational teacher and friend, voluntarily spent many hours writing test data and offering the opportunity for many fruitful discussions.
design of library systems for implementation with interactive computers
i. a. warheit: program administrator, information systems marketing, international business machines, san jose, california
in the development of library systems, the movement today is toward the so-called "total" or integrated system.
this raises certain design and implementation questions, such as: what functions should be on-line, real-time, and what should be done off-line in a batch mode; should one operate in a time-share environment or is a dedicated system preferred; is it practical to design and implement a total system or is the selective implementation of a series of applications to be preferred. although it may not be feasible in most cases to design and install a total system in a single operation, it is shown how a series of application programs can become the incremental development of such a system. currently library mechanization is entering a new phase. the first phase, extending from 1936 to the mid-fifties, saw the development of a number of small, scattered, and essentially experimental automatic data processing (adp) library applications. these were punch card systems for purchasing, serials holdings lists, and circulation control. during the second phase, which has been running now about 15 years, a large number of library applications have been mechanized. these include the production of catalog cards, book catalogs, periodical check-in, serials holdings, circulation control systems, acquisitions programs, and searching of files, or information retrieval. systems librarians have been busy designing individual programs, building special computer-stored files, implementing conversion of records, and developing operating procedures for these various applications. more importantly, they have been studying the library from a systems point of view in order to have a better understanding of the individual tasks performed and how they can be best accomplished with the available tools. at first concern was limited to individual applications in the library. gradually some of the more perceptive systems analysts began to be concerned about integrating these various applications. some simple examples are the generation of book cards for process control and circulation control as a by-product during the order-receiving cycle; the combination of subscription renewal, claims, and binding control with the serials holding program; the development of authority lists in book catalog programs; the simultaneous updating of accession files and circulation control files, etc. the purpose of many of these partially integrated programs was to reduce redundancy and make multiple use of single inputs. the next step was to look at the library as a whole and consider it as a "total" or single, integrated system. rather than building a series of independent applications programs, a number of libraries began to plan total systems in which the individual applications would be integrated segments. in the past year or two such efforts have been undertaken by the university of chicago, stanford university, redstone arsenal, the national library of medicine, washington state university, university of toronto, system development corporation, ibm, and others (1, 2, 3, 4, 5, 6). it is this total systems concept which is the new and current development of library electronic data processing (edp). at first, a total integrated system was conceived as a series of separate application programs utilizing separate files, but whose records have similar formats and field designators allowing for the multiple use of single inputs.
a more advanced concept, however, calls for the construction of a single logical file, even though, physically, the individual record elements may be distributed over a number of tracks and storage devices. operating on this central file are a series of program modules performing functions involving file building, searching, computation, display, and printing. as each application is called for (that is, as the librarian prepares an order, receives an invoice, checks in a periodical, adds a call number, does some cataloging, charges out a book, etc.), the appropriate program functions are called into use. attached to the file are a number of indexes or access points. one such program, for example, provides some eighteen indexes: author, permuted title, subject heading, descriptor, call number, invoice number, publisher, serial i.d., l.c. card number, borrower, etc. it is not just coincidental that the development of the total integrated library system occurred at the same time that computer hardware became available that made it practical, especially in an economic sense, to operate a total library system. one of the basic elements of this hardware was the development of real-time, on-line, terminal-oriented, time-shared systems. at present, orders for on-line systems are increasing at such a rate that it is estimated in the june 23, 1969, edp weekly "that half of the computers installed by 1975 will be on-line systems." although there are a number of reasons why on-line, time-sharing, and terminal-oriented equipment made it feasible to build total library systems, the fundamental ones were that now the librarians could interact with their system and records and could, essentially simultaneously, perform a great variety of tasks. the scientific and business communities have been quick to take advantage of these new capabilities. a number of computer manufacturers, software firms, and service companies soon started to provide terminal-oriented, commercial time-share services. by the beginning of 1969 there were some 35 such services in existence, serving over 10,000 customers; by the end of 1969 it is estimated there will be over 30,000 users. although these systems are often used essentially for remote job entry, their main attraction for users has been their on-line, conversational, real-time capabilities. the interactive, man-computer techniques made possible by commercial time-sharing services have been extremely valuable for problem-solving applications, especially engineering and programming. however, the wide availability of text-editing packages has also opened up these services for libraries. one of the first academic libraries to use such a service for preparing bibliographic records was the state university of new york at buffalo (7, 8). many universities and industrial firms have developed their own time-sharing systems. a number of special libraries, notably those in ibm, were quick to take advantage of their in-house, time-share system to implement acquisitions, catalog input, and library bulletin programs (9). the defense documentation center over three years ago began preparing its bibliographic inputs on line. the suny biomedical network based in syracuse does the same (10). the washington state university library was one of the first academic libraries to implement an on-line acquisition program (11), and midwestern university (12) and bell laboratories (13) now have on-line circulation control systems.
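the permuted-title index mentioned above is easy to picture with a small sketch. the code below illustrates only the general idea (rotating a title so it can be found under any significant word) and is not the indexing routine of any of the systems named; the stop-word list, sample title, and record identifier are invented for demonstration.

    # hypothetical sketch of a permuted (rotated) title index: each significant word
    # becomes an access point, so an incomplete title can still be matched.
    STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to"}

    def permuted_entries(title):
        words = title.lower().split()
        for i, word in enumerate(words):
            if word not in STOP_WORDS:
                # rotate the title so the indexed word comes first
                yield word, " ".join(words[i:] + words[:i])

    index = {}
    for record_id, title in [(1, "handbook of data processing for libraries")]:
        for word, rotation in permuted_entries(title):
            index.setdefault(word, []).append((rotation, record_id))

    # a searcher who only remembers "data processing" still finds the record under "data"
    print(index["data"])   # [('data processing for libraries handbook of', 1)]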
with the advent of time-shared, on-line capabilities and the potentiality of building total, integrated systems, librarians today who are planning edp systems are faced with a number of design decisions: 1) should the system be a real-time, on-line system or an off-line, batch-mode operation, or a combination of both? 2) is it desirable to operate in a time-share environment or is a dedicated system to be preferred? 3) should one design a total, integrated system or should one selectively implement a number of individual applications? 4) if the decision is for an integrated system, how can it be incrementally implemented? it is recognized that a program must be tailored to fit the available resources and that it is not always possible to build an ideal system. nevertheless, design objectives must be established even though they cannot be immediately realized. if the ultimate objectives are understood, then the program development will be orderly and later reconversions will be kept to a minimum. therefore, even though the design objectives may not be achieved for a number of years, they should be established so that current implementation can be carried out in a rational manner with some assurance that the system will grow and develop.
real time or batch
library operations have always involved a variety of interactive real-time and batch-mode procedures. most operations dealing directly with the library patrons are, of course, in real time; reference question handling and charging of books are typical examples. some technical processes, such as cataloging and searching for acquisition, are also essentially interactive, real-time operations. this means that the librarian completely processes each item by creating or updating a record or servicing an inquiry, one at a time, with little or no attempt to batch the identical operations for a number of items or inquiries. other processing, however, such as preparing and mailing orders to vendors, sorting and filing charge-out cards, sending overdues, filing into the catalog, checking in periodicals, labeling, preparing binding, etc., is essentially done in the batch mode. in other words, batch and real-time operations complement each other, for whereas it is more effective to do some operations in real time, batching is more effective for other operations. librarians, therefore, expect and need both modes of operation. the actual distinction between these two modes is often lost in certain mechanized systems where everything is done in a non-interactive batch mode while interactive, real-time services are provided from printouts. many current library mechanized systems are really nothing more than processing techniques for producing the standard, hard-copy, bibliographic tools such as catalog cards, serials lists, book catalogs, orders, overdue notices, and the like. whenever the librarian wants to use the information generated by these programs, he consults the hard-copy files or lists. he does not interrogate a computer file directly. this approach has been typical of many other computer-based information systems. when the first direct-access devices (ramac) were made available for commercial and industrial inventory control, they were used primarily to update the records and to produce the inventory lists and card files which the user would consult for information.
later, as confidence developed in the machines, and terminals became available, the printout lists and files were abandoned and the user began consulting the computer store directly. typically today in libraries using computer systems, inputs are processed in batches and outputs are produced in batches. real-time services are provided from the print-outs: the catalogs, the on-order file, serials lists, and so on. even circulation control has been an off-line, batch operation. although the charge-out may be made through a data entry unit, all that is actually accomplished at the time is that the transaction is recorded. it is only later that the transactions are batched and processed, the files set up for the loans, the discharges pulled from the file, and the delinquencies handled. although librarians will not, in the immediate future at least, as readily give up their card catalogs and printed lists as business and industry are doing and as some enthusiasts believe librarians will (14, 15), since the queuing problem alone where the public must use the files would be very severe, some hard-copy files could be dispensed with in an on-line system. certainly hard-copy files of circulation records, periodical check-in records, authority lists, on-order records, and the like need not be maintained when these files are available via terminals. until now, practically all library machine processing, with a few exceptions, has been batched, off line, and not interactive. in a non-interactive system, records are created and modified by manual preparation of work sheets followed by keypunching for data entry. in a library environment, for example, this means that the acquisitions librarian fills out an order work sheet that is given to a keypunch operator, who either prepares a decklet of punch cards or punches a paper tape or makes a magnetic record on tape. the cards or tape are then fed into the computer, the input is edited and errors noted, and a proof copy is printed. the error messages and proof copy come back to the order librarian, who makes the necessary corrections. these are handed to the keypunch operator, who corrects and updates the record and inputs it again into the computer. if the operator has not introduced some new errors, the record is then processed. if she has, the record loops back again to the order librarian. the same story can, of course, be told about catalog records, journal and report records, and so on. in an interactive on-line system, the originator of the information (in this example the order librarian) could key his data directly into the computer or could prepare a work sheet for operator input. the editing would occur at once by the terminal responding to each entry, and verification or error messages would be returned immediately. the librarian or operator would enter the necessary corrections and upon acceptance of the record by the system would signal entry of the record into the file and the print queues as required. also, during the preparation of the entry, the librarian would be using the terminal (presumably a display-type terminal) to consult the files he needs, such as shelf list, orders outstanding, authority lists, etc. a simplified flow chart comparison of an off-line and an on-line cataloging process would look something like that shown in figure 1.
fig. 1. cataloging process: off-line and on-line. although only a few library applications and no total library system are as yet on-line operations, a number of analogous operations are being carried out in other industries, such as order entry, inventory control, production scheduling, insurance policy information, freight waybilling, etc., so that one can make a few tentative assessments (16, 17, 18). to begin with, in an on-line system a work sheet does not have to be prepared, and so the keypunch operator is eliminated. because of the interaction of the originator and the system, all corrections and editing are accomplished at once, so that the turn-around time is very much less. preparation of printed error messages and proof copy are eliminated and the total error rate is greatly reduced. thus, although the reading-in of the individual records is slower in the on-line mode than in the batch mode, appreciably fewer messages need be read to complete a record in the on-line mode, making for more economical machine time. to this, however, must be added terminal and communication costs as well as the terminal supervisor program and the fact that most on-line work is done during the prime shift, so that actual machine costs tend to be higher with the on-line system. some, however, dispute this, claiming that, on balance, machine costs are equal. labor costs, however, are very much lower with the on-line system. as a general rule, computer input costs are 85% labor and 15% machine. not only can a transcription clerk be eliminated, but the order librarian who prepares the original inputs on the terminal works very much more efficiently. consulting hard-copy files and lists is more time consuming and less informative than interrogating machine files. in an on-line system, the librarian's necessary tools are brought directly to him and displayed rapidly and efficiently. he does not have to walk to the shelf list, the catalog or the on-order file and copy information. in a well developed, sophisticated system some of the heavily used tools, such as the subject heading authority lists and class tables, would also be available from the terminal. not only does the librarian not have to spend time going to the physical files, but since the information is computer stored, it is brought to him in a greater variety of forms and sequences than is available in the hard-copy files. for example, titles are fully permuted so that incomplete title information can be searched. some systems librarians are proposing the use of codes and ciphers to search for entries, especially those with garbled titles (19, 20). all entries, including added authors, editors, vendors, etc., are immediately available even for uncataloged on-order items, so that searching is not restricted to main entries. it is not surprising, therefore, that clerks preparing computer inputs prefer working on line rather than off line. one interesting discovery is that since operators can do so much more with on-line systems they tend to take more time to turn out a better product. indications are "that significantly lower costs would have resulted if the time-sharing users had stopped work (i.e. gone to the next task) when they reached a performance level equal to that of batch users" (17).
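to make the contrast between the two input cycles concrete, the following is a minimal sketch, in python, of the kind of field-by-field validation loop an interactive order-entry terminal performs; the field names, validation rules, and prompts are illustrative assumptions and are not drawn from any of the systems discussed in this article.

    # illustrative sketch: interactive record entry with immediate validation,
    # in contrast to the batch keypunch / proof-copy / correction cycle.
    RULES = {
        "order_number": lambda v: v.isdigit() and len(v) == 6,
        "author":       lambda v: len(v) > 0,
        "title":        lambda v: len(v) > 0,
        "vendor_code":  lambda v: v.isalpha() and len(v) == 3,
    }

    def enter_record():
        record = {}
        for field, is_valid in RULES.items():
            value = input(f"{field}: ").strip()
            # the terminal reacts to each entry at once; an error is corrected
            # immediately instead of coming back later on a printed proof copy
            while not is_valid(value):
                print(f"  error in {field}; please re-enter")
                value = input(f"{field}: ").strip()
            record[field] = value
        return record  # an accepted record goes to the file and the print queues

    if __name__ == "__main__":
        print(enter_record())

the point of the sketch is only that each error is caught and corrected at the terminal, so no separate proof-and-correction pass through a keypunch operator is needed.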
even with a circulation control system, there is higher system efficiency with an on-line operation. every transaction, such as a charge-out or a discharge, is an actual inquiry into the file as to the status of the book and borrower and the answer is immediately available; therefore controls and audit procedures can be simpler. elaborate error correction routines do not have to be provided in the program to identify improper inputs as has to be done with an off-line system. incorrect loans are not made of restricted material, such as holds and reserves, or to delinquent borrowers. the system also acts as a locator tool for determining the location and availability of volumes. as a final note, on-line systems are necessary if effective networks are to be developed and decentralized services provided ( 21, 22). the basic conclusion is that an on-line system can handle more work and provide more services at greater machine costs but lower labor costs than a manual or an off-line machine system. in view of the fact that 72 journal of library a-utomation vol 3/1 march, 1970 machine costs are coming down rapidly, while labor costs and throughput demands are forever rising, the future of the on-line machine system in the library looks very promising. time share or dedicated system a number of librarians have had very unhappy experiences with data processing departments over which they had no control. machines have been changed, schedules dropped, library jobs delayed or dropped for "higher priority jobs" and so on. one tendency, therefore, has been to try to get a library's own computer facility. but, as de gennaro so succinctly summarizes it, "the economics of present day computer applications in libraries make it virtually impossible to justify an in-house machine of the capacity libraries will need dedicated solely or largely to library uses ... eventually, library use may increase to a point where the in-house machine will pay for itself, but during the interim period the situation will be uneconomical unless other users can be found to share the cost. in the immediate future, most libraries will have to depend on equipment located in computing or data processing centers . . . experience at the university of missouri suggests the future will see several libraries grouping to share a machine dedicated to library use . . . it seems reasonable to suppose that in the next few years sharing of one kind or another will be more common than having machines wholly assigned to a single library . . ." ( 23). it is true that the small computers are getting more powerful and it is quite possible the day will come when small stand-alone computers will have the capacity to do all the jobs required by the library. for the time being, however, an on-line system supporting a number of terminals for a variety of tasks in the library requires a computer of a size which cannot be economically justified except for the very large libraries. also, one thing that is often overlooked is that implementing a large library system requires data processing technical support that is very seldom available on the library's staff. one need only look at the information systems office of the library of congress, or the system analysis and data processing office of the new york public library to have some appreciation of the requirements for such technical support. also, a large central system often has backup capabilities which provide insurance against breakdowns and interruptions. 
the question really is not whether a library should time share or have a dedicated system, but rather whether or not the library has the necessary control over its segment of the total system. this segment is the library's property and its services are available to the library as set forth in the agreement made when the library became part of the data processing services. again, it must be emphasized that all this applies to systems which have to perform all library functions. most libraries, however, in order to get started and develop their programs, are beginning with small, stand-alone computers or are submitting batch jobs to a data processing center. later, as their programs develop, they will have to upgrade their computer capabilities. in view of the ultimate needs of a system which will support most of the major processing functions of a library, most libraries will have to have access to computer facilities whose full support they cannot economically justify. time sharing, certainly for the immediate future, will be required for any on-line library system. total integrated system or individual application it is more economical to handle a variety of library applications by using a single file and a standard set of functional programs, than it is to provide a separate file and a separate set of application programs for each application. not only is it more economical, but this total, integrated approach is, in its essential modularity, extremely flexible. functions can be added, changed, or removed, and sequences can be re-ordered, so that the system can grow and change with changing needs and capabilities. also, since the full record is available, if needed, for every application, added services, normally not feasible, are practical. for example, a circulation control system that, instead of having separate circulation files, keeps charge records in its central bibliographic file, can set a hold on all copies of a book, no matter where the copies are kept, as in the bellrel system (13). also, from a total record one can select various subsets and make different orderings to provide a variety of services. the library systems currently being designed are essentially mechanized versions of existing manual systems. however, as experience is gained with these new systems, as more advanced equipment is made available, and as research and development provide new insights, these systems will evolve and change. for example, in some cases a major part of descriptive cataloging is becoming a part of acquisitions. the former compartmentalization in libraries is already breaking down. one should, therefore, be prudent and not lock up the system into tightly compartmentalized segments on the assumption that current file subsets will remain unchanged. it is advisable that each library activity have potential access to all system functions and to all records. in the present context, an activity may have no need for all functions, nor does it need the total record, but as the system develops it might very well need these added capabilities. the problem, however, is that for a total, integrated system one must first build a complete structure including the file and all the functions, such as file building, search, compute, compare, display, print, etc., as well as set up all the access points which are essentially indexes.
in addition, all the overhead necessary for supervising the programs, managing the files, and monitoring the terminals must be provided for. to use an 74 journal of library automation vol. 3/ 1 march, 1970 analogy, one must first build foundation, walls and roof and install all plumbing and wiring before building any rooms. consequently, the start up or initial investment is far higher than for implementing a single application program. some who have undertaken the development of total systems did not fully appreciate this at first and have, as a result, had to replan their development programs. even if one could bring in a fully debugged program for a total system, there would still be the tasks of converting records, training staff, setting up operating manuals and working out procedures. only as machinable records became available and the file grew and developed could various applications become operable. from a practical point of view, the implementation of a total system would have to be incremental; that is, once the basic system is installed, applications would have to be implemented one at a time and in some rational order. this is even more true where the programs for a total system have not been written as yet or where the library's resources are such that it can only undertake one job at a time. from a practical point of view, one can develop and implement only one application at a time. furthermore, as is often the case, the available equipment is limited and cannot do everything the library will ultimately want. it is necessary, therefore, to develop single applications and to design them in such a way that they can become part of an integrated system. it is also necessary to have a strategy and a plan to move up through the various levels of mechanization. today there are many who, although accepting a total, on-line system as a desirable goal, feel that it is impractical to consider because of costs and unavailability of equipment. a full analysis of economic change in terms of wage-cost rise and machine-cost decrease, of technologic improvement and of demand for added services, goes far beyond the limits of this paper. there is developing, moreover, a literature on these subjects (24, 25, 26, 27, 28, 29). suffice it to say that an increasing number of librarians are becoming convinced that library mechanization is inevitable, that it will affect all operations of the library, that it will provide the highest level of service through direct, on-line, interactive systems and that, whatever today's limitations may be, these changes are coming so fast that plans must be made now. these individuals are also convinced that whatever is now undertaken in the way of mechanization will evolve into an integrated system with many basic functions operated in a real-time, on-line mode. implementation of an integrated library system typically a library mechanization project will start with a single, relatively uncomplicated application that will not impact library operations very much, will require only a small amount of systems design and programming, and will run in a batch mode on a small equipment configuration. a typical example is the preparation of a serials holdings list. from systems with interactive computers/warheit 75 this first job, the librarian and his staff will become acquainted with data processing, will introduce the data processing personnel to some library requirements and will, hopefully, begin to develop procedures for working with the computing center. 
having passed this introductory stage, many librarians continued, as a rule, simply by developing the next application. today, however, the more prescient ones are first assessing the total impact of mechanization and, having decided that their library will be mechanized, try to plan what their foreseeable goals are, then work out a plan to achieve these goals. having decided that the ultimate goal is a total integrated system for the whole library, which will provide real-time services and therefore must operate on line, the library planner will set priorities and work out a strategy to reach these goals. in some instances he can start designing a total system. in other situations, he does not have the resources to do so, but plans to make use of programs being developed for other libraries or of so-called standard, commercial packages, or programs which may be developed jointly with other libraries. he should realize that he can't just sit and wait for d-day when a total complete program will be wheeled in and a turnkey operation will be installed overnight. the lead time necessary for planning, training, conversion and installation is too often grossly underestimated, so that these preliminary preparations are neglected to the detriment of orderly growth and development. having established certain long-range goals, the librarian will tailor his current programs so that the library system will develop as smoothly as possible. he will try to keep the various subsystems and program segments as generalized and as modular as possible. he will structure his records so that they can ultimately be fitted together into a full bibliographic record. he will try to avoid using records so truncated that they will have to be discarded and recorded again later. he may, in fact, actually start with a full record that is comparable to his present shelf list or catalog card, even though there may be no need of the whole record for the current application. he will provide for a variety of print options, such as line width, number of lines, number of columns, etc., so that a separate print program will not have to be written for each product or to accommodate every change in style. he will try to organize his files so that the file structures and the record formats will not have to be radically changed when the system goes on line. he may store some of his records (his active on-order file, for example) on direct access storage devices. if he can, he will create access points to his large bibliographic file and store them on disk files too, even though he is currently operating off-line. such direct access storage of indexes makes economic sense when very large files (and library files are large and grow very fast) must be searched or sorted. aside from these immediate benefits, such a file organization requires little or no restructuring or record reformatting when the system ultimately goes on line and becomes terminal oriented. as early as possible, he will put his circulation control system on line. this is by far the cheapest and easiest on-line operation requiring the least investment and yet producing the most immediate benefits. again, aside from the immediate benefits, this on-line operation represents an important building block for the ultimate total system.
aside from the current improved services, the experience of working on line and the opportunity to develop and refine processes and procedures will pay important dividends in the design and implementation of the total on-line system. with knowledge of how he wants his system to develop, the librarian is now able to establish priorities and allocate his resources. the emphasis will be on file building, on capturing the record. acquisitions programs or circulation control systems will come first. work on the display terminal and communication will come later after searchable files have been built up. in other words, an attempt is made to have a controlled growth through several levels of mechanization. a start is made with a simple, off-line, batch job. then a beginning is made on building what is to become the main, central bibliographic file, the catalog. as soon as possible, parts of it are stored on direct access devices, so that it can be used more effectively and so that its structure will conform to the requirements of an ultimate on-line system. a simple on-line process is adopted as soon as feasible. each application program uses standard functional modules in macro form and so on. all this, of course, is highly oversimplified and may seem truistic to many. nevertheless, there has been too much evidence of programs undertaken without adequate planning and of programs that have lacked continuity because adequate guide lines have not been established. such failures are too often ascribed to changes in personnel or hardware. a project should be designed so that inevitable changes in personnel and hardware can be tolerated without its being wrecked. therefore, the establishment of long-range goals can have a profound effect on the shape and success of current operations. more and more librarians and systems personnel engaged in library projects are beginning to think in terms of total integrated systems. they are looking ahead and planning. they are designing and implementing their present applications not in a simple ad hoc way but as part of what is to become a total system. references 1. alexander, r. w.: "toward the future integrated library system," 33rd conference of fid and international congress on documentation, (tokyo: 1967). systems with interactive computers/ warheit 77 2. redstone scientific information center : automation in libraries (first atlis workshop) 15-17 november 1966, huntsville, ala.: redstone arsenal, (june 1967). report rsic625. 3. black, donald v.: "library information system time sharing: system development corporation's lists project," california school libraries, (march 1969), 121-6. 4. black, donald v.: library information system time-sharing on a large, general purpose computer. (system development corporation report sp-3135, 20 september 1968). 5. bruette, vernon r.; cohen, joseph; kovacs, helen : an on-line computer system for the storage and retrieval of books and monographs (brooklyn, new york : state university of new york downstate medical center, 1967). 6. fussier, herman h.; payne, charles t. : development of an integrated computer-based bibliographical data system for a large university library. (chicago : chicago university, 1968) . clearinghouse report pb 179 426. 7. balfour, frederick m.: "conversion of bibliographic information to machine readable form using on-line computer terminals," journal of library automation, 1 (december 1968), 217-26. 8. lazorick, gerald j.: "computer/ communications system at suny buffalo," educom. 
the bulletin of the interuniversity communications council, 4 (february 1969), 1-3. 9. bateman, betty b.; farris, eugene h.: "operating a multilibrary system using long-distance communications to an on-line computer," proceedings of asis, 5 (1968), 155-62. 10. pizer, i. h.: "regional medical library network," medical library association bulletin, 57 (april 1969), 101-15. 11. burgess, t.; ames, l.: lola library on-line acquisitions subsystem. (pullman, wash.: washington state university library, july 1968). 12. reineke, charles d.; boyer, calvin j.: "automated circulation system at midwestern university," ala bulletin, 63 (october 1969), 1249-54. 13. kennedy, r. a.: "bell laboratories' library real-time loan system (bellrel)," journal of library automation, 1 (june 1968), 128-46. 14. licklider, j. c. r.: libraries of the future (cambridge, massachusetts: m.i.t. press, 1965). 15. swanson, don r.: "dialogues with a catalog," library quarterly, 34 (january 1964), 113-25. 16. brown, robert r.: "cost and advantages of on-line dp," datamation, 14 (march 1968), 40-3. 17. gold, michael m.: "time-sharing and batch-processing; an experimental comparison of their values in a problem-solving situation," communications of the acm, 12 (may, 1969), 249-59. 18. sackman, h.: "time sharing versus batch processing: the experimental evidence," afips conference proceedings, 32, 1968 spring joint computer conference, 1-10. 19. nugent, william r.: "compression word coding techniques for information retrieval," journal of library automation, 1 (december 1968), 250-60. 20. ruecking, frederick h.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-38. 21. grosch, audrey n.: "implications of on-line systems techniques for a decentralized research library system," college & research libraries, 30 (march 1969), 112-18. 22. rayward, w. boyd: "libraries as organizations," college & research libraries, 30 (july 1969), 312-26. 23. de gennaro, richard: "the development and administration of automated systems in academic libraries," journal of library automation, 1 (march 1968), 75-91. 24. "the costs of library and informational services." knight, douglas m.; nourse, e. shepley, eds.: in libraries at large (new york: r. r. bowker, 1969), 168-227. 25. cuadra, carlos a.: "libraries and technological forces affecting them," ala bulletin, 63 (june 1969), 759-68. 26. culbertson, don s.: "the costs of data processing in university libraries: in book acquisition and cataloging," college & research libraries, 24 (november 1963), 487-89. 27. dolby, j. l.; forsyth, v.; and resnikoff, h. l.: an evaluation of the utility and cost of computerized library catalogs. final report project no. 7-1182, u. s. department of health, education and welfare. 10 july 1968, eric ed 022517. 28. kilgour, frederick g.: "the economic goal of library automation," college & research libraries, 30 (july 1969), 307-11. 29. knight, kenneth e.: "evolving computer performance," datamation, 14 (january 1968), 31-5. president's message cindi trainor information technologies and libraries | december 2013 hi, litans! forum 2013 i'm excited that 2014 is almost here. last month saw a very successful forum in louisville, in my home state of kentucky. there were 243 people in attendance, and about half of those were first-time attendees.
it's also typical of our yearly conference that there are a large number of attendees from the surrounding area; this is one of the reasons that it travels around the country. louisville's forum was the last of a few in the "middle" of the country--these included st. louis, atlanta, and columbus. next year, forum will move back out west, to albuquerque, nm. the theme for next year's conference will be "transformation: from node to network." see the lita blog (http://litablog.org/2013/11/call-for-proposals-2014-lita-forum/) for the call for proposals for concurrent sessions, poster sessions, and pre-conference workshops. goals of the organization at the board meeting in the fall, we took a stab at updating lita's major goal areas. the strategic plan had not been updated since 2010, so we felt it was time to update the goal areas, at least for the short term. the goals that we agreed upon will carry us through annual conference 2015 and will give us time to mount a more complete planning process in the meantime. they are: • collaboration & networking: foster collaboration and encourage networking among our members and beyond so the full potential of technologies in libraries can be realized. • education & sharing of expertise: offer education, publications, and events to inspire and enable members to improve technology integration within their libraries. • advocacy: advocate for meaningful legislation, policies, and standards that positively impact the current and future capabilities of libraries and that promote equitable access to information and technology. • infrastructure: improve lita's organizational capacity to serve, educate, and create community for its members. cindi trainor (cindiann@gmail.com) is lita president 2013-14 and community specialist & trainer for springshare, llc. midwinter activities in other governance news, the board will have an online meeting in january 2014, prior to midwinter conference. our one-hour meeting will be spent asking and answering questions of those who typically submit written reports for board meetings: the vice-president, the president, and the executive director. as always, look to ala connect for these documents, which are posted publicly. we welcome your comments, as well as your attendance at any of our open meetings. our midwinter meeting schedule is: • the week of january 13: online meeting, time and date tba • saturday, january 25, 1:30-4:30 p.m., pcc 107a • monday, january 27, 1:30-4:30 p.m., pcc 115a as always, midwinter will also hold a lita happy hour (sunday, 6-8 pm, location tba), the top tech trends panel (sunday, 10:30 a.m., pcc 204a), and our annual membership meeting, the lita town meeting (monday 8:30 a.m., pcc 120c). we look forward to seeing you, in philadelphia or virtually. make sure to check the midwinter scheduler (http://alamw14.ala.org/scheduler) for all the details, including the forthcoming happy hours location. it's the best party^h^h^h^h^h networking event at midwinter! i would be remiss if i did not mention lita's committees and igs and their midwinter meetings. many will be meeting saturday morning at 10:30 a.m. (pcc 113abc)--so you can tablehop if you like. expressing interest at midwinter is a great way to get involved. can't make it to philadelphia? no problem! fill out the online form to volunteer for a committee, or check out the connect groups of our interest groups.
some of the igs meet virtually before midwinter; some committees and igs also invite virtual participation at midwinter itself. join us! cumulating the supplements to the seventh edition of lc subject headings roy b. torkington: central library and documentation branch, international labour office. at the time of writing, the author was head, library systems department, university of california, san diego. the views expressed in this document may not be considered those of the international labour office. a description is presented of the project of the university of california library automation program to cumulate the 1966 through 1971 supplements to the library of congress subject headings. the university of california institute of library research marc processing software, bibcon, was used, with specially written programs. the resulting cumulation was edited, printed in book form, and made available to libraries. the final task involved merging six marc files into one file of over 125,000 records and then printing that file in a format similar to that of lc subject headings. the project was a cooperative effort with participation by people from several uc campuses. introduction the seventh edition of subject headings used in the dictionary catalogs of the library of congress was published with a cutoff date of june 30, 1964. the first supplement covered additions and changes from july 1, 1964 through december 31, 1965. subsequent supplements were issued annually, with each annual cumulating quarterly over a one-year period (1, 2). by 1972, when the supplement for 1971 was issued, it was necessary, when assigning or verifying subject headings, to use seven supplements (1964/65, 1966, 1967, 1968, 1969, 1970, 1971) in addition to the seventh edition. the supplement cumulation task of the university of california library automation program aimed at alleviating that problem by merging the 1966 through 1971 annual supplements into one cumulation. through the courtesy of the library of congress we were able to obtain unedited magnetic tape files of all supplements except the 1964/65; these files were in the lc internal format. the university of california library automation program undertakes cooperative automation programs which will benefit uc libraries. fig. 1. preliminary cumulation production and editing. fig. 2. annual supplement cumulation cycle.
the available software was bibcon, a group of marc processing programs developed and used by the uc institute of library research, berkeley, and uc santa cruz, for production of the 1963-1967 uc union catalog supplement and the uc santa cruz book catalog.8 the bibcon programs to be used were: sked, a sort key creator and editor; biblist and bibprint, record, column, and page formatters; and fix, a record corrector. essentially we considered the problem of cumulating the supplements as one of producing a book catalog by cumulating several annual catalogs. thus we needed a program to convert the lc internal format to uc marc (supcon) and a merge program ( supmrg) (see figures 1 and 2) . merge transactions a special merge program was necessary, because merging the supplement files is more complex than merely interfiling entries. frequent matches or <;0 < m [ai nu~bt• ui hor oui _10 nc . \. ii ill i u u 01 ' 110 ,,_; ,. 01 u n" " "'"""•"""" '"'" " ~~"~!"".~_'~~-~~ · ko l nn• ll k """" " kh 011 " "' '""""" " "'" fig. 2. circulation card. a mark-sense reproducer prepares the cards for the computer. this reproducer had been acquired for other college computer functions and the library was able to make use of it (2). under this plan the books are charged out by having the borrower write in his identification number, which serves as the borrower's number, and his name in the appropriate box on the ibm circulation card ( 3). the student assistant at the desk mark-senses the book card with the identification number; this is the one manual operation, but it has presented no problem. the marked circulation cards are sent three times a week to the data center, where the mark-senses are read and punched and the due date is gang-punched in. the 1401 computer generates a second circulation card, duplicating the accession number, call number, author and title. old and new circulation cards are machine filed together by accession number and returned to the circulation file, which is arranged by date and accession number. it was found that the accession number is easier to read than the library of congress number and is the truly unique number. a printed circulation listing, arranged by call number to facilitate use, is kept at the charge desk; it shows accession number, author and title, borrower's name, identification number and due date. it is also possible to prepare a daily circulation report by student identification number and name if required. the entire circulation is sent to the data center weekly to produce a cumulative print-out of all books in circulation. these print-outs provide daily and weekly totals of all outstanding circulated books. no data processing equipment is required for reserve circulation. charging out of books on reserve continues to be done by having the borrower write his identification and name on a blue reserve card to be kept at the desk. library mechanization at auburn/hilbert 17 when a book is returned, the pair of circulation cards are selected from the circulation file. the used charge card, which contains the borrower's identification number and due date, is marked "cancelled" with a rubber stamp. the new circulation card is inserted in the pocket of the book and the book is reshelved. cancelled circulation cards are kept and sorted later to provide statistical analyses by date and class number for each semester. this system was developed because it was felt a small library could not justify expensive charging machinery. 
acquisitions and shelf list once the reclassification operation was organized it was possible to set up automation procedures for processing current acquisitions. an ibm card was designed as a book request card (figure 3) to be filled in by staff or faculty member. information on it includes author's name, title, publisher, price, purchase order number, academic department, and requestor's name. at the data center the foregoing information is keypunched into the card, which then becomes a purchase order card. the purchase order number identifies the vendor and is gang-punched into the cards. a computer print-out produced from the purchase order cards is mailed directly to the dealer as a book order. order cards are kept in an "on order" file by dealer or purchase order number and then by author until the books are received. fig. 3. book request card. when the book and its library of congress cards have been received, the corresponding order cards are pulled and the following additional information is added to the purchase order card: actual cost (taken from the invoice), accession number (stamped on), and the library of congress call number (taken from the library of congress printed card). figure 4 is a flowchart of the acquisitions procedure. the books are then processed in the same manner as was used for reclassifying (1). fig. 4. acquisitions flow chart. order cards are sent to the data center to reproduce the shelf list cards, automatically transferring the pertinent information already punched in the order card and keypunching the additional information into the shelf list cards (figure 5). currently, provision is being made for inclusion of the library of congress card order number in the shelf list card to enable easy subsequent selection of the corresponding marc records. fig. 5. new books listing procedure. the shelf list cards are used to produce the new books list (figure 5). the shelf list is kept in the ibm card form, and a book catalog could easily be made if so desired. to compile a bibliography it is only necessary to take the punched cards from the shelf list in the wanted classification. the library's subject catalog and the library of congress subject headings are checked to determine the class numbers to be used. as depicted in the flowchart in figure 5, these cards are put through the computer to produce the print-out, and then returned to the punched shelf list file. this system was designed to produce a bibliographical record of the books in the library and to automate the technical processing of the books in as simple a method as possible so as not to defeat the purpose of automating.
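the bibliography run described above amounts to selecting shelf-list records whose class numbers fall in the wanted classification and printing them in order; the following minimal python sketch shows the idea, with field names, class prefixes, and sample records invented purely for illustration.

    # illustrative sketch: compiling a subject bibliography by pulling shelf-list
    # records in the wanted classification (all data and fields are assumed).
    shelf_list = [
        {"call_no": "QA76.5 .S6", "author": "smith, j.", "title": "computing for libraries"},
        {"call_no": "Z678.9 .A9", "author": "adams, h.", "title": "automation in practice"},
        {"call_no": "QA76.9 .D3", "author": "doe, r.",   "title": "data files"},
    ]

    def bibliography(shelf_list, class_prefixes):
        # return shelf-list records whose call number begins with a wanted class
        wanted = [rec for rec in shelf_list
                  if any(rec["call_no"].startswith(p) for p in class_prefixes)]
        return sorted(wanted, key=lambda rec: rec["call_no"])

    # e.g. a bibliography on data processing (class QA76 in this made-up sample)
    for rec in bibliography(shelf_list, ["QA76"]):
        print(f'{rec["call_no"]:<12} {rec["author"]}  {rec["title"]}')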
accounting (figure 6) the accounting system was designed to use the book request card after it has had department and cost punched into it. after the books are processed, accumulated request cards are sent periodically to the data center, where computer print-out is produced by department, listing the books purchased and the cost of each, with a summary showing all expenditures. copies are sent to department chairmen to keep them informed of their expenditures. these order cards are kept for a semester, then returned to the individual requesting faculty members after a cumulative accounting record has been made. by this means it is possible to keep track of each department's book budget and the library's total book budget, with the computer doing all the work. fig. 6. accounting procedure. overdues (figure 7) overdue notices are machine prepared from overdue circulation cards which are selected periodically from the charge-out file. the cards are passed through the computer, which generates second and third overdue cards to be used for discharging purposes. fig. 7. overdue procedure. gummed address labels that include the student identification number are produced using the college log of names and addresses. the appropriate label is applied to the reverse side of the circulation card using the i.d. as a guide. each notice card is stamped "overdue book, please return as soon as possible," then sent through the postage meter and placed directly in the mail. if several overdues are sent to the same person, the cards are mailed in an envelope, using the gummed label. the second and third notice cards are filed at the circulation desk until needed or until the book is returned. there is another file for borrowers who are seriously delinquent in returning their books. cards that have accumulated in this overdue file are processed as follows to generate further overdue notices: an overdue notice is sent to the borrower, the dean's letter to the borrower or to his parents, and the list of names to the dean and the student personnel office. at the end of each semester a list is prepared indicating all books held by individual faculty members for more than three months and the latter are notified. the time-consuming operation of preparing overdues has been considerably reduced (4). serials serial holdings have been converted to machine readable punched cards. the state university of new york, under the direction of dr. irwin pizer of the upstate medical center at syracuse, has recently published a union list containing the titles of all periodicals received in all units of the state university (5). it includes the serial holdings of auburn community college (approximately 400 titles), and punched cards for these holdings have been adapted for the library's own use. information on the card comprises title, inclusive dates, years on microfilm, department for which the periodical was ordered and the indexes in which the periodical is listed. each new serial title added to the holdings is keypunched with this information. the punched cards are used to print out an alphabetically arranged title listing and a departmental listing. adding or withdrawing titles is a simple matter, and up-to-date lists of periodical holdings are easily produced by the computer.
copies of the lists are sent to each faculty member and several copies are available at the desk and in the periodical room. costs since library use of the data center was considered to be similar to other college uses (e.g., that of the business office), the cost of library automation was absorbed by the data center and not charged to the library. an estimate of the cost, including rental time on the computer (about three hours per week), supplies, and data center staff time, is about $1500.00 a year for ongoing programs. conclusion the automated systems herein described have now been completely operational for over a year. converting data for a computer operation spotlighted inaccurate recording of information and afforded a good opportunity for correcting previous errors. periodically, progress and results have been reviewed and changes made, as will continue to be the case. the automated circulation system is providing the library with rapid, accurate, and efficient circulation control not possible for a manual system. ease and speed of performing routine library operations by the use of automation more than compensates for the cost of data processing. automated technical procedures provide for faster and more efficient processing of books, production of the library's monthly new books list (which previously took hours to type) and subject bibliographies. other important results of the mechanization project are the serial listings and departmental accounts, all of which make possible better library service. acknowledgments the programming was done in autocoder by, or under the supervision of, mr. richard klinger, chairman of the data processing department at auburn community college; to him is due most of the credit for the mechanization of the library. the library is grateful to mr. klinger for his encouragement and enthusiastic support and his willingness to assume the technical responsibilities of programming and systems design. references 1. international business machines: mechanized library procedures. (white plains, n. y.: ibm, n. d.). 2. international business machines: library processing for the albuquerque public school system (white plains, n. y.: ibm, n. d.). 3. dejarnett, l. r.: "library circulation." in international business machines corporation: ibm library mechanization symposium (endicott, n. y.: 1964), pp. 78-93. 4. eyman, eleanor g.: "periodicals automation at miami-dade junior college," library resources and technical services, 10 (summer, 1966), 341-61. 5. the union list of serials in the libraries of the state university of new york. (syracuse, n.y.: state university of new york upstate medical center, 1966). marc international richard e. coward: head of research and development, the british national bibliography, london, england the cooperative development of the library of congress marc ii project and the british national bibliography marc ii project is described and presented as the forerunner of an international marc network. emphasis is placed on the necessity for a standard marc record for international exchange and for acceptance of international standards of cataloging. this paper is an examination of two major operational automation projects.
these projects, the library of congress marc ii project and the british national bibliography (bnb) marc ii project, are the result of sustained and successful anglo-american cooperation over a period of three years during which there has been continuous evaluation and change. in 1969, for a brief period, the systems developed have been stabilised, partly to give time for library systems to examine ways and means of exploiting a new type of centralised service, and partly to give the library of congress and the british national bibliography the opportunity to look outwards at other systems being developed in other countries. there has, of course, already been extensive contact and exchange of views between the agencies involved in the planning and developing of automated bibliographic systems and the possibilities of cooperation and exchange have been informally discussed at many levels. the time has now come for the national libraries and cataloguing agencies concerned to look at what has been achieved and to lay the foundation for effective cooperation in the future. the history of the anglo-american marc project began at the library 182 journal of library automation vol. 2/ 4 december, 1969 of congress with an experiment in a new way of distributing catalogue data. the traditional method of distributing library of congress bibliographic information is to provide catalogue cards or proof sheets. these techniques will undoubtedly continue indefinitely into the future, but the rapid spread of automation in libraries has created a new demand for bibliographic information in machine readable form. the original marc project ( 1) was "an experiment to test the feasability of distributing library of congress cataloguing in machine readable form". the use of the word "cataloguing" underlines the essential nature of the marc i project; its end product was a catalogue record on magnetic tape. there is a very significant difference between a catalogue record on magnetic tape and a bibliographic file in machine form. the latter does not necessarily hold anything resembling a catalogue entry, although marc ii still reflects, both in the lc implementation ( 2,3) and in the bnb implementation ( 4,5), a preoccupation with the visual organisation of a catalogue entry. fortunately retention of the cataloguing ''framework" does not hinder the utilisation of lc or bnb marc data in systems designed to hold and exploit bibliographic information, as the whole project is designed as a method for communication between systems. the essence of the marc ii project is that it is a communications system, or a common exchange language between systems wishing to exchange bibliographic information. it is highly undesirable, in fact quite impossible, to plan in terms of direct compatability between systems. machines are different, programs are different, and local objectives are different. the exchange of bibliographic information in any medium implies some level of agreement on the best way to organise and present the data being exchanged. the need to use a fairly standard type of bibliographic structure on a catalogue card is obvious enough, and over the years a form of presentation, as best exemplified by a library of congress catalogue card, has been developed which holds all the essential data and also, by means of typographical distinctions and layout, conveys the information in a visually attractive style. 
when bibliographic information is transmitted in a machine readable form the question of visual layout does not arise but the question of structure is vitally important. this structure is called the machine format and the machine format holds the data. it literally does not matter in what order the various bits and pieces that make up a catalogue record appear on a magnetic tape. what does matter very much is that the machine should be able to recognise each data element: author, title, series, subject heading, etc. in practice, either each data element must be given an identifying tag that the machine can recognise, or each data element must occupy a predetermined place in the record. in view of the unpredictable nature of bibliographic information, the former method, that of tag identification, is now widely used and is the technique adopted in the marc system. the lc and bnb marc systems are two very closely related implementations of a communications format which in its generalised form has been carefully designed to hold any type of bibliographic information. the generalised format description is now being circulated by british standards institute and united states of america standards institute. it can be very briefly described as follows: leader | directory | control field(s) | data fields. the leader is a fixed field of 24 characters, giving the record length, the file status and details of the particular implementation. the directory is a series of entries each containing a tag (which identifies a data field), the length of the data field in the record, and its starting character position. this directory is a variable field depending on the number of data elements in the record. the control fields consist of a special group of fields for holding the main control number and any subsidiary control data. the data fields are designated for holding bibliographic data. each field may be of any length and may be divided into any number of subfields. a data field may begin with special characters, called indicators, which can be used to supply additional information about the field as a whole. it can be seen that the basis of marc ii is a very flexible file structure designed to hold any type of bibliographic record. once such a level of compatability is established it is possible to prepare general file handling systems (6) which will convert any bibliographic record to a local file format. there is certainly much scope for agreement on local file formats as well, but such formats will necessarily be conditioned by the type of machine available and the use to be made of the file. the establishment of a generalised file structure is a great step forward but by itself means very little unless a wide measure of agreement can be reached on the data content of the record to be exchanged. here the responsibility for cooperation and standardisation shifts from the automation specialist to the librarian, and particularly to those national libraries and cataloguing agencies who can by their practical actions assist libraries to implement the standards prepared for the profession. in order to appreciate the real importance of standardisation, particularly in the context of the marc project, it is necessary to look a few years into the future.
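before looking ahead, it may help to make the structure just described concrete. the following python sketch assembles a record in the spirit of the generalised format, with a fixed-length leader, a directory of tag, length, and starting-position entries, and variable-length data fields; the tags, delimiter character, and leader layout are simplified assumptions made for illustration and do not reproduce the character-level marc ii specification.

    # illustrative sketch of the generalised structure: leader | directory |
    # control field(s) | data fields. delimiters and lengths are simplified
    # assumptions, not the actual marc ii specification.
    FT = "#"   # assumed field terminator for this sketch

    def build_record(control_number, fields):
        # fields: list of (tag, data) pairs; returns (leader, directory, body)
        body = control_number + FT
        directory = []
        for tag, data in fields:
            start = len(body)            # starting character position of the field
            body += data + FT
            directory.append((tag, len(data) + 1, start))
        dir_str = "".join(f"{tag}{length:04d}{start:05d}"
                          for tag, length, start in directory)
        # leader: total record length plus a status code, padded to 24 characters
        leader = f"{24 + len(dir_str) + len(body):05d}n".ljust(24)
        return leader, dir_str, body

    leader, directory, body = build_record(
        "68123456",
        [("100", "coward, r. e."), ("245", "marc international")],
    )
    print(leader, directory, body, sep="\n")

given such a directory, a program can locate any field directly without scanning the whole record, which is the point of the tag-identification approach described above.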
it is inevitable that the rapid spread of automated systems in libraries will create a demand for machine readable bibliographic records and that in turn will lead to the setting up of bibliographic data banks in machine readable form in national and local centres. these data banks will be international in scope and will contain many millions of items. in the long run the only feasible way to maintain them is for each country or group of countries to develop automated centralised cata184 journal of library automation vol. 2/4 december, 1969 loguing systems for handling their own national outputs and to receive from all other countries involved in the network machine readable records of the latter's national outputs. countries cooperating on this basis must agree on standards of cataloguing (and ultimately on standards of classification and subject indexing), so that the general data bank presents a consistently compiled set of bibliographic data. there is no doubt that national data banks will be set up. libraries today are faced simultaneously with a rapid increase in book prices, a need to maintain ever-increasing book stocks to meet the basic requirements of their readers, and a persistent shortage of trained personnel to catalogue their purchases. these trends are already well established and in the united states, where they are most advanced, the result has been the massive and highly successful shared cataloguing program. historically the shared cataloguing program will probably be seen as the first and last attempt to provide a comprehensive bibliographic service by unilateral action. a large number of countries have cooperated in this attempt but the shared cataloguing program does not rest on the principle of exchange. it is doubtful if even the united states will be able to maintain and extend this programme in its present form. the shared cataloguing program must ultimately be replaced with an international exchange system. national machine readable bibliographic systems will be established, but there is a grave danger that those agencies responsible will be primarily concerned only with the immediate problem of producing records suitable for use in their own national context or for their own national bibliography, regardless of the fact that the libraries and information centres they need to serve are acquiring ever-increasing quantities of foreign material. the exchange principle will be downgraded to an afterthought, a bv-product of the fact that an automated system is being used. if this outcome is to be avoided, international standards must be prepared and national agencies must accept them instead of only paying lip service to them. in the past librarians have tended to be more concerned with codification than standardisation, but in the field of cataloguing at least a great breakthrough was made sixteen years ago when seymour lubetzky produced his "cataloguing rules and principles; a critique of the a.l.a. rules for entry and a proposed design for their revision" ( 7). the work of lubetsky led to the "paris principles" ( 8) published by ifla in 1963 and in due course to the preparation of the "anglo-american cataloguing rules" 1967 ( 9) . these rules, though unfortunately departing from lubetzky's principles in one or two areas provide a solid basis for standardisation. we are fortunate to have them available at such a critical moment in the history of librarianship. they must form the basis of an international marc project. 
of all the great libraries of the world, the library of congress has done more than any other to promote international cataloguing standards. it is now in a uniquely favourable position to promote these standards marc international/coward 185 through its own marc ii project. the lc marc ii project, together with the bnb marc ii project, can provide the foundation of the international marc network. these projects alone cover the total field of english language material and yet already the basic requirement of standardisation is absent. the library of congress finds itself unable, for administrative reasons, to adopt fully the code of rules it worked so hard to produce and which british librarians virtually accepted as it stood in the interests of international standardisation. that a great library should be in this position is understandable. what is less understandable is that the library of congress should transfer the non-standard cataloguing rules established by an internal administrative decision to prescription of cataloguing data in the machine readable record that it is now issuing on an international basis. one of the great advantages of machine readable records is that they can simultaneously be both standard and non-standard. there is no reason that the library of congress, or any national agency, should not provide for international exchange a standard marc record together with any local information the library might want. if as a result other national agencies are encouraged to do the same, it will not be long before the absurdity and expense of examining each record received via the international network in order to change a standard heading to a local variant, will become apparent. the british national bibliography has already accepted the anglo-american code and by this action has now done much to promote its acceptance in great britain. incomplete acceptance of the code is really the only significant difference between the two marc projects. at a detailed level there are differences in some of the subfield codes. these are chiefly due to the fact that the british marc committee was particularly concerned with the problems of filing bibliographic entries, and as no generally accepted filing code exists it was decided to provide a complete analysis of the fields in headings. this analysis will enable the bnb marc data base to be arranged in different sequences to test the rules now being prepared. the other difference, or extension, in the british marc format is the provision of cross references with each entry, on the assumption that in a marc system a total pack of cataloguing data should be provided. however these differences reflect the experimental nature of the british project, not the fundamental differences in opinion. in this paper an attempt has been made to look at the british and american marc projects not as systems for distributing bibliographic information but as the forerunners of an international bibliographic network. intensive efforts have been made to lay a foundation for this international network. the anglo-american code provides a sound cataloguing base, the generalised communications format provides a machine base, and the standard book numbering system provides an international identification 186 journal of library automation vol. 2/ 4 december, 1969 system. these developments are all part of a general move towards real cooperation in the provision of bibliographic services. they must now be brought together in an international marc network. 
references
1. avram, henriette d.: the marc pilot project (washington: library of congress, 1968).
2. u.s. library of congress. information systems office. the marc ii format: a communications format for bibliographic data. prepared by henriette d. avram, john f. knapp and lucia j. rather (washington, d.c.: 1968).
3. "preliminary guidelines for the library of congress, national library of medicine, and national agricultural library implementation of the proposed american standard for a format for bibliographic information interchange on magnetic tape as applied to records representing monographic materials in textual printed form (books)," journal of library automation 2 (june 1969): 68-83.
4. bnb marc documentation service publications, nos. 1 and 2 (london: council of the british national bibliography, ltd., 1968).
5. coward, r. e.: "the united kingdom marc record service," in cox, nigel s. j.; grose, michael w.: organization and handling of bibliographic records by computer (hamden, conn.: archon books, 1967).
6. cox, nigel s. m.; dews, j. d.: "the newcastle file handling system," in op. cit. (note 4).
7. lubetzky, seymour: code of cataloging rules ... prepared for the catalog code revision committee ... with an explanatory commentary by paul dunkin (chicago: american library association, 1960).
8. international federation of library associations. international conference on cataloguing principles, paris, 9th-18th october, 1961: report; edited by a. h. chaplin.
9. anglo-american cataloging rules. british text (london: library association, 1967).

editorial board thoughts: requiring and demonstrating technical skills for library employment
emily morton-owens (egmowens@upenn.edu), a member of the ital editorial board, is director of digital library development and systems, university of pennsylvania libraries, philadelphia, pennsylvania.
information technology and libraries | september 2016

recently i've been involved in a number of conversations about technical skills for library jobs, sparked by an ital article by monica maceli1 and a code4lib presentation by jennie rose halperin.2 maceli performed a text analysis of job postings on code4lib to reveal what skills are co-occurring and most frequent. halperin problematized the expense of the mls credential in comparison to the qualifications actually required by library technology jobs and the salaries offered for technical versus nontechnical work. this work has inspired many conversations about the shift in skills required for library work, the value placed on different kinds of labor, and how mls programs can teach library technology. during a period of hiring at my institution and through teaching a library school course in which many of the students are on the brink of graduation, my attention has been called particularly to one point in the library employment process: job postings. these advertisements are the first step in matching aspiring library staff with the real-life needs of libraries—where the rubber meets the road between employer expectations and new-grad experience. most libraries already use the practice of distinguishing between required and preferred qualifications, which is a good start, especially for technology jobs where candidates may offer strong learning proficiency yet lack a few particular tools. although there have been conflicting interpretations of the hewlett-packard research suggesting that men are more likely than women to apply to jobs when they don't meet all the requirements,3 i observe a general tendency among graduating students to err on the side of caution because they're not sure which qualifications they can claim.
among my students, for example, constant confusion attends the years of experience required. is this library experience? general job experience? experience at the same type of library? paid or unpaid? postings are often ambiguous and students may choose to apply or not. similarly, there are questions about what extent of experience qualifies someone to know a technology: mastering it through creating new projects at a paid job, experience maintaining it, or merely basic familiarity? not knowing who has been hired, and on the basis of what kind of experience, is a gap for researchers trying to close the loop on job advertisements. even when a job posting has avoided an overlong list of required technical skills, it might still be expressing a narrow sense of what's required to qualify. someone who understands subversion will be capable of understanding git, so we see plenty of job advertisements that ask for experience with "a version control system (e.g. git, subversion, or mercurial)." i recently polled staff in our department and found very few of us with bachelor's degrees in technical subjects. more of us had come to working in library technology through work experience or graduate programs. and yet, our job postings contained long statements that conflated education and experience, such as "bachelor's degree in computer science, information science, or other relevant field and at least 3 years of experience application development in object oriented and scripting languages or equivalent combination of education and experience. master's desirable." i edited our statement to more clearly allow a combination of factors that would show sufficient preparation: "bachelor's degree and a minimum of 3-5 years of experience, or an equivalent combination of education and experience, are required; a master's degree is preferred," followed by a separate description of technical skills needed. this increased the number and quality of our applications, so i'll remain on the lookout for opportunities to represent what we want to require more faithfully and with an open mind. meanwhile, on the other side of the table, students and recent grads are uncertain how to demonstrate their skills. first, they're wondering how to show clearly enough that they meet requirements like "three years of work experience" or "experience with user testing" so that their application is seriously considered. second, they ask about possibilities to formalize skills. recently, i've gotten questions about a certificate program in ux and whether there is any formal certification to be a systems librarian. surveying the past experience of my own network—with very diverse paths into technology jobs ranging from undergraduate or second master's degrees to learning scripting as a technical services librarian to pre-mls work experience—doesn't suggest any standard method for substantiating technical knowledge. once again, the truth of the situation may be that libraries will welcome a broad range of possible experience, but the postings don't necessarily signal that.
some advice from the tech industry about how to be more inviting to candidates applies to libraries too; for example, avoiding "rockstar"/"ninja" descriptions, emphasizing the problem space over years of experience,4 and designing interview processes that encourage discussion rather than "gotcha" technical tasks. at penn libraries, for example, we've been asking developer candidates to spend a few hours at most on a take-home coding assignment, rather than doing whiteboard coding on the spot. this gives us concrete code to discuss in a far more realistic and relaxed context. while it may be helpful to express requirements better to encourage applicants to see more clearly whether they should respond to a posting, this is a small part of the question of preparing new mls grads for library technology jobs. the new grads who are seeking guidance on substantiating their skills are the ones who are confident they possess them. others have a sense that they should increase their comfort with technology but are not sure how to do it, especially when they've just completed a whole new degree and may not have the time or resources to pursue additional training. even if we make efforts to narrow the gap between employers and jobseekers, much remains to be discussed regarding the challenge of readying students with different interests and preparation for library employment. library school provides a relatively brief window to instill in students the fundamentals and values of the profession, and it can't be repurposed as a coding academy. there persists a need to discuss how to help students interested in technology learn and demonstrate competencies rather than teaching them rapidly shifting specific technologies.

references
1. monica maceli, "what technology skills do developers need? a text analysis of job listings in library and information science (lis) from jobs.code4lib.org," information technology and libraries 34, no. 3 (2015): 8-21, doi:10.6017/ital./v23i3.5893.
2. jennie rose halperin, "our $50,000 problem: why library school?" code{4}lib, http://code4lib.org/conference/2015/halperin.
3. tara sophia mohr, "why women don't apply for jobs unless they're 100% qualified," harvard business review, august 25, 2014, https://hbr.org/2014/08/why-women-dont-apply-for-jobs-unless-theyre-100-qualified.
4. erin kissane, "job listings that don't alienate," https://storify.com/kissane/job-listings-that-don-t-alienate.

circulation systems past and present*
maurice j. freedman: school of library service, columbia university, new york city.
a review of the development of circulation systems shows two areas of change. the librarian's perception of circulation control has shifted from a broad service orientation to a narrow record-keeping approach and recently back again. the technological development of circulation systems has evolved from manual systems to the online systems of today. the trade-offs and deficiencies of earlier systems in relation to the comprehensive services made possible by the online computer are detailed.
*this article is adapted from a speech delivered at rutgers university. manuscript received november 1980; revised may 1981; accepted july 1981.
in her 1975 library technology reports study of automated circulation control systems, barbara markuson contrasted what she called "older" and "more recent" views of the circulation function. the "older" or traditional view was that circulation control centered on conservation of the collection and recordkeeping. the "more recent" attitude encompasses "all activities related to the use of library materials."1 it appears that this latter outlook is not as new as markuson had suggested. in 1927, jennie m. flexner's circulation work in public libraries described the work of circulation as the "activity of the library which through personal contact and a system of records supplies the reader with the [materials] wanted."2 flexner went on to characterize four major functions of circulation as follows: (1) the staff must know the books in the collection, and have a working familiarity with them. (2) the staff must know the readers; their wants, interests, etc. (3) the circulation staff must fully understand the library mission and policies and work harmoniously with those in related departments. (4) the circulation department has its own particular duty to perform ... effective routines and techniques must be established by the library and mastered by the staff if the distribution of books is to be properly accomplished and the public is to have the fullest use of the resources of the institution. the library must be able to locate books, on the shelves or in circulation; to know who is using material and how the reader can be traced, if he is misusing or unduly withholding the books drawn.3 the function of circulation has not changed since flexner's description. even within the context of online circulation systems, it is absolutely essential that the circulation system be seen in as broad a context as possible. it is not merely an electromechanical phenomenon staffed by automaton-clerks. circulation services involve that function which is ultimately one of the most fundamental: the satisfactory bringing together of the library user and the materials sought by that person. it follows, then, that the mechanism and means of delivery and control of the service are only a small part, and certainly not the most important part, of the circulation function. knowing your collection, your readers, and clearly knowing your library's mission are crucial prerequisites for the effective circulation of library materials. an examination of the history of circulation systems and their evolution to the present state reveals the change in outlook from a narrow view of the circulation function to a broader view. let us begin by establishing the basic elements of record keeping, upon which circulation control is based. there are three categories of records: 1. for the collection of materials, books, tapes, microforms, etc., comprising the library. 2. for the readers or users of the library service. 3. for the wedding or concatenation of the first two, i.e., the library user's use or borrowing of the library's materials. a minimal circulation model is a set of procedures or recordkeeping with respect to only the third category, i.e., records of the materials held by the library user outside of the library. a total or complete system would then be one that provides for all three categories.
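as a rough illustration of these three categories, the python sketch below (with invented call numbers and names) keeps a collection file, a borrower file, and a loan file; a minimal system keeps only the loans, while a total system can also join a loan back to the other two files.

# a minimal sketch, in python with invented data, of the three categories of records
# just described. a "minimal" circulation model keeps only the loan records; a "total"
# system also keeps the collection file and the borrower file and can join them.

collection = {                                  # 1. records for the materials themselves
    "629.8 f54": "flexner, circulation work in public libraries",
    "025.3 m37": "markuson, automated circulation control",
}

borrowers = {                                   # 2. records for the readers
    101: {"name": "a. reader", "address": "12 main st."},
}

loans = [                                       # 3. the concatenation of the first two
    {"call_no": "629.8 f54", "borrower": 101, "date_charged": "1981-11-02"},
]

def is_charged(call_no):
    """a minimal system can answer little more than this."""
    return any(loan["call_no"] == call_no for loan in loans)

def who_has(call_no):
    """a total system can also join a loan back to the borrower file."""
    for loan in loans:
        if loan["call_no"] == call_no:
            return borrowers[loan["borrower"]]
    return None

print(is_charged("629.8 f54"), who_has("629.8 f54"))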
using these criteria to judge the level of control provided by the various circulation systems of the past, let us review. the earliest method of circulation control was the chain method. in this case, "circulation" is not an accurate term; "use" of materials is more appropriate, as the collection did not circulate. books were chained to the wall and the user did not take the material outside of the library. the minimal circulation model is not met, and records were not required. several hundred years later, the ledger system's first iteration involved a simple notation into a ledger. the identification of the book (call number and/or author and title) and the borrower's identification were recorded. upon the return of the book, the borrower or the receiving clerk initialed the ledger entry or otherwise indicated the return of the item. minimal circulation control is met. a more developed or sophisticated ledger system exceeded this minimal circulation model. the new ledger had each page headed by a different borrower or registration number. consequently, a given user had all of his or her charges recorded on the given page indicated by the user's number. the economy of not having to write the borrower's name for every transaction was made possible through the creation of a file of patron records linked to the ledger page by common registration numbers. in effect, this was our first "automation." the use of a master file in support of a numbered page provided information that had previously been handwritten every time someone wished to borrow books from the library. the new ledger system also allowed for a more orderly control of charges. only the borrower's number was needed to get at the page of transactions relating to that borrower, as opposed to the former method (a benchmark method, in a sense) in which the transactions were chronologically entered and had no other ordering whatsoever. even with the improved ledger system, though, the only ordering was by borrower number and date of issue to the borrower. there was no arrangement that provided for sequencing or finding the books borrowed. the need to identify borrowed books led to the dummy system. every book had a concomitant dummy book (or large card) that had a ruled sheet of paper with the book identification information on it and the borrower's name and/or number. when a user wished to borrow a book, the dummy was pulled from a file and the borrower information was written on the sheet of paper. the dummy was then filed on the shelf occupying the space formerly occupied by the book itself. when the book was returned, it was reshelved, the dummy removed, and the circulation transaction was crossed out. this system is interesting in that it provides for a complete inventory control. either all items are on the shelf in proper sequence or a physical surrogate or record for circulating items is substituted and placed in proper sequence. one has instant and, in effect, "online" access to the presence or absence of materials if one has the call number and can go to the shelf. unlike most systems that can only tell whether or not the book is present, the dummy system tells who has the book and when it was charged. in terms of a minimal model, this system provided less and more than the ledger system. if a reader wanted a list of books he or she borrowed, the reader would have to view every dummy and see if the listed item was charged to him or her.
in contrast, the ledger system served such a request well, though every page of the ledger might have to be examined to find out who had borrowed a book not found on the shelf. leaping past several systems, let us now discuss the newark system, the overwhelmingly prevalent system in the united states today (if we include the mechanical or electromechanical versions of dickman, gaylord (the manual, not automated), and demeo). the newark system incorporated the best features of the systems already mentioned. a separate registration file was kept which provided both alphabetic access by patron and numeric access by patron registration number. consequently, the recording of the borrower's identification during circulation transactions only involved the notation of the number. for book identification, a card and matching pocket were placed in each book with the call number and/or author-title identification information. the circulation transaction involved the removal of the card from the pocket and the entering on it, as in the dummy system, of the date of the transaction and the borrower number. the cards for all of the books borrowed on a given day were aggregated and filed in shelflist sequence in a tray headed by the date of the transactions. resorting to computer jargon, the major or primary sort of the book cards (read circulation cards) was by date, but the minor sort was by call number. consequently, if one wanted to know the status of a given book and one had the call number, it would not take too long to search, even with a file as large as the one in the main branch of newark public library, by looking for the item in all of the different days' charges. when a book was returned, the clerk noted from the date-of-issue card inserted in the book's pocket the tray in which to search, and the matching call number on the pocket was used for discharging the book, i.e., removing the charge card from the tray and replacing it in the book. the combination of the books on the shelf plus the cards in the different trays in shelflist order constituted a complete inventory. additionally, the trays of cards comprised a comprehensive record of all current charges, i.e., all transactions by date, call number, and borrower, with borrower number pointing to fuller information in the registration file. looking back at our basic model, the newark system offered not just the minimum (a record of the item and the borrower who took it) but also introduced a major step toward inventory control. there was an inventory sequence involved, or, more accurately, several inventory sequences, one for each given collection (or day) of circulation transactions. what was still missing was a record by borrower of what was charged to him or her. in the original newark system, the borrower's card had entered upon it dates of issue and return of items. this way, even if the library could not tell the user what items (s)he had, the user's card would reflect the number of items outstanding. the handling of reserves, renewals, and overdue notices occurred as follows: a colored clip or some indicator on a circulation card would be used to indicate a reserve. a renewal would be handled the same as a return except the person would wait while the charge card was pulled from the appropriately dated tray, and assuming that no reserves had been placed on the circulation card, the book would be recharged (i.e., renewed) to the borrower. overdues automatically presented themselves by default.
cards left in a tray after a predetermined number of days represented charges for which overdues were to be sent. the tray was taken to the registration file and the numerically sequenced registration cards for the delinquent borrowers removed so that notices could be prepared and sent. then the registration slips and circulation cards had to be refiled at the completion of the process. essentially, most subsequent systems are variants on the newark system. the mcbee key-sort system involves the use of cards with prepunched holes around the edges, one of which can be notched to indicate the date an item is due. the cards are arranged by call number creating a single sequence. the insertion of a knitting-needle-like device through a given hole will allow all of the books overdue for a given date to fall free of the deck. this system is like the newark system in that it has inventory and date access, but unlike newark it places a horrible burden on the borrower. each card has (written by the borrower) the borrower's name and address and the call number, author, and title of the book. thus, the library is saved the labor of creating circulation cards and maintaining registration records for every patron: all of the information needed is on the charge card. but here, as marvin scilken has pointed out, the burden of the library's tasks is merely passed on to the users. this point should be emphasized. the next system to be considered is the photo-charge system. microphotos are taken of the borrower's card, which has the name and address on it, the book card (as in the newark book identification card), and a sequentially numbered date-of-issue or date-due slip. again, as with the mcbee, since the photo record includes the borrower's name and address, one can throw away registration files. also, a list or range of transaction numbers is kept by date used. since the numbered date-of-issue slip is placed in the book at the time of charging, and one removes it when the book is returned, it is a simple step to cross off or remove the number on the slip from its corresponding duplicate on the list of numbers for that day's transactions. overdue transactions are found by searching for unchecked transaction numbers on the numerically sequenced microfilm. this system does meet the criterion of the minimal model, a record of the user's use of the item. in terms of labor intensity, one has eliminated the maintenance of charge-card files and registration files by a single microfilm record. reserves, though, are terribly time-consuming with the photo-charge system: each returned book, before it can be returned to the shelf or renewed, must be searched against a call-numbered sequence of reserve cards. academic libraries would not use this kind of system because call-number access is a necessity, especially in relation to recalls of long-loaned items. the elimination of paper files is what so commended this system to public libraries over the newark-based systems. but, as was noted, one has virtually no way of determining who took a book out or when it is due back except, in principle, by searching all of the reels of microfilm. some variants on this microfilm system were developed. bro-dart marketed a system that thermographically produced eye-readable records instead of microimages. such was the state of circulation systems before computers began to be used.
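before turning to the computer-based systems, the newark filing logic described above can be restated in a few lines of python. this is only an illustrative sketch with invented data and an assumed loan period, not a description of any actual installation: charge cards are grouped by date of transaction (the primary sort), kept in call number order within each day's tray (the minor sort), and the overdues are simply the cards still sitting in a tray after the loan period has elapsed.

# a rough sketch, in python with invented data, of the newark filing logic described above.
from datetime import date, timedelta

loan_period = timedelta(days=28)                # assumed loan period, for illustration

trays = {                                       # one tray of (call_no, borrower_no) per charge date
    date(1981, 10, 1): [("025.3 m37", 102), ("629.8 f54", 101)],
    date(1981, 11, 2): [("973.7 l63", 103)],
}

def find_charge(call_no):
    """scan each day's tray, in call number order, for a charged book."""
    for charge_date in sorted(trays):
        for cn, borrower in sorted(trays[charge_date]):
            if cn == call_no:
                return charge_date, borrower
    return None

def overdue_trays(today):
    """the trays whose cards have been left past the loan period hold the overdues."""
    return {d: cards for d, cards in trays.items() if today - d > loan_period}

print(find_charge("973.7 l63"))
print(overdue_trays(date(1981, 11, 16)))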
the discussion of the involvement of computers that follows can be separated by the type of hardware: main frames, minicomputers, and microcomputers. the main-frame computer has been used primarily in the past as a processing unit for batches of circulation transactions collected and fed to it via punched cards, terminals, or minicomputers. call number, author and title (albeit brief), and user identification number were captured for each transaction. in the 1960s and into the early 1970s, this information would be batch-processed by the computer and a variety of reports would be produced. what the computer does, then, is keep track of numbers, their ranges, and the dates of the ranges. but the computer can do much more than this. it is capable, as none of the nonautomated systems were, of rearranging the data input and then comparing and tabulating them as desired and appropriate. consequently, the fact that the call number, author, and title are stored by the machine means that lists or files can be arranged by any of these elements. the same goes for date of transaction. as to borrower identification number, a master file much like the newark registration file is kept (only now in its machine-readable form), and the computer does the comparing at high speed instead of the clerk taking the charge record and going to the numeric file to find the name and address of the borrower. of course, the computer can then readily and quickly print out overdue notices with an obvious absence of clerical support and labor intensity. as we all know, the rate of increase of labor costs is increasing, and the rate of increase of computer costs is decreasing. two kinds of large batch-oriented computer systems have been used: one that kept track of items in circulation only (the absence system, in which only items absent from the collection were tracked), and one that kept track of the entire collection (the inventory system).4 normally, identification numbers were used for patrons in either system. although relatively rare in academic and public libraries, the mainframe-based online system is also in use. ohio state university is famous for its online system. what is meant here is that all transactions are immediately recorded and all files are instantly updated. printing is still necessary for overdue notices, but printed circulation lists are not necessary because of the online answers to queries regarding books or patrons now possible through terminals distributed to appropriate locations. the minicomputers came on the scene in two stages. clsi's entrance in 1973 utilized one of the early minicomputers, quite small by today's standards. for relatively small libraries that had not begun to dream of having their own computers, it became possible to have an entire inventory (in abbreviated form) and an entire patron file online. consequently, all of the access power of the newark system, and none of its labor intensity, was available online and much more besides. few libraries could afford the main-frame system of ohio state, but many could pay for clsi's, and indeed they did. in the last few years, minicomputers have grown several magnitudes above the capacity and speed of main-frame computers of the 1960s.
consequently, such firms as dataphase, systems control, geac, gaylord, and others offer these larger minis, which can now support online the needs of large branch systems with inventories of hundreds of thousands of books. incidentally, clsi, with a new mini line, can do this now as well. both the mini- and maxi-based systems do all of the basic work originally outlined: the whole inventory can be accessed online or with printed lists arranged by author, title, or call number (and, presently, some vendors offer online subject access and cross-references); access can also be made by patron's name. further, the basic transaction (item, borrower, and date) is recorded and checked for holds or delinquency before it is accepted. without overly extolling the present state of the art, it should be said that all of the information identified as important in the earliest systems is now not only available in a far quicker and more usable fashion, it can be manipulated by the machine in a variety of ways to meet and serve management objectives not considered practicable in the past. peter simmons showed how collection development could be aided by automatically generating purchase orders when reserves exceeded a specified acceptable level.5 all kinds of statistical data regarding collection and patron use can be generated that could not have been possible in a manual mode. while at the university of southwestern louisiana, william mcgrath was able to adjust book budget allocations in terms of collection use and undergraduate major in a most interesting fashion.6 the net result was an empirically based expenditure of book funds. now the microcomputer or microprocessor is the newly emerging phenomenon, and in many respects it is not unlike the minicomputer of the early 1970s. it is being used to perform single data-recording functions, and is also being seen as the link to the larger computer. so we have moved from chained books to microcomputers the size of a desk top. originally, a great deal of information was captured at great expense and laboriously maintained. certainly the handwritten and typed records of the newark system, although relatively comprehensive, were obtained and preserved at great cost. and, despite it all, there were real limitations of access. the succeeding mcbee and photo-charging systems appreciably cut out-of-pocket costs to the library, but either passed labor directly on to the user, or eliminated access altogether. book or patron access is virtually impossible with the photo-charging method. simply put, that system tells what is overdue, and that's all. the entry in the 1960s of the computer radically altered the ground rules. now all sequences of encoded elements are possible, and management information can be derived. important statistical data pertaining to collection use and library users can be obtained by further manipulating the data accumulated in the circulation process. it is now possible for all but the smallest and the very largest libraries to have access to and control of their materials through the current range of minicomputers on the market. jennie flexner told us that circulation had to be more than maintenance and record keeping of loan and borrower transactions. through the advances of the computer technology and its application to circulation control, we have finally seen what seems to be an optimization of the recordkeeping process and, by extension, an improvement in circulation service.
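simmons' threshold rule mentioned above can be sketched in a few lines of python; the acceptable ratio and the sample figures below are assumptions made for illustration, not his actual program.

# a minimal sketch, in python with invented data, of the kind of threshold rule simmons
# describes: when the reserve queue for a title passes an acceptable level relative to
# the copies held, flag the title for an additional purchase order.

reserves_per_copy = 4        # assumed acceptable ratio of outstanding reserves to copies

titles = [
    {"call_no": "629.8 f54", "copies": 2, "reserves": 11},
    {"call_no": "025.3 m37", "copies": 1, "reserves": 2},
]

def purchase_orders(titles, ratio=reserves_per_copy):
    """return the titles whose reserves exceed the acceptable level, with copies to order."""
    orders = []
    for t in titles:
        if t["reserves"] > ratio * t["copies"]:
            needed = -(-t["reserves"] // ratio) - t["copies"]   # ceiling division
            orders.append({"call_no": t["call_no"], "copies_to_order": needed})
    return orders

print(purchase_orders(titles))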
if instantaneous access to patron files, inventory files, and outstanding transaction files through a variety of modes and computer-developed management data does not constitute that optimization, it will have to do until the real thing comes along.
acknowledgment
the author is deeply indebted to susan e. bourgault for her editorial assistance.
references
1. barbara evans markuson, "automated circulation control," library technology reports (july and sept., 1975), p.6.
2. jennie m. flexner, circulation work in public libraries (chicago: american library assn., 1927), p.1.
3. ibid., p.2.
4. robert mcgee, "two types of design for online circulation systems," journal of library automation 5:185 (sept. 1972).
5. peter simmons, collection development and the computer (vancouver, b.c.: univ. of british columbia, 1971), 60p.
6. william e. mcgrath, "a pragmatic allocation formula for academic and public libraries with a test for its effectiveness," library resources & technical services 19:356-69 (fall 1975).
maurice j. freedman is an associate professor at the school of library service, columbia university, new york city.
dispelling five myths about e-books
james e. gall (james.gall@unco.edu) is assistant professor of educational technology at the university of northern colorado, greeley.
information technology and libraries | march 2005

some considered 2000 the year of the e-book, and due to the dot-com bust, that could have been the format's high-water mark. however, the first quarter of 2004 saw the greatest number of e-book purchases ever with more than $3 million in sales. a 2002 consumer survey found that 67 percent of respondents wanted to read e-books; 62 percent wanted access to e-books through a library. unfortunately, the large amount of information written on e-books has begun to develop myths around their use, functionality, and cost. the author suggests that these myths may interfere with the role of libraries in helping to determine the future of the medium and access to it. rather than fixate on the pros and cons of current versions of e-book technology, it is important for librarians to stay engaged and help clarify the role of digital documents in the modern library. although 2000 was unofficially proclaimed as the year of the electronic book, or e-book, due in part to the highly publicized release of a stephen king short story exclusively in electronic format, the dot-com bust would derail a number of high-profile e-book endeavors. with far less fanfare, the e-book industry has been slowly recovering. in 2004, e-books represented the fastest-growing segment of the publishing industry. during the first quarter of that year, more than four hundred thousand e-books were sold, a 46 percent increase over the previous year's numbers.1 e-books continue to gain acceptance with some readers, although their place in history is still being determined—fad? great idea too soon? wrong approach at any time? the answers partly depend on the reader's perspective. the main focus of this article is the role of e-book technologies in libraries. libraries have always served as repositories of the written word, regardless of the particular medium used to store the words. from the ancient scrolls of qumran to the hand-illuminated manuscripts of medieval europe to the familiar typeset codices of today, the library's role has been to collect, organize, and share ideas via the written word. in today's society, the written word is increasingly encountered in digital form.
writers use word processors; readers see words displayed; and researchers can scan countless collections without leaving the confines of the office. for self-proclaimed book lovers, the digital world is not necessarily an ideal one. emotional reactions are common when one imagines a world without a favorite writing pen or the musty-smelling, yellowed pages of a treasured volume from youth. one of the battle lines between the traditional bibliophile and the modern technologist is drawn over the concept of the e-book. some see this digital form of written word as an evolutionary step beyond printed texts, which have been sometimes humorously dubbed tree-books. although a good deal of attention has been generated by the initial publicity regarding newer e-book technologies, the apparent failures of most of them have begun to establish myths around the concept. abram points out that the relative success of e-books in niche areas (such as reference works) is in direct contrast with public opinion of those purchasing novels and popular literature through traditional vendors.2 crawford paraphrases lewis carroll in describing this confusion: "when you cope with online content about e-books, you can believe six impossible things before breakfast."3 incidentally, this article will attempt to dispel a mere five of the myths about e-books. the future of e-books and the critical role of libraries in this future are best served by uncovering these myths and seeking a balanced, reasoned view of their potential. a 2002 consumer survey on e-books found that 67 percent of respondents wanted to read an e-book, and 62 percent wanted that access to be from a library.4 underlying this position is the assumption that the ideas represented by the written word are of paramount importance to both writers and readers. it is also assumed that libraries will continue their critical role in collecting, organizing, and sharing information.
myth 1—e-books represent a new idea that has failed
many libraries have invested in various forms of e-book delivery with mixed results.5 sottong wisely warns of the premature adoption of e-book technology, which he dubs a false pretender as a replacement to printed texts.6 however, the last five years are but a small part of a longer history, and presumably, a still longer future. as is often the case with computer jargon, the term e-book has emerged and gained currency in a very short amount of time. however, the concept of providing written texts in an electronic format has existed for a long time, as demonstrated by bush's description of the memex.7 the gutenberg project put theory into practice by converting traditional texts into digital files as early as 1971.8 even if the e-book merely represents the latest incarnation of the concept, it does so tenuously. books in their present form have a history of hundreds of years, or thousands if their parchment and papyrus ancestors are included. this history is rich with successes and failures of technology.
for example, petroski presents an interesting historical examination of the problem of storing books when the one book–one desk model collapsed under the proliferation of available texts.9 similarly, a determination on the success or failure of e-books, or digital texts, based upon a relatively short period of time, is fraught with difficulty. rather, it is important to look at recent developments as merely a next step. the technology is clearly not ready for uncritical, widespread acceptance, but it is also deserving of more than a summary dismissal.
myth 2—e-books are easily defined
the term e-book means different things depending on the context. at the simplest, it refers to any primarily textual material that is stored digitally to be delivered via electronic display. one of the confusing aspects of defining e-books is that in the digital world, information and the media used to store, transfer, and view it are loosely coupled. an e-book in digital form can be stored on cd-rom or any number of other media and then passed on through computer networks or telephone lines. the device used to view an e-book could be a standard computer, a personal digital assistant (pda), or an e-book reader (the dedicated piece of equipment on which an e-book can be read; confusingly, also referred to as an e-book). technically, virtually any computing device with a display could be used as an e-book reader. from a practical point of view, our eyes might not tolerate reading great lengths of text on a wireless phone, and banks will not likely provide excerpts of chaucer during atm transactions. another important factor in defining e-books is the actual content. a conservative definition is that an e-book is an electronic copy or version of a printed text. this appears to be the predominant view of publishers. purists often maintain that a true e-book is one that is specifically written for that format and not available in traditional printed form.10 this was one of the categories of the short-lived (2000–2002) frankfurt e-book awards. of course, the multitude of textual materials that could be delivered via the technology exceeds these definitions. magazines, primary-source documents, online commentaries and reviews, and transcripts of audio or video presentations are just a short list of nonbook materials that are finding their way into e-book formats. one can note with some sense of irony that the technology behind the web was originally designed as a way for scientists to disseminate research reports.11 despite the web's popularity, reading research reports makes up an exceedingly small percentage of its use today. although there is a continuing effort to reach a common standard for e-books (see www.openebook.org/), the current marketplace contains numerous noncompatible formats. this noncompatibility is the result of both design and competitive tradeoffs. in the case of the former, there is a distinct philosophical difference between formats that attempt to retain the original look and navigation of the printed page (such as adobe's popular pdf files) versus those that retain the text's structure but allow variability in its presentation (as best exemplified by the free-flowing nature of texts presented as html pages). this difference can also be seen in the functionality built around the format. traditional systems provide readers with familiar book characteristics such as a table of contents, bookmarks, and margin notes, a view that could be named bibliocentric.
the alternative is one that takes more advantage of the new medium and could be labeled technocentric, and can most easily be seen in the extensive use of hyperlinking.12 the simplest use of hyperlinking provides an easy form of annotating texts and presenting related texts. on the other extreme, hyperlinks are used in the creation of nonlinear texts in which the followed links provide a unique context for building meaning on the part of the reader.13 it is interesting to note that a preliminary study of e-book features found that the most desirable features tended to reflect the functionality of traditional books and the least desirable features provided functionality not found there.14 competitive tradeoffs are a critical issue at the current point of e-book development. the current profit models of publishing entities and copyright concerns of authors seem naturally opposed to e-book formats in which texts were freely shared, duplicated, and distributed. for example, the open ebook forum is the most prominent organization devoted to the development of standards for e-book technologies. in late 2004, their web site listed seventy-six current members. although the american library association is a member, it is one of only six members representing library-oriented organizations. in comparison, thirty-five members (or 46 percent) are publishing organizations, and thirteen (or 17 percent) are technology companies.15 the number of traditional publishers versus technology companies on this list may suggest that a bibliocentric view of e-books would be more favored. this also appears to confirm one media prediction that traditional publishers would continue to dominate efforts with this new medium.16 however, the limited representation of libraries in this endeavor is troubling (despite the disclaimer of using an admittedly rough metric for measuring impact). it is clear that many industry formats attempt to limit the ability to distribute materials by keying files so that they may only be viewed on one device or a specific installed version of the reader software. this creates technological problems for entities like libraries that attempt to provide access to information for various parties. the concept of fair use of copyrighted materials has to be reexamined under an entirely new set of assumptions. another irony is that the availability of free, public-domain materials in e-book format can be viewed as negative by the publishing industry. after investing considerable time and effort in developing e-book technology, publishers would prefer that users continue purchasing new e-book material rather than spend time reading the vast library of free historical material. many of these content issues are currently being played out in courts and the marketplace, particularly with regard to digital music and video.17 although one can humorously imagine the so-called problems associated with a population obsessed with downloading and reading great literature, the precedents set by these popular media will have a direct impact on the future of digital texts. despite the labor required to scan or key entire print books into digital formats, there have been some reports of this type of piracy.18 other models for the dissemination of digital intellectual property that are not determined by traditional material concerns of supply and demand will continually be attempted.
for example, nelson predicted a hypertext-publishing scheme in which all material was available, but royalties were distributed according to actual access by end users.19 theoretically, such a system would provide a perfect balance between access and profitability. in nelson's words, "nothing will ever be misquoted or out of context, since the user can inquire as to the origins and native form of any quotation or other inclusion. royalties will be automatically paid by a user whenever he or she draws out a byte from a published document."20

myth 3—e-books and printed books are competing media

many, if not most, published articles regarding e-books follow classic plot construction; the writer must present a protagonist and an antagonist. bibliophiles cast the printed page as the hero and the e-book as the potential bane of civilization. proulx, one such author, was quoted as saying, "nobody is going to sit down and read a novel on a twitchy little screen—ever."21 technologists cast the e-book as the electronic savior of text, replacing the tired tradition of the printed word in the same way the printed word replaced oral traditions. hawkins quotes an author who claims that e-books are "a meteor striking the scholarly publication world." his own slightly more restrained view was that e-books had the potential "to be the most far-reaching change since gutenberg's invention."22 grant places this metaphorical battle at the forefront by titling an article "e-books: friend or foe?"23

before deciding which side to take, consider whether this clash of media is an appropriate metaphor. this author has introduced samples of current e-book technology in graduate classes he has taught. when presented with the technology as part of the coursework, students quickly declare their allegiances. bibliophiles most often suggest that the technology will never replace the love of curling up with a good book. the technologists ask how many pages can be stored in the device and then fantasize about the types of libraries they can carry and the various venues for reading that they will explore. however, after a few weeks of using the devices, both groups tend to move to a middle ground of practical use. at that point, the discussion turns to what materials are best left on the printed page (usually described as pleasure reading) and what would be useful in e-book format (reference works, course catalogs, how-to manuals). other instructors have reported similar patterns of use.24 at this point the observation is largely anecdotal, but it does call into question the perceived need for a decisive referendum on the value of e-books.

the issue is not whether e-books will replace the printed word. the concern of librarians and others involved in the infrastructure of the book should be on developing the proper role for e-books in a broader culture of information. unless this approach is taken, the true goal of libraries—disseminating information to the public—will suffer. the gap between bibliophile and technologist approaches can already be seen in the materials available in e-book format. the publishing industry in general treats the e-book as just another format, releasing the same titles in hardcover, book-on-tape, and e-book at the same time. on the opposite end of the spectrum, technologists have adopted various e-book formats for creating and transferring numerous reference documents.
given their preferences, it is easy to find e-book references on unix, html coding, and the like, but there is a scarcity of materials in philosophy, history, and the arts. librarians seem the most appropriate group for developing a shared understanding. publishers and e-book hardware and software manufacturers need to be concerned with the bottom line. libraries, by design, are concerned with the preservation of information and its continued dissemination long after the need to sell a particular book has passed. the hobby of creating and transferring texts to digital form is idiosyncratic and unorganized when viewed from the highest levels. libraries not only contain expertise in all areas of human endeavor, but also have strategies for categorizing and maintaining information in productive ways. in short, libraries are the best line of defense for maintaining the value of the printed page and promoting the value of digital texts.

myth 4—e-books are expensive

a common complaint about e-books is that they are expensive. on the surface, this seems clear. dedicated e-book readers seemed to bottom out at around $300, and a new bestseller in e-book format is priced about the same as the hardcover edition. add the immediate and long-term costs of rechargeable batteries and the electricity needed to power them, and the economic case against the e-book appears closed. what if we turn the same critical eye to the printed page? the manufacture and distribution of printed texts is highly developed and astounding in scale. when gutenberg succeeded in putting the christian bible in the hands of the moneyed public, he surely could not have comprehended the billions of copies that would eventually be distributed. even with the wealth of printed material at hand, one must still consider the high cost of the system.

the law of supply and demand rules books as a tangible product. the most profitable books are those that will reach the most readers. specialized texts have limited audiences and, therefore, will usually be priced higher. this produces problems for both groups. popular texts must be printed in high quantities and delivered to various outlets. unfortunately, the printed page does have maintenance costs. sellen and harper point out that the actual printing cost is insignificant compared with the cost of dealing with documents after printing. they cite one study that indicated that united states businesses spend about $1 billion per year designing and printing forms, but spend an additional $25 to $35 billion filing, storing, and retrieving them.25 books are no different; as any librarian knows, it costs money to maintain a collection and protect texts from the environment and the effects of age. in the retail arena, the competition is fiercer. books that do not sell are removed in favor of those that do. it is estimated that 10 percent of texts printed each year are turned to pulp, although, fortunately, many are recycled.26 the bbc reported that more than two million former romance novels were used in the construction of a new tollway.27 with more specialized texts, the problem is not wealth but scarcity. if a text is not profitable, it will probably go out of print, which is often synonymous with inaccessible. from the publisher's perspective, it is only cost-effective to commit to a printing when the demand is high enough.
a library is a good source of out-of-print texts, provided that it has been funded appropriately to acquire and maintain the particular works that are needed. e-books are not a panacea. other innovations, such as on-demand publishing, may be part of the answer in solving the economic issues regarding collections. however, e-books can help alleviate some of these issues. e-books are easily copied and distributed, which is a boon to the researcher and information consumer. in many cases, the goal is access to information, not possession of a book. easy copying could also benefit the author and publisher if appropriate reimbursement systems are put into place. as previously described, nelson originally envisioned his online hypertext system, xanadu, with a mechanism for royalties based on access—a supply-and-demand system for ideas, not materials.28 the systems used to manage access to digital materials continue to increase in complexity and have spawned a whole new business of digital rights management (drm).29 examples include reciprocal (www.reciprocal.com), overdrive (www.overdrive.com), and netlibrary (www.netlibrary.com). libraries are the specific target of netlibrary, which promotes an e-books-on-demand project that allows free access for short periods of time.30 the creation of a standard digital object identifier (doi) for published materials may also help online publishers and entities like libraries manage their digital collections more easily.31 online music systems, such as apple's itunes (www.itunes.com), strike a workable balance between quick and easy access to music and an economical model for reimbursing artists. e-books also have appeal for special audiences who already require assistive technologies for accessing print collections.32

having discussed the hidden costs of printed texts, it is worth examining another economic issue for e-books: a current trend in usage. despite the availability of dedicated e-book readers, the largest growth in e-book usage is surely in nondedicated devices. e-book-reading software is available for personal computers, laptops, and pdas. according to one source, microsoft had sold four million pocketpc e-book-enabled devices and had two million downloads of the ms reader for the personal computer; palm had sold approximately 20 million e-book-enabled devices; and adobe had more than 30 million acrobat readers downloaded.33 these numbers alone indicate some 24 million reader-capable pdas and 32 million reader-capable pcs, for a total of 56 million devices. although it is difficult to find data on actual use, one online bookseller reported some data on e-book use from an audience survey.34 although 88 percent had purchased books online, only 16 percent had read an e-book (11 percent using a pc, 3 percent on a handheld device, and 2 percent on both). it is presumed that in most cases this equipment was purchased for other reasons, with e-book reading being a secondary function. as such, it would be unfair to include the full cost of this equipment in any calculation of the cost of providing information in an e-book format; by the same logic, the cost of providing artificial lighting in any building where reading takes place would need to be calculated as part of the cost of the printed page. the potential user base for the e-book rises as more computers and pdas are sold, decreasing the need for special equipment. this does not mean that the dedicated e-book reader is obsolete.
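the installed-base arithmetic cited above is easy to retrace; the short python sketch below simply restates the figures quoted from the source and totals them. the variable names are illustrative and not drawn from any published dataset.

```python
# figures as cited in the text (circa 2004); variable names are illustrative only
pocketpc_devices = 4_000_000   # microsoft pocketpc e-book-enabled devices sold
palm_devices = 20_000_000      # palm e-book-enabled devices sold
ms_reader_pc = 2_000_000       # ms reader downloads for the personal computer
acrobat_pc = 30_000_000        # adobe acrobat reader downloads

reader_capable_pdas = pocketpc_devices + palm_devices     # 24,000,000
reader_capable_pcs = ms_reader_pc + acrobat_pc            # 32,000,000
total_devices = reader_capable_pdas + reader_capable_pcs  # 56,000,000

print(f"pdas: {reader_capable_pdas:,}, pcs: {reader_capable_pcs:,}, total: {total_devices:,}")
```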
by most commercial accounts, the apple newton was a failure. its bulky size and awkward interface were the subject of much ridicule. however, it did introduce the concept of the pda, and the success of the palm line of products owes much to the proof of concept provided by the newton. the makers of the portable gameboy videogame system are repositioning it for multimedia digital-content delivery and plan to pilot a flash-memory download system for various content types, including e-books.35 innovative products such as e-paper are already developed in prototype form.36 they are likely to lead to another wave of dedicated e-book readers or provide e-book-reading potential embedded in other consumer applications.

myth 5—e-books are a passing fad

it is trendy to list the failures of past media (such as radio, film, and television) to transform education despite great initial promise.37 however, all those media are still with us, having found particular niches within our culture. if the e-book is viewed as just an alternative format, comparisons with past experiences of library collections containing videotapes, record albums, and such are not appropriate.38 however, if e-books are viewed as a tool or way to access information, the questions change. instead of asking how digital formats will replace print collections, we can ask how an e-book version will extend the reach of our current collection or provide our readers with resources previously unavailable or unaffordable. when trying to locate a research article, one is generally not concerned with whether the local library has a loose copy, bound copy, microform, microfiche, or even has to resort to interlibrary loan. as long as the content is accessible and can be cited, it can be used. electronic access to journal content is becoming more common. perhaps dry journal articles do not conjure up the same romantic visions of exploring the stacks that may hinder greater acceptance of e-books.

a parallel can be drawn to the current work of film-restoration experts. the medium of film has reached an age where some of the earliest influential works no longer exist or are in a condition of rapid deterioration. according to one film site, more than half of the films made before 1950 have already been lost due to decay of existing copies.39 the work of restoration involves finding what remains of a great work in various vaults and collections. often, the only usable film is a second- or third-generation copy. from digitized copies, cleaning, color correction, and other painstaking work, a restored and—it is hoped—complete work emerges. ironically, once this laborious process is completed, a near-extinct classic is suddenly available to millions in the form of a dvd disc at a local retailer. what if the same attitude were taken with the world's collections of printed materials? jantz has described potential impacts of e-book technology on academic libraries.40 lareau conducted a study on using e-books to replace lost books at kent state university, but found that limited availability and high costs made the approach infeasible at the time.41 project gutenberg (www.gutenberg.net) and the electronic text center at the university of virginia (http://etext.lib.virginia.edu) are two examples of scholars attempting to save and share book content in electronic forms, but more efforts are needed. unfortunately, the shift to digital content has also contributed to the sheer volume of content available.
edwards has recently discussed issues in attempting to archive and preserve digital media.42 the web may be suffering from a glut of information, but the content is highly skewed toward the new and technology oriented. in a few years, we may find that nontechnology-related endeavors are no longer represented in our information landscape.

conclusion

the e-book industry is currently dominated by commercial-content providers, such as franklin, and software companies, most notably adobe, palm, and microsoft. traditional print-based publishers have also maintained continued interest in the medium. it is assumed that these publishers had the capital to weather the ups and downs of the industry better than new publishers dedicated solely to e-book delivery. although the contributions and efforts of these organizations are needed, the future of e-book content should not be left to their largesse. when the rocket e-book device was initially released, a small but loyal following of readers contributed thousands of titles to its online library. some of these titles were self-published vanity projects or brief reference documents, but many were public-domain classics, painstakingly scanned or keyed in by readers wishing to share their favorite reads. when gemstar purchased rocket, the software's ability to create non-purchased content was curtailed and the online library of free titles dismantled. apparently, both were viewed as limiting the profitability of the e-book vendor. however, gemstar recently announced that it was discontinuing its e-book reading devices, one assumes due to a lack of profitability. this can be seen as a cautionary tale for libraries, which often define success by the number of volumes available and accessed rather than units sold. committing to a technology that concurrently requires consumer success can be problematic.

bibliophile and technologist alike must take responsibility for the future of our collective information resources. the bibliophile must ensure that all aspects of human knowledge and creativity are nurtured and allowed to survive in electronic forms. the technologist must ensure that accessibility and intellectual-property rights are addressed with every technological innovation. parry provides three concrete suggestions for public libraries in response to new media demands: continue to acknowledge and respond to customer demands, revisit the library's mission statement for currency, and promote or accelerate shared agreements with other institutions to alleviate the high costs of accumulating resources.43 the proper frame of mind for these activities is suggested by levy: we make a mistake, i believe, when we fixate on particular forms and technologies, taking them in and of themselves, to be the carriers of what we want to embrace or resist. . . . it isn't a question, it needn't be a question, of books or the web, of letters or e-mail, of digital libraries or the bricks-and-mortar variety, of paper or digital technologies. . . .
these modes of operation are only in conflict when we insist that one or the other is the only way to operate.44

in the early 1930s, lomax dragged his primitive audio-recording equipment over the roads of the american south to capture the performances of numerous folk musicians.45 at the time, he certainly didn't imagine that at one point in history someone with a laptop computer, sitting in a coffee shop with wireless access, could download the performances of robert johnson from itunes. however, without his efforts, those unique voices in our history would have been lost. it is hoped that the readers of the future will be thanking the library professionals of today for preserving our print collections and enabling their access digitally via our primitive, but evolving, e-book technologies.

references

1. open e-book forum, "press release: record e-book retail sales set in q1 2004," june 4, 2004. accessed dec. 27, 2004, www.openebook.org.
2. stephen abram, "e-books: rumors of our death are greatly exaggerated," information outlook 8, no. 2 (2004): 14–16.
3. walt crawford, "the white queen strikes again: an e-book update," econtent 25, no. 11 (2002): 46–47.
4. harold henke, "consumer survey on e-books." accessed dec. 27, 2004, www.openebook.org.
5. sue hutley, "follow the e-book road: e-books in australian public libraries," aplis 15, no. 1 (2002): 32–37; andrew k. pace, "e-books: round two," american libraries 35, no. 8 (2004): 74–75; michael rogers, "librarians, publishers, and vendors revisit e-books," library journal 129, no. 7 (2004): 23–24.
6. stephen sottong, "e-book technology: waiting for the 'false pretender,'" information technology and libraries 20, no. 2 (2001): 72–80.
7. vannevar bush, "as we may think," atlantic monthly 176, no. 1 (1945): 101–108.
8. michael s. hart, "history and philosophy of project gutenberg." accessed dec. 27, 2004, www.gutenberg.net/about.shtml.
9. henry petroski, the book on the bookshelf (new york: vintage, 2000).
10. steve ditlea, "the real e-books," technology review 103, no. 4 (2000): 70–73.
11. tim berners-lee, weaving the web: the original design and ultimate destiny of the world wide web by its inventor (new york: harpercollins, 1999).
12. james e. gall and annmari m. duffy, "e-books in a college course: a case study" (presented at the association for educational communications and technology conference, atlanta, ga., nov. 8–10, 2001).
13. george p. landow, hypertext 2.0: the convergence of contemporary critical theory and technology (baltimore, md.: johns hopkins univ. pr., 1997).
14. harold henke, "survey on electronic book features." accessed dec. 27, 2004, www.openebook.org.
15. open e-book forum, "press release: record e-book retail sales set in q1 2004."
16. lori enos, "report: e-book industry set to explode," e-commerce times, 20 dec. 2000. accessed dec. 27, 2004, www.ecommercetimes.com/story/6215.html.
17. luis a. ubinas, "the answer to video piracy," mckinsey quarterly no. 1. accessed dec. 27, 2004, www.mckinseyquarterly.com.
18. mark hoorebeek, "e-books, libraries, and peer-to-peer file-sharing," australian library journal 52, no. 2 (2003): 163–68.
19. theodor h. nelson, "managing immense storage," byte 13, no. 1 (1988): 225–38.
20. ibid., 238.
21. jacob weisberg, "the way we live now: the good e-book," new york times, 4 june 2000. accessed dec. 27, 2004, www.nytimes.com.
22. donald t. hawkins, "electronic books: a major publishing revolution. part 1: general considerations and issues," online 24, no. 4 (2000): 14–28.
23. steve grant, "e-books: friend or foe?" book report 21, no. 1 (2002): 50–54.
24. lori bell, "e-books go to college," library journal 127, no. 8 (2002): 44–46.
25. abigail j. sellen and richard h. harper, the myth of the paperless office (cambridge, mass.: mit pr., 2002).
26. stephen moss, "pulped fiction," sydney morning herald, 29 mar. 2002. accessed dec. 27, 2004, www.smh.com.au.
27. bbc news, "m6 toll built with pulped fiction," bbc news uk edition, 18 dec. 2003. accessed dec. 27, 2004, http://news.bbc.co.uk.
28. nelson, "managing immense storage."
29. michael a. looney and mark sheehan, "digitizing education: a primer on e-books," educause 36, no. 4 (2001): 38–46.
30. brian kenney, "netlibrary, ebsco explore new models for e-books," library journal 128, no. 7 (2003).
31. stephen h. wildstrom, "a library to end all libraries," business week (july 23, 2001): 23.
32. terence cavanaugh, "e-books and accommodations: is this the future of print accommodation?" teaching exceptional children 35, no. 2 (2002): 56–61.
33. skip pratt, "e-books and e-publishing: ignore ms reader and palm os at your own peril," knowledge download, 2002. accessed dec. 27, 2004, www.knowledge-download.com/260802-e-book-article.
34. davina witt, "audience profile and demographics," mar./apr. 2003. accessed dec. 27, 2004, www.bookbrowse.com/media/audience.cfm.
35. geoff daily, "gameboy advance: not just playing with games," econtent 27, no. 5 (2004): 12–14.
36. associated press, "flexible e-paper on its way," associated press, 7 may 2003. accessed dec. 27, 2004, www.wired.com/news.
37. richard mayer, multimedia learning (cambridge, uk: cambridge university press, 2000).
38. sottong, "e-book technology."
39. amc, "film facts: read about lost films." accessed june 19, 2003, www.amctv.com/article?cid=1052.
40. ronald jantz, "e-books and new library service models: an analysis of the impact of e-book technology on academic libraries," information technology and libraries 20, no. 2 (2001): 104–15.
41. susan lareau, the feasibility of the use of e-books for replacing lost or brittle books in the kent state university library, 2001, eric, ed 459862. accessed dec. 27, 2004, http://searcheric.org.
42. eli edwards, "ephemeral to enduring: the internet archive and its role in preserving digital media," information technology and libraries 23, no. 1 (2004): 3–8.
43. norm parry, format proliferation in public libraries, 2002, eric, ed 470035. accessed dec. 27, 2004, http://searcheric.org.
44. david m. levy, scrolling forward: making sense of documents in the digital age (new york: arcade pub., 2001).
45. about alan lomax. accessed dec. 27, 2004, www.alan-lomax.com/about.html.

(president's column continued from page 2)

online." they have implemented several process improvements already and will complete their work by the 2005 ala annual conference. this past fall, michelle frisque, lita web manager, conducted a survey of our members about the lita web site. michelle and the web coordinating committee are already working on a new look and feel for the lita web site based on the survey comments, and the result promises to be phenomenal. on top of all of the current activities, new vision statement, strategic planning, and the lita web site redesign, mary taylor and the lita board worked with a graphic designer to develop a new lita logo. after much deliberation, the new logo debuted at the 2004 lita national forum with great enthusiasm. many members commented that the new logo expresses the "energy" of lita and felt the change was terrific.

with your help, lita had a very successful conference in orlando. although there were weather and transportation difficulties, the lita programs and discussions were of the highest quality, as always. the program and preconference offerings for the upcoming annual conference in chicago promise to be as strong as ever. don't forget, lita also offers regional institutes throughout the year. check the lita web site to see if there's a regional institute scheduled in your area. lita held another successful national forum in fall 2004 in st. louis, "ten years of connectivity: libraries, the world wide web, and the next decade." the three-day educational event included excellent preconferences, general sessions, and more than thirty concurrent sessions. i want to thank the wonderful 2004 lita national forum planning committee, chaired by diane bisom, the presenters, and the lita office staff who all made this event a great experience. the next lita national forum will be held at the san jose marriott, san jose, california, september 29–october 2, 2005. the theme will be "the ubiquitous web: personalization, portability, and online collaboration." thomas dowling, chair, and the 2005 lita national forum planning committee are preparing another "must attend" event.

next year marks lita's fortieth anniversary. 2006 will be a year for lita to celebrate our history, our future, and our many accomplishments. we are fortunate to have lynne lysiak leading the fortieth anniversary task force activities. i know we all will enjoy the festivities. i look forward to working with many of you as we continue to make lita a wonderful and vibrant association. i encourage you to send me your comments and suggestions to further the goals, services, and activities of lita.

biblios revisited
john c. kountz: library systems coordinator, california state university and colleges, los angeles. when this article was in preparation, the author was systems analyst, orange county public libraries, orange county, california.

in the following, orange county public library's earlier reports on its biblios system are updated. book catalog and circulation control modules are detailed, development and operation costs documented, and a cost comparison for acquisitions cited.

"in 1968 ala began publishing, through its information science and automation division, a journal of library automation. it is perhaps appropriate to note that in the first three quarterly issues only one public library project was described (1), and this was a project under contemplation, not one actually in operation." (2) this statement by dan melcher, made to substantiate his contention that library automation is suspect, is in itself suspect. the public library project alluded to as being contemplated in 1968 was brought to fruition by orange county (california) public library in 1969, and has functioned with startling success ever since. in addition, the finished system was reported to the library (3) and data processing (4) worlds in 1969 and 1970 respectively. orange county public library's biblios (book inventory building library information oriented system) is a system designed to fulfill all functional requirements of a multibranch library which is growing by leaps and bounds (5).
specifically, these functional requirements are: acquisitions, book processing, catalog maintenance, circulation control, and book fund accounting, in addition to management reporting on a level not practical in a manual system.

the functional system

the interrelation of these system elements is shown diagrammatically in figure 1. briefly, and from a user's point of view, the system works like this: a title is desired by someone, patron or staff member. the person refers to the book catalog, figure 2, to see if the item is in the collection. if it is and is not in circulation, he gets the book directly. if the item is in circulation, he can submit a request for it, to receive the book on its return. to update the catalog, a cumulative supplement is produced, keeping current the listing of the library's holdings. if the title is not found in the catalog or supplement, the monthly cumulative on-order list, figure 3, is consulted. if the title is listed, a request is submitted and, on receipt and processing, the book is released to the requester. if the title is cancelled, the requester is notified.

when a title wanted for the collection is not listed in either the catalog or the cumulative on-order list, a bibliographic information sheet (bis), figure 4, is completed and optically scanned into the system. this information is essentially a pre-cataloging bibliographic description of the desired material. once entered, these same data serve first to create purchase orders and related reports; then, once edited by the catalogers from the book in hand, to create book card and pocket sets (figure 5), book catalog entries, shown in figure 2, holding lists (shelf lists) for each branch, and a broad array of operational reports. it is a feature of biblios that the descriptive data (from the bis) are entered in their entirety only once. this means that a bibliographic description need not be initialized by each individual using it; rather, it need only be consulted and, if necessary, corrected or deleted. thus, an entry once in the system is immediately available for, among other purposes, ordering. this is especially significant since it means that each entry in the book catalog, the catalog supplement, the cumulative on-order list, etc., can be ordered against by simply using the key number for the desired item and the number assigned to the branch wishing to order. this poses the possibility of orders for materials which are o.p. or otherwise not readily available through the usual vendor channels. biblios addresses these potential errors by listing (pre-vend list, figure 6) all order requirements for review before they are used to create orders. by editing this list against books in print and/or publishers' catalogs and taking corrective action, orders for the unobtainable are short-stopped.

on placing an order, while a unique subpurchase order number is mechanically created, the key number continues to document the title for processing purposes. in this role the key number follows the order until it is filled or cancelled. thus, the key is used by biblios to update inventory automatically on receipt of an order and to create the card and pocket sets for those materials received. finally, the key number is used by the branches to report inventory changes and, as a subset of inventory, for circulation control.
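to make the ordering shortcut concrete, the python sketch below models an order request as nothing more than a key number plus a branch number, checked against entries already on file. the record layout, field names, and sample data are hypothetical illustrations, not the actual biblios file formats.

```python
# hypothetical, simplified model of the biblios ordering shortcut: a reorder needs
# only the key number of a title already on file plus the ordering branch's number.
title_master = {
    "73084452": {"description": "full bibliographic description, entered once"},
    "aa011379": {"description": "another entry already on file"},
}

pre_vend_list = []  # order requirements gathered for review before sub purchase orders are cut

def request_order(key, branch, quantity=1):
    """queue an order against an existing entry, identified only by its key."""
    if key not in title_master:
        raise KeyError(f"key {key} not on file; a bibliographic information sheet is required first")
    # the description is never re-keyed; the key alone documents the title
    pre_vend_list.append({"key": key, "branch": branch, "qty": quantity})

request_order("73084452", branch=12, quantity=2)
request_order("aa011379", branch=7)

for line in pre_vend_list:  # reviewed against books in print before orders are created
    print(line)
```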
since it is through the key number (or key, for short) for a bibliographic citation that the citation is used in the various functions performed by biblios, perhaps a little detail concerning the key is in order.

fig. 1. biblios-the functional system.

the key number

in figure 2, the key for 73084452 has been underlined.

fig. 2. a book catalog page featuring four columns.

the key number resembles the lc card order number. wherever an lc card order number is available, it is used. when no lc card order number is available, a unique orange county (oc) number is applied. the oc number consists of two alphabetic characters in the first two positions (at one time the numbers implied year) of the "traditional" number, followed by a six-digit sequential number. since the library of congress has certain idiosyncrasies about its card order number, the key also specifies the type of material it represents (for example, only book keys are in the book catalog), and identifies each volume, or edition, of a title which has a blanket lc card order number.
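a minimal python sketch of how such a key might be assembled, following the description above: the lc card order number is used when one exists, otherwise an oc number of two letters plus a six-digit sequence is assigned. the material-type code and volume suffix are only noted in a comment, since the article does not give their exact layout.

```python
from itertools import count
from typing import Optional

# hypothetical oc-number generator: two alphabetic characters followed by a
# six-digit sequential number, used only when no lc card order number exists.
_oc_sequence = count(1)

def make_key(lc_card_number: Optional[str], oc_prefix: str = "aa") -> str:
    """prefer the lc card order number; otherwise assign a locally unique oc number.
    the real key also carried a material-type code and a volume/edition suffix,
    whose exact layout is not given in the article and is omitted here."""
    if lc_card_number:
        return lc_card_number
    return f"{oc_prefix}{next(_oc_sequence):06d}"

print(make_key("73084452"))  # an existing lc card order number is used as-is
print(make_key(None))        # aa000001 -- locally assigned oc number
print(make_key(None))        # aa000002
```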
the selection of the lc card order number for this application was based on a suspicion that the bulk of materials in the collection were already assigned a number, a suspicion which was confirmed on completion of conversion through simple reporting of the keys on file. in short, after fifty years of operation of orange county's libraries, 92 percent of all titles in the collection had an "lc number," a factor one might weigh when trying to decide between isbn and lc card order number; nor has it been indicated that isbns will be developed retrospectively.

an update to the system

in the paper presented to the american society for information science in 1969 (6), neither the book catalog nor the circulation control modules had been implemented.

book catalog

in may 1971, the first edition of the biblios book catalog was released for public use. since that date, the cumulative supplement has been run six times. the module of biblios producing the book catalog and cumulative supplement is diagrammed in figure 7. input is the title-master file (the system's bibliographic data base) and a specification of the output required. the output options available to the library include the production of either a full catalog or a cumulative supplement (displaying all entries placed on file since production of the full catalog which have been edited by cataloging). in the case of full catalog production, the title-master file is updated to reflect the use of all qualifying entries for catalog production and the date of their use. this updating facilitates cumulative supplement production by precluding the display of these entries until the next full catalog run.
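the selection logic just described can be sketched in a few lines of python: a cumulative supplement lists entries added since the last full catalog that cataloging has edited, while a full catalog run date-stamps every qualifying entry so that it drops out of later supplements. the field names and date-stamping convention here are assumptions for illustration, not the actual title-master layout.

```python
from datetime import date

# illustrative title-master records; "used_in_catalog" holds the date an entry last
# appeared in a full catalog (None if it has not yet been picked up).
title_master = [
    {"key": "73084452", "edited_by_cataloging": True,  "used_in_catalog": date(1971, 5, 1)},
    {"key": "aa011379", "edited_by_cataloging": True,  "used_in_catalog": None},
    {"key": "aa012873", "edited_by_cataloging": False, "used_in_catalog": None},
]

def cumulative_supplement(master):
    """entries placed on file since the last full catalog and already edited by cataloging."""
    return [e for e in master if e["edited_by_cataloging"] and e["used_in_catalog"] is None]

def run_full_catalog(master, run_date):
    """a full catalog takes every edited entry and date-stamps it as used, which keeps
    it out of later supplements until the next full catalog run."""
    selected = [e for e in master if e["edited_by_cataloging"]]
    for entry in selected:
        entry["used_in_catalog"] = run_date
    return selected

print([e["key"] for e in cumulative_supplement(title_master)])  # ['aa011379']
run_full_catalog(title_master, date(1972, 6, 1))
print([e["key"] for e in cumulative_supplement(title_master)])  # []
```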
in addition to the type of catalog (full or supplement), the library designates the format of the output. either an off-line print-out or a print file designed to drive a mechanical photocomposition device, or both, can be requested. it is important to note that this print file is designed specifically to be hardware independent, e.g., it will run on rca, photon, alphanumeric, or comparable equipment with equal ease. hardware independence in its simplest terms means the computer program does not have to be rewritten each time a vendor goes out of business. and, coincidentally, this print file is in the sequence in which it is to be displayed. in short, the vendor only performs that processing necessary to make his device set type to the library's specification for layout, font style, and font size—a specification, it might be added, which calls for upper- and lower-case type from a file in upper-case only. this approach differs from what has become typical of book catalog production in that sorting, file maintenance, and all related processing are sustained by the library through biblios. the vendor only sets type, prints, and binds. the results spell savings since a potentially error-laden file does not have to be committed to the most expensive of all displays, photocomposition, before corrections can be made.
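one way to picture the hardware-independent print file is as a pre-sorted stream of typesetting records plus a separate layout specification that any vendor's device can interpret; only the final rendering pass is device-specific. the record fields, layout dictionary, and rendering stand-in below are invented for illustration and are not the actual biblios print-file format.

```python
# hypothetical sketch of the division of labor: the library delivers entries already
# sorted into display sequence plus a layout specification; only the vendor-specific
# rendering step knows anything about the typesetting device.
layout_spec = {
    "columns": 4,               # the book catalog page uses four columns
    "font": "roman, sizes per specification",
    "case": "upper-and-lower",  # vendor derives mixed case from an upper-case-only file
}

print_file = [                  # device-neutral, pre-sorted print records
    {"key": "73084452", "text": "EXAMPLE AUTHOR. EXAMPLE TITLE. PUBLISHER, 1971."},
    {"key": "aa011379", "text": "ANOTHER AUTHOR. ANOTHER TITLE. PUBLISHER, 1970."},
]

def render(entries, spec, device):
    """stand-in for the vendor's typesetting pass on a particular device."""
    for entry in entries:
        text = entry["text"].capitalize() if spec["case"] == "upper-and-lower" else entry["text"]
        print(f"[{device}] {text}")

# the same print file could drive rca, photon, or comparable equipment
render(print_file, layout_spec, device="photon")
```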
fig. 3. all outstanding titles are reported in the monthly cumulative on-order list.
fig. 6. before producing a sub-p.o., the pre-vend list is checked for o.p. materials, among other things.
fig. 9. the maintenance of manual shelflists is obviated by a biblios-produced holdings list for each branch.

fig. 10. biblios circulation control subsystem.

cards, cassette, or mini-reels). ideally, the elusive transactor should be able to "read" a label on the book as well as a patron card. kimball labels, "sunburst" tags, magnetically coded swatches, and the like have worked and continue to work in the retail trade; there is no reason why they shouldn't work for libraries. the only deterrent seems to be the reticence of their manufacturers to enter an unknown market where, following the melcher axiom, they are met with a "stubborn, 'show me' attitude when automation is proposed." (8)

the products designed into the circulation control module include: weed lists, patron "black lists," circulation profiles (graphically displaying patron use of each branch's collection), and automatic duplicate ordering. reports measure circulation from a manager's viewpoint, but not to the exclusion of such bread-and-butter products as overdue notices, registration lists, and related statistical recapitulations.

a word about documentation

for each program in each subsystem of biblios, forty unique programs in all, there is a formal package consisting of:
1. a program specification detailing the inputs, processing, outputs, idiosyncrasies, and edits of that program;
2. a listing of the cobol program itself;
3. an operations binder (notebook) section for set-up and run procedures;
4. a user's guide section relating requirements and diagnostics to the librarians using the program, including typical problems; and
5. assorted total system binders (notebooks).
while some might think this "overkill," in automation it is not the case. the biblios system has yet to fail a scheduled commitment.
further, it is suspected that the mere discipline of documentation caused many serious reconsiderations of program and procedural logic, at the time and on the spot, with the result that biblios is a reliable system requiring no major rework and continuing to respond to the library's functional requirements for over two years at this writing.

a word about development costs

both developmental and operational costs for biblios are known and documented. specifically, the costs to procure such a system are broken out in table 1, where each subsystem is examined in terms of the dollars it represents and the assorted tasks required to bring it into being. the totals represent all costs over approximately a three-year period beginning with rough specifications and yielding the first book catalog.

table 1. biblios development costs (including full conversion and publication of first book catalog).

                                        marc   bibliographic   book catalog &   acquisitions   circulation      total
                                                   inventory    locator guide
contractor
  program specifications & coding    $16,686         $54,299          $25,800        $72,305       $91,000   $260,090
orange co. public library
  analyst                              3,360           7,840            2,240         14,560         7,000     35,000
  coordination                         1,225           7,679              818          5,310         5,670     20,702
  implementation (k.p., machine
    time, etc.)                        4,772          12,263            4,635          7,879        10,110     39,659
  conversion/outside services            800          53,500           41,370                                  95,670
  subtotal                            10,157          81,282           49,063         27,749        22,780    191,031
total                                $26,843        $135,581          $74,863       $100,054      $113,780   $451,121

it must be noted that final program specifications and coding were performed for orange county by a contractor. this approach was chosen since a good job done on time was wanted. that the approach was valid is evidenced by the achievement of a successful system on schedule and within budget. this approach reflects a contention that librarians can specify their requirements if they "have a mind to," and that a contracted programming staff can satisfactorily perform to predetermined standards and timeframes if properly directed. in direct contrast to this approach are the incredible schedules developed when requirements are not specified (and frozen), and the suspected monumental costs hidden in lost staff time due to extended parallel operations or simply waiting until "they" get the " ... thing" to run right. the remaining cost components, briefly, reflect direct library analyst time, the cost of coordination meetings, direct key punch and machine time for programs, their test, debug, string test, and systems test, and, for the bibliographic and book catalog subsystems, conversion and catalog print file generation. the conversion/outside services include a marc subscription, the creation and use of a group of nine typists to optically scan the library's files to convert them to machine readable form (including error correction), and the contracted services of a photoreproduction house to mechanically compose, print, bind, and deliver 500 sets of the book catalog and 100 sets of the locator guide. these are the costs of setting the system up, staff training, and creating a single operational display: the book catalog.

a word about operating costs

early in 1965, as a prelude to implementing a book acquisition program, a time/cost study was performed to determine how much it cost the library to order a book (one title).
this study detailed and costed the typing, sorting, assignment of vendors, and the reduction of a diversity of paper requisite to creating a purchase order. excluding the cost of the purchase order form itself, the direct manual cost for this process was $1.56 per title, using a clerical rate of $2.10 per hour. in the intervening years three things have happened: first, clerical rates have increased to $2.79 per hour, which when applied to the unit cost of the 1965 acquisitions study means a direct outlay of $2.07 per title (as against the previous $1.56). second, the number of branches has increased, which implies that, if the manual system of 1965 could cope with the increased load, it would have required more people and therefore an increase in indirect costs, not to mention the probability of less efficiency due to increased direct costs. third, orange county has automated this function (as well as others). since orange county is wont to track costs, it so happens that the cost for creating a purchase order (subpurchase order under the new system) is available. specifically, orange county knows computer and peripheral costs and the exact time for processing from actual billings over the past two years. the reduction of these data to a per-unit-handled equivalent, while detailed, is not difficult. thus, it is possible to deduce the machine costs equitable to those for the earlier manual effort: creating a purchase order for one title, including the purchase order form, now costs $1.89. similar economies can readily be documented, as can the increases in service to our patrons at no increase in staff. the operating costs for those biblios subsystems in regular use are given in table 2. only two entries on this table are not self-explanatory.

table 2. typical processing costs for one title in orange county public library's biblios system.

                              marc   bibliographic(1)   inventory          acquisitions(2)        book catalog (weekly)
                                                                        order       receive(3)       b.c.      inventory
run cost                   $325.16            $300.40     $201.21   $1,244.94          $238.55    $238.00         $26.00
average items per period      1154              1,000       8,100         700              700       4000           4000
cost/entry                    2.83               0.30       0.025       $1.78            $0.34      0.059         0.0006
supplies                      0.13                          0.028        $.05
                                                                    (sub p.o.)
services                               0.02 (convelope)                  .06                         0.041         0.0028
                                                                    (opscan)                  (comp/print)   (comp/print)
total                        $2.96              $0.32      $0.053       $1.89            $0.34       $.10        $0.0034

example: cost of entry from initial input to display in book catalog (including convelope; excluding marc source): $2.77.
(1) 40% bibliographic. (2) 60% bibliographic. (3) includes invoice, vendor, and budget displays. (4) if all new entries to system came from marc.

marc

marc, which is indicated as processed weekly, has not been run for over a year. the explanation is simple economics. it costs $0.32 to manually place a bibliographic description on file (excluding the time spent to circle an entry in publishers weekly [pw]) vs. $2.96 to process the same entry from marc. this cost for marc includes the subscription cost prorated to selected entries, the translation and format of all marc entries, the automatic release of those entries of limited value to a public library, the cumulation of entries which may be of value, the extract and transfer of those entries selected, and the reporting via indices and full listings for the contents of the cumulated file.
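the per-title figures quoted in this section come straight from table 2, and the arithmetic behind them is simply a proration of run cost over the items handled plus any per-item supplies and services. the python sketch below restates that calculation for two of the table's columns; the function and its argument names are illustrative only.

```python
def cost_per_title(run_cost, items_per_period, supplies=0.0, services=0.0):
    """prorate a subsystem's run cost over the items handled, then add per-item extras."""
    return run_cost / items_per_period + supplies + services

# acquisitions "order" column of table 2: $1,244.94 over 700 orders, plus the
# sub purchase order form (supplies) and optical-scan service charges.
order = cost_per_title(1244.94, 700, supplies=0.05, services=0.06)

# bibliographic input column: $300.40 over 1,000 entries plus the convelope charge.
bibliographic = cost_per_title(300.40, 1000, services=0.02)

print(f"order: ${order:.2f}")                  # about $1.89 per title, as quoted
print(f"bibliographic: ${bibliographic:.2f}")  # about $0.32 per entry
```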
prorated bibliographic input

biblios works on pre-cataloged entries. the 60 percent bibliographic input shown under acquisitions relates to the full initial description for a title being entered by a book selector to effect its order and subsequent reporting; the 40 percent shown under bibliographic is for cataloger input to adjust the entry for title-page accuracy, consistency with existing files, and, for nonfiction, the assignment of call numbers and subject headings. it is important to note that for reorders against a title already in the system, no bibliographic input is required. in the case of reorders, the per-title cost is $0.88, including subpurchase order forms.

references

1. john c. kountz, "cost comparison of computer versus manual catalog maintenance," journal of library automation 1:159-77 (spring 1968).
2. daniel melcher, melcher on acquisition (chicago: american library association, 1971), p. 135.
3. john c. kountz and robert norton, "biblios-a modular approach to total library adp," proceedings of asis 6:39-50 (1969).
4. john c. kountz and robert e. norton, "biblios-a modular system for library automation," datamation 16:79-83 (feb. 1970).
5. orange county public library presently has twenty-six branches, three bookmobiles, and plans for at least three more branches and an additional bookmobile in the near future.
6. kountz and norton, "biblios-a modular approach."
7. the device affiliated with the book depends on the transactor. the only requirements are that it mechanically represent the key for the book, be practically indestructible, and that it can be prepared mechanically. this last consideration is an absolute when there are 800,000 volumes to convert.
8. melcher, melcher on acquisition, p. 135.

president's message: making an impact in the time that is given to us
rachel vacek
information technology and libraries | june 2015

in an early chapter in the fellowship of the ring, by j.r.r. tolkien, frodo laments having found the one ring and gandalf tries to console him by saying, "all we have to decide is what to do with the time that is given us." this is one of my favorite quotes in the lord of the rings series because it inspires us to rise to the occasion and perform to the best of our abilities. it also implies that we have a purpose to fulfill within a predetermined time period.

although my term in office is three years, i'm only lita president for one year. to set a vision and goals, establish a sense of urgency, generate buy-in, engage and empower the membership, implement sustainable changes, and remain positive and focused – all within one year while holding a full-time job – is challenging to say the least.
i’ve  been  very  fortunate  during  my  almost  eight-­‐year  tenure  at  the  university  of  houston  libraries   to  participate  in  numerous  professional  development  opportunities,  lead  change,  and  make  a   difference.  personal  and  professional  growth  has  always  been  very  important  to  me,  and  being  in   an  environment  that  encourages  me  to  become  a  better  librarian,  technologist,  manager,  and   leader  is  not  only  helpful  for  my  career,  but  also  extremely  rewarding  on  an  intellectual  level.  lita   has  benefited  that  training.   in  today’s  library  technology  landscape,  one  of  the  many  skills  leaders  need  to  possess  is  the   ability  to  effect  change.  as  lita  president,  i  have  put  many  changes  in  motion  and  am  happy  with   what  i  have  accomplished,  and  proud  of  our  board  and  the  members  who  volunteer  to  lead  and   effect  change.   as  i  reflect  over  the  past  year,  it’s  fair  to  say  that  lita,  despite  some  financial  challenges,  has  had   numerous  successes  and  remains  a  thriving  organization.  three  areas  –  membership,  education,   and  publications  –  bring  in  the  most  revenue  for  lita.  of  those,  membership  is  the  largest  money   generator.  however,  membership  has  been  on  a  decline,  a  trend  that’s  been  seen  across  ala  for   the  past  decade.  in  response,  the  board,  committees,  interest  groups,  and  many  and  individuals   have  been  focused  on  improving  the  member  experience  to  retain  current  members  and  attract   potential  ones.  with  all  the  changes  to  the  organization  and  leadership,  lita  is  on  the  road  to   becoming  profitable  again  and  will  remain  one  of  ala’s  most  impactful  divisions.       rachel  vacek  (revacek@uh.edu)  is  lita  president  2014-­‐15  and  head  of  web  services,  university   libraries,  university  of  houston,  houston,  texas.     president’s  message  |  vacek       doi:  10.6017/ital.v34i2.8804     4   the  board  has  taken  numerous  steps  to  stabilize  or  reverse  the  decline  in  revenues  that  has   resulted  from  a  steady  reduction  in  overall  membership.  at  ala  annual  2014,  the  financial   advisory  committee  was  established  to  respond  to  recommendations  from  the  financial   strategies  task  force,  adjusting  the  budget  to  make  a  number  of  improvements  while  planning  for   larger,  more  substantial  changes.   in  fall  2014  we  took  steps  to  improve  our  communications  by  establishing  the  communications  &   marketing  committee  and  appointing  a  social  media  manager  and  a  blog  manager.  the  blog  and   social  media  have  seen  a  steady  upward  trajectory  of  engagement  with  over  27,000  blog  views   since  september  2014  and  over  13,300  followers  on  twitter.  these  efforts  help  recruit  and  retain   members,  advertise  our  online  education  and  programming,  and  increase  attendance  at   conferences.       over  the  past  year,  nine  workshops  and  two  web  courses  were  offered,  many  of  which  sold  out   thanks  to  new  marketing  approaches.  the  forum  remains  popular  and  has  stellar  programming   and  keynote  speakers.  programs  and  workshops  at  ala  conferences  are  stronger  than  ever  and   continue  to  be  well  attended.  publications  also  remain  strong.  
although  only  three  lita  guides   were  published  this  year,  partially  due  to  a  change  in  publishers,  there  are  many  more  in  the   pipeline.       finally,  the  search  for  a  new  executive  director  is  underway,  and  with  a  new  leader  comes  fresh   ideas  and  perspectives.  i  am  excited  about  lita’s  future.  the  incoming  board,  along  with  a  new   executive  director,  has  an  opportunity  to  make  national  and  lasting  impact  as  well  as  collaborate   with  outstanding  librarians  and  staff  in  this  division  and  across  ala.  lita’s  challenges  and   successes  are  shared  amongst  a  dedicated  team  of  volunteers,  and  together  we’ve  made  significant   changes.  i  believe  that  lita  members  will  continue  to  rise  to  the  occasion  and  make  incredible   things  happen  with  “the  time  that  is  given  us.”  lita  is  an  amazing  organization  because  of  its   members  and  their  passion  and  dedication.  i  couldn’t  be  prouder.  it  has  been  an  honor  and  a   privilege  to  serve  as  your  president.           usability test results for encore in an academic library megan johnson information technology and libraries | september 2013 59 abstract this case study gives the results a usability study for the discovery tool encore synergy, an innovative interfaces product, launched at appalachian state university belk library & information commons in january 2013. nine of the thirteen participants in the study rated the discovery tool as more user friendly, according to a sus (standard usability scale) score, than the library’s tabbed search layout, which separated the articles and catalog search. all of the study’s participants were in favor of switching the interface to the new “one box” search. several glitches in the implementation were noted and reported to the vendor. the study results have helped develop belk library training materials and curricula. the study will also serve as a benchmark for further usability testing of encore and appalachian state library’s website. this article will be of interest to libraries using encore discovery service, investigating discovery tools, or performing usability studies of other discovery services. introduction appalachian state university’s belk library & information commons is constantly striving to make access to libraries resources seamless and simple for patrons to use. the library’s technology services team has conducted usability studies since 2004 to inform decision making for iterative improvements. the most recent versions (since 2008) of the library’s website have featured a tabbed layout for the main search box. this tabbed layout has gone through several iterations and a move to a new content management system (drupal). during fall semester 2012, the library website’s tabs were: books & media, articles, google scholar, and site search (see figure 1). some issues with this layout, documented in earlier usability studies and through anecdotal experience, will be familiar to other libraries who have tested a tabbed website interface. user access issues include the belief of many patrons that the “articles” tab looked for all articles the library had access to. in reality the “articles” tab searched seven ebsco databases. belk library has access to over 400 databases. another problem noted with the tabbed layout was that patrons often started typing in the articles box, even when they knew they were looking for a book or dvd. 
this is understandable, since when most of us see a search box we just start typing, we do not read all the information on the page. megan johnson (johnsnm@appstate.edu) is e-learning and outreach librarian, belk library and information commons, appalachian state university, boone, nc. mailto:johnsnm@appstate.edu usability test results for encore in an academic library | johnson 60 figure 1. appalachian state university belk library website tabbed layout search, december 2012. a third documented user issue is confusion over finding an article citation. this is a rather complex problem, since it has been demonstrated through assessment of student learning that many students cannot identify the parts of a citation, so this usability issue goes beyond the patron being able navigate the library’s interface, it is partly a lack of information literacy skills. however, even sophisticated users can have difficulty in determining if the library owns a particular journal article. this is an ongoing interface problem for belk library and many other academic libraries. google scholar (gs) often works well for users with a journal citation, since on campus they can often simply copy and paste a citation to see if the library has access, and, if so, the full text it is often is available in a click or two. however, if there are no results found using gs, the patrons are still not certain if the library owns the item. background in 2010, the library formed a task force to research the emerging market of discovery services. the task force examined summon, ebsco discovery service, primo and encore synergy and found the products, at that time, to still be immature and lacking value. in april 2012, the library reexamined the discovery market and conducted a small benchmarking usability study (the results are discussed in the methodology section and summarized in appendix a). the library felt enough improvements had been made to innovative interface’s encore information technology and libraries | september 2013 61 synergy product to justify purchasing this discovery service. an encore synergy implementation working group was formed, and several subcommittees were created, including end-user preferences, setup & access, training, and marketing. to help inform the decision of these subcommittees, the author conducted a usability study in december 2012, which was based on, and expanded upon, the april 2012 study. the goal of this study was to test users’ experience and satisfaction with the current tabbed layout, in contrast to the “one box” encore interface. the library had committed to implementing encore synergy, but there are options in layout of the search box on the library’s homepage. if users expressed a strong preference for tabs, the library could choose to leave a tabbed layout for access to the articles part of encore, for the catalog part, and create tabs for other options like google scholar, and a search of the library’s website. a second goal of the study was to benchmark the user experience for the implementation of encore synergy so that, over time, improvements could be made to promote seamless access to appalachian state university library’s resources. a third goal of this study was to document problems users encountered and report them to innovative. figure 2. appalachian state university belk library website encore search, january 2013. 
usability test results for encore in an academic library | johnson 62 literature review there have been several recent reviews of the literature on library discovery services. thomsettscott and reese conclude that discovery tools are a mixed blessing. 1 users can easily search across abroad areas of library resources and limiting by facets is helpful. downsides include loss of individual database specificity and user willingness to look beyond the first page of results. longstanding library interface problems, such as patrons’ lack of understanding of holding statements, and knowing when to it is appropriate to search in a discipline specific database are not solved by discovery tools.2 in a recent overview of discovery services, hunter lists four vendors whose products have both a discovery layer and a central index: ebsco’s discovery service (eds); ex libris’ primo central index; serials solutions’ summon; and oclc’s worldcat local (wcl). 3 encore does not have currently offer a central index or pre-harvested metadata for articles, so although encore has some of the features of a discovery service, such as facets and connections to full text, it is important for libraries considering implementing encore to understand that the part of encore that searches for articles is a federated search. when appalachian purchased encore, not all the librarians and staff involved in the decision making were fully aware of how this would affect the user experience. further discussion of this in the “glitches revealed” section. fagan et al. discuss james madison university’s implementation of ebsco discovery service and their customizations of the tool. they review the literature of discovery tools in several areas, including articles that discuss the selection processes, features, and academic libraries’ decisions process following selection. they conclude, the “literature illustrates a current need for more usability studies related to discovery tools.” 4 the most relevant literature to this study are case studies documenting a library’s experience with implementing a discovery services and task based usability studies of discovery services. thomas and buck5 sought to determine with a task based usability study whether users were as successful performing common catalog-related tasks in worldcat local (wcl) as they are in the library’s current catalog, innovative interfaces’ webpac. the study helped inform the library’s decision, at that time, to not implement wcl. beecher and schmidt6 discuss american university’s comparison of wcl and aquabrowser (two discovery layers), which were implemented locally. the study focused on user preferences based on students “normal searching patterns” 7 rather than completion of a list of tasks. their study revealed undergraduates generally preferred wcl, and upperclassmen and graduates tended to like aquabrower better. beecher and schmidt discuss the research comparing assigned tasks versus user-defined searches, and report that a blend of these techniques can help researchers understand user behavior better.8 information technology and libraries | september 2013 63 this article reports on a task-based study, in which the last question asks the participant to research something they had looked for within the past semester, and the results section indicates that the most meaningful feedback came from watching users research a topic they had a personal interest in. having assigned tasks also can be very useful. 
for example, an early problem noted with discovery services was poor search results for specific searches on known items, such as the book “the old man and the sea.” assigned tasks also give the user a chance to explore a system for a few searches, so when they search for a topic of personal interest, it is not their first experience with a new system. blending assigned tasks with user tasks proved helpful in this study’s outcomes. encore synergy has not yet been the subject of a formally published task-based usability study. allison reports on an analysis of google analytic statistics at university of nebraska-lincoln after encore was implemented.9 the article concludes that encore increases the user’s exposure to all the library’s holdings, describes some of the challenges unl faced and gives recommendations for future usability studies to evaluate where additional improvements should be made. the article also states unl plans to conduct future usability studies. although there are not yet formal published task-based studies on encore, at least one blogger from southern new hampshire university documented their implementation of the service. singley reported in 2011, “encore synergy does live up to its promise in presenting a familiar, user-friendly search environment.10 she points out, “to perform detailed article searches, users still need to link out to individual databases.” this study confirms that users do not understand that articles are not fully indexed and integrated; articles remain, in encore’s terminology, in “database portfolios.” see the results section, task 2, for a fuller discussion of this topic. method this study included a total of 13 participants. these included four faculty members, and six students recruited through a posting on the library’s website offering participants a bookstore voucher. three student employees were also subjects (these students work in the library’s mailroom and received no special training on the library’s website). for the purposes of this study, the input of undergraduate students, the largest target population of potential novice users, was of most interest. table 3 lists demographic details of the student or faculty’s college, and for students, their year. this was a task-based study, where users were asked to find a known book item and follow two scenarios to find journal articles. the following four questions/tasks were handed to the users on a sheet of paper: 1. find a copy of the book the old man and the sea. 2. in your psychology class, your professor has assigned you a 5-page paper on the topic of eating disorders and teens. find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem. http://www.snhu.edu/ usability test results for encore in an academic library | johnson 64 3. you are studying modern chinese history and your professor has assigned you a paper on foreign relations. find a journal article that discusses relations between china and the us. 4. what is a topic you have written about this year? search for materials on this topic. the follow up questions where verbally asked either after a task, or asked as prompts while the subject was working. 1. after the first task (find a copy of the book the old man and the sea) when the user finds the book in appsearch, ask: “would you know where to find this book in the library?” 2. how much of the library’s holdings do you think appsearch/ articles quick search is looking across? 3. 
does “peer reviewed” mean the same as “scholarly article”? 4. what does the “refine by tag” block the right mean to you? 5. if you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend? participants were recorded using techsmith’s screen-casting software camtasia, which allows the user’s face to be recorded along with their actions on the computer screen. this allows the observer to not rely solely on notes or recall. if the user encounters a problem with the interface, having the session recorded makes it simple to create (or recreate) a clip to show the vendor. in the course of this study, several clips were sent to innovative interfaces, and they were responsive to many of the issues revealed. further discussion is in the “glitches revealed” section. seven of the subjects first used the library site’s tabbed layout (which was then the live site) as seen in figure 1. after they completed the tasks, participants filled in a system usability scale (sus) form. the users then completed the same tasks on the development server using encore synergy. participants next filled out a sus form to reflect their impression of the new interface. encore is locally branded as appsearch and the terms are used interchangeably in this study. the six other subjects started with the appsearch interface on a development server, completed a sus form, and then did the same tasks using the library’s tabbed interface. the time it took to conduct the studies was ranged from fifteen to forty minutes per participant, depending on how verbal the subject was, and how much they wanted to share about their impressions and ideas for improvement. jakob nielson has been quoted as saying you only need to test with five users: “after the fifth user, you are wasting your time by observing the same findings repeatedly but not learning much new.”11 he argues for doing tests with a small number of users, making iterative improvements, and then retesting. this is certainly a valid and ideal approach if you have full control of the design. in the case of a vendor-controlled product, there are serious limitations to what the information technology and libraries | september 2013 65 librarians can iteratively improve. the most librarians can do is suggest changes to the vendor, based on the results of studies and observations. when evaluating discovery services in the spring of 2012, appalachian state libraries conducted a four person task based study (see appendix a), which used university of nebraska at lincoln’s implementation of encore as a test site to benchmark our students’ initial reaction to the product in comparison to the library’s current tabbed layout. in this small study, the average sus score for the library’s current search box layout was 62, and for unl’s implementation of encore, it was 49. this helped inform the decision of belk library, at that time, not to purchase encore (or any other discovery service), since students did not appear to prefer them. this paper reports on a study conducted in december 2012 that showed a marked improvement in users’ gauge of satisfaction with encore. several factors could contribute to the improvement in sus scores. first is the larger sample size of 13 compared to the earlier study with four participants. 
another factor is that in the april study, participants were using an external site they had no familiarity with, and a first experience with a new interface is not a reliable gauge of how someone will come to use the tool over time. this study was also more robust in that it added the task of asking the user to search for something they had researched recently, and the follow-up questions were more detailed. overall it appears that, in this case, having more than four participants and a more robust design gave a better representation of user experience.

the system usability scale (sus)

the system usability scale has been widely used in usability studies since its development in 1996. many libraries use this tool in reporting usability results.12,13 it is simple to administer, score, and understand the results.14 sus is an industry standard with references in over 600 publications.15 an "above average" score is 68. scoring a scale involves a formula where odd items have one subtracted from the user response, and with even-numbered items, the user response is subtracted from five. the converted responses are added up and then multiplied by 2.5. this makes the answers easily grasped on the familiar scale of 1-100. due to the scoring method, it is possible that results are expressed with decimals.16 a sample sus scale is included in appendix d.

results

the average sus score for the 13 users for encore was 71.5, and for the tabbed layout, the average sus score was 68. this small sample set indicates there was a user preference for the discovery service interface. in a relatively small study like this, these results do not imply a scientifically valid statistical measurement. as used in this study, the sus scores are simply a way to benchmark how "usable" the participants rated the two interfaces. when asked the subjective follow-up question, "if you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend?" 100% of the participants recommended the library change to appsearch (although four users actually rated the tabbed layout with a higher sus score). these four participants said things along the lines of, "i can get used to anything you put up."

table 1. demographic details and individual and average sus scores.

participant | sus (encore) | sus (tabbed layout) | year and major or college | appsearch first
student a | 90 | 70 | senior / social work / female | no
student b | 95 | 57.5 | freshman / undeclared / male | yes
student c | 82.5 | 57.5 | junior / english / male | yes
student d | 37.5 | 92 | sophomore / actuarial science / female | yes
student e | 65 | 82.5 | junior / psychology / female | yes
student f | 65 | 77.5 | senior / sociology / female | no
student g | 67.5 | 75 | junior / music therapy / female | no
student h | 90 | 82.5 | senior / dance / female | no
student i | 60 | 32.5 | senior / political science / female | no
faculty a | 40 | 87.5 | family & consumer science / female | yes
faculty b | 80 | 60 | english / male | no
faculty c | 60 | 55 | education / male | no
faculty d | 97.5 | 57.5 | english / male | yes
average | 71.5 | 68 | |

discussion

task 1: "find a copy of the book the old man and the sea." all thirteen users had faster success using encore. when using encore, this "known item" is in the top three results. encore definitely performed better than the classic catalog in saving the time of the user.
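the sus scoring rule behind the figures in table 1 (odd-numbered items contribute the response minus one, even-numbered items contribute five minus the response, and the sum is multiplied by 2.5) is easy to express in code. a minimal sketch follows; python is used purely for illustration, it is not one of the study instruments, and the example responses are invented.

def sus_score(responses):
    """compute a system usability scale score from ten responses on the 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("sus requires exactly ten item responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        # odd items: response minus 1; even items: 5 minus response
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5  # converts the 0-40 sum to the familiar 0-100 scale

# invented example: a fairly positive participant
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 2]))  # 82.5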
in approaching task 1 from the tabbed layout interface, four out of thirteen users clicked on the books and media tab, changed the drop down search option to “title,” and were (relatively) quickly successful. the remaining nine who switched to the books and media tab and used the default keyword search for “the old man and the sea” had to scan the results (using this search method, the book is the seventh result in the classic catalog), which took two users almost 50 seconds. this length of time, for an “average user” to find a well-known book is not considered to be acceptable to the technology services team at appalachian state university. when using the encore interface, the follow up question for this task was, “would you know where to find this book in the library?” nine out of 13 users did not know where the book would be, or information technology and libraries | september 2013 67 how to find it. the three faculty members and student d could pick out the call number and felt they could locate the book in the stacks. figure 3. detail of the screen of results for searching for “the old man and the sea”. the classic catalog that most participants were familiar with has a “map it” feature (from the third party vendor stackmap), and encore did not have that feature incorporated yet. since this study has been completed, the “map it” has been added to the item record in appsearch. further research can determine if students will have a higher level of confidence in their ability to locate a book in the stacks when using encore. figure 3 shows the search as it appeared in december 2012 and figure 4 has the “map it” feature implemented and pointed out with a red arrow. related to this task of searching for a known book, student b commented that in encore, the icons were very helpful in picking out media type. figure 4. book item record in encore. the red arrow indicates the “map it” feature, an add-on to the catalog from the vendor stackmap. browse results are on the right, and only pull from the catalog results. when using the tabbed layout interface (see figure 1), three students typed the title of the book into the “articles” tab first, and it took them a few moments figure out why they had a problem with the results. they were able to figure it out and re-do the search in the “correct” books & usability test results for encore in an academic library | johnson 68 media tab, but student d commented, “i do that every time!” this is evidence that the average user does not closely examine a search box--they simply start typing. task 2: “in your psychology class, your professor has assigned you a five-page paper on the topic of eating disorders and teens. find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem.” this question revealed, among other things, that seven out of the nine students did not fully understand the term scholarly or peer reviewed article are meant to be synonyms in this context. when asked the follow up question “what does ‘peer reviewed’ mean to you?” student b said, “my peers would have rated it as good on the topic.” this is the kind of feedback that librarians and vendors need to be aware of in meeting students’ expectations. users have become accustom to online ratings by their peers of hotels and restaurants, so the terminology academia uses may need to shift. further discussion on this is in the “changes suggested” section below. figure 5. typical results for task two. figure 5 shows a typical user result for task 2. 
the follow up question asked users “what does the refine by tag box on the right mean to you?” student g reported they looked like internet ads. other users replied with variations of, “you can click on them to get more articles and stuff.” in fact, the “refine by tag” box in the upper right column top of screen contains only indexed terms from the subject heading of the catalog. this refines the current search results to those with the specific subject term the user clicked on. in this study, no user clicked on these tags. information technology and libraries | september 2013 69 for libraries considering purchasing and implementing encore, a choice of skins is available, and it is possible to choose a skin where these boxes do not appear. in addition to information from innovative interfaces, libraries can check a guide maintained by a librarian at saginaw valley state university17 to see examples of encore synergy sites, and links to how different skins (cobalt, pearl or citrus) affect appearance. appalachian uses the “pearl” skin. figure 6. detail of screenshot in figure 5. figure 6 is a detail of the results shown in the screenshot for average search for task 2. the red arrows indicate where a user can click to just see article results. the yellow arrow indicates where the advanced search button is. six out of thirteen users clicked advanced after the initial search results. clicking on the advanced search button brought users to a screen pictured in figure 7. usability test results for encore in an academic library | johnson 70 figure 7. encore's advanced search screen. figure 7 shows the encore’s advanced search screen. this search is not designed to search articles; it only searches the catalog. this aspect of advanced search was not clear to any of the participants in this study. see further discussion of this issue in the “glitches revealed” section. information technology and libraries | september 2013 71 figure 8. the "database portfolio" for arts & humanities. figure 8 shows typical results for task 2 limited just to articles. the folders on the left are basically silos of grouped databases. innovative calls this feature “database portfolios.” in this screen shot, the results of the search narrowed to articles within the “database portfolio” of arts & humanities. clicking on the individual databases return results from that database, and moves the usability test results for encore in an academic library | johnson 72 user to the database’s native interface. for example, in figure 8, clicking on art full text would put the user into that database, and retrieve 13 results. while conducting task 2, faculty member a stressed she felt it was very important students learn to use discipline specific databases, and stated she would not teach a “one box” approach. she felt the tabbed layout was much easier than appsearch and rated the tabbed layout in her sus score with a 87.5 versus the 40 she gave encore. she also wrote on the sus scoring sheet “appsearch is very slow. there is too much to review.” she also said that the small niche showing how to switch results between “books & more” to article was “far too subtle.” she recommended bold tabs, or colors. this kind of suggestion librarians can forward to the vendor, but we cannot locally tweak this layout on a development server to test if it improves the user experience. figure 9. closeup of switch for “books & more” and “articles” options. 
task 3: “you are studying modern chinese history and your professor has assigned you a paper on foreign relations. find a journal article that discusses relations between china and the us.” most users did not have much difficulty finding an article using encore, though three users did not immediately see a way to limit only to articles. of the nine users who did narrow the results to articles, five used facets to further narrow results. no users moved beyond the first page of results. search strategy was also interesting. all thirteen users appeared to expect the search box to work like google. if there were no results, most users went to the advanced search, and reused the same terms on different lines of the boolean search box. once again, no users intuitively understood that “advanced search” would not effectively search for articles. the concept of changing search terms was not a common strategy in this test group. if very few results came up, none of the users clicked on the “did you mean” or used suggestions for correction in spelling or change in terms supplied by encore. during this task, two faculty members commented on load time. they said students would not wait, results had to be instant. but when working with students, when the author asked how they felt when load time was slow, students almost all said it was fine, or not a problem. they could “see it was working.” one student said, “oh, i’d just flip over to facebook and let the search run.” so perhaps librarians should not assume we fully understand student user expectations. it is also information technology and libraries | september 2013 73 worth noting that, for the participant, this is a low-stakes usability study, not crunch time, so attitudes may be different if load time is slow for an assignment due in a few hours. task 4: “what is a topic you have written about this year? search for materials on this topic.” this question elicited the most helpful user feedback, since participants had recently conducted research using the library’s interface and could compare ease of use on a subject they were familiar with. a few specific examples follow. student a, in response to the task to research something she had written about this semester, looked for “elder abuse.” she was a senior who had taken a research methods class and written a major paper on this topic, and she used the tabbed layout first. she was familiar with using the facets in ebsco to narrow by date, and to limit to scholarly articles. when she was using appsearch on the topic of elder abuse, encore held her facets “full text” and “peer reviewed” from the previous search on china and u.s. foreign relations. an example of encore “holding a search” is demonstrated in figures 10 and 11 below. student a was not bothered by the encore holding limits she had put on a previous search. she noticed the limits, and then went on to further narrow within the database portfolio of “health” which limited the results to the database cinahl first. she was happy with being able to limit by folder to her discipline. she said the folders would help her sort through the results. student g’s topic she had researched within the last semester was “occupational therapy for students with disabilities” such as cerebral palsy. she understood through experience, that it would be easiest to narrow results by searching for ‘occupational therapy’ and then add a specific disability. student g was the user who made the most use of facets on the left. 
she liked encore’s use of icons for different types of materials. student b also commented on “how easy the icons made it.” faculty b, in looking for the a topic he had been researching recently in appsearch, typed in “writing across the curriculum glossary of terms” and got no results on this search. he said, “mmm, well that wasn’t helpful, so to me, that means i’d go through here” and he clicked on the google search box in the browser bar. he next tried removing “glossary of terms” from his search and the load time was slow on articles, so he gave up after ten seconds and clicked on “advanced search” and tried putting “glossary of terms” in the second line. this led to another dead end. he said, “i’m just surprised appalachian doesn’t have anything on it.” the author asked if he had any other ideas about how to approach finding materials on his topic from the library’s homepage and he said no, he would just try google (in other words, navigating to the group of databases for education was not a strategy that occurred to him). usability test results for encore in an academic library | johnson 74 the faculty member d had been doing research on a relatively obscure historical event and was able to find results using encore. when asked if he had seen the articles before, he said, “yes, i’ve found these, but it is great it’s all in one search!” glitches revealed it is of concern for the user experience that the advanced search of encore does not search articles; it only searches the catalog. this was not clear to any participant in this study. as noted earlier, encore’s article search is a federated search. this affects load time for article results, and also puts the article results into silos, or to use encore’s terminology, “database portfolios.” encore’s information on their website definitely markets the site as a discovery tool, saying, it “integrates federated search, as well as enriched content—like first chapters—and harvested data… encore also blends discovery with the social web. 18” it is important for libraries considering purchase of encore that while it does have many features of a discovery service, it does not currently have a central index with pre-harvested metadata for articles. if innovative interfaces is going to continue to offer an advanced search box, it needs to be made explicitly clear that the advanced search is not effective for searching for articles, or innovative interfaces needs to make an advanced search work with articles by creating a central index. to cite a specific example from this study, when student e was using appsearch, with all the tasks, after she ran a search, she clicked on the advanced search option. the author asked her, “so if there is an advanced search, you’re going to use it?” the student replied, “yeah, they are more accurate.” another aspect of encore that users do not intuitively grasp is that when looking at the results for an article search, the first page of results comes from a quick search of a limited number of databases (see figure 8). the users in this study did understand that clicking on the folders will narrow by discipline, but they did not appear to grasp that the result in the database portfolios are not included in the first results shown. when users click on an article result, they are taken to the native interface (such as psych info) to view the article. users seemed un-phased when they went into a new interface, but it is doubtful they understand they are entering a subset of appsearch. 
if users try to add terms or do a new search in the native database they may get relevant results, or may totally strike out, depending on chosen database’s relevance to their research interest. information technology and libraries | september 2013 75 figure 10. changing a search in encore. another problem that was documented was that after users ran a search, if they changed the text in the “search” box, the results for articles did not change. figure six demonstrates the results from task 2 of this study, which asks users to find information on anorexia and self-esteem. the third task asks the user to find information on china and foreign relations. figure 10 demonstrates the results for the anorexia search, with the term “china” in the search box, just before the user clicks enter, or the orange arrow for new search. figure 11. search results for changed search. figure 11 show that the search for the new term, “china” has worked in the catalog, but the results for articles are still about anorexia. in this implementation of encore, there is no “new search button” (except in the advanced search page, there is a “reset search” button, see figure 7) and usability test results for encore in an academic library | johnson 76 refreshing the browser is had no effect on this problem. this issue was screencast19 and sent to the vendor. happily, as of april 2013, innovative interfaces appears to have resolved this underlying problem. one purpose of this study was to determine if users had a strong preference for tabs, since the library could choose to implement encore with tabs (one for access to articles, one for the catalog, and other tab options like google scholar). this study indicated users did not like tabs in general, they much preferred a “one box solution” on first encounter. a major concern raised was the user’s response to the question, “how much of the library’s holdings do you think appsearch/ articles quick search is looking across?” twelve out of thirteen users believed that when they were searching for articles from the quick search for articles tabbed layout, they were searching all the library databases. the one exception to this was a faculty member in the english department, who understood that the articles tab searched a small subset of the available resources (seven ebsco databases out of 400 databases the library subscribes to). all thirteen users believed appsearch (encore) was searching “everything the library owned.” the discovery service searches far more resources than other federated searches the library has had access to in the past, but it is still only searching 50 out of 400 databases. it is interesting that in the fagan et al. study of ebsco’s discovery service, only one out of ten users in that study believed the quick search would search “all” the library’s resources.20 a glance at james madison university’s library homepage21 suggests wording that may improve user confusion. figure 12. screenshot of james madison library homepage, accessed december 18, 2012. information technology and libraries | september 2013 77 figure 13. original encore interface as implemented in january 2013. given the results that 100% of the users believed that appsearch looked at all databases the library has access to, the library made changes to the wording in the search box. (see figure 7). future tests can determine if this has any positive effect on the understanding of what appsearch includes. figure 14. encore search box after this usability study was completed. 
the arrow highlights additions to the page as a result of this study. some other wording changes suggested were from the finding that only seven out of nine students fully understood that “peer reviewed” would limit to scholarly articles. a suggestion was made to innovative interfaces to change the wording to “scholarly (peer reviewed)” and they did so in early january. although innovative’s response on this issue was swift, and may help students, changing the wording does not address the underlying information literacy issue of what students understand about these terms. interestingly, encore does not include any “help” pages. appalachian’s liaison with encore has asked about this and been told by encore tech support that innovative feels the product is so intuitive; users will not need any help. belk library has developed a short video tutorial for users, and local help pages are available from the library’s homepage, but according to innovative, a link to these resources cannot be added to the top right area of the encore screen (where help is commonly located in web interfaces). although it is acknowledged that few users actually read “help” pages, it seems like a leap of faith to think a motivated searcher will understand things like the “database portfolios” (see figures 9) without any instruction at all. after implementation, the usability test results for encore in an academic library | johnson 78 librarians here at appalachian conducted internally developed training for instructors teaching appsearch, and all agreed that understanding what is being searched and how to best perform a task such as an advanced article search is not “totally intuitive,” even for librarians. finally, some interesting search strategy patterns were revealed. on the second and third questions in the script (both having to do with finding articles) five of the thirteen participants had the strategy of putting in one term, then after the search ran, adding terms to narrow results using the advanced search box. although this is a small sample set, it was a common enough search strategy to make the author believe this is not an unusual approach. it is important for librarians and for vendors to understand how users approach search interfaces so we can meet expectations. further research the findings of this study suggest librarians will need to continue to work with vendors to improve discovery interfaces to meet users expectations. the context of what is being searched and when is not clear to beginning users in encore one aspect of this test was it was the participants’ first encounter with a new interface, and even student d, who was unenthused about the new interface (she called the results page “messy, and her sus score was 37.5 for encore, versus 92 for the tabbed layout) said that she could learn to use the system given time. further usability tests can include users who have had time to explore the new system. specific tasks that will be of interest in follow up studies of this report are if students have better luck in being able to know where to find the item in the stacks with the addition of the “map it” feature. locally, librarian perception is that part of the problem with this results display is simply visual spacing. the call number is not set apart or spaced so that it stands out as important information (see figure 5 for a screenshot). 
another question to follow up on will be to repeat the question, “how much of the library’s holdings do you think appsearch is looking across?” all thirteen users in this study believed appsearch was searching “everything the library owned.” based on this finding, the library made small adjustments to the initial search box (see figures 14 and 15 as illustration). it will be of interest to measure if this tweak has any impact. summary all users in this study recommended that the library move to encore’s “one box” discovery service instead of using a tabbed layout. helping users figure out when they should move to using discipline specific databases will most likely be a long-term challenge for belk library, and for other academic libraries using discovery services, but this will probably trouble librarians more than our users. information technology and libraries | september 2013 79 the most important change innovative interfaces could make to their discovery service is to create a central index for articles, which would improve load time and allow for an advanced search feature for articles to work efficiently. because of this study, innovative interfaces made a wording change in search results for article to include the word “scholarly” when describing peer reviewed journal articles in belk library’s local implementation. appalachian state university libraries will continue to conduct usability studies and tailor instruction and e-learning resources to help users navigate encore and other library resources. overall, it is expected users, especially freshman and sophomores, will like the new interface but will not be able to figure out how to improve search results, particularly for articles. belk library & information commons’ instruction team is working on help pages and tutorials, and will incorporate the use of encore into the library’s curricula. references 1 . thomsett-scott, beth, and patricia e. reese. "academic libraries and discovery tools: a survey of the literature." college & undergraduate libraries 19 (2012): 123-43. 2. ibid, 138. 3. hunter, athena. “the ins and outs of evaluating web-scale discovery services” computers in libraries 32, no. 3 (2012) http://www.infotoday.com/cilmag/apr12/hoeppner-web-scalediscovery-services.shtml (accessed march 18, 2013) 4. fagan, jody condit, meris mandernach, carl s. nelson, jonathan r. paulo, and grover saunders. "usability test results for a discovery tool in an academic library." information technology & libraries 31, no. 1 (2012): 83-112. 5. thomas, bob., and buck, stephanie. oclc's worldcat local versus iii's webpac. library hi tech, 28(4) (2010), 648-671. doi: http://dx.doi.org/10.1108/07378831011096295 6. becher, melissa, and kari schmidt. "taking discovery systems for a test drive." journal of web librarianship 5, no. 3: 199-219 [2011]. library, information science & technology abstracts with full text, ebscohost (accessed march 17, 2013). 7. ibid, p. 202 8. ibid p. 203 9. allison, dee ann, “information portals: the next generation catalog,” journal of web librarianship 4, no. 1 (2010): 375–89, http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1240&context=libraryscience (accessed march 17, 2013) http://www.infotoday.com/cilmag/apr12/hoeppner-web-scale-discovery-services.shtml http://www.infotoday.com/cilmag/apr12/hoeppner-web-scale-discovery-services.shtml http://dx.doi.org/10.1108/07378831011096295 usability test results for encore in an academic library | johnson 80 10. singley, emily. 
2011 “encore synergy 4.1: a review” the cloudy librarian: musings about library technologies http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-areview/ [accessed march 20, 2013]. 11 . nielson, jakob. 2000. “why you only need to test with 5 users” http://www.useit.com/alertbox/20000319.html (accessed december 18, 2012]. 12. fagan et al, 90. 13. dixon, lydia, cheri duncan, jody condit fagan, meris mandernach, and stefanie e. warlick. 2010. "finding articles and journals via google scholar, journal portals, and link resolvers: usability study results." reference & user services quarterly no. 50 (2):170-181. 14. bangor, aaron, philip t. kortum, and james t. miller. 2008. "an empirical evaluation of the system usability scale." international journal of human-computer interaction no. 24 (6):574-594. doi: 10.1080/10447310802205776. 15. sauro, jeff. 2011. “measuring usability with the system usability scale (sus)” http://www.measuringusability.com/sus.php. [accessed december 7, 2012]. 16. ibid. 17. mellendorf, scott. “encore synergy sites” zahnow library, saginaw valley state university. http://librarysubjectguides.svsu.edu/content.php?pid=211211 (accessed march 23, 2013). 18. encore overview, “http://encoreforlibraries.com/overview/” (accessed march 21, 2013). 19. johnson, megan. videorecording made with jing on january 30, 2013 http://www.screencast.com/users/megsjohnson/folders/jing/media/0ef8f186-47da-41cf96cb-26920f71014b 20. fagan et al. 91. 21. james madison university libraries, “http://www.lib.jmu.edu” (accessed december 18, 2012). http://emilysingley.wordpress.com/ http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-a-review/ http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-a-review/ http://www.useit.com/alertbox/20000319.html http://www.measuringusability.com/sus.php http://librarysubjectguides.svsu.edu/content.php?pid=211211 http://encoreforlibraries.com/overview/ http://www.screencast.com/users/megsjohnson/folders/jing/media/0ef8f186-47da-41cf-96cb-26920f71014b http://www.screencast.com/users/megsjohnson/folders/jing/media/0ef8f186-47da-41cf-96cb-26920f71014b http://www.lib.jmu.edu/ information technology and libraries | september 2013 81 appendix a pre-purchase usability benchmarking test in april 2012, before the library purchased encore, the library conducted a small usability study to serve as a benchmark. the study outlined in this paper follows the same basic outline, and adds a few questions. the purpose of the april study was to measure student perceived success and satisfaction with the current search system of books and articles appalachian uses compared with use of the implementation of encore discovery services at university of nebraska lincoln (unl). the methodology was four undergraduates completing a set of tasks using each system. two started with unl, and two started at appalachian’s library homepage. in the april 2012 study, the participants were three freshman and one junior, and all were female. all were student employees in the library’s mailroom, and none had received special training on how to use the library interface. after the students completed the tasks, they rated their experience using the system usability scale (sus). in the summary conclusion of that study, the average sus score for the library’s current search box layout was 62, and for unl’s encore search it was 49. 
even though none of the students was particularly familiar with the current library’s interface, it might be assumed that part of the higher score for appalachian’s site was simply familiarity. student comments from the small april benchmarking study included the following. the junior student said the unl site had "too much going on" and appalachian was "easier to use; more specific in my searches, not as confusing as compared to unl site." another student (a freshman), said she has "never used the library not knowing if she needed a book or an article." in other words, she knows what format she is searching for and doesn’t perceive a big benefit to having them grouped. this same student also indicated she had no real preference between appalachian or the unl. she believed students would need to take time to learn either and that unl is a "good starting place." usability test results for encore in an academic library | johnson 82 appendix b instructions for conducting the test notes: use firefox for the browser, set to “private browsing” so that no searches are held in the cache (search terms to not pop into the search box from the last subject’s search). in the bookmark toolbar, the only two tabs should be available “dev” (which goes to the development server) and “lib” (which goes to the library’s homepage). instruct users to begin each search from the correct starting place. identify students and faculty by letter (student a, faculty a, etc). script hi, ___________. my name is ___________, and i'm going to be walking you through this session today. before we begin, i have some information for you, and i'm going to read it to make sure that i cover everything. you probably already have a good idea of why we asked you here, but let me go over it again briefly. we're asking students and faculty to try using our library's home page to conduct four searches, and then ask you a few other questions. we will then have you do the same searches on a new interface. (note: half the participants to start at the development site, the other half start at current site). after each set of tasks is finished, you will fill out a standard usability scale to rate your experience. this session should take about twenty minutes. the first thing i want to make clear is that we're testing the interface, not you. you can't do anything wrong here. do you have any questions so far? ok. before we look at the site, i'd like to ask you just a few quick questions. what year are you in college? what are you majoring in? roughly how many hours a week altogether--just a ballpark estimate--would you say you spend using the library website? ok, great. hand the user the task sheet. do not read the instructions to the participant, allow them to read the directions for themselves. allow the user to proceed until they hit a wall or become frustrated. verbally encourage them to talk aloud about their experience. usability test results for encore in an academic library | johnson 83 written instructions for participants. find the a copy of the book the old man and the sea. in your psychology class, your professor has assigned you a 5-page paper on the topic of eating disorders and teens. find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem. you are studying modern chinese history and your professor has assigned you a paper on foreign relations. find a journal article that discusses relations between china and the us. what is a topic you have written about this year? 
search for materials on this topic. usability test results for encore in an academic library | johnson 84 appendix c follow up questions for participants (or ask as the subject is working) after the first task (find a copy of the book the old man and the sea) when the user finds the book in appsearch, ask “would you know where to find this book in the library?” how much of the library’s holdings do you think appsearch/ articles quick search is looking across? does “peer reviewed” mean the same as “scholarly article”? what does the “refine by tag” block the right mean to you? if you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend? do you have any questions for me, now that we're done? thank subject for participating. usability test results for encore in an academic library | johnson 85 appendix d sample system usability scale (sus) strongly strongly disagree agree i think that i would like to use this system frequently 1 2 3 4 5 i found the system unnecessarily complex 1 2 3 4 5 i thought the system was easy to use 1 2 3 4 5 i think that i would need the support of a technical person to be able to use this system 1 2 3 4 5 i found the various functions in this system were well integrated 1 2 3 4 5 i thought there was too much inconsistency in this system 1 2 3 4 5 i would imagine that most people would learn to use this system very quickly 1 2 3 4 5 i found the system very cumbersome to use 1 2 3 4 5 i felt very confident using the system 1 2 3 4 5 i needed to learn a lot of things before i could get going with this system 1 2 3 4 5 comments: lib-mocs-kmc364-20140106083930 198 an algorithm for compaction of alphanumeric data william d. schieber, george w. thomas: central library and documentation branch, international labour office, geneva, switzerland description of a technique for compressing data to be placed in computer auxiliary storage. the technique operates on the principle of taking two alphabetic characters frequently used in combination and replacing them with one unused special character code. such une-for-two replacement has enabled the ilo to achieve a rate of compression of 43.5% on a data base of approximately 40,000 bibliographic records. introduction this paper describes a technique for compacting alphanumeric data of the type found in bibliographic records. the file used for experimentation is that of the central library and documentation branch of the international labour office, geneva, where approximately 40,000 bibliographic records are maintained on line for searches done by the library for its clients. work on the project was initiated in response to economic pressure to conserve direct-access storage space taken by this particularly large file. in studying the problem of how to effect compaction, several alternatives were considered. the first was a recursive bit-pattern recognition technique of the type developed by demaine ( 1,2), which operates mdependently of the data to be compressed. this approach was rejected because of the apparent complexity of the coding and decoding algorithms, and also because early analyses indicated that further development of the second type of approach might ultimately yield higher compression ratios. compaction of alphanumeric datajschieber and thomas 199 the second type of approach involves the replacement, by shorter nondata strings, of longer character strings known to exist with a high frequency in the data. 
this technique is data dependent and requires an analysis of what is to be encoded. one such method is to separate words into their component parts: prefixes, stems and suffixes; and to effect compression by replacing these components with shorter codes. there have been several successful algorithms for separating words into their components. salton ( 3) has done this in connection with his work on automatic indexing. resnikoff and dolby ( 4,5) have also examined the problem of word analysis in english for computational linguistics. although this method appears to be viable as the basis of a compaction scheme, it was here excluded because ilo data was in several languages. moreover, dolby and resnikoff's encoding and decoding routines require programs that perform extensive word analysis and dictionary look-up procedures that ilo was not in a position to develop. the actual requirements observed were twofold: that the analysis of what strings were to be encoded be kept relatively simple, and that the encoding algorithm must combine simplicity and speed presumably by minimizing the amount of dictionary look-up required to encode and decode the selected string. one of the most straightforward examples of the use of this technique is the work done by snyderman and hunt ( 6 ) that involves replacement of two data characters by single unused computer codes. however, the algorithm used by them does not base the selection of these two-character pairs (called "digrams") on their frequency of occurrence in the data. the technique described here is an attempt to improve and extend the concept by encoding digrams on the basis of frequency. the possibility of encoding longer character strings is also examined. three other related discussions of data compaction appear in papers by myers et al. (7) and by demaine and his colleagues (8,9). the compression technique the basic technique used to compact the data file specifies that the most-frequently occurring digrams be replaced by single unused specialcharacter codes. on an eight-bit character machine of the type used, there are a total of 256 possible character codes (bytes ) . of this total only a small number are allocated to graphics (that is, characters which can be reproduced by the computer's printer). in addition, not all of the graphics provided for by the computer manufacturer appear in the user's data base. thus, of the total code set, a large portion may go unused. characters that are unallocated may be used to represent longer character strings. the most elementary form of substitution is the replacement of specific digrams. if these digrams can be selected on the basis of frequency , the compression ratio will be better than if selection is done independent of frequency. 200 journal of library automation vol. 4/4 december, 1971 this requires a frequency count of all digrams appearing in the data, and a subsequent ranking in order of decreasing frequency. once the base character set is defined, and the digrams eligible for replacement are selected, the algorithm can be applied to any string of text. the algorithm consists of two elements: encoding and decoding. in encoding, the string to be encoded is examined from left to right. the initial character is examined to determine if it is the first of any encodable digram. if it is not, it is moved unchanged to the output area. if it is a possible candidate, the following character is checked against a table to verify whether or not this character pair can be replaced. 
if replacement can be effected, the code representing the digram is moved to the output area. if not, the algorithm then moves on to treat the second character in precisely the same way as the first. the algorithm continues, character-by-character until the entire string has been encoded. following is a step-by-step description of the element. 1) load length of string into a counter. 2) set pointer to first character in string. 3) check to determine whether character pointed can occur in combination. if character does not occur in combination, point to next character and repeat step 3. 4) if character can occur in combination, check following character in a table of valid combinations with the first character. if the digram cannot be encoded, advance pointer to next character and return to step 3. 5) if the digram is codable, move preceeding non-codable characters (if any) to output area, followed by the internal storage code for the digram. 6) decrease the string length counter by one, advance pointer two positions beyond current value and return to step 3. in the following example assume that only three digrams are defined as codable: ab, be and de. assume also that the clear text to be encoded is the six-character string abcdef. after encoding the coded string would appear as: ab c de f a horizontal line is used to represent a coded pair, a dot shows a single (non-combined) character. the encoded string above is of length four. note that although bc was defined as an encodable digram, it did not combine in the example above because the digram ab was already encoded as a pair. the characters c and f do not combine, so they remain uncoded. note also that if the digram ab had not been defined as codable, the resultant combination would have been different in this case: a bc de f compaction of alphanumeric data j schieber and thomas 201 the decoding algorithm serves to expand a compressed string so that the record can be displayed or printed. as in the encoding routines, decoding of the string goes from left to right. bytes in the source string are examined one by one. if the code represents a single character, the print code for that character is moved to the output string. if the code represents a digram, the digram is moved to the output string. decoding proceeds byte-by-byte as follows until end of string is reached: 1 ) load string length into counter. 2 ) set pointer to first byte in record. 3 ) test character. if the code represents a single character, point to next source byte and retest. 4) if the code represents a digram: move all bytes ( if any ) up to the coded digram; and move in the digram. 5) increase the length value by one, point to next source byte and continue with step 3. application of the technique the algorithm, when used on the data base of approximately 40,000 records was found to yield 43.5% compaction. the file contains bibliographic records of the type shown in figure 1. 413.5 1970 70al350 warner m stone m the data bank societyorganizations, computers and social freedom. london, george allen and unwin, <1970>. 244 p. charts. /social research/ into the potential thrf.at to privacy and freedom f/human right/sl through thf misuse of /data bank/s examines /computer/ based /information ---~ieval/, the impact of computer technology on branches of the /public administration/ ann /health service/$ in the /usa/ ano the /uk/ ano co~cluoes that, in order to protect human dignity, the new powers must be kept tn chf.ck. /bibliography/ pp. 236 to 242 ano /reference/$. 
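the encoding and decoding elements just described translate directly into a short program. the sketch below is an illustrative reimplementation in python, not the original assembler routine; it collapses the two-level table look-up into a single dictionary of codable digrams, uses the three toy digrams from the abcdef example, and adds a helper that performs the digram frequency count and ranking the method requires.

    from collections import Counter

    # toy table from the example above; a real table would hold the selected digrams,
    # each mapped to an otherwise unused one-byte code.
    DIGRAM_TO_CODE = {"ab": "\x80", "bc": "\x81", "de": "\x82"}
    CODE_TO_DIGRAM = {code: dg for dg, code in DIGRAM_TO_CODE.items()}

    def most_frequent_digrams(records, n):
        """count overlapping digrams across the file and keep the n most frequent."""
        counts = Counter()
        for rec in records:
            counts.update(rec[i:i + 2] for i in range(len(rec) - 1))
        return [dg for dg, _ in counts.most_common(n)]

    def encode(text):
        """greedy left-to-right replacement of codable digrams (encoding steps 1-6)."""
        out, i = [], 0
        while i < len(text):
            pair = text[i:i + 2]
            if pair in DIGRAM_TO_CODE:       # codable pair found
                out.append(DIGRAM_TO_CODE[pair])
                i += 2                       # skip past both characters
            else:
                out.append(text[i])          # single, non-combined character
                i += 1
        return "".join(out)

    def decode(coded):
        """expand a compacted string byte by byte (decoding steps 1-5)."""
        return "".join(CODE_TO_DIGRAM.get(ch, ch) for ch in coded)

    assert len(encode("abcdef")) == 4            # ab and de combine; c and f do not
    assert decode(encode("abcdef")) == "abcdef"

as in the worked example, bc never combines here because ab has already been encoded as a pair.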
engl fig. 1 . sample record from test file. each record contains a bibliographic se gment as well as a brief abstract containing descriptors placed between slashes for computer identification. a large amount of blank space appears on the printed version of these records; however, the uncoded machine readable copy does not contain blanks, except between words and as filler characters in the few fields defined as fixed-length. the average length of a record is 535 characters ( 10) . 202 journal of library automation vol. 4/4 december, 1971 the valid graphics appearing in the data are shown in table 1, along with the percentage of occurrence of each character throughout the entire file. table 1. single-character frequency freq. freq. freq. freq. freq. graphic % graphic % graphic % graphic % graphic % b 14.87 i 4.32 h 1.58 0.63 8 0.31 e 7.63 c 3.48 1.52 w 0.50 ( 0.28 n 6.38 l 3.32 ' 1.52 2 0.42 ) 0.28 i 6.01 d 2.32 1 1.08 k 0.42 + 0.21 a 6.01 u 2.21 v 0.91 3 0.40 j 0.15 (/j 5.86 p 2.12 b 0.87 5 0.37 x 0.14 t 5.50 m 2.02 9 0.83 7 0.37 z 0.13 r 4.82 f 1.61 y 0.82 0 0.35 q 0.08 s 4.61 g 1.58 6 0.81 4 0.34 misc. 0.01 spec. as might be expected, the blank (b) occurs most frequently in the data because of its use as a word separator. the slash occurs more frequently than is normal because of its special use as a descriptor delimiter. it should also be noted that the data contains no lower-case characters. this is advantageous to the algorithm because it considerably le~sens the total number of possible digram combinations. as a result, a larger proportion of the file is codable in the limited set chosen as codable pairs, and because the absence of 26 graphics allows the inclusion of 26 additional coded pairs. in the file used for compaction there are 58 valid graphics. allowing one character for special functions leaves 197 unallocated character codes (of a total of 256 possible ). a digram frequency analysis was performed on the entire file and the digrams ranked in order of decreasing frequency. from this list the first 197 digrams were selected as those which were eligible for replacement by single-character codes. table 2 shows these "encodable" digrams arranged by lead character. the algorithm was programmed in assembler language for use on an ibm 360/40 computer. the encoding element requires approximately 8,000 bytes of main storage; the decoding element requires approximately 2,000 bytes. in order to obtain data on the amount of computer time required to encode and decode the file, the following tests were performed. to find the encoding time, the file was loaded from tape to disk. the tape copy of the file was uncoded, the disk copy compacted. loading time for 41,839 records was 52 minutes and 51 seconds. the same tape to disk operation without encoding took 28:08. the time difference ( 24:43) represents encoding time for 41,839 records, or .035 seconds per record. a decoding test was done by unloading the previously coded disk file to tape. the time taken was 41:52, versus a time of 20:20 for unloading compaction of alphanumeric dataischieber and thomas 203 an uncompacted file. the time difference (21:32) represents decoding time for 41,839 records, or .031 seconds per record. the compaction ratio, as indicated above, was 43.5 per cent. for purposes of comparison, the algorithm developed by snyderman and hunt ( 6) was tested and found to yield a compaction ratio of 32.5% when applied to the same data file. table 2. most frequently occuring digrams lead char. 
a b c d e f g h i l m n 0 p r s t u v w y b 1 i ) eligible digrams ab ac ad ag ai al am an ap ar as at ab bl bo ca ce ch ci cl co ct cu cb c. dedi du db dl ea ec ed ef el em en ep er es et ev eb el fe fifo fr f~ ge gl gr gb gl ha he hi ho hb la ic ie il in 10 is it iv la le li ll lo lu us ma me mi mm mu mhs na nc nd ne ng ni no ns nt nla nl oc od of og ol om on op or ou ov ol,a pa pe pl po pr p. ra re ri rk rn ro rs rt ru ry rb rl sa se sl so sp ss st su shs s, s. ta tc te th ti to tr ts tu ty tb t i uc ud ul un ur us ut va ve vi wo yhs yl lisa hsb bc bd be hsg lal lal bm bn bo hip l;6r bs hit l;6u l;6w };6};6 l/j i l/j-. l/j ( 19 1 a ; c je 11 / l ; m jp jr ; s jt jb 1, ,b .l/j -b ), possible extension of the algorithm currently the compression technique encodes only pairs of characters. there might be good reason to extend the technique to the encoding of longer strings-provided a significantly higher compaction ratio could be 204 journal of library automation vol. 4/4 december, 1971 achieved without undue increase in processing time. one could consider encoding trigrams, quadrigrams, and up to n-grams. the english wo~d ·'the", for example, may occur often enough in the data to make it worth coding. the arguments against encoding longer strings are several. prime among these is the difficulty of deciding what is to be encoded. doing an analysis of digrams is a relatively straightforward affair, whereas an analysis of trigrams and longer strings is considerably more costly, because of the fact that there are more combinations. furthermore, if longer strings are to be en'coded, the algorithms for encoding and decoding become more complex and time-consuming to employ. one approach to this type of extension is to take a particular type of character string, namely a word, and to encode certain words which appear frequently. a test of this technique was made to encode particular words in the data: descriptors . all descriptors (about 1200 in number) appear specially marked by slashes in the abstract field of the record. each descriptor (including the slashes) was replaced by a two-character code. after replacement, the normal compaction algorithm was applied to the record. a compaction ratio of 56.4% was obtained when encoding a small sample of twenty records ( 10,777 characters). the specific difficulty anticipated in this extension is the amount of either processing time or storage space which the decoding routines would require. if the look-up table for the actual descriptor values were to be located on disk, the time to retrieve and decode each record might be rather long. on the other hand, if the look-up table were to be in main storage at the time of processing, its size might exclude the ability to do anything else, particularly when on-line retrieval is done in an extremely limited amount of main storage area. a partial solution to this problem might be to keep the look-up tables for the most frequently occurring terms in main storage and the others on disk. at present further analysis is being done to determine the value of this approach. conclusions the compaction algorithm performs relatively efficiently given the type of data used in text data base (i.e. data without lower case alphabetics, having a limited number of special characters, in primarily english text ). the times for decoding individual records ( .031 sec/ record ) indicate that on a normal print or terminal display operation, no noticeable increase in access time will be incurred. 
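the descriptor-replacement extension described above can be layered in front of the digram routine sketched earlier. the fragment below is a hypothetical illustration: the descriptor table entries are invented (a real table would hold roughly 1,200 descriptors), and the ordinary digram encoder from the previous sketch is assumed to run afterwards.

    import re

    # hypothetical entries; each slash-delimited descriptor maps to a two-character code.
    DESCRIPTOR_TO_CODE = {
        "/computer/": "\x01a",
        "/data bank/": "\x01b",
        "/human right/": "\x01c",
    }

    def replace_descriptors(record):
        """replace each slash-delimited descriptor with its two-character code."""
        swap = lambda m: DESCRIPTOR_TO_CODE.get(m.group(0), m.group(0))
        return re.sub(r"/[^/]+/", swap, record)

    # the normal digram pass (encode() from the earlier sketch) is then applied:
    # compacted = encode(replace_descriptors(abstract_text))

decoding would reverse the two substitutions in the opposite order, which is where the look-up-table storage question raised above becomes the practical constraint.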
however several types of problems are encountered when treating other kinds of data. since the algorithm works on the basis of replacing the most-frequently occurring n-grams by single-byte codes, the compaction ratio is dependent on the number of codes that can be "freed up" for n-gram representation. the more codes that can be reallocated to n-grams, the better the compaction. data which would pose complications to the algorithm-as currently defined-can be separated for discussion as follows: compaction of alphanumeric datajschieber and thomas 205 1) data containing both upper and lower case characters (as well as a limited set of special characters), and 2) data which might possibly contain a wide variety of little-used special graphics. if lower-case characters are used, a possible way to encode data using this technique is to harken back to the time-honored method of representing lower-case with upper-case codes, and upper-case characters by their value, preceeded by a single shift code (e.g., #access for access). the shift code blank character digram would undoubtedly figure relatively high on the frequency list, making it eligible as an encodable digram. the second problem occurs when one attempts to compact data having a large set of graphics. a good example of this is bibliographic data containing a wide variety of little-used characters of the type now being provided for in the marc tapes ( 11) issued by the u. s. library of congress (such as the icelandic thorn). normally representation of these graphics is done by allocating as many codes as required from the possible 256-code set. since the compaction ratio is dependent on the number of unallocated internal codes, a possible solution to this dilemma might be to represent little-used graphics by multi-byte codes which would free the codes for representation of frequently occurring n-grams. further, it is noticeable that the more homogeneous the data the higher the compression ratio. this means that data all in one language will encode better than data in many languages. there is, unfortunately, no ready solution to this problem, given the constraints of this algorithm. in dealing with heterogeneous data one must be prepared to accept a lower compression factor. without doubt to be able to effect a savings of around 40% for storage space is significant. the price for this ability is computer processing time, and the more complex the encoding and decoding routines, the more time is required. there is a calculable break-even point at which it becomes economically more attractive to buy x amount of additional storage space than to spend the equivalent cost on data compaction. yet at the present cost of direct-access storage, compaction may be a possible solution for organizations with large data files. references 1. marron, b. a.; demaine, p. a. d.: "automatic data compression," communications of the acm, 10 (november 1967), 711-715. 2. demaine, p. a. d.; kloss, k.; marron, b. a.: the solid system iii: alphanumeric compression. (washington, d. c. : national bureau of standards, 1967 ) . (technical note 413 ) . 3. salton, g.: automatic information organization and retrieval (new york: mcgraw-hill, 1968 ). 4. resnikoff, h. l.; dolby, j. l.: "the nature of affixing in written english," mechanical translation, 8 (march 1965), 84-89. 206 journal of library automation vol. 4/4 december, 1971 5. resnikoff, h . l.; dolby, j. l.: "the nature of affixing in written english," mechanical translation, 9 (june 1966), 23-33. 6. 
snyderman, martin; hunt, bernard: "the myriad virtues of text compaction," datamation (december 1, 1970), 36-40. 7. myers, w.; townsend, m.; townsend, t.: "data compression by hardware or software," datamation (april 1966), 39-43. 8. demaine, p. a. d.; kloss, k.; marron, b. a.: the solid system ii. numeric compression. (washington, d. c.: national bureau of standards, 1967). (technical note 413 ). 9. demaine, p. a. d.; marron, b. a.: "the solid system i. a method for organizing and searching files." in schecter, g. (ed.): information retrieval-a critical view. (washington, d. c.: thompson book co., 1967). 10. schieber, w.: isis (integrated scientific information system; a general description of an approach to computerized bibliographical control). (geneva: international labour office, 1971) . 11. books: a marc format; specification of magnetic tapes containing monographic catalog records in the marc ii format. (washington, d. c.: library of congress, information systems office, 1970.) 10 high school library data processing betty flora: librarian, leavenworth high school, leavenworth, kansas and john willhardt: data processing instructor, central missouri state college, warrensburg, missouri. planning and operation of an automated high school library system is described which utilizes an ibm 1401 data processing system installed for teaching purposes. book ordering, shelf listing and circulation have been computerized. this paper presents an example of a small automated high-school library system which works efficiently. a great deal of emphasis to date in library automation has been on large university and college libraries, but the relatively few schools that have pioneered in the field of school library automation have demonstrated its feasibility and its potential. data processing is economically within the realm of large and medium-sized school districts. the port huron district, port huron, michigan, has an accounting machine, keypunch and verifier; among the operations performed are printing purchase orders and book cards. the port huron staff consists of one professional librarian, two clerks and two part-time working students. evanston township high school, evanston, illinois, has an automated library system processed with an ibm 1401 computer. other high schools using library data processing are the oak park-river forest high school in illinois; beverly hills, california; west hartford, connecticut; weston, massachusetts; and the burnt hills-ballston lake and bedford-mt. kisco school districts in new york state (1). there are a small number of high schools and vocational schools in kansas and missouri that have high school library edp/flora and willhardt 11 data processing equipment which is used for teaching purposes. names and addresses of these schools may be obtained from the missouri director of vocational education at jefferson city, missouri, and from the kansas state supervisor of technical training at topeka, kansas. introduction leavenworth senior high school, leavenworth, kansas, a campusstyle school comprising six buildings, has approximately 1350 students. the library, located in the main academic building, is presently being remodeled and enlarged. it contains approximately eighteen thousand volumes, including the professional collection; and fifteen hundred to two thousand new volumes are added each year. 
the library staff consists of one qualified librarian, two full-time clerical assistants, and twenty student assistants, each of the latter working one class period a day. the library is, in the true sense of the term, a media center. a mobile listening center is available, and there are large collections of recordings, cartridge and reel tapes, film strips, films, microfilms, reproductions of paintings, educational games, magazines and vertical file material. fortunately, there is a consistently substantial budget of more than eight dollars per student, including some federal funds, which makes additions to the collection possible in stable development. · data processing at leavenworth high school was made possible by the vocational education act of 1963, which provided for the secretary of health, education, and welfare to enter into agreements with the several state vocational education agencies to provide such occupational training as found to be necessary by the secretary of labor (2). under the provisions of the act, federal money is alloted to the states, which in turn allot a portion of this money to various school districts; a school system receiving such money must lease or purchase data processing equipment and use it mainly for teaching purposes. a data processing curriculum was initiated in the school year 1964-65 at leavenworth high school, under conditions and regulations set up by the state supervisor of technical training which gave first priority in the use of the data processing equipment to teaching. this has been adhered to strictly at leavenworth high school; the equipment is used over half of the school day for teaching purposes and adult education courses in data processing are offered at night. class time consists of lecture and application, with students having opportunity to operate, wire, program and test problems. data processing classes are scheduled first in the computer room; administrative and library operations are scheduled to be processed in the remaining hours during the school day and after school, each operation being assigned a specific time. although unit record equipment was initially leased, plans for a small computer were included in the original decision to offer data processing courses. equipment, plus salaries to those conducting the program, con12 1 ournal of librm·y automation vol. 2/1 march, 1969 stitute a major investment for a medium-sized public high school. consequently, although the classes are a valuable addition to the vocational training area of the curriculum, as many applications as possible are made of school operations, such as enrollment, record keeping, grade reports and payroll, in order to further justify the cost. for this reason, the superintendent of the leavenworth school system suggested that the library might, by using data processing in many of its procedures, both support the data processing instructional program and increase its own effectiveness. methods and materials to develop a system requires systems analysis, which necessitates a clear formulation of purposes and requirements independent of any particular design for implementation ( 3); and the development of procedural applications to be processed on a computer system should be a joint responsibility of both the systems staff and line management ( 4). furthermore, any conversion of library procedures to automation should be carefully planned in advance. 
proceeding in the fullest cooperation with a view to mutual benefits, the librarian at leavenworth and the head of data processing spent many hours working out the details of their joint effort. the librarian explained her needs and suggested methods of achieving the desired objectives. for his part, the head of data processing evaluated the possibilities from a technical point of view and suggested methods of achieving the desired objectives. together they worked out an initial plan, and the various phases were then programmed. the leavenworth data processing library system was set up to 1) order all new library books; 2) complete shelf cards and book checkout cards; 3) run shelf card listings; 4) correct and file shelf cards; 5) reproduce book checkout cards for books checked out; 6) run first and second overdue notices; and 7) provide library inventory, book count lists and book catalogs. all the lists, notices, and reproduced cards are done on the 1401 computer; computer programs for these operations are written in autocoder. the amount of computer time required for the processing of library data and reports is comparatively small in relation to other operations of the data processing department and was set up to run partly in the daily schedule and partly after school. time required for preparation of information for the computer is significant and must be scheduled more carefully. again, part of this time is fitted into the daily schedule and part of it is accomplished after classes. the high school leases the following ibm data processing equipment: two 024 card punches, one 026 printing card punch, one 082 sorter, one 548 interpreter, one 085 collator, and one 1401 computer with 4k and one disk storage drive. the 1401 computer consists of the 1401 central processing unit, a 1402 card reader punch, a 1403 printer and a 1311 disk storage drive. high school library edp /flora and willhardt 13 the following cards were developed for the procedure: shelf card a is punched from lists of books to be ordered and only the following information and columns are punched: author name (columns 14-35), title (columns 36-71), copyright date (columns 72-73), and purchase date (columns 79-80). when the book is received, this card is completed with the following information: shelf letter (column 1), dewey decimal number (columns 2-7), author number (columns 8-13) and accession number (columns 7 4-78). shelf card b is punched and filed behind shelf card a. only the following information and columns are punched: price (columns 8-13), publisher (columns 36-65), and an x-punch in column 80. the book checkout card (figure 1) is first reproduced from the completed shelf card a, and after that from book checkout cards when books have been checked out of the library. this card contains the shelf letter (column 1), dewey decimal number (columns 2-7), author number (columns 8-13), author name (columns 14-30), title (columns 31-66), student number (columns 68-73), accession number (columns 7 4-78), and an x-punch in column 80. !look titlt author i accept responsibility for this book ano should this book be lost, destroyed or stolen while checked out to me, i will pay the replacement cost of the book, i agree to pay nie fin~ for overdue book$ as follows: i to 5 days overdue 2¢ per day 6 to 10 da'is overdue !5¢ per day over 10 days ~l:rt!v£ 10¢ per da't ---=moc.,=-=t '""':=::-'""' __ ] fig. 1. book checkout card. 
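because the card formats above are fixed-column, the fields can be recovered by simple slicing. the sketch below is illustrative only; the dictionary keys are descriptive labels, not anything from the original autocoder programs. it reads an 80-column image of a completed shelf card a, shifting from the 1-based card columns given in the text to python's 0-based string indexes.

    def parse_shelf_card_a(card):
        """slice the fields of a completed shelf card a out of an 80-column card image."""
        card = card.ljust(80)                                      # pad short images to a full card
        field = lambda first, last: card[first - 1:last].rstrip()  # 1-based, inclusive columns
        return {
            "shelf_letter":   field(1, 1),    # b, k, p, r, s, or blank for fiction
            "dewey_number":   field(2, 7),
            "author_number":  field(8, 13),
            "author_name":    field(14, 35),
            "title":          field(36, 71),
            "copyright_date": field(72, 73),
            "accession_no":   field(74, 78),
            "purchase_date":  field(79, 80),
        }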
student tivmdir i a student finder card locates the student's name and parent's name and address on the computer disk pack. the biggest initial task was keypunching an ibm card for each book in the library, which at that time comprised 13,000 books. it was done by data processing students in the high school, working occasionally during class, but mostly after class and on saturdays on a voluntary basis. toward the end of the second semester, many of the procedures had been reviewed and discussed with students in the data processing classes as part of the vocational program. 14 journal of library automation vol. 2/1 march, 1969 aondb card i nte'rpret cards run listing on 1401 re-:-run list fig. 2. book order procedure. interpret book checkou cards interpret shelf card reproduce shelf card into book checkout card fig. 3. new book p1'0cessing. high school library edp/flora and willhardt 15 r ecej ve book · checkout cards from library reproduce new book checkout cards interpret new book checkout cards return old and new book check: out cards to ll brary fig. 4. book checkout procedure. cards for overdue books send finder cards to data processing run address labels return labels and finder cards to library fig. 5. overdue notice procedure. 16 journal of library automation vol. 2/1 march, 1969 book order (figure 2) the library furnishes the data processing department with request cards or lists of books to be ordered, giving author name, title, copyright date, price, publisher, and purchase date (year). data processing punches two cards for each book according to shelf cards a and b. these cards and batches must be kept in the order received from the library. the cards are interpreted, checked for correct punching and listed by batch. the library must check the number of copies ordered and the total amount of each group or batch. after verification and corrections, the cards are returned to data processing for rerunning of the number of copies necessary to send with the purchase order. new book processing (figure 3) when new books are received, the library staff discards shelf card b and writes the following information on shelf card a for punching in the columns indicated: shelf letter in column 1 ( b for biography, k for kansas, p for professional, r for reference, s for story collection, or a blank which indicates fiction); dewey decimal number in columns 2-7; author number in columns 8-13; and accession number in columns 74-78. these columns are interpreted on the 548 interpreter. shelf card a is used to reproduce the book checkout card. shelf cards are block sorted on column 1; each group is then sorted by author number and dewey decimal number. individual cards must be hand filed into the shelf list. the shelf list can be used to provide classification listings, inventory listings, library book counts and book catalogs. book checkout cards are interpreted, sorted by author name (columns 14-23 alpha), returned to the library and filed in the respective books. book checkout card reproduction (figure 4) as books are checked out of the library, the book checkout card (figure 1) is signed by the student and his number is written on it. once a week accumulated book checkout cards are sent to data processing to be reproduced into new book checkout cards which are interpreted and merged behind the old book checkout cards. each week's cards are kept separately. the old cards are for books due in the library in two weeks. 
the new cards are inserted in these books as they are returned and the old ones placed in a separate file for library circulation statistics. overdue notices (figure 5) the library is provided with a deck of student finder cards (on~ for each student), with student name, number and finder number on the card and in the address file on the disk. when books are overdue, finder cards are pulled by the library staff and sent to data processing,. where they are sorted by a disk accession number. address labels are run on high school library edp/flora and willhardt 17 the 1401 computer for those students with overdue books. these labels are presently attached to pre-printed envelope overdue notices (figure 6), but it is planned to replace the envelope with a continuous-form post card. the first notice is addressed to the student at his home and the second to his parents. leavenworth senior high school library if you have returned your overdue library materials disregard this notice •••• if not, please come to the library at your earliest convenience. 1 to 5 days overdue..................... 2¢ per day 6 to 10 days overdue ..................... 5¢ per day over 10 days overdue ..................... lo¢ per day fig. 6. overdue notice. discussion the book checkout and overdue notices procedures were the first concrete ones developed. these were initiated during the 1965-66 school year, and have proved to be quite successful in saving time and effort. one of the most useful purposes of the leavenworth system is that any portion of the shelf list can be easily provided for an instructor who wishes to assign special readings. also the system has simplified and accelerated preparation of lists for inventory purposes. the ordering process gives the librarian the opportunity to check the order lists before forwarding them to the business manager; this improves the accuracy of the order. 18 i ournal of library automation vol. 2/1 march, 1969 standardization of procedure and operation is essential for efficiency ( 5,6). basically the leavenworth procedure utilized two types of cards, sheh card a and the book checkout card, which are very similar in format. sheh card a is initiated when ordering books and is used to reproduce book checkout cards and to make sheh listings, inventory and book count listings. moreover the system was designed on the basis of having a minimum of skilled clerical workers. student help is used for correcting and filing sheh cards. the ability to provide a book catalog in the future is an advantage. a book catalog need not be confined to one area and may be done in multiple copies. different editions of a work may be more readily seen and compared on a printed page than in a card catalog, where only one entry can be examined at a time. also a book catalog may concentrate in a single easily handled volume entries which would occupy several heavy drawers in a card catalog (7). one of the problems associated with developing a system like the one here described is that of communication. as in all technical and professional areas, a specialized terminology develops, a kind of esoteric jargon which confuses meanings and impedes understanding. this difficulty naturally diminishes as each party to the cooperative effort becomes more familiar with the terminology of the other, and a little plain talking and clear thinking will soon eliminate it. 
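the fine schedule printed on the overdue notice above amounts to a small rate table. the notice does not say whether a single rate applies to every day overdue or whether the rates are marginal by bracket; the sketch below assumes the simpler reading, one rate chosen by the total number of days overdue.

    def overdue_fine_cents(days_overdue):
        """fine in cents, assuming one rate applies to all days overdue."""
        if days_overdue <= 0:
            return 0
        if days_overdue <= 5:
            rate = 2        # cents per day, 1 to 5 days overdue
        elif days_overdue <= 10:
            rate = 5        # cents per day, 6 to 10 days overdue
        else:
            rate = 10       # cents per day, over 10 days overdue
        return rate * days_overdue

    print(overdue_fine_cents(7))   # -> 35 cents under this reading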
the effectiveness of an automated library program depends, of course, upon the unqualified cooperation between the library and the data processing department. the librarian must establish a reasonable and acceptable schedule of work upon which the data processing department can depend, and she must assure that library material essential to that work is delivered according to schedule. conversely, the data processing department must undertake to complete the work promptly and accurately. evaluation certainly one of the most significant benefits of automation is the great saving of time. tedious and detailed tasks essential to the efficient operation of any library, tasks which formerly required many hours to complete and which had by their natures to be repeated periodically, are accomplished in a fraction of the time. consequently, the librarian is freed for more professional work; most importantly, she has more time to give to the students and their problems, which should be, above all, her first concern. the value of the leavenworth high school library system lies not only in greater accuracy and saving of time for the librarian and he~ staff, but also in the opportunity it provides for student help to learn and operate a system. it is apparent, finally, that automation, properly applied, can be an invaluable asset to the school library. like all systems it depends, in the high school library edp /flora and willhardt 19 final analysis, upon the human factors involved. so long as interests are mutual, and so long as efforts are equal, the library and data processing departments can work effectively together for the benefit of both. acknowledgments mr. jack spear, ksu, manhattan, kansas, advised on the initial planning of the system. the authors received cooperation and encouragement from mr. gordon yeargan, superintendent of schools in leavenworth, and mr. dino spigarelli, principal of leavenworth high school. mr. fred buis, data processing instructor at the high school, helped with the preparation of this paper and is continuing to develop the potential of the system. references 1. mccusker, sister mary lauretta: "implications of automation for school libraries part 2," school libraries, (fall, 1968), 15-22. 2. united states department of health, education and welfare: vocational and technical education (washington: government printing office, 1964). 3. markuson, barbara evans, ed.: libraries and automation (washington: library of congress, 1964). 4. elliott, orville c.; wesley, roberts.: business information processing systems (homewood, illinois: richard d. irwin, inc~ , 1968). 5. laden, h. n.; gildersleeve, t. r.: system design for computer applications (new york: john wiley & sons, inc., 1963). 6. dougherty, richard m.: "manpower utilization in technical services," library resources and technical services, 12 (winter, 1968), 79-80. 7. kingery, robert e.; tauber, maurice f., eds.: book catalogs (new york: the scarecrow press, inc., 1963). 112 journal of library automation vol. 14/2 june 1981 anyway because he is primarily getting suggested classification numbers in order to browse. the tucson public library could not have made the above decisions if it did not have a complete online file of all its holdings (including even reference materials that never circulate). 
but since this data did exist (after a five-year bar-coding effort) and since more than forty online terminals were already in place throughout the library system to access the online file, the decision not to include locations or holdings in the microform catalog seemed reasonable . in the longer-range future (1990?), it is very likely that the entire catalog will be available online. in the meantime, the tucson public library did not want to divide its resources maintaining two location records, but rather wanted to concentrate resources in maintaining one accurate record of locations available as widely as possible throughout the library system (by installing more online terminals for staff and public use). was this decision a sound one? we don't know. the microform catalog has not yet been introduced for public use. by the end of this year we should have some preliminary answers to this question. references 1. robin w. macdonald and j. mcree elrod, "an approach to developing computer catalogs," college & research libraries 34:202--8 (may 1973). a structure code for machine readable library catalog record formats herbert h. hoffman: santa ana college, santa ana, california. libraries house many types of publications in many media, mostly print on paper, but also pictures on paper, print and pictures on film, recorded sound on plastic discs, and others. these publications are of interest to people because they contain recorded information. more precisely said, because they contain units of intellectual, artistic, or scholarly creation that collectively can be called "works." one could say simply that library materials consist of documents that are stored and cataloged because they contain works. the structure of publications into documents (or "books") and works, the clear distinction between the concept of the information container as opposed to the contents, deserves more attention than it has received so far from bibliographers and librarians. the importance of the distinction between books and works has been hinted at by several theoreticians, notably lubetzky. however, the idea was never fully developed. the cataloging implications of the structural diversity among documents were left unexplored. as a consequence, librarians have never disentangled the two terms book and work . from the paris principles and the marc formats to the new second edition of the anglo-american cataloguing rules, the terms book and work are used loosely and interchangeably, now meaning a book, now a work proper, now part of a work , now a group of books. such ambiguity can be tolerated as long as each person involved knows at each step which definition is appropriate when the term comes up. but as libraries ease into the age of electronic utilities and computerized catalogs based on records read by machine rather than interpreted by humans, a considerably greater measure of precision will have to be introduced into library work. as one step toward that goal an examination of the structure of publications will be in order. the items that are housed in libraries, regardless of medium, are of two types. they are either single documents, or they are groups of two or more documents. items that contain two or more documents are either finite items (all published at once, or with a first and a last volume identified) or they are infinite items (periodicals, intended to be continued indefinitely at intervals). 
schematically, these three types of bibliographic items in libraries can be represented as shown in figure l. it should be noted that all publications, all documents, all bibliographic items in lid d ... d do __ _ fig. 1. three types of bibliographic items: top, single-document item; center, finite multiple-document item; bottom, infinite multipledocument item. braries, can be assigned to one of these three structures. there are no exceptions. all bibliographic items, furthermore, contain works. an item may contain one single work. but an item may also contain several works. schematically, the two situations can be represented as shown in figure 2. an item that is composed of several documents and contains several works may have one work in each document, or several per document. schematically, the two possibilities can be represented as shown in figure 3. it is possible, of course, for an item to fig . . 2. top, single-work document (example: a typical novel); bottom, multiple-work document (example: a collection of plays). communications 113 fig. 3. top, one work per document; bottom, several works per document . be composed of several documents but to contain only one work. figure 4 is a schematic representation of this case. mixed structures are also possible, as in the schematic shown in figure 5. ign oring the mixed structure that is only a combination of two "pure" structures, the foregoing information can be combined into a table that shows seven possible publication types that differ from each other in terms of structure (figure 6). all bibliographic items, whether composed of one document or many, are known by a title . these titles can be called item titles. in the case of a singledocument item (structures a and c), item title and document title are, of course, identical. but in the case of some multiple-document items (publications of types d, e, f, and g, for example), two possibilities exist: the documents that make up the item may or may not have their own individual document titles. for purposes of fig. 4. multivolume work (example: a very long novel in two volumes). fig. 5. finite multi-document item containing many works, mixed structure. 114 journal of library automation vol. 14/2 june 1981 one several documents document per item per item one \.jork per item a se veral several lo/orks works per item per c document one lo/ork per document fig . 6. publication types. the bibliographer or cataloger, items that consist of several documents bearing individual document titles can be described under one of two principles. the entire item can be treated as a unit. elsewhere i have coined a term for this treatment: the set description principle .1 but it is also possible to treat each document as a separate publication, to describe it under the book description principle . if we combine all these considerations we find that we can assign to each bibliographic item that is added to a library's collection one of the thirteen codes shown in figure 7. how can these codes be useful? taking a look into the future, let us imagine an online catalog system supported by a database that contains the records of a library's holdings . the records in such a database are entered in a definite format . in this format, whatever it will be called , there will be data fields for titles, authors, physical descriptions , subject headings, document numbers, and much else. i propose that to these fields one other be added: the structure code . 
the structure code would add a new dimension to the retrieval of recorded infinite infinite b d e f g formation. here are a few specific examples . consider a search for material on subject x. qualify the search argument by structure codes 1, 3, 7, and 12. result: the search will yield only major monographic works, defined as items of types a, b,f, and g. note that subject x assigned to such items is a true subject heading. the materials retrieved in this example would all be works dealing specifically with the topic x. but the same term assigned to an item coded, say, 6, would not be a true subject heading. the term here would only give a broad general summary of what the works in the item are about. the structure code adds sophistication to the retrieval process by enabling a searcher to distinguish between specific subject designators and mere summary subject headings. a search that excludes codes 2, 4, 5, and 6 limits output to materials that are not just collections of essays. the stratagem used in card catalogs to reach the same result is the qualification of a subject heading by terms denoting format, such as the subdivisions congresses or addresses, essays, lectures . this method of qualifying subject headings has never been done communications ll5 structure code publication type description principle: book (b) or set (s) schematic 1 a 2 c 3 b 4 d 5 d 6 d 7 f 8 f 9 e 10 e 11 e 12 g 13 g fig. 7. structure codes . consistently , however . the proposed structure code would ensure uniform treatment of all affected publications. qualify the search by codes 9, 10, 11, 13 and all periodicals can be excluded . in the card catalog, format qualifications such b b s b s, with individual document title s, without indiv . document title b s b s, with individual . document title s, without indiv. document title b s fwli ___ wgj ~--~ as periodicals, or societies, periodicals, etc ., or yearbooks are sometimes added to subject headings to reach similar results. again, the structure code would introduce uniformity and consistency. present-day card catalogs list publica116 journal of library automation vol. 14/2 june 1981 tions only. they do not list the individual works that may be contained in publications. if an analytic catalog were to be built into a computerized system at some time in the future , the structure code would be a great help in the redesign, because it makes it easy to spot items that need analytics, namely those that contain embedded works, or codes 2, 4, 5, 6, 8, 9, 10, 11, and 13. a searcher working with such an analytic catalog could use the code to limit output to manageable stages-first all items of type c, for example; then broadening the search to include those of type d; and so forth, until enough relevant material has been found. the structure code would also be useful in the displayed output. if codes 5 or 8 appeared together with a bibliographic description on the screen, this would tell the catalog user that the item retrieved is a set of many separately titled documents. a complete list of those titles can then be displayed to help the searcher decide which of the documents are relevant for him. in the card catalog this is done by means of contents notes . not all libraries go to the trouble of making contents notes, though, and not all contents notes are complete and rtliable . the structure code would ensure consistency and completeness of contents information at all times. 
codes 10 and 13 in a search output, analogously, would tell the user that the item is a serial with individual issue titles. there is no mechanism in the contemporary card catalog to inform readers of those titles. codes 4 and 7 would tell that the document is part of a finite set, and so forth. it has been the general experience of database designers that a record cannot have too many searchable elements built into its format. no sooner is one approach abandoned "because nobody needs it," than someone arrives on the scene with just that requirement. it can be anticipated, then, that once the structure code is part of the standard record format, catalog users will find many other ways to work the code into search strategies. it can also be anticipated that the proposed structure code, by adding a factor of selectivity, will help catalogers because it strengthens the authority-control aspect of machine-readable catalog files. if two publications bear identical titles, for example, and one is of structure 1, the other of structure 6, then it is clear that they cannot possibly be the same items. however, if they are of structures 1 and 7, respectively, extra care must be taken in cataloging, for they could be different versions of the same work. determination of the structure of an item is a by-product of cataloging, for no librarian can catalog a book unless he understands what the structure of that book is-one or more works, one or more documents per item, open or closed set, and so forth . it would therefore be very cheap at cataloging time to document the already-performed structure analysis and express this structure in the form of a code. references l. herbert h. hoffman, descriptive cataloging in a new light: polemical chapters for librarians (newport beach, calif.: headway publications, 1976), p.43. revisions to contributed cataloging in a cooperative cataloging database judith hudson: university libraries , state university of new york at albany. introduction oclc is the largest bibliographic utility in the united states. one of its greatest assets is its computerized database of standardized cataloging information . the database, which is built on the principle of shared cataloging, consists of cataloging records input from library of congress marc tapes and records contributed by member libraries. oclc standards ln. order to provide records contributed by member libraries that are as usable as those input from marc tapes, it is immitchell multimedia will have a profound effect on libraries during the next decade. this rapidly developing technology permits the user to combine digital still images, video, animation, graphics, and audio. it can be delivered in a variety of finished formats, including streaming video on the web, video on dvd/vcd, embedded digital objects within a web page or presentation software such as powerpoint, utilized within graphic designs, or printed as hardcopy. this article examines the elements of multimedia creation, as well as requirements and recommendations for implementing a multimedia facility in the library. t he term multimedia, which some may remember being used in the early 1970s as the name for slide shows set to music, now is used to describe “a number of diverse technologies that allow visual and audio media to be combined in new ways for the purpose of communicating.”1 almost all personal computers sold today are capable of viewing multimedia; many can, with minor modifications, also create multimedia. 
one of the most important features of multimedia is its flexibility. multimedia creation has several distinct elements—inputs, processes performed on those inputs, and outputs (see figure 1). each element can be described as follows. � inputs—new video can be recorded, or existing video, stored on a hard disk, cd/dvd, or tape can be imported. the same is true of audio, with the added flexibility of creating soundtracks or sound effects later, during the editing process. digital still images can be used, either shot on a camera or created by scanning an existing picture. digital artwork or animated sequences created in other software also can be brought in. � processing—regardless of the source, these digital inputs are loaded into the editing software. at this stage, the user will select and arrange the images and sounds, and the software may permit special effects to be created. in addition, the editing software may compress the file so that it is easier to use than the large file sizes used in raw video and audio recording. � outputs—at this point, the user has more choices to make. the new multimedia file can be sent to a program that will encode it for a streaming video in any one of a variety of popular formats, such as windows media, realmedia, or clipstream. then it can be mounted on a web site (either a regular page or within courseware such as webct or blackboard), or the file could be burned onto a cd or dvd, or it could be used within presentation software such as microsoft powerpoint. or the output file from the editing process could be encoded and embedded so that it is an avatar running as part of a web page with a product such as rovion bluestream. the possibilities are nearly endless. all of this is made possible by advances in technology on a variety of fronts. one of the happy anomalies in technology is that greater performance is frequently accompanied by lower costs. this is certainly the case with much of the activity surrounding multimedia. the following factors have fostered advances in multimedia: � increase in processing power and decrease in cost of computer hardware; � quality and affordability of video equipment; � compression of multimedia files; � consumer broadband internet access; and � current multimedia editing software the first two technology factors concern the equipment involved in multimedia production. leading off is the familiar, ever-increasing speed of processors and improved memory and hard-drive space, all delivered for less money. this trend is something that many people take for granted, but a reality check is sometimes in order. the processor in the typical desktop machine on advertised special today is approximately forty-four times as fast as the first pentium processor sold ten years ago, and is equipped with sixteen times as much ram and 117 times as much hard-drive space—at 20 percent of the cost of the old machine (not even adjusted for inflation!). the second factor is the incredible quality available in consumer-market video equipment at reasonable costs. while the images produced with consumer-grade video would not play well at the local megaplex movie theater, they look very good on the small screens found on computers, televisions, and classroom projectors. the third factor is that tremendous compression of multimedia files can be achieved during the editing process. 
an incoming raw-video file (in the standard .avi format) can be compressed with editing, encoding, and dedicated third-party compression software to an incredible 1 to 2 percent of its original size, and it will still retain very good quality as a digital object on the web and in other desktop viewing applications. the fourth factor is extremely critical for the success of multimedia web applications. home access is shifting away from dial-up access to broadband, with its greatly increased transfer rates. half of all united states homes with internet access are already using broadband, and the forecast is for steady increase in these numbers.2 although not all broadband is created equal, it is all significantly faster than dial-up access. the final technology factor concerns the software that is currently available to the multimedia web developer. a developer can achieve some quite professional results with even the most basic products, and then can grow into more complex software that supports increasing levels of expertise. once again, this software is being sold in the price range that typical consumers can afford.
gregory a. mitchell (mitchellg@utpa.edu) is assistant director, resource management at the university of texas—pan american library, edinburg, texas.
distinctive expertise: multimedia, the library, and the term paper of the future. gregory a. mitchell.
small really is beautiful
creating a multimedia lab in the library need not be a large, complex undertaking. in fact, it can be very low cost and as simple as a single workstation. so it is scalable, allowing the library to start small and build in complexity and cost as time, money, and human resources will permit. at the bare-bones minimum, a multimedia lab would consist of a workstation with the software necessary for acquiring, editing, and outputting the files. for practical purposes, though, the workstation should be equipped with a network connection, a cd/dvd burner, a scanner, and a webcam with microphone. another very useful option is an analog-digital bridge device, which enables the capture of analog input (such as vhs tape) into digital files for the editor. to achieve better-quality video when shooting original content, a digital-video camera, tripod, wireless microphone, and portable light kit would be recommended. since more time typically is spent at the editing station than with the camera, the lab can be expanded with additional workstations before investing in another camera. experience at the author's institution has shown that it is possible to operate a lab with ten workstations and only three video cameras and three still cameras. finally, output from the editing process will likely be printed, so a photo-quality printer is another convenient option. this illustrates that the entry into multimedia work need not be a large expense, especially if an existing workstation and any other equipment are already available. if a fairly recent workstation is available to dedicate to the project, the library's total startup cost could range from $200 to $1,000. not many new library services can be launched for as little as that. rather than dwell on equipment specifications, as that is not the intent of this discussion, the reader may consult the excellent tutorials available from desktop video and pc magazine's online product guide.3
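when planning a lab at this scale, it can also help to rough out how much disk space and transfer time finished projects will need. the following back-of-the-envelope python sketch works through the roughly 1 to 2 percent compression figure cited above; the raw-capture size, clip length, and connection speeds are illustrative assumptions, not measurements from this article:

# rough estimate of encoded clip size and web-transfer time.
# all constants below are illustrative assumptions, not figures from the article.
RAW_MB_PER_MINUTE = 200        # assumed size of one minute of raw .avi capture
COMPRESSION_RATIO = 0.015      # "1 to 2 percent of its original size" -> use 1.5 percent

def compressed_size_mb(minutes):
    """estimated size of an encoded clip, in megabytes."""
    return minutes * RAW_MB_PER_MINUTE * COMPRESSION_RATIO

def transfer_minutes(size_mb, kilobits_per_second):
    """time to move a file of size_mb megabytes over a link of the given speed."""
    return (size_mb * 8 * 1024) / kilobits_per_second / 60

clip_mb = compressed_size_mb(5)   # a hypothetical five-minute clip
print(f"encoded clip: about {clip_mb:.0f} mb")
print(f"dial-up (56 kbps): about {transfer_minutes(clip_mb, 56):.0f} minutes")
print(f"broadband (1,500 kbps): about {transfer_minutes(clip_mb, 1500):.1f} minutes")

even with generous assumptions, the difference between dial-up and broadband delivery is the difference between an unusable service and a usable one, which is why the fourth factor above matters so much for web delivery.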
finally, the creation of a studio is a worthwhile option. although some video will need to be shot on location, many times it is possible to set up and shoot in just one place. a studio is the best place in which to work because it is a controlled environment. it does not need to be large or complicated, and a quiet office or study room can be set up with little effort and expense. the studio gives the users control over the sound and the lighting, and involves minimal setup time for projects.
the research paper of the future
multimedia has begun to attract attention in the library community. joe janes, chair of library and information science at the information school at the university of washington and the person responsible for developing the internet public library, recently stated he foresees a growing role for multimedia in the library. it will replace much of the traditional, text-based communication that people are accustomed to. for example, multimedia projects can become the research paper of the future for students.4 it is the media in which many library customers will be working. experience from the author's institution with creating a multimedia lab would seem to confirm his observation. during the first year and a half of operation, use of the lab has steadily increased (see figure 2).
figure 2. university of texas—pan american library multimedia lab usage.
collaboration
the multimedia lab opens the doors to collaborative opportunities with faculty and students from a variety of disciplines across campus. this is because multimedia, like geographic information systems (gis) or other electronic information and communication technologies, is a tool and is not discipline-specific. as important as it is to make the connection with faculty, this media is something with which the students will frequently lead the way. they are, after all, the mtv generation, and multimedia has an incredible appeal to their visual orientation.
figure 1. multimedia creation process.
faculty themselves have used it to augment their web-based courses as well as traditional classroom instruction. the author's library has even initiated a multimedia résumé service for graduating students. the students can record a video introduction of themselves, encode this as a rovion bluestream avatar, and post it with their résumés on the web. this creates a much stronger impression than a standard résumé, hopefully giving the students an edge in promoting themselves on the job market. even more impressive is the variety of projects that are created in the lab by the students. one might expect to see interest from students in art and communications classes, but students come from many other disciplines as well. for example, business students have effectively used multimedia in their graduate-school business-plan presentations, while biology students like to use the graphics capabilities to study close-ups of slides. education students have employed it to produce multimedia instructional aids, and a sociology student put together a presentation on underserved, low-income neighborhoods. the library supplies the facility and instruction—only the imagination of students is needed. libraries have always been involved in the students' research and writing process, by providing content, instruction, and facilities for producing the final research product. the same is true in the multimedia environment, although implementing a multimedia lab calls for some new skills for librarians.
these include familiarity with basic principles of videography, learning how to use the cameras and other equipment, and gaining some mastery of the editing and encoding software.
why put it in the library?
in addition to the research-paper analogy, the author believes that librarians can point with pride to the values and value that libraries offer their communities. it is a central and neutral location—not in one department's or college's turf. libraries are conveniently open for many hours per week. many of the information resources that students might use to prepare the presentation are in the library. and librarians have a professional ethic that drives them to provide instruction and assistance for the services the library offers. since multimedia production does have a learning curve and most new users need help in mastering the technology, it does not fit very well with the typical 24/7 drop-in computer lab that the campus information technology (it) department often operates. this is a good opportunity for librarians to recognize some of their strengths and capitalize on them. in addition, this can be a breath of fresh air for librarians. here is an opportunity to learn about something new and creative. most people find that they have less room for creativity as time goes by.5 a multimedia lab in the building will offer librarians the opportunity to create multimedia productions for the library, besides assisting students and faculty with their projects.
potential problems
there are some obstacles to overcome, of course. they need not be seen as major, but it is best to be realistic when beginning any new venture. it is almost always a good idea to start small, with a pilot project that will yield valuable lessons before venturing into anything big.
• equipment—define what specifications are needed, see what is already available to use or borrow, then figure out what you will actually need to buy.
• software—check out the variety of software for editing and production; think about how you want to begin using multimedia (primarily on the web, in presentation software such as powerpoint, as standalone videos on cds and dvds).
• money—if funding permits, a library can invest several thousand dollars in a high-end multimedia computer, associated peripherals such as a color printer and one or more scanners, and a software suite to meet initial anticipated demands for multimedia creation and editing. if funding is scarce, you may want to investigate what existing equipment could be used in support of a pilot project.
• location—this needs some space of its own, accessible to students and monitored by staff. although the editing workstation could be in an area with other computers, a quiet area is needed for shooting video so that there will not be interference from noise and unwanted foot traffic through the shots.
• staffing and training—a multimedia lab is not a good candidate for self-service. librarians and staff who will provide the service need to learn how to use the equipment and software. make sure that they all have an acceptable level of competence and confidence so that the library can shine with its new service, but expect that everyone will need to continue to learn and grow in their proficiency. if your library plans to produce its own multimedia sessions as well, it would be a good investment to attend a class on television or video production.
• hours—how many hours per week will the new service be available? if it is the entire time the library is open, be prepared to train plenty of staff. repeat users will need less help as their skills increase (by the way, some of these students can be great work-study employees).
• instruction—plan to offer formal orientation and instruction sessions to faculty and their classes. if your lab is small, this is challenging, but it can be accomplished with some creativity. for example, a general instruction session on concepts can be done in a classroom, followed up by a series of small groups working by appointment for the applied-learning component in the multimedia lab. the author and a colleague have even done instruction outside the library using laptops and cameras, creating a de facto mobile studio.
• copyright—if there are already vcrs or photocopiers in the library, you have had to deal with this issue. the university of texas—pan american library does not allow people to use its lab to copy movies, which is a request that surely will come to you, and we post the usual copyright notices just as we do at our photocopiers. for some excellent information on copyright, visit the american library association web site (www.ala.org).6
• evaluation—plan on at least basic evaluation of the service. this can include an assessment of the effectiveness of the instruction sessions, a survey of satisfaction with the lab itself, a questionnaire on the intended uses of the multimedia projects, demographic data on the students, or other student input. logs of the number of uses and peak-demand periods are extremely useful for planning and for justifying further expenditures and staffing requests.
• flexibility for the future—whatever you do in a pilot phase, always keep in mind that you want to keep an open mind—you are trying to learn from the experience so that you can make good decisions for the direction of this new service. it may not go exactly the way you originally thought, because of serendipity, or changes in technology, or very strong demand from some segments of the campus instead of others, or other environmental factors.
conclusion
benefits to the library from the multimedia lab are many. one of the most important benefits is that it keeps the library involved in the process of academic communication, as the medium of the communication changes with technology. by being involved in this evolving medium at its early stages, the library is poised to pounce on opportunities to employ it to the benefit of the library in instruction and content delivery. the library also would position itself on campus as a key player in it and the leading local expert in the growing field of multimedia. since multimedia is a tool that crosses the entire range of subject disciplines on campus, it opens the doors of faculty to collaborate with librarians in exciting new ways. just as many campuses already have learning and collaborative communities that grew around their web courseware or gis endeavors, so too can one develop around multimedia. the appendix offers a list of multimedia web sites to consider. libraries are more than warehouses of books and periodicals. as more and more of our resources have been made available electronically, and indeed more of higher education has moved to electronic delivery, many libraries have been faced with declining gate counts, circulations, and reference statistics. as someone observed, we are victims of our own success. so what is the role of the library?
we are intrinsically involved in the process of instruction, academic research, and communication. as kling observed, "one important strategic idea is that libraries configure their it services and activities to emphasize the distinctive expertise of their librarians rather than simply concentrate on the size and character of the documentary collection."7 it is imperative therefore that libraries pick out the new trends that will allow them to excel by capitalizing on their traditional strengths.
references
1. scala, inc., multimedia directory, accessed apr. 21, 2004, www.scala.com/multimedia/multimedia-definition.html.
2. nielsen/netratings as of june 2004, accessed aug. 10, 2004, www.websiteoptimization.com/.
3. about.com, dvt101, accessed apr. 15, 2004, http://desktopvideo.about.com/library/weekly/aa040703a.htm; "anatomy of a video editing workstation," pc magazine, accessed apr. 16, 2004, www.pcmag.com/article2/0,1759,1264650,00.asp.
4. college of dupage, "joe janes and colleagues: preparing for the future of digital reference," a satellite broadcast from the college of dupage, 16 apr. 2004.
5. sandra kerka, creativity in adulthood (columbus, ohio: eric clearinghouse on adult career and vocational education, eric digest no. 204, ed429186, 1999).
6. american library association, "copyright issues, primer on the digital millennium," accessed may 10, 2004, www.ala.org/ala/washoff/woissues/copyrightb/dmca/dmcprimer.pdf.
7. rob kling, "the internet and the strategic reconfiguration of libraries," library administration & management 15, no. 3 (summer 2001): 144–51.
appendix. for further reading: a multimedia web-site tour
the following is a sampling of some of the most popular and interesting multimedia software, with examples of completed productions. this is not an official endorsement of any one product over another, whether listed here or not. a look at these sites will, however, give the reader an idea about the power and possibilities of multimedia communications.
adobe (www.adobe.com): the well-known makers of some of the most powerful and popular editing software packages for graphics and video.
camtasia (www.camtasia.com): easy to use, this is a good example of the type of software that does screen capture and recording, which is handy for producing online tutorials.
clipstream (www.clipstream.com): an excellent example of the type of newer encoding software that achieves incredible compression of video and delivers it over the web with no viewer or plug-ins required for the user.
finalcut pro (www.apple.com/finalcutpro): a perennial favorite among the mac crowd, this software is relatively easy to learn and lets the developer achieve dramatic results.
flashants (www.flashants.com): a handy program that converts flash animation into .avi video format so that you can integrate animated sequences into a video production.
macromedia (www.macromedia.com): the makers of flash and director, which are some of the most popular graphics, animation, and multimedia editing tools in the business.
pinnacle (www.pinnaclesys.com): what finalcut pro is to the mac, this package is for the pc environment. easy to use, yet sophisticated in the results achieved.
rovion (www.rovion.com): rovion bluestream is an encoder that enables the creation of avatar characters to appear live on your web page.
a plug-in is required for the user, but this approach definitely gets attention.
serious magic (www.seriousmagic.com): an award-winning software package that allows you to turn a workstation into a studio, complete with teleprompter capability, sound effects, graphics, and editing.
university of texas—pan american library (www.lib.panam.edu/libinfo/media.asp): links to multimedia projects at the author's institution, including productions made by staff and students.
technical communications
isad/solinet to sponsor institute
"networks and networking ii: the present and potential" is the theme of an isad institute to be held at the braniff place hotel on february 27-28, 1975, in new orleans. the sponsors are the information science and automation division of ala and the southeastern library network (solinet). this second institute on networking will be an extension of the previous one held in new orleans a year ago. the ground covered in that previous institute will be the point of departure for "networks ii." the purpose of the previous institute was to review the options available in networking, to provide a framework for identifying problems, and to suggest evaluation strategies to aid in choosing alternative systems. while the topics covered in the previous institute will be briefly reviewed in this one, some speakers will take different approaches to the subject of networking, while other speakers will discuss totally new aspects. in addition to the papers given and the resultant questions and answers from the floor, a period of round table discussions will be held during which the speakers can be questioned on a person-to-person basis. a new feature to isad institutes now being planned will be the presence of vendors' exhibits. arrangements are being made with the many vendors and manufacturers whose services are applicable to networking to exhibit their products and systems. it is hoped that many of them will be interested in responding to this opportunity. the program will include:
"a systems approach to selection of alternatives": resource sharing, components, communications options, planning strategy. joseph a. rosenthal, university of california, berkeley.
"state of the nation": review of current developments and an evaluation. brett butler, butler associates.
"the library of congress, marc, and future developments." henriette d. avram, library of congress.
"data bases, standards and data conversions": existing data bases, characteristics, standardization, problems. john f. knapp, richard abel & co.
"user products": possibilities for product creation, the role of user products. maurice freedman, new york public library.
"on-line technology": hardware and software considerations, library requirements, standards, cost considerations of alternatives. philip long, state university of new york, albany.
"publishers' view of networks": copyright, effect on publishers, effect on authorship, impact on jobbers, facsimile transmission. carol nemeyer, association of american publishers.
"national library of canada": current and anticipated developments, cooperative plans in canada, international cooperation. rodney duchesne, national library of canada.
"administrative, legal, financial, organizational and political considerations": actual and potential problems, organizational options, financial commitment, governance. fred kilgour, oclc.
registration will be $75.00 to members of ala and staff members of solinet institutions, $90.00 to nonmembers, and $10.00 to library school students.
for hotel reservation information and registration blanks, contact donald p. hammer, isad, american library association, 50 e. huron st., chicago, il 60611; 312-944-6780.
regional projects and activities
indiana cooperative library services authority
the first official meeting of the board of directors of the indiana cooperative library services authority (incolsa) was held june 4, 1974, at the indiana state library in indianapolis. a direct outgrowth of the cooperative bibliographic center for indiana libraries (cobicil) feasibility study project sponsored by the indiana state library and directed by mrs. barbara evans markuson, incolsa has been organized as an independent not-for-profit organization "to encourage the development and improvement of all types of library service." to date, contracts have been signed by sixty-one public, thirteen academic, fourteen school, and five special libraries, a total of ninety-three libraries. incolsa is being funded initially by a three-year establishment grant from the u.s. office of education, library services and construction act (lsca) title i funds. officers are: president, harold baker, head of library systems development, indiana state university; vice-president, dr. michael buckland, assistant director for technical services, purdue university libraries; secretary, mary hartzler, head of catalog division, indiana state library; treasurer, mary bishop, director of the crawfordsville book processing center; three directors-at-large, phil hamilton, director of the kokomo public library; edward a. howard, director of the evansville-vanderburgh county public library; and sena kautz, director of media services, duneland school corporation.
stanford's ballots on-line files publicly available through spires
september 16, 1974. the stanford university libraries automated technical processing system, ballots (bibliographic automation of large library operations using a time-sharing system), has been in operation for twenty-two months and supports the acquisition and cataloging of nearly 90 percent of all materials processed. important components of the ballots operations are several on-line files accessible through an unusually powerful set of indexes. currently available are: a file of library of congress marc data starting from january 1, 1972 (with a gap from may to august 1972); an in-process file of individual items being purchased by stanford; an on-line catalog (the catalog data file) of all items cataloged through the system, whether copy was derived from library of congress marc data, was input from non-marc cataloging copy, or resulted from stanford's own original cataloging efforts; and a file of see, see also, and explanatory references (the reference file) to the catalog data file. in addition, during september and october 1974, the 85,000 bibliographic and holdings records (already in machine-readable form on magnetic tape) representing the entire j. henry meyer memorial undergraduate library were converted to on-line meyer catalog data and meyer reference files in ballots. these files are publicly available through spires (stanford public information retrieval system) to any person with a terminal that can dial up the stanford center for information processing's academic computer services computer (an ibm 360 model 67) and who has a valid computer account.
the marc file can be searched through the following index points: lc card number; personal name; corporate/conference name; and title. the in-process, catalog data, and reference files for stanford and for meyer can also be searched as spires public subfiles through the following index points: ballots unique record identification number; personal name; corporate/conference name; title; subject heading (catalog data and reference file records only); call number (catalog data and reference file records only); and lc card number. the title and corporate/conference name indexes are word indexes; this means that each word is indexed individually. search requests may draw on more than one index at a time by using the logical operators "and," "or," and "and not" to combine index values sought. if you plan to use spires to search these files, or if you would like more information, a publication called guide to ballots files may be ordered by writing to: editor, library computing services, s.c.i.p.-willow, stanford university, stanford, ca 94305. this document contains complete information about the ballots files and data elements, how to open an account number, and how to use spires to search ballots files. a list of ballots publications and prices is also available on request. as additional libraries create on-line files using ballots in a network environment, these files will also be available. these additions will be announced in jola technical communications.
data base news
interchange of aip and ei data bases
a national science foundation grant (gn-42062) for $128,700 has been awarded to the american institute of physics (aip), in cooperation with engineering index (ei), for a project entitled "interchange of data bases." the grant became effective on may 1, 1974, for a period of fifteen months. the project is intended to develop methods by which ei and aip can reduce their input costs by eliminating duplication of intellectual effort and processing. through sharing of the resources of the two organizations and an interchange of their respective data bases, aip and ei expect to improve the utilization of these computer-readable data bases. the basic requirement for the development of the interchange capability for computer-readable data bases is the establishment of a compatible set of data elements. each organization has unique data elements in its data base. it will therefore be necessary to determine which of the data elements are absolutely essential to each organization's services, which elements can be modified, and what other elements must be added. after the list of data elements has been established, it will be possible to write the specifications and programs for format conversions from aip to ei tape format and vice versa. simultaneously, there will be the development of language conversion facilities between ei's indexing vocabulary and aip's physics and astronomy classification scheme (pacs). it is also planned to investigate the possibility of establishing a computer program which can convert aip's indexing to ei's terms and vice versa. with the accomplishment of the above tasks, it will be possible to create new services and repackage existing services to satisfy the information demands in areas of mutual interest to engineers and physicists, such as acoustics and optics.
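the announcement does not describe the actual tape layouts or data elements involved, but the core of such an interchange is a field-by-field mapping between the two record formats. the following is a minimal, purely hypothetical python sketch of that idea; every field name in it is invented for illustration and does not come from the aip or ei specifications:

# hypothetical field-level mapping between two bibliographic record layouts.
# none of these element names are taken from the actual aip or ei formats.
aip_to_ei = {
    "ti": "title",        # article title carried over unchanged
    "au": "authors",      # author names
    "jn": "journal",      # journal name
    "pacs": "subject",    # classification code mapped into an indexing-term field
}

def convert_aip_to_ei(record):
    """copy the elements both organizations need, renaming as required;
    elements with no equivalent in the target format are simply dropped."""
    return {ei_name: record[aip_name]
            for aip_name, ei_name in aip_to_ei.items()
            if aip_name in record}

sample = {"ti": "optical waveguides", "au": ["smith, j."], "jn": "applied optics", "pacs": "42.82"}
print(convert_aip_to_ei(sample))

a real conversion would also need the vocabulary mapping the project describes (ei indexing terms to and from pacs codes), which is a table-lookup problem of the same general shape.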
eric data base users conference
the educational resource information center (eric) held an eric data base users conference in conjunction with the 37th annual meeting of the american society for information science (asis) in atlanta, georgia, october 13-17, 1974. the eric data base users conference provided a forum for present and potential eric users to discuss common problems and concerns as well as interact with other components of the eric network: central eric, the eric processing and reference facility, eric clearinghouse personnel, and information dissemination centers. although attendees have in the past been primarily oriented toward machine use of the eric files, all patterns of usage were represented at this conference, from manual users of printed indexes to operators of national on-line retrieval systems. a number of invited papers were presented dealing with subjects such as:
• the current state and future directions of educational information dissemination. sam rosenfeld (nie), lee burchinal (nsf).
• what services, systems, and data bases are available? marvin gechman (information general), harvey marron (nie).
• the roles of libraries and industry, respectively, in disseminating educational information. richard de gennaro (university of pennsylvania), paul zurkowski (information industry association).
several organizations (national library of canada, university of georgia, wisconsin state department of education) were invited to participate in "show and tell" sessions to describe in detail how they are using the eric system and data base. a status report covering eric on-line services for educators was presented by dr. carlos cuadra (system development corporation) and dr. roger summit (lockheed). interactive discussion groups covered a number of subjects including:
• computer techniques: programming methods, use of utilities, file maintenance, search system selection, installation, and operation.
• serving the end user of educational information.
• introduction to the eric system: what tools, systems, and services are available and how are they used?
• beginning and advanced sessions on computer searching the eric files. online terminals were used to demonstrate and explain use of machine capabilities.
commercial services and developments
scope data inc. ala train compatible terminal printers
scope data inc. currently is offering a high-speed, nonimpact terminal printer for use in various interactive printing applications. capability can be included in the series 200 printer as an extra-cost feature to print the eight-bit ascii character set or the ala character set with 176 characters. for further information contact alan g. smith, director of marketing, scope data inc., 3728 silver star rd., orlando, fl 32808.
institute for scientific information puts life sciences data base on-line through system development corporation
the institute for scientific information (isi) has announced that it will collaborate with system development corporation (sdc) to provide on-line, interactive, computer searches of the life sciences journal literature. scheduled to be fully operational by july 1, 1974, the isi-sdc service is called scisearch® and is designed to give quick, easy, and economical access to a large life sciences literature file.
stressing ease of access, the sdc retrieval program, orbit, permits subscribers to conduct extremely rapid literature searches through two-way communications terminals located in their own facilities. after examining the preliminary results of their inquiries, searchers are able to further refine their questions to make them broader or narrower. this dialog between the searcher and the computer (located in sdc's headquarters in santa monica, california) is conducted with simple english-language statements. because this system is tied in to a nationwide communications network, most subscribers will be able to link their terminals to the computer through the equivalent of a local phone call. covering every editorial item from about 1,100 of the world's most important life sciences journals, the service will initially offer a searchable file of over 400,000 items published between april 1972 and the present. each month approximately 16,000 new items will be added until the average size of the file totals about one-half million items and represents two-and-one-half years of coverage. to assure subscribers maximum retrieval effectiveness when dealing with this massive amount of information, the data base can be searched in several ways. included are searches by keywords, word stems, word phrases, authors, and organizations. one of the search techniques utilized, citation searching, is an exclusive feature of the isi data base. for every item retrieved through a search, subscribers can receive a complete bibliographic description that includes all authors, journal citation, full title, a language indicator, a code for the type of item (article, note, review, etc.), an isi accession number, and all the cited references contained in the retrieved article. the accession number is used to order full-text copies of relevant items through isi's original article tear sheet service (oats®). this ability to provide copies of every item in the data base distinguishes the isi service from many others.
current library of congress catalog on-line for reference searches
information dynamics corporation (idc) has agreed to collaborate with system development corporation (sdc) to provide reference librarians, researchers, and scholars with on-line interactive computer searches of all library materials being cataloged by the library of congress. scheduled to be fully operational as of october 1, 1974, the sdc-idc service is called sdc-idc/libcon and is designed to give quick, easy, and economical access to a large portion of the world's scholarly library materials. as in the isi service described above, the data base can be searched in several ways. included are compound logic searches by keywords, word stems, word phrases, authors, organizations, and subject headings for most english materials. one of the search techniques utilized, string searching, is an exclusive feature of sdc's orbit system. keyword searching of cataloged items, including all foreign materials processed by the library of congress, is an exclusive feature of the idc data base not currently available in other on-line marc files. for individual items retrieved through a search, subscribers can receive a bibliographic description that includes authors, full title, an idc accession number, the lc classification number, and publisher information.
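neither announcement explains how these index files are organized internally, but the search features they describe (keyword, author, and citation look-ups combined with compound logic) all rest on the same inverted-index idea. the following is a small, hypothetical python sketch of that idea; the records, titles, and cited references in it are invented for illustration:

# tiny, hypothetical inverted-index sketch: keyword and cited-reference look-up,
# with results combined by "and" / "or" / "and not" as set operations.
records = {
    1: {"title": "laser cooling of atoms", "cites": ["smith 1970"]},
    2: {"title": "cooling towers in power plants", "cites": ["jones 1968"]},
    3: {"title": "atoms and molecules", "cites": ["smith 1970", "jones 1968"]},
}

keyword_index, citation_index = {}, {}
for rid, rec in records.items():
    for word in rec["title"].split():
        keyword_index.setdefault(word, set()).add(rid)   # word index: each word indexed individually
    for ref in rec["cites"]:
        citation_index.setdefault(ref, set()).add(rid)   # citation index: who cites a given reference

def keyword(word):
    return keyword_index.get(word, set())

def cited(ref):
    return citation_index.get(ref, set())

print(keyword("cooling") & keyword("atoms"))     # "and"     -> {1}
print(keyword("cooling") | keyword("atoms"))     # "or"      -> {1, 2, 3}
print(cited("smith 1970") - keyword("cooling"))  # "and not" -> {3}

production systems of the period layered much more on top of this (word stems, phrase searching, string searching), but the combination of per-index posting lists with set operations is the common core.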
standards
the isad committee on technical standards for library automation invites your participation in the standards game
editor's note: the tesla reactor ballot will be provided in forthcoming issues. to use, photocopy the ballot form, fill out, and mail to: john c. kountz, associate for library automation, office of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036.
the procedure
this procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ala) standards proposals to provide recommendations to ala's representatives to existing, recognized standards organizations. to enter the procedure for an initiative standards proposal you must complete an "initiative standards proposal" using the outline which follows:
initiative standard proposal outline. the following outline is designed to facilitate review by both the committee and the membership of initiative standards proposals and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable). note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced, on 8 1/2" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number.
i. title of initiative standard proposal (title).
ii. initiator information (forward). a. name. b. title. c. organization. d. address. e. city, state, zip. f. telephone: area code, number, extension.
iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process.
iv. purpose. state the purpose of the standard proposal (scope and qualifications).
v. description. briefly describe the standard proposal (specification of the standard).
vi. relationship of other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks).
vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification).
viii. specifications (optional). specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required, in addition to text exposition where applicable (specifications of the standard).
kindly note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review.
in addition, the outline permits the presentation of background and descriptive information which, while important during any evaluation, is a prerequisite to the development of a standard.
tesla reactor ballot (form fields):
identification number for standing requirement.
reactor information: name; title; organization; address; city; state; zip; telephone (area code, number, extension).
need (for this standard): for / against.
specification (as presented in this requirement): for / against.
can you participate in the development of this standard? no / yes.
reason for position (use format of proposal; additional pages can be used if required).
the reactor ballot is to be used by members to voice their recommendations relative to initiative standards proposals. the reactor ballot permits both "for" and "against" votes to be explained, permitting the capture of additional information which is necessary to document and communicate formal standards proposals to standards organizations outside of the american library association. as you, the members, use the outline to present your standards proposals, tesla will publish them in jola-tc and solicit membership reaction via the reactor ballot. throughout the process tesla will ensure that standards proposals are drawn to the attention of the applicable american library association division or committee. thus, internal review usually will proceed concurrently with membership review. from the review and the reactor ballot tesla will prepare a "majority recommendation" and a "minority report" on each standards proposal. the majority recommendation and minority report so developed will then be transmitted to the originator, and to the official american library association representative on the appropriate standards organization, where it should prove a source of guidance as official votes are cast. in addition, the status of each standards proposal will be reported by tesla in jola-tc via the standards scoreboard. the committee (tesla) itself will be nonpartisan with regard to the proposals handled by it. however, the committee does reserve the right to reject proposals which after review are not found to relate to library automation.
input to the editor
we have been asked by the members of the ala interdivisional committee on representation in machine readable form of bibliographic information (marbi) to respond to your editorial in the june 1974 issue of the journal of library automation. this editorial dealt with the council of library resources' [sic] involvement in a wide range of projects, ranging from the sponsorship of a group which is attempting to develop a subset of marc for use in inter-library exchange of bibliographic data (cembi), to management of a project which has as its goal the creation of a national serials data base (conser), and, more recently, to the convening of a conference of library and a&i organizations to discuss the outlook for comprehensive national bibliographic control. you raised several legitimate questions: 1) has sufficient publicity been given to these activities of the council so that all, not just a few, libraries are aware of what is happening and have an opportunity to exert an influence on developments? and, 2) is the council bypassing existing channels of operation and communication?
you also suggest that proposals from groups such as cembi be channeled through an official ala committee such as marbi for intensive review and evaluation. it should be pointed out that marbi is not charged with the development of standards. it acts to monitor and review proposals affecting the format and content of machine readable bibliographic data, where that data has implications for national or international use. this applies to proposals emanating from cembi and conser as well as from other concerned groups. all indications to date are that the council is fully aware of marbi's role and will not bypass marbi. a number of members of marbi are also members of cembi, and marbi is represented on the conser project. also reassuring is the fact that, unless we allow lc to fall by the wayside in its role as the primary creator and distributor of machine readable data, any standards for format or content developed by a council-sponsored group will eventually be reflected in the marc records distributed by lc. the library of congress has issued a statement, published in the june 1974 issue of jola, to the effect that it will not implement any changes in the marc distribution system which are not acceptable to marbi. marbi and lc have worked out a procedure whereby all proposed changes to marc are submitted to marbi. they are then published in jola and distributed to members of the marc users discussion group for comments. comments are collected and evaluated by marbi and a report submitted to lc, with its recommendations. the marbi review process does not guarantee perfection and there is no assurance that everyone will be satisfied. compromise and expediency are the name of the game in this extremely complicated and uncharted area of standards for machine readable bibliographic data. however, the council has undoubtedly learned from the isbd(m) experience that it cannot make decisions which affect libraries without the greatest possible involvement of librarians. it is the feeling of the marbi committee members that the council intends to work with marbi in future projects which fall into marbi's area of concern. velma veneziano, marbi past chairperson; ruth tighe, chairperson.
editor's note: it is gratifying to note that marbi's response reflects the opinions expressed in the june 1974 editorial. the library community will doubtless be pleased to learn of clr's intention to work closely with marbi.-skm
to the editor: as briefly discussed with you, your editorial in the june 1974 issue of jola is both admirable and disturbing (to me, at least). the problem of national leadership in the area of library automation is a critical problem indeed. being in the "boondocks" and far removed from the scene of action, i can only express to you my perception as events and activities filter through to me. i can remember as far back as 1957 when adi had a series of meetings in washington, d.c., trying to establish a national program for bibliographic automation. i have been through eighteen years of meetings, committees, conferences, etc., concerned with trying to develop a national plan for bibliographic automation and information storage and retrieval systems. i have worked with nsf, usoe, department of commerce, u.s. patent office, engineering and technical societies, dod agencies, the entire spectrum. i spent a good many years working in adi and asis, sla, and most recently ala.
at no time were we able to make significant progress towards a national system. even the great airlie house conference did not produce any significant changes in the fragmented, competitive "non-system." it has only been in the recent past, since clr has taken an aggressive posture, that i am able to see the beginning of orderly development of a national automated bibliographic system. i certainly agree that any topic as critical as those being discussed by cembi should be in the public domain, but i also believe that the progress made by cembi would not have been possible without clr taking the initiative in getting these key agencies together. thank goodness someone quit talking and started doing something at the national level! i sincerely believe that in the absence of a national library and with the current lack of legally derived authority in this arena, clr provides a genuine service to the total library community in establishing cembi. hopefully, your very excellent article (in the same issue of jola) on "standards for library automation . . ." will help to put the entire issue of bibliographic record standards into perspective. as a former chemist and corrosion engineer, i am fully aware of the absolute necessity for technical standards. i am also fully aware of the necessity of developing technical standards through the process you outlined in your article. hopefully, clr action with cembi will expedite this laborious process and help to push our profession forward into the twentieth century. since we ourselves have not been able to do it through all these years, i am personally grateful that some group such as clr took the initiative and forced us to do what we should have done years ago. maryann duggan, slice office director.
editor's note: positive action and progressive movement are, of course, desirable and are often lacking in large organizations. however, positive action without communication of this action to the affected population can only be detrimental. on issues of the complexity of those addressed by cembi and conser, review by the library community is always useful, even though action may be temporarily delayed.-skm
to the editor: on page 233 of the september issue of jola there is a report from the information industry association's micropublishing committee chairman (henry powell). he states that ". . . the committee spelled out several areas of concern to micropublishers which will be the subject of committee action. . . ." one of the concerns of the committee is that a z39 standards committee has recommended "standards covering what micropublishers can say about their products." (emphasis mine.) as chairman of the z39 standards subcommittee which is developing the advertising standard referred to, i wish to point out that there is no intention on the part of the subcommittee to tell micropublishers what they can say nor what they may say about their products. the subcommittee, which is composed of representatives from three micropublishing concerns, two librarians, and myself, has from the beginning taken the view that the purpose of the standard would be to provide guidance for micropublishers and librarians alike. we are most anxious that no one feel that the subcommittee has any intention of attempting to use the standards mechanism to tell any micropublisher how he must design his advertisements. in addition it should be noted that no ansi standard is compulsory. carl m.
spaulding, program officer, council on library resources.
president's message. thomas dowling. information technology and libraries | september 2015. doi: 10.6017/ital.v34i3.8966
fall has arrived, faster than expected (as it always does). it seems like ala annual just wrapped up in san francisco, but we're already well underway with the coming year's activities. the national forum 2015 will be here before you know it. in a fall season crowded with good technology conferences, lita forum consistently proves its value as a small, engaging, and focused meeting. technologists, strategists, and front-line librarians come together to discuss the tools they make and use to provide cutting edge library services. in addition to great lita programming, this year we're working with colleagues from llama (the library leadership and management association) to provide a set of programs focused on the natural cooperation of management and technologies in libraries. there are great preconferences on makerspaces and web analytics, keynote addresses, over 50 concurrent sessions, and a lot of networking opportunities. click on over to litaforum.org, and i hope to see you in minneapolis, november 12-15. not too long after forum, we'll be in boston for midwinter, and then annual in orlando. the program planning committee is already at work selecting the best programs for annual. next summer is also the start of lita's 50th anniversary celebrations! of course, not everything we do involves travel and in-person meetings. lita's fall schedule of webinars includes sessions on patron privacy, creative commons, personal digital archiving, and a second iteration of top technologies every librarian needs to know. on the staff side, we are happy to say that jenny levine has started as lita's new executive director. jenny comes to us from ala's it and telecommunications services department, where she is still putting in some time bringing a new version of ala connect online. jenny and the governing board are already working together virtually: we are about to select our emerging leaders for the year and are working on an exercise to set divisional priorities, with an eye toward drafting a new strategic plan. the board will hold two online meetings this fall. as always, these are open meetings, so if you're interested in your association's governance, you're welcome to sit in. watch the board's area in connect for details, or look for upcoming posts to lita-l and litablog.org. and if you need to contact the board, you can reach us at http://www.ala.org/lita/about/board/contact. i hope to meet as many lita members as possible this year, at one of the upcoming in-person meetings or online, or just drop me a line on connect. it's going to be a great year for lita. thomas dowling (dowlintp@wfu.edu) is lita president 2015-16 and director of technologies, z. smith reynolds library, wake forest university, winston-salem, north carolina.
fulfill your digital preservation goals with a budget studio. yongli zhou. information technology and libraries | march 2016.
abstract
to fulfill digital preservation goals, many institutions use high-end scanners for in-house scanning of historical print and oversize materials. however, high-end scanner prices do not fit in many small institutions' budgets. as digital single-lens reflex (dslr) camera technologies advance and camera prices drop quickly, a budget photography studio can help to achieve institutions' preservation goals. this paper compares images delivered by a high-end overhead scanner and a consumer-level dslr camera, discusses pros and cons of using each method, demonstrates how to set up a cost-efficient shooting studio, and presents a budget estimate for a studio.
introduction
colorado state university libraries (csul) are regularly engaged in a variety of digitization projects. materials for some projects are digitized in-house, while items from selected projects are sometimes outsourced. most fragile materials that require professional handling are digitized in-house using an expensive overhead scanner. however, the overhead scanner has been occasionally unstable since it was purchased, and this has delayed some of our digitization projects. as digital photography technologies advance, image quality delivered by digital single-lens reflex (dslr) cameras is improving, and camera prices have lowered to an affordable level. in this paper, i will compare images produced by a scanner and a camera side-by-side, list pros and cons of using each method, illustrate how to establish a shooting studio, and present a budget estimate for that studio.
literature review
there are many online guidelines and manuals for digitizing print materials. some universities and museums have information about their digitization equipment online. most articles focus on either high-end scanners or customized scanning stations. these articles are very helpful for universities and museums that are relatively well funded. however, there is almost no literature discussing how to use inexpensive digital cameras and photography equipment to produce high-quality digitized images. this article will use a case study to show that a low-budget studio can produce high-quality digitized images.
comparison of scanned and photographed images
the test camera set was chosen because it was the one the author used for general purposes. the camera was also chosen by many professional photographers because of its quality and affordability. to avoid dispute, the overhead scanner's make and model are not revealed.
yongli zhou (yongli.zhou@colostate.edu) is digital repositories librarian, colorado state university libraries, fort collins, colorado.
test equipment
budget studio: nikon d800; nikon af micro-nikkor 60mm f/2.8d lens; manfrotto 055cxpro3 3-section carbon fiber tripod legs; really right stuff bh-40 lr ii ballhead; nonreflective glass; book cradles; x-rite original colorchecker card; natural daylight. total cost: $4,500 and no maintenance fees (priced in 2014).
overhead scanner: our overhead scanner; nonreflective glass; book cradles. purchase price: $55,000 (purchased in 2007); $8,000 annual maintenance (2013 price).
table 1. test equipment.
focus and sharpness
a quality digitized image needs to have a good focus.
a well-focused image shows details better and can produce better optical character recognition (ocr) results for text-based documents. at csul, we have no control over the automatic focus on our overhead scanner and have noticed that sometimes one page is sharply focused but the next page is slightly out-of-focus. during the scanning process, our overhead scanner does not indicate if a shot is focused or not. a dslr camera can beep or display a flashing dot on the viewfinder when in focus.
illustration
the following two figures compare images produced by our test dslr and overhead scanner. both images were originals and have not been enhanced by software. in addition to this image, we tested nine other illustrations. following our comparison study, we concluded that a semiprofessional dslr camera produces sharper images than our expensive overhead scanner. in figure 1, at 100 percent zoom, the left image has a better focus, contains more details, and has colors closer to the original. the left image was taken using a nikon d800 + nikkor 60mm macro lens and under natural lighting. the right image was produced by our overhead scanner. in figure 2, at 200 percent zoom, the left image (taken using the dslr) shows much more detail than the image on the right (taken with the overhead scanner).
figure 1. comparative images from dslr (left) and overhead scanner (right), at 100 percent zoom. image from samuel m. janney, the life of william penn; with selections from his correspondence and auto-biography (philadelphia: hogan perkins & co, 1852), plate between pages 296 and 297.
figure 2. comparative images from dslr (left) and overhead scanner (right), at 200 percent zoom. image from samuel m. janney, the life of william penn; with selections from his correspondence and auto-biography (philadelphia: hogan perkins & co, 1852), frontispiece, print.
at csul, the process of digitizing a text document includes scanning pages, converting them into portable document format (pdf) files, and applying an ocr process. in general, a well-focused image of text produces better ocr results, although software such as adobe acrobat can tolerate fuzzy images and produce reasonably accurate ocr text. our ocr tests from a slightly out-of-focus image and a well-focused image show no significant difference; however, from preservation and usability standpoints, we prefer well-focused images.
figure 3. the left image was produced by our test dslr camera and has a better focus. the right image was produced by our overhead scanner. samuel m. janney, the life of william penn; with selections from his correspondence and auto-biography (philadelphia: hogan perkins & co, 1852), 300, print.
figure 4. we ran the ocr process on the above two images. the top image was produced by our test dslr camera and the bottom image was produced by our overhead scanner. samuel m.
yet, "he was not a man of stron sense." " on one or two points of high importance, he bad notions more correct than were, in his day, common, even arnong men of e1~larged minds, and he had the rare good fortune of being able to carry his theories into practice without any compromise." yet, "he was not a man of strong sense." table 2. ocr results comparison these test results are very close because of the forgiveness of the adobe acrobat software. however, we have seen that for some other pages, a better-focused image generates improved ocr results. photograph a 6.5 inches by 4.5 inches silver print was used for this test. our tests show that the test dslr camera produced a sharper image of this historic photograph. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 31 figure 5. tested 6.5 inches by 4.5 inches photograph. the red square indicates the enlarged area for figure 6. historical photograph from colorado state university archives and special collections. figure 6. screen view at 100 percent zoom of a silver print. the top image was produced by the test dslr camera and the bottom one was produced by our overhead scanner. historical photograph from colorado state university archives and special collections. oversize materials for oversized materials, overhead scanners and dslr cameras have their drawbacks, so we do not think either option is ideal for them. our library uses a map scanner to scan oversize maps and posters. however, a map scanner is expensive and may not fit many libraries’ budgets. a map scanner also is not suitable for fragile maps or posters. our overhead scanner’s maximum scanning area is 24 inches by 17 inches, and the test map’s size is 25 inches by 26 inches. we had to scan the map in four sections and stitch them together using adobe photoshop. each section image has a files size of 313 mb. because of large file sizes, the stitching process is extremely slow. also stitching images is not recommended because there are always some degrees of mismatching errors created by lens distortion. a camera can capture any material size, but the details of the photographed images diminish as the material’s size increases. the photo of the entire map taken by our test dlsr has a file size of 35.8 mb. the image produced by camera has a lower resolution and less detail. information technology and libraries | march 2016 32 figure 7. oversized materials screen view at 100 percent zoom. the top image was photographed by the test dslr. the bottom image was scanned by our overhead scanner. historical map from colorado state university archives and special collections. small prints one big advantage of a dslr camera is that it can be set farther away to take pictures of oversized materials or very close to smaller objects to take close-up pictures. comparatively, the distance of lens and scanning platform on our overhead scanner is fixed, so no close-up images can be produced, and everything is reproduced at scale of 1:1. for the following example, we used a 5.5 inches by 3.5 inches drawing as our test subject. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 33 figure 8. a 5.5 inches by 3.5 inches fine drawing. a historical booklet from colorado state university archives and special collections. figure 9. small prints screen view at 100 percent zoom. the left image is produced by a dslr with a macro lens and the right image was scanned by our overhead scanner. 
a historical booklet from colorado state university archives and special collections. information technology and libraries | march 2016 34 the image produced by our overhead scanner has a resolution of 3,427 pixels by 2,103 pixels. the camera produces a 6,776 pixels by 4,240 pixels image. the higher pixel count allows users to see more details at the same zoom level. the image produced by camera is not only sharper but also contains more details. it also is good for making enlarged prints for promotion materials. for smaller maps, a dslr camera also produces superior images. for the following sample, we tested a 15 inches by 9.5 inches map. figure 10. a 15 inches by 9.5 inches map. the blue square indicates the enlarged area for figure 11. historical map from colorado state university archives and special collections. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 35 figure 11. small map screen views at 100 percent zoom. the left image was photographed by a dslr camera with a macro lens and the right image was produced by our overhead scanner. historical map from colorado state university archives and special collections. post-processing use of a sharpening filter our tests showed that a main drawback of our overhead scanner is that images produced are outof-focus. some digitization guidelines recommend minor post-processing for delivered images files to improve image quality. one might argue that to fix our overhead scanner’s out-of-focus problem, sharpening can be applied. technical guidelines for digitizing cultural heritage materials: creation of raster image master files recommends doing minor post-scan adjustment to optimize image quality and bring all images to a common rendition.1 this is good advice, but it is not applicable in real-world practice. to get the best result, each image would need to be evaluated and have a sharpening filter applied separately because when an improper sharpening setting is applied to an image, it often creates haloing artifacts and an unnatural look. the application of a sharpening filter to each image process will be extremely time-consuming. the haloing artifact is also called chromatic aberration (ca) effect. ca appears as unsightly color fringes near high contrast edges. chromatic aberrations are typically only visible when viewing the image on-screen at higher zoom levels or on large prints. information technology and libraries | march 2016 36 the following example shows that the ca may not appear at lower zoom levels, such as 50 percent or 100 percent. the left image has no sharpening filter applied and the right image has a sharpening filter applied. at 100 percent zoom, chromatic aberration is almost not identifiable, and the right image appears to be superior in turns of sharpness. figure 12. sharpening filter comparison sample at 100 percent zoom. the left image has no sharpening filter applied and the right image has been applied a sharpening filter. historical map from colorado state university archives and special collections. at a higher zoom level, we see ca, visible in the right image of figure 13. the extra colors are introduced by the software. figure 13. comparison of sharpening filter applied to images and at 500 percent zoom. the left image has no sharpening filter applied and the right image has sharpening filter applied. historical map from colorado state university archives and special collections. 
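for readers who want to see how quickly halo artifacts can appear, a minimal sketch using pillow's unsharp mask filter is shown below. the file names and filter settings are placeholders, not the settings used in our tests, and, as the following paragraph recommends, this kind of filtering is better avoided on master images.

# Minimal sketch of batch sharpening with Pillow.
# File names and filter settings are illustrative only.
from PIL import Image, ImageFilter

def sharpen(path_in, path_out, radius=2, percent=150, threshold=3):
    """Apply an unsharp-mask filter and save the result.

    Aggressive radius/percent values are what produce the haloing
    (fringing along high-contrast edges) described above, which is why
    a single global setting rarely suits every image.
    """
    img = Image.open(path_in)
    img.filter(ImageFilter.UnsharpMask(radius=radius,
                                       percent=percent,
                                       threshold=threshold)).save(path_out)

# Example: a conservative pass on one scan; each result would still need
# review at high zoom for halo artifacts.
# sharpen("scan_0001.tif", "scan_0001_sharpened.tif", radius=1, percent=80)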
fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 37 we recommend not applying sharpening filters to original scanned images; instead, attempt to obtain well-focused images from the beginning. for this reason, the test dslr camera outperformed our overhead scanner for most materials. color balance have you seen a scanned color image or color photograph with colors very different from the original image? for example, a white area appears to be bluish, or it has an orange cast? when scanning or photographing an image under different lighting, the output image can have very different colors. in the following figure, the left image was shot at a correct white balance (wb) setting. wb is the process of removing unrealistic color casts so that objects that appear white in person are rendered white in your photo.2 the center image has a blue color cast, which was caused by a lower kelvin setting, and the right image was shot at a higher kelvin setting. a camera may create images with the wrong colors, but so will a scanner if it is not calibrated correctly. figure 14. images shot under different white balance settings. we pay an $8,000 annual service fee for overhead scanner maintenance, which includes scanner color calibration. in general, image colors rendered by the machine are close to original colors but not exact. we have noticed that some images have a very light green overcast and other others are overly yellow; sometimes images appear to be darker than they should be. because we are not certified to calibrate the overhead scanner, we only use the prescribed settings set by technicians. also, we have no control over maintaining a fading light bulb, which will affect correct exposure. wb adjustment on photographs taken in a studio can be very precise. most dslr contains a variety of preset white balances. in general, auto wb works well, but does not deliver the best results. custom wb allows fine-tuning of colors. if a shooting studio is set up properly, the lighting should be consistent, so ideally one setting found most desirable can be used repeatedly. however, professional photographers do test shots at the beginning of each shooting session. once they find information technology and libraries | march 2016 38 the optimal test shot, they will use the exact settings for the batch. later, they will do minor color adjustment on the chosen test shot to ensure precise color representation, and then apply the adjustment settings on all other photos of the same batch. because many small variations can be present for each shooting session, they do not use the settings from the previous shooting. it may seem arduous to do test shots for each shooting, but it ensures accurate color reproduction. many professional photographers use colorchecker passport,3 which is a commercial product to help with quick and easy capture of accurate colors. i will demonstrate briefly a useful trick i learned from a professional photography seminar how to utilize colorchecker passport to apply correct white balance a group of images. 4 step 1: place an 18 percent gray card or a colorchecker passport card on top of a page. choose the correct exposure and take the photo. use the same exposure setting to take additional photos. for demonstration purposes, we deliberately used a very low and high kelvin setting for sample images. the low kelvin setting created cool and blue tones and the high kelvin setting created a tone that was too warm. 
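for readers curious what the gray-card click in step 2 below does conceptually, the sketch that follows scales the color channels so that a patch known to be neutral comes out with equal red, green, and blue values. this is a rough numpy approximation for illustration only, not lightroom's actual algorithm; the function name and array layout are assumptions.

# Illustrative only: a simplified neutral-patch ("gray card") correction.
# Real raw converters work in camera-specific color spaces with more
# sophisticated models; this just shows the basic idea.
import numpy as np

def white_balance_from_patch(image, patch_box):
    """Scale R, G, B so the selected patch becomes neutral (R = G = B).

    image      -- float array of shape (H, W, 3), values 0.0-1.0
    patch_box  -- (top, bottom, left, right) pixel coordinates of a gray area
    """
    top, bottom, left, right = patch_box
    patch_mean = image[top:bottom, left:right].reshape(-1, 3).mean(axis=0)
    gains = patch_mean.mean() / patch_mean   # per-channel multipliers
    return np.clip(image * gains, 0.0, 1.0)

# The same gains can then be reused for every frame shot under the same
# lights, which is the point of synchronizing settings across a batch.
# corrected = [white_balance_from_patch(f, (100, 140, 200, 240)) for f in frames]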
note that the test shot with colorchecker board was not taken with exactly the correct white balance setting. figure 15. sample images for white balance adjustment. rocky mountain collegian 3–4 (1893), 118, colorado state university archives and special collections. step 2: in adobe lightroom, select the test target image and switch to “develop” mode. select the white balance tool, move the cursor over a gray area, try to find a spot where the red, green, and blue (rgb) values are close. if you can find a place with equal rgb values, it will be ideal. this simple click will set the test image’s white balance to an almost perfect setting. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 39 figure 16. applying a white balance in adobe lightroom 4 step 3. synchronize other images’ settings with the target image. select the target image and all other images, click the sync button, and select settings you would like to synchronize. make sure the wb button is checked. figure 17. synchronize settings in adobe lightroom 4 information technology and libraries | march 2016 40 figure 18. synchronized images with correct white balance. rocky mountain collegian 3–4 (1893), 118, colorado state university archives and special collections. recently, i had the opportunity to visit the spencer museum of art’s digitization lab. they have a different workflow to ensure even more scientifically correct colors. if you are interested in their approach, you can contact their information technology manager or photographer. color space one very important thing to understand is color space when you use a dslr camera. many dslr cameras support adobe rgb and srgb. srgb reflects the characteristics of the average cathode ray tube (crt) display. this standard space is endorsed by many hardware and software manufacturers, and it is becoming the default color space for many scanners, low-end printers, and software applications. it is the ideal space for web work but not recommended for prepress work because of its limited color gamut. adobe rgb (1998) was designed to encompass most of the colors achievable on cmyk printers, but only by using rgb primary colors on a device such as your computer display.5 it is recommended to use this color space if you need to do print production work with a broad range of colors. many scanning vendors deliver images in adobe rgb color space. prophoto rgb contains all colors that are in adobe rgb, and adobe rgb contains nearly every color that is in srgb. this color space covers more colors than the human eye can see. it can only be used for images in raw format and in 16-bit mode. common file formats that support 16-bit images are tiff and psd. most printers do not support 16-bit format. this color space normally is used by photographers who have a specific workflow and who print on specific high-end inkjet printers. when converting from 16-bit to 8-bit, some images will have banding or posterization problems. banding is a digital imaging artifact. a picture with banding problem shows horizontal or vertical lines. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 41 figure 19. an example of colour banding, visible in the sky in this thotograph.6 posterization of an image entails conversion of a continuous gradation of tone to several regions of fewer tones, with abrupt changes from one tone to another.7 figure 20. 
an example of posterization.8 while it is a good idea to capture images using adobe rgb to preserve a wide range of colors, you should convert images to srgb when delivering to unknown users and displaying on the web. currently, srgb is the only appropriate choice for images uploaded to the web, since most web browsers don’t support any color management. adobe rgb images that are uploaded to websites without conversion to srgb generally appear dark and muted.9 if they were printed on printers that do not support adobe rgb format, colors will be dull too. setting up a budget studio commercial approach bookdrive pro is a commercially available digitization unit. it uses two digital cameras and built-in flash lights. it may be the optimal solution for your projects, but it also may not fit your library’s information technology and libraries | march 2016 42 budget. the unit also is not suitable for oversized material such as large maps and posters. for more information about this product, please visit http://pro.atiz.com/. sample budget studio setup a digitization lab can have three rooms or areas, one for oversized materials, one for smaller prints or 3-d objects, and one for computers. the area for shooting oversized materials should have black walls and floor. you can either use one flash light to bounce light off the ceiling or use two flash lights to shine lights directly onto the materials. for fragile materials, the first approach is more appropriate. the area for shooting smaller prints or 3-d objects should have a stable table and black or white background paper. for this room or area, black walls and floor are not required. for shooting equipment, i will use the set chosen by the photographer from the university of kansas spencer museum of art as my example. 
item name: sample item; price; purchasing url

• dslr camera: nikon d810; $2,996.95; http://www.bhphotovideo.com/c/search?atclk=camera+model_nikon+d810&ci=6222&n=4288586280+3907353607
• macro lens: nikon af micro-nikkor 60mm f/2.8d lens; $429.00; http://www.bhphotovideo.com/c/product/66987-grey/nikon_1987_af_micro_nikkor_60mm_f_2_8d.html
• heavy-duty mono stand: arkay 6jrcw mono stand jr with counterweight, 6'; $678.50; http://www.bhphotovideo.com/c/product/2727-reg/arkay_605138_6jrcw_mono_stand_jr.html
• strobe: broncolor g2 pulso, 1600 watt/second focusing lamphead with 16' cord; $3,053.68; http://www.bhphotovideo.com/c/product/259745-reg/broncolor_32_115_07_g2_pulso_with_16.html
• power pack: broncolor senso a4 2,400w/s power pack; $3,629.92; http://www.bhphotovideo.com/c/product/745060-reg/broncolor_31_051_07_senso_a4_2_400w_s_power.html
• reflector: broncolor p65 reflector, 65 degrees, 11" diameter, for broncolor pulso 8, twin and hmi; $513.52; http://www.bhphotovideo.com/c/product/7162-reg/broncolor_33_106_00_p65_reflector_65_degrees.html
• reflector: broncolor softlight reflector, 20" diameter, for broncolor primo, pulso 2/4 & hmi heads; $501.76; http://www.bhphotovideo.com/c/product/7167-reg/broncolor_33_110_00_softlight_reflector_20_for.html
• light stand: impact air-cushioned light stand; $44.99; http://www.bhphotovideo.com/c/product/253067-reg/impact_ls10ab_air_cushioned_light_stand.html
• light meter: sekonic l-308s flashmate, digital incident, reflected, and flash light meter; $199.00; http://www.bhphotovideo.com/c/product/368226-reg/sekonic_401_309_l_308s_flashmate_light_meter.html
• book cradle: book exhibition cradles; $30.00; http://www.universityproducts.com/cart.php?m=product_list&c=1115&primary=1&parentid=1271&navtree%5b%5d=1115
• background paper: savage seamless background paper (both white and black); $45.00 x 2 = $90.00; http://www.bhphotovideo.com/c/product/45468-reg/savage_1_12_107_x_12yds_background.html
• nonreflective glass: 1/4" optiwhite starphire purified tempered single lite clear glass; $75.00; can be purchased at a local glass store
• white balancing accessory: x-rite original colorchecker card; $69.00; http://www.bhphotovideo.com/c/product/465286-reg/x_rite_msccc_original_colorchecker_card.html
• software: adobe lightroom 5; $150.00; http://www.adobe.com/products/photoshop-lightroom.html

table 3. list of items needed to prepare a budget studio

the total cost for a "budget" shooting studio ranges from $10,000 to $15,000, and there is no annual maintenance expense.
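as a quick sanity check on that range, the line items in table 3 can be totaled. the figures below are simply the prices listed above, assuming one of each item (background paper counted twice, as listed).

# Quick check of the Table 3 line items against the stated $10,000-$15,000 range.
items = {
    "Nikon D810 body":                2996.95,
    "AF Micro-Nikkor 60mm f/2.8D":     429.00,
    "Arkay 6JRCW mono stand":          678.50,
    "Broncolor G2 Pulso head":        3053.68,
    "Broncolor Senso A4 power pack":  3629.92,
    "P65 reflector":                   513.52,
    "Softlight reflector":             501.76,
    "Impact light stand":               44.99,
    "Sekonic L-308S light meter":      199.00,
    "Book exhibition cradles":          30.00,
    "Seamless background paper (x2)":   90.00,
    "Nonreflective glass":              75.00,
    "X-Rite ColorChecker card":         69.00,
    "Adobe Lightroom 5":               150.00,
}

total = sum(items.values())
print(f"itemized total: ${total:,.2f}")   # roughly $12,461, inside the stated range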
figure 21. the university of kansas spencer museum of art digitization lab setup for oversized materials

figure 22. steelworks museum of industry and culture's digitization lab setup for oversized materials

figure 23. the university of kansas spencer museum of art digitization lab setup for smaller prints and 3-d objects

figure 24. steelworks center of the west's digitization lab setup for 3-d objects

functions of some elements in the sample shooting studio

1. macro lens: it allows close-up shooting of objects. it is especially useful when photographing small prints and small 3-d objects. it can also be used to photograph regular and oversized materials.
2. heavy-duty mono stand: it replaces a traditional tripod. it is very stable and allows quick adjustment of camera height and location.
3. strobe, power pack, and reflector: together they generate consistent and homogeneous light distribution. recommended further reading: "introduction to off-camera flash: three main choices in strobe lighting."10
4. light stand: it holds the strobe and reflector.
5. light meter: hand-held exposure meters measure light falling onto a light-sensitive cell and convert it into a reading that enables the correct shutter speed and/or lens aperture settings to be made.11
6. book cradles: they help to minimize the stress on bookbindings and minimize page curvature problems.
7. nonreflective glass: it helps to flatten a photographed page and reduce reflection. however, it does not completely eliminate glass reflection. one very useful trick to reduce glass reflection is to place a black board with a hole above a page and shoot through the hole. this approach does not actually eliminate the reflection; rather, the glass reflects the black board, so when the photograph is reviewed on a computer it appears as if no reflection occurred.

figure 25. the university of kansas spencer museum of art digitization lab setup for materials that need to be pressed down by glass.

many librarians believe that digitizing print materials using a digital camera requires a professional photographer, but this is not necessarily true. a professional photographer or even an art student can act as a consultant to help set up a shooting studio and provide basic training. also, many museums have professional photographers and have set up shooting studios for digitization. they are very willing to share their experience and even provide training. i believe the learning curve for operating a shooting studio is no greater than the learning curve for operating an overhead scanner and its software.

pros and cons

no digitization equipment or system is perfect. they all have trade-offs in image quality, speed, convenience of use, quality of accompanying software, and cost. our tests show that for most archival materials a dslr camera will do a better job than an overhead scanner.

pros of overhead scanner

• the scanner is a complete scanning station. it can be connected to a computer and scanning can start immediately.
materials can be placed on the scanning surface, so no equipment adjustments are required while scanning. • it can scan and save images in bitmap format directly, while a dslr camera can only shoot in grayscale or color. • built-in book cradles help to scan thick books and those that cannot be fully opened. • book curve correction functionality is provided by the accompanying software. cons of overhead scanner • high cost. the overhead scanner we have cost more than $50,000, with an annual maintenance contract of $8,000. • high replacement cost. when a scanner is outdated or broken, the entire machine has to be replaced. • instability. our overhead scanner is unstable even when placed on a sturdy table and handled only by professionals. from april 2010 to october 2010, the scanner was down for a total of forty-two working days (sixty calendar days). the company fixed the machine onsite many times, but it continues to have minor problems and has not been completely reliable. • the autofocus feature does not work consistently. • special training is needed to operate the machine and associated software. • file formats supported are limited. most scanners only support tiff, jpeg, jpeg 2000, windows bmp, and png. • unsupported outdated software: our overhead scanner’s software can only be run on an older operating system (windows xp) because there is no updated software for this model. pros of budget studio • stable. under normal use dslr cameras are much less likely to break down than scanners. for example, i have had an older dslr, nikon d200, for seven years. it has survived numerous backpacking trips, multiple drops, and extreme weather conditions. the camera still functions as needed. • fast and accurate focus. dslr cameras are designed to focus quickly, and their focus indicators provide instant feedback to the operators so they know that the image is focused. if operated properly, images delivered by dslr cameras can be sharper than ones delivered by scanners. • less expensive. a good quality dslr camera and a lens can be purchased for fewer than $4,000 and last for years. as technologies advance, dslr cameras’ prices will continue to drop. • ability to save files in more formats. in addition to tiff and jpeg formats, most dslr cameras can save photos in raw file format. some cameras can directly save images in digital negative (dng) format, and others deliver images in proprietary formats that can be information technology and libraries | march 2016 48 converted into dng using a computer program. editing raw images is nondestructive, while editing of tiff and jpeg images is irreversible. • accurate wb and exposure. by using right shooting and post-processing techniques, photographs can have exact color reproduction. on the other hand, calibrating an overhead scanner most likely can only be performed by a company’s trained technician. proper exposure and wb are not guaranteed. • the raw file format usually provides more dynamic range. overexposed and underexposed images can be fixed by adjusting exposure compensation via software; thus lost shadow or highlight detail can be restored. • can photograph 3-d objects. archival collections often have materials other than books, such as art pieces. these materials are better to be photographed than scanned. • versatile. cameras can perform on-site digitization, while overhead scanners are too bulky to be moved around. • faster and better preview. 
images can be viewed instantly on a computer when proper software, such as adobe lightroom, is used. operators can compare multiple shoots on a screen side-by-side and decide which photo to retain. • more accessible technical support. the number of dslr camera users is much higher than overhead scanner users. technical questions can often be answered through online forums. • easy to find replacement parts. when a piece in a shooting studio break down, it is easy to find replacement piece and replace by staff. • easy software updates. software used in a studio is independent from equipment. cons of budget studio • there is learning curve for setting up a shooting studio, operating the studio, and mastering new image processing techniques. • a dslr camera with a lower pixel setting will not be sufficient for scanning large-format materials, such as posters and maps. • no built-in book curve correction is provided by adobe photoshop or lightroom. however, our experience proves that the automatic book curve function does not always work well. we normally use a home-made book cradle to help lay a page flat and use one or two weights to hold down the other side of book. for some books, if flatness is hard to achieve, we place a piece of glass on the top to ensure the flatness. • security concern: since a dslr camera is highly portable, it can be stolen easily. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 49 figure 26. scanning setup using a book cradle. conclusion the technology of dslr cameras has advanced very quickly in the past ten years. newer dslr cameras can handle higher resolutions and have very little image noise even at a high iso setting. the higher demand for dslr cameras and accompanying image-editing software results in more rapid technology advances compared to low-demand and high-end overhead scanners. high consumer demand drives dslr camera prices much lower than prices for overhead scanners. in addition, the wide range of consumers purchasing dslr cameras and software prompts companies to offer more user-friendly interfaces. as you can see from our tests, for most library materials a dslr camera can produce superior images. if you do not have a budget for high-end overhead scanners, you can still fulfill your digitization preservation goals with a budget studio. acknowledgement i would like to thank robert hickerson and ryan waggoner, the university of kansas spencer museum of art, tim hawkins, and steelworks center of the west for showing their digitization labs and sharing experience with me. references 1. federal agencies digitization guidelines initiative, “technical guidelines for digitizing cultural heritage material: creation of raster image master files,” august 2010, http://www.digitizationguidelines.gov/guidelines/digitize-technical.html 2. “tutorials: white balance,” cambridge in colour, accessed march 9, 2016, http://www.cambridgeincolour.com/tutorials/white-balance.htm. http://www.cambridgeincolour.com/tutorials/white-balance.htm information technology and libraries | march 2016 50 3. “colorchecker passport user manual,” x-rite incorporated, accessed march 9, 2016, http://www.xrite.com/documents/manuals/en/colorcheckerpassport_user_manual_en.pdf. 4. scott kelby, “scott kelby's editing essentials: how to develop your photos,” pearson education, peachpit, accessed march 9, 2016, http://www.peachpit.com/articles/article.aspx?p=2117243&seqnum=3. 5. “srgb vs. 
adobe rgb 1998," cambridge in colour, accessed march 9, 2016, http://www.cambridgeincolour.com/tutorials/srgb-adobergb1998.htm. 6. "colour banding," wikipedia, accessed march 9, 2016, http://en.wikipedia.org/wiki/colour_banding. 7. "posterization," wikipedia, accessed march 9, 2016, http://en.wikipedia.org/wiki/posterization. 8. "image posterization," cambridge in colour, accessed march 9, 2016, http://www.cambridgeincolour.com/tutorials/posterization.htm. 9. richard anderson and peter krogh, "color space and color profiles," american society of media photographers, accessed march 9, 2016, http://dpbestflow.org/color/color-space-and-color-profiles. 10. tony roslund, "introduction to off-camera flash: three main choices in strobe lighting," fstoppers (blog), accessed march 9, 2016, https://fstoppers.com/originals/introduction-camera-flash-three-main-choices-strobe-lighting-40364. 11. "introduction to light meters," b & h foto & electronics corp., accessed march 9, 2016, http://www.bhphotovideo.com/find/product_resources/lightmeters1.jsp.


journal of library automation vol. 4/1 march, 1971

recon pilot project: a progress report, april-september 1970

henriette d. avram and lenore s. maruyama: marc development office, library of congress, washington, d.c.

a synopsis of the third progress report on the recon pilot project submitted by the library of congress to the council on library resources. an overview is given of the progress made from april through september 1970 in the following areas: recon production, format recognition, research titles, microfilming, and investigation of input devices. in addition, the status of the tasks assigned to the recon working task force is briefly described.

introduction

the recon pilot project was established in august 1969 to test various techniques for retrospective conversion in an operational environment and to convert a useful body of records into machine readable form. it is being supported with funds from the council on library resources, the u.s. office of education, and the library of congress. this article summarizes the third progress report of the pilot project submitted by the library of congress to the council and addresses all aspects of the project, regardless of the source of funding, in order to present a meaningful document.
two previous articles in the journal of library automation summarized the first and second progress reports, respectively ( 1), ( 2). this article describes the activities occurring april through september 1970. progress-april through september 1970 recon production at the present time, the recon data base contains approximately 20,000 records. it appears that the original estimates on the number of titles to be input during the recon pilot project were considerably higher than the actual number found to be eligible. this situation occurred because of the following circumstances: recon pilot project/ avram and maruyama 39 1) the original estimates were derived from the number of english language monographs cataloged during 1968 and 1969. since the marc distribution service began in march 1969, it was felt that the number of titles eligible for recon in the 1969 and 7-series of card numbers would be equal to the number cataloged during january-march 1969. in actuality, the titles cataloged during this period were primarily records with 1968 card numbers. 2) the estimate of records with 1968 card numbers was higher because it was thought that many more of these titles had been through the cataloging system than were actually processed prior to the beginning of the marc distribution service. instead of being included in recon, these records have been input into the marc distribution service. in order to obtain 85,000 records for conversion, several alternatives, including the conversion of english language monographs in the 1967 card series, are being studied. format recognition format recognition is a technique that will allow the computer to process unedited catalog records by examining data strings for certain keywords, significant punctuation, and other clues to determine the proper content designators. this technique should eliminate substantial portions of the manual editing process and, if successful, should represent a considerable savings in the cost of creating machine readable records. the logical design for format recognition has been completed, and the manual simulation to test the efficiency of the algorithms was described in an earlier article ( 3). completion date for the programs is expected in february 1971. the programs were designed in several modules so that they could be adapted for different input procedures without disturbing the logic. once the programs have been implemented, tests may show that certain fields should be pretagged because the error rate is too high or the occurrence of the field is too low to justify the processing time. the complete logical design for format recognition has been published as a separate report by the american library association ( 4). as part of a manual simulation to test the format recognition algorithms, one hundred fifty records for english language monographs were typed on an mt/st, a typewriter-to-magnetic tape device. the mt/st hardcopy output was used as the raw data for the simulation. the results of the test were analyzed for possible changes to the algorithms, keyword lists, or input specifications. then the records with the content designators assigned by the format recognition algorithms were retyped and processed by the existing marc system programs. proofsheets were produced and given to the recon editors for proofing, a process to verify content designators and bibliographic information. 40 journal of library automation vol. 
4/1 march, 1971 each editor proofed all of the format recognition records; their hourly numbers of records proofed were as follows: highest, 9.3; lowest, 5.3; average, 6.8. the average number of current marc records edited and proofed in an hour is 4.8. when format recognition is implemented, present workflow-editing, typing, computer processing, proofing-will be replaced by a new onetyping, format recognition, proofing. in comparing production rates in the two systems, time needed to proof format recognition records must be compared against time needed to edit and proof in the current system. several factors should be considered when evaluating this portion of the simulation experiment. although all the records chosen for the test were of english language monographs, they were generally more difficult than those encountered in a normal day's work for both editors and typists. in addition, numerous errors were made by the human simulators, such as omission of subfield codes, delimiters, or fixed field codes. format recognition does appear to have reduced the amount of time spent in the combined editing and proofing process, but the success of the program depends heavily on the following factors: 1) extensive training for the input typists with greater emphasis placed on their role in this project; and 2) extensive training for the editors to alert them to kinds of errors the format recognition programs might make. proofing time for the test was greater than anticipated. with fewer errors from the typing input and the elimination of human errors from the simulation, it is possible that the proofing rate will be higher under actual work conditions. editors might reach an average of 9.3 records proofed, or double the number presently done in a combined editing/ proofing process. two programs are being written to support the format recognition project. format recognition test data generation (fortgen) will provide test data for format recognition by stripping marc records of delimiters, indicators, and subfield codes, and reformatting the data to be identical with the product from the initial input program. thus, a large quantity of high quality test data can be provided without additional keystroking. the keyword list maintenance program ( klmp) maintains approximately sixty keyword lists used by the format recognition program in processing bibliographic data. these lists are maintained as a separate data set on a 2314 disk pack. the actual lists themselves, alon~ with associated control data, are referred to as "keyword list structures. ' the general function of klmp is to read the entire set of keyword list structures from the file on disk, modify them as specified by parameter cards to klmp, and write a new file on disk. the individual actions performed by klmp are as follows: 1) create a list; 2) remove a list; 3) add a keyword; 4) delete a keyword; 5) augment a table (translation tables to recon pilot project/ avram and maruyama 41 generate codes such as geographic area code, language, place of publication); and 6) list structures (printout of all or selected portions of a list). since the keyword lists will be dynamic in nature, this program provides the flexibility required to change or update them without recataloging the entire format recognition program. new lists will be added as format recognition is extended to other languages, and keywords will be added to or deleted from existing lists as experience is gained in the use of format recognition. 
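to make the keyword-list idea behind format recognition concrete, a toy sketch follows: an untagged data string is examined for keywords and punctuation and assigned a guessed field. the field names, keyword lists, and rules below are invented for illustration only; the actual library of congress algorithms were far more extensive and are described in the published design (4).

# Toy illustration of format-recognition-style tagging: guess a field for an
# untagged string from keywords and punctuation. Keyword lists and rules are
# invented for illustration, not the LC algorithms.
import re

COLLATION = re.compile(r"\d+\s*p\.")             # e.g. "xii, 418 p. 26 cm."
SERIES_KEYWORDS = ("series", "library", "studies")
IMPRINT_KEYWORDS = ("press", "& co", "printed for")

def guess_field(text):
    lowered = text.lower()
    if COLLATION.search(lowered):
        return "collation"
    if lowered.startswith("(") and any(k in lowered for k in SERIES_KEYWORDS):
        return "series statement"
    if any(k in lowered for k in IMPRINT_KEYWORDS) and re.search(r"\b1[6-9]\d\d\b", text):
        return "imprint"
    return "unidentified"                         # would be flagged for an editor

print(guess_field("Philadelphia, Hogan Perkins & Co., 1852."))   # imprint
print(guess_field("xii, 418 p. 26 cm."))                          # collation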
research titles since the production operations of the recon pilot project have been limited to english language monographs in the 1968, 1969, or 7 -series of card numbers, it was recognized that many problems concerning retrospective records would not be revealed in the conversion of relatively current titles. for this reason, a project to identify and analyze 5,000 research titles was included as part of the pilot project. these research titles would consist of records for older english language monographs and foreign language monographs in roman alphabets and would be studied for problems in the following areas: 1) earlier cataloging rules which caused certain elements to be omitted from the record or transcribed in a different style; 2) different printed card formats which placed elements in different locations; 3) difficulty in working with foreign languages when converting records to machine readable form; 4) problems arising from shared cataloging records; and 5) problems arising when expanding the format recognition algorithms to cover these kinds of records. the selection of these records was described in an earlier article ( 5). the initial analysis of the research titles has been completed, and a few of the problems encountered are listed as follows: 1) ellipses at the beginning of a title field ( • . . dictionnaire-manuelillustre des ecrivains et des litteratures) were used frequently on older cataloging records. since they are no longer prescribed by the present cataloging rules unless they appear on the title page at the beginning of a title, it was recommended that such ellipses be deleted from the machine record because they would affect the format recognition algorithms. 2) card numbers without digits representing the year (f-3144) were assigned during 1901. generally, these numbers appear with an alphabetic prefix representing the language of the publication or the classification number. it has been recommended that such numbers be revised to read "f01-3144" for the machine record. 3) records cataloged under the 1908 a. l.a. catalog rules included in the series statement such information as the editor of the series or the location of the series statement (half-title: everyman's library, ed. by ernest rhys. reference). it has been recommended that such information he deleted from the machine record. 4) an asterisk preceding personal name added entries (i. 0 spence, 42 journal of library automation vol. 4/1 march, 1971 lewis, 1874joint author.) indicated that the name had appeared in a fuller form at an earlier date; if this name were used as the main entry, there would have been a corresponding full name note at the bottom of the catalog card. it has been decided that this asterisk will be deleted from the machine record. 5) the national bibliographies from which shared cataloging copy is derived use punctuation conventions which differ from the aa rules. for example, the west german bibliography uses parentheses to indicate that the data are not on the title page, brackets to indicate the data are not in the publication, and angled brackets to indicate that the data are enclosed in parentheses on the title page ( <22.-27. mai 1967>. koln ([-ehrenfeld] bundesinstitut fur ostwissenschaftliche und internationale studien) 1967). such conventions would affect the expansion of the format recognition algorithms to foreign languages. this is an area in which the standard bibliographic description would be of great value. 
6) in the marc ii format, each place of publication is a separate subfield so that when each place is connected by hyphens (milano-romanapoli ... ,), there would be a problem in inputting the data and having the data printed out in the same fashion. it has been recommended that each place of publication be separated with a comma instead of a hyphen (and the ellipsis deleted from the imprint statement). 7) conjunctions have been used between places of publication on records cataloged according to the 1908 rules and on some shared cataloging copy (london, glasgow and bombay) (neuwied a. rh. u. berlin). in the machine record, each place is a separate subfield, and the presence of a conjunction means that one subfield contains non-essential data. it has been recommended that conjunctions be omitted from the machine record and that places of publication be separated by commas. 8) the a. l.a. cataloging rules for author and title entries states that with certain well-known persons, dates of birth and death can be omitted when the heading is followed by a subject subdivision ( 1. shakespeare, william-language-glossaries, etc.). since the rules provide a list of such persons, it has been recommended that when such names are used as subject headings, they should include dates of birth and death in the machine record. 9) a collation statement like the following ( 25 p., 27-204 p. of ill us., 205-232 p., 233-236 p. of illus., 237-247 p. 28 em.) would cause the format recognition algorithms some difficulty in identifying the proper subfields. this is another area in which the adoption of a standard bibliographic description would aid format recognition programs. 10) both east and west german bibliographies give information about illustrations in the title paragraph rather than in the collation (title paragraph: [mit] 147 abbildungen und 71 tabellen. collation: xii, 418 p. 26 em.). the cataloging policy at the library has been revised so that recon pilot pro]ect/avram and maruyama 43 on current cataloging records information about illustrations is also repeated in the collation. it has been recommended that for retrospective records the data should be input as it appears on the catalog card. in this example, the machine record would not contain illustration information in the collation. 11) the method of transcribing non-lc subject headings has been changed in recent years, and the marc ii format reflects this change. in previous years, the following conventions were used: subscript brackets enclosed headings or portions of headings that were not the same as the lc form; subscript parentheses enclosed portions of headings that were the lc form but not the contributing library's; if two headings had the same number, the lc form was listed first; if both forms of the heading were the same, there would be only one number, and the heading itself would not have the subscript brackets or parentheses. it has been recommended that either the non-lc forms be deleted from the machine record or the transcription of such subject headings be revised to follow the current practice. 12) nlm subject hearings have different capitalization conventions from those used by lc, and the geographic subject subdivisions are often in a form different from that which the library of congress uses ( [dnlm: 1. public health administration-u.s.s.r. w6 p3]). 
in analyzing these research titles in terms of possible problems with format recognition, it was discovered that nlm subject headings would be incorrectly identified for the above reasons. format recognition depends heavily on capitalization and keyword lists; in this example, the heading "public health administration" would be identified as a corporate name because of the capitalization. examination of the research titles showed the similarity of the cataloging of the older records (pre-1949) and the current foreign language records based on shared cataloging copy. certain stylistic conventions, such as the use of ellipses or the transcription of imprint statements, were similar for both kinds of material. it would be necessary to have a thorough knowledge of the ala catalog rules (published in 1908) in order to interpret the data on the older printed cards correctly during a conversion project. the experience of the editors in the recon production unit has been that retrospective records, even those cataloged during the last two years, require a considerable amount of interpretation in order to assign the correct content designators in the fixed fields. for pre-1949 records, the problem becomes more acute when one attempts to apply the procedures and techniques for current material to older records. it is very likely that a higher level of personnel would be required to process these records because in many instances the changes would be similar to recataloging the entire record. the expansion of format recognition to foreign languages would be extremely difficult without a greater degree of consistency in shared cataloging copy. each national bibliography, from which the cataloging copy is derived, has its own rules and style of cataloging, so that although the language of the works may be the same, e.g., german, the entries from the west german, east german, austrian, and swiss bibliographies may differ in terms of punctuation or style of cataloging. these problems have been compounded by printer's errors on the printed cards as the result of conventions that differ from the aa rules. the adoption of the standard bibliographic description (6) would be a tremendous aid in interpreting cataloging data by both humans and format recognition programs.

microfilming techniques

the library's photoduplication service is supporting the recon pilot project by providing the cost estimates for the various alternatives of microfilming techniques and providing technical guidance as required. several discussions with them confirmed that the method of filming a portion of the record set containing the subset of records to be converted first and selecting the appropriate records afterward would be more advantageous than selection prior to microfilming (7). it was considered unrealistic to attempt to project microfilming costs for the entire recon effort. because of the paper handling problems involved in the management of input worksheets, the microfilming rate should be in reasonable proportion to the actual conversion rate. there is no point in providing a huge supply of input worksheets which will not be used in actual conversion for a long time. the data may become "dated," and there may be storage and handling problems. in addition, cost estimates provided by the photoduplication service can only be expected to prevail over the next twelve months.
beyond that period, any quotation given is likely to be higher because of the general trend of rising costs. any projection of costs should be based on a manageable portion of the whole. just what this portion should consist of has yet to be determined. assuming a modus operandi as described above, there is needed a determination of the "rate floor," which is defined as the minimum number of records that must be microfilmed to achieve the maximum cost benefits resulting from a relatively high volume job. once the rate floor is determined, it should probably be translated into year equivalents, i.e., if the rate floor is 100,000 and the catalog card production is 50,000, then two years· worth of cards would be microfilmed. estimates would be obtained for the following alternatives: microfilming for ocr device specifications; microfilming for reader-printer specifications; microfilming for reader specifications; and microfilming for xerox copyflo printouts of the lc printed cards onto recon worksheets. certain ground rules were assumed for the actual microfilming process. the selected drawers of the record would be "frozen" for a day or two prior to being filmed, i.e., the file would be complete and no one would recon pilot project/avram and maruyama 45 remove cards from the file while filming was in process. the filming would take place during the day. assuming that 100,000 cards for the year 1965 would be used as a base figure and that approximately 5,000 cards per day can be filmed with a planetary camera, it would take twenty working days to film the collection of cards for one year in the record set (rate floor as defined above). all cost estimates will include quality control; i.e., quotations would indicate degree of inspection of film for technical quality and degree of preparation of the file before filming. input devices during 1969 the library of congress conducted an investigation to determine the feasibility and desirability of using a mini-computer for marc/recon input functions (original input and corrections). this study was performed with contractual support and consisted of three basic tasks: 1) analysis of present operations to determine functional requirements, to measure workloads, and to identify problem areas; 2) survey and analysis of mini-computers that are potentially capable of meeting the requirements of the present operations; 3) evaluation of available hardware and software capabilities relative to marc data preparation requirements and determination of economic feasibility based on present and projected workloads. the intent of this study was to provide a basis for future planning and procurement activities by the library of congress relative to improvement of the marc/recon man-machine interface. the survey of hardware was not intended to be all-inclusive. there were time and funding limitations, and in addition it was recognized that the mini-computer field was a rapidly expanding one; therefore, it was not possible at any cut-off point to have surveyed the totality. six firms were included in the survey, and the machines considered were the burroughs tc-500, the digital equipment corporation pdp-8/i, the honeywell pdp-516, the ibm 1800, the lnterdata model 4, and the xds sigma 3. of these, the dec pdp8/1 and the honeywell pdp-516 were determined to have the highest potential for meeting marc/recon requirements. additional analysis revealed that software availability for mini-computers is minimal. 
manufacturers covered in this investigation supplied an assembler as well as testing and editing routines. some provided a fortran, algol, or basic compiler and an operating system with foreground/background processing. systems that support fortran and the operating system are quite substantial, generally requiring 16,000 words of core, memory protect, disc, etc. the cost of this kind of system is generally a minimum of $10,000. few low-cost peripheral devices are available for use with mini-computers. high-speed tape readers, punches, and punched card readers are the most inexpensive input/output devices available. the addition of a magnetic tape unit to most systems significantly increases the overall cost. the conclusion reached as a result of this investigation was that there is no gain, either technically or economically (considering the hardware configuration of the library of congress), to using a mini-computer in performing present marc/recon functions.

another input device investigated during this reporting period was the keymatic data system model 1093, which was selected for a two-month test and evaluation period because it appeared to have the following advantages for the recording of bibliographic data: 1) this device has 256 unique codes; 2) data is recorded directly on computer compatible magnetic tape; 3) through manufacturer supplied software, the user may assign to certain keys, called expandables, the value of whole strings of characters; thus a single key would equate to a marc tag; 4) correction procedures are built into the device, i.e., the ability to delete a character, word, sentence, or entire record; and 5) the single character display screen obviates the necessity for hard copy. it is often claimed that hard-copy output is scanned by the typist unintentionally to the detriment of typing rates. the machine tested was specifically set for the library's requirements. four separate keyboards contained 184 keys, of which 103 had upper- and lower-case capability, and the remaining 81 had only a single case. the 256 possible codes were divided into the following categories: 1) 94 were used as expandables and assigned to those marc tags and data strings (correction and modification symbols) that appear most frequently; 2) 10 were used as machine function codes; 3) 150 were assigned unique values in the marc character set; and 4) 2 were left unused. the keys on the four keyboards were assigned values such that the most frequently used keys were located in a strong stroke area. the main character keyboard was designed to be closely compatible with the device currently in use at the library to lessen the training requirements for the typist. therefore, the typist had only to learn the expandable keys and some lesser used special characters. the program supplied by the manufacturer was modified for code conversion and output format acceptable to the marc system and to conform to the library's computer system assignments. the two typists selected to participate in the test were both experienced marc production typists. both typists were given individual instruction on the machine and spent three weeks practicing; at the same time, their performance was being analyzed and discussed with them. during the official evaluation period, the typists spent two weeks working full time on the machine. when the typists began their practice period, their speeds were relatively slow, 6-7 records per hour.
as time progressed, their speed increased, leveling off to approximately 11-12 records per hour by the end of the test period. each typist reported problem areas during the official evaluation. one problem was the hesitation which resulted when the typist had to determine whether to use an expandable key or actually type the data, character by character. if she chose the former, the expandable key had to be found. the number and different combination of tags caused some confusion. the opinion of both typists concerning the keyboard arrangement was that they would rather type the tags character by character than search for the expandable key. more experience on this device might eliminate this problem. the absence of hard copy was felt to cause another problem. when a typist intuitively feels that she has made an error in current marc/recon typing operations, she uses the hard copy to verify that a mistake has actually been made prior to taking corrective action. the lack of hard copy did not allow for this verification, and the typists reported that this detracted from their efficiency. the following table lists the results of the official evaluation period. the average production rate of these two typists on the mt/st is also listed. the figures for mt/st production have been calculated for a particular three-week period.

                                   typist a    typist b    total      mt/st
new records                        505         540         1045       1995
correction records                 323         278         601
verified records                   58          537         595
average records/hour (new)         10.1        14.0        12.1       14.6
average records/hour (corrected)   21.3        27.7        24.5
keystrokes, total                  238,435     259,630     498,065
expandables used                   12,280      14,646      26,926

the keymatic model used for the test rents for $768.25 per month (july 1970 pricelist). it is a fully equipped model with several options not required for the marc system. without these options, a less expensive model could be used. keymatic does have a 24-month lease plan in which the basic machine could be rented for $368.00 per month. this is an increase of $258.00 per month per machine over the current method of input. costs per record were computed for the keymatic device and for the mt/st based on the average record statistics of both typists. although the same records were not actually typed on the mt/st, extensive experience with production and error rates on that device made it valid to use average production rates for purposes of comparison. for purposes of computing the cost per record, the hourly cost per machine was calculated by dividing the cost per machine by 160 working hours. the 24-month leasing price of $368.00 per month was used for the keymatic, resulting in a machine cost per hour of $2.30. the mt/st rental cost is $110.00 per month, resulting in an hourly cost of $.69. (the cost of the mt/st listed in a previous article (8) as being $100.00 was in error.) on the basis of 12.1 records per hour on each device, the cost per record for the keymatic is $.19 and $.06 for the mt/st. in the context of the library of congress marc/recon project, the addition of a digi-data to translate mt/st output to computer compatible tape adds an incremental cost to each input device. for the purposes of this report, it was assumed that the project required five input devices. on this basis, the prorated digi-data cost per hour is $.33, which makes the total machine cost per hour for the mt/st $1.02. thus, the cost per record for the mt/st becomes $.08.
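for ease of reference, the arithmetic behind the per-record figures just given can be restated in one place; every number below is taken directly from the rates and rentals reported above:

$$
\begin{aligned}
\text{keymatic: } & \$368.00 / 160\ \text{hr} = \$2.30\ \text{per hour}; \quad \$2.30 / 12.1\ \text{records per hour} \approx \$0.19\ \text{per record} \\
\text{mt/st: } & \$110.00 / 160\ \text{hr} \approx \$0.69\ \text{per hour}; \quad \$0.69 / 12.1 \approx \$0.06\ \text{per record} \\
\text{mt/st with digi-data: } & \$0.69 + \$0.33 = \$1.02\ \text{per hour}; \quad \$1.02 / 12.1 \approx \$0.08\ \text{per record}
\end{aligned}
$$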
the results of the test indicated that the keymatic used in the library of congress environment did not substantially increase production rates or decrease error rates. thus, no savings in cost were demonstrated. the complex data to be typed and the construction and quality of the worksheets at the library of congress impose severe constraints on all machines. (the manuscript card reproduced on the marc/recon worksheet results in a source document that is difficult to work with for the following reasons: 1) loss of legibility during the copying process; 2) position of tags in relation to content; and 3) combination of typed and handwritten data as recorded by the catalogers.) in order to make a fair comparison between the keymatic and the mt/st, the manuscript card was used for the test rather than the printed card. if, on evaluation, the keymatic proved to be more efficient than the mt/st using the manuscript card, it would be even more effective if the printed card were used, since the latter is a far more legible source document. keymatic does have a new machine, model k-103, which has an 80-character visual display option which might correct one of the objections raised by the typists, i.e., lack of hard copy; however, this model requires the use of a converter as does the mt/st. this device is less expensive than the machine used in the test and may be evaluated during the recon project at a later date.

an investigation of the model 370 compuscan was continued following the initial findings reported in a previous article (9). twenty-five letterpress library of congress printed cards representing english language titles and containing no diacritical marks in the content were sent to the firm for input. this allowed the machine to be evaluated and problems noted within an "ideal" test environment. depending on these results, further testing could be performed. since existing compuscan software was used to conduct the library of congress test, the entire lc card could not be read but only that portion that contained fonts already built into the existing configuration. the printed cards were blocked out, except for the area covering the body of the entry, i.e., title through imprint, prior to microfilming for subsequent scanning. operator intervention was required on approximately 1%-25% of the characters on each card. in addition to the problems offered by variant and touching characters, fine lines in certain characters caused a misreading by the machine. this was particularly true with the letter "e" being interpreted as the letter "c." compuscan felt this problem might be resolved by increasing the size of the comparison matrix of the hardware. in some instances, a period was generated in the middle of a word due to the coarseness of the card stock that was microfilmed. initial discussions have begun on the possibility of testing a retyped version of the printed card. the only rationale behind this test would be to investigate if typing for a scanner that could read upper- and lower-case and special characters made any significant difference in speed and/or error rate compared to costs and production rates of typing for a scanner which could read only upper-case characters. the latter was described in an earlier article on recon (10).

recon working task force

the working task force continued the discussion on the implications of a national union catalog in machine-readable form.
from the postulated reporting system for a future nuc described in an earlier article (11), several items were isolated for further consideration. these included: 1) grouping of records in a register (by language, alphabet, etc.) to allow for a segmented approach to computer-produced book catalogs (a register is defined as a printed document containing the full bibliographic descriptions of works sequenced by unique identification numbers. as each record is added to the register, it is added at the end and assigned the next sequential identification number); 2) the need for additional indexes to the register by lc card number and classification number (the class number was not included in the list of data elements required for the machine-readable nuc); 3) the requirement to include the author statement in the title index versus using the main entry in all cases; and 4) clarification of subject index to mean only topical or geographic subjects. the following tasks were outlined for further consideration: 1) format of the printed nuc (graphic design and printing, size, style, typographic variation, etc.); 2) physical size of the volume depending on pattern of distribution (monthly, bimonthly, etc.); 3) input (relationship to marc input, use of format recognition, problems of languages in terms of selection for input); 4) output (cost of production for register and indexes, cost of sorting, costs of selection, etc.); 5) cumulation patterns in terms of cost and utility (number of characters in an average entry, number of items on a page, rate of increase, etc.); 6) the use of com (computer output microfilm) as an alternative to photocomposition for printed output.

work on task 3, the investigation of the possible use of existing data bases in machine readable form for a national bibliographic service, has been continued. phase 1 of this task consisted of a survey of existing machine readable data bases. selection of data bases for analysis was based on the following criteria: 1) the data base had to include monograph records. 2) any data base known to have predominantly lc marc records was excluded. 3) the data base had to be potentially available to recon (security organizations or commercial vendors might not be willing to give their files to a recon effort). 4) data bases of less than 15,000 records were excluded. a data analysis worksheet was prepared to reduce the documentation to a standardized form for each system studied in the survey. it was initially anticipated that once documentation was received from the various institutions, additional contact would be made via telephone or on-site visits. this proved to be unnecessary, as the submitted documentation was generally sufficient. since many of the formats submitted were complicated, errors could have been made in interpretation; however, this possibility was not considered important enough to affect the findings of this task. if necessary, additional information can be requested from the library systems at a later date. the analysis of the submitted documentation was difficult for the following reasons: 1) the amount of documentation ranged from extremely detailed to very sparse; 2) neither the technical nor the bibliographic terminology was consistent for all organizations; 3) in some instances, the format descriptions were more detailed with respect to control and housekeeping data fields than bibliographic data fields.
the formats were ranked according to three broad categories: low potential, medium potential, and high potential. to arrive at a ranking, the data fields of each format were compared to the marc ii format. comparison was made on the following basis: 1) present in both formats; 2) not present in local format and not capable of generation by format recognition algorithms; or 3) not present in local format but capable of generation by format recognition. the result of this analysis distributed the twenty-two institutions into the following ranked order: 1) low potential-s; 2) medium potential-s; 3) high potential-h. the figure for the number of low potential data bases is in addition to the eight out of the eleven originally rejected due to a small data base or very limited content in the record. it is significant to note that although no attempt was made at an all-inclusive survey of machine readable data bases, the total number of records in machine readable form reported by the respondents amounted to approximately 3.7 million of all types. of this figure, about 2.5 million represented monograph records. the phase 1 study included procedures required to transform a record into a certified recon record, thus outlining the areas requiring cost analysis to compare the economics of using existing files for a national bibliographic store, as opposed to original input. (certification in this context means comparing the record of the local institution to the record in the lc official catalog and, if required, making the record consistent with the lc cataloging as well as upgrading it to the bibliographic completeness of the lc record. input in this sense includes the editing of the record as well as the keying.) the results of the study, prior to any further analysis, seem to indicate that the next phases of task 3 will concentrate on a very large data base with a high degree of compatibility with marc ii (high potential) and another data base with a format differing from marc ii both in level of explicit identification and in bibliographic completeness (medium potential). the first data base tests the most favorable situation; the latter a much less favorable situation. the carry-on phases of task 3 will include: 1) a determination of a cut-off point at which a particular data base would not be included in future studies (although the composition and the format of the records in the data base might fit the selection criteria, the number of records in the file might be insufficient to warrant the costs of the hardware/software for the conversion effort); 2) investigation of the hardware and software effort involved; and 3) determination of the costs of comparing the records with the lc official catalog and the resultant updating costs to bring the records up to the level of the records in the lc machine readable marc/recon data base.

acknowledgments

the authors wish to thank the staff members associated with the recon pilot project in the marc development office, the marc editorial office, and the technical processes research office in the library of congress for their contributions to this report. the lc photoduplication service provided valuable assistance in certain phases of this project. work on the recon pilot project has continued to be supported by the council on library resources and the u.s. office of education.

references
1. avram, henriette d.: "the recon pilot project: a progress report," journal of library automation, 3 (june 1970), 102-114.
2. avram, henriette d.; guiles, kay d.; maruyama, lenore s.: "the recon pilot project: a progress report, november 1969-april 1970," journal of library automation, 3 (september 1970), 230-251.
3. ibid., p. 235.
4. u.s. library of congress. information systems office. format recognition process for marc records: a logical design. chicago: ala, 1970.
5. avram, henriette d.; guiles, kay d.; maruyama, lenore s. op. cit., p. 236.
6. ibid.
7. ibid., p. 237.
8. ibid., p. 246.
9. ibid., pp. 244-245.
10. ibid., pp. 245-248.
11. ibid., p. 248.

identifying key steps for developing mobile applications and mobile websites for libraries
devendra dilip potnis, reynard regenstreif-harms, and edwin cortez
information technology and libraries | september 2016

abstract

mobile applications and mobile websites (mamw) represent information systems that are increasingly being developed by libraries to better serve their patrons. because of a lack of in-house it skills and the knowledge necessary to develop mamw, a majority of libraries are forced to rely on external it professionals who may or may not help libraries meet patron needs but instead may deplete libraries' scarce financial resources. this paper applies a system analysis and design perspective to analyze the experience and advice shared by librarians and it professionals engaged in developing mamw. this paper identifies key steps and precautions to take while developing mamw for libraries. it also advises library and information science graduate programs to equip their students with the specific skills and knowledge needed to develop and implement mamw.

introduction

the unprecedented adoption and ongoing use of a variety of context-specific mobile technologies by diverse patron populations, the ubiquitous nature of mobile content, and the increasing demand for location-aware library services have forced libraries to "go mobile." mobile applications and mobile websites (mamw), that is, web portals running on mobile devices, represent information systems that are increasingly being developed and used by libraries to better serve their patrons. however, a majority of libraries often lack the in-house human resources necessary to develop mamw. because of a lack of staff equipped with the requisite it skills and knowledge, libraries are often forced to partner with and rely on external it professionals, potentially losing control over the process of developing mamw.1 partnerships with external it professionals do not always help libraries meet the information needs of their patrons but instead can deplete their scarce financial resources. it then becomes necessary for librarians to understand the process of developing mamw in order to evaluate mamw for better serving library patrons.

devendra dilip potnis (dpotnis@utk.edu) is associate professor, school of information sciences; reynard regenstreif-harms (reynardrh@gmail.com) is project archives technician, great smoky mountains national park, gatlinburg, tennessee; and edwin cortez (ecortez@utk.edu) is professor, school of information sciences, university of tennessee at knoxville.
one possibility is to re-educate themselves through continuing education or other professional development activities. another solution would be to see library and information science (lis) schools strengthen their curriculum in the area of management, evaluation, and application of mamw and related emerging technologies. issues, challenges, and strategies for providing librarians with these opportunities are abundant and have been debated for more than thirty years, especially since libraries started experiencing the impact of microchip and portable technologies.2 any practical and immediate guidance could help librarians in charge of developing mamw.3 however, a majority of the practical guidance available for developing mamw for libraries is limited to specific settings or patron populations. also, the practical guidance is not theoretically validated, curtailing its generalizability for diverse library settings. for instance, a number of librarians and it professionals share their experience and stories of mamw development to serve a specific patron population in a specific library setting.4,5 these accounts typically describe successes in developing mamw, the lessons learned during the development of mamw, or advice for developing mamw. this paper applies a system analysis and design perspective from the information systems discipline to examine the experience and advice shared by librarians and it professionals for identifying the key steps and precautions to be taken when developing mamw for libraries. system analysis and design, a branch of the information systems discipline, is the most widely used theoretical knowledge base available for developing information systems.6 according to the system analysis and design perspective, development, planning, analysis, design, implementation, and maintenance are the six phases of building any information system.7 the next section synthesizes our method for this secondary research. the following section discusses the key steps we identified for developing, planning, analyzing, designing, implementing, and maintaining mamw for libraries. the concluding section presents the implications of this study for libraries and lis graduate programs.

method

we began this study with a practitioner's handbook guiding libraries to use mobile technologies for delivering services to diverse patron populations.8 to search the literature relevant to our research, we devised many key phrases, including but not limited to "mobile technolog*," "mobile applications for libraries," and "mobile websites for libraries." as part of our active information-seeking process, we applied a snowball sampling technique to collect more than seventy-five scholarly research articles, handbooks, ala library technology reports, and books hosted on ebsco and information science source databases. our passive information-seeking was helped by article suggestions from emerald insight and elsevier science direct, two of the most widely used journal hosting sites, in response to the journal articles we accessed there.
we applied the following four criteria to establish the relevancy of publications to our research: accuracy of facts; duration of publications (i.e., from 2000 to 2014); credibility of authors; and content focused on problems, solutions, advice, and tips for developing mamw. several research articles published by information technology and libraries and library hi tech, two top-tier journals covering the development of mamw for libraries, built the foundation of this secondary research. we analyzed the collected literature using the qualitative data presentation and analysis method proposed by miles and huberman.9 we developed microsoft excel summary sheets to code the experience and advice shared by librarians and it professionals. the coded data was read repeatedly to identify and name patterns and themes. each relevant publication was analyzed individually and then compared across subjects to identify patterns and common categories. the inter-coder reliability between the two authors who analyzed data was 85 percent. data analysis helped us identify the key steps needed for planning, analyzing, designing, implementing, and maintaining mamw for libraries.

findings and discussion

key steps for planning mamw

forming and managing a team

building teams of people with the appropriate skills, knowledge, and experience is one of the first steps suggested by the existing literature for planning mamw. it is essential for team members to be aware of new developments and trends in the market.10 for instance, developers should be aware of print resources on relevant technologies such as apache, asp, javascript, php, ruby on rails, and python; online resources such as detectmobilebrowser.com and the w3c mobileok checker to test catalogs, design functionality, and accessibility on mobile devices; and various online communities of developers who could provide peer support when needed.11 team members are also expected to keep up with new developments in mobile devices, platforms, operating systems, digital rights management terms and conditions, and emerging standards for content formats.12 periodic delegation of various tasks could help libraries develop mamw effectively.13 libraries should also form productive, financially feasible partnerships with external stakeholders such as internet service providers and network administrators for hosting mamw on appropriate internet servers that meet desired safety and security standards.14,15

requirements gathering

requirements for developing mamw can be collected through empirical research and secondary research. typically, the goal of empirical research is to help libraries
■■ gather patron preferences for and expectations of mamw,16,17
■■ stay abreast of the continual evolution of patron needs,18
■■ periodically (e.g., quarterly, annually, biannually) gather and evaluate user needs,19
■■ index the content of mamw,20
■■ investigate the acceptance of the library's use of mamw by patrons,21 and
■■ understand user needs and identify top library services requested by patrons.
empirical research in the form of usability testing, functional validation, user surveys, etc., should be carried out before developing mamw to inform the development process and/or after developing mamw to study their adoption by library patrons.
empirical research typically involves the identification of patrons and other stakeholders who are going to be affected by mamw. this step is followed by developing data-collection instruments, collecting data from patrons and other stakeholders, and analyzing qualitative and quantitative data using appropriate techniques and software.22 secondary research mainly focuses on scanning and assessing existing literature. for instance, using appropriate datasets on mobile use, librarians may be able to identify the factors responsible for the adoption of mobile technologies.23 typically, such factors include but are not limited to cognitive, affective, social, and economic conditions of potential users. mamw developers could also scan the environment by examining existing mamw and reviewing the literature to create sets of guidelines for replacing old information systems by developing new, well-functioning mamw.24 librarians could also scan the market for free software options to conserve financial resources.25

making strategic choices

mobile applications or mobile websites?

one of the most important strategic decisions libraries need to make during this phase is whether to use a mobile app or a mobile website (that is, a web portal running on mobile devices) for offering services to patrons. mobile websites are web browser-based applications that might direct mobile users to a different set of content pages, serve a single set of content to all patrons while using different style sheets or templates reformatted for desktop or mobile browsers, or use a site transcoder (a rule-based interpreter), which resides between a website and a web client and intercepts and reformats content in real time for a mobile device.26,27 mobile apps are more challenging to build than mobile websites because they require separate and specific programming for each operating system.28 mobile apps burden users and their devices. for instance, users are expected to remember the functionality of each menu item, and a significant amount of memory is required to store and support apps on mobile devices. however, potential profitability, better mobile-device functionality, and greater exposure through app stores can make mobile apps an economical option over mobile websites.29

buy or build?

in the planning phase, libraries also need to decide whether to buy commercial, off-the-shelf (cots) mamw or build customized mamw. mamw need to be evaluated in terms of customer support and service, maintenance, the ability to meet patron needs, and library needs when making this choice.30 sometimes libraries purchase cots products and end up customizing them, benefiting from both options. for example, some libraries first purchase packaged mobile frameworks to create simple, static mobile websites and subsequently develop dynamic library apps specific to library services.31

managing scope

many libraries have limited financial resources, which makes it necessary for their staff to manage the scope of mamw development. the ability to prioritize tasks and identify mission-critical features of mamw are some of the most common activities undertaken by libraries to manage this scope.32 for instance, it is not practical to make entire library websites mobile because libraries would end up serving only those patrons who access their sites over mobile alone. instead, libraries should determine which part of the website should go mobile.
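one low-effort way to act on that scoping decision, offered here only as a minimal illustrative sketch and not as a technique prescribed by the studies cited above, is to detect small screens in the browser and send those visitors to a separate mobile-optimized homepage; the /m/ path, the 600-pixel breakpoint, and the fullsite override parameter below are hypothetical assumptions.

```javascript
// minimal sketch: route small-screen visitors to a separate mobile-optimized
// homepage instead of retrofitting the full site. the /m/ path, the 600 px
// breakpoint, and the "fullsite" override are illustrative assumptions only.
(function () {
  var onMobileSite = window.location.pathname.indexOf("/m/") === 0;
  var wantsFullSite = /(^|[?&])fullsite=1/.test(window.location.search);
  var smallScreen = window.matchMedia &&
                    window.matchMedia("(max-width: 600px)").matches;
  if (smallScreen && !onMobileSite && !wantsFullSite) {
    // keep the full site reachable for patrons who append ?fullsite=1
    window.location.replace("/m/");
  }
})();
```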
a growing trend of using products that follow mobile-first design, in which a mobile version of a website is designed first and then worked up to a larger desktop version, could help librarians better manage the scope of mamw development. alternatively, jeff wisniewski, a leading web services librarian in the united states, advises libraries to create a new mobile-optimized homepage alone, which is faster than trying to retrofit the library's existing homepage for mobile.33 this advice is highly practical because no webmaster has any interest in trying to maintain two distinct versions of the library's webpages with details such as hours of operation and contact information.

selecting the appropriate software development method

there are three key methods for developing mamw: structured methodologies (e.g., waterfall or parallel), rapid application prototyping (e.g., phased, prototyping, or throwaway prototyping), and agile development, an umbrella term used to refer to the collection of agile methodologies like crystal, dynamic systems development method, extreme programming, feature-driven development, and scrum. there is a bidirectional relationship between these mamw development methods and the resources available for their development. project resources such as funding, duration, and human resources influence and are affected by the type of software development method selected for developing mamw. however, studies rarely pay attention to this important dimension of the planning phase.34

key steps in the analysis phase

requirements analysis

after collecting data from patrons, the next natural step is to analyze the data to inform the process of conceptualizing, building, and developing mamw.35 the requirements-analysis phase helps libraries achieve user-centered design of mamw and assess the return on investment in mamw. the context and goals of the patrons using mobile devices, and the tasks they are likely and unlikely to perform on a mobile device, are the key considerations for developing user-centered mamw for library patrons.36 it is critical to gather, understand, and review user needs.37 surveys can be developed on paper or online and can be analyzed using advanced statistical techniques or qualitative software.38,39 the analysis allows the following questions to be answered: which library services do patrons use most frequently on their mobile devices? what is their level of satisfaction for using those services? what types of library services and products would they like to access with their mobile phones in the future? survey analyses can help librarians predict which mobile services patrons will find most useful;40 they can also help librarians classify users on the basis of their perceptions, experience, and habits when using mobile technologies to access library services.41 as a result, libraries can identify and prioritize functional areas for their mamw deployment.42 mamw developers can learn from their users' humbling and/or frustrating experience of using mobile devices for library services.
in addition, libraries can keep track of their patrons' positive and negative observations, their information-sharing practices, and how they create group experiences on the platform provided by their libraries.43 to improve existing mamw, libraries could also use google analytics, a free web metrics tool, for identifying the popularity of mamw features and analyzing statistics on how they are used.44 to develop operating system-specific mobile apps, google analytics can be used to learn about the popularity of mobile devices used by patrons.45 ideally, libraries should calculate and document roi before investing in the development of mamw.46 for instance, libraries can run a cost-benefit analysis on the process of developing mamw and compare various library services offered over mobile devices.47 typically the following data could help libraries run the cost-benefit analysis: specific deliverables (e.g., features of mamw), resources (e.g., resources needed, available resources, etc.), risks (e.g., types of risks, level of risks, etc.), performance requirements, and security requirements for developing mamw. this analysis would help libraries make decisions on service provisions such as specific goals to be set for developing mamw, the feasibility of introducing desired features of mamw, and how to manage available resources to meet the set goals.48 libraries should also examine what other libraries have already done to provide mobile services.49

communication/liaising with stakeholders

effective communication between developers and stakeholders influences almost every aspect of developing information systems. however, existing studies do not emphasize the significance of communication with stakeholders. for instance, several studies vaguely refer to the translation of user needs into technology requirements,50 but few studies point out the precise modeling technique (e.g., entity relationship diagrams, unified modeling language, etc.) for converting user needs into a language understood by software developers. developers should communicate best practices and suggestions for the future implementation of mamw in libraries,51 which involves the prediction and selection of appropriate mamw for libraries,52 the demonstration of what is possible and how services are relevant, and how new resources can help create value for libraries.53,54 communication with users is also critical for creating value-added services for patrons who use different mobile technologies to meet their needs related to work, leisure, commuting, etc.55 however, the existing literature on mamw development for libraries does not mention the significance of this activity.

key steps for designing mamw

prototyping

prototyping refers to the modeling or simulation of an actual information system. mamw can have paper-based or computer-based prototypes. prototyping allows developers to directly communicate with mamw users to seek their feedback. developers can correct or modify the original design of mamw until users and developers are in agreement about the system design. building consensus between mamw developers and potential users is another key challenge to overcome during this phase, which may put a financial burden on mamw development projects. it requires skilled personnel to manage the scope, time, human resources, and budget of such projects.
wireframing is one of the most prominent prototyping techniques practiced by librarians and it professionals for developing mamw for libraries.56 this technique depicts schematic on-screen blueprints of mamw, lacking style, color, or graphics, focusing mainly on functionality, behavior, and priority of content.

selecting hardware, programming languages, platforms, frameworks, and toolkits

existing literature on the development of mamw for libraries covers the selection and management of software; software development kits; scripting languages like javascript; data management and representation languages such as html and xml, and their text editors; and ajax for animations and transitions. the existing literature also guides libraries in training their staff to use mamw to better serve patrons.57 a few studies also provide guidance on selecting cots products such as webkit, an open source web browser engine that renders webpages on smartphones and allows users to view high-quality graphics on data networks with faster throughput.58 however, it might be a good idea to use licensed open source cots products because licensed software allows libraries to legally distribute software within their organizations as covered by the licensing agreement. libraries that use software-licensing agreements may also be able to seek expert help and advice whenever they have a concern or query. in the authors' experience, librarians have shared a few effective strategies for designing mamw. one key strategy is to purchase reliable device emulators and cross-compatible web editors. these technologies allow the user to work with the design at the most basic level, save documents as text, transfer the documents between web programs, and direct designers toward simple solutions.59 sample cross-compatible web editors include, but are not limited to, notetab pro (http://www.notetab.com/), codelobster (http://www.codelobster.com/), and bluefish (http://bluefish.openoffice.nl).

hybrid mobile app frameworks like bootstrap, ionic, mobile angular ui, intel xdk, appcelerator titanium, sencha, kendo ui, and phonegap use a combination of web technologies like html, css, and javascript for developing mobile-first, responsive mamw. a majority of these frameworks use a drag-and-drop approach and do not require any coding for developing mobile apps. one-click api connect further simplifies the process. user-interface frameworks like jquery mobile and topcoat eliminate the need to design user interfaces manually. importantly, mamw developed using such frameworks can support many mobile platforms and devices. toolkits like github, skyronic, crudkit, and hawhaw enable developers to quickly build mobile-friendly crud (create/read/update/delete) interfaces for php, laravel, and codeigniter apps. such mobile apps also work with mysql and other databases, allowing users to receive and process data and display information to users. table 1 categorizes specific hardware and software features recommended for mamw to better serve library patrons.
areas of information systems/it and specific features recommended for developing mamw for libraries:

1. human-computer interaction (hci)
behavioral, cognitive, motivational, and affective aspects of hci:
■■ design responsive websites for libraries to enhance user experience60
■■ design a user interface meeting the expectations and needs of potential users (e.g., a menu with the following items: library catalog, patron accounts, ask a librarian, contact information, listing of hours, etc.)61
■■ design meaningful mobile websites based on user needs, documenting and maintaining mobile websites62
usability engineering:
■■ design concise interfaces with limited links, descriptive icons, and home and parent-link icons63
■■ create a user-friendly site (e.g., the dok library concept center in delft, netherlands, offers a welcome text message to first-time visitors)64
■■ effectively transition from traditional websites to mobile-optimized sites with responsive design65
■■ create user-friendly interface designs66
■■ present a clean, easy-to-navigate mobile version of search results67
information visualization:
■■ automatically maintain reliable and stable fundamental information required by indoor localization systems68
■■ save time by redesigning existing sites69,70

2. web programming
html, xml, etc.:
■■ design sites with a complete separation of content and presentation71
■■ code html and css for better user experiences72
■■ create and shorten links to make them easier to input using small or virtual keyboards73
using client-side and server-side scripting, such as javascript object notation, etc.:
■■ design and develop mashups74
■■ develop mamw using client-server architecture, accessible on mobile devices75
without scripting:
■■ implement widgetization to facilitate the integration of mobile websites, developing a widget library for mobile-based web information systems76

3. open source
■■ design mobile websites that allow users to leverage the same open source technology as the main websites77
■■ design mobile websites linking to other existing services like libraryh3lp and library catalogs with mobile interfaces such as mobilecat78

4. networking
■■ design a mobile website capable of exploiting advancements in technology such as faster mobile data networks79
■■ identify and address technology issues (e.g., connectivity, security, speed, signal strength, etc.) faced by patrons when using mamw80

5. input/output devices
■■ use a mobile robot to determine the location of fixed rfid tags in space81
■■ design mamw capable of processing data communicated using radio frequency identification devices, near-field communication technology, and bluetooth-based technology like ibeacons82
■■ offer innovative services using augmented-reality tools83

6. databases
■■ integrate a back-end database of metadata with front-end mobile technologies84
■■ integrate the front-end of mamw with the back-end of standard databases and services85
7. social media and analytics
■■ integrate social media sites (e.g., foursquare, facebook places, gowalla, etc.) with existing checkout services for accurate and information-rich entries86
■■ implement google voice or a free text-messaging service87
■■ use google analytics for a mobile-optimized website by copying the free javascript code generated from google analytics and pasting it into library webpages to gain insight into what resources are used and who used them88
■■ integrate a geo-location feature with mobile services89

table 1. mamw with specific hardware and software features

from the above table, which is based on the analysis of the literature on developing mobile applications and mobile websites for libraries, it becomes clear that web programming and hci are the two leading technology areas that shape the development of mamw and consequently the services offered by them.

designing user interfaces of mamw

librarians and it professionals engaged in developing mamw for libraries make the following recommendations.

use two style sheets: css plays a key role in offering a uniform display of user interfaces for all webpages. studies recommend designing two style sheets, namely mobile.css and iphone.css, when developing mamw, since most of the time smartphones ignore mobile stylesheets.90 in that case, iphone.css could direct itself to browsers of a specific screen-width, helping those mobile devices that are not directed to the mobile website by the mobile.css stylesheet.91

minimize use of javascript: javascript is instrumental in detecting what mobile device is being used by patrons and then directing them to the appropriate webpage, with options including full website, simple text-based, and touch-mobile-optimized. however, it is critical to minimize the use of javascript on library mobile websites because not every smartphone offers the minimum level of support required to operate it.92

handle images intelligently: to help patrons optimize their bandwidth use, image files on mobile sites should be incorporated with css rather than html code; also, to ensure consistency in the appearance of user interfaces of mobile websites, images should be kept to the same absolute size.93

key steps for implementing mamw

programming for mamw

programming is at the heart of developing mamw. as shown in table 1 above, web programming enables developers to build mamw with a number of value-added features for patrons. for instance, a web-application server running on cold fusion can process data communicated via web browsers on mobile devices; this feature allows mamw users to access search engines on library websites via smartphones.94 also, client-side processing of classes (with a widget library) allows patrons to use their mobile devices as thin clients, thereby optimizing the use of network bandwidth.95

testing mamw

past studies recommend testing the content, display/design, and functionality of mamw in a controlled environment (e.g., a usability lab) or in the real world (i.e., in libraries).
content: librarians are advised to set up testing databases for testing image presentation, traditional free text search, location-based search, barcode scanning for isbn search, qr encapsulation, and voice search.96

display/design: librarians can review and test mamw on multiple devices to confirm that everything displays and functions as intended.97 they can also test a beta version of their mobile website with varying devices to provide guidance regarding image sizing;98 beta versions are also useful in testing mobile websites for their display on different browsers and devices.99

functionality: librarians can set up testing practices and environments for the most heavily used device platforms (e.g., hci incubators such as eye-testing software, which is a combination of virtual emulators and mobile devices not owned by libraries).100,101 they can also use the user agent switcher add-on for firefox to test a mobile website and use web-based services like deviceanywhere and browsercam, which offer mobile emulation, to test the functionality of mamw.102

training patrons

unless patrons realize the significance of a new information system for managing information resources, they will hardly use it. however, training patrons to use a newly developed mamw is almost completely missing from the studies describing the process of developing mamw for libraries. joe murphy, a technology librarian at yale university, identifies the significance of user training in managing the change from traditional to mobile search and advises librarians to explore the mobile literacy skills of their patrons and educate them on how to use new systems.103

data management

mamw cannot function properly without clean data. cleaning up data, curating data, and addressing other data-related issues are some of the least mentioned activities in the literature for developing mamw. however, it is necessary for librarians engaged in developing mamw to identify and address common challenges for managing data when used for mamw. for example, it might be a good strategy for librarians to study the best practices for managing data-related issues when offering reference services using sms.104
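as a small, concrete illustration of what "clean data" can mean in this context, the sketch below normalizes and checks an isbn captured by a mobile barcode scan (a feature mentioned under content testing above) before it is handed to a catalog search. this is only a hedged example: the function name is hypothetical, and the validation shown is the standard isbn-10/isbn-13 check-digit arithmetic rather than a procedure drawn from the studies cited above.

```javascript
// minimal sketch: clean up an isbn captured by a mobile barcode scan before
// querying the catalog. the name is hypothetical; only the standard isbn
// check-digit rules are assumed.
function normalizeIsbn(raw) {
  var isbn = String(raw).replace(/[^0-9Xx]/g, "").toUpperCase();
  if (isbn.length === 10) {
    var sum = 0;
    for (var i = 0; i < 10; i++) {
      sum += (10 - i) * (isbn[i] === "X" ? 10 : parseInt(isbn[i], 10)); // weights 10..1
    }
    return sum % 11 === 0 ? isbn : null;   // a valid isbn-10 sums to 0 mod 11
  }
  if (isbn.length === 13 && isbn.indexOf("X") === -1) {
    var total = 0;
    for (var j = 0; j < 13; j++) {
      total += parseInt(isbn[j], 10) * (j % 2 === 0 ? 1 : 3);           // weights 1,3,1,3,...
    }
    return total % 10 === 0 ? isbn : null; // a valid isbn-13 sums to 0 mod 10
  }
  return null;                             // anything else is rejected as unclean input
}
```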
skills needed for maintaining mamw

documentation and version control of software

past studies recommend developing a mobile strategy for building a mobile-tracking device and evaluating mobile infrastructure to ensure the continued assessment and monitoring of mobile usage and trends among patrons.105 however, past studies do not report or provide many details about the maintenance of mamw, which leads us to infer that maintenance of mamw, involving documentation and version control, is a neglected aspect of their development. open source software development is increasingly becoming a common practice for developing mamw. implementing version-control software (e.g., subversion and github) to accommodate the needs of developers distributed across the world is a necessity for developing mamw. version-control software provides a code repository with a centralized database for developers to share their code, which minimizes errors associated with overwriting or reverting code changes and maximizes software development collaboration efforts.106

conclusion

there are various forces driving change in the knowledge and skills area for information professionals: technologies, changing environments, and the changing role of it in managing and providing services to patrons. these forces affect all levels of it-based professionals, those responsible for information processing and those responsible for information services. this paper has examined the key steps and precautions to be taken by libraries while developing mamw to better serve their patrons. after analyzing the existing guidance offered by librarians and it professionals from the system analysis and design perspective, we find that some of the most ignored activities in mamw development are selecting appropriate software development methodologies, prototyping, communicating with stakeholders, software version control, data management, and training patrons to use newly developed or revamped mamw. the lack of attention to these activities could hinder libraries' ability to better serve patrons using mamw. it is necessary for librarians and it professionals to pay close attention to the above activities when developing mamw. our study also shows that web programming and hci are the two most widely used technology areas for developing mamw for libraries. to save their scarce financial resources, which otherwise could be invested in partnering with external it professionals, libraries could either train their existing staff or recruit lis graduates equipped with the skills and knowledge identified in this paper to develop mamw (see table 2).

key steps for developing mamw and the skills and knowledge required for developing mamw:

a. planning phase
1. forming and managing team: human resource management
2. making strategic choices: time management; cost management; quality management; human resource management (e.g., staff capacity)
3. requirements gathering: research (empirical and secondary)
4. managing scope (e.g., managing financial resources, prioritizing tasks, identifying mission-critical features of mamw, etc.): scope management
5. selecting an appropriate software development method: time management; cost management; quality management

b. analysis phase
6. requirements analysis: research (empirical and secondary)
7. communication/liaising with stakeholders: communications management

c. design phase
8. prototyping: software development (hci)
9. selecting hardware and programming languages and platforms: software development (web programming and hci)
10. designing user interfaces of mamw: software development (hci)

d. implementation phase
11. programming for mamw: software development (web programming, e.g., android, ios, visual c++, visual c#, visual basic, etc.)
12. testing mamw: software development (web programming and hci)
13. training patrons: human resource management
14. data management (e.g., cleaning up data, curating data, etc.): data management

e. maintenance phase
15. documentation and version control of software: software development (web programming and hci)

table 2. skills and knowledge necessary to develop mamw
the management of scope, time, cost, quality, human resources, and communication related to any project is known as project management.107 in addition to the skills and knowledge related to project management, librarians would also need to be proficient in software development (with an emphasis on hci and web programming), data management, and the proper methods for conducting empirical and secondary research for developing mamw. if lis programs equip their graduate students with the skills and knowledge identified in this paper, the next generation of lis graduates could develop mamw for libraries without relying on external it professionals, which would make libraries more self-reliant and better able to manage their financial resources.108

this paper assumes a very small number of scholarly publications to be reflective of the real-world scenarios of developing mamw for all types of libraries. this assumption is one of the limitations of this study. also, the sample of publications analyzed in this study is not statistically representative of the development of mamw for libraries around the world. in the future, the authors plan to interview librarians and it professionals engaged in developing and maintaining mamw for their libraries to better understand the landscape of developing mamw for libraries.

references

1. devendra potnis, ed cortez, and suzie allard, "educating lis students as mobile technology consultants" (poster presented at the 2015 association for library and information science education annual meeting, chicago, january 25–27), http://f1000.com/posters/browse/summary/1097683.
2. edwin michael cortez, "new and emerging technologies for information delivery," catholic library world 54 (1982): 214–18.
3. kimberly d. pendell and michael s. bowman, "usability study of a library's mobile website: an example from portland state university," information technology & libraries 31, no. 2 (2012): 45–62, http://dx.doi.org/10.6017/ital.v31i2.1913.
4. godmar back and annette bailey, "web services and widgets for library information systems," information technology & libraries 29, no. 2 (2010): 76–86, http://dx.doi.org/10.6017/ital.v29i2.3146.
5. hannah gascho rempel and laurie bridges, "that was then, this is now: replacing the mobile optimized site with responsive design," information technology & libraries 32, no. 4 (2013): 8–24, http://dx.doi.org/10.6017/ital.v32i4.4636.
6. june jamrich parsons and dan oja, new perspectives on computer concepts 2014: comprehensive, course technology (boston: cengage learning, 2013).
7. ibid.
8. andrew walsh, using mobile technology to deliver library services: a handbook (london: facet, 2012).
9. matthew b. miles and a. michael huberman, qualitative data analysis (thousand oaks, ca: sage, 1994).
10. bohyun kim, "responsive web design, discoverability and mobile challenge," library technology reports 49, no. 6 (2013): 29–39, https://journals.ala.org/ltr/article/view/4507.
11. james elder, "how to become the 'tech guy' and make iphone apps for your library," the reference librarian 53, no. 4 (2012): 448–55, http://dx.doi.org/10.1080/02763877.2012.707465.
3 (2012): 313–21, http://dx.doi.org/10.1080/02763877.2012.679195. 13. pendell and bowman, “usability study.” 14. lisa carlucci thomas, “libraries, librarians and mobile services,” bulletin of the american society for information science & technology 38, no. 1 (2011): 8–9, http://dx.doi.org/10.1002/bult.2011.1720380105. 15. elder, “how to become the ‘tech guy.’” 16. kim, “responsive web design.” 17. chad mairn, “three things you can do today to get your library ready for the mobile experience,” the reference librarian 53, no. 3 (2012): 263–69, http://dx.doi.org/10.1080/02763877.2012.678245. 18. rempel and bridges, “that was then.” 19. rachael hu and alison meier, “planning for a mobile future: a user research case study from the california digital library,” serials 24, no. 3 (2011): s17–25. 20. kim, “responsive web design.” http://dx.doi.org/10.6017/ital.v32i4.4636 https://journals.ala.org/ltr/article/view/4507 http://dx.doi.org/10.1080/02763877.2012.707465 http://dx.doi.org/10.1080/02763877.2012.679195 http://dx.doi.org/10.1002/bult.2011.1720380105 http://dx.doi.org/10.1080/02763877.2012.678245 identifying key steps for developing mobile applications & mobile websites for libraries | potnis, regenstreif-harms, and cortez |doi:10.6017/ital.v35i2.8652 58 21. lorraine paterson and boon low, “student attitudes towards mobile library services for smartphones,” library hi tech 29, no. 3 (2011): 412–23, http://dx.doi.org/10.1108/07378831111174387. 22. jim hahn, michael twidale, alejandro gutierrez and reza farivar, “methods for applied mobile digital library research: a framework for extensible wayfinding systems,” the reference librarian 52, no. 1-2 (2011): 106–16, http://dx.doi.org/10.1080/02763877.2011.527600. 23. patterson and low, “student attitudes.” 24. gillian nowlan, “going mobile: creating a mobile presence for your library,” new library world 114, no. 3/4 (2013): 142–50, http://dx.doi.org/10.1108/03074801311304050. 25. elder, “how to become the ‘tech guy.’” 26. matthew connolly, tony cosgrave, and baseema b. krkoska, “mobilizing the library’s web presence and services: a student-library collaboration to create the library’s mobile site and iphone application,” the reference librarian 52, no. 1-2 (2010): 27–35, http://dx.doi.org/10.1080/02763877.2011.520109. 27. stephan spitzer, “make that to go: re-engineering a web portal for mobile access,” computers in libraries 3 no. 5 (2012): 10–14. 28. houghton, “mobile services.” 29. cody w. hanson, “mobile solutions for your library,” library technology reports 47, no. 2 (2011): 24–31, https://journals.ala.org/ltr/article/view/4475/5222. 30. terence k. huwe, “using apps to extend the library’s brand,” computers in libraries 33, no. 2 (2013): 27–29. 31. edward iglesias and wittawat meesangnill, “mobile website development: from site to app,” bulletin of the american society for information science and technology 38, no. 1 (2011): 18– 23. 32. jeff wisniewski, “mobile usability,” bulletin of the american society for information science & technology 38, no. 1 (2011): 30–32, http://dx.doi.org/10.1002/bult.2011.1720380108. 33. jeff wisniewski, “mobile websites with minimal effort,” online 34, no. 1 (2010): 54–57. 34. hahn et al., “methods for applied mobile digital library research.” 35. j. michael demars, “smarter phones: creating a pocket sized academic library,” the reference librarian 53, no. 3 (2012): 253–62, http://dx.doi.org/10.1080/02763877.2012.678236. 
http://dx.doi.org/10.1108/07378831111174387 http://dx.doi.org/10.1080/02763877.2011.527600 http://dx.doi.org/10.1108/03074801311304050 http://dx.doi.org/10.1080/02763877.2011.520109 https://journals.ala.org/ltr/article/view/4475/5222 http://dx.doi.org/10.1002/bult.2011.1720380108 http://dx.doi.org/10.1080/02763877.2012.678236 information technology and libraries | september 2016 59 36. kim griggs, laurie m. bridges, and hannah gascho rempel, “library/mobile: tips on designing and developing mobile websites,” code4lib no. 8 (2009), http://journal.code4lib.org/articles/2055. 37. demars, “smarter phones.” 38. hahn et al., “methods for applied mobile digital library research.” 39. beth stahr, “text message reference service: five years later,” the reference librarian no. 52, no. 1-2 (2011): 9–19, http://dx.doi.org/10.1080/02763877.2011.524502. 40. patterson and low, “student attitudes.” 41. ibid. 42. ibid. 43. hanson, “mobile solutions for your library.” 44. stahr, “text message reference service.” 45. spitzer, “make that to go.” 46. allison bolorizadeh et al., “making instruction mobile,” the reference librarian 53, no. 4 (2012): 373–83, http://dx.doi.org/10.1080/02763877.2012.707488. 47. maura keating, “will they come? get out the word about going mobile,” the reference librarian no. 52, no. 1-2 (2010): 20-26, http://dx.doi.org/10.1080/02763877.2010.520111. 48. patterson and low, “student attitudes.” 49. hanson, “mobile solutions for your library.” 50. patterson and low, “student attitudes.” 51. hanson, “mobile solutions for your library.” 52. cody w. hanson, “why worry about mobile?,” library technology reports no. 47, no. 2 (2011): 5–10, https://journals.ala.org/ltr/article/view/4476. 53. keating, “will they come?” 54. spitzer, “make that to go.” 55. kim, “responsive web design.” 56. wisniewski, “mobile usability.” 57. elder, “how to become the ‘tech guy.’” http://journal.code4lib.org/articles/2055 http://dx.doi.org/10.1080/02763877.2011.524502 http://dx.doi.org/10.1080/02763877.2012.707488 http://dx.doi.org/10.1080/02763877.2010.520111 https://journals.ala.org/ltr/article/view/4476 identifying key steps for developing mobile applications & mobile websites for libraries | potnis, regenstreif-harms, and cortez |doi:10.6017/ital.v35i2.8652 60 58. sally wilson and graham mccarthy, “the mobile university: from the library to the campus,” reference services review 38, no. 2 (2010): 214–32, http://dx.doi.org/10.1108/00907321011044990. 59. brendan ryan, “developing library websites optimized for mobile devices,” the reference librarian 52, no. 1-2 (2010): 128–35, http://dx.doi.org/10.1080/02763877.2011.527792. 60. kim, “responsive web design.” 61. connolly, cosgrave, and krkoska, “mobilizing the library’s web presence and services.” 62. demars, “smarter phones.” 63. mark andy west, arthur w. hafner, and bradley d. faust, “expanding access to library collections and services using small-screen devices,” information technology & libraries 25 (2006): 103–7. 64. houghton, “mobile services.” 65. rempel and bridges, “that was then.” 66. elder, “how to become the ‘tech guy.’” 67. heather williams and anne peters, “and that’s how i connect to my library: how a 42second promotional video helped to launch the utsa libraries’ new summon mobile application,” the reference librarian 53, no. 3 (2012): 322–25, http://dx.doi.org/10.1080/02763877.2012.679845. 68. hahn et al., “methods for applied mobile digital library research.” 69. 
danielle andre becker, ingrid bonadie-joseph, and jonathan cain, “developing and completing a library mobile technology survey to create a user-centered mobile presence,” library hi-tech 31, no. 4 (2013): 688–99, http://dx.doi.org/10.1108/lht-03-2013-0032. 70. rempel and bridges, “that was then.” 71. iglesias and meesangnill, “mobile website development.” 72. elder, “how to become the ‘tech guy.’” 73. andrew walsh, “mobile information literacy: a preliminary outline of information behavior in a mobile environment,” journal of information literacy 6, no. 2 (2012): 56–69, http://dx.doi.org/10.11645/6.2.1696. 74. back and bailey, “web services and widgets.” 75. ibid. 76. ibid. 77. spitzer, “make that to go.” http://dx.doi.org/10.1108/00907321011044990 http://dx.doi.org/10.1080/02763877.2011.527792 http://dx.doi.org/10.1080/02763877.2012.679845 http://dx.doi.org/10.1108/lht-03-2013-0032 http://dx.doi.org/10.11645/6.2.1696 information technology and libraries | september 2016 61 78. iglesias and meesangnill, “mobile website development.” 79. bohyun kim, “the present and future of the library mobile experience,” library technology reports 49, no. 6 (2013): 15–28, https://journals.ala.org/ltr/article/view/4506. 80. pendell and bowman, “usability study.” 81. hahn et al., “methods for applied mobile digital library research.” 82. andromeda yelton, “where to go next,” library technology reports 48, no. 1 (2012): 25–34, https://journals.ala.org/ltr/article/view/4655/5511. 83. ibid. 84. hahn et al., “methods for applied mobile digital library research.” 85. houghton, “mobile services.” 86. ibid. 87. mairn, “three things you can do today.” 88. ibid. 89. tamara pianos, “econbiz to go: mobile search options for business and economics— developing a library app for researchers,” library hi tech 30, no. 3 (2012): 436–48, http://dx.doi.org/10.1108/07378831211266582. 90. demars, “smarter phones.” 91. ryan, “developing library websites.” 92. pendell and bowman, “usability study.” 93. ryan, “developing library websites.” 94. michael j. whitchurch, “qr codes and library engagement,” bulletin of the american society for information science & technology 38, no. 1 (2011): 14–17. 95. back and bailey, “web services and widgets.” 96. jingru hoivik, “global village: mobile access to library resources,” library hi tech 31, no. 3 (2013): 467–77, http://dx.doi.org/10.1108/lht-12-2012-0132. 97. elder, “how to become the ‘tech guy.’” 98. ryan, “developing library websites.” 99. west, hafner and faust, “expanding access.” 100. hu and meier, “planning for a mobile future.” 101. iglesias and meesangnill, “mobile website development.” https://journals.ala.org/ltr/article/view/4506 https://journals.ala.org/ltr/article/view/4655/5511 http://dx.doi.org/10.1108/07378831211266582 http://dx.doi.org/10.1108/lht-12-2012-0132 identifying key steps for developing mobile applications & mobile websites for libraries | potnis, regenstreif-harms, and cortez |doi:10.6017/ital.v35i2.8652 62 102. wisniewski, “mobile usability.” 103. joe murphy, “using mobile devices for research: smartphones, databases and libraries,” online 34, no. 3 (2010): 14–18. 104. amy vecchione and margie ruppel, “reference is neither here nor there: a snapshot of sms reference services,” the reference librarian 53, no. 4 (2012): 355–72, http://dx.doi.org/10.1080/02763877.2012.704569. 105. hu and meier, “planning for a mobile future.” 106. wilson and mccarthy, “the mobile university.” 107. 
project management institute, a guide to the project management body of knowledge (pmbok guide) (newtown square, pa: project management institute, 2013). 108. devendra potnis et al., "skills and knowledge needed to serve as mobile technology consultants in information organizations," journal of education for library & information science 57 (2016): 187–96.

cataloging geometry

robert s. hazelton: school of library science, case western reserve university, cleveland, ohio

a scheme is suggested for the physical arrangement of the contents of a library, in which the library as well as the books are considered as three-dimensional entities, and classification is revised to reflect this concept.

don juan needs no bed, being far too impatient to undress, nor do tristan and isolda, much too in love to care for so mundane a matter, but unmythical mortals require one, and prefer to take their clothes off, if only to sleep. that is why bedroom farces must be incredible to be funny, why peeping toms are never praised, like novelists or bird watchers, for their keenness of observation: where there's a bed, be it a nun's restricted cot or an emperor's baldachined and nightly-redamselled couch, there are no effable data. (1)

libraries are not beds-but the images are revealing and useful. that there is an information explosion going on, we are told too often by the impatient. and the very grammar of the situation reassures us: unlike tristan and isolda or a bomb, the information explosion can never explode or be exploding. for all its information (or is it just the patterns and inscriptions of a nominalist?) the library is so mundane it hardly merits a peeping tom. the information needs of the dons are seldom met by the library. unmythical mortals, however, swear by the local branch. the repositories of effable library data are small, and still far from full or accurate. perhaps most surprising is that information processing machines are still essentially foreigners in the repositories of information. and that is incredible. given this background i would like to make a suggestion based on some ideas from computer processing with a very mundane practicability: a workable compromise between the catalog and possible computer manipulation in n dimensions. i take it that the "linearity of the catalogue" is an abstraction; "... linearity is dictated by the physical form of the book and the characteristics of library architecture. in effect, a library is one continuous shelf of books, and each particular book represents a specific point in that line. it must follow, therefore, that any classification that can be applied to such an assemblage of units must necessarily exhibit a linear sequence of its terms." (2) the crucial words are "in effect." few libraries are in fact one continuous shelf. i know of one-it is my daughter's and suffers from a long shelf and only thirty-seven volumes.
both the best and the worst of this view are exhibited as it is pushed to the extreme: "the failure of our present systems of book classification in no way condemns the act of classification as a fundamental bibliographic technique. book classification, as we have used it in the past, has failed for two reasons: one, because it has been based upon the book as a physical entity without taking into consideration the inherent character of the book as a composite intellectual product; two, because of limitations arising from the properties of our hierarchical systems of classification. jevons was right, for library classification as he knew it was indeed "a logical absurdity." by this he meant, of course, that the content of books is polydimensional, which is logically incompatible with the traditional hierarchical schematization of knowledge, which is a linear progression from general to specific. the book, then, as a physical unit, and irrespective of the dimensions of its content, must be forced into a monodimensional system in which it has only linear position. this limitation alone destroys most of the utility of traditional book classifications as instruments for the effective subject organization of library materials." (3)

the best of this, i think, is the recognition that the current schemes are inadequate and that one of the major limitations is the notion that each book must be classified only as a linear position. more generally, this idea of linearity points to the absurdity in classification schemes. and now the worst: linearity is not just an abstraction; it is a myth and a fraud. it has not adequately represented the book as a physical object and has been constrained by the error, not the book.

let us look more closely at the geometry of the book. the aspect apparently most startling to the classificationist (but not to the librarian, who never has enough room) is its solidity, its three-dimensionality. it is impossible to build a book of less than three dimensions! the same problem exists for unmythical libraries: three dimensions are essential. practice does not easily square with the theory of one-dimensional libraries. the points on the line are far more arbitrary than you imagine. why does that line start on floor one, jump to three, back to one, up to two, and die in the basement? and having traversed the line, we will usually not have found any newspapers, any fiction, any children's books, and few journals. does that line ever flash through the shelf of brightly colored new books resplendent in the lobby, impressing the children and trustees?

allow the line to run through every shelf now. what most characterizes the scheme? for all the complexities, for all the work of dewey, la fontaine, and ranganathan, it is simplicity! books about the same subject or in some congenial category are, insofar as possible, physically proximate. by congenial category i mean a grouping according to a concept which is not a subject classification. the difficulty encountered in one dimension is purely physical. logically, any finite number of dimensions can be mapped into the integers (i.e., one dimension) as long as the members of each dimension set are denumerable. ordered pairs are easily mapped into the integers by the following formula:

    i(x, y) = ½[(x + y)² + 3x + y]

this yields the progression of pairs <0, 0>, <0, 1>, <1, 0>, <0, 2>, <1, 1>, <2, 0>, <0, 3>, <1, 2>, etc. ordered triples are handled as ordered pairs of ordered pairs and the integers: t(x, y, z) = i(x, i(y, z)).
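to make the mapping concrete, here is a minimal sketch in python (illustrative only; the original article contains no code, and the function names are ours). it implements the diagonal enumeration i(x, y) = ½[(x + y)² + 3x + y] and the pair-of-pairs construction for triples:

```python
def pair(x: int, y: int) -> int:
    """Map an ordered pair of non-negative integers to a single integer.

    Implements i(x, y) = ((x + y)**2 + 3*x + y) / 2, which enumerates the
    pairs diagonal by diagonal: <0,0>, <0,1>, <1,0>, <0,2>, <1,1>, <2,0>, ...
    """
    return ((x + y) ** 2 + 3 * x + y) // 2


def triple(x: int, y: int, z: int) -> int:
    """Map an ordered triple to an integer by treating it as a pair of pairs."""
    return pair(x, pair(y, z))


if __name__ == "__main__":
    # reproduce the progression quoted in the article
    expected = [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0), (0, 3), (1, 2)]
    assert [pair(x, y) for x, y in expected] == list(range(8))
    print(triple(2, 1, 3))  # any finite tuple of "shelf coordinates" gets one index
```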
and so on. because we have a denumerable set of books we can accomplish a linear mapping by both subject and category. in fact, the problem is trivial because there are only a finite number of books. physically, however, neither subject nor category will remain together. to suit the library the mapping must be physically simple, but it can be abstractly complex. for all his protestations, the classificationist cannot eschew the physical library. if he could-or wished to-the way is open. as i understand classification, it is vacuous without reference to its ability as a finding tool. it must concern itself with the polydimensional aspect of content but cannot disregard the codex. in answer to the question form "where is the book about ... ?" an appropriate and total response type is "at location (x, y, z)." here x, y, and z are the spatial coordinates relative to a particular library, both as to origin and values. the dewey or lc numbers of the book are incomplete answers in that they presume a knowledge of the classification structure as well as knowledge of the architecture of the building. i have suggested that a classification scheme must not disregard the codex, but must insofar as possible not be subservient to physical form. the following scheme takes advantage of the codex form, is as easily automated or computerized as current one-dimensional schemes, advances beyond one dimension, and is very relevant to finding: a library is considered as a three-dimensional entity. conventions are adopted for run-on from room to room and floor to floor as for the linear scheme. each book is classified in all three dimensions, the dimensions being independent. the interpretation of each dimension is left to the discretion of the individual library. thus each book has a relative position in each dimension. (this is not an alexandrian scheme relying on absolute location.) the following example illustrates the relevant concepts: choose a subject classification (as commonly understood) for the x dimension; for example, let dewey numbers be arranged from left to right on the x axis. choose a category scheme for the y dimension; one could assign degrees of difficulty from one to seven, for example. [figure: shelving grid with "subject" on the x axis and "difficulty" on the y axis] choose a category scheme for the z dimension; one could assign numbers between one and seven running from most general to most specific. this has the following effect: standing in front of the near shelf (i.e., z = 1) one can choose a subject by moving laterally. the general books will appear first, with difficult items at the top, easy ones at the bottom. if the items are too general, merely move one stack forward and try again. this approach presents an unusually usable instructional layout for circular libraries. a reading lounge can be put dead center with the most subject-specific books ranged about the circumference. level of difficulty is easily adjusted by looking up or down. given this apparatus you may wish to change the subject classification scheme. why not put solid state physics behind general physics instead of to the right or left? the card catalog can now be used with greater meaning. there is no reason why it cannot be a map of the shelves. the axes can be translated for ease of searching (e.g., interchange x and y for the card catalog). of particular interest is the relation between this scheme and those of a. d. booth (4), where access time is minimized by arranging books in the inverse order of their frequency of use.
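the scheme is simple enough to state as a procedure. the following is a minimal, hypothetical sketch (python; the record layout, the one-bay-per-ten-dewey-classes rule, and the function names are assumptions of ours, not hazelton's) of how a book might be assigned its (x, y, z) shelf coordinates:

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    dewey: float      # x dimension: subject classification
    difficulty: int   # y dimension: 1 (easy) .. 7 (hard)
    generality: int   # z dimension: 1 (general) .. 7 (specific)


def shelf_coordinates(book: Book) -> tuple[int, int, int]:
    """Return (x, y, z) shelf coordinates for a book.

    x: which bay along the wall, derived from the Dewey number;
    y: which shelf within the bay (level of difficulty);
    z: which stack, counted from the reading lounge outward (generality).
    """
    x = int(book.dewey // 10)  # one bay per ten Dewey classes (illustrative choice)
    return (x, book.difficulty, book.generality)


catalog = [
    Book("general physics", 530.0, 3, 1),
    Book("solid state physics", 530.41, 6, 5),
]
for b in catalog:
    print(b.title, shelf_coordinates(b))
```

a card catalog entry would record the same triple, making the catalog a literal map of the shelves; and one axis could just as well carry booth's use-frequency ordering, which is the combination taken up next.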
further refinements consider nonstandard shelf layouts (radial, circular, spiral). one misgiving about shelving by inverse frequency expressed by librarians is that one no longer knows where to look for a particular book in the sense that one knows when using standard schemes. this objection is easily overcome by combining the three-dimensional and frequency schemes. one dimension can be used for frequency, leaving two dimensions in which to group books by subject, difficulty, generality, color, length, or whatever you please. access time is reduced while physical grouping is retained. 16 journal of library automation vol. 5/1 march, 1972 one difficulty that will be encountered is the classification of books that are not subject-oriented-poetry and fiction, for example. these areas are not adequately dealt with in linear schemes and they could easily be left as they are. that is, two dimensions could be constants. on the other hand, it seems plausible that, given three dimensions in which to work, someone could discover congenial physical groupings that would be reasonable yet impossible in one dimension. rather than being a problem, threedimensional classification offers opportunities to cope with literatures that are not subject specific. each dimension of this scheme can be criticized on the same grounds as the current linear classification. but, taken as a whole, it provides a more powerful, much needed tool for the classificationist while allowing new approaches by automaters. its simplicity is assured because it is closer to our intuitive notions of information storage. three dimensions are necessary! references 1. w. h. auden, "the cave of nakedness," about the house ( new york: random house, 1965), p.32. 2. jesse h. shera, "classification-current functions and applications to the subject analysis of library material," in libraries and the organization of knowledge (connecticut: archon books, 1965), p.97 -98. 3. jesse h. shera, "classification as the basis of bibliographic organization," in libraries and the organization of knowledge (connecticut: archon books, 1965) , p.84, 85. 4. a. d. booth, "on the geometry of libraries," journal of documentation 25:28-42 (march 1969). 112 information technology and libraries | june 2006 book review debra shapiro, editor strategic planning and management for library managers by joseph r. matthews. westport, conn.: libraries unlimited, 2005. xiv, 150p. $40 (isbn 1-59158-231-8). the reality for most librarians is that, sometime in their career, they will be involved in strategic management and planning. while library school courses occasionally deal with this topic, it is from a theoretical perspective only. most librarians are promoted or coerced into leadership and management roles, often with little or no training or resources at their disposal to assist them with the transition or change of responsibilities. strategic planning is one of those duties assigned to library managers and leaders that often get pushed to the lowest-priority list, mainly because there are few guidelines and handbooks available in this area. since the publication of donald riggs’s strategic planning for library managers (oryx, 1984), little attention has been given to this vital topic. 
matthews’s book attempts to provide information on how to explore strategies; demystify false impressions about strategies; how strategies play a role in the planning and delivery of library services; broad categories of library strategies that can be used; and identification of new ways to communicate the impact of strategies to patrons. as the author states in the introduction, the focus of libraries has moved from collections to encompass the arena of change itself. finding strategies to enable operation in a fluid environment can mean the difference between relevance and irrelevance in today’s competitive information marketplace. the book is divided into three major sections: (1) what is a strategy, and the importance of having one; (2) the value of and options for strategic planning; and (3) the need to monitor and update strategies. the first four chapters make up the first section. chapters 1 and 2 go through the semantics and the need for strategies, as well as the realities and limitations of strategies. chapter 3 provides brief introductions to schools of strategic thought. these include the design school, the planning school, the positioning school, the entrepreneurial school, the cognitive school, the learning school, the power school, the cultural school, the environmental school, and the configuration school. chapter 4 introduces types of strategies: operational excellence, innovative services, customer intimacy, and the concept of strategic options. section 2 consists of chapters 5 through 8 and provides information on what strategic planning is, what its value is, process options such as planning alternatives and critical success factors, and implementation. section 3, comprised of chapters 9 and 10, focuses on the culture of assessment; monitoring and updating strategies; and tools available for managing the library. two appendixes are provided: one containing sample library strategic plans, and another with a critique of a library strategic plan. overall, the book is very straightforward and understandable, with numerous illustrations, process workflows, and charts. i found the information very interesting and useful, and the final section on assessment and measurement of strategic planning is essential for libraries to implement and monitor in today’s marketplace. the various explanations related to schools of strategic thought were especially helpful. this book should be read by every library manager and director involved in strategic planning and process.—brad eden, associate university librarian for technical services and scholarly communication, university of california, santa barbara ebsco cover 3 lita 107, 111, covers 2 and 4 index to advertisers lib-mocs-kmc364-20140106083630 development of a technical library to support computer systems evaluation 173 patricia munson malley: librarian, u. s. army computer systems support and evaluation command, washington, d.c. this paper reports on the development and growth of the united states army computer systems support and evaluation command (usacssec) technical reference library from a collection of miscellaneous documents related to only fifty computer systems to the present collection of approximately 10,000 hardware/software technical documents related to over 200 systems from 70 manufacturers. 
special emphasis is given to the evolution of the filing system and retrieval techniques unique to the usacssec technical reference library, i.e., computer listings of available documents in various sequences, and development uf the cataloging system adaptable to computer technology. it is hoped that this paper will be a contribution toward a standard approach in cataloging adp collections. the advent of the computer has created a situation which has been labeled the "information explosion." through automatic data processing, managers of all types can have available to them information previously impossible. many authors have addressed this situation from many aspects. however, little has been said of the explosive growth of information about computers themselves and of ways to cope with it. this paper is intended to help overcome this void. it is a description of the system installed by the united states army computer systems support and evaluation command ( usacssec) to provide controls on its extensive library of technical literature pertaining to automatic data processing equipment. the usacssec has the mission of selecting and procuring this equipment to satisfy requirements of the army, a process that involves analyzing and evaluating technical proposals made by computer manufacturers. the analysts of the command require immediate access to detailed technical literature on all aspects of commercially available adp hardware and 174 journal of library automation vol. 4/4 december, 1971 software. this literature is maintained in the command technical reference library. in form it ranges from single-page summaries to multi-volume bound collections. it includes periodicals, books, brochures, and reference works. in approximately five years, the library's vendor documentation has grown from approximately 200 to 10,000 manuals on over 200 computer systems. the library's holdings also include information on peripheral equipment from over 170 manufacturers, e.g., printers, magnetic tape transports, microfilm, platters, memories, etc.; standards; gsa federal supply schedules; programmed instruction courses published by vendors; and major reference works with monthly supplements. in the early days of the library's existence, one librarian was able to catalog and shelve the material manually with no difficulty. however, the rapid growth in the availability and use of adp brought with it a flood of technical literature which threatened to inundate the librarian and the manual filing methods. it was recognized early that some form of automation assistance for the library was necessary. the system described in this paper is the one which evolved and is now successfully employed. system description the system, named access (automated catalog of computer equipment and software systems), used by the usacssec is characterized by simplicity. it is built around a master list of all holdings, and the key to its uniqueness and success is the cataloging scheme. manufacturers have various methods of identifying their literature, some having structured stock numbers, some using only the document title, and others ranging between these extremes. the only common identifier is document title, which offers inadequate access to the collection. an efficient cataloging scheme is therefore of primary importance as a means of identifying and retrieving documents. searches made by the analysts for whom the library is maintained usually fall into one of three types: . 
1) location of a specifically identified document (e.g., the cobol programming manual for the univac 1108 computer system); 2) location of all documents pertaining to specific aspects of a particular computer system (e.g., technical descriptions of all output devices for the burroughs b3500 system); 3) location of all documents pertaining to particular aspects of a number of different computer systems (e.g., technical descriptions of line printers for ibm system 360, burroughs b3500, honeywell 200, rca spectra 70, and univac 1108). in 1966, since approximately 75% of the literature in the library was ibm oriented, ibm's index of system's literature, which categorizes documents by subject, was used as an initial model to classify literature of other manufacturers. since that time a more sophisticated, explicit and expanded subject index has been developed. table 1 shows a complete list of categories, together with an explanation of them. computer systems libraryjmalley 175 table 1. representative subject categories and codes hardware categorization subject code (tab) 00 01 03 05 07 08 09 abbreviated title general information machine system input/output magnetic tape units and controls direct access storage units and controls analog equipment auxiliary equipment subject category content systems summaries, bibliographies, configurators, publications guide, brochures on systems where no technical documentation is provided and price lists not in the gsa federal supply schedule. ex: publications guide with addendas. principles of operation, operator manuals, operating procedures, reference and system manuals. ex: processor systems information manual, operating manual. component descriptions of unit record equipment, e.g., line printers, paper tape readers, card readers, etc. ex: printers reference manual, card punch style manual. component descriptions and operation of the units. ex: magnetic tape unit operating manual. component descriptions and operation procedures. ex: disc storage subsystem and reference manual. information related to analog computers. also includes the interface equipment for connecting to digital computers. ex: integrated hybrid subsystem. includes plotters, digitizers, optical character readers, all nonstandard i/ 0 devices. interface equipment. ex: graph plotters. 176 journal of library automation vol. 4/4 december, 1971 10 13 15 19 20 21 24 communications and remote terminal equipment special and custom features physical planning specifications original equipment manufacturers information component descriptions of communication control devices and remote terminals. ex: a. voice response unit. b. visual display unit. c. teletype, typewriter terminals. d. graphic display units. special feature descriptions and custom feature descriptions. (those devices that must be custom built.) ex: a. satellite coupler. b. programmed peripheral switch. c. special feature channelto-channel adapter. d. european communication line terminal. installation and physical planning manuals. ex: site preparation and installation manual. devices subcontracted from other manufacturers. ex: component subleased from one manufacturer for use on own vendors equipment. software categorization programming systemsgeneral assembler cobol general concepts and systems summary related to the software of the system. ex: a. catalog of programs. b. programmer's guide. reference and programming manuals on the assembly language ( s) of the system. ex: a. assembler language. b. card assembler reference manual. 
reference and programming manuals on the cobol language. ex: cobol reference manual. 25 26 28 30 31 32 computer systems libraryjmalley 177 fortran other languages report program generator input/output control systems data management systems literature on the utility programs reference and programming manuals on the fortran language (includes basic). ex: fortran iv operations manual. reference and programming manuals on other higher-order general purpose languages such as algol, jovial, etc. ex: a. algol programmers' guide. b. jovial compiler reference manual. reference and programming manuals on report program generator ( rpg) languages. ex: report program generator reference manual. information related to the software facilities for the control and handling ·of input/output operations. ex : a. operating systems basic locs. b. computer systems input/output package. information related to generalized information processing systems which include the functions of information storage, retrieval, organization, etc. ex: a. ibm -gis b . burroughs forge c. ge-lds standard routines used to assist in the operation of the computer; e.g., a conversion routine, sorting routine or a printout routine. ex: a. utility system general information manual. b. utility systems programming manual. 178 journal of library automation vol. 4/4 december, 1971 33 35 36 37 48 sort/merge systems simulators/ emulators language translators operating systems, supervisors-monitors automatic testing programs miscellaneous programs information related to software facilities whose major functions are to sequence data in a disciplined order according to defined rules. ex: a. sort /merge timing tables. b. general information sort/ merge routines. information related to techniques, hardware or software, utilized to make one computer operate as nearly as possible like some other computer. ex: a. flow simulator information manual. b. emulation information manual. information related to the programs of a system which are responsible for scheduling, allocating and controlling the system resources and application programs. ex: a. disk / tape operating system operation manual. b. operating system programmers. interpretive diagnostic techniques which provide analysis of hardware components or of software programs; e.g., hardware autotest programs, software trace routines. ex: a. program writing and testing bulletin. b. system test monitor diagnostic. information related to special techniques or application programs. ex: a. apt general information manual. computer systems library/malley 179 documents are shelved (in loose-leaf notebooks) by manufacturer, computer system, subject category and numerical publication identification. the user is aided in his searches by the following three types of listings of holdings: 1) listing by manufacturer (figure 1 ): major sort field, manufacturer; intermediate sort field , computer system nomenclature; intermediate sort field, subject code (tab); and minor sort field, publication number. that is, a document is listed by publication number, within subject code, within the computer system, within the manufacturer. this list serves as an index to the library's holdings. ~ ibm ibm ibm i bm ibm ibm ibm ibm ibm ibm ibm ibm ibh system tab sys/3 70 00 sys/370 00 sys/370 01 sys / 370 01 sys/370 01 sys / 370 01 sys/370 01 sys / 370 03 sys / 370 03 sys/370 07 sys / 370 07 sys/370 15 sys/370 15 usacssec technical reference library catalog as of june 71 ibm corporation library listing by mfr by system mrs . 
malley, librarian pub no publication title a33-3006-0l sys / 370 model 135 configurator 710300 n20 -0360 71 *srl newsletter index of publications + programs 701231 a22 693500 sys/370 mod 165 functional characteristics 700600 a22-6942 00 sys/370 hod 155 functional characteristics 700600 a22 -700000 sys / 370 principles of operation 700600 c20 -172900 a guide to system / 370 model 165 700600 c20 173400 *a guide to the ibm system/ 370 model 145 700900 a21 9124 -0l 3505 card reader, 3525 card punch subsystem 710300 a24 3550 -0l 3215 -1 console printerkeyboard comp descr 700700 a26-1592-00 3830 stg contrl / 3330 disk storage comp desc 700600 a26 -160600 2319 disk storage component sumhary 700900 a22 697000 system/ 370 model 15 5 installation man phys plan 700600 a22 6971 -00 system/ 370 model 165 installation man phys plan 700600 *indicates new entries since last catalog. fig. 1. sample index listing by manufacturer name and system. 2) listing by subject code (figure 2) : major sort field, subject code (tab); intermediate sort field, manufacturer; and minor sort field, computer system nomenclature. that is, a manual is listed by computer system, within the manufacturer, within the subject code. within each subject code, or tab, all manuals pertaining to this subject area are listed. 3) listing by manufacturer name and publication number (figure 3) : major sort field, manufacturer; intermediate sort field, publication number. that is, a document is listed by publication number within the manufacturer. 180 journal of library automation vol. 4/ 4 december, 1971 mfr system tab cdc 6000 24 cdc 6000 24 602s3000b 60191200a pu blication title *6000 series cobol 3 reference manual 64/6s/6600 cobol reference manual 700700 690900 rca spec70 24 ec 001 s00 *ansi cobol language translator (ucolt)prog pub 701200 rca 3301 24 940sooo realcom cobol 660soo un! 1108 24 fsd 20s l *fd ansi cobol prog ref man 700s04 un! 1108 24 up 7626 r2 *cobol exec 2 & exec 8 supplementary ref 700911 uni 9200 24 up 7s43 r2 *cobol supplementary ref-see 9300 24 700s11 uni 9300 24 up 7s43 r2 *cobol supplementary ref 700s11 uni 9300 24 up 7820 *9200/9300 cobol summary card 700917 uni 9400 24 up 7709 rl *9400 cobol su pplementary ref 700630 uni 9400 24 up 7797 *9400 cobol summ~ry card 700707 xds sigmas 24 901s01a cobol6s operations 680700 xds sigmas 24 90 1sooa cobol 6s reference 680700 *indicates new entries s ince l ast catalog. fig. 2. sample index listing by subject code (tab), manufacturer name i and system . mfr system tab pub no publication title pub date ibm sys/370 01 a22 6935 -00 sys / 370 mod 165 functional characterstics 700600 ibm sys / 370 00 a22 6944 -0l model 195 configurator 691100 ibm sys/370 01 a22 6962 -00 sys/370 mod 155 channel characteristics 700600 ibm sys/370 15 a22 6971 -00 system/370 model 165 i nstallation man phys plan 700600 ibm sys /360 19 a22 6974 -00 sys/360 370 i i o interface channel 710200 ibm sys/370 01 a22 7000-00 sys / 370 principles of operation 700600 ibm 7070 t 7074 01 a22 7003 -06 7070/7074 pri!iciples of operation 620000 ibm 1401/1460 00 a24 -140l -02 1401 system summary 650900 ibm sys/370 07 a26 1606-00 2319 disk storage component summary 700900 ibm sys/370 01 c20 1738 0l a guide to systej>i/370 model 135 710 300 ibm sys/370 15 c22 7004 00 sys / 370 installation manual physical planning 710 100 ibm sys/360 26 320 1011 01 call/360 & pl/1 subroutine ver 2 700200 ibm sys/360 25 320 1054-00 call/360 fortran reference manual 700200 figure 3. 
sample i ndex l i st i ng by manufacture r name and publ i cation numbe r . fig. 3. sample index listing by manufacturer name and publication number. computer systems library/malley 181 the manufacturer needs only to list his documents pertaining to a proposal and an analyst can find them immediately by using this listing. this listing also aids the manufacturer in updating his documents on file in the library, as most manufacturers publish their own index of publications in numerical order. the above lists are generated by sorting and listing a master file. the latter is maintained on magnetic tape and updated with punch cards. four card formats are employed, one for each of the following: 1) addition of publications, 2) deletion of publications, 3) change of title or date of a publication in the file, and 4) change of other information. tables 2 through 5 show the format for each type of card. it should be noted that in table 3, information in columns 1-26 must be identical to that in the entry to be deleted, and that the publication title and publication date are not changed by the card described in table 5. table 2. punch card format for addition of a publication card columns information 1-3 manufacturer (abbreviated) 4-12 system number 13-14 subject code 15-26 publication number 0 27 the letter 'a' (key for adding 28-74 75-80 a publication) publication title publication date table 3. punch card format for deletion of a publication card columns information 1-3 manufacturer 4-12 system number 13-14 subject code 15-26 publication number 0 27 the letter 'd' (key for deleting a publication) table 4. punch card format for change of title or publication date columns information remarks 1-3 manufacturer identical 4-12 system number to 13-14 subject code listing 15-26 publication number 0 27 the letter 'c' 28-74 the new title if applicable 75-80 the new publication date if applicable 182 journal of library automation vol. 4/4 december, 1971 table 5. punch card format for change of manufacturer, system, tab, or publication columns information remarks 1-3 manufacturer identical 4-12 system number to 13-14 subject code listing 15-26 publication number 27 the letter 'x' 28-30 new manufacturer name 31-39 new system number 40-41 new subject code 42-53 new publication number a simple program written in cobol for the univac ll08 is used to implement access. data cards are read into memory, and the master tape file is updated. errors such as "no match" or incorrect format are identified during the update process. the updated master file is sorted to provide the three types of output listings described above. system development the present system evolved over a five-year period. the initial catalogs were prepared and maintained manually, and some of the better features of the early attempts were carried forward into the automated system. because of this evolution, it is difficult to determine the actual development cost of access. much of the detailed design was done in connection with development of the computer program. approximately seven man-months were required for preparation and debugging of the program. during this period, a total of approximately two hours of univac ll08 system time was required. negligible time has been spent on program maintenance since installation of access. not unexpectedly the greatest effort was expended in collecting and preparing data for the initial master file. the library in 1967 contained over 3,000 documents, and a punch card had to be prepared for each. 
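the access update program itself was written in cobol for the univac 1108 and is not listed in the article. the fragment below is a rough, illustrative re-imagining in python that follows the column layout of tables 2 through 5; everything else (the in-memory master file and the function names) is assumed:

```python
def parse_card(card: str) -> dict:
    """Split an 80-column ACCESS transaction card into its fields (Tables 2-5)."""
    return {
        "mfr": card[0:3].strip(),
        "system": card[3:12].strip(),
        "tab": card[12:14].strip(),
        "pub_no": card[14:26].strip(),
        "action": card[26],            # 'A' add, 'D' delete, 'C' change title/date, 'X' change key
        "title": card[27:74].strip(),
        "date": card[74:80].strip(),
    }


def apply_card(master: dict, card: str) -> None:
    """Apply one transaction to the master file, keyed on (mfr, system, tab, pub_no)."""
    f = parse_card(card)
    key = (f["mfr"], f["system"], f["tab"], f["pub_no"])
    if f["action"] == "A":
        master[key] = {"title": f["title"], "date": f["date"]}
    elif f["action"] in ("D", "C", "X") and key not in master:
        print("no match:", key)        # error reporting, as in the original update run
    elif f["action"] == "D":
        del master[key]
    elif f["action"] == "C":
        master[key].update({k: f[k] for k in ("title", "date") if f[k]})
    elif f["action"] == "X":
        # columns 28-53 carry the new mfr/system/tab/pub_no; title and date are kept
        new_key = (card[27:30].strip(), card[30:39].strip(),
                   card[39:41].strip(), card[41:53].strip())
        master[new_key] = master.pop(key)
```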
the major adpe manufacturers cooperated in this undertaking, by providing properly punched cards for individual documents. cards were prepared by the usacssec for documents provided by small manufacturers and for miscellaneous documents in the library. the major manufacturers have continued their assistance in maintaining the data base, providing punch cards with all new documents delivered to the library. nevertheless, it cannot be stressed too strongly that the updating and maintenance of this library file is a very difficult and tedious task representing the work of a full-time librarian, library assistant and computer systems library /malley 183 clerk the library may receive 600 new documents and/or page changes, with or without cards, during a thirty-day period. the master file is updated and new listings produced every sixty to ninety days. more frequent runs would prove more beneficial to the users and require less manpower on the part of the staff. each run requires approximately ten minutes of univac 1108 system time. it is an interesting fact that communication was a problem during detailed design of access. adp system analysts and programmers thought and spoke in terms of codes, fields, sorts and files; the librarian operated in a context of documents, catalog cards, and indexes. a period of mutual education was necessary before effective communication transpired and the system design progressed. results the library today contains almost 10,000 hardware/software equipment documents on over 200 computer systems from 70 manufacturers. the flexibility inherent in access permitted the library to absorb this rapid growth with minor perturbation. during one six-month period documents describing the mini-computers of twenty manufacturers were added. the subject codes accommodated all documents, and the only modification required to the system was the addition of codes for these new manufacturers. the value of access was demonstrated when ibm and rca announced the new system 7. documentation on the available hardware and software was delivered on the day of announcement together with punch cards, and within one week this large addition to the collection was completely integrated into the catalog. adpe manufacturers also have benefitted from access. the army requires that adpe vendors, to be eligible for contracts, must maintain current technical documentation of their proposed systems in the usacssec library. manufacturers are provided copies of the listings pertaining to their equipment to check for compliance with the requirement. some manufacturers have even accepted the access cataloging scheme for use in their own libraries. access has met the objectives established for it. benefitting from the evolutionary nature of the cataloging scheme, the system has required a minimum of modifications to date. none of these has been substantive, falling more in the category of debugging rather than in that of design change. although access was initiated and installed to satisfy the unique requirements of the usacssec, it has general application. it brings order to the conglomeration of technical information on adp systems and equipment. the three listings that it produces become, in effect, axes for the multi-dimensional volume of information. 184 journal of library automation vol. 4/4 december, 1971 conclusion the usacssec technical library is recognized as having the most extensive holdings of adpe manufacturer's literature in the washington area. 
no libraries of equal or greater size are known to exist anywhere. it was planned initially that only usacssec analysts and technicians would have access to the information in the usacssec library. however, the resulting interest of various organizations of the department of defense (dod), and the fact that this collection provided information that was otherwise unavailable, prompted the command to open the library to a selected group of dod users. this initial relaxation has gradually evolved into provision for all government and military personnel receiving prior clearance from command headquarters usacssec to utilize the library for research. unfortunately, because of the type of material collected, the quantity available, and the constant demand, it has not been possible to permit the lending of materials. at present, approximately eighty personnel from other government agencies use the library each month for research. some of the agencies use it each month for evaluation and selection of computers. user reaction is amazement that such a collection of adp materials exists. it is not unusual for relatively new and thoroughly dynamic fields of interest to progress so rapidly that efforts to document them adequately lag behind the latest developments. the problem is particularly acute in the information processing field, whose large amount of technical literature is of little value without an efficient cataloging system. usacssec has solved some of the information problems in the computer field by examining in detail the special on-the-job requirements of computer system analysts in general. by developing its library in terms of the computer industry, rather than specifically to one command's requirements, a generalized library system in adp has evolved. it is hoped that this paper will be a contribution toward a standard approach in cataloging adp collections and creation of a commonality among adp technical libraries. 105 application of the variety-generator approach to searches of personal names in bibliographic data bases-part 1. microstructure of personal authors' names dirk w. fokker and michael f. lynch: postgraduate school of librarianship and information science, university of sheffield, england. conventional approaches to processing records of linguistic origin for storage and retrieval tend to regard the data as immutable. the data generally exhibit great variety and disparate frequency distributions, which are largely ignored and which entail either the storage of extensive lists of items or the use of complex numerical algorithms such as hash coding. the results in each case are far fmm ideal. the variety-generator approach seeks to reflect the microstructure of data elements in their description for storage and search, and takes advantage of the consistency of statistical characteristics of data elements in homogeneous data bases. in this paper, the application of the variety-generator approach to the description of personal author names from the inspec data base by means of small sets of keys is detailed. it is shown that high degrees of partitioning of names can be obtained by key-sets generated from the initial characters of surnames, fmm the terminal characters of surnames, and from the initials. the implications of the findings for computer-based bibliographical information systems are discussed. 
introduction the application of computer technology to the storage of bibliographic data bases and to the selection of items from them on the basis of the content of specified data elements poses considerable problems. among the most important of these, from the viewpoint of the efficiency of computer use, is the fact that many of the individual data elements exhibit great variety (i.e., lists of their contents are extensive), and show relatively disparate distributions. this behavior is encountered in different degrees in regard to items such as words in the titles of monograph or periodical ar106 ]oumal of library automation vol. 7/2 june 1974 ticles, assigned subject headings, authors' names, and citations.14 such distributions have been extensively studied in various contexts by bradford, zip£, and mandelbrot.4-6 in general, the distributions are approximately hyperbolic, so that a small proportion of items may account for a substantial proportion of occurrences, while the majority of items occur only infrequently. the studies have been well reviewed by fairthorne.7 of all the data elements, personal author names exhibit a distribution which is at its most exh·eme in one direction. as is shown later in this paper, the most frequent author name in a file of 50,000 names occurred only sixteen times, while over 35,000 of the names, or over 70 percent of the file, occurred once only. a simple and general strategy for dealing with searches of data elements, the contents of which show large variety and disparate distributions, is under development by the research unit at the sheffield school, and has thus far been elaborated in regard to searches of chemical structures and of natural-language data bases. 8• 9 based on information-theoretic principles, it involves a two-stage search procedure in which in the first and rapid stage the majority of items which cannot possibly fulfill the search criteria are eliminated, while those which meet the criteria are examined for an exact match at the second stage. the criteria (or attributes) are selected on the basis of an examination of the microstructure of the items in the data base, and are chosen so that their frequencies are approximately equal. the number of criteria or attributes chosen for description of the items is variable within a wide range; with their aid, the variety of items can be described so as to facilitate discrimination among them. in the context of substructure searching, the attributes are representations of fragments of chemical structures,10 while in the case of text, they are strings of characters which are variable in length. these strings are long when the characters comprising them represent frequent combinations, and short when the characters are infrequent.11 since the sets of attributes can generate, in an approximate manner, the variety of items encountered in the data base, they are termed variety generato1·s. they are intermediate in number between the primitive set of symbols ( alphanumeric characters in the case of text, atoms and bonds in that of chemical structures) and the actual variety of items in the collection (words or word fragments in text in the first instance, and molecules in the second). the variety-generator approach involves recognition of the fact that the statistical properties of specific data elements within homogeneous data bases are relatively constant, and that the primitive symbols of the data elements themselves usually show hyperbolic distributions. 
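the two-stage procedure can be pictured with a small sketch (python, ours rather than the sheffield group's; the screening key used here is a deliberately crude stand-in for the key-sets developed later in the paper): a cheap first pass discards records whose keys cannot match the query, and only the survivors are compared exactly.

```python
def surname_key(name: str) -> str:
    """Stand-in screening key: simply the first three letters of the surname."""
    return name.split(",")[0][:3].upper()


def two_stage_search(query: str, records: list[str]) -> list[str]:
    # stage 1: rapid elimination on the key alone
    q_key = surname_key(query)
    candidates = [r for r in records if surname_key(r) == q_key]
    # stage 2: exact match on the full record
    return [r for r in candidates if r == query]


names = ["FOREMAN, J.A.", "FORD, M.", "LYNCH, M.F."]
print(two_stage_search("FOREMAN, J.A.", names))
# "FORD, M." survives stage 1 (same key "FOR") but is rejected at stage 2
```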
because these distributions are so skewed, new symbol sets can be defined, consisting of sequences of primitive symbols chosen so that their frequencies of occurrence become comparable. the new symbol sets then constitute the attributes which are employed, singly or in combination, to represent the items within a search file. these symbol sets approximate to the ideal of equifrequency postulated by shannon for optimal efficiency in communication.12 only an approximation can be obtained, however, since the distributions of the newly defined symbols still cover a relatively wide range, and since they are seldom entirely independent of one another in statistical terms and may often be strongly associated.

the variety-generator concept is not entirely novel. indeed, it was anticipated most closely, in precisely the present context, by merrill and by cutter, with a view to subdividing a library's holdings into equal groups of items.13, 14 however, the greater flexibility of computer techniques would appear to make its use today even more attractive. this paper thus describes a study of a large file of authors' names with a view to identifying attributes of the names which can be used for efficient retrieval purposes. assessment of the effectiveness of the attributes in retrieval is described in part 2 of this series (to appear in the september 1974 issue of the journal of library automation).

the main terms used here are n-gram, key, and key-set, where an n-gram is a string of n adjacent characters. a key consists of an n-gram, and keys are chosen so that the frequencies of a set of keys (or key-set) are approximately equivalent in a given file. the measures used in assessing frequency distributions are shannon's expression for the entropy of a sequence of symbols,

    h = -Σ (i = 1 to n) p_i log2 p_i

and the relative entropy,

    h_r = h_actual / h_maximum.

h_maximum is reached when the probabilities of occurrence of the symbols of the sequence are equal; its value is the binary logarithm of the variety of symbols, since

    h = n * ( -(1/n) log2 (1/n) ) = log2 n.

the value of the relative entropy is thus a measure of the degree of equifrequency of a set of symbols, and is independent of their variety.

characteristics of name file

the file studied was a collection of 100,000 personal names taken from ten issues of the inspec data base dating from the period 1969 to 1972. the names are represented in variable-length format: surname followed by a comma and a space, and initials each followed by a period. for the present purpose, case and diacritic shift symbols were ignored. subsets of the file were first sorted into sequence on the basis of the full names, and distributions determined both for surnames and initials together and for surnames alone, as shown in table 1 for the subset of 50,000 names. since the great majority of full names occur once only, the relative entropy of this distribution, at 0.975 (computed with respect to the 50,000 names, i.e., h_max = log2 50,000), is high, while that for surnames alone is lower, at 0.904. an analysis of the ratio of unique surnames to the total number of entries in files of 25,000, 50,000, 75,000, and 100,000 names showed that the proportion of different surnames added to the file as it increases in size is predictable: the relationship between the number of different surnames (d) and the total number of entries (n) conforms to the expression

    d = a * n^beta, where a = 5.89 and beta = 0.78.
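for readers who want to reproduce these measures, the following short python fragment (ours, not code from the original study) computes h, h_maximum, and the relative entropy h_r for an observed frequency distribution:

```python
from collections import Counter
from math import log2


def entropy(counts: Counter) -> float:
    """Shannon entropy h = -sum(p_i * log2 p_i) of an observed distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def relative_entropy(counts: Counter) -> float:
    """h_r = h_actual / h_maximum, where h_maximum = log2(number of distinct symbols)."""
    h_max = log2(len(counts))
    return entropy(counts) / h_max if h_max > 0 else 1.0


# toy example: first letters of a handful of surnames
letters = Counter("ssmmbkhfl")
print(round(entropy(letters), 3), round(relative_entropy(letters), 3))
```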
table 1. distribution of full names and surnames alone in a file of 50,000 inspec names.
full names, given as no. of names with frequency f (% of the file), in order of increasing f: 35,187 (70.37), 4,768 (19.07), 1,060 (6.36), 302 (2.42), 88 (0.88), 34 (0.41), 16 (0.22), 7 (0.11), 3 (0.05), 1 (0.03), 2 (0.05), 1 (0.03). total number of different full names = 41,469; h = 15.22, h_max = 15.61 (log2 50,000), h_r = 0.9753.
surnames, given as no. of surnames with frequency f (% of the file), for f = 1 to 20 and f > 20: 19,894 (39.79), 4,258 (17.03), 1,597 (9.58), 706 (5.65), 395 (3.75), 235 (2.82), 134 (1.88), 104 (1.66), 68 (1.22), 54 (1.08), 36 (0.79), 39 (0.94), 36 (0.94), 28 (0.78), 24 (0.72), 24 (0.77), 15 (0.51), 19 (0.68), 16 (0.61), 9 (0.36), 112 (8.44). total number of different surnames = 27,803; h = 14.11, h_max = 15.61 (log2 50,000), h_r = 0.9042.

next, the frequencies of characters at different positions in the surnames and of the initials were determined. the most important positions in the surname are the first and last characters, as will be seen shortly. the distributions of these characters and of the first and second initials are shown in table 2. the relative entropy of the first initial is, interestingly, the highest of the four; the highest ranking initial is j, which is one of the least frequent characters in english text. thereafter follow the first and last letters of the surname, and the second initial. the low relative entropy of the last is partly accounted for by the fact that a single initial occurred in 37 percent of the entries.

table 2. distributions of first and last characters of surname and of initials in 50,000 inspec names.
first character of surname: s .113, b .083, m .080, k .076, h .056, g .055, p .053, c .052, r .047, l .047, d .044, t .040, w .040, a .036, f .034, n .025, v .025, e .018, j .017, o .016, z .013, i .013, y .011, u .005, q .001, x. h = 4.309, h_max = 4.700 (log2 26), h_r = 0.917.
last character of surname: n .164, r .102, a .084, s .082, i .074, e .068, v .067, y .043, t .042, o .041, l .040, h .037, k .033, d .030, g .026, z .013, m .013, u .013, f .006, c .005, w .005, p .004, x .004, b .003, j .001, q .0002. h = 4.039, h_max = 4.700 (log2 26), h_r = 0.859.
first initial: j .100, a .083, r .081, m .064, g .058, v .051, d .050, h .050, s .047, e .043, p .042, w .038, k .036, l .036, c .035, t .033, b .032, n .026, f .026, i .023, y .023, o .010, space .005, z .005, u .004, q .0002, x .0001. h = 4.374, h_max = 4.755 (log2 27), h_r = 0.920.
second initial: space .371, a .066, m .045, j .043, s .035, l .033, e .033, r .031, p .031, g .030, c .030, w .028, v .028, h .027, d .026, i .026, f .024, n .024, k .022, b .020, t .013, y .007, o .005, z .002, u .001, q .0002, x .0001. h = 3.688, h_max = 4.755 (log2 27), h_r = 0.776.

distributions were also obtained for the second and subsequent characters of the surname. these, and also the distributions of the first character, are in general agreement with the results of earlier studies by bourne and ford, and by ohlman, and indicate that consonants predominate in the first position, vowels in the second position, while thereafter the distributions become less disparate.15,16 however, due to the variable lengths of names, the dominant character at the sixth and subsequent positions of the surname is the space character.

key-set generation technique
the basic key-set generation technique involves creating fixed-length n-grams from some point or points of reference within each record, the strings generated being initially of length greater than those anticipated within the key-set. these strings are sorted into lexicographic order and counted. (the resultant distribution of the fixed-length strings is again hyperbolic.) the frequencies are compared with a predetermined threshold frequency; at the first stage none of the string frequencies should exceed this value. the strings are then shortened by truncation of the right-hand character, and the frequencies of the strings which have become identical through truncation are accumulated. the new n-gram frequencies are compared with the threshold value; any strings which exceed the value are noted. the procedure is repeated until the single characters are reached. two types of analysis are possible, redundant and nonredundant. in the latter, any string exceeding the threshold value is removed from the list and not processed further, while in the former they continue to the next processing stage. while redundant analysis is valuable at the exploratory stage, the nonredundant type is preferred for key-set generation. the procedure was first applied to strings of characters starting with the first character of each surname, as illustrated in figure 1.

fig. 1. successive right-hand truncations of a surname during key-set generation.
n-gram:    foreman   forema   forem   fore   for   fo     f
frequency:      11       13      24     98   143  214  1685

here the frequency of the surname foreman in a file of 50,000 names is eleven. when successively shortened, other surnames with the same initial n-gram are included in the count. comparison of the count with a threshold value results in selection of a key. here, if the threshold were 100, the key selected would be for. application of the procedure to the surnames of the 50,000 name file (the name records had a maximum of eighteen characters, left-justified and space-filled if less than this length), with a threshold frequency of 300 (i.e., a probability of 0.006), gave a key-set consisting of eighty-seven keys, including all the alphabetic characters. the key-set is shown, in alphabetic order, together with the probabilities, in table 3. it is clear that the most frequent characters at the beginning of the surname have produced most keys, s and m with eight keys each, b with seven, k with six, and h, g, p, and r each with five keys. whereas the relative entropy of the initial surname letter was 0.917, that of the key-set is 0.977. the probabilities of no less than seventy of the eighty-seven keys now lie between 0.005 and 0.015. the key-set itself consists of the twenty-six alphabetic characters (one of these, x, is not represented in the collection), fifty-
table 3. key-set of 87 keys produced from 50,000 surnames from inspec files.
key p1'0bability key probability key probability key probability a .023 ga .009 m .001 ro .016 al .007 go .011 ma .022 s .027 an .006 gr .012 mar .008 sa .016 b .012 gu .007 mc .007 sch .014 ba .013 h ,006 me .010 se .008 bar .006 ha .021 mi .012 sh .016 be .017 he .010 mo .012 si .010 bo .014 ho .012 mu .008 so .007 br .014 hu .007 n .011 st .016 bu .009 i .013 na .008 t .030 c .013 j .010 ni .006 ta .010 ca .011 jo .007 0 .017 u .005 ch .016 k .015 p .011 v .015 co .013 ka .018 pa .014 va .010 d .015 ki .008 pe .011 w .011 da .009 ko .017 po .010 wa .011 de .013 kr .008 pr .006 we .008 do .007 ku .010 q .001 wi .010 e .018 l .013 r .007 x f .025 la .012 ra .011 y .011 fr .008 le .014 re .008 z .013 g .015 li .009 ri .006 h=6.2952 hmax = 6.443 (log,87) h, =0.977 eight digram keys, and the three trigram keys bar, mar, and sch. the predominance of vowels as the second character of keys is noticeable; forty-nine of the sixty-one n-grams have a vowel in the second position. the size of the key-set produced from a given data base can be varied arbitrarily by changing the threshold value. an approximately hyperbolic relation obtains between the value of the threshold and the number of keys selected. as the size of the key-set increases, the length of the longest n-gram in the key-set increases, and the distribution of n-grams shifts toward higher values, as shown in figure 2. stability of the key-sets with increase in file size is clearly an important factor. to determine the extent of this, successive portions of the entire file of 100,000 surnames were subjected to the analysis at a threshold value of 0.005. as illustrated in table 4, the key-sets are remarkably stable in regard to total key-set size, the number of keys of each length, and to the actual keys. table 4. stability of size and composition of keys with increasing file size. number of number of number of number of total size entries in file characters digrams trigrams of key-set 25,000 26 76 10 112 50,000 26 74 9 109 75,000 26 74 10 110 100,000 26 75 10 111 no, of keys common to key-sets 26 73 9 108 112 ]ou1'nal of library automation vol. 7/2 june 1974 400 300 number of n-grams 200 100 1 2 3 4 5 6 7 8 9 length of n-grams key-set size a 184 b 332 c 572 d 1034 threshold probability 0.0025 0.0015 0.0010 0.0007 10 11 12 13 fig. 2. distribution characteristics of n-grams generated from 10,000 surnames from inspec for four different threshold values as the size of the key-set increases, the range of probabilities represented among the keys narrows, and the relative entropy of the distribution increases, becoming eventually asymptotic with the value of one. this i~ illustrated in figure 3, for the surnames in a file of 50,000 entries. beyond a key-set size of about 100, increases in the relative entropy of the resultant distribution are marginal. furthermore, with increasing key-set size, the va1'iety-gene1'ato1' appmachjfokker and lynch 113 shorter and more frequent surnames begin to appear in their entirety as keys. as an alternative to increasing the variety of the keys, the production of keys from character positions after the first letter of the surname was considered. the problem of variations in name length, as well as the very different distributions of the characters at these positions, were not encouraging, and instead the production of key-sets from the last letter of the sur1 .99 .98 .97 .96 .95 .94 .93 hr .92 .91 .90 .89 .88 .87 .86 0 20 40 60 80 100 total number of keys for the front of surnames fig. 3. 
increase in relative entropy with increase in key-set size; keys generated from 50,000 surnames 114 j oumal of library automation vol. 7/2 june 1974 name was investigated, and proved much more ath·active, since it is largely independent of surname length. key-sets from the end of the surname for this purpose, each surname in the file was reversed within a record and subjected to key-generation. the relative entropy of the last character of the surname is substantially lower than that of the first character, at 0.860. accordingly, the key-sets have a higher proportion of longer keys than those produced from the front of the surname, as shown in table 5. this key-set consists of the twenty-six characters, seventy-eight digrams, table 5. key-set of 155 n-grams produced from last letter of 50,000 inspec surnames at threshold of 0.003. key p1'obability key p!'obability key probability key probability a .012 vich ,005 ein .005 is .012 ca .003 gh .003 kin .007 ns .006 da .008 sh .003 lin .005 ins .003 ka .006 th .005 tin .003 os .004 ma .007 ith .004 nn .010 rs .006 na .003 i .014 on .009 ss .005 ina .004 ai .004 son .013 ts .004 ra .010 hi .007 lson .004 us .004 ta .008 ii .009 nson .006 t .012 va .004 vskii .005 rson .004 dt .003 ova .010 ki .006 ton .009 et .004 wa .004 ski .005 0 .017 nt .004 ya .005 wski .004 ko .003 rt .003 b .003 li .005 nko .010 ert .004 c .005 ni .007 no .004 st .004 d .009 ri .005 to .007 tt .005 ld .005 ti .004 p .004 ett .003 nd .006 j .001 q .001 u .013 rd .009 k .010 r .005 v .001 e .020 ak .006 ar .006 ev .018 de .003 ck .009 er .016 ov .012 ee .004 ek .004 ber .003 kov .008 ge .004 ik .004 der .006 ikov .004 ke .006 l .007 ger .005 lov .005 le .008 al .006 nger .003 nov .006 ne .008 el .012 her .006 anov .006 re .006 ll .004 ier .005 rov .006 se .005 all .004 ker .007 sov .003 te .004 ell .008 ler .007 w .005 f .003 m .008 ller .005 x .004 ff .003 am .005 mer .003 y .017 g .004 n .009 ner .010 ay .004 ng .004 an .017 ser .003 ey .006 ang .003 man .014 ter .008 ley .007 ing .007 rman .003 or .004 ky .004 rg .007 yan .003 s .016 ry .005 h .004 en .018 as .007 z .007 ch .009 sen .007 es .011 tz .006 ich .003 in .019 nes .004 h=7.059 hmax = 7.276(log.155) hr = 0.970 va1'iety-generator approachjfokker and lynch 115 1 .99 .98 .97 .96 .95 .94 .93 .92 hr .91 .90 .89 .88 .87 .86 0 40 80 120 160 200 total number of keys for the end of sumames e!g. 4. increase in relative entropy with increase in key-set size; keys generated from 50,000 surnames forty trigrams, ten tetragrams, and a single pentagram. the breakdown of the individual terminal characters of the surname is also more extreme, since the distribution is more skew. thus n, the most frequent last character, has no fewer than nineteen different keys in this set, closely followed by r, with seventeen keys. the relative entropy of the distribution is again high, at 0.970 for this key-set. figure 4 shows the relation between key-set size and relative entropy, and indicates that a larger number of keys from the last character of the surname is required to reach the same relative en116 journal of library automation vol. 7/2 june 197 4 tropy as keys from the first character. there is an anomalous section of the curve, which may well derive from the much greater prevalence of suffixes than prefixes in personal names. 
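the truncation procedure lends itself to a compact implementation. the sketch below is a rough rendering of the nonredundant analysis described above, not the authors' program: the function names, the space-filled length of six, and the toy surname list are assumptions made for the example, and the from_end option simply reverses each surname so that keys grow from its last character, as in the end-of-surname key-sets just described.

from collections import Counter

def generate_keys(surnames, threshold, max_len=6, from_end=False):
    """sketch of nonredundant key-set generation by right-hand truncation.

    surnames are space-filled to max_len, counted as fixed-length n-grams,
    and repeatedly shortened; any n-gram whose accumulated count exceeds
    the threshold is promoted to the key-set and drops out of further
    processing (the nonredundant analysis). remaining single characters
    are kept, so every surname maps onto some key.
    """
    def orient(s):                       # show end-of-surname keys the right way round
        return s[::-1] if from_end else s

    prepared = [s[::-1] if from_end else s for s in surnames]
    counts = Counter(s.ljust(max_len)[:max_len] for s in prepared)
    keys = {}
    for _ in range(max_len - 1):         # truncate down to single characters
        survivors = Counter()
        for ngram, freq in counts.items():
            if freq > threshold:
                keys[orient(ngram).strip()] = freq
            else:
                survivors[ngram[:-1]] += freq
        counts = survivors
    for ngram, freq in counts.items():   # leftover single characters become keys
        keys[orient(ngram)] = freq
    return keys

# toy file: 50 copies each of a handful of surnames, threshold 100
surnames = ["foreman", "forbes", "ford", "fox", "smith", "smythe", "schmidt"] * 50
print(sorted(generate_keys(surnames, threshold=100)))                  # ['f', 'for', 's']
print(sorted(generate_keys(surnames, threshold=100, from_end=True)))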
conclusions

this study has demonstrated the feasibility of devising partial representations of author names by applying the variety-generator approach to overcome the substantial frequency variations encountered in their distributions. it has also been shown that within a homogeneous file, i.e., one of consistent provenance, there exists a substantial level of consistency in terms of character distributions, as illustrated in table 4. the characteristics may vary substantially between data bases of different provenance, e.g., as between inspec and marc files.17

conventional approaches to processing records comprising linguistic data tend to disregard the statistical properties of the items, and attempt to overcome the resultant problems either by storage of extensive lists of items or by using complex numerical algorithms. typical of this latter approach, in the present context, is the use of truncated search keys for access to bibliographical files in direct access stores, in which fixed-length character strings are the keys, as, for instance, in the system in operation at the ohio college library center.18 the problems encountered in the use of fixed-length truncated author and title search keys for monograph data are indicated by the fact that the search files using hash-addressing are operated, on average, at a density of only 62.5 percent. once the density reaches 75 percent, the proportion of collisions and the resultant degradation in performance are such that the files are recreated at a density of only 50 percent. fixed-length keys from author and title entries are demonstrably inefficient in performance since the information content is low. the distribution of the initial trigrams of 50,000 names from the inspec file provides corroboration of this fact. the number of possible combinations of three characters is 17,576 (26^3), yet only 3,285 trigrams were represented in the file, or 18.7 percent of the total variety. moreover, the relative entropy of the trigrams is much lower than that of the initial characters of the surnames, at 0.73. performance figures for precision illustrate this point.19

the present work, together with other studies of the scope for application of the variety-generator approach, thus stands in considerable contrast to prior work, and must be viewed as a means whereby the microstructure of particular data elements is fully reflected in their manipulation, affording substantial advantages.20 part 2 of this paper illustrates this in regard to searches of personal names.

acknowledgments

we thank m. d. martin of the institution of electrical engineers for provision of a part of the inspec data base and of file-handling software, and the potchefstroom university for c.h.e. (south africa) for awarding a national grant to d. fokker to pursue this work. we also thank dr. i. j. barton and dr. g. w. adamson for valuable discussions, and the former for n-gram generation programs.

references

1. p. b. schipma, term fragment analysis for inversion of large files (chicago: illinois institute of technology research institute, 1971).
2. j. c. costello and e. wall, "recent improvements in techniques for storing and retrieving information," in studies in co-ordinate indexing, vol. 5 (washington, d.c.: documentation inc., 1959).
3. l. h. thiel and h. s. heaps, "program design for retrospective searches on large data bases," information storage and retrieval 8:1-20 (feb. 1972).
4. s. c. bradford, documentation (london: crosby-lockwood, 1948).
5. g. k. zipf, human behaviour and the principle of least effort (cambridge, mass.: addison-wesley, 1949).
6. b. mandelbrot, "an informational theory of the statistical structure of language," in w. jackson, ed., communication theory (london: butterworth, 1953), p. 486-501.
7. r. a. fairthorne, "empirical hyperbolic distributions (bradford-zipf-mandelbrot) for bibliometric description and prediction," journal of documentation 25:319-43 (dec. 1969).
8. m. f. lynch, "the microstructure of chemical data-bases, and their representation for retrieval," proceedings, nato advanced study institute on computer representation and manipulation of chemical information (in press).
9. i. j. barton, s. e. creasey, m. f. lynch, and m. j. snell, "an information-theoretic approach to text searching in direct-access systems," communications of the acm (in press).
10. g. w. adamson, j. cowell, m. f. lynch, a. h. w. mclure, w. g. town, and a. m. yapp, "strategic considerations in the design of screening systems for substructure searches of chemical structure files," journal of chemical documentation 13:153-57 (aug. 1973).
11. a. c. clare, e. m. cook, and m. f. lynch, "the identification of variable-length, equifrequent character strings in a natural language data base," computer journal 15:259-62 (aug. 1972).
12. c. e. shannon, "a mathematical theory of communication," bell system technical journal 27:398-403 (1948).
13. w. c. b. sayers, a manual of classification for librarians and bibliographers (london: grafton, 1926).
14. c. a. cutter, c. a. cutter's alphabetic order table ... altered and fitted with three figures by kate e. sanborn (boston: boston library bureau, 1896).
15. c. p. bourne and d. f. ford, "a study of the statistics of letters in english words," information & control 4:48-67 (1961).
16. h. ohlman, "subject word letter frequencies; applications to superimposed coding," proceedings of the international conference of scientific information, vol. 2 (washington, d.c.: national academy of science, 1959), p. 903-16.
17. d. w. fokker and m. f. lynch, "a comparison of the microstructure of author names in the inspec, chemical titles and b.n.b. marc data-bases" (in preparation).
18. f. g. kilgour, p. l. long, a. l. landgraf, and j. a. wyckoff, "the shared cataloging system of the ohio college library center," journal of library automation 5:157-83 (sept. 1972).
19. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the asis 7:79-82 (1970).
20. i. j. barton, m. f. lynch, j. h. petrie, and m. j. snell, "variable-length character string analysis of three data-bases, and their application for file compression," proceedings, 1st informatics conf., durham, 1973 (in press).

editorial board thoughts: information technology in libraries: anxiety and exhilaration
mark cyzyk
information technology and libraries | september 2015

a few weeks ago a valued colleague left our library to move his young family back home to pittsburgh. insofar as we were a two-man department, i spent the weeks following the announcement of his imminent departure picking his brain about various projects, their codebases, potential rough spots, existing trouble tickets, etc.
he left, and i immediately inherited nine years' worth of projects and custom code, including all the "micro-services" that feed into our various well-designed, high-profile, and high-performing (thanks to him) websites.

this was all, naturally, anxiety-producing.

almost immediately, things began to break.

early on, a calendar embedded in a custom wordpress theme crucial to the functioning of two of our revenue-generating departments broke. the external vendor simply made disappear the calendar we were screenscraping. poof, gone. i quickly created an ok-but-less-than-ideal workaround and we were back in business, at least for the time being.

then, two days before the july 4 holiday, our calendar managers started reporting that our google-calendar-based system was disallowing a change to "closed" for that saturday. i somehow forced a closed notification, at least for our main library building, but no matter what any of us did we could not get such a notification to show up for a few of our other facilities. i spent quite a bit of time studying the custom middleware code that sits between our google calendars and our website, and could see where the magic was happening. i now think i know what to do -- and all i have to do is express it in that nutty programming language/platform that the kids are using these days, ruby on rails. i've never written a line of ruby in my life, but it's now or never.

a little voice inside me keeps saying, "you're swimming in the deep end now -- paddle harder, and try not to sink."

while these surprise events were happening, we also switched source code management systems, so a migration was in order there, my longingly-awaited new workstation came in (and i'm sure you all know how painstaking it is to migrate all idiosyncratic data/apps/settings to a new workstation and ensure it's all present, functioning, and secure before dban-nuking your old drives), we decommissioned a central service that had been in production since 2006, we fully upgraded our wordpress multisite including all plugins and themes, fixing what broke in the upgrade, and i got into the groove of working on any and all trouble tickets/change requests that spontaneously appeared, popping up like mushrooms in the verdant vale of my worklife.

this was all largely in addition to my own job.

so now i find myself surgically removing/stitching up code in recently-diseased custom wordpress themes, adding ruby code to a crucial piece of our website infrastructure, and learning as much as i can -- but quick -- about the wonderful and incredibly powerful bootstrap framework upon which most of our sites are built.

surely it's anxiety-producing? you bet.

mark cyzyk (mcyzyk@jhu.edu), a member of the ital editorial board, happily works and ages in the sheridan libraries, johns hopkins university, baltimore, maryland, usa.
but  it's  thrilling  and  exhilarating  was  well.    i'm  paddling  hard,  and  so  far  my  head  remains   above  water.    many  days,  i  just  can't  wait  to  get  to  work  and  start  paddling.       this  aging  it  guy  suddenly  feels  ten  years  younger!   (but  isn't  all  this  paddling  supposed  to  somehow  result  in  a  swimmer's  body?    patiently   waiting...)     reproduced with permission of the copyright owner. further reproduction prohibited without permission. prospector: a multivendor, multitype, and multistate western union catalog bush, carmel;garrison, william a;machovec, george;reed, helen i information technology and libraries; jun 2000; 19, 2; proquest pg. 71 prospector: a multivendor, multitype, and multistate western union catalog the prospector project represents a unique union catalog. the origin, goals, and design of the union catalog that uses the inn-reach system are presented. challenges of the union catalog include the integration of records from libraries that do not use the innovative interfaces system and the development of best practices for participating libraries. t he prospector project is a union catalog of sixteen libraries in colorado and wyoming built around the inn-reach software from innovative interfaces, inc. (iii).1 in 1997, the colorado alliance of research libraries (the colorado alliance) and the university of northern colorado submitted a joint grant proposal to create a regional union catalog for many of the major academic and public libraries in the region. the project would allow users to view library holdings and circulation information with a single query of the central database. the union catalog also would allow patrons to request items from any of the participating libraries and have them delivered to a nearby local library. however, unlike many of the other union catalogs in the country, prospector has several unique elements: • it is multistate (colorado and wyoming). • it is multisystem (incorporating systems from innovative interfaces and carl corporation; plans call for voyager from endeavor). • it is multi-library-type (academic, public, and special libraries). regional union catalogs representing the cataloged collections of libraries that are related by geography, subject, or library type have been extant for many years. early leaders in the field spearheaded locally developed systems such as the university of california's melvyl system and the illinois library computer systems organization's (ilcso) illinet online system, which became operational in 1980.2 the commercial integrated library system market began to emerge in the late 1980s and the 1990s with such vendors as innovative interfaces and its work with ohiolink through its inn-reach union catalog product, and the carl system.3 many major vendors now have union catalog solutions for a single physical union catalog, although most have the requirement that participating libraries all use the same integrated library system. an alternative approach that is also becoming popular, because of the heterogeneous nature of the ils marketplace and the widespread implementation of z39.50, is for libraries to create virtual union catalogs through broadcast searching. this solution is available from many ils vendors as well as through organizations such as oclc and its webz software. carmel bush, william a. garrison, george machovec, and helen i. 
reed there is not a single "right" answer for whether regional catalog searching and document delivery is best accomplished through a physical or virtual union catalog. each solution has benefits and drawbacks that must be balanced against the mix of vendors, economics, politics, and technical issues within a state. prospector is somewhat unusual in that it does create a single physical union catalog but allows for the incorporation of other library systems, made possible through a published specification from innovative interfaces. i prospector history, funding, and project goals colorado has a long history of resource sharing through a variety of programs, including use of the colorado library card statewide borrower's card and access to individual libraries' online catalogs through the access colorado library information network (aclin) and other regional catalogs. the colorado alliance has taken a leadership role within the state in promoting cooperation among major academic and public libraries in the areas of automation, joint acquisitions, and other cooperative endeavors. existing online catalog software enabled patrons to easily search individual online catalogs, but searching several catalogs was a tedious task requiring many steps. it has long been a goal of the alliance to have a true union catalog of holdings for all member libraries. to forward this goal, in 1997 the colorado alliance of research libraries and the university of northern colorado jointly applied for and received a grant from the colorado technology grant and revolving loan program to establish the colorado unified catalog, a unified catalog of holdings for sixteen of the major academic, public, and special libraries in colorado.4 the university of wyoming was included in the project through separate funding. the grant of $640,000 was used to develop a union catalog that would support searching and patron borrowing from a single database. the colorado alliance carmel bush (cbush@manta.library.colostate.edu) is assistant dean for technical services at the colorado state university libraries, fort collins; william a. garrison (garrisow@ spot.colorado.edu) is head of cataloging at the university of colorado at boulder (colo.) libraries; george machovec (gmachove@coalliance.org) is the associate director of the colorado alliance of research libraries, denver; and helen i. reed (hreed@unco.edu) is associate dean, university of northern colorado libraries, greeley. prospector i bush, garrison, machovec, and reed 71 reproduced with permission of the copyright owner. further reproduction prohibited without permission. and the university of northern colorado contributed an additional $189,500 of in-kind services to the unified catalog project. additionally, the colorado alliance contributed $119,000 of in-kind funds to support purchase of distributed system software. the colorado unified catalog project, later named prospector, was based upon the inn-reach software developed by innovative interfaces, inc. 
it included all innovative interfaces sites in colorado as of december 1996 as well as the carl system sites that were members of the nonprofit colorado alliance of research libraries.s the colorado unified catalog project had two major goals: • the development of a global catalog database containing the library holdings of the largest public and academic libraries in the region; and • the development of an automated borrowing system so that users at any of the participating libraries could easily request materials electronically from any other participating libraries.6 the union catalog would allow users to view library holdings and circulation information on titles with a single query of the global database. once titles were located, patrons could request available items and have them delivered to their home library. the grant proposal identified four major goals and outcomes of the project: access, equity, connections, and content and training. by creating a global catalog, the colorado unified catalog project would provide students, faculty, staff, and patrons free and open access to the union catalog via the internet. patrons from all participating libraries would have equal access to the combined holdings of all sixteen participating libraries, thus greatly enhancing resources available to patrons without the necessity of travel across the state. connectivity was greatly enhanced by the installation of high-speed internet access in the colorado alliance office where the union catalog server was housed. the unified catalog project amassed, in one place, the complete cataloged collections of the major libraries in the region creating a single, easy-to-use public interface. training for the catalog would be conducted in each library so that it could be integrated into the standard training and reference services of each participating library.? addressing statewide goals for libraries, the colorado unified catalog was designed to dovetail with an existing project in colorado called the access colorado library and information network (aclin) in several ways. the goal of aclin was to provide statewide searching of several hundred library catalogs in colorado through broadcast 239.50 searching. however, because of the large number of online library catalogs (too many z39.50 targets cause broadcast searching to be slow) and 72 information technology and libraries i june 2000 poor network infrastructure in some parts of the state, the creation of physical union catalogs, such as prospector, greatly enhanced the ability for a project such as aclin to be successful. as stated in the grant proposal it will: • make aclin more efficient since sixteen libraries will be grouped together and can be accessed via a single search, thus saving alcin users steps in searching; • enhance aclin's document delivery plans since patrons can make requests themselves; • offer both web and character interfaces for various levels of users; • provide access via aclin's dial-in ports as well as via the internet; and • support alcin's future developments based on a 239.50 environment.s work on the development of the colorado unified catalog began in mid-1997. even while contract negotiations were underway in midto late 1997, groups were busy undertaking discussions on the design and structure of the unified catalog. work on development of profiling and system specifications continued through july 1998. this data was entered onto the server at the colorado alliance office and a test database was created in august 1998. 
testing was completed in november 1998 and the first records were loaded in december 1998. the creation of the database for the first twelve libraries took seven months. during the database load the catalog was available for searching, although most participating libraries did not highlight the system in their local opacs. innovative interfaces, inc. conducted training on the actual patron requesting and circulation functions at three sites over the period from may through july 1999. as of january 2000 the catalog included more than 3.6 million unique bibliographic records of the twelve largest libraries in colorado (more than 6.6 million marc records have been contributed, which has resulted in 3.6 million unique records after de-duplication). with the database in place and opac and circulation training complete, prospector went "live" for patron-initiated requests in the first eight libraries on july 23, 1999. as of december 31, 1999, all twelve innovative sites were "live" in prospector. the final programming for loading the records from carl-system sites will be completed in spring 2000. it is anticipated that carl-system library records will be loaded in late spring 2000 and will bring the database to more than five million unique marc records, with more than ten million item records. since the receipt of the grant, two participating libraries have selected endeavor as their online integrated system . contract negotiations are underway between innovative interfaces and the reproduced with permission of the copyright owner. further reproduction prohibited without permission. colorado alliance to come to an agreement on loading records for the endeavor libraries into prospector. i politics and marketing of prospector planning and policy making are inherently political processes in which participants choose among goals and options in order to make decisions and to direct actions. for prospector the diverse makeup of multitype libraries and multisystems augured for different perspectives on implementation from the onset. nearly every department in member libraries would have an impact from the project. to be successful in carrying out their charges, the work of the task forces appointed to implement prospector had to address how these staff could influence the process and how local practices would be affected. the challenge was to engage staff in the process since the task force structure precluded representation from every member library. meeting this challenge would be vital to ensuring input and fostering buy-in and advocacy for prospector in member institutions. consequently, in addition to reviewing standards or best practices and focusing on the goals stipulated in the grant, obtaining factual knowledge about member practices and resources and encouraging communications served as key ingredients in planning and policy development. general process profiling prospector, a main charge for the cataloging/ reference task force, illustrates the general process employed in planning and how key ingredients were applied to gain input and produce results. the first step involved the task force's review of the grant's aims for the unified catalog. with that framework as a basis, a planning process was outlined and shared with participants. the prospector web site detailed the specification development process, including the schedule and opportunities for input. 
next the task force surveyed participants for information on their systems: bibliographic indexing rules, types of indexes, characters indexed in phrase indexes, indexes on which authority control performed, and suppliers of authority records. using this data, the task force identified the commonalties and differences to determine what to create in the unified catalog. members also consulted innovative interfaces and reviewed what previous innreach customers had established. draft recommendations for indexing, indexes, record overlay, and record display specifications were then posted on the prospector web site and participants requested to review and provide input. a notice in data/ink: the alliance newsletter (www.coalliance.org/ datalink) also referenced the site. at the same time, testing was performed using draft specifications in order to assess them and to check for other concerns that testing might reveal. because of the importance of the recommendations, an open forum was held to receive additional comments. following the forum, the task force members made final adjustments to the specifications. after the period for public comment ended, the specifications were submitted as recommendations to the prospector steering committee for approval. once approved, the specifications became official and were referenced in all site visits. issues because of the design of inn-reach, participants must make decisions about contribution of records, priorities for what record would serve as the master record, order of loading, indexing, indexes, and displays for the unified catalog. circulation functions require decisions about services for patron types, circulation statuses, loan periods, numbers of loans, renewals, recalls, checkouts, holds, overdues, fines, notices, pick-up locations, and billing. in the case of prospector, expectations regarding what would be controversial met with a few surprises. for example, the master record, the bibliographic record from one participating library to which holdings of other libraries are displayed, is based upon encoding level and the library priority list. the latter determines if the incoming record should replace an existing level; a record with a higher level will replace a lower one. based upon the data collected from libraries, a proposal categorized libraries into the following order: large, special, and "all others." the order was further factored by a member library's application of authority control and participation in program for cooperative cataloging programs. the proposal drew minimal comment from libraries. pride of ownership was not an obstacle. everyone was committed to the fullest authorized form of the record. how many loans an individual could request was the subject of early debate. there were concerns about discrepancies between local limits for borrowing and the possible setting of a higher number of loans on prospector. a corollary concern was that a high number might result in depleting a member library's collection. previous experience with borrowing by a subset of members shed light on the issue; there were no problems with loan limits. in fact, inn-reach supports "load leveling" across participating libraries randomly as well as by prospector i bush, garrison, machovec, and reed 73 reproduced with permission of the copyright owner. further reproduction prohibited without permission. precedence tables thus avoiding systematic checkout from one library only. 
members decided that they could always pass a request on to another owning library if necessary and monitor loans to determine if any abuses would develop. with these options, it then became possible to establish a forty checkout limit for individual patrons in prospector. differences in cataloging practices engendered more discussion because of the potential for a policy that might affect local practice. in the course of comparing practices of institutions, the cataloging/reference task force identified multiple records for the same serial titles that reflected differences in forms of entry and multiple versions treated either in separate records or on the same record. there was wide variety in statements of holdings. these differences warranted gathering further information on holdings; multiple versions, especially those involving electronic versions; and successive/latest entry for cataloging. the task force decided to hold a focus group on serials and invited staff in member libraries from serials, cataloging, and reference to attend. in the meantime, visits to participating libraries were instituted, the first of the roadshows, to discuss serials practices, their implications for overlays and displays, and options for handling them. the focus group attracted a large attendance and proved useful in gathering information about practices and the concerns of participating libraries regarding serials. most libraries reported individual practices for recording holdings. although participants expressed a desire for consistency, attendees also shared that resources are not available to retroactively change them. instead attendees encouraged development of a best practice recommendation that would follow the niso standards for those libraries wishing to change practices. with the exception of electronic versions of serials, focus group participants had no problem with multiple formats in the same bibliographic record as long as it was clear to users. electronic versions prompted a lot of questions about what to do with 856 links to restricted access resources and about changes in software. it was clear that this issue would need further investigation by the task force. the hottest area, successive or latest entry cataloging of serials, registered strong preferences by proponents. attendees did not welcome changing practice in either direction. instead there were questions asked about possible system changes and about the conduct of use studies to determine what problems might arise from latest entry records in the system. with the information gained from the focus group meeting, the task force assigned priority to the areas and pursued latest/ successive entry as the top priority. 74 information technology and libraries i june 2000 already the task force had consulted innovative interfaces, inc. and received a negative reply to possible changes to matching algorithms, loading programs, and record values that could deal with practices of participants because of the software structure. it was technically impossible for a latest entry and successive entry record to load separately given their match on the oclc number. the predominant use of successive entry and its status as the current national standard persuaded the task force initially to recommend coding latest entry in a special way so that the record for such an entry would not be the master record in the system unless it was unique. 
this interim measure led to the policy recommendation that successive entry serve as the standard for prospector. as a part of the recommendation, members are asked to not undertake retroactive conversion/ recataloging projects to change existing latest entry records. up to the meeting of the prospector board of directors, the serials policy was argued. the approval by the board illustrates that controversial issues may require that leadership commit their libraries to policies. marketing marketing incorporates an overall strategy of identifying patrons' needs, designing products to meet those needs, implementing the products, and promoting and evaluating them. the twin goals of prospector are: (1) one-stop shopping and expanded access regardless of location, and (2) an automated borrowing system to facilitate fast delivery of materials that addressed problems experienced by patrons in searching and obtaining materials. the grant proposal outlined a plan for member libraries to meet these goals through inn-reach software and the cooperative efforts of participating members. with the implementation of the unified catalog and patron-initiated borrowing, the next pieces of the strategy, promotion and evaluation, come into play. member libraries commitment to a cooperative venture takes time and energy. the support for prospector at the library director and dean level had to be translated to staff in member libraries whose efforts would be necessary to support the unified catalog and patron-initiated loans. staff members had to become acquainted with how prospector would benefit patrons and their work. hence internal promotion was a necessary component throughout planning and policy development and with implementation to users. because of the numbers of staff in member libraries, no one method would assure awareness of developments for prospector. the approach involved the alliance's newsletter (datalink), a prospector web site, electronic reproduced with permission of the copyright owner. further reproduction prohibited without permission. discussion lists, e-mail, correspondence, phone calls, documentation, training sessions, and many site visits. the site visits facilitated interaction across institutional lines and were important for discussing critical issues at the local level. in arranging for site visits, it was important to clarify what the staff members wanted to discuss. a general update on prospector might be followed by other technical sessions such as preparing the library's database for load into the prospector system. participants' questions emphasized the importance of sharing the plan for developing prospector and the basic concepts guiding the implementation planning and policy process as listed below. these concepts bore repeating because a staff member could have been hearing about prospector for the first time. • decisions and directions are guided by data and input gathered from participants, standards/best practices, system capabilities, and the aims for prospector described in the grant. • relatively few local practices are affected by participating in prospector. • inclusiveness in record contributions would build prospector into a rich resource for users; however, participating libraries can exert control over contributions. • global policies are developed for prospector only; local sites define their own local policies. • assistance is available to participating libraries in coming up with solutions for special circumstances. 
• prospector is not reinventing the wheel. although the multitype library and multisystem involvement would produce a new model of inn-reach, other inn-reach sites could serve as models. • think globally but act locally. more than a catchphrase, this statement acknowledges the reality of individual library circumstances and the balancing of prospector goals to maximize access and use of resources by patrons. patrons the design of the pac, a promotional brochure, and individual library public relations efforts all served to promote prospector's availability to users. prospector provides access via telnet and the web. the impetus, however, was to examine member webpacs and create a prospector webp ac that exemplified the best in menu design including caption descriptions, navigational aids, and consistency in display of elements among search screens. special attention was paid to providing example searches that would have appeal for the diversity of patrons served by the membership. after mulling over several name possibilities, the alliance staff suggested the name prospector for the unified catalog, connoting the rich mining history of the rocky mountain area. this identity found its depiction in a classic picture of a gold miner supplied by the colorado historical society. representing the user, the miner is the center panning for gold, an apt image for users exploring the richness of resources from the unified catalog. the incorporation of the image as the logo on the web site and the catalog was followed by its adoption for the entire cooperative venture. name recognition spread quickly. to facilitate promotion at member libraries, the alliance staff designed a brochure. the design features a brief description of the unified catalog, a list of members and information for patrons on how to connect, what's available on prospector, how to use the self-service borrowing, and how to view their circulation record. many libraries have web-mounted guides or paper handouts in their instructional service, using the alliance-designed brochure as a model. finally, staff in member libraries exercised individual approaches to promote prospector to users. denison library describes and provides a link to prospector on its web list of databases and help guides. colorado state university libraries devoted the front page of its library newsletter to "hunting for hidden gold," the introduction of prospector. a special newsletter for auraria's history faculty highlighted prospector in its database news section. the university libraries of the university of colorado at boulder describes the unified catalog in its web site on its state services page. more introductions came from instructional classes held by every member library. profile of participating libraries prospector is unique since it is multistate, multi type, and multisystem. of the sixteen members (see appendix a), almost all are located along the front range of the rocky mountains extending from laramie, wyoming, southward to colorado springs, colorado. only fort lewis college is located on the western slope of the mountains. despite the distances, a network of courier service connects all members. within the membership are eleven public and private academic libraries, three special libraries representing law and medicine, and two public libraries that serve almost one million registered patrons. twelve of the libraries operate innopac and are loaded into prospector. two libraries on the carl system are slated for loading in mid-2000. 
two other libraries are migrating to the voyager system by endeavor information systems in the summer of 2000. hopes are to incorporate them into the system in 2001. prospector i bush, garrison, machovec, and reed 75 reproduced with permission of the copyright owner. further reproduction prohibited without permission. description of how inn-reach works the inn-reach software is designed to provide a union catalog with one record per title with all of the libraries holding a title represented. after databases are loaded initially, the software automatically queues transactions that occur to bibliographic, item, order, or summary serial holdings records and sends those transactions up to the central catalog. staff in the local library has no extra work or steps to take to send transactions to the union catalog. the union catalog uses a "master" record to maintain only one bibliographic record per title. the "owner" of the master record is determined by several factors. a bibliographic record with only one holding library automatically has that library as the owner of the master record. if more than one library holds a title, the system uses an algorithm to determine which record coming into the system has the highest encoding level. the library that has the record with the highest encoding level becomes the owner of the record, and its version of the record is displayed and indexed in the catalog. in addition, a table is created which has a list of the libraries in priority order for determining the master record if two or more matching records enter the system with the same encoding level. for the prospector catalog, a survey was conducted of the participating institutions to determine which libraries might have the best or fullest records. questions in the survey included size of database, source of bibliographic records, participation in national projects (e.g., program for cooperative cataloging, oclc enhance), amount of authority work done and level of authority control in the local database, level of cataloging given to records, and type of institution. the task force charged with designing the catalog examined these surveys and determined a priority order of the participating institutions for selecting bibliographic records. the system also uses a set of match points each time a bibliographic record is added to the union catalog. whenever a match occurs, the system examines the encoding level of the incoming record and the library from which the record is coming to determine if a change in the master record is required. the existing record is overlaid by the incoming record if the master record holder is changed. the first check is done on the oclc record number. if there is a match on that, the system adds the holdings to the existing record. if there is no match on the oclc number, the system attempts to match on the isbn or issn in combination with the title in the 245 field. again, if a match occurs, the system adds the holdings to the existing record. if no match occurs, a new bibliographic record is added to the catalog. in addition, each library that has a local innovative interfaces system has the ability to exclude bibliographic, item, order, or check-in records from being sent to the 76 information technology and libraries i june 2000 union catalog. suppression may occur in each of these record types. 
the library may also choose to send a record to the union catalog but exclude it from public display in the union catalog or to suppress a record from displaying in the public catalog both locally and centrally. the inn-reach system has no central database maintenance module, though it does provide a staff mode in which to view records, to create lists, and to monitor transaction queues. the staff module that is available via a telnet connection allows authorized users to view those records that have been contributed to the union catalog but are not displayed to the public in the union catalog. for example, a library may contribute its order records to the union catalog but choose to suppress those records from public display; however, authorized staff may view these records in the inn-reach staff mode or create lists for collection development purposes that include those order records. circulation status of individual items and volumes also appears to the user. the prospector member libraries with local innovative interfaces systems also maintain a set of circulation or item status codes that display various messages to users of their individual public catalogs. the inn-reach system also has a set of circulation or item status codes. agreement was reached on what the status codes were to be in the central catalog, and each member library then had to map its local codes to the codes used in the central catalog to ensure proper message display in the union catalog. in some cases, the member libraries had to adjust local status codes. indexes for the prospector catalog were determined during the profiling process. in general, there are more indexes in the union catalog than are available in the member libraries' local catalogs. indexes in prospector include author, author/ title, library of congress subject headings, medical subject headings, library of congress children's subject headings, journal title, keyword, library of congress classification numbers, national library of medicine classification numbers, dewey decimal classification numbers, government documents numbers, oclc numbers, and special numbers (e.g., isbn, issn, music publisher numbers, etc.). the classification number indexes are derived using the classification numbers that appear in the defined marc tags for the various classification schemes in the bibliographic record and do not represent local call numbers. local call numbers are always stored at the item record level in the union catalog. it was decided that many local marc fields that are defined for local notes or local access would not transfer from the local catalog to the union catalog (e.g., 59x, 69x, 79x, 9xx) to avoid ambiguities and excessive heading conflicts. therefore, there may be access points or index entries in the local catalog that may not be available in the union catalog; the local reproduced with permission of the copyright owner. further reproduction prohibited without permission. catalog may still contain "richer" or "fuller" searching than the union catalog. the local catalog may have materials accessible in it as well that do not appear in the union catalog. patrons using a local catalog may transfer their searches up to prospector simply by clicking on a button in their local public catalogs and have the search automatically occur in the union catalog. patrons may access prospector directly either via the world wide web or via telnet. 
navigation between local catalogs and prospector as well as navigation within prospector has been designed to be clear and simple. patrons may also go from prospector either back to their local catalog or to the local catalogs of other member libraries. when a patron locates an item that he or she wishes to borrow from prospector, he or she may initiate the request for the item online. the borrowing and lending process is described below. prospector member libraries have been asked to be as inclusive as possible in contributing bibliographic records to the union catalog. member libraries have been asked to contribute the following: • items that users may borrow, including all monographic materials that circulate, and other material types as specified by individual institutions that are listed as available for circulation. • items that users may not borrow but may use onsite, including reference materials, archival materials, rare books, and others as determined by individual institutions. virtual items, such as electronic journals, which have ip limiting and authentication are included in this category. • items that are owned virtually which have urls or ip addresses that are open and unrestricted include government publications and selected home pages as determined by the local institution. bibliographic records that are contributed should have as full cataloging as possible for identification and retrieval. materials that are on reserve and other locally defined special materials (e.g., materials that have use restrictions placed upon them) may be excluded from prospector. the prospector union catalog will also include bibliographic and circulation information from libraries that do not use innovative interfaces as their local system vendor. i the integration of non-innovative libraries into inn-reach one of the major efforts in the prospector project was to be able to incorporate bibliographic, item, summary serial holdings, and acquisitions records from other vendors with the inn-reach union catalog software. in 1997, when the grant was written, it was envisioned that the system would incorporate libraries using two ils vendors-innovative interfaces, inc. and carl corporation-two of the major vendors in colorado at the time. twelve libraries used innovative interfaces and four used the carl system (denver public library, regis university, colorado school of mines, and the university of wyoming). however, in late 1999, the colorado school of mines and the university of wyoming decided to migrate to the voyager system by endeavor information systems (this is occurring in 2000). both of these institutions have still expressed an interest in being part of prospector, so they will need to be integrated in 2001 after they are stable on their new system. the remaining carl sites will be fully integrated in 2000. the integration of records that allows document requests from different vendors is being accomplished as follows: • innovative interfaces, inc. has published a set of specifications for how bibliographic, item, summary serial holdings, and acquisitions order records should be formatted to be loaded into the union catalog. • published specifications were also created for patron verification and for how document requests are to be transferred. • the alliance office is developing the software to package usmarc bibliographic records, item records, summary serial holding records, and order records to transfer to prospector. 
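the contribution guidelines above reduce to a per-record decision rule. the fragment below sketches that rule in python; the category names and record fields are assumptions for illustration, not the loader actually used by the member libraries.

# illustrative decision rule for whether a bibliographic record is sent to
# the union catalog, following the contribution guidelines listed above.
EXCLUDED_LOCATIONS = {"reserve", "restricted use"}   # assumed locally defined categories

def contribute(record):
    """return True if the record should be contributed to prospector."""
    if record.get("suppressed"):                     # the library chose to exclude it
        return False
    if record.get("location") in EXCLUDED_LOCATIONS:
        return False
    # circulating items, onsite-use items, and virtual items all qualify
    return record.get("category") in {"circulating", "onsite-use",
                                      "virtual-open", "virtual-restricted"}

print(contribute({"category": "circulating", "location": "stacks"}))   # True
print(contribute({"category": "circulating", "location": "reserve"}))  # False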
work is also being done so that document requests may be relayed between the different systems using an intermediate unix server running an sql database with a web interface for circulation to ill staff. because the carl and endeavor systems are built differently, the record updating may be done on a "batch" basis several times a day. patron verification, to determine if a carl or endeavor patron is in good standing before allowing a document request, will be done in realtime. i administrative and committee structures under provisions of the grant, the dean of libraries at the university of northern colorado provides administrative management for the project while the colorado alliance of research libraries houses the server, maintains the union catalog software, provides network connectivity, prospector i bush, garrison, machovec, and reed 77 reproduced with permission of the copyright owner. further reproduction prohibited without permission. develops the software to integrate the non-innovative sites into the union catalog, and provides ongoing system administration support for the project. a prospector steering committee comprised of deans and directors of three participating libraries provided general overview for the project during the initial stages. to carry out the initial work of the project, two task forces were appointed with responsibility for detailed design and implementation of the system: the catalog/reference task force and the circulation/document delivery task force. the catalog/reference task force was charged with making all bibliographic and display decisions relating to the catalog. this included establishing the criteria for determining which institution's bibliographic record displays in the catalog, developing display and overlay hierarchies for bibliographic records coming into the system, and identifying marc fields that would be indexed and displayed in the catalog. membership on this task force included both public services and technical services personnel, but did not include representation from every participating library.9 the circulation/document delivery task force was charged with developing common circulation policies to be applied in the union catalog including loan periods, fines, renewals, holds, recalls, checkout limits, and patron blocks. the task force was also responsible for developing the precedence table for routing patron requests. the members of this task force represented each participating library, and several libraries had representation from both their circulation and interlibrary loan department.lo these two task forces conducted meetings from july 1997 through december of 1999. the stage was set for the task forces' work at a training session held by innovative interfaces, inc. on system operation and functionality. each group received direction on what policy issues needed to be determined to lay the groundwork for establishing the codes that drive system functionality. after the initial training, each task force met several times a month, often consulting with innovative interfaces, inc. and/ or their local libraries as their planning and deliberations continued. communication was an important component during the development of the system. soon after the grant was awarded, staff from the alliance office visited each participating library and met with library personnel to explain the overall goals of the project and how work would be conducted. 
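the intermediate server just described is, in effect, a shared request queue plus a real-time verification call. the sketch below shows that division of labor in python, with an in-memory list standing in for the sql database; the row layout and function names are assumptions, and the calls into the local systems are stubs rather than real vendor interfaces.

# sketch of the intermediary: document requests are queued for batch relay
# several times a day, while patron verification happens in real time.
import datetime

request_queue = []   # rows the batch job will relay to the lending system

def patron_in_good_standing(patron_id, home_system):
    """real-time check against the patron's home system (stubbed object here)."""
    return home_system.verify(patron_id)            # placeholder for a vendor-specific call

def submit_request(patron_id, home_system, bib_id, pickup_location):
    if not patron_in_good_standing(patron_id, home_system):
        return "request refused: patron not in good standing"
    request_queue.append({
        "patron": patron_id, "bib": bib_id, "pickup": pickup_location,
        "queued_at": datetime.datetime.now(), "status": "pending",
    })
    return "request queued"

def relay_batch(lending_system):
    """run on a schedule: push pending rows to the lending system."""
    for row in request_queue:
        if row["status"] == "pending":
            lending_system.place_request(row)       # placeholder for the other ils
            row["status"] = "relayed"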
as detailed development progressed, open forums were held in central locations to keep representatives of all libraries apprised of progress and to get feedback regarding specific policy issues. completed work from the task forces was mounted on the prospector web site. in addition, regular articles appeared in data/ink, the alliance monthly newsletter. specific training sessions were conducted both by the task forces and by innovative interfaces. 78 information technology and libraries i june 2000 as the actual database loading process began, the catalog/reference task force conducted sessions at each prospector library. these sessions were twofold in purpose: to provide an opportunity for a general overview of how the database structure and indexing worked for all library personnel, and to train technical services personnel in how local coding of records impacted the display of their local records in the global catalog. in preparation for going live with patron requesting, innovative interfaces, inc. conducted pac searching and circulation training sessions at several central locations for frontline staff from all institutions. in addition, the circulation/ document delivery task force held a central session for representatives from all libraries to discuss issues relating to the flow of materials among libraries. during system implementation, it became apparent that some ongoing structure would be required for ongoing maintenance and development of the global catalog. in completion of their charges, each task force prepared a final report, which was submitted to the steering committee and to the prospector directors group. each task force recommended its own termination but outlined a structure to address ongoing issues. as approved by the prospector directors group, the ongoing governance structure is multilayered with frontline operations groups, broader planning and policy-setting committees, an advisory committee, a directors group, and electronic discussion lists for communication. monitoring of the day-to-day work of the cataloging and circulation/ document delivery operations is handled by frontline staff via e-mail, electronic discussion lists, and/ or telephone. broader planning and policy issues are addressed through smaller, representative standing committees. the advisory committee and directors group operate at a policy level. the new structure includes: • a catalog site liaison group comprised of one representative from each participating library and charged with serving as the point of contact for inquiries regarding catalog maintenance, access and record merging; • a catalog/reference committee comprised of members selected from the participating libraries and charged with responsibility for all bibliographic and display issues relating to prospector. this includes monitoring details of the current implementation as well as addressing ongoing policy issues, recommending system enhancements, testing new system functionality, and training staff at new sites coming into the system; • a document delivery site liaison group comprised of one or more representatives from each participating institution with responsibility to reproduced with permission of the copyright owner. further reproduction prohibited without permission. 
serve as a point of contact for other prospector libraries that have inquiries concerning issues, lost books, courier delivery, or related topics;
• a circulation/document delivery committee comprised of representatives selected from the participating libraries and responsible for issues relating to the courier delivery service, circulation load-balancing, monitoring member compliance with circulation policies, recommending system enhancements, testing new system functionality, and the year-end reconciliation of lost book charges; and
• a prospector advisory committee comprised of twenty-four deans and directors from participating libraries to address issues requiring quick response relating to project specifications and operating rules.
the prospector directors group is comprised of the deans/directors of all participating libraries and is charged with making recommendations on high-level policy and admission of new participants. since prospector is a project of the nonprofit colorado alliance of research libraries consortium, all final high-level decisions and financial commitments are subject to the approval of the board of directors of the consortium. at present, five of the sixteen prospector libraries are not part of the formal consortium but participate in this one project. the newly formed committees will continue to address broad policy and operational issues such as the load-balancing tables for routing patron requests to owning libraries, will document best practices for local libraries to follow in implementing certain functionality within their local system to achieve maximal results in the central catalog, will identify enhancements to the system, and will test new release functionality.
borrowing and lending policies and specifications
as a prelude to its work, the circulation/document delivery task force examined borrowing and lending practices from other innovative interfaces inn-reach sites and reviewed the borrowing policies for consortial borrowers that were developed and agreed to by a subset of alliance libraries (university of northern colorado, auraria library, and denver public library) several years ago. the first major duty of the task force was to establish circulation and document delivery policies that would govern those functions within the prospector system. these common circulation and document delivery policies were based on a series of assumptions:
• the task force policies apply to the unified catalog only; local sites define local policies;
• local workflow remains local purview;
• policies should be kept simple;
• circulating materials are commonly circulated materials, primarily books, at each site;
• the task force will work within the confines of the inn-reach system;
• if a patron is blocked locally, he or she will be blocked at the global level;
• for routing purposes, each institution (rather than branch) is the routing site; and
• local sites will determine when their items are declared lost.
the task force established a series of recommendations for policies that applied to the prospector system. the proposed policies were discussed within the local institutions as well as with various administrative groups.
the final policies for prospector lending as adopted and implemented in the system are: • loan period : twenty -one days • renewals: one • number of holds allowed : forty • checkout limit: forty items • recalls: none, except for academic library reserve collections • lost book charge: $100, which is comprised of a $75 refundable lost book charge and a $25 nonrefund able processing fee • libraries establish their own local rules for overdue fines on prospector materials . key features of the inn-reach software that were emphasized with each local library during training sessions are: • libraries have local control over what is loaned through the global catalog. • libraries have local control over which of their patrons can borrow materials through the global catalog. • if the local copy is checked out or missing, a copy may be requested through prospector. • the system is sensitive to multivolume works and allows particular volumes to be selected. the ongoing document delivery committee has developed a series of "best practices" that establish benchmark policies that each library is urged to adopt in the spirit of uniform cooperation among participating libraries. individual libraries, however, may choose not to adopt these practices. prospector i bush, garrison, machovec, and reed 79 reproduced with permission of the copyright owner. further reproduction prohibited without permission. system functionality the actual steps for a patron to request an item within the prospector system are simple and self-explanatory. once a patron has identified an item they wish to order, the following steps take place: • the user is prompted for institutional affiliation, name, and library card number. • the system checks local system to ensure that the patron is in good standing. • the user selects a pick-up location from those offered by their home institution. • the system forwards the patron request to an owning library with an available circulation status doing load balancing among the libraries with available copies. once the patron request is forwarded to a lending library, the request goes into the queue of requested items from that library. each library has established its own workflow for handling requests; however, that workflow must include interaction with the system to record the status of the request. once the item is located by the lending library, it is checked out to the requesting patron's "home" library and is sent, via courier, to that library. the "home" library then receives the item in the system and holds it pending pick-up by the patron. when the patron arrives to borrow that item, it is checked out to that patron's record according to the prospector loan rules. having a common set of loan rules for all prospector loans provides consistency for the patron. the patron may still have multiple due dates on items checked out at the same time depending on the loan rules for local checkouts. the system maintains statistics on several elements of the borrowing and lending processes. it tracks the total number of items borrowed and loaned and calculates the ratio of borrowing to lending per institution. in addition, it tracks the number of items cancelled and the reason why, the number of holds filled and cancelled, and several other groupings. i challenges and issues with the building of prospector still underway and public access available only since late july 1999, prospector is doing a respectable volume of loans in its infancy. 
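read as an algorithm, the requesting steps above are: verify the patron against the local system, collect the libraries holding an available copy, and choose one so that load is spread among them. the python sketch below mirrors those steps under the common prospector loan rules; the random choice reflects the load-balancing configuration noted in appendix b, and all names are illustrative rather than part of the inn-reach software.

# sketch of the prospector request flow described above: patron check,
# candidate selection among libraries with an available copy, and a simple
# random load-balancing choice (precedence tables were not yet in use).
import random

def route_request(patron, item_holdings, pickup_location, verify):
    """item_holdings maps library -> circulation status for the requested title."""
    if not verify(patron):                       # local system confirms good standing
        return None
    candidates = [lib for lib, status in item_holdings.items()
                  if status == "available"]
    if not candidates:
        return None
    lender = random.choice(candidates)           # random load balancing
    return {"lender": lender, "pickup": pickup_location,
            "loan_period_days": 21, "renewals": 1}   # the common prospector loan rules

request = route_request(
    patron="p123",
    item_holdings={"library_a": "available", "library_b": "checked out"},
    pickup_location="main desk",
    verify=lambda p: True,
)
print(request)   # routed to library_a under the common loan rules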
over ten thousand items were delivered during the first six months of operation. this number is expected to rise dramatically as the system grows and as local libraries promote the service. this auspicious start provides a sense of accomplishment tempered by recognition that there is more to do. some of the major challenges facing the project include:
• development is underway to integrate records for the carl system libraries into the central catalog and provide borrowing capabilities for their patrons.
• as member libraries choose other online system providers, ideally, these systems likewise need to be interfaced with the prospector system. coming to agreements with all vendors involved will require careful negotiation and wording of contracts. discussions are underway with innovative interfaces and endeavor information systems for merging endeavor libraries into inn-reach.
• monitoring how the fiscal accounting for the first end-of-year reconciliation of lost books will work is planned.
• developing best practices and evaluating software enhancements for inn-reach are necessary.
• we need to determine how to handle electronic resources and multiple formats, and to load records from commercial electronic resources, for example, netlibrary.
• we must improve matching within the system and make additional enhancements to the prospector web site.
• with growth of the system, full-time operations and management staff may be required. securing funding for the new ventures and new staffing will require development efforts or a sharing of costs by members; there is no state-based funding for ongoing maintenance and new product acquisition.
• with the increasing flow of materials between libraries, the courier delivery service must be monitored on an ongoing basis. the statewide courier service has recently been restructured and was contracted based on pre-prospector activity levels for interlibrary loan materials. with the ever-growing popularity of prospector, there will be a corresponding increase in volume for the courier. service levels need to be monitored closely to ensure that the speed of delivery is maintained and that the loss and incorrect routing rates stay within acceptable limits.
the balance of borrowing and lending will have financial impacts on some of the participating libraries. through a legislative allocation, the state library of colorado provides funding on a per-transaction basis to libraries that are net lenders, that is, libraries that loan more materials than they borrow. most libraries are considering prospector transactions as equivalent to interlibrary loan transactions and counting them toward the payment-for-lending program. it is anticipated that the inclusion of prospector activity in the interlibrary loan borrowing and lending statistics will significantly alter the balance of payment for lending among the prospector libraries.
already prospector has shown that it is changing behaviors. the cooperation between libraries has been impressive. in member libraries, staff are factoring prospector into their plans and realizing that keeping prospector operations staff informed of problems is a good habit. user searching and document delivery patterns are changing. margaret landrum, director at the fort lewis college library, predicts that prospector will have a dramatic effect on researchers in the geographic area.
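the payment-for-lending question turns on whether a library lends more than it borrows. the small python fragment below shows the arithmetic; the sample figures are merely of the same order as those reported in appendix b.

# lending-to-borrowing ratio and net-lender test, as used in the discussion
# of payment for lending; the figures below are illustrative six-month counts.
def lending_to_borrowing_ratio(loaned, borrowed):
    return loaned / borrowed

def is_net_lender(loaned, borrowed):
    return loaned > borrowed

loaned, borrowed = 3120, 1520
print(round(lending_to_borrowing_ratio(loaned, borrowed), 2))   # 2.05
print(is_net_lender(loaned, borrowed))                          # True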
its start has given all members a share in that expectation. i the future and interesting spin-offs union catalog projects often take on a "life of their own" far beyond what was originally envisioned. some of the future spin-offs may include: • the addition of other research libraries in nearby states. • collection overlap studies and improved coordination on acquisition and weeding projects between libraries. • with the full implementation of the union catalog, there are opportunities for resource sharing at a broader level. the central catalog has the functionality to support bibliographic records for and access to "consortia!" resources, thus enabling libraries to jointly purchase resources and provide centralized access to them. • as database and online information providers develop new methodologies for access to their resources, there will be opportunities to easily link from either the local or central catalog to these online resources, a process which is cumbersome and/or impossible in the nonglobal environment. for instance, where databases are centrally mounted at the alliance office with shared ownership, the link to serial holdings feature is pointed to prospector, thus providing patron access to consortiawide holdings. • use of the system as a central repository for cataloged metadata for electronic resources on the web. • encouraging innovative interfaces, inc. to allow document requests that "fail" in the system to be forwarded to national ill subsystems or commercial document suppliers using national standards. i conclusion prospector dramatically alters the bibliographic landscape in colorado, offering patrons easy access to the bibliographic wealth of the state. patrons will be easily able to move from a local catalog to this regional system and request materials. librarians will find the system useful for collection overlap studies, improved coordination on acquisitions and weeding projects, z39.50 links with other indexing/ abstracting services for serials holdings information (e.g., ovid or silverplatter), and expedited book delivery. the high level of cooperation among the diverse nature of the participating libraries is exemplary. the incorporation of public and private universities, public libraries, and special libraries offers a model for cooperation. references 1. anthony j. dedrick, "the colorado union catalog project," college and research libraries news 59, no. 10 (1998): 754-55; george machovec, "prospector: a regional union catalog," colorado libraries 25, no. 2 (1999): 43-45. 2. clifford a. lynch, "the next generation of public access information retrieval systems for research libraries: lessons from ten years of the melvyl system," l!'.formation technology and libraries 11, no. 4 (1992): 405-15; bernie sloan, "testing common assumptions about resource sharing," information technology and libraries 17, no. 1 (1998): 18-29. 3. thomas dowling, "ohiolink-the ohio library and information network," library hi tech 15, no. 3 / 4 (1997): 136-39; lindy naj, "the carl system at the university of hawaii uhm library," library software review 12, no. 1 (1993): 5-11. 4. gary pitkin and george machovec, colorado union catalog. senate bill 96-197. technology grant and revolving loan program. excellence in learning through technology. december 1996. grant proposal by the university of northern colorado and the colorado alliance of research libraries. 5. gary pitkin, colorado union catalog-prospector. final report. july 27, 1999. 6. 
machovec, "prospector: a regional union catalog." 7. ibid. 8. ibid. 9. prospector staff web site, www.coalliance.org/prospector. 10. ibid.
appendix a
general statistics about prospector:
• sixteen libraries (see below)
• twelve innovative interfaces sites (went live in fall 1999)
• two carl sites (to go live in 2000)
• two voyager endeavor sites (to be incorporated in 2001 pending final negotiations with both vendors)
• 3.6 million unique marc records as of january 2000, which are expected to grow to more than 5 million after the incorporation of the carl and endeavor sites.
• 9 million item records, which are expected to grow to more than 12 million after the incorporation of the carl and endeavor sites.
• currently 61 percent of the records in the system are held by only one library.
• more than 1 million registered patrons are possible users. denver public library has over 500,000 patrons and jefferson county public library has over 300,000 patrons.
• prospector url for public use: http://prospector.coalliance.org
• prospector staff url, which includes policies, committee minutes, and profiling tables: www.coalliance.org/prospector
prospector libraries and their web sites:
auraria library: http://carbon.cudenver.edu/public/library
colorado college: http://www.coloradocollege.edu/library
colorado school of mines: http://www.mines.edu/academic/library
colorado state university: http://manta.library.colostate.edu
denver public library: http://www.denver.lib.co.us
fort lewis college: http://library.fortlewis.edu
jefferson county public library: http://www.jefferson.lib.co.us
regis university: http://www.regis.edu/lib/wlibhome.htm
university of colorado at boulder: http://www.libraries.colorado.edu
university of colorado/colorado springs: http://web.uccs.edu/library
university of colorado/health sciences: http://www.uchsc.edu/library/index.html
university of colorado/law library: http://www.colorado.edu/law/lawlib
university of denver: http://www.penlib.du.edu
university of denver/law library: http://www.law.du.edu/library
university of northern colorado: http://www.unco.edu/library
university of wyoming: http://www-lib.uwyo.edu
appendix b
early borrowing/lending data
the borrowing and lending patterns in prospector will be of interest to monitor because of the wide variety of participating libraries in the system. the incorporation of both academic and public libraries has the potential for different use patterns than those seen in more homogeneous academic union catalogs. the following data represent some of the very early borrowing and lending patterns in prospector. all of the libraries in the table went "live" in terms of borrowing and lending in late july or august 1999, with the exception of jefferson county public library, which went live in november 1999. history with other similar projects has shown that use will grow dramatically as libraries and users gain familiarity with the service. the incorporation of denver public library in 2000 should have a significant impact on the service. at present (and in the accompanying table), prospector has been configured to do random load balancing without the use of any precedence tables to force document requests to one site or another.
prospector fulfillments report, august 1999 through february 14, 2000. borrowing sites run across the columns in the order aur, ccc, csu, cul, cub, du, dul, ftl, jcpl, uccs, uchsc, unc; each row gives the lending (owning) site, its ratio, its total, and its fulfillments to each borrowing site (each row omits the lending site's own column).
totals: 1879 930 2301 225 1520 1132 129 946 1775 882 364 2063
aur (ratio 0.89, total 1667): 108 282 33 232 187 17 113 234 128 70 263
ccc (ratio 0.72, total 673): 114 109 11 96 57 66 89 53 10 68
csu (ratio 0.86, total 1985): 267 156 29 272 221 18 130 288 134 55 415
cul (ratio 0.55, total 123): 24 9 20 5 11 12 3 10 7 3 19
cub (ratio 2.05, total 3120): 396 231 590 26 260 21 246 420 233 56 641
du (ratio 2.07, total 2341): 361 153 464 42 315 20 163 279 131 69 344
dul (ratio 1.12, total 145): 27 7 14 27 15 25 3 11 6 4 6
ftl (ratio 0.54, total 511): 66 36 130 3 66 36 7 72 31 11 53
jcpl (ratio 0.54, total 962): 187 81 201 11 154 65 11 64 33 38 117
uccs (ratio 1.02, total 900): 170 65 148 12 130 65 5 3 137 15 90
uchsc (ratio 0.83, total 301): 63 5 49 5 26 31 3 5 32 36 46
unc (ratio 0.69, total 1422): 219 81 291 27 207 153 13 89 222 90 30
monocle
marc chauveinc: conservator, university library of grenoble, saint-martin d'heres, france
a new processing format, based on marc ii and some of bnb's elaborations of marc ii. it further enlarges marc ii to encompass french cataloging practices and filing arrangements in french catalogs.
when the bibliotheque universitaire de grenoble, section sciences, wished to transform its card catalog into a book catalog and later into an on-line catalog, the first necessity was to build up a format fitted for the handling of complex records and the filing of non-alphabetical headings. after several personal assays at a format, the librarian at grenoble translated the marc ii and bnb formats into french (1, 2) to give french librarians the opportunity to become acquainted with them; finding these two formats the most flexible and complete of those reviewed, he also began the work of adapting them to french cataloging rules. the marc format is a standard format designed purely for communication of bibliographic records on magnetic tape; marc ii is a marc format containing library of congress cataloging data disseminated by the marc distribution service of the library of congress. the marc ii format is not intended as a local processing format; indeed, even the library of congress uses its own internal processing format and not marc ii. most centers using marc ii records have designed their own processing formats and file structures from which, if the center is to participate in a network, it must be possible to regenerate records in a communications format. the bnb format, one of the derivatives of marc, contains british national bibliography cataloging data. translation of the two formats was done in january 1969. subsequently a first french adaptation of them was discussed by a group of experts from the bibliotheque nationale and the direction des bibliotheques and was judged not good enough; deeper work was necessary to analyze the marc format and test its compatibility with french cataloging practices. the resultant new processing format, called monocle (projet de mise en ordinateur d'une notice catalographique de livre), was published in june 1970 (3).
programs
meanwhile, in order to test the format and to prepare the operational work as soon as possible, programmers attached to the institute of applied mathematics at the university began to write several programs in cobol.
cobol was chosen because the institute had good practice in that language, having worked with it for several years; because it can be easily modified if there is a change in format; and because it can be used with several types of computer, enabling other libraries to use it. the programs are still in the process of being written, but since the beginning of january 1970 all books cataloged by the library according to current practice have also been cataloged according to the new system and their records entered into the computer, so that both systems are now working simultaneously. the author catalog program, which is the most difficult and sophisticated, is not yet ready, but most of the following that were foreseen as necessary are actually working: 1) a test program (tstanaly) that checks the logical structure of the records at the input stage and displays on the printout any errors (fields missing, length of tags, of indicators, subfield codes, logical links between fields and information codes, etc. ) ; 2) a program ( expcreat) that creates the files, computes the directory and puts the records at their places on the disks; 3) a program ( tstnot ab ) for producing an alphabetical printed index containing author plus abridged title plus the address of the record on the disk; 4) a program for sorting records according to udc numbers and for printing them on a two-column weekly list; 5) a program to correct and update the created files; 6) a program for sorting records alphabetically in an annual catalog; 7) a program giving a list of udc numbers with the corresponding subject headings and vice-versa ; 8) several small modular programs for supplying statistics on the number of books and volumes, and expenditure in total and by subjects. input and output the institute of applied mathematics has two computers: an ibm monoclejchauveinc 115 360/40 and an ibm 360/67 that work together in a conversational mode during the day and in batch processing during the night. the library uses both of these modes. the conversational mode is controlled by a system called cp jcms (cambridge monitoring system) for the input of data through an ibm 1050 terminal with a paper-tape puncher and a reader, and the batch-processing mode by os (operating system) for the production of lists and statistics. on-line input through the terminal is very convenient for corrections, because of quick access to non-created provisory files of 100 records and the printed list that can be proofread. it has some inconveniences, however, the first of which is that it is a slow system. a typist punches the paper tape at an average rate of twenty records a day. taking into account the time of reply, errors of transmission, and breakdowns of the system, it is not possible to read more than fifty records in a morning, although theoretical speed of reading is forty records an hour. then the files have to be read through the tstanaly program, printed on the line printer, then controlled by the librarians, recalled and corrected on the 1050 terminal, and then again listed, controlled and so on until they are correct. it can take several days before a file of fifty records is ready. though paper is a convenient means of storing data in secmity in case of destruction of the files, it is a slow means of transmitting data and, because it may cause errors in transmission, is not very reliable. the 1050 terminal, although a typewriter, does not have a character set sufficient for library work. 
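the checking done by tstanaly is essentially structural validation of each record before it is accepted. the sketch below shows the kind of tests involved, written in python rather than in the cobol used at grenoble; the required fields and the exact rules are simplified assumptions for illustration.

# illustrative structural checks of the kind performed by tstanaly;
# the required fields and code rules here are simplified assumptions.
REQUIRED_TAGS = {"100", "245"}        # e.g., main entry and title must be present

def check_record(record):
    """record: list of (tag, indicators, [(subfield_code, value), ...]) tuples.
    returns a list of error messages for the printout."""
    errors = []
    tags_present = {tag for tag, _, _ in record}
    for tag in sorted(REQUIRED_TAGS - tags_present):
        errors.append(f"field missing: {tag}")
    for tag, indicators, subfields in record:
        if len(tag) != 3 or not tag.isdigit():
            errors.append(f"bad tag: {tag!r}")
        if len(indicators) != 2:
            errors.append(f"bad indicator length in {tag}: {indicators!r}")
        for code, value in subfields:
            if len(code) != 1 or not value:
                errors.append(f"bad subfield in {tag}: {code!r}")
    return errors

sample = [("100", "10", [("a", "durand"), ("m", "charles")]),
          ("245", "0", [("a", "la chimie")])]       # faulty indicator length
print(check_record(sample))   # ["bad indicator length in 245: '0'"]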
it was necessary to create multipunch codes for diacritical marks. because the foregoing is also an expensive means of input, the library is experimenting with a new one. using an ibm 72 tape typewriter already in the library, the corrections will be made off line with the two tape boxes existing on the machine, and when several tapes are correct they will be sent to an ibm service bureau to be translated into a computer magnetic tape. the translation program, which will be written by ibm staff, is not very expensive. output is on an ibm 1403 n1 line printer on which is used a special print train sn with upperand lower-case roman alphabet and to which diacritical marks have been added. products are 1) weekly lists of accessions according to the universal decimal classification, 2) weekly lists of books according to acquisition number, 3) weekly lists of books according to call number, 4) a monthly catalog by authors, 5) an annual catalog by authors, 6) an irregular catalog of periodicals, 7) an irregular catalog of serials, 8 ) an irregular catalog of theses, and 9) regular statistics on the work of the library. it was felt that for several years catalogs in book form would be less expensive and more useful than a system of on-line inquiry that would require display terminals to be used by untrained people. 116 journal of library automation vol. 4/3 september, 1971 format although it will be possible later on to transform monocle's internal format into one suitable for information retrieval , the system in use at grenoble is mainly conceived for printing of the lists enumerated above. this goal led to the consideration of the major problems of filing records and building an internal format to allow easy programming of correct filing , even if this correct filing is rather complicated for the computer. there were two possible ways to achieve this aim: one was to build a simple format and provide complex programming to introduce lists of dead words, tables of transcodification and translation (as "me" to "mac," "van nostrand" to "vannostrand" ) ; the other was to build a more complex format to make programming more simple and generalized and computer processing less expensive. the latter way was followed by the library of congress and the british national bibliography in their communications formats, so a start was made from these two projects, keeping most of their structure, tags and subfield codes. the system to be built, however, required a working format, not a communications format, which led to th e first modifications. two files were created, each containing leader, directory and variable fields. the two parts of each record can be reassembled into one marc record for a communications format. record files the first file, called the index (figure 1) , contains the leader slightly modified; field 008 of the marc format , put in fixed positions and having 69 characters; and the directory, built in a different way from the marc directory. since there will never be a field length of 9999 characters and a starting character position of 9999, length was reduced to 999 characters and the starting character position to 999. since twelve characters are too much for a normal field, these two numbers are only used for computation and are put in binary and both reduced to two bytes. this permits the insertion of three pieces of information between the tag and the field length: the subrecord indicator (two characters), the repeat indicator (one character) and the indicators (two characters). 
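as described above, a directory entry occupies twelve characters: tag, subrecord indicator, repeat indicator, indicators, then field length and starting character position held as two-byte binary numbers. the python fragment below illustrates that packing; it is an illustration of the layout only, not the original cobol record handling.

# packing and unpacking a monocle-style directory entry as described above:
# 3-byte tag, 2-byte subrecord indicator, 1-byte repeat indicator,
# 2-byte indicators, then length and starting position as 2-byte binary integers.
import struct

ENTRY = struct.Struct(">3s2s1s2sHH")    # 12 bytes in all

def pack_entry(tag, subrec, repeat, indicators, length, start):
    assert length <= 999 and start <= 999          # limits chosen for monocle
    return ENTRY.pack(tag.encode(), subrec.encode(), repeat.encode(),
                      indicators.encode(), length, start)

def unpack_entry(raw):
    tag, subrec, repeat, indicators, length, start = ENTRY.unpack(raw)
    return (tag.decode(), subrec.decode(), repeat.decode(),
            indicators.decode(), length, start)

raw = pack_entry("100", "00", "1", "00", 42, 118)
print(len(raw))                 # 12
print(unpack_entry(raw))        # ('100', '00', '1', '00', 42, 118)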
the directory takes the following form: tag (characters 1-3), subrecord indicator (characters 4-5), repeat indicator (character 6), indicators (characters 7-8), field length (characters 9-10, in binary), and starting character position (characters 11-12, in binary). bnb marc allows one digit for the subrecord indicator, which makes possible nine codes for nine subrecords. since monocle will require more than nine subrecords, two digits are used, thereby permitting 99 subrecords. the repeat indicator of one digit is necessary if several identical fields are repeated in one record (e.g., in the case of several editors). a cross reference can be directed towards one of these fields, and to prepare the sort field it is easier for the programs to look only for the tags than to test every "$a" in a field, which requires testing every character in the field. the repeat indicator has another function, that of linking several fields to be associated in the processing.
fig. 1. map of index and cataloging data records.
on the worksheet (figure 2), tags and indicators are written in the following order: tags, indicators, subrecord indicators, repeat indicators (e.g., 100 00 001). on the magnetic disk, however, the order is as follows: tag, subrecord indicator, repeat indicator, indicator (e.g., 100 001 00).
fig. 2. worksheet.
the second file is the main file. records in this file have the same general design as the marc ii communications records, and monocle has retained all the fields designed by the library of congress. each field begins with a two-character subfield code. grenoble does not use fields 001 to 009, but since the bibliotheque nationale will use these fields, monocle retains them. another characteristic of the second file is that records are input in random order and are given identification numbers that are their physical addresses on the disk. the address, which is put in the leader, is made up of ten digits, of which one is the number of the disk, four the number of the track and five the number of the record. access to every record is simple, since the identification number is also the physical address.
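because the identification number doubles as the physical address, locating a record is a matter of composing and decomposing ten digits, one for the disk, four for the track, and five for the record. the helpers below are a minimal python illustration of that scheme; the function names are my own.

# the monocle identification number is the record's physical address:
# one digit for the disk, four for the track, five for the record number.
def make_address(disk, track, record):
    return f"{disk:01d}{track:04d}{record:05d}"

def split_address(ident):
    return int(ident[0]), int(ident[1:5]), int(ident[5:10])

ident = make_address(2, 317, 48)
print(ident)                  # '2031700048'
print(split_address(ident))   # (2, 317, 48)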
a printed abridged alphabetical list giving author, title and this number indexes a printout of the main file. additions and corrections are made on this printout and then added to the computer file through a correction tape. the identification number is the access point. no supplementary internal index is needed, nor is any sequential search. there is direct access to every record in the file.
some fields have been added for monocle, some deleted, and some modified. the main field deleted is field 130 (main entry uniform title heading) because its place was considered to be in the group of title fields. accordingly fields 630, 730 and 930 are deleted. that is to say, they are kept in the format, but not used, as is the case with many other fields. field 008 contains codes different from those of the marc format. these 69 codes (see figure 1) are put in fixed position just after the leader and before the directory. this permits various studies and manipulations (statistics, sorts, etc.) without going to the main file, which is in a variable-length form and whose contents are therefore less easily accessible than those of fixed fields.
field 080 for universal decimal classification was not developed by the library of congress or bnb. for monocle it has been given a structure that permits differentiation of the call number (when the book is classified on the shelves according to the udc) from the udc number, which is only used for the card catalogs. in this structure "$a" represents the call number and "$b" represents the continuation of the udc number, as shown in the following example:
080 00 $a dur 539.143 $b (083) : 547.1
the colon instructs the computer to make a cross reference from the second number to the first.
in field 100, main entry author personal name, the general layout was retained, but the subfield codes were changed for filing purposes. as a matter of fact, the filing rules for personal names at the bibliotheque nationale differ in many respects from american library association rules. in designing monocle, the library tried all along to give filing value to subfield codes in order to simplify programming. for instance, the filing order for the same name is: saint, pope, emperor, kings of france, kings (other countries), forename, single surname plus forename. this gives:
john, saint
john, king of england
john
john, bishop of chartres
john, peter
john, peter, ed.
john, peter, advocate
therefore the following subfield codes have been adopted:
names $a
saint $b
pope $c
emperor $d
king of france $e
other kings $f (alphabetized by name of kingdom)
relator $g
date $h
numeration $i
precedent epithet $k
filing epithet $l
forename $m
this structure is closer to that of the bnb than to marc's, but an important change has been made in the indicators. marc and bnb indicators for this field were chosen for communications purposes and are therefore not necessarily convenient for internal processing. in fact, the program had to test every character and take action on some of them (delete a blank, transform a hyphen into a blank, etc.), which takes a lot of computer time. to facilitate construction of sort keys, a change of indicators was made that assigned to each of them a specific action. for first indicator 1 no action is assigned.
that is to say that a name monocle/chauveinc 121 is filed exactly as it is, whether it is a single surname or a compound surname: 100 10 $a durand $m charles " smith $m john ,, castro calvo $m frederico hoa tien su santa cruz $m alonso de eighty percent of names are put under this indicator and put in the sorting field without any test, which saves much computer time. first indicator 2 changes a hyphen into a blank in a compound name. the internal hyphen becomes a blank because it is filed as a blank: martin-chauffier martin chauffier pasteur vallery-radot pasteur vallery radot first indicator 3 is used for the compound names in which a character (blank, hyphen, apostrophe) is deleted: la fontaine (filed as lafontaine) mac innis (filed as macinnis ) o'neil (filed as oneil) von nostrand (filed as van nostrand) there seems nowhere a clear explanation of the reasons for creating a special field for family names (the use of this indicator in marc ii). for french libraries it is useless for filing purposes, family name being filed as a surname. first indicator 4 is used when a complex filing is necessary, that is to say, when the technique of inserting vertical bars (or any other characters) is used in the way proposed by r. coward. the use of this specific indicator for these three bars enables the program to test for them only when this indicator is present. this means that there is just one test per name instead of ten or twenty on each character of every name. as this indicator is in the directory, the processing of the names before the sorting itself is hastened. martin i du card i ducard dupon i de la cueriviere i lacueriviere me alester i macalester i me craw-hill i maccraw hill i muller i mueller i first indicator 0 also has a filing function. as names of saints and kings will be a small part of the files, and in order to file them correctly, three bars are inserted to mark omissions for alphabetization. 100 00 $a therese i d' ii a villa $b sainte 100 00 $a therese de i' ii enfant jesus $b sainte $k marie francoise therese martin in field llo the subfield codes of the communications format were not sufficient for a good filing. first, there seemed no reason to separate name (inverted) and name (direct order) because there is no difference in the 122 journal of library automation vol. 4/3 september, 1971 filing of these names, which is strictly alphabetical. there is also no logical difference between them. so monocle retains only two of these indicators: 10, for name of a corporate body entered under the name of a place and 20, for other corporate bodies. this will be useful either for research purposes or for giving priority in filing to the name of place following upon the other name. as there are the same filing problems as in the author field, the indicator 40 has been added, which means that the three vertical lines are used. 110 40 $c martin i von ii wagner universitat the subfield coding is rather succinct in the marc format, and a change was made from the bnb coding because french practice does not use form subheading and "treaty" subheading. moreover, under the name of a corporate body there can be a subheading such as "conference." this subheading has to be interfiled with a subheading of subordinate department and then should have a different code. library association. londres. conference. library association. londres. 
cataloging group the subfield codes are: $a french name of the corporate body ~ uniform title used by $b place i the bibliotheque nationale $c name $g relator $h name of congress or conference $1 subordinate department $j additional designation (number of the congress) $k date of the congress $m place of the congress $n remainder of the title $o type of jurisdiction $p name of larger geographic entity $q inverted element monocle does not use the "$t" proposed in marc, and the same is true with many other fields ( 410, 610, 710, 910). monocle makes important changes in the title fields, following british marc but going a little further. tags have been assigned to titles in the following order: 240 collective filing title (complete works) 241 uniform title ( bible) 242 original title 243 translated title (used only for the filing of russian or greek words according to the roman alphabet) 244 romanized title 245 title a book may have several titles, in which case they are filed under the name of the author in the numerical sequence of the tags. a collective monocle/chauveinc 123 title (the complete work ) is filed before a uniform title (if it exists), and the latter before an original title, which is in turn filed before an actual title. classical works of which there are many translations have to be regrouped under the original title, but this may not be true of scientific works or of popular novels, which are filed under actual title. moreover, filing of titles can be different in different libraries and for different books in the same library, which is why the filing order will not be determined on the worksheet, but by the program. this problem in filing order was raised by the bibliotheque nationale, which does not want to have determined in the record itself which of several titles will be the filing title; titles will be put under their respective tags according to their nature, and the program will, according to certain tests, choose the filing title. however, a completely satisfying solution to achieving flexibility and unambiguity in filing has not been arrived at. monocle now uses only sequences 240, 241 and 245, using about the same indicators as the marc format but with a slightly different meaning. the first indicators in field 241 have also been changed in order to achieve proper filing whether or not a conventional title contains a personal name. for example "exposition chagall" will be filed before "exposition bibliotheque nationale." the second indicator set to 't ' shows that there should be a cross reference from this title to the title used for filing (actual title to original title, alternative title to main title ). the second indicator set to "9" shows that the title is not significant and will not be used in a title catalog; field 900 is thus not used and repetition of the cross reference is avoided. monocle also employs in title fields the indicator "4" used in field 100 for complex names and an added indicator "5" for title without personal names. subfield codes have also been modified in such a way as to use their alphabetical value as filing value as well as to identify data elements within a field. the following codes are used in fields 240, 241, 242, 243, 244 and in corresponding fields 440-444, 7 40-7 44, 940-944 ) : $a title $b filing number for a logical order of the bible, koran, etc. 
$c adaptation or extract $d remainder of the title $e filing number for languages $f language $g filing number for dates $h dates $k name of person $1 epithet $m forename $p place $q corporate body the following are examples of this subfield code use: 124 journal of library automation vol. 413 september, 1971 241 50 $a bible $b 03 $d a. t. pentateuque, genese $c extraits $e 7 $f francais $h 1967 241 50 $a exposition $p paris $q bibliotheque nationale $h 1967 241 10 $a exposition $k chagall $m marc $h 1963 for field 245 marc indicators have been retained and "40" added for title with complex filing. these titles use the three vertical lines. 245 40 $a i le xxeme i vingtieme i siecle for more simple filing the virgule or slash is used to eliminate articles at the beginning of titles. this is more flexible than the use of one indicator to determine the number of characters to avoid in filing, especially as there can be more than nine characters to avoid. 245 00 $a the i chemistry of life the foregoing two techniques are used in all the fields x4y of monocle ( 445, 945, etc. ) . there are slight modifications in other fields. for example, in the "collation" field the american and british formats do not make any mention of volumes. as it comes first in monocle collation, the subfield codes of 260 are modified as follows: $a volumes $b height $c pagination $d illustration this situation may change if an international standardized catalog description is agreed upon. in fields 400, 600, 700 and 900 the marc and bnb marc projects have foreseen only one subfield "$t" to put the title after the name, and only one field, 740 or 940 for titles alone. to permit filing author-title series or an author-title added entry with titles of works of the same author, the following title fields were constructed in exactly the same way as fields 240-245: 440, 640, 740, 940. the following fields were added, with the same indicators and subfield codes as 240-245: 441, 442, 443, 444, 741, 742, etc. the repeat indicator is used to link the author to the title in order to make one entry, since author entry and title entry may be quite independent. 410 20 001 $c national research council 445 00 001 $a i publications $y 1708 100 00 $a meynell $m esther 241 00 $a the i little chronicle of anna magdalena bach $f francais $h 1957 245 01 $a la i petite chronique d'anna magdalena bach $c trad. par m. e. buchet 700 11 $a buchet $m m. e. $g trad. 900 10 001 $a bach $m anna magdalena $g auteur suppose 945 00 001 $a la petite chronique $r voir $z 241 000 945 00 002 $a laipetite chronique d'anna magdalena bach $r voir $z 241 000 monoclejchavveinc 125 this is a very useful tool, which permits generalization of the program to interfile records of books published by an institution with records of series published by the same institution, something not possible if one is under "$t" and the other under 245. the technique is not used, however, when the name is part of the title, as in "holden day series in mathematics." it is also useful because monocle treats large handbooks as series, which is more simple than using "$d" and "$e" in the 245 field and repeating the name of the treatise in every record or using the subrecord technique. field 502 has also been modified to permit filing dissertations by subject, towns, date and number. the details of the indicators and subfield codes can be found in monocle (3). one of the main problems encountered was the processing of multivolume sets. 
it was thought necessary to develop a provision to permit interfiling volumes of a multivolume set. there are three cases, the most simple being that in which volumes are simply numbered 1, 2, 3 ... with or without a title and a date by volume. field 505 is used in this case, with subfield codes slightly modified: $y volume number $a title $b subtitle $e remainder (date, pagination) following is an example: 505 00 $y 1 $a the practice of kinetics $e 1969, 450 p. $y 2 sa the theory of kinetics $e 1969, 436 p. in the second case, when each volume has authors, title, and date, the subrecord technique can be used, each volume having its own subrecord. this is possible only for treatises with few volumes, since the complete record cannot be too long. for very complicated handbooks the series technique is employed. a record is made for the main title as a guide record, and other records are made for each volume, the name of the main treatise being repeated in fields 400-445. this case could be treated by the subrecord technique, but this would give very long and complicated records, too long to be processed by computer and difficult to correct each time a new volume comes in. although the technique used is not very logical, the guide record is made only once, and a record is made for the volume only when it comes in, without any modification to the records already in the computer. when the records are sorted in alphabetical order, one entry will be made to the individual volume and by the "series note" will find its place under the guide record ( 3). there is of course no logical link internal to the file between records of different books of the same series, nor of them with their guide record. if there is a multivolume work as part of a series, in which each volume bears a different number in the series, there are two possibilities: either to use field 505 and 445 for each volume, linking them by the repeat indicator, or to use the subrecord technique. monocle 126 journal of library automation vol. 4/3 september, 1971 makes a choice according to the complexity of the records. at the request of the bibliotheque nationale and of some documentalists wishing to use the format for bibliographies of articles, some fields were added. field 270 contains name of the printer, the place and date of printing. indicators 00 subfield codes $a place $b printer's name $c date field 545 is the title of a periodical from which is extracted the article in the main entry. this tag was chosen because 500 is the note number (the title of the periodical is not an entry ) and 45 is the title number and can be constructed as a title field. indicators 00 subfield codes $a title $b subtitle $c year $d month $e day $y volume $f issue $g pagination $h bibliographical references "$y" was kept for volume for the sake of consistency throughout the format. since it was undesirable to alter marc fields 660 and 670, monocle employs 680-682 for french subject headings. however, name subject heading tags were retained as 600, 610 and 611, but with modified subfield coding. as in french filing geographical names are filed before topical names, the following tags were assigned: 680 geographical names 681 topical names 682 topical names for indexes only the last tag was created in order to differentiate between subject headings for information retrieval and headings for printed indexes only. if there is a relation between two headings, the slash is used between them to tell the computer to make an inverted entry. 
for example, 680 04 $a chemistry j physics gives two entries, one under chemistry and the other under physics. to allow each library to have its own subject heading system the second indicator is used to indicate this system: for example, 04 is for bibliotheque de grenoble. codes for monocle are partially taken from the british codes instead of the american ones because they are given a filing value. they are, however, slightly different, in that there is no form subdivision. subfield codes are as follows: $a heading $t chronological subdivision $u geographic subdivision $w general subdivision, 1st level $x general subdivision, 2nd level $y general subdivision, 3rd level $z general subdivision, 4th level monocle/chauveinc 127 the levels have been requested for some information retrieval systems that have multilevel thesauri. as a general rule, the attempt was to give a filing value to most of the subfield codes in order to simplify and hasten processing without any table of translation. the latter is always possible, but burdens the program. the library of congress has published a special format for serials. thinking it not very useful, and feeling that serials could be processed by the marc format for books, the librarians at grenoble simply added to the monocle format some fields specifically for serials, as follows: 030 coden 210 abbreviated title 515 525 not used 555 in monocle 503, bibliographic history, is used for the "followed by" and "following" notes of a periodical, because they are simply notes and not added entries. fields 780 and 785 are not necessary, since in a catalog an entry is usually not made for these titles. most periodicals are processed by the format without any trouble. the holdings of the library are put under 090 $b, as shown in the following example: 090 00 $a cbp. 185 $b 1, 1967$c 5732s. $a call number $b holdings $c location summary as stated at the beginning, the library of congress in its marc ii communications format has published the most comprehensive and the most detailed analysis of a bibliographical record. some, mostly documentalists, do not agree with the marc ii complexity in coding, but their aims are not the same as those of librarians who want, first, to catalog books and catalog records according to rules required for a catalog of a large stock of books. a simple, alphabetical sort on the author names is not adequate and is quite unusable by a reader. however, an arrangement that is good for a weekly bibliography may not be sufficient for a complete catalog. the british national bibliography made a thorough study of catalog entries and produced a better filing structure in accordance with the anglo-american rules. 128 journal of library automation vol. 4/3 september, 1971 monocle translated the marc format with slight modifications, but subsequent trials led to more modifications. monocle format has been made from a librarian's point of view, but sometimes a programmer's view of the system has brought about an improvement in it. monocle is working, but not without difficulties. these difficulties come not from the format itself but from the on-line system, which is not working as well as expected. the system organization may not be of the best and perhaps needs a thorough study before being put into operation. the format is not completely satisfactory and needs improvement. documentalists are right when they say it is too complex and expensive. 
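the slash device for subject headings described above, by which a 680 field such as "chemistry / physics" produces one entry under each term, can be sketched roughly as follows. the function name and data shapes are assumptions of this illustration, not monocle code.

def inverted_entries(heading):
    # "chemistry / physics" yields an entry under chemistry and an inverted entry under physics
    parts = [p.strip() for p in heading.split("/")]
    if len(parts) == 2:
        a, b = parts
        return [a + " / " + b, b + " / " + a]
    return [heading]

print(inverted_entries("chemistry / physics"))
# ['chemistry / physics', 'physics / chemistry']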
synthesis between the documentalist format, which is too simple, and the monocle format will be undertaken to simplify the worksheet and speed up input time. from the librarian's point of view there are still problems to be solved. processing of complex titles is not easy, elegant and clear. the analysis should go deeper to determine more logical relations between data, avoidance of duplication of information in the record, and speeding up of processing at every stage. the technique of links between fields and records is not developed in monocle as it is in other systems. it may be helpful to connect data by use of pointers and to do away with repetition of series notes that are already input elsewhere. hierarchical links between records should be useful. hence, there is much work still to be done, but the most immediate goal is to make the monocle format operational not only for the library of grenoble university for also for the bibliotheque nationale, which has adopted it for the automation of the bibliographie de la france. the philosophy behind the modifications introduced in converting the marc communications format to the monocle processing format can and should be discussed, but they have all been made in order to improve the structure of the record not only for an internal processing but also for the interfiling of records, which is much more complicated. until now work has been done only on descriptive cataloging and on author-title filing. subject indexing and information retrieval are quite another job. references 1. avram, henriette d.; knapp, john f. ; rather, lucia j.: the marc ii format: a communications format for bibliographic data (washington, d. c.: library of congress, 1968). 2. bnb marc documentation service publication no.1 (london: council of the british national bibliography, ltd., 1968) . 3. chauveinc, marc: monocle ; protect de mise en ordinateur d'une notice catalographique de livre (grenoble: universitaire de grenoble, 1970) . 242 marc program research and development: a progress report henriette d. avram, alan s. crosby, jerry g. pennington, john c. rather, lucia j. rather, and arlene whitmer: library of congress, washington, d. c. a description of some of the research and development activities at the library of congress to expand the capabilities of the marc system. gives details of the marc processing format used by the library and then describes programming work in three areas: 1) automatic tagging of data elements by format recognition programs; 2) file analysis by a statistical program called genesis; and 8) information retrieval using the marc retriever. the marc system was designed as a generalized data management system that provides flexibility in converting bibliographic descriptions of all forms of material to machine readable form and ease in processing them. the foundation of the system is the marc ii format (hereinafter simply called marc), which reached its present form after many months of planning, consultation, and testing. implementation of the system itself has required development of a battery of programs to perform the input, storage, retrieval, and output functions necessary to create the data base , for the marc distribution service. these programs are essentially like those of the marc interim system described in the report of the marc pilot project ( 1). 
briefly, they perform the following tasks: marc research and development/ avram 243 1) a pre-edit program converts records prepared on an mt /st to a magnetic tape file of ebcdic encoded record segments. 2) a format edit program converts the pre-edited tape file to a modified form of the marc processing format. 3) a content edit program generates records in the final processing format. at this stage, mnemonic tags are converted to numeric form, subfield codes may be supplied, implicit fixed fields are set, etc. 4) ibm sort program arranges validated content-edit output records by lc card number. this program is also used later in the processing cycle. 5) a generalized file maintenance program (update 1) allows addition, deletion, replacement, or modification of data at the record, field, or subfield levels before the record is posted to the master file. a slightly different version (update 2) is used to update the master file. 6) a print index program generates a list of control numbers for a given file. the list may also include status, date of entry, or date of last transaction for each record. 7) a general purpose print program produces a hardcopy to be used to proofread the machine data against the original input worksheet. since the program is table controlled, it can be modified easily to yield a great variety of other formats and it can be extended routinely to handle other data bases in the marc processing format. 8) two additional programs select new records from the marc master file and convert them from the processing format to the communications format on both sevenand nine-track tapes for general distribution. as the basic programs became operational, it was possible to investigate other aspects of the marc system that would benefit from elaboration and refinement. reports of some of this activity have found their way into print, notably a description of the marc sort program and preliminary findings on format recognition (2, 3), but much of the library·s research and development effort in programming is not well known. the purpose of this article is to give a progress report on work in three significant areas : 1) automatic tagging of data elements by format recognition programs; 2) file analysis by a statistical program called genesis; and 3) information retrieval using the marc retriever. in the following descriptions, the reader should bear in mind that all of the programs are written to accommodate records in the marc processing format. a full description of the format is given to point up differences between it and the communications format. all of the programs are written in assembly language for the ibm s360/ 40 functioning under the disk operating system (dos ) . the machine file is stored on magnetic tape and the system is operated in the batch mode. at present, the programs described here are not available for general distribution, but it is expected that documentation for some of them may 244 journal of library automation vol. 2/4 december, 1969 be filed with the ibm program information department in the near future. meanwhile, the library of congress regrets that it will be unable to supply more detailed information. it is hoped that the information in this article will answer most of the questions that might be asked. marc processing format the marc data base at the library of congress is stored on a ninechannel magnetic tape at a density of 800 bpi. 
the file contains records in the undefined format; each record is recorded in the marc processing format (sometimes called the internal format). data in the processing format are recorded in binary, packed decimal, or ebcdic notation depending on the characteristics of the data and the processing required. the maximum length of a marc processing record is 2,048 bytes. the magnetic tape labels follow the proposed standard developed by subcommittee x3.2 of the united states of america standards institute. a marc record in the processing format is composed of six parts: record leader (12 bytes), communications field (12 bytes), record control field (14 bytes), fixed fields (54 bytes), record directory (variable in length, with each directory entry containing 12 bytes), and variable data fields (variable length). all records are terminated by an end-of-record (eor) character.

record leader (character positions 0-11):
1) record length, 2 characters, positions 0-1 -- total number of bytes in the logical record, including the bytes of the record length itself; given in binary notation.
2) date, 3 characters, positions 2-4 -- date of last transaction (i.e., the date the last action was taken upon the whole record or some part of the record), recorded in the form yymmdd, with each digit represented by a four-bit binary-coded decimal digit packed two to a byte.
3) status, 1 character, position 5 -- a code in binary notation to indicate a new, deleted, changed, or replaced record.
4) not used, 1 character, position 6 -- contains binary zeros.
5) record type, 1 character, position 7 -- an ebcdic character to identify the type of record that follows (e.g., printed language material).
6) bibliographic level, 1 character, position 8 -- an ebcdic character used in conjunction with the record type character to describe the components of the bibliographic record (e.g., monograph).
7) not used, 3 characters, positions 9-11 -- contains binary zeros.

communications field (character positions 12-23):
1) record directory location, 2 characters, positions 12-13 -- the binary address of the record directory relative to the first byte in the record (address zero).
2) directory entry count, 2 characters, positions 14-15 -- the number of directory entries in the record, in binary notation. there is one directory entry for every variable field in the record.
3) record source, 1 character, position 16 -- an ebcdic character to show the cataloging source of the record.
4) record destination, 1 character, position 17 -- an ebcdic character to show the data bank to which the record is to be routed.
5) in-process type, 1 character, position 18 -- a binary code to indicate the type of action to be performed on the data base. the in-process type may signify that a new record is to be merged into the existing file; that a record currently in the file is to be replaced, deleted, or modified in some form; or that it is verified as being free of all error.
6) in-process status, 1 character, position 19 -- a binary code to show whether the data content of the record has been verified.
7) not used, 4 characters, positions 20-23 -- contains binary zeros.

record control field (character positions 24-37):
1) library of congress catalog card number, 12 characters, positions 24-35 -- on december 1, 1968, the library of congress initiated a new card numbering system. numbers assigned prior to this date are in the "old" system; those assigned after that date are in the "new" system (4). the library of congress catalog card number is always represented by 12 bytes in ebcdic notation, but the data elements depend upon the system. in the old numbering system: prefix, 3 characters, positions 24-26 -- an alphabetic prefix is left justified with blank fill; if no prefix is present, the three bytes are blanks; year, 2 characters, positions 27-28; number, 6 characters, positions 29-34; supplement number, 1 character, position 35 -- a single byte in binary notation to identify supplements with the same lc card number as the original work. in the new numbering system: not used, 3 characters, positions 24-26 -- contains three blanks; initial digit, 1 character, position 27 -- initial digit of the number; check digit, 1 character, position 28 -- "modulus 11" check digit; number, 6 characters, positions 29-34; supplement number, 1 character, position 35 -- see above.
2) not used, 1 character, position 36 -- contains binary zeros.
3) segment number, 1 character, position 37 -- used to sequentially number the physical records contained in one logical record. the number is in binary notation.

fixed fields (character positions 38-91): the fixed field area is always 54 bytes in length. fixed fields that do not contain data are set to binary zeros. data in the fixed fields may be recorded in binary or ebcdic notation, but the notation remains constant for any given field.

record directory (the first entry occupies character positions 92-103):
1) tag, 3 characters, positions 92-94 -- an ebcdic number that identifies a variable field. the tags in the directory are in ascending order.
2) site number, 1 character, position 95 -- a binary number used to distinguish variable fields that have identical tags.
3) not used, 3 characters, positions 96-98 -- contains binary zeros.
4) action code, 1 character, position 99 -- a binary code used in file maintenance to specify the field-level action to be performed on a record (i.e., added, deleted, corrected, or modified).
5) data length, 2 characters, positions 100-101 -- length (in binary notation) of the variable data field indicated by a given entry.
6) relative address, 2 characters, positions 102-103 -- the binary address of the first byte of the variable data field relative to the first byte of the record (address zero).
7) directory end-of-field sentinel, 1 character -- since the number of entries in the directory varies, the character position of the end-of-field terminator (eof) also varies.

variable data fields (indicators, delimiter, subfield codes, delimiters, data, terminator):
1) indicator(s), variable length -- a variable data field may be preceded by a variable number of ebcdic characters which provide descriptive information about the associated field.
2) delimiter, 1 character -- a one-byte binary code used to separate the indicator(s) from the subfield code(s). when there are no indicators for a variable field, the first character will be a delimiter.
3) subfield code(s) and data, variable length -- variable fields are made up of one or more data elements (5). each data element is preceded by a delimiter; a lower-case alphabetic character is associated with each delimiter to identify the data element.
these alpha characters are grouped. all variable fields will have at least one subfield code. each data element in a variable field is preceded by a delimiter. all variable fields except the last in the record end with an endof-field te1minator ( eof); the last variable field ends with an end-of-record terminator (eor). 250 journal of library automation vol. 2/4 december, 1969 format recognition the preparation of bibliographic data in machine readable form involves the labeling of each data element so that it can be identified by the machine. the labels (called content designators) used in the marc format are tags, indicators, and subfield codes; they are supplied by the marc editors before the data are inscribed on a magnetic tape typewriter. in the current marc system, this tape is then run through a computer program and a proofsheet is printed. in a proofing process, the editor compares the original edited data against the proofsheet, checking for errors in editing and keyboarding. errors are marked and corrections are reinscribed. a new proofsheet is produced by the computer and again checked for errors. when a record has been declared error-free by an editor, it receives a final check by a high-level editor called a verifier. verified records are then removed from the work tape and stored on the master tape. the editing process in which the tags, indicators, sub:field codes, and :fixed :field information are assigned is a detailed and somewhat tedious process. it seems obvious that a method that would shift some of this editing to the machine would in the long run be of great advantage. this is especially true in any consideration of retrospective conversion of the 4.1 million library of congress catalog records. for this reason, the library is now developing a technique called "format recognition." this technique will allow the computer to process unedited bibliographic data by examining the data string for certain keywords, significant punctuation, and other clues to determine the proper tags and other machine labels. it should be noted that this concept is not unique to the library of congress. somewhat similar techniques are being developed at the university of california institute of library research ( 6) and by the bodleian library at oxford. a technique using typographic cues has been described by jolliffe ( 7 ) . the format recognition technique is not entirely new at the library of congress. the need was recognized during the development of the marc ii format, but pressure to implement the marc distribution service prevented more than minimal development of format recognition procedures. in the current marc system a few of the fields are identified by machine. for example, the machine scans the collation statement for keywords and sets the appropriate codes in the illustration fixed field. in general, however, machine identification has been limited to those places where the algorithm produces a correct result 100 percent of the time. the new format recognition concept assumes that, after the unedited record has been machine processed, a proofsheet will be examined by a marc editor for errors in the same way as is done in the current marc system. since each machine processed record will be subject to human review, it will be possible to include algorithms in the format recognition program that do not produce correct tagging all of the time. 
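a minimal sketch, assuming the byte layout of the processing format given above, of unpacking one 12-byte directory entry: the tag in three ebcdic bytes, a one-byte site number, three unused bytes, a one-byte action code, then two-byte binary data length and relative address. this is a present-day illustration, not library of congress code.

import struct

def parse_directory_entry(entry):
    # entry: exactly 12 bytes taken from the record directory of a processing-format record
    assert len(entry) == 12
    tag = entry[0:3].decode("cp037")                      # cp037 is an ebcdic code page
    site, _unused, action, length, address = struct.unpack(">B3sBHH", entry[3:12])
    return {"tag": tag, "site": site, "action": action,
            "data_length": length, "relative_address": address}

# a fabricated entry: tag 245, site 0, no action, a 70-byte field starting at byte 104
sample = "245".encode("cp037") + struct.pack(">B3sBHH", 0, b"\x00\x00\x00", 0, 70, 104)
print(parse_directory_entry(sample))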
marc research and development/ avram 251 the format recognition algorithms are exceedingly complex, but a few examples will be given to indicate the nature of the logic. in all the examples, it is assumed that the record is typed from an untagged manuscript card (the work record used as a basis for the library of congress catalog card) on an input device such as a paper tape or a magnetic tape typewriter. the data will be typed from left to right on the card and from top to bottom. the data are input as fields, which are detectable by a program because each field ends with a double carriage return. each field comprises a logical portion of a manuscript card; thus the call number would be input as a single field, as would the main entry, title paragraph, collation, each note, each added entry, etc. it is important to note that the title paragraph includes everything through the imprint. identification of variable fields call number. this field is present in almost every case and it is the first field input. the call number usually consists of 1-3 capital letters followed by 1-4 numbers, followed by a period, a capital letter, and more numbers. there are several easily identifiable variations such as a date before the period or a brief string of numbers without capital letters following the period. the delimiter separating the class number from the book number is inserted according to the following five-step algorithm: 1) if the call number is law, do not delimit. 2) if the call number consists simply of letters followed by numbers (possibly including a period), do not delimit. example: hf5415.13 if this type of number is followed by a date, it is delimited before the blank preceding the date. example: ha12f 1967 3) h the call number begins with 'kf' followed by numbers, followed by a period, then: a) if there are one or two numbers before the period, do not delimit. example: kf26.l354 1966a b) if there are three or more numbers before the period, delimit before the last period in the call number. example: kfn5225f.z9f3 4) if the call number begins with 'cs71' do not delimit unless it contains a date. in this case, it is delimited before the blank preceding the date. example: cs7l.s889f 1968 5) in all other cases, delimit before the last capital letter except when the last capital letter is immediately preceded by a period. in this latter case, delimit before this preceding period. examples: ps3553.e73fw6 e595.f6fk4 1968 pz10.3.u36fsp tx652.5f.g63 1968 name main entry. the collation statement is the first field after the call number that can 252 journal of library automation vol. 2/4 december, 1969 be easily identified by analyzing its contents. the field immediately preceding the collation statement must be the title paragraph. if there is only one field between the call number and the collation, the work is entered under title (tagged as 245) and there is no name main entry. if there are two or three fields, the first field after the call number is a name main entry (tagged in the 100 block). when three fields occur between the call number and collation, the second field is a uniform title (tagged as 240). further analysis into the type of name main entry and the subfield code depends on such clues as location of open dates ( 1921) , date ranges covering 20 years or more ( 1921-1967), identification of phrases used only as personal name relators ( ed., tr., comp. ), etc. the above clues strongly indicate a personal name. 
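stepping back to the call number field, the five-step delimiting rule above can be approximated as follows. this is a rough, hypothetical rendering: "^" stands in for the marc delimiter, the call number is assumed to be typed in upper case as on the manuscript card, and the real format recognition program certainly handled more cases than this sketch does.

import re

def delimit_call_number(cn):
    date = re.search(r" \d{4}[A-Z]?$", cn)
    body = cn[:date.start()] if date else cn
    if cn == "LAW":                                           # step 1: never delimited
        return cn
    if re.fullmatch(r"[A-Z]{1,3}\d+(\.\d+)?", body):          # step 2: letters and numbers only
        return body + ("^" + date.group() if date else "")
    if cn.startswith("KF"):                                   # step 3: kf numbers
        digits = re.match(r"KF[A-Z]*(\d+)\.", cn)
        if digits and len(digits.group(1)) >= 3:
            i = cn.rfind(".")
            return cn[:i] + "^" + cn[i:]
        return cn
    if cn.startswith("CS71"):                                 # step 4: delimit only before a date
        return body + "^" + date.group() if date else cn
    # step 5: delimit before the letter group that opens the book number,
    # or before the period immediately preceding it
    i = [m.start() for m in re.finditer(r"[A-Z]+", body)][-1]
    if body[i - 1] == ".":
        i -= 1
    return body[:i] + "^" + body[i:] + (date.group() if date else "")

for cn in ["HF5415.13", "HA12 1967", "KF26.L354 1966A", "PS3553.E73W6", "TX652.5.G63 1968"]:
    print(delimit_call_number(cn))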
identification of an ordinal number preceded by punctuation and a blank followed by punctuation is strongly indicative of a conference heading. in the course of processing, delimiters and the appropriate subfield codes are inserted. subfield code "d" is used with dates in personal names; subfield code "e" with relators. example: mepsfde smith, john,f1902-1967,fed. analysis for fixed fields publisher is main entry indicator. this indicator is set when the publisher is omitted from the imprint because it appears as the main entry. the program will set this indicator whenever the main entry is a corporate or conference name and there is no publisher in the imprint statement. this test will fail in the case where there is more than one publisher, one of which is the main entry, but occurrences of this are fairly rare (less than 0.2 percent). biography indicator. four different codes are used with this indicator as follows: a = individual autobiography; b = individual biography; c = collected biography or autobiography; and d = partial collected biography. the "n' code is set when 1) "autobiographical", "autobiography", "memoirs", or "diaries" occurs in the title statement or notes, or 2) the surname portion of a personal name main entry occurs in the short title or the remainder of the title subfields. the "b" code is set when 1 ) "biography" occurs in the title statement, 2) the surname portion of a personal name subject entry occurs in the short title or the remainder of the title subfields, or 3) the dewey number contains a "b" or a 920. the "c" code is set when 1) "biographies" occurs in the title statement or 2) a subject entry contains the subdivision 'oiography." there appears to be no way to identify a "d" code situation. despite this fact, the biography indicator can be set correctly about 83 percent of the time. marc research and development/ avram 253 implementation schedule work on the format recognition project was begun early in 1969. the first two phases were feasibility studies based on english-language records with a certain amount of pretagging assumed. since the results of these studies were quite encouraging, a full-scale project was begun in july 1969. this project is divided into five tasks. task 1 consisted of a new examination of the data fields to see if the technique would work without any pretagging. new algorithms were designed and desk-checked against a sample of records. it now seems likely that format recognition programs might produce correctly tagged records 70 percent of the time under these conditions. it is possible that one or two fixed fields may have to be supplied in a pre-editing process. tasks 2 through 5 remain to be done. task 2 will provide overall format recognition design including 1) development of definitive keyword lists, 2) typing specifications, 3) determination of the order of processing of fields within a record, and 4) description of the overall processing of a record. when the design is completed, a number of records will go through a manual simulation process to determine the general efficiency of the system design. task 3 will investigate the extension of format recognition design to foreign-language titles in roman alphabets. task 4 will provide the design for a format recognition program based on the results of tasks 2 and 3 with detailed flowcharts at the coding level. the actual coding, checkout, and documentation will be performed as task 5. 
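the biography-indicator rules given above lend themselves to a loose approximation like the following. this is an illustration only: the real program worked on the full marc record rather than a dictionary of assumed field names, and the order of the tests matters because "biographies" contains "biography".

def biography_code(record):
    title = record.get("title", "").lower()
    notes = record.get("notes", "").lower()
    text = title + " " + notes
    surname_main = record.get("main_entry_surname", "").lower()
    surname_subj = record.get("subject_surname", "").lower()
    dewey = record.get("dewey", "").lower()

    # code a: individual autobiography
    if any(w in text for w in ("autobiographical", "autobiography", "memoirs", "diaries")) \
            or (surname_main and surname_main in title):
        return "a"
    # code c: collected biography or autobiography
    if "biographies" in title or record.get("subject_subdivision", "").lower() == "biography":
        return "c"
    # code b: individual biography
    if "biography" in title or (surname_subj and surname_subj in title) \
            or "920" in dewey or dewey.strip() == "b":
        return "b"
    return None   # code d (partial collected biography) is not detectable from these clues

print(biography_code({"title": "memoirs of a country doctor", "main_entry_surname": ""}))   # a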
according to current plans, the first four tasks are scheduled for completion early in 1970 and the programming will be finished later in the year. outlook it is apparent that a great deal of intellectual work must be done to develop format recognition algorithms even for english-language records and still greater ingenuity will be required to apply these techniques to foreign-language records. nevertheless, on the basis of encouraging results of early studies, there is evidence that the human effort in converting bibliographic records to machine readable form can be materially reduced. since reduction of human effort would in tum reduce costs, the success of these studies will have an important bearing on the rate at which current conversion activities can be expanded as well as on the economic feasibility of converting large files of retrospective cataloging data. genesis early in the planning and implementation of automation at the library of congress it became apparent that many tasks require information about the frequency of data elements. for example, it was helpful to know about the frequency of individual data elements, their length in characters, and the occurrence of marks of punctuation, diacritics, and specified 254 journal of library automation vol. 2/4 december, 1969 character strings in particular data elements. in the past, most of the counting has been done manually. once a sizable amount of data was available in machine readable form, it was worthwhile to have much of this counting done by computer. therefore, the generalized statistical program (genesis) was done as a general purpose program to make such counts on all forms of material in the marc processing format on magnetic tape files. any of a variety of counts can be chosen at the time of program execution. there are three types of specifications required for a particular run of the program: selection criteria; statistical function specifications; and output specifications. selection criteria record selection criteria are specified by statements about the various data fields that must be present in the records to be processed. field selection criteria specify the data elements that will actually be analyzed. processing by these techniques operates logically in two distinct stages: 1) the record is selected from the input file; i.e., the program must determine if a particular record is to be included in the analysis; and 2) if the record is eligible, the specified function is performed on selected data fields. it should be noted that records may be selected for negative as well as positive reasons. the absence of a particular field may determine the eligibility of a record and statistical processing can be performed on other fields in the record. record selection is optional; if no criteria are specified, all records on the input file will be considered for processing. since both record selection and field selection reference the same elements, specifications are input in the same way. selection of populations can be designated by tagging structure (numeric tags, indicators, subfield codes or any combination of these three), specified character strings, and specified characters in the bibliographic data. the following queries are typical of those that can be processed by genesis. how many records with an indicator set to show that the volume contains biographic information also have an indicator set to show that the subject is the main entry? 
how many records with a field tagged to show that the main entry is the name of a meeting or conference actually have the words "meeting" or "conference" in the data itself? table 1 shows the operators that can be used with record and field select statements. statistical function specification the desired statistical function is specified via a function statement. four functions have been implemented to date. they involve counts of occurrences of specified fields, of unique data within specified fields given a range of data values, of data within a specified range, and of particular data characters. in addition to counting the frequency of the specified element, genesis calculates its percentage in the total population.

table 1. operators of genesis, with examples of usage:
equals -- count all occurrences where the data represented by tag 530 equals "bound with".
not equal -- count all occurrences where the publication language code is not equal to "eng".
greater than or equal to -- count all occurrences of, and output, records that are greater than or equal to 1,000 characters.
less than or equal to -- count all occurrences of records entered on the marc data base before june 1, 1968 (less than or equal to 680601).
and -- count all occurrences where the publication equals "s" and the publication date is greater than or equal to 1960.
or -- count all occurrences of a personal name main entry (tag 100) with a relator (subfield code "e") that equals "ed." or "comp."

the first function counts occurrences per record of specified field selection criteria. this answers queries concerning the presence of given conditions within the selected records; for example, a frequency distribution of personal name added entries (tag 700). this type of count results in a distribution table of the number of records with 0 occurrences, 1 occurrence, 2 occurrences, and so forth. the second function, which counts occurrences of unique data values within a specified range, answers queries when the user does not know the unique values occurring in a given field but can state an upper and lower value. for example, the specific occurrences of publishing dates between 1900 and 1960 might be requested. the output in response to this type of query consists of each unique value lying within the range specified, with its frequency count. in addition, separate counts are given for values less than the lower bound and for values greater than the upper bound. the function is performed by maintaining in computer memory an ordered list of unique values encountered, together with their respective counts. as selected fields are processed, each new value is compared against the entries in the list. if the new value already appears in the list, its corresponding count is incremented. otherwise, the new value is inserted in the list in its proper place and the remainder of the list is pushed down by one entry. the amount of core storage used during a particular run is directly related to the number of unique occurrences appearing within the specified range. since the length of each entry is determined by the length of the bounds specified, the number of entries which can be held in free storage can vary from run to run. thus it is possible that the number of unique entries may fill memory before a run has been completed. when this happens, the value of the last entry in the list will be discarded and its count added to the "greater than upper bound" count.
in this way, while the user may not obtain every unique value in the specified range, he will obtain all unique values from the lower bound which can be contained in memory. he is then in a position to make subsequent runs using, as a beginning lower bound value, the highest unique value obtained from the preceding run. the third function processes queries concerning counts within specified ranges. when this function is used, unique values are not displayed. instead, the occurrences are counted by specified ranges of values. more than one range can be processed during a single run. on output, the program provides a cumulative count of values encountered within each range as well as the counts of those less than and those greater than the ranges. function four counts occurrences of particular data characters. an individual character may be specified explicitly or implicitly as a member of a group of characters. this allows the counting of occurrences of various alphabetic characters within specified fields. the current list of character classes that can be counted are: alpha characters, upper-case letters, lowercase letters, numbers, punctuation, diacritics, blanks, full (all characters included in above classes), nonstandard special characters, and any particular character using hex notation. it should be noted that there are various ways of specifying particular characters. for example, an "a" might be designated causing totals to accumulate for all alphabetics; or, a "u" and an "l" might be specified causing separate totals to be accumulated for upperand 1ower-case characters. in addition to the total counts for each class, individual counts of characters occurring within any class can be obtained for display along with the total count. output specifications formatted statistical information is output to the line printer. optionally, the selected records can be output on magnetic tape for later processing. limitations for the purpose of defining a query, more than one field may be specified for record and field selection, using as many statements as necessary. at present, however, the statistical processing for a particular run is performed on all of the run-criteria collectively. for example, separate runs of the program are required to obtain each frequency distribution. it is important to note that genesis is essentially a means of making marc research and development/ avram 257 counts. the statistical analysis of data is a complex task that requires sophisticated techniques. genesis does not have the capability to analyze data in terms of standard deviation, correlation, etc. but the output does constitute raw data for those kinds of analyses. although the four functions of genesis implemented to date do not, in themselves, provide a complete statistical analysis, they greatly lessen the burden of counting; and techniques for designating data elements to be counted suffice to describe extremely complex patterns. continued use of the program will no doubt provide guidelines for expansion of its functions. use of the program genesis has already provided analyses that are helpful in the design of automated procedures at the library of congress, as is indicated by the following instances. a frequency distribution of characters was made to aid in specifying a print train. an analysis of certain data characteristics has determined some of the specifications for the format recognition program described in an earlier section. 
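a small sketch of the second genesis function described above: counting unique values between a lower and an upper bound, with separate counters for out-of-range values and a cap standing in for a full memory. the names and the cap parameter are assumptions of this illustration, and a python dictionary replaces the ordered pushdown list of the original.

def count_unique_in_range(values, lower, upper, max_entries=1000):
    counts = {}
    below = above = 0
    for v in values:
        if v < lower:
            below += 1
        elif v > upper:
            above += 1
        elif v in counts or len(counts) < max_entries:
            counts[v] = counts.get(v, 0) + 1
        else:
            # the original instead discards the highest entry in the list and
            # adds its count to the "greater than upper bound" total
            above += 1
    return dict(sorted(counts.items())), below, above

dates = ["1899", "1923", "1923", "1957", "1957", "1957", "1968"]
print(count_unique_in_range(dates, "1900", "1960"))
# ({'1923': 2, '1957': 3}, 1, 1)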
genesis is providing many of the basic counts for a thorough analysis of the material currently being converted for the marc distribution service to determine frequency patterns of data elements. the findings should be valuable for determining questions about storage capacity, file organization, and retrieval strategy. although genesis is a new program in the marc system, there is little doubt that it is a powerful tool that will have many uses. marc retriever since the marc distribution service has been given the highest priority during the past two years, the emphasis in the implementation of the marc system has been on input, file maintenance, and output with only minimum work performed in the retrieval area. it was recognized, moreover, that as long as marc is tape oriented, any retrieval system put into effect at the library of congress would be essentially a research tool that should be implemented as inexpensively as possible. it did seem worthwhile, however, to build retrieval capability into the marc system to enable the lc staff to query the growing marc data base. query capability would answer basic questions about the characteristics of the data that arise during the design phases of automation efforts. in addition, it seemed desirable to use the data base in an operational mode to provide some needed experience in file usage to assist in the file organization design of a large bibliographic data base. the specifications of the system desired were: 1) the ability to process the marc processing format without modification; 2) the ability to query every data element in the marc record, alone or in combination (fixed fields, variable fields, the directory, subfield codes, indicators); 3) the ability to count the number of times a particular element was queried, to accumulate this count, print it or make it available in punched card 258 journal of library automation vol. 2/4 december, 1969 form for subsequent processing; and 4) the ability to format and output the results of a query on magnetic tape or printer hardcopy. to satisfy these requirements it was decided to adapt an operational generalized information system to the specifications of the library of congress. the system chosen was aegis, designed and implemented by programmatics, inc. the modification is known as the marc retriever. general description the marc retriever comprises four parts: a control program, a parser, a retrieval program, and a utility program. queries are input in the form of punched cards, stacked in the core of the ibm s /360, and operated on as though all queries were in fact one query. thus a marc record will be searched for the conditions described by all queries, not by handling each query individually and rewinding the input tape before the next query is processed. the control program is the executive module of the system. it loads the parser and reads the first query statement. the parser is then activated to process the query statement. on return from the parser, the control program either outputs a diagnostic message for an erroneous query or assigns an identification number to a valid query. after the last query statement has been parsed, the control program loads the retrieval program and the marc input tape is opened. as each record on the marc tape is processed, the control program checks for a valid input query. if the query is valid, the control program branches to the retrieval program. 
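a schematic sketch of the single-pass idea just described: every query is prepared first, and each record read from the file is then tested against all of them, so the tape is read only once. the predicate and output-field representation is an assumption of this illustration, not the aegis design.

def run_queries(records, queries):
    # queries: (predicate, fields_to_list) pairs standing in for parsed query strings
    results = {qid: [] for qid in range(len(queries))}
    for record in records:                                   # one pass over the "tape"
        for qid, (predicate, fields_to_list) in enumerate(queries):
            if predicate(record):
                results[qid].append({f: record.get(f) for f in fields_to_list})
    return results

queries = [
    (lambda r: r.get("mcpdate1") == "1967", ["mcrcnumb"]),
    (lambda r: "destouches" in r.get("100", "").lower(), ["245"]),
]
records = [{"mcrcnumb": "   68001234", "mcpdate1": "1967",
            "100": "destouches, l.", "245": "a sample title statement"}]
print(run_queries(records, queries))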
on return from the retrieval program, the control program writes the record on an output tape if the record meets the specifications of the query. after the last marc record has been read from the input tape, the control program branches to the retrieval program for final processing of any requested statistical function (hits, ratio, sum, avg) that might be a part of the query. the output tapes are closed and the job is ended. the parser examines each query to insure that it conforms to the rules for query construction. if the query is not valid, an error message is returned to the control program giving an indication as to the nature of the error. valid query statements are parsed and converted to query strings in polish notation, which permits mathematical expressions without parentheses. the absence of embedded parentheses allows simpler compiler interpretation, translations, and execution of results. the retrieval program processes the query strings by comparing them with the marc record data elements and the results of the comparison are placed in a true/false stack table. if the comparison result is true, output is generated for further processing. if the result is false, no action · takes place. if query expressions are linked together with "or" or "and'' connectors, the results in the true/false stack table are ored and anded together resulting in a single true or false condition. marc research and development/ avram 259 the utility program counts every data element (fixed field, tag, indicator, sub field code, data in a variable field) that is used in a query statement. the elements in the search argument are counted separately from those in the output specifications. after each run of the marc retriever, the counts can be printed or punched for immediate use, or they can be accumulated over a longer period and processed on demand. query language general. query statements for the marc retriever must be constructed according to a precisely defined set of rules, called the syntax of the language. the language permits the formation of queries that can address any portion of the marc record (fixed fields, record directory, variable fields and associated indicators and subfields). queries are constructed by combining a number of elements: marc retriever terms, operators, fixed field names, and strings of characters (hereafter called constants). the following sections describe the rules for constructing a query and the query elements with examples of their use. query formation. a query is made up of two basic parts or modes: the if mode which specifies the criteria for selecting a record; and the list mode which specifies which data elements in the record that satisfy the search criteria are to be selected for printing or further processing. in general, the rules that apply to constructing if-mode expressions apply to constructing list-mode expressions except that the elements in the list mode must be separated by a comma. a generalized query has the following form: if if-mode expression list list-mode expression; where: if if-mode expression list list-mode expression signals the beginning of the if mode. specifies the search argument. signals the beginning of the list mode. specifies the marc record data element( s) that are to be listed when the search argument specified in the if-mode expression is satisfied. the format of the query card is flexible. columns 1 through 72 contain the query which may be continued on subsequent cards. no continuation indicator is required. 
columns 73 through 80 may be used to identify the query if desired. the punctuation rules are relatively simple: one or more blanks must be used to separate the elements of a query, and a query must be terminated by a semicolon. queries that involve fixed fields take the following form:

if fixed-field-name1 = constant list fixed-field-name2;

where fixed-field-name1 is the name of a fixed field; = is any operator appropriate for the query; constant is the search argument; and fixed-field-name2 is the fixed field to be output if a match occurs. to query or specify the output of a variable field, the following general expression is used:

if scan (tag = nnn) = constant list scan (tag = nnn);

where scan indicates that a variable field is to be referenced; tag indicates that the tag of a variable field is to follow; = is the only valid operator; nnn specifies the tag of the variable field that is to be searched or output; and constant specifies the character string of data that is the search argument. the marc retriever processes each query in the following manner. each record in the data base is read from tape into core and the data elements in the marc record specified in the if-mode expression are compared against the constant(s) in the if-mode expression. if there is a match, the data element(s) specified in the list-mode expression are output.

key terms. the terms used in a query statement fall into two classes. the first group instructs the program to perform specified functions: scan, hits, avg, ratio, sum. the second group relates to elements of the record structure. the most important key terms in this class are: indic (indicator), ntc (subfield code), record (the entire bibliographic record), and tag (variable field tag). these terms are used to define a constant; e.g., tag = 100.

operators. operators are characters that have a specific meaning in the query language. they fall into two classes. the first contains relational operators, such as equal to and greater than, indicating that a numeric relationship must exist between the data element in the marc record and the search argument. the second class comprises the logical operators "and" and "or". the operators of the marc retriever are shown in table 2. in the definitions, c is the query constant and d is the contents of a marc record data element.

table 2. operators of the marc retriever:
= -- c equals d
> -- c is greater than d
≥ -- c is greater than or equal to d
< -- c is less than d
≤ -- c is less than or equal to d
≠ -- c is not equal to d
& -- "and" (both conditions must be true)
| -- "or" (at least one condition must be true)

a constant is either a string of characters representing the data itself (e.g., poe, edgar allan) or a specific variable field tag, indicator(s), and subfield code(s). constants may take the following forms:

cc -- where cc is an alphabetic or numeric character or the pound sign "#". when this form is used, the marc retriever will convert all lower-case alphabetic characters in the data element of the marc record being searched to upper-case before a comparison is made with the search argument. this conversion feature permits the use of a standard keypunch that has no lower-case capability for preparation of queries.

'cc' -- where cc can be any one of the 256 characters represented by the hexadecimal numbers 00 to ff.
this form allows nonalphabetic or nonnumeric characters not represented on the standard keyboard to be part of the search argument. when this form is used, the marc retriever will also convert all lowercase alphabetic characters in the data elements in the marc record being searched to upper-case before a comparison is made. @cc@ where cc can be any one of the 256 characters represented by the hexadecimal numbers 00 to ff. when this form is used, characters in the data element of the marc record being searched will be left intact and the search argument must contain identical characters before a match can occur. # the pound sign indicates that the character in the position it occupies in the constant is not to take part in the comparison. for example, if the constant were #ank, tank, rank, bank would be considered matches. more than one pound sign can be used in a constant and in any position. 262 journal of library automation vol. 2/ 4 december, 1969 specimen queries. the following examples illustrate simple query statements involving fixed and variable fields. if mcpdate1 = 1967 list mcrcnumb ; the entire marc data base would be searched a record at a time for records that contained 1967 in the first publication date field ( mcpdate1). the lc card number (mcrcnumb) of the records that satisfied the search argument would be output. if scan(tag= 100) = destouches list scan(tag=245); the personal name main entry field (tag 100) of each marc record would be searched for the surname destouches. if the record meets this search argument, the title statement (tag 245) would be output. in addition to specifying that a variable field is to be searched, the scan function also indicates that all characters of the variable field are to be compared and a match will result at any point in the variable field where the search argument matches the variable field contents. for example, if the if-mode expression is scan(tag = 100) =smith a match would occur on the following examples of personal name main entries (tag 100) : smith, john; smithfield, jerome; jones-smith, anthony. it is possible to include the indicators associated with a variable field in the search by augmenting the constant of the scan function as follows: if scan(tag = 100&indic = 10) = destouches list scan(tag = 245); where: indic 1 0 specifies that indicators are to be included. specifies that the first indicator must be set to 1 (the name in the personal name main entry [tag 100] is a single surname, specifies that the second indicator must be set to zero (main entry is not the subject). the personal name main entry field (tag 100) of each record would be searched and a hit would occur if the indicators associated with the field were 1 and 0 and the contents of the field contained the characters "destouches." if the record met these search criteria, the title statement (tag 245) would be output. it is also possible to restrict the search to the contents of one or more subfields of a variable field. for example: if scan ( tag = loo&indic = 10&ntc = a) =destouches list scan(tag=245); where: ntc a indicates that a subfield code follows. specifies that only the contents of subfield a are to be included in the search. note that in this form the actual subfield code "a" is converted to "a" by the program (see section on constants) . marc research and development/ avram 263 special rules. so far the discussion has concerned rules of the query language that apply to either the if mode or the list mode. 
this section and the remaining sections will discuss those rules and functions that are unique to either the if mode or the list mode. in the if mode, fixed and variable field expressions can be anded or ored together using the logical operators & and j. for example: if mcpdate1 = 1967&scan(tag = 100) = destouches list scan(tag = 245); this query would search for records with a publication date field (mcpdate1) containing 1967 and a personal name main entry field ( tag 100) containing des touches. if both search criteria are met, the title statement field (tag 245) would be printed. in the list mode more than one fixed or variable field can be listed by a query as long as the fixed field names or scan expressions are separated by commas. for example: if scan(tag = 100) = destouches list scan(tag = 245) , mcrcnumb; the list mode offers two options, list and listm, which result in different actions. list indicates that the data elements in the expressions are to be printed, and listm indicates that the data elements in the expression are to be written on magnetic tape in the marc processing format. it is often desirable to list a complete record either in the marc processing format using listm or in printed form using list. in either case, the listing of a complete record is activated by the marc retriever key term record. for example: if scan (tag= 100) = destouches list record; the complete record would be written on magnetic tape in the marc processing format instead of being printed out if listm were substituted for list in the above query. four functions can be specified by the list mode. hits signals the marc retriever to count and print the number of records that meet the search criteria. for example: if scan(tag=650) = automation list hits; ratio signals the marc retriever to count both the number of records that meet the search criteria and the number of records in the data base and print both counts. the remaining two list functions permit the summing of the contents of fixed fields containing binary numbers. sum causes the contents of all specified fields in the records meeting the search criteria to be summed and printed. for example : if mcrcnumb = ·~~~68 # #####' list sum ( mcrlgth ); the data base would be searched for records with lc card number field 264 journal of library automation vol. 2/4 december, 1969 ( mcrcnumb) containing three blanks and 68 in positions one through five. the remaining positions would not take part in the query process and could have any value. if a record satisfied this search argument, the contents of the record length field (mcrlgth) would be added to a counter. when the complete data base had been searched, the count would be printed. avg performs the same function as sum and also accumulates and prints a count of the number of records meeting the search criteria. use of the program the marc retriever has been operational at the library of congress since may 1969 and selected staff members representing a cross-section of lc activities have been trained in the rules of query construction. the applications of the program to the marc master file include: identification of records with unusual characteristics for the format recognition study; selection of titles for special reference collections; and verification of the consistency of the marc editorial process. 
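pulling the pieces of the query language together, the following sketch evaluates a query that has been reduced to postfix ("polish") form with a true/false stack, using a scan-style comparison that upper-cases the field data, matches anywhere in the field, and treats "#" as a one-character wildcard. the token representation is invented for this example and is not the aegis internal form.

def scan_match(field_data, constant):
    # match the constant at any position in the field, "#" matching any single character
    data = field_data.upper()
    n = len(constant)
    return any(all(c == "#" or c == w for c, w in zip(constant, data[i:i + n]))
               for i in range(len(data) - n + 1))

def eval_query(postfix_tokens, record):
    stack = []
    for tok in postfix_tokens:
        if tok in ("&", "|"):
            b, a = stack.pop(), stack.pop()
            stack.append((a and b) if tok == "&" else (a or b))
        else:
            field, constant = tok            # a comparison leaf, e.g. ("100", "DESTOUCHES")
            stack.append(scan_match(record.get(field, ""), constant))
    return stack.pop()

# if scan(tag=100) = destouches & scan(tag=245) = #ank, in postfix: both leaves, then "&"
tokens = [("100", "DESTOUCHES"), ("245", "#ANK"), "&"]
print(eval_query(tokens, {"100": "Destouches, L.", "245": "tanks and their history"}))   # True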
as the file grows, it is expected that the marc retriever will be useful in compiling various kinds of bibliographic listings, such as translations into english, topical bibliographies, etc., as well as in making complex subject searches. the marc retriever is not limited to use with the marc master file; it can query any data base that contains records in the marc processing format. thus, the legislative reference service is able to query its own data base of bibliographic citations to produce various outputs of use to its staff and members of congress. because the marc retriever is designed to conduct searches from magnetic tape, it will eventually become too costly in terms of machine processing time to operate. it is difficult to predict when the system will be outgrown, however, because its life span will be determined by the growth of the file and the complexity of the queries. meanwhile, the marc retriever should provide the means for testing the flexibility of the marc format for machine searching of a bibliographic file.

references
1. u.s. library of congress. information systems office: the marc pilot project (washington, d.c.: 1968), pp. 40-51.
2. rather, john c.; pennington, jerry g.: "the marc sort program," journal of library automation, 2 (september 1969), 125-138.
3. recon working task force: conversion of retrospective catalog records to machine-readable form (washington, d.c.: library of congress, 1969).
4. u.s. library of congress. information systems office: subscribers guide to the marc distribution service, 3d ed. (washington, d.c.: 1969), pp. 31-31b.
5. ibid., p. 40.
6. cunningham, jay l.; schieber, william d.; shoffner, ralph m.: a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley, calif.: institute of library research, university of california, 1969), pp. 85-94.
7. jolliffe, john: "the tactics of converting a catalogue to machine-readable form," journal of documentation, 24 (september 1968), 149-158.

a random sample of personal names in the lc file indicates that less than 17 percent of personal names require cross-references. thus the personal name headings that occur only once but would require authority records because of cross-references could be less than 17 percent. the frequency data combined with reference structure data could have a significant impact on design. out of a total of 695,074 personal names in the authority files associated with the marc bibliographic files examined here, 456,328, or 66 percent, occur only once. of these, fewer than 77,575 would be expected to have cross-references; thus the name-authority file for personal names could be reduced in size from 695,074 records to 316,321, a 55 percent decrease. if separate authority records are a system requirement, the occurrence figures might then be useful for defining configurations that employ machine-generated provisional records for single-occurrence headings that do not have reference structures, or that simplify in other ways the treatment of these headings. these figures may also be useful in making decisions on the addition of retrospective authority records to the automated files.

reference
1. william gray potter, "when names collide: conflict in the catalog and aacr2," library resources & technical services 24:7 (winter 1980).

rlin and oclc as reference tools
douglas jones: university of arizona, tucson.
the central reference department (social science, humanities, and fine arts) and the science-engineering reference department at the university of arizona library are currently evaluating the oclc and rlin systems as reference tools, to see if their use can significantly improve the effectiveness and efficiency of providing reference service. a significant number of the questions received by our librarians, and presumably by librarians elsewhere, involve incomplete or inaccurately cited references to monographs, conference proceedings, government documents, technical reports, and monographic serials. if by using a bibliographic utility a librarian can identify or verify an item not found in printed sources, then effectiveness has been improved. once a complete and accurate description of the item is found, it is a relatively simple task to determine whether or not the library has the item, and if not, to request it through interlibrary loan. additionally, if the efficiency of the librarian can be improved by reducing the amount of time required to verify or identify a requested item, then the patron, the library, and, in our case, the taxpayer, have been better served. the promise of near-immediate response from a computer via an online interactive terminal system is clearly beguiling when compared to the relatively time-consuming searching required with printed sources, which frequently provide only a limited number of access points and often become available weeks, months, or even years after the items they list. we realize, of course, that the promise of instantaneous electronic information retrieval is limited by a variety of factors, and presently we view access to rlin and oclc as potentially powerful adjuncts to, not replacements for, printed reference sources. given that rlin and oclc have databases and software geared to known-item searches for catalog card production, our evaluation attempts to document their usefulness in reference service.

a preliminary study conducted during the spring semester of 1980-81 indicated that approximately 50 percent of the questionable citations requiring further bibliographic verification could be identified on oclc or rlin. the time required was typically five minutes or less. successful verification using printed indexes to identify the same items ranged from 20 percent in the central reference department to 50 percent in science-engineering. time required per item averaged approximately fifteen minutes.

based on our findings, we plan a revised and more thorough test during the fall semester of 1981-82, which will include an assessment of the enhancements to the rlin system scheduled to be operational this summer. the proposed test will involve eight members of the reference staff, four from each department, who will be trained to search on oclc and rlin. those selected will include both librarians and library assistants who regularly provide reference assistance. the results obtained from such a representative group will better enable us to assess the impact on the whole reference staff should we later decide to fully implement the service. they will be the only ones involved in sampling questions and conducting comparative searches. the test will have two components, the first of which will be a twenty-week period to collect at least 400 sample questions.
during their regularly scheduled reference hours, the eight specially trained librarians 'will collect samples of reference requests for materials that, based on the information initially given by the patron, cannot be identified in the card catalog. after checking the catalog, the librarian will then complete the top portion of a two-page selfcarbon form with all of the information that is known about the requested item. then, at regular intervals during the semester, the pages of each form will be separated and distributed to other members of the test staff for batch-mode searching. the manual oclc and rlin searching for each query will be done by different staff members to eliminate crossover effects. each request will be searched on both oclc and rlin with the following information being recorded: 1. date of the material requested (if known). 2. type of material (e.g., conference proceeding). 3. amount of time required to do the search. 4. success or failure of the search. this information will then be cumulated in a statistical table, and the results of each search will be keypunched for computerized analysis using the bmdp (biomedical computer programs) statistical package to determine whether or not effectiveness and efficiency have been improved significantly. in addition, on twenty-four randomly selected days during the semester the trained searchers will count the total number of questions received by them on that day that would have been appropriate to search on rlin or oclc. by using these data it will be possible to extrapolate the potential usefulness of the systems for the entire semester. the second component of the test will be a two-week real-life test during which all questions requiring further verification would be searched immediately on rlin , oclc, and in the appropriate printed sources to compare time required, success rate, and type of material requested. this sort of test would permit the searcher to continue to negotiate with the patron as the search progressed, which is the usual situation. also, this would provide the only opportunity to have the patron judge the value of subject searches done on rlin. if funding is received, preliminary results should be available in early 1982. anyone conducting similar or otherwise relevant studies is asked to contact the author. replicating the washington library network computer system software thomas p. brown: manager of computer services, and raymond deb use: manager of development and library services, washington library network, olympia. the washington library network (wln) computer system supports shared cataloging and catalog maintenance, retrospective conversion, reference, com catalog production, acquisitions, and accounting functions for libraries operating within a network. the system offers both full marc and brief catalog records as well as linked authority control for a ll traced headings. it contains more than 250,000 lines of pl/1 and ibm bal code in more than 1,100 program modules and runs on ibm or ibm-compatible hardware with ibm operating systems (mvs,os/vs1). all database management functions are provided by adxbas, a product of software a.g. of north america. the online system runs unbook reviews 211 acknowledgment the work reported in this paper was supported in part by a grant from the u.s. office of education, oeg-7-071140-4427. references 1. 
ruecking, frederick h., jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-38. 2. lipetz, ben-ami; stangl, peter: "user clues in initiating searches in a large library catalog," in american societ'l for information science, proceedings, 5. annual meeting, october 20-24, 1968, columbus, ohio, p. 137-139. book reviews conceptual design of an automated national library system, by norman r. meise. metuchen, n.j.: scarecrow press, 1969. 234 pp. $5.00. this is a very confusing book. and it is too bad, because this reviewer kept feeling that the author, norman meise, had something to present. the trouble is that he does not communicate. this, i think, is the result of two things. first, the book reflects the naivete of engineers when they come to deal with what are basically social systems like libraries. this does not mean it can't be done, but such a task needs clarity and purpose, which this book does not have. the second springs from this failure. the masses of data, assumptions, and commentary in the book are poorly organized and interrelated. it is not enough to write strings of words; those strings must communicate and relate backward and forward in the text. although never explicitly stated, the book evidently grew out of a study performed by the united aircraft corporate systems center in 1965-66 for the development and implementation of a connecticut library research center (see eric document ed 0221512). the latest reference in the book is 1966. in a field, i.e. library networks, where a fair amount of work and discussion has taken place in the last three years (e.g. the edunet conference in 1966), a book like this quickly loses its impact. the purpose of the book, according to the author, is "to show the feasibility of a system concept rather than provide a detailed engineering design." the system is "an automated national library system" using the state of connecticut as a model. the author then adds (spoiling the whole introduction) : "if these functions (bibliographic searching, acquisition, cataloging, circulation) can be economically automated, the major problems associated with our information explosion will be solved." as anatole france once said: "it is in the ability to deceive oneself that the greatest talent is shown." 272 journal of library automation vol. 2/ 4 december, 1969 basically the system is made up of three levels : local libraries, the regional center, and the "national library central." these are interconnected either by teleprinter, at 75 bits per second, or crt consoles, as 1200 to 2400 bits per second. mr. meise develops extensive tables, using connecticut as a model, for (a) estimated message traffic, real-time and batch; (b) allocation of communication traffic to segments of circuit route; (c ) cumulative communications traffic; (d) number of circuits required versus circuit speed. he discusses bibliographic coupling ( 78-82), the itek memory centered processor, disc packs and file organization, ( 100-118, 162-179). i cite these tables and data (there are many more) merely to show the approach. at one point he talks about packages such as books, at another about papers. the whole system is based on statistics for which there is no discussion. 
item: "the local library should satisfy a large percentage of the user's needs ( 90-99% ) ; however, some portion of these needs ( 1-10%) should be obtained from other libraries to keep system costs within reasonable range" ( p.32). where does "90-99%" come from? how do we know that this level will "keep system costs within reasonable range"? item: "the state of connecticut is about the right size for a regional center from the point of view of expected user load" ( p.118). whose hat did he pull this one out of? there is no discussion of right size, nor really any of what "size" means -population? geographic area? cultural makeup? one suggested region (arizona, nevada, utah) has about the same population as connecticut, but is 62 times the size. certainly the communications costs are entirely different and the two regions are not comparable. figures suddenly appear in the text, e.g. 9,610,000 vols. (p.98) and others, and the reader does not know where they came from. they may be right. they may even have been discussed somewhere in the text, but on page 98 one does not remember. and the index is of no value: two pages, hastily organized. this is all too bad, because mr. meise evidently put a good deal of effort into this. instead of discussing the statistical assumptions necessary for network planning, we are presented with raw and unevaluated data. instead of a thorough analysis of the "feasibility of a system concept", we are presented with a grandiose scheme. buried in the pile, however, are data, which while poorly organized and presented, are necessary for practical network planning. what is needed is a coherent and basic statement of the kinds of data available, of the kinds of data that are unavailable or imprecise, of the conditions under which these kinds of data hold, and of the relative usefulness of such data at varying systems levels. perhaps it is unfair to criticize mr. meise for not writing this kind of book. yet my criticism is precisely that, because he writes as though these data already exist in organized form. they don't. he has built a house of cards on air. robert s. taylor book reviews 273 thesaurus of eric descriptors, 2d edition, washington, d.c.: educational resources information center, bureau of research, office of education, 1969. 289 pp. one of the principal problems associated with the review of a new thesaurus is that the thesaurus usually serves simultaneously to exemplify the use and misuse of the basics of thesaurus construction. the thesaurus of eric descriptors is no exception. for the purposes of this review, it is necessary to distinguish between a thesaurus and an authority list. both are designed to improve communication between the user and the information storage and retrieval system. a thesaurus is usually used in conjunction with free-vocabulary indexing (and retrieval) while the authority list must be used only with controlledvocabulary indexing. hence a thesaurus, in the words of the engineers' joint council guide to indexing and abstracting, " . .. is not meant to specify the words in which information is to be recorded, but rather to establish the semantic and generic interrelationships between such words". the indexer uses the thesaurus as a means of "enriching" his indexing, i.e., as a guideline for effective indexing. the searcher uses the thesaurus to aid in phrasing or clarifying his search question. in neither use is there demanded the use of a particular term in preference to any other. 
an authority list, on the other hand, must be composed entirely of system terminology (except for the use-use for relationships, although the non-preferred term cannot be used profitably as a search term) which the indexer and searcher are constrained to employ. the thesaurus of eric descriptors is, by its own admission, an authority list ("only those descriptors actually used for indexing are placed in the thesaurus .. . ", p. vii). a thesaurus may be used with either free or controlled vocabulary indexing/ retrieval; an authority list may be used only with a controlled vocabulary. it is time we started using the correct terms for these two types of communication device. apart from the confusion as to the exact nature of the document it inu·oduces, the introduction to the thesaurus of eric descriptors does provide a good discussion of the problems of indexing and "thesaurus" development, especially concerning the need for multi-term entries. the descriptor listing, to which are added a rotated descriptor display, a descriptor group display, and descriptor scope notes, is well constructed (especially commendable is the rotated descriptor display). however, i question the value of the descriptor groups, which serve to grossly classify the eric descriptors, since they tend to detract from the cross-concept nature of the authority list. finally, the formats of the various listings in this document are well done and provide a very readable and usable authority list. james e. rush 274 journal of library automation vol. 2/4 december, 1969 announced reprints. vol. 1, feb. 1969. microcard editions. 52 pp. $30.00 per year. this journal complements guide to reprints. announced reprints lists forthcoming reprints that have been announced but not yet produced. published quarterly, its scope includes books, journals, and other materials originating both in the united states and abroad. each issue will cumulate all previous issues except that following the november issue all titles that have been published will be dropped. books are entered by author. entries include author, title and original date of publication. journals and sets are entered by title and include volume numbers. each entry includes in brackets the date of the first inclusion of an item in announced reprints. titles preceded by an asterisk are those that have been published subsequent to being listed as a forthcoming title. a title that appears in the february issue, for example, as an announced title which is then published in march will appear in the may, august and november issues preceded by an asterisk. following the november issue it is dropped. prices are included, in some cases being in the currency of the country. prepublication prices may be listed but the deadline is not. there is an alphabetical listing of publishers known to be active in the reprinting business, but of the 218 publishers so listed 124 did not supply announced reprints with titles. among the nonrespondents was kraus reprint corporation, one of the larger houses. exactly what need this journal answers is not completely clear . the guide to reprints provides an annual, cumulative list of books, journals and other materials that have been reprinted. as an acquisition tool it is self-evident. but since the period between the time a title is announced and the time it is actually reprinted is variable, one can only suppose that the publishers hope to fix their market by having their forthcoming titles listed in announced reprints. 
if they get expressions of interest from many libraries they may actually reprint. since announced reprints gives the date of the first time a reprint title is listed, eventually librarians will learn which publishers are reliable and which are not in following through on the promise of publishing a reprint. john demo~ union list of serials in the libraries in the miami vauey, sue brown, editor. 2d edition. dayton, ohio: wright state university library, 1969. $20.00. it's hard to review a union list of serials because such a publication is obviously a very useful thing to have and to use. being intimately connected with the production of a similar list for the cincinnati area, i can book reviews 275 only commend the librarians of the dayton-miami valley consortium for producing this second edition in as short a time as they did. (the first edition containing 8880 titles held by 35 libraries was published in the spring of 1968.) this edition contains the holdings of tlrree more libraries than did the first, and nearly 900 more titles are included. there are a few minor points about which one might quibble, such as the listing of the computer output on the lined side of the paper, making the pages of the published list a bit lined and grey looking; the use of corporate entries for the titles, which is o.k. if the list is used only by librarians and others used to that form of entry, but confusing to the average patron who is, i am convinced, used to looking up holdings information by the running title that he picked up in a citation somewhere; listing holdings under the latest title of a periodical with notes as to the title variations over the years (although i can't complain too loudly about this, as it is the same way we are doing the cincinnati area list, although with less information as to title changes than in this list); the use of library name "codes., that are the same as, or similar to, those used in the "union list of serials," which causes a great string of oda-to run down each page. there are, naturally, a few missed cross references to the latest title as well as a few keypunching errors. these detract little from the usefulness of the volume, which should be great, especially in the area near western ohio. the list is available for $20 from the acquisitions department, library, wright state university, colonel glenn highway, dayton, ohio 45431. thomas h. rees, ]r. current contents; education. 1 (june 17, 1969) philadelphia, institute for scientific information. subscription price varies. the rise in need for librarians to build their own offprint files has intensified searching for current, relevant references. current contents; education facilitates that search, for it reproduces contents pages of some 350 journals in the field of education and related fields. this new publication includes over a dozen library journals, including the journal of library automation. the various sections of current contents have established a well deserved reputation for timeliness. indeed, some librarians have complained that their users receive reprinted contents pages in current contents before libraries receive the journals. since each issue contains an author index and address directory, it is easy to request an offprint, and thereby expend minimal effort in keeping up as well as building a personal offprint collection. the subscription price can vary from $100 for a single non-educational subscription to less than $1.50 for multi-year subscriptions in groups of 200 or more. 
f d ·kg k"l re enc . j gour 276 journal of library automation vol. 2/ 4 december, 1969 standardization for documentation, bernard houghton, ed. hamden, conn.: archon books, 1969. 93 pp. $4.00. the editor has brought together in this tight little book an illuminating collection of six useful papers prepared initially for a conference held in liverpool, england, in november, 1968. the announced goal of this conference was to "isolate and consider some of the areas in which the adoption of universal standards is of immediate relevance." inasmuch as the authors are all british, the volume will have greater interest abroad than in the u.s. nonetheless, there is universal recognition that standards in various areas of documentation are desperately needed and that a great deal remains to be done. an especially clear exposition of the british standards institution's work in this field is the work of c. w. paul-jones. he relates the methodology and the work of bsi to that of the international standards organization (iso ) and touches briefly on each standards committee and its program of work. his concise outline of standards in being and in progress, and the place of each standards-involved organization in the framework of universal standards is thoroughly competent. k. i. porter, the editor of the british union catalogue, touches on a variety of problems encountered in his work and discusses the potential of standards in the area of serial publications. a wryly humorous essay on standards for book production is the work of peter wright. he seems not very hopeful of changing the methods of book-trade production through standards, but believes in the usefulness of the effort to establish them. the essay of k.g.b. bakewell takes up classification, cataloging, and other devices for organizing library material and providing access to it. he deplores the inchoate british development in these areas and cites considerably greater standardization elsewhere. his review of known systems is helpful. d. martin's paper, "standards for a mechanized information system," reviews the practical problems of one who has to subject information to the unthinking mind of machines. he too enumerates needed standards for coding, indexing, data elements, etc., and concludes (properly) that "it is too early to start talking in terms of solutions: standards activity is only now beginning to gather momentum." the final paper, that of john maclachlan, is an ordinary how-we-do-it job, describing an abstracting service in one specific field and the local standards applied. these six papers taken as a whole constitute an informative and cogent source of information on the present status of standards work in the information field. despite the british emphasis, the case for multi-national and international standards is clearly set forth. this book should be required reading for students and workers in the fields of information science and documentation. ] errold orne book reviews 277 cataloging u.s.a., by paul s. dunkin. chicago: american library association, 1969. 159 pp. this book is, quite simply, a survey of the development of cataloguing in america, and of the present situation of cataloguing in america. it deals with all aspects of author cataloguing, descriptive cataloguing and subject cataloguing (both subject headings and classification). the method used by the author is didactic and expository rather than critical mr. 
dunkin seeks to analyse and to display the situation rather than to arrive at startling insights or to propose radical modification. the book is addressed to "the beginning student ... the experienced cataloger ... the public service librarian ... the library administrator". it is, mr. dunkin says, not a "how to do it book" but a "why do it book". it is certainly true that any member of mr. dunkin's readership will be enlightened by being shown the roots of modem cataloguing, and by having the perennial problems of cataloguing discussed in an admirably clear manner. mr. dunkin does not fail to illuminate each problem, and such illumination is, of course, half way to a solution. where he does fail, i feel, is in not providing any firm answers to these problems. perhaps in cataloguing there are no firm answers. this book seems to me, as an english cataloguer, to epitomise the "other directed" nature of american cataloguing. in reading this, as other american textbooks, i find a somewhat reverent attitude towards the great figures (principally cutter), the great institutions ( principally the library of congress) and the "sacred texts" (the various codes). the english tradition seems to me much more "inner directed", much more concerned with what is best for the individual catalogue, much less concerned with the necessity for standardisation and consistency between catalogues. this is not to say that either approach has a monopoly of virtue, or that one can fault mr. dunkin's book on this account. mr. dunkin has chosen his readership and his method, and within his self-imposed limits has produced a practical and useful book. furthermore the book is written with a clarity and ease unusual in cataloguing literature. michael gorman systems analysis in libraries, by f. robinson, c. r. clough, d. s. hamilton, and r. winter. newcastle upon tyne: oriel press lin1ited, 1969. 55 pp. 15s. ( symplegades, number i, a series of papers on computers, libraries and information processing). if symplegades once diligently guarded the entrance to the bosphorus, it has now gratefully allowed this simple book to survive its peril. the authors explain that the title is somewhat misleading-the book 278 journal of library automation vol. 2/4 december, 1969 has nothing to do with library systems and little in terms of system analysis that does not relate to computerization. the two purposes of this work are the need for stressing clarity in defining objectives and for emphasizing the extent and depth of the work involved in systems analysis. a book that could achieve such simple but difficult objectives and do it intelligently would indeed be welcome in our discipline. this volrune, however, does not obtain its objectives. it does provide something just as important in that it is readable (with dashes of humor) with a simple presentation of the basic tenets of systems analysis as it applies to libraries. it assumes that the reader knows nothing about systems analysis and its application to the computer. for the professional neophyte or the old graduate who has finally faced up to the realities of the future, this book should be a definite beginning point. the structure of the book and the presentation of the text contains the same simplicity as the message and book conveys. the presentation is in the form of the message. the book does contain one point of view which seems invalid. it suggests that systems analysis is only undertaken in connection with computerization. 
there are other shortcomings, like the unexplained and unlabeled figures and the use of acronyms without explanation or definition. one also wonders about a technical book without the use of sourcing. irene braden

on research libraries; statement and recommendations of the committee on research libraries of the american council of learned societies; submitted to national advisory commission on libraries, november, 1967. cambridge, mass.: the m.i.t. press, 1969. 104 pp. this report presents the problems of research libraries and puts forth eleven major recommendations to solve these problems. in summary the recommendations are for a national library structure presided over by a national commission on libraries to cope with various problems, including automation; financial support from federal, state and private sources; and study and revision of the copyright law. none of the recommendations is novel. edwin e. williams of the harvard university library contributed a skillful summary of problems related to "bibliographic control and physical dissemination." m. v. mathews and w. s. brown of the bell telephone laboratories prepared a section entitled "research libraries and the new technology," which discusses computers and microcopying. the discussion of library computer applications is less than helpful. the authors propose a catalog for a university library on 80 reels of magnetic tape and propose "complete resorting of the catalog." no one with any experience whatsoever in library computerization would dream, even in his worst nightmare, of such a monstrous arrangement. yale's ralph s. brown, jr., has furnished an appendix, "copyright problems of research libraries," that is most perceptive and informative. brown concludes that although copyright revision must move on, "the costs of using copyright works [must be] bargained out" and that congress "must for a while attempt the difficult feat of standing still on a tightrope." the verso of the title page of on research libraries sharpens this point, for it carries the prohibition that "no part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without permission in writing from the publisher." bargaining, if it can be called that, is surely here. frederick g. kilgour

geographic information and technologies in academic libraries: an arl survey of services and support
ann l. holstein
information technology and libraries | march 2015

abstract
one hundred fifteen academic libraries, all current members of the association of research libraries (arl), were selected to participate in an online survey in an effort to better understand campus usage of geographic data and geospatial technologies, and how libraries support these uses. the survey was used to capture information regarding geographic needs of their respective campuses, the array of services they offer, and the education and training of geographic information services department staff members. the survey results, along with review of recent literature, were used to identify changes in geographic information services and support since 1997, when a similar survey was conducted by arl.
this  new  study  has  enabled  recommendations  to  be  made  for  building  a   successful  geographic  information  service  center  within  the  campus  library  that  offers  a  robust  and   comprehensive  service  and  support  model  for  all  geographic  information  usage  on  campus.   introduction   in  june  1992,  the  arl  in  partnership  with  esri  (environmental  systems  research  institute)   launched  the  gis  (geographic  information  systems)  literacy  project.  this  project  sought  to   “introduce,  educate,  and  equip  librarians  with  the  skills  necessary”  to  become  effective  gis  users   and  to  learn  how  to  provide  patrons  with  “access  to  spatially  referenced  data  in  all  formats.”1   through  the  implementation  of  a  gis  program,  libraries  can  provide  “a  means  to  have  the   increasing  amount  of  digital  geographic  data  become  a  more  useful  product  for  the  typical   patron.”2     in  1997,  five  years  after  the  gis  literacy  project  began,  a  survey  was  conducted  to  elucidate  how   arl  libraries  support  patron  gis  needs.  the  survey  was  distributed  to  121  arl  members  for  the   purpose  of  gathering  information  about  gis  services,  staffing,  equipment,  software,  data,  and   support  these  libraries  offered  to  their  patrons.  seventy-­‐two  institutions  returned  the  survey,  a  60%   response  rate.  at  that  time,  nearly  three-­‐quarters  (74%)  of  the  respondents  affirmed  that  their   library  administered  some  level  of  gis  services.3  this  indicates  that  the  gis  literacy  project  had  an   evident  positive  impact  on  the  establishment  of  gis  services  in  arl  member  libraries.   since  then,  it  has  been  recognized  that  the  rapid  growth  of  digital  technologies  has  had  a   tremendous  effect  on  gis  services  in  libraries.4  we  acknowledge  the  importance  of  assessing     ann  l.  holstein  (ann.holstein@case.edu)  is  gis  librarian  at  kelvin  smith  library,  case  western   reserve  university,  cleveland,  ohio.     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   39   how  geographic  services  in  academic  research  libraries  have  further  evolved  over  the  past  17   years  in  response  to  these  advancing  technologies  as  well  as  the  increasingly  demanding   geographic  information  needs  of  their  user  communities.     method   for  this  study,  115  academic  libraries,  all  current  members  of  arl  as  of  january  2014,  were   invited  to  participate  in  an  online  survey  in  an  effort  to  better  understand  campus  usage  of   geographic  data  and  geospatial  technologies  and  how  libraries  support  these  uses.  similar  in   nature  to  the  1997  arl  survey,  the  2014  survey  was  designed  to  capture  information  regarding   geographic  needs  of  their  respective  campuses,  the  array  of  services,  software.  and  support  the   academic  libraries  offer,  and  the  education  and  training  of  geographic  information  services   department  staff  members.  our  aim  was  to  be  able  to  determine  the  range  of  support  patrons  can   anticipate  at  these  libraries  and  ascertain  changes  in  gis  library  services  since  the  1997  survey.   a  cross-­‐sectional  survey  was  designed  and  administered  using  qualtrics,  an  online  survey  tool.  
it   was  distributed  in  january  2014  via  email  to  the  person  identified  as  the  subject  specialist  for   mapping  and/or  geographic  information  at  each  arl  member  academic  library.  when  the  survey   closed  after  two  weeks,  54  institutions  had  responded  to  the  survey.  this  accounts  for  47%   participation.  responding  institutions  are  listed  in  the  appendix.   results   software  and  technologies   we  were  interested  in  learning  about  what  types  of  geographic  information  software  and   technologies  are  currently  being  offered  at  academic  research  libraries.  results  show  that  100%  of   survey  respondents  offer  gis  software/mapping  technologies  at  their  libraries,  36%  offer  remote   sensing  software  (to  process  and  analyze  remotely  sensed  data  such  as  aerial  photography  and   satellite  imagery),  and  36%  offer  global  positioning  system  (gps)  equipment  and/or  software.   nearly  all  (98%)  said  that  their  libraries  provide  esri  arcgis  software,  with  83%  also  providing   access  to  google  maps  and  google  earth,  and  35%  providing  qgis  (previously  known  as  quantum   gis).  smatterings  of  other  gis,  remote-­‐sensing,  and  gps  products  are  also  offered  by  some  of  the   libraries,  although  not  in  large  numbers  (see  table  1  for  full  listing).     the  fact  that  nearly  all  survey  respondents  offer  arcgis  software  at  their  libraries  comes  as  no   surprise.  arcgis  is  the  most  commonly  provided  mapping  software  available  in  academic  libraries,   and  in  2011,  it  was  determined  that  2,500  academic  libraries  were  using  esri  products.5  esri   software  was  most  popular  in  1997  as  well,  undoubtedly  because  they  offered  free  software  and   training  to  participants  of  the  gis  literacy  project.6         information  technology  and  libraries  |  march  2015   40   software/technology   type   %  of  providing   libraries   esri  arcgis   gis   98   google  maps/earth   gis   83   qgis   gis   35   autocad   gis   19   erdas  imagine   remote  sensing   19   grass   gis   15   envi   remote  sensing   15   geoda   gis   6   pci  geomatica   remote  sensing   6   garmin  map  source   gps   6   simplymap   gis   4   trimble  terrasync   gps   4   table  1.  geographic  information  software/mapping  technologies  provided  at  arl  member   academic  libraries  (2014)   google  maps  and  google  earth,  launched  in  2005,  have  quickly  become  very  popular  mapping   products  used  at  academic  libraries—a  close  second  only  to  esri  arcgis.  in  addition  to  being  free,   their  ease  of  use,  powerful  visualization  capabilities,  “customizable  map  features  and  dynamic   presentation  tools”  make  them  attractive  alternatives  to  commercial  gis  software  products.7     since  1997,  many  software  programs  have  fallen  out  of  favor.  mapinfo,  idrisi,  maptitude,  and   sammamish  data  finder/geosight  pro  were  gis  software  programs  listed  in  the  1997  survey   results  that  are  not  used  today  at  arl  member  academic  libraries.8  instead,  open  source  software   such  as  qgis,  grass,  and  geoda  are  growing  in  popularity.  they  are  free  to  use  and  their  source   code  may  be  modified  as  needed.   
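the popularity of google maps and google earth noted above rests largely on how little effort a basic interactive map now takes, and the open-source tools mentioned in the same paragraph offer similar convenience. as an illustration only, not something reported by the survey, the few lines of python below use the open-source folium package, which writes a leaflet/openstreetmap map to a plain html file; the coordinates and labels are placeholders invented for this example.

import folium  # open-source python wrapper around the leaflet javascript mapping library

# placeholder campus locations; the coordinates and names are invented for this example
m = folium.Map(location=[41.5045, -81.6084], zoom_start=16)
folium.Marker([41.5073, -81.6096], popup="main library / gis services").add_to(m)
folium.Marker([41.5012, -81.6075], popup="geography department gis lab").add_to(m)
m.save("campus_gis_map.html")  # a self-contained html file viewable in any web browser

the point is not the particular package but how low the barrier to producing a shareable web map has become, which helps explain the shift away from the commercial-only lineup reported in 1997.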
gps  equipment  lending  can  be  very  beneficial  to  students  and  campus  researchers  who  need  to   collect  their  own  field  research  locational  data.  the  2014  survey  found  that  30%  of  respondents   loan  recreational  gps  equipment  at  their  libraries  and  10%  loan  mapping-­‐grade  gps  equipment.   the  high  cost  of  mapping-­‐grade  gps  equipment  (several  thousand  dollars)  may  be  a  barrier  for   some  libraries;  however,  this  is  the  type  of  equipment  recommended  in  best-­‐practice  methods  for   gathering  highly  accurate  gps  data  for  research.  in  addition  to  expense,  complexity  of  operation  is   another  consideration.  while  it  is  “fairly  simple  to  use  a  recreational  gps  unit,”  a  certain  level  of   advanced  training  is  required  for  operating  mapping-­‐grade  gps  equipment.9  a  designated  staff   member  may  need  to  take  on  the  responsibility  of  becoming  the  in-­‐house  gps  expert  and  routinely   offer  training  sessions  to  those  interested  in  borrowing  mapping-­‐grade  gps  equipment.     location     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   41   at  36%  of  responding  libraries,  the  geographic  information  services  area  is  located  where  the   paper  maps  are  (map  department/services);  19%  have  separated  this  area  and  designated  it  as  a   geospatial  data  center,  gis,  or  data  services  department;  13%  integrate  it  with  the  reference   department;  and  just  4%  of  libraries  house  the  gis  area  in  government  documents.  table  2  lists  all   reported  locations  for  this  service  area.  not  surprisingly,  in  1997,  government  documents  (39%)   was  just  as  popular  a  location  for  this  service  area  as  within  the  map  department  (43%).10   libraries  identified  government  documents  as  a  natural  fit,  keeping  gis  services  within  close   proximity  to  spatial  data  sets  recently  being  distributed  by  government  agencies,  most  notably  the   us  government  printing  office  (gpo).  these  agencies  had  made  the  decision  to  distribute  “most   data  in  machine  readable  form,”11  including  the  1990  census  data  as  topographically  integrated   geographic  encoding  and  referencing  (tiger)  files.12  gis  technologies  were  needed  to  access  and   most  effectively  use  information  within  these  massive  spatial  datasets.     location   %  of  libraries  (1997)   %  of  libraries  (2014)   map  department/services   43   36   government  documents   39   4   reference   10   13   geospatial  data  center,  gis,  or  data  services   3   19   not  in  any  one  location   -­‐   9   digital  scholarship  center   -­‐   6   combined  area  (i.e.,  map  dept.  &  gov.  docs.)   -­‐   6   table  2.  location  of  the  geographic  information  services  area  within  the  library  (1997  and  2014)   at  59%  of  responding  libraries,  geographic  information  software  is  available  on  computer   workstations  in  a  designated  area,  such  as  within  the  map  department.  however,  many  do  not   restrict  users  by  location  and  have  the  software  available  on  all  computer  workstations   throughout  the  library  (37%)  or  on  designated  workstations  distributed  throughout  the  library   (33%).  
a  small  percentage  (7%)  loan  laptops  to  patrons  with  the  software  installed,  allowing  full   mobility  throughout  the  entire  library  space.   staffing   most  professional  staff  working  in  the  geographic  information  services  department  hold  one  or   more  postbaccalaureate  advanced  degrees.  of  113  geographic  services  staff  at  responding   libraries,  65%  had  obtained  an  ma/ms,  mls/mlis,  or  phd;  43%  have  one  advanced  degree,  while   22%  have  two  postbaccalaureate  degrees.  half  (50%)  hold  an  mls/mlis,  31%  hold  an  ma/ms,   and  6%  hold  a  phd.  nearly  one-­‐third  (31%)  have  obtained  a  ba/bs  as  their  highest  educational   degree,  3%  had  a  two-­‐year  technical  degree,  and  2%  had  only  earned  a  ged  or  high  school   diploma.  in  1997,  84%  of  gis  librarians  and  specialists  at  arl  libraries  had  an  mls  degree.13  at   that  time,  the  incumbent  was  most  often  recruited  from  within  the  library  to  assume  this  new  role,     information  technology  and  libraries  |  march  2015   42   whereas  today’s  gis  professionals  are  just  as  likely  to  come  from  nonlibrary  backgrounds,   bringing  their  expertise  and  advanced  geographic  training  to  this  nontraditional  librarian  role.     figure  1.  highest  educational  degree  of  geographic  services  staff  (2014)   on  average,  this  department  is  staffed  by  two  professional  staff  members  and  three  student  staff.   student  employees  can  be  a  terrific  asset,  especially  if  they  have  been  previously  trained  in  gis.   students  are  likely  to  be  recruited  from  departments  that  are  the  heaviest  gis  users  at  the   university  (i.e.,  geography,  geology).  some  libraries  have  implemented  “co-­‐op”  programs  where   students  can  receive  credit  for  working  at  the  gis  services  area.  these  dual-­‐benefit  positions  are   quite  lucrative  to  students.14     campus  users   in  a  typical  week  during  the  course  of  a  semester,  responding  libraries  each  serve  approximately   sixteen  gis  users,  four  remote  sensing  users,  and  three  gps  users.  these  users  may  obtain   assistance  from  department  staff  either  in-­‐person  or  remotely  via  phone  or  email.     on  average,  undergraduate  and  graduate  students  compose  the  majority  (75%)  of  geographic   service  users  (32%  and  43%,  respectively).  faculty  members  compose  14%  of  the  users,  followed   by  staff  (including  postdoctoral  researchers)  at  7%.  some  institutions  also  provide  support  to   public  patrons  and  alumni  (4%  and  1%,  respectively).  in  1997,  it  was  estimated  that  on  average,   63%  of  gis  users  were  students,  22%  were  faculty,  8%  were  staff,  and  8%  were  public.15   ged/hs   2%   2yr  tech   3%   ba/bs   31%   ma/ms/mlis   58%   phd   6%     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   43     figure  2.  comparison  of  the  percentage  of  geographic  service  users  by  patron  status  (1997  and   2014)   the  top  three  departments  that  use  gis  software  at  arl  campuses  are  environmental   science/studies,  urban  planning/studies,  and  geography.  the  most  frequent  remote  sensing   software  users  come  from  the  departments  of  environmental  science/studies,  geography,  and   archaeology.  
gps  equipment  loan  and  software  usage  is  most  popular  with  the  departments  of   environmental  science/studies,  geography,  biology/ecology  and  archaeology  (see  table  3  for  full   listing).  some  departments  are  heavy  users  of  all  geographic  technologies,  while  others  have   shown  interest  in  only  one.  for  example,  the  departments  of  psychology  and  medicine/dentistry   have  used  gis  but  have  expressed  little  or  no  interest  in  using  remote-­‐sensing  or  gps  technologies.   support  and  services   the  campus  community  is  supported  by  library  staff  in  a  variety  of  ways  with  regards  to  gis,   remote-­‐sensing,  and  gps  technology  and  software  use.  nearly  all  (94%)  libraries  provide   assistance  using  the  software  for  specific  class  assignments  and  projects,  and  78%  are  able  to   provide  more  in-­‐depth  research  project  consultations.  more  than  one-­‐quarter  (27%)  of  reporting   libraries  will  make  custom  gis  maps  for  patrons,  although  there  may  be  a  charge  depending  on  the   library,  project,  and  patron  type  (10%).  most  (90%)  offer  basic  use  and  troubleshooting  support;   however,  just  39%  offer  support  for  software  installation,  and  55%  offer  technical  support  for   problems  such  as  licensing  issues  and  turning  on  extensions.  the  campus  computing  center  or   information  technology  services  (its)  at  arl  institutions  most  likely  fields  some  of  the  software   installation  and  technical  issues  rather  than  the  library,  thus  accounting  for  the  lower  percentages.     a  variety  of  software  training  may  be  offered  to  the  campus  community  through  the  library;  80%   of  responding  libraries  make  visits  to  classes  to  give  presentations  and  training  sessions,  69%  host   workshops,  47%  provide  opportunities  for  virtual  training  courses  and  tutorials,  and  4%  offer   certificate  training  programs.     0   10   20   30   40   50   60   70   80   students   faculty   staff   public   alumni   1997   2014     information  technology  and  libraries  |  march  2015   44   department   gis   remote  sensing   gps   anthropology   24   10   8   archaeology   24   14   13   architecture   24   1   6   biology/ecology   32   10   13   business/economics   23   1   3   engineering   18   9   11   environmental  science/studies   41   22   16   forestry/wildlife/fisheries   21   12   10   geography   35   22   15   geology   31   12   10   history   27   2   2   information  sciences   14   1   0   nursing   8   1   2   medicine/dentistry   9   0   0   political  science   25   3   5   psychology   4   0   0   public  health/epidemiology/  biostatistics   30   3   9   social  work   2   0   1   sociology   22   0   3   soil  science   17   5   4   statistics   8   3   0   urban  planning/studies   36   7   9   table  3.  number  of  arl  libraries  reporting  frequent  users  of  gis,  remote-­‐sensing,  or  gps   software  and  technologies  from  a  campus  department  (2014)     often,  the  library  is  not  the  only  place  people  can  go  to  obtain  software  support  and  training  on   campus.  most  (86%)  responding  libraries  state  that  their  university  offers  credit  courses,  and  41%   of  campuses  have  a  gis  computer  lab  located  elsewhere  on  campus  that  may  be  utilized.  
its  is   available  for  assistance  at  29%  of  the  universities,  and  continuing  education  offers  some  level  of   training  and  support  at  14%  of  campuses.     data  collection  and  access   most  (85%)  of  responding  libraries  collect  geographic  data  and  allow  an  annual  budget  for  it.   “libraries  that  have  invested  money  in  proprietary  software  and  trained  staff  members  will  tend   to  also  develop  and  maintain  their  own  collection  of  data  resources.”16  of  those  collecting  data,  26%   spend  less  than  $1,000  annually,  15%  spend  between  $1,000  and  $2,499,  17%  spend  between   $2,500  and  $5,000,  while  41%  spend  more  than  $5,000.  in  1997,  79%  of  libraries  spent  less  than   $2,000  annually,  and  only  9%  spent  more  than  $5,000.17       geographic  information  and  technologies  in  academic  research  libraries  |  holstein   45     figure  3.  annual  budget  allocations  for  geographic  data  (2014)   a  dramatic  shift  has  occurred  over  the  years  with  budget  allocations  for  data  sets.  no  longer  are   academic  libraries  just  collecting  free  government  data  sets  as  was  typically  the  case  back  in  1997,   but  they  are  investing  much  more  of  their  materials  budget  into  building  up  the  geographic  data   collection  for  their  users.     data  is  made  accessible  to  campus  users  in  a  variety  of  ways.  a  majority  (84%)  offer  data  via   remote  access  or  download  from  a  networked  campus  computer,  using  a  virtual  private  network   (vpn)  or  login.  more  than  half  (62%)  of  responding  libraries  provide  access  to  data  from   workstations  within  the  library,  and  64%  lend  cd-­‐roms.   roughly  one-­‐quarter  (26%)  of  responding  libraries  provide  users  with  storage  for  their  data.  of   those,  29%  have  a  dedicated  geographic  data  server,  14%  use  the  main  library  server,  29%  point   users  to  the  university  server  or  institutional  repository,  and  36%  allow  users  to  store  their  data   directly  onto  a  library  computer  workstation  hard  drive.   internal  use  of  gis  in  libraries   geographic  information  technologies  may  be  used  internally  to  help  patrons  navigate  the  library’s   physical  collections  and  efficiently  locate  print  materials.  of  the  survey  respondents,  60%  use  gis   for  map  or  air  photo  indexing,  27%  use  the  technology  to  create  floor  maps  of  the  library  building,   and  15%  use  it  to  map  the  library’s  physical  collections.  “the  use  of  gis  in  mapping  library   collections  is  one  of  the  non-­‐traditional  but  useful  applications  of  gis.”18  gis  can  be  used  to  link   library  materials  to  simulated  views  of  floor  maps  through  location  codes.19  this  enables  patrons   to  determine  the  exact  location  of  library  material  by  providing  them  with  item  “location  details   such  as  stacks,  row,  rack,  shelf  numbers,  etc.”20  the  gis  system  can  become  a  useful  tool  for   collection  management  and  can  be  a  tremendous  time-­‐saver  for  patrons,  especially  those   unfamiliar  with  the  cataloging  system  or  collection  layout.     
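the passages above describe linking library materials to floor maps through location codes so that a patron can be directed to a particular stack, row, and shelf. the cited sources do not specify an implementation, so the python sketch below is only one minimal way such a lookup might work: a small table mapping call-number ranges to shelving locations and floor-map coordinates, with every range, location, and coordinate invented for illustration.

# hypothetical call-number ranges mapped to shelving locations and floor-map coordinates (x, y in metres)
shelf_index = [
    ("GA1", "GB9999", {"floor": 2, "stack": "geography and maps", "row": 4,  "xy": (12.5, 30.0)}),
    ("QE1", "QE9999", {"floor": 3, "stack": "geology",            "row": 11, "xy": (40.0, 8.5)}),
    ("Z1",  "Z9999",  {"floor": 1, "stack": "bibliography",       "row": 2,  "xy": (5.0, 14.0)}),
]

def locate(call_number):
    # return the shelf record whose range contains the call number; plain string comparison is used
    # here, though a real system would normalize lc call numbers before comparing them
    for low, high, place in shelf_index:
        if low <= call_number.upper() <= high:
            return place
    return None

print(locate("QE511.4 .H65 2015"))  # -> floor 3, geology stack, row 11, plus map coordinates

linked to a floor plan drawn in a gis, the returned coordinates could be highlighted on the map a patron sees, which is the kind of time saving for unfamiliar users that the paragraph above describes.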
discussion   recommendations  for  building  a  successful  geographic  information  service  center   0   5   10   15   20   25   30   35   40   45   percent  (%)     information  technology  and  libraries  |  march  2015   46   the  geographic  information  services  area  is  often  a  blend  of  the  traditional  and  modern.  it  can   extend  to  paper  maps,  atlases,  gps  equipment,  software  manuals,  large-­‐format  scanners,  printers,   and  gis.  gis  services  may  include  a  cluster  of  computers  with  gis  software  installed,  an  accessible   collection  of  gis  data  resources,  and  assistance  available  from  the  library  staff.  the  question  for   academic  libraries  today  is  no  longer  “whether  to  offer  gis  services  but  what  level  of  service  to   offer.”21  every  university  has  different  gis  needs,  and  the  library  must  decide  how  it  can  best   support  these  needs.  there  is  no  set  formula  for  building  a  geographic  information  service  center   because  each  institution  “has  a  different  service  mission  and  user  base.”22  every  library’s  gis   service  program  will  be  designed  with  its  unique  institutional  needs  in  mind;  however,  they  each   will  incorporate  some  combination  of  hardware,  software,  data,  and  training  opportunities   provided  by  at  least  one  knowledgeable  staff  member.23     “gis  represents  a  significant  investment  in  hardware,  software,  staffing,  data  acquisition,  and   ongoing  staff  development.  either  new  money  or  significant  reallocation  is  required.”24   establishing  new  or  enhancing  gis  services  in  the  library  requires  the  “serious  assessment  of  long-­‐ term  support  and  funding  needs.”25  commitment  of  the  university  as  a  whole,  or  at  least  support   from  senior  administration,  “library  administration,  and  related  campus  departments”  is  crucial  to   its  success.26  receiving  “more  funding  will  mean  more  staff,  better  trained  staff,  a  more  in-­‐depth   collection,  better  hardware  and  software,  and  the  ability  to  offer  multiple  types  of  gis  services.”27     once  funding  for  this  endeavor  has  been  secured,  it  is  of  utmost  importance  to  recruit  a  gis   professional  to  manage  the  geographic  information  service  center.  to  be  most  effective  in  this   position,  the  incumbent  should  possess  a  graduate  degree  in  gis  or  geography;  however,   depending  on  what  additional  responsibilities  would  be  required  of  the  candidate  (i.e.,  reference,   cataloging,  etc.)  a  second  degree  in  library  science  is  strongly  recommended.  this  staff  member   should  possess  mapping  and  gis  skills,  which  include  experience  with  esri  software  and  remote   sensing  technologies.  employees  in  this  position  may  be  given  a  job  titles  such  as  “gis  specialists,   gis/data  librarians,  gis/map  librarians,  digital  cartographers,  spatial  data  specialists,  and  gis   coordinators.”28     with  the  new  staff  member  on  board,  hereafter  referred  to  as  “gis  specialist,”  decisions  such  as   what  software  to  provide,  which  data  sets  to  collect,  and  what  types  of  training  and  support  to   offer  to  the  campus  can  be  made.  
consulting  with  research  centers  and  academic  departments  that   currently  use  or  are  interested  in  using  gis  and  remote  sensing  technologies  is  a  good  place  to   learn  about  software,  data,  and  training  needs  and  to  determine  the  focus  and  direction  of  the   geographic  information  services  department.29  campus  users  often  come  from  academic   departments  that  “have  neither  staff  nor  facilities  to  support  gis,”  and  “may  only  consist  of  one  or   two  faculty  and  a  few  graduate  students.  these  gis  users  need  access  to  software,  data,  and   expertise  from  a  centralized,  accessible  source  of  research  assistance,  such  as  the  library.”30     at  minimum,  esri  arcgis,  google  maps  and  google  earth  should  be  supported,  with  additional   remote  sensing  or  open  source  gis  software  depending  on  staff  expertise  and  known  campus     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   47   needs.  when  purchasing  commercial  software  licenses,  such  as  for  esri  arcgis,  discounts  for   educational  institutions  are  usually  available.  additionally,  negotiating  campus-­‐wide  software   licenses  may  be  a  good  option  to  consider  as  the  costs  are  usually  far  less  than  purchasing   individual  or  floating  licenses.  costs  for  campus-­‐wide  licensing  are  typically  determined  by  full-­‐ time  equivalent  (fte)  students  enrolled  at  the  university.     facilitating  “access  to  educational  resources  such  as  software  tools  and  applications,  how-­‐to-­‐ guides  for  data  and  software,”  and  tutorials  is  crucial.31  the  gis  specialist  must  be  familiar  with   how  gis  software  can  be  used  by  many  disciplines,  the  availability  of  “training  courses  or  tutorials,   sources  or  extensible  gis  software,  and  hundreds  of  software  and  application  books.”32  tutorials   may  be  provided  direct  from  a  software  vendor  (i.e.,  esri  virtual  campus)  or  developed  in-­‐house   by  the  gis  specialist.  creating  “gis  tutorials  on  short,  task-­‐based  techniques  such  as   georeferencing  or  geocoding”  and  making  them  readily  available  online  or  as  a  handout  may  save   time  having  to  repeatedly  explain  these  techniques  to  patrons.33   geospatial  data  collection  development  is  a  core  function  of  the  geographic  information  services   department.  to  effectively  develop  the  data  collection,  the  gis  specialist  must  fully  comprehend   the  needs  of  the  user  community  as  well  as  possess  a  “fundamental  understanding  of  the  nature   and  use  of  gis  data.”34  this  is  often  referred  to  as  “spatial  literacy.”35  it  is  crucial  to  keep  abreast  of   “recent  developments,  applications,  and  data  sets.”36   the  gis  specialist  will  spend  much  more  time  searching  for  and  acquiring  geographic  data  sets   than  selecting  and  purchasing  traditional  print  items  such  as  maps,  monographs,  and  journals  for   the  collection.  a  budget  should  be  established  annually  for  the  purchase  of  all  geographic   materials,  both  print  and  digital.  a  great  challenge  for  the  specialist  is  to  acquire  data  at  the  lowest   cost  possible.  
while  a  plethora  of  free  data  is  available  online  from  government  agencies  and   nonprofit  organizations,  other  data,  available  only  from  private  companies,  may  be  quite   expensive  because  of  the  high  production  costs.  a  collection  development  policy  should  be  created   that  indicates  the  types  of  materials  and  data  collected  and  specifies  geographic  regions,  formats,   and  preferred  scales.37  the  needs  of  the  user  community  must  be  carefully  considered  when   establishing  the  policy.     the  expertise  of  the  gis  specialist  is  needed  not  only  to  help  patrons  locate  the  appropriate   geographic  data,  but  also  to  use  the  software  to  process,  interpret,  and  analyze  it.  “only  the  few   library  patrons  that  have  had  gis  experience  are  likely  to  obtain  any  level  of  success  without   intervention  by  library  staff”;38  thus,  for  any  mapping  program  installed  on  a  library  computer,   “staff  must  have  working  knowledge  of  the  program”  and  must  be  able  to  provide  support  to   users.39  furthermore,  the  gis  specialist  must  be  able  to  train  patrons  to  use  the  software  to   complete  common  tasks  such  as  file  format  conversion,  data  projection,  data  manipulation,  and   geoprocessing.  these  geospatial  technologies  involve  a  steep  learning  curve,  and  unfortunately   “hands-­‐on  training  options  outside  the  university  are  often  cost-­‐prohibitive”  for  many.40  the   campus  community  requires  training  opportunities  to  be  both  convenient  and  inexpensive.     information  technology  and  libraries  |  march  2015   48   teaching  hands-­‐on  geospatial  technology  workshops,  from  basic  to  the  advanced,  is  fundamental   to  educating  the  campus  community.  workshops  will  “vary  from  institution  to  institution,  with   some  offering  students  an  introduction  to  mapping  and  others  focusing  on  specific  features  of  the   program,  such  as  georeferencing,  geocoding,  and  spatial  analysis.  some  also  offer  workshops  that   are  theme  specific,”  such  as  “working  with  census  data”  or  “digital  elevation  modeling.”41  custom   workshops  or  training  sessions  can  be  developed  to  meet  a  specific  campus  need,  tailored  for  a   specific  class  in  consult  with  an  instructor,  or  designed  especially  for  other  library  staff.     today’s  geographic  information  service  center   the  academic  map  librarian  from  the  1970s  or  1980s  would  hardly  recognize  todays’  geographic   information  service  center.  what  was  once  a  room  of  map  cases  and  shelves  of  atlases  and   gazetteers  is  now  a  bustling  geospatial  center.  computers,  powerful  gis  and  remote-­‐sensing   technologies,  gps  devices,  digital  maps,  and  data  are  now  available  to  library  patrons.  every   library  surveyed  provides  gis  software  to  campus  users,  and  85%  also  actively  collect  gis  and   remotely  sensed  data.  with  the  assistance  of  expertly  trained  library  staff,  users  with  no  or  limited   experience  using  geospatial  technologies  are  enabled  to  analyze  spatial  data  sets  and  create   custom  maps  for  coursework,  projects,  and  research.  
nearly  all  surveyed  libraries  (94%)  have   staff  that  can  assist  students  specifically  with  software  use  for  class  assignments  and  projects,   while  90%  provide  assistance  with  more  generalized  use  of  the  software.  a  majority  of  libraries   also  offer  a  variety  of  software  training  sessions,  workshops,  and  give  presentations  to  the  campus   community.  all  this  is  made  possible  through  the  library’s  commitment  to  this  service  area  and  the   availability  of  highly  trained  professional  staff,  most  who  hold  a  masters  or  doctoral  degree.  the   library  has  truly  established  itself  as  the  go-­‐to  location  on  campus  for  spatial  mapping  and  analysis.   this  role  has  only  strengthened  in  the  years  since  the  launch  of  the  arl  gis  literacy  project  in   1992.   references   1.     d.  kevin  davie  et  al.,  comps.,  spec  kit  238:  the  arl  geographic  information  systems  literacy   project  (washington,  dc:  association  of  research  libraries,  office  of  leadership  and   management  services,  1999),  16.   2.   ibid.,  3.   3.   ibid.,  i.   4.   abraham  parrish,  “improving  gis  consultations:  a  case  study  at  yale  university  library,”   library  trends  55,  no.  2  (2006):  328,  http://dx.doi.org/10.1353/lib.2006.0060.     5.     eva  dodsworth,  getting  started  with  gis:  a  lita  guide  (new  york:  neal-­‐schuman,  2012),  161.   6.   davie  et  al.,  spec  kit  238,  i.     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   49   7.   eva  dodsworth  and  andrew  nicholson,  “academic  uses  of  google  earth  and  google  maps  in  a   library  setting,”  information  technology  &  libraries  31,  no.  2  (2012):  102,   http://dx.doi.org/10.6017/ital.v31i2.1848.   8.   davie  et  al.,  spec  kit  238,  8.   9.   gregory  h.  march,  “surveying  campus  gis  and  gps  users  to  determine  role  and  level  of   library  services,”  journal  of  map  &  geography  libraries  7,  no.  2  (2011):  170–71,   http://dx.doi.org/10.1080/15420353.2011.566838.   10.   davie  et  al.,  spec  kit  238,  5.     11.   george  j.  soete,  spec  kit  219:  transforming  libraries  issues  and  innovation  in  geographic   information  systems.  (washington,  dc:  association  of  research  libraries,  office  of   management  services,  1997),  5.   12.   camila  gabaldón  and  john  repplinger,  “gis  and  the  academic  library:  a  survey  of  libraries   offering  gis  services  in  two  consortia,”  issues  in  science  and  technology  librarianship  48   (2006),  http://dx.doi.org/10.5062/f4qj7f8r.   13.   davie  et  al.,  spec  kit  238,  5.   14.   soete,  spec  kit  219,  9.   15.   davie  et  al.,  spec  kit  238,  10.   16.   dodsworth,  getting  started  with  gis,  165.   17.   davie  et  al.,  spec  kit  238,  9.   18.   d.  n.  phadke,  geographical  information  systems  (gis)  in  library  and  information  services  (new   delhi:  concept,  2006),  36–37.   19.   ibid.,  13.   20.   ibid.,  74.   21.   rhonda  houser,  “building  a  library  gis  service  from  the  ground  up,”  library  trends  55,  no.  2   (2006):  325,  http://dx.doi.org/10.1353/lib.2006.0058.   22.   melissa  lamont  and  carol  marley,  “spatial  data  and  the  digital  library,”  cartography  and   geographic  information  systems  25,  no.  3  (1998):  143,   http://dx.doi.org/10.1559/152304098782383142.     
information  technology  and  libraries  |  march  2015   50   23.   carolyn  d.  argentati,  “expanding  horizons  for  gis  services  in  academic  libraries,”  journal  of   academic  librarianship  23,  no.  6  (1997):  463,   http://dx.doi.org/10.1559/152304098782383142.   24.   soete,  spec  kit  219,  11.   25.   carol  cady  et  al.,  “geographic  information  services  in  the  undergraduate  college:   organizational  models  and  alternatives,”  cartographica  43,  no.  4  (2008):  249,   http://dx.doi.org/10.3138/carto.43.4.239.   26.   houser,  “building  a  library,”  325.   27.   r.  b.  parry  and  c.  r.  perkins,  eds.,  the  map  library  in  the  new  millennium  (chicago:  american   library  association,  2001),  59–60.   28.  patrick  florance,  “gis  collection  development  within  an  academic  library,”  library  trends  55,   no.  2  (2006):  223,  http://dx.doi.org/10.1353/lib.2006.0057.   29.   houser,  “building  a  library,”  325.   30.   ibid.,  323.   31.   ibid.,  322.   32.   parrish.  “improving  gis,”  329.   33.   ibid,  336.   34   florance,  “gis  collection  development,”  222.   35.    soete,  spec  kit  219,  6.   36.    dodsworth,  getting  started  with  gis,  165.   37.   soete,  spec  kit  219,  8.   38.   gabaldón  and  repplinger,  “gis  and  the  academic  library.”   39.   dodsworth,  getting  started  with  gis,  164.   40.   houser,  “building  a  library,”  323.   41.   dodsworth,  getting  started  with  gis,  161–62.         geographic  information  and  technologies  in  academic  research  libraries  |  holstein   51   appendix   responding  institutions   arizona  state  university  libraries   university  of  michigan  library   auburn  university  libraries   michigan  state  university  libraries   boston  college  libraries   university  of  nebraska–lincoln  libraries   university  of  calgary  libraries  and  cultural  resources   new  york  university  libraries   university  of  california,  los  angeles,  library   university  of  north  carolina  at  chapel  hill  libraries   university  of  california,  riverside,  libraries   north  carolina  state  university  libraries   university  of  california,  santa  barbara,  libraries   northwestern  university  library   case  western  reserve  university  libraries   university  of  oregon  libraries   colorado  state  university  libraries   university  of  ottawa  library   columbia  university  libraries   university  of  pennsylvania  libraries   university  of  connecticut  libraries   pennsylvania  state  university  libraries   cornell  university  library   purdue  university  libraries   dartmouth  college  library   queen’s  university  library   duke  university  library   rice  university  library   university  of  florida  libraries   university  of  south  carolina  libraries   georgetown  university  library   university  of  southern  california  libraries   university  of  hawaii  at  manoa  library   syracuse  university  library   university  of  illinois  at  chicago  library   university  of  tennessee,  knoxville,  libraries   university  of  illinois  at  urbana-­‐champaign  library   university  of  texas  libraries   indiana  university  libraries  bloomington   texas  tech  university  libraries   johns  hopkins  university  libraries   university  of  toronto  libraries   university  of  kansas  libraries   tulane  university  library   mcgill  university  library   vanderbilt  university  library   university  of  manitoba  libraries   university 
of waterloo library university of maryland libraries university of wisconsin–madison libraries massachusetts institute of technology libraries yale university library university of miami libraries york university libraries

reports and working papers

inclusion of nonroman character sets

the following document was prepared by staff of the library of congress as a working paper for discussions on incorporating the techniques described into the marc communications format. the document defines the principles for inclusion of nonroman alphabet character sets in the marc communications format and the procedural changes needed to allow implementation of the principles. this technique was agreed upon at the marbi committee meeting on february 2, 1981. any questions on the description of the inclusion of nonroman character sets in the marc communications format should be addressed to: library of congress, processing services, attention: mrs. margaret patterson, washington, dc 20540.

1. introduction

the cataloging rules followed by american libraries favor recording the title page data in the original script when possible. this helps those who consult catalogs to read the most essential information about the book. (reading his or her name in romanized form is just as difficult for someone who knows arabic as reading your name when it's written in arabic.) the new cataloging rules also specify that names and titles in notes be given in their original script, aacr2 1.7a.3. technological advances have made it possible to provide many, if not all, nonroman alphabets in machine-readable cataloging records. oclc and rlin are in the process of enhancing their systems so they can handle some nonroman writing systems. the library of congress has entered into a cooperative agreement with rlin for the development and use of an augmented rlin system for east asian (i.e., chinese, japanese, and korean) bibliographic data. although the library itself will not be creating and distributing marc records with nonroman characters in the near term, the goal of this proposal is to define how these data can be included now so others can do so soon. the technique known as an escape sequence announces that the codes which follow will represent letters in a specific different alphabet instead of the roman letters the codes would otherwise stand for.

2. principles

the following principles will govern inclusion of other alphabets in marc records. note that these deal only with the marc communications format record, not the details of its processing (keying, sorting, display, etc.) by any bibliographic agency or utility. these principles are a slightly revised version of ones reviewed and approved in principle by the marbi character set committee in 1976. the earlier version was also distributed that year as working paper n77 of iso tc46/sc4/wg1.
(1) standard character sets should be used when available.
(2) standard escape sequences should be used when available.
(3) escape sequences should be used only when needed.
(4) escape sequences are locking within a subfield but revert at any delimiter or field or record terminator code.
example (for demonstration purposes only, ec represents escape to cyrillic, ea escape to ascii, and f the field terminator):
245 10$aecrussian title proper :$becrussian subtitle.f
not
245 10$aecrussian title proper :ea$becrussian subtitle.eaf
and not
245 10$aecrussian title proper :$brussian subtitle.f
(5) records which contain an escape sequence will also contain a special field which specifies what unusual character sets are present.

3. implementation

the following will be done to realize these principles.
• the ala character set will be redefined (see table 1).
• a new character sets present field will be defined.
• details of application such as distribution, filing indicator values, etc., will be defined.

3.1 discussion: ala character set

a character set is a list of characters with the code used to represent each one. using this definition, the ala character set as given in appendixes iii.b and iii.c of marc formats for bibliographic data actually consists of eight character sets.
(1) ascii and ala diacritics and special characters with their eight-bit code.
(2) superscript zero to nine, plus, minus, open and close parentheses with their eight-bit code.
(3) subscript zero to nine, plus, minus, open and close parentheses with their eight-bit code.
(4) greek lowercase alpha, beta, and gamma with their eight-bit code.
(5-8) the same characters with their six-bit codes.

table 1. proposed revised ala character set (code chart not reproduced)

the six-bit character sets are used to distribute marc records on seven-track tapes. there are very few subscribers. it is unlikely that a method can be devised for distribution of nonroman character set records on such tapes. the present seven-track subscribers should be asked if they know of any way to do so. if they do not, the alternatives are to cease distribution of seven-track tapes entirely or to limit them to those records containing only roman alphabet characters (those without a character sets present field). in the latter case, they should pay proportionately less for their subscription.

the present four eight-bit character sets and their escape sequences do not conform to present standards. the present standards did not exist when the character sets were being defined. to avoid creating and distributing records containing both standard and nonstandard character sets and stan- […]. escape sequences would be given where needed in data fields. if necessary, it is permissible to embed escape sequences within a word. for example, a latin diacritic might be needed with an extended cyrillic letter to represent a letter in one of the nonslavic languages of central asia which uses the cyrillic alphabet.
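to make the locking behavior of principle (4) concrete, the sketch below walks a field's data and tracks which character set is in force, reverting to ascii at every subfield delimiter or terminator, and it collects the distinct escape sequences that the character sets present field of principle (5) would report. it is a simplified python illustration: the marc control characters are the standard ones, but the two-character sequence bodies and the set names attached to them are assumptions for demonstration, not registered iso escape sequences.

```python
# simplified illustration of principles (4) and (5): escape sequences lock
# within a subfield, revert at any delimiter or terminator, and each distinct
# sequence found would be reported once in a character sets present field.
# the sequence bodies ")w" / ")x" and their names are assumptions.
ESC = "\x1b"   # escape
US = "\x1f"    # subfield delimiter
FT = "\x1e"    # field terminator

SET_NAMES = {")w": "extended cyrillic", ")x": "greek"}

def scan_field(field_data):
    """return (character, active set) pairs plus the distinct escape sequences seen."""
    active, annotated, seen = "ascii", [], []
    i = 0
    while i < len(field_data):
        ch = field_data[i]
        if ch == ESC:                         # locking shift to another set
            body = field_data[i + 1:i + 3]    # assume two-character sequence bodies
            active = SET_NAMES.get(body, "unknown")
            if body not in seen:
                seen.append(body)             # each sequence reported only once
            i += 3
            continue
        if ch in (US, FT):                    # revert at delimiter or terminator
            active = "ascii"
        annotated.append((ch, active))
        i += 1
    return annotated, seen

# the escape is repeated in $b, as in the correct example under principle (4)
field = (US + "a" + ESC + ")w" + "russian title proper :" +
         US + "b" + ESC + ")w" + "russian subtitle." + FT)
annotated, present = scan_field(field)
print(present)   # [')w'] -> what a character sets present field would carry
```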
in addition to escape sequences for nonroman alphabets described above, in which one code stands for one letter, the escape standards also define escape sequence procedures for changing to multiple-byte character sets. because the ideographic writing systems of east asia use thousands of different characters, it will be necessary to use two or three bytes/codes to identify a single specific character uniquely. the japanese industrial standard character set, jis 6226, uses two bytes per character, and it has been submitted to iso to obtain a registered escape sequence. the first volume of the chinese character code for information interchange, cccii, has been issued; the second is expected in december. it uses three bytes per character. in all probability the lc/rlin east asian cooperative project will adopt either these character sets and their escape sequences or machine-reversible adaptations of them. the need to expand east asian character sets constantly to provide for infrequently used characters poses problems whose solutions cannot be predicted at this time.

table 3. escape sequence character sets: gost 13052-67 russian and iso dis 5427 extended cyrillic (code charts not reproduced)

3.3 discussion: character sets present field

as specified in the sixth principle, there is need for a special field which specifies what character sets are present whenever a set other than ascii and the ala extension of ascii is present in a record. the proposed field will use tag 066 and be defined as follows:

066 character sets present
this field specifies what character sets are present in the record other than ascii and the ala extension of ascii. the field is not repeatable. both indicators are unused and will contain blanks.
$a this subfield will contain all but the first character of the escape sequence to the default character set in columns 2-7 whenever the default character set is not ascii. this is not likely to occur in records created in the united states. since there can only be one default character set, the subfield is not repeatable.
$b this subfield will contain all but the first character of the escape sequence to the default character set in columns 10-15 whenever the default character set is not the ala extension of ascii. this is not likely to occur in records created in the united states. since there can be only one default extension character set, this subfield is not repeatable.
$c this subfield will contain all but the first character (or all but the first if a longer escape sequence is used) of every escape sequence found in the record. if the same escape sequence occurs more than once, it will be given only once in this subfield. the subfield is repeatable. this subfield does not identify the default character sets.

example (indicators blank): a record containing the iso extended cyrillic character set would carry $c)w; a record containing both the iso greek and extended cyrillic character sets would carry $c)w$c)x.

3.4 discussion: other details
when a field has an indicator to specify the number of leading characters to be ignored in filing and the text of the field begins with an escape sequence, the length of the escape sequence will not be included in the character count. when fields contain escape sequences to languages written from right to left, the field will still be given in its logical order. for example, the first letter of a hebrew title would be the eighth character in a field (following the indicators, a delimiter, a subfield code, and a three-character escape sequence). the first letter would not appear just before the end of field character and proceed backwards to the beginning of the field. a convention exists in descriptive cataloging fields that subfield content designation generally serves as a substitute for a space. an escape sequence can occur within a word, after a subfield code, or between two words not at a subfield boundary. for simplicity, the convention that an escape sequence does not replace a space should be adopted. one other convention is also advocated: when a space, subfield code, or punctuation mark (except open quote, pareports and working papers 215 renthesis or bracket) is adjacent to an escape sequence, the escape sequence will come last. wayne davison of rlin raised the following issue. after the library of congress has prepared and distributed an entirely romanized cataloging record for a russian book, a library with access to automated cyrillic input and display capability will create a record for the same book with the title in the vernacular. (since aacr2 says to give the title in the original script "wherever practicable," the library could be said to be obligated to do so.) in such an event the local record could have all the authoritative library of congress access points. to keep this record current when the library of congress record is revised and redistributed, it would be necessary to carry the lc control number in the local record . most automated systems are hypersensitive to the presence of two records with the same control number. the two records can be easily distinguished: in the library of congress record, the modified record byte in field 008 will be set to "o" and it will not have any 066, character sets present field. a comparison of oclc, rlg/rlin, and wln university of oregon library the following comparison of three major bibliographic utilities was prepared by the university of oregon library's cataloging objectives committee, subcommittee on bibliographic utilities. members of the subcommittee were elaine kemp, acting assistant university librarian for technical services; rod slade, coordinator of the library's computer search service; and thomas stave, head documents librarian. the subcommittee attempted to produce a comparison that was concise and jargonfree for use with the university community in evaluating the bibliographic utilities under consideration. the university faculty library committee was enlisted to review this document in draft jorm and held three meetings with the subcommittee for that purpose. the document was also shared with library faculty and staff in order to elicit suggestions for revision. president's column 114 information technology and libraries | september 2006 b eing president of a dynamic organization like lita is truly a humbling experience. every day i am awestruck by the dedication, energy, creativity, and excitement exhibited by lita’s members. 
i see it in everything that lita does, from its stellar publications and communications—including this journal, ital—to its programming and contribution to standards and system development. none of this would be possible without the hard work of all the dedicated members who volunteer their time not only to advancing their own professional development, but also to advancing the profession. thank you all. for forty years now, lita members have been dedicated to the association’s work, and we have been celebrating our fortieth anniversary throughout 2006. the celebration continues as we prepare to convene in nashville for the ninth lita national forum, october 26– 29, 2006. lita has had a long tradition of providing quality conferences. the first, held in 1970, was the conference on interlibrary communications and information networks, more familiarly known as the “airlie conference,” which had published proceedings. the second was a cooperative effort held in 1971 with the library education division and the american society for information science (asis), entitled “directions in education for information science: a symposium for educators.” in later years, lita held three national conferences: baltimore (1983), boston (1988), and denver (1992). in 1996, lita and the library administration and management association (lama) held a joint conference in pittsburgh. while the national conferences were very successful, the idea of a more informal, intimate event to be held annually took form, and in 1998 lita held its first annual national forum. next year we will continue the tradition of successful conference programming as we celebrate the tenth anniversary of the lita national forum in denver. this year’s theme is “netville in nashville: web services as library services.” we have an exciting lineup of keynote and concurrent-session speakers as well as several poster-session presenters who will stimulate lively discussions in all of the wonderful, informal networking opportunities this small conference offers. the sponsor showcase allows plenty of time for attendees to talk to our valued sponsors and learn more about their products. the two preconference programs offer in-depth experiences: “opensource installfest” and “developing best project management practices for it projects.” lita bloggers will be out in force producing summaries and reactions to it all. one of lita’s strongest membership benefits is the personal networking opportunities it provides. by providing an informal and enjoyable atmosphere, the national forum is one of the best places to network with others dealing with the same issues as you. i hope to see you there. besides the national forum (just one of lita’s many educational programs), one of the things i like most about lita is its flexibility to quickly accommodate programming to cover the latest issues and trends. lita’s programming at ala annual conferences attracts attendees from all divisions for this reason. every year, the highly successful top technology trends attracts more and more people who come to listen to the experts speak on the latest trends. the lita interest groups, like the technologies they focus on, also exhibit great flexibility because they can come and go—it’s easy to locate a few other members to create a new group where interested parties can come together for focused discussions or formal presentations. 
since its inception, lita has had traveling educational programs to provide programming opportunities for people who cannot attend the ala conferences. these in-depth programs, now called the regional institutes, focus on a topic and are offered as long as that issue is relevant. look for new electronic delivery of lita programs in the future. of course, lita’s publications provide a very lasting educational component. lita launched journal of library automation (jola), the predecessor of ital, in 1968, one year after the formation of the new division of ala. jola and, later, ital have consistently been a place for library information technologists to publish in a peer-reviewed scholarly journal. these well-respected publications have had a wonderful group of editors and editorial boards over the years. we are pleased that ital is now available online for members from the moment of publication. i want to thank all the people who work so hard to produce this publication on a quarterly basis. i also want to thank all the authors who submit their research for publication here and make a lasting contribution to the profession. all of these programs are just a sampling of what lita provides its members. is it any wonder i am awed by it all? i hope you are as well. i also hope that, in my year as your president, you will communicate with me in an open dialogue on the lita blog, via e-mail, or in person at conferences regarding how lita can better meet your needs as a member. we have been focusing a great deal on our educational goal because that is what we have heard you want out of lita. i encourage you to let me and the rest of the lita board know how we can best deliver a quality set of educational programs. president’s column bonnie postlethwaite bonnie postlethwaite (postlethwaiteb@umkc.edu) is lita president 2006/2007 and associate dean of libraries, university of missouri–kansas city. data center consolidation at the university at albany rebecca l. mugridge and michael sweeney information technology and libraries | december 2015 18 abstract this paper describes the experience of the university at albany (ualbany) libraries’ migration to a centralized university data center. following an introduction to the environment at ualbany, the authors discuss the advantages of data center consolidation. lessons learned from the project include the need to participate in the planning process, review migration schedules carefully, clarify costs of centralization, agree on a service level agreement, communicate plans to customers, and leverage economies of scale. introduction data centers are facilities that house servers and related equipment and systems. they are distinct from data repositories, which collect various forms of research data, although some data repositories are occasionally called data centers. many colleges and universities have data centers or server rooms distributed across one or more campuses, as does the university at albany (ualbany). this paper reports on the experiences of the libraries at ualbany as the libraries’ application and storage servers were consolidated into a new, state-of-the-art, university data center in a new building on campus. the authors discuss the advantages of consolidation, the planning process for the actual move, and lessons learned from the migration. background the university at albany is one of four university centers that are part of the state university of new york (suny) system. 
founded in 1844, ualbany has approximately 13,000 undergraduates, 4,500 graduate students, and more than 1,000 faculty members. it offers 118 undergraduate majors and minors, and 138 master’s, doctoral, and certificate programs. ualbany resides on three campuses: uptown (the main campus), downtown, and east.1 the uptown campus was built in the 1960s on grounds formerly owned by the albany country club. the campus was designed by noted architect edward durell stone in 1962–63 and was built in 1963–64. the campus buildings include four residential quadrangles surrounding a central “academic podium” consisting of thirteen three-story buildings connected on the surface by an overhanging canopy and below ground by a maze of tunnels and offices. many of the university’s classrooms, lecture halls, academic and operational offices, and infrastructure are housed within the podium on the basement or subbasement levels. this includes the university’s original data center, which is located in a basement room in the center of the podium. rebecca l. mugridge (rmugridge@albany.edu) is interim dean and director and associate director for technical services and library systems, and michael sweeney (msweeney2@albany.edu) is head, library systems department, university libraries, university at albany, albany, new york. mailto:rmugridge@albany.edu mailto:msweeney2@albany.edu data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 19 while visually striking and unique, the architectural design of the podium presented many challenges since its construction, one of which is regular flooding of the basement and subbasement levels. the original data center was flooded many times, to the extent that any heavy rainstorm had the potential to disrupt functionality and connectivity. when the university was first built in the 1960s it was not known to what extent computing would become part of the university’s infrastructure, and the room that the data center was housed in was not built to today’s standards for environmental control, such as the need for cooling. at the same time, server rooms sprouted all over the university, with many of the colleges and other units purchasing servers and maintaining server rooms in less than ideal conditions. these included server rooms in the college of arts and sciences, the school of business, the athletics department, the university libraries, and many other units. university libraries’ server room the university libraries maintained its own server room with two racks full of equipment that supported all of the libraries’ computing needs. these servers supported our website, mssql and mysql databases, ezproxy, illiad (interlibrary loan service), ares (electronic reserve service), and our search engine appliance (google mini). they also included our domain controller, intranet, and several servers used for backup. two servers and a storage area network housed our virtual environment, containing an additional nine virtual servers. these included servers to support library blogs, wikis, file storage, development and test servers, and additional backup servers. the only library servers not housed in the libraries’ server room were the integrated library system (ils) servers that were maintained primarily by the university’s information technology services (its) staff, our backup domain controllers, and a server holding backups of our virtual servers. 
the ils production server was housed in ualbany’s data center and the ils test/backup server was housed in the alternate data center in another building on campus. also, two of the libraries’ backup servers for other applications were housed in the university data center. the libraries’ server room consisted of a 340 square foot room on the third floor of the main campus library that was networked to support servers housed in two racks protected by a fire suppression system. there were two ceiling-mounted air conditioning units that cooled the room sufficiently for optimum performance. the libraries’ windows system administrator’s office was nearby and had a connecting door to the server room, giving him ready access to the servers when needed. data center consolidation data center consolidation is defined as “an organization's strategy to reduce it assets by using more efficient technologies. some of the consolidation technologies used in data centers today include server virtualization, storage virtualization, replacing mainframes with smaller blade server systems, cloud computing, better capacity planning and using tools for process automation.”2 in addition to the investigation and use of these technologies, the planning for a information technology and libraries | december 2015 20 new data center often involves the construction of a new building or the renovation of a current building. there were several drivers behind the ualbany’s decisions to build a new data center. in addition to the concerns mentioned above about the potential flooding risk of the current data center, the ability to manage optimum temperature was also a factor. the current data center was built to house 1960s-era equipment and was not able to keep up with the cooling requirements of the more extensive computing equipment in use in the twenty-first century. the current data center also occupied what is considered prime real estate at the university, at the center of campus and near the lecture center, which experiences high foot traffic during the academic year. the new data center was constructed near the edge of campus, with little foot or auto traffic, allowing the space previously occupied by equipment to be repurposed in a way that better meets the university’s needs. like many other universities, ualbany is increasingly making use of cloud computing capabilities. for example, the email and calendaring system are cloud-based. nevertheless, this movement is being made in a deliberate and thoughtful way, leaving many of our administrative computing needs reliant on the use of physical servers. ualbany and the libraries have decreased the number of physical servers necessary by relying on a virtualized environment, and part of the project to move to the new data center included a conversion from physical to virtual servers. the libraries’ ils production and test servers remain physical, as do several of the other libraries’ application servers. many of the libraries’ backup servers are now virtual. while there was no official mandate to consolidate all of the distributed server rooms across campus into the new data center, everyone involved understood that this was a direction the university administration supported. the libraries’ dean and director also supported this effort on behalf of the libraries and charged libraries’ staff to collaborate with its to make this happen. 
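a consolidation of this kind typically rests on a machine-readable inventory of servers, the applications they host, and the dependencies between those applications. the sketch below is a minimal, hypothetical illustration of using such an inventory to order a migration so that each application is brought back up only after the services it depends on; the application names and dependency list are invented for illustration (the ares-on-blackboard dependency echoes one mentioned later in this article) and do not describe ualbany's actual inventory.

```python
# hypothetical inventory: application -> the applications it depends on.
DEPENDS_ON = {
    "blackboard": [],
    "ares": ["blackboard"],   # electronic reserves integrated into the course system
    "ils": [],
    "website": ["ils"],
    "ezproxy": [],
}

def move_order(depends_on):
    """return an order in which to restore applications so that every
    dependency is back up before the application that needs it."""
    order, done = [], set()

    def visit(app, path=()):
        if app in done:
            return
        if app in path:
            raise ValueError(f"circular dependency involving {app}")
        for dep in depends_on.get(app, []):
            visit(dep, path + (app,))
        done.add(app)
        order.append(app)

    for app in depends_on:
        visit(app)
    return order

print(move_order(DEPENDS_ON))
# ['blackboard', 'ares', 'ils', 'website', 'ezproxy']
```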
some of the drivers behind this decision include the promise of a better environment, improved security, backup generators for computing equipment, the use of its’s virtual environment, the automation of server management, a faster network, the ability to repurpose the libraries’ server room, and more. these drivers are described in more detail later in this paper. construction planning for ualbany’s new data center began in the mid-2000s and included the identification of funding and the architectural design of the new building, later to be named the information technology building (itb). the actual construction began in 2013, with an estimated completion date of february 2014 and occupancy in april 2014. unexpected challenges during construction delayed the timeline somewhat, and the construction was not completed until may 2014. the certificate of occupancy was granted in fall 2014. the data center is certified as tier iii by uptime institute,3 and the building is designated leed gold. data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 21 alternate data center simultaneously with the construction of the new data center, the university entered into an agreement with another suny institution to house our alternate data center. this center was originally housed in another building on the ualbany campus, less than a mile from both the main data center and itb, in a building leased by ualbany. this situation left some environmental issues out of our control, not an ideal situation. for example, an air conditioner failure in fall 2013 caused our backup and test ils server to be down for six days, affecting our ability to use that server for other purposes and holding up several projects. in addition, data center best practice calls for an alternate data center to be housed at a distance from the main data center. in february of 2014, the servers in the alternate data center were moved to their new location. this included the libraries’ backup ils server as well as two backup servers formerly housed in the main data center. advantages to the libraries moving to the university data center there were many advantages to the libraries moving to a centralized data center. many of these advantages also applied to the other units considering a move to the new data center, but for the purposes of this paper, we are addressing them in the context of the libraries’ experience. repurpose space the libraries’ server room occupied a large office that could be repurposed to house multiple staff offices or student spaces. the libraries have many group study rooms available for student use; however, they are in great demand, and the possibility of gaining more space for student use was seen as an advantage to making the move to a new data center. climate control the new data center is built on a raised floor that allows better air circulation. hundreds of servers and other pieces of equipment create a lot of excess heat, and raised floor construction allows for better circulation of air. new racks have chimneys that exhaust heat from high-density computing environments. air conditioners supply a constant stream of air that will maintain the optimum temperature for computing equipment. censors continually monitor humidity and keep it at an optimal level. 
this was an improvement over the libraries’ current server room, which had sufficient air conditioning for our relatively small number of physical servers but did not have backup generators to keep equipment running during a power outage. backup generators the new data center was built with two backup generators. if the building suddenly loses power, the backup generators will immediately start and provide a seamless source of energy. a secondary benefit to the university is that the backup generators can also provide a source of energy to other buildings on that side of campus; this area did not previously have a backup source of energy. again, the libraries’ server room did not have a redundant electrical supply. in information technology and libraries | december 2015 22 the event of a power outage, battery units would allow the servers to shut down properly if the outage lasted more than forty-five minutes. security with server rooms scattered all over the university, security issues were a concern. now that the servers are housed in one location, the university can provide a highly secure environment in a more cost effective way. the new data center has card-swipe access to the building and biometric access to the data center itself. there are also cameras installed in the building as a further security measure. virtual environment although the libraries have made strides toward moving into a virtualized environment in the past few years, we had many constraints on our ability to keep up with developments. the libraries’ virtual environment was two versions behind ualbany’s virtual environment, and the storage needs of the libraries’ virtual environment were at capacity. part of the incentive to moving into the new data center was the ability to downsize some of our physical equipment and migrate some of our physical servers to virtual equivalents. automation of server management one of the benefits of consolidating servers into one environment is that they are in a secure location, but it is still possible to manage them from a distance. the virtual environment has a web-based console that allows system administrators to connect and manage them, and the physical servers can be managed over the network as well. even though the servers are centralized, our system administrator can work from an office in the library, or from home if needed. faster network part of the project to construct a new data center included the installation of an additional fiber network across campus. the new fiber network connects all buildings on campus with each other and the new data center. all of the network equipment was upgraded, providing faster connections and response time. the additional fiber network is fault tolerant: if the primary network fails, the second fiber network can immediately take its place with no loss of service. staging and work room the new data center was designed to include a staging and work room. this can be used by any of the system administrators who are responsible for equipment housed in the data center, and it allows them to work on equipment in a room adjacent to the locked and secure data center. data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 23 equipment inventory part of the planning for the migration involved creating a detailed inventory of equipment. the libraries already had a server inventory, but the information collected for the migration went far beyond just a list of the servers. 
this helped us identify who was responsible for physical and virtual servers and who was responsible for the services and applications that ran on those servers. creating the equipment inventory also allowed us to consolidate and decommission equipment that was no longer needed, and additionally helped us determine a prioritization and time line for the move. applications inventory in addition to creating an equipment inventory, the libraries created an applications inventory that included information about the dependencies that applications had on each other. for example, the libraries’ electronic resources reserve application (ares) had been integrated into the university’s course management system, blackboard. that meant that when blackboard was inaccessible, ares was as well. all of these dependencies had to be taken into account when planning the schedule for the move. disadvantages to the libraries moving to the university data center the libraries have noted few disadvantages to moving to the new data center. what might seem like disadvantages are in reality just a change in the way we do our work. for example, we have been asked to inform someone in its before we go to the new data center to work on a server. this is a simple step, and has not hindered our work at any time. another change is the need to use a tool created by its to configure our virtual servers, and we found that the tool has been configured to give us fewer administrative options than what its staff have. this has reinforced our understanding that we need to be present and proactive in representing the libraries’ interests in managing all of our computing equipment and software. migration days the majority of ualbany’s servers were moved from the main data center to the new one in itb on august 9, 2014. however, we were unwilling to move all of the libraries’ servers on that day, which fell in the middle of the summer session. a compromise was reached between the libraries and its that allowed many of the libraries’ less mission-critical servers to be moved on the same day as the university’s servers. these servers were primarily ones that were used for development and backup purposes, one exception being the server that supported the libraries’ electronic reserves service. this server was dependent on the university-supported blackboard server, which was being moved on august 9, so the libraries’ agreed to move this server that day so there would not be two downtimes for the electronic reserve system. the libraries’ most critical servers were moved to itb on august 18, 2014. this was the first day of intersession and would affect students and faculty the least. there were many people involved information technology and libraries | december 2015 24 in the move, including the library systems staff, the migration consulting firm staff, the professional moving company that was hired to carry out the move, and its staff who were responsible for the network and other support. move activities included shutting down and backing up applications, powering off the servers, and packing the equipment. at itb the equipment was unpacked, placed in its assigned rack location, plugged in, and powered on. then each server had to be started, and applications tested. all of this activity began at 3:00 a.m. and continued until early afternoon. the day concluded with a conference call between all parties involved to confirm that everything was up and running as expected. 
lessons learned participate in the process the libraries were invited to participate in the planning for a new data center early in the process. its, ul, and other units with significant server collections met and discussed their computing needs and respective computing infrastructures. once the construction of itb began, the planning ramped up and monthly meetings of stakeholders became weekly meetings. agendas for these meetings included round robin reports about • construction project oversight; • migration consulting; • partnerships (with other units on campus, including the libraries); • status of our alternate data center (housed 10 miles away at another suny institution); • campus fiber network; • internal wiring and network design; • administrative computing planning and move; • research computing planning and move; • systems management (storage and virtual environment) planning and move; • data center advisory structure; and • campus notification and public relations. these meetings gave us an opportunity to learn about and understand all aspects of the data center migration project. participants reviewed project timelines and other documents that were housed on a shared wiki space. after the data center migration consultants were hired, they began to use the microsoft onedrive collaboration space to share and distribute documents. meeting regularly with all project participants allowed us to ask questions to clarify priorities and timelines and to advocate for the libraries’ needs. review schedules carefully as with many construction projects, unexpected delays in the construction of the data center delayed all of our plans. originally the building was to be completed in february; this was later data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 25 changed to april and then may. after the construction was complete, the building had to be commissioned, which means that every system within the building had to be tested independently by outside inspectors. coordination of this work is very time consuming, and the completion of the commissioning delayed occupancy by another few months. the university was finally given permission to move equipment into the data center in july. in the meantime, our consultants were working feverishly to develop timelines for the move, identify, and secure a contract with a professional it moving company, and create “playbooks” for each move. the playbook is a document that includes • names and contact information of everyone involved with the move; • sequence of events: an hour-by-hour description of all activities; • server overview: including the name, make, model, rack location and elevation, and contact person for each server; and • schematics of both old and new server locations including details about each server as well as rack locations and elevations. library staff became concerned when the original date scheduled to move most of the university’s application servers, including the libraries’ ils server, was in the middle of the summer session. although the projected downtime was only to be twelve hours (and probably fewer), library staff were not willing to have twelve hours’ downtime during a short four-week summer session. there were concerns that downtime, not only to the online catalog, but also to all of the libraries’ databases, the website, online reference service, electronic reserves, and other resources would present a severe hardship to faculty and students. 
we also recognized the risk, however small, of something going wrong during the move that would cause a lengthier downtime. at the same time the university was concerned about pushing the move too close to the start of the fall semester, as well as the increased cost of scheduling a second move date. during these negotiations it became apparent that the libraries’ needs are different from administrative computing needs. whereas the middle of a semester is a poor time for libraries’ servers to experience downtime, it can be a better time for administrative computing, which is often busier during intersession when grading reports are being run and personnel databases are being updated. ultimately, the libraries advocated for and secured an agreement for a second move date, scheduled for the first work day after the end of the summer session. similarly, its was encouraging all of its partners across the campus to move as much computing as possible into their virtual environment. this is a worthwhile goal, but again the libraries had to negotiate to make this change according to the schedule best for the libraries and its users. the its virtual environment was a more current release of the virtual machine (vm) software than the libraries were using, so the libraries were faced with not only a migration, but also an upgrade. ultimately, we postponed the vm migration until after the physical migration, and we have information technology and libraries | december 2015 26 benefited from waiting. other partners have had to work through a number of kinks in the process, and the libraries’ vm migration has benefitted from the other partners’ experience. clarify costs of centralization when ualbany began to consider and plan for a centralized data center, one of the concerns raised by the various data center managers from units other than its was the cost of centralizing their servers in another location. centralized data centers have many costs: heating, cooling, security, staffing, cleaning, backup energy sources, networking costs, and more. the question on everyone’s mind was who was going to pay for these costs. would each unit have to pay toward the maintenance of the data center? some objected to the idea of having to pay to be a tenant in a centralized data center, when they already had their own data center or server room at what seemed like no cost. the only cost they experienced was an opportunity cost of what else they could use the server room for. in the libraries’ case, the server room could be used for group study, office space, or other purposes, but it did not cost the libraries money to use it as a server room because utilities are covered centrally by the university. on the other hand, by migrating some of our computing to the its virtual environment, we may save money in the long run because we will not have to replace hardware and pay warranty fees. after much negotiation the university settled on a five-year commitment to no charges for the partnering units on campus, including the libraries. this agreement was documented in a partnership agreement drafted by a group of representatives from all of the key units involved. contribute to the development of a service level agreement library staff contributed to the development of a service level agreement (sla) for our participation in a centralized data center. having an sla in place ensures that all parties to the agreement understand their rights and responsibilities. 
we began by searching other universities’ websites for samples of slas, which we shared with its staff who were assigned to this project. the establishment of a centralized data center includes several major elements: data center as a service (dcaas), infrastructure as a service (iaas), as well as the network that connects it all. the sla that was developed, still in draft form, has elements that address the following: • the length of the agreement • network uptime • infrastructure as a service o server/storage environment and technical support o access to iaas o file backup and retention o maintenance of partner systems o its scheduled maintenance o data ownership, security, responsibility, and integrity data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 27 o business continuity, tiering, and disaster recovery o availability and response time of its staff • data center as a service o environment and support o building access and security o physical rack space o deliveries o scheduled maintenance o communications • glossary we recommend that institutions considering data center consolidation projects complete their sla and other agreements before moving servers into a shared environment. in our case, however, we were unable to finalize the sla prior to the actual move. this was not because of any particular demands that the libraries were making, but was primarily because of the rapid approach of the deadline for moving into the new data center. it had to be completed before the beginning of fall semester, and preferably with a few weeks to spare in case anything went wrong. while the planning for the data center construction and migration seemed to stretch over a long period of time, the final few months turned into a frenzy of activity that ranged from last-minute construction details to nailing down the exact order in which thousands of pieces of equipment would be moved. although not every detail was ironed out at the time of the move, the intentions and spirit of the sla have been documented and it will be completed during 2015. communicate developments and plans during the planning and development for the data center migration project, we recognized that it would be important to communicate any changes to the libraries’ systems availability to our users. its also recognized the need to communicate such changes. both its and the libraries took a many-pronged approach to communicating developments and plans related to the migration. within the libraries we shared updates at library faculty meetings as well as meetings of the library policy group (the dean’s administrative policy team). we sought feedback from many groups on proposed move dates, establishing intersession as the preferred time to move any libraries’ servers that would affect access to resources used by faculty or students. as the moves got closer the communication efforts were ramped up. within the libraries, we posted alerts on the libraries’ webpage that linked to charts indicating what services would be unavailable and when. we also included slides on the libraries’ main webpage with the same information. the same slide was posted on all three libraries’ flat-screen monitors, on which we post important news and dates. we sent mass emails to all libraries’ staff that reminded them when services would be down. staff members who were responsible for specific services made an effort to contact their customers directly. 
for example, the head of access services contacted faculty members about the scheduled interruptions of ares, our electronic resources reserves system. some of the downtime affected just users, and other downtime also affected staff who could not work in the ils during the move. we planned alternate activities for staff members who could not work during the downtime and had a productive division cleanup day instead. its also made great efforts to communicate to the university community about the moves and any potential downtime. their efforts included mass emails to all faculty, staff, and students. its created and posted slides to the libraries’ flat-screen monitors, as well as other monitors throughout the university. its also formed a team of liaisons from each school and college, using that group as yet another conduit to communicate changes. they shared draft schedules, seeking input on the effect of downtime on the university’s functions.

leverage economies of scale

one of the challenges of maintaining a distributed data center environment is that each system administrator or unit had to manage its own servers singlehandedly. in the case of the libraries at ualbany, we had moved in the direction of using the power of virtualization to manage many of our servers. virtualization refers to the process of creating virtual servers within one physical server, thereby multiplying the value of a single server many times. the libraries had virtualized a number of library servers, saving money by not having to purchase additional costly physical servers. however, its, with its greater purchasing power, was using more current and advanced virtualization software, hardware, and services than the libraries. its created a suite of services that allows system administrators access to the virtual environment so they can manage their virtual servers from their own offices. by moving into the its virtual environment (iaas), the libraries are able to leverage the economies of scale that it presents.

conclusion

the consolidation of distributed data centers or server rooms on university campuses offers many advantages to their owners and administrators, but only minimal disadvantages. the university at albany carried out a decade-long project to design and build a state-of-the-art data center. the libraries participated in a two-year project to migrate their servers to the new data center. this included the hiring of a data center migration consulting firm and the development of a migration plan and schedule for the physical move, which took place in late summer 2014. the authors have found that there are many advantages to consolidating data centers, including taking advantage of economies of scale, an improved physical environment, better backup services and security systems, and more. lessons learned from this experience include the value of participating in the process, reviewing migration schedules carefully, clarifying the costs of consolidation, contributing to the development of an sla, and communicating all plans and developments to the libraries’ customers, including faculty, staff, and students. as other university libraries consider the possibility of consolidating their data centers, the authors hope that this paper will provide some guidance to their efforts.
technical note

help: the automated binding records control system

an interesting new aspect of library automation has been the appearance of commercial ventures established to provide for an effective use of the new ideas and techniques of automation and related fields. some of these ventures have offered the latest in information science research and development techniques, such as systems analysis, management planning, and operations research. others have offered services based on new procedures, for example, computer-produced book catalogs, selective dissemination of information services, indexing and abstracting activities, mechanized acquisitions, and catalog card production systems. one innovation is a new technique devised for libraries to reduce the clerical effort required to prepare materials for binding and to maintain the necessary related records. the technique is called help, the heckman electronic library program. it was developed by the heckman bindery of north manchester, indiana, with the cooperation of the purdue university libraries. it was recognized by heckman's management that the processing of 10,000 to 20,000 periodicals weekly and the maintenance of over 250,000 binding patterns would soon become too unwieldy and costly unless more efficient procedures were developed. it was additionally realized that any new system should also be designed as a means to aid libraries with their interminable record-keeping problems. the latter purpose could be accomplished by providing a library with detailed and accurate information regarding each periodical it binds, and by simplifying the library's method of preparing binding slips for the bindery. in the fall of 1969, after a detailed analysis, the heckman bindery management began the development and programming of a computerized binding pattern system. this system was a result of a team effort involving management, sales, and production departments. john pilkington, data processing manager, directed the installation of the system, and earl beal performed the necessary programming functions. in december of 1971 approximately 700 libraries were using the system, and about 100,000 binding patterns were in the data file. as the system was developed, a library's binding pattern data were converted to machine-readable form, which then made it possible for the bindery automatically to provide nearly complete binding slips for each periodical title bound. in addition, the system provides an up-to-date pattern record for the libraries' files, and the bindery maintains the resultant data bank of pattern records as the library notifies it of additions, changes, and deletions. in this manner, the bindery expects to establish an efficient method for purging the file of out-of-date information. the system revolves around four forms: the binding pattern index card, the binding slip, the variable posting sheet, and the binding historical record. the binding pattern index card (figure 1) is a 5" x 8½" card, pink in color, which is a computer printout.
one of these cards is retained in the library as its pattern record for each set of each periodical bound by the library. the data given on the card are essentially the same as those maintained by most libraries in their manual pattern files, except that more detail is provided by the help system, and the library does not maintain the record in machine-readable form; the bindery does. as changes are made to the patterns, the library clerk simply crosses out the old data on the appropriate binding slip and writes in the new data. when the bindery receives the binding slip, a new index card is produced, among other records, and forwarded to the library with the returned shipment of bound volumes. the system also provides for one-time changes that do not affect the pattern record. the data contained on the index cards include the library account number, the library branch or department code, the pattern number, color, type size, stamping position, title (vertical or horizontal spine positions), labels, call number, library imprint, and collating instructions. the collating instructions, which are listed in the instruction manual provided by the bindery, are given as a series of numeric codes. asterisks are used to indicate the end of a print line. the binding slips are also 5" x 8½" forms, but they are four-part multiple forms, of which three parts are sent to the bindery with the periodical to be bound, and one part, a card form, is retained by the library as its "at bindery" record. the information required by the binding slip is essentially the same as that included on the index card. the library, however, must provide the variable data such as volume number(s), date(s), month(s), or whatever information is required to identify a specific volume. the variable posting sheet (figure 2) is an 8½" x 11" form that is used by the library when it sends several volumes or copies of a volume to the bindery at the same time. since the bindery cannot determine beforehand the number of physical volumes of a title a library will want to send for binding at a given time, it sends to the library only one printed-out binding slip to be used for the next volume of a given serial.

[figure 1. binding pattern index card.]

[figure 2. variable posting sheet.]

if multiple volumes of
a set are to be bound, the library clerk provides the variable information for the first volume by using the single binding slip, and the variable data for each additional volume of the same title are posted by the clerk on the posting sheet. the bindery will automatically produce from its pattern data bank the binding slips necessary for binding the additional volumes that are listed on the posting sheet. the binding historical record (figure 3) is a form provided for the use of the library if it desires a permanent record of every volume bound. the use of this form is not required by the system; it is simply a convenience record for the library binding staff. the form is printed on the back of the pattern index card. spaces are provided for volume, year, and date sent to the bindery, and most of the back of the card is available for posting. all data fields are of fixed length, with the maximum size of the records at 328 characters. some of the data formats are shown in figure 4. a few of the data fields in the example need additional explanation. the fifth field, labeled "print," refers to the color of the spine stamping, i.e., gold, black, or white. the "trim #1 & 2" fields are for bindery use only, and indicate volume size within certain groups for printing purposes. the "spine" field is also for bindery use, and it indicates the size of type that can be used according to the width of the spine. "product no." refers to certain types of publications such as magazines, matched sets, or items which will be pamphlet (inexpensively) bound.

[figure 3. binding historical record. the form provides spaces for the title and publisher's address and columns for volume, year, and date sent.]

[figure 4. data formats, shown on 96-column print/punch program control card layouts.]
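the fixed-length record layout lends itself to straightforward machine processing. the python sketch below shows how such a record might be sliced into named fields; the field names echo those discussed above, but the widths and offsets are hypothetical, since the published layouts in figure 4 do not survive legibly in this copy.

```python
# hypothetical fixed-width layout for a HELP-style binding pattern record.
# the article states only that fields are fixed length and that a record is
# at most 328 characters; these names and widths are illustrative.
FIELDS = [
    ("account_no", 6),        # library account number
    ("library_code", 2),      # branch or department code
    ("pattern_no", 6),
    ("color", 3),
    ("print", 1),             # spine-stamping color: gold, black, or white
    ("trim_1", 2),            # bindery use: volume-size group
    ("trim_2", 2),
    ("spine", 2),             # bindery use: type size allowed by spine width
    ("product_no", 2),
    ("title", 60),
    ("call_number", 30),
    ("collating_codes", 40),  # numeric codes; asterisks end a print line
]

def parse_pattern_record(record: str) -> dict:
    """slice a blank-padded, fixed-width record into named fields."""
    values, offset = {}, 0
    for name, width in FIELDS:
        values[name] = record[offset:offset + width].rstrip()
        offset += width
    return values
```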
[figure 5. pattern printing setup.]

one additional form used in the system is for heckman's internal operations. that is a data input form known as the "pattern printing setup" (figure 5). this form is used by the bindery's input clerks to prepare new binding patterns for conversion to machine-readable form. the data prescribed by the form is much like that required by the binding pattern index card, except that data tags are shown for keypunching purposes. the system operates on an ibm system 3 computer with two 5445 disk drives and a 1403 n1 printer. the disk drives provide a total of 40,000,000 characters of on-line storage in addition to the 7,500,000 usable characters provided by the system 3 itself. five 5496 data recorders are used for data conversion. the programs are written in rpg2. the development of computer-oriented commercial services for libraries suggests that, perhaps, if librarians wait long enough, they will not have to automate their libraries, as commercial ventures will do it for them. the rapid appearance of systems-analysis firms, commercial and societal abstracting and indexing services, management and planning consulting groups, and data processing service bureaus tends to bear this theory out. at the very least, libraries will not be able to automate internally without providing for the incorporation of such ready services into their systems. when a service such as help is made available at no additional charge, there is no way for libraries to avoid automation.

donald p. hammer

donald p. hammer is associate director for library and information systems, university of massachusetts library, amherst. at the time the system described in this article was developed, mr. hammer was the head of libraries systems development at purdue university.

title-only entries retrieved by use of truncated search keys

frederick g. kilgour, philip l. long, eugene b. liederman, and alan l. landgraf: the ohio college library center, columbus, ohio.

an experiment testing utility of truncated search keys as inquiry terms in an on-line system was performed on a file of 16,792 title-only bibliographic entries. use of a 3,3 key yields eight or fewer entries 99.0% of the time.

a previous paper (1) established that truncated derived search keys are efficient in retrieval of entries from a name-title catalog. this paper reports a similar investigation into the retrieval efficiency of truncated keys for extracting entries from an on-line, title-only catalog; it is assumed that entries retrieved would be displayed on an interactive terminal.
earlier work by ruecking (2), nugent (3), kilgour (4), dolby (5), coe (6), and newman and buchinski (7) investigated search keys designed to retrieve bibliographic entries from magnetic tape files. the earlier paper in this series and the present paper investigate retrieval from on-line files in an interactive environment. similarly, the work of rothrock (8) inquired into the efficacy of derived truncated search keys for retrieving telephone directory entries from an on-line file. since the appearance of the previous paper, the ohio state university libraries have developed and activated a remote catalog access and circulation control system employing a truncated derived search key similar to those described in the earlier paper. however, osu adopted a 4,5 key consisting of the first four characters of the main entry and the first five characters of the title, excluding initial articles and a few other nonsignificant words. whereas the osu system treats the name and title as a continuous string of characters, the experiments reported in this and the previous paper deal only with the first word in the name and title, articles always being excluded. the bell system has also recently activated a large traffic experiment in the san francisco bay area. the master file in this system contains 1,300,000 directory entries. the system utilizes truncated derived keys like those investigated in the present experiments.

materials and methods

the file used in this experiment was described in the earlier paper (1), except that this experiment investigates the title-only entries. the same programs used in the name-title investigation were used in this experiment; the title-only entries were edited so that the first word of the title was placed in the name field and the remaining words in the title field. as was the case formerly, it was necessary to clean up the file. single-word titles often carried in the second, or title, field such expressions as "one year subscription" or "vol 16 1968." in addition there were spurious character strings that were not titles, and in such cases the entire entry was removed from the file. thereby, the original 17,066 title entries were reduced to 16,792. the truncated search keys derived from these title-only entries consist of the initial characters of the first word of the title and of the second word of the title. if there was no second word, blanks were employed. if either the first or second word contained fewer characters than the key to be derived, the key was left-justified and padded out with blanks. to obtain a comparison of the effectiveness of truncated search keys derived from title-only entries as related to keys derived from name-title entries, a name-title entry file of the same number of entries (16,792) was constructed. a series of random numbers larger than the number of entries in the original name-title file (132,808) was generated, and one of the numbers was added to each of the 132,808 name-title entries in sequence. next the file was sorted by number so that a randomized file was obtained. then the first 16,792 name-title entries were selected. the same program analyzed keys derived from this file.

results

table 1 presents the maximum number of entries to be expected in 99% of replies for the file of 16,792 title-only entries as well as for the name-title file containing the same total of entries.
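the key derivation and the tabulation behind table 1 are both simple to express in code. the python sketch below is illustrative only: the comma in printed keys such as "jou,of" is treated as a display convention, only initial articles are stripped, and the 99% figure is computed over distinct keys, which is an assumption on our part since the paper speaks of putting random requests to the file.

```python
from collections import Counter

ARTICLES = {"a", "an", "the"}

def derive_key(title: str, m: int = 3, n: int = 3) -> str:
    """first m characters of the first title word and first n of the second,
    each left-justified and blank-padded, with initial articles excluded."""
    words = title.lower().split()
    while words and words[0] in ARTICLES:
        words.pop(0)
    first = words[0] if words else ""
    second = words[1] if len(words) > 1 else ""
    return first[:m].ljust(m) + "," + second[:n].ljust(n)

def reply_size_at(titles, m, n, coverage=0.99):
    """smallest reply size r such that roughly `coverage` of the distinct
    m,n keys derived from `titles` retrieve r or fewer entries."""
    counts = Counter(derive_key(t, m, n) for t in titles)   # entries per key
    sizes = sorted(counts.values())
    index = min(int(coverage * len(sizes)), len(sizes) - 1)
    return sizes[index]

assert derive_key("Journal of Library Automation") == "jou,of "
assert derive_key("Nature") == "nat,   "   # no second word: blank padding
```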
for example, when a large number of random requests are put to the title-only file using a 3,3 search key, the prediction is that 99.0% of the time, eight or fewer replies will be returned. however, in the case of the name-title file, only two replies will be returned 99.3% of the time. the 3,3 key produced only thirteen replies (.12% of the total number of 3,3 keys) containing twenty-one or more entries. the highest number of entries for a single reply for the 3,3 key was 235 ("jou,of" derived from "journal of"). the next highest number of entries for a single reply was 88 ("adv,in" for "advances in").

table 1. maximum number of entries in 99% of replies

                 title-only entries           name-title entries
search key   max. entries   percent       max. entries   percent
             per reply      of time       per reply      of time
2,2               ?             ?              7           99.0
2,3               ?             ?              4           99.6
2,4              11           99.0             3           99.5
3,2               9           99.1             3           99.2
3,3               8           99.0             2           99.3
3,4               8             ?              2           99.5
4,2               8           99.1             2           99.2
4,3               7           99.0             2           99.6
4,4               7           99.1             2           99.7
(entries marked ? are illegible in the source)

discussion

the two words from which the keys are derived in name-title entries constitute a two-symbol markov string of zero order, since the name string and title string are uncorrelated. however, the two words from which keys are derived in the title-only entry are first-order markov strings, since they are consecutive words from the title string and are correlated. the consequence of these two circumstances on the effectiveness of derived keys is clearly presented in table 1. the keys from name-title entries consistently produce fewer maximum entries per reply. therefore, it is desirable to derive keys from zero-order markov strings wherever possible. the ohio state university libraries contain over two and a quarter million volumes, but on 9 february 1971 there were only 47,736 title-only main entries in the catalog. the file used in the present experiment is 35% of the size of the osu file. since 99% of the time the 3,3 key yields eight or fewer titles, it is clear that such a key will be adequate for retrieval for library on-line, title-only catalogs. the 3,3 key also possesses the attractive quality of eliminating the majority of human misspelling, as pointed out in the earlier paper (1). there remains, however, the unsolved problem of the efficient retrieval of such titles as those beginning with "journal of" and "advances in". it appears that it will be necessary to devise a special algorithm for those relatively few titles that produce excessively high numbers of entries in replies. in the previous investigation it was found that a 3,3 key yielded five or fewer replies 99.08% of the time from a file of 132,808 name-title entries. table 1 shows that for a file of only 16,792 entries the 3,3 key produces two or fewer replies 99.3% of the time. these two observations suggest that as a file of bibliographic entries increases, the maximum number of entries per reply does not increase in a one-to-one ratio, since the maximum number of entries rose from two to five while the total size of the file increased roughly eightfold. further research must be done in this area to determine the relative behavior of derived truncated keys as their associated file sizes vary.

conclusion

this experiment has produced evidence that a series of truncated search keys derived from a first-order markov word string in a bibliographic description yields a higher number of maximum entries per reply than does a series derived from a zero-order markov string.
however, the results indicate that the technique is nonetheless sufficiently efficient for application to large on-line library catalogs. use of a 3,3 search key yields eight or fewer entries 99.0% of the time from a file of 16,792 title-only entries.

acknowledgment

this study was supported in part by national agricultural library contract 12-03-01-5-70 and by office of education contract oec-0-72-2289 (506).

references

1. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7 (1970): 79-82.
2. f. h. ruecking, jr., "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation 1 (december 1968): 227-38.
3. w. r. nugent, "compression word coding techniques for information retrieval," journal of library automation 1 (december 1968): 250-60.
4. f. g. kilgour, "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science 5 (1968): 133-36.
5. j. l. dolby, "an algorithm for variable-length proper-name compression," journal of library automation 3 (december 1970): 257-75.
6. m. j. coe, "mechanization of library procedures in the medium-sized medical library: x. uniqueness of compression codes for bibliographic retrieval," bulletin of the medical library association 58 (october 1970): 587-97.
7. w. l. newman and e. j. buchinski, "entry/title compression code access to machine readable bibliographic files," journal of library automation 4 (june 1971): 72-85.
8. h. i. rothrock, jr., computer-assisted directory search; a dissertation in electrical engineering (philadelphia: university of pennsylvania, 1968).

open search environments: the free alternative to commercial search services

adrian o’riordan

information technology and libraries | june 2014

abstract

open search systems present a free and less restricted alternative to commercial search services. this paper explores the space of open search technology, looking in particular at lightweight search protocols and the issue of interoperability. a description of current protocols and formats for engineering open search applications is presented. the suitability of these technologies and issues around their adoption and operation are discussed. this open search approach is especially useful in applications involving the harvesting of resources and information integration. principal among the technological solutions are opensearch, sru, and oai-pmh. opensearch and sru realize a federated model to enable content providers and search clients to communicate. applications that use opensearch and sru are presented. connections are made with other pertinent technologies such as open-source search software and linking and syndication protocols. the deployment of these freely licensed open standards in web and digital library applications is now a genuine alternative to commercial and proprietary systems.

introduction

web search has become a prominent part of the internet experience for millions of users. companies such as google and microsoft offer comprehensive search services to users free with advertisements and sponsored links, the only reminder that these are commercial enterprises.
businesses and developers on the other hand are restricted in how they can use these search services to add search capabilities to their own websites or for developing applications with a search feature. the closed nature of the leading web search technology places barriers in the way of developers who want to incorporate search functionality into applications. for example, google’s programmatic search api is a restful method called google custom search api that offers only 100 search queries per day for free.1 the limited usage restrictions of these apis mean that organizations are now frequently looking elsewhere for the provision of search functionality. free software libraries for information retrieval and search engines have been available for some time allowing developers to build their own search solutions. these libraries enable search and retrieval of document collections on the web or offline. web crawlers can harvest content from multiple sources. a problem is how to meet users’ expectations of search efficacy while not having adrian o’riordan (a.oriordan@cs.ucc.ie) is lecturer, school of computer science and information technology, university college, cork, ireland. open search environments: the free alternative to commercial search services | o’riordan 46 the resources of the large search providers. reservations about the business case for free open search include that large-scale search is too resource-hungry and the operational costs are too high; but these suppositions have been challenged.2 open search technology enables you to harvest resources and combine searchers in innovative ways outside the commercial search platforms. further prospects for open search systems and open-source search lie in areas such as peer-to-peer, information extraction, and subject-specific technology.3 many search systems unfortunately use their own formats and protocols for indexing, search, and results lists. this makes it difficult to extend, alter, or combine services. distributed search is the main alternative to building a “single” giant index (on mirrored clusters) and searching at one site, a la google search and microsoft bing. callan describes the distributed search model in an information retrieval context.4 in his model, information retrieval consists of four steps: discovering databases, ranking databases by their expected ability to satisfy the query, searching the most relevant databases, and merging results returned by the different databases. distributed search has become a popular approach in the digital libraries field. note that in the digital libraries literature distributed search is often called federated search.5 the federated model has clear advantages in some application areas. it is very hard for a single index to do justice to all the possible schemas in use on the web today. other potential benefits can come from standardization. the leading search engine providers utilize their own proprietary technologies for crawling, indexing, and the presentation of results. the standardization of result lists would be useful for developers combining information from multiple sources or pipelining search to other functions. a common protocol for declaring and finding searchable information is another desirable feature. standardized formats and metadata are key aspects of search interoperability, but the focus of this article is on protocols for exchanging, searching, and harvesting information. 
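callan's four-step loop (discover, rank, search, merge) maps directly onto code. in the python sketch below, the source type and its relevance estimate are hypothetical stand-ins for whatever protocol a real deployment would use to reach each collection; no particular api is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Source:
    name: str
    # step 2: an estimate of how well this database can satisfy the query
    expected_relevance: Callable[[str], float]
    # step 3: run the query, returning (document id, score) pairs
    search: Callable[[str], List[Tuple[str, float]]]

def federated_search(query: str, sources: List[Source], top_sources: int = 3):
    """rank the databases, search only the most promising ones, merge by score."""
    ranked = sorted(sources, key=lambda s: s.expected_relevance(query), reverse=True)
    merged: List[Tuple[str, float]] = []
    for source in ranked[:top_sources]:
        merged.extend(source.search(query))
    # step 4: a production system would normalize scores before merging
    return sorted(merged, key=lambda hit: hit[1], reverse=True)
```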
in particular, this article focuses on lightweight protocols, often rest (representational state transfer)–style applications. lightweight protocols place less onerous overheads on development in terms of adapting existing systems and additional metadata. they are also simpler. the alternative is heavyweight approaches to federated search, such as using web services or service-oriented architectures. there have been significant efforts at developing lightweight protocols for search, primary among which is the opensearch protocol developed by an amazon subsidiary. other protocols and services of relevance are sru, mxg, and the oai-omh interoperability framework. we describe these technologies and give examples of their use. technologies for the exchange and syndication of content are often used as part of the search process or in addition. we highlight key differences between protocols and give instances where technologies can be used in tandem. information technology and libraries | june 2014 47 this paper is structured as follows. the next section describes the open search environment and the technologies contained therein. the following section describes open search protocols in detail, giving examples. finally, summary conclusions are presented. an open search environment a search environment (or ecosystem) consists of a software infrastructure and participants. the users of search services and the content providers and publishers are the main participants. the systems infrastructure consists of the websites and applications that both publish resources and present a search interface for the user, and the technologies that enable search. technologies include publishing standards for archiving and syndicating content, the search engines and web crawlers, the search interface (and query languages), and the protocols or glue for interoperability. baeza-yates and raghavan present their vision of next-generation web search highlighting how developments in the web environment and search technology are shaping the next generation search environment.6 open-source libraries for the indexing and retrieval of document collections and the creation of search engines include the lemur project (and the companion indri search engine),7 xapian,8 sphinx,9 and lucene (and the associated nutch web crawler).10 all of these systems support web information retrieval and common formats. from a developer perspective they are all crossplatform; lemur/indri, xapian, and sphinx are in c and c++ whereas lucene/nutch is in java. xapian has language bindings for other programming languages such as python. the robustness and scalability of these libraries support large-scale deployment, for example the following large websites use xapian: citebase, die zeit (german newspaper), and debian (linux distribution).11 middleton and baeza-yates present a more detailed comparison of open-source search engines.12 they compare twelve open-source search engines including indri and lucene mentioned above across thirteen dimensions. features include license, storage, indexing, query preprocessing (stemming, stop-word removal), results format, and ranking. apache solr is a popular open-source search platform that additionally supports features such as database integration, real-time indexing, faceted search, and clustering.13 solr uses the lucene search library for the core information retrieval. 
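solr exposes this retrieval functionality over a simple http interface. the sketch below queries the standard /select handler of a hypothetical local core named "catalog" and reads the json response; only stock request parameters (q, rows, wt) are used.

```python
import requests

def solr_search(query: str, rows: int = 10):
    """query a local solr core over its http/json api and return the hit list."""
    response = requests.get(
        "http://localhost:8983/solr/catalog/select",   # hypothetical core name
        params={"q": query, "rows": rows, "wt": "json"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["response"]["docs"]          # solr wraps hits in "response"

# for doc in solr_search("open access repositories"):
#     print(doc.get("id"), doc.get("title"))
```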
solr’s rich functionality and the provision of restful http/xml and json apis makes it an attractive option for open information integration projects. in a libraries context singer cites solr as an open-source alternative for next-generation opac replacements.14 solr is employed, for example, in the large-scale europeana project.15 the focus of much of this paper is on the lightweight open protocols and interoperability solutions that allow application developers to harvest and search content across a range of locations and formats. in contrast, the delosdlms digital library framework exemplifies a heavyweight approach to integation.16 in delosdlms, services are either loosely or tightly coupled in a serviceoriented architecture using web service middleware. open search environments: the free alternative to commercial search services | o’riordan 48 particular issues for open search are the location and collection of metadata for searchable resources and the creation of applications offering search functionality. principal among these technological solutions are the opensearch and sru protocols. both implement what alipourhafezi et al. term a federated model in the context of search interoperability, wherein providers agree that their services will conform to certain standard specifications.17 the costs and adoption risk of this approach are low. technologies such as opensearch occupy an abstraction layer above existing search infrastructure such as solr. interoperability interoperability is an active area of work in both the search and digital library fields. interoperability is “the ability of two or more systems or components to exchange information and to use the information that has been exchanged.”18 interoperability in web search applies to resource harvesting, meta-search, and to allow search functions interact with other system elements. interoperability in digital libraries is a well-established research agenda,19 and it has been described by paepcke et al. as “one of the most important problems to solve [in dls].”20 issues that are common to both web search and digital-library search include metadata incompatibilities, protocol incompatibilities, and record duplication and near-duplication. in this paper, the focus is on search protocols; for a comprehensive survey of technology for semantic interoperability in digital library systems, see the delos report on same.21 a comprehensive survey of methods and technology for digital library interoperability is provided in a dl.org report.22 formats and metadata free and open standard formats are extensive in web technology. standard or de facto formats for archiving content include plain text, rich text format (rtf), html, pdf, and various xml formats. document or resource identification is another area where there has been much agreement. resource identification schemes need to be globally unique, persistent, efficient, and extensible. popular schemes include urls, persistent urls (purls), the handle system (handle.net), and dois (digital object identifiers). linking technologies include openurl and coins. openurl links sources to targets using a knowledge base of electronic resources such as digital libraries.23 contextobjects in spans (coins), as used in wikipedia for example, is another popular linking technology.24 applications can use various formats for transporting and syndicating content. syndication and transport technologies include xml, json, rss/atom, and heavyweight web service-based approaches. 
much of the metadata employed in digital libraries is in xml formats, for example in marcxml and the metadata encoding and transmission standard (mets). the world wide web consortium defined rdf (resource description framework) to provide among other goals “a mechanism for integrating multiple metadata schemes.”25 rdf records are defined in an xml namespace. in rdf, subject-predicate-object expressions represent web information technology and libraries | june 2014 49 resources and typically identified by means of an uri (universal resource identifier). json (javascript object notation) is a lightweight data-interchange format that has become popular in web-based applications and is seeing increasing support in digital library systems.26 harvesting web content is harvested using software called web crawlers (or web spiders). a crawler is an instance of a software application that runs automated tasks on the web. specifically the crawler follows web links to index websites. there was been little standardization in this area except the robot exclusion standard and use of various xhtml meta-elements and http header fields. there is a lot of variability in terms of the policy for the selection of content sources, policy for following links, url normalization, politeness, depth of crawl, and revisit policy. consequently, there are many web crawling systems in operation; open-source crawlers include datapartsearch, grub, heritrix, and the aforementioned nutch. harvesting and syndication of metadata from open repositories is the goal of the open archives initiative protocol for metadata harvesting (oai-pmh), originally developed by los alamos national laboratory, cornel, and nasa in 2000 and 2001.27 resource harvesting challenges include scale, keeping information up-to-date, robustness, and security. oai-pmh has been adopted by many digital libraries, museums, archives, and publishers. the latest version, oai-pmh 2.0, was released in 2012. oai-pmh specifies a general application-independent model of network-accessible repository and client harvester that issues requests using http (either get or post). metadata is expressed as a record in xml format. an oai-phm implementation must support dublin core, with other vocabularies as additions. oai-pmh is the key technology in the harvesting model of digital library interoperability described by van de sompel et al.28 an oai-pmh-compliant system consists of harvester (client), repository (network accessible server), and items (constituents of a repository). portal sites such as europeana and oaister use oai-pmh to harvest from large numbers of collections.29 there are online registries of oai-compliant repositories. the european commission’s europeana allow users to search across multiple image collections including the british library and the louvre online. another portal site that uses oai-pmh is culturegrid, operated by the uk collections trust. culturegrid provides access to hundreds of museum, galleries, libraries, and archives in the uk. the apache software foundation has developed a module, mod_oai, for apache webservers that helps crawlers to discover content. syndication and exchange here we outline lightweight options for syndication and information exchange. heavyweight web services-based approaches are outside the scope of this article. web syndication commonly uses rss (really simple syndication) or its main alternative, atom. 
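before turning to the syndication formats themselves, the oai-pmh request pattern described above can be made concrete. the sketch below issues ListRecords requests for dublin core records and follows resumptionToken elements; the repository url is hypothetical, while the verb, the metadataPrefix parameter, and the namespace come from the protocol specification.

```python
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE = "http://repository.example.org/oai"   # hypothetical endpoint

def list_records(base_url=BASE, metadata_prefix="oai_dc"):
    """yield record elements, paging through the repository via resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(requests.get(base_url, params=params, timeout=30).text)
        for record in root.iter(OAI + "record"):
            yield record
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # subsequent requests carry only the token, per the protocol
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```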
atom is a proposed ietf standard.30 rss 2.0 is the latest version in the rss family of specifications, a simple yet highly extensible open search environments: the free alternative to commercial search services | o’riordan 50 format where content items contain plain text or escaped html.31 atom, developed to counter perceived deficiencies in rss, has a richer content model than rss and is more reusable in other xml vocabularies.32 both rss and atom use http for transport. rss organizes information into channels and items, atom into feeds and entries. extension as modules allows rss to carry multimedia payload (rss enclosures) and geographical information (georss). atom has an associated publishing protocol called atompub. syndication middleware, which supports multiple formats, can serve as an intermediary in application architectures. information and content exchange (ice) is a protocol that aims to “automate the scheduled, reliable, secure redistribution of any content.”33 twice is a java implementation of ice. ice automates the establishment of syndication relationships and handles data transfer and results formatting. this gives content providers more control over delivery, schedule, and reliability than simple web syndication without deploying a full-scale web services solution. the open archives initiative—object reuse and exchange (oai-ore) protocol provides standards for the description and exchange of aggregations of web resources.34 this specification standardizes how compound digital objects can combine distributed resources of multiple media types. ore introduces the concepts of aggregation, resource map, and proxy resource. resource providers or curators can express objects in rdf or atom format and assign http uris for identification. ore supports resource discovery so crawlers or harvesters can find these resource maps and aggregates. ore can work in partnership with oai-pmh. we outline some additional lightweight technologies for information exchange to conclude this section. opml (outline processor markup language) is a format that represents lists of web feeds for services such as aggregators.35 it is a simple xml format. feedsync and rome support formatneutral feed formats that abstract from wire formats such as rss 2.0 and atom 1.0 for aggregator or syndication middleware. these technologies are described in the literature.36 lockss (lots of copies keep stuff safe) is a novel project that users a peer-to-peer network to preserve and provide access to web resources. for example, the metaarchive cooperative uses lockss for digital preservation.37 meta-search meta-search is where multiple search services are combined. such services have a very small share of the total search market owing to the dominance of the big players. metacrawler, developed in the 1990s, was one of the first meta-search engines and serves as a model for how such systems operate.38 a meta-search engine utilizes multiple search engines by sending a user request to multiple sources or engines aiming to improve recall in the process. a key issue with meta-search is how to weight search engines and how to integrate sets of results into a single results list. figure 1 shows a general model of meta-search where the meta-search service chooses which search engines and content providers to employ. active meta-search engines on the web include dogpile, yippy, ixquick, and info.com. 
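the "combine" step in the general model of figure 1 admits many fusion rules. the sketch below shows one simple possibility, a weighted reciprocal-rank vote over the url lists returned by each engine; the engine names and weights are illustrative and do not describe any deployed service.

```python
from collections import defaultdict
from typing import Dict, List

def fuse(results: Dict[str, List[str]], weights: Dict[str, float]) -> List[str]:
    """merge per-engine ranked url lists into one list, weighting each engine."""
    scores: Dict[str, float] = defaultdict(float)
    for engine, ranking in results.items():
        w = weights.get(engine, 1.0)
        for rank, url in enumerate(ranking, start=1):
            scores[url] += w / rank        # earlier ranks contribute more
    return sorted(scores, key=scores.get, reverse=True)

# fuse({"engineA": ["u1", "u2"], "engineB": ["u2", "u3"]},
#      {"engineA": 1.0, "engineB": 0.5})   # -> ["u1", "u2", "u3"]
```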
note that meta-search websites of this kind appear, change names, and disappear frequently. currently meta-search services use various implementation methods such as proprietary protocols and screen scraping.

[figure 1. general model of meta-search: a meta-search service chooses among search engines and content providers, forwards the query, and combines the returned results.]

metasearch xml gateway (mxg) is a meta-search protocol developed by the niso metasearch initiative, a consortium of meta-search developers and interested parties.39 mxg is a message and response protocol that enables meta-search service providers and content providers to communicate. a goal of the design of mxg was that content providers should not have to expend substantial development resources. mxg, based on sru, specifies both the query and search results formats. combining results, aggregation, and presentation are not part of the protocol and are handled by the meta-search service. the standard defines three levels of compliance, allowing varying degrees of commitment and interoperability.

search protocols

we describe opensearch and sru, along with applications, in the following subsections. after that, we detail related technologies.

opensearch

opensearch is a protocol that defines simple formats to help search engines and content providers communicate. it was developed by a9, a subsidiary of amazon, in 2005.40 it defines common formats for describing a search service, query results, and operation control. it does not specify content formats for documents or queries. the current specification, version 1.1, is available with a creative commons license. it is an extensible specification, with extensions published on the website. both free open systems and proprietary systems use opensearch. in particular, many open-source search engines and content management systems support opensearch, including yacy, drupal, and plone cms. opensearch consists of a description file for a search source and a response format for query results. descriptors include the elements url, query, syndicationright, and language. resource identification can be by urls, dois, or a linking technology such as openurl. responses describe a list of results and can be in rss, atom, or html formats. additionally there is an auto-discovery feature to signal that an html page is searchable, implemented using an html 4.0 link element. opensearch makes very few assumptions about the types of sources, the type of query, or the search operation. it is ideal for combining content from multiple disparate sources, which may be data from repositories, webpages, or syndicated feeds. for illustrative purposes, listing 1 gives an example opensearch description for harvesting book information from an example digital library called diglib. the root node includes an xml namespace attribute, which gives the url for the standard version. the url element specifies the content type (a mime type), the query (book in this case), and the index offset where to begin. the rel attribute states that the result is a collection of resources.

[listing 1. xml opensearch description record. the markup is lost in this copy; the surviving element text reads "diglib", "harvests book items", "en-us", and "utf-8".]
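a client can drive a description like listing 1 mechanically: fetch the document, read the url template, substitute the query terms, and parse the result feed. the python sketch below assumes a hypothetical description url and an rss 2.0 response; the {searchTerms} placeholder, optional parameters such as {startIndex?}, and the namespace come from the opensearch 1.1 specification.

```python
import re
import requests
import xml.etree.ElementTree as ET

OSD_NS = "{http://a9.com/-/spec/opensearch/1.1/}"

def opensearch_query(description_url: str, terms: str):
    """fetch an opensearch description, fill its url template, and parse rss results."""
    osd = ET.fromstring(requests.get(description_url, timeout=10).text)
    template = osd.find(f"{OSD_NS}Url").attrib["template"]   # first Url element
    url = template.replace("{searchTerms}", requests.utils.quote(terms))
    url = re.sub(r"\{[^}]+\?\}", "", url)   # drop unused optional parameters
    feed = ET.fromstring(requests.get(url, timeout=10).text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in feed.iter("item")]  # rss 2.0 items are unnamespaced
```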
next, we describe some deployed applications that use opensearch. ojax uses web technologies such as ajax (asynchronous javascript) to provide a federated search service for oai-pmh-compatible open repositories.41, 42 ojax also supports the discovery feature of opensearch, as described in the opensearch 1.1 specification, for auto-detecting that a repository is searchable. stored searches are in atom format. open-source meta-search engines can combine the results of opensearch-enabled search engines.43 a system built as a proof of concept uses four search sources: a9.com, yacy, mozdex, and alpha. a user can issue a text query (word or phrase) with boolean operators and several modifiers. users can prefer or bias particular engines by setting weights. the system ranks results, which are combined using a voting algorithm and implemented with the lucene library. opensearch can be employed to specify the search sources and as a common format when results are combined. as levan points out, "the job of the meta-search engine is made much simpler if the local search engine supports a standard search interface."44 levan also mentions mxg in this context. nguyen et al. describe an application where over one hundred search engines are used in experiments in federated search.45 the search sources were mostly opensearch-compliant search engines. an additional tool scrapes results from noncompliant systems. intersynd uses opensearch to help provide a common protocol for harvesting web feeds. intersynd is a syndication system that harvests, stores, and provides feed recommendations.36 it uses java.net’s rome (rss and atom utilities) library to represent feeds in a format-neutral way. intersynd is syndication middleware that allows sources to post and services to fetch information in all major syndication formats (see figure 2). its feed-discovery module, disco, uses the nutch crawler and the opensearch protocol to harvest feeds. nutch is an open-source library for building search engines that supports opensearch. nutch builds on the lucene information retrieval library, adding web specifics such as a crawler, a link-graph database, and parsers for html.

[figure 2. opensearch in intersynd.]

opensearch 1.1 allows returned results in either rss 2.0 or atom 1.0 format or an opensearch format, the "bare minimum of additional functionality required to provide search results over rss channels" (quoted from the a9 website). listing 2 below shows a disco results list in rss 2.0 format. opensearch fields appear in the channel description. the nutch fields appear within each item (not shown). an opensearch namespace is specified in the opening xml element. the following additional opensearch elements appear in the example: totalresults, itemsperpage, and startindex.

[listing 2. results produced using nutch with opensearch. the rss markup is lost in this copy; the surviving values include the channel title "nutch: metasearch", the description "nutch search results for query: metasearch", the request url http://localhost/nutch-1.6dev/opensearch?query=metasearch&start=0&hitspersite=2&hitsperpage=10, and the opensearch elements totalresults (282), startindex (0), and itemsperpage (10).]

we mention one more application of opensearch here. a series of nasa projects to develop a set of interoperable standards for sharing information employs various open technologies for sharing and disseminating datasets, including opensearch for its discovery capability.46 discovery of document and data collections is by keyword search, using the opensearch protocol. there are various extensions to opensearch.
for example, an extension to handle sru allows sru (search and retrieval via url) queries within opensearch contexts. other proposed extensions include support for mobility, e-commerce, and geo-location. sru (search/retrieval via url) a technology with some similarities to opensearch but more comprehensive is sru (search/retrieval via url).47 sru is an open restful technology for web search. the current version is sru 2.0, standardized by the organization for the advancement of structured information standards (oasis) as searchretrieve. version 1.0. sru was developed to provide functionality similar to the widely deployed z39.50 standard for library information retrieval updated for the web age.48 sru addresses aspect of search and retrieval by defining models: a data model, a query model, and processing model, a result set model, a diagnostics model and a description-and-discovery model. sru is extensible and can support various underlying low-level technologies. both lucene and dspace implementations are available. the oclc implementation of sru supports both rss and atom feed formats and the atom publishing protocol. sru uses http as the application transport and xml formats for messages. requests can be in the form of either get or post http methods. sru supports a high-level query language called contextual query language (cql). cql is a human-readable query language consisting of search clauses. sru operation involves three parts: explain, search/retrieve and scan. explain is a way to publish resource descriptions. search/retrieve entails the sending of requests (formulated in cql) and the receiving of responses over http. the optional sru scan enables software to query the index. the result list is in xml schema format. the meta-search service mxg uses sru, but relaxes the requirement to use cql.39 srw (search/retrieve web service) is a web services implementation of sru that uses soap as the transfer mechanism. hammond combines opensearch with sru technology in an application for nature publishers.49 he also points out the main differences between the protocols such as sru’s use of a query specification language and differences in the results records. as well as supporting opensearch data formats (rss and atom), the nature application also supports json (javascript object notation). opensearch is used for formatting the result sets whereas sru/cql is used for querying. this search application launched as a public service in 2009. listing 3 below is an example from the nature application showing cql search queries ( tags) used in an opensearch description document. note how both the sru querytype and the information technology and libraries | june 2014 55 opensearch searchterms attributes appear in the query. further details on how to use sru and opensearch together are on the opensearch website. nature.com opensearch interface for nature.com the nature.com opensearch service nature.com opensearch sru listing 3. example using sru and opensearch. other technologies here we more briefly survey some additional technologies of relevance to open-search interoperability. xml-based approaches to information integration, such as the use of xquery, are an option but do not present a loose integration. chudnov et al. describes a simple api for a copy function for web applications to enable syndication, searching, and linking of web resources.50 called unapi, it requires small changes for publishers to add the functionality to web resources such as repositories and feeds. 
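to illustrate the request side of sru, the sketch below issues a searchRetrieve request with a cql query over http get. the base url is hypothetical; operation, version, query, and maximumRecords are standard sru request parameters, and the srw namespace used below is the one defined for sru 1.1/1.2 responses.

```python
import requests
import xml.etree.ElementTree as ET

SRW = "{http://www.loc.gov/zing/srw/}"

def sru_search(base_url: str, cql: str, maximum_records: int = 10):
    """run an sru searchRetrieve request and return (total hits, record elements)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql,                     # e.g. 'dc.title any "open search"'
        "maximumRecords": maximum_records,
    }
    root = ET.fromstring(requests.get(base_url, params=params, timeout=10).text)
    total = root.findtext(f"{SRW}numberOfRecords")
    return total, root.findall(f".//{SRW}record")

# total, records = sru_search("http://sru.example.org/catalog", 'dc.title any "search"')
```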
developers can layer unapi over sru, opensearch, or openurl.51 announced in 2008, yahoo!’s searchmonkey technology, also called yahoo!’s open search platform, allowed publishers to add structured metadata to yahoo! search results. searchmonkey divided the problem into two parts: metadata extraction and result presentation. in is not clear how much of this technology survived yahoo! and microsoft’s new search alliance, signed in 2010.52 mika described a search interface technology called microsearch that is similar in nature.53 in microsearch, semantic fields are added a search and search result presentation enriched with open search environments: the free alternative to commercial search services | o’riordan 56 metadata extracted from retrieved content. govaerts et al. described a federated search and recommender system that operates as a browser add-on. the system is opensearch-compliant and all results are in the atom format.54 the corporation for national research initiatives (cnri) digital object architecture (doa) provides a framework for managing digital objects in a networked environment. it consists of three parts: a digital object repository, a resolution mechanism (handle system), and a digital object registry. the repository access protocol (rap) proves a means of networked access to digital objects, which supports authentication and encryption.55 summary and conclusions a rich set of formats and protocols and working implementations show that open search technology is an alternative to the dominant commercial search services. in particular, we discussed the lightweight opensearch and sru protocols as suitable glue to create loosely coupled search-based applications. these can complement other developments in resource discovery and description, open repositories, and open-source information retrieval. the flexibility and extensibility offers exciting opportunities to develop new applications and new types of applications. the successful deployment of open search technology shows that this technology has matured to support many uses. a fruitful area of further development would be to make working with these standards easier for developers and even accessible to the nonprogrammer. references 1. google custom search api, https://developers.google.com/custom-search/v1/overview. 2. mike cafarella and doug cutting, “building nutch: open source search: a case study in writing an open source search engine,” acm queue 2, no. 2 (2004), http://0dl.acm.org.library.ucc.ie/citation.cfm?doid=988392.988408. 3. wray buntine et al., “opportunities from open source search,” in proceedings, the 2005 ieee/wic/acm international conference on web intelligence, 2–8 (2005), http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1517807. 4. jamie callan, “distributed information retrieval,” advances in information retrieval 5 (2000): 127–50. 5. péter jacsó, “internet insights—thoughts about federated searching,” information today 21, no. 9 (2004): 17–27. 6. ricardo baeza and prabhakar raghavan, “next generation web search,” in search computing (berlin heidelberg: springer, 2010): 11–23, http://link.springer.com/chapter/10.1007/9783-642-12310-8_2. 
7. trevor strohman et al., “indri: a language model-based search engine for complex queries,” in proceedings of the international conference on intelligent analysis 2, no. 6 (2005): 2–6.
8. xapian project website, http://xapian.org/.
9. andrew aksyonoff, introduction to search with sphinx: from installation to relevance tuning (sebastopol, ca: o’reilly, 2011).
10. rohit khare, “nutch: a flexible and scalable open-source web search engine,” oregon state university, 2004, p. 32, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.105.5978.
11. “xapian users,” http://xapian.org/users.
12. christian middleton and ricardo baeza-yates, “a comparison of open source search engines,” 2007, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.6955.
13. apache solr, http://lucene.apache.org/solr/.
14. ross singer, “in search of a really ‘next generation’ catalog,” journal of electronic resources librarianship 20, no. 3 (2008): 139–42, http://www.tandfonline.com/doi/pdf/10.1080/19411260802412752.
15. europeana portal, http://www.europeana.eu/portal/.
16. maristella agosti et al., delosdlms—the integrated delos digital library management system (berlin heidelberg: springer, 2007).
17. mehdi alipour-hafezi et al., “interoperability models in digital libraries: an overview,” electronic library 28, no. 3 (2010): 438–52, http://www.emeraldinsight.com/journals.htm?articleid=1864156.
18. institute of electrical and electronics engineers, ieee standard computer dictionary: a compilation of ieee standard computer glossaries (new york: ieee, 1990).
19. clifford lynch and hector garcía-molina, “interoperability, scaling, and the digital libraries research agenda,” in iita digital libraries workshop, 1995.
20. andreas paepcke et al., “interoperability for digital libraries worldwide,” communications of the acm 41, no. 4 (1998): 33–42.
21. manjula patel et al., “semantic interoperability in digital library systems,” 2005, http://delos-wp5.ukoln.ac.uk/project-outcomes/si-in-dls/si-in-dls.pdf.
22. georgios athanasopoulos et al., “digital library technology and methodology cookbook,” deliverable d3.4, 2011, http://www.dlorg.eu/index.php/outcomes/dl-org-cookbook.
23. herbert van de sompel and oren beit-arie, “open linking in the scholarly information environment using the openurl framework,” new review of information networking 7, no. 1 (2001): 59–76, http://www.tandfonline.com/doi/abs/10.1080/13614570109516969.
24. daniel chudnov, “coins for the link trail,” library journal 131 (2006): 8–10.
25. lois mai chan and marcia lei zeng, “metadata interoperability and standardization—a study of methodology, part ii,” d-lib magazine 12, no. 6 (2006), http://www.dlib.org/dlib/june06/zeng/06zeng.html.
26. json (javascript object notation), http://www.json.org/.
27. the open archives initiative protocol for metadata harvesting, http://www.openarchives.org/oai/openarchivesprotocol.html.
28. herbert van de sompel et al., “the ups prototype: an experimental end-user service across e-print archives,” d-lib magazine 6, no. 2 (2000), http://www.dlib.org/dlib/february00/vandesompel-ups/02vandesompel-ups.html.
29. oaister, http://oaister.worldcat.org/.
30. mark nottingham, ed., “the atom syndication format, rfc 4287,” memorandum, ietf network working group, 2005, http://www.ietf.org/rfc/rfc4287.
31. rss 2.0 specification, berkman center for internet & society at harvard law school, july 15, 2003, http://cyber.law.harvard.edu/rss/rss.html.
32. “rss 2.0 and atom 1.0 compared,” http://www.intertwingly.net/wiki/pie/rss20andatom10compared.
33. jay brodsky et al., eds., “the information and content exchange (ice) protocol,” working draft, version 2.0, 2003, http://xml.coverpages.org/icev20-workingdraft.pdf.
34. open archives initiative object reuse and exchange, http://www.openarchives.org/ore/.
35. opml (outline processor markup language), http://dev.opml.org/.
36. adrian p. o’riordan and m. oliver o’mahoney, “engineering an open web syndication interchange with discovery and recommender capabilities,” journal of digital information 12, no. 1 (2011), http://journals.tdl.org/jodi/index.php/jodi/article/viewarticle/962.
37. vicky reich and david s. h. rosenthal, “lockss: a permanent web publishing and access system,” d-lib magazine 7, no. 6 (2001): 14, http://mirror.dlib.org/dlib/june01/reich/06reich.html.
38. erik selberg and oren etzioni, “multi-service search and comparison using the metacrawler,” in proceedings of the fourth int'l www conference, boston, 1995. [pub info?]
39. niso metasearch initiative, metasearch xml gateway implementers guide, version 1.0, niso rp-2006-02, 2006, http://www.niso.org/publications/rp/rp-2006-02.pdf.
40. dewitt clinton, “opensearch 1.1 specification, draft 5,” http://opensearch.org/specifications/opensearch/1.1.
41. judith wusteman, “ojax: a case study in agile web 2.0 open source development,” aslib proceedings 61, no. 3 (2009): 212–31, http://dx.doi.org/10.1108/00012530910959781.
42. judith wusteman and padraig o’hlceadha, “using ajax to empower dynamic searching,” information technology & libraries 25, no. 2 (2013): 57–64, http://0-www.ala.org.sapl.sat.lib.tx.us/lita/ital/sites/ala.org.lita.ital/files/content/25/2/wusteman.pdf.
43. adrian p. o’riordan, “open meta-search with opensearch: a case study,” technical report hosted in the cora.ucc.ie repository, 2007, http://dx.doi.org/10468/982.
44. ralph levan, “opensearch and sru: a continuum of searching,” information technology & libraries 25, no. 3 (2013): 151–53, https://napoleon.bc.edu/ojs/index.php/ital/article/view/3346.
45. dong nguyen et al., “federated search in the wild: the combined power of over a hundred search engines,” in proceedings of the 21st acm international conference on information and knowledge management (maui, hawaii: acm press, 2012): 1874–78, http://dl.acm.org/citation.cfm?id=2398535.
46. b. d. wilson et al., “interoperability using lightweight metadata standards: service & data casting, opensearch, opm provenance, and shared sciflo workflows,” in agu fall meeting abstracts 1 (2011): 1593, http://adsabs.harvard.edu/abs/2011agufmin51c1593w.
47. library of congress, “sru—search/retrieve via url,” www.loc.gov/standards/sru.
48. the library of congress network development and marc standards office, “z39.50 maintenance agency page,” www.loc.gov/z3950/agency.
49. tony hammond, “nature.com opensearch: a case study in opensearch and sru integration,” d-lib magazine 16, no. 7/8 (2010), http://mirror.dlib.org/dlib/july10/hammond/07hammond.print.html.
50. daniel chudnov et al., “introducing unapi,” 2006, http://ir.library.oregonstate.edu/xmlui/handle/1957/2359.
51. daniel chudnov and deborah england, “a new approach to library service discovery and resource delivery,” serials librarian 54, no. 1–2 (2008): 63–69, http://www.tandfonline.com/doi/abs/10.1080/03615260801973448.
52. “news about our searchmonkey program,” yahoo! search blog, 2010, http://www.ysearchblog.com/2010/08/17/news-about-our-searchmonkey-program/.
53. peter mika, “microsearch: an interface for semantic search,” in semantic search, international workshop located at the 5th european semantic web conference (eswc 2008) 334 (2008): 79–88, http://ceur-ws.org/vol-334/.
54. sten govaerts et al., “a federated search and social recommendation widget,” in proceedings of the 2nd international workshop on social recommender systems ([pub info?], 2011): 1–8.
55. s. [first name?] reilly, “digital object protocol specification, version 1.0,” november 12, 2009, http://dorepository.org/documentation/protocol_specification.pdf.
multipurpose cataloging and indexing system (cain) at the national agricultural library
vern j. van dyke: chief, computer applications, national agricultural library, and nancy l. ayer: computer systems analyst, national agricultural library, beltsville, maryland.

a description of the cataloging and indexing system (cain) which the national agricultural library has been using since january 1970 to build a broad data base of agricultural and associated sciences information. with a single keyboarding, bibliographic data is input, edited, manipulated, and merged into a permanent base which is used to produce many types of printed or print-ready end products. presently consisting of five subsystems, cain utilizes the concept of controlled authority files to facilitate both information input and its retrieval. the system was designed to provide maximum computer services with a minimum of effort by users.

introduction
this article describes an interactive system in operation at the national agricultural library which, with a single keyboarding of data, provides all necessary catalog cards, book catalogs, bibliographies, and related internal reports, as well as a computer data base for information retrieval. operating primarily in batch mode, the system can run on an ibm 360 with 256k memory using os, six magnetic tape drives, a card reader, and a line printer.

background
the national agricultural library (nal), as one of the three national libraries, is responsible for the collection and dissemination of agricultural information on a national and worldwide basis. in this pursuit publications are obtained through gifts, exchange agreements, and by purchase of items in many languages. titles of those items in non-roman alphabets are transliterated, and all non-english titles are translated. the volume of publications handled by nal in 1969 was in the neighborhood of 600,000, of which approximately 275,000 were added to the collection. this volume was sufficiently large to pose a serious problem for nal's staff, and thus computer assistance was clearly a logical and necessary arrangement. in 1964 a computer group was formed in nal; it became active in developing systems to prepare voluminous indexes for the bibliography of agriculture, the complete pesticides documentation bulletin, and the categorical and alphabetical issues of the agricultural/biological vocabulary. during 1969 these systems were consolidated and expanded so as to process all input data within one coordinated set of parameters. in january 1970 the new cataloging and indexing (cain) system was implemented.

system design
cain is a complex and comprehensive computer system which has been engineered to handle up to five (5) simultaneous but separate users who share the same controlled authority files. the basic precept in development of computer applications at nal is to make input and output simple and convenient for the users, with the computer assuming as much detail and data manipulation as is technically feasible. at nal the current users providing input data are the new book section, cataloging, indexing, and agricultural economics.
operating in parallel, cain also services the herbicides data base of the agricultural research service and the international tree disease data base of the forest service, and in 1971 will be installed in the library of the technion-israel institute of technology in haifa, israel.

the master data record is variable in length, with a fixed portion of 173 characters and up to fifty-seven additional segments of 65 characters each. the fixed portion includes basic data plus a directory of the data contained in the variable portion. the data elements in cain are:

a. file code - delineates the various files.
b. identification number - on cataloged items this embodies the accession number. all identification numbers include the year of accession, a parallel run code, plus a unique control number.
c. source code.
d. user codes - specific identification of up to five users.
e. english indicator - language of text.
f. translation code - availability of an english translation.
g. language, if other than english.
h. proprietary restrictor - identifies classified records.
i. title tracing indicator - for catalog cards.
j. main entry - designates the main entry if not in normal sequence.
k. document type - whether journal article, monograph, serial, etc.
l. filing location - if other than in the library stacks.
m. categories (two) - general area of coverage of subject matter.
n. new book description - if the title is not sufficiently explanatory.
o. titles - three types: (1) vernacular or short, (2) alternate or holdings, and (3) translated title (english).
p. personal authors - up to 10; names plus identifying data.
q. corporate authors - maximum of two.
r. major personal author affiliation.
s. abbreviated journal title if the item is a journal article; imprint if a monograph or serial.
t. collation/pagination.
u. dates - two: search date, and date on publication if different.
v. call number.
w. subject terms - may be nested; up to 45.
x. general notes.
y. special purpose numbers - patent, grant, analysis, contract, technical, or report.
z. series statement.
aa. abstract/extract.
bb. tracings not otherwise normally generated by the system.
cc. nonvocabulary cross-references.

the total number of individual elements is limited only by the maximum record size. the nal-produced software is written in cobol. the data base is maintained on tape which is nine-track, 800 bpi, blocked 2, in ebcdic, with standard ibm 360 header and trailer labels. the total system presently consists of forty programs, some of which are multipass. in addition, throughput is sorted twenty-five times during the full computer run. these, of course, include the search and retrieval programs and sorts which are run only on request. the ultimate system toward which nal is working, and for which the basic design is already substantially complete, is an on-line full library document locator and control system which may be linked via dial-up service to an international and national science and technology information network. each portion of cain is developed with this broader picture in mind. it was this factor which weighed heavily in selecting cathode ray tube (crt) terminals for the proposed data gathering subsystem, inasmuch as crt's will be the predominant type of terminal in the future network.
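as a rough illustration of the master-record layout just described, the following python sketch splits a record into its 173-character fixed portion and its 65-character variable segments. the fixed-portion length, segment length, and segment limit come from the text; the function name and the error handling are assumptions.

# sketch of the cain master data record layout: a 173-character fixed
# portion followed by up to 57 segments of 65 characters each.
FIXED_LEN = 173
SEGMENT_LEN = 65
MAX_SEGMENTS = 57

def split_master_record(record: str):
    """return (fixed_portion, [variable_segments]) for one record."""
    if len(record) < FIXED_LEN:
        raise ValueError("record shorter than the fixed portion")
    fixed = record[:FIXED_LEN]
    rest = record[FIXED_LEN:]
    if len(rest) % SEGMENT_LEN != 0:
        raise ValueError("variable portion is not a whole number of segments")
    segments = [rest[i:i + SEGMENT_LEN] for i in range(0, len(rest), SEGMENT_LEN)]
    if len(segments) > MAX_SEGMENTS:
        raise ValueError("too many variable segments")
    return fixed, segments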
for convenience in discussion, the system will be described by its subsystems: data gathering, edit and update, publication, search, and controlled authorities.

data gathering subsystem
from its inception the input to cain was in the form of punched cards, a method which has proved to be slow and error prone. in order to eliminate double keyboarding and excessive time lag, as well as to reduce error rates, it was decided to perform this input function in the library with trained library personnel. to accomplish this, nal proposes to implement an "on-line" type of input subsystem using crt's. although this form of entry is not yet in use, the subsystem should operate substantially as follows. the documents are to be marked by catalogers and indexers and passed to library technicians who will enter the data through crt's into an on-line storage file. to do this, the technician will call up prestored formats from the hardware as desired and fill in the data elements required. these formats use english terms and for the most part call for data rather than codes. in addition, data are to be entered in normal upper- and lowercase without diacritics, thus improving visual scanning for errors. an average of four formats will be needed to enter one item. by use of an algorithm, the system would store formatted records for each id in such a manner as to permit recall singly or collectively. the physical documents are then to be passed on to an editor who can recall any or all formatted records for review. with the document in hand, stored records will be reviewed and corrected if necessary. when acceptable, the records will then be transmitted to magnetic tape. variations on this procedure could include input direct to tape, storage to tape without recall to a crt by an editor, cancellation of actions, and a direct purge of the entire storage file without loss of the controlling matrix. the expertise of the library technicians inputting the data should insure far more accuracy than could be expected from multihandling and multikeyboarding. in addition, the system has been designed to accomplish basic pre-cain editing of such factors as numeric or alphabetic characters in certain fields and overall lengths of the fields. errors in these categories will be promptly identified by the computer by a blinking feature on the crt screen. another major benefit of this direct approach is that documents can be processed through the system so as to reach the stacks twenty-four days faster than under the current keypunch method. magnetic tapes created by the data gathering system will be periodically converted from ascii to ebcdic and processed into the edit and update subsystem of cain. the present nal time schedule for updating master cain files is weekly; this is not a requirement of the system but an administrative decision based on other deadlines. the data gathering system as prescribed by nal will be composed of sixteen crt's, a large on-line storage file, and one nine-track 800 bpi magnetic tape drive. this configuration will be either a hard-wired "black-box" approach or controlled by a dedicated mini-computer. the hardware prescribed for this subsystem is not included as a requirement of cain inasmuch as transactions can be entered on 80-column cards if desired. an additional feature of this subsystem will be the generation of management information feedback. this will encourage elimination of manual counts and provide accurate throughput volume statistics on a timely basis.
through this means the supervisor will be in a better position to evaluate workload, individual performance, and hardware utilization.

edit and update subsystem
the first step in the acceptance of transactions is a thorough validation of each data element. the computer is used to relieve librarians of the voluminous and time-consuming editing of the many individual elements having predetermined limits; thus, only a cursory review of the proof-listed records is necessary by a librarian before acceptance. the system cannot, of course, detect logical or typographical errors, but it can determine the absence of necessary information, codes in invalid ranges, and the incorrect placement of data. elements for which the system supplies authority files are not only verified against the file, but additional transactions are also generated from the authority file to assure uniformity in output. this also eliminates the necessity for librarians to enter those elements which have a direct, predictable relationship to another element. further validations are performed at the point of building new records or updating records already in the master file. the two "master" files are (1) the temporary set of unselected records and (2) the permanent set of those records which have been approved and selected for publication in some form. data elements specified as required within each record are reviewed. if one or more is missing, the system refuses to approve the record, and a notice is produced concerning this reversal of human input. fields can be deleted, in whole or in part, replaced, or added. three types of output from this subsystem are:

• new updated master files. records which have been added or altered during the update run are proof-listed for cursory review by a team of professional librarians. corrections and/or approvals are submitted in a subsequent update run.
• activity notices. every action, whether submitted by the user or system-generated, which has been accepted for processing is reported.
• error notices. all error and warning messages from this subsystem are compiled into one listing. this includes errors on individual elements, system-discovered errors of omission, and warnings of computer overriding of submitted actions.

through the use of control cards various handling options are possible. one of these is proof-listing of a specific range or ranges of masters by identification numbers or dates. subject headings are assigned by professional librarians for monographs and new serial titles. for journal articles, however, the system analyzes the title of the article and creates subject index terms, using single words, combinations of two words not separated by stop words, and singular and plural variations. the generated terms are then processed against the controlled authority file; those accepted as valid are inserted in the record for searching purposes.
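the title-analysis step just described lends itself to a small sketch. the following python fragment generates candidate index terms from a title (single words, adjacent two-word pairs not separated by stop words, and naive singular/plural variants) and keeps only those found in a controlled vocabulary. the stop-word list, the pluralization rule, and the sample vocabulary are hypothetical simplifications, not the actual cain rules.

# hypothetical sketch of generating subject index terms from an article
# title, in the spirit of the cain title-analysis step described above.
import re

STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "for", "with"}  # assumed list

def candidate_terms(title: str):
    words = re.findall(r"[a-z]+", title.lower())
    terms = set()
    for i, w in enumerate(words):
        if w in STOP_WORDS:
            continue
        # single words plus naive singular/plural variants
        terms.update({w, w + "s" if not w.endswith("s") else w[:-1]})
        # two-word combinations not separated by a stop word
        if i + 1 < len(words) and words[i + 1] not in STOP_WORDS:
            terms.add(w + " " + words[i + 1])
    return terms

def index_terms(title: str, authority: set):
    """keep only candidates that appear in the controlled authority file."""
    return sorted(candidate_terms(title) & authority)

# example with a made-up vocabulary
vocab = {"soybeans", "soil", "nitrogen", "soil nitrogen"}
print(index_terms("Nitrogen in the soil and the growth of soybeans", vocab))
# -> ['nitrogen', 'soil', 'soybeans']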
publication and distribution subsystem
each data element of a bibliographic item is captured only once and at the earliest possible time in the receipt process. master records which have successfully passed the edit and update phase become candidates for various types of publications and other user services. six major modes of publication products are produced by cain, at various times and in a variety of both formats and media. preliminary to the production of formal output there is a screening for records designated as fully acceptable by the edit and update subsystem. as mentioned above, any record may be identified as being applicable to any combination of from one to five users. by a method of control cards the system is informed as to which users are scheduled for publication/distribution, and the maximum quantity to be selected in each case. this subsystem reviews each record to ascertain its appropriateness for selection. records meeting the criteria are siphoned off for individual handling. no record is dropped from the temporary file until it has been selected by all applicable users.

a new book shelf listing may be printed on photocopy paper on request. on preparation, it is ready to be matted, photographed, printed, and distributed throughout the department of agriculture. only as many new book entries are selected by the computer at one time as will fit on three sheets of a four-page publication. approved cataloged records are selected weekly. each record is analyzed for applicability to any or all of the eight major files for which catalog cards are prepared. each card file has its own criteria, both in content and in the number and types of cards produced for it. the system produces a separate record for each card required, sorts together the records for each file, and alphabetizes within that file. leading articles (regardless of language) are printed but are excluded in the sorting procedure (a small sketch of this filing rule follows below). cards are printed two-up in upper- and lowercase in the format prescribed by the anglo-american cataloging rules. after printing, the cards are distributed to the appropriate organizations and sections, where they may be filed with a minimum of additional effort.

monthly, a book catalog is compiled. this contains not only a listing by main entry but also indexes of personal authors, corporate authors, subjects, and titles. a biographic index (major personal author affiliation) capability is available, although not presently used by nal in the book catalog. this catalog is printed in varying numbers of columns, changeable by control card option for each index. again photocopy paper is used with a standard upper- and lowercase (tn) print train. an alternate option is magnetic tape output formatted for direct input to a computer-driven linotron. see bibliographic description for more detail. semiannually the index portions of the book catalog are cumulative. main entry listings are not repeated. multiyear accumulations may also be produced. the book catalogs are presently being published from photocopy printout by rowman and littlefield, inc., new york. bibliographies, either scheduled or special, can be produced with the same indexes as those in the book catalog. these are normally prepared for printing via the linotron. this magnetic tape record contains all formatting requirements with the exception of word divisions. document title, page, and columnar (subject category) headers are provided by nal; running headers are inserted by the linotron. through predetermined codes, the cain tape specifies the print style, print size, and print format. bibliographies may also be computer printed on photocopy paper similar to the book catalog.
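the card-filing rule mentioned above (leading articles are printed but ignored when alphabetizing) can be expressed as a small sort key. the article list below covers only a few languages and is a hypothetical stand-in for whatever table cain actually used.

# sort catalog headings while ignoring leading articles, as in the
# card-filing step described above. the article list is illustrative only.
LEADING_ARTICLES = ("the ", "a ", "an ", "der ", "die ", "das ", "le ", "la ", "les ")

def filing_key(heading: str) -> str:
    key = heading.lower().lstrip()
    for article in LEADING_ARTICLES:
        if key.startswith(article):
            return key[len(article):]
    return key

headings = ["The Soil Survey", "An Atlas of Wheat Rusts", "Die Landwirtschaft"]
print(sorted(headings, key=filing_key))
# -> ['An Atlas of Wheat Rusts', 'Die Landwirtschaft', 'The Soil Survey']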
once a month, each record selected for publication is processed through a merge and adjustment program. at this point published records not previously on the permanent master file are added to it. those which are already on it are compared, and the resident record is adjusted to include the new user for whom the record has just been published. the term field is also verified and updated if necessary. each term is also used to generate posting records for the subject authority file. the permanent (published) cain data base is available on magnetic tape in either the master format or a print format of the linear proof (a listing of each data element). only records not previously published are added to the monthly sale tapes. these tapes may be ordered individually (new monthly selections) or collectively (whole file) at the cost of reproduction only. the tape is nine-track, 800 bpi, ebcdic, with standard ibm 360 header and trailer labels. one of the purchasers of the cain tape is the ccm information corporation of new york, which has published the bibliography of agriculture from it starting in 1970. current purchasers include private corporations and universities, both in the united states and abroad. the last type of output is normal computer printout of numerous internal reports in a variety of customized formats.

search subsystem
the search capability of the cain system is not being used by nal on its own data base at the present time. it is utilized, however, by other organizations who run the cain system on a parallel basis, maintaining their own data bases. the following description, therefore, pertains to the programmed system rather than to its use on the nal data base. this subsystem permits identification and retrieval of records in cain format based on search statements applied to almost every data element or combination thereof. such searches may use simple statements or a complex series of nested boolean parameters. questions may also be absolute or weighted to give more precise results. the weight factors, if used, are normally assigned to each statement within a search question, with a threshold weight assigned to the overall question. the total weight of all true statements must be equal to or greater than the threshold weight for the full query in order to be considered as meeting the search criteria; if such is not the case, the record will not be selected. since cain uses a controlled vocabulary, query statements on subject terms are first matched against that authority file. at this point each invalid (use) term is replaced by a corresponding valid (uf) term if appropriate. in addition, if the query statement so specifies, the requested terms may be expanded one level in the hierarchy. in other words, the system can generate additional statements requesting all broader, narrower, or related terms as specified, if such structure is present for the subject within the vocabulary. because subject terms comprise the largest percentage of all search elements, an algorithm was developed whereby queries on this type of element are first processed against an inverted file. identification numbers are extracted for all terms matching the query, and only those candidate records are searched using the full query. on a serial file such as cain, this concept provides a substantial savings in computer run time. the print options of retrieval output allow either for normal sequence by identification number or for a specific sequence as requested by the originator. the printout may contain all data elements or only those selected, all others being suppressed. at the present time this subsystem is used infrequently by nal, and only for internal high-priority searches, due to the extremely limited subject indexing terms present.
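the weighted-query rule described above (the weights of all true statements must reach the question's threshold) is easy to sketch. the record and statement representations below are hypothetical; only the threshold rule itself comes from the text.

# sketch of cain-style weighted query evaluation: a question is a list of
# (predicate, weight) pairs plus a threshold; a record matches when the
# summed weights of the true statements reach the threshold.
def matches(record: dict, statements, threshold: float) -> bool:
    score = sum(weight for predicate, weight in statements if predicate(record))
    return score >= threshold

# hypothetical record and question
record = {"language": "english", "terms": {"soil", "nitrogen"}, "year": 1971}
statements = [
    (lambda r: "nitrogen" in r["terms"], 2),
    (lambda r: "wheat" in r["terms"], 2),
    (lambda r: r["language"] == "english", 1),
]
print(matches(record, statements, threshold=3))   # 2 + 1 >= 3 -> True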
the search subsystem is used more extensively on the parallel operation established for the international tree disease register maintained for the u.s. forest service.

authority files subsystem
this subsystem updates, generates, expands, and maintains three types of authority files: subject terms with their associated hierarchy, call numbers of indexed journals with abbreviated titles, and a subject-term inverted file carrying the identification number of each record using that term. each transaction to add, change, or delete any data is both edited and reversed before entering the updating sequence. thus the addition of a narrower term (for example, horse) to a base term (for example, animal) will automatically generate another transaction to add the broader term animal to a base term (new or existing) of horse. this precludes having to enter both sides of an action manually, as well as assuring reciprocity of entries. due to the flexibility of the search subsystem of cain, this hierarchical continuity is of great importance. if an item is changed, the same procedure is followed. in the instance of deletion, a broader precept is involved: the term is deleted from all entries in other hierarchies but is itself left on the authority file and marked as being no longer valid. it is thus available for search purposes but is not allowed to be used on subsequent cain data records. during a normal cain data run, each call number or subject term in a record is verified against the appropriate file. each element on these files is carried in two forms, one in stripped uppercase and the other in preferred print form. when an incoming term is found on the authority file, the system substitutes the proper form. this includes substituting a valid term for an invalid term, as in the "use-use for" relationship, as well as generating the appropriate abbreviated journal title for a given call number. in order to keep the authority file up to date, the transactions generated by the publication subsystem are now used to insert the record identification number into the inverted file as well as to increase the number of postings per term. this assists search specialists in formulating queries in the manner which will reduce computer processing time to the greatest degree. when published, the authority files themselves can be printed in a special format which displays the entire hierarchy of each term. in addition, up to ten levels of increasingly narrower terms can be listed for each term.

summary
cain is a broad-based, comprehensive, batch-mode system which meets many library requirements. its flexibility is apparent from the fact that it has already been expanded to select each newly cataloged serial record for transmission in marc ii communication format to the national serials data bank being created by the three national libraries. still more capabilities will undoubtedly be built into it before the nal ultimate on-line system is implemented. the major thrust of the systems design has been to concentrate on simplifying the user interface while imposing stringent and extensive service requirements on the computer system itself. due to its inherent fluidity, cain is being retained as an in-house system; it is so complex that a single change in one subsystem may have radial effects in any or all of the other portions. continuing efforts are underway to simplify input, accelerate throughput, and expand its already generous services both to the staff of the national agricultural library and to those organizations utilizing output from the cain system.
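as a closing illustration of the reciprocal authority-file maintenance described in the cain authority files subsystem above, the following python sketch adds and deletes narrower-term relationships so that the broader-term side is always generated automatically; the data structure and method names are assumptions, not the cain implementation.

# sketch of reciprocal thesaurus maintenance in the spirit of the cain
# authority files subsystem: adding "horse" as a narrower term of "animal"
# also records "animal" as a broader term of "horse"; deleted terms are
# kept on file but flagged as no longer valid.
class AuthorityFile:
    def __init__(self):
        self.narrower = {}    # term -> set of narrower terms
        self.broader = {}     # term -> set of broader terms
        self.invalid = set()  # deleted terms, retained for searching only

    def add_narrower(self, base, narrower):
        # one transaction generates its reciprocal automatically
        self.narrower.setdefault(base, set()).add(narrower)
        self.broader.setdefault(narrower, set()).add(base)

    def delete(self, term):
        # remove the term from other hierarchies but keep it, marked invalid
        for terms in self.narrower.values():
            terms.discard(term)
        for terms in self.broader.values():
            terms.discard(term)
        self.invalid.add(term)

af = AuthorityFile()
af.add_narrower("animal", "horse")
print(af.broader["horse"])   # {'animal'} -- generated reciprocally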
a computer-accessed microfiche library
r. g. j. zimmermann: department of engineering-economic systems, stanford university, stanford, california. at the time this article was written, the author was a member of the technical staff, space photography laboratory, california institute of technology, pasadena, california.

this paper describes a user-interactive system for the selection and display of pictorial information stored on microfiche cards in a computer-controlled viewer. the system is designed to provide rapid access to photographic and graphical data. it is intended to provide a library of photographs of planetary bodies and is currently being used to store selected martian and lunar photography.

introduction
information is often most usefully stored in pictorial form. photography, for example, has become an important means of recording data, especially in the sciences. a major reason for this importance is that photographs can be used to record information collected by instruments and not normally observable by the unaided eye. such photographs, especially in large quantities, may present a barrier to their use because of the inconvenience of reproducing and handling them. it is apparent that a system to compactly store and to speed access to these photographs would be very useful. such a system, utilizing a microfiche viewer directly controlled by a user-interactive computer program, has been developed to support a library of photographs taken from space. in the past fifteen years, the national aeronautics and space administration has conducted many missions to photograph planetary bodies. these missions have provided millions of pictures of the earth, moon, and mars. a large number of additional pictures are expected to be taken in the near future. the space photography laboratory of the california institute of technology is establishing, under nasa auspices, a microfiche library of a selection of these photographs. the library currently contains the photographs of mars taken by the mariner 9 spacecraft as well as lunar photographs taken by the lunar orbiter series. the library is expected to be expanded as time and resources permit. it has been operating, with various versions of the control program, since june 1972. the program is currently being further developed by mr. david neff and miss laura horner of the space photography laboratory at the california institute of technology.

hardware
the photographs are kept on 105-by-148 mm microfiche cards, sixty frames to a card. this format provides the least reduction of any standard microfiche format and was used to retain the highest possible resolution. the cards are displayed by a microfiche viewer (image systems, culver city, california) which can store up to about 700 cards and has the capability of selecting a card and displaying any frame on it within a maximum of about four seconds. (throughout this paper, "viewer" will be used to refer to the microfiche viewing device.) the viewer can be equipped with a computer interface which allows the picture display to be directly computer controlled.
an installation consists of the viewer with interface, any standard input/output (i/o) terminal, and the control program, running, in this case, on a time-shared computer. the terminal is used for communication with the control program; the user enters all commands by typing on the terminal keyboard. the viewer is designed to be plugged in between the computer and the i/o terminal. the computer transmits all information on the circuit to which normally (without the viewer) only the terminal is attached. this information includes the viewer picture display control codes, which are recognized and intercepted by the viewer; all other information is passed on to the terminal. no further special equipment is necessary. the system described has been implemented on a digital equipment corporation system 10 medium-scale computer with a time-sharing operating system. the program is written mainly in fortran with some assembly language subroutines. it runs in 12k words (36 bits/word) of core memory. the program will not run without conversion on any computer other than the dec system 10.

software
the control program is user-interactive, that is, it accepts information and commands from the user. these commands allow him to indicate what he desires and to control the action taken by the program. the program permits the user to indicate what characteristics he wishes the pictures to have, selects the pictures that satisfy his criteria, and then allows him to control the display of the selected pictures and to obtain any additional information he may need to interpret the pictures. to guide the user, instructions for use of the system, as well as other information the user may need, are displayed on the viewer as they are required. all user responses are extensively checked for validity. any uninterpretable response is rejected with a message indicating the source of the trouble, and may be reentered in corrected form. it is always possible to return to a previous state, so it is impossible to make a "catastrophic" error. in designing the system, particular attention was paid to integrating the viewer and computer to utilize the unique capabilities of each. for example, most instructions are presented on the viewer, where they can be shown quickly and can be scanned easily by the user. only short messages need to be sent and received by the i/o terminal.
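the in-band arrangement just described, in which viewer control codes travel on the same line as ordinary terminal text and are filtered out by the viewer, can be sketched as a simple stream filter. the framing bytes used here are invented for illustration; the article does not specify the actual control-code format.

# sketch of in-band filtering in the spirit of the viewer described above:
# characters framed by (hypothetical) markers are treated as viewer control
# codes, and everything else is passed through to the terminal.
CTRL_START, CTRL_END = "\x01", "\x02"   # assumed framing, not from the article

def split_stream(stream: str):
    terminal_text, viewer_codes = [], []
    in_code = False
    for ch in stream:
        if ch == CTRL_START:
            in_code = True
            viewer_codes.append("")
        elif ch == CTRL_END:
            in_code = False
        elif in_code:
            viewer_codes[-1] += ch
        else:
            terminal_text.append(ch)
    return "".join(terminal_text), viewer_codes

text, codes = split_stream("please enter name\x01F123\x02 of file desired")
print(text)    # -> please enter name of file desired
print(codes)   # -> ['F123']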
data base
a picture is described by a number of characteristics, called parameters. for every picture stored in the viewer, the value for each of these parameters is stored in a disc file. in this application, parameters are mainly used to describe characteristics that are available without analyzing the picture for content; in science, these are the experimental conditions, such as viewing and lighting conditions for space photography. because space photographs are taken by missions with different objectives and equipment, it was necessary to design a library system to include pictures with widely varying selection characteristics. in order to accommodate sets of pictures with widely differing characteristics, without wasting storage space or requiring the elimination of useful descriptors, the computer storage has been structured to allow pictures to be grouped into picture sets, each of which is described by its own set of parameters. conversely, any group of pictures for which the same selection parameters are used forms a picture set. the characteristics of each such set of pictures are also stored, and the program reconfigures itself to these characteristics whenever a new picture set is encountered. such an organization allows the control program to be used on groups of totally different kinds of pictures.

operation
in selecting a picture set the user is guided along a series of decisions presented on the viewer. at each step the control program directs the viewer to display a frame with a set of possible choices. the user enters his response on the i/o terminal, and the control program uses this response to determine which frame the viewer should be commanded to display next. when the user has selected a set, he is shown the available parameters and appropriate values for these parameters. after he has specified acceptable values for the parameters he is interested in, the computer program compares these values with the known values in its records for the picture set. the pictures selected by the program are then available for display. as will be described, the user may, at any time, select another picture set or change his parameter specifications. he may also indicate which pictures of those selected by the computer during the comparison search he wishes to have remain available after the next comparison search. this allows comparison of pictures in different picture sets. appendix 1 shows an example of a typical search. the action of the control program can be separated into five phases of operation, each with a distinct function. the functions of three of these phases involve user interaction. transfer between phases may also be accomplished by user command. a different group of commands is employed for each of the user-interactive phases. in addition, there is a group of commands which may be used any time a user response is requested; they are listed in appendixes 3 and 4. there are no required commands or sequences of commands. the user proceeds from one phase to another as he desires. in each phase allowing user interaction, the user can enter any valid command at any time. figure 1 shows the phases and possible transfers between phases. a more detailed description of what occurs in each phase will be given after the data organization is described.

fig. 1. phases and control transfers (the five phases are picture set selection, parameter specification, search optimization, comparison search, and picture display and information access; bold lines enclose user-interactive phases, arrows indicate possible directions of control transfer, and bold arrows are control transfers made by user commands).

description of software
data base organization
as has been stated, the pictures of the library are grouped into picture sets. the data base may contain any number of picture sets. each such set has a picture file associated with it. this picture file is on disc storage and contains all the known information stored for a set of pictures. each picture in the set has an associated picture record in the file. in addition, the first record in a picture file, known as the format record, contains all the file-specific information about that file. whenever a new picture file is called for, the format record for that file is read from disc storage into main memory and kept for reference. figure 2 shows the organizational structure of the data base.

fig. 2. picture file organization (each picture file consists of a format record followed by the picture records; the data base may contain as many picture files as required).
picture records consist of a fixed- and a variable-length portion. the variable-length portion contains the known values, for the associated picture, of the specification parameters. since the number of parameters can vary from file to file, the length of this portion varies from file to file. (however, all picture records within a particular file have the same length and form.) the maximum number of parameters for a system is determined by array dimensions set when the program is compiled; currently these dimensions are set for a maximum of fifty parameters for any file in the system. the fixed-length portion contains (generally) the same type of information for all files. it includes the information needed to display a picture and to obtain interpretive information. when, during the comparison search, a picture is selected on the basis of information in the variable data, the fixed-length portion is copied into a table and kept for use during the picture display phase. each selected picture is represented by an entry in this table. the contents of the fixed-length portion are presented in table 1. as an example, the contents of a picture record for the mariner 9 photographs are given in appendix 5.

table 1. the fixed-length portion of a picture record (field - use)
fiche code - control code output by the control program to the viewer to display the frame associated with this picture record.
file name - the file name of the picture file; this and the picture number uniquely identify the picture record and allow it, and specifically the contents of the variable portion, to be refound.
picture number - a sequence number assigned to each picture record in the file in increasing order.
unit number - the viewer in which the picture associated with this picture record is stored.
id number - the identification number referred to by the user. if the picture has been given an id number by which it is commonly known, it will be kept in this field.
auxiliary codes (3 fields) - viewer control codes for frames containing different versions of, or auxiliary data for, the picture. the actual contents of these fields vary with the picture file, as determined from the contents of the format record of that file.

a picture file's format record describes the file by all characteristics that are allowed to vary from file to file. the format records for all picture files have the same form; each is divided into a number of fields supplying information for a particular function. these fields can be separated into two categories: those which describe the picture records and those which apply to the file as a whole. for fields of the first type, each parameter has an entry in the field. for example, one such field contains the location, in a picture record, of the value for each of the parameters; another field has a ten-letter description of each parameter. see appendix 2 for a description of the format field.
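a compact python rendering of the fixed-length portion in table 1 follows. the field names come from the table; the types and the example values (loosely modeled on the appendix 1 transcript) are assumptions.

# sketch of the fixed-length portion of a picture record (table 1 above).
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PictureRecordFixed:
    fiche_code: str            # control code sent to the viewer to show the frame
    file_name: str             # with picture_number, uniquely identifies the record
    picture_number: int        # sequence number within the picture file
    unit_number: int           # which viewer holds the picture
    id_number: str             # identification number referred to by the user
    auxiliary_codes: Tuple[str, str, str]  # frames with other versions or auxiliary data

example = PictureRecordFixed(
    fiche_code="2-E-2 1-A",
    file_name="MMIX",
    picture_number=2,
    unit_number=0,
    id_number="9557769",
    auxiliary_codes=("2-E-2 5-A", "2-E-2 5-K", ""),
)
print(example.file_name, example.id_number)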
operation of the control program
the following is a brief technical description of the control program; detailed documentation is available. the control program is modularly constructed. each phase consists of a major subroutine and its subsidiary subroutines. at the completion of a phase, control is transferred to a main program which determines which phase is to be performed next and transfers control to it. the user-interactive (interrogation) subroutines ask for a user response, attempt to interpret the response and perform the desired function, then ask for another response. an important subroutine used by all the interrogation subroutines collects the characters of the user response into groups of similar characters to form alphabetic keywords, numbers, punctuation marks, relational operators, etc. when an interrogation subroutine is ready for a user request, it calls this "scanning" subroutine. the scanning subroutine outputs an asterisk, indicating it is ready, to the user i/o terminal. the scanning subroutine supplies the groups of characters, along with a description of each group, to the interrogation subroutine. the interrogation subroutine then attempts to interpret the character groups by comparing them with acceptable responses. if the response is not in one of the acceptable forms, an error message is given to the user and he can try again. the error message includes an indication of where the error was found and describes the error. some commands do not need to be interpreted by the interrogation subroutines; the function they request is the same throughout the program. these are called immediate commands and are listed in appendix 3. these commands are interpreted, and their functions performed, by the scanning subroutine.

picture set selection
in selecting a picture set the user is asked to make a series of decisions. for each decision, a frame listing the possible choices is displayed on the viewer. all possible decisions form an inverted tree structure (see figure 3). the user may also return to a previous decision point. the tree structure is implemented in a table in computer storage. there is an entry in this table corresponding to each decision point in the tree. when a decision is made, the entry corresponding to the new decision point is obtained. an entry at the bottom of the tree identifies the picture file associated with the picture set selected. in general, an entry contains: (1) the viewer control code of the frame displaying the choices; (2) a pointer to the entry from which this node was reached; (3) the number of possible decisions which can be made at this decision point (to check for valid decisions); and (4) pointers to the entries for the decision points reached.

fig. 3. example of a tree (decision points branch from martian and lunar photography, through orbital and surface missions such as mariner, viking, apollo, lunar orbiter, ranger, and surveyor, with further branches for venus and mercury flybys).
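the decision-point table just described can be sketched directly: each entry holds the viewer control code of the choice frame, a pointer back to its parent, the number of valid choices, and pointers to the child entries. the node names, frame codes, and file names below are illustrative (the file names echo the mmix and orbit files that appear in appendix 1), not the actual table contents.

# sketch of the picture-set selection table described above. each decision
# point stores the frame to display, its parent, and its children; entries
# with no children name a picture file. all values here are made up.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DecisionPoint:
    frame_code: str                      # viewer control code of the choice frame
    parent: Optional[str] = None         # pointer to the entry we came from
    children: list = field(default_factory=list)   # pointers to next decision points
    picture_file: Optional[str] = None   # set only at the bottom of the tree

    @property
    def n_choices(self) -> int:          # used to validate the user's decision
        return len(self.children)

tree = {
    "root":    DecisionPoint("F000", children=["martian", "lunar"]),
    "martian": DecisionPoint("F100", parent="root", children=["mmix"]),
    "lunar":   DecisionPoint("F200", parent="root", children=["orbit"]),
    "mmix":    DecisionPoint("F110", parent="martian", picture_file="MMIX"),
    "orbit":   DecisionPoint("F210", parent="lunar", picture_file="ORBIT"),
}

def choose(node: str, decision: int) -> str:
    """follow the user's decision (1-based) to the next decision point."""
    entry = tree[node]
    if not 1 <= decision <= entry.n_choices:
        raise ValueError("invalid decision")
    return entry.children[decision - 1]

print(choose("root", 1))     # -> martian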
parameter specification
once the user has made a decision selecting a set of pictures, he is presented with a list of the available parameters and acceptable values for them. for each parameter in which the user is interested, he specifies the parameter number and the values or range of values acceptable to him. this information is stored in two tables which are referred to when the comparison search is made. one table, the parameter table, contains an entry for each parameter specified; this table is cleared whenever a new picture set is called for. an entry in the table includes: (1) the parameter number; (2) a code indicating which of several methods is to be used in processing the parameter; (3) a code providing information on how the user-specified values are to be interpreted; and (4) a pointer to the location in a second table, the values table, where the first of the specified values is stored. all additional values are placed in the values table following the addressed value. the processing code (number (2) above) allows each parameter to be processed by a unique method. a standard method for a given parameter is kept in a field of the format record; the user can also specify a method other than the standard one. if an entry already exists for a just-entered parameter, the old entry is updated rather than a new one created.

search optimization
this phase determines the most efficient way to conduct the comparison search from among a set of alternatives. whenever possible, the search is restricted to only a part of the picture file. for each picture file there is a number of parameters for which additional information is available. specifically, if a list of pictures ordered by increasing value of a parameter is available, the pictures which have a particular value of that parameter can be found more quickly through this list than by searching through the whole file for that value of the parameter. if the position, in this ordered list, of the picture at the low end of a range of values (of the parameter the list is ordered on) can be found easily, the search can be started at this point and need only be continued until the picture at the high end has been reached. note that the picture records for the intervening pictures must nonetheless be compared with the user specifications, since the restriction is only made on the basis of one parameter whereas more than one may have been specified. a binary search is the method used to search the list for the first picture in a range of values. to use this method, of a set of n picture records the n/2th is chosen and its value of the parameter is compared with the desired one. since the list of records is in order of the value of this parameter, it is clear in which half of the list a picture with the desired value of the parameter would have to be. this interval can then be divided and the process continued until the remaining interval consists of only one picture. the main picture file is itself usually arranged in order of at least one parameter. for other parameters, control lists of picture numbers ordered by value of these parameters can be used for binary searches. however, it is not practical to create these lists for all parameters, as they require a fair amount of storage. an entry in such a list contains two words, the value of the parameter and the picture number of the corresponding picture. the picture number is a sequence number which determines the position of the picture record relative to the beginning of the picture file. each picture file has a table in its format record containing identifiers for the parameters for which the binary search technique can be used. if more than one of these has been specified (as stored in the parameter table), it must be determined which parameter restricts the search the most. to do this the upper and lower limits of the specified values of each such parameter are found (from the values table), and from this the expected number of picture records to be compared is computed. this number is multiplied by a factor indicating the speed of the type of search to be used relative to the speed of the simplest type of search. the parameter with the lowest expected elapsed time of search is selected for the search.
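in python, the control-list binary search described above reduces to finding the first entry whose value is at least the low end of the requested range and scanning forward until the high end is passed. the (value, picture number) pairs below are invented; only the method follows the text.

# sketch of the search-optimization step described above: binary-search an
# ordered control list of (value, picture_number) pairs for the low end of
# a range, then scan forward until the high end is passed.
import bisect

def candidates(control_list, low, high):
    """control_list is sorted by value; yield picture numbers with low <= value <= high."""
    values = [value for value, _ in control_list]
    start = bisect.bisect_left(values, low)        # the binary search
    for value, picture_number in control_list[start:]:
        if value > high:
            break
        yield picture_number

# hypothetical control list ordered by, say, orbit number
control = [(100, 17), (150, 4), (222, 31), (222, 32), (300, 9)]
print(list(candidates(control, 150, 250)))   # -> [4, 31, 32]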
comparison search
for each picture to be compared, the appropriate picture record is found and the specified parameter values are compared with those in the picture record. a control list, selected in the search optimization phase, may be used to determine which picture records are to be compared. for each selected picture an entry containing a portion of the picture record is made in a picture table. the picture table has a limited capacity which is set when the program is compiled; for our application there is currently room for up to 100 entries. if the picture table is filled before the search is finished, the search is suspended and can be continued by a command in the display phase.

picture display, information access
this phase accepts commands to control display of the selected pictures and provide access to interpretive information. the picture table entries provide the information needed, either directly or by referring back to the picture record. any of the selected pictures can be viewed at any time. in addition, the user can "mark" preferred pictures to differentiate them from the others. these marked pictures are set apart in the sense that many viewing and information access commands refer optionally to only these pictures. the pictures themselves are the primary source of information, but the user will often want information that is not available from the picture in order to interpret the picture. there are commands that request the control program to type out on the i/o terminal the information in a picture record. these commands optionally refer to the picture currently displayed, the marked pictures, or all the selected pictures. other commands call for the display of data frames associated with a picture. these frames can contain large volumes of data that need not be kept in computer storage. the viewer control codes for these frames are kept in the picture table. the keyword commands to display data frames can vary from file to file; the valid commands for a file are kept in the file's format record. there are other commands to transfer control to other phases and to keep desired pictures available for display with those selected by the next comparison search. there is also a provision for adding file-specific commands to perform any other function. the commands and their functions are listed in appendix 3.

performance and costs
a typical simple search consisting of logging in, picture selection, parameter specification, search, and display might take five to ten minutes and cost one to two dollars for compute time. most of this is time spent by the user in entering commands. command execution is usually almost immediate, as it does not involve a major amount of computation. most of the compute time is accumulated during the comparison search phase. to search through the entire mariner 9 picture file of around 7,000 pictures (about 200,000 words) takes about forty seconds elapsed time and costs about two dollars. a more typical search, however, will allow some search optimization and cost about thirty cents with an elapsed time of ten seconds.
of course, these timing and cost figures should only be used as estimates, even for other dec system 10 systems, as elapsed time depends on system load and this, as well as the rates charged, varies considerably. total monthly compute costs for a system depend entirely on use. likewise, storage costs depend on actual storage space used. for the 200,000-word mariner 9 file our cost is about seventy-five dollars per month. only the most-used picture files actually need be kept on disc; the rest can be copied from magnetic tape if they are needed. all files are backed up on magnetic tape in any case. the rates listed in this paper are those charged by our campus time-sharing system. dec system 10 computer time is available from commercial firms at somewhat higher rates. the cost for a microfiche viewer with computer interface (image systems, culver city, california, model 201) is around $7,000. a thirty-characters-per-second i/o terminal sells for $1,500 and leases for $90 per month. in addition, an installation may require a microfiche camera and other photographic equipment and supplies. photographic services are also available from the viewer manufacturer. the hardware cost for an independent system implemented on a minicomputer with 12k to 20k of core and five million words of disc memory is estimated at an additional $30,000 (exclusive of development and photographic costs).

implementing a library system
in implementing a library system to use the hardware and software described in this paper, two major areas of effort are required. first, the pictorial information must be converted to microfiche format; that is, it must be photographed, or possibly rephotographed if already in photographic form. in addition, a computer data base must be created. if information about the photographs is already available in computer-readable form, this involves writing a program to convert the data to the structure required by the control program. if this type of information is not available, the pictures may need to be investigated and the information coded, and presumably punched onto computer cards, for further processing. the major difficulties we encountered were coordinating the photographic and data base generation tasks, achieving the high resolution we required to retain the detail of the original photographs, and using early versions of the microfiche viewer (which had a tendency to jam cards).

conclusion
a system for rapid access to pictorial information, the computer accessed microfiche library (caml), has been described. caml has been designed to integrate, in an easy-to-use system, the storage capacity and capability for fast retrieval of a special microfiche viewer with the manipulating ability and speed of a computer. it is believed that this system will help overcome the barriers to the full utilization of photographs in large quantities, as well as have applications in the retrieval of other types of pictorial information.

acknowledgments
the work described in this paper was supported by nasa grant #ngr 05-002-117. the author is grateful to dr. bruce murray and the staff of the space photography laboratory at caltech for their support and advice; he also wishes to acknowledge the efforts of mr. james fuhrman, who assisted in the programming task and contributed many valuable ideas.

appendix 1
the following is an example of a typical search. numbers in the left margin indicate when a new frame is displayed on the viewer.
these were added later to clarify the interaction between viewer and terminal. user responses and commands are identified by lines beginning with an asterisk. (the control program types asterisks when it is ready for input.) in this demonstration, most keywords were completely typed out. it is possible, however, to abbreviate any keyword to the shortest form that will be unique among the acceptable keywords. after the user enters a standard "log in" procedure to identify his account number and verify that he is an authorized user of this account, the control program is automatically initiated. the viewer displays a picture (1) of the installation and the user is asked to enter his name. the name, charges, and time of use will later be added to an accounting file. the user now enters the picture set selection phase. in the current system, only two files (picture sets) are stored and the user is simply presented with a frame (2) listing the file names and giving a short description of what is contained in each. the user types the desired file name (mmix-mariner 9 mars photographs) and thus enters the parameter specification phase. the available selection parameters and acceptable values are now shown (3). the user specifies some parameters.

[terminal listing, frames 1 through 13: log-in; entry of the user's name; selection of file mmix; specification of the parameters orbit 222, camera a, and latitude -45 to 45; a search selecting 2 pictures; display, mark, and type parameters commands; an example of an erroneous command and the resulting error message; respecification and a restart against the orbit file, selecting 22 pictures; a charges inquiry; and log-off.]

if not used, file name is assumed to refer to the file last searched. if the parameters are not enumerated, those specified for the picture selection are typed out. the parameters to be typed out can be enumerated or the specification parameters called for. if neither of these is done, the values of all parameters are typed out. parameters typed out are identified by column headings.

phase transfer commands and their functions:
respecify: allows respecification of selection parameters; only those parameters which are reentered are changed, and previously specified parameters retain their values.
search: similar to respecify, except that only those pictures in the present list are candidates for selection. this is more efficient than again searching through all the pictures.
continue: if the search was terminated before all pictures had been processed, the search is continued from where it had been suspended.
restart: to view another set of pictures (all specified parameter values are deleted).

appendix 5
mariner 9 picture records (fields numbered 1-22 and 23-28)
fixed-length portion: fiche code; data code; file name; id number (das); unit #; picture number; footprint code; unused.
variable portion: das time; orbit; latitude; longitude; solar lighting angle; phase angle; viewing angle; slant range; camera; resolution; local time; filter; exposure time; roll and file of filter version on roll film; comments (content descriptors).

journal of library automation vol. 4/1 march, 1971
book reviews
computerized library catalogs: their growth, cost, and utility, by j. l. dolby, v. j. forsyth, [and] h. l. resnikoff. cambridge, mass.: the m.i.t. press, 1969. 164 pp. $10.00. on the verso of the title page of this book extolling the benefits of computer stored information we read: "no part of this book may be reproduced in any form or by any means, . . . , or by any information storage and retrieval system, without permission in writing from the publisher." this is ironic evidence of the inner contradiction between the pioneering aspirations of new technological development and the interests of an existing industry which feels threatened by unfounded fear of obsolescence and pursues claims to undesirable universal control of information. it is a vivid indication of an urgent need for statutory regulation of the right to information as opposed to the right of profit-motivated control of information. behind this disturbing title page are found seven chapters, three of which constitute a specially important contribution to the very scarce literature on quantitative aspects of bibliographic administration.
the other four chapters deal at some length with various cost-related aspects of bibliographic record conversion in machine readable form and with computer use for the production of library catalogs from such records: a chapter analyzing the user costs, costs of programming, hardware costs, and record conversion costs; another chapter on the effect of type face design and page format characteristics on the cost of printed catalogs; a chapter on automated error detection in bibliographic record processing; and a chapter on the use of machine readable catalog data in the production of bibliographies. the three chapters on statistical analysis of machine readable bibliographic data are chef-d'oeuvres of library literature. they demonstrate the wealth of quantitative information inherent in bibliographic record files, which with application of appropriate statistical methodology can yield most important information for library management. the introductory chapter illustrates a methodology of analysis of book publication trends, and comparing these with the gross national product and other economic indicators, points out forcibly the extent of vital quantitative information available to the administrator if he cares to analyze. this is a brilliant essay on the topic of the growth rate of library collections! the case study of the fondren library shelf list is a further elaboration of this theme, especially in terms of title vs. volume ratios and class distri· bution, leading to the third analytical essay on the similarities between the economic growth of nations and archival acquisition rates. this essay trio should not be missed by anyone concerned with the objectives and rational management of libraries. it should be obligatory book reviews 53 reading for administrators who want food for their creative vision. these essays not only are informative, illuminating and stimulating, they also attest the virtuosity of their authors in the area of imaginative statistical analysis. we put this book on our ready reference shelf with the conviction that its authors should be persuaded to give us a most needed textbook on the methodology of library statistical analysis. ritvars bregzis library use of computers, an introduction, edited by gloria l. smith and robert s. meyer. new york: special libraries association, 1969. 114 pp. $5.00. ( sla monograph no. 3). this little paperback book is the result of a lecture series held during the spring of 1965 and sponsored by sla's san francisco bay region chapter and the university extension, university of california, berkeley. according to the introduction, it is intended to be "a librarian's primer for computers: briefly, how they work, and basically what kinds of output can be got from them for use in libraries." the book includes most aspects of library automation in a very generalized manner. the chapters include programming, systems analysis, hardware, applications, reference services, conversion, and current trends and future applications. the most informative chapters are the ones on application and current trends. the other chapters matter-of-factly present their information, but because they fail to raise questions do not challenge the beginner to seek additional answers. in addition, most of the papers do not include a bibliography or reading list and therefore do not give the reader guidance to further his interests. 
the more substantive papers in the book confront the reader with provocative statements, such as, "it should be the cost to the patron which the library should worry about. what is the cost to the ~atron of not knowing something, or the cost of having to find out?" or ' . . . library science . . . today is really only a technology, much like medicine was before biology was developed, namely a handbook-cookbook world. we must continue to search for underlying principles so that librarianship may become a science and grow out of its technological phase." the papers without doubt met the needs of the lecture series for which they were originally intended, and the authors are experts in the field. most of them are well known for their work in library automation and mechanized information retrieval the question nevertheless arises-must ~very spoken word subsequently appear in print? very little contribution is made to the literature by this book as most of it has appeared elsewhere, many times, before-even down to many of the illustrations. as a group of elementary instructional lectures designed to stimulate intere!jt, most of these papers make the grade, but as a contribution to the literature, a fllw of the papers may have better served as journal articles. donald p. hamme1' 54 journal of library automation vol. 4/1 march, 1971 the career of the academic librarian-a study of the social origins, educational attainments, vocational experiences, and personality characteristics of a group of american academic librarians, by perry d. morrison. chicago: american library association, 1969. viii, 165 pp. $4.50. this acrl monograph no. 29 is a revised and condensed version of a dissertation for the degree of doctor of library science at the university of california at berkeley, published ten years after the data were collected. in 1958 the author sent questionnaires to two groups of academic librarians: head librarians of american college and university libraries earning $6,000 or more, which he calls the "primary group," and a "control group" from the same institutions selected from the 1955 edition of who's who in library service. the findings are quite interesting, often expected, but sometimes surprising. it would be expected that the study would support the theory that the true leader has interests broader than those whom he leads, that participation in professional and scholarly organizations is directly correlated with position and salary, and that willingness to move around to different positions is an important ingredient in the formula of "success". but this reviewer, and perhaps others, are surprised to learn that librarians, psychologists and business leaders tend to come from families of men who are better educated than the average, and that it is more advantageous for a woman than a man to hold a master's degree in a subject field and/or to possess the old-type master's in library science. the chapter on "implications of findings" has some interesting comments from respondents, among which is the notion that rewarding specialized competence, as opposed to general administrative ability, is essential if a maximum contribution is to be secured from both men and women. not many academic library administrators would quarrel with this point of view, but might quarrel with the heavy burden which the respondents lay on the library schools to solve the problems of academic librarianship, rather than sharing them with the libraries. 
criticism of the study is directed toward the rather pessimistic, probably unrealistic, attitude taken by the author on the question of faculty status, and the omission of any substantial inclusion about information science and information specialists, although one needs to recall that these were not prominent issues in 1958, though they were in 1969. the wealth of information contained in this little volume can be useful in recruiting to the profession and to individual libraries, helpful to the young librarian just starting out in the practice of his profession, and of value to administrators of academic libraries. the appendices include a copy of the questionnaire, a statement of the statistical treatment, suggestions for further research and an extensive bibliography. lewis c. branscomb book reviews 55 the computer in the public service: an annotated bibliography 19661969. compiled by public administration service. chicago: public information service, 1970. 74 pp. $8.00. it seemed useful to approach this review from the point of view of a designer of automated library systems, to see how well this bibliography covered the use of computers in that sector of the public service. thus approached, the computer in the public service is disappointing in the extreme. nearly a quarter of the volume is used to explain the classification system, wherein one finds libraries lumped together with museums and the promotion of science and research. this over-generalization is reflected throughout the work and results in very thin coverage, at least as concems the field of library automation. for example, in the four-year period covered, one finds a single article by henriette a vram. one finds neither marc nor recon in the subject index. the indexes, by the way, refer one to the numbers of the sections of the classification code, rather than to page numbers. this takes some getting used to. both journal articles and monographic works are included, but one is hard put to find reference to most of the major systems problems now being tackled by library systems designers, and is struck by the fact that most imprints included seem to be 1966-67. but to return to the real problem with this work: covering four years of the use of computers in the public service in forty-seven pages of citations naturally leaves wide gaps in the literature. the person wishing to learn all about the use of computers in the public service, especially in library systems, will do well to look beyond this meagre list of citations. it can perhaps be best characterized as too little and too late. lawrence g. livingston computer programming for business and social science, by paul l. emerick and joseph wilkinson. homewood, ill.: richard d. irwin, 1970. xviii, 429 pp. $13.25. this book is devoted principally to business programming, with a few of the examples being taken from educational administration or social ~cience. the computer language used throughout most of the chapters ~s fortran-not the current fortran iv, which is covered briefly m an appendix, but the older fortran ii. one chapter is devoted to cobol and another to an "overview of other languages" including basic, algol, and pl/i. it is a satisfactory introduction to programming fundamentals, generally clear and well presented. furthermore, the ?umerous exercises in business programming would seem to be a useful mtroduction to that field. 
however, neither the language nor the type of ~xample used gives the book any special relevance for persons approaching library automation. foster m. palmer 56 ] oumal of library automation vol. 4/1 march, 1971 consortiwn of universities of the washington metropolitan area: union list of serials, edited by bruce dack. 2d edition. washington, d.c.: distributed by the catholic university of america press, 1970. $20.00. this edition represents 23,787 diherent serial titles located in american, catholic, george washington, georgetown and howard universities. the scope has been enlarged to include monographic series. among the subjects emphasized are africana, astronomy, canon law, chemistry, classics, latin americana, law, linguistics, medicine, physics, semitics, theology and the american negro. although the citations lack the complete bibliographic data that was included in the first edition, the cross references are more than adequate to get from variant forms of the entry to the latest form. in most instances only the title is noted along with the holdings of the various libraries. the print is not as good as that of the first edition. even though the quality does not quite come up to that of the first edition, the list is still worthwhile for people doing research in the washington metropolitan area. sue brown the case for faculty status for academic librarians, edited by lewis c. branscomb. chicago: american library association, 1970. 122p. $5.00. this volwne is available at a most opportune time because of the current interest in faculty status for librarians in academic institutions. fourteen articles and statements examine the various aspects of the subject and generally support faculty status for librarians. although only two of the articles have not appeared in colleges and research libraries during the past decade, this compilation effectively brings together the relevant statements on the subject. faculty status for librarians is a burning issue on many campuses where recognition is being sought, threatened, or questioned. unfortunately, many librarians take faculty status for granted or perhaps do not completely understand or appreciate it. this book will be useful for many groups and situations because of its comprehensiveness. some of the subjects discussed are priviliges and obligations of faculty status, definition of professional duties, criteria for appointment and promotion, opportunities for study and research, and the granting of tenure. on individual campuses faculty status for librarians must be secured and retained in terms locally acceptable and in a frequently unfriendly, if not hostile, atmosphere. the article, "institutional dynamics of faculty status for librarians," by robert h . muller very effectively describes the advantages of faculty status for the institution and explains the forces that tend to work against faculty status for librarians. ]ames t. dodson book reviews 57 monocle; projet de mise en ordinateur d'une notice catalographique de livre, [by] marc chauveinc. grenoble: bibliotheque universitaire, 1970. (publications de ia bibliotheque u niversitaire de grenoble, iii) 156p., [32 leaves] this attractively printed volume is an adaptation, not a translation, of lc marc ii and bnb marc by the librarian of the university of grenoble. as m. chauveinc points out, the format established in a country is based on the cataloging rules used there, since the latter determine the cataloging elements, their functions and relationships. 
element by element, code by code, everything in marc had to be redefined for monocle in the context of the french rules. because of a commitment to international standardization, care was taken not to alter the basic structure of the content designators used in marc. when a marc variable field is not needed in monocle, e.g., 650 ( l. c. topical subject headings) it is left unused and a new tag is created for the equivalent french field. french topical subject headings are thus 681. marc is a communications format; monocle is a processing format for an individual library which wishes to place its bibliographic records in memory, to produce printed catalogs, and to keep statistics that will be useful in administering the library. the structure of monocle divides each record into two parts: an index file and a library or principal file. this latter is a continuous string of variable length fields which contain the data of a catalog record. the fields and subfields are differentiated in this file only by delimiters and subfield codes. the tags and indicators appear in the directory. the index file in its turn is divided into three parts: a leader, information codes, and a directory which is similar to, though not identical with, the marc directory. the nineteen-byte leader and the sixty-nine-byte information codes area together form the legend. the leader is based on the marc leader and 001 (control number) field while the information codes are an expansion and replacement of marc's 008 field, with provision for serials as well as monographs in this one format. some of the information indicated in the legend can be found in full in the variable fields. however, in line with monocle's avowed objective of automating to improve the running of a library, this information is repeated in the legend, since information placed there, being coded and in fixed positions, can be accessed and sorted more easily, rapidly, and economically. one of monocle's most striking characteristics is the great concern exhibited throughout for sorting and filing arrangement; one example is the effort made to give sorting values to subfield codes. in this respect monocle follows the example of bnb marc rather than lc marc, 58 journal of library automation vol. 4/1 march, 1971 which uses subfield codes only as a means of identifying distinct elements within a field. everyone interested in the problems of bibliographic formatting, or sorting for filing, should give monocle close attention, both for its specific provisions (such as its tagging conventions, its search code, its treatment of titles, subrecords, and references, etc.) and for the light it throws on marc. while lc marc is very succinct in its commentaries and rru:ely justifies its codes, monocle has much fuller explanations of its provisions and its reasons for agreeing with or differing from marc. judith hopkins system scope for library automation and generalized information storage and retrieval at stanford university. stanford, calif.: stanford university, 1970. 157 pp. (available from eric document reproduction service. ed 038153 mf $0.75; hc $7.70) "the purpose of this document is to define the scope of a manual-automated system to serve the libraries and the teaching and research community of stanford university." 
the automated system considered is not one, but the joint development of two major bibliographic projects; ballots (bibliographic automation of large library operations on a timesharing system) and spires (stanford physics information retrieval system). the development activity falls into three areas; applications unique to ballots, applications unique to spires, and common facilities that are used by both applications, such as executive and communications software and a text editor. the document is roughly divided in two, with hah being devoted to the scope statement and the other half a myriad collection of appendices. the scope portion of the document defines a second phase of development for the system, as prototype applications have been in operation. the objectives of the applications are redefined in system level detail in view of experience learned from phase one. hardware is evaluated and there are indications that it is inadequate to effectively handle even the prototype system. the appendices include a glossary for the uninitiated, sample documentation of the present library operations, a comment on how the law library could use the system, a review by louise addis of the stanford linear accelerator center's experience with spires, and a tutorial on information retrieval. because of the audience this publication is intended for (librarian users, system developers, and administrators) library automation specialists and information scientists will not find much to put their teeth into. the document seems to be intended mainly for internal use rather than external distribution. alan d. hogan book reviews 59 advances in librarianship, edited by melvin j. voigt. volume 1. new york: academic press, 1970. 294 pp. this volume is a most welcome addition to the literature of librarianship, and the prospect of its annual reappearance is indeed cheering. for decades, there were few major innovations in librarianship, but about 1960 there occurred a series of events in libraries, including user-operated photocopying machines, radical improvements and extensions of microphotography, and computerization, that are leading to formulation of new objectives, new systems, new techniques for printing, new media of communication, and new knowledge. up until a dozen years ago, a librarian could keep abreast of new knowledge in his field by skimming a few journals and reading precious few articles. today to keep up he should read a couple of abstract journals and request offprints, read most of the artic1es in at least four or five journals, and advances in libmrianship. this first volume contains eleven chapters by different authors that · discuss topics in a broad span of librarianship : cataloging; acquisitions; costs; academic, school and public libraries; bibliotherapy; and developing countries. although the standard observation by reviewers of such volumes is that "as is to be expected, the quality of the papers is uneven, with some falling below others," it would be accurate to state of this volume that the quality of some papers is higher than others. the editor is to be commended for having produced an excellent publication that should be on the personal shelves of every librarian who wishes to keep abreast of advances throughout his profession. frederick g. kilgour folkbiblioteken och adb. en introduktion i automatisk databehandling. av sten henriksson. datorn som hjalpmedel vid utlan och katalogisering. av claes axelson, lund: bibliotekstjanst, 1969. 86 p., ills., reg. 
the general service bureau for swedish public libraries, bibliotekstjanst, has edited this introduction to computers and library automation ( circulation and cataloguing). the book is written for persons who know the functions of a library and have perhaps more interest in than knowledge of computers and computer technique. the introduction about computers by sten henriksson from the university of lund is not loaded with numeric statements but carefully explains the computer technique. for non-mathematicians this is an extremely good introduction. claes axelson from bibliotekstjanst writes about the computer's use in circulation and cataloguing. the described circulation system is based on some experiments carried out at a branch library in malmo (stock: 15,000 vols., circulation: 50,000 vols per year). borrower's card and book-card are matched, and batch processed once a week. in case any cards are lost, 60 journal of library automation vol. 4/1 march, 1971 new ones are punched at the desk. reservations were troublesome to handle (a more serious objection if the systems had been intended for use in research and special libraries). cataloguing is described without referring to any experiments. all possibilities are mentioned, book catalogues, card catalogues, databanks, microfiche. specific problems about book cataloguing are treated too. both parts of the book are well written and illustrated with taste and economy. there is an index and a bibliography, but one will find ]ola and program missing. for librarians in scandinavian countries this book is a very useful one-though some problems are related only to the swedish public library world. mogens weitemeyer latin american literature. harvard university library (widener library shelhist, 21). cambridge, massachusetts : harvard university library, 1969. 498 pp. $20.00. the harvard library initiated its monumental project for publishing the widener shelf list in 1965 with the appearance of crusades. making available a classed listing of the holdings of one of the world's largest scholarly libraries is a major contribution to scholarship, as reviewers of the early volumes gratefully acknowledged. however, this review is not concerned with content of this remarkable publication, but rather with the typography of the 21st volume. the early volumes ( 1-8) were reproduced by photo-offset from computer produced copy that was all in upper-case characters. to be sure, one does not read a shelf list, one consults it. nevertheless, literature is in lower case, and using large compilations in upper case is tiresome because of the low legibility of upper-case characters. the plates of these first volumes contained an average of 55 entries. the harvard library improved its procedures about a year after the volumes began to appear, so that the computer printout was in an expanded character set that included lower-case characters and diacritics. the newer volumes were far more legible and therefore far more comfortable to use. each page carried a single column of entries; pages averaged 85 entries. beginning with volume 21, computerized phototypesetting techniques are being employed, and legibility is greatly improved. economy has also improved, for each page now averages over 130 entries per page-nearly two-and-one-half times the content of pages in the early volumes. volume 21 has two columns of entries per page-a format that enhances the number of entries as well as legibility. 
harvard is to be congratulated on taking advantage of computer developments during the first five years of publication of the widener shelflist. thereby, the aesthetics and economy of a major bibliographic publication have been gratifyingly enhanced. frederick g. kilgofjf june_ital_rubel_final picture perfect: using photographic previews to enhance realia collections for library patrons and staff dejah t. rubel information technology and libraries | june 2017 59 abstract like many academic libraries, the ferris library for information, technology, and education (flite) acquires a range of materials, including learning objects, to best suit our students’ needs. some of these objects, such as the educational manipulatives and anatomical models, are common to academic libraries but others, such as the tabletop games, are not. after our liaison to the school of education discovered some accessibility issues with innovative interfaces' media management module, we decided to examine all three of our realia collections to determine what our goals in providing catalog records and visual representations would be. once we concluded that we needed photographic previews to both enhance discovery and speed circulation service, choosing processing methods for each collection became much easier. this article will discuss how we created enhanced records for all three realia collections including custom metadata, links to additional materials, and photographic previews. introduction ferris state university’s full-time enrollment for fall 2015 was 14,715 students. of these students, 10,216 are big rapids residents and the other 4,499 are either kendall college of art and design students or at other off-campus sites across michigan.1 during the 2014-2015 school year, flite had 14,647 check-outs including 2,558 check-outs of items in reserves, which is where our realia collections are located.2 however, reserves includes other items in addition to these collections, thus making analysis of circulation statistics problematic. another problem with conducting such an analysis is that the educational manipulative collection already had photographic previews and the tabletop game collection is a pilot project, so there is no clear before and after comparison. we can, however, demonstrate that enhancing the catalog records for our anatomical model collection had an incredibly significant impact, jumping from a handful of check-outs from 2014-2015 to almost 450 in 2016. literature review although there are very few libraries using photographic previews for their realia collections, the ones that do described similar limitations with bibliographic records and goals that only dejah t .rubel (rubeld@ferris.edu) is the metadata and electronic resources management librarian, ferris state university, big rapids, mi. picture perfect: using photographic previews to enhance realia collections for library patrons and staff | rubel | https://doi.org/10.6017/ital.v36i2.9474 60 photographic previews could meet. most realia collections that warranted this extra effort are either curriculum materials or anatomical models, which is not surprising considering how difficult they are to describe. as butler and kvenild noted in their article on cataloging curriculum materials, “patrons struggled to identify which game or kit they sought based on the…information in the online catalog,” because “discovering curriculum materials in the catalog and getting a sense of the item are not easy when using traditional catalog descriptions...”3. 
as they continue, “the inventory and retrieval problems…were compounded by the fact that existing catalog records were not as descriptive as they should be.”4 this was also a problem for our collections because our names and descriptions were often not intuitive or precise. in addition, as loesch and deyrup discovered while cataloging their curriculum materials collection, “…there was great inconsistency among the oclc records regarding the labeling of the format…,”5 which was another issue we needed to address. although the general material designation (gmd) has since been rendered obsolete, flite continues to use it to highlight certain material. this choice is due to some limitations with our library management system as well as our discovery layer, namely the lack of good mapping or use of the 33x fields. until this is rectified with a more modern system, we have it found it easier to retain certain gmds like “sound recording”, “electronic resource”, and “realia”. thus, we needed to standardize our terms for each collection. another problem that our predecessors indicated photographic previews might resolve was missing objects or pieces of objects.6 this becomes especially important for our tabletop games collection because most of those pieces are very small and too numerous for a piece count upon return. fortunately, “previews…can aid users in making better decisions about potential relevance, and extract gist more accurately and rapidly than traditional hit lists provided by search engines.”7 ideally, a preview will display an appropriate level of information about the object it represents in order “…to support users in making a correct judgement about the relevance of that object to the user’s information need.”8 greene goes further by listing the main roles for previews of which the first two are the most applicable for photographic previews: aiding retrieval and aiding users in quickly making relevance decisions.9 for these uses, photographic previews of realia are ideal because users can examine the object without needing to see its details and they expect them to be abstract, not exhaustive, unlike digital surrogates that an archive would use.10 as greene also notes, the high-level goal of any preview is to "...communicate the level and scope of objects to users so that comprehension is maximized and disorientation is minimized."11 a common finding among all the previous projects was that even a single photograph provides more readily comprehensible information than several lines of description. as moeller states regarding their journal project, "they [previews of each issue's cover] give the researcher or student an immediate idea of the nature of the journal."12 he goes further to give the example of an innocuous journal title for a propagandist serial whose political nature is transparent once you view its imagery. from a staff perspective, photographic previews can also easily illustrate the number of information technology and libraries | june 2017 61 pieces and an object's condition or orientation. this can be very useful in determining whether something is missing or damaged without having to do a time-consuming individual piece count upon check-in. but as butler and kvenild discuss, layout within each photograph is key for illustrating missing pieces.13 unfortunately, aside from a few small projects mentioned in butler and kvenild's article, there are not many examples of photographic previews for realia collections currently being used by academic libraries. 
one reason might be software limitations. innovative's media management module is still unique among ils/lms software in that most vendors either provide a separate digital repository for special collections digital surrogates or they incorporate images into the catalog using third party software like syndetic solutionstm. another reason for the lack of photographic previews within catalogs may simply be the rarity of realia in academic libraries. every library certainly has a few unique pieces, like a skeleton for the pre-medical students, but often not enough to consider them an entire collection much less a complex enough collection to warrant the extra effort to create photographic previews of each item. at flite, we had already crossed that threshold of complexity. therefore, this article will start by discussing our educational manipulative collection, which provided the basis for how we would catalog and process the tabletop games and anatomical models. educational manipulative collection our first foray into creating photographic previews was completed by the previous cataloger with over 300 items cataloged in 2004 and another 30-40 added to the collection over the next decade. unlike the other realia collections, the educational manipulatives were cataloged using innovative’s course reserves module, so no attempt was made to find or create oclc records. nevertheless, the minimal metadata is very consistent across the collection, which supports greene’s recommendation “…that it was important to define a set of consistent attributes at the high level of the collection if any effective browsing across the collections was to be provided.”14 in our case, we rely on a combination of the gmd ([realia]), a custom call number prefix (toys box #), and a limited amount of local subject headings as shown below with “manipulatives” as the common subject for the entire collection. 690 = (d) current local subject headings in use as of 12/3/15: art. infant/toddler. block props. magnets. boards. manipulatives. cognitive. music. discovery box. oversize books. discovery. posters. dramatics. puppets. finger puppets. story apron. flannel board. story props. gross motor. woodworking. picture perfect: using photographic previews to enhance realia collections for library patrons and staff | rubel | https://doi.org/10.6017/ital.v36i2.9474 62 due to the nature of descriptive metadata, photographic previews of the educational manipulatives made logical sense because “the images…are not the content. they are the metadata, the description of the materials.”15 as moeller describes, innovative’s media management module links images and many other file types directly to bibliographic records without requiring users to click an additional link unless they want to view a larger image of a thumbnail.16 similar to butler and kvenild’s project, all of our photos were 900 pixels wide by 600 pixels tall, which is slightly smaller than their default width of 1000 pixels.17 one advantage of using the media management module is its ability to automatically create thumbnails 185 pixels wide by 85 pixels tall. a bigger advantage is that the images are hosted on the same server that runs our catalog, which allows us to freely distribute the images in an intuitive manner (thumbnails instead of links) without having to worry about authentication to a shared folder from off-campus, unlike our pdf files. 
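as a rough illustration of the image sizes discussed above, the following python sketch produces a 900 x 600 preview and a 185 x 85 thumbnail from a source photograph. the use of the pillow library and the file names are assumptions for the example; the article does not say what software produced the images, and media management generates its own thumbnails.

```python
from PIL import Image

PREVIEW_SIZE = (900, 600)    # preview size used for the manipulative photographs
THUMBNAIL_SIZE = (185, 85)   # thumbnail size generated by media management

def make_preview_and_thumbnail(source_path, preview_path, thumbnail_path):
    """resize one source photograph to the preview and thumbnail sizes."""
    with Image.open(source_path) as img:
        img.resize(PREVIEW_SIZE).save(preview_path)
        img.resize(THUMBNAIL_SIZE).save(thumbnail_path)

# hypothetical file names for one item in the collection
make_preview_and_thumbnail("toys_box_042.jpg",
                           "toys_box_042_preview.jpg",
                           "toys_box_042_thumb.jpg")
```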
unfortunately, our liaison to the school of education recently discovered some accessibility issues with media management that forced us to consider whether we should change the embedded photographic previews to external links. the most significant of these problems is simply the language of the proprietary viewer software. because it is written in java, if you click on a thumbnail for a larger image, many browsers, like chrome, will not run it and those that will often require a security exception to do so. we have attempted to ameliorate some of these issues by providing an faq entry on which browsers are best for viewing these images and how to add a security exception for our website, but unless or until innovative rewrites this software in a different language, these accessibility issues will persist because java is being phased out of many browsers. butler and kvenild also noted its slow response time compared to their own server.18 another issue they mentioned was that the thumbnails would not be visible in their consortial catalog, so they needed to add links in the 856 field for these users.19 this is less of an issue for us because we do not contribute any of our realia records to our consortia catalog, but moeller’s concern that in general “…enhancements involving scanned images…will not be easily shared with other libraries,”20 is entirely valid. unlike oclc records, there is no way to share attached or embedded images as part of the metadata and not the content. contrariwise, butler and kvenild’s concerns regarding catalog migration are very pertinent because we are considering moving to a new lms within the next few years.21 although we acknowledge that “utilizing 856 tags is an indirect method of accessing the images, as users must take the intiative to follow the links,” we will eventually have to move and link our photographic previews to ensure accessibility after migration.22 tabletop game collection unlike the educational manipulatives, the majority of the tabletop game collection was previously cataloged in oclc, so finding good bibliographic records was easy. once downloaded, we decided to add a unique gmd ([game]), custom call number prefix (board game box #), and local subject heading “tabletop games”. however, our emerging technlogies librarian who coordinated this information technology and libraries | june 2017 63 pilot project felt that the single subject heading was not descriptive enough. so he gave us a spreadsheet with more specific subject headings such as “deck building”, “historical”, and “resource management” that we added as genre/form subject headings in the 655_4 field. he also suggested that we add links to the rule books, which we did using the 856 field and the link text “connect to rule book (pdf)”. because tabletop games are commercial products, finding images online was also easy. at first, we had some concerns about copyright, but we are not reselling these products or using the image as a replacement for the item. so, we concurred with butler and kvenild that “…the images in our project fall under copyright fair use.”23 another plus to using commercial images is that we could use more than one to show various aspects of setup and play. the downside to this benefit is image sizes and content photographed varied widely, so we used our best judgement in creating labels and tried to keep them as consistent as possible. 
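the record enhancements described for the games can be shown with a small sketch. the python fragment below writes the genre/form headings (655, second indicator 4) and the rule book link (856) as plain marc-style text strings; the helper function, the sample data, the 856 indicators, and the choice of the $z subfield for the link text are assumptions made for illustration.

```python
def game_marc_fields(genres, rulebook_url, link_text="connect to rule book (pdf)"):
    """build 655 genre/form headings and an 856 rule-book link as text strings."""
    fields = []
    for genre in genres:
        # 655 with second indicator 4 (source not specified), as described above
        fields.append(f"655 _4 $a {genre}")
    # 856 first indicator 4 = http; second indicator and $z for link text are assumptions
    fields.append(f"856 4_ $u {rulebook_url} $z {link_text}")
    return fields

for field in game_marc_fields(["deck building", "resource management"],
                              "https://example.edu/rulebooks/sample-game.pdf"):
    print(field)
```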
to ensure consistency across the collection, we decided that the first image should always be the top of the game’s box labeled “box cover” or “box cover – front” if there was a “box cover – back” image. (we only displayed the back of the box cover if there was significant information about the game printed on it.) then we added up to five additional images showing parts of the game like “card examples”, “game pieces”, and “game set-up”. overall, this number of images worked very well in both encore’s attached media viewer and the classic catalog/web opac, but there is a slight duplication in images by syndetic solutionstm for a few games. this results in a larger version of the box top image displaying to the right of the title and above the smaller thumbnails of images we added using media management. in regards to piece counts, we presumed that we would need photographic previews to aid in piece counting upon return of a tabletop game. however, our emerging technologies librarian assured us that because we are an educational institution, we could contact the vendor for free replacement pieces at any time. he also emphasized that unlike the educational manipulatives or the anatomical models, this was a pilot collection, so extensive processing would not be a good investment of our labor. fortunately, the anatomical model collection would require images for piece counts as well as several other cataloging customizations to increase discoverability and speed circulation. anatomical model collection similar to our educational manipulative collection, but not nearly as extensive, our anatomical model collection has been a part of flite since its inception. unlike the manipulatives, which are used primarily by the early childhood education students, the anatomical models support a range of allied health programs including but not limited to dental hygiene, radiology, and nursing. the majority of our two dozen models were purchased in the 20th century and, like the manipulatives, the majority were cataloged using innovative’s course reserves module. unfortunately, none of these records were very descriptive, some being so poor as to be merely a title like “jawbones” and a barcode. so, the first task was to match objects with oclc records. fortunately, this task picture perfect: using photographic previews to enhance realia collections for library patrons and staff | rubel | https://doi.org/10.6017/ital.v36i2.9474 64 became easier once we discovered that it was easier to match the object to the vendor’s catalog image and then search oclc by vendor model name or number than it is to decipher written descriptions if you do not know human anatomy. once good bibliographic records were downloaded, we decided to add one of three gmds depending on the type of model ([model], [chart], or [flash card]), a custom call number prefix (model #), and one or more of the local subject headings shown below. 690 = (d) anatomy model. anatomy chart. anatomy models. anatomy charts. dental hygiene model. dental model. dental hygiene models. dental models. technically, all dental models could be used as anatomical models, but not vice versa. therefore, the common subject headings for the collection are “anatomy model” and “anatomy models”. to make things easier to shelve, retrieve, and inventory, we also designed numeric ranges for the call numbers, as shown below, so we would know what type of model we should expect when referring to a specific model number. 
099 = (c) model #00x, following this hierarchy:
001-099 anatomical charts and flash cards
100-199 articulated skeletons
200-299 disarticulated skeletons and bone kits
300-399 organs
400-499 skulls (anatomical and dental hygiene)
500-599 other dental models (dental studies, dental decks)
we also scanned and linked pdfs of the heavily worn model keys with the link text "connect to key pdf" before washing and rehousing all the models. once they were clean, they were ready for their shoot with ferris state university's media production team. due to winter break, media production was able to shoot the majority of the collection fairly quickly. they returned to us high-resolution tiffs the same size as those for the manipulatives, 900 pixels by 600 pixels. in case of java viewer failure, we requested that there be one top-level image that showcases exactly what the model contains, with images of individual pieces or drawers as the succeeding images. for example, our disarticulated skeletons are housed in small plastic carts with three drawers in each cart. therefore, the first image would be a shot of all the pieces of the disarticulated skeleton and the second image would be the contents of the top drawer, the third image the contents of the middle drawer, and the last image the contents of the bottom drawer. in this specific example, we re-used the images that we posted in the catalog record by pasting them on top of the cart to show circulation staff what to expect in each drawer upon check-in. overall, photographic previews for this collection appear to be working very well for both catalog users and circulation staff "…to inform users about size, extent, and availability of collections or objects."24 in fact, they have been working so well for this collection that usage has increased exponentially compared to previous years.

figure 1. circulation statistics 2014-2016 (manipulatives: 367, 317, and 114 check-outs in 2014, 2015, and 2016; models: 10, 1, and 444; games: 24 in 2016)

conclusions and future directions
although we implemented photographic previews for three realia collections, we could not define any standard workflow for the process beyond correcting or downloading the metadata first and adding the images second. part of this is due to our working primarily with legacy collections, because we often discovered issues, like the model keys, while working through another issue. the other part is due to the nuances involved in processing realia in general. even with good, readily available catalog records like those for the tabletop games, time still had to be spent separating, organizing, and rehousing game pieces as well as hunting down useful images. unfortunately, any type of realia processing, even if it is just textual description, is much more time-consuming than the majority of academic library cataloging. adding in the extra steps to create, upload, and link a photographic preview can nearly double that labor investment. notwithstanding, as butler and kvenild advocate, "…not supplying images as metadata for items that most need them (i.e. kits, games, and models) is to make them nearly irretrievable.
providing bare-bones traditional metadata for these items is analogous to delegating them to the backlog shelves of yesteryear."25

unfortunately, neither the library management system nor the third-party catalog enhancement market currently provides a good solution to this problem. considering how great an impact photographic previews have had in the online retail market, this lack of technical support is surprising. yes, syndetic solutionstm is a great product for cover images and tables of contents for books. however, once you go beyond traditional resources, there is a great need to allow institutions to submit their own images as part of catalog record enhancement and not to serve as separate digital surrogates in a digital repository. this could be done either within the library management system, like the media management module, or as an option for catalog enhancement where libraries could add images to either a shared database or their own database using standard identifiers on a third-party platform like syndeticstm. further research on photographic previews is also sorely needed. as of this writing, we only have a handful of case studies and some guiding philosophy on the use of previews. consultation with internet retailers and literature on online marketing might be more applicable than library science research to evaluate their impact, but research into their direct impact vs. textual descriptions on catalog use would be ideal.

references
1. fact book 2015 – 2016 (big rapids, mi: ferris state university institutional research & testing, 2016), http://www.ferris.edu/htmls/admision/testing/factbook/factbook15-162.pdf, 47. 2. ibid, 12. 3. marcia butler and cassandra kvenild, "enhancing catalog records with photographs for a curriculum materials center," technical services quarterly 31 (2014): 122-138, https://doi.org/10.1080/07317131.2014.875377, 122-124. 4. ibid, 126. 5. martha fallahay loesch and marta mestrovic deyrup, "cataloging the curriculum library: new procedures for non-traditional formats," cataloging & classification quarterly 34, no. 4 (2002): 79-89, https://doi.org/10.1300/j104v34n04_08, 82. 6. butler and kvenild, "enhancing catalog records with photographs," 128. 7. stephan greene, gary marchionini, catherine plaisant, and ben shneiderman, "previews and overviews in digital libraries: designing surrogates to support visual information seeking," journal of the american society for information science 51, no. 4 (2000): 380-393, https://doi.org/10.1002/(sici)1097-4571(2000)51:4<380::aid-asi7>3.0.co;2-5, 381. 8. ibid. 9. ibid, 384. 10. ibid, 385. 11. ibid. 12. paul moeller, "enhancing access to rare journals: cover images and contents in the online catalog," serials review 33, no. 4 (2007): 231-237, https://doi.org/10.1016/j.serrev.2007.09.003, 235. 13. butler and kvenild, "enhancing catalog records with photographs," 128. 14. greene et al., "previews and overviews in digital libraries," 388. 15. butler and kvenild, "enhancing catalog records with photographs," 124. 16. moeller, "enhancing access to rare journals," 234. 17. butler and kvenild, "enhancing catalog records with photographs," 129. 18. ibid, 132. 19. ibid, 126. 20.
moeller, "enhancing access to rare journals," 237.
21. butler and kvenild, "enhancing catalog records with photographs," 131.
22. ibid, 135.
23. ibid, 134.
24. greene et al., "previews and overviews in digital libraries," 386.
25. butler and kvenild, "enhancing catalog records with photographs," 136.

techniques for special processing of data within bibliographic text

paula goossens: royal library albert i, brussels, belgium.

an analysis of the codification practices of bibliographic descriptions reveals a multiplicity of ways to solve the problem of the special processing of certain characters within a bibliographic element. to obtain a clear insight into this subject, a review of the techniques used in different systems is given. the basic principles of each technique are stated, examples are given, and advantages and disadvantages are weighed. simple local applications as well as more ambitious shared cataloging projects are considered.

introduction

effective library automation should be based on a one-time manual input of the bibliographic descriptions, with multiple output functions. these objectives may be met by introducing a logical coding technique. the higher the requirements of the output, the more sophisticated the storage coding has to be. in most cases a simple identification of the bibliographic elements is not sufficient. the requirement of a minimum of flexibility in filing and printing operations necessitates the ability to locate certain groups of characters within these elements. it is our aim, in this article, to give a review of the techniques solving this last problem. as an introduction, the basic bibliographic element coding methods are roughly schematized in the first section. according to the precision in the element identification, a distinction is made between two groups, called respectively field level and subfield level systems. the second section contains discussions on the techniques for special processing of data within bibliographic text. three basic groups are treated: the duplication method, the internal coding techniques, and the automatic handling techniques. the different studies are illustrated with examples of existing systems. for the field level projects we confined ourselves to some important german and belgian applications. in the choice of the subfield level systems, which are marc ii based, we tried to be more complete. most of the cited applications, for practical reasons, only concern the treatment of monographs. this cannot be seen as a limitation because the methods discussed are very general by nature and may be used for other material. each system which has recourse to different special processing techniques is discussed in terms of each of these techniques, enabling one to get a realistic overview of the problem. in the last section, a table of the systems versus the techniques used is given. the material studied in this paper provided us with the necessary background for building an internal coding technique in our internal processing format.

bibliographic element codification methods

field level systems

the most rudimentary projects of catalog automation are limited to a coarse division of the bibliographic description into broad fields. these are marked by special supplied codes and cover the basic elements of author, title, imprint, collation, etc.
in some of the field level systems, a bibliographic element may be further differentiated according to a more specific content designation, or according to a function identification. for instance, the author element can be split up into personal name and corporate name, or a distinction can be made between a main entry, an added entry, a reference, etc. this approach supports only the treatment of each identified bibliographic element as a whole for all necessary processing operations, filing and printing included. this explains why, in certain applications, some of the bibliographic elements are duplicated, under a variant form, according to the subsequent treatments reflected in the output functions. details on this will be discussed later. here we only mention as an example the deutsche bibliographie and the project developed at the university of bochum.1-4 it is evident that these procedures are limited in their possibilities and are not economical if applied to very voluminous bibliographic files. for this reason, at the same time, more sophisticated systems, using internal coding techniques, came into existence. these allow one to perform separate operations within a bibliographic element, based on a special indication of certain character strings within the text. as there is an overlap in the types of internal coding techniques used in the field level systems and in the subfield level systems, this problem will later be studied as a whole. we limit ourselves to citing some projects falling under this heading. as german applications we have the deutsche bibliographie and the bikas system.5 in belgium the programs of the quetelet fonds may be mentioned.6,7

subfield level systems

in a subfield level system the basic bibliographic elements, separated into fields, are further subdivided into smaller logical units called subfields. for instance, a personal name is broken into a surname, a forename, a numeration, a title, etc. such a working method provides access to smaller logical units and will greatly facilitate the functions of extraction, suppression, and transposition. thus, more flexibility in the processing of the bibliographic records is obtained. as is well known, the library of congress accomplished the pioneering work in developing the marc ii format: the communications format and the internal processing format.8-11 these will be called marc lc and a distinction between the two will only be made if necessary. the marc lc project originated in the context of a shared cataloging program and immediately served as a model in different national bibliographies and in public and university libraries. in this paper we will discuss bnb marc of the british national bibliography, the nypl automated bibliographic system of the new york public library, monocle of the library of the university of grenoble, canadian marc, and fbr (forma bibliothecae regiae), the internal processing format of the royal library of belgium.12-21 in order to further optimize the coding of a bibliographic description, the library of congress also provided for each field two special codes, called indicators. the function of these indicators differs from field to field. for example, in a personal name one of the indicators describes the type of name, to wit: forename, single surname, multiple surname, and name of family. some of the indicators may act as an internal code.
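to make the field and subfield structure concrete, the following minimal sketch represents a subfield-level field in python; the tags and subfield codes echo the marc convention just described, but the class layout, function names, and sample values are illustrative assumptions rather than any cited system's actual record format.

# a minimal sketch of a subfield-level field: a tag, two one-character
# indicators, and a list of (subfield code, value) pairs. the sample data
# and helper function are illustrative only.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Field:
    tag: str                                   # e.g. "100" personal name, "245" title
    indicators: Tuple[str, str] = (" ", " ")   # e.g. name type, or a filing indicator
    subfields: List[Tuple[str, str]] = field(default_factory=list)

# a personal name split into surname and date subfields; the first indicator
# "1" stands here for "single surname" (cf. the name-type indicator above).
author = Field("100", ("1", " "), [("a", "martin du gard"), ("d", "1881-1958")])
title = Field("245", (" ", " "), [("a", "l'automation des bibliotheques")])

def subfield(f: Field, code: str) -> str:
    """return the first subfield with the given code, or an empty string."""
    return next((v for c, v in f.subfields if c == code), "")

print(subfield(author, "a"))   # -> martin du gard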
in spite of the well-considered structuring of the bibliographic data in the subfield level systems, not all library objectives may yet be satisfied. to reduce the remaining limitations, some approaches similar to those elaborated in field level systems are supplied. some (nypl, marc lc internal format, and canadian marc) have, or will have, in a very limited way, recourse to a procedure of duplication of subfields or fields. all cited systems, except nypl, use to a greater or lesser degree internal coding techniques. finally some subfield level systems automatically solve certain filing problems by computer algorithms. this option was taken by nypl, marc lc, and bnb marc. each of these methods will be discussed in detail in the next section.

techniques for special processing of data

methods for special treatment of words or characters within bibliographic text were for the most part introduced to support exact file arrangement procedures and printing operations. in order to give concrete form to the following explanation, we will illustrate some complex cases. each example contains the printing form and the filing form according to specific cataloging practices for some bibliographic elements. consider the titles in examples 1, 2, and 3, and the surnames in examples 4, 5, and 6.

example 1: printing form "l'automation des bibliotheques"; filing form "automation bibliotheques"
example 2: printing form "bulletino della r. accademia medica di roma"; filing form "bolletino accademia medica roma"
example 3: printing form "ibm 360 assembler language"; filing form "i b m three hundred sixty assembler language"
example 4: printing form "mc kelvy"; filing form "mackelvy"
example 5: printing form "van de castele"; filing form "vandecastele"
example 6: printing form "martin du gard"; filing form "martin dugard"

we do not intend, in this paper, to review the well-known basic rules for building a sort key (the translation of lowercase characters to uppercase, the completion of numerics, etc.). our attention is directed to the character strings that file differently than they are spelled in the printing form. the methods developed to meet these problems are of a very different nature. for reasons of space, not all the examples will be reconsidered in every case; only those most meaningful for the specific application will be chosen.

duplication methods

we briefly repeat that this method consists of the duplication of certain bibliographic elements in variant forms, each of them exactly corresponding to a certain type of treatment. in bochum, the title data are handled in this way. one field, called "sachtitel," contains the filing form of the title followed by the year of edition. another field, named "titelbeschreibung," includes the printing form of the title and the other elements necessary for the identification of a work (statements of authorship, edition statement, imprint, series statement, etc.). to apply this procedure to examples 1, 2, and 3, the different forms of each title respectively have to be stored in a printing field and in a sorting field. analogous procedures are, in a more limited way, employed in the deutsche bibliographie. for instance, in addition to the imprint, the name of the publisher is stored in a separate field to facilitate the creation of publisher indexes. the technique of the duplication of bibliographic elements has also been considered in subfield level systems. the nypl format furnishes a filing subfield in those fields needed for the creation of the sort key. this special subfield is generally created by program, although in exceptional cases manual input may be necessary.
in the filing subfield the text is preceded by a special character indicating whether or not the subfield has been introduced manually. marc lc (internal format) and canadian marc opt for a more flexible approach in which the filing information is specified with the same precision as the other information. the sorting data are stored in complete fields containing, among others, the same subfields as the corresponding original field. because in most subfield level systems the number of different fields is much higher than in field level systems, the duplication method becomes more intricate. provision of a separately coded field for each normal field which may need filing information is excluded. only one filing field is supplied, which is repeatable and stored after the other fields. in order to link the sorting fields with the original fields, specific procedures have been devised. marc lc, for instance, reserves one byte per field, the sorting field code, to announce the presence or the absence of a related sorting field. the link between the fields themselves is placed in a special subfield of the filing field.22 in the supposition that examples 3 and 4 originate from the same bibliographical description, this method may be illustrated schematically as follows:

tag   sorting field code   sequence number   data
100   x                    1                 $a$mc kelvy
245   x                    1                 $a$ibm 360 assembler language
880                        1                 $ja$1001$mackelvy
880                        2                 $ja$2451$i b m three hundred sixty assembler language

as is well known, the personal author and title fields are coded respectively as tag 100 and tag 245. tag 880 defines a filing field. in the second column, the letter x identifies the presence of a related sorting field. the third column contains a tag sequence number needed for the unequivocal identification of a field. in the last column the sign $ is a delimiter. the first $ is followed by the different subfield codes. the other delimiters initiate the subsequent subfields. in tag 100 and 245, the first subfields contain the surname and the short title respectively. in tag 880 the first subfield gives the identification number of the related original field. the further subfield subdivision is exactly the same as in the original fields. in canadian marc a slightly different approach has been worked out. note that in neither of the last two projects has this technique been implemented yet. for an evaluation of the duplication method different means of application must be considered. if not systematically used for several bibliographic elements, the method is very easy at input. the cataloger can fill in the data exactly as they are; no special codes must be imbedded in the text. but it is easy to understand that a more frequent need of duplicated data renders the cataloging work very cumbersome. in regard to information processing, this method consumes much storage space. first, a certain percentage of the data is repeated; second, in the most complete approach of the subfield level systems, space is needed for identifying and linking information. for instance, in marc lc, one byte per field is provided containing the sorting field code, even if no filing information at all is present. finally, programming efforts are also burdened by the need for special linking procedures. in order to minimize the use of the duplication technique, the cited systems reduce their application in different ways. bochum simplified its cataloging rules in order to limit its use to title information.
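the linked sorting-field arrangement illustrated in the schematic above can be sketched roughly as follows; the dictionary layout, field names, and look-up function are hypothetical stand-ins for the byte-level format, intended only to show how a filing form is located through the (tag, sequence number) link.

# sketch of the linked sorting-field arrangement: filing forms sit in
# repeatable 880 fields and are tied to their original field by
# (tag, sequence number). the record below is illustrative only.
record = {
    "fields": [
        {"tag": "100", "seq": 1, "has_sort": True, "data": "mc kelvy"},
        {"tag": "245", "seq": 1, "has_sort": True, "data": "ibm 360 assembler language"},
        {"tag": "880", "seq": 1, "link": ("100", 1), "data": "mackelvy"},
        {"tag": "880", "seq": 2, "link": ("245", 1), "data": "i b m three hundred sixty assembler language"},
    ]
}

def filing_form(record, tag, seq):
    """return the filing form for (tag, seq): the linked 880 field if one
    exists, otherwise the original field's own data."""
    for f in record["fields"]:
        if f["tag"] == "880" and f.get("link") == (tag, seq):
            return f["data"]
    for f in record["fields"]:
        if f["tag"] == tag and f["seq"] == seq:
            return f["data"]
    return ""

print(filing_form(record, "100", 1))  # -> mackelvy
print(filing_form(record, "245", 1))  # -> i b m three hundred sixty assembler language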
as will be explained further, the deutsche bibliographie also has recourse to internal coding techniques. nypl, marc lc, and canadian marc only call on it if other more efficient methods (see later) fail. they also make an attempt to adapt existing cataloging practices to an unmodified machine handling of nonduplicated and minimally coded data.

internal coding techniques

separators

separators are special codes introduced within the text, identifying the characters to be treated in a special way. a distinction can be made among four procedures.

1. simple separators. with this method, each special action to be performed on a limited character string is indicated by a group of two identical separators, each represented as a single special sign. illustration on examples 2, 3, 4, and 6 gives:

example 2: £bolletino £¢bulletino della r. ¢accademia medica ¢di ¢roma
example 3: £i b m three hundred sixty £¢ibm 360 ¢assembler language
example 4: m£a£c¢ ¢kelvy
example 6: martin du¢ ¢gard

the characters enclosed between each group of two corresponding codes £ must be omitted for printing operations. in the same way the characters enclosed between two corresponding codes ¢ are to be ignored in the process of filing. in the case that only the starting position of a special action has to be indicated, one separator is sufficient. for instance, if in example 1 we limit ourselves to coding the first character to be taken into account for filing operations, we have:

example 1: l' /automation des bibliotheques

where a slash is used as the sorting instruction code. the simple separator method has tempting positive aspects. occupying a minimum of storage space (maximum two bytes for each instruction), the technique gives a large range of processing possibilities. indeed, excluding the limitation on the number of special signs available as separators, no other restrictions are imposed. this argument will be rated at its true worth only after evaluation of the multiple function separators method and of the indicator techniques. the major disadvantage of the simple separator method lies in its slowness of exploitation. in fact, for every treatment to be performed, each data element which may contain special codes has to be scanned, character by character, to localize the separators within the text and to enable the execution of the appropriate instructions. for example, in the case of a printing operation, the program has to identify the parts of the text to be considered and to remove all separators. the sluggishness of execution was for some, as for canadian marc, a reason to disapprove this method.23 as already mentioned, another handicap with cataloging applications is the loss of a number of characters caused by their use as special codes. it is self-evident that each character needed as a separator cannot be used as an ordinary character in the text. for bochum this was a motive to reject this method. many of the field level systems with internal codes have recourse to simple separators. we mention the deutsche bibliographie, in which some separators indicate the keywords serving for automatic creation of indexes and others give the necessary commands for font changes in photocomposition applications. in order to reduce the number of special signs, the deutsche bibliographie also duplicates certain bibliographic data. bikas uses simple separators for filing purposes.
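as a rough illustration of the simple separator technique described above, the sketch below derives the printing and filing forms of example 4 by stripping the spans enclosed by the £ and ¢ codes; the function names are assumptions, not part of any cited system.

# sketch of the simple-separator technique: text between a pair of '£' codes
# is dropped for printing, text between a pair of '¢' codes is dropped for
# filing, and all separator signs are removed from the output.
def strip_spans(text: str, sep: str) -> str:
    """remove every span enclosed by a pair of `sep` codes (and the codes)."""
    out, skip = [], False
    for ch in text:
        if ch == sep:
            skip = not skip          # toggle at each separator of this kind
        elif not skip:
            out.append(ch)
    return "".join(out)

def printing_form(coded: str) -> str:
    return strip_spans(coded, "£").replace("¢", "")

def filing_form(coded: str) -> str:
    return strip_spans(coded, "¢").replace("£", "")

coded = "m£a£c¢ ¢kelvy"              # example 4 as coded above
print(printing_form(coded))          # -> mc kelvy
print(filing_form(coded))            # -> mackelvy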
the technique is also employed in subfield level systems. in monocle each title field contains a slash, indicating the first character to be taken into account for filing.

2. multiple function separators. designed by the british, the technique of the multiple function separators was adopted in monocle. the basic idea consists of the use of one separator characteristic for instructing multiple actions. in the case of monocle these actions are printing only, filing only, and both printing and filing. in order to give concrete form to this method we apply it to examples 3, 4, and 6, using a vertical bar as special code.

example 3: |ibm 360 |i b m three hundred sixty |assembler language
example 4: m|c |ac|kelvy
example 6: martin du| ||gard

the so-called three-bar filing system divides a data element into the following parts: data to be filed and printed | data to be printed only | data to be filed only | data to be filed and printed. in comparison with the simple separator technique, this method has the advantage of needing fewer special characters. a gain of storage space cannot be assumed directly. as is the case in example 6, if only one special instruction is needed, the set of three separators must still be used. on the other hand, one must note that a repetition of identical groups of multiple function separators within one data element must be avoided. subsequent use of these codes leads to very unclear representations of the text and may cause faulty data storage. this can well be proved if the necessary groups of three bars are inserted in examples 1 and 2. of the studied systems, monocle is the only one to use this method.

3. separators with indicators. as mentioned in the description of subfield level systems, two indicators are added for each field present. in order to speed up the processing time in separator applications, indicators may be exploited. in monocle the presence or the absence of three bars in a subfield is signalled by an indicator at the beginning of the corresponding field. this avoids the systematic search for separators within all the subfields that may contain special codes. the number of indicators being limited, it is self-evident that in certain fields they may already be used for other purposes. as a result, some of the separators will be identified at the beginning of the field and others not. this leads to a certain heterogeneity in the general system concept which complicates the programming efforts. under this heading, we have mentioned the use of indicators only in connection with multiple function separators. note that this procedure could be applied as well in simple separator methods. nevertheless, none of the subfield level systems performs in this fashion because it is not necessary for the particular applications. this method is not followed in the field level systems as no indicators are provided.

4. compound separators. a means of avoiding the second disadvantage of the simple separator technique is to represent each separator by a two-character code: the first one, a delimiter, identifies the presence of the separator and is common to each of them; the second one, a normal character, identifies the separator's characteristic. taking the sign £ as delimiter and indicating the functions of nonprinting and nonfiling respectively by the characters a and b, examples 2 and 4 give in this case:

example 2: £abolletino £a£bbulletino della r.
£baccademia medica £bdi £broma
example 4: m£aa£ac£b £bkelvy

thus the number of reserved special characters is reduced to one, independent of the number of different types of separators needed. in none of the considered projects is this technique used, probably because of the amount of storage space wasted.

indicators

as the concept of adding indicators in a bibliographic record format is an innovation of marc lc, the methods described under this heading concern only subfield level systems. although at the moment of the creation of marc lc one did not anticipate the systematic use of indicators for filing, its adherents made good use of them for this purpose.

1. personal name type indicator. as mentioned earlier, in marc lc one of the indicators, in the field of a personal name, provides information on the name type. this enables one to realize special file arrangements. for example, in the case of homonyms, the names consisting only of a forename can be filed before identical surnames. using the same indicator, an exact sort sequence can be obtained for single surnames, including prefixes. knowing that the printing form of example 5 is a single surname, the program for building the sort key can ignore the two spaces. the systems derived from marc lc developed analogous indicator codifications adapted to their own requirements. this seems to be an elegant method for solving particular filing problems in personal names. nevertheless, its possibilities are not large enough to give full satisfaction. for instance, example 6 gives a multiple surname with a prefix in the second part of the name. the statement of multiple surname in the indicator does not give enough information to create the exact sort form. because of this shortcoming, monocle had recourse to the technique called "separators with indicators."

2. indicators identifying the beginning of filing text. bnb marc reserves one indicator in the title field for identification of the first character of the title to be considered for filing. this indicator is a digit between zero and nine, giving the number of characters to be skipped at the beginning of the text. applying this technique to example 1, the corresponding filing indicator must have the value three. without having recourse to other working methods, this title sorts as:

example 1: automation des bibliotheques

notice that the article des still remains in the filing form. this procedure has the advantage of being very economical in storage space and in processing time. moreover the text is not cluttered with extraneous characters. on the other hand we must disapprove of the limitation of this technique to the indication of nonfiling words at the beginning of a field. the possibility of identifying certain character strings within the text is not provided for. taking examples 2 and 3 we observe that the stated conditions cannot be fulfilled. another negative side is the number of characters to be ignored, which may not exceed nine. also, one indicator must be available for this filing indication. after bnb marc, marc lc and canadian marc also introduced this technique.

3. separators with indicators. the use of indicators in combination with separators has been treated above.
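before turning to the next technique, here is a minimal sketch of the filing indicator described under point 2 above: a single digit giving the number of leading characters to skip when the sort key is built. the stored title is assumed to be "l' automation des bibliotheques" so that a count of three skips the article and the following space, as in the text; the function name is illustrative.

# sketch of the "beginning of filing text" indicator: a single digit giving
# the number of leading characters to skip when the sort key is built.
def sort_key(title: str, nonfiling: int) -> str:
    """build a crude sort key by skipping the first `nonfiling` characters."""
    if not 0 <= nonfiling <= 9:        # the indicator is one digit, 0 to 9
        raise ValueError("nonfiling count must fit in a single digit")
    return title[nonfiling:].upper()

# example 1 with an assumed stored form "l' automation ..." and a count of 3
print(sort_key("l' automation des bibliotheques", 3))
# -> AUTOMATION DES BIBLIOTHEQUES  (the article "des" still files, as noted above)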
pointers

a final internal coding technique which seems worth studying is the one developed at the royal library of belgium for the creation of the catalogs of the library of the quetelet fonds, a field level system. the pointer technique is rather intricate at input but has many advantages at output. because there is inadequate documentation of this working method, we will try to give an insight into it by schematizing the procedures to be followed to create the final storage structure. at input, the cataloger inserts the necessary internal codes as simple separators within the text. these codes are extracted by program from the text and placed before it, at the beginning of each field. each separator, now called pointer characteristic, is supplemented with the absolute beginning address and the length of its action area within the text. in the quetelet fonds the pointer characteristic is represented by one character, the address and length occupy two bytes each. the complete set of pointers (pointer characteristics, lengths, and addresses) is named the pointer field. this field is incorporated in a sort of directory, starting with the sign "&" identifying the beginning of the field, followed by the length of the directory, the length of the text, and the pointer field itself. this is illustrated in figure 1. note that each field contains the first five bytes, even if no pointers are present. in the quetelet fonds, pointers are used for the following purposes: nonfiling, nonprinting, kwic index, indication of a corporate name in the title of a periodical, etc. examples 2, 3, and 4 should be stored in this system as represented in figure 2.

fig. 1. structure of directory with pointer technique: representation of the structure of a field in the internal processing format of the quetelet fonds system. the codes respectively represent: &: field delimiter; ld: length of directory; lt: length of text; x, y, ...: pointer characteristics; ax, ay, ...: addresses of the beginning of the related action area inside the text; lx, ly, ...: length of these action areas.

the advantages of the pointer technique are numerous. first, we must mention the relative rapidity of the processing of the records. in fact, in order to detect a specific pointer, only the directory has to be consulted. all subsequent instructions can be executed immediately. in contrast with most of the other methods discussed, there is no objection to using pointers for all internal coding purposes needed. this enables one to pursue homogeneity in the storage format, facilitating the development of programs. further, the physical separation of the internal codes and the text allows, in most cases, a direct clean text representation without any reformatting. finally, unpredictable expansions of internal coding processes can easily be added without adaptation of the existing software. a great disadvantage of the pointer technique lies in the creation of the directory. the storage space occupied by the pointers is also great in comparison with the place occupied by internal codes in other methods. a further handicap is the limitation imposed at input due to the use of simple separators.
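a rough sketch of the pointer technique follows: separators supplied at input are stripped from the text and converted into pointers (characteristic, starting address, length) held apart from the clean text. the python representation below compresses the byte-level directory of figures 1 and 2 into a tuple and is an assumption for illustration only.

# sketch of the pointer technique: simple separators supplied at input are
# stripped from the text and turned into pointers (characteristic, start,
# length) kept in a small directory ahead of the clean text.
def build_pointer_field(coded: str, characteristics: str = "£¢"):
    """return (clean_text, pointers); each pointer is (char, start, length).
    the separator sign itself is used as the pointer characteristic."""
    clean, pointers, open_at = [], [], {}
    for ch in coded:
        if ch in characteristics:
            if ch in open_at:                     # closing separator
                start = open_at.pop(ch)
                pointers.append((ch, start, len(clean) - start))
            else:                                 # opening separator
                open_at[ch] = len(clean)
        else:
            clean.append(ch)
    return "".join(clean), pointers

text, ptrs = build_pointer_field("m£a£c¢ ¢kelvy")
print(text)   # -> mac kelvy   (clean text, no internal codes)
print(ptrs)   # -> [('£', 1, 1), ('¢', 3, 1)]  nonprinting 'a', nonfiling ' '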
fig. 2. pointer technique as applied to bibliographic data: representation of examples 2, 3, and 4 in the quetelet fonds format. a represents the pointer characteristic for nonprinting data; b is the pointer characteristic for nonfiling data.

in spite of these negative arguments, we see a great interest in this method, and wish to give some suggestions in order to relieve or to eliminate some of them. initially we must realize that the creation of a record takes place only once, while the applications are innumerable. the possibility of automatically adding some of the codes may also be considered. data needing special treatment expressed in a consistent set of logical rules can be coded by program. only exceptions have to be treated manually. in considering the space occupied by the directory, some profit could be imagined by trying to reduce the storage space occupied by the addresses and the lengths. there is also a solution to be found in not having systematically to provide pointer field information. one must realize that only a small percentage of the fields may contain such codes. finally, the restrictions at input may be removed by using compound separators. such a change does not have any repercussion on the directory. as far as we know, the pointer technique has not been used in a subfield level system. at our library an internal processing format of the subfield level type, called fbr, is under development, in which a pointer technique based on the foregoing is incorporated.

automatic handling techniques

in order to give a complete review of the methods of handling data within bibliographic text, we must also treat the methods in which both the identification and the special treatment of these data are done during the execution of the output programs. the working method can easily be demonstrated with example 1. only the printing form must be recorded. the program for building the sort key processes a look-up table of nonfiling words including the articles l' and des. the program checks every word of the printing form for a match with one of the words of the nonfiling list. the sort key is built up with all the words which are not present in this table. to treat example 4, an analogous procedure can be worked through. an equivalence list of words for which the filing form differs from the printing form is needed. if, during the construction of the sort key, a match is found with a word in the equivalence list, the correct filing form, stored in this list, is placed in the sort key. the other words are taken in their printing form. in our case, using the equivalence list, mc should be replaced by mac. in order to speed up the look-up procedures, different methods of organization of the look-up tables can be devised.
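a minimal sketch of the automatic handling approach just described: the sort key is built at output time from the printing form alone, using a look-up table of nonfiling words and an equivalence list. both word lists below are tiny illustrative assumptions, not the tables of any of the cited systems.

# sketch of automatic handling: the sort key is derived from the printing
# form alone, using a nonfiling stop list and an equivalence list of words
# whose filing form differs from their printing form.
NONFILING = {"l'", "des", "the", "der", "die"}       # illustrative stop list
EQUIVALENCE = {"mc": "mac", "me": "mac"}             # illustrative equivalences

def sort_key(printing_form: str) -> str:
    words = []
    for word in printing_form.lower().split():
        if word in NONFILING:
            continue                        # dropped from the sort key
        words.append(EQUIVALENCE.get(word, word))
    return " ".join(words).upper()

print(sort_key("mc kelvy"))                         # -> MAC KELVY
print(sort_key("l' automation des bibliotheques"))  # -> AUTOMATION BIBLIOTHEQUES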
other types of automatic processing techniques can be illustrated by the special filing algorithms constructed for a correct sort of dates. for instance, in order to be able to sort b.c. and a.d. dates in chronological order, the year 0 is replaced by the year 5000; b.c. and a.d. dates are respectively subtracted from or added to this number. thus dates back to 5000 b.c. can be correctly treated. this technique, introduced by nypl, is also used at lc. the advantages of automatic handling techniques are many. no special arrangements must be made at input. only the bibliographic elements must be introduced under the printing form and no special codes have to be added. there is no storage space wasted for storing internal codes. as negative aspects we ascertain that not all cataloging rules may be expressed in rigid systematic process steps. examples 2 and 3 illustrate this point. one must also recognize that the special automatic handling programs must be executed repeatedly when a sort key is built up, increasing the processing time. this procedure may give some help for filing purposes, but we can hardly imagine that it really may solve all internal coding problems. think of the instructions to be given for the choice of character type while working with a typesetting machine. the automatic handling technique is very extensively applied in the nypl programs, marc lc has recourse to it for treating dates, and bnb marc for personal names.24 none of the field level systems considered here uses this method.

summary and conclusions

table 1 presents, for the discussed systems, a summary of the methods used for treating data in a bibliographic text. the duplication and indicator techniques have the most adherents. however, we must keep in mind that in most of the systems the duplication of data only represents an extreme solution. on the other hand, indicators are very limited in their possibilities. as far as the flexibility and application possibilities are concerned, the simple separators and the pointers present the most interesting prospects. automatic handling techniques may produce good results for use in well-defined fields or subfields. from the evaluations given for the different methods, we conclude that for a special application the choice of a method depends greatly on the objectives, namely the sort of special processing facilities needed, the volume of data to be treated, and the frequency of execution.

table 1. review of the techniques for special processing of data within bibliographic text used or planned in the discussed systems

deutsche bibliographie: duplication; simple separators
bochum: duplication
bikas: simple separators
quetelet fonds: pointers
marc lc: duplication; personal name type indicator; beginning-of-filing-text indicator; automatic handling
bnb marc: personal name type indicator; beginning-of-filing-text indicator; automatic handling
nypl: duplication; automatic handling
monocle: simple separators; multiple function separators; separators with indicators; personal name type indicator
canadian marc: duplication; personal name type indicator; beginning-of-filing-text indicator
fbr: pointers

references

1. rudolf blum, "die maschinelle herstellung der deutschen bibliographie in bibliothekarischer sicht," zeitschrift für bibliothekswesen und bibliographie 13:303-21 (1966).
2. die zmd in frankfurt am main; herausgegeben von klaus schneider (berlin: beuth-vertrieb gmbh, 1969), p.133-37, 162-67.
3. magnetbanddienst deutsche bibliographie, beschreibung für 7-spur-magnetbänder (frankfurt on the main: zentralstelle für maschinelle dokumentation, 1972).
4. ingeborg sobottke, "rationalisierung der alphabetischen katalogisierung," in elektronische datenverarbeitung in der universitätsbibliothek bochum; herausgegeben in verbindung mit der pressestelle der ruhr-universität bochum von günther pflug und bernhard adams (bochum: druck- und verlagshaus schürmann & klagges, 1968), p.24-32.
5. datenerfassung und datenverarbeitung in der universitätsbibliothek bielefeld: eine materialsammlung; hrsg. von elke bonness und harro heim (munich: pullach, 1972).
6. michel bartholomeus, l'aspect informatique de la catalographie automatique (brussels: bibliothèque royale albert ier, 1970).
7. m. bartholomeus and m. hansart, lecture des entrées bibliographiques sous format 80 colonnes et création de l'enregistrement standard; publication interne: mecono b015a (brussels: bibliothèque royale albert ier, 1969).
8. henriette d. avram, john f. knapp, and lucia j. rather, the marc ii format: a communications format for bibliographic data (washington, d.c.: library of congress, 1968).
9. books, a marc format: specifications for magnetic tapes containing catalog records for books (5th ed.; washington, d.c.: library of congress, 1972).
10. "automation activities in the processing department of the library of congress," library resources & technical services 16:195-239 (spring 1972).
11. l. e. leonard and l. j. rather, internal marc format specifications for books (3d ed.; washington, d.c.: library of congress, 1972).
12. marc record service proposals (bnb documentation service publications no.1 [london: council of the british national bibliography, ltd., 1968]).
13. marc ii specifications (bnb documentation service publications no.2 [london: council of the british national bibliography, ltd., 1969]).
14. michael gorman and john e. linford, description of the bnb marc record: a manual of practice (london: council of the british national bibliography, ltd., 1971).
15. edward duncan, "computer filing at the new york public library," in larc reports vol.3, no.3 (1970), p.66-72.
16. nypl automated bibliographic system overview, internal report (new york: new york public library, 1972).
17. marc chauveinc, monocle: projet de mise en ordinateur d'une notice catalographique de livre. deuxième édition (grenoble: bibliothèque universitaire, 1972).
18. marc chauveinc, "monocle," journal of library automation 4:113-28 (sept. 1971).
19. canadian marc (ottawa: national library of canada, 1972).
20. format de communication du marc canadien: monographies (ottawa: bibliothèque nationale du canada, 1973).
21. to be published.
22. private communications (1973).
23. private communications (1972).
24. private communications (1973).

file structure for an online catalog of one million titles

j. j. dimsdale: department of computing science, university of alberta, edmonton, canada, and h. s. heaps: department of computer science, sir george williams university, montreal, canada.

a description is given of the file organization and design of an on-line catalog suitable for automation of a library of one million books. a method of virtual hash addressing allows rapid search of the indexes to the catalog file. storage of textual material in a compressed form allows considerable reduction in storage costs.

introduction

an integrated system for on-line library automation requires a number of computer accessible files.
it proves convenient to divide these files into three principal groups: those required for the on-line catalog subsystem, those required for the acquisition subsystem, and those required for the on-line circulation subsystem. the present paper is concerned with the files for the catalog subsystem. files required for the circulation subsystem will be discussed in a future paper. the files for an on-line catalog system should contain all bibliographic details normally present in a manual catalog, and the file should be organized to allow searches to be made with respect to title words, authors, and library of congress (lc) call numbers. it may also be desired to search on other bibliographic details, in which instance the appropriate files may be added to those described in the present paper. the file organization should be such as to support economic searching with respect to questions in which terms are connected by the logic operations and, or, and not. it should also allow question terms to be connected by operations of adjacency and precedence, and it should allow question terms to be weighted and the search made with reference to a specified threshold weight. it may be desirable for the file organization to include a thesaurus that may be used either directly by the user or by the search program to narrow, or broaden, the scope of the initial query or to ensure standardization of the question vocabulary. the file organization and search strategy should ensure that the user of the on-line catalog system receives an acceptable response time to his queries, although it is likely that some of the operations required by the circulation system will be given a higher priority. thus the integrated system must time-share between search queries, circulation transactions, and other tasks that originate from a number of separate terminals or from batch input. such tasks might arise from acquisitions, and from update and maintenance of the on-line catalog. the system should be a special purpose time-sharing system such as the time-sharing chemical information retrieval system described by lefkovitz and powers and by weinberg.1,2 in this system the queries time-share disk storage as well as the central processor. since an on-line catalog is a large file, and hence expensive to store in computer accessible form, it is desirable to store it in as compact a form as possible. for example, a catalog file for one million titles is likely to involve between 2 x 10^8 and 5 x 10^8 alphanumeric characters. if stored character by character the required storage capacity would be equivalent to that supplied by from seven to sixteen ibm 2316 disk packs. it is also important to design the frequently accessed files so as to minimize the number of disk, or data cell, accesses required to process each query. the files described in the present paper include ones stored in compressed form and organized for rapid access. throughout the present paper the term title is used in a general sense. it may include periodical titles as well as book titles. however, it is supposed that frequently changing information, such as periodical volume ranges, will be stored as part of the circulation subsystem rather than the catalog subsystem.

overall file organization

the complete bibliographic entries of the catalog may be stored in a serial (sequential) file so that any record may readily be read and displayed in its entirety.
however, as indicated by curtice, use of an inverted file is to be preferred for purposes of searching.3 an alternative to the simple serial file is one organized in the form of a multiple threaded list (multilist) in which all records that contain a particular key are linked together by pointers within the records themselves. the first record in each list is pointed to by an entry in a key directory as described by lefkovitz, holbrook, dodd, and rettenmayer.4-7 for very small collections of documents divett and burnaugh have attempted to organize on-line catalogs by use of ring structured variations of the multilist technique.8,9 neither file organization is feasible for a collection of a million documents because of the long length of the threads involved. many disk accesses would be needed in order to retrieve all elements of a list, and hence there would be a very slow response to queries. the cellular multilist structure proposed by lefkovitz and powers, or the cellular serial structure proposed by lefkovitz, may well prove to be a viable alternative to the organization proposed in the present paper.10,11 however, as indicated by lefkovitz, the inverted organization provides shorter initial, and successive, response times in answer to queries.12 in the present paper it is supposed that the on-line catalog file consists of both a serial file of complete bibliographic entries and an inverted file organized with respect to search keys such as title words, subject terms, author names, and call numbers. such a two-level structure is often assumed and has been termed a "combined file" by warheit, who concluded it to be superior to either a single serial file or a threaded list organization.13-17 the file structure described in the present paper uses indexes based on the virtual scatter table as described by morris and murray, the scatter index table discussed by morris, and the bucket as treated by buchholz.18-20 the attractiveness of a similar structure for use in the ohio college library center has been analyzed by long et al.21 the basic elements of the file organization are shown in figure 1. it is supposed that the access keys are title words, but a similar file structure is used for access with respect to keys of other types.

fig. 1. overall file organization (key, e.g. title word, passed to a hashing function that points into the hash table file).

any key may be operated on by a hashing function which transforms it into a pointer to an entry in a hash table file. this file contains pointers to both a dictionary file of title words and an inverted index which is stored in a compressed form. entries within the compressed inverted index serve as pointers to the catalog file of complete bibliographic entries. terms, such as title words, within the catalog file are coded to allow a compressed form of storage. the codes used in the compressed catalog file serve as pointers to the uncoded terms stored in the dictionary file. there would be a separate hashing function, hash table file, dictionary file, and compressed inverted file for use with each different type of key. however, there is only one compressed catalog file. for a search scheme that allows use of a thesaurus of synonyms, narrower terms, broader terms, and so forth, a thesaurus file may be added (figure 2). the files must be organized to allow for ease of updating. as further bibliographic entries are added it is necessary to add additional pointers from the inverted index.
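the flow of figure 1 can be sketched as follows, with plain python dictionaries standing in for the disk-resident hash table, dictionary, inverted index, and catalog files; all identifiers and sample data are assumptions made for illustration.

# sketch of the overall organization: a search key is hashed to a hash-table
# entry, which points both to the dictionary entry for the key and to its
# posting list in the inverted index; postings point into the catalog file.
dictionary = {0: "automation", 1: "libraries"}
inverted_index = {0: [101, 102], 1: [102]}                   # key id -> record ids
catalog = {101: "l'automation des bibliotheques ...",
           102: "automation in libraries ..."}
hash_table = {hash("automation"): 0, hash("libraries"): 1}   # hash -> key id

def search(word: str):
    key_id = hash_table.get(hash(word))
    if key_id is None or dictionary[key_id] != word:   # guard against mismatch
        return []
    return [catalog[r] for r in inverted_index[key_id]]

print(search("automation"))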
also, whenever a new key occurs in a bibliographic entry it must be added to the dictionary, assigned a code for storage in the compressed catalog file, and entered into the compressed inverted index.

fig. 2. file organization with inclusion of a thesaurus.

structure of the hash table file

in order to locate the set of inverted index pointers that corresponds to a given search key k, the key is first operated on by a hashing function that transforms it into a bit string of length v bits. each such bit string is said to represent a virtual hash address, and is regarded as the concatenation of two substrings of length r and v-r bits. the two substrings are respectively said to constitute the major and the minor m(k) of the virtual hash address. the major is further divided into two bit strings b(k) and i(k) that define a bucket number b(k) of a bucket β(k), and an index number i(k) of an entry within the bucket. the major that represents the pair of numbers b(k), i(k) is said to constitute a real hash address. the hash table file is divided into portions, or buckets, of equal length. each bucket is further divided into an index section, a content section, and a counter section (figure 3). the index sections of all buckets have the same length. similarly, all content sections are of equal length, and so are all counter sections. as the hash table is created, entries are added sequentially into the content section so that any unfilled portion is at the end. in contrast, the index section of any bucket may contain unfilled entries at random positions and hence constitutes a scatter table. the hash table file is created as follows. the various keys are transformed by the hashing function into bit strings b(k), i(k), m(k). in the bucket β(k) of number b(k) an entry as described below is added to the content section, and the vacancy pointer within the counter section is incremented to point to the beginning of the unfilled portion of the content section. the i(k)th entry number in the index section is then set to point to the position of the entry added to the content section. the entry placed in the content section includes the minor m(k) and a dictionary pointer to where the key is placed in the dictionary file as well as a pointer to an entry in the compressed inverted index.
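a rough sketch of the virtual hash address follows: the hashed key is split into a bucket number b(k), an index number i(k), and a minor m(k). md5 stands in for the unspecified hashing function, and the bit widths assume the 18-bit major and 33-bit virtual address derived later in the text, with 64 index slots per bucket.

# sketch of the virtual hash address: a v-bit hash split into bucket number,
# index number, and minor. the widths below are assumptions taken from the
# figures adopted later in the text.
import hashlib

R_BITS, V_BITS = 18, 33             # major = 18 bits, virtual address = 33 bits
INDEX_BITS = 6                      # 64 index slots per bucket (an assumption)
BUCKET_BITS = R_BITS - INDEX_BITS   # remaining bits of the major select the bucket

def virtual_address(key: str):
    h = int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % (1 << V_BITS)
    major, minor = h >> (V_BITS - R_BITS), h & ((1 << (V_BITS - R_BITS)) - 1)
    b = major >> INDEX_BITS                    # bucket number b(k)
    i = major & ((1 << INDEX_BITS) - 1)        # index number i(k)
    return b, i, minor                         # minor m(k)

print(virtual_address("automation"))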
there is said to have occulted a collision at the virtual hash address b(k), i(k), m(k). the last three entries included in the counter section shown in figure 3 are optional but are useful for monitoring the performance of the hashing function with respect to bucket overflows and so forth. a bucket becomes full when there is no remaining unfilled space in its content section. if a further chain pointer is required from a content entry, its preceding overflow bit qc is set to 1 to indicate that the pointer is to another bucket. likewise, if a further entry is required in the index section its preceding overflow bit qr is set to 1 to indicate that it refers to an entry within another bucket. the bucket is then said to have overflowed. methods of handling bucket overflow, and choice of the new bucket, are discussed in a subsequent section. it should be noted that use of a hash table as described above retains most of the advantages of the usual scatter index method in which the in42 journal of library automation vol. 6/1 march 1973 dex entries and content entries are stored in two separate files. it has the further important advantage that in most instances a single disk access is sufficient to locate both the index entry and the corresponding content entry. as noted by buchholz and reising~ if it is known that certain keys are likely to appear with high frequency in search queries then it is advantageous to enter them at the start of creation of the hash file. 22 • 23 they will then tend to appear near the beginnings of the content entry chains and hence require little cpu time for their subsequent location. furthermore, they will tend to appear in the same bucket as their corresponding index entries, and hence their location will usually require only a single disk access. number of bits for virtual hash address suppose the hashing function is chosen so that the majors of the transformed keys are uniformly distributed among the r slots available for real hash addresses b,i. if there are n keys then a = n / r may be termed the load factor. it is the average number of keys that are transformed into any given real hash address. the probability that any given real hash address corresponds to k keys is given by m urra y24 as (i) pk = e-a ak j kl hence, for any given real address the probability of a collision occurring is n (2) c = ~ pk = 1 po p1 = 1 (1 + a)e-a. k= 2 if a collision occurs at a particular real hash address, the expected length of the required chain within the content section is n (3) l = ~ kp~r/c k=2 n = (1 / c) (~ kpk pd k:o = ( l j c) (a ae-a) _ a (e" 1) ea.1 a it may be noted that if the load factor a is equal to 1 then l = 2.43. if all the transformed keys are distributed uniformly among the v possible virtual addresses b, i, m then the expected total number of collisions at virtual addresses is given by murray25 as (4) p = n2/ 2v provided v" n. the expected relative frequency of collisions at virtual addresses is therefore (5) f = n / 2v. file structure for an on-line catalog j dlmsdale 43 it proves convenient to regard n, f, and a as basic parameters in terms of which may be determined the number r of bits required in the major 1 and the number v of bits required in the virtual hash addresses. the value of r must be at least as large as lo~r = lo~(n/a), and hence r may be chosen according to the formula (6) r = r log2 (n/ a) where r means "the smallest integer greater than or equal to." the value of v must be at least as large as (7) v = r lo~v = r lo~ (n/2£). 
choice of bucket capacity

with an 8-bit byte-oriented computer, such as the ibm 360, it proves convenient to use 8 bits of storage for each entry number plus overflow bit within the index section. if a value of zero is used to indicate an unused index entry there remain up to 127 possible values for entry numbers. thus the number c of entries in the content section must be less than or equal to 127. suppose there are b slots for index entries in each bucket. the total number of index entries in the entire file is R. it follows from the results of schay and spruth,26 tainter,27 and heising28 that the probability p(b, c) of overflow of any bucket is given by

(10) p(b, c) = sum(k=c+1 to infinity) e^(-ab) (ab)^k / k!

for selected values of b, beyer's tables of the poisson distribution have been used to compute p(b, c) and to determine the smallest value of c for which p(b, c) ≤ 0.01.29 the results are shown in table 1 for the instance in which a = 1. a similar table has been computed by buchholz30 for the instance in which c = b and a ranges from 0.1 to 1.2. as is apparent from table 1, an increase in the value of b allows use of a smaller ratio c/b and hence permits more economical use of storage. with b = 64 the allowed value of c/b is 1.33 and hence c may be chosen equal to 85. the reduction in access time that results from structuring the file so that each bucket contains both index and content entries is, of course, effected at the expense of additional storage costs. for example, if c/b = 1.33 then the space allocated for storage of content entries is 33 percent greater than if content entries are stored in a separate file. relaxation of the condition p(b,c) ≤ 0.01 allows a reduction in c/b, but the increased number of bucket overflows will cause additional disk accesses to be required.

table 1. values of b, c, and c/b for which p(b,c) ≤ 0.01 when a = 1

b     c     c/b
1     5     5.00
2     6     3.00
3     8     2.66
4     10    2.50
5     11    2.20
6     13    2.17
7     14    2.00
8     15    1.88
9     17    1.89
10    18    1.80
11    19    1.73
12    20    1.67
13    22    1.69
14    23    1.64
15    24    1.60
16    25    1.56
17    27    1.59
18    28    1.55
19    29    1.53
20    30    1.50
60    80    1.33
100   125   1.25
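the entries of table 1 can be approximated by a direct evaluation of equation 10; the sketch below sums the poisson tail for a given b and c and searches for the smallest capacity meeting the one percent criterion. it is an illustration of the computation, not a reproduction of beyer's tables, and the function names are assumptions.

# sketch of equation (10): the probability that a bucket with b index slots
# and room for c content entries overflows, assuming poisson-distributed
# keys with load factor a, and the smallest c that keeps it at or below 1%.
import math

def overflow_probability(b: int, c: int, a: float = 1.0) -> float:
    lam = a * b
    # p(b, c) = 1 - sum_{k=0}^{c} e^(-lam) lam^k / k!
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(c + 1))

def smallest_capacity(b: int, a: float = 1.0, limit: float = 0.01) -> int:
    c = 0
    while overflow_probability(b, c, a) > limit:
        c += 1
    return c

print(overflow_probability(64, 85))   # should come out below the 0.01 criterion
print(smallest_capacity(64))          # smallest c for b = 64 under this criterion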
treatment of bucket overflows

when a new key is found to map into a bucket whose content section is full then some means must be found to provide space in some other bucket. the particular procedure that should be used depends on the extent to which the entire set of buckets contains unfilled portions. suppose that many buckets are almost full and that the number c of allowed content entries is less than 127. the entire hash file may then be expanded with the same index sections but with longer content sections. if many buckets are almost full and c = 127 then the entire file may be expanded in such manner that each bucket is replaced by a pair of buckets that contain the same number b of allowable index entries, but whose number c1 of allowable content entries is chosen to ensure that p(b, c1) ≤ 0.01. such doubling of buckets also doubles the number of index entries but it does not double the storage required for the entire file. each key k that corresponds to an entry in the original bucket is associated with an entry in the first, or second, of the new buckets according as the leading bit of either its index address i(k) or its minor m(k) is equal to 0 or 1. the effect is to shift one bit from i(k) or m(k) into the bucket address b(k). this method is based on a suggestion of morris.31

suppose that few buckets are almost full. then a suitable means of determination of an unfilled bucket for storage of the minor is through use of some overflow algorithm that determines a sequence of bucket numbers b_0(k), b_1(k), b_2(k), etc., corresponding to any given full bucket β_0(k). suppose there are N_B buckets. a quadratic residue algorithm

(11) b_j(k) = [b_0(k) + a j + b j^2] mod N_B

has been considered by maurer and by bell for use with in-core hash tables, but it suffers from the disadvantage that the existence of a full bucket β_0(k) will divert entries into the particular buckets β_1(k), β_2(k), etc., and hence cause them to fill more rapidly than other buckets which may contain fewer entries.32,33 it is believed that a more desirable form of the quadratic residue algorithm is

(12) b_j(k) = {b_0(k) + f_j[i(k)]} mod N_B

where f_j is a suitably chosen function. letting b_j(k) depend, through f_j, on both j and i(k), instead of on j alone, allows reduction of the tendency to fill a particular set of buckets. to prevent a tendency to overflow particular buckets it is also desirable for the overflow algorithm to produce bucket numbers that are uniformly distributed among all possible bucket numbers. among the more promising forms to be chosen for the f_j[i(k)] are the following:

(13a) f_j[i(k)] = i'(k) j, where j = 1, 2, ..., N_B - 1, and i'(k) denotes i(k) if i(k) is odd, but denotes i(k) + 1 if i(k) is even. since N_B is a power of 2 such a choice of i'(k) ensures that i'(k) and N_B have no common factors, and hence that b_j(k) steps through the sequence β_0(k), β_1(k), etc., covering every bucket in the file.

(13b) f_j[i(k)] = i'(k) j^2, where j = 1, 2, ..., ⌈√N_B⌉ - 1, and ⌈ ⌉ means "the least integer greater than or equal to."

(13c) f_j[i(k)] = r_j[i'(k)], where j = 1, 2, ..., N_B, and r_j[i'(k)] denotes a number output by a pseudorandom number generator of the form suggested by morris34 with an initial input of i'(k) instead of 1.

it may be remarked that use of equation 13a requires the least number of machine instructions, and the least cpu time per step, but it has a strong tendency to cluster the β_j(k) immediately after the β_0(k) and hence it is likely to be the least effective of the three methods. use of equation 13b produces less clustering, but the sequence does not include all buckets of the file. use of equation 13c requires the largest number of instructions and cpu time per step, but the β_j(k) are less likely to cluster and they are uniformly distributed among all possible buckets. thus equation 13c produces shorter chains of overflow buckets and hence requires fewer disk accesses.
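a minimal sketch of the overflow algorithm of equation 12 with the step function of equation 13a follows; the bucket count and sample values are assumptions, and N_B is taken to be a power of two as the text requires.

# sketch of equation (12) with the step of (13a): candidate buckets
# b_j(k) = (b_0(k) + i'(k) * j) mod N_B, where i'(k) is i(k) forced odd so
# the sequence visits every bucket when N_B is a power of two.
def overflow_sequence(b0: int, i: int, n_buckets: int):
    """generate the chain of overflow bucket numbers after a full bucket b0."""
    i_prime = i if i % 2 == 1 else i + 1     # odd step, coprime to a power of 2
    for j in range(1, n_buckets):
        yield (b0 + i_prime * j) % n_buckets

# first few candidates after bucket 5 for an index value of 12, 4096 buckets
seq = overflow_sequence(5, 12, 4096)
print([next(seq) for _ in range(4)])          # -> [18, 31, 44, 57]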
if a new key k maps into a full bucket β0(k) then the following procedure is used to determine the bucket into which the minor of k is to be inserted:

(i) the chain of pointers from the i(k)th entry of the bucket β0(k) is followed, possibly through overflow buckets given by equation 12, in order to locate the terminal entry of the chain. suppose this terminal entry is within a bucket βj(k).

(ii) if there is available space in bucket βj(k) then the minor m(k) is entered and chained as described previously.

(iii) if bucket βj(k) is full, but there is space in βj+1(k), then the minor m(k) is entered into βj+1(k) and chained as described previously.

(iv) if buckets βj(k) and βj+1(k) are both full, and bucket βj+1(k) contains at least one nonempty index entry i(k') whose chained content entries are all contained within βj+1(k), then the minor m(k) is stored according to the following displacement algorithm: the terminal member of the chain from i(k') is displaced to an overflow bucket βr(k') determined by use of equation 12, except that if both βr(k') and βr+1(k') are full then a further bucket is determined by use of the displacement algorithm. the minor m(k) is substituted for the displaced entry in bucket βj+1(k) and is chained appropriately.

(v) if application of step (iv) leads to a bucket βj+1(k), or βr+1(k'), that contains no nonempty index entry whose chained content entries are all contained within it, then the entire hash file must be expanded by use of one of the procedures described at the beginning of the present section.

it should be emphasized that, although step (iv) is necessary for completeness, the probability of its use is very low. with a probability of less than 0.01 for a bucket overflow, the probability of use of step (iv) is less than (0.01)^3.

search phase and problem of mismatch

in the previous sections the structure of the hash index file has been discussed with emphasis on details of its creation and update. during search of the catalog files by use of the inverted index, each search key is processed by the following search algorithm:

step 1: the search key k is transformed by the hashing function into a virtual hash address b(k), i(k), m(k).

step 2: the bucket β(k) is read into core.

step 3: the index entry specified by i(k) is examined. if it is empty then the search key is not present in the data base. if it is not empty then step 4 is performed.

step 4: the overflow bit of the index entry specified by i(k) is examined. if it is equal to 1 then step 5 is performed. if it is equal to 0 then step 6 is performed.

step 5: the overflow algorithm is used to determine the address of the required overflow bucket, which is then read into core, and step 6 is executed.

step 6: the minor of each entry in the chain of content entries is compared to the minor of the search key's virtual hash address until either a match is found or the chain is exhausted. whenever the chain leads to an overflow bucket then step 5 is performed.

step 7: if a match is found for m(k) then the collision bit of the entry is examined. if it is equal to 0 then step 9 is performed. if it is equal to 1 then step 8 is performed.

step 8: the dictionary entry that corresponds to each content entry in the virtual address collision is read into core and compared to the search key k. if no match is found then the search key is not present in the index.

step 9: this step is included because there is a small probability that a misspelled search key, or one not present in the hash file, may be transformed into the same virtual address as some key already included in the file. the step consists of reading the corresponding dictionary entry into core and comparing it with the search key. for reasons discussed later in the present section it is desirable to omit this step.

it should be noted that in most instances the search algorithm will not require execution of steps 5 and 8.
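read as code, steps 1 through 8 of the search algorithm look roughly as follows (step 9 is omitted, as the paper recommends). this is only an illustrative python sketch: the attribute names and the four injected routines for hashing, bucket i/o, the overflow algorithm, and dictionary i/o are assumptions, not part of the original program.

def search(key, hash_fn, read_bucket, overflow_bucket_of, read_dictionary_entry):
    b, i, m = hash_fn(key)                         # step 1: virtual hash address b(k), i(k), m(k)
    bucket = read_bucket(b)                        # step 2: read the bucket into core
    entry = bucket.index[i]                        # step 3: examine the i(k)-th index entry
    if entry is None:
        return None                                #         empty: key not in the data base
    if entry.overflow_bit:                         # step 4: overflow bit set?
        bucket = read_bucket(overflow_bucket_of(b, i))   # step 5: fetch the overflow bucket
    for content in bucket.chain(entry):            # step 6: walk the chain of content entries,
        if content.minor != m:                     #         re-entering step 5 whenever the chain
            continue                               #         crosses into another overflow bucket
        if not content.collision_bit:              # step 7: unique minor, no collision
            return content
        # step 8: a virtual-address collision; disambiguate via the dictionary entry
        if read_dictionary_entry(content.dictionary_pointer) == key:
            return content
    return None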
in fact, with the hash index files designed as described in the previous sections, the probability of execution of step 5 is about 0.01 and the probability of execution of step 8 is about 2^-16. consequently, if step 9 is also omitted the number of disk accesses required to find the index entry corresponding to a search key is approximately 1.01.

the mismatch problem, which gives rise to step 9 of the search algorithm, is less serious than might be expected. suppose the hash function distributes the transformed keys uniformly over all hash addresses. the probability that a new, or misspelled, key maps into an existing entry is given by

(14) pc = n/v.

the probability that a search leads to a mismatch is therefore

(15) pm = ps n/v

where ps is the probability that the search key is misspelled or not in the hash table. thus, for a hash table of n = 2^16 = 65,536 title words and v = 2^31, an assumption of ps = 0.1 leads to pm = 3 x 10^-6. because pm is extremely small, and because each execution of step 9 requires up to two disk accesses, it is desirable to omit this step. if experience shows that particular new or misspelled search keys occur frequently, and cause mismatches, they may themselves be entered into the hash index file. in fact, some degree of automatic spelling correction may be provided if some common misspellings are included in the hash files and chained to the content entries that correspond to the correctly spelled keys. correct, but alternative, spellings of search keys may also be treated in the same manner.

size of hash file for title words

suppose the document collection contains t different titles that comprise a total of w words of which there are n different words. let w̄ = w/t denote the average number of words in each title. reid and heaps35 have reported word counts on the 57,800 titles included on the marc tapes between march 1969 and may 1970 and have noted that

(16) w̄ = 5.5

(17) log10 n = 0.6 log10 w + 1.2.

examination of other data bases has led to the conclusion that log n is likely to be a linear function of log w over the range 0 ≤ w ≤ 10^6. for a library of one million titles the equations 16 and 17 may therefore be used to predict that when t = 10^6 then

(18) w ≈ 5.5 x 10^6 and n = 1.8 x 10^5.

it follows from equation 6 that if a = 1 the number of bits required in the major is

(19) r = 18.

according to equation 7, in order to reduce the frequency f of collisions at virtual addresses to 2^-16 the number of bits required in the entire virtual address is

(20) v = ⌈log2 (1.8 x 10^5) + 16 - 1⌉ = 33.

consequently, the number of bits in the minor is

(21) m = v - r = 15.

however, with such a choice of r there are 2^18 hash addresses and the value of the load factor is, in fact,

(22) a = n/2^18 ≈ 0.7.

it follows from equation 4 that the expected total number p of collisions at virtual addresses is equal to approximately 2. it may be further noted that murray36 has derived the following approximation for the probability that the number of collisions at virtual hash addresses lies within the range a to d:

(23) p(a,d) = Σ (i = a to d) e^(-p) p^i / i!, where 0 ≤ a ≤ d ≤ n.

when p = 2 the equation gives a value of 0.9998 for the probability that the total number of collisions lies between 0 and 8. thus the above choice of r, v, and m leads to a title word hash table file with excellent virtual address collision properties.
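the sizing arithmetic of equations 19 through 22 is easy to reproduce. the sketch below treats the bracketing of equation 20 as ordinary rounding-up and stands in a birthday-type estimate for equation 4, since both formulas are garbled or outside this excerpt; those two choices are assumptions. the same routine reproduces the lc call number and author name figures given later in the paper.

import math

def hash_file_sizing(n_keys, gamma=16):
    # r: bits in the major (eq. 6 with a = 1); v: bits in the virtual address
    # (eq. 7 as reconstructed here, with f = 2**-gamma); m: bits in the minor (eq. 9)
    r = math.ceil(math.log2(n_keys))
    v = math.ceil(math.log2(n_keys) + gamma - 1)
    m = v - r
    load = n_keys / 2 ** r                          # eq. 22
    collisions = n_keys ** 2 / 2 ** (v + 1)         # birthday estimate standing in for eq. 4
    return r, v, m, round(load, 2), round(collisions, 1)

print(hash_file_sizing(1.8e5))     # title words:     (18, 33, 15, 0.69, 1.9)
print(hash_file_sizing(1e6))       # lc call numbers: (20, 35, 15, 0.95, 14.6)
print(hash_file_sizing(2.56e5))    # author names:    (18, 33, 15, 0.98, 3.8)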
use of equation 10 with b = 64 and a = 0.7 leads to the result that the probability of bucket overflow may be reduced to 0.01 by choosing c = 62. in view of the above value of m it proves convenient to allocate 10 bytes of storage for each content entry. each entry consists of a 2-byte portion to contain the 15-bit minor preceded by a collision bit, a 1-byte portion to contain a 7-bit chain pointer preceded by an overflow bit, a 3-byte dictionary pointer, and a 4-byte pointer to an inverted index. the 64 one-byte index entries, the 62 ten-byte content entries, and 4 one-byte counters constitute buckets of length 688 bytes.

the entire hash file consists of r entries, and hence r/b = 2^12 buckets. its storage requirement is therefore 2^12 x 688 = 2.82 x 10^6 bytes. it may be remarked that nine 688-byte buckets may be stored unblocked in one track of an ibm 2316 disk pack, and that the entire hash file occupies 11.38 percent of the disk pack. when the disk and channel are idle the average time to access such a bucket is the sum of the average seek time, the average rotational delay, and the record transmission time. for storage on an ibm 2314 disk drive the average bucket access time is therefore 60 + 12.5 + 2.8 = 75.3 milliseconds. the average access time for a sequence of accesses could be reduced by suitable scheduling.

size of hash file for lc call numbers

for a library of one million titles the number n of call numbers is 10^6. if a = 1 and f = 2^-16 it follows from equations 6, 7, 9, and 4 that

(24) r = 20, v = 35, m = 15, p = 16.

with such a choice of r the load factor is approximately equal to 1. equation 23 gives a probability of 0.9998 that the total number of virtual address collisions lies between 0 and 34. use of equation 10 with b = 64 and a = 1.0 shows that the probability of bucket overflow may be reduced to 0.01 by choosing c = 85.

the content entries for lc call numbers may be arranged as for title words except that the 4-byte pointer to an inverted index is replaced by a 3-byte pointer to the compressed catalog file. the bucket length is therefore 64 + 85 x 9 + 4 = 833 bytes. the storage requirement for the hash file is (2^20/2^6) x 833 = 13.65 x 10^6 bytes, which may be stored in 2,184 tracks, or 54.6 percent, of an ibm 2316 disk pack. the average time to access a bucket is 60 + 12.5 + 3.3 = 75.8 milliseconds.
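the byte arithmetic behind the bucket and file sizes just described is worth making explicit. a minimal sketch, with the layout figures taken from the text above; the author name bucket worked out in the next section follows the same pattern with 85 ten-byte entries per bucket.

def bucket_bytes(index_slots, content_entries, bytes_per_entry, counters=4):
    # one byte per index entry, fixed-length content entries, plus counter bytes
    return index_slots + content_entries * bytes_per_entry + counters

# title words: 64 index slots and 62 ten-byte entries (2-byte minor + collision bit,
# 1-byte chain pointer + overflow bit, 3-byte dictionary pointer, 4-byte inverted-index pointer)
title_bucket = bucket_bytes(64, 62, 10)          # 688 bytes
# lc call numbers: the 4-byte inverted-index pointer becomes a 3-byte catalog pointer
call_bucket = bucket_bytes(64, 85, 9)            # 833 bytes

print(title_bucket, 2**12 * title_bucket)        # 688, 2,818,048 bytes (about 2.82 x 10^6)
print(call_bucket, 2**14 * call_bucket)          # 833, 13,647,872 bytes (about 13.65 x 10^6)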
size of hash file for author names

in the present section the term "author" will be used to include personal names, corporate names, editors, compilers, composers, translators, and so forth. it will be assumed that for personal names only surnames are entered into the author dictionary. a search query that includes specification of authors with initials is first processed as if initials were omitted, and the resulting retrieved catalog entries are then scanned sequentially to eliminate any entries whose authors do not have the required initials. it will also be supposed that each word of a corporate name is entered separately into the author dictionary, and that the inverted index contains an entry for each term.

in the absence of reliable statistics regarding the distributions of author surnames, words within corporate names, and so forth, the following assumptions have been made in order to estimate the size of the author dictionary and hash file for a library of one million titles:

(i) the personal author names contain 2 x 10^5 different surnames of average length 7 characters.

(ii) the corporate author names include 4 x 10^4 different words of average length 6 characters.

(iii) the author names include 1.6 x 10^4 different acronyms such as ibm, aslib, and so forth; their average length is 4 characters.

it is thus supposed that n = 2.56 x 10^5 entries are required in the author hash files. calculations similar to those of the previous section show that

(25) r = 18, v = 33, m = 15, p = 4, a = 1.0.

equation 23 gives a probability of 0.9999 that the total number of virtual address collisions lies between 0 and 13. the probability of bucket overflow may be reduced to 0.01 by choosing c = 85. content entries of 10 bytes may be arranged as previously described for title words. hence each bucket requires 918 bytes of storage. the storage requirement for the hash file is (2^18/2^6) x 918 = 3.76 x 10^6 bytes, which may be stored in 586 tracks, or 14.6 percent, of an ibm 2316 disk pack. the average time to access a bucket is 76.1 milliseconds.

structure of dictionary files

the structure of the dictionary files for title words and author names is as described by thiel and heaps.37, 38 each dictionary file contains up to 128 directories, each of which points to up to 128 term strings that may each contain space for storage of 128 terms of equal length. thus each dictionary file may contain up to 2^21 different terms (2^14 for each directory). the dictionary pointers in the hash files are essentially the codes stored instead of alphanumeric terms in the catalog file.

the most frequent 127 title words are assigned dictionary pointers of the form

(26) 10000000 10000000 1xxxxxxx
                        (pt)

and do not have corresponding entries in the inverted index file. the last byte forms the code used to represent the title word within the compressed catalog file. the next most frequent 16,384 title words are assigned dictionary pointers of the form

(27) 00000000 1xxxxxxx 1xxxxxxx

or

(28) 10000000 0xxxxxxx 1xxxxxxx

according as there is, or is not, a corresponding entry in the inverted index. the last 2 bytes are used as codes in the compressed catalog file. the remaining title words are assigned dictionary pointers of the form

(29) 0xxxxxxx 0xxxxxxx 1xxxxxxx
     (pd)     (ps)     (pt)

they all have corresponding entries in the inverted index file, and the 3 bytes are used as codes in the catalog file. the reason that terms coded in the form 26 or 28 do not have corresponding entries in the inverted index file is that very frequently occurring terms form very inefficient search keys. also, previous results suggest that omission of corresponding entries in the inverted index allows its size to be reduced by about 50 percent.39, 40

the codes of type pt, (ps, pt), and (pd, ps, pt) are used respectively for approximately 50 percent, 45 percent, and 5 percent of the title words. the average length of the coded title words in the compressed catalog file is therefore 1.55 bytes. associated with each dictionary file there is a directory of length 512 bytes whose entries point to the beginnings of term strings within the dictionary file and also indicate the lengths of the terms.
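the three-byte pointer forms 26 through 29 can be told apart from their leading bits alone. the following sketch decodes a pointer into its form, whether the term has an inverted index entry, and its (pd, ps, pt) components; the bit layouts follow the forms as reconstructed above and should be read as an illustration rather than as the production record format.

def classify_pointer(pointer):
    # pointer is a 3-byte sequence; returns (form, has_inverted_index_entry, (pd, ps, pt))
    b0, b1, b2 = pointer
    pt = b2 & 0b01111111                              # the last byte always carries pt
    if b0 == 0b10000000 and b1 == 0b10000000:
        return 26, False, (None, None, pt)            # one of the 127 most frequent words
    if b0 == 0b00000000 and b1 & 0b10000000:
        return 27, True, (None, b1 & 0b01111111, pt)  # frequent word with an inverted index entry
    if b0 == 0b10000000 and not b1 & 0b10000000:
        return 28, False, (None, b1, pt)              # frequent word without an inverted index entry
    return 29, True, (b0, b1, pt)                     # full pd, ps, pt pointer

print(classify_pointer(bytes([0b10000000, 0b10000000, 0b10000011])))   # (26, False, (None, None, 3))
print(classify_pointer(bytes([0b00001011, 0b00000101, 0b10000011])))   # (29, True, (11, 5, 3))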
within the hash table file a dictionary pointer of the form pd, ps, pt points to the pt-th term of the ps-th term string in the dictionary associated with the pd-th directory. there is a single directory associated with each set of pointers of type pt and ps, pt.

the average length of the 1.8 x 10^5 different title words is 7.6 characters, and hence the entire set of term strings requires 1.8 x 10^5 x 7.6 = 1.37 x 10^6 bytes for storage of title words. since twelve directories occupying 12 x 512 = 6,144 bytes will be required, and since some term strings will contain unfilled portions, the storage requirement of the dictionary file will be slightly larger. if the title word dictionary is stored on disk in 1,000-byte records then the storage requirement is 238 tracks, or 5.95 percent, of an ibm 2316 disk pack. the assumptions made previously regarding author names imply an author dictionary size of 1.70 x 10^6 bytes and sixteen directories whose total storage requirement is 16 x 512 = 8,192 bytes. using an ibm 2316 disk pack the storage requirement is for 286 tracks, or 7.15 percent.

on completion of a search through use of the inverted index file there results a set of sequence numbers that indicate the position of the relevant items in the compressed catalog file. before such items are displayed to a user of the system, each term must be decoded through access to the directory and dictionary to which it points. the time required to decode a catalog item depends on how the directories and dictionaries are partitioned between disk and core memory. several partitioning schemes for title words have been analysed, and the results are summarized in table 2.

table 2. average time to decode a title word of the compressed catalog file.

core resident directories   core resident term strings   average number of accesses   average decode time (milliseconds)   dedicated core memory (bytes)
none                        none                         1.50                         115                                  0
pt                          pt                           1.01                         77                                   989
pt, (ps,pt)                 pt                           0.55                         42                                   1501
all                         pt                           0.50                         39                                   7133
pt, (ps,pt)                 pt, (ps,pt)0                 0.49                         38                                   2474
all                         pt, (ps,pt)0                 0.44                         34                                   8106

(ps,pt)0 signifies the 128 most frequent of the codes ps, pt.

in the calculations used to obtain table 2 it is assumed that title words occur with the frequencies listed by kucera and francis.41 it is supposed that both the directory and term strings corresponding to codes of form pt are stored in a single physical record, that every other directory is contained wholly within a physical record, and that each dictionary term may be located by a single access to a term string. any required cpu time is regarded as insignificant compared to the time needed for file accesses. from the results shown in table 2 it appears that the best partition between core and disk is probably that which gives an average decode time of 42 milliseconds while requiring a dedicated 1,501 bytes of core memory. this results when core is used to store both the directories and term strings for terms that correspond to pointers of type pt, and the directories only for terms that correspond to pointers of type ps, pt.

compressed catalog file

since the title word codes stored in the compressed catalog file have an average length of 1.55 bytes, whereas uncoded title words and their delimiting spaces have an average length of 6.5 characters, the compressed title fields occupy only 24 percent of the storage required for uncompressed words. uncoded author names and their delimiting spaces have an average length of 7.6 characters and are coded to occupy not more than 3 bytes; hence coding of author names effects an average compression factor of less than 3/7.6 = 40 percent. for lc call numbers the compression factor is less than 30 percent. clearly, subject headings, publisher names, and series statements may be coded with even more effective compression factors.
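the compression figures quoted above follow from simple arithmetic on the mix of code lengths; a short sketch is given below. the small residual differences from the paper's rounded storage totals are to be expected.

# 50 percent of title words use 1-byte codes, 45 percent 2-byte, 5 percent 3-byte
avg_coded_length = 0.50 * 1 + 0.45 * 2 + 0.05 * 3      # 1.55 bytes
title_factor = avg_coded_length / 6.5                   # about 0.24 of the uncoded length
author_factor = 3 / 7.6                                 # under 0.40, using the 3-byte ceiling

title_words = 5.5e6                                     # one million titles at 5.5 words each
uncoded_bytes = title_words * 6.5                       # roughly 3.6 x 10^7 bytes
coded_bytes = title_words * avg_coded_length            # roughly 8.5 x 10^6 bytes
print(avg_coded_length, round(title_factor, 2), round(author_factor, 2), uncoded_bytes, coded_bytes)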
the saving in space through compression of the catalog file may be translated into a cost saving as follows. if there are an average of 5.5 words in each title then one million titles include 5.5 x 10^6 title words and delimiting spaces which, if stored in the catalog file in uncoded form, would require 3.63 x 10^7 bytes.42 when stored in coded form the requirement is for 8.54 x 10^6 bytes. charges for disk space vary considerably with different computing facilities. at the university of alberta users of the ibm 360 model 67 are charged a monthly rate of $.50 for each 4,096 bytes of disk storage. thus, for title words alone the advantage of storing the catalog file in compressed form is to allow the monthly storage cost to be reduced from $4,440 to $950.

concluding remarks

the results reported in the present paper indicate that a satisfactory structure for a catalog file may be designed to use the concept of virtual hash addressing and storage of terms in compressed form. access and decoding times may be reduced to acceptable amounts.

it may prove advantageous to arrange the items in the catalog file in the order of their call numbers. this will tend to reduce the number of disk accesses needed to retrieve catalog items in response to queries since it will tend to group relevant items. however, the benefits should be weighed against the additional expense required to maintain and update the ordered file.

the present paper has omitted discussion of the form of the query language or the search algorithm that operates on the elements of the inverted index. a formal definition of one form of query language has been discussed by dimsdale.43 details of a search algorithm and structure of a compressed form of inverted index have been discussed by thiel and heaps.44 it may be noted that each content entry in the hash table file has 4 bytes reserved for a pointer to a bit string of the inverted index. whenever the bit string is less than 4 bytes in length it is stored in the content section and no pointer is required. storage of such bit strings within the content entries significantly reduces the storage requirements of the inverted index and also reduces the number of required disk accesses in the search phase of the program.

acknowledgment

the authors wish to express their appreciation to the national research council of canada for their support of the present investigation.

references

1. d. lefkovitz and r. v. powers, "a list-structured chemical information retrieval system," in g. schecter, ed., information retrieval (washington, d.c.: thompson book co., 1967), p.109-29.
2. p. r. weinberg, "a time sharing chemical information retrieval system" (doctoral thesis, univ. of pennsylvania, 1969).
3. r. m. curtice, "experimental retrieval systems studies. report no. 1. magnetic tape and disc file organization for retrieval" (master's thesis, lehigh univ., 1966).
4. d. lefkovitz, file structures for on-line systems (new york: spartan books, 1969).
5. i. b. holbrook, "a threaded-file retrieval system," journal of the american society for information science 21:40-48 (jan.-feb. 1970).
6. g. g. dodd, "elements of data management systems," computer surveys 1:117-33 (june 1969).
7. j. w. rettenmayer, "file ordering and retrieval cost," information storage and retrieval 8:19-93 (april 1972).
8. r. t. divett, "design of a file structure for a total system computer program for medical libraries and programming of the book citation module" (doctoral thesis, univ. of utah, 1968).
9. h. p. burnaugh, "the bold (bibliographic on-line display) system," in g. schecter, ed., information retrieval (washington, d.c.: thompson book co., 1967), p.53-66.
10. lefkovitz and powers, "a list-structured chemical information," p.109-29.
11. lefkovitz, file structures for on-line systems, p.141.
12. ibid., p.177.
13. f. g. kilgour, "concept of an on-line computerized catalog," journal of library automation 3:1-11 (march 1970).
14. j. l. cunningham, w. d. schieber, and r. m. shoffner, a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley: univ. of california, 1969).
15. r. s. marcus, p. kugel, and r. l. kusik, "an experimental computer stored, augmented catalog of professional literature," in proceedings of the 1969 spring joint computer conference (montvale: afips press, 1969), p.461-73.
16. j. w. henderson and j. a. rosenthal, eds., library catalogs: their preservation and maintenance by photographic and automated techniques; m.i.t. report 14 (cambridge, mass.: m.i.t. press, 1968).
17. i. a. warheit, "file organization of library records," journal of library automation 2:20-30 (march 1969).
18. r. morris, "scatter storage techniques," communications of the acm 11:38-44 (jan. 1968).
19. d. m. murray, "a scatter storage scheme for dictionary lookups," journal of library automation 3:173-201 (sept. 1970).
20. w. buchholz, "file organization and addressing," ibm systems journal 2:86-111 (june 1963).
21. p. l. long, k. b. l. rastogi, j. e. rush, and j. a. wyckoff, "large on-line files of bibliographic data: an efficient design and a mathematical predictor of retrieval behavior," in information processing 71 (north holland publishing company, 1972), p.473-78.
22. buchholz, "file organization," p.102-3.
23. w. p. heising, "note on random addressing techniques," ibm systems journal 2:112-16 (june 1963).
24. murray, "a scatter storage scheme," p.178.
25. ibid., p.181.
26. g. schay and w. g. spruth, "analysis of a file addressing method," communications of the acm 5:459-62 (august 1962).
27. m. tainter, "addressing for random-access storage with multiple bucket capacities," journal of the acm 10:307-15 (july 1963).
28. heising, "note on random addressing," p.112-16.
29. w. h. beyer, handbook of tables for probability and statistics (cleveland: the chemical rubber company, 1966).
30. buchholz, "file organization," p.99.
31. morris, "scatter storage," p.42.
32. w. d. maurer, "an improved hash code for scatter storage," communications of the acm 11:35-38 (jan. 1968).
33. j. r. bell, "the quadratic quotient method: a hash code eliminating secondary clustering," communications of the acm 13:107-9 (feb. 1970).
34. morris, "scatter storage," p.40.
35. w. d. reid and h. s. heaps, "compression of data for library automation," in canadian association of college and university libraries: automation in libraries 1971 (ottawa: canadian library association, 1971), p.2.1-2.21.
36. murray, "a scatter storage scheme," p.183.
37. l. h. thiel and h. s. heaps, "program design for retrospective searches on large data bases," information storage and retrieval 8:1-20 (jan. 1972).
heaps, "program design for retrospective searches on large data bases," information storage and retrieval8:1-20 (jan. 1972) . 38. h. s. heaps, "storage analysis of a compression coding for document data bases," infor 10:47-61 (feb. 1972) . 39. thiel and heaps, "program design," p.l5-16. 40. reid and heaps, "compression of data," p.2.1-2.21. 41. h. kucera and w. n. francis, computational analysis of present-day american english (providence: brown university press, 1967). 42. reid and heaps, "compression of data," p.2.4. 43. j. j. dimsdale, "application of on-line computer systems to library automation" (master's thesis, univ. of alberta, 1971), p.50-68. 44. thiel and heaps, "program design," p.l-20. lib-s-mocs-kmc364-20141005043558 56 highlight of minutes information science and automation division board of directors meeting 1973 midwinter meeting washington, d. c. monday, january 29, 1973 the meeting was called to order by president ralph shoffner at 8:10a.m. the following were present: board-ralph m. shoffner (chairman ) , richard s. angell, don s. culbertson (!sad executive secretary), paul j. fasana, donald p. hammer, susan k. martin, and bemiece coulter, secretary, isad. committee chairman-stephen r. salmon. guestscharles stevens and david weisbrod. report of national commission on library and information science. mr. charles stevens, executive director of the national commission on library and information science, discussed the commission's priorities and objectives for planning libmry and information services for the nation. the commission has identified six areas of activity in which to conduct investigations in relation to its charge which is to study " ... library and information services adequate to meet the needs of the people of the united states." these six: areas are: ( 1) understanding information needs of the users; ( 2) adequacies and deficiencies of current library and information services; ( 3) pattems of organization; ( 4) legal and financial restrictions on libraries; ( 5) technology in library and information systems; and ( 6) human resources. report to ala planning committee. the report to the ala planning committee on !sad's long range plans was deferred until after the !sad objectives committee report is received in june. objectives committee interim report. mr. stephen salmon, chairman, provided an interim report of the committee. the committee will recommend that the division continue to exist and will list its proposed objectives, which may differ from the original objectives. at the request of louise giles, chairman of the information technology discussion group, special attention will be given to that group's interests in formulating the statement of objectives. membership survey committee. mr. shoffner relayed ms. pope's report that the membership survey will cost $700.00, which is not available in the current budget. mr. culbertson said that the cost could be decreased by surveying a sample of 1,000 members. the decision was to highlights of minutes 57 request the full amount for the survey, to be performed in the coming fiscal year. asidic representative. mr. peter t. watson, through correspondence with mr. shoffner, reported that asidic is interested in liaison with ala, and was concerned with the possibility of accomplishing this through isad. mr. culbertson reported that asidic could become an affiliate of ala for a $40.00 fee, but that isad could recommend a formal liaison, especially if isad and asidic had similar interests. motion. 
it was moved by paul fasana that this matter of asidic liaison with ala be passed on to the executive director, mr. robert wedgeworth, and that the president of isad write and inform him of such. seconded by richard angell. carried. policy statement on privacy of data processing records. mr. culbertson had been approached about !sad's making a statement on broad issues of data processing, including privacy. a need has been made known by the ala washington office for having such a statement on which to base their stand in certain hearings. mr. hammer felt it very appropriate that the association (ala) take a position on it. mr. weisbrod mentioned that !sad could be involved because of the vulnerability of machine-readable files due to the large quantity of data processed. motion. it was moved by paul fasana that the isad board recommend to the ala council that it (ala) develop some policy expressing its membership's attitude toward the privacy of machine-readable data. seconded by donald hammer. carried. ]ola editor. mr. shoffner reported, concerning the appointment of an editor, that two contacts were outstanding and he would report to the board on wednesday. mr. culbertson has been serving as temporary editor. mr. fasana noted that the schedule for 1972 was for four issues, but only one had appeared. he asked what plans there were to catch up or cancel. mr. culbertson said that legally isad could not cancel any issues, and that a statement had been written for the "memo to members" section of american libraries. he also mentioned the previous board action to have ]ola te chnical communications become a part of the 1973 volume. wednesday, january 31, 1973 mr. shoffner called the meeting to order at 10:00 a.m. those present were: board-ralph m. shoffner (chairman), richards. angell, dons. culbertson (!sad executive secretary), paul j. fasana, donald p. hammer, susan k. martin, and berniece coulter, secretary, isad. committee chairmen-brigitte kenney, ronald miller, and velma veneziano. guest-peter watson. conference planning committee report. mrs. susan 58 journal of library automation vol. 6/ 1 march 1973 martin, chairman, reported that the 1972 seminar on telecommunications had been successful, and the april seminar with the national microfilm association in detroit was proceeding as scheduled. the seminar on the national libraries, originally scheduled for january, and the seminar on netm works which was to be in march had both been postponed until the next fiscal year. planning of the las vegas preconference program is continuing smoothly; the institute is to be concerned with a review of the state-of-the-art of library automation. it will update the !sad preconference institute of 1967. isad / led education committee report. a written report was submitted. (see exhibit 1.) rtsd / isad / rasd representation in machine-readable form of bibliographic information committee report. chairman vehna veneziano reported that as a result of a ]ola technical communications announcement that the committee meeting was open and that there would be discussion of the controversial international standard bibliographic description ( isbd), 2()0-300 persons attended the committee meeting. the committee felt that changes such as isbd in the marc records by the library of congress should take into account the users of the marc distribution service. committee action on the isbd was delayed until the isbd for serials proposal was further along. 
it was stated that the isbd for serials should be as consistent as possible with the isbd for monographs. the committee suggested that each division publish these standards in its journal. motion. it was moved by paul fasana that the !sad board suggest to the jola editorial board that discussion drafts of standards be published in the ] ournal of library automation. seconded by donald hammer. carried. mrs. veneziano pointed out that a resolution was passed concerning the formation of an ad hoc task force for a period of two years. the task force would work with emerging standards relating to character sets: greek and cyrillic alphabets; mathematical and logical symbols; and control characters relating to communications. three persons were suggested for the task force: charles payne of the university of chicago, david weisbrod of yale, and michael malinconico of the new york public library in addition to lucia rather and henriette avram of the library of congress. the task force would report back to the board through the committee. motion. it was moved by paul fasana that !sad consider the creation of a task force to work with emerging standards relating to character sets and the insertion of a fund request in the isad budget for $1,060 ( $700 for 2 trips for 3 persons and $360 per diem for 3 persons for 2 days for each of 2 trips). seconded by donald hammer. carried. highlights of minutes 59 the committee wished to go on record that since rtsd had recently formed a committee on computer filing that computer filing rules was a function of the interdivisional committee on representation in machinereadable form of bibliographic information. the subject of library codes was discussed. bowker was assigning numeric codes to libraries, book publishers, and book dealers. the committee is concerned about standards and does not wish to see the creation of systems of incompatible codes. telecommunications committee report. brigitte kenney, chairman, submitted a written 18-month report of the committee (exhibit 2). miss kenney announced that she was resigning as chairman of the committee and that no present member was available to assume the chairmanship. mr. hammer, as president-elect, was charged with appointing the next chairman. the function statement of the telecommunications committee has been grouped into four areas: ( 1) communication to members; ( 2) training; ( 3) legislative matters; and ( 4) research. she pointed out that both ]ola technical communications and american libraries had said in writing that they would accept articles on telecommunications, particularly cable tv, and had accepted none. also, she ltad attempted for a year and a half to assemble an information packet at ala headquarters, but did not know the status of the project. headquarters had requested guidelines on cable policy from the committee; she stated that they had not succeeded in completing this task. no guidelines had been provided. mter !sad and ala sources did not respond to a request to publish a cable newsletter, the american society for information science was approached. the asis council approved this the previous friday and she had obtained seed money from the markle foundation. miss kenney referred to the resolution introduced that afternoon in council that an ad hoc ala committee be established to address itself exclusively to cable matters and be representative of all units of ala, and that it take on very specific tasks with clearly delineated time limits. 
she further stated that she had not felt that !sad had given adequate support to the isad telecommunications committee's activities, and thought that the board would have to decide if this was an appropriate committee for !sad. if so, was the function statement too broad? should it be narrowed to just data transfer? miss kenney also suggested that the committee be expanded in size to include more people involved in telecommunications. in the discussion which followed it was indicated that it could take from two to three years to set up a committee in ala as an interdivisional committee. it was decided that a committee chairman should be found and that 60 journal of library automation vol. 6/1 march 1973 the board could then work with the chairman in the definition of the tasks to be performed. publishing of minutes. it was decided that the board of directors express to the editorial board their desire that the minutes of board meetings be published in the journal. seminar and institute topics committee report. ronald miller~ chairman, enumerated the following points of the committee's meeting: that ( 1) a long range plan for seminar programs be written to cover the period from july 1974 through june 1978; (2) part of the money from the institutes be budgeted to support a professional staff person at ala headquarters to handle the burden of the work; ( 3) policy be established concerning commercial groups using isad programs for a marketing channel, particularly products of use to libraries; ( 4) institutes or seminars be regionalized in the u.s. and canada; and ( 5) liaison efforts be utilized (a) within the network of ala, (b) through subcontractors, and (c) through continuing education programs of library schools or other institutes of higher education. in the discussion by the board it was agreed that a written document, both specific and general, be put before the isad membership concerning future seminar and institute topics in order to obtain reactions. ]ola editor appointed motion. it was moved by donald hammer that the board approve the appointment of susan k. martin as editor of the journal of library automation. seconded by paul fasana. carrjed. tribute to don culbertson. "the board commended don s. culbertson for long, energetic and useful service to isad." exhibit 1 january 23, 1973 !sad/led education committee report the isad/led education committee met sunday, january 28, at 9:30 a.m. in the garden restaurant of the shoreham hotel. present were members james liesener, robert kemper, gerald jahoda, edward heiliger, and (ex officio ) ralph shoffner. absent were ann painter and duane johnson. discussion focused on disc (developmental information science curriculum), what has been achieved by the disc contingent working under the aegis of asis, and how isad/ led could contribute to achieving the disc objective of producing transferable "modules" or packaged programs for information science teaching. it was decided that to reach this objective what would be required were: ( 1) an overall structure or frame of reference which could be used to coordinate modules developed by interested and dedicated individuals. (2) specifications for module construction. re 2lt was decided to await the completion of modules currently b eing developed by charles davis and david buttz and to examine these (at las vegas) as providing guidelines for module specifications. 
highughts of minutes 61 re l-it was suggested by ralph shoffner that a frame of reference might be achieved, with some dispatch, by drawing up a list of about 20 questions in the area of information science, which library schools might expect their graduating students to answer, each question being answerable in no more than an hour. the idea was that modules might be designed around these questions. also, it was seen that these questions might serve a useful purpose in organizing information science teaching in light of professional program evaluation and accreditation. the suggestion of "questions" was enthusiastically received and the following day gerald jahoda, edward heiliger and charles davis drew up a "sample" list of questions and outlined the following procedure: ( 1) the sample list of questions is sent to isad /led education committee members as well as to asis sig/eis and asis education committee members for recommendations in the way of additions, deletions and word revisions. by february 15, 1973. (2} the questions are revised and edited by an ad hoc committee consisting of interested members of the three committees involved. by march 30, 1973. ( 3) the revised list of questions is sent to accredited library schools in the u.s. and canada for additions, deletions and word revisions. by april 15. ( 4) !sad/led education committee members together with invited members from the asis committees involved revise the question list at las vegas. (5) designating potential module constructors for each of the questions on the final question list. formulation of module specifications at las vegas. immediately after las vegas the designated module constructors will be solicited. they will be sent a "question" together with module specification. this is where we are january 29, 1973. exhibit 2 respectfully submitted, elaine svenonius telecommunications committee annual report 1972/ 73 1. communications: a. cable newsletter: after exhausting every possible avenue within ala (amlibs, lola technicaj. commtmications, headquarters clearinghouse, information packet) the chairman received the mandate considered necessary to go ahead with plans for an effective communications medium. the mandate came in the form of a unanimous resolution from the 104 attendees at the cable institute, held in september, to produce such a newsletter. !sad board approval/endorsement was obtained, and lbe chairman approached asis which will publish the newsletter. start-up money was obtained from the markle foundation for the first promotional issue, which will receive widest distribution. based on response to the initial mailing, the newsletter will continue on a subscription basis, provided 750 subscriptions are obtained. the chairman and two other people will volunteer their time as coeditors. b. the chairman has been operating a clearinghouse on cable information out of her office, which has become incredibly time-consuming. it is impossible for one person to do all that is needed; innumerable letters have been written and phone conversations held with people and groups wanting advice on dozens of issues connected with cable. it is hoped that the newsletter, the proceedings of the cable institute, and a soon-to-be-established task force within srrt on cable will lessen the almost impossible load. c. specific letters were written in response to requests from the rocky mountain 62 ]oumal of library automation vol. 
6/1 march 1973 federation (justification of library use of the ats-f satellite), senator mike gravel (introduced several bills on telecommunications, wanted to know what libraries could do with this medium), and a presentation will be made to the national commission hearings in new york. d. a librarian-representative was located, suggested, and subsequently appointed to the fcc federal-state-local advisory committee on cable. (a first for librarians!) e. liaison was maintained with nonlibrary groups: publicable, of which the chairman is a member, the mitre symposium on cable, to which the chairman was invited, and, as a result of that meeting, the aspen workshop on cable in palo alto, which the chairman attended by invitation from douglass cater, together with eight other people, to decide on the direction this activity should take. at all three meetings the chairman attempted to represent the library viewpoint on cable. f. a las vegas program was to be planned, together \vith acrl and the av committee. plans did not materialize, and the committee is being approached by the soon-to-be-established srrt task force on cable to cosponsor a program on cable at las vegas. 2. training: 1. institute on cable television for librarians: held september 1720, 1972, and attended by over 100 librarians from thirty-four states, representing public and state libraries primarily, this was directed by the chairman, and funded by usoe. russell shank and frank norwood, consultants to the tc committee presented major talks. the entire institute was videotaped and the tapes are available. proceedings will be issued in march as a double issue of the drexel library quarterly. the institute was designed to provide a format and material (including videotaped presentations) to allow others to do their own institutes. 2. telecommunications seminar: conducted by russell shank, consultant to the tc committee, it presented an overview over various aspects of telecommunications. held in washington september 25-26, 1972, it, too, was attended by almost 100 ji. brarians from all types of libraries. the chairman and frank norwood, consultant, participated in the presentation of papers. 3. legislative matters: the committee expressed its concern to the ala legislation committee about the lack of sufficient personnel to keep abreast of legislative and regulatory matters affecting telecommunications. the chairman of the legi~lation committee responded by stating that the ala washington office had been trying to do their best, in the absence of funding for additional personnel, and would continue to do so. the committee attempts to follow legislative and regulatory developments in the telecommunications area, and works closely with the washington office in this activity, providing persons to testify, and supplying two of the four members of the subcommittee on copyright (shank and kenney) . the committee participated actively in the revision of the ala policy booklet, concerning itself with matters pertaining to networks and telecommunications. all recommendations were incorporated in the final draft of this document. 4. research: the telecommunications requirements study, long ago proposed, is dormant. shank and kenney are actively working on putting together a proposal to respond to a call for proposals from nsf in the area of telecommunications policy research. the committee will discuss the proposal during the midwinter meeting, 1973. 
respectfully submitted, brigitte kenney the library of congress view on its relation to the ala marc advisory committee henriette d. avram: marc development office, libraq of congress. 119 this paper is a statement of the library of congress' 1'ecommendation that a marc advisory committee be appointed within the present structure of the rtsd jisad jrasd committee on representation in machine-readable form of bibliographic information (marbi) and describes the library's proposed relation to such a committee. the proposals and recommendations suggested were adopted by the marbi committee dming its deliberations at ala midwinter, janua1'y 1974, and a1'e now in effect. introduction during ala midwinter, january 1973, the library of congress (lc) suggested to the rtsd/isad/rasd committee on representation in machine-readable form of bibliographic information that a marc advisory committee be formed to work with the marc development office regarding changes made to the various marc formats. the primary interest of the committee would be the serial and monograph formats, though the committee should have interest in and responsibility for reviewing changes in any of the marc formats to insure that the integrity and compatibility of marc content designators are preserved. the marbi committee decided that it would be the marc advisory committee and asked that a paper be prepared proposing how such a committee would operate in relationship to the marc development office. prior to a discussion of marc changes, it appears appropriate to make certain basic statements regarding marc changes and the difficulties experienced by the marc development office in evaluating the significance of a change for the marc subscriber. it would be naive to assume, in a dynamic situation, that even in the best of all worlds a marc subscriber would never have to do any reprogramming. changes in procedures, changes in cataloging, experience in providing the knowledge for more efficient ways to process information, additional requirements from users, etc., have always been factors creating the 120 ] ournal of library automation vol. 7/2 june 197 4 need to both modify andjor expand an automated system. programming installations always require personnel to maintain ongoing systems. situations creating changes locally must exist and, likewise, they also exist at lc. staff of the marc development office give serious consideration to every proposed marc change and its impact on the marc subscribers. however, it must be realized that it is not possible to evaluate fully the impact of each change because the significance of a change is directly dependent on the use made of the elements of the record and the programming techniques used by each subscriber. marc staff cannot possibly know the details of use and programming techniques and capabilities at every user installation. each marc subscriber evaluates a change in light of his operational requirements. since the uses made of the data are varied among users, there is rarely a consensus as to the pros and cons of a change. marc staff are aware of the expenses imposed by changes to software and have made an attempt to solicit preferences in some cases for one technique over another from marc subscribers when changes were required. in the case of the isbd implementation, ten replies were received from questions submitted to the then sixty-two marc users. 
the remainder of this paper describes what is included in the term "change," the various stimuli that initiate changes, and recommendations of how lc and the marc advisory committee should interact in regard to changes. the appendix summarizes in chart form the addenda to books: a marc fo1·mat since the initiation of the marc service. an examination of the chart will reveal that the number and the types of changes have not been too significant. marc changes the term "change" is used throughout this paper in the broad sense, i.e., the term includes additions, modifications, and deletions of content data (in both fixed and variable fields) and content designators (tags, indicators, and subfield codes) made to the format as well as additions, modifications, and deletions made to the tape labels. the concern is with changes made to all records where applicable or groups of records but not with the correction or updating of individual records as part of the marc distribution service. changes as described above fall into several broad types: 1. addition of new fields, indicators, or subfield codes to the format. 2. implementation of aheady defined but unused tags, indicators, subfield codes, or fixed fields. 3. modification of content data of fields (fixed and variable). 4. changes in style of content in records, e.g., punctuation. 5. cessation in use of existing fields, indicators, and subfield codes. library of congress view/ avram 121 the following paragraphs are divided into two sections. section "a" describes the stimulus for a change and the rationale for making it. section "b" describes the lc position regarding the change and, where applicable, a recommendation to the marc advisory committee. changes made to marc records may be divided into the following categories: category 1: changes resulting from a change in cataloging rules or systems. a. cataloging rules or systems fall into two distinct types: those made in consultation with ala (resources & technical services division/cataloging & classification section/descriptive cataloging committee), and those made by the subject cataloging division to the subject cataloging system without consultation with ala. lc follows aacr. since the marc record is the record used for lc bibliographical control as well as the source record for the lc printed card and lc book catalogs (for those items presently within the scope of marc), cataloging changes (descriptive and subject) are necessarily reflected in marc. if the cataloging change is such that the retrospective records can reasonably be modified by automated techniques, these records are modified to reflect the change. prior to marc, this updating could not be provided to subscribers to lc bibliographic products and is one of the advantages of a machine-readable service. it has the effect of maintaining a consistent data base for all marc users. b. changes made in cataloging rules or systems will be made by the appropriate agencies. once changes in cataloging rules have been made by the ala (rtsdjccsjdcc) committee, lc will consult with the marc advisory committee with respect to their implementation in those cases affecting the marc format.'~* wherever possible, depending upon resources available, the number of records affected, and the type of change, the retrospective flies will be updated and made available in one of two ways: if the number of records is small (to be decided by lc), the records will be distributed as corrections through the normal channels of the marc distribution service. 
if the number of records is large, the records will be sold by the lc card division. category 2: changes made to satisfy a requirement of the library of congress. a. since lc uses the marc records for its own purposes, situations do arise in which lc has a requirement for a change. in most cases, lc feels that the change would also be beneficial to the users. under these circumstances lc has carefully evaluated the im""format change is used in this context to mean a change affecting the tags, indicators, subfield codes, addition or deletion of fixed fields, or change to the leader. 122 i oumal of libmry automation vol. 7/2 june 197 4 plication of the change to the marc subscribers and, in some cases, solicited their preferences and advice. b. if lc has a requirement to make a change to marc, the proposed change and the reason for the change will be referred to the marc advisory committee. the marc advisory committee will solicit opinions from marc users as to whether or not to include the change in the marc distribution service, and lc will abide by the committee's recommendation. if this decision is not to include the change, lc will implement the change only in its own data base.t category 3: changes made to satisfy subscribers' requests. a. subscribers sometimes request that a change be made to a marc record. where possible, within the limitation of lc resources, these requests are complied with. lc, when considering such a request, has sought the opinion of the marc subscribers, and if sufficient numbers of users were interested in the change, the change was implemented. b. changes requested by subscribers will be evaluated by lc, and if considered possible to implement, the proposed change will be submitted by lc to the marc advisory committee to solicit opinions from marc users. if the committee recommends, lc will implement the change. catego1·y 4: changes made to support international standardization. a. lc plays a significant role in international activities in the area of machine-readable cataloging records. much of the future expansion of marc depends upon standards in formats, data content, and cataloging. in all these activities, lc firmly supports aacr and current marc formats. occasionally, in order to arrive at complete agreement with agencies in other countries, it becomes necessary for all to compromise. however, in all cases lc does not agree to changes in cataloging rules until the recommendation has been approved by the appropriate ala committee. b. changes resulting from international meetings will fall principally into two areas: 1. cataloging-if the change required is the result of a change in cataloging rules and the ala (rtsdjccsjdcc) has approved the aacr modifications, the marc change falls into category 1. 2. all other changes affecting the format-since lc is the agency in the u.s. that will exchange machine-readable bibliographic records with other national agencies, lc will consider these t an exception to this statement will be those changes to lc practice which must be reflected on cards and in the marc record and which cannot exist in optional form. an example of the above would be abolition of the check digit in the lc card number. libmry of congress viewj avram 123 changes an internal lc requirement; therefore, they can be considered under the proposal described in category 2. lc will submit the proposed changes to the marc advisory committee. category 5: changes made to expand the marc program to include additional services. a. 
if the marc service were static, changes to expand the service would not be possible. an example of an additional service is the cataloging in publication data available on marc tapes. since these cataloging data are available four to six months prior to the publication of the item, it was determined to be of value to marc subscribers and'changes were made to the marc record to make these data available in machine-readable form. b. if a new service is under consideration at lc that will cause a change to marc records, e.g., cataloging in publication, lc will submit the proposal to the marc advisory committee for their action as described in category 2. other lc recommendations for the marc advisory committee 1. time fmme fo1' changes. in order to prevent consultation on changes from taking an inordinate length of time, lc proposes that the marc advisory committee be given two months to solicit comments from marc users, to arrive at a consensus, and to respond to proposed changes. if there is no response during that time, lc will implement the proposed change. lc will notify the marc subscribers two months prior to including the change in the marc distribution service. 2. consultation with the marc advisory committee. the marc development office will submit the recommendation for change and any other information required to evaluate the change to the marc advisory committee. the marc advisory committee will be responsible for submitting the proposal to the marc users and notifying the marc development office of the committee's recommendation. 3. test tapes. the marc advisory committee, on consultation with the marc development office, will consider the requirement for a test tape to reflect the change made to the marc record (the requirement for a test tape is dependent on the type of change made). appendix a addenda to books: a marc format stimul~ for change date change 1. cataloging rules and cataloging system changes 1972 u.s./gt. brit. changed to united states and great britain. comments change made to facilitate machine filing. 124 journal of library automation vol. 7/2 june 1974 appendix a-continued stimulus for change date change 1972 isbd. 1973 isbd-additional information. comments cataloging change based on an international agreement. 2. subscribers requests 1972 government publication code 3. initiated at lc: a. addition or deletion of fields added to fixed field. 1969 abolishment of 653-political jurisdiction (subject) and 750-proper name not capable of authorship.' these little-used fields proved difficult to define and of little value. 1970 addition of encoding level to implemented for use for leader. recon records. 1970 addition of geographic area code field, tag 043. 1971 addition of superintendent of documents field, tag 086. this field has been widely used by lc and subscriber libraries. information added to lc catalog cards (and thus to marc records) at the request of outside libraries. b. additions of indicators 1971 addition of filing indicators. or subfields information needed to allow lc to ignore initial articles in arranging its computerproduced book catalog. c. addition or change of codes or data to existing fields 1972 addition of "q" subfield to fields for conferences entered under place. 1969 code added to modified record indicator in fixed field to indicate shortened records. 1969 code for phonodiscs added to illustration fixed field. 
1970: code added to modified record indicator in fixed field to indicate that the dashed-on entry on the original lc card was not carried in the marc record.
1971: "questionable condition" codes deleted from country of publication code.
1971: geographic area code: guidelines for implementation modified slightly and 23 new codes added.
1971: microfilm call numbers carried in lc call number field. (description of what such call numbers looked like.)
1971: abolished lc card number check digit. (numbers available using check digit too limited.)
d. explanations or corrections
1970: use of "b" subfield with topical subjects (field 650) and geographic subjects (field 651). (subfield and its use inadvertently omitted from books: a marc format; it occurs rarely in marc records.)
1971: use of "revision date" as suffix to lc card number. (explanation of what this information means at lc and how subscribers use it.)
1971: indicators used with romanized title. (explanation of use of indicators with this field omitted from books: a marc format.)
e. changes to labels
1972: change to label to reflect new computer system at lc.

4. national and international agreement
1970: standard book number (9 digits) changed to international standard book number (10 digits) to conform to an international standard.
1971: entry map added to leader to conform to national standard. (adoption of the ansi z39 format for exchange of bibliographic information interchange.)
1971: change to label to conform to ansi standard.

5. new services at lc
1969: changes to label and status codes for cumulated tapes. (to provide for cumulative quarterly and semiannual tapes.)
1971: cip records: addition of codes to encoding level and record status.

technical communications

announcements

panel discussion on "government publications in machine-readable form"
this meeting will be held on july 10 from 8:30 to 10:30 p.m. as a part of the american library association's 1974 new york conference. the meeting is cosponsored by the government documents round table's (godort) machine-readable data file committee, the federal librarians round table (flirt), the rasd information retrieval committee, and the rasd/rtsd/asla public documents committee. the moderator is gretchen dewitt of columbus public library and the panelists are peter watson of ucla, mary pensyl of mit, judith rowe of princeton, and billie salter of yale. mr. watson will discuss the general issues concerning the acquisition and use of bibliographic data files and provide a brief description of some of the files now publicly available; miss pensyl will describe the workings of the project now underway to make these files available to mit users. mrs. rowe will discuss the ways in which government-produced statistical files supplement the related printed reports and will indicate some of the types and sources of files now being released; miss salter will discuss a program for integrating these and other research files into yale's social science reference service. representatives of several federal agencies will display materials describing and documenting both bibliographic and statistical data files.
the purpose of the program is to acquaint reference librarians, particularly those now handling printed documents, with the uses of both types of files, the advantages and disadvantages of these reference tools, and the techniques and policy changes necessary for their use in a library environment. the recent release of the draft proposal produced by the national commission on libraries and information science makes more timely than ever an open discussion of the place of bibliographic and numeric data files in a reference collection. all librarians must be acquainted with these growing resources in order to continue to provide full service to their patrons. for further information, contact judith rowe, computer center, princeton university, 87 prospect ave., princeton, nj 08540.

ninth annual educational media and technology conference to be hosted by university of wisconsin-stout, july 22-24, 1974
aect past president dr. jerry kemp, coordinator of instructional development services for san jose state university (california), and film consultant ralph j. amelio, media coordinator and english instructor at willowbrook high school, villa park, illinois, will headline the university of wisconsin-stout's 9th annual educational media and technology conference to be held in menomonie, wisconsin, on july 22-24, 1974. "educational technology: can we realize its potential?" will be the subject of kemp's presentation on monday evening, while amelio, speaking on tuesday, july 23, will challenge participants with the subject "visual literacy: what can you do?". seven concurrent workshops will be held on monday afternoon: library automation; sound for visuals; making the timesharing computer work for you; new developments in photography; what's new in graphics; selecting and evaluating educational media; and instructional development: how to make it work! individuals leading the three-hour workshops will include: alfred baker, vice-president of science press; john lord, technical service manager for the dukane corporation; william daehling, weber state college, ogden, utah; and several media specialists from learning resources, university of wisconsin-stout. about fifty exhibitors will show and demonstrate both hardware and software during the conference. six case studies will be given of exemplary media programs at the public school, vocational-technical, and college level. further information may be obtained by contacting dr. david p. bernard, dean of learning resources, university of wisconsin-stout, menomonie, wi 54751.

report of recon project published
the library of congress has published in recon pilot project (vii, 49p.) the final report of a project sponsored by lc, the council on library resources, inc., and the u.s. office of education to determine the problems associated with centralized conversion of retrospective catalog records and distribution of these records from a central source. in the marc pilot project, begun in november 1966, the library of congress distributed machine-readable catalog records for english-language monographs, and the success of that project led to the implementation in march 1969 of the marc distribution service, in which over fifty subscribers have by now received more than 300,000 marc records representing the current english-language monograph cataloging at the library of congress.
as coverage is extended to catalog records for foreign-language monographs and for other forms of material, libraries will be able to obtain machine records for a large number of their current titles. more research was needed, however, on the problems of obtaining machine-readable data for retrospective cataloging, and the council on library resources made it possible for lc to engage in november 1968 a task force to study the feasibility of converting retrospective catalog records. the final report of the recon (for retrospective conversion) working task force was published in june 1969. one of the report's recommendations was that a pilot project test various conversion techniques, ideally covering the highest priority materials, english-language monograph records from 1960-68; and with funds from the sponsoring agencies lc initiated a two-year project in august 1969. the present report covers five major areas examined in that period:
1. testing of techniques postulated in the recon report in an operational environment by converting english-language monographs cataloged in 1968 and 1969 but not included in the marc distribution service.
2. development of format recognition, a computer program which can process unedited catalog records and supply all the necessary content designators required for the full marc record.
3. analysis of techniques for the conversion of older english-language materials and titles in foreign languages using the roman alphabet.
4. monitoring the state-of-the-art of input devices that would facilitate conversion of a large data base.
5. a study of microfilming techniques and their associated costs.
recon pilot project is available for $1.50 from the superintendent of documents, u.s. government printing office, washington, dc 20402. stock no. 300000061.

library of congress issues recon working task force report
national aspects of creating and using marc/recon records (v, 48p.) reports on studies conducted at the library of congress by the recon working task force under the chairmanship of henriette d. avram. they were made concurrently with a pilot project by the library to test the feasibility of the plan outlined in the task force's first report entitled conversion of retrospective records to machine-readable form (library of congress, 1969) and in recon pilot project (library of congress, 1972). both the pilot project and the new studies received financial support from the council on library resources, inc., and the u.s. office of education. the present volume describes four investigations: (1) the feasibility of determining a level or subset of the established marc content designators (tags, indicators, and subfield codes) that would still allow a library using it to be part of a future national network; (2) the practicality of the library of congress using other machine-readable data bases to build a national bibliographic store; (3) implications of a national union catalog in machine-readable form; and (4) alternative strategies for undertaking a large-scale conversion project. the appendices include an explanation of the problems of achieving a cooperatively produced bibliographic data base, a description of the characteristics of the present national union catalog, and an analysis of library of congress card orders for one year.
although the findings and recommendations of this report are less optimistic than those of the original recon study, they reaffirm the need for coordinated activity in the conversion of retrospective catalog records and suggest ways in which a large-scale project might be undertaken. the report provides a basis for realistic planning in a critical area of library automation. national aspects of creating and using marc/recon records is available for $2.75 from the superintendent of documents, u.s. government printing office, washington, dc 20402. stock no. 300000062.

isad official activities

tesla information
editor's note: use of the following guidelines and forms is described in the article by john kountz in this issue of jola. the tesla reactor ballot will also appear in subsequent issues of technical communications for reader use, and the tesla standards scoreboard will be presented as cumulated results warrant its publication. to use, photocopy or otherwise duplicate the forms presented in jola-tc, fill out these copies, and mail them to the tesla chairman, mr. john c. kountz, associate for library automation, office of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036.
(the tesla reactor ballot form requests: reactor information, consisting of name, title, organization, address, city, state, zip, and telephone; the identification number of the standard requirement; a position for or against; and the reason for the position, with additional pages used if required. the tesla standards scoreboard lists, for each proposal, the title/i.d. number, the representative, and dates for receipt, screen, division, rejection/acceptance, publish, tally, and target.)

initiative standard proposal outline
the following outline and forms are designed to facilitate review by both the isad committee on technical standards for library automation (tesla) and the membership of initiative standards requirements and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable). note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced, on 8½" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number.
i. title of initiative standard proposal (title).
ii. initiator information (forward). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension
iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process.
iv. purpose.
state the purpose of standard proposal (scope and qualifications).
v. description. briefly describe the standard proposal (specification of the standard).
vi. relationship of other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks).
vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification).
viii. specifications. specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required in addition to text exposition where applicable (specification of the standard).

research and development

system development corporation awarded national science foundation grant to study interactive searching of large literature data bases
santa monica, california: the national science foundation has awarded system development corporation $98,500 for a study of man-machine system communication in on-line retrieval systems. the study will focus on interactive searching of very large literature data bases, which has become a major area of interest and activity in the field of information science. at least seven major systems of national or international scope are in operation within the federal government and private industry, and more systems are on the drawing boards or in experimental operation. the principal investigator for the project will be dr. carlos cuadra, manager of sdc's education and library systems department. the project manager, who will be responsible for the day-to-day operation of the fifteen-month effort, is judy wanger, an information systems analyst and project leader with extensive experience in the establishment and use of interactive bibliographic retrieval services. ms. wanger is currently responsible for user training and customer support on sdc's on-line information service. the study will use questionnaire and interview techniques to collect data related to: (1) the impact of on-line retrieval usage on the terminal user; (2) the impact of on-line service on the sponsoring institution; and (3) the impact of on-line service on the information-utilization habits of the information consumer. attention will also be given to reliability problems in the transmission chain from the user to the computer and back. the major elements in this chain include: the user; the terminal; the telephone instrument; local telephone lines and switchboards; long-haul communications; the communications-computer interface hardware; the computer itself; and various programs in the computer, including the retrieval program.

reports on regional projects and activities

california state university and colleges system union list system
the library systems project of the california state university and colleges has recently completed a production union list system. this system, comprised of eight processing programs to be run in a very modest environment (currently a cdc 3300), is written in ansi cobol and is fully documented. included in the documentation package are user worksheets for bibliographic and holding data, copies of all reports, file layouts, program descriptions, etc. output from this system are files designed to drive graphic quality photocomposition or com devices. the system is available for the price of duplicating the documentation package.
and, for those so desiring, the master file containing some 25,000 titles and titles with references is also available for the cost of duplication. interested parties (bona fides only, please) should contact john c. kountz, associate for library automation, california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036, for further details.

solinet membership meeting
the annual membership meeting of the southeastern library network (solinet) was held at the georgia institute of technology in atlanta, march 14. it was announced that charles h. stevens, executive director of the national commission on libraries and information science, has been named director of solinet effective july 1. john h. gribbin, chairman of the board, will serve as interim director. it was also announced that solinet will be affiliated with the southern regional education board. sreb will provide office space, act as financial disbursing agent, and will be available at all times in an advisory capacity. negotiations are underway for a tie-in to the ohio college library center (oclc) and a proposed contract is in the hands of the oclc legal counsel. it is anticipated that a contract soon will be signed. additional to the tie-in, solinet will proceed with the development of its own permanent computer center in atlanta. this center will eventually provide a variety of services and will be coordinated carefully with other developing networks, looking toward a national library network system. elected to fill three vacancies on the board of directors were james f. govan (university of north carolina), gustave a. harrar (university of florida), and robert h. simmons (west georgia college). they will assume office on july 1. anyone desiring information about solinet should write to 130 sixth st., nw, atlanta, ga 30313.

reports: library projects and activities

new book catalog for junior college district of st. louis
the three community college libraries of the junior college district of st. louis have been using computerized union book catalogs since 1964. formerly maintained and produced by an outside contractor, the catalogs are now one product of a new catalog system recently designed and implemented by instructional resources and data processing staff of the district. known as "ir catalog," the system presently has a data base of approximately 65,000 records describing the print and nonprint collections of the district's three college instructional resource centers. in addition to photocomposed author, subject, and title indexes, the system also produces weekly cumulative printouts which supplement the phototypeset "base" catalog. other output includes three-by-five-inch shelflist cards (which include union holdings information), a motion picture film catalog, subject and cross reference authority lists, and various statistical reports.

hawaii state library system to automate processing
the state board of education in hawaii has approved a proposal for a computerized data processing system for the hawaii state library. the decision allows for the purchase of computer equipment for automating library operations. the state library centrally processes library materials for all public and school libraries in the state. teichiro hirata, acting state education superintendent, told board members a computerized system will speed book selection, ordering, and processing, and will improve interlibrary loan and reference services.
he also pointed out it would facilitate a general streamlining of all technical administrative operations. the system's total cost will be $187,000, of which $58,000 will be spent for computer software. the "biblios" system, designed and developed at orange county public library in california and marketed by information design, inc., was selected as the software package.

the caltech science library catalog supplement
the use of catalog supplements during the necessary maturation period required to take full advantage of the national program for acquisitions and cataloging is obviously an idea whose time has come. the program developed at the california institute of technology, however, differs in several important respects from that previously described by nixon and bell at u.c.l.a.1 for reasons based primarily on faculty pressure, holding books in anticipation of the cataloging copy has never been the practice at the institute. the solution, while hardly unique, is to assign the classification number (dewey) and depend on a temporary main entry card to suffice until the lc copy is available. while this procedure has the distinct advantage of not requiring the presence of the book to complete the cataloging process, it does, however, prevent the user from finding the newest books through a search of the subject added entry cards. the use of computer-based systems is an obvious solution to this aspect of the program but raises several additional problems which formerly seemed to defy solution. as has been pointed out by mason, library-based computer systems can rarely be justified in terms of cost effectiveness, and computer-based library catalogs are no exception.2 part of this problem arises from the natural inclination to repeat in machine language what has been standard practice in the library catalog. this reaction overlooks the very different nature of catalogs and catalog supplements. as catalogs serve as the basis for the permanent record and their cost can be prorated over several decades, the need for a careful description of the many facets of a book is quite properly justified. in the case of catalog supplements, however, where the record will serve quite likely for only a few months, any attempt at detailed description of the book cannot be justified. one solution to this dilemma that has been developed here at caltech is a brief listing supplement which allows searching for a given book by either the first author or editor's last name, a key word from the title, or the first word of a series entry. these elements form the basis of a simple kwoc index (see figure 1), which supplements the bibliographic listing (shown in figure 2).
fig. 1. sample entries from the kwoc index.
fig. 2. sample entries from the bibliographic listing.
fig. 3. sample entries from the weekly list of newly added books.
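the indexing logic just described is simple enough to sketch in a few lines. the following is a minimal, modern illustration (written in python rather than the batch programs actually used at caltech) of building a kwoc-style index from brief title records; the record layout and stop-word list are assumptions made for the example, and the sample data are taken from the figures above.

# minimal kwoc-style index sketch (hypothetical records, not the caltech programs).
# each record mimics the "title card" / "author card" split described above:
# full bibliographic data plus derived index terms.

STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}  # assumed stop list

records = [
    {"title": "chemisorption and catalysis", "author": "hepple",
     "call_no": "541.395 he 1970", "library": "ch", "series": None},
    {"title": "techniques in partial differential equations", "author": "chester",
     "call_no": "517.6 ch 1971", "library": "ch", "series": None},
    {"title": "protein turnover", "author": None,
     "call_no": "612.39 pr 1972", "library": "bi",
     "series": "ciba foundation symposium, 9"},
]

def index_terms(rec):
    """derive kwoc terms: author surname, title keywords, first word of the series."""
    terms = set()
    if rec["author"]:
        terms.add(rec["author"].lower())
    for word in rec["title"].lower().split():
        if word not in STOPWORDS:
            terms.add(word)
    if rec["series"]:
        terms.add(rec["series"].lower().split()[0])
    return terms

def build_kwoc(recs):
    """map each index term to the brief records that should file under it."""
    kwoc = {}
    for rec in recs:
        for term in index_terms(rec):
            kwoc.setdefault(term, []).append(rec)
    return kwoc

if __name__ == "__main__":
    kwoc = build_kwoc(records)
    for term in sorted(kwoc):
        for rec in kwoc[term]:
            print(f'{term:15} {rec["title"]}  {rec["call_no"]} {rec["library"]}')

each significant title word, author surname, and first series word becomes a heading under which the brief entry files, which is all a short-lived supplement needs.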
all books received in the chemistry, physics, and biology libraries are represented in the catalog supplement. weekly lists of newly added books (shown in figure 3) are annotated to show the index terms prior to keypunching. the unit record consists of a "title" card or cards (which contain the full title, author/editor, call number, library designation, and series information) and an "author" card (which contains the index terms). edited material is added accessionally to the card file data base and batch processed on the campus ibm 370/155 computer. the catalog supplement is currently published on 8½-by-11-inch sheets as a result of reducing the computer printout on a xerox 7000 copier. lists are given a vello-bind and delivered to the respective libraries. weeding the catalog supplement is still unresolved. at the present time additions are less than 1,000 per year, so that it may be possible after five years to replace the subject sections of the respective divisional catalogs with the catalog supplement. the "library" at caltech consists of several divisional libraries, each with their own card catalog. these divisional card catalogs are supplemented by a union catalog, which serves all libraries on campus and, because of the strong interdisciplinary nature of the divisional libraries, is much the better source for subject searches. the project is so facile and the costs so minimal that this approach might be of value to many small libraries. it is particularly applicable to the problems recently discussed by patterson.3 books in series, even if they are distinct monographs, are often lost to the user from a subject approach. with this system each physical volume added to the library can be analyzed for possible inclusion in the catalog supplement.
1. roberta nixon and ray bell, "the u.c.l.a. library catalog supplement," library resources & technical services 17:59 (winter 1973).
2. ellsworth mason, "along the academic way," library journal 96:1671 (1971).
3. kelly patterson, "library think vs library user," rq 12:364 (summer 1973).
dana l. roth
millikan library
california institute of technology

commercial activities

richard abel & company to sponsor workshops in library automation and management
one of the most effective forms of continuing education is state-of-the-art reporting. recognizing the need for more such communication, the international library service firm of richard abel & company plans to sponsor two workshops for the library and information science community. the first workshop will deal with the latest techniques in library automation. it will precede the 1974 american library association conference in new york city, july 7-13. the second will present advances in library management, and will be scheduled to precede the 1975 ala midwinter meeting, january 19-25. the workshops will include forums, lectures, and open discussions. they will be presented by recognized leaders in the fields of library automation, management, and consulting. each workshop will probably be one or two days long. there will be no charge to attend either of the workshops, but attendance will be limited, to provide a good discussion atmosphere. for the management workshop, attendance will be limited to librarians active in library management.
similarly, the automation workshop is intended for librarians working in library automation. maintaining the theme of state-of-the-art reporting, the basic content of the workshops will consist of what is happening in library management and automation today. looking to the future, there will also be discussions and forecasts of what is to come. persons interested in further information or in participating in either workshop should contact abel workshop director, richard abel & company, inc., p.o. box 4245, portland, or 97208.

idc introduces bibnet on-line services
the introduction of bibnet on-line systems, a centralized computer-based bibliographic data service for libraries, has been announced by information dynamics corporation. demonstrations are planned for the ala annual conference in new york, july 7-13. according to david p. waite, idc president, "during 1973, bibnet service modules were interconnected over thousands of miles and tested for on-line use with idc's centralized computer-based cataloging data files. this is the culmination of a program that began two years ago. it is patterned after advanced technological developments similar to those recently applied to airline reservation systems and other large scale nationwide computing networks used in industry." idc, a new england-based library systems supplier, will provide a computer-stored cataloging data base of more than 1.2 million library of congress and contributed entries. initially it will consist of all library of congress marc records (now numbering over 430,000 titles), plus another 800,000 partial lc catalog records containing full titles, main entries, lc card numbers, and other selected data elements. as a result, bibnet will provide on-line bibliographic searching for all 1,250,000 catalog records produced by the library of congress since 1969. to enable users to produce library cards from those non-marc records for which only partial entries are kept in the computer, idc will mail card sets from its headquarters and add the full records to the data base for future reference. subscribing libraries will have access to the data base using a minicomputer cathode ray tube (crt) terminal. using this technique of dispersed computing, each bibnet terminal has programmable computer power built in. this in-house processing power, independent of the central computer, allows computer processes like library card production to be performed in the library. this also eliminates waiting for catalog cards to arrive in the mail. bibnet terminals communicate with the central computer over regular telephone lines, eliminating the high costs of dedicated communication lines. therefore, thousands of libraries throughout the united states and canada can avail themselves of on-line services at low cost. bibnet users will have several methods of extracting information from the idc data base. the computer can search for individual records by title, main entry, isbn, or keywords. here's how it works: the operator types in any one of the search items or, if a complete title is not known, a keyword from the title may be used. the cataloging information is then displayed on the crt, where the operator may verify the record. at the push of a button, the data is stored on a magnetic cassette tape which is later used for editing and production of catalog cards by the user library.
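the lookup-and-save workflow described above can be sketched as follows. this is a hypothetical illustration in python, not idc's actual software; the record fields and the tiny in-memory data set are invented for the example, and a plain list stands in for the cassette tape.

# hypothetical sketch of the kind of lookup just described: find a catalog record
# by title keyword, main entry, or isbn, then set it aside for card production.

records = [
    {"lccn": "73-12345", "isbn": "0000000000",
     "main_entry": "example, author a.", "title": "an example title for illustration"},
    {"lccn": "74-67890", "isbn": "1111111111",
     "main_entry": "sample, writer b.", "title": "another illustrative sample record"},
]

def search(query, field):
    """return records whose given field contains the query (case-insensitive)."""
    q = query.lower()
    return [r for r in records if q in (r.get(field) or "").lower()]

def search_any(query):
    """mimic searching by title keyword, main entry, or isbn in a single call."""
    hits = []
    for field in ("title", "main_entry", "isbn"):
        for rec in search(query, field):
            if rec not in hits:
                hits.append(rec)
    return hits

if __name__ == "__main__":
    saved_for_cards = []                      # stands in for the cassette tape
    for rec in search_any("illustration"):    # operator types a title keyword
        print(rec["lccn"], rec["title"])      # operator verifies the displayed record
        saved_for_cards.append(rec)           # "push of a button": save for card production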
the bibnet demonstration in new york will highlight one of many bibliographic service modules available from idc and stress the fact that these services can be utilized by individual libraries and organized groups of libraries.

license for new information retrieval concept awarded to boeing by xynetics
an exclusive license for manufacture and marketing to the government sector of systems incorporating a completely new concept in information storage and retrieval has been awarded to the boeing company, seattle, washington, by xynetics, inc., canoga park, california, it was announced jointly by dr. r. v. hanks, boeing program manager, and burton cohn, xynetics board chairman. the system is said to be the first image storage and retrieval system which offers response times and costs comparable to those of digital systems. the heart of the system is a device of proprietary design, the flat plane memory, which provides rapid access to massive amounts of data stored in high resolution photographic media. the photographic medium enables low cost storage of virtually any type of source material (documents, correspondence, drawings, multitone images, computer output, etc.) while eliminating the need for time-consuming, costly conversion of pre-existing information into a specialized (e.g., digital) format. by virtue of its extremely rapid random access capability, the data needs of as many as several thousand users can be served at remote video terminals from a single memory with near real time response (1-3 seconds, typically). the high speed, high accuracy, and high reliability of the flat plane memory is accomplished primarily through the use of the patented xynetics positioner, which generates direct linear motion at high speeds and with great precision and reliability instead of converting rotary motion. as a result, the positioners eliminate the gears, lead screws, and other mechanical devices previously utilized, and thus achieve the requisite speed, accuracy, and reliability. the xynetics positioners are already being used in automated drafting systems produced by the firm, and in a wide variety of other applications, including the apparel industry and integrated circuit test systems. the new approach could eliminate many of the problems associated with multiple reproductions and distribution of large data files. in addition to many government applications, the system is expected to have major applications in the commercial marketplace.

appointments

charles h. stevens appointed solinet director
charles h. stevens, executive director, national commission on libraries and information science, has been appointed director of the southeastern library network (solinet), effective july 1. the announcement was made at a meeting of solinet in atlanta, march 14, by john h. gribbin, board chairman. composed of ninety-nine institutional members, solinet is headquartered in atlanta. a librarian of acknowledged national stature and an expert on the technical aspects of information retrieval systems, mr. stevens brings to solinet a valuable combination of experience and abilities. concerned with national problems of libraries and information services, he will develop a regional network and move toward a cohesive national program to meet the evolving needs of u.s. libraries. a forerunner in library automation, mr. stevens served for six years as associate director for library development, project intrex, at massachusetts institute of technology.
from 1959-1965 he was director of library and publications at mit's lincoln laboratory, lexington, massachusetts. at purdue university, he was aeronautical engineering librarian and later director of documentation of the thermophysical properties research center. mr. stevens is a member of the council of the american library association, the american society for information science, the special libraries association, and other professional organizations. he is the author of approximately forty papers in the field, lectures widely, and consults on library activities for a number of universities. mr. stevens holds a b.a. in english from principia college, elsah, illinois, and master's degrees in english and in library science from the university of north carolina. mr. stevens has done further study in engineering at brooklyn polytechnic institute. mr. stevens is married and has three sons.

input to the editor:
international scuttlebutt informs us that those in the bibliothecal stratosphere are attempting to formulate a communications format for bibliographical records acceptable on a worldwide basis. we on the local scene unite in wishing them "huzzah!" and "godspeed!" nomenclature must be provided, of course, to designate particular applications; and the following suggestions are offered as possible subspecies of the genus supermarc:
deutschmarc: for records distributed from bonn and/or wiesbaden
rheemarc: for south korean records, named in honor of the late president of that country
bismarc: for records of stage productions which have been produced by popular demand from the top balcony; especially pertinent for wagnerian operas
benchmarc: for records of generally unsuccessful football plays
minskmarc: for byelorussian records
sachermarc: for austrian records, usually representing extremely tasteful concoctions
trademarc: for records pertaining to manufactured products, especially patent medicines
goldmarc: for records representing hungarian musical compositions (v. karl goldmark, 1830-1915)
ectomarc, endomarc, mesomarc (from the italian, mezzomarc): for skinny, fat, and medium-sized records, respectively
landmarc: for records of historic edifices; sometimes (erroneously) applied to records for local geographical regions
feuermarc: for records representing charred or burned documents
montmarc: 1. for records representing works by or about parisian artists; 2. for records representing publications of the french academy
watermarc: for records representing documents contained in bottles washed up on the beach.
joseph a. rosenthal
university of california, berkeley

tv white spaces in public libraries: a primer
kristen radsliff rebmann, emmanuel edward te, and donald means
information technology and libraries | march 2017
kristen radsliff rebmann (kristen.rebmann@sjsu.edu) is associate professor, san jose state university school of information, san jose, ca. emmanuel edward te (emmanueledward.te@sjsu.edu) is a graduate student, san jose state university school of information, san jose, ca. donald means (don@digitalvillage.com) is co-founder and principal of digital village associates, sausalito, ca.

abstract
tv white space (tvws) represents one new wireless communication technology that has the potential to improve internet access and inclusion. this primer describes tvws technology as a viable, long-term access solution for the benefit of public libraries and their communities, especially for underserved populations. discussion focuses first on providing a brief overview of the digital divide and the emerging role of public libraries as internet access providers. next, a basic description of tvws and its features is provided, focusing on key aspects of the technology relevant to libraries as community anchor institutions.
several tvws implementations are described, with discussion of how tvws was set up in several public libraries. finally, consideration is given to first steps that library organizations must take when contemplating new tvws implementations supportive of wi-fi applications and crisis response planning.

introduction
tens of millions of people rely wholly or in part on libraries to provide access to the internet. many lack access to the federal communications commission (fcc) recommended standard of 25 mbps (megabits per second) download speed and 3 mbps upload speed.1 though the fcc reclassified high-speed internet as a public utility under title ii of the telecommunications act to ensure that broadband networks are "fast, fair, and open" in 2015,2 the "digital divide" still remains. one in four community members does not have access to the internet at home. accounting for age and education level, households with the lowest median incomes have service adoption rates of around 50%, compared to those with higher incomes, with rates of 80 to 90%.3 a recent pew research center survey on home broadband adoption found that 43% of those surveyed reported cost being their main reason for non-adoption.4 individuals with low quality or no access are more likely to be digitally disadvantaged, tend to use library computers more frequently, and are less equipped to interact and compete economically as more services and application processes move online.5 this article highlights tv white space (tvws), a new wireless communication technology with the potential to assist libraries in addressing digital access and inclusion issues. this primer provides first a brief overview of the digital divide and the emerging role of public libraries as internet access providers, highlighting the need for cost-efficient, technological solutions. we go further to provide a basic description of tvws and its features, focusing on key aspects of the technology relevant to libraries as community anchor institutions. several tvws implementations are described with discussion of how tvws was set up in several public libraries. finally, we extend consideration to first steps library organizations must consider when contemplating new implementations including everyday applications and crisis response planning.

digital access and inclusion
the term "digital divide" describes the gap between people who can easily access and use technology and the internet, and those who cannot.6 as kinney observes, "there has not been one single digital divide, but rather a series of divides that attend each new technology."7 digital divides are exacerbated by various factors including: socioeconomic status, education, geography, age, ability, language, and especially availability and quality.8 in recent years, the language describing this issue has changed, but the inequalities stay consistent and widen among different dimensions with each emerging technology.
the most recent public policy term “digital inclusion” promotes digital literacy efforts for unserved and underserved populations.9 the progression from the term “digital divide” to “digital inclusion” represents a shift in focus from issues of access exclusively toward contexts and quality of participation and usage. along these lines, the language of digital inclusion reframes the issue by making visible that simply focusing on internet access can obscure the fact that divides associated with quality and effectiveness remain.10 in response to the digital divide, public libraries have become the “unofficial” providers of internet access, stemming from libraries’ access to broadband infrastructure, maintenance of publiclyavailable computers, and services providing assistance and training.11 a pew research center survey on perceptions of libraries found that most respondents reported viewing public libraries as important parts of their communities, providing resources and assisting in decisions regarding what information to trust.12 however, many public libraries are facing an “infrastructure plateau” of internet access due to few computer workstations and slower broadband connection speeds that can support a growing number of users,13 on top of insufficient funding, physical space, and staffing.14 previous surveys show that although public libraries are connected to the internet and provide public access workstations and wireless access, nearly 50% of public libraries only offer wireless access that shares the same bandwidth as their workstations.15 this increased usage strains existing network connections and infrastructure, resulting in slower connections for everyone connected to the public library’s network. many public libraries cannot accommodate more workstations, support the power requirements of both workstations and patrons’ laptops, and afford workstation upgrades and bandwidth increases to move past their insufficient connectivity speeds. libraries often lack the it skills, time, and funds to upgrade their information technology and libraries | march 2017 38 infrastructure.16 typical wireless access via wi-fi is relegated to distances within library buildings, which may extend to exterior spaces and is available only during operating hours. despite these challenges, public libraries continually provide access and “at-the-point-of-need” training and support for their patrons, especially for those who do not have easy access to the internet and computers.17 subsidized by federal funding, libraries represent key access providers and technology trainers for the public without internet access.18 the fcc classifies libraries as “community anchor institutions” (cais), organizations that “facilitate greater use of broadband by vulnerable populations, including low-income, the unemployed, and the aged.”19 recent surveys show that users have a positive view of libraries, providing opportunities to spend time in a safe space, pursue learning, and promote a sense of community. 
librarians offer internet skills training programs more often than other community organizations though (at around 75% of the time) training occurs informally.20 in particular, 29% of respondents to a library use survey reported going to libraries to use computers, the internet, or the wi-fi network; 7% have also reported using libraries’ wi-fi signals outside when libraries are closed.21 the majority of these users are more likely to be young, black, female, and lower income, utilizing library technology resources for school or work (61%), checking email or sending texts (53%), finding health information (38%), and taking online courses or completing certifications (26%).22 public libraries are already exploring creative approaches to providing internet access for these underserved communities. the mobile hotspot lending program in public library systems in new york city and kansas city are just two examples.23 yet libraries must do more by supporting innovation and providing leadership by partnering with other community organizations and their stakeholders to enhance resilience in addressing access and inclusion. the emergence of tvws wireless technology presents an opportunity for libraries to explore expanding the reach of their wireless signals beyond library buildings and extend 24/7 library wi-fi availability to community spaces such as subsidized housing, schools, clinics, parks, senior centers, and museums. tvws basics tv whitespace (tvws) refers to the unoccupied portions of spectrum in the vhf/uhf terrestrial television frequency bands.24 television broadcast frequency allocations traditionally assumed that tv station transmissions operating at high power needed wide spectrum separation to prevent interference between broadcasting channels, which led to the specific spectrum allocation of these frequency “guard bands.”25 research discovered that low-power devices can operate within these spaces, which led the federal communications commission (fcc) to field test tvws applications to wireless communications and (ultimately) promote tvws neutrality.26 in 2015, the federal communications commission (fcc) made a portion of these very valuable tvws bands of spectrum available for open, shared public use, like wi-fi. yet, unlike wi-fi, with a reach measured in 10s of meters, the range of tvws is measured in 100s or even 1000s of meters. tvws has good propagation characteristics, which makes it an extremely valuable license-exempt radio spectrum.27 it is a relatively stable frequency that does not change over time, allowing for tv white spaces in public libraries: a primer | rebmann, te, and means | https://doi.org/10.6017/ital.v36i1.9720 39 spectrum availability estimates to remain reliable and valid, which in turn promotes its various applications.28 radio spectrum is considered a “common heritage of humanity,”29 as radio waves “do not respect national borders.”30 the fcc recently made a portion of these tvws bands of spectrum available for open, shared public use.31 tvws availability and application are contextual and dependent on many key factors. 
availability is influenced by frequency (the idle channels purposely planned in tv bands, varying across regions), deployment (the height and location of the tvws transmit antenna and its installation sites in relation to nearby surrounding tv broadcasting reception), space and distance (geographical areas outside the current planned tv coverage, including no present broadcasting signals), and time (off-air availability of licensed broadcasting transmitters during specific periods of time, subject to change by the broadcaster).32 as tvws existed as fragmented “safety margins” between broadcast services, tvws is typically more abundant in rural areas that have less broadcast coverage and in larger contiguous blocks rather than in highly dense urban areas.33 assigned spectrum is not always used efficiently and effectively by licensees, and exclusive or nonexclusive sharing can alleviate pressure on these resources.34 this “spectrum crunch” of the inefficient use of scarce spectrum resources can be alleviated with dynamic spectrum access (dsa) and spectrum sharing. tvws availability is small where digital television has been deployed, with the potentials for aggregate interference (from tvws users in relation to primary tv service) and self-interference (within the tvws network), which may lead to a “mismatch situation” where there is high demand for bandwidth but very low tvws bandwidth supply.35 as most spectrum frequencies have been organized through some form of exclusive access in which only the licensee can use the specific spectrum, technologies such as cognitive radios can enable new modes of spectrum access, supporting autonomous, self-configuring, self-planning networks which rely on up-to-date tvws availability databases. the limited distribution (in many areas) of basic broadband infrastructure and relatively high cost of access often prevents individuals with lower incomes from participating in the digital revolution of information access and its opportunities.36 despite these challenges to broadband availability, tvws excels in areas with low broadband coverage. rural regions possess greater frequency availability due to lower density of spectrum licensing. in comparison to other frequencies operating higher up on the spectrum band, tvws does not require direct line-of-sight between devices for operation, and has lower deployment costs. equipment market costs are comparable to wi-fi equipment currently on the market.37 importantly, tvws can address access and inclusion by having relatively low start-up costs and no ongoing services fees. as a public resource, it can work with existing services to create new, potentially mobile connections to the internet that ensure the continuation of vital services in the event of service interruptions.38 in urban areas with fewer channels available, new efficient spectrum sharing policies will be necessary. 
assigned spectrum is not always used efficiently and effectively by licensees, and exclusive or non-exclusive sharing or “recycling” of bands for more information technology and libraries | march 2017 40 effective spectrum use by multiple parties with changing spectrum needs can alleviate pressure on these resources.39 tvws for public libraries tvws is a viable medium for applications from internet access, content distribution within a given location, tracking (people, animals, and assets), task automation, and public safety and security,40 as well as remote patient monitoring and other telemedicine applications.41 tvws complements existing networks that use other parts of the spectrum for access points, mobile communications, and home media devices.42 analyses of a recent digital inclusion survey suggest that technology upgrades can have significant impact on the ability of libraries to expand programs and services.43 as community anchor institutions (cais), public libraries can use tvws systems to expand and improve access to their services for their users, especially for underserved populations. library-led collaborations to deploy tvws networks in other cais and public spaces have numerous benefits. in conjunction with building-centered wi-fi, tvws can redistribute network users from congested library spaces to other community sites, thereby distributing network usage across the community. from an existing broadband connection, libraries can extend their networks of internet access strategically across their communities. yet, unlike networks which solely use limited-range wi-fi, far-reaching tvws can improve the coverage and inclusion of patrons in accessing library programs, services and the broader internet.44 the portability of the access points allows libraries to extend their reach by providing wireless connections in the shortterm, for cultural or civic events like fairs, markets, or concerts, and in the long-term, for use at popular public areas. recent tvws pilot installations have proven to be very stable in kansas, colorado, mississippi, and delaware. manhattan public library (kansas)’s tvws project began in fall 2013. though there were a few delays in the installation and testing process, the tvws equipment was successfully implemented and welcomed by the community in early 2014. it staff report that their remote locations have shown that this library service fills a community need, especially for underserved populations.45 delta county libraries (colorado) are conducting trials with two public hotspots to support “guest” access and potentially provide library patrons with more bandwidth access.46 tvws implementations in the pascagoula school district (mississippi)47 and delaware public libraries48 show successful initial pilot usage in providing wireless internet service directly to community-distributed access points. though there are contextual differences across these sites, the strength of public libraries as cais providing internet access via tvws systems is evident and promising. first steps any library can take the initiative in setting up a tvws network on its own. the first step is to assess availability of spectrum in the library’s geographic location. access to tvws frequencies is free and requires no subscription fees other than the initial equipment investment. 
public tv white spaces in public libraries: a primer | rebmann, te, and means | https://doi.org/10.6017/ital.v36i1.9720 41 databases of tvws availability are easily accessible and have been tested by the fcc since 2011;49 google also has posted its own spectrum database as well.50 from this setup, the library gains access to public tvws frequencies by which they can broadcast and receive internet connections from paired tvws-enabled remote hotspots. once it is determined that there is available spectrum/channels in the desired area, libraries can then explore how their current broadband and wireless connections might be expanded to include several community spaces where internet access is needed. next, the library works with a tvws equipment supplier to design and install a tvws network consisting of a base station that is integrated with their wired connection to the internet. finally, the library places tvws-enabled remote hotspots in (previously identified) community-based spaces where wi-fi access is needed by underserved populations. given a high quality backhaul (i.e., fiber optic cable high speed connection), tvws can spread that signal and provide access from the library, which is able to propagate and penetrate multiple barriers and geographical features with a signal up to 10 times stronger than current wi-fi. depending on the context (geographical features, tvws availabilities, etc.), hotspots can be installed up to six miles (10 km) away and do not require line-of-sight between the base station and hotspots. this ability is superior to current wi-fi networks that only cover patrons in the immediate vicinity of the library. these tvws remote hotspots also can be easily (and strategically) moved to support occasional community needs (such as neighborhood-wide or city events) or in response to crisis situations. tvws, libraries, and emergency response public libraries provide leadership as “ready access point, first choice, first refuge, and last resort” for community services in everyday matters and in emergencies.51 they have assisted residents in relief efforts during hurricanes katrina and rita, and other natural and man-made disasters.52 …the provision of access to computers and the internet was a wholly unique and immeasurably important role for public libraries… the infrastructure can be a tremendous asset in times of emergencies, and should be incorporated into community plans.53 they have likewise provided immediate and long-term assistance to communities and aid workers, providing physical space for recovery operations for emergency agencies, communication technologies, and emotional support for the community. 
in previous library internet usage surveys, nearly one-third of libraries reported that their computers and internet services would be used by the public in emergencies to access relief services and benefits.54 such activities include finding and communicating with family and friends, completing online fema forms and insurance claims, and checking news sites regarding information of their affected homes.55 yet, despite the admirable and successful efforts of many public libraries, their infrastructures are not always built to meet the increased demand of user needs and e-government services in emergency contexts.56 jaeger, shneiderman, fleischmann , preece, qu, and wu propose the concept of community response grids (crgs), which utilize the internet and mobile communications devices so that emergency responders and residents in a disaster area can information technology and libraries | march 2017 42 communicate and coordinate accurate, appropriate responses.57 this concept relies on social networks, both in person and online, to enable residents and emergency responders to work together in a multi-directional communication scheme. crgs provide residents tailored, localized information and a means to report pertinent disaster related information to emergency responders, who in turn can synthesize and analyze submitted information and act accordingly.58 due to their existing role as community anchor institutions (cais), public libraries are uniquely positioned for crg involvement. libraries can assist in facilitation of internet access with portable tvws network connection points. by virtue of their portability, tvws hotspots can provide essential digital access in times of crisis by moving along with their affected populations. emergency operations and communications in a crisis occur throughout networks comprised of various technologies. information management before, during, and after a disaster affects how well a crisis is managed.59 broadband internet can be one access route in the event that phone and radio transmissions are affected, and vice versa, as part of a “mixed media approach” to get messages to those that need it in an emergency.60 yet one must remember that internet communications are double-edged: the internet provides relevant material on demand and near instant sharing and collaborating, but these very features can compound a crisis with misinformation.61 despite these concerns, the potential of the integration of wireless devices and other technologies into a multi-technology, collaborative response system can solve the problem of existing communication structures that lack coordination and quality control.62 the proliferation of smartphones, laptops, and other portable wireless devices makes such technology ideal for emergency communications, especially in how users’ familiarity with their own devices will help them navigate crg communications while under stress.63 conclusion supporting internet access and inclusion in public libraries and having equal, affordable, and available access to information is a necessary component to bridging the digital divide. technology has become “an irreducible component of modern life, and its presence and use has significant impact on an individuals’ ability to fully engage in society.”64 as cohron argues, this principle represents more than providing people with internet access: it is about “leveling the playing field in regards to information diffusion. 
the internet is such a prominent utility in people's lives that we, as a society, cannot afford for citizens to go without."65 broadband access is the first step; digital literacy training is also a necessity. access alone is not enough to ensure quality and effective use, however, as the digital divide is representative of broader social inequalities that computer and internet access cannot fully remedy.66 this is a complex problem that requires a multi-faceted solution. as kinney states, "the digital divide is a moving target, and new divides open up with new technologies. libraries help bridge some inequities more than others, and substantial disparities exist among library systems."67 internet access also becomes a necessity when the internet is to play a role in emergency communications.68 it is problematic to suggest that public libraries can be simultaneously promoted as the solution to digital divide issues while facing cuts to funding. policy makers, community advocates, and the community members themselves are stakeholders in the success of their communities, and must also take responsibility for access and inclusion via public libraries.69 as public agencies automate to increase equality and save money, they exacerbate digital divides by excluding those without access. suggesting that community members simply visit the library to ensure access to public services places additional pressure on libraries, yet these efforts may go unsupported and unacknowledged. public libraries are already valuable community access points to resources, especially in emergencies, though many suffer from a lack of concerted disaster planning. along similar lines, many libraries are ill-equipped to accommodate the bandwidth needs of growing and oftentimes sparsely connected populations. as communications and government services move increasingly online, it becomes imperative to build strong, cost-effective information infrastructures. tvws connections can arguably help in breaking down the barriers that challenge ubiquitous access and inclusion. tvws-enabled remote access points in daily use around communities are ideally situated to provide everyday wi-fi and for rapid redeployment to damaged areas (as pop-up hotspots) to provide essential communication and information resources in times of crisis. in short, tvws can augment the technological infrastructure of public libraries toward further developing their roles as cais and leaders serving their communities well into the future. references 1. wireline competition bureau, "2016 broadband progress report," federal communications commission, january 29, 2016, https://www.fcc.gov/reports-research/reports/broadbandprogress-reports/2016-broadband-progress-report. 2. office of chairman wheeler, "fcc adopts strong, sustainable rules to protect the open internet," federal communications commission, february 26, 2015, https://apps.fcc.gov/edocs_public/attachmatch/doc-332260a1.pdf. 3. "here's what the digital divide looks like in the united states," the white house, july 15, 2015, https://www.whitehouse.gov/share/heres-what-digital-divide-looks-united-states. 4. john b. horrigan and maeve duggan, "home broadband 2015," pew research center, december 21, 2015, http://www.pewinternet.org/files/2015/12/broadband-adoptionfull.pdf.
this 43% is further divided between the 33% who report the monthly subscription cost as their main reason and the 10% who report the expensive cost of a computer as their reason for non-adoption. 5. bo kinney, "the internet, public libraries, and the digital divide," public library quarterly 29, no. 2 (2010): 104-161, https://doi.org/10.1080/01616841003779718. 6. madalyn cohron, "the continuing digital divide in the united states," the serials librarian 69, no. 1 (2015): 77-86, https://doi.org/10.1080/0361526x.2015.1036195. 7. kinney, "the internet, public libraries, and the digital divide." 8. paul t. jaeger, john carlo bertot, kim m. thompson, sarah m. katz, and elizabeth j. decoster, "the intersection of public policy and public access: digital divides, digital literacy, digital inclusion, and public libraries," public library quarterly 31, no. 1 (2012): 1-20, https://doi.org/10.1080/01616846.2012.654728. 9. brian real, john carlo bertot, and paul t. jaeger, "rural public libraries and digital inclusion: issues and challenges," information technology and libraries 33, no. 1 (2014): 6-24, https://doi.org/10.6017/ital.v33i1.5141. 10. jaeger et al., "the intersection of public policy and public access." 11. john carlo bertot, paul t. jaeger, lesley a. langa, and charles r. mcclure, "public access computing and internet access in public libraries: the role of public libraries in e-government and emergency situations," first monday 11, no. 9 (2006), https://doi.org/10.5210/fm.v11i9.1392. 12. john b. horrigan, "libraries 2016," pew research center, september 9, 2016, http://www.pewinternet.org/2016/09/09/libraries-2016/. 13. real et al., "rural public libraries and digital inclusion." 14. john carlo bertot, charles r. mcclure, and paul t. jaeger, "the impacts of free public internet access on public library patrons and communities," library quarterly 78, no. 3 (2008): 285-301, https://doi.org/10.1086/588445. 15. charles r. mcclure, paul t. jaeger, and john carlo bertot, "the looming infrastructure plateau? space, funding, connection speed, and the ability of public libraries to meet the demand for free internet access," first monday 12, no. 12 (2007), https://doi.org/10.5210/fm.v12i12.2017. 16. ibid. 17. bertot et al., "public access computing and internet access in public libraries." 18. ibid.; jaeger et al., "the intersection of public policy and public access." 19. wireline competition bureau, "wcb cost model virtual workshop 2012 community anchor institutions," federal communications commission, june 1, 2012, https://www.fcc.gov/newsevents/blog/2012/06/01/wcb-cost-model-virtual-workshop-2012-community-anchorinstitutions. 20. jennifer koerber, "ala and ipac analyze digital inclusion survey," library journal 141, no. 1 (2016): 24-26. 21. horrigan, "libraries 2016." 22. ibid. 23. timothy inklebarger, "bridging the tech gap," american libraries, september 11, 2015, https://americanlibrariesmagazine.org/2015/09/11/bridging-tech-gap-wi-fi-lending. 24. andrew stirling, "white spaces – the new wi-fi?," international journal of digital television 1, no. 1 (2010): 69–83, https://doi.org/10.1386/jdtv.1.1.69/1; cristian gomez, "tv white spaces: managing spaces or better managing inefficiencies?," in tv white spaces a pragmatic approach, eds.
ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 67-77. 25. steve song, “spectrum and development,” in tv white spaces a pragmatic approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 35-40. 26. robert horvitz, “geo-database management of white space vs. open spectrum,” in tv white spaces a pragmatic approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 7-17. 27. julie knapp, “fcc announces public testing of first television white spaces database,” federal communications commission, september 14, 2011, https://www.fcc.gov/newsevents/blog/2011/09/14/fcc-announces-public-testing-first-television-white-spacesdatabase. 28. horvitz, “geo-database management of white space vs. open spectrum.” 29. ryszard strużak and dariusz więcek, “regulatory issues for tv white spaces,” in tv white spaces a pragmatic approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 19-34. 30. horvitz, “geo-database management of white space vs. open spectrum,” 8. 31. engineering & technology bureau, “fcc adopts rules for unlicensed services in tv and 600 mhz bands,” federal communications commission, august 11, 2015, https://apps.fcc.gov/edocs_public/attachmatch/fcc-15-99a1_rcd.pdf. 32. gomez, “tv white spaces: managing spaces or better managing inefficiencies?,” 68. 33. stirling, “white spaces – the new wi-fi?.” 34. linda e. doyle, “cognitive radio and africa,” in tv white spaces a pragmatic approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 109-119. 35. gomez, “tv white spaces: managing spaces or better managing inefficiencies?,” 72. 36. mike jensen, “the role of tv white spaces and dynamic spectrum in helping to improve internet access in africa and other developing regions,” in tv white spaces a pragmatic information technology and libraries | march 2017 46 approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 83-89. 37. song, “spectrum and development.” 38. ibid. 39. doyle, “cognitive radio and africa,” 113. 40. stirling, “white spaces – the new wi-fi?.” 41. afton chavez, ryan littman-quinn, kagiso ndlovu, and carrie l kovarik, “using tv white space spectrum to practice telemedicine: a promising technology to enhance broadband internet connectivity within healthcare facilities in rural regions of developing countries,” journal of telemedicine and telecare 22, no. 4 (2015): 260-263, https://doi.org/10.1177/1357633x15595324. 42. stirling, “white spaces – the new wi-fi?.” 43. koerber, "ala and ipac analyze digital inclusion survey." 44. chavez et al., “using tv white space spectrum to practice telemedicine.” 45. kerry ingersoll, june 22, 2015, google+ comment to the gigabit libraries network, https://plus.google.com/107631107756352079114/posts/l4y8ci8sg5y. 46. delta county libraries, “super wi-fi pilot,” accessed november 1, 2016, http://www.deltalibraries.org/super-wi-fi-pilot/. 47. pascagoula tv white spaces facebook group, accessed november 1, 2016, https://www.facebook.com/psdtvws/. 48. 
"delaware libraries white space pilot update, january 2015," accessed november 1, 2016, http://lib.de.us/files/2015/01/delaware-libraries-white-space-pilot-update-jan-2015.pdf. 49. knapp, "fcc announces public testing of first television white spaces database." 50. see https://www.google.com/get/spectrumdatabase/. 51. bertot et al., "public access computing and internet access in public libraries." 52. bertot et al., "the impacts of free public internet access." see also horrigan, "libraries 2016." 53. paul t. jaeger, lesley a. langa, charles r. mcclure, and john carlo bertot, "the 2004 and 2005 gulf coast hurricanes: evolving roles and lessons learned for public libraries in disaster preparedness and community services," public library quarterly 25, no. 3/4 (2007): 199-214. 54. ibid. 55. bertot et al., "public access computing and internet access in public libraries." 56. ibid. 57. paul t. jaeger, ben shneiderman, kenneth r. fleischmann, jennifer preece, yan qu, and philip fei wu, "community response grids: e-government, social networks, and effective emergency management," telecommunications policy 31 (2007): 592-604, https://doi.org/10.1016/j.telpol.2007.07.008. 58. ibid., 595. 59. laurie putnam, "by choice or by chance: how the internet is used to prepare for, manage, and share information about emergencies," first monday 7, no. 11 (2002), https://doi.org/10.5210/fm.v7i11.1007. 60. ibid. 61. ibid. 62. jaeger et al., "community response grids," 598. jaeger et al. describe how the internet combines the best of one-to-one, one-to-many, many-to-one, and many-to-many in terms of the flow and quality of information. one-to-one communication is slow; many-to-one only benefits the central network, while outsiders reporting emergencies do not learn what others are reporting; one-to-many is inefficient, limited, and assumes the broadcaster has the appropriate information and can get it to those that need it most; many-to-many can create "information overload" of questionable content. 63. ibid., 599. 64. jaeger et al., "the intersection of public policy and public access," 3. 65. cohron, "the continuing digital divide in the united states," 84. 66. kinney, "the internet, public libraries, and the digital divide," 120. 67. ibid., 148. 68. jaeger et al., "community response grids," 599. 69. bertot et al., "the impacts of free public internet access," 299. linking libraries to the web: linked data and the future of the bibliographic record brighid m. gonzales abstract the ideas behind linked data and the semantic web have recently gained ground and shown the potential to redefine the world of the web. linked data could conceivably create a huge database out of the internet linked by relationships understandable by both humans and machines. the benefits of linked data to libraries and their users are potentially great, but so are the many challenges to its implementation. the bibframe initiative provides the possible framework that will link library resources with the web, bringing them out of their information silos and making them accessible to all users.
introduction   for  many  years  now  the  marc  (machine-­‐readable  cataloging)  format  has  been  the  focus  of   rampant  criticisms  across  library-­‐related  literature,  and  though  an  increasing  number  of  diverse   metadata  formats  for  libraries,  archives,  and  museums  have  been  developed,  no  framework  has   shown  the  potential  to  be  a  viable  replacement  for  the  long-­‐established  and  widely  used   bibliographic  format.  over  the  past  decade,  web  technologies  have  been  advancing  at  a   progressively  rapid  pace,  outpacing  marc’s  ability  to  keep  up  with  the  potential  these   technologies  can  offer  to  libraries.  standing  by  the  marc  format  leaves  libraries  in  danger  of  not   being  adequately  prepared  to  meet  the  needs  of  modern  users  in  the  information  environments   they  currently  frequent  (increasingly,  search  engines  such  as  google).   new  technological  developments  such  as  the  ideas  behind  linked  data  and  the  semantic  web  have   the  potential  to  bring  a  host  of  benefits  to  libraries  and  other  cultural  institutions  by  allowing   libraries  and  their  carefully  cultivated  resources  to  connect  with  users  on  the  web.  though  there   remains  a  host  of  obstacles  to  their  implementation,  linked  data  has  much  to  offer  libraries  if  they   can  find  ways  to  leverage  this  technology  for  their  own  uses.  libraries  are  slowly  finding  ways  to   take  advantage  of  the  opportunities  linked  data  present,  including  initiatives  such  as  the   bibliographic  framework  initiative,  known  as  bibframe,  which  may  have  the  potential  to  be  the   bibliographic  replacement  for  marc  that  the  information  community  has  long  needed.  such  a   change  may  help  libraries  not  only  to  stay  current  with  the  modern  information  world  and  stay   relevant  in  the  minds  of  users,  but  also  reciprocally  create  a  richer  world  of  data  available  to   information  seekers  on  the  web.   brighid  gonzales  (brighidmgonzales@gmail.com),  a  recent  mlis  recipient  from  the  school  of   library  and  information  science,  san  jose  state  university,  is  winner  of  the  2014  lita/ex  libris   student  writing  award.     linking  libraries  to  the  web  |  gonzales       11   the  limitations  of  marc   much  has  been  written  over  the  years  about  the  issues  and  shortcomings  of  the  marc  format.   nonetheless,  marc  formatting  has  been  widely  used  by  libraries  around  the  world  since  the   1960s,  when  it  was  first  created.  this  long-­‐established  and  ubiquitous  usage  has  resulted  in   countless  legacy  bibliographic  records  that  currently  exist  in  the  marc  format.  to  lose  this   carefully  crafted  data  or  to  expend  the  finances,  time,  and  manual  effort  required  to  convert  all  of   this  legacy  data  into  a  new  format  may  be  a  cause  for  reservation  in  the  community.   but  the  fact  remains  that  in  spite  of  its  widespread  use,  there  are  many  issues  with  the  marc   format  that  make  it  a  candidate  for  replacement  in  the  world  of  bibliographic  data.   
andresen  describes  several  different  versions  of  marc  that  have  largely  been  wrapped  together  in   the  community’s  mind,  reminding  us  that  “although  marc21  is  often  described  as  an  international   standard,  it  is  only  used  in  a  limited  number  of  countries.”1  in  actuality,  what  we  often  refer  to   simply  as  marc  could  be  marc21,  ukmarc,  unimarc  or  even  danmarc2.2  this  lack  of  a  unified   standard  has  long  been  an  issue  with  this  particular  format.   then  there  is  marc’s  notorious  inflexibility.  originally  created  for  the  description  of  printed   materials,  marc’s  rigidly  defined  standards  can  make  it  unsuited  for  the  description  of  digital,   visual,  or  multimedia  resources.  andresen  writes  that  “the  lack  of  flexibility  means  that  local   additions  might  hinder  exchange  between  local  systems  and  union  catalogue  systems.”3  tennant   has  also  expressed  frustration  with  marc’s  inflexibility,  particularly  its  inability  to  express   hierarchical  relationships.  tennant  posits  that  where  the  marc  format  is  “flat,”  expressing   relationships  involving  hierarchy,  such  as  in  a  table  of  contents,  “would  be  a  breeze  in  xml,”  which   is  the  format  he  recommends  moving  toward  for  its  greater  extensibility.4  marc’s  rigidity  may  also   be  a  reason  why  the  format  is  not  generally  used  outside  of  the  library  environment;  thus   information  contained  in  marc  format  cannot  be  exchanged  with  information  from  nonlibrary   environments.5   inconsistencies,  errors,  and  localized  practices  are  also  issues  frequently  cited  in  detailing  marc’s   inherent  shortcomings.  with  shared  cataloging,  inconsistencies  may  be  less  common,  but  there   remains  the  fact  that  with  any  number  of  individual  catalogers  creating  records,  the  potential  for   error  is  still  great.  and  any  localized  changes  can  also  create  inconsistency  in  records  from  library   to  library.  tennant  gives  as  an  example  recording  the  editor  of  a  book,  which  “should  be  encoded  in   a  700  field,  with  a  $e  subfield  that  specifies  the  person  is  the  editor.  but  the  $e  subfield  is   frequently  not  encoded,  thus  leaving  one  to  guess  the  role  of  the  person  encoded  in  the  700  field.”6   when  it  comes  to  issues  with  marc  in  the  modern  computing  environment,  however,  one  of  the   biggest  and  seemingly  insurmountable  problems  is  its  inability  to  express  the  relationships   between  entities.  andresen  points  out  that  it  is  “difficult  to  handle  relations  between  data  that  are   described  in  different  fields,”7  while  tennant  writes  that  “relationships  among  related  titles  are   problematic.”8  alemu  et  al.  also  write  of  marc’s  “document-­‐centric”  structure,  which  prevents  it     information  technology  and  libraries  |  december  2014   12   from  recognizing  relationships  between  entities  that  might  be  possible  in  a  more  “actionable  data-­‐ centric  format.”9   though  tennant  advocates  the  embrace  of  xml-­‐based  formats  as  a  way  to  transition  from  marc,   breeding  writes  that  even  marcxml  “cannot  fully  make  intelligible  the  quirky  marc  coding  in   terms  of  semantic  relationships.”10  alemu  et  al.  
also  note  that  marc  may  continue  to  be  widely   used  mainly  because  alternatives,  including  xml,  have  not  yet  been  found  to  be  an  adequate   replacement.11   it  is  clear  that  if  libraries  and  their  carefully  crafted  bibliographic  records  are  to  remain  relevant   and  viable  in  today’s  modern  computing  world,  a  more  modern  metadata  format  that  addresses   these  issues  will  be  required.  clearly  needed  is  a  more  flexible  and  extensible  format  that  allows   for  the  expression  of  relationships  between  points  of  data  and  the  ability  to  link  that  data  to  other   related  information  outside  of  the  presently  insular  library  catalog.   linked  data  and  the  semantic  web   linked  data  works  as  the  framework  behind  the  semantic  web,  an  idea  by  world  wide  web   inventor  tim  berners-­‐lee,  which  would  turn  the  internet  into  something  closer  to  one  large   database  rather  than  simply  a  disparate  collection  of  documents.  since  the  internet  is  often  the   first  place  users  turn  to  for  information,  libraries  should  take  advantage  of  the  concepts  behind   linked  data  to  both  put  their  resources  out  on  the  web,  where  they  can  be  found  by  users,  and  in   turn  bring  those  users  back  to  the  library  through  the  lure  of  authoritative,  high-­‐quality  resources.   in  the  world  of  linked  data,  the  relationships  between  data,  not  just  the  documents  in  which  they   are  contained,  are  made  explicit  and  readable  by  both  humans  and  machines.  with  the  ability  to   “understand”  and  interpret  these  semantically  explicit  connections,  computers  will  have  the  power   to  lead  users  to  a  web  of  related  data  based  on  a  single  information  search.  underpinning  the   semantic  web  are  the  web-­‐specific  standards  xml  and  rdf  (resource  description  framework).   these  work  as  universal  languages  for  semantically  labeling  data  in  such  a  way  that  both  a  person   and  a  computer  can  interpret  their  meaning  and  then  distinguishing  the  relationships  between  the   various  data  sources.   these  relationships  are  expressed  using  rdf,  “a  flexible  standard  proposed  by  the  w3c  to   characterize  semantically  both  resources  and  the  relationships  which  hold  between  them.”12  baker   notes  that  rdf  supports  “the  process  of  connecting  dots—of  creating  “knowledge”—by  providing   a  linguistic  basis  for  expressing  and  linking  data.”13  rdf  is  organized  into  triples,  expressing   meaning  as  subject,  verb,  and  object  and  detailing  the  relationships  between  them.  an  example  is   the  catcher  in  the  rye  is  written  by  j.  d.  salinger,  where  the  catcher  in  the  rye  acts  as  the  subject,  j.   d.  salinger  is  the  object  and  the  “verb”  is  written  by  expresses  the  semantic  relationship  between   the  two,  naming  j.  d.  salinger  as  the  author  of  the  catcher  in  the  rye.  by  using  this  framework,   computers  can  link  to  other  rdf-­‐encoded  data,  leading  users  to  other  works  written  by  j.  d.   salinger,  other  adaptations  of  the  catcher  in  the  rye,  and  other  related  data  sources  from  around     linking  libraries  to  the  web  |  gonzales       13   the  web.   
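the catcher in the rye example above can be written down directly as rdf. the short sketch below uses the python rdflib library and dbpedia uris purely for illustration; the property chosen (dcterms:creator) is one common way to say "written by," not the only possible one.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

# uris act as globally unique, stable names for the book and its author
book = URIRef("http://dbpedia.org/resource/The_Catcher_in_the_Rye")
author = URIRef("http://dbpedia.org/resource/J._D._Salinger")

g = Graph()
# one triple: subject (the book), predicate ("written by"), object (the author)
g.add((book, DCTERMS.creator, author))

# serialize the graph as turtle so both humans and machines can read it
print(g.serialize(format="turtle"))
```

given more triples of this kind, a client can follow the creator link from the novel to salinger, and from salinger to every other work linked to him, which is exactly the chain of hops described above.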
rdf  gives  machines  the  ability  to  “understand”  the  semantic  meaning  of  things  on  the  web  and  the   nature  of  the  relationships  between  them.  in  this  way  it  can  make  connections  for  people,  leading   them  to  related  information  they  may  not  have  otherwise  found.  the  use  of  xml  allows  developers   to  create  their  own  tags,  adding  an  explicit  semantic  structure  to  their  documents  that  they  can   exploited  using  rdf.   the  semantic  web  is  based  on  four  rules  explicated  by  web  inventor  tim  berners-­‐lee.  the  rules   for  the  semantic  web  are  as  follows:   1. use  uris  (uniform  resource  identifiers)  as  names  for  things.   2. use  http  uris  so  that  people  can  look  up  those  names.   3. when  someone  looks  up  a  uri,  provide  useful  information,  using  the  standards  (rdf*,   sparql).   4. include  links  to  other  uris  so  that  they  can  discover  more  things.14   uris  act  as  a  permanent  signpost  for  things,  both  on  and  off  the  web.  using  consistent  uris  allows   data  to  be  linked  between  and  back  to  certain  places  on  the  web  without  the  worry  of  broken  or   dead  links.  rdf  triples  map  the  relationships  between  each  thing,  which  can  then  be  linked  to   more  things,  opening  up  a  wide  world  of  interrelated  data  for  users.     the  concept  behind  linked  data  would  allow  for  the  integration  of  library  data  and  data  from   other  resources,  whether  from  “scientific  research,  government  data,  commercial  information,  or   even  data  that  has  been  crowd-­‐sourced.”15  however,  to  create  an  open  web  of  data  facilitated  by   linked  data  theories,  open  standards  such  as  rdf  must  be  used,  making  data  interoperable  with   resources  from  various  communities.  this  interoperability  is  key  to  being  able  to  mix  library   resources  with  those  from  other  parts  of  the  web.   interoperability  helps  to  make  “data  accessible  and  available,  so  that  they  can  be  processed  by   machines  to  allow  their  integration  and  their  reuse  in  different  applications.”16  in  this  way,   machines  would  be  able  to  understand  the  relationships  and  connections  between  data  contained   within  documents  and  thus  lead  users  to  related  data  they  may  not  have  otherwise  found.  using   linked  data  would  bring  carefully  crafted  and  curated  library  data  out  of  the  information  silos  in   which  they  have  long  been  enclosed  and  connect  them  with  the  rest  of  the  web  where  users  can   more  easily  find  them.   benefits  for  libraries   libraries  and  their  users  have  much  to  gain  from  participation  in  the  linked  data  movement.  in  an   age  when  google  is  often  the  first  place  users  turn  when  searching  for  information,  freeing  library   data  from  their  insulated  databases  and  getting  them  out  onto  the  web  where  the  users  are  can   help  make  library  resources  both  relevant  and  available  for  users  who  may  not  make  the  library     information  technology  and  libraries  |  december  2014   14   the  first  place  they  look  for  information.  this  can  lead  not  only  to  increased  use  by  library  patrons   and  nonpatrons  (who  would  now  be  potential  library  patrons)  alike,  but  also  to  increased  visibility   for  the  library.  
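the four rules above are easy to exercise by hand: requesting an http uri while asking for an rdf serialization should return useful, machine-readable statements that in turn point at further uris. a minimal sketch, assuming the uri supports content negotiation (as dbpedia and id.loc.gov, for instance, generally do):

```python
import requests

uri = "http://dbpedia.org/resource/The_Catcher_in_the_Rye"

# ask for rdf (turtle) instead of the human-readable html page
response = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)
response.raise_for_status()

# the returned triples mention further uris that a client can follow (rule 4)
print(response.text[:1000])
```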
creating  and  using  linked  data  technologies  also  opens  the  door  for  libraries  to   share  metadata  and  other  information  in  a  way  previously  limited  by  marc.  libraries  also  have   the  potential  to  add  to  the  richness  of  data  that  is  available  on  the  web,  creating  a  reciprocal   benefit  with  the  semantic  web  itself.   coyle  writes  that  “every  minute  an  untold  number  of  new  resources  is  added  to  our  digital  culture,   and  none  of  these  is  under  the  bibliographic  control  of  the  library.”17  indeed,  the  world  wide  web   is  a  participatory  environment  where  anyone  can  create,  edit  or  manipulate  information  resources.   libraries  still  consider  themselves  the  province  of  quality,  reliable  information,  but  users  don’t   necessarily  go  to  libraries  when  searching  and  don’t  necessarily  have  the  internet  acumen  to   distinguish  between  authoritative  information  and  questionable  resources.  coyle  also  notes  that   “the  push  to  move  libraries  in  the  direction  of  linked  data  is  not  just  a  desire  to  modernize  the   library  catalog;  it  represents  the  necessity  to  transform  the  library  catalog  from  a  separate,  closed   database  to  an  integration  with  the  technology  that  people  use  for  research.”18  using  linked  data,   libraries  can  still  create  the  rich,  reliable,  authoritative  data  they  are  known  for  while  also  making   it  available  on  the  web,  where  potentially  anyone  can  find  it.   much  has  been  written  about  libraries’  information  silos,  and  many  researchers  are  finding  in   linked  data  the  possibility  to  free  this  information.  for  the  information  contained  in  the  library   catalog  to  be  significantly  more  usable  it  “must  be  integrated  into  the  web,  queryable  from  it,  able   to  speak  and  to  understand  the  language  of  the  web.”19  alemu  et  al.  write  that  linking  library  data   to  the  web  “would  allow  users  to  navigate  seamlessly  between  disparate  library  databases  and   external  information  providers  such  as  other  libraries,  and  search  engines.”20  users  are  likely  to   find  the  world  of  linked  data  immeasurably  more  useful  than  individually  searching  library   databases  one-­‐by-­‐one  or  relying  on  google  search  results  for  the  information  they  need.   linked  data  also  allows  for  the  possibility  of  serendipity  in  information  searching,  of  finding   information  one  didn't  even  know  they  were  looking  for,  something  akin  to  browsing  the  library   shelves.21  linked  data  “allows  for  the  richer  contextualization  of  sources  by  making  connections   not  only  within  collections  but  also  to  relevant  outside  sources.”22  tillett  adds  that  linked  data   would  allow  for  “mashups  and  pathways  to  related  information  that  may  be  of  interest  to  the  web   searcher—either  through  showing  them  added  facets  they  may  wish  to  consider  to  refine  their   search  or  suggesting  new  directions  or  related  resources  they  may  also  like  to  see.”23   the  use  of  linked  data  is  not  just  beneficial  to  users  though.  libraries  are  also  likely  to  see   increased  benefits  in  the  sharing  of  metadata  and  other  resources.  alemu  et  al.  
write  that  “making   library  metadata  available  for  re-­‐use  would  eliminate  unnecessary  duplication  of  data  that  is   already  available  elsewhere,  through  reliable  sources.24  tillett  also  writes  about  the  reduced  cost   to  libraries  for  storage  and  data  in  a  linked  data  environment  where  “libraries  do  not  need  to   replicate  the  same  data  over  and  over,  but  instead  share  it  mutually  with  each  other  and  with     linking  libraries  to  the  web  |  gonzales       15   others  using  the  web,”  reducing  costs  and  expanding  information  accessibility.25  byrne  and   goddard  also  note  that  “having  a  common  format  for  all  data  would  be  a  huge  boon  for   interoperability  and  the  integration  of  all  kinds  of  systems.”26   in  addition  to  the  reduced  cost  of  shared  resources,  something  with  which  libraries  are  already   very  familiar,  the  linking  of  data  from  libraries  to  one  another  and  to  the  web  would  also  allow  for   an  increased  richness  in  overall  data.  from  metadata  that  may  need  to  be  changed  or  updated   periodically  to  user-­‐generated  metadata  that  is  more  likely  to  include  current,  up-­‐to-­‐date   terminology,  the  “mixed  metadata”  approach  allowed  by  linked  data  would  be  “better  situated  to   provide  a  richer  and  more  complete”  description  of  various  resources  that  could  more  accurately   provide  for  the  variety  of  interpretation  and  terminology  possible  in  their  description.27   a  new  bibliographic  framework   one  of  the  most  important  ways  libraries  are  moving  toward  the  world  of  linked  data  is  with  the   bibliographic  framework  initiative,  known  as  bibframe,  which  was  announced  by  the  library  of   congress  in  2011.  since  then,  though  bibframe  is  still  in  development,  rapid  progress  has  been   made  that  suggests  that  bibframe  may  be  the  long-­‐awaited  replacement  for  the  marc  format   that  could  free  library  bibliographic  information  from  its  information  silos  and  allow  it  to  be   integrated  with  the  wider  web  of  data.   the  bibframe  model  comprises  four  classes:  creative  work,  instance,  authority,  and  annotation.   in  this  model,  creative  work  represents  the  “conceptual  essence”  of  the  item.  instance  is  the   “material  embodiment”  of  the  creative  work.  authority  is  a  resource  that  defines  relationships   reflected  by  the  creative  work  and  instance,  such  as  people,  places,  topics,  and  organizations.   annotation  relates  the  creative  work  with  other  information  resources,  which  could  be  library   holdings  information,  cover  art,  or  reviews.28  these  are  similar  in  a  way  to  the  frbr  (functional   requirements  for  bibliographic  records)  model,  which  uses  work,  expression,  manifestation,  and   item.29  indeed,  bibframe  is  built  with  rda  (resource  description  and  access)  as  an  important   source  for  content,  which  was  in  turn  built  around  the  principles  in  frbr.  
despite  this,  bibframe   “aims  to  be  independent  of  any  particular  set  of  cataloging  rules.”30   realizing  the  vast  amounts  of  information  that  is  still  recorded  in  marc  format,  the  bibframe   initiative  is  also  working  on  a  variety  of  tools  that  will  help  to  transform  legacy  marc  records  into   bibframe  resources.31  these  tools  will  be  essential  as  “the  conversion  of  marc  records  to   useable  linked  data  is  a  complicated  process.”32  where  marc  allowed  for  libraries  to  share   bibliographic  records  without  each  having  to  constantly  reinvent  the  wheel,  bibframe  will  allow   library  metadata  to  be  “shared  and  reused  without  being  transported  and  replicated.”33   bibframe  would  support  the  linked  data  model  while  also  incorporating  emerging  content   standards  such  as  frbr  and  rda.34  the  bibframe  initiative  is  committed  to  compatibility  with   existing  marc  records  but  would  eventually  replace  marc  as  a  bibliographic  framework  “agnostic   to  cataloging  rules”35  rather  than  intertwined  with  them  as  marc  was  with  aacr2.  also  unlike     information  technology  and  libraries  |  december  2014   16   marc,  which  is  rigidly  structured  and  not  amenable  to  incorporation  with  web  standards,   bibframe  would  enable  library  metadata  to  be  found  on  the  web,  freeing  it  from  the  information   silos  that  have  contained  it  for  decades.  whereas  marc  is  not  very  web-­‐compatible,  “bibframe  is   built  on  xml  and  rdf,  both  ‘native’  schemas  for  the  internet.  the  web-­‐friendly  nature  of  these   schemas  allows  for  the  widest  possible  indexing  and  exposure  for  the  resources  held  in   libraries.”36   backed  by  the  library  of  congress,  bibframe  already  has  a  great  deal  of  support  throughout  the   information  community,  though  it  is  not  yet  at  the  stage  of  implementation  for  most  libraries.   however,  half  a  dozen  libraries  and  other  institutions  are  acting  as  “early  experimenters”  working   to  implement  and  experiment  with  bibframe  to  assist  in  the  development  process  and  get  the   framework  library  ready.  participating  institutions  include  the  british  library,  george  washington   university,  princeton  university,  deutsche  national  bibliothek,  national  library  of  medicine,  oclc,   and  the  library  of  congress.37  though  not  yet  fully  realized,  bibframe  seems  to  offer  a   substantial  step  toward  the  implementation  of  linked  data  to  connect  library  bibliographic   materials  with  other  resources  on  the  web.   the  challenges  ahead   the  road  to  widespread  use  of  the  semantic  web,  linked  data,  and  even  possible  implementations   such  as  bibframe  is  not  without  obstacles.  for  one,  knowledge  and  awareness  is  a  major  concern,   as  well  as  the  intimidating  thought  of  transitioning  away  from  marc,  a  standard  that  has  been  in   widespread  use  for  as  long  as  many  of  the  professionals  using  it  have  been  alive.  there  is  also  the   challenge  and  significant  resources  required  for  converting  huge  stores  of  legacy  data  from  marc   format  to  a  new  standard.  
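the scale of that conversion problem is easy to underestimate: even turning a single legacy record into triples forces choices about identifiers and target vocabularies. the sketch below is a deliberately naive illustration, assuming the pymarc and rdflib libraries (pymarc's record["245"]["a"] style of access), a local file of marc records, and a made-up namespace rather than the actual bibframe vocabulary.

```python
from pymarc import MARCReader
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

EX = Namespace("http://example.org/catalog/")   # illustrative namespace only
g = Graph()

with open("records.mrc", "rb") as fh:            # assumed local marc file
    for i, record in enumerate(MARCReader(fh)):
        work = EX[f"work/{i}"]                   # mint a uri for each record
        g.add((work, RDF.type, EX.Work))
        title = record["245"]["a"] if record["245"] else None
        if title:
            g.add((work, DCTERMS.title, Literal(title)))
        author = record["100"]["a"] if record["100"] else None
        if author:
            # a real conversion would link to an authority uri, not a string
            g.add((work, DCTERMS.creator, Literal(author)))

print(g.serialize(format="turtle"))
```

nothing here approaches the richness of a real bibframe conversion; it only makes visible why converting billions of such records, and agreeing on the vocabulary they should map to, is the expensive part.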
in  addition,  linked  data  has  its  own  set  of  specific  concerns,  such  as   legality  and  copyright  issues  involved  in  the  sharing  of  information  resources,  as  well  as  the   willingness  of  institutions  to  share  metadata  that  they  may  have  invested  a  great  deal  of  time  and   money  in  creating.   many  organizations  may  be  hesitant  to  make  the  move  toward  linked  data  without  a  clear  sign  of   success  from  other  institutions.  chudnov  writes  that  “a  new  era  of  information  access  where   library-­‐provided  resources  and  services  rose  swiftly  to  the  top  of  ambient  search  engines’  results   and  stayed  there”  is  what  may  be  necessary,  as  well  as  “tools  and  techniques  that  make  it  easier  to   put  content  online  and  keep  it  there.”38  byrne  and  goddard  also  note  that  “linked  data  becomes   more  powerful  the  more  of  it  there  is.  until  there  is  enough  linking  between  collections  and   imaginative  uses  of  data  collections  there  is  a  danger  librarians  will  see  linked  data  as  simply   another  metadata  standard,  rather  than  the  powerful  discovery  tool  it  will  underpin.”39    alemu  et  al.  concur  that  making  linked  data  easy  to  create  and  put  online  is  necessary  before   potential  implementers  will  begin  to  use  it.  “it  is  imperative  that  the  said  technologies  be  made   relatively  easy  to  learn  and  use,  analogous  to  the  simplicity  of  creating  html  pages  during  the   early  days  of  the  web.”40  the  potential  learning  curve  involved  in  linked  data  may  be  a  great   barrier  to  its  potential  use.  tennant  writes  in  an  article  about  moving  away  from  marc  to  a  more     linking  libraries  to  the  web  |  gonzales       17   modern  bibliographic  framework  that  users  “must  dramatically  expand  our  understanding  of  what   it  means  to  have  a  modern  bibliographic  infrastructure,  which  will  clearly  require  sweeping   professional  learning  and  retooling.”41   even  without  considering  ease-­‐of-­‐use  difficulties  or  the  challenges  in  teaching  practitioners  an   entirely  new  bibliographic  system,  the  fact  remains  that  transitioning  away  from  marc  toward  any   new  bibliographic  infrastructure  system  will  require  a  great  deal  of  resources,  time  and  effort.   “there  are  literally  billions  of  records  in  marc  formats;  an  attempt  at  making  the  slightest  move   away  from  it  would  have  huge  implications  in  terms  of  resources.”42  breeding  also  writes  of  the   potential  trauma  involved  in  shifting  away  from  marc,  which  is  currently  integral  to  many  library   automation  systems.43  a  shift  to  anything  else  would  require  not  just  the  cooperation  of  libraries   but  also  of  vendors,  who  may  see  no  reason  to  create  systems  compatible  with  anything  other  than   marc.  as  tennant  writes,  “anyone  who  has  ever  been  involved  with  migrating  from  one  integrated   library  system  to  another  knows,  even  moving  from  one  system  based  on  marc/aacr2  to  another   can  be  daunting.”44  moving  from  a  marc/aacr2-­‐based  system  to  one  based  on  an  entirely  new   framework  may  be  more  of  a  challenge  than  many  libraries  would  like  to  take  on.   
a  move  to  something  such  as  bibframe  may  be  fraught  with  even  more  difficulty,  though  it  is   impossible  to  say  before  such  an  implementation  has  been  fully  realized.  library  system  software   is  not  yet  compatible  with  bibframe,  and  as  kroeger  writes,  “most  libraries  will  not  be  able  to   implement  bibframe  because  their  systems  do  not  support  it,  and  software  vendors  have  little   incentive  to  develop  bibframe  integrated  library  systems  without  reasonable  certainty  of  library   implementation  of  bibframe.”45  this  catch-­‐22  situation  may  be  difficult  to  remedy  without  a   large  cooperative  effort  between  libraries,  vendors,  and  the  entire  information  community.   another  potential  obstacle  to  bibframe  implementation  that  kroeger  suggests  is  the  possible   difficulty  in  providing  interoperability  with  all  of  the  many  other  metadata  standards  currently  in   existence.46  this  is  an  issue  that  tennant  also  considers  in  his  recommendations  that  a  new   bibliographic  infrastructure  compatible  with  modern  library  and  information  needs  must  be   versatile,  extensible,  and  especially  interoperable  with  other  metadata  schemes  currently  in  use.47   xml  has  proven  to  be  useful  for  a  wide  variety  of  metadata  schemas,  but  bibframe  would  need  to   be  able  to  make  library  data  held  in  a  huge  variety  of  metadata  standards  available  for  use  on  the   web.   another  issue,  cited  by  byrne  and  goddard,  is  that  of  privacy.  “librarians,  with  their  long  tradition   of  protecting  the  privacy  of  patrons,  will  have  to  take  an  active  role  in  linked  data  development  to   ensure  rights  are  protected.”48  issues  of  copyright  and  ownership,  something  libraries  already   grapple  with  in  the  licensing  of  various  library  journals,  databases,  and  other  electronic  resources,   may  be  insurmountable.  “libraries  no  longer  own  much  of  the  content  they  provide  to  users;   rather  it  is  subscribed  to  from  a  variety  of  vendors.  not  only  does  that  mean  that  vendors  will  have   to  make  their  data  available  in  linked  data  formats  for  improvements  to  federated  search  to   happen,  but  a  mix  of  licensed  and  free  content  in  a  linked  data  environment  would  be  extremely     information  technology  and  libraries  |  december  2014   18   difficult  to  manage.”49  again,  overcoming  obstacles  such  as  these  would  require  intense   negotiation  and  cooperation  between  libraries  and  vendors.  a  sustainable  and  viable  move  to  a   linked  data  environment  would  need  to  be  a  cooperative  effort  between  all  involved  parties  and   would  have  to  have  the  full  support  and  commitment  of  everyone  involved  before  it  could  begin  to   move  forward.   moving  libraries  toward  linked  data   making  the  move  toward  the  use  of  linked  data  and  modern  bibliographic  implementations  such   as  bibframe  will  require  a  great  deal  of  cooperation,  sharing,  learning,  and  investigation,  but   libraries  are  already  starting  to  look  toward  a  linked  future  and  what  it  will  take  to  get  there.   
libraries  will  need  to  begin  incorporating  the  principles  of  linked  open  data  in  their  own  catalogs   and  online  resources  as  well  as  publishing  and  sharing  as  much  data  as  possible.  libraries  also   need  to  put  forth  a  concerted  effort  to  encourage  vendors  to  move  toward  library  systems  which   can  accommodate  a  linked  data  environment.   alemu  et  al.  write  that  cooperation  and  collaboration  between  all  of  the  involved  stakeholders  will   be  a  crucial  piece  to  the  transfer  of  library  metadata  from  catalog  to  web.  in  the  process,  and  as   part  of  this  cooperative  effort,  libraries  will  have  to  wholeheartedly  adopt  the  rdf/xml  format,   something  alemu  et  al.  deem  “mandatory.”50  this  would  support  the  “conceptual  shift  from   perceiving  library  metadata  as  a  document  or  record  to  what  coyle  (2010)  terms  as  actionable   metadata,  i.e.,  one  that  is  machine-­‐readable,  mash-­‐able  and  re-­‐combinable  metadata.”51   chudnov  adds  that  libraries  will  need  to  follow  “steady  url  patterns”  for  as  much  of  their   resources  as  possible,  one  of  the  key  rules  of  linked  data.  52  he  also  notes  that  we  will  know  we   have  made  progress  on  the  implementation  of  linked  data  when  “link  hubs  at  smaller  libraries   (aka  catalogs  and  discovery  systems)  cross  link  between  local  holdings,  authorities,  these  national   authority  files,  and  peer  libraries  that  hold  related  items,”  though  the  real  breakthrough  will  come   when  “the  big  national  hubs  add  reciprocal  links  back  out  to  smaller  hub  sites.”53  before  this  can   happen,  however,  libraries  must  make  sure  that  all  of  their  own  holdings  link  to  each  other,  from   the  catalog  to  items  in  online  exhibits.  chudnov  also  advocates  adding  user-­‐generated  knowledge   into  the  mix  by  allowing  users  to  make  new  connections  between  resources  when  and  where  they   can.54   borst,  fingerle,  and  neubert,  in  their  conference  report  from  2009,  write  that  libraries  and   projects  using  linked  data  need  to  regard  the  catalog  as  a  network,  publish  their  data  as  linked   data  using  the  semantic  web  standards  laid  out  by  tim  berners-­‐lee,  and  link  to  external  uris.55   they  also  suggest  libraries  use  and  help  to  further  develop  open  standards  that  are  already   available  rather  than  rely  on  in-­‐house  developments.56  in  their  final  recommendation,  they  write   that  while  libraries  need  to  publish  their  data  as  open  linked  data  on  the  web,  they  should  also  try   to  do  so  with  the  “least  possible  restrictions  imposed  by  licences  in  order  to  ensure  widest  re-­‐ usability.”57     linking  libraries  to  the  web  |  gonzales       19   conclusion   the  theories  behind  linked  data  and  the  semantic  web  are  still  in  the  process  of  being  drawn  out,   but  it  is  clear  that  at  this  point  they  are  more  than  hypotheticals.  linked  data  is  the  possible  future   of  the  web  and  how  information  will  be  organized,  searched  for,  discovered,  and  retrieved.  
as   search  algorithms  continue  to  improve  and  users  continue  to  turn  to  them  first  (and  sometimes   entirely)  for  their  information  needs,  libraries  will  need  to  make  major  changes  to  ensure  the  data   they  have  painstaking  created  and  curated  over  the  decades  remains  relevant  and  reachable  to   users  on  the  web.  linked  data  provides  the  opportunity  for  libraries  to  integrate  their   authoritative  data  with  user-­‐generated  data  from  the  web,  creating  a  rich  network  of  reliable,   current,  far-­‐reaching  resources  that  will  meet  users’  needs  wherever  they  are.   libraries  have  always  been  known  to  embrace  technology  to  stay  at  the  forefront  of  user  needs   and  provide  unique  and  irreplaceable  user  services.  to  stay  current  with  shifts  in  modern   technology  and  user  behavior,  libraries  need  to  be  a  driving  force  in  the  implementation  of  linked   data,  embrace  semantic  web  standards,  and  take  full  advantage  of  the  benefits  and  opportunities   they  present.  ultimately,  libraries  can  leverage  the  advantages  created  by  linked  data  to  construct   a  better  information  experience  for  users,  keeping  libraries  both  a  relevant  and  more  highly  valued   part  of  information  retrieval  in  the  twenty-­‐first  century.   references     1.   leif  andresen,  “after  marc—what  then?”  library  hi  tech  22,  no.  1  (2004):  41.   2.     ibid.,  40-­‐51.   3.     ibid.,  43.     4.     roy  tennant,  “marc  must  die,”  library  journal  127,  no.  17  (2002):  26–28,   http://lj.libraryjournal.com/2002/10/ljarchives/marc-­‐must-­‐die/#_.     5.     andresen,  “after  marc—what  then?”   6.   tenant,  “marc  must  die.”   7.     andresen,  “after  marc—what  then?”,  43.   8.     tenant,  “marc  must  die.”     9.     getaneh  alemu  et  al.,  “linked  data  for  libraries:  benefits  of  a  conceptual  shift  from  library-­‐ specific  record  structures  to  rdf-­‐based  data  models,”  new  library  world  113,  no.  11/12   (2012):  549-­‐570,  http://dx.doi.org/10.1108/03074801211282920.     10.    marshall  breeding,  “linked  data:  the  next  big  wave  or  another  tech  fad?,”  computers  in   libraries  33,  no.  3  (2013):  20-­‐22,  http://www.infotoday.com/cilmag/.     11.    alemu  et  al.,  “linked  data  for  libraries.”     information  technology  and  libraries  |  december  2014   20     12.    mauro  guerrini  and  tiziana  possemato,  “linked  data:  a  new  alphabet  for  the  semantic  web,”   italian  journal  of  library  &  information  science  4,  no.  1  (2013):  79-­‐80,     http://dx.doi.org/10.4403/jlis.it-­‐6305.   13.    tom  baker,  “designing  data  for  the  open  world  of  the  web,”  italian  journal  of  library  &   information  science  4,  no  1  (2013):  64,  http://dx.doi.org/10.4403/jlis.it-­‐6308.   14.    tim  berners-­‐lee,  “linked  data,”  w3.org,  last  modified  june  18,   2009,  http://www.w3.org/designissues/linkeddata.html.     15.    karen  coyle,  “library  linked  data:  an  evolution,”  italian  journal  of  library  &  information   science  4,  no  1  (2013):  58,  http://dx.doi.org/10.4403/jlis.it-­‐5443.   16.    gianfranco  crupi,  “beyond  the  pillars  of  hercules:  linked  data  and  cultural  heritage,”  italian   journal  of  library  &  information  science  4,  no.  1  (2013),  36,    http://dx.doi.org/10.4403/jlis.it-­‐ 8587.     17.    
coyle,  “library  linked  data:  an  evolution,”  56.       18.    ibid.,  56-­‐57.     19.    crupi,  “beyond  the  pillars  of  hercules,”  35.       20.   alemu  et  al.,  “linked  data  for  libraries,”  562.   21.    ibid.   22.   thea  lindquistet  al.,  “using  linked  open  data  to  enhance  subject  access  in  online  primary   sources,”  cataloging  &  classification  quarterly  51  (2013):  913-­‐928,   http://dx.doi.org/10.1080/01639374.2013.823583.   23.    barbara  tillett,  “rda  and  the  semantic  web,  linked  data  environment,”  italian  journal  of   library  &  information  science  4,  no.  1  (2013):  140,  http://dx.doi.org/10.4403/jlis.it-­‐6303.     24.    alemu  et  al.,  “linked  data  for  libraries.”     25.    tillett,  “rda  and  the  semantic  web,  linked  data  environment,”  140.     26.    gillian  byrne  and  lisa  goddard,  “the  strongest  link:  libraries  and  linked  data,”  d-­‐lib   magazine  16,  no.  11/12  (2010),  http://dx.doi.org/10.1045/november2010-­‐byrne.   27.    alemu  et  al.,  “linked  data  for  libraries,”  560.   28.    library  of  congress,  bibliographic  framework  as  a  web  of  data:  linked  data  model  and   supporting  services,  (washington,  dc:  library  of  congress,  november  21  2012),   http://www.loc.gov/bibframe/pdf/marcld-­‐report-­‐11-­‐21-­‐2012.pdf.   29.    barbara  tillett,  “what  is  frbr?  a  conceptual  model  for  the  bibliographic  universe,”  library  of     linking  libraries  to  the  web  |  gonzales       21     congress,  2003,    http://www.loc.gov/cds/downloads/frbr.pdf.   30.    “bibframe  frequently  asked  questions,”  library  of  congress,   http://www.loc.gov/bibframe/faqs/#q04.   31.    ibid.   32.    lindquist  et  al.,  “using  linked  open  data  to  enhance  subject  access  in  online  primary   sources,”  923.   33.    alan  danskin,  “linked  and  open  data:  rda  and  bibliographic  control.”  italian  journal  of   library  &  information  science  4,  no.  1  (2013):  157,  http://dx.doi.org/10.4403/jlis.it-­‐5463.   34.    erik  t.  mitchell,  “three  case  studies  in  linked  open  data.”  library  technology  reports  49,  no.  5   (2013):  26-­‐43.  http://www.alatechsource.org/taxonomy/term/106.   35.    angela  kroeger,  “the  road  to  bibframe:  the  evolution  of  the  idea  of  bibliographic  transition   into  a  post  marc  future,”  cataloging  &  classification  quarterly  51  (2013):  881,   http://dx.doi.org/10.1080/01639374.2013.823584.   36.    jason  w.  dean,  “charles  a.  cutter  and  edward  tufte:  coming  to  a  library  near  you,  via   bibframe,”  in  the  library  with  the  lead  pipe,  december  4,  2013,   http://www.inthelibrarywiththeleadpipe.org/2013/charles-­‐a-­‐cutter-­‐and-­‐edward-­‐tufte-­‐ coming-­‐to-­‐a-­‐library-­‐near-­‐you-­‐via-­‐bibframe/  .     37.    “bibframe  frequently  asked  questions,”  library  of  congress,   http://www.loc.gov/bibframe/faqs/#q04.   38.    daniel  chudnov,  “what  linked  data  is  missing,”  computers  in  libraries  31,  no.  8  (2011):  35-­‐ 36,http://www.infotoday.com/cilmag.   39.    byrne  and  goddard,  “the  strongest  link:  libraries  and  linked  data.”   40.    alemu  et  al.,  “linked  data  for  libraries,”  557.   41.    roy  tennant,  “a  bibliographic  metadata  infrastructure  for  the  twenty-­‐first  century,”  library   hi  tech  22,  no.  2  (2004):  175-­‐181,  http://dx.doi.org/10.1108/07378830410524602.   42.    alemu  et  al.,  “linked  data  for  libraries,”  556. 
    43.    breeding,  “linked  data.”   44.    tennant,  “a  bibliographic  metadata  infrastructure  for  the  twenty-­‐first  century.”   45.    kroeger,  “the  road  to  bibframe,”  884-­‐885.   46.    ibid.   47.    tennant,  “a  bibliographic  metadata  infrastructure  for  the  twenty-­‐first  century.”     information  technology  and  libraries  |  december  2014   22     48.    byrne  and  goddard,  “the  strongest  link:  libraries  and  linked  data.   49.    ibid.   50.    alemu  et  al.,  “linked  data  for  libraries.”   51.    ibid.,  563.     52.    chudnov,  “what  linked  data  is  missing.”   53.    ibid.   54.    ibid.   55.    timo  borst,  birgit  fingerle,  and  joachim  neubert,  “how  do  libraries  find  their  way  onto  the   semantic  web?”  liber  quarterly  19,  no  3/4  (2010):  336–43,   http://liber.library.uu.nl/index.php/lq/article/view/7970/8271.     56.    ibid.   57.   ibid.,  342-­‐343.   automatic extraction of figures from scientific publications in high-energy physics piotr adam praczyk, javier nogueras-iso, and salvatore mele information technology and libraries | december 2013 25 abstract plots and figures play an important role in the process of understanding a scientific publication, providing overviews of large amounts of data or ideas that are difficult to intuitively present using only the text. state-of-the-art digital libraries, which serve as gateways to knowledge encoded in scholarly writings, do not yet take full advantage of the graphical content of documents. enabling machines to automatically unlock the meaning of scientific illustrations would allow immense improvements in the way scientists work and the way knowledge is processed. in this paper, we present a novel solution for the initial problem of processing graphical content, obtaining figures from scholarly publications stored in pdf. our method relies on vector properties of documents and, as such, does not introduce additional errors, unlike methods based on raster image processing. emphasis has been placed on correctly processing documents in high-energy physics. the described approach distinguishes different classes of objects appearing in pdf documents and uses spatial clustering techniques to group objects into larger logical entities. many heuristics allow the rejection of incorrect figure candidates and the extraction of different types of metadata. introduction notwithstanding the technological advances of large-scale digital libraries and novel technologies to package, store, and exchange scientific information, scientists’ communication pattern has changed little in the past few decades, if not the past few centuries. the key information of scientific articles is still packaged in a form of text and, for several scientific disciplines, in a form of figures. new semantic text-mining technologies are unlocking the information in scientific discourse, and there exist some remarkable examples of attempts to extract figures from scientific publications,1 but current attempts do not provide a sufficient level of generality to deal with figures from high energy physics (hep) and cannot be applied in a digital library like inspire, which is our main piotr adam praczyk (piotr.praczyk@gmail.com) is a phd student at universidad de zaragoza, spain, and research grant holder at the scientific information service of cern, geneva, switzerland. 
javier nogueras-iso (jnog@unizar.es) is associate professor, computer science and systems engineering department, universidad de zaragoza, spain. salvatore mele (salvatore.mele@cern.ch) is leader of the open access section at the scientific information service of cern, geneva, switzerland. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 26 point of interest. scholarly publications in hep tend to contain highly specific types of figures (as any type of graphical content illustrating the text and referenced from it). in particular, they contain a high volume of plots, which are line-art images illustrating a dependency of a certain quality on a parameter. the graphical content of scholarly publications allows much more efficient access to the most important results presented in a publication.2,3 the human brain perceives the graphical content much faster than reading an equivalent block of text. presenting figures with the publication summary when displaying search results would allow more accurate assessment of the article content and in turn lead to a better use of researchers’ time. enabling users to search for figures describing similar quantities or phenomena could become a very powerful tool for finding publications describing similar results. combined with additional metadata, it could provide knowledge about evolution of certain measurements or ideas over time. these and many more applications created an incentive to research possible ways to integrate figures in inspire. inspire is a digital library for hep,4 the application field of this work. it provides a large-scale digital library service (1 million records, fifty-thousand users), which is starting to explore new mechanisms of using figures in articles of the field to index, retrieve, and present information.5,6 as a first step, direct access to graphical content before accessing the text of a publication can be provided. second, a description of graphics (“blue-band plot,” “the yellow shape region”) could be used in addition to metadata or full-text queries to retrieve a piece of information. finally, articles could be aggregated into clusters containing the same or similar plots in a possible alternative automated answer to a standing issue in information management. the indispensable step to realize this vision is an automated, resilient, and high-efficiency extraction of figures from scientific publications. in this paper, we present an approach that we have developed to address this challenge. the focus has been put on developing a general method allowing the extraction of data from documents stored in portable document format (pdf). the results of the algorithm consist of metadata, raster images of a figure, but also vector graphics, which allows easier further processing. the pdf format has been chosen as the input of the algorithm because it is a de facto standard in scientific communication. in the case of hep, mathematics, and other exact sciences, the majority of publications are prepared using the latex document formatting system and later compiled into a pdf file. the electronic versions of publications from outstanding scientific journals are also provided in pdf. the internal structure of pdf files does not always reveal the location of graphics. in some cases, images are included as external entities and easily distinguishable from the rest of a document’s content, but other times they are mixed with the rest of the content. 
therefore, to miss any figures, the low-level structure of a pdf had to be analyzed. the work described in this paper focuses on the area of hep. however, with minor variations, the described methods could be applicable to a different area of knowledge. information technology and libraries | december 2013 27 related work over years of development of digital libraries and document processing, researchers came up with several methods of automatically extracting and processing graphics appearing in pdf documents. based on properties of the processed content, these methods can be divided into two groups. the attempts of the first category deal with pdf documents in general, not making any assumptions about the content of encoded graphics or document type. the methods from the second group are more specific to figures from scientific publications. our approach belongs to the second group. tools include command line programs like pdf-images (http://sourceforge.net/projects/pdfimages/) or web-based applications like pdf to word (http://www.pdftoword.com/). these solutions are useful for general documents, but all suffer from the same difficulties when processing scientific publications: graphics that are recognized by such tools have to be marked as graphics inside pdf documents. this is the case with raster graphics and some other internally stored objects. in the case of scholarly documents, most graphics are constructed internally using pdf primitives and thus cannot be correctly processed by tools from the first group. moreover, general tools do not have the necessary knowledge to produce metadata describing the extracted content. with respect to specific tools for scientific publications it must be noted first that important scientific publishers like springer or elsevier have created services to allow access to figures present in scientific publications: the improvement of the sciverse science direct site (http://www.sciencedirect.com) for searching images in the case of elsevier7 and the springerimages service (http://www.springerimages.com/) in the case of springer.8 these services allow searches triggered from a text box, where the user can introduce a description of the required content. it is also possible to browse images by categories such as types of graphics (image, table, line art, video, etc.). the search engines are limited to searches based on figure captions. in this sense, there is little difference between the image search and text search implemented in a typical digital library. most of existing works aiming at the retrieval and analysis of figures use the rasterized graphical representation of source documents as its basis. browuer et al. and kataria et al. describe a method of detecting plots by means of wavelet analysis.9,10 they focus on the extraction of data points from identified figures. in particular, they address the challenge of correctly identifying overlapping points of data in plots. this problem would not manifest itself often in the case of vector graphics, which is the scenario proposed in our extraction method. vector graphics preserve much more information about the documents content than simple values of pixel colours. in particular, vector graphics describe overlapping objects separately. raster methods are also much more prone to additional errors being introduced during the recognition/extraction phase. 
the methods described in this paper could be used with kataria’s method for documents resulting from a digitization process.11 http://sourceforge.net/projects/pdf-images/) http://sourceforge.net/projects/pdf-images/) http://www.pdftoword.com/). http://www.sciencedirect.com/ http://www.springerimages.com/ automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 28 liu et al. present a page box-cutting algorithm for the extraction of tables from pdf documents.12 their approach is not directly applicable, but their ideas of geometrical clustering of pdf primitives are similar to the ones proposed in our work. however, our experiments with their implementation and hep publications have shown that the heuristics used in their work cannot be directly applied to hep, showing the need for an adapted approach, even in the case of tables. a different category of work, not directly related to graphics extraction but useful when designing algorithms, has been devoted to the analysis of graph use in scientific publications. the results presented by cleveland describe a more general case than hep publications.13 even if the data presented in the work came from scientific publications before 1984, included observations—for example, typical sizes of graphs—were useful with respect to general properties of figures and were taken into account when adjusting parameters of the presented algorithm. finally, there exist attempts to extract layout information from pdf documents. the knowledge of page layout is useful to distinguish independent parts of the content. the approach of layout and content extraction presented by chao and fan is the closest to the one we propose in this paper.14 the difference lies in the fact that we are focusing on the extraction of plots and figures from scientific documents, which usually follow stricter conventions. therefore we can make more assumptions about their content and extract more precise data. for instance, our method emphasizes the role of detected captions and permits them to modify the way in which graphics are treated. we also extract portions of information that are difficult to be extracted using more general methods, such as captions of figures. method pdf files have a complex internal structure allowing them to embed various external objects and to include various types of metadata. however, the central part of every pdf file consists of a visual description of the subsequent pages. the imaging model of pdf uses a language based on a subset of the postscript language. postscript is a complete programming language containing instructions (also called operators) allowing the rendering of text and images on a virtual canvas. the canvas can correspond to a computer screen or to another, possibly virtual, device used to visualize the file. the subset of postscript, which was used to describe content of pdfs, had been stripped from all the flow control operations (like loops and conditional executions), which makes it much simpler to interpret than the original postscript. additionally, the state of the renderer is not preserved between subsequent pages, making their interpretation independent. to avoid many technical details, which are irrelevant in this context, we will consider a pdf document as a sequence of operators (also called the content stream). 
every operator can trigger a modification of the graphical state of the pdf interpreter, which might be drawing a graphical primitive, rendering an external attached object, or modifying a position of the graphical pointer15 or a transformation matrix.16 the outcome of an atomic operation encoded in the content stream depends not only on parameters of the operation, but also on the way previous operators modified information technology and libraries | december 2013 29 the state of the interpreter. such a design makes a pdf file easy to render but not necessarily easy to analyze. figure 1 provides an overview of the proposed extraction method. at the very first stage, the document is pre-processed and operators are extracted (see “pre-processing of operators” below). later, graphical17 and textual18 operators are clustered using different criteria (see “inclusion of text parts” and “detection and matching of captions” below), and the first round of heuristics rejects regions that cannot be considered figures. in the next phase, the clusters of graphical operators are merged with text operators representing fragments of text to be included inside a figure (see “inclusion of text parts” below). the second round of heuristics detects clusters that are unlikely to be figures. text areas detected by the means of clustering text operations are searched for possible figure captions (see “detection and matching of captions” below). captions are matched with corresponding figure candidates, and geometrical properties of captions are used to refine the detected graphics. the last step generates data in a format convenient for further processing (see “generation of the output” below). figure 1. overview of the figure extraction method. additionally, it must be noted that another important pre-processing step of the method consists of the layout detection. an algorithm for segmenting pages into layout elements called page divisions is presented later in the paper. this considerably improves the accuracy of the extraction method because elements from different page divisions can no longer be considered to belong to the same cluster (and subsequently figure). this allows the method to be applied separately to different columns of a document page. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 30 pre-processing of operators the proposed algorithm considers only certain properties of a pdf operator rather than trying to completely understand its effect. considered properties consist of the operators’ type, the region of the page where the operator produces output and, in the case of textual operations, the string representation of the result. for simplicity, we suppress the notion of coordinate system transformation, inherent for the pdf rendering, and describe all operators in a single coordinate system of a virtual 2-dimensional canvas where operations take effect. transformation operators19 are assigned an empty operation region as they do not modify the result directly but affect subsequent operations. in our implementation, an existing pdf rendering library has been used to determine boundaries of operators. rather than trying to understand all possible types of operators, we check the area of the canvas that has been affected by an operation. if the area is empty, we consider the operation to be a transformation. 
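as an illustration, this classification step can be sketched as follows (the check against a list of text-showing operators, described in the next paragraph, is included here for completeness). this is a minimal sketch under stated assumptions, not the extractor's actual code: the Operator record, the Box type, and the hasStringArgument flag stand in for whatever the underlying pdf rendering library reports about an operation; only the text-showing operator names Tj, TJ, ', and " come from the pdf specification.

import java.util.Set;

// Minimal sketch of classifying PDF operators by the canvas area they affect.
public class OperatorClassifier {

    enum Kind { TRANSFORMATION, TEXTUAL, GRAPHICAL }

    // Bounding box of the canvas area touched by an operator; a degenerate or
    // missing box means the operator produced no visible output.
    record Box(double x0, double y0, double x1, double y1) {
        boolean isEmpty() { return x1 <= x0 || y1 <= y0; }
    }

    // name = PDF operator name, box = affected area (assumed to be reported
    // by the rendering library), hasStringArgument = a string was found in
    // the operator's argument list.
    record Operator(String name, Box box, boolean hasStringArgument) { }

    // Text-showing operators from the PDF specification (Tj, TJ, ' and ").
    private static final Set<String> TEXT_OPERATORS = Set.of("Tj", "TJ", "'", "\"");

    static Kind classify(Operator op) {
        if (op.box() == null || op.box().isEmpty()) {
            return Kind.TRANSFORMATION;   // no output: only modifies the state
        }
        if (TEXT_OPERATORS.contains(op.name()) && op.hasStringArgument()) {
            return Kind.TEXTUAL;          // renders glyphs on the canvas
        }
        return Kind.GRAPHICAL;            // everything else draws graphics
    }

    public static void main(String[] args) {
        Operator cm = new Operator("cm", null, false);
        Operator tj = new Operator("Tj", new Box(10, 10, 80, 22), true);
        System.out.println(classify(cm) + " " + classify(tj)); // TRANSFORMATION TEXTUAL
    }
}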
if there exists a non-empty area that has been changed, we check if the operator belongs to a maintained list of textual operators. this list is created based on the pdf specification. if so, the operators argument list is scanned searching for a string and the operation is considered to be textual. an operation that is neither a transformation nor a textual operation is considered to be graphical. it might happen that text is generated using a graphical operator. however, such a situation is unusual. in the case of operators triggering the rendering of other operators, which is the case when rendering text using type-3 fonts, we consider only the top-level operation. in most cases, separate operations are not equivalent to logical entities considered by a human reader (such as a paragraph, a figure, or a heading). graphical operators are usually responsible for displaying lines or curve segments while humans think in terms of illustrations, data lines, etc. similarly, in the case of text, operators do not have to represent complete or separate words or paragraphs. they usually render parts of words and sometimes parts of more than one word. the only assumption we make about the relation between operators and logical entities is that a single operator does not trigger rendering of elements from different detected entities (figures, captions). this is usually true because logical entities tend to be separated by a modification of the context—there is a distance between text paragraphs or an empty space between curves. clustering of graphical operators the clustering algorithm the representation of a document as a stream of rectangles allows the calculation of more abstract elements of the document. in our model, every logical entity of the document is equivalent to a set of operators. the set of all operators of the document is divided into disjoint subsets in the process called clustering. operators are decided to belong to the same cluster based on the position of their boundaries. the criteria for the clustering are based on a simple but important observation: information technology and libraries | december 2013 31 operations forming a logical entity have boundaries lying close to each other. groups of operations forming different entities are separated by empty spaces. algorithm 1. the clustering algorithm. the clustering of textual operations yields text paragraphs and smaller objects like section headings. however, in the case of graphical operations, we can obtain consistent parts of images, but usually not complete figures yet. outcomes of the clustering are utilized during the process of figures detection. algorithm 1 shows the pseudo-code of the clustering algorithm. the input of the algorithm consists of a set of pre-processed operators annotated with their affected area. the output is a division of the input set into disjoint clusters. every cluster is assigned a boundary equal to the smallest rectangle containing boundaries of all included operations. in the first stage of the algorithm (lines 6–20), we organize all input operations in a data structure of forest of trees. every tree describes a separate cluster of operations. the second stage (lines 21– 29) converts the results (clusters) into a more suitable format. 
1: input: operationset input_operations {set of operators of the same type}
2: output: map {spatial clusters of operators}
3: intervaltree tx ← intervaltree()
4: intervaltree ty ← intervaltree()
5: map parent ← map()
6: for all operation op ∈ input_operations do
7:     rectangle boundary ← extendbymargins(op.boundary)
8:     repeat
9:         operationset int_opsx ← tx.getintersectingops(boundary)
10:        operationset int_opsy ← ty.getintersectingops(boundary)
11:        operationset int_ops ← int_opsx ∩ int_opsy
12:        for all operation int_op ∈ int_ops do
13:            rectangle bd ← tx[int_op] × ty[int_op]
14:            boundary ← smallestenclosing(bd, boundary)
15:            parent[int_op] ← op
16:            tx.remove(int_op); ty.remove(int_op)
17:        end for
18:    until int_ops = ∅
19:    tx.add(boundary, op); ty.add(boundary, op)
20: end for
21: map results ← map()
22: for all operation op ∈ input_operations do
23:    operation root_ob ← getroot(parent, op)
24:    rectangle rec ← tx[root_ob] × ty[root_ob]
25:    if not results.has_key(rec) then
26:        results[rec] ← list()
27:    end if
28:    results[rec].add(op)
29: end for
30: return results

the clustering of operations is based on the relation of their rectangles being close to each other. definition 1 formalizes the notion of being close, making it useful for the algorithm.

definition 1: two rectangles are considered to be located close to each other if they are intersecting after expanding their boundaries in every direction by a margin.

the value by which rectangles should be extended is a parameter of the algorithm and might be different in various situations. to detect if rectangles are close to each other, we needed a data structure allowing the storage of a set of rectangles. this data structure was required to allow retrieving all stored rectangles that intersect a given one. we have constructed the necessary structure using an important observation about the operation result areas: in our model, all bounding rectangles have their edges parallel to the edges of the reference canvas on which the output of the operators is rendered. this allowed us to reduce our problem from the case of 2-dimensional rectangles to the case of 1-dimensional intervals. we can assume that the edges of the rectangular canvas define the coordinate system. it is easy to prove that two rectangles with edges parallel to the axes of the coordinate system intersect if and only if both their projections onto the axes intersect, and the projection of a rectangle onto an axis is always an interval. this observation has allowed us to build the required 2-dimensional data structure from two 1-dimensional data structures that store a number of intervals and, for a given interval, return the set of intersecting ones. such a 1-dimensional data structure is provided by interval trees.20 every interval inside the tree has an arbitrary object assigned to it, which in this case is a representation of the pdf operator. this object can be treated as an identifier of the interval. the data structure also implements a dictionary interface, mapping objects to actual intervals. at the beginning, the algorithm initializes two empty interval trees representing projections on the x and y axes, respectively. those trees store projections of the largest cluster areas calculated so far rather than of particular operators. each cluster is represented by the most recently discovered operation belonging to it.
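to make definition 1 and the projection argument concrete, the closeness test can be sketched as below. this is an illustrative sketch, not the extractor's code: the Rect record and its grow/intersects helpers are assumed stand-ins for the structures used in the implementation, and the margin value is the tunable parameter mentioned above.

// Minimal sketch of Definition 1: two axis-aligned rectangles are "close"
// when they intersect after being expanded by a margin, and the 2-D
// intersection test reduces to two 1-D interval tests.
public class CloseRectangles {

    record Rect(double x0, double y0, double x1, double y1) {

        Rect grow(double margin) {   // expand the boundary in every direction
            return new Rect(x0 - margin, y0 - margin, x1 + margin, y1 + margin);
        }

        // Axis-aligned rectangles intersect exactly when both their x and y
        // projections (intervals) intersect.
        boolean intersects(Rect other) {
            boolean xOverlap = x0 <= other.x1 && other.x0 <= x1;
            boolean yOverlap = y0 <= other.y1 && other.y0 <= y1;
            return xOverlap && yOverlap;
        }
    }

    static boolean close(Rect a, Rect b, double margin) {
        return a.grow(margin).intersects(b.grow(margin));
    }

    public static void main(String[] args) {
        Rect a = new Rect(0, 0, 10, 10);
        Rect b = new Rect(14, 0, 24, 10);        // 4 units to the right of a
        System.out.println(close(a, b, 1));      // false: gap larger than the margins
        System.out.println(close(a, b, 2));      // true: expanded boxes touch
    }
}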
during the algorithm execution, each operator from the input set is considered only once. the order of processing is not important. the processing of a single operator proceeds as follows (the interior of the outermost “for all” loop of the algorithm). 1. the boundary of the operation is extended by the width of margins. the spatial data structure described earlier is utilized to retrieve boundaries of all already detected clusters (lines 9–10) 2. the forest of trees representing clusters is updated. the currently processed operation is added without a parent. roots of all trees representing intersecting clusters (retrieved in previous step) are attached as children of the new operation. information technology and libraries | december 2013 33 3. the boundary of the processed operation is extended to become the smallest rectangle containing all boundaries of intersecting clusters and the original boundary. finally, all intersecting clusters are removed from the spatial data structure. 4. lines 9–17 of the algorithm are repeated as long as there exist areas intersecting the current boundary. in some special cases, more than one iteration may be necessary. 5. finally, the calculated boundary is inserted into the spatial data structure as a boundary of a new cluster. the currently processed operation is designed to represent the cluster and so is remembered as a representation of the cluster. after processing all available operations, the post–processing phase begins. all the trees are transformed into lists. the resulting data structure is a dictionary having boundaries of detected clusters as keys and lists of belonging operations as values. this is achieved in lines 21–29. during the process of retrieving the cluster to which a given operation belongs, we use a technique called path compression, known from the union-find data structure.21 filtering of clusters graphical areas detected by a simple clustering usually do not directly correspond to figures. the main reason for this is that figures may contain not only graphics, but also portions of text. moreover, not all graphics present in the document must be part of a figure. for instance, common graphical elements not belonging to a figure include logos of institutions and text separators like lines and boxes; various parts of mathematical formulas usually include graphical operations; and in the case of slides from presentations, the graphical layout should not be considered part of a figure. the above shows that the clustering algorithm described earlier is not sufficient for the purpose of figures detection and it yields a results set wider than expected. in order to take into account the aforementioned characteristics, pre-calculated graphical areas are subject to further refinement. this part of the processing is highly domain-dependent as it is based on properties of scientific publications in a particular domain, in this case publications of hep. in the course of the refinement process, previously computed clusters can be completely discarded, extended with new elements, or some of their parts might be removed. in this subsection we discuss the heuristics applied for rejecting and splitting clusters of graphical operators. there are two main reasons for rejecting a cluster. the first of them is a size being too small compared to a page size. the second is the figure candidate having its aspect ratio outside a desired interval of values. 
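a minimal sketch of these two rejection criteria follows. the numeric thresholds are illustrative placeholders only; as explained below, the real thresholds are configurable properties of the extractor whose values were assigned experimentally.

// Minimal sketch of the two cluster-rejection criteria: too small relative to
// the page, or an aspect ratio outside an accepted interval.
public class ClusterFilter {

    static boolean keep(double width, double height,
                        double pageWidth, double pageHeight) {
        double minRelativeArea = 0.02;   // placeholder: at least 2% of the page area
        double minAspect = 0.2;          // placeholder lower bound on width/height
        double maxAspect = 5.0;          // placeholder upper bound on width/height

        double relativeArea = (width * height) / (pageWidth * pageHeight);
        if (relativeArea < minRelativeArea) {
            return false;                // small decorations, formula fragments
        }
        double aspect = width / height;
        if (aspect < minAspect || aspect > maxAspect) {
            return false;                // thin separators, fraction bars, rules
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(keep(300, 200, 595, 842));  // plausible figure: true
        System.out.println(keep(400, 2, 595, 842));    // horizontal rule: false
    }
}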
the first heuristic is designed to remove small graphical elements appearing for example inside mathematical formulas, but also small logos and other decorations. the second one discards text separators and different parts of mathematical equations, such as a line-separating numerator from a denominator inside a fraction. the thresholds used for filtering are provided as automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 34 configurable properties of the algorithm and their values are assigned experimentally in a way maximising the accuracy of figures detection. additionally, the analysis of the order of operations forming the content stream of a pdf document may help to split clusters that were incorrectly joined by algorithm 1. parts of the stream corresponding to logical parts of the document usually form a consistent subsequence. this observation allows the construction of a method of splitting elements incorrectly clustered together. we can assign content streams not only to entire pdf documents or pages, but also to every cluster of operations. the clustering algorithm presented in algorithm 1 returns a set of areas with a list of operations assigned to each of them. the content stream of a cluster consists of all operations from such a set ordered in the same manner as in the original content stream of the pdf document. the usage of the original content stream allows us to define a distance in the content stream as follows: definition 2. if o 1 and o 2 are two operations appearing in the content stream of the pdf document, by the distance between these operations we understand the number of textual and graphical operations appearing after the first of them and before the second of them. to detect situations when a figure candidate contains unnecessary parts, the content stream of a figure candidate is read from the first to the last operation. for every two subsequent operations, the distance between them in the sense of the original content stream is calculated. if the value is larger than a given threshold, the content stream is split into two parts, which become separate figure candidates. for both candidates, a new boundary is calculated. this heuristic is especially important in the case of less formal publications such as slides from presentations at conferences. presentation slides tend to have a certain number of graphics appearing on every page and not carrying any meaning. simple geometrical clustering would connect elements of page style with the rest of the document content. measuring the distance in the content stream and defining a threshold on the distance facilitates the distinction between the layout and the rest of the page. this technique also might be useful to automatically extract the template used for a presentation, although this transcends the scope of this publication. clustering of textual operators the same algorithm that clusters graphical elements can cluster parts of text. detecting larger logically consistent parts of text is important because they should be treated as single entities during subsequent processing. this comprises, for example, inclusion inside a figure candidate (e.g., captions of axes, parts of a legend) and classification of a text paragraph as a figure caption. inclusion of text parts the next step in figures extraction involves the inclusion of lost text parts inside figure candidates. 
information technology and libraries | december 2013 35 at the stage of operations clustering, only the operations of the same type (graphical or textual) were considered. the results of those initial steps become the input to the clustering algorithm that will detect relations between previously detected entities. by doing this, we move one level farther in the process of abstracting from operations. we start from basic meaningless operations. later we detect parts of graphics and text, and finally we are able to see the relations between both. not all clusters detected at this stage are interesting because some might consist uniquely of text areas. only those results that include at least one graphical cluster may be subsequently considered figure candidates. another round of heuristics marks unnecessary intermediate results as deleted. applied methods are very similar to those described in “filtering of clusters” (above), only thresholds deciding on the rejections must change because we operate on geometrically much larger entities. also the way of application is different—candidates rejected at this stage can be later restored to the status of a figure. instead of permanently removing, heuristics of this stage only mark figure candidates as rejected. this happens in the case of the candidates having incorrect aspect ratio, incorrect sizes or consisting only of horizontal lines (which is usually the case with mathematical formulas but also tables). in addition to using the aforementioned heuristics, having clusters consisting of a mixture of textual and graphical operations allows the application of new heuristics. during the next phase, we analyze the type of operations rather than their relative location. in some cases, steps described earlier might detect objects that should not be considered a figure, such as text surrounded by a frame. this situation can be recognized by the calculation of a ratio between the number of graphical and textual operations in the content stream of a figure candidate. in our approach we have defined a threshold that indicates which figure candidates should be rejected because they contain too few graphics. this allows the removal of, for instance, blocks of text decorated with graphics for aesthetic reasons. the ratio between numbers of graphical and textual operations is smaller for tables than for figures, so extending the heuristic with an additional threshold could improve the table–figure distinction. another heuristic analyzes ratio between the total area of graphical operations and the area of the entire figure candidate. subsequently, we mark as deleted the figure candidates containing horizontal lines as the only graphical operations. these candidates describe tables or mathematical formulas that have survived previous steps of the algorithm. tables can be reverted to the status of figure candidates in later stages of processing. figure candidates that survive all the phases of filtering are finally considered to be figures. figure 2 shows a fragment of a publication page with indicated text areas and final figure candidates detected by the algorithm. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 36 figure 2. a fragment of the pdf page with boxes around every detected text area and each figure candidate. dashed rectangles indicate figure candidates. solid rectangles indicate text areas. 
detection and matching of captions the input of the part of the algorithm responsible for detecting figure captions consists of previously determined figures and all text clusters. the observation of scientific publications shows that, typically, captions of figures start with a figure identifier (for instance see the grammar for figure captions proposed by bathia, lahiri, and mitra.22 the identifier usually starts with a word describing a figure type and is followed by a number or some other unique identifier. in more complex documents, the figure number might have a hierarchical structure reflecting, for example, the chapter number. the set of possible figure types is very limited. in the case of hep publications, the most usual combinations include words “figure”, “plot,” and different variations of their spelling and abbreviating. information technology and libraries | december 2013 37 during the first step of the caption detection, all text clusters from the publication page are tested for the possibility of being a caption. this consists of matching the beginning of the text contained in a textual cluster with a regular expression determining what is a figure caption. the role of the regular expression is to elect strings starting with one of the predefined words, followed by an identifier or beginning of a sentence. the identifier is subsequently extracted and included in the metadata of a caption. the caption detection has to be designed to reject paragraphs of the type “figure 1 presents results of (. . .)”. to achieve this, we reject the possibility of having any lowercase text after the figure identifier. having the set of all the captions, we start searching for corresponding figures. all previous steps of the algorithm take into account the division of a page into text columns (see “detection of the page layout” below). when matching captions with figure candidates, we do not take into account the page layout. matching between figure candidates and captions happens at every document page separately. we consider every detected caption once, starting with those located at the top of the page and moving down toward the end. for every caption we search figure candidates lying nearby. first we search above the caption and, in the case of failure, we move below the caption. we take into account all figure candidates, including those rejected by heuristics. in the case of finding multiple figure candidates corresponding to a caption, we merge them into a single figure, treating previous candidates as subfigures of a larger figure. we also include small portions of text and graphics previously rejected from figure candidates that lie between figure and caption and between different parts of a figure. these parts of text usually contain identifiers of the subfigures. the amount of unclustered content that can be included in a figure is a parameter of the extraction algorithm and is expressed as a percentage of the height of the document page. it might happen that captions are located in a completely different location, but this case is rare and tends to appear in older publications. the distance from the figure is calculated based on the page geometry. the captions should not be too distant from the figure. generation of the output the choice of the format in which data should be saved at the output of the extraction process should take into account further requirements. 
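returning briefly to the caption detection described above, the matching of a text cluster against a caption pattern might look like the sketch below. the exact regular expression used by the extractor is not reproduced here, so this pattern (a keyword such as "figure" or "plot", an identifier, and the rejection of an immediately following lowercase word, as in "figure 1 presents ...") is an assumption for illustration.

import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of testing whether a text cluster starts like a figure caption
// and of extracting the figure identifier.
public class CaptionMatcher {

    // "figure"/"fig."/"plot", an identifier like "3", "3a" or "2.1", then either
    // punctuation, end of text, or anything that is not a lowercase word.
    private static final Pattern CAPTION = Pattern.compile(
            "^(?i:fig(?:ure)?\\.?|plot)\\s+(\\d+[a-zA-Z]?(?:\\.\\d+)*)\\s*+(?:[:.\\-]|$|(?![a-z]))");

    static Optional<String> captionIdentifier(String text) {
        Matcher m = CAPTION.matcher(text.strip());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(captionIdentifier("Figure 3: Invariant mass distribution")); // Optional[3]
        System.out.println(captionIdentifier("figure 1 presents results of the fit"));  // Optional.empty
    }
}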
the most obvious use case of displaying figures to end users in response to text-based search queries does not yield very sophisticated constraints. a simple raster graphic annotated with captions and possibly some extracted portions of metadata would be sufficient. unfortunately, the process of generating raster representations of figures might lose many important pieces of information that could be used in the future for an automatic analysis. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 38 to store as much data as possible, apart from storing the extracted figures in a raster format (e.g., png), we also decided to preserve their original vector character. vector graphics formats, similarly to pdf documents, contain information about graphical primitives. primitives can be organized in larger logical entities. sometimes rendering of different primitives leads to a modification of the same pixel of resulting image. such a situation might happen, for example, when circles are used to draw data points lying nearby on the same plot. to avoid such issues, we convert figures into scalable vector graphics (svg) format.23 on the implementation level, the extraction of vector representation of a figure proceeds in a manner similar to regular rendering of a pdf document. the interpreter preserves the same elements of the state and allows their modification by transformation operations. a virtual canvas is created for every detected figure. the content stream of the document is processed and all the transformation operations are executed modifying the interpreter’s state. the textual and graphical operators are also interpreted, but they affect only the appropriate canvas of the figure to which the operation belongs. if a particular operation does not belong to any figure, no canvas is affected. the behaviour of graphical canvases used during the svg generation is different from the case of raster rendering. instead of creating graphical output, every operation is transformed into a corresponding primitive and saved within an svg file. pdf was designed in such a manner that the number of external dependencies of a file is minimized. this design decision led to the inclusion of the majority of fonts in the document itself. it would be possible to embed font glyphs in the svg file and use them to render strings. however, for the sake of simplicity, we decided to omit font definitions in the svg output. a text representation is extracted from every text operation, and the operation is replaced by a svg text primitive with a standard font value. this simplification affects what the output looks like, but the amount of formatting information that is lost is minimal. moreover, this does not pose a problem because vector representations are intended to be used during automatic analysis of figures rather than for displaying purposes. a possible extension of the presented method could involve embedding complete information about used glyphs. finally, the generation of the output is completed with some metadata elements. an exhaustive categorization of the metadata that can be compiled for figures could be the customization of the one proposed by liu et al. 
for table metadata.24 in the case of figures, the following categories could be distinguished: (1) environment/geography metadata (information about the document where the figure is located); (2) affiliated metadata (e.g., captions, references, or footnotes); (3) layout metadata (information about the original visualization of the figure); (4) content data; and (5) figure-type metadata. for the moment, we compile only environment/geography metadata and affiliated metadata. the geography/environment metadata consists of the document title, the document authors, the document date (creation and publication), and the exact location of a figure inside a publication (page and boundary). most of these elements are provided by simply referencing the original publication in the inspire repository. the affiliated metadata consists of the text caption and the exact location of the caption in the publication (page and boundary). in the future, metadata from other categories will be annotated for each figure.

detection of the page layout

figure 3. sample page layouts that might appear in a scientific publication. the black color indicates areas where content is present.

in this section we discuss how to detect the page layout, an issue which has been omitted in the main description of the extraction algorithm but which is essential for an efficient detection of figures. figure 3 depicts several possibilities of organising content on the page. as mentioned in previous sections, the method of clustering operations based on their geometrical position may fail in the case of documents having a complex page layout. the content appearing in different columns should never be considered as belonging to the same figure. this cannot be assured without enforcing additional constraints during the clustering phase. to address this difficulty, we enhanced the figure extractor with a pre-processing phase of detecting the page layout. being able to identify how the document page is divided into columns enables us to execute the clustering within every column separately. it is intuitively obvious what can be understood as a page layout, but to provide a method of calculating one, we need a more formal definition, which we give below. by the layout of a page, we understand a particular division of a page into areas called columns. each area is a sum of disjoint rectangles. the division of a page into areas must satisfy a set of conditions summarized in definition 3.

definition 3: let p be a rectangle representing the page. the set d containing subareas of the page is called a page division if and only if

\[
\bigcup_{q \in d} q = p, \qquad
\forall x, y \in d,\; x \neq y : x \cap y = \emptyset, \qquad
\forall q \in d : q \neq \emptyset,
\]
\[
\forall q \in d \;\; \exists r = \{\, x : x \text{ is a rectangle},\ \forall y \in r \setminus \{x\} : y \cap x = \emptyset \,\} : \; q = \bigcup_{x \in r} x .
\]

every element of a division is called a page area. to be considered a page layout, borders of areas from the division must not intersect the content of the page. definition 3 does not guarantee that the layout is unique: a single page might be assigned different divisions satisfying the definition. additionally, not all valid page layouts are interesting from the point of view of figures detection. the segmentation algorithm calculates one such division, imposing additional constraints on the detected areas. the layout-calculation procedure utilizes the notion of separators, introduced by definition 4.
definition 4: a vertical (or horizontal) line inside a page or on its borders is called a separator if its horizontal (vertical) distance from the page content is larger than a given constant value. the algorithm consists of two stages. first, the vertical separators of a sufficient length are detected and used to divide the page into disjoint rectangular areas. each area is delimited by two vertical lines, each of which forms a consistent interval inside of one of the detected vertical separators. at this stage, horizontal separators are completely ignored. figure 4 shows a fragment of a publication page processed by the first stage of the layout-detection. the upper horizontal edge of one of the areas lies too close too close to two text lines. with the constant of the definition 4 chosen to be sufficiently large, this edge would not be a horizontal separator and thus the generated division of the page would require additional processing to become a valid page layout. the second stage of the algorithm transforms the previously detected rectangles into a valid page layout by splitting rectangles into smaller parts and by joining appropriate rectangles to form a single area. information technology and libraries | december 2013 41 figure 4. example of intermediate layout-detection results requiring the refinement. algorithm 2 shows the pseudo-code of the detection of vertical separators. the input of the algorithm consists of the image of the publication page. the output is a list of vertical separators aggregated by their x-coordinates. every element of this list consists of two elements: an integer indicating the x-coordinate and the list of y-coordinates describing the separators. the first element of this list indicates the y-coordinate of the beginning of the first separator. the second element is the y-coordinate of the end of the same separator. the third and fourth elements describe the second separator and the same mechanism is used for the remaining separators (if they exist). the algorithm proceeds according to the sweeping principle known from the computational geometry.25 the algorithm reads the publication page starting from the left. for every xcoordinate value, a set of corresponding vertical separators is detected (lines 9–18). vertical separators are searched as consistent sequences of blank points. a point is considered blank if all the points in its horizontal surrounding of the radius defined by the constant from definition 5 are of the background colour. not all blank vertical lines can be considered separators. short, empty spaces usually delimit lines of text or different small units of the content. in line 11 we test detected vertical separators for being long enough. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 42 if a separator has been detected in a particular column of a publication page, the adjacent columns also tend to contain similar separators. lines 19–31 of the algorithm are responsible for electing the longest candidate among the adjacent columns of the page. the maximization is performed across a set of adjacent columns for which at least one separator exists. algorithm 2. detecting vertical separators. the detected separators are used to create the preliminary division of the page, similar to the one from the example of figure 4. as with the previous step, separators are considered one by one in the order of increasing x coordinate. 
at every moment of the execution, the algorithm maintains a division of the page into rectangles. this division corresponds only to the already detected vertical separators. updating the previously considered division is facilitated by processing separators in a particular, well-defined order.

1: input: the page image
2: output: vertical separators of the input page
3: list separators ← ∅
4: int max_weight ← 0
5: boolean maximizing ← false
6: for all x ∈ {minx … maxx} do
7:     emptyb ← 0, current_eval ← 0
8:     empty_areas ← list()
9:     for all y ∈ {0 … page_height} do
10:        if point at (x, y) is not blank then
11:            if y − emptyb − 1 > heightmin then
12:                empty_areas.append(emptyb)
13:                empty_areas.append(y = page_height ? y : y − 1)
14:                current_eval ← current_eval + y − emptyb
15:            end if
16:            emptyb ← y + 1
17:        end if
18:    end for
       {we have already processed the entire column. now we are comparing with adjacent, already processed columns}
19:    if max_weight < current_eval then
20:        max_weight ← current_eval
21:        max_separators ← empty_areas
22:        maxx ← x
23:    end if
24:    if maximizing then
25:        if empty_areas = ∅ then
26:            separators.add(⟨maxx, max_separators⟩)
27:            maximizing ← false, max_weight ← 0
28:        end if
29:    else
30:        maximizing ← (empty_areas ≠ ∅)
31:    end if
32: end for
33: return separators

before presenting the final outcome, the algorithm must refine the previously calculated division. this happens in the second phase of the execution. all the horizontal borders of the division are moved along adjacent vertical separators until they become horizontal separators in the sense of definition 4. typically, moving the horizontal borders results in dividing already existing rectangles into smaller ones. if such a situation happens, both newly created parts are assigned to different page-layout areas. sometimes, when moving separators is not possible, different areas are combined, forming a larger one.

tuning and testing

the extraction algorithm described here has been implemented in java and tested on a random set of scientific articles coming from the inspire repository. the testing procedure has been used to evaluate the quality of the method, but it also allowed us to tweak the parameters of the algorithm to maximize the outcomes.

preparation of the testing set

to prepare the testing set, we randomly selected 207 documents stored in inspire. in total, these documents consisted of 3,728 pages, which contained 1,697 figures altogether. the records have been selected according to a uniform probability distribution across the entire record space. this way, we have created a collection that is representative of the entire inspire, including historical entries. currently, inspire consists of: 1,140 records describing publications written before 1950; 4,695 between 1950 and 1960; 32,379 between 1960 and 1970; 108,525 between 1970 and 1980; 167,240 between 1980 and 1990; 251,133 between 1990 and 2000; and 333,864 in the first decade of the twenty-first century. in total, up to july 2012, inspire manages 952,026 records. it can be seen that the rate of growth has increased with time and most inspire documents come from the last decade. the results on such a testing set should accurately estimate the efficiency of extraction for existing documents, but not necessarily for new documents being ingested into inspire. this is because inspire contains entries describing old articles which were created using obsolete technologies or scanned and encoded in pdf.
the extraction algorithm is optimized for born-digital objects. to test the hypothesis that the extractor provides better results for newer papers, the testing set has been split into several subsets. the first set consists of publications published before 1980. the rest of the testing set has been split into subsets corresponding to decades of publication. to simplify the counting of correct figure detections and to provide a more reliable execution and measurement environment, every testing document has been split into a number of pdf documents consisting of a single page. subsequently, every single-page document has been manually annotated with the number of figures appearing inside.

execution of the tests

the efficient execution of the testing was possible thanks to a special script executing the plots extractor on every single page separately and then computing the total number of successes and failures. the script allows the execution of tests in a distributed, heterogeneous environment and allows dynamic connection and disconnection of computing nodes. in the case of a software failure, the extraction request is resubmitted to a different computation node, allowing the avoidance of problems related to a worker-node configuration rather than to the algorithm implementation itself. during the preparation of the testing set, we manually annotated all the expected extraction results. subsequently, the script compared these metadata with the output of the extractor. using aggregated numbers from all extracted pages allowed us to calculate efficiency measures of the extraction algorithm. as quality measures, we used recall and precision.26 their definitions are included in the following equations:

\[
\text{recall} = \frac{\#\ \text{correctly extracted figures}}{\#\ \text{figures present in the test set}}
\qquad\qquad
\text{precision} = \frac{\#\ \text{correctly extracted figures}}{\#\ \text{extracted figures}}
\]

at every place where we needed a single comparable quality measure rather than two semi-independent numbers, we have used a harmonic average of the precision and the recall.27 table 1 summarizes the results obtained during the test execution for every subset of our testing set. figure 5 shows the dependency of recall and precision on the time of publication. the extractor parameters used in this test execution were chosen based on intuition and a small number of manually triggered trials. in the next section we describe an automatic tuning procedure we have used to find the most optimal algorithm arguments.

                                          –1980   1980–90   1990–2000   2000–10   2010–12
number of existent figures                  114        60         170       783       570
number of correctly detected figures         59        53         164       703       489
number of incorrectly detected figures       26        78          65        40        73
total number of pages                        85       136         760      1919       828
number of correctly processed pages          20        44         712      1816       743

table 1. results of the test execution.

figure 5. recall and precision as functions of the decade of publication.

it can be seen that, as expected, the efficiency increases with the increasing time of publication. the total recall and precision for all samples since 1990, which constitute a majority of the inspire corpus, were both 88 percent. precision and recall based on the correctly detected figures do not give a full image of the algorithm efficiency, because the extraction has been executed on a number of pages not containing any figures.
the correctly extracted pages not having any figures do not appear in the recall and precision statistics because, in their case, the expected and detected numbers of figures are both equal to 0. besides recall and precision, figure 5 also depicts the fraction of pages that have been extracted correctly. taking into account the samples since 1990, 3,271 pages out of 3,507 have been detected completely correctly, which makes a 93 percent success rate counted by number of pages. as can be seen, this measure is higher than both the precision and the recall. the analysis of the extractor results in the case of failure shows that in many cases, even if the results are not completely correct, they are not far from the expectation. there are different reasons for the algorithm failing. some of them may result from a non-optimal choice of algorithm parameters, others from the document layout being too far from the assumed one. in some rare cases, even manual inspection of the document does not allow an obvious identification of figures.

the automatic tuning of parameters

in the previous section we have shown the results obtained by executing the extraction algorithm on a sample set. during this execution we were using extractor arguments which seemed to be the most correct based on our observations but also on other research (typical sizes of figures, margin sizes, etc.).28 this way of configuring the algorithm was useful during development, but it is not likely to yield the best possible results. to find better parameters, we have implemented a method of automatic tuning. the metrics described in the previous section provided a good method of measuring the efficiency of the algorithm for a given set of parameters. the choice of optimal parameters can be relative to the choice of documents on which the extraction is to be performed. the way in which the testing set has been selected allowed us to use it as representative of hep publications. to tune the algorithm, we have used a subset of the testing set from the previous step as a reference. the subset consisted of all entries created after 1990. this allowed us to minimize the presence of scanned documents which, by design, cannot be correctly processed by our method. the adjustment of parameters has been performed by a dedicated script which has executed the extraction using various parameter values and read the results. the script has been configured with a list of tuneable parameters together with their types and allowed value ranges. additionally, the script had knowledge of the believed best value, which was the one used in previous testing. to decrease the complexity of training, we have made several assumptions about the parameters. these assumptions are only an approximation of the real nature of the parameters, but practice has shown that they are good enough to permit the optimization:

• we assume that the precision and recall are continuous with respect to the parameters. this allows us to assume that the efficiency of the algorithm for parameter values close to a given one will be close. the optimization has proceeded by sampling the parametric space in a number of points and executing tests using the selected points as parameter values. having n parameters to optimize and dividing the space of every parameter into m regions leads to the execution of m^n tests.
execution of every test is a time-consuming operation due to the size of the training set.

• we assume that parameters are independent from each other. this means that we can divide the problem of finding an optimal solution in the n-dimensional space of n configuration arguments into finding n solutions in 1-dimensional subspaces. such an assumption seems intuitive and considerably reduces the number of necessary tests from o(m^n) to o(m⋅n), where m is the number of samples taken from a single dimension.

in our tests, the parametric space has been divided into 10 equal intervals in every direction. in addition to checking the extraction quality in those points, we have executed one test for the so-far best argument. in order to increase the level of fine-tuning of the algorithm, each test has been re-executed in the region where the chances of finding a good solution were considered the highest. this consisted of a region centred around the highest result and having a radius of 10 percent of the parameter space. figure 6 and figure 7 show the dependency of the recall and the precision on an algorithm parameter. the parameter depicted in figure 6 indicates what minimal aspect ratio the figure candidate must have in order to be considered a correct figure. it can be seen that tuning this heuristic increases the efficiency of the extraction. moreover, the dependency of recall and precision on the parameter is monotonic, which is the most compatible with the chosen optimization method. the parameter of figure 7 specifies which fraction of the area of the entire figure candidate has to be occupied by graphical operations. this parameter has a lower influence on the extraction efficiency. such a situation can happen when more than one heuristic influences the same aspect of the extraction. this is contradictory with the assumption of parameter independence, but we have decided to use the present model for simplicity.

figure 6. effect of the minimal aspect ratio on precision and recall.

figure 7. effect on the precision and recall of the area fraction occupied by graphical operations.

after executing the optimization algorithm, we have managed to achieve a recall of 94.11 percent and a precision of 96.6 percent, which is a considerable improvement compared to the previous results of 88 percent.

conclusions and future work

this work has presented a method for extracting figures from scientific publications in a machine-readable format, which is the main step toward the development of services enabling access and search of images stored in scientific digital libraries. in recent years, figures have been gaining increasing attention in the digital libraries community. however, little has been done to decipher the semantics of these graphical representations and to bridge the semantic gap between content that can be understood by machines and content that is managed by digital libraries. extracting figures and storing them in a uniform and machine-readable format constitutes the first step towards the extraction and the description of the internal semantics of figures.
storing semantically described and indexed figures would open completely new possibilities for accessing the data and discovering connections between different types of publishing artefacts and different resources describing related knowledge.29 our method of detecting fragments of pdf documents that correspond to figures is based on a series of observations of the character of publications. however, tests have shown that additional work is needed to improve the correctness of the detection. also, the performance should be reevaluated once we have a large set of correctly annotated figures, confirmed by users of our system. the heuristics used by the algorithm are based on a number of numeric parameters that we have tried to optimize using automatic techniques. the tuning procedure has made several arbitrary assumptions about the nature of the dependency between parameters and extraction results. a future approach to the parameter optimization, requiring much more processing, could involve the execution of a genetic algorithm that would treat the parameters as gene samples.30 this could potentially allow the discovery of a better parameter set because a smaller set of assumptions would be imposed on the parameters. a vector of algorithm parameters could play the role of a gene, and random mutations could be introduced to previously considered and subsequently crossed genes. the evaluation and selection of surviving genes could be performed using the metrics described previously. another approach to improving the quality of the tuning could involve extending the present algorithm with a discovery of mutually dependent parameters and the usage of special techniques (relaxing the assumptions) to fine-tune in the subspaces spanned by these parameters. all of our experiments have been performed using a corpus of publications from hep. the usage of the extraction algorithm on a different corpus would require tuning the parameters for the specific domain of application. for the area of hep, we can also consider preparing several sets of execution parameters varying by decade of document publication or by other easy-to-determine characteristics. subsequently, we could decide which set of extraction parameters to use based on those characteristics. in addition to a better tuning of the existing heuristics, there are improvements that can be made at the level of the algorithm. one example is extending the process of clustering text parts. in the current implementation, the margins by which textual operations are extended during the clustering process are fixed as algorithm parameters. this approach proved to be robust in most cases. however, distances between text lines tend to differ depending on the currently utilized style, and every text portion tends to have one style that dominates. an improved version of the text-clustering algorithm could use local rather than global properties of the content. this would not only make it possible to correctly handle a document written using several different text styles, but would also help to manage cases of single paragraphs differing from the rest of the content. another important, not-yet-implemented improvement related to figure metadata is the automatic extraction of figure references from the text content. important information about figure content might be stored in the surroundings of the place where the publication text refers to a figure.
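the genetic-algorithm tuner is proposed above as future work and is not implemented in the paper. the sketch below is one minimal way it could look under the stated assumptions: a parameter vector plays the role of a gene, random mutations are applied to previously considered and crossed genes, and selection is driven by the extraction metrics. it reuses the hypothetical run_extraction, f_score, and PARAM_RANGES names from the earlier sketches; the population size and mutation rate are arbitrary illustrative values, and every fitness evaluation is a full extraction run, which is what makes this approach much more processing-intensive.

    import random

    def random_gene():
        """a gene is a full parameter vector drawn uniformly from the allowed ranges."""
        return {name: random.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

    def crossover(a, b):
        """for each parameter, inherit the value from one of the two parent genes."""
        return {name: random.choice((a[name], b[name])) for name in PARAM_RANGES}

    def mutate(gene, rate=0.1):
        """with a small probability, re-draw a parameter from its allowed range."""
        return {name: (random.uniform(*PARAM_RANGES[name]) if random.random() < rate else value)
                for name, value in gene.items()}

    def evolve(run_extraction, population_size=20, generations=10):
        population = [random_gene() for _ in range(population_size)]
        for _ in range(generations):
            # evaluation and selection: keep the half of the population with the best f-score
            scored = sorted(population,
                            key=lambda g: f_score(*run_extraction(g)), reverse=True)
            survivors = scored[: population_size // 2]
            # recombination and mutation to refill the population
            children = [mutate(crossover(*random.sample(survivors, 2)))
                        for _ in range(population_size - len(survivors))]
            population = survivors + children
        return max(population, key=lambda g: f_score(*run_extraction(g)))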
furthermore, the metadata could be extended through the use of some type of classifier that would assign a graphics type to the extracted result. currently, we only distinguish between tables and figures, based on simple heuristics involving the number and type of graphical areas and the text inside the detected caption. in the future, we could distinguish line plots from photographs, histograms, and so on. such a classifier could be implemented using artificial intelligence techniques such as support vector machines.31 finally, partial results of the figure-extraction algorithm might be useful in performing other pdf analyses:
• the usage of clustered text areas could allow a better interpretation and indexing of textual content stored in digital libraries with full-text access. clusters of text tend to describe logical parts like paragraphs, section and chapter titles, etc. a simple extension of the current schema could allow the extraction of the predominant formatting style of the text encoded in a page area. text parts written in different styles could be indexed in a different manner, giving, for instance, more importance to segments written with a larger font.
• we mentioned that the algorithm detects not only figures but also tables. a heuristic is being used in order to distinguish tables from different types of figures. our present effort concentrates on the correct treatment of figures, but a useful extension could allow the extraction of different types of entities. for instance, another type of content ubiquitous in hep documents is the mathematical formula. thus, in addition to figures, it would be important to extract tables and formulas in a structured format that allows further processing. the internal architecture of the implemented prototype of the figure extractor allows easy implementation of extension modules which can compute other properties of pdf documents.

acknowledgements

this work has been partially supported by cern, and the spanish government through the project tin2012-37826-c02-01.

references

1. saurabh kataria, "on utilization of information extracted from graph images in digital documents," bulletin of ieee technical committee on digital libraries 4, no. 2 (2008), http://www.ieee-tcdl.org/bulletin/v4n2/kataria/kataria.html.
2. marti a. hearst et al., "exploring the efficacy of caption search for bioscience journal search interfaces," proceedings of the workshop on bionlp 2007: biological, translational and clinical language processing: 73–80, http://dl.acm.org/citation.cfm?id=1572406.
3. lisa johnston, "web reviews: see the science: scitech image databases," sci-tech news 65, no. 3 (2011), http://jdc.jefferson.edu/scitechnews/vol65/iss3/11.
4. annette holtkamp et al., "inspire: realizing the dream of a global digital library in high-energy physics," 3rd workshop conference: towards a digital mathematics library, paris, france (july 2010): 83–92.
5. piotr praczyk et al., "integrating scholarly publications and research data—preparing for open science, a case study from high-energy physics with special emphasis on (meta)data models," metadata and semantics research—ccis 343 (2012): 146–57.
6. piotr praczyk et al., "a storage model for supporting figures and other artefacts in scientific libraries: the case study of invenio," 4th workshop on very large digital libraries (vldl 2011), berlin, germany (2011).
7. "sciverse science direct: image search," elsevier, http://www.info.sciverse.com/sciencedirect/using/searching-linking/image.
8. guenther eichhorn, "trends in scientific publishing at springer," in future professional communication in astronomy ii (new york: springer, 2011), doi: 10.1007/978-1-4419-8369-5_5.
9. william browuer et al., "segregating and extracting overlapping data points in two-dimensional plots," proceedings of the 8th acm/ieee-cs joint conference on digital libraries (jcdl 2008), new york: 276–79.
10. saurabh kataria et al., "automatic extraction of data points and text blocks from 2-dimensional plots in digital documents," proceedings of the 23rd aaai conference on artificial intelligence, chicago (2008): 1169–74.
11. saurabh kataria, "on utilization of information extracted from graph images in digital documents," bulletin of ieee technical committee on digital libraries 4, no. 2 (2008), http://www.ieee-tcdl.org/bulletin/v4n2/kataria/kataria.html.
12. ying liu et al., "tableseer: automatic table metadata extraction and searching in digital libraries," proceedings of the 7th acm/ieee-cs joint conference on digital libraries (jcdl '07), vancouver (2007): 91–100.
13. william s. cleveland, "graphs in scientific publications," american statistician 38, no. 4 (1984): 261–69, doi: 10.1080/00031305.1984.10483223.
14. hui chao and jian fan, "layout and content extraction for pdf documents," document analysis systems vi, lecture notes in computer science 3163 (2004): 213–24.
15. at every moment of the execution of a postscript program, the interpreter maintains many variables. some of them encode current positions within the rendering canvas. such positions are used to locate the subsequent character or to define the starting point of the subsequent graphical primitive.
16. transformation matrices are encoded inside the interpreter's state. if an operator requires arguments indicating coordinates, these matrices are used to translate the provided coordinates to the coordinate system of the canvas.
17. graphical operators are those that trigger the rendering of a graphical primitive.
18. textual operations are the pdf instructions that cause the rendering of text. textual operations receive the string representation of the desired text and use the current font, which is saved in the interpreter's state.
19. operations that do not produce any visible output, but solely modify the interpreter's state.
20. herbert edelsbrunner and hermann a. maurer, "on the intersection of orthogonal objects," information processing letters 13, nos. 4–5 (1981): 177–81.
21. thomas h. cormen, charles e. leiserson, and ronald l. rivest, introduction to algorithms (cambridge: mit electrical engineering and computer science series, 1990).
22. sumit bhatia, shibamouli lahiri, and prasenjit mitra, "generating synopses for document-element search," proceedings of the 18th acm conference on information and knowledge management, new york (2009): 2003–6, doi: 10.1145/1645953.1646287.
23.
jon ferraiolo, ed., "scalable vector graphics (svg) 1.0 specification," w3c recommendation, 1 september 2001, http://www.w3.org/tr/svg10/.
24. liu et al., "tableseer."
25. cormen, leiserson, and rivest, introduction to algorithms.
26. ricardo a. baeza-yates and berthier ribeiro-neto, modern information retrieval (boston: addison-wesley, 1999).
27. ibid.
28. cleveland, "graphs in scientific publications."
29. praczyk et al., "a storage model for supporting figures and other artefacts in scientific libraries."
30. stuart russell and peter norvig, artificial intelligence: a modern approach, 3rd ed. (prentice hall, 2009).
31. sergios theodoridis and konstantinos koutroumbas, pattern recognition, 3rd ed. (boston: academic press, 2006).

the free software alternative: freeware, open-source software, and libraries

james e. corbly

james e. corbly (james.corbly@gmail.com), from austin, minnesota, has been studying freeware and open-source software for over a decade.

abstract

this paper will introduce the reader to the world of freeware and open-source software. following a brief introduction, the author presents an overview of these types of software. next comes a discussion of licensing issues unique to freeware and open-source software, which leads directly to issues of registration. the author then offers several strategies readers can adopt to locate these software packages on the web. the author then addresses questions regarding the use of freeware and open-source software before offering a few closing thoughts.

introduction

i first recognized the potential savings in time, money, and labor offered to librarians and others by freeware and open-source software while i was head of technical and automated services at st. ambrose university in davenport, iowa. among other responsibilities, i oversaw the cataloging and processing of all new library materials. normally, i created original records on oclc whenever i cataloged. one of the tools i needed to complete this work (particularly with foreign language materials) was an ascii chart to provide instructions for making characters not found in english, such as ß, ¿, and ç. today a search for such symbols is relatively easy (most can be found on any character map), but twenty years ago, it presented more of a challenge. i spent many fruitless hours looking for the implement i needed. after much searching, i discovered the right tool—david lord's ascii chart. this freeware program featured several tables: one for ansi characters, one for control characters, an ebcdic table (extended binary coded decimal interchange code), a palette (for colors), and a list of ibm pc characters. the first and last charts were the ones i referred to the most. whenever i needed a special symbol in a windows-based program, the ibm pc chart gave me the formula i needed to make it. any time i was in dos or a dos-based utility (as i was with oclc), i consulted the ansi chart to form a diacritical mark. ascii chart was a great piece of software that saved me many hours of work and helped me formulate accurate documents and records.
in another instance, shortly after i assumed my duties as the director of library services at kansas wesleyan university, the time clock that registered the work hours of student workers broke down. it was an ancient machine that demanded frequent repairs. and although it dutifully printed check-in and check-out hours on time cards, it did not calculate the number of hours worked, nor could it prevent such abuses as students punching in and out for one another. rather than taking the machine to a repair shop, i sought an alternative method of record keeping. i found one: a computer program from guia international called picture timeclock,1 which not only registers each student's work hours but also totals them, so that monthly summaries of each student's work record can be compiled with increased speed, ease, and accuracy. the software lists among its features a photo identification module that prohibits one student from signing in or out for another. picture timeclock is freeware, so no monies were involved in its procurement. i then took this process a step farther. each month, i digitized the work record sheets required by the business office so i could employ another freeware program, a pdf editor, to transfer the records from picture timeclock onto the digitized time sheets. together, these two programs reduced the amount of time needed to document student worker hours by 88% while enhancing the correctness of the submitted records. as an added bonus, i obtained a permanent digital copy of each student's work record which could be readily accessed whenever necessary. today, i still rely on freeware and open-source software to solve many of my workaday problems. i utilize these packages for a variety of purposes including web content mining, manipulating pdf documents, and keeping my computers functioning at their optimal level.

what are freeware and open-source software?

among the major types of software (commercial, shareware, adware, etc.), freeware and open-source software are unique. freeware is copyrighted software given away by its owner (normally the author) for others to use. the author retains sole possession of the copyright, so users cannot alter the software. freeware authors allow individuals to use their productions in any legal manner, but they do not allow anyone to sell the software for a profit. additionally, many freeware suppliers impose boundaries on the use of their products, restricting their application in commercial endeavors, for example. open-source software carries the concept of freeware to its ultimate conclusion. with open source products, the copyright holder gives others the right to study, modify, and distribute the software free of charge to anyone for any purpose. quite often, open source products result from the collaborative efforts of contributors living in numerous locations around the world. raw program code, along with the compiled program, is available to anyone who is willing to obtain it, scrutinize it, and make additions or improvements to it, in the expectation that the combined efforts of many people will result in a product increasingly useful and reliable to end users.
although some open source products lack documentation, many (if not most) have active user groups or communities which serve as sources of assistance to users. to be considered open source, a software product must meet several criteria, among which are the following: the free software alternative| corbly 67 • the software must be freely available without cost, royalties, or fees of any kind. • the program must be distributed as source code (for programmers) and compiled code (for end users). • end users and programmers may alter the program’s code. • the modified source code must be redistributed under the same conditions as the license for the original version of the software.2 one must guard against the temptation to equate freeware and open-source software with publicdomain software. this latter category of software is not copyrighted; hence, it is free of all costs and can be employed without any restrictions whatsoever.3 freeware and open-source software possess copyrights. although many users may be unfamiliar with the concepts of freeware and open-source software, they nonetheless rely on them every day in their routine tasks. if one uses the web, one needs to use a web browser. those who employ microsoft’s internet explorer are using one of the most popular freeware products available,4 while firefox users rely on an open source package.5 many freeware and open source systems are standards in their fields. ccleaner (freeware) has won numerous awards for its efficiency not only as a cleaner and system optimization tool but also as a guardian of user privacy.6 audacity (open source) is a sound recorder and editor employed not only by amateurs but also by professionals in the field.7 access to the web would be impossible without the use of apache (open source), the number one http server.8 it is also important to note that software can change type. freeware can become shareware, commercial software can morph into freeware, etc. licensing licensing of open source products is rather straightforward. although there are over sixty-five different open source licenses, one predominates the general public license (gpl). this is the most popular license used for open-source software. the gpl serves as the license for approximately 70% of open source products. the gpl first appeared in 1989. richard stallman, formerly of the massachusetts institute of technology and founder of the nonprofit free software foundation, authored it. modifications made to the gpl helped keep it vibrant as the years passed and the information technology world changed. as of this writing, the gpl in use is the third version, which came out on june 29, 2007. the foundation of the gpl consists of four principles: 1. the right of individuals to use software for any purpose; 2. the right of individuals to alter software to meet individual needs; 3. the right of individuals to share software with others; and information technology and libraries | september 2014 68 4. the right of individuals to freely distribute the changes one makes to software.9 to these ends, the gpl gives end users the right to freely reproduce and distribute copies of a software program’s source code, providing that each copy displays a copyright notice, a disclaimer of warranty, a copy of the gpl, and gpl notices. the right to modify the software’s code and freely distribute it to others, taking care to list all modifications made to the code and insuring that every condition outlined above is met. 
commentators often refer to gpl as “copyleft” licensing. copyleft is a method of making software freely available and requires all modified versions to meet the criteria already listed. one can read the text of this license at: https://www.gnu.org/licenses/gpl-3.0.txt.10 in addition to the gpl, open-source software authors distribute their work under other arrangements such as the berkeley software distribution licenses (bsdl), the mozilla public license, the nasa open source agreement, and the common public license. freeware licensing is not nearly as uniform as open source. there is no freeware equivalent to the gpl and the rights and responsibilities of the copyright owner vary from program to program. having noted this, there are clauses that many freeware licenses hold in common. among these are the following: • the copyright to the software is retained either by the author of the software or its provider. • end users may install the software and use it for any legal purpose. • one may install and use the software only on a specified number of computers. • users may copy and distribute the software provided the original copyright remains intact. • one cannot charge a fee for copies of the freeware save for distribution costs. • the software is provided “as is” the copyright owner assumes no liability for any damage caused by product usage. • freeware frequently has usage limits. many freeware licenses permit only personal or noncommercial use of the product. there may be limits on use of the freeware with other software packages and restrictions on freeware use over a network. with such a variety of clauses such as these, it is hardly surprising that freeware licenses vary in length. one freeware program i regularly use has a license consisting of three small paragraphs, while another boasts a license five pages in length. due to these characteristics of freeware licensing, i always copy the text of a freeware license to a blank page of my word processor. i keep this document in the master file of the software so that it is readily available should it be required for any exigency. https://www.gnu.org/licenses/gpl-3.0.txt the free software alternative| corbly 69 software registration the above section on licensing leads directly to questions regarding registration. in this arena, freeware and open-source software differ from other types of software. when employing freeware and open source products, one often finds differences between “personal” and “business” use. many of these packages allow unlimited use of the software as long as its use is strictly personal. in other words, if one downloads a software program, installs it on one’s computer, and utilizes it strictly for one’s own, non-commercial purposes, then use of the software is free. personal use refers to all usage that does not generate financial income, such as scrap-booking, creating personal websites, personal blogs, and print jobs such as flyers, posters, and t-shirts for grade schools and local food banks. in this case, one would simply register the software in one’s name. additionally, charities and other non-profit institutions (such as public, academic, and school libraries) may ordinarily employ freeware and open-source software under the same precincts as individuals, i.e., as personal users. however, suppose one works at a for-profit enterprise and desires to install the software on his/her office computer. 
naturally, since the firm owns all its office computers, one usually registers the software in the company’s name. under such conditions, many freeware packages oblige the user to seek permission from the software owner for such use. commercial use may also require a fee from the user to the software owner. these caveats also apply to individuals who obtain this software for engagement in their own moneymaking endeavors (such as freelancing). the end-user license agreement, included with all freeware packages, contains information to guide users in such contingencies. to avoid worrying about these details, some freeware and open source users simply treat these programs as if they were commercial products and register their use accordingly. this is a safe practice to adopt. one may be surprised to learn that numerous freeware and open source packages do not require registration at all; others regard user registration as a voluntary exercise. both stipulations presume non-commercial use of the software. locating freeware and open-source software finding freeware and open-source software is a rather simple process. a good place for the neophyte to start searching for them is at datamation.11 datamation is a periodical that began life in hard copy format in 1957 but morphed into an e-journal in the 1990s (the final print edition appeared in february 1998). to access the list of open source replacements for popular commercial security tools, for example, simply click on the “open source” heading of the menu bar on the home page of the website. scroll down the next page to discover the list of alternatives to information technology and libraries | september 2014 70 commercial security software in such categories as anti-malware, backup, and browsers, until finally, the reader reaches the last category, web filtering. another important resource comes from pcmag.com. each month, the site presents a list of the best free software available in a particular category such as firewalls, video conferencing, antivirus, and presentation software. both freeware and open-source software are included in each category. last year, experts in the field examined over one hundred free software packages from nine categories. most of the evaluated packages operate on either windows 7 or windows 8, although programs designed for other platforms, such as mac os and android, were included in the lists. also assayed are free cloud-based web applications which run in a browser. note that the lists of free software from 2012 and 2013 are readily available.12 john corpuz’s “45 free and useful windows applications” is a slideshow presenting detailed information on useful applications from a variety of software categories. embedded within each software description is a link which, when clicked, takes the reader to a window where that software may be downloaded.13 dottech offers one of the most comprehensive lists of freeware for windows available. this list consists of individual reports organized around nine categories: cleaning and maintenance, communication, files & documents, work & productivity, pdf tools, privacy and security, multimedia, network/internet tools, and miscellaneous. each report is well-written, concise, and features the type of information computer users need. although i disagree on some of the choices labeled as “best,” i cannot but appreciate the exhaustive nature of these documents. 
since dottech continuously revises and expands the number of reports in this set, wise users make periodical visits to this site.14 a quick search of the web will bring one to the open-source software directory where one will find open source applications listed by broad categories on the left side of the home page, categories which are then separated into topics before being subdivided by function. to discover network management software, for example, find “administrators” (category), then “networking” (topic), and finally, click on “network management” (function). in this manner, one will obtain a list of open source products matching this description.15 techradar provides another register of recommended free software.16 this site provides detailed descriptions of seventy-six freeware packages, ranging from productivity software to games. a link takes users to techradar’s download channel where one may read information on freeware and open-source software categorized by function and specific name.17 one cannot discuss freeware without mentioning the world’s largest supplier of freeware, microsoft. veteran users of its windows operating system and its office suite are undoubtedly cognizant of the templates and other helps the firm offers them. those implements represent merely a sampling of the valuable and diverse tools the company makes available to the computing world. microsoft provides access to their abundance of freeware via a download site on the web;18 the free software alternative| corbly 71 however, many people find the site difficult to navigate. for this reason, many individuals consult a friendlier, third party site that opens the doors to this unique collection of freeware. one of the better examples of these sites can be found at gizmo’s freeware.19 internet download directories offer an easy and convenient avenue for one to obtain freeware and open-source software. there are a plethora of such sites on the web; unfortunately, they are not all of equal value. some sites are simply better than others. here is a brief list of criteria one may employ to judge these sites: • ease of use. is the site easy to navigate? does it feature categories that enable one to search for a specific type of software? if it contains categories, does their organization enable one to find software quickly and efficiently? • language. can one easily comprehend the language used to describe the software? does the language express complex concepts in laypeople’s terms or in a manner targeted to information technology professionals? • are the software packages accurately described? do the descriptions detail desirable and undesirable traits of the software? do they clearly indicate the operating system required by the software? what other prerequisites does the software require for effective operation? are alternatives to that software specified? • are software reviews presented? if so, who wrote the reviews laypeople or information technology professionals? do the reviews offer download statistics? does the site contain a ratings system easily understood by the user? • are aids available to help users make optimum use of the software? if aids are available, what format are they in videos, documentation, other? • look for other features which may prove useful, such as: is there a link to the software’s home page? does the download directory provide access to more than one download site? 
among the download sites meeting these criteria are the following: cnet download.com http://download.cnet.com/windows/ majorgeeks.com http://www.majorgeeks.com/ softpedia http://www.softpedia.com/ filehorse.com http://www.filehorse.com/ softonic http://en.softonic.com/ filehippo.com http://www.filehippo.com/ freewareweb http://www.freewareweb.com/ http://download.cnet.com/windows/ http://www.majorgeeks.com/ http://www.softpedia.com/ http://www.filehorse.com/ http://en.softonic.com/ http://www.filehippo.com/ http://www.freewareweb.com/ information technology and libraries | september 2014 72 freeware guide http://www.freeware-guide.com/ freeware files http://freewarefiles.com/ sourceforge http://sourceforge.net/ this brief list does not even begin to exhaust the number of internet download directory websites available for use. frequent visits to these and similar websites will prove amply rewarding. another method of finding this software is to simply look for it on the web via a search engine. to do this most efficiently, one will need the proper name of the software one is trying to obtain. however, if one has that element, one will find this an effective technique of obtaining freeware and open-source software. how does one seek freeware and open source equivalents to commercial software and shareware? first, one may consult a website entitled alternativeto.20 this website will present one with a list of alternatives to specifically named commercial software packages. one has only to click on the search dialog box, key in the name of the commercial product one wishes to find alternatives to, and press the search icon. the list of results will include freeware, open-source software, and commercial products. clicking on the name of the product with transport one to that product’s home page, where one will gain additional information on that product and be given an opportunity to download the product onto one’s computer. secondly, one may consult any of several lists of software equivalents from the web. one of the better of these registers is “100 open source apps to replace everyday software,” by cynthia harvey.21 this list provides not only names of individual open source projects and the commercial packages they replace but also links to homepages of these projects. library-specific freeware and open-source software in the past several years, an increasing number of librarians have turned to freeware and opensource software to help them fulfill the duties they are obligated to discharge. one of the most renowned examples of library-specific open-source software is koha, an integrated library system.22 programmers and librarians developed koha fifteen years ago for the horowhenua library trust in new zealand. since then, libraries of all types, including public, academic, and school libraries, have adopted koha as their integrated library system. numerous consortia across the globe also employee koha to meet their needs and those of their users. one way novices may apprise themselves of this software is to consult websites such as the creative librarian, which features a page entitled “open-source software for libraries,” where software is enumerated by library function.23 any type of library, irrespective of the library’s size, can utilize the software described on this page. 
http://www.freeware-guide.com/ http://freewarefiles.com/ http://sourceforge.net/ the free software alternative| corbly 73 questions regarding the use of freeware and open-source software all this is not to say that freeware and open-source software do not have their challenges. for instance, in one anecdote i related in the introduction in this paper, i did not name the pdf editor i used at kansas wesleyan. that is because the software is no longer available; unfortunately, neither is david lord’s ascii chart. whenever one spots a piece of freeware or open-source software that may be useful, it is imperative to download it immediately. the availability of many packages is limited, and once gone, they are usually difficult, if not impossible, to obtain. another concern regards documentation. when i first obtained picture timeclock, for example, a complete set of instructions was available from the software provider. that is no longer the case. although an increasing number of freeware and open source packages offer documentation, many do not. however, as noted earlier in this report, many open-source software products have user groups called communities that exist not only to improve the software but also to provide technical assistance to those who use the software. downloading freeware and open-source software presents its own quandaries. even though most providers of these packages go to great lengths to insure the cleanliness of their product, it is nevertheless true that viruses and malware sometimes attach themselves to this software. whenever my security software activates during a download, i immediately cease the downloading process and make a note of the site for future reference. additionally, i always run my security software against all software downloads before installing them in order to keep my system free of any potential threats. issues also arise from employing freeware and open-source software in business offices. individuals bring most of this software into enterprise environments. since the organization itself doesn’t procure this software, the corporation’s information technology personnel may be reluctant to provide support. indeed, the corporation’s it department may not even permit an individual to download any outside software whatsoever onto a system under their domain. before one attempts to install such software (regardless of type of software) on one’s business unit, one should check with the company’s it people to obtain their views on the proposed installation. closing thoughts one question remains: why bother with freeware and open-source software? are librarians searching for new software programs to master? do they need an additional task to add to their todo lists? are freeware and open-source software worthy of the attention of already overworked and stressed-out librarians? yes, they are worthy of our attention. why? for three key reasons. first, freeware and open-source software are cost-effective. for no monies whatsoever, freeware and open-source software offer librarians the opportunity to add important new tools to the arsenal of implements at their information technology and libraries | september 2014 74 disposal. that means that badly needed funds can be more strategically used by the library to help enable it to fulfill its mission to its clientele. secondly, freeware and open-source software enable librarians to make increased use of computer hardware. 
computers are machines: they require software to not only tell them which tasks to execute but also to provide instructions for performing those tasks. with this software, the range of computer capabilities not only expands in terms of numbers but also increases in terms of efficiency. the bottom line: freeware and open-source software enhance the value of computer hardware to the library and its patrons. finally, with the assistance of freeware and open-source software, librarians become better librarians. they manage their time more effectually, make better use of the resources at their disposal, and elevate the degree of customer service at all levels of the organization. freeware and open-source software can expedite the handling of routine assignments and make possible the fulfillment of other jobs that, due to time and human limitations, are difficult, if not impossible, to address. freeware and open-source software are good for librarians, good for the library, and good for those who depend on the library for the fulfillment of their information needs. they truly foster what many individuals refer to as a “win-win situation” in the world of information acquisition, organization, preservation, and retrieval. urls cited 1. “picture timeclock,” guia international corporation, http://workschedules.com/store/product/picture_time_clock.aspx. 2. “the open source definition,” open source initiative, http://opensource.org/docs/osd. 3. “public-domain software,” webopedia: online tech dictionary for it professionals, http://www.webopedia.com/term/p/public_domain_software.html. 4. “fast and fluid for windows 7: get internet explorer 11,” microsoft, http://windows.microsoft.com/en-us/internet-explorer/download-ie. 5. “firefox web browser,” mozilla, http://www.mozilla.org/en-us/firefox/new/. 6. “ccleaner,” piriform, http://www.piriform.com/ccleaner. 7. “audacity,” audacity: free audio editor and recorder, http://audacity.sourceforge.net/. 8. “apache,” the apache http server project, http://httpd.apache.org/. 9. “a quick guide to gplv3,” gnu operating system, http://www.gnu.org/licenses/quick-guidegplv3.html. 10. “gnu general public license,” gnu operating system, https://www.gnu.org/licenses/gpl3.0.txt. http://workschedules.com/store/product/picture_time_clock.aspx http://opensource.org/docs/osd http://www.webopedia.com/term/p/public_domain_software.html http://windows.microsoft.com/en-us/internet-explorer/download-ie http://www.mozilla.org/en-us/firefox/new/ http://www.piriform.com/ccleaner http://audacity.sourceforge.net/ http://httpd.apache.org/ http://www.gnu.org/licenses/quick-guide-gplv3.html http://www.gnu.org/licenses/quick-guide-gplv3.html https://www.gnu.org/licenses/gpl-3.0.txt https://www.gnu.org/licenses/gpl-3.0.txt the free software alternative| corbly 75 11. “about us: datamation,” datamation, http://www.datamation.com/about/. 12. “the best free software,” pcmag.com, http://www.pcmag.com/article2/0,2817,2381528,00.asp. 13. “45 free and useful applications,” tom’s guide: tech for real life, http://www.tomsguide.com/us/pictures-story/286-39-best-free-windows-apps.html. 14. “best windows free software,” dottech, http://dottech.org/best-free-windows-software. 15. “open-source software directory,” http://www.opensourcesoftwaredirectory.com/. 16. “the best free software for your pc: essential pc programs you should download today,” techradar, http://www.techradar.com/us/news/software/the-best-free-software-for-yourpc-1221029 . 17. 
“newest downloads,” techradar, http://www.techradar.com/us/downloads. 18. “microsoft download center,” microsoft corporation, http://www.microsoft.com/enus/download/. 19. “best free microsoft downloads,” gizmo’s freeware: the best freeware reviewed and rated, http://www.techsupportalert.com/content/best-free-microsoft-downloads.htm. 20. “alternativeto,” http://alternativeto.net/. 21. “100 open source apps to replace everyday software,” datamation, http://www.datamation.com/open-source/100-open-source-apps-to-replace-everydaysoftware-1.html. 22. “koha library software,” official website of koha library software, http://kohacommunity.org/. 23. “open-source software for libraries,” the creative librarian, http://creativelibrarian.com/library-oss/. http://www.datamation.com/about/ http://www.pcmag.com/article2/0,2817,2381528,00.asp http://www.tomsguide.com/us/pictures-story/286-39-best-free-windows-apps.html http://dottech.org/best-free-windows-software http://www.opensourcesoftwaredirectory.com/ http://www.techradar.com/us/news/software/the-best-free-software-for-your-pc-1221029 http://www.techradar.com/us/news/software/the-best-free-software-for-your-pc-1221029 http://www.techradar.com/us/downloads http://www.microsoft.com/en-us/download/ http://www.microsoft.com/en-us/download/ http://www.techsupportalert.com/content/best-free-microsoft-downloads.htm http://alternativeto.net/ http://www.datamation.com/open-source/100-open-source-apps-to-replace-everyday-software-1.html http://www.datamation.com/open-source/100-open-source-apps-to-replace-everyday-software-1.html http://koha-community.org/ http://koha-community.org/ http://creativelibrarian.com/library-oss/ lib-mocs-kmc364-20131012113604 233 lit a a ward, 1980: maurice j. freedman s. michael malinconico this is the third presentation of the lit a award for outstanding achievement. the first two honored individuals whose achievements can be said to have created the discipline we know as library automation. the first award went to fred kilgour whose vision, daring, and entrepreneurial and managerial skills changed the way libraries operate almost overnight, and may in the increasingly stringent economic times ahead have helped ensure the economic viability of libraries. the second award went to henriette avram, whose untiring efforts on behalf of the marc formats and their promulgation is only just short of legendary. this year's winner distinguished himself in a somewhat different manner. his contributions did not lead to the development of new automated systems or services. rather, his outstanding achievement lies in the creative and pioneering use he made of technology in support of a clear vision of effective library service. his contribution comes from the depth of sensitivity and understanding he brought to the application of technology to library service. much to our go<;>d fortune, he has chosen to share with us through his many writings the insights he has found in his study of the fit between technology and the delivery of effective library service. this year's winner shares the distinction, with the two previous winners, of being a former president of the division. in fact, he presided over the change from the venerable acronym isad to the new name of the division: library and information technology association (lita). it gives me particular pleasure to present this year's award, as it goes not simply to an esteemed colleague but to a valued friend. 
i first met maurice (mitch) freedman at the first ala conference i attended-the midwinter meeting of 1972. the first session i attended at that conference was a meeting of the committee on library automation (cola). i had gone to that meeting to report on nypl's automated cataloging system, which had that month become fully operational with the publication of the book catalogs of the research libraries and of the mid-manhattan library. following the cola program, mitch approached me, introduced himself, and inquired about the possibility of using the nypl system to produce hennepin county's catalog. the consequences of that afternoon were most salutary both for the hennepin county library (hcl) and for me personally. hcl acquired at no cost an automated bibliographic control system, and i gained a friendship that has endured for nearly a decade. thus, rather than dwelling on mitch's professional accomplishments-which are already well known to you-1 would prefer to say a few words about the man himself. perhaps the best way to characterize him is to describe to you his office at 234 journal of library automation vol. 14/3 september 1981 maurice freedman (left) receiving 1980 lita award presented by s. michael malinconico (right). columbia university. prominently displayed on the walls are two enormous posters, one of bertrand russell and another of lenny bruce. a perhaps odd pair until one realizes that these men had one important attribute in common: neither of them accepted, without incontrovertible proof, truths supported by conventional wisdom alone. mitch, like the philosopher and satirist whose images grace the walls of his office, is an iconoclast who insists on more than the endorsement of reigning authority before he will embrace an idea; and he will work tirelessly to change the prevailing wisdom if he finds that it serves to frustrate rather than aid the delivery of the kind and quality of library service to which he feels the patrons of libraries are entitled. likewise, though he was among the pioneers who helped introduce sophisticated technologies such as automation and micrographics into the operation of libraries, he has always maintained a healthy skepticism, which has prevented him from being seduced by the dry voices of the hollow men who proclaim marvels that are in reality only gilded figures of straw. just as lenny bruce refused to accept contemporary conventions regarding language and behavior, mitch freeman has refused to accept the sanctity of lc subject terminology. he, sanford berman, and joan marshall have served for more than a decade as lc's conscience, prodding our phlegmatic, de facto national library to action. just as bertrand russell returned to the axioms of giuseppe peano in an attempt to secure the foundation of mathematics in formal logic and to lita award 235 free that discipline of fuzzy thinking, mitch has returned to the principles articulated by antonio panizzi and seymour lubetzky, as the tests by which to judge the claims of the self-assured mountebanks who regale us with newly coined bibliographic wisdom. in this regard i anxiously await the completion of his doctoral dissertation, in which he explores the philosophical underpinnings of theories of bibliographic control (a work that would have proved most useful during the protracted emotional debate that surrounded aacr2). i expect that it must be particularly gratifying for mitch to accept his award in this particular city. 
although his physical roots are in the northeast, i rather think his intellectual and spiritual roots are here, or more precisely, in the city across the bay-berkeley. it was just about twenty years ago that mitch, after graduating from rutgers university, newark, enrolled as a graduate student in philosophy at the university of california, berkeley. while at berkeley, his sense of social justice and utter disdain for unsupported dogma-could one expect less of a student of philosophy?led him to become active in the free speech movement. thus, we find very early in his career a concern for social issues, a concern that reemerged in his active involvement with the social responsibilities round table shortly after joining the library profession. before leaving berkeley, mitch earned his degree in library science. thus, he earned his degree from one of the most prestigious library schools on the west coast, and now plies his trade as associate professor at one of the most prestigious library schools on the east coast, the columbia university school of library service. if he is only moderately successful in conveying to his students his dedication to the delivery of quality library service, his steadfast conviction that technical services is in reality the first step in the provision of effective public service, and a respect for the supremacy of principle over expedience, his graduating classes will constitute a more lasting and meaningful award than this simple gesture conferred upon him by his professional colleagues. lib-mocs-kmc364-20140103102946 103 book reviews libraries in new york city, edited by molly herman. new york: columbia university school of library service, 1971. 214 pp. $3.50. this guide to libraries in new york is comprehensive, and the description of each library is thorough. pages 184 and 185 list libraries in which there are active and significant automation projects. frederick g. kilgour cobol logic and programming, by fritz a. mccameron, homewood, ill.: richard d. irwin, inc., 1970. 254 pp. $6.00. this book provides a good introduction to cobol, although the author implies that cobol logic is different from ·other computer language logics. however, many examples are included in the text to illustrate new commands and there are numerous review questions, exercises and problems in each chapter. the problems of later chapters build on the logical designs presented earlier. thus, the reader can follow a problem from analysis through solution. the book would be a more useful self-instruction guide as well as textbook if the answers to recall questions and exercises were given. a sound understanding of cobol should be gained from solving the fairly sophisticated problems at the end of the book. one unique and useful idea is the inclusion of coding sheet, punch card, printout, test data and output facsimiles. the most serious drawback of this book in regard to library automation is its obvious slant toward business applications. while the cobol commands presented are sufficient for most applications, there is no mention of character manipulation commands such as examine, with tallying and replacing options. in addition, problems are oriented toward bookkeeping and inventory controls. valerie ]. ryder die universitii.tsbibliothek auf der industrieausstellung: 1. wissen auf abruf. 16 pp. 2. dokumentation-lnformation. 16 pp. berlin: universitatsbibliothek der technischen universitiit berlin, 1970. no price. 
this constitutes a report (in two parts) of the contribution of the library of the technical university of west berlin to the official german industrial exposition held september 27 to october 6, 1968. the library's special exhibit was part of a section labeled: "quality through research and development." it attempted to give a synoptic view of modern library proce104 journal of library automation vol. 4/2 june, 1971 dures and their value for improving science library services. the examples demonstrated emphasized document acquisition procedures and the various readers' services. a total area of approximately 600 square feet was divided into two rooms, one showing technical equipment and the other, besides housing a twx-terminal, was furnished as a reference reading room. the terminal connected the exhibit area with the reference department of the library of the technical university. graphic charts on the walls explained functions of the typical science library in germany and the kinds of services offered. no fundamental differences from the situation in other western countries, especially the usa, can be pointed out. it may be mentioned here that west germany has an efficient organization of union catalogs, one for almost every state (bavaria, wiirtemberg baden including palatinate, hessen, nordrhein-westphalia, hamburg, and west-berlin). interlibrary loan requests go first to a region's union catalog and from there, when the item is traced within the region, to the appropriate lending library, which forwards the item or copy to the requesting library. non-traceable titles are automatically sent on to a neighboring state's union catalog, and so on, until the item is found and sent to the requesting library. reader/ copier machines for different systems of micronized text material were displayed and could be operated by the visitors. under the title "document circulation" the application of edp methods were shown, using machine readable paper tape for borrowing records. the system described was an off-line one, using (presumably daily) lists of the updated circulation master file. other graphic charts described the automated document retrieval system installed at the library of the technical institute of delft, netherlands, and the integrated library system of euratom in ispra, italy, which includes a selected dissemination of information service. computer generated bookform catalogs of monographic and serials records of other west german science libraries were on display, together with information dealing with the european translations center in delft, which records all scientific translations and publishes "world index of scientific translations." a film showing the operation of the national lending library of great britain was demonstrated. literature analysis, recording, storing, and retrieval are the topics of the second part of the report. electromechanical documentation methods using punch cards, and more often punch paper tape, with their corresponding machinery for selecting and writing back records, were shown under operating conditions. a computer based automatic information retrieval system, developed by siemens on the hardware of the current rca spectra 70 computer series was also exhibited. the system named "golem" claims to have some advantages over the medlars i system of the national library of medicine. it is operational at siemens/ edp headquarters in munich. richard a. 
polacsek book reviews 105 marc manuals used by the library of congres~> , prepared by the information systems office, library of congress. 2d ed. chicago: information science and automation division, american library association, 1970. 70, 318, 26, 18 p. this second edition contains the same four manuals as did the first, issued in 1969, although the titles of some of the individual manuals have been changed. the manuals are: 1. books: a marc format. 4th ed., april 1970 (formerly the subscriber's guide to the marc distt·ibution service. 3d ed.) 2. data preparation manual: marc editors. 3d ed., april 1970. 3. transcription manual: marc typists. 2d ed., april 1970. 4. computer and magnetic tape unit usability study. the fourth manual has been reproduced unchanged from the 1969 edition. the third, which contains the keyboarding procedures designed to convert bibliographic data into machine readable form, has been given a subtitle and completely revised to apply to a different keying device, the ibm mt /st, model v. it is the first two manuals, however, which will attract the widest continuing study outside of the library of congress. both manuals have been updated. significant changes from the previous edition of each are indicated in the margin by a double asterisk at the point where the revision was made. no indication is made of deletions, however. thus, users who look for field 652, which was described in the earlier edition, will not find it; nor will they find any instructions directing them to fields 651 and 610, which contain the material formerly placed in that discontinued field, although both 651 and 610 are provided with o o to indicate that they contain new material. among the additions to the first manual are provisions for greek, subscript, and superscript characters, and a revision of the 001 field to take into account both the old and the new l.c. card numbering systems. among the deletions is the table showing the ascii 8-bit hex and 6-bit octal in ebcdic hex sequence. the editors' manual contains the procedures followed by the marc editors in preparing data for conversion to machine readable form. while the first edition of the marc manuals contained the first edition of this particular manual, a second edition was issued in july 1969 for internal use within the library of congress. this third edition is essentially the same as the second edition with minor revisions such as the addition of examples and clarifying statements, a few new instructions, and corrections of typographical errors. the double asterisks in this manual refer to changes from the second edition, not from the first, so that owners of the first edition will have to make their own comparisons to see where the third edition differs from the first. among the new, non-asterisked, materials included that did not appear in the first edition are a discussion of other (non-lc) subject headings on 106 journal of library automation vol. 4/2 june, 1971 pp. 111-114 and of romanized titles on pp. 131-132. the third edition also contains several new appendices covering diacritics and special characters, sequence numbers, and correction procedures. while the editors' manual is designed chiefly for use by the editors at l.c., it has great value for marc users. in many places it provides an expansion and explanation of material treated much more briefly in the first manual, books: a marc format. 
examples of this clarification are the discussion of fixed fields in the editors' manual and its explanation of the alternative entry indicator in the 700 fields, which is merely listed in the first manual. the editors' manual also contains material that does not appear in the first manual, such as the alphabetic alternatives for the numeric tags (which i find more confusing and less memorable than the numeric ones). while only a year intervened between the appearance of the first and second editions of the marc manuals, enough changes have been made to make the new edition a necessary purchase for all those actively involved in the use of marc records. provision of an index would, however, have facilitated its use. judith hopkins computers in knowledge based fields, by charles a. myers. cambridge, mass.: the mit press, 1970. 136 pp. $6.95. a joint project of the industrial relations section, sloan school of management, mit and the inter-university study of labor problems in economic development. the author has written previously on the impact of computers on management. in the current study on the implications of technological change and automation he has selected five areas-formal education and educational administration; library systems; legal services; medical and hospital service; national and centralized local data banks. in this book he is trying to answer such questions as what needs prompted the use of computers, what are the initial applications and what problems were encountered, what affect does the use of computers have on the work performed and what resistance was encountered to their introduction. he also posed the question: can anything be said about comparative costs of computer based programs as compared with other programs? the answer appears to be "no" or "not yet." the chapter on libraries deals primarily with project intrex and thus fails to give an overview of developments in library systems which are operational. the other chapters offer a review of planned and operational projects as of 1968-69. stephen e. furth book reviews 107 libraries and cultural change, ronald c. benge ( hamden, connecticut and london:) archon books & clive bingley ( 1970 ). 278 pp. $9.00. this work is intended primarily to serve library students as an introduction to a consideration of the place of the library in society, with suggestions for further reading. the author is hopeful that it may be of interest to a wider audience, and it is. mr. benge has taught in library schools in the caribbean, west africa, england and wales. this experience is reflected in his approach to a discussion of the social background of library work. although, as he points out, it is possible to establish connections of many kinds, and libraries might be convincingly connected with witchcraft or the illegitimacy rate or prehistoric man, yet more meaningful connections must be sought, and he has selected not only culture, but cultural change, as the basis. further, in his several discussions he has tried to commence with the cultural background and then to note the possible implications for librarianship, rather than to follow the more usual method of commencing with libraries and showing the relevance to them of social forces and other institutions. a listing of a few of mr. benge's fourteen chapter-headings will suggest his development of his theme: "the clash of cultures", "mass communications", "censorship", "the impact of technology", "philosophies of librarianship". 
each chapter is an urbane essay in the editor's easy-chair manner, a monolog in which the author introduces the reader to that part of the universe that can be viewed through the arch over which the particular chapter-title is inscribed, and relates it to the work of the library. mr. benge is informative (he is up-to-date on all manner of matters; e.g., he has been reading library college and he knows about high john), he is occasionally witty and often convincing. as the basis for class-room discussion his work is perhaps also as stimulating as a propaedeutic should be, but lacking such discussion i doubt this attribute. i find that to stimulate, a book must organize the field of discussion. for me mr. benge fails to do so. i find his essays agreeable, with occasional bons mots ("young people, like books, must be preserved for the future"; "guinea pigs are happy creatures") but, like other conversational literature, it leaves me with a general euphoria but unsatisfied logic. for example, the final chapter ("philosophies of librarianship") starts out bravely by questioning the relevance of theory but concludes feebly that what is needed to explain librarianship is perhaps a new integration of traditional custodial principles, the missionary approach, and the rationale of a personal reference service. references from other than the anglo-american culture-sphere are few; the book would have gained greatly from more. we here in jola are naturally interested to hear what mr. benge has to say on "the impact of technology". in this chapter, regrettably, he abandons his method of social background first and relevance for libraries afterward, and simply notes the direct impact of technology on libraries, mainly in the uk. he concludes that "there can be no doubt that the information crisis does exist and that traditional reference or retrieval methods have not solved it. there is chaos, duplication and waste. what i have tried to suggest here is that on the evidence to date, we cannot yet be sure that machine retrieval is the answer" (p. 175). there are misprints, to be sure, neither unusually numerous nor serious, with one exception. dr. vannevar bush's name (p. 182) has been mangled, and is, moreover, omitted from the name index. verner w. clapp serial publications in large libraries, edited by walter c. allen. urbana, ill.: graduate school of library science, university of illinois, 1970. 194 pp. $4.50. handling of serial publications was the topic of the sixteenth allerton park institute held in november 1969; the papers are published in this slim volume. almost every paper offers a number of controversial and provocative ideas which must have evoked interested and interesting reactions. the subsequent discussions are not reported. problems of serials, the librarian's basket of snakes, are identified and analyzed from selection and acquisition through check-in, cataloging, binding, shelf arrangement, abstracting and indexing, to machine applications. the papers cover this gamut well and in most cases provide a good view of the state of the art. recurrent themes are the significant role of serials in today's information flow, the urgency of the problems (though the content is long on agony and short on therapy), and the necessity for bearing in mind the user's rather than the librarian's convenience where both cannot be accommodated when reaching for solutions.
donald hammer's paper on computer-aided operations provides a good introduction and overview of automated serials systems, with some helpful hints to beginners in the field. microfilm technology and machine-readable commercial abstracting and indexing services are touched on by warren kuhn and bill woods, but each topic deserves more thorough treatment in separate papers. too few of the speakers proposed specific research in their areas; where such long-standing problems exist, some well-directed suggestions might elicit useful studies. the book should be useful to library schools as good coverage of a seldom detailed problem operation, to librarians entering the challenging maelstrom of serials handling, and to those already overinvolved who might be refreshed by the longer view. the poor proofreading is a minor flaw. mary jane reed training in indexing: a course of the society of indexers, edited by g. norman knight. cambridge, massachusetts: the m. i. t. press, 1969. 219 pp. $7.95. to this reviewer, who had struggled through the compilation of one annual index to the journal of library automation with the aid of scarce, unrelated, and out-of-date books and periodicals on the subject of indexing, this thorough, well-written volume, aimed at the neophyte indexer, came as a godsend. it comprises a series of lectures, by master practitioners of the craft, sponsored by the society of indexers. that authors and audience were chiefly british detracts not a whit from the book's usefulness to americans. two introductory chapters by robert l. collison on the elements of book indexing are followed by twelve on specific treatment of those elements and of different types of material. chapters on indexing periodicals and scientific and technical material will particularly interest readers of jola. exercises, a selected bibliography, and an index that also serves as an illustration of points in the text enhance the usefulness of this book to the beginner. it should be equally useful to an indexer of no matter how much experience, for, as collison emphasizes in his opening statement, indexing is still in an elementary stage, there are no common rules on which all indexers agree, and everyone considers himself his own authority on how an index should be arranged and what should go into it. in treating a subject that might seem to the layman to lend itself all too readily to the cut-and-dried approach, the authors have brought a delightful measure of flexibility, wit and imagination. at no point do they lose sight of the fact that the indexing of books, like the writing of them, is a very human endeavor. eleanor m. kilgour reader in library services and the computer, edited by louis kaplan. washington, d. c.: ncr microcard editions, 1971. 239 pp. $9.95. this volume contains a couple of dozen reprints, mostly of articles. the reader is not intended for those doing research and development in library automation, but rather for librarians and library students who wish to familiarize themselves with the subject. the quality of the articles is high. in general, they present a conservative position, which is not to say that they oppose library automation. rather, they inform the reader of positive action to be taken and in so doing impart understanding. within this conservative framework, however, various viewpoints are expressed.
seven subjects group the articles: the challenge ( three articles); varieties of response (six ); theory of management (one); new services (six) ; catalogs and the computer ( two ); copyright (one ); and information retrieval testing ( six ) . the r eader is not a book in the sense that a book 110 journal of library automation vol. 4/ 2 june, 1971 contains a central theme. it is likely that the r eader will be used for its sections rather than in its entirety, but that is the manner in which one expects to use a reader. anyone who so uses it will be enlightened. the reader has but one serious shortcoming. it is devoid of an index. this deficit will seriously hamper consultation of the book. frederick g. kilgour automation management: the social perspective, ed. by ellis l. scott and roger w. bolz. athens, ga., center for the study of automation and society, university of georgia, 1970. (second annual georgia-reliance symposium) $5.75. sixteen papers are presented at this symposium by a variety of authors from labor, management, academe, etc. as in all collections of papers, they are uneven in quality. the preface of the symposium states that the "1970 symposium focused on the problem of automation management, from a social perspective, as it relates to industry, education, labor and government." the papers reflect ideas concerning the need for training and retraining, and for preparing people for automation by having them participate in the decision-making process. three papers on the effects of automation use economic analysis based upon the gross national product and other labor and business indicators and find that the changes predicted for automation in terms of joblessness and increased productivity are unfounded, although some questions are asked about the validity of the figures used to make these assumptions. there are interesting formulations on the nature of change and innovation and the time lag between basic research and industrial application. gordon carson's paper expressly attacks the issue of automation in libraries and in education. dr. carson sees one of the problems as the library's print media orientation when the other senses, such as hearing, could also be used. libraries are also attacked on the basis of how they measure effectiveness, i.e., the number of volumes on the shelf, rather than "the speed with which information can be retrieved from that library and placed in the hands of him who needs to use it." this methodology for measuring effectiveness is changing presently, so that the need expressed by dr. carson may be met. in conclusion, dr. carson states that there are "three essential areas in which automation can be exceptionally helpful in higher education. these are as follows: 1 ) improved teaching techniques including autodidactic learning systems; 2 ) registration, fee payment and curriculum planning .. . ; 3) libraries-information retrieval." although in a way many papers in this volume skirt the periphery of the effects of change and how to create it, it is worthwhile reading on the whole. henry voos book reviews 111 interlibrary loan involving academic libraries, by sarah katharine thomson. chicago: american library association, 1970. (acrl monograph, 32). viii/127 pp. $5.00. interlibrary loan pmcedure manual, by sarah katharine thomson. chicago: american library association, 1970. xi/116 pp. $4.50. 
interlibrary loan involving academic libraries is a summary version of "a normative survey of current interlibrary loan practices in academic libraries in the united states." it makes surprisingly compulsive reading for anyone who has worked much with interlibrary loans, and might be an eye-opener for those who haven't. (the original, complete version appeared in 1967 as a columbia university dls dissertation.) much of it documents or corroborates the feelings (or suspicions) of busy, experienced interlibrary loan staff; some of it is new and surprising; and doubtless many of the same patterns and trends hold true today. dr. thomson, working primarily with data reported by academic libraries to the u.s. office of education in 1963-64, results of intensive analysis of a sample of 5895 interlibrary loan requests (drawn from a total of 60,000 received by eight major university libraries in 1963-64 and 1964-65 ), and information from several questionnaires, presents a clear picture of who borrowed what from whom, how often; staffing and time required; distribution patterns of requests by size and location of library, type of reader; sources of difficulty, delay and failure; factors predictive of fast and efficient service; and a number of other variables. her results and conclusions are presented clearly, with supportive or illustrative statistics, graphs, correlations, and other tables. chapter 14 offers recommendations of librarians for increasing the proportion of interlibrary loan requests fill ed. suggestions and recommendations resulting from dr. thomson's study were incorporated in, or influenced the drafters of, the 1968 national interlibrary loan code, the model regional or state code, and the 1968 interlibrary loan request form. dr. thomson estimates that interlibrary loan requests involving academic libraries are well over the million mark by now, and refers to a 1965 study which reports large libraries estimating they are unable to fill about onethird of the requests they receive. it is to be hoped that some of the worst faults in interlibrary loan requests have been mitigated by the revised codes, revised form s, and better education of interlibrary loan assistants. the new procedures manual should help, too. perusing this monograph should foster greater awareness and understanding of the dimensions and problems of interlibrary loan service. now, if only we had an up-to-date cost study .... who profits from the appearance of the interlibrary loan procedure manual? not merely ill novices, whether new clerical assistants or young librarians faced with setting up, reorganizing, or streamlining interlibrary loan routines. it has value for the old ill hand, checking up on established routines to be sure no sloppiness has crept in ; for the library school student, as an early exposure to good library cooperation manners, 112 journal of library automation vol. 4/2 june, 1971 as well as a basic step-by-step indoctrination in "how to do it"; for recipient libraries, whose time and patience would be much less strained were all requestors to follow these elementary, commonsense, too often ignored recommendations; and last, not least, the library's patron, whose needs will be filled faster, more economically, with fewer false starts. a wealth of practical detail has been packed into these pages-a plethora of detail, some might complain, confusing the beginner and boring the experienced. but a procedure manual by definition tries to incorporate every stroke and serif of a to z. 
simple solutions to that complaint are re-reading and/or judicious scanning. the manual includes annotated texts of the 1968 national interlibrary loan code and the model regional or special code; primer-type instructions for borrowing and requesting libraries (including concise sections on special puzzlers such as academic theses, government publications, technical reports, and materials in non-roman alphabets); and consideration of related, often problematical areas such as photocopy, copyright and reprinting, location requests, teletype requests, purchase of dissertations, and international loans. useful appendices (e.g., sample forms, some library policy statements, the text of the ifla international loan code), a bibliography, and a detailed index complete the work. chapter levels vary of necessity. for the novice, the teletype request chapter may seem too brief or confusing, yet several appendices (for instance) will be of interest even to the seasoned ill assistant. throughout, the effort has been for clarity, coverage, and explicitness. the cost of an interlibrary loan transaction is too great to indulge sloppy, inefficient, or idiosyncratic procedures; this manual is therefore required reading for all involved in interlibrary loans, and a copy should be at the elbow of every new clerical assistant. elizabeth rumics exploratory subject searching in library catalogs: reclaiming the vision julia bauder and emma lange information technology and libraries | june 2015 abstract: librarians have had innovative ideas for ways to use subject and classification data to provide an improved online search experience for decades, yet after thirty-plus years of improvements in our online catalogs, users continue to struggle with narrowing down their subject searches to provide manageable lists containing only relevant results. this article reports on one attempt to rectify that situation by radically reenvisioning the library catalog interface, enabling users to interact with and explore their search results in a profoundly different way. this new interface gives users the option of viewing a graphical overview of their results, grouped by discipline and subject. results are depicted as a two-level treemap, which gives users a visual representation of the disciplinary perspectives (as represented by the main classes of the library of congress classification) and topics (as represented by elements of the library of congress subject headings) included in the results. introduction: reading library literature from the early days of the opac era is simultaneously inspiring and depressing. the enthusiasm that some librarians felt in those days about the new possibilities that were being opened by online catalogs is infectious.
elaine svenonius envisioned a catalog that could interactively guide users from a broad single-word search to the specific topic in which they were really interested.1 pauline cochrane conceived of a catalog that could group results on similar aspects of a given subject, showing the user a "systematic outline" of what was available on the subject and allowing the user to narrow their search easily.2 marcia bates even pondered whether "any indexing/access apparatus that does not stimulate, intrigue, and give pleasure in the hunt is defective," since "people enjoy exploring knowledge, particularly if they can pursue mental associations in the same way they do in their minds. . . . should that not also carry over into enjoying exploring an apparatus that reflects knowledge, that suggests paths not thought of, and that shows relationships between topics that are surprising?"3 however, looking back thirty years later, it is dispiriting to consider how many of these visions have not yet been realized. the following article reports on one attempt to rectify that situation by radically reenvisioning the library catalog interface, enabling users to interact with and explore their search results in a profoundly different way. the idea is to give users the option of viewing a graphical overview of their results, grouped by discipline and subject. this was achieved by modifying a vufind-based discovery layer to allow users to choose between a traditional, list-based view of their search results and a visualized view. in the visualized view, results are depicted as a two-level treemap, which gives users a visual representation of the disciplinary perspectives (as represented by the main classes of the library of congress classification [lcc]) and topics (as represented by elements of the library of congress subject headings [lcsh]) included in the results. an example of this visualized view can be seen in figure 1. figure 1. visualization of the results for a search for "climate change." subsequent sections of this paper summarize the library-science and computer-science literature that provides the theoretical justification for this project, explain how the visualizations are created, and report on the results of usability testing of the visual interface with faculty, academic staff, and undergraduate students. julia bauder (bauderj@grinnell.edu) is social studies and data services librarian, and emma lange (langemm@grinnell.edu) is an undergraduate student and former library intern, grinnell college, grinnell, iowa.
literature review: exploratory subject searching in library catalogs. since charles ammi cutter published his rules for a printed dictionary catalogue in 1876, most library catalogs have been premised on the idea that users have a very good idea of what they are looking for before they begin to interact with the catalog.4 in this classic view, users are either conducting known-item searches—they know the titles or the author of the books they want to find—or they know the exact subject on which they are interested in finding books. yet research has shown that known-item searches are only about half of catalog searches,5 and that users often have a very difficult time expressing their information needs with enough detail to construct a specific subject search. instead, much of the time, users approach the catalog with only a vaguely formulated information need and an even vaguer sense of what words to type into the catalog to get the resources that would solve their information need.6 even in the earliest days of the opac era, librarians were aware of this problem. some of them, including elaine svenonius and pauline cochrane, speculated about better use of subject and classification data to try to help users who enter too-short, overly broad searches focus their results on the information that they truly want. one of cochrane's many ideas on this topic was to use subject and classification data "to present a systematic outline of a subject," which would let users see all of the different aspects of that subject, as reflected in the library's classification system and subject headings, and the various locations where those materials could be found in the library.7 svenonius suggested using library classifications to help narrow users' searches to appropriate areas of the catalog. for example, she suggests, if a user enters "freedom" as a search term, the system might be programmed to present to the user contexts in which "freedom" is used in the dewey decimal classification, such as "freedom of choice" or "freedom of the press." once the user selects one of these phrases, svenonius continued, the system could present the user with additional contextual information, again allow the user to specify which context is desired, and then guide the user to the exact call number range for information on the topic.
she concluded, "thus by contextualizing vague words, such as freedom, within perspective hierarchies, the computer might guide a user from an ineptly or imprecisely articulated search request to one that is quite specific."8 ideas such as these had little impact on the design of production library catalogs until the late 1990s, when a dutch company, medialab solutions, began developing aquabrowser, which features a word cloud composed of synonyms and other words related to the search term and allows users to refocus their search by clicking on these words.9 aquabrowser became available in the united states in the mid-2000s, shortly before north carolina state university launched its endeca-based catalog in 2006.10 while aquabrowser's word cloud is certainly visually striking, the feature that these and most of the subsequent "next-generation" library catalogs implement that has had the most impact on search behavior is faceting. facets, while not as sophisticated as the systems envisioned by svenonius and cochrane, are partial solutions to the problems they lay out. facets can serve to give users a high-level overview of what is available on a topic, based on classification, format, period, or other factors. they can also help guide a user from an impossibly broad search to a more focused one. various studies have shown that faceted interfaces are effective at helping users narrow their searches, as well as helping them discover more relevant materials than they did when performing similar tasks on nonfaceted interfaces.11 however, studies have also shown that users can become overwhelmed by the number and variety of facets available and the number of options shown under each facet.12 visual interfaces to document corpora: while librarians were pondering how to create a better online library catalog, computer scientists were investigating the broader problem of helping users to navigate and search large databases and collections of documents effectively. visual interfaces have been one of the methods computer scientists have investigated for providing user-friendly navigation, with perhaps the most prominent early advocate for visual interfaces being ben shneiderman.13 in recent years, shneiderman and other researchers have built and tested various types of experimental visual interfaces for different forms of information-seeking.14 however, with a few exceptions, most of these visual interfaces have remained in a laboratory rather than a production setting.15 with the exception of the "date slider," a common interface feature that displays a bar graph showing dates related to the search results and allows users to slide handles to include or exclude times from their search results, few current document search systems present users with any kind of visual interface.
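the interface described in the method section below is driven by facet counts of exactly this kind, returned by the search engine alongside (or instead of) the result list. the following is only a rough, minimal sketch in browser-side javascript: the endpoint path and the field names "callnumber-first" and "topic_facet" are assumptions chosen for illustration, not a description of any particular catalog's configuration.

```javascript
// sketch: ask a solr index for facet counts by lcc main class and by
// subject-heading element. url and field names are assumed, not authoritative.
async function fetchFacets(query) {
  const params = new URLSearchParams({
    q: query,
    rows: "0",          // counts only; no bibliographic records needed here
    wt: "json",
    facet: "true",
    "facet.limit": "30"
  });
  // request counts for two fields: one per treemap level
  params.append("facet.field", "callnumber-first"); // assumed lcc main-class field
  params.append("facet.field", "topic_facet");      // assumed lcsh-element field

  const response = await fetch("/solr/biblio/select?" + params.toString());
  const data = await response.json();
  // solr returns each field's counts as a flat [value, count, value, count, ...] array
  return data.facet_counts.facet_fields;
}

// example: counts for a deliberately broad search, before any narrowing
fetchFacets("climate change").then(fields => console.log(fields));
```

narrowing the search to one class or topic would then be a matter of re-issuing the request with a filter query (an fq parameter) on the chosen value.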
method: the grinnell college libraries use vufind, open-source software originally developed at villanova university, as a discovery layer over a traditional ils. vufind in turn makes use of apache solr, a powerful open-source indexing and search platform, and solrmarc, code developed within the library community that facilitates indexing marc records into solr. using solrmarc, marc fields and subfields are mapped to various fields in the solr index; for example, the contents of marc field 020, subfield a, and field 773, subfield z, are both mapped to a solr index field called "isbn." more than fifty solr fields are populated in our index. our visualization system was built on top of vufind's solr index and visualizes data taken directly from the index. the visualizations are created in javascript using the d3.js visualization library, and they are designed to implement shneiderman's visual information seeking mantra: "overview first, zoom and filter, then details-on-demand."16 the goal was to give users the option of viewing a graphical overview of their results, grouped by disciplinary perspective and topic, and then allow them to zoom in on the results from specific perspectives or on specific topics. once they have used the interactive visualization to narrow their search, they can choose to see a traditional list of results with full bibliographic details about the items. this would, ideally, provide a version of the systematic outline that cochrane envisioned. it should also support users as they attempt to narrow down their search results and focus on a specific aspect of their chosen subject without overwhelming them with long lists of results or of facets. currently, we are visualizing values of two fields, one containing the first letter of the items' library of congress classification (lcc) numbers and the other containing elements of the items' library of congress subject headings (lcsh). this data is visualized as a two-level treemap.17 first, large boxes are drawn representing the number of items matching the search within each letter of the lcc. within the largest of these boxes, smaller boxes are drawn showing the most common elements of the subject headings for items matching that search within that lcc main class. less common subject heading elements are combined into an additional small box, labeled "x more topics"; clicking on that box zooms in so that users only see results from one lcc main class, and it displays all of the lcsh headings applied to items in that group. similarly, users can click on any of the smaller lcc boxes, which do not contain lcsh boxes in the original visualization, to zoom in on that lcc main class and see the lcsh subject headings for it. both the large and the small boxes are sized to represent what proportion of the results were in that lcc main class or had that lcsh subject heading. this is easier to explain with a concrete example, which follows the brief sketch below.
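as a minimal sketch of how such a two-level treemap might be drawn with d3.js (v6-style api), consider the following. the facet counts, class labels, topic labels, and layout parameters are invented placeholders illustrating the approach just described; they are not the production code behind figure 1.

```javascript
// sketch: a two-level treemap of lcc main classes and lcsh topic elements.
// the data object stands in for facet counts pulled from the solr index.
const facetData = {
  name: "results",
  children: [
    { name: "q - science", children: [
      { name: "climatic changes", value: 410 },
      { name: "global warming",   value: 230 },
      { name: "97 more topics",   value: 560 }
    ]},
    { name: "s - agriculture", children: [
      { name: "crops and climate", value: 90 },
      { name: "34 more topics",    value: 150 }
    ]}
  ]
};

const width = 900, height = 500;

// box areas are proportional to the number of matching items
const root = d3.hierarchy(facetData)
  .sum(d => d.value)                    // leaf sizes come from the facet counts
  .sort((a, b) => b.value - a.value);   // largest classes and topics first

d3.treemap()
  .size([width, height])
  .paddingOuter(4)    // space around each lcc main-class box
  .paddingTop(18)     // room for the class label
  .paddingInner(2)    // space between topic boxes
  (root);

const svg = d3.select("body").append("svg")
  .attr("width", width)
  .attr("height", height);

// one group per node, for both main-class boxes and topic boxes
const node = svg.selectAll("g")
  .data(root.descendants().filter(d => d.depth > 0))
  .join("g")
  .attr("transform", d => `translate(${d.x0},${d.y0})`);

node.append("rect")
  .attr("width",  d => d.x1 - d.x0)
  .attr("height", d => d.y1 - d.y0)
  .attr("fill",   d => d.depth === 1 ? "#cfe2f3" : "#6fa8dc")
  .attr("stroke", "#fff")
  // in the real interface a click refines the search; here it only logs the choice
  .on("click", (event, d) => console.log("zoom to", d.data.name));

node.append("text")
  .attr("x", 4)
  .attr("y", 13)
  .text(d => `${d.data.name} (${d.value})`);
```

in practice the click handler would re-query the index with the selected class or topic applied as a filter and redraw the treemap, which is how the "zoom and filter" step of shneiderman's mantra is realized; the "details-on-demand" step is the switch back to the traditional results list.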
let's say a student were to search for "climate change" and click on the option to visualize the results. you can see what this looks like in figure 1. instead of seeing a list of nearly two thousand books, the student now sees a visual representation of the disciplinary perspectives (as represented by the main classes of the lcc) and topics (as represented by elements of the lcsh) included in the results. users could click to zoom in on any main class within the lcc to see all of the topics covered by books in that class, as in figure 2, where the student has zoomed in on "s – agriculture." or users could click on any topic facet to see a traditional results list of books with that topic facet in that main class. at any zoom level, users could choose to return to the traditional results list by clicking on the "list results" option.18 we launched this feature in our catalog midway through the spring 2014 semester. formal usability testing was completed with five advanced undergraduates, three staff, and two faculty members in the summer of 2014. (see appendix a for the outline of the usability test.) one first-year student completed usability testing in the fall 2014 semester. the usability study asked participants to complete a set list of nine specific, predetermined tasks. some tasks involved the use of now-standard catalog features, such as saving results to a list and emailing results to oneself, while about half of the tasks involved navigation of the visualization tool, which was entirely new to the participants. each participant received the same tasks and testing experience regardless of their status as a student, faculty, or staff member, and each academic division was represented among the participants. figure 2. visualization of the results for a search for "climate change," filtered to show only results with library of congress classification numbers starting with s. results: usability testing revealed no major obstacles in the way of users' ability to navigate the visualization feature; the visualized search results were quickly deciphered by the participants with the assistance of the context set by the study's outlined tasks. familiarity with library catalogs in general, and the grinnell college libraries catalog in particular, showed no marked impact on users' performance. no particular user group performed as an outlier in regard to users' general ability to complete tasks or the time required to do so. the most common issue to arise during the sessions concerned the visualization's truncated text, which appears in the far left column of results when the descriptor text contains too many characters for the space allocated. (an example of this truncated text can be seen in figure 1.)
the subject boxes appearing in the furthest left column contain the fewest results, and therefore receive the least space within the visualization. this limited space sometimes results in truncated text. the full text can be viewed by hovering over the truncated text box, but few users discovered this capability. another common concern involved a participant's ability to switch their search results from the default list view to the visualized view. all participants were capable of selecting the "visualize these results" button required to produce the visualization, but a handful of participants expressed that they feared they would not find that option if they were not prompted to do so. participants remarked that the visualization initially appeared daunting, but they quickly became comfortable navigating the results. most participants, including staff, stated that they found the tool useful and intended to use it in the future during the course of their typical work at the college. conclusion: librarians have had innovative ideas for ways to use subject and classification data to provide an improved online search experience for decades, yet after thirty-plus years of improvements in online catalogs, users continue to struggle with narrowing down their searches to produce manageable lists containing only relevant results.19 computer scientists have been advocating for interfaces to support visual information-seeking since the 1980s. finally, hardware and software have improved to the point where many of these ideas can be implemented feasibly, even by relatively small libraries. now is the time to put some of them into production and see how well they work for library users. the particular visualizations reported in this article may or may not be the best possible visualizations of bibliographic data, but we will never know which of these ideas might prove to be the revolution that library discovery interfaces need until we try them. appendix a. usability testing instrument. introductory questions: before we look at the site, i'd like to ask you just a few quick questions. —have you searched for materials using the grinnell college libraries' website before? if so, what for and when? (for students only: could you please estimate how many research projects you've done at grinnell college using the library catalog?) in the grinnell college libraries, we're testing out a new tool in our catalog that presents search results in a different way than you are used to. now i'm going to read you a short explanation of why we created this tool and what we hope the tool will do for you before we start the test. research is a conversation: a scholar reads writings by other scholars in the field, then enters into dialogue with them in his or her own writing.
most  of  the  time,  these  conversations  happen  within   the  boundaries  of  a  single  discipline,  such  as  chemistry,  sociology,  or  art  history,  even  when  many   disciplines  are  discussing  similar  topics.  but  when  you  do  a  search  in  a  library  catalog,  writings   that  are  part  of  many  different  conversations  are  all  jumbled  together  in  the  results.  it’s  like  being   thrown  into  one  big  room  where  all  of  these  scholars,  from  all  of  these  different  disciplines,  are   talking  over  each  other  all  at  once.  our  new  visualization  tool  aims  to  help  you  sort  all  of  these   writings  into  the  separate  conversations  in  which  they  originated.     scenarios   now  i  am  going  to  ask  you  to  try  doing  some  specific  tasks  using  3search.  you  should  read  the   instructions  aloud  for  all  tasks  individually  prior  to  beginning  each.  and  again,  as  much  as  possible,   it  will  help  us  if  you  can  try  to  think  out  loud  as  you  go  along.     please  begin  by  reading  the  first  scenario  aloud  and  then  begin  the  first  scenario.  if  you  are  unsure   whether  you  finished  the  task  or  not,  please  ask  me.  i  can  confirm  if  the  task  has  been  completed.   once  you  are  done  with  scenario  1,  please  continue  onto  scenario  2  by  reading  it  aloud  and  then   beginning  the  task.  continue  this  process  until  all  scenarios  are  finished.  if  you  cannot  complete  a   task,  please  be  honest  and  try  to  explain  briefly  why  you  were  unsuccessful  and  continue  to  the   next.     1. pretend  that  you  are  writing  a  paper  about  issues  related  to  privacy  and  the  internet.  do  a   search  in  3search  with  the  words  “privacy  internet.”   2. please  select  the  first  worldcat  result  and  attempt  to  determine  whether  you  have  access   to  the  full  text  of  this  book.  if  not,  please  indicate  where  you  could  request  the  full  text   through  the  interlibrary  loan  service.   3. go  back  to  your  initial  search  results.  please  choose  “explore  these  results”  of  the  ebsco   database  results.  choose  an  article.  if  you  have  unlimited  texting,  have  the  article’s     information  technology  and  libraries  |  june  2015     100   information  texted  to  your  cell  phone.  then,  add  the  article  to  a  new  list  for  future   reference  throughout  this  project.   4. go  back  to  your  initial  search  results.  for  grinnell  college’s  collections  results,  click  on  the   “explore  these  results”  link.  then  click  on  the  “visualize  results”  link  to  visualize  the   results.  which  disciplines  appear  to  have  the  greatest  interest  in  this  topic?   5. when  privacy  and  the  internet  are  discussed  in  the  context  of  law,  what  are  some  of  the   topics  that  are  frequently  covered  in  these  discussions?   6. one  specific  topic  you  are  considering  is  the  legal  issues  around  libel  and  slander  on  the   internet.  how  many  resources  do  the  libraries  have  on  that  specific  topic?   7. click  on  “q  –  science,”  to  see  the  results  authored  by  theoretical  computer  scientists.  based   on  these  results,  what  are  some  of  the  topics  that  are  frequently  covered  in  their   discussions  when  these  computer  scientists  discuss  privacy  and  the  internet?   8. 
pretend  that  you  are  writing  this  paper  for  a  computer  science  class  and  you  are  supposed   to  address  your  topic  from  a  computer  science  perspective.  please  narrow  your  results  to   only  show  results  that  are  in  the  format  of  a  book.  based  on  this  new  visualization,  what   might  be  some  good  topics  to  consider?   9. add  one  of  these  books  to  the  list  you  created  in  step  3.  please  email  all  of  the  items  on  this   list  to  yourself.   debriefing   thank  you.  that  is  it  for  the  computer  tasks.  i  have  a  few  quick  questions  for  you  now  that  you   have  gotten  a  chance  to  use  the  site.   1. what  do  you  think  about  3search?  is  it  something  that  you  would  use?  why  or  why  not?   2. what  is  your  favorite  thing  about  3search?   3. what  is  your  least  favorite  thing  about  3search?   4. did  you  find  the  visualization  function  useful?  why  or  why  not?   5. do  you  have  any  recommendations  for  changes  to  the  way  this  site  looks  or  works?         exploratory  subject  searching  in  library  catalogs:  reclaiming  the  vision  |  bauder  and  lange   doi:  10.6017/ital.v34i2.5888   101   references     1.     elaine  svenonius,  “use  of  classification  in  online  retrieval,”  library  resources  &  technical   services  27,  no.  1  (1983):  76–80,    http://alcts.ala.org/lrts/lrtsv25no1.pdf.     2.     pauline  a.  cochrane,  “subject  access—free  or  controlled?  the  case  of  papua  new  guinea,”  in   redesign  of  catalogs  and  indexes  for  improved  online  subject  access:  selected  papers  of  pauline   a.  cochrane  (phoenix:  oryx,  1985),  275.  previously  published  in  online  public  access  to  library   files:  conference  proceedings:  the  proceedings  of  a  conference  held  at  the  university  of  bath,  3– 5  september  1984  (oxford:  elsevier,  1985).   3.     marcia  bates,  “subject  access  in  online  catalogs:  a  design  model,”  journal  of  the  american   society  for  information  science  37,  no.  6  (1986):  363,  http://dx.doi.org/10.1002/(sici)1097-­‐ 4571(198611)37:6<357::aid-­‐asi1>3.0.co;2-­‐h   4.     charles  ammi  cutter,  rules  for  a  printed  dictionary  catalog  (washington,  dc:  government   printing  office,  1876).   5.     david  ward,  jim  hahn,  and  kirsten  feist,  “autocomplete  as  a  research  tool:  a  study  on   providing  search  suggestions,”  information  technology  &  libraries  31,  no.  4  (2012):  6–19,   http://dx.doi.org/10.6017/ital.v31i4.1930;  suzanne  chapman  et  al.,  “manually  classifying   user  search  queries  on  an  academic  library  web  site,”  journal  of  web  librarianship  7  (2013):   401–21,  http://dx.doi.org/10.1080/19322909.2013.842096.   6.     n.  j.  belkin,  r.  n.  oddy,  and  h.  m.  brooks,  “ask  for  information  retrieval:  part  i.  background   and  theory,”  journal  of  documentation  (1982):  61–71,  http://dx.doi.org/10.1108/eb026722;   christine  borgman,  “why  are  online  catalogs  still  hard  to  use?,”  journal  of  the  american   society  for  information  science  (1996):  493–503,  http://dx.doi.org/10.1002/(sici)1097-­‐ 4571(199607)47:7<493::aid-­‐asi3>3.0.co;2-­‐p;  karen  markey,  “the  online  library  catalog:   paradise  lost  and  paradise  regained?,”  d-­‐lib  magazine  13,  no.  1/2  (2007),   http://www.dlib.org/dlib/january07/markey/01markey.html.   7.     cochrane,  “subject  access—free  or  controlled?,”  275.  
 8.     svenonius,  “use  of  classification  in  online  retrieval,”  78–79.   9.     jasper  kaizer  and  anthony  hodge,  “aquabrowser  library:  search,  discover,  refine,”  library  hi   tech  news  (december  2005):  9–12,  http://dx.doi.org/10.1108/07419050510644329.   10.    kristen  antelman,  emily  lynema,  and  andrew  pace,  “toward  a  twenty-­‐first  century  library   catalog,”  information  technology  &  libraries  25,  no.  3  (2006):  128–39,   http://dx.doi.org/10.6017/ital.v25i3.3342.   11.    tod  olson,  “utility  of  a  faceted  catalog  for  scholarly  research,”  library  hi  tech  (2007):  550– 61,  http://dx.doi.org/10.1108/07378830710840509;  jody  condit  fagan,  “usability  studies  of     information  technology  and  libraries  |  june  2015     102     faceted  browsing:  a  literature  review,”  information  technology  and  libraries  29,  no.  2   (2010):  58-­‐66,  http://dx.doi.org/10.6017/ital.v29i2.3144.   12.    kathleen  bauer,  “yale  university  library  vufind  test—undergraduates,”  november  11,  2008,   accessed  september  9,  2014,   http://www.library.yale.edu/usability/studies/summary_undergraduate.doc.   13.    see,  for  example,  ben  shneiderman,  “the  future  of  interactive  systems  and  the  emergence  of   direct  manipulation,”  behaviour  &  information  technology  1  (1982):  237–56,   http://dx.doi.org/10.1080/01449298208914450;  ben  shneiderman,  “dynamic  queries  for   visual  information  seeking,”  ieee  software  11  (1994):  70–77,   http://dx.doi.org/10.1109/52.329404.   14.    see,  for  example,  aleks  aris  et  al.,  “visual  overviews  for  discovering  key  papers  and   influences  across  research  fronts,”  journal  of  the  american  society  for  information  science  &   technology  60  (2009):  2219–28,  http://dx.doi.org/10.1002/asi.v60:11;  furu  wei  et  al.,   “tiara:  a  visual  exploratory  text  analytic  system,”  in  proceedings  of  the  16th  acm  sigkdd   international  conference  on  knowledge  discovery  and  data  mining  (washington,  dc:  acm,   2010),  153–62,  http://dx.doi.org/10.1145/1835804.1835827;  cody  dunne,  ben  shneiderman,   robert  gove,  judith  klavans,  and  bonnie  dorr,  “rapid  understanding  of  scientific  paper   collections:  integrating  statistics,  text  analysis,  and  visualization,”  journal  of  the  american   society  for  information  science  &  technology  63  (2012):  2351–69,   http://dx.doi.org/10.1002/asi.22652.   15.    the  most  notable  exception  is  carrot2  (http://search.carrot2.org),  a  search  tool  that  will   automatically  cluster  web  search  results  and  display  visualizations  of  those  clusters.   16.    ben  shneiderman,  “the  eyes  have  it:  a  task  by  data  type  taxonomy  for  information   visualizations,”  september  1996,  accessed  april  27,  2014,   http://drum.lib.umd.edu/bitstream/1903/5784/1/tr_96-­‐66.pdf.   17.    ben  shneiderman,  “treemaps  for  space-­‐constrained  visualization  of  hierarchies:  including   the  history  of  treemap  research  at  the  university  of  maryland,”  institute  for  systems   research,  accessed  october  6,  2014,  http://www.cs.umd.edu/hcil/treemap-­‐history.   18.    to  explore  this  feature  in  our  catalog,  go  to  https://libweb.grinnell.edu/vufind/search/home,   do  a  search,  and  click  on  the  “visualize  results”  link  in  the  upper  right.   19.    
a recent project information literacy report found that the two aspects of research that first-year students found most difficult were "coming up with keywords to narrow down searches" and "filtering and sorting through irrelevant results from online searches." alison j. head, learning the ropes: how freshmen conduct course research once they enter college (project information literacy, december 5, 2013), http://projectinfolit.org/images/pdfs/pil_2013_freshmenstudy_fullreport.pdf, 15. book reviews the proceedings of the international conference on training for information work, rome, italy, 15th-19th november 1971, edited by georgette lubock. joint publication of the italian national information institute, rome, and the international federation for documentation, the hague; f.i.d. publ. 486; sept. 1972, rome, 510 p. let's face it: there is something about any proceedings that elicits a very personal reaction in many of us: "here are papers that either, a) got their authors a trip to the conference city; b) tell how we did good at our place; or c) unabashedly present h.b.i.'s (half-baked ideas)." i personally like proceedings that have many papers under category c); such papers make me think (or laugh). the great majority of papers in these rome proceedings fall basically under category b), i.e. 'how we done it good,' and some quite obviously under a), i.e. 'have paper will travel'; well, it was rome, italy, after all. however, there is a smattering of papers that fall under c), i.e. h.b.i.'s. so for those interested in the topic, these proceedings offer among other things some food for speculative thought. for these other things let us start at the beginning. the contents consist of prefatory sections, one opening address, sixty-six papers, a set of twenty brief conclusions, three closing addresses, a summary of work at the conference, an author index, and a list of participants and authors' addresses. the papers are organized according to two major sessions: one on "training of information specialists" (nine invited and forty-two submitted papers) and another on "training of information users" (six invited and nine submitted papers). the larger number of papers on training of specialists vs. training of users probably represents a good assessment of real education interests in the field. the conference was truly international: authors came from four continents, twenty countries, and four international organizations. most represented were: italy as host country with fifteen papers, usa with eight, great britain with seven, and france with six papers. the concern for information science education is indeed worldwide; however, if the presented papers are any measure, such education is in big trouble, because one is left with the impression that information science education is in some kind of limbo: the bases, relations, and directions are muddled or nonexistent. but then isn't all contemporary higher education in big trouble, and in limbo? the conceptions of what information science education is all about differ so widely from paper to paper that the question of this difference in itself could be a subject of the next conference. it is my impression that the differences are due to a) widely disparate preconceptions of the nature of "information problems," and b) incompetence of a number of authors in relation to the subjects.
accomplishments in some other field or, even worse, a high administrative title does not necessarily make for competence in information science education. the proceedings offer a fascinating picture of information science education by countries and by various facets. they also offer frustration due to unbelievably unhygienic semantic conditions in the treatment of concepts, including a confusion from the outset of "training" and "education." the first business of the field should be toward clearing its own semantic pollution; such a conclusion can be derived even after a most cursory examination of the papers. my own choices for the three most interesting papers are: v. slamecka and p. zunde, "science and information: some implications for the education of scientists" (usa); s. j. malan, "the implications for south african education in library science in the light of developments in information science" (south africa); and w. kunz and h. w. j. rittel, "an educational system for the information sciences" (germany). the editing of the proceedings is exemplary; the editors and conference organizers worked hard and conscientiously. the proceedings also provide the best single source published so far from which one could gain a wide international overview not only of information science education but also of information science itself, including implicitly the problems the field faces. in this lies the main worth of the proceedings. tefko saracevic computer processing of library files at durham university; an ordering and cataloging facility for a small collection using an ibm 360/67 machine. by r. n. oddy. durham, england: university library, 1971. 202 p. £1.75. the task of the book is to guide the reader in the use of the lfp (library file processing) system developed by the durham university library. the lfp system orders items and prints book catalogs in various sequences for a small collection of items with the aid of an electronic digital computer. the system is batch with card input and printed output; the programs are written in pl/1. "the lfp system was designed to be flexible and easy to operate for small files, and is less suitable for files larger than 10,000 items because there are then other problems which it does not attempt to solve." (p. 10). the book fulfills its assigned task well; it is an excellent example of explanations and instructions for the personnel charged with the day-to-day operations of the particular system described. the book includes excellent introductory chapters on job control language, how computers operate, file maintenance, etc. outside of the durham university library, however, the book has little use except as a model of a well done operations guide. kenneth j. bierman isad ad hoc committee reports. introduction: after seven years of operation as a division, it is an appropriate time to take stock of the division objectives and to describe desirable future activities of the division. to this end we established in july 1972 four ad hoc committees whose charges were to provide overviews of the division from slightly different perspectives. the first of these, the committee on objectives, was to review the activities of isad since its founding and to recommend future objectives and general activities.
second, the seminar and institute topics committee was charged with reviewing past isad-sponsored institute activities and with recommending topics for future seminars and institutes which would be of most value both for the isad membership and for general library personnel. the third committee, research topics, was charged with assembling a priority list of isad-related research and development needs of libraries. and, the fourth committee, membership survey, was charged with determining by a survey important characteristics of the current isad membership, i.e., their employment, experience, education, interests, and expectations. it can be seen from their charges that there is an interrelationship among the committees and thus, in the work to meet their charges, the committees would certainly wish to know the results of the work of the other committees. however, because of problems of timing and communication, it was recognized that the initial committee work would have to be carried out largely independent of the work of the other committees and that the task of knitting these committee results together would have to be carried out after the committees finished this stage of the work. the report of the committee on objectives was reviewed and accepted by the isad board of directors at the national meeting in las vegas. the work of the seminar topics committee and the research topics committee has not been reviewed by the board of directors. they are presented here in order to elicit thought and comment by the isad membership. the membership survey committee has established the survey to be carried out, but the survey has been delayed until funding could be obtained. this funding has been made available and analysis of survey results will be provided to the membership in the coming year. i should like to thank not only members of the ad hoc committees for their contributions to the division during the past year, but also, the members of the standing committees, the representatives and the chairmen of the discussion groups, all of whom contributed to an active, interesting and useful year for the division.-ralph m. shoffner, past-president, information science and automation division.

report of the committee on objectives

background

on january 27, 1966, the council of the american library association voted to establish ala's fourteenth division, the information science and automation division, which would concern itself, said the council, "with the development and application of electronic data processing techniques and the use of automated systems in all areas of library work." before this date, there was no membership unit within ala with the sole responsibility for library automation, so there was no effective way for librarians involved with automation to communicate, or to learn from each other's experiences. there was also no way for the national professional association to provide leadership for those in need of information and guidance in the field, since responsibility for the area (to the extent it had been recognized at all) lay fragmented among various units within ala.
two of the most important objectives of the division at the beginning, therefore, were the establishment of professional leadership at a national level (largely through the office of the executive secretary) to those libraries and librarians needing advice and help in the field of library automation, particularly smaller libraries unable to afford competent staff members with assigned responsibility in this area; and provision of a forum for discussion of library automation problems and experiences, and other means of communicating information in the field. a number of specific activities were also suggested at the beginning, including the following:
1. the establishment of a journal which would pull together articles on library automation, which at that time were appearing in many different places;
2. provision of a clearinghouse for information on library automation projects;
3. creation of a "bank" of computer programs and related documentation for use by other libraries planning similar applications;
4. evaluation of library automation equipment and applications;
5. tutorial seminars, preconference meetings and other educational programs.
other concerns or matters suggested to the ala committee on organization for immediate attention by the new division included:
1. standardization of coding systems;
2. interlibrary distribution of bibliographic data in machine-readable form;
3. shared programming;
4. establishment of library communication networks;
5. automated searching techniques;
6. compatibility of equipment and programming;
7. use measurement and user studies involving automated systems;
8. the financing of cooperative automation projects; and
9. various social and legal problems relating to automation in libraries.
during the first few years of the division's existence, most of these activities were pursued, some of them extensively (the seminars, for example) and others for only a brief period of time, until the need no longer seemed to exist. early committees also considered the need for a special library programming language, the computer aspects of the new copyright bill (then being drafted), and the method of designating periodical publications known as coden.

the committee on objectives

in the spring of 1972, ralph shoffner, the incoming president of isad, appointed an ad hoc committee on objectives to consider and present formal recommendations regarding the future objectives and activities of isad. part of the thought behind appointment of the committee was that since isad had been in existence for five years it would be an appropriate time to review both its activities and objectives, and even to consider whether the division should continue to exist; it also seemed likely that some of the original objectives of the division might have been attained, or might no longer be appropriate, and a revised set of objectives might be needed for the forthcoming years. besides this writer, the committee has included frederick g. kilgour of the ohio college library center; henriette avram of the library of congress; john mcgowan of northwestern university; john knapp of richard abel & company; joseph treyz of the university of wisconsin; and pauline atherton of the school of library science at syracuse university.
the membership of the committee was intended not only to incorporate knowledge of isad's history and informed judgment of the need for various possible activities in the future, but also to be as representative as possible of the various types of library activity and the types of institutions presumably served by the division.

first meeting and tentative conclusions

the committee held its first meeting on wednesday, june 28, 1972 during the ala annual conference in chicago. after considerable discussion, it was clear that the consensus of the committee members, based on their own experiences and the comments of colleagues to whom they had talked, was that:
1. the education function has been important, and should be continued, especially the seminars, which have been very useful (at least 2,000 people have attended the marc institutes co-sponsored by isad and the library of congress, and hundreds more have attended seminars on other topics at local, regional and national meetings);
2. the journal of library automation and technical communications have also been useful, and should not only be continued but improved and expanded;
3. the executive secretary provides a focus for inquiries from libraries needing help and advice, and has provided much useful information to many libraries and librarians;
4. the idea of a computer program "bank" is not practical at this time, largely because of the costs to ala in terms of staff and operating expenses that such a project would entail;
5. evaluation of equipment and applications is still needed, but would require more staff and expertise than the association can fund at this time;
6. the division has provided useful forums for discussion, particularly such forums as the marc users group which meets twice a year to exchange ideas and ask questions of staff members from the marc office at lc;
7. the division has also served a useful function in the promotion and development of standards, and in determining whether the standards have broad support in the library profession;
8. the division has helped to coordinate the concerns of other parts of ala regarding automation and information science, both through the executive secretary's office and through such devices as the preconference institute on data bases.
the general but tentative conclusion was that the division should continue, but with a revised set of objectives, emphasizing the educative function, the provision of current information through various means, and the development and promotion of standards.

requests for comment and responses

the committee then decided to test these tentative conclusions by asking the opinions of a number of librarians and others who appeared to be in a position to judge the effectiveness of isad (or the lack of it), or who could be expected to reflect a useful variety of viewpoints. each was sent a letter outlining the purposes of the committee and the tentative conclusions, followed by a request for written comments; each was also invited to attend the next meeting of the committee in january 1973 at the midwinter meeting in washington. the responses by and large agreed with the tentative conclusions, but in many cases went further.
for example, it was suggested that the division should take a more active stance, and should emphasize such things as making services available (especially through cooperation) that could not be made available before, reducing library operating costs, making the work of library staff more meaningful, minimizing the impact of library automation on people and jobs, encouraging research in library automation, and encouraging instruction in library automation for all students in library schools. it was also suggested that more emphasis be placed on reporting new developments at the annual meetings, after the model of the american society of physics, and that the division should be a vehicle for a "new librarianship" in which librarians would participate more fully in education and research rather than acting solely as a "service" organization. another respondent commented that more active liaison was needed with other professional groups in the information processing field, e.g., asis, afips, jcet, etc.; that transferability of programs and applications should continue to be encouraged; and that library schools should be encouraged to offer more courses in library automation. many of those who received requests for comment accepted the committee's invitation to attend the midwinter meeting in washington, and added other points of view. among them were suggestions that isad should do more toward developing the utilization of machine-readable data bases, including the production of indexes to such data bases; foster and encourage computer-based library networks; and promote methods of accountability for librarians and libraries through development of techniques for measuring unit output of products and services.

information technology

one new and large area of activity for isad was also formally suggested at this meeting: leadership in, and organizational responsibility for, audiovisual and related educational technologies, including cable television. the origins of this suggestion date back some months before the washington meeting, and since adoption of this suggestion would entail a major enlargement of isad's area of responsibility it may be useful to provide the background. at present, ala has an audiovisual committee established during the reorganization of the late fifties, plus five divisional subcommittees officially tied to the audiovisual committee, and approximately nineteen other committees actively concerned with audiovisual matters but with no official connection to the ala audiovisual committee. this structure has not provided a "home" for media specialists or a focus for their interests and activities, and ala has been criticized for years for its sporadic and disjointed attempts to give proper attention to audiovisual matters. in 1971, don s. culbertson, then executive secretary of isad, expressed formal concern over this issue to the isad board and proposed the establishment of an educational technology section within isad. following this, isad and the ala audiovisual committee agreed to a discussion group within isad to determine the extent of interest and need for such a membership "home." the isad information technology discussion group, formed as a result of this initiative, met first on june 28, 1972, at the chicago conference.
those present discussed areas of need and alternative organizational approaches, including the present organization of committees and subcommittees; the present organization, but with all committees reporting to the ala audiovisual committee; a round table; and affiliation with isad. becoming a part of isad seemed to this group to offer the most advantages: isad was an established division, capable of accommodating diverse interests, and with many concerns in common with the media specialists; it was already active in many areas of interest to media specialists, particularly standards development and educational programs; and affiliation would offer access to the two divisional publications, jola and jola technical communications. louise giles, leader of the discussion group, and pearce grove, chairman of the ala audiovisual committee, then met with the isad board and requested formal affiliation. the board referred the group to the committee on objectives, and on january 30, 1973, mrs. giles, mr. grove, and others interested in this request attended the midwinter meeting of the committee. discussion of this group's petition merged with the discussion of other activities of the division, as reported above, but the committee was unanimous in feeling that the group's request should be granted, and that the activities of isad should include audiovisual and related educational technology.

recommendations

as a result of its deliberations, and based on the comments and testimony of many interested and affected ala members, the committee now recommends the following:
1. that the division continue to exist;
2. that its area of responsibility include audiovisual and related educational technology;
3. that the objectives of the division be:
a. advancement of the state of the art of librarianship via research in, and application of, information science and library automation;
b. professional leadership at the national level in the fields of information science and library automation;
c. education and communication of information in these fields for libraries, librarians, and other interested parties;
d. provision of expertise and assistance in these fields to other units of ala, and to other professional organizations;
4. that among other activities the division pursue, or continue to pursue, the following:
a. publication of the journal of library automation, technical communications, and other publications that may from time to time appear necessary, appropriate, and feasible;
b. provision of forums for communication of information in its fields of responsibility, including local, regional, and national seminars, institutes, regularly scheduled meetings, discussion groups, and special programs on specific topics;
c. promotion of the development and use of appropriate standards;
d. investigation, largely through committees, of matters of immediate, even though temporary, concern to the profession, within the division's area of responsibility;
e. encouragement through various means of techniques, approaches, and specific activities outside the division which are desirable for the profession (such as computer-based library networks, increased instruction in library automation and information technology in library schools, cost-effectiveness in automated library systems, and development of the utilization of machine-readable data bases);
f. liaison with other professional organizations in its fields of responsibility.
respectfully submitted, pauline atherton, henriette avram, frederick g. kilgour, john knapp, john mcgowan, joseph h. treyz, stephen r. salmon, chairman

report of the committee on research topics

introduction

we take it as axiomatic that the fundamental purpose of library automation is to increase the productivity of people who work in libraries. a necessary concomitant of this is that we must know what it is that people who work in libraries do. the question of what they should do is a deeper question, but not one that experts in automation are likely to shed much light on. what is new in the elicitation of the contents of library work is the need for specification at a level of detail not needed when one must only specify for human direction. at the first, most elementary, level much of this has now been done. at least 25 or 30 major academic and public libraries have well-tested, working systems in operation and perhaps as many as 100 more have significant operations underway. the fact that some operations have failed outright, or fallen far short of promise, or been far too expensive, does not detract from the fundamental fact that a number of promising, economic systems are now in place and can be replicated throughout the library world at will. in this category we would place book catalog systems, basic circulation and acquisitions systems, and, more recently, card catalog production systems. such performance would not have been possible without great attention to detail and without the accumulation of a more precise understanding of how librarians work on these tasks. the question at hand is, given the developments of the past five years, where do we go from here? at least a portion of the answer lies in the observation that one of the by-products of the initial phase of automation is the creation of substantial data bases in machine-readable form which can be used to provide greater insight into the next higher level of library work. analysis of this unprecedented mass of data on library activity must therefore be placed high on any priority list of future activity. we shall consider some of the possibilities in traditional terms by enumerating some of the problems in three broad areas of library work: acquisitions, cataloging, and circulation.

acquisitions

the fundamental problem in acquisitions is the problem of selectivity. some rough estimates may help put things in perspective: although we know of no formal estimates of the total number of monographs residing in archives in, say, the english speaking world, it must surely be in excess of twenty million volumes. we do know from the u.s. office of education that the median-sized university library has something in the order of 750,000 volumes; thus with the possible exception of the library of congress, the british museum, and (literally) a handful of major university and public libraries there is little hope of ever obtaining a "complete" collection. the problem is not new and librarians have grappled with it for decades-if not centuries-with varying degrees of success. the several faces of the problem can perhaps best be seen by examining the questions posed by the several members of this committee:
1. how can libraries, with a minimum expenditure of time and money, determine how well they are serving their intended audiences?
2. what methodologies exist, or can be created, for dividing collection responsibility among members of a library consortium?
3. is it possible to develop criteria which libraries of varying size can use to identify not just the subject matter of materials to be purchased, but actually place priorities on serial titles, perhaps some monographs, government documents, etc. which should reside in a given library?
if we presuppose the existence of automated circulation, acquisition, and cataloging systems-with accompanying statistical packages to simplify the routine processing of machine-readable data-the following sorts of studies might help to illuminate these questions:
1. detailed statistics about who is borrowing what kinds of books, and how these borrowing patterns change within a year and from year to year, provide useful information about one aspect of library service. acquisitions data on the number and types of books ordered or suggested by patrons and the turnaround time necessary to obtain such items, and comparative studies of books obtained at patron request versus books obtained by other means, may be helpful.
2. if two or more libraries are to act together in planning their acquisitions, it seems reasonable to suggest that each library should first carefully describe its collection and borrowing pattern at a rather detailed level. even libraries not formally allied in a consortium would do well to publish more information about their holdings to allow individual users to better judge which of the several libraries available to the user is most likely to contain the desired information.
3. better circulation data cannot help but be useful in selecting those categories of publications that are likely to circulate well. periodic examination of citation indexes can shed insight into the problem of selecting periodicals. individual monograph selection might be improved by accumulating circulation data across a number of similar libraries to provide a "best-circulator" list to go along with the "best-seller" lists already available.
many such studies have been made in the past. the joint availability of computers and machine-readable records of transactions makes it rather inevitable that their number will increase-with or without the benefit of further research. given this, it seems reasonable to suggest that in addition to a frontal attack on the problem of selectivity in acquisitions, there should be a substantive effort towards improving the statistical methodology involved in analyzing the available data as well as an attempt to make more readily available those statistical techniques that have already been shown to have application in this area.

cataloging

if the key word in acquisitions is "selectivity," the corresponding term in cataloging is "access." some of the earliest automation efforts in libraries were directed to the production of machine-readable catalogs and associated programs to produce from such data bases printed book catalogs. because of the large cost in converting the retrospective catalog, such efforts were largely limited to small collections and had, perhaps, their best application in public library systems where multiple copies of the book catalog had obvious application. in such applications, access was enhanced by the reduced cost of multiple copies which enabled the system not only to maintain complete catalogs at each location but also to make copies available to neighboring institutions such as local school districts.
more recently, the coming into being of the marc data base has led to the creation of regional and national services for the more rapid creation of catalog card images. in these applications, the primary improvement in access is that provided by time gains that make it possible to get items on the shelf faster; however, some of these services, e.g. oclc, provide useful by-products such as the creation of an on-line union catalog of the various libraries using the system, thus facilitating interlibrary loan services. similarly, some automated circulation systems provide increased access to holdings through the use of on-line and/or telephone access to author and/or title catalogs. (ohio state university, with one of the more sophisticated systems, notes that in each of the first two years of operation, circulation increased 15 percent-considerably more than the growth of the campus community.) computers have also been used to multiply the number of available orderings of a shelflist, increase access to titles by permuted title lists, develop use of citations through citation indexes, and increase subject access through accumulation of book indexes in consolidated volumes; in many cases access is further increased by publication-and multiplication of the access through multiple copies. extensions and refinements will almost certainly continue through the coming decade. what research, then, is necessary? the following topics seem worthy of consideration:
1. develop measures of cost effectiveness of various access systems, particularly with regard to the relative merits of on-line, telephone, and printed copies of access systems.
2. systematically enumerate the various types of information requests placed on libraries to obtain a better understanding of what libraries can do to supply needed information as well as documents and bibliographic references.

circulation

some of the most successful automation efforts have been obtained in circulation control, together with some of the most useful by-products. as costs come down, it seems natural to hope that similar usage information might be made available in the noncirculating areas. specifically:
1. determine the feasibility of entering records of replacement of documents that were removed from the shelf, but not borrowed, into the existing circulation system.
2. develop economic means for entering information about "information requests" into the same system.

general

in addition to those requests that have been more, or less, accumulated under the traditional headings, we would like to present the following general recommendations for future library studies:
1. determine the needs in terms of coordinated planning, cooperation, and hardware and software transferability which should be confronted before the fact, rather than after, as more and more regional operations take shape.
2. determine how libraries can develop the problem-solving and idea-producing capabilities of their staffs to the maximum.
3. develop a continuing education program for librarians covering isad-related topics that can be presented throughout the u.s. at a reasonable price per person per class.

summary

the first phase of library automation required a significant effort to develop information on the basic chores of running a library.
as the studies moved from the testing ground to full operation they have started to generate a significant amount of information that can and should be exploited in turn to determine how library operations can be improved. such "research" will generally tend to be "applied" rather than "pure." they will tend to concentrate on cost-effectiveness at least as much as on novelty. the studies themselves need not be-and almost undoubtedly should not be-multimillion dollar efforts. although hardware and software developments will no doubt occur, the concentration appears to be more on planning and evaluating existing and proposed methods rather than on system "breakthroughs." respectfully submitted, don l. bosseau, michael d. cooper, douglas ferguson, james l. dolby, chairman

committee on research topics: members reports

. . . concerning isad-related research and development needs, my first instinct was to try to think of needs occurring in hardware and software areas, but the influence of my new administrative duties has caused me to forego my old interest in systems work, by suggesting projects that are more planning oriented than technical. (1) determine the needs in terms of coordinated planning, cooperation, and hardware and software transferability which should be confronted before the fact, rather than after, as more and more regional oclc type operations take shape. with centers now on the drawing boards for texas, one in the southeast at atlanta or possibly tulane university, the california state university and colleges project, the university of california bibcenter, etc., it is possible that many of the duplications of effort of the past (involving individual libraries) could take place again, only on a larger scale. of course, this gets into "networks," but there is a similarity with the problems posed in the past which were due to incompatibilities between individual library automation efforts, and the potential for the same problems occurring with the oclc type of operations. again, standard formats and other recent accomplishments will help alleviate the magnitude of the problems. (2) with the implementation of a growing number of automated circulation systems, inventory control and processing systems, and the combined effects of tightening funding and inflated costs of library materials, libraries will, for the first time, have both a compelling reason to tighten up their collection development mechanism while also having available some of the statistical data required to determine the scope of material being collected. thus a library could scientifically direct the nature of its acquisitions. specifically (though perhaps not clearly) i am suggesting that there is a need for research to study and perhaps develop criteria which academic libraries of varying sizes can use to identify not just the subject matter of materials to be purchased, but actually place priorities on serial titles, perhaps some monographs, government documents, etc. which should reside in a given library. such a set of criteria can be developed using circulation use statistics, statistical analysis of citation indexes, knowledge of the number of volumes and value of materials being published by subject areas, and perhaps other factors.
using statistics derived from citation indexes, as an example, we evaluated our journal collection in several subject areas in an effort to determine whether we were deriving maximum benefit in terms of cost and coverage from our existing journal collection. to do so we utilized a recent article by eugene garfield ("citation analysis as a tool in journal evaluation," science, 178:471-79, 3 november 1972) which ranked journals by frequency and impact of citations. it was interesting to note that in some areas we were providing effective coverage of the literature for our faculty and research programs, whereas in other areas our collection was picking up only 1 percent to 2 percent of the useful information as evaluated on the basis of frequency of citation. of course what i am proposing is to do research into how this type of information can be stored and extracted, and evaluated to provide general academic libraries with scientifically based guides upon which to base their acquisitions programs. (3) where should libraries go with automation after inventory control, internal housekeeping functions, and mis activities become routine? in other words, if there is going to be a phase iii (?) in library automation, where will its emphasis be, or better yet, where should it be? i hope that this will provide at least something to think about and perhaps lead to a more clearly defined set of research topics.-dlb

. . . a number of research topics seem to me worth exploring. however i have no idea whether they fall within the scope of isad. (1) economics of depository storage facilities. should twelve million volume libraries be built, or should we have secondary storage facilities? (2) methodologies for dividing collection responsibility among members of a library consortium. (3) how to predict the usage of individual monographs, not classes of material. (4) develop a continuing education program for librarians covering isad-related topics. present throughout the united states and at a price less than $10 per person per class. (5) undertake a study to determine a new editorial policy for the journal of library automation. my personal view would be that jola should move away from system descriptions and toward research topics.-mdc

libraries require research that will develop the capacity to improve existing operations and respond innovatively with new services, programs, and products. this requires applied research that produces immediately usable tools that can be applied by a library staff with a minimum of outside or specialist help. the focus is on the library staff at all levels (managers, supervisors, librarians, support personnel) and the aim is to enhance their ability to identify problems and formulate solutions within realistic constraints. the only way to move from the generality of these considerations to the concreteness of what i consider important is to state the questions to which applied research should address itself. (1) how can libraries do more to supply needed information as well as documents and bibliographic references? (2) how can libraries, with a minimum expenditure of time and money, determine how well they are serving their intended audience(s)? (3a) how can libraries develop the problem-solving and idea-producing capabilities of their staffs to the maximum? (3b) how can a library staff develop a cost-consciousness combined with an aggressive approach for funds for projects with demonstrable results?
comments

it should be clear that these questions assume that libraries can better serve their patrons if existing staff skills are developed at the same time as the library becomes more actively involved with those it serves. technological development and multimillion dollar research are not needed. managerial and staff cookbooks, library-based demonstration projects and on-the-job training programs are needed. such tools and activities would help develop the flexible, mobile, and aggressive counterpart to the library's equally important passive, conserving, and stabilizing role. for example the proposed research might have the following kinds of results. it might show how library managers (directors etc.) use noneconomic rewards in the library work system to foster cost control and reduction, and new service ideas and opportunities to get in touch with library users. research might produce a how-to-do-it manual on user studies that might show how to use existing data or quickly gather data, perhaps on a sampling basis, to evaluate performance of a service or operational unit.-df

report of the committee on seminar and institute topics

the committee

the mission of the ad hoc committee as defined by the committee with the concurrence of ralph shoffner, president of isad, was: "to propose a plan for a program of seminars and institutes within the interests and educational needs of the information science and automation division (isad) of the american library association. the plan should cover the period commencing july 1974 and ending june 1978." the members of the committee are: pauline atherton (syracuse university), brett butler (information design, inc.), jay cunningham (university of california-berkeley), paul fasana (new york public library), diana ironside (ontario institute for studies in education), sue martin (harvard university), ron miller (new england library information network), elaine svenonius (university of denver).

the approach

the committee began its work by placing the fulfillment of its mission firmly within the constraints of one major working assumption: that the educational function which the plan should serve must be directed toward accomplishing the objectives and needs of the isad membership. three parallel activities were undertaken as a result of our adherence to that assumption. first, a review of the extant data which resulted from seminars and institutes held by isad over the past several years was undertaken in order to identify characteristics of success or inadequacy which could be helpful to the committee. second, the deliberations of a parallel group, the isad committee on objectives, were obtained to provide the context within which the plan will function. third, the chairman of the isad membership survey committee was interviewed for the purpose of including questions pertinent to seminar and institute topics in the proposed survey. the remaining resources, external to the committee, were combined with the considerable experience and insight of the committee members. this experience includes continuing education techniques, information science research and education, automation of libraries and information services, computer and graphic technology, library cooperation and institute planning. some members play important roles in isad conference planning.
it was hoped that the combination of these resources could then be directed toward the intriguing prospect of developing some reasonable prescience about how technology may be applied to libraries and information science activities during the period ending june 1978. the result of the first activity-a review of the historical data about isad seminars and institutes-is discussed below, followed by a series of recommendations formulated by the committee within the context of the objectives proposed and accepted by the isad board of directors.

seminar and institute activities: historical

table 1 summarizes some of the data available to the committee about thirty-two seminar and institute programs held since 1968:

table 1. isad institutes 1967-1972.
dates | programs | location | attendance
1967 june | state of the art of library automation | san francisco | 800
1968 june | automated circulation systems | kansas city | 600
1968 july | marc institute | seattle | 94
1968 aug. | marc institute | denver | 99
1968 sep. | marc institute | new york | 112
1968 oct. | marc institute | chicago | 128
1968 nov. | marc institute | boston | 126
1968 dec. | marc institute | atlanta | 123
1969 feb. | marc institute | cleveland | 120
1969 mar. | marc institute | los angeles | 120
1969 mar. | marc institute | honolulu | 52
1969 apr. | marc institute | houston | (100+)?
1969 sep. | marc institute | san francisco | 208
1970 jan. | tutorial on library automation | washington, d.c. | 106
1970 mar. | marc institute | washington, d.c. | 167
1970 mar. | tutorial on library automation | seattle | 79
1970 apr. | marc institute | evanston, il | 92
1970 apr. | tutorial on library automation | cambridge, ma | 107
1970 may | tutorial on library automation | new york | 156
1970 june | marc institute | cambridge, ma | 105
1970 oct. | tutorial on library automation | philadelphia | 123
1970 nov. | tutorial on library automation | san francisco | 72
1970 dec. | library automation for school libraries | dallas | 38
1971 feb. | library automation for school libraries | atlanta | 80
1971 mar. | tutorial on library automation | elgin, il | 77
1971 apr. | marc institute | new york | 100
1971 may | administration & management | dedham, ma | 54
1971 nov. | directions in information science education | denver | 67
1972 feb. | marc institute | washington, d.c. | 139
1972 may | microforms in library automation | new york | 53
1972 june | administration & management | new york | 52
1972 sep. | seminar on telecommunications | washington, d.c. | 110
total seminars/institutes: 32; estimated attendance: 4,460

by the end of september 1972, thirty-two seminars or institutes had been held across the united states from boston to honolulu. fifty percent of the sessions were concerned directly with the exposition and use of either the marc i or marc ii communications formats. the remaining seminars dealt with topics such as the introduction to library automation in general and to school libraries in particular, as well as library automation management, information science education, micrographic and telecommunications technologies. as a group, the marc-related institutes reached approximately 1,900 people (not allowing for multiple attendance), or 43 percent of the total attendance over the five-year period. the most heavily attended institutes were those held either immediately before or after annual conferences of ala or asis. the northeast region (boston area, new york, philadelphia, and washington, d.c.)
hosted fourteen institutes; the midwest region (chicago area, cleveland, and kansas city) hosted five; the southwest (denver, houston, and dallas) hosted four; the west coast (los angeles and san francisco) also hosted four; the northwest (seattle) and southeast (atlanta) regions each were the site of two institutes. one institute was held in honolulu, and another was canceled. if pre- and post-conference institutes are excluded, the arithmetic mean attendance at a typical institute was approximately 98. the range was from a low of 38 (library automation for school libraries in dallas, december 1970) to a high of 167 (marc institute in washington, d.c., in march of 1970). it is risky to conjecture why any single institute was more successful than another purely on the basis of attendance. attendance figures are probably more nearly a function of the expectations of the target population, the magnetism of the topic, and how much physical and financial investment is required to attend. there is little doubt, however, that the best attendance can be obtained if seminars are held immediately before or after an annual meeting of ala or isad in the same city. some post-institute evaluative data, also provided by isad headquarters, seemed to indicate that when dissatisfaction with institutes was registered by participants, it was derived from the diverse backgrounds of participants who attended institutes. for some, the content was too elementary; for others, it was too advanced. some attendees were technologists while others were managers. both large and small libraries of all types were represented. another factor relating to the apparent effectiveness of a particular institute was the perceived competence or teaching ability of particular institute faculty members. these factors do not reveal themselves as startling revelations, but do imply that some care should be used to describe and reach particular target populations, and to select instructional leaders. another important fact: between two-thirds and three-quarters of the participants of a typical institute were not members of isad. it is readily apparent, therefore, that these programs have reached an audience drawn from a broad segment of the library professional community. this condition is no doubt a good one, but it may imply a differential registration fee structure as well as a fruitful context in which to recruit new division members. with these considerations of the past in mind, the committee offers the comments and recommendations which follow. it should not be assumed that these recommendations are necessarily endorsed unanimously by the committee members.

seminar and institute activities: recommendations

1. the isad committee on seminar and institute topics and the isad committee on objectives fully concur that the seminar/institute program has been valuable as a division activity in service of the profession. the committee therefore recommends that a program of seminars and institutes be continued and broadened by the division during the next five years.
2. a history of fruitful conjoining of the division's seminars and institutes program with the library of congress, the american society for information science and other institutes and associations has been mutually beneficial.
the committee therefore recommends that a conscious effort be made to continue the division's seminars and institutes program in cooperation with appropriate activities within the library of congress and other federal library-oriented activities (such as the federal library cooperative center), and expand its target population and faculties to include members of asis, acm, iia, afips and jcet.
3. the geographic distribution of previous seminars and institutes based upon population concentrations has appeared to be reasonably sound. the committee therefore recommends that the division's seminars and institutes program continue to disperse and replicate seminars among six geographic regions: the northeast, southeast, midwest, southwest, west and northwest. the gulf coast area should be considered as a central location for programs which the division may co-sponsor with the southeastern library association and the southwestern library association. programs occurring in the northern half of the continental united states should coordinate with our canadian counterparts.
4. technological advances which may be applicable to the concerns of the division are developing with astonishing speed. the committee therefore recommends that an "alert group" be formed from technically aware people for the purpose of formulating topics which should be considered by the planning group assigned to monitor the division's seminars and institutes program. a secondary source of topical input to a planning group should be a "topic alert form" included in the division's publications.
5. planning and implementing a series of seminars and institutes is a severe time burden on volunteers. the committee therefore recommends that the reliance upon voluntary workers to design and implement seminars and institutes should diminish, the major burden of such work passing to staff responsible for this work at ala headquarters. the staff salary should be substantially reimbursed from income derived from the program. further, subcontractors whose products are educational programs, as opposed to systems, hardware or supplies, should be considered as supplemental manpower to both voluntary committee work and isad staff investment in particular seminars or institutes.
6. commercial interests may have particular products to sell to the isad community and view the division's seminars and institutes program as a channel through which their products can be marketed. the committee therefore recommends that a policy be formally adopted which permits the participation of representatives of commercial enterprises in the seminars and institutes program under the condition that appropriate competitors have equal and simultaneous opportunity to offer alternative products or points of view. the division must maintain its professional distance from any one technological solution to a particular set of problems.
7. dissatisfaction on the part of some institute participants has been attributed to (1) participant heterogeneity, (2) misplaced expectations, (3) inadequacies in seminar faculties. the committee therefore recommends that the target populations be clearly defined and subgroups be given special attention; that expectations be clearly spelled out in publicity announcements and that faculty members be selected with care.
8. the most useful measure of success in any educational program is purported to be evidenced by some behavioral or attitudinal change on the part of the "educatee" occurring after a defined educational experience. the committee therefore recommends that, where possible, evaluation techniques be used on a sampling basis to obtain such evidence after a period of time has elapsed beyond the occurrence of the seminar experience.
9. a limited number of people have been able to attend particular seminars because of the barriers of time, cost, and space. the committee therefore recommends that a supplemental program using edited audio or video media be attempted to promulgate institute segments in shortened time frames.
10. several colleges and universities are committed to continuing education in information science and library automation. the committee therefore recommends that, when appropriate, the talent of selected graduate schools be considered as resources for faculty and for program planning. one prerequisite to maximize the success of this type of alliance is evidence that the objectives of the isad seminars and institutes program are coincident with the objectives of the continuing education programs of such graduate schools.
11. information science has not received much attention per se within the seminars and institutes program, and the committee regards this state of affairs as insufficient to attain the objectives of the division. the committee therefore recommends that a working definition of information science be promulgated by the division to provide guidance for planning seminars and institutes for its practitioners. in this connection, close liaison with asis and special interest groups in other organizations is a natural procedure to follow.
12. the technologies which appear to have a growing impact upon library and information center operations and services are: computers, micrographics, media systems, and telecommunications. the committee therefore recommends that the division's seminars and institutes program planners seek to provide a series of tutorials not only on the operations and applications of these technologies taken separately, but also on the inter-relationships which are possible among them.
13. based upon suggestions contributed by the committee members and others, the following topical areas are recommended for consideration in planning for future seminars and institutes.
a. interlibrary cooperation
1. technological options
2. how to choose a network
3. variant forms of marc (e.g., serials, music, etc.)
b. technology
1. the interrelationships of computers, micrographics, instructional media systems and telecommunications
2. mini-computers
3. telecommunications
4. non-print audio and visual technology
5. techniques of data base storage, search and retrieval: trade-offs
6. the emergence of commercial jobbers in support of library automation
c. internetwork compatibility
1. network transferability
2. data base interchange
3. methods of interconnection through telecommunications
4. standards and protocols
d. management and people
1. impact of automation on people and jobs
2. automation, networks and educational needs of librarians
3. contract negotiation
4. problem solving for library managers
5. the impact of nonlibrarians on library automation and information science
6. the theory and application of interinstitutional cooperation: quid pro quo revisited
7. why automation projects fail.
respectfully submitted, pauline atherton, brett butler, jay cunningham, paul fasana, diana ironside, sue martin, elaine svenonius, ronald miller, chairman

harvesting information from a library data warehouse

siew-phek t. su and ashwin needamangala

data warehousing technology has been defined by john ladley as "a set of methods, techniques, and tools that are leveraged together and used to produce a vehicle that delivers data to end users on an integrated platform."1 this concept has been applied increasingly by industries worldwide to develop data warehouses for decision support and knowledge discovery. in the academic sector, several universities have developed data warehouses containing the universities' financial, payroll, personnel, budget, and student data.2 these data warehouses across all industries and academia have met with varying degrees of success. data warehousing technology and its related issues have been widely discussed and published.3 little has been done, however, on the application of this cutting edge technology in the library environment using library data.

motivation of project

daniel boorstin, the former librarian of congress, mentions that "for most of western history, interpretation has far outrun data."4 however, he points out "that modern tendency is quite the contrary, as we see data outrun meaning." his insights tie directly to many large organizations that long have been rich in data but poor in information and knowledge. library managers are increasingly recognizing the importance of obtaining a comprehensive and integrated view of the library's operations and the services it provides. this view is helpful for the purpose of making decisions on the current operations and for their improvement. due to financial and human constraints for library support, library managers increasingly encounter the need to justify everything they do, for example, the library's operation budget. the most frustrating problem they face is knowing that the information needed is available somewhere in the ocean of data but there is no easy way to obtain it. for example, it is not easy to ascertain whether the materials of a certain subject area, which consumed a lot of financial resources for their acquisition and processing, are either frequently used (i.e., high rate of circulation), seldom used, or not used at all, or whether they satisfy users' needs. as another example, an analysis of the methods of acquisition (firm order vs. approval plan) together with the circulation rate could be used as a factor in deciding the best method of acquiring certain types of material. such information can play a pivotal role in performing collection development and library management more efficiently and effectively. unfortunately, the data needed to make these types of decisions are often scattered in different files maintained by a large centralized system, such as notis, that does not provide a general querying facility, or by different file/data management or application systems. this situation makes it very difficult and time-consuming to extract useful information. this is precisely where data warehousing technology comes in.
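the kind of cross-file question raised above (how circulation differs between firm-order and approval-plan titles) becomes a single relational query once acquisitions and circulation data share one warehouse. the sketch below is illustrative only: the table and column names (acquisitions, circulation, acq_method, charges, bib_id) are hypothetical, and sqlite stands in for the microsoft access warehouse actually used in this project.

```python
# a minimal sketch of a decision-support query over a hypothetical warehouse
# schema; names are illustrative, not the project's actual tables.
import sqlite3

def circulation_by_acquisition_method(db_path):
    """average charges per title, grouped by acquisition method."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        """
        SELECT a.acq_method,
               COUNT(*) AS titles,
               AVG(COALESCE(c.charges, 0)) AS avg_charges
        FROM acquisitions AS a
        LEFT JOIN circulation AS c ON c.bib_id = a.bib_id
        GROUP BY a.acq_method
        ORDER BY avg_charges DESC
        """
    ).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for method, titles, avg_charges in circulation_by_acquisition_method("warehouse.db"):
        print(f"{method}: {titles} titles, {avg_charges:.2f} charges per title")
```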
the goal of this research and development work is to apply data warehousing and data mining technologies in the development of a library decision support system (ldss) to aid the library management's decision making. the first phase of this work is to establish a data warehouse by importing selected data from separately maintained files presently used in the george a. smathers libraries of the university of florida into a relational database system (microsoft access). data stored in the existing files were extracted, cleansed, aggregated, and transformed into the relational representation suitable for processing by the relational database management system. a graphical user interface (gui) is developed to allow decision makers to query the data warehouse's contents using either some predefined queries or ad hoc queries. the second phase is to apply data mining techniques on the library data warehouse for knowledge discovery. this paper covers the first phase of this research and development work. our goal is to develop a general methodology and inexpensive software tools which can be used by different functional units of a library to import data from different data sources and to tailor different warehouses to meet their local decision needs. for meeting this objective, we do not have to use a very large centralized database management system to establish a single very large data warehouse to support different uses.

local environment

the university of florida libraries has a collection of more than two million titles, comprising over three million volumes. it shares a notis-based integrated system with nine other state university system (sus) libraries for acquiring, processing, circulating, and accessing its collection. all ten sus libraries are under the consortium umbrella of the florida center for library automation (fcla).

siew-phek t. su (pheksu@mail.uflib.ufl.edu) is associate chair of the central bibliographic services section, resource services department, university of florida libraries, and ashwin needamangala (nsashwin@grove.ufl.edu) is a graduate student at the electrical and computer engineering department, university of florida.

library data sources

the university of florida libraries' online database, luis, stores a wealth of data, such as bibliographic data (author, title, subject, publisher information), acquisitions data (price, order information, fund assignment), circulation data (charge out and browse information, withdrawn and inventory information), and owning location data (where item is shelved). these voluminous data are stored in separate files. the notis system as used by the university of florida does not provide a general querying facility for accessing data across different files. extracting any information needed by a decision maker has to be done by writing an application program to access and manipulate these files. this is a tedious task since many application programs would have to be written to meet the different information needs. the challenge of this project is to develop a general methodology and tools for extracting useful data and metadata from these disjointed files, and to bring them into a warehouse that is maintained by a database management system such as microsoft access.
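the first-phase workflow just described (extract from separately maintained files, cleanse, transform, and load into relational tables) can be illustrated with a short sketch. the input layout (a tab-delimited export of bib_id, title, fund, and price) and the target table are assumptions made for the example, not the project's actual file formats, and sqlite again stands in for microsoft access.

```python
# a minimal extract-cleanse-load sketch under the assumptions stated above.
import csv
import sqlite3

def load_acquisitions(export_path, db_path):
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS acquisitions (
               bib_id TEXT PRIMARY KEY, title TEXT, fund TEXT, price REAL)"""
    )
    with open(export_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 4:                     # skip malformed export lines
                continue
            bib_id, title, fund, price = (field.strip() for field in row[:4])
            try:
                price_value = float(price.lstrip("$"))
            except ValueError:
                price_value = None               # cleanse: unparseable price becomes NULL
            con.execute(
                "INSERT OR REPLACE INTO acquisitions VALUES (?, ?, ?, ?)",
                (bib_id, title, fund.upper(), price_value),
            )
    con.commit()
    con.close()
```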
the selection of access and pc hardware for this project is motivated by cost consideration. we envision that multiple special purpose warehouses be established on multiple pc systems to provide decision support to different library units. the library decision support system (loss) is developed with the capability of handling and analyzing an established data warehouse. for testing our methodology and software system, we established a warehouse based on twenty thousand monograph titles acquired from our major monograph vendor. these titles were published by domestic u.s. publishers and have a high percentage of dlc/dlc records (titles cataloged by the library of congress). they were acquired by firm order and approval plan, the publication coverage is the calendar year 1996-1997. analysis is only on the first item record (future project will include all copy holdings). although the size of the test data used is small, it is sufficient to test our general methodology and the functionality of our software system. fcla d82 tables and key list most of the data from the twenty-thousand-title domain that go into the loss warehouse are obtained from the db2 tables maintained by fcla. fcla developed and maintains the database of a system called ad hoc report request over the web (arrow) to facilitate querying and generating reports on acquisitions activities . the data are stored in 0b2 tables. 5 for our research and development purpose, we needed db2 tables for only the twenty-thousand titles that we identified as our initial project domain. these titles all have an identifiable 035 field in the bibliographic records (zybp1996, zybcip1996, zybp1997 or zybpcip1997). we used the batchbam program developed by gary strawn of northwestern university library to extract and list the unique bibliographic record numbers in separate files for fcla to pick up. 6 using the unique bibliographic record numbers, fcla extracted the 0b2 tables from the arrow database and exported the data to text files. these text files then were transferred to our system using the file transfer protocol (frp) and inserted as tables into the loss warehouse. bibliographic and item records extraction fcla collects and stores complete acquisitions data from the order records as db2 tables. however, only brief bibliographic data and no item record data are available . bibliographic and item record data are essential for inclusion in the loss warehouse in order to create a viable integrated system capable of performing cross-file analysis and querying for the relationships among different types of data. because these required data do not exist in any computer readable form, we designed a method to obtain them. using the identical notis key lists to extract the targeted twenty-thousand bibliographic and item records, we applied a screen scraping technique to scrape the data from the screen and saved them in a flat file. we then wrote a program in microsoft visual basic to clean the scraped data and saved them as text-delimited files that are suitable for importing into the loss warehouse. screen scraping concept screen scraping is a process used to capture data from a host application. it is conventionally a three-part process: • displaying the host screen or data to be scraped. • finding the data to be captured. • capturing the data to a pc or host file, or using it in another windows application. 
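to make the idea of coordinate-based capture concrete, the short java sketch below pulls one field out of a captured 80-column host screen by its row and column position; the sample screen text and the field positions are illustrative assumptions, not the hostexplorer macro used in this project (which is described next).

```java
import java.util.List;

/** minimal illustration of coordinate-based screen scraping:
 *  given the text of one captured 80-column host screen,
 *  pull out a field by its row and column position. */
public class ScreenScrapeDemo {

    /** extract the text at (row, col) of the given width from a captured screen. */
    static String field(List<String> screenLines, int row, int col, int width) {
        String line = screenLines.get(row);
        if (col >= line.length()) {
            return "";
        }
        int end = Math.min(col + width, line.length());
        return line.substring(col, end).trim();
    }

    public static void main(String[] args) {
        // two illustrative "screen" lines; a real capture would hold 24 rows of 80 characters.
        List<String> screen = List.of(
            "AKR9234                 BIBLIOGRAPHIC RECORD                    PAGE 1 OF 2",
            "010:: |a 96012345        035:: |a zybp1996");

        // notis record number printed near the top-left of the screen (assumed position).
        String notisNumber = field(screen, 0, 0, 8);
        System.out.println("record: " + notisNumber);   // -> record: AKR9234
    }
}
```

as the next section explains, the project ultimately captured entire screens rather than individual coordinates, because field positions shift from record to record.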
in other words, we can capture particular data on the screen by providing the corresponding screen coordinates to the screen scraping program. numerous commercial applications for screen scraping are available on the market. however, we used an approach slightly different from the conventional one. although we had to capture only certain fields from the notis screen, there were other factors that we had to take into consideration. they are: • the location of the various fields with respect to the screen coordinates changes from record to record . this makes it impossible for us to lock a particular field with a corresponding screen coordinate. 18 information technology and libraries i march 2000 reproduced with permission of the copyright owner. further reproduction prohibited without permission. • the data present on the screen are dynamic because we are working on a "live" database where data are frequently modified. for accurate query results, all the data, especially the item record data where the circulation transactions are housed, need to be captured within a specified time interval so that the data are uniform. this makes the time taken for capturing the data extremely important. • most of the fields present on the screen needed to be captured. taking the above factors into consideration, it was decided to capture the entire screen instead of scraping only certain parts of the screen. this made the process both simpler and faster . the unnecessary fields were filtered out during the cleanup process . i system architecture the architecture of the loss system is shown in figure 1 and is followed by a discussion on its components' functions. notis notis (northwestern online totally integrated system) was developed at the northwestern university library and introduced in 1970. since its inception, notis has undergone many versions. university of florida libraries is one of the earliest users of notis. fcla has made many local modifications of the notis system since uf libraries started using it. as a result, the uf notis is different from the rest of the notis world in many respects . notis can be broken down into four subsystems: • acquisitions • cataloging • circulation • online public access catalog (opac) at the university of florida libraries, the notis system runs on an ibm 370 main frame computer that runs the os/390 operating system . host explorer host explorer is a software program that provides a tcp /ip link to the main frame computer . it is a terminal emulation program supporting the ibm main frame, as/400, and vax hosts . host explorer delivers an enhanced user environment for all windows nt platforms, windows 95 and windows 3.x desktops. exact tn3270e, tn5250, vt420/320/220/101/100/52, wyse 50/60 and ansi-bbs display is extended to leverage the wealth of the windows desktop. it also supports all db2tables loss host explorer data cleansing and extraction warehouse graphical user interface figure 1. loss architecture and its components tcp /ip based tn3270 and tn3270e gateways. the host explorer program is used as the terminal emulation program in loss. it also provides vba compatible basic scripting tools for complete desktop macro development. users can run these macros directly or attach them to keyboard keys, toolbar buttons, and screen hotspots for additional productivity. the function of host explorer in the loss is v ery simple. 
it has to "visit" all screens in the notis system corresponding to each notis number present in the batchbam file, and capture all the data on the screens. in order to do this, we wrote a macro that read the notis number one at a time from the batchbam file and input the number into the command string of host explorer . the macro essentially performed the following functions: • read the notis numbers from the batchbam file. • inserted the notis number into the command string of host explorer . • toggled the screen capture option in host explorer so that data are scraped from the screen only at necessary times. • saved all the scraped data into a flat file. after the macro has been executed, all the data scraped from the notis screen reside in a flat file. the data present harvesting information from a library data warehouse i su and needamangala 19 reproduced with permission of the copyright owner. further reproduction prohibited without permission. in this file have to be cleansed in order to make them suitable for insertion into the library warehouse. a visual basic program is written to perform this function. the details of this program will be given in the next section. i data cleansing and extraction this component of the loss is written in the visual basic programming language. its main function is to cleanse the data that have been scraped from the notis screen. the visual basic code saves the cleansed data in a text-delimited format that is recognized by microsoft access. this file is then imported into the library warehouse maintained by microsoft access. the detailed working of the code that performs the cleansing operation is discussed below. the notis screen that comes up for each notis number has several parts that are critical to the working of the code. they are: • notis number present in the top-right of the screen (in this case, akr9234) • field numbers that have to be extracted. example: 010::, 035:: • delimiters. the " i " symbol is used as the delimiter throughout this code. for example, in the 260 field of a bibliographic record, "i a" delimits the place of publication, " i b" the name of the publisher and, "i c" the date of publication. we shall now go step by step through the cleansing process. initially we have the flat file containing all the data that have been scraped from the notis screens. • the entire list of notis numbers from the batchbam file is read into an array called bam_number$. • the file containing the data that have been scraped is read into a single string called bibrecord$. • this string is then parsed using the notis numbers from the bam_number$ array. • we now have a string that contains a single notis record. this string is called single_record$. • the program runs in a loop till all the records have been read. • each string is now broken down into several smaller strings based on the field numbers. each of these smaller strings contains data pertaining to the corresponding field number. • a considerable amount of the data present on the notis screen is unnecessary from the point of view of our project. we need only certain fields from the notis screen. but even from these fields we need the data only from certain delimiters. therefore, we now scan each of these smaller strings for a certain set of delimiters, which was predefined for each individual field. the data present in the other delimiters are discarded. • the data collected from the various fields and their corresponding delimiters are assigned to corresponding variables. 
some variables contain data from more than one delimiter concatenated together. the reason for this can be explained as follows. there are certain fields which are present in the database only for informational purposes and will not be used as a criteria field in any query. since these fields will never be queried upon, they do not need to be cleansed as rigorously as the other fields, and therefore we can afford to leave the data of these fields as concatenated strings. example: the catalog_source field, which has data from "|a" and "|c", is of the form "|a dlc |c dlc", while the lang_code field, which has data from "|a" and "|h", is of the form "|a eng |h rus"; but we split this into two fields: lang_code_1 containing "eng" and lang_code_2 containing "rus." • the data collected from the various fields are saved in a flat file in the text-delimited format. microsoft access recognizes this format. a screen dump of the text-delimited file, which is the end result of the cleansing operation, is shown in figure 2. (figure 2, a text-delimited file, lists one quoted, comma-separated record per notis number, beginning with the record number, system control number, cataloging source, and language codes.) the flat file, which we now have, can be imported into the library warehouse. i graphical user interface in order to ease the tasks of the user (i.e., the decision maker) to create the library warehouse and to query and analyze its contents, a graphical user interface tool has been developed. through the gui, the user can enact the following processes or operations through a main menu: • connection to notis • screen scraping • data cleansing and extracting • importing data • viewing collected data • querying • report generating the first option opens hostexplorer and provides a connection to notis. it provides a shortcut to closing or minimizing ldss and opening hostexplorer. the screen scraping option activates the data scraping process. the data cleansing and extracting option filters out the unnecessary data fields and saves the cleansed data in a text-delimited format. the importing data option imports the data in the text-delimited format into the warehouse. the viewing collected data option allows the user to view the contents of a selected relational table stored in the warehouse. the querying option activates ldss's querying facility that provides wizards to guide the formulation of different types of queries, as discussed later in this article. the last option, report generating, is for the user to specify the report to be generated. i data mining tool a very important component of ldss is the data mining tool for discovering association rules that specify the interrelationships of data stored in the warehouse. many data mining tools are now available in the commercial world. for our project, we are investigating the use of a neural-network-based data mining tool developed by limin fu of the university of florida.7 the tool allows the discovery of association rules based on a set of training data provided to the tool. this part of our research and development work is still in progress. the existing gui and report generation facilities will be expanded to include the use of this mining tool. i library warehouse fcla exports the data existing in the db2 tables into text files. as a first step towards creating the database, these text files are transferred using ftp and form separate relational tables in the library warehouse.
the data that are scraped from the bibliographic and item record screens result in the formation of two more tables. characteristics data in the warehouse are snapshots of the original data files. only a subset of the data contents in these files is extracted for querying and analysis, since not all the data are useful for a particular decision-making situation. data are filtered as they pass from the operational environment to the data warehouse environment. this filtering process is necessary particularly when a pc system, which has limited secondary storage and main memory space, is used. once extracted and stored in the warehouse, data are not updateable. they form a read-only database. however, different snapshots of the original files can be imported into the warehouse for querying and analysis. the results of the analyses of different snapshots can then be compared. structure data warehouses have a distinct structure. there are summarization and detail structures that demarcate a data warehouse. the structure of the library data warehouse is shown in figure 3. (figure 3, structure of the library data warehouse: notis screens, via screen scraping, and fcla db2 tables are imported into the warehouse tables ufbib, ufpay, ufinv, ufcirc, and uford, from which bibliographic, circulation, and pay data views are presented to the user.) the different components of the library data warehouse as shown in figure 3 are: • notis and db2 tables. bibliographic and circulation data are obtained from notis through the screen scraping process and imported into the warehouse. fcla maintains acquisitions data in the form of db2 tables. these are also imported into the warehouse after conversion to a suitable format. • warehouse. the warehouse consists of several relational tables that are connected by means of relationships. the universal relation approach could have been used to implement the warehouse by using a single table. the argument for using the universal relation approach would be that all the collected data fall under the same domain. but let us examine why this approach would not have been suitable. the different data collected for import into the warehouse were bibliographic data, circulation data, order data, and pay data. now, if all these data were incorporated into one single table with many attributes, it would not be of any exceptional use, since each set of attributes has its own unique meaning when grouped together as a bibliographic table, circulation table, and so on. for example, if we group the circulation data and the pay data together in a single table, it would not make sense. however, the pay data and the circulation data are related through the bib_key. hence, our use of the conventional approach of having several tables connected by means of relationships is more appropriate. • views. a view in sql terminology is a single table that is derived from other tables. these other tables could be base tables or previously defined views. a view does not necessarily exist in physical form; it is considered a virtual table, in contrast to base tables whose tuples are actually stored in the database. in the context of the ldss, views can be implemented by means of the ad hoc query wizard. the user can define a query/view using the wizard and save it for future use. the user can then define a query on this query/view. • summarization. the process of implementing views falls under the process of summarization.
summarization provides the user with views, which make it easier for users to query on the data of their interests. as explained above, the specific warehouse we established consists of five tables. table names including "_wh" indicates that it contains current detailed data of the warehouse. current detailed data represents the most recent snapshot of data that has been taken from the notis system. the summarized views are derived from the current detailed data of the warehouse. since current detailed data of the warehouse are the basic data of the 22 information technology and libraries i march 2000 reproduced with permission of the copyright owner. further reproduction prohibited without permission. application, only the current detailed data tables are shown in appendix a. i decision support by querying the warehouse the warehouse contains a set of integrated relational tables whose contents are linked by the common primary key, the bib_key (biblio_key). the data stored across these tables can be traver sed by matching the key values associated with their tuples or records . decision makers can issue all sorts of sql-type queries to retrieve useful information from the warehouse. two general types of queries can be distinguished : predefined queries and ad hoc queries . the former type refers to queries that are frequently used by decision makers for accessing information from different snapshots of data imported into the warehouse . the latter type refers to queries that are exploratory in nature. a decision maker suspects that there is some relationship between different types of data and issues a query to verify the existence of such a relationship. alternatively, data mining tools can be applied to analyze the data contents of the warehouse and discover rules of their relationships (or associations). predefined queries below are some sample queries posted in english. their corresponding sql queries can be processed using loss. l. number and percentage of approval titles circulated and noncirculated. 2. number and percentage of firm order titles circulated and noncirculated . 3. amount of financial resources spent on acquiring noncirculated titles. 4. number and percentage of dlc/dlc cataloging records in circulated and noncirculated titles . 5. number and percentage of "shared" cataloging records in circulated and noncirculated titles. 6. numbers of original and "shared" cataloging records of noncirculated titles. 7. identify the broad subject areas of circulated and noncirculated titles . 8. identify titles that have been circulated "n" number of times and by subjects . 9. number of circulated titles without the 505 field. each of the above english queries can be realized by a number of sql queries. we shall use the first two english queries and their corresponding sql queries to explain how the data warehouse contents and the querying facility of microsoft access can be used to support decision making. the results of sql queries also are given . the first english query can be divided into two parts (see figure 4), each realized by a number of sql queries as shown below . 
sample query outputs query 1: number and percentage of approval titles circulated and noncirculated. result:
total approval titles: 1172
circulated: 980 (83.76%)
noncirculated: 192 (16.24%)
similar to the above sql queries, we can translate the second english query into a number of sql queries, and the result is given below. query 2: number and percentage of firm order titles circulated and noncirculated. result:
total firm order titles: 1829
circulated: 1302 (71.18%)
noncirculated: 527 (28.82%)
report generation the results of the two predefined english queries can be presented to users in the form of a report:
total titles: 3001
approval: 1172 (39%), of which circulated 980 (83.76%) and noncirculated 192 (16.24%)
firm order: 1829 (61%), of which circulated 1302 (71.18%) and noncirculated 527 (28.82%)
from the above report, we can ascertain that, though 39 percent of the titles were purchased through the approval plan and 61 percent through firm orders, the approval titles have a higher rate of circulation, 83.76 percent, as compared to firm order titles at 71.18 percent. it is important to note that the result of the above queries is taken from only one snapshot of the circulation data. analysis from several snapshots is needed in order to compare the results and arrive at reliable information. we now present a report on the financial resources spent on acquiring and processing noncirculated titles. in order to generate this report, we need the output of queries four and five listed earlier in this article. the corresponding outputs are shown below. query 4: number and percentage of dlc/dlc cataloging records in circulated and noncirculated titles. result:
total dlc/dlc records: 2852
circulated: 2179 (76.40%)
noncirculated: 673 (23.60%)
query 5: number and percentage of "shared" cataloging records in circulated and noncirculated titles. result:
total "shared" records: 149
circulated: 100 (67.11%)
noncirculated: 49 (32.89%)
in order to come up with the financial resources, we need to consider several factors which contribute to the amount of financial resources spent. for the sake of simplicity, we consider only the following factors: 1. the cost of cataloging each item with a dlc/dlc record 2. the cost of cataloging each item with a shared record 3. the average price of noncirculated books 4. the average pages of noncirculated books 5. the value of shelf space per centimeter because the value of the above factors differs from institution to institution and might change according to more efficient workflow and better equipment used, users are required to fill in the values for factors 1, 2, and 5. ldss can compute factors 3 and 4. the financial report, taking into consideration the value of the above factors, could be as shown below.
processing cost of each dlc title = $10.00; 673 x $10.00 = $6,730.00
processing cost of each shared title = $20.00; 49 x $20.00 = $980.00
average price paid per noncirculated item = $48.00; 722 x $48.00 = $34,656.00
average size of book = 288 pages = 3 cm; average cost of 1 cm of shelf space = $0.10; 722 x $0.30 = $216.60
grand total = $42,582.60
again it is important to point out that several snapshots of the circulation data have to be taken to track and compare the different analyses before deriving reliable information.
figure 4. example of an english query divided into two parts:
approval titles circulated
sql query to retrieve the distinct bibliographic keys of all the approval titles:
select distinct bibscreen.bib_key from bibscreen right join payl on bibscreen.bib_key = payl.bib_num where (((payl.fund_key) like "*07*"));
sql query to count the number of approval titles that have been circulated:
select count(appr_title.bib_key) as countofbib_key from (bibscreen inner join appr_title on bibscreen.bib_key = appr_title.bib_key) inner join itemscreen on bibscreen.bib_key = itemscreen.biblio_key where (((itemscreen.charges)>0)) order by count(appr_title.bib_key);
sql query to calculate the percentage:
select cnt_appr_title_circ.countofbib_key, int(([cnt_appr_title_circ]![countofbib_key])*100/count([bibscreen]![bib_key])) as percent_apprcirc from bibscreen, cnt_appr_title_circ group by cnt_appr_title_circ.countofbib_key;
approval titles noncirculated
sql query for counting the number of approval titles that have not been circulated:
select distinct count(appr_title.bib_key) as countofbib_key from (appr_title inner join bibscreen on appr_title.bib_key = bibscreen.bib_key) inner join itemscreen on bibscreen.bib_key = itemscreen.biblio_key where (((itemscreen.charges)=0));
sql query to calculate the percentage:
select cnt_appr_title_noncirc.countofbib_key, int(([cnt_appr_title_noncirc]![countofbib_key])*100/count([bibscreen]![bib_key])) as percent_appr_noncirc from bibscreen, cnt_appr_title_noncirc group by cnt_appr_title_noncirc.countofbib_key;
ad hoc queries alternatively, if the user wishes to issue a query that has not been predefined, the ad hoc query wizard can be used. the following example illustrates the use of the ad hoc query wizard. assume the sample query is: how many circulated titles in the english subject area cost more than $35? we now take you on a walk-through of the ad hoc query wizard, starting from the first step till the output is obtained. figure 4 depicts step 1 of the ad hoc query wizard. the sample query mentioned above requires the following fields: • biblio_key for a count of all the titles which satisfy the given condition. • charges to specify the criteria of "circulated title". • fund_key to specify all titles under the "english" subject area. • paid_amt to specify all titles which cost more than $35. step 2 of the ad hoc query wizard (figure 5) allows the user to specify criteria and thereby narrow the search domain. step 3 (figure 6) allows the user to specify any mathematical operations or aggregation functions to be performed. step 4 (figure 7) displays the user-defined query in sql form and allows the user to save the query for future reuse. the output of the query is shown below in figure 8. the figure shows the number of circulated titles in the english subject area that cost more than $35.
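the sql that the wizard saves for this sample query is not reproduced in the text, but as a hedged sketch of how the same question could be posed programmatically against the warehouse tables named in this article (bibscreen, itemscreen, payl), the snippet below issues an equivalent query through the open source ucanaccess jdbc driver; the driver, the .mdb path, and the fund_key pattern standing in for the english fund are assumptions for illustration rather than part of ldss.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** hedged sketch: pose the sample ad hoc query ("circulated titles in the
 *  english subject area that cost more than $35") against the warehouse
 *  tables described in the article, via the ucanaccess jdbc driver.      */
public class AdHocQueryDemo {
    public static void main(String[] args) throws Exception {
        // path to the access warehouse file is an assumption for illustration.
        String url = "jdbc:ucanaccess://C:/ldss/warehouse.mdb";

        // count titles that have circulated (charges > 0), were paid from an
        // "english" fund, and cost more than $35. the test warehouse keeps only
        // the first item record per title, so the join yields one row per title.
        // the LIKE pattern for the english fund_key is hypothetical.
        String sql =
            "SELECT COUNT(*) " +
            "FROM (bibscreen INNER JOIN itemscreen " +
            "        ON bibscreen.bib_key = itemscreen.biblio_key) " +
            "     INNER JOIN payl ON bibscreen.bib_key = payl.bib_num " +
            "WHERE itemscreen.charges > 0 " +
            "  AND payl.fund_key LIKE '%eng%' " +
            "  AND payl.paid_amt > 35";

        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            if (rs.next()) {
                System.out.println("circulated english titles over $35: " + rs.getInt(1));
            }
        }
    }
}
```

saving the same statement as a named access query mirrors what the wizard's final step does when the user saves the query for future reuse.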
alternatively, the user might wish to obtain a listing of these 33 titles. figure 9 shows the listing. (figures 4 through 9 show steps 1 through 4 of the ad hoc query wizard, the query output, and the listing of the query output.) i conclusion in this article, we presented the design and development of a library decision support system based on data warehousing and data mining concepts and techniques. we described the functions of the components of ldss. the screen scraping and data cleansing and extraction processes were described in detail. the process of importing data stored in luis as separate data files into the library data warehouse was also described. the data contents of the warehouse can provide a very rich information source to aid the library management in decision making. using the implemented system, a decision maker can use the gui to establish the warehouse, and to activate the querying facility provided by microsoft access to explore the warehouse contents. many types of queries can be formulated and issued against the database. experimental results indicate that the system is effective and can provide pertinent information for aiding the library management in making decisions. we have fully tested the implemented system using a small sample database. our ongoing work includes the expansion of the database size and the inclusion of a data mining component for association rule discovery. extensions of the existing gui and report generation facilities to accommodate data mining needs are expected. i acknowledgments we would like to thank professor stanley su for his support and advice on the technical aspect of this project. we would also like to thank donna alsbury for providing us with the db2 data, daniel cromwell for loading the db2 files, and nancy williams and tim hartigan for their helpful comments and valuable discussions on this project. references and notes 1. john ladley, "operational data stores: building an effective strategy," data warehouse: practical advice from the experts (englewood cliffs, n.j.: prentice hall, 1997). 2. information on harvard university's adapt project. accessed march 8, 2000, www.adapt.harvard.edu/; information on the arizona state university data administration and institutional analysis warehouse. accessed march 8, 2000, www.asu.edu/data_admin/wh-1.html; information on the university of minnesota clarity project. accessed march 8, 2000, www.clarity.umn.edu/; information on the uc san diego darwin project. accessed march 8, 2000, www.act.ucsd.edu/dw/darwin.html; information on university of wisconsin-madison infoaccess. accessed march 8, 2000, http://wiscinfo.doit.wisc.edu/infoaccess/; information on the university of nebraska data warehouse, nulook. accessed march 8, 2000, www.nulook.uneb.edu/.
3. ramon barquin and herbert edelstein, eds., building, using, and managing the data warehouse (englewood cliffs, n.j.: prentice hall, 1997); ramon barquin and herbert edelstein, eds., planning and designing the data warehouse (upper saddle river, n.j.: prentice hall, 1996); joyce bischoff and ted alexander, data warehouse: practical advice from the experts (englewood cliffs, n.j.: prentice hall, 1997); jeff byard and donovan schneider, "the ins and outs (and everything in between) of data warehousing," acm sigmod 1996 tutorial notes, may 1996. accessed march 8, 2000, www.redbrick.com/products/white/pdf/sigmod96.pdf; surajit chaudhuri and umesh dayal, "an overview of data warehousing and olap technology," acm sigmod record 26(1), march 1997. accessed march 8, 2000, www.acm.org/sigmod/record/issues/9703/chaudhuri.ps; b. devlin, data warehouse: from architecture to implementation (reading, mass.: addison-wesley, 1997); u. fayyad and others, eds., advances in knowledge discovery and data mining (cambridge, mass.: the mit pr., 1996); joachim hammer, "data warehousing overview, terminology, and research issues." accessed march 8, 2000, www.cise.ufl.edu/~jhammer/classes/wh-seminar/overview/index.htm; w. h. inmon, building the data warehouse (new york, n.y.: john wiley, 1996); ralph kimball, "dangerous preconceptions." accessed march 8, 2000, www.dbmsmag.com/9608d05.html; ralph kimball, the data warehouse toolkit (new york, n.y.: john wiley, 1996); ralph kimball, "mastering data extraction," dbms magazine, june 1996. (provides an overview of the process of extracting, cleaning, and loading data.) accessed march 8, 2000, www.dbmsmag.com/9606d05.html; alberto mendelzon, "bibliography on data warehousing and olap." accessed march 8, 2000, www.cs.toronto.edu/~mendel/dwbib.html. 4. daniel j. boorstin, "the age of negative discovery," cleopatra's nose: essays on the unexpected (new york: random house, 1994). 5. information on the arrow system. accessed march 8, 2000, www.fcla.edu/system/intro_arrow.html. 6. gary strawn, "batchbaming." accessed march 8, 2000, http://web.uflib.ufl.edu/rs/rsd/batchbam.html. 7. li-min fu, "domrul: learning the domain rules." accessed march 8, 2000, www.cise.ufl.edu/~fu/domrul.html.
appendix a. warehouse data tables
ufcirc_wh: bib_key text(50); status text(20); enum/chron text(20); midspine text(20); temp_locatn text(20); pieces number; charges number; last_use date/time; browses number; value text(20); invnt_date date/time; created date/time.
uford_wh: id autonumber; ord_num text(20); ord_div number; process_unit text(20); bib_num text(20); order_date date/time; mod_date date/time; vendor_code text(20); vndadr_order text(20); vndadr_claim text(20); vndadr_return text(20); vend_title_num text(20); ord_unit text(20); rcv_unit text(20); ord_scope text(20); pur_ord_prod text(20); action_int number; libspec1 text(20); libspec2 text(20); libspec3 text(20); libspec4 text(20); vend_note text(20); ord_note text(20); source text(20); ref text(20); copyctl_num number; medium text(20); piece_cnt number; div_note text(20); acr_stat text(20); rel_stat text(20); lst_date date/time; action_date text(20); encumb_units number; currency text(20); est_price number; encumb_outs number; fund_key text(20); fiscal_year text(20); copies number; xpay_method text(20); vol_isu_date text(20); title_author text(20); db2_timestamp date/time; series text(20).
ufpay_wh: inv_key text(20); ord_num text(20); ord_div number; process_unit text(20); bib_key text(20); ord_seq_num number; inv_seq_num number; status text(20); create_date date/time; lst_update date/time; currency text(20); paid_amt number; usd_amt number; fund_key text(20); exp_class text(20); fiscal_year text(20); copies number; type_pay text(10); text text(20); db2_timestamp date/time.
ufinv_wh: inv_key text(20); create_date date/time; mod_date date/time; approv_stat text(20); vend_adr_code text(20); vend_code text(20); action_date text(20); vend_inv_date date/time; approval_date date/time; approver_id text(20); vend_inv_num text(20); inv_tot number; calc_tot_pymts number; calc_net_tot_pymts number; currency text(20); discount_percent number; vouch_note text(20); official_vend text(20); process_unit text(20); internal_note text(20); db2_timestamp text(20).
ufbib_wh: bib_key text(20); system_control_num text(50); catalog_source text(20); lang_code_1 text(20); lang_code_2 text(20); geo_code text(20); dewey_num text(20); edition text(20); pagination text(20); size text(20); series_440 text(20); series_490 text(20); content text(20); subject_1 text(20); subject_2 text(20); subject_3 text(20); authors_1 text(20); authors_2 text(20); authors_3 text(20).
book reviews networks and disciplines; proceedings of the educom fall conference, october 11-13, 1972, ann arbor, michigan. princeton: educom, 1973. 209p. $6.00. as with so many conferences, the principal beneficiaries of this one are those who attended the sessions, and not those who will read the proceedings. except for a few prepared papers, the text is the somewhat edited version of verbatim, ad lib summaries of a number of workshop sessions and two panels that purport to summarize common themes and consensus. since few people are profound in ad lib commentaries, the result is shallow and repetitive. the forest of themes is completely lost among a bewildering array of trees. the conference was, i am sure, exciting and thought-provoking for the participants. it was simply organized, starting with statements of networking activities in a number of disciplines, i.e., chemistry, language studies, economics, libraries, museums, and social research.
the paper on economics is by far the best organized presentation of the problems and potential of computers in any of the fields considered, and perhaps the best short presentation yet published for economics. the paper on libraries was short, that on chemistry lacking in analytical quality, that on language provocative, that on social research highly personal, and that on museums a neat mixture of reporting and interpreting. much of the information is conditional, that is, it described what might or could be in the realm of the application of computers to the various subjects. the speakers all directed their papers to the concept of networks, interpreted chiefly as widespread remote access to computational facilities. the papers are followed by very brief transcripts of the summaries of workshops in which the application of computers to each of the disciplines was presumably discussed in detail. much of each summary is indicative and not really informative about the discussions. the concluding text again is the transcript of two final panels on themes and relationships among computer centers. the only description for this portion of the text is turgid. in the midst of all this is the banquet paper presented by ed parker, who as usual was thoughtful and insightful, and several presentations by national science foundation officials that must have been useful at the time to guide those relying on federal funding for computer networks in developing proposals. i can't think of another reference that touches on the potential of computers in so many different disciplines, but it is apparent from the breadth of ideas and the range of suggested or tested applications that a coherent and analytical review should be done. this volume isn't it. russell shank smithsonian institution the analysis of information systems, by charles t. meadow. second edition. los angeles: melville publishing co., 1973. a wiley-becker & hayes series book. this is a revised edition of a book first published in 1967. the earlier edition was written from the viewpoint of the programmer interested in the application of computers to information retrieval and related problems. the second edition claims to be "more of a textbook for information science graduate students and users" (although it is not clear who these "users" are) . elsewhere the author indicates that his emphasis is on "software technology of information systems" and that the book is intended "to bridge the communications gap among information users, librarians and data processors." the book is divided into four parts: language and communication (dealing largely with indexing techniques and the properties of index languages) , retrieval of information (including retrieval strategies and the evaluation of system performance), the organization of information (organization of records, of ffies, file sets), computer processing of information (basic file processes, data access systems, interactive information retrieval, programming languages, generalized data management systems). the second two sections are, i feel, . much better than the first. these are the areas in which the author has had the most direct experience, and the topics covered, at least in their information retrieval applications, are not discussed particularly well or particularly fully elsewhere. it is these sections of the book that make it of most value to the student of information science. 
i am less happy about meadow's discussion of indexing and index languages, which i find unclear, incomplete, and inaccurate in places. the distinction drawn between pre-coordinate and post-coordinate systems is inaccurate; meadow tends to refer to such systems simply as keyword systems, although it is perfectly possible to have a post-coordinate system based on, say, class numbers, which can hardly be considered keywords, while it is also possible to have keyword systems that are essentially precoordinate. in fact, meadow relates the characteristic of being post-coordinate to the number of terms an indexer may use (" ... permit their users to select several descriptors for an index, as many as are needed to describe a particular document"), but this is not an accurate distinction between the two types of system. the real difference is related to how the terms are used (not how many are used), including how they are used at the time of searching. the references to faceted classification are also confusing and a number of statements are made throughout the discussion on index languages that are completely untrue. for example, meadow states (p. 51) that "a hierarchical classification language has no syntax to combine descriptors into terms." this is not at all accurate since several hierarchical classification schemes, including udc, do have synthetic elements which allow combination of descriptors, and some of these are highly synthetic. in fact, meadow himself gives an example (p. 3839) of this synthetic feature in the udc. it is also perhaps unfortunate that the student could read all through meadow's discussion of index languages without getting any clear idea of the structure of a thesaurus for information retrieval and how this thesaurus is applied in practice. book reviews 151 moreover, meadow used medical subject headings as his example of a thesaurus (p. 33-34), although this is not at all a conventional thesaurus and does not follow the usual thesaurus structure. my other criticism is that the book is too selective in its discussion of various aspects of information retrieval. for example, the discussion on automatic indexing is by no means a complete review of techniques that have been used in this field. likewise, the discussion of interactive systems is very limited, because it is based solely on nasa's system, recon. the student who relied only on meadow's coverage of these topics would get a very incomplete and one-sided view of what exists and what has been done in the way of research. in short, i would recommend this book for those sections (p. 183-412) that deal with the organization of records and files and with related programming considerations. the author has handled these topics well and perhaps more completely, in the information retrieval context, than anyone else. indexing and index languages, on the other hand, are subjects that have been covered more completely, clearly, and accurately by various other writers. i would not recommend the discussion on index languages to a student unless read in conjunction with other texts. f. w. lancaster university of illinois application of computer technology to librm·y processes, a syllabus, by joseph becker and josephine s. pulsifer. metuchen, n.j.: scarecrow press, 1973. 173p. $5.00. 
despite the large number of institutions offering courses related to library automation, including just about every library school in north america, accredited or not, there is a remarkable shortage of published material to assist in this instruction. with the publication of this small volume a light has been kindled; let us hope it will be only the first of many, for larger numbers of better educated librarians must surely result in higher standards in the field. this syllabus covers eight topics related 152 journal of library automation vol. 7/2 jtme 1974 to the use of computers in libraries, titled as follows: bridging the gap (librarians and automation); computer technology; systems analysis and implementation; marc program; library clerical processes (which encompasses acquisitions, cataloging, serials, circulation, and management information) ; reference services; related technologies; and library networks. each topic is treated as a unit of instruction, and each receives the identical treatment as follows. the units each start with an introductory paragraph, explaining what the field encompasses, and indicating the purpose of teaching that topic. the purpose of systems analysis, for example, is "to develop the sequence of steps essential to the introduction of automated systems into the library." a series of behavioral objectives are then listed, to show what the student will be able to do (after he has learned the material) that he presumably was unable to do before. for example, there are seven behavioral objectives in the unit on computer technology, of which the first four are: "1) the student will be able to discuss the two-fold requirement to represent data by codes and data structures for purposes of machine manipulation, 2) the student will be able to identify the basic components of computer systems and describe their purposes, 3) the student will be able to differentiate hardware and software and describe briefly the part that programming plays in the overall computer processing operation, 4) the student will be able to define the various modes of computer operation and indicate the utility of each in library operations." the remaining three objectives refer to the student's ability to enumerate and compare types of input, output, and storage devices. then an outline of the instructional material is presented, followed by the detailed and well-organized material for instruction. in no case can the material presented here be considered all that an instructor would need to know about the field, but a surprising amount of specific detail is included, along with a carefully organized framework within which to place other knowledge. the end result is to present to the instructor a series of outlines that would encompass much of the material included in a basic introductory course in library automation. every instructor would, presumably, want to add other topics of his own in addition to adding other material to the topics treated in this volume, but he has here an extremely helpful guide to a basic course, and the only work of its kind to be published to date. peter simmons school of librarianship university of british columbia the larc reports, vol. 6, issue 1. online cataloging and circulation at western kentucky university: an approach to automated instructional resources ~anagement. 1973. 78p. 
this is a detailed account of the design, development, and implementation of online cataloging and circulation which have been in operation at western kentucky university for several years. the library's reasons for using computers are similar to those of many college and university libraries that experienced rapid growth during the 1960s. the faculty of the division of library services first prepared a detailed proposal with appropriate feasibility studies and cost analyses to reclassify the collection from dewey decimal to library of congress classification. the proposal was approved by the administration of the university, and the decision was made to utilize campus computer facilities via online input techniques for reclassification, cataloging, and circulation. "project reclass" was accomplished during 1970-71 using ibm 2741 ats/360 terminals. a circulation file was subsequently generated from the master record file. the main library is housed in a new building and has excellent computer facilities within the library that are connected to the university computer center. cataloging information is input directly into the system via ats terminals; ibm 2260 visual display terminals are used for inquiry into the status of books and patrons; and ibm 1031/1033 data collection terminals are used to charge out and check in books. catalog cards and book catalogs in upper/lower case are produced in batch mode on regular schedule. the on-line circulation book record file is used in conjunction with the on-line student master record and payroll master record files for preparation of overdue and fine notices. apparently the communication between library staff and computer personnel has been well above average, and cooperation of the administration and other interested parties has been outstanding. the attention given to planning, scheduling, training, and implementation is impressive. what has been accomplished to date is considered very successful, and plans are book reviews 153 underway to develop on-line acquisitions ordering and receiving procedures. the report has some annoying shortcomings such as referring to the library of congress as "national library"; frequent use of the word "xeroxing," which the xerox corporation is attempting to correct; "inputing" for "inputting"; and several other misspelled words. some parts are poorly organized and unclear, but the report does provide rriany useful details for those considering a similar undertaking. lavahn overmyer school of library science case western reserve university digitization of text documents using pdf/a yan han and xueheng wan information technology and libraries | march 2018 52 yan han (yhan@email.arizona.edu) is full librarian, the university of arizona libraries, and xueheng wan (wanxueheng@email.arizona.edu) is a student, department of computer science, university of arizona. abstract the purpose of this article is to demonstrate a practical use case of pdf/a for digitization of text documents following fadgi’s recommendation of using pdf/a as a preferred digitization file format. the authors demonstrate how to convert and combine tiffs with associated metadata into a single pdf/a-2b file for a document. using real-life examples and open source software, the authors show readers how to convert tiff images, extract associated metadata and international color consortium (icc) profiles, and validate against the newly released pdf/a validator. 
the generated pdf/a file is a self-contained and self-described container that accommodates all the data from digitization of textual materials, including page-level metadata and icc profiles. providing theoretical analysis and empirical examples, the authors show that pdf/a has many advantages over the traditionally preferred file format, tiff/jpeg2000, for digitization of text documents. background pdf has been primarily used as a file delivery format across many platforms in almost every device since its initial release in 1993. pdf/a was designed to address concerns about long-term preservation of pdf files, but there has been little research and few implementations of this file format. since the first standard (iso 19005 pdf/a-1), published in 2005, some articles discuss the pdf/a family of standards, relevant information, and how to implement pdf/a for born-digital documents.1 there is growing interest in the pdf and pdf/a standards after both the us library of congress and the national archives and records administration (nara) joined the pdf association in 2017. nara joined the pdf association because pdf files are used as electronic documents in every government and business agency. as explained in a blog post, the library of congress joined the pdf association because of the benefits to libraries, including participating in developing pdf standards, promoting best-practice use of pdf, and access to the global expertise in pdf technology.2 few articles, if any, have been published about using this file format for preservation of digitized content. yan han published a related article in 2015 about theoretical research on using pdf/a for text documents.3 in this article, han discussed the shortcomings of the widely used tiff and jpeg2000 as master preservation file formats and proposed using the then-emerging pdf/a as the preferred file format for digitization of text documents. han further analyzed the requirements mailto:yhan@email.arizona.edu mailto:wanxueheng@email.arizona.edu digitization of text documents using pdf/a | han and wan 53 https://doi.org/10.6017/ital.v37i1.9878 of digitization of text documents and discussed the advantages of pdf/a over tiff and jpeg2000. these benefits include platform independence, smaller file size, better compression algorithms, and metadata encoding. in addition, the file format reduces workload and simplifies postdigitization processing such as quality control, adding and updating missing pages, and creating new metadata and ocr data for discovery and digital preservation. as a result, pdf/a can be used in every phase of a digital object in an open archival information system (oais)—for example, a submission information package (sip), archive information package (aip), and dissemination information package (dip). in summary, a pdf/a file can be a structured, self-contained, and selfdescribed container allowing a simpler one-to-one relationship between an original physical document and its digital surrogate. in september 2016, the federal agencies digital guidelines initiative (fadgi) released its latest guidelines for digitization related to raster images: technical guidelines for digitizing heritage materials.4 the de-facto best practices for digitization, these guidelines provide federal agencies guidance and have been used in many cultural heritage institutions. 
both the pdf association and the authors welcomed the recognition of pdf/a as the preferred master file format for digitization of text documents such as unbound documents, bound volumes, and newspapers.5 goals and tasks since han has previously provided theoretical methods of coding raster images, metadata, and related information in pdf/a, the goals of this article are threefold: 1. present real-life experience of converting tiffs/jpeg2000s to pdf/a and back, along with image metadata 2. test open source libraries to create and manipulate images, image metadata, and pdf/a 3. validate generated pdf/as with the first legitimate validator for pdf/a validation the tasks included the following: ● convert all the master files in tiffs/jpeg2000 from digitization of text documents into single pdf/a files losslessly. one document, one pdf/a file. ● evaluate and extract metadata from each tiff/jpeg2000 image and encode it along with its image when creating the corresponding pdf/a file. ● demonstrate the runtimes of the above tasks for feasibility evaluation. ● validate the pdf/a files against the newly released open source pdf/a validator verapdf. ● extract each digital image from the pdf/a file back to its original master image files along with associated metadata. ● verify the extracted image files in the back-and-forth conversion process against the original master image files choices of pdf/a standards and conformance level this article demonstrates using pdf/a-2b as a self-contained self-describing file format. currently, there are three related pdf/a standards (pdf/a-1, pdf/a-2, and pdf/a-3), each with information technology and libraries | march 2018 54 three conformance levels (a, b, and u). the reasons for choosing pdf/a-2 (instead of pdf/a-1 or pdf/a-3) are the following: ● pdf/a-1 is based on pdf 1.4. in this standard, images coded in pdf/a-1 cannot use jpeg2000 compression (named in pdf/a as jpxdecode). one can still convert tiffs to pdf/a-1 using other lossless compression methods such as lzw. however, the spacesaving benefits of jpeg2000 compression over other methods would not be utilized. ● pdf/a-2 and pdf/a-3 are based on pdf 1.7. one significant feature of pdf 1.7 is that it supports jpeg2000 compression, which saves 40–60 percent of space for raster images compared to uncompressed tiffs. ● pdf/a-3 has one major feature that pdf/a-2 does not have, which is to allow arbitrary files to be embedded within the pdf file. in this case, there is no file to be embedded. the authors chose conformance level b for simplicity. ● b is basic conformance, which requires only necessary components (e.g., all fonts embedded in the pdf) for reproduction of a document’s visual appearance. ● a is accessible conformance, which means b conformance level plus additional accessibility (structural and semantic features such as document structure). one can add tags to convert pdf/2b to pdf/2a. ● u represents a conformance level with the additional requirement that all text in the document have unicode equivalents. this article does not cover any post-processing of additional manual or computational features such as adding ocr text to the generated pdf/a files. these features do not help faithfully capture the look and feel of original pages in digitization, and they can be added or updated later without any loss of information. in addition, ocr results rely on the availability of ocr engines for the document’s language, and results can vary between different ocr engines over time. 
ocr technology is getting better and will produce better results in the future. for example, current ocr technology for english gives very reliable (more than 90 percent) accuracy. in comparison, traditional chinese manuscripts and pashto/persian give unacceptably low accuracy (less than 60 percent). the cutting edge of ocr engines has started to utilize artificial intelligence techniques, and the authors believe that a breakthrough will happen soon.

data source

the university of arizona libraries (ual) and the afghanistan center at kabul university (acku) have been partnering to digitize and preserve acku's permanent collection held in kabul. this collaborative project created the largest afghan digital repository in the world. currently the afghan digital repository (http://www.afghandata.org) contains more than fifteen thousand titles and 1.6 million pages of documents. digitization of these text documents follows the previous version of the fadgi guidelines, which recommended scanning each page of a text document into a separate tiff file as the master file. these tiffs were organized by directories in a file system, where each directory represents a corresponding document containing all the scanned pages of that title. an example of the directory structure can be found in han's article.

pdf/a and image manipulation tools

there are a few open source and proprietary pdf software development kits (sdks). adobe pdf library and foxit sdk are the most well-known commercial tools to manipulate pdfs. to show readers that they can manipulate and generate pdf/a documents themselves, open source software, rather than commercial tools, was used. currently, only a very limited number of open source pdf sdks are available, including itext and pdfbox. itext was chosen because it has good documentation and provides a well-built set of apis to support almost all the pdf and pdf/a features. itext was initially written by bruno lowagie (who was in the iso pdf standard working group) in 1998 as an in-house project; lowagie later started up his own company, itext, and published itext in action with many code examples.6 moreover, itext has java and c# coding options with good code documentation. it is worth mentioning that itext has different versions. the authors used itext 5.5.10 and 5.4.4. using an older version in our implementation generated a noncompliant pdf/a file because it was not aligned with the pdf/a standard.7 for image processing, there were a few popular open source options, including imagemagick and gimp. imagemagick was chosen because of its popularity, stability, and cross-platform implementation. our implementation identified one issue with imagemagick: the current version (7.0.4) could not retrieve all the metadata from tiff files, as it did not extract certain information such as the image file directory and color profile. these metadata are critical because they are part of the original data from digitization. unfortunately, the authors observed that some image editors were unable to preserve all the metadata from the image files during the conversion process. hart and de vries used case studies to show the vulnerability of metadata, demonstrating that metadata elements in a digital object can be lost or corrupted by use or conversion of a file to another format. they suggested that action is needed to ensure proper metadata creation and preservation: all types of metadata must be captured and preserved to achieve the most authentic, consistent, and complete digital preservation for future use.8
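a simple way to guard against this kind of silent metadata loss is to dump each file's metadata before and after a conversion and compare the two reports. the java sketch below is an illustration only, not the authors' production code: it assumes that exiftool (discussed in the next section) is installed and on the system path, the file names are hypothetical, and the -x option is used to request rdf/xml output for a master tiff and for the jpeg2000 derived from it so the two reports can be diffed.

import java.io.File;
import java.io.IOException;

/*
 * sketch: dump image metadata (including the icc profile block) with exiftool
 * so that pre- and post-conversion reports can be compared for losses.
 * assumptions: exiftool is installed and on the path; file names are hypothetical.
 */
public class MetadataCheck {

    static void dumpMetadata(String imagePath, String reportPath)
            throws IOException, InterruptedException {
        // -X asks exiftool for RDF/XML output, which is easy to archive and diff
        ProcessBuilder pb = new ProcessBuilder("exiftool", "-X", imagePath);
        pb.redirectOutput(new File(reportPath));
        pb.redirectErrorStream(true);
        if (pb.start().waitFor() != 0) {
            throw new IOException("exiftool failed for " + imagePath);
        }
    }

    public static void main(String[] args) throws Exception {
        // one scanned page and the derivative produced from it (hypothetical names)
        dumpMetadata("page001.tif", "page001.tif.xml");
        dumpMetadata("page001.jp2", "page001.jp2.xml");
        // compare the two reports (e.g., with a diff tool) to confirm that tags
        // such as the icc profile survived the conversion
    }
}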
metadata extraction tools and color profiles

as we digitize physical documents and manipulate images, color management is important. the goal of color management is to obtain a controlled conversion between the color representations of various devices such as image scanners, digital cameras, and monitors. a color profile is a set of data that controls the input and output of a color space. the international color consortium (icc) standards and profiles were created to bring various manufacturers together, and embedding color profiles into images is one of the most important color management solutions. image formats such as tiff and jpeg2000 and document formats such as pdf may contain embedded color profiles. the authors identified a few open source tools to extract tiff metadata, including exiftool, exiv2, and tiffinfo. exiftool is an open source tool for reading, writing, and manipulating metadata of media files. exiv2 is another free metadata tool supporting different image formats. the tiffinfo program is widely used on the linux platform, but it has not been updated for at least ten years. our implementations showed that exiftool was the tool that most easily extracted the full icc profiles and other metadata from tiff and jpeg2000 files. imagemagick and other image processing software were examined in van der knijff's article discussing jpeg2000 for long-term preservation.9 he found that icc profiles were lost in imagemagick. our implementation showed that the current version of imagemagick has fixed this issue. a metadata sample can be found in appendix a.

implementation

converting and ordering tiffs into a single pdf/a-2 file

when ordering and combining all individual tiffs of a document into a single pdf/a-2b file, the authors intended to preserve all information from the tiffs, including raster image data streams and metadata stored in each tiff's header. the raster image data streams are the main images reflecting the original look and feel of these pages, while the metadata (including technical and administrative metadata such as bitspersample, datetime, and make/model/software) tells us important digitization and provenance information. both are critical for delivery and digital preservation. the tiff images were first converted to jpeg2000 with lossless compression using the open source imagemagick software. our tests of imagemagick demonstrated that it can handle different color profiles and will convert images correctly if the original tiff comes with a color profile. this gave us confidence that past concerns about jpeg2000 and imagemagick had been resolved. these images were then properly sorted into their original order and combined into a single pdf/a-2 file. an alternative is to code the tiff's image data stream directly into a pdf/a file, but this approach would miss one benefit of pdf/a-2: tremendous file size reduction with jpeg2000.
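this ordering-and-conversion step can be scripted around imagemagick. the following java sketch is an illustration under stated assumptions rather than the authors' code: it assumes the imagemagick convert command is on the system path and that page order is encoded in the tiff file names of a hypothetical document directory. because the option that requests lossless jpeg2000 coding differs between imagemagick builds and delegates, it is flagged as a placeholder in the comments instead of being asserted here.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/*
 * sketch: recover page order from a document directory of per-page tiffs
 * and convert each page to jpeg 2000 by shelling out to imagemagick.
 * assumptions: the "convert" command is on the path; the option that requests
 * lossless jpeg 2000 coding varies by imagemagick build, so it is not asserted here.
 */
public class TiffToJp2 {

    static List<File> sortedTiffs(File documentDir) {
        File[] pages = documentDir.listFiles((dir, name) -> name.toLowerCase().endsWith(".tif"));
        if (pages == null) {
            return new ArrayList<>();
        }
        Arrays.sort(pages); // page order is encoded in the file names
        return new ArrayList<>(Arrays.asList(pages));
    }

    static void convertToJp2(File tiff) throws IOException, InterruptedException {
        String jp2Path = tiff.getPath().replaceAll("\\.tif$", ".jp2");
        // add the lossless-compression option appropriate to the installed
        // imagemagick/jpeg 2000 delegate before using this in production
        ProcessBuilder pb = new ProcessBuilder("convert", tiff.getPath(), jp2Path);
        pb.inheritIO();
        if (pb.start().waitFor() != 0) {
            throw new IOException("imagemagick conversion failed for " + tiff);
        }
    }

    public static void main(String[] args) throws Exception {
        for (File page : sortedTiffs(new File(args[0]))) {
            convertToJp2(page);
        }
    }
}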
the following is the pseudocode for ordering and combining all the tiffs in a text document into a single pdf/a-2 file.

createpdfa2(queue tifflist) {
    create an empty queue xmlq;
    create an empty queue jp2q;
    /* tifflist is a pre-sorted queue based on the original page order */
    /* convert each tiff to jpeg2000 losslessly, then add each jpeg2000 and its metadata into a queue */
    while (tifflist is not empty) {
        string tifffilepath = tifflist.dequeue();
        string xmlfilepath = tiff metadata extracted using exiftool;
        xmlq.enqueue(xmlfilepath);
        string jp2filepath = jpeg2000 file location from tiff converted by imagemagick;
        jp2q.enqueue(jp2filepath);
    }
    /* convert each image's metadata to xmp, add each jpeg2000 and its metadata into the pdf/a-2 file based on its original order */
    document pdf2b = new document();
    /* create pdf/a-2b conformance level */
    pdfawriter writer = pdfawriter.getinstance(pdf2b, new fileoutputstream(pdfafilepath), pdfaconformancelevel.pdf_a_2b);
    writer.createxmpmetadata(); // create root xmp
    pdf2b.open();
    while (jp2q is not empty) {
        image jp2 = image.getinstance(jp2q.dequeue());
        rectangle size = new rectangle(jp2.getwidth(), jp2.getheight()); // pdf page size setting
        pdf2b.setpagesize(size);
        pdf2b.newpage(); // create a new page for a new image
        byte[] bytearr = xmpmanipulation(xmlq.dequeue()); // convert original metadata based on the xmp standard
        writer.setpagexmpmetadata(bytearr);
        pdf2b.add(jp2);
    }
    pdf2b.close();
}

converting pdf/a-2 files back to tiffs and jpeg2000s

to ensure that we can extract raster images from the newly created pdf/a-2 file, the authors also wrote code to convert a pdf/a-2 file back to the original tiff or jpeg2000 format. this implementation was the reverse of the above operation. once the reverse conversion process was completed, the authors verified that the image files created from the pdf/a-2 file were the same as before the conversion to pdf/a-2. note that we generated md5 checksums to verify the image data streams. the image data streams are the same, but the location of metadata can vary because of the inconsistent tiff tags used over the years; when converting one tiff to another tiff, imagemagick has its own implementation of metadata tags. the code can be found in appendix b.

pdf/a validation

pdf/a is one of the most recognized digital preservation formats, specially designed for long-term preservation and access. however, no commonly accepted pdf/a validator was available in the past, although several commercial and open source pdf preflight and validation engines (e.g., acrobat) were available. validating a pdf/a file against the pdf/a standards is a challenging task for a few reasons, including the complexity of the pdf and pdf/a formats. the pdf association and the open preservation foundation recognized the need and started a project to develop an open source pdf/a validator and build a maintenance community. their result, verapdf, is an open source validator designed for all pdf/a parts and conformance levels. released in january 2017, verapdf aims to become the commonly accepted pdf/a validator.10 our generated pdf/as have been validated with verapdf 1.4 and adobe acrobat pro dc preflight. both products validated the pdf/a-2b files as fully compliant. our implementations showed that verapdf 1.4 verified more cases than acrobat dc preflight. figure 1 shows a pdf file structure and its metadata.

figure 1. a pdf object tree with root-level metadata.
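as an illustration of how such validation can be folded into a batch workflow, the java sketch below shells out to the verapdf command-line tool for every generated file. it is a sketch under stated assumptions, not the authors' code: it assumes verapdf is installed and on the system path, that the installed release accepts a --flavour option for selecting pdf/a-2b validation, and that a hypothetical pdfa-output directory holds the files to check; exit-code semantics and the report format should be confirmed against the verapdf documentation.

import java.io.File;
import java.io.IOException;

/*
 * sketch: batch-validate generated files with the verapdf command-line tool.
 * assumptions: verapdf is installed and on the path, the installed release
 * accepts a --flavour option, and the directory name is hypothetical.
 */
public class ValidatePdfA {

    static boolean validate(File pdf) throws IOException, InterruptedException {
        // request pdf/a-2b validation and keep the xml report next to the file
        ProcessBuilder pb = new ProcessBuilder("verapdf", "--flavour", "2b", pdf.getPath());
        pb.redirectOutput(new File(pdf.getPath() + ".verapdf-report.xml"));
        pb.redirectErrorStream(true);
        // a zero exit code is treated as success here; in practice the xml
        // report should be parsed, since exit-code behavior can vary by release
        return pb.start().waitFor() == 0;
    }

    public static void main(String[] args) throws Exception {
        File[] pdfs = new File("pdfa-output").listFiles((d, n) -> n.endsWith(".pdf"));
        if (pdfs == null) {
            return;
        }
        for (File pdf : pdfs) {
            System.out.println(pdf.getName() + (validate(pdf) ? ": passed" : ": needs review"));
        }
    }
}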
runtime and conclusion

the time complexity of our code is o(n log n) because of the sorting algorithm used. tiffs were first converted to jpeg2000. when jpeg2000 images are added to a pdf/a-2 file, no further image manipulation is required because the generated pdf/a-2 uses jpeg2000 directly (in other words, it uses the jpxdecode filter). tables 1 and 2 show the performance comparison running in our computer hardware and software environment (intel core i7-2600 cpu @ 3.4 ghz, 8 gb ddr3 ram, 3 tb 7,200-rpm 64 mb-cache hard disk running ubuntu 16.10).

table 1. runtimes of converting grayscale tiffs to jpeg2000s and to pdf/a-2b

no. of files   total file size (mb)   image conversion runtime, tiffs to jp2s (seconds)   total runtime, tiffs to jp2s to a single pdf/a-2b (seconds)
1              9.1                    3.61                                                 3.98
10             91.1                   35.63                                                36.71
20             182.2                  71.83                                                73.98
50             455.5                  179.06                                               184.63
100            910.9                  358.3                                                370.91

table 2. runtimes of converting color tiffs to jpeg2000s and to pdf/a-2b

no. of files   total file size (mb)   image conversion runtime, tiffs to jp2s (seconds)   total runtime, tiffs to jp2s to a single pdf/a-2b (seconds)
1              27.3                   14.80                                                14.94
10             273                    150.51                                               151.55
20             546                    289.95                                               293.21
50             1,415                  741.89                                               749.75
100            2,730                  1490.49                                              1509.23

the results show that (a) the majority of the runtime (more than 95 percent) is spent in converting a tiff to a jpeg2000 using imagemagick (see figure 2); (b) the average runtime of converting a tiff scales roughly linearly with the file's size (see figure 2); (c) in comparison, the runtime of converting a color tiff is significantly higher than that of converting a greyscale tiff (see figure 2); and (d) it is feasible in terms of time and resources to convert existing master images of digital document collections to pdf/a-2b. for example, converting 1 tb of color tiffs would take 552,831 seconds (153.5 hours; 6.398 days) using the above hardware. the authors have already processed more than 600,000 tiffs using this method. the authors conclude that adopting pdf/a as the preferred master file format for digitization of text documents gives institutions advantages over tiff/jpeg2000. the above implementation demonstrates the ease, the reasonable runtime, and the availability of open source software to perform such conversions. from both the theoretical analysis and empirical evidence, the authors show that pdf/a has advantages over the traditionally preferred file format, tiff, for digitization of text documents. following best practice, a pdf/a file can be a self-contained and self-described container that accommodates all the data from digitization of textual materials, including page-level metadata and icc profiles.

summary

the goal of this article is to demonstrate empirical evidence of using pdf/a for digitization of text documents. the authors evaluated and used multiple open source software programs for processing raster images, extracting image metadata, and generating pdf/a files. these pdf/a files were validated using the up-to-date pdf/a validators verapdf and acrobat preflight. the authors also calculated the time complexity of the program and measured the total runtime in multiple testing cases. most of the runtime was spent on image conversions from tiff to jpeg2000. the creation of the pdf/a-2b file with associated page-level metadata accounted for less than 5 percent of the total runtime.
the runtime of converting a color tiff was much higher than that of converting a greyscale one. our theoretical analysis and empirical examples show that using pdf/a-2 presents many advantages over the traditionally preferred file formats (tiff/jpeg2000) for digitization of text documents.

figure 2. file size, greyscale and color tiffs and runtime ratio.

appendix a: sample tiff metadata with icc header

the appendix reproduces an exiftool report for one master scan: a 3400 × 4680 pixel, 8-bits-per-sample, uncompressed rgb tiff at 400 × 400 dpi in chunky (interleaved) planar configuration, with an embedded epson srgb icc display profile (copyright seiko epson corporation); binary blocks such as the icc profile are reported as "(binary data ..., use -b option to extract)".

appendix b: sample code to convert pdf/a-2 back to jpeg2000s

/* assumption: the pdf/a-2b file was specifically generated from image objects converted from tiff images with jpxdecode along with page-level metadata */
public static void parse(string src, string dest) throws ioexception {
    pdfreader reader = new pdfreader(src);
    pdfobject obj;
    int counter = 0;
    for (int i = 1; i <= reader.getxrefsize(); i++) {
        obj = reader.getpdfobject(i);
        if (obj != null && obj.isstream()) {
            prstream stream = (prstream) obj;
            byte[] b;
            try {
                b = pdfreader.getstreambytes(stream);
            } catch (unsupportedpdfexception e) {
                b = pdfreader.getstreambytesraw(stream);
            }
            pdfobject pdfsubtype = stream.get(pdfname.subtype);
            fileoutputstream fos = null;
            if (pdfsubtype != null && pdfsubtype.tostring().equals(pdfname.xml.tostring())) {
                fos = new fileoutputstream(string.format(dest + "_xml/" + counter + ".xml", i));
                system.out.println("page metadata extracted!");
            }
            if (pdfsubtype != null && pdfsubtype.tostring().equals(pdfname.image.tostring())) {
                counter++;
                fos = new fileoutputstream(string.format(dest + "_jp2/" + counter + ".jp2", i));
            }
            if (fos != null) {
                fos.write(b);
                fos.flush();
                fos.close();
                system.out.println("jpeg2000 conversion from pdf completed!");
            }
        }
    }
}
/* then use the imagemagick library to convert jpeg2000s to tiffs */

references

1 pdf-tools.com and pdf association, "pdf/a—the standard for long-term archiving," version 2.4, white paper, may 20, 2009, http://www.pdf-tools.com/public/downloads/whitepapers/whitepaper-pdfa.pdf; duff johnson, "white paper: how to implement pdf/a," talking pdf, august 24, 2010, https://talkingpdf.org/white-paper-how-to-implement-pdfa/; alexandra oettler, "pdf/a in a nutshell 2.0: pdf for long-term archiving," association for digital standards, 2013, https://www.pdfa.org/wp-content/until2016_uploads/2013/05/pdfa_in_a_nutshell_211.pdf; library of congress, "pdf/a, pdf for long-term preservation," last modified july 27, 2017, https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml.
2 library of congress, "the time and place for pdf: an interview with duff johnson of the pdf association," the signal (blog), december 12, 2017, https://blogs.loc.gov/thesignal/2017/12/the-time-and-place-for-pdf-an-interview-with-duff-johnson-of-the-pdf-association/.

3 yan han, "beyond tiff and jpeg2000: pdf/a as an oais submission information package container," library hi tech 33, no. 3 (2015): 409–23, https://doi.org/10.1108/lht-06-2015-0068.

4 federal agencies digital guidelines initiative, technical guidelines for digitizing cultural heritage materials (washington, dc: federal agencies digital guidelines initiative, 2016), http://www.digitizationguidelines.gov/guidelines/fadgi%20federal%20%20agencies%20digital%20guidelines%20initiative-2016%20final_rev1.pdf.

5 duff johnson, "us federal agencies approve pdf/a," pdf association, september 2, 2016, https://www.pdfa.org/new/us-federal-agencies-approve-pdfa/.

6 bruno lowagie, itext in action, 2nd ed. (stamford, ct: manning, 2010).

7 "itext 5.4.4," itext, last modified september 16, 2013, http://itextpdf.com/changelog/544.

8 timothy robert hart and denise de vries, "metadata provenance and vulnerability," information technology and libraries 36, no. 4 (2017), https://doi.org/10.6017/ital.v36i4.10146.

9 johan van der knijff, "jpeg 2000 for long-term preservation: jp2 as a preservation format," d-lib magazine 17, no. 5/6 (2011), https://doi.org/10.1045/may2011-vanderknijff.

10 pdf association, "how verapdf does pdf/a validation," 2016, https://www.pdfa.org/how-verapdf-does-pdfa-validation/.
let's get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects

tanya m. johnson

tanya m. johnson (tmjohnso@gmail.com), a recent mlis degree graduate from the school of communication & information, rutgers, the state university of new jersey, is winner of the 2016 lita/ex libris student writing award.

information technology and libraries | june 2016

abstract

three-dimensional objects are important sources of information that should not be ignored in the increasing trend towards digitization. previous research has not addressed the evaluation of digitized versions of three-dimensional objects. this paper first reviews research concerning such digitization, in both two and three dimensions, as well as public access in this context. next, evaluation criteria for websites incorporating digital versions of three-dimensional objects are extrapolated from previous research. finally, five websites are evaluated, and suggestions for best practices to provide public access to digital versions of three-dimensional objects are proposed.

introduction

much of the literature surrounding the increased efforts of libraries and museums to digitize content has focused on two-dimensional forms, such as books, photographs, or paintings. however, information does not only come in two dimensions; there are sculptures, artifacts, and other three-dimensional objects that have been unfortunately neglected by this digital revolution. as one author stated, "while researchers do not refer to three-dimensional objects as commonly as books, manuscripts, and journal articles, they are still important sources of information and should not be taken for granted" (jarrell 1998, 32). the importance of three-dimensional objects as information that can and should be shared is not a new phenomenon; indeed, as early as 1887, museologists and educators forwarded the view that "museums were in effect libraries of objects" that provided information not supplied by books alone (given and mctavish 2010, 11). however, it is only recently, with the advent of newer technological mechanisms, that such objects could be shared with the public on a larger scale. no longer do people need to physically visit museums to experience and learn from three-dimensional objects. rather, various techniques have been utilized to place digital versions of such objects on the websites of museums and archives, and projects have been created by various universities in order to enhance that digital experience. nevertheless, as newell (2012) states:

collections-holding institutions increasingly regard digital resources as additional objects of significance, not as complete replacements for the original. digital technologies work best when they enable people who feel connected to museum objects to have the freedom to deepen these
relationships and, where appropriate, to extend outsiders' understandings of the objects' cultural contexts. the raison d'être of museums and other cultural institutions remains centred on the primacy of the object and in this sense continues to privilege material authenticity. (303)

in this regard, three-dimensional visualization of physical objects can be seen as the next step for museums and cultural heritage institutions that seek to further patrons' connection to such objects via the internet. indeed, in this digital age, the goals of museums and archives are changing, converging with those of libraries to focus more efforts on providing information to the public, and, along with the growing trend to digitize information contained within libraries, there has been a concomitant trend to digitize the contents of museums in order to provide greater public access to collections (given and mctavish 2010). in light of this progress, this paper will review various methods of presenting three-dimensional objects to the public on the internet and, based on an evaluation of five digital collections, attempt to provide some advice as to best practices for museums or institutions seeking to digitize such objects and present them to the public via a digital collection.

literature review

two-dimensional digitization

there are many ways to present digital versions of three-dimensional objects on a webpage, ranging from simple two-dimensional photography to complicated three-dimensional scanning and rendering. beginning on the simpler end of the scale, bincsik, maezaki, and hattori (2012) describe the process of photographing japanese decorative art objects in order to create an image database of objects from multiple museums. specifically, the researchers explain that they need high quality photographs showing each object in all directions, as well as close-up images of fine details, in order to recreate the physical research experience as closely as possible. they also note that, for the same reason, the context of each object must be recorded, including photographs of any wrapping or storage materials and accompanying documentation. for this project, the researchers utilized nikon professional or semi-professional cameras, with zoom and macro lenses, and often used small apertures to increase depth-of-field. at times, they also took measurements of the objects in order to assist museums in maintaining accurate records. the raw image files were then processed with programs such as adobe photoshop, saved as original tif files, and converted into jpeg format for upload. despite the success of the project, the researchers also noted the limitations of digitizing three-dimensional objects:

with decorative art objects some information is inevitably lost, such as the weight of the object, the feeling of its surface texture or the sense of its functionality in terms of proportions and balance. digital images clearly can fulfill many research objectives, but in some cases they can only be used as references. one objective of the decorative arts database is to advise the researcher in selecting which objects should be examined in person. (bincsik, maezaki, and hattori 2012, 46)

one difficulty with photography, particularly when digitizing artwork, is that color is a function of light.
thus, a single object will often appear to be different colors when photographed in different lighting conditions using conventional digital cameras, which process images using rgb filters. information technology and libraries | june 2016 41 more accurate representations of objects can be acquired using multispectral imaging, which uses a higher number of parameters (the international standard is 31, compared to rgb’s 3) in order to obtain more information about the reflectance of an object at any particular point in space (novati, pellegri, and schettini 2005). multispectral imaging, however, is very expensive and, despite some researchers’ attempts to create affordable systems (e.g., novati, pellegri, and schettini 2005), the acquisition of multispectral images is generally limited to large institutions with considerable funding (chane et al. 2013). the use of two-dimensional photography to digitize objects is not limited to the arts; in the natural sciences, different types of photographic equipment have been developed to document existing collections and enhance scientific observation. gigapixel imaging, for example, has been utilized to allow museum visitors to virtually explore large petroglyphs located in remote locations as well as for documentation and viewing of dinosaur bone specimens that are not on public display (louw and crowley 2013). this technology consists of taking many, very high resolution photographs that are then, via computer software, “aligned, blended, and stitched” together to create one extremely detailed composite image (louw and crowley 2013, 89–90). robotic systems, such as gigapan, have been developed to speed up the process and permit rapid recording and processing of the necessary area. once the gigapixel image is created, it can then be uploaded and displayed on the web in dynamic form, including spatial navigation of the image with embedded text, audio, or video at specific locations and zoom levels to provide further information (louw and crowley 2013). various types of gigapixel imaging, including the gigapan system, have also been used to digitize important collections of biological specimens, particularly insects, which are often stored in large drawers. one study examined the documentation of entomological specimens by “whole-drawer imaging” using various gigapixel imaging technologies (holovachov, zatushevsky, and shydlovsky 2014). the researchers explained that different gigapixel imaging systems (many of which are commercial and proprietary) utilize different types of cameras and lenses, as well as different types of software for processing. however, despite the expensive cost of some commercially available systems, it is possible for museums and other institutions to create their own, economically viable versions. the system created by holovachov, zatushevsky, and shydlovsky utilized a standard slr camera, fitted with a macro lens and attached to an immovable stand. the researchers manually set up lighting, focus, aperture, and other settings, and moved the insect drawer along a pre-determined grid pattern in order to obtain the multiple overlapping photographs necessary to create a large gigapixel image. 
they used a freely available stitching software program and manually corrected stitching artifacts and color balance issues that resulted from the use of a non-telecentric lens.1 despite the lower cost of their individualized system, however, the researchers noted that the process was much more time-consuming and necessitated more labor from workers digitizing the collection. moreover, technologically speaking, the researchers emphasized the limits of two-dimensional imaging, given that the "diagnostic characteristics of three-dimensional insects," as well as the accompanying labels, are often invisible when a drawer is only photographed from the top. thus, the researchers concluded that, ultimately, "the whole-drawer digitizing of insect collections needs to be transformed from two-dimensions to three-dimensions by employing complex imaging techniques (simultaneous use of multiple cameras positioned at different angles) and a digital workflow" (holovachov, zatushevsky, and shydlovsky 2014, 7).

1 the difference between telecentric and non-telecentric lenses is explained by the researchers: "contrary to ordinary photographic lenses, object-space telecentric lenses provide the same object magnification at all possible focusing distances. an object that is too close or too far from the focus plane and not in focus, will be the same size as if it were in focus. there is no perspective error and the image projection is parallel. therefore, when such a lens is used to take images of pinned insects in a box, all vertical pins will appear strictly vertical, independent of their position within the camera's field of view" (holovachov, zatushevsky, and shydlovsky 2014, 7).

three-dimensional digitization

given the goal of obtaining as accurate a representation as possible when digitizing objects, many researchers have turned to the use of various techniques in order to obtain three-dimensional data. acquiring a three-dimensional image of an object takes place in three steps:

1. preparation, during which certain preliminary activities take place that involve the decision about the technique and methodology to be adopted as well as the place of digitization, security planning issues, etc.
2. digital recording, which is the main digitization process according to the plan from phase 1.
3. data processing, which involves the modeling of the digitized object through the unification of partial scans, geometric data processing, texture data processing, texture mapping, etc. (pavlidis et al. 2007, 94)

steps 2 and 3 have been more technically described as (2) obtaining data from an object to create point clouds (from thousands to billions of x,y,z coordinates representing loci on the object); and (3) processing point clouds into polygon models (creating a surface on top of the points), which can then be mapped with textures and colors (metallo and rossi 2011). there are several techniques that can be utilized to acquire three-dimensional data from a physical object. table 1 explains the four general methods most commonly used by museums.

laser scanning. description: a laser source emits light onto the object's surface, which is detected by a digital camera; geometry of the object is extracted by triangulation or time-of-flight calculations. positives: high accuracy in capturing geometry; can capture small objects and entire buildings (using different hardware). negatives: limited texture and color captured; shiny surfaces refract the laser. approximate price range: $3,000–$200,000.

white light (structured light) scanning. description: a pattern of light is projected onto the object's surface, and deformations in that pattern are detected by a digital camera; geometry is extracted by triangulation from deformations. positives: captures texture details, making it very accurate; can capture color. negatives: dark, shiny, or translucent objects are problematic. approximate price range: $15,000–$250,000.

photogrammetry. description: three-dimensional data is extracted from multiple two-dimensional pictures. positives: can capture small objects and mountain ranges; good color information. negatives: need either precise placement of cameras or more precise software to obtain accurate data. approximate price range: cameras: $500–$50,000; software: free–$40,000.

volumetric scanning. description: magnetic resonance imaging (mri) uses a strong magnetic field and radio waves to detect geometric, density, volume and location information; computed tomography (ct) uses rotating x-rays to create two-dimensional slices, which can then be reconstructed into three-dimensional images. positives: both types can view the interior and exterior of an object; ct can be used for reflective or translucent objects; mri can image soft tissues. negatives: no color information; mri requires object to have high water content. approximate price range: $200,000–$2,000,000.

table 1. description of four general methods of acquiring three-dimensional data about physical objects (table information compiled by reference to pavlidis et al. 2007; metallo and rossi 2011; abel et al. 2011; and berquist et al. 2012).

the type of three-dimensional digitization used can ultimately depend upon the types of objects to be imaged or the type of data needed. for example, in digitizing human skeletal collections, one study explained that three-dimensional laser scanning was an advantageous technique to create models of bones for preservation and analysis, but cautioned that ct scans would be needed to examine the internal structures of such specimens (kuzminsky and gardiner 2012). another study utilized several techniques in an attempt to decipher graffiti inscriptions on ancient roman pottery shards, ultimately concluding that high-resolution photography (similar to gigapixel imaging) and three-dimensional laser scanning both provided detailed and helpful data (montani et al. 2012). additionally, sometimes multiple types of digitization can be used for the same objects with similar results. one study, for example, obtained virtually equivalent three-dimensional models of the same object using laser scanning and two types of photogrammetry (lerma and muir 2014). most recently, researchers have been utilizing combinations of digitization techniques to obtain the most accurate representations possible. chane et al. (2013), for example, examined methods of combining three-dimensional digitization with multispectral photography in order to obtain enhanced information concerning the physical object in question.
the researchers explained that combining the two processes is difficult because, in order to obtain multispectral textural data that is mapped to geometric positions, the object must be imaged from identical locations by multiple scanners/cameras or else the data processing that combines the two types of data becomes extremely complex. as a compromise, the researchers created a system of optical tracking based on photogrammetry techniques that permits the collection and integration of geometric positioning data and multispectral textures utilizing precise targeting procedures. however, the researchers noted that most systems integrating multispectral photography with threedimensional digitization tended to be quite bulky, did not adapt easily to different types of objects, and needed better processing algorithms for more complex three-dimensional objects (chane et al. 2013). public access to three-dimensionally digitized objects despite museums’ growing focus on increasing public access to collections via digitization (given and mctavish 2010), there is very little literature addressing public access to three-dimensionally digitized objects. indeed, studies in this realm tend to focus on the technological aspects of either the modeling of specific objects or collections or website viewing of three-dimensional models. for example, abate et al. (2011) described the three-dimensional digitization of a particular statue from the scanning process to its ultimate depiction on a website. the researchers explained in detail the particular software architecture utilized in order to permit the remote rendering of the three-dimensional model on users’ computers via a java applet without compromising quality or necessitating download of potentially copyrighted works. by contrast, literature concerning the digital michelangelo project, during which researchers three-dimensionally digitized various michelangelo works, focused on the method used to create an accurate three-dimensional model, complete with color and texture mapping, and a visualization tool (dellepiane et al. 2008). one study did describe a project that was designed to place three-dimensional data about various cultural artifacts in an online repository for curators and other professionals (hess et al. 2011). this repository was contained within database management software, a web-based interface was designed for searching, and user access to three-dimensional images and models was provided via an activex plugin. despite the potential of the prototype, however, it appears that the project has ceased,2 and the institution’s current three-dimensional imaging project is focused on the design 2see http://www.ucl.ac.uk/museums/petrie/research/research-projects/3dpetrie/3d_projects/3d-projects-past/ecurator. http://www.ucl.ac.uk/museums/petrie/research/research-projects/3dpetrie/3d_projects/3d-projects-past/e-curator http://www.ucl.ac.uk/museums/petrie/research/research-projects/3dpetrie/3d_projects/3d-projects-past/e-curator information technology and libraries | june 2016 45 of a traveling exhibition incorporating, among other things, three-dimensional models of artifacts and physical replicas created from such models.3 studies that do address public access directly tend to focus on the improvement of museum websites generally. 
for example, in terms of user expectations of museum websites, one study found that approximately 63 percent of visitors to a museum’s website did so in order to search the digital collection (kravchyna and hastings 2002). another study found four types of museum website users, who each had different needs and expectations of sites. relevantly, educators sought collections that were “the more realistic the better,” including suggestions like incorporating three-dimensional simulations of physical objects so that students could “explore the form, construction, texture and use of objects” (cameron 2003, 335). further, non-specialist users “value free choice learning” and “access online collections to explore and discover new things and build on their knowledge base as a form of entertainment” (cameron 2003, 335). similarly, some studies have addressed the incorporation of web 2.0 technologies into museum websites. srinivasan et al. (2009), for example, argue that web 2.0 technologies must be integrated into museum catalogs rather than simply layered over existing records because users’ interest in objects is increased by participation in the descriptive practice. an implementation of this concept is found in hunter and gerber’s (2010) system of social tagging attached to threedimensional models. this paper is an effort to address the gap between the technical process of digitizing and presenting three-dimensional objects on the web and the user experience of such. through the evaluation of five websites, this paper will provide some guidance for the digitization of threedimensional objects and their presentation in digital collections for public access. methodology and evaluative criteria evaluations of digital museums are not as prevalent as evaluations of digital libraries. however, given the similar purposes of digital museums and digital libraries, it is appropriate to utilize similar criteria. for digital libraries, saracevic (2000) synthesized evaluation criteria into performance questions in two broad areas: (a) user-centered questions, including how well the digital library supports the society or community served, how well it supports institutional or organizational goals, how well it supports individual users’ information needs, and how well the digital library’s interface provides access and interaction; and (b) systemcentered questions, including hardware and network performance, processing and algorithm performance, and how well the content of the collection is selected, represented, organized, and managed. xie (2008) focused on user-centered evaluation and found five general criteria that exemplified users’ own evaluations of digital libraries: interface usability, collection quality, service quality, system performance, and user satisfaction. parandjuk (2010) used information architecture to construct criteria for the evaluation of a digital library, including the following: • uniformity of standards, including consistency among webpages and individual records; • findability, including ease of use and multiple ways to access the same information; • sub-navigation, including indexes, sitemaps, and guides; 3see http://www.3dencounters.com. 
• contextual navigation, including simplified searching and co-location of different types of resources;
• language, including consistency in labeling across pages and records and appropriateness for the audience; and
• integration of searching and browsing.

this system is particularly appropriate in the context of digital museums, as it emphasizes the curatorial or organizational aspect of the collection in order to support learning objectives. in one comprehensive evaluation of the websites of art museums, pallas and economides (2008) created a framework for such evaluation, incorporating six dimensions: content, presentation, usability, interactivity and feedback, e-services, and technical. each dimension then contained several specific criteria. many of the criteria overlapped, however, and three-dimensional imaging, for example, was placed within the e-services dimension, under virtual tours, although it could have been placed within presentation, with other multimedia criteria, or even within interactivity, with interactive multimedia applications. the problem in trying to evaluate a particular part of a museum's website, namely, the way it presents three-dimensional objects in digital form, is that the level of specificity almost renders many of the evaluation criteria from previous studies irrelevant. as hariri and norouzi (2011) suggest, evaluation criteria should be based on the objective of the evaluation. hence, based on portions of the above-referenced studies, this author has created a more focused evaluation framework, concentrating on criteria that are particularly relevant to museums' digital presentations of three-dimensional objects. this framework is detailed in table 2, below.

functionality: what technology is used to display the object? how well does it work? must programs or files be downloaded? are the loading times of displays acceptable?
usability: how easy is the site to use? what is the navigation system? are there searching and browsing functions, and how well does each work? how findable are individual objects?
presentation: how does the display of the object look? what is the context in which the object is presented? are there multiple viewing options? is there any interactivity permitted?
content: does the site provide an adequate collection of objects? for individual objects, is there sufficient information provided? is there additional educational content?

table 2. summary of evaluative criteria

five digital collections, specified below, will be evaluated based on these criteria. this will be done in a case study manner, describing each website based on the above criteria and then using those evaluations to make suggestions for best practices.

results

it is difficult to compare different types of digital collections, particularly when the focus is on different types of technology utilized to display similar objects. however, because the goal here is to determine the best practices for the digital presentation of three-dimensional objects, it is important to evaluate a variety of techniques in a variety of fields. thus, the following digital collections have been chosen to illustrate different ways in which such objects can be displayed on a website.
museum of fine arts, boston (mfa) (http://www.mfa.org/collections) the mfa, both in person and online, boasts a comprehensive and extensive collection of art and historical artifacts of varying forms. the website is very easy to navigate, with well-defined browsing options and easy search capabilities, allowing for refinement of results by collection or type of item. there are many collections, which are well organized and curated into separate exhibits and galleries. in addition, when viewing each gallery, suggestions are linked for related online exhibitions as well as tours and exhibits at the physical museum. each item record contains a detailed description of the item as well as its provenance. thus, the mfa website attains a very high rating for usability and content. however, individual items are represented by only single pictures of varying quality. some pictures are color, some are black and white, and no two pictures appear to have the same lighting. additionally, despite being slow to load, even the pictures that appear to be of the best quality cannot be of high resolution, as zooming in makes them slightly blurry. accordingly, the mfa website receives a medium rating for functionality and a low rating for presentation. digital fish library (dfl) (http://www.digitalfishlibrary.org/index.php) the dfl project is a comprehensive program that utilizes mri scanning to digitize preserved biological fish samples from a particular collection housed at the scripps institution of oceanography. after mri scans of a specimen are taken, the data is processed and translated into various views that are placed on the website, accompanied by information about each species (berquist et al. 2012). navigating the dfl website is very intuitive, as the individual specimen records are organized by taxonomy. it is easy to search for particular species or browse through the clickable, pictorial interface. records for each species include detailed information about the individual specimen, the specifics of the scans used to image each, and broader information about the species. individual records also provide links to other species within the taxonomic family. thus, the dfl website attains high ratings in both usability and content. for functionality and presentation, however, the ratings are medium. although for each item there are videos and still images obtained from threedimensional volume renderings and mri scans, they are small in size and have low resolution. there is no interactive component, with the possible exception of the “digital fish viewer” that supposedly requires java, but this author could not get it to work despite best efforts. one nice feature, shown in figure 1 below, is that some of the specimen records have three-dimensional renderings showing and explaining the internal structures of the species. http://www.mfa.org/collections http://www.digitalfishlibrary.org/index.php let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 48 figure 1. 
annotated three-dimensional rendering of internal structures of hammerhead shark, from the digital fish library (http://www.digitalfishlibrary.org/library/viewimage.php?id=2851) the eton myers collection (http://etonmyers.bham.ac.uk/3d-models.html) the eton myers collection of ancient egyptian art is housed at eton college, and a project to threedimensionally digitize the items for public access was undertaken via collaboration between that institution and the university of birmingham. digitization was accomplished with threedimensional laser scanners, data was then processed with geomagic software to produce point cloud and mesh forms, and individual datasets were reduced in size and converted into an appropriate file type to allow for public access (chapman, gaffney, and moulden 2010). usability of the eton myers collection website is extremely low. the initial interface is simply a list of three-dimensional models by item number with a description of how to download the appropriate program and files. another website from the university of birmingham (http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=the+eton+myers+col lection) contains a more museum-like interface, but contains many more records for objects than are contained on the initial list of three-dimensional models. moreover, most of the records do not even include pictures of the items, let alone links to the three-dimensional models, and the records that do include pictures do not necessarily include such links. even when a record has a link to the three-dimensional model, it actually redirects to the full list of models rather than to the individual item. there is no search functionality from the initial list of three-dimensional models, and no way to browse other than to, colloquially speaking, poke and hope. individual items are only identified by item number, and, aside from the few records that have accompanying pictures on the university of birmingham site, there is no way to know to what item any given number refers. the http://www.digitalfishlibrary.org/library/viewimage.php?id=2851 http://etonmyers.bham.ac.uk/3d-models.html http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=the+eton+myers+collection http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=the+eton+myers+collection information technology and libraries | june 2016 49 website attains only a low rating for content; although it seems that there may be a decent number of items in the collection, it is impossible to know for certain given the problems with the interface and the fact that individual items are virtually unidentified. the eton myers collection website also receives a low rating for functionality. in order to access three-dimensional models of items, users must download and install a program called meshlab, then download individual folders of compressed files, then unzip those files, and finally open the appropriate file in meshlab. despite compression, some of the file folders are still quite large and take some time to download. presentation of the items is also rated low. even for the high resolution versions of the three-dimensional renderings, viewed in meshlab, the geometry of the objects seems underdeveloped (e.g., hieroglyphics are illegible) and surface textures are not well mapped (e.g., colors are completely off). this is evident from a comparison of the threedimensional rendering with a two-dimensional photograph of the same item, as in figure 2, below. figure 2. 
comparison of original photograph (left) and three-dimensional rendering (right) of item number ecm 361, from the eton myers collection (http://mimsy.bham.ac.uk/detail.php?t=objects&type=ext&f=&s=&record=0&id_number=ecm+3 61&op-earliest_year=%3d&op-latest_year=%3d). notably, chapman, gaffney, and moulden (2010) indicate that the detailed three-dimensional imaging enabled them to identify tooling marks and read previously unclear hieroglyphics on certain items. thus, it is possible that the problems with the renderings may be a result of a loss in quality between the original models and the downloaded versions, particularly given that the files were reduced in size and converted prior to being made available for download. http://mimsy.bham.ac.uk/detail.php?t=objects&type=ext&f=&s=&record=0&id_number=ecm+361&op-earliest_year=%3d&op-latest_year=%3d http://mimsy.bham.ac.uk/detail.php?t=objects&type=ext&f=&s=&record=0&id_number=ecm+361&op-earliest_year=%3d&op-latest_year=%3d let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 50 epigraphia 3d project (http://www.epigraphia3d.es) the epigraphia 3d project was created to present an online collection of various historical roman epigraphs (also known as inscriptions) that were discovered and excavated in spain and italy; the physical collection is housed at the museo arqueológico nacional (madrid). digital imaging was accomplished using photogrammetry, free software was utilized to create three-dimensional object models and renderings, and photoshop was used to obtain appropriate textures. finally, the three-dimensional model was published on the web using sketchfab, a web service similar to flickr that allows in-browser viewing of three-dimensional renderings in many different formats (ramírez-sánchez et al. 2014). the epigraphia 3d website is intuitive and informative. browsing is simple because there are not many records, but, although it is possible to search the website, there is no search function specifically directed to the collection. thus, usability is rated as medium. despite the fact that the website provides descriptions of the project and the collection, as well as information about epigraphs generally, the website attains a medium rating for content in light of the small size of the collection and the limited information given for each individual item. however, the epigraphia 3d website receives very high ratings for functionality and presentation. the individual threedimensional models are detailed, legible, and interactive. individual inscriptions are transcribed for each item. the use of sketchfab to display the models is effective; no downloading is necessary, and it takes an acceptable amount of time to load. when viewing the item, users can rotate the object in either “orbit” or “first person” mode, as well as view it full-screen or within the browser window. users can also display the wireframe model and the textured or surfaced rendering, as shown in figure 3 below. figure 3. three-dimensional textured (left) and wireframe (middle) renderings from the epigraphia 3d project (http://www.epigraphia3d.es/3d-01.html), as compared to an original twodimensional photograph of the same object (right) (http://edabea.es/pub/record_card_1.php?refpage=%2fpub%2fsearch_select.php&quicksearch=dapynus&r ec=19984). 
http://www.epigraphia3d.es/ http://www.epigraphia3d.es/3d-01.html http://eda-bea.es/pub/record_card_1.php?refpage=%2fpub%2fsearch_select.php&quicksearch=dapynus&rec=19984 http://eda-bea.es/pub/record_card_1.php?refpage=%2fpub%2fsearch_select.php&quicksearch=dapynus&rec=19984 http://eda-bea.es/pub/record_card_1.php?refpage=%2fpub%2fsearch_select.php&quicksearch=dapynus&rec=19984 information technology and libraries | june 2016 51 smithsonian x 3d (http://3d.si.edu) the smithsonian x 3d project, although affiliated with all of the smithsonian’s varying divisions, was created to test the application of three-dimensional digitization techniques to “iconic collection objects” (http://3d.si.edu/about). the website provides significant detail concerning the project itself, mostly in the form of videos, and individual items, many of which are linked to “tours” that incorporate a story about the object. content is rated as medium because, despite the depth of information provided about individual items, there are still very few items within the collection. the website also receives a medium rating for usability, given the simple browsing structure, easy navigation, and lack of a search feature (all likely due at least in part to the limited content). functionality and presentation, however, are rated high. the x3d explorer in-browser software (powered by autodesk) does more than simply display a three-dimensional rendering of an object; it also permits users to edit the model by changing color, lighting, texture, and other variables as well as incorporates detailed information about each item, both as an overall description and as a slide show, where snippets of information are connected to specific views of the item. the individual three-dimensional models are high resolution, detailed, and wellrendered, with very good surface texture mapping. however, it must be noted that the x3d explorer tool is in beta and, as such, still has some bugs; for example, this author has observed a model disappear while zooming in on the rendering. table 3, below, summarizes the results of the evaluation. functionality usability presentation content mfa medium very high low very high dfl medium high medium high eton myers low low low low epigraphia 3d very high medium very high medium smithsonian x 3d high medium high medium table 3. summary of evaluation results for each website by individual criteria discussion based on the evaluation of the five websites described above, some suggested best practices for the digitization and presentation of three-dimensional objects become apparent. when digitizing, the museum should utilize the method that best suits the object or collection. for example, while mri scanning is likely the best method for three-dimensionally digitizing biological fish specimens, it is not going to be effective or feasible for digitizing artwork or artifacts (abel et al. 2011; berquist et al. 2012). regardless of the method of digitization used, however, the people conducting the imaging and processing should fully comprehend the hardware and software necessary to complete the task. additionally, although financial restraints must be considered, museums should note that some three-dimensional scanning equipment is just as economically feasible as standard digital cameras (metallo and rossi 2011). 
however, if a museum chooses to utilize only two-dimensional imaging, http://3d.si.edu/ http://3d.si.edu/about let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 52 each item should be photographed from multiple angles in high resolution, to avoid creating a website, like the mfa’s, on which everything other than the object itself is presented outstandingly. further, museums deciding on two-dimensional imaging should explore the possibility of utilizing photogrammetry to create three-dimensional models from their twodimensional photographs, like the epigraphia 3d project. there is free or inexpensive software that functions to permit the creation of three-dimensional object maps from very few photographs (ramírez-sánchez et al. 2014). finally, compatibility is a key issue when conducting threedimensional scans; the museum should ensure that the software used for rendering models is compatible with the way in which users will be viewing the models. in the context of public access to the museum’s digital collections, the website should be easy and intuitive to navigate. the mfa website is an excellent example; browsing and search functions should both be present, and reorganization of large numbers of objects into separate collections may be necessary. where searching is going to be the primary point of entry into the collection, it is important to have sufficient metadata and functional search algorithms to ensure that item records are findable. furthermore, remember that the website is simply a way to access the museum itself. hence, the collections on the website, like the collections in the physical museum, should be curated; there should be a logical flow to accessing object records. the museum may also want to have sections that are similar to virtual exhibitions, like the “tours” provided by the smithsonian x 3d project. finally, museums should ensure that no additional technological know-how (beyond being able to access the internet) is required to access the three-dimensional content in object records. users should not be required to download software or files to view records; epigraphia 3d’s use of sketchfab and the smithsonian’s x 3d explorer tool are both excellent examples of ways in which three-dimensional content can be viewed on the web without the need for extraneous software. museums and cultural heritage institutions are increasing the focus on providing public access to collections via digitization and display on websites (given and mctavish 2010). in order to do this effectively, this paper has attempted to provide some guidance as to best practices of presenting digital versions of three-dimensional objects. in closing, however, it must be noted that this author is not a technician. although this paper has tried to contend with the issues from the perspective of a librarian, there are complicated technical concerns behind any digitization project that have not been adequately addressed. in addition, this paper has not examined the role of budgetary constraints on digitization or the concomitant issues of creating and maintaining websites. moreover, because this paper has been treated as a broad overview of the digitization and presentation for public access of three-dimensional objects, the five websites evaluated were from varying fields of study. 
museums should look to more specific comparisons in order to appropriately digitize and present their collections on the web. conclusion there may not be a direct substitute for encountering an object in person, but for people who cannot obtain physical access to three-dimensional objects, the digital realm can serve as an adequate proxy. this paper has demonstrated, through an evaluation of five distinct digital collections, that utilizing three-dimensional imaging and presenting three-dimensional models of physical objects on the web can serve the important purpose of increasing public access to otherwise unavailable collections. information technology and libraries | june 2016 53 references abate, d., r. ciavarella, g. furini, g. guarnieri, s. migliori, and s. pierattini. “3d modeling and remote rendering technique of a high definition cultural heritage artefact.” procedia computer science 3 (2011): 848–52. http://dx.doi.org/10.1016/j.procs.2010.12.139. abel, r. l., s. parfitt, n. ashton, simon g. lewis, beccy scott, and c. stringer. “digital preservation and dissemination of ancient lithic technology with modern micro-ct.” computers and graphics 35, no. 4 (august 2011): 878–84. http://dx.doi.org/10.1016/j.cag.2011.03.001. berquist, rachel m., kristen m. gledhill, matthew w. peterson, allyson h. doan, gregory t. baxter, kara e. yopak, ning kang, h.j. walker, philip a. hastings, and lawrence r. frank. “the digital fish library: using mri to digitize, database, and document the morphological diversity of fish.” plos one 7, no. 4: (april 2012). http://dx.doi.org/10.1371/journal.pone.0034499. bincsik, monika, shinya maezaki, and kenji hattori. “digital archive project to catalogue exported japanese decorative arts.” international journal of humanities and arts computing 6, no. 1– 2 (march 2012): 42–56. http://dx.doi.org/10.3366/ijhac.2012.0037. cameron, fiona. “digital futures i: museum collections, digital technologies, and the cultural construction of knowledge.” curator: the museum journal 46, no. 3 (july 2003): 325–40. http://dx.doi.org/10.1111/j.2151-6952.2003.tb00098.x. chane, camille simon, alamin mansouri, franck s. marzani, and frank boochs. “integration of 3d and multispectral data for cultural heritage applications: survey and perspectives.” image and vision computing 31, no. 1 (january 2013): 91–102. http://dx.doi.org/10.1016/j.imavis.2012.10.006. chapman, henry p., vincent l. gaffney, and helen l. moulden. “the eton myers collection virtual museum.” international journal of humanities and arts computing 4, no. 1–2 (october 2010): 81–93. http://dx.doi.org/10.3366/ijhac.2011.0009. dellepiane, m., m. callieri, f. ponchio, and r. scopigno. “mapping highly detailed colour information on extremely dense 3d models: the case of david's restoration.” computer graphics forum 27, no. 8 (december 2008): 2178–87. http://dx.doi.org/10.1111/j.14678659.2008.01194.x. given, lisa m., and lianne mctavish. “what’s old is new again: the reconvergence of libraries, archives, and museums in the digital age.” library quarterly 80, no. 1 (january 2010): 7– 32. http://dx.doi.org/10.1086/648461. hariri, nadjla, and yaghoub norouzi. “determining evaluation criteria for digital libraries’ user interface: a review.” the electronic library 29, no. 5 (2011): 698–722. http://dx.doi.org/10.1108/02640471111177116. hess, mona, francesca simon millar, stuart robson, sally macdonald, graeme were, and ian brown. “well connected to your digital object? 
e-curator: a web-based e-science platform for museum artefacts.” literary and linguistic computing 26, no. 2 (2011): 193– 215. http://dx.doi.org/10.1093/llc/fqr006. http://dx.doi.org/10.1016/j.cag.2011.03.001 http://dx.doi.org/10.1371/journal.pone.0034499 http://dx.doi.org/10.3366/ijhac.2012.0037 http://dx.doi.org/10.1111/j.2151-6952.2003.tb00098.x http://dx.doi.org/10.1016/j.imavis.2012.10.006 http://dx.doi.org/10.3366/ijhac.2011.0009 http://dx.doi.org/10.1111/j.1467-8659.2008.01194.x http://dx.doi.org/10.1111/j.1467-8659.2008.01194.x http://dx.doi.org/10.1086/648461 http://dx.doi.org/10.1108/02640471111177116 http://dx.doi.org/10.1093/llc/fqr006 let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 54 holovachov, oleksandr, andriy zatushevsky, and ihor shydlovsky. “whole-drawer imaging of entomological collections: benefits, limitations and alternative applications.” journal of conservation and museum studies 12, no. 1 (2014): 1–13. http://dx.doi.org/10.5334/jcms.1021218. hunter, jane, and anna gerber. 2010. “harvesting community annotations on 3d models of museum artefacts to enhance knowledge, discovery and re-use.” journal of cultural heritage 11, no. 1 (2010): 81–90. http://dx.doi.org/10.1016/j.culher.2009.04.004. jarrell, michael c. “providing access to three-dimensional collections.” reference & user services quarterly 38, no. 1 (1998): 29–32. kravchyna, victoria, and sam k. hastings. “informational value of museum web sites.” first monday 7, no. 4 (february 2002). http://dx.doi.org/10.5210/fm.v7i2.929. kuzminsky, susan c. and megan s. gardiner. “three-dimensional laser scanning: potential uses for museum conservation and scientific research.” journal of archaeological science 39, no. 8 (august 2012): 2744–51. http://dx.doi.org/10.1016/j.jas.2012.04.020. lerma, josé luis, and colin muir. “evaluating the 3d documentation of an early christian upright stone with carvings from scotland with multiples images.” journal of archaeological science 46 (june 2014): 311–18. http://dx.doi.org/10.1016/j.jas.2014.02.026. louw, marti, and kevin crowley. “new ways of looking and learning in natural history museums: the use of gigapixel imaging to bring science and publics together.” curator: the museum journal 56, no. 1 (january 2013): 87–104. http://dx.doi.org/10.1111/cura.12009. metallo, adam, and vince rossi. “the future of three-dimensional imaging and museum applications.” curator: the museum journal 54, no. 1 (january 2011): 63–69. http://dx.doi.org/10.1111/j.2151-6952.2010.00067.x. montani, isabelle, eric sapin, richard sylvestre, and raymond marquis . “analysis of roman pottery graffiti by high resolution capture and 3d laser profilometry.” journal of archaeological science 39, no. 11 (2012): 3349–53. http://dx.doi.org/10.1016/j.jas.2012.06.011. newell, jenny. “old objects, new media: historical collections, digitization and affect.” journal of material culture 17, no. 3 (september 2012): 287–306. http://dx.doi.org/10.1177/1359183512453534. novati, gianluca, paolo pellegri, and raimondo schettini. “an affordable multispectral imaging system for the digital museum.” international journal on digital libraries 5, no. 3 (may 2005): 167–78. http://dx.doi.org/10.1007/s00799-004-0103-y. pallas, john, and anastasios a. economides. “evaluation of art museums' web sites worldwide.” information services and use 28, no. 1 (2008): 45–57. http://dx.doi.org/10.3233/isu2008-0554. 
parandjuk, joanne c. “using information architecture to evaluate digital libraries.” the reference librarian 51, no. 2 (2010): 124–34. http://dx.doi.org/10.1080/02763870903579737. http://dx.doi.org/10.5334/jcms.1021218 http://dx.doi.org/10.1016/j.culher.2009.04.004 http://dx.doi.org/10.5210/fm.v7i2.929 http://dx.doi.org/10.1016/j.jas.2012.04.020 http://dx.doi.org/10.1016/j.jas.2014.02.026 http://dx.doi.org/10.1111/cura.12009 http://dx.doi.org/10.1111/j.2151-6952.2010.00067.x http://dx.doi.org/10.1016/j.jas.2012.06.011 http://dx.doi.org/10.1177/1359183512453534 http://dx.doi.org/10.1007/s00799-004-0103-y http://dx.doi.org/10.3233/isu-2008-0554 http://dx.doi.org/10.3233/isu-2008-0554 http://dx.doi.org/10.1080/02763870903579737 information technology and libraries | june 2016 55 pavlidis, george, anestis koutsoudis, fotis arnaoutoglou, vassilios tsioukas, and christodoulos chamzas. “methods for 3d digitization of cultural heritage.” journal of cultural heritage 8, no. 1 (2007): 93–98, http://dx.doi.org/10.1016/j.culher.2006.10.007. ramírez-sánchez, manuel, josé-pablo suárez-rivero, and maría-ángeles castellano-hernández. “epigrafía digital: tecnología 3d de bajo coste para la digitalización de inscripciones y su acceso desde ordenadores y dispositivos móviles.” el profesional de la información 23, no. 5 (2014): 467–74. http://dx.doi.org/10.3145/epi.2014.sep.03. saracevic, tefko. “digital library evaluation: toward an evolution of concepts.” library trends 49, no. 3 (2000): 350–69. srinivasan, ramesh, robin boast, jonathan furner, and katherine m. becvar. “digital museums and diverse cultural knowledges: moving past the traditional catalog.” the information society 25, no. 4 (2009): 265–78, http://dx.doi.org/10.1080/01972240903028714. xie, hong iris. “users’ evaluation of digital libraries (dls): their uses, their criteria, and their assessment.” information processing and management 44, no. 3 (may 2008): 1346–73, http://dx.doi.org/10.1016/j.ipm.2007.10.003. http://dx.doi.org/10.1016/j.culher.2006.10.007 http://dx.doi.org/10.3145/epi.2014.sep.03 http://dx.doi.org/10.1080/01972240903028714 http://dx.doi.org/10.1016/j.ipm.2007.10.003 introduction microsoft word 9526-16430-5-ce.docx president’s message: reflections on lita’s past and future aimee fifarek information technologies and libraries | september 2016 3 when i reached out to ital editor bob gerrity about my first president’s column, he graciously provided copies of past lita presidents’ columns to get me started. it reminded me once again of the illustrious company i am in, starting with stephen r. salmon, the first president of the information services and automation division, as we were known until 1977. i am proud to be at the head of lita as it begins to celebrate its 50th anniversary year. a half century ago when lita was founded the world was experiencing an era of profound technological change. the us and soviet union were battling to be first in the space race, and an increasing number of world powers were engaging in nuclear testing. while civil rights demonstrations and the fighting in vietnam dominated the news, we were imagining peace via the technologically-driven future depicted in a new tv series called star trek. with tv focused on the stars, we were able to go to the movies and explore the strange new world of inner space in fantastic voyage. technology was poised to enter our daily lives as well, with diebold demonstrating the first atm1 and ralph h. 
baer writing the 4-page paper that would lay the foundation for the video game industry.2 heady times for technology indeed, and the fact that libraries were sufficiently advanced to require an association dedicated to supporting technologists is hardly surprising. by the time of lita’s founding at the 1966 midwinter meeting in chicago, library automation had been in development for over a decade.3 marc was just being invented, with the first tapes from the library of congress scheduled to go to the sixteen pilot libraries later that year. membership in the only organization that existed, the committee on library automation (cola), was restricted to the handful of professionals who either developed or managed existing library systems. but technology was beginning to impact many more librarians than just those rarified few. according to president salmon, “it was clear that large numbers of librarians who didn't meet cola's standards for membership were in need of information on library automation and wanted leadership.”4 the first meeting of our division on july 14, 1966 at the ala annual conference in new york was attended by several hundred librarians interested in information sharing, technology standards, and technology training for library staff. this group created the first mission, vision, and bylaws that set us on a 50-year path of success. lita is well positioned to take the first steps into our next 50 years. thanks to the efforts of last year’s lita board, we are on the verge of adopting a new two-year strategic plan that is designed aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az. president’s message | fifarek doi: 10.6017/ital.v35i3.9526 4 to guide us through the current transitional period. it will be accompanied by a tactical plan that will allow us to document our accomplishments and set the stage for an ongoing culture of continuous planning. also, jenny levine has proven to be extremely capable as she completes her first year as lita executive director. she has just the right combination of ala experience, technology know-how, and calm competence to guide us through the retooling and reimagining that is required to take a middle-aged association into the next phase of its life. the four areas of focus in the new strategic plan will help us to balance our efforts between preserving the strengths of our past and adapting our organization for a successful future. the first area of focus, member engagement, shows that our primary commitment needs to be to lita members. without you, lita would not exist. one of the key efforts is to increase the value of lita for members who are unable to travel to conferences. with travel budgets down and staying low, online member engagement is an area all of ala needs to improve, and who better to lead in this area than lita. the next area, organizational sustainability, is all about keeping the infrastructure of the organization strong, much of which happens in the domain of lita staff. budgeting, quality communication, and strategic planning all live here. the section on education and professional development recognizes the important role that webinars, online courses, online journal, and print publications play in allowing lita members to share their knowledge on both cutting edge and practical topics with the rest of the association and ala in general. 
we are already doing great work here and we need to better support and expand these efforts. the last focus area, advocacy and information policy, represents a future growth area for lita. now that everyone in the library world "does" technology to a certain extent, lita needs to think about how we will differentiate ourselves as outside competencies increase. our advantage is that we have been doing and thinking about technology for much longer than anyone else. with our vast wealth of experience, it's appropriate that we work to become thought leaders and implementers in the information policy realm. in this, as always, we return to where we started: our members. lita has thrived over the last 50 years because of this, our most important resource. lita was founded on the concept of sharing information about technology through conversation, publications, and knowledge creation. we endure because you, the committed, passionate information professionals are willing to share what you know with those who come after. and like our founders, there are always individuals who are willing to take on the mantle of leadership, whether through getting elected to lita board, becoming a committee or interest group chair, serving in key editorial roles for our monographs, journal, and blog, or joining the all-important lita staff. thanks to all of you who make lita’s future happen every day. i am proud to be in your company. information technologies and libraries | september 1016 5 references 1 . alan taylor, “50 years ago: a look back at 1966,” the atlantic photo, march 23, 2016, http://www.theatlantic.com/photo/2016/03/50-years-ago-a-look-back-at-1966/475074/, photo 46. 2. “take me back to august 30, 1966,” http://takemeback.to/30-august-1966#.v8szitlrtaq. 3. “library technology timeline,” http://web.york.cuny.edu/~valero/timeline_reference_citations.htm. 4. stephen r. salmon, “lita’s first 25 years, a brief history,” http://www.ala.org/lita/about/history/1st25years. institutional political and fiscal factors in the development of library automation, 1967-71 allen b. veaner: stanford university, stanford, california. 5 this paper (1) summarizes an investigation into the political and financial factors which inhibited the ready application of computers to individual academic libraries during the period 1967-71, and (2) presents the author's speculations on the future of libraries in a computer dominant society.il> technical aspects of system design were specifically excluded from the investigation. twenty-four institutions were visited and approximately 100 pe1·sons interviewed. substantial future change is envisaged in both the structure and function of the library, if the eme1·ging trend of coalescing libraries and computerized «information processing centers" continues. summary of major factors which inhibited the application of computers to library problems, 1967-71 major factors which inhibited the application of the computer to the library during the period 1967-71 can be categorized under three broad headings: (a) governance, organization, and management of the computer facility; (b) personnel in the computer facility; and (c) deficiencies in the library environment. a. governance, organization, and management of the computer facility 1. 
uncertainty over who was in charge of the computer facility.-this problem was partly attributable to the fact that the goals and objectives of the facility were imprecisely stated or not stated at all often there was no charter, no systematic procedures for establishing priorities, and excessive autonomy by the computer facility. these factors often permitted the facility to operate as a self-directing, self-sustaining entity, responsible to no informed, upper level manager. '~> the paper is based on a clr fellowship report to the council on library resources, inc., for the period january-june 1972. 6 journal of lihra1·y automation vol. 7/1 march 1974 2. effect of high level administrative changes.-in a few instances, the library automation effort was instigated by the president of the institution. he could, in effect, personally direct the allocation of resources. however, whenever a high administrative official leaves, the resulting vacuum is quickly filled by other interests, the atmosphere changes, and his personal program goals dissolve. 3. management inadequacies.-the effects of domination by a technician or special interest group are described below in more detail. although more and more organizations are putting together influential user groups to point the way toward better management, decision-making responsibility and authority continued to be misplaced in a few institutions which vested authority for technical decisions in a committee of deans who were somewhat remote from current trends in computing because of their administrative responsibilities. (in one institution, it was half jokingly stated that a dean in any hard science could be characterized as suffering from a minimum technological time lag of two years.) 4. lack of long-range planning inclusive of attention to community priorities.-few facilities visited had any written long-range plans, either for the acquisition of hardware, the conversion of older programs, or the involvement of users in systems design. ad hoc arrangements were prevalent. 5. system instability.-this was more the rule than the exception, especially in software, operating systems, hardware configuration, and pricing. wherever an academic computing facility was used for library development, the same broken record always seemed to be playing: the facility was always being taken apart and put together again. of course library development was not the only user affected; complaints arose from all users. 6. biased pricing algorithms.-in the academic facility, student and research use were competitive. hence systems were typically geared to distribute computing resources around the clock in some equitable and rational way. for instance, short student jobs were sometimes given a high priority for rapid turnaround, while long, grinding calculation work was pushed off to the evening or night shift by means of variable pricing schedules or algorithms. a pricing algorithm is basically a load leveling device to smooth out the peaks of over-demand and the valleys of under-utilization which would have occurred in the absence of such controls. devising pricing algorithms is by no means a simple task, since many factors must be taken into account: the kinds of machine resources available, their respective costs, the data rates at which they can function, market demand, hardware and software available, and system overhead, to name but a few. library jobs tended to suffer in both batch and on-line processing. 
in the former case, because batch jobs on large data bases took so much institutional political and fiscal factors/veaner 7 time, library work generally could not be done during the prime shift; in the latter case, an on-line library system made substantial demands upon a facility's storage equipment and telecommunications support, and competed with all other on-line users. 7. sense of competition with the library for hard dollars.-this problem, which is related to pricing bias, is detailed further on page 21. 8. scheduling problems.-many of the institutions visited had systems or charts for scheduling production, development, and maintenance. but conversations with system users often verified that schedules were either not met or had been unrealistically established. this was especially the case with development work b. personnel in the computer facility 1. selection and evaluation.-inasmuch as the library often did not have the competence to judge personnel nor the ability to generate meaningful specifications, there was generally very little protection from incompetence in this area. 2. elitism: the notion that the masters of the computer are inherently superior to and have better judgment than computer customers.-elitism is a paradox: it can be positive or negative-positive when the best brains produce software designs of true genius with respect to function, performance, economy, and reliability-but in its negative manifestation, reminiscent of the girl with the curl in the middle of her forehead: "when she was good, she was very, very good; when she was bad, she was horrid." during the boom years when computer facilities were expanding faster than the supply of competent staff, elitism seemed fairly common in the computer center. the excitement of rapid development, the seemingly unlimited intellectual challenge presented by the powerful apparatus, and high strung dispositions sometimes caused tempers to flare or immaturity to sustain itself beyond a reasonable time. strange hours, strange habits, bizarre behavior, all seemed to conspire against ordered and rational development. fortunately, as the field matures, the negative aspects of elitism are dying; managers now can concentrate on staff development work to turn top intellectual talents toward productive achievement. 3. disinterest.-this factor may be allied to elitism. in some instances, the computer center's staff gave considerable attention to the library during the period immediately following machine installation, when utilization was low. later, the staff's keen interest became "dulled" at the thought of operating a production system. "more interesting jobs" were .challenging the programmers and beginning to fill up the machine. 4. fear of the unknown big user.-it was recognized early that the library could be among the computer facility's largest potential customers, perhaps the largest. in some facilities, this recognition may have induced 8 journal of library automation vol. 7/1 march 1974 a fear of being taken over or overwhelmed by the user, who would then be in a position to dominate and dictate the direction of further development and operations. 5. fears of an unknown production environment-simply expressed, a production environment removes much of the stimulus for creative approaches to problem solving unless continuous development is maintained for new systems and new applications. 
many of the best programmers did not wish to lose their freedom to innovate and actively resisted participation in establishment of a production environment, with its concomitant requirement of "dull" maintenance support work. c. deficiencies within the library environment 1. failure to understand in full detail the current manual system.even where the manual system was understood, there was often an inability to describe it in the clear, unambiguous style essential to system design work. these deficiencies were further compounded by the unwillingness of some librarians to learn how to communicate adequately with computer personnel. 2. inability to communicate design specification.-many did not understand how to put together a specification document; particularly they did not know how to account exhaustively for all possible cases or alternatives. librarians were unaccustomed to defining their data processing requirements quantitatively or with precision-both absolutely indispensable to the computer environment. also, as much as the computer facility changed its software environment, many library development efforts were constantly changing their system requirements-a condition which made it all but impossible to program efficiently. 3. failure to understand the development process.-development is a new phenomenon in libraries. most librarians were not educated to comprehend development as an iterative process, characterized by experimentation, error, feedback, and corrective measures. accustomed to the relative stability of long-established procedures-some of which had stood for generations, even centuries-some librarians were baffled by the rapidly changing new technology, others showed impatience and a low tolerance for frustration. many expected development projects to resemble turnkey operations, and the failure of the process to accommodate these expectations produced disappointment and an inability to cope with the computer environment. 4. failure to recognize the computer as a finite resource.-both librarians and early facility managers seemed to look upon the computer as an inexhaustible resource, the former through lack of sophistication and the latter apparently through myopia or possibly ambition. some managers must have told their users that there was "no way" their equipment could be saturated in the foreseeable future. apparently some library users were naive enough to believe. institutional political and fiscal factorsjveaner 9 5. excessive or unrealistic performance expectations.-few library users understood the relationship between the system specifications and functional results, and fewer still understood the significance of performance specifications. the situation was not assisted by notions of "instantaneous" retrieval pushed by salesmen or the popular press. (the writer recalls vividly how one salesman told him the library could have a crt device for $1 a day! and indeed, the device itself was $1 per day if one cared to do without the keyboard, without cables, installation, control units, teleprocessing overhead, a computer, software, etc.) 6. lack of an established tradition of research and development ( r & d) and the lack of venture capital in the library community.the challenge of the computer may have been largely responsible for activating research and development as a serious and continuous effort in librarianship. 
inexperience in raising and managing funds for r & d, as well as a general lack of knowledge of computer cost factors inhibited progress or tended to make the development effort inefficient and full of surprises. 7. human problems.-some libraries having prior experience with small batch systems underestimated the scale of effort for contributing to the design of the large system, selling it to the users, installing it, and training the users. 8. insufficient support from top management.-in some instances, library management did not accord the automation effort the kind and degree of support essential to success. in particular, some librarians seemed to feel that automation was a temporary affair, definitely of less importance and significance than current manual operations. some did not recognize the sacrifices in regular production that would be necessary and some did not appreciate the continuing nature of development work. background two important prerequisites to progress in library automation were money and technical readiness. the government supplied the first, industry the second. the announcement by ibm in 1964 of its system 360 occurred at a fortunate time for the american library community. president johnson's administration had launched enormous programs in support of education. the library services and construction act was soon to channel millions of dollars into library plant expansion and, perhaps more significantly, the higher education act of 1965 was to sponsor research, which ui1til then had only the support of limited funds from the council on library resources, inc., and the national science foundation. (support from the national science foundation was largely, although not exclusively; directed toward discipline-oriented information services; one of the largest nsf grants went to the university of chicago library.) it was the right time to invest in library automation. important milestones were already behind the library community: the national library 10 journal of lihm1'y automation vol. 7/1 march 1974 of medicine's medlars program was well underway, the airlie conference on library automation had been held and its report published ("the white book"), and the library of congress automation feasibility study ("the red book") had appeared. 1 • 2 the first marc format was being tested in the field. in computer technology, third generation equipment represented major increases in computing power, processing speed, reliability, and capacity to store data in machine-readable form. ibm's sales force was successful beyond imagination in getting system 360's installed in large universities, as well as in business and government. ibm promised a new kind of software-time-sharing-which would virtually eliminate the tremendous mismatch of data processing speed between the human being and the machine. the new methods of spreading computer power through teleprocessing and time-sharing promised to make the computer at least competitive with and possibly an improvement over "antiquated" manual systems of providing rapid access to large and complex data files. within this relatively unknown environment, universities and libraries entered the software development process, which if successful, could enable them to catch up where they had been hopelessly falling behind. circulation, book purchasing, and technical processing loads in many libraries seemed to double and triple overnight as the country's schools and their programs grew to accommodate expanding enrollments. 
manual systems that had been reasonably workable and responsive in environments characterized by slow growth demonstrated significant and disturbing defects -the inability to deal with peak loads, or rapidly changing loads. the same effects were felt in administrative and academic computing: a bigger and more complex payroll, more students to register, construction contracts to monitor, more research grants which demanded bigger computers, and so on. these were truly boom years. but in the academic community there was still another force developing which was ultimately to be of even greater significance for libraries than the inconveniences of being unable to handle the housekeeping load: a dramatic rise in the expectations of patrons, especially in the academic community, where computers already abounded. libraries had come to be felt by some as strongholds of conservatism and expensive luxuries; librarians were faulted for not "putting the card catalog onto magnetic tape," for not implementing automated circulation systems, or otherwise failing to take advantage of new and powerful data processing techniques. the libraries were caught amidst a variety of sometimes conflicting, sometimes complementary factors: the visionary ignorance of the computer salesman, the senior academic officer possessed by the computer dybbuk, a lack of sympathy or understanding among some computer center managers, a lack of appreciation by students and faculty of the complexity of identifying, procuring, and cataloging unique copies of what must be the least standardized product known to man, and their own lukeinstitutional political and fiscal facto1'sjveaner 11 warm commitment to undertake the hard work required to learn how to use the computer resource. anxieties about jog displacement caused some library staff to look upon computers with trepidation, thus further placing the librarian in a defensive position. while these forces were taking shape, the library's bibliographic activities continued to be seriously hampered by inadequate international bibliographic control.~~ some essential computer hardware, especially the programmable crt terminal with an adequate character set, was either nonexistent or totally unsuitable to library applications. in this institutional context librarians entered the world of computers and data processing. t purpose it is the purpose of this report to examine in some detail how internal institutional factors affected the development of computerized bibliographic systems, and especially to consider nontechnical, negative factors: what slowed down or inhibited the applications of computers in librarianship? this report is not concerned with the merits or demerits of specific systems or their features; indeed, the investigator did not inquire about system specifications. major questions centered about the factors which fostered or hindered the development p1'ocess, regardless of the merit of a project or system. scope investigation was limited almost solely to those institutions considered likely to have large scale, in-house development projects using third generation computer equipment. the majority of places visited were large academic libraries. the time span included in the survey begins approximately in 1967 and ends in 1971. a total of twenty-four institutions was visited and some 100 persons interviewed; a list of the institutions visited is in appendix 1. 
methodology site visits and i nte1'views arrangements were made to visit four types of individuals: the director of libraries, the head of the library's system development department, the director of the computation center, and whatever principal institutional officer was managerially and/ or financially responsible for campus computing. considerable variation was found in the type of person assigned this last responsibility-it could be the provost, the vice-president u implementation of the library of congress' shared cataloging program under title ii6f the higher education act of 1965 was soon to alter this situation dramatically. t the painful trauma libraries and librarians experienced in getting into computers is too well documented to summarize here. perhaps the best summary has been done by stuart-stubbs. a 12 ] oumal of library automation vol. 7/1 march 197 4 for academic affairs, or the vice-president for business/ financial affairs. choice of the major institutional official to be interviewed was often determined by the pattern of computing in a particular institution, or the facility which supported the development effort. at first the investigator attempted to utilize a structured questionnaire for interviewing. this very quickly broke down, as the interviewees were generally voluble and ranged widely over many related topics or items which they would have been asked about later. accordingly, after the first few interviews, the formal questionnaire approach was dropped and a simple checklist of major questions kept on a few cards to make sure that each major issue had been addressed. every interviewee received the investigator graciously and none was unwilling to talk; indeed, if anything the opposite was the case-most persons seemed to be eagerly waiting for an opportunity to air their views. visits and interviews occurred during the period january-april1972. literature searches searching the literature on this topic has been extremely frustrating. in the literature of computer science and management, there are many articles on pricing algorithms, machine resource allocation schemes, and issues of managing the computer facility, but none specific to the topic of this report. besides scanning professional literature, the author has regularly conducted for the past year monthly computer searches via the ucla center for information service's sdi service. abstracts and citations were searched in research in education (rie) and current index to journals in education (cije). with respect to problems faced by the library in acquiring computer services, the results have been nil in both cases. the author reluctantly concludes that no major recent studies have yet been published in this sensitive area, although two papers by canadian librarians are very helpful. 3• 4 the national academy of sciences/computer science and engineering board's information systems panel appears to have come closest to identifying the issues in its report, library and information technology: a national systems challenge. still, the comments in that report are highly generalized and do not grapple with specifics. 5 structure of educational computing most of the visited institutions maintained separate facilities for administrative and academic computing, while a few ran combined facilities or were in the throes of consolidating their facilities. the differences between administrative and academic computing have historical roots deeply embedded in institutional soil. 
administrative computing is usually an outgrowth of punched card installations first set up for payroll and financial reporting. academic computing, on the other hand, has its origins within the institution's instructional and research programs. typically it has been supported by external grants and contracts and has been oriented toward institutional political and fiscal facto1'sjveaner 13 the "hard" sciences. until the recent dropoff in federal support of higher education, academic computing was a money maker (through the overhead on grants and contracts) while administrative computing was a money spender. administrative computing typically very little computational work is done in administrative applications; most of the computer work is associated with input, update, reading records, writing records, and printing reports. except for the pay.roll application, the consumer group has tended to be somewhat smaller and less transient than the academic group. but to university administrators the computer could do much more than write checks and pay bills. many significant administrative applications had already been installed on second generation equipment: faculty-staff directories, inventories of space, supplies, and equipment, records of grades, course consumption reports, etc. all these tended to expand the user group, increasing competition for the resource. the advent of third generation equipment made it attractive for administrators to think about applications centered around the so-called "integrated data base." this led to a demand for further new services for the registrar, fund raising and gift solicitation, student services, purchasing, etc. conventional administrative computing-particularly that part of it which generated regular reports-lent itself naturally to batch processing, and indeed many of the early computer installations actually continued established punched card operations, merely using the computer as a faster calculator and printer. the administrative computing shop is typically characterized by (or hopes to be characterized by) great systems stability and dependability, a cautious and measured rate of innovation, and in the opinion of some academic computing types, not much imagination. file integrity, backup and recovery, and timely delivery of its products are prime goals in an administrative computing system. the administrative computing facility very much resembles the library in two important aspects: ( 1) it is a production system; and ( 2) it is almost entirely an overhead function, i.e., there is little or no attempt at cost recovery from system users for its services. academic computing academic computing is a much different world. it serves a large, vociferous, .influential, and mostly technological user community, many of whom ~~e not only competent in programming, but more importantly, possess ready cash. but this is changing: as academic computing expands to service users in the humanities and social sciences rather than mainly those in the "hard" sciences, the user group is growing and it will probably not be long before it embraces the total academic community. in hard science applications, the academic facility typically performs an 14 journal of library automation vol. 7/1 march 1974 enormous amount of computing ("number crunching") with a relatively small amount of output. 
system backup and recovery is important to the academic computing facility, but file integrity responsibility may often be assigned to the user since such a center sometimes does not maintain the data base but merely provides a service for manipulating it. the main components of academic use are departmentor discipline-oriented research and student instruction, the latter being particularly strong if there is a well-established computer science department. software development has customarily played a major role in academic computing and the usual practice was to actively seek out imaginative systems programmers for whom change and system improvement are food and drink. consequently, instability, both in hardware and software, has been more the rule than the exception in the recent past, although as the management of computer facilities matures, this too is changing. currenttrendsandstatus it is obvious from the above that administrative and academic computing have been characterized by diametrically opposed machine and managerial requirements. where they have been combined in the same facility, tensions have prevailed and neither user was happy. in a few instances known to the writer, such combinations have been abortive and a reversion made to divided facilities. but as computing matures it is becoming evident that operational stability is needed for all types of computing, not just administrative computing. additionally, the financial crises now prevalent in institutions of higher education have brought more realistic attitudes to the fore in understanding just what kinds of facilities can be afforded, and how they should be managed. additionally, the economies of scale, the increasing flexibility of hardware and growing sophistication of software are now combining to form an environment which can better satisfy all potential users of computers. there are clear indications that a unified, well-managed shop with competent staff might now economically and efficiently serve a variety of applications, including administrative and academic-on the same facility. however, this is a developing trend and does not correspond with what the writer actually observed during his visits. in situ he saw much evidence that anthony oettinger's observations of some years ago were still valid: ... routine scheduled administrative work and unpredictable experimental work coexist only very uneasily at best, and quite often to the serious detriment of both. where the demands of administrative data processing and education require the same facilities at precisely the same time, the argument is invariably won by whoever pays the bills. finances permitting, the loser sets up an independent installation. 6 indeed, it would not be unreasonable to conclude from the interviews that in most places visited, computing during the period 1967-71 was in a institutional political and fiscal factorsjveaner 15 state of disarray. there is abundant and disagreeable evidence of technical incompetence, lack of management ability, ill spent money, communication failures, and naive and disillusioned users. but it would be a mistake to conclude that the failures in library automation are attributable primarily to computer-oriented personnel or hardware problems-librarians in their own way displayed many of these same failures. it would be another mistake to dwell excessively on the high failure rates observed. 
in any complex technological endeavor, the rate of failure is dramatically high at the beginning; there is ample evidence here from the aircraft and space industries. indeed, the likelihood of a first success in anything complex-library automation is complex, as we have learned the hard way-is practically nil. organization and management problems: the academic computing environment early academic computing facilities were typically run by faculty members in engineering, applied mathematics, computer science, or related fields. this arrangement was satisfactory when computers were small, relatively primitive, and the user community was confined to those few people who could program in machine language or assembly language. as equipment became bigger and more powerful, and as higher level programming languages developed, more and more people learned programming. correspondingly, the task of managing the computer facility grew rapidly in size and scope. the budget of a large computer center in a modern university can easily run to several millions of dollars annually. the manager must balance seemingly innumerable, complex forces: personnel, management, government and vendor relationships, demands from vocal users, establishing priorities, the challenge of hardware advances, marketing, pricing services, balancing the budget, etc. it soon became clear that few faculty members possessed either the multifaceted talents or the experience required for effective management. as the center's budget grew, and particularly as the shift was made from second to third generation equipment, th,e faculty member tended to be replaced by the technician as manager. unf01tunately for many of the facility users, the technician tended to promote his own technical interests in software development or hardware utilization. in some instances, the user community felt that the facility was being run more for the benefit of the staff than for the users. the technician-manager often looked at the computer as his personal machine, much as some faculty members had earlier felt the computer to be their own private preserve. the vice-president of one university expressed the view that the technician-manager doesn't really have an institutional loyalty tied to the goals and objectives of the academic programs; he is more loyal to the machine or the software. in a school with a long history of computer utilization, there had been no tech16 journal of library automation vol. 7/1 march 1974 nician in charge of the computer facility for a decade. yet in a school not too far away, an officer indicated that his institution had "made the same mistake twice in a row" by hiring a technician to manage the computer facility. the technician-manager represents a highly personalized management style, one in which goodwill, friendship, or personal interest is the key to effective service. it can hardly represent an arrangement for the successful development and implementation of computerized bibliographic systems. in the third and current organization and management phase of academic computer facilities, the professional manager is in charge. schools are now beginning to see the need to develop formal charters for their computing centers, quasi-legal instruments which will lay out their specific responsibilities as service agencies. 
a professionally managed service agency eliminates one of the most irritating elements in the allocation of computer resources: personal judgment by the faculty or technician-manager as to the worth of a project, which was so prevalent during earlier management stages. at the time of the interviews, very few institutions actually had such charters, but their need was being recognized. it is now universally accepted that the computer center can no longer be the plaything of the faculty nor the expensive toy of the technician. organization and management: the administrative environment because of its historical development the administrative computing facility was usually first run by someone with an accounting or financial background. (academic computing persons occasionally put disparaging labels on such people as "edp-types" or characterized them as having a "punched card mentality.") the nature of the workload virtually meant that the administrative shop would be set up mainly for batch processing and any data base services provided for other users would involve printed lists. such facilities were found satisfactory by a number of libraries even for applications such as circulation, which produced gigantic lists-probably because it represented a vast improvement over an antiquated, poorly designed, or overloaded manual system. however, there was at least one major technical consideration which had direct political and financial implications for the library which turned to the administrative computing facility for its computer support. this was the library's need to support and manipulate a data base with nearly every data element of variable length-a requirement that was practically nonexistent in administrative computing. some facilities were unable or unwilling to meet this requirement. the move from tape-oriented systems to mixed disc and tape systems on third generation equipment necessitated an upgrading of programming staff, and brought into the administrative shop the same clearcut distinction between system programmers and application programmers which had institutional political and fiscal factorsjveaner 17 emerged earlier in the academic shop. this change in turn demanded appointment of more knowledgeable facility managers, many of whom were drawn from business and industry rather than the ranks of in-house accounting staff. this transitional period was characterized by two enormously challenging parallel efforts: the conversion of existing programs to run on third generation equipment and the development of new applications. to an extent these responsibilities were competitive, and from this viewpoint it was certainly not a propitious time to embark upon anything as complex as bibliographic data processing. yet numerous workable systems emerged for circulation, book catalogs, ordering and accounting systems, and serials lists. these were not accomplished without anguish as the library did not control the machine resources and often did not control the human resources -the facility manager tended to make his pliority decisions to please his boss who was certainly not the librarian. besides, no application could really take precedence over payroll or accounting in the administrative shop. to the librarian it was more like borrowing another person's car than renting or owning a car: when the resource was urgently needed someone else had first call. 
organization and management: the library automation endeavor

a detailed study of this subject is not within the scope of this investigation. however, it will be useful to note that the organization and management of library automation activities demonstrate development phases which closely parallel those in the computing environment:

1. a stage in which the user himself (cf. accountant or faculty member) undertakes to perform the activity. in this stage individual librarians learned programming, did their own design work, and wrote, debugged, and ran programs themselves. (this was possible in the "open shop" environment prevalent in many early computer facilities.)

2. a stage in which the technician-in this case a librarian with appropriate public service expertise (for circulation applications) or technical processing knowledge (for acquisitions, cataloging, or serials)-took charge of an organized development effort, hired his own programmers and systems analysts, and negotiated directly with the computer facility.*

3. a stage in which the professional system development manager is hired to oversee the total effort. such a person is sometimes drawn from business or industry, is a seasoned project manager, and has broad knowledge of computers, especially in the area of costs. such an appointment is more common in the large library, the consortium, or the network.

*the technical person need not be a librarian. northwestern university represents a significant instance where a faculty member in computer sciences and electrical engineering undertook the development effort.

human problems associated with rapid change in institutions

some institutions, particularly in their administrative functions, became embroiled in a seemingly endless round of internal psycho-social problems which did not make the environment conducive to problem solving. the move to computerizing manually oriented functions, whether in the library or in other parts of an institution, was found to be extremely threatening to established departmental structures. it was consistently reported that the political and emotional aspects of system conversion, both in the library and elsewhere, were much more aggravating than the technical aspects. the problem simply showed up first outside the library because applications of computers occurred there earlier. departments were sometimes unwilling to give up data for computer manipulation for fear that computerization would take jobs away. this phenomenon is not unknown in librarianship, where some professionals take an extremely proprietary attitude toward bibliographic data. now pressures from governments, legislatures, and the academic community at large are gradually establishing the concept that some categories of data are corporate, and do not belong to a specific individual or department, or even to an institution, but should be shared through networking or other mechanisms. but the rapidity of microsocial change and its upsetting emotional consequences caught some library leaders unawares. a considerable reeducational process for both management and labor is required to smooth the transition to the new view.

motivation problems

it is difficult to elicit sound comment concerning motivation (or lack thereof) as a deterrent to progress in library automation. it is an emotional subject, and neither the librarians nor the programmers come out "clean."
the prima donna computer programmer, much in evidence in the early days of computer center development, is very much on the wane these days. like the spoiled child, the prima donna programmer could only exist where personal interests were permitted to take precedence over social goals-or perhaps where institutional goals for the computer facility had not been clearly articulated or had not yet come into focus. some prima donnas, partly out of ignorance, partly through a stereotyped image of library activities, were inclined to dismiss library applications disdainfully as "trivial" and to demand "really challenging" assignments. but the librarians had their prima donnas, too. some had learned enough programming to be a little dangerous, and they then felt like peers who could tell the computer center not only what to do but how to do it. at first, few members of the library staff were willing to learn how to articulate their specifications and requirements to the management of a computer facility. most librarians expected some kind of miraculous magic, akin to a wave of the hand, to bring a computer system to reality. very few understood the heuristic nature of development. so there were barriers of status, depth of knowledge, and language-any one of which would have sufficed to kill the development of the good motivation essential to breaking new ground. in the wrong combination they could present an overwhelming conspiracy, for their mutual interaction could only produce polarization and intransigence.

the library and the computer facility
the role of similarities and differences

for a long time the library has been the "heart of the university." until the advent of the computer, little could challenge the supremacy of the library as the principal resource of an educational institution. even the faculty could be put into second place, since it was difficult to attract high quality faculty without good library resources, and the faculty were to a greater degree transient, while the library was considered "permanent," an investment for all time. the computer represents a new and challenging force in the arena where shrinking resources are allocated among competing academic users. both the library and the computer facility have experienced exceedingly rapid growth in the recent past, concurrent with an expanded demand for services which can easily outstrip available resources. among some of the larger academic libraries, the staff of the computer center may be half or more than half that of the library. important differences between the two services have recently come into focus. first, most of the services and benefits of the library are intangible. because of this it has always been difficult to measure the cost benefit of the library as an institution, and it is well known that counts of the number of people entering the door or the number of circulations are far from true measures of the library's functional success. the computer, on the other hand, is a relentless accounting engine; computer facilities can produce endless statistics on the number of jobs run, lines printed, terminal hours provided to users, turnaround time, cards punched, etc. the computer's output is extremely tangible and can be more directly and easily related to academic achievement than can library use. a second major difference lies in apparently different financial roles within the institution.
in most organizations, the library is run as an overhead expense, without any attempt to charge back to users or departments the proportional costs of utilization. like air, the library resource is there for anyone to use as much or as little as he pleases; the library gets a "free ride," but the computer center is expected to pay its own way. this dichotomy is often explicitly designated as the "library-bookstore" duo model. furthermore, since the library does not generate much in the way of research grants and contracts, it is looked upon as a consumer rather than a producer of financial resources. in fact, those who support computing in preference to books point to the fact that overhead income generated by computer-related research grants and contracts is shared with the library, which may have done little to contribute toward the acquisition of such income! in some institutions the situation has become critical indeed because of the recent substantial reductions in federal support. much political in-fighting has been necessary to maintain current levels of computer activity, and not all such efforts have been successful. some institutions have been forced to cut back on computing power, merge facilities, or combine resources with other institutions. several years ago when the national science foundation imposed an expenditure ceiling on grants, associated overhead income was correspondingly reduced. one computer center director was reported to have suggested that the effect of this overhead cut could be nullified by a simple, internal reallocation of funds, say by taking the needed amount from the budget of another agency on campus of less significance to researchers and scientists, such as the library. this attitude is clear evidence that the library has lost its sacred cow status as a "good thing" on the campus. it too must justify itself. close examination of the library and the computer facility gives clear evidence that both deal with the same commodity: information. within the recent past several computer facilities have changed their designations to "information processing" facilities or centers. several institutions, notably the university of pittsburgh and columbia university, have coalesced the library and the computer center organizationally or have both units reporting to a vice-president for information services. the recognition and furtherance of this natural linkage may do much to reduce the potentially destructive competition which can characterize the relationship between the two units. there are remarkable growth parallels between the two facilities-the library acquiring and processing more and more books in response to expanded publication patterns, more users, and the growth of new disciplines and interdisciplinary research, while the computation facility moves rapidly from one generation of software and hardware to the next. the expansion of both organizations produces seemingly equal capital-intensive and labor-intensive pressures: library processing staff doubles and triples, while the newly acquired books demand more in the way of housing, whether of the traditional library type or warehouse space; the computer center moves toward more sophisticated hardware, especially terminals and communications, which need to be supported by greater numbers of still more highly qualified systems programmers, communication experts, and user services staff.
both services have a marketing problem; but the computation facility, being relatively more dynamic and more interactive (because of terminal services), can be more sensitive and responsive, financially and technically, to its clientele than can the library. only now, with the emphasis upon computerized bibliographic networking, has the library as an institution begun to approach the marketing strategies and the effective user feedback already well developed in computation facilities.

service capacity, resource utilization and sharing

differences both in service capacity and resource utilization represent a key political issue affecting the future of both libraries and computer facilities. in major universities, the budget for the computer facility is now not far from the library budget in size, and in a few institutions it exceeds the library budget. with the diminution of external grants and contracts, the two organizations compete for the same hard dollars. this economic competition can either drive the two facilities apart, dividing the campus, or cause them to coalesce-as has been the case at columbia and pittsburgh. despite its high operating costs, from the viewpoint of resource utilization the well-managed computer facility can almost always point to an excellent record. (in fact, if a computer resource is not much used and isn't "carrying its weight," it can be disposed of, by sale if purchased, or by cancellation if leased.) no matter how well managed, the research library can never make this claim in the context of its current materials and processing expenditures, much of which by definition is aimed at filling future needs. the library and its patrons cannot "use" all the resources at their command; the library could not even service all the patrons should they demand the use of "all" the resources. in contrast, the computer facility (particularly large on-line systems with interactive capabilities) can be very efficiently utilized even when demand is heavy. thus, to the "objective" eye, it would appear that in the computer facility both the institution and the individual patron get more value for their dollar than they do in the library, which in comparison resembles a bottomless financial pit. one may counter that apples and oranges are being compared, but the institution which pays their bills nevertheless makes the comparison.

flexibility, inflexibility, and the future

besides better resource utilization, the computer facility offers the patron far greater flexibility of resource use than can the library. there is no way a large collection of books on the celtic language or the military history of the austro-hungarian empire can help a professor of structural engineering, a student of marine biology, or a researcher in modern urban problems. even the books these people actually need and use cannot easily assist others, as relevant data in them is not indexed or readily available for computer manipulation. the point is that, unlike the library, the computer is a highly elastic universal tool, one that each user can temporarily shape to his own need, replicate the shape later, or, if he wishes, change the shape at will. the traditional library has no such flexibility; its main bibliographic retrieval
device-the card catalog-is especially noted for its high maintenance cost, its limited ability to respond to complex queries, and a general fixity of organization and structure that is ever at variance with changing patron expectations and interests. (if computers can be flexible, why can't the library?) there is much in the library that is not used because it is inaccessible-locked up in an inflexible retrieval tool or unavailable because the state of the art (both in bibliography and computer science) or staffing does not yet permit far deeper access via "librarian-negotiators" and patrons at terminals interacting with large and deeply indexed data bases. as long as major portions of the library budget and staff are devoted to housekeeping and internal technical processing, the library will look less good, less "cost-beneficial" to the academic community than does the computer facility. but there is growing recognition that both institutions deal with information processing which covers a wide spectrum of time. true, the storage formats differ, but this may be a temporary phenomenon. as progress is made on improved, less expensive conversion of data from analog to digital form and vice versa, the day may arrive when the library and the computer facility are indistinguishable.

will the library become an information utility?

computer utilities are an important developing trend, and it is sometimes suggested that library services could be delivered within the utility model. utilities and libraries as they exist today have very different characteristics. a utility can be defined as a system providing a relatively undifferentiated but tangible service to a mass consumer group, with use charges set in accordance with a pricing structure designed for load leveling (i.e., optimization of resource utilization). typically, a utility both wholesales and retails its services. within this definition, a conventional library cannot be construed as a utility: its services are generally intangible and very highly differentiated-indeed, chiefly unique, for rarely is one book "just as good as another"; its clientele is not the general public but a highly select group which itself contains highly unequal concentrations of users; almost no libraries impose user charges in the interest of cost recovery; and, practically speaking, there is only one united states wholesaler of bibliographic data-the library of congress. this situation is changing in several respects. first, the establishment of practical, computerized bibliographic networks has introduced among participating institutions cost sharing schemes closely resembling the load leveling or rate averaging algorithms prevalent among utilities. (an example of rate averaging is the practice of the ohio college library center to lump total telecommunication cost and prorate it into the membership fee, in effect creating a distance-independent tariff; this arrangement does not hold outside of ohio.) these new ideas have been readily accepted by libraries and could even become the basis for balancing more equitably the costs of interlibrary loan traffic. second, specialized "information centers" have evolved in certain fields, partially as a consequence of lack of responsiveness (or slow turnaround) by conventional library services, and "for profit" commercial services have been set up. examples of the latter include the european s'il vous plait and its american counterpart, f.i.n.d.
(often such commercial services do not hire librarians, as they are considered too tradition bound.) a third force which is rather inchoate at the moment may soon take on a recognizable shape: facilities management. under such a scheme, the complete management responsibility for all or part of a function is contracted to an outside vendor. for instance, it is conceivable that some libraries in the near future may have no in-house staff for technical processing. services would be purchased totally from a vendor or obtained from his resident staff, much as computer centers buy specialized expertise through the "resident s.e." (systems engineer). the gradual buildup of computerized bibliographic services offers an excellent opportunity for commercial ventures into turnkey bibliographic operations for libraries. this would bring libraries one step closer to the utility concept, as they buy a complete package from a wholesaler who probably services many customers. the traditional library service concepts we know today may undergo drastic changes in financing and in methods of delivery. beyond the commercialized or contractual arrangement for technical processing, which is only one component of the total information flow, lie unknown territory and little explored concepts: use charges for library services (the bookstore model), the "for profit" library, the complete information delivery system integrated with computers, communication satellites, and cable tv. if the computer-based library is to become an information utility, a major accommodation will be needed in the financing arrangements, perhaps in the form of user charges-for no utility can survive without regulated demand. an unlimited, uncontrolled demand for any product or service is untenable, for without regulation (i.e., pricing) demand rapidly outruns supply. in the traditional library, where theoretically every user has the "right" to unlimited demand, this never happens, for several reasons: (1) not all potential patrons elect to use the resource; (2) the users must usually go to the library to access the bibliographic apparatus and obtain the materials held by the library; (3) every item in a library collection does not have an equal probability of use; and (4) there is a finite rate at which human beings can "use the resource," i.e., people can read just so fast. none of these self-limiting factors applies to, say, electric power, radio and tv broadcasting, telecommunication services, or similar utilities. the library picture could become quite different if these limitations were removed or mitigated. suppose the patron could access the bibliographic apparatus through his home computer terminal attached to his tv in the "wired city." further suppose that he could receive selected, short items (where time of delivery is important to him) directly at his tv set, or longer items having less time value as microforms or hard copy delivered by mail or private delivery systems. given such possibilities, the collecting policies of individual "libraries" (if they continue to be called by that name) might well change drastically, so that nationally, collections might become much more standardized or "homogenized"-increasing the likelihood that individual holdings will have more nearly equal use probabilities.
this would imply the need for one or more national and/or regional centers for servicing the less used materials, along with appropriate delivery systems and pricing schedules.

conclusion

work on library automation has proceeded during a highly developmental period in the history of computing. in this sense, librarianship has suffered no worse than any other computer application, nearly all of which have gone through traumas of design, installation, redesign, reprogramming, etc. the main distinction is that in many of these other applications-government, military, industrial, or commercial-there have been far greater resources available to the task and vastly greater experience with the development process. despite the obstacles, progress in computerized bibliographic work has been far more significant and has achieved far more than many librarians-especially those unaccustomed to the development cycle-can appreciate. the snowballing growth of practical consortia and networks, along with the successful installation and operation of several on-line bibliographic systems, has already changed the face of librarianship in a very short time. like the breaking of the sonic barrier, once the initial difficulty is overcome, further progress is easier. the computer has successfully achieved what librarians have until recently only paid lip service to: cooperation and wide sharing of an expensive and large resource. though the linear growth model in libraries has been dead for some time, the recognition of this fact has not yet penetrated the entire profession. if libraries are to survive as viable institutions throughout this century and into the next, their leaders must solve the financial, space, and human communication problems inherent in growth. local autonomy, local self-sufficiency, and the "freedom" to avoid, evade, and even undermine national standards now show up as expensive and dangerous luxuries-potentially self-destructive. only through the computer will true library cooperation be possible; only the development of regional and national bibliographic networks, with the assistance of substantial federal funding, can really "save" the library. the computer is actually the library's life insurance and blood plasma. a failure to respond to the challenge of the computer could be fatal, for it is increasingly apparent that patrons growing up in the computer era will not patiently interact with library systems geared to nineteenth-century methods. nothing in the educational system exists to force people to use a given resource; people use the resources which are effective, responsive, and economical. if the computer is a better performer than the library, patrons will go to the computer. this will be particularly the case as computer services become broader in coverage and simpler to use, and as unit prices continue to decline. despite the serious and irritating problems associated with learning to use the computer, librarians must continue aggressively to support computer applications; indeed, library leaders can impart no more important message than this to their community leaders.

acknowledgments

i wish to thank the following persons for their support: dr. e. howard brooks, who was vice-provost for academic affairs in 1971, and david c. weber, director of libraries, respectively, stanford university, for granting the leave of absence which enabled me to undertake this project.
i acknowledge with thanks the contributions of the following persons who reviewed early drafts of the paper, in many cases making valuable suggestions and in other instances helping me ward off errors: mrs. henriette d. avram, head, marc development office, library of congress; hank epstein, director of project ballots and associate director for library and administrative computing, stanford center for information processing; frederick g. kilgour, executive director, ohio college library center; peter simmons, professor of library science, university of british columbia; carl m. spaulding, program officer, council on library resources, inc.; david c. weber, director of libraries, stanford university.

references

1. barbara evans markuson, ed., libraries and automation; conference on libraries and automation, warrenton, va., 1963 (washington, d.c.: library of congress, 1964).
2. u.s. library of congress, automation and the library of congress; a survey sponsored by the council on library resources, inc. (washington, d.c.: library of congress, 1963).
3. basil stuart-stubbs, "trial by computer: a punched card parable for library administrators," library journal 92:4471-74 (15 dec. 1967).
4. dan mather, "data processing in an academic library: some conclusions and observations," pnla quarterly 32:4-21 (july 1968).
5. libraries and information technology: a national systems challenge; a report to the council on library resources, inc., by the information systems panel, computer science and engineering board (washington: national academy of sciences, 1972).
6. anthony oettinger, run, computer, run (cambridge, mass.: harvard university press, 1969), p.196. (these same comments were cited in allen b. veaner's earlier article, "major decision points in library automation," college & research libraries :299-312.)

appendix 1
list of institutions visited

university of alberta
university of british columbia
university of chicago
cleveland public library
the college bibliocentre, ontario
university of colorado
columbia university
cornell university
harvard university
university of illinois
indiana university
massachusetts institute of technology
university of michigan
new york public library
northwestern university
ohio college library center
university of pennsylvania
pennsylvania state university
university of pittsburgh
purdue university
simon fraser university
syracuse university
university of toronto
yale university

subject access to a data base of library holdings

alice s. clark: assistant dean for readers' services, university of new mexico general library, albuquerque. at the time this research was undertaken, the author was head of the undergraduate libraries at ohio state university.

as more academic and public libraries have some form of bibliographic description of their complete collection available in machine-readable form, public service librarians are devising ways to use the information for better retrieval. research at the ohio state university tested user response to paper and com output from selected areas of the shelflist. results indicated users at remote locations found such lists helpful, with some indication that paper printout was more popular than microfiche.
while many of the computer applications in special libraries were designed to improve subject access to the collections, the systems adopted in academic and public libraries have often been those which would handle various file operations and improve control of circulation or technical processing functions. once some of the data describing the items in the collection became available in machine-readable form, reference librarians have been tempted to find ways to use it for subject retrieval. in november 1970, the ohio state university (osu) libraries began to use its automated circulation system with a data base representing its complete shelflist and carrying limited information on each title:

field no. / field
1 call number
2 author
3 title
4 lc number, or nolc if none available
5 title number
6 publication date (if available)
7 ser: serial indicator. when present, indicates the title is a serial.
8 neng: non-english indicator. when present, indicates the title is non-english.
9 size: oversize indicator. when present, indicates the book is an oversize book.
10 portxxxx: portfolio number in which the book is located (main library only).
11 mono: monographic set indicator. when present, indicates the title has been designated a monographic set.
12 number of holdings (not displayed if copy 1, main library)
13 reference line number
14 volume number
15 copy number
16 holdings condition code
17 library location
18 patron identification
19 number of specific saves for the copy
20 circulation status
21 date charged in the form of year, month, day
22 date due in the form of year, month, day

the system, modified from time to time, provided access by call number, record number, or author-title, with an algorithm consisting of the first four letters of the author's name plus the first five letters of the title. a title search was also possible by entering four letters of the first significant word and five letters of the second significant word, or five dashes. as soon as the system was implemented, it was immediately evident that the search option was one of the most important features of the system. the circulation clerk at any location, either in the main library or in any department library, could search the author and title and find: (1) if the osu libraries had the book; (2) where it was regularly housed; and (3) its status (charged out, missing, lost, or available for circulation). all of this was possible without checking the card catalog except when problems of identifying the main entry existed. the immediate lack was, of course, the subject approach. as use of the system continued and library personnel became more sophisticated, various procedures offering some kind of subject approach were developed. the title search option is one possibility for finding subject access. for example, to find a book on "evolution" one can enter the title search command tls/evol---- and receive a report that there are 757 titles in which evolution is the first significant word. the terminal will then print out items as follows:

tls/evol----   page 1   757 matches   0 skipped (not all retrieved)
01 lan, h. j.   evolutie   1946
02 moody, paul amos, 1903-   introduction to evolution   1970
03 brosseau, george e   evolution   1967
04 adler, irving   evolution   1965
05 lotsy, j. p.   evolution
06 smith, john maynard, 1920-   on evolution   1972
07 miller, edward   evolution   1917
08 watson, j. a. s.   evolution   19-
09 kellogg, v. l.   evolution   1924
10 shull, a. franklin   evolution   1951
when the user types in pg2 or pg3, more titles will come up, and if more than thirty titles are desired, the original command can be reentered with a /skip 30 option to display others, including all 757 titles if necessary. it is also possible to manipulate this option further, since this first search may turn up the name of an author recognized as an authority on the subject. in this case, when thomas huxley's evolution and ethics appears, the terminal attendant changes to an author-title search, ats/huxlevolu, and finds eight matches, four books by thomas huxley and four by julian sorell huxley on the same subject:

ats/huxlevolu   page 1   8 matches   0 skipped (all retrieved in 1)
01 huxley, thomas henry   evolution and ethics, and other essays   1970
02 huxley, thomas henry   evolution and ethics and other essays   1916
03 huxley, julian   evolution, the modern synthesis   1942
04 huxley, thomas henry   evolution and ethics and other essays   1897
05 huxley, julian sorell   evolution as a process   1954
06 huxley, thomas henry   evolution and ethics and other essays   1896
07 huxley, julian sorell   evolution in action 1st ed   1953
08 huxley, julian sorell   evolution as a process 2d ed   1958

to find the call number of any of these, the attendant merely enters a detailed line search dsl/1:

dsl/1
hm106h91896a   huxley, thomas henry   evolution and ethics, and other   nolc   902452   1970   1 01 001   3week   und   page 1 end

the ability to search by a word in the title, which in the above example gives a form of kwic subject index, is even more specific if two words are used. for example, the attendant may enter tls/chilpsych to bring up titles containing the words "child" and "psychology" as follows:

tls/chilpsych   page 1   52 matches   0 skipped (not all retrieved)
01 jersild, arthur thomas, 1902-   child psychology. 4th   1954
02 jersild, arthur thomas, 1902-   child psychology 5th ed   1960
03 thompson, george greene, 1914-   child psychology   1952
04 kanner, leo   child psychiatry 3d ed   1957
05 curti, margaret (wooster)   child psychology   1930
06 clarke, paul a   child-adolescent psychology   1968
07 greenberg, harold a   child psychiatry in the commun   1950
08 english, horace bidwell   child psychology   1951
09 chess, stella   an introduction to child psych   1969
10 curti, margaret (wooster)   child psychology 2d ed   1938

the obvious subject approach is, of course, by call number. the system contains an option that permits a search on the general call number. the operator may enter either a real or an imaginary call number and receive the fifteen titles preceding and the fifteen titles subsequent to it in the shelflist. for example, with the command sps/hm106h9, using the call number from the previous example, the following ten titles will appear with that call number as the central item:

sps/hm106h9
11 hm106g77   graubard, man the slave and master
12 hm106h3   haycraft, darwinism and race progress
13 hm106h57   herter, c. biological aspects of human problems
14 hm106h6   hill, g. c. heredity and selection in sociology
15 hm106h63   hoagland, evolution and man's progress
16 *hm106h9
17 hm106h91896   huxley, evolution and ethics and other essays
18 hm106h91896a   huxley, evolution and ethics and other essays
19 hm106h91897   huxley, evolution and ethics and other essays
20 hm106h91916   huxley, evolution and ethics and other essays
21 hm106k29   keller, societal evolution; a study of the evolutionary basis
page 2   input: hm106h9

entering pg1 will bring up the ten preceding titles and pg3 the ten subsequent titles.
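the derivation of these search keys is simple enough to sketch in a few lines of code. the fragment below is an illustrative reconstruction in python, not part of the osu programs: the four-plus-five truncation rule comes from the description above, while the handling of punctuation and the small stopword list used to pick "significant" words are assumptions made here for the example.

def author_title_key(author_surname, title):
    # ats-style key: first four letters of the author's surname plus
    # the first five letters of the title, ignoring spaces and punctuation
    surname = "".join(ch for ch in author_surname.lower() if ch.isalpha())[:4]
    title_part = "".join(ch for ch in title.lower() if ch.isalpha())[:5]
    return surname + title_part

def title_key(title, stopwords=("a", "an", "the", "of", "and", "to")):
    # tls-style key: four letters of the first significant word plus five
    # letters of the second significant word, padded with dashes when the
    # title has no usable second word
    words = [w for w in title.lower().split() if w not in stopwords]
    first = (words[0][:4] if words else "").ljust(4, "-")
    second = (words[1][:5] if len(words) > 1 else "").ljust(5, "-")
    return first + second

print(author_title_key("huxley", "evolution and ethics"))  # huxlevolu
print(title_key("child psychology"))                        # chilpsych
print(title_key("evolution"))                               # evol-----

with keys this short, many different works necessarily map to the same key, which is why the terminal displays a page of matches for the operator to scan rather than a single record.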
one of the best features of this system is that the patron may call in by telephone and have at least some of this information read to him; if he is at a circulation area, he may receive a printout as an instant bibliography. recently an attempt has been made to use the file of data in other ways. in an attempt to provide better access to the main campus collection for the people at the five regional campuses of the university, an experiment was tried using a computer printout of certain selected parts of the shelflist. since microfiche is less expensive and more compact to handle, there were good reasons for using this form rather than the paper printout form. this was an obvious application for computer output microfiche (com). once a master frame has been produced by com, the cost of additional copies is negligible. in order to test acceptance of form more accurately, it was decided to provide a list in each form to test on sample populations. to cover some of the subjects taught at the agricultural and technical institute at wooster, a total of 20,672 titles were selected in the following areas:

agricultural economics   hd1401-2210   2,121 titles
botany   qk10-942   1,039 titles
agriculture   s   17,157 titles
agricultural machinery   tj1480-1496   6 titles
wood technology   ts800-937   197 titles
woodworking   tt130-200   152 titles

these titles were printed in a hard-copy printout in the following format, with a program designed by gerry guthrie of the research and development division of the osu libraries:

call number = tj1496c3a3
title number = 196795
author = caterpillar tractor company
title = fifty years on tracks
publ. date = 1954
holdings = cool com regular
lc number = 55-20529

the physical form of the resulting documents varied somewhat due to the fact that each subject area was put in one cover. this meant "agriculture" (s), with 17,157 titles, was too bulky to carry around, but "wood technology" was compact and easily carried to one's office or home for leisurely browsing. a brief questionnaire was used to test the reaction to the list. responses were received from 6 percent of the students and faculty at the agricultural and technical institute. with the usual assumption that some students are not library users, there was some validity to the sample. results tabulated from these questionnaires fell into three categories: (1) nature of use; (2) value of the list; and (3) response to its form and format. since some questions were left blank, the totals were often less than 100 percent.

nature of use

the responses turned out to be evenly divided between faculty and students, 46 percent for each, with some leaving this question blank. the faculty indicated that two-thirds of their use was for themselves and one-third for the students. students, of course, used it totally for their own purposes. the actual purpose of the list had been envisioned as access to the main campus collection, and increases in interlibrary loans indicated that it was effective. loans during the month of october 1973 totaled four, while november's loans totaled thirty-four, showing a marked difference after the delivery of this search tool on october 31. the questionnaire showed that 77 percent indicated they used the information for this purpose.
it should not have been a surprise to librarians to find that 34 percent of the sample population used the information to order a duplicate copy for the wooster ati library, an indication of readers' known proclivity for wanting their material close at hand.

users' evaluation

the increase in interlibrary loans was probably a better reflection of the users' approval than the actual questionnaire results, although the results themselves were also highly positive. seventy-seven percent checked that they found it valuable, against 15 percent who did not. eighty-five percent said they wanted more lists. requests for additional suggestions included a request to keep it up to date and a request to limit it to just recently published items, while another person asked for all of the titles located in the agricultural engineering library. the requests indicated that several additional subject areas were wanted: communication skills, personnel management, human relations, use of airplanes in agriculture, irrigation and drainage engineering, and environmental pollution.

suitability of form and format

some attempt was made to determine how people react to the admittedly inconvenient form of a computer printout. since financial considerations limited the possibilities to either this form or microfiche, those options were presented in the questionnaire. preference for the paper form was expressed by the users of the list in this form-84 percent to 8 percent who would have preferred microfiche. the population was evenly divided as to whether or not they wished to have the list in this call number order-50 percent wanted it by straight shelflist or call number order and 50 percent wanted it alphabetically by author. the latter response may very well reflect the proportionally large number of respondents who were faculty and who supposedly would know the authors in their fields and do not use a subject approach when seeking materials. while the original purpose of the research was to provide better subject access to a remote collection, it was also important to find out more about the user's response to microfiche if he could be given an improvement in service or a service he did not previously have. microfiche would be both more compact and less expensive if lists of this type were to be provided in many subjects and continually updated. for the microfiche section of the research project the library of congress classifications covering classics and related fields were chosen, partly on the basis that faculty in these areas had agreed to participate and encourage their students to use the list. included were:

de1-de98   history-the mediterranean world
df101-df289   history-greece
dg11-dg209   history-italy
n5630-n5790   history of art-greek and roman
na200-na335   architecture-history-greek and roman
pa (all)   language and literature of greece and rome
z7001-z7005   bibliographies in linguistics, roman and greek literature, teaching languages

this subset produced about eleven thousand titles. the format of the com was the same as that on the paper printout, with general titles appearing at the top of each sheet or frame, e.g., shelflist-classics-greece. this took twenty-two microfiche with sixty-nine frames each, listing seven or eight titles per frame. the last frame on each fiche was an index to that fiche. a nonreduced (eyeball) character at the top listed the first call number on the fiche.
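the fiche layout just described lends itself to a small worked sketch. the python fragment below is illustrative only, not the com production program used at osu: the capacities (eight titles to a frame, sixty-eight data frames plus one index frame per fiche) follow the figures quoted above, and the record layout, function names, and index structure are assumptions made for the example. at a uniform eight titles per frame about twenty-one fiche would hold the eleven thousand classics titles; the project's actual mix of seven and eight titles per frame accounts for the twenty-two fiche reported.

import math

TITLES_PER_FRAME = 8          # "seven or eight titles" per frame in the text
DATA_FRAMES_PER_FICHE = 68    # sixty-nine frames, the last one reserved as an index

def chunk(items, size):
    # split a list into consecutive pieces of at most `size` items
    return [items[i:i + size] for i in range(0, len(items), size)]

def lay_out_fiche(records):
    # records: (call_number, title) tuples already in shelflist order.
    # returns one dict per fiche: the nonreduced "eyeball" label (first call
    # number on the fiche), the data frames, and an index frame mapping each
    # frame number to the first call number it carries.
    fiche_set = []
    for data_frames in chunk(chunk(records, TITLES_PER_FRAME), DATA_FRAMES_PER_FICHE):
        index_frame = {n + 1: frame[0][0] for n, frame in enumerate(data_frames)}
        fiche_set.append({
            "eyeball": data_frames[0][0][0],
            "frames": data_frames,
            "index": index_frame,
        })
    return fiche_set

# rough count of fiche needed for about 11,000 titles at these capacities
print(math.ceil(math.ceil(11000 / TITLES_PER_FRAME) / DATA_FRAMES_PER_FICHE))  # 21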
it was envisioned that the user might know the general classification number, search for it by the eyeball character, then consult the index in the last frame to locate the proper frame for a specific class. in this way the user could browse through the subject area. the chief advantage of com lay in the fact that the small envelope of microfiche and a portable reader were easy to check out of the library and carry home or to an office, where the user could browse through the library shelflist at a leisurely pace. since initial reaction was negative, a subject index was prepared to make the list more usable to undergraduate students. this index was made up of the appropriate entries which appeared in the library of congress classification schedules, with all entries consolidated into one alphabet.1 using this index to find an entry-for example, "caesar, c. julius"-the student would find two areas to search: dg261-267 and pa6235-6269. he would find these areas on the microfiche with the eyeball characters, then search the index frame to find the appropriate pages. the classics list with its index and instructions was packaged in neat, loose-leaf notebook form and, together with a portable reader, presented to classics faculty at two regional campuses. a set was also available in the library. the results were completely negative. reliance upon the cooperation of too small a number of cooperating teachers may have invalidated this part of the research, but the contrast in response to the similar printed list raised serious questions about user response to microfiche in an index or reference book situation.2 it had been anticipated that a population in the humanities or social sciences would have had more need than the science group for what was essentially a book list, since serial titles did not include holdings. the complete lack of interest from the faculty in the field of classics was an unexpected disappointment, but no firm conclusions could be drawn without a research strategy designed to remove any possible variables.

conclusion

increased use of marc cataloging through such systems as oclc and ballots will mean many more libraries will have their total holdings in machine-readable form, with the capability of using their records in new ways. programs for distributing microfiche copies of library catalogs, such as georgia tech's lends program, provide inspiration for public service librarians to make use of the data and technology that technical services automation projects are supplying.3 this experiment in manipulating machine-readable library records for use in subject searching was an attempt toward better retrieval of a library's collection and indicated that such programs would be useful to extend service outside a single library location.

references

1. it may soon be possible to do this in a much simpler fashion by using the combined indexes to the library of congress classification schedules (washington, d.c.: u.s. historical documents institute, 1974).
2. doris bolef, "computer-output microfilm," special libraries 65:169-75 (april 1974). in describing the use of com at the washington university school of medicine, bolef said, "there is, however, an additional disadvantage, namely, the resistance of users to the use of microforms because of their inconvenience. patrons will sometimes choose not to read a publication when told it is available in some sort of microform only.
it is assumed that librarians are not quite as reluctant, but it would be a mistake not to take this reluctance into consideration. this resistance by both librarians and patrons is stronger than is usually reported by com manufacturers and service bureaus" (p.170-71).
3. the georgia tech library's complete card catalog is now available in microfiche form, brochure (atlanta: price gilbert memorial library, georgia institute of technology, 1972).

technical communications

isad announcements

please note a change of address for the editor of technical communications: send all future news releases, technical communications, and announcements to don l. bosseau, director of libraries, emory university, atlanta, ga 30322.

technological inroads

artificial intelligence

transistors and other circuit elements of the new generation of computers are so tiny and fitted so closely together that it becomes feasible to combine thinking circuits and memory units on a single chip. thus, one cell in the computer's memory bank can both remember and reason. this is a major step closer to artificial intelligence. in july 1971, the japanese government earmarked $100,000,000 for an eight year study of artificial intelligence. japanese industry accepts the conclusion that it could be increasingly dependent on "intelligent" computers. a technical report dated february 1971 reads: "the development of these tiny chips presages a time when the electronic brain will rival the human brain in complexity and memory. the identity of the fully educated computer may become blurred with that of its programmer-teacher! it may exhibit esthetic and artistic judgments of an interesting degree of subtlety. responses akin to feeling and emotion need not be excluded from its training if they may enhance its performance." along with artificial intelligence will come electronic voice recognition. voice recognition by the computer-in other words, a computer that will respond to oral command-is making significant engineering progress. rca reports that its voice command machine responds to twenty-eight of the basic sounds in the english language. (extracted from advertising age, march 19, 1973.)

cbs laboratories invents way to produce microfilm pictures by laser

dr. william e. glenn, jr., director of research at cbs laboratories, a division of columbia broadcasting system, inc., has been granted a u.s. patent for an improved method of recording and reproducing information from microfilm. by means of a split-beam laser, pictorial or printed information is transferred to a metal master. this metal master disk is similar to the type used in the record industry and, from this disk, duplicates can be stamped at low cost. "the market potential," dr. glenn stated, "will not only include the cassette and film industry, but it can be an asset to libraries and government printing as well. this system," he further stated, "is designed for recording and reproducing picture information. it uses diffraction gratings that are modulated in accordance with the picture information. reproduction is effected by directing light through the medium. the zero-order diffracted light is modulated in accordance with the picture information." the patent, assigned to columbia broadcasting system, inc., will offer reduced costs for recording on microfilm. it has potential for use in the motion picture film industry, libraries, and cassette recording.
cbs laboratories has made other outstanding advances in laser technology, which include the laser color film recorder, holography, and the holographic scanner.

microimagery-solution to the information explosion

tomorrow's busy businessman will have the information necessary to do his job right at his fingertips, due to the growing acceptance of microimagery as the solution to the information explosion. "in every area of business today, the need for information is increasing faster than any individual can keep up," says walter steel, bell & howell's vice-president of microimagery marketing. "university courses are now teaching kids to be generalists and how to find the information on what they need to know. they're learning that the vehicle to the access of information sometimes is more important than the knowledge," steel says. the seventies will be known as the decade of microfilm, just like the sixties for the copier and the fifties for the computer, according to steel. microfilm is halfway between the computer and the copier as a support to business, because it includes copies and peripherals to the computer. soon the copier will become peripheral to microfilm, steel states. steel calls microimagery "the immediate communication tool." it is the new medium that fits the new world of business. soon companies will be saying to their customers, "we'll send you our computer once a week." technical journals will simply send their subscribers a paper newsletter that hits the high spots, along with a deck of microfiche and a new index, plus a retrospective new index each month, steel forecasts. "microfilm won't ever totally replace paper," says steel, "but it will replace file cabinets and storage areas, plus it will simplify the filing system in any size office." steel says that the potential for microfilm is greatest in the business records market. the bank market was the base for the microfilm business, but it is no longer predominant, according to steel. "the basic unique value of microimagery is that it saves money. our goal at bell & howell is to be able to provide a complete microfilm system for the small office market for under $1,000. that would include a camera, microfilm processor and viewer," he stated. in light of increasing postage costs, many publishers are actively investigating microimagery. ten pounds of printed matter are reduced in microforms to an ounce or less. with the development of microfiche having a 50 to 1 ratio (i.e., 510 images on a 4 x 6 inch fiche), 90 percent of the books published could be available on a single microfiche each. the book of the month club could become the fiche of the month club. in every profession there is new technology that the successful manager must have access to in order to continue his success. microimagery can put that knowledge at his fingertips.

reports-regional projects and activities

new uc library automation office established

berkeley-coordination of multicampus automation projects serving the university of california's libraries has been placed in a central office under a director of the university-wide library automation program (ulap). jay l. cunningham, a project manager in uc's institute of library research, has been appointed to the director's position. library automation has been underway for several years at the university of california, which is considered one of the pioneers in this field.
each of the nine campus libraries has specialists for automation on its staff, and a central staff has also been working on such problems in the university-wide institute of library research (ilr). with growing emphasis on automation, coordination of the various campus projects becomes increasingly important to insure that applications are compatible. uc also maintains close contact with similar efforts at the california state university and colleges. coordination of a number of library functions by these two segments of public higher education may be greatly facilitated by automated procedures. in recent years, such coordinating tasks have fallen more and more on ilr, an organized research unit directed by a professor in berkeley's school of librarianship, charles p. bourne, who also served as acting ulap director for the past eighteen months. since the primary task of such units is research in support of the university's educational function, the responsibility for development and operation of university-wide automated procedures has been made into a full-time assignment, with cunningham taking over as ulap director from professor bourne. a close working relationship will be maintained between the two groups. among projects well under way are the following:

university of california union catalog supplement. the berkeley and ucla catalogs published in book form in 1963 have been recently supplemented by a forty-seven-volume set showing all monographs cataloged by all nine campuses during the five years 1963 through 1967. preparation and printing of the more than 750,000 titles was done by semiautomatic methods.

union list of serials. all serial publications, including book series and scholarly journals, to which uc libraries subscribe are entered in another list that is to be continually updated, a task greatly simplified by the computer. scholars and other library users will be able to determine immediately which uc campus subscribes to any serial and how complete its holdings are.

bibliographic center. in addition to housing the above two projects, this center helps in processing newly acquired books by printing catalog cards by computer at the ulap headquarters. cards can be ordered by a uc library in full sorted sets, including multiple sets if needed for branch libraries. the new system supplements the present method of ordering cards separately from outside vendors or producing them on each campus.

among projects envisaged for the future are automated circulation procedures, under which each borrower would be given a machine-readable card and the charge slip in the back of the book would likewise be machine-readable, such as a punched card. this method would speed the checking out of books and facilitate statistical studies. other projects include a clearinghouse that would indicate instantaneously whether a new book recommended for purchase has been already ordered by another campus, and the streamlining of library accounting procedures. cunningham is a graduate of cornell university and holds the master of library science degree from berkeley. before joining the ilr staff, he served as a library systems specialist at the library of congress, and as a u.s. air force officer for four years. the new director will report directly to vice-president-academic affairs angus e. taylor, a university-wide official.
committee undertakes implementation of program which will afford university-wide "direct access"

state university of new york students will soon benefit from more direct access to the 7.5 million books and 6.2 million slides, films, recordings, and other research materials contained in libraries on the university's thirty-four state campuses. that the university is moving to provide faculty and students with walk-in privileges at any of the libraries at the twenty-nine state-operated campuses and the five statutory colleges at alfred and cornell universities was announced recently by university chancellor ernest l. boyer. the proposed system, which has the endorsement of the faculty senate of the university, will greatly improve upon the university's current interlibrary loan program, under which books at cooperating libraries can be borrowed through the mails. working in cooperation with state university librarians, chancellor boyer has announced the formation of a committee of librarians and administrators to develop a timetable and procedures to implement the program. the committee will be chaired by willis bridegam, director of libraries at the university center at binghamton. the other members of the panel are dr. philip sirotkin, vice-president for academic affairs at the university center at albany; don cook of the university center at stony brook; mary cassata of the university center at buffalo; george cornell, college at brockport; and henry murphy of cornell university. in addition to developing a program timetable and procedures, the committee will also explore the future possibility of extending access privileges to the faculty and students at the thirty-eight locally sponsored community colleges. the expanded library access policy is seen as an essential step in the university's efforts to use its library resources more effectively, particularly since the cost of acquiring books and periodicals has grown at an extraordinary rate in recent years. some publications costs have increased at the rate of 15 percent per year. state university of new york is the first major multicampus system to introduce such a reciprocal program on so wide a scale, although the library system of the state university of illinois has a similar policy, limited to faculty and graduate students. the growing use of modern computer and data processing techniques is another cost control program the university has implemented in administration of its libraries. shared cataloging techniques and the compilation of lists of university-wide locations will be developed to enable library users expeditiously to locate books and reference tools. the policy will be particularly beneficial to students of the university's empire state college, since they are not campus-based and must rely heavily on library collections near their homes or places of employment. the policy will also make it much more convenient for students and faculty to conduct research and complete reference assignments in other parts of the state during vacation and intersession periods. collectively, the libraries at the university's state campuses comprise one of the greatest collections of titles and reference materials in the world.
holdings for the 197172 academic year included 7,551,333 volumes, 237,428 microfilms, another 5,115,584 units in other forms of microtext, 20,587 slides, 71,007 recordings, 86,662 maps, 90,694 periodical titles, 29,334 additional serial titles, and 541,007 printed government documents-for a grand total of 13,743,636 entries. potpourri u.s. experts study soviet science information system and services eight united states information specialists from government, universities, professional societies, and private industry participated in the first u.s.-u.s.s.r. symposium on scientific and technical information, organized under the u.s.-u.s.s.r. agreement on cooperation in the fields of science and technology, in moscow, june 18-19. the group led by dr. lee g . burchinal, head, office of science information service, national science f oundation, also spent ten days in the soviet union visiting key information organizations in moscow, novosibirsk, yerevan, and kiev. the purpose of the symposium and subsequent site visits was to give the u.s. group an opportunity to learn more about the soviet system for providing science and industry with needed scientific and technological information, and to explore feasible areas for possible future cooperation. in addition to dr. burchinal, members of the group included william t. knox, director, national technical information service, department of commerce; melvin s. day, deputy director, national library of medicine; dale b. baker, director, chemical abstracts service; scott adams, science communications division, the george washington university; dr. vladimir slamecka, director, school of information and computer science, georgia institute of technology; bart holm, manager, systems development section, information services division, e. i. dupont de nemours & co.; and jerome luntz, senior vice-president, mcgraw-hill publications co. the group was hosted by engineer n. b. arutiunov, director of the information directorate, state committee for science and technology (scst), council of ministers of the u.s.s.r. the symposium featured four presentations by soviet specialists on the following topics: 1. state scientifl.c and technical information system of the u.s.s.r. (dr. 0. v. kedrovskiy) 2. viniti's integrated information system for the u.s.s.r. (dr. a. i. chernyy) 3. specialized system of scientific and technical information services in instrument making (dr. v. a. rukhadze and dr. v. m. baikovsky) 4. psychological aspects in charting the pathways of scientific and technical information development (prof. dr. g. t. artamonov) on june 20-23, the u.s. group visited the all-union institute for scientific and technical information (viniti), the allunion scientific and technical information center (vntitsentr), the all-union research institute of medical and medicotechnical information (vniimi) and the state public library of the u.s.s.r. for science and technology (gpntb-sssr). on june 24-29, the u.s. group visited the siberian branch of the u.s.s.r. academy of sciences and the novosibirsk center of scientific and technical information, the armenian research institute of scientific and technical information and technico-economic studies ( armniinti), the ukrainian research institute of scientific and te chnical information and technico-economic studies (ukrniinti), and the institute of cybernetics of the ukrainian academy of sciences. although about .five years behind the u.s. in applications of technology, especially computer and microform systems, dr. 
burchinal said, the soviets have established a strong base for rapid future growth. reflecting their style of centralized, national planning, the soviets are well advanced toward development of an integrated national information system embracing both science and technology. the major components of the emerging technical communications 121 integrated national system are { 1) centralized policy, planning review, and methodological guidance provided by the state committee for science and technology ( scst) ; ( 2) concentration of national backup resources in all-union (national) institutions; ( 3) eighty-two ''branch" information networks established by the industrial ministries; ( 4) development by the fifteen republic and regional information institutes of "interbranch" or interdisciplinary dissemination services to serve local industries and planning bodies. a major feature of this national information system is emphasis on the active dissemination ("propaganda") of information about technological innovations throughout the soviet economy. the u.s. group, d. burchinal said, was particularly struck by the importance attached to information services by the highest levels of scientific and technological management in the u.s.s.r. and in the constituent republics. their commitment is reflected in the resources being assigned to development of improved information services. four new buildings are being constructed in moscow alone for all-union scientific and technological information services; staffs are being expanded; third-generation computer systems will be installed at numerous sites beginning in early 197 4; and new buildings are underway or were recently completed for n early a dozen republic and interbranch services. in short, the soviets know where they want to go, and they are devoting considerable resources to achieve their national objectives. the second half of the symposium begun in moscow was held in washington on october 1-2. at that time u.s. and u.s.s.r. representatives sought agreement on areas of continued cooperation which will be reported to the joint u.s.-u.s.s.r. commission on cooperation in science and technology when it meets in moscow. a report of the june visit by the u.s. team to the u.s.s.r. will be available through the national technical information service. 122 journal of library automation vol. 6/2 june 1973 pertinent publications isad cable tv information packet now available from the american library association's information science and automation division is a thirteen-piece packet of materials on cable television. included in this information kit of articles, bibliographies, policy statements and suggestions are the following: • annotated bibliography on cable television for librarians, brigitte l. kenney and susan bunting • catv: visual library service, brigitte l. kenney and frank w. norwood • cable television-a bibliographic review, james schoenung • cable television: state-of-the-art and franchise recommendations, advisory memorandum by nowell leitzke • a glossary of terms for cable television and other broadband communications, merry sue smaller • guidelines for planning a cable te1evision franchise, sidney dean, jr. • letter to joe fischer, jr., from c. lamar wallis, director of libraries, memphis public library and information center • metropolitan library service agency (melsa) position paper on cable television, jon shafer • planning for urban telecommunications, kas kalba • public-cable, inc. 
statement • a report on cable communications and the district of columbia public library, lawrence e. molumby • san francisco public library video center policy statement • video/ cable activities in libraries, brigitte l. kenney and susan bunting packets are available for $2.50 each. send order to: cable tv packet, donald p. hammer, !sad, american library association, 50 e. huron st., chicago, il 60611. please make checks payable to the american library association. lib-mocs-kmc364-20140106084141 221 book reviews introduction to information science, tefko saracevic, ed. new york: bowker 1970, 776 pp. $25.00 the editor has put together a large volume consisting of 776, 8~ x 11 pages and weighing almost 5 pounds. it comprises 66 different articles written by almost as many authors and covers the period from 1953 to 1970. two-thirds of the articles were written during the period 1966-1969. in short, it is a collection of a large number of papers mostly from the last few years having to do in some way with information science or more properly, with information systems. the papers generally are good ones and in some cases have already become acknowledged classics. in a few cases i am a bit puzzled about their inclusion in a volume of this type. in the few months since i have had this book i have already found numerous occasions to consult several of the articles. some of the other papers which i have not seen recently i have enjoyed reading again. the book is divided into four parts, which are further subdivided into thirteen chapters. the four parts are basic phenomena, information systems, evaluation of information systems, and a unifying theory. although the chapter headings are too numerous to list they include such topics as notions of information, communication processes, behavior of information users, concept of relevance, testing, as well as economics and growth. by virtue of the parts, chapters, and articles the editor has provided a type of classificalion system or structure for information science without attempting to define information science. interspersed between each of the parts and chapters is up to a page of introductory and explanatory material provided by the editor. in a volume of this type it is important to recognize what the volume is and what it is not. as i have mentioned, it is a good anthology of important articles related to information. it is not, as the title implies, an introduction to information science. the papers are by and large unrelated to each other and the introductory comments by the editor do little to provide a unifying relationship. furthermore, the overall scope of the articles is generally quite limited and, although the editor implies it is not so, tends to equate information science to information systems. the final paper in the volume by professor william goffmann is listed by the editor as part four-a unifying theory. the precise title of the chapter is somewhat less ambitious: namely, "a general theory of communication." the paper is an unpublished one (although similar papers by the author have been published elsewhere) and relates communication in a general sense to the theory of epidemic processes. although the theory is an 222 journal of library automation vol. 4/4 december, 1971 interesting one, it would hardly qualify as a unifying theory for information science. it certainly does not provide the unifying relationships among the various articles included in the text. 
my guess would be that .other qualified individuals, in putting together a similar volume, would have included many different articles. this, however, is the nature of the field at this time. by comparison note the recently published volume key papers in inf01·mation science, arthur w. elias editor. this book, although admittedly serving a somewhat different purpose, contains 19 papers with only a single paper in common with those of this particular volume. in summary, this is a good collection of relevant and useful articles in information science. it is probably desirable that they be included in a single volume. serious students, educators, and research workers will find this volume to be of interest. as a reference book it will be quite useful. the book is not, however, an introduction to information science. the novice, the student, and the casual reader will probably be disappointed and confused, and in some cases might even be misled. marshall c. yovits information processing letters. north-holland publishing company, amsterdam. vol. 1, no. 1, 1971. english. bi-monthly. $25.00. this journal is published by a most reputable company and has a most impressive international list of editors and references. the affiliation of editors illuminates the orientation of the journal: six of them are from departments of mathematics, computer science or cybernetics and two are from ibm laboratories. understandably, the journal is devoted basically to computer theory, software and applications, with a heavy accent on mathematically expressed theory related to the solution of computing problems, algorithms, etc. it is directed toward basic and applied scientists and not toward practitioners. people interested in library automation may, from time to time, find in it theoretical articles broadly related to their work, but they will have to do the "translating" themselves. this journal follows the tradition of "letters" journals in physics, biology and some other disciplines. the papers are short; publication is rapid; work reported generally tends to be very specific, preliminary to or a part of some larger research project; usually small items of knowledge are reported. the "letters" journals are received in the fields where they appear with mixed emotions. for instance, ziman (nature 224:318-324, 1969) questions very much the need for these publications. on the other hand, they are a useful outlet for authors who otherwise would not publish these often useful bits of specific knowledge. recommended for research libraries related to computer science. t efko saracevic book reviews 223 handbook of data processing for libraries. by robert m. hayes and joseph becker. new york: becker & hayes, inc., 1970. 885 pages. $19.95. to write a universal handbook in a field so full of complex intellectual pro.blems and simultaneously satisfy every potential reader is an impossible assignment. therefore the authors cannot be faulted for failing to satisfy everyone. they have succeeded in writing for a very important audience -administrators and decision makers. for this group, they have presented difficult technical material in a clear readable fashion-a reflection of ' their extensive teaching experience. for many library administrators, this handbook arrives five years too late. had it been available earlier, a large number of current automation projects might never have been authorized by management, or at least might have been conducted on a sounder basis. 
following a very conservative approach, the authors generally remain within the limitations of the current state of the art, being careful to distinguish that which is feasible (i.e., practical ) from that which is possible. over and over again, they warn librarians about the limitations of computers and caution against excessively high expectations. for administrators, the most useful material is in chapter 3, "scientific management of libraries," and in chapter 8, "system implementation." a reading of chapter 8 alone suffices to convey to the administrator the magnitude and complexity of even the most seemingly routine computer application in libraries. this chapter, the most important and useful in the entire book, covers planning, organization, staffing, hardware, site preparation, programming, data conversion, phase-over, staff orientation, and training. each of these topics-deserving of complete chapters in themselves-is treated briefly, but in enough detail to communicate the complexity of each component in the long stream of system development activities, all of which must be completed to the last detail for success. there are three useful appendices: a glossary, an inventory of machine readable data bases, and a list of 115 sources for keeping up to date. bibliographic footnotes abound and each chapter ends with a list of suggested readings. however, it is surprising how many references are five or more years old; in fact, there is a scarcity of current references. for example, ballou's well-known guide to microreproduction equipment, now entering its 5th edition, is cited in the first edition of 1959. the authors have been badly served by their proofreaders. the book is marred by an incredible number of spelling errors in text, tables, footnotes and references, especially with personal names, plus incomplete citations. the index contains many entries too broad to be useful, such as: utilization of computer ( 1 entry ), time sharing ( 1 entry), hardware ( 3 entries), technical services ( 3 entries). lacking from the index are name references to distinguished contributors to the literature, such as avram, cuadra, degennaro, fasana, and others. many of these names appear only in footnotes. 224 journal of library automation vol. 4/4 december, 1971 the book is rich in tabulated data and specifications for a variety of equipment. unfortunately, much of this equipment is inapplicable to library use, or the tabulated data is in error. table 12.25lists several defunct or never marketed equipments, such as ibm's walnut and eastman kodak's minicard, without indication of non-availability. in table 11.22 there are extensive listings of crt terminals, most of which are unsuitable for library applications by reason of deficient character sets or excessive rentals. nine of the units listed showed rentals of over $1,000 per month, and two of these were virtually at $5,000 per month, clearly beyond the reach of any library. table 12.2 suggests the access time to one of 10,000 pages in microfiche is half a second, a figure that is off by an order of magnitude for mechanical equipment and by two orders of magnitude for manual systems. (more nearly correct figures are given in the text on page 396). table 12.21 lists several microfilm cameras designed expressly for non-library applications and not adaptable to any library purpose. from a broader perspective, one misses several other features. is a "handbook" for the practitioner? if so, this volume is too elementary. 
can it be used as a textbook in a course in library automation and information science? the book contains no problems for students to attack, and except for references, no aids to the instructor. possibly it can serve as supplementary reading, for it contains far too much tutorial material (yet only ten pages of nearly 900 are devoted to flow charting). one wishes for more specifics drawn from the real world. a hypothetical case study in chapter 11 is illustrative: a 5% error rate is assumed for input of a 300,000 record bibliographic data base to be converted to machine readable form. not revealed in the example is that a relatively low error rate in keyboarding may result in a very high percentage of records which must be reprocessed to achieve a high quality data base. each reprocessed record will consume computer resources: cpu time, core, disc i/0, tape reading and writing, etc. we know from marc and recon that the ratio of the total records processed to net yield is on the order of 3:2; i.e., each record must be processed on the average of one and a half times to get a "clean" record. the cost of this reprocessing is far beyond the 5% lost by faulty keyboarding. the handbook will be a useful decision making tool for the generalist, a less helpful aid to the practitioner. it is hoped that a revised edition is in preparation, and particularly that the tabular material will be corrected and brought up to date. chapter 8, the heart of the book, should be greatly expanded. for the next edition, some consideration might be given to a two-volume work: the first volume for the administrator, and the second containing much more technical detail for the practitioner. if the two volume pattern is followed, a loose-leaf format with regular updating would be most helpful for the second half. allen b. v eaner book reviews 225 l~brary automation: experience, methodology, and technology of the lzbrary as an information system, by edward w. heiliger. new york: mcgraw-hill book co., 1971. xii, 333 pp. the need for a handbook and/or general introductory text on the topics of automation and systems analysis in libraries has been sorely felt for quite som.e time. during the past year, three have appeared (chapman and st. pierre, library systems analysis guidelines, wiley, 1970; hayes and becker, handbook of data processing for libraries, wiley, 1970 and the book here reviewed.) unfortunately, none is completely satisfactory, for different reasons. a serious student wanting a reasonably comprehensive, systematic, and balanced treatment of these subjects will, i'm afraid, be forced to have to use all three of these titles and, even then, will have constant need to use supplementary materials for a number of aspects. the title being considered in this review by heiliger and henderson, if one judged only by the authors' intent as expressed in the preface, would be exactly the kind of work that we've all felt the need for. as they state on page vii, the purpose "is to provide a perspective of the library functions that have been or might be mechanized or automated, an outline of the methodology of the systems approach, an overview of the technology available to the library, and a projection of the prospects for library automation." and, indeed, if one looks at the table of contents there are four parts that closely parallel this statement of purpose. 
the parts themselves though, when inspected more closely, reveal not a systematic treatise or even an in-depth treatment of these topics, but rather a loosely connected series of essays, each on a fairly superficial level, discoursing on a variety of aspects associated with, or tangental to these topics. this indicates, at least to this reviewer, that the genesis of the book was a series of lectures presented and refined over a period of time by the authors. although not in itself a bad thing, here it is unfortunate to some degree because not enough effort was expended in amplifying the material with additional data, library-oriented examples, and illustrations, nor in logically integrating the various parts. part i, entitled "experience in library automation," begins by broadly citing a number of library automation projects mostly dating from the early 60's. the level is extremely superficial and the presentation not very enlightening, since only three or four projects are mentioned, and then only in passing. immediately following are several excellent chapters describing traditional library activities (e.g., acquisition, cataloging, reference,. etc. ) in functional terms. the approach, though extremely simple, is for the most part effective and is only marred by occasional, overly condescending statements such as "library filing is a very complicated matter" or "reference librarians use serials literature extensively." unfortunately, in the 104 pages of this section there is not one illustration. 226 journal of library automation vol. 4/4 december, 1971 part ii, "methodology of library automation," attempts to describe the general approach and techniques of systems analysis. in a number of ways, this is the best part of the book. unfortunately, the concepts that are so simply and succinctly described are only indifferently related to activities that will be familiar to librarians. as a brief essay on the objectives and concepts of systems analysis, it is quite adequate, but as a discussion of how they relate to library problems, it is totally inadequate and often misleading. part iii, "technology for library automation," is probably the least informative part of the book, giving the reader virtually no practical information. all of the important and obvious technological concepts are listed, but are dismissed with what oftentimes is little more than a brief definition. the one exception to this is chapter 13, entitled for no apparent reason, "concepts." this chapter is in fact an innovative and thoughtprovoking view of a library as a data-handling system. one wishes that this chapter had been amplified and treated more fully. part iv, "prospects for library automation," is the least effective part of the book, having in my mind only one merit: it doesn't tack on a hollywood-style happy ending. the authors' view of the 70's, as far as can be inferred from this too short section, is cautious and mundane. these will be, i'm convinced, the overriding characteristics of automation efforts for the next several years. i only wish that the authors had elaborated more fully on these points and presented their views more coherently. the book is augmented with a 61-page bibliography ( 1,029 citations), which, though reasonably current, is of dubious worth because it is neither annotated nor particularly well balanced. 
certain classics, such as bourne's methods of information handling, or information storage and retrieval by becker and hayes, and certain current, basic items, such as cuadra's annual review of infonnation science and technology and the journal of library automation, are not listed. each chapter is accompanied by a "suggested reading list" wherein materials more or less pertinent to the subject of the chapter are listed. a glossary of terms in three parts (a total of 36 pages) is also included and, though difficult to use because it is in three alphabets and interspersed with the text, provides short but very adequate definitions. unfortunately, several jargon terms used in the text itself are not included; one that was most irritating to this reviewer is the term "gigabyte" which to my knowledge has very little currency among the cognoscenti. on balance, library automation is a title that should be recommended for a wide range of readers. though it will probably have little to offer experts in the field, it does have value as a text for library students or a general introduction for the average, non-technical librarian. paul ]. fasana book reviews 227 sistema colombiano de informacion cientifica y t echnica (sicoldic). a colombian network for scientific information, by joseph becker et al. quirama, colombia: may-june 1970. 59 p. mimeo. the task of the study team which produced this report was to present "an implementation plan for strengthening the scientific communication process in colombia by providing a permanent systematic mechanism to function in the context of colombia's internal needs for scientific and technical information in government, industry, and among the research activities in higher education." more specifically, the expressed goal of such a mechanism is "to develop a network which will permit any scientific or technical researcher, in government, industry, or university, to access the total information resources of the country without regard for his own physical location." the study was comple ted in two months (according to the cover dates) and comprised four areas of investigation, namely: 1) to elucidate the advantages of d eveloping a centrally administered national network including three levels of network nodes and a technical communications plan; 2) make an inventory of universities, institutes, telecommunications and computer facilities in colombia; 3) recommend a mix of these factors to produce specific services, and 4) propose a seven-year budget. the republic of colombia is about the size of texas and california combined, and its population is about 1 million less than new york state. most scientific and technical workers are located in five major cities, and the country is divided into six administrative zones. within these zones twenty universities and forty-four institutes were inventoried by the study team with respect to specialization, faculty, book collections and the like. from these universities and institutes, five primary and seven secondary nodes were named to be connected by means of a telex communications system. the telex connections are not to be computer-mediated in the forseeable future, but used for interlibrary loan and other messages. ( there were two teleprocessing systems operating in colombia at the time of the study.) 
basic recommendations are: that a governmental unit be established with responsibility for directing the development of sicoldic; that this unit, with a high echelon board of directors, should produce several directories, bibliographies and union lists, and publish a monthly catalog of government-sponsored scientific and technical research. in addition, a manual for use of the telecommunications system should be produced. the proposed budget is about $250,000. ( 4.5 million pesos ) for the first year, graduating to a 25-fold increase by 1976. in some aspects the sicoldic plan follows the pattern of some state library development plans being implemented in the u.s. the advantage of central control of information resources planning and fund control b y the sicoldic group, with fairly direct access to high governmental 228 journal of library automation vol. 4/4 december, 1971 authority, provides reasonable insurance for support of the plan, especially since these services contribute to the economic and scientific advance of colombia. there is no indication of the acceptance of the plan by colciencas, the governmental unit which commissioned it. of the sixty references in the bibliography, spanish publications predominate. ronald miller cooperation between types of libraries 1940-1968: an annotated bibliography, by ralph h. stenstrom. chicago: american library association, 1970. this bibliography is an effort to sift, organize and describe the literature of library cooperation produced during the period 1940-1968. two criteria governed the selection of the 348 books and monographs listed: 1) they must deal with cooperative programs involving more than one type of library, and 2) they must describe programs in actual operation or likely to be implemented. although most of the cooperative projects described are located in the united states, other countries are represented when the material about them is written in english. cooperative programs in the audio-visual field are included. the annotations explain the nature of the cooperative projects and give the names of participating libraries. an appendix describes briefly about 35 recent cooperative ventures not yet reported in the literature, which the editor learned about through an appeal published in professional journals. entries are arranged chronologically to facilitate direct access to the most recent developments and to permit tracing the evolution of a particular project over a period of time. three indexes provide approaches to the material by 1) name of author, cooperative project or library organization, 2) type of c ~ .... c ~ .... c;· ~ < ~ c.o ......... ~ t) (!) @ o(!) v'"l ..... :s 0 marc based sdi service j bierman and blue 311 low and high values are entered into the system and put directly into the lc table for searching of the marc tape. table 3 presents some lc classification numbers as they might be keypunched and entered into the system and a brief explanation of what records will be pulled as hits (matches). lc table entries are in the form of aann,nn where aa stands for the two possible initial letters and nnnn stands for the four numbers following the initial letter( s) and immediately preceding the first decimal point or next alphabetic character. zero is the lowest value and z is the highest. mter all classification number cards have been converted to table entries, the marc tape is read, the lc and dewey numbers are pulled from each record, and both tables are searched for hits (matches). 
the dewey classification number from the marc record is read and converted into a fixed-length 10-position numeric field. for example, the classification number 020/.6234/5456 from the marc tape would be converted to 0206234545 and the number 025.3/02 would be converted to 0253020000 before dewey table searching. if a classification number card had been 020-029 (see table 2), both of these records would have been a hit. the lc classification number read from the marc record is first converted to the form aannnn and then searched against the lc table. for example, the classification number z665.h45 from the marc tape would be converted to z00665 and z678.3.k39 would be converted to z00678 and then searched against the lc table. if the last entry in table 3 had been input into the system, these records would both be hits, as their lc numbers lie between zooool and zolooo. if a match is found in either table, the marc record is transferred in the original marc format to the output tape with the list code. mter odl-07 is completed, control passes to odl-07x. odl-fp7x program inputs are the header tape from the previous run and the detail tape containing the selected records from the previous ( odl-07) run. outputs are the sdi listings by subject areas (list code). figure 3 is a detail flow chart for odl-07x. the first record is read from the header tape and the detail tape is then searched for matching list codes. when a match is found, the marc record is formatted and printed. when the entire tape has been searched, the next header is read, the detail tape is rewound and the process is repeated. this continues · until all header and detail records have been matched and printed. the result is a series of sdi lists, each in lc card number sequence. see figure 4 for a sample of two printed records from a library science list. presently, the weekly lists are being printed on two-up, three-part, perforated teletype size (8~" x 5~") paper, one record ( sdi notice) to each separable form. 312 journal of library automation vol. 3/4 december, 1970 hskp at end at end yes construct and print record no fig. 3. odl-07x detail flow chart. discussion the· sdi system was written with flexibility as one of the main considerations. dewey classification number cards in ahnost any format can be machine converted to the intended table entry. both ranges and individual classification numbers are allowed. any number of dewey and lc entries and any number of lists can be handled simultaneously, the only limit being core size. the selection tables, not being built into the programs, can be changed at any time, weekly if desired. the print format generally follows traditional catalog card arrangement, the major difference being that each subject heading and added entry appears on a new line and is not numbered. the print program can be easily adapted to any conversion table desired; delimiters, field terminators, etc. are referred to symbolically. there is an optional feature which allows marc based sdi servicefbierman and blue 313 09/03170 ll erary science stevens, ~ary elizabeth• autc~atic indexing, a state•cf•the•art report. reissued with additions a.no ccrrecticns. washington, u.s. national bureau of standards, for sale by the supt. of docs., u.s. ggvt. print. off., 1970. vit 290 p. 26 cm. 2.25 (national bureau of standards ~onograph 91) •a united . states department of co~~erce publication.• includes eiblicgraphies. automatic indexing. t.s. national bureau cf standards. monograph 91 cc1co.u556 no. 
91, l97c 029.5 73•t07239 marc • oklahoma oklahoma d.,artm£nt of lo1uoon sdj usu inr<>rmatoot< s••voco 09/03170 library sci ehce librarianship and literature, essays in honour of jack paffcrc. ecitec by a. t. milne. london, athlone p., 1970. viii ., ·141 p., 4 plates. illus., port. 23 cm . 40/incluoes eibliographical references. £r. j. ~. p. paffcrot by a. t. mjlne.--1. the british museum in recent times, by sir f. fra~cis.--2. the education cf a librarian, by r. irwin.••3. library cd-operation in great britain, by s. p. l. filcn.••4. the development of british university libraries, by j. w. scott.--5. problems of a special library, by r. tho~as.••6. t~e growth of literary societies, by a. brown.••7. the editor and the literary textt requi~e~ents ano opportunities, by ~. f. brooks.••b. some leaves frcm a thirteen•centurv illuminated manvscript in the university of london library, ay f. wormal0.••9. a bibliography of j. h. p. paffort, by j. harries and r. wo pound. library science••acdresse·s, essays, lectures. pafford, john henry pyle. ~ilne, ~lex~niler taylor, eo. pafford, john ~enry pyle. z6~5.l57 c20/.9~2 10~477193 ~85111179 marc • oklahoma . oklahoma o .. artm£nt 01' liuaiifs sol u ... infoiimation suvicf fig. 4. sample sdi notices. 314 journal of library automation vol. 3/4 december, 1970 any character or characters to be deleted and the resulting gap closed; this is desirable for diacriticals until better techniques for handling them are devised. both line and page length are referred to symbolically and can be easily changed to fit any form desired. line spacing and indentation are built into the present program, but even these can be changed. the major disadvantage of the sdi system as it now exists is that it allows selection by classification numbers only. unlike the marc i experimental sdi system at indiana university (16), which allowed for selection by weighted terms (both classification number and subject heading), this system allows for classification number selection only. programming difficulties, expense, and the necessity for additional processing time inhibit searching on subject headings. for selection of detailed subjects, subject heading searching is essential; however, for making subject searches in subject areas classification number searching seems more expedient, as it would be difficult to determine, and expensive to input, all of the subject headings for the field of law, for example. ideally, a marc-based sdi system would be able to provide selection based on classification numbers and/or subject descriptors. computer, language and cost the computer for which the programs were written was an ibm 360/30, 32k core size, one card read/punch, four tape drives, two disk drives and one printer. the programs have also been successfully run on an ibm 360/25 with one card read/punch, two tape drives and one printer. in the latter case, the first program was modified slightly because only two tape drives were available, whereas the sdi system normally requires three. modification was easily accomplished by having the header records punched rather than written in odl-07. the programs are written in cobol for the 360, operating under dos. very little modification would be required to operate under os. being written in cobol, the programs are easily adapted from one machine to another; they have been successfully run on a rca spectra, for example. 
they also are easily adapted and changed, the symbolic names and procedure division paragraph headings having been carefully selected to build in as much documentation as possible. following is a breakdown of the charges to the department of libraries for programming and machine time for development; department of libraries' staff time, overhead costs, and operating costs are not included. programming and debugging ------------------------$2,941.00 machine & operator costs for testing ___________ 452.00 operating costs are more difficult to determine and nearly impossible to evaluate meaningfully. the total amount of computer time required (and therefore the major cost) is primarily a function of the number of records on the marc tape being searched and the number of selected marc based sdi servicejbierman and blue 315 and printed records. if the marc tape contains 1,200 records, it takes about twelve minutes (clock time) of computer time (ibm 360/30, 32k) to select the desired records ( odl-07). as the total of classification numbers being searched increases (that is, as the dewey and lc tables grow), the computer time for selection does not appear to increase significantly. the print program ( odl-07x) is directly a function of the number of lists being produced (the number of times the detail tape must be rewound and re-read) and the total number of records being printed. as an example, ' if six different lists are being produced and a total of 375 records are being printed out, the computer time is 25 minutes. therefore, producing six weekly lists with an average of 62 records for each list takes approximately 37 minutes (clock time) each week. at the rate of $60.00 an hour, this is $37.00, or approximately 10c per record selected and sdi notice printed. table 4 presents a detailed analysis of five weekly runs. the total computer time is the number of minutes which were charged to the department of libraries by the computer center. since the department is charged one dollar per minute, this is also the dollar cost to the department for computer and operator costs for that weekly run. unfortunately, the total time given includes time for set-up and other factors. therefore, meaningful patterns are difficult to discern, as one week it may take several minutes longer to get the forms inserted and lined up in the printer, forms may break another week, etc. the remainder of table 4 is exactly accurate. it is interesting to note how much variance there is from week to week in the number of sdi notices for each subject list. for example, out of 889 marc records on the marc tape run on july 23, 16 were library science titles. however, the marc tape run on august 6 contained 1,201 records but only 12 were library science titles. in addition, notice that the library science list was reprinted seven times, and for the last two weeks reprinted five times, to get the total number of copies needed for the 25 subscribers to the list. current uses the uses to which the system is presently being put are in three general areas: 1) sdi lists for internal use of the department, 2) sdi lists for state government, and 3) sdi lists for other libraries. the department currently produces subject lists primarily for its own use in the areas of law and political science. since the department maintains specialty collections in these two subject areas, it is anxious to obtain the most current information on materials published in them for selection purposes. 
because the marc record comes out before the corresponding proof slip is distributed ( 17), use of the marc file has been a most successful means of obtaining complete and verified bibliographic information for the purpose of ordering new books. in addition, complete lc cataloging information is available should the proof not have arrived at the time the book is received. because the lists are currently being printed on three2'able 4. sample run times and list lengths. r"' cl' -8 a .. " ~ n 3 '" ~ .. " 1 1 1 1 75 92 118 22 /:j yl. lis ll. 1 1 1 1 7 1 r~ 100 ?n 71 83 100 20 1 1 1 1 60 65 73 21 60 65 73 21 1 1 1 1 61 80 89 29 61 80 89 29 1 1 1 1 80 113 106 31 80 113 106 31 c )> c .. ~ .. "!?"!?-0 "' 2. 2. (') ~ ~ .. g; .. !1. a (;' a 1 1 -15 41 -i:j 41 -1 1 -r u -8 44 -1 i 1 5 34 16 5 34 16 1 1 1 11 38 17 11 38 17 1 1 1 11 42 22 11 42 22 number of print runs number of marc records selected number of sol notices printed number of print runs nun>t>e_r of marc records selected number of sol notices printed number of pri nt runs number of marc records selected number of sol noli ces printed number of print runs number of marc records se ected number of 501 notices printed number of pri nt runs number of marc records selected number of sol not ices printed c.;, ...... :: "'! ;i ~ .q.. ~ ~ ~ "'i ~ > >:: cs .... ;::s a ~· ~ ~ c.;, .......... ~ t::1 ('t) (') ('t) s 0" j~ ...... ~ marc based sdi servicejbierman and blue 317 part teletype paper, one record per sheet, it is easy to separate the record to be ordered and send one copy to acquisitions, retaining one copy for the files, and sending one to the interested individual in state government with a note that the book is on order. the department also produces a special list of many different subjects which are of interest to the legislature for the legislative reference division of the governmental services branch. the legislative reference division can then order particularly useful materials quickly and route a copy of the sdi printout to the interested legislator or legislative committee. the department has prepared profiles of the state agencies having a large planning and research role. lists are prepared weekly for the department of education, department of corrections, department of vocational-technical education, department of welfare, industrial development commission, department of highways, and several small agencies, and are sent to the person responsible for planning and research within the department. he can then request books from the lists by returning one copy of the sdi notice to the department of libraries with a note to order, retaining the other copy for his files or routing it to a researcher particularly interested in the subject. certain lists are being produced and shared with libraries around the state. the law and political science lists are being sent to two law schools in oklahoma. the library science and bibliography lists are being sent to the library school and the two largest public library systems, as well as the two state universities. over 25 libraries outside oklahoma are receiving weekly library science, political science or law lists ( 18). a cooperative acquisitions program is evolving whereby certain libraries agree to specialize in certain subject areas so that every subject area would be covered by one library for specialized materials not needed by all libraries. 
currently, the program involves the two major public libraries and the department of libraries wherein the state teletype network (otis) is used to transmit rapidly information on expensive materials for cooperative acquisitions. selected lists in the specialized subject areas can be produced each week for each of the cooperating libraries to aid them in their selection, acquisition and cataloging of the materials. the uses currently being made have excited the imagination of many people, both within and without the department of libraries. a great deal has been accomplished since the system became operational early in february 1970; however, the possibilities have barely been identified. as mentioned above, one can envision this being the foundation of a cooperative acquisitions program. such a system could form a node of library service to business and industry; currently, some thought is being given to producing weekly lists of materials in automation and computer science (systems analysis, etc.) both for the many state agencies which have automated equipment and for businesses and industries around the state which utilize computer technology. 318 journal of library automation vol. 3/4 december, 1970 conclusion marc is an exciting and potentially valuable innovative new tool available to the library community, useful to improve both its own internal operations and, more importantly, its service to others. nonetheless, before extensive meaningful use of marc will occur, its potential uses must be identified and explored. this article has attempted to give a picture of one such experimental project to improve library service for others within the framework of a particular institution's resources and functions. much more research is needed on potential and operating uses of marc and the results of this research need to be disseminated to the library community. in addition, it is the opinion of the authors that for reasons both of available financial resources and expertise much of the research and development with marc must be a cooperative venture among many different libraries. some work has been done with marc cooperatively throughout the country (nelinet (19), oclc (20), clsd (21), for example) but much more needs to be done. the future of meaningful uses of marc is bright; however, much research and development is yet to be done which can best be done as a cooperative effort. programs and additional information sdi computer programs and services available from the department of libraries to other libraries are described in a publication called "sdi services and costs," available from the oklahoma department of libraries, 109 state capitol, oklahoma city, oklahoma 73105. additional progress reports on the sdi project, as well as other automation projects in oklahoma are reported in the bi-monthly oklahoma department of libraries automation newsletter, which is available on request. references 1. cuadra, carlos a., editor: annual review of information science and technology, 4 (chicago: encyclopedia britannica, 1969), 249-258. 2. studer, william joseph: computer-based selective dissemination of information (sdi) service for faculty using library of congress machine-readable catalog (marc) records (ph.d dissertation, graduate library school, indiana university, september, 1968 ), 1. 3. studer, william j.: "book-oriented sdi service provided for 40 faculty." in avram, henriette d.: the marc pilot profect; final report on a project sponsored by the council on library resources, inc. 
(washington: library of congress, 1968), 180. 4. cuadra: op. cit., 243-258. 5. ibid:. 263-270. 6. bloomfield, masse: "current awareness publications; an evaluation," special libraries, 60 (october 1969), 514-520. marc based sdi servicejbierman and blue 319 7. bottle, robert t.: "title indexes as alerting services in the chemical and life sciences," journal of the american society for information science, 21 (january-february 1970), 16-21. 8. brannon, pam barney; et al.: "automated literature alerting system," american documentation, 20 (january 1969), 16-20. 9. brown, jack e.: "the can/sdi project; the sdi program of canada's national science library," special libraries, 60 (october 1969), 501-509. 10. davis, charles h.; hiatt, peter: "an automated current-awareness service for public libraries," journal of the american society for information science, 21 (january-february 1970), 29-33. 11. housman, edward m.: "survey of current systems for selective dissemination of information ( sdi) ." in proceedings of the american society for information science, 6 (westport, connecticut: greenwood publishing corporation, 1969), 57-61. 12. martin, dohn h.: "marc tape as a selection tool in the medical library," special libraries, 61 (april 1970), 190-193. 13. bierman, kenneth john; blue, betty jean: "processing of marc, tapes for cooperative use," journal of library automation, 3 (march 1970), 36-64. 14. recon working task force: conversion of retrospective catalog records to machine-readable form; a study of the feasibility of a national bibliographic service (washington d.c.: library of congress, 1969). 15. bierman, kenneth john: "marc-oklahoma data base maintenance project," oklahoma department of libraries automation newsletter, 2 ( october 1970). 16. studer, william j.: (op. cit., note 2), 23-37. 17. payne, charles t.; mcgee, robert s.: "comparisons of lc proofslip and marc tape arrival dates at the university of chicago library," journal of library automation, 3 (june 1970 ), 115-121. 18. bierman, kenneth john: "marc-oklahoma cooperative sdi project report no. 1," oklahoma department of libraries automation newsletter, 2 (june & august 1970), 10-14. 19. nugent, william r.: nelinet: the new england library information network. paper presented at the international federation for information processing, ifip congress 68, edinburgh, scotland, august 6, 1968. (cambridge, mass: inforonics, inc., 1968). 20. kilgour, frederick g.: "a regional networkohio college library center" datamation, 16 (february 1970 ), 87-89. 21. the collaborative library systems development project (clsd): chicago-columbia-stanford. unpublished paper presented at the marc ii special institute, san francisco, september 29-30, 1969. lib-s-mocs-kmc364-20141005043847 87 on-line and back at s.f.u. m. sanderson: simon fraser university simon fraser university library began operation with an automated circulation system. after deliberation, it mounted the first phase of a two-phase o~line circulation system. a radically revised loan pol·icy caused the system design and assumptions to be called into question. a cheaper, simpler, and more effective off-line system eventually replaced the on-line system. the systems, fiscal, and administrative implications of this decision are reviewed. the original system when simon fraser university ( sfu) library opened in 1965, circulation of materials was handled by an automated system. 
briefly the method of operation was as follows: to borrow a book, the patron presented a laminated plastic card which had his borrower number and borrower class (faculty, staff, graduate, undergraduate) punched in it. the book itself cont:'lined a keypunched card holding the book's class number and brief author and title information. the book card and the patron's badge were fed into an ibm 1031 data collection terminal. the terminal transmitted the information to an ibm 1034 card punch which punched out a card containing the information from the book card, the patron's borrower number, and the date borrowed. at the end of the day, these transaction cards were used to update the loan master file. the loan master file produced daily a list of all material on loan, and fine and overdue notices for dispatch to patrons. payment cards for fines were also produced daily by the system; these cards were used to cancel fines from the file upon payment of the fine. the loan master file and the daily circulation listing also contained records of all materials on reserve. separate listings were available weekly showing reserve books and reserve photocopied material. at the end of each semester a list was produced of all students owing more than $2 in fines for the purpose of withholding grades until such time as fines were paid. reasons for going on-line the possibility of implementing an on-line system in one of the sfu departments was first discussed in early summer 1968. it was accepted by the computing centre management and the nonacademic department heads that: 1. the use of on-line processing generally was increasing rapidly. 2. the level of sophistication of these systems was not high. 88 ]oumal of libra-ry automation vol. 6 / 2 june 1973 3. there was a shortage of people competent to design, implement, and maintain sophisticated on-line systems. 4. a demand for on-line processing at sfu would develop. 5. sfu would probably move with the general trend toward increased use of on-line systems, and an on-line system ought to be initiated to develop local expertise in anticipation of demand. after further discussion, it was agreed that the department wishing to develop the first on-line system must be able to satisfy the following prerequisites: l. the system should encompass the beginning and the end of a clearly defined process. 2. the system should require the simultaneous use of one or more files by two or more terminals. 3. the system should use relatively large files with a high inquiry and update rate. 4. the system should satisfy genuine objectives of the application department. a survey of the departments showed that the library was the logical choice because: l. it could satisfy the prerequisites. 2. it had experience with automated systems. 3. batch-processing in the loan division could be extended to the on-line mode using the existing line of equipment. 4. the library administration was prepared to make an immediate commitment of resources to the project. the library's objectives were as follows: l. inventory conj1·ol-to gain statistics about the use of the collection. such data were available under batch processing for the general collection, but not for the reserve collection, which, with its loan periods of two hours, four hours, one day, and three days, was handled manually. 2. inventory usefulness-to determine how the library is being used and by whom. 
this information is essential in order to ensure that collection building is a reflection of the realities of the education process of the institution. 3. increased service-by definitiqn, the library is a service institution. if the automated system in batch mode allowed us to speed up the transaction process to handle large volume circulation, and allowed us to produce overdue notices, bills, and statistics, thereby increasing both the efficiency and service of the loan division, then we were satisfying a built-in library objective by implementing data processing in batch mode in the loan division. if the on-line system could give our users instant information on the status of books, then that function becomes a service objective. at sfu, the loan period and penalties for overdue books are the same for all classes of borrowers. the library has never been an enthusiastic supporter of the fines system because on-line and back at s.f.u./sanderson 89 of the general antagonism it creates and because it favors the borrower who can afford to pay. unfortunately, there was no acceptable way to force faculty to pay fines. it yvas thought that the on-line system was the only way to support a system of suspension of borrowing privileges for failure to return books, in lieu of the fines system. 4. cooperation-it was agreed between the three universities of british columbia (simon fraser university, university of victoria, and university of british columbia) that the storage of low-use material in a cooperatively supported lending/storage facility would save in the order of $800,000 per year. it was felt that the on-line system would provide useful statistics for this purpose. 5. future development-it was thought that the on-line system, with its statistics-gathering potential, was a necessary preliminary to the cooperative shelflist conversion of the three universities, in turn thought necessary to provide the kind of bibliographic information to allow collaborative collection building. the reasons why the above justifications later turned out to be invalid are given in a subseq11ent section. phase i of the on-line system (abbreviated system flowcharts of the various stages are shown in appendix 4) the purpose of phase i was to put the general collection on-line in enquiry mode only with batch updating every three hours-on-line updating was to wait until phase ii. in april1969, one full-time programmer analyst and one part-time systems analyst began work on the first phase of the on-line system, using three ibm 2260 graphic display terminals. problems with pgam, the pl/i graphic access method interface program, and multitasking support allowing the use of more than one terminal at a time (it was easy to get one terminal going) meant that by april 1970 the system was just struggling into life. there followed a period of parallel running which was unexpectedly long as a result of some of the problems peculiar to on-line systems (e.g. system down-time; designing a 1'eally effective back-up system to prevent loss of data). this phase lasted until october 1970. by july 1971 it had become apparent that the system was not cost-effective and in august 1971 the system was taken down and replaced by a revised version of the old batch system. the reasons and costs are given in a later section. 
there were three display terminals in the loan division, two for patrons, one for staff, giving the following capabilities:

patron - when the patron typed in the class number of the book he was looking for, according to instructions appearing on the terminal's screen, the information was transmitted to the computer program which searched the on-line loan master file for the required class number. if the book was on loan, a message appeared on the screen giving the class number, borrower number, due date, and whether a hold had been placed on the book. if the book was not on loan or on reserve, or being repaired, or in cataloging, a message to this effect was displayed. if the patron made any errors in his use of the terminal, error routines in the program displayed messages giving corrective procedures.

staff - by use of a special password, staff members could access different modules of the enquiry program. a status query by a staff member would result in all copies of a particular class number being displayed serially on the screen, and since fines and overdues were held on the master file, this type of information was also displayed. other routines available to staff members allowed holds to be placed on books or removed, renewals to be made, and the passwords to be altered. although passwords were a closely guarded secret, it was felt necessary to be able to change passwords in the event of their being learned by unauthorized users.

since on-line updating was not to be incorporated until phase ii, the 1034 transaction cards were input every three hours and the loan master file updated in batch mode. file structure for phase i was based on an indexed sequential type of access to a loan master file which contained one 100-byte record per book on loan, one record per fine, and one record per reserve book. in this way, the loan master file was in the same format as in the batch system. access to the file began with a program check of a small table held in core storage which gave ranges of class numbers with entry points to an index table. taking the appropriate entry point, the index table stored on disc was accessed. this gave the class number which headed each track of the loan master file. the index table was scanned for the appropriate track. each track of the loan master file contained fifty-four records with eighteen spaces for updates. whenever a record was changed or a new loan inserted, the new record was inserted in the update area. at the end of the day, the file was stripped of its update records and the old batch update program was used to update the loan master file. the loan master file was rewritten to disc the following morning, ready for the day's updates. total file space allocated was fifty cylinders.
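the access path just described (a small in-core table of class-number ranges pointing into a disc-resident index of the class numbers heading each track, followed by a scan of the fifty-four fixed records and eighteen update slots on the selected track) can be summarized in a short sketch. what follows is a modern, purely illustrative python rendering of that lookup logic, not the original pl/i implementation; the table layouts, field names, and the read_track helper are assumptions made only for the example.

    # illustrative sketch of the phase i two-level lookup: core table -> disc index -> track scan.
    # layouts here are invented for the example; the real file held 100-byte records, with
    # fifty-four primary records plus eighteen update slots per track.

    def find_loan_record(class_number, core_table, disc_index, read_track):
        """core_table: list of (low_class_number, index_entry_point) ranges, sorted ascending.
        disc_index: list of class numbers heading each track, sorted ascending.
        read_track: function returning (primary_records, update_records) for a track."""
        # 1. scan the small in-core table for the range covering this class number
        entry_point = 0
        for low, point in core_table:
            if class_number >= low:
                entry_point = point
        # 2. scan the disc-resident index from that entry point for the right track
        track_no = entry_point
        for i in range(entry_point, len(disc_index)):
            if disc_index[i] <= class_number:
                track_no = i
            else:
                break
        # 3. read the track; check the update area first, since it holds the newest versions
        primary, updates = read_track(track_no)
        for record in updates + primary:
            if record["class_number"] == class_number:
                return record
        return None   # no record: the book is not on loan, on reserve, or carrying a fine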
phase ii and the demerit system

phase ii was to see the on-line processing of loans and returns, the master file being updated at the time of the transaction instead of in three-hour batches. the reserve collection was to be automated and go on-line. the recording of holds and the production of hold slips for patrons and books was to be fully automated. detailed statistics of the use of the reserve collection were to be obtained. one of the major objectives of phase ii was the replacement of the fines system by a demerit system. under the demerit system a patron would accrue penalty points for the length of time a book was overdue. after a certain level was reached, a warning notice was to be sent out informing him that his privileges would be suspended if a particular level of points were exceeded. if he then exceeded this level, his borrowing privileges would be suspended, and whenever he subsequently presented his library card to take out books, the checking procedure in the program would find his borrower number invalid, prevent the transaction being recorded, and print a message on a 2741 terminal giving the reason for suspension. after a given period, borrowing privileges would be restored, provided that overdue materials had been returned. at exam times, penalty points would accumulate more rapidly, as they would also for reserve materials which had short loan periods.
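the proposed demerit scheme is essentially a point-accrual rule with two thresholds. the following is a minimal python sketch written only to make that rule concrete; the point values, thresholds, and multipliers are invented for the example, since the article does not specify them, and the design was never implemented at sfu.

    # hypothetical demerit accrual: all constants below are illustrative assumptions.
    WARNING_LEVEL = 20      # points at which a warning notice would be sent
    SUSPENSION_LEVEL = 40   # points at which borrowing privileges would be suspended

    def demerit_points(days_overdue, is_reserve=False, is_exam_period=False):
        """points accrue with the length of time a book is overdue; reserve items and
        exam-time lateness accrue faster, as the proposed rules required."""
        rate = 1
        if is_reserve:
            rate *= 3
        if is_exam_period:
            rate *= 2
        return days_overdue * rate

    def borrower_status(total_points):
        if total_points > SUSPENSION_LEVEL:
            return "suspended"        # borrower number treated as invalid at the terminal
        if total_points > WARNING_LEVEL:
            return "warning"          # warning notice produced
        return "in good standing"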
file organization for phase ii was to be altered from that of phase i principally to allow easier retrieval and updating. a master index file would contain a brief record (26 bytes for class number, 4 bytes for relative address) for every cataloged book in the library. this index file would lead into the loan master file, which would consist of variable-length records: one fixed-length portion holding the class number and author-title, followed by varying numbers of fixed-length sections giving details of the loan transactions. the number of transaction sections would depend on the number of copies of the book which were on loan. anticipated file sizes were 60 cylinders for the master index and 30 to 40 cylinders for the loan master file. the increase in file-handling efficiency and in restarting with no lost data after system down-time were seen to compensate for the increase in space allocation.

loan policy changes

problems with the system of fines and proposals such as the demerit system led to the suggestion that a survey should be made of campus opinion on the library loan policy. an examination of the results of the questionnaire and the comments obtained led to the submission of a somewhat different loan policy to the senate library committee. this policy, briefly, was a recall system with the two-week loan period changed to a semester loan period for general loan material, and retention of the current fines system for reserve materials until the implementation of phase ii. failure to respond to recall was to be penalized by suspension of library service. the system was to be experimental for two semesters. the decision to adopt a recall system had an immediate impact on system development for phase ii:
1. specifications for phase ii needed to be reworked.
2. the demerit system was no longer required.
3. interim procedures were required to handle the recall system until the inception of phase ii.
4. file size growth became unpredictable because it was not known whether all books would stay out until the end of the semester or be returned at more frequent intervals. this could indicate a file size of between 30,000 and 80,000.

revision of thinking on on-line circulation

two significant developments made it advisable for the library to reconsider its need for an on-line system in terms of both its benefits for the library and its economic justification. the first development was, as indicated, the radical revision of library loan policy, namely the proposed adoption of a semester loan period supported by a recall system. the second was a detailed costing of the equipment requirements for phase ii of the on-line system, weighing the relative merits and costs of two alternative manufacturers. these costs have turned out to be significantly higher than originally anticipated. consequently, it was seen that the costing done for phase ii should be done again in the light of the new developments. the original benefits of the on-line system were also reexamined.
1. inventory control - this still applied as far as the reserve collection was concerned. these statistics would have to be gained in some other way insofar as they are additional to the statistics now collected manually.
2. inventory usefulness - this was no longer a justification. by this time we had developed collection analysis programs which give a fine breakdown of the collection into separate disciplinary areas and give total volumes and book usage by borrower class in these areas. further development of these programs could give more information; e.g., referencing the registration system files could give information correlating students, courses, and book usage.
3. increase in service - this was no longer a justification. (a) the implementation of the recall system with its attendant suspension of privileges does not demand an on-line system for its operation as would the previously proposed demerit system. with a suspension of privileges for those owing over $25 tested in early 1971, we were operating a manual system of borrower control successfully, leading us to assume that the recall system's control system would similarly function well. (b) nobody ever complained that the information on the batch system was too old (eighteen hours old at maximum). we had even had messages (anonymous) left by frustrated users of the on-line terminals which could be paraphrased as: "what was wrong with the old system?"
4. cooperation - this was no longer a justification. extensions of the work on collection analysis mentioned in 2 above could help in the identification of high- and low-use items and thus provide an alternative way to save the estimated $800,000 per year. work on collection comparison between the three british columbia universities is already underway in a tri-university task force.
5. future development - this was no longer a justification. shelflist conversion should have been hastened by the abandoning of the on-line loan system insofar as resources would be freed to work on the conversion, which is of far greater importance to the future information-handling capability of the library than knowing within four seconds whether or not a book is on loan, especially as the time taken in reshelving of books makes this loan information prone to inaccuracy.

it thus appeared that the reasons used to justify an on-line system were no longer valid, if, indeed, they ever were. when examining the cost figures again in view of the proposed recall system, the amortization of the development and equipment costs no longer seemed possible. the cost of the batch and on-line system equipment is shown in figures 1 and 2 for both ibm and colorado instruments (now mohawk). it can be seen that the difference in equipment costs between the proposed batch system and phase ii would have been over $15,000 per year. (some of the savings in equipment rental has been used to microfilm the subject catalog for distribution to three floors in the library which do not have easy access to this catalog.)
the manual procedures involved with fines which phase ii was to eliminate are now considerably reduced by the recall system. the development costs of phase ii have been replaced with the cost of returning to the old batch system in a slightly improved form. the cost of this, at the computing centre, was $2,123.76. it had been predicted that writing phase ii in minerva and mark iv (two high-level program language packages) would make considerable savings in the impact on computing centre operations. however, even taking this into account, there still remain the development costs and at least $15,000 per year for extra equipment (the difference between the equipment costs for phase i, figure 1, and phase ii, figure 2). see the appendixes for cost comparison and projections.

fig. 1. equipment costs (1971), ibm vs. colorado, phase i and off-line

colorado (3-year lease), monthly:
  3 c-deks @ $131.29: $393.87
  3 c-dek cable terminals @ $2.14: 6.42
  1 central controller: 137.25
  1 controller cable terminal box: 2.25
  2 mag tape-recorders: 268.20
  subtotal: 807.99
  less service-free discount @ 12%: 100.00; net: 707.99
  installation: probably free
  service contract: approximately 122.00
  total colorado monthly: $829.99

ibm, monthly:
  1 1031a terminal @ $100.34: $100.34
  1 1031a terminal: 105.35
  1 1031b terminal: 64.12
  1 1034 card punch: 328.73
  (includes educational discount)
  subtotal: 598.54
  installation: equipment already on site
  total ibm monthly: $598.54

fig. 2. equipment costs (1971), ibm vs. colorado, phase ii

colorado (3-year lease), monthly:
  data collection: 5 c-dek 3213 @ $131.39: $656.95; 5 c-dek cable terminals @ $2.14: 10.70; 1 3216 central controller: 137.25; 1 controller cable terminal: 2.25; 1 interface coupler: 112.50; subtotal: 919.65; less 12% discount: 110.36; net: 809.29
  library share of memorex 1270: base (1/32 of $1,011): 31.00; line adapter (1/4 of $28): 7.00; modem: 33.00
  back-up: 9-track mag-tape recorder with free back-up switching rpq: 134.10
  printers: 2 2741 @ $90.70: 181.40
  display terminals: 4 2260 @ $46.74: 186.96; share of 2848: 311.10
  systems equipment total: $1,693.85
  service contract (prime shift only): 195.00
  total monthly cost: $1,888.85
  equipment freight charges: approximately $100.00

ibm, monthly:
  data collection: 2 1031a terminals @ $100.34: $200.64; 2 1031b terminals @ $64.12: 128.24; 1 1031a terminal @ $105.35: 105.35; 1 2711 data set: 115.00; subtotal: 549.23
  additional 2703 attachments: 1 4879 600 bps: 12.00; 1 4697 type ii control: 40.00; 1 3205 data line set: 86.00; 2 4790 line adapters @ $12: 24.00; 1 7506 @ $86 (library pays half?): 43.00; subtotal: $205.00
  back-up: switching rpq: 36.00; clock: 98.21; 1034 card punch: 328.73
  printers: 2 2741 @ $90.70: 181.40
  display terminals: 4 2260 @ $46.74: 186.96; share of 2848: 311.10
  systems equipment total: $1,896.63
  service contract: nil
  total monthly cost: $1,896.63
  installation and check-out: $1,390.00

the present recall system

the recall system has been in operation since august 1971. its principal features are as follows: that books be loaned for a period of one semester; that they be subject to recall after a period of two weeks from borrowing; that they become due on the last day of exams; that there be a penalty for failure to respond to recall; that there be a penalty for failure to return books after exams; that the penalty be suspension of library privileges plus a $5.00 fine. in the case of failure to respond to recall, the $5.00 fine is levied five days after the recall notice is sent. in the case of failure to return books after exams, the fine is $1.00 per day to a maximum of $25.00, starting at the end of the semester. listings of overdue books will be run during this period only, and a fine payment card produced and kept in the loan division. as in the first system, the fine payment card is used to cancel fines upon payment. the fine system and checking of delinquent borrowers is being successfully handled manually. finally, privileges will be restored only when the patron has both returned the books and paid the fine.
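the two penalty cases above reduce to a small amount of date arithmetic. the sketch below is a modern, purely illustrative python rendering of the schedule exactly as the article states it ($5.00 levied five days after an unanswered recall notice; $1.00 per day to a maximum of $25.00 after the end-of-semester due date); the function and parameter names are ours, not sfu's.

    from datetime import date, timedelta

    RECALL_GRACE = timedelta(days=5)   # $5.00 fine levied five days after the recall notice
    RECALL_FINE = 5.00
    DAILY_FINE = 1.00                  # $1.00 per day after the end-of-semester due date
    MAX_FINE = 25.00

    def recall_fine(recall_sent, returned):
        """fine for failure to respond to a recall notice."""
        return RECALL_FINE if returned > recall_sent + RECALL_GRACE else 0.00

    def semester_end_fine(due_date, returned):
        """fine for failure to return a book after exams."""
        days_late = (returned - due_date).days
        return min(max(days_late, 0) * DAILY_FINE, MAX_FINE)

    # example: a book due 1971-12-17 and returned 1972-01-13 accrues the $25.00 maximum
    print(semester_end_fine(date(1971, 12, 17), date(1972, 1, 13)))  # 25.0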
the automated part of the system is similar to the original system described earlier, except that fine and overdue notices are produced only at the semester end as mentioned. the reaction of the staff handling the recall system has been favorable, as has been the reaction of patrons. initial fears that a high percentage of the books in the collection would be out all semester and be returned en masse at the end have proved unfounded. the number of books out at any one time is often less than under the previous system. people seem to be returning books when they have finished with them and taking out fewer at a time; thus, browsing and usage are not affected. books began returning at 2,000 per day on november 30, 1971, in anticipation of the december 17 due date (the master file standing at around 34,000 books on loan at this point). on december 19 only 4,864 books had not been returned. by december 29 this was down to 2,169, and by january 13, 1972, down to 394. recalls have fluctuated between 35 and 130 per day, and of these an average of 8 recalls per day have not been picked up by the recaller. by contrast, under the fines system, the daily production of fines, overdue, and hold notices was between 500 and 700. the total amount of fines from september 1, 1970 to november 17, 1970 was $11,021.32. from september 6, 1971 to november 17, 1971 the figure was $2,405.03, a difference of $8,616.29. thus, although people are making similar use of the library, judging by the circulation statistics, it is not costing them as dearly.

costs

comparative computer operating costs are shown in table 1.

table 1. comparative computer operating costs
  period  | system               | average monthly computer cost | computer model
  1969-70 | old batch system     | $3,100                        | ibm 360-40
  1970-71 | phase i, on-line     | $3,851                        | ibm 360-50
  1971    | recall system, batch | $1,178                        | ibm 360-50
  1972-73 | recall system, batch | $514                          | ibm 370-155

the annual average cost of computer processing is now $6,168 rather than the $19,320 projected in appendix 1. staff salaries have risen in the two years since august 1971, and loans staff costs are now $33,200 instead of $21,267. total annual cost is now $6,168 (computer time) + $7,182 (equipment) + $33,200 (loan staff and materials) = $46,550. this is less than the projected annual cost of $57,994. the recall system certainly seems so far to be making the predicted savings, and the increase in good will in the university community is something we must also take into account on the credit side.

conclusion

as is stressed so often in systems analysis theory, and sinned against so often in practice, a clear statement of objectives is required and a thorough cost/benefit analysis of all alternative solutions is needed to prevent unwanted solutions of unreal problems. a first question should be: "what are we really trying to achieve here?" rather than: "i wonder if we could apply system x in this situation?" automation is one of many possible solutions to a problem. an on-line system is one of many possible automated solutions.
the management aspects of the decisions in setting up an on-line system were referred to in "reasons for going on-line." the thought of taking the on-line system down again was born of a number of factors. in the first place, feeling on campus caused the loan policy to evolve in a way not predictable at the time of system design. in the second place, we learned that on-line systems are not to be treated lightly. they require a great deal of careful design and technical competence if they are to be as efficient as they are impressive. they embody concepts as different from batch processing as batch processing is from the manual system it may replace. for us, the result was escalating costs, and an on-line system design that could have been better and less costly. the solution finally adopted was the result of considering what were seen to be the real requirements: maximum availability of materials with maximum convenience, and, against the background of the library's general objectives, maximum cost-effective service in an era of tight budgets.

appendix 1. annual circulation system cost summary (as of august 1971)

predicted annual costs are shown for the present batch system (with recall), the proposed on-line phase i (without recall), and phase ii (with recall), together with the savings and losses of the batch system compared with each phase.

  item                              | batch     | phase i   | phase ii  | saving vs i | saving vs ii | loss vs i | loss vs ii
  machine time                      | $19,320   | $37,150   | $42,000   | $17,830     | $22,680      |           |
  forms: overdue notices            | 950       | 3,250     | 950       | 2,300       |              |           |
  forms: fine notices               | 15        | 48        | 15        | 33          |              |           |
  printouts                         | 3,200     | 960       | 1,000     |             |              | $2,240    | $2,200
  postage for overdues and fines    | 3,530     | 12,000    | 3,530     | 8,470       |              |           |
  envelopes                         | 70        | 250       | 70        | 180         |              |           |
  postage for holds/recalls         | 1,200     | -         | 1,200     |             |              | 1,200     |
  punch cards                       | 1,260     | 1,260     | -         |             |              |           | 1,260
  loans staff: fines                | 2,000     | 6,000     | 2,000     | 4,000       |              |           |
  loans staff: stuffing envelopes   | 600       | 2,000     | 600       | 1,400       |              |           |
  loans staff: looking up addresses | 400       | 1,200     | 400       | 800         |              |           |
  reserves staff                    | 18,267    | 18,267    | 14,763    |             |              |           | 3,504
  equipment: 1030 system            | 7,182.48  | 7,182.48  | 14,606.04 |             | 7,423.56     |           |
  equipment: 2260 terminals         | -         | 1,682.64  | 2,243.52  | 1,682.64    | 2,243.52     |           |
  equipment: share of 2848          | -         | 3,733.20  | 3,733.20  | 3,733.20    | 3,733.20     |           |
  equipment: 2741 terminals         | -         | -         | 2,176.80  |             | 2,176.80     |           |
  totals                            |           |           |           | $40,428.84  | $38,257.08   | $3,440    | $6,964

net saving in annual cost of the batch system over: phase i, $36,988; phase ii, $31,293.

appendix 2. gross computer operating costs during phase i

costs shown include all circulation runs.

  month     | cost      | cpu hrs.
  nov. 1970 | $5,402.57 | 36.0241
  dec. 1970 | 4,265.12  | 28.4410
  jan. 1971 | 3,605.33  | 24.0419
  feb.      | 3,937.78  | 26.2595
  march     | 4,349.41  | 29.0043
  april     | 2,981.39  | 19.8820
  may       | 2,421.39  | 16.1487

average monthly operating cost of phase i over seven months: $3,851.85
average monthly operating cost of former batch system: $3,100.00
appendix 3. development costs for phase ii

completion: present system (phase i) converted to minerva, with new file organization, etc., and interface to batch system.
  programming and systems: computing centre (2 months): $1,800; library (7 days): 200; subtotal: 2,000
  programming and systems tests (est., 7 months): 5,600
  pacific western consulting (minerva) at $150 per day (5 days): 750
  subtotal: 8,350
  computer time (est.): 1,500
  forms, staff training: 50
  parallel runs: 350
  minerva total: $10,250

phase ii on-line:
  programming and systems: computing centre (13 months): 11,700; ibm support (48 days): 1,400; library personnel: 1,900; subtotal: 15,000
  programming and systems tests (21 months): 16,800
  pacific western consulting (10 days): 1,500
  subtotal: 33,300
  computer time (est.): 10,000
  forms, staff training: 1,000
  parallel runs (33 days at $35 per day): 1,155
  equipment rental (@ $1,200 per month additional): 1,300
  total development, phase ii: 46,755
  total system development: $57,005 (already spent, in addition: $11,576)

appendix 4 (abbreviated system flowcharts)
(a) original circulation system: 1030 system with 1031 badge-card readers and ibm 1034 card punch; daily circulation listing; circulation cards; payment cards, lost book bills, reserve bills, etc.; reserves listing by course.
(b) phase i: create on-line loan master; inquiry and update program (status, holds, and renewals); 1031 badge-card readers, three in general loans and two in reserves; reserves listing by course (weekly); back-up 1034 card punch.
(c) proposed phase ii: create on-line loan master.

fig. 1. dump of contracts & grants office master tape record

original specifications for these project records had in fact included a gesture toward information retrieval in the form of a 5-digit "discipline" code; this code had quickly become null when problems of maintenance and interpretation revealed themselves. however, the regular monthly entry and updating of other descriptive data was already twelve months underway at the time the cooperative project was suggested.

library index

the original proposal for a library index to the c&g file was production of a standard kwic (keyword-in-context) index to project titles, to be based on use of an ibm share library program developed by computer center staff. after review of available library programs and output, the product was finally specified to be a kwoc (keyword-out-of-context) index to project titles, using the chief investigator's name as key to a second "bibliographic" section of project summaries (figures 2, 3). a third section was added to list project summaries indexed by campus department name (figure 4). the c&g file included 12 elements which the library considered to be of general campus interest. these elements, comprising the project summaries, are: (1) project title; (2) chief investigator's name; (3) award status (i.e., funded or proposed); (4) project site (i.e., campus or affiliated institution); (5) project type (e.g., training, basic research, applied research); (6) grant number; (7) total project duration to date; (8) award period; (9) award amount; (10) granting agency name; (11) campus department; and (12) school. items 10, 11, and 12 of this list exist as numeric codes in the records,
with decoding tables comprising a separate file at the beginning of the tape volume. these items, together with items 3, 4, and 5, are coded on input but are not uniformly edited by the regular c&g update program.

fig. 2. keyword subject section
fig. 3. chief investigator section

program requirements

necessary program functions included the selection of active grant records; the editing, decoding, and formatting of selected data elements for printing; and the extraction and sorting of index terms. these functions were divided between a main routine coded by library staff, an indexing subroutine and print program written by a computer center staff member, and an ibm utility sort. local programs were written in pl/1, using the newly installed pl/1 optimizing compiler, and provided several tests of the compiler's capacities. only projects currently in "award" status, or which have outstanding proposals during the preceding twenty months, are selected for printing, approximately two-thirds of the file at this time. these conditions are tested on various data fields in the c&g record, and data from selected records are reformatted into a partial print line and passed to the keyword extraction subroutine.

fig. 4. campus department section
(sample error listing: file 2 read errors; agency codes not in c&g table; department codes not found; sub codes not found)
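the kwoc indexing step described above (extract each significant word of a project title and list the title under that word, with the chief investigator's name serving as the key into the summary section) can be illustrated with a short sketch. this is a modern python illustration of the general kwoc technique, not the original pl/1 routine; the stop-word list and record fields are assumptions made for the example, while the sample titles are drawn from the figures.

    # illustrative kwoc (keyword-out-of-context) extraction; stop words and fields are assumed.
    STOP_WORDS = {"of", "the", "and", "in", "for", "a", "an", "on", "to", "with", "its", "their"}

    def kwoc_entries(projects):
        """projects: list of dicts with 'title' and 'investigator' keys.
        returns (keyword, title, investigator) tuples, sorted for the keyword subject section."""
        entries = []
        for p in projects:
            for word in p["title"].split():
                word = word.strip(".,;:()").lower()
                if word and word not in STOP_WORDS:
                    entries.append((word, p["title"], p["investigator"]))
        return sorted(entries)

    sample = [{"title": "metabolic routes and their control", "investigator": "singer, t"},
              {"title": "studies of bile pigment metabolism", "investigator": "schmid, r"}]
    for keyword, title, investigator in kwoc_entries(sample):
        print(f"{keyword:12} {title}  --  {investigator}")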
once the appropriate pieces of html code had been replaced with corresponding include statements, the changeover was complete. from this point forward, changes in such things as database names, coverage periods, and descriptive material will be made to one .txt file. the change will immediately be reflected across all subject pages with no additional work involved for the librarians responsible for those pages. note that the sul server has been configured so that it parses all web pages. this is necessary because most of the library's web pages have some ssi. this configuration means that the web page extensions remain .html. if the server is not configured in this manner, then all pages containing ssi must end in a .shtml extension. this is a subject that requires discussion with automation librarians or the department responsible for the library's server.

advantages

obviously, the biggest advantage to this method is the time saved for individual librarians. there is now no need for librarians to do any maintenance work for links to information housed in the alphabetical list. static html pages referencing gale's infotrac onefile database, for instance, would have required updates to approximately forty subject pages; now, one librarian can correct one .txt file and simultaneously update all forty subject pages. time saved can be used in collecting and editing the list of web sites that are a part of each subject page; this is a task that has been pushed back in the past, in favor of making more urgent database information changes.
fig. 1. html code for academic search elite using a purl called eb-ase (the stored .txt snippet is a table row linking to the database, an image indicating "some full text," and the note "multi-disciplinary database; includes some scholarly articles")
fig. 2. database names, .txt file names, and resultant include commands (e.g., accessible archives / accessible.txt)

in addition, librarians who are using this simple technique do not need extensive training. the creation of the excel database of include commands allows for quick additions to an existing page, or the creation of new subject pages. librarians using the include commands can simply copy and paste them; there is no need for them to understand the syntax or to be able to repeat it. this makes using ssi particularly attractive to staff who do not want the added burden of further training in html. the librarian responsible for creating the .txt files and the excel database of statements demonstrated the copying and pasting of the include statements to all the other librarians who edit html pages in a one-time ten-minute training session. the only additional training issue has involved page structure. since the library uses a table structure for the subject pages, all table tags are included in the database .txt files. making sure that librarians understand that they do not need to recreate the table tags has been the only additional training issue for the department. as librarians begin to use these commands, links to resources across subject pages will look the same and will provide the user with the same information. this increased uniformity results in a more professional appearance for the web site as a whole.

disadvantages

this revolution in the maintenance of subject pages has not been without its disadvantages. the primary complaint by librarians using ssi include commands is that they cannot preview their changes in their html editors. sul's department uses the coffeecup html editor, which allows previews, but the previews are not visible for items that are retrieved using ssis. this is because the page is not fully assembled until the server assembles it. when the librarian views the page in the editor,
of course, the size of sul's reso urce list makes this kind of soluart & tec h ebsco tion feasible ; certainly, if the librar y were working with hundreds of resources, it would be more likely that a datab ase -driv en strategy would be ad op ted . the simplicity and elegance of the ssi include command process has encourage d adoption, and sul ha s seen no ill effects from the us er side of operations. librarian web au th ors qui ckly overcame any slight di sco mfort with the new proc ess and are now able to devote a portion of editing time to other, less m ono tonous tasks. references and notes 1. carla dun smore, "a qualitative study of web-mounted pathfinders created by academic business libraries," libri 52, no . 3 (sept. 2002): 140-41. 2. charles w. dea n , "th e public electronic libr ary : web-based subj ec t guides," library hi tech 16, no. 3-4 (1998): 80-88; gary rob erts , "designi ng a database-driven web site, or, the evolution of the infoiguan a," computers in libraries 20, no. 9 (oct. 2000): 26-32; bryan h. davidson, "database-driven, dynamic content delivery: providing an d managing access to online resources using microsoft access and ac ti ve server pages," oclc systems and services 17, no . 1 (2001): 34-42; marybeth grimes and sara e. morris , "a co mp ari so n of academic librarie s' webliographies, " internet reference services quarterly 5, no . 4 (2001): 69-77; laur a ga lv an -estra da, "moving towards a user-cent ere d, database-driven web site at th e ucsd libraries," index to advertisers 179 200 lita internet reference services quarterly 7, no. 1-2 (2002): 49-61. 3. roberts, "infoiguana "; davidson, "da tabase driven"; galvanestrada, "user -cen tered, database-driv en web site." 4. davidson, "database driven," und er " int roduction ." 5. ibid., under "developm ent conside ra tions." 6. roberts, "infoiguana ," 32. 7. ga lvan-estrada, " u ser -centered, database-driven web site, " 55-56. 8. jody co ndit fagan, "server -side includ es made sim ple, " the electronic library 20, no. 5 (2002): 382-83 . 9. michelle mach, "the service of serv er -side includes," information technology and libraries 20, no. 4 (2001): 213. 10. greg r. notess, "serv er side includes for site management," online 24, no. 4 (july 2000): 78, 80. 11. ibid. 12. mach, "se rvice of server-side includ es," 216. 13. ibid., 214. 14. fagan, "server -side includ es m ade simple," 387. 15. ibid., 383. 16 . ibid. 17. ibid. 18. apache httpd server project, "apac h e http server version 1.3: secu rity tips for server configurati on," th e apache softwar e foundation. accessed oct. 29, 2003, http: / / httpd. apac he.org/ docs / misc / sec urity _tips .html. 19. an th on y baratta, e-mail to th elis t mailing list, may 16, 2003, accessed nov . 4, 2003, http:/ / lists.evolt.or g/ archive/ week-of-mon-20030512/140824.html. 20. mach, "service of serv er -side includ es," 217. cover 2, 191, covers 3--4 using server-side include commands i northrup, cherry, and darby 197 letter from the editor kenneth j. varnum information technology and libraries | june 2018 1 in this june 2018 issue, we continue our celebration of ital’s 50th year with a summary by editorial board member sandra shores of the articles published in the 1970s, the journal’s first full decade of publication. the 1970s are particularly pivotal in library technology, as it marks the introduction of the personal computer, as a hobbyist’s tool, to society. the web is still more than a decade away, but the seeds are being planted. 
with this issue, we introduce a new look for the journal, thanks to the work of lita's web coordinating committee, and in particular kelly sattler (also a member of the editorial board), jingjing wu, and guy cicinelli. the new design is much easier on the eyes and more legible, and sports a new graphic identity for ital.

board transitions

june marks the changing of the editorial board. a significant number of board members' terms expire this june 30, and i'd like to take this opportunity to thank those departing members for their years of service to information technology and libraries, and the support they have offered me this year as i began as editor. each has ably and generously contributed to the journal's growth over the last years, and i thank them for their service to the journal and to ital:
• mark cyzyk (johns hopkins university)
• mark dehmlow (notre dame university)
• sharon farnel (university of alberta)
• kelly sattler (michigan state university)
• sandra shores (university of alberta)

these are big shoes to fill, but i am excited about the new members who have been appointed for two-year terms beginning july 1, 2018. in march, we extended a call for volunteers for two-year terms on the editorial board. we received almost 50 applications, and ultimately added seven new members:
• steven bowers (wayne state university)
• kevin ford (art institute of chicago)
• cinthya ippoliti (oklahoma state university)
• ida joiner (independent consultant)
• breanne kirsch (university of south carolina upstate)
• michael sauers (do space, omaha, nebraska)
• laurie willis (san jose public library)

readership survey summary

over the past three months, we ran a survey of the ital readership to try to understand a bit more detail about who you are, collectively. the survey received 81 complete responses out of about 11,000 views of pages with the survey link on it. here are some brief summary results:
• nearly half (46%) of respondents have attended at least one lita event (in-person or online).
• three quarters (75%) of respondents are from academic libraries. public, special, and lis programs make up an additional 20%.
• the majority (56%) are librarians, with the remaining spread across a number of other roles.
• almost two thirds (63%) of respondents have never been lita members, a quarter (25%) are current members, and the remainder are former members.
• about four fifths (81%) of responses came from the current issue (either the table of contents or individual articles).

an invitation

what can you share with your library colleagues in relation to technology? if you have interesting research about technology in a library setting, or are looking for a venue to share your case study, get in touch with me at varnum@umich.edu.

sincerely,
kenneth j. varnum, editor
varnum@umich.edu
june 2018

who will steer the ship?

during 1973, the existence of two study groups sponsored by the council on library resources became informally known in the library automation community. by the time of the 1974 ala midwinter meeting, the lack of formal identification of these groups, their goals, and their relation to clr provoked some spontaneous and possibly faulty responses from the ala members present. in the march 1974 issue of jola, ms. ruth tighe analyzed with perception and accuracy the behavior of information scientists attending that meeting.
however, we feel that further attention should be paid to the precise situation in which we find ourselves. the council on library resources has, for eighteen months, funded a small group with the acronym of cembi. informal communication had it that this group of library automation experts originally was to devise a standardized subset of the marc monograph format; however, a full year passed without public announcement of this work. unable to come to an agreement, the group seems to have turned to specific strategies for interchange of machine-readable bibliographic data. these goals are, of course, valid and worthy of pursuit. clr also announced at midwinter the intent to administer a system plan for large-scale serials conversion to create a national serials data base. this project draws upon the considerable efforts of the ad hoc "toronto" group, which provided a status report of its efforts in the december 1973 issue of jola. in addition, an invitational conference was sponsored in april 1974 by clr, to discuss national bibliographic control, with a small number of conference attendees and with a total absence of publicity. there are three major problems inherent in the situation described above, affecting not only the library automation community but libraries as a whole.

first, there is no apparent justification for the air of secrecy which has surrounded clr's direction of these worthy efforts. surely it must be abundantly clear to all administrators these days that in order to implement a far-reaching program it is necessary to inform if not consult with the target population. those librarians who are not associated with clr do not necessarily have axes to grind or home-grown systems to foist upon the world. they do wish to be kept informed of discussion and developments which may eventually have a direct effect upon their work. while it is perfectly reasonable to foster technical progress in a difficult area of study by forming a closed working group of skilled professionals, there seems to be little gained by avoiding recognition of such a group.

second, the approach of these projects has sidestepped all the existing channels of operation and communication which we have been striving for over a decade to create. should clr wish a certain task performed, it should be able to contract with the library of congress or to fund an existing ala committee to carry out the work. under the present circumstances, these established channels are likely to find their deliberations bypassed and superseded by these ad hoc groups.

third, when determining issues (ad hoc standards for local input of marc-like monograph and serials records) which are of long-range concern to many libraries, it is particularly important not to bias a development effort toward the needs of one type of library. the approach of the large research library, while important, is not the only vantage point from which to perceive the problems of nationwide bibliographic systems. we recommend that the council on library resources find an alternate method of accomplishing its goals, a method which includes provision for adequate communication and which takes advantage of existing channels.
such a method, for the cembi group, might be to declare its deliberations to be a user-group standards proposal for submission to the rtsd/isad/rasd representation in machine-readable form of bibliographic information committee (marbi), the appropriate ala committee. if clr wishes more intensive review of this or any other proposal from marbi, it could fund the necessary expenses for more frequent meetings of marbi. an analogous method would establish the serials project as a funded program, with the desired task goals, within the library of congress, the national serials data program, or an appropriate library union serials organization. clr has done many good deeds for the library world in its lifetime; it would indeed be unfortunate were it to inadvertently allow the growth of professional confusion and resentment that were evident at the midwinter meeting. susan k. martin

an interactive computer-based circulation system: design and development
james s. aagaard: departments of computer sciences and electrical engineering, northwestern university, evanston, illinois.

an on-line computer-based circulation control system has been installed at the northwestern university library. features of the system include self-service book charge, remote terminal inquiry and update, and automatic production of notices for call-ins and books available. fine notices are also prepared daily and overdue notices weekly. important considerations in the design of the system were to minimize costs of operation and to include technical services functions eventually. the system operates on a relatively small computer in a multiprogrammed mode.

introduction

although the northwestern university library had given some consideration to the adoption of data processing techniques over a period of many years, it was not until planning for a new library building started that this consideration became serious. an associate university librarian and a systems analyst were added to the staff with specific responsibilities in the "automation" area. the recommendation of the systems analyst was that an on-line system should be designed to integrate all library functions. two areas were isolated for initial development: technical services, including ordering and cataloging, and circulation control. several other decisions were made at about the same time (fall 1967). perhaps most important of these was the choice of computer. the acquisition of a dedicated library computer was ruled out on the basis of cost, leaving the choice to be made between a control data 6400 soon to be installed in the university's computing center, or an ibm 360/30 in the administrative data processing department. it was clear that the ibm 360 would have to be upgraded considerably to handle an on-line system, but the decision was made to use it, based on the facts that it was already installed and operating, that the machine itself was more adaptable to text processing applications, and that the library was an administrative application. a small programming staff was available, and it was decided to use that staff rather than have the library develop its own programming capability. the university's engineering and science libraries were administratively divorced from the rest of the evanston campus libraries to serve as a pilot location for development and testing. one final decision was made by the programming staff.
since there was reason to believe that the use of a real-time system by the library might generate similar requests from other users of data processing services, the system should be capable of extension to other applications, if possible. design then began on a general-purpose file maintenance system. a detailed description of this system will be presented in another paper. actual programming started in spring 1968, and in about a year the teleprocessing system was essentially complete and work was started on various subsidiary programs, to be run on a daily or weekly basis. these included programs for producing catalog cards, purchase orders, and similar materials. however, at this time the realization came that the opening of the new building was less than a year away (construction was on schedule, unlike the situation with several other buildings), and the new library administration felt very strongly that it would be desirable to have an operational circulation system. work then was suspended on the technical services part of the system, but it is important to note that the system developed up to that point provides the on-line inquiry capability to the circulation operation. it is also true that this capability is more sophisticated than would be needed for circulation applications only. after the basic system design for the circulation system was completed in spring 1969, the massive job of preparing nearly a million punched cards for the books was started in the summer. this was done using student operators, working from the shelflist. the most expensive and time-consuming part of the job, however, proved to be the insertion of the cards in the books, and this was not completely finished before the new building opened in january 1970. the computer circulation system was not ready for operation until december, so that it was tested in the pilot library for only a three-week period, hardly enough for a complete cycle of book charges and discharges. operation in the new building was complicated by several factors besides a new and unfamiliar circulation system. the building itself was not quite finished, all of the books were not in place, all of the remote terminals were not installed, and there was a large backlog of work which had accumulated during the moving period. the most serious problem, however, was that a decision had been made to continue the old manual circulation system in parallel with the new one, and with the other problems this became too much of a burden on the library staff. when it appeared that there were no problems with the new system which could not be worked out, the manual system was quickly abandoned. after this point operations began to improve rapidly, and within a few months the system was running quite smoothly. the systems and programming staff has now returned to the implementation of the technical services system.

general description

functionally, the northwestern university library circulation system may be viewed as consisting of three parts. the first of these is a book charge/discharge operation using the ibm 1030 series of terminals. the second part is the general-purpose file maintenance system, originally developed for technical services; and the third part is a group of programs which are run in "batch" mode and thus have no direct interaction with the remote terminals. the teleprocessing program operates in a partition of 36,864 bytes of storage on a 65,536-byte computer.
the ibm disk operating system is used, which requires 8,240 bytes, leaving 18,432 bytes for batch programs. the basic telecommunications access method is used for remote terminal input-output operations. data storage is on an ibm 2321 data cell. the present terminal configuration consists of five pairs of 1031 input terminals and 1033 printers, and four 2740 model 2 typewriter terminals (two of which are used for technical services development). the partition size is probably adequate for one or two more terminals. all of the 1030 terminals share a common telephone line, as do all of the 2740 terminals. two of the 1030 terminals are master units, connected directly to the telephone line, while three 1030s are satellites, operating through a master terminal. one master 1030 and one 2740 are located in the technological institute library; the remaining terminals are in the main university library.

book charge system

each of the 1031 terminals will accept an 80-column punched card which is kept in the book pocket, and also a punched plastic user identification badge. the three satellite terminals are located in the stack area of the library, one on each of three floors adjacent to the elevators. they are used for self-service charge of books. a master terminal, which also includes a manual entry keyboard, is located at the circulation desk on the main floor and is operated by library staff. the keyboard on this terminal allows the staff to perform additional functions, such as charging books to users without badges, charging for periods other than the standard loan period, processing renewals, and discharging books which have been returned. the 1033 printer associated with each terminal is really just a modified electric typewriter. when a book is charged and the transaction is accepted by the computer, the printer creates a date due slip which shows the call number of the book, the identification number of the user, and the date due. this is placed in the book pocket and serves as the borrower's pass to carry the book past the exit guards. to make this a reasonably secure system, the guard must verify that the call number printed on the slip corresponds with the number on the book, and the user number on the slip corresponds with the number on a valid university identification badge. note that this system permits a user to bring the book back into the library and take it out again as often as he wishes during the loan period. the 1030 terminals are associated with a small, specialized part of the computer teleprocessing program which accepts the information from the terminal, reformats it so that it is compatible with the general-purpose file maintenance system, and enters it in the file, checking, of course, that the same book is not already in the file. (transactions which are invalid for one reason or another result in an "unprocessed" message on the printer, and the user must then go to the circulation desk to have the problem resolved.) this portion of the system also processes renewals and discharges from the terminal at the circulation desk.
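the charge-handling step just described (accept the badge and book card, verify that the book is not already charged, then either file the loan and print a date due slip or reject the transaction as "unprocessed") is simple enough to sketch. the following is a modern, purely illustrative python rendering of that flow, not northwestern's actual teleprocessing code; the field names, sample call number, and loan period are assumptions made for the example.

    from datetime import date, timedelta

    LOAN_PERIOD = timedelta(weeks=2)   # illustrative standard loan period

    def charge_book(book_card, badge, loan_file, today=None):
        """book_card/badge: dicts read from the 1031 terminal; loan_file: dict keyed by call number.
        returns the text printed on the 1033 printer: a date due slip, or an 'unprocessed' message."""
        today = today or date.today()
        call_number = book_card.get("call_number")
        borrower = badge.get("borrower_number")
        if not call_number or not borrower:
            return "unprocessed"                  # unreadable card or badge
        if call_number in loan_file:
            return "unprocessed"                  # same book already in the file
        due = today + LOAN_PERIOD
        loan_file[call_number] = {"borrower": borrower, "due": due}
        # the slip is the borrower's pass: call number, user number, and date due
        return f"{call_number}  {borrower}  due {due.isoformat()}"

    loans = {}
    print(charge_book({"call_number": "510.78 a11i"}, {"borrower_number": "123456"}, loans))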
inquiry system

inquiry requests use the general-purpose file maintenance system originally developed for technical services. this gives an operator at the circulation desk the capability to query the file about the status of any book, and also to make certain changes in records, such as indicating renewals and saves. the terminal used for this purpose is the ibm 2740, which is similar to a typewriter in operation. in order to facilitate the inquiry operations, the key to each record is the actual call number of the book, rather than some arbitrary accession number. the call number is divided into two parts, which we call the search key and the key extension. the search key includes the dewey classification number, cutter number, and up to four work letters; the key extension includes other information such as edition number, volume number, or copy number. a few compromises were necessary with the key extension in order to adapt it to the limited character set that the 1030 terminals can process, but these changes have not caused any difficulty. when a record is first entered in the file, the search key is used to calculate a position in the file. this is done by taking the alphanumeric characters in the search key, performing some mathematical operations to reduce the number of characters to four, and then treating the result as if it were a number. this resultant number is divided by a constant number which is chosen to be the largest prime number which is smaller than the number of tracks on the storage device for the file (the data cell). the remainder from this randomizing computation gives a location where an attempt is made to place the record. it is quite possible that this place in the file will be occupied by some other record, which might have the same search key or a quite different one. additional steps are provided to find an alternate location in such cases. additional information about the file organization is given in the appendix to this article.
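the randomizing computation described above is a hash-and-probe scheme: fold the search key down to a small numeric value, take its remainder modulo the largest prime smaller than the number of tracks, and look for an alternate location if the chosen track is occupied. the sketch below is a modern python illustration of that general technique; the folding rule, track capacity, and forward-probing step are assumptions made for the example, since the article does not give northwestern's exact arithmetic.

    # illustrative hash placement: fold the search key to a number, divide by the largest prime
    # below the track count, use the remainder as the home track, and probe forward on collision.
    # all constants here are assumed for the example, not northwestern's actual values.

    TRACK_CAPACITY = 8   # assumed number of records per track, for the example only

    def largest_prime_below(n):
        for candidate in range(n - 1, 1, -1):
            if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
                return candidate
        return 2

    def fold_search_key(search_key):
        """reduce the alphanumeric search key to a four-character (32-bit) value, treated as a number."""
        value = 0
        for ch in search_key:
            value = ((value << 5) ^ ord(ch)) & 0xFFFFFFFF
        return value

    def home_track(search_key, track_count):
        divisor = largest_prime_below(track_count)
        return fold_search_key(search_key) % divisor

    def place_record(search_key, tracks):
        """tracks: list of track buckets. probe forward from the home track until one has room."""
        start = home_track(search_key, len(tracks))
        for offset in range(len(tracks)):
            slot = (start + offset) % len(tracks)
            if len(tracks[slot]) < TRACK_CAPACITY:
                tracks[slot].append(search_key)
                return slot
        raise RuntimeError("file full")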
the save request triggers a call-in notice, discussed below. this procedure has the minor disadvantage that a save cannot be entered so that the first copy of a particular book which is returned will be held; the operator must select a copy if more than one is out (or enter the save on all copies). this has not proved to be a serious problem. the overall teleprocessing system includes a file of records, called the "transaction file," which is written sequentially. records from other files which can be accessed by remote terminals are written in the transaction file under certain circumstances. in the case of circulation records accessed by the inquiry terminal, writing of the transaction file is under the control of the terminal operator, who will always request that the book record be written in the transaction file when entering a save on a record. records processed by the 1030 terminals are entered in the transaction file only when they represent books which are discharged and which possibly involve a fine or contain a save. 8 journal of library automation vol. 5/1 march, 1972 batch processing the third part of the complete system includes a number of "batch" programs, which are run periodically and are independent of the real-time program. however, they may be, and usually are, run at times when the real-time program is also operating. these programs use either the transaction file or the main circulation file. each weekday morning, a series of programs is run which processes the data entered into the transaction file since the previous run. three types of printed notices are prepared by these programs as shown in the accompanying figures. one is a fine notice for books which were overdue when returned but for which a fine was not collected. (if a fine was collected the fact is indicated by a keyboard entry on the 1030 terminal when the book is discharged.) there also may be circumstances when the regular fine was collected, but a penalty fine is due because the book was called in but not returned in the specified time. the second type of notice is the call-in notice. this is prepared from records which were placed in the transaction file as a result of a save entered by the inquiry terminal operator. the third type of notice is the book-available notice, which results when a book with a save is discharged. (when the actual discharge is performed the real-time computer program prints a message on the 1033 printer so that the book will not be returned to the stacks. ) after the selection of records from the transaction file, name and address information is added from student and personnel files maintained by the data processing department, and a four-part notice is prepared. the same printed form is used for all three types of notices; the additional copies of the fine notice are used in a manual follow-up system if the fine is not p-aid immediately. processing of these notices generally takes less than ten minutes of computer time a day. on a weekly basis, another series of programs is used to process the entire file of outstanding books. a considerably longer time is required because of the size of the file which must be examined. at this time all records in the file are also transferred to a backup file, which provides some protection in case of damage to the prime file (it has not yet been necessary to use it). information is also extracted about books which have had transactions in the past week, and a circulation statistics report is prepared. 
finally, records for books which are overdue are extracted and processed similarly to the daily notices, resulting in a one-part overdue notice. on a quarterly basis the entire file is examined and lists are prepared of all entries which otherwise do not qualify for overdue notices. these include charges to reserve, lost and missing, other libraries, carrels within the library, and to faculty (who are not charged fines). these lists are distributed for verification that the books are actually located where the file says they are and then returned to the circulation department for any further processing which might be necessary. backup system in designing a real-time circulation system it was obviously necessary to provide for those occasions when some equipment malfunction prevents normal processing. it was not felt necessary to provide any backup to the inquiry part of the system, and book discharges can be allowed to wait for several days if absolutely necessary. to process charges during a period when the computer system is not operational, a standard register source record punch is used. this device can accept the same plastic user badge and book card as the 1030 terminal, and transfers the punched information to a two-part form. the first part of the form serves as the date due slip, while the second part, a standard 80-column card, is used to enter the transaction through the 1031 when the real-time system is again operational. this system has the advantage that it is completely independent of the real-time system, except of course for the building electric power. if for some reason the standard register punch cannot be used for a transaction (or for a book without a book card under any circumstances), the missing information can be handwritten on the two-part form. the second card part is then keypunched and the transaction entered through the 1031 terminal. this ultimate backup system is avoided, however, as it is very susceptible to transcription errors. costs any attempt to determine the cost of the on-line circulation system is complicated by several factors. the cost of the terminals is an obvious item, and fairly easy to determine and allocate. however, the cost of the communications adapter which connects the telephone lines to the computer, as well as the cost of running the teleprocessing program, must be shared by all users. these now include technical services as well as circulation services, and in the future may include other nonlibrary university users. finally, even if this allocation could be made, there is still the problem of separating the costs of the teleprocessing program and the batch programs being run in the data processing department. however, since poor information may be better than none at all, the following figures are presented. they include monthly charges for the real-time program for both circulation and the part of technical services which is now operating, but do not include any charges for running of batch programs.

1030 terminals:
  master 1031a & 1033 with manual entry, 2 @ $251 ..............  $  502
  satellite 1031b & 1033, 3 @ $155 ..............................     465
2740 model 2, 600 bps, 4 @ $170 .................................     680
2701 data adapter with 2 lines ..................................     450
telephone line charges ..........................................      25
data cell space, 5 cells (1 circulation; 4 technical services) ..   1,400
core storage allocated exclusively to teleprocessing ............   1,700
estimated share of cpu and disk costs ...........................   1,400
special operator charge .........................................     350

future plans although the present real-time program includes a list containing a limited number of invalid user numbers (lost badges or users who are guilty of repeated violations of library rules), it would be more satisfactory to have a list of valid numbers instead. this might even be expanded into a self-regulating system, where users with a sufficient number of "demerits" would be prevented from charging additional books, and which might reduce the need for fines. another desirable addition to the system would be if some simple and inexpensive display terminals could be obtained and placed in the stack area to provide a self-service inquiry capability. this would relieve the circulation desk staff of some additional work, as well as reduce the number of trips the users must make to and from the stack area. acknowledgments the success of this circulation system is due in no small part to the constant support and encouragement from mr. howard f. smith, director of administrative data processing, and mr. john p. mcgowan, university librarian. mrs. velma veneziano was responsible for establishing the library's requirements and rendered invaluable help during the implementation of the system. [editor's note: mrs. veneziano is preparing an updated (1972) summary of the actual application of this design, which it is hoped will be published shortly.] appendix 1 detailed description of file structure the following information is included in each circulation file record:

key
  dewey classification number-11 characters. (the decimal point is included in the record but not punched on book cards.)
  cutter number-5 characters.
  work letters-4 characters.
key extension
  volume, editor, copy-17 characters.
location code-2 characters. this code indicates whether the book is in the main library or in a branch. it also provides for a subsidiary location code if required.
large book indicator-1 character.
charge code and date-3 characters.
borrower number-5 characters.
renewal code and date-3 characters.
discharge code and date-3 characters.
save date-2 characters.
saver number-5 characters.
due date-4 characters.
terminal identification code-1 character.
reserved for system use-1 character.

all dates and user identification numbers are packed to save space. because of the random organization of the file it is necessary that storage space be allocated for considerably more records than the expected maximum file size. this allocation conceivably might have been based on a simulation study of the file operation, but this could not be justified since not even a good estimate of maximum file size was available.
(an important unknown factor was the change in usage patterns expected to result from the move to a new building.) the only available estimates indicated a maximum file size of about 60-70,000 records, and on this basis sufficient space was allocated to hold 144,000 records. with twelve records to a track on the ibm 2321 data cell, the file requires 12,000 tracks, or 60 percent of one cell. an equivalent would be about one pack on an ibm 2314 disk storage unit. of this total space, five percent (about 7,200 records) is set aside as an overflow area; the remainder is the prime area. when the randomizing algorithm leads to a cylinder (twenty tracks) which is completely filled, the record is written in the overflow area and the cylinder at the prime location is flagged. using this system, failure due to "running out of space" will be gradual; it is likely that system performance will be seriously degraded before the overflow area is completely filled. after more than a year of operation the file contains almost 63,000 records, of which about 230 are in the overflow area. consideration is being given to minor changes in the handling of overflow records which should reduce the time required to search the overflow area. book review free culture: how big media uses technology and the law to lock down culture and control creativity by lawrence lessig. new york: penguin, 2004. 240p. $24.95 (isbn 1-59420-006-8). this is the third book by stanford law professor larry lessig, and the third in which he furthers his basic theme: that the ancient regime of intellectual property owners is locked in a battle with the capabilities of new technology. lessig used his first book, code and other laws of cyberspace (basic books, 1999), to explain that the notion of cyberspace as free, open, and anarchic is simply a myth, and a dangerous one at that: the very architecture of our computers and how they communicate determine what one can and cannot do within that environment. if you can get control of that architecture, say by mandating filters on content, you can get substantial control over the culture of that communication space. in his second book, the future of ideas: the fate of the commons in a connected world (random, 2001), lessig describes how the change from real property to virtual property actually means more opportunity for control, not less. the theme that he takes up in free culture is his concern that certain powerful interests in our society (read: hollywood) are using copyright law to lock down the very stuff of creativity: mainly, past creativity. lessig himself admits in his preface that his is not a new or unique argument. he cites richard stallman's writings in the mid-1980s that became the basis for the free software movement as containing many of the same concepts that lessig argues in his book. in this case, it serves as a kind of proof of concept (that new ideas build on past ideas) rather than a criticism of lack of originality.
stallman's work is not, however, a substitute for lessig's; not only does lessig address popular culture where stallman addresses only computer code, but lessig has one key thing in his favor: h e is a mast er story-tell er and a darned good writer, not something one usually expec ts in an academic and an expert in constitutional law. his book opens with the first flight osf the wright brothers and the death of a farmer's chick ens, followed by buster keaton's film steamboat bill and disney's famous mouse . th e next chapter traces the history of photography and how the law once considered that snapping a picture could require prior permission from the owners of any property caught in th e viewfinder. later he tells how an improvement to a sea rch engin e led one college student to owe the recording industry association of america $15 million. throughout the book lessig illustrates copyright through the lives of real people and uses histor y, science, and the arts to mak e this law come to life for the reader . lessig explains that intellectual property differ s from real property in the eye of the law. unlike real property, where th e property owner has near total control over its uses, the only control offered to authors originally was the control over who could make copies of the work and distribut e them. in addition, that right-the "copy right" -lasted only a short time. the original length of copyright in the united states was fourteen years, with the right to renew for another fourteen years. so a total of twenty-eight years stood betwe en an author's rights and the public domain, and those rights were limited to publishing copies. others could quote from a work, even derive other works from it (such as turning a no ve l into a play) , all within a law that was designed to promote science and the arts. fast forward to the present day and we have a very different situation. not only has there been a change in th e length of time that copyright applies to a work; a major change in 198 information technology and libraries i december 2004 tom zillner, editor copyright law in 1976 extended copyright to works that had not previously b een covered. in the earli es t u.s. copyright regimes of the late 18th century, only works that were registered with the copyright office were afforded the prot ection of copyright law, and only about five perc en t of works produc ed were so registered. th e rest were in the public domain. later, actual registration with the copyright office was unnecessary but the author was required to place a copyright notice on a work (e.g ., "© 2004, karen coyle") in order to claim copyright in it. copyright holder s had to renew works in order make use of the full term of protection, and renewal rates were actually quite low. in 1976, all such requirements were removed, and the law was amended to state that any work in a fixed m edium automatically receives copyright protection, and for the full term. that is true even if the author do es not want that protection . so although many saw the great exchange of ideas an d information on the internet as being a huge commons of knowledge, to be shared and sha red alike, a ll of it has, in fact, alwa ys been covered by copyright law-every word out there belongs to someone. that chang e, combined with a much earlier change that gave a copyright holder control over derivative works, puts creators into a deadlock. 
they cannot safely build on the work of others without permission (thus lessig's argument that we are becoming a "permission culture"). yet, we have no mechanism (such as registration of works that would result in a database of creators) that would facilitate getting that permission. if you find a work on the internet and it has no named author or no contact information for the author, the law forbids you to reuse the work without permission, but there is nothing that would make getting that permission a manageable task. of course, even if you do know who the rights holder is, permission is not a given. for example, you hear a great song on the radio and want to use parts of that tune in your next rap performance. you would need to approach the major record label that holds the rights and ask permission, which might not be granted. you could go ahead and use the sample and, if challenged, claim "fair use." but being challenged means going to court in a world where a court case could cost you in the six digits, an amount of money that most creators do not have. lessig, of course, spends quite a bit of time in his book on the length of copyright, now life of the author plus seventy years. it was exactly this issue that he and eric eldred took to the supreme court in 2003. lessig argued before the court that if congress can seemingly arbitrarily increase the length of copyright, as it has eleven times since 1962, then there is effectively no limit to the copyright term. yet "for a limited time" was clearly mandated in the u.s. constitution. lessig lost his case. you might expect him to spend his efforts explaining how the supreme court was wrong and he was right, but that is not what he does. right or wrong, they are the supreme court, and his job was to convince them to decide in favor of his client. instead, lessig revises his estimation of what can be accomplished with constitutional arguments and spends a chapter outlining compromises that might, just might, be possible in the future. to the extent that eldred v. ashcroft had an effect on lessig's thinking, and there is evidence that the effect was profound, it will have an effect on all of us because lessig is one of the key actors in this arena. throughout the book, lessig points out the difference between copyright law and the actual market for works. there is a great irony in the fact that copyright law now protects works for a century or more while most books are in print for one year or less. it is this vast storehouse of out-of-print and unexploited works that makes a strong argument for some modification of our copyright law. he also recognizes that there are different creative cultures in our society, with different views of the purpose of creation. here he cites academic movements like the public library of science as solutions for the sector of society that has a low or nonexistent commercial interest but a need to get its works as widely distributed as possible. for these creators, and for "sharers" everywhere, lessig promotes the creativecommons solution (at www.creativecommons.org), a simple licensing scheme that allows creators to attach a license to their work that lets others know how they can make use of it. in a sense, creativecommons is a way to opt out of the default copyright that is applied to all works.
when i first received my copy of free culture, i did two things: i looked up libraries in the index, and i looked up the book online to see what other reviewers had said. online, i found a web site for the book (http://free-culture.org) that pointed to two very interesting sites: one that lists free, downloadable full-text copies of the book in over a dozen different formats; and one that allows you to listen to the chapters being read aloud by volunteers and admirers. (i did listen to a few chapters and generally they are as listenable as most nonfiction audio books. in the end, though, i read the hard copy of the book.) lessig is making a point by offering his work outside the usual confines of copyright law, but in fact the meaning of his gesture is more economic than legal. although he, and cory doctorow before him (down and out in the magic kingdom, tor books, 2003), brokered agreements with their publishers to publish simultaneously in print with free digital copies, few authors and publishers today will choose that option for fear of loss of revenue, not because of their belief in the sanctity of intellectual property. if there were sufficient proof that free online copies of works increased sales of hard copies, this would quickly become the norm, regardless of the state of copyright law. as for libraries-unfortunately, they do not fare well. he dedicates a short chapter to brewster kahle and his way-back machine as his example of the need to archive our culture for future access. i admit that i winced when lessig stated: but kahle is not the only librarian. the internet archive is not the only archive. but kahle and the internet archive suggest what the future of libraries or archives could be. (114) lessig also mentions libraries in his arguments about out-of-print and inaccessible works, but in this case he actually gets it wrong: after it [a book] is out of print, it can be sold in used book stores without the copyright owner getting anything and stored in libraries, where many get to read the book, also for free. (113) since we know that lessig is very aware that books are sold and lent even while they are still in print, we have to assume that the elegance of the argument was preferred over precision. but he makes this error more than once in the book, leaving libraries to appear to be a home for leftovers and remaindered works. that is too bad. we know that lessig is aware of libraries; anyone active in the legal profession depends on them. he has spoken at library-related conferences and events. yet he does not see libraries as key players in the battle against overly powerful copyright interests. more to the point, libraries have not captured his imagination, or given him a good story to tell. so here is a challenge for myself and my fellow librarians: whether it means chatting up lessig after one of his many public performances, becoming active in creativecommons, or stopping by palo alto to take a busy law professor to lunch, we need to make sure that we get on, and stay on, lessig's radar. we need him; he needs us.-karen coyle, digital libraries consultant, http://kcoyle.net booth library on-line circulation system (bloc) paladugu v. rao: automation and systems librarian, and b.
joseph szerenyi: director of library services, eastern illinois university, charleston, illinois an on-line circulation system developed at a relatively small university library demonstrates that academic libraries with limited funds can develop automated systems utilizing parent institution's computer facilities in a time-sharing mode. in operation sinte september 1968, using an ibm 360 j 50 computer and associated peripheral equipment, it provides control over all stack books. this article describes the history, analysis and design, and operational experience of the booth library on-line circulation system (bloc). since september 1968, when it went into operation, it has constantly been evaluated and modified to make it as perfect a system as possible. articles in library literature describing on-line circulation systems in operation at various libraries include hamilton ( 1 ) , heineke ( 2), kennedy ( 3), and bearman and harris ( 4). bloc differs considerably from those reported systems and has some unique characteristics that deserve the attention of the library profession. it is one of the pioneering circulation systems in which on-line real-time inquiries are being made into the computer files by use of a cathode ray tube display tenninal. it is not a prototype or model system to be interpreted as the optimum circulation system, but rather it is a dynamic system which will and should ------------------......... on-line circulation system j rao and szerenyi 87 be modified to achieve th~ best possible system in accordance with the latest developments in computer hardware and software. its analysis and design were influenced by the needs of an academic library. however, with little or no modification this system can be adopted by public and school libraries. environment eastern illinois university, a state-supported institution located in charleston, has developed a comprehensive curriculum that offers programs in liberal arts, teacher education and other professional fields, and a graduate school. the enrollment for the academic year 1970-71 is 8,600 students, and the number of the faculty is 711. the goal of the university is to provide an excellent education in an atmosphere of high faculty-student ratio and generally small classes, characterized by intellectual dialog and daily contact among students, faculty and administrators. instructors require heavy use of library materials. booth library, the main library of th~ university, contains 235,000 volumes in its collection at present. it has just finalized a five-year development plan to keep pace with the growth of the institution and will increase the collection to over 400,000 volumes by the end of 1975. bloc was designed to satisfy the library's present and future requirements. planning analysis phase in order to improve services to its patrons through the utilization of modern technology, booth library started planning for library automation as early as 1965. early experiments used unit record equipment in such areas as the ordering of library of congress printed catalog cards, acquisitions and serials control. initial difficulties prevented these projects achieving full operational status, however, and subsequently all were abandoned. the primary benefit library staff gained from these early experiments was education in planning carefully for subsequent automation projects, one of which is the bloc system. 
initial planning for the latter began in 1966; however, the original plan, which was for closed stacks, had to be modified considerably to make bloc compatible with more recent developments in booth library operations. the library switched to open stack operation in 1967. while the bloc planning was going on, there were also plans underway to expand booth library's physical facilities and its resources to meet the needs of an expanding campus. the volume of circulation had already been increasing at the rate of 15% per year. the circulation staff had to be increased to cope with the situation, and even then quality of service had to be sacrificed to quantity demands. furthermore, it was determined that the proposed growth in enrollment and the anticipated increase in library materials would increase the volume of circulation even more and 88 journal of library automation vol. 4/2 june, 1971 impose additional work on already overburdened circulation staff. the call-slip circulation system in use at that time no longer seemed adequate, and the file maintenance associated with the call-slip system had turned into a time-consuming and cumbersome task. thus the need for an improved and simplified circulation system became evident to the administration of the library. the professional librarians held several informal discussions to identify and develop a circulation system that would adequately meet both present and future requirements of the library. several existing types of circulation systems were considered and comprehensively reviewed, but the librarians did not agree upon any of them. however, the review did result in the formation of a task force, consisting of representatives from the administration, the data processing center, and the library. after thorough investigation, this task force recommended a computerized on-line circulation system as a possible solution to the library's problem, and the administration authorized the task force to prepare a detailed analysis and design proposal. design phase in developing its detailed proposal, members of the task force took into consideration the fact that the new circulation system would use the existing computer facilities on the campus. they aimed at a system that would provide the best possible service at least cost in the long run, and one that would allow for incorporation of future developments in computer technology. main design objectives were to l) eliminate borrower participation in the check-out process, 2 ) speed and simplify circulation procedure, 3) eliminate manual file maintenance, 4) permit identification of the status of any book within the system, 5) provide accurate and up-to-date statistics concerning use of library materials, including the number of times a given book is used, 6 ) provide guidance from the system in case of human error in conducting a transaction, and 7) relieve professional librarians from clerical chores. development hardware the computer system on the campus operates in a time-sharing mode, concurrently performing several on-line and batch processing jobs for the registrar's office, business office, textbook library and booth library. at the present time it is an ibm s/360 model 50 with 262k bytes central core and related peripheral equipment. it functions under the supervision of operating system os. on-line circulation systemjrao and szerenyi 89 figure 1 shows a schematic of the system's ibm equipment and data how among the various components, as applicable to bloc. 
among the components shown in the schematic, two 029 keypunches, one 059 verifier, two 1031 terminals, one 1033 printer and one 2260 cathode ray tube display terminal are exclusively used by bloc and are located in the library.
fig. 1. booth library on-line circulation (bloc).
the other components, located in the computer center, are shared by bloc along with the other systems in operation on the campus. it should be pointed out also that this schematic represents only the equipment used exclusively or in a shared mode by bloc and does not represent the university's total computing system configuration. software there are 25 different applications programs written in pl/1 (f level) to support the bloc system. these do not include the system programs written in assembler language to perform certain basic machine functions. there are two main data files that are required for the operation of bloc. these two files are stored on a 2314 disk storage facility and are available to the system on an on-line, random-access basis. the programs required to process the bloc transactions are also stored on the same disk storage facility, and these programs are loaded as needed by the operating system. of the two data files the first one is the patron file, which contains identification data of persons eligible to borrow books from booth library. the second one is the booth master file, containing identification information for each physical volume located in booth library. the patron file is a combination of employee and student files that were created to serve the usual business needs of a university. this file is arranged in the indexed-sequential method (5) by the patron's social security number. each student record in this file is 408 bytes long and each employee record in this file 304 bytes long. at present this file contains over 19,000 records, including some inactive student records. to process the transactions bloc borrows such information as name, address and telephone number of the patron from this file as needed. updating of this file is done by the computer center with the aid of the university administrative offices. the booth master file was created exclusively for the operation of bloc. creation of this file, which took one and one-half years, was done by converting the booth library shelf list into punched cards and then transferring the information from the punched cards to a disk file. one master card was punched for each physical volume in the library. after verification, information from these cards was loaded onto the disk file through the s/360. layout of the master cards is given below:

field                              card columns   explanation
transaction code                   1              a=new record, c=change record, d=delete record
accession number                   2-7
format code                        8              oversize, etc.
call number                        9-28
edition, year, series              29-31
volume number                      32-35
part, index, supplement number     36-38
copy number                        39-40
location code                      41-42          reference library, etc.
author                             43-52
title                              53-79
end of card code                   80             12-4-8 punches

any blank space in the above fields is filled in by a slash (/). creation of book cards (those used in circulation transactions) from the master cards is explained in file updating procedure. the booth master file on the disk is arranged in the indexed-sequential method (5) by the first ten characters of the call number and by the accession number, which has a fixed length of six characters.
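as a small illustration, the sixteen-character record key just described (the first ten characters of the call number followed by the six-character accession number) might be assembled as in the python sketch below; the slash padding for short call numbers borrows the card-punching convention noted above and is an assumption here, as are the example values.

```python
# a sketch of building the booth master file record key: first ten characters
# of the call number plus the six-character accession number. padding choices
# and example values are illustrative assumptions, not taken from the article.

def booth_record_key(call_number: str, accession_number: str) -> str:
    call_part = call_number[:10].ljust(10, "/")
    accession_part = accession_number.rjust(6, "0")
    return call_part + accession_part

# booth_record_key("QA76.5F34", "182929") -> "QA76.5F34/182929"
```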
each record in this file is 124 bytes long. the layout for a record is given below:

field                              byte positions   explanation
os control                         1
call number                        2-11             first ten characters
accession number                   12-17
call number                        18-27            remainder of call number
edition, year, series              28-30
volume number                      31-34
part, index, supplement number     35-37
copy number                        38-39
location code                      40-41
author                             42-51
title                              52-78
control byte                       79
number of check outs               80-82            cumulative number of check outs
status of book                     83               in or out
borrower social security #         84-92
borrower status                    93               1=student, 3=faculty, etc.
due date                           94-99
format code                        100              oversize, etc.
save social security #             101-109          ss# of the patron that requested save
save type                          110              1=student, 3=faculty, etc.
save status                        111              is there a save or not?
unused bytes                       112-124          for future use

average access time of a record from this file is 75 milliseconds. as a security measure, a copy of the booth master file is kept separately on a magnetic tape, from which the disk file can always be recreated. operation updating the booth master file is updated nightly with records of the new books acquired by booth library. after processing in the catalog department, the new books go to the keypunch section, where a master card is punched and verified for each new book. the master cards are then sent to the computer center, where each master card is reproduced and interpreted into two book cards. the layout of the book card is identical to that of the master card with two exceptions. a 't' is punched in column one as a transaction code for the system, and the end-of-card code is moved to column 19 to expedite transaction processing, accession number and class number being adequate for locating a book record from the booth master file. the identification data appearing on the book card is similar to that on the master card. however, during the interpretation process printing on the book card is rearranged in a format more suited to visual verification (figure 2).
fig. 2. bloc book card.
after interpretation the book cards are run through a stamping process. in this process the machine reads the accession number from each card and stamps the number on the back of same card, across the 3¼-inch dimension of the card and near the top. this allows the circulation staff to compare the accession number with the number stamped on the book pocket, to insure that the card is put in the right book. after being stamped the book cards are sorted into two identical decks and sent back to the library's keypunch section. one deck of the cards is put into book pockets and the books are shelved in the stack area ready for circulation. the second deck goes to the circulation department for interfiling in call number sequence into a duplicate book card file. cards from this file are used as replacements for the original book cards in the book pockets as needed. whenever a card is removed from this file, the information is noted on a special card, so that another duplicate can be punched and placed in the file for future use. late at night, after the library is closed, the master cards received on that day in the computer center are used to update the booth master file before the new books are put into actual circulation.
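the 124-byte record layout above maps naturally onto fixed-position slicing. the python sketch below is one way to read such a record, assuming it arrives as a plain character string; only a few representative fields are extracted, and the class and function names are illustrative rather than taken from any published bloc program.

```python
# a sketch of reading the 124-byte booth master file record described above.
# the one-based, inclusive slicing mirrors the published byte positions.

from dataclasses import dataclass


@dataclass
class BoothMasterRecord:
    call_number: str        # bytes 2-11 plus 18-27
    accession_number: str   # bytes 12-17
    status_of_book: str     # byte 83, in or out
    borrower_ssn: str       # bytes 84-92
    due_date: str           # bytes 94-99
    save_ssn: str           # bytes 101-109


def parse_booth_record(raw: str) -> BoothMasterRecord:
    def field(start: int, end: int) -> str:
        return raw[start - 1:end]   # table positions are one-based and inclusive

    return BoothMasterRecord(
        call_number=field(2, 11) + field(18, 27),
        accession_number=field(12, 17),
        status_of_book=field(83, 83),
        borrower_ssn=field(84, 92),
        due_date=field(94, 99),
        save_ssn=field(101, 109),
    )
```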
transaction p1'ocessing circulation transactions are processed through the ibm 1030 data collection system, whose configuration consists of two 1031 card badge on-line circulation systemjrao and szerenyi 93 readers, one 1033 printer and one 1034 card punch. the entire 1030 system is controlled by a 2701 data adapter unit (figure 1). if the computer is not in the on-line mode, the 2701 routes the transaction information to the 1034 card punch to be punched into cards that will be used later to update the disk files. the 1030 system transactions are monitored by a special program called "1030 analyzer". this program is written in basic assembler language ( bal) and has its own partition of about 50k in memory. the 1030 analyzer controls five overlay programs which actually process the transactions and make necessary file modifications. each overlay is a segment of the transaction processing program and processes a specific routine, such as determining type of patron and loan period, calculating due date, etc. when the information for a transaction is transmitted to the 1030 partition, the 1030 analyzer determines which overlays are needed to process the particular transaction and calls those overlays. the overlays access the required records from the patron and booth master files and do the necessary processing, then the master records representing the latest transaction information are written back in their storage locations. the 1030 analyzer program and the associated overlays were written locally for bloc. check-out each person associated with the university who is eligible to borrow books from booth library is issued a badge by the appropriate administrative office. a patron is expected to present this badge, along with the books he wishes to check out, to the circulation desk. though transactions can be processed without the badge, this is done only in exceptional cases. into each badge are punched the person's social security number and a one-digit status code to indicate student, faculty, etc. the badge reader in the terminal reads the social security number and transmits it to the system, which interprets it as the record address for the particular person in the patron file and takes the necessary information from that address. the status code enables the system to determine the loan period. after receiving the books to be checked out and the badge from the patron, the circulation attendant first compares the accession number stamped on the book pocket against the accession number stamped on the back of the book card. if the numbers match, she proceeds with the processing; otherwise she first pulls the right book card from the duplicate book card file before proceeding. mter comparison of accession numbers, the badge is inserted into the badge slot on the 1031 terminal. the reset switch is set to "non-reset" to charge out more than one book to the same patron. the book card is then fed into the card input slot, face down, notch edge first. if the terminal is not able to read the card the first time a "repeat" light comes on, in which case the card is taken from the exit slot and fed into the input slot again. if the "repeat" light comes on more than twice for the same card, 94 journal of library automation vol. 4/2 june, 1971 it is assumed that there is a punching error in the card and the transaction is completed by manually recording the necessary information on a special card that is later punched and used to update the disk files. 
if the terminal reads the card without any problem the "card" light comes on, indicating that the terminal is ready to process the next card. the attendant takes the book card from the exit slot and puts it in the book pocket, then stamps due date on the date due slip for a student check-out, or inserts a prestamped date due card for a faculty check-out. when all book cards for one patron have been run through the terminal, the badge control switch is set to "re-set," which releases the badge to be returned to the patron. if the transaction is not normal, the deviation is communicated to the attendant by the system on the 1033 printer with one of the following messages: terminal address message ace.# class patron code 1) "#terminal-s no master record 137335 9197 3233815871" class# ace.# 2) "#da428-h21 001253 message you just tried to check out -is already out-check it in and try again" the first message is given for a book that got into circulation before its master card was loaded onto the disk file by the computer center; in this case the transaction is completed with a special card through manual recording. the second message is given for a book that has not gone through the check-in process upon arrival in the library from a previous check-out; here the attendant simply checks in the book, then checks it out again. check-in at frequent intervals books deposited in return bins are placed on a truck and taken to a terminal to be checked in. the check-in badge is inserted into the badge slot and the reset switch is set to non-reset mode. when a book is taken from the truck the accession number on the back of the book card is compared with the accession number on the book pocket. if the card is the right one, it is then run through the terminal and replaced in the book pocket, which completes the check-in process for a book. if the book card is not the right one or is missing from the pocket, the transaction is completed using the book card from the duplicate file. circulation list each night after the library is closed a cumulative circulation list is printed giving all books checked out up to the closing hour of 11 p.m. two copies of this list are delivered to the circulation department the next morning. one copy is placed at the card catalog to enable patrons to find out whether books they want are in or out. the second copy is kept at the circulation desk for staff use. on-line circulation systemfrao and szerenyi 95 this list, printed in call-number order, sho:ws the identification data of the book, its due date, the patron's social security number and his status. for faculty and special badge (mending etc.) check-outs, the transaction date is printed instead of the due date. at present a faculty member can check out a book for a whole academic year. however, the circulation librarian may recall a book from a faculty member after thirty days if it is needed by another patron. the transaction date helps the circulation librarian to recall books in accordance with this policy. since the loan period for special badge check-outs can not be predetermined, the transaction date is printed for these check-outs. the circulation list acts as a back-up to permit circulation staff to answer questions when the system is not in the on-line mode. when the system is in the on-line mode the circulation list reduces the demand on the 2260 terminal inquiries during peak periods. the circulation list printing will be eliminated when two more 2260 terminals become available for the use of the system. 
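the due-date column rule for the circulation list is simple enough to state in code. the python sketch below assumes the one-digit status code described earlier (1 = student, 3 = faculty, etc.) and treats every non-student code the way faculty and special-badge check-outs are treated; the function and argument names are illustrative only.

```python
# a small sketch of the circulation-list date rule: student check-outs show
# the stamped due date, while faculty and special-badge check-outs show the
# transaction date instead. the status-code mapping is an assumption.

def circulation_list_date(status_code: str, due_date: str, transaction_date: str) -> str:
    if status_code == "1":      # student
        return due_date
    return transaction_date     # faculty, mending, and other special badges
```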
transaction file each check-in and check-out transaction· is recorded on a tape in addition to being on the booth master file on disk. this transaction tape is used to generate the circulation list. it also enables restoration of the disk file should something happen to the latter. the transaction tape is cumulated into weekly, monthly and annual tapes to generate a variety of statistical reports and also overdue lists. batch processing most of the time this system operates in an on-line, real-time mode. occasionally it has to operate in batch mode because of some mechanical malfunctioning. in this mode the computer system is not able to service the circulation system. consequently the 2701 data adapter unit routes the transaction information to the 1034 card punch (figure 1) to receive data and punch this data into cards in a pre-designated format. for each transaction conducted in this mode the 1034 card punch punches one card with appropriate data; this is later used to update the transaction file. in this mode the circulation staff cannot make on-line inquiries and cannot get guidance from the system in case of errors. to alert them there is a light on the terminal that hashes whenever the system operates in this mode. in case of a complete breakdown of the system, transactions are processed manually using special cards. later, information from these cards is punched in 1034 card format and the files are updated. during the two years of its operation the system went into this mode only once for two hours because of engineering difficulties. over dues a cumulative overdue list is printed once a week listing books overdue on that date. it shows identification data of a book and the address of its 96 journal of library automation vol. 4/2 june, 1971 borrower. for each overdue book a mail notification card is also printed, addressed to the borrower and containing identification data for the book. when an overdue book is checked in, the system prints out a message for the attendant. for example the message "lb1051-s62-131313 checked in by 320-46-0785 1 was 20 days overdue" means that the book identified as "lb1051-s62-131313", brought back by a borrower whose number is "320-46-0785", was 20 days overdue. mter check-in the overdue book is turned over to a clerk for necessary action. personal reserves a patron wishing to obtain a book that is checked out places a reserve on the book at the circulation desk. reserves are placed in on-line, real-time mode in the bloc system. the circulation attendant merely keys in the identification data of the book and the requestor, along with the reserve code, using the 2260 display terminal. this information is sent to the system in the following order: 1) start symbol indicating the beginning of an inquiry. 2) inquiry code "br" (for booth reserve). 3) identification data of the book (call and accession numbers). 4) identification data of the requestor (social security number) . 5) requestor's status code (no.3 for faculty members, etc.). 6) end of the message code ( "_" underscore ) . when this information is entered, the system places the book on reserve for the requestor and displays the necessary information on the screen for the attendant's visual verification. whenever a book on reserve is checked in, the system prints a message such as "qa76-5-f34-182929 is saved for 138-32-0044 3." alerted by this message, the attendant places the book on the reserve shelf and notifies the requestor, usually by telephone. 
meanwhile, if another person inadvertently tries to check out the book, the system prints the message as "qa76-5-f34-182929 is saved for 138-32-0044 3 do not check out." if the requestor cancels his reserve on the book, it can be taken off reserve status by sending the appropriate code and identification data of the book via the 2260 terminal. on-line inquiries one of the main advantages of bloc is that it enables the library to obtain answers to a variety of questions in seconds. the circulation staff can tell easily the status of any book, and can obtain the list of books borrowed by a patron. on-line, real-time inquiries can be made on this system using the 2260 display terminal. the 2260 inquiry processing is controlled by a special program called the 2260 analyzer. this program is written locally in pl/i and has its own partition (about 95k) in the memory of the computer. altogether it on-line citculation system j rao and szerenyi 97 services thirteen terminals located at various places on the campus. only two of these terminals accept the circulation inquiries: the master terminal in the computer center and the terminal at the circulation desk. the rest are used in connection with the other computer applications on the campus. when a circulation inquiry is transmitted to its partition, the 2260 analyzer determines the type of inquiry and calls in the appropriate overlays (at present there are 20 ) to access the needed records from the files and to process the inquiry, and then send the response back to the inquiry originating terminal. after processing, the records representing the latest modifications, if any, are written back in their previous storage locations. inquiry response time is less than a second. to know how to make a certain type of inquiry all one has to do is key in the letters "in" onto the screen and enter them into the system, then the system displays formats for various types of inquiries on the screen. this feature enables new operators to make inquiries on the terminal with minimum training. the reserve and clear inquiries have already been explained. the other circulation inquiries include: name, student or employee master file, book display, book scan, and unclear. name inquiry the social security number of a patron may be obtained by keying in his last name preceded by code letters "na"; if unsure of the spelling of the d esired last name the operator merely keys in a part of the last name. when either a name or segment thereof is entered, the screen displays twelve names in alphabetical order (beginning with the last name or part of the last name entered) along with corresponding social security numbers. if the desired name is not within these twelve, the operator can get the next twelve by pressing the "next" key. this procedure may be repeated until the desired name and corresponding social security number are located. she can then select that social security number and enter it into the system to get the address of that person. student and employee master file inqui1·ies these inquiries are being made to find the addresses and telephone numbers of patrons as needed by the circulation department. whenever a person's social security number, preceded by code letters "sm" (for students ) or "em" (for employees ), is entered into the system, it displays his campus and home addresses and telephone numbers. book display inquiry this inquiry enables the circulation staff to know the status of any book within the system. 
when the call number and accession number ( which is usually obtained from the duplicate book card fil e at the circulation desk), preceded by code letters "bd", are entered through the terminal, the system displays the following information on the book: call number; acces98 journal of library automation vol. 4/2 june, 1971 sion number; copy number; author and title; status, as checked in or checked out; if checked out, when; how many times it has been checked out so far; if checked out, the name and address of the person who has it; if on reserve, name and address of the reserve requestor. book scan inquiry through this inquiry the books in a given class can be scanned, one after another. whenever a class number, or part of it, preceded by code letters "bs" (for book scan), is entered into the system, it displays the information about the first book in that class; then, by pressing the "next" button on the terminal keyboard, the operator can have displayed information about the next book in that class. this procedure may be repeated as many times as necessary. this class access method is a very important feature of the bloc system; through it, one may discover rather quickly what books are available in the library on a given subject, and simultaneously it can be found whether a book is in the library or checked out. in this inquiry mode the system also keeps track of how many books are scanned for a given class and displays this information on the screen. the difference between the "bd" and the "bs" inquiries is that the "bd" inquiry is made when information about a specific book is needed and when its unique record address (call and accession numbers) is known. the "bs" inquiry is made when only part of the record address (such as class number portion of the call number) is known and when browsing through a given class is desirable. unclear inquiry the university library has to clear withdrawing or graduating students, and leaving employees. this is very easily accomplished through this system, by the operator's merely entering the patron's social security number, preceded by the code letters "bu" (for book unclear), into the system. if the patron has no books out as of that minute, the system displays "patron xxx-xx-xx~:x has no books checked out." otherwise the system displays the call and accession numbers for books checked out and not yet returned by the patron. the system can display up to ten titles at a time; if the patron has more than ten books out a "continue message" appears at the end of the top line on the screen, and otherwise a "final message" appears. discussion benefits gained by the circulation department have already been discussed. following are benefits gained by booth library as a whole: 1) booth library can now provide subject listings, arranged in callnumber order (sorting and printing takes only a few hours) as required by various academic departments. the listings have been extremely helpful in pointing out the library's resources to various accreditation committees. on-line circulation systemjrao and szerenyi 99 2) physical book inventory taking was greatly facilitated by printing the booth master file in segments and with indication whether a book was in or out at that time. 3) periodic listings of books charged out to special badges, as binding, lost, etc., have been printed to facilitate follow-up activities by the respective departments. 4) the booth master file acts as a security back-up to the shelf list. 
should something happen to the shelf list it could be recreated from the booth master file within two days. similarly, if need arises, the departmental shelf lists can be created overnight. 5) the library committee can now make book budget allocations on a more scientific basis by reviewing the annual statistical reports, which give more accurately than before the volume of circulation in various subject fields. in addition to the above benefits there are interesting possibilities for doing a variety of things, of which only a few are mentioned below: 1) periodic listings of new books received, on the basis of area of interest, can be printed to provide selective dissemination of information service. 2) since both students' and circulation records are in machine readable form, a variety of research tasks could be undertaken with readily available data to find out the reading habits of students at academic level and at age level. these reading habits could be correlated to academic achievement with the aid of data in the student records. 3) when the booth library marc implementation project is completed, most book cards can be generated directly from the marc tapes. punching of master cards will be necessary only for those books not entered on marc tapes. 4) the booth master file can be used to create data bases (partially or completely, depending on the size and characteristics of a library) at other libraries for similar applications. costs booth library does not differ from other libraries when an attempt is made to collect data on costs. there are no figures available on planning cost, system design cost, or on writing and testing the programs. bloc was developed through the collaborative efforts of library and computer center staff. many people have devoted time to the planning and development effort, working for bloc on an additional duty basis. only two people were hired to work full time for the project: one keypunch operator and one ibm machine operator; their combined annual salary was $9,000.00 in 1968 and in 1969. the machine operator position was terminated at the completion of the basic file conversion in 1969. present operating costs of the system are not yet available. since all programmers' and operators' time is devoted to maintaining and operating a number of systems, it is difficult to determine how much the operation and maintenance of bloc costs. so far staff members have been busy improving the performance of bloc and have not had time to do an in-depth cost study. however, there are some costs which can be directly charged to bloc. at present the full time of one keypunch operator and 270 hours of student help, at a total cost of $745.00 per month, are exclusively devoted to bloc. the breakdown listed in table 1 gives an estimate of the percentage of use on each of the items of equipment used for library purposes and the monthly cost calculated from the percentage of use. terminal cable and magnetic tape costs are not included in the total. table 1. equipment use and cost.
Table 1. Equipment Use and Cost

Qty.  Item no.  Item description           % of use by BLOC   Proportional monthly rental charged to BLOC
 2    029       card punch                  100                $117
 1    059       card verifier               100                58
 1    083       sorter                        1                1
 1    088       collator                      1                4
 1    519       reproducer                    1                2
 1    557       interpreter                   1                1
 2    1031      input station               100                159
 1    1033      printer                     100                76
 1    1034      card punch                   67                206
 1    1052      printer keyboard              5                3
 1    1403      printer                       2                16
 1    2050      central processing unit      10                1,188
 1    2260      visual display              100                41
 1    2314      disk storage facility        13                689
 1    2316      disk cartridge              100                17
 1    2401      magnetic tape unit           50                138
 1    2540      card read/punch               5                28
 1    2701      data adapter unit            67                157
 1    2821      control unit                  5                46
 1    2848      display control unit         13                96
Total monthly cost of equipment: $3,043
Total yearly cost of equipment: $36,516

It should be pointed out that the costs shown in Table 1 are averaged on the basis of the total number of units rented and the amount paid in connection with all computer applications, not on the basis of equipment used by BLOC alone. If the above-listed equipment had been rented for utilization by BLOC alone, the rental costs would have been much higher. Moreover, the costs in Table 1 do not include salaries of computer personnel.

Utilization of the BLOC system has not produced any payroll savings. No library position was eliminated by installing it, but it is a certainty that more personnel would have been needed to discharge all duties at the circulation desk in the future without this system. Using the computer allows a 20% increase in loans to be processed without an increase in personnel cost.

Expansion

The capacity of BLOC has by no means been exhausted; its flexibility allows for more innovations, so that every possible circulation need can be met. The utilization of the BLOC system is limited only by the ingenuity of its users. Two new features are to be added to it in the near future. One of these is the installation of an IBM 2741 communications terminal to generate date-due slips, so that the present method of stamping the due date in a book can be eliminated. The 2741 terminal was chosen because it can be operated in either on-line or off-line mode, enabling the circulation staff to type date-due slips manually when the system is in off-line mode. The second new feature will be the installation of an additional 2260 terminal for public use near the card catalog. This terminal will accept only "BD" and "BS" inquiries, and the "BD" inquiry on this terminal can be made by call number alone, which is readily available in the card catalog. Privileged inquiries, such as placing books on reserve, will continue to be the prerogative of the terminal at the circulation desk. This new feature will provide patrons with up-to-the-minute information concerning the availability of library materials. The design phase for these new features has been completed and the programming effort is under way. It is expected that these new features will be added to BLOC by the fall of 1971.
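A minimal sketch of how inquiry codes of this kind might be dispatched, including the planned public-terminal restriction to "BD" and "BS", is given below. The data structures and function names are invented for illustration; this is not the BLOC implementation.

```python
# Illustrative dispatch of BLOC-style inquiry codes; data and names are invented.

MASTER_FILE = {
    # (call number, accession number) -> book record
    ("331.2 R215", "104211"): {"title": "Sample Title", "status": "checked in"},
}

def book_detail(key):
    """BD inquiry: one specific book, addressed by call and accession number."""
    return MASTER_FILE.get(key, "no record for that call/accession number")

def book_scan(class_number):
    """BS inquiry: browse every record whose call number begins with the class."""
    return [rec for (call, _acc), rec in MASTER_FILE.items()
            if call.startswith(class_number)]

def book_unclear(patron_id):
    """BU inquiry: list books still charged to a patron (circulation desk only)."""
    return []  # placeholder: a real system would scan the charge file for patron_id

PUBLIC_CODES = {"BD", "BS"}  # the public catalog terminal accepts only these
HANDLERS = {"BD": book_detail, "BS": book_scan, "BU": book_unclear}

def dispatch(code, argument, public_terminal=False):
    if public_terminal and code not in PUBLIC_CODES:
        raise PermissionError("this inquiry is reserved for the circulation desk terminal")
    return HANDLERS[code](argument)

print(dispatch("BS", "331", public_terminal=True))
```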
Conclusion

It can be said that a relatively small university library with limited funds can start and develop automated systems if the parent institution obtains a computer for instructional and administrative purposes. This was the case with Booth Library's circulation system. To keep pace with Eastern Illinois University's anticipated growth, it was decided in 1964 to develop a data processing center. It has grown rapidly in terms of services rendered to the university. Its main purpose initially was to serve the academic departments, but its services have spread to several administrative functions, such as admissions, student records, registration, and personnel services, to name a few. It was not difficult for the librarians to convince the university's administration of the necessity and usefulness of the computer for library purposes. Relatively little extra expenditure for hardware was needed. Understanding and cooperation from the staff of the reorganized computer center helped to develop the library's circulation system.

What was the dream of the librarians a few years ago is now an actual operation, working well and giving better service to the library's patrons. The major advantage is the saving of time on all necessary operations. The system also freed the staff from routine manual work. It eliminated the large call-slip files and the inevitable human errors in those files. Patrons were freed from filling out call slips, and the circulation staff was freed from the tiresome task of decoding the unreadable "scribbling" of many patrons. Check-out and check-in of books was speeded. There is no longer a line of waiting students at the circulation desk, and, on average, it takes less than five seconds to check out a book. A variety of reports containing computer analysis of circulation records are available at regular intervals. They are an aid in ordering additional copies of heavily used titles and in surveying the collection for weak spots. After more than two years of operational experience, it can be said with confidence that the BLOC system has fully satisfied all its design objectives and even exceeded them by providing some additional benefits that were not in the original planning.

Regional Numerical Union Catalog on Computer Output Microfiche

William E. McGrath, Director of Libraries, and Donald Simon, Systems Analyst and Computer Programmer, University of Southwestern Louisiana Library, Lafayette, Louisiana.

A union catalog of 1,100,000 books on computer output microfiche (COM) in twenty-one Louisiana libraries is described. The catalog, called LNR for Louisiana Numerical Register, consists not of bibliographic information but primarily of the LC card number and letter codes for the libraries holding the book. The computer programs, the data bank, and the output are described. The programs provide the capability for listing over two million entries. Also described are the statistical tabulations which are a by-product of the system and which provide a rich source for analysis.

Twenty-one Louisiana libraries have produced on computer output microfiche (COM) a union catalog containing locations for 1,100,000 books.
About 150,000 of these are current acquisitions (books acquired in the last two years); the rest are volumes in the retrospective collections of ten of the twenty-one libraries. The Numerical Register of Books in Louisiana Libraries, as the catalog is now entitled, is the second step toward what is hoped will be a comprehensive current and retrospective list of over two million volumes, the estimated holdings of the participating libraries. The first was a conventionally printed register of 550,000 books, issued in 1971 and distributed to fifty Louisiana libraries.

The new register is not a bibliography. It includes no bibliographic information. It is a location device for books whose bibliographic information is already known, and it includes nothing that is not also listed by the Library of Congress. The title was deliberately chosen to distinguish it from an older bibliographic Louisiana union catalog. All books listed in the register are those having a Library of Congress (LC) card number; indeed, the LC card number is the entry. The term "numerical" was chosen because we anticipate using other numbers besides the LC number, e.g., the Mansell number and the International Standard Book Number (ISBN). The LC card number is the most widely used book number we now have. This fact is put to good use by the Library of Congress in its own NUC Register of Additional Locations. There are other LC number indexes, but they are not union catalogs. (The Mansell number, of course, will be very useful when publication of the NUC pre-1956 imprints is complete.)

Many more titles can be represented on a page by number codes than by complete bibliographic data, at a ratio of perhaps 600 to 9. Unit costs are, therefore, much less. The first edition (1971), containing 550,000 volumes, was produced for an estimated total cost of $22,600 ($8,600 grant plus $14,000 absorbed). One hundred copies of the register were printed in hard-copy form, with approximate overall unit costs for keypunching, computer, travel, salaries, and printing as follows:

                     In terms of actual expenditures   In terms of total funds,
                     (grant funds)                     expended plus absorbed
Per title entry      2.5 cents                         6.0 cents
Per volume entry     1.6 cents                         3.8 cents

The second edition (November 1972) contains over 1,100,000 volumes and, in terms of the second grant, was produced on computer output microfiche for an estimated total cost of $31,200, i.e., $10,000 grant plus $21,200 absorbed. (Reproduction costs for the COM are negligible. For an original copy of 5 fiche, containing all 1,100,000 volumes, we were charged $25 by a commercial firm, and for extra copies, $3 each. Copies for distribution will be sold at a slightly higher price.) Unit costs for the COM edition are:

                     In terms of actual expenditures   In terms of total funds,
                     (second grant funds)              second grant expenditures plus absorbed
Per title entry      1.8 cents                         5.6 cents
Per volume entry     0.9 cents                         2.8 cents

Unit costs computed on the basis of total costs to date suggest that they remain relatively constant from cumulation to cumulation.

The concept of a numerical register is not new. The idea was discussed at length in a proposal by Harry Dewey (1) almost a generation ago, in which he espoused all the essential ideas, and again in 1965 by Louis Schreiber (2). Both argued that if the bibliographic data, including the LC card number, were already in hand, one could then merely look up the number in a numerical union catalog to determine a location.
Goldstein and others (3) have also studied what they called the "Schreiber catalog" and have produced a sample computer printout of LC numbers. Computer output microfiche, on the other hand, was not anticipated in the original concept. It has made reproduction and distribution cheap, fast, and eminently feasible. The history of the register and its rationale have been discussed more fully by McGrath (4).

Programs Comprising the Union Catalog System

The union catalog data record is shown in Table 1. The first three fields are the familiar LC card number, and the fourth, the library location.

Table 1. The Data Record

(1) Alpha series   (2) Year or numeric series   (3) Serial number within numeric series   (4) Library
    AGR                69                           2354                                      C

(1) Alpha series prefix: this data field may contain from 1 to 4 alphabetic characters denoting a special series.
(2) Numeric series prefix: this data field may contain 1 or 2 digits.
(3) Serial number: this data field may contain up to 6 numeric digits.
(4) Alphabetic library designation code: this field contains a preassigned alphabetic code (up to 26) designating the participating library.

The three programs which use this data record and comprise the union catalog system are shown in Figure 1 and described below.

Fig. 1. Flow chart of the programs comprising the register system.

LNREDT Program

LNREDT is an editing program which examines all card input data to determine whether they are acceptable or not. Each data field as shown above is examined as follows: field 1, for the presence and rejection of nonalphabetic characters, and also to determine if the alphabetic code is a member of the accepted set of codes obtained from the Library of Congress (the accepted records are transferred, after checking all fields, to a magnetic tape file for subsequent use; rejected data records are printed and visually scanned for the source of error); fields 2 and 3, for the presence and rejection of nonnumeric characters; field 4, to determine if alphabetic.

LNRSRT Program

LNRSRT sorts all records on the above-mentioned tape file. The major sort key is the numeric prefix, field 2. The minor sort keys, in order of the sort sequence, are: field 1, the alphabetic special series indicator; field 3, the book serial number; and field 4, the library code designation.

LNRLST Program

LNRLST is the main program, which uses the sorted data tape to:

a. Create a single record for each unique LC number containing the library code designation of each library having this particular book;
b. Produce a listing of the above records in LC card number order;
c. Generate records of unique titles in combinations of libraries owning the titles;
d. Enter into a memory matrix the combinations of libraries created in part (c). Combinations are then counted: each time a combination is encountered, the matrix is searched for a match; if a match is found, the corresponding matrix position is incremented by one; if no match is found, a new matrix position is created with the new combination and the corresponding count initialized to one. This routine also provides a total count of each library's contributions plus a grand total of all libraries' contributions;
e. Tabulate, from the data compiled in (d) above, several elaborate tables of summary statistics; these statistics are described later in this paper.

The number of libraries the program LNRLST can accommodate is a variable and is entered as an execution-time parameter along with the library names and code designations. The main program occupies approximately 150,000 bytes of core memory.

The Output

A sample of the register entries appears in Figure 2. A simple one-letter designation was used to identify each library, rather than the usual National Union Catalog (NUC) designation, in order to save space in the printout. These letters appear alphabetically to the right of each LC number. A typical page of the register contains ten columns of up to six-digit LC numbers, with the two-digit series number appearing only once at the beginning of each series. Thus each page contains about 600 LC numbers. The latest cumulation of 1,100,000 volumes (560,000 LC numbers) consists of nearly 1,000 pages. The entire output was produced on five pieces of fiche directly from the cumulated tape. The COM program was written by the commercial firm which contracted to run it. The computer output microfiche was issued on five 4x6-inch pieces at 42x reduction. Each piece contains 208 frames, and each frame contains an average of 1,126 volumes and 573 titles. The data can be produced on 24x fiche as well as roll film.

Statistical Summary

The large samples of holdings (from an initial 5,000 volumes, through successive cumulations to 90,000 and, the most recent, 1,100,000) provide an excellent data base for statistical analysis. We believe the samples may be the largest title-by-title comparison of monographs ever tabulated in this format. Very little analysis is presented in this paper, but the data base and its format will be explained. Even without analysis, many interesting observations can be made.

Fig. 2. Portion of a typical page of the computer printout, showing the two-digit 76 and 77 series, a typical prefix (PS), the serial numbers within the series, and letter codes to the right of each serial number.
For example, library A has the book 77-5; seven libraries (A, B, C, M, N, O, and Z) hold the book 77-75937. Each page contains ten columns; only five are shown.

Most of the tabulations are designed to throw light on the various aspects of the overlap problem, since a decisive factor in determining the utility of the register is a knowledge of the number of titles held in common by all the libraries. Over the years there has been continuing interest in overlap. Probably the first and most elaborate of the early studies was by LeRoy Merritt (5), and one of the most recent by Leonard, Maier, and Dougherty (6). Continuing interest is expressed in such proclamations as that by Ellsworth Mason, where he claims that materials are "being acquired in duplications that are rather staggering across the country" (7).

The following statistics were tabulated from input for current acquisitions, the most recent being a total of 90,302 volumes, rather than from the retrospective and current totals in the production runs. The 90,302 volumes were acquired for the most part during the two-year period, fall 1969 to fall 1971. The statistics show holdings for sixteen libraries.

The Basic Tabulation: Titles Held in Common by Unique Combinations of Libraries

The basic tabulation, sections of which are shown in Table 2, actually fills seven pages of computer printout. The tabulation is designed so that each unique and actual combination of libraries is separately listed, and the books held by each combination are counted.

Table 2. Titles held in common by each unique combination of libraries.

Thus, in the table, although the total number of books held in common by libraries A and B is 127, the number of books held in common by them and no other library is only 52. The number of books held by libraries A, B, and Z, and no other library, is 18. None of these 18 is included in the count of 52, and none of the 52 in the 18; they are mutually exclusive. But the 18, plus the 52, plus the small counts in each of the other combinations in which A and B share holdings, is 127. The percentage of common holdings for each combination is also given, except when the percentage is less than .01. Thus libraries A and B have .48 percent in common of their total combined holdings of 10,688 volumes. It is interesting to note that of the 65,535 possible combinations, in only 444 combinations did the percentage of common holdings exceed .01 percent, and in only 8 did the percentage exceed 1 percent. Of these, the highest is 5.43 percent (A and Z). This 5.43 percent means that 678 of A and Z's common holdings were held by no other library. The total of A and Z's common holdings that were also held by other libraries is 1,315, or about 10.5 percent of 12,470. Again, this is the highest percentage of any combination.
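The tally just described can be made concrete with a short sketch. It is an illustration under simplified assumptions (invented sample records, no editing or sorting step), not the LNRLST code itself; the distinction between the exclusive count (the 52) and the total shared count (the 127) corresponds to the two tallies at the end.

```python
# Sketch of the combination tally performed by the "memory matrix" routine.
from collections import defaultdict

# Invented (lc_number, library_code) input records.
records = [
    ("77-5", "A"), ("77-75937", "A"), ("77-75937", "B"), ("77-75937", "C"),
    ("77-75937", "M"), ("77-75937", "N"), ("77-75937", "O"), ("77-75937", "Z"),
    ("76-112600", "J"),
]

# (a) one entry per unique LC number, holding the codes of the owning libraries
holdings = defaultdict(set)
for lc_number, library in records:
    holdings[lc_number].add(library)

# (c)/(d) count titles per unique combination of libraries
combination_counts = defaultdict(int)
for libraries in holdings.values():
    combination_counts[frozenset(libraries)] += 1

# Titles held by A and B and no other library (the mutually exclusive counts)
exclusive_ab = combination_counts[frozenset({"A", "B"})]
# All titles A and B share, regardless of other holders (the "127"-style figure)
shared_ab = sum(1 for libs in holdings.values() if {"A", "B"} <= libs)
print(exclusive_ab, shared_ab)
```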
Summary of Titles Held in Common

The basic tabulation of titles held in common is summarized in Table 3. Column 1 is the number of libraries, from 1 to 16, in each combination. Column 2 is the total number of titles counted in all combinations. For example, 59,907 titles exist in unique copy, thus there were only 59,907 copies (column 3), but there were only 8 titles which as many as 9 libraries held, for a total of 72 copies (column 3). Column 4 shows that all 16 libraries contributed unique titles and that there were 117 different combinations of two libraries, out of a possible 120 (column 5). Thus there were 3 combinations of 2 libraries which had no titles in common. It is also most interesting that there were only 7 combinations of 9 libraries out of a possible 11,440, and no combinations of 10 or larger. According to the binomial distribution, there are 65,535 theoretical ways that 16 libraries can combine (total, column 5), whereas in this sample only 1,198 combinations occurred (total, column 4). Column 6 is the result of column 2 divided by column 4. Thus 3,774.19 is the average number of unique titles contributed by each library, 74.92 is the average number held by any combination of 2 libraries, and 6.89 is the average held by any combination of 3.

Table 3. Summary of titles held in common by unique combinations of libraries (spring 1971 tabulation)

No. of libraries   Total no. of titles   Total no. of copies   No. of times a        Theoretical no. of times     Average title
in each            in all                in all                combination           a combination can occur      overlap per
combination        combinations          combinations          occurred              (binomial distribution)      combination
 1                 59,907                59,907                   16                     16                       3,774.19
 2                  8,766                17,532                  117                    120                          74.92
 3                  2,453                 7,359                  356                    560                           6.89
 4                    782                 3,128                  360                  1,820                           2.17
 5                    279                 1,395                  214                  4,368                           1.30
 6                     84                   504                   75                  8,008                           1.12
 7                     43                   301                   41                 11,440                           1.04
 8                     13                   104                   12                 12,870                           1.08
 9                      8                    72                    7                 11,440                           1.14
10                      0                     0                    0                  8,008                           0.00
11                      0                     0                    0                  4,368                           0.00
12                      0                     0                    0                  1,820                           0.00
13                      0                     0                    0                    560                           0.00
14                      0                     0                    0                    120                           0.00
15                      0                     0                    0                     16                           0.00
16                      0                     0                    0                      1                           0.00
Totals             72,335                90,302                1,198                 65,535                          60.38

Summary of Each Library's Multiplicated Titles

The administrators of each library are especially interested to know how many of their own titles are also held by other libraries. This information for total input (i.e., for titles with LC prefixes from 1900 to the present) is given in Table 4.

Table 4. Summary of each library's multiplicated titles (1900-1971 imprints)
(The last two columns are column 5 divided by column 3, and column 5 divided by the total of column 3, respectively.)

Library                                        Code   Volumes       % of total   Titles also held      Multiplicated       Multiplicated
                                                      contributed   volumes      by other libraries    titles as % of      titles as % of
                                                                                                        own titles          grand total
Louisiana State Library                         A      4,708          5.21         2,497                 53.03                2.76
Louisiana Tech University                       B      5,980          6.62         2,378                 39.76                2.63
University of Southwestern Louisiana            C      6,353          7.03         1,932                 30.41                2.13
Louisiana State University-Baton Rouge          E     29,186         32.32         6,190                 21.20                6.85
Louisiana State University Medical Center       F        580           .64           168                 28.96                 .18
Grambling                                       G      1,606          1.77           471                 29.32                 .52
Centenary                                       H      4,472          4.95         2,061                 46.08                2.28
Louisiana State University-Alexandria           I      2,765          3.06         1,087                 39.31                1.20
Southeastern Louisiana                          J      4,153          4.59         1,849                 44.52                2.04
Northwestern Louisiana                          K        563           .62           230                 40.85                 .25
Northeastern Louisiana                          L      4,891          5.41         1,980                 40.48                2.19
Loyola-New Orleans                              M      3,803          4.21         1,744                 45.85                1.93
Louisiana State University-Shreveport           N      4,291          4.75         1,749                 40.75                1.93
Louisiana State University-New Orleans          O      5,968          6.60         1,783                 29.87                1.97
Nicholls                                        P      3,221          3.56         1,048                 32.53                1.16
New Orleans Public                              Z      7,762          8.59         3,228                 41.58                3.57
Totals                                                90,302        100.00        30,395
Average                                                5,644          6.25         1,900                 37.78                2.09

(Tables were also produced giving the same kind of information by decade and for the last two years, but they are not reproduced here.) The column labels are self-explanatory, but it may be observed that the total in column 5, 30,395, equals the difference between the total copies, 90,302 (column 3, Table 3), and the number of titles held by one library only, 59,907 (columns 2 and 3, Table 3).

Distribution of Titles Published and Multiplicated by Decade

Table 5 shows that the very largest overlap, in current acquisitions, occurs among books with recent imprints. This is to be expected, since these figures do not compare older books recently acquired by one library with those already in another library, and since the acquisition of older books is from a much larger universe than that for current books.

Table 5. Distribution of contributed titles published and multiplicated by decade (titles acquired from 1969 to 1971)

Imprint period   No. of titles contributed   % of titles contributed   No. of volumes multiplicated   % of total volumes multiplicated
1900-1909          1,483                       2.05                          23                            .13
1910-1919          1,049                       1.45                          29                            .16
1920-1929          1,180                       1.63                          22                            .12
1930-1939          1,816                       2.51                          74                            .41
1940-1949          2,539                       3.51                         102                            .57
1950-1959          5,353                       7.40                         361                           2.01
1960-1971         58,915                      81.45                      17,356                          96.59
Totals            72,335                     100.00                      17,967                         100.00

Other Summary Statistics

The foregoing tables illustrate the kind of tabulations that can be made with this type of data. More detailed tables can be compiled, and indeed were, e.g., tables giving the percentage of books acquired for each year and each decade for each library, with ten-year totals and averages. Other possibilities would be frequency distributions and summaries for clusters of similar libraries. This material awaits analysis; we believe it contains many heretofore unsuspected insights.

Future Plans

Since the data can be updated so readily, plans are being made to provide funds for the extraction and keypunching of LC numbers in the remaining retrospective collections of the participating libraries. These libraries contain an estimated total of two million volumes. Succeeding cumulations will be readily produced on COM. Most of the cost has been for extracting retrospective numbers from card catalogs. Once the remaining retrospective collections are cumulated, costs for cumulating current input will be negligible. Any final catalog, of course, can never list complete holdings, since each library has many titles without LC numbers. Those titles could be listed in more conventional form; since they are in a minority, the expense would be far more reasonable than it would be to reproduce entire holdings in conventional form.
We have said nothing about other aspects of the project. In committee discussions, however, much has been said about the feasibility of using the LC card number to access the information in other major projects such as MARC, and possibly even the data bank in the Ohio College Library Center. Technically, it is feasible to print a conventional bibliographic catalog by matching up our LC numbers with titles listed in the current MARC tapes; pragmatically and economically, of course, it is another matter. Other possibilities are the printing of a list of specialized holdings by accessing the subject headings on the MARC tapes, the assignment of specialized acquisitions, and the gathering of information which might affect development of a joint processing center.

Acknowledgments

This project was supported in part by Library Services and Construction Act Title III funds administered by the Louisiana State Library. The authors wish to give special thanks to Miss Sallie Farrell, Louisiana State Librarian, for her enthusiastic support and fine advice. We wish also to thank the other members of the L.L.A. Committee on the Union Catalog: Mr. Sam Dyson, Louisiana Tech University; Mrs. Jane Kleiner, Louisiana State University, Baton Rouge; Mrs. Elizabeth Roundtree, Louisiana State Library; Dr. Gerald Eberle, Louisiana State University, New Orleans; Mrs. Hester Slocum, New Orleans Public Library; Mr. Charles Miller, Tulane University, New Orleans; Mr. Ronald Tumey, Rapides Parish Library, Alexandria; and finally, Mr. John Richard, past president of the Louisiana Library Association, who saw the importance of the project and who appointed the original committee. Complete documentation for this project, including computer programs, has been deposited with the ERIC Clearinghouse on Library and Information Science (8).

References

1. Harry Dewey, "Numerical Union Catalogs," Library Quarterly 18:33-34 (Jan. 1948).
2. Louis Schreiber, "A New England Regional Catalog of Books," Bay State Librarian 55:13-15 (Jan. 1965).
3. Samuel Goldstein, et al., Development of a Machine Form Union Catalog for the New England Library Information Network (NELINET) (Wellesley, Mass.: New England Board of Higher Education, 1970) (U.S. Office of Education final report, project no. 9-0404). ED 043 367.
4. William E. McGrath, "LNR: Numerical Register of Books in Louisiana Libraries," Louisiana Library Association Bulletin 34:79-86 (Fall 1971).
5. LeRoy C. Merritt, "The Administrative, Fiscal, and Quantitative Aspects of the Regional Union Catalog," in Union Catalogs in the United States (Chicago, Ill.: American Library Association, 1942).
6. Lawrence E. Leonard, Joan M. Maier, and Richard M. Dougherty, Centralized Processing: A Feasibility Study Based on Colorado Academic Libraries (Metuchen, N.J.: Scarecrow Press, 1969).
7. Ellsworth Mason, "Along the Academic Way," Library Journal 96:1671-76 (15 May 1971).
8. William E. McGrath and Donald J. Simon, LNR: Numerical Register of Books in Louisiana Libraries; Basic Documents (Lafayette, La.: Louisiana Library Association, Dec. 1972) (U.S. Office of Education). ED 070 470, ED 070 471.
Communications

Using a Native XML Database for Encoded Archival Description Search and Retrieval

Alan Cornish

Alan Cornish (cornish@wsu.edu) is Systems Librarian, Washington State University Libraries, Pullman.

The Northwest Digital Archives (NWDA) is a National Endowment for the Humanities-funded effort by fifteen institutions in the Pacific Northwest to create a finding-aids repository. Approximately 2,300 finding aids that follow the Encoded Archival Description (EAD) standard are being contributed to a union catalog by academic and archival institutions in Idaho, Montana, Oregon, and Washington. This paper provides some information on the EAD standard and on search and retrieval issues for EAD XML documents. It describes native XML technology and the issues that were considered in the selection of a native XML database, Ixiasoft's TextML, to support the NWDA project.

Pitti, one of the founders of the EAD standard, noted the primary motivation behind the creation of EAD: "to provide a tool to help mitigate the fact that the geographic distribution of collections severely limits the ability of researchers, educators, and others to locate and use primary sources."1 Pitti expanded on this need for EAD in a 1999 D-Lib article:

The logical components of archival description and their relations to one another need to be accurately identified in a machine-readable form to support sophisticated indexing, navigation, and display that provide thorough and accurate access to, and description and control of, archival materials.2

In a more recent publication, Pitti and Duff noted a key advantage offered by EAD that relates to the focus of this article, the development of an EAD union catalog:

EAD makes it possible to provide union access to detailed archival descriptions and resources in repositories distributed throughout the world. . . . Libraries and archives will be able to easily share information about complementary records and collections, and to "virtually" integrate collections related by provenance, but dispersed geographically or administratively.3

In a 2001 American Archivist article, Roth examined EAD history and deployment methods used up to the 2001 time period. Importantly, two of the most prominent delivery systems described by Roth, DynaText (a server-side solution) and Panorama (a client-side solution), were by 2003 obsolete products for EAD delivery. This is indicative of the rapid pace of change in EAD deployment, in part due to the migration from SGML to XML technologies. Roth described survey results obtained on EAD deployment that underscore the recognized need at that time for a "cost-effective server-side XML delivery system." The lack of such a solution motivated institutions to choose HTML as a delivery method for EAD finding aids.4

Articles like Roth's that describe specific EAD search-and-retrieval implementation options are in short supply. One such option, the University of Michigan DLXS XPAT software, is employed for the search and retrieval of EAD and other metadata in the University of Illinois at Urbana-Champaign (UIUC) Cultural Heritage Repository.5 Another option, harvesting EAD records into machine-readable cataloging (MARC) to establish search and retrieval access in an integrated library system, was described by Fleck and Seadle in a 2002 Coalition for Networked Information task force briefing. Using an XML harvester product created by Innovative Interfaces, MARC records are generated based upon MARC encoding analogs included in the EAD markup and loaded into an Innovative Interfaces INNOPAC system.6 This product has been used to create access to EAD finding aids in the catalog for Michigan State University's Vincent Voice Library.
In a 2001 article, Gilliland-Swetland recommended several desirable features for an EAD search-and-retrieval system. She emphasized the challenge of EAD search and retrieval by noting the nature of finding aids themselves:

Archivists have historically been materials-centric rather than user-centric in their descriptive practices, resulting in the finding aid assuming a form quite unlike the concise bibliographic description with name and subject access most users are accustomed to using in other information systems such as library catalogs, abstracts, and indexes.7

Without describing specific software tools, Gilliland-Swetland argued for a user-centric approach to the search and retrieval of finding aids by examining the needs of specific user communities such as genealogists, K-12 teachers, and historians.8

Several initiatives similar to the NWDA effort are described in the professional literature. The Online Archive of California (OAC), which was founded in the mid-1990s, is a consortium of California special-collections repositories. A number of key consortium functions are centralized, including "monitoring to ensure consistency of EAD encoding across all OAC finding aids" according to agreed-upon best practices, a critical need in the creation of a union catalog.9 Brown and Schottlaender also describe the integration of the OAC into the California Digital Library, which enables linkages between EAD finding aids and digitized copies of original materials.10 Finally, one important development area is the possibility of integrating EAD documents into Open Archives Initiative (OAI) services in order to enhance resource discovery. A 2002 paper written by Prom and Habing, both of whom work with the UIUC Cultural Heritage Repository, explored the possibility of mapping EAD to OAI, the latter of which is based upon the fifteen-element Dublin Core metadata set (unqualified). While noting, "we do not propose that the full capabilities of EAD finding aids could be subsumed by OAI," Prom and Habing suggested that it is possible to map the top-level and component portions of EAD into OAI, resulting in multiple OAI records from a single EAD finding aid. In this scenario, a single OAI record is created from the collection-level information and multiple records from the component-level information in an EAD document.11
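As a rough sketch of the Prom and Habing idea, one unqualified Dublin Core record can be derived from the collection-level description and one from each component. The element paths and sample markup below are simplified assumptions (a flat component structure, two Dublin Core elements only), not their implementation.

```python
# Sketch: derive one collection-level record and one record per component.
import xml.etree.ElementTree as ET

SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle><unitdate>1900-1950</unitdate></did>
    <dsc>
      <c01><did><unittitle>Correspondence</unittitle><unitdate>1900-1925</unitdate></did></c01>
      <c01><did><unittitle>Photographs</unittitle><unitdate>1926-1950</unitdate></did></c01>
    </dsc>
  </archdesc>
</ead>
"""

def dc_record(did):
    """Map a <did> block to a minimal unqualified Dublin Core dictionary."""
    return {"title": did.findtext("unittitle"), "date": did.findtext("unitdate")}

root = ET.fromstring(SAMPLE_EAD)
records = [dc_record(root.find("archdesc/did"))]          # collection-level record
records += [dc_record(c.find("did"))                      # one record per component
            for c in root.findall("archdesc/dsc/c01")]
print(records)
```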
Evaluation of EAD Search and Retrieval Products

In order to identify a software solution for supporting a union catalog of EAD finding aids, the consortium conducted a product evaluation. The strengths and weaknesses of the native XML technology employed by the consortium can be best understood by looking at alternative XML products and product categories. Table 1 shows the products considered during an evaluation period that consisted of both product research and actual trials. In approaching the evaluation, the consortium and its union-catalog host institution, the Washington State University Libraries, had several specific needs in mind. First, the licensing and support costs for the product needed to fit within the consortium's budget. Second, the search-and-retrieval software had to support several basic functions: keyword searching across all union-catalog finding aids; specific field searching based upon elements or attributes in the EAD document; an ability to customize the look and feel of the interface and search-results screens; and the ability to display search term(s) in the context of the finding aid.

As noted in the table, three of the evaluated products are native XML databases. Cyrenne provides a definition of native XML as a database with these features: the XML document is stored intact ("the XML document is preserved as a separate, unique entity in its entirety"); "schema independence," that is, "any well-formed XML document can be stored and queried"; and the query language is XML-based ("native XML database vendors typically use a query language designed specifically for XML," as opposed to SQL).12

Of the three native XML products, only the licensing costs of Ixiasoft's TextML and the open-source Xindice software fell within the available project funding. Both packages were extensively tested, with TextML proving superior at handling the large (sometimes in the MB-size range) and structurally complex EAD documents created by consortium members. One key strength of TextML that met an NWDA consortium need involved field searching. In TextML, it is possible to map a search field to one or more XPath statements, enabling the creation of search fields based upon the precise use of an element or attribute in EAD documents. The importance of this capability is shown by an EAD element that can appear both at the collection level and at the subordinate component level in a document; with TextML, using its limited XPath support, it is possible to reference a specific, contextual use of such an element.

In addition to the native XML solutions, several other product types were considered. An XML query engine, Verity Ultraseek, was tested and produced good results when used for the search and retrieval of consortium documents.13 Ultraseek can be used to search discrete XML files, supports the creation of custom interfaces for the search-and-retrieval system, and has strong documentation. Probably the most obvious limitation in this XML query-engine product concerned the creation of search fields. To contrast Ultraseek with a native XML solution: Ultraseek 5.0 (used during the product trial) lacked XPath support. Instead, it required a unique element-attribute combination for the creation of a database search field. Returning to the example, contextual uses of the element could not be indexed without recoding consortium documents to create a unique element-attribute combination on which to index. An XML-enabled database, DLXS XPAT, has been successfully used in several EAD projects, including OAC. One disadvantage of this product is that it requires a Unix operating system for the server. Additionally, XPAT, as a supporting toolset for digital-library collection building, provides functionality that duplicates other media tools at the host institution (specifically, OCLC/DiMeMa CONTENTdm).
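To make the field-searching contrast described above concrete, the sketch below defines search fields as context-specific paths, in the spirit of (but not using) TextML's index definitions. The <unittitle> element is chosen here purely as an illustration of an element that occurs at both the collection and the component level; real finding aids nest components more deeply than the flat paths assumed in this sketch.

```python
# Sketch: map named search fields to context-specific element paths.
import xml.etree.ElementTree as ET

FIELD_PATHS = {
    "collection_title": "archdesc/did/unittitle",          # collection-level use
    "component_title":  "archdesc/dsc/c01/did/unittitle",  # first-level component use
}

def build_index(ead_xml):
    """Return {field name: [values]} by evaluating each field's path."""
    root = ET.fromstring(ead_xml)
    return {name: [el.text for el in root.findall(path)]
            for name, path in FIELD_PATHS.items()}
```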
The use of a relational database management system (RDBMS) to establish search and retrieval for EAD XML documents was considered as well. The advantage of this approach is that it would enable the use of coding techniques built up through other web-based media delivery projects at the host institution. The most obvious negative issue is the need to map XML elements or attributes to tables and fields in an RDBMS, which, as Cyrenne notes, "is often expensive and will most likely result in the loss of some data such as processing instructions, and comments as well as the notion of element and attribute ordering."14 The use of native XML avoids the task of exploding XML data into the table and field structures of an RDBMS.

Table 1. NWDA project: evaluated search and retrieval products

Product              Vendor                  Product category                        License
MySQL/PHP            n/a                     relational database management system   open source
Tamino XML Server    Software AG             native XML database                     commercial
TextML               Ixiasoft                native XML database                     commercial
Ultraseek            Verity                  XML query engine                        commercial
Xindice              n/a                     native XML database                     open source
XML Harvester        Innovative Interfaces   integrated library system               commercial
XPAT                 DLXS                    XML-enabled database                    commercial

Finally, another approach considered was the use of an integrated library system product. This was a realistic option for NWDA because consortium member institutions had decided to include MARC encoding analogs for selected elements in union-catalog finding aids. Innovative Interfaces produces an XML harvester that can be used to generate MARC records from EAD finding aids that include MARC encoding analogs. For this project, a local (or self-contained) catalog could have been created and populated with MARC records containing metadata for the EAD documents, including a URL for online access. This approach offers important strengths and weaknesses. On the positive side, it is a relatively easy method for enabling search-and-retrieval access to EAD finding aids. In contrast to the interface coding requirements for TextML, the XML harvester provided an almost turnkey approach to XML search and retrieval. On the negative side, two factors stood out during the evaluation. First, it would be difficult to fully customize search-and-retrieval interfaces as needed for the project. Second, using the XML harvester, there is no ability to display search terms in the context of the finding aid. Search and retrieval is based upon the metadata extracted from the finding aid using the MARC analogs; in Michigan State's Voice Library implementation of this solution, the finding aid is an external resource with no highlighting of search terms.

Strengths and Weaknesses of the TextML Approach

Each project has its own specific needs; thus, there is no one correct approach to establishing search and retrieval for EAD XML documents. Taking the needs and resources of the NWDA consortium into account, Ixiasoft's TextML, a native XML product, provided the best fit and was licensed for use. The use of TextML enables the creation of customized interfaces for an XML database (or document base, to use the TextML terminology) and provides support for keyword and field searching of consortium documents.
The qualified XPath support in TextML enables search fields to be built upon precise element or attribute combinations within EAD documents. The existence of a major finding-aids Internet site employing TextML was also a factor in the project's selection of the software. The Access to Archives (A2A) site, accessible from the URL www.a2a.pro.gov.uk/, provides an excellent model for a publicly searchable finding-aid site. The A2A site supports keyword searching and searching by archival facility; provides multiple views of search results (a summary records screen, search terms in context, and the full record); highlights search term(s) in the displayed finding aid; and supports the presentation of large finding-aid documents. While A2A uses General International Standard Archival Description, or ISAD(G), as opposed to EAD for its description standard, the similarities between the two standards make the A2A site a valuable example for development.15

One weakness of TextML is the implementation model supported by Ixiasoft, which assumes significant local development of the application or web interface. The relationship between software capabilities and local development was considered with each of the products listed in Table 1. As noted, the Innovative Interfaces solution was the most straightforward approach, assuming the existence of the MARC analogs in EAD markup, but provided the least flexibility in terms of customization and of establishing a true linkage between the search system and the actual document. In contrast, while Ixiasoft makes available a base set of Active Server Pages using Visual Basic Script (ASP/VBScript) code for TextML application development and provides very good training and support services, the responsibility for
daniel pitti, "encod ed archi va l descrip tion: an introducti on and overview," d-lib magazine 5, no. 11 (nov. 1999). accessed nov. 2, 2004, www.dlib. org / dlib / novemb er99 / 11 pi tti.h tml. 3. daniel v. pitti and wendy m. duff (eds.), "introdu ction ," in encoded archival description on the internet (binghamton, n.y.: haworth, 2001), 3. 4. james m. roth, "serving up ead: an exploratory stud y on the deployment and utili zation of encoded archiva l description findin g aids, " the american archivist 64, n o. 2 (fall/ winter 2001): 226. 5. sarah l. shreeves et al., "h arvesting cultural her itage metada ta using the oai protocol," library hi tech 21, no. 2 (2003): 161. 6. nanc y fleck and michael seadle, "ead harv es ting for the na tional gallery of the spoken word" (pap er present ed at th e coalition for netw orked information fall 2002 task force meeting, san antoni o, tex., dec. 2002). accessed nov. 2, 2004, www.cn i.org/ tfms/20 02b. fall/ handout s/ h-ead-fleckseadl e.doc . 7. anne j. gilliland -swetland, "popularizing the finding aid : exploiting ead to enhance online discovery and retrieval," in encoded archival description on the internet (binghamton, n.y.: h aw orth, 2001), 207. 8. ibid , 210-14. 9. charlotte b. brown and brian e. c. schottlaender, "the online archive of california: a consor tia! approach to encoded archival description ," in encoded archival description on the internet (binghamton, n .y.: haworth , 2001), 99. 10. ibid, 103-5. oac available at: www. oac.cd lib. o rg/. accessed nov . 2, 2004 . 11. christ ophe r j. prom and thomas habing, "using the op en archives initiative protocols with ead," in proceed ings of the second acm/ ieee-cs joint conference on digital libraries (portland, ore., july 2002). accessed nov. 2, 2004, http:// dli .grainger. ui uc.ed u / publ ications/ jcdl20 02/ pl4prom .pdf . 12 . marc cyre nn e, "going n ative : wh en should you use a nativ e xml database?" aim e-doc magazine 16, no. 6 (nov./ dec. 2002), 16. accessed nov. 2, 2004, www .edocmag az ine.com / ar ticle_ new.asp?id=25421. 13. product categor y decisions based up on definitions and classifications available from : ronald bourret, "xml database products." accessed nov. 2, 2004, www. rp bourret.com / xml / xmld a t a b a se prods.htm. 14. cyrenne, "going native, " 18. 15. bill stockting, "ead in a2a," microsoft powerpoint present at ion. accessed n ov. 2, 2004, www.agad .archiwa. gov.pl/ ead / stocking.ppt. 16. uw e ho henst ein, "supporting xml in oracl e9i," in akmal b. chaudhri, 184 information technology and libraries i december 2004 awais rashid, and roberto zicari (eds.), xml data management: native xml and xml-enabled database systems (boston: add ison -wesley, 2003), 123-4 . using gis to measure in-library book-use behavior jingfeng xia this article is an attempt to develop geographic information syst ems (gis) technologi; into an analytical tool for examining the relationships between the height of the bookshelves and the behavior of library readers in utiliz ing books within a library. the tool would contain a database to store book-use information and some gis maps to represent bookshelves. upon analyzing the data stored in the database, different frequ encies of book use across bookshelf layers are displayed on the maps. the tool would provide a wonderful means of visualization through which analysts can quickly realize the spatial distribution of books used in a library. 
This article reveals that readers tend to pull books out of the bookshelf layers that are easily reachable by human eyes and hands, and it thus opens some issues for librarians to reconsider in the management of library collections.

Several years ago, when working as a library assistant reshelving books in a university library, the author noted that the majority of books used inside the library were from the mid-range layers of bookshelves. That is, proportionally, few books pulled out by library readers were from the top or bottom layers. Books on the layers that were easily reachable by readers were frequently utilized. Such a book-use distribution pattern made the job of reshelving books easy, but it raised some questions: how could book locations influence the choices of readers in selecting books? If this was not an isolated observation, it must have exposed an interesting

has shown that interactive television programs:
1. serve as an initial introduction to naive audiences of what a truly interactive system is all about;
2. are difficult to implement;
3. really aren't democratic;
4. are basically polling devices.

It has been said that the reason that railroads went out of business was that they insisted that they were in the railroad business and wouldn't admit that they were in the transportation business. If cable operators insist that they are in the television business, they may well miss the opportunities that are possible in the communications business or, in fact, in the information business. By the same token, if libraries miss the significance of what cable television is bringing to their business, their role in the community will be diminished and libraries may go the way of railroads. Modern communications and computers offer an opportunity for libraries to become the information choice in their community. In the near future, applications such as the home book club may well be a way to provide increased accessibility of library services to library patrons, and to "condition" those patrons to the coming electronic nature of libraries. Over the long term, libraries, if they have the courage and the foresight, can be the focus of the coming information and telecommunications revolution. The message is quite clear: opportunities abound.

An Informal Survey of the CTI Computer Backup System

Joseph Covino and Sheila Intner: Great Neck Library, Great Neck, New York.

In order to help decide whether or not to purchase computer backup systems from Computer Translation, Inc. (CTI),* for use when the CLSI LIBS 100 automated circulation system is not operating, Great Neck Library conducted an informal survey of libraries using both systems. Eleven institutions, including both public and academic libraries, responded to a brief questionnaire.
They were asked what size CTI system they had purchased and why, how easily it was installed, how well it performed, how it was maintained, and whether CLSI acknowledged that the addition of the backup did not affect their LIBS 100 maintenance agreements. Before summarizing the responses, the structure of the two systems and how they interact should be outlined.

CLSI LIBS 100

The CLSI automated circulation system consists of a stand-alone minicomputer console with local and/or remote terminals connected to it through individual ports by means of electrical and/or dedicated telephone line hookups. When it operates, the terminals are online and interactive with the database, which is stored on one or more multiplatter disc packs.

CTI Backup

The CTI backup system is based on an Apple II microcomputer with two minidisc drives, which take 5 1/4-inch floppy discs, a TV monitor, and a switching system that can be connected to the LIBS 100 console or its terminals. The CTI system can also be used alone. When the LIBS 100 is down (inoperative), the CTI system is connected to a terminal, and data is recorded on its discs for later dumping (data entry) into the database via a port connection. It appears to the public and to the library staff member operating the backup-terminal combination that the terminal is working. There is, however, no connection between the backup unit and the database in this mode. When the LIBS 100 is up (operating) once again, the backup is connected and data is automatically dumped. Naturally the port cannot be used by both the CLSI terminal and the backup unit at the same time without the addition of other hardware. The terminals attached to other ports may operate normally while dumping is completed. The CLSI and CTI software, which operate compatibly, are owned by the respective companies, not the library.

*CTI is a profit-making company wholly owned by Brigham Young University. The CTI backup system was originally developed to support the CLSI installation at BYU.

The Responses

1. Size of system: CTI systems are available in two sizes, 32K and 48K. Two libraries purchased the smaller system, nine purchased the larger system, and one purchased both. The greater programming capabilities of the larger system were considered its greatest asset.

2. Reason for purchase: Five libraries indicated they use the backup for other purposes in addition to substituting for the LIBS 100 when it is down. Among these other purposes were development of a community information database, personnel and financial reports and files, use as an RLIN terminal, use as a bookmobile terminal, and use as an aid in converting short-title bibliographic records to expanded format.

3. Installation: Respondents were unanimous in having no problems with installation. Seven did their own installation, while CTI gave instructions over the phone. Three were installed by CTI, who also trained the library staff in its operation. One library indicated the accompanying documentation was enough to install the system without assistance.

4. Performance: All eleven respondents were enthusiastic about system performance. Some comments were, "it's the best thing since buttered popcorn," and "we love it dearly . . . it saves hours . . . works just fine." Many commented on the slow dumping time as the biggest drawback, but noted that increased accuracy over manual entry and decreased pressure on their circulation staff during downtime were assets.

5.
the responses.
1. size of system: cti systems are available in two sizes, 32k and 48k. two libraries purchased the smaller system, nine purchased the larger system, and one purchased both. greater programming capabilities of the larger system were considered its greatest asset.
2. reason for purchase: five libraries indicated they use the backup for other purposes in addition to substituting for the libs 100 when it is down. among these other purposes were development of a community information database, personnel and financial reports and files, use as an rlin terminal, as a bookmobile terminal, and as an aid in converting short-title bibliographic records to expanded format.
3. installation: respondents were unanimous in having no problems with installation. seven did their own installation, while cti gave instructions over the phone. three were installed by cti, who also trained the library staff in its operation. one library indicated the accompanying documentation was enough to install the system without assistance.
4. performance: all eleven respondents were enthusiastic about system performance. some comments were, "it's the best thing since buttered popcorn," and "we love it dearly ... it saves hours ... works just fine." many commented on the slow dumping time as the biggest drawback, but noted that increased accuracy over manual entry and decreased pressure on their circulation staff during downtime were assets.
5. maintenance: backup system maintenance is not uniform. six respondents said that software was maintained by cti, but hardware was maintained by an apple dealer; or they were undecided about who would be responsible for hardware repairs. a seventh library contracted with an apple dealer for hardware repairs, but was contending over software maintenance with cti. three libraries answered that cti was maintaining the system, but did not specify both hardware and software. the last respondent expected to take hardware repairs to an apple dealer and did not mention software.
6. clsi maintenance agreements: one library stated that they had written assurance from clsi that the installation of the backup system would not affect their libs 100 maintenance contract. three more said they had verbal assurances. five respondents indicated no assurances from clsi that the libs 100 contract was not affected. one library sent a copy of a clsi letter defining company policy in this area. it said, in part: "clsi does not prohibit the attachment of foreign devices to the systems. ..." qualifications to this statement involved an institution's attempt to repair the libs 100 itself, to hold clsi responsible for damage resulting from the attachment of the device, or to have clsi maintain the device.

the great neck library decided to purchase two cti backup systems for use when the libs 100 is down. experience bears out the findings of the survey; i.e., it is easy to install the system with only telephone assistance; it works well; and, though data transmission to the main unit is slow, it is accurate and removes some of the desperation from a downtime situation. great neck library is also planning to use the apples for other functions, which, it is hoped, will be implemented soon.

multimedia catalog: com and online
kenneth j. bierman: tucson public library, tucson, arizona.
like many public libraries, the tucson public library (tpl) is closing its card catalog and implementing a vendor-supplied microform catalog. unlike most of these other libraries, however, the tpl microform catalog will not include location or holding information. the indication of where copies of a particular title are actually available (i.e., which of the fifteen possible branch locations) will be available only by accessing a video display terminal connected to the online circulation and inventory control system. conceptually, the tpl catalog will be in two parts, with each part intended to serve different functions.1 the microform catalog (copies available in both film and fiche format) will fulfill the bibliographic function of the catalog. this catalog will contain bibliographic description and provide the traditional access points of author, title, and subject. the online catalog (online terminals are in place at all reference desks and a few public access terminals will also be available) will fulfill the finding or locating function of the catalog. this catalog will contain very brief bibliographic description, will only be searchable by author, title, author/title, and call number, and will contain the current status of every copy of every title in the library system (i.e., on shelf, checked out, at bindery, reported missing, etc.). why did the tucson public library make this decision? there are two major reasons: 1. accuracy. the location information, if provided in the microform catalog, would always be inaccurate and out of date.
assuming that the locations listed in the latest edition of the microform catalog were completely accurate when the catalog was first issued (an unrealistic assumption to begin with, as anyone who has ever worked with location information at a public library with many branches well knows!), the location information would become increasingly less accurate with each day because of the large number of withdrawals, transfers, and added-copy transactions that occur (more than 100,000 a year). in addition, at any given time, one-quarter to one-third of the materials in busy branches are not on the shelf because they are either checked out or waiting to be reshelved. thus, the microform catalog would indicate that these materials were available at specific branches when a significant percentage would in fact not be available at any given time. in short, even in the best of circumstances, easily half of the location information would be incorrect in telling a user where a copy of a title was actually available at that moment. 2. cost. a study done at the tucson public library indicated that close to half of the staff time of the cataloging department was spent dealing with location and holding information. this time includes handling transfers, withdrawals, and added copies. all of this record keeping is already being done as a part of the online circulation and inventory control system (the tucson public library has no card shelflist containing copy and location information but rather relies completely on the online file for this type of information). to "duplicate" the information in the microform catalog would cost an estimated $40,000 to $60,000 a year, and the information in the microform catalog would never be accurate or up to date for the reasons outlined above. figure 1 is a brief summary of how the bibliographic system will work. would the system in figure 1 be improved if holdings were included in the microform catalog? on the surface, the obvious answer is yes-more information is

managing metadata for philatelic materials
megan ozeran
abstract. stamp collectors frequently donate their stamps to cultural heritage institutions. as digitization becomes more prevalent for other kinds of materials, it is worth exploring how cultural heritage institutions are digitizing their philatelic materials. this paper begins with a review of the literature about the purpose of metadata, current metadata standards, and metadata that are relevant to philatelists. the paper then examines the digital philatelic collections of four large cultural heritage institutions, discussing the metadata standards and elements employed by these institutions. the paper concludes with a recommendation to create international standards that describe metadata management explicitly for philatelic materials.

introduction
postage stamps have existed since great britain introduced them in 1840 as a way to prepay postage. historian and professor winthrop boggs (1955) points out that postage stamps have been collected by individuals since 1841, just a few months after the first stamps were issued (5). to describe this collection and research, the term philately was coined by a french stamp collector, georges herpin, who "combined two greek words philos (friend, amateur) and atelia (free, exempt from any charge or tax, franked)" (boggs 1955, 7).
thus postage stamps and related materials, such as the envelopes to which they have been affixed, are considered philatelic materials. in the united states, numerous societies have formed around philately, such as the american philatelic society, the postal history society, the precancel stamp society, and the sacramento philatelic society (in northern california). the definitive united states authority on stamps and stamp collecting for nearly 150 years has been the scott postage stamp catalogue, which was first created by john walter scott in 1867 (boggs 1955, 6). the scott catalogue "lists nearly all the postage stamps issued by every country of the world" (american philatelic society 2016). philately is a massively popular hobby, and cultural heritage institutions have amassed large collections of postage stamps through collectors' donations. in this paper, i will examine how cultural heritage institutions apply metadata to postage stamps in their digital collections. libraries, archives, and museums have obtained specialized collections of stamps over the decades, and they have used various ways to describe these collections, such as through creating finding aids. only recently have institutions begun to digitize their stamp collections and make the collections available for online review, as digitization in general has become more common in cultural heritage institutions. megan ozeran (megan.ozeran@gmail.com), a recent mlis degree graduate from san jose state university school of information, is winner of the 2017 lita/ex libris student writing award.

problem statement
textual materials have received much attention in regards to digitization, including the creation and implementation of metadata standards and schemas. philatelic materials are not like textual materials, and are not even like photographic materials, which have also received some digitization attention. in fact, there is very little literature that currently exists describing how metadata is or should be applied to philatelic materials, even though digital collections of these materials already exist. therefore, the goal of this paper is to examine exactly how metadata is applied to digital collections of philatelic materials. several related questions drove the research about this topic: as institutions digitize stamp collections, what metadata schema(s) are they using to do so? are current metadata standards and schemas appropriate for these collections, or have institutions created localized versions? what metadata elements are most crucial in describing philatelic materials to enhance access in a digital collection?

literature review
while there is abundant literature regarding the use of metadata for library, archives, and museum collections, there is a dearth of literature that specifically discusses the use of metadata for philatelic materials. indeed, there is no literature at all that analyzes best practices for philatelic metadata, despite the fact that several large institutions have already created digital stamp collections. even among the many metadata standards that have been created, very few specify metadata guidelines for philatelic collections. it is clear that philatelic collections have not been highlighted in discussions over the last few decades about digitization, so best practices must be inferred based on the more general discussions that have taken place.
the purpose and quality of metadata
when considering why metadata is important to digital collections (of any type), it is crucial to remember, as david bade (2008) puts it, "users of the library do not need bibliographic records at all. ... what they want is to find what they are looking for" (125). in other words, the descriptive metadata in a digital record is important only to the extent that it facilitates the discovery of materials that are useful to a researcher. as arms and arms (2004) point out, "most searching and browsing is done by the end users themselves. information discovery services can no longer assume that users are trained in the nuances of cataloging standards and complex search syntaxes" (236). echoing these sentiments, chan and zeng (2006) write, "users should not have to know or understand the methods used to describe and represent the contents of the digital collection" (under "introduction"). when creating digital records, then, institutions need to consider how the creation, display, and organization of metadata (especially within the search system) make it easier or more difficult for those end users to effectively search the digital collection. how effective metadata is in facilitating user research is ultimately dependent upon the quality of that metadata. bade (2007) notes that information systems are essentially a way for an institution to communicate with researchers, and that this communication is only effective if metadata creators understand what the end users are looking for in the content and style of communication (3-4). thus, in somewhat circular fashion, metadata quality is dependent upon understanding how best to communicate with end users. to help define discussions of metadata quality, bruce and hillmann (2004) suggest seven factors to consider: "completeness, accuracy, provenance, conformance to expectations, logical consistency and coherence, timeliness, and accessibility" (243). deciding how to prioritize one or several factors over the others will depend on the resources and goals of the institution, as well as the ultimate needs of the end users.
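bruce and hillmann's factors are criteria rather than an algorithm, but a rough checklist can make the idea concrete. the python sketch below scores a record for one factor, completeness, against a locally chosen set of required fields; the field names and the scoring rule are illustrative assumptions for this example, not part of bruce and hillmann's framework.

QUALITY_FACTORS = [
    "completeness", "accuracy", "provenance", "conformance to expectations",
    "logical consistency and coherence", "timeliness", "accessibility",
]

# fields a hypothetical institution decides every stamp record must carry
REQUIRED_FIELDS = ["title", "date_of_issue", "place_of_issue", "description", "identifier"]

def completeness(record: dict) -> float:
    """fraction of locally required fields that are present and non-empty."""
    present = sum(1 for name in REQUIRED_FIELDS if record.get(name))
    return present / len(REQUIRED_FIELDS)

sample = {"title": "penny black", "date_of_issue": "1840", "identifier": "EX-0001"}
print(f"completeness: {completeness(sample):.0%}")  # -> completeness: 60%

how the remaining six factors are weighted is the institution's call, which is exactly the prioritization decision described above.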
the state of standards
standards are created by various organizations to define the rules for applying metadata to certain materials in certain settings. standards generally describe a metadata schema, "a formal structure designed to identify the knowledge structure of a given discipline and to link that structure to the information of the discipline through the creation of an information system that will assist the identification, discovery and use of information within that discipline" (cc:da 2000, under "charge #3"). essentially, a metadata schema standard demonstrates how best to organize and identify materials to enhance discovery and use of those materials. such standards are helpful to catalogers and digitizers because they define rules for how to include content, how to represent content, and/or what the allowable content values are (chan and zeng 2006, under "metadata schema"). unfortunately, very few current metadata standards even mention philatelic materials, despite their unique nature. the only standard that appears to do so with any real purpose is the canadian rules for archival description (rad), created by the bureau of canadian archivists in 1990 and revised in 2008. thirteen chapters comprise the first part of the rad, and these chapters describe the standards for a variety of media. philatelic materials are given their own focus in chapter 12, which discusses general rules for philatelic description as well as specifics for each of nine areas of description: title and statement of responsibility, edition, issue data, dates of creation and publication, physical description, publisher's series, archival description, note, and standard number. the rad therefore provides a decent set of guidelines for describing philatelic materials. the encoded archival description tag library created by the society of american archivists (ead3, updated in 2015) mentions philatelic materials only in passing. there is no specific section discussing how to properly apply descriptive metadata to philatelic materials. the single mention of such materials in the entire ead3 documentation is in the discussion of the tag, where it is noted that "jurisdictional and denominational data for philatelic records" (257) may be recorded. other standards don't appear to mention philatelic materials at all, so implementers of those standards must extrapolate based on the general information provided. for example, describing archives: a content standard (dacs), also published by the society of american archivists (2013), does not discuss philatelic materials in any way. it does note, "different media of course require different rules to describe their particular characteristics…" (xvii), but the recommendations for specific content standards for different media listed in appendix b still leave out philately (141-142). institutions using dacs for philatelic materials need to determine how to localize the standard. although marc similarly does not include specific guidelines for philatelic materials, peter roberts (2007) suggests ways to effectively use it for cataloging philatelic materials. for instance, in the marc 655 field he suggests using the getty art and architecture thesaurus terms to describe the form of the materials and the library of congress subject headings to describe the subjects (genres) of the materials (86-87). in similar ways, most standards could potentially be applied to philatelic materials if an institution were to provide additional local rules for how to best implement the standard.
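roberts's 655 suggestion can be made concrete with a small sketch. the snippet below formats marc 655 genre/form fields as display strings in python; the specific terms shown are examples only, and the formatting helper is an assumption made for illustration rather than a quotation of roberts's own examples or of any marc toolkit.

def marc_655(term: str, source: str) -> str:
    """render a 655 genre/form field for display.
    second indicator 0 means the term comes from lcsh; indicator 7 plus a $2
    subfield names another controlled vocabulary (here, the getty aat)."""
    if source == "lcsh":
        return f"655  0 $a {term}"
    return f"655  7 $a {term} $2 {source}"

# form of the material from the art & architecture thesaurus,
# subject/genre from library of congress subject headings (example terms only)
fields = [
    marc_655("postage stamps", "aat"),
    marc_655("postage stamps -- collectors and collecting", "lcsh"),
]
print("\n".join(fields))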
the metadata that philatelists want
there are actually a good number of resources for determining what metadata is important to philatelic researchers. boggs (1955) suggests that a philatelist may want to "study the methods of production; the origin, selection, and the subject matter of designs; their relation to the social, political and economic history of the country of issue; the history of the postal service which issued them" (1-2). these few initial research suggestions can provide some insight into what metadata elements would be most useful in a digital record. david straight (1994) suggests the most basic crucial items are the date and country of issue for an item (75). roberts (2007) provides significant background about philatelic materials and research, and indicates multiple metadata elements that will be helpful for researchers. he reiterates that dates are extremely useful, and are often identified on the materials themselves; when specific dates are not visible, a stamp itself may provide evidence of an approximate year based on when the stamp was issued (75). he notes that many of the postal markings also "indicate the time and place of origin, route, destination, and mode of transportation" (78), which will also be of interest to philatelic researchers. if any information is available about the original collector, dealer, or exhibitor of the stamp before it was acquired by a cultural heritage institution, this may also be of great interest to a researcher (81). roberts also suggests that the finding aids for philatelic collections are more crucial places for description than for specific item records, and that controlled vocabulary subject terms are important in these descriptions (86). because the scott postage stamp catalogue is the leading united states authority on stamps, it can also suggest the metadata elements that primarily concern philatelic researchers. each listing includes a unique scott number, paper color, variety (e.g., perforation differences), basic information, denomination, color of the stamp, year of issue, value used/unused, any changes in the basic set information, and the total value of the set (scott publishing co. 2014, 14a). the scott catalogue also describes a variety of additional components that researchers may be interested in, including the type of paper used, any watermarks, inks used, separation type, printing process used, luminescence, and gum condition (19a-25a). one additional interesting source for deciding what metadata is important to researchers (aside from directly surveying them, of course) is a piece of software that was created to help philatelists catalog their own private collections. stampmanage is available in united states and international versions, and it is largely based on the scott postage stamp catalogue in creating the full listing of stamps that may be available to a collector. it includes a wide variety of metadata elements for cataloging stamps, such as the scott number, country of origin, date of issue, location of issue, type of stamp, denomination, condition, color, brief description, presence and type of perforations, category, plate block size, mint sheet size, paper type, presence and type of watermark, gum type, and so forth (liberty street software 2016). as a product that is sold to stamp collectors, stampmanage is likely to have a confident grasp of all the metadata that could possibly be important to its customers. this literature review helps create a holistic view of the issues faced by cultural heritage institutions with digitized stamp collections. although little progress has been made in the literature to describe how best to apply metadata to philatelic materials, there are ways that institutions can extrapolate guidelines from the literature that does exist.

methodology
to explore my research questions, i interviewed (over email) representatives of several large institutions with digitized stamp collections. the information provided by these institutions sheds light on the current state of metadata and metadata schemas for philatelic collections. note that there are other institutions with online collections of postage stamps that are not discussed in this paper (e.g., the swedish postal museum, https://digitaltmuseum.se/owners/s-pm). due to my own language limitations, this paper is limited to analysis of online collections that are described in english.
additional research into institutions with non-english displays would support greater analysis of how cultural heritage institutions are currently creating and providing philatelic metadata.

results
smithsonian national postal museum
in the united states, the largest publicly accessible digital collection of philatelic materials is from the smithsonian national postal museum. i discussed the metadata for this collection with elizabeth heydt, collections manager at the museum. ms. heydt stated that the stamps are primarily identified "by their country and their scott number" (e. heydt, pers. comm., october 5, 2016). for digital collections, the smithsonian national postal museum uses a gallery systems database called the museum system, which includes the getty art and architecture thesaurus as an embedded thesaurus. ms. heydt noted that aside from this embedded thesaurus, they "do not use any additional, formalized data standards such as the dublin core, mods," or the like. of note, the museum system does allege compliance with "standards including spectrum, cco, cdwa, dacs, chin, lido, xmp, and other international standards" (gallery systems 2015, 4). the end user interface that pulls data from the museum system is called arago, which has "an internal structure that built on the scott catalogue system and some internal choices for grouping and classifying objects for the philatelic and the postal history collections." users can search and browse the entire digital collection through arago, but ms. heydt did note that arago "is in stasis right now as we are in the planning stages for an updated version sometime in the near future." based on an example record (http://arago.si.edu/record_145471_img_1.html), the descriptive metadata currently available for end users include a title, scott number, detailed description (including keywords), date of issue, medium, museum id (a unique identifier), and place of origin. digital images of the stamps are also included. a set of "breadcrumb" links at the top of the page also allows a user to browse each level of the digital collection, from an individual stamp record up to the entire museum collection as a whole.

library and archives canada
i discussed the library and archives canada (lac) online philatelic collection with james bone, archivist at the lac. he explained that the philatelic collection has had a complicated history:

our philatelic collection largely began with the dissolution of the national postal museum … in 1989 and the subsequent division and transfer of its collection to the canadian postal museum for artifacts/objects at the former canadian museum of civilization (now the canadian museum of history) and to the canadian postal archives at the former national archives (which was merged with the national library in the mid-2000s to create library and archives canada). as a side note, both the canadian postal museum and the canadian postal archives are themselves now defunct – although lac still acquires philatelic records and records related to philately and postal administration, these functions are no longer handled by a dedicated section but rather by archivists within our government records branch and our private records branch (the latter being me). (j. bone, pers. comm., october 11, 2016)

regarding the collection's metadata, mr.
bone confirmed that the archival records at the lac all conform to the rad standard (discussed in the literature review above), and that philatelic materials are all given "at least a minimum level of useful file level or item level description for philatelic records based on chapter 12 of rad," the chapter that specifically discusses philatelic materials. unfortunately, to his knowledge, the online database for these records does not use a common metadata standard such as oai-pmh that enables "external metadata harvesting or querying," so the system is not searchable outside of the lac website. mr. bone also pointed out that there are fields visible on the back end of the lac online database that are not visible to end users, and the most notable of these omissions is the scott number (the number assigned to every stamp by the scott catalogue). he wrote that it seemed "bizarre" to not have the scott number visible, "as that's definitely an access point that i would expect philatelic researchers to use to narrow down a result set to the postage stamp issue of interest." however, it appears this invisibility was a decision consciously made by the lac, based on mr. bone's review of an internal lac standards document. based on an example record (http://collectionscanada.gc.ca/pam_archives/index.php?fuseaction=genitem.displayitem&lang=eng&rec_nbr=2184475), the following fields are available for end users to view: title, place of origin, denomination, date of issue, title of the collection of which it is a part, extent of item, language, access conditions, terms of use, mikan number (a unique identifier), itemlev number (deprecated), and any additional relevant information such as previous exhibitions of the physical item.

the postal museum
the postal museum in london is set to open its physical doors in 2017, but much of the collection is already available for browsing and searching online. stuart aitken, curator, philately, explained to me that the online collection uses the general international standard archival description, second edition, as the primary metadata schema, but the online collection also includes "non isad(g) fields for certain extra-specific data for our archive material, including philatelic material" (s. aitken, pers. comm., december 1, 2016). based on my own review of the isad(g) standards document (international council on archives 1999) and an example record from the postal museum's online collection (http://catalogue.postalmuseum.org/collections/getrecord/gb813_p_150_06_02_011_01_001#current), it appears nearly all the fields are based on the isad(g) standards. these fields include information such as date, level of description, extent of item, language, description, and conditions for access and reproduction. only the field for "philatelic number" appears to be extra. there may be additional non-isad(g) fields that are not included in the example record above, but are included in other records when the extra information is available and relevant. each digital record also allows end users to submit tags for help with identification and search. no tags had yet been submitted on the example record reviewed above, but this is likely because the online collection is still rather new. of note, digital records are created at each archival level, from the broadest collection category down to the individual item (similar to the smithsonian national postal museum collection).
to provide an additional way to browse the collection, a sidebar in each digital record shows where it exists in the hierarchy of collections and provides links to each broader collection of which the current record is a part.

the british museum
i reached out to the folks at the british museum to discuss the application of metadata to their online records for postage stamps, but at the time of this writing i have not received any response. however, some information can be gleaned from examining the website. unlike the other institutions reviewed in this paper, the british museum's online collection includes a wide variety of objects. postage stamps are therefore identified in the online collection by specifying "postage-stamp" in the "object type" field, which likely uses a controlled vocabulary. based on an example record (http://www.britishmuseum.org/research/collection_online/collection_object_details.aspx?objectid=1102502&partid=1&searchtext=postage+stamp&page=1), each record for a postage stamp lists the museum number (a unique identifier), denomination, description, date issued, country of origin, materials, dimensions, acquisition name and date, department, and registration number (which appears to be the same as the museum number). digital images of the stamps are occasionally included. the collection website notes that the british museum is "continuing every day to improve the information recorded in it [the digital collection] and changes are being fed through on a regular basis. in many cases it does not yet represent the best available knowledge about the objects" (trustees of the british museum 2016a, under "about these records"). therefore, end users are encouraged to read the information in any given record with care, and to provide feedback if they have any additional information or corrections about an object. the online collection also is offered in machine-readable format, via linked data and sparql, to encourage wider accessibility and use. the website advises:

the use of the w3c open data standard, rdf, allows the museum's collection data to join and relate to a growing body of linked data published by other organisations around the world interested in promoting accessibility and collaboration. the data has also been organised using the cidoc crm (conceptual reference model) crucial for harmonising with other cultural heritage data. the cidoc crm represents british museum's data completely and, unlike other standards that fit data into a common set of data fields, all of the meaning contained in the museum's source data is retained. (trustees of the british museum 2016b)

each digital object has rdf and html resources, as well as a sparql endpoint with an html user interface.
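to show what that machine-readable access makes possible, here is a hedged python sketch that queries a sparql endpoint with the sparqlwrapper library. the endpoint address and the object uri below are placeholders rather than confirmed values, and the museum's actual graph structure (cidoc crm classes and properties) should be checked against its own documentation before relying on any particular query.

from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

ENDPOINT = "http://collection.britishmuseum.org/sparql"                # placeholder address
OBJECT_URI = "http://collection.britishmuseum.org/id/object/EXAMPLE"   # hypothetical object uri

sparql = SPARQLWrapper(ENDPOINT)
sparql.setReturnFormat(JSON)
# list every property/value pair asserted about one object; this generic query
# works against any sparql endpoint without assuming a particular ontology
sparql.setQuery(f"""
    SELECT ?property ?value
    WHERE {{ <{OBJECT_URI}> ?property ?value }}
    LIMIT 25
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["property"]["value"], "->", row["value"]["value"])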
discussion
the information from the four institutions above provides a starting point for examining best practices for philatelic metadata. in the following discussion, i will review the information in light of the research questions: important metadata elements, the standards that were implemented, and whether the standards that currently exist have been sufficient. as explained in the literature review above, relevant metadata are crucial for enhancing end user research of digital records. this suggests that similarity of metadata across collections of the same type will improve users' ability to conduct their research. unfortunately, there are only a few descriptive metadata fields used across all four of the institutions reviewed in this paper. these fields include a title (sometimes used very loosely), the date of issue, the place of issue, a description, and a unique identifier (a sketch of a record built on this minimum appears at the end of this discussion). these fields certainly seem to be the absolute minimum necessary for identifying (and searching for) a postage stamp, since they are among the fields discussed in the literature review as being important to philatelic researchers. other fields that are included in some but not all of the above collections, such as stamp denomination and access conditions, are nonetheless quite relevant to online collections of postage stamps. interestingly, although the scott catalogue is recognized as a premier stamp catalogue, only one institution (the smithsonian national postal museum) currently uses the scott identification number as part of the standard philatelic metadata. as noted above, the library and archives canada does include the scott number in the behind-the-scenes metadata, but it does not display the scott number to end users. the postal museum and the british museum don't use the scott number at all. it appears that only the smithsonian believes the scott number is useful to end users, either for search or identification purposes. of the four institutions, it appears that only the british museum uses metadata standards that increase the accessibility of the online collection beyond its own website. the implementation of rdf for linked data creates an open collection that is machine-readable beyond the internal database used by the museum. the smithsonian national postal museum, library and archives canada, and the postal museum do not appear to use any similar metadata standard for data harvesting or transmission, which means that these collections can only be searched from within their respective websites. the most important thing to note in reviewing the online collections for these four institutions is the fact that each institution uses different standards to apply metadata in a different way. frankly, this is not a surprise. as discussed in the literature review above, although metadata standards exist for a variety of materials, philatelic materials are simply not considered. only the canadian rules for archival description explicitly include information about philatelic materials; accordingly, the library and archives canada utilizes these rules when creating its online records of postage stamps. no similar standard exists in the united states or internationally, leaving individual institutions with the task of deciding what generic metadata standard to use as a jumping off point, and then modifying it to meet local needs. as described above, the smithsonian national postal museum uses the metadata schema that comes with their collection management software, and has created an end-user interface based off of internal metadata decisions. the postal museum based their metadata primarily off of isad(g), an international metadata standard with no specific suggestions for philatelic materials. i was unable to confirm the base metadata schema the british museum employs, although it is clear they use rdf to make the collection's digital records more widely available. each institution appears to be using a different base metadata standard, essentially requiring them to reinvent the wheel upon deciding to digitize philatelic materials. this is what happens when there is no single, unified standard available for the type of material being described.
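as a way of writing that shared minimum down, the sketch below models a philatelic record in python. it is one possible local structure, not a proposed standard and not any institution's actual schema; the optional fields simply echo elements mentioned earlier in the paper (scott number, denomination, access conditions).

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PhilatelicRecord:
    # the fields shared by all four collections reviewed above
    title: str
    date_of_issue: str            # kept as text, since stamps are often dated only approximately
    place_of_issue: str
    description: str
    identifier: str               # the institution's unique id

    # elements present in some collections and in the philatelic literature
    scott_number: Optional[str] = None
    denomination: Optional[str] = None
    access_conditions: Optional[str] = None
    subjects: list[str] = field(default_factory=list)

example = PhilatelicRecord(
    title="penny black",
    date_of_issue="1840",
    place_of_issue="great britain",
    description="the first adhesive postage stamp, issued to prepay postage",
    identifier="EXAMPLE-0001",
)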
conclusion
as this paper has shown, metadata standards are sorely lacking when it comes to philatelic materials. other kinds of materials have received special considerations because more and more institutions decided it would be important to digitize them, so various groups came together to create standards that provide some guidance. it is time for this to happen for philatelic materials as well. there aren't many cultural heritage institutions that currently manage digital collections of philatelic materials, so this is an opportunity for those who plan to digitize their collections to consider what has been done and what makes sense to pursue. it is clear that philatelic digitization is still nascent, but as with other kinds of materials, it is only likely that more and more institutions will attempt digitization projects. it is hoped that this paper can serve as a jumping off point for institutions to discuss the creation of international metadata standards specifically for philatelic materials.

acknowledgements
many thanks are owed to the people who took time out of their very busy lives to respond to the unrefined inquiries of an mlis grad student: stuart aitken (curator, philately, the postal museum); james bone (archivist, private archives branch, library and archives canada); and elizabeth heydt (collections manager, smithsonian national postal museum). their expertise and responsiveness are immensely appreciated.

references
aape (american association of philatelic exhibitors). 2016a. "aape join/renew your membership." http://www.aape.org/join_the_aape.asp.
–––––. 2016b. "exhibits online." http://www.aape.org/join_the_aape.asp.
american philatelic society. 2016. "stamp catalogs: your guide to the hobby." accessed december 8. http://stamps.org/how-to-read-a-catalog.
arms, caroline r., and william y. arms. 2004. "mixed content and mixed metadata: information discovery in a messy world." in metadata in practice, edited by diane i. hillman and elaine l. westbrooks, 223-37. chicago, il: ala editions.
bade, david. 2007. "structures, standards, and the people who make them meaningful." paper presented at the 2nd meeting of the library of congress working group on the future of bibliographic control, chicago, il, may 9, 2007. https://www.loc.gov/bibliographicfuture/meetings/docs/bade-may9-2007.pdf.
bade, david. 2008. "the perfect bibliographic record: platonic ideal, rhetorical strategy or nonsense?" cataloging & classification quarterly 46 (1): 109-33. https://doi.org/10.1080/01639370802183081.
boggs, winthrop s. 1955. the foundations of philately. princeton, nj: d. van nostrand company.
bruce, thomas r., and diane i. hillmann. 2004. "the continuum of metadata quality: defining, expressing, exploiting." in metadata in practice, edited by diane i. hillman and elaine l. westbrooks, 238-56. chicago, il: ala editions.
bureau of canadian archivists. 2008. rules for archival description. rev. ed. ottawa, canada: canadian council of archives. http://www.cdncouncilarchives.ca/archdesrules.html.
cc:da (american library association committee on cataloging: description and access). 2010. "task force on metadata: final report." american library association. https://www.libraries.psu.edu/tas/jca/ccda/tf-meta6.html.
chan, lois m., and marcia l. zeng. 2006. "metadata interoperability and standardization – a study of methodology part i: achieving interoperability at the schema level." d-lib magazine 12 (6). https://doi.org/10.1045/june2006-chan.
gallery systems. 2015. "tms: the museum system." http://go.gallerysystems.com/abouttms.html.
international council on archives. 1999. isad(g): general international standard archival description. 2nd ed. stockholm, sweden: international council on archives. http://www.icacds.org.uk/eng/isad(g).pdf.
liberty street software. 2016. "stampmanage: the best way to catalog your stamp collection." http://www.libertystreet.com/stamp-collecting-software.htm.
roberts, peter j. 2007. "philatelic materials in archival collections: their appraisal, preservation, and description." the american archivist 70 (1): 70-92. https://doi.org/10.17723/aarc.70.1.w3742751w5344275.
scott publishing co. 2014. scott 2015 standard postage stamp catalogue. vol. 3, countries of the world, g-i. sidney, oh: scott publishing co.
society of american archivists. 2013. describing archives: a content standard. 2nd ed. chicago, il: society of american archivists. http://files.archivists.org/pubs/dacs2e-2013_v0315.pdf.
society of american archivists. 2015. encoded archival description tag library, version ead3. chicago, il: society of american archivists. http://www2.archivists.org/sites/all/files/taglibrary-versionead3.pdf.
straight, david. 1994. "adding value to stamp and coin collections." library journal 119 (10): 75-78. accessed december 8, 2016. http://libaccess.sjlibrary.org/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=ulh&an=9406157617&site=ehost-live&scope=site.
trustees of the british museum. 2016a. "about the collection database online." accessed december 8. http://www.britishmuseum.org/research/collection_online/about_the_database.aspx.
–––––. 2016b. "british museum semantic web collection online." accessed december 8. http://collection.britishmuseum.org/.

book reviews
information ecologies: using technology with heart by bonnie a. nardi and vicki l. o'day. cambridge: mit pr., 1999. 232p. $27.50 (isbn 0-262-14066-7).
the media equation: how people treat computers, television, and new media like real people and places by byron reeves and clifford nass. cambridge: cambridge univ. pr., 1996 and 1999. 305p. $28.95 (isbn 1-575-86052-x); paper, $15.95 (isbn 1-575-86053-8).
the books i am reviewing this month are interrelated because they both focus on information technology and our changing world, with the two volumes looking at different levels of the picture. the broader, and to me more intriguing, view is presented by nardi and o'day in their wonderful book information ecologies. although it is not clear from the capsule biographies of the dust jacket, nardi and o'day are anthropologists who study the world of technology in a number of locales, and they here report the findings from their field work. among the case studies they discuss are an examination of the activities of reference librarians at two corporations and a look at a virtual world created for and by elementary school students. but they do much more than simply present case studies, although these alone make the book a worthwhile read.
in addition, they argue that the most useful way to look at information technology is through the metaphor of "information ecologies," "system[s] of people, practices, values, and technologies in ... particular local environment[s]." they adopt this biological metaphor after carefully considering the most commonly employed information technology metaphors: technology as tool, text, or system. in turn, they find each of these metaphors wanting. it is particularly important to choose carefully the metaphorical lenses through which technological developments are viewed. each particular metaphor has consequences for how sanguinely we view a technology, and it is often worthwhile to use multiple metaphors to enhance our world view. the information ecology metaphor is particularly appropriate for an anthropological view of local "habitats" and their inhabitants and artifacts. in turn, an anthropological view is particularly apt for capturing the human side of technology (thus the subtitle: using technology with heart). this is a side of things that can be overlooked in other metaphorical views, particularly since it requires that the sticky issue of values be considered. unfortunately for all of us, there is a reluctance to talk of human values when considering technology. as nardi and o'day note, there is a tendency to either enthusiastically applaud new technology without regard to its effects, or to condemn all new technology as inherently debasing to humanity, or to simply resign oneself pessimistically to the inevitable development of technology and our lack of control over it. nardi and o'day tend to be cautious optimists, claiming that we can control technology, and the way to exercise that control is through our own local encounters with information ecologies. thus, rather than bemoaning the dehumanizing effects of the internet, information ecologies explores the successful use of internet technologies to set up a virtual world for students and the elderly in phoenix, arizona. instead of thinking or acting globally, exploit the technology locally, but do so in a way that makes sense in terms of human values. on the taxonomic scale of technology views, ranging from gloom and doom (e.g., the views of clifford stoll) to perpetual optimism (e.g., nicholas negroponte), i place nardi and o'day somewhere in the middle, but as i suggested, leaning toward cautious optimism. in fact, they spend several chapters discussing the views of others and offering prescient criticism of the deficiencies of those views. of particular interest to me was their analysis of the french sociologist jacques ellul, who apparently sounded the alarm concerning the stress to mind and soul of constant technological change in 1954, well before the current crop of doomsayers. nardi and o'day find ellul's views, as articulated in the technological society, to be compelling. yet, they claim, the rise of the internet can counteract the trend that ellul saw toward monotonous sameness and lack of diversity in the face of technological efficiency. perhaps so. one thing that i was looking for in information ecologies was some practical tools for engaging in the kind of exploration of information habitats that nardi, o'day, and other anthropologists engage in.
there is a spate of interest lately in the role of anthropologists in the design and deployment of new technologies, and i would like to determine its applicability to my modest software development projects. unfortunately, i was mainly disappointed on this score. in fairness to the authors, they did not set out to spell out the anthropological methodology of exploring information ecologies in any detail. the purpose of the book is rather to argue that viewing the world of technology as a set of interconnected information ecologies is useful and accurate, and in many cases superior to other metaphorical views. they succeed in this goal. now i want them to go on to write a book on using anthropological methods in these ecologies without necessarily becoming a professional anthropologist. nardi and o'day do touch extremely briefly on a few conventions of interviewing subjects, with their most important technical discussion centering on what they call "strategic questioning," which they present in the context of evolving information ecologies. they provide useful categories of questions to be asked, and specific examples. although it may seem obvious to ask penetrating questions of members of an information habitat, this is one area in which software developers in particular fail miserably. another seemingly obvious pointer is to pay attention. again, its obviousness is deceptive, since most of us are poor observers who make many assumptions about the characteristics of a work activity without observational evidence. as evidence that people introducing new technologies to an ecology do not follow these simplest pieces of advice, you can turn to the chapter "a dysfunctional ecology" to see how badly technology can fail for nontechnological reasons. this case study deals with a major teaching hospital that introduced a monitoring system into its neurosurgical operating suites that captured instrument readings as well as complete audio and video. the system was installed to aid neurophysiologists, experts who are called in to advise neurosurgeons at key points during complex surgeries to ensure that patient neurological function is not compromised. the neurosurgeons and neurophysiologists at this hospital decided that it would be more efficient for the neurophysiologists to be able to remotely monitor multiple surgeries simultaneously. both groups failed to consult with the other constituencies among the operating team, the nurses and anesthesiology staff. these groups believed that their privacy was being compromised, particularly since it was possible to tape any procedures at multiple workstations throughout the hospital. i can easily envision similar sorts of problems due to lack of communication in introducing new or modified technology into other milieus, e.g., libraries. although the consequences might not lead to the potentially life-threatening situations that could arise in an operating suite, there are certainly possible outcomes where service to users could be undermined. despite the book being not exactly what i (rather selfishly) want, information ecologies is a first-rate read and an important starting point for those concerned with better controlling technological change in the world of information.
turning from an anthropological point of view to a psychological one, the media equation offers another important basis for technological design and implementation, particularly of computer software and multimedia. the release last year of a paperback edition of this volume, first published in 1996, provides a convenient pretext for reviewing this work. reeves and nass have supervised years of study and experimentation that have consistently demonstrated the truth of what they call the "media equation": that our relations with media, including computers and multimedia, are identical in key ways to our relationships with other human beings. this is true of all of us, even those of us sophisticated enough to understand that we are dealing with devices and human artifacts rather than people. reeves and nass quite entertainingly present the technique they've used over the years to perform their research, on a step-by-step basis:
1. pick a research finding on how people respond to each other or their environment.
2. find the summary of the social or natural rule that the study has yielded.
3. replace the words "person" or "environment" in the summary with media of some sort (television, movies, computers, etc.).
4. find the research procedure.
5. substitute media for one of the people or the environment in the procedure.
6. run the experiment.
7. draw conclusions.
although this may sound facetious, it is in fact the recipe that produced the startling conclusions that we all tend to behave toward media much as we do toward other people. what's perhaps more important is that reeves and nass point toward techniques that practitioners can use to produce more effective media, including computer software. as a simple example, consider politeness. reeves and nass discovered that people treated computers with the same sort of politeness that they would other human beings, and in turn reeves and nass suggest that people respond better to "polite media." they then provide some fairly straightforward advice on producing polite computer programs, starting with grice's maxims, a set of politeness rules assembled by h. paul grice, a philosopher and psychologist. these center around truth telling, appropriate quantity of information (neither too much nor too little), relevance, and clarity. all of this is fairly unsurprising, but the authors spell out just how the maxims can be applied to the construction of computer programs. further, they go on to suggest some rules of thumb of their own. for example, some computer programs produce verbal output but expect the user to key in his or her responses. this may be perceived by the user, possibly subconsciously, as forcing an impolite response, since mixing communications modalities is a faux pas. thus, they suggest that if text input is required, perhaps only text output should be supplied. this should provide you with some of the flavor of the media equation, and in turn you may be able to see a set of potential ethical dilemmas that can arise from utilizing techniques that result from the research of reeves and nass. this set of problems can be seen most clearly in the chapter "subliminal images," where they discuss how subliminal messages could be inserted into new media to advertise products or to attempt to bolster employee morale. in fact, they say, "...
it might be easier to accomplish subliminal intrusions with a computer than with a television, because software can respond to the particular input of individual users and timing is more precise." they immediately temper this insight with the caution that "... ethical and legal issues abound." indeed. although some of the techniques that can be applied to new media do lead to ethical problems, i think that most of what reeves and nass talk about are just elements of good design. subliminal suggestion seems to most of us to be out of bounds because it unfairly manipulates user response in a powerful way. the unfairness is that someone can be manipulated without his or her knowledge to do something outside of the person's normal behavior. although the other techniques tend to subtly alter behavior, they don't generally result in an anomalous action by the user. if you think this is a kind of philosophical hairsplitting, you're right. the onus is upon the programmer or multimedia designer to use these techniques with great care. in a past professional life i wrote computerized patient interviews for the psychiatry department of the university of wisconsin. researchers there and elsewhere found that people were generally more candid with the computer than they were with human clinicians. so the findings of reeves and nass were not quite as surprising to me as they might be to others. what did surprise me, however, is that the media equation is not a phenomenon solely of the naive or inexperienced media and computer users. on the contrary, all of us, no matter how conversant we are with underlying technology, are susceptible to the effects described in the media equation. this vastly increases the power of computer programs and other media for both good and ill. i want to emphasize that not all of the possible effects of human-media interaction are pernicious. most are simply innocuous, and if techniques that benefit users can result from these effects there should be no harm in applying them in software or multimedia. in general, it's desirable to make user experiences of software and media pleasanter and more productive, and reeves and nass do an excellent job of providing pointers throughout the book. there are suggestions with regard to personality, emotion (including arousal), social roles, and form (e.g., image size, fidelity of sound, and video). none of them comes close to being as controversial as subliminal suggestion, although it continues to make me uncomfortable that people react to media as if they were dealing directly with other human beings. this is a disquieting finding, but it should not dissuade us from our jobs of designing good systems for users. all in all, information ecologies and the media equation are both first-rate books that belong in our libraries and on our professional bookshelves. both provide methodologies and techniques for making user interactions with automated systems a better experience, both in terms of accomplishing tasks efficiently and in terms of user satisfaction. -tom zillner
digital resource sharing and library consortia in italy
tommaso giordano
interlibrary cooperation in italy is a fairly recent and not very widespread practice. attention to the topic was aroused in the eighties with the italian library network project. more recently, under the impetus toward technological innovation, there has been renewed (and more pragmatic) interest in cooperation in all library sectors. sharing electronic resources is the theme of greatest interest today in university libraries, where various initiatives are aimed at setting up consortia to purchase licenses and run digital products. a number of projects in hand are described, and emerging trends analyzed. tommaso giordano (giordano@datacomm.iue.it) is deputy director of the library at the european university institute, florence.

the state of progress and the details of implementation in various countries of initiatives to share digital information resources obviously depend-apart from current investment policies to develop the information society-on many factors of a historical, social, and cultural nature that have determined the evolution and consolidation of cooperation practices specific to each context. before going to the heart of the specific subject of this article, in order to foster an understanding of the environment in which the trends and problems that we shall be considering are set, i feel it best to give a quick (and necessarily summary) sketch of the library cooperation position in italy. the word "cooperation" became established in the language of italian librarians only toward the mid-'70s, when in the sector of public libraries-which were transferred in those years from central government to local authorities-the "territorial library systems" were taking shape: this was a form of cooperation provided for and encouraged by regional laws that brought together groups of small and medium-sized libraries, often around a system centre supplying shared services. a few years later, in the wake of the new information technologies and in line with ongoing trends in the most advanced countries, in italy, too, the term "cooperation" became increasingly associated with the concept of computerized library networks. the decisive impulse in this direction came from a project of the national library service (sbn), the national network of italian libraries, then in a gestation stage, which also had the merit of speeding up the opening of the italian librarianship profession to experiences underway in the most advanced countries.1 in the '80s, cooperation, together with automation, was the dominant theme at conferences and in italian professional literature. however, the heat of the debate had no satisfactory counterpart in terms of practical implementation, because of both resistance attributable to a noninnovative administrative culture and the polarization of the bulk of the investments around a single major project (the sbn network), the technical and organizational choices of which were shared by only part of the libraries, while others remained completely outside this programme. many librarians, while recognizing the progress over the last fifteen or twenty years (including the possibility of accessing the collective catalogue of sbn libraries through the internet), maintain that results obtained in the area of cooperation are well below expectations, or energy involved.
i am touching here on one of the most sensitive, controversial points in the ongoing professional debate, which i do not wish to dwell on except to note the split that came in italian libraries following the vicissitudes of a project that ought, instead, to have united them and stimulated large-scale cooperation.2 i shall now seek to summarize the cooperation position in italy in relation to the subject of this article. very schematically (and arbitrarily) i have grouped the experiences i feel most significant under three heads: sbn network, territorial library systems, and sectoral cooperation.
sbn brings together some eight hundred large, medium-sized, and small libraries (national, local-authority, university, and research-institute). the programme, funded by the central government, supports cooperation in the following main sectors:
• hardware sharing,
• development and maintenance of library software packages,
• network administration,
• shared cataloguing, and
• interlibrary loans.
the sbn is a star network with its central node consisting of a database (the so-called "index") containing the collective catalogue of the participating libraries (currently some four million bibliographic titles and 7.5 million locations). to the index are linked the thirty-seven local systems, single libraries or multiple libraries, that apply the computerized procedures developed by the sbn programme. thus the sbn is a closed network: only those libraries agreeing to adopt the automation systems distributed by the central institute for the union catalogue (iccu), the central office coordinating the programme, take part. from the organizational viewpoint, the sbn can be regarded as a de facto consortium (i.e., not in the legal sense of the term), even if the management bodies, participation structures, and funding mechanisms differ considerably from consortia that have been set up in other countries. in fact, libraries join the sbn through an agreement among state, regions, and universities, and the governing bodies represent not the libraries but their parent institutions. participating libraries receive the services free, and funding for developing the systems and network administration comes from the central government, which coordinates the technical level of the project through iccu.3 currently, ideas are moving toward evolving the sbn into an open network system and reorganizing its management bodies: if this provision becomes a reality, the sbn will have potential for taking on an important role in developing digital cooperation.
the territorial library systems, developed especially in the central and northern regions, consist of small groups of public libraries cooperating in one or more sectors of activity such as:
• sharing computer systems,
• cataloguing,
• centralized management of purchases,
• interlibrary loans, and
• professional training and other activities.
the library systems are based on conventions and formal or informal agreements between local institutions (the municipalities) and receive support from the provincial and regional administrations. in more recent years some systems (e.g., abano terme, in the veneto) have formed themselves into formal, legal consortia.
the most advanced experiences in this sector-for example, the libraries in the valseriana (an industrial valley in lombardy), which have been operating on the basis of an informal consortium for some twenty years now-have reached a high level of efficiency comparable with the most developed european situations and may rightly be regarded as reference models for the organization of cooperation. however, given their limited size, they are unlikely to achieve economies of scale in the digital context unless they develop broader alliances. it is not unlikely that these consortia, given their capacity to work together, will in the near future develop broader forms of cooperation suited to tackling current technological challenges.
sectoral cooperation (cooperation by area of specialization) is meeting today with steadily increasing interest, though it did not fare very well in the past. among the rare initiatives embarked upon by university and research libraries in this direction, particular importance in our context attaches to the national coordination of architectural libraries (cnba), started some twenty years ago, which became an association in 1991. the cnba has various projects on its programme and can be regarded as an established reference point for cooperation among architectural libraries. we should also mention one of the "oldest" cooperation projects among research libraries: the italian periodicals catalogue promoted by the national research council (cnr), recently made available online by the university of bologna.4 to complete this sketch, at least a mention should be made of the participation of italian libraries in the european commission's technical programme in favor of libraries. this programme, which since 1991 has mobilized the world of libraries in the european union, not only favors and guides the expansion of technologies into libraries in accordance with preset objectives, but also has the aim of encouraging cooperation among libraries in the various countries. the programme-the latest edition of which includes not just libraries but also archives and museums-has secured significant participation from many italian libraries. over and above the validity of the projects already carried out or under way (important as that is), this programme has been very valuable to italian libraries in terms of exchanges of experience and of opening up professional horizons, especially as regards cooperation practice.5
digital cooperation
recently, following the expansion of electronic publishing, university libraries have been displaying renewed interest in cooperation activities with particular reference to acquiring licenses and sharing electronic resources. this movement is at present in full swing and is giving rise to manifold cooperation initiatives. to get an idea of the trends under way, one may leaf through the proceedings of a session on database networking in italian universities at the aib congress in genoa.6 on that occasion a group of universities presented a "draft proposal of agreement on access to electronic information." the document is divided into two parts, the first defining the purposes and object of university cooperation in the sphere of electronic information. the second part indicates operational objectives for cooperation in acquiring electronic information and proposes a model contract for purchasing licenses, to which member universities are expected to adhere.
the content of this second part coincides with the recommendations and understandings signed by associations, consortia, and groups of libraries in other countries, and largely follows the indications and recommendations issued by the european bureau of library information and documentation associations (eblida), the organization that brings together the library associations of the various european countries; by the international coalition of library consortia (icolc); and by other library organizations. there is no point here in listing all initiatives under way in italian libraries, in part because most of them are only just started or in the experimental stage. i shall mention a few only to bring out the trends that seem, from my point of view, to be emerging.
development of digital collections
at the moment initiatives in this sector are much fewer and less substantial than in other industrialized countries. among them the biblioteca telematica italiana stands out: in it, fourteen italian and two foreign universities digitize, archive, and put online works in italian. the project is based on a consortium, the italian interuniversity library center for the italian telematic library (cibit), supported by funds from the national research council (cnr) and made up of the fourteen italian and two foreign universities that have signed the agreement. technical support is provided by the cnr institute for computer linguistics, located in pisa.7 in this context we must also note, especially for the consequences it may have for the future growth of digital collections, an agreement between the national central library in florence and the publishers and authors associations aimed at accomplishing the national legal depository for electronic publishing project, which also provides for production of a section of the italian national bibliography to be called bni-documenti elettronici.8 the publishers who have signed the agreement undertake to supply a copy of their electronic products to the national central library in florence. the latter undertakes to guarantee conservation of the electronic products deposited, and to make them accessible to the public in accordance with the agreements reached.
description of electronic resources
in this area the bulk of the initiatives are still in an embryonic stage. in the sector of periodicals index production (i.e., tocs), mention should be made of the economic social science periodicals (essper), a cooperation project on italian economics periodicals launched by the libero istituto universitario carlo cattaneo (castellanza, varese), to which some forty libraries are contributing.9 recently the project has been extended to italian legal journals. essper is a cooperative programme based on an informal agreement among the libraries, each of which undertakes to supply in good time the tocs of the periodical titles it has undertaken to monitor. the programme does not benefit from any outside funds, being supported entirely by the participating libraries, which have recently been endeavouring to evolve into a more structured form of cooperation.
administration of electronic resources and licenses
in this sphere there have been numerous initiatives recently, particularly by university libraries.
one may note, first, a certain activism by university data-processing consortia (big computing centres created at the start of the computer era to support applications in scientific and then university and library administration areas). the interuniversity consortium for automation (cilea) in milan, which has for some time been operating in the area of library systems and electronic information distribution (especially in the biomedical sector), has extended its activities by offering services to nonmembers of the consortium too. recently cilea, in connection with a broader programme, cdl (cilea digital library), has been negotiating with a number of major publishers the distribution of electronic journals and online bibliographic services on the basis of needs expressed by the libraries in the consortium. caspur (the university computing consortium in rome) is working on several projects, among them shared management of electronic resources on cd-rom in a network among five universities of the centre-south. caspur, too, has opened its services to libraries not in the consortium and is negotiating with a number of major publishers the licenses for establishing a mirror site for electronic periodicals. the university of genoa, through csita, its computing services centre, has concluded an agreement with an italian distributor of electronic services to enable multisite license-sharing for biomedical databases by institutions operating on the territory of liguria. very recently the universities of florence, bologna, modena, genoa, and venice and the european university institute in florence have initiated a pilot project (cipe) for shared administration of electronic periodicals, and have begun negotiations with a number of publishers. let us now seek to draw some conclusions from this initial, brief consideration of current initiatives:
• initiatives in the area of digital cooperation are coming mainly from the world of university and research-institute libraries.
• no projects are big enough to achieve economies of scale, with most initiatives in hand having a very limited number of partners and often being experimental in nature.
• projects under way do not provide for the formation of proper consortia, most likely because the legal form of the consortium is hard to set up in italy because of the burdens involved, especially the complexity and length of the decision-making processes needed to constitute such an organization.
• librarians prefer decentralized forms of cooperation, partly because, shaken by experiences of the past, they fear losing autonomy and efficiency and finding themselves caught up in the bureaucracy of centralized organizations. "however, there can also be a correlation between the amount of autonomy that the individual institution retains and the ability of the consortium to achieve goals as a group."10 this observation by allen and hirshon obviously holds for italy too. it is no coincidence, in fact, that the university computing consortia, which have centralized staff and funds available, are able to carry out more incisive actions in this sector.
• except for the biblioteca telematica italiana, no initiatives seem to have been incentivized by ad hoc government programmes or funds.
• a part of the cooperation projects concerns sharing of databases on cd-roms.
the traditional italian resistance to online materials would seem to be due partly to the still inadequate network infrastructures in our country; improvements in this sector might bring a quick turnaround here.
• some initiatives in hand have been inspired more by suppliers than by librarians: the risk is to cooperate in distributing a particular product, not to enhance libraries' bargaining power. without wishing to deny anything to the suppliers, who today play an essential part in terms of professional information too, i feel that keeping the roles clearly separate may help to develop clear, upright, and mutually advantageous cooperation.
• some major projects are being led by university computing consortia that have begun to take an interest in the library sector. the university computing consortia would indeed have some of the requirements to play a first-rank role in this sphere if they can manage to bring themselves into their most natural position, i.e., to operate as agents of libraries rather than as distributors of services on behalf of the commercial suppliers. moreover, it ought to be clear that the computing consortia should act as partners with the library consortia and not as substitutes for them, otherwise the libraries risk limiting their autonomy of decision.
• some attention is turning toward university electronic publishing, though at the present stage it does not seem there are practical projects for cooperation in this area.
• finally, one has to note the low level of initiative by libraries (compared with other countries) in developing content and in storing digital collections.
the analysis i have rapidly summarized here is the basis for an initiative which has in recent months been stimulating the debate on digital cooperation in italy. i am referring to the italian national forum on electronic information resources (infer), a coordination group initially promoted by the european university institute, the university of florence, and a number of universities in the centre-north, which is today extending beyond the sphere of university and research libraries. the forum's chief mission is to cooperate to promote efficient use of electronic information resources and facilitate access by the public. to this end it encourages libraries to set up consortia and other types of agreement on acquisition and management of electronic resources and access to them. infer's objectives can be summarized as follows:
• to act as a reference and linkage point and develop initiatives to promote activities and programmes in the area of library electronic resource sharing;
• to enhance awareness both at institutional and political levels (ministries, universities, local authorities, etc.) and among librarians and end users;
• to facilitate dialogue and mutual collaboration between libraries and all others in the knowledge production and distribution chain, to help them all (authors, publishers, intermediaries, end users) to take advantage of the opportunities offered by the information society; and
• to maintain contacts with similar initiatives under way in other countries.
infer has immediately embarked on a rich programme of activities which is giving appreciable results, especially in terms of raising awareness of the problem and coordinating initiatives in the area. we shall here briefly mention some of the actions in hand that seem to us most important.
dissemination of information.
infer has developed a web site where, as well as information on the forum's activities, important documents can be found relating to the consortia, the negotiations and licenses, and in general the digital resource-sharing programmes in italy and around the world.11 a discussion list for infer members has also been activated.
seminars and workshops.
this activity is aimed at further exploration of themes of particular interest (e.g., legal aspects of license contracts, or programmes under way in other countries).
data collection.
the two main programmes coming under this heading are: (a) monitoring of italian cooperation initiatives under way in the digital sector; and (b) collecting data on acquisitions of electronic information resources in university libraries. this information will enable the libraries to have a more exact picture of the situation, so as to assess their bargaining power and achieve the necessary support to adopt the most appropriate strategies.
indications and recommendations.
as well as translating and distributing documents from the most important associations operating in this area (such as eblida, icolc, and ifla), infer is developing a model license for the italian consortia. infer was set up in may 1999 and currently has some forty members, most of them representatives of university library systems, university computing consortia or research libraries, or university professors. one of infer's aspirations is to persuade decision-makers to develop a programme of incentives on a national scale for the creation of library consortia.
critical factors
as to the delay we note in terms of shared management of electronic resources, weight clearly attaches to the fact that cooperation is not well established, nor are the national structures that ought to have supported it. it would be all too easy, and perhaps also more fun, to attribute this situation to the so-called individualism of italians and to abandon inquiry into the structural limitations that may have determined it. first of all, except in very few cases, libraries have no administrative autonomy, or only very little, with hardly any decision-making powers. this factor favors interference in decision-making processes, complicates them, slows down procedures, and strips librarians of their responsibility. one of the reasons why the sbn has not managed to generate cooperation is to be sought in the mechanisms for joining and participating in the programme. in other words, many libraries have joined the sbn following decisions taken from above, at the political and administrative levels, and not on the basis of an autonomous, weighted assessment of attitudes, needs, and alternatives. these experiences have augmented libraries' reluctance to embark on centrally steered national programmes. on the other hand, the low administrative autonomy they have prevents them from implementing truly effective alternative solutions, i.e., ones able to realize economies of scale. another factor is the administrative fragmentation of libraries. the big universities have fifty or so libraries each (often one per department).
some universities have an office coordinating the libraries, but only in very few cases does this structure have the powers and the necessary support to coordinate; more often it acts as a mediation office with no real administrative powers. in short, the result is that since (perhaps also because of a misunderstood sense of departmental autonomy) there is no decision-making centre for libraries in each university, decision-making processes prove slow and cumbersome. clearly, all this brings many problems in establishing understandings and cooperative programmes with other libraries and weakens the universities in negotiating licenses. this position, while objectively favoring suppliers in the short term, in the long term risks facing them with difficulties, given an increasingly impoverished, uncertain market because of the fragmentation and the limited capacity of possible purchasers. another limit is the insufficient awareness, especially on the academic side, of the challenges of electronic information. in early 1999 the french daily le monde published an extensive feature on scientific publishing, showing how current publishing production mechanisms, while assuring a few big publishers of ample profit margins, are suffocating libraries and universities under the continuous rises in prices for scientific journals.12 the argument, immediately taken up by the spanish el país and other european newspapers, met with very little response in italy. clearly, in italy today, the conditions do not exist to embark on initiatives like the incisive open letter to publishers sent by the kommission des deutschen bibliotheksinstituts für erwerbung und bestandsentwicklung in germany, supported by similar swiss, austrian, and dutch organizations.13 the lack of an adequate national policy in the area of electronic information is probably the direct consequence of the problems i have just mentioned. in this context, however praiseworthy the initiatives, they tend, in the absence of reference points and practical support, to break up or fritter away. under the ministry for universities there are no leadership or action bodies in the area of academic information, like the joint information systems committee in britain that stimulates programmes aimed at developing and utilizing information technologies in university and research libraries. these observations are valid for the state libraries and public libraries, too, where the central (ministry for cultural affairs) and regional authorities could play a more effective part in promoting digital cooperation.
conclusions
the picture i have presented is not very rosy. however, it does reveal considerable elements of vitality and greater awareness of the problems emerging, starting with a few representatives of academic sectors who might be able to wield influence and bring about a turnaround. at the moment, the consortium movement to share electronic resources chiefly involves university libraries, but a few initiatives by public libraries are starting to appear, especially in the multimedia products sector. no specific lines of action are yet emerging at the level of the national authorities-especially the ministry for education and research and the ministry of cultural activities, on which the national libraries and many research libraries depend.
it is likely that in the near future the entry of these agencies may be able to modify the current scenario and considerably influence the approach to cooperation. from this viewpoint, the impression is that a few consortium initiatives that have been flourishing in recent months on the part of both libraries and suppliers have the principal aim of proposing cooperation models to guide future choices. in conclusion, we are only at the outset, and the game is still waiting to be played.
references and notes
1. michel boissel, "l'organisation automatisee de la bibliotheque de l'institut universitaire europeen de florence," bulletin des bibliotheques de france 24, no. 5 (1979): 231-39. for an overall picture of the debate, see: la cooperazione: il servizio bibliotecario nazionale: atti del 30th congresso dell'associazione italiana biblioteche, giardini naxos, november 21-24, 1982 (messina: universita di messina, 1986).
2. tommaso giordano, "biblioteche tra conservazione e innovazione," in giornate lincee sulle biblioteche pubbliche statali, roma, january 21-22, 1993 (roma: accademia nazionale dei lincei, 1994): 57-65. for the most recent developments in the debate, see the articles by antonio scolari, "a proposito di sbn," giovanna mazzola merola, "lo studio sull'evoluzione del servizio bibliotecario nazionale," and claudio leombroni, "sbn: un bilancio per il futuro," bollettino aib 37, no. 4 (1997): 437-66.
3. further information on sbn can be found at www.iccu.sbn.it/sbn.htm, accessed oct. 27, 1999, where the collective catalogue of participating libraries is also accessible.
4. catalogo italiano dei periodici (acnp), www.cib.unibo.it/cataloghi/infoacnp.htm, accessed sept. 19, 1999.
5. there is a considerable literature on the european commission's "libraries programme": for a summary of projects in the programme, see telematics for libraries: synopses of projects (luxembourg: office for official publications of european communities, 1998). updated information on the latest version of the programme can be found at www.echo.lu/digicult, accessed oct. 26, 1999. on italian participation in the programme see: ministero per i beni culturali e ambientali, l'osservatorio dei programmi internazionali delle biblioteche 1995-1998 (roma: mbac, 1999).
6. associazione italiana biblioteche (aib), xliv congresso nazionale aib, genova, 1998: www.aib.it/aib/congr/co98univ.htm, accessed oct. 27, 1999.
7. more information about cibit can be found at www.ilc.pi.cnr.it/pesystem/19.htm, accessed may 19, 2000.
8. progetto eden: deposito legale editoria elettronica nazionale, www.bncf.firenze.sbn.it/progetti.html, accessed sept. 29, 1999.
9. more information about essper may be found at www.liuc.it/biblio/essper/default.htm, accessed may 19, 2000.
10. barbara mcfadden allen and arnold hirshon, "hanging together to avoid hanging separately: opportunities for academic libraries and consortia," information technology and libraries 17, no. 1 (1998): 37-44.
11. the infer web page can be found on the universita di roma i site, www.uniroma1.it/infer, accessed may 19, 1999.
12. le monde, 22 jan. 1999: a whole page is devoted to this topic. see especially the article titled "les journaux scientifiques menacés par la concurrence d'internet," accessed feb. 4, 1999, www.lemonde.fr/nvtechno/branche/journo/index.html. the point was taken up again by el país, 27 jan. 1999; see the article titled "las revistas científicas, amenazadas por internet."
13.
the letter, signed by werner reinhardt, dbi president, is available at www.ub.uni-siegen.de/pub/misc/offener_brief-engl.pdf, accessed feb. 4, 1999.
selective dissemination of marc: a user evaluation
lorne r. buhr: murray memorial library, university of saskatchewan, saskatoon, saskatchewan
(material appearing in this paper was originally presented at the third annual meeting of the american society for information science, western canada chapter, banff, alberta, october 4, 1971.)
after outlining the terms of reference of an investigation of user reaction to the selective dissemination of marc records, a summary of the types of users is given. user response is analyzed and interpreted in the light of recent developments at the library of congress. implications for the future of sdi of marc in a university setting conclude the paper.
introduction
f. w. lancaster (1968) in his detailed study of medlars makes the following statement, which has application to all sdi work: "in order to survive, a system must monitor itself, evaluate its performance, and upgrade it wherever possible." (1) since seldom operates in a fairly new field, sdi for current monographs, an evaluation is most important. to a great extent it must be made without reference to other systems, since most of the operational sdi services deal with tape services in various fields of scientific journals, and although there are some parallels, there are numerous differences. whereas services such as can/sdi cater primarily to the natural and applied sciences, seldom opens up the possibilities for sdi in the humanities and social sciences. the background to the seldom project at the university of saskatchewan has been outlined earlier by smith and mauerhoff (1971) and will not be repeated here. (2) after five months of operation a major questionnaire was sent out to each of 121 participants in the experimental seldom service. this questionnaire was based almost entirely on the one used by studer (1968) in his dissertation at indiana state university. (3) the general purpose of the study was to elicit user reaction to seldom, their evaluation of its usefulness, time necessary to scan the weekly output, suggestions regarding continuance of the service, etc. besides this general purpose, the gathering and analyzing of data on seldom will be useful to the library administration in determining the future of an sdi service of this nature. a separate cost study is being prepared in this connection. several factors prompt a cautionary stance in assessing the value of an sdi system on the basis of one questionnaire: (1) there is no control situation to which we can compare seldom, i.e., there was no systematic service for current awareness in the field prior to the advent of seldom. faculty and researchers were dependent on their ingenuity to ferret out information on new books which were pertinent to their field of research and instruction. seldom is therefore being compared to a conglomeration of ad hoc methods which may be as numerous as the individuals using them. therefore, we must be cautious or we will tend to say, "something in the field of current awareness is better than nothing," when we really do not know what that "nothing" is.
(2) although seldom had been operational for some twenty weeks when evaluation began, this is a relatively short period on which to base an assessment. on the other hand, studer's evaluation was based on the experiences of thirty-nine users and covered only eight weekly runs against the marc tapes, scheduled on an every-other-week basis. (3) seldom was implemented without any study to determine the adequacy of the ad hoc approaches, to which i have already referred, nor to assess the patterns of recommendation for purchase. it was assumed that there was a need for seldom, and some of the response would indicate that this is a fairly valid assumption, since almost 90 percent of the respondents wanted the service continued. a random investigation in mid-august of 748 current orders in the acquisitions department for books with an imprint of 1969 or later revealed that ninety-five, or 12.7 percent, referred to seldom as the source of information for a particular recommendation to purchase. this may or may not be significant since there is no way of assessing whether these items would have been recommended anyway, only later perhaps. one by-product of orders based on seldom information is that correct lc and isbn numbers are given, and with the capabilities of the tesa-1 cataloging/acquisitions system such orders can be expedited more quickly and can also be cataloged sooner than non-marc materials, thus ostensibly getting the desired item to the requestor in less time than previously. seldom is valuable in our university setting, therefore, not only as a means of awareness of new items, but also in the actual retrieval of the item for the user, in this case through acquisition. our analysis, however, must be directed to the effectiveness of seldom as an awareness service, vis-a-vis the ad hoc approach.
user group
of 121 questionnaires sent out, seventy-seven or 63.5 percent were returned. six of these had to be rejected for the purposes of this study since either only a few questions had been answered or a general letter had been sent instead of answering the questionnaire. thus, the data presented in this study will be based on seventy-one completed questionnaires, or a 58.6 percent return. three additional verbal comments were made to the writer, and thus we in fact heard from eighty, or 66 percent, of the users. the term "users" will designate the seventy-one who completed their questionnaires, although comments from the other nine individuals will also be referred to. the users have been grouped into three categories according to table 1.
table 1
i. library and information science: a. on-campus 12; b. off-campus 17; total 29
ii. social sciences and humanities: a. on-campus 15; b. off-campus 2; total 17
iii. natural and applied sciences: a. on-campus 23; b. off-campus 2; total 25
categorization was along fairly traditional lines, with category i being necessary because of the large number of people falling into this area. the seventeen off-campus users coming under designation (i) represent the library schools in canada as well as librarians/information scientists in canada and the united states. the on-campus users are library department heads and heads of branch libraries. included in the social sciences and humanities are the fields of psychology, sociology, history, economics, english, commerce, classics, etc. the natural and applied sciences include all the health sciences plus physical education, since the two profiles in that area are tending toward the health sciences.
engineering, poultry science, physics, chemistry, biology, etc., are represented here.
observations
a sample of the questionnaire used appears on p. 47-50 and includes a tally of the number of responses for each possible alternative answer to each question. in some cases the total number of replies for a question is less than seventy-one. this is explained by the fact that some questions on some questionnaires were not answered or were answered ambiguously, so they could not be tallied. generally speaking, users found seldom to be good to very good in providing sdi for new english monographs. 25.8 percent of the users found the lists very useful while 48.5 percent said they were useful. six users said the listings were inconsequential for their purposes; in several instances this may be due to poor profiling or profiling for a subject area in which little would appear on the marc data base. 23.6 percent of the users indicated that in most cases items of interest found on the seldom lists were previously not known to them. 45.8 percent said that "of interest" items were frequently new. 76 percent of the group believed that the proportion of "of interest" items which also were new was satisfactory, a percentage which speaks well for the currency and effectiveness of an sdi capability. one of the chief drawbacks for which sdi services are often cited is the absence of evaluative commentary or abstract material to accompany the citations. some tape services do provide either an abstract or a good number of descriptors, and this has proved to be an asset in helping the subscriber. seldom is based on the marc tapes, which provide complete cataloging data but do not give either evaluations or a multiplicity of descriptors. (some indications are that the information now available in publishers' weekly might at some time in the future be added to the marc tapes.) interestingly enough, 83.5 percent of the users said the information included in the entries was adequate to determine whether an item was of interest or not. predictably, title, author/editor, and subject headings were the three indicators, in that order, which were found most useful in making evaluations. this is significant since titles in the humanities and some of the social sciences, particularly, are often not as specific in describing the contents of a work as are titles in the physical sciences. 63.5 percent of the users indicate that seldom information is used for recommending titles for acquisition by the library. as a result it is quite possible that purchasing in the areas covered by seldom profiles may increase and the tendency to broaden the collection should increase. unfortunately, no pattern of pre-seldom recommending for purchase is known. some instructors use the weekly printouts to keep current bibliographies on hand both for teaching purposes and for research purposes. since over half the users (55.8 percent) needed no more than ten minutes per week to scan the printouts, there is no indication that excessive time is taken up in the use of such an sdi service. in reply to the question, "would you be willing to increase the number of irrelevant notices received in order to maximize the number of relevant ones?" opinions were nearly balanced, with 58 percent replying in the affirmative and 42 percent answering negatively.
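the trade-off behind that question can be pictured as a simple matching of interest profiles against each week's marc records: the looser the match a profile will accept, the fewer relevant titles it misses and the more irrelevant notices it produces. the sketch below is a hypothetical illustration in a modern scripting language, not seldom's actual tape-based implementation; the record fields, the scoring rule, and the match_threshold parameter are assumptions introduced only to make the recall/precision knob concrete.

    # hypothetical sketch: matching user interest profiles against one week's
    # marc-derived records; field names and scoring rule are illustrative only
    from dataclasses import dataclass

    @dataclass
    class Record:
        title: str
        subjects: list      # subject headings as supplied on the tape
        lc_class: str       # e.g. "pr2895"

    @dataclass
    class Profile:
        user: str
        terms: set          # keywords of interest, lower-cased
        lc_prefixes: tuple  # lc class stems of interest, e.g. ("pr",)

    def score(record, profile):
        # count profile terms found in the title or subject headings,
        # plus one if the lc class falls under a requested class stem
        text = " ".join([record.title] + record.subjects).lower()
        hits = sum(1 for term in profile.terms if term in text)
        if record.lc_class.lower().startswith(profile.lc_prefixes):
            hits += 1
        return hits

    def weekly_run(records, profiles, match_threshold=2):
        # lowering match_threshold raises recall (fewer relevant titles missed)
        # at the cost of more irrelevant notices; raising it does the reverse
        for profile in profiles:
            notices = [r for r in records if score(r, profile) >= match_threshold]
            yield profile.user, notices

    # minimal usage against a single record from one weekly tape
    records = [Record("shakespeare manual", ["shakespeare, william"], "pr2895")]
    profiles = [Profile("example user", {"shakespeare", "drama"}, ("pr",))]
    for user, hits in weekly_run(records, profiles, match_threshold=1):
        print(user, [r.title for r in hits])

running the same weekly file with a lower threshold surfaces more titles per user and more "not of interest" notices, which is exactly the choice that question five of the questionnaire puts to respondents.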
increases in the marc data base expected some time in 1972, when other roman-alphabet language imprints and records for motion pictures and filmstrips are added, did not seem problematic: only 25 percent of users asked that an upper limit be placed on the quantity of material retrieved by their profiles. numerous individuals (thirty) responded favorably to the prospect of wider language coverage by marc. on the other hand, several individuals commented that non-english output on seldom would not enhance the service for them, and this likely reflects language capabilities more than a lack of non-english material in their subject area. the question regarding format brought interesting comments, especially from library personnel and off-campus librarians: "computer type format is often confusing." "a book designer should be consulted to improve the format." "spacing could be improved to separate title and imprint information from subject headings and notes at foot of entry. would make scanning easier." questions fourteen, nineteen, and twenty-one provide an overall summary of user reaction. 88.6 percent of users want the service to continue. overall value of seldom was rated "very high" by 11.3 percent, "high" by 33.8 percent, "medium" by 42.2 percent, and "low" by 12.7 percent. seldom served to demonstrate the possibility of sdi for monographs "amply" according to 36.6 percent of users, "adequately" to 50.6 percent of users, and "poorly" to 12.65 percent of users. there was less certainty on how such a program should be administered or costed, particularly since a long-range cost study was not yet available. clearly those who were impressed with seldom's effectiveness and future possibilities wanted other faculty to have the same opportunities, yet they cautioned against a blanket service. one comment sums this up best: "it should be available to anyone who has a perceived need for it-but require them to at least make the effort of setting up the profiles, etc." many of the less than enthusiastic comments about seldom could be correlated with little or no user feedback to the search editor in order to improve relevancy and recall. user education in this regard is crucial in order that all users fully understand the possibilities and limitations of the sdi service. the success of any existing sdi service in the periodical literature has hinged on a good data base and up-to-date, specific profiling, according to smith and lynch (1971). (4) the effectiveness of the profiling is a direct function of the ingenuity and persistence of the user and the profile editor.
discussion
this study has attempted to weigh the usefulness of an sdi service primarily with regard to its utility as a current awareness service. seldom, in order to be worthwhile, must either be faster or broader in its coverage than existing services. two comparisons readily arise out of the commentary of the users. some library science professors felt that the lc proofslip service was just as fast as seldom and thus there was no advantage in having the latter when the former was available. a study done at the university of chicago by payne and mcgee (1970) repudiates this argument fairly effectively. (5) findings at chicago show that marc is faster than the corresponding proofslips. a number of users rely heavily on publishers' blurbs and prepublication notices and find that often books for which records appear on seldom are already on the library shelves.
this observation is not altogether an indictment of seldom, since another user observed that he appreciated being able to have the hard copy immediately; and in some cases he might not even have known about the item except for seldom. some users mentioned that waiting for evaluative reviews could put one at least a year behind just in placing the order for the book, let alone receiving it. seldom has the virtue of informing individuals of the existence of new books, but the delay in having the actual item might be problematic, so one question was directed to this consideration. some people felt that it was at least worth something to know that a book existed even if one could not consult it immediately. numerous complaints were aired regarding the slowness of obtaining items ordered through a library's acquisitions department. in fact one user said this slowness meant he had to purchase personal copies of items he wanted/needed. as indicated earlier in the introduction, the tesa-1 acquisitions-cataloging routine at the university of saskatchewan library does have the capability to speed up actual receipt of books by the patron. a recent development at the library of congress has definite implications for the future of seldom and any other marc-based sdi programs. the cip (cataloging in publication) program initiated this summer means that lc will now be able to make available cataloging information, except for collation, for books about to be published, at a time factor of up to six weeks before publication. such marc records will have a special tag designating them as cip material. furthermore, cip records will appear only on marc; the number predicted is 10,000 for the first year and 30,000 by the third year, a figure which would include all american imprints. (6) marc-oklahoma has already surveyed the subscribers to its sdi project to determine whether they would prefer to receive both cip marc records and regular marc records or only one of the two categories. users preferred to receive both types of information, and appropriate changes have been made to the oklahoma sdi programs. (7) beginning with september, marc cip records will appear and present information on books thirty to forty-five days before they are published. several library personnel appreciated the usefulness of seldom as an outreach service of the university library into the academic community. they see seldom as a public relations tool. numerous efforts are at the present time being made by librarians to alert individuals to materials in their several fields of interest, and seldom can play an important role in providing an active dissemination of information on a systematic basis. this is the direction in which we need to move so that our role becomes both that of a collector of information and a disseminator of information. special librarians have been doing this kind of thing for years, and seldom allows for specialized service to a larger user group.
implications and conclusions
1. an sdi service based on marc can be helpful in building a balanced library collection, depending on the efforts of faculty and/or bibliographers in setting up their profiles and maintaining them. the article by ayres (1971) is particularly good on this aspect. (8) the parameters of the marc data base must constantly be kept in mind, just as the constraints of the ad hoc methods must be considered in any comparisons. publishers' blurbs in journals have the limitation of not systematically covering all the publications in a given subject area; book reviews tend to appear too late to allow users to receive current information on new books; seldom corrects the first shortcoming at the expense of not having the evaluations appearing in book reviews. on the other hand, marc tapes do represent the cataloging of books in the english language by one of the largest national libraries in the world, and thus provide a coverage which is hard to duplicate by any one other alerting service.
2. comments, especially from users in the social sciences and humanities, indicate that an sdi system for new monographs has greater pertinence in their area than perhaps in the natural and applied sciences, simply because of the nature of research done in the two areas. a recent study by j. l. stewart (1970) substantiates this factor for the field of political science. (9) his detailed analysis of the patterns of citing in the writings appearing in a collective work in political science indicated that 75 percent of such citations were from monographs, leading him to the obvious conclusion that "monographs provide three times as much material as do journals" in the field of political science. by contrast, journals are likely more crucial for the fields of natural and applied science, and provide the key access point for vital information.
3. sdi of marc, most users felt, should demand a fair amount of effort on the part of users to assure that the service would obtain optimum return for money invested. a blanket service to all faculty would be wasteful, since many faculty would not have a perceived need for it and others would not use it enough if it was simply offered free to everyone. comments tended to favor making contact through the departmental library representative and channelling weekly printouts through this individual. a cost study will help determine whether it is economically feasible to operate seldom in an academic setting with at least 100 users. if current subscription costs for sdi services such as those offered by can/sdi of the national science library, ottawa, can be maintained, and early indications are that they can, a cost of $100 per profile per year may be feasible, bringing the annual expenditure for 100 users to $10,000. a chief variable which makes effective costing difficult is the variation in the number of records appearing on each weekly tape, and this is a variable which can only be dealt with by prediction on the basis of the number of records on past tapes.
4. seldom has the virtue of adding a major role of dissemination of information to libraries, which up until now have primarily operated as storers of information.
sample seldom printout entry: 822.33 - fleay, frederick gard, 1831-1909. shakespeare manual. new york: ams press <1970>. xxiii, 312 p. 19 cm. lc 76-130621. pr2895. subject: shakespeare, william.
seldom evaluation questionnaire
1. what is your feeling about the sdi lists as a source for finding out about the existence of newly published works in your fields of interest? would you say that the lists provided a source which was: (a) very useful 18 (b) useful 34 (c) moderately useful 12 (d) inconsequential 6
2. do you feel that the sdi lists brought to your attention works of interest which are not generally cited by other sources that you use to learn of new publications?
publishers' blurbs in journals have the limitation of not systematically covering all the publications in a given subject area; book reviews tend to appear too late to allow users to receive current information on new books; seldom corrects the first shortcoming at the expense of not having the evaluations appearing in book reviews. on the other hand marc tapes do represent the cataloging of books in the english language by one of the largest national libraries in the world, and thus provide a coverage which is hard to duplicate by any one other alerting service. 2. comments, especially from users in the social sciences and humanities, indicate that an sdi system for new monographs has greater pertinence in their area than perhaps in the natural and applied sciences simply because of the nature of research done in the two areas. a recent study by j. l. stewart ( 1970) substantiates this factor for the field of political science. ( 9) his detailed analysis of the fatterns of citing in the writings appearing in a collective work in politica science indicated that 75 percent of such citations were from monographs leading him to the obvious conclusion that "monographs provide three times as much material as do journals" in the field of political science. by contrast, journals are likely more crucial for the fields of natural and applied science, and provide the key access point for vital information. 3. sdi of marc, most users felt, should demand a fair amount of effort on the part of users to assure that the service would obtain optimum return for money invested. a blanket service to all faculty would be wasteful since many faculty would not have a perceived need for it and others would not use it enough if it was simply offered free to everyone. comments tended to favor making contact through the departmental library representative and channel weekly printouts through this individual. a cost study will help determine whether it is economically feasible to operate seldom in an academic setting with at least 100 users. if current subscription costs for sdi services such as those offered by can/sdi of the national science library, ottawa can be maintained, and early indications are that they can, a cost of $100 per profile per year may be feasible bringing the annual expenditure for 100 users to $10,000. a chief variable which makes effective costing difficult is the variation in the number of records appearing on each weekly tape and this is a variable which can only be dealt with by prediction on the basis of the number of records on past tapes. 4. seldom has the virtue of adding a major role of dissemination of information to libraries which up until now have primarily operated as starers of information. selective dissemination of marc/buhr 47 822 33~ shakespeare william fleay. frederick garo. 1831-1909. shakespeare manual. new york.ams press<1970> xxiii. 312 p. 19 cm. lc 76-130621 pr2895 p1002 en 01 tw 000 wt 000 s r0252 fc leng 822.33 isbn 0~0~02~08~ seldom evaluation questionnaire l. what is your feelin g about the sdi lists as a source for finding out about the existence of newly published works in your fields of interest? would you say that the lists provided a source which was: (a) very useful (b) useful ( c ) moderately useful (d) inconsequential 18 34 12 6 2. do you feel that the sdi lists brought to your attention works of interest which are not generally cited b y other sources that you use to learn of new publications? 
(a) many works (b ) some works (c) a few works (d) none 10 39 19 2 3. how would you characterize your feeling about the relative proportions of the items "of interest" ( relevant items) and "those not of interest" (irrelevant items) included in the sdi lists? (a) the proportion of relevant items in the lists was satisfactory. 57 (b) the proportion of irrelevant items in the lists was too high. 13 48 journal of library automation vol. 5/1 march, 1972 4. it is inevitable that some "not-of-interest" items are included in the sdi lists. was the inclusion of irrelevant notices bothersome to you? (a) yes ( b ) no 6 65 reasons: 5. on the other hand, it is possible that for any given search run, some relevant items in the file are missed. the chance of relevant items being missed can generally be minimized by certain search adjustments, but with a resulting increase in irrelevant notices. would you be willing to increase the number of irrelevant notices received in order to maximize the number of relevant ones? ( a) yes ( b ) no 40 29 reasons: 6. the sdi lists notified you of an average of--items per list which you judged to be "of interest." on a purely quantitative basis, would you say that this number was satisfactory, or for some reason too small or too large? (a) satisfactory 48 ( b ) too small 16 ( c ) too large 1 7. when the input to the marc file is increased, your sdi output would also likely increase. do you feel that you would like to be able to set some arbitrary upper limit on the quantity of items included in each sdi list even at the risk of missing a number of relevant items? (a) yes (b) no if yes , maximum number__ _____________ _ 17 51 reasons: 8. the sdi lists alerted you to a number of items which you judged to be "of interest." would you say that "of interest" items were new to you? (a) in most cases (b) frequently (c) occasionally (d) seldom 17 33 17 5 9. do you feel that the proportion of items "of interest" which were also "new" to you was: (a) satisfactory (b) too low 54 17 10. would you say that, in general, information given for the entries in the sdi lists is adequate to judge whether an item is or is not of interest to you? ( a) yes ( b ) no 58 10 11. what elements of the entry did you most often find useful in making evaluations? (a) author/ editor (b) title (c) publisher (d) series note (e) sub38 55 9 4 35 ject headings (f) classification numbers (g) other (please specify) 8 1 selective dissemiootion of marc /buhr 49 12. what is the primary use to which you put the sdi information? (a) recommendation for library acquisition (b) personal purchase of 51 12 item (c) other (please specify) 15 13. if your recommendation originates the library order for a publication, it will be some time before the work is available; and even if already on order, most of the publications included in your lists were probably too new to be available from the library at the same time you received the list. do you feel that this diminishes the value of the sdi service? (a) significantly (b) somewhat (c) negligibly 2 w ~ for what reasons? 14. a potential value of sdi service, based on the large volume of newly published works cataloged by and for the library of congress, is to bring together in one list timely notices for those works in the file which correspond to your several fields of interest. do you feel that the experimental sdi service demonstrated this capacity? (a) amply (b) adequately (c) poorly 26 36 9 15. is the format of the sdi notices satisfactory? 
(a) yes 61 (b) no 9. if not, what format would you suggest?
16. is the distribution schedule of once a week satisfactory? (a) yes 71 (b) no 0
17. on the average, how much time would you estimate it took to examine an sdi list? roughly, minutes: (a) 5: 23 (b) 5-10: 16 (c) 10: 9 (d) 10-15: 11 (e) 15: 5 (f) 15-20: 1 (g) 20: 5
18. a possible by-product of this sdi service is the building up of a cumulative marc tape file which can be searched in various ways by computer. would you make use of such a file? (a) yes 40 (b) no 18. if no, for what purposes?
19. judging from your total experience with the sdi service, would you characterize its overall value to you as: (a) very high 8 (b) high 24 (c) medium 30 (d) low 9
20. the marc file at present represents english monographs cataloged by the library of congress on a week-by-week basis. sometime in 1972, the library of congress will begin to add some non-english monographs to the marc file. keeping in mind the forthcoming expanded marc file on which future sdi service would be based, do you feel that its value to you would then be: (a) increased 30 (b) the same 33 (c) less 7
21. do you personally want this sdi service to be continued? (a) yes 62 (b) no 3 (c) it doesn't matter 5
22. do you feel that this sdi service should be offered to the entire faculty? (a) yes 42 (b) no 14. reasons:
23. do you feel that this sdi service should appropriately be made available by the university, i.e., that the university should organize and administer the service? (a) yes 36 (b) no 5 (c) don't know 23
24. do you feel that the university alone should pay for this faculty sdi service? (a) yes 30 (b) no 6 (c) don't know 25
25. optional: general comments, pros and cons, elucidation of above replies, attitudes, suggestions, etc., concerning the sdi service.
technical communications
announcements
with this issue we begin the process of shifting the emphasis and content of technical communications. some of the newsletter features of technical communications will be dropped due to the fact that as a quarterly publication it cannot be satisfactorily used to disseminate certain kinds of temporal information (e.g., short lead-time announcements, notification of institutes, seminars, meetings, etc.). instead, brief articles, letters, or comments on jola articles, and pertinent information about technical developments will hopefully assume a larger percentage of the allotted pages for technical communications. the isad editorial board, in approving these changes, voiced the opinion that technical communications would be much more useful as a result. concise technical communications and information notes featuring any aspect of the application of computers, systems analysis, or other technological developments (hardware, software, or techniques) pertinent to libraries are solicited. the design is also meant to provide a forum for the more rapid dissemination of information that will sometimes serve as the basis for the longer, more detailed articles which are published in jola. thus, the salient findings in a study, or the important developments taking place in a project or ongoing operation, can be made known long before they might otherwise be brought out in a formal presentation. these changes should become evident by the march 1974 issue, and to insure that this type of material does begin making its appearance, please send your letters, notes, and technical communications
to the editor of technical communications. (see cover sheet, page ii.)
technological inroads
catv library application
in mobile, alabama, a cable television subscriber can telephone the public library's reference department and turn to the library service channel to see the information requested over the telephone. the library installation costs are reported to be less than $500. the spectrum of patrons making use of the service includes financial analysts (looking at charts and graphs), illustrators and advertising personnel (obtaining pictorial representations), technicians requesting information from manuals, teachers, and even tourists looking for directional information. business applications loom important in the future and are already underway. it is now possible to offer a centralized microfilm storage with coded access to various documents. similarly, it was noted that retrieval and transmission of videotapes for the use of realtors will be explored. this would provide real estate agents with the ability to give a videotape tour of properties for sale. transmission time of a tape could be metered and billed to the appropriate realtor. other possible applications encompass such library activities as story hours, instruction for children in schools, and live telecast of library functions. (extracted from the american city, march 1973)
peacesat (pan pacific education and communication experiments by satellite)
populations in the pacific basin are often small in size and divided by great distances, making it impossible for many to sustain adequate levels of education, health care, and technically based services. inadequate communications constitute a principal barrier to development.
a start on the project was made in december 1970 when president harlan cleveland approved a grant from the u diversity's innovative program. in february 1971, nasa approval for use of the ats-1 was granted. dr. paul yuen, professor of electrical engineering, and icatashi nose, associate professor of physics, had two prototype ground terminals available when the federal communications commission approved licenses for the experiment. in phase i of the project, beginning april 1971, ground terminals constructed at the university were successfully testoperated and utilized between hawaii community college in hila and the manoa campus of the university of hawaii. the hawaii state legislature emphasized its support of the project by appropriating $75,000 in april 1971. the international network began in january 1972 with terminals at wellington polytechnic in wellington, new zealand, and the university of the south pacific in suva, fiji, joining the system. additional terminals have been established at maui community college, kahului, maui ( hawaii); papua new guinea institute of technology, lae, png; the university of south pacific centre, nuku'alofa, tonga ; and the department of education, pago pago, american samoa. operating terminals are being established at saipan and truk in the trust territory of the pacific islands. the project is administered by the university of hawaii with the assistance of the governor's committee on pan pacific educational communications, appointed by governor john burns and headed by uh president harlan cleveland. a faculty advisory committee assists development at the university of hawaii. recommendations for long range planning in medical research are provided by a medical communications study advisory committee project director is dr. john bystrom, assisted by james mcmahon, system coordinator. technical design and development is under the direction of dr. paul yuen. key to the system is a small inexpensive ground terminal designed and constructed at the university by katashi nose. each of the educational institutions which have terminals have their own autonomous staff and organization which operate the equipment and develop educational uses of the system. management of the peacesat terminal on the manoa campus is under carol misko, terminal manager. during its relatively short existence, the peacesat system has been utilized in a wide variety of educational and scientific programs. the east-west center used a receiving station on the ocean liner president wilson to conduct orientation sessions with its arriving grantees. hamilton library on the manoa campus has demonstrated exchange of materials with other locations via peacesat. doctors of the pacific research section of the national institute of health consult with doctors at the bethesda, maryland, national library of medicine. the hawaii cooperative extension service has used the system to conduct seminars with specialists from new zealand, fiji, tonga, and hawaii locations. faculty and students at the various campuses of the system have utilized the communication channels made available by peacesat. a few among the many disciplines they represent are political science, english, spanish, education, indonesian languages, physics, oceanography, computer science, journalism, urban planning, and speech-communication. it was the peacesat system which carried the world's first regularly scheduled class of instruction via satellite. 
within the pacific basin keen interest has been shown in the development of this project, as evidenced by discussion of peacesat at meetings of the south pacific forum and the south pacific commission. the peacesat network recently provided the means for south pacific poets to exchange their works with one another. amonv those joining in the wellreceived poetry series was the poet laureate of tonga. in april 1972 the national library of medicine awarded the university of hawaii a contract for a study of medical networking in the pacific, incorporating demonstrations of library and professional exchanges. hours of operation for the network are currently 9:00-10:00 a.m. and 4:306:00 p.m., monday through friday {honolulu time). the manoa exchange center is located in george hall (212) on the technical communications 65 campus of the university of hawaii. peacesat project, program in communication, university of hawaii, honolulu, hi 96822. phones: ( 808) 948-8848, ( 808) 948-8771, rca telex #723597. tomorrow's library: spools of tape libraries with ranks of musty tomes and files of catalog cards may be difficult to flnd in the future. books probably will be in museums; libraries will be on spools of computer tape. library users might push a button for a no-deposit, no-return paperback printout, instead of standing in line for a hardback from the stacks. movement in these directions has already begun at the university of georgia where a staff of 110 and $9 million of computer hardware provide the following type of service: a professor sits before a crt and types out the chemical names of ddt on the keyboard. almost immediately, the television screen above the keyboard displays a list of 176 scientific references to ddt. this information is the result of an electronic search of about 40,000 issues of chemical abstracts, a title compilation on computer tape of all published scientifl.c papers in chemistry. similar abstracts are available in other scientific fields, and three large foundation grants will enlarge these holdings to include literature in engineering, education, and the humanities. the information retrieval system allows a user to "browse as he would in a li· brary." but the browsing is done through one of 37 remote terminals. the number of remote terminals is expected to more than quadruple in future years, giving a total of some 200 individual outlets. (extracted from couege management) library projects and programs microfiche catalog by tulsa city-county library the tulsa city-county library computer output microfiche catalog was published in early march, according to ruth blake, director of technical services, tulsa city-county library. the catalog is in 66 journal of library automation vol. 6/1 march 1973 register-index format. the register, arranged by number, contains full bibliographic information for each title. adult and juvenile indexes contain brief bibliographic entries, location information, and a reference to the register number of each title. both indexes are in dictionary form, with authors, titles, and subjects in a single alphabet. minnesota bio-medical mini-computer project the university of minnesota bio-medical library has received a $361,729 three-year grant from the national library of medicine to provide support during the development of a low cost, stand alone, library dedicated computer system. 
the system will employ on-line terminals for data entry and file query functions, and will be based on an integrated system design of a processing system which would be suitable for use in other libraries of a similar size. the premise of the development is that an integrated acquisitions, accounting, in-process control system for all library materials coupled with an on-line catalog/ circulation control system can be operationally affordable by a library or system of libraries in the 200,000 volume class using its own computer system. a digital equipment corp. pdp 11/ 40 system has been selected. the cpu features 16k core, 16 bit word, power fail/ automatic restart, programmable real time clock, extended instruction set, and memory management option which permits access to 124k of memory. a dec writer data terminal will be used as console and initial terminal on the system. two 9 channel 800 bpi tape drives and one 40 million character moving head disk pack drive comprise the system's initial mass storage. a 132 column, 96 character set line printer completes the initial hardware configuration. before the system is installed, suitable crt type terminals and communications interfaces will be chosen. six of these terminals will be required when the system is fully operational. memory expansion in the cpu and additional mass storage may be acquired depending upon needs, although the design efforts will be to minimize the amount of core required for the system and most efficiently use the mass storage available. one of the problems of using a minicomputer system to service an interactive on-line library system is a lack of a suitable operating system which can require minimal residency in core, yet contain only the functions needed on a library system. current timesharing operating systems provide some parts of a system, such as device handlers, but require too great allocation of core, or programming in a compiler level language such as basic. this approach has been deemed unsatisfactory if system costs for hardware are to be kept reasonable. during the development period a pdp 11140 dos operating system will be used to assist in writing a hybrid operating system and utilities using the pdp 11/40 assembler language. also under development will be the file design, the system common modules, and system dictionary. these elements of the system will be required to then design and program the individual system applications. since the grant does not provide any support for data conversion, the circulation application will be developed and installed for the reserve materials. these only number a few thousand and involve short loan periods and other complexities which will provide an excellent test of a circulation control system for general library-wide use. other application systems, such as acquisitions and serials already are computer supported and therefore have existing machine-readable data files. the project staff includes glenn brudvig, director of the bio-medical library as principal investigator; audrey n. grosch of the university libraries systems division as project director and the following systems specialists: bob denney, carl sandberg, eugene lourey, and don norris. pertinent recent publications nationwi.de survey of library automation -phase i. the california state university and colleges has published the final report of phase i of its nationwide survey of library automation. 
this comprehensive survey performed for the chancellor's office-library systems project by inforonics, inc. covers over twenty-five library automation projects in the united states and canada. those interested in obtaining a copy should write, enclosing a check in the amount of $5.00 (californians remember the 6 percent tax) to chancellor's office; the california state university and colleges; 5670 wilshire blvd., suite 900; los angeles, ca 90036. a survey of commonplace problems in library automation, compiled by frank s. patrinostro. this survey documents actual library experiences concerning problems encountered, their causes, and what steps were taken to solve the problems. order from larc press, ltd.; 105-117 w. fourth avenue; peoria, il 61602. survey of commercitdly available computer-readable bibliographic data bases, edited by john h. schneider, marvin technical communications 61 gechman, and stephen e . furth. pub-· lished by asis. this reference tool provides descriptions of eighty-one machine-readable data bases. key papers on the use of computerbased bibliographic services, edited by stella keenan. published jointly by the national federation of abstracting and indexing services (nfais) and asis. contains selected papers on the use and evaluation of computer-based services. cost reduction for special libraries an.d information centers, edited by frank slater. published by asis. the four sections of the book cover an overview of recent literature on costing for 1ibraries; general cost reduction considerations; show and tell-special cost reduction efforts; and real costs for information managers. (the three preceding publications are available from publications division, american society for information science, 1140 connecticut ave., n.w., washington, dc 20036.) lib-s-mocs-kmc364-20140601052239 101 an interactive computer-based circulation system for northwestern university: the library puts it to work velma veneziano: systems analyst, northwestern university library, evanston, illinois northwestern university library's on-line circulation system has resulted in dramatic changes in practices and procedures in the circulation services section. after a hectic period of implementation, the staff soon began to adjust to the system. over the past year and a half, they have devised ways to use the system to maximum advantage, so that manual and machine systems now mesh in close harmony. freed from time-consuming clerical chores, the staff have been challenged to use their released time to best advantage, with the result that the "service" in "circulation services" is much closer to being a reality. the transition from a manual to an automated system is never easy. northwestern university library's experience with an automated circulation system was no exception. the first three months of operation were especially harrowing; there were times when only the realization that the bridges back to the old system were burned kept the staff plugging away with a system which often seemed in imminent danger of collapse. that they survived this period is a tribute to their persistence and optimism as well as to the merit of the system . the impressive array of obstacles was offset by a number of positive factors. even though there were mechanical problems with terminals, the on-line computer programs worked flawlessly from the first. the climate for change was favorable. 
the automation project had the complete support of library administration; the head of circulation services, although new to the department and untrained in automation, was completely committed to the system and was able to transmit his enthusiasm to his staff. 102 journal of library automation vol. 5/2 june, 1972 within three months, the systems analyst, who had been available for advice and trouble-shooting, began to fade from the scene. only an occasional minor refin ement is now necessary. maintenance problems, both in programs and procedures, are minimal. basically the system has proved itself workable. in a previous paper by dr. james s. aagaard (lola , mar. 1972 ), the development of the system is traced and the system is described in terms of its logical design, program, and hardware components. the present paper will describe how the system operates in the library environment. the system accomplishes the traditional library tasks connected with circulation, but the methods used have changed radically. the development of effective procedures must in large part be credited to the circulation staff. these procedures have in a real sense spelled the difference between an adequate system and a good one. it is these procedures on which we will concentrate. the author wishes to thank the head of circulation services, rolf erickson, and his assistants, mrs. eleanor pederson and mrs. lillian hagerty, for supplying th e information to bring her up-to-date on procedures as they have evolved over the past three years. book identification almost 100 percent of the 900,000 books in the main library's circulation collection contain punched cards. accurately punched book cards, available in all books, can make the difference between success and failure of a circulation system. the book cards contain only the call number and location code. there is no doubt that, if conversion funds had been less limited, we might have elected to capture author/title data. however an analysis of the amount of data which could be carried on an 80-column card, added to the fact that this would quadruple the cost, led to the decision to omit author/ title. as a result, key punch costs were exceptionally low-1.1 cent per card. in spite of our fears, the complaints by users because overdue and other notices do not contain author / title have been surprisingly few. cards for new books are, with a few exceptions, produced automatically as output from the technical services module. all book cards are also on magnetic tape and constitute a physical inventory of the entire circulating collection, which is updated at intervals and listed. user identification the system requires a unique numeric identification number for each borrower. for faculty and evanston campus students, this is their social security number; for special users it is a five -digit number assigned by the library from a list of sequential numbers. the number is supplemented by a one-digit code which identifies the type of user. ,. interactive circulation syste m j veneziano 103 the university's division of student finance has responsibility for issuance of punched plastic badges for students. each spring at preregistration time, data are gathered and pictures taken for students planning to return to the university in the fall. badges are ready for distribution as soon as school opens. for incoming freshmen, transfers, and returnees, data are gathered and badges punched at registration time in the fall. 
a temporary paper badge is used during the several weeks required for badge preparation. an outside contractor prepares and punches the badges. there were initial problems with the accuracy of punching but these have been resolved. the library now has a small ibm 015 badge punch, which it uses for punching special user badges and badges for carrel holders. student badges are valid for one year. the user code is changed each year to prevent use of an expired badge. faculty and staff badges are issued by the personnel department of the university, and are good for three years. these are also produced by an outside firm. book security exit guards examine all books taken from the library to ensure that they are properly charged. the call number on the book and on the date-due slip are compared; the user number on the date-due slip and the user's badge are compared. this need not be a character-by-character comparison. a few selected characters will suffice. student badges contain their pictures, which should bear at least a resemblance to the holder of the badge. initially, students were not required to show their badges. after a rash of book thefts resulting from the use of lost or stolen badges, this policy was changed. the book-check routine sometimes slows exit from the building during peak periods ; however it is considered a necessary security measure. the problem of lost badges is a serious one. users tend to leave badges in the terminals. usually such badges are turned in at the main circulation desk by the next user; the owner is notified to come in and pick it up. if a student loses his badge, he must report it to the circulation desk as soon as possible. he is issued a special use r badge, and the computer center is notified to "block" his regular user number. if someone then tries to use the badge, an "unprocessed" message will appear in lieu of a valid date-due slip. the problem is timing. "blocking" is done only once a day. a determined thief can charge out a considerable number of books before the number can be blocked. for this reason, a check of the photograph on the badge is important. the maximum number of user numbers which can be blocked is fifty. fortunately, except for faculty /staff badges which are good for three years, student badges automatically become invalid at the end of each school year, and special user badges expire at the end of each quarter. behind the decision to go on-line was the belief that a university library, 104 journal of library automation vol. 5/ 2 june, 1972 to effectively serve its patrons, needs to be able to determine the status of a book without delay. all books which are not in their places on the shelves as indicated by the card catalog are, in theory, retiected in the computer circulation file. out of a circulating collection of 900,000 items, the number of records in the file at any one time will range from 30,000 to 60,000. this includes books temporarily located in the reserve room, books being bound, and books which have been sent to the catalog department. it also includes books which are lost or missing but which have not yet been withdrawn from the catalog. a single 2740 typewriter terminal, located at the main circulation desk, is used for inquiry into the circulation file. a library user, having obtained the call number of a book from the catalog, looks for it in the stacks. if he is unable to find it, he inquires at the terminal. 
the operator enters a command "search," followed by the abbreviated call number of the book (the key ) . if one or more records with this key are in the file, the file address, plus the balance of the call number (the key extension), are typed back from the computer for each such record. if one of the listed records is the desired one, the operator then asks for a display of the record. the display includes the due date, type of charge, user number, and, if there is one, the saver number. the ability to use an abbreviated call number to access the file has proved invaluable. the operator can in effect "browse" among all the various editions, copies, and volumes of a particular book which are in circulation. the technique also facilitates finding a record, such as a volume in a serial, where the format is often quite variable, and not always obvious from the call slip supplied by the user. if a large number of books all with the same key are in the file, there is sometimes a considerable wait while the typewriter types out the addresses and key extensions for all the records. once such a listing begins, there is no way at present to cut it off in mid-point. this is a minor inconvenience; it could be remedied quite easily if computer core were not such a precious commodity. the single 2740 terminal is heavily used and plans are under way to substitute a cathode ray tube in the near future. book locate procedures if a search on the 27 40 terminal reveals that a book is not in circulation, the individual may ask that it be "located." a form is filled out and the book is searched nightly in the stacks. ( it is also again searched in the 27 40 since it may have been charged out to another user after the inquiry. ) if it is found , it is brought down and placed on the "save" shelf, and the inquirer is notified that it is available. if it is not found , the form is held for two weeks and searched again, both in the 2740 and the stacks. if it is not found on the second search, it is interactive circulation systemjveneziano 105 entered into the file as a "missing book." the circulation section has found that entering missing books into the file as soon as possible saves them time, because a search for a single book is often duplicated needlessly for a number of different individuals. save procedures when a user is informed that a book is in the circulation file, he may ask that it be called in for him, provided it is not on loan to the reserve koom and provided it is not already "saved" for someone else. the 2740 operator calls in the record and adds the saver's identification number to the record. each weekday morning, '·book needed notices" are sent over from the computer center for books "saved" since the last notice run. the notices are stuffed in window envelopes and mailed. even though the number of saves is small, in relation to the total number of books charged out, this feature has contributed to the library's and the user's satisfaction with the system. initially there was some consideration given to providing for multiple saves on the same book. a study of the frequency of multiple saves indicated that the increased system complexity did not warrant it. moreover, a student usually cannot wait too long for his turn at a book. a better solution in a university library is either to buy more copies or place high demand books in the reserve room, or both. the standard loan period is four weeks. 
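the abbreviated-key lookup and save placement just described can be summarized in a short sketch. the following python fragment is illustrative only: the record layout, the sample call numbers, and the function names are assumptions made for this example, not the actual northwestern programs (the rule that a book on loan to the reserve room cannot be saved is also omitted here).

```python
# illustrative sketch of the circulation-file lookup described above; field
# names, sample data, and function names are assumptions, not the actual code.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ChargeRecord:
    key: str                        # abbreviated call number used as the file key
    key_extension: str              # balance of the call number
    user_number: str                # borrower's identification number
    due_date: str
    charge_type: str
    saver_number: Optional[str] = None

# the on-line circulation file, sketched here as a simple in-memory list
circulation_file: List[ChargeRecord] = [
    ChargeRecord("510.8 q3", " v.2", "123456789", "1972-04-15", "regular"),
    ChargeRecord("510.8 q3", " v.3", "987654321", "1972-04-22", "regular"),
]

def search(key: str) -> List[Tuple[int, str]]:
    """return (file address, key extension) for every record under the key, so
    the operator can browse all editions, copies, and volumes in circulation."""
    return [(address, rec.key_extension)
            for address, rec in enumerate(circulation_file) if rec.key == key]

def place_save(address: int, saver_number: str) -> bool:
    """add the saver's identification number, provided the book is not already
    saved for someone else."""
    rec = circulation_file[address]
    if rec.saver_number is not None:
        return False
    rec.saver_number = saver_number
    return True
```

the value of the abbreviated key is that a single lookup returns every record sharing the stem, which is what lets the operator "browse" among editions, copies, and volumes whose exact format is not obvious from the call slip.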
a save on a book causes the due date to be recalculated either to two weeks, or to five days from the date of the save, whichever is later. this variable loan period increases the number of users who can use a book in high demand, without inconveniencing the user of a book which no one else needs. to succeed, such a call-in policy must be backed with enough force to ensure that a called-in book is returned promptly. if a book is returned after the revised due date, the user incurs a penalty fine of $1.00 per day in addition to the regular 10 cents per day fine. expired call-ins result in a weekly computer-generated reminder. when a book which is saved is discharged, the terminal printer issues a message to this effect, and the book can be placed on the "save" shelf instead of being sent to the stacks. each night "book available notices" are produced for all such books discharged since the last notice run. the first copy of the notice is mailed to the saver; the second part is inserted in the book. the saver is given five weekdays to pick up the book. book charges self-service charges during the regular school year, from 1000 to 1200 books per day are charged out through the system. most of these charges are processed by the users, on the self-service terminals. 106 ]ounwl of library automation vol. 5/2 june, 1972 a basic objective in the design of the system was to make it easy for the user to charge out books. initially it was planned to have manned chargeout terminals. however, as the design of the system progressed, it became evident that the vast bulk of charge-out transactions would consist of three simple steps: ( 1) insert the user badge, ( 2) insert the book card, and ( 3) tear off the date-due slip. the idea arose: if the procedure was so simple, why not let the user himself do it, thus saving the cost of terminal operators? there was some concern over user reaction, but it was decided it was worth the risk. a simple set of illustrated instructions is attached to the terminal. since the terminal will not accept badges or book cards unless they are inserted in the proper direction, the user soon gets the idea. the terminal will also refuse a seriously off-punched badge or book card. if everything is done properly, the printer produces a date-due slip containing the user number, the book ca11 number, and the date due. this is detached and placed in the book pocket. if, instead of a valid date-due slip, the user receives a slip from the printer containing the word "unprocessed," he is instructed to take all materials to the main circulation desk. this condition will occur if the individual tries to take out a book which is already charged out (perhaps to the reserve room or a carrel). it also happens if the badge or book card has fewer than the required complement of characters or if the user code on the badge has expired. it also happens if the user's number has been "blocked." although readers had no difficulty mastering the technique of using the 1031 badge/card reader, the 1033 printer was another story. despite the printed and illustrated injunction to "tear the slip forward ," the users insisted on pulling the paper upward. the result-the continuous roll of paper would start to skew and the paper would eventually jam. to alleviate the skewing problem, we had pin-feed platens installed in the typewriters. these prevented the skew, but the upward pull on the paper caused the pin-feed holes to tear and get out of alignment with the pins. the result-a paper jam. 
the ibm field engineers valiantly tried to overcome the condition but to no avail. ibm was unwilling to make any major modification of the paper feed mechanism, and no amount of argument that such an improvement would increase their sales to other libraries had any effect. in desperation, the library fina11y took its problem to the physics shop in northwestern university's technological institute. the technicians there designed and built a hooded feed to channel the paper upward and forward at the desired angle. a hand-actuated knife blade was installed to cut and dispense the ticket-type slip. in spite of these heroic efforts, paper jams still occur with enough frequency to be annoying. since the terminals are isolated in the stacks, a jam often goes undetected until a user comes down to the main circulation desk interactive circulation systemfveneziano 107 with a complaint. for this reason, we have plans to install a "ticket printer," which will automatically cut and eject a ticket with no user intervention. unlike the 1033 printer, there has been very little down-time due to malfunctioning of the 1031 badge/card reader. due to their isolation on the stack floors, there was some early tampering with the terminals. now that the newness has worn off, the terminals seem to have lost their appeal to pranksters, except that the photographs used to illustrate procedures have a way of disappearing. everything taken into consideration, the self-service concept has proved completely feasible. it saves staff time and user time. the time required to charge out a book ranges from ten to fifteen seconds. carrel charges each quarter, the circulation section assigns carrels to individuals, mostly graduate students and faculty. carrel holders may charge out books for use in their carrels. a special loan code is entered which results in the date-due slip bearing the word "carrel." the user cannot take these books from the building. carrel charges are subject to call-in after two weeks but are not subject to fines. at the end of each quarter, unless the carrel has been reassigned to the same individual, any remaining books in the carrel are picked up and discharged. once a quarter, the carrel user receives a computer-printed list of books charged to his carrel. carrel holders tend to charge large numbers of books. for saving time on their part and on the part of staff, plastic badges are issued. these will contain the carrel number, the carrel code, and an expiration date. carrel holders may then use the self-service terminals in the stacks. charges to the reserve room the reserve room does not use the circulation system for charges to individuals, since the loan period is so limited. however the circulation file contains a record of all books located in the reserve room. when a book is charged to the reserve room, the identification number of the reserve room is entered in the 1031 slides, together with a loan code indicating an indefinite loan period. processing of large batches of books is speeded up by suppressing the printing of date-due slips for all intralibrary charges. after charging, the punched book card is removed and held until the book is ready to be returned to the stacks, at which time the book is discharged in the regular manner. if a book needed for reserve cannot be found in the stacks, it is searched in the 2740 terminal. if it is in the file , a save is placed on the record which generates a book-needed notice. the user is given five days to return the book. 
when the book is returned and discharged, a printer message alerts the discharger, who places the book on the shelf for pick-up by the reserve room. if the book is not in the file, it goes through the "book locate procedure," after which, if it is not found, it is processed as a "missing" book. if such a missing book turns up, it can be immediately identified as needed by the reserve room. a quarterly listing, in call number order, is received from the computer center for all books charged out to the reserve room. this list serves as the reserve room's shelf list.
bindery charges
if a book is found to be in bad condition, it is set aside for a bindery decision. if it is beyond repair, it is charged out to the catalog department to be replaced or withdrawn. (after it is withdrawn it is deleted from the file.) if it can be repaired in-house, it is charged out to the mending section. if it must be sent to a commercial binder, it is charged to the bindery. the bindery section prepares an extra copy of the bindery ticket for all periodicals and unbound items, which it sends to the bindery. this ticket is used to keypunch a book card, which is then used to charge the book to the bindery. whenever a book is back from mending or binding, it is discharged before being sent to the stacks.
renewals
all renewals are processed at the main circulation desk. the procedure is identical to a regular charge except that a slide on the terminal is set to "renew." the new date-due slip will contain the phrase "renew to." in theory, the self-service terminals could be used for renewals. in practice, unless elaborate precautions were taken, a user could renew a book before it became due and then return it for discharge, leaving one slip in the book and keeping the other. after the book reached the stacks, the user could insert the extra date-due slip and walk out undetected. as protection against this, the original date-due slip must be in the book when it is renewed. phone renewals are not accepted. however, if the user mails or brings in his date-due slips, the renewal is processed on the 2740. in the renewal of a book via the 2740, the record is called in and modified to change the date due and enter the correct renewal code. the original date-due slip is stamped with the new date and the phrase "renewed." the slip is mailed to the user. although record modification via the 2740 is a valuable and necessary feature, it must be used with discretion, since the generalized file management system governing the 2740 does not have the controls contained in the circulation-specific portions of the program which handle data from the 1030's; for example, automatic calculation of date due, rejection of renewals on saved books, validation of codes, etc.
book discharge
returned books are left in book bins, one inside the building and one outside. it became very evident during the implementation phase of the system that the success of the system depended on a thorough screening before discharge, for purposes of detecting and deflecting potential problems before they got to the discharge terminal. books are first placed on dated trucks and then screened.
books without punched book cards
if the punched book card is missing, there will usually be a hand-written date slip in the book (the result of a manual charge). the screener pulls the matching book card from the "book-cards-pending" file.
( after a manual charge, book cards arc punched and filed in this file to await the return of the hook.) the book is then ready for regular discharge. if there is no book card waiting, the hook must be held until a card is ready. this is done to avoid the charge being made after the discharge. books with incorrect book cards all book cards are checked to see that they match the call number on the book pocket. sometimes cards get switched between two books by the user when he charges them; sometimes the error was made when the card was originally matched with the book. if a book is found to contain an incorrect card, sometimes the correct card will he found in the "cards-pending" file. if so, it is pulled and inserted and the hook sent for regular discharge. ( the incorrect card becomes a "snag". ) if the correct card is not found, the record is searched in the 27 40 under both call numbers ( the one on the card and the one on the book ) . if the record is under both call numbers, the record which matches the book is deleted; the book is sent to keypunching; the unmatched book card is filed in the "cards-pending" file to await the return of the book which matches it. , if the record is found under only one of the two call numbers, it is deleted. the book is sent to keypunching; the unmatched book card becomes a ''snag." "snag" cards will be searched in the shelf list and, if they represent valid books, will be searched in the stacks. this is done to determine if a matching book can be found. books without date due slips the presence of a date clue slip in a hook usually indicates that the book should be in the circulation file. a slip will be missing if the user nen'r charged it out or if he lost (or removed ) the slip after charging it out. such books arc searched on the 2740. if no record is found, the book is sent to the stacks. if a record is found, it is deleted. however, we wish to llo journal of library automation vol. .5 / 2 june, 1972 guard against the user returning to insert the date due slip and walk out with it; thus, the book is not sent to the stacks until the date due is past. regular 1031 discharges the speed and accuracy of discharge are features which have contributed much to the success of the system. a book with a date-due slip and book card which matches the book go to the 1030 terminal at the main circulation desk for discharge. one slide is set to either "fine paid" or "fine not paid." (if the user paid a fine at the time he returned the book, a "fine paid" flag will be in the book. ) another slide is set to "book returned today," or "book returned yesterday," or "book returned prior to yesterday." if the last condition applies, the date of return is also set in the slides. once set, these slides need not be reset until there is a change of date or fine condition. for minimizing the resetting of slides, books are segregated into groups all of the same type. discharging is the essence of simplicity. the book card is inserted in the reader; it feeds down and out and is replaced in the book. the date-due slip is discarded and the book is ready for shelving. for the purpose of speeding up discharge, no printer message is received unless there is an error (record not in file), or unless the book has a save on it, or is a "found" lost book. one operator can discharge five to six books per minute. books are almost always discharged within one day of return and usually within three or four hours. 
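the exception-only printer messages just described amount to a small dispatch rule; a minimal sketch, with field names and message strings assumed for illustration, might look like this.

```python
# minimal sketch of the discharge messages described above; the field names
# and message texts are assumptions for illustration only.
def discharge_message(record, lost_book_found=False):
    """return a printer message only in the exceptional cases; a routine
    discharge is silent so the operator is not slowed down."""
    if record is None:
        return "record not in file"   # book is routed to the 2740 operator
    if record.get("saver_number"):
        return "save"                 # book goes to the save shelf, not the stacks
    if lost_book_found:
        return "found lost book"      # routed to the staff member for lost/missing books
    return None                       # no message: slip discarded, book ready for shelving
```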
if a large number of books should pile up after a period of computer down-time (fortunately rare), a massive discharge campaign is launched. two operators, working together on the terminal, can discharge books at the rate of one every eight seconds. if at the time of discharge a "save" message appears on the printer, the book is placed on the save shelf instead of being sent to the stacks. if a lost book is "found," the message alerts the operator to send the book to the staff member in charge of lost and missing books. if a message is received to the effect that no record exists in the file, the book is routed to the 2740 operator. occasionally the 1031 terminal will misread a card, usually due to improper folding. if a card is folded outside the punched area it causes no trouble. unfortunately, some of the original cards were folded in the middle which sometimes results in a punch being missed. this, in some cases, cannot be detected by the computer program. if the error resulted from a mis-read card, the terminal operator can usually determine, from the date-due slip and the error-message slip, the key under which the record exists. the record is deleted and the book sent to have a replacement card punched. an occasional cause of the "record-not-in-file" condition results when the charge was processed on the standard register punch (the mechanized interactive circulation systemjveneziano 111 back-up system). this punch has a disconcerting habit of dropping a punch from badges which have a slight defect. there is no warning when this happens, and the error is often not detected until the transaction is later processed through the 1030 terminal. since it is impossible to identify the user with certainty, such cards are simply discarded without processing on the assumption that most users are basically honest and will return the book. the 27 40 operator, seeing a date-due slip with a short identification number, is safe in assuming the record never got in the file. sometimes the "record-not-in-file" condition is the result of a discharger absent-mindedly discharging a book twice. if the 2740 operator cannot find a record, she gives up and sends the book to the stacks. during the early days of operation, when much of the charging was being done on the source record punch, the "record-not-in-file" condition was often due to the book being "discharged" b efore the charge was processed. the very small amount of down-time now, coupled with careful scheduling when it does occur, has almost eliminated this source of error. overdue books overdue notices for students and special individual users are prepared once a week. to avoid sending out large numbers of notices for books only a few days overdue, an overdue notice is prepared only if the book is at least four days overdue. a second notice is prepared two weeks after the first; a third and final notice is prepared two weeks after that. if there is no response to the final notice within two weeks, a "delinquent" notice is prepared which is not sent out but is used to prepare a bill for a "lost" book. the overdue-notice run also produces reminders of expired call-ins. fines and fine collection faculty and staff are fine-exempt. students and other individual users pay a 10 cents per day fine for books overdue more than three days. in addition , if a reader does not respond to a call-in by the revised due date, he is charged a $1.00 per day penalty fine. 
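the fine rules above, together with the revised due date produced by a save (described earlier under the save procedures), reduce to simple arithmetic. the sketch below is illustrative only: it assumes a three-day grace period on the regular fine (one reading of "overdue more than three days"), reads "two weeks" for a save as two weeks from the original charge date, and invents the function names for the example.

```python
# illustrative arithmetic for the rules described above; the grace-period and
# "two weeks from the charge" readings, and the function names, are assumptions.
from datetime import date, timedelta
from typing import Optional

def revised_due_on_save(charge_date: date, save_date: date) -> date:
    """due date after a save: two weeks from the charge or five days from the
    save, whichever is later (the standard loan is otherwise four weeks)."""
    return max(charge_date + timedelta(weeks=2), save_date + timedelta(days=5))

def fine_due(returned: date, due: date, revised_due: Optional[date] = None,
             faculty_or_staff: bool = False) -> float:
    """fine in dollars for one returned book."""
    if faculty_or_staff:
        return 0.0                              # faculty and staff are fine-exempt
    fine = 0.0
    days_late = (returned - due).days
    if days_late > 3:                           # assumed three-day grace period
        fine += 0.10 * days_late                # regular fine: 10 cents per day
    if revised_due is not None and returned > revised_due:
        fine += 1.00 * (returned - revised_due).days   # call-in penalty: $1.00 per day
    return round(fine, 2)

# a called-in book returned ten days late and three days past the revised due date
print(fine_due(date(1972, 5, 20), date(1972, 5, 10), date(1972, 5, 17)))  # 4.0
```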
a user may elect to pay a fine on an overdue book at the time he returns it, in which case a "fine-paid" flag is inserted to alert the discharger to set the proper slide. no fine notice will result if this slide is set. for all other books returned late, fine notices are computer-prepared each weekday. these are on four-part forms; one copy is inserted in a window envelope and mailed; the other three parts are filed alphabetically by name. when the user pays his fine, the extra slips are discarded. if the fine is not paid in a reasonable period, one of the extra copies is sent as a follow-up notice. if no response to the follow-up is received, and if the total bill exceeds $3.00, the bill is sent to the department of student finance for collection. 112 journal of library automation vol. 5/2 june, 1972 sometimes the receiver of an overdue notice will come in to report that he ( l) returned the book, ( 2) lost it, or ( 3) never had it. in such cases the book is searched in the 27 40 because the book may have been returned since the overdue notice was prepared. if the record is still in the file, the item is verified in the shelf list. in some instances an incorrectly punched card is responsible for the item not being properly discharged. if a call number on a notice cannot be found in the shelf list, there is no alternative except to delete the record and absolve the reader of responsibility. if the call number on the notice represents a valid book, it is searched in the stacks and if found, is brought down and discharged, with a resultant fine notice. when the book cannot be found, the reader is usually held responsible for it, unless it was a case of a lost badge which was reported promptly, in which case the library is usually lenient. if no lost badge was involved, the book is processed as a "lost" book and the user is billed. a book is also considered lost (and the user billed) if the user does not respond to three overdue notices. weekly overdue notices are not prepared for faculty. instead a once-aquarter computer-produced memo is prepared informing the individual of the books charged to him. he is asked to return them or notify the library by carbon copy of the list that he wishes to retain them. if a faculty member does not return the list, the library calls in the books individually. as part of this quarterly memo run, listings of books charged to carrels and to departments (reserve room, bindery, cataloging, etc. ) are produced. these listings have proved very valuable in maintaining control over books charged out on a long-term basis. lost books when a book is determined to be "lost," a duplicate book card is prepared. the history of the loss, including the name and address of the individual involved, is entered on the card. if the reader is held responsible, the book is priced and a bill is prepared. the original record is left in the file until all the documents are prepared. then it is deleted via the 27 40, and a duplicate card is immediately used to charge the book out to the "lost" category. the duplicate card is then filed in the "lost/missing" file, by call number. another category of books is known as "missing." these are books which, although not charged out to anyone, cannot be found in the stacks. a duplicate card is prepared and used to charge the item out to the "missing" category. the card is filed in the "lost/missing" file. once a quarter, a computer-produced listing of lost/missing books is received. 
using this list, the stacks can be searched to see if the books have turned up. the list of books lost or missing for more than two years it turned over to the catalog department for withdrawal. after official withdrawal, the record will be deleted from the file. interactive circulation systemjveneziano 113 the fact that all lost/ missing books are reflected in the file has aided in detecting them if they turn up. if such a book is discharged, a printed message alerts the operator who routes the book to the person in charge of lost/missing books. the duplicate card is then pulled from the lost/missing file. since the card contains the name of the responsible individual, it is possible to trace down the original bill in case an adjustment is necessary. lost/ missing books also turn up if someone tries to charge them out. the "unprocessed" message which is printed instead of a date-due slip will usually cause the reader to bring the book to the main circulation desk where the proper action can be taken to reinstate it in the collection. manual charges the system had to be designed so a book could be charged out even if it did not have a punched book card. such books are brought to the main circulation desk where a two-part form is hand-prepared. one part becomes a date-due slip; the other part goes to keypunching. a composite card containing the call number, the user number, and the loan code is punched, which is then fed through the 1030 to create a charge record. also keypunched at this time is a regular book card, which is filed in the "cards-pending" file to await the return of the book. such manual charges are very unsatisfactory. call numbers and identification numbers are often illegible or miscopied. keypunch errors are not uncommon. care must be taken that the composite cards are processed through the 1030 before the book is returned for discharge. fortunately, books without cards are now a rarity. mechanized charges although the amount of computer down-time is very slight, some means had to be devised to charge out books during such periods. the manual charge procedure could have been used; however the high error rate in copying and punching, coupled with the delay in keypunching any substantial volume of cards, caused us to reject this as a back-up system. a standard register source record punch is used. this punch reads the badge and book card and transfers the data, plus data from a series of internal slides, to produce a printed date-due slip and a punched composite card. when the computer comes back on, the composite card is fed through the 1031 to set up the charge record. since only one machine can be justified from a cost standpoint, the process of charging books out in this fashion is slow. long lines of people often form, waiting for service. resetting the internal slides between one loan code and another is awkward and error-prone. the machine is extremely sensitive to badge quality and often misses a punch. however, as with manual charges, the most significant disadvantage is that charges are made "blind." there is no way to determine whether a book is not already in the file, or, if it is being renewed, that it has a save 114 journal of library automation vol. 5/2 june, 1972 on it. the user's number may be one of those "blocked" from use; this fact is not detected until it is too late. as with manual charges, care must be taken that all such mechanized charges are processed through the 1030 before any discharging is done. 
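the ordering constraint just noted (every backed-up charge must reach the file before discharging resumes) can be sketched as a small replay step. the transaction layout, the dictionary standing in for the circulation file, and the function name below are assumptions for illustration, not the actual 1030/1031 processing.

```python
# illustrative sketch of replaying back-up transactions in the required order;
# the data layout and the dictionary-based "file" are assumptions.
def replay_backlog(transactions, circulation_file):
    """transactions: list of ('charge' | 'discharge', call_number, user_number)
    tuples collected while the computer was down. all charges are applied
    before any discharge, so a discharge never reaches the file ahead of its
    own charge."""
    ordered = sorted(transactions, key=lambda t: 0 if t[0] == "charge" else 1)
    for kind, call_number, user_number in ordered:
        if kind == "charge":
            circulation_file[call_number] = user_number
        else:
            circulation_file.pop(call_number, None)
    return circulation_file
```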
in spite of its defects, the source record punch has proved useful as a system back-up. the error rate in transfer, while higher than on the 1030, is significantly less than the error rate of manually prepared and keypunched charges. although slow, records get into the file much faster than if they had to be keypunched. the impact of the system on the library the new system has had a profound impact on the operation of the circulation services section, but other departments have also been affected, particularly technical services. tighter control of cataloging is now maintained. no longer is it feasible for small uncataloged collections or collections with off-beat cataloging to exist in virtual isolation from the rest of the library. regulations as to depth of classification have had to be adopted; the formation of the cutter number and work letters must be carefully regulated; the assignment of volume and edition numbers must be uniform. location symbols require careful control; no longer can books be casually passed from one collection to another without official transfer. withdrawal of lost and missing books must be systematically performed. the system gives maximum flexibility-books may circulate on lc class numbers or document numbers as well as on a dewey number. ways of handling non-standard cutter numbers and work letters have been improvised. at the same time, the system operates to prevent unnecessary haphazard and shortsighted practices. within the circulation services section, the computerization of circulation has not resulted in fewer personnel; it has, however, resulted in the same number of staff members being able to handle a much larger volume of circulation and to handle it more efficiently. in addition, cirrulation services has taken on a number of tasks which in the past were either not its responsibility or, if they were, were given only perfunctory attention. a comprehensive inventory of the entire collection of 1,200,000 books in the main library is in progress. errors both on books and in the catalog are being corrected. the physical condition of the collection is being attended to. the content and quality of the collection are receiving increased attention. incomplete serial holdings are being brought to light for possible acquisition. books in the stacks which are candidates for inclusion in the "special collections" department are being detected. so far as circulation proper is concerned, it can be said without reservation that the system saves a great deal of clerical effort. staff time spent in charging out books is very small. discharging an average day's books interactive circulation system j veneziano 115 requires three or four man-hours. filing has almost disappeared as has most of the typing formerly required. a 2740 operator is required for inquiry and processing of mail renewals for the better part of the day and evening. the collection and follow-up on fines and bills is still a time-consuming job, although the extra forms available for follow-up have supplied some relief. the system is not perfect. there are certain improvements-such as on-line validation of users and automatic regulation of loan privilegeswhich would be made if the time and money were available for them. however, considering the modest cost of developing and operating the system, the imperfections are bearable. not the least of the benefits derived from the system is a somewhat intangible one. the role of the circulation librarian, and that of his staff, has changed. 
no longer are they chained to mountains of cards which, as soon as they are filed, must be unfiled. staff members have been challenged to use their released time to the best advantage. much thought and ingenuity has gone into setting up procedures to achieve maximum efficiency and accuracy. for the first time, perfection is seen as an attainable goal. each day the staff develops more sophistication and gets a step closer to that goal.
figure 1. user inserts identification badge and punched book card in self-service circulation terminal.
fig. 2. specially designed attachment is used to cut off printed
fig. 3. user inserts date-due slip in book pocket, completing charge procedure.
fig. 4. terminal at circulation desk has manual entry unit, which can be set to process charges without an identification badge, renewals, or discharges.
fig. 5. typewriter terminal is used for inquiry into file, placing saves on books, and occasionally for renewals.
a library in the palm of your hand: mobile services in top 100 university libraries
yan quan liu and sarah briggs
information technology and libraries | june 2015
abstract
what is the current state of mobile services among academic libraries of the country's top 100 universities, and what are the best practices for librarians implementing mobile services at the university level? through in-depth website visits and survey questionnaires, the authors studied each of the top 100 universities' libraries' experiences with mobile services. results showed that all of these libraries offered at least one mobile service, and the majority offered multiple services. the most common mobile services offered were mobile sites, text messaging services, e-books, and mobile access to databases and the catalog. in addition, chat/im services, social media accounts and apps were very popular. survey responses also indicated a trend towards responsive design for websites so that patrons can access the library's full site on any mobile device. respondents recommend that libraries considering offering mobile services begin as soon as possible, as patron demand for these services is expected to increase.
introduction
mobile devices, such as smart phones, tablets, e-book readers, handheld gaming tools and portable music players, are practically omnipresent in today's society. according to walsh (2012), "mobile data traffic in 2011 was eight times the size of the global internet in 2000 and, according to forecasts, mobile devices will soon outnumber human beings".1 studies have revealed that use of mobile devices is widespread and continues to increase. as of 2013, 56% of americans owned a smart phone (smith 2013). this number is even higher among people ages 18 to 29.2 however, peters (2011) points out that mobile phones at least can be found among people of all ages, nationalities and socioeconomic classes.
he  writes,  “we  truly  are  in  the  midst  of  a  global  mobile   revolution.”3  in  2012,  the  acrl  research  planning  and  review  committee  found  that  55%  of   undergraduates  have  smart  phones,  62%  have  ipods,  and  21%  have  some  kind  of  tablet.  over  67%   of  these  students  use  their  devices  academically.4  elmore  and  stephens  (2012)  write,  “academic   libraries  cannot  afford  to  ignore  this  growing  trend.  for  many  students  a  mobile  phone  is  no   longer  just  a  telephonic  device  but  a  handheld  information  retrieval  tool.”5       yan  quan  liu  (liuy1@southernct.edu)  is  professor  in  information  and  library  science  at   southern  connecticut  state  university,  new  haven,  ct,  and  special  hired  professor  at  tianjin   university  of  technology,  tianjin,  china.  sarah  briggs  (sjg.librarian@gmail.com)  is   library/media  specialist  at  jonathan  law  high  school,  milford,  ct.     a  library  in  the  palm  of  your  hand:  mobile  services  in  the  top  100  university  libraries  |     liu  and  briggs  |  doi:  10.6017/ital.v34i2.5650   134   it  is  clear  from  these  studies  that  academic  libraries  can  expect  their  patrons  to  be  accessing  their   services  via  mobile  devices  in  growing  numbers  and  need  to  adapt  to  this  reality.  however,  the   sheer  number  of  mobile  devices  on  the  market  and  the  myriad  ways  libraries  could  offer  mobile   services  can  be  daunting.  additionally,  offering  mobile  services  requires  investing  time,  money,   and  personnel.  in  order  to  give  libraries  a  starting  point,  this  paper  examines  the  current  status  of   mobile  services  in  the  united  states’  top  100  universities’  libraries  as  a  model,  specifically  what   services  are  being  offered,  what  are  they  being  used  for,  and  what  challenges  libraries  have   encountered  in  offering  mobile  services.  in  doing  so,  this  paper  attempts  to  answer  two  questions:   what  is  the  state  of  mobile  services  among  academic  libraries  of  the  country’s  top  ranked   universities,  and  what  can  the  experiences  of  these  libraries  teach  us  about  best  practices  for   mobile  services  at  the  university  level?     literature  review   current  status  of  mobile  services  in  academic  libraries   there  is  not  a  lot  of  data  regarding  the  prevalence  of  mobile  services  in  academic  libraries.  a  2010   study  found  that  35%  of  the  english  speaking  members  of  the  association  of  research  libraries   had  a  mobile  website  for  either  the  university,  the  library,  or  both  (canuel  and  crichton  2010).6  a   study  of  chinese  academic  libraries  revealed  that  only  12.8%  surveyed  had  a  section  of  their  web   pages  devoted  to  mobile  library  service  (li  2013).7  in  2010,  canuel  and  crichton  found  that  13.7%   of  association  of  universities  and  colleges  of  canada  members  had  some  mobile  services,   including  websites  and  apps.8  in  the  united  states,  a  2010  survey  found  that  44%  of  academic   libraries  offered  some  type  of  mobile  service.  39%  had  a  mobile  website,  and  36%  had  a  mobile   version  of  the  library’s  catalog.  half  of  libraries  which  did  not  offer  mobile  services  were  in  the   planning  process  for  creating  a  mobile  website,  catalog,  and  text  notifications.  
additionally,  40%   planned  on  implementing  sms  reference  services,  and  54%  wanted  the  ability  to  access  library   databases  on  mobile  devices  (thomas  2010).9  however,  it  is  widely  assumed  that  mobile  services   will  expand  rapidly  in  the  future  (canuel  and  crichton  2010).10  more  recently,  a  2012  survey  of   academic  libraries  in  the  pacific  northwest  found  that  50%  had  a  mobile  version  of  the  library’s   website  and/or  catalog,  40%  used  qr  codes,  38%  had  a  text  messaging  service,  and  18%  replied   “other”  with  mobile  interfaces  for  databases  being  a  popular  offering.  however,  31%  of  survey   respondents  still  did  not  have  any  mobile  services  (ashford  and  zeigen  2012).11  osika  and   kaufman  (2012)  surveyed  community  and  junior  colleges  nationwide  to  determine  what  mobile   services  were  being  offered.  73%  offered  mobile  catalog  access,  62%  offered  vendor  database   apps,  two  were  creating  a  mobile  app  for  the  library,  and  14.7%  had  a  mobile  library  website.12         definition  and  types  of  mobile  services   although  there  are  dozens  of  different  mobile  devices  on  the  market,  la  counte  (2013)  aptly  and   succinctly  defines  them  as  follows:  “the  reality  is  that  mobile  devices  can  refer  to  essentially  any   device  that  someone  uses  on  the  go”  (vi).13  smart  phones,  netbooks,  tablet  computers,  e-­‐readers,     information  technologies  and  libraries  |  june  2015   135   gaming  devices  and  ipods  are  examples  of  mobile  devices  that  are  now  commonplace  on  college   campuses.  barnhart  and  pierce  (2012)  define  these  devices  as  “…networked,  portable,  and   handheld…”14  additionally,  these  devices  may  be  used  to  read,  listen  to  music,  and  watch  videos   (west,  hafner  and  faust  2006).15  according  to  lippincott  (2008),  libraries  should  consider  all   their  patron  groups  as  potential  mobile  library  users,  including  faculty,  distance  education   students,  on-­‐campus  students,  students  placed  in  internships  or  doing  other  kinds  of  fieldwork,   and  students  using  mobile  devices  to  work  on  collaborative  projects  outside  of  school.16     the  most  common  mobile  services  discussed  in  the  literature  are  mobile-­‐friendly  websites  or  apps,   mobile-­‐friendly  access  to  the  library’s  catalog  and  databases,  text  messaging  services,  qr  codes,   augmented  reality,  e-­‐books,  and  information  literacy  instruction  facilitated  by  mobile  devices.   these  services  fall  into  one  of  two  categories:  traditional  library  services  amended  to  be  available   with  mobile  devices  and  services  created  specifically  for  mobile  devices.     common  library  services  that  have  been  updated  to  be  mobile-­‐friendly  include  a  mobile  website   (either  as  a  mobile  version  of  the  library’s  regular  site,  an  app,  or  both),  mobile-­‐friendly  interfaces   for  the  library’s  catalog  and  databases,  access  to  books  in  electronic  format,  and  information   literacy  instruction  which  makes  use  of  mobile  devices.  regarding  mobile  websites  and  apps,   walsh  (2012)  writes,     “if  a  well-­‐designed  app  is  like  a  top-­‐end  sports  car,  a  mobile  website  is  more  like  a  family  run-­‐ around.  
it  may  not  be  as  good  looking,  but  it  is  likely  to  be  cheaper,  easier  to  run  and   accessible  to  more  people.”17     it  is  not  feasible  to  replicate  the  entire  website  in  a  mobile  version,  so  libraries  must  know  what   patrons  find  most  important  and  address  that  information  through  the  mobile  site  (walsh  2012).18   according  to  a  2012  survey  of  academic  libraries  in  the  pacific  northwest,  the  most  popular  types   of  information  found  on  mobile  websites  are  links  to  the  catalog,  a  way  to  contact  a  librarian,  links   to  databases,  and  hours  of  operation  (ashford  and  zeigen  2012).19  many  libraries  are  also   providing  mobile  access  to  their  catalogs  and  databases.  this  is  sometimes  difficult  because  often   third-­‐party  vendors  are  responsible  for  the  catalogs  and/or  databases,  and  libraries  must  rely  on   these  vendors  to  provide  mobile  access  (iglesias  and  meesangnil  2011).20  however,  many  vendors   already  offer  mobile-­‐friendly  interfaces;  libraries  must  be  aware  when  this  is  the  case  and  provide   links  to  these  interfaces.  when  a  vendor  does  not  provide  a  mobile-­‐friendly  interface,  the  library   should  encourage  the  vendor  to  do  so  (bishoff  2013,  p.  118).21     there  is  a  growing  expectation  that  libraries  will  provide  e-­‐books  to  patrons  as  e-­‐books  become   increasingly  popular.  walsh  (2012)  states  that  the  proportion  of  adults  in  the  united  states  who   own  an  e-­‐book  reader  doubled  between  november  2010  and  may  2011.22  according  to  bischoff,   ruth,  and  rawlins  (2013),  29%  of  americans  owned  a  tablet  or  e-­‐reader  as  of  january  2012.23  this   has  presented  challenges  for  libraries,  mainly  in  two  areas:  format  and  licensing.  there  is  risk   involved  in  choosing  a  format  that  will  only  work  with  one  product,  i.e.  a  nook  or  a  kindle,     a  library  in  the  palm  of  your  hand:  mobile  services  in  the  top  100  university  libraries  |     liu  and  briggs  |  doi:  10.6017/ital.v34i2.5650   136   because  not  every  patron  will  own  the  same  device,  and  ultimately  one  device  might  become  the   most  popular,  rendering  books  purchased  for  other  devices  obsolete.  on  the  other  hand,  formats   that  work  with  multiple  devices  tend  to  have  only  basic  functionality  and  do  not  provide  an  ideal   user  experience  (walsh  2012).24  walsh  (2012)  recommends  epub,  which  works  well  with  many   different  devices,  is  free,  and  supports  the  addition  of  a  digital  rights  management  layer.25   licensing  is  also  an  issue  as  libraries  and  publishers  strive  to  find  a  method  of  loaning  e-­‐books   amenable  to  both.  no  one  model  has  emerged  which  is  mutually  satisfactory  (walsh  2012).26             libraries  are  increasingly  integrating  mobile  technologies  into  information  literacy  instruction   and  other  forms  of  instruction.  for  example,  services  such  as  skype  and  facetime,  which  walsh   (2012)  describes  as  “a  window  to  another  world”  (p.  105),  can  be  used  for  distance  learning,   including  reference  and  instruction.27  when  interactions  do  not  need  to  take  place  live,  many   mobile  devices  have  the  capability  to  take  pictures,  record  video,  and  record  audio  (walsh  2012,  p.   
97).28  this  allows  class  events,  including  lectures  and  discussions,  to  be  broadcast  to  people  and   spaces  beyond  the  physical  classroom.  walsh  (2012)  notes  that,  when  constructing  podcasts  or   vodcasts,  it  is  important  to  make  mobile-­‐friendly  versions  of  these  available,  bearing  in  mind   different  platforms  and  screen  sizes  people  might  be  using  to  access  the  content.29      text  messaging,  qr  codes,  and  augmented  reality  are  examples  of  library  services  that  were   created  expressly  for  mobile  devices.  text  messaging  in  particular  has  become  a  very  popular   mobile  service  offering;  as  thomas  and  murphy  (2009)  write,  “interacting  with  patrons  through   text  messaging  now  ranks  among  core  competencies  for  librarians  because  sms  increasingly   comprises  a  central  channel  for  communicating  library  information.”30  a  common  use  of  text   messaging  is  a  ‘text  a  librarian’  service.  walsh  (2012)  recommends  launching  such  a  service  even   if  the  library  currently  offers  no  other  mobile  services,  noting,  “it  can  be  quick,  easy  and  cheap  to   introduce  such  a  service  and  it  is  an  ideal  entry  into  the  world  of  providing  services  via  mobile   devices”  (p.  45).31  peters  (2011)  points  out  that  the  shorter  the  turnaround  time  (he  recommends   less  than  ten  minutes)  the  better.  he  notes  that  many  questions  arise  as  the  result  of  a  situation   the  questioner  is  currently  in.  he  writes,  “if  you  do  not  respond  in  a  matter  of  minutes,  not  hours,   the  context  will  be  lost  and  the  need  will  be  diminished  or  satisfied  in  other  ways.”32   qr  codes  have  become  popular  in  libraries  offering  mobile  services.  qr  codes  encode  information   in  two  dimensions  (vertically  and  horizontally),  and  thus  can  provide  more  information  than  a   barcode.  the  applications  necessary  for  using  qr  codes  are  usually  free,  and  they  can  be  read  by   most  mobile  devices  with  cameras  (little  2011).33  the  most  common  uses  of  qr  codes  in   academic  libraries,  according  to  elmore  and  stephens  (2012),  are  linking  to  the  library’s  mobile   website  and  social  media  pages,  searching  the  library  catalog,  viewing  a  video  or  accessing  a  music   file,  reserving  a  study  room,  and  taking  a  virtual  tour  of  the  library  facilities.34     augmented  reality  may  not  currently  be  used  as  often  in  libraries  as  other  services  such  as  mobile   sites  and  text  messaging,  but  many  libraries  are  finding  unique  and  compelling  ways  to  use  ar.  ar   applications  link  the  physical  with  the  digital,  are  interactive  in  real  time,  and  are  registered  in  3-­‐d.     
hahn (2012) defines ar as follows: "in order to be considered a truly augmented reality application, an app must interactively attach graphics or data to objects in real time, to achieve the real and virtual combination of graphics into the physical environment."35 he notes that such applications are excellent additions to libraries' mobile services because they connect physical and digital worlds, much like libraries.36 one example of augmented reality is north carolina state university's wolfwalk, which is advertised as "…a historical walking tour of the nc state campus using the location-aware campus map" (ncsu libraries).37 to create the tour, the ncsu libraries special collections research center provided over one thousand photographs of the campus from the 19th century to the present (ncsu libraries).38

research design
to make sure the information gathered was current and valid, this study employed two approaches, website visits and survey investigation, to determine the state of mobile services at the top 100 universities' libraries. the website visits explored what mobile services are being offered and how they are being offered at these university libraries. the survey sent via email inquired how the libraries are providing mobile services and what their results have been regarding challenges, successes, and best practices. the survey data was analyzed and compared to the data obtained via website exploration to form a more comprehensive picture of mobile services at these universities.

participants
university libraries' patrons are frequent users of mobile technology. according to osika and kaufman (2012), studies have found that 45% of 18- to 29-year-olds who have internet-capable cell phones do most of their browsing on their devices.39 kosturski and skornia (2011) note that people of this age group are "…leaders in mobile communication…the traditional college-age student."40 because these institutions are the nation's leaders in undergraduate and graduate programs and academic research, an examination of the status of the top 100 university libraries' mobile services can provide useful service patterns and a benchmark for the service improvements that would benefit academic programs. based on the u.s. news & world report's national university rankings, this study selected the top 100 universities in the 2014 rankings.41

procedure
website visits, the first step, were conducted from march 2, 2014 to march 16, 2014. each library's home page was carefully examined for the most common mobile services named in the literature, using these categorized items: 1) a mobile website or app, 2) mobile access to the library's catalog and databases, 3) text messaging services, 4) qr codes, 5) augmented reality, and 6) e-books. to assess each site, we first visited the site via a nexus 7 to see if it had a mobile version. next, we viewed each library's full site on a laptop computer.
we browsed through each page of the site looking for mention or use of each of these categories. we also searched for these items via the library's site map or site search functions whenever available. the results were tabulated with a codebook in the established categorization through microsoft excel.

although the website visits place great value on gathering quantitative data about what mobile services are offered at these libraries, this method has its limitations. firstly, it locates only those mobile services that appear on a library's website, but services the library provides which are not mentioned on the website can be overlooked. also, the use of mobile devices or services in library instruction, a very commonly mentioned mobile service in the literature, cannot generally be determined via a website visit. in addition, the website visit provides only a snapshot of the current state of mobile services; university libraries may be planning to implement or even be in the process of implementing mobile services. lastly, website visits evaluate what is publicly available, but it is not possible to access password-protected information meant only for a university's students and faculty to assess mobile content. to address these shortcomings, we created a survey using surveymonkey to complement the data supplied from the website visits. we sent out the survey via email to each of the top 100 universities' libraries. the survey was conducted from april 10, 2014, to april 24, 2014.

results and analysis
study results presented compelling evidence that mobile services are already ubiquitous among the country's top universities. the most recognized ones are mobile sites, mobile apps, mobile opacs, mobile access to databases, text messaging services, qr codes, augmented reality, and e-books. these service forms confirm those commonly named in the literature as library mobile services.

what basic types of mobile services do the libraries provide?
the results showed all of the libraries offered one or more of the specific mobile services in chart 1, with multiple entries allowed, presenting the modernized service patterns the university libraries provide to meet the needs and demands of university communities in this digital era.

chart 1. percentage of libraries offering specific mobile services (multiple entries allowed).

it is clear from both the survey results and the website visits that almost all libraries at the top 100 universities are offering multiple mobile services, with mobile websites, mobile access to the library's catalog, mobile access to the library's databases, e-books, and text messaging services being the most common. qr codes and especially augmented reality are not as common.
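as a rough illustration of how such a website-visit codebook can be tallied, the sketch below assumes the excel codebook has been exported to a csv file with one row per library and one 0/1 column per service category; the file name, the column names, and any figures it prints are the sketch's own illustrative assumptions, not the study's data or instruments.

import csv
from statistics import mean

# hypothetical export of the website-visit codebook: one row per library,
# one 0/1 flag per service category examined during the visits.
SERVICES = ["mobile_site", "mobile_app", "mobile_opac", "mobile_databases",
            "text_messaging", "qr_codes", "augmented_reality", "e_books"]

def tally(path="codebook.csv"):
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return {}, {}, 0.0
    n = len(rows)
    # percentage of libraries offering each service (cf. chart 1)
    pct = {s: 100 * sum(int(r[s]) for r in rows) / n for s in SERVICES}
    # distribution of how many services each library offers (cf. table 1)
    per_library = [sum(int(r[s]) for s in SERVICES) for r in rows]
    dist = {k: per_library.count(k) for k in range(len(SERVICES) + 1)}
    return pct, dist, mean(per_library)

if __name__ == "__main__":
    pct, dist, avg = tally()
    for s, p in sorted(pct.items(), key=lambda kv: -kv[1]):
        print(f"{s:<20} {p:5.1f}%")
    print("services per library:", dist, "average:", round(avg, 2))

a tally of this kind yields both the per-service percentages reported in chart 1 and the per-library distribution reported in table 1 below.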
of the eight main mobile services we looked for via the website visits and survey (mobile site, mobile app for the site, mobile opac, mobile access to databases, text messaging, qr codes, augmented reality, and e-books), all libraries surveyed offer between one and seven of these services. no universities have none of these services, and no universities have all of these services. only one university has one service, none have two, seven have three, thirteen have four, twenty-four have five, forty-six have six, and eight have seven. to make this information easy to read, we summarized it in table 1 below.

table 1. number of mobile services offered.
number of mobile services offered   number of libraries   percentage of libraries
no mobile services                          0                     0%
1 mobile service                            1                     1%
2 mobile services                           0                     0%
3 mobile services                           7                     7%
4 mobile services                          13                    13%
5 mobile services                          24                    24%
6 mobile services                          46                    46%
7 mobile services                           8                     8%
8 mobile services                           0                     0%

chart 1 data (percentage of libraries offering specific mobile services): e-books 92.6%, mobile opac 88.0%, mobile databases 81.7%, mobile website 81.6%, text messaging 77.2%, qr codes 58.7%, mobile app for site 29.2%, augmented reality 5.0%.

such a data pattern demonstrates not only that mobile services are very widespread at these universities' libraries, but also that the vast majority of these libraries offer multiple mobile services. in other words, libraries do not appear to be offering mobile services in isolation; they have taken several of their most popular services (such as websites, reference, and search functions) and mobilized all of them. in fact, the average number of mobile services offered among the eight services we examined is 5.31.

although results collected from the two research methods (website visits and survey) are almost identical for mobile websites and mobile opacs, and are very comparable for text messaging, qr codes, and augmented reality, there is a bit of a gap between results from the website visits and the survey regarding mobile databases (92.9% vs. 70.59%); perhaps the libraries that responded to the survey simply offer mobile access to databases less often than the libraries as a whole.

it is interesting that we located e-books on 100% of the websites we visited, but only 85.29% of respondents mention offering them. perhaps this discrepancy can be explained by a clarification in terms. we looked for the presence of books in electronic format that could be accessed online; perhaps survey respondents only considered e-books specifically formatted for smart phones or tablets as a mobile service. also, later in the survey several respondents mention communication issues as an ongoing challenge in offering mobile services, specifically, not always knowing what other library departments are offering in terms of mobile services.
it is possible that some survey respondents are not responsible for the e-book collection and thus did not mention it as a mobile service.

another discrepancy exists between the results for mobile apps for the library's site (20.2% for the website visits versus 38.24% for the survey). these results indicate that mobile apps for libraries' sites are more common than we had previously thought. perhaps these apps are being advertised in places other than the library's website, and therefore a website visit is not the best way to discover them.

the website visits did not look for mobile library instruction, mobile book renewal, or mobile interlibrary loan, but through our website visits we saw these services mentioned several times and thus included them in the survey. they turned out to be somewhat common among libraries surveyed; 41.18% of respondents offer mobile book renewal, 20.59% offer mobile interlibrary loan, and 32.35% offer mobile-friendly library instruction.

table 2 below compares the data collected from both the website visits and the survey among these 100 universities, ranking from high to low percentages. in most cases, they are very similar.

table 2. data comparison of specific mobile services between website visits & survey (percentage of libraries offering each service).
mobile service        website visits   survey
e-books                   100%          85.29%
mobile databases         92.90%         70.59%
mobile opac              87.80%         88.24%
mobile website           80.80%         82.35%
text messaging           80.80%         73.53%
qr codes                 61.60%         55.88%
mobile app for site      20.20%         38.24%
augmented reality         7.00%          2.94%

what content do the mobile sites offer?
in addition to assessing whether libraries had a mobile site, the survey asked libraries that already have a mobile site what is included on the site. 100% of libraries with mobile sites include library hours on their site, making this the most common feature. the next two most common features are library contact information and a search function for the catalog, which both received 96.67%. searching within mobile-friendly databases, such as ebscohost mobile, jstor and pubmed, is the next most popular feature, although it trailed a little behind library hours, contact information, and catalog searching at 70%. book renewal received 56.67%, and access to patron accounts received 53.33%. interlibrary loan is the least common feature by far, offered by only 26.67% of respondents. this information is summarized in chart 2 below.

chart 2. components of libraries' mobile sites: search the catalog 96.67%, library contact information 96.67%, search the databases 70.00%, book renewal 56.67%, access to patron accounts 53.33%, interlibrary loan 26.67%.

these results are interesting as, overall, they reflect higher percentages for specific mobile services than question 1 on the survey, which asked which mobile services libraries offer.
for example, in question 1, 88.24% of respondents offer mobile access to the library's catalog, whereas for libraries with mobile sites, 96.67% offer access to the catalog on the mobile site. the ability to search mobile-friendly versions of databases the library subscribes to was almost the same for both groups, with 70.59% of respondents to question 1 offering this and 70% of respondents having this as a component of their mobile sites. mobile book renewal is much more common among libraries with mobile sites: 56.67% of respondents with mobile sites compared to 41.18% of total respondents. a slightly higher percentage of respondents with mobile sites offer mobile interlibrary loan (26.67%) compared to all respondents (20.59%). this data suggests that, on the whole, libraries with mobile sites are more likely to offer other mobile services as well, specifically mobile access to the catalog, mobile book renewal, and mobile interlibrary loan.

what mobile reference services do libraries provide?
the survey also looked for information on virtual and/or mobile reference services. 81.25% of survey respondents offer text/sms messaging, 100% offer chat/im, and 21.88% offer reference services via a social media account. these results showing popular reference services in these top universities are summarized in chart 3 below.

chart 3. popular virtual/mobile reference services: chat/im 100%, text/sms 81%, social media 22%.

chat/im is obviously the most popular method of providing virtual/mobile reference services; all survey respondents offer this service. text/sms is also very popular, indicating that the majority of libraries see value in providing both despite their similar functions. the fact that social media does not compare favorably to either texting or chat/im services is curious, because most social media platforms have a mobile version available that libraries can take advantage of for free. however, this may not be the best medium for reference. one respondent commented on this question, "our 'ask a librarian' service is available from desktop facebook, but not on mobile facebook."

what apps do libraries use or provide for patrons?
although the website visits and survey results indicated that apps for a library's site are not very common, both tools revealed that use of apps for various purposes is widespread. the most commonly mentioned app is browzine, which is used for accessing e-journals. several respondents mentioned apps developed in-house for using library services, such as an app for reserving a study room, accessing university archives, and sending catalog records to a mobile device. another respondent stated that the university's app has a library function.
several respondents mentioned vendor-provided or third-party apps, such as apps for accessing pubmed, sciencedirect, naxos music library, accessmylibrary (for gale resources), a mobile medical dictionary, and the american chemical society. one respondent noted that the library loans ipads preloaded with popular apps to support student research, such as endnote, notability, goodreader, pages, numbers, and keynote, among others. finally, these apps were named at least once as an app libraries either use or provide access to: iresearch (for storing articles locally), boopsie (for building a library mobile app), ebrary (for accessing e-books), and safari (for accessing books and videos online). these results indicate that the use of apps is fairly robust and diverse among these libraries. additionally, from these results, it seems more common for libraries to use and/or provide apps created by third parties than to develop an in-house app, perhaps due to the expertise and expense involved in creating and maintaining an app.

what mobile services will be added in the future?
the final question of the survey asks libraries if there are any plans to offer a mobile service not currently provided. responses are summarized in chart 4 below.

chart 4. percentage of the libraries seeking to add specific mobile services: mobile library instruction 62%, mobile website 46%, mobile interlibrary loan 38%, mobile book renewal 15%, mobile databases 15%, mobile opac 15%, augmented reality 8%, e-books 8%, mobile app(s) 8%, qr codes 0%, text messaging services 0%.

the most common selection is mobile-friendly library instruction, with 61.54%. the next most common is a mobile website (46.15%). mobile interlibrary loan was chosen by 38.46% of respondents. less common services planned include adding mobile access to the library's opac, mobile access to the library's databases, and mobile book renewal, each of which were chosen by 15.38% of respondents. 7.69% of respondents are planning to add mobile apps, e-books, and augmented reality, respectively. no one indicated plans to add text messaging services or qr codes. these results indicate that libraries expect demand for traditional library services in a mobile-friendly format to continue to expand; mobile-friendly library instruction was only offered by 32.35% of respondents, yet 61.54% have plans to offer this service in the future. mobile interlibrary loan is currently offered by 20.59% of respondents, so the fact that 38.46% would like to add it represents a significant change. not surprisingly, mobile websites are likely to remain a very popular mobile service.
the  fact  that   82.35%  of  respondents  already  have  a  mobile  website  and  46.15%  who  do  not  have  one  wish  to   add  one  in  the  near  future  means  that  mobile-­‐friendly  sites  are  well  on  their  way  to  becoming   ubiquitous,  at  least  among  libraries  at  the  top  100  universities,  and  may  reasonably  be  expected  to   take  their  place  among  websites  in  general  as  a  necessity  to  maintain  institutional  viability.   additionally,  several  respondents  mentioned  moving  towards  responsive  design,  in  which  their   websites  are  fully  functional  regardless  of  whether  they  are  accessed  on  mobile  devices  or   desktops.   what  are  challenges  and  strategies  for  offering  mobile  services?   in  addition  to  looking  for  the  presence  or  absence  of  mobile  services  being  offered  at  top  100   university  libraries,  the  survey  also  examined  libraries’  experiences  in  implementing  mobile   services,  including  challenges,  successes,  and  best  practices.  several  themes  emerged  in  response   to  these  questions.  the  most  common  challenge  among  respondents  was  having  the  time,   expertise,  staffing  and  money  to  support  mobile  services,  especially  apps  and  mobile  sites.  to   solve  this  problem,  respondents  mention  relying  on  vendors  and  third-­‐party  providers  supplying   apps  to  access  their  resources,  but  this  does  not  give  libraries  the  flexibility  and  specificity  of  an   in-­‐house  app.     another  common  challenge  mentioned  by  several  respondents  involved  technical  issues,  such  as   difficulties  with  off  campus  access  to  resources  via  a  proxy  server  and  compatibility  issues  among   different  browsers  and  especially  different  devices.  a  lack  of  communication  and/or  support  is   another  issue  for  libraries.  one  respondent  reported  a  lack  of  support  from  the  campus  computing   center  for  mobile  services.  one  respondent  discussed  the  difficulty  of  having  a  coordinated  mobile   effort  when  the  library  has  a  large  number  of  departments,  and  each  department  may  or  may  not   be  aware  of  what  the  others  are  doing  in  regards  to  mobile  services.  survey  results  revealed  that   few  libraries  have  policies  in  place  to  support  mobile  services.     coming  up  with  a  specific  plan  for  implementing  such  services  can  help  libraries  work  towards   promoting  effective  communication  and  garnering  support.  one  respondent  wrote,  “the  biggest   challenges  have  been:  (1)  developing  a  strategy  (2)  developing  a  service  model  (3)  having  a   systematic  model  for  managing  content  for  both  mobile-­‐  and  non-­‐mobile  applications.  we've  had     information  technologies  and  libraries  |  june  2015   145   success  with  the  first  two  and  are  making  great  progress  on  the  third.”  interestingly,  several   respondents  noted  that  underuse  is  an  issue  for  some  services.  one  respondent  mentioned  that   qr  codes  are  not  used  often,  and  another  mentioned  that  the  library’s  text-­‐a-­‐librarian  service  is   much  underutilized.  several  respondents  cited  the  need  to  market  mobile  services  as  an  antidote   to  this  problem.  seeking  regular  feedback  from  the  user  community  regarding  mobile  services   wants  and  needs  is  another  recommended  solution.   
other issues include the fact that not all library services are mobilized. however, libraries are actively looking for solutions to this. there is a trend among respondents towards developing a site that is responsive to all devices, including desktops, laptops, tablets, and phones; this will take the place of a separate mobile site. as one respondent states, "at the moment, our library mobile website only has a fraction of the services available via our desktop website. we are in the process of moving everything to responsive design, with the expectation that all services will be equally available in mobile and desktop." in reading through these responses, one message is clear: mobile services are a must. several respondents noted that demand for mobile services is growing, with one writing, "get started as soon as possible. our analytics show that mobile use is continuing to increase."

conclusion
this study confirms that as of spring 2014 mobile services are already ubiquitous among the country's top 100 universities' libraries and are likely to continue to grow. while the most common services offered are e-books, chat/im, mobile access to databases, mobile access to the library catalog, mobile sites, and text messaging services, there is also a trend towards responsive design for websites so that patrons can access the library's full site on any mobile device.

the experiences of these libraries demonstrate the value of creating a plan for providing mobile services, allotting the appropriate amount of staffing, time, and funding, communicating among departments and stakeholders to coordinate mobile efforts, marketing services, and regularly seeking patron feedback. however, there is no one approach to offering mobile services, and each library must do what works best for its patrons.

references
1. andrew walsh, using mobile technology to deliver library services (maryland: scarecrow press, 2012), xiv.
2. "smartphone ownership 2013," last modified june 5, 2013, http://www.pewinternet.org/2013/06/05/smartphone-ownership-2013/.
3. thomas a. peters, "left to their own devices: the future of reference services on personal, portable information, communication, and entertainment devices," reference librarian 52 (2011): 88-97, doi:10.1080/02763877.2011.520110.
4. acrl research planning and review committee, "top ten trends in academic libraries," college & research libraries news 73 (2012): 311-320.
5. lauren elmore and derek stephens, "the application of qr codes in uk academic libraries," new review of academic librarianship 18 (2012): 26-42, doi:10.1080/13614533.2012.654679.
6. robin canuel and chad crichton, "canadian academic libraries and the mobile web," new library world 112 (2011): 107-120, doi:10.1108/03074801111117014.
7.
aiguo li, "mobile library services in key chinese academic libraries," journal of academic librarianship 39 (2013): 223-226, doi:10.1016/j.acalib.2013.01.009.
8. robin canuel and chad crichton, "canadian academic libraries," 107-120.
9. lisa carlucci thomas, "gone mobile? (mobile libraries survey 2010)," library journal 135 (2010): 30-34.
10. robin canuel and chad crichton, "canadian academic libraries," 107-120.
11. "mobile technology in libraries survey," last modified 2012, http://www.ohsu.edu/xd/education/library/about/staff-directory/upload/mobile_survey_academic_final.pdf.
12. brittany osika and cate kaufman, "'mobilizing' community college libraries," searcher 20 (2012): 36-46.
13. scott la counte, "introduction," in mobile library services: best practices, ed. charles harmon and michael messina (maryland: scarecrow press, 2013), v-vii.
14. fred d. barnhart and jeannette e. pierce, "becoming mobile: reference in the ubiquitous library," journal of library administration 52 (2012): 559-570, doi:10.1080/01930826.2012.707954.
15. mark andy west, arthur w. hafner, and bradley d. faust, "expanding access to library collections and services using small-screen devices," information technology & libraries 25 (2006): 103-107.
16. joan k. lippincott, "mobile technologies, mobile users: implications for academic libraries," arl: a bimonthly report on research library issues & actions 261 (2008): 1-4.
17. walsh, using mobile technology, 58.
18. ibid.
19. "mobile technology in libraries survey."
20. edward iglesias and wittawat meesangnil, "mobile website development: from site to app," bulletin of the american society for information science and technology 38 (2011): 18-23, doi:10.1002/bult.2011.1720380108.
21. joshua bishoff, "going mobile at illinois: a case study," in mobile library services: best practices, ed. charles harmon and michael messina (maryland: scarecrow press, 2013), 107-121.
22. walsh, using mobile technology.
23. helen bischoff, michele ruth, and ben rawlins, "making the library mobile on a shoestring budget," in mobile library services: best practices, ed. charles harmon and michael messina (maryland: scarecrow press, 2013), 43-54.
24. walsh, using mobile technology.
25. ibid.
26. ibid.
27. ibid., 105.
28. ibid., 97.
29. ibid.
30. "go mobile: use these strategies and increase your mobile literacy and your patrons' satisfaction," last modified november 1, 2009, http://libraryconnect.elsevier.com/articles/technology-content/2009-11/go-mobile.
31. walsh, using mobile technology, 45.
32. peters, "left to their own devices."
33. geoffrey little, "keeping moving: smart phone and mobile technologies in the academic library," journal of academic librarianship 37 (2011): 267-269, doi:10.1016/j.acalib.2011.03.004.
34. elmore and stephens, "the application of qr codes."
35.
jim hahn, "mobile augmented reality applications for library services," new library world 113 (2012): 429-438, accessed june 21, 2014, doi:10.1108/03074801211273902.
36. ibid.
37. "wolfwalk: explore nc state history right on your phone," http://www.lib.ncsu.edu/wolfwalk/.
38. ibid.
39. osika and kaufman, "mobilizing community college libraries."
40. kate kosturski and frank skornia, "handheld libraries 101: using mobile technologies in the academic library," computers in libraries 31 (2011): 11-13.
41. "national university rankings," http://colleges.usnews.rankingsandreviews.com/best-colleges/rankings/national-universities/spp+50.

entry/title compression code access to machine readable bibliographic files
william l. newman: systems analyst and programmer, and edwin j. buchinski: assistant to the librarian, systems and planning, university of saskatchewan, saskatoon.

an entry/title compression code is proposed which will fulfill the following requirements at the library, university of saskatchewan: 1) entry/title access to marc tapes; 2) entry/title access to the acquisitions and cataloguing in-process file; and 3) entry/title duplicate order edit within the acquisitions and cataloguing in-process file. the study which produced the code and applications for the code are discussed.

introduction
the determination and design of access points, or keys, to machine readable bibliographic files is a major problem faced by libraries planning computer assisted processing. alphabetic keys, i.e. truncations of title and/or author variable fields, are inadequate, since minor differences in spelling, punctuation, or spacing between master key and request key cause difficulties in accessing records. numeric keys, such as library of congress card numbers, isbn, purchase order numbers, etc., are therefore usually employed for searching machine readable library files. more sophisticated means must be developed in order to maximize the usefulness of these files, since a searcher, even with book in hand, may not be able to provide the numeric key necessary to obtain the book's machine readable data. this problem may be solved through the use of compression codes generated from author/title, or other bibliographic information. studies of compression codes and their performance have been reported by ruecking (1), kilgour (2), and the university of chicago (3). this approach has been endorsed by the library of congress (4) in the recon study. studies at the library, university of saskatchewan, were initiated with the hope of producing a compression code that would provide machine duplicate order edit in the acquisitions and cataloguing in-process file, and retrieve entries, using unverified or verified bibliographic information as input, either from a partially unverified file, such as the acquisitions and cataloguing in-process file, or from an authoritative machine readable data base, such as marc ii. in addition, the desired code would have to minimize the effect of errors in punctuation and spelling in order to achieve a high retrieval percentage, yet produce a low volume of duplicate codes for dissimilar works.
construction of the data base
since june, 1969, the marc data base has been used at the library to generate unit cards which have been used as source data for unit card masters in the cataloguing department. approximately 300 of these were drawn at random. at the same time the original order request forms for these items were searched. of the 300 items, 254 requisition forms were found in the manually maintained acquisitions in-process file. the lc card numbers were used to retrieve the corresponding marc records from the library's history marc tape. an additional 4,128 marc records were placed on the same tape as the 254 marc records for which order request information existed. the lc card numbers, order entry, title, and, if present, date of publication were keypunched from the 254 requisition forms. this bibliographic information formed the data base from which search codes were produced for the acquisitions department records.

code generation
a computer program performed the following modifications on all input data prior to generating the actual compression codes. first, all the lowercase alphabetics were converted to upper-case alphabetics. then all punctuation was eliminated from the title field except for periods and apostrophes within a word. a word compaction routine then eliminated periods from within abbreviations and apostrophes from within words. the entries from the 4,382 marc records and the 254 requisition forms were categorized according to personal name, and corporate or conference name. the first comma delimited the portion of the personal name to be used in the compression routine. spaces, diacritics, periods, apostrophes, and hyphens were all eliminated from the personal name.

the first two codes used in the project were labelled imaginatively code type 1 and code type 2, where code type 1 was a slight modification of the code developed by frederick h. ruecking (1). code type 2 was based on a modified university of chicago experimental search code (3), incorporating ideas from some of ruecking's studies.

code type 1
title compression (16 characters): see ruecking (1) for the rules which were used to construct the four-character compressions.
entry compression (12 characters): three four-character compressions were used for corporate or conference names instead of ruecking's four. one four-character compression was produced for personal names.
date of publication (3 characters): if the year of publication was available, the last three digits were used; otherwise, the date was left blank.
the total length of code type 1 is 31 characters.

code type 2
title compression (6 characters):
1) "a", "an", "the", "and", "by", "if", "in", "of", "on", "to" were deleted from the title.
2) the first word containing two consonants was located and the first two consonants appearing in the word were used for the search code.
3) step 2 was repeated with a second and third word of the short title, whenever these were available.
4) if three words with two consonants were not available, the balance of the six characters needed for the code were supplied by those characters immediately after the last character used (except for blanks).
entry compression (6 characters):
a) personal name.
1) only the surname, or the forename if there was no surname.
2) if the name had six or fewer characters, the entire name was used.
otherwise, vowels were deleted from the name (working backwards on the name) until the six-character compression was formed, or the second consonant was located.
3) if the six-character compression was not formed by step 2, then the first four characters and the last two characters were used for the six-character compression.
b) corporate and conference entries: the rules for title compression to form the six-character code were followed.
date of publication (3 characters): the last three digits of the date of publication, as in code type 1, were used.
in either of the codes, if the title was the main entry, a code was generated with the entry field blank.

examples of code generation
title: factors in the transfer of technology.
entries: 1) m.i.t. conference on the human factor in the transfer of technology, endicott house, 1966. 2) gruber, william h. 3) marquis, donald george. 4) massachusetts institute of technology.
date of publication: 1969.

code type 1 compressions (b denotes a blank position; each code is shown with the date and with the date position blank):
1) FACTTRSFTCHNbbbbMITbCOFRHUMb969 / FACTTRSFTCHNbbbbMITbCOFRHUMbbbb
2) FACTTRSFTCHNbbbbGRBRbbbbbbbb969 / FACTTRSFTCHNbbbbGRBRbbbbbbbbbbb
3) FACTTRSFTCHNbbbbMAQSbbbbbbbb969 / FACTTRSFTCHNbbbbMAQSbbbbbbbbbbb
4) FACTTRSFTCHNbbbbMATTINTTTCHN969 / FACTTRSFTCHNbbbbMATTINTTTCHNbbb

code type 2 compressions:
1) FCTRTCMTCNHM969 / FCTRTCMTCNHMbbb
2) FCTRTCGRUBER969 / FCTRTCGRUBERbbb
3) FCTRTCMARQUS969 / FCTRTCMARQUSbbb
4) FCTRTCMSNSTC969 / FCTRTCMSNSTCbbb

procedure and results
the two types of codes were generated from the 4,382 marc records using publication date, short title, main entry, and added entries. another program was written to generate codes from the acquisitions department data on cards and to write them on a separate tape using publication date if available, entry, and the first four significant words of the title and/or the words of the title up to the first punctuation mark. the two tapes containing codes were sorted in ascending code sequence, then compared. if the code generated from the acquisitions data, hereafter called the unverified code, was exactly the same as the code generated from marc tape, hereafter called the verified code, the codes and corresponding lc card numbers were printed as a hit.
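the following is a minimal python sketch, not the authors' pl/i and cobol programs, of how the code type 2 construction and the tape comparison just described could be approximated; the function names are the sketch's own, and several rules (diacritic handling, the "second consonant" stopping condition, the padding step) are simplified.

import re

STOPWORDS = {"a", "an", "the", "and", "by", "if", "in", "of", "on", "to"}
VOWELS = set("aeiou")

def normalize(text):
    # keep only letters, digits and spaces (a rough stand-in for the
    # punctuation stripping and word compaction described above)
    return re.sub(r"[^a-z0-9 ]", "", text.lower())

def two_consonants(word):
    return "".join(c for c in word if c not in VOWELS)[:2]

def title_code(title):
    # code type 2 title compression (simplified): drop stop words, then take
    # the first two consonants of up to three qualifying words
    words = [w for w in normalize(title).split() if w not in STOPWORDS]
    parts = [two_consonants(w) for w in words if len(two_consonants(w)) == 2]
    code = "".join(parts[:3])
    leftover = "".join(words)          # crude padding source if < 3 words
    return (code + leftover)[:6].ljust(6).upper()

def personal_entry_code(name):
    surname = normalize(name.split(",")[0]).replace(" ", "")
    if len(surname) <= 6:
        return surname.ljust(6).upper()
    chars = list(surname)
    # delete vowels working backwards until six characters remain
    for i in range(len(chars) - 1, -1, -1):
        if len(chars) <= 6:
            break
        if chars[i] in VOWELS:
            del chars[i]
    s = "".join(chars)
    return (s[:4] + s[-2:]).upper() if len(s) > 6 else s.ljust(6).upper()

def code_type_2(title, entry=None, personal=True, year=None):
    entry_part = ("".ljust(6) if entry is None
                  else personal_entry_code(entry) if personal
                  else title_code(entry))
    date_part = str(year)[-3:] if year else "   "
    return title_code(title) + entry_part + date_part

def match(unverified, verified):
    # both arguments map code -> lc card number; identical codes with the same
    # lc number count as retrievals, identical codes with different numbers
    # are false drops
    hits = set(unverified) & set(verified)
    retrievals = [c for c in hits if unverified[c] == verified[c]]
    false_drops = [c for c in hits if unverified[c] != verified[c]]
    return retrievals, false_drops

if __name__ == "__main__":
    # reproduces the shape of the published example: FCTRTCGRUBER969
    print(code_type_2("factors in the transfer of technology",
                      "gruber, william h.", personal=True, year=1969))

under these simplifications the sketch yields the same six-character title and entry compressions as the worked example above, which is the essential property the comparison step relies on.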
as an immediate consequence of the analysis of tables 1 and 2, the publication date was eliminated from the codes and the comparison program rerun producing the results given in table 3. table 1. code performance retrievals false drops code type 1 200 0 code type 2 206 0 table 2. non-retrieval a nalysis numbe r of no n-retrievals error type a code type 1 b c d e table 3. code performance rett·ievals code type 1a code type 2a 220 226 9 10 8 7 20 false drops 0 0 percent retrieval 78.74 81.10 code type 2 9 7 8 4 20 percent retrieval 86.61 88.98 no duplicate codes existed within the unverified code tape. from the 4,382 marc records, 6,828 cod es were produced for each of code type 1a entry/title compression code /n ewman and buchinski 77 and code type 2a. works having the same author and title, but different imprint, were not considered duplicates even though the program listed them as such. seven duplicates, one triplicate and one quadruplicate occurred in code type 1a; and eight duplicates, two triplicates and two quadruplicates in code type 2a. government publications were responsible for all but one of the duplicate codes. code type 2b a graph of the number of duplicate codes vs. the number of source records was drawn for code type 1a and code type 2a (fig. 1 ). as a result of this graph code type 2b was proposed. this code employed the same rules for construction as code type 2a, except that four significant words from the title and four significant words from corporate or conference entries were used to generate the compression. the total length of code type 2b is thus sixteen characters. six duplicates, one triplicate and one quadruplicate appeared when the comparison program was run using code type 2b. figure 1 is a graph of the result. ~ < (.) 15 ~ 10 rx. 0 5 looo 2000 3000 number of marc records 4000 2a la 2b fig. 1. numb er of duplicates vs number of source records for code types la, 2a and 2b. 78 journal of library automation vol. 4/2 june, 1971 the performance of code type 2b is summarized in tables 4 and 5. table 4. code performance retrievals code type 2b 223 table 5. non-retrieval analysis error type applications marc tapes a b c d f al:>e drops 0 percent retrieval 87.80 number of nonretrievals code type 2b 9 10 8 4 a marc code tape was recently created and is being maintained at the university of saskatchewan, as flowcharted in figure 2. each record on the tape consists of a compression code and an lc card number. approximately 100,000 entry /title keys, plus series statement and sbn keys, have been created from the 65,000 records on the current marc history tape. figure 3 illustrates how these access points are used to provide unit card printouts. figure 4 shows a sample output from the matching step in figure 3. this printout indicates the results of the search, and serves as a link between the request and the catalog card printed from the marc tape. in the printout, entry / title requests that have found more than one lc card number do not necessarily indicate a false drop. so far, these multiple finds have resulted from the same publications appearing on marc with different imprints. it is a simple matter to select the catalog card with the appropriate imprint. the discrepancy in table 6 between marc records found and titles verified is due to the above, and to multiple hits on a single record when requests for that record were submitted in more than one form, i.e. s.b.n. and author/ title. 
table 6 presents a summary of the results of submitting unverified requests over a four-week period against the marc code tape. during that time, 563 english language monographs with potential 1969 and 1970 imprints were searched. desired marc records were found for 184 titles, or 32.7% of these requests. the source data for the requests was supplied from title-pages and order recommendations. this data was not verified because the compression code access technique partially solves the problem of non-retrieval due to human errors in the submission. entry/ title compression code / newman and buchinski 79 i i i new new update codes codes .., i i i i i r-update i i i i i i i occasi onal -----1 i i i i i i i i i old i i split codes i i i i i update r _____ ., __ .j fig. 2. marc tape processing. 80 journal of library automation vol. 4/ 2 june, 1971 generate codes lc card number ~uests cataloo cards add to fi le codes fig. 3. marc a ccess programs. table 6. marc ret1·ieval form of request author/ corporate title title author / title number of requests marc records 546 29 130 found 173 2 11 titles verified 148 2 6 false drops 0 0 2 sbn 139 36 20 0 codes series total 11 855 18 240 8 184 6 8 entry/title compression code/ newman and buchinski 81 l i 8uit't • fntii:yifl'lf ~fou£""s"fsfofl: "'l ui(. cll.uog (.uos ii',u;t; ) m ilic i'oi !io ~oj l ii hf:~ mimu ages i~ ~nglan o 1216 1185 shah ~neujl l u!)jol of uniyfr'i i tt tf,i,c.hinc fltf frlll( lu~ i n i)jt: puge ino l i ii.fnf'>s ullll~ius 1 1'11 'lufl: f ii i.gf •no llltf'-f~s huttani!y l'«) qivin1f'l' 1111 ft.ll111,. ......... 0 .. 11-au"s nf south iii fst sariwu !holf 'tal 4y su fkcloskf't' .. (f4 fh41cs and noit "'a!..!y.lfthi n tc.f fl' , in i'lu') f ll fa l i lat i on in joii netfhhh cfnt~'f fu fl:tlp! l(f flt il' 1ndlls ta.iili14tion in l 9_th cfn tl,. y euii.o pf valf f:ncll sh cas(.o "'v val( fhclish g.t.sco"'y llot't 14h huit l "if'/n fncl i sh ho"'f lt't'ltfs a"tf".l,o saj:c)til "0t1'fity l>t_!l_ httlf mfnt nf f "'gl&no ')ii'~(. i( , ,.. 1."10"ty"100s ii'utianf:nu'ty d!aity 10 seat:ch coof lc caiii.o numiuu i ·=-------------------cu tht ogs nciil l ks n no lc caito nu"' iuiis found c u i'igliihi4dvu.in1ts 10460121 s'ec~t cu nnp~oiiihsp£(11( no lt co\11:0 hu~ 8fiii:5 fou !ro'd &"f &'ii"'ny iii()j$ paiili&mh11.!.!3!_qj'.lllty 17.0"'-'-''------------l(l 'lr. anhii:c tic jo'inv)n tntiiioplltj ion to \ovih lfg&l. sy\_ffl't 'u..neii lt(lt 1fic5 o\no "'ll. l 1 'tu10'1h co .. i'u'f c u l'l" lnfc mtuit nflt fifo lc cuo nu'!&u5 founo cu i' m5 u.fltn no lt caito nu muu found fig. 4. results of entry / title search for marc unit card printouts. manual searching of the nuc catalogs was employed to verify titles that could not be located on the marc tape. ten titles were found with marc notations after failing to be retrieved by compression code matching. type a and type c errors were primarily responsible for this non-retrieval. however, two of these titles could not be retrieved from the marc tape following manual verification, since the verified entries in the nuc preceded their counterparts on the marc tape. thus the performance of the compression codes can be evaluated as 184 of a possible 192 hits, or a 95.8% retrieval rate. during the four-week period the keypunchers formul ated 52% more entry / title requests than there were titles for verification. 
this is due mainly to the need for submitting more than one author/ title request whenever the portion of the title which comprises the short title is in doubt, since the code is formulat ed from the short title only. additional experience should decrease the number of redundant requests. only 8 false drops have been received in the above submissions. retrieval of series entries is likely to engender the greatest number of false drops because series statements are treated as titles in the code generation procedure. acquisitions and cataloguing during the past two years, the technical services department at the 82 journal of library automation vol. 4/2 june, 1971 library and the computation centre have designed, and are currently testing tesa i (technical services automation-phase i), an automated acquisition and cataloguing system ( 5), the primary objectives of which were to pursue a total library system concept and to provide for conversion from a batch system to an on-line operation when sufficient computer facilities become available. at the same time that work proceeded towards these objectives, status codes and receiving reports were employed as used in washington state university's lola system (6) and (7). however, marc tapes and compression codes comprise an integral part of the system. if a marc record can be located before an order is entered, a tremendous amount of keying is saved. one 64-character in-process transaction will supply the ordering information and transfer the bibliographic data from the current marc history tape to the direct access acquisitions and cataloguing in-process file (ibm's basic direct access method). minimal cataloguing updates are necessary before catalog card sets can be produced. entry /title access ensures that only a small percentage of needed marc records will slip through tesa i's fingers at order initiation time. another code application as illustrated in figure 5 will exploit the fact that the same code construction rules are used in the marc system as in tesa i. items requiring bibliographic information will be flagged in the in-process file. when a new marc tape arrives, the in-process code file (ibm's index sequential access method) will be automatically matched with the marc codes created from the new weekly tape. a sample printout from these matches is provided in figure 6. after verifying which marc records are needed, marc bibliographic information will be transferred to the appropriate in-process records. each record in the direct-access isam compression code file consists of a compression code (or sbn or lc card number) and the key ( purchase order number) to the corresponding in-process record. a threaded list structure exists within the in-process file to handle the possibility of one code accessing several items. thus an in-process record may be directly accessed by entry /title, series statement, sbn, lc card number or purchase order number. a fast edit routine built into the direct-access write detects whether or not the compression code about to be written is a duplicate of a code already in the file. if the code is unique the code record is written on disc and a single item list is created within the corresponding in-process record. if the code is not unique, the code record cannot be written. in this case the list structure for the code is updated to include the key of the in-process record being added. 
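the duplicate check described above can be pictured in miniature. in the sketch below the two dictionaries stand in for the direct-access isam code file and the in-process file; the field names, purchase order numbers and example code are illustrative only, not tesa i's actual record layouts. the purchase order numbers returned by the routine are what would feed the warning described next.

```python
# a simplified sketch of writing a compression code to the code file.  a unique code
# is written and anchors a single-item list; a duplicate code is not written a second
# time, and the new item is instead threaded onto the list running through the
# in-process records.  structures and names are illustrative, not tesa i's own.

code_file = {}      # compression code -> purchase order number of first item with that code
in_process = {}     # purchase order number -> in-process record

def add_in_process_item(po_number, code, record):
    """file an in-process record and write (or thread) its compression code."""
    in_process[po_number] = dict(record, next_po=None)      # single-item list by default
    if code not in code_file:
        code_file[code] = po_number                         # unique code: write the code record
        return []
    duplicates, po = [], code_file[code]                    # duplicate: walk the threaded list
    while po is not None:
        duplicates.append(po)
        if in_process[po]["next_po"] is None:
            in_process[po]["next_po"] = po_number           # thread the new item onto the end
            break
        po = in_process[po]["next_po"]
    return duplicates            # existing purchase orders that may duplicate the new item

add_in_process_item("P-1001", "corerema########", {"title": "conversion of retrospective records"})
print(add_in_process_item("P-1042", "corerema########", {"title": "conversion of retrospective records"}))
# ['P-1001'] -> potential duplicates to print in the warning to the acquisitions staff
```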
a message, together with the purchase order numbers of items which may be duplicates, is printed to warn the acquisitions staff that a potential duplicate is being added to the in-process file. traditional duplicate checking of in-process items thus becomes an exception.

[fig. 5. search of weekly marc tape for records needed in the in-process file (flowchart).]

[fig. 6. in-process items for which bibliographic information may exist on the newly arrived marc tape (sample printout listing item number, the lc card number it may correspond to, author, and title).]

remote access to marc

an experiment was conducted in which entry/title requests were submitted from the ibm s360/40 computer at the university of saskatchewan, regina campus, computer centre, over a communication link to the saskatoon campus ibm s360/50 computer. the marc access program was read into the regina computer, sent to saskatoon's computer, spooled in the saskatoon job queue and executed; then the results of the search were sent to regina to be printed. the entire process took approximately the same time as if the program had actually been executed in regina. no data transmission errors were encountered in transmitting either the requests or the retrieved marc unit cards over this 150-mile communication link.

conclusion

there is an inverse relationship between retrieval performance and number of duplicate codes produced. a high retrieval code such as code type 2a results in more duplicates than a code such as ruecking's, which has a slightly lower retrieval performance. code type 2b fulfills the requirements for a code short in length and easy to construct that produces a low number of duplicates and has high retrieval capability. for an index to a library holdings file, or to a national data base, a code such as ruecking's, with four or more significant words from title and corporate or conference entries, and with different rules for personal author compression, would perhaps be suitable.

acknowledgments

the authors thank the library staff for their assistance in the study. they are also grateful to the library and computation centre administrations, in particular, d. c. appelt, g. c. burgis, and n. e. glassel for the allotment of computer time and their encouragement.

references

1. ruecking, frederick h. jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-238.
2. kilgour, frederick g.: "retrieval of single entries from a computerized library catalog." in american society for information science, annual meeting, columbus, o., 20-24 oct. 1968: proceedings, 5 (1968), 133-136.
3.
"university of chicago experimental search code." in avram, henriette d.; knapp, john f.; rather, lucia j.: the marc ii format: a communications format for bibliographic data (washington, d.c., library of congress, 1968), pp. 129-131. 4. "computer requirements for a national bibliographic service." in recon working task force: conversion of retrospective records to machine-readable form (washington, d.c.: library of congress, 1969), pp. 183-226. entry/title compression codejnewman and buchinski 85 5. newman, w. l.: technical services automation-phase i acquisitions and cataloguing (computation centre, university of saskatchewan, saskatoon, november, 1969) , mimeographed. 6. burgess, t.; ames, l.: lola; library on-line acquisitions sub-system (pullman, wash.: washington state university library, july, 1968 ). 7. mitchell, patrick c.: lola, library on-line acquisitions sub-system, (washington state university, june, 1969) unpublished. metadata provenance and vulnerability timothy robert hart and denise de vries information technology and libraries | december 2017 24 timothy robert hart (tim.hart@flinders.edu.au) is phd researcher and denise de vries (denise.devries@flinders.edu.au) is lecturer of computer science, college of science and engineering, flinders university, adelaide, australia. abstract the preservation of digital objects has become an urgent task in recent years as it has been realised that digital media have a short life span. the pace of technological change makes accessing these media increasingly difficult. digital preservation is primarily accomplished by main methods, migration and emulation. migration has been proven to be a lossy method for many types of digital objects. emulation is much more complex; however, it allows preserved digital objects to be rendered in their original format, which is especially important for complex types such as those comprising multiple dynamic files. both methods rely on good metadata to maintain change history or construct an accurate representation of the required system environment. in this paper, we present our findings that show the vulnerability of metadata and how easily they can be lost and corrupted by everyday use. furthermore, this paper aspires to raise awareness and to emphasise the necessity of caution and expertise when handling digital data by highlighting the importance of provenance metadata. introduction unesco recognised digital heritage in its “charter on the preservation of digital heritage,” adopted in 2003, stating, “the digital heritage consists of unique resources of human knowledge and expression. it embraces cultural, educational, scientific and administrative resources, as well as technical, legal, medical and other kinds of information created digitally, or converted into digital form from existing analogue resources. where resources are ‘born digital’, there is no other format but the digital object.” 1 born-digital objects are at risk of degradation, corruption, loss of data, and becoming inaccessible. we combat this through digital preservation to ensure they remain accessible and useable. the two main approaches to preservation are migration and emulation. migration involves migrating digital objects to a different and currently supported file type. emulation involves replicating a digital environment in which the digital object can be accessed in its original format. both methods have advantages and disadvantages. 
migration is the more common method because it is simpler than emulation and the risks can often be neglected. these risks include potential data loss or change, in which the effects are permanent. emulation is complex, but it offers the better means to access preserved objects, especially complex file types comprising multiple dynamic files that must be constructed correctly. emulation also allows users to handle digital objects as closely to the “look and feel” as originally intended. 2 mailto:tim.hart@flinders.edu.au mailto:denise.devries@flinders.edu.au metadata provenance and vulnerability | hart and de vries 25 https://doi.org/10.6017/ital.v36i4.10146 accurate and complete metadata is central to both migration and emulation; thus, it is the focus of this paper. metadata are needed to record the migration history of a digital object and to record contextual information. they are also necessary to accurately render digital objects in emulated environments. emulated environments are designed around a digital object’s dependencies , which typically include, but are not limited to, drivers, software, and hardware. 3 the metadata describe the attributes of the digital object from which we can derive the type of system in which it can run (e.g., the operating system), the versions of any software dependencies, and other criteria that are crucial for accurate creation of an emulated environment. while metadata are being used to support the preservation of digital objects, there is another equally important role it should be playing. it is not enough to preserve the object so it can be accessed and used in the future. what of the history and provenance of the digital object? what about search and retrieval functionality within the archive or repository the digital object is held in? one must consider how these preserved objects will be used in the future, and by whom. preserving digital objects is difficult if adequate metadata is not present, especially if the item is outdated and no longer supported. looking to the future, we should try to ensure metadata are processed correctly for the lifecycle of the digital object. this means care must be taken at the time of creation and curation of any digital objects because although some metadata are typically generated automatically, many elements that will play a pivotal role later must be created manually. digital objects also commonly go through many changes, which is something that must be captured, as the change history will reveal what has happened to the object over of its lifecycle. the changes may include how the object has been modified, migrations to different formats, and what software created or changed the object—all of which is considered when emulating an appropriate environment. examples of these changes can be found in case studies presented in the paper. metadata types the common and more widely used metadata types include, but are not restricted to, administrative, descriptive, structural, technical, transformative, and preservation metadata. each metadata type describes a unique set of characteristics for digital objects. administrative metadata include information on permissions as well as how and when an object was created. transformative metadata includes logs of events that have led to changes to a digital object. 4 structural metadata describe the internal structure of an object and any relationships between components. 
technical metadata describe the digital object with attributes such as height, weight, format, and other technical details. 5 preservation metadata support digital preservation by maintaining authenticity, identity, renderability, understandability, and viability. they are not bound to any one category as they comprise multiple types of metadata, not including descriptive or contextual metadata. however, unlike the common metadata types, preservation metadata are unique from the other metadata types and are often ambiguous. 6 in 2012, the developers of version 2.2 of the premis data dictionary for preservation metadata saw descriptive metadata as less crucial for preserving digital objects; however, they did state it was important for discovery and decision making. 7 while version 2.2 allowed descriptive information technology and libraries | december 2017 26 metadata to be handled externally through existing standards such as dublin core, the latest version (2017) of the dictionary allows for “intellectual entities” to be created within premis that can capture descriptive metadata. 8 thus, while digital preservation does not require all types of metadata, the absence of contextual metadata limits the future possibilities for the preserved object. hart writes that because the multimedia objects are dynamic and interactive, and often composed of multiple image, audio, video, and software files, descriptive metadata are increasingly important because they can be used to describe, organise, and package the files. 9 it is also stressed that content description is of great importance because digital objects are not self-describing, which makes identifying semantic-level content difficult; without description metadata, context is lost. 10 for example, without description metadata to provide context, an image’s subject information and search and retrieval functionality is lost. without this information, verifying whether an object is the original, a copy, or a fabricated or fraudulent item is impossible in most cases. metadata vulnerability—case studies digital objects that are currently being created often go through several modifications, making it difficult to identify the original or authentic copy of the object. verifying and validating authenticity is important for preserving, conserving, and archiving objects. the digital preservation coalition defines authenticity as the digital material is what it purports to be. in the case of electronic records, it refers to the trustworthiness of the electronic record as a record. in the case of “born digital” and digitised materials, it refers to the fact that whatever is being cited is the same as it was when it was first created unless the accompanying metadata indicates any changes. confidence in the authenticity of digital materials over time is particularly crucial owing to the ease with which alterations can be made. 11 tests were undertaken to discover how vulnerable metadata can be in digital files that are subject to change, which can lead to loss, addition, and modification. the tests were conducted using the file types jpeg, pdf, and docx (word 2007). the tests revealed what metadata can be extracted and what metadata could be present in the selected file types. furthermore, they revealed how specific metadata can verify and validate the authenticity of a file such as an image. for each test, the metadata were extracted using exiftool (http://owl.phy.queensu.ca/~phil/exiftool/). 
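the extraction step itself is easy to reproduce. the sketch below shows one way to call exiftool and load its output into a dictionary, using exiftool's json output switch; the file name is a placeholder, and exiftool must be installed separately.

```python
# a minimal sketch of the extraction step used in the case studies: run exiftool with
# its json output option and read the result back as a {tag: value} dictionary.
import json
import subprocess

def extract_metadata(path):
    """return exiftool's metadata for one file as a dictionary."""
    out = subprocess.run(["exiftool", "-j", path],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)[0]     # "-j" emits a json array with one object per file

original = extract_metadata("original.jpg")          # placeholder file name
print(len(original), "metadata elements extracted")
```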
alternative browser-based tools were tested and provided similar results; however, exiftool was selected as the primary testing tool because it produced the best results and had the best functionality. some of the files tested provided extensive sets of metadata that are too large to include, but subsets can be found in hart (2009). note that only subsets are included because some metadata was removed for privacy and relevance reasons. the process and method for each test was conducted in the following manner: http://owl.phy.queensu.ca/~phil/exiftool/ metadata provenance and vulnerability | hart and de vries 27 https://doi.org/10.6017/ital.v36i4.10146 • case study 1—jpeg o original metadata extracted for comparison o image copied, metadata extracted from copy and examined for changes o file uploaded to social media, downloaded from social media, extracted and examined against original • case study 2—jpeg (modified) o original metadata extracted for comparison o image opened and modified in photo editing software (adobe photoshop), metadata extracted from new version and examined against original • case study 3—pdf o basic metadata extraction performed to establish what metadata are typically found in pdf files and what types of metadata could be possible • case study 4—docx o original metadata extracted for comparison o file saved as pdf through microsoft word and metadata compared to original o file converted to pdf through adobe acrobat and metadata compared to original case study 1 this case study investigated the everyday use of digital files, the first being simply copying a file. it was revealed that copying a file creates an exact copy of the original file and no changes in metadata aside from the creation and modification time/date. thus, the copy could not be identified against the original unless the original creation time/date was known. the second everyday use was uploading an image to facebook. the metadata-extraction tests revealed that the original file had approximately 265 metadata elements. (the approximation is caused by the ambiguity of certain elements that may be read as singular or multiple entries.) these elements included, but were not limited to, the following: • dates • technical metadata • creator/author information • color data • image attributes • creation-tool information • camera data • change • software history many of the metadata elements had useful information for a range of situations. even so, several metadata elements were missing that would require a user input for creation. once the file had been uploaded to and then downloaded from social media, approximately 203 metadata elements were lost, included date, color, creation-tool information, camera data, change, and software history. it can be argued that removing some of this metadata would help keep user information private, but certain metadata should be retained, such as change and software history. these information technology and libraries | december 2017 28 metadata make it easier to differentiate fabricated images from authentic images and to know which modifications have been made to a file. for preservation purposes, the missing metadata is what may be needed to provide authenticity. this case study aims to make users aware of the significant risk of metadata loss when dealing with digital objects. if metadata are not identified and captured before the object is processed within a repository, the loss could be irreversible. 
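the comparison underlying this case study can be expressed as a simple set difference over the dictionaries returned by the extraction sketch above; the file names below are placeholders for the original image and the copy downloaded from social media.

```python
# compare two exiftool dictionaries and report which elements were lost and which changed.

def compare_metadata(before, after):
    """return (lost tags, changed tags) between two metadata dictionaries."""
    lost = sorted(set(before) - set(after))
    changed = sorted(t for t in set(before) & set(after) if before[t] != after[t])
    return lost, changed

original = extract_metadata("original.jpg")                    # from the sketch above
downloaded = extract_metadata("downloaded_from_facebook.jpg")  # placeholder file name
lost, changed = compare_metadata(original, downloaded)
print(f"{len(lost)} elements lost, {len(changed)} elements changed")
print("sample of lost elements:", lost[:10])
```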
case study 2 the second case study revealed how the change and software history metadata can be used to easily identify when a file has been modified. in the test conducted, it was evident by visually comparing the images that changes were made; however, modifications are not always obvious as some changes can be subtle, such as moving an element in the image that completely changes what the image is conveying. the following example displays the change history from the image used in case study 1, revealing how the metadata can easily identify modification: • history action—saved, saved, saved, saved, converted, derived, saved • history when—the first saved was at 2010:02:11 21:59:05, the last saved was at 2010:02:11 22:12:01 with each action having its own timestamp • history software agent—adobe photoshop cs4 windows for each action • history parameters—converted from tiff to jpeg further testing was conducted with simple photo manipulation using an original image to see firsthand the issues described in the initial test. the image contained approximately 178 metadata elements, including the typical metadata that were found in the first case study. once the image was processed and modified with adobe photoshop cs5, the metadata were no longer identical. the modified image had approximately 201 metadata elements. the new elements included photoshop-specific data, change, and software history. however, extensive camera data were lost. it can be argued that the camera data are not important for digital preservation because the lack of it will not hinder the preservation process. however, once the file is preserved and those data are lost, important technical and descriptive information can never be regained. for example, consider a spectacular digital image that captures an important moment in history. if that image is preserved for twenty years, in that time cameras and perhaps photography itself will have advanced dramatically. how digital images are captured and processed might be completely different and will most likely provide different results. should someone wish to know how that preserved image was captured, they would need to know what camera was used, lens and shutter speed data, lighting data, and other technical information. preserving those metadata can be almost as important as preserving the file itself because each metadata element has importance and meaning to someone. as most viewers of online media are aware, photos are often modified, especially on social media. this is often performed on “selfies,” pictures taken of oneself. these can be modified to make the person in the photo look better or to hide features they see as flawed. small modifications, such as covering some blemishes or improving the lighting have little effect on the image’s context, but some modifications and manipulations that can mislead people. these manipulated images often metadata provenance and vulnerability | hart and de vries 29 https://doi.org/10.6017/ital.v36i4.10146 take the form of viral hoax images circulating around the web. for example, figure 1 displays how two images can be combined into a composite image that changes the context of the image. figure 1. composite image. “photo tampering throughout history,” fourandsix technologies, 2003, http://pth.izitru.com/2003_04_00.html. the two images side by side are original photos taken in basra of a british soldier gesturing to iraqi civilians to take cover. 
in the right image, the iraqi man is holding a child and seeking help from the solider; as you can see, this soldier does not interpret this as a hostile act. the image above is a composite of the two that changes the story. in this image, the soldier appears to be responding with hostility toward the man approaching. with basic photo manipulation, this soldier who is protecting innocent civilians is portrayed holding them against their will. images like this circulate through media of all types, and although the exchangeable image file format (exif) metadata may not identify what has been done to the image, it would eliminate any doubt that the image has been modified. unfortunately, these data are not made available. making users aware of this vulnerability may improve detection of file manipulation at the time of ingest to better ensure only accurate and authentic material is being considered for preservation. donations received by digital repositories such as libraries must be scrutinised by trained individuals. with this awareness and knowledge of metadata, they can perform their duties to a much higher standard. case study 3 the pdf metadata extraction provided interesting results. over a range of tests on academic research papers, the main metadata identified consisted of pdf version, author, creator, creation date, modification date, and xmp (adobe extensible metadata platform) data. these metadata http://pth.izitru.com/2003_04_00.html information technology and libraries | december 2017 30 were not present in every pdf tested; in fact, the majority of pdf files seemed to be lacking important metadata. the author and creator fields were generally listed as “administrator” or “user” and bibliographic metadata was usually missing. however, pdf openly supports xmp embedding, therefore, bibliographic metadata could be embedded into the pdf. through further testing, bibliographic metadata linked to the pdfs were discovered stored in online databases. bibliographic software such as endnote and zotero allow metadata extraction, which enables users to import pdf files and automatically generate the appropriate bibliographic metadata. for example, zotero performs this extraction by first searching for a match for the pdf on google scholar. if this search does not return a match, zotero uses the embedded digital object identifier (doi) to perform the match. this method is not consistent: it often fails to retrieve any data, and in rare cases it retrieves the wrong data, which leads to incorrect references. given what we saw happen to metadata when a file is uploaded such as in case study 1 and the nature of a pdf’s journey through template selection, editing, and publishing, it is no surprise that metadata are lost or diluted along the way. case study 4 the fourth case study conducted on docx files provided an extensive set of metadata, some of which are unique to this file type. creating a new word document via the file explorer context menu and attempting to extract metadata resulted in an error as there were no readable metadata to extract until the file was accessed and saved. once the file had some user input and was saved, the metadata were created and could be extracted. microsoft office files contain external xml files that holds information about the document, such as formatting data, user information, edit history, and information about the document’s page count, word count, etc. picture a docx file as an uncompressed directory. 
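for pdf files, the document-information and xmp fields discussed in this case study can also be inspected directly from a script. the sketch below uses the pypdf package purely for illustration (it is not the tool used in the study), and the file name is a placeholder; on many pdfs the author and creator fields come back as generic values such as "administrator".

```python
# a small sketch of reading a pdf's document-information dictionary and checking for
# embedded xmp metadata.  pypdf is an illustrative choice of library, not the study's tool.
from pypdf import PdfReader

reader = PdfReader("paper.pdf")          # placeholder file name
info = reader.metadata or {}
print("author: ", info.get("/Author"))   # frequently just "administrator" or "user"
print("creator:", info.get("/Creator"))
print("created:", info.get("/CreationDate"))
print("xmp metadata present:", reader.xmp_metadata is not None)
```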
however, using exiftool on the docx file allowed retrieval of the metadata from all the hidden files. the metadata included creation, modification, and edit information, such as number of edits and total edit time. every element within the document (e.g., text, images, tables, etc.) has its own metadata attached that are crucial for preserving the format of the document. the next step in the test involved converting the docx file into pdf using the following two methods: (1) converting the document via the “publish” save option within microsoft word; and (2) “right clicking” the document and selecting the option to convert to an adobe pdf. the results of the two methods varied slightly. method 1 stripped all the metadata from the document and generated only default pdf metadata consisting of system metadata (file size, date, time, permissions) and the pdf version, author details, and document details. method two behaved the same way except that some xmp metadata were created. both methods resulted in no informative metadata remaining as the majority of the xmp elements were empty fields or contained generic values such as the computer name as the author. all formatting and metadata unique to microsoft word was lost. this case study is an enlightening example of what can happen to metadata when a file is changed from one format to another. metadata provenance and vulnerability | hart and de vries 31 https://doi.org/10.6017/ital.v36i4.10146 human intervention the human element is a requirement in digital preservation as certain metadata, such as descriptive and administrative metadata, can only be created by humans. in fact, as hart notes, user input is needed to record the majority of the digital preservation metadata. 12 the process can be tedious, as described by wheatley. 13 one of the examples described included following the processes in a repository from ingest to access, beginning with the creation of metadata and the managerial tasks that are necessary. these tasks include using extraction tools and automation where possible. using frameworks to record changes to metadata is required, and in some cases metadata must be stored externally to their digital objects. this allows multiple objects of the same type to utilise a generic set of metadata to avoid redundant data. however, although using a generic metadata set is convenient, a large collection of digital objects could be affected if the metadata is lost or damaged. the human element increases the risk of error drastically because there are numerous steps to metadata creation. misconduct is also possible. therefore, the less digital preservation is reliant on humans (and the easier the tasks are that require human input), the better. this can only be achieved by automating most process and training people to ensure they handle their responsibilities accurately, consistently, and completely. learning the results from the case studies like those described in this paper will better prepare users working with digital objects. discussion to achieve the most authentic, consistent, and complete digital preservation, institutions must revise their preservation workflows and processes. this entails ensuring the initial processes within workflows are correct before processing digital content. the content must come from a credible source and have its authenticity approved. participation from the donor of the digital content might be beneficial if they can provide information and metadata about the content. 
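one practical response to the points above is to make the recording of change history as cheap and as automatic as possible. the sketch below shows a minimal change-history log, loosely in the spirit of a premis event record; the field names and the json-lines layout are illustrative assumptions, not a premis-conformant serialisation.

```python
# append provenance events for a digital object to a simple json-lines change-history log.
import json
from datetime import datetime, timezone

def record_event(log_path, object_id, event_type, agent, detail):
    """append one change-history event to the log."""
    event = {
        "object": object_id,
        "event_type": event_type,                      # e.g. "migration", "modification"
        "agent": agent,                                 # software or person responsible
        "detail": detail,
        "datetime": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")

record_event("history.jsonl", "image-0001", "migration", "adobe photoshop cs4",
             "converted from tiff to jpeg")
```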
this information could provide additional context for the content as well as identify its history (e.g., format migration or modification). this is not always possible as the donor is not always be the creator of the digital content. if the original source is no longer available, as much information as possible should be gathered from the donor about the acquisition of the content and any information regarding the original source. this should be considered and carefully monitored throughout the lifecycle of digital content. granted, if no changes are needed, devices such as write blockers can ensure this as they restrict users and any systems from making unwanted changes or “writes.” however, changes are sometimes unavoidable and (although it may not affect the content) detrimental. when changes are required, it is crucial to maintain the digital history by capturing all metadata added, removed, or modified during processing, commonly known as the “change history.” donor participation should be stipulated in a donor agreement, something that each institution offers to all donors, sometimes in the form of agreements through communication and often with a structured document. donor-agreement policies differ for each institution: some are quite detailed, allowing donors to carefully stipulate their conditions, whereas others place most of the information technology and libraries | december 2017 32 responsibility on the receiving institution. when dealing with sensitive or historic data of importance, policies should be in place to capture adequate data from the donor. when the content does not fall into this category, standard procedures, which should be present in all donor agreements and institution policies, can be followed. institutions must also consider when to apply these steps as some transactions between donor and institution can follow standard protocol; others are more complex, such as donations of content with diverse provenance issues. conclusion we have presented four case studies that illustrate how vulnerable digital-object metadata are. these examples show that common methods of handling files can cause irretrievable loss of important information. we discovered significant loss of metadata when uploading photos to social media and when converting a file to another format. the digital footprint left behind from photo manipulation was also exposed. we shed light on the bibliographic-metadata generation of pdf files, how they are obtained, and the surrounding issues. action is needed to ensure proper metadata creation and preservation for born-digital objects. librarians and archivists must place a greater emphasis on why digital objects are preserved as well as how and when users may need to access them. therefore, all types of metadata must be captured to allow users from all disciplines to take advantage of historical data in many years to come. given the rate of technological change, we must be prepared; observing first-hand the vulnerability of metadata is a step toward a safer future for our digital history. references 1 “charter on the preservation of digital heritage,” unesco, october 15, 2003, http://portal.unesco.org/en/ev.phpurl_id=17721&url_do=do_topic&url_section=201.html. 2 k. rechert et al., “bwfla—a functional approach to digital preservation,” pik—praxis der informationsverarbeitung und kommunikation 35, no. 4 (2012), 259–67. 3 k. 
rechert et al., design and development of an emulation-driven access system for reading rooms, archiving conference, 2014, 126–31, society for imaging science and technology, 2014.
4 m. phillips et al., the ndsa levels of digital preservation: explanation and uses, archiving conference, 2013, 216–22, society for imaging science and technology, 2013.
5 "premis: preservation metadata maintenance activity," library of congress, accessed march 10, 2016, http://www.loc.gov/standards/premis/.
6 r. gartner and b. lavoie, preservation metadata (2nd edition) (york, uk: digital preservation coalition, 2013), 5–6.
7 premis editorial committee, premis data dictionary for preservation metadata, version 2.2 (washington, dc: library of congress, 2012), http://www.loc.gov/standards/premis/v2/premis-2-2.pdf.
8 premis editorial committee, premis schema, version 3.0 (washington, dc: library of congress, 2015), http://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf.
9 timothy hart, "metadata standard for future digital preservation" (honours thesis, flinders university, adelaide, australia, 2015).
10 j. r. smith and p. schirling, "metadata standards roundup," ieee multimedia 13, no. 2 (april-june 2006): 84–88.
11 "glossary," digital preservation coalition, accessed august 5, 2016, http://handbook.dpconline.org/glossary.
12 timothy hart, "metadata standard for future digital preservation" (honours thesis, flinders university, adelaide, australia, 2015).
13 paul wheatley, "institutional repositories in the context of digital preservation," microform & digitization review 33, no. 3 (2004): 135–46.

president's message: ux thinking and the lita member experience
rachel vacek
information technologies and libraries | september 2014

my mind has been occupied lately with user experience (ux) thinking in both the web world and in the physical world around me. i manage a web services department in an academic library, and it's my department's responsibility to contemplate how best to present website content so students can easily search for the articles they are looking for, or so faculty can quickly navigate to their favorite database. in addition to making these tasks easy and efficient, we want to make sure that users feel good about their accomplishments. my department has to ensure that the other systems and services that are integrated throughout the site are located in meaningful places and can be used at the point of need. additionally, the site's graphic and interaction design must not only contribute to but also enhance the overall user experience. we care about usability, graphic design, and the user interfaces of our library's web presence, but these are just subsets of the larger ux picture. for example, a site can have a great user interface and design, but if a user can't get to the actual information she is looking for, the overall experience is less than desirable.
jesse james garrett is considered to be one of the founding fathers of user-centered design, the creator of the pivotal diagram defining the elements of user experience, and author of book, the elements of user experience. he believes that “experience design is the design of anything, independent of medium, or across media, with human experience as an explicit outcome, and human engagement as an explicit goal.”1 in other words, applying a ux approach to thinking involves paying attention to a person’s behaviors, feelings, and attitudes about a particular product, system, or service. someone who does ux design, therefore, focuses on building the relationship between people and the products, systems, and services in which they interact. garrett provides a roadmap of sorts for us by identifying and defining the elements of a web user experience, some of which are the visual, interface, and interaction design, the information architecture, and user needs.2 in time, these come together to form a cohesive, holistic approach to impacting our users’ overarching experience across our library’s web presence. paying attention to these more contextual elements informs the development and management of a web site. let’s switch gears for a moment. prior to winning the election and becoming the lita vicepresident/president-elect, i reflected on my experiences as a new lita member and before i became really engaged within the association. i endeavored to remember how i felt when i had joined lita in 2005. was i welcomed and informed, or did i feel distant and uninformed? was the path clear to getting involved in interest groups and committees, or were there barriers that rachel vacek (revacek@uh.edu) is lita president 2014-15 and head of web services, university libraries, university of houston, houston, texas. mailto:revacek@uh.edu president’s message | vacek 2 prevented me from getting engaged? what was my attitude about the overall organization? how were my feelings about lita impacted? luckily, there were multiple times when i felt embraced by lita members, such as participating in bigwig’s social media showcase, teaching pre-conferences, hanging out at the happy hours, and attending the forums. i discovered ample networking opportunities and around every corner there always seemed to be a way to get involved. i attended as many lita programs at annual and midwinter conferences as i could, and in doing so, ran into the same crowds of people over and over again. plus, the sessions i attended always had excellent content and friendly, knowledgeable speakers. over time, many of these members became some of my friends and most trusted colleagues. unfortunately, i’m confident that not every lita member or prospective member has had similar, consistent, or as engaging experiences as i’ve had, or as many opportunities to travel to conferences and network in-person. we all have different expectations and goals that color our personal experiences in interacting with lita and its members. one of my goals as lita president is to enhance the member experience. i want to apply the user experience design concepts that i’m so familiar with to effect change and improve the overall experience for current members and those who are on the fence about joining. to be clear, when i say lita member, i am including board members, committee members and chairs, interest group members and chairs, representatives, and those just observing on the sidelines. 
we are all lita members and deserve to have a good experience no matter the level within the organization. so what does “member experience” really mean? don norman, author of the design of everyday things and the man attributed with actually coining the phrase user experience, explains that "user experience encompasses all aspects of the end-user's interaction with the company, its services, and its products.” 3 therefore, i would say that the lita member experience encompasses all aspects of a member’s interaction with the association, including its programming, educational opportunities, publications, events, and even other members. i believe that there are several components that define a good member experience. first, we have to ensure quality, coherence, and consistency in programming, publications, educational opportunities, communications and marketing, conferences, and networking opportunities. second, we need to pay attention to our members’ needs and wants as well as their motivations for joining. this means we have to engage with our members more on a personal level, and discover their interests and strengths, and help them get involved in lita in ways that benefit the association as well assist them in reaching their professional goals. third, we need to be welcoming and recognize that first impressions are crucial to gaining new members and retaining current ones. think about how you felt and what you thought when you received a product that really impressed you, or when you started an exciting new job, or even used a clean and usable web site. if your initial impression was positive, you were more likely to connect with the product, environment, or even a website. if prospective and relatively new lita information technologies and libraries | september 1014 3 members experience a good first impression, they are more likely to join or renew their membership. they feel like they are part of a community that cares about them and their future. that experience became meaningful. finally, the fourth component to a good member experience is that we need to stop looking at the tangible benefits that we provide to users as the only things that matter. sure, it’s great to get discounts on workshops and webinars or be able to vote in an election and get appointed to a committee, but we can’t continue to focus on these offerings alone. we need to assess the way we communicate through email, social media, and our web page and determine if it adds or detracts from the member experience. what is the first impression someone might have in looking at the content and design of lita’s web page? do the presenters for our educational programs feel valued? does ital contain innovative and useful information? is the process for joining lita, or volunteering to be on a committee, simple, complex, or unbearable? what kinds of interactions do members have with the lita board or the lita staff? these less tangible interactions are highly contextual and can add to or detract from our current and prospective members’ abilities to meet their own goals, measure satisfaction, or define success. as lita president, and with the assistance of the board of directors, there are several things we have done or intend to do to help lita embrace ux thinking: • we have implemented a chair and vice-chair model for committees so that there is a smoother transition and the vice-chair can learn the responsibilities of the chair role prior to being in that role. 
• we have established a new communications committee that will create a communication strategy focused on communicating the lita’s mission, vision, goals, and relevant and timely news to lita membership across various communication channels. • we are encouraging our committees to create more robust documentation. • we are creating richer documentation that supports the workings of the board. • we are creating documentation and training materials for lita representatives to compliment the materials we have for committee chairs. • we have disbanded committees that no longer serve a purpose at the lita level and whose concerns are now addressed in groups higher within ala. • the assessment and research committee is preparing to do a membership survey. the last one was done in 2007. • we are going to be holding a few virtual and in-person lita “kitchen table conversations” in the fall of 2014 to assist with strategic planning and to discuss how lita’s goals align with ala’s strategic goals of information policy, professional development, and advocacy. • the membership development committee is exploring how to more easily and frequently reach out, engage, appreciate, acknowledge, and highlight current and prospective members. they will work closely with the communications committee. president’s message | vacek 4 i believe that we’ve arrived at a time where it’s crucial that we employ ux thinking at a more pragmatic and systematic level and treat at it as our strategic partner when exploring how to improve lita and help the association evolve to meet the needs of today’s library and informational professionals. garrett summarizes my argument nicely. he says, “what makes people passionate, pure and simple, is great experiences. if they have great experience with your product [and] they have great experiences with your service, they’re going to be passionate about your brand, they’re going to be committed to it. that’s how you build that kind of commitment.”4 i personally am very passionate about and committed to lita, and i truly believe that our ux efforts will positively impact your experience as a lita member. references 1. http://uxdesign.com/events/article/state-of-ux-design-garrett/203, garrett said this quote in a presentation entitled “state of user experience” that he gave during ux week 2009, a very popular conference for ux designers. 2. http://www.jjg.net/elements/pdf/elements.pdf 3. http://www.nngroup.com/articles/definition-user-experience/ 4. http://www.teresabrazen.com/podcasts/what-the-heck-is-user-experience-design, garret said this quote in a podcast interview with teresa brazen, “what the heck is user experience design??!! (and why should i care?)” http://uxdesign.com/events/article/state-of-ux-design-garrett/203 http://www.jjg.net/elements/pdf/elements.pdffunctional http://www.nngroup.com/articles/definition-user-experience/ http://www.teresabrazen.com/podcasts/what-the-heck-is-user-experience-design lib-s-mocs-kmc364-20140601053153 184 ]oumal of library automation vol. 5/3 september, 1972 two types of designs for on-line circulation systems rob mcgee: systems development office, university of chicago libra,ry on-line circulation systems divide into two types. one type contains records only for charged or otherwise absent items. the other contains a file of records for all titles or volumes in the library collection, regardless of their circulation status. 
this paper traces differences between the two types, examining different kinds of files and terminals, transaction evidence, the quality of bibliographic data, querying, and the possibility of functions outside circulation. aspects of both operational and potential systems are considered. introduction a literature survey was made of on-line circulation systems ( 1 ). to qualify for study, a system needed to perform any major circulation function on-line. charging and querying were common. some systems were also found to perform some acquisitions, cataloging, and reference work. criteria used to examine systems have been presented in an earlier paper as key factors of circulation system analysis and design (2). this paper conceptualizes the survey findings, and goes on to consider general problems and alternatives of designing on-line circulation systems. the survey shows that on-line circulation systems divide into two types, according to the scope of their bibliographic records. we give the term "absence file" to a set of records for only those items that have been charged or otherwise removed from their assigned locations. the name "item file" is given to what is, or approaches being, a comprehensive file of records for all titles or volumes in the library collection, regardless of their circulation status. each on-line circulation system either does or does not have an item file. systems without an item file must contain an absence file, and are two types of designs/mcgee 185 therefore called "absence systems." systems with an item file are called "item systems." (an item system may also have an absence file, depending upon its design.) note that an "absence file" and an "item file" are each conceptually or logically defined as a single file, whereas in some operational systems either may be stored as more than one physical file. two other basic files generally appear in operational systems: a user file of complete records for users; and a transaction file that may be variously used for data collection, system update, system backup, and batch generation of notices. we can now generalize common but not exclusive file definitions for the two design types. absence systems usually contain three main logical files: 1) a user file; 2) an absence file that contains records only for charged or otherwise absent items; and 3) a transaction file. user identification number and complete item data (all the item data the system is to hold) are input at transaction time to create charge records. these data are typically collected from machine-readable sources such as punched cards or magnetic strips; the surveyed systems use punched cards. time data, such as charge date or due date, and circumstantial data, such as charging location, may also be collected. during batch processing, user records are accessed by identification number to obtain name, address, and so forth. examples of absence systems are found at west sussex county library (3,4,5 ), illinois state library ( 6,7 ), midwestern university library ( 8,9,10), queen's university library ( 11,12,13,14,15), northwestern university library ( 16), and bucknell university library ( 17). 
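to make the file definitions concrete, the sketch below models the three logical files of an absence system and a charge transaction that creates an absence record from data read at charge time. the record layouts and field names are illustrative, not those of any of the surveyed systems.

```python
# a schematic sketch of an absence system: records exist in the absence file only for
# charged or otherwise absent items, and all item data the system will hold is input
# at transaction time.  layouts and names are illustrative only.
from datetime import date, timedelta

user_file = {"U1001": {"name": "a. borrower", "address": "main campus"}}
absence_file = {}       # call number -> charge record, for absent items only
transaction_file = []   # data collection, system update, backup, batch notices

def charge(user_id, item_data, loan_days=14, location="main desk"):
    """create a charge record from user number, item data, time and circumstantial data."""
    record = dict(item_data, user_id=user_id,
                  charge_date=date.today(),
                  due_date=date.today() + timedelta(days=loan_days),
                  location=location)
    absence_file[item_data["call_number"]] = record
    transaction_file.append(("charge", record))
    return record

def discharge(call_number):
    """a discharge only needs a key that matches an existing absence record."""
    record = absence_file.pop(call_number)
    transaction_file.append(("discharge", record))
    return record

charge("U1001", {"call_number": "Z699.A1", "author": "mcgee", "short_title": "two types of designs"})
print(sorted(absence_file))      # only the charged item is represented
```

an item system differs chiefly in that the bibliographic data would already sit in an item file before the charge, so the charge transaction could carry little more than a user number and an item key.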
item systems are characterized by three or four major files: 1) a user file; 2) an item file of bibliographic records for all library volumes or titles, or for as many as machine records can feasibly be created and stored; 3) a transaction file that may be used for update of the item file, data collection and analysis, and perhaps notice generation; and optionally 4) an absence file of records for circulating items, if transaction data for them are more efficiently kept here than in the item file. records in both the user and item files contain either full data or at least enough data to address messages to users and to adequately describe items. if an absence file is used in an item system, it may copy bibliographic data from the item file, or the two files may be linked to avoid data redundancy. item systems are in operation at bell laboratories library ( 18,19 ), eastern illinois university library ( 20), ohio state university libraries ( 21,22 ), and the technical library of the manned spacecraft center, houston ( 23). (the manned spacecraft center library, alone among the systems we have surveyed, does not have a user file. instead, user's last name, initials, and address code are input at charge time.) since the basic distinction between absence and item systems is whether descriptive data for an item are machine-held prior to its charge time, item file records are limited primarily by the costs of conversion and machine storag~, but can have full, even marc-like formats; whereas absence sys186 journal of library automation vol. 5/3 september, 1972 tern records are restricted by the quantity of data that can be input at charge time, i.e. by the capacities of source record coding and data transfer techniques. basic approaches to on-line circulation system development three approaches to the design of on-line circulation systems have originated from different notions of circulation control ( 24). first is the view that circulation control is a separate library function, or one with minimal relationships to other library data processing. exclusive requirements for user and item data are formulated; the format of bibliographic data and the design of data management capabilities are developed explicitly for circulation control, to the exclusion of other library data processing requirements. absence systems have been developed with this approach, but thus far item systems have not. the second approach is to create a circulation system that is operationally independent of other library data processing activities, but designed with a view toward the possibility of shared usage of bibliographic and other data, and of general library data processing facilities. compatibility with other functions is provided, to aid later combination. either system design can take this approach, but item systems can take better advantage of the integration of functions. a third approach is to add circulation control to other large file processes (such as a cataloging system), or to develop them concurrently. this follows an integrated view of library data processing that sees a circulation system operating with many of the same data and processing requirements as other library functions, all of which are handled by a general library data management system. the broad range of library data processing activities needs to be addressed, and an item system design is likely to be preferred. two concepts underlie these approaches: 1 ) an integrated library system, and 2) a remotely accessible library catalog. 
an integrated view of the library is one of a total operating unit with a variety of operations that are logically interrelated and interconnected by their mutual requirements for data and processing ( 25) . the term "integrated system" usually implies a system in which centralized, minimally-redundant files undergo shared processing by different library functions. it is not clear exactly how the concept of a remotely accessible catalog should be defined, or exactly what the phrase means to various users. if we take it to mean the capability to access information from a given catalog at remote locations, then a variety of systems may qualify: e.g., telephone access to a group that performs manual card catalog lookups ; multiple locations of book catalogs or microform catalogs; and terminal access to an on-line, computerized catalog. the last is pertinent to our discussion. two types of designs/mcgee 187 how an integrated system is implemented determines if its central bibliographic file is accessible from multiple remote locations; how an on-line, remotely accessible catalog is used determines if the system is integrated. recently the ohio state university libraries circulation system has come to be explicitly called a remote catalog access system. we have not yet found reports of any on-line system that has integrated all of technical services and circulation. the addition of circulation control to existing on-line cataloging systems has been planned for the shawnee mission system ( 26) and mentioned for the system at western kentucky university, bowling green ( 27). the ohio college library center has not yet decided how it will handle circulation. as long as we define an integrated system on the basis of multiple uses of nonredundant data (among other characteristics), and a remote catalog access system upon physical accessibility, various systems may qualify as either or both. recognizing these two concepts helps to show how the three approaches to on-line circulation system development capsulize broader trends in library systems. first, the redundancy of bibliographic data in operationally separate but conceptually related functions has characterized traditional manual systems, batch computer systems, and now some on-line systems. second, the construction of individual, independent subsystems, while planning for their eventual combination, has been called an evolutionary approach to a total system, by de gennaro ( 25). he has also defined a third, integrated approach, in which computerized systems are designed to take advantage of interrelationships among different subsystems, for example, by accepting one-time inputs of given data, and processing them for multiple library functions and outputs. these three trends have been widely experienced in changing relationships among traditional and innovative systems for acquisitions and cataloging. the pattern is repeating now in the evolution of on-line systems with respect to technical services processing and circulation control. large on-line systems are emerging to perform acquisitions, cataloging, circulation control, and reference functions with shared processing facilities and data bases. terminal devices most on-line circulation systems do not perform all functions on-line, although the following are possibilities: charge, discharge, inquiry, and other record creations and updates, such as reserving items in circulation, renewing loans, recording fines payments, and even converting files to machine-readable form. 
what do their input/output requirements imply for terminal devices? inputs for charges may be minimal user and item identification numbers, or full borrower and book descriptions. evidence of valid charges may be produced; printouts of user number, call number, short author and title, and due date are common. there are also special "security systems" that switch two-state devices in books (such as sensitized plates ~r labels) to record valid charge, but as yet no such system has been 188 journal of library automation vol. 5/3 september, 1972 coupled with on-line charging. discharge inputs need only to match existing records; simple access keys such as call number or accession number are adequate. querying, too, may be accomplished with simple search keys, or with bibliographic inputs such as author and title . all these functions can be performed with keyboard input and output display of alphanumeric data. absence systems although not a requirement by our definition, most absence systems feature machine-readable user and item cards (the queen's university system does not); the terminals used for charge and discharge must have card reading capabilities. thus for on-line tasks charge stations need card readers and the ability to produce charge evidence, usually in hard copy; querying by bibliographic search keys requires keyboard input and output display of alphanumeric data; discharge stations need a card-reading capability and a display mechanism to identify reserved items; and file creation requires inputs of alphanumeric data in a character set that may range from minimum to full. there are at least two problems in choosing terminal equipment for absence systems. fi1·st, any single terminal or configuration that satisfies input and output requirements for all basic functions may be too expensive to install at every library location of these activities. second, the combination of separate hardware units (such as keyboards, printers, and card readers) may require special hardware or software interfaces that prove difficult and expensive ( 16, 28). alternatively, separate circulation stations with different terminal devices can be established for specific functions. this solution may introduce problems of hardware and personnel redundancy and backup. difficulties with terminal devices explain in part why most systems perform not all but only selected functions on-line. item systems it is possible, in systems with user and item files, to access records by using search keys that are either keystroked or machine-read from cards. the use of machine-readable cards involves the same problems as those described for absence systems. however, choosing keyboard entry of accession or call numbers eliminates card reading, and simplifies requirements so that keyboard devices with display capabilities can perform all basic functions. the feasibility of keyboarding the inputs at transaction time has been demonstrated by the systems at queen's university library (an absence system without machinereadable cards), ohio state university libraries, bell laboratories, and the technical library of the manned spacecr:1ft center. a system based on a single terminal device that handles all realtime functions offers attractive simplifications for hardware and teleprocessing software. the primary disadvantages also center on the device itself. 
factors such as input error, transmission and printing rates, character set, special function keys, noise, and cost have various implications two types of designs/mcgee 189 for system design and operations. obviously, in a system based on a single terminal device, the characteristics of that device are influential. kilgour has stated that the two most important factors in configuring a computer system for an item file design are, first, the nature of secondary memory, and second, the kind of terminal device to be employed ( 29). the need to quickly access large stores of data is basic. as for keyboard devices, one often finds that typewriter terminals may require far more computation of a central computer than do cathode ray tube terminals, because many crt systems have substantial computing power of their own, giving the effect of a satellite computer. this can be important for systems that will run on time-shared machines, or transmit data over long distances. the problems we have described for circulation terminals can be overcome; appropriate devices can be built. too many library systems have been designed around unsuitable hardware; there has been little choice but to develop circulation systems (both on-line and batch) with data collection devices designed for industrial applications. their influence-frequently bad-is fundamental to the nature of resulting systems. in fairness, suppliers need both direction and marketing potential. the deeper fault is with librarians, who have inadequately documented requirements and not proven the existence of a market. the integrated approach to library automation ultimately visualizes all library functions using a single set each of bibliographic, user, and other kinds of records, although different pieces of data for different purposes. similarly, one can say that different sets of terminal requirements arise from the different input/ output specifications among library tasks and not so much from the nature of bibliographic and other data. as functional requirements of different activities (e.g., acquisitions, cataloging, and circulation) overlap, the opportunity to use identical or similar terminals in a variety of library processes is enhanced. extending the integrated approach to libraryrelated hardware fits well with the concepts of modular hardware design and add-on features. take, for example, a basic keyboard/display screen terminal to which modules can be added to read book and borrower identification and to produce hardcopy printout. transaction evidence a variety of transactions may occur between a library and its users: charging and discharging books, placing reserves on circulating material, paying fines, etc. evidence may be provided to verify transaction accuracy and to furnish receipts for users. this evidence can be in various formats: it might be a hardcopy record (or worksheet) of transaction inputs, or a printout or screen display of system responses. printed charge evidence is a familiar example, and is sometimes used for inspection of items that library users carry from the building. two kinds of charge evidence may be defined ( 2). simple evidence contains no more user and item data than are input at transaction time. complex evidence contains user or item data other 190 ]oumal of library automation vol. 5/3 september, 1972 than charge time input, and requires the system to extract data from machine-held file( s). 
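the distinction between simple and complex evidence is easy to make concrete. the sketch below is a hypothetical python illustration, not the procedure of any surveyed system: simple evidence is formatted entirely from the charge-time inputs, while complex evidence obliges the system to reach into a machine-held item file for data (here author and title) that were never keyed or read in at the terminal.

from datetime import date, timedelta

def simple_evidence(user_number: str, item_number: str, loan_days: int = 14) -> str:
    # only charge-time inputs and a computed due date appear on the slip
    due = date.today() + timedelta(days=loan_days)
    return f"user {user_number}  item {item_number}  due {due.isoformat()}"

def complex_evidence(user_number: str, item_number: str, item_file: dict,
                     loan_days: int = 14) -> str:
    # the central item file must be consulted, lengthening the response
    rec = item_file[item_number]
    due = date.today() + timedelta(days=loan_days)
    return (f"user {user_number}  {rec['author']}: {rec['title']}  "
            f"call no. {rec['call_number']}  due {due.isoformat()}")

item_file = {"a123456": {"author": "smith, j.", "title": "data management",
                         "call_number": "qa76 .s55"}}
print(simple_evidence("u0042", "a123456"))
print(complex_evidence("u0042", "a123456", item_file))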
printed evidence typically contains an item due date that may be calculated from either user or item criteria, or both; or directly specified at the time of charge. let us look at the implications of printed charge evidence for the two system types.

absence systems

in most absence systems user identification number and full item data are transferred into the system from machine-readable cards at charge time. there are various ways of printing simple transaction evidence; the following are illustrative. one technique is to transmit data directly to a computer that formats them for output, calculates a due date, and returns them to a printing device. another method is to process source record data with a terminal system that can buffer and format them, select a due date, and output the evidence on a printer. shifting functions from the computer to a terminal system may simplify teleprocessing software, save time at the central processing unit, and permit nearly normal charge operations during computer downtime. if more elaborate user data than identification number are required, there are two obvious solutions. central user records may be accessed to provide complex evidence, possibly increasing central processing unit time and response time. or, user cards (such as magnetically encoded ones) that contain fuller data could be employed with a terminal system that handles them independently of the central computer.

item systems

in item systems as in absence systems it is possible to use machine-readable cards, with the same implications for printing charge evidence. however, if 1) user and item numbers are keystroked, 2) these are considered sufficient borrower and book information, and 3) decision rules for loan periods are simple, then little or no computer response is required for charging. due date may be returned and printed to signal a completed transaction, or predated date due slips may be used. alternatively, special terminal features may be added to select and print a due date. this complicates otherwise simple terminal requirements. sophistications such as status checks on the borrower (e.g., any outstanding fines?) and item (e.g., is it reserved for another user?) will of course require more extensive processing and responses. if charge time inputs are indeed keystroked, user and item numbers with check digits are desirable, to minimize the effects of input error. for complex evidence response time is important, especially if terminals are typewriter-like devices. the time required is determined by the sources of response data, their access times, how much data must be transmitted, and the transmission and terminal display rates. through careful design the time required to obtain charge evidence and complete the transaction can be minimized. for example, if the user number carries a code for borrower class, then a due date can be quickly selected and printed, while the item file is accessed for needed author/title data. it is clear that in an item system containing a user file, only very simple inputs are required to record a charge transaction. the additional requirements for charge evidence, status checks on user and item, and so forth determine how elaborate and slow system responses may become.

availability, holdings, and absence information

one may take the view that a library should provide the following kinds of responses to users. if a title is requested, library holdings for it should be given.
if a specific item is wanted, either its absence or presumed location should be reported. if the item cannot be immediately provided, the library should determine its future availability and inform the user. the terms "availability," "holdings," and "absence information" have special meanings. the availability of a specific item to a library's users is mapped onto the universe of items by the library's acquisitions, cataloging, circulation control, and interlibrary borrowing functions. availability information obtains from all these sources, but particularly from the public catalog of library holdings. absence information, in contrast, corresponds only to a subset of library holdings-it tells the locations of library-owned items when they are absent from the locations indicated by the catalog. absence information therefore corresponds to a subset of holdings information; holdings information is a subset of availability information. in the context of our discussion an absence system provides full absence information and only partial holdings and availability information. an item system can provide full holdings and absence information, but only partial availability information, since items not owned may be ordered, or borrowed from another library. such considerations strengthen the argument that circulation control shares a functional unity with other library processes, and should therefore be considered as one of several integrated functions. the provision of absence and availability information is the essence of circulation system querying requirements. figure 1 shows that different query keys access different subsets of availability information. note the wider utility of some keys than for just circulation control. absence systems in on-line circulation systems built around an absence file, the data representing each physical item may range from a simple accession or item call number (as in the queen's university and northwestern university systems) to larger records containing as much data as may be stored and transferred with a machine-readable card (for example, a hollerith-punched book card). if availability information is to be obtained from a library's public catalog, and not from the circulation system, then access to the file of absence information may be with any key shown in the catalog records: 192 journal of library automation vol. 5/3 september, 1972 e.g., author and title, title and author, ca11 number, accession number. only the simpler keys, item call number and accession number, have been used in absence systems developed so far. consequently their requirements for file organization and access software are minimal. in most systems these keys permit exact matches to only single records, but in the northwestern university system a call number query may cause display of a set of related records ( 16). item systems the query function in item systems is bound by different constraints than those constraints of absence systems. the amount of bibliographic data is not restricted by the storage capacities of machine-readable cards, transaction response time, or the transfer rates of charge-time inputs. however, the following questions arise: how many and which records from the library's data base (e.g., its shelflist) must be converted to provide a sufficient item file? for each record converted, how much and which data are required? what functions shall such a data base ultimately support? what kinds of absence and availability information will be provided? 
these deserve special discussion before we consider querying in item systems.

item system bibliographic data

how much data are required for item file records? in an integrated system full records are ultimately produced by the cataloging process. should one use full, variable-length, marc-like records for circulation control? the conversion, on-line storage, management, and access of a large file of full bibliographic records are expensive propositions. one may be compelled toward a lesser effort. under much of the popular data management software it is easier to organize and store fixed length records than variable length records with different combinations of fixed and variable length fields. the files of current item systems hold less-than-full bibliographic records: bell laboratories library utilizes two basic fixed length formats of 155 and 188 characters (18, 19); the eastern illinois item file consists of 124-byte records (20); although the ohio state system contains variable length records, they are less-than-full bibliographically, averaging 103 bytes (22); the manned spacecraft center system has fixed length records of 168 characters (23). if not a full, marc-like record, then what? two questions may be asked: how much data should be converted for each record? and: how much of these data should be put initially into an item file? if one believes a fully integrated system may eventually take over some public catalog functions, then traditional author-title-subject accesses must be maintained, at least until proven unneeded. the minimum genuinely useful set of bibliographic data elements needed for futuristic information retrieval from library catalogs has not been proven; the safe but expensive answer is to convert full records.

fig. 1. possible access keys to sets of availability information. (the figure relates the universe of all items to three classes of access key: standard bibliographic description; standard but noncomprehensively applied single-element unique identifiers, e.g., isbn, ssn; and library-assigned keys such as item call number and accession number. for each key it indicates whether the key retrieves all or only some members of a set.)

initially, however, one might want no more data in an item file than are functionally justified. how much are actually needed? the four existing item systems provide traditional information in new ways that have dramatically improved services to users. they answer several basic kinds of questions on-line: does the library have book ___? is it available now? what books do i have charged out? such queries can be answered by nonsubject, descriptive bibliographic data, and by circulation status information that shows if items are absent, and when they may become available. for this an item file needs records only for items that are used, in contrast to a comprehensive on-line shelflist. which records to include becomes a problem remarkably similar to deciding what books to put into low-access, compact storage, or to discard. the two university libraries with item systems chose comprehensive conversions: eastern illinois for 235,000 volumes (20), and ohio state for 800,000 titles (22). what are the potential advantages to users of an item system? if one only wants to know what books are charged, an absence system will suffice.
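the appeal of the less-than-full, fixed-length record is easy to demonstrate. the sketch below is our own illustration of the general technique, not the actual layout of the bell laboratories, eastern illinois, or manned spacecraft center files: an abbreviated description is packed into a fixed 120-character slot with python's struct module, so that record n can be located by simple arithmetic (byte offset n * 120) without variable-length field management.

import struct

# hypothetical 120-byte layout: call number (24), author (40), title (48),
# circulation status (1), filler (7); real systems defined their own fields
layout = struct.Struct("24s40s48s1s7s")

def pack(call_no: str, author: str, title: str, status: str = "i") -> bytes:
    enc = lambda s, n: s.encode("ascii", "replace")[:n].ljust(n)
    return layout.pack(enc(call_no, 24), enc(author, 40), enc(title, 48),
                       status.encode("ascii"), b" " * 7)

def unpack(raw: bytes):
    call_no, author, title, status, _ = layout.unpack(raw)
    return (call_no.decode().rstrip(), author.decode().rstrip(),
            title.decode().rstrip(), status.decode())

record = pack("qa76 .s55", "smith, j.", "data management systems", "c")
assert layout.size == 120    # record n begins at byte n * 120 of the file
print(unpack(record))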
both the penalties and promise of an item system lie in its bibliographic store-in the records it holds (scope ), and in the data these records contain (content ) . unless real-time querying of an item file can substitute for at least some manual searches of the public catalog, and in an improved way, its bibliographic data offer no direct advantages to users of a circulation system; an item system will provide no direct circulation services that an absence system could not. applying this as a test to the utility of a noncompreh ensive item file (a file of records only for items that are, or are likely to b e, in use ), we find perhaps the key question for development of item systems among libraries with very large catalogs: to what extent may 194 journal of library automation vol. 5/3 september, 1972 a noncomprehensive item file substitute for accesses to a comprehensive public catalog? although related, this is not the same question as what proportion of a library's book stock circulates. this is a question of how the public catalog is used: by whom , and for what? lipetz's study of the card catalog in yale university's sterling memorial library gives insight to at least that institution's catalog use (30). he found that 73 percent of the users attempting searches were looking for particular documents (known items ). overall, users' approaches to catalog searches were: author, 62 percent; title, 28.5 percent; subject, 4.5 percent; and editor, 4 percent. this may encourage one to believe that an item file which is accessible by author and title can handle a significant portion of manual catalog lookups. if so, developers of item systems may want to consider strategies similar to the following. if it is shown that satisfactory author /title access can be provided by an item file, then perhaps a large library is justified in dividing its card catalog and retaining only the subject-access portion. the argument is that author/ title access can be provided by an item file of partial records containing nonsubject descriptive data, whereas the requirements for subject access involve still more data that are likely to change as subject descriptions do. however, if a manual card catalog for subjects were maintained , this would facilitate updates of subject headings, and at the same time permit the most efficient format and smallest set of machine-held item file records to be kept. through the use of machine-held subject authority files, maintenance instructions and replacement heading cards could be computer-produced for update of the manual subject catalog. (distribution of machine-readable subject headings is being considered by the library of congress marc development office.) reduction in the maintenance and use of a full manual author-title-subject catalog by library technical services departments could produce significant savings, aside from whatever direct improvements in access that machine files might provide. if the item file were noncomprehensive, or contained retrospective records only for those items that circulated, then author/title accesses would of course be limited to the contents of that file. this would require maintenance of full manual catalogs for noncirculating items. two general alternatives to a comprehensive item file of full records come to mind. one is to utilize records as they are created by the cataloging process, complemented by partial-record conversion (conveniently, inhouse) for only those retrospective items that circulate. 
another is to create a special circulation-only item file of partial records. this kind of system would use an item file primarily as an alternative to machine-readable book cards. absence and holdings querying would be supported, but not acquisitions or cataloging functions. a system like this, with an item file of partial records, may be the most reasonable answer for large research libraries (28). it should be able to give the same circulation services as an absence system, in addition to satisfying certain kinds of public catalog searches. the simplifications for data conversion, data management software, and terminals are worth special evaluation as a middle or simple approach to on-line circulation system development, with an item system design.

item system querying, bibliographic data structure, and file organization

the querying capabilities featured by each item system differ somewhat, and are explained in part by differences in bibliographic data structure and file organization. the data and design of an information-providing system are fundamental to the kinds of services it can provide. a useful conceptual model is the traditional manual library system in which separate files are used for different functions: an in-process file for technical processing, a shelflist for the official holdings, a public catalog for reference, and a circulation file to control item absences. among these the file for circulation contains less bibliographical data than the others, since even a single data element such as the call number can uniquely identify a physical item and relate it to a fuller description, such as a shelflist record. a circulation file of this nature is in effect a manual absence file, and serves no major purpose other than circulation control. were the processing requirements not impractical, circulation status could be more usefully recorded in public catalog records, in the manner of an item file. the eastern illinois university system has an indexed sequential item file organized by item call number plus accession number. it may be queried by this key to get an exact match to a single record, or by a classification number to get a file scan of corresponding records. query by user number displays charges to the user. the ohio state system has a read-only item file that is randomized by item call number, but it may also be accessed by an author/title key that consists of the first four characters of the main entry plus the first five characters of the first significant word or words of the title. the second five characters can be blanked to provide author-only access. the file is also accessible by an item record number that is assigned sequentially to new records entering the file. the bell laboratories system provides access by item number to its item file, and uses a set of twelve query codes to obtain status and other factual information on users and items. user number and item number are the query keys. the item file is also used to produce a book catalog that gives the item numbers by which queries can be made. the item file of the manned spacecraft center library is sequentially organized by item number, and can also be queried by call number and user number. these systems demonstrate alternatives for bibliographic data structure and file organization and access methods that are summarized by the author in a separate work (1) and explained by the references for each system (18, 19, 20, 21, 22, 23).
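a derived search key of the kind just described for the ohio state file is simple to compute. the function below is our own reconstruction of the general "4,5" technique (four characters of the main entry plus five characters of the first significant title word); the stopword list and normalization rules are assumptions for the sake of illustration, not ohio state's published algorithm.

stopwords = {"a", "an", "the", "of", "and"}     # assumed; actual practice varied

def derived_key(main_entry: str, title: str, author_only: bool = False) -> str:
    """build a 4,5-style key; blanking the title half gives author-only access."""
    norm = lambda s: "".join(ch for ch in s.lower() if ch.isalnum() or ch == " ")
    a = norm(main_entry).replace(" ", "")[:4].ljust(4)
    if author_only:
        return a + "     "
    words = [w for w in norm(title).split() if w not in stopwords]
    t = (words[0][:5] if words else "").ljust(5)
    return a + t

print(derived_key("kilgour, frederick g.", "the evolution of book catalogs"))
# -> "kilgevolu"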
briefly, the eastern illinois, bell laboratories, and manned spacecraft center systems use a fixed-length item record structure, and charge data are written directly to item record fields that are defined for this purpose. the ohio state system has a variable length, read-only item record. transaction data are recorded in an absence file, and linked to the item file. in the bell laboratories system what is conceptually a single item file is actually two separately organized physical files with different record formats. fixed-length book records are organized sequentially, and each contains fields for three loans and two reserves; all copies and volumes are represented. journal records are organized by an indexed sequential method, and do not contain copy and volume data, which must be added at transaction time. in the eastern illinois and manned spacecraft center systems the item file contains a separate record for each physical volume in the library. the ohio state item file contains one record per title. although it is difficult to tell without detailed programming knowledge of these systems, the bell laboratories data structure seems to enable exact matches to single records for status queries (e.g., what is the status of title number ___? what is the status of copy ___?) in ways that the eastern illinois and ohio state systems can only accomplish through a terminal operator's interpretation of a displayed set of matching records. the bell laboratories system can therefore conduct queries of this nature with keyboard/printer terminals, whereas the eastern illinois and ohio state systems require crt devices to display large amounts of information. it can also ask what overnight loans are still out, possibly a function of its journal file's data structure. the software implications of these various capabilities will not be discussed here. suffice it to say that absence systems require simpler accesses and data management than do the kind of item systems we have discussed, and that as item files are designed to replace all or selected public catalog functions, their data management and user interface requirements become greater.

special aspects of the charge function

two aspects of the charge function have special significance for on-line systems: patron self-charging, and a telephone and mail or delivery service. among the on-line systems we surveyed, only the one at northwestern university is reported to be self-charging. to have patron self-charging requires that charge transactions be simple and convenient. data transfer methods that require little effort are therefore preferred, and the use of machine-readable user and item identifications seems to be the best current choice. the northwestern system uses hollerith-punched user badges and book cards. other methods of data entry such as magnetic card reading and optical scanning are often mentioned for circulation control, but as of december 1971 we know of none that has resulted in a practical terminal-based system for on-line charging. two of the item systems promote a telephone and delivery service: the bell laboratories system and the ohio state university system. in each system inquiries can be directed to operators, who may conduct on-line searches of library holdings and circulation information for specific items. the kinds of questions that can be asked are "does the library have ___?" and "is it charged?"
we noted earlier that a catalog can be made "remotely accessible" in several ways: e.g., by a special group that performs manual card catalog lookups for telephoned requests, or by users' consulting multiple copies of book or microform catalogs. in principle, a variety of catalogs and circulation systems can be used together in a telephone inquiry system of this nature. for example, the library of the georgia institute of technology has recently implemented an "extended catalog access" and delivery service that is based on microfiche copies of its catalog at thirty-six campus locations, coupled with telephone inquiry to a manual circulation system (31). readers look up wanted items and telephone the library to request them. the manual circulation file is checked: available items are charged for delivery, or reserves may be placed for items that are already loaned. presumably, the currency of information and quickness of response times are better in an on-line circulation system than in any other type. an item system can furnish both holdings and absence information. an absence system needs to be coupled to another system to furnish holdings information: a requirement is that the holdings information must contain a key by which the corresponding absence records can be accessed. these are basic considerations in providing a telephone and delivery service.

system backup

the problems considered here derive from two conditions: unexpected system downtimes and scheduled periods when the system is not in operation. at these times a system cannot execute on-line tasks. two classes of backup problems are: 1) provision of service to users during the downtimes; and 2) updating system files to record downtime transactions. the latter are termed recovery problems. one way to back up the query function is to periodically print a list of circulating items. the frequency and ease of access (e.g., number of copies, their locations, telephone access to them) of such a list can pose substantial problems. an alternative to scheduled printings is an arrangement for quick printouts of a frequently copied backup tape on a redundant computer system. the basic recovery problem is how to enter data into the system for transactions that took place during downtimes. presumably, if unexpected downtimes are not inordinately long, discharges and other file updates may be postponed. this simplification is helpful, since transaction sequences among different kinds of updates can become quite complicated, e.g., discharges undo charges, and confusing the sequence causes problems. although other kinds of system updates have their own special problems, the following paragraphs only briefly discuss the backup and recovery of charging activities.

absence systems

the provision of transaction evidence in off-line mode has already been suggested for absence systems that have the necessary terminal capabilities. similarly, there are configurations which, in off-line mode, read user and item cards and produce machine-readable transaction records that can be read in during post-downtime recovery procedures. the northwestern university system has a special backup terminal for this purpose. the provision of automatic recovery facilities is an attractive feature. alternatively, multiple part manual transaction records can be made for charges during downtimes.
one part may serve as transaction evidence; the other can be used for manual input of recovery data, when the system is up again. exactly how this is done depends upon other details of the particular system. item systems since inputs of user and item identification numbers are sufficient to record charges in item systems, the recovery problem can be simpler than for absence systems. typewriter-like terminals with card or paper tape punches or magnetic recorders can be used to create machine-readable recovery data. the requirements for transaction evidence may be crucial. perhaps the solution to the worst case is the use of a two-part manual transaction form: one copy for transaction evidence, and the other, as above, for post-downtime recovery inputs. we can summarize three hardware solutions for transaction backup in either system type: 1) total system redundancy, 2) backup at the terminal level, and 3) a backup facility between terminal and computer. the cost of full system redundancy makes it unlikely. a facility to log transactions during downtimes is more feasible; there are several choices. one such alternative is to record transaction data off-line in machine-readable form at each data collection point: e.g., to punch paper-tape or cards. another alternative is to record data from several terminals with a single device, such as a magnetic recorder, or a control unit that coordinates a multiterminal system. a third solution, a variation on the second, is a mini-computer which links terminals, and handles telecommunications with a larger machine that holds system files. this approach has been taken by bucknell university. it affords more comprehensive backup than merely capturing transaction data. other functions, such as checks for user validity and reserved items, can be performed on a relatively reliable mini-computer dedicated to circulation. two types of designs/mcgee 199 conclusion on-line library catalogs are now a reality, but not yet for the exotic information retrieval work once popularly projected. instead, relatively straightforward accesses by author, title, and call number are supporting circulation, reference, and technical processing functions. the needs for better circulation systems and network processing of shared cataloging data have stimulated developments of large-scale operational (not experimental) systems around resident files of on-line bibliographic records. developers have not waited for solutions to fundamental problems of automatic indexing and information retrieval ; they have put large bibliographic files on-line and provided relatively simple, multiple access keys. the advances that have been made are in methods of physical access to bibliographic records, not in the intellectual or subject access to information. no new information is being retrieved, but familiar processes are being performed in better ways. improvements in the ease and time of accessing library files have dramatically upgraded the library's responses for its own routine work and to the public in general. we are experiencing the first of a new generation of practical systems that perform traditional functions with on-line rather than manual files, with as much benefit as possible short of better subject access. the new systems are transcending the barriers to convenient use that have been imposed by the size, complexities, and awkwardness of large manual systems. historically, it has been impractical to add circulation information to each record in the public catalog for an item. 
with on-line files of single records per item this is now possible. state-of-the-art computing affords multiple access keys to a record, instead of duplicating it for additional entries as in manual catalogs. how many and which keys are furnished largely determines the extent to which an on-line catalog can replace a traditional one. difficult cost and technical problems explain the current approaches. full requirements of a public catalog have been avoided; simpler files have been built to handle explicit processing functions. the advantages are simplified records and fewer access points. full bibliographic records are variable length, often large, and sometimes eccentric-and therefore relatively expensive to handle in machine form. in principle the overhead for access is the same as for manual files: the more entries that are provided, the greater the storage, processing, and cost. systems with simpler files than the public catalog have therefore been built. there have been no machine equivalents of large library catalogs; so we have studied manual ones to theorize ideal characteristics. in some cases this model may have supplied a misleading bias. studies of the new on-line systems at work could possibly revise our notions of what is needed. the kinds of systems now emerging are answers for the foreseeable futur e. the tradition of separately organizing and managing public and technical services will be challenged by the integrated systems. their centralized files , data handling, and access methods transcend functional boundaries which 200 journal of library automation vol. 5/3 september, 1972 grew between library tasks that used different but redundant manual files and evolved separate units and procedures to accomplish virtually the same basic data processing functions. the profession has yet to widely appreciate the new overview and managerial changes that are invited. reaction to them may be projected as a fourth and perhaps painful trend. in sofar as no fully integrated systems have yet been developed, it is likely that as they emerge they will force substantial changes to traditional patterns of library organization and management. acknowledgments this work was supported by the university of chicago library systems development office under clr/neh grant no. e0-262-70-4658 from the council on library resources and the national endowment for the humanities, for the d evelopment and operational testing of a library data management system. references 1. rob mcgee, a literatu1'e survey of operational and emerging online library circulation systems (university of chicago library systems development office, feb. 1972). available as eric/clis ed 059 752. mf$0.65, hc$3.29. 2. , "key factors of circulation system analysis and design," college and research libraries 33:127-140 ( mar. 1972 ). 3. h. k. g. bearman, "library computerisation in west sussex," program: news of computers in british libraries 2:53-58 (july 1968). 4. , "west sussex county library computer book issuing system," assistant libra1·ian 61:200-202 ( sept. 1968 ) . 5. richardt. kimber, "an operational computerised circulation system with on-line interrogation capability," program: news of computers in british librm·ies 2 :75-80 (oct. 1968 ) . 6. homer v. ruby, "computerized circulation at illinois state library," illinois libraries 50:159-162 ( feb . 1968 ). 7. robert e. hamilton, "the illinois state library 'on-line' circulation control system," in: proceedings of the 1968 clinic on librm·y applications of data processing. 
(urbana, ill.: university of illinois graduate school of library science, 1969 ) p. 11-28. 8. ibm corp., on-line library circulation control syste m, moffet library, midwestern university , wichita falls, t exas. application bri ef k-20-0271-0. ( white plains, n.y.: ibm corp., data processing div. , 1968) 14 p . 9. calvin j. boyer and jack frost, "on-lin e circulation controlmidwestern university library's system using an ibm 1401 computer in a 'time-sharing' mode," in: proceedings of the 1969 clinic on two types of designs/mcgee 201 library applications of data processing. (urbana, ill.: university of illinois graduate school of library science, 1970) p. 135-145. 10. charles d. reineke and calvin j. boyer, "automated circulation system at midwestern university," ala bulletin 63:1249-1254 (oct. 1969). 11. belfast, queen's university, school of library studies, study group on the library applications of computers, first report of the working party (belfast university, july 1965) 18 p. 12. richard t. kimber, "studies at the queen's university of belfast on real-time computer control of book circulation," journal of documentation 22:116-122 (june 1966) . 13. , "conversational circulation," libri 17:131-141 ( 1967). 14. ___ ,"the cost of an on-line circulation system," program: news of computers in british libraries, 2:81-94 (oct. 1968). 15. ann h. boyd and philip e. j. walden, "a simplified on-line circulation system," program: news of compute1·s in libraries 3:47-65 (july 1969). 16. velma veneziano and joseph t. paulukonis, "an on-line, real-time time circulation system." [this documentation of the northwestern university library system was made specially available to the author. a later version with the same title appears in larc reports 3:7-48 (winter 1970-71)]. 17. h . rivoire and m. smith, library systems automation reports 1971a-2, bucknell library on-line circulation system (blocs). ellen clarke bertrand library ( 15 mar. 1971) 19 p. 18. r. a. kennedy, "bell laboratories' library real-time loan system (bellrel)," lola, 1:128-146 (june 1968). 19. , "bell laboratories' on-line circulation control system: one year's experience," in : proceedings of the 1969 clinic on library applications of data processing. (urbana, ill.: university of illinois graduate school of library science, 1970) p. 14-30. 20. paladugu v. rao and b. joseph szerenyi, "booth library on-line circulation system (bloc)," ]ola, 4:86-102 (june 1971). 21. richard h. stanwood, "monograph and serial circulation control," a paper for the international congress of documentation, buenos aires, sept. 21-24, 1970. national council for scientific and technical researcb, buenos aires ( 1970) 23 p. 22. ibm corp., data processing division, functional specifications: a circulation system for the ohio state university libraries, gaithersburg, maryland (november 26, 1969) various paginations. [this and other technical documentation were made specially available to the author. this is now available through eric/clis as: on-line remote catalog access and circulation control system. part i: functional specifications. part ii: user's manual. november 1969. 151 p. ed 050 792. mf $0.65, hc $4.00] 202 journal of library automation vol. 5/3 september, 1972 23. edward e. shumilak, an online interactive book-library-management system. nasa technical note nasa tn d-7052. national aeronautics . and space administration, washington, d.c. ( march 1971 ) 40 p. 
[this document is available through the national technical information service under document number n71-20526] 24. university of chicago library, a proposal for the development and operational testing of a library data management system, herman h. fussler and fred h. harris, principal investigators. (chicago, ill.: 1970) 44 p. 25. richard de gennaro, "the development and administration of automated systems in academic libraries," jola, 1:75-91 (mar. 1968). 26. ellen w. miller and b. j. hodges, "shawnee mission's on-line cataloging system," jola 4:13-26 (mar. 1971). 27. simon p. j. chen, "on-line and real-time cataloging," american libraries 3:117-119 (feb. 1972). 28. university of chicago library, development of an integrated, computer-based, bibliographical data system for a large university library, annual report 1967/68. by herman h. fussler and charles t. payne. university of chicago library, chicago, illinois (1968) 17 p. + appendixes. 29. frederick g. kilgour, letter to the author, 23 november 1971. 30. ben-ami lipetz, user requirements in identifying desired works in a large library, final report, grant no. sar/oeg-1-71071140-4427, u.s. department of health, education, and welfare, office of education, bureau of research. (new haven, conn.: yale university library, june 1970) 73 p. + appendixes. 31. "library extends catalog access and new delivery service," [4 p.] a brochure issued by price gilbert memorial library, georgia institute of technology, atlanta, georgia, 1972.

editorial board thoughts: rise of the innovation commons
tod colegrove
information technology and libraries | september 2015

that the practice of libraries and librarianship is changing is an understatement. throughout their history, libraries have adapted and evolved to better meet the needs of the communities served. from content collected and/or archived, to facilities and services provided, a constant throughout has been the adoption, incorporation, and eventual transition away from technologies along the way: clay tablets and papyrus scrolls giving way to the codex; the printing press and eventual mass production and collection of books yielding to information communication technology such as computer workstations and the internet. indeed, the rapid and widespread adoption of the internet has enabled entire topologies of information to change – morphing from ponderous print tomes into digital databases, effectively escaping the walls of libraries and archives altogether.1 in reflection of end-users' growing preference for easily accessible digital materials, libraries have responded with the creation of new spaces and services. repositioning physical, digital, human, and social resources to better meet the needs of the communities supported, the information commons2 that is the library begins to acquire a more technological edge.
the concept of a library service or area referred to specifically as an information commons can be traced to as early as 1992 with the opening of the information arcade at the university of iowa – specifically designed to provide end-users technology tools, with a stated mission "to facilitate the integration of new technology into teaching, learning, and research, by promoting the discovery of new ways to access, gather, organize, analyze, manage, create, record, and transmit information."3 first mentioned in the literature in 1994, discussion of the idea itself waited another five years, with donald beagle writing about the theoretical underpinnings of "the new service delivery model" in 1999, defined as "a cluster of network access points and associated it tools situated in the context of physical, digital, human, and social resources organized in support of learning." a flurry of articles followed, with the idea seeming to have caught the collective imagination of libraries generally by 2004. information commons as named spaces within libraries made "… sudden, dramatic, and widespread appearance in academic and research libraries across the country and around the world."4 scott bennett went further, in 2008 asking flatly: "who would today build or renovate an academic library without including an information commons?"5 this proliferation and transition has not been limited to academic libraries; for decades, libraries of all type, shape, and size, have been similarly provisioning resources and technology in the context of end-user access and learning.

patrick "tod" colegrove (pcolegrove@unr.edu), a member of the ital editorial board, is head of the delamare science & engineering library at the university of nevada, reno, nv.

by 2006, a new variation of the information commons had entered the vernacular: the learning commons, defined by beagle as the result of information commons resources "organized in collaboration with learning initiatives sponsored by other academic units, or aligned with learning outcomes defined through a cooperative process."6 a subset of the broader concept, when the library collaborates with stakeholders external to the library to collaboratively achieve academic learning outcomes, it becomes operationally a learning commons. one can easily conceive of the learning commons more broadly by considering learning outcomes desirable within the context of particular library types: school libraries with offerings and programs in alignment with broader k-12 curricula; public libraries in support of lifelong learning and participatory citizenship; special libraries in alignment with other niche-specific learning outcomes. note that not all information commons are learning commons. as defined, the learning commons depends on the actions and involvement of other units that establish the mission, and associated learning goals, of the institution. others must join with the library's effort in order to create and nourish such spaces in a way that is deeply responsive to the aspirations of the institution: "the fundamental difference between the information and the learning commons is that the former supports the institutional mission while the latter enacts it." (bennett 2008, emphasis added) at a time when libraries are undergoing such rapid and significant transformation, it's hard to dismiss such collaborative effort as merely trendy – such spaces, and the library by extension, become of even more fundamental relevance to the broader organization.

in short, resources are provisioned in the information commons so that learning can happen; collaborative effort with stakeholders beyond the library, but within the organization, ensures that learning does happen.

drawing a parallel, what if the library were to go beyond simply repositioning resources in support of learning – indeed, beyond working with other units of the organization to collaboratively align and provision resources in support of achieving organizational learning outcomes? to go beyond strategic alignment with the aspirations of the institution, involving stakeholders from beyond the immediate organization in the creation and support of such spaces? provisioning library spaces and services that are deeply responsive to the aspirations of the greater community? arguably this is where the relatively recent introduction of makerspaces into the library fits in. the annual environmental scan performed by the new media consortium (nmc) has for a number of years identified makerspaces to be on its short-term adoption horizon – the 2015 library edition goes further, identifying a core value: "the introduction of makerspaces in the academic library is inspiring a mode of learning that has immediate applications in the real world. aspiring inventors and entrepreneurs are taking advantage of these spaces to access tools that help them make their dreams into concrete products that have marketable value."7

aspects of the information commons are present in library makerspace – not only in the access to traditional library resources, but also in the shift toward providing support of 21st-century literacies in the creation, design, and engineering of output. with the acquisition and use of these literacies in collaboration with and in support of the goals of the greater institution, it is also a learning commons; for example, in the case of a school or public library where makerspace activities and engagement collaboratively meet and support learning outcomes including increased engagement with science, technology, engineering, the arts, and math (steam) disciplines. consider the further example of university students leveraging makerspace technology as part of ste(a)m outreach efforts to local middle schools in the hope of kindling interest, or partnering with the local discovery museum in the production of a mini maker-faire to carry that interest forward. alternatively, a team of students conceiving, then prototyping and patenting a new technology with the active and direct support of the library commons, going on to eventually launch as a business. to the extent the library can springboard off the combination of makerspace with information or learning commons to engage stakeholders from beyond the institution, it can go beyond – becoming something broader, and potentially transformative; even as it enables progress toward collaboratively achieving community goals, outcomes, and aspirations.

the hallmark of community engagement with such library facilities is a spontaneous innovation that seems to flow naturally. library? information or learning commons? arguably such spaces are more accurately named innovation commons. beyond solidifying the library's place as a hub of access, creation, and engagement across disciplinary and organizational boundaries, the direct support of innovation – the process of going from idea to an actual good or service with a real perceived value – is in potential alignment with the aspirations of the broader community. in collaboration with stakeholders from across the community, from economic development and government representatives to businesses and private individuals, broader outcomes and aspirations of the greater community can be identified and supported. nevertheless, simply adding makerspace technology to an information or learning commons does not automatically create an innovation commons. it is in the broader conversation, along with the catalyzation, identification of and support for the greater aspirations of the community, that the commons begins to assume its proper role in the greater ecosystem. leveraging the deliberate application of information, with imagination, and initiative, enabling end-users to go from idea all the way to useful product or service is something that community stakeholders see as a tangible value. the library as innovation commons becomes a natural partner in the local innovation ecosystem, working collaboratively to achieve community aspirations and economic impact. traditional business and industry reference support ramps up to another level, providing active and participatory support of coworking, startup companies, and etsypreneur8 alike – patent searches taking on an entirely new light in support of innovators using makerspace resources to rapidly prototype inventions. actualized, the library joins forces in a deeper way with the community in the creation of new technologies, jobs, and services, taking an ever more active role in building the futures of the community and its members.
references

1. morgan currie, "what we call the information commons," institute of network cultures blog, july 8, 2010, http://networkcultures.org/blog/2010/07/08/what-we-call-the-information-commons/
2. the word commons reflects the shared nature of a resource held in common, such as grazing lands.
3. robert a. seal, "issue overview," journal of library administration, 50 (2010), 1-6. http://www.tandfonline.com/doi/pdf/10.1080/01930820903422248
4. charles forrest & martin halbert, a field guide to the information commons. lanham, md: scarecrow, 2009.
5. scott bennett, "the information or the learning commons: which will we have?," the journal of academic librarianship, 34, no. 3 (2008), 183-185.
6. donald robert beagle, donald russel bailey, & barbara tierney, the information commons handbook, xviii. new york: neal schuman, 2006.
7. larry johnson, samantha adams becker, victoria estrada, and alex freeman, nmc horizon report: 2015 library edition, 36. austin, tx: the new media consortium, 2015.
8. the combination of etsy, a peer-to-peer e-commerce website that focuses on selling handmade, vintage, or unique items, and entrepreneurship. the word "etsypreneur" refers to someone who is in the "etsy business" – namely, selling such items via the website. http://etsypreneur.com/the-hidden-danger-of-the-internet-opportunity/

reference rot in the repository: a case study of electronic theses and dissertations (etds) in an academic library
mia massicotte and kathleen botter
information technology and libraries | march 2017

abstract

this study examines etds deposited during the period 2011-2015 in an institutional repository, to determine the degree to which the documents suffer from reference rot, that is, linkrot plus content drift. the authors converted and examined 664 doctoral dissertations in total, extracting 11,437 links, finding overall that 77% of links were active, and 23% exhibited linkrot. a stratified random sample of 49 etds was performed which produced 990 active links, which were then checked for content drift based on mementos found in the wayback machine. mementos were found for 77% of links, and approximately half of these, 492 of 990, exhibited content drift. the results serve to emphasize not only the necessity of broader awareness of this problem, but also to stimulate action on the preservation front.

introduction

a significant proportion of material in institutional repositories is comprised of electronic theses and dissertations (etds), providing academic librarians with a rich testbed for deepening our understanding of new paradigms in scholarly publishing and their implications for long-term digital preservation. while academic libraries have long collected and preserved hard copy theses and dissertations of the parent institution, the shift to mandatory electronic deposit of this material has conferred new obligations and curatorial functions not previously incorporated into library workflows.
by highlighting etds as a susceptible collection deserving of specific preservation actions, we draw attention to some unique responsibilities for libraries housing university-produced content, particularly as scholarly information continues its shift away from commercial production and distribution channels. as teper and kraemer point out in their discussion of etd program goals, "without preservation, long-term access is impossible; without long-term access, preservation is meaningless."1

mia massicotte (mia.massicotte@concordia.ca) is systems librarian, concordia university library, montreal, quebec, canada. kathleen botter (kathleen.botter@concordia.ca) is systems librarian, concordia university library, montreal, quebec, canada.

what is reference rot, and why study it?

in addition to linkrot (where a link sends the user to a webpage which is no longer available), there are webpages that remain available but whose contents have undergone change over time, known as content drift. this dual phenomenon of linkrot plus content drift has been characterized as reference rot by the hiberlink project team,2 and has important implications for digital preservation. since theses and dissertations are original works born digital by virtue of mandatory deposit programs, a university's etd program is effectively a digital publishing initiative, accompanied by a new universe of responsibility for its digital preservation. due to the specialized nature of graduate-level research, etds frequently include links to resources on the open web, for example, personal blogs, project websites, and commercial entities. digital object identifiers (dois), useful in the context of published literature, do not apply to urls on the free web, which are doi-indifferent. open web links also fall outside the scope of preservation initiatives such as lockss (lots of copies keep stuff safe),3 which aim to safeguard the published literature. with increasing frequency, researchers are citing newer forms of scholarship which do not readily fall under the rubric of published literature. moreover, since thesis preparation is conducted over a period of time typically measured in years, links cited therein are likely to be more vulnerable to linkrot and content drift by the time of manuscript submission. yet despite the surfeit of anecdotal daily evidence that urls vanish and result in dead links, phillips, alemneh, and ayala point out that "by and large academic libraries are not capturing and maintaining collections of web resources that provide context and historical reference points to the modern theses and dissertations held in their collections."4 since an etd comprises a unique form of scholarly output produced by universities, simultaneously satisfying the parent institution's degree-granting apparatus and reflecting its academic stature on the international stage, the presence of reference rot in this body of literature is of particular concern and worthy of immediate attention.

smoking guns

there has been no shortage of evidence reporting on the linkrot phenomenon over the last two decades.
koehler, whose initial study on linkrot appeared in jasis in 1999, periodically revisited, analyzed, and reported on the same set of 360 urls collected in his original study.5,6,7 in 2015, upon the twenty-year benchmark of the original data collection, oguz and koehler reported in jasis that only 2 of the original links remained active.8 a number of foundational studies, including casserly and bird,9 spinellis,10 sellitto,11 falagas, karveli, and tritsaroli,12 and wagner et al.13 have reported on linkrot occurring in professional literature. sanderson, phillips, and van de sompel provide a table of 17 well-known linkrot studies, comparing overall benchmarks, and supplying a succinct summary of the scope of each study.14 linkrot also gained further important exposure with the harvard law school study by zittrain, albert, and lessig, which found that 70% of 3 harvard law journal references, and 49.9% of urls in supreme court opinions examined, no longer pointed to their originally cited sources.15 information technology and libraries | march 2017 13 members of the hiberlink project, which set out to examine “a vast corpus of online scholarly publication in order to assess what links still work as intended and what web content has been successfully archived using text mining and information extracting tools” have been pivotal in making the case for reference rot.16 hiberlink demonstrated that failure to link to cited sources was due not only to linkrot, but also to web page content which changed over time.17 a new dimension of the digital preservation universe was thrown into sharp relief with follow-up study by klein et al. (2014), which examined one million web references extracted from 3.5 million science, technology, and medicine (stm) articles published in elsevier, pubmed central, and arxiv, between the years 1997 and 2012. the study concluded that one in five articles suffers from reference rot.18 though the study focused on stm articles, its authors drew attention to theses and dissertations as a susceptible class of material. analyzing the same set of links extracted from this large stm corpus, jones et al. (2016) recently reported that 75% of referenced open web pages demonstrated changes in content.19 etds — a susceptible collection the digital preservation part of institutionally mandated etd deposit has yet to have its dots fully connected to the rest of the diagram. after four years of research into academic institutions’ etd programs, halbert, skinner, and schultz reported that close to 75% of respondents surveyed had no preservation plan for their etd collections.20 despite the prevalence of linkrot studies, linkrot in etds has not been subjected to similar scrutiny, and the implications of disappearance of content is underappreciated. while mandatory deposit programs have become relatively commonplace, focus has largely remained on policy and implementation aspects, metadata quality, interoperability and conformance to standards.21,22 there are few studies which focus on institutional repository link content. the study conducted by sanderson, philips, and van de sompel (2011) was a large-scale examination of two repositories.23 400,144 papers deposited in arxiv, and 3,595 papers in the university of north texas (unt) digital library repository were studied, and more than 160,000 urls examined. 
links were analyzed for persistence and the availability of mementos, that is, whether prior versions of the page existed in a public web archive, such as the internet archive's wayback machine. for 72% of unt urls, either mementos were available, or the resource still existed at its original location, or both. although 54% (9,880) were available in one or more international web archives, 28% (5,073) of unt's etd links were found to no longer exist, nor had they been archived by the international archival community. phillips, alemneh, and ayhala looked at overall general patterns and trends of url references in repository etds, examining 4,335 etds between the years 1999-2012 in the unt repository.24 the team analyzed 26,683 unique urls in 2,713 etds containing one or more links, finding an overall average of 10.58 unique urls per etd with one or more links. the unt team provided a a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 14 breakdown of domain and subdomain occurrence frequency, and indicated areas of future investigation into content-based url linking patterns of etds. etd link decay was studied by sife and bernard, who performed a citation analysis on urls in 83 theses published between 2007 and 2011 at tanzania's sokoine national agricultural library.25 15,468 citations were examined, 9.6% (1,487) of which were open web citations. urls were considered active if found at the original location, or available after a url redirect. the authors manually tested urls over a period of seven days to record their accessibility, noting down inaccessible urls error messages and domains, and analyzing the types of errors encountered. the authors calculated that it took only 2.5 years for half of the web citations to disappear. at the etd2014 conference,26 an important study of 7,500 etds in 5 u.s. universities was presented. of 6,400 etds defended between 2003 and 2010, approximately 18% of open web link content was confirmed as lost, and a further 34% at risk of loss, that is, live links which lacked an archived copy.27 though the results of that particular study have not been formally published, it was briefly summarized in a session held at the 38th uksg annual conference in glasgow, scotland in march 2015, an account of which was subsequently published by burnhill, mewissen, and wincewicz in insights.28 given the scarcity of published literature on link content as found in etds, this present study which examines reference rot in etds in an academic institutional repository is unique, draws attention to an important digital collection which is vulnerable to loss, and highlights need for action. background and context concordia university is a comprehensive university located in montreal, with a student population of 43,903 full-time equivalents in 2015, of which 7,835 were graduate students. 27 phd programs were offered in 2015,29 and 43 programs at the masters level. faculties of arts and science, engineering and computer science, fine arts, and business have a thesis requirement, and produce upwards of 350 masters and 150 phd dissertations annually. the broad disciplines, and the departmental clusters used in this study are shown in table 1. prior to the thesis deposit mandate, concordia university library housed hard copy versions of theses and dissertations in the collection. 
in 2009, the library launched spectrum, concordia’s eprints institutional repository, playing a leadership role in spectrum's implementation and policy development, and providing training and support to the school of graduate studies regarding submission and management of theses for deposit. following a successful pilot project, the graduate studies office ceased accepting paper manuscripts, and mandated electronic deposit of all theses and dissertations into spectrum as of spring 2011. information technology and libraries | march 2017 15 discipline department discipline department arts applied linguistics communication economics educational technology history hist and phil of religion humanities philosophy sociology political science psychology religion business* decision sciences and mis finance management marketing engineering** building engineering civil engineering computer science comp sci & software eng electrical and comp eng industrial engineering info systems security mechanical engineering fine arts art education art history film and moving image studies industrial design fine arts performing arts science biology chemistry mathematics physics exercise science table 1. summary of departmental clusters used in this study * john molson school of business ** engineering & computer science methodology we concentrated on phd dissertations (henceforth etds) in spectrum in order to limit the scope of the project; master's theses were excluded. a 5-year period was chosen, beginning with the first semester of mandatory deposit, spring 2011, through fall 2015, a total of 720 etds. since concordia etds are released for publication immediately following convocation, the university's official convocation dates were used to identify the set of documents to be downloaded and examined. we proceeded in phases: first downloading etds from spectrum and converting to a text format that could be examined for patterns; then extracting links from each and testing programmatically a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 16 for linkrot; then drawing a stratified random sample of active urls and visiting them to determine if content drift had taken place. our methodology for link extraction was similar to those described by klein et al.,30 and zhou, tobin, and grover.31 during the dissertation download stage, 36 etds with embargoed content were encountered and eliminated. etds were then converted from existing pdf/a format to xml. a further 20 documents failed to convert due to nonstandard or complex formatting which resulted in unreadable, garbled characters. these documents resisted multiple conversion attempts, and since they could not be mined, had to be eliminated. a final total of 664 etds were successfully converted using three different tools: 97% (644) were converted using pdftohtml,32 the remaining 3% by either givemetext (14)33 or adobe acrobat (3). a spot check of documents was sufficient evidence that many links occurred throughout the text body. since we intended to extract urls to the open web, we wanted to err on the side of detecting more links, rather than easily-identifiable well-formed urls. links were mined from the body of the text in a manner similar to the study carried out at unt.34 we wanted a regular expression which would catch as many urls as possible, expecting to manually clean the link output before further processing. 
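a minimal sketch of what such a conversion-and-mining pass might look like is shown below, assuming python 3.9 or later and poppler's pdftohtml available on the path; the url pattern is a deliberately simplified stand-in for the "@gruber v2" expression discussed next, and the directory names are illustrative rather than taken from the study.

    import re
    import subprocess
    from pathlib import Path

    # deliberately liberal url pattern; a simplified stand-in for the
    # "@gruber v2" expression the authors cite. it also accepts links
    # that lack an http:// prefix, provided they begin with "www."
    URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)

    def convert_etd(pdf_path: Path, out_dir: Path) -> Path:
        # convert a pdf/a dissertation to an xml text rendition with
        # poppler's pdftohtml (-xml: xml output, -i: ignore images)
        out_dir.mkdir(parents=True, exist_ok=True)
        out_file = out_dir / (pdf_path.stem + ".xml")
        subprocess.run(["pdftohtml", "-xml", "-i", str(pdf_path), str(out_file)],
                       check=True)
        return out_file

    def extract_links(text_path: Path) -> list:
        # return every candidate url in the converted document, counting
        # repeated links as separate instances, as in the study
        text = text_path.read_text(encoding="utf-8", errors="replace")
        text = re.sub(r"-\s*\n\s*", "", text)   # rejoin urls broken across lines
        return [m.group(0).rstrip(".,;)") for m in URL_PATTERN.finditer(text)]

    if __name__ == "__main__":
        links = []
        for pdf in Path("etds").glob("*.pdf"):   # "etds" is an assumed folder name
            links.extend(extract_links(convert_etd(pdf, Path("converted"))))
        print(len(links), "candidate links extracted")

as in the study, repeated links are kept as separate instances, and trimming trailing punctuation is only a first pass at the manual cleanup described above.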
we tested multiple regular expressions35 against a small sample of our converted etds and compared the results. we selected one which seemed well-suited for our purpose, as it was liberal in detecting links throughout the text, was able to extract links which contained obvious omissions and problems — for example, those that lacked http:// prefixes — but also caught non-obvious errors, such as ellipses in long urls. we considered how deduplication of extracted links might affect the outcome, and opted to count each link as an individual instance. manual cleanup included catching urls that broke across new lines, identifying false hits such as titles containing colons and dois, and adding escape encoding characters for "&" and "%" in order to generate a clean url for use in the next step of the process.

methodology — linkrot collection

a script programmatically used the curl command line tool to visit each link and fetch the http response code in return.36 an output listing was produced for each doctoral dissertation, comprised of the original urls, the final urls, and the http response codes. link output for each of the converted 664 etds was collected from december 2015 to january 2016, with the fall 2015 semester checked in march 2016. 76% (504 of 664) of etds contained one or more links, the highest number of links (5,946) falling into the arts group. 24% (160 of 664) of etds contained no links. for the 5-year period, the broad discipline breakdown of documents examined, the number of etds with links, and the number of links extracted are shown in table 2. converted etds by publication year, broken out by broad disciplines, are shown in figure 1.

discipline | number of phd etds in spectrum | etds converted* | contain no links | contain links | number of links extracted
arts | 210 | 195 | 31 | 164 | 5,946
business | 45 | 43 | 12 | 31 | 210
engineering | 351 | 326 | 82 | 244 | 3,259
fine arts | 28 | 25 | 2 | 23 | 1,728
science | 86 | 75 | 33 | 42 | 294
total | 720 | 664 | 160 | 504 | 11,437

table 2. 5-year period, 2011-2015, summary of documents examined and links extracted
* 56 documents in total eliminated (36 embargoed, plus 20 which failed to convert)

figure 1. converted etds by publication year and broad discipline

the 11,437 links extracted were checked for linkrot, each link accessed and its http response code recorded. 77% (8,834 of 11,437) of links returned an active 2xx http response code. 23% (2,603) of links could not be reached, returning a response code other than in the 2xx range. this includes 102 links in the 3xx range which failed to reach a destination after 50 redirects and were considered linkrot. numbers of links, total link response, and link response by year broken down by broad discipline are shown in figure 2, with accompanying data provided in table 3 and discussed in the findings section.
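before turning to the year-by-year breakdown in figure 2 and table 3, a sketch of such a status-collection pass may be useful. the authors' script is not published; the following is an illustration only, shelling out to the curl tool they cite,36 with the input and output file names and the 30-second timeout being assumptions.

    import csv
    import subprocess

    def check_link(url: str) -> tuple:
        # visit the link with curl, following up to 50 redirects and giving
        # up after 30 seconds; return (final_url, http_code) as strings
        result = subprocess.run(
            ["curl", "-s", "-L", "-o", "/dev/null",
             "--max-redirs", "50", "--max-time", "30",
             "-w", "%{url_effective} %{http_code}",
             url],
            capture_output=True, text=True,
        )
        final_url, _, code = result.stdout.rpartition(" ")
        return final_url or url, code or "000"

    if __name__ == "__main__":
        # links.txt holds one cleaned url per line (assumed file name)
        with open("links.txt") as src, open("link_report.csv", "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["original_url", "final_url", "http_code"])
            for line in src:
                url = line.strip()
                if url:
                    writer.writerow([url, *check_link(url)])

curl reports a code of 000 when no usable response is obtained, which corresponds roughly to the "empty response" category reported later in table 5.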
figure 2. link http response codes, by broad discipline and year

discipline | response code | 2011 | 2012 | 2013 | 2014 | 2015 | total | % active & rotten**
arts | 2xx | 691 | 864 | 800 | 1,108 | 1,093 | 4,556 | 77%
arts | all other* | 320 | 428 | 131 | 293 | 218 | 1,390 | 23%
business | 2xx | 14 | 52 | 17 | 22 | 50 | 155 | 74%
business | all other | 9 | 19 | 5 | 9 | 13 | 55 | 26%
engineering | 2xx | 302 | 702 | 638 | 482 | 404 | 2,528 | 78%
engineering | all other | 134 | 172 | 180 | 196 | 49 | 731 | 22%
fine arts | 2xx | 165 | 143 | 504 | 467 | 94 | 1,373 | 79%
fine arts | all other | 74 | 56 | 118 | 98 | 9 | 355 | 21%
science | 2xx | 77 | 34 | 58 | 39 | 14 | 222 | 76%
science | all other | 25 | 23 | 10 | 11 | 3 | 72 | 24%
subtotal | 2xx | 1,249 | 1,795 | 2,017 | 2,118 | 1,655 | 8,834 | 77% active
subtotal | all other | 562 | 698 | 444 | 607 | 292 | 2,603 | 23% rotten
% rotten | | 31% | 28% | 18% | 22% | 15% | 23% |
total | | 1,811 | 2,493 | 2,461 | 2,725 | 1,947 | 11,437 | 100%

table 3. breakdown by year and discipline showing active (2xx) and rotten (all others) response codes
* all other = 0, 1xx, 3xx (unresolved after 50 redirects), 4xx and 5xx response codes combined
** active and rotten rates based on total links per discipline

methodology — content drift

for the content drift phase, we wanted to sample documents from each of the five disciplines. etds which did not contain any links were excluded from the sample. using only documents with one or more active links, a stratified random sample of 10% was drawn for a final sample of 49 etds containing a total of 990 links. a snippet of text surrounding each link was then also extracted from each etd, along with any "date accessed" or "date viewed" information if present. each link was manually visited, assessed for content drift, and observations recorded. the breakdown of the content drift sample is shown in table 4.

discipline | etds with links | etds with active links (2xx) | etds sampled for content drift* | number of links extracted for sample
arts | 164 | 156 | 16 | 668
business | 31 | 28 | 3 | 12
engineering | 244 | 235 | 24 | 154
fine arts | 23 | 23 | 2 | 136
science | 42 | 40 | 4 | 20
total | 504 | 482 | 49 | 990

table 4. breakdown of sample pool of etds for content drift analysis
* 10% sample drawn from each discipline's pool of etds; only etds with urls relevant for content drift assessment.

visited links were benchmarked against the existence of a memento, an archived snapshot of that page located in the wayback machine.37 since the university sets a strict thesis submission deadline of 3 months prior to convocation, mementos prior to submission deadline would be sought. based on the occurrences of "date accessed" and discursive information found in the snippets, we arrived at the supposition that links were likely to have been checked the closer the student approached final stages of manuscript preparation, although this is not verifiable. we set ourselves a soft window for locating an archived snapshot using a date 6 months prior to the convocation date as the benchmark; that is, for each semester's deadline date, an additional 3 months was added, arriving at a 6-months-prior-to-publication marker. since programmatic analysis of 990 links required time, expertise, and resources not available to us, we approached the problem heuristically. assuming that online consultations are not linear, active links occurring multiple times in a document were given equal weight.
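the manual checking procedure described next could in principle be approximated programmatically: the internet archive exposes a public availability endpoint (http://archive.org/wayback/available) that, given a url and a target timestamp, returns the closest archived capture. this was not the method used in the study, where links were checked by hand; the six-month offset and the example values below are assumptions for illustration only.

    import json
    from datetime import date, timedelta
    from typing import Optional
    from urllib.parse import urlencode
    from urllib.request import urlopen

    def closest_memento(url: str, convocation: date) -> Optional[dict]:
        # ask the wayback machine for the capture closest to a benchmark
        # date set roughly six months before the convocation date
        benchmark = convocation - timedelta(days=182)
        query = urlencode({"url": url, "timestamp": benchmark.strftime("%Y%m%d")})
        with urlopen("http://archive.org/wayback/available?" + query, timeout=30) as resp:
            payload = json.load(resp)
        # returns e.g. {"url": ..., "timestamp": ..., "available": true}, or None
        return payload.get("archived_snapshots", {}).get("closest")

    if __name__ == "__main__":
        # example values only; not drawn from the study
        hit = closest_memento("http://example.org/project-page", date(2013, 6, 10))
        print(hit["url"] if hit else "no memento archived")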
each link was manually checked in the wayback machine using "date viewed" if provided; if no date was provided (the majority of cases), wayback was checked to see if an archived version existed as close to our 6 month soft marker as possible. if a memento was not found within a month earlier/later than the soft marker, then the nearest neighboring older memento was selected, if one existed. the original url, the date the url was visited, and whether a snapshot was located in wayback was recorded. all links were checked during july-august 2016. if the initial web browser failed to access, a second and sometimes third browser was tried, using safari, chrome, and internet explorer (ie) in that order. unsuccessful attempts to reach wayback were rechecked in september. the question as to whether, and to what degree content drift had occurred was assessed, and is discussed in the next section. information technology and libraries | march 2017 21 findings and discussion linkrot findings of 664 etds examined for linkrot, 77% of links tested returned an active http response code in the 2xx range -roughly three-quarters overall. numbers of links by broad discipline varied greatly, as shown in figure 2 (healthy links in green, linkrot shown in red). linkrot rates ranged from 21% in fine arts, to 26% in business, as seen in last column of table 3. it should be noted that 2xx response codes are also returned for pages that disguise themselves as active links. for example, a url returns an active status code when a domain has been parked (e.g. purchased to reserve the space), or when a customized 404-page-not-found is encountered. since we had no mechanism in place to treat false positives, these were flagged during the linkrot phase as candidates for subsequent content drift analysis. 23% (2,604 of 11,437) of all links, returned a response code of something other than in the 2xx-range and considered linkrot -roughly one-quarter. response codes in the 4xx range alone, including 404-page-not-found errors, comprised 17% (1,916 of 11,437) of all links. table 5 shows the breakdown of the total number of links that were visited in the spring of 2016 for linkrot determination. http response code category meaning of http response code* number of links percent of total links (%) 0 empty response** 507 4% 1xx informational 2 0% 2xx successful 8,834 77% 3xx redirection† 102 1% 4xx client error 1,916 17% 5xx server error 76 1% total 11,437 100% table 5. breakdown of http response codes received * we used http protocol definitions at http://www.w3.org/protocols/rfc2616/rfc2616-sec10.html ** unofficial http response code due to request timing out † failure to resolve after 50 redirects http responses ranged from a high of 85% active in 2015, to a low of 69% active in 2011, the oldest publication year. to put it differently, the most recent year exhibited a linkrot rate of 15%. consistent with other studies, linkrot manifests itself quickly after publication and increases over time, as indicated by percentages shown in figure 2. content drift findings of the 990 links visited to check for the presence of content drift, 764 (400 + 364), or 77%, had a wayback memento compared 226 (92+134), or 23%, which did not. 
slightly more than half of links with mementos, 52% (400 of 764), demonstrated some level of content drift when the a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 22 memento was compared to the current active link, while 48% (364 of 764) with mementos did not exhibit content drift. the presence of content drift by discipline, with/without mementos showing numbers of links tested, appears in table 6. discipline number of links tested content drift detected no content drift memento found memento not found total memento found memento not found total arts 668 261 60 321 254 93 347 business 12 5 0 5 4 3 7 engineering 154 74 10 84 55 15 70 fine arts 136 55 22 77 38 21 59 science 20 5 0 5 13 2 15 total 990 400 92 492 364 134 498 table 6. presence of content drift by discipline, with/without mementos for links that had no memento in wayback, content drift assessment was based on the presence of an observable date in the current active link, including copyright, and/or other details which positively correlated against our extracted snippet information. for example, some links retrieved a .pdf or other static file which correlated with the snippet, there being no reason to conclude its content had undergone change since publication, despite the lack of a memento. snippets were also used in cases where a robots.txt file at the target url had prevented wayback from creating a memento. occasional examination of the dissertation text was conducted to validate information extracted in the snippet. the 23% (226) which lacked mementos remain at significant risk and will fall prey to further drift as time passes. as seen in table 7, of 492 urls manifesting content drift, 11% (54 of 492) were completely lost, linking to web domains that had been sold or were currently up for sale, and webpages replaced or removed. 9% (42 of 492) of web pages exhibited major change such that there was little correlation with snippets, or where website overhauls made assessment difficult, but not impossible. 36% (179 of 492) web links exhibited minor drift, primarily pages that differed somewhat from a memento in visual appearance, such as header and footer differences, changes in background theme or style, or changes in navigation or search functionality which did not represent a high degree of impairment. 7% (34 of 492) linked to continually updating websites, such as wikipedia and news organizations, and 7% (35 of 492) were customized 404-page-notfound, distinctive enough to warrant separate categories. a full 30% (148 of 492) exhibited a multiplicity of changes of uncertain nature which we grouped together, such as pages where graphic or audio components had been removed or could not be retrieved, broken javascript that impeded access, browser failure, mementos not accessible after repeated attempts -indicative of a range of issues affecting the quality of web archives and hence preservation.38 the types of information technology and libraries | march 2017 23 content drift encountered, broken down by broad discipline and numbers of links, and percentage, is shown in table 7. type of content drift arts business engineeri ng fine arts science total % of type lost 45 0 3 6 0 54 11% major but findable 22 0 9 9 2 42 9% minor – redesigned but recognizable 128 2 30 17 2 179 36% ongoing updating website 25 3 5 0 1 34 7% custom 404 23 0 4 8 0 35 7% other 78 0 33 37 0 148 30% total 321 5 84 77 5 492 100% table 7. 
types of content drift encountered, number of links by broad discipline though difficulties encountered during content drift assessment made further extrapolation problematic, the presence of reference rot was confirmed. our 10% stratified random sample examined 990 active links, finding that roughly half (492 of 990) manifested some degree of content drift. for 364 links, or 36% overall, a benchmark memento was found and no content drift detected. although many content drift changes can arguably be characterized as minor, it is not possible to ascertain where the content drift scale tips irremediably for any particular reader. what can be said with certainty is that 11% of active links which did not exhibit linkrot, and were quite live and accessible, fell into a small but unsettling group where the context of the cited web source is irrevocably lost. of the 498 links which did not exhibit any evidence of content drift, 134, approximately one-third, have no memento archived and continue to remain at high risk. a focused and deeper analysis of active links which might lead to a typology of content drift types would be a possible area of future study, though even the well-resourced study by jones et al. which utilized a strict "ground truth" for comparing textual mementos over time, points out that classifying links would certainly be challenging.39 a larger sample size might also allow closer analysis of disciplinary differences, which may lead to a better understanding of these types of content drift variations. conclusion reference rot in the form of linkrot and content drift were observed in etds in spectrum, our institutional repository, and this confirmation should give pause for those charged with a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 24 stewardship of etd collections. theses and dissertations have long been viewed as material which contribute overall to academic scholarly output, and carry unique status within the academy. in august 2016, opendoar registered 1600 institutional repositories with etds,40 up from 1,100 institutions as reported in 2012 by grey literature specialist schoepfel.41 academic libraries have, in large part, facilitated the transition from paper to etd with widespread adoption of institutional repository deposit programs, and along with that adoption comes a range of long-term preservation issues. yet as ohio state’s strategic digital initiatives working group pointed out, “even in digital library communities, preservation all too often stands in for or is used interchangeably with byte level backup of content.”42 for long-term access, focus can productively be shifted to offset the immediate threat of incompleteness and inadequate capture.43 not much has changed since hedstrom wrote back in 1997: “with few exceptions, digital library research has focussed on architectures and systems for information organization and retrieval, presentation and visualization, and administration of intellectual property rights … the critical role of digital libraries and archives in ensuring the future accessibility of information with enduring value has taken a back seat to enhancing access to current and actively used materials.”44 our understanding and discussion of digital preservation must be broadened, and attention turned to this key area of responsibility in the preservation life-cycle. 
the authors maintain that etd content and link preservation is an editorial, not individual, imperative. encouraging individual authors to perform their own archiving is doomed to fall short of even reasonable expectations. instituting measures such as perma, a distributed, redundant method of capturing and archiving web site content as part of the citation process must be pro-actively sought and built into library, and hence repository, workflows.45 browser plugins and automated solutions which use the memento protocol for capturing and archiving web site content as part of the citation process do exist,46 but naturally have to be implemented before they can take effect. either way, efforts to operationalize existing mechanisms which are designed to reduce future loss would be extremely productive. responsibility for insuring not only current, but continuing future access to etd content rests with those who maintain curatorial function of the repository. academic librarians have assumed a prominent and de facto role as curators, facilitating the role of university publication and emphasizing its break away from previous ties with commercial entities. we collectively bear greater responsibility for this body of scholarly work, and need to move forward from a position of benign neglect to one of informed curation and pro-active preservation of an important collection of scholarly output which is at risk. information technology and libraries | march 2017 25 references 1. thomas h. teper and beth kraemer, “long-term retention of electronic theses and dissertations,” college & research libraries 63, no. 1 (january 1, 2002), 64, https://doi.org/10.5860/crl.63.1.61. 2 the term “reference rot” was introduced by the hiberlink team. “hiberlink – about,” accessed march 31, 2016, http://hiberlink.org/about.html. 3. lockss: lots of copies keep stuff safe, accessed december 6, 2016, http://www.lockss.org/about/what-is-lockss/. 4. mark edward phillips, daniel gelaw alemneh, and brenda reyes ayala, “analysis of url references in etds: a case study at the university of north texas,” library management 35, no. 4/5 (june 3, 2014), 294, https://doi.org/10.1108/lm-08-2013-0073. 5. wallace koehler, “an analysis of web page and web site constancy and permanence,” journal of the american society for information science 50, no. 2 (january 1, 1999): 162–80, https://doi.org/10.1002/(sici)1097-4571(1999)50:2<162::aid-asi7>3.0.co;2-b. 6. wallace koehler, “web page change and persistence—a four-year longitudinal study,” journal of the american society for information science & technology 53, no. 2 (january 15, 2002): 162–71, http://doi.org/10.1002/asi.10018. 7. wallace koehler, "a longitudinal study of web pages continued: a consideration of document persistence." information research 9, no. 2 (2004): 9-2, http://www.informationr.net/ir/92/paper174.html. 8. fatih oguz and wallace koehler, “url decay at year 20: a research note,” journal of the association for information science and technology 67, no. 2 (february 1, 2016): 477–79, https://doi.org/10.1002/asi.23561. 9. mary f. casserly and james bird, “web citation availability: analysis and implications for scholarship,” college and research libraries 64, no. 4 (july 2003): 300–317, http://crl.acrl.org/content/64/4/300.full.pdf. 10. diomidis spinellis, “the decay and failures of web references,” communications of the acm 46, no. 1 (january 2003): 71–77, https://doi.org/10.1145/602421.602422. 11. 
carmine sellitto, “a study of missing web-cites in scholarly articles: towards an evaluation framework,” journal of information science 30, no. 6 (december 1, 2004): 484–95, https://doi.org/10.1177/0165551504047822. a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 26 12. matthew e. falagas, efthymia a. karveli, and vassiliki i. tritsaroli, “the risk of using the internet as reference resource: a comparative study,” international journal of medical informatics 77, no. 4 (april 2008): 280–86, https://doi.org/10.1016/j.ijmedinf.2007.07.001. 13. cassie wagner et al., “disappearing act: decay of uniform resource locators in health care management journals,” journal of the medical library association 97, no. 2 (april 2009): 122– 30, https://doi.org/10.3163/1536-5050.97.2.009. 14. robert sanderson, mark phillips, and herbert van de sompel, “analyzing the persistence of referenced web resources with memento,” arxiv:1105.3459 [cs], may 17, 2011, http://arxiv.org/abs/1105.3459. 15. jonathan zittrain, kendra albert, and lawrence lessig, “perma: scoping and addressing the problem of link and reference rot in legal citations,” legal information management 14, no. 2 (june 2014): 88–99, https://doi.org/10.1017/s1472669614000255. 16. “hiberlink about,” accessed march 31, 2016, http://hiberlink.org/about.html. 17. “hiberlink our research,” accessed march 31, 2016, http://hiberlink.org/research.html. 18. martin klein, herbert van de sompel, robert sanderson, harihar shankar, lyudmila balakireva, ke zhou, richard tobin. “scholarly context not found: one in five articles suffers from reference rot,” plos one 9, no. 12 (december 26, 2014), https://doi.org/10.1371/journal.pone.0115253. 19. shawn m. jones, herbert van de sompel, harihar shankar, martin klein, richard tobin, claire grover. “scholarly context adrift: three out of four uri references lead to changed content,” plos one 11, no. 12 (december 2, 2016): e0167475, https://doi.org/10.1371/journal.pone.0167475. 20. martin halbert, katherine skinner, and matt schultz, “preserving electronic theses and dissertations: findings of the lifecycle management for etds project,” text, (august 6, 2015), 2, http://educopia.org/presentations/preserving-electronic-theses-anddissertations-findings-lifecycle-management-etds. 21. for a recent overview, see sarah potvin and santi thompson, “an analysis of evolving metadata influences, standards, and practices in electronic theses and dissertations,” library resources & technical services 60, no. 2 (march 31, 2016): 99–114, https://doi.org/10.5860/lrts.60n2.99. 22. joy m. perrin, heidi m. winkler, and le yang, “digital preservation challenges with an etd collection — a case study at texas tech university,” the journal of academic librarianship 41, no. 1 (january 2015): 98–104, https://doi.org/10.1016/j.acalib.2014.11.002. 23. sanderson, phillips, and van de sompel, “analyzing the persistence of referenced web resources with memento,” http://arxiv.org/abs/1105.3459. information technology and libraries | march 2017 27 24. phillips, alemneh, and ayala, "analysis of url references," https://doi.org/10.1108/lm-082013-0073. 25. alfred s. sife and ronald bernard, “persistence and decay of web citations used in theses and dissertations available at the sokoine national agricultural library, tanzania,” international journal of education and development using information and communication technology 9, no. 
2 (2013): 85–94, http://eric.ed.gov/?id=ej1071354. 26. “etd2014 — university of leicester,” university of leicester, accessed january 27, 2016, http://www2.le.ac.uk/library/downloads/etd2014. 27. edina, university of edinburgh, “reference rot: threat and remedy,” (education, 04:54:38 utc), http://www.slideshare.net/edinadocumentationofficer/reference-rot-and-linkeddata-threat-and-remedy. 28. peter burnhill, muriel mewissen, and richard wincewicz, “reference rot in scholarly statement: threat and remedy,” insights the uksg journal 28, no. 2 (july 7, 2015): 55–61, https://doi.org/10.1629/uksg.237. 29. concordia university university graduate programs, accessed april 7, 2016, http://www.concordia.ca/academics/graduate.html. 30. klein et al., "scholarly context not found," https://doi.org/10.1371/journal.pone.0115253. 31. ke zhou, richard tobin, and claire grover, “extraction and analysis of referenced web links in large-scale scholarly articles,” in proceedings of the 14th acm/ieee-cs joint conference on digital libraries, jcdl ’14 (piscataway, nj, usa: ieee press, 2014), 451–452, http://dl.acm.org/citation.cfm?id=2740769.2740863. 32. pdftohtml v0.38 win32, meshko (mikhail kruk), http://pdftohtml.sourceforge.net/ accessed september 20, 2015. (actual download is at http://sourceforge.net/projects/pdftohtml/). 33. give me text! open knowledge international, accessed october 26, 2015-march 7, 2016, http://givemetext.okfnlabs.org/. 34. phillips, alemneh, and ayala, "analysis of url references," https://doi.org/10.1108/lm-082013-0073. 35. “in search of the perfect url validation regex,” accessed december 7, 2015, https://mathiasbynens.be/demo/url-regex. we selected "@gruber v2" for our extraction. 36. curl v7.45.0, "command line tool and library for transferring data with urls," accessed october 18, 2015, http://curl.haxx.se/. 37. we have used the term "memento" in lowercase to denote a snapshot souvenir page, to distinguish from an automated service utilizing the memento protocol. a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 28 38. for a good overview of the types of problems, see michael l. nelson, scott g. ainsworth, justin f. brunelle, mat kelly, hany salaheldeen and michele weigle, "assessing the quality of web archives" 1 vol., computer science presentations, book 8 (old dominion university. odu digital commons, 2014). http://digitalcommons.odu.edu/computerscience_presentations/8. 39. shawn m. jones, et al. “scholarly context adrift," https://doi.org/10.1371/journal.pone.0167475. 40. opendoar search of institutional repositories with theses at http://www.opendoar.org/find.php, accessed august 26, 2016. 41. joachim schöpfel, "adding value to electronic theses and dissertations in institutional repositories." d-lib magazine 19, no. 3 (2013): 1. https://doi.org/10.1045/march2013schopfel. 42. strategic digital initiatives working group. implementation of a modern digital library at the ohio state university. (apr 2014). https://library.osu.edu/documents/sdiwg/sdiwg_white_paper.pdf. (published). 43. tim gollins. “parsimonious preservation: preventing pointless processes! (the small simple steps that take digital preservation a long way forward),” in online information proceedings uk national archives, 2009. available at http://www.nationalarchives.gov.uk/documents/information-management/parsimoniouspreservation.pdf. 44. 
margaret hedstrom, "digital preservation: a time bomb for digital libraries." computers and the humanities 31, no. 3 (1997): 189-202. https://doi.org/10.1023/a:1000676723815. 45. zittrain, albert, and lessig, "perma," https://doi.org/10.1017/s1472669614000255. 46. herbert van de sompel, michael l. nelson, robert sanderson, lyudmila l. balakireva, scott ainsworth, and harihar shankar, “memento: time travel for the web,” arxiv:0911.1112 [cs], november 5, 2009, http://arxiv.org/abs/0911.1112. 240 i ournal of library automation vol. 7 i 3 september 197 4 book reviews case studies in lihm1·y computer systems, by richard phillips palmer. new york: r. r. bowker, 1973. 214p. $10.95. surely one of the most annoying and disappointing aspects of the literature of library automation is the complete lack of uniformity or standards for reports of individual accomplishments. thus one reads the continuing stream of reports of automated processes in individual libraries with only the remotest idea of which of the projects described are actually operating, which are in the process of being implemented, and which are merely proposals that still exist exclusively in the minds of their creators. in this volume, richard palmer has brought together a number of descriptions of operating systems, upon which he has imposed his own standards of presentation. in all, six circulation, eight serials, and six acquisitions systems are described; in each case the description is divided into six parts. first, in a section entitled "environment," the library, its collections, and its users are briefly described. some idea is provided of the library's total budget, or at least its materials budget, and unusual features of the library are given. next, the objectives of the automated system are stated, generally with some indication of what prompted automation to be considered and what features of the previous manual system were less than satisfactory. a section entitled "the computer" describes the hardware used in some detail (and this information is summarized in a table at the end of the book) , and the next section, "the system," gives a lengthy and detailed description of how the system works. the last section in each case is devoted to observations by palmer, indicating the significance to the library of the automated system, and often pointing out problems that have been noted. the least satisfactory section of the book is the final chapter, "summary and observations," in which palmer lays out the stated costs of each system in such a way that they may be directly compared, even though he knows the figures have been derived in various manners and are therefore not directly comparable. palmer's warning to the reader that "unit costs ... should not be compared without noting that they were not computed on a standard basis" makes even more mystifying his arrangement of those costs in tabular form. a second area that seems weak is the suggestion that the book constitutes an effective rebuttal to the criticisms of ellsworth mason. it seems unlikely that anything short of a very thorough systems analysis, showing all of the problems, alten1atives, costs, and benefits of both manual and machine systems, will satisfy mason. despite these very minor reservations, the book is well worthy of study. 
it presents, in nontechnical language, some of the most carefully and honestly described systems descriptions to be found in the literature, suggesting by example that many of the individual applications described in the journals, including lola, might well be better than they are. pete1· simmons school of librarianship university of british c@lumbla information systems, se1'vices and centers, by herman m. weisman. new york, n.y.: wiley-becker-hayes, 1972. 265p. $10.95. isbn: 0-471-92645-0. weisman states that his work "is not a text on automated information technology," and mechanization is pretty well dismissed in one page of critical discussion. ellsworth mason is singled out for his "amusing, facetious and bitter account of a [sic] melancholy experience at mechanization." use of automated services for information work is covered in less than a page. the work is supposed to be a university-level text and "reference source" on "the practices of information transfer and use" on the "retailer" level. it is almost entirely limited to industrial and government scientific and technical information services. libraries are defined in passing as a "specific type of information system . . . largely limited with some few exceptions to the passive repository function. . . ." however, "if the organization has a library, consultation with the librarian and use of his mechanisms for acquisition and purchase are advisable." it is also suggested that acquisitions are "recommended by the systems advisory committee . . . selected and purchased by the director of the information system and the documentation unit head," and the onorder file is maintained as a list (to be distributed monthly, perhaps) and in card form. the section on cataloging is equally instructive in advising that the acquisition process has provided "subject" as one of three elements needed for descriptive cataloging. the book swings dizzily back and forth from this lilliputian (or is it laputan?) perspective to the more olympian outlook suggested by a seventeen-page appendix which is the text of a charter for the united engineering information service with an expected annual budget of $1.2 million. it also seesaws from the uselessly general to the exquisite detail of an operations manual with hardly a pause for breath. we are told at the start of the chapter on "documentation practicesinformation services" that it "is more efficient to provide [information dissemination services] than to have individuals scurrying about searching for information." a summary of the "procedural flow" follows immediately: "1. ... all requests and inquiries no matter how received or to whom addressed are logged and assigned a control number. 2 .... the head of inquiry services is responsible for monitoring all requests and inquiries. . . . all incoming requests are entered on the inquiry form .... " most examples appear to be drawn from the author's experience as manager of information services, national bureau of standards. some are useful. strauss' scientific and technical libraries: their organization and administration was another wiley-becker-hayes volume issued during the same year. it is impossible to avoid imagining the publisher's marketing division people counting the respective memberships of the special libraries association and the american book reviews 241 society for information science as distinct markets for the two works. 
however, the first three-quarters of weisman's work is a duplication distinguished from strauss mainly by the shallowness of its coverage and the poverty of its prose. weisman's only notable contribution is thirty pages about information analysis centers, which might be worth a school reading assignment. the assignment will be at some risk, depending .on students' toleration for such words as "essentialness," "beneficialness," "collaborationists," and such phrases as "parameters of data points," as in "an indexed bibliography becomes a more useful document, since it can indicate to a user exactly the type of data contained as well as parameters of data points." "relevance," as weisman notes, "is not always synonymous with competence." justine roberts university of california san francisco k1wwing books and men; knowing computers, too, by jesse h. shera. littleton, colo.: libraries unlimited, 1973. 363p. $13.50. isbn: 0-87287-073-1. the only clumsy thing about this book is its pretentious title, which not only gives little indication of the book's contents but is discordant with the lucid and vigorous style of the writing. kbam;kc,t is a selection of writings and speeches by dr. shera, done between 1931 and 1972, all but one previously published. but only a few are reprinted unchanged; most have undergone revision to some significant extent, and one has been almost doubled in length in revision. even the oldest papers are not unduly "dated," and the author's reflections on the use and abuse of computers in libraries are as timely now as when first written. the twenty-nine papers published here are presented under six headings, each representing an area of librarianship in which dr. shera has been a major influence: philosophy of librarianship, library history, reference work in the library, documentation, the academic library, and library education. most of lola's readers, it is hazarded, will find 242 journal of library automation vol. 7/3 september 1974 the section on documentation of most interest. reviewing kbam;kc,t is no fit occasion for attempting to evaluate jesse shera's contributions to librarianship. he is established, and this selection from his writings contains many of his important and influential papers and others, inevitably, less weighty. throughout, however, they bear shera' s characteristic combination of clarity, intelligence, vision, and a forthrightness bordering on truculence, the mix spiced judiciously with attic salt. in a disarming preface shera suggests that the collection may be "more of an addition to library shelves than to library literature." be that as it may, many of the writings were originally published in somewhat obscure journals, and it is helpful to have them gathered in this convenient form. george piternick school of librarianship the university of british columbia a library management game: a report on a research project, by p. brophy, m. k. buckland, g. ford, a. hindle, and a. g. mackenzie; with an appendix by l. c. guy. (university of lancaster occasional papers, no.7) university of lancaster library, 1972. 90p. £ 1.00. isbn: 0901699-14-4. in the context of the need for greater managerial expertise in libraries, the state of managerial education in library schools, and the place of games in this education, the authors describe in this document the development of a simplified probablistic model of a loan and duplication system. 
while it is perhaps the novelty of concept exhibited by the game which first attracts attention, closer examination reveals that the game is but the vehicle upon which is carried a far-ranging analysis of the state of library management. a dynamic model utilizes three input variables-loan period, titles bought, and duplicates purchased-and three output measures-satisfaction level, document exposure, and collection bias-of the effective manipulation of the former within the constraint of budget to illustrate complex interactions within a library system. sufficient flexibility (e.g., variation of loan periods according to popularity of volumes and/ or status of user) enables different policies to be selected to effect the stated objectives of the player (library "manager"). comparison of selected outputs illustrates that while choosing and implementing policies may be simple (a "game editor" interprets a player's decisions to the computer), judging their merits is not. policy (l) decreases collection bias at the expense of average document exposure per issue, while policy ( q) has the opposite effect, for similar costs and total issues; policy (t) increases satisfaction level and decreases collection bias in comparison with policy ( q), at a cost of 8,000 units of expenditure. evaluation of the policy decision rests on a value judgment (as in the real world) . although description of the game and probabilities upon which it is based occupies a considerable portion of the volume, the authors considered not only the practicability of such a game but also its usefulness in teaching and cost of utilization. an appendix devoted to an in-depth study of education for library management concludes that: in britain and, to a lesser extent, the united states this aspect of library education needs considerable strengthening; games such as that described are most suited to specialized courses for experienced librarians but there is a place for similar ones in firstlevel courses; and a larger proportion of the profession needs to comprehend the concepts put forward in this and other studies before better management techniques will be applied to libraries. this volume is an important contribution to the literature of library management, illustrating that the effect that computers can have on the practice of librarianship goes far beyond the mere substitution of machines for clerical workers. george ]. snowball sir george williams unive1·sity library montreal, canada lib-s-mocs-kmc364-20141005045627 257 technical communications announcements isad institute on bibliographic networking information science and automation division (is ad) of the american library association will hold an institute in new orleans on february 28-march 1, 1974 at the monteleone hotel in the french quarter. the subject of the institute will be "alternatives in bibliographic networking, or how to use automation without doing it yourself." the seminar will review the options available in cooperative cataloging and library networks, provide a framework for identifying problems and selecting alternative cataloging systems on a functional basis, and suggest evaluation strategies and decision models to aid in making choices among alternative bibliographic networking systems. the institute is designed to assist the participant in solving problems and in selecting the best system for a library. 
methods of cost analysis and evaluation of alternative systems will be presented and special attention will be given to comparing on-line systems with microfiche-based systems. the speakers and panelists are recognized authorities in bibliographic networking and automated cataloging systems and will include: james rizzolo, new york public library; maryann duggan, slice; jean l. connor, new york state library; maurice freedman, hennepin county library, minneapolis; brett butler, information design, inc.; and michael malinconico, new york public library. the cost will be $60 for ala members and $75 for nonmembers. for hotel reserv~tion information and a registration blank, write to donald p. hammer· isad; american library association; 50 e. huron st.; chicago, il 60611. p.s. mardi gras is february 26! isad forms committee on technical standariu for library automation (tesla) the information science and automation division of the american library association now has a committee on technical standards for library automation (tesla). tesla, recently formed with the approval of the isad board of directors will act primarily as: a clearinghouse fo; technical standards relating to library automation; a focal point for information relating to automation standards; and a coordinator of standards proposals with appropriate organizations, e.g., the american national standards institute the electronic industries association, n~tional association of state information systems. the committee's initial work will be to formulate areas and priorities in which standards are required, to document existing standards sources, and to develop a "library" of applicable standards to be drawn upon by the membership of ala. according to the new committee's chairman, john kountz, california state universities and colleges, "it is auspicious that this time be selected for the implementation of a standards committee for library automation. with the current introduction en masse of production library automation systems and the fading of research and development activities, such standards will come into good use as they may be developed for library automation. in addition, the close linkage with new developments such as the information industries association and the availability of standardized data bases, hardware, and communication standards are becoming requirements. the standards which shall be emphasized in the committee activities are those relating to areas of interestfor administrators and automators 258 journal of library automation vol. 6/ 4 december 1973 alike. these standards are intended to fill the void for future library automation operations." the committee efforts should be measured in terms of facilitating the automation of library functions as required on an individual library basis. information relating to the standards committee activities and its scope, or general information relating to library information technical standards, should be addressed to: ala/ isad committee on technical standards for library automation, john kountz, chairman, 5670 wilshire blvd., suite 900, los angeles, ca 90036. formation of an ad hoc disetl$sion group on serials data bases as a result of an informal meeting held during the ala conference in las vegas to discuss the problems associated with the establishment and maintenance of union lists of serials, an ad hoc discussion group on serials data bases was formed, with richard anable acting as interim coordinator. 
the council on library resources agreed to fund a meeting of the group's steering committee on september 21, 1973 at york university in toronto, canada. many of the major union list activities on this continent will be represented as well as the national libraries and isds national centers from both canada and the united states. a list of the subgroups that have been formed gives a good idea of the individual problem areas which the group is tackling: a. record format comparison b. minimum record data element requirements c. cooperative conversion arrangements d. organizational relationships and grant support e. holding statement notation f. bibliographic standards g. authority files h. software evaluation and exchange a detailed description of the history and activities of the discussion group can be found on page 207 of this issue. for further information contact: richard anable, york university, downsview, ontario, canada, m3j 2r2, (416) 667-3789. technical exchanges file conversion using optical scanning: a comparison of the systems employed by the university of minnesota and the university of california, berkeley by this time most large libraries in the u.s. have converted into machine-readable form at least some of their files. most of them, however, have used relatively inefficient techniques (such as key-punching) or relatively expensive ones (such as on-line data entry). it was with pleasure, then, that i read ms. grosch's recent article ("computer-based subject authority files at the university of minnesota libraries," journal of library automation, dec. 1972) describing a conversion technique that she, like the library at the university of california at berkeley, has found to be extremely cost effective, namely optical character recognition using a cdc 915 scanner. berkeley has used (and still is using) this technique in its efforts to create what will soon be among the largest machine-readable serials files of any university in the world. that file currently contains records for over 50,000 serials (in the marc structure). it is expected to contain records for about 90,000 unique titles (approximately 30 million characters) before the end of the current fiscal year. based on our experience in this undertaking, i would like to offer the following comments on the use of the cdc 915 scanner as it is used in minneapolis and in berkeley. costs-it should be crystal clear that the main reason for using the scanner is the cost of the keyboarding device. that is, the keyboarding device for the cdc 915 scanner is an ordinary ten pitch selectric typewriter which can be purchased for under $500.00 or rented for from $11.00 to $30.00 per month. when not used as a computer input device the machine functions as a normal office typewriter. a device like an mt/st that rents for about $110.00 a month costs about $.60 an hour for every hour it is used, or ten times as much. keyboard operators for a typewriter are easily obtained since there is no need to train an operator in the idiosyncrasies of keypunch cards, crt terminals, magnetic or paper tape devices, etc. keyboarding is fast and easy, especially when compared to a key punch. mistakes are easily corrected by, for example, merely crossing out the character(s) in error. keyboarding on a selectric for a scanner and keyboarding on a device like the mt/st both require a "converter" (the scanner itself or the mt/st-to-computer-tape converter).
these "converters" are equally available and the decision to use one keyboarding device over another should not hinge on the "availability" of such "converters," as is usually the case. in addition to selecting a cost-effective keyboarding device, minnesota has also operated a system that delivers the data to the keyboarding device in an efficient manner: the typing is done from the source document itself, rather than from a copy of that document that has been transcribed onto a "coding sheet" or a photocopy of that document. ms. grosch points out that photocopying the source document would have raised the project costs by about 50 percent. in addition, keyboarding from photocopied documents would probably have been much slower and less accurate. the berkeley typists also keyboard from the original document, even when that document is a public catalog card that must be temporarily marked up in order to resolve ambiguities for the typists. supplies-it is true that the ordinary selectric typewriter (without the pin-feed platen) performs satisfactorily. thus, one does not need continuous forms for the typewriter. indeed, it is not necessary even to use a "stock form"; plain 20 pound white long grain paper will do. we use zellerbach's hammermill bond 820, which costs $2 a ream. at minnesota, using this paper instead of the "stock form" would probably have reduced the supplies cost from $400 to less than $25. had a keypunch been used, the operation would probably have required about $150 worth of ibm cards. scanner throughput-careful design of the format of the data on the typed sheet can substantially improve throughput on the cdc 915 scanner. with double spaced typing (three lines per inch), the cdc scanner is capable of reading data at the rate of over a half million characters an hour, or about twice as fast as was actually achieved at minnesota. thus, with altered design of the input format, about half of the cost of the "converter" (the scanner) could have been saved, representing an additional savings of $500. the principle applied to maximize throughput on a scanner such as the cdc 915 is to enter as much data as possible on a line and as many lines as possible on a page without crowding the data so much as to cause the machine to misread. (the machine enforces stricter tolerances as its capabilities are pushed to their limits.) one wants to get as much as possible on a line for the same reason that one wants to get as much as possible onto a punched card: there is a fair amount of machine overhead involved in advancing to the next line and/or page. the berkeley system uses a sheet of paper that is 8½ x 14 inches in size, and the typists type each line a full 6¾ inches long. typing is double-spaced (even though the machine is capable of handling single-spaced typing) because this increases the vertical skew tolerance from ½ of a character height to a full character height. figure 1 is an example of a page typed at berkeley. at berkeley, more than one field may be placed on a line, each field being separated by the "fork" character. like minnesota, typists identify each field by a one-character code at the beginning of the field (a for author, t for title, h for holdings, c for call number, b for branch library location, etc.). typists are instructed to type until the margin locks. the beginning of each logical record is identified by the "chair" character plus the typist's initials at the beginning of the line.
thus the entire line is utilized, and the machine is not required to read a large number of blank spaces at the beginning of the line (which, as ms. grosch points out, it has trouble doing since it cannot readily tell whether six blanks may, in fact, be really five or seven blanks). fig. 1, berkeley optical scanner input, reproduces sample typed lines in this format; a representative line reads: nssya=faoytproceedings of a symposium on man made forests and their industrial importance, canberra, 1967yh1-3, 1967//ycsd118.fs. we generally do not proofread the sheets after they are typed. we have found that when proofreading is necessary (usually during training), it is not difficult to proofread data typed in the format that we use. data element identification-at berkeley, as at minnesota, the typist identifies the data element (e.g., the author or the title) rather than relying on a computer algorithm of the kind used by the library of congress or the institute of library research (automatic format recognition). this approach was selected because it was felt (a) that the typist could perform this task better than the computer could, and (b) that the routine nature of the typing job necessitated the insertion of more meaningful tasks for the typists. the data presented to the typists for interpretation can be in a wide variety of languages and may be transcribed on the source document according to any one of the conventions used by the library during the past several decades. typing throughput-the berkeley conversion system includes the use of certain "super abbreviations" that typists may use in place of commonly occurring words or phrases. all such abbreviations are two or three characters in length and are preceded by an equal sign. for example, "=fao" is translated into "food and agriculture organization of the united nations" by the computer software. although this substantially improves keyboarding throughput, its chief advantage is the insurance that the long phrase is entered into the file correctly and consistently. i personally find the requirement that the typist at minnesota type the "format recognition line" at the top of each sheet in order to avoid the necessity of a "complete rerunning of the job" to be not only wasteful, but playing brinkmanship with systems design. expanding the character set-although the cdc 915 scanner is capable of reading only the ocr a font (an all upper case font), it is relatively simple to produce upper-and-lower case output from data input via the cdc 915. two alternatives are: 1. have the typist key a special character that means "next character is to be capitalized" before each upper case character (the technique used by typists throughout the western world, in the form of the shift key). if, for the cdc scanner, the dollar sign were chosen to be that special character, then "$john" would represent "John" while "john" alone would represent "john." this technique can be used to expand the keyboard to include diacritical marks. a berkeley typist keys "espan%eol" to produce "español," since the computer translates %e into a tilde over the preceding character. 2. do all capitalization by logic contained within the software. a primitive computer algorithm might simply say "capitalize the first word of every sentence plus the following proper nouns . . . ." the berkeley library currently uses such a technique for the capitalization of words in serial entries. this has been done in order to print out the serial entries following standard rules of style, rather than the traditional rules of librarianship, namely that every significant word in the title is capitalized. (did the library practice arise because early typewriters had shift keys that were hard to use?) our computer algorithm says essentially "capitalize all words in the entry except the following insignificant ones . . . ." this technique has created an upper-lower case file without having typists use the shift key, or its equivalent, at least a half million times. figure 2, a page from berkeley's serials key word index, illustrates the results of this system. the real problem-i do not mean to imply that everything is rosy in file conversion land. a file conversion is a messy, difficult and essentially unproductive task, no matter how well done, because it merely transforms existing data into another form and in so doing exposes, for all to see, the "many ancient errors" which we do not want to see. it also exposes the "ambiguities" that were perhaps better left ambiguous, not to mention the inconsistencies that have cropped up as library practices varied. i would suggest that any file conversion that works from files that have been built up over some time period requires more in the way of resources for the "cleansing" than for the conversion. that is, in the case of the subject authority files at minnesota, i would guess that far more than $5,296.21 (the total amount spent on typists, keyboards, computers, supplies, etc.) was spent resolving ambiguities (before the drawer was handed to the typist) and "cleansing" the data in the one year between the time when the data had been converted and the time that they were put to use. this has been our experience at berkeley. stephen silberstein university of california, berkeley
6/4 december 1973 reports-library projects and activities bucknell university plans entire bibliographic file to go on-line bucknell university's already strong computer-usage program is expected to be strengthened in 1973/74 to permit stu~ dents and faculty to conduct fast, accurate searches of the university library from any of thirty-five campus terminals. a $28,000 grant to the bucknell university library from the council on library resources is supporting this program. seventy-five percent of bucknell's students already use the campus computer in course work. and bucknell's on-line library data base includes records of ap-proximately 25,000 of the library's 200,000 books. the council grant will enable additional computer storage to be rented to permit the entire bibliographic .file at bucknell to go on~line. the complete flle is already in machine-readable form. while bucknell's current system enables a search of the on-line files by authortitle, title alone, and library of congress (lc) number, its enlarged plan calls for subject search capability as well. using lc classification numbers, a user will be able to ask the computer to locate and display the authors and titles associated with the subject of interest, examine the near neighbors of his original hit in the file, or he may pick an author's name from the response and enter the system again on the author's name to see what else the author may have written. stanford university data file directory the stanford university data file di~ rectory, compiled by douglas ferguson, is available as an example of a libraryproduced access publication for computer~ ized data files on a university campus. the directory lists and describes colle~ tions of social, economic, political, and scientific research data on punched cards, computer tape, and disk, located on the stanford campus. each file description directs the user to documentation and published research in the university library collection or elsewhere. access to each data file is controlled by the owner and is listed in each file description. the di~ rectory is available, for prepayment of $4, from the financial office, stanford university libraries, stanford, ca 94304. standards editor note: the recent flurry of activity concerning standards which affect l~ brary automation, dilta bases, etc., is pointed up in the several actions reported in the last issues of tc. perhaps the futility of keeping up with standards and the need for a clearinghouse type of operation is best recognized by noting a sample of some recently adopted stan.. dilrds which now have or will potentially have ramifications in library automation. the following list does not represent a complete accounting of all pertinent standards due to lack of a comprehensive source. selected ansi standards many ansi standards published in the ansi categories of "information processing systems" and "information systems" may be of interest to isad members. selected items are listed below. the new american national standards institute (ansi) catalog is available free of charge from the institute's sales department at 1430 broadway, new york, ny 10018. the catalog lists " iso standards" and "iso recommendations" as well. 
x3.14 recorded magnetic tape for information interchange (200 cpi, nrzi) (revision of ansi x3.14-i969)--provides the standard technique for recording american nation~ al standard code for information interchange (ascii), x3.4-l968, on magnetic tape at 200 characters per inch ( cpi) using nonretum-to-zero-change on ones (nrzi) recording techniques. approval date: december 12, 1972. x3.38 computer code for states-x3.38-i912 provides two-digit numeric codes and two-character alpha------~------------------betic abbreviations for both the states and the district of columbia. the numeric codes will allow the states and the district of columbia to be sorted into alphabetic sequence. ansi x3.38-1972 may be obtained from the american national standards institute at $1.25 per copy. it was developed under the secretariat of the business equipment manufacturers association. x3.31 structure for the identification of the counties of the united states for information interchange (new standard )-identifies a three-digit numeric code structure for the counties of the states of the united states, including the district of columbia. supersedes the listing which appeared in the march 26, 1971 issue of standards action. approval date: march 14, 1973. x3.39 recorded magnetic tape for information interchange (1600 cpi, phase encoded) (new standard)-presents the standard technique for recording the coded character set provided in american national standard code for information interchange, x3.4-1968 (ascii) on magnetic tape at 1600 characters per inch (cpi) using phase recording techniques. approval date: march 7, 1973. x3.40 unrecorded magnetic tape for information interchange (9-track 200 and 800 cpi, nrzi, and 1600 cpi, pe) (new standard)-presents the minimum requirements for the physical and magnetic interchangeability requirements of ~-inch wide magnetic tape and reels between information processing systems, communication systems, and associated equipment using american national standard code for information interchange, x3.4-1968 (ascii). approval date: march 5, 1973. bsr x3.41 code extension techniques for use with the 7-bit coded character set for ascii (ansi x3.4-1968) (new ,proposed standard)-provides means for augmenting the standard repertory of 128 technical communications 263 characters of american national standard code for information interchange, x3.41968 (ascii), with additional graphics or control functions, by extending the 7-bit code while remaining in a 7-bit environment, or increasing to an 8-bit environment in which ascii is a subset. order from: business equipment manufacturers association; 1828 l st., nw; washington, dc 20036. single copy price: free. bsr x3.47 identification of named populated places and related entities of the states of the united states, structure for the (new proposed standard)-provides the structure for an unambiguous, five digit code for named populated cities, towns, villages, and similar communities and for several categories of named entities similar to these in one or more important respects. order from: business equipment manufacturers association; 1828 l st., nw, washington, dc 20036. single copy price: free. 
bsr x11 .6 operational data processing applications containing constitutionally protected data, documentation requirements for (new proposed standard)-provides all those involved with operating electronic data processing applications, involving constitutionally protected data, with a list of minimum documentary requirements which apply to such applications. order from: society of c ertified data processors, 38 main st., hudson, ma 01749. single copy price: $2.00. bsr xll.l categories of errorcreating characteristics of various data storage systems used with electronic data processing applications (new proposed st andard)provides the consumers of electronic data processing applications and the suppliers and implementors of such applications with a technique for defining the error-generating capabilities that exist in the data st orage system used to hold the consumer data. it is one of a series of data storage stan264 journal of library automation vol. 6/4 december 1973 dards being prepared by the society of certified data processors technical standards committee, to provide a method whereby the application implementor ·and the application consumer may communicate easily, allowing the application consumer to take the responsibility for the accuracy of the maintenance of the data base by electronic data processing systems. . order from: society of certilled data processors, attn: chairman, technical standards committee, 38 main st., hudson. ma 01749. single copy price: $2.00. bsr x11.2 data items stored in general data bases, classification . of (new proposed standard)-provides the suppliers of data to a general data base \vith a means of communication with the operation of the base regarding the characteristics of the data items being supplied. order from: society of certified data p~o~ssors, attn: chainnan, technical standards committee, 38 main st., hudson,.ma01749. single copy price: $2.00. bsr xl1.3 data base processing activities based on data items used, categories of (new proposed standard)-provides the application designers of data base applications and the operators of several data bases with a means of describing the characteristics of the data items stored in the data base. order from: society of certified data processors, attn: chairman, technical standards committee, 38 main st., hudson, ma 01749. single copy price: $2.00. bsr x2.3.4-1959 charting paperwork procedures , metho d of -this standard was one of the original input docume nts considered in the developm.ent of american national standard fi.owchart symbols and their usage in information processing, x3.5-1970 (originally ansi x3.5-1966). however, ansi x2.3.4-1959 was not considered sufficiently useful to serve the needs of the community which now uses ansi x3.5, nor at that time did x3 have responsibility for ansi x2.3.4 or feel that it should initiate action to modify the older standard. the subject standard was subsequently as~ signed to american national standards committee x3 for review and revision, reaffirmation or withdrawal. current review finds no interest in this standard, either in the form of users of the standard or of an organization desiring to assume its maintenance. order from: american national standards institute, dept. bsr, 1430 broadway, new york, ny 10018. single copy price: $1.00. sc/ 20 standard serial coding-the american national standard identi.scation number· for serial publications, .z39.9-1971 is available from ansi at $2.25 per copy. 
in june 1970, iso/ tc 46/wg 1· accepted the system as outlined in z39.9-1971 -as the basis for the international standard numbering system. a final issn standard was presented to the plenary session on tg 46 in october 1972 at the hague. the international center (ic) of the international serials data system (isds) is responsible for the administration of the issn as a central authority. the ic-isds was established with headquarters in the bibliotheque nationale with financial support being shared by the french government and unesco. the national serials data program (nsdp) has been selected to serve as the united states national center and as such is the sole agency responsible for the control and assignment of issn in the u.s. (note-the ansi !stab (information systems technical advisory board) rejected the proposed ansi z219.1-1971 , use of coden for periodical title abbreviations. this proposal had been submitted to ansi by the american society for testing and materials in 1971 for approval as an american national standard; z39 members were asked to comment on it during the public review in july and august 1971. after considerable discussion the 1st ab came to the conclusion that the proposed standard was in conflict with 239.9-1971, the. ansi identification number for serial publications.) sc/ 2 machine input records -the members of sc/ 2 have agreed that this standard cannot be written at this time. the purpose of the proposed standard was for general information interchange at the interface between data processing terminal equipment {such as data processors, data media input/ output devices, office machines, etc.) and data communications equipment (such as data sets, modems, etc.) . the decision was based on the fact that the problem of designing a format is not being addressed here (that standard already exists, namely z39.2-1971) but rather the problem of ne twork protocol. therefore, the transmission of the bibliographic record itself, taken in this context, is only a small part of the total picture. subcommittee 2 has concluded, however, that in the light of future developments in network protocol, bibliographic data should be transmitted in the z39.21971 interchange format standard. in order to further this recommendation, the present z39.2-1971, the american national standard for bibliographic information interchange on magnetic tape, will be revised by sci 2 to reflect a broader scope, i.e., information interchange in digital form, with appropriate sections in the document describing the existing standards for different media (the first of these would be magnetic tape since this standard already exists). this should have the effect of using the standard format in future systems via telecommunications as well as via magnetic tape. the additional sections discussing various media will aid the user of the format regardless of the media involved. input to the editor: say it isn't so. tell me that, as editor of technical communications, you are not responsible for the item on page 65 of val. 6, no. 1. i refer to the squib headed "tomorrow's library: spools of tape." i am particularly offended to see this kind of outdated foolishne ss promoted after noting two pages earlier that the new directions for technical communications will involve pertinent information about technical developments. 
how could a technical communications 265 publication entitled college management possibly contribute technically significant information about such a specialized and sophisticated area as library automation? in general, i think blue sky articles are inappropriate for tc. carl m. spaulding council on library resources the new format and content of technical communications is expected to evolve, and thf18 no step function change was anticipated. in the meantime, while operating on an accelerated publication schedule i have attempted to find pertinent (if not completely appropriate) articles for tc. i would like to see more contributions of hardcore technical communications from the field, but until people accept the new design for tc and contribute to it, the selections wiu be scarce. incidentally, i have received some comment to the contrary, that perhaps a " bltte sky?" category of news notes in tc would serve the useful purpose af providing another perspective, or putting "far out" items into context. certainly, contributions of the type submitted by stephen silberstein in this issue and justine roberts in the last issue of tc represent the directions envisaged for tc's content. in most technical fields there's a place for the proposed tc type of forum, and r m confident library automation and technology have a similar need. i would appreciate more readers' comments, and more importantly, brief write-ups of the technical aspects of your accomplishments ami. findings which would be of interest to isad members.-dlb potpourri unisist international serials data system the international serials data system { isds ) as establi~hed within the framework of the unisist program, is an international network of operational centers, jointly responsible for the creation and maintenance of computer-based data banks. 266 journal of library automation vol. 6/ 4 december 1973 the objectives of the isds system are: a. to develop and maintain an international register of serial publications containing all the necessary information for the identification of the serials. b. to define and promote the use of a standard code (issn) for the unique identification of each serial. c. to facilitate retrieval of scientific and technical information in serials. d. to make this information currently available to all countries, organizations, or individual users. e. to establish a network of communications between libraries, secondary information services, publishers of serial literature, and international organizations . f. to promote international standards for bibliographic description, communication formats, and information exchange in the area of serial publications. the isds is designed as a two-tier system consisting of: an international centre (ic) national and regional centres the isds-international centre is established in paris by agreement between unesco and the french government. it is temporarily located at the bibliotheque nationale. the isds-ic will establish an international file of serials from all countries. this file will be limited, initially, to scientific and technical publications, and will be gradually extended to include all disciplines. each serial will receive an international standard serial number (issn), which has been developed by the international organization for standardization (iso). products which could be derived from the international serials data system are as follows: titles index; issn index; isds register of . 
periodicals (register); classified titles index ( cti); new and amended titles index (n & at); cumulated new titles (cnt); permuted index; microform reference file (mrf). a magnetic tape service will be provided of the current master file, and of the new and amended titles. the responsibility for the establishment of national or regional centres belongs to unesco member states, and associate members who wish to participate in the unisist program. upon establishment each national centre will obtain a block of issns from the international centre and will gradually take over the responsibility for the registration of serials published in its territory. a regular information exchange program will be established between the national centers and the international center. the international register will thus be a regularly updated cumulation of the initial file established by the ic and the national or regional files. serials published in countries with no national or regional centres will be registered by the international centre, which will endeavor to obtain the necessary information. the relationship with users of isds is primarily through national or regional centres, but this general rule does not exclude direct contact with the international centre. the building of a consistent international file of serials implies close cooperation between all members of isds. the work in all countries will be based on a common set of rules concerning: bibliographic description, communication format, character sets, abbreviations, transliteration, etc. coordination between all members of the system is one of the main tasks of the international centre. close cooperation has also been established with various international organizations, the objectives of which are closely related to those of isds. in november 1972 the director-general of unesco informed member states of the creation of the international centre and has invited them to cooperate in isds by establishing national or regional centers. to assist in the creation of these national or regional centers provisional guideunes were made available. these guidelines are at present being finalized and will shortly be widely distributed in english, french, spanish, and russian. the response of member states was most encouraging and to date the following countries have set up or are in the process of setting up national or regional centers: argentina, australia, austria, canada, colombia, dahomey, france, federal republic of germany, guatemala, india, italy, malta, new zealand, nigeria, philippines, union of soviet socialist republics, united kingdom, and united states of america. for further information and issn assignment contact is os-intemational centre, bibliotheque nationale, 58 rue de richelieu, paris 2eme, france. adl to condtjct study of the data base publishing industry arthur d. little, inc., the cambridge, massachusetts, consulting firm, is launching a major study of the data base publishing industry. the study, which will be available on a subscription basis, will cover present and future technology utilization, economics, markets, and business and competitive structure. 
more specifically, the study will: • characterize typical data base publishing activities in terms of markets, products, sales strategies, methods of data base collection, distribution, etc.; • identify the current and expected roles of private industry sectors, government, and professional associations; • analyze existing and latent markets for data base publishing ventures and estimate market growth over the next five years; • describe criteria for analyzing the economics of data base publishing services and pricing them; • review hardware, software, and developments likely to affect the industry in the next five years, including emergence of lower-cost switched data networks; • describe the probable impacts of technical communications 267 public policy and regulatory developments, including copyright legislation, patentability of software, and concern over protection of confidentiality of personal information, and • characterize the reasons for past failures of certain data base publishing ventures and propose strategies for successful involvement. the study will be directed by vincent giuliano and robert kvaal. dr. giuliano has extensive experience working with major information dissemination systems, ranging from libraries to telecommunications-based computer systems. he has led a variety of systems development, systems analysis, evaluation, and market research projects at adl. mr. kvaal has focused his recent work on strategic planning issues facing computer services companies, and on assisting computer users in financial institutions, retail and distribution companies. this work has included operational and management audits, planning and implementation assistance, management information systems development, and the overall design of a nationwide teleprocessing system. according to giuliano and kvaal, data base publishing enterprises tend to evolve through well-defined stages of automation and business development: maintenance of manual data bases (reports, clippings, etc.) and the manual preparation of conventional printed products; partial computerization of the data base and some computer usage in preparation of conventional printed products; considerable automation of the data base and output process; offering of information retrieval and specialized search services on an overnight or phone call basis; and offering direct access to the data base via remote computer terminals. "but," giuliano and kvaal note, "the growing tendency of data base enterprises to evolve along this scale is creating dislocations in many of them, while at the same time, offering new opportunities for participants and suppliers. this uncertainty makes a study such as adl's especially useful at this point in the industry's development." 268 journal of library automation vol. 6/ 4 december 1973 the results of adl's study will be presented to clients in published form and in group meetings held in appropriate locations. the cost to each subscriber is $2,000. additional information may be obtained from philip a. untersee (617864-5770). pertinent publications new 1973 acm publication catalog the new expanded thirty-four page publication catalog of the association for computing machinery has been released. the catalog covers technical publications in over thirty major segments of the computing and automation field. copies are available upon request by writing to: publication services department, association for computing machinery, 1133 avenue of the americas, new york, ny 10036. 
proceedings of 1973 national computer conference the proceedings of the 1973 national computer conference & exposition are now available from the american federation of information processing societies, inc. (afips). the conference proceedings, volume 42, contains more than 160 technical papers and abstracts covering a wide range of topics in computer science & technology and methods & applications featured at the recent '73 ncc, june 4-8 in new york. the price of the 920 page hard-cover volume is $40. a reduced rate of $20 is available for prepaid orders from members of the afips' constituent societies stating their affiliation and membership number. copies of the proceedings may be ordered from afips press, 210 summit ave., montvale, nj 07645. computerized serials systems the larc association announces a new publication series entitled computerized serials systems. each volume in the series will consist of six issues published · at bimonthly intervals in both paperback and hardbound editions. each issue will be authored and edited by a person directly affiliated with the project reported, and each issue will be devoted to papers relating to an automated serials project undertaken by a specific library. the format of the new series is designed to promote understanding through clear narrative description and extensive illustrative materials. for details concerning the purchase of individual issues or a subscription to the complete volume contact larc press, 105-117 west fourth ave., peoria, il 61602. -the binary vector as the basis of an inverted index file donald r. king: rutgers university, new brunswick, new jersey. 307 the inverted index file is a frequently used file structure for the storage of indexing information in a document retrieval system. this paper describes a novel method for the computer storage of such an index. the method not only offers the possibility of reducing storage requirements fot an index but also affords more mpid processing of query statements expressed in boolean logic. introduction the inverted index file is a frequently used file structure for the storage of indexing information in document retrieval systems. an inverted index file may be used by itself or with a direct file in a so-called combined file system. the inverted index file contains a logical record for each of the subject headings or index terms which may be used to describe documents in the system. within each logical record there is a list of pointers to those documents which have been indexed by the subject heading in question. the individual pointers are usually in the form of document numbers stored in fixed-length digital form. obviously, the length of the lists will vary from record to record. the purpose of this paper is the presentation of a new technique for the storage of the lists of pointers to documents. it will be shown that this technique not only reduces storage requirements, but that in many cases the time required to search the index is reduced. the technique is useful in systems which use boolean searches. the relative merits of boolean and weighted term searches are beyond the scope of this paper, as are the relative merits of the various possible file structures. the binary vector as a storage device the exact form of each document pointer is immaterial to the user of a document retrieval system as long as he is able to obtain the document he desires. the standard form for these pointers in most automated systems is a document number. 
note that each pointer is by itself a piece of information. however, if one thinks of a "peek-a-boo" system, the document 308 journal of library automation vol. 7/4 december 1974 pointer becomes simply a hole punched in a card. in this case the position of the pointer, not the pointer itself, conveys the information. the new technique presented in this paper is an extension of the "peeka-boo" concept. a vector or string of binary zeroes is constructed equal in length to the number of documents expected in the system. the position of each vector element corresponds to a document number. that is, the first position in a vector corresponds to document number one and the tenth vector position corresponds to document number ten. a vector is constructed for each subject heading in the system. as a document enters the system, ones are inserted in place of the zeroes in the positions corresponding to the new document number in the vectors for the subject headings used to describe the document. as an example, assume the following document descriptions are presented to a system using binary vectors: document number 1 2 3 subject headings a,b,d c,e a,c the binary vectors for terms a, b, c, d, and e before the insertion of the indexing data would be as follows: subject heading a b c d e vector 000 ... 0 000 ... 0 000 ... 0 000 ... 0 ooo ... ·o after the insertion of the indexing information, the same vectors would appear as follows: subject heading a b c d e vector 101 ... 0 100 ... 0 011 ... 0 100 ... 0 010 ... 0 the binary vector seems to have several advantages over the standard form of storage of document numbers in an inverted file. first, the records are of fixed length since the vectors are all equal in length to the expected number of documents in the system. space may be left at the end of each vector for the addition of new documents. periodic copying of the file may be used to expand the index records with additional zeroes added at the end of each record during the process. consequently, unless binary vector/king 309 there are limitations of size imposed by the equipment, only one access to the storage device will be needed to retrieve the index record for a term. the second advantage offered by the binary vector method appears in the search process. most modern computers have a built-in capability of performing boolean logical manipulations on binary digit vectors or strings. thus, when boolean operations are specified as part of a query, the implementation of the operations within the· computer is considerably easier and faster for binary vectors than for the standard form of inverted files. other investigators of the use of the binary digit patterns or vectors have not fully explored its advantages and disadvantages. bloom suggests, without an explanation or evaluation, the use of bit patterns as the storage technique for inverted files in large data bases in the area of management information systems.1 davis and lin, again in the area of management information systems, propose bit patterns as the means of locating pertinent records in a master file. 2 they do not compare the method with other possible techniques. sammon discusses briefly the use of binary vectors as a storage technique, but dismisses it on the basis that the two-valued approach obviates the possible assignment of weights to index terms in describing documents. 
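for readers who want to see the mechanics, the following short sketch builds the binary vectors for the three-document example above and evaluates a boolean query with a single and operation. python is used purely for illustration (the model systems described later in the paper were written in pl/1 for an ibm 360/67), and the vector capacity and helper names are assumptions made for the example.

# illustrative sketch of the binary vector idea, using python integers as bit
# strings; this is not the author's implementation. bit position i (counting
# from the left) corresponds to document number i, as in the paper.

CAPACITY = 16  # assumed number of document slots reserved per vector

def set_posting(vector: int, doc_number: int) -> int:
    """turn on the bit whose position corresponds to doc_number (1-based)."""
    return vector | (1 << (CAPACITY - doc_number))

def show(vector: int) -> str:
    return format(vector, f"0{CAPACITY}b")

# the example from the text: doc 1 -> a, b, d; doc 2 -> c, e; doc 3 -> a, c
index = {term: 0 for term in "abcde"}
for doc_number, terms in [(1, "abd"), (2, "ce"), (3, "ac")]:
    for term in terms:
        index[term] = set_posting(index[term], doc_number)

for term in "abcde":
    print(term, show(index[term]))   # a -> 101..., b -> 100..., c -> 011..., etc.

# a boolean query "a and c" is a single machine and-operation on the vectors;
# the positions of the remaining one-bits are the matching document numbers.
hits = index["a"] & index["c"]
matches = [i + 1 for i, bit in enumerate(show(hits)) if bit == "1"]
print("a and c ->", matches)         # document 3

the point of the illustration is that the query step is one and of two bit strings, regardless of how many postings each term carries, which is exactly the advantage claimed above.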
gorokhov discusses the use of a modified binary vector approach in a document retrieval system implemented on a small soviet computer.4 faced with the need to minimize storage requirements for his inverted file, gorokhov concentrated on developing a technique for locating and removing strings of zeroes occurring in the binary vectors used within the system. since these zeroes represent the absence of information they could be removed if there were a way to indicate the position in the original vector of the ones that remained. he proposed the removal of strings of zeroes and the inclusion of numeric place values with the remaining vector elements. his result is a file with variable-length index records. the abandoning of the pure binary vector obviates the process, and gorokhov found it necessary to expand the vector elements into the original vector before logical operations could be applied. even though he does not state so explicitly, gorokhov seems to have found his method more efficient than the standard inverted file. gorokhov's suggestion has led to the development of an algorithm for the compression of binary vectors. heaps and thiel have also discussed the use of compressed binary vectors as the basis of an inverted index file.5, 6 aside from a brief description of the method for implementing the concept, they offer no comparison of the binary vector with the standard inverted file. storage requirements an immediate reaction to the concept of binary vectors is to state that they will obviously take more storage space than the standard inverted file. a closer study shows that this is not always the case. the storage requirements for the two types of files may be calculated as follows: 1. mbv = (d · n) / 8 bytes (binary vector file) 2. msi = d · i · k bytes (standard inverted file) where: m = storage requirements in bytes; d = number of documents in the system; n = number of index terms in the system; i = average depth of indexing in the system; k = size in bytes of a document number stored in the file. using equations 1 and 2 we find that the storage requirements for the binary vector file are, in fact, less than the requirements for the standard inverted file if n < 8 · i · k. it is well known that the distribution of the use of index terms follows a logarithmic curve. in simple terms, one might say that a few terms are used very frequently and many terms are used infrequently. this condition implies that in a binary vector file the records for many terms will contain segments in which there are no "ones" in any byte. a method for removing these "zero" bytes is called compression. compression algorithm the technique for the compression of binary vectors as described here is designed specifically for the ibm 360 family of computers and similar machines. the extension to other machines should be obvious. within the ibm 360 the byte, which contains eight binary digits, is the basic storage unit, and with the eight binary digits it is possible to store a maximum integer value of 255. for the purpose of describing a proposed compression algorithm for the binary vector in the ibm 360, the term subvector will be defined as a string of contiguous bytes chosen from within the binary vector. a zero subvector will be a subvector each of whose bytes contains eight binary zeroes. a nonzero subvector will be a subvector each of whose bytes contains at least one binary one.
to compress a binary vector in the ibm 360 the following steps may be taken: 1. divide the binary vector into a series of zero subvectors and nonzero subvectors. subvectors of either type may have a maximum length of 255 bytes. for zero subvectors longer than 255 bytes, the 256th byte is to be treated as a nonzero byte, thus dividing the long zero subvector. 2. each nonzero subvector is prefixed with two bytes. the first of the prefix bytes contains the count of zero bytes which precede the nonzero subvector in the uncompressed vector. the second prefix byte contains a count of the bytes in the nonzero subvector. 3. the compressed vector then consists of only the nonzero subvectors together with their prefix bytes. 4. a two byte field of binary zeroes will end the compressed vector. binmy vector/king 311 the compression of the vectors creates variable-length records and removes the advantage of having records which are directly amenable to boolean manipulation. the effect of file compression on such manipulation in the search process is not as severe as it might appear. for the search process, the compressed vector may be expanded into its original form. the process of expansion of the binary vectors is relatively simple, and since only those index term records which are used in a query need to be expanded at the search time, the search time is not significantly affected. as an example of the use of the compression algorithm consider the following binary vector. 01100000/10000000/ seven zero bytes j00000001j10000000j ... the slashes indicate the division of the vector into bytes. the vector might be read as indicating the following list of document numbers: 2, 3, 9, 80, and 81. in a standard inverted file with each document number assigned three bytes of storage, fifteen bytes would be required to store these numbers. the compressed vector which results from the application of the algorithm is the following: 00000000j00000010j01100000/10000000j00000111/00000010/ 00000001/10000000/ ... again the slashes separate the vector into bytes. for the purpose of the following discussion consider each byte in a vector to be numbered sequentially beginning with byte one at the left. in the uncompressed vector bytes one and two form a nonzero subvector. consequently, the first four bytes in the compressed vector can be interpreted as follows: byte one. binary zero indicating that no zero bytes were removed preceding this subvector. byte two. binary two indicating that the following nonzero subvector is two bytes long. bytes three, four. bytes one and two of the original vector. bytes three through nine of the original vector are a zero subvector, and bytes ten and eleven form a second nonzero subvector. consequently, the second four bytes of the compressed vector are interpreted as follows: byte five. binary seven indicating that a zero subvector of seven bytes has been removed. byte six. binary two indicating that the following two bytes are a nonzero subvector. bytes seven, eight. bytes ten and eleven of the original vector. thus the binary vector has been reduced from eleven bytes to eight 312 journal of library automation vol. 7/4 december 1974 bytes while the space required to record the document numbers in the standard inverted file remains fifteen bytes. 
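the four steps above translate almost directly into code. the sketch below is a python paraphrase of the algorithm, again for illustration only: the function names are mine, the ibm 360 byte layout is simulated with python bytes objects, and it simply compresses and re-expands the eleven-byte example just worked through.

# a sketch of the compression scheme described above; not the paper's pl/1 code.
# edge cases (for example, zero runs longer than 255 bytes) are handled only in
# the simple way the text suggests, by splitting the run.

def compress(vector: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(vector):
        zeros = 0
        while i < len(vector) and vector[i] == 0 and zeros < 255:  # zero subvector
            zeros += 1
            i += 1
        start = i
        while i < len(vector) and vector[i] != 0 and i - start < 255:  # nonzero subvector
            i += 1
        chunk = vector[start:i]
        if chunk:
            out += bytes([zeros, len(chunk)]) + chunk     # two prefix bytes, then data
        elif i < len(vector):
            # a zero run of 255 bytes: treat the next byte as if it were nonzero
            out += bytes([zeros, 1, vector[i]])
            i += 1
    out += bytes([0, 0])                                  # two-byte terminator
    return bytes(out)

def expand(compressed: bytes, length: int) -> bytes:
    """rebuild the original vector; length restores any trailing zero bytes."""
    out = bytearray()
    i = 0
    while i + 1 < len(compressed):
        zeros, count = compressed[i], compressed[i + 1]
        if zeros == 0 and count == 0:                     # terminator
            break
        out += bytes(zeros)                               # re-insert the removed zero bytes
        out += compressed[i + 2 : i + 2 + count]
        i += 2 + count
    out += bytes(length - len(out))
    return bytes(out)

# the eleven-byte example from the text: bytes 1-2 nonzero, seven zero bytes,
# bytes 10-11 nonzero (documents 2, 3, 9, 80, and 81)
vector = bytes([0b01100000, 0b10000000]) + bytes(7) + bytes([0b00000001, 0b10000000])
packed = compress(vector)
assert expand(packed, len(vector)) == vector
print(len(vector), "bytes ->", len(packed), "bytes")

note that this sketch also appends the two-byte terminator, so the packed example comes to ten bytes rather than the eight prefix-and-data bytes counted in the text; the saving relative to the fifteen bytes of the standard inverted file record is the same either way.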
memory requirements for the standard inverted file and the binary vector file to compare memory requirements for the standard inverted file and the compressed binary vector file, we base our comparison on the total number of postings in the file. in the standard inverted file the storage space for the postings is equal to the number of postings times the length of a single posting, which is usually two, three, or five bytes. memory requirements for the compressed binary vector file are more difficult to estimate because the distribution of document numbers within the record for each index term is not known. the fact that a single byte in the binary vector file may contain between zero and eight postings is extremely important. the worst possible case occurs if the postings in the binary vector are spaced in such a way that each nonzero byte contains only one posting, and these bytes are separated by zero bytes. consider the following example: ... /00000000/00010000/00000000/00000100/ ... in this case the compression algorithm will remove the zero bytes, but will add two bytes (the prefix bytes) for each nonzero byte. the resulting compressed vector will be essentially the same length as the standard inverted file record if each posting is three bytes long in the standard inverted file. it might seem that the distribution of one posting per byte for the entire vector represents an even worse situation. it is clear that the compression algorithm will, in this case, not reduce the size of the vector. however, it must be remembered that in the standard inverted file each posting will require at least two bytes and perhaps three bytes. thus, the length of the record in the standard inverted file is two or three times longer than the corresponding binary vector regardless of compression. in data used in two model retrieval systems prepared to compare the standard inverted file and the binary vector file there are 6,121 documents with a total of 94,542 postings. an examination of the binary inverted file for the model systems discloses that there are only 55,311 nonzero bytes in the binary vector file. thus there seems to be some form of clustering of the document numbers in each index term record. if each nonzero byte in this binary vector is isolated by zero bytes, two prefix bytes would be added for each byte. thus the total memory requirements for the postings in the compressed file would be 165,933 bytes. less storage space is required if some nonzero bytes are contiguous. on the other hand, the standard inverted file will require 189,084 bytes if a two-byte posting is used, or 283,626 bytes if a three-byte posting is used. further study of the clustering phenomenon is needed. model retrieval systems to test some of the conjectures about the differences between the standard inverted file and the binary vector file, two model systems were prepared for operation on an ibm 360/67. details of the systems and pl/1 program listings are available elsewhere.7 the data base used was obtained from the institute of animal behavior at rutgers university. in the data base 6,121 documents were indexed by 1,484 index terms. a total of 94,542 postings in the system gives an average depth of indexing of 15.4 terms per document. both inverted files were stored on ibm 2314 disc storage devices. to ease the problem of handling variable-length records in both files the logical records for each index term were divided into chains of fixed-length physical records. for the standard inverted file a physical record size of 331 bytes was chosen. the entire file required 702,713 bytes including record overhead. for the uncompressed binary vector file a physical record size of 1,286 bytes was chosen to include overhead and space for up to 10,216 document numbers. when the compression algorithm was applied, with a physical record length of 130 bytes, the memory requirements for the binary vector file were reduced to 281,450 bytes, or 41 percent of the space required to store the standard inverted file. a series of forty searches of varying complexities were run against both files. the "time" function of pl/1 made it possible to accumulate timing statistics which excluded input/output functions. search times for the binary vector file include expansion of the compressed vectors, boolean manipulation of the vectors, and conversion of the resultant vector into digital document numbers. the times for the standard inverted file are for the boolean manipulation of the lists. the following points were noted in the analysis of the times: 1. in twenty-two of the forty queries for which comparative timings were obtained, the search of the binary vector file was faster, in one case by a factor of thirty-five. in the eighteen cases in which the search of the standard inverted file was faster, the search of the standard inverted file was at most 6.17 times faster. 2. the range of the total times for the binary vector file was .79 seconds to 9.72 seconds. the range for searching the standard inverted file was .15 seconds to 202.98 seconds. the fact that the search times for the binary vector file are within a fairly narrow range, in contrast to the wider range of times for searching the standard inverted file, has important implications for the design of an on-line interactive document retrieval system. in such a system it is important that the computer respond to users' requests not only rapidly but consistently. the narrower range of the search times provided by the binary vector file will assist in producing consistent times. 3. the search times for the binary vector file, exclusive of expansion and conversion times, are unaffected by the number of postings contained in the index terms used in a query. on the other hand, the number of postings in the records used from the standard inverted file appears to cause the differences in search times for that file. to test the conjectures that 1. search times for the binary vector file are related to the number of index terms in the query, and 2. search times for the standard inverted file are related to the number of postings in the index terms in the query, a correlation analysis was performed. the following correlation coefficients (r) were obtained: number of terms in query and search times for the binary vector file, r = .960; number of postings in query terms and search times for the standard inverted file, r = .979. the relationships indicated above are significant at the .001 level. no attempt was made to compute an average search time per term for the binary vector file or average search time per posting for the standard inverted file. such times would have meaning only for the model systems. summary the binary vector is suggested as an alternative to the usual method of storing document pointers in an inverted index file. the binary vector file can provide savings in storage space, search times, and programming effort. references
bloom, "some techniques and trade-offs affecting large data base retrieval times," proceedings of the acm 24 ( 1969). 2. d. r. davis and a. d. lin, "secondary key retrieval using an ibm 7090-1310 system," communications of the acm 8:243-46 (april1965). 3. john w. sammon, some mathematics of information storage and retrieval (technical report radc-tr-68-178 [rome, new york: rome air development center, 1968]). 4. s. a. gorokhov, "the 'setka-3' automated irs on the 'minsk-22' with the use of the socket associative-address method of organization of information" (paper presented at the all-union conference on information retrieval systems and automatic processing of scientific and technical information, moscow, 1967. translated and published as part of ad 697 687, national technical information service). 5. h. s. heaps and l. h. thiel, "optimum procedures for economic information retrieval," information storage & retrieval6:131-53 (1970). 6. l. h. thiel and h. s. heaps, "program design for retrospective searches on large data bases," information storage & retrieval8:1-20 (1972). 7. d. r. king, "an inverted file structure for an interactive document retrieval system" (ph.d. dissertation, rutgers university, 1971). microsoft word december_ital_farnell_final.docx editorial board thoughts: metadata training in canadian library technician programs sharon farnel information technologies and libraries | december 2016 3 the core metadata team at my institution is small but effective. in addition to myself as coordinator, we include two librarians and two full-time metadata assistants. our metadata assistant positions are considered to be similar, in some ways, to other senior assistant positions within the organization which require or at least prefer that individuals have a library technician diploma. however, neither of our metadata assistants has such a diploma. their credentials, in fact, are quite different. in part, this difference is driven by the nature of the work that our metadata assistants do. they work regularly with different metadata standards such as mods, dc, ddi in addition to marc. the perform operations on large batches of metadata using languages such as xslt or r. this is quite different in many ways than the work of their colleagues who work with the ils, many of whom do have a library technician diploma. as we prepare for an upcoming short-term leave of one of our team members, i have been thinking a great deal about the work our metadata assistants do and whether or not we would find an individual who came through a librarian technician program who had the skills and knowledge we need a replacement to have. and i have also been reminded of conversations i have had with recently graduated library technicians who felt their exposure to metadata standards, practices, and tools beyond rda and marc had been lacking in their programs. this got me thinking about the presence or absence of metadata courses in library technician programs in canada. i reached out to two colleagues from macewan university—norene erickson and lisa shamchuk—who are doing in-depth research into library technician education in canada. they kindly provided me with a list of canadian institutions that offer a library technician program so i could investigate further. now, i must begin with two caveats. one, this is very much a surface level scan rather than an indepth examination, although this is simply the first step in what i hope will be a longer term investigation. 
second, although several francophone institutions in canada offer library technician programs, i did not review their programs; i was concerned that my lack of fluency in the french language could lead to inadvertent misrepresentations. sharon farnel (sharon.farnel@ualberta.ca), a member of the ital editorial board, is metadata coordinator, university of alberta libraries, edmonton, alberta. editorial board thoughts | farnel https://doi.org/10.6017/ital.v35i4.9601 4 canadian institutions offering a library technician program (by province) are: alberta ● macewan university (http://www.macewan.ca/wcm/schoolsfaculties/business/programs/libraryandinforma tiontechnology/) ● southern alberta institute of technology (http://www.sait.ca/programs-and-courses/fulltime-studies/diplomas/library-information-technology) british columbia ● langara college (http://langara.ca/programs-and-courses/programs/library-informationtechnology/) ● university of the fraser valley (http://www.ufv.ca/programs/libit/) manitoba ● red river college (http://me.rrc.mb.ca/catalogue/programinfo.aspx?progcode=libifdp®ioncode=wpg) nova scotia ● nova scotia community college (http://www.nscc.ca/learning_programs/programs/plandescr.aspx?prg=lbtn&pln=libin ftech) ontario ● algonquin college (http://www.algonquincollege.com/healthandcommunity/program/library-andinformation-technician/) ● conestoga college (https://www.conestogac.on.ca/parttime/library-and-informationtechnician) ● confederation college (http://www.confederationcollege.ca/program/library-andinformation-technician) ● durham college (http://www.durhamcollege.ca/programs/library-and-informationtechnician) ● seneca college (http://www.senecacollege.ca/fulltime/lit.html) ● mohawk college (http://www.mohawkcollege.ca/ce/programs/community-services-andsupport/library-and-information-technician-diploma-800) information technologies and libraries | december 2016 5 quebec ● john abbott college (http://www.johnabbott.qc.ca/academics/careerprograms/information-library-technologies/) saskatchewan ● saskatchewan polytechnic (http://saskpolytech.ca/programs-andcourses/programs/library-and-information-technology.aspx) my method was quite simple. using the program websites listed above, i reviewed the course listings looking for ‘metadata’ either in the title or in the description when it was available. of the fourteen (14) programs examined, nine (9) had no course with metadata in the title or description. two (2) programs had courses where metadata was listed as part of the content but not the focus: langara college as part of “special topics: creating and managing digital collections” and seneca college as part of “cataloguing iii” which has a partial focus on metadata for digital collections. three (3) of the programs had a course with metadata in the title or description; all are a variation on “introduction to metadata and metadata applications”. (importantly, the three institutions in question conestoga college, confederation college, and mohawk college are all connected and share courses online). so, what do these very preliminary and impressionistic findings tell us? it seems that there is little opportunity for students enrolled in library technician programs in canada to be exposed to the metadata standards, practices, and tools that are increasingly necessary for positions involved in work with digital collections, research data management, digital preservation, and the like. admittedly, no program can include courses on all potentially relevant topics. 
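to make the kind of work mentioned earlier more concrete (batch manipulation of mods, dc, and marc metadata with tools such as xslt or r), here is a small illustrative sketch. python is used only to keep the examples in this document in one language, and the folder name, element choices, and output format are assumptions for the example, not a description of any particular program's curriculum or workflow; the same operation could equally be written in xslt or r.

```python
# hypothetical batch job: pull titles, identifiers, and dates out of a folder
# of mods records and write a simple dublin core summary spreadsheet.
import csv
import pathlib
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}
FIELDS = ["dc:title", "dc:identifier", "dc:date", "source_file"]

def first_text(root, path):
    """return the text of the first matching element, or an empty string."""
    el = root.find(path, NS)
    return el.text.strip() if el is not None and el.text else ""

rows = []
for f in sorted(pathlib.Path("mods_records").glob("*.xml")):   # assumed folder
    root = ET.parse(f).getroot()
    rows.append({
        "dc:title": first_text(root, "mods:titleInfo/mods:title"),
        "dc:identifier": first_text(root, "mods:identifier"),
        "dc:date": first_text(root, "mods:originInfo/mods:dateIssued"),
        "source_file": f.name,
    })

with open("dc_summary.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```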
in addition, formal course work is only one aspect of training and education that can prepare graduates for their career; practica and work placements and other more informal activities during a program are crucial, as are the skills and knowledge that can only be developed once hired and on the job. nevertheless, based on the investigation above, one would be justified in asking if we are disadvantaging students by not working to incorporate additional coursework focused on metadata standards, application, and tools, as well as on basic skills in manipulation of metadata in large batches. scripting languages or equivalent combination of education and experience. master’s desirable.” i edited our statement to more clearly allow a combination of factors that would show sufficient preparation: “bachelor’s degree and a minimum of 3-5 years of experience, or an equivalent combination of education and experience, are required; a master’s degree is preferred,” followed by a separate description of technical skills needed. this increased the number and quality of our editorial board thoughts | farnel https://doi.org/10.6017/ital.v35i4.9601 6 applications, so i’ll remain on the lookout for opportunities to represent what we want to require more faithfully and with an open mind. meanwhile, on the other side of the table, students and recent grads are uncertain how to demonstrate their skills. first, they’re wondering how to show clearly enough that they meet requirements like “three years of work experience” or “experience with user testing” so that their application is seriously considered. second, they ask about possibilities to formalize skills. recently, i’ve gotten questions about a certificate program in ux and whether there is any formal certification to be a systems librarian. surveying the past experience of my own network—with very diverse paths into technology jobs ranging from undergraduate or second master’s degrees to learning scripting as a technical services librarian to pre-mls work experience—doesn’t suggest any standard method for substantiating technical knowledge. once again, the truth of the situation may be that libraries will welcome a broad range of possible experience, but the postings don’t necessarily signal that. some advice from the tech industry about how to be more inviting to candidates applies to libraries too; for example, avoiding “rockstar”/ “ninja” descriptions, emphasizing the problem space over years of experience,1 and designing interview processes that encourage discussion rather than “gotcha” technical tasks. at penn libraries, for example, we’ve been asking developer candidates to spend a few hours at most on a take-home coding assignment, rather than doing whiteboard coding on the spot. this gives us concrete code to discuss in a far more realistic and relaxed context. while it may be helpful to express requirements better to encourage applicants to see more clearly whether they should respond to a posting, this is a small part of the question of preparing new mls grads for library technology jobs. the new grads who are seeking guidance on substantiating their skills are the ones who are confident they possess them. others have a sense that they should increase their comfort with technology but are not sure how to do it, especially when they’ve just completed a whole new degree and may not have the time or resources to pursue additional training. 
even if we make efforts to narrow the gap between employers and jobseekers, much remains to be discussed regarding the challenge of readying students with different interests and preparation for library employment. library school provides a relatively brief window to instill in students the fundamentals and values of the profession and it can’t be repurposed as a coding academy. there persists a need to discuss how to help students interested in technology learn and demonstrate competencies rather than teaching them rapidly shifting specific technologies. references 1. erin kissane, “job listings that don’t alienate,” https://storify.com/kissane/job-listings-thatdon-t-alienate. june_ita_pekala_final privacy and user experience in 21st century library discovery shayna pekala information technology and libraries | june 2017 48 abstract over the last decade, libraries have taken advantage of emerging technologies to provide new discovery tools to help users find information and resources more efficiently. in the wake of this technological shift in discovery, privacy has become an increasingly prominent and complex issue for libraries. the nature of the web, over which users interact with discovery tools, has substantially diminished the library’s ability to control patron privacy. the emergence of a data economy has led to a new wave of online tracking and surveillance, in which multiple third parties collect and share user data during the discovery process, making it much more difficult, if not impossible, for libraries to protect patron privacy. in addition, users are increasingly starting their searches with web search engines, diminishing the library’s control over privacy even further. while libraries have a legal and ethical responsibility to protect patron privacy, they are simultaneously challenged to meet evolving user needs for discovery. in a world where “search” is synonymous with google, users increasingly expect their library discovery experience to mimic their experience using web search engines.1 however, web search engines rely on a drastically different set of privacy standards, as they strive to create tailored, personalized search results based on user data. libraries are seemingly forced to make a choice between delivering the discovery experience users expect and protecting user privacy. this paper explores the competing interests of privacy and user experience, and proposes possible strategies to address them in the future design of library discovery tools. introduction on march 23, 2017, the internet erupted with outrage in response to the results of a senate vote to roll back federal communications commission (fcc) rules prohibiting internet service providers (isps), such as comcast, verizon, and at&t, from selling customer web browsing histories and other usage data without customer permission. less than a week after the senate vote, the house followed suit and similarly voted in favor of rolling back the fcc rules, which were set to go into effect at the end of 2017.2 the repeal became official on april 3, 2017 when the president signed it into law.3 this decision by u.s. lawmakers serves as a reminder that today’s internet economy is a data economy, where personal data flows freely on the web, ready to be compiled and sold to the highest bidder. continuous online tracking and surveillance has become the new normal. shayna pekala (shayna.pekala@georgetown.edu) is discovery services librarian, georgetown university library, washington, dc. 
privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 49 isps are just one of the many players in the online tracking game. major web search engines, such as google, bing, and yahoo, also collect information about users’ search histories, among other personal information.4 by selling this data to advertisers, data brokers, and/or government agencies, these search engine companies are able to make a profit while providing the search engines themselves for “free.” in addition to profiting off of user data, web search engines also use it to enhance the user experience of their products. collecting and analyzing user data enables systems to learn user preferences, providing personalized search results that make it easier to navigate the ever-increasing sea of online information. the collection and sharing of user data that occurs on the open web is deeply troubling for libraries, whose professional ethics embody the values of privacy and intellectual freedom. a user’s search history contains information about a user’s thought process, and the monitoring of these thoughts inhibits intellectual inquiry.5 libraries, however, would be remiss to dismiss the success of web search engines and their use of data altogether. mit’s preliminary report on the future of libraries urges, “while the notion of ‘tracking’ any individual’s consumption patterns for research and educational materials is anathema to the core values of libraries...the opportunity to leverage emerging technologies and new methodologies for discovery should not be discounted.”6 this article examines the current landscape of library discovery, the competing interests of privacy and user experience at play, and proposes possible strategies to address them in the future design of library discovery tools. background library discovery in the digital age the advent of new technologies has drastically shaped the way libraries support information discovery. while users once relied on shelf-browsing and card catalogs to find library resources, libraries now provide access to a suite of online tools and interfaces that facilitate cross-collection searching and access to a wide range of materials. in an online environment, many paths to discovery are possible, with the open web playing a newfound and significant role. today’s library discovery tools fall into three categories: online catalogs (the patron interface of the integrated library system (ils)), discovery layers (a patron interface with enhanced functionality that is separate from an ils), and web-scale discovery tools (an enhanced patron interface that relies on a central index to bring together resources from the library catalog, subscription databases, and digital repositories).7 these tools are commonly integrated with a variety of external systems, including proxy servers, inter-library loan, subscription databases, individual publisher websites, and more. for the most part, libraries purchase discovery tools from third-party vendors. while some libraries use open source discovery layers, such as blacklight or vufind, there are currently no open source options for web-scale discovery tools.8 information technology and libraries | june 2017 50 outside of the library, web search engines (e.g. google, bing, and yahoo), and targeted academic discovery products (e.g. 
google scholar, researchgate, and academia.edu) provide additional systems that enable discovery.9 in fact, web search engines, particularly google, play a significant role in the research process. both students and faculty use google in conjunction with library discovery tools. students typically use google at the beginning of the research process to get a better understanding of their topic and identify secondary search terms. faculty, on the other hand, use google to find out how other scholars are thinking about a topic.10 unsurprisingly, google and google scholar provide the majority of content access to major content platforms.11 the data economy and online privacy concerns in an information discovery environment that is primarily online, new threats to patron privacy emerge. in today’s economy, user data has become a global commodity. commercial businesses have recognized the value of data mining for marketing purposes. bjorn bloching, et. al. explain, “from cleverly aggregated data points, you can draw multiple conclusions that go right to the heart and mind of the customer.”12 along the same lines, the ability to collect and analyze user data is extremely valuable to government agencies for surveillance purposes, creating an additional data-driven market.13 the increasing value of user data has drastically expanded the business of online tracking. in her book, dragnet nation, journalist julia angwin outlines a detailed taxonomy of trackers, including various types of government, commercial, and individual trackers.14 in the online information discovery process, multiple parties collect user data at different points. consider the following scenario: a user executes a basic keyword search in google to access an openly available online resource. in the fifteen seconds it takes the user to get to that resource, information about the user’s search is collected by the internet service provider (isp), the web browser, the search engine, the website hosting the resource, and any third-party trackers embedded in the website. the search query, along with the user’s internet protocol (ip) address, become part of the data collector’s profile on the user. in the future, the data collector can sell the user’s profile to a data broker, where it will be merged with profiles from other data collectors to create an even more detailed portrait of the user.15 the data broker, in turn, can sell the complete dataset to the government, law enforcement, commercial businesses, and even criminals. this creates serious privacy concerns, particularly since users have no legal right over how their data is bought and sold.16 privacy protection in libraries libraries have deeply-rooted values in privacy and strong motivations to protect it. intellectual freedom, the foundation on which libraries are built, necessarily requires privacy. in its interpretation of the library bill of rights, the american library association (ala) explains, “in a library (physical or virtual), the right to privacy is the right to open inquiry without having the subject of one’s interest examined or scrutinized by others.”17 many studies support this idea, privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 51 having found that people who are indiscriminately and secretly monitored censor their behavior and speech.18 libraries have both legal and ethical obligations to protect patron privacy. 
while there is no federal legislation that protects privacy in libraries, forty-eight states have regulations regarding the confidentiality of library records, though the extent of these protections varies by state.19 because these statutes were drafted before the widespread use of the internet, they are phrased in a way that addresses circulation records and does not specifically include or exclude internet use records (records with information on sites accessed by patrons) from these protections. therefore, according to theresa chmara, libraries should not treat internet use records any differently than circulation records with respect to confidentiality.20 the library community has established many guiding documents that embody its ethical commitment to protecting patron privacy. the ala code of ethics states in its third principle, “we protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”21 the international federation of library associations and institutions (ifla) code of ethics has more specific language about data sharing, stating, “the relationship between the library and the user is one of confidentiality and librarians and other information workers will take appropriate measures to ensure that user data is not shared beyond the original transaction.”22 the library community has also established practical guidelines for dealing with privacy issues in libraries, particularly those issues relating to digital privacy, including the ala privacy guidelines23 and the national information standards organization (niso) consensus principles on user’s digital privacy in library, publisher, and software-provider systems.24 additionally, the library freedom project was launched in 2015 as an educational resource to teach librarians about privacy threats, rights, and tools, and in 2017, the library and information technology association (lita) released a set of seven privacy checklists25 to help libraries implement the ala privacy guidelines. personalization of online systems while user data can be used for tracking and surveillance, it can also be used to improve the digital user experience of online systems through personalization. because the growth of the internet has made it increasingly difficult to navigate the continually growing sea of information online, researchers have put significant effort into designing interfaces, interaction methods, and systems that deliver adaptive and personalized experiences.26 angsar koene, et. al. explain, “the basic concept behind personalization of on-line information services is to shield users from the risk of information overload, by pre-filtering search results based on a model of the user’s preferences… a perfect user model would…enable the service provider to perfectly predict the decision a user would make for any given choice.”27 the authors continue to describe three main flavors of personalization systems: 1. content-based systems, in which the system recommends items based on their similarity to items that the user expressed interest in; information technology and libraries | june 2017 52 2. collaborative-filtering systems, in which users are given recommendations for items that other users with similar tastes liked in the past; and 3. 
community-based systems, in which the system recommends items based on the preferences of the user’s friends.28 many popular consumer services, such as amazon.com, youtube, netflix, google, etc., have increased (and continue to increase) the level of personalization that they offer.29 one such service in the area of academic resource discovery is google scholar’s updates, which analyzes a user’s publication history in order to predict new publications of interest.30 libraries, in contrast, have not pressed their developers and vendors to personalize their services in favor of privacy, even though studies have shown that users expect library tools to mimic their experience using web search engines.31 some web-scale discovery services do, however, allow researchers to set personalization preferences, such as their field of study, and, according to roger schonfeld, it is likely that many researchers would benefit tremendously from increased personalization in discovery.32 in this vein, the american philosophical society library recently launched a new recommendation tool for archives and manuscripts that uses circulation data and user-supplied interests to drive recommendations.33 opportunities for user experience in library discovery a major challenge in today’s online discovery environment is that the user is inhibited by an overwhelming number of results. this leads to users rely on relevance rankings and to fail to examine search results in depth. creating fine-tuned relevance ranking algorithms based on user behavior is one remedy to this problem, but it relies on the use of personal user data.34 however, there may be opportunities to facilitate data-driven discovery while maintaining the user’s anonymity that would be suitable for library (and other) discovery tools. irina trapido proposes that relevance ranking algorithms could be designed to leverage the popularity of a resource measured by its circulation statistics or by ranking popular or introductory materials higher than more specialized ones to help users make sense of large results sets.35 michael schofield proposes “context-driven design” as an intermediary solution, whereby the user opts in to have the system infer context from neutral device or browser information, such as the time of day, business hours, weather, events, holidays, etc.36 jason clark describes a search prototype he built that applies these principles, but he questions whether these types of enhancements actually add value to users.37 rachel vacek cautions that personalization is not guaranteed to be useful or meaningful, and continuous user testing is key.38 discussion there are several aspects to consider for the design of future library discovery tools. the integrated, complex nature of the web causes privacy to become compromised during the information discovery process. library discovery tools have been designed not to retain borrowing records, but have not yet evolved to mask user behavior, which is invaluable in today’s data economy. it is imperative that all types of library discovery tools have built-in functionality to privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 53 protect patron privacy beyond borrowing records, while also enabling the ethical use of patron data to improve user experience. 
even if library discovery tools were to evolve so that they themselves were absolutely private (where no data were ever collected or shared), other online parties (isps, web browsers, advertisers, data brokers, etc.) would still have access to user data through other means, such as cookies and fingerprinting. the operating reality is such that privacy is not immediately and completely controllable by libraries. laurie rinehart-thompson explains, “in the big picture, privacy is at the mercy of ethical and stewardship choices on the part of all information handlers.”39 while libraries alone cannot guarantee complete privacy for their patrons, they can and should mitigate privacy risks to the greatest extent possible. at the same time, ignoring altogether the benefits of using patron data to improve the discovery user experience may threaten the library’s viability in the age of google. roger schonfeld explains, “if systems exclude all personal data and use-related data, the resulting services will be onedimensional and sterile. i consider it essential for libraries to deliver dynamic and personalized services to remain viable in today's environment; expectations are set by sophisticated social networks and commercial destinations.”40 libraries must find ways to keep up with greater industry trends while adhering to professional ethics. recommendations while libraries have traditionally shied away from collecting data about patron transactions, these conservative tendencies run counter to the library’s mission to provide outstanding user experience and the need to evolve in a rapidly changing information industry. as the profession adopts new technologies, ethical dilemmas present themselves that are tied into their use. while several library organizations have issued guidance for libraries about the role of user data in these new technologies, this does not go far enough. the niso privacy principles, for instance, acknowledge that its principles are merely “a starting point.”41 examining the substance of these guidelines is important for confronting the privacy challenges facing library discovery in the 21st century, but there are additional steps libraries can take to more fully address the competing interests of privacy and user experience in library discovery and in library technologies more generally. holding third parties accountable libraries are increasingly at the mercy of third parties when it comes to the development and design of library discovery tools. unfortunately, these third parties not have the same ethical obligations to protect patron privacy that librarians do. in addition, the existing guidance for protecting user data in library technologies is directed towards librarians, not third party vendors. the library community must hold third parties accountable for the ethical design of library discovery tools. one strategy for doing this would be to develop a ranking or certification process for discovery tools based on a community set of standards. the development of hipaa-compliant information technology and libraries | june 2017 54 records management systems in the medical field sets an example. 
because healthcare providers are required by law to guarantee the privacy of patient data,42 they must select electronic health records systems (erms) that have been certified by an office of the national coordinator for health information technology (onc)-authorized body.43 in order to be certified, the system must adhere to a set of criteria adopted by the department of health and human services,44 which includes privacy and security standards.45 another example is the consumer reports standard and testing program for consumer privacy and security, which is currently in development. consumer reports explains the reason for developing this new privacy standard, “if consumer reports and other public-interest organizations create a reasonable standard and let people know which products do the best job of meeting it, consumer pressure and choices can change the marketplace.”46 libraries could potentially adapt the consumer reports standards and rating system for library discovery tools and other library technologies. engaging in ux research & design libraries should not rely on third parties alone to address privacy and user experience requirements for library discovery tools. libraries are well-poised to become more involved in the design process itself by actively engaging in user experience research and design. the opportunities for “context-driven design” and personalization based on circulation and other anonymous data are promising for library discovery but require ample user testing to determine their usefulness. understanding which types of personalization features offer the most value while preserving privacy is key to accelerating the design of library discovery tools. the growth of user experience librarian jobs and the emergence of user experience teams and departments in libraries signals an increasing amount of user experience expertise in the field, which can be leveraged to investigate these important questions for library discovery. illuminating the black box when librarians adopt new discovery tools without fully understanding their underlying technologies and the data economy in which they operate, this does not serve users. librarians have ethical obligations that should require them to thoroughly understand how and when user data is captured by library discovery tools and other web technologies, and how this information is compiled and shared at a higher level. not only do librarians need to understand the technical aspects of discovery technologies, they also need to understand the related user experience benefits and privacy concerns and the resulting ethical implications. as technology continues to evolve, librarians should be required to engage in continued learning in these areas. such technology literacy skills could be incorporated in the curriculum of library and information science degree programs, as well as in ongoing professional development opportunities. empowering library users because information discovery in an online environment introduces new privacy risks, communication about this topic between librarians and patrons is paramount. librarians should privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 55 proactively discuss with patrons the potential risks to their privacy when conducting research online, whether they are using the open web or library discovery tools. 
it is ultimately up to the patron to weigh their needs and preferences in order to decide which tools to use, but it is the librarian’s responsibility to empower patrons to be able to make these decisions in the first place. conclusion with the rollback of the fcc privacy rules that prohibit isps from selling customer search histories without customer permission, understanding digital privacy issues and taking action to protect patron privacy is more important than ever. while privacy and user experience are both necessary and important components of library discovery systems, their requirements are in direct conflict with each other. an absolutely private discovery experience would mean that no user data is ever collected during the search process, whereas a completely personalized discovery experience would mean that all user data is collected and utilized to inform the design and features of the system. it is essential for library discovery tools to have built-in functionality that protects patron privacy to the greatest extent possible and enables the ethical use of patron data to improve user experience. the library community must take action to address these requirements beyond establishing guidelines. holding third party providers to higher privacy standards is a starting point. in addition, librarians themselves need to engage in user experience research and design to discover and test the usefulness of possible intermediary solutions. librarians must also become more educated as a profession on digital privacy issues and their ethical implications in order to educate patrons about their fundamental rights to privacy and empower them to make decisions about which discovery tools to use. collectively, these strategies enable libraries to address user needs, uphold professional ethics, and drive the future of library discovery. references 1. irina trapido, “library discovery products: discovering user expectations through failure analysis,” information technologies and libraries 35, no. 3 (2016): 9-23, https://doi.org/10.6017/ital.v35i3.9190. 2. brian fung, “the house just voted to wipe away the fcc’s landmark internet privacy protections,” the washington post, march 28, 2017, https://www.washingtonpost.com/news/the-switch/wp/2017/03/28/the-house-justvoted-to-wipe-out-the-fccs-landmark-internet-privacy-protections. 3. jon brodkin, “president trump delivers final blow to web browsing privacy rules,” ars technica, april 3, 2017, https://arstechnica.com/tech-policy/2017/04/trumps-signaturemakes-it-official-isp-privacy-rules-are-dead/. 4. nathan freed wessler, “how private is your online search history?” aclu free future (blog), https://www.aclu.org/blog/how-private-your-online-search-history. 5. julia angwin, dragnet nation (new york: times books, 2014), 41-42. information technology and libraries | june 2017 56 6. mit libraries, institute-wide task force on the future of libraries (2016), 12, https://assets.pubpub.org/abhksylo/futurelibrariesreport.pdf. 7. trapido, “library discovery products,” 10. 8. marshall breeding, “the future of library resource discovery,” niso white papers, niso, baltimore, md, 2015, 4, http://www.niso.org/apps/group_public/download.php/14487/future_library_resource_dis covery.pdf. 9. christine wolff, alisa b. rod, and roger c. schonfeld, ithaka s+r us faculty survey 2015 (new york: ithaka s+r, 2016), 11, https://doi.org/10.18665/sr.277685. 10. 
deirdre costello, “students and faculty research differently” (presentation, computers in libraries, washington, d.c., march 28, 2017), http://conferences.infotoday.com/documents/221/a103_costello.pdf. 11. roger c. schonfeld, meeting researchers where they start: streamlining access to scholarly resources (new york: ithaka s+r, 2015), https://doi.org/10.18665/sr.241038. 12. björn bloching, lars luck, and thomas ramge, in data we trust: how customer data is revolutionizing our economy (london: bloomsbury publishing, 2012), 65. 13. angwin, 21-36. 14. ibid., 32-33. 15. natasha singer, “mapping, and sharing, the consumer genome,” new york times, june 16, 2012, http://www.nytimes.com/2012/06/17/technology/acxiom-the-quiet-giant-ofconsumer-database-marketing.html. 16. lois beckett, “everything we know about what data brokers know about you,” propublica, june 13, 2014, https://www.propublica.org/article/everything-we-know-about-what-databrokers-know-about-you. 17. “an interpretation of the library bill of rights,” american library association, amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy. 18. angwin, dragnet nation, 41-42. 19. anne klinefelter, “privacy and library public services: or, i know what you read last summer,” legal references services quarterly 26, no. 1-2 (2007): 258-260, https://doi.org/10.1300/j113v26n01_13. 20. theresa chmara, privacy and confidentiality issues: guide for libraries and their lawyers (chicago: ala editions, 2009), 27-28. 21. “code of ethics of the american library association,” american library association, privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 57 amended january 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics. 22. “ifla code of ethics for librarians and other information workers,” international federation of library associations and institutions, august 12, 2012, http://www.ifla.org/news/ifla-code-of-ethics-for-librarians-and-other-informationworkers-full-version. 23. “privacy & surveillance,” american library association, approved 2015-2016, http://www.ala.org/advocacy/privacyconfidentiality. 24. national information standards organization, niso consensus principles on users’ digital privacy in library, publisher, and softwareprovider systems (niso privacy principles), published on december 10, 2015, http://www.niso.org/apps/group_public/download.php/15863/niso%20consensus%20pr inciples%20on%20users%92%20digital%20privacy.pdf. 25. “library privacy checklists,” library and information technology association, accessed march 7, 2017, http://www.ala.org/lita/advocacy. 26. panagiotis germanakos and marios belk, “personalization in the digital era,” in humancentred web adaptation and personalization: from theory to practice, (switzerland: springer international publishing switzerland, 2016), 16. 27. ansgar koene et al., “privacy concerns arising from internet service personalization filters,” acm sigcas computers and society 45, no. 3 (2015): 167. 28. ibid., 168. 29. ibid. 30. james connor, “scholar updates: making new connections,” google scholar blog, https://scholar.googleblog.com/2012/08/scholar-updates-making-new-connections.html. 31. schonfeld, meeting researchers where they start, 2. 32. roger c. schonfeld, does discovery still happen in the library?: roles and strategies for a shifting reality (new york: ithaka s+r, 2014), 10, https://doi.org/10.18665/sr.24914. 33. 
abigail shelton, “american philosophical society announces launch of pal, an innovative recommendation tool for research libraries,” american philosophical society, april 3, 2017, https://www.amphilsoc.org/press/pal. 34. trapido, “library discovery products,” 17. 35. ibid. 36. michael schofield, “does the best library web design eliminate choice?” libux, september information technology and libraries | june 2017 58 11, 2015, http://libux.co/best-library-web-design-eliminate-choice/. 37. jason a. clark, “anticipatory design: improving search ux using query analysis and machine cues,” weave: journal of library user experience 1, no. 4 (2016), https://doi.org/10.3998/weave.12535642.0001.402. 38. rachel vacek, “customizing discovery at michigan” (presentation, electronic resources & libraries, austin, tx, april 4, 2017), https://www.slideshare.net/vacekrae/customizingdiscovery-at-the-university-of-michigan. 39. laurie a. rinehart-thompson, beth m. hjort, and bonnie s. cassidy, “redefining the health information management privacy and security role,” perspectives in health information management 6 (2009): 4.s 40. marshall breeding, “perspectives on patron privacy and security,” computers in libraries 35, no. 5 (2015): 13. 41. national information standards organization, niso consensus principles. 42. joel jpc rodrigues, et al., “analysis of the security and privacy requirements of cloud-based electronic health records systems,” journal of medical internet research 15, no. 8 (2013), https://www.ncbi.nlm.nih.gov/pmc/articles/pmc3757992/. 43. office of the national coordinator for health information technology, guide to privacy and security of electronic health information, april 2015, https://www.healthit.gov/sites/default/files/pdf/privacy/privacy-and-security-guide.pdf. 44. office of the national coordinator for health information technology, “health it certification program overview,” january 30, 2016, https://www.healthit.gov/sites/default/files/publichealthitcertificationprogramovervie w_v1.1.pdf. 45. office of the national coordinator for health information technology, “2015 edition health information technology (health it) certification criteria, base electronic health record (ehr) definition, and onc health it certification program modifications final rule,” october 2015, https://www.healthit.gov/sites/default/files/factsheet_draft_2015-10-06.pdf. 46. consumer reports, “consumer reports to begin evaluating products, services for privacy and data security,” consumer reports, march 6, 2017, http://www.consumerreports.org/privacy/consumer-reports-to-begin-evaluatingproducts-services-for-privacy-and-data-security/. lib-s-mocs-kmc364-20141005045055 207 the ad · hoc discussion group on serials data bases: its history, current position, and future richard anable: coordinator, york university libraries, toronto, ontario, canada. history the ad hoc discussion group on serials data bases was formed as a result of an informal meeting held during the american library association's conference in las vegas on june 26, 1973. those in attendance were primarily interested in the generation and maintenance of machine-readable union files of serials. (this author's involvement in that meeting and the later activities of the group stems from a contract between the national library of canada and york university concerning an investigation of the problems associated with machine-readable serials files.) it was intended to be a relatively small and informal meeting of about ten individuals. 
the meeting was by no means closed, but it was not widely advertised. however, twenty-five individuals representing twenty institutions on the national (both the united states and canada), regional, and local levels attended. at the meeting there was a great deal of concern expressed about: 1. the lack of communication among the generators of machine-readable serials files. 2. the incompatibility of format andjor bibliographic data among existing files. 3. the apparent confusion about the existing and proposed bibliographic description and format "standards." there was general agreement that something should and could be done about these problems, and that the formation of a group specifically concerned with the generation and maintenance of machine-readable serials data bases would at least improve the communications aspect of the overall problem. (poor communication was thought by some to be at the root of the other problems.) it was also suggested that such a group could lay the groundwork for solving some of the compatibility problems, by presenting proposals on various aspects of the overall problem. these proposals might be used as guidelines for any new projects or revisions of existing ones. it 208 journal of library automation vol. 6/ 4 december 1973 was felt very strongly that the time factor was crucial if the efforts of such a group were to be useful, particularly to several of the institutions represented at the meeting. there was also a concern that the activities of the group should not parallel or duplicate any work already being undertaken by other groups. while various ala committees were dealing with some aspects of the overall problem, no one committee seemed to be addressing its entire scope. the association of research libraries was conducting a study of the existing serials data bases held by their member institutions, but was not currently addressing the overall problem, particularly with regard to the union list activities. it was suggested that direct communication with that committee be established. the net result of this first meeting was that the discussion group was formed and several meetings were scheduled. cynthia pugsley from the university of waterloo libraries, jay cunningham from the university-wide library automation program, university of california, and this writer were requested to prepare a position paper outlining the need for such a group. in july, the minutes of the june 26 meeting and the position paper were distributed. in the meantime a steering committee was arbitrarily selected. the council on library resources agreed to fund a meeting of that committee to be held september 21 at york university in toronto. the steering committee was made up of representatives from the council on library resources (clr), northwestern university, the canadian union catalogue taskgroup and its subgroup on the serials union list, the state university of new york (suny), the university of california university-wide library automation program ( ulap), the association of research libraries (arl), the joint committee on the union list of serials ( jculs), ohio college library center ( oclc), the national serials data program (nsdp), the library of congress (lc), the national library of canada (nlc), universite laval, international serials data system ( isds) /canada, and an observer from the british library. the purpose of the meeting was: 1. 
to establish a mechanism for creating a set of "agreed-upon practices for converting and communicating machine-readable serials data." 2. to establish a mechanism for cooperatively converting a comprehensive retrospective bibliographic data base of serials. to further these ends, the following subcommittees were established: 1. holding statement notation 2. working communications conventions 3. authority files 4. cooperative conversion mechanism the steering committee recognized the need for swift action on the development of "agreed-upon practices." consequently, this job was delegatad hoc discussidn group i anable 209 ed to the holding statement notation and the working communications conventions subcommittees. it deferred action on the question of a cooperative conversion effort until a report was received from that subcommittee at the next steering committee meeting scheduled for october 22, 1973 during the american society for information science meeting in los angeles. on october 10, three of the four subcommittees met for very brief sessions at the library of congress. the most significant results came from the cooperative conversion subcommittee which recommended that: ( 1) a proposal for a cooperative project be prepared as soon as possible; and (2)· that the conversion vehicle for such a project be the oclc facilities. at the next steering committee meeting these recommendations were accepted and the coordinator was asked to prepare a draft of a proposal for review by the cooperative conversion subcommittee. at this time the proposal is being prepared. the question of the need for formal affiliation with one or more of the existing professional organizations had repeatedly been raised at the various meetings. it was initially decided to inform the appropriate organizations of our existence and intentions, and to cooperate whenever and wherever our activities overlapped. when the group decided to prepare a proposal for a cooperative conversion project, the need for such· affiliation increased dramatically. at the october 22 meeting, the association of research libraries indicated a positive interest in our exploring that possibility further with them. they asked for a more detailed definition of our goals and plans, which is also being prepared. · generally the reaction of the group toward some kind of organizational arrangement with arl, if assurance could be made regarding the participation of non-arl institutions, was favorable. another question that lingers is whether it would be advisable to have a formal dual affiliation with both arl and a second professional organization. at this point the question is still open. current position thus far the activities of the group have addressed the problems of: 1. the improvement of communications among institutions engaged in the generation or maintenance of serials data bases. 2. the establishment of a set of "agreed-upon practices.'' 3. the investigation of future means of cooperative or coordinated serials record conversion of retrospective titles. the reasons for these efforts are obvious. we are currently all spending much time and money on noncooperative and uncoordinated local and regional conversion, and few of us are satisfied with the results. through improved communications among conversion efforts, we hope 210 journal of library automation vol. 6/4 december 1973 to establish a set of "agreed-upon practices" which should increase the interfiling compatibility. this in turn should reduce the cost to each institution. 
the use of a common centralized data collection vehicle will minimize redundant conversion. the problems associated with the generation and maintenance of union files of serials have multiplied in the last decade with the introduction of the anglo-american cataloguing rules ( aacr), establishment of the international serials data system ( isds), the presentation of the international standard bibliographic description for serials ( isbds) proposal, the distribution of the library of congress marc serials records, and the increasing role played by the indexing and abstracting services as points of access to serials lists of all types. individually our institutions cannot comprehensively attack all of these aspects. if attacked independently, there is little chance of similarity of approach; if attacked jointly, through the establishment of a set of "agreed-upon practices," similarity will be greater. if attacked jointly through a cooperative conversion effort, the resulting file will be equally usable to all the participants. it is the primary objective of the cooperative conversion project to establish a relatively comprehensive bibliographic data base of serials titles within a time frame which would eliminate the necessity for redundant and costly conversion efforts on the local and regional levels. the prime use of the resulting data base is intended to be the support of union list of serials activities. the secondary objectives are: i. to assist the national libraries of both countries (canada and the united states) in the establishment of a computer-maintained (and hopefully remotely accessible) serials data system. this will be accomplished partly by the very existence of the resulting data base, and partly by the experience gained in its establishment. 2. to assist in the definition of the roles of the regional or resource centers in such enterprises. 3. to provide a source data base for use within the international serials data system, and to seek the active participation of the canadian and united states national centers. the intention of the cooperative conversion project is to establish a comprehensive data base of serials titles in such a way as to accommodate the past, present, and future standards for format, description, and identification, when they can be identified. it is not the intention of this group to establish any new standards in any area. the proposed record structure will be a composite record complying with the iso /2709 format standard on level one (structure), and will attempt to reconcile the minor conflicts among the international serials data system's guidelines, the national serials data program internal format, the library of congress' marc-s format, and the draft of the caad hoc discussion group/ anable 211 nadian marc serials format on levels two and three (content designators and content) . the problems here appear to be technical in nature and by no means insurmountable. thus far the communication among the par· ticipants (including representatives from three of the four areas) has been most encouraging. the record will be based on a minimum set of data elements established to provide enough data to support the union list functions. however, this is a minimum and not a maximum set. it is basically a convention below which a record will be considered incomplete and above which it will be considered acceptable. there probably will be two additional categories of data elements besides those that are required: ( 1) required if readily avail. 
able; and ( 2) not required by the system, but acceptable. "required if readily available" covers those situations where complete bibliographic data are available at the time of conversion, such as where library of con· gress data are present. if the data are there, it is cheaper to convert at that point than at a later date. for this category and the "not required but ac· ceptable" category, a set of agreed·upon practices will be in effect to en· sure that if a data element is converted it will be consistent in content with similarly tagged fields. since at the time of this writing the proposed working communication conventions have not been finalized, it is not possible for the reader to judge whether the minimum set of data elements will meet his local or regional requirements. at this stage it appears that the set will probably in· elude over thirty elements and will have as a subset the isds data element requirements. the conversion project is intended not to compete with any existing or planned programs at either the library of congress or the national li· brary of canada. in fact, it is intended to complement activities in which these two institutions might be engaged. the distribution of the lc marc-s records, and the similar proposed service by the national library of canada, deal primarily with new titles or title changes, and not with the conversion of retrospective titles. while it is the stated intent of the nsdp to attack this area (retrospective titles), thus far it has not been fund· ed to do so. in fact the active involvement of both national libraries and their isds centers is anticipated since their contribution would be inappropriate to duplicate. it is intended that the resulting data base be made available to the isds international center and thus the rest of the international li· brary community. while the direct participation in the conversion effort may well be limited to a manageable number of institutions, this should not deter any institution from direct involvement in the deliberations of the group. what is requested is that the prospective participants have a serious interest in the solving of problems within the short time frame allowed. 212 journal of library automation vol. 6/4 december 1973 future to repeat, the basic goals of the group are: 1. to improve the communication among the generators of serials data bases. 2. to establish a set of "agreed-upon practices." 3. to establish a mechanism for the cooperative conversion of a comprehensive serials data base. the first goal will be an on-going effort, probably carried through by one or more existing organizations. the second goal, we hope to have partially completed by the time ala meets in chicago in january 1974, through the presentation to the steering committee of the reports of the various subcommittees. the third goal will be accomplished in stages. by ala midwinter we hope to have a concrete proposal that can be presented to all prospective participants, to funding agencies, and to the library community as a whole. since time is one of the items to be optimized, we feel that we should have the project launched no later than the end of the second quarter of 1974. the basic approach being proposed in this document can be characterized in the following ways: 1. a limited number of large institutions ( 5-15), or centers representing large institutions, will use a single on-line data collection facility, such as the ohio college library center ( oclc), to convert their retrospective serials files. 2. 
one or more large bibliographic files will be used as a base file (possibly the minnesota union list of serials file) to which new records or fields can be added. 3. the conversion requirement will be based on: (a) the building of a composite record incorporating the aacr, isds, and proposed isbd(s) requirements. (b) a minimum set of data elements, basically for union list of serials purposes. (c) the concept of an expandable record able to incorporate: (1) variant entry approaches, and (2) available (but not required) data elements. such an approach is a series of compromises, the first of which deals with the trade-off between time and cost. one argument which has been offered against the concept of collecting an "incomplete" serials record is that the total cost to the library community in the long run will be greater than if a complete record conversion were to be done initially. this argument is a carryover from the similar discussion concerning monographs. however, we must recognize the following: 1. serials records are of a dynamic nature; what is true for a title this year probably will not be true next year. the more comprehensive the data element set, the more true this becomes. 2. the cost curve dramatically increases as the number and type of data elements increase. 3. the increased time required to collect, edit, and control an exhaustive data element set will seriously protract the time frame of such an effort. 4. the time frame is one of the prime targets for optimization. any massive additional data collection requirement will compromise that goal. there are no conclusive studies in the area of serials conversions which suggest that the "total" record conversion approach would be less expensive in the long run than a base record conversion approach. we do not know what the total conversion effort is now, but it is guessed to be significant. the utilization of most of the existing files is primarily for catalogs or for location services, which the proposed minimum data set will accommodate. only a limited number of institutions have experimented with serials check-in or other functions requiring more complete records. the building of a composite record incorporating the various bibliographic standards is easily justified. such a record must be accessible via past, current, future, and popular practices. the ability to incorporate alternative applications of the same standards is also important, particularly in those cases where the rule is open to interpretation. this is very important if there is no centralized authority to control the application of a specific standard. the ability to convert additional data elements which are readily available but not required is also an important capability, since it will produce a more complete record at a reasonable cost. keeping the number of contributing institutions to a relatively small number simplifies the control aspects of the project. using a central on-line (remotely accessible) system such as oclc reduces the amount of software development required and reduces the degree of redundant conversion. it also will enable us to start conversion in a time frame otherwise impossible. the use of at least one large bibliographic file such as muls decreases the amount of original conversion, thus shortening the total time frame. the use of multiple starting data bases increases the matching requirements of similar records among the data bases but further reduces the original effort.
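the composite-record convention described above (a minimum set of required elements, a "required if readily available" category, and a "not required but acceptable" category, together with the need to match similar records across several starting data bases) can be pictured concretely. the following python sketch is a hypothetical illustration only: the actual minimum element set had not been finalized at the time of writing, so the field names, the three category lists, and the matching key are invented stand-ins rather than the project's specification.

```python
# illustrative sketch only: field names, category lists, and the match key are
# hypothetical stand-ins, not the project's (unfinalized) minimum element set.
REQUIRED = {"title", "issn", "holding_library"}             # below this, a record is "incomplete"
IF_READILY_AVAILABLE = {"publisher", "place", "frequency"}  # convert now if already in hand
ACCEPTABLE = {"subject_headings", "notes"}                  # not required by the system, but kept

def classify_record(record):
    """label a converted record against the convention and list optional elements present."""
    status = "incomplete" if (REQUIRED - record.keys()) else "acceptable"
    extras = sorted((IF_READILY_AVAILABLE | ACCEPTABLE) & record.keys())
    return status, extras

def match_key(record):
    """crude key for recognizing 'the same' title across several base files."""
    issn = record.get("issn", "")
    if issn:
        return issn.replace("-", "")
    return "".join(ch for ch in record.get("title", "").lower() if ch.isalnum())

incoming = {
    "title": "journal of library automation",
    "issn": "0022-2240",
    "holding_library": "participant a",
    "publisher": "american library association",  # a "required if readily available" element
}
print(classify_record(incoming), match_key(incoming))
# -> ('acceptable', ['publisher']) 00222240
```

in practice the choice of match key (issn where present, a normalized title otherwise) is exactly where the extra matching effort among multiple base files arises.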
the problem of selecting data bases is being studied. summary. i have attempted to define in this article the history, the current position, and the future plans of the ad hoc discussion group on serials data bases. we have tried to include in the deliberations of the group as many of the interested parties as possible. omissions do exist, not by intent, but because of a lack of complete information and poor communication. i have tried to act with speed because of the need expressed by the participants. the group is not a closed shop. any institution seriously desiring to make contributions is welcome. please consider this an open invitation. any documentation desired is readily available from me, as the coordinator. the willingness of the participants to cooperate and to make compromises has thus far exceeded all expectations, particularly in those areas where problems were expected. it has truly been a group effort. i would like to especially thank the national library of canada, the library of congress, and the national serials data program for the cooperation they have given to the regional organizations which have been the backbone of this effort. up against the clock: migrating to libguides v2 on a tight timeline. brianna buljung and catherine johnson, information technology and libraries | june 2017. abstract. during fall semester 2015, librarians at the united states naval academy were faced with the challenge of migrating to libguides version 2 and integrating libanswers with libchat into their service offerings. initially, the entire migration process was anticipated to take almost a full academic year, giving guide owners considerable time to update and prepare their guides. however, with the acquisition of the libanswers module, library staff shortened the migration timeline considerably to ensure both products went live on the version 2 platform at the same time. the expedited implementation timeline forced the ad hoc implementation teams to prioritize completion of the tasks that were necessary for the system to remain functional after the upgrade. this paper provides an overview of the process the staff at the nimitz library followed for a successful implementation on a short timeline and highlights transferable lessons learned during the process. consistent communication of expectations with stakeholders and prioritization of tasks were essential to the successful completion of the project. introduction. academic libraries all over the united states have migrated from libguides version 1 to the new, sleeker, responsive design of version 2. approaches to the migration can differ vastly depending on library size, staff capabilities, and the time frame available for completing the project. in 2015, the nimitz library at the united states naval academy began planning both to upgrade libguides to version 2 and to acquire libanswers with libchat. the web team and reference department partnered to migrate the libguides platform and integrate libanswers into the library's web presence. the library first adopted springshare's libguides in 2009. by 2015, the subscription had grown to 61 published guides with 10,601 views. the libguides collection was modified and expanded during two web site upgrades and several staffing changes.
throughout 2014 and 2015, library staff periodically discussed the possibility of upgrading to the version 2 interface, but timing, brianna buljung (bbuljung@mines.edu) is instruction & research librarian, colorado school of mines, golden, co. catherine johnson (cjohnson@usna.edu) is head of reference & instruction at the united states naval academy, annapolis, md. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 69 staffing vacancies and the priority of other projects kept the migration from taking place. in late summer 2015, with the acquisition of springshare’s libanswers with libchat pending, staff determined that it was finally time to migrate to the new libguides interface. initially, the migration team planned to spend nearly a full academic year completing the migration process. this timeline would provide guide owners with ample time for staff training, revising guides, conducting usability testing and preparing the migrated guides to go live without distracting from their other duties. however, right before starting the project, the library finalized the acquisition of springshare’s libanswers with libchat which they decided to launch with the version 2 interface. the team pushed up the libguides migration by several months to keep from confusing patrons with multiple interfaces and launch dates. the migration of libguides and the implementation of libanswers would take place during the fall semester and both products would go live in the version 2 interface before the start of the spring semester. this paper provides an overview of the process that the staff at nimitz library followed for a successful implementation on a short timeline and highlights transferable lessons learned during the process. the authors also include a post-implementation reflection on the process. literature review much of the currently available literature on migration of platforms, especially the libguides platform is published informally. librarians from universities across the country have created help guides, checklists and best practices for surviving the migration. most migration help-guides are tailored to each specific institution but they can still provide helpful suggestions that can be adapted by another.1 springshare also provides extensive help content and checklists, including a list of the most important steps for administrators to complete.2 however, little of the available literature discusses the minimally acceptable amount of work needed to be completed by guide authors. this type of information was crucial to the nimitz library team after drastically shortening the migration timeline. a clearly delineated list of required and optional tasks was needed for guide owners, given time constraints and other job duties. in addition to the informally published help materials, several articles have been published on various aspects of research guide design and evaluation. a few articles examine the migration process. hernandez and mckeen offer advice for libraries contemplating migration; including setting goals and performing usability testing against the new guides.3 duncan et al provide a case study of the implementation process at the university of saskatchewan.4 some articles discuss the basics of guide design and usage in the library. these best practices can be adapted to different platforms, web sites and user populations. 
they discuss the importance of various web design elements such as word choice and page layout.5 another aspect the literature exposes is student use of the guides.6 finally, usability of research guides is one of the most important and widely discussed topics in the literature. creating and maintaining guide content depends on the user’s ability to locate and use the guides in their research.7 most often, research guides are designed up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 70 with the student in mind; to assist them in beginning a project, researching when a librarian is unavailable or as a reference for follow-up after an instruction session.8 as pittsley and memmott discuss, navigation elements can impact a student’s use of research guides.9 the process as preparations for the migration began, it became immediately apparent that the web team and reference department would have to divide the project into manageable segments to complete the work without overwhelming guide owners. three ad hoc teams, made up of librarians from several different departments, were created to take the lead on different elements of the project. the migration team was responsible for researching, organizing and supervising the migration of libguides to version 2. the libanswers team learned about libanswers and how to effectively integrate the product into the library’s web site. the libchat team tested the functionality of libchat and determined how it would fit into the library’s reference desk staffing model. dividing the project into manageable segments allowed each team to focus on the execution of their area of responsibility. the team approach allowed the library to draw on individual strengths and staff willingness to participate without depending on one single staff member to manage the entire migration and implementation process on such a short timeline. migration team the migration team was responsible for determining the tasks that were mandatory for guide owners to complete, the amount of training they would need to use the new interface and how each product should be incorporated into the library’s web site. the libguides migration team relied heavily on advice from other libraries and the documentation from springshare to guide them in determining mandatory tasks. the engineering and computer science librarian reached out to the asee engineering libraries division listserv for advice from peer libraries that had already completed migration. the team also made use of the springshare help guides and best practices guides posted by other universities. ultimately, the migration team created checklists and spreadsheets to help guide owners prepare their guides for migration. a pre-migration checklist (appendix a) was shared with guide owners; containing all of the required and optional tasks that needed to be completed before the migration took place in early november. tasks such as deleting outdated or unused images and evaluating low use guides for possible deletion were required for guide owners to complete. other tasks such as checking each guide for a friendly url or checking database descriptions for brevity and jargon free language were encouraged but considered optional. the team determined that items directly related to the ability of post-migration guides to function properly made the required list, while more cosmetic or stylistic tasks could be completed on a time-allowed basis. 
a post-migration checklist (appendix b) was created for guide owners following the migration. this list included portions of the guides that had to be checked to ensure widgets, links and other assets had up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 71 migrated properly. both checklists were accompanied by tips, screenshots, deadlines and indicated which team member to contact with questions. clear explanation of the expectations for the project, and accommodating the guide owners’ busy schedules made the migration more successful. the migration team gave the new, more robust a-z list significant attention. libguides version 2 allows the a-z list to be sorted by subject, type and vendor. it also allows a library to tag “best bets” databases in each subject area. the databases categorized as best bets display more prominently in the list of databases by subject. using google sheets, the electronic resources librarian quickly and easily solicited feedback from liaison librarians about which databases to tag as best bets for each subject area. google sheets also made it easy for librarians to edit the list of databases related to their subject expertise. some databases had been incorrectly categorized and, in some subjects, newer subscriptions didn’t appear on the list. libguides version 2 allows users to sort databases by type, but doesn’t provide a predetermined list of types. in order to create the list of material types into which all databases would be sorted, the migration team examined lists found on other library web sites. several lists were combined and duplicates, or irrelevant types were removed. an additional military specific type was added to address the most common research conducted by midshipmen. then, the liaison librarians were solicited for input on the language used to describe each type and which databases should be tagged by each type. name choices are a matter of local preference, such as having a single type category for both dictionaries and encyclopedias, or two separate categories. to keep the list of material types to a manageable length, the team decided that each type must contain more than one or two databases. it takes time to get well defined lists of subjects and types. staff working with patrons are able to gather informal feedback about the categorizations in their current form, and make suggestions, corrections, or additions based on patron feedback. the migration of libguides and acquisition of libanswers provided the reference department and web team with an opportunity to update policies and establish new best practices for guide owners. one important cosmetic update included more encouragement for guide owners to use a photo in their profiles. profile pictures had been used inconsistently in the first libguides interface, and several guide owners used the default grey avatar. guide owners who were reluctant to have a headshot on their profile were encouraged to take advantage of stock photos made available through the naval academy’s public affairs office. a photo shoot was also organized for guide owners. on a voluntary basis, guide owners spent about an hour helping each other to take pictures in and around the library. the event helped to get a collection of more professional photos for guide owners to choose from. another important update was the re-evaluation of libguides policies in light of the new functionality available in version 2. 
the guide owners gathered for a meeting midway through the pre-migration guide cleanup process to troubleshoot problems and consider best practices for the up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 72 new interface. guide owners discussed the standardization of tab names in the guides, the information important to include in author profile boxes, and potential categories for the “types” dropdown in the a-z database list. the meeting provided a great opportunity to discuss the options available to guide owners and to solicit feedback on interface mock-ups and guide templates created by the systems librarian. many items from the discussion were incorporated into the update libguides policies for guide owners. libanswers and libchat teams integrating libanswers with libchat, an additional springshare product, at the same time as the migration to libguides version 2 is not necessary. because the acquisition of libanswers coincided with the need to upgrade to version 2, the library staff determined that the two should be done at the same time in order to minimize disruption for patrons. the ad hoc teams tasked with implementing libanswers and libchat met regularly to learn about the new products and to consider how these products would fit into the library’s existing points of service. while the libanswers and libchat teams began as two distinct groups, it became increasingly clear that the functionality of these two systems is interwoven so closely that they must be reviewed and discussed together. the teams spent considerable time learning the functionality of the new systems, considering how the new service points would integrate into the existing offerings, and creating draft policies to provide guidance to staff. the teams developed a set of tips and guidelines to address staff concerns and provide guidance on how the new system should be used (see appendix c). the teams also held training sessions focused on providing opportunities for staff to explore and practice using the new products. although the implementation of libanswers with libchat was not necessary to upgrade to libguides version 2, undertaking all of these upgrades at once allowed the ad hoc groups to collaborate with ease, define policies and procedures that would help these products integrate seamlessly with existing services, and prevent change fatigue within the library. updating the library website the final element of migration and implementation the teams had to consider was integration into the library’s existing web site. many elements of the library’s site are dictated by the broader university web policy and content management system. however, working within guidelines the teams were able to take advantage of the new libguides interface, especially the more robust a-z list of databases, to provide users with multiple ways of accessing the new tools. the library makes use of a tabbed box to provide entry to summon, the catalog, the list of databases and libguides. the new functionality of libguides version 2 enabled the team to provide easier access directly to the alphabetical listing of databases. the libguides tab was also updated to provide a drop down list of all the guides and a link to browse by guide owner, subject or type of guide. these enhancements saved time for the user and cut down on the number of clicks needed to access database content licensed by the library. 
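the subject, type, and "best bets" tagging gathered through the shared spreadsheets, and the alphabetical database listing exposed on the web site, together amount to a small structured data set. the python sketch below is a hypothetical illustration of how such a tagged a-z list could be sorted so that best-bet databases surface first for a subject; the database names, subjects, and tags are invented and are not the nimitz library's actual data.

```python
# hypothetical a-z data; names, subjects, and tags are invented for illustration.
databases = [
    {"name": "database a", "subjects": ["history"], "types": ["primary sources"],
     "best_bet_for": ["history"]},
    {"name": "database b", "subjects": ["history", "political science"],
     "types": ["journal articles"], "best_bet_for": []},
    {"name": "database c", "subjects": ["mechanical engineering"],
     "types": ["technical reports"], "best_bet_for": ["mechanical engineering"]},
]

def databases_by_subject(subject):
    """list databases for a subject, tagged 'best bets' first, then alphabetical."""
    matches = [d for d in databases if subject in d["subjects"]]
    matches.sort(key=lambda d: (subject not in d["best_bet_for"], d["name"]))
    return [d["name"] for d in matches]

print(databases_by_subject("history"))  # ['database a', 'database b']
```

sorting on the pair (not a best bet, name) keeps the tagged best bets at the top while leaving the rest alphabetical, mirroring the more prominent display described above.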
up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 73 integrating the libanswers product into the site was achieved by providing several different ways for patrons to access it. an faq tab was added to the main tabbed box to provide quick access to libanswers, complete with a link to submit questions. the “contact us” section on the site home page was updated to include a link to libanswers as well as newer, more modern icons for the different contact methods. all guide owners were instructed to update the contact information on their guides to include a libanswers widget. a great source of inspiration on integrating the tools into the library site came from looking at other library web sites. the teams worked from the list of libguides community members provided on the springshare help site and by viewing the sites of known peer libraries. working through an unfamiliar web site can be a quick way to find design ideas and work flows that are successful and attractive. team members found wording, icons and placement ideas that could be adapted for use on the nimitz library site. advice for managing a short migration timeline while on a short implementation timeline or with a small staff that has to accomplish this project in addition to their regular duties, it's important to consider a few strategies that can make the process simpler and less stressful. first, communicate expectations with everyone involved in the project at all steps of the process. determine which stakeholders need to know about the various checklists and upcoming deadlines. communicating needs and expectations throughout the entirety of the project reduces confusion and enables teams and individual guide owners to complete the project on time. although libguides had predominantly been the domain of the nimitz reference department, projects of this scale also impacted other parts of the library, from systems to the electronic resources librarian. email communication and short notices in the library’s weekly staff update were the primary means of communication with stakeholders. documents were shared via google drive to provide guide owners with a centralized file of help materials. also, the point of contact for questions with each element of the migration was clearly identified on each checklist and tip sheet. this single addition to the checklists helped guide owners to quickly and easily get questions and technical issues addressed. on a short timeline it is also important to consider the elements that are crucial for completion and those that can be delayed. some critical needs in a libguides migration include deleting guides that are no longer being used, checking for boxes that will not migrate and deleting bad links. these tasks must be completed by guide owners or administrators to ensure that the migrated data formats properly. careful attention to these tasks also save the staff unnecessary work on updating and fixing the new guides before going live. other elements of guide design and migration are merely nice to have. they complement the user’s experience with the final product but neglecting them will not affect basic functionality. these secondary tasks can be completed as time allows. 
for guide owners, optional tasks include shortening link descriptions, checking for a up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 74 guide description and friendly url and other general updates to the guides. the migration was broken into manageable tasks by giving guide owners a clear list of required and optional items. team leaders will also need to manage expectations. it can be difficult to remember that web pages, especially libguides, are living documents. they can be updated fairly easily after the system has gone live. on a short timeline, in the midst of other duties and responsibilities, it is acceptable for a guide to be just good enough. there is rarely enough time for each guide to reach a state of perfection prior to going live. a guide that is spell-checked and contains accurate information can be edited and made more aesthetically pleasing as time allows after the entire site has gone live. while additional edits are taking place, students still have access to the information they need for their academic work. lists, such as the subjects and material types in the a-z list, are always a work in progress based on feedback from service points and usability testing. updates and edits should be made as patrons interact with the products. regular use can help library staff identify problems with or confusion about the products that might not be anticipated prior to going live. stress on guide owners can be greatly reduced by communicating expectations throughout the process. post-implementation nimitz library successfully went live with both libguides version 2 and libanswers with libchat in early january 2016, right before midshipmen returned to campus for the spring semester. libanswers with libchat was introduced to the campus community with a soft launch at the beginning of the spring semester due to staffing levels and shifts at the reference desk. the librarian on duty at the reference desk was also responsible for answering any chats or libanswers questions initiated during their shift. the volume of questions remained fairly low during the semester. on average, the library received two synchronous and 1.5 asynchronous incoming questions per week via libanswers with libchat. the low volume was beneficial in that it allowed librarians to become familiar with answering questions and editing faqs. they were able to handle both face-to-face interactions with patrons in the library and the web traffic. however, the volume was so low that it became apparent more marketing of the service was needed. at the start of the fall 2016 semester, the library made an effort to increase awareness of the new libanswers products by emailing all students, mentioning the service in every instruction session, and creating fliers advertising the service and distributing them around the library. though data is preliminary, statistics have shown that use of these services has more than tripled in the first month of the new semester. as discussed above, the expedited implementation timeline forced the ad hoc teams to prioritize completion of the tasks that were necessary for the system to remain functional after the upgrade. this meant other necessary, but not urgent, updates to guides were left untouched during the migration. 
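one of the critical pre-migration tasks named above is deleting bad links. as a rough illustration of how a guide owner or administrator might produce that kind of bad-link report outside the platform, here is a minimal python sketch using only the standard library; the urls are placeholders, and this is not springshare's built-in link checker or the report the nimitz team actually received.

```python
# minimal bad-link report using only the standard library; urls are placeholders.
import urllib.error
import urllib.request

guide_links = [
    "https://example.edu/database-one",
    "https://example.edu/old-tutorial",
]

def check(url, timeout=10.0):
    """return a one-line status for a guide link."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"{url}\tok ({resp.status})"
    except urllib.error.HTTPError as err:
        return f"{url}\tbroken (http {err.code})"
    except OSError as err:  # urlerror, dns failures, timeouts, etc.
        return f"{url}\tunreachable ({err})"

for line in map(check, guide_links):
    print(line)
```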
given the amount of effort needed to prepare the guides for migration, it is understandable that guide owners had grown tired of making libguides updates and found it up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 75 necessary to move on to other projects. with this fatigue in mind, the team leaders will continue to remind guide authors that libguides are living pages in need of constant attention. the team leaders will also take advantage of user feedback to promote continued updates to libguides. throughout the migration process team leaders solicited feedback from staff and users in a variety of ways. first, reference staff wereinformed of design and implementation changes made throughout the migration. they were given time to view and evaluate the master guide template prior to the migration. the team solicited feedback on the names and organization of categories in the a-z list. after the products went live, the team gathered informal feedback through reference desk interviews, in information literacy instruction sessions and in conversations with faculty and students. student volunteers participated in usability testing during the spring semester. they were asked to complete a series of tasks related to the different aspects of the new interface. their feedback, especially from thinking aloud while completing the tasks, revealed to librarians how students actually use the guides. both formal and informal feedback helped librarians adapt and improve the guides. based on the feedback, the systems librarian made global changes to improve system functionality. in one instance, users were having difficulty submitting a new libanswers question when they could not find an appropriate faq response. the systems librarian made the “submit your question” link more prominent for users in that situation. the libguides continue to be evaluated by staff for currency and ease of use. in discussing the first round of usability test results it was determined that more testing during the fall semester of 2016 would be helpful. during the upgrade to version 2 and implementation of libanswers with libchat, librarians focused on the functions in the system that were most essential or most desired. all of these products contain additional functionality that was not implemented during the upgrade. after a brief rest, the reference department and library web team explored the products’ additional functionality and determined what avenues to explore next. conclusions migration of any platform can be an extensive and time consuming task for library staff. preparations and post-migration clean up can interrupt staff workflows and strain limited resources. using migration teams was a successful strategy on a short timeline because it helped spread the workload by delegating specific learning and tasks to specific people. those people, in turn, became experts in their area of focus and served as a resource for others in the library. this model cultivated a sense of ownership in the migration across many stakeholders that might not have otherwise existed. that sense of ownership in the project, coupled with checklists and spreadsheets full of discrete tasks in need of completion made it possible for a small staff to complete the migration quickly and successfully. migrating on a short timeline can be especially stressful but careful planning and good communication of expectations helps stakeholders focus on the end goal. 
upon completion of the project there was a very real sense of fatigue with this up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 76 project. as a result, tasks that were listed as optional because they weren’t critical for migration went unattended for quite some time after the migration. slowly, months later, guide owners are ready to revisit guides and continue making improvements. if given more time, this migration may have been completed more methodically and with the intent of having everything perfect before moving on to the next step. instead, working on a tight timeline forced us to continue moving forward, making necessary changes, and making note of changes to be made in the future. ultimately, it was a constant reminder that our online presence is and should be a constant work in progress, not the subject of a big, occasional update. references 1. luke f. gadreau, “migration checklist for guide owners,” last modified april 3, 2015, https://wiki.harvard.edu/confluence/display/lg2/migration+checklist+for+guide+owners; leeanne morrow et al., “best practice guide for libguides,” accessed november 17, 2016, http://libguides.ucalgary.ca/c.php?g=255392&p=1703394; rebecca payne, “updating libguides & preparing for libguides v2,” last modified november 18, 2014, https://wiki.doit.wisc.edu/confluence/pages/viewpage.action?pageid=85630373; julia furay, “libguides presentation: migrating from v1 to v2 (julia),” last modified september 29, 2015, http://guides.cuny.edu/presentation/migration. 2. anna burke, “libguides 2: content migration is here!” last modified april 30, 2014, http://blog.springshare.com/2014/04/30/libguides-2-content-migration-is-here/; springshare, “on your checklist: five tips & tricks for migrating to libguides v2,” last modified february 18, 2016; http://buzz.springshare.com/springynews/news-27/springytips; springshare, “migrating to libguides v2(and going live!),” last modified november 7, 2016, http://help.springshare.com/libguides/update/whyupdate. 3. lauren mckeen and john hernandez, “moving mountains: surviving the migration to libguides 2.0,” online searcher 39 (2015): 16-21, http://www.infotoday.com/onlinesearcher/articles/features/moving-mountains-survivingthe-migration-to-libguides--102367.shtml. 4. vicky duncan et al., “implementing libguides 2: an academic case study,” journal of electronic resources librarianship, 27 (2015): 248-258, https://dx.doi.org/10.1080/1941126x.2015.1092351 5. jimmy ghaphery and erin white, “library use of web-based research guides,” information technology and libraries 31 (2012): 21-31, http://dx.doi.org/10.6017/ital.v31i1.1830; danielle a becker; “libguides remakes: how to get the look you want without rebuilding your website,” computers in libraries 34 (2014): 19-22, http://www.infotoday.com/cilmag/jun14/index.shtml; michal strutin, “making research guides up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 77 more useful and more well used,” issues in science and technology librarianship 55(2008), https://dx.doi.org/10.5062/f4m61h5k. 6. ning han and susan l. hall, “think globally! 
enhancing the international student experience with libguides,” journal of electronic resources librarianship 24(2012): 288-297, https://dx.doi.org/10.1080/1941126x.2012.732512; gabriela castro gessner et al., “are you reaching your audience?: the intersection between libguide authors and libguide users,” reference services review 43(2015): 491-508, http://dx.doi.org/10.1108/rsr-02-2015-0010. 7. luigina vileno, “testing the usability of two online research guides,” partnership: the canadian journal of library and information practice and research 5 (2012), https://dx.doi.org/10.21083/partnership.v5i2.1235; rachel hungerford et., “libguides usability testing: customizing a product to work for your users,” http://hdl.handle.net/1773/17101; alec sonsteby and jennifer dejonghe, “usability testing, user-centered design, and libguides subject guides: a case study,” journal of web librarianship 7(2013): 83-94, https://dx.doi.org/10.1080/19322909.2013.747366. 8. mardi mahaffy, “student use of library research guides following library instruction,” communications in information literacy 6(2012): 202-213, http://www.comminfolit.org/index.php?journal=cil&page=article&op=view&path%5b%5d=v 6i2p202. 9. kate a pittsley and sara memmot, “improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides,” information technology and libraries 31 (2012): 52-64, https://dx.doi.org/10.6017/ital.v31i3.1880. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 78 appendix a: libguides pre-migration checklist if there are issues, contact the head of reference & instruction. required before migration: due date task check when complete 26 october 2015 review attached report of guides that have not been updated in the last year. delete or consolidate unneeded, practice, or backup guides.* 26 october 2015 review attached report of guides with fewer than 500 hits. delete or consolidate unneeded, practice, or backup guides.* 26 october 2015 review all links to all databases included on your guides and make sure the links are mapped to the a-z list. 26 october 2015 review all guides for links not included in the current a-z list. list any links that you think should be included in the a-z list moving forward on the shared spreadsheet (a-z additions and best bets). be sure to include all necessary information, including subject and type. mid october 2015 & 28 october 2015 review forthcoming reports about broken links. anticipate one report on october 13, and one october 26. 26 october 2015 review the databases by subject page of the a-z list and make sure everything that should be included in your subject is there. add anything you’d like removed from your subject to up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 79 the shared spreadsheet (tab 2). identify 3 “best bets” databases for each of your subject areas on the shared spreadsheet (tab 3). 26 october 2015 ensure all images have an alt tag 26 october 2015 delete outdated or unused images in your image collection 26 october 2015 convert all tables to percentages, not pixels 26 october 2015 review attached report of boxes that will not migrate into version 2. 
(this won’t apply to everyone) 26 october 2015 email the chair of the web team if you have guides with boxes containing custom formatting or code (this is only necessary if you manually adjusted the html or css, or use a tabs within a box on your guide). we are keeping a master list to double check after migration. 26 october 2015 check all links to the catalog in your guides to make sure they are accurate 26 october 2015 check all widgets (like catalog search boxes) to ensure they function properly, delete any widgets you don’t need, and keep a list of widgets to check post-migration to make sure they still function. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 80 optional before migration: due date task check when complete consider turning links in ‘rich text’ boxes to a ‘links and list’ box. review all guides to ensure they have a friendly url, are assigned to a subject, have assigned tags, and a brief guide description. shorten database descriptions to one to two sentences. consider including dates of coverage and why it’s useful for this particular subject. helpful hints: *if you’d like to hold on to content from guides you plan to delete, create an unpublished “master guide” where you can store content you plan to use in the future. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 81 appendix b: libguides post-migration checklist and guide clean up note: now that migration is complete, if you make an update to your version 1 guides, your change will not transfer to version 2. this means broken links will need to be fixed in both versions. if there are issues or questions contact the head of reference & instruction (general questions), the systems librarian (technical issues), or the electronic resources librarian (database assets and a-z list). clean up and check content 1) check boxes to make sure content is correctly displayed on all your guides. check all boxes closely, as some had the header end up below the first bullet point. for example: to fix an issue like this click on at the bottom of the box you are working on. then click on “reorder content”. you can move the links down and the text up 2) ensure all guides have a friendly url, are assigned to a subject, have assigned tags if you didn’t do this pre-migration. see the premigration handout for help. in version 2 this information will display at the top of our guides in edit mode and at the bottom of our guides on the public interface. 3) ensure images are resized to fit general web guidelines see this guide for help http://guidefaq.com/a.php?qid=12922 4) check all your widgets to ensure they still function properly 5) add a guide type to each of your guides. this is a new feature in libguides version 2. it is under the gear on the right side of your guide while in edit mode. this will help us sort and organize them in the list of guides. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 82 add new libguides 2 content 1) make a box pointing to related guides. research has shown that a box on the guide home page pointing to related guides can be very helpful to students. link to other subject guides that would be of interest and any course guides for that subject. 
for example: the box on the mechanical engineering guide contains links to em215 and nuclear engineering (which is part of the mechanical engineering department). to do this go to the bottom of your welcome box, click the add/reorder button, and then on guide list, your first option is to manually choose guides to add to the list. 2) add a tab to every guide that is named citing your sources and redirects to the citing your sources libguide. to do this: a. create a blank page named citing your sources at the bottom of your left side navigation b. on your blank page click on the to open the options for editing the page. c. click on redirect url and paste the link to the citing sources guide in the box. d. it is also a good idea to mark the open in a new window box as well e. if you’ve completed it successfully your citing your sources tab will look like this in edit mode. since the citing your sources guide is still a work in progress it is unpublished and you will get an error when you preview it. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 83 f. finally, remove the plagiarism and citing sources box from your guides. 3) now is a good time to take advantage of new functionality and to update the content of your guides. you can now combine multiple types of information into the same box, you can also take advantage of tabbed boxes. see this libguide for further assistance: http://support.springshare.com/libguides/migration/v2cleanup-regular 4) create your new profile box at the meeting on oct 20th, the reference & instruction department agreed that the following elements should be consistent in the profile box: box name: librarian image: a stock photo or a personal photo (picture day coming soon) in the contact box: title nimitz library xxxx dept. office # xxx 410-293-xxxx email address and your subjects will be displayed below up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 84 appendix c: tips & guidelines for libanswers with libchat what modes of inquiry will be available to users? using the libanswers platform, users will be able to submit questions via chat or by using the question form within libanswers. users will also be able to ask questions as they did before: at the reference desk, via askref@usna.edu, and by calling 410-293-2420. what are “best practices” or guidelines for libanswers w/ libchat? see the tips for responding to tickets at the bottom of this document. see the tips for creating/maintaining faq at the bottom of this document. see the tips for responding to chat questions at the bottom of this document. what priority should i give responses coming through various modes of inquiry? reference staff will have to use their professional judgement when deciding what priority to give questions coming in through various modes of inquiry. while the addition of chat and tickets may seem overwhelming at first, the same rules you’ve applied in the past will work. if a chat comes in while you’re helping someone face-to-face, use that as an opportunity to advertise the chat service. explain to the patron that you also help users via chat and you’re going to let the chatter know that you’ll be with them shortly. the same can apply if you’re finishing up a chat when a face-to-face user walks up. simply explain that the library also offers a chat service and you’re just finishing up a question. 
remember to get comfortable with and take advantage of the canned messages in chat, let the phone go to voicemail if necessary, and explain to face-to-face users what’s happening. during the pilot phase you should also keep track of strategies that worked well for you, or times when the various modes of inquiry became too overwhelming. we’ll take all of that into consideration when we reexamine this service. chat, phone, and face-to-face interactions are synchronous modes of communication, so users expect responses immediately. tickets are asynchronous modes of communication and should be dealt with on a first come, first served basis. respond to tickets when you have time. when responding to tickets, respond to the oldest tickets first as that user has been waiting the longest for an answer. however, feel free to use your judgement and, if you choose, respond to questions with quick answers right away. how should i prioritize questions from usna v. non-usna users? priority should be given to midshipmen, faculty, and staff. if an outside user makes use of the chat or ticket service, feel free to explain to them that this service is primarily for faculty/staff/students and they should direct their question to askref@usna.edu. if you are free and have time, feel free to assist outside patrons via the chat or ticket system. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 85 how should i handle remaining questions during a change in shifts? handle them in the same manner that you would a face-to-face question with a student, faculty, or staff member. finish up quickly if you can, advise the patron that you need to leave and offer to handle the question when you return, or transfer the chat to another librarian. if there are remaining tickets in the queue, simply notify the next librarian on duty. what are the expected turnaround times for responding to patron inquiries? chat, face-to-face, and phone inquiries should be responded to as immediately as possible. tickets should be responded to within a business day. who can i contact for help and troubleshooting? if you have questions, your first stop should be the libanswers faq, provided by springshare (available in the “help” section when logged into libapps). if you can’t find the answer to your question there, feel free to contact the head of reference and instruction, who will work to resolve the problem with you. guidelines for responding to libanswers tickets*: ● keep in mind that when you are responding to tickets, you are a jack of all trades. that means even if the question is outside of your subject area, you should do your best to provide the user information that will get them started. in that email you may also suggest that the user contact the subject specialist. ● respond to libanswers tickets in the same way you would respond to an email inquiry from a user. ● if you provide a factual response, be sure to include the source from which that information came. guidelines for creating/maintaining faqs*: ● the faq database is a public-facing, searchable collection of questions and answers. the intent is to empower our users to find their answers. any question that might be considered a frequently asked question should be included in the faq. this might include questions about the library, the collections, how to find specific types of information, how to start research on specific and recurring assignments etc. 
● when creating an faq from a ticket, remember that you can edit the question. do your best to format the question in a way that would be applicable and relevant to the most users. ● when creating an faq from a response you've already written, be sure to edit out any personally identifiable information (pii) about the person who initially asked the question. be sure to check the question and response for any pii. ● if you want to modify an faq: if a member of the staff notices incomplete or incorrect information in an faq response, he/she should use professional judgement in deciding how to handle the situation. if it's an error that may have been caused by a typo, he/she may choose to edit the response immediately. however, if the edit impacts the substantive content of the response, he/she may choose to consult with the librarian who initially wrote the response. guidelines for libchat*: ● if you refer a question, alert the librarian to whom the user is being referred. ● remember the person you're chatting with can't see you, so if you leave (to conduct a search, to check a book, to help someone else, etc.) let them know you'll be right back. ● sometimes chat questions can seem rushed, so it may be tempting to answer only the initial question. remember, like face-to-face interactions, clarifying queries save time for the user and the librarian, allowing for the provision of more accurate and efficient answers. ● when providing responses, remember that as an academic library, our mission is to provide the information needed and to instruct our users so they may become self-reliant; chat challenges us to balance providing answers and instruction. do your best to find an appropriate balance. ● as the transaction is ending, remain courteous, check that all the user's questions have been addressed, and encourage them to use the service again. * note: these guidelines are drafts and will evolve as the staff learns more about this system throughout the pilot phase. name-title entry retrieval from a marc file. philip l. long, head, automated systems research and development, and frederick g. kilgour, director, ohio college library center, columbus, ohio. a test was made of the validity of earlier findings on 3,3 search-key retrieval from an in-process file, applied here to retrieval from a marc file. the probability of the number of entries retrieved per reply is essentially the same for both files. this study was undertaken to test the applicability of previous findings on retrieval of name-title entries from a technical processing system file (1) to retrieval from a marc file; the technique for retrieval employs truncated 3,3 search keys. materials and methods. the study cited above employed a file of 132,808 name-title entries obtained from the yale university library's machine-aided technical processing system. bibliographic control was not maintained for the generation of records in this file, with the result that the file contained errors that simulated errors in the requests library users put to catalogs. the marc file employed in the present study contains 121,588 name-title entries that are nearly error free. whereas the marc file possesses few records bearing foreign titles, the yale file has a significantly higher percentage of such titles, as would be expected for a large university library.
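the truncated 3,3 search key at the center of both studies pairs the first three characters of the name with the first three characters of the title. the python sketch below illustrates one plausible derivation; the normalization shown (lowercasing, dropping punctuation, stripping only english initial articles) is an assumption for illustration, and the exact rules used in the yale and marc programs may have differed.

```python
# approximate sketch of a truncated 3,3 name-title search key; the exact
# normalization used in the yale and marc programs may have differed.
ENGLISH_ARTICLES = ("the ", "a ", "an ")

def normalize(text):
    """lowercase and keep only letters, digits, and spaces."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch == " ").strip()

def search_key_3_3(name, title):
    title = normalize(title)
    for article in ENGLISH_ARTICLES:  # only english initial articles, as in the marc test
        if title.startswith(article):
            title = title[len(article):]
            break
    name = normalize(name).replace(" ", "")
    title = title.replace(" ", "")
    return f"{name[:3]},{title[:3]}"

print(search_key_3_3("kilgour, frederick g.", "the economics of book catalog production"))
# -> 'kil,eco'
```

grouping a file of entries by such keys and counting how many entries share each key is what produces the entries-per-reply distribution reported in table 1 below.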
initial articles were deleted in yale titles, but only english articles in marc titles, because the language of foreign-language titles is not identified in marc. design of the program used to analyze the marc file was the same as that for the program employed in the previous study. however, the new program runs on a xerox data systems sigma 5 computer. the test employed the 3,3 search key to make possible comparison with previous results. results. table 1 presents the percentage of time that up to five replies can be expected, assuming equal likelihood of key choice. inspection of the table reveals that there is no significant difference between the findings from the yale and the marc files.

table 1. probability of number of entries per reply using 3,3 search key

number of replies    cumulative probability percentage
                     yale file    marc file
1                    78.58        79.98
2                    92.75        93.28
3                    96.83        96.93
4                    98.40        98.26
5                    99.08        98.91

discussion. the same result was expected for the marc file that had been obtained earlier from the yale file. possible influences that might have led to different results were the existence of errors in the yale file, a significant proportion of foreign titles in the yale file as compared to the nearly all-english marc file, and the inability to mechanically delete the initial articles in the few foreign-language marc titles. it is most unlikely that the effects of these differences are masking one another. conclusion. the findings of a previous study on the effectiveness of retrieval of entries from a large bibliographic file (1) by use of a truncated 3,3 search key have been confirmed for a similarly large marc file. reference. 1. kilgour, frederick g.; long, philip l.; leiderman, eugene b.: "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7 (1970): 79-81. technology skills in the workplace: information professionals' current use and future aspirations. monica maceli and john j. burke, information technology and libraries | december 2016. abstract. information technology serves as an essential tool for today's information professional, and ongoing research is needed to assess the technological directions of the field over time. this paper presents the results of a survey of the technologies used by library and information science practitioners, with attention to the combinations of technologies employed and the technology skills that practitioners wish to learn. the most common technologies employed were email, office productivity tools, web browsers, library catalog- and database-searching tools, and printers, with programming topping the list of most-desired technology skills to learn. similar technology usage patterns were observed for early- and later-career practitioners. findings also suggested the relative rarity of emerging technologies, such as the makerspace, in current practice. introduction. over the past several decades, technology has rapidly moved from a specialized set of tools to an indispensable element of the library and information science (lis) workplace, and today it is woven throughout all aspects of librarianship and the information professions.
information professionals engage with technology in traditional ways, such as working with integrated library systems, and in new innovative activities, such as mobile-app development or the creation of makerspaces.1 the vital role of technology has motivated a growing body of research literature, exploring the application of technology tools in the workplace, as well as within lis education, to effectively prepare tech-savvy practitioners. such work is instrumental to the progression of the field, and with the rapidly-changing technological landscape, requires ongoing attention from the research community. one of the most valuable perspectives in such research is that of the current practitioner. understanding current information professionals’ technology use can help in understanding the role and shape of the lis field, provide a baseline for related research efforts, and suggest future monica maceli (mmaceli@pratt.edu) is assistant professor, school of information, pratt institute, new york. john j. burke (burkejj@miamioh.edu) is library director and principal librarian, gardner-harvey library, miami university middletown, middletown, ohio. technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 36 directions. the practitioner perspective is also valuable in separating the hype that often surrounds emerging technologies from the reality of their use and application within the lis field. this paper presents the results of a survey of lis practitioners, oriented toward understanding the participants’ current technology use and future technology aspirations. the guiding research questions for this work are as follows: 1. what combinations of technology skillsets do lis practitioners commonly use? 2. what combinations of technology skillsets do lis practitioners desire to learn? 3. what technology skillsets do newer lis practitioners use and desire to learn as compared to those with ten-plus years of experience in the field? literature review the growth and increasing diversity of technologies used in library settings has been matched by a desire to explore how these technologies impact expectations for lis practitioner skill sets. triumph and beile examined the academic library job market in 2011 by describing the required qualifications for 957 positions posted on the ala joblist and arl job announcements websites.2 the authors also compared their results with similar studies conducted in 1996 and 1988 to see if they could track changes in requirements over a twenty-three-year period. they found that the number of distinct job titles increased in each survey because of the addition of new technologies to the library work environment that require positions focused on handling them. the comparison also found that computer skills as a position requirement increased by 100 percent between 1988 and 2011, with 55 percent of 2011 announcements requiring them. looking more deeply at the technology requirements specifically, mathews and pardue conducted a content analysis of 620 jobs ads from the ala joblist to identify skills required in those positions.3 the top technology competencies required were web development, project management, systems development, systems applications, networking, and programming languages. they found a significant overlap of librarian skill sets with those of it professionals, particularly in the areas of web development, project management, and information systems. 
riley-huff and rholes found that the most commonly sought technology-related job titles were systems/automation librarian, digital librarian, emerging and instructional technology librarian, web services/development librarian, and electronic resources librarian.4 a few years later, maceli added to this list with newly popular technology-relating titles, including emerging technologies librarian, metadata librarian, and user experience/architect librarian.5 beyond examining which specific technologies librarians should be able to use, researchers have also pondered whether a list of skills is even possible to create. crawford synthesized a series of blog posts from various authors to discuss which technology skills are essential and which are too specialized to serve as minimum technology requirements for librarians.6 he questioned whether universal skill sets should be established given the variety of tasks within libraries and the unique backgrounds of each library worker. crawford also questioned the expectation that every librarian information technology and libraries | december 2016 37 will have a broad array of technology skills from programming to video editing to game design and device troubleshooting. partridge et al. reported on a series of focus groups held with 76 librarians that examined the skills required for members of the profession, especially those addressing technology.7 in the questions they asked the focus groups, the authors focused on the term “library 2.0” and attempted to gather suggestions on skills that current and future librarians need to assist users. they concluded that the groups identified that a change in attitudes by librarians was more important to future library service than the acquisition of skills with specific technology tools. importance was given to librarians’ abilities to stay aware of technological changes, be resilient and reflective in the face of them, and to communicate regularly and clearly with the members of their communities. another area examined in the studies is where the acquisition of technology skills should and does happen for librarians. riley-huff and rholes reported on a dual approach to measure librarians’ preparation for performing technology-related tasks.8 the authors assessed course offerings for lis programs to see if they included sufficient technology preparation for new graduates to succeed in the workplace. they then surveyed lis practitioners and administrators to learn how they acquired their skills and how difficult it is to find candidates with enough technology preparation for library positions. their findings suggest that while lis programs offer many technology courses, they lack standardization, and graduates of any program cannot be expected to have a broad education in library technologies. further research confirmed this troubling lack of consistency in technology-related curricula. singh and mehra assessed a variety of stakeholders, including students, employers, educators, and professional organizations, finding widespread concern about the coverage of technology topics in lis curricula.9 despite inconsistencies between individual programs, several studies provided a holistic view of the popular technology offerings within lis curricula. 
programs commonly offered one or more introductory technology courses, as well as courses in database design and development, web design and development, digital libraries, systems analysis, and metadata.10,11,12 as researchers have emphasized from a variety of perspectives, new graduates could not realistically be expected to know every technology with application to the field of information.13 there was widespread acknowledgement that learning in this area can, and must, continue in a lifelong fashion throughout one’s career. riley-huff and rholes reported that lis practitioners saw their own experiences involving continuing skill development on the job, both before and after taking on a technology role.14 however, literature going back many decades suggests that the increasing need for continuing education in information technology has generally not been matched by increasing organizational support for these ventures. numerous deterrents to continuing technology education were noted, including lack of time,15 organizational climate, and the perception of one’s age.16 while studies in this area have primarily focused on mls-level positions, jones reported on academic library support staff members and their perceptions of technology use over a ten-year period and found that increased technology responsibilities added technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 38 to workloads and increased workplace stress.17 respondents noted that increasing use of technology in their libraries has increased their individual workloads along with the range of responsibilities that they hold. method to build an understanding of the research questions stated above, which focus on the technologies currently used by information professionals and those they desired to learn, we designed and administered a thirteen-question anonymous survey (see appendix) to the subscribers of thirty library-focused electronic discussion groups between february 25 and march 13, 2015. the groups were chosen to target respondents employed in multiple types of libraries (academic, public, school, and special) with a wide array of roles in their libraries (public services librarians, systems staff members, catalogers, and so on). we solicited respondents with an email sent to the groups asking for their participation in the survey and with the promise to post initial results to the same groups. the survey included closed and open-ended questions oriented toward understanding current technology use and future aspirations as well as capturing demographics useful in interpreting and generalizing the results. the survey questions have been previously used and iteratively expanded over time by the second author, first in the fall of 2008, then spring of 2012, with summative results presented in the last three editions of the neal-schuman library technology companion. we obtained a total of 2,216 responses to the question, “which of the following technologies or technology skills are you expected to use in your job on a regular basis?” of these responses, 1,488 (67 percent) of the respondents answered the question regarding technologies they would like to learn: “what technology skill would you like to learn to help you do your job better?” we conducted basic reporting of response frequency for closed questions to assess and report the demographics of the respondents. 
to analyze the open-ended survey question results in greater depth, we conducted a textual analysis using the r statistical package (https://www.r-project.org/). we used the tm (text mining) package in r (http://cran.rproject.org/package=tm) to calculate frequency, correlation of terms, generate plots, and cluster terms. results the following section will first present an overview of survey responses and respondents, and then explore results as related to the stated research questions. the lis practitioners who responded to the survey reported that their libraries are located in forty us states, eight canadian provinces, and forty-three other countries. academic libraries were the most common type of library represented, followed by public, school, special, and other (see table 1).

table 1. the types of libraries in which survey respondents work
library type | number of respondents | percentage of all respondents
academic | 1,206 | 54.4
public | 545 | 24.6
school | 266 | 12.0
special | 138 | 6.2
other | 61 | 2.8

respondents also provided their highest level of education. a total of 77 percent of responding lis practitioners have earned a library-related or other master's degrees, dual master's degrees, or doctoral degrees. from these reported levels of education, it is likely that more respondents are in librarian positions than in library support staff positions. however, individuals with master's degrees serve in various roles in library organizations, so the percentage of graduate degree holders may not map exactly to the percentage of individuals in positions that require those degrees. significantly fewer respondents (16 percent) reported holding a high school diploma, some college credit, an associate degree, or a bachelor's degree as their highest level of education. another aspect we measured in the survey was tasks that respondents performed on a regular basis. the range of tasks provided in the survey allowed for a clearer analysis of job responsibilities than broad categories of library work such as "public services" or "technical services." some respondents appeared to be employed in solo librarian environments where they are performing several roles. even respondents who might have more focused job titles such as "reference librarian" or "cataloger" may be performing tasks that overlap traditional roles and categories of library work. the tasks offered in the survey and the responses to each are shown in table 2.
table 2. tasks performed on a regular basis by survey respondents
task | number of respondents | percentage of respondents
reference | 1,404 | 63.4
instruction | 1,296 | 58.5
collection development | 1,260 | 56.9
circulation | 917 | 41.4
cataloging | 905 | 40.8
electronic resource management | 835 | 37.7
acquisitions | 789 | 35.6
user experience | 775 | 35.0
library administration | 769 | 34.7
outreach | 758 | 34.2
marketing/public relations | 722 | 32.6
library/it systems | 672 | 30.3
periodicals/serials | 659 | 29.7
media/audiovisuals | 566 | 25.5
interlibrary loan | 518 | 23.4
distance library services | 474 | 21.4
archives/special collections | 437 | 19.0
other | 209 | 9.4

while public services-related activities lead the list, with reference, instruction, collection development, and circulation as the top four task areas, technical services-related activities are well represented; the next three in rank are cataloging, electronic resource management, and acquisitions. the overall list of tasks shows the diversity of work lis practitioners engage in, as each respondent chose an average of six tasks. the results also suggest that the survey respondents are well acquainted with a wide variety of library work rather than only having experience in a few areas, making their uses of technology more representative of the broader library world. the survey also questioned the barriers lis practitioners face as they try to add more technology to their libraries, and 2,161 respondents replied to the question, "which of the following are barriers to new technology adoption in your library?" financial considerations proved to be the most common barrier, with "budget" chosen by 80.7 percent of respondents, followed by "lack of staff time" (62.4 percent), "lack of staff with appropriate skill sets" (48.5 percent), and "administrative restrictions" (36.7 percent). what combinations of technology skillsets do lis practitioners commonly use? responses from survey question 8, "which of the following technologies or technology skills are you expected to use in your job on a regular basis?," were analyzed to build an understanding of this research question. a total of 2,216 responses to this question were received. survey respondents were asked to select from a detailed list of technologies/skills (visible in question 8 of the appendix) that they regularly used. the top answers respondents chose for this question were: email, word processing, web browser, library catalog (public side), and library database searching. the full list of the top twenty-five technology skills and tools used is detailed in figure 1, with the list of the bottom fifteen technology skills used presented in figure 2.

figure 1. top twenty-five technology skills/tools used by respondents (n = 2,216), in descending order of use: email, word processing, web browser, library catalog public side, library database searching, spreadsheets, printers, web searching, teaching others to use technology, presentation software, windows os, laptops, scanners, library management system staff side, downloadable ebooks, web based ebook collections, cloud based storage, technology troubleshooting, teaching using technology, online instructional materials/products, tablets, web video conferencing, educational copyright knowledge, library website creation or management, and cloud-based productivity apps

figure 2. bottom fifteen technology skills/tools used by respondents (n = 2,216): mac os, audio recording and editing, technology equipment installation, computer programming or coding, assistive adaptive technology, rfid, chromebooks, network management, server management, statistical analysis software, makerspace technologies, linux, 3d printers, augmented reality, and virtual reality

text analysis techniques were then used to determine the frequent combinations of technology skills used in practice. first, a clustering approach was taken to visualize the most popular technologies that were commonly used in combination (figure 3). clustering helps in organizing and categorizing a large dataset when the categories are not known in advance, and, when plotted in a dendrogram chart, assists in visualizing these commonly co-occurring terms.
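a minimal r sketch of the clustering step just described, using the tm package named in the method section; the object name skill_text, the sparsity cutoff, and the distance and linkage settings are illustrative assumptions rather than the authors' actual script.

```r
# illustrative sketch: cluster commonly co-occurring technology skills with the tm package.
# assumes skill_text is a character vector, one element per respondent, listing the
# skill tokens each respondent selected (e.g., "email word_processing web_browser printers").
library(tm)

corpus <- VCorpus(VectorSource(skill_text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, stripWhitespace)

# document-term matrix of respondents x skills, keeping only frequently reported skills
dtm <- DocumentTermMatrix(corpus)
dtm <- removeSparseTerms(dtm, 0.90)

# hierarchical clustering on distances between skill columns; the dendrogram groups
# skills that tend to be selected together by the same respondents
skill_matrix <- t(as.matrix(dtm))          # rows = skills, columns = respondents
fit <- hclust(dist(skill_matrix, method = "euclidean"), method = "ward.D2")
plot(fit, main = "co-occurring technology skills")
```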
the authors numbered the clusters identified in figure 3 for ease of reference. from left to right, the first cluster focuses on communication and educational tools, the second emphasizes devices and software, the third contains web and multimedia creation tools, the fourth contains office productivity and public-facing information retrieval tools, and the fifth cluster has a diverse collection of responsibilities including systems-oriented responsibilities (from operating systems to specific hardware devices), working with ebooks, teaching with technology, and teaching technology to others.

figure 3. cluster analysis of most frequent technology skills used in practice, with red outlines on each numbered cluster

notably, the list of top skills used (figure 1) falls more on the end-user side of technology; skills more oriented toward systems work (e.g., linux, server management, computer programming or coding) were less frequently mentioned, and several were among the lowest reported (figure 2). of the 2,216 respondents, 15 percent used programming or coding skills regularly in their job (which is of interest as programming or coding was the skill most desired to learn by respondents; this will be discussed further in the context of the next research question). plotting the correlations between the more advanced technology skillsets can provide a picture of the work such systems-oriented positions are commonly responsible for, particularly as they are less well represented in the responses as a whole. figure 4 plots the correlated terms for those tasked with "server management." it is fair to assume someone with such responsibilities falls on the highly technical end of the spectrum.

figure 4. terms correlated with "server management," indicating commonly co-occurring workplace technologies for highly technical positions

the more common task of "library website creation or management," which fell to those with a broad level of technological expertise, had numerous correlated terms. figure 5 demonstrates a wide array of technology tools and responsibilities.

figure 5. terms correlated with "library website creation or management," indicating commonly co-occurring technologies used on the job

and lastly, teaching using technology and teaching technology to others is a long-standing responsibility of librarians and library staff. the following plot (figure 6) presents the skills correlated with "teaching others to use technology."

figure 6. terms correlated with "teaching others to use technology," indicating commonly co-occurring technologies used on the job

what combinations of technology skillsets do lis practitioners desire to learn? we analyzed responses to survey question 10, "what technology skill would you like to learn to help you do your job better?," to explore this research question.
as summarized in burke18—and consistent with the prior year's findings—coding or programming remained the most desired technology skillset, mentioned by 19 percent of respondents. the raw text analysis yielded a fuller list of the top terms mentioned by participants (table 3 and visualized in figure 7).

table 3. terms mentioned by 5 percent or more of survey respondents
technology term | number of respondents | percentage of respondents
coding or programming (combined for reporting) | 292 | 19.59
web | 178 | 11.96
software | 158 | 10.62
video | 112 | 7.53
apps | 106 | 7.12
editing | 105 | 7.06
design | 85 | 5.71
database | 76 | 5.11

figure 7. wordcloud of responses to "what technology skill would you like to learn to help you do your job better?"

we then explored the deeper context of responses and individually analyzed responses specific to the more popular technology desires. first, we assessed the responses mentioning the desire to learn coding or programming. of these responses, the most common specific technologies mentioned were html, python, css, javascript, ruby, and sql, listed in decreasing order of interest. although most participants did not describe what they would like to do with their desired coding or programming skills, of those that did, the responses indicated interest in
● becoming more empowered to solve their own technology problems (e.g., "i would like to learn the [programming languages] so i don't have to rely on others to help with our website," "i'm one of the most tech-skilled people at my library, but i'd like to be able to build more of my own tools and manage systems without needing someone from it or outside support.");
● improving communication with it (e.g., "how to speak code, to aid in communication with it," "to better identify problems and work with it to fix them");
● creating novel tools and improving system interoperability (e.g., "coding for app and api creation"); and
● bringing new technologies to their library and patrons (e.g., "coding so that i can incorporate a hackerspace in my library").
next, we took a clustering approach to visualize the terms commonly desired in combination. figure 8 describes the clustered terms that we found within the programming or coding responses. the terms "programming" and "coding" form a distinct cluster to the right of the diagram, indicating that many responses contained only those two terms.

figure 8. clustering of terms present in responses indicating the desire to learn coding or programming

the remaining portion of the diagram begins to illustrate the specific technologies mentioned for those respondents that answered in greater detail or expanded on their general answer of programming or coding. other related desired technology-skill areas become apparent: database management, html and css (as well as the more general "web design," which appeared in the top terms in table 3), php and javascript, python and sql, and xml creation, among others. the bulleted list presented in the previous paragraph illustrates some of the potential applications participants envisioned these skills being useful in, but the majority did not provide this level of detail in their response.
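the frequency count behind table 3 and the wordcloud in figure 7 could be produced along these lines; a minimal sketch assuming a character vector learn_text of the open-ended responses and the add-on wordcloud package, with illustrative frequency cutoffs.

```r
# illustrative sketch of the term-frequency count and wordcloud for the open-ended
# "what technology skill would you like to learn..." responses. learn_text and the
# cutoffs below are assumptions for the example, not the published analysis script.
library(tm)
library(wordcloud)

corpus <- VCorpus(VectorSource(learn_text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# terms appearing at least as often as 5 percent of respondents (cf. table 3)
head(freq[freq >= 0.05 * length(learn_text)])

# wordcloud of the most frequently mentioned terms (cf. figure 7)
set.seed(42)
wordcloud(names(freq), freq, min.freq = 25, random.order = FALSE)
```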
editing was another prominent term that appeared across participant responses and was largely meant in the context of video editing. because of the vagueness of the term “editing,” a closer look was necessary to determine other technology desires. looking at terms highly correlated with “editing” revealed both video and photo editing to be important to respondents. several of the topappearing terms were used more generally: “database” and mobile “apps” were mentioned without specifying the technology tool or scenario of use, such that a more contextual analysis could not be conducted. these responses can be particularly difficult to interpret as the term “databases” can have a technical meaning (e.g., working with sql) or it can refer to the use of library databases from an end user perspective. information technology and libraries | december 2016 49 what technology skillsets do newer lis practitioners use and desire to learn as compared to those with ten-plus years experience in the field? of the 2,216 survey responses, 877 stated they had worked in libraries for ten or fewer years. we analyzed these responses separately from the remaining 1,334 respondents who had worked in libraries for more than ten years. of this group, 644 had worked in libraries for twenty-plus years (figure 9). a handful of participants did not answer the question and were omitted from the analysis. figure 9. number of survey responses falling into the various categories for number of years working in libraries the top technology skills used in the workplace did not differ significantly between the different groups. the top skills, as discussed earlier and presented in figure 1, were well represented and similarly ordered. a few small percentage points of difference were noted in a handful of the top skills (figure 10). those newer to the field were slightly more likely to teach others to use technology, use cloud-based storage, and use cloud-based productivity apps. more experienced practitioners regularly used the library management system (on the staff side) more than those that were newer to the field. 0 100 200 300 400 500 600 700 0-2 3-5 6-10 11-15 16-20 21+ technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 50 figure 10. top twenty-five technology skills used by respondents in the zero to ten years’ experience (dark blue) and eleven-plus years experience (light blue) groups for the question regarding technologies they would like to learn, 69 percent of the participants with zero to ten years’ experience answered the question compared to a slightly smaller 65 percent of the participants with more than ten-years’ experience. top terms for both groups were very similar, including coding or programming, software, web, video, design, and editing. these terms were not dissimilar to the responses taken as a whole (table 3), indicating that respondents were generally interested in learning the same sorts of technology skills regardless of how long they had been in the field. 
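a hedged sketch of the experience-group comparison described in this section; the responses data frame, its column names, and the group labels are assumptions for illustration, not the survey's actual field names.

```r
# illustrative sketch: share of each experience cohort whose free-text answer mentions
# coding or programming. assumes a data frame `responses` with a `years_group` factor
# (survey question 7 categories) and a `wants_to_learn` free-text column.
wants_coding <- grepl("coding|programming", tolower(responses$wants_to_learn))

# proportion mentioning coding/programming, by years-in-libraries group
tapply(wants_coding, responses$years_group, mean, na.rm = TRUE)

# collapse into the 0-10 vs. 11+ years split used in the article
newer <- responses$years_group %in% c("0-2", "3-5", "6-10")
round(100 * tapply(wants_coding, ifelse(newer, "0-10 years", "11+ years"), mean), 1)
```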
a few noticeable differences between the two groups emerged. the most popular skills mentioned, coding or programming, were mentioned by 28 percent of the respondents with zero to ten years' experience, and by 15 percent of the respondents with eleven-plus years' experience. there was slightly more interest (by a few percentage points) in databases, design, python, and ruby in the zero to ten years' experience group. a closer look at the different year ranges within the zero to ten years' experience group revealed that those with three to five years of experience were most likely to be interested in learning coding or programming skills.

figure 11. percentage of respondents interested in learning coding or programming in the groups with ten or fewer years' experience

of the participants that answered the question at all, several stated that there were no technology skills they would need or like to learn for their position, either because they were comfortable with their existing skills or were simply open to learning more as needed (but nothing specific came to mind). combined with those who did not answer the question (and so presumably did not have a particular technology they were interested in learning), 28 percent of the zero to ten years' experience group and 31 percent of the eleven-plus years' experience group did not have any technologies that they desired to learn at the moment. discussion as detailed earlier, the most common technologies employed by lis practitioners were email, office productivity tools, web browsers, library catalog and database searching tools, and printers. generally similar technology usage patterns were observed for early and later-career practitioners, and programming topped the list of most-desired technology skills to learn. the cluster analysis presented in figure 3 suggests that a relatively small percentage of practitioners have technology-intensive roles that would require skills such as programming, working with databases, systems administration, etc. rather, the cluster analysis showed common technology skillsets focused on the end-user side of technology tools. in fact, most of the top ten skills used—email, office productivity tools (word processing, spreadsheets and presentation software), web browsers, library catalog and database searching, printers, and teaching others to use technology—are fairly nontechnical in nature. a potential exception is that of teaching technology.
figure 6 suggests that teaching others to use technology entails several hardware devices (for example, laptops, tablets, smartphones, and scanners) as well as online and digital resources, such as ebooks. however, most of the popular skills used would be considered baseline skills for information workers in any domain. as suggested by tennant, programming and other advanced technical skills do not necessarily need to be a core skill for all information professionals, but knowledge of the potential applications and possibilities of such tools is required.19 this idea was echoed by partridge et al., whose findings emphasized the need for awareness and resilience in tackling new technological developments.20 these skills alone would obviously be too little for lis practitioners explicitly seeking a high-tech role, as discussed in maceli.21 however, further research directed toward exploring the mental models and general technological understanding of information professionals would be helpful in understanding the true level of practitioner engagement with technology, to complement the list of relatively low-tech tools employed. programming has been a skill of great interest within the information professions for many years and the respondents’ enthusiasm and desire to learn in this area was readily apparent from the survey results, with nearly 20 percent of participants citing either “programming” or “coding” as a skill they desired to learn. in the context of their current responsibilities, 15 percent of respondents overall mentioned “computer programming or coding” as a regular technological skill they employed (figure 2). there was a slight difference between the librarians with fewer than eleven years of experience—19 percent coded regularly—compared to 13 percent of those with eleven or more years of experience. within the years-of-experience divisions, the newer practitioners were more interested in learning programming, with the peak of interest at three to five years in the workplace (figure 11). the relatively low interest or need to learn programming in the newest practitioners potentially indicates a hopeful finding—that their degree program was sufficient preparation for the early years of their career. prior research would contradict this finding. for example, choi and rasmussen’s 2006 survey found that, in the workplace, librarians frequently felt unprepared in their knowledge of programming and scripting languages.22 in the intervening years, curriculum has shifted to more heavily emphasize technology skills, including web development and other topics covering programming,23 perhaps better preparing early career practitioners. overall, information technology and libraries | december 2016 53 programming remains a popular skill in continuing education opportunities as well as in job listings,24 which aligns well with the respondents’ strong interest in this area. the skills commonly co-occurring with programming in practice included working with linux, database software, managing servers, and webpage creation (figure 4). taken as a whole, these skills indicate job responsibilities falling toward the systems side, with webpage creation a skill that bridged intensely technical and more user-focused work (as also evident in figure 4).this indicates that, though programming may be perceived as highly desirable for communicating and extending systems, as a formal job responsibility it may still fall to a relatively small number of information professionals in any significant manner. 
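the term-correlation plots discussed above (figures 4-6) rest on co-occurrence within the skills-in-use responses; a minimal sketch using the tm package's findAssocs, where skill_text, the single-token term names, and the 0.25 correlation threshold are illustrative assumptions.

```r
# illustrative sketch of the term-correlation step behind figures 4-6: terms from the
# skills-in-use responses that co-occur with a chosen skill. assumes the same skill_text
# vector sketched earlier; term spellings and the threshold are assumptions.
library(tm)

dtm <- DocumentTermMatrix(VCorpus(VectorSource(skill_text)))

# skills most strongly correlated with server management and with programming
findAssocs(dtm, terms = c("server", "programming"), corlimit = 0.25)
```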
makerspace technologies and their implementation possibilities within libraries have garnered a great deal of excitement and interest in recent years, with much literature highlighting innovative projects in this area (such as american library association25 and bagley26). fourie and meyer provided an overview of the existing makerspace literature, finding that most research efforts focus on the needs and construction of the physical space.27 given the general popularity of the topic (as detailed in moorefield-lang),28 it is interesting to note that such technologies were infrequently mentioned by survey participants, both in those desiring to learn these tools and those who were currently using them. the most infrequent skills used (figure 2) included makerspace technologies, 3d printers, augmented, and virtual reality. only a small number of respondents currently used this mix of makerspace-oriented and emerging technologies, and only 3 percent of respondents mentioned interest in learning makespace-related skills. despite many research efforts exploring the particulars of unique makerspaces in a case-study approach (for example, moorefield-lang),29 little data exists on the total number of makerspaces within libraries, and the skillset is largely absent from prior research describing lis curriculum and job listings. this makes it difficult to determine whether the low number of participants that reported working with makerspace technologies is reflective of the small number of such spaces in existence or simply that few practitioners are assigned to work in this area, no matter their popularity. in either case, these findings provide a useful baseline with which to track the growth of makerspace offerings over time and librarian involvement in such intensely technological work. despite the interest and clear willingness to learn and use technology, several workplace challenges became apparent from participant responses. as prior research explored (notable riley-huff and rholes),30 practitioners assumed they would be continually learning and building skills on the job throughout their career to stay current technologically. as described in the earlier results section, many participants mentioned that, although they were highly willing and able to learn, the necessary organizational resources were lacking. as one participant noted, “i’d like to learn anything but the biggest problem seems to be budget (time and monetary).” several participants expressed feeling overwhelmed with their current workload. new learning opportunities, technological or otherwise, were simply not feasible. although the survey results indicated that practitioners of all ages were roughly equally interested in learning new technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 54 technologies, a handful of responses mentioned that ageist issues were creating barriers. though few, these respondents described being dismissed as technologists because of their age. these themes have long been noted in the large body of continuing-education-related literature going back several decades. 
stone’s study ranked lack of time as the top deterrent to professional development for librarians, and it appears little has changed.31 chan and auster noted that organizational climate and the perception of one’s age may impair the pursuit of professional development, among other impediments.32 however, research has noted a generally strong drive in older librarians to continue their education; long and applegate found a preference in latercareer librarians for learning outlets provided by formal library schools and related professional organizations, but a lower interest in generally popular topics such as programming.33 these findings were consistent with the participant responses gathered in this survey. finally, as detailed in the results section, a significant percent of respondents (33 percent) did not answer the question regarding what technologies they would like to learn. as is a limitation with survey research, it is difficult to know what the respondent’s intention was in not answering the question, i.e., are they comfortable with their current technology skills? do they lack the time or interest in pursuing further technology education? and of those that did answer, many did not specify their intended use of the technologies they desired to learn. so a deeper exploration of what technologies lis practitioners desire to learn and why would be of value as well. these questions are worth pursuing in more depth through further research efforts. conclusion this study provides a broad view into the technologies that lis practitioners currently use and desire to learn, across a variety of types of libraries, through an analysis of survey responses. despite a marked enthusiasm toward using and learning technology, respondents described serious organizational limitations impairing their ability to grow in these areas. the lis practitioners surveyed have interested patrons, see technology as part of their mission, and are not satisfied with the current state of affairs, but they seem to lack money, time, skills, and a willing library administration. though respondents expressed a great deal of interest in more advanced technology topics, such as programming, the majority typically engaged with technology on an end-user level, with a minority engaged in deeply technical work. this study suggests future work in exploring information professionals’ conceptual understanding of and attitudes toward technology, and a deeper look at the reasoning behind those who did not express a desire to learn new technologies. information technology and libraries | december 2016 55 references 1. marshall breeding, “library technology: the next generation,” computers in libraries 33, no. 8 (2013): 16–18, http://librarytechnology.org/repository/item.pl?id=18554. 2. therese f. triumph and penny m. beile, “the trending academic library job market: an analysis of library position announcements from 2011 with comparisons to 1996 and 1988,” college & research libraries 76, no. 6 (2015): 716–39, https://doi.org/10.5860/crl.76.6.716. 3. janie m. mathews and harold pardue, “the presence of it skill sets on librarian position announcements,” college & research libraries 70, no. 3 (2009): 250–57, https://doi.org/10.5860/crl.70.3.250. 4. debra a. riley-huff and julia m. rholes, “librarians and technology skill acquisition: issues and perspectives,” information technology and libraries 30, no. 3 (2011): 129–40, https://doi.org/10.6017/ital.v30i3.1770. 5. 
monica maceli, “creating tomorrow’s technologists: contrasting information technology curriculum in north american library and information science graduate programs against code4lib job listings,” journal of education for library and information science 56, no. 3 (2015): 198–212, https://doi.org/10.12783/issn.2328-2967/56/3/3. 6. walt crawford, “making it work perspective: techno and techmusts,” cites and insights 8, no. 4 (2008): 23–28. 7. helen partridge et al., “the contemporary librarian: skills, knowledge and attributes required in a world -f emerging technologies,” library & information science research 32, no. 4 (2010): 265–71, https://doi.org/10.1016/j.lisr.2010.07.001. 8. riley-huff and rholes, “librarians and technology skill acquisition.” 9. vandana singh and bharat mehra, “strengths and weaknesses of the information technology curriculum in library and information science graduate programs,” journal of librarianship and information science 45, no. 3 (2013): 219–231, https://doi.org/10.1177/0961000612448206. 10. riley-huff and rholes, “librarians and technology skill acquisition.” 11. sharon hu, “technology impacts on curriculum of library and information science (lis)—a united states (us) perspective,” libres: library & information science research electronic journal 23, no. 2 (2013): 1–9, http://www.libres-ejournal.info/1033/. 12. singh and mehra, “strengths and weaknesses of the information technology curriculum.” 13. see, for example, crawford, “making it work perspective”; partridge et al., “the contemporary librarian.” technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 56 14. riley-huff and rholes, “librarians and technology skill acquisition.” 15. elizabeth w. stone, factors related to the professional development of librarians (metuchen, nj: scarecrow, 1969). 16. donna c. chan and ethel auster, “factors contributing to the professional development of reference librarians,” library & information science research 25, no. 3 (2004): 265–86, https://doi.org/10.1016/s0740-8188(03)00030-6. 17. dorothy e. jones, “ten years later: support staff perceptions and opinions on technology in the workplace,” library trends 47, no. 4 (1999): 711–45. 18. john j. burke, the neal-schuman library technology companion: a basic guide for library staff, 5th edition (new york: neal-schuman, 2016). 19. roy tennant, “the digital librarian shortage,” library journal 127, no. 5 (2002): 32. 20. partridge et al., “the contemporary librarian.” 21. monica maceli, “what technology skills do developers need? a text analysis of job listings in library and information science (lis) from jobs.code4lib.org,” information technology and libraries 34, no. 3 (2015): 8–21, https://doi.org/10.6017/ital.v34i3.5893. 22. youngok choi and edie rasmussen, “what is needed to educate future digital libraries: a study of current practice and staffing patterns in academic and research libraries,” d-lib magazine 12, no. 9 (2006), http://www.dlib.org/dlib/september06/choi/09choi.html. 23. see, for example, maceli, “creating tomorrow's technologists.” 24. elías tzoc and john millard, “technical skills for new digital librarians,” library hi tech news 28, no. 8 (2011): 11–15, https://doi.org/10.1108/07419051111187851. 25. american library association, “manufacturing makerspaces,” american libraries 44, no. 1/2 (2013), https://americanlibrariesmagazine.org/2013/02/06/manufacturing-makerspaces/. 26. caitlin a. 
bagley, makerspaces: top trailblazing projects, a lita guide (chicago: american library association, 2014). 27. ina fourie and anika meyer, “what to make of makerspaces: tools and diy only or is there an interconnected information resources space?,” library hi tech 33, no. 4 (2015): 519–25, https://doi.org/10.1108/lht-09-2015-0092. 28. heather moorefield-lang, “change in the making: makerspaces and the ever-changing landscape of libraries,” techtrends 59, no. 3 (2015): 107–12, https://doi.org/10.1007/s11528-015-0860-z. information technology and libraries | december 2016 57 29. heather moorefield-lang, “makers in the library: case studies of 3d printers and maker spaces in library settings,” library hi tech 32, no. 4 (2014): 583–93, https://doi.org/10.1108/lht-06-2014-0056. 30. riley-huff and rholes, “librarians and technology skill acquisition.” 31. stone, factors related to the professional development of librarians. 32. chan and auster, “factors contributing to the professional development of reference librarians.” 33. chris e. long and rachel applegate, “bridging the gap in digital library continuing education: how librarians who were not ‘born digital’ are keeping up,” library leadership & management 22, no. 4 (2008), https://journals.tdl.org/llm/index.php/llm/article/view/1744. technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 58 appendix. survey questions 1. what type of library do you work in? 2. where is your library located (state/province/country)? 3. what is your job title? 4. what is your highest level of education? 5. which of the following methods have you used to learn about technologies and how to use them? please mark all that apply. • articles • as part of a degree i earned • books • coworkers • face-to-face credit courses • face-to-face training sessions • library patrons • online credit courses • online training sessions (webinars, etc.) • practice and experiment on my own • web resources i regularly check (sites, blogs, twitter, etc.) • web searching • other: 6. which of the following skill areas are part of your responsibilities? please mark all that apply. • acquisitions • archives/special collections • cataloging • circulation • collection development • distance library services • electronic resource management • instruction • interlibrary loan information technology and libraries | december 2016 59 • library administration • library it/systems • marketing/public relations • media/audiovisuals • outreach • periodicals/serials • reference • user experience • other: 7. how long have you worked in libraries? • 0–2 years • 3–5 years • 6–10 years • 11–15 years • 16–20 years • 21 or more years 8. which of the following technologies or technology skills are you expected to use in your job on a regular basis? please mark all that apply • assistive/adaptive technology • audio recording and editing • augmented reality (google glass, etc.) • blogging • cameras (still, video, etc.) • chromebooks • cloud-based productivity apps (google apps, office 365, etc.) • cloud-based storage (google drive, dropbox, icloud, onedrive, etc.) • computer programming or coding • computer security and privacy knowledge • database creation/editing software (ms access, etc.) • dedicated e-readers (kindle, nook, etc.) 
• digital projectors technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 60 • discovery layer/service/system • downloadable e-books • educational copyright knowledge • e-mail • facebook • fax machine • image editing software (photoshop, etc.) • laptops • learning management system (lms) or virtual learning environment (vle) • library catalog (public side) • library database searching • library management system (staff side) • library website creation or management • linux • mac operating system • makerspace technologies (laser cutters, cnc machines, arduinos, etc.) • mobile apps • network management • online instructional materials/products (libguides, tutorials, screencasts, etc.) • presentation software (ms powerpoint, prezi, google slides, etc.) • printers (public or staff) • rfid (radio frequency identification) • scanners and similar devices • server management • smart boards/interactive whiteboards • smartphones (iphone, android, etc.) • software installation • spreadsheets (ms excel, google sheets, etc.) • statistical analysis software (sas, spss, etc.) • tablets (ipad, surface, kindle fire, etc.) • teaching others to use technology information technology and libraries | december 2016 61 • teaching using technology (instruction sessions, workshops, etc.) • technology equipment installation • technology purchase decision-making • technology troubleshooting • texting, chatting, or instant messaging • 3d printers • twitter • using a web browser • video recording and editing • virtual reality (oculus rift, etc.) • virtual reference (text, chat, im, etc.) • word processing (ms word, google docs, etc.) • web-based e-book collections • web conferencing/video conferencing (webex, google hangouts, goto meeting, etc.) • webpage creation • web searching • windows operating system • other: 9. which of the following are barriers to new technology adoption in your library? please mark all that apply. • administrative restrictions • budget • lack of fit with library mission • lack of patron interest • lack of staff time • lack of staff with appropriate skill sets • satisfaction with amount of available technology • other: 10. what technology skill would you like to learn to help you do your job better? 11. what technologies do you help patrons with the most? 12. what technology item do you circulate the most? technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 62 13. what technology or technology skill would you most like to see added to your library? use of language-learning apps as a tool for foreign language acquisition by academic libraries employees articles use of language-learning apps as a tool for foreign language acquisition by academic libraries employees kathia ibacache information technology and libraries | september 2019 22 kathia ibacache (kathia.ibacache@colorado.edu) is the romance languages librarian at the university of colorado boulder. abstract language-learning apps are becoming prominent tools for self-learners. this article investigates whether librarians and employees of academic libraries have used them and whether the content of these language-learning apps supports foreign language knowledge needed to fulfill library-related tasks. 
the research is based on a survey sent to librarians and employees of the university libraries of the university of colorado boulder (ucb), two professional library organizations, and randomly selected employees of 74 university libraries around the united states. the results reveal that librarians and employees of academic libraries have used language-learning apps. however, there is an unmet need for language-learning apps that cover broader content including reading comprehension and other foreign language skills suitable for academic library work. introduction the age of social media and the advances in mobile technologies have changed the manner in which we connect, socialize, and learn. as humans are curious and adaptive beings, the moment mobile technologies provided apps to learn a foreign language, it was natural that self-regulated learners would immerse themselves in them. language-learning apps’ practical nature, as an informal educational tool, may attract self-learners such as librarians and employees of academic libraries to utilize this technology to advance foreign language knowledge usable in the workplace. the academic library employs a wide spectrum of specialists, from employees offering research consultations, reference help, and instruction, to others specialized in cataloging , archival, acquisition, and user experience, among others. regardless of the library work, employees utilizing a foreign language possess an appealing skill, as knowing a foreign language heightens the desirability of employees and strengthens their job performance. in many instances, librarians and employees of academic libraries may be required to have reading knowledge of a foreign language. therefore, for these employees, acquiring knowledge of a foreign language might be paramount to deliver optimal job performance. this study aims to answer the following questions: 1) are librarians and employees of academic libraries using language-learning apps to support foreign language needs in their workplace? and 2) are language-learning apps addressing the needs of librarians and employees of academic libraries? for purposes of this article, mobile language apps are those accessed through a website, and apps downloaded onto portable smartphones, tablets, desktops, and laptops. mailto:kathia.ibacache@colorado.edu use of language-learning apps | ibacache 23 https://doi.org/10.6017/ital.v38i3.11077 background mobile-assisted language learning (mall) has a user-centered essence that resonates with users in the age of social media. librarians and employees of academic libraries needing a foreign language to fulfill work responsibilities are a target group that can benefit from using languagelearning apps. these apps provide a multifaceted capability that offers time and space flexibility and adaptability that facilitates the changeable environment favored by self-learners. kukulskahulme states that it is customary to have access to learning resources through mobile devices. 1 in the case of those individuals working in academic libraries, language-learning apps may present an opportunity to pursue a foreign language accommodating their self-learning style, time availability, space, and choice of device. considering the features of language-learning apps, some have a more personal quality where the device interacts with one user while other apps emulate social media characteristics connecting a wide array of users. 
for instance, users learning a language through the hello talk app can communicate with native speakers all around the world. through this app, language learners can send voice notes, corrections to faulty grammar, and use the built-in translator feature. therefore, language-learning apps may not only provide self-learners a vehicle to communicate remotely, but also to interact using basic conversational skills in a given language. in the case of those working in academic libraries, this human connectedness among users may not be as relevant as the interactive nature of the device, its mobility, the convenience of the virtual learning, and the flexibility of the mobile technology. kukulska-hulme notes that the ubiquity of mobile learning is affecting the manner in which one learns.2 although there is abundant literature referring to mobile language technologies and their usefulness in students’ language learning in different school levels including higher education, scholarship regarding the use of language-learning apps by professionals is scarce.3 broadbent refers to self-regulated learners as those who plan their learning through goals and activities. 4 the author concurs that to engage in organized language learning through a language-learning app, one should have some level of organizational learning or as a minimum enough motivation to engage in self-learning. in this context, some scholars believe that the level of self-management of learning will determine the level of learning success.5 moreover, learners who possess significant personal learning initiative (pli) have the foundation to accomplish learning outcomes and overcome difficulties.6 pli may be one factor affecting learners’ motivation to learn a language in a virtual environment and away from the formal classroom setting. this learning initiative may play a significant role in the learning process, as it may influence the level of engagement and positive learning outcome. in terms of learning outcomes, language software developers may also play a role by adapting and broadening content based on learning styles and considering the elements that would provide a meaningful user experience. in this sense, bachore conveys that there is a need to address language-learning styles when using mobile devices.7 bachore also notes that as interest in mobile language learning increases, so does the different manners in which mobile devices are used to implement language learning and instruction.8 similarly, louhab refers to context dimensions as the parameters in mobile learning that consider learners’ individuality in terms of where the learning takes place, individual personal qualities and information technology and libraries | september 2019 24 learning needs, and the features of their mobile device.9 bradley also suggests that learning is part of a dialogue between the learners and their devices as part of a sociocultural context where thinking and learning occur.10 in addition. bradley infers that users are considered when creating learning activities and when improving them.11 for these reasons, some researchers address the need to focus on accessibility and developing content designed for different types of users, including differently abled learner s.12 furthermore, adaptation, according to the learner’s style, may be considered as a pivotal quality of languagelearning apps as software developers try to break the gap between formal instruction and a learner-oriented mobile learning platform. 
undoubtedly, the technological gap, which includes the cost of the device, interactivity, screen size, and input capabilities, among others, matter when centering on implementing language learning supported by mobile technologies. however, learning style is only one aspect in the equation. a learner’s need is another. for example: the needs of a learner who seeks to acquire familiarity with a foreign language because of an upcoming vacation may be substantially distinct from the needs of professionals such as academic librarians, who may need reading, writing, or even speaking proficiency in a given language. a user-centered approach in language-learning software design may advance the adequacy of these apps connecting them with a much wider set of learning needs. when referring to mobile apps for language learning, godwin-jones asserts that while the capability of devices is relevant, software development is paramount to the educational process.13 therefore, language-learning software developers may consider creating learning activities that target basic foreign language-learning needs and more tailored ones suitable for people who require different content. kukulska-hulme refers to “design for learning” as creating structured activities for language learning.14 although language-learning apps appear to rely on learning activities built on basic foreign language learning needs, these apps should desire to rely more on learners’ evaluative insights to advance software development that meets the specific needs of learners. although mobile technologies as a general concept will continue to evolve, its mobile nature will likely continue focusing on user experience satisfying those who prefer the freedom of informal learning. methodology instrument the author used a 26-question qualtrics survey approved by the institutional review board at the university of colorado boulder (ucb). the survey was open for eight weeks and received 199 total responses. however, the number of responses to each question varied depending on the question. the data collected was both quantitative and qualitative in nature, seeking to capture respondents’ perspectives and measurable data that could be used for statistics. the survey consisted of twelve general questions for all respondents that reported working in an academic library, then branched into either nine questions for respondents who had used a languagelearning app, and five questions for those who had not. the respondents answered via text fields, standard single and multiple-choice questions, and a single answer likert matrix table. qualtrics provided a statistical report, which the author used to analyze the data and create the figures. use of language-learning apps | ibacache 25 https://doi.org/10.6017/ital.v38i3.11077 participants the survey was distributed through an email to librarians and employees of ucb’s university libraries. the author also identified 74 university libraries in the united states from a list of members of the association of research libraries, and distributed the survey via email to ten randomly selected library employees from each of these libraries.15 the recipients included catalogers, subject specialists, archivists, and others working in metadata, acquisition, reference, and circulation. 
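the per-library random selection described above could be reproduced in a few lines of r; a minimal sketch assuming a hypothetical staff_directory data frame with library and email columns, which is not part of the published study.

```r
# illustrative sketch: ten randomly selected staff members from each of the 74 identified
# libraries. staff_directory (columns `library` and `email`) is an assumed structure,
# and each library is assumed to list at least ten staff members.
set.seed(2015)
recipients <- do.call(rbind, lapply(split(staff_directory, staff_directory$library),
                                    function(lib) lib[sample(nrow(lib), 10), ]))
nrow(recipients)   # 740 invitations across the 74 libraries
```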
the survey was also distributed to the listservs of two library organizations: the seminar on the acquisition of latin american library materials (salalm) and reforma, the national association to promote library and information services to latinos and the spanish speaking. these organizations were chosen due to their connection with foreign languages.
results
use of foreign language at work
of the respondents, 172 identified as employees of academic libraries (66 percent). of these, a significant percentage reported using a foreign language in their library work. the respondents belonged to different generational groups; however, most were in the 30-39 and 40-49 age groups. the respondents performed a variety of duties within the categories presented. due to incomplete survey results, varying numbers of responses were collected for each question. of 110 respondents, 82 identified their gender as female. in addition, of 105 respondents, 62 percent reported being subject specialists, 56 percent worked in reference, 54 percent identified as instruction librarians, 30 percent worked in cataloging and metadata, 30 percent worked in acquisition, 10 percent worked in circulation, 2 percent worked in archiving, and 23 percent reported doing "other" types of library work.
figure 1. age of respondents (n=109): 20-29 years old (9.17 percent), 30-39 (29.36 percent), 40-49 (30.28 percent), 50-59 (12.84 percent), 60 years or older (18.35 percent).
figure 2. foreign language skills respondents used at work (multiple responses allowed, n=106): reading (102), writing (65), speaking (49), listening (49).
as shown in figure 2, respondents used different foreign language skills at work; however, reading was used with significantly more frequency. when asked, "how often do you use a foreign language at work?" 38 respondents out of 105 used it daily, 29 used it weekly, and 21 used it monthly. in addition, table 1 shows that a large percentage of respondents noted that knowing a foreign language helped them with collection development tasks and reference services. the respondents who chose "other" stated in a text field that knowing a foreign language helped them with translation tasks, building management, creating a welcoming environment, attending to foreign guests, communicating with vendors, researching, processing, and having a broader, more empathetic perspective of the world. these respondents also expressed that knowing a foreign language helped them work with materials in other languages and on digital humanities projects and offer library tours and outreach to the community.
table 1. types of librarian work benefiting from knowledge of a foreign language (multiple responses allowed, n=104).
type of librarian work / expressed benefit (%)
collection development / 61.5
reference / 57.6
communication / 56.7
instruction / 41.3
cataloging and metadata / 41.3
acquisition / 40.3
other / 19.2
figure 3. languages respondents studied using an app (multiple responses allowed, n=51): spanish (22), french (13), portuguese (13), german (9), italian (8), japanese (5), other (26).
as shown in figure 3, spanish was the most prominent language studied. thirteen of the 51 respondents studied french, and thirteen studied portuguese. additionally, respondents stated in the text field "other" that they had also used these apps to study english, mandarin, arabic, malay, hebrew, swahili, korean, navajo, turkish, russian, greek, polish, welsh, indonesian, thai, and tamil.
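for readers curious how the multiple-response percentages in tables 1 and 2 are derived, the short sketch below shows the usual calculation (respondents who selected each option divided by the number answering the question). it is a generic illustration with invented data, not the author's qualtrics report.

# illustrative tabulation of a multiple-response survey question.
# the data below are invented; percentages can exceed 100 percent in total
# because each respondent may check several options.
from collections import Counter

responses = [
    {"collection development", "reference"},
    {"reference", "communication", "instruction"},
    {"collection development"},
]  # one set of checked options per respondent who answered the question

counts = Counter(option for answer in responses for option in answer)
n = len(responses)

for option, count in counts.most_common():
    print(f"{option}: {count / n:.1%}")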
regardless, apps were not the sole means of language acquisition. some respondents specified using books, news articles, pimsleur cds, television shows, internet radio, conversations with family members and native speakers, formal instruction, websites, dictionaries, online tutorials, audio tapes, online laboratories, flashcards, podcasts, movies, and youtube videos. over a third of 49 respondents used a language-learning app for 30 hours or more, and less than a quarter used one between 11 and 30 hours. concerning the device preferred to access the apps, most respondents used a smartphone (63.27 percent), followed by a laptop (16.33 percent) and a tablet (14.29 percent). table 2 shows the elements of language-learning apps that 48 respondents found most satisfactory. they selected "learning in own time and space" as the most desired element, followed by "vocabulary" and "translation exercises." participants were less captivated by "pronunciation capability" (29.1 percent) and "dictionary function" (16.6 percent).
table 2. most satisfactory aspects of language-learning apps (multiple responses allowed, n=48).
element of a language-learning app / percentage finding satisfactory (%)
learning in own time and space / 64.5
vocabulary / 56.2
translation exercises / 56.2
making mistakes without feeling embarrassed / 54.1
responsive touch screen / 52
self-testing / 52
reading and writing exercises / 43.7
game-like features / 37.5
voice recognition capability / 37.5
comfortable text entry / 37.5
grammar and verb conjugation exercises / 35.4
pronunciation capability / 29.1
dictionary function / 16.6
figure 4. most unsatisfactory elements of language-learning apps (n=30): content (13), flexibility/interface (10), grammar (5), payment (2).
conversely, 30 respondents described unsatisfactory elements on the survey. these elements were grouped into the categories shown in figure 4. the elements were payment restrictions, lack of grammatical explanations, monocentric content focused on travel, vocabulary-centric content (although opinions varied on this issue), and poor interfaces. respondents also mentioned a lack of flexibility that inhibited learners from reviewing earlier lessons or moving forward as desired, unfriendly interfaces, and limited scope. other respondents alluded to technical issues with entering diacritics, non-intuitive software, and repetitive exercises. while these elements relate to the language apps themselves, one respondent mentioned missing human interaction and another reported the lack of a system to prompt learners to be accountable for their own learning process.
figure 5. reasons participants had not used a language-learning app (multiple responses allowed, n=53): other (54.71 percent), lack of time (37.73 percent), prefer traditional setting (32.07 percent), screen too small (1.88 percent).
figure 5 shows that time restriction (i.e., availability of time to use the app) was the most prevalent specific reason why respondents had not used a language-learning app. however, a larger percentage of respondents answered "other" to expand on the reason they had not tried this technology. the explanations provided included: a lack of content suitable for work; already having sufficient proficiency; preferring books, dictionaries, google translate, and podcasts; lacking interest; and having different priorities.
similarly, when asked whether they would use a language-learning app if given an opportunity, a large percentage of 52 respondents answered "maybe" (65.38 percent). however, when 51 respondents answered the question "what elements facilitated your language learning?," 66.6 percent responded that they preferred having an instructor, 54.9 percent liked being part of a classroom, and 41.1 percent liked language activities with classmates.
discussion
library employee use of language-learning apps
the data revealed that a large number of respondents used a foreign language in their library work and that reading and writing were the most needed skills. however, only about half of the respondents had used a language-learning app. there appears to be interest in language-learning apps, but use is not widespread at this time. overall, respondents felt language-learning apps did not offer a curriculum that supported foreign language enhancement for the workplace, especially the academic library workplace. this factor may be one reason why respondents stopped using the apps and why this technology was not utilized more extensively. interestingly, the majority of the respondents were in their thirties and forties. one might surmise that young millennials in their twenties would be more inclined to use language-learning apps; however, the data showed a slight lead by respondents in their forties. this information may corroborate the author's inference that generational distinctions among employees of academic libraries do not limit the willingness to seek, and even prefer, learning another language through apps. moreover, a pew research center study showed that generations older than millennials have also embraced technology, and gen xers even had a 10 percent lead over millennials in tablet ownership.16 referring to the device used to interact with the language app, most respondents preferred a smartphone; only a small fraction preferred a tablet, laptop, or desktop. these data may attest to the portability of language-learning apps valued by self-learners and to the notion that language learning may happen outside the classroom setting. however, while smartphones provide ubiquity and a sense of independence, so can tablets. what is it, then, about smartphones that ignites preference from a user experience perspective? is it their ability to make calls, their portability, fast processors, wi-fi signal, or cellular connectivity that makes the difference? since tablets can also be considered portable, and their larger screens and web surfing capabilities are desirable assets, is it the "when and where" that determines the device? while not all respondents reported using an app to learn a language, those who did expressed satisfaction with learning in their own space and time and with translation exercises. nevertheless, it is noteworthy that few respondents deemed important the ability of the software to help learners with the phonetic aspect of the language. this diminished interest in pronunciation may be connected with the type of language learning needed in the academic library profession. as respondents indicated, language-learning apps tend to focus on conversational skills rather than reading and text comprehension.
in addition to those respondents who used an app to learn a new language, one respondent reported reinforcing skills in a language already acquired. a compelling matter to consider is the frequency with which respondents used a foreign language in their work. about a third of the respondents used a foreign language at work on a daily basis, and approximately a quarter used one weekly. this finding suggests that foreign languages play a significant role in academic library work. since the respondents fulfilled different responsibilities in their library work, one may deduce that foreign languages are used in a variety of settings beyond strictly desk tasks. in fact, as stated before, respondents reported using foreign languages for multiple tasks, including communicating with vendors and foreign guests and providing a welcoming environment, among others. even though 59 respondents stated that knowing a foreign language helped them with communication, respondents appeared to be more concerned with reading comprehension and vocabulary. it is likely that reading comprehension ranked high in importance because library jobs that require foreign language knowledge tend to rely heavily on reading comprehension skills. nonetheless, the author wonders whether subject specialists, especially those who provide instruction, use more skills related to listening and communication in a foreign language; it is therefore curious that they did not prioritize these skills. perhaps this topic could be a subject for future research. notwithstanding these results, language-learning apps appear to center on content that improves listening and basic communication instead of reading comprehension. therefore, the question remains whether mobile language apps have enough capabilities to provide a relevant learning experience to librarians and staff working in academic libraries.
are language-learning apps responding to the language needs of employees working in academic libraries?
the survey results indicate that language-learning apps are not sufficiently meeting respondents' foreign language needs. qualitative data showed that several elements may affect the compatibility of language-learning apps with the needs of employees working in academic libraries; however, the findings were not conclusive due to the limited number of responses. when respondents were asked to identify the unsatisfactory elements in these apps, 65.9 percent of 47 respondents identified an issue with language-learning apps, while 23 percent answered "none." according to respondents, the main problems with the apps were a lack of content and scope suitable for employees of academic libraries, along with shortcomings in flexibility and grammar. perhaps mobile language-app developers assume that some learners still use a formal classroom setting for foreign language acquisition and therefore leave more advanced curricula to that setting. it is also possible that developers consider the market centered on travel and basic conversation more dominant; this may explain why these apps do not address foreign language needs at the professional level. finally, these academic library employees appear to perceive a need for these apps to explore and offer a curriculum and learning activities that benefit those seeking deeper knowledge of a language.
conclusion
mobile language learning has changed the approach to language acquisition. its mobility, portability, and ubiquity have established a manner of instruction that provides a sense of freedom and self-management that suits self-learners. moreover, as app technology has progressed, features have been added to devices that facilitate a more meaningful user experience with language-learning apps. employees of academic libraries who have used foreign language-learning apps are cognizant of the language-learning activities that support their foreign language needs for work, such as reading comprehension and vocabulary. however, language-learning apps appear to market to conversational needs, providing exercises that focus on travel more than lessons that center on reading comprehension and deeper areas of language knowledge. this indicates a lack of language-learning content that would be more appropriate for those working in academic libraries. finally, academic library employees who require a foreign language in their work are a target group that may benefit from mobile language learning. presently, this target group feels language-learning apps are too basic to cover broader professional needs. therefore, as language-learning app developers consider serving wider groups of people, it would be beneficial for these apps to expand their lesson structure and content to address the needs of academic library professionals.
endnotes
1 agnes kukulska-hulme, "will mobile learning change language learning?" recall 21, no. 2 (2009): 157, https://doi.org/10.1017/s0958344009000202.
2 ibid., 158.
3 see florence martin and jeffrey ertzberger, "here and now mobile learning: an experimental study on the use of mobile technology," computers & education 68 (2013): 76-85, https://doi.org/10.1016/j.compedu.2013.04.021; houston heflin, jennifer shewmaker, and jessica nguyen, "impact of mobile technology on student attitudes, engagement, and learning," computers & education 107 (2017): 91-99, https://doi.org/10.1016/j.compedu.2017.01.006; yoon jung kim, "the effects of mobile-assisted language learning (mall) on korean college students' english-listening performance and english-listening anxiety," studies in linguistics, no. 48 (2018): 277-98, https://doi.org/10.15242/heaig.h1217424; jack burston, "the reality of mall: still on the fringes," calico journal 31, no. 1 (2014): 103-25, https://www.jstor.org/stable/calicojournal.31.1.103.
4 jaclyn broadbent, "comparing online and blended learner's self-regulated learning strategies and academic performance," internet and higher education 33 (2017): 24, https://doi.org/10.1016/j.iheduc.2017.01.004.
5 rui-ting huang and chung-long yu, "exploring the impact of self-management of learning and personal learning initiative on mobile language learning: a moderated mediation model," australasian journal of educational technology 35, no. 3 (2019): 118, https://doi.org/10.14742/ajet.4188.
6 ibid., 121.
7 mebratu mulato bachore, "language through mobile technologies: an opportunity for language learners and teachers," journal of education and practice 6, no. 31 (2015): 51, https://files.eric.ed.gov/fulltext/ej1083417.pdf.
8 ibid., 50.
9 fatima ezzahraa louhab, ayoub bahnasse, and mohamed talea, "considering mobile device constraints and context-awareness in adaptive mobile learning for flipped classroom," education and information technologies 23, no.
6 (2018): 2608, https://doi.org/10.1007/s10639-018-9733-3.
10 linda bradley, "the mobile language learner: use of technology in language learning," journal of universal computer science 21, no. 10 (2015): 1270, http://jucs.org/jucs_21_10/the_mobile_language_learner/jucs_21_10_1269_1282_bradley.pdf.
11 ibid.
12 tanya elias, "universal instructional design principles for mobile learning," the international review of research in open and distance learning 12, no. 2 (2011): 149, https://doi.org/10.19173/irrodl.v12i2.965.
13 robert godwin-jones, "emerging technologies: mobile apps for language learning," language learning & technology 15, no. 2 (2011): 3, http://dx.doi.org/10125/44244.
14 kukulska-hulme, "will mobile learning change language learning?," 158.
15 "membership: list of arl members," association of research libraries, accessed april 5, 2019, https://www.arl.org/membership/list-of-arl-members.
16 jingjing jiang, "millennials stand out for their technology use," pew research center (2018), https://www.pewresearch.org/fact-tank/2018/05/02/millennials-stand-out-for-their-technology-use-but-older-generations-also-embrace-digital-life/.
technical communications
announcements
new cola chairman
brian aveney, of the richard abel co., has been elected chairman of the cola discussion group, effective january 1974. prior to his present position with the design group at richard abel, mr. aveney was head of the systems office at the university of pennsylvania libraries. the cola discussion group traditionally meets on the sunday afternoon preceding each ala conference. meetings are open, and all are invited to attend.
and a book review editor
a member of the university of british columbia graduate school of library science faculty, peter simmons, has been appointed book review editor of the journal of library automation. mr. simmons is the author of the "library automation" chapter in the annual review of information science and technology, volume 8, the most recent of his publications. authors and publishers are requested to send relevant literature to mr. simmons at the graduate school of library science, university of british columbia, vancouver, british columbia, for review.
missing issues?
the rapid publication sequence of the 1972 and 1973 volumes of the journal of library automation has created problems for some isad members and subscribers. if your address changed during 1973, or if your ala membership suffered any quirk, you are especially likely to have missed one or more of the issues due you. if this is the case, please write to the membership and subscription records department of the american library association, 50 e. huron st., chicago, il 60611. indicate which issues you are missing, and every attempt will be made to forward them to you as quickly as possible.
new eric clearinghouse
stanford university's school of education has been awarded a one-year contract by the national institute of education (nie) to operate the newly formed eric clearinghouse on information resources under the direction of dr. richard clark. the new clearinghouse will be part of the stanford center for research and development in teaching. the clearinghouse on information resources is the result of a merger of two previous clearinghouses: the one on media and technology, formerly located at the stanford center for research and development in teaching, and the one on library and information sciences, formerly located at the american society for information science in washington, d.c. the new clearinghouse is responsible for collecting information concerning print and nonprint learning resources, including those traditionally provided by school and community libraries and those provided by the growing number of technology-based media centers. the clearinghouse collects and processes noncopyright documents on the management, operation, and use of libraries, the technology to improve their operation, and the education, training, and professional activities of librarians and information specialists. in addition, the clearinghouse is collecting material on educational media such as television, computers, films, radio, and microforms, as well as techniques which are an outgrowth of technology: systems analysis, individualized instruction, and microteaching.
library automation activities, international
computerized system at the james cook university of north queensland library. the system design phase of an integrated acquisitions/cataloging system for the library at the james cook university of north queensland has been completed by a firm of computer consultants, ian oliver and associates, and programming has commenced.
history
the system, known as catalist, is a batch system to be operated on the university's central computer, a pdp-10. it will be programmed in fortran and macro, the assembly language of the pdp-10.
description
the system will cover all aspects of cataloging/acquisitions procedures for all library material apart from serials, including: (a) production of orders, follow-ups, and reports; (b) budget control; (c) fund accounting; (d) routing slips; (e) accessions lists; (f) in-process and catalog supplements (author/title and added entry) and subject catalog supplement, shelflist and supplement; (g) catalogs (author/title and subject); and (h) union catalog cards. some features of the system include the maintenance of average book prices in all subject areas. these are continually updated by the system to reflect current fluctuations in the trade. this information will be used together with machine-based arrival predictions to control the budget and fund allocations. marc data will be used as much as possible, with records for individual items being supplied from external sources on request.
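the budget-control idea just described (a continually updated average book price per subject area, combined with arrival predictions) can be sketched in a few lines. the following python fragment is illustrative only; catalist itself was to be written in fortran and macro on a pdp-10, and all names and figures here are invented.

# hypothetical sketch of the bookkeeping described above: keep a running
# average price per subject area and use it to project outstanding
# commitments against a fund. not the actual catalist implementation.
averages = {}  # subject area -> (invoices seen, running average price)

def record_invoice(subject, price):
    """update the running average price for a subject area."""
    count, avg = averages.get(subject, (0, 0.0))
    count += 1
    avg += (price - avg) / count  # incremental mean, tracking trade prices
    averages[subject] = (count, avg)

def projected_commitment(subject, titles_on_order):
    """estimate funds tied up in orders not yet arrived."""
    _, avg = averages.get(subject, (0, 0.0))
    return titles_on_order * avg

record_invoice("chemistry", 18.50)
record_invoice("chemistry", 22.00)
print(projected_commitment("chemistry", 40))  # 40 titles still on order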
technical communications 57 the in-process catalogues, which will contain items on order, items arrived, and items cataloged since the previous edition of the catalog, will contain added entries for all material where such information fs available. the catalogs will be produced on com. roll film will be used for public catalogs and fiche for in-house use. data for the national union catalogue will be submitted on minimally-formatted computerproduced cards. for further information contact ms. c. e. kenchfngton, systems librarian, post office, james cook university of north queensland, australia 4811. technical exchanges editor's note: the two following articles, prepared by the library of congress and the council on libtary resources, respectively, have been distribttted through various lc publications. due to the importance of the two documents, however, and to the fact that they may not have reache.d the entire libtary community, it seemed therefore appropriate to publish the papers again in journal of library automation. sharing machine-readable bibliographic data: a progress report on a series of meetings sponsored by the council on library resources beginning in december 1972 and continuing since that date, the council on library resources has convened a series of meetings of representatives of several organizations to discuss the implications of bibliographic data bases being built around the country and the possibilities of sharing these resources. although the deliberations are not yet completed, the council, as well as all participants in the meetings, felt that it was timely to make the progress to date known to the community. since publication in the 58 i ournal of library automation vol. 7/1 march 197 4 open literature implies a long waiting period between completion of a paper and the actual publication date, it was decided that this paper should be written and distributed as expeditiously as possible. since the library of congress has vehicles for dissemination of information in its marc distribution service, i nfotmation bulletin, and cataloging setvice bulletin, lc was asked to assume the responsibility for the preparation of a paper to be distributed via the above mentioned channels as well as sending copies to relevant associations. the institutions participating in the deliberations have been included as an appendix to this paper. the bibliographic data bases under consideration at individual institutions contain both marc records from lc as well as records locally encoded and transcribed. these local records represent: ( 1) titles in languages not yet within the scope of marc; (2) titles in languages cataloged by lc prior to the onset of the marc service; ( 3) titles not cataloged by lc; and ( 4) titles cataloged by lc and recataloged when the lc record cannot be found locally. the first two categories, in many instances, are being encoded and transcribed by institutions using lc data as the source, i.e., proofsheets, nuc records, and catalog cards. these are referred to for the remainder of this paper as lc source data and the third and fourth categories as original cataloging. all participants agreed that the stmcture of the format for the interchange of bibliographic data would be marc but several participants questioned if a subset of lc marc could not be established for interchange for all transcribing libraries other than lc. 
1• 2 although lc had reported its survey regarding levels of completeness of marc records and the conclusions reached by the recon working task force, namely, "to satisfy the needs of diverse installations and applications, records for general distribution should be in the full marc format," it appeared worthwhile to once more make a survey to see if agreement could be reached on a subset of data elements. 3 the survey ineluded only those institutions participating in the clr meetings. the result of the survey again demonstrated that considered collectively, institutions need the complete marc set of data elements. the decision was made that the lc marc format was to be the basis of the further deliberations of the participants. attention was then turned to any additional elements of the format or modifications to present elements that may be required in order to interchange bibliographic data among institutions. all concmned recognized that although networks of libraries, in the true sense, still do not exist today, much has been learned since the development of the marc format in 1968. certain ground rules were established and are given below: 1. the material under consideration is to be limited to monographs. 2. the medium considered for the transmission of data is magnetic tape. 3. data recorded at one institution and transmitted to another in machinereadable form is not to be retransmitted by the receiving institution as part of the receiving institution's data base to still another institution.4 4. any additions or changes required to the marc format for "networking" arrangements are not to substantially impact lc procedures. 5. any additions or changes required to the marc format for "networking" arrangements are not to substantially affect marc users. long discussions took place concerning modifications to lc source data by a transcribing library and the complexity involved in transmitting information as to which particular data elements were modified. ground mle 6 was established stating that if any change is made to the bibliographic content of a record copied from an lc source document (other than the lc call number), the transcribing library would be considered the cataloging source, i.e., the machine-readable record would no longer be considered an lc cataloging record. any errors detected in lc marc records are to be reported to lc for correction. a subcommittee was formed to study what marc format additions and modifications were required. the subcommittee met on several occasions and made the following proposals to the parent committee: 1. fixed field position 39 and variable field 040, ·cataloging source, should be expanded to include information defini.ng the cataloging library, i.e., the hbrary responsible for the cataloging of the item, and the transcribing library, i.e., the library actually doing the input keying of the cataloging data. 2. lc should include the lc card number in field 010 as well as in field 001. when the lc card number is known by an agency transcribing cataloging data, field 001 should contain that agency's control number and field 010 should contain the lc card number. 3. variable field 050 should not be used for any call number other than the lc call number. transcribing agencies should always put the lc call number in this field if known. 4. 
a new variable field 059, contributed classification, should be defined to allow agencies other than lc to record classification numbers such as lc classification, dewey, national agricultural library classification, etc., with indicators assigned to provide the information as to what classification system was recorded and whether the cataloging or transcribing agency provided this data. 5. variable field 090, local call number should follow the same indicator sys~ tern as defined in field 059. (090 contains the actual call number used by either the cataloging or transcribing library while 059 would contain additional classification numbers assigned by the cataloging or transcribing library.) 6. lc would assume the responsibility of distributing any agreed upon additions or modifications as either an technical communications 59 addendum to or a new edition of books: a marc format. discussions following the presentation of these proposals indicated concern regarding three principal areas: 1. the modifications of any data element in an lc source document other than the addition of a local call number dictated that the institution performing the modification of the record assume the position of the cataloging source. this resulted in the possibility that a large number of records would undergo minor changes and consequently the knowledge that the record was actually an lc record would be lost. this loss was considered a critical problem. 2. the creation of a marc record implied that each fixed field and all content designators should be present if applicable for any one record. during the lc recon project, it was recognized that certain fixed fields could not be coded explicitly because the basic premise in the recon effort was the encoding of existing cataloging records without inspecting the book. consequently, the value of certain fixed fields such as indicating the presence or absence of an index in the work, could not be known. participants felt that a "fill" character was needed to describe to the recipient of machiner~adable cataloging data that a particular fixed field, tag, indicator, or subfield code could not be properly encoded due to uncertainty. the "fill" character will be a character in the present library character set but one not used for any purpose up to this time. 3. although networking is not clearly defined at this time, participants felt that the marc format should have the capability to include location symbols to satisfy any future requirement to transmit this information in order to expedite the sharing of library resources. majority opinion indicated there was a 60 journal of library automation vol. 7/1 march 1974 need to guarantee the recognition of an lc source record, that a "fill" character could serve a useful function, and that a method of transmitting location symbols was required. three position papers were written on the topics outlined above giving the rationale for the requirement and describing a proposed methodology for implementation. these papers were reviewed at a meeting of the participants and are presently undergoing modification taking into account recommendations made. the revised papers are to be distributed prior to the next meeting in january 1974. following this meeting, another paper will be prepared for publication which will include a definitive account of the modifications and additions recommended for the marc format as well as describing the rationale for the additions and modifications. 
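to make the proposed conventions concrete, the sketch below builds a toy record that follows them: field 001 carries the transcribing agency's control number, 010 the lc card number, 040 identifies the cataloging and transcribing libraries, 050 is reserved for the lc call number, 059 holds contributed classification, 090 the local call number, and a stand-in fill character marks a value that could not be determined from the source record. the dictionary layout, helper function, and all values are hypothetical illustrations, not lc's or any participant's actual implementation.

# toy record following the field conventions proposed above; illustrative only.
FILL = "|"  # stand-in for the proposed fill character; the real code point was
            # to be chosen from characters unused in the library character set

record = {
    "001": "xyz-000123",                       # transcribing agency control number
    "010": "74-000000",                        # lc card number, when known (placeholder)
    "040": {"cataloging": "DLC", "transcribing": "XYZ"},
    "050": "Z699.A1 J67",                      # lc call number only
    "059": {"scheme": "ddc", "source": "transcribing", "number": "025.3"},
    "090": "Z699 .J6 1974",                    # call number actually used locally
    "index_present": FILL,                     # fixed field that could not be verified
}

def cataloging_source(rec):
    """per the ground rule above: if bibliographic content was changed,
    the transcribing library becomes the cataloging source."""
    return rec["040"]["transcribing"] if rec.get("modified") else rec["040"]["cataloging"]

print(cataloging_source(record))  # prints DLC, since the lc content was not changed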
at that time the proposals will be submitted to the library community for its review and acceptance. if the additions and changes are approved by the marbi5 committee of the american library association, lc will proceed to amend or rewrite the publication books: a marc format. however, the points elaborated below deserve emphasis toward the understanding of the issues described in this paper. 1. the meetings were concerned with a national exchange of data, not international. 2. the additions and modifications recommended for the marc format, with one exception, affect organizations other than the library of congress exchanging machine-readable cataloging data. except for distributing records with the lc card number in field 010 as well as 001, the marc format at lc will remain intact. 3. lc will investigate the use of the fill character in its own records, both retrospective and current, and for records representing all types of materials. henriette d. avram marc development office library of congress references 1. the marc format has been adopted as both a national and international format by ansi and iso respectively. 2. subset in this context includes· both the data content of the record (fixed and variable fields) and content designators (tags, indicators, and subfield codes) . 3. recon working task force, "levels of machine-readable records," in its national aspects of creating and using marc/recon reco1'ds (washington, d.c.: library of congress, 1973), p.4-6. 4. this rule did not extend to a subscriber to the lc marc service duplicating an lc tape for another institution. one can readily see the chaos that would result if institution a sent its records to institutions b and c, b then selected all or part of a's records for inclusion in its data base, and then transmitted its records to a and c. the result of the multitransmission of the same records, modified or not, would create useless duplication and confusion. 5. rtsd/isad/rasd representation in machine-readable form of bibliographic information committee. appendix 1 list of organizations participating in the clr sponsored meetings library of congress national agricultural library national library of medicine national serials data program new england library information network new york public library the ohio college library center stanford university libraries university of chicago libraries washington state library university of western ontario library a composite effort to build an on-line national serials data base (a paper for presentation at the arl midwinter meeting, chicago, 19 january 1974) an urgent requirement exists for a concerted effort to create a comprehensive national serials data base in machine-readable form. neither the national serials data program nor the marc serials distribution service, at their current rate of data base building, will solve the problem quickly enough. because of the absence of a sufficient effort at the national level, several concerted efforts by other groups are under way to construct serials data bases. these institutions have been holding in abeyance the development of their automated serials systems, some for several years, waiting for sufficient development at the national level to provide a base and guidance for the development of their individual and regional systems. this has not been forthcoming, and local pressures from their users, their administrators, and their own developing systems are forcing these librarians to act without waiting for the national effort. 
these efforts are exemplified by the work of one group of librarians, described below. what has now come to be known as the "ad hoc discussion group on serials" had its beginnings in an informal meeting during the american library association's conference in las vegas last june. you will also hear this discussion group referred to as the "toronto group." this is because its prime mover has been richard anable of york university, toronto, and because the first formal meeting occurred in that city. the expenses of the toronto and subsequent meetings have been borne by the council on library resources, and council staff have been involved in each meeting. a fuller exposition of the origins, purposes, and plans of the toronto group has been written by mr. anable for the journal of libm1'y automation. it appeared in the december 1973 issue. quoting from anable: "at the meeting [in las vegas] there was a great deal of concern expressed about: 1. the lack of communication among the generators of machine-readable serials files. 2. the incompatibility of format and/ or bibliographic data among existing files. 3. the apparent confusion about the technical communications 61 existing and proposed bibliographic description and format 'standards'." end of quote. the toronto group agreed that something could and should be done about these problems. if nothing else, better communications among those libraries and systems creating machine-readable files would allow each to enhance its own systems development by taking advantage of what others were doing. as the discussions progressed, several points of consensus emerged. among them were: 1. the marc serials distribution service of the library of congress and the national serials data program together were not building a national serials data base in machine-readable form fast enough to satisfy the requirements of developing library systems. this systems development was, in several places, at the point where it could no longer wait on serials data base development at the national level as long as progress remained at the current rate. 2. the marc serials format developed at lc offered the only hope for machine format capability. every system represented planned to use it. for the purpose of building a composite data base outside lc, the marc serials format would probably require minor modification, principally by extension. these extensions could and should be added on so as to do no violence to software already developed to handle marc serials. 3. there existed some difference between the lc marc serials format and that used by the national serials data program. these differences arose from several circumstances. for example, the marc serials format predated the international serials data system (isds), the national serials data program, and the key title concept. when these three came along, the requirement existed that the nsdp abide by the conventions of the isds. since the key title 62 journal of librm·y automation vol. 7/1 march 1974 is not yet a cataloging title, but is the title to which the international standard serial number is assigned, it is natural that the approach to serial record creation by nsdp should be different from that of a library cataloging serials by conventional methods. a working group under the auspices of the ifla cataloguing secretariat has devised an international standard bibliographic description for serials. 
the working group's recommendations are to be distributed for trial, discussion, and recommendation for change in february. when the isbd ( s) is accepted into cataloging practice, some of the differences in marc usage and nsdp procedure will disappear. others will still remain and they must be reconciled. we cannot continue with two serial records, both of which claim to be national in purpose but which are incompatible with each other. a good exposition of the differences in these serials records from the point of view of the marc development office is in an article by mrs. josephine pulsifer in the december 1973 issue of the journal of libmry automation. 4. major canadian libraries are active in cooperative work on serials and these two national efforts should be coordinated. several other circumstances bear on the problem. for example, the national serials data program is a national commitment of the three national libraries. in addition to the funding from the three national libraries, there are excellent chances that the nsdp will receive funds from other sources to expedite its activities. the nsdp is responsible for the issn and key title and for relationships with the international serials data system. ultimately, the issn and key title will be of great importance to serials handling in all libraries. for all of these reasons it is imperative that the activities of the nsdp be channeled into the comprehensive data base building effort described in this paper. when it was realized at the council on library resources that the toronto group was serious and that a data base building effort would result, it was obvious that this had enormous significance for the library of congress and other library systems because the result would be a de facto national serials data base. accordingly, a paper was prepared and sent to lc, urging that an effort be made in washington to coordinate the efforts of the marc serials distribution service, the national serials data program, and this external effort. in addition, it was felt that lc should take a hard look at its own several serials processing flows and attempt to reconcile them better with each other and with the external effort. to do this, lc was urged to do a brief study of lc serials systems, using lc staff and one person from clr. lc agreed and the study is now very nearly complete. the written guidance given the study group members was quite specific. they were to study all serials flow at lc and make their recommendations based on what lc should be doing, rather than being constrained by what lc is doing. the overall objectives of the study were to aim for the creation of serials records as near the source as possible and onetime conversion of each record to machine-readable form to serve multiple uses. specifically to be examined were the serials processing flows of the copyright office, the order division, the serial record division, new serials titles, and the national serials data program. while all of this was going forward, the toronto group had some more meetings. oclc was tentatively selected as the site for the data base building effort. it is understood by everyone that this is a temporary solution; eventually a national-level effort must be mounted which will provide a post-edit capability to bring the composite data base up to nationally acceptable standards. a permanent update capability is also required. this permanent activity, hopefully, will be based at the library of congress. 
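building one composite file out of records arriving from many libraries turns on recognizing when an incoming serial is already in the base. the fragment below is a hypothetical sketch of such a matching step (by issn when present, otherwise by a normalized title), written in python for illustration; it is not the oclc system's actual logic, and the sample records are invented.

# illustrative matching step for a composite serials file: search first,
# update an existing record rather than adding a duplicate. a real system
# would also need conflict review and the post-edit described above.
def normalize(title):
    return " ".join(title.lower().split())

base = {}           # issn -> record
base_by_title = {}  # normalized title -> record

def add_or_update(record):
    key = record.get("issn")
    existing = base.get(key) if key else base_by_title.get(normalize(record["title"]))
    if existing:
        for field, value in record.items():
            existing.setdefault(field, value)  # fill only fields still empty
        return existing
    if key:
        base[key] = record
    base_by_title[normalize(record["title"])] = record
    return record

add_or_update({"issn": "0000-0000", "title": "journal of library automation"})
add_or_update({"title": "Journal of Library Automation", "publisher": "ala"})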
oclc was chosen as the interim site for several reasons, but especially for its proven capability to produce network software and support which will work. within a very short time oclc will have on-line serials cataloging and input capability which will extend to some two hundred libraries. no other system is nearly so far advanced. the toronto group has assured itself that the data record oclc intends to use is adequate and is now working on the conventions required to insure consistency in input and content, to include some recommendations for minor additions to the marc serials format. during their deliberations, the toronto group realized that, to be effective, their efforts needed formal sponsorship, and discussions to this end were begun. initially, several agencies were considered to be candidates for this management role. various considerations quickly narrowed the list down to the library of congress, the association of research libraries, and the council on library resources, and representatives of these three met to discuss the matter further. during the discussions, clr was asked to assume the interim management responsibility until a permanent arrangement could be worked out. clr was selected because, as an operating foundation under the tax laws, it can act expeditiously in matters of this kind. clr can also deal with all kinds of libraries and has no vested interest in any particular course of action. meanwhile, certain institutions in the toronto group had indicated that they were ready to pledge $10,000 among themselves for the specific purpose of hiring mr. anable as a consultant to continue his coordinating activities. the group asked clr to act as agent to collect and disburse these funds. clr is ready to assume the initial responsibility for the management of this cooperative data base building effort, if that is the will of the leadership in the library community. clr is prepared to commit one staff member full time to the project who is well versed in the machine handling of marc serials records. this is mr. george parsons, and other staff members will assist as appropriate. mr. anable has agreed to act as a consultant to help coordinate these activities. clr would aim for the most complete, accutechnical c01nmunications 63 rate, and consistent serial record in the lc marc serials format which can be had under the circumstances. during the effort, clr will act as the point of contact between oclc and the participating libraries, assisting in negotiating contracts and other agreements as required. the composite data base will be made available to all other libraries at the least possible cost for copying. initially at least, the costs of this effort will have to be shared by the participating libraries, since no additional funds are presently available. the goal is to build 100,000 serial records the first year, another 100,000 the second year, and design and implement the permanent mechanism the third year, while file-building continues. as the project gets under way, it will work like this: a set of detailed written guidelines for establishing the record and creating the input will be promulgated, and agreement to abide by them will be a prerequisite to participation. selected libraries with known excellence in serial records will be asked to participate; others may request participation. those selected who already have or can arrange for terminals on the oclc system will participate on line. 
this is the preferred method, but it may be possible to permit record creation off line, such records to be added to the data base in a batch mode. it is very difficult to merge serial files from different sources in this way, so an attempt will be made to find a large serials data base in machine-readable form for use as a starting point. this file would be read into the oclc system. a participating library wishing to enter a record would first search to see whether it existed in the initial data base. if a record is found, it would be updated insofar as this is possible, within the standards chosen for the system. it may be further updated by other participants, still within the system standards, but at some point update on a record in the system will reach a point of diminishing returns and the record will remain static until a post-edit at the national level can be performed. these records will be for use as their recipients see fit, but their prime purpose is to support the development of automated serials sys64 journal of library automation vol. 7/1 march 1974 terns while eliminating duplication of effort. details of how to hag these records in the oclc data base as they are being created by this effort will be worked out, as will be the relationship between this effort and the rest of oclc activities. clr will, from time to time, report progress to the community. it would be the hope of clr that the toronto group will continue to assist in the technical and detailed aspects of the project. in addition, and after consultation with the appropriate people, an advisory group will be appointed to advise clr in this effort. lawrence living8ton council on library resou1'ces input to the editor: re: file convm·sion using optical scanning at berkeley and the university of minnesota discussed by stephen silber8tein, jola technical communications, december 1973. it is rewarding to find someone who has actually read in detail one's published work (grosch, a. n. "computer-based subject authority files at the university of minnesota libraries"), i generally agree with mr. silberstein's observations regarding the use of optical scanning for library file conversion. however, several points were raised by mr. silberstein on which i feel further comment is needed. perhaps in my article i should have cautioned the reader that when developing procedure and programs for the cdc 915 page reader, there is a great variance in these machines depending upon: 1. how early a serial number unit, i.e., vintage of machine, 2. what version of the software system grasp is being used, 3. what degree of machine maintenance is performed out, and 4. what kinds of other customers are using the scanner. it was our misfortune to have a cdc 915 page reader that had many peculiarities about it which could or would not be resolved by a maintenance engineer. in addition it was not heavily used and what use it did receive was mostly nonrepetitive conversion jobs dealing mostly with mailing address file creation and freight billing. in our initial testing we tried to use various stock bond paper and had various reading difficulties. in talking with others who had used this particular machine we found that the choice of paper stock was critical on this scanner. i might add that we did not actually use $400 worth of paper on this as i sold half of the stock we had ordered to another user locally who was going to use this device. 
it might be worth mentioning that we had a failure of a potentially large conversion project reported to us. this project tried to use this equipment but could not create a suitable input format because of a specific uncorrected peculiarity of not being able to read lines of greater than six inches without repeated rejects. we were aware of this from our experience, which is why we kept our lines short, using the ro to terminate reading of the line at the last character position. also, our input was double spaced, not single spaced as you seem to infer in your comments. with this particular device we also found that the format recognition line was easily lost, necessitating greater time spent in re-running the job. therefore, even though this was a great commission of sin on our part according to mr. silberstein, i must plead guilty to using expedient methods to turn a bad situation into an acceptable one. i might also point out that this solution had been employed at various times by some past users we contacted. in fact, i have since found out that such a technique has occasionally been resorted to in one of our other local user installations on a much newer machine. i do not wish to imply that our conversion achieved maximum throughput, but in any case it was a cost-effective way to proceed. with a small file conversion such as this one, which is to be done on a one-shot basis, it seemed foolish to me to spend much time optimizing, but rather to find a way that worked as our difficulties were encountered. if this had to be a continuing job, we would have had to get a better maintained scanner and invest more time and money in the project. i take the view that we wish to couple modest human costs with modest projects and reserve more optimized procedures for greater projects of a continuing nature. i agree that file cleansing is undoubtedly the most costly operation, but i cannot say by just what amount, since my responsibilities did not include such work; this was later performed by our technical services department. our general point in writing about this project was to convey our broad experiences using this technique on a subject authority system, as we had not seen such use reported in the literature previously. i would hope your comments and mine here serve to illustrate that one's systems problems must be solved in light of the conditions and not always according to what we term the best theory or practice. to this end i hope others will profit from both of our comments. audrey n. grosch, university of minnesota libraries
the lc/marc record as a national standard
the desire to promote exchange of bibliographic data has given rise to a rather cacophonous debate concerning marc as a "standard" and the definition of a marc-compatible record. much of the confusion has arisen out of a failure to carefully separate the intellectual content of a bibliographic record, the specific analysis to which it is subjected in an lc/marc format, and its physical representation on magnetic tape. in addition, there has been a tendency to obscure the different requirements of users and creators of machine-readable bibliographic data. in general, the standards-making process attempts to find a consensus among both groups based on existing practice. the process of standardization is rarely one which relies on enlightened legislation. rather, a more pragmatic approach is taken, based on an evaluation of the costs to manufacturers weighed against costs to consumers.
even this modest approach is not invested with lasting wisdom. ansi standards, for example, are subject to quinquennial review. standards, as already pointed out, have as their basis common acceptance of conventions. thus, it might prove useful to examine the conventions employed in an lc/marc record. the most important of these is the anglo-american cataloging rules as interpreted by lc. the use of these rules for descriptive cataloging and choice of entry is universal enough that they may safely be considered a standard. similar comments may be made concerning the subject headings used in the dictionary catalog of the library of congress. the physical format within which machine-readable bibliographic data may be transmitted is accepted as a codified national and international standard (ansi z39.2-1971 and iso 2709-1973 (e)). this standard, which is only seven pages in length, should be carefully read by anyone seriously concerned with the problems of bibliographic data interchange. ansi z39.2 is quite different from the published lc/marc formats. it defines little more than the structure of a variable-length record. simply stated, ansi z39.2 specifies only that a record shall contain a leader specifying its physical attributes, a directory for identifying elements within the record by numeric tag (the values of the tags are not defined), and, optionally, additional designators which may be used to provide further information regarding fields and subfields. this structure is completely general. within this same structure one could transmit book orders, a bibliographic record, an abstract, or an authority record by adopting specific conventions regarding the interpretation of numeric tags. thus, we come to the crux of the problem, the meanings of the content designators. content designators (numeric tags, subfields, delimiters, etc.) are not synonymous with elements of bibliographic description; rather, they represent the level of explicitness we wish to achieve in encoding a record. it might safely be said that in the most common use of a marc record, card production, scarcely more than the paragraph distinctions on an lc card are really necessary. if we accept such an argument, then we can simply define compatibility with lc/marc in terms of a particular class of applications, e.g., card, book, or crt catalog creation. a record may be said to be compatible with lc/marc if a system which accepts a record as created by lc produces from the compatible record products not discernibly different from those created from an lc/marc record. thus, what is called for is a family of standards, all downwardly compatible with lc/marc, employing ansi z39.2 as a structural base. this represents the only rational approach. the alternative is to accept lc/marc conventions as worthy of veneration as artistic expression. s. michael malinconico
computer assisted circulation control at health sciences library sunyab
jean k. miller: associate health sciences librarian for circulation and dissemination
a description of the circulation system which the health sciences library at the state university of new york at buffalo has been using since october 1970. features of the system include automatic production of overdue, fine, and billing notices; notices for call-in of requested books; and book availability notices.
remote operation and processing on the ibm 360/40 and cdc 6400 computers are accomplished via the administrative terminal system (ats) and terminal job entry (tje). the system provides information for management of the collection and improved service to the user.

introduction

the health sciences library of the state university of new york at buffalo (sunyab) serves the teaching, research, and clinical programs of the five schools of health sciences at the university (medicine, dentistry, pharmacy, nursing, and health related professions) as well as the department of biology. it is the biomedical resource library for the five teaching hospitals affiliated with sunyab and for the health professionals within the nine counties of the lakes area regional medical program. service demands had increased steadily since 1961 with the incorporation of the university within the state of new york. this was apparent in the circulation department, where statistics indicated a 21 percent increase in the circulation of materials between fy 1967/68 and 1968/69. the circulation system in use was inefficient and time-consuming for both the user and the clerical staff. the user was required to fill out a charge card for each book, giving his name, address, and status; and the title, author, year of publication, volume, copy number, and call number of the book. the card was stamped with the due date and filed alphabetically by main entry. problems resulted from illegible handwriting, selection of incorrect main entry, and incorrect filing. control of library materials was inadequate. the system to be described was adopted following consideration of the requirements of an effective system of circulation control and of the resources available to the library. planning for the development and implementation of the automated circulation system began in the fall of 1969. funding was provided by a medical library resource grant and the office of the provost of health sciences of sunyab. system design began in february 1970; programming was accomplished during june and july; implementation started in august; and the system was operational in october 1970. costs of operation have been provided by the university libraries of sunyab since april 1971.

computer facilities

the health sciences library shares the facilities provided by the department of computer services on campus. the current installation is an ibm 360/40 h with an eight-disk drive 2319 unit, six magnetic tape devices, card read and punch unit, and a 1100 line-per-minute printer. it includes a 2703 telecommunications unit supporting forty 2741 terminals and a 2701 unit with parallel data adapter unit interfacing a channel-to-channel adapter to a cdc 6400 computer. the ibm operating system, scope 3.2.0 version 1.1, is used.

processing

the library's circulation system was designed to use the administrative terminal system (ats) and terminal job entry (tje) for remote operation and processing on the ibm 360/40 and cdc 6400 computers. programs are written in fortran for cdc 6400-6500-6600 (version 2.3). the program modules comprising the circulation system require from 1k to 60k and from 0 to 2 tape units for processing. ats documents are used rather than punched card decks as program and data input media. the system incorporates several large data bases which are updated at regular intervals.
a file of current circulation transactions (80 characters per record) is maintained on both magnetic tape and in ats storage. this file is merged daily with new transactions. names and addresses of university personnel and students are maintained on magnetic tape. a file of inactive circulation records (50 characters per record) is also maintained on tape. other smaller files are stored in ats and are updated daily and/or weekly. no permanent disk storage is used. input of data and programs is made from the ibm 2741 terminal located in the circulation office of the library. data are entered daily by the clerical staff via the ats terminal. storage, retrieval, and text editing are performed as required. processing of data is initiated by the library staff. a properly sequenced assemblage of ats documents consisting of data and programming instructions (the tje input file) is input from the ibm 2741 terminal. this input file is submitted through tje for execution on the cdc 6400 computer. the data are processed in accordance with the specific job command entered at the terminal. after processing, the output is stored as a single ats document. in some instances, the clerical staff divides the ats stored output into discrete output files for storage and subsequent use. selected segments of the output (notices, save lists, etc.) are produced in hard copy format and delivered to the library by the computer center (figure 1).

hsccirc system

the health sciences library circulation system (hsccirc) provides:
1. a file (query file) of all monographs off the shelves which includes a record of:
   a. books charged out.
   b. books on interlibrary loan.
   c. books on reserve.
   d. books at the bindery.
   e. books on the "hold shelf" which have been returned upon the request of another patron.
   f. books on the "new book shelf."
   g. books which have been declared lost and are in the process of being replaced.
2. overdue notices to all borrowers.
3. billing notices to students for those books not returned after a second overdue has been sent.
4. a file (fine file) indicating the amount owed by individual students for overdues.
5. fine notices to students if an overdue book is returned but the fine is not paid.
6. notices to users having books requested by other patrons.
7. hold shelf notices alerting library personnel to those books which have been reserved for library patrons.
8. book availability notices to users who have made "save" requests.
9. a file (history file) containing records of inactive transactions.
10. daily and cumulative (fiscal year-to-date) statistics of the transactions.

the foregoing lists the information which the system provides on a routine basis. other modules of the system permit access to additional information as required. for example, lists may be prepared of books currently in circulation to interlibrary loan, on reserve, or at the bindery. these lists are used by the staff involved in processing these materials and may be updated at their request.

[fig. 1. system overview: flowchart of query file, fine file, and transaction file processing, together with history file analysis, address file update, semester faculty/staff letter, and special run sequences (ill, reserve, bindery).]
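the daily merge of the master transaction file with the day's new transactions, described above, can be pictured with a minimal sketch. the file names and the assumption that the call number occupies the first columns of each 80-character record are illustrative only; the article reports the record length but not the field layout, and the production programs were written in fortran rather than the python used here.

    # a minimal sketch, not the sunyab implementation: merge yesterday's master
    # file of 80-character circulation records with today's transactions and
    # keep the result in call number order for the printed query file.
    def read_records(path):
        with open(path) as handle:
            return [line.rstrip("\n") for line in handle if line.strip()]

    def merge_daily(master_path, new_path, out_path, key_width=24):
        merged = read_records(master_path) + read_records(new_path)
        merged.sort(key=lambda record: record[:key_width])  # assumed: call number first
        with open(out_path, "w") as out:
            for record in merged:
                out.write(record + "\n")

    merge_daily("circ_master.dat", "todays_transactions.dat", "circ_master_new.dat")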
the history file is analyzed quarterly. the analysis provides a statistical breakdown, by user categories, of the transactions which occurred since the last analysis. the total number of charges, renewals, and save requests for each of the five user categories are tallied. the call numbers of the books borrowed by members of each user category are listed. multiple charges of the same book are incremented and recorded. this information on book usage and borrowing patterns assists in library management decisions. it is possible to identify high usage of specific volumes or subject areas and to determine whether the demand is from the faculty, staff, or graduate or undergraduate student body. records of heavy demand and multiple save requests aid in decisions to purchase additional copies of a monograph. at the end of each semester, faculty/staff letters are prepared and mailed. each notice lists the call number and due date for overdue books currently charged to the faculty or staff member. the notice requests return of the book(s) before the beginning of the next semester. statistics generated by the system (figure 2) are used in the preparation of monthly, quarterly, and annual reports. they have been used as a basis for decisions on policy such as that resulting in a change in the length of the circulation period in april 1971. subsequent statistics have been used to evaluate such changes. in addition, the system permits rapid, easy consultation of the query file to detect the location of any book off the shelves. this is accomplished through use of the printed query file (figure 3) which is arranged in call number sequence. it contains one line of information for each transaction changing the status of a book.

                         42572   year to date
new chrgs                  112           2225
holds                        5            185
spcl chrgs (ill)             5             85
spcl chrgs (bnd)             0             48
spcl chrgs (res)             1            116
renewals                    12            308
save requests                3             74
recall letters               2             63
hold letters                 3             99
books overdue               48           1230
1st overdue                  0            773
2nd overdue                  0            364
bills                        0            122
lost books                   2             61
discharges                 173           2837
discharges (hld shlf)        5            154
fig. 2. circulation statistics.
[fig. 3. query file (excerpt): one line per transaction in call number sequence, e.g. *ql698/h25/1 *c10 *t 61972 *d 71972 *p 138551 *u3. legend: * call number; *c transaction code; *t date of transaction; *d due date or date next notice will be generated; *p patron identification number; *u user category.]

for example, when a book is charged, the query file contains one line of information relating to the charge. if the book becomes overdue, a second line of information is automatically generated indicating the overdue status of the book. a two-digit transaction code defines the status. the transaction code is entered as part of the input (as code 10 when charging a book); or it is generated by the system, as occurs when a book becomes overdue and initial and subsequent overdue, billing, and/or fine notices are produced (codes 51, 52, 53, 54). this same information may be obtained through on-line query of the circulation file from the ibm 2741 terminal during the hours of operation of ats. access to the query file is either by call number of the book or identification number of the user. the latter is used when producing lists of items out on loan to a borrower and in detecting delinquent borrowers.

overview

comparison of statistics between fy 1969/70 and fy 1971/72 showed a 12 percent increase in circulation. during the same period there was a 61 percent increase in the number of people using the library. the circulation department has been able to handle the increased workload more efficiently because of the automated system. a decrease in clerical time required for carrying out the tasks of the department has been realized. the circulation records are now updated five times per week and notices are issued promptly. previously, updating was possible only once in every seven to ten days.
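to make the query-file format concrete, here is a minimal sketch that reads one line using the legend printed with figure 3. the date handling (month, two digits of day, two digits of year) and the cutoff date are assumptions for illustration; only the field tags and transaction code 10 (a charge) come from the article.

    # parse one printed query-file line per the figure 3 legend:
    # * call number, *c transaction code, *t transaction date, *d due date,
    # *p patron identification number, *u user category.
    import re
    from datetime import date

    FIELD = re.compile(r"\*([ctdpu])\s*(\S+)")

    def parse_line(line):
        call_number, rest = line.split(" *", 1)
        entry = {"call": call_number.lstrip("*").strip()}
        entry.update(FIELD.findall("*" + rest))
        return entry

    def to_date(digits):
        digits = digits.zfill(6)            # "60772" -> "060772", i.e. 6/07/72 (assumed)
        return date(1900 + int(digits[4:]), int(digits[:2]), int(digits[2:4]))

    sample = "*ql698/h25/1 *c10 *t 61972 *d 71972 *p 138551 *u3"
    record = parse_line(sample)
    if record["c"] == "10" and to_date(record["d"]) < date(1972, 8, 1):
        print("overdue candidate:", record["call"], "patron", record["p"])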
service to the user is much faster and more accurate in charging books and in providing information on book status. control of items loaned to users is more effective. information for management of the collection and provision of improved service is available. system disadvantages are related to the mode of data input and lack of author and title information on records. transcription errors occur during manual capture of data at the time the transaction occurs and when the data are entered by the clerical staff from the terminal. correction of errors requires rekeying and reentry of the corrected data for reprocessing. this increases cost in terms of personnel time and equipment use. author and title information is not provided in the query file or on notices sent to users. this is an inconvenience to the user and requires checking of the shelf list by library personnel to provide the information when required. these potential disadvantages were recognized at the time the system was planned. however, they were not considered serious drawbacks. the decision was made to adopt the system and, when additional funds were forthcoming, to provide machine readable input and add author and title information to the records.

costs

the cost of the system during its first year of operation was $10,590.65. this included monthly charges for rental of equipment, use of ats, storage of records, computer time, and print costs.

ibm 2741 terminal (including phone line)     $1,082.86
ats sign-on time                              1,042.08
ats storage                                   3,187.24
computer time and print costs                 5,278.47
total                                       $10,590.65

unit cost figures are imperfect, but over 69,000 transactions were processed and over 20,000 notices generated at an average unit cost of 11.6 cents. clerical time is not included in this figure. the number of clerical assistants remained constant although, as noted, all phases of the work of the circulation department increased.

future development

in the future, the library hopes to be able to take greater advantage of the on-line query capability of the present system. additional ibm 2741 terminals at selected locations in the library could provide instantaneous file query. while non-routine queries are made on-line, the library now uses printed listings for most routine queries. the installation of automatic data input devices, such as ibm 1030 equipment, would permit reading of coded book cards and patron identification cards with direct transmission of data to ats storage. the hardware and software modification required to implement this additional capability is technically feasible and not financially prohibitive. the present system is to be installed soon in another library on the sunyab campus. implementation should require only minimal software modifications to identify and keep separate the records of the other library. adoption is simplified by the fact that book cards are not required and that the circulation file consists only of charged materials and not a record of complete library holdings.

acknowledgments

the following individuals contributed their varied talents and support to the development and implementation of the system: mr. erich meyerhoff, former librarian of the health sciences library; gerald lazorick, systems design programmer, former director, technical information dissemination bureau, sunyab; mrs. jean risley, programmer/analyst; mr.
mark fennessy, former library intern at the health sciences library; and the clerical staff of the circulation department, especially barbara helminiak and evelyn hufford.

of the people, for the people: digital literature resource knowledge recommendation based on user cognition

wen lou, hui wang, and jiangen he

information technology and libraries | september 2018

wen lou (wlou@infor.ecnu.edu.cn) is an assistant professor in the faculty of economics and management, east china normal university. hui wang (1830233606@qq.com) is a graduate student in the faculty of economics and management, east china normal university. jiangen he (jiangen.he@drexel.edu) is a doctoral student in the college of computing and informatics, drexel university.

abstract

we attempt to improve user satisfaction with the effects of retrieval results and visual appearance by employing users' own information. user feedback on digital platforms has been proven to be one type of user cognition. by constructing a digital literature resource organization model based on user cognition, our proposal improves both the content and presentation of retrieval systems. this paper takes powell's city of books as an example to describe the construction process of a knowledge network. the model consists of two parts. in the unstructured data part, synopses and reviews were recorded as representatives of user cognition. to build the resource category, linguistic and semantic analyses were used to analyze the concepts and the relationships among them. in the structural data part, the metadata of every book was linked with each other by informetrics relationships. the semantic resource was constructed to assist with building the knowledge network. we conducted a mock-up to compare the new category and knowledge-recommendation system with the current retrieval system. thirty-nine subjects examined our mock-up and highly valued the differences we made for the improvements in retrieval and appearance. knowledge recommendation based on user cognition was tested to be positive based on user feedback. there could be more research objects for digital resource knowledge recommendations based on user cognition.

introduction

the concept of user cognition originates in cognitive psychology. this concept principally explores the human cognition process through information-processing methods.1 the concept characterizes a process in which a user obtains unknown information and knowledge through acquired information. as information-science workers, we may explore the psychological activities of users by analyzing their cognitive processes when they are using information services.2 a knowledge-recommendation service based on user cognition has become essential since it emphasizes facilitating collaborations between humans and computers and promotes the participation of users, which ultimately improves user satisfaction. a knowledge-recommendation system is based on a combination of information organization, a retrieval system, and knowledge visualization.3 however, when exploring digital online literature resources, it is difficult to quickly and precisely find what we want because of the problem of information organization and retrieval. most search results only display a one-by-one list view.
thus, adding visualization techniques to an interface could improve user satisfaction. furthermore, the retrieval system and visualizations rely on information organization. only if information is well designed can the retrieval system and visualization be useful. therefore, we attempt to improve retrieval efficiency by proposing a digital literature resource organization model based on user cognition to improve both the content and presentation of retrieval systems. taking powell's city of books as an example, this paper proposes user feedback as first-hand user information. we will focus on (1) resource organizations based on user cognition and (2) new formats of search results based on knowledge recommendations. we will purposefully employ data from users' own information and give knowledge back to users in accordance with the quote "of the people, for the people."

related work

user cognition and measurement

user cognition usually consists of a series of processes, including feeling, noticing, temporary memory, learning, thinking, and long-term memory.4 feeling and noticing are at an inferior level, while learning, thinking, and memory are comparatively superior. researchers have so far tried to identify user cognition processes by analyzing user needs. there are four levels of user needs according to ma and yang5 (see figure 1). in turn, user interests normally reflect potential user needs. users who retrieve information on their own show feeling needs. users who give feedback show expression needs. users who ask questions show knowledge needs, which is the highest level. the methods to quantify user cognition require visible and measurable variables. existing studies have commonly used website log analysis or user surveys. website log analysis has been proven to be a solid data source to record and analyze both user interests and information needs.6 user surveys, including online questionnaires and face-to-face interviews, have been widely used to comprehend user feelings and user satisfaction.7 user surveys generally measure two kinds of relationship: between users and digital services and between users and the digital community.8 with a survey, we can make the most of statistics and assessment studies to analyze user satisfaction with an array of standards and systems of existing service platforms, service environments, service quality, and service personnel, which provides some references and suggestions for future study of user experience quality, platform elements, interaction process, and more.9 however, neither log data nor surveys can obtain first-hand user information in real-life settings. eye tracking and the concept-map method can be used to understand user behavior in the course of user testing.10 however, these approaches are difficult to adapt to a large group of users. therefore, a linguistic-oriented review analysis has become an increasingly important method. user content, including reviews and tags, can be analyzed through text mining and becomes a valuable data source for learning users' preferences for products and services in the areas of electronic commerce and digital libraries.11 this type of data has been called "more than words."12

figure 1. understanding user cognition by analyzing user needs.
user-oriented knowledge service model

the user-oriented service model includes user demand, user cognition, and user information behavior. a service model based on user demand chiefly concentrates on the motives, habits, regularities, and purposes of user demand to identify the model of user demand so that the appropriate service is adopted.13 service models based on user cognition attach importance to the process of user cognition, the influences that users are facing,14 and the change of library information services under the effects of a series of cognitive processes (such as feeling, receiving, memorizing, and thinking).15 a service model based on user information behavior focuses on interactive behavior in the process of library information services that users participate in, such as interactions with academic librarians, knowledge platforms,16 and others. studies have paid more attention to the pre-process of the user-oriented service model, which analyzes information habits and user behaviors.17 studies have also proposed frameworks of knowledge services, design innovations,18 or personalized systems and frames of the knowledge service model, but they have not succeeded in implementing them or performing user testing.

knowledge service system construction

most studies of knowledge service system construction are in business areas. numerous studies have explored knowledge-innovation systems for product services.19 cheung et al. proposed a knowledge system to improve customer service.20 vitharana, jain, and zahedi composed a knowledge repository to enhance the knowledge-analysis skills of business consultants.21 from the angle of user demand, zhou analyzed the elements of service-platform construction and found that crucial platforms should serve knowledge service system construction.22 scholars proposed basic models for knowledge management and knowledge sharing, but they did not simulate their applications.23 knowledge management from the library-science perspective is very different from that in the business area. library knowledge management usually refers to a digital library, especially a personal digital library.24 others explore and attempt to construct a personalized knowledge service system,25 while fewer studies about system designs are based on the results of documented user surveys. we rarely see a user-feedback study combined with the method of using users' own knowledge. users themselves know what they desire. if user-oriented studies separate the system design from user-needs analysis or the other way around, the studies may miss the purpose. therefore, we propose a resource-organization method based on users' own knowledge to close the distance between the users and the system.

resource-organization model based on user cognition

there are normally two ways to construct a category system. one method gathers experts to determine categories and assign content to them; the category system comes first and the content second. the other method is to derive a category tree from the content itself, as we propose in this paper. in this way, the content takes priority over the categorization system. in this paper, we focus on this second way to organize resources and index content. resource organization requires a series of steps, including information processing, extraction, and organization. figure 2 shows the resource-organization model based on user cognition.
this model fits the needs of digital resources with comments and reviews. the model has two interrelated parts. one is for indexing the content, and the other is for knowledge recommendations. for the first part, the model integrates all the comments and reviews of all literature in an area or in the whole resource. the core concepts and the relationships among the concepts are extracted through natural language processing. the relationships between concepts are either subordination or correlation. a triple consists of two core concepts and their relationship. the triple set includes all triples. next, all books are indexed by taxonomy in the new category system. however, the indexing of every book is not based upon the traditional method, which is to manually determine each category by reading the literature. we use a method based on the books' content. in parallel with extracting the core concepts from all books, we extract the core concepts from every individual book by the same semantic-analysis methods and build up triples for that book. then the triples of this book can be matched against the triple set in the new category system. once a triple in a single book yields a maximum matching value, the core concepts in the triple set will be indexed as the keywords of the book. a few examples of the matching process will be discussed in the empirical study (in the section "indexing books"). the first part is about comments and reviews, which are unstructured data. the second part is to make use of structural data in the bibliography to build a semantic network. structural data, including titles, keywords, authors, and publishers, is stored separately. we calculate the informetrics relationships among the entities. the relationships can be among different entities, such as between one author and another or between an author and a publisher. then two entities and their relationship compose a triple. the components in triples are linked to each other, which makes them semantic resources. furthermore, the keywords in the structural data are not the original keywords from before the new category system but the modified keywords. finally, the reindexed resources (books in the new category) and semantic resources (the triples from structural data) are both used to build the knowledge network.

figure 2. resource-organization model based on user cognition.

however, why is it important to use both unstructured data and structural data? the reason is to cover the entire content of a literature resource. neither of them can fully represent the whole semantics of a literature resource. structural data lacks subjective content, and unstructured data lacks basic information. thus, a full semantic network can be built using both kinds of data.

resource-organization experiment

object selection

located in portland, oregon, powell's city of books (hereafter referred to as "book city") is one of the largest bookstores in the united states, with 200 million books in its inventory. book city caught our eye for four reasons. (1) the comments and reviews of books on book city's website are well constructed and plentiful. the national geographic channel established it as one of the ten best bookstores in the world.26 atlantis books, pendulo, and munro's books are also on the list.
among these bookstores, only book city and munro's books have indexed the information of comments and reviews. since user reviews are fundamental to this study, we restricted ourselves to bookstores that provided user reviews. (2) we excluded libraries because literature resources have been well organized in libraries; it might not be necessary to reorganize them according to user cognition. however, we may revisit this topic in a future study. (3) book city is a typical online bookstore that also has a physical bookstore. unlike amazon, book city, indigo, barnes & noble, and munro's books have physical bookstores. however, they all have technological limitations in retrieval-system and taxonomical construction compared to amazon. thus, it is necessary to investigate these bookstores' online systems and optimize them. (4) the location was geographically convenient to the researchers. the authors are more familiar with book city than with other bookstores. moreover, we planned on conducting face-to-face interviews for the user study, which is doable only if the authors can get to the bookstore and the users who live there. in all, we chose book city as a representative object.

data collection and processing

on december 22, 2015, we randomly selected the field "cooking and food" and downloaded bibliographic data for 462 new and old books that included title, picture, synopsis and review, isbn, publication date, author, and keywords. in our previous work we described how metadata for all kinds of literature can be categorized into one of three types: structural data, semistructural data, and unstructured data27 (see table 1). title, isbn, date, publisher, and author are classified as structural data. titles can be seen as structural data or unstructured data depending on the need. titles will be considered an indivisible entity in this paper, as titles need to retain their original meanings. keywords are considered semistructural data for two reasons: (1) normally one book is indexed with multiple keywords, which are natural language; and (2) keywords are separated by punctuation. each keyword can individually exist with its own meaning. however, in the current category system, keywords are the names of categories and subcategories. since we are about to reorganize the category system, the current keywords will not be included in the following steps. we use the field "synopsis and review" in the downloaded bibliographic records as the source of user cognition. synopses and reviews are classified as unstructured data. all synopses and reviews of a single book are first incorporated into one paragraph, since some books contain more than one review. structural data will be stored for constructing a knowledge network. unstructured data will be part-of-speech tagged and word segmented by the stanford segmenter. all the books' metadata are stored into the three defined data types and separate fields. each field is linked by the isbn as the primary key.

category organization

first, the frequencies of words in all books are separately calculated after word segmenting so that core concepts are identified by the frequencies of words. in total, 29,370 words appeared 43,675 times, after excluding stop words. the 206 words in the sample that occurred more than 105 times appeared 34,944 times. this subset was defined as the core words according to the pareto principle.
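a minimal sketch of this core-word step is given below. the frequency threshold of 105 occurrences comes from the passage above; the tiny stop-word list and the whitespace tokenizer are stand-ins for the stop-word handling and the stanford segmenter that the authors actually used.

    # count words across all synopses and reviews and keep only those above the
    # frequency threshold; these become the core words of the new category system.
    from collections import Counter

    STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for"}  # illustrative subset

    def core_words(review_texts, threshold=105):
        counts = Counter()
        for text in review_texts:
            for token in text.lower().split():          # stand-in for the stanford segmenter
                token = token.strip(".,;:!?\"'()")
                if token and token not in STOP_WORDS:
                    counts[token] += 1
        return {word: n for word, n in counts.items() if n > threshold}

    # e.g. reviews = [book["synopsis_and_review"] for book in records]
    #      print(sorted(core_words(reviews)))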
table 1. data sample.
title (structural data): a modern way to eat: 200+ satisfying vegetarian recipes
isbn (structural data): 9781607748038
date (structural data): 04/21/2015
publisher (structural data): ten speed press
author (structural data): anna jones
kwds (semistructural data): cooking and food-vegetarian and natural
synopsis and review (unstructured data): a beautifully photographed and modern vegetarian cookbook packed with quick, healthy, and fresh recipes that explore the full breadth of vegetarian ingredients (grains, nuts, seeds, and seasonal vegetables) from jamie oliver's london-based food stylist and writer anna jones. how we want to eat is changing. more and more people cook without meat several nights a week and are constantly seeking to . . .

we are inspired by zhang et al., who described a linguistic-keywords-extraction method by defining multiple kinds of relationships among words.28 the relationships include direct relationship, indirect relationship, part-whole relationship, and related relationship.
- direct relationship. two core words have a relationship directly to each other.
- indirect relationship. two core words are related and linked by another word acting as an intermediary.
- part-whole relationship. the "is a" relation. one core word belongs to the other. it is the most common relationship in context.
- related relationship. two core words have no relationship but they both appear in a large context.
the first two relationships can be mixed with the second two relationships. for instance, a part-whole relationship can have either a direct relationship or an indirect relationship. for this study, we combined every two core words into pairs for analysis. for example, the sentence "a picnic is a great escape from our day-to-day and a chance to turn a meal into something more festive and memorable" would result in several core-word pairs, including "picnic" and "meal," "picnic" and "festive," and "meal" and "festive." for "picnic" and "meal," there is an obvious part-whole relationship in this context. we observed all their relationships in all books and determined their relationship to be a direct part-whole relationship because 67 percent of their relationships are part-whole relationships, 80 percent are direct relationships, and the others are related relationships. this is the case when two core words are in the same sentence. for two words in different sentences but within one context, we define the words' relationship as a sentence relationship. for example, "ingredient" and "meat" in one review in table 1 have an indirect relationship because they are connected by other core concepts between them. therefore, the relationship between "ingredient" and "meat" is an indirect part-whole one in this context. for other cases, two concepts are either related if they appear in the same context or not related if they do not appear in the same review. thus, all couples of concepts are calculated and stored as semantic triples.

figure 3. parts of a modified category in "cooking and food" based on user cognition.

the next step is to build up a category tree (figure 4). a direct part-whole relationship is that between a parent class and a child class. an indirect part-whole relationship is the relationship between a parent class and a grandchild class. a related relationship is the relationship between sibling classes.
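as a concrete illustration of how such pairs become triples, the sketch below classifies a core-word pair by majority vote over all the contexts in which the two words co-occur, mirroring the "picnic"/"meal" example above (67 percent part-whole, 80 percent direct). the data structures and the voting rule are assumptions made for illustration, not the authors' published algorithm.

    # classify every co-occurring pair of core words and store the result as a
    # semantic triple (word_a, relationship, word_b).
    from collections import Counter

    def classify_pair(observations):
        # observations: (kind, link) tuples from each context where the pair co-occurs,
        # e.g. ("part-whole", "direct") or ("related", "related")
        kinds = Counter(kind for kind, _ in observations)
        links = Counter(link for _, link in observations)
        kind = kinds.most_common(1)[0][0] if kinds else "related"
        link = links.most_common(1)[0][0] if links else "related"
        return link + " " + kind

    def build_triples(pair_observations):
        triples = []
        for pair, observations in pair_observations.items():
            word_a, word_b = sorted(pair)
            triples.append((word_a, classify_pair(observations), word_b))
        return triples

    # build_triples({frozenset({"picnic", "meal"}):
    #                [("part-whole", "direct")] * 8 + [("related", "related")] * 2})
    # -> [("meal", "direct part-whole", "picnic")]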
compared to the modified category system (figure 3), the current hierarchical category system (figure 4) has two major issues. first, some categories' names are duplicated. for example, the child class "by ingredient" contains "fruit," "fruits and vegetables," and "fruits, vegetables, and nuts." second, there are categories without semantic meaning, such as "oversized books." these two problems brought about disorderly indexing and recalled many irrelevant results. for example, the system asks you to refine your search first if you type one word in the search box, but the refinement is confusing because parent classes and child classes are mixed. searching for "diet" books as an example, the system suggests you refine your search from five subcategories of "diet and nutrition" under three different parent classes. the modified category system, however, avoids the duplicated keywords. furthermore, the hierarchical system based on users' comments maintains meaning.

figure 4. parts of the current category system in "cooking and food."

indexing books

we found that the list of keywords was confusing due to the inefficiency of the previous category system. it is therefore necessary to re-index the keywords of each book based on the modified category system. we rely on a data-oriented indexing process. the method to detect the core concepts of each book is the same as that used for all books in the "category organization" step. taking the book a modern way to eat as an example, triples are extracted from the book, including "grain-direct part whole-ingredient," "nut-direct part whole-ingredient," "vegetarian-related-health," and so on. using all triples of the book to match against the triple set built from all books, we index this book to categories by the best matching parent class. in this case, 5 out of 9 triples of a modern way to eat are matched with the parent class "ingredient." another two are matched with "natural" and "technique," and the other two cannot correctly match with the triple set. then, a modern way to eat will be indexed with "cooking and food-ingredient," "cooking and food-natural," and "cooking and food-technique."

semantic-resource construction

the semantic resource is constructed based on the structural data that was prepared at the beginning. the informetrics method (specifically co-word analysis) is used to extract the precise relationships among the bibliography of books, as we previously proposed.29 we construct all structural data together and build co-word matrices between each title, publisher, date, author, and keyword. for example, the author "anna jones" co-occurred with many keywords to varying degrees. the author co-occurred with the keyword "natural" four times and "person" seven times. according to qiu and lou, the precise relationship needs to be divided by thresholds and formatted as literal words.30 therefore, among the degrees of all relationships between "anna jones" and other keywords, the relationship between "anna jones" and "natural" is highly correlated, and the relationship between "anna jones" and "person" is extremely correlated. triples are composed of two concepts and their relationship. then a semantic resource is finally constructed that can be used for knowledge retrieval.

figure 5. an example of the knowledge network.

once the semantic resource is ready, the knowledge network is presentable.
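a minimal sketch of that co-word step follows. the two count thresholds that separate "highly correlated" from "extremely correlated" are invented for illustration (the article defers to qiu and lou for the actual cut-offs); only the idea of counting co-occurrences among structural-data entities and turning the counts into literal relationship words comes from the text above.

    # count co-occurrences among structural-data entities (title, author, publisher,
    # keyword, date) and convert each count into a literal relationship label.
    from collections import Counter
    from itertools import combinations

    def cooccurrence(records):
        counts = Counter()
        for record in records:                       # record: dict of structural fields
            entities = sorted((field, value) for field, value in record.items() if value)
            for a, b in combinations(entities, 2):
                counts[(a, b)] += 1
        return counts

    def label(count, high=3, extreme=6):             # assumed thresholds
        if count >= extreme:
            return "extremely correlated"
        if count >= high:
            return "highly correlated"
        return "related"

    def semantic_triples(records):
        return [(a[1], label(n), b[1]) for (a, b), n in cooccurrence(records).items()]

    # seven records pairing author "anna jones" with keyword "person" would yield
    # ("anna jones", "extremely correlated", "person").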
we adopted d3.js to display the knowledge network (figure 5). the net view automatically exhibits several books related to the author william davis, who is placed in a conspicuous position on the screen. the force-directed map re-forms when users drag any book with the mouse, and the dragged book becomes the visible center of the other books. the network can connect with the database and the website.

user-experience study on knowledge display and recommendation

there are two common ways to evaluate a retrieval system. one is to test statistical results, such as recall and precision. the other is a user study. since our aim is "of the people, for the people," we chose to conduct two user-experience studies rather than report statistical results. as such, we can learn what users suggest and how they comment on our approach.

user-experience study design

in february 2016, with the help of friends, we recruited volunteers by posting fliers in portland, oregon. fifty volunteers contacted us. thirty-nine responses were received by the end of march 2016 because the other eleven volunteers were not able to enroll in the electronic test. since we needed to test the feasibility of both the new indexing category and the knowledge recommendation, we set up the user study in two parts: the comparison of simple retrieval and the knowledge recommendation. first, we requested permission to use the data source and website frame from book city. however, we could not construct a new website for book city due to intellectual-property issues. therefore, we constructed a practical mock-up to guide users through a simulated retrieval experiment. following the procedure of user experience design, we chose mockingbot (https://mockingbot.com) as the mock-up builder. mockingbot allows demo users to experience a vivid version of a system that will be developed later. the mock-up supports every tag that can be linked with other pages, so that subjects could click on the mock-up just as they would on a real website. the demo is expected to help us (1) examine whether our changes would meet the users' satisfaction and (2) gather information for a better design. then we performed face-to-face, user-guided interviews to first gather experience with the previous retrieval system and then compare it with our results. we concurrently recorded the answers and scores of users' feedback. in the following sections, we describe the interview process and present the feedback results.

study 1: comparison of simple retrieval

first, subjects were asked to search for related books written by "michael pollan" at powells.com (figure 6). as expected, all subjects used the search box based on their instincts. then they were asked to find a new hardcover copy of a book named cooked: a natural history of transformation. we paid attention to the ways that subjects located the target. only five of them used keyboard shortcuts to find the target. however, thirteen subjects stated their concerns regarding the absence of refinement options. furthermore, we noticed that six subjects swept (moused over) the refinement area and then decided to continue eye screening. in the meantime, we recorded the time they spent looking for the item. after they found the target, all subjects gave us a score from one to ten that represented their satisfaction with the current retrieval system.
figure 6. screenshot of retrieval results in the current system.

in the comparison experiment, we placed our mock-up in front of subjects and conducted the same test as above. in the mock-up, we used the basic frame of the retrieval system but reframed the refinement area. in the new refinement area (figure 7), we added an optional box with refinement keywords in the left column to narrow the search scope. the logic of the refined keywords comes from the indexing category, as we mentioned in the section on indexing books. "michael pollan" was indexed in six categories: "biographies," "children's books," "cooking and food," "engineering manufactures," "hobby and leisure," and "gardening." thus, when subjects clicked the "cooking and food" category, they could refine the results to only twelve books rather than the seventy books in the current system. users can obtain accurate retrieval results faster. after the subjects completed their tasks, they gave us a score from one to ten representing their satisfaction with the modified retrieval system.

figure 7. refinement results in the modified category-system mock-up.

study 2: knowledge recommendation

in this experiment, we conducted two tests for two functions of knowledge visualization. one tested the preferences for the net view, and the other tested the preferences for the individual recommendation. for the net view, we guided subjects to search for "william davis" in the mock-up and reminded them to click the net view button after the system recalled a list view. then, the subjects could see the net view results in figure 5. we recorded the scores that they gave for the net view. as for the recommendation on individual books, we adopted multiple layers of associated retrieval results for every book. users could click on one book and another related book would show in a new tab window. we asked subjects to conduct a new search for "william davis." then they could browse the website and freely click on any book. once they clicked on davis's book wheat belly: lose the wheat, lose the weight, and find your path back to health, the first recommendation results popped up (figure 8). the recommendation results about wheat in the field of "grain and bread" showed up, including good to the grain: baking with whole grain flours and bread bakers apprentice: mastering the art of extraordinary bread. others about health and losing weight showed up also, such as paleo lunches and breakfasts on the go. all related books appeared because the first book is about both wheat and a healthy diet. a new window showing relevant authors and titles would pop up if the mouse glided over any picture. we asked the subjects about their thoughts on the new recommendation format and recorded the scores.

figure 8. an example of knowledge recommendation.

users' feedback

as a result, knowledge organization and retrieval received a positive response (tables 2 and 3). first, subjects complained about the inefficiency of the current retrieval system in that it took so long to find one book without using shortcut keys (ctrl-f). three quarters of them were not satisfied with the original search style due to the length of search time. however, 67 percent of the subjects gave a score of eight points or more for the refined search results of our new system.
only two of them thought that it was useless, since they were the two users who took fewer than ten seconds to target the exact result. second, 67 percent and 74 percent of the subjects, respectively, thought that the knowledge recommendation and net view were useful and gave them six points or more. however, five subjects gave scores of one point because they maintained that it was not necessary to build a new viewer system.

table 2. the time to find the exact result in the current system.
answers                  # of users
fewer than 10 seconds            2
10 to 30 seconds                 4
30 seconds to 1 minute          12
more than 1 minute              21

table 3. statistics of quantitative questions in the questionnaire (number of subjects per score).
question                                  10   9   8   7   6   5   4   3   2   1   total
satisfied with original results            0   0   0   0   1   9  14   9   4   2      39
preference of refined results              2  10  14   6   5   0   0   0   0   0      37
preference of results in net view          1   8  10   6   4   1   2   3   1   3      39
preference of knowledge recommendation     3   6   4   8   5   6   0   3   1   2      38

during the interviews, subjects who gave scores of more than eight points spoke positively about the vivid visualization of the retrieval results, using words such as "innovative" and "creative." for instance, user 11 said, "bravo changes for powell, that'd be the most innovative experience for the locals." among the subjects who gave scores of more than six points, the comments were mostly "interesting idea." for instance, user 17 commented, "this is an interesting idea to explore my knowledge. i had no idea powell could do such an improvement." some users offered suggestions to improve the system. for example, user 12 suggested that the test was not comprehensive enough to confidently assess whether the modified category system was better than the previous system. user 25 (a possible professional) was very concerned about recall efficiency, since the system might use many matching algorithms.

discussion and conclusion

in this paper, a digital literature resource organization model based on user cognition is proposed. this model aims to let users exert their own initiative. we noticed a significant difference between the previous category system and the new system based on user cognition. our aim, "of the people, for the people," was fulfilled. taking powell's city of books as an example, we described how to construct a knowledge network based on user cognition. the user-experience study showed that this network implements an optimized exhibition of digital-resource knowledge recommendation and knowledge retrieval. although user cognition includes many other processes of user behavior, we used only users' written expressions; this turned out to be a positive and practical way to reveal user cognition. we find that there is considerable room to extend digital resource knowledge recommendation based on user cognition to other objects. for one, in this paper we took only the familiar book city as a study object and books as experiment objects and found favorable positive effects, which indicates that the digital resource knowledge link can be applied to physical libraries and bookstores or other types of literature. even though libraries have well-developed taxonomy systems, they can be compared with or combined with new ideas. for another, users adore visual effects and user functions.
the results show promise in actualizing improvements to book city's website or even to other digital platforms. the concerns will be how to optimize the retrieval algorithm and reduce the time costs in the next study.

acknowledgements

we thank carolyn mckay and powell's city of books for their great help with the questionnaire networking and all participants for their feedback. this work was supported by the national social science foundation of china [grant number 17ctq025].

references and notes

1 peter carruthers, stephen stich, and michael siegal, the cognitive basis of science (cambridge: cambridge university press, 2002).
2 sophie monchaux et al., "query strategies during information searching: effects of prior domain knowledge and complexity of the information problems to be solved," information processing and management 51, no. 5 (2015): 557–69, https://doi.org/10.1016/j.ipm.2015.05.004.
3 hoill jung and kyungyong chung, "knowledge-based dietary nutrition recommendation for obese management," information technology and management 17, no. 1 (2016): 29–42, https://doi.org/10.1007/s10799-015-0218-4.
4 dandan ma, liren gan, and yonghua cen, "research on influence of individual cognitive preferences upon their acceptance for knowledge classification recommendation service," journal of the china society for scientific and technical information 33, no. 7 (2014): 712–29.
5 haiqun ma and zhihe yang, "study on the cognitive model of information searchers from the perspective of neuro-language programming," journal of library science in china 37, no. 3 (2011): 38–47.
6 paul gooding, "exploring the information behaviour of users of welsh newspapers online through web log analysis," journal of documentation 72, no. 2 (2016): 232–46, https://doi.org/10.1108/jd-10-2014-0149.
7 munmun de choudhury and scott counts, "identifying relevant social media content: leveraging information diversity and user cognition," in ht '11: proceedings of the 22nd acm conference on hypertext and hypermedia (new york: acm, 2011), 161–70, https://doi.org/10.1145/1995966.1995990; carol tenopir et al., "academic users' interactions with sciencedirect in search tasks: affective and cognitive behaviors," information processing and management 44, no. 1 (2008): 105–21, https://doi.org/10.1016/j.ipm.2006.10.007.
8 young han bae, jong woo jun, and michelle hough, "uses and gratifications of digital signage and relationships with user interface," journal of international consumer marketing 28, no. 5 (2016): 323–31, https://doi.org/10.1080/08961530.2016.1189372.
9 claude sicotte et al., "analysing user satisfaction with the system in use prior to the implementation of a new electronic inpatient record," in proceedings of the 12th world congress on health (medical) informatics; building sustainable health systems (amsterdam: ios press, 2007), 1779–1784; zhenzheng qian et al., "satiindicator: leveraging user reviews to evaluate user satisfaction of sourceforge projects," in proceedings - international computer software and applications conference 1 (2016): 93–102, https://doi.org/10.1109/compsac.2016.183.
10 christina merten and cristina conati, "eye-tracking to model and adapt to user meta-cognition in intelligent learning environments," in proceedings of the 11th international conference on intelligent user interfaces (iui '06) (new york: acm, 2006), 39–46, https://doi.org/10.1145/1111449.1111465; weidong zhao, ran wu, and haitao liu, "paper recommendation based on the knowledge gap between a researcher's background knowledge and research target," information processing & management 52, no. 5 (2016): 976–88, https://doi.org/10.1016/j.ipm.2016.04.004.
11 haoran xie et al., "incorporating sentiment into tag-based user profiles and resource profiles for personalized search in folksonomy," information processing and management 52, no. 1 (2016): 61–72, https://doi.org/10.1016/j.ipm.2015.03.001; francisco villarroel ordenes et al., "analyzing customer experience feedback using text mining: a linguistics-based approach," journal of service research 17, no. 3 (2014): 278–95, https://doi.org/10.1177/1094670514524625; yujong hwang and jaeseok jeong, "electronic commerce and online consumer behavior research: a literature review," information development 32, no. 3 (2016): 377–88, https://doi.org/10.1177/0266666914551071.
12 stephan ludwig et al., "more than words: the influence of affective content and linguistic style matches in online reviews on conversion rates," journal of marketing 77, no. 1 (2012): 1–52, https://doi.org/10.1509/jm.11.0560.
13 jun yang and yinglong wang, "a new framework based on cognitive psychology for knowledge discovery," journal of software 8, no. 1 (2013): 47–54.
14 alan baddeley, "on applying cognitive psychology," british journal of psychology 104, no. 4 (2013): 443–56, https://doi.org/10.1111/bjop.12049.
15 aidan moran, "cognitive psychology in sport: progress and prospects," psychology of sport and exercise 10, no. 4 (2009): 420–26, https://doi.org/10.1016/j.psychsport.2009.02.010.
16 john van de pas, "a framework for public information services in the twenty-first century," new library world 114, no. 1/2 (2013): 67–79, https://doi.org/10.1108/03074801311291974.
17 enrique frias-martinez, sherry y. chen, and xiaohui liu, "evaluation of a personalized digital library based on cognitive styles: adaptivity vs. adaptability," international journal of information management 29, no. 1 (2009): 48–56, https://doi.org/10.1016/j.ijinfomgt.2008.01.012.
18 shing lee chung et al., "an integrated framework for managing knowledge-intensive service innovation," international journal of services technology and management 13, no. 1/2 (2010): 20, https://doi.org/10.1504/ijstm.2010.029669.
19 koteshwar chirumalla, "managing knowledge for product-service system innovation: the role of web 2.0 technologies," research-technology management 56, no.
2 (2013): 45–53, https://doi.org/10.5437/08956308x5602045; koteshwar chirumalla et al., “knowledgesharing network for product-service system development: is it a typical?,” in international conference on industrial product-service systems (2013): 109–14; fumiya akasaka et al., “development of a knowledge-based design support system for product-service systems,” computers in industry 63, no. 4 (2012): 309–18, https://doi.org/10.1016/j.compind.2012.02.009. 20 c. f. cheung et al., “a multi-perspective knowledge-based system for customer service management,” expert systems with applications 24, no. 4 (2003): 457–70, https://doi.org/10.1016/s0957-4174(02)00193-8. 21 padmal vitharana, hemant jain, and fatemeh zahedi, “a knowledge based component/service repository to enhance analysts’ domain knowledge for requirements analysis,” information and management 49, no. 1 (2012): 24–35, https://doi.org/10.1016/j.im.2011.12.004. 22 baihai zhou, “the construction of library interdisciplinary knowledge sharing service system,” in 2014 11th international conference on service systems and service management (icsssm), june 25–27, 2014, https://doi.org/10.1109/icsssm.2014.6874033. 23 rusli abdullah, zeti darleena eri, and amir mohamed talib, “a model of knowledge management system for facilitating knowledge as a service (kaas) in cloud computing environment,” 2011 international conference on research and innovation in information systems, november 23–24, 2011, 1–4, https://doi.org/10.1109/icriis.2011.6125691. 24 alan smeaton and jamie callan, “personalisation and recommender systems in digital libraries,” international journal on digital libraries 5, no. 4 (2005): 299–308, https://doi.org/10.1007/s00799-004-0100-1. 25 yanwen wu et al., “research on personalized knowledge service system in community elearning,” lecture notes in computer science (berlin: springer, 2006), https://doi.org/10.1007/11736639_17; shu-chen kao and chienhsing wu, “pikipdl. a personalized information and knowledge integration platform for dl service,” library hi tech 30, no. 3 (2012): 490–512, https://doi.org/10.1108/07378831211266627. 26 national geographic, destinations of a lifetime: 225 of the world’s most amazing places (washington d.c.: national geographic society, 2016). 27 wen lou and junping qiu, “semantic information retrieval research based on co-occurrence analysis,” online information review 38, no. 1 (january 8, 2014): 4–23, https://doi.org/10.1016/j.ijinfomgt.2008.01.012 https://doi.org/10.1504/ijstm.2010.029669 https://doi.org/10.5437/08956308x5602045 https://doi.org/10.1016/j.compind.2012.02.009 https://doi.org/10.1016/s0957-4174(02)00193-8 https://doi.org/10.1016/j.im.2011.12.004 https://doi.org/10.1109/icsssm.2014.6874033 https://doi.org/10.1109/icriis.2011.6125691 https://doi.org/10.1007/s00799-004-0100-1 https://doi.org/10.1007/11736639_17 https://doi.org/10.1108/07378831211266627 of the people, for the people | lou, wang, and he 83 https://doi.org/10.6017/ital.v37i3.10060 https://doi.org/10.1108/oir-11-2012-0203; junping qiu and wen lou, “constructing an information science resource ontology based on the chinese social science citation index,” aslib journal of information management 66, no. 2 (march 10, 2014): 202–18, https://doi.org/10.1108/ajim-10-2013-0114; fan yu, junping qiu, and wen lou, “library resources semantization based on resource ontology,” electronic library 32, no. 3 (2014): 322–40, https://doi.org/10.1108/el-05-2012-0056. 
28 lei zhang et al., “extracting and ranking product features in opinion documents,” in international conference on computational linguistics (2010): 1462–70. 29 lou and qiu, “semantic information retrieval research,” 4; qiu and lou, “constructing an information science resource ontology,” 202; yu, qiu, and lou, “library resources semantization,” 322. 30 qiu and lou, “constructing an information science resource ontology,” 202.

catalog records retrieved by personal author using derived search keys. alan l. landgraf and frederick g. kilgour: the ohio college library center.

this investigation shows that search keys derived from personal author names possess a sufficient degree of distinctness to be employed in an efficient computerized interactive index to a file of marc ii catalog records having 167,745 personal author entries.

previous papers in this series and experience at the ohio college library center have established that truncated derived search keys are efficient for retrieval of entries by name-title and title from large on-line computerized files of catalog records.1–4 experiments reported in the earlier papers were “… based on the assumption that each key had a probable use equal to all other keys.”5 however, guthrie and slifko have shown that random selection of entries, rather than keys, yields results closer to actual experience but with a higher number of entries per reply.6 for example, they found on retrieving from a file of 857,725 records using a 4,5 (four characters of main entry, five characters of title) key that when the basis of the search was random keys there was one entry per reply 81.3 percent of the time, but when the basis was random records, there was one entry per reply 55.7 percent of the time. this paper presents the results of experimentation with search keys to be used in constructing an author index to a large file of on-line catalog records. an interactive environment is assumed, with the interrogator employing a remote terminal. a companion paper describes the findings of an investigation into retrieval efficiency of search keys derived from corporate author names.7

materials and methods. the investigation employed a marc ii file containing approximately 200,000 monographic records from which a computer program extracted 167,745 personal-name keys. the program extracted these keys from main entry, series statement, added entry, and series added entry fields. the basic key structure consisted of sixteen characters: the first eight from the surname, the first seven from the forename, and the first character from the middle name (8,7,1). if the surname and forename contained fewer characters than the key segment to be derived, the segment was left-justified and padded out with blanks. if there was no middle name or middle initial, a blank was used.
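to make these derivation rules concrete, the sketch below builds fixed-length keys in the style just described (an 8,7,1 key by default, or a shorter structure such as 4,2,1) and computes the degree of distinctness used later in the paper; the sample names, function names, and toy file are illustrative and are not drawn from the experiment.

```python
# a minimal sketch of deriving truncated search keys (8,7,1 by default, or a
# shorter structure such as 4,2,1) following the rules described above: each
# segment is truncated, left-justified, and padded with blanks, and a missing
# middle name or initial contributes a single blank. the sample names and the
# degree-of-distinctness helper are illustrative, not data from the experiment.

def derive_key(surname, forename, middle, lengths=(8, 7, 1)):
    """Build one fixed-length key from name parts, e.g. lengths=(4, 2, 1)."""
    parts = (surname, forename, middle)
    return "".join(part[:n].ljust(n) for part, n in zip(parts, lengths))

def distinctness(keys):
    """Degree of distinctness: distinct keys divided by total entries, x 100."""
    return 100.0 * len(set(keys)) / len(keys)

names = [
    ("kilgour", "frederick", "g"),
    ("kilgour", "frances", ""),
    ("landgraf", "alan", "l"),
]

keys_421 = [derive_key(s, f, m, lengths=(4, 2, 1)) for s, f, m in names]
print(keys_421)                 # ['kilgfrg', 'kilgfr ', 'landall']
print(distinctness(keys_421))   # 100.0 for this toy file
```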
another program derived shorter keys from the 8,7,1 structure ranging from 3,0 to 5,2,1. next, a sort program arranged the shorter keys in alphabetical order. a statistics collection program then processed the alphabetical file. this program counted the number of distinct keys, built a frequency distribution of names per distinct key, and built cumulative frequency distributions of names per distinct key in percentile groups.

[fig. 1. number of names retrieved 90, 99, and 99.5 percent of the time for different key structures; the grid of counts, arranged by number of characters extracted from the surname (3–6) and from the forename, is garbled in the source.]

results. figure 1 presents the findings at three levels of likelihood for retrieving n or fewer names when a variety of search key combinations were employed ranging from three to six characters from the surname, zero to three characters from the first name, and with or without the middle initial. table 1 is an extraction from figure 1 and contains the number of names retrieved at a level of 90 percent likelihood for the various search keys employed.

table 1. number of names retrieved with 90 percent likelihood (key structure: names retrieved)
3 characters: 3,0: >200
4 characters: 4,0: >200; 3,1: >200
5 characters: 5,0: >200; 3,2: 26; 4,1: 25; 3,1,1: 16
6 characters: 6,0: 171; 5,1: 18; 3,3: 17; 4,2: 12; 3,2,1: 8; 4,1,1: 8
7 characters: 6,1: 16; 5,2: 9; 5,1,1: 6; 3,3,1: 5; 4,2,1: 5

figure 2 has the same structure as figure 1 but contains the degree of distinctness as percentages, (no. of distinct keys / no. of entries) × 100 percent. table 2 records distinctness arranged by number of characters per key. figure 3 is a graphical representation of the degrees of distinctness of the various keys. in this figure, different types of lines connect points representing key structures that contain an equal number of characters. the bottom line in table 1 may be read as saying that 90 percent of the time a 4,2,1 key will retrieve five or fewer names from a file of 167,745 personal name keys. the bottom line of table 2 states that from the same file the 4,2,1 key yields a single name 64.1 percent of the time.

discussion. this experiment has shown the degree of distinctness (that is to say, the number of distinct keys divided by the total number of entries from which all keys were derived) to be a useful tool in determining what key structures may be efficiently used. as seen by comparing figure 1 with figure 2 and table 1 with table 2, there is a high degree of correlation between distinctness and the likelihood of retrieving a certain number of names 90, 99, and 99.5 percent of the time.

14. gadd, “the use and misuse of early english books online,” 683. 15. “about eebo.” 16. details on the estc are provided by the british library at http://www.bl.uk/reshelp/findhelprestype/catblhold/estccontent/estccontent.html, viewed march 12, 2017. 17. gadd, “the use and misuse of early english books online,” 685–86. 18. gadd, “the use and misuse of early english books online,” 686. 19.
eebo, “frequently asked questions,” accessed february 18, 2017. http://eebo.chadwyck.com/help/faqs.htm 20. association of research libraries, microform sets in u.s. and canadian libraries, (washington, d.c.: association of research libraries, 1984), j-3. 21. martin d. joachim, “cooperative cataloging of microform sets,” in cooperative cataloging: past, present, and future (new york: the haworth press, 1993), 111. information technology and libraries | september 2017 46 22. gadd, “the use and misuse of early english books online,” 686. 23. british library, “catalogs of british library holdings: english short title catalogue content,” accessed february 18, 2017. http://www.bl.uk/reshelp/findhelprestype/catblhold/estccontent/estccontent.html 24. the british libraries estc codes for filmed copy locations are difficult to translate. see meaghan j. brown’s finding aid, “stc location code transcription” wherein she offers details on stc and estc location codes and the problem her finding aid addresses. brown explains, “… it is currently possible to search the estc for items using marc codes, but not the location codes familiar from the stc,” accessed february 18, 2017. http://www.meaghanbrown.com/stc-location-codes/ 25. text creation partnership, accessed january 25, 2017. http://www.textcreationpartnership.org/home/ 26. text creation partnership, accessed january 25, 2017. http://www.textcreationpartnership.org/catalog-records/ 27. oclc’s form is available at https://www.oclc.org/content/dam/support/knowledgebase/ocn_report.xlsx, accessed october 18, 2016. 28. see appendix 1 for the procedures 29. with streamlined kbart search features introduced by a metadata services department colleague, it’s expected this time may be reduced moving forward. 30. a june 9, 2015 email from an oclc staff member to the kb-l@oclc.org listserv reported on oclc’s efforts to match ocn in its kbart files to english language of cataloging records, when available. 31. um libraries’ staff use this metadata in the equivalent oclc microfilm and e-version and eebo resource records as match points. staff do not verify that the images linked to the eebo version records correspond to those in the aforementioned bibliographic records. it is hoped that proquest will investigate the case described in this paper in which the eebo resource differs from its corresponding record. 32. “510 citation/reference note,” oclc, bibliographic formats and standards. 4th edition, last revised august 22, 2016. https://www.oclc.org/bibformats/en/5xx/510.html 33. as of january 29, 2017, the marc 510 field has not been indexed by oclc. see http://www.oclc.org/support/help/searchingworldcatindexes/#05_fieldsandsubfields/5xx _fields.htm 34. e.g., oclc indexes “internet resources” using a combination of marc data elements. these are laid out in “searching worldcat indexes” at http://www.oclc.org/support/help/searchingworldcatindexes/#06_format_document_typ e_codes/format_document_type_codes.htm. marc 21 bibliographic at a case study on the path to resource discovery | guay | doi:10.6017/ital.v36i3.9966 47 https://www.loc.gov/marc/bibliographic/bdleader.html provides the leader position 06 code for “language material.” marc code list for languages (http://www.loc.gov/marc/languages/) contains the language codes contained in the language of cataloging field/subfield (marc 040 field, subfield “b”). 35. “024 other standard identifier,” in oclc, bibliographic formats and standards, 4th edition, accessed january 25, 2017. 
https://www.oclc.org/bibformats/en/0xx/024.html 36. ibid. 37. oclc. searching worldcat indexes, accessed february 18, 2017. http://www.oclc.org/support/help/searchingworldcatindexes/#05_fieldsandsubfields/0xx _fields.htm%3ftocpath%3dfields%2520and%2520subfields%7c_____2 38. see oclc bibliographic formats and standards, fourth edition. 024 other standard identifier https://www.oclc.org/bibformats/en/0xx/024.html, viewed january 25, 2017 39. an oct. 18, 2016 review of oclc’s all-collections-list, available at https://www.oclc.org/content/dam/support/knowledge-base/all-collections-list.xlsx indicates that 38.5% percent of the 129,498 resources on the eebo kbart file have oclc number coverage. 40. http://experimental.worldcat.org/marcusage/510.html 41. http://experimental.worldcat.org/marcusage/024.html 42. kbart phase ii working group, knowledge bases and related tools (kbart): recommended practice: niso rp-9-2014 (baltimore, md: niso 2014), 18. http://www.niso.org/workrooms/kbart 43. https://www.oclc.org/worldcat/data-strategy.en.html, viewed jan. 26, 2017 44. the image of the linked data view of figure 14 was captured on february 18, 2017. 45. carl stahmer, “making marc agnostic: transforming the english short title catalogue for the linked data universe,” in linked data for cultural heritage, (chicago: ala editions), p. 23-25. 46. the assertion that the estc transformation of marc 510 field metadata is solely based on carl stahmer, “the estc as a 21st century research tool,” presentation given at the 2014 conference of the text encoding initiative, viewed february 19, 2017. https://figshare.com/articles/estc21_at_tei_2014/1558057 47. roger p. bristol, supplement to charles evans' american bibliography (charlottesville: university press of virginia, 1970). 48. dianne hillmann, gordon dunsire, and jon phipps, “maps and gaps: strategies for vocabulary design and development,” in dcmi international conference on dublin core and metadata applications, 2013: 88, accessed february 18, 2017. http://dcpapers.dublincore.org/pubs/article/view/3673/1896 information technology and libraries | september 2017 48 49. see reference 14 above. 50. a discussion and invitation to collaborate on this work took place in late 2016 on the oclc worldcat kb listserv (see http://listserv.oclc.org/scripts/wa.exe?subed1=kb-l&a=1). to date, the preus library, luther college, will be working with the libraries on this project. trope or trap? roleplaying narratives and length in instructional video amanda s. clossen information technology and libraries | march 2018 27 amanda s. clossen (asc17@psu.edu) is learning design librarian, pennsylvania state university. abstract a concern that librarians face when creating video is whether users will actually watch the video they are directed to. this is a significant issue when it comes to how-to and other point-of-need videos. how should a video be designed to ensure maximum student interest and engagement? many of the basic skills demonstrated in how-to videos are crucial for success in research but are not always directly connected to a class. whether a video is selected for inclusion by an instructor or viewed after it is noticed by a student depends on how viewable the video is perceived to be. this article will discuss the results of a survey of more than thirteen hundred respondents. 
this survey was designed to establish the broad preferences of the viewers of instructional how-to videos, specifically focusing on the question of whether the length and presence of a role-playing narrative enhances or detracts from the viewer experience, depending on demographic. literature review length since the seminal 2010 study by bowles-terry, hensley, and hinchliffe established emerging best practices for pace, length, content, look and feel, and video versus text, a variety of works compiling best practices for video have been created.1 the very successful library minute videos from arizona state university resulted in a collection of how-tos and best practices by rachel perry.2 these included tips on addressing an audience, planning, content, length, frugality, and experimentation. in 2014 coastal carolina nursing students were surveyed for their preferences in video, resulting in another set of best practices. these focused on video length, speaking pace, zoom functionality, and use of callouts.3 martin and martin’s extensive 2015 review covers content, compatibility, accessibility, and audio.4 the recommended length listed in these best practices varies widely. thirty-seconds to a minute is recommended by bowles-terry, hensley, and hinchliffe, while perry recommends no longer than ninety seconds.5 the coastal carolina study and seminole state review recommend no longer than three minutes.6 nearly all the articles reviewed stress that complicated concepts should be broken into more easily comprehensible chunks to avoid overwhelming student cognitive load. mailto:asc17@psu.edu trope or trap? role-playing narratives and length in instructional video | clossen 28 https://doi.org/10.6017/ital.v37i1.10046 narrative roleplay scenario the typical roleplay involves a hypothetical student who needs some sort of assistance and is helped through the process using library resources. often there is also a hypothetical guide, who can be a librarian, friend, or professor. these hypothetical situations are recorded in a variety of ways: from live-action video recordings, to screencast voice-overs, to text. the efficacy of such tools in library video have been explored little, if at all. devine, quinn, and aguilar’s 2014 study explores the usage and effectiveness of microand macro-narratives in resident information literacy instruction,7 but there is no question that this instructional scenario is very different than how-to instructional videos. the interplay between student interest and such narratives is addressed by emotional interest theory, which states that adding unrelated but interesting material increases attention by energizing the learner. these unrelated pieces of engaging material are known as seductive details. this “highly interesting and entertaining information . . . 
is only tangentially related to the topic but is irrelevant to the author’s intended theme.”8 exploration of this concept through experimental study has indicated that seductive details are detrimental to learning.9 some evidence indicates that learners are more likely to remember these details than the important content itself thanks to cognitive load issues.10 however, there have also been cases where seductive details have improved recall.11 in their 2015 study, park, flowerday, and brünken argue that the format and presentation of seductive details have varying effect on learning processes and that they can be used to positive effect.12 in this paper, the seductive details to be studied are those of the roleplay narrative used to frame instruction in how-to videos. methods survey design the survey was designed to explore three questions: • does the length of the video affect a user’s willingness to watch it? • do users prefer videos that are pure instruction or those that use a roleplay narrative to deliver content? • does the demographic of the viewer affect a video’s viewability? the survey was revised in collaboration with a survey design and statistical specialist at the penn state library’s data learning center. the completed survey was then entered into qualtrics for implementation. implementation implementation and subject-gathering was done through a survey-research sampling company that provided both a wide demographic and rapid data collection. this was sponsored by an institutional grant. subjects from a variety of institution types and geographic locations were solicited via email invitation to complete a survey that explored their perspectives on instructional videos. information technology and libraries | march 2018 29 the twenty-question survey was focused on respondents of a traditional college age. implementation resulted in 1,305 responses out of 1,528 surveys. after implementation, results were compiled and analyzed by a statistical expert at the institutional data center. nearly all the analyses to follow are simple cross-tabulations of respondent choices as correlations between demographics and preference were minor based on a multivariate analysis of variance (manova) test. results and discussion demographics the survey, which was limited to a traditionally college-aged population (eighteen to twentyfour), produced a nearly 1:1 gender distribution (figure 1). figure 1. age and gender distribution. the survey had around 64 percent student participants, 77 percent of these attending school full time. of those full-time students, 60 percent were resident students, and only 9 percent were solely online students. unemployed participants were more likely to be full-time resident students whereas online students were more likely to be employed full-time. (see figures 2 and 3.) trope or trap? role-playing narratives and length in instructional video | clossen 30 https://doi.org/10.6017/ital.v37i1.10046 figure 2. employment and student status distribution. figure 3. resident versus online status distribution. information technology and libraries | march 2018 31 information and video confidence the distribution of confidence in information-seeking ability hovered around 90 percent. however, at most, only half of respondents had any familiarity with google scholar (see figure 4). 
this tells us several things, the most important being that what librarians consider appropriate confidence in information-seeking is very different from what the college-aged layperson considers appropriate. this supports colón-aguirre and fleming-may’s 2012 study that indicates that students are likely to use free online websites that require the least effort for their research.13 figure 4. information-seeking confidence. video length length of a video does play a role for most. about 70 percent of participants indicated that they are either more likely to watch a video with a timestamp or will rarely watch unless the time is indicated (see figure 5). timestamp is easily provided by most video players. the mean maximum time for college-age participants’ willingness to watch was about four and a half minutes. the median was approximately three minutes. in general, shorter appears better: three to four minutes is around the maximum length that most eighteen to twenty-nine year olds are willing to watch. this contradicts all the referenced best practices but those proffered by baker, who described thirty to ninety seconds as ideal video viewing time. her study found that 41 percent of her students preferred videos that were one to three minutes long, but 24 percent preferred three to five minutes. because of this, she recommends videos that are three minutes or less.14 trope or trap? role-playing narratives and length in instructional video | clossen 32 https://doi.org/10.6017/ital.v37i1.10046 figure 5. perspective on viewing time. instructions versus roleplay the bulk of the survey was questions related to two videos. both videos were under three minutes long and were produced using techsmith’s camtasia screencast software. the screencast video simply explained how to complete a research task—searching google scholar for an article addressing a theme in shakespeare’s romeo and juliet. viewers were guided through the process of finding articles on this topic by a single narrator. no dramatized roleplay situation was presented. the narrative video guided the participants through a hypothetical situation dramatized by two actors. the scenario was a common one—a student procrastinating on a paper and asking her roommate for assistance at the last minute. the roommate guided the student through use of google scholar, completing the same tasks as the screencast video. participants watched both videos and answered a series of questions on their reactions. number of views was tracked on the media player, verifying that both videos were viewed. screencasts while watching the screencast video, most participants found that the narrator was trustworthy and that they were learning. only 15 percent felt the video needed an example scenario. though there were mixed experiences as to the length of the video, the timing of the video seemed on information technology and libraries | march 2018 33 point, as only 11.6 percent strongly believed that the video took too long and 7.5 percent strongly felt that went too quickly. (see figure 6.) figure 6. screencast reactions. when asked an open-ended question about what struck them the most in the screencast video, respondents most frequently stated that they found it to be informative and interesting, or at least neutral. however, a variety of responses were observed, both negative and positive, or even contradictory. it is worth noting that within this open-ended format, dislike of the narrator’s voice was independently assigned as one of the top three issues. 
this stresses the importance of coherent and pleasant narration, as it is something that viewers will likely notice. trope or trap? role-playing narratives and length in instructional video | clossen 34 https://doi.org/10.6017/ital.v37i1.10046 figure 7. open-ended questions: screencast. narrative while watching the narrative video, participants found that they could relate to the characters or scenario and found that they were learning as much as they were when watching the screencast (see figure 8). however, there were mixed responses regarding video length and credibility of the narrator. when compared across demographics, employed respondents and students were more likely to agree that they could relate to the scenario than unemployed and nonstudents. male respondents and employed were more likely to think that the video went too fast than female and unemployed respondents. when asked an open-ended question on what most struck them about the narrative video, respondents most often stated that they found it to be boring and long, though a good number also indicated it was interesting and informative (see figure 9). just as with the screencast video, a variety of responses, both negative and positive, were observed, some even conflicting. information technology and libraries | march 2018 35 figure 8. narrative reactions. figure 9. open-ended questions: narrative. trope or trap? role-playing narratives and length in instructional video | clossen 36 https://doi.org/10.6017/ital.v37i1.10046 in addition, 13.5 percent of respondents were unsatisfied with the content of the video. just as with the screencast video, a variety of responses, both negative and positive, were observed, some even conflicting. screencast versus narrative the screencast video tended to be preferred by respondents, with higher average scores in content, engagement, learning value, and narrator trustworthiness. in contrast, respondents also thought that the screencast video moved too quickly compared to the narrative video. additionally, participants were more impatient during the narrative video (see figure 10). figure 10. screencast versus narrative. to observe differences between the screencast and narrative videos with regards to respondent reactions within specific population demographics, manova test was performed. this test revealed that none of the p-values were significant (at α = .05), leaving no correlation between student status, employment status, and reaction to each video. a more liberal interpretation of the data from this analysis might conclude that differences in impatience across student status were possibly significant (α = .10), with students being more likely to exhibit a smaller difference in *score defined as 1 = “not very much” to 5 = “very much”, with difference = screencast score – narrative score. red rows indicate higher scores for the narrative video. statistics for differences in screencast and narrative* (n=1305) information technology and libraries | march 2018 37 impatience for the two video styles. the preferences for screencast over narrative video did not change when the demographics were spliced. conclusions it is impossible to please everyone all the time—at least that is what survey results suggest. there are several takeaways to this study: video length matters, especially as a consideration before the video is viewed. timestamps should be included in video creation, or it is highly likely that the video will not be viewed. 
the video player is key here, as some video players include video length, while others do not. videos that exceed four minutes are unlikely to be viewed unless they are required. voice quality in narration matters. although preference in type of voice inevitably varies, the actor’s voice is noticed over production value. it is important that the narrator speaks evenly and clearly. for brief how-to videos, there is a small preference for screencast instructional videos over a narrative roleplay scenario. the results of the survey indicate that roleplay videos should be wellproduced, brief, and high quality. however, what constitutes high quality is not very well established.15 finally, screencast videos should include an example scenario, however brief, to ground the viewer in the task. suggestions for further study next steps for research might include a more refined survey focusing on the results of this study. of equal value would be a series of focus groups that are given both a screencast and narrative video and asked to discuss their preferences. though a wide variety of students were surveyed, limits of this dataset prevented the exploration of specific correlations among students attending different institution types or among those pursing different majors. further research addressing the differences among these student bodies would be a welcome addition to the literature. references 1 melissa bowles-terry, merinda kaye hensley, and lisa janicke hinchliffe, “best practices for online video tutorials in academic libraries: a study of student preferences and understanding,” communications in information literacy 4, no. 1 (january 1, 2010): 17–28. 2 anali maughan perry, “lights, camera, action! how to produce a library minute,” college & research libraries news 72, no. 5 (2011): 278–83. trope or trap? role-playing narratives and length in instructional video | clossen 38 https://doi.org/10.6017/ital.v37i1.10046 3 ariana baker, “students’ preferences regarding four characteristics of information literacy screencasts,” journal of library & information services in distance learning 8, no. 1–2 (january 2, 2014): 67–80, https://doi.org/10.1080/1533290x.2014.916247. 4 nichole a. martin and ross martin, “would you watch it? creating effective and engaging video tutorials,” journal of library & information services in distance learning 9, no. 1–2 (january 2, 2015): 40–56, https://doi.org/10.1080/1533290x.2014.946345. 5 bowles-terry, hensley, and hinchliffe, “best practices,” 23; perry, “lights, camera, action!,” 282. 6 baker, “students’ preferences,” 76; martin and martin, “would you watch it?,” 48. 7 jaclyn r. devine, todd quinn, and paulita aguilar, “teaching and transforming through stories: an exploration of macroand micro-narratives as teaching tools,” reference librarian 55, no. 4 (october 2, 2014): 273–88, https://doi.org/10.1080/02763877.2014.939537. 8 shannon f. harp and richard e. mayer, “the role of interest in learning from scientific text and illustrations: on the distinction between emotional interest and cognitive interest,” journal of educational psychology 89, no. 1 (1997): 92–102, https://doi.org/10.1037//00220663.89.1.92. 9 suzanne hidi and valerie anderson, “situational interest and its impact on reading and expository writing,” in the role of interest in learning and development, ed. by k. ann renniger (hillsdale, nj: l. erlbaum associates, 1992), 213–14. 10 babette park et al., “does cognitive load moderate the seductive details effect? 
a multimedia study,” in “current research topics in cognitive load theory,” special issue, computers in human behavior 27, no. 1 (january 1, 2011): 5–10, https://doi.org/10.1016/j.chb.2010.05.006. 11 annette towler et al., “the seductive details effect in technology-delivered instruction,” performance improvement quarterly 21, no. 2 (january 1, 2008): 65–86, https://doi.org/10.1002/piq.20023. 12 babette park, terri flowerday, and roland brünken, “cognitive and affective effects of seductive details in multimedia learning,” computers in human behavior 44 (march 1, 2015): 267–78, https://doi.org/10.1016/j.chb.2014.10.061. 13 mónica colón-aguirre and rachel a. fleming-may, “‘you just type in what you are looking for’: undergraduates’ use of library resources vs. wikipedia,” journal of academic librarianship 38, no. 6 (november 1, 2012): 391–99, https://doi.org/10.1016/j.acalib.2012.09.013. 14 baker, “students’ preferences,” 76. 15 towler et al., “the seductive details,” 71.

transitioning from xml to rdf: considerations for an effective move towards linked data and the semantic web. juliet l. hardesty. information technology and libraries | march 2016.

introduction. metadata, particularly within the academic library setting, is often expressed in extensible markup language (xml) and managed with xml tools, technologies, and workflows. software tools such as the oxygen xml editor and querying languages such as xpath and xquery have over time become capable of supporting that management. however, managing a library’s metadata currently takes on a greater level of complexity as libraries are increasingly adopting the resource description framework (rdf). semantic web initiatives are surfacing in the library context with experiments in publishing metadata as linked data sets, bibframe development using rdf, and software developments such as the fedora 4 digital repository using rdf. examples of transitions from xml into rdf make the challenges evident and show the need for communication and coordination among efforts to incorporate and implement rdf. this article outlines these challenges using different use cases from the literature and first-hand experience. the follow-up discussion considers ways to progress forward from metadata formatted in xml to metadata expressed in rdf. the options explored are targeted not only to metadata practitioners considering this transition but also to programmers, librarians, and managers.

literature review and concepts. as an initial example of the challenges faced when considering rdf, clarifying terminology is still a helpful activity. rdf focuses on sets of statements describing relationships and meaning. these statements consist of a subject, a predicate, and an object (i.e., an article, has an author, jane smith).
these statement parts are also referred to as a resource, a property, and a property value. since there are three parts to rdf statements, they are referred to as triples. the predicate or property of an rdf statement defines the relationship between the subject and the object. rdf ontologies are sets of properties for a particular domain. for example, darwin core has an rdf ontology to express biological properties,1 and ebucore has an rdf ontology to express properties about audiovisual materials.2 pulling apart the many issues involved in moving from xml to rdf is an exploration into the juliet l. hardesty (jlhardes@iu.edu) is metadata analyst at indiana university libraries, bloomington, indiana. mailto:jlhardes@iu.edu transitioning from xml to rdf | hardesty doi: 10.6017/ital.v35i1.9182 52 purpose of metadata, the tools available and their capabilities, and the various strategies that can be employed. poupeau rightly states that xml provides structural logic in its hierarchical identification of elements and attributes, where rdf provides data logic declaring resources that relate to each other using properties.3 these properties are ideally all identified with single reference points (uniform resource identifiers or uris) rather than a description encased in an encoding. a source of honest confusion, however, is that rdf can be expressed as xml. lassila’s note regarding the resource description framework specification from the world wide web consortium (w3c) states, “rdf encourages the view of ‘metadata being data’ by using xml (extensible markup language) as its encoding syntax.”4 so even though rdf can use xml to express resources that relate to each other via properties, identified with single reference points (uris), rdf is itself not an xml schema. rdf has an xml language (sometimes called, confusingly, rdf, and from here forward called rdf/xml). additionally, rdf schema (rdfs) declares a schema or vocabulary as an extension of rdf/xml to express application-specific classes and properties.5 simply speaking, rdf defines entities and their relationships using statements. there are various ways to make these statements, but the original way formulated by the w3c is using an xml language (rdf/xml) that can be extended by an additional xml schema (rdfs) to better define those relationships. ideally, all parts of that relationship (the subject, predicate, object, or the resource, property, property value) are uris pointing to an authority for that resource, that property, or that property value. an additional concept worth covering is serialization. this term is used as a way to describe how rdf data is expressed using various formatting languages. rdf/xml, n-triples, turtle, and jsonld are all examples of rdf serializations.6 describing something as being in rdf really means the framework of subject, predicate, object is being used. describing something as being expressed in rdf/xml or json-ld means that the rdf statements have been serialized into either of those formatting languages. using “rdf” to refer not only to the framework to describe something (rdf) but also the serialization of that description (rdf/xml) can easily muddle the discussion. other thoughts about the difference between xml and rdf or moving metadata from xml into rdf point to the difference in perspective and the change in thinking that is required to manage such a move. 
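as a concrete illustration of these terms, the sketch below (written with the python rdflib library) builds the article-has-author-jane-smith statement as a triple and then serializes the same graph in several of the formats named above; the example uris are invented placeholders rather than real identifiers.

```python
# one rdf statement built with rdflib, then serialized in several formats;
# the uris are invented placeholders, not real identifiers.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, FOAF

EX = Namespace("http://example.org/")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("foaf", FOAF)

article = EX["article/1"]          # subject (the resource)
author = EX["person/jane-smith"]   # object (the property value)

g.add((article, DCTERMS.creator, author))         # predicate (the property)
g.add((author, FOAF.name, Literal("Jane Smith")))

print(g.serialize(format="turtle"))    # compact turtle
print(g.serialize(format="xml"))       # the same graph as rdf/xml
print(g.serialize(format="json-ld"))   # json-ld (built into recent rdflib releases)
```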
in an online discussion about rdf in relation to tei (text encoding initiative), cummings talks about the need for both xml and rdf, using xml to encode text and rdf to extract that data and make it more useful.7 yee, in her in-depth look at bibliographic data as part of the semantic web, points out that rdf is designed to encode knowledge, not information.8 the rdf primer 1.0 also states “rdf directly represents only binary relationships.”9 xml describes what something is by encoding it with descriptive elements and attributes. rdf, on the other hand, constructs statements about something using direct references—a reference to the thing itself, a reference to the descriptor, and a reference to the descriptor’s value. as farnel discussed in her 2015 open repositories presentation about the university of alberta’s move to rdf, they learned they were moving from a records-based framework in xml to a things-based framework in rdf.10 what is pointed out here time and again is something else farnel discussed—moving from xml to information technology and libraries | march 2016 53 rdf is not simply a conversion between encoding formats; it is a translation between two different ways of organizing knowledge. it involves understanding the meaning of the metadata encoded in xml and representing that meaning with appropriate rdf statements. the tools most commonly employed for reworking xml into rdf are openrefine when accompanied by its rdf extension; a triplestore database such as openlink virtuoso,11 apache fuseki,12 or sesame13; oxygen xml editor14; and protégé,15 an ontology editor. openrefine is, according to the website, “a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.”16 the rdf extension, called rdf refine, allows for importing existing vocabularies and reconciling against sparql endpoints (web services that accept sparql queries and return results).17,18 sparql is similar to sql as a language for querying a database, but the syntax is specifically designed to allow for querying data formatted in triple statements instead of tables with columns.19 triplestore databases such as openlink virtuoso can store and index rdf statements for searching as a sparql endpoint, offering a way to retrieve information and visualize connections across a collection of triples. oxygen xml editor has proven helpful in formulating extensible stylesheet language (xsl) transformations to move metadata from a particular xml schema or format into rdf/xml or other serializations such as json-ld (javascript object notation for linking data).20 protégé is a tool developed by stanford university that supports the owl 2 web ontology language and has helped to convert xml schemas to rdf ontologies and establish ways to express xml metadata in rdf. these tools provide the technical means to take metadata expressed in xml and physically reformat it to metadata expressed in an rdf serialization. what that reformatting also encompasses, however, is a review of the information expressed in xml and a set of decisions as to how to express that information as rdf statements. strategic approaches and ideas for handling data transformations into rdf have involved the xml schema or document type definition (dtd). 
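before turning to those schema- and dtd-driven strategies, the xslt route mentioned above can be sketched briefly: the example below uses python’s lxml library to apply a small stylesheet that rewrites a simple descriptive xml record as rdf/xml. the record format, the stylesheet, and the chosen dublin core properties are invented for illustration and are far simpler than a real mods or vra core transformation.

```python
# a minimal sketch of an xsl transformation from descriptive xml to rdf/xml,
# applied with lxml; the record format and the stylesheet are illustrative only.
from lxml import etree

record = etree.XML(b"""<record id="http://example.org/item/1">
  <title>A Sample Pamphlet</title>
  <creator>Jane Smith</creator>
</record>""")

xslt = etree.XML(b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:template match="/record">
    <rdf:RDF>
      <rdf:Description rdf:about="{@id}">
        <dc:title><xsl:value-of select="title"/></dc:title>
        <dc:creator><xsl:value-of select="creator"/></dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(xslt)
print(etree.tostring(transform(record), pretty_print=True).decode())
```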
these include thuy, lee, and lee’s approach to map an xml schema (the xsd) to rdf, associating simpletype’s xsd in xml with properties in rdf, defining complextype’s xsd in xml as classes in rdf, and handling a hierarchy of xml schema elements with top levels as domains and lower-level elements and attributes as container classes or subproperties in those domains.21 thuy et al. earlier worked on a method to transform xml to rdf by translating the dtd using rdfs (elements in the dtd are rdf classes or subclasses, attlists are rdf properties, and entities—preset variables in the dtd—are called up for use in rdf as encountered).22 similarly, hacherouf, bahloul, and cruz translate an xml schema into an owl ontology.23 klein et al. point out that while ontologies serve to describe a domain, xml schemas are meant to provide constraints on documents or structure for data so it can be advantageous to work out an rdf expression this way.24 tim berners-lee puts it simply: “the same rdf tree results from many xml trees,” meaning the same single statement in rdf (an article has an author jane smith) can be expressed in many ways in xml and can vary on the basis of the source of the xml, any schemas involved, and the people creating the metadata.25 transitioning from xml to rdf using the xml schema might serve to ensure all xml elements are transitioning from xml to rdf | hardesty doi: 10.6017/ital.v35i1.9182 54 replicated in rdf but does not necessarily establish the relationships meant by that xml encoding without additional evaluation. there is no single strategy that will always work to move xml metadata into rdf, even within the same set of tools (such as fedora/hydra) or the same area of concern (libraries, archives, or museums). use cases for rdf the following use cases explain approaches to transition to rdf taken from two differing perspectives. the first set describes efforts to express xml schemas or standards as rdf ontologies. the second set describes efforts by various library or cultural-heritage digital collections to transform metadata records into rdf statements. they also show that strategies to transform xml to rdf cannot occur without a shift in view from structure to relationships and, likewise, from descriptive encoding to direct meaning. moving an xml schema/standard to an rdf ontology as a graduate student at kent state university, mixter took on converting the descriptive metadata standard vra core 4.0 from an xml schema to an rdf ontology.26 using the vra data standards committee guidelines to ensure all minimum fields were included,27 mixter mapped vra xml elements and attributes to schema.org, foaf, void, and dc terms ontologies. this process is known as “cherry-picking,” or combining various ontologies that already exist to represent properties or relationships (the predicates in rdf statements) as rdf instead of creating new proprietary rdf properties. using owl and rdfs as metavocabularies in protégé, this created an ontology that could “retain the granularity required to describe library, archive, or museum items” of vra core 4.0’s design in xml without being a straight conversion of vra core 4.0 from xml to rdf.28 the outcome was an xslt stylesheet that was tested on vra core 4.0 xml records to produce that same information as rdf statements. one point that seemed to help in testing was the fact that all controlled vocabulary terms had reference identifiers in the xml (ready-made uris). 
something not discussed in the outcomes was that dates resulted in complex rdf (rdf statements that encompass additional rdf statements or blank nodes) and there was no discussion about this complexity or its effect on using those particular rdf statements. vra core 4.0 now has an rdf ontology in draft form, with mixter as one of its authors.29 the owl ontology still points to schema.org, foaf, and void for equivalent classes and properties, but everything is now named within a vra rdf ontology and namespace and translates to such when vra core 4.0 xml is transformed to rdf. another case in the category of going from an xml standard to an rdf ontology is the development of the bibframe model for bibliographic description from the library of congress. the bibframe model is expressed as rdf. according to the bibframe site, “in addition to being a replacement for marc, bibframe serves as a general model for expressing and connecting bibliographic data.”30 marc has its own format of expression with numbered fields and subfields but can be expressed or serialized in xml and is often shared that way. the bibframe model, information technology and libraries | march 2016 55 while revamping the way a bibliographic record is described on the basis of work, instance, authority, and annotation, also provides tools to transform records from marc/xml to the rdf statements of bibframe.31 a single namespace serves the bibframe model and is explained as a long-term strategy to ensure namespace persistence over the next forty-plus years.32 the transformations produced from library of congress marc records and local marc records contain complex hierarchical rdf statements, particularly when ascribing authority sources to names, subjects, and types of identifiers. as it is still a work in progress there are no tools making use of bibframe records in rdf. an additional example is the work happening with pbcore, the public broadcasting metadata standard managed by the corporation for public broadcasting.33 public broadcasting stations and other institutions across the united states provide descriptive, technical, and structural metadata for audiovisual materials using this xml standard. in boston, wgbh’s use of pbcore coincides with its digital asset management system, hydradam, built on fedora 3 and the hydra technology stack (based on blacklight, solr, and the fedora digital repository).34 fedora 3 does not natively support rdf statements as properties on objects like fedora 4. building off an interest to move hydradam to fedora 4 and leverage rdf for metadata about audiovisual collections, wgbh began exploring transitioning the pbcore xml metadata standard into an rdf ontology. 
ebucore, the european broadcasting union’s metadata standard, is already expressed as an rdf ontology.35 a comparison between the xml standard of pbcore and the classes and properties expressed in ebucore revealed that most pbcore elements were covered by the ebucore ontology.36 efforts are ongoing to offer pbcore 3.0 as an rdf ontology that uses ebucore with the addition of a smaller set of properties along with a way to transform pbcore xml to pbcore 3.0 in rdf.37 the hydra community, in an effort to help the transition from fedora 3 with its xml binary files of descriptive metadata to fedora 4 using rdf statements as properties on objects, is working on a recommendation and transformation to move descriptive metadata in mods xml into rdf that is usable in fedora 4.38 the mods standard has a draft of an rdf ontology and a stylesheet transformation available,39 but the complex hierarchical rdf produced from this transformation is unmanageable with the current fedora 4 architecture. the hydra mods and rdf descriptive metadata subgroup is attempting to reflect the mods elements in simple rdf statements that can be incorporated as properties on a fedora 4 digital object.40 led by steven anderson at the boston public library, this group is moving through mods element by element, asking the question, “if you had to express this mods element from your metadata in rdf today, how would you do that?” participating institutions are reviewing their mods records and exploring the possible rdf predicates that could be used to represent the meaning of that information. some are even considering how to construct those rdf statements so that mods xml can be re-created as close to the original mods as possible (this is called “round tripping”). there are still questions as to whether every single mods element will be reflected in this transformation, how exactly fedora 4 will make use of these descriptive rdf statements, and if the original mods xml will need to be preserved as part of the digital object in fedora, but this group is recognizing that moving from transitioning from xml to rdf | hardesty doi: 10.6017/ital.v35i1.9182 56 fedora 3 to fedora 4 requires a major shift in thinking about descriptive metadata. this transformation tool is an effort to help make that transition possible. the avalon media system is an open source system for managing and providing access to large collections of digital audio and video.41 it is built on fedora 3 and the hydra technology stack and uses mods xml to store descriptive metadata. as development progresses and the available descriptive fields expand, maintaining the workflow to update xml records in fedora and reindexing objects in the hydra interface becomes increasingly complicated. each time an update is made to descriptive information about an audiovisual item through the avalon interface, the entire xml record for that object, stored as a binary text file, is rewritten in fedora 3 and reindexed in solr. in considering advantages to using fedora 4, it appears that descriptive metadata properties stored in rdf are easier to manage programmatically (updating content, adding new fields, more focused reindexing) because descriptive information would not be stored in a single binary file but as individual properties on the object. 
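to illustrate the element-by-element decision the subgroup is working through, the sketch below maps two elements from a small mods record to flat rdf statements that could sit directly on a single object; the predicate choices (dcterms:title, dcterms:creator) are one possible answer to the question posed above, not the subgroup’s recommendation, and the record and object uri are invented.

```python
# a sketch of mapping a couple of mods elements to flat rdf statements on one
# resource, in the spirit of "if you had to express this mods element in rdf
# today, how would you do that?"; the record, the object uri, and the predicate
# choices are illustrative only.
from lxml import etree
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

MODS = {"m": "http://www.loc.gov/mods/v3"}

mods_xml = b"""<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Field Recordings, 1952</title></titleInfo>
  <name type="personal"><namePart>Smith, Jane</namePart></name>
</mods>"""

record = etree.XML(mods_xml)
obj = URIRef("http://example.org/object/demo-1")  # hypothetical repository object

g = Graph()
g.bind("dcterms", DCTERMS)

for title in record.xpath("//m:titleInfo/m:title/text()", namespaces=MODS):
    g.add((obj, DCTERMS.title, Literal(title)))

for name in record.xpath("//m:name/m:namePart/text()", namespaces=MODS):
    # kept as a literal until the name is reconciled to an authority uri
    g.add((obj, DCTERMS.creator, Literal(name)))

print(g.serialize(format="turtle"))
```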
turning xml metadata into rdf or linked data for publishing, search and discovery, and management as southwick describes the process, the library at the university of nevada las vegas (unlv) took a collection with descriptive records from contentdm and published them as a single rdf linked open data set.42 after cleaning up controlled vocabulary terms across collections and solidifying locally controlled vocabularies, they exported tab-delimited csv records from contentdm. these records were brought into openrefine with its rdf extension where they reviewed the data and mapped to various properties within the europeana data model (edm). controlled vocabulary terms were in text form and had to be reconciled against a sparql endpoint, either locally from downloaded data or from the controlled vocabulary service, to gather the uris to use as the object or value in the rdf statement. openrefine was then used to create rdf files that were uploaded to a triplestore (first mulgara then openlink virtuoso). this provided public access to the linked open data set and a sparql endpoint for querying the data set. after publishing the data set they experimented with pivotviewer from openlink virtuoso and relfinder to see what kinds of connections and relationships could be visualized from the data as linked open data. the outlined steps are clear and the outcomes are described, but interestingly the data set itself no longer appears to be available online.43 although the unlv use case relies on csv instead of xml as the data source, the tools and workflows enlisted to transform the data set into rdf linked open data are still applicable. openrefine can import xml just as it imports csv, so this described case shows the tools that can be used and decisions to be made in processing that data into rdf statements. in oregon digital,44 xml from qualified dublin core, vra core, and mods at two different institutions (university of oregon and oregon state university) were mapped as linked open data and stored in a triplestore to be served up in a new web application using the hydra technology stack.45 an inventory of metadata fields across all collections was first mapped to existing linked information technology and libraries | march 2016 57 data terms, or properties (those with available uris), then properties that were needed in the new web application but did not have available corresponding uris were mapped to a newly devised local namespace for oregon digital. any properties that were not used were kept in the original static xml file for the record as part of the digital object in fedora. the focus here appears to be on mapping properties without as much detail provided on whether the objects were kept as text or mapped to uri values where possible. from the sample record provided the objects appear to be text and not uris. the real power of this project is finding common properties to describe objects from diverse collections and institutions. what also comes out in the example mappings is the use of many different namespaces or ontologies (dc terms, marc relators, but also mods and mads that produce complex rdf). 
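the reconciliation step described above, turning a text heading into a uri before it becomes the object of an rdf statement, can be sketched roughly as follows using the python sparqlwrapper package; the endpoint url, the skos query pattern, and the sample label are assumptions for illustration, and openrefine’s rdf extension performs this step interactively rather than in code.

```python
# a rough sketch of reconciling a text label against a sparql endpoint to get
# a uri for use as the object of an rdf statement; the endpoint url and query
# pattern are assumptions, and real vocabularies model labels differently.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/sparql"  # placeholder for a vocabulary's endpoint

def reconcile(label):
    """Return candidate concept uris whose skos:prefLabel matches the text."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery("""
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        SELECT ?concept WHERE {
          ?concept skos:prefLabel ?label .
          FILTER (lcase(str(?label)) = lcase("%s"))
        } LIMIT 5
    """ % label.replace('"', '\\"'))
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [row["concept"]["value"] for row in results["results"]["bindings"]]

# e.g., turn a free-text heading from the exported csv into candidate uris
print(reconcile("hotels"))
```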
the university of alberta also combined a variety of xml metadata from different sources into a new digital asset management system, the education and research archive, based on fedora 4 and the hydra technology stack.46 reporting on the experience at open repositories 2015, farnel described the process as working in phases.47 beginning with item types, languages, and licenses, then moving to place names and controlled subject terms, and finally person names and free-form subjects, they made multiple passes converting xml metadata into rdf statements and incorporating uris whenever possible. they are combining all of this into a single data dictionary,48 making use of several rdf ontologies to cover the various metadata properties being described about objects and collections.

the university of california at san diego (ucsd) has developed a local data model using a mix of external (mads, vra core, darwin core, premis) and local ontologies. they published a data dictionary and are working on a substantially different revision as part of the metadata workflow they use to bring digital objects into their digital asset management system from a variety of source metadata formats, including xml.49 this allows metadata to be created from disparate source formats and makes it possible to bring them together as rdf for delivery, management, and preservation.

discussion

if metadata is in xml form and the desire is to express it as rdf, this is not merely a transformation from one xml schema to another. it is changing the expression of that data and changing its use. having metadata in xml means information is encoded in a specific way that allows for interchange and sharing. having metadata in rdf means making statements that have direct meaning and can be used independently. there are two different perspectives involved when approaching rdf: those that manage metadata standards (the xml standard side) and those that have metadata encoded using those xml standards (the data management side). depending on the desired outcomes, the needs of these two perspectives can conflict.

when managing a metadata standard, the rdf transition tends to follow certain patterns:
• transform an xml standard into a new rdf ontology
o examples: dublin core (dc), darwin core (dwc), mods, vra core
• establish a move to rdf that incorporates another existing ontology
o examples: pbcore, hydra community

from the data management side, different patterns occur. these scenarios often start by reviewing the needed outcome, deciding how much metadata needs to be expressed in rdf, and determining what works best to get the metadata to that point. cases include the following commonalities:
• creating new search and discovery end-user applications
o examples: oregon digital, university of alberta
• publishing linked data sets
o examples: unlv, university of alberta
• managing metadata using software that supports rdf
o examples: university of alberta, ucsd, hydra community

conflicts occur when the needed outcome on the data management side is not supported by the rdf ontology transitions that have occurred for the xml standards being used. an example of this is how rdf is handled in fedora 4. when rdf is complex (the object of one statement is another entire rdf statement), fedora produces blank nodes as new objects within the repository.
while not technically problematic, descriptive metadata with complex rdf can result in a situation where a digital object ends up referencing a blank node that then points to, for example, a subject or a genre. that subject or genre has been created as its own object within the digital repository even though it is only meant to provide meaning for the digital object. mods rdf produces this complexity and thus is not workable with fedora 4. in contrast, other standards such as dc or dwc in rdf produce simple statements that fedora 4 can apply to a digital object without any additional processing.

complications in transitioning from xml to rdf also occur when the original xml does not include uris or authority-controlled sources. converting this metadata to rdf can mean locally minting uris or bringing data over as literals (strings of text) without using uris at all. ideally, the result is somewhere in the middle, with externally controlled vocabularies incorporated as much as possible and literals or locally minted uris used only where absolutely necessary. translating strings to authoritative sources is intensive work. if the xml standard cannot be expressed as a single rdf ontology, the work is further complicated by the need to map xml elements to different rdf ontologies using logic that is often decided locally. while it is possible to transition xml to rdf, the process is not uniform and the pathway involves a lot of labor.

one way to reduce this labor would be a more user-centered approach by xml standard bodies that considers the ways their standards will be used when translated into rdf ("users" in this context meaning the users of the standards, not the end users searching and discovering digital content). triplestores can manage queries for complex rdf, but digital repository systems are not there yet. those that support rdf for description of objects do so on the basis of simple property statements; a complex rdf ontology is going to be a challenge to support over time. another way forward is for the data management side to focus efforts on showing, in an end-user search and discovery format, what is currently possible when xml is transitioned into rdf. published linked data sets need interfaces for access and use, showing the value of what is currently available and any needs or gaps that remain. libraries and cultural-heritage organizations engaged in this work should also openly share the processes that work and those that do not, so others contemplating this transformation can consider how to forge ahead themselves, and they should provide feedback to xml standard bodies regarding the usefulness or complications of any rdf transitional help an xml standard might provide. technologies for incorporating rdf into web applications and truly connecting triples across the web also require further work. triplestores have so far been the main way to expose data sets but have not been incorporated into common library or cultural-heritage end-user search and discovery web applications. additionally, triplestore use does not seem to extend to management or long-term storage of complete data about digital objects.
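the contrast between simple statements and complex rdf is easier to see in a small example. the sketch below, again using rdflib and placeholder uris, is not drawn from any particular repository; it simply shows a flat subject statement next to a nested one of the kind complex mods rdf yields, where the subject heading becomes a blank node, an intermediate resource of its own, rather than a direct value on the object.

# an illustrative sketch, assuming rdflib is installed; uris are placeholders
from rdflib import Graph, URIRef, Literal, BNode, Namespace
from rdflib.namespace import DCTERMS, RDF

MADS = Namespace("http://www.loc.gov/mads/rdf/v1#")
obj = URIRef("http://example.org/objects/item-42")

# simple form: one triple that a repository can treat as a plain property
simple = Graph()
simple.add((obj, DCTERMS.subject, Literal("shipbuilding")))

# complex form: the subject becomes an anonymous node with its own type and label
complex_form = Graph()
complex_form.bind("mads", MADS)
node = BNode()
complex_form.add((obj, DCTERMS.subject, node))
complex_form.add((node, RDF.type, MADS.Topic))
complex_form.add((node, MADS.authoritativeLabel, Literal("shipbuilding")))

print(simple.serialize(format="turtle"))
print(complex_form.serialize(format="turtle"))

in a repository like fedora 4, the second form means the system has to hold that intermediate node somewhere, which is the behavior described above.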
there seems to be a decision to either reduce the data stored in a triplestore down to simple statements or to use the triplestore more like an isolated index or sparql endpoint and manage the complete metadata record separately (in a static text file or in a separate database). that aligns triples in rdf more with relational database storage than with catalog records. triple statements focus on relationships, not the complete unique details of the thing being described. triplestores can handle complex hierarchical rdf graphs and provide responses to queries against those complexities,50 but triplestores do not appear to be taking over as either the main search and discovery mechanism for online digital resources or the main mechanism for digital object management. software using rdf natively is also not currently widespread. a project such as the bibframe initiative that plans to incorporate rdf needs to make sure the complexity of its data model in rdf is manageable by any tools it produces and that it is possible for vendors and suppliers to encompass the data model in their software development.

conclusion

the reasons for deciding metadata should transition to rdf are just as important as determining the best process for implementing that transition. reasons for transitioning to rdf are conceptually based around making data more easily shareable and setting up data to have meaning and relationships, as opposed to local static description that requires programmatic interpretation. the use cases outlined in this article show the reality does not quite yet match the concept. transitioning an xml standard to rdf does not make that data more shareable or more easily understood unless there are end-user applications for using that data in rdf. publishing linked data involves going through transitional steps, but the endpoint seems to be more of a byproduct; the real goal is going through the process of producing linked data to learn how that works. self-contained projects that aim to express collections in rdf for the purpose of a new search and discovery interface are more successful in implementing rdf that has that new level of meaning and relationship. beyond the borders of these projects, however, the data is not being shared or used.

the use cases described above show some examples of what is happening now when transitioning from xml to rdf. approaches include xml standards converting to rdf expression as well as digital collections with metadata in xml that have an interest in producing that metadata as rdf. software that incorporates rdf is still developing and maturing. helping that process along by providing a pathway from xml to functionally usable rdf improves the chances of the semantic web becoming a real and useful thing. it is vital to understand that transitioning from xml to rdf requires a shift in perspective from replicating structures in xml to defining meaningful relationships in rdf. metadata work is never easy, and for metadata to move from encoded strings of text to statements with semantic relationships requires coordination and communication. how best to achieve this coordination and communication is a topic worth engaging as the move to use rdf, produce linked data, and approach the semantic web continues.

bibliography

berners-lee, tim. "linked data." linked data design issues, june 18, 2009. http://www.w3.org/designissues/linkeddata.html.
———. "why rdf model is different from the xml model." semantic web, september 1998. http://www.w3.org/designissues/rdf-xml.html.
estlund, karen, and tom johnson. "link it or don't use it: transitioning metadata to linked data in hydra," july 2013. http://ir.library.oregonstate.edu/xmlui/handle/1957/44856.
farnel, sharon. "metadata at a crossroads: shifting 'from strings to things' for hydra north." slideshow presented at open repositories, indianapolis, indiana, 2015. http://slideplayer.com/slide/5384520/.
hacherouf, mokhtaria, safia nait bahloul, and christophe cruz. "transforming xml documents to owl ontologies: a survey." journal of information science 41, no. 2 (april 1, 2015): 242–59. doi:10.1177/0165551514565972.
klein, michel, dieter fensel, frank van harmelen, and ian horrocks. "the relation between ontologies and xml schemas." in linköping electronic articles in computer and information science, 2001. doi:10.1.1.14.1037.
lassila, ora. "introduction to rdf metadata." w3c, november 13, 1997. http://www.w3.org/tr/note-rdf-simple-intro-971113.html.
manola, frank, and eric miller. "rdf primer 1.0, section 2.3 structured property values and blank nodes." w3c recommendation, february 10, 2004. http://www.w3.org/tr/2004/rec-rdf-primer-20040210/#structuredproperties.
mixter, jeff. "using a common model: mapping vra core 4.0 into an rdf ontology." journal of library metadata 14, no. 1 (january 2014): 1–23. doi:10.1080/19386389.2014.891890.
poupeau, gautier. "xml vs rdf: logique structurelle contre logique des données (xml vs rdf: structural logic versus data logic)." les petites cases, august 29, 2010. http://www.lespetitescases.net/xml-vs-rdf.
"rdf and tei xml," october 13, 2010. https://listserv.brown.edu/archives/cgi-bin/wa?a2=ind1010&l=tei-l&d=0&p=28928.
southwick, silvia b. "a guide for transforming digital collections metadata into linked data using open source technologies." journal of library metadata 15, no. 1 (march 2015): 1–35. doi:10.1080/19386389.2015.1007009.
thuy, pham thi thu, young-koo lee, and sungyoung lee. "a semantic approach for transforming xml data into rdf ontology." wireless personal communications 73, no. 4 (2013): 1387–1402. doi:10.1007/s11277-013-1256-z.
thuy, pham thi thu, young-koo lee, sungyoung lee, and byeong-soo jeong. "transforming valid xml documents into rdf via rdf schema." in next generation web services practices, international conference on, 0:35–40. los alamitos, ca: ieee computer society, 2007. doi:10.1109/nwesp.2007.23.
"xml rdf." w3schools. accessed september 30, 2015. http://www.w3schools.com/xml/xml_rdf.asp.
yee, martha m. "can bibliographic data be put directly onto the semantic web?" information technology and libraries 28, no. 2 (march 1, 2013): 55–80. doi:10.6017/ital.v28i2.3175.

notes

1. "darwin core," darwin core task group, biodiversity information standards, last modified may 5, 2015, http://rs.tdwg.org/dwc/.
2. "metadata specifications," european broadcasting union, https://tech.ebu.ch/metadataebucore.
3. gautier poupeau, "xml vs rdf: logique structurelle contre logique des données (xml vs rdf: structural logic versus data logic)," les petites cases (blog), august 29, 2010, http://www.lespetitescases.net/xml-vs-rdf.
4. ora lassila, "introduction to rdf metadata," w3c, november 13, 1997, http://www.w3.org/tr/note-rdf-simple-intro-971113.html.
5. "xml rdf," w3schools, accessed september 30, 2015, http://www.w3schools.com/xml/xml_rdf.asp.
6. see "serialization formats" in "resource description framework," wikipedia, march 18, 2016, https://en.wikipedia.org/wiki/resource_description_framework#serialization_formats.
7. "rdf and tei xml," email thread on tei-l@listserv.brown.edu, october 13–18, 2010, https://listserv.brown.edu/archives/cgi-bin/wa?a2=ind1010&l=tei-l&d=0&p=28928.
8. martha m. yee, "can bibliographic data be put directly onto the semantic web?" information technology and libraries 28, no. 2 (march 1, 2013): 57, doi:10.6017/ital.v28i2.3175.
9. frank manola and eric miller, "rdf primer 1.0, section 2.3 structured property values and blank nodes," w3c recommendation, february 10, 2004, http://www.w3.org/tr/2004/rec-rdf-primer-20040210/#structuredproperties.
10. sharon farnel, "metadata at a crossroads: shifting 'from strings to things' for hydra north" (slideshow presentation, open repositories, indianapolis, indiana, 2015), http://slideplayer.com/slide/5384520/.
11. http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/main/.
12. https://jena.apache.org/documentation/fuseki2/.
13. http://rdf4j.org.
14. http://www.oxygenxml.com.
15. http://protege.stanford.edu.
16. http://openrefine.org.
17. https://en.wikipedia.org/wiki/sparql.
18. http://refine.deri.ie.
19. https://jena.apache.org/tutorials/sparql.html.
20. http://json-ld.org.
21. pham thi thu thuy, young-koo lee, and sungyoung lee, "a semantic approach for transforming xml data into rdf ontology," wireless personal communications 73, no. 4 (2013): 1392–95, doi:10.1007/s11277-013-1256-z.
22. pham thi thu thuy et al., "transforming valid xml documents into rdf via rdf schema," in next generation web services practices, international conference on, vol. 0 (los alamitos, ca: ieee computer society, 2007), 37, doi:10.1109/nwesp.2007.23.
23. see mokhtaria hacherouf, safia nait bahloul, and christophe cruz, "transforming xml documents to owl ontologies: a survey," journal of information science 41, no. 2 (april 1, 2015): 242–59, doi:10.1177/0165551514565972.
24. michel klein et al., "the relation between ontologies and xml schemas," section 5 in linköping electronic articles in computer and information science 6 (2001), doi:10.1.1.108.7190.
25. tim berners-lee, "why rdf model is different from the xml model," semantic web road map, september 1998, http://www.w3.org/designissues/rdf-xml.html.
26. see jeff mixter, "using a common model: mapping vra core 4.0 into an rdf ontology," journal of library metadata 14, no. 1 (january 2014): 1–23, doi:10.1080/19386389.2014.891890.
27. the document currently labeled "how to convert version 3.0 to version 4.0" contains a recommendation for a minimum set of elements for "meaningful retrieval" in vra core: http://www.loc.gov/standards/vracore/convert_v3-v4.pdf.
28. mixter, "using a common model," 2.
29. "vra core rdf ontology available for review," visual resources association, october 7, 2015, http://vraweb.org/vra-core-rdf-ontology-available-for-review/.
30. "bibliographic framework initiative," library of congress, https://www.loc.gov/bibframe/.
31. see "marc to bibframe transformation tools" under "tools," bibframe, http://bibframe.org/tools/.
32. "why a single namespace for the bibframe vocabulary?" library of congress, bibframe frequently asked questions, https://www.loc.gov/bibframe/faqs/#q06.
33. "pbcore 2.1," public broadcasting metadata dictionary project, http://pbcore.org.
34. "wgbh," hydra community partners, http://projecthydra.org/community-2-2/partners-and-more/wgbh/.
35. "metadata specifications," european broadcasting union, https://tech.ebu.ch/metadataebucore.
36. see notes from pbcore hackathon part 2, held in june 2015, showing an element-by-element analysis of pbcore against ebucore: "pbcore hackathon part 2," june 15, 2015, https://docs.google.com/document/d/1pwdfyizhpfjcn5rwj1fiowexg5rirxudxcwkbq5bmla/.
37. "join us for the pbcore sub-committee meeting at amia!" public broadcasting metadata dictionary project blog, november 11, 2015, http://pbcore.org/join-us-for-the-pbcore-sub-committee-meeting-at-amia/.
38. "mods and rdf descriptive metadata subgroup," last modified march 19, 2016, https://wiki.duraspace.org/display/hydra/mods+and+rdf+descriptive+metadata+subgroup.
39. "mods rdf ontology," library of congress, https://www.loc.gov/standards/mods/modsrdf/.
40. "mods and rdf descriptive metadata subgroup," last modified march 19, 2016, https://wiki.duraspace.org/display/hydra/mods+and+rdf+descriptive+metadata+subgroup.
41. "avalon media system," http://www.avalonmediasystem.org.
42. see silvia b. southwick, "a guide for transforming digital collections metadata into linked data using open source technologies," journal of library metadata 15, no. 1 (march 2015): 1–35, http://dx.doi.org/10.1080/19386389.2015.1007009.
43. the url for information is a blog with no links to a data set (https://www.library.unlv.edu/linked-data), and the collection site seems to still be based on contentdm (http://digital.library.unlv.edu/collections).
44. "oregon digital," http://oregondigital.org.
45. see karen estlund and tom johnson, "link it or don't use it: transitioning metadata to linked data in hydra," july 2013, http://ir.library.oregonstate.edu/xmlui/handle/1957/44856, accessed from scholarsarchive@osu.
46. "era: education & research archive," https://era.library.ualberta.ca.
47. farnel, "metadata at a crossroads."
48. https://docs.google.com/spreadsheets/d/1hsd6kf4abm-m8vtynyqfjgtizg7bljq3fwrbf_nvoiw/edit#gid=1362636241.
49. the substantially revised data model is not available online yet, but the following shows some of the progress toward an rdf data model: "overview of dams metadata workflow," uc san diego, may 21, 2014, https://tpot.ucsd.edu/metadata-services/mas/data-workflow.html; "dams4 data dictionary," https://htmlpreview.github.io/?https://github.com/ucsdlib/dams/master/ontology/docs/data-dictionary.html, retrieved from github.
50. see the apache jena sparql tutorial for an example of complex rdf with sample queries against that complexity: "sparql tutorial data formats," the apache software foundation, https://jena.apache.org/tutorials/sparql_data.html.
public libraries leading the way

delivering: automated materials handling for staff and patrons

carole williams

information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13xxx

carole williams (williamscar@ccpl.org) is amh and self-service coordinator, charleston county public library. © 2021.

"you've made libraries cool again!" "wow, super techie—we're fascinated by the book return!" with enthusiastic comments like these from our visitors, the staff at charleston county public library (ccpl) knew that we were delivering patron engagement while providing an effective book return system. thanks to county residents overwhelmingly approving a referendum to build five new libraries and renovate thirteen others, from june 2019 to november 2020 charleston county opened four new library branches, each with an automated materials handler (amh), and moved our support and administrative staff into a renovated support services building that now houses a 32-chute amh central sorter with smart transit check-in technology. (side note: yes, i know what you're thinking, and yes, we did open two of the new branches during the pandemic—definitely fodder for another article.)

the branch amhs have interior and exterior return windows and sit along a glass wall so patrons can watch their items ride the conveyor belts and drop into sorting bins. the staff side has an inductor for items being returned from or sent to other branches, so there is almost always something for the public to watch (see figure 1). men, women, and children, young and old, enjoy watching the amh and asking questions. some patrons bring their out-of-town guests (even a nun visiting from ireland) to see the amh in action. this spontaneous interaction bolsters our connection with visitors and subconsciously reinforces the concept of "library as safe exploration." a frequent question is "how does this work?" our explanation of tags and coding is the perfect opportunity to suggest books, point out games, and promote upcoming classes. we follow a roving customer service model.
because an amh is an efficient tool that checks in items and deposits them in pre-determined bins for easy shelving, we have freed up hours of staff time that can now be spent in the stacks, helping patrons find items and answering questions as needed. delivering an excellent amh experience for staff has been more complicated. as befits a port county, we went full steam ahead with new technology, new locations, and increased services. this required all staff to simultaneously learn new systems and change many of our in-house procedures while continuing with daily operations. every detail, from how to sort for shelving to labeling shipments, needed to be re-examined. the biggest changes came with bringing the central sorter online.

some of the changes were technical. for example, we use rfid tags as an identification number and place a matching barcode on each item. rfid is excellent technology; tagging all our items has completely changed and streamlined our process. most items come pre-processed, and the amhs are set to read only the rfid. an unintended but useful consequence is that we have become more aware of vendor processing errors where tags and barcodes don't match. (side note: we are working on some system-wide solutions to locate discrepancies between barcode numbers and rfid tags in the ils; rfid is another topic entirely, so stay tuned.)

figure 1. children returning books to the automated materials handler at a branch of the charleston public library.

another benefit of the amh is that our library collections acquisitions and technical services (lcats) department realized that as they processed new orders, they could now send those items out daily instead of waiting to accumulate enough individual branch materials for a separate shipment—a win for patrons (new materials every day) and for lcats (storage space). the unexpected twist: our adult, ya, and children's librarians are accustomed to receiving new materials separately from returns so they can familiarize themselves with the titles before the items are shelved. with the central sorter, new items go out daily, mixed in with the rest of the daily shipment. spine tape makes it easy for circulation staff to separate the new adult items, but we still needed a solution for children's and ya materials. after several sort changes and many discussions, we went old school, recycling used paper into book flags. the flagging doesn't cause a problem with the amh, is quick for technical services to place in each new book, and is easy for circulation to spot and put aside at the receiving location.

some of the changes were electrical. only the four new branch locations and the support services facility have an amh, while the other fourteen branches check in items by hand. we added a tote check-in server (tcs) system to the central sorter. this feature creates a manifest of the items in each crate, so branches can now receive the contents of a crate by entering a 4-digit barcode instead of scanning individual items. an unintended consequence of our new internet-dependent system that we had not anticipated was electricity. the coast has frequent thunderstorms that can cause power outages and flooding. if the power is out, there is no way to sort or receive the items in delivery. luckily this doesn't happen often, and so far power has been restored quickly.
some of the changes were physical. our delivery drivers also process the shipment when they return each day. in their previous workflow, most of the shipment was delivered to the downstream libraries, and the parts of the shipment that they did process had printed routing slips placed in each item so staff could all be sorting the shipment at the same time. now their department has become logistics, which is a more encompassing title and better covers the wider variety of tasks the staff have added to their day. in addition to delivery and mail duties, logistics also manages and maintains the amh and tcs equipment, troubleshooting problems that arise, scanning barcodes, and processing an average of 3,000 items daily with the amh. most of the shipment is now coming through the central sorter—staff handle an average of 157 crates each weekday, moving items from support services and to and from branches. we have electric forklifts that hold three crates at a time to help with the increase in physically shifting the crates. now one person inducts the shipment while others scan and stack the crates on the loading dock. this procedure is much faster than the previous paper-slip method, and processing is usually finished in a couple of hours.

other changes were mental and emotional. new locations, renovations, technologies, and procedures can be exciting, but they can also lead to change fatigue. fortunately, everyone retained their job this past year, but in order to operate a new branch built in a previously unserved community, we had to reassign staff from locations closed for renovation. ccpl's vision is for our library to be the path to our cultural heritage, a door to resources of the present, and a bridge to opportunities in the future. we are doers, creators, servers, and teammates, not only to the community but to our coworkers. we are all in for our shared vision, but whew... some days we all experience mental eye rolling and collective sighs of "another change?" our director's mantra is "we are the calm." and it is true. by fall we will have three of the renovated branches reopened, three more under renovation, and another staffing shift. with some grace and encouragement to one another, we will handle whatever comes next.

consortia building: a handshake and a smile, island style

patricia j. cutright

information technology and libraries | june 2000

in the evaluation of consortia and what constitutes these entities, the discussion runs the gamut: from small, loosely knit groups interested in cooperation for the sake of improving services to large membership-driven organizations addressing multiple interests, all recognize the benefits of partnerships. the federated states of micronesia are located in the western pacific ocean and cover 3.2 million square miles. throughout this scattering of small islands exists an enthusiastic library community of staff and users that has changed the outlook of libraries since 1991. motivated by the collaborative efforts of this group, a project has unfolded over the past year that will further enhance library services through staff training and education while utilizing innovative technology.
in assessing the library needs of the region, this group crafted the document "the federated states of micronesia library services plan, 1999-2003," which coalesces the concepts, goals, and priorities put forward by a broad-based contingent of librarians. the compilation of the plan and its implementation demonstrate an understanding of the issues and exhibit the ingenuity, creativity, and willingness to solve problems on a grand scale, addressing the needs of all libraries in this vast pacific region.

patricia j. cutright (cutright@eou.edu) is library director of the pierce library at eastern oregon university.

the basic philosophy inherent in librarianship is the concept of sharing. the dissemination of information through material exchange and interlibrary communication has enriched societies for centuries. there are few institutions other than libraries that are better equipped or suited for such cooperation and collaborative endeavors. with service as the lifeblood that runs through its inky veins, the library has the potential to be the driving force in any community toward partnerships that afford mutual benefit for all. the examination of the literature exposes a wide range of perceptions as to the definition of what is a consortium. the term "consortia" conjures up impressions that span the spectrum from highly organized, membership-driven groups to loosely knit cadres focusing on improving services to their patrons however they can make it happen. in kopp's paper "library consortia and information technology: the past, the present, the promise" he presents information from a study conducted by ruth patrick on academic library consortia. in that study she identified four general types of consortia:
• large consortia concerned primarily with computerized large-scale technical processing;
• small consortia concerned with user services and everyday problems;
• limited-purpose consortia cooperating with respect to limited special subject areas; and
• limited-purpose consortia concerned primarily with interlibrary loan or reference and network operations.1
with this distinction in mind, this paper will focus on the second category, typifying a small, less structured organization. while on a visiting assistantship in the federated states of micronesia (fsm), i worked with a partnership of libraries that believe in order for cooperation to succeed, results for the patron must be the goal—not equity between libraries or some magical balance between resources lent by one library and resources received from another library.2 a unified effort to provide service to the patron is the key. the libraries on a small, remote island situated in the western pacific ocean exhibit this grassroots effort that defines the true meaning of consortia—demonstrating collaboration, cooperation, and partnerships. it is a multitype library cooperative that not only encompasses interaction among libraries but also between agencies as well as governments.
the librarians on the island of pohnpei, micronesia, and all the islands throughout the federated states of micronesia have embraced this consortial attitude while achieving much through these collaborative efforts:
• the joint work done on crafting the library services plan, 1999-2003 for the libraries throughout the federated states of micronesia
• initiating successful grant-writing efforts which target national goals and priorities
• implementing a collaborative library automation project which is designed to evolve into a national union catalog
• the implementation of a viable resource-sharing and document delivery service for the nation

background and socioeconomic overview

micronesia, a name meaning "tiny islands," comprises some 2,200 volcanic and coral islands spread throughout 3.2 million square miles of pacific ocean. lying west of hawaii, east of the philippines, south of japan, and north of australia, the total land mass of all these tropical islands is fewer than 1,200 square miles, with a population base estimated at no more than 111,500.3 it is a location-unique region, but nonetheless still plagued with all the problems associated with any geographically remote, economically depressed area found anywhere in the united states or elsewhere in the world. the federated states of micronesia is a small-island, developing nation that is aligned with the united states through a compact of free association, making it eligible for many u.s. federal programs. the economic base is centered around fisheries and marine-related industries, tourism, agriculture, and small-scale manufacturing. the average per capita income in 1996 was $1,657 for the four states of the fsm: kosrae, pohnpei, yap, and chuuk. thirteen major languages exist in the country, with english as the primary second language. the 607 different islands, atolls, and islets dot an immense expanse of ocean; this geographic condition presents challenges in implementing and enhancing library services and technology.4 despite the extreme geographic and economic conditions, the college of micronesia-fsm national campus, in collaboration with the librarians throughout the states, has been successful in implementing nationwide projects. these endeavors have resulted in technical infrastructure and the foundation for information technology instruction, supported through awards from the u.s. department of education, the title iii program, and the national science foundation.

collaboration: building bridges that cross the oceans

the libraries in micronesia have shown an ongoing commitment to librarianship and cooperation since the establishment of the pacific islands association of libraries and archives (piala) in 1991. the organization is a micronesia-based regional association committed to fostering awareness and encouraging cooperation and resource sharing among libraries, archives, museums, and related institutions. piala was formed to address the needs of pacific islands librarians and archivists, with a special focus on micronesia; it is responsible for the common-thread cohesiveness shared by the librarians over the past eight years.
the organization has grown to become an effective champion of the needs of libraries and librarians in the pacific region.5 when piala was established, the most pressing areas of concern within the region were development of resource-sharing tools and networks among the libraries, archives, museums, and related institutions of the pacific islands. the development of continuing education programs and the promotion of technology and telecommunications applications throughout the region were areas targeted for attention. those concerns have changed little since the group's inception.

building upon that original premise, in january 1999 a group of interested parties from throughout the federated states of micronesia met to draft a document they envisioned would lay the groundwork for library planning over the next five years. this strategic plan encompasses all library activity: services, staffing, and the impact technology will have on libraries in the region. the document, "the federated states of micronesia library services plan, 1999-2003," coalesces the concepts, goals, and priorities put forward by a broad-based contingent. in this meeting, the group addressed basic issues of library and museum service, barriers and solutions to improve service delivery, and additional funding and training resources for libraries and museums.6 the compilation of the plan crafted at the gathering demonstrated a thorough understanding of the issues that face the librarians of the vast region. it exhibits the ingenuity, creativity, and willingness to problem-solve on a grand scale in a way that addresses the needs of all libraries in the pacific region.

the goals set forward by the writing session group illustrate the concerns impacting library populations throughout the fsm. the fsm has now established six major goals to carry out its responsibilities and the need for overall improvement in and delivery of library services:
1. establish or enhance electronic linkages between and among libraries, archives, and museums in the fsm.
2. enhance basic services delivery and promote improvement of infrastructure and facilities.
3. develop and deliver training programs for library staff and users of the libraries.
4. promote public education and awareness of libraries as information systems and sources for lifelong learning.
5. develop local and nationwide partnerships for the establishment and enhancement of libraries, museums, and archives.
6. improve quality of information access for all segments of the fsm population and extend access to information to underserved segments of the population.

priorities

the following are general priorities for the fsm library services plan. the priorities represent needs for overall improvement of the libraries, museums, and archives. they are based on the fact that libraries, museums, and archives development is currently in its infancy in the fsm. specific priorities will change from year to year as programs are developed.
1. establishment of new libraries and enhancement of existing library facilities to increase accessibility of all fsm citizens to library resources and services. outer islands and remote areas generally have no access to libraries or information sources. new facilities or mechanisms need to be established to provide access to information resources for the public.
existing public and school library facilities often lack adequate staffing, climate control, and electrical connections needed to meet the needs of the community. existing public and school libraries also need to improve their facilities and services delivery to meet the needs of disabled individuals and other special populations.
2. provide training and professional development for library operation and use of new information technologies. a survey held during the writing session indicated that public and school library staff do not currently possess the skills needed to effectively provide assistance in the use of new information technologies. well-designed training programs with mechanisms for follow-up technical assistance and support need to be developed and implemented.
3. promote collaboration and cooperation among libraries, museums, and archives for sharing of holdings and technical ability. limited holdings, financial capacity, and human resources are major barriers to improving library services. collaboration and cooperation are needed among libraries, museums, and archives to maximize scarce resources.
4. develop recommended standards and guidelines for library services in the fsm. the ability to share resources and information could be significantly increased by development and implementation of recommended standards and guidelines for library services. standardization could assist with sharing of holdings and holdings information, increase availability of technical assistance, and provide guidance as new libraries and library services are set up.
5. increase access to electronic information sources. existing public and school libraries have limited or no access to electronic linkages, including basic services such as e-mail and connections to the internet. the priority need is to establish basic electronic linkages for all libraries, followed by extending access to electronic information to all users.7

shifting into action

with the drafting of this five-year plan, the librarians stated emphatically the need and desire to move ahead with haste and determination. as the plan was conceptualized and documented, a small cadre of librarians from the college of micronesia-fsm national campus, the public library, and the high school library crafted two successful grant proposals which addressed:
• a cooperative library automation project which is designed to evolve into a national union catalog (goal 1; priorities 3, 5);
• the installation of internet services that would link the college of micronesia-fsm campuses, the public library, and high school library (goals 1, 2, 6; priorities 1, 2, 3, 5);
• the development and delivery of training programs for library staff and users of the libraries (goals 3, 4, 6; priority 2); and
• the implementation of a viable resource-sharing and document delivery service for the nation (goals 1, 2, 5, 6; priorities 3, 4, 5).

over the past year the awarding of grant funds has shifted the library community into high gear with the design and implementation of project activities that will fulfill the targeted needs.

the automation project and internet connectivity

a collaborative request submitted by the bailey olter high school (bohs) library and the pohnpei public library provided the funding necessary to computerize the manual card catalog system at bohs and upgrade the dated automated library system at pohnpei public library.
since the college of micronesia-fsm campuses are automated, it was important for the high school library and the public library to install like systems to achieve a networkable automated system, facilitating the development of a union catalog for all the libraries' holdings. this migration to an automated system promoted cooperation and resource sharing for the island libraries, opening a wealth of information for all island residents. the project entailed purchasing a turnkey cataloging and circulation system that will facilitate the cataloging and processing of new acquisitions for each library as well as the conversion of approximately five thousand volumes of material already owned by the public and high school libraries. through internet connectivity, which was integral to the project, the system would also serve as public access to the many holdings of the libraries for students, faculty, and town patrons through a union catalog to be established in the future.

the development and delivery of training programs for library staff and users is linked to the implementation of a viable resource-sharing and document delivery service for the nation. as stated earlier, the librarians of the federated states of micronesia accepted the challenge facing them in ramping up for the twenty-first century. their prior experience laid the groundwork needed to implement the training programs necessary to bring the library community the knowledge and skills required. a survey administered during the writing session indicated that few public and school librarians have significant training in or use of electronic linkages or information technologies, nor are they actively using such technologies at present. of the fourteen public and school librarians in the four states of micronesia, none hold a master's degree from an accredited library school or library media specialist certification. an exception is the library staff at the com-fsm national campus, where two-thirds of the librarians hold professional credentials. significant effort is needed on a sustained basis for effective training in the understanding and use of information systems throughout the nation. where training has occurred, it has often been of an infrequent, short variety with little support for ensuring implementation at the work site. additionally, there are often no formal systems for getting answers to questions when problems do arise.

in addressing the information needs of this population it is apparent that education is the key component for continued improvement of library services. this concern is evident in a paper by daniel barron, which reports that only 54 percent of librarians and 19 percent of staff in libraries serving communities considered to be rural (i.e., 25,000 people or fewer) have an ala-mls.8 and dowlin poses even more perplexing questions: "how can a staff with such an educational deficit be expected to accomplish all that will be demanded to enable their libraries to go beyond being a warehouse of popular reading materials? how can we expect them to change from pointers and retrievers to organizers and facilitators?"9 micronesia is no different than any other state or country in wanting its population to have access to qualified staff, current resources, and services.
it recognizes that its libraries are inadequately staffed and that many have staff who are seriously undereducated to meet the expanded information needs of the people in their communities. if these libraries are to seize the opportunities suggested by the developing positive view, develop services to support this view, and market such a view to a wider range of citizens in their communities, they must invest in the intellectual capital of their staffs. in order to carry out this charge, the following activities were designed to address the educational and training needs of the librarians in the fsm. as outlined in a recently funded institute of museums and library services (imls) national leadership grant, preparation has begun with the following activities, which will address the staffing and technology concerns described in fsm libraries:
1. recruit and hire an outreach services librarian to survey training needs, coordinate and plan training, and deliver or arrange for needed training.
2. develop a skills profile for all library, museum, and archival staff positions.
3. identify a training contact or coordinator for each state.
4. develop and provide periodic updates to operational manuals for school and public libraries, museums, and archives.
5. recruit local students and assist them in seeking out scholarships for professional training off island.
6. design and implement programs to provide continuous training and on-site support in new technological developments and information systems (provided on-site and virtually).
7. establish a summer training institute offering training based on needs as determined by the outreach services librarian in collaboration with state coordinators and recruiting on- and off-island expertise as instructors.
8. design and develop programs for orientation and training of users of information systems (provided on-site and virtually).
9. develop and implement a "train the trainer" program, which will have representation from all four states, that will ensure continuity and sustainability of the project for the years to come.10

the primary requisite to initiating this project is the recruitment and hiring of the outreach services librarian, who will then begin the activities as listed. a beginning cadre of librarians gleaned from the summer institute will become the trainers of the future, perpetuating a learning environment enhanced with advanced technology. breakthroughs in distance education, aided by advances in telecommunications, will significantly impact this project. on-site training will be imperative for the initial cadre of summer institute attendees to provide sound teaching skills and a firm understanding of the material at hand. follow-up training will be presented on each island by the trainer either on location or virtually with available technology. products such as web course in a box, webct, or nicenet will be analyzed for appropriate utilization as teaching tools. these products will take advantage of newly established internet connections on each island and, more importantly, will provide the interactive element that distinguishes this learning methodology from the "talking head" or traditional correspondence course approach. a web site designed for this project will provide valuable information and connectivity not only for the pacific library community but for anyone worldwide who may be interested in innovative methods of serving remote populations.
using computer conferencing and virtual communities technology, a video conferencing system such as 8 x 8 technologies will be used, allowing face-to-face interaction between trainer and student in an intra-island situation (interisland telephone rates are too expensive for regular use as a teaching tool). to enhance the learning experience and information retrieval component for these librarians and the population they serve, the project also incorporates implementation of a viable resource-sharing, document delivery system capitalizing on a shared union catalog and using a service such as research library group's ariel product. with library budgets reflecting the critical economic climate of the nation, collaborative collection development and resource sharing become even more crucial to satisfying the needs of the library user. to maintain cost-effective communication and build a sense of community among the librarians, the messaging software icq has been installed on all participant hardware and is utilized for group meetings, question and answer, and general correspondence. since icq operates over the internet, this package allows low-cost communication with maximum benefit in connecting the group. this technology will also be used as the primary mechanism for communication with an outside advisor who will provide expertise in the area of outreach services for rural populations.

the realm of outreach services in libraries has always presented unique challenges that can now benefit greatly from current and emerging technologies. the definition of "outreach" is truly a matter of perspective, with the more traditional sense relating to a specific library serving its own user or patron, but current practice regards "outreach" as an extension of services to all users, whether they are a registered patron, a colleague, or a peer. micronesia is a country where the proverbial phrase "the haves and the have-nots" is amplified. the recent (and ongoing) installation of internet services in the region has made possible many basic changes, but there still exists the reality that some of the sites for proposed services have nothing more than a common analog line and rudimentary services. as an example of the realities that exist, only 38 percent of the approximately 180 public schools in the fsm have access to reliable sources of electricity. another challenge for these libraries is the climate and environment, which has a significant impact on library facilities, equipment, and holdings. the fsm lies in the tropics, with temperatures ranging daily from 85 to 95 degrees and humidity normally 85 percent or higher.11 the high salt content in the ocean air wreaks havoc upon electrical equipment, and the favorable environs inside a library often entice everything from termites in the wooden bookcases to nesting ants in keyboards. from these examples it is apparent that the problems that trouble these libraries are not going to be solved with the magic bullet of technology. this reality constitutes the need for varying strategies and different approaches to address the training requirements of the library staff.

summary

the fsm library group, in particular the pohnpeian librarians, have accomplished much in the past year.
the flurry of activity that enveloped the libraries on pohnpei was spurred by the collaborative writing session in january 1999. a week-long "meeting of the minds" from libraries throughout micronesia produced the blueprint that will map the future of libraries and library service for years to come. these librarians stated their primary issues in delivering library services and came to a consensus on activities needed to address the issues. the "federated states of micronesia library services plan, 1999-2003" was crafted as a working document, a strategic plan for improving library services in the pacific region, and a commitment to achievement through collaboration. while in micronesia i observed the impact that the unification of ideas can have on the citizens of a community. in my fourteen-year tenure at eastern oregon university i have been exposed to the benefits of the "consortium attitude" that comes from cooperation and partnerships. time and again the university demonstrates the positive effects of what is referred to as the "politics of entanglement." shepard describes the overriding philosophy that has been the recipe for success: the politics are really quite simple. we maintain an intricate pattern of relationships, any one of which might seem inconsequential. yet there is strength in the whole that is largely unaffected if a single relationship wanes. rather than mindlessly guarding turf, we seek to involve larger outside entities and, in the ensnaring, to turn potential competitors into helpful partners.12 just as eastern oregon university has discovered, the libraries of the federated states of micronesia are learning the merits of entanglement.
references and notes
1. james j. kopp, "library consortia and information technology: the past, the present, the promise," information technology and libraries 17 (mar. 1998): 7-12.
2. jan ison, "rural public libraries in multi-type library cooperatives," library trends 44 (summer 1995): 29-52.
3. pacific islands association of libraries and archives, www.uog.edu/rfk/piala.html, accessed june 6, 2000.
4. division of education, department of health, education and social affairs, federated states of micronesia, "federated states of micronesia, library services plan 1999-2003" (march 3, 1999): 2.
5. pacific islands association of libraries and archives, www.uog.edu/rfk/piala.html, accessed june 6, 2000.
6. division of education and others, "library services plan," 4.
7. ibid., 6.
8. daniel d. barron, "staffing rural public libraries: the need to invest in intellectual capital," library trends 44 (summer 1995): 77-88.
9. k. e. dowlin, "the neographic library: a 30-year perspective on public libraries," in libraries and the future: essays on the library in the twenty-first century, f. w. lancaster, ed. (new york: haworth pr., 1993).
10. patricia j.
cutright and jean thoulag, college of micronesia-fsm national campus, "institute of museums and library services, national leadership grant" (mar. 19, 1999).
11. division of education and others, "library services plan," 2.
12. w. bruce shepard, "spinning interinstitutional webs," aahe bulletin 49 (feb. 1997): 3-6.
china academic library and information system: an academic library consortium in china
longji dai, ling chen, and hongyang zhang
since its inception in 1998, the china academic library and information system (calis) has become the most important academic library consortium in china. calis is centrally funded and organized in a tiered structure. it currently consists of thirteen management or information centers and seventy member libraries serving 700,000 students. after more than a year of development in information infrastructure, a calis resource-sharing network is gradually taking shape. like their counterparts in other countries, academic libraries in china are facing such thorny problems as shrinking budgets, growing patron demands, and rising costs for purchasing books and subscribing to periodicals. it has thus become increasingly difficult for a single library to serve its patrons to their satisfaction. under these circumstances, the idea of resource sharing among academic libraries was born. library consortia provide an organizational form for libraries to share their resources. the georgia library learning online (galileo), the virtual library of virginia (viva), and ohiolink are among the well-known library consortia in the united states.1 traditionally, the primary purpose of establishing a library consortium is to share physical resources such as books and periodicals among members. more recently, however, advances in computer, information, and telecommunication technologies have dramatically revolutionized the way in which information is acquired, stored, accessed, and transferred. sharing electronic resources has rapidly become another important goal for library consortia.
what is calis?
in may 1998, as one of the two public service systems in "project 211," the china academic library and information system (calis) project was approved by the state development and planning commission of china after a two-year feasibility study by experts from academic libraries across the country. calis is a nationwide academic library consortium.
longji dai is director, peking university library, and deputy director, calis administrative center; ling chen is deputy director, calis administrative center; and hongyang zhang is deputy director, reference department, peking university library.
funded primarily by the chinese government, calis is intended to serve multiple resource-sharing functions among the participating libraries - including online searching, interlibrary loan, document delivery, and coordinated purchasing and cataloguing - by digitizing resources and developing an information service network.
structure and management of calis
a library consortium is an alliance formed by member libraries on a voluntary basis to facilitate resource sharing in pursuit of common interests. whether a consortium can operate successfully depends in large part on how it is managed. calis differs from library consortia in the united states in that it is a national network. it resembles multistate consortia in the united states with respect to geographic distribution of member libraries, but it is like tightly knit or even centrally funded statewide ones in terms of management.2 the calis members are distributed in twenty-seven provinces, cities, and autonomous regions in china, making entirely centralized management difficult. after surveying some of the major library consortia in the united states, europe, and russia, calis adopted an organizational mode characterized by a combination of both centralized and localized management - that is, a three-tiered structure (figure 1). in order to improve management efficiency and maximize the sharing of various resources, including funds, calis has established a coordination and management network comprising one national administrative center (which also serves as the north regional center), five national information centers (see table 1), and seven regional information centers (see table 2). the thirteen centers are maintained by full-time staff members provided by the libraries in which these centers are located. the national administrative center (located in peking university) - overseen by officials from the concerned office at the ministry of education and the presidents of peking and tsinghua universities and advised by an advisory committee consisting of experts from major member libraries - is responsible for the construction and management of calis, makes policies and regulations, and prepares resource-sharing agreements. the center has an office handling routine management needs and several specialized work groups overseeing calis' national projects, such as those for the development of databases for union catalogues, current chinese periodicals, and calis' service software. under the guidance of the national administrative center, five national information centers are each responsible for building and maintaining an information system in one of five general areas - humanities, social science, and science; engineering and technology; agriculture and forestry; medicine; and national defense - in coordination with regional centers and member libraries. the host libraries where these centers are located possess relatively abundant collections in their respective areas.
these centers, which are intended to be information bases that cover all major disciplines of science, are responsible for importing databases for sharing, constructing resource-sharing networks among member libraries, and providing searching and document delivery services to member libraries.
figure 1. the three-tiered structure of calis: five national information centers, eight regional information centers, and seventy member libraries.
depending on their location, academic libraries in china are divided into eight groups, with each forming a regional library consortium. each regional consortium is overseen by a regional management center, except for the consortium in the north, which is directly managed by the national management center. the regional centers not only participate in nationwide projects in coordination with the national centers and other regional centers, but they also are responsible for promoting cooperation among libraries in their particular regions. all the centers are located in member universities and staffed by the host universities. the concerned vice president or library director of a host university is in charge of the associated center. the regional centers also are assisted by regional coordination committees and advisory committees of provincial and municipal officials in charge of education; university presidents; library directors; and senior librarians in the concerned regions. these committees serve a coordinating role in the regions.
table 1. five national information centers (area of specialization: location)
humanities, social science, and science: peking university, beijing
engineering and technology: tsinghua university, beijing
agriculture and forestry: china agricultural university, beijing
medicine: beijing medical university, beijing
national defense: haerbin industrial university, haerbin, heilongjiang
table 2. regional information centers and areas of their jurisdiction (name, location: jurisdiction)
national administrative center, beijing: beijing, tianjin, hebei, shanxi, and inner mongolia
southeast (south) regional center, shanghai: shanghai, zhejiang, fujian, and jiangxi
southeast (north) regional center, nanjing: jiangsu, anhui, and shandong
central regional center, wuhan: hubei, hunan, and henan
south regional center, guangzhou: guangdong, hainan, and guangxi
southwest regional center, chengdu: sichuan, chongqing, yunnan, and guizhou
northwest regional center, xi'an: shaanxi, gansu, ningxia, and xinjiang
northeast regional center, jilin: jilin, liaoning, and heilongjiang
funding
the development and operation of calis has been funded in large part by the chinese government. the sources of funding for calis at the present time are as follows:
• government grants. much of the funds for the calis project during the first phase of construction came from the government. because of the demonstrated benefits of the ongoing project, it is expected that the government will provide funds for the second phase of calis construction. these government funds have been used in the purchase of software and hardware for the calis centers and commercial databases, development of service software and databases, training of staff members, etc.
• local matching funds.
according to prior agreements, a province or city that desires to have a regional center is required to provide funds to supplement the government funds for the construction of its local center.
• member library funds. these funds, primarily derived from the university budgets, have been used to purchase electronic resources and cover the expenses incurred from the use of the calis service software platforms.
although calis is currently funded by the government, the future expansion and operation of the system is expected to rely in large part on other sources of funds. the funding needs for calis may be met by operating the system in a commercial mode.
principles for cooperation among members
the successful operation of a library consortium clearly depends on good working relationships among members and between members and the consortium. at calis, all members are required to adhere to a set of principles (see below) in dealing with these relationships. it is based on these principles, known as the calis principles for cooperation among members, that calis policies and rules are made.
• the common interests of calis are above those of individual member libraries.
• member libraries should not cooperate at the expense of the interests of others.
• calis provides services to member libraries for no profit.
• member libraries are all equal and enjoy the same privileges.
• larger member libraries are obliged to make more contributions.
what has been achieved?
when it was first established, calis had sixty-one member libraries from major universities participating in "project 211." later, as many other major universities were interested in joining the alliance, the number of calis members climbed to seventy. at present, calis serves about 700,000 students. construction of calis is a long-term, strategic undertaking. the system provides service functions as they become available and is constantly being improved in the process. in the first phase (1998 to 2000) of the project, calis successfully started the following information-sharing functions in its member libraries:
• primary and secondary data searching;
• interlibrary borrowing and lending;
• document delivery;
• coordinated purchasing; and
• online cataloguing.
the following tasks have been completed:
• purchase of computer hardware (e.g., sun e~s00);
• construction of a cernet- or internet-based information-sharing network connecting academic libraries across the country; and
• group purchase of databases, such as umi, ebsco, ei village, inspec, elsevier, and web of science, that are shared among member libraries either directly online or indirectly through requested service/document delivery.
calis also has completed development of a number of databases, including:
• union catalogues. these databases currently contain 500,000 bibliographic records of the chinese and western language books and periodicals in all member libraries.
• dissertation abstracts and conference proceedings. these databases now contain abstracts of doctoral dissertations (12,000 bibliographic records) and proceedings of national and international conferences (8,000 records) collected from more than thirty member libraries. the databases are expected to have 40,000 records in total by the end of 2000.
• current chinese periodicals.
these databases (5,000 titles, 1.5 million bibliographic records) contain contents and indexes of current chinese periodicals from about thirty member libraries.
• key disciplines databases. calis has sponsored the development of twenty-five discipline-specific databases by member libraries. each of these databases contains about 50,000 to 100,000 records.
the first three classes of databases are prepared in the usmarc, unimarc, or ccfc format for ease of use by patrons and cataloguing staff and in data exchange. clients from member libraries may perform a web-based search of the above databases, most of which contain secondary documents and abstracts, and access calis online resources using browsers.
development of software platforms includes the following:
• cooperative online cataloguing systems. the systems include z39.50 protocol-based search and uploading servers and terminal software platforms for cataloguing staff. acquisition and cataloguing staff in each member library may participate in cooperative online cataloguing using the terminal software platforms on their local system. the systems have been used for the development and operation of the union catalogue databases.
• systems for database development. these systems can be used in the development of shared databases containing secondary data information in usmarc, unimarc, ccfc, or dublin core format. the systems for database development in the usmarc, unimarc, or ccfc formats are equipped with a search server based on the z39.50 protocol to permit use by cataloguing staff and for data exchange.
• an interlibrary loan system. the system, developed based on the iso 10160/10161 protocol, consists of ill protocol machines and client terminals. these systems, located in member libraries, are interconnected to form a calis interlibrary loan network. primary document delivery software based on the ftp protocol also has been developed for the delivery of scanned documents between libraries.
• an opac system. the system has both web/z39.50 and web/ill gateways. patrons may visit the system using common browsers, search all calis
databases, and send search results directly to the calis interlibrary loan service. patrons also may access an ill server through web/ill, tracking the status of submitted interlibrary loan requests, inquiring about fees, and so on. the databases that are centrally located and those that are distributed at various locations, as well as service platforms in member libraries, form a calis information service network.
future considerations
in a period of just over a year, considerable progress has been made in forming a nationwide resource-sharing library consortium in china. however, because member libraries vary in size, available funds, staff quality, and automation level, calis has yet to realize its potential. there are a number of problems that remain to be solved. for example, the calis union catalogue databases do not work well on some of the old automation systems in member libraries, and the calis service platforms are incompatible with a dozen automation systems currently in use; as a result, the union catalogues cannot tell the real-time circulation status in all member libraries, affecting interlibrary loan service. in addition, primary resources are not sufficiently abundant. therefore, the extent to which resources are shared among member libraries remains quite limited. in the next phase of development, calis will improve service systems (including hardware and software platforms) and the distribution of shared databases. at the same time, calis will develop more electronic resource databases and be actively involved in the research and development of digital libraries, expanding the scale and extent of resource sharing.
references
1. barbara a. winters, "access and ownership in the 21st century: development of virtual collection in consortial settings," in electronic resources and consortia (taiwan: science and technology information center, 1999), 163-80; katherine a. perry, "viva (the virtual library of virginia): virtual management of information," in electronic resources and consortia (taiwan: science and technology information center, 1999), 93-114; delmus e. williams, "living in a cooperative world: meeting local needs through ohiolink," in electronic resources and consortia, ching-chin chen, ed. (taiwan: science and technology information center, 1999), 137-61.
2. jordan m. scepanski, "collaborating on new missions: library consortia and the future of academic libraries," in proceedings of the international conference on new missions of academic libraries in the 21st century, duan xiaoqing and he zhaohui, eds. (peking: peking univ. pr., 1998), 271-75.
circulation control: off-line, on-line, or hybrid
michael k.
buckland, bernard gallivan: library research unit, university of lancaster, england
the requirements of a computer-aided circulation system are described. the characteristics of off-line systems are reviewed in the light of these requirements. on-line systems are then reviewed and their economic viability queried. a "hybrid" system (involving a dedicated mini-computer in the library, used in conjunction with a larger machine) appears to be more cost-effective than conventional on-line working.
introduction
an important feature of a very small library is the close contact between the librarian, his collections, and his users. over the years, as collections and library usage have both increased enormously, librarians have gradually been losing this important "contact." the trend toward increased book use is a desirable one, but the sheer pressure of transactions has necessitated the adoption of manual and photographic circulation control systems which concentrate on a restricted range of information about borrowing - notably when a book is due back and who has a given book. computer-based circulation systems offer the prospect of regaining detailed knowledge of book usage - at a price. this paper reviews three approaches. the desirable features of a circulation control system are that:
1. it should "marry" borrower, book, and date information together rapidly and accurately.
2. it should enable rapid, easy consultation of the issue files at any time in order to detect the location of any book.
3. it should be able to immediately detect and register the fact that an item just returned from loan has been requested by another reader. this ability should not be dependent on whether or not the person returning the book has also remembered to bring in a recall notice.
4. it should prepare suitable overdue notices for books retained too long.
5. it should be possible to produce lists of items out on loan to any given borrower and also to signal "over borrowing" (i.e., having an excessive number of books out on loan at any given time).
6. it should be able to detect delinquent borrowers at the point of issue.
7. when material is returned from loan, the system should amend the circulation records promptly and permit the calculation of any fine.
8. it should facilitate the collection, analysis, and presentation of the "management information" needed to maintain effective stock control and high standards of service.
9. it should perform these tasks reliably and economically.
these requirements vary in importance from library to library, but, with some differences in emphasis, they appear to apply equally to both public and university libraries.
off-line
the commonest approach to computer-aided circulation control is to operate in the off-line mode. well-documented examples are ibm 357's (southern illinois) (2), southampton university library (phase ii), using friden equipment (3), and the current automated library systems (als) ltd. equipment (4). these systems can perform the basic operations of issuing and discharging books in an economical manner, but because they are operating in an off-line manner they experience difficulties in
maintaining an up-to-date overview of their collections and in detecting reservations. they cannot detect delinquent readers at the point of issue. in order to solve some of these problems als has been developing an off-line system with a certain amount of storage attached to it. this "trapping store," indicated by dashed lines in figure 1, can contain the numbers of reserved books and delinquent readers to facilitate immediate identification at the point of issue.
fig. 1. off-line circulation control: data collection units (cards, badges, dials) feeding a transmitter and receiver, with an optional trapping store (als only); the systems shown are the ibm 357, the collectadata 30 (friden), and the als.
this system has proved to be quite popular and at least fifteen have been installed in university and public libraries. it is still not able to provide any better currency of information than is possible with a basic off-line system, and the als system will handle only numeric information. books are identified by number only, so that if one receives an overdue reminder, it is because books 341672, 816649, and 654321 are overdue - unless there is a substantial matching operation against a complete catalog file. in contrast, a system using alpha-numeric characters could include brief author, title, and call-number information on, say, an 80-column book card. this would permit the production of lists by author, etc., without reference to a complete catalog file. it may be noted by reference to table 1 that an outstanding virtue of the als system is the low cost of installing additional data collection points. a notable gap in library automation is the apparent lack of a simple, inexpensive data collection unit capable of reading alpha-numeric book cards. if relatively expensive equipment is used (e.g., ibm 357 or friden collectadata), there may be difficulties in coping economically with the inherent peakiness of library borrowing.
table 1. off-line circulation system costs
ibm 357: basic two-transmitter-and-receiver system $13,000; maintenance (p.a.) $655; trapping store --; total $13,655
friden collectadata 30 (6-reader system): basic system $14,000; maintenance (p.a.) $900; trapping store --; total $14,900
als: basic system $10,000; maintenance (p.a.) $800; trapping store $11,000; total $21,800
notes: 1. the specifications are intended to represent two service points. since als equipment uses separate card readers for borrowing and return, the provision for two borrowing points and two return points would, in fact, have a higher traffic capability than the other two systems. 2. figures represent british prices expressed as u.s. dollars at $2.40 = £1. 3. collectadata 30 hardware is at 1967 price. 4. approximate cost of each als reader is $500.
attempts to mould library use to suit the machinery are unlikely to prove satisfactory. notably, a general lengthening of loan periods will result in a lower standard of library service in terms of immediate availability (1) and, almost certainly, a decrease in actual book usage. in management information terms, the symptoms of this would be an increase in the size of the issue file compared with the borrowing rate and a drop in issues per capita.
on-line
since the deficiencies of off-line working are serious, various attempts have been made to develop on-line circulation systems (see figure 2). this is the second main formula, of which illinois state library (5), queens university, belfast (6), and midwestern university library (7) provide well-documented examples. such a system is able to maintain a completely up-to-date picture of the issue files.
such a system can detect both reserved material and delinquent readers immediately and appears to provide the complete answer to the library's needs - until its technical requirements are examined. in order to control the circulation system in an on-line manner, the library requires at least ten hours of on-line working to be available to it each working day. as more than one university library has already discovered, this number of hours of on-line working is very rarely available at present when computer facilities are being shared with many other users, as in a university environment. furthermore, it is unlikely to become available for quite some years in the future because, with present machines and techniques, on-line working is an inefficient mode of operation unless the computer system is running well below capacity.
fig. 2. on-line circulation control: input-output data collection units connected through a multiplexer to an on-line computer with dedicated storage.
a further obstacle when sharing facilities is the amount of dedicated storage that must be made available to the library. storage is a much prized commodity and computer centers are unwilling to forfeit valuable storage for any length of time. it should also be noted that no average-sized library will be able to afford or justify possession of its own dedicated computer adequate for on-line working. a library's requirements for storage, printing facilities, and so on would make such an independent system an extravagance, since its power would have to be considerable to handle the vast quantities of data input to it, but it would constitute a grossly under-utilized investment compared with the sharing of the facilities provided in a university or local government computing center. the data collection units could be teleprinters or card reading stations with some printing or display facilities. the number and type of such dcu's will depend on the local work load, but we will consider a system using two alpha-numeric card reading stations with printout facilities plus an interrogating printer. an interface into the main computer and a multiplexing device will also be required. in order to answer queries and to control the circulation in a completely on-line manner, the dedicated disk must be large enough to hold the issue file, and having gone to the expense of controlling the issue on-line, it would seem inconsistent to be satisfied with a number-only system. if we plan for an expected maximum number of 50,000 records in our issue file at any one time and allow 100 characters per record (i.e., author/title, class or call number, borrower number, date due back, code to describe the type of loan, i.e., long or short, etc.), the disk must be capable of storing 5 million characters. since it is usual to store the bulk of the circulation control programs on the same disk and to allow certain parts of the disk to be used as work areas, a total store area of 6 million characters, at least, will be required. the cost of providing adequate dedicated disk storage will depend on the local situation but could well cost anything between $30,000 and $50,000 to purchase. the remaining equipment is likely to cost $20,000, and development costs will be greater than with off-line.
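the disk sizing above follows directly from the figures quoted in this paragraph. a minimal sketch of the arithmetic, with illustrative variable names and an assumed one-million-character allowance for programs and work areas (chosen only to reproduce the "at least 6 million characters" total), might look like this:

    # back-of-the-envelope sizing of the dedicated disk for a fully on-line
    # issue file, following the figures quoted above; names are illustrative.
    MAX_RECORDS_ON_LOAN = 50_000   # expected maximum issue-file size at any one time
    CHARS_PER_RECORD = 100         # author/title, call number, borrower, date due, loan code

    issue_file_chars = MAX_RECORDS_ON_LOAN * CHARS_PER_RECORD      # 5,000,000 characters

    # programs and disk work areas share the same dedicated disk; the extra
    # million characters here is an assumption that reproduces the article's
    # "at least 6 million characters" total.
    OVERHEAD_CHARS = 1_000_000
    total_chars = issue_file_chars + OVERHEAD_CHARS

    print(f"issue file: {issue_file_chars:,} characters")
    print(f"dedicated store required: at least {total_chars:,} characters")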
a "hybrid" circulation control system it is possible to meet all the requirements of a library circulation system in a cost-effective manner by exploiting and combining the main advantages of on-line and off-line working in a hybrid system. the basic structure of such a system is shown in figure 3 (see p. 36) . as can be seen, the mini-computer is sited in the library building and has the various data collection terminals attached locally to it. the mini-computer is also conhybrid circulation control/buckland and gallivan 35 nected via a line to the main computer into and from which it can send and receive data. the important differences between this system and the conventional onor off-line systems are that: 1. the mini-computer spools up the transactions as they occur, into its own storage (either tape or disk). 2. the on-line link to the main computer is only used two or three times each day. this is important, since it implies that the hybrid system does not require continuous on-line facilities. 3. supplementary or full listings show the state of the issue file at a particular time. 4. the recent transactions stored by the mini·computer can be interrogated and, in conjunction with the listings, gives the immediately current state of the issue file. 5. reserved books and delinquent readers have their identifiers stored in the core of the mini-computer to enable their immediate identifi· cation at the point of issue. 6. the necessity for dedicated equipment on the main computer (such as a dedicated disk) is avoided. a fairly heavily used library will be handling approximately 5,000 transactions each day. since these transactions will be either issue or return transactions, in the main, if we allow 100 characters worth of information to identify an issue and 20 characters to identify a return, then on the average we will be handling 300,000 characters worth of information each day. in the hybrid system the mini~computer is acting as a controller to the data collection devices and is spooling this information up onto a magnetic tape until such time as the storage space is becoming full or until a sufficient time has elapsed since the last updating of the records. at this time the mini-computer passes the information on its magnetic tape to the main computer via the on-line link. the duration of the on-line connection might be ten to fifteen minutes owing to line speed limitations. in order . to operate a hybrid system, the library would need two periods of on-line working each day of approximately ten to fifteen minutes each. alternatively the magnetic tape could be physically replaced, the fresh one continuing to record transactions while the full one is carried manually to the main computer. provided that the tapes can be read by the main computer, on-line facilities would not be required. the recent transactions, having been passed to the main computer, will be sorted and merged with the rest of the issue file which would be kept on magnetic tape. the precise nature of the listings produced by the main computer will depend on local factors, such as the duration of the loan period, etc., but could be either a fully revised complete listing or a listing of the most recent transactions to supplement an earlier complete listing. 36 journal of library automation vol. 
hybrid computer costs
basic computer (includes teleprinter): $8,650
extra 4k of store: $3,600
tape controller: $7,200
dual tape transport: $5,700
data break interface: $600
d.c.u.'s @ $3,100 (2 off): $6,200
interface d.c.u.-mini-computer: $2,000
interface mini-main computer: $1,200
total: $35,150
it is worth noting that the most widely adopted computer-aided circulation system in great britain is the als system, which, if purchased with a "trapping store," is the nearest equivalent to the hybrid system outlined in figure 3. the chief differences are that (1) the als system operates on numbers only, which, in our view, makes it less suitable for university library applications; (2) the "trapping store" is inflexible in its capability when compared with a mini-computer of similar cost. in order to utilize the on-line facilities provided by a mini-computer to the full, it would be possible to handle the "short loan" reserve collections of popular texts (commonly borrowable for a few hours only) in a completely on-line manner. in this respect there would be no reliance on the main machine. it might also be appropriate to use the mini-computer to handle other library data processing tasks.
fig. 3. simplified hypothetical "hybrid" circulation control system: library terminals and a console typewriter or vdu attached to the mini-computer in the library, linked to the main computer, which produces author lists, call-number lists, notices, etc.
last year at the university of lancaster the average cost per issue was 12.72 cents. since the university of lancaster is a new university in great britain, it is in the middle of a period of growth and student numbers are expected to increase from 3,000 to 5,400 in the next five years. this university has also researched the influence of duplication and loan period adjustment on the availability of stock to prospective users, and with the present level of duplication and grades of loan period there is a per capita borrowing rate approaching 80 issues per annum. this figure is expected to increase in the next five years. with these figures as a basis, at least 2 million issues are expected in the next five years. even allowing for the cost of data conversion and the amortization of hardware over five years, the use of a hybrid circulation control system could be expected to result in an average cost per issue of just under 12 cents.
conclusion
the costs already mentioned can be tabulated thus:
off-line: $13,000-$22,000
on-line: $70,000
hybrid: $35,000
this suggests that a hybrid system offers complete control over library circulation in a highly cost-effective manner compared with on-line working. whether or not a hybrid system is also to be preferred to off-line working will depend on the individual library context.
the trade-off between the marginal advantages and the marginal increases in cost and complexity will depend on the detailed costs and value-judgments specific to each situation. if our diagnosis is correct, then most attempts to progress from off-line to on-line working are ill-judged and would appear to have no justification in cost-effectiveness. in our view these developments are unlikely to become fully operational. if they do, their life will probably be short or restricted to limited hours unless exceptional circumstances prevail. such circumstances would include continuing subsidies for research and development or the existence of a system justified on other grounds (e.g., police records).
references
1. michael k. buckland and others, systems analysis of a university library; final report on a research project, university of lancaster library occasional papers, 4 (lancaster, england: university of lancaster library, 1970).
2. r. e. mccoy, "computerised circulation work: a case study of the 357 data collection system," library resources & technical services 9:59-65 (winter 1965).
3. b. a. j. mcdowell and c. m. phillips, circulation control system. automation project report no. 1 (soul/aprl) (southampton, england: university of southampton library, 1970).
4. lorna m. cowburn, "university of surrey automated issue system," program 2:70-88 (may 1971).
5. robert e. hamilton, "the illinois state library 'on-line' circulation control system," in dewey e. carroll, ed., proceedings of the 1968 clinic on library applications of data processing, university of illinois graduate school of library science, urbana, illinois (london: bingley, 1969), p. 11-28.
6. richard t. kimber, "an operational computerised circulation system with on-line interrogative capability," program 2:75-80 (oct. 1968).
7. calvin j. boyer and j. frost, "on-line circulation control - midwestern university library's system using an ibm 1401 computer in a 'timesharing' mode," in dewey e. carroll, ed., proceedings of the 1969 clinic on library applications of data processing, university of illinois graduate school of library science, urbana, illinois (london: bingley, 1970), p. 135-45.
a marc ii-based program for retrieval and dissemination
georg r. mauerhoff: head, tape services, national science library, and richard g. smith: analyst/programmer, research and planning branch, national library, ottawa, canada (formerly with library and computation center, university of saskatchewan)
subscriptions to the library of congress' marc tapes number approximately sixty. the uses to which the weekly tapes have been put have been minimal in the area of selective dissemination of information (sdi) and current awareness. this paper reviews work that has been performed on batched retrieval/dissemination and provides a description of a highly flexible cooperative sdi system developed by the library, university of saskatchewan, and the national science library. the system will permit searching over all subject areas represented by the english language monographic literature on marc.
introduction
with subscriptions to the library of congress' marc ii tapes numbering approximately sixty (1), the utilization of standardized bibliographic information in machine readable form has reached an all-time high.
numerous subscribers have written programs to access the tapes in order to produce acquisitions and cataloging products, but, unfortunately, the search techniques in these programs have been limited to searching of fixed-length information, such as lc card numbers, standard book numbers (sbn's), and compression codes. accelerated developments of searching mechanisms have been made by those involved with on-line bibliographic systems, but work on marc information retrieval in the batch mode has been evolving very slowly. that is, the proffering of an assortment of remedies for one of the oldest library problems, that of current awareness and selective dissemination of information (sdi) using marc, has not received the emphasis it should. the library of the university of saskatchewan has been utilizing the marc tapes since their weekly distribution began on 1 april 1969, with areas of usefulness so far having been restricted by the kinds of searching methods available. concern has therefore been shown for a far greater exploitation of the marc records. since no algorithms other than time decay have been established locally for limiting the size of the file to items which have a high degree of usefulness, and since the cost of updating and storing the weekly files has to be incurred, it is only fitting that as many bibliographic records as possible be monitored and disseminated to those sections of the university where they can be most effectively used. a program package for current awareness/sdi is the most likely method for achieving this. collaborative efforts are now the only realistic means of exploiting marc. costs can be spread over a large user group, and at the same time personalized services are assured to those taking part. it is for this reason that the office of technical services (ots), library, university of saskatchewan, has been cooperating with the national science library (nsl), national research council of canada, on the development of such a current awareness/dissemination system. known by the acronym seldom (selective dissemination of marc), the program represents cooperation in the true sense of the word, in that the ots's experiences with marc are being coupled with nsl's expertise in nation-wide sdi. this paper will describe in detail the evolution of seldom, with a future paper to document user reaction to the seldom program.
history
the university of saskatchewan is not alone in the investigation of marc-based retrieval/dissemination programs. the oklahoma department of libraries, under the coordination of k. j. bierman (2, 3, 4, 5), has been operating a weekly marc sdi service since february of 1970 and has found its reception overwhelming. over twenty user groups in the united states and canada are presently experimenting with this current awareness service in various subject fields, using the dewey decimal and the library of congress classification numbers as search keys. oklahoma's efforts followed the study by william j. studer (6) and the aerospace research applications center (arac) at indiana university. studer's hypothesis was "that an sdi system concerned with book-type material would be of significant benefit to faculty in keeping them alerted to what is being published in their fields of interest - especially faculty in the non-technical areas where books are probably still as vital, if not more important, a medium of information and ideas as periodical and report literature (7)."
in his experiment, studer translated participants' interests into profiles consisting of weighted library of congress subject headings and classification numbers. henriette avram (8) of the library of congress' marc development office reported on information retrieval using the marc retriever, a modification of programmatics inc.'s system known as aegis. regarded as "essentially a research tool that should be implemented as inexpensively as possible," the marc retriever is tape based and able to accept almost any kind of bibliographic query. unfortunately, it is only operational at the library of congress. along similar lines are syracuse university's l.c. marc on molds and leep projects (9, 10, 11, 12). the interactive retrieval capabilities, which are used in both batch and on-line modes, permit a variety of queries over their marc data bases. additional projects reporting on the subject approach to marc tapes in a batch environment are not numerous. dohn martin (13) at the washington university school of medicine describes a searching method by l.c. classification numbers, in which a pl/1 program is used to produce selection lists for the medical library. this is along the same lines as the work reported by j. g. veenstra (14) of the university of florida, d. l. weisbrod (15) of the yale university library, and f. m. palmer (16) of harvard university library. in sweden, bjorn tell (17) has run a marc ii test tape in his integrated information retrieval system called abacus, while in edmonton, canada, doreen heaps (18) reports on author and title searches of marc tapes in a chemical titles format. in england, related research is being contemplated by f. h. ayres (19) for bnb marc tapes. in ireland (20), also, plans are in the offing for sdi services based on bnb marc tapes, while in the united states, the first commercial venture is underway by richard abel and company (21), which is contemplating selective dissemination of announcements.
background
the national science library has been providing an sdi service for canada's scientific and technical information (sti) community since april 1969, spinning a variety of machine readable indexing and abstracting services on a regular basis. a questionnaire (22) was sent out by can/sdi project officials in may 1970, asking its subscribership to suggest where subject expansion should take place in the future. although the responses emphasized the life sciences, e.g., biological abstracts' ba previews and medlars, the nsl was nevertheless quite enthusiastic about adding the library of congress' marc ii tapes to their present sdi service, especially if the project programming could be accomplished elsewhere. twenty-one subscribers responded to the marc ii tapes, indicating the existence of a good user group, although not one of top priority. the university of saskatchewan library expressed a willingness to perform the systems work and project programming, which was estimated to require less than four man-months, making seldom operational by february 1971.
the seldom program
facilities and programming languages
in order for the ots to make use of the pl/1 and assembler programs, an ibm s360 computer configuration consisting of at least 100k of memory and a pl/1 compiler was deemed necessary. this presented no problem because the library had at its disposal an ibm s360/50 with 256k bytes of memory.
additional hardware specifications include four tape drives, a 2314 disk, and two 1403 printers, one with a tn option. the latter is soon to be replaced with the ala-approved library print train. now, however, because of the addition of large core storage (lcs), large bibliographic files such as marc will be processed much more easily. release 19 of os mft was also implemented in order to effectively utilize this additional million bytes of lcs memory. this more than modest memory has great utility, although serious investigations of automated library systems such as this one can take place even with small memories. as can be imagined, the switchover to release 19 came at an inopportune time as far as the seldom programs were concerned. implementation of the new release affected the scheduling and turn-around times.
the seldom record format
several years ago, the national science library decided to adopt a standard marc ii-like format and design programs to convert suppliers' tapes to this standard format. when a decision is made to add a new tape service, such as biological abstracts' ba-previews, to the present inventory of can/sdi tapes, the nsl personnel select those bibliographic items which will find use in an sdi environment. selected items are then pulled from the input tape by the conversion program and structured into an nsl format. this, then, was the first of many tasks facing the ots: determining which fields should be utilized from the lc marc tape for searching and printing. of approximately fifty marc tags, fixed and variable, only 32 contain information that might be of interest to users of the system for searching. these tags, however, can be grouped into analytical units, i.e., units of like information. arranged in six term types, they are: personal name, corporate name, classification, title, geographic area code, and date. the abbreviations for the term types are p, b, k, t, g, and d respectively. users then will be able to request information from the system in many ways, whether it be for a title term or a combination of categories such as classification number and geographic area code. the twenty-three fields and five subfields chosen, along with their respective analytics, are shown in table 1, where [ ] marks fields that are not searched and ° marks ots calculations. percentages of occurrence, the criterion used for selection of the tags, are also indicated in the table. all the 500 tags were omitted because nsl and the ots do not wish to search abstracts, annotations, or bibliographic notes at this time. where frequencies were not available from the library of congress' publication entitled format recognition process for marc records: a logical design, the ots conducted its own counts over a tape selected at random. the tape chosen (volume 2, number 23) for the counts contained 881 records.
table 1. search field definitions (figures are % of occurrence per record; [ ] = not searched; ° = ots calculations)
personal name (p): 100, 84.7; [400], <0.1; 600, 12.1; 700, 22.4; [800], 0
corporate name (b): 110, 11.7; 260$b, 97.9; 410$a, 2.4; 610, 4.8; 710, 11.1; 810$a, 4.6
title (t): 111, 1.5; 130, 0.2; 240, 4.3°; [241], 0.1°; 245, 100.0; 410$t, 2.4; [411], <0.1; 440, 6.0; [611], 0.1; 630, 0.9; 650, 95.9; 651, 17.5; 711, 0.2; 730, 0.8; 740, 4.1°; 810$t, 4.6; [811], 0.1; 840, 0.5
classification (k): 050, 105.1; 051, 0.9°; 082, 95.8
geographic code (g): 043, 34.0°
date (d): 009 (i.e., 008), 100.0
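for illustration only, the searched entries in table 1 could be transcribed into a small driver table keyed by term type; the structure below is a hypothetical sketch in python, not seldom's actual pl/1 tables:

    # searched fields and subfields from table 1, grouped by seldom term type
    # (p, b, t, k, g, d); bracketed, unsearched tags are omitted.  this is an
    # illustrative transcription, not the program's actual data structure.
    SEARCH_FIELDS = {
        "p": ["100", "600", "700"],                                  # personal name
        "b": ["110", "260$b", "410$a", "610", "710", "810$a"],       # corporate name
        "t": ["111", "130", "240", "245", "410$t", "440", "630",
              "650", "651", "711", "730", "740", "810$t", "840"],    # title
        "k": ["050", "051", "082"],                                  # classification
        "g": ["043"],                                                # geographic area code
        "d": ["009"],                                                # date (derived from 008)
    }

    # twenty-three fields plus five subfields give the 28 searchable data
    # elements mentioned in the text.
    assert sum(len(tags) for tags in SEARCH_FIELDS.values()) == 28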
4/3 september, 1971 the fact that only 28 data elements were chosen for searching purposes proved highly useful, since the national science library's search module was designed, for the sake of efficiency, to accommodate a maximum of 32 search field definitions. the program can handle this many fields, but on the average it makes use of approximately twelve fields per record. there may be occasions, however, when as few as seven or as many as twenty-two directory entries will be handled, not counting subfields. table 2 is a distribution of directory entries for the sample marc ii file tape. the mean of the distribution of entries is 13, and the median 12. table 2. distribution of directory entries #of dir. entries l6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ::::,..23 #records 0 2 0 28 74 116 160 149 123 106 69 25 19 5 3 1 1 0 881 % 0 .23 0 3.18 8.40 13.17 18.16 16.91 13.96 12.03 7.83 2.84 2.16 .57 .34 .11 .11 0 100.00 at the same time that d ecisions were being made regarding the inclusion of certain search fields, print field definitions were structured. although the programs can accommodate any number of directory items, only 31 are required for satisfactory and meaningful output. the analytics for these definitions make up table 3, where o are ots calculations. frequency statistics are again included. description of programs the seldom software is comprised of four modules. these modules (a, b, c, d) are easily identified in the system flowchart (figure 1) and are: "a" the translation and conversion of marc; "b" the searching of files; "c" the outputting of the search results , and "d" the compiling of profiles. two ibm utility programs are also used. marc 11-based retrievaljmaverhoff and smith 147 table 3. print field definitions definition term ( s) causing retrieval main entry title statement edition statement imprint collation statement series statement/notes bibliographic price subject added entries lc card number profile number expression number threshold weight weight source form of content language lc class number dewey decimal number isbn %of occurrence per record 98.6 100.0 4.1 100.0 100.0 13.6 39.7 ° 131.6 100.0 53.2 ° 100.0 105.1 ° 95.8 ° 39.7 ° translation and conversion program ( lconv) the conversion program, called lconv, converts the weekly marc tape into a seldom marc ii-like format tape. the input records see the following changes: "%" used as field terminator, "$" used as subfield delimiter, "@" used as record terminator, upperand lower-case ascii translated to upper case ebcdic, diacritics removed, text compressed, and unromanized characters that can't be approximated removed. the program is driven by two tables, one of which consists of the marc tags in which the ots is interested, and the other, the processes to which the selected tags will be subjected. currently, all tags can be handled by one of four processes: 1) process 1 extracts the language and the form of content code from marc tag 008, and creates a new field 008 consisting of only these two units. instead of a one-character code for form of content, a four-letter abbreviation delimited by "$a" is used. language of publication is delimited by "$b". process 1 also extracts the first publication date from the original tag 008, and sets up a new field , tagged 009 and delimited "$a". 2) process 2 handles the library of congress ( 051, 052) and dewey decimal classification ( 082). it utilizes only the first subfield, compresses out slashes, and limits the length of these fields to 20 characters. 
the searching program (srchpro)

the searching program accepts as input compiled profiles, the converted marc tape from lconv, and parameter cards specifying the data base and up to 32 search field definitions. each field definition consists of a term type code, tag, and delimiter of the field or subfield to be searched. six term types are allowed, although additions, deletions, and changes to these six may be performed upon request. all terms except date may be truncated on the right, with title terms also benefiting from left truncation. the right truncation feature reduces storage and search time requirements.

the searches are conducted over the converted tape according to the boolean expressions which connect symbols representing profile words. profile words are simply entered into core until the allotted core is filled, and the source tape is sequentially passed against the profiles; i.e., each of the records on the tape precipitates a search of the profile words in core. if all of the profile words were not entered into core, the source tape is rewound and another search is conducted. this continues until all profiles have been searched. an output tape is created containing the seldom record retrieved, with a prefix consisting of the profile number, threshold weight, weight, expression number, hit number, and the terms which caused retrieval of the record.

users also have the option of applying a weight (-99 to +99) to each profile word. each time profile words match terms in a record, the weight value of each of the words found is tallied. upon completing the search of that record and upon satisfying the expression logic, the total of the weight values is compared to a threshold value. thus, if the total is greater than or equal to the threshold value (-999 to +999), that particular record is retrieved. another option available to the user is a hit option, in which the user may specify the maximum number of records he would like various expressions in the program to retrieve for him.
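a rough sketch of the weighting and threshold logic just described. this is not srchpro; the profile representation, the in-core matching, and the example profile are simplifications and assumptions, and only right truncation (a trailing asterisk) is illustrated.

```python
def term_matches(profile_word, record_term):
    # linear match of one profile word against one term from a record;
    # a trailing "*" marks right truncation, as in the seldom profiles
    if profile_word.endswith("*"):
        return record_term.startswith(profile_word[:-1])
    return record_term == profile_word

def search_record(profile, record_terms):
    # profile: {"words": {symbol: (term, weight)}, "expression": callable,
    #           "threshold": int}
    # record_terms: terms extracted from one converted marc record
    # returns the profile words causing retrieval, or None
    hits, total_weight = {}, 0
    for symbol, (word, weight) in profile["words"].items():
        matched = any(term_matches(word, t) for t in record_terms)
        hits[symbol] = matched
        if matched:
            total_weight += weight            # tally weights of words found
    # the boolean expression connects the symbols; the threshold (-999..+999)
    # must also be met for the record to be retrieved
    if profile["expression"](hits) and total_weight >= profile["threshold"]:
        return [w for s, (w, _) in profile["words"].items() if hits[s]]
    return None

# example profile modeled loosely on the pollution profile shown in figure 5
profile = {
    "words": {"a": ("pollut*", 10), "b": ("contaminat*", 10), "c": ("environment*", 5)},
    "expression": lambda h: (h["a"] or h["b"]) and h["c"],
    "threshold": 15,
}
print(search_record(profile, ["pollution", "environmental", "mercury"]))
```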
the output programs

the output from the search program is sorted by calling up the ibm sort utility, which sorts the records on the prefix. the sorted output is then input to the print program along with the address file. the latter is a separate file that is merely updated using the ibm utility iebupdte. it is in this address file, however, that several options can be specified. duplicate printouts can be obtained, such that the left and right sides of the page carry identical output, with the right side carrying a feedback mechanism. two-up printouts, notes, and, if necessary, punched-card output can be requested. on the whole, the record printed out (see figure 2) is similar in format to a 3 x 5 catalog card, the only differences being the fixed format, the term or terms causing retrieval, the lack of name added entries and notes, and the control information at the bottom of each printout.

[fig. 2. sample profile notices: computer-printed notices dated march 19, 1971, addressed to the shortt library, c/o murray memorial library, university of saskatchewan, saskatoon. each notice carries a full bibliographic citation (the examples are a geological survey of canada bulletin by e. j. w. irish and robert cundy's beacon six), the terms causing retrieval, and a control line identifying the seldom project, the marc tape issue, and the profile, threshold, and weight values.]

profile compilation

because of the library's bibliographic responsibility to the university, an alerting service such as seldom will vastly improve user awareness of the published monographic resources. first, users, in house and out, would not only be alerted to many works to be acquired by the library, but would also be alerted to items that are currently not being purchased. secondly, they would be assured of personalized services. users of seldom will not receive listings of just new books, but will be notified of the latest books which are presumed to be relevant to their interests.

profiling

when a prospective user (group) wishes to search a weekly marc tape, his (its) interests are entered onto profile formulation sheets. these sheets (see figures 3 and 4) contain a description of the user's subject interests, several references to the monographic literature, and a listing of the profile words with logical connectives. the profile words may number as many as 500. figure 5 shows three of the approximately eighty profiles currently running under seldom. the profiles are formulated by search editors using words that appear in the user's narrative and references. additional words are sought in the library of congress' list of subject headings. classification numbers that express the appropriate areas are incorporated; depending upon the information need, personal names, corporate names, geographic area codes, and date are also prescribed. according to mauerhoff (23), approximately twenty-seven hours per year are required of an information specialist/search editor in order to accurately capture and maintain a user's need for information. this figure incorporates interviewing time, user education, analyses of user feedback, and revision time. the success of this system, or of any information retrieval system, therefore depends on having sufficient profiling staff.

[fig. 3. sample profile formulation sheet: narrative and references. the sheet for profile 0003 (reference department, murray memorial library, university of saskatchewan) states the search request (current monographs likely to interest the reference department, such as dictionaries, encyclopedias, handbooks, and catalogues) and lists four sample references, including hayes and becker's handbook of data processing for libraries.]

[fig. 4. sample profile formulation sheet: terms and logic. profile words such as almanac*, dictionar*, directory, encycloped*, glossar*, guide*, handbook*, index*, reference, review*, catalog*, abstract*, and yearbook* are listed against alpha codes and connected by boolean expressions with weights.]

[fig. 5. computer version of profiles. three stored profiles are shown: one built on canadian geographic terms and area codes, one on audio-visual and instructional-media terms, and one on pollution terms (pollut*, contaminat*, environment*).]
the compile program (compro)

compro, the profile compilation program, edits the profile transactions for sequence, syntax, and semantics, and generates codes for the boolean operators in the search expressions. the program flags incorrect data base specifications, incorrect term types, and incorrect alpha codes (i.e., the symbols corresponding to the profile words). profile transactions are by way of card input, and can consist of profile additions, profile updates, and profile deletes. listings accompany all transactions.

operation and costs of seldom

from the time that seldom became operable on a day-to-day basis, cost information has been gathered, and since seldom is composed of four modules, the recording of items of cost has been easily done. for example, lconv computer charges are presently $0.019 per record converted, based on weekly files ranging in size from 1194 to 2399 records. this breaks down to about 1939 records per week, and averages out to about $37 per marc tape. following the preparation of the tape for searching, the srchpro-sort routine is run. the average computer cost has been about $0.186 per profile per issue. seldom's user group presently numbers 81, with profile terms numbering 1121, or 14 terms per profile, and questions or expressions numbering 273, or about 4 per profile.
prinpro was formerly running under stream-oriented transmission, at a total computer cost of $1.70 per 1000 lines of output. a shift to record-oriented transmission has lowered charges to $1.50 per 1000 lines. with profiles having averaged about 832 lines of output, the total cost of printing out search results has been about $1.25 per profile.

overall costs for the 81 profiles are presently about $2.23 per profile per tape, or $116.00 per profile per year. since the profiles require updating at frequent intervals, charges of $0.37 per profile per tape have been incorporated into this figure to take care of changes in terms and addresses. costs which have not been included in the calculations are such items as marc tape subscriptions, forms, and staff time.
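the cost figures quoted in the last few paragraphs are straightforward unit-cost arithmetic; the short sketch below simply reproduces those calculations so they can be checked.

```python
# back-of-the-envelope check of the charges quoted above
print(round(1939 * 0.019, 2))        # lconv: ~36.84, i.e. about $37 per weekly marc tape
print(round(832 * 1.50 / 1000, 2))   # prinpro: ~1.25 dollars of printing per profile
print(round(2.23 * 52, 2))           # overall: ~115.96, quoted as $116.00 per profile per year
print(round(1121 / 81, 1), round(273 / 81, 1))   # ~13.8 terms and ~3.4 expressions per profile
```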
discussion

the ots and the nsl have at their disposal a program package that is highly flexible. for instance, search keys can be added or deleted at will. fields from the marc tapes can either be incorporated into or removed from the directory. any number of fields and subfields can be searched on tape, and any new directory items may be created, with the srchpro limit, however, being 32. this number was chosen because it satisfies 99% of the users' needs. almost every procedure in the program is table driven, the result being that variations can easily be introduced into the programs. in consequence, if and when bnb marc tapes are made available, and if and when a canadian marc service becomes a reality, searching of these tapes would present no problems whatsoever.

the benefits to be derived from seldom go beyond the concept of sdi, because seldom can produce outputs for a wide variety of applications. sdi and current awareness have received considerable emphasis in the literature by those providing search services over a spectrum of scientific-technical tape services. since marc ii has also elicited a tremendous response, especially from kenneth bierman of the oklahoma department of libraries, these utilities do not merit additional treatment in this paper. seldom, however, is unique in that it is the only marc-based sdi system capable of searches using six coordinated entry points, linear matching, truncation, weighting, and output options.

from the point of selection, marc has great appeal. since the majority of the university's acquisitions (i.e., almost 80%) are english-language monographs, faculty and staff who have the responsibility for book selection would benefit from regular alerting services based on their areas of interest. apart from receiving verified bibliographic information, the participants benefit from the timeliness of the records. at the same time, selection costs per record will be brought down significantly, especially now that this selection process becomes tied in to tesa-1, the library's automated marc-based acquisitions and cataloguing system. it has been suggested that selection and ordering could be done for the cost of selection alone. the only problem areas envisaged are the lack of canadian imprints and the lack of other non-english monographs, such as french, german, spanish, and portuguese. a partial solution to this problem may take the form of a canadian marc project. a more complete solution is on its way, since marc coverage for other languages is anticipated by the beginning of 1972.

collection rationalization, an area receiving considerable attention along regional and national lines, can also benefit from seldom. devising divisions of responsibility in the acquisition of library materials will enable libraries to acquire, organize, store, and make available to the public comprehensive monographic collections. marc deselection, where practised by subscribers, is being pursued mainly along the lines of time decay. the university of chicago (21) has so far exhibited the only deselection algorithm employing a subject and intellectual-level approach in addition to date. they eliminate records from their file if they fall outside of their collection policy, by using classification numbers. the ots will be able to perform the same function, but much more rigorously, since its deselection criteria can consist of six elements. in this way, file size can be kept to a reasonable level, and update and storage charges will not be so high.

internal library data and information services will be along the lines of sdi, current awareness, demand bibliographies, and management statistics. these in-house utilities, which are already being obtained, have been very useful. the reference department, for instance, receives a bibliography each week of marc ii reference sources. another profile, for one of the catalogers, is monitoring the publications of modern-day novelists and poets.

outlook

seldom has been operational for only several months. while it has tremendous potential in the library field, and although immediate interest has been keen, the system will have to undergo considerable acceptance testing. attention will have to be given to costs and to the user and his evaluation of the service. how seldom fits into a library's patron or reference services will be especially important, since the system will be integrated into a library's current accessions program and also the card catalog service.

acknowledgments

major credit for the existence of the seldom project is due to the systems analysts and programmers at the national research council of canada, messrs. p. h. wolters, r. a. green, j. heilik, and miss r. smith, and to dr. j. e. brown, national science librarian.

references

1. personal communication with henriette avram, marc development office, library of congress, washington, d.c.

2. bierman, k. j.: "sdi service," jola-technical communications, 1 (october 1970), 3.

3. bierman, k. j.; blue, betty j.: "a marc-based sdi service," journal of library automation, 3 (december 1970), 304-319.

4. bierman, k. j.: "an operating marc-based sdi system: some preliminary services and user reactions," proceedings of american society for information science, 7 (1970), 87-90.

5. bierman, k. j.: statements of progress of cooperative sdi project. in oklahoma department of libraries: automation newsletter, 2 (february 1970), 3-4; 2 (june-august 1970); 2 (september 1970), 16, 25-26; 2 (december 1970), 34-35; 3 (february 1971), 1-3.

6. studer, william j.: computer-based selective dissemination of information (sdi) service for faculty using library of congress machine-readable catalog (marc) records. (ph.d. dissertation, graduate library school, indiana university, september 1968).

7. studer, william j.: "book-oriented sdi service provided for 40 faculty." in avram, henriette: the marc pilot project, final report (washington, d.c.: library of congress, 1968), p. 179-183.
also in random bits, 3:3 (november 1967), 1-4; 3:4 (december 1967), 1-4, 6.

8. avram, henriette: "marc program research and development: a progress report," journal of library automation, 2 (december 1969), 257-265.

9. atherton, pauline: "lc/marc on molds: an experiment in computer-based, interactive bibliographic storage, search, retrieval, and processing," journal of library automation, 3 (june 1970), 142-165.

10. atherton, pauline; wyman, john: "searching marc tapes with ibm/document processing system," proceedings of american society for information science, 6 (1969), 83-88.

11. atherton, pauline; tessier, judith: "teaching with marc tapes," journal of library automation, 3 (march 1970), 24-35.

12. hudson, judith a.: "searching marc/dps records for area studies: comparative results using keywords, lc and dc class numbers," library resources and technical services, 14 (fall 1970), 530-545.

13. martin, dohn h.: "marc tape as a selection tool in the medical library," special libraries, (april 1970), 190-193.

14. veenstra, j. g.: "university of florida." in avram, henriette d.: the marc pilot project, final report (washington, d.c.: library of congress, 1968), pp. 137-140.

15. weisbrod, d. l.: "yale university." in avram, henriette d.: the marc pilot project, final report (washington, d.c.: library of congress, 1968), pp. 167-173.

16. palmer, foster m.: "harvard university library." in avram, henriette d.: the marc pilot project, final report (washington, d.c.: library of congress, 1968), pp. 103-111.

17. tell, b. v.; larsson, r.; lindh, r.: "information retrieval with the abacus program: an experiment in compatibility," proceedings of a symposium on handling of nuclear information (vienna: 16-20 february 1970), p. 184.

18. heaps, d.; shapiro, v.; walker, d.; appleyard, f.: "search program for marc tapes at the university of alberta," proceedings of the annual meeting of the western canada chapter of the american society for information science (vancouver: september 14-15, 1970), 83-94.

19. ayres, f. h.: "making the most of marc; its use for selection, acquisitions, and cataloguing," program, 3 (april 1969), 30-37.

20. dieneman, w.: "marc tapes in trinity college library," program, 4 (april 1970), 70-75.

21. "marc ii and its importance for law libraries," law library journal, 63 (november 1970), 505-525.

22. wolters, peter h.; brown, jack e.: "can/sdi system: user reaction to a computerized information retrieval system for canadian scientists and technologists," canadian library journal, 28 (january-february 1971), 20-23.

23. mauerhoff, georg r.: "nsl profiling and search editing," proceedings of the annual meeting of the western canada chapter of the american society for information science (vancouver: september 14-15, 1970), 32-53.

journal of library automation vol. 7/4 december 1974

book reviews

current awareness and the chemist, a study of the use of ca condensates by chemists, by elizabeth e. duncan. metuchen, n.j.: scarecrow press, 1972. 150p. $5.00.

this book starts with a five-page foreword by allen kent entitled "kwic indexes have come a long way-or have they?" kent is always interesting, but when one detects that his foreword is becoming almost an apologia, one wonders just what is to come. the remainder of the book (apart from the index) appears already to have been presented as dr. duncan's ph.d. thesis at the university of pittsburgh.
the first two chapters are the usual sort of stuff, taking us from alexandria in the third century to columbus, ohio in 1970, with undistinguished reviews of user studies and the history of the chemical abstracts service. the remaining sixty-four pages of text report and discuss a study of the use of ca condensates by quite a small sample of academic and industrial chemists in the pittsburgh area. the objective appears to have been to compare profile hits with periodical holdings and interlibrary loan requests at the client's library so that a decision model for the acquisition of periodicals could be developed. on the author's own admission, this objective was not achieved. a certain amount of data is presented but it is difficult to draw many conclusions from it, other than the fact that chemists do not appear to follow up the majority of profile hits that they receive nor do they use the current issues of chemical abstracts very frequently. it is difficult to understand why this material was published in book form. it could have been condensed to one or possibly two papers for ].chem.doc. or perhaps even left for the really diligent seeker to find on the shelves of university microfilms-but, as the old testament scribe bemoaned, "of making many books there is no end." at the bottom of page 118 a reference is made to the paper by abbott et al. in aslib proceedings (feb. 1968); at the top of page 119 the same paper's date is given as january 1968. other errors are less obvious, but one really questions whether the provision of a short foreword and an index makes even a good thesis worth publishing in hard covers. r. t. bottle the city university london, u.k. computer-based reference service, by m. lorrai'ne mathies and peter g. watson. chicago: american library assn., 1973. 200p. $9.95. the archetypal title and model for all works of explication is ....... without tears. lorraine mathies and peter watson have attempted the praiseworthy task of explaining computer-produced indexes to the ordinary reference librarian, but for a number of reasons, some of them probably beyond the control of the authors, the tears will remai'n, perhaps one difficulty is that this book was, in its beginnings at least, the product of a committee. back in 1968 the information retrieval committee of the reference services division of the ala wanted to present to "working reference librarians the essentials of the reference potential of computers and the machine-readable data they produce" (p.xxix). the proposal worked its way (not untouched, of course) through several other groups and eventually resulted in a preconference workshop on computer-based reference service being given at the dallas convention of 1971. the present book is based on the tutor's manual which mathies and watson prepared for that workshop but incorporates revisions suggested by the ala publishing services as well as changes initiated by the authors themselves. with so many people getting into the planning act, it is not surprising that the various parts of the book should end up by working at cross purposes to each other. unfortunately, the principal conflicts come at just those points where a volume of exposition needs to be most definite and precise: just what is the book trying to do and for whom? at the original workshop, the eric data base was chosen as a "model system" since educational terminology was more likely to be understood than that of the sciences. 
and because the participants were to learn by doing, they were told a great deal about eric so as to be able to "practice" on it. the trouble is that these objectives do not translate well from workshop to print. the detafls about eric, which may have been necessary as tutors' instructions, seem misplaced in book form. almost half the present book is devoted to a laborious explanation of how eric works and this is a great deal more than most workaday reference librarians will want to know about it. moreover, it is no longer clear whether mathies and watson aim to train "producers" or "consumers." the welter of detail suggests that they expect their readers to learn hereby to construct profiles and to program searches but it is highly doubtful that skills of this kind can or should be imparted on a "teach yourself" basis. once mathies and watson leave eric behind, they seem on surer ground. part ii (computer searching: principles and strategies) begins with a fairly routine chapter on binary numeration which is perhaps unnecessary since this material is easily available elsewhere. however, the section quickly moves on to an excellent explanation of boolean logic and weighting, describes their application in the formulation of search strategies, and ends with an admirably succinct and demystifying account of how one evaluates the output (principles of relevance and recall). the reader might well have been better served if the book had indeed begun with this part. the last section (part iii: other machine readable data bases) is also very useful, particularly for the "critical bibliography" (p.153) in which the authors describe and evaluate ten of the major bibliographic data bases. this critical bibliography is apparently a first of its kind, which makes the authors' perceptive and frank comments all the more welcome. part iii also contains chapters on marc and the 1970 census but, sh·angely enough, does not include a final resume and conclusions. it is true that in each book reviews 325 chapter there is a paragraph or so of summary but this is hardly a satisfactory substitute for the overall recapitulation one would expect. in the final analysis, indeed, one's view of the book will depend on just thatwhat one expects of it. if "working reference librarians" expect to read this book in order to be no longer "intimidated by these electronic tools" (p.ix), they are apt to be disappointed. the inordinate emphasis on eric, the rather dense language, and the fact that the main ideas are never pulled together at the end will all prevent easy enlightenment. however, if our workaday reference librarians are willing to work their way through a fairly difficult manual on computer-based indexing as in effect a substitute 'for a workshop on the subject, they will find this book a worthwhile investment of their time-and tears. samuel rothstein school of lihl'arianship university of british columbia the circulation system at the university of missouri-columbia library: an evolutionary approach. sue mccollum and charles r. sievert, issue eds. the larc reports, vol. 5, issue 2, 1972. 101p. in 1958 the university of missouri-columbia library was one of the first libraries to mechanize circulation by punching a portion of the charge slip with book and borrower and/ or loan information. in 1964 an ibm 357 data collection system utilizing a modified 026 keypunch was installed, but not until 1966 was 026 output processed on the library owned and operated ibm 1440 computer. 
however, budgetary constraints forced a transfer of operations in 1970 to the data processing center, which undertook rewriting of library programs in 1971. after explanation of hardware changes and an overview of the circulation department organization and data processing center operation, this report deals in depth with the major files of the circulation system-circulation master flle and location master file-and the main components of the circulation system-edit, update, overdues, fines, interlibrary loans, 326 journal of libmry automation vol. 7/4 december 1974 address file, location file, reserve book, listing of files, special requests, and utility programs. many examples of report layouts are included, particularly those accomplished by utilizing data gathered from main collection and reserve book loans. although this off-line batch processing circulation system is limited in that it does not handle any borrower reserve or lookup (tracer) routines, both of which are possible in off-line systems, the university of missouri-columbia system has merit as a pioneer system which influenced other university library circulation system designs in the 1960s. detailed reference given throughout the report to changes in the original library programs not only makes it of value as a case history for any library interested in circulation automation but also indicates the important fact that library programs do change and evolve in response to new demands and technological capabilities. lois m. kershnm university of pennsylvania libraries national science information systems, a guide to science information systems in bulgaria, czechoslovakia, hungary, poland, rumania, and yugoslavia, by david h. kraus, pranas zunde, and vladimir slamecka. (national science information series) cambridge, mass.: the m.i.t. press, 1972. 325p. $12.50. as indicated by the title, this volume provides a comparative description and analysis of the various organizational or political structures which have been adopted by six counb·ies of central and eastern europe in their attempts to develop effective national systems for the dissemination of scientific and technical information. for each country there is a detailed account of the national information system now existing, with a brief outline of its antecedents, a directory of information or documentation centers, a list of serials published by these centers, and a bibliography of recent papers dealing with the development of information systems in that country. this main section of the book is preceded by a brief review of the common characteristics of the six national systems and an outline of steps being taken to achieve international cooperation for the exchange of information in specific subjects. of particular interest is the description of the international center of scientific and technical information established in moscow in 1969, and which is now linked to five of these national systems. no attempt is made to describe the techniques being used to store, retrieve, and disseminate information. the authors point out that the six countries being examined "have experimented intensely with organizational variants of national science information systems." unfortunately, they do not attempt to indicate which of these organizational structures was most effective in bringing about the desired results. 
undoubtedly, this would have been an impossible task and probably not worth the effort, since a successful type of organization in a socialist country would not necessarily be effective in a democracy. the book will be of interest to political scientists and to those seeking the most effective ways of coordinating the information processing efforts of all types of government bodies. it will be only of academic interest to the information specialist concerned primarily with information processing techniques. jack e. brown national science library of canada ottawa information retrieval: on-line, by f. w. lancaster and e. g. fayen. los angeles: melville publishing co., 1973. 597p. lc: 73-9697. isbn: 0-471-51235-4. have you been reading the asis annual review of information science and technology year after year and wishing for a compendium of the best information and examples of the latest systems, user manuals, cost data, and other facts so that you would not have to go searching in a library for the interesting reports, journal articles, and books? well, if you have (and who hasn't), your prayers have been answered if you are interested in online bibliographic retrieval systems. the authors of the handy reference book have collected and reprinted, among other things, the complete dialog terminal users reference manual, the supars user manual, the user instructions for aim-twx, obar, and the caruso tutorial program. each of these systems, and several others (arranged alphabetically from aim-twx [medline] to to xi con [toxline]), is described and illustrated. features and functions of on-line systems, such as vocabulary control and indexing, cataloging, instruction of users, equipment, and file design, are all covered in a straightforward manner, simply enough for the uninformed and carefully enough so that a system operator could compare his system's features and functions with the data provided. richly illustrated with tables, charts, graphs, and figures, up-to-date bibliographies (only serious omission noticed was the afips conference proceedings edited by d. walker), and subject and author indexes, this volume will stand as another landmark in the state-of-the-art review series which the wiley-becker & hayes information science series has come to represent. emphasis has been placed on the design, evaluation, and use of on-line retrieval systems rather than the hardware or programming aspects. several of the chapters have a broader base of interest than on-line systems, covering as they do performance criteria of retrieval systems, evaluating effectiveness, human factors, and cost-performance-benefits factors. easy to use and as up to date and balanced a book as any in a rapidly changing field can be, lancaster and fayen have given students of information studies and planners and managers of information services a very valuable reference aid. pauline a. atherton school of information studies syracuse university national library of australia. australian marc specification. canberra: national library of australia, 1973. 83p. $2.50. isbn: 0-642-99014-x for those readers who are familiar with book reviews 327 the library of congress marc format, the australian marc specification will be, for the most part, self-explanatory. the intent of the document is to describe the basic format structure and to list the various content designators that are used in the format. no effort was made to include any background information or explanation of data elements. 
because of this, the reviewer found it necessary to refer to other documents, e.g., precis: a rotated subject index system, by derek austin and peter butcher, in order to complete a comparative analysis of the australian format with other similar formats. perhaps the value of reviewing a descriptive document of this type lies in discovering how the format it describes compares to other existing formats developed for the same purpose.

the international organization for standardization published a format for bibliographic information interchange on magnetic tape in 1973, international standard iso 2709. the australian format structure is the same throughout as the international standard. the only variance is in character positions 20 and 21 of the leader, which the australian format left undefined. a comparison of content designators cannot be made with the international standard because it specifies only the position and length of the identifiers in the structure of the format, but not the actual identifiers (except for the three-digit tags 001-999 that identify the data fields).

the best comparison of content designators can be made with the lc marc format, since the australian format uses many of the same tags, indicators, and subfield codes for the same purposes. the australian format has assigned to the same character positions the same fixed-length data elements as the lc format, except for position 38, which is the periodical code in the australian format and the modified record code in the lc format. in the fixed-length character positions for form of contents, publisher (government publication in lc marc), and literary text (fiction in lc marc), the australian format assigned different codes than lc.

in general, the australian format uses the same three-digit tags as lc to identify the primary access fields in their records, e.g., 100, 110, 111 for main entries; 400, 410, 411, 440, 490 for series notes; 600, 610, 611, 650, 651 for subject headings; and 700, 710, 711 for added entries. for the remaining bibliographic fields there are some variations in tagging between the two formats. the australian marc has chosen a different method of identifying uniform titles, and has identified five more note fields in the 5xx series of tags than has lc. the australians have also added some manufactured fields to their record. these fields do not contain actual data from the bibliographic record, but rather consist of data created by program for control and manipulation purposes, or from lists such as the precis subject index. the australian format has also included, as part of its record, a series of cross-reference fields identified by 9xx tags. lc has reserved the 9xx block of tags for local use.

the use of indicators differs in most instances between the two formats. both allow for two indicator positions in each field, as specified by the international standard format structure. however, the information conveyed by the indicators differs, except where the first indicator conveys form of name for personal and corporate name headings. within each block of tags, lc has made an effort to remain consistent in the use of indicators; e.g., in the 6xx block for subject headings, the first indicator specifies form of name where a form of name can be discerned. where no form of name is discernible, such as in a topical subject heading (tag 650), a null indicator or blank is used, which means no intelligence is carried in this position. in the australian format the indicators in the 6xx block of tags have three different patterns.
inconsistency of this kind does not tend to destroy compatibility with other coding systems using the same format structure, as long as sufficient explanation and examples are given from which conversion tables may be developed by the institutions with whom one wants to exchange, or interchange, bibliographic data.

an even greater degree of difference exists between the two formats in the subfield codes used to identify data elements. the australian marc has identified some data elements that lc has not; e.g., in personal name main entries, the australian record identifies first names with subfield code "h," whereas lc does not identify parts of a personal name, only the form of the name, i.e., forename form, single surname, family name, etc. in most of the fields the two formats have defined some of the same data elements, but each uses a different subfield code to represent the element. in the australian document, under each field heading, the subfield codes are listed alphabetically with a data element following each code. this arrangement causes the data elements to fall out of their normal order of occurrence in the field, i.e., name, numeration, titles, dates, relator, etc. for example:

personal name main entry (tag 100)

subfield code   australian marc entry element             lc marc entry element
a               (name)                                    (name)
b               relator                                   numeration
c               dates                                     titles (honorary)
d               second or subsequent additions to name    dates
e               numeration                                relator
f               additions to name other than date         date (of a work)

the example demonstrates the need for precise definition and documentation of data elements for the purpose of conversion or translation when interchanging data with other institutions. the australian format has included the capability of identifying analytical entries by using an additional digit (called the level digit) placed between the tag and the indicators to identify the analytical entries. a subrecord directory (tag 002) is present in each record containing data for analytical entries. the australian document includes appendixes for the country of publication codes, language codes, and geographical area codes that were developed by the library of congress. their only deviation from lc marc usage is in the country of publication codes, where the australians have added entities and codes for australian first-level administrative subdivisions.

patricia e. parker, marc development office, library of congress

engine of innovation: building the high performance catalog

will owen and sarah c. michalak

information technology and libraries | june 2015

abstract

numerous studies have indicated that sophisticated web-based search engines have eclipsed the primary importance of the library catalog as the premier tool for researchers in higher education. we submit that the catalog remains central to the research process.
through  a  series  of  strategic   enhancements,  the  university  of  north  carolina  at  chapel  hill,  in  partnership  with  the  other   members  of  the  triangle  research  libraries  network  (trln),  has  made  the  catalog  a  carrier  of   services  in  addition  to  bibliographic  data,  facilitating  not  simply  discovery,  but  also  delivery  of  the   information  researchers  seek.   introduction in  2005,  an  oclc  research  report  documented  what  many  librarians  already  knew—that  the   library  webpage  and  catalog  were  no  longer  the  first  choice  to  begin  a  search  for  information.  the   report  states,   the  survey  findings  indicate  that  84  percent  of  information  searches  begin  with  a  search   engine.  library  web  sites  were  selected  by  just  1  percent  of  respondents  as  the  source  used  to   begin  an  information  search.  very  little  variability  in  preference  exists  across  geographic   regions  or  u.s.  age  groups.  two  percent  of  college  students  start  their  search  at  a  library  web   site.1   in  2006  a  report  by  karen  calhoun,  commissioned  by  the  library  of  congress,  asserted,  “today  a   large  and  growing  number  of  students  and  scholars  routinely  bypass  library  catalogs  in  favor  of   other  discovery  tools.  .  .  .  the  catalog  is  in  decline,  its  processes  and  structures  are  unsustainable,   and  change  needs  to  be  swift.”2     ithaka  s+r  has  conducted  national  faculty  surveys  triennially  since  2000.  summarizing  the  2000– 2006  surveys,  roger  schonfeld  and  kevin  guthrie  stated,  “when  the  findings  from  2006  are   compared  with  those  from  2000  and  2003,  it  becomes  evident  that  faculty  perceive  themselves  as   becoming  decreasingly  dependent  on  the  library  for  their  research  and  teaching  needs.”3   furthermore,  it  was  clear  that  the  “library  as  gateway  to  scholarly  information”  was  viewed  as   decreasingly  important.  the  2009  survey  continued  the  trend  with  even  fewer  faculty  seeing  the       will  owen  (owen@email.unc.edu)  is  associate  university  librarian  for  technical  services  and   systems  and  sarah  c.  michalak  (smichala@email.unc.edu)  is  university  librarian  and  associate   provost  for  university  libraries,  university  of  north  carolina  at  chapel  hill.     engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   6   gateway  function  as  critical.  these  results  occurred  in  a  time  when  electronic  resources  were   becoming  increasingly  important  and  large  google-­‐like  search  engines  were  rapidly  gaining  in   use.4     these  comments  extend  into  the  twenty-­‐first  century  more  than  thirty  years  of  concern  about  the   utility  of  the  library  catalog.  through  the  first  half  of  this  decade  new  observations  emerged  about   patron  perceptions  of  catalog  usability.  even  after  migration  from  the  card  to  the  online  catalog   was  complete,  the  new  tool  represented  primarily  the  traditionally  cataloged  holdings  of  a   particular  library.  providing  direct  access  to  resources  was  not  part  of  the  catalog’s  mission.   manuscripts,  finding  aids,  historical  photography,  and  other  special  collections  were  not  included   in  the  traditional  catalog.  
journal  articles  could  only  be  discovered  through  abstracting  and   indexing  services.  as  these  discovery  tools  began  their  migration  to  electronic  formats,  the   centrality  of  the  library’s  bibliographic  database  was  challenged.   the  development  of  google  and  other  sophisticated  web-­‐based  search  engines  further  eclipsed  the   library’s  bibliographic  database  as  the  first  and  most  important  research  tool.  yet  we  submit  that   the  catalog  database  remains  a  necessary  fixture,  continuing  to  provide  access  to  each  library’s   particular  holdings.  while  the  catalog  may  never  regain  its  pride  of  place  as  the  starting  point  for   all  researchers,  it  still  remains  an  indispensable  tool  for  library  users,  even  if  it  may  be  used  only   at  a  later  stage  in  the  research  process.   at  the  university  of  north  carolina  at  chapel  hill,  we  have  continued  to  invest  in  enhancing  the   utility  of  the  catalog  as  a  valued  tool  for  research.  librarians  initially  reasoned  that  researchers   still  want  to  find  out  what  is  available  to  them  in  their  own  campus  library.  gradually  they  began   to  see  completely  new  possibilities.  to  that  end,  we  have  committed  to  a  program  that  enhances   discovery  and  delivery  through  the  catalog.  while  most  libraries  have  built  a  wide  range  of   discovery  tools  into  their  home  pages—adding  links  to  databases  of  electronic  resources,  article   databases,  and  google  scholar—we  have  continued  to  enhance  both  the  content  to  be  found  in  the   primary  local  bibliographic  database  and  the  services  available  to  students  and  researchers  via  the   interface  to  the  catalog.   in  our  local  consortium,  the  triangle  research  libraries  network  (trln),  librarians  have   deployed  the  search  and  faceting  services  of  endeca  to  enrich  the  discovery  interfaces.  we  have   gone  beyond  augmenting  the  catalog  through  the  addition  of  marcive  records  for  government   documents,  by  including  encoded  archival  description  (ead)  finding  aids  and  selected  (and  ever-­‐ expanding)  digital  collections  that  are  not  easily  discoverable  through  major  search  engines.  we   have  similarly  enhanced  services  related  to  the  discovery  and  delivery  of  items  listed  in  the   bibliographic  database,  including  not  only  common  features  like  the  ability  to  export  citations  in  a   variety  of  formats  but  also  more  extensive  services  such  as  document  delivery,  an  auto-­‐suggest   feature  that  maximizes  use  of  library  of  congress  subject  headings  (lcsh),  and  the  ability  to   submit  cataloged  items  to  be  processed  for  reserve  reading.     information  technology  and  libraries  |  june  2015     7   both  students  and  faculty  have  embraced  e-­‐books,  and  in  adding  more  than  a  million  such  titles  to   the  unc-­‐chapel  hill  catalog  we  continue  to  blend  discovery  and  delivery,  but  now  on  a  very  large   scale.  
coupling  catalog  records  with  a  metadata  service  that  provides  book  jackets,  tables  of   contents,  and  content  summaries,  cataloging  geographic  information  systems  (gis)  data  sets,  and   adding  live  links  to  the  finding  aids  for  digitized  archival  and  manuscript  collections  have  further   enhanced  the  blended  discovery/delivery  capacity  of  the  catalog.   we  have  also  leveraged  the  advantages  of  operating  in  a  consortial  environment  by  extending  the   discovery  and  delivery  services  among  the  members  of  trln  to  provide  increased  scope  of   discovery  and  shared  processing  of  some  classes  of  bibliographic  records.  trln  comprises  four   institutions  and  content  from  all  member  libraries  is  discoverable  in  a  combined  catalog   (http://search.trln.org).  printed  material  requested  through  this  combined  catalog  is  often   delivered  between  trln  libraries  within  twenty-­‐four  hours.   at  unc,  our  search  logs  show  that  use  of  the  catalog  increases  as  we  add  new  capacity  and  content.   these  statistics  demonstrate  the  catalog’s  continuing  relevance  as  a  research  tool  that  adds  value   above  and  beyond  conventional  search  engines  and  general  web-­‐based  information  resources.  in   this  article  we  will  describe  the  most  important  enhancements  to  our  catalog,  include  data  from   search  logs  to  demonstrate  usage  changes  resulting  from  these  enhancements,  and  comment  on   potential  future  developments.   literature  review   an  extensive  literature  discusses  the  past  and  future  of  online  catalogs,  and  many  of  these   materials  themselves  include  detailed  literature  reviews.  in  fact,  there  are  so  many  studies,   reviews,  and  editorials,  it  becomes  clear  that  although  the  online  catalog  may  be  in  decline,  it   remains  a  subject  of  lively  interest  to  librarians.  two  important  threads  in  this  literature  report  on   user-­‐query  studies  and  on  other  usability  testing.  though  there  are  many  earlier  studies,  two   relatively  recent  articles  analyze  search  behavior  and  provide  selective  but  helpful  literature   surveys.5     there  are  many  efforts  to  define  directions  for  the  catalog  that  would  make  it  more  web-­‐like,  more   google-­‐like,  and  thus  more  often  chosen  for  search,  discovery,  and  access  by  library  patrons.   these  articles  aim  to  define  the  characteristics  of  the  ideal  catalog.  charles  hildreth  provides  a   benchmark  for  these  efforts  by  dividing  the  history  of  the  online  catalog  into  three  generations.   from  his  projections  of  a  third  generation  grew  the  “next  generation  catalog”—really  the  current   ideal.  he  called  for  improvement  of  the  second-­‐generation  catalog  through  an  enhanced  user-­‐ system  dialog,  automatic  correction  of  search-­‐term  spelling  and  format  errors,  automatic  search   aids,  enriched  subject  metadata  in  the  catalog  record  to  improve  search  results,  and  the   integration  of  periodical  indexes  in  the  catalog.  
as  new  technologies  have  made  it  possible  to   achieve  these  goals  in  new  ways,  much  of  what  hildreth  envisioned  has  been  accomplished.6       engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   8   second-­‐generation  catalogs,  anchored  firmly  in  integrated  library  systems,  operated  throughout   most  of  the  1980s  and  the  1990s  without  significant  improvement.  by  the  mid-­‐2000s  the  search   for  the  “next-­‐gen”  catalog  was  in  full  swing,  and  there  are  numerous  articles  that  articulate  the   components  of  an  improved  model.  the  catalog  crossed  a  generational  line  for  good  when  the   north  carolina  state  university  libraries  (ncsu)  launched  a  new  catalog  search  engine  and   interface  with  endeca  in  january  2006.  three  ncsu  authors  published  a  thorough  article   describing  key  catalog  improvements.  their  endeca-­‐enhanced  catalog  fulfilled  the  most  important   criteria  for  a  “next-­‐gen”  catalog:  improved  search  and  retrieval  through  “relevance-­‐ranked  results,   new  browse  capabilities,  and  improved  subject  access.”7     librarians  gradually  concluded  that  the  catalog  need  not  be  written  off  but  would  benefit  from   being  enhanced  and  aligned  with  search  engine  capabilities  and  other  web-­‐like  characteristics.   catalogs  should  contain  more  information  about  titles,  such  as  book  jackets  or  reviews,  than   conventional  bibliographic  records  offered.  catalog  search  should  be  understandable  and  easy  to   use.  additional  relevant  works  should  be  presented  to  the  user  along  with  result  sets.  the   experience  should  be  interactive  and  participatory  and  provide  access  to  a  broad  array  of   resources  such  as  data  and  other  nonbook  content.8     karen  markey,  one  of  the  most  prolific  online  catalog  authors  and  analysts,  writes,  “now  that  the   era  of  mass  digitization  has  begun,  we  have  a  second  chance  at  redesigning  the  online  library   catalog,  getting  it  right,  coaxing  back  old  users  and  attracting  new  ones.”9   marshall  breeding  predicted  characteristics  of  the  next-­‐generation  catalog.  his  list  includes   expanded  scope  of  search,  more  modern  interface  techniques,  such  as  a  single  point  of  entry,   search  result  ranking,  faceted  navigation,  and  “did  you  mean  .  .  .  ?”  capacity,  as  well  as  an  expanded   search  universe  that  includes  the  full  text  of  journal  articles  and  an  array  of  digitized  resources.10     a  concept  that  is  less  represented  in  the  literature  is  that  of  envisioning  the  catalog  as  a   framework  for  service,  although  the  idea  of  the  catalog  designed  to  ensure  customer  self-­‐service   has  been  raised.11  michael  j.  
michael j. bennett has studied the effect of catalog enhancements on circulation and interlibrary loan.12 service and the online catalog have a new meaning in morgan's idea of "services against texts," supporting "use and understand" in addition to the traditional "find and get."13 lorcan dempsey commented on the catalog as an identifiable service and predicted new formulations for library services based on the network-level orientation of search and discovery.14 but the idea that the catalog has moved from a fixed, inward-focused tool to an engine for services—a locus to be invested with everything from unmediated circulation renewal and ordering delivery to the "did you mean" search aid—has yet to be addressed comprehensively in the literature.

enhancing the traditional catalog

one of the factors that complicates discussions of the continued relevance of the library catalog to research is the very imprecision of the term in common parlance, especially when the chief point of comparison to today's ils-driven opacs is google or, more specifically, google scholar. from first-year writing assignments through advanced faculty research, many of the resources that our patrons seek are published in the periodical literature, and the library catalog, the one descended from the cabinets full of cards that occupied prominent real estate in our buildings, has never been an effective tool for identifying relevant periodical literature.

this situation has changed in recent years as products like summon, from proquest, and ebsco discovery service have introduced platforms that can accommodate electronic article indexing as well as marc records for the types of materials—books, audio, and video—that have long been discovered through the opac. in the following discussion of "catalog" developments and enhancements, we focus initially not on these integrated solutions, but on the catalog as more traditionally defined. however, as electronic resources become an ever-greater percentage of library collections, we shall see a convergence of these two streams that will portend significant changes in the nature and utility of the catalog.

much work has been done in the first decade of the twenty-first century to enhance discovery services and, as noted above, north carolina state university's introduction of their endeca-based search engine and interface was a significant game-changer. in the years following the introduction of the endeca interface at ncsu, the triangle research libraries network invested in further development of features that enhanced the utility of the endeca software itself. programmed enhancements to the interface provided additional services and functionality. in some cases, these enhancements were aimed at improving discovery. in others, they allowed researchers to make new and better use of the data that they found or made it easier to obtain the documents that they discovered.
faceting and limiting retrieval results

perhaps the most immediately striking innovation in the endeca interface was the introduction of facets. the use of faceted browsing allowed users to parse the bibliographic record in new ways (and more ways) than had preceding catalogs. there were several fundamentally important ways faceting enhanced search and discovery.

the first of these was the formal recognition that keyword searching was the user's default means of interacting with the catalog's data. ncsu's initial implementation allowed for searches using several indexes, including authors, titles, and subject headings, and this functionality remains in place to the present day. however, by default, searches returned records containing the search terms "anywhere" in the record. this behavior was more in line with user expectations in an information ecosystem dominated by google's single search box.

the second was the significantly different manner in which multiple limits could be placed on an initial result set from such a keyword search. the concept of limiting was not a new one: certain facets worked in a manner consistent with traditional limits in prior search interfaces, allowing users to screen results by language, or date of publication, for example.

it was the ease and transparency with which multiple limits could be applied through faceting that was revolutionary. a user who entered the keyword "java" in the search box was quickly able to discriminate between the programming language and the indonesian island. this could be achieved in multiple ways: by choosing between subjects (for example, "application software" vs. "history") or clearly labeled lc classification categories ("q – science" vs. "d – history"). these limits, or facets, could be toggled on and off, independently and iteratively.

the third and highly significant difference resulted from how library of congress subject headings (lcsh) were parsed and indexed in the system. by making lcsh subdivisions independent elements of the subject-heading index in a keyword search, the endeca implementation unlocked a trove of metadata that had been painstakingly curated by catalogers for nearly a century. the user no longer needed to be familiar with the formal structure of subject headings; if the keywords appeared anywhere in the string, the subdivisions in which they were contained could be surfaced and used as facets to sharpen the focus of the search. this was revolutionary.

utilizing the power of new indexing structures

the liberation of bibliographic data from the structure of marc record indexes presaged yet another far-reaching alteration in the content of library catalogs. to this day, most commercial integrated library systems depend on marc as the fundamental record structure. in ncsu's implementation, the multiple indexes built from that metadata created a new framework for information.
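the iterative narrowing described above can be illustrated in a few lines of code. the sketch below is not the endeca api; the record structure, field names, and sample data are invented purely to show how a keyword result set is progressively limited by toggling facet values such as subject terms or lc classes.

```python
# a minimal sketch of faceted narrowing over a keyword result set.
# the record shape and field names are illustrative, not the endeca data model.
from collections import Counter

records = [
    {"title": "learning java", "subjects": ["application software", "java (computer program language)"], "lc_class": "q - science"},
    {"title": "a history of java", "subjects": ["history", "java (indonesia)"], "lc_class": "d - history"},
    {"title": "java cookbook", "subjects": ["application software"], "lc_class": "q - science"},
]

def keyword_search(records, term):
    """default 'anywhere in the record' keyword match."""
    term = term.lower()
    return [r for r in records if term in str(r).lower()]

def facet_counts(results, field):
    """count candidate facet values (subjects, lc classes, and so on)."""
    values = []
    for r in results:
        v = r[field]
        values.extend(v if isinstance(v, list) else [v])
    return Counter(values)

def apply_facet(results, field, value):
    """toggle a limit on: keep only records carrying the chosen facet value."""
    def has(r):
        v = r[field]
        return value in v if isinstance(v, list) else v == value
    return [r for r in results if has(r)]

hits = keyword_search(records, "java")
print(facet_counts(hits, "lc_class"))                       # e.g. {'q - science': 2, 'd - history': 1}
programming = apply_facet(hits, "subjects", "application software")
print([r["title"] for r in programming])                    # narrows to the programming-language titles
```

it was this recasting of the record into multiple purpose-built indexes, rather than any one feature, that set the stage for the developments described next.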
this change made possible the integration of non-marc data with marc data, allowing, for example, dublin core (dc) records to be incorporated into the universe of metadata to be indexed, searched, and retrieved. there was no need to crosswalk dc to marc: it sufficed to simply assign the dc elements to the appropriate endeca indexes. with this capacity to integrate rich collections of locally described digital resources, the scope of the traditional catalog was enlarged.

expanding scopes and banishing silos

at unc-chapel hill, we began this process of augmentation with selected collections of digital objects. these collections were housed in a contentdm repository we had been building for several years at the time of the library's introduction of the endeca interface. image files, which had not been accessible through traditional catalogs, were among the first to be added. for example, we had been given a large collection of illustrated postcards featuring scenes of north carolina cities and towns. these postcards had been digitized and metadata describing the image and the town had been recorded. other collections of digitized historical photographs were also selected for inclusion in the catalog. these historical resources proved to be a boon to faculty teaching local history courses and, interestingly, to students working on digital projects for their classes. as class assignments came to include activities like creating maps enhanced by the addition of digital photographs or digitized newspaper clippings, the easy discovery of these formerly hidden collections enriched students' learning experience.

other special collection materials had been represented in the traditional catalog in somewhat limited fashion. the most common examples were manuscripts collections. the processing of these collections had always resulted in the creation of finding aids, produced since the 1930s using index cards and typewriters. during the last years of the twentieth century, archivists began migrating many of these finding aids to the web using the ead format, presenting them as simple html pages. these finding aids were accessible through the catalog by means of generalized marc records that described the collections at a superficial level. however, once we attained the ability to integrate the contents of the finding aids themselves into the indexes underlying the new interface, this much richer trove of keyword-searchable data vastly increased the discoverability and use of these collections.

during this period, the library also undertook systematic digitization of many of these manuscript collections. whenever staff received a request for duplication of an item from a manuscript collection (formerly photocopies, but by then primarily digital copies), we digitized the entire folder in which that item was housed. we developed standards for naming these digital surrogates that associated the individual image with the finding aid.
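that naming standard does the real work of association. a minimal sketch of the idea follows; the filename pattern is hypothetical rather than unc's actual convention, but it shows how a batch of scan filenames can be grouped back to the collection and folder entries of a finding aid.

```python
# a minimal sketch of grouping digital surrogates by finding-aid folder.
# the filename convention (collection_folder_image.jp2) is hypothetical.
import re
from collections import defaultdict

SURROGATE = re.compile(r"^(?P<collection>\d{5})_(?P<folder>folder_\d{4})_(?P<image>\d{3})\.jp2$")

def group_by_folder(filenames):
    """map (collection, folder) identifiers to their digitized images."""
    folders = defaultdict(list)
    for name in filenames:
        m = SURROGATE.match(name)
        if m:
            folders[(m.group("collection"), m.group("folder"))].append(name)
    return folders

scans = ["04045_folder_0012_001.jp2", "04045_folder_0012_002.jp2", "04045_folder_0031_001.jp2"]
print(group_by_folder(scans))
```

a small script, or the short javascript include described next, can then surface these groupings as links at the matching folder entries of the online finding aid.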
it then became a simple matter, involving the addition of a short javascript string to the head of the online finding aid, to dynamically link the digital objects to the finding aid itself.

other library collections likewise benefited from the new indexing structures. some uncataloged materials traditionally had minimal bibliographic control provided by inventories that were built at the time of accession in desktop database applications; funding constraints meant that full cataloging of these materials (often rare books) remained elusive. the ability to take the data that we had and blend it into the catalog enhanced the discovery of these collections as well.

we also have an extensive collection of video resources, including commercial and educational films. the conventions for cataloging these materials, held over from the days of catalog cards, often did not match user expectations for search and discovery. there were limits to the number of added entries that catalogers would make for directors, actors, and others associated with a film. many records lacked the kind of genre descriptors that undergraduates were likely to use when seeking a film for an evening's entertainment. to compensate for these limitations, staff who managed the collection had again developed local database applications that allowed for the inclusion of more extensive metadata and for categories such as country of origin or folksonomic genres that patrons frequently indicated were desirable access points. once again, the new indexing structures allowed us to incorporate this rich set of metadata into what looked like the traditional catalog.

each of the instances described above represents what we commonly call the destruction of silos. information about library collections that had been scattered in numerous locations—and not all of them online—was integrated into a single point of discovery. it was our hope and intention that such integration would drive more users to the catalog as a discovery tool for the library's diverse collections and not simply for the traditional monographic and serials collections that had been served by marc cataloging. usage logs indicate that the average number of searches conducted in the catalog rose from approximately 13,000 per day in 2009 to around 19,000 per day in 2013. it is impossible to tell with any certainty whether there was heavier use of the catalog simply because increasingly varied resources came to be represented in it, but we firmly believe that the experience of users who search for material in our catalog has become much richer as a result of these changes to its structure and content.

cooperation encouraging creativity

another way we were able to harness the power of endeca's indexing scheme involved the shared loading of bibliographic records for electronic resources to which multiple trln libraries provided access.
trln’s  endeca  indexes  are  built  from  the  records  of  each  member.  each   institution  has  a  “pipeline”  that  feeds  metadata  into  the  combined  trln  index.  duplicate  records   are  rolled  up  into  a  single  display  via  oclc  control  numbers  whenever  possible,  and  the   bibliographic  record  is  annotated  with  holdings  statements  for  the  appropriate  libraries.   we  quickly  realized  that  where  any  of  the  four  institutions  shared  electronic  access  to  materials,  it   was  redundant  to  load  copies  of  each  record  into  the  local  databases  of  each  institution.15  instead,   one  institution  could  take  responsibility  for  a  set  of  records  representing  shared  resources.   examples  of  such  material  include  electronic  government  documents  with  records  provided  by   the  marcive  documents  without  shelves  program,  large  sets  like  early  english  books  online,  and   pbs  videos  streamed  by  the  statewide  services  of  nc  live.   in  practice,  one  institution  takes  responsibility  for  loading,  editing,  and  performing  authority   control  on  a  given  set  of  records.  (for  example,  unc,  as  the  regional  depository,  manages  the   documents  without  shelves  record  set.)  these  records  are  loaded  with  a  special  flag  indicating   that  they  are  part  of  the  shared  records  program.  this  flag  generates  a  holdings  statement  that   reflects  the  availability  of  the  electronic  item  at  each  institution.  the  individual  holdings   statements  contain  the  institution-­‐specific  proxy  server  information  to  enable  and  expedite  access.   in  addition  to  this  distributed  model  of  record  loading  and  maintenance,  we  were  able  to  leverage   oai-­‐pmh  feeds  to  add  selected  resources  to  the  searchtrln  database.  all  four  institutions  have   access  to  the  data  made  available  by  the  inter-­‐university  consortium  for  political  and  social   research  (icpsr).  as  we  do  not  license  these  resources  or  maintain  them  locally,  and  as  records   provided  by  icpsr  can  change  over  time,  we  developed  a  mechanism  to  harvest  the  metadata  and   push  it  through  a  pipeline  directly  into  the  searchtrln  indexes.  none  of  the  member  libraries’   local  databases  house  this  metadata,  but  the  records  are  made  available  to  all  nonetheless.   while  we  were  engaged  in  implementing  these  enhancements,  additional  sources  of  potential   enrichment  of  the  catalog  were  appearing.  in  particular,  vendors  began  providing  indexing   services  for  the  vast  quantities  of  electronic  resources  contained  in  aggregator  databases.     information  technology  and  libraries  |  june  2015     13   additionally,  they  made  it  possible  for  patrons  to  move  seamlessly  from  the  catalog  to  those   electronic  resources  via  openurl  technologies.  indeed,  services  like  proquest’s  summon  or   ebsco’s  discovery  service  might  be  taken  as  another  step  toward  challenging  the  catalog’s   primacy  as  a  discovery  tool  as  they  offered  the  prospect  of  making  local  catalog  records  just  a   fraction  of  a  much  larger  universe  of  bibliographic  information  available  in  a  single,  keyword-­‐ searchable  database.   
it remains to be seen, therefore, whether continuing to load many kinds of marc records into the local database is an effective aid to discovery even with the multiple delimiting capabilities that endeca provides. what is certain, however, is that our approach to indexing resources of any kind has undergone a radical transformation over the past few years—a transformation that goes beyond the introduction of any of the particular changes we have discussed so far.

promoting a culture of innovation

one important way endeca has changed our libraries is that a culture of constant innovation has become the norm, rather than the exception, for our catalog interface and content. once we were no longer subject to the customary cycle of submitting enhancement requests to an integrated library system vendor, hoping that fellow customers shared similar desires, and waiting for a response and, if we were lucky, implementation, we were able to take control of our aspirations. we had the future of the interface to our collections in our own hands, and within a few years of the introduction of endeca by ncsu, we were routinely adding new features to enhance its functionality.

one of the first of these enhancements was the addition of a "type-ahead" or "auto-suggest" option.16 inspired by google's autocomplete feature, this service suggests phrases that might match the keywords a patron is typing into the search box. ben pennell, one of the chief programmers working on endeca enhancement at unc-chapel hill, built a solr index from the ils author, title, and subject indexes and from a log of recent searches. as a patron typed, a drop-down box appeared below the search box. the drop-down contained matching terms extracted from the solr index in a matter of seconds or less. for example, typing the letters "bein" into the box produced a list including "being john malkovich," "nature—effects of human beings on," "human beings," and "bein, alex, 1903–1988." in the drop-down display, the letters matching what the patron has typed are highlighted in a different color. in the case of terms drawn directly from an index, the index name appears, also highlighted, on the right side of the box. for example, the second and third terms in the examples above are tagged with the term "subject." the last example is an "author."

in allowing for the textual mining of lcsh, the initial implementation of faceting in the endeca catalog surfaced those headings for the patron by uniting keyword and controlled vocabularies in an unprecedented manner. there was a remarkable and almost immediate increase in the number of authority index searches entered into the system. at the end of the fall semester prior to the implementation of the auto-suggest feature, an average of around 1,400 subject searches were done in a week.
approximately one month into the spring semester, that average had risen to around 4,000 subject searches per week. use of the author and title indexes also rose, although not quite as dramatically. in the perpetual tug-of-war between precision and recall, the balance had decidedly shifted.

another service that we provide, which is especially popular with students, is the ability to produce citations formatted in one of several commonly used bibliographic styles, including apa, mla, and chicago (both author-date and note-and-bibliography formats). this functionality, first introduced by ncsu and then jointly developed with unc over the years that followed, works in two ways. if a patron finds a monographic title in the catalog, simply clicking on a link labeled "cite" produces a properly formatted citation that can then be copied and pasted into a document. the underlying technology also powers a "citation builder" function by which a patron can enter basic bibliographic information for a book, a chapter or essay, a newspaper or journal article, or a website into a form, click the "submit" button, and receive a citation in the desired format.

an additional example of innovation that falls somewhat outside the scope of the changes discussed above was the development of a system that allowed for the mapping of simplified chinese characters to their traditional counterparts. searching in non-roman character sets has always offered a host of challenges to library catalog users. the trln libraries have embraced the potential of endeca to reduce some of these challenges, particularly for chinese, through the development of better keyword searching strategies and the automatic translation of simplified to traditional characters.

since we had complete control over the endeca interface, it proved relatively simple to integrate document delivery services directly into the functionality of the catalog. rather than simply emailing a bibliographic citation or a call number to themselves, patrons could request the delivery of library materials directly to their campus addresses. once we had implemented this feature, we quickly moved to amplify its power. many catalogs offer a "shopping cart" service that allows patrons to compile lists of titles. one variation on this concept that we believe is unique to our library is the ability for a professor to compile such a list of materials held by the libraries on campus and submit that list directly to the reserve reading department, where the books are pulled from the shelves and placed on course-reserve lists without the professor needing to visit any particular library branch. these new features, in combination with other service enhancements such as the delivery of physical documents to campus addresses from our on-campus libraries and our remote storage facility, have increased the usefulness of the catalog as well as our users' satisfaction with the library.
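the simplified-to-traditional mapping just mentioned is, at its core, a character-level substitution applied to the query before it is searched. the sketch below illustrates the idea with a three-character table; a production mapping is far larger, must handle characters whose traditional form depends on context, and nothing here reflects the trln implementation itself.

```python
# a minimal sketch of expanding a simplified-chinese query with its traditional form.
# the mapping table is only illustrative; a real one is much larger and sometimes one-to-many.
SIMPLIFIED_TO_TRADITIONAL = {
    "图": "圖",
    "书": "書",
    "馆": "館",
}

def to_traditional(query):
    return "".join(SIMPLIFIED_TO_TRADITIONAL.get(ch, ch) for ch in query)

def expanded_queries(query):
    """search both forms so either cataloging practice is matched."""
    traditional = to_traditional(query)
    return [query] if traditional == query else [query, traditional]

print(expanded_queries("图书馆"))  # ['图书馆', '圖書館']
```

small affordances of this kind, accumulated release after release, lie behind the usage and satisfaction gains described above.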
we believe that these changes have contributed to the ongoing vitality of the catalog and to its continued importance to our community.

in december 2012, the libraries adopted proquest's summon to provide enhanced access to article literature and electronic resources more generally. at the start of the following fall semester, the libraries instituted another major change to our discovery and delivery services through a combined single-search box on our home page. this has fundamentally altered how patrons interact with our catalog and its associated resources. first, because we are now searching both the catalog and the summon index, the type-ahead feature that we had deployed to suggest index terms from our local database to users as they entered search strings no longer functions as an authority index search. we have returned to querying both databases through a simple keyword search.

second, in our implementation of the single search interface we have chosen to present the results from our local database and the retrievals from summon in two side-by-side columns. this has the advantage of bringing article literature and other resources indexed by summon directly to the patron's attention. as a result, more patrons interact directly with articles, as well as with books in major digital repositories like google books and hathitrust. this change has undoubtedly led patrons to make less in-depth use of the local catalog database, although it preserves much of the added functionality in terms of discovering our own digital collections as well as those resources whose cataloging we share with our trln partners. we believe that the ease of access to the resources indexed by summon complements the enhancements we have made to our local catalog.

conclusion and further directions

one might argue that the integration of electronic resources into the "catalog" actually shifts the paradigm more significantly than any previous enhancements. as the literature review indicates, much of the conversation about enriching library catalogs has centered on improving the means by which search and discovery are conducted. the reasonably direct linking to full text that is now possible has once again radically shifted that conversation, for the catalog has come to be seen not simply as a discovery platform based on metadata but as an integrated system for delivering the essential information resources for which users are searching.

once the catalog is understood to be a locus for delivering content in addition to discovering it, the local information ecosystem can be fundamentally altered. at unc-chapel hill we have engaged in a process whereby the catalog, central to the library's web presence (given the prominence of the single search box on the home page), has become a hub from which many other services are delivered.
the most obvious of these, perhaps, is a system for the delivery of physical documents that is analogous to the ability to retrieve the full text of electronic documents. if an information source is discovered that exists in the library only in physical form, enhancements to the display of the catalog record facilitate the receipt by the user of the print book or a scanned copy of an article from a bound journal in the stacks.

in 2013, ithaka s+r conducted a local unc faculty survey. the survey posed three questions related to the catalog. in response to the question, "typically when you are conducting academic research, which of these four starting points do you use to begin locating information for your research?," 41 percent chose "a specific electronic research resource/computer database." nearly one-third (30 percent) chose "your online library catalog."17

when asked, "when you try to locate a specific piece of secondary scholarly literature that you already know about but do not have in hand, how do you most often begin your process?," 41 percent chose the library's website or online catalog, and 40 percent chose "search on a specific scholarly database or search engine." in response to the question, "how important is it that the library . . . serves as a starting point or 'gateway' for locating information for my research?," 78 percent answered extremely important.

on several questions, ithaka provided the scores for an aggregation of unc's peer libraries. for the first question (the starting point for locating information), 18 percent of national peers chose the online catalog compared to 30 percent at unc. on the importance of the library as gateway, 61 percent of national peers answered very important compared to the 78 percent at unc.

in 2014, the unc libraries were among a handful of academic research libraries that implemented a new ithaka student survey. though we don't have national benchmarks, we can compare our own student and faculty responses. among graduate students, 31 percent chose the online catalog as the starting point for their research, similar to the faculty.18 of the undergraduate students, 33 percent chose the library's website, which provides access to the catalog through a single search box.19

a finding that approximately a third of students began their search on the unc library website was gratifying. oclc's perceptions of libraries 2010 reported survey results regarding where people start their information searches. in 2005, 1 percent said they started on a library website; in 2010, not a single respondent indicated doing so.20

the gross disparity between the oclc reports and the ithaka surveys of our faculty and students requires some explanation.
the libraries at the university of north carolina at chapel hill are proud of a long tradition of ardent and vocal support from the faculty, and we are not surprised to learn that students share their loyalty. for us, the recently completed ithaka surveys point out directions for further investigation into our patrons' use of our catalog and why they feel it is so critical to their research.

anecdotal reports indicate that one of the most highly valued services that the libraries provide is delivery of physical materials to campus addresses. some faculty admit with a certain degree of diffidence that our services have made it almost unnecessary to set foot in our buildings; that is a trend that has also been echoed in conversations with our peers. yet the online presence of the library and its collections continues to be of significant importance—perhaps precisely because it offers an effective gateway to a wide range of materials and services.

we believe that the radical redesign of the online public access catalog initiated by north carolina state university in 2006 marked a sea change in interface design and discovery services for that venerable library service. without a doubt, continued innovation has enhanced discovery. however, we have come to realize that discovery is only one function that the online catalog can and should serve today. equally if not more important is the delivery of information to the patron's home or office. the integration of discovery and delivery is what sets the "next-gen" catalog apart from its predecessors, and we must strive to keep that orientation in mind, not only as we continue to enhance the catalog and its services, but as we ponder the role of the library as place in the coming years. far from being in decline, the online catalog continues to be an "engine of innovation" (to borrow a phrase from holden thorp, former chancellor of unc-chapel hill) and a source of new challenges for our libraries and our profession.

references

1. cathy de rosa et al., perceptions of libraries and information resources: a report to the oclc membership (dublin, oh: oclc online computer library center, 2005), 1–17, https://www.oclc.org/en-us/reports/2005perceptions.html.

2. karen calhoun, the changing nature of the catalog and its integration with other discovery tools, final report, prepared for the library of congress (ithaca, ny: k. calhoun, 2006), 5, http://www.loc.gov/catdir/calhoun-report-final.pdf.

3. roger c. schonfeld and kevin m. guthrie, "the changing information services needs of faculty," educause review 42, no. 4 (july/august 2007): 8, http://www.educause.edu/ero/article/changing-information-services-needs-faculty.
4. ross housewright and roger schonfeld, ithaka's 2006 studies of key stakeholders in the digital transformation in higher education (new york: ithaka s+r, 2008), 6, http://www.sr.ithaka.org/sites/default/files/reports/ithakas_2006_studies_stakeholders_digital_transformation_higher_education.pdf.

5. xi niu and bradley m. hemminger, "beyond text querying and ranking list: how people are searching through faceted catalogs in two library environments," proceedings of the american society for information science & technology 47, no. 1 (2010): 1–9, http://dx.doi.org/10.1002/meet.14504701294; and cory lown, tito sierra, and josh boyer, "how users search the library from a single search box," college & research libraries 74, no. 3 (2013): 227–41, http://crl.acrl.org/content/74/3/227.full.pdf.

6. charles r. hildreth, "beyond boolean; designing the next generation of online catalogs," library trends (spring 1987): 647–67, http://hdl.handle.net/2142/7500.

7. kristen antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology and libraries 25, no. 3 (2006): 129, http://dx.doi.org/10.6017/ital.v25i3.3342.

8. karen coyle, "the library catalog: some possible futures," journal of academic librarianship 33, no. 3 (2007): 415–16, http://dx.doi.org/10.1016/j.acalib.2007.03.001.

9. karen markey, "the online library catalog: paradise lost and paradise regained?" d-lib magazine 13, no. 1/2 (2007): 2, http://dx.doi.org/10.1045/january2007-markey.

10. marshall breeding, "next-gen library catalogs," library technology reports (july/august 2007): 10–13.

11. jia mi and cathy weng, "revitalizing the library opac: interface, searching, and display challenges," information technology and libraries 27, no. 1 (2008): 17–18, http://dx.doi.org/10.6017/ital.v27i1.3259.

12. michael j. bennett, "opac design enhancements and their effects on circulation and resource sharing within the library consortium environment," information technology and libraries 26, no. 1 (2007): 36–46, http://dx.doi.org/10.6017/ital.v26i1.3287.

13. eric lease morgan, "use and understand; the inclusion of services against texts in library catalogs and discovery systems," library hi tech (2012): 35–59, http://dx.doi.org/10.1108/07378831211213201.

14. lorcan dempsey, "thirteen ways of looking at libraries, discovery, and the catalog: scale, workflow, attention," educause review online (december 10, 2012), http://www.educause.edu/ero/article/thirteen-ways-looking-libraries-discovery-and-catalog-scale-workflow-attention.

15. charles pennell, natalie sommerville, and derek a. rodriguez, "shared resources, shared records: letting go of local metadata hosting within a consortium environment," library resources & technical services 57, no. 4 (2013): 227–38, http://journals.ala.org/lrts/article/view/5586.
16. benjamin pennell and jill sexton, "implementing a real-time suggestion service in a library discovery layer," code4lib journal 10 (2010), http://journal.code4lib.org/articles/3022.

17. ithaka s+r, unc chapel hill faculty survey: report of findings (unpublished report to the university of north carolina at chapel hill, 2013), questions 20, 21, 33.

18. ithaka s+r, unc chapel hill graduate student survey: report of findings (unpublished report to the university of north carolina at chapel hill, 2014), 47.

19. ithaka s+r, unc chapel hill undergraduate student survey: report of findings (unpublished report to the university of north carolina at chapel hill, 2014), 39.

20. cathy de rosa et al., perceptions of libraries, 2010: context and community: a report to the oclc membership (dublin, oh: oclc online computer library center, 2011), 32, http://oclc.org/content/dam/oclc/reports/2010perceptions/2010perceptions_all.pdf.

a technology-dependent information literacy model within the confines of a limited resources environment

ibrahim abunadi

ibrahim abunadi (i.abunadi@gmail.com) is an assistant professor, college of computer and information sciences, prince sultan university, riyadh, saudi arabia.

abstract

the purpose of this paper is to investigate information literacy as an increasingly evolving trend in computer education. a quantitative research design was implemented, and a longitudinal case study methodology was conducted to measure tendencies in information literacy skill development and to develop a practical information literacy model. it was found that both students and educators believe that the combination of information literacy with a learning management system is more effective in increasing information literacy and research skills where information resources are limited. based on the quantitative study, a practical, technology-dependent information literacy model was developed and tested in a case study, resulting in fostering the information literacy skills of students who majored in information systems. these results are especially important in smaller universities with libraries having limited technology capabilities, located in developing countries.

introduction

many different challenges arise during a graduate's career. moreover, professional life can involve numerous situations and problems that university students are not prepared for during their college studies.1 the use of internet sources to find solutions to real problems depends on students'/graduates' information literacy skills.2 a strong aid to students' learning is the ability to search, analyze, and apply knowledge from different sources, including literature, databases, and the internet.3 one of the issues students face concerning technology is its continuous evolution. although students learn survival skills in their professional lives, they also require special coping skills. a skill that should be considered for all technology-related courses is information literacy. lin defines information literacy as a "set of abilities, skills, competencies, or fluencies, which enable people to access and utilize information resources."4 these are part of the lifelong learning skills of students, which put the power of continuous education in their hands.
another issue is the exclusive allocation of the responsibility for information literacy skill development in smaller educational institutes to librarians or to instructors who majored in library science.5 this paper has taken another approach to information literacy skill development whereby specialized educators, such as capable information systems faculty members, facilitate this skill development. a learning management system (lms) is a widely used form of technology for course delivery and the organization of subject material. blackboard, desire2learn, sakai, moodle, and angel, as common lms platforms, provide an integrated guidance system to deliver and analyze learning. these systems can be used to support information literacy instruction. standard features include assignments and quizzes, while other systems offer tools that allow students to view and comment on other students' portfolios or work, depending on the lms's features.6 before the 1990s, face-to-face learning was common within the educational domain. however, the lms emerged in the twenty-first century as the internet became a suitable alternative to traditional learning. moodle, an open-source lms, is an acronym that stands for "modular object-oriented dynamic learning environment." this online education system is intended to make learning available with the necessary guidance for educators. web services available through moodle are based on a well-organized structural outline, and they are widely used to perform educational tasks and to analyze statistics helpful to instructors.7

peter et al. (2015) presented an approach related to information literacy instruction in universities and colleges that combines traditional classroom instruction and online learning; this is known as "blended learning."8 this involves only one seminar in the classroom; thus, it can replace traditional sessions at universities and colleges with education involving information literacy instruction. it has been recommended that a time-efficient method should be adopted by augmenting classroom seminars and literacy instruction through the addition of online materials. however, the findings of this study showed that students who only use online materials do not show greater progress in their learning than those who follow the blended approach. another study, by jackson, examined how educational services could be integrated more effectively into learning management systems and library resources.9 jackson suggested that better implementation was required, and recommended using the blackboard lms to include information literacy and scaffolding activities in subject-specific courses.

this study intends to determine the most effective method of information literacy education. it evaluates instructors' and students' perceptions of the effectiveness of traditional teaching in comparison to electronic teaching in information literacy. in this study, a quantitative research investigation was conducted with participants. a research model and questionnaire were developed for this purpose with three underlying latent variables. the participants were asked to describe their understanding of learning systems and their preferences in information literacy education.
their requirements varied with their continuing education levels and past educational activities, based on which software or website appeared to be more supportive and compatible with them.10 this study considered the research results, developed an information literacy intervention model, and applied it to a case study.

literature review

previously, educational institutions were limited to face-to-face teaching techniques or classroom-based teaching. face-to-face teaching is the traditional method still used in most educational institutions. in classrooms, the subject is explained, and books or other paper-based materials are read out of class to enhance understanding.11 face-to-face learning or teaching is limited by the number of physical resources available. therefore, it becomes difficult to accommodate the widespread interest in information literacy through face-to-face learning.12 gathering information using only physical resources can lead to information deficiencies.13 education has evolved to benefit from advances in technologies by using lms and online sources. the effective usage of lms and online sources requires the development of information literacy.

information literacy

information literacy includes technological literacy, information ethics, online library skills, and critical literacy.14 technological literacy is defined as the ability to use common software, hardware, and internet tools to reach a specific goal. this ability is an important component of information literacy that enables a graduate to seek answers by using the internet and digital resources.15 hauptman defines information ethics as "the production, dissemination, storage, retrieval, security, and application of information within an ethical context."16 this skill is essential to preserve the original rights of researchers cited in a study, based on the ethical standards of the graduate conducting the study. another important component of information literacy comprises online library skills, which can be defined as the ability to use online digital sources, including digital libraries, to effectively seek different knowledge resources by using search engines, correctly locating required information, and using online support when needed.17 critical literacy is a thorough evaluation of online material that allows for the appropriate conclusion to be reached on the suitability of the material for the required investigation.18 seeking answers from appropriate sources is important to allow graduates to find and report on accurate and valid data. these components of information literacy enable information extraction from topics related to the desired course or field of research. students, professors, instructors, employees, learners, and educational policy administrators are the major knowledge seekers who use information literacy skills.19

with improved online resources available for learning, many learning requirements are moving toward providing services that are exclusively online.20 gray and montgomery studied an online information literacy course.21 they found that teaching with the aid of information literacy is helpful for students in obtaining improved instruction. the authors also compared an online information literacy course and face-to-face instruction, focusing primarily on the behaviors and attitudes of teachers and college students toward the online course.
the students agreed that the application of information literacy techniques would be particularly helpful to them in clarifying their understanding of complicated instructions. the teachers also indicated that an information literacy course would result in better regulation of academic processes than face-to-face learning. dimopoulos et al. (2013) measured student performance within an online learning environment, finding that the online learning environment has direct relevance for the completion of challenging tasks within academic settings.22 the findings further indicated that an lms could improve teaching activities. as an lms, moodle was also helpful for students to ensure their development of collaborative problem-solving skills. they concluded that moodle includes different useful modules, such as digital resource repositories, interactive wikis, and external add-in tools, that have been related to student learning when incorporated into the lms environment, resulting in better performance. hernández-garcía and conde-gonzález focused on learning analytics tools within engineering education, noting that engineering students are more likely to understand complicated concepts better; the application of the information literacy model therefore resulted in better performance.23 further, educating students about information sources was found to be helpful for the instructors in enhancing the students' learning by improving their online information retrieval skills. this study indicated that students can develop their learning traits more effectively through online learning than through face-to-face learning.

many researchers in this area have developed models that are only theoretical.24 however, this paper develops a practical information literacy model that can be tested for improvement in information literacy skills. this is especially relevant for computer and information systems courses, which can sometimes fall outside the purview of library-related training or education in universities with limited resources. the inclusion of information literacy training within computer and information systems courses is not regularly done in the information literacy field.25 additionally, although some information literacy interventions have been implemented practically in research, no other study has developed a practical information literacy model based on educators' and students' information literacy dispositions as well as both information literacy theory and practice.26

moodle as an lms

moodle is a useful and accommodating open-source platform with a stable structure of website services that allows instructors and learners to implement a range of helpful plugins. it can be used as a lively online education community and an enhancement to the face-to-face learning process.27 moodle is used in around 190 countries and offers its services in over seventy languages. it acts as a mediator for instruction and is widely adopted in many institutions. moodle provides services such as assignments, wikis, messaging, blogs, quizzes, and databases.28 it can provide a more flexible teaching platform than traditional teaching. health science educational service providers facilitate self-assurance in their learners. several educational campuses operate by using face-to-face learning strategies, whereby learners obtain their training at on-campus locations.
the objective of moodle is to enable the education of learners through internet access.29 xing focused on the broad application of the moodle lms for developing educational technology within academic settings, suggesting that academic organizations should promote technology as a solution for common problems with students' learning processes.30 such suggestions have been supported by costa et al. (2012) who found that moodle is significantly helpful for developing an e-learning platform for students. they emphasized that engineering universities must use the moodle lms to provide students with extensive technical knowledge.31 costello et al. (2012) stated that moodle, if used, will significantly help students improve their skills and knowledge effectively.32

methodology

in information literacy skill development, there are studies that support using only face-to-face education or only using an lms. for example, churkovich and oughtred found that face-to-face learning leads to better results in information literacy tutorials than online learning.33 at the same time, anderson and may concluded that the use of an lms is viewed by students as a better method than face-to-face instruction in information literacy.34 to test which educational pedagogy (traditional or technology) is better regarding information literacy, the following two hypotheses were posited:

h1: face-to-face learning has a significantly positive influence on information literacy disposition.

h2: moodle learning has a significantly positive influence on information literacy disposition.

to provide a better understanding of the most effective method of information literacy instruction, a quantitative research design was used. the wording of the questionnaire items (shown in table 1) was inspired by the studies of ng, horvat et al., abdullah, and deng and tavares.35 online questionnaires were prepared and distributed to students, teachers, trainers, and professors as well as administrative departments in a small private university located in the arabian gulf region. initially, a pilot study was conducted to test the instrument. this pilot study involved forty-nine participants and fifteen questions on information literacy. it also included demographic questions.

variables / code / item wording

face-to-face education disposition (fed)
fed1  information literacy skills are polished through face-to-face learning
fed2  face-to-face learning accommodates information literacy requirements
fed3  face-to-face learning is easier than learning management systems
fed4  face-to-face learning is better than learning management systems

moodle usage disposition (mud)
mud1  moodle is more easily accessible than other online resources
mud2  moodle is an effective web server for information literacy
mud3  moodle is more reliable than other online resources
mud4  moodle enables the provision of an extensive amount of useful information
mud5  moodle is used to overcome language, understanding, and communication gaps

information literacy preference (il)
il1  students and teachers prefer online resources
il2  inauthentic websites are helpful for students and teachers
il3  authentic websites are useful for students and teachers
il4  students and teachers prefer published articles, journals, and books
il5  online learning is more effective
il6  information is essential for individuals' knowledge

table 1. item coding.
after the pilot study, a full-scale study was conducted, in which the participants were students, professors, and educational administrators. an online questionnaire was sent to the management of an academic institution in the arabian gulf region to assess the instruction methodology to improve students' information literacy skills. the language used in the survey was arabic, and the questionnaire was translated into english for this article by a professional translator. a total of five hundred questionnaires were sent, and 398 of them were received with complete responses. the following criteria were used to filter questionnaires that were not appropriate for this study:

inclusion criteria
• people currently involved in the education system.
• students, teachers, or members of an academic department.
• people who understand information literacy. a question was added in the survey about whether the participant was familiar with information literacy; if not, the participant was removed from the sample.

exclusion criteria
• people who were not involved in the education system.
• people who were not aware of online learning systems.
• staff with no role in learning or teaching.

gender          frequency   percent
male            186         46.73
female          212         53.27
total           398         100

qualification   frequency   percent
undergraduate   181         45.48
graduate        98          24.62
masters         119         29.90
total           398         100

designation     frequency   percent
student         216         54.27
instructor      90          22.61
administrator   92          23.12
total           398         100

table 2. demographic information.

question   agree   neutral   disagree   don't know
face-to-face education disposition (fed)
fed1       46.8    22.8      21.3       9.1
fed2       10      74.5      14.2       1.3
fed3       1.5     12.8      75.8       9.9
fed4       32      30        26         12
information literacy preference (il)
il1        38.8    21.3      1.5        38.4
il2        0.3     1         98.7       -
il3        15      31        53.3       -
il4        49.5    30        13.0       7.5
il5        48      29.8      -          22.2
il6        74      11.5      1.8        12.7

table 3. questionnaire response distribution for fed and il.

question   yes     no
moodle usage disposition (mud)
mud1       65      35
mud2       73.3    26.8
mud3       67      33
mud4       66      34
mud5       63.7    36.3

table 4. responses to mud.

the reliability statistics showed a high level of consistency for the pilot test because the cronbach's alpha for the fifteen items was 0.901, which is above the recommended level of 0.7.36 cronbach's alpha is a widely used coefficient measuring the internal consistency of items as a unified group.37
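the alpha reported above can be reproduced directly from a response matrix. the sketch below shows the standard computation on made-up data; the example matrix and the 0.7 benchmark comment are illustrative, not the study's actual responses.

```python
# a minimal sketch of cronbach's alpha on an items-by-respondents matrix.
# rows are respondents, columns are questionnaire items; the data are invented.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

example = np.array([
    [4, 5, 4, 4, 5],
    [3, 3, 4, 3, 3],
    [5, 5, 5, 4, 5],
    [2, 3, 2, 3, 2],
], dtype=float)
print(round(cronbach_alpha(example), 3))  # values above 0.7 are conventionally acceptable
```

the same computation, applied to the full-scale data, underlies the reliability check reported in the next section.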
results

the research hypotheses were tested using structural equation modeling (sem) with the analysis of moment structures (amos) software. sem comprises statistical methods and computer algorithms that assess latent variables along with observed variables. sem also indicates the relationships among latent variables, showing the effects of the independent variables on the dependent variables.39 one well-regarded sem implementation is amos, a multivariate technique that can concurrently assess the relationships between latent variables and their corresponding indicators (the measurement model) as well as the relationships among the model's variables.40 highly cited information systems and statistics guidelines were followed for the sem to ensure the validity and reliability of the data analysis.41

measurement and structural model

the measurement model contained fifteen items representing three latent variables: face-to-face education disposition, moodle usage disposition, and information literacy preference. before proceeding to this analysis, the data need to show normality so that the robustness of this parametric sem can be trusted. curran et al. suggested absolute skewness below 2 and absolute kurtosis below 7 as indicating acceptable normality.42 all items' absolute values of skewness and kurtosis were below these cutoffs, showing a suitable level of normality for conducting the sem analysis. the overall measurement model showed a high level of fit: gfi = 0.99, agfi = 0.98, nfi = 0.98, cmin/df = 0.86, and rmr = 0.39. the first three indices indicate that the theoretical model fits the empirical data well when they are above 0.95; cmin/df and rmr follow different cutoffs, with cmin/df expected to be less than 3 and rmr less than 0.5.43 table 5 shows that all items loaded on their corresponding latent variables above the suggested cutoff (0.5), except il6, which did not load clearly on its latent variable and was therefore dropped from further analysis.44 an additional check was the significance of the item loadings, which were all significant at the 0.001 level, indicating that all retained items loaded on their latent variables.45 the indices of the measurement model therefore suggested that the instrument's psychometric properties were adequate and that the analysis could proceed to the structural model.

table 5. item loadings.
face-to-face education disposition (fed): fed4 0.71, fed3 0.52, fed2 0.66, fed1 0.89
moodle usage disposition (mud): mud5 0.93, mud4 0.92, mud3 0.92, mud2 0.73, mud1 0.93
information literacy preference disposition (il): il6 0.32, il5 0.91, il4 0.72, il3 0.86, il2 0.81, il1 0.83

the next step was to assess the structural model, which was used to evaluate the hypothesized relations between the independent variables (face-to-face education disposition [fed] and moodle usage disposition [mud]) and the dependent variable (information literacy preference [il]). both education methods were tested in the hypotheses to identify the most suitable information literacy delivery mode for students. both hypotheses were supported, which indicates that no single method of information literacy delivery (face-to-face instruction or lms) is exclusively preferred, and a combined model can be suggested. both hypotheses were supported at the 0.001 level, with an effect size for face-to-face education disposition of 0.32, indicating a medium impact on information literacy preferences, while moodle usage disposition had an effect size of 0.70, which is considered large (hair et al. 2010).
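the same measurement-plus-structural analysis can be sketched in code. the sketch below is illustrative only: the study was run in amos, not python, and the data file, column names, and the semopy package (with its lavaan-style model syntax) are assumptions of this example rather than part of the article.

```python
# illustrative re-expression of the reported analysis; assumes a csv of item-level
# responses with columns named as in table 1 and the semopy package installed.
import pandas as pd
from scipy.stats import skew, kurtosis
import semopy

df = pd.read_csv("responses.csv")  # hypothetical data file

# normality screen per curran et al.: |skewness| < 2 and |kurtosis| < 7
for col in df.columns:
    print(col, round(skew(df[col]), 2), round(kurtosis(df[col]), 2))

# measurement model (=~) and structural regression (~) for h1 and h2;
# il6 is omitted because of its low loading in the reported results
model_desc = """
FED =~ fed1 + fed2 + fed3 + fed4
MUD =~ mud1 + mud2 + mud3 + mud4 + mud5
IL =~ il1 + il2 + il3 + il4 + il5
IL ~ FED + MUD
"""

model = semopy.Model(model_desc)
model.fit(df)
print(model.inspect())           # loadings, path coefficients, p-values
print(semopy.calc_stats(model))  # fit statistics comparable to gfi, cmin/df, rmr
```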
finally, the model's explanatory power for information literacy preferences was determined by r2, which was high (0.85). based on this analysis, it can be said that a single method of information literacy delivery is insufficient in developing countries. thus, a different model for information literacy was developed (figure 1), whose impact on students' related competencies is examined below.

figure 1. information literacy intervention model.

as shown in figure 1, the model includes conducting weekly information literacy sessions that focus on educating students about technological literacy, information ethics, online library skills, and critical literacy. after each session is concluded, the instructor creates weekly assignments using an lms that test the students' information literacy abilities regarding the subject material. the instructor follows up on the students' overall performance and fills any identified gaps in subsequent information literacy sessions and assignments. after one month the instructor reviews the students' performance and provides feedback. finally, a "real case project assignment" is used to teach students to solve real problems using the skills they have learned. the instructor can further extend reflection on the process of grading the "real case project" by creating a course exit survey that asks students about their acquired level of information literacy skills.

longitudinal case study

a small technical university in the arabian gulf region faces difficulties in providing adequate library resources to its students because of its limited capabilities. the university has about 4,500 students and five hundred employees. the university library and the information technology department lack adequate staff and resources, resulting in insufficient support for student learning. this has caused a lack of information literacy education, which is evident in the assignments students submit. for example, students are not accustomed to citing the materials used in their assessments, so these undergraduates are viewed suspiciously by their educators when they use online materials. not knowing how to paraphrase and then cite relevant online materials costs students learning opportunities. information literacy is a skill that should be considered for all technology-related courses.46 the outcomes of this course will be used to improve the education of students and place the power of learning in their hands.47 therefore, the objective of this case study is to determine the influence of information literacy practices on improving student performance in solving organizational problems, especially when technology and library resources are scarce. this longitudinal case study was conducted over two semesters: the first was conducted traditionally without the information literacy intervention model, whereas in the second semester the intervention model was introduced. finally, the performance and opinions of students in the two semesters were compared using a case study assignment and a course exit survey. the information literacy intervention model was implemented by providing a series of practical tutorials at the beginning of the semester showing students how to use information from the internet.
then, the students applied the information and used information literacy skills to solve weekly assessments for an enterprise-architecture (ea) course. this course is taught in the information systems program at a private university, and students enrolling in it are in their second year or higher. the information literacy assessments require students to search for reliable sources of information and to cite and reference them. this builds the habit of critically examining sources of information and of grasping, analyzing, and using these sources to solve problems. the technology-supported information literacy pedagogy was followed to improve students' knowledge of how to learn.48 the students were educated through a series of classes on how to use the university's databases, e-books, and internet resources to solve real-life organizational problems and to apply concepts in different situations, as shown in figure 1. the students were given ten small assessments in the moodle lms, in each of which a concept taught in class needed to be applied after students searched for it and learned more about it from different sources. this included looking in the correct places for reliable resources, online scholarly databases, and online videos that could be of use. then, students were taught how to critically examine resources and determine which of them could be considered reliable. for example, students were shown that highly cited papers are more reliable than less cited papers and that online videos from professional organizations (e.g., ibm or gartner) are more reliable than personal videos. students were also taught how to use in-text citations and how to create reference lists. in the last quarter of the semester, a case study assignment was provided with real-life problems that students were required to solve using different sources, including the internet. the performance of semester-1 students (no intervention) was compared with that of semester-2 students (information literacy intervention) taking the same course, with an improvement in grades considered an indicator of success. the comparison point was a major project that required students to solve real-life organizational problems and demanded greater information literacy. some of the ea concepts taught in the class required practice to apply. for example, the as-is organizational modeling that is needed before implementing ea would be difficult to understand unless students actually conducted modeling of selected organizations; doing so enabled students to understand how the concepts related to the real world. the concepts focused on were related to business tools in information systems (e.g., business process management and requirements elicitation) that are widely used for analysis within organizations. the theory behind these tools was explained in class; applying the theory required students to search many sources of information, including online books and research databases. students were unaware of these resources until the instructors explained their availability on the internet and in the library. the students were provided with regular information literacy sessions to improve their skills in this respect. they were shown how to search; for instance, if they could not find a specific term, they could look for synonyms.
they were instructed on how to use search engines and research databases and were shown the relevant electronic journals and books that can aid in solving the weekly assessments. the use of internet multimedia is also important in education.49 the students were shown relevant youtube channels (e.g., those of harvard and khan academy) and relevant massive open online courses (e.g., free courses on coursera.com and udemy.com). weekly tests required students to use these resources to solve the assessment problems. an important outcome of this intervention was an improvement in students' ability to use different digital resources. this was evident in semester-2 students' use of suitable reference lists and in-text citations, compared with a lack of such usage by semester-1 students. an additional measure was the higher average score in semester 2 (4.15/5) than in semester 1 (3.2/5) for the course exit survey item most relevant to information literacy: "illustrate responsibility for one's own learning." the students were continually taught that information literacy grants a power that comes with responsibility, and no incidents of plagiarism were reported during the semester in which the intervention was conducted. referencing became a habit through the weekly information literacy assessments. the students' grades in the final project were also better than in the previous academic semester: the average project grade for semester 1 was 15.5/20, while that for semester 2 was 17/20, a difference that was statistically significant at the 0.10 level. the students could use digital library databases, and some were interested in using external online books. it became habitual for students to use in-text citations, and their references became diversified. some students, however, still confined suitable references to only some paragraphs; this feedback was delivered to the students so that they could address the issue in other courses.
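the article reports the semester-to-semester project-grade difference as significant at the 0.10 level but does not name the test used, so the sketch below simply illustrates one conventional way such a comparison could be run; the grade lists and the choice of an independent two-sample t-test are assumptions of this example, not data from the study.

```python
# illustrative only: hypothetical grade lists standing in for the two cohorts'
# project grades (out of 20); the article reports means of 15.5 and 17.0.
from scipy import stats

semester1 = [14.0, 15.0, 15.5, 16.0, 16.5, 15.0, 16.0]   # no intervention
semester2 = [16.0, 17.0, 17.5, 18.0, 16.5, 17.0, 17.5]   # with intervention

t_stat, p_value = stats.ttest_ind(semester1, semester2, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print("significant at the 0.10 level" if p_value < 0.10 else "not significant at 0.10")
```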
discussion and conclusion

this study was conducted to investigate the most effective mode of information literacy delivery. the study focused on smaller universities because they do not have adequate library facilities and technological capabilities to provide students with sufficient information literacy competencies during course delivery. a survey was conducted to determine the most suitable form of information literacy delivery; it found that moodle and face-to-face methods were both favored for information literacy. thus, the information literacy intervention model was developed and tested in a case study so that students' performance would improve. the results of this study have shown that the combination of technology and information literacy instruction is an effective way to improve student skills in using digital resources to seek knowledge. it was found that both face-to-face learning and the use of an lms increase student performance in assessments that require information literacy: face-to-face learning is required to explain information literacy concepts, while the lms is used to disseminate the necessary digital resources and to create assessment modules. thus, combining theory and practice in information literacy resulted in better understanding and implementation in knowledge seeking and problem-solving related to information systems. the inclusion of information literacy instruction, along with the use of an lms for information literacy assessments within information systems courses, has reduced the pressure on libraries that lack technological resources (such as pcs) and qualified staff.

the results with regard to this study's hypotheses are in agreement with those of previous studies.50 hypothesis 1, which posited that face-to-face learning has a significantly positive influence on information literacy disposition, is congruent with the research of churkovich and oughtred.51 their research focused on student information literacy skill development using library facilities instead of faculty, which is a different approach from the one followed in the present study; however, both studies found that face-to-face instruction leads to improved student performance. hypothesis 2, which posited that moodle learning has a significantly positive influence on information literacy disposition, aligns with the research of anderson and may,52 who found that using an lms is more effective than face-to-face instruction for information literacy instruction. similar to churkovich and oughtred (and in contrast to the present study), anderson and may relied on librarians to deliver information literacy instruction online, although they also relied on faculty in addition to librarians.

there are two noteworthy outcomes of the first study. first, the questionnaire measurement model showed that the development of this instrument was successful and that the items and their latent variables can be used in further studies. second, the results for the structural model indicated that both face-to-face instruction and moodle use influenced information literacy preferences. other studies support these results: peter et al. (2015) agree with the present finding that the combination of face-to-face instruction and lms use leads to improved student performance.53 peter et al., whose participants were psychology students, focused on the time-efficiency of information literacy instruction; in contrast, the present study considers information literacy skill development a progressive, long-term process. the information literacy intervention model is not only a learning medium but an interactive method of teaching that adapts to student learning patterns.

the primary limitations of the study were the nature of the sample, the exclusion of some potentially relevant variables, and the simplification of the study's findings. the sample was limited to students, professors, and people who were aware of the learning programs; it is highly possible that they were more familiar with such technological innovations than the general population. future studies could retest the study's hypotheses in a more comprehensive manner and impose more control on the respondents. the interaction between people while visiting a site is itself an activity worthy of examination, but it must be either controlled or measured for us to understand the role it plays in shaping attitudes and behaviors.
future studies can apply the developed theoretical model in different settings to determine its interaction with other variables in the information systems field. a quantitative instrument can be developed based on the information literacy intervention model. alternatively, the model can be applied with qualitative interviews in future studies to develop theoretical themes based on instructors' and students' responses.

references

1 harry m. kibirige and lisa depalo, "the internet as a source of academic research information: findings of two pilot studies," information technology and libraries 19, no. 1 (2000): 11–15; debbie folaron, a discipline coming of age in the digital age (philadelphia: john benjamins, 2006); n. n. edzan, "tracing information literacy of computer science undergraduates: a content analysis of students' academic exercise," malaysian journal of library & information science 12, no. 1 (2007): 97–109.
2 heinz bonfadelli, "the internet and knowledge gaps," european journal of communication 17, no. 1 (2002): 65–84, http://journals.sagepub.com/doi/abs/10.1177/0267323102017001607; kibirige and depalo, "the internet as a source of academic research information," 11–15.
3 laurie a. henry, "searching for an answer: the critical role of new literacies while reading on the internet," the reading teacher 59, no. 7 (2006): 614–27.
4 peyina lin, "information literacy barriers: language use and social structure," library hi tech 28, no. 4 (2010): 548–68, https://doi.org/10.1108/07378831011096222.
5 michael r. hearn, "embedding a librarian in the classroom: an intensive information literacy model," reference services review 33, no. 2 (2005): 219–27.
6 hui hui chen et al., "an analysis of moodle in engineering education: the tam perspective" (paper presented at the 2012 ieee international conference on teaching, assessment and learning for engineering (tale)).
7 n. n. edzan, "tracing information literacy of computer science undergraduates: a content analysis of students' academic exercise," malaysian journal of library & information science 12, no. 1 (2007): 97–109.
8 johannes peter et al., "making information literacy instruction more efficient by providing individual feedback," studies in higher education (2015): 1–16, https://doi.org/10.1080/03075079.2015.1079607.
9 pamela alexandra jackson, "integrating information literacy into blackboard: building campus partnerships for successful student learning," the journal of academic librarianship 33, no. 4 (2007): 454–61, https://doi.org/10.1016/j.acalib.2007.03.010.
10 manal abdulaziz abdullah, "learning style classification based on student's behavior in moodle learning management system," transactions on machine learning and artificial intelligence 3, no. 1 (2015): 28.
11 catherine j. gray and molly montgomery, "teaching an online information literacy course: is it equivalent to face-to-face instruction?," journal of library & information services in distance learning 8, no. 3–4 (2014): 301–9, https://doi.org/10.1080/1533290x.2014.945876.
12 william sugar, trey martindale, and frank e. crawley, "one professor's face-to-face teaching strategies while becoming an online instructor," quarterly review of distance education 8, no. 4 (2007): 365–85.
13 stephann makri et al., "a library or just another information resource? a case study of users' mental models of traditional and digital libraries," journal of the association for information science and technology 58, no. 3 (2007): 433–45.
14 christine susan bruce, "workplace experiences of information literacy," international journal of information management 19, no. 1 (1999): 33–47, https://doi.org/10.1016/s0268-4012(98)00045-0; michael b. eisenberg, carrie a. lowe, and kathleen l. spitzer, information literacy: essential skills for the information age (westport, ct: greenwood publishing group, 2004).
15 andy carvin, "more than just access: fitting literacy and content into the digital divide equation," educause review 35, no. 6 (2000): 38–47.
16 robert hauptman, ethics and librarianship (jefferson, nc: mcfarland, 2002).
17 janae kinikin and keith hench, "poster presentations as an assessment tool in a third/college level information literacy course: an effective method of measuring student understanding of library research skills," journal of information literacy 6, no. 2 (2012), https://doi.org/10.11645/6.2.1698; stuart palmer and barry tucker, "planning, delivery and evaluation of information literacy training for engineering and technology students," australian academic & research libraries 35, no. 1 (2004): 16–34, https://doi.org/10.1080/00048623.2004.10755254.
18 lauren smith, "towards a model of critical information literacy instruction for the development of political agency," journal of information literacy 7, no. 2 (2013): 15–32, https://doi.org/10.11645/7.2.1809.
19 melissa gross and don latham, "what's skill got to do with it?: information literacy skills and self-views of ability among first-year college students," journal of the american society for information science and technology 63, no. 3 (2012): 574–83, https://doi.org/10.1002/asi.21681.
20 bala haruna et al., "modelling web-based library service quality and user loyalty in the context of a developing country," the electronic library 35, no. 3 (2017): 507–19, https://doi.org/10.1108/el-10-2015-0211.
21 catherine j. gray and molly montgomery, "teaching an online information literacy course: is it equivalent to face-to-face instruction?," journal of library & information services in distance learning 8, no. 3–4 (2014): 301–9, https://doi.org/10.1080/1533290x.2014.945876.
22 ioannis dimopoulos et al., "using learning analytics in moodle for assessing students' performance" (paper presented at the 2nd moodle research conference, sousse, tunisia, 4–6, 2013).
23 ángel hernández-garcía and miguel á. conde-gonzález, "using learning analytics tools in engineering education" (paper presented at lasi spain, bilbao, 2016).
24 michael r. hearn, "embedding a librarian in the classroom: an intensive information literacy model," reference services review 33, no. 2 (2005): 219–27, https://doi.org/10.1108/00907320510597426; thomas p. mackey and trudi e. jacobson, "reframing information literacy as a metaliteracy," college & research libraries 72, no. 1 (2011): 62–78; s. serap kurbanoglu, buket akkoyunlu, and aysun umay, "developing the information literacy self-efficacy scale," journal of documentation 62, no. 6 (2006): 730–43, https://doi.org/10.1108/00220410610714949.
25 michelle holschuh simmons, "librarians as disciplinary discourse mediators: using genre theory to move toward critical information literacy," portal: libraries and the academy 5, no. 3 (2005): 297–311, https://doi.org/10.1353/pla.2005.0041; sharon markless and david r. streatfield, "three decades of information literacy: redefining the parameters," change and challenge: information literacy for the 21st century (blackwood, south australia: auslib press, 2007): 15–36; meg raven and denyse rodrigues, "a course of our own: taking an information literacy credit course from inception to reality," partnership: the canadian journal of library and information practice and research 12, no. 1 (2017), https://doi.org/10.21083/partnership.v12i1.3907.
26 joanne munn and jann small, "what is the best way to develop information literacy and academic skills of first year health science students? a systematic review," evidence based library and information practice 12, no. 3 (2017): 56–94, https://doi.org/10.18438/b8qs9m; sheila corrall, "crossing the threshold: reflective practice in information literacy development," journal of information literacy 11, no. 1 (2017): 23–53, https://doi.org/10.11645/11.1.2241.
27 liping deng and nicole judith tavares, "from moodle to facebook: exploring students' motivation and experiences in online communities," computers & education 68 (2013): 167–76, https://doi.org/10.1016/j.compedu.2013.04.028.
28 ana horvat et al., "student perception of moodle learning management system: a satisfaction and significance analysis," interactive learning environments 23, no. 4 (2015): 515–27, https://doi.org/10.1080/10494820.2013.788033.
29 cary roseth, mete akcaoglu, and andrea zellner, "blending synchronous face-to-face and computer-supported cooperative learning in a hybrid doctoral seminar," techtrends 57, no. 3 (2013): 54–59, https://doi.org/10.1007/s11528-013-0663-z.
30 ruonan xing, "practical teaching platform construction based on moodle—taking 'education technology project practice' as an example," communications and network 5, no. 3 (2013): 631, https://doi.org/10.4236/cn.2013.53b2113.
31 carolina costa, helena alvelos, and leonor teixeira, "the use of moodle e-learning platform: a study in a portuguese university," procedia technology 5 (2012): 334–43, https://doi.org/10.1016/j.protcy.2012.09.037.
32 eamon costello, "opening up to open source: looking at how moodle was adopted in higher education," open learning: the journal of open, distance and e-learning 28, no. 3 (2013): 187–200, https://doi.org/10.1080/02680513.2013.856289.
33 marion churkovich and christine oughtred, "can an online tutorial pass the test for library instruction? an evaluation and comparison of library skills instruction methods for first year students at deakin university," australian academic & research libraries 33, no. 1 (2002): 25–38, https://doi.org/10.1080/00048623.2002.10755177.
34 karen anderson and frances a. may, "does the method of instruction matter? an experimental examination of information literacy instruction in the online, blended, and face-to-face classrooms," the journal of academic librarianship 36, no. 6 (2010): 495–500, https://doi.org/10.1016/j.acalib.2010.08.005.
35 wan ng, "can we teach digital natives digital literacy?," computers & education 59, no. 3 (2012): 1065–78, https://doi.org/10.1016/j.compedu.2012.04.016; horvat et al., "student perception of moodle learning management system," 515–27, https://doi.org/10.1080/10494820.2013.788033; manal abdulaziz abdullah, "learning style classification based on student's behavior in moodle learning management system," transactions on machine learning and artificial intelligence 3, no. 1 (2015): 28; liping deng and nicole judith tavares, "from moodle to facebook: exploring students' motivation and experiences in online communities," computers & education 68 (2013): 167–76, https://doi.org/10.1016/j.compedu.2013.04.028.
36 j. f. hair, william c. black, and barry j. babin, multivariate data analysis: a global perspective, 7th ed. (upper saddle river, nj: pearson, 2010).
37 l. j. cronbach, "test validation," in educational measurement, ed. r. l. thorndike, 2nd ed. (washington, dc: american council on education, 1971).
38 b. tabachnick and l. fidell, using multivariate statistics, 5th ed. (new york: allyn and bacon, 2007).
39 hair, black, and babin, multivariate data analysis.
40 b. m. byrne, structural equation modeling with amos: basic concepts, applications, and programming, 2nd ed. (new york: taylor & francis group, 2010); hair, black, and babin, multivariate data analysis.
41 t. a. brown, confirmatory factor analysis for applied research (methodology in the social sciences) (new york: guilford, 2006); byrne, structural equation modeling with amos; d. gefen, d. straub, and m. boudreau, "structural equation modeling and regression: guidelines for research practice," communications of the association for information systems 4, no. 7 (2000): 1–77; hair, black, and babin, multivariate data analysis: a global perspective.
42 p. j. curran, s. g. west, and j. f. finch, "the robustness of test statistics to nonnormality and specification error in confirmatory factor analysis," psychological methods 1, no. 1 (1996): 16–29, https://doi.org/10.1037/1082-989x.1.1.16.
43 byrne, structural equation modeling with amos.
44 brown, confirmatory factor analysis for applied research; byrne, structural equation modeling with amos.
45 hair, black, and babin, multivariate data analysis: a global perspective.
46 michael b. eisenberg, carrie a. lowe, and kathleen l. spitzer, information literacy: essential skills for the information age (westport, ct: greenwood publishing group, 2004).
47 james elmborg, "critical information literacy: implications for instructional practice," the journal of academic librarianship 32, no. 2 (2006): 192–99, https://doi.org/10.1016/j.acalib.2005.12.004.
48 ibid.
49 anderson and may, "does the method of instruction matter?," 495–500; horvat et al., "student perception of moodle learning management system," 515–27, https://doi.org/10.1080/10494820.2013.788033.
50 horvat et al., "student perception of moodle learning management system," 515–27, https://doi.org/10.1080/10494820.2013.788033; anderson and may, "does the method of instruction matter?," 495–500; raven and rodrigues, "a course of our own."
51 churkovich and oughtred, "can an online tutorial pass the test for library instruction?," 25–38.
52 anderson and may, "does the method of instruction matter?," 495–500.
53 peter et al., "making information literacy instruction more efficient," 1–16.
president's message: imagination and structure in times of change
bohyun kim
information technology and libraries | december 2018

bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, ri.

in my last column, i talked about the discussion that lita had begun regarding forming a new division to achieve financial sustainability and more transparency, responsiveness, and agility. this proposed new division would merge lita with alcts (association for library collections and technical services) and llama (library leadership and management association). when this topic was brought up and discussed at an open meeting at the 2018 ala annual conference in new orleans, many members of these three divisions expressed interest and excitement. at the same time, there were many requests for more concrete details. you may recall that, as a response to those requests, the steering committee, which consists of the presidents, presidents-elect, and executive directors of the three divisions, decided to form four working groups with the aim of providing more complete information about what the new division would look like.

today, i am happy to report that the work of the steering committee and the four working groups is well underway. the operations working group that i have been chairing for the last two months submitted its recommendations on november 23. the activities working group finished its report on december 5. the budget and finance working group also submitted its second report. the communications working group continues to engage members of all three divisions by sharing new updates and soliciting opinions and suggestions. most recently, it started gathering input and feedback on potential names for the new division.1 you can see the charges, member rosters, and current statuses of these four working groups on the 'current information' page of the 'alcts/llama/lita alignment discussion' community on the ala connect website (https://connect.ala.org/communities/allcommunities/all/all-current-information).2

to give you a glimpse of our work preparing for the proposed new division, i would like to share some of my experience leading the operations working group. the operations working group consisted of nine members, three from each division, in addition to myself as the chair and one staff liaison. we quickly became familiar with the organizational and membership structures of the three divisions. the three divisions are similar to one another in size, but they have slightly different structures. lita has 18 interest groups (igs), 25 committees, and 4 (current) task forces; llama has 7 communities of practice (cops) and 46 discussion groups, committees, and task forces; alcts has 5 sections, 42 igs, and 61 committees (20 at the division level and 41 at the section level). all committees and task forces in lita are division-level, while alcts and llama have committees that are either division-level or section/cop-level. alcts is unique in that it elects section chairs, who serve on the division board alongside alcts directors-at-large. alcts also has a separate executive committee in addition to the board.
llama has self-governed cops, which are formed by the board's approval. among all three, lita has the flattest and simplest structure, due to its intentional efforts in the past. for example, there are neither sections nor communities of practice in lita, and the lita board eliminated the executive committee a few years ago.

the steering committee of the three divisions agreed upon several guiding principles for the potential merger. these include (i) open, flexible, and straightforward member engagement, (ii) simplified and streamlined processes, and (iii) a governance and coordinating structure that engages members and staff in meaningful and productive work. the challenge is how to translate those guiding principles into a specific organizational structure, membership structure, and bylaws. clearly, some shuffling of the existing sections, cops, and igs in the three divisions will be necessary to make the new division as effective, agile, and responsive as promised. however, when and how should such consolidation take place? furthermore, what kind of guidance should the new division provide for members to re-organize themselves into a new and better structure? these are not easy questions, nor can they be answered immediately. some changes may require going through multiple stages before they are completed.

this may concern some members. they may prefer all these questions to have definitive answers before they decide whether they will support the proposed new division. people often assume that a change takes place after a big vision is formed, and that the change is then executed by a clear plan that directly translates that vision into reality in an orderly fashion. however, that is rarely how a change takes place in reality. more often than not, a possible change builds up its own pressure, showing up in a variety of forms on multiple fronts and through many different people while getting stronger, until the idea of the change gains enough urgency. finally, some vision of the change is crafted to give that idea a form. the vision for a change also does not materialize in one fell swoop. it often begins with incomplete details and ideas that may even conflict with one another in its first iteration. it is up to all of us to sort them out and make them consistent, so that they become operational in the real world.

recently, the steering committee reached an agreement regarding the final version of the mission, vision, and values of the proposed new division. i hope these resonate with our members and guide us well in navigating the challenges ahead if the membership votes in favor of the proposal.

the new division's mission: we connect library and information practitioners in all career stages and from all organization types with expertise, colleagues, and professional development to empower transformation in technology, collections, and leadership, and to advocate for access to information for all.

the new division's vision: we shape the future of libraries and catalyze innovation across boundaries. the new division [name to be determined] amplifies diverse voices and advocates for equal and equitable access to information for all.
the new division's values: shared and celebrated expertise; strategically chosen work that makes a difference; transparent, equitable, flexible, and inclusive structures; an empowering framework for experimental and proven approaches; intentional amplification of diverse perspectives; expansive collaboration to become better together.

in deciding on all operational and logistical details for the new division, the most important criteria will be whether a proposed change advances the vision and mission of the new division and how well it aligns with the agreed-upon values and guiding principles. the steering committee and the working groups are busy finalizing the details of the new division. those details will first be reviewed by the board of each division and then shared with the membership at the midwinter meeting for feedback.

i did not anticipate that during my service as the lita president-elect and president, i would be leading a change as great as dissolving lita and forming a new division with two other divisions, alcts and llama. it has been an adventure filled with many surprises, difficulties, and challenges, to say the least. this adventure taught me a great deal about leading a change for an organization at a high level. when we move from the high-level vision of a change to the matter of details deep in the weeds, it is easy to lose sight of the original aspiration and goal that led us to the change in the first place. trying to pin down as many logistical details as possible becomes tempting to those in a leadership role, because we all want to reassure people in our organizations at a time of uncertainty and to make the transition smooth. however, creating a new division is itself a huge change at the highest level. it would be wrong to backtrack on the original goal in order to make the transition smooth, for it is the original goal that requires the transition, not vice versa. i believe those in a leadership role should accept that their most important work during a time of change is not to try to wrangle logistics at all levels but to keep things on track and moving in the direction of the original aspiration and goal. lita and the two other divisions have many talented and capable members who will be happy to lend a hand in developing new logistics. the responsibility of leaders is to create space where those people can achieve that freely and swiftly and to provide the right amount of framework and guidance. i hope that all lita members and those associated and involved with lita see themselves in the vision, mission, and values of the new division, embrace changes from the lowest to the highest level, and work towards making the new vision into reality together.

1 you can participate in this process at https://connect.ala.org/communities/community-home/digestviewer/viewthread?groupid=109804&messagekey=625e8823-21e0-419c-ab2b-1cb4a82b8d09 and http://www.allourideas.org/newdivisionname.
2 this 'current information' page will be updated as the plans for the new division develop.
journal of library automation vol. 5/2, june 1972

book reviews

book catalogs. by maurice f. tauber and hilda feinberg. metuchen, n.j.: scarecrow press, 1971. 572 p. $15.00

in 1963 kingery & tauber published a collection entitled book catalogs. this is a much larger follow-up, containing twenty papers published between 1964 and 1970 and eight previously unpublished pieces. not surprisingly, nearly all of them are concerned with computer-produced book catalogs in academic, special, county, public, and school libraries. although nearly all of the previously published papers appeared in well-known journals, it is useful to have them collected together; the older ones are now of mainly historical interest, but, taken as a whole, they form a valuable record of trial and error, and also of progress. it would be unfair to single out any of the published articles for special praise or blame. in a rapidly changing field, even the good is soon improved upon. it is the examples, the costings, and above all the mistakes that are so helpful. there is no excuse now for running into problems that have in the past led to the total scrapping of some computer systems: unforeseen filing difficulties, insufficient computer storage, bad economic estimating, and inability to produce an acceptable product. one major problem is still unsolved and indeed has not really been tackled systematically: the pattern of output (main sequence and supplements) that provides maximum usability at minimal cost, a problem surely amenable to operations research (or) techniques. as a reviewer from the united kingdom, i would like to have seen a little more on relevant events there than is provided by frederick g. kilgour's general review: the smaller budgets of british libraries have generally enforced much more careful planning and, although there may be fewer successes, there are also very few failures. the introduction and the three final pieces, all specially written, are of great value, particularly hilda feinberg's "sample book catalogs and their characteristics" (some samples are unbelievably horrible). for good measure there is a bibliography, a (computer-produced) index, and the listing of "book form catalogs" reprinted from lrts. i would hazard a guess that it is with com (computer output microform) that the future lies for many libraries. the next collection of papers, for which i hope we shall not have to wait eight years, must surely be entitled "book and microform catalogs."
maurice b. line

an introduction to pl/1 programming for library and information science. library and information science series. by thomas h. mott, jr., susan artandi, and leny struminger. new york: academic press, 1972. 231 p.

the importance of this text rests in the authors' assumption that the acquisition of programming skills by the library student is an essential component of his education in the fields of library automation and information retrieval.
such skills should enable the student to examine critically the relevance of automated information handling for the library, to experiment with some basic methods of manipulating machine-readable textual material, and "to acquire an understanding of the role of the programmer in the development of ... information handling techniques." the selection of a programming language for this text deserves some comment. pl/1 has been recognized as a particularly suitable language for the processing of textual material and data base management applications. its extensive and powerful repertoire of bit, character, string, array, record, and file manipulation capabilities argues strongly in favor of its adoption for library and other information handling applications. students should be encouraged by the selection of pl/1 for this text, for it offers the novice great flexibility and ease in constructing and manipulating even the most complex types of information structures. this title constitutes the first published attempt to tailor an introductory programming text to the needs of the library student. as such, it possesses several characteristics which distinguish it from other basic programming books, including other pl/1 texts. the language features receiving the greatest share of attention in the present title are the set of built-in functions in pl/1 designed to facilitate the manipulation of strings of both binary and character data. discussion of four of these functions (bool, unspec, verify, and translate) is usually omitted from general introductory pl/1 textbooks. although the discussions of the bool and unspec functions are reasonably complete, the explanations of verify and translate fail to indicate the scope of their applications. for example, the utility of the verify function as an index function for ranges is completely ignored. a more illuminating example of the power of the translate function could have explored its usefulness in converting ascii characters to the corresponding characters of the ebcdic set. this might have clarified the section entitled "internal representation of pl/1 characters," which contains an equivalence table for the pl/1 character set in ascii and ebcdic without indicating its purpose. additional use of this example could have been made in the presentation of the marc material, where the practical value of such a function could be stressed. another desirable feature of this text for the instructor and the library student is the inclusion of sample problems and exercises which, since they refer exclusively to text processing, library automation, and information retrieval, should be readily understandable. unfortunately, the present volume omits any mention of the picture attribute and its uses. as a powerful device facilitating the interchange of data between numeric and character variables and the uncomplicated editing of numeric fields prior to output time, its inclusion would have proved valuable to the text-handling programmer. however, it should be emphasized that this appears to be the single instance in the text in which a generally acknowledged basic language feature has been entirely excluded.
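for readers unfamiliar with the two built-in functions the reviewer singles out, the following sketch shows roughly what pl/1's verify and translate do. it is a python analogue written for this edition, not code from the reviewed textbook, and the sample strings are invented for illustration.

```python
# python analogue of two pl/1 string built-ins discussed in the review; illustrative only.
def verify(s: str, charset: str) -> int:
    """pl/1 verify: 1-based position of the first character of s not found in
    charset, or 0 if every character of s appears in charset."""
    for i, ch in enumerate(s, start=1):
        if ch not in charset:
            return i
    return 0

def translate(s: str, to: str, frm: str) -> str:
    """pl/1 translate: each character of s that occurs in frm is replaced by the
    character at the same position in to; all other characters pass through."""
    return s.translate(str.maketrans(frm, to))

# the "index function" use the reviewer mentions: find where a run of digits ends.
print(verify("1972 june", "0123456789"))   # -> 5 (position of the first non-digit)

# a character-mapping use analogous to an ascii-to-ebcdic conversion table
# (a toy uppercase mapping stands in for the real code-point table here).
print(translate("marc record",
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                "abcdefghijklmnopqrstuvwxyz"))  # -> "MARC RECORD"
```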
it seems to me that too much of the text (15-25 percent) is devoted to developing some of the elementary concepts of boolean algebra and constructing a theoretical model of document retrieval based on these concepts. one possible explanation for this emphasis is the fact that the material for the book was drawn from a graduate seminar in programming theory for information handling. although these chapters are informative and the exposition of ideas is straightforward, they should have been omitted. the space which they occupy could have been used more successfully to explore those pl/1 features essential for information handling but excluded or treated too briefly in the present volume. a list of such topics would include: an expanded discussion of program interrupts and the on condition, a description of pl/1 record formats emphasizing the variable-length record, and a guide to the use of the varying structure method of writing variable-length records. the deficiencies of this text are its overemphasis of information retrieval theory and applications, and its failure to stress those features of pl/1 which would enable the student to appreciate the file-handling capabilities of the language. however, for many instructors the availability of programming examples which should be easily grasped by the library student may strongly outweigh these disadvantages.
howard s. harris

guidelines for library automation; a handbook for federal and other libraries. by barbara evans markuson, judith wagner, sharon schatz, and donald black. santa monica, calif.: system development corporation, 1972. 401 p. $12.50

this handbook is the result of a 1970 study on the status of federal library automation projects which was conducted under the auspices of the federal library committee's task force on automation. the survey was carried out by the system development corporation and funded by the u.s. office of education. it is one of two reports generated from the study data, the other report being automation and the federal library community. the study consisted of a questionnaire survey of 2,104 federal libraries, of which 964 responded. of that number, 57 libraries had one or more functions automated and ten had one or more functions in various stages of development or planning. the survey revealed that, among other activities, 27 cataloging systems (presumably "cataloging" means catalog card production), 25 serials systems, and 13 circulation systems were operational. the handbook purports to help the federal librarian answer the question: "is it feasible to use automation for my library?" it attempts to do this by presenting step-by-step guidelines "from the initial feasibility survey through systems analysis and design to fully operational status." that material more or less follows a pattern of discussion on automation procedure followed by a checklist of the procedures in chart form. the areas covered include "feasibility guidelines" concerning such points as equipment, personnel, budget, and existing files; and "systems development guidelines," which include planning, analysis, design, implementation, and operation. the discussions include brief reviews of the various aspects of automation development, and statements describing the experiences of federal librarians as reported in the study. in this fashion, the reader is informed of the steps that should be considered with each aspect of automation development and, additionally, he is informed of what his colleagues have previously done about each phase and/or problem.
much of this material is too general and too brief to do more than call the reader's attention to the fact that certain requirements must be met in the successful development of an automation project. a large portion of the book is taken up with descriptions of automation projects in 59 federal libraries. this overview of the federal sector provides limited descriptive information about each library and reviews the various applications in terms of system descriptions, equipment, programs, future plans, documentation, etc. the reviews are not consistent in that not all of the above points are included in every review; this, however, is the result of the data submitted to the survey by the respondents. approaches have been provided to this survey material by automated application, form of publication, type of equipment used, and by the special features of each system. surprisingly, there is no approach by name of library. at least one very important library is not represented, i.e., livermore, but for some reason a similar library, los alamos, is included. the final section of the book is a potpourri of information about nonfederal automation activities and is the weakest section of the volume. it includes a list of "automated libraries" that was published before and is very incomplete and poorly defined. additionally, it briefly discusses data bases and commercial ventures, and for no apparent reason suddenly includes 22 pages of information on microforms in libraries. it just as suddenly reverts back to automation and proceeds to provide 23 pages of data on input/output hardware in libraries. the final section is a selected bibliography that seems almost as aimless as the section before it. the items included "have been selected on the basis of their particular interest and applicability to federal libraries," it is stated. they range over the whole spectrum of library automation, and some items have nothing to do with automation at all. there is no index to the book as a whole, and a fair number of errors are present. in summary, the book includes a limited amount of rather old information, most of which is available in other places in far greater detail. it appears that sdc had some rather weak survey data that seemed like it should be used! as a book of "guidelines" it does succeed in providing information in uncluttered and simplified form, but it is a very disappointing publication that leaves much to be desired both in substance and in organization.
donald p. hammer

canadian marc; a report of the activities of the marc task group resulting in a recommended canadian marc format for monographs and a canadian marc format for serials. recommended to the national librarian. by dr. guy sylvestre. ottawa: the national library of canada, 1972.

canada's approach to the realization of a proposed format for machine-readable cataloging data was influenced by several factors. first and foremost was the fact that canada is bilingual, dictating the requirement for the possible representation of data in both french and english. in addition, the national library of canada wanted to continue its interaction with the library of congress and also to coordinate the development of a canadian marc with international developments. the formats recommended are for the communication of machine-readable cataloging data. the processing of the data by local libraries was not ignored.
it was recognized that this could involve (1) expansion of the format to accommodate processing data (e.g., for acquisitions, serial control); and (2) the development of data-format-independent software for effective data storage and retrieval (e.g., a data management system with logical and physical characteristics of data described independently of specific applications software). the marc task group was established as a result of the recommendations of the conference on cataloguing standards held at the national library of canada in may 1970. the mission of the task group was to study the requirements for a format for machine-readable bibliographic records to be used in canada. the group was not to concern itself with cataloging standards as such, since these were to be considered by the task group on cataloguing standards. the marc task group limited its attention to monographs and serials because this was the greatest need at the time. it was felt that after development of these two basic formats, i.e., monographs and serials, other formats for films, manuscripts, maps, etc., could be more logically developed. recognizing that canada has two official languages and that this creates specific bibliographic needs, the task group's first recommendation was that the national library of canada assume the responsibility for developing a distinctive canadian marc format. variations from the library of congress format are to be kept to a minimum, due to:
• economic considerations.
• dedication of the canadian library community (in common with the library of congress) to the full application of the aacr, american edition, and the "version française."
• willingness of canada to continue heavy reliance upon the library of congress for answering its bibliographical needs in both the traditional way and in machine-readable form.
• readiness of canada to accept future bibliographic developments and amendments proposed by the library of congress, e.g., new filing rules.
it is further recommended that:
• the development of a separate canadian marc be coordinated with international developments such as isbd (international standard bibliographic description) and isds (international serials data system).
• the national library of canada adopt the precis (preserved context index system) developed for bnb for the purpose of adding subject data to marc records for canadian publications in the form of descriptors.
• any new data elements and varying levels of completeness of data introduced into the format in the future (for other media, specialized collections, or retrospective conversions) not conflict with the basic specifications recommended for canadian marc.
several studies were made by the task group. one addressed the need for marc formats and the user requirements for such formats, keeping in mind the need for bilingual content, in the perspective of an international marc, with respect to data for author, title, collation and notes, geographic names, and subject. format requirements were based on a comparison of the united states and united kingdom formats and the examination of the italian and other national marc formats. an intensive study was made of the proposed library of congress format for serials. the implications and requirements for a marc format to be used in conjunction with information retrieval and indexing systems were also examined. the best formats were then defined and recommended to the national librarian.
5/2 june , 1972 the format recommended for monographs may be summarized as follows: 1. the tags are mainly from the library of congress marc-ii, with adoptions from bnb and monocle. particular attention was paid to avoiding conflict with any of the national formats. the library of congress 900 tags were expanded to provide canadian libraries the option of selecting data in bi-lingual content, i.e., the data for the secondary entry fields could be represented in either the french or english equivalent. 2. the indicators specified in the library of congress format have been retained. some additional ones from bnb and monocle have been added. 3. the subfield codes of the library of congress format have been used most often with additional ones from bnb. there is no basic conflict with the library of congress marc. canadian marc is more specific and the more precise specifications are hospitable to the library of congress format. it was felt that the subfielding for filing values or relationships found in monocle could be met by software. 4. descriptive and bibliographic content are not altered in any way since they are dealt with by cataloging codes. however, for codified content (e.g., codes for language, geographic area, bibliographic area, intellectual level), use of standard international codes is recommended. meanwhile, library of congress marc-ii codes will be used for some fields, e.g., languages, geographic area . for serials, it was the intention of the task group to maintain compatibility with the canadian marc format for monographs. however, it was necessary to study the proposed formats for serials issued by the library of congress, mass-a marc-based automated serials system proposed in the united kingdom by the birmingham libraries co-operative mechanisation project, and the french monocle. the proposed canadian marc format for serials has been based on the recommendation for the processing of serials issued by the task group on cataloguing standards. data elements were isolated to meet special applications such as: 1. the preparation of union lists for serial holdings with minimal bibliographic data (e.g., by broad subject groupings, by form division). 2. the bibliographic description of canadian serials for a national bibliography. 3. the development of local library in-house systems for acquisition, processing, and control of serials. 4. the preparation of a canadian serials directory incorporating a minimum of data and with a constant update facility. book reviews 153 this diversity of requirements led the task group to state several beliefs. first, the isolation of data elements for local library in-house systems and the compatibility of these data elements to allow for the exchange of computer programs can best be done by allocating a tag structure in a format separate to the main serials communication format. second, there is a requirement for the relating of entries in the serial and monograph format (e.g., monographs in series which may appear in either format). if an exchange of data between the two formats is necessary, there may be a need to have an additional tag or a more extensive tagging structure for titles and series title entries. the specific recommendations for serials were that the national library should: 1. participate in the unesco proposals for an international serials data system in which the isolation of data elements for international exchange will have a direct bearing on the elements in a canadian marc serials format. 2. 
immediately initiate any action deemed advisable within the international proposals to provide standard serial numbers for canadian serial publications. 3. consider the preparation of a canadian serials directory as a separate project. 4. initiate a pilot project with other libraries to test the proposed canadian serials format prior to full implementation. 5. on the basis of the above recommendations, explicitly state which data elements are necessary. (the proposed format for serials has those elements asterisked that the task group believed were not necessary; these are all processing control-oriented, e.g., frequency control, publication patterns, and indexing and abstracting coverage.) the report includes three comparative tables to be used in evaluating the proposed canadian marc formats. table 1 compares, for monographs, the library of congress, united kingdom, french (monocle), and italian formats against the format proposed for canada. table 2 compares the library of congress proposed format and the mass format for serials against the format proposed for canada. table 3 compares the canadian format for monographs against the canadian format for serials. copies of table 1 were submitted to the united states, the united kingdom, france, and italy for review and comments. the resulting revisions were not incorporated in the report since this would have delayed publication. the tagging structure, therefore, may be slightly revised when the canadian marc user's manual is finalized. however, those interested in the compatibility of the canadian formats with the library of congress formats and the implications of the canadian formats for an international marc format will find the tables sufficient. lillian h. washington

monocle: projet de mise en ordinateur d'une notice catalographique de livre. publications de la bibliotheque universitaire de grenoble, 4. [par] marc chauveinc. 2eme ed. grenoble: bibliotheque interuniversitaire, 1972. 197 p. plus 25 annexes and errata.

a review of the 1st edition of monocle appeared in jola in march 1971 (v. 4, no. 1, pp. 57-58). readers are referred to that review and to the article by m. chauveinc in the september 1971 issue of jola (v. 4, no. 3) for a description of the structure of monocle. the format has undergone little change in essentials, but many changes in detail have been made. new fields have been added (249: abridged title of periodical; 270: printer's imprint; 545: note showing title of periodical analyzed), subfield codes have been changed or added, new indicators have been created (see below), and the names (and therefore the contents) of some fields have been changed (cf. 241 and 242). the leader has been enlarged from 19 to 24 bytes to show more exactly the address of the index related to a particular bibliographic record (4 new bytes) and to show the current number of fields in the record (2 new bytes) and the current length of the record (2 new bytes) as well as the initial number of fields and the initial length. the length of the index is no longer given. thus the leader makes use of 8 new bytes and has discontinued 2 (only 18 of the original 19 were utilized). what has remained unchanged is the emphasis on coding for filing arrangement and on the use of tags to identify not only the nature of a field but its different functions and its relationship with other data.
there is increased emphasis, however, on the importance of the integration and collaboration of several libraries in automation activities and, therefore, on the need for monocle to be generalized so that it is usable by institutions with other goals, hardware, and processing languages than the university of grenoble. mention is made throughout the volume of the variant approach of the bibliotheque nationale, which uses monocle to prepare the bibliographie de la france. one change in the second edition is the increased awareness of the complexities involved in dealing with subrecords. the use of the subrecord technique has therefore been limited to works meeting certain requirements. the requirements are so strict that, for all practical purposes, grenoble does not use subrecords; instead, it uses secondary entries, or series headings, or contents notes. an important change has been made in the first indicator position of personal name fields (100, 400, 600, 700, 800, 900), which, in the 1st edition, was similar to marc. a new indicator structure has been created to facilitate construction of sort keys. a first indicator of '0' is used for forenames of saints, popes, and emperors. a '1' indicates a name that is to be filed exactly as given, whether it is a forename, simple surname, or multiple surname. a '2' is used for multiple surnames containing a hyphen that is to be replaced by a blank, e.g., saint-exupery. a '3' is used when a name contains a blank, apostrophe, or hyphen that is to be deleted, e.g., la fontaine. a '4' is used for complex names, whether simple or multiple, in which it is necessary to keep some blanks and/or letters and to delete others. for this purpose, monocle makes use of three vertical bars to distinguish text to be printed and used for sorting from text to be printed only from text (supplied) to be used only for sorting. since the three bars are used only in fields with a 1st indicator of '0' or '4', the use of these indicators enables the program to test for them only when these indicators are present instead of in every field. the 1st indicator of '4' is used for complex arrangements utilizing the three bars in other fields as well: 110, 111, 241, 243, 245, 410, 411, 441, 443, 445 and the equivalent 6xx, 7xx, 8xx, and 9xx fields. the errors in this volume are minor. monocle still lists field 653 (proper names incapable of authorship) as an lc subject field, although this field was discontinued almost as soon as it was created, so that it doesn't even appear in the 1st edition (1969) of the marc manuals. in a discussion of the use of terminals to catalog books, it footnotes 'the library' of 'ohio college' rather than 'the libraries' affiliated with the ohio college library center. the review of the 1st edition pointed out that one of the values of monocle for american librarians was the light it threw on marc. that statement still holds true. to facilitate such use, an english-language translation might be of value. judith hopkins
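before leaving monocle, the filing convention described in the review above can be made concrete with a short sketch. the python code below is not taken from monocle; it is a hypothetical, present-day illustration of how the simpler indicator values ('1', '2', '3') might be turned into sort keys, with the three-vertical-bar markup used by indicators '0' and '4' left as a comment because its exact segmentation is not spelled out above.

    # illustrative sketch only; not monocle's actual code.
    # derives a filing (sort) key for a personal-name field from its
    # first indicator, following the conventions described in the review.
    def filing_key(name, first_indicator):
        if first_indicator == "1":
            # file exactly as given (forename, simple or multiple surname)
            return name
        if first_indicator == "2":
            # multiple surname: the hyphen is replaced by a blank,
            # e.g. "saint-exupery" files as "saint exupery"
            return name.replace("-", " ")
        if first_indicator == "3":
            # blanks, apostrophes, and hyphens are deleted,
            # e.g. "la fontaine" files as "lafontaine"
            return "".join(c for c in name if c not in " '-")
        # indicators "0" and "4" rely on monocle's three vertical bars,
        # which separate text that is printed and sorted, printed only,
        # and (supplied) sorted only; that parsing is omitted here.
        return name

    print(filing_key("saint-exupery", "2"))   # saint exupery
    print(filing_key("la fontaine", "3"))     # lafontaine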
key papers in information science. edited by arthur w. elias. washington, d.c.: american society for information science, 1971. 223 p. $6.00.

when i re-read the articles making up this volume for the purpose of writing this review, a strong feeling of nostalgia welled up. as a reader who has lived through the years of speculation, exploration, experiment, development, and debate that they embody, i couldn't help but feel again the spirit of excitement that i and others felt at the time. these are indeed "key papers," and it's valuable to have them together. oh, of course some names are missing and are missed (mooers, taube, fairthorne, perry and kent, bar-hillel, bush, shaw), but enough of them are here to give a full flavor of the times. the question is whether, as a collection, this set of papers has value beyond nostalgia. before turning to that question, however, let's see what they consist of.
the volume groups nineteen papers into four categories: (1) background and philosophy, (2) information needs and systems, (3) organization and dissemination of information, and (4) other areas of interest. the first includes papers by borko, by shera, and by otten and debons that attempt to define information science, its relationship to librarianship, and its potential as an independent discipline. the second includes papers by weinberg, by murdock and liston, by taylor, by parker and paisley, and by kertesz that outline the purposes and functions of information transfer, especially for the sciences. the third includes papers by doyle, by fischer, by conner, and by rees that present some of the techniques which have been developed for handling, organizing, and presenting information, especially mechanized ones such as kwic indexes, automatic indexing and abstracting, and sdi. the final section presents a potpourri of topics: a paper by lipetz on information storage and retrieval, one by de gennaro on library automation, one by garvin on natural language, one by borko on systems analysis, and one by heilprin on technology and copyright. the defined purpose of this collection is to serve students and instructors in introductory courses in information science, by making these key papers readily available as assigned readings. they indeed are useful readings, and the organization imposed on them by the editor, elias, adds greatly to their usefulness, making them far more than a simple chronological listing. despite this, however, i must confess that, as the instructor in an introductory course in which we used the key papers for the purpose for which it was intended, it fell short of meeting the needs. since then, i've tried to evaluate why. recognizing that the difficulties may have been due to the style of the instructor and the form of the course, the fact is that any collection of readings, valuable though the items individually may be, has many deficiencies. i suppose they can all be summed up as follows: a collection of papers has the appearance of a book without being a book. it lacks congruity; it lacks balance; it lacks inherent structure in contrast to that which is imposed; it lacks a theme or point to be made; it lacks a consistent style. as a sometime publisher, as an editor of a series of books, and as a reviewer of prospective manuscripts, i have felt that these things are as important in evaluation as substance and content. beyond this, a more important fact is that these papers, "key" though they are, represent the past, not the present. an introduction to information science requires reading assignments in the work of today, not just those of historical importance. on the other hand, the fact remains that these are important papers, ones with which students should become familiar, and not simply for historical purposes, and that most instructors and classes should find this a useful volume. robert m. hayes, becker & hayes, inc.

analyzing digital collections entrances: what gets used and why it matters
paromita biswas and joel marchesoni
information technology and libraries | december 2016

abstract

this paper analyzes usage data from hunter library's digital collections using google analytics for a period of twenty-seven months from october 2013 through december 2015. the authors consider this data analysis to be important for identifying collections that receive the largest number of visits.
we argue this data evaluation is important in terms of better informing decisions for building digital collections that will serve user needs. the authors also study the benefits of harvesting to sites such as the digital public library of america, and they believe this paper will contribute to the literature on google analytics and its use by libraries. introduction hunter library at western carolina university (wcu) has fourteen digital collections hosted in contentdm—a digital collection management system from oclc. users can enter the collections in various ways—through the library’s contentdm landing pages,1 search engines, or sites such as the digital public library of america (dpla) where all the collections are harvested.2 since october 2013, the library has collected usage data from its collections’ websites and from dpla referrals via google analytics. this paper analyzes this usage data covering a period of approximately twenty-seven months from october 2013 through december 2015. the authors consider this data analysis important for identifying collections receiving the largest number of visits, including visits through harvesting sites such as the dpla. the authors argue that such data evaluation is important because it can better inform decisions taken to build collections that will attract users and serve their needs. additionally, this analysis of usage data generated from harvesting sites such as the dpla demonstrates the usefulness of harvesting in increasing digital collections’ usage. lastly, this paper contributes to the broader literature on google analytics and its use by libraries in data analysis. literature review using google analytics to study usage of electronic resources is common; a considerable amount of material exists describing the use of google analytics in marketing and business fields.3 paromita biswas (pbiswas@email.wcu.edu) is metadata librarian and joel marchesoni (jmarch@email.wcu.edu) is technology support analyst, hunter library, western carolina university, cullowhee, north carolina. analyzing digital collections entrances: what gets used and why it matters | biswas and marchesoni | https://doi.org/10.6017/ital.v35i4.9446 20 however, the published literature offers little about the use of this software for studying usage of collections consisting of unique materials digitized and placed online by libraries and cultural heritage organizations. for example, betty has written about using google analytics to track statistics for user interaction with librarian-created digital media such as quizzes and video tutorials.4 fang discusses using google analytics to track the behavior of users who visited the rutgers-newark law library website.5 fang looked at the number of visitors, what and how many pages they visited, how long they stayed on each page, where they were coming from, and which search engine or website had referred them to the library’s website. findings were evaluated and used to make improvements to the library’s website. for example, fang mentions using google analytics data for tracking the percentage of new and returning visitors before and after the website redesign. among articles that discuss using web analytics to learn how users access digital collections, most have focused on a comparison between third-party platforms, online search engines, and the traditional library catalog to find preferred modes of access and whether results call for a shift in how libraries share their digital collections. 
for example, in their article on the impact of social media platforms such as historypin and pinterest on the discovery and access of digital collections, baggett and gibbs use google analytics for tracking usage of digital objects on the library’s website as well statistics collected from historypin’s and pinterest’s first-party analytics tools.6 the authors conclude that while neither historypin nor pinterest drive users back to the library’s website, they help in the discovery of digital collections and can enhance user access to library collections. schlosser and stamper compare the effects on usage of a collection housed in an institutional repository and reposted on flickr.7 whether housing a collection on a third-party site had an adverse effect on attracting traffic to the library’s website was not as important as ensuring users accessed the collection somewhere. likewise, o’english demonstrates how data from web analytics were used to compare access to archival materials via online search engines as opposed to library catalogs using marc records for descriptions.8 o’english argues library practices should change accordingly to promote patron access and use. ladd’s article on the access and use of a digital postcard collection from miami university uses statistics from google analytics, contentdm, and flickr over a period of one year.9 ladd’s findings reveal that few users came to the main digital collections website to search and browse; instead, most arrived via external sources such as search engines and social media sites. the resulting increase in views makes it imperative, ladd asserts, that regular updates both in contentdm and flickr are important for promoting access and use of the postcards. articles on using google analytics for tracking digital collection usage have explored tracking the geographic base of users. for example, herold uses google analytics to demonstrate usage of a digital archival collection by users at institutional, national, and international levels.10 herold looks at server transaction logs maintained in google analytics, onand off-campus searching counts, user locations, and repeat visitors to the archival images representing cultural heritage materials related to orang asli peoples and cultures of malaysia. she uses these data to ascertain information technology and libraries | december 2016 21 the number of users by geographic region and determine that, while most visitors came from the united states, malaysia ranked second. the data supported, according to herold, that this particular digital collection was able to reach another target audience: users from malaysia. herold’s findings indicate that digitization of unique materials makes them available to a worldwide audience. whether harvesting has increased usage of digital collections available via dpla or its hubs has received limited exploration in the literature. most writings on harvesting digital collections have focused more on the technical aspects of the process, like the dpla’s ingestion method, the quality and scalability of metadata remediation and enhancement,11 and large metadata encoding.12 for example, gregory and williams write about the north carolina digital heritage center as one of the service hubs of the dpla. the service hubs are centers that aggregate digital collection metadata provided by institutions for harvesting by the dpla. 
the authors discuss metadata requirements, software review, and establishment of workflow for sending large metadata feeds to the dpla.13 boyd, gilbert, and vinson, in their article on the south carolina digital library (scdl), another service hub for dpla, describe the planning behind setting up the scdl, its management, and the technology involved in metadata harvesting.14 freeland and moulaison discuss the missouri hub as a model for “institutions with similar collective goals for exposing and enriching their data through the dpla.”15 according to them, by harvesting their metadata to the dpla, institutions are able to share their digital collections with the broader public. additionally, institutions that harvest metadata to the dpla get value-added services like geocoding of locationbased metadata and expression of contributed metadata as linked data. data collection parameters hunter library digital collections usage data included information on item views16 and referrals17 for each of the collections including dpla referrals. the authors also considered keyword search terms18 across all referrals, and within contentdm specifically, that brought users to the library’s collections. the authors considered the most frequently occurring keywords to be representing the subjects of collections that were most used. repeat visitors to the library’s digital collections’ website were also tracked. finally, sessions19 were traced by the geographic area20 of the users. hunter library’s collections vary in size. the library’s largest and one of the oldest collections, craft revival [note: collections are set in roman and capitalized] showcases documents, photographs, and craft objects housed in hunter library and smaller regional institutions. the collection’s items represent the late nineteenth and early twentieth century (1890s–1940s) craft revival movement in western north carolina, which was characterized by a renewed interest in handmade objects, including cherokee arts and crafts. the craft revival collection began in 2005 and includes 1,982 items. the second largest collection, great smoky mountains, which highlights efforts that went into the establishment of the park and includes photographs on the landscape and flora and fauna in the park, began in 2012 and consists of 1,829 items. not all digital analyzing digital collections entrances: what gets used and why it matters | biswas and marchesoni | https://doi.org/10.6017/ital.v35i4.9446 22 collections were harvested to the dpla at the same time. while some older collections were harvested to the dpla in 2013, smaller, institution-specific collections started later were also harvested later. for example wcu—oral histories, a collection of interviews collected by students of one of wcu’s history classes documenting the history and culture of western north carolina and the lives of wcu athletes or artists’ like josephina niggli who taught drama at wcu; highlights from wcu, a collection of unique items from wcu’s mountain heritage center and other departments on campus, including letters from the library’s special collections transcribed by wcu’s english department students; and wcu—fine art museum, showcasing art work from the university’s fine art museum, were harvested to the dpla in 2015. 
as these smaller collections were started later, their total item views and referral counts would likely be less than some of the library's older collections; however, these newer collections were included as they might provide valuable data regarding harvesting referrals and returning visitors. table 1 shows the years the collections were started, the number of items included in each collection, and the year they were harvested to the dpla.

collection name | start year | collection size (number of items) | harvested since
cherokee traditions | 2011 | 332 | 2013
civil war | 2011 | 68 | 2013
craft revival | 2005 | 1,982 | 2013
great smoky mountains | 2013 | 1,829 | 2013
highlights from wcu | 2015 | 39 | 2015
horace kephart | 2005 | 552 | 2013
picturing appalachia | 2012 | 972 | 2013
stories of mountain folk | 2012 | 374 | 2013
travel western north carolina | 2011 | 160 | 2013
wcu—fine art museum | 2015 | 87 | 2015
wcu—herbarium | 2013 | 91 | 2013
wcu—making memories | 2012 | 408 | 2013
wcu—oral histories | 2015 | 67 | 2015
western north carolina regional maps | 2015 | 37 | 2015
table 1. collections by year

collecting data using google analytics

the library has had google analytics set up on online exhibits—websites outside of contentdm that provide additional insight into the collection—since 2008 and began using google analytics to track its contentdm materials with the 6.1.2 release in october 2013. contentdm version 6.4 introduced a configuration field that allowed the authors to enter a google analytics id and automatically generate the tracking code in pages to simplify the setup. following that software update, oclc made google analytics the default data logging mechanism. the library set up google analytics such that online exhibits are tracked together with their contentdm collections. this is accomplished by using custom tracking on all webpages and a custom script in contentdm. this allows the library to link its contentdm and wcu.edu domains within google analytics so that sessions can be viewed across all online digital collections. data were collected from google analytics using several tools. google provides an online tool called query explorer (https://ga-dev-tools.appspot.com/query-explorer/) that can create and execute custom searches against google analytics. this application was used to craft the queries. microsoft excel was primarily used to download data, using the custom plugin rest to excel library (http://ramblings.mcpher.com/home/excelquirks/json/rest) to parse information from google analytics into worksheets. the excel add-on works well, but requires knowledge of microsoft visual basic for applications (vba) programming to use effectively. this limitation prompted the authors to look for a simpler way of retrieving data. the authors found openrefine (https://github.com/openrefine/openrefine) to collect, sort, and filter data, with excel used for results analysis. once in excel, formulas were used to mine data for specific targets.

results analysis

the data collected using google analytics spanned a period of approximately twenty-seven months, from october 2013 through december 2015. table 2 and graph 1 show each collection's item views, item referrals, and size (number of items in the collection). these numbers were calculated for each collection as a percentage of total item views, total item referrals, and total number of items for all collections together. in table 2, the top five collections in terms of item views and referrals are highlighted.
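the query explorer and excel workflow described in the preceding section can also be reproduced programmatically. the sketch below is not the authors' method; it is a hypothetical python alternative that calls google's analytics reporting api (v4) through the google-api-python-client library. the credential file name and view id are placeholders, and ga:pagePath, ga:source, and ga:pageviews are standard universal analytics dimension and metric names.

    # hypothetical sketch: pulling per-page pageview counts for the study
    # window directly from the analytics reporting api, as an alternative
    # to the query explorer / excel plugin workflow described above.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        "service-account.json",          # placeholder credential file
        scopes=["https://www.googleapis.com/auth/analytics.readonly"],
    )
    analytics = build("analyticsreporting", "v4", credentials=creds)

    response = analytics.reports().batchGet(body={
        "reportRequests": [{
            "viewId": "12345678",        # placeholder view id
            "dateRanges": [{"startDate": "2013-10-01", "endDate": "2015-12-31"}],
            "metrics": [{"expression": "ga:pageviews"}],
            "dimensions": [{"name": "ga:pagePath"}, {"name": "ga:source"}],
            "pageSize": 10000,
        }]
    }).execute()

    # each row pairs a page path and traffic source with its pageview count
    for row in response["reports"][0]["data"].get("rows", []):
        path, source = row["dimensions"]
        views = row["metrics"][0]["values"][0]
        print(path, source, views)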
graph 1, a graphical representation of table 2, displays more starkly the differences between collections in terms of views and referrals.

collection name | item views as percentage of total views | item referrals as percentage of total referrals | number of items as percentage of total items for all collections
cherokee traditions | 6.38 | 6.12 | 4.74
civil war | 1.89 | 0.88 | 0.97
craft revival | 41.35 | 52.39 | 28.32
great smoky mountains | 7.50 | 6.34 | 26.14
highlights from wcu | 0.23 | 0.08 | 0.56
horace kephart | 11.67 | 7.62 | 7.89
picturing appalachia | 10.03 | 9.99 | 13.89
stories of mountain folk | 3.51 | 2.45 | 5.344
travel western north carolina | 7.87 | 9.57 | 2.29
wcu—fine art museum | 0.19 | 0.08 | 1.24
wcu—herbarium | 0.71 | 0.45 | 1.30
wcu—making memories | 7.13 | 2.64 | 5.83
wcu—oral histories | 0.80 | 1.08 | 0.96
western north carolina regional maps | 0.26 | 0.11 | 0.53
total | 100.00 | 100.00 | 100.00
table 2. collections by percentage

graph 1. collections by percentage

as demonstrated in the preceding table and graph, craft revival, one of the library's oldest and largest collections, contributes more than 28 percent of all digital collections' items and garners close to 42 percent of all item views and 53 percent of all item referrals. great smoky mountains, the second largest collection, contributes a little more than 26 percent of items but receives only about 8 percent of all item views and 7 percent of all referrals. the horace kephart collection, focusing on the life and works of horace kephart—author, librarian, and outdoorsman who made the mountains of western north carolina his home later in life—is the library's fourth largest collection. it receives almost 12 percent of all item views and about 8 percent of all item referrals. picturing appalachia, the third largest collection—consisting of photographs showcasing the history, culture, and natural landscape of southern appalachia in the western north carolina region—makes up 14 percent of items and receives approximately 10 percent of all referrals and views. travel western north carolina—visual journeys of western north carolina communities through three generations—contributes fewer than 3 percent of items but scores high on both item views and referrals. wcu—making memories, which highlights the people, buildings, and events from wcu's history, and stories of mountain folk (somf), which is a collection of radio programs from western north carolina non-profit catch the spirit of appalachia and archived at hunter library, are similar in size, with each receiving fewer than 3 percent of all item referrals. however, wcu—making memories receives more than 7 percent of all item views compared to somf's almost 4 percent. these findings are not surprising, as the making memories collection documents western carolina university's history and may receive many views from within the institution. overall, however, the craft revival collection can be considered the library's most popular collection. the horace kephart collection appears to be the second most popular collection. and, not surprisingly, cherokee traditions, a collection of art objects, photographs, and recordings similar in content to the craft revival collection in terms of its focus on cherokee culture and history, is quite popular and receives more item referrals than both wcu—making memories and somf and more item views than somf (table 2).
an analysis of keyword searches within contentdm and keyword searches across all referral sources reiterates these findings. as part of the analysis, data collected for this twenty-seven-month period for the top keyword searches within contentdm and the top keyword searches counting all referrals was recorded in an excel spreadsheet and then uploaded to openrefine. openrefine allows text and numeric data to be sorted by name (alphabetical) and count (highest to lowest occurring). once the excel spreadsheet was uploaded to openrefine, keywords were sorted numerically and clustered. openrefine has a "cluster" function to bring together text that has the same meaning but differs by spelling or capitalization (for example, "Cherokee," "cherokee," "cheroke") or by order (for example, "jane smith," "smith, jane"). the clustering function provides a count of the number of times a keyword was used regardless of exact spelling. after identifying keywords belonging to a cluster (for example, a cluster of the word "cherokee" spelled differently), the differently spelled or organized keywords in each cluster were merged in openrefine with their most accurate counterparts. finally, it should be noted that keywords including "!" and "+" symbols were most likely generated from either using multiple search terms within contentdm's advanced search or from curated search links maintained on some of our online exhibit websites. these links take users to commonly used result sets within the collection. tables 3 and 4 provide a listing of the ten most frequently searched keywords within contentdm and across all referrals and the names of collections that are most relevant to these searches.

keywords | occurrence count | relevant collection(s)
cherokee | 187 | craft revival; cherokee traditions
cherokee language | 107 | craft revival; cherokee traditions
southern highland craft guild | 98 | craft revival
basket!object | 96 | craft revival; cherokee traditions
indian masks—appalachian region, southern | 83 | craft revival; cherokee traditions
basket!photograph postcard | 82 | craft revival; cherokee traditions
w.m. cline company | 78 | picturing appalachia; craft revival
cherokee +indian! photograph | 72 | craft revival; cherokee traditions
wood-carving—appalachian region, southern | 70 | craft revival
indian wood-carving—appalachian region, southern | 69 | craft revival
table 3. top keyword searches within contentdm

keywords | number of sessions | relevant collection(s)
cherokee traditions | 442 | craft revival; cherokee traditions
horace kephart | 185 | horace kephart; great smoky mountains; picturing appalachia
cherokee pottery | 55 | craft revival; cherokee traditions
kephart knife | 50 | horace kephart
amanda swimmer | 37 | craft revival; cherokee traditions
appalachian people | 36 | craft revival; cherokee traditions; great smoky mountains; wcu—oral histories
cherokee indian pottery | 36 | craft revival; cherokee traditions
cherokee baskets | 34 | craft revival; cherokee traditions
weaving patterns | 33 | craft revival; cherokee traditions
basket weaving | 26 | craft revival; cherokee traditions
table 4. top keyword searches across all referrals
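openrefine's cluster function, used above to merge variant spellings, is built on key-collision methods such as fingerprint keying. the python below is not part of the authors' workflow; it is a rough sketch of the fingerprint idea, in which terms that normalize to the same key are grouped together.

    # rough illustration of fingerprint (key-collision) clustering, the
    # default method behind openrefine's cluster function.
    import re
    import unicodedata
    from collections import defaultdict

    def fingerprint(term):
        # strip accents, lowercase, drop punctuation, then sort the unique
        # tokens so that word order and capitalization no longer matter
        term = unicodedata.normalize("NFKD", term).encode("ascii", "ignore").decode()
        term = re.sub(r"[^\w\s]", " ", term.lower())
        return " ".join(sorted(set(term.split())))

    def clusters(keywords):
        groups = defaultdict(list)
        for kw in keywords:
            groups[fingerprint(kw)].append(kw)
        return [group for group in groups.values() if len(group) > 1]

    print(clusters(["Cherokee", "cherokee", "Jane Smith", "Smith, Jane", "cheroke"]))
    # "Cherokee"/"cherokee" and the two name orders collide on one key; the
    # misspelling "cheroke" does not, and would need one of openrefine's
    # nearest-neighbor methods rather than key collision.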
tables 3 and 4 show that top searches relate to arts and crafts from the western north carolina region ("baskets," "indian masks," "indian wood carving," "cherokee pottery"), artists ("amanda swimmer"), or topics relating to cherokee culture ("cherokee," "cherokee language"). searches relating to the horace kephart collection ("horace kephart," "kephart knife") are also popular, explaining the fact that the kephart collection, which accounts for fewer than 8 percent of the library's digital collections' items, scores highly in terms of item views (second) and referrals (fourth). the popularity of topics related to western north carolina is reiterated in the geographic base of the users. graph 2 shows north carolina accounts for most of the searches, with cities in western north carolina (asheville, franklin, cherokee, waynesville) accounting for more than 40 percent of sessions.

graph 2. cities by session count

the majority of item referrals come from search engines such as google, bing, and yahoo! graph 3 shows the percentage of item referrals from these external searches.21 however, the dpla also generates a fair amount of incoming traffic to the collections. for example, while all collections get referrals from the dpla, harvesting to the dpla is particularly useful for smaller collections such as highlights from wcu, wcu—fine art museum, and civil war collection. each of these collections gets 17 percent of referrals from the dpla, making dpla the largest referral source following the search engines for the highlights and fine art museum collections. graph 4 shows referrals each collection receives via the dpla as a percentage of total referrals. this indicates the usefulness of harvesting to the dpla. a trend seems also to show there is an increase in total referrals from dpla per month the longer items are in dpla (graph 5).

graph 3. percentage of search engine item referrals (google, bing, and yahoo!)
analysis of the transaction logs indicates that while all collections likely benefitted from harvesting, craft revival, cherokee traditions, and horace kephart (collections focusing on the culture and history of western north carolina) were the most heavily used and most visitors came from the state of north carolina and from the region in particular. search terms in the transaction logs also indicated a strong interest in items related to cherokee culture and horace kephart. as herold, who traced the second largest group of users of the orang asli digital image archive to malaysia notes, the geographic base of a collection’s users can be indicative of the popularity of a subject area.22 likewise, matusiak asserts that users’ comments can be indicative of the relevance of collections to users’ needs and provide direction for the future development of digital collections.23 as neither the craft revival, cherokee traditions, nor horace kephart collection includes items that relate specifically to the university’s history—unlike other institution-specific collections mentioned earlier—it is possible collection users may be more representative of the larger public than the university. these findings point to the need for questioning identification of an academic information technology and libraries | december 2016 31 library’s user base as mainly students and faculty of the institution and whether librarians should give greater consideration to the needs of a wider audience.24 data supporting the existence of this user base, whose true import or preferences might not be captured in surveys and questionnaires, can serve as a valuable source of information for individuals responsible for building digital collections. in an informal survey of hunter library faculty carried out by hunter library’s digital initiatives unit in september of 2014, respondents considered collections such as craft revival to be more useful to users external to the university. while the survey could allude to the nature of the user base of a collection like craft revival, it understandably could not capture the scale of the item views and referrals garnered by this collection as well as a usage data analysis could. on the other hand, analysis of usage data, as demonstrated in this paper, indicated that certain collections— highlights from wcu, wcu—fine art museum, and wcu—oral histories—possibly served a niche audience. these smaller and more recently established collections consisting of universitycreated materials attracted more returning visitors (see graph 6). these returning visitors were likely internal users whose visits indicated, as fang points out, a loyalty to these collections.25 in the paper “a framework of guidance for building good digital collections,” authored by the national information standards organization framework advisory group, the authors point out that while there are no absolute rules for creating quality digital collections, a good collection should include data pertaining to usage.26 the authors point to multiple assessment matrixes including using a combination of observations, surveys, experiments, and transaction log analyses. as the wcu digital collections findings demonstrate, a careful analysis of the popularity of collections can indicate the need for balancing quantitative data with more qualitative survey and interview data. 
these findings also indicate that usage data analysis can be very valuable in identifying the extent of collection usage by visitors who may not have significant survey representation. results from the small (fewer than ten respondents) wcu survey indicate that some respondents question the institutional usefulness of collections such as craft revival. these results show the importance of taking multiple factors into account when assessing user needs and interests in digital collections. conclusion the authors feel future projects might stem from this data analysis. for example, local subject fields based on the highest recurring keywords that were mined from the transaction logs can be added for all of hunter library’s digital collections. usage statistics at a later period could be evaluated to study if addition of user generated keywords increased use of any collection. as matusiak points out in her article on the usefulness of user-centered indexing in digital image collections, social tagging—despite its lack of synonym control or misuse of the singular and plural—is a powerful form of indexing because of “close connection with users and their language,” as opposed to traditional indexing.27 the terms users assign to describe images are also the ones they are most likely to type while searching for digital images. likewise, according to walsh, a analyzing digital collections entrances: what gets used and why it matters | biswas and marchesoni | https://doi.org/10.6017/ital.v35i4.9446 32 study conducted by the university of alberta found more than forty percent of collections reviewed used a locally developed classification for indexing and searching their collections, and many of these schemes could work well for searches within the collection by users who are familiar with the culture of the collection.28 usage-data analysis can constitute useful information that guides decisions for building digital collections that better serve user needs. it can identify a library’s digital collections’ users and what they want. these are important considerations to keep in mind if library services are to be all about engaging and building relationship with the users.29 harvesting to a national portal such as the dpla is beneficial for hunter library’s collections. at the same time, the library’s institution-specific collections receive more return visits, likely because of sustained interest from the large user base of the university’s students and employees, an assessment supported by survey findings. conversely, collections not so directly tied to the institution receive the most onetime item views and referrals. items that get used are a good indication of what users want and, as this paper demonstrates, the focus of academic digital library collections should consider the needs of both the university audience and the general public. references 1. a landing page refers to the homepage of a collection. 2. the dpla provides a single portal for accessing digital collections held by cultural heritage institutions across the united states. “history,” digital public library of america, accessed may 19, 2016, http://dp.la/info/about/history/. 3. paul betty, “assessing homegrown library collections: using google analytics to track use of screencasts and flash-based learning objects,” journal of electronic resources librarianship 21, no. 1 (2009): 75–92, https:// doi.org/10.1080/19411260902858631. 4. ibid. 5. 
wei fang, “using google analytics for improving library website content and design: a case study,” library philosophy and practice (e-journal), june 2007, 1-17, http://digitalcommons.unl.edu/libphilprac/121. 6. mark baggett and rabia gibbs, “historypin and pinterest for digital collections: measuring the impact of image-based social tools on discovery and access,” journal of library administration 54, no. 1 (2014): 11–22, https:// doi.org/10.1080/01930826.2014.893111. 7. melanie schlosser and brian stamper, “learning to share: measuring use of a digitized collection on flickr and in the ir,” information technology and libraries 31, no. 3 (september 2012): 85–93, https:// doi.org/10.6017/ital.v31i3.1926. information technology and libraries | december 2016 33 8. mark r. o’english, “applying web analytics to online finding aids: page views, pathways, and learning about users,” journal of western archives 2, no. 1 (2011): 1–12, http://digitalcommons.usu.edu/westernarchives/vol2/iss1/1. 9. marcus ladd, “access and use in the digital age: a case study of a digital postcard collection,” new review of academic librarianship 21, no. 2 (2015): 225–31, https://doi.org/10.1080/13614533.2015.1031258. 10. irene m. h. herold, “digital archival image collections: who are the users?” behavioral & social sciences librarian 29, no. 4 (2010): 267–82, https://doi.org/10.1080/01639269.2010.521024. 11. mark a. matienzo and amy rudersdorf, “the digital public library of america ingestion ecosystem: lessons learned after one year of large-scale collaborative metadata aggregation,” in 2014 proceedings of the international conference on dublin core and metadata applications (dcmi, 2014), 1–11, http://arxiv.org/abs/1408.1713. 12. oskana l. zavalina et al., “extended date/time format (edtf) in the digital public library of america’s metadata: exploratory analysis,” proceedings of the association for information science and technology 52, no. 1 (2015), 1–5, http://onlinelibrary.wiley.com/doi/10.1002/pra2.2015.145052010066/abstract. 13. lisa gregory and stephanie williams, “on being a hub: some details behind providing metadata for the digital public library of america,” d-lib magazine 20, no. 7/8 (july/august 2014): 1–10, https://doi.org/10.1045/july2014-gregory. 14. kate boyd, heather gilbert, and chris vinson, “the south carolina digital library (scdl): what is it and where is it going?” south carolina libraries 2, no. 1 (2016), http://scholarcommons.sc.edu/scl_journal/vol2/iss1/3. 15. chris freeland and heather moulaison, “development of the missouri hub: preparing for linked open data by contributing to the digital public library of america,” proceedings of the association for information science and technology 52, no. 1 (2015): 1–4, http://onlinelibrary.wiley.com/doi/10.1002/pra2.2015.1450520100105/abstract. 16. a single view of an item in a digital collection. 17. visits to the site that began from another site with an item page being the first page viewed. 18. keywords are words visitors used to find the library’s website when using a search engine. google analytics provides a list of these keywords. 19. a session is defined as a “group of interactions that take place on a website within a given time frame” and can include multiple kinds of interactions like page views, social interactions, and economic transactions. 
in google analytics, a session by default lasts thirty minutes, though analyzing digital collections entrances: what gets used and why it matters | biswas and marchesoni | https://doi.org/10.6017/ital.v35i4.9446 34 one can adjust this length to last a few seconds or several hours. “how a session is defined in analytics,” google, analytics help, accessed may 20, 2016, https://support.google.com/analytics/answer/2731565?hl=en. 20. locations were studied in terms of mostly cities and states. 21. the percentage is based on the total referral count a collection gets—for example, a 44 percent referral count for cherokee traditions would mean that the search engines account for 44 percent of the total referrals this collection gets. 22. herold, “digital archival image collections,” 278. 23. krystyna k. matusiak, “towards user-centered indexing in digital image collections,” oclc systems & services: international digital library perspectives 22, no. 4 (2006): 283–98, https://doi.org/10.1108/10650750610706998. 24. ladd, “access and use in the digital age,” 230. 25. fang points out that the improvements made to the rutgers-newark law library website could attract more return visitors and thus achieve loyalty. fang, “using google analytics for improving library website,” 11. 26. niso framework advisory group, a framework of guidance for building good digital collections, 2nd ed. (bethesda, md: national information standards organization, 2004), https://chnm.gmu.edu/digitalhistory/links/cached/chapter3/link3.2a.niso.html. 27. matusiak, “towards user-centered indexing,” 289. 28. john walsh, “the use of library of congress subject headings in digital collections,” library review 60, no. 4 (2011), https://doi.org/10.1108/00242531111127875. 29. lynn silipigni connaway, the library in the life of the user: engaging with people where they live and learn, (dublin: oclc research, 2015), http://www.oclc.org/research/publications/2015/oclcresearch-library-in-life-of-user.html. 10592 20190318 galley the map as a search box: using linked data to create a geographic discovery system gabriel mckee information technology and libraries | march 2019 40 gabriel mckee (gm95@nyu.edu) is librarian for collections and services at the institute for the study of the ancient world at new york university. abstract this article describes a bibliographic mapping project recently undertaken at the library of the institute for the study of the ancient world (isaw). the marc advisory committee recently approved an update to marc that enables the use of dereferenceable uniform resource identifiers (uris) in marc subfield $0. the isaw library has taken advantage of marc’s new openness to uris, using identifiers from the linked data gazetteer pleiades in marc records and using this metadata to create maps representing our library’s holdings. by populating our marc records with uris from pleiades, an online, linked open data (lod) gazetteer of the ancient world, we are able to create maps of the geographic metadata in our library’s catalog. this article describes the background, procedures, and potential future directions for this collection-mapping project. 
introduction since the concept of the semantic web was first articulated in 2001, libraries have faced the challenge of converting their vast stores of metadata into linked data.1 though bibframe, the planned replacement for the marc (machine-readable cataloging) systems that most american libraries have been using since the 1970s, is based on linked-data principles, it is unlikely to be implemented widely for several years. as a result, many libraries have delayed creating linked data within the existing marc framework. one reason for this delay has been the absence of a clear consensus in the cataloging community about the best method to incorporate uniform resource identifiers (uris), the key building block of linked data, into marc records.2 but recent developments have added clarity to how uris can be used in marc, clearing a path for projects that draw on uris in library metadata. this paper describes one such project undertaken by the library of the institute for the study of the ancient world (isaw) that draws on uris from the linked-data gazetteer pleiades to create maps of items in the library’s collection. a brief history of uris in marc over the last decade, the path to using uris in marc records has become more clear. this process began in 2007, when the deutsche nationalbibliothek submitted a proposal to expand the use of a particular marc subfield, $0 (also called “dollar-zero” or “subfield zero”), to contain control numbers for related authority records in main entry, subject access, and added entry fields.3 the proposal, which was approved on july 13, 2007, called for these control numbers to be recorded with a particular syntax: “the marc organization code (in parentheses) followed immediately by the number, e.g., (cabvau)2835210335.”4 this marc-specific syntax is usable within the marc environment, but is not actionable for linked-data purposes. a dereferenceable uri—that is, an identifier beginning with “http://” that links directly to an online resource or a descriptive information technology and libraries | march 2019 41 representation of a person, object, or concept—could be parsed and reconstructed, but only with a significant amount of human intervention and a high likelihood of error.5 in 2010, following a proposal from the british library, $0 was redefined to allow types of identifiers other than authority record numbers, in particular international standard name identifiers (isni), using this same parenthetical-prefix syntax.6 that same year, the rda/marc working group issued a discussion paper proposing the use of uris in $0, but no proposal regarding the matter was approved at that time.7 the 2010 redefinition made it possible to place uris in $0, provided they were preceded by the parenthetical prefix “(uri)”. however, this requirement of an added character string put marc practice at odds with the typical practices of the linked data community. not only does the addition of a prefix create the need for additional parsing before the uri can be used, the prefix is also redundant, since dereferenceable uris are self-identifying. 
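the syntactic difference at issue can be seen in a short illustration, written in conventional marc field notation. the fields below are hypothetical pairings of headings and identifiers, shown only to contrast the three shapes subfield $0 content has taken: the 2007 control-number convention, the parenthetical "(uri)" prefix required between 2010 and 2016, and the bare dereferenceable uri permitted after the change described below.

    100 1# $a chauveinc, marc $0 (cabvau)2835210335
        (marc organization code plus control number, per the 2007 definition)
    651 #0 $a rome (italy) $0 (uri)https://pleiades.stoa.org/places/423025
        (dereferenceable uri carrying the "(uri)" prefix required from 2010 to 2016)
    651 #0 $a rome (italy) $0 https://pleiades.stoa.org/places/423025
        (bare dereferenceable uri, usable after the prefix requirement was dropped)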
in 2015, the program for cooperative cataloging (pcc) charged a task group with examining the challenges and opportunities for the use of uris within a marc environment.8 one of this group's first accomplishments was submitting a proposal to the marc advisory committee to discontinue the requirement of the "(uri)" prefix on uris.9 though this change appears minor, it represents a significant step forward in the gradual process of converting marc metadata to linked data. linked data applications require dereferenceable uris. the requirement of either converting an http uri to a number string (as $0 required from 2007 to 2010), or prefixing it with a parenthetical prefix, produced identifiers that did not meet the definition of dereferenceability. as shieh and reese explain, the marc syntax in place prior to this redefinition was at odds with the practices used by semantic web services:

the use of qualifiers, rather than actionable uris, requires those interested in utilizing library metadata to become domain experts and become familiar with the wide range of standards and vocabularies utilized within the library metadata space. the qualifiers force human interaction, whereas dereferenceable uris are more intuitive for machines to process, to query services, to self-describe—a truly automated processing and a wholesome integration of web services.10

though it has been possible to use prefixed uris in marc for several years, few libraries have done so, in part because of this requirement for human intervention, and in part because of the scarcity of use-cases that justified their use. the removal of the prefix requirement brings marc's use of uris more into line with that of other semantic web services, and will reduce system fragility and enhance forward-compatibility with developing products, projects, and services. though marc library catalogs still struggle with external interoperability, the capability of inserting unaltered, dereferenceable uris into marc records is potentially transformative.11 following the approval of the pcc task group on uri in marc's 2016 proposal, libraries can work with limited linked data applications directly within marc, rather than waiting for the implementation of bibframe. by inserting actionable uris directly into marc records, libraries can begin developing programs, tools, and projects that draw on these uris for any number of data outcomes. in the last two years, the isaw library has taken advantage of marc's new openness to uris to create one such outcome: a bibliographic mapping project that creates browseable maps of items held by the library. the isaw library holds approximately 50,000 volumes in its print collection, chiefly focusing on archaeology, history, and philology of asia, europe, and north africa from the beginning of agriculture through the dawn of the medieval period, with a focus on cultural interconnections and interdisciplinary approaches to antiquity. the institute, founded in 2007, is affiliated with new york university (nyu) and its library holdings are cataloged within bobcat, the nyu opac. by populating our marc records with uris from pleiades, an online, linked open data (lod) gazetteer of the ancient world, the isaw library is able to create maps of the geographic metadata in our library's catalog.
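as a hedged illustration of what "drawing on these uris" can look like in practice, the sketch below uses the open-source pymarc library (an assumption on our part; the article does not prescribe specific tooling) to collect every dereferenceable uri found in the 651 fields of a file of bibliographic records. the file name is hypothetical.

```python
from pymarc import MARCReader

uris = set()
with open("isaw_records.mrc", "rb") as fh:           # hypothetical export of catalog records
    for record in MARCReader(fh):
        for field in record.get_fields("651"):       # subject added entry-geographic name
            for value in field.get_subfields("0"):   # $0 may now hold an unprefixed uri
                if value.startswith(("http://", "https://")):
                    uris.add(value)

print(f"found {len(uris)} distinct geographic uris to map")
```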
at the moment, this process is indirect and requires periodic human intervention, but we are working on ways of introducing greater automation as well as expanding beyond small sets of data to a larger map encompassing as much of our library's holdings as it makes sense to represent geographically.

map-based searching for ancient subjects

in the disciplines of history and archaeology, geography is of vital importance. much of what we know about the past can be tied to particular locations: archaeological sites, ancient structures, and find-spots for caches of papyri and cuneiform tablets provide the spatial context for the cultures about which they inform us. but while geospatial data about antiquity can be extremely precise, the text-based searching that is the user's primary means of accessing library materials is much less clear. standards for geographic metadata focus on place names, which open the door for greater ambiguity, as buckland et al. explain:

there is a basic distinction between place, a cultural concept, and space, a physical concept. cultural discourse tends to be about places rather than spaces and, being cultural and linguistic, place names tend to be multiple, ambiguous, and unstable. indeed, the places themselves are unstable. cities expand, absorbing neighboring places, and countries change both names and boundaries.12

nowhere is this instability of places and their names so clear as in the fields of ancient history and archaeology, which often require awareness of cultural changes in a single location throughout the longue durée. and yet researchers in these fields have had to rely on library search interfaces that rely entirely on toponyms for accessing research materials. scholars in these disciplines, and many others besides, would be well served by a method of discovering research materials that relies not on keywords or controlled vocabularies, but on geographic location. library of congress classification and subject cataloging tend to provide greater granularity for political developments in the modern era, presenting a challenge to students of ancient history. a scholar of the ancient caucasus, for example, is likely to be interested in materials that are currently classified under the history classes for the historical region of greater armenia (ds161-199), the modern countries of armenia (dk680), azerbaijan (dk69x), georgia (dk67x), russia (dk5xx), ukraine (dk508), and turkey (ds51, ds155-156 and dr401-741); for pre- and protohistoric periods, materials may be classified in gn700-890; and texts in ancient languages of the caucasus will fall into the pk8000-9000 range. moreover, an effective catalog search may require familiarity with the romanization schemes for georgian, armenian, russian, and ukrainian. materials on the ancient caucasus fall into a dozen or more call number ranges, and there is no single term within the library of congress subject headings (lcsh) that connects them—but if their subjects were represented on a map, they would fall within a polygon only a few hundred miles long on each side. this geophysical collocation of materials from across many classes of knowledge can enable unexpected discoveries. as bidney and clair point out, "organizing information based on location is a powerful idea—it has the capacity to bring together information from diverse communities of practice that a researcher may never have considered . . . 
‘place’ is interdisciplinary.”13 with this in mind, the isaw library has set out to create an alternative method of accessing items in its collection: a browseable, map-based interface for the discovery of library materials. literature review though geographic searching is undoubtedly useful for many different types of content, much of the work in using coordinate data and map-based representations of resources has centered on searching for printed maps and, more recently, geospatial datasets. in an article published in 2007, buckland et al. issued a challenge to libraries to complement existing text-string toponymic terminology with coordinate data.14 perhaps unsurprisingly, the most progress in meeting this challenge has been made in the area of cartographic collections. in a 2010 article, bidney discussed the library of congress’s then-new requirement of coordinates in records describing maps, and explores the possibility of using this metadata to create a geographic search interface.15 a 2014 follow-up article by bidney and clair expanded this argument to include not just cartographic materials, but all library resources, challenging libraries to develop new interfaces to make use of geospatial data.16 the most advanced geospatial search interfaces have been developed for cartographic and geospatial data. for example, geoblacklight (http://geoblacklight.org) offers an excellent map-based interface, but it is intended primarily for cartographic and gis data specifically, and not library resources more broadly. the mapfast project described by bennett et al. in 2011 pursues goals similar to our pleiadesbased discovery system.17 using fast (faceted application of subject terminology) headings, which are already present in many marc records, this project creates a searchable map via the google maps api. each fast geographic heading creates a point on the map which, when clicked, brings the user to a precomposed search in the library catalog for the corresponding controlled subject heading. one limitation to the mapfast model is the absence of geographic coordinates on many of the lc authority records from which fast headings are derived: at the time that bennett et al. described the project, coordinates were available for only 62.5 percent of fast geographic headings; additional coordinates came from the geonames database (http://www.geonames.org/).18 moreover, the method of retrieving these coordinates is based on text string matching, which introduces the possibility of errors resulting from the lack of coordination between toponyms in fast and geonames. in exploring other mapping projects, we looked most closely at projects with a focus on the ancient world, including pelagios (http://commons.pelagios.org), its geographic search tool peripleo,19 and china historical gis (chgis, http://sites.fas.harvard.edu/~chgis). as described by simon et al. 
in 2016, pelagios offers a shared framework for researchers in classical history to explore geographic connections, and several applications of its data resemble our desired outcome.20 similarly, merrick lex berman's work with the api provided by china historical gis in connection with library metadata provided important guidelines and points of comparison.21 we also explored mapping projects outside of the context of antiquity, including maphappy, the biodiversity heritage library's map of lcsh headings, and the map interface developed for phillyhistory.org.22

first steps: metadata

to develop a system for mapping the isaw library's collection, we began by working with smaller sets of metadata. our initial collection map, which served as a proof of concept, represented the titles available in the ancient world digital library (awdl, http://dlib.nyu.edu/ancientworld), an online e-book reader created by the isaw library in collaboration with nyu's digital library technical services department. when we initially created this interface, called the awdl atlas, awdl contained a small, manageable set of about one hundred titles. working in a spreadsheet, we assigned geographic coordinates to each of these titles and mapped them using google fusion tables (https://fusiontables.google.com). fusion tables, launched by google in june 2009, is a cloud-based platform for data management that includes a number of visualization tools, including a mapping feature that builds on the infrastructure of google maps.23 the fusion tables map created for awdl shows a pinpoint for each title in the e-book library; when clicked, each pinpoint gives basic bibliographic data about the title and a link to the e-book itself. one problem with this initial map was that it did little to show precision—a pinpoint representing a specific archaeological site in iraq looks the same on the map as a pinpoint representing the entirety of central asia. nevertheless, the basic functionality of the awdl atlas performed as desired, providing a geographic search interface for a concrete set of resources. for our next collection map, we turned our attention to our monthly lists of new titles in our library's collection. at the end of each month, nyu's library systems team sends our library a marc-xml report listing all of the items added to our library's collection that month. for several years now, we have been publishing this data on our library's website in human-readable html form and adding the titles to a library in the open-source citation management platform zotero, allowing our users multiple pathways to discovering resources within our collection.24 beginning in august 2016, we began creating monthly maps of these titles, using a variation of the workflow that we devised for the awdl atlas. to better represent the different levels of precision that each point represents, we implemented a color-coded range of four levels of precision, from site-specific archaeological publications to materials covering a broad, multi-country range, with a fifth category for cross-cultural materials and other works that can't be well represented in geographic form. (these items are grouped in the mediterranean sea on the monthly new titles maps, but in a full-collection map would most likely be either excluded or represented by multiple points, as appropriate.) the initial new titles maps took a significant amount of title-by-title work to create.
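the monthly workflow starts from that marc-xml report. a minimal sketch of reading such a report (again assuming pymarc, with a hypothetical file name) might pull out each title and its geographic headings before any coordinates are assigned:

```python
from pymarc import parse_xml_to_array

records = parse_xml_to_array("new_titles_2016_08.xml")   # hypothetical monthly report
for record in records:
    title_field = record["245"]                          # title statement
    title = title_field["a"] if title_field else "[no title]"
    headings = [f["a"] for f in record.get_fields("651") if f["a"]]
    print(title, "->", headings or "no geographic heading")
```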
coordinates and assessments of precision needed to be assigned for each title individually. we quickly began looking for ways to automate the process of geolocation, and soon settled on using data from pleiades to increase the efficiency of creating each map.25 we set our sights on marc field 651 (subject added entry-geographic name) as the best place in a marc record to put pleiades data. as a subject access field, the 651 is structured to contain a searchable text string and can also include a $0 with a uri associated with that text string. however, under current cataloging guidelines, catalogers are not free to use any uri they choose in this field: the library of congress maintains a list of authorized sources for subject terms to be used in 651 and other subject-access fields.26 in august 2016, the isaw library submitted a proposal to the library of congress for pleiades to be approved as a source of authoritative subject data and added to lc's list of subject heading and term source codes. the proposal was approved the following month, and by early 2017 the lc-assigned code was approved for use in oclc records. with this approval in place, we began incorporating pleiades uris in marc records for items held by the isaw library. we used the names of pleiades resources as subject terms in new 651 (subject added entry-geographic name) fields, specifying pleiades as the source of the subject term in subfield $2 and adding the pleiades uri in a $0:

figure 1. fields from a marc record showing an lcnaf geographic heading and the corresponding pleiades heading, with uri in $0.

figure 1 shows a detail from oclc record #986242751, which describes a book containing texts from cuneiform tablets discovered at the hittite capital city hattusa. this detail shows both the lcnaf and pleiades geographic headings assigned to this record. (in addition to providing a uri for the site, the pleiades heading also enhances keyword searches: the 651 field is searchable in the nyu library catalog, thus providing keyword access to one of the city's ancient names). the second 651 field contains a second indicator 7, indicating that the source of the subject term is specified in $2, where the lc-approved code "pleiades" is specified. this is followed by a $0 containing the uri for the pleiades place resource describing hattusa. our monthly reports of new titles now contain a field for pleiades uris. currently, we are not querying pleiades directly for coordinates, but rather are using the uri as a vertical-lookup term within a spreadsheet of each month's new titles, which is checked against a separate data file that matches pleiades uris to coordinate pairs.27 for places where no pleiades heading is available, we have begun using uris from the getty thesaurus of geographic names (tgn), marc-syntax fast identifiers, and unique lcnaf text strings, using the same vertical-lookup process to retrieve previously researched coordinate pairs for those places. next, we retrieve coordinates for newly appearing pleiades locations, research the locations of new non-pleiades places, and add both to the local database of places used. lastly, due to google fusion tables' inability to display more than one item on a single coordinate pair, prior to uploading the map data to fusion tables we examine it for duplicated coordinate pairs, manually altering them to scatter these points to nearby locations.
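a rough python equivalent of that spreadsheet-based vertical lookup and manual scattering might look like the following; file and column names are hypothetical, and the jitter distance is arbitrary:

```python
import csv
import random

# pleiades uri -> (lat, lon), built from a local file of previously researched coordinates
coords = {}
with open("places.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        coords[row["uri"]] = (float(row["lat"]), float(row["lon"]))

def scatter(point, seen, jitter=0.01):
    """nudge a duplicated point slightly so every title stays visible on the map."""
    lat, lon = point
    while point in seen:
        point = (lat + random.uniform(-jitter, jitter),
                 lon + random.uniform(-jitter, jitter))
    return point

seen = set()
with open("new_titles.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        point = coords.get(row["pleiades_uri"])
        if point is None:
            continue  # no pleiades heading: fall back to tgn, fast, or lcnaf lookups
        point = scatter(point, seen)
        seen.add(point)
        print(row["title"], point)
```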
the overall amount of time spent on cleaning data and preparing each month's map has decreased from more than a full day's work in august 2016 to about two hours in january 2018.

figure 2. a screenshot from the isaw library new titles map for january 2018, showing an item-specific information window (http://isaw.nyu.edu/library/find/newtitles-2017-18/2018-jan).

challenges

in developing the isaw library's mapping program, we had to overcome several challenges: early in the project, we needed to address the philosophical differences between how pleiades and lcnaf think about places and toponyms. the concept of "place" in pleiades is broad, and contains cities, structures, archaeological sites, kingdoms, provinces, and other types of administrative divisions, roads, geological features, culturally defined regions, and ethnic groups: "the term ['place'] applies to any locus of human attention, material or intellectual, in a real-world geographic context."28 in functional terms, a "place" in pleiades is a top-level resource containing one or more other types of data:

• one or more locations, consisting of either a precise point, an approximate rectangular polygon, or a precise polygon formed by multiple points;
• one or more names, in one or more ancient or modern languages;
• one or more connections to other place resources, generally denoting a geospatial or political/administrative connection.

locations, names, and connections contain further metadata, including chronological attestations and citations to data sources. no one of these components is a requirement—even locations are optional, as ancient texts contain references to cities and structures whose geospatial location is unknown. by contrast, library of congress rules focus almost exclusively on names—that is, text strings. there are two main categories of geographic names, as described in instruction sheet h690 of the subject headings manual (shm):

headings for geographic names fall into two categories: (1) names of political jurisdictions, and (2) non-jurisdictional geographic names. headings in the first category are established according to descriptive cataloging conventions with authority records that reside in the name authority file . . . headings in the second category are established . . . with authority records that reside in the subject authority file.29

the two categories—essentially definable as political entities and geographic regions—are both of interest to the shm only as represented by text strings. the purpose of identifying places within the framework of lc's guidelines is to enable text-based searching and collocation of items based on uniform, human-readable terminology. at the beginning of this project, it was important to acknowledge, explore, and understand this fundamental difference, and to understand the different purposes of an authority file (identifying unique text strings), a linked data gazetteer (assembling and linking many different kinds of geospatial and toponymic data), and our mapping project (identifying coordinate pairs related to specific library resources).
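each pleiades place resource also has a machine-readable serialization: appending /json to the place uri returns the locations, names, and connections described above. the sketch below reads one such resource with the requests library; the field names (title, reprPoint, names, connections) reflect the pleiades json export at the time of writing and should be checked against the current schema.

```python
import requests

uri = "https://pleiades.stoa.org/places/893951"   # babylon, one of the uris cited in note 30 below
place = requests.get(uri + "/json", timeout=30).json()

print(place["title"])                             # human-readable place name
print(place.get("reprPoint"))                     # representative [longitude, latitude], if a location exists
print([name.get("romanized") for name in place.get("names", [])])
print(len(place.get("connections", [])), "connections to other place resources")
```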
in our project, this philosophical gap manifested as a difference between the primary and secondary importance of authorized text strings and uris: in lcsh and lcnaf, the text string is primary, and the uri secondary (where it is used at all); in pleiades and many other linked-data sources, uris are primary and text strings secondary. lcsh and lcnaf text strings are unique, and can be considered as a sort of identifier, but they do not have the machine-readable functionality of a uri. in pleiades, the machine-readable uri is primary, and can be used to return coordinates, place names, and other human- or machine-readable data. the name of a pleiades place resource can be construed as a "subject heading," but these text strings are not necessarily unique, and additional data from the pleiades resource may be required for disambiguation by a human reader.30 toponymic terminology—that is, human-readable text strings—is just one type of data that pleiades contains, alongside geospatial data, temporal tags, and linkages between resources. one example of a recent change in pleiades data illustrates the fundamental difference in approach between authority control and uri management. until recently, pleiades contained two different place resources with the heading "aegyptus" (https://pleiades.stoa.org/places/766 and https://pleiades.stoa.org/places/981503), both referring to the general region of egypt. both of these resources were recently updated, and the title text of both was changed: /places/766 was retitled "aegyptus (roman imperial province)" and /places/981503 became "ancient egypt (region)." the distinction illustrates the difficulty in assigning names to places over long spans of time: egypt, as understood by pre-ptolemaic inhabitants of the nile region, had a different meaning than the administrative region established after octavian's defeat of marc antony and cleopatra—or, for that matter, from the predynastic kingdoms of upper and lower egypt, the ottoman eyalet of misr, and the modern republic of egypt. prior to this change in pleiades, both uris were applied to marc records for items held by the isaw library, under the heading "aegyptus." from a linked-data standpoint, there is no real problem here: the uris still link to resources describing different historical places called "egypt," including the coordinate data needed for isaw's collection maps. but from the standpoint of authority control, the subject term "aegyptus" on these records is now "wrong," representing a deprecated term, and should be updated. even here, though, a linked-data model has benefits that a text-string-based model lacks. even if they contain the same text string heading, the uri means there is no ambiguity between the two headings, and the text strings can be replaced with a batch operation based on the differences in their uris. getting away from text-string-based thinking will represent a major philosophical challenge for libraries as we move toward a linked data model for library metadata, but the many benefits of linked data will make that shift worthwhile.
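the kind of batch operation described above can be sketched in a few lines. the example below assumes pymarc and hypothetical file names; it replaces the deprecated heading string wherever a 651 field carries one of the two egypt uris, leaving every other field untouched.

```python
from pymarc import MARCReader, MARCWriter

# current pleiades titles keyed by uri; the uri, not the text string, identifies the place
retitled = {
    "https://pleiades.stoa.org/places/766": "aegyptus (roman imperial province)",
    "https://pleiades.stoa.org/places/981503": "ancient egypt (region)",
}

with open("isaw_records.mrc", "rb") as src, open("updated.mrc", "wb") as dst:
    writer = MARCWriter(dst)
    for record in MARCReader(src):
        for field in record.get_fields("651"):
            for uri in field.get_subfields("0"):
                if uri in retitled:
                    field.delete_subfield("a")                # drop the deprecated text string
                    field.add_subfield("a", retitled[uri])    # insert the current pleiades title
        writer.write(record)
    writer.close()
```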
google fusion tables represents a future hurdle that the isaw library's mapping project will need to clear. in december 2018, google announced that the fusion tables project would be discontinued, and that all embedded fusion tables visualizations will cease functioning on december 3, 2019.31 fortunately, the isaw library has already begun developing an alternative solution that does not rely on the deprecated fusion tables application. the core methodology used in developing our maps will remain the same, however. lastly, the geographic breadth of our collection reveals the limitations of pleiades as the sole data source for this project. at its inception, pleiades was focused on greco-roman antiquity, and though it has expanded over time, central and east asia—regions of central interest to the isaw library—are largely not covered. because all contributions to pleiades undergo peer review prior to being published online, pleiades' editors are understandably reluctant to commit to expanding their coverage eastward until the editorial team includes experts in these geographic areas. however, though we began this project with pleiades, there is no barrier to using other sources of geographic data, such as china historical gis, the getty thesaurus of geographic names (tgn, http://www.getty.edu/research/tools/vocabularies/tgn/index.html), geonames (http://www.geonames.org/), the world-historical gazetteer (http://whgazetteer.org/), or the library of congress's linked data service (http://id.loc.gov/). the same procedures we've used with pleiades can be applied to any reliable data source with consistently formatted data.

future directions

we have already begun to move away from the google fusion tables model, and are working to develop our own javascript-based map application using mapbox (https://www.mapbox.com/) and leaflet (https://leafletjs.com/). when completed, this updated mapping application will actively query a database of pleiades headings for coordinates, further automating the process of map creation. we are looking into different methods of encoding and representing precision—for example, using points and polygons to represent sites and regions, respectively. the leaflet map interface will also enable us to show multiple items for single locations, something fusion tables is unable to do, and will thus eliminate the need to manually deduplicate coordinate pairs. to expand the number of records that contain pleiades uris, we are developing a crosswalk between existing lc geographic headings and pleiades place resources. when completed, we will use this crosswalk to batch-update our older records with pleiades data where appropriate. the crosswalk will contain uris from both pleiades and the lc linked data service, and it will be provided to the pleiades team so that pleiades resources can incorporate lc metadata as well. we are also exploring further user applications of map-based search. one function we hope to develop is a geographic notification service, allowing users to define polygonal areas of interest on the map. when a new point is added that falls within these polygons, the user will be notified of a new item of potential interest. some user training will be required to ensure that users define their areas of interest in such a way that they will receive results that interest them—for example, a user interested in the roman empire will likely be interested in titles about the mediterranean region in general, and may need to draw their bounding box so that it encompasses the open sea as well as sites on land.
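the notification check itself is a standard point-in-polygon test. the sketch below uses the shapely library (an assumption; the production service is still being designed), with rough illustrative coordinates given in (longitude, latitude) order.

```python
from shapely.geometry import Point, Polygon

# a user-drawn area of interest covering much of the mediterranean basin
area_of_interest = Polygon([(-6.0, 30.0), (37.0, 30.0), (37.0, 46.0), (-6.0, 46.0)])

def notify_if_relevant(title, lon, lat):
    """print a notification when a newly mapped title falls inside the user's polygon."""
    if area_of_interest.contains(Point(lon, lat)):
        print(f"new title of potential interest: {title}")

notify_if_relevant("excavations at hattusa", 34.6, 40.0)   # inside the polygon: triggers a notification
notify_if_relevant("bronze age bactria", 66.9, 36.7)       # outside the polygon: silent
```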
it will also require thoughtfulness about where users are likely to look for points of interest, especially for empires and other historic entities that do not correspond to modern geopolitical boundaries (for example, the byzantine empire or scythia). additionally, we hope to begin working with chronological as well as geospatial data, with hopes of being able to add a time slider to the library map. this would enable users to focus on particular periods of history as well as geographic regions—for example, users interested in bronze age anatolia could limit results to that time period, so that they can browse the map without material from the byzantine empire “cluttering” their browsing experience.32 the online temporal gazetteer periodo (http://perio.do/) provides a rich data source to draw on, including uris for individual chronological periods and beginning and end dates for each defined temporal term. following a proposal submitted by the isaw library, periodo was approved by the library of congress as a source of subject terminology in september 2018, and its headings and uris are now useable in marc. however, though lcsh headings for geographic places are often quite good, the guidelines for chronological headings and subdivisions are often inadequate for describing ancient historical periods, and thus enacting a chronological slider, though highly desirable, would require a large amount of manual changes and additions to existing metadata. the isaw library’s collection mapping project has accomplished its initial goal of providing a geospatial interface for the discovery of materials in our library collection. as we expand our mapping project to incorporate more of our collection, we also hope that our model can prove useful to other institutions looking for practical applications of uris in marc, alternative discovery methods to text-based searching, or both. references and notes 1 for a summary of this challenging problem, see brighid m. gonzales, “linking libraries to the web: linked data and the future of the bibliographic record,” information technology & libraries 33, no. 4 (dec. 2014): 10–22, https://doi.org/10.6017/ital.v33i4.5631. 2 see, for example, timothy w. cole et al., “library marc records into linked open data: challenges and opportunities,” journal of library metadata 13, no. 2–3 (july 2013): 178, https://doi.org/10.1080/19386389.2013.826074. 3 deutsche nationalbibliothek, “marc proposal no. 2007-06: changes for the german and austrian conversion to marc 21,” marc standards, may 25, 2007, https://www.loc.gov/marc/marbi/2007/2007-06.html. 4 ibid. 5 for a detailed discussion of the importance of actionability in unique identifiers, see jackie shieh and terry reese, “the importance of identifiers in the new web environment and using the uniform resource identifier (uri) in subfield zero ($0): a small step that is actually a big map as a search box | mckee 50 https://doi.org/10.6017/ital.v38i1.10592 step,” journal of library metadata 15, no. 3–4 (oct. 2, 2015): 220–23, https://doi.org/10.1080/19386389.2015.1099981. 6 british library, “marc proposal no. 2010-06: encoding the international standard name identifier (isni) in the marc 21 bibliographic and authority formats,” marc standards, may 17, 2010, https://www.loc.gov/marc/marbi/2010/2010-06.html. 7 rda/marc working group, “marc discussion paper no. 2010-dp02: encoding uris for controlled values in marc records,” marc standards, dec. 14, 2009, https://www.loc.gov/marc/marbi/2010/2010-dp02.html. 
8 for a summary of this task group’s work to date, see jackie shieh, “reports from the program for cooperative cataloging task groups on uris in marc & bibframe,” jlis.it: italian journal of library, archives and information science = rivista italiana di biblioteconomia, archivistica e scienza dell’informazione 9, no. 1 (2018): 110–19, https://doi.org/10.4403/jlis.it-12429. 9 pcc task group on uri in marc and the british library, “marc discussion paper no. 2016dp18: redefining subfield $0 to remove the use of parenthetical prefix ‘(uri)’ in the marc 21 authority, bibliographic, and holdings formats,” marc standards, may 27, 2016, https://www.loc.gov/marc/mac/2016/2016-dp18.html; marc advisory committee, “mac meeting minutes” (ala annual meeting, orlando, fl, 2016), https://www.loc.gov/marc/mac/minutes/an-16.html. for a cumulative description of the scope of this task group’s work, see pcc task group on uris in marc, “pcc task group on uris in marc: year 2 report to poco, october 2017” (program for cooperative cataloging, oct. 23, 2017), https://www.loc.gov/aba/pcc/documents/poco2017/pcc_uri_tg_20171015_report.pdf. 10 shieh and reese, “the importance of identifiers in the new web environment and using the uniform resource identifier (uri) in subfield zero ($0),” 221. 11 shieh and reese, “the importance of identifiers in the new web environment and using the uniform resource identifier (uri) in subfield zero ($0)”; for a discussion of a related problem (finding a place for a uri in marc authority records), see ioannis papadakis, konstantinos kyprianos, and michalis stefanidakis, “linked data uris and libraries: the story so far,” d-lib magazine 21, no. 5/6 (june 2015), https://doi.org/10.1045/may2015-papadakis. 12 michael buckland et al., “geographic search: catalogs, gazetteers, and maps,” college & research libraries 68, no. 5 (sept. 2007): 376, https://doi.org/10.5860/crl.68.5.376. 13 marcy bidney and kevin clair, “harnessing the geospatial semantic web: toward place-based information organization and access,” cataloging & classification quarterly 52, no. 1 (2014): 70, https://doi.org/10.1080/01639374.2013.852038. 14 buckland et al., “geographic search.” 15 marcy bidney, “can geographic coordinates in the catalog record be useful?,” journal of map & geography libraries 6, no. 2 (july 13, 2010): 140–50, https://doi.org/10.1080/15420353.2010.492304. information technology and libraries | march 2019 51 16 bidney and clair, “harnessing the geospatial semantic web.” 17 rick bennett et al., “mapfast: a fast geographic authorities mashup with google maps,” code4lib journal, no. 14 (july 25, 2011): 1–9, http://journal.code4lib.org/articles/5645. 18 bennett et al., 1. 19 rainer simon et al., “peripleo: a tool for exploring heterogeneous data through the dimensions of space and time,” the code4lib journal, no. 31 (jan. 28, 2016), http://journal.code4lib.org/articles/11144. 20 rainer simon et al., “the pleiades gazetteer and the pelagios project,” in placing names: enriching and integrating gazetteers, ed. merrick lex berman, ruth mostern, and humphrey southall, the spatial humanities (bloomington: indiana univ. pr., 2016), 97–109. 21 merrick lex berman, “linked places in the context of library metadata” (nov. 10, 2016), https://sites.fas.harvard.edu/~chgis/work/docs/papers/hvd_librarylinkeddatagroup_lexb erman_20161110.pdf. 22 lisa r. johnston and kristi l. 
jensen, “maphappy: a user-centered interface to library map collections via a google maps ‘mashup,’” journal of map & geography libraries 5, no. 2 (july 2009): 114–30, https://doi.org/10.1080/15420350903001138; chris freel et al., “geocoding lcsh in the biodiversity heritage library,” the code4lib journal, no. 2 (mar. 24, 2008), http://journal.code4lib.org/articles/52; gina l. nichols, “merging special collections with gis technology to enhance the user experience,” slis student research journal 5, no. 2 (2015): 52–71, http://scholarworks.sjsu.edu/slissrj/vol5/iss2/5/. 23 hector gonzalez et al., “google fusion tables: data management, integration and collaboration in the cloud,” in proceedings of the 1st acm symposium on cloud computing (indianapolis: acm, 2010), 175–80, https://doi.org/10.1145/1807128.1807158. 24 the isaw library new titles library is available at http://www.zotero.org/groups/290269. 25 since our interest was in obtaining coordinate data, we determined that lcnaf and lcsh would not be appropriate to our needs. although some marc authority records include coordinate data, it is not present in all geographic headings. moreover, where coordinate data is available in the authority file, it is not published in the rdf form of the records via the lc linked data service (http://id.loc.gov/). entries in the getty thesaurus of geographic names (tgn, http://www.getty.edu/research/tools/vocabularies/tgn/index.html) often include structured coordinate data, and in recent months we have begun using tgn uris when a pleiades uri is not available. 26 library of congress, network development & marc standards office, “subject heading and term source codes: source codes for vocabularies, rules, and schemes,” library of congress, jan. 9, 2018, https://www.loc.gov/standards/sourcelist/subject.html. 27 it is worth noting that, since the uris are not currently being queried in the preparation of the map, much of this work could have been accomplished with pre-uri identifiers from marc map as a search box | mckee 52 https://doi.org/10.6017/ital.v38i1.10592 data, or even unique text strings. one benefit of using uris is ease of access to coordinate data, especially from pleiades. pleiades puts coordinates front and center in its display, and even features a one-click feature to copy coordinates to the clipboard. moreover, the entire pleiades dataset is available for download, making the retrieval of coordinates automatable locally, reducing keystrokes even without active database querying. the primary benefit of using uris instead of other forms of unique identifiers, however, is forward-compatibility. this is of immediate importance, since we are developing an updated version of the map that will actively query pleiades for coordinates. future benefits of the presence of uris also include links from pleiades into the library catalog, based on records in which place uris appear. if and when the entire catalog shifts to a linked-data model, the benefits of having these uris present expands exponentially, as this metadata will then be available to all manner of outside sources. 28 sean gillies et al., “conceptual overview,” pleiades, mar. 24, 2017, https://pleiades.stoa.org/help/conceptual-overview. 29 library of congress, “subject headings manual (shm)” (library of congress, 2014), h 690, https://www.loc.gov/aba/publications/freeshm/freeshm.html. 
30 for example, pleiades contains two place resources with the identical name "babylon": one the mesopotamian city and capital of the region known as babylonia (https://pleiades.stoa.org/places/893951); the other the site of the muslim capital of egypt, al-fusṭāṭ, known in late antiquity as babylon (https://pleiades.stoa.org/places/727082).

31 google, "notice: google fusion tables turndown," fusion tables help, dec. 11, 2018, https://support.google.com/fusiontables/answer/9185417.

32 a method of chronological browsing was described in vivien petras, ray r. larson, and michael buckland, "time period directories: a metadata infrastructure for placing events in temporal and geographic context," in digital libraries, 2006. jcdl '06. proceedings of the 6th acm/ieee-cs joint conference on digital libraries (ieee, 2006), 151–60, https://doi.org/10.1145/1141753.1141782.

self-archiving with ease in an institutional repository: microinteractions and the user experience

sonya betz and robyn hall

sonya betz (sonya.betz@ualberta.ca) is digital initiatives project librarian, university of alberta libraries, university of alberta, edmonton, alberta. robyn hall (hallr27@macewan.ca) is scholarly communications librarian, macewan university library, macewan university, edmonton, alberta.

abstract

details matter, especially when they can influence whether users engage with a new digital initiative that relies heavily on their support. during the recent development of macewan university's institutional repository, the librarians leading the project wanted to ensure the site would offer users an easy and effective way to deposit their works, in turn helping to ensure the repository's long-term viability. the following paper discusses their approach to user-testing, applying dan saffer's framework of microinteractions to how faculty members experienced the repository's self-archiving functionality. it outlines the steps taken to test and refine the self-archiving process, shedding light on how others may apply the concept of microinteractions to better understand a website's utility and the overall user experience that it delivers.

introduction

one of the greatest challenges in implementing an institutional repository (ir) at a university is acquiring faculty buy-in. support from faculty members is essential to ensuring that repositories can make online sharing of scholarly materials possible, along with the long-term digital preservation of these works. many open access mandates have begun to emerge around the world, developed by universities, governments, and research funding organizations, which serve to increase participation through requiring that faculty contribute their works to a repository.1 however, for many staff managing irs at academic libraries there are no enforceable mandates in place, and only a fraction of faculty works can be contributed without copyright implications when author agreements transfer copyrights to publishers. persuading faculty members to take the time to sort through their works and self-archive those that are not bound by rights restrictions is a challenge. standard installations of popular ir software, including dspace, digital commons, and eprints, do little to help facilitate easy and efficient ir deposits by faculty.
as dorothea salo writes in a widely cited critique of irs managed by academic libraries, the "'build it and they will come' proposition has been decisively wrong."2 a major issue she points out is that repositories were predicated on the "assumption that faculty would deposit, describe, and manage their own material."3 seven years after the publication of her article, a vast majority of the more than 2,600 repositories currently operating around the world still function in this way and struggle to attract widespread faculty support.4 to deposit works into these systems, faculty are often required to fill out an online form to describe and upload each work individually. this can be a laborious process that includes deciphering lengthy copyright agreements, filling out an array of metadata fields, and ensuring file formats or file sizes that are compatible with the constraints of the software. in august of 2014, macewan university library in edmonton, alberta, launched an ir, research online at macewan (ro@m; http://roam.macewan.ca). our hope was that ro@m's simple user interface and straightforward submission process would help to bolster faculty contributions. the site was built using islandora, an open-source software framework that offered the project developers substantial flexibility in appearance and functionality. in an effort to balance their desire for independence over their work with ease of use, faculty and staff have the option of submitting to ro@m in one of two ways: they can choose to complete a brief process to create basic metadata and upload their work, or they can simply upload their work and have ro@m staff create metadata and complete the deposit. thoroughly testing both of these processes was critical to the success of the ir. we wanted to ensure that there were no obstacles in the design that would dissuade faculty members from contributing their works once they had made the decision to start the contribution process. as the primary means of adding content to the ir, and as a process that other institutions have found problematic, carefully designing each step of how a faculty contributor submits material was our highest priority. to help us focus our testing on some of these important details, and to provide a framework of understanding for refining our design, we turned to dan saffer's 2013 book microinteractions: designing with details. the following case study describes our use of microinteractions as a user-testing approach for libraries and discusses what we learned as a result.
we  seek  to  shed  light  on  how  other  repository  managers  might  envision  and  structure  their   own  self-­‐archiving  processes  to  ensure  buy-­‐in  while  still  relying  on  faculty  members  to  do  some  of   the  necessary  legwork.  additionally,  we  lay  out  how  other  digital  initiatives  may  embrace  the   concept  of  microinteractions  as  a  means  of  better  understanding  the  relationship  between  the   utility  of  a  website  and  the  true  value  of  positive  user  experience.     literature  review   user  experience  and  self-­‐archiving  in  institutional  repositories   user  experience  (ux)  in  libraries  has  gained  significant  traction  in  recent  years  and  provides  a   useful  framework  for  exploring  how  our  users  are  interacting  with,  and  finding  meaning  in,  the   library  technologies  we  create  and  support.  although  there  is  still  some  disagreement  around  the   definition  and  scope  of  what  exactly  we  mean  when  we  talk  about  ux,  there  seems  to  be  general   consensus  that  paying  attention  to  ux  shifts  focus  from  the  usability  of  a  product  to  more   nonutilitarian  qualities,  such  as  meaning,  affect,  and  value.5  hassenzhal  simply  defines  ux  as  a     information  technologies  and  libraries  |  september  2015   45   “momentary,  primarily  evaluative  feeling  (good-­‐bad)  while  interacting  with  a  product  or  service.”6   hassenzhal,  diefenbach,  and  goritz  argue  that  positive  emotional  experiences  with  technology   occur  when  the  interaction  fulfills  certain  psychological  needs,  such  as  competence  or  popularity.7   the  2010  iso  standard  for  human-­‐centered  design  for  interactive  systems  defines  ux  even  more   broadly,  suggesting  that  it  “includes  all  the  users’  emotions,  beliefs,  preferences,  perceptions,   physical  and  psychological  responses,  behaviors  and  accomplishments  that  occur  before,  during   and  after  use.”8  however,  when  creating  tools  for  library  environments,  it  can  be  difficult  for   practitioners  to  translate  ambiguous  emotional  requirements,  such  as  satisfying  emotional  and   psychological  needs  or  increasing  motivation,  with  pragmatic  outcomes,  such  as  developing  a   piece  of  functionality  or  designing  a  user  interface.   it  has  been  well  documented  that  repository  managers  struggle  to  motivate  academics  to  self-­‐ archive  their  works.9  however,  the  literature  focusing  on  how  ir  websites’  self-­‐archiving   functionality  helps  or  hinders  faculty  support  and  engagement  is  sparse.  one  study  of  note  was   conducted  by  kim  and  kim  in  2006,  who  led  usability  testing  and  focus  groups  on  an  ir  in  south   korea.  10  they  provide  a  number  of  ways  to  improve  usability  on  the  basis  of  their  findings,  which   include  avoiding  jargon  terms  and  providing  comprehensive  instructions  at  points  of  need  rather   than  burying  them  in  submenus.  
similarly,  veiga  e  silva,  goncalves,  and  laender  reported  results   of  usability  testing  conducted  on  the  brazilian  digital  library  of  computing,  which  confirmed  their   initial  goals  of  building  a  self-­‐archiving  service  that  was  easily  learned,  comfortable,  and   efficient.11  the  authors  of  both  of  these  studies  suggest  that  user-­‐friendly  design  could  help  to   ensure  the  active  support  and  sustainability  of  their  services,  but  long-­‐term  use  remained  to  be   seen  at  the  time  of  publication.  meanwhile,  bell  and  sarr  recommend  integrating  value-­‐added   features  into  ir  websites  as  a  way  to  attract  faculty.12  their  successful  strategy  for  reengineering  a   struggling  ir  at  the  university  of  rochester  included  adding  tools  to  allow  users  to  edit  metadata   and  add  and  remove  files,  and  providing  portfolio  pages  where  faculty  could  list  their  works  in  the   ir,  link  to  works  available  elsewhere,  detail  their  research  interests,  and  upload  a  copy  of  their  cv.   although  the  question  remains  as  to  whether  a  positive  user  experience  in  an  ir  can  be  a   significant  motivating  factor  for  increasing  faculty  participation,  there  seems  to  be  enough   evidence  to  support  its  viability  as  an  approach.   applying  microinteractions  to  user  testing   dan  saffer’s  2013  book,  microinteractions:  designing  with  details,  follows  logically  from  the  ux   movement.  although  he  uses  the  phrase  “user  experience”  sparingly,  saffer  consistently  connects   interactive  technologies  with  the  emotional  and  psychological  mindset  of  the  user.  saffer  focuses   on  “microinteractions,”  which  he  defines  as  “a  contained  product  moment  that  revolves  around  a   single  use  case.”13  saffer  argues  that  well-­‐designed  microinteractions  are  “the  difference  between   a  product  you  love  and  product  you  tolerate.”14  saffer’s  framework  is  an  effective  application  of  ux   theory  to  a  pragmatic  task.  not  only  does  he  privilege  the  emotional  state  of  the  user  as  a  priority     self-­‐archiving  with  ease  in  an  institutional  repository  |  betz  and  hall     doi:  10.6017/ital.v34i3.5900   46   for  design,  he  also  provides  concrete  recommendations  for  designing  technology  that  provokes   positive  psychological  states  such  as  pleasure,  engagement,  and  fun.   defining  what  we  mean  by  a  “microinteraction”  is  important  when  translating  saffer’s  theory  to  a   library  environment.  he  describes  a  microinteraction  as  “a  tiny  piece  of  functionality  that  only   does  one  thing  .  .  .  every  time  you  change  a  setting,  sync  your  data  or  devices,  set  an  alarm,  pick  a   password,  turn  on  an  appliance,  log  in,  set  a  status  message,  or  favorite  or  like  something,  you  are   engaging  with  a  microinteraction.”15  in  libraries,  many  microinteractions  are  built  around   common  user  tasks  such  as  booking  a  group-­‐use  room,  placing  a  hold  on  an  item,  registering  for  an   event,  rating  a  book,  or  conducting  a  search  in  a  discovery  tool.  a  single  piece  of  interactive  library   technology  may  have  any  number  of  discrete  microinteractions,  and  often  are  part  of  a  larger   ecosystem  of  connected  processes.  
for  example,  an  integrated  library  system  is  composed  of   hundreds  of  microinteractions  designed  both  for  end  users  and  library  staff,  while  a  self-­‐checkout   machine  is  primarily  designed  to  facilitate  a  single  microinteraction.   saffer’s  framework  provided  a  valuable  new  lens  on  how  we  could  interpret  users’  interactions   with  our  ir.  while  we  generally  conceptualize  an  ir  as  a  searchable  collection  of  institutional   content,  we  can  also  understand  it  as  a  collection  of  microinteractions.  for  example,  ro@m’s  core   is  microinteractions  that  enable  tasks  such  as  searching  content,  browsing  content,  viewing  and   downloading  content,  logging  in,  submitting  content,  and  contacting  staff.  ro@m  also  includes   microinteractions  for  staff  to  upload,  review,  and  edit  content.  as  discussed  above,  one  of  the   primary  goals  when  developing  our  ir  was  to  allow  faculty  to  deposit  scholarly  content,  such  as   articles  and  conference  papers,  directly  to  the  repository.  we  wanted  this  process  to  be  simple  and   intuitive,  and  for  faculty  to  have  some  control  over  the  assignation  of  keywords  and  other   metadata,  but  also  to  have  the  option  to  simply  submit  content  with  minimal  effort.  we  decided  to   employ  user  testing  to  carefully  examine  the  deposit  process  as  a  discrete  microinteraction  and  to   apply  saffer’s  framework  as  a  means  of  assessing  both  functionality  and  ux.  we  hoped  that   focusing  on  the  details  of  that  particular  microinteraction  would  allow  us  to  make  careful  and   thoughtful  design  choices  that  would  lead  to  a  more  consistent  and  pleasurable  ux.   method  and  case  study   we  conducted  two  rounds  of  user  testing  for  the  self-­‐archiving  process.  our  initial  user  testing   was  conducted  in  january  2014.  we  asked  seven  faculty  to  review  and  comment  on  a  mockup  of   the  deposit  form  to  test  the  workflow.  this  simple  exercise  allowed  us  to  confirm  the  steps  in  the   upload  process,  and  identified  a  few  critical  issues  that  we  could  resolve  before  building  out  the  ir   in  islandora.  after  completing  the  development  of  the  ir,  and  with  a  working  copy  of  the  site   installed  on  our  user  acceptance  testing  (uat)  server,  we  conducted  a  second  round  of  in-­‐depth   usability  testing  within  our  new  microinteraction  framework.     in  april  2014  we  recruited  six  faculty  members  through  word  of  mouth  and  through  a  call  for   participants  in  the  university’s  weekly  electronic  staff  newsletter.  the  volunteers  represented   major  disciplines  at  macewan  university,  including  health  sciences,  social  sciences,  humanities,     information  technologies  and  libraries  |  september  2015   47   and  natural  sciences.  saffer  describes  a  process  for  testing  microinteractions  and  suggests  that  the   most  relevant  way  to  test  microinteractions  is  to  include  “hundreds  (if  not  thousands)  of   participants.”16  however,  he  goes  on  to  describe  the  most  effective  methods  of  testing  to  be   qualitative,  including  conversation,  interviews,  and  observation.  
testing  thousands  of  participants   with  one-­‐on-­‐one  interviews  and  observation  sessions  is  well  beyond  the  means  of  most  academic   libraries,  and  runs  counter  to  standard  usability  testing  methodology.  while  testing  only  six   participants  may  seem  like  a  small  number,  and  one  that  is  apt  to  render  inconclusive  results  and   sparse  feedback,  it  is  strongly  supported  by  usability  experts,  such  as  jakob  nielson.  during  the   course  of  our  testing,  we  quickly  reached  what  nielson  refers  to  in  his  piece  “how  many  test  users   in  a  usability  study?”  as  “the  point  of  diminishing  returns.”17  he  suggests  that  for  most  qualitative   studies  aimed  at  gathering  insights  to  inform  site  design  and  overall  ux,  five  users  is  in  fact  a   suitable  number  of  participants.  we  support  his  recommendation  on  the  basis  of  our  own   experiences;  by  the  fourth  participant,  we  were  receiving  very  repetitive  feedback  on  what   worked  well  and  what  needed  to  be  changed.   testing  took  place  in  faculty  members’  offices  on  their  own  personal  computers  so  that  they  would   have  the  opportunity  to  engage  with  the  site  as  they  would  under  normal  workday  circumstances.   each  user  testing  session  lasted  45  to  60  minutes,  and  was  facilitated  by  three  members  of  the   ro@m  team:  the  web  and  ux  librarian  guided  each  faculty  member  through  the  testing  process,   the  scholarly  communications  librarian  observed  the  interaction,  and  a  library  technician  took   detailed  notes  recording  participant  comments  and  actions.  each  faculty  member  was  given  an   article  and  asked  to  contribute  that  article  to  ro@m  using  the  uat  site.  the  ro@m  team  observed   the  entire  process  carefully,  especially  noting  any  problematic  interactions,  while  encouraging  the   faculty  member  to  think  aloud.  once  testing  was  complete,  the  scholarly  communications  librarian   analyzed  the  notes  and  identified  areas  of  common  concern  and  confusion  among  participants,  as   well  as  several  suggestions  that  the  participants  made  to  improve  the  site’s  functionality  as  they   worked  through  the  process.  she  then  went  about  making  changes  to  the  site  based  on  this   feedback.  as  we  discuss  in  the  next  section,  each  task  that  faculty  members  performed,  from  easy   to  frustrating,  represented  an  interaction  with  the  user  interface  that  affected  participants’   experiences  of  engaging  with  the  contribution  process,  and  informed  changes  we  were  able  to   make  before  launching  the  ir  service  three  months  later.     basic  elements  of  microinteractions   saffer’s  theory  describes  four  primary  components  of  a  microinteraction:  the  trigger,  rules,   feedback,  and  loops  and  modes.  viewing  the  ir  upload  tool  as  a  microinteraction  intended  to  be   efficient  and  user-­‐friendly  required  us  to  first  identify  each  of  these  different  components  as  they   applied  to  the  contribution  process  (see  figure  1),  and  then  evaluate  the  tool  as  a  whole  through   our  user  testing.     self-­‐archiving  with  ease  in  an  institutional  repository  |  betz  and  hall     doi:  10.6017/ital.v34i3.5900   48     figure  1.  ir  self-­‐archiving  process  with  microinteraction  components. 
trigger

the first component to examine in a microinteraction is the trigger, which is, quite simply, "whatever initiates the microinteraction."18 on an iphone, the trigger for an application is the icon that launches it; on a dishwasher, the trigger is the button pressed to start the machine; on a website, a trigger could be a login button or a menu item. well-designed triggers follow good usability principles: they appear when and where the user needs them, they initiate the same action every time, and they act predictably (for example, buttons are pushable, toggles slide).

examining our trigger was a first step in assessing how well our upload microinteraction was designed. uploading and adding content is a primary function of the ir, and the trigger needed to be highly noticeable. we can assume that users would be goal-based in their approach to the ir; faculty would be visiting the site with the specific purpose of uploading content and would be actively looking for a trigger to begin an interaction that would allow them to do so.

the initial design of ro@m included a top-level menu item as the only trigger for contributing works. in the persistent navigation at the top of the site, users could click on the menu item labeled "contribute" and would then be presented with a login screen to begin the contribution process. this was immediately obvious to half of the participants during user testing. however, the other half immediately clicked on the word "share," which appeared on the lower half of the page beside a small icon, simply as a way to add some aesthetic appeal to the homepage along with the words "discover" and "preserve." not surprisingly, users were interpreting the word and icon as a trigger. because of the user behavior we observed, we decided to add hyperlinks to all three of these words, with "share" linking to the contribution login screen (see figure 2), "discover" leading to a browse page, and "preserve" linking to an faq for authors page that included information on digital preservation. this significantly increased the visibility of the trigger for the microinteraction.

figure 2. "share" as additional trigger for contributing works.

rules

the second component of microinteractions described by saffer is the rules. rules are the parameters that govern a microinteraction; they provide a framework of understanding to help users succeed at completing the goal of a microinteraction by defining "what can and cannot be done, and in what order."19
while users don't need to understand the engineering behind a library self-checkout machine, for example, they do need to understand what they can and cannot do when they are using the machine. the hardware and software of a self-checkout machine are designed to support the rules by encouraging users to scan their cards to start the machine, to align their books or videos so that they can be scanned and desensitized, and to indicate when they have completed the interaction.

the goal when designing a self-archiving process in ro@m was to ensure that the rules were easy for users to understand, followed a logical structure, and were not overly complex. to this end, we drew on saffer's approach to designing rules for microinteractions, along with the philosophy espoused by steve krug in his influential web design book, don't make me think: a common sense approach to web usability.20 both krug and saffer argue for reducing complexity and removing decision-making from the user whenever possible to reduce the potential for user error. the rules in ro@m follow a familiar form-based approach: users log in to the system, agree to a licensing agreement, create some metadata for their item, and upload a file (see figure 1). however, determining the order of these elements, and ensuring that users could understand how to fill out the form successfully, required careful thinking that was greatly informed by the user testing we conducted.

for example, we designed ro@m to connect to the same authentication system used for other university applications, ensuring that faculty could log in with the credentials they use daily for institutional email and network access. forcing faculty to create, and remember, a unique username and password to submit content would have increased the possibility of login errors and resulted in confusion and frustration. we also used drop-down options where possible throughout the microinteraction instead of requiring faculty to input data such as file types, faculty or department names, or content types into free-text boxes.

during our user testing we found that the fields with free-text input for metadata entry most often led to confusion and errors. for instance, it quickly became apparent that name authority would be an issue. when filling out the "author" field, some people used initials, some used middle names, and some added "dr" before their name, any of which could negatively affect the ir's search results and the ability to track where and when these works are cited by others. when asked to include a citation for published works, most of our participants expressed frustration with this requirement because they could not do so quickly, and they had concerns about creating correct citations. finally, many participants became confused at the last, optional field in the form, which allowed them to assign a creative commons license to their works.

our user testing indicated that we would need to be mindful of how information like author names and citations was entered by users before making an item available on the site; the sketch below illustrates the kind of cleanup an administrative review step might apply.
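as a minimal, hypothetical illustration of that kind of cleanup (not ro@m's actual workflow), the following python sketch normalizes free-text author strings into a consistent "last, first" form and strips honorifics such as "dr"; the specific rules are assumptions chosen for demonstration only.

import re

# strip common honorifics from the start of a name (illustrative list only)
HONORIFICS = re.compile(r"^\s*(dr|prof|professor|mr|ms|mrs)\.?\s+", re.IGNORECASE)

def normalize_author(raw_name):
    """return a 'last, first' form of a free-text author string."""
    name = HONORIFICS.sub("", raw_name.strip())
    if "," in name:                      # already "last, first"; just tidy spacing
        return re.sub(r"\s+", " ", name)
    parts = name.split()
    if len(parts) < 2:                   # single token; nothing to reorder
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"

print(normalize_author("Dr Jane Q. Smith"))   # -> "Smith, Jane Q."
print(normalize_author("smith, j."))          # -> "smith, j."

a review step like this does not replace authority control, but it reduces the most common inconsistencies before an item is published.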
under ideal circumstances, we would have modified the form to ensure that any information the system already knew about the user was brought forward: what saffer calls "don't start from zero."21 this could include automatically filling in details like a user's name. however, like many libraries, we chose to adapt existing software rather than develop our microinteraction from the ground up, and implementing such changes would have been too time-consuming or expensive. instead, we added workflows that allow administrators to edit the metadata before a contribution is published to the web so we can correct any errors. we also changed the "citation" field to "publication information" to signal that users did not need to include a complete citation. lastly, we made sure that "all rights reserved" was the default selection for the optional "add a creative commons license?" field in the form, because this was language with which our users were familiar and comfortable proceeding.

policy constraints are another aspect of the rules that provide structure around a microinteraction, and they can also limit the design choices that can be made. having faculty complete a nonexclusive licensing agreement acknowledging that they had the appropriate copyright permissions to contribute the work was a required component of our rules. without the agreement, we would risk liability for copyright infringement and could not accept the content into the ir. however, our early designs for the repository included this step at the end of the submission process, after faculty had created metadata about the item. our initial round of testing revealed that several of our participants were unsure whether they had the appropriate copyright permissions to add content and did not want to complete the submission, a frustrating experience after spending time filling out author information, keywords, abstract, and the like. we attempted to resolve this issue by moving the agreement much earlier in the process, requiring users to acknowledge the agreement before creating any metadata. we also used simple, straightforward language for the agreement and added information about how to determine copyrights or contact ro@m staff for assistance. integrating an api that could automatically check a journal's archiving policies in sherpa romeo at this stage in the contribution process is something we plan to investigate to reduce complexity further for users.
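to make that intended integration concrete, the following minimal python sketch shows what such a lookup might look like. the endpoint, parameter names, and xml element name are assumptions modeled on the legacy sherpa/romeo lookup api; the snippet is illustrative, not an implemented ro@m feature.

import requests
import xml.etree.ElementTree as ET

# assumed endpoint and parameter names, modeled on the legacy sherpa/romeo api
ROMEO_ENDPOINT = "http://www.sherpa.ac.uk/romeo/api29.php"

def archiving_colour(issn, api_key=None):
    """return the romeo 'colour' (green/blue/yellow/white) for a journal issn."""
    params = {"issn": issn}
    if api_key:
        params["ak"] = api_key
    response = requests.get(ROMEO_ENDPOINT, params=params, timeout=10)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    # element name is an assumption; adjust to the response the service actually returns
    return root.findtext(".//romeocolour", default="unknown")

# example: warn a contributor before they invest time in metadata entry
if archiving_colour("0000-0000") in ("white", "unknown"):
    print("we may not be able to accept this version; please contact ro@m staff.")

surfacing this information at the licensing-agreement step would let uncertain contributors resolve copyright questions before, rather than after, filling out the form.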
feedback

understanding the concept of feedback is critical to the design of microinteractions. while most libraries are familiar with collecting feedback from users, the feedback saffer describes flows in the opposite direction: it is the feedback the application or interface provides back to users. this feedback gives users information when and where they need it to help them navigate the microinteraction. as saffer comments, "the true purpose of feedback is to help users understand how the rules of the microinteraction work."22

feedback can be provided in a variety of ways. an action as simple as a color change when a user hovers over a link is a form of feedback, providing visual information that indicates that a segment of text can be clicked. confirmation messages are an obvious form of feedback, while a folder with numbers indicating how many items have been added to it is more subtle. while visual feedback is most commonly used, saffer also describes cases where auditory and haptic (touch) feedback may be useful. designing feedback, much like designing rules, should aim to reduce complexity and confusion for the user, and feedback should be explicitly connected, both functionally and visually, to what the user needs to know.

in an online environment, much of the feedback we provide the user should be based on good usability principles. for example, formatting web links consistently and providing predictable navigation elements are ways that feedback can be built into a design. providing feedback at the user's point of need is also critical, especially error messages or instructional content. this proved to be especially important to our ro@m test subjects. while the ir featured an "about" section, accessible in the persistent navigation at the top of the website, that contained detailed instructions on how to submit works and the terms of use governing submissions, this content was virtually invisible to the users we observed. instead, they relied heavily on the contextual feedback included throughout the contribution process, when it was visible to them.

these observations led us to rethink our approach to providing feedback in several cases. for example, an unfortunate constraint of our software required users to select a faculty or school and a department and then click an "add" button before they could save and continue. we included instructions above the drop-down menus, stating "select and click add," in an effort to prevent errors. however, our participants failed to notice the instructions and inevitably triggered a brief error message (see figure 3). we later changed the word "add" in the instructions from black to bright red to increase its visibility, and we ensured that the error message displayed when users failed to select "add" clearly explained how to correct the problem and move on. we also observed that the plus signs for adding additional authors and keywords were not visible to users. we added feedback that included both text and icons with more detail (see figure 4). however, this remains a problem for users that we will need to explore further.
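as a minimal, hypothetical sketch of this kind of point-of-need feedback (the field names and wording are illustrative, not ro@m's implementation), a server-side check for the faculty/department step might return a message that tells the user exactly how to recover:

# hypothetical field names; returns (ok, message) for the faculty/department step
def validate_affiliation(form_data):
    selected = form_data.get("department_selected")   # current drop-down choice
    added = form_data.get("departments_added", [])    # choices already "added"
    if added:
        return True, ""
    if selected:
        return False, ("you selected a department but did not click add. "
                       "click the red add button, then choose save and continue.")
    return False, ("please select your faculty or school and your department, "
                   "then click add before saving.")

ok, message = validate_affiliation({"department_selected": "library science"})
print(message)   # explains exactly how to recover from the error

the point of the sketch is simply that the error text should name the missed step and the action that fixes it, rather than reporting a generic failure.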
on completing a contribution, users receive a confirmation page that thanks them for the contribution, provides a timeline for when the item will appear on the site, and notes that they will receive an email when it appears. response to this page was positive, as it succinctly covered all of the information users felt they needed to know having completed the process.

figure 3. feedback for the "add" button.

figure 4. feedback for adding multiple authors and keywords.

modes and loops

the final two components of microinteractions defined by saffer are modes and loops. saffer describes a mode as a "fork in the rules," or a point in a microinteraction where the user is exposed to a new process, interface, or state.23 for example, google scholar provides users with a setting to show "library access links" for participating institutions with openurl-compatible link resolvers.24 users who have set this option are presented with a search results page that differs from the default mode and includes additional links to their chosen institution's link resolver. our microinteraction includes two distinct modes. once logged in, users can choose to contribute works through the "do it yourself" submission that we have described here in some detail, or they can choose "let us do it" and complete a simplified version that requires them only to acknowledge the licensing agreement, upload their files, and provide any additional data they choose in a free-text box (see figure 5). the majority of our testers said they would opt for the "do it yourself" option because they wanted to have control over the metadata describing their work, including the abstract and keywords. however, since launching the repository, several submissions have arrived via the "let us do it" form, which suggests a reasonable amount of interest in this mode.

figure 5. the "let us do it" form.
loops, on the other hand, are simply a repeating cycle in the microinteraction. a loop could be a process that runs in the background, checking for network connections, or it could be a more visible process that adapts itself on the basis of the user's behavior. for example, in the ro@m submission process users can move backward and forward in the contribution forms; both forms have "previous" and "save and continue" buttons on each page to allow users to navigate easily. the final step on the "do it yourself" form allows users to review their metadata and the file that they have uploaded. they can then use the "previous" button to make changes to what they have entered before completing the submission. ideally, users would be able to edit this content directly from the review page, but software constraints prevented us from including this feature, and the "previous" button did not pose any major challenges for our testing participants. another example of a loop in ro@m is a "contribute more works" button embedded in the confirmation screen that takes users back to the beginning of the microinteraction. this feature was suggested by one of our participants, and it extends the life of the microinteraction, potentially leading to additional contributions.

discussion and conclusions

focusing on the details of the self-archiving process in our ir provided extremely rich qualitative data for improving the user interface, while analyzing the structure of the microinteraction, following saffer's model, was also a valuable exercise in thinking about user needs and software design from a different perspective than standard usability studies. the improvements we made, based on both saffer's theory and the results we observed through testing, added significant functionality and ease of use to the self-archiving process for faculty. thinking carefully about elements like the placement of buttons, small changes in wording or flow, and the timing of instructional or error feedback highlighted the big effect small elements can have on usability.

however, there are limitations to both the theory and our approach to testing and improving the ir that affect how well we can understand and use the results. of particular concern is how well this kind of testing can capture the ux of a faculty member beyond the utility or ease of use of the interaction. in an observational study we can rely on comments from participants and key statements that may indicate a participant's emotional or affective state, but we did not include targeted questions to gather this data and focused instead on the details of the microinteraction. we did not ask how participants felt while using the ir, or whether successfully uploading an item gave them a sense of autonomy or competence, or whether the experience would encourage them to submit content in the future. nevertheless, improving usability is a solid foundation for providing a positive ux. hassenzahl describes the difference between "do-goals" (completing a task) and "be-goals" (human psychological needs such as being competent or developing relationships).25 while he argues that "be-goals" are the ultimate drivers of ux, he also suggests that creating tools that make the completion of do-goals easy can facilitate the fulfillment of be-goals by removing barriers and making their fulfillment more likely. ultimately, however, a range of user testing strategies can lead to improvements in a user interface, whether that testing relies on carefully detailed examination of a microinteraction, analysis of large data sets from google analytics, or interviews with key user groups.
microinteraction theory is a useful approach, and valuable in its conceptualization, but it should be one of many tools libraries adopt to improve their online ux.

similarly, focusing on the ux of irs must be only one of many strategies institutions employ to improve rates of faculty self-archiving. recent studies argue that, regardless of platform or process, faculty-initiated submissions have proven to be uncommon.26 instead, they suggest that sustainability relies on marketing, direct outreach to individual faculty members, and significant staff involvement in identifying content for inclusion, investigating rights, and depositing on authors' behalf. it would be shortsighted to suggest that relying solely on designing a user-friendly website, or only on developing savvy promotional and outreach efforts, can determine the ongoing success of an ir initiative. gaining and maintaining support is an ongoing, multifaceted process, and it depends largely on the academic culture of an institution as well as available financial and staffing resources. as such, user testing offers qualitative insights into ways that processes and functions might be improved to enhance the viability of ir initiatives in tandem with a variety of marketing and outreach efforts.

references

1. "welcome to roarmap," university of southampton, 2014, http://roarmap.eprints.org.

2. dorothea salo, "innkeeper at the roach motel," library trends 57, no. 2 (2008): 98, http://muse.jhu.edu/journals/library_trends.

3. ibid., 100.

4. "the directory of open access repositories—opendoar," university of nottingham, uk, 2014, http://www.opendoar.org.

5. effie l-c law et al., "understanding, scoping and defining user experience: a survey approach," computer-human interaction 2009: user experience (new york: acm press, 2009), 719.

6. marc hassenzahl, "user experience (ux): towards an experiential perspective on product quality," proceedings of the 20th international conference of the association francophone d'interaction homme-machine (new york: acm press, 2008), 11, http://dx.doi.org/10.1145/1512714.1512717.

7. marc hassenzahl, sarah diefenbach, and anja göritz, "needs, affect, and interactive products: facets of user experience," interacting with computers 22, no. 5 (2010): 353–62, http://dx.doi.org/10.1016/j.intcom.2010.04.002.

8. international standards organization, human-centred design for interactive systems, iso 9241-210 (geneva: iso, 2010), section 2.15.
9. see philip m. davis and matthew j. l. connolly, "institutional repositories: evaluating the reasons for non-use of cornell university's installation of dspace," d-lib magazine 13, no. 3/4 (2007), http://www.dlib.org; ellen dubinsky, "a current snapshot of institutional repositories: growth rate, disciplinary content and faculty contributions," journal of librarianship & scholarly communication 2, no. 3 (2014): 1–22, http://dx.doi.org/10.7710/2162-3309.1167; anthony w. ferguson, "back talk—institutional repositories: wars and dream fields to which too few are coming," against the grain 18, no. 2 (2006): 86–85, http://docs.lib.purdue.edu/atg/vol18/iss2/14; salo, "innkeeper at the roach motel"; feria wirba singeh, a. abrizah, and noor harun abdul karim, "what inhibits authors to self-archive in open access repositories? a malaysian case," information development 29, no. 1 (2013): 24–35, http://dx.doi.org/10.1177/0266666912450450.

10. hyun hee kim and yong ho kim, "usability study of digital institutional repositories," electronic library 26, no. 6 (2008): 863–81, http://dx.doi.org/10.1108/02640470810921637.

11. lena veiga e silva, marcos andré gonçalves, and alberto h. f. laender, "evaluating a digital library self-archiving service: the bdbcomp user case study," information processing & management 43, no. 4 (2007): 1103–20, http://dx.doi.org/10.1016/j.ipm.2006.07.023.

12. suzanne bell and nathan sarr, "case study: re-engineering an institutional repository to engage users," new review of academic librarianship 16, no. s1 (2010): 77–89, http://dx.doi.org/10.1080/13614533.2010.5095170.

13. dan saffer, microinteractions: designing with details (cambridge, ma: o'reilly, 2013), 2.

14. ibid., 3.

15. ibid., 2.

16. ibid., 142.

17. jakob nielsen, "how many test users in a usability study?," nielsen norman group, 2012, http://www.nngroup.com/articles/how-many-test-users.

18. saffer, microinteractions, 48.

19. ibid., 82.

20. steve krug, don't make me think: a common sense approach to web usability (berkeley, ca: new riders, 2000).

21. saffer, microinteractions, 64.

22. ibid., 86.

23. ibid., 111.

24. "library support," google scholar, http://scholar.google.com/intl/en-us/scholar/libraries.html.

25. hassenzahl, "user experience," 10–15.

26. see dubinsky, "a current snapshot of institutional repositories," 1–22; shannon kipphut-smith, "good enough: developing a simple workflow for open access policy implementation," college & undergraduate libraries 21, no. 3/4 (2014): 279–94, http://dx.doi.org/10.1080/10691316.2014.932263.

editorial: how do you know whence they will come?

dan marmion, information technology and libraries 19, no. 1 (march 2000)
as i write this, i am putting my affairs in order at western michigan university, in preparation for a move to a new position at the university of notre dame libraries beginning in april. at each university my responsibilities include overseeing both the online catalog and the libraries' web presence. i mention this only because i find it interesting, and indicative of an issue with which the library profession in general is grappling, that librarians in both institutions are engaged in discussions regarding the relationship between the two. in talking to librarians at those places and others, from some i hear sentiment for making one or the other the "primary" access point. thus i've heard arguments that "the online catalog represents our collection, so we should use it as our main access mechanism." other librarians state that "the online catalog is fine for searching for books in our collection, but there is so much more to find and so many more options for finding it, that we should use our web pages to link everything together." my hunch is that probably we can all agree that there are things that an online catalog can do better than a web site, and things that a web site can do better than the online catalog. as far as that goes, have we ever had a primary access point (thanks to karen coyle for this thought)? but that's not what i want to talk about today.

the debate over a primary access point contains an invalid implicit assumption and asks the wrong question. the implicit assumption is that we can and should control how our patrons come into our systems. the question we should be asking ourselves is not "what is our primary access method?" but rather "how can we ensure that our users, local and remote, will find an avenue that enables them to meet their informational needs?"

since at this time i'm more familiar with wmu than notre dame, i'll draw some examples from the former. we have "subject guides to resources" on our web site. these consist of pages put together by subject specialists that point to recommended sources, both print and electronic, local and remote, on given subjects. students can use them to begin researching topics in a large number of subject areas. the catch is that the students have to be browsing around the web site. if they happen to start out in the online catalog they will never encounter these gateways, because the only reference to them is on the web site. on the other hand, a student who stays strictly with the web site is quite possibly going to miss a valuable resource in our library if he/she doesn't consult the online catalog, because we obviously can't list everything we own on the web site. (also, obviously, the web site doesn't provide the patron with status information.) this is why we have to ask ourselves the correct question mentioned above.

what is the solution? unfortunately i'm not any smarter than everyone else, so i don't have the answer (although i do know some folks who can help us with it: check out www.lita.org/committe/toptech/mainpage.htm). my guess is that we'll have to work it out as a profession, possibly in collaboration with our online system vendors, and that the solution will be neither quick nor simple nor easy. there are some ad hoc moves we can make, of course, such as put links to the gateways into the catalog, and on our web pages stress that the patron really needs to do a catalog search. the bottom line is that we have a dilemma: we can't control how people come into our electronic systems, so we can't have a "primary access point." if we try, we do harm to those who, for whatever reason, reach us via some other avenue. we need to make sure that we provide equal opportunity for all.
dan marmion (dmarmion@nd.edu) is associate director of information systems and access at notre dame university, notre dame, indiana.

google us! capital area district libraries gets noticed with google ads grant

public libraries leading the way

sheryl cormicle knox and trenton m. smiley, information technology and libraries | march 2020, https://doi.org/10.6017/ital.v39i1.12089

sheryl cormicle knox (knoxs@cadl.org) is technology director for capital area district libraries. trenton m. smiley (smileyt@cadl.org) is marketing & communications director for capital area district libraries.

increased choices in the marketplace are forcing libraries to pay much more attention to how they market themselves. libraries can no longer simply employ an inward marketing approach that speaks to current users through printed materials and promotional signage plastered on the walls. furthermore, they cannot rely on occasional mentions by the local media as the primary driver of new users. that's why in 2016, capital area district libraries (cadl), a 13-branch library system in and around lansing, michigan, began using more digital tactics as a cost-effective way to increase our marketing reach and to have more control over promoting the right service, at the right time, to the right person. one example of these tactics is ad placement on the weather channel app. this placement allows ads about digital services like overdrive and hoopla to appear when certain weather conditions, such as a snowstorm, occur in the area.
in 2017, while attending the library marketing and communications conference in dallas, our marketing and communications director had the good fortune of sitting in on a presentation by trey gordner and bill mott from koios (www.koios.co) on how to receive up to $10,000 of in-kind advertising every month from a google ad grant (www.google.com/grants). during this presentation, koios offered participants a 60-day trial of their services to help secure the google ad grant and create a few starter campaigns.

google ads are text-based and appear in the top section of google's search results, along with the ads of paying advertisers. nonprofits in the google ad grants program can set up various ad campaigns to promote whatever they like—the overall brand of the library, the collection, various events, meeting room offerings, or any other product or service. the appearance of each google ad is triggered by keywords chosen for each campaign.

after cadl's trial period expired, we decided to retain koios to oversee the google ad grants project. while the library has used google ads for the sharing of video, we had not done much with keyword advertising, so we were excited to learn more about the process of using keywords and the funding available through the grant. we viewed this as a great new tool to add to our marketing toolbox. it would help us achieve a few of our marketing goals: expanding our overall marketing reach and digital footprint by 50 percent; increasing the library's digital advertisement budget by 300% (by using alternative funding); and promoting the right service at the right time.

getting started

koios coached us through the slalom course of obtaining accounts and setting them up. to secure the monthly ad grant, we first obtained a validation key from techsoup (www.techsoup.org), the nonprofit that makes technology accessible to other nonprofits and libraries. that, in turn, pre-qualified us for a google for nonprofits account. (at the time, we were able to get a validation token from our existing techsoup account, but koios currently recommends starting by registering a 501(c)(3) friends organization or library foundation with techsoup whenever possible.) after creating our google for nonprofits account, we used the same account username to create a google ads account. finally, to work efficiently with koios, we provided them access to our google analytics property (which we have configured to scrub patron-identifying information) and our google tag manager account (with the ability to create tags that we in turn review and approve). if you are taking the do-it-yourself approach, google has a step-by-step google ad grants activation guide and extensive help online.

designing campaigns

spending money well is hard work, and that holds true for keyword search ads as well. there are performance and ad-quality requirements in the grant program that must be observed to retain your monthly allotment. understanding these guidelines, and implementing campaigns that respect them while working well enough to spend your grant allocation, requires study and patience. again, we relied on koios to guide us. they helped us create campaigns, and ad groups within those campaigns, that were effective within the grant program.
figure 1. example of minecraft title keyword landing page created by koios.

in august 2018, we started with campaigns for general branding awareness that included ads aimed at people actively searching for local libraries and our core services. these ads funnel users to our homepage and our online card signup. they are configured to display only to searchers who are geographically located in our service area. this campaign has been grown and perfected over 18 months into one of our most successful campaigns, garnering over 2,300 impressions and 650 clicks in january 2020, yet it spends just $450 of our grant funds. another consistent performer for us has been our digital media campaign, with ads targeting users searching for ebooks and audiobooks. by june 2019 we had grown our grant spend to $1,500 a month using 27 different campaigns.

the game changer for us has been working with koios to create campaigns based on an export of marc records from our catalog. we worked with koios to massage this data into a very simple pseudo-catalog of landing pages based on item titles. each landing page is very simple and seo-friendly so that it ranks well in the split-second ad auction that determines whether your ad will be displayed. it has cover images, clear calls to action, loads fast, is mobile friendly, and communicates the breadth of formats held by the library (see figure 1). clicking the item title or the borrow button sends users straight into our full catalog to get more information, request the item, or link to the digital version. (a simplified sketch of this kind of page generation appears at the end of this section.)

figure 2. a user search in google for "dad jokes" showing a catalog campaign ad. grant program ads are displayed below paid ads. the format of the ad may vary as well. this version shows several extensions, like phone number, site links, and directions links.

figure 3. the landing page displayed to the searcher after they click on the ad, and the resulting catalog page if the searcher clicks the borrow button.

in google ads, koios created 14 catalog campaigns out of the roughly 250,000 titles we sent them. each campaign has keywords (single words and phrases from titles) derived from roughly 18,000 titles ranked by how frequently they are used in google search. again, these ads are limited geographically to our service area. figures 2 and 3 illustrate what a google searcher in ingham county, michigan, potentially encounters when searching for "dad jokes". since their inception in september 2019, these catalog campaigns have been top performers for us, generating clickthrough rates of 8–15% and a couple thousand additional ad clicks monthly, the aggregation of a small number of clicks on any one ad from our "long tail" of titles. we are now spending over $5,000 of our grant funds and garnering nearly 23,000 impressions and 3,000 ad clicks monthly.
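as a simplified illustration of this kind of page generation (not koios's or cadl's actual pipeline), the following python sketch reads a marc export with the pymarc library and writes one bare-bones html landing page per title; the template, file names, and catalog url are placeholders, and a production version would add cover images, richer metadata, and styling.

from pathlib import Path
from pymarc import MARCReader   # assumes the pymarc library is installed

# illustrative template; real pages also carry cover art, formats, and styling
TEMPLATE = """<!doctype html>
<html><head><meta name="viewport" content="width=device-width, initial-scale=1">
<title>{title} | capital area district libraries</title></head>
<body>
  <h1>{title}</h1>
  <p>borrow this title from your local library.</p>
  <a href="https://catalog.example.org/search?q={query}">borrow</a>
</body></html>"""

out_dir = Path("landing_pages")
out_dir.mkdir(exist_ok=True)

with open("catalog_export.mrc", "rb") as marc_file:
    for i, record in enumerate(MARCReader(marc_file)):
        if record is None:                 # skip records pymarc could not parse
            continue
        field = record["245"]              # title statement
        if field is None or field["a"] is None:
            continue
        title = field["a"].strip(" /:;,")
        page = TEMPLATE.format(title=title, query=title.replace(" ", "+"))
        (out_dir / f"title-{i}.html").write_text(page, encoding="utf-8")

keeping each page this small is part of what lets it load quickly enough to compete in the ad auction while still handing the searcher off to the full catalog.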
results

in general, we find that our google ads have succeeded in drawing additional new visitors to our website. using our long-established google analytics implementation, which measures visits to our website and catalog combined, we compared the third quarter of 2018, when we were ramping up our google ad grants campaigns, to the third quarter of 2019, after our catalog campaign was firmly established. the summary numbers are encouraging. the number of users is up 17%, and the number of sessions is up 4%. within the overall rise in users, returning users are up 9%, but new users are up 25%. therefore, we are getting more of those coveted, elusive "non-library-users" to visit us online. when comparing the behavior of new and returning visitors, we also see that the overall increase in sessions was achieved despite the headwind of a 4% decline in returning-visitor sessions.

however, are the new visitors engaging? perhaps the most tangible measure of engagement for a public library catalog is placing holds. we have a google analytics conversion goal that measures those holds. the rate of conversion on the hold goal among new visitors rose 7%, while dropping 13% among returning visitors. from other analysis, we know that our highly engaged members are migrating to our mobile app and to digital formats, so the drop for returning users is explainable and the rise among new visitors is hopeful. we are working on ways to study these new visitors more closely so that we can discover and remove more of the barriers in the way of their becoming highly engaged members of their public library.

future plans

with the help of koios, new campaigns will be created to promote our blogs and podcasts. we will also link a campaign to our demco events database. finally, in partnership with koios, we will work with patron point to incorporate our automated email marketing system into google ad campaigns. we will add campaigns for pop-up ads that encourage library card signup through our online registration system. once someone signs up for a library card online, the system will trigger a welcome email that promotes some of our core services. this onboarding setup will also include an opportunity for the new cardholder to fill out a form to tailor content in future emails to their interests. through all these means, cadl leads the way in delivering the right service, at the right time, to the right person.

reference is dead, long live reference: electronic collections in the digital age

heather b. terrell, information technology and libraries | december 2015

abstract

in a literature survey on how reference collections have changed to accommodate patrons' web-based information-seeking behaviors, one notes a marked "us vs. them" mentality—a fear that the internet might render reference irrelevant. these anxieties are oft-noted in articles urging libraries to embrace digital and online reference sources. why all the ambivalence? citing existing research and literature, this essay explores myths about the supposed superiority of physical reference collections and how patrons actually use them, potential challenges associated with electronic reference collections, and how providing vital e-reference collections benefits the library as well as its patrons.

introduction

reference collections are intended to meet the immediate information needs of users. reference librarians develop these collections with the intention of using them to answer in-depth questions and to conduct ready-reference searches on a patron's behalf. library users depend on reference collections to include easily navigable finding tools that assist them in locating sources that contain reliable information in a useful, accessible format and can be accessed when the information is needed.
the expectation for print reference collections is that they are comprised of high-use materials—the very reason for their designation as noncirculating items is ostensibly so that materials are available for on-demand access by both patrons and staff, who use them frequently. however, librarians and patrons alike have acquired what margaret landesman calls "online habits"; to wit, the most-utilized access point to information is often the 24/7 web.1 in a wired world, where the information universe of the internet is not only on our desktops but also in our pockets and on our fashion accessories, the role of the print reference collection is less relevant in supporting information and research aims. in no other realm have the common practices of both users and librarians changed more than in how we seek information. nevertheless, a technology-related panic seems to be at the boil, with article titles like "are reference books becoming an endangered species?"2 and "off the shelf: is print reference dead?"3 words like "invasion" are used to describe the influx of electronic reference sources. we read about the "unsustainable luxury" of housing hundreds—sometimes thousands—of unused books on the open shelves. all this handwringing leads us to wonder why librarians in the field need this much coaxing to be cajoled into weeding their print reference collections in favor of electronic reference resources. does this format transition really constitute such a dire situation? what if the decline of print reference usage isn't a problem? and what's so luxurious about dusty, overcrowded shelves full of books no one cares to use? in "the myth of browsing," barclay concludes that "the continued over-my-dead-body insistence that no books be removed [from libraries] is an unsustainable position."4

heather b. terrell (hterrell@milibrary.org), a recent mlis degree graduate from the school of library and information science, san jose state university, is winner of the 2015 lita/ex libris student writing award.

a survey of the relevant literature reveals that staff resistant to the transition from print to electronic reference collections often share three core presumptions about reference:

• users prefer using print sources, and the importance of patrons' ability to browse the physical collection is paramount.

• the reliability of web-based reference sources may be questionable, especially when compared with the authority of print reference materials.

• access to print materials is the only option that certain users (namely, those without library cards) have for being connected to information.

there also seems to be a more subtle assumption at play in the print vs. electronic reference debate—that print books are more "dignified," cultivating a scholarly atmosphere in the library.
certain objections to removing print reference collections to closed stacks and using the newly freed public space to build a cooperative learning commons, for instance, tend to devolve into hysterics about the potentiality of libraries becoming "no better than a starbucks." the "no better" variable in this equation is a cosmetic one—librarians aren't worrying about libraries serving up a flavored "information latte" for vast profit margins—they are worrying that libraries will be perceived as a place to loiter, use the internet, and "hang out," rather than a place for serious study. one thing for librarians who worry about this potential outcome to consider is that loyal coffee shop denizens would be up in arms to learn that their favorite shop was being closed or its services reduced or eliminated. the implications are clear. perhaps libraries should consider the café model: a collaborative "no-shushing" zone—the difference between a library and a coffee house being that at a library, people are able to explore, learn, and be entertained using the resources provided by the institution.

at homer babbidge library at the university of connecticut, staff considered it important to "maintain a vestige of the reference collection, so that students were reminded they had entered a special place where scholarship was tangible."5 however, users considered the underutilized stacks of books a waste of space that could be better used for cooperative work areas or computer access stations. the students' needs and interests were heeded, and homer babbidge library's learning commons has been a successful endeavour.

reference collections: history and purpose

brief points about the history of reference services lend context to the arguments presented in favor of building electronic reference collections. grimmelmann points out that "it's almost a cliché to assert that the internet is like a vast library, that it causes problems of information overload, or that it contains both treasures and junk in vast quantities."6 from the earliest dedicated reference departments to the 24/7 reference model developed in response to progressing technology, tyckoson affirms that "one thing remains constant—users still need help. the question . . . is how to provide the library's users with the best help."7

browsing collections in libraries are newer than one might assume. prior to world war ii, academic library faculty could browse to find reference materials that met the information needs of students, but undergrads weren't even allowed in the stacks.8 in public libraries, reference collections were open to users, but reference rooms were considered to be, first and foremost, the domain and workplace of the reference librarian.9 this raises the question, what is the domain and workplace of the contemporary reference librarian? arguably, the answer to this query is wherever the information is, for example, online.

ready reference collections arose from the need to make the most commonly used resources in the library convenient and readily available for patron use.10 the most commonly used resources in contemporary libraries are those found online—again, where the information is. both users and librarians now turn to the web as the first resort for answering quick reference queries, and they turn to online databases and journals for exploring complex research questions.
meanwhile print works that were once used daily sit moldering, gathering dust on the shelves, either because they are outdated or because no one thinks to find them when the answer is available at the swipe of a finger or the click of a mouse from wherever they sit, whether that's in the library or in, ahem, a coffee shop. "the convenience, speed, and ubiquity of the internet is making most print reference sources irrelevant," tyckoson says.11

print preference, browsing collections

library use is increasing—but, as landesman and others point out, it is increasing because users want access to computers, instruction in technology, study spaces, or just a place to be that's not home, not school, and not work. users do not come to the library for reference sources—researchers and scholars prefer to access full-text works via their computing devices.12 the argument that users prefer print sources is antiquated, and the emphasis on building browsing collections of physical reference materials reflects a misguided notion that users crave tactile information. landesman is blunt: "when it is a core title, users want it online."13

statistics bear her assertions out. studies show that usage of print reference collections is minimal and that users strongly prefer online access to reference materials.

• at stetson university, usage statistics gathered during the 2003–4 academic year showed that only 8.5 percent of the library's 25,626 reference volumes were used during that period.14

• a year-long study by public libraries revealed that only 13 percent of winter park (fl) public library's collection of 8,211 reference items was used.15

• when texas a&m university converted its reference collection's primary format from print to e-books, a dramatic increase in use of the e-versions of reference titles was recorded.16

• in a survey of william & mary law library users, a majority of respondents indicated that they consciously select online reference sources over print, citing convenience and currency as top reasons for doing so.17

scanning the shelves may seem to some to be the most intuitive way to search for information, but in actual practice, browsing is ineffective—books at eye level are used more often, patrons are limited to sources not being used by another patron at any given moment, and overcrowding of the shelves results in patrons overlooking useful materials.18 browsing is the least effective way for patrons to "shop" a collection. searching by electronic means overcomes the obstacles inherent in browsing the physical shelves when it uses well-designed search algorithms that employ keywords on the basis of accurate metadata. landesman indicates that if librarians commit to educating patrons on the use of the reference databases, ebooks, and websites they offer, online reference will be "a huge win for users."19

it should be noted: no one suggests that print reference should be eliminated entirely, at least not yet. smaller print reference collections result in better-utilized spaces; they ensure that remaining resources in the physical collections are used more effectively—only the items that are actually high-use are included, which makes these sources easier to locate; and books formerly classified as reference materials are able to circulate to those interested in their specialized content.
smaller print reference collections serve patrons in myriad ways, including freeing up funds that can be used to enhance electronic reference collections. digital reference services are just another way of organizing information—there is no revolution here, unless it is in providing information with more efficiency, with breadth, depth, and access that surpass what is possible via a print-only reference collection. the inevitable digital shift is a very natural evolution of patron-driven library services rather than cause for consternation on the part of library service providers.

web reliability: google and wikipedia

those who argue that the reliability of online sources is questionable are typically referring to google results and wikipedia entries, which have little bearing on a library's electronic collections of databases and e-books, but have plenty to do with a library's reference services: these two sources are very often used in lieu of printed fact-finding sources such as atlases, subject or biographical dictionaries, and specialized sources like bartlett's familiar quotations—which was last printed in 2012 and has recently gone digital.

for questions of fact, google is often a convenient and "reliable enough" source for most queries; the authority of the results yielded by a google search is not always detectable, and is sometimes intentionally obscured, so the librarian must vet results carefully and select the most reliable sources when providing ready reference to patrons. however, google is far more than just its main search page. for instance, google scholar allows searchers to locate full-text articles as well as citations for scholarly papers and peer-reviewed journal articles. in general, there are many tools on the web, and librarians must expend effort determining how to make the best use of each. in particular, google is better suited to some information tasks than others—it's up to the librarian to know when to use this tool and when to eschew it.

wikipedia has been the subject of much heated debate since its inception in 2001, but in a study conducted by nature magazine, the encyclopedias britannica and wikipedia were evaluated on the basis of a review of fifty pairs of articles by subject experts and found to be comparable in terms of the number of content errors—2.9 and 3.9, respectively.20 deliberate misstatements of fact, usually in biographical entries, are cited as evidence that wikipedia is utterly unreliable as a reference source. in fact, print sources have been plagued with the same issues. for many years, the dictionary of american biography contained an entry based on a hoax claiming a (nonexistent) diary of horatio alger—and while the entry was removed in later editions, the article was still referred to in the index for several years after its removal.21 if anything, it seems that format might provide a false sense of assurance that a source's authority is infallible. all reference sources include bias, and all will include faulty information. the major difference between print and electronic sources is that in the digital era, using the tools of technology, these errors can be corrected quickly. what some see as declining quality of a source based on its format is simply a longstanding feature of human-produced reference works, dissociated from any print vs. web debate.
policy some academic and public libraries intend to decrease or discontinue purchasing print-based reference sources so funds can be diverted to build electronic reference collections; they weed print reference to make room for information commons containing technology used for accessing these electronic collections. the basic assumption in the objection to this practice is that the traditional model of in-person reference is integral to a functioning reference collection, that access to information depends on that information being printed on a physical page. reference services are provided virtually via chat, im, and email. reference services are provided via the library’s website. reference services are provided by roving librarians, reference is dead, long live reference: electronic collections in the digital age | terrell | doi: 10.6017/ital.v34i4.9098 60 librarians engaging in one-on-one literacy sessions, and in large-group training sessions. long gone are the days of the reference librarian who waits patiently at her station for a patron to approach with a question. since the reference services model no longer mandatorily includes a stationary point on the library map, nor does providing quality reference depend solely on the depth and breadth of the print reference collection, how are print reference collections used? as indicated previously, about 10 percent of print reference collections are used by patrons on a regular basis. concern for the information needs of library users who do not have library cards is well-intentioned, but the question remains: if 90 percent of a collection goes unused, even when those users without library cards have access to these materials, is the collection useful? as stewart bodner of nypl says, “it is a disservice to refer a user to a difficult print resource when its electronic counterpart is a far superior product.”22 how users want to receive their information matters—access should not depend on whether a user can obtain a library card. for those libraries with high concentrations of patrons who do not qualify for library cards (e.g., individuals who do not have a fixed home address, or who cannot obtain a state-issued id card), libraries might reconsider their policies rather than their collections. computer-only access cards can be provided on a temporary basis for visitors and others who are unable to obtain permanent cards. san francisco public library recently instituted a welcome card for those members of the community who cannot meet identification requirements for full library privileges. the welcome card allows the user full access to computers and online resources and permits the patron to check out one physical item at a time.23 when compared with purchasing, housing, and maintaining vast print reference collections, this is a significantly less costly and far more patron-centered solution to the problem of access to electronic information sources— librarians should be advocates for users, with the goal being access to knowledge, no matter its format. conclusion: building better hybrid collections most library professionals agree that libraries should collect both print and electronic sources for their reference collections, but the ratio of print to digital is up for debate. as more formats with improved capabilities appear, researchers find that patrons prefer those sources that provide them with the best functionality. it is essential to look to the principles on which reference services are founded. 
one of those principles is to build collections on the basis of user preferences. librarians must consider what the reference collection is for and whether assumptions about patron preferences are backed by evidence. in essence, considering what “reference” means to users rather than defaulting to the status quo. a reference collection development policy must be based on what is actually used often, not on what has the potential to maybe be used sometime in the future. the library is not an archive, preserving great tomes for posterity—the collections in a library are for use. with less emphasis on print materials, librarians might focus on the wealth of sources available electronically via information technology and libraries | december 2015 61 databases and ebooks, as well as open-source, free online resources. librarians must cultivate an understanding of the resources patrons use and the formats in which they prefer to access information. as heintzelman and coauthors state, “a reference collection should evolve into a smaller and more efficient tool that continually adapts to the new era, merging into a symbiotic relationship with electronic resources.”24 rules of reference that were devised when print works were the premier sources of reference information no longer apply. reference librarians must lead the way in responding to the digital shift—creating electronic collections centered on web-based recommendations, licensed databases of journals, and ebooks—with a focus on rich, interactive, and unbiased content. weeding reference collections of outdated and unused tomes, moving some materials to the closed stacks while allowing others to circulate, and building e-book reference collections allows libraries to provide effective reference services by cultivating collections that patrons want to use. much of the transition from print to electronic reference collections can be accomplished by ensuring that resources are promoted to patrons and staff, that training in using these tools is provided to patrons and staff, that librarians become involved in the selection of digital collections, and that the spaces where print collections were formerly housed are used in ways the community finds valuable. one need not worry about the “invasion” of e-reference or the “death” of print reference. the two can coexist peacefully and vitally, as long as librarians maintain focus on selecting the best material for their reference collections, no matter its format. references 1. margaret landesman, “getting it right—the evolution of reference collections,” reference librarian 44, no. 91–92 (2005): 8. 2. nicole heintzelman, courtney moore, and joyce ward, “are reference books becoming an endangered species? results of a yearlong study of reference book usage at the winter park public library,” public libraries 47, no. 5 (2008): 60–64. 3. sue polanka, “off the shelf: is print reference dead?” booklist 104, no. 9/10 (january 1 & 15, 2008): 127. 4. donald a. barclay, “the myth of browsing: academic library space in the age of facebook,” american libraries 41, no. 6–7 (2010): 52–54. 5. scott kennedy, “farewell to the reference librarian,” journal of library administration 51, no. 4 (2011): 319–25. 6. james grimmelmann, “information policy for the library of babel,” journal of business & technology law 3 (2008): 29. reference is dead, long live reference: electronic collections in the digital age | terrell | doi: 10.6017/ital.v34i4.9098 62 7. david a. 
tyckoson, “issues and trends in the management of reference services: a historical perspective,” journal of library administration 51, no. 3 (2011): 259–78. 8. donald a. barclay, “the myth of browsing: academic library space in the age of facebook,” american libraries 41, no. 6–7 (2010): 52–54. 9. tyckoson, “issues and trends in the management of reference services.” 10. carol a. singer, “ready reference collections,” reference & user services quarterly 49, no. 3 (2010): 253–64. 11. tyckoson, “issues and trends in the management of reference services,” 293. 12. landesman, “getting it right,” 8. 13. ibid., 10. 14. jane t. bradford, “what’s coming off the shelves? a reference use study analyzing print reference sources used in a university library,” journal of academic librarianship 31, no. 6 (2005): 546–58. 15. heintzelman, moore, and ward, “are reference books becoming an endangered species?” 16. dennis dillon, “e-books: the university of texas experience, part 1,” library hi tech 19, no. 2 (2001): 113–25. 17. paul hellyer, “reference 2.0: the future of shrinking print reference collections seems destined for the web,” aall spectrum 13 (march 2009): 24–27. 18. barclay, “the myth of browsing.” 19. landesman, “getting it right.” 20. jim giles, “internet encyclopaedias go head to head,” nature 438, no. 7070 (2005): 900–901. 21. denise beaubien bennett, “the ebb and flow of reference products,” online searcher 38, no. 4 (2014): 44–52. 22. mirela roncevic, “the e-ref invasion: now that e-reference is ubiquitous, has the confusion in the reference community subsided?” library journal 130, no. 19 (2005): 8–16. 23. san francisco public library, “welcome card,” sfpl.org/pdf/services/sfpl314.pdf (2014): 1–2. 24. heintzelman, moore, and ward, “are reference books becoming an endangered species?” book reviews die elektronische datenverarbeitung im bibliothekswesen. by paul niewalda. muenchen-pullach, berlin, verlag dokumentation, 1971. (bibliothekspraxis, 1) as the first volume in a new series called bibliothekspraxis (library practice), verlag dokumentation has published a short monograph on library automation by paul niewalda, of the university library of regensburg. niewalda has written an introductory text, in german, condensing the standard, largely american, literature on the subject. his treatment is concise, well-written, and well-organized. computer capabilities, and existing library applications in the united states and elsewhere, are carefully delineated. the text is thoroughly documented, with a large number of notes and a useful bibliography included. the book addresses itself to the german reader and, in fact, much is already familiar to american librarians. yet niewalda’s frequent references to the european, particularly the german, library automation scene enhance the book’s value. the author is clearly well informed both about library automation in general, and about local practice and problems. he brings to his task common sense and sound judgment. the work is recommended to those readers having a general interest in foreign developments in the field of library automation. s. micha namenwirth university of california, berkeley dictionary of library science, information and documentation in six languages. compiled and arranged by w. e. clason. amsterdam: elsevier scientific publ. co., 1973.
the basic table, a numbered list of entries for 5,439 english language words and phrases, alphabetically arranged, forms the body of the dictionary of library science, information and documentation. each entry consists of a serial number, the english term (american and/or british), equivalents in french, spanish, italian, dutch, and german, and a code identifying the vocabulary with which the term is associated. hence, there are separate entries for volume as a book trade or library term and as an information processing term. many entries are augmented by brief definitions. english synonyms are also frequently given; in general these are terms from which references have been made. in such cases entry is under the synonym which files first. this practice produces some apparently eccentric choices; e.g., pseudonym, see allonym; udc, see brussels system. following the basic table are indexes for the five non-english languages. numerical references are given to basic table entries in which the index term is cited. german band is found not only in the first volume entry mentioned above but also in the bookbinding and information processing entries for tape. criteria employed for the selection of entries are unexplained. ibm’s data processing glossary and the american national standard vocabulary for information processing appear to have been important sources of information processing terms. the glossary in anglo-american cataloging rules was evidently not used. it is clear that some of the source lists used were in other languages. the juxtaposition of related vocabularies which often put the same words to different uses presents difficulties which the approach taken here seems capable of handling. nevertheless the work as executed has flaws which reduce its effectiveness. the notions of synonymy and nonsynonymy among the english terms are puzzling. definitions are frequently unclear and occasionally wrong. there are cases in which the non-english equivalents for a single term are certainly not synonymous with each other. the utility of the indexes would be enhanced if the number of non-english synonyms given were greater. however, if approached with care, the volume can provide much useful information. in works of this type it is probably unfair to expect perfection. besides, a dictionary which manages to encompass both negative entropy (information theory) and scrivener’s palsy (authors and authorship) has to be interesting, at least. charles w. husbands harvard university library communications wikidata: from “an” identifier to “the” identifier theo van veen information technology and libraries | june 2019 72 theo van veen (theovanveen@gmail.com) is researcher (retired), koninklijke bibliotheek. abstract library catalogues may be connected to the linked data cloud through various types of thesauri. for name authority thesauri in particular i would like to suggest a fundamental break with the current distributed linked data paradigm: to make a transition from a multitude of different identifiers to using a single, universal identifier for all relevant named entities, in the form of the wikidata identifier. wikidata (https://wikidata.org) seems to be evolving into a major authority hub that is lowering barriers to access the web of data for everyone. using the wikidata identifier of notable entities as a common identifier for connecting resources has significant benefits compared to traversing the ever-growing linked data cloud.
when the use of wikidata reaches a critical mass, for some institutions, wikidata could even serve as an authority control mechanism. introduction library catalogs, at national as well as institutional levels, make use of thesauri for authority control of named entities, such as persons, locations, and events. authority records in thesauri contain information to distinguish between entities with the same name, combine pseudonyms and name variants for a single entity, and offer additional contextual information. links to a thesaurus from within a catalog often take the form of an authority control number, and serve as identifiers for an entity within the scope of the catalog. authority records in a catalog can be part of the linked data cloud when including links to thesauri such as viaf (https://viaf.org/), isni (http://www.isni.org/), or orcid (https://orcid.org/). however, using different identifier systems can lead to having many identifiers for a single entity. a single identifier system, not restricted to the library world and bibliographic metadata, could facilitate globally unique identifiers for each authority and therefore improve discovery of resources within a catalog. the need for reconciliation of identifiers has been pointed out before.1 what is now being suggested is to use the wikidata identifier as “the” identifier. wikidata is not domain specific, has a large user community, and offers appropriate apis for linking to its data. it provides access to a wealth of entity properties, it links to more than 2,000 other knowledge bases, it is used by google, and the number of organisations that link to wikidata is quantifiably growing with tremendous speed.2 the idea of using wikidata as an authority linking hub was recently proposed by joachim neubert.3 but why not go one step further and bring the wikidata identifier to the surface directly as “the” resource identifier, or official authority record? this has been argued before and the implications of this argument will be considered in more detail in the remainder of this article. 4 information technology and libraries | june 2019 73 figure 1. from linking everything to everything to linking directly to wikidata. figure 1 illustrates the differences between a few possible situations that should be distinguished. on the left, the “everything links to everything” situation shows wikidata as one of the many hubs in the linked data cloud. in the middle, the “wikidata as authority hub” situation is shown, where name authorities are linked to wikidata. on the right is the arrangement proposed in this article, where library systems and other systems for which this may apply share wikidata as a common identifier mechanism. of course, there is a need for systems that feed wikidata with trusted information and provide wikidata with a backlink to a rich resource description for entities. in practice, however, many backlinks do not provide rich additional information and in such cases a direct link to wikidata would be sufficient for the identification of entities. figure 2 shows these two situations and other possible variations by means of dashed lines, i.e. systems that feed wikidata, but use the wikidata identifier as resource identifier for the outside world vs. systems that link directly to wikidata, but keep a local thesaurus for administrative purposes. 
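to make the linking and backlink arrangement sketched in figure 2 concrete, the following is a minimal sketch, not code from the kb or any of the systems discussed here, of how an application could resolve a wikidata item and read the external identifiers it carries, using wikidata's public entity data service. q937 is wikidata's item for albert einstein, and p214 (viaf id) and p213 (isni) are examples of external-identifier properties; the function names are illustrative assumptions.

```python
# a minimal sketch: resolve a wikidata item and list a few of the external
# identifiers ("backlinks" to other systems) it carries.
import json
import urllib.request

def entity_data(qid):
    """fetch the full json description of a wikidata item."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["entities"][qid]

def external_ids(entity, properties=("P214", "P213")):
    """collect external-identifier values for the given properties."""
    ids = {}
    for prop in properties:
        for statement in entity.get("claims", {}).get(prop, []):
            value = statement["mainsnak"].get("datavalue", {}).get("value")
            if value:
                ids.setdefault(prop, []).append(value)
    return ids

if __name__ == "__main__":
    einstein = entity_data("Q937")
    print(einstein["labels"]["en"]["value"])   # the english label
    print(external_ids(einstein))              # e.g. {"P214": [...], "P213": [...]}
```

any local identifier, such as a national library thesaurus number, would be reachable in the same way once it is registered as an external-identifier property in wikidata.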
it is certainly not the intention to encourage institutions to give up their own resource descriptions or resource identifiers locally, especially not when they are an original or rich source of information about an entity. a distinction can be made between the url of the description of an entity and the url of the entity itself. when following the url of a real-world entity in a browser, it is good practice to redirect to the corresponding description of the entity. this is known as the “httprange-14” issue.5 this article will not go into any detail about this distinction other than to note that it makes sense to have a single global identifier for an entity while accepting different descriptions of that entity linked from various sources. figure 2. feeding properties connecting collections to wikidata (left) and direct linking to wikidata using resource identifier (right). the dashed lines show additional connecting possibilities. the motivating use case the idea of using the wikidata identifier as a universal identifier was born at the research department of the national library of the netherlands (kb) while working on a project aimed at automatically enriching newspaper articles with links to knowledge bases for named entities occurring in the text.6 these links include the wikidata identifier and, where available, the dutch and english dbpedia (http://dbpedia.org) identifiers, the viaf number, the geonames number (http://geonames.org), the kb thesaurus record number, and the identifier used by the parliamentary documentation centre (https://www.parlementairdocumentatiecentrum.nl/). the identifying parts of these links are indexed along with the article text in order to enable semantic search, including search based on wikidata properties. for demonstration purposes the enriched “newspapers+” collection was made available through the kb research portal, which gives access to most of the regular kb collections (figure 3).7 in the newspaper project, linked named entities in search results are clickable to obtain more information. as most users are not expected to know sparql, the query language for the semantic web, the system offers a user-friendly method for semantic search: a query string entered between square brackets, for example “[roman emperor]”, is expanded by a “best guess” sparql query in wikidata, in this case resulting in entities having the property “position held=roman emperor”. these in turn are used to do a search for articles containing one or more mentions of a roman emperor, even if the text “roman emperor” is not present in the article. in another example, when a user searches for the term “[beatles]” the “best guess” search yields articles mentioning entities with the property “member of=the beatles”. for ambiguous items, as in the case of “guernica,” which can be the place in spain or picasso’s painting, the one with the highest number of occurrences in the newspapers is selected by default, but the user may select another one. for the default or selected item, the user can select a specific property from a list of wikidata properties available for that specific item. the possibilities of this semantic search functionality may inspire others to use the wikidata identifier for globally known entities in other systems as well. figure 3. screenshot of the kb research portal with a newspaper article as result of searching “[architect=willem dudok]”.
the results are articles about buildings of which willem dudok is the architect. the name of the building meeting the query [architect=willem dudok] is highlighted. usage scenarios two usage scenarios can be considered in more detail: (1) manually following links between wikidata descriptions and other resource descriptions, and (2) a federated sparql query can be performed by the system to automatically bring up linked entities. in the first scenario, in which resource identifiers link to wikidata, the user can follow the link to all resource descriptions having a backlink in wikidata. but why would a user follow such a link? reasons may include wanting more or context-specific information about the entity, or a desire to search in another system for objects mentioning a specific entity. in the latter case, the information behind the backlink should provide a url to search for the entity, or the backlink should be the search url itself. wikidata provides the possibility to specify various uri templates. these can be used to specify a link for searching objects mentioning the entity, rather than just showing a thesaurus entry. when the backlink does not provide extra information or a way to search the entity, the backlink is almost useless. thus, when systems provide resource links to wikidata they give users access to a wealth of information about an entity in the web of data and, potentially, to objects mentioning a specific entity. some systems only provide backlinks from wikidata | van veen 76 https://doi.org/10.6017/ital.v38i2.10886 wikidata to their resource descriptions but not the other way around. users from such systems cannot easily benefit from these links. the second scenario of a federated sparql query applies when searching objects in one system based on properties coming from other systems. formulating such a sparql query is not easy because doing so requires a lot of knowledge about the linked data cloud. the alternative is to put the complete linked data cloud in a unified (triple store) database. the technology of linked data fragments might solve the performance and scaling issues but not the complexity. 8 using a central knowledge base like wikidata could reduce complexity for the most common situation of searching objects in other systems using properties from wikidata. this use case requires these systems to take the users query and automatically formulate a sparql search. there are many systems that are linked to wikidata that do not support sparql at all or only support it in a way that is not intended for the average user. those systems can still let users benefit from wikidata by offering a simple add-on to search in wikidata for entities that meet some criteria and use the identifiers for a conventional search in the local system as shown for the case of the historical newspapers. these two use cases illustrate how the use of a wikidata identifier can lower the barrier to access information about an entity and to finding objects related to an entity by minimizing the number of hubs, minimizing the required knowledge and minimizing the required technology. this is achieved by linking resources to wikidata and, even more so, by making objects searchable by means of the wikidata identifier. advantages of using the wikidata identifier as universal identifier summarizing the above, a number of significant advantages of using the wikidata identifier as universal identifier can be seen. 
these include: • using the wikidata identifier as resource identifier makes wikidata the first hub. applications therefore have in the first instance to deal with only one description model. from there, it is easy to navigate further: most information is only “one hub away,” so less prior knowledge is required to link from one source to another. • wikidata identifiers can be used for federated search based on properties in wikidata, so there is less need to know how to access properties in other resource descriptions. • wikidata identifiers facilitate generating “just in case” links to systems having the wikidata identifier indexed. • complicated sparql queries using wikidata as primary source for properties can be shared and reused more easily compared to a situation with many diverse sources for properties. • wikidata offers many tools and apis for accessing and processing data. • some libraries and similar institutions may even decide to use wikidata directly for authority control when it reaches a critical mass, relieving them from maintaining a local thesaurus. implementation institutions can gradually adopt the use of wikidata identifiers without needing to make radical changes in their local infrastructure. a simple first step is automatically generating links to information technology and libraries | june 2019 77 wikidata in the presentation of an object or to the object description to provide contextual information and navigation options. as a next step, the wikidata q-number of an entity could be indexed along with the descriptions containing it, so these objects become findable via a wikidata identifier search, e.g. of the form: https://whatever.local/wdsearch?id=q937 the wikidata identifier could then be used in conventional as well as federated searches for a resource, regardless of the exact spelling of a resource name. a search may be refined using wikidata properties without further requirements with respect to local infrastructures. institutions having a sparql endpoint can allow for a federated sparql query for combining local data with data from wikidata. as sparql is not easy for the end user this requires a user interface that can formulate a sparql query to protect the user from knowing sparql. those institutions willing to start using the wikidata identifier as resource identifier can unify references in their bibliographic records. currently, for example, a reference to albert einstein, in a simplified, rdf-like (https://www.w3.org/rdf/) xml fragment in a bibliographic record, could look quite different for different institutions, e.g.: albert einstein albert einstein albert einstein albert einstein if the wikidata identifier is used as resource identifier, this could for all institutions become the same: albert einstein in this case it becomes easy to navigate the web, to create common bookmarklets, and provide additional functionality using the wikidata identifier. cataloguing process and criteria for new wikidata entries for institutions that decide to link their entities directly to wikidata, their catalog software would have to be configured to support wikidata lookups. catalogers would not have to know about linked data or rdf to create links to wikidata; they would simply have to query wikidata and select the appropriate entry to link. the cataloging software would then add the selected identifier to the record being edited. if a query in wikidata does not yield any results the item would first then have to be created by the cataloger. 
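as a sketch of the lookup step just described, a cataloguing client could call wikidata's public wbsearchentities api and let the cataloguer pick the matching item to link. the code below is illustrative only and is not part of any particular catalog system; the function name and the sample search term are assumptions.

```python
# a hypothetical sketch of a wikidata lookup for catalogers: search by name
# and present candidate items (q-number, label, description) to choose from.
import json
import urllib.parse
import urllib.request

def search_wikidata(term, language="en", limit=5):
    """return candidate wikidata items for a name string."""
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    })
    url = f"https://www.wikidata.org/w/api.php?{params}"
    with urllib.request.urlopen(url) as response:
        results = json.load(response)["search"]
    return [(r["id"], r.get("label", ""), r.get("description", "")) for r in results]

if __name__ == "__main__":
    for qid, label, description in search_wikidata("albert einstein"):
        print(qid, label, description)
```

the selected q-number would then be written into the record being edited, exactly as described above.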
creating a new item using the wikidata user interface (figure 4) is straightforward: create an account, add a new item, and add statements (fields) and values. wikidata | van veen 78 https://doi.org/10.6017/ital.v38i2.10886 figure 4. data entry screen for entering a new item in wikidata. catalogers must be aware of some rules when creating items. wikidata editors may delete items that fall under one of wikidata’s exclusion criteria, such as vandalism, empty descriptions, broken links, etc. in addition, the item must refer to an instance of a clearly identifiable conceptual or material “notable” entity. notable means that the item must be mentioned by at least one reliable, third-party published source. here, common sense is required: being mentioned in a telephone book or a newspaper is in itself not considered as notability. entities that are not notable enough to be entered into wikidata would then remain identified by a link to a local or other thesaurus. possible objections to wikidata as authority control mechanism although it is, at least at the present moment, not the intention of this article to propose the use of wikidata as the primary local authority control mechanism, some institutions may nonetheless consider the opportunity to do so. there are numerous objections to this idea to note, including: 1) institutions may consider themselves authoritative sources of information, and may therefore want to keep control over “their” thesaurus. the idea that the greater community can make changes to “their” thesaurus may not be tenable to them. quality control and error detection certainly are important issues, but experts from outside the library can sometimes provide more and better information about a resource than cataloguing professionals. for misuse and erroneous input, the community can be relied on and trusted to correct and add to wikidata entries. information that is critical for local usage, such as access control, may still be managed locally. despite possible objections to using wikidata for universal authority control, national libraries and other institutions can information technology and libraries | june 2019 79 work together with wikidata to share responsibility of maintaining the resource, to optimize and harmonize the shared use of wikidata, and maintain validity and authority. this might imply a more rigorous quality control. 2) existing systems like viaf and isni already, at present, still contain more persons than wikidata, so why use wikidata? viaf and isni are domain specific and are more restrictive with respect to updates of their content and the availability of tools and apis. in wikidata both viaf and isni are just one hub away and for internal use the viaf and isni identifiers remain available. the question here is whether there will be a moment that wikidata reaches a critical mass and supersedes viaf and isni. 3) there may be disagreement about a certain entity, especially when it concerns political events or persons whose role is perceived differently by different political parties. wikidata contains neutral properties. the properties that may contain subjective qualifications or might suffer bias are mostly behind the backlinks, like the abstract in wikipedia. a fundamental difference between wikipedia and wikidata is that wikipedia doesn’t have to be consistent across languages. wikidata is much more structured and therefore more useful for semantic applications. 
it doesn’t allow for the different nuances in descriptions like wikipedia articles do and therefore wikidata doesn’t reflect different opinions in descriptions and is less subject to bias.9 furthermore, the cataloguing practices in libraries are subject to bias and subjectivity too. perception and political view may, for example, be reflected in some subject headings and may also change over time.10 it is debatable whether a cataloger is more neutral and less biased than a larger user community. although the use and acceptance of wikipedia as a true source of information may be arguable, in the light of the current “fake news” discussion it is extremely important to guard the correctness of information in wikipedia. in this context it is interesting to note that “according to a study in nature, the correctness of wikipedia articles is comparable to the encyclopaedia britannica, and a study by ibm researchers found that vandalism is repaired extremely quickly.”11 4) some objections have to do with the discussion of “centralization versus decentralization.” some institutions may not want a central system perceptively having control over their local data. the idea of using wikidata as a common authority control mechanism is not that different from the use of any other thesaurus or identifier framework like isbn, issn, etc., except for its use of a central resource description. 5) what if wikidata disappears? there are solutions in terms of mirrors and a local copy of wikidata. moreover, national libraries and other, similar institutions that are already responsible for long-term preservation of digital content can take responsibility for keeping wikidata alive to maximize its viability wikidata | van veen 80 https://doi.org/10.6017/ital.v38i2.10886 conclusion reconciliation of linked data identifiers in general, and using the wikidata identifier as universal identifier in particular, has been shown to have many advantages. libraries and similar institutions can gradually start using the wikidata identifier without needing to make radical changes in their local database infrastructure. when wikidata reaches a critical mass, libraries and similar institutions may want to switch to using wikidata identifiers as the default resource identifiers or authority records. however, given the enormous growth of the number of collections that link entities to wikidata that is already taking place, we might end up in a situation where the perception is that “if an item is not in wikidata, it doesn’t exist” stimulating putting more items in wikidata and making local descriptions less relevant. from a strategic point of view for adopting wikidata decision makers may pose the question: “why do we have a local thesaurus when we already have wikidata?” the next question, then, will probably not be “should we go this way?” but rather “when should we go this way and start using the wikidata identifier as the identifier?” references 1 robert sanderson, “the linked data snowball and why we need reconciliation,” slideshare, apr. 4, 2016, https://www.slideshare.net/azaroth42/linked-data-snowball-or-why-we-needreconciliation. 2 karen smith-yoshimura, “the rise of wikidata as a linked data source,” hanging together, aug. 6, 2018, http://hangingtogether.org/?p=6775. 3 joachim neubert, “wikidata as a linking hub for knowledge organization systems? 
integrating an authority mapping into wikidata and learning lessons for kos mappings,” in proceedings of the 17th european networked knowledge organization systems workshop, 2017, 14-25, http://ceur-ws.org/vol-1937/paper2.pdf. 4 theo van veen, “wikidata as universal library thesaurus,” presented oct. 2017 at wikidatacon 2017, berlin, https://www.youtube.com/watch?v=1_nxkbncohm. 5 “httprange-14,” wikipedia, accessed mar. 15, 2019, https://en.wikipedia.org/wiki/httprange-14. 6 theo van veen et. al., “linking named entities in dutch historical newspapers,” in metadata and semantics research, mtsr 2016, ed. emmanouel garoufallou (cham: springer, 2016), 205–10, https://doi.org/10.1007/978-3-319-49157-8_18. 7 video demonstration of “kb research portal,” kb | national library of the netherlands, http://www.kbresearch.nl/xportal, accessed apr. 26, 2019, https://www.youtube.com/watch?v=j5mcem-hemg. 8 ruben verborgh, “linked data fragments: query the web of data on web-scale by moving intelligence from servers to clients,” accessed mar. 15, 2019, http://linkeddatafragments.org/. 9 mark graham, “the problem with wikidata,” apr. 6, 2012, https://www.theatlantic.com/technology/archive/2012/04/the-problem-withwikidata/255564/. information technology and libraries | june 2019 81 10 candise branum, “the myth of library neutrality,” may 15, 2014, https://candisebranum.wordpress.com/2014/05/15/the-myth-of-library-neutrality/. 11 “the reliability of wikipedia,” wikipedia, accessed mar. 15, 2019, https://en.wikipedia.org/wiki/reliability_of_wikipedia. hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries annie wu, santi thompson, rachel vacek, sean watkins, and andrew weidner information technology and libraries | june 2016 5 abstract since 2009, tens of thousands of rare and unique items have been made available online for research through the university of houston (uh) digital library. six years later, the uh libraries’ new digital initiatives call for a more dynamic digital repository infrastructure that is extensible, scalable, and interoperable. the uh libraries’ mission and the mandate of its strategic directions drives the pursuit of seamless access and expanded digital collections. to answer the calls for technological change, the uh libraries administration appointed a digital asset management system (dams) implementation task force to explore, evaluate, test, recommend, and implement a more robust digital asset management system. this article focuses on the task force’s dams selection activities: needs assessment, systems evaluation, and systems testing. the authors also describe the task force’s dams recommendation based on the evaluation and testing data analysis, a comparison of the advantages and disadvantages of each system, and system cost. finally, the authors outline their dams implementation strategy comprised of a phased rollout with the following stages: system installation, data migration, and interface development. introduction since the launch of the university of houston digital library (uhdl) in 2009, the uh libraries have made tens of thousands of rare and unique items available online for research using contentdm. as we began to explore and expand into new digital initiatives, we realize that the uh libraries’ digital aspirations require a more dynamic, flexible, scalable, and interoperable digital asset management system that can manage larger amounts of materials in a variety of formats. 
we plan to implement a new digital repository infrastructure that accommodates creative workflows and allows for the configuration of additional functionalities such as digital exhibits, data mining, cross-linking, geospatial visualization, and multimedia presentation. the annie wu (awu@uh.edu) is head of metadata and digitization services, santi thompson (sathompson3@uh.edu) is head of repository services, rachel vacek (evacek@uh.edu) is head of web services, sean watkins (slwatkins@uh.edu) is web projects manager, and andrew weidner (ajweidner@uh.edu) is metadata services coordinator, university of houston libraries. mailto:awu@uh.edu mailto:sathompson3@uh.edu mailto:evacek@uh.edu mailto:slwatkins@uh.edu mailto:ajweidner@uh.edu hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. | doi:10.6017/ital.v35i2.9152 6 new system will be designed with linked data in mind and will allow us to publish our digital collections as linked open data within the larger semantic web environment. the uh libraries strategic directions set forth a mandate for us to “work assiduously to expand our unique and comprehensive collections that support curricula and spotlight research. we will pursue seamless access and expand digital collections to increase national recognition.”1 to fulfill the uh libraries’ mission and the mandate of our strategic directions, the uh libraries administration appointed a digital asset management system (dams) implementation task force to explore, evaluate, test, recommend, and implement a more robust digital asset management system that would provide multiple modes of access to the uh libraries’ unique collections and accommodate digital object production at a larger scale. the collaborative task force comprises librarians from four departments: metadata and digitization services (mds), web services, digital repository services, and special collections. the core charge of the task force is to: • perform a needs assessment and build criteria and policies based on evaluation of the current system and requirements for the new dams • research and explore dams on the market and identify the top three systems for beta testing in a development environment • generate preliminary recommendations from stakeholders' comments and feedback • coordinate installation of the new dams and finish data migration • communicate the task force work to uh libraries colleagues literature review libraries have maintained dams for the publication of digitized surrogates of rare and unique materials for over two decades. during that time, information professionals have developed evaluation strategies for testing, comparing, and evaluating library dams software. reviewing these models and associated case studies provided insight into common practices for selecting systems and informed how the uh libraries dams implementation task force conducted its evaluation process. 
one of the first publications of its kind, “a checklist for evaluating open source digital library software” by dion hoe-lian goh et al., presents a comprehensive list of criteria for library dams evaluation.2 the researchers developed twelve broad categories for testing (e.g., content management, metadata, and preservation) and generated a scoring system based on the assignment of a weight and a numeric value to each criterion.3 while the checklist was created to assist with the evaluation process, the authors note that an institution’s selection decision should be guided primarily by defining the scope of their digital library, the content being curated using the software, and the uses of the material.4 through their efforts, the authors created a rubric that can be utilized by other organizations when selecting a dams. information technology and libraries | june 2016 7 subsequent research projects have expanded upon the checklist evaluation model. in “choosing software for a digital library,” jody deridder outlines major issues that librarians should address when choosing dams software, including many of the hardware, technological, and metadata concerns that goh et al. identified.5 additionally, she emphasizes the need to account for personnel and service requirements with a variety of activities: usability testing and estimating associated costs; conducting a formal needs assessment to guide the evaluation process; and a tiered-testing approach, which calls upon evaluators to winnow the number of systems.6 by considering stakeholder needs, from users to library administrators, deridder’s contributions inform a more comprehensive dams evaluation process. in addition to creating evaluation criteria, the literature on dams selection has also produced case studies that reflect real-world scenarios and identify use cases that help determine user needs and desires. in “evaluation of digital repository software at the national library of medicine,” jennifer l. marill and edward c. luczak discuss the process that the national library of medicine (nlm) used to compare ten dams, both proprietary and open-source.7 echoing goh et al. and deridder, marill and luczak created broad categories for testing and developed a scoring system for comparing dams.8 additionally, marill and luczak enriched the evaluation process by implementing two testing phases: “initial testing of ten systems” and “in-depth testing of three systems.”9 this method allowed nlm to conduct extensive research on the most promising systems for their needs before selecting a dams to implement. the tiered approach appealed to the task force, and influenced how it conducted the evaluation process, because it balances efficiency and comprehensiveness. in another case study, dora wagner and kent gerber describe the collaborative process of selecting a dams across a consortium. in their article “building a shared digital collection: the experience of the cooperating libraries in consortium,”10 the authors emphasize additional criteria that are important for collaborating institutions: the ability to brand consortial products for local audiences; the flexibility to incorporate differing workflows for local administrators; and the shared responsibility of system maintenance and costs.11 while the uh libraries will not be managing a shared repository dams, the task force appreciated the article’s emphasis on maximizing customizations to improve the user experience. 
in “evaluation and usage scenarios of open source digital library and collection management tools,” georgios gkoumas and fotis lazarinis describe how they tested multiple open-source systems against typical library functions—such as acquisitions, cataloging, digital libraries, and digital preservation—to identify typical use cases for libraries.12 some of the use cases formulated by the researchers address digital platforms, including features related to supporting a diverse array of metadata schema and using a simple web interface for the management of digital assets.13 these use cases mirror local feature and functionality requests incorporated into the uh libraries’ evaluation criteria. hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. | doi:10.6017/ital.v35i2.9152 8 in “digital libraries: comparison of 10 software,” mathieu andro, emmanuelle asselin, and marc maisonneuve discuss a rubric they developed to compare six open-source platforms (invenio, greenstone, omeka, eprints, ori-oai, and dspace) and four proprietary platforms (mnesys, digitool, yoolib, and contentdm) around six core areas: document management, metadata, engine, interoperability, user management, and web 2.0. 14 the authors note that each solution is “of good quality” and that institutions should consider a variety of factors when selecting a dams, including the “type of documents you will want to upload” and the “political criteria (open source or proprietary software)” desired by the institution.15 this article provided the uh libraries with additional factors to include in their evaluation criteria. finally, heather gilbert and tyler mobley’s article “breaking up with contentdm: why and how one institution took the leap to open source,” provides a case study for a new trend: selecting a dams for migration from an existing system to a new one.16 the researchers cite several reasons for their need to select a new dams, primarily their current system’s limitations with searching and displaying content in the digital library.17 they evaluated alternatives and selected a suite of open-source tools, including fedora, drupal, and blacklight, which combine to make up their new dams.18 gilbert and mobley also reflect on the migration process and identify several hurdles they had to overcome, such as customizing the open-source tools to meet their localized needs and confronting inconsistent metadata quality.19 gilbert and mobley’s article most closely matches the scenario faced by the uh libraries. our study adds to the limited literature on evaluating and selecting dams for migration in several ways. it demonstrates another model that other institutions can adapt to meet their specific needs. it identifies new factors for other institutions to take into account before or during their own migration process. finally, it adds to the body of evidence for a growing movement of libraries migrating from proprietary to open-source dams. dams evaluation and analysis methodology needs assessment the dams implementation task force fulfilled the first part of its charge by conducting a needs assessment. the goal of the needs assessment was to collect the key requirements of stakeholders, identify future features of the new dams, and gather data in order to craft criteria for evaluation and testing in the next phase of its work. 
the task force employed several techniques for information gathering during the needs assessment phase: • identified stakeholders and held internal focus group interviews to identify system requirement needs and gaps • reviewed scholarly literature on dams evaluation and migration • researched peer/aspirational institutions • reviewed national standards around dams information technology and libraries | june 2016 9 • determined both the current use of uhdl as well as its projected use of uhdl • identified uhdl materials and users task force members took detailed notes during each focus group interview session. the literature research on dams evaluation helped the task force to find articles with comprehensive dams evaluation criteria. the niso criteria for core types of entities in digital library collections were also listed and applied to the evaluation after reviewing the niso framework of guidance for building good digital collections.20 more than forty peer and aspirational institutions’ digital repositories were benchmarked to identify web site names, platform architecture, documentation, and user and system features. the task force analyzed the rich data gathered from needs assessment activities and built the dams evaluation criteria that prepared the task force for the next phase of evaluation. evaluation, testing, and recommendation the task force began its evaluation process by identifying twelve potential dams for consideration that were ultimately narrowed down to three systems for in-depth testing. using data from focus group interviews, literature reviews, and dams best practices, the group generated a list of benchmark criteria. these broad evaluation criteria covered features in categories of system functionality, content management, metadata, user interface, and search support. members of the task force researched dams documentation, product information, and related literature to score each system against the evaluation criteria. table 1 contains the scores of the initial evaluation. from this process, five systems emerged with the highest scores: ● fedora (and, closely associated, fedora/hydra and fedora/islandora) ● collective access ● dspace ● rosettacontentdm the task force eliminated collective access from the final systems for testing because of its limited functionality. it is based around archival content only, and is not widely deployed. the task force decided not to test contentdm because of the system’s known functionalities that we identified through firsthand experience. after the initial elimination process, fedora (including fedora/hydra and fedora/islandora), dspace, and rosetta remained for in-depth testing. hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. | doi:10.6017/ital.v35i2.9152 10 dams evaluation score* fedora 27 fedora/hydra 26 fedora/islandora 26 collective access 24 dspace 24 rosetta 20 contentdm 20 trinity (ibase) 19 preservica 16 luna imaging 15 roda† 6 invenio‡ 5 table 1. evaluation scores of twelve dams using broad evaluation criteria the task force then created detailed evaluation and testing criteria by drawing from the same sources used previously: focus groups, literature review, and best practices. 
while the broad evaluation focused on high-level functions, the detailed evaluation and testing criteria for the final three systems closely analyzed the specific features of each dams in eight categories: ● system environment and function ● administrative access ● content ingest and management ● metadata ● content access ● discoverability ● report and inquiry capabilities ● system support * total possible score: 29. † removed from evaluation because the system does not support dublin core metadata. ‡ removed from evaluation because the system does not support dublin core metadata. information technology and libraries | june 2016 11 prior to the in-depth testing of the final three systems, the task force researched timelines for system setup. rosetta’s timeline for system setup proved to be prohibitive. consequently, the task force eliminated rosetta from the testing pool and moved forward with fedora and dspace. to conduct the detailed evaluation, the task force scored the specific features under each category utilizing systems testing and documentation. a score range from zero to three (0 = none, 1 = low, 2 = moderate, 3 = high) was assigned for each feature evaluated. after evaluating all features, the score was tallied for each category. our testing revealed that fedora outperformed dspace in over half of the testing sections: content ingest and management, metadata, content access, discoverability, and report and inquiry capabilities. see table 2 for the tallied scores in each testing section. testing sections dspace score fedora score possible score system environment and testing 21 21 36 administrative access 15 12 18 content ingest and management 59 96 123 metadata 32 43 51 content access 14 18 18 discoverability 46 84 114 report and inquiry capabilities 6 15 21 system support 12 11 12 total score: 205 300 393 table 2. scores of top two dams from testing using detailed evaluation criteria after review of the testing results, the task force conducted a facilitated activity to summarize the advantages and disadvantages of each system. based on this comparison, the dams task force recommended that the uh libraries implement a fedora/hydra repository architecture with the following course of action: ● adapt the uhdl user interface to fedora and re-evaluate it for possible improvements ● develop an administrative content management interface with the hydra framework ● migrate all uhdl content to a fedora repository hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. | doi:10.6017/ital.v35i2.9152 12 fedora/hydra advantages fedora/hydra disadvantages open source steep learning curve large development community long setup time linked data ready requires additional tools for discovery modular design through api no standard model for multi-file objects scalable, sustainable, and extensible batch import/export of metadata handles any file format table 3. fedora/hydra advantages and disadvantages the primary advantages of a dams based on fedora/hydra are: a large and active development community; a scalable and modular system that can grow quickly to accommodate large scale digitization; and a repository architecture based on linked data technologies. this last advantage, in particular, is unique among all systems evaluated, and will give the uh libraries the ability to publish our collections as linked open data. 
fedora 4 conforms to the world wide web consortium (w3c) recommendation for linked data platforms.21 the main disadvantage of a fedora/hydra system is the steep learning curve associated with designing metadata models and developing a customized software suite, which translates to a longer implementation time compared to off-the-shelf products. the uh libraries must allocate an appropriate amount of time and resources for planning, implementation, and staff training. the long-term return on investment for this path will be a highly skilled technical staff with the ability to maintain and customize an open-source, standards-based repository architecture that can be expanded to support other uh libraries content such as geospatial data, research data, and institutional repository materials. information technology and libraries | june 2016 13 dspace advantages dspace disadvantages open source flat file and metadata structure easy installation / ready out of box limited reporting capabilities existing familiarity through texas digital library limited metadata features user group / profile controls does not support linked data metadata quality module limited api batch import of objects not scalable / extensible poor user interface table 4. dspace advantages and disadvantages the main advantages of dspace are ease of installation, familiarity of workflows, and additional functionality not found in contentdm.22 installation and migration to a dspace system would be relatively fast, and staff could quickly transition to new workflows because they are similar to contentdm. dspace also supports authentication and user roles that could be used to limit content to the uh community only. commercial add-on modules, although expensive, could be purchased to provide more sophisticated content management tools than are currently available with contentdm. the disadvantages of a dspace system are the same long-term, systemic problems with the current contentdm repository. dspace uses a flat metadata structure, has a limited api, does not scale well, and is not customizable to the uh libraries’ needs. consultations with peers indicated that both contentdm and dspace institutions are exploring the more robust capabilities of fedora-based systems. migration of the digital collections in contentdm to a dspace repository would provide few, if any, long term benefits to the uh libraries. of all the systems considered, implementation of a fedora/hydra repository aligns most clearly with the uh libraries strategic directions of attaining national recognition and improving access to our unique collections. the fedora and hydra communities are very active, with project management overseen by duraspace and hydra respectively.23,24 over the long term, a repository based on fedora/hydra will give the uh libraries a low cost, scalable, flexible, and interoperable platform for providing online access to our unique collections. hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. 
cost considerations

to balance the current digital collections production schedule with the demands of a timely implementation and migration, the task force identified the following investments as cost effective for fedora/hydra and dspace, respectively:

fedora/hydra:
● metadata librarian: annual salary
  ● manages daily metadata unit operations during implementation
  ● streamlines the migration process

dspace:
● metadata librarian: annual salary
  ● manages daily metadata unit operations during implementation
  ● streamlines the migration process
● @mire modules: $41,500
  ● content delivery (3): $13,500
  ● metadata quality: $10,000
  ● image conversion suite: $9,000
  ● content & usage analysis: $9,000
  ● these modules require one-time fees to @mire that recur when upgrading to a new version of dspace

table 5. start-up costs associated with fedora/hydra and dspace

the task force determined that an investment in one librarian's salary is the most cost-effective course of action. the new metadata librarian will manage daily operations of the metadata unit in metadata & digitization services while the metadata services coordinator, in close collaboration with the web projects manager, leads the dams implementation process. in contrast to fedora, migration to dspace would require a substantial investment in third-party software modules from @mire to deliver the best possible content management environment and user experience.

implementation strategies

the implementation of the new dams will occur in a phased rollout comprised of the following stages: system installation, data migration, and interface development. mds and web services will perform the majority of the work, in consultation with key stakeholders from special collections and other units. throughout this process, the dams implementation task force will consult with the digital preservation task force* to coordinate the preservation and access systems.

phase one: system installation | phase two: data migration | phase three: interface development
set up production and server environment | formulate content migration strategy and schedule | reevaluate front-end user interface
rewrite uhdl front-end application for fedora/solr | migrate test collections and document exceptions | rewrite uhdl front end as a hydra head, or . . .
create metadata models | conduct the data migration | . . . update current front end
coordinate workflows with digital preservation task force | create preservation metadata for migrated data | establish interdepartmental production workflows
begin development of administrative hydra head for content management | continue development of the hydra administrative interface | refine administrative hydra head for content management

table 6. overview of dams phased implementation

phase one: system installation

during the first phase of dams implementation, web services and mds will work closely together to install an open-source repository software stack based on fedora, rewrite the current php front-end interface to provide public access to the data in the new system, and create metadata content models for the uhdl based on the portland common data model,25 in consultation with the coordinator of digital projects from special collections and other key stakeholders. the dams task force will consult with the digital preservation task force† to determine how closely the preservation and access systems will be integrated and at what points.
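the phase one front-end rewrite pairs fedora with a solr index; a hedged sketch of the kind of search request such a front end might issue, assuming a solr core named uhdl on localhost and illustrative field names:

```python
# Minimal sketch: query a Solr index that mirrors repository metadata.
# The core name ("uhdl") and field names (title, collection) are assumptions.
import requests

SOLR = "http://localhost:8983/solr/uhdl/select"

params = {
    "q": "title:houston",          # user keyword search
    "fq": "collection:postcards",  # facet-style filter
    "rows": 10,
    "wt": "json",                  # ask Solr for a JSON response
}
resp = requests.get(SOLR, params=params)
resp.raise_for_status()
for doc in resp.json()["response"]["docs"]:
    print(doc.get("title"), "-", doc.get("collection"))
```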
the two groups will also jointly outline a dams migration strategy that aligns with the preservation system. web services and mds will collaborate on research and development of an administrative interface, based on the hydra framework, for day-to-day management of uhdl content.

* an appointed task force to create a digital preservation policy and identify strategies, actions, and tools needed to sustain long-term access to digital assets maintained by uh libraries.
† a working team at uh libraries that enforces the digital preservation policy and maintains the digital preservation system.

phase two: data migration

in the second phase, mds will migrate legacy content from contentdm to the new system and work with web services, special collections, and the architecture and art library to resolve any technical, metadata, or content problems that arise. the second phase will begin with the development of a strategy for completing the work in a timely fashion, followed by migration of representative sample collections to the new system to test and refine its capabilities. after testing is complete, all legacy content will be migrated from contentdm to fedora, and preservation metadata for migrated collections will be created and archived. development work on the hydra administrative interface will also continue. after the data migration is complete, all new collections will be ingested into fedora/hydra, and the current contentdm installation will be retired.

phase three: interface development

in the final phase, web services will reevaluate the current front-end user interface (ui) for the uhdl by conducting user tests to better understand how and why users are visiting the uhdl. web services will also analyze web and system analytics and gather feedback from special collections and other stakeholders. depending on the outcome of this research, web services may create a new ui based on the hydra framework or choose to update the current front-end application with modifications or new features. web services and mds will also continue to develop or adopt tools for the management of uhdl content and work with special collections and the branch libraries to establish production workflows in the new system. continued development work on the front-end and administrative interfaces, for the life of the new digital asset management system, is both expected and desirable as we maintain and improve the uhdl infrastructure and contribute to the open-source software community in line with the uh libraries strategic directions.

ongoing: assessment, enhancement, training, and documenting

throughout the transition process, mds and web services will undergo extensive training in workshops and conferences to develop the skills necessary for developing and maintaining the new system. they will also establish and document workflows to ensure the long-term viability of the system. regular consultation with special collections, the branch libraries, and other stakeholders will be conducted to ensure that the new system satisfies the requirements of colleagues and patrons. ongoing activities will include:

● assessing service impact of the new system
● user testing on the ui
● regular system enhancements
● establishing new workflows
● creating and maintaining documentation
● training: conferences, webinars, workshops, etc.
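returning to phase two, one low-risk way to begin extracting legacy records from contentdm is over its oai-pmh endpoint rather than screen scraping. a minimal harvesting sketch using the sickle library; the repository url, endpoint path, and set name are assumptions, not uh libraries' actual values:

```python
# Minimal sketch: harvest Dublin Core records from a CONTENTdm OAI-PMH
# endpoint as a first migration step. URL and set spec are assumptions.
from sickle import Sickle

harvester = Sickle("https://cdm.example.edu/oai/oai.php")

records = harvester.ListRecords(metadataPrefix="oai_dc", set="p15195coll1")
for record in records:
    identifier = record.header.identifier
    metadata = record.metadata          # dict of Dublin Core elements
    print(identifier, metadata.get("title"))
```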
conclusion

transitioning from contentdm to a fedora/hydra repository will place the uh libraries in a position to sustainably grow the amount of content in the uh digital library and customize the uhdl interfaces for a better user experience. publishing our data in a linked data platform will give the uh libraries the ability to more easily publish our data for the semantic web. in addition, the fedora/hydra architecture can be adapted to support a wide range of uh libraries projects, including a geospatial data portal, a research data repository, and a self-deposit institutional repository. over the long term, the return on investment for implementing an open-source repository architecture based on industry-standard software will be: improved visibility of our unique collections on the web; expanded opportunities for aggregating our collections with high-profile repositories such as the digital public library of america; and increased national recognition for our digital projects and staff expertise.

references

1. "the university of houston libraries strategic directions, 2013–2016," accessed july 22, 2015, http://info.lib.uh.edu/sites/default/files/docs/strategic-directions/2013-2016-libraries-strategic-directions-final.pdf.
2. dion hoe-lian goh et al., "a checklist for evaluating open source digital library software," online information review 30, no. 4 (july 13, 2006): 360–79, doi:10.1108/14684520610686283.
3. ibid., 366.
4. ibid., 364.
5. jody l. deridder, "choosing software for a digital library," library hi tech news 24, no. 9 (2007): 19–21, doi:10.1108/07419050710874223.
6. ibid., 21.
7. jennifer l. marill and edward c. luczak, "evaluation of digital repository software at the national library of medicine," d-lib magazine 15, no. 5/6 (may 2009), doi:10.1045/may2009-marill.
8. ibid.
9. ibid.
10. dora wagner and kent gerber, "building a shared digital collection: the experience of the cooperating libraries in consortium," college & undergraduate libraries 18, no. 2–3 (2011): 272–90, doi:10.1080/10691316.2011.577680.
11. ibid., 280–84.
12. georgios gkoumas and fotis lazarinis, "evaluation and usage scenarios of open source digital library and collection management tools," program: electronic library and information systems 49, no. 3 (2015): 226–41, doi:10.1108/prog-09-2014-0070.
13. ibid., 238–39.
14. mathieu andro, emmanuelle asselin, and marc maisonneuve, "digital libraries: comparison of 10 software," library collections, acquisitions, & technical services 36, no. 3–4 (2012): 79–83, doi:10.1016/j.lcats.2012.05.002.
15. ibid., 82.
16. heather gilbert and tyler mobley, "breaking up with contentdm: why and how one institution took the leap to open source," code4lib journal, no. 20 (2013), http://journal.code4lib.org/articles/8327.
17. ibid.
18. ibid.
19. ibid.
20. niso framework working group with support from the institute of museum and library services, a framework of guidance for building good digital collections (baltimore, md: national information standards organization (niso), 2007).
21. "linked data platform 1.0," w3c, accessed july 22, 2015, http://www.w3.org/tr/ldp/.
22. "dspace," accessed july 22, 2015, http://www.dspace.org/.
23. "fedora repository home," accessed july 22, 2015, https://wiki.duraspace.org/display/ff/fedora+repository+home.
24. "hydra project," accessed july 22, 2015, http://projecthydra.org/.

foreword

the editorial board of the journal of library automation is pleased to pay tribute to frederick g. kilgour who, with the able assistance of his assistant editor, eleanor m. kilgour, so firmly established this periodical and set its standards so high. especially in view of the fact that in these first years of journal publication, mr. kilgour was also designing and implementing the complex system which is the ohio college library center, his achievement as first editor was remarkable. to him the information science and automation division of the american library association owes a great debt. as library automation moves further into the seventies, the context of its existence changes. ever-increasing fiscal pressures have required economic justification for every alteration of traditional practice. the mere availability of equipment, of programs and tested system design, even of skilled and experienced manpower can no longer be considered enough. novelty, the magic word "innovation," seldom now casts a spell on those who control institutional budgets. increasingly, in the issues of this journal, we hope that emphasis will be placed on reviews of experience, retrospective evaluations of operation rather than optimistic projections made in the first bright mornings of system design. we must have reports if not of failures at least of alterations and accommodations enforced on operational systems by experience and the heavy hand of time. it is our further hope that the journal will receive more reports from public and school libraries which indicate an increasing dedication, in automation explications, to the social and educational goals of those institutions. -ajg

improved delivery of library materials: the cleveland experience

j. p. herling: cleveland state university library; m. g. fancher beeler: cuyahoga county public library; and a. reisman and b. v. dean: case western reserve university, department of operations research.

this paper describes a project designed to improve services to library users by solving, through the application of operations research methods, a complex problem of delivery of library materials in an urban, multisystem library service region. unique features, methodology, results, and limitations are discussed.
introduction when one realizes that 113 of the major libraries in the country carry 75 percent of the estimated total cost of interlibrary loan per year of 16 million dollars, the importance of greater utilization of local resources is obvious.1 during the planning of the implementation of a closed-circuit teletype communications network ( twp) among libraries in greater cleveland in 1968, it became apparent that improved communications answer only a part of the problem of shared access to library materials. in fact, user frustration is often increased by the inability of a library to provide quickly the materials that it has informed a library user are available in another library in the region. effective delivet·y of materials is an essential component of a successful library network, with efficiency a highly desirable characteristic. in late summer of 1968, representatives of several libraries in cleveland met as an ad hoc committee to discuss approaches to the solution of the problem: how to make the total resources of all types of libraries in greater cleveland more accessible to all, hopefully, by providing daily delivery service among all libraries. members of the committee agreed that the complexity of the problem required more than the pragmatic approach. discussions with members of the department of operations research at case western reserve university led to the preparation of a proposal: "an operations research study and design of an optimal distribution network for selected public, academic, and special libraries in greater 276 journal of libmry automation vol. 7 i 4 december 197 4 cleveland." prior to the preparation of the proposal, a literature search and inquiries to major cooperative networks had indicated that nowhere had the operations research approach been utilized to improve a library delivery system of the scope of that with which we were dealing. 2 after a year's delay, the proposal became a project sponsored by the library council of greater cleveland 4 and funded through the state library of ohio under title iii of the library services and construction act. a task force of librarians and operations researchers began work in the summer of 1970: the official project was completed in september 1972.3 the strategy chosen for this project was to delineate objectives, describe the present system iri detail, and design an improved system ·based on the existing system. methodology specifically, the generalized statement of the problem as defined by the project team was to 1. determine optimized delivery frequencies, schedules, and · routes which maintain the present distribution system's effectiveness and reduce present costs, 2. determine the optimal delivery frequencies, schedules, and routes which maximize the distribution system's effectiveness without increasing costs, and 3. evaluate alternative configurations of distribution systems in consideration of the network of library demands and geographical locations of garages, vehicles, and drivers. next the task force undertook, by means of questionnaires, data collection forms, and site visitations, the difficult and time-consuming task of describing a system the magnitude and complexity of which is apparent from figure 1. the result was a report, systems description i. 
we consider systems description i a major accomplishment, bringing together for the first time specific details of many of the operations of the libraries and library systems in cleveland and giving basic information on the who, how, and how much of the delivery subsystem. this subsystem was formally defined as comprising (1) personnel, (2) vehicles, (3) facilities, (4) supplies, and (5) funds, together with the schedules and routes involved in the physical movement of library materials. in cleveland, this consisted of (1) drivers, custodial staff (in smaller libraries), and student couriers (in academic libraries); (2) trucks owned and operated by the cuyahoga county public library and the cleveland public library, plus commercial vehicles and private automobiles utilized by academic and independent suburban libraries; (3) garages owned by the two libraries mentioned; (4) equipment such as gasoline, tires, bindery boxes, telescopes, etc.; and (5) direct and indirect costs of roughly $200,000.

* the library council of greater cleveland is comprised of the directors of the following libraries: case western reserve university, cleveland heights-university heights public library, cleveland public library, cleveland state university, cuyahoga county public library, east cleveland public library, euclid public library, lakewood public library, porter public library, rocky river public library, shaker heights public library, and willoughby-eastlake public library.

fig. 1. structure of existing major distribution systems and frequencies of deliveries.

the systems description also provided information on the use of the delivery subsystem. materials transported were categorized as shown in table 1.

table 1. material types and values*

type | value (weight)
i1 interlibrary loan | .135
i2 intralibrary loan | .135
i3 audiovisuals | .135
i4 reciprocal return | .119
i5 newly processed contract | .112
i6 newly processed intralibrary | .109
i7 photoduplication | .098
i8 mending and bindery | .081
i9 bulk intralibrary loan | .067
i10 correspondence | .065
i11 supplies | .058
i12 gifts | .021

* the values assigned are described later in the paper.

the magnitude of the volume of materials involved is clear from the fact that on a single day each truck averages a delivery and/or pickup of 43 telescopes, 114 packages, 49 bindery boxes, and 36 audiovisual items. annually, 5 million volumes of inter- and intralibrary loans and reciprocal returns are transported among the libraries. for each of the libraries, the amount of materials originating for shipment was defined as the "demand" which that library placed on the delivery system. although most of the delivery stops in the cleveland area are libraries, some are not, e.g., hospitals, post offices, boards of education, and schools. we decided to designate delivery points as "nodes." a library node is characterized by the following attributes:

1. it receives delivery on a continuing basis.
2. it is located within the boundaries of a specified geographic area at a fixed site.
3. it has an expressed need for library materials and/or is a source of materials needed elsewhere.
4. it contains library facilities and/or assigns a person to library service.
5. it has a formal (contractual, political, administrative, etc.) or an informal agreement with other nodes and/or library systems.
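the node definition above maps naturally onto a small record type that a routing program can work with. a sketch; the field names and sample values are ours, not the project's:

```python
# Minimal sketch: a delivery "node" as the study defines it, with the
# attributes used later for routing. Field names and values are illustrative.
from dataclasses import dataclass

@dataclass
class Node:
    code: str                  # coded notation for computer processing
    name: str
    x: float                   # geographic coordinates
    y: float
    weekly_demand: int         # items originating for shipment
    deliveries_per_week: int   # assigned delivery frequency

ccpl_hq = Node(code="HDQ", name="Cuyahoga County Public Library HQ",
               x=41.4, y=-81.7, weekly_demand=430, deliveries_per_week=5)
print(ccpl_hq)
```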
seven hundred sixty-two nodes were identified in the systems description, including over 700 libraries with total collections of over 8 million books, 27,000 periodical subscriptions, and 200,000 technical reports. each of the nodes was coded to provide a convenient notation for computer processing. figure 2 displays a section of a computer-generated map.

fig. 2. a portion of one of the computer-generated maps.

as indicated earlier, the discipline used in this study was that of operations research. operations research is the application of mathematical and engineering techniques to the solution of management and systems problems, generally but not necessarily with the use of the computer. the operations research approach requires a valid unit of measurement. if an existing system is to be evaluated for comparison with alternative systems other than subjectively, some quantitative basis must be derived. we believe that one of the most important products of this project was the development of a measure of effectiveness, or "objective function." this measure was a composite of numerical values (weights) assigned to the types of materials to be delivered, as shown in table 1; the frequency of delivery within a week; timeliness value (utility), as shown in figures 3 and 4; and the number of units to be delivered.

fig. 3. utility curves for the timeliness of library materials delivery (curves shown for supplies, mending and binding, newly processed contract materials, reciprocal returns, photoduplicated materials, and interlibrary loans).

fig. 4. utility curves for the timeliness of library materials delivery (curves shown for gifts, intralibrary bulk shipments, newly processed intralibrary materials, correspondence, and intralibrary loans).

to illustrate: ten interlibrary loan items (a weight of .135) delivered in less than one day (a utility of 1) have an effectiveness value of 1.35. on the other hand, ten items delivered in five days (a utility of .5) have an effectiveness value of 0.5 x .135 x 10 = 0.675. a system designed to accomplish the latter would have 50 percent less effectiveness than a system that accomplished the former. no librarian needs to be told that it is generally more important to deliver interlibrary loans promptly than it is to deliver supplies. but in order to use operations research methods, quantitative values, as we have said, are required.
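the effectiveness measure is simply weight x utility x quantity, summed over shipments. a sketch that reproduces the interlibrary loan illustration, with weights taken from table 1 and a stand-in utility curve in place of figures 3 and 4:

```python
# Minimal sketch: effectiveness = material weight x timeliness utility x units.
# Weights follow table 1; the utility function is a stand-in for figs. 3-4.
WEIGHTS = {"interlibrary loan": 0.135, "supplies": 0.058, "gifts": 0.021}

def utility(days: float) -> float:
    """Illustrative timeliness curve: full value same-day, half value at five days."""
    if days <= 1:
        return 1.0
    return max(0.0, 1.0 - 0.125 * (days - 1))

def effectiveness(material: str, units: int, days: float) -> float:
    return WEIGHTS[material] * utility(days) * units

print(effectiveness("interlibrary loan", 10, days=1))   # 1.35
print(effectiveness("interlibrary loan", 10, days=5))   # 0.675, half as effective
```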
the values shown in table 1 and the sensitivity to timeliness of delivery, i.e., figures 3 and 4, were established by the use of a technique known as the delphi method. our application of the method in this project has been described elsewhere.4 essentially the method seeks out a consensus from a panel of knowledgeable people, in this case experts from academic, school, special, and public libraries, and a trustee. the methodology has three characteristics: anonymity, controlled feedback, and statistical group response. anonymity is used to minimize the impacts of dominant individuals in the panel. this is achieved by eliciting separate and individual responses to previously prepared questions. in this case, the responses were made in writing on preprinted forms. controlled feedback reduces the variance in parameter estimates. after the first and all remaining rounds, the results of the previous round are fed back to the panel in a summarized form showing the vote distribution, along with various justifications for votes after the second round. since the panel is asked to reevaluate their position based on the feedback provided, but with no particular attempt to arrive at unanimity, the spread of votes will usually be much smaller after several rounds than during the earlier rounds. this is known as statistical group response. in each case consensus was reached within five rounds.

in addition to the need for evaluating system effectiveness in relation to service, there is the need to relate effectiveness to costs. systems description i provided the data on all fixed and variable costs of the existing system. because of the prevailing use in libraries of line accounting, all the associated costs were not readily available, hence present costs were probably underestimated. for purposes of computer processing, cost per minute of driving and cost per mile of truck operation were identified.

fig. 5. weekly mileage versus time for ccpl trucks.

software

to repeat: the general approach was to study the characteristics of the existing system, then design an improved system. using the elements described above, a computer program was written to emulate the system, introducing, however, the measure of effectiveness to make it possible to establish values representing the existing level of performance. entered in the program were

1. the nodes,
2. demand and frequency of delivery at each node,
3. geographic coordinates of each node,
4. unit costs, and
5. weights and utilities for each type of material.

the program was run to compute, for each driver, the costs, distances traveled, volume delivered, time utilization, the effectiveness as discussed earlier, and then the cost/effectiveness ratios. figures 5 through 10 show the hard data inputs to the program. table 2 depicts a sample of the statistical analysis performed on the hard data.

table 2. statistical analysis of the cpl driver collection cards, summer schedule 8/17/70-9/4/70 (daily mileage, number of stops, and numbers of telescopes, packages, bindery boxes, and audiovisuals delivered and picked up):
mean: 48.13, 14.33, 18.00, 13.11, 0.11, 5.89, 0.11, 16.22
variance: 187.27, 9.50, 58.50, 98.61, 0.11, 57.94, 51.11, 0.11
standard deviation: 13.68, 3.08, 7.65, 9.93, 0.33, 7.61, 7.15, 0.33

fig. 6. number of stops per week versus time for ccpl trucks.
four sets of computer runs were made, first using data for the same week for all drivers, then data for several weeks for different drivers. total effectiveness of the existing system, as measured by the sum of the multiples of the importance of each material type (weight), their timeliness values (utilities), and total amounts of materials delivered, ranged from 8,110 to 9,950. costs ranged from $3,801 to $3,934 per week.

a second program incorporating the tools of operations research known as simulation and optimization was then used to design an improved system. this program (simopt) included a routing algorithm (a set of instructions to the computer) to determine the best routes for each of the drivers on a daily basis. figure 11 describes the basic logic of this program. the procedures to operate the methodology require the following steps (see figure 11):

1. based on the library hierarchy, contractual arrangements, or any extraneous but agreed-upon reasons, the librarians assign frequencies of delivery to each group of nodes or individual node.
2. using the maps (figure 2) and other information, librarians group nodes and assign them to a driver along with the frequency as derived from step 1 above. this constitutes the input necessary for a computer production run.
3. in a production run, the computer calculates:
   a. its best route for each driver day by day;
   b. the effectiveness of the route;
   c. the cost of the route, cumulative by day for one week;
   d. the distance traveled by each driver;
   e. the time spent working by each driver; and
   f. capacity, time, and/or distance constraint violations.
4. if results of step 3 are not satisfactory or a better variant is synthesized, librarians can iterate through steps 1 or 2.
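the routing subroutine itself is not reproduced in the article; a toy stand-in for step 3 is a nearest-neighbour pass over a driver's assigned stops, costed per mile and per minute. all coordinates, speeds, and unit costs below are invented for illustration:

```python
# Toy stand-in for the routing subroutine: order a driver's assigned stops
# by nearest-neighbour, then cost the route per mile and per minute.
# Coordinates, speed, and unit costs are invented for illustration.
import math

COST_PER_MILE = 0.12      # truck operation
COST_PER_MINUTE = 0.08    # driver time
MILES_PER_MINUTE = 0.4    # assumed average speed

stops = {"HDQ": (0.0, 0.0), "A": (3.0, 4.0), "B": (6.0, 1.0), "C": (2.0, 7.0)}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest_neighbour_route(start="HDQ"):
    route, remaining = [start], set(stops) - {start}
    while remaining:
        here = stops[route[-1]]
        nxt = min(remaining, key=lambda s: dist(here, stops[s]))
        route.append(nxt)
        remaining.remove(nxt)
    return route

route = nearest_neighbour_route()
miles = sum(dist(stops[a], stops[b]) for a, b in zip(route, route[1:]))
minutes = miles / MILES_PER_MINUTE
cost = miles * COST_PER_MILE + minutes * COST_PER_MINUTE
print(route, round(miles, 1), "miles,", round(cost, 2), "dollars")
```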
fig. 7. number of telescopes delivered per week versus time for ccpl trucks.

in order to maintain the information basis of the procedure, the following input must be updated for computer files.

ad hoc basis
● node changes: new nodes, nodes to be dropped, changes of location, changes of hierarchical status and category
● changes in vehicle capacity
● changes in cost parameters

periodic (intermediate range) basis
● evaluate demands, by season and by node (once every two or three years, or ad hoc if major shifts have been established)
● evaluate driver time data (as above)

periodic (long range) basis
● reevaluate the material types
● reestablish sensitivity curves

fig. 8. number of packages delivered per week versus time for ccpl trucks.

maintaining the same frequency of delivery as used earlier, but with routes generated by the computer, results showed a potential cost reduction of 5 percent and an increased effectiveness of 37,930, or a 400 to 500 percent improvement. the simulation-optimization program also has the capability of processing changes in the elements of the system. effects of two types of changes were tested: (1) configurations which included an increase of frequency of delivery to daily delivery for most libraries and twice-daily delivery to some; and (2) configurations which included one or two trucks dedicated to transshipment delivery among key distribution centers.

fig. 9. number of bindery boxes delivered per week versus time for ccpl trucks.

effectiveness again increased 400 to 500 percent; costs, however, also increased, between 3 and 39 percent.

discussion

essentially, these results provided the means by which cleveland libraries could maintain the existing delivery system at a slight reduction in cost, but with a four- to fivefold increase in effectiveness; or could improve the frequency of delivery at a known increase in cost and the four- to fivefold improved effectiveness. at the same time, a realistic basis for evaluating bids from commercial delivery services was made available, should this alternative be explored. last, but by no means least, a method for the analysis and/or design of a delivery system that could be used by other library networks was developed.

no study, as is true of most human endeavors, is perfect: ours is no exception. the original intent of the proposal, to study the entire distribution system, and especially its reference network aspects, was narrowed to the delivery subsystem because of inadequate funding. underestimation of the complexity of the problem, which mandated the expenditure of more time than was anticipated on data collection and systems description, caused a limitation on the time that could be devoted to study of the delivery subsystem. we could not, as we had intended, consider the question of optimum truck size or alternative types of vehicles; hypothetically, a combination of motorcycles and large trucks would produce a more cost-effective system. acceptance of the location of facilities such as garages as fixed was a further limiting factor: their relocation might have a significant effect. finally, the method of approach, in concert with the realities of library budgets, ruled out the design of an ideal system unrelated to the existing system.

fig. 10. number of audiovisuals delivered per week versus time for ccpl trucks.

enough has been written recently to denigrate the usefulness of the computer in library applications. nevertheless, we must acknowledge that a greater amount of human intervention than anticipated was employed as a corrective in the generation of computer-produced routes and must also be used for their implementation. consider: each of 700 geographical locations is a potential successor in a route to any other of the remaining 699. to process these for computer routing would require obtaining nearly 500,000 pairs of geographical coordinates, their keypunching, and verifying. by human selection from a map, reasonable sets of contiguous nodes were fed into the computer; the pairs of geographical coordinates were thus reduced to the not unmanageable number of 2,500 to 6,400 pairs. further, once computer routes have been generated, human
intervention is required to adjust these to road and traffic patterns that the computer cannot know. this does not imply that the multitude of calculations that need be performed in a study such as this could have ever been attempted without the computer.

fig. 11. a general methodology for the simulation-optimization (inputs of unit costs, weights and utilities, node locations and demands, and hierarchy or policy rules flow through frequency selection, scheduling, the routing subroutine, computation of the objective and cost, and constraint checks, iterated over days and drivers).

conclusion

despite its imperfections, the project discussed here has convinced us that the approach and methodology are of value to the library community, not only in application to library delivery systems but also in application to a multitude of library service problems, particularly those involving several libraries or library systems, albeit, because of changes in top administrative positions within the key library systems, the results of this study are still awaiting implementation.

references

1. library of congress information bulletin 31:a72 (june 9, 1972).
2. a related study relatively limited in scope is j. c. hsiao and f. j. heinritz, "optimum distribution of centrally processed material: multiple routing solutions utilizing the lock-set method of sequential programming," library resources & technical services 13:537-44 (fall 1969).
3. full documentation of the project is available in the following: an operations research study and design of an optimal distribution network for selected public, academic, and special libraries in greater cleveland: technical report (cleveland, ohio: the task force, lsca title iii distribution project, 1972); systems description i (cleveland, ohio: the task force, lsca title iii distribution project, 1972). these are available on loan through the state library of ohio.
4. a. reisman, g. kaminski, s. srinivasan, j. herling, and m. g. fancher, "timeliness of library material delivery: a set of priorities," socio-economic planning sciences 6:145-52 (1972).

principles of format design

henriette d. avram and lucia j. rather: marc development office, library of congress

this paper is a summary of several working papers prepared for the international federation of library associations (ifla) working group on content designators. the first working paper, january 1973, discussed the obstacles confronting the working group, stated the scope of responsibility for the working group, and gave definitions of the terms tags, indicators, and data element identifiers, as well as a statement of the function of each.1 the first paper was submitted to the working group for comments and was subsequently modified (revised april 1973) to reflect those comments that were applicable to the scope of the working group and to the definition and function of content designators. the present paper makes the basic assumption that there will be a supermarc and discusses principles of format design. this series of papers is being published in the interest of alerting the library community to international activities.
all individual working papers are submitted to the marbi interdivisional committee of ala by the chairman of the ifla working group for comments by that committee. introduction in order to have this paper stand alone, the scope and the definition and functions of the content designators as agreed to by the working group are summarized below: 1. the scope of responsibility for the ifla working group is to arrive at a standard list of content designators for different forms of material for the international interchange of bibliographic data. 2. the definition and function of each content designator are given as: a. a tag is a string of characters used to identify or name the main content of an associated data field. the designation of main content does not require that a data field contain all possible data elements all the time. b. an indicator is a character associated with a tag to supply additional information about the data field or parameters for the processing of the data field. there may be more than one indicator per data field. 162 ] ournal of lib1'a1'y automation vol. 7 i 3 september 197 4 c. a data element identifier is a code consisting of one or more characters used to identify individual data elements within a data field. the data element identifier precedes the data element which it identifies. d. a fixed field is one in which every occurrence of the field has a length of the same fixed value regardless of changes in the contents of the fixed field from occurrence to occurrence. the content of the fixed field can actually be data content, or a code representing data content, or a code representing information about the record. basic assumption-supermarc there appears to be little doubt that the format used for international exchange will not be the format presently in use in any national system. the first working paper addressed the obstacles that preclude complete agreement on any single national format, and a study of the matrix of the content designators assigned by various national agencies substantiates the above conclusion. consequently, we are concerned with the development of a supermarc whereby national agencies would translate their local format into that of the supermarc format and conversely, each agency would accept the supermarc format and translate it into a format for local processing. 2• 3 supermarc, therefore, is an international exchange format with the principal function that of transferring data across national boundaries. it is not a processing format (although if desired, it could be used as such) and in no way dictates the record organization, character bit configuration, coding schemes, etc., to be used within processing agencies. the supermarc format, however, should conform to certain conventions, namely the format structure should be iso 2709 and the character representation should be an eight-bit extension of iso 646. ~ the latter convention means that data cannot be in any other configuration than a character-by-character representation. supermarc assumes not only agreement on the value of content designators but, equally as important, on the level of application of these content designators. whatever the agreed upon level of content designation is, those agencies with formats more detailed will be able to translate to supermarc but will be in the position of having to upgrade all records entered into their local system from other agencies. 
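before turning to the remaining principles, the content designators defined in the introduction can be made concrete with a small illustration. the values below follow modern marc 21 conventions purely as an example; the representation is our own sketch, not a prescribed supermarc structure:

```python
# Illustrative sketch of one tagged field: a tag names the field, indicators
# qualify it, and data element identifiers (subfield codes) mark each element.
# Values follow modern MARC 21 conventions purely as an example.
field = {
    "tag": "245",                 # title statement
    "indicators": ("1", "0"),     # e.g., title added entry, no nonfiling characters
    "data_elements": [
        ("a", "Journal of Library Automation"),   # $a title proper
        ("c", "American Library Association."),   # $c statement of responsibility
    ],
}

def render(f):
    subfields = " ".join(f"${code} {value}" for code, value in f["data_elements"])
    return f'{f["tag"]} {"".join(f["indicators"])} {subfields}'

print(render(field))   # 245 10 $a Journal of Library Automation $c ...
```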
likewise, local formats consisting of less detailed content designation than supermarc must upgrade to the supermarc level for communication purposes. where the actual content of the record is concerned, i.e., the fields andjor data elements to be included, it is highly probable that the decision of the content designator working group will be that data, if in~ iso/tc 46/sc4 wgl is presently engaged in the definition of extended characters for roman, cyrillic, and greek alphabets and mathematics and control symbols. principles of format design/ avram and rather 163 eluded in the record, are assigned supermarc content designators, but that not all data will always be present. this permits the flexibility required to bypass some of the substantive problems of different cataloging rules and cataloging systems. for example, one agency may supply printer and place of printing while another may not. it may be assumed, however, that all agencies will conform to the specifications prescribed by the isbd and other such standard descriptions as they become available. principles of format design prior to any deliberation regarding the actual value of content designators, the working group realized it must agree on a set of basic principles for the design of the international format. the first working paper set forth, in the form of questions, some of the issues that must be taken into account in arriving at the principles. several members of the working group expressed their opinions and these were considered in the formulation of the principles. the principles were discussed at the grenoble meeting in august 1973. five of the principles were adopted and the sixth was deferred for further analysis based on working papers to be written by some of the members. the sixth principle was adopted at the brussels meeting in february 1974. the six basic principles are stated below with a discussion following each principle: 1. the international format should be designed to handle all media. it would be ideal if at this time all forms of material had been fully analyzed. this is currently not the case. agreement on data fields and the assignment of content designators can realistically only be accomplished if there is a foundation upon which to build. therefore, the forms of material have been limited to those listed below because, to the best of our knowledge, these are the only forms where either experience has been gained in the actual conversion to machine-readable form or in-depth analysis has been performed to define the elements of information for the material. books: all monographic printed language materials. serials: all printed language materials in serial form. maps: printed maps, single maps, serial maps, and map collections. films: all media intended for projection in monographic or serial form. music and sound recordings: music scores and music and nonmusic sound recordings. at the meeting in brussels, the decision was made to use the isbd as the foundation for the definition of functional areas for the formats. since at the present time an isbd exists only for monographs and serials, these materials will receive first priority by the ifla working group. · still under consideration is the question whether manuscripts should be included in the forms of material within the scope of the 164 j oumal of lihra1'y automation vol. 7 i 3 september 197 4 working group. pictorial representations and computer mediums have not as yet been analyzed. 
when these forms have been analyzed, they should be added to the generalized list. 2. the inte1'national fo1'mat should accept single-level and multilevel st1'uctu1'es. there is a requirement to express the relationship of one bibliographic entity to another. this relationship may take many forms. a hierarchical relation is expressed for works which are part of a larger bibliographic entity (such as the chapter of a book, a single volume of a multivolume set, a book within a series). a linear relation is expressed for works which are related to other works such as a book in translation. this discussion is concerned with hierarchical relationships and the need to describe this relationship in machinereadable records. there are a number of ways in which hierarchical relationships may be expressed. one method is to place the information on the related work in a single field within the record. for example, the different volumes of a multivolume set may be carried in a contents field. when a book is in a series, the series may be caltied in a series field. this may be termed using a single-level record to show a hierarchical relationship. another method is to use a multilevel record made up of subrecords.t the concept of a subrecord directory and a subrecord relationship field was discussed in appendix ii to the ansi standard z39.2-197!.4 the appendix illustrated a possible method of handling subrecords and expressing relationships within a bibliographic record but was not part of the american standard. similarly, in 1968 the library of congress published as part of its marc ii format a proposal to provide for the bibliographic descriptions of more than one item in a single record, and represented this capability as "levels" of bibliographic description. 5 the international standard (iso 2709) defines a subrecord technique without an explicit statement of a method to describe relationships. 6 more recently, a level structure was proposed in a document by john e. linford,7 and an informal paper by richard coward8 gave the following example of a level structure: level collection sub-collection document analytical record 1 subrecord 1 subrecord 1 subrecord r------1------, 1 subrecord 1 subrecord 1 subrecord t a subrecord is a "group of fields within a bibliographic record which may be treated as a logical entity." when a bibliographic record describes more than one bibliographic unit, the descriptions of the individual bibliographic units may be treated as subrecords. principles of format design/ avram and rather 165 several national ,agencies have expressed concern regarding the efficiency of the iso 2709 subrecord technique and have suggested that a modification be made to the subrecord statement. there are alternative techniques which could be incorporated in the international exchange format to build in level capability. methods have been suggested that would cause a revision (specifically the number of characters in each directory entry) to the iso standard; other alternatives might not. regardless of the final technique agreed upon, national agencies should maintain the authority to record their cataloging data to reflect their catalog practices, i.e., either describing the items related to an item cataloged as fields within a single-level record or as subrecords of a multilevel record. 3. tags should identify a field by type of entry as well as function by assigning specific values to the charactet positions. 
assigning values to the characters of the tags allows the flexibility to derive more than a single kind of information from the tag. for example, it should be possible by an inspection of the tags to retrieve all personal names from a machine-readable record regardless of the function of the name in the record, i.e., principal author, secondary author, name used as subject, etc. 4. indicatots should be tag dependent and used as consistently as possible across all fields. indicators should be tag dependent because they provide both descriptive and processing information about a data field. if the value assigned to an indicator is used as consistently as possible across all fields, where the situation warrants this equality, the machine coding is simplified to process different functional fields containing the same type of entry. 5. data element identifiets should be tag dependent, but, as fat as possible, common data elements should be identified by the same data element identifiets actoss fields. the principle has been adopted that the format will handle all types of media and consequently the projected number of unique tags may be quite large. in addition, since all types of media are not yet fully analyzed, the number of unique fields is an unknown factor. while it is undeniable that making data element identifiers tag independent would be desirable, the limited number of alphabetic, numeric, and symbolic characters would restrict the number of data elements to the number of unique characters. this constraint on future expansion seems to be more important than any advantages gained from making data element identifiers tag independent. if data element identifiers are tag dependent, then additional refinements could be added in one of two ways: ( 1) the principle of identifying common data elements by the same identifiers across fields could be followed as far as possible, 01' ( 2) the identifiers could be given a value to aid in filing. the two refinements appear to be mutu166 journal of library automation vol. 7/3 september 197 4 ally exclusive since a data element in one field may have a different filing value from the same data element in another field. since the first refinement should be useful for many types of processing, and the second would be useful only in filing, the former seems to be the better option. 6. the fields in a bibliographic record are primarily related to broad categories of information relating to "sttbfect," "description," "intellectual1'esponsibility," etc., and should be grouped according to these fundamental categories. the first working paper discussed as an obstacle the lack of agreement on the organization of data content in machine-readable records in different bibliographic communities. a subsequent paper consisting of comments made by staff of the library of congress on the proposed eudised format discussed in greater detail the analytic versus traditional arrangement. 9 • t the majority of the national formats designed to date are arranged by using the function as the primary grouping and the type of entry as the secondary grouping. several working papers produced by committee members supported the arrangement by function on the grounds that it followed the traditional order of elements in the bibliographic record and therefore simplified input procedures. grouping of the fields first by function and then by type of entry was agreed to at the brussels meeting. references 1. henriette d. avram and kay d. 
guiles, "content designators for machine readable records," journal of library automation 5:207-16 (dec. 1972). 2. r. e. coward, "marc: national and international cooperation," in international seminar on the marc format and the exchange of bibilographic data in machinereadable form, berlin, 1971, the exchange of bibliographic data and the marc format (munich: pullach, 1972), p. 17-23. 3. roderick m. duchesne, "marc: national and international cooperation," in international seminar on the marc format and the exchange of bibliographic data in machine-readable form, berlin, 1971, the exchange of bibliographic data and the marc format (munich: pullach, 1972), p.37-56. 4. american national standards institute, american national standard fot' bibliogmphic information interchange on magnetic tape (washington, d.c.: 1971) (ansi z39.2-1971). appendix, p.l5-34. 5. henriette d. avram, john f. knapp, and lucia j. rather, the marc ii format; a communications format for bibliographic data (washington, d.c.: library of congress, 1968), appendix iv, p.l47-49. 6. international organization for standardization, documentation-format fot• bibliographic information interchange on magnetic tape. 1st ed. international standard iso 2709-1973(e). 4p. t in an analytic tagging scheme, the first character of the tag describes the type of entry and subsequent characters describe function; in a traditional tagging scheme, the first character describes function and subsequent characters describe type of entry. ptinciples of format design/ avram and rather 167 7. council for cultural cooperation. ad hoc committee for educational documentation and information. working party on eudised formats and standards, 3d meeting, luxembourg, 26-27 april 1973, draft eudised format (second revision). prepared by john e. linford. 8. paper sent from richard coward to henriette d. avram, "notes on marc subrecord directory mechanism." 9. henriette d. avram, "comments on draft eudised format (second revision)," unpublished paper. 247 buyer be wary! in the september 1974 issue of jola, the "highlights of isad board meeting" reflects the library automation community's growing concern with misrepresentation of products and misleading or fraudulent claims. a proposal was made that isad create a mechanism to monitor relevant advertising in order to inform and protect' its constituency and, indeed, the entire profession. it is paradoxical that this concern is being voiced at a time when the relationship between the public and private sectors seems closer than at any other time in the recent past. in general, librarians and vendors are good friends. there is an atmosphere of mutual respect, and we no longer raise eyebrows upon learning that a librarian-colleague has gone "commercial." indeed, librarians and libraries are learning from the business world to create products and market them in order to support desired internal services. the growing entrepreneurial efforts of libraries are linkirig the two groups with a yet firmer bond. unfortunately, but inevitably, there are a few flies in the ointment. with regularity, we pick up professional literature to find advertising which sounds too good to be true. an investigation will usually indicate that, in fact, it is not true. we are often visited by salesmen describing incredible advances in their particular areas. the pressure applied by these people can be distasteful and even intolerable. 
or we may receive a onepage brochure from an unknown company, touting its latest, very competitive system, and listing the familiar names of well-respected librarians as advisors. almost always, we are lucky and are able to discover for ourselves the true nature of the products being advertised. our misfortune may begin when an ambitious salesman finds his or her way into the office of an administrator or politician who does not have adequate preparation for the onslaught of facts, figures, and fallacies. what are the best ways of misrepresenting a product? most approaches fall into one of the following categories: ( 1) misleading advertising, with unclear statements and imprecise use of vocabulary; ( 2) claims that one, or several, or many other libraries are using the product with satisfaction (when this indeed is not the case); ( 3) specific statements that a large and prestigious library is about to sign a contract for servic~s or products (although investigation will reveal no such intention); ( 4) lists of experts in the field who are presumed to be associated with the company in an advisory or consultant role (but who are unaware of this use of their names); and ( 5) approaches to federal, state, or local agencies to appeal 248 journal of library automation vol. 7/4 december 1974 the procedures used by libraries in requesting bids or awarding contracts. at this point, a note of caution must be inserted. strategies of advertising and marketing usually involve one or more of the above techniques to a certain extent. we all practice minor exaggerations and simplifications in our professional lives in order to accomplish certain goals. it would be unwise and unfair to accuse an advertiser of misleading his market on the basis of one of these "small exaggerations." in resolving this issue, our concern must be with those individuals or organizations who are constantly found with a large discrepancy between the word and the deed. what methods can be used as protection against these tactics? there are several reliable paths: ( 1) be aware of and alert to the possibilities of misleading claims and misrepresentation; ( 2) follow up a sales pitch with a few phone calls to those institutions that are described to be using the product or about to sign the contract; ( 3) maintain a reasonable amount of resistance to the sales talk; ( 4) use the library profession's invisible college to determine the validity of the claims and the experiences that others have had with the firm; and ( 5) support the attempts of our professional societies, such as ala and asis, to require organizations to maintain certain advertising standards. the library market is expanding and maturing; therefore, these growing pains associated with increased marketing efforts are not unexpected. with adequate education and awareness on the part of the buyer, with some pressures placed on advertisers by the professional community, and with a tolerance for the normal tendencies of advertising and marketing, we will be able to resolve a difficult situation with grace and without hard feelings. susan k. martin december_ital_oud_final accessibility of vendor-created database tutorials for people with disabilities joanne oud information technology and libraries | december 2016 7 abstract many video, screencast, webinar, or interactive tutorials are created and provided by vendors for use by libraries to instruct users in database searching. 
this study investigates whether these vendorcreated database tutorials are accessible for people with disabilities to see whether librarians can use these tutorials instead of creating them in-house. findings on accessibility were mixed. positive accessibility features and common accessibility problems are described, with recommendations on how to maximize accessibility. introduction online videos, screencasts, and other multimedia tutorials are commonly used for instruction in academic libraries. these online learning objects are time consuming to create in-house and require a commitment to maintain and revise when database interfaces change. many database vendors provide screencasts or online videos on how to use their databases. should libraries use these vendor-provided instructional tools rather than spend the time and effort to create their own? many already do: a study shows that 17.7 percent of academic libraries link to tutorials created by third parties, mainly by vendors or other libraries.1 when deciding whether to use vendor-created tutorials, one consideration is whether the tutorials meet accessibility requirements for people with disabilities. the importance of accessibility for online tutorials has been increasingly recognized and outlined in recent library literature.2 people with disabilities make up one of the largest minority groups in the united states and canada, and studies show that about 9 percent of university or college students have a disability.3 problems with web accessibility have been well documented. people with disabilities are often unable to access the same online sites and resources as others, creating a digital divide.4 even if people with disabilities can access a site, it is more difficult for many to use it.5 assistive technologies, like screen-reading software, enable access but add an extra layer of complexity in interacting with the site, and blind or low-vision users can’t always rely on visual cues to navigate and interpret sites. a recent study of library website accessibility concluded that typical library websites are not designed with people with disabilities in mind.6 joanne oud (joud@wlu.ca) is instructional technology librarian and instruction coordinator, wilfrid laurier university, ontario, canada. accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 8 libraries, which are founded on a philosophy of equal access to information, should be concerned about online accessibility. legal requirements for providing accessible online web content vary, but exist in every jurisdiction in the united states and canada. apart from the legal requirements, recent literature points out that equitable access to information for people with disabilities is a matter of human rights and an issue of diversity and social justice, and calls on libraries and librarians to improve their commitment to online accessibility.7 it is important for libraries to participate in creating level playing field and to avoid creating conditions that make people feel unequal or prevent them from equitable access. it is unclear whether librarians can assume vendor-created instructional tutorials are accessible. studies on vendor database accessibility have been mixed, showing some commitment to and improvements in accessibility on one hand, but sometimes substantial gaps in accessibility on the other.8 the focus until now has been exclusively on the accessibility of database interfaces. 
this study investigates the accessibility of online tutorials, including videos, screencasts, interactive multimedia, and archived webinars created by database and journal vendors and offered as instructional materials to librarians and patrons, to determine whether they are a viable alternative to making in-house training materials. literature review although a few articles exist on how to make video tutorials accessible,9 no studies have evaluated the accessibility of already-created video or screencast tutorials. there are, however, some studies evaluating the accessibility of vendor databases. byerley, chambers, and thohira surveyed vendors in 2007 and found that most felt they had integrated accessibility standards into their search interfaces, and nearly all tested for accessibility to some degree, though not always with actual users.10 these findings conflict somewhat with the results of other studies. tatomir and durrance evaluated the accessibility of thirty-two databases with a checklist and found that although many did contain accessibility features, 72 percent were marginally accessible or inaccessible.11 similarly, dermody and majekodunmi found that students with print-related disabilities who use screen-reading software could only complete 55 percent of tasks successfully because of accessibility barriers and usability challenges.12 delancey surveyed vendors and examined vpats, or product accessibility claims, and found that vendors felt they were compliant with 64 percent of us section 508 items.13 especially relevant to this study, only 23 percent of vendors said that the multimedia content within their products was compliant, and 46 percent admitted multimedia content was not compliant at all. since vendor vpat forms are completed for databases and other products only, and not the instructional tutorials created by vendors on how to use those products, vendor accessibility claims for instructional tutorials are unknown. although no studies have been done on the accessibility of video or screencast tutorials, some have been done on the accessibility of multimedia or other related kinds of online learning. information technology and libraries | december 2016 9 roberts, crittenden, and crittenden surveyed 2,366 students taking online courses at several us universities. a total of 9.3 percent of those students reported that they had a disability, and of those, 46 percent said their disability affected their ability to succeed in their online course, although most reasons cited were not related to technical accessibility barriers.14 kumar and owston studied students with disabilities using online learning units that contained videos. all students in the study reported at least one barrier to completing the learning units.15 although this study involves student use of video tutorials, it doesn’t report on accessibility issues specific to those tutorials. previous studies of vendor products focus exclusively on database interfaces, and previous studies of online learning have not focused on screencast accessibility. therefore this study’s goal is to investigate how accessible vendor-created video tutorials are. accessibility is defined as both technical accessibility (can people with disabilities locate, access, and use them) and usability (how easy it is for people with disabilities to use them). 
this study will look at which major accessibility issues there are (if any) and make recommendations on whether librarians can direct students to them rather than making in-house instructional videos. method an evaluation checklist (see appendix 2) was developed for this study using criteria drawn from the web content accessibility guidelines (wcag) 2.0. wcag 2.0 is the most widely recognized web-accessibility standard internationally. much recent accessibility legislation adopts it, including the in-process revisions to section 508 guidelines in the united states.16 wcag 2.0 is also consistent with tutorial accessibility best-practice advice found in recent articles, which emphasize the need for accurate captions, keyboard accessibility, descriptive narration, and alternate versions for embedded objects, among other criteria.17 the checklist has twenty items and is split into two sections, “functionality” and “usability.” functionality items test whether the tutorial can be used by people using screen-reading software or a keyboard only, and include whether the tutorial is findable on the page and playable, whether player controls and interactive content can be operated by keyboard, whether captions are available, and whether audio narration is descriptive enough so someone who can’t see the video can understand what is happening. usability items test how easy the tutorial is to use. examples include clear visuals and audio, use of visual cues to focus the viewer’s attention, and short and logically focused content. to help prioritize the importance of checklist items, the local accessible learning centre (alc), which supports students on campus who use assistive technologies, was consulted about the difficulties most encountered by students. the alc’s highest priority was the provision of an alternate accessible version of a tutorial, since it is difficult to make complex embedded web content accessible for everyone under every circumstance and an alternate version allows people to work with content in a way that suits their needs. accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 10 for the evaluation, major database vendors were chosen through a scan of common vendors and platforms at universities, with input from collections colleagues. some vendors were eliminated because they don’t provide instructional tutorials on their websites. twenty-five vendors were included in the study (see appendix 1). a large majority of the tutorials found were screencast or video tutorials; a few vendors provided recorded webinars, and a few provided interactive multimedia tutorials, mainly text captions or visuals with clickable areas or quizzes. in total, 460 tutorials were evaluated for accessibility: 417 video, screencast, or interactive tutorials from twenty-foure vendors, and 41 recorded webinars from four vendors. if tutorials were available in more than one place, most commonly on both the vendor’s website and youtube, both locations were tested. if more than thirty tutorials were provided by a vendor, every other one was tested. if multiple formats of tutorial were available, such as screencasts and recorded webinars, each format was tested. testing from the perspective of people with visual impairments was a key focus. 
other assistive technologies such as kurzweil (for people who can see but have print-related disabilities) and zoomtext (for enlargement) are widely used, but if webpages work well using screen-reading software intended for people with visual impairments, they also generally work using other kinds of assistive software. tutorials were tested with two screen-reading programs used by people with visual impairments: nvda (with firefox), a free open source program, and jaws (with internet explorer), a widely used commercial product. both were used to determine whether any difficulties were due to the quirks of a particular software product or a result of inherent accessibility problems. in addition, captions were evaluated to determine accessibility for people who are deaf or have hearing difficulties. people with visual or some physical impairments use the keyboard only, so all tutorials were tested without a mouse using solely the keyboard. during testing, each task was tried three different ways within nvda or jaws before deciding that it couldn’t be completed. if one of the three methods worked the task was marked as successfully completed. if a task could be completed successfully in one screen-reading program but not the other, it was marked as unsuccessful. screen-reader support needs to be consistent across platforms, since people may be using a variety of types of assistive software. findings and discussion tutorials created by the same vendor nearly all used the same approach and had the same checklist results. this is positive, since consistency is important for accessibility and helps in navigation and ease of use. none of the forty-one recorded webinars tested in this study were accessible. webinars did not have player controls that were findable on the page by screen-reading software or usable by information technology and libraries | december 2016 11 keyboard. none had captions, transcripts, or alternate accessible versions. often webinars were quite long, with no clear structure and no cues to focus attention on the screen. recorded webinars had almost no accessibility features and can’t be recommended for use as accessible instructional materials in their current form. none of the screencast or video tutorials tested were completely accessible, and all failed in at least one checklist item. tutorials from some vendors, however, came close to meeting all checklist requirements. overall, there were many positive accessibility features in the video and screencast tutorials. most of these tutorials were findable and playable by screen reading software in some way, had video player controls usable by keyboard, had descriptive narration so people who can’t see the screen can tell what is happening, had clear visuals and audio narration, used simple language, and were relatively short and focused in content. the most accessible screencast or video tutorials were produced by the american psychological association (apa), american theological library association (atla), modern language association (mla), and ebsco. their tutorials had many accessibility features and rated highly on the checklist. they included much less commonly found accessibility features, especially the use of visual and/or audio cues to focus the viewer’s attention and the inclusion of accurate and properly synchronized closed captions. visual cues are important for people with learning or attentionrelated disabilities, and help all viewers interpret and follow the video more easily. 
people who are deaf can’t access the content without captions, and captions also help people who have english as a second language or are at public computers without headphones. tutorials from these vendors also had an alternate version or transcript available. as mentioned earlier, the highest-priority checklist item is the presence of an alternate accessible version, since it is difficult to design multimedia that works for people with all disabilities in all circumstances. people with disabilities may also have previous negative experiences with online multimedia and prefer to use an alternate format that they have had more success with. in the case of these above-average vendors, the alternate accessible version was a transcript consisting of the video’s closed captions, auto-generated by youtube. since the tutorials’ narration was descriptive and the captions were accurate, the auto-generated transcripts are useful. however, the youtube transcript is hard to find on the youtube page. also, most of these vendors had tutorials available both from their own websites and from youtube, and none had alternate versions available on their own websites. viewers requiring an alternate format would need to know to go to the youtube site instead of the vendor site to find it. two other vendors also had quite accessible tutorials. ieee’s tutorials had the same positive accessibility features already mentioned. tutorials were done in-house and presented through the vendor’s site. while most tutorials presented on vendor sites were lacking in accessibility, ieee’s were well thought out from an accessibility perspective and usable by screen-reading software. these were the only tutorials tested where all interactivity, including pop-up screens, was easily accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 12 usable and navigable by keyboard. the one accessibility issue was the lack of an alternate accessible version. elsevier’s sciencedirect tutorials took a different approach to accessibility than other vendors, or even than elsevier’s tutorials for other elsevier products. the science direct tutorials were not accessible, but an alternate text version was available and people using screen-reader software were informed of this when they get to the tutorial page and were redirected to the text version. the ideal is to have one version that is accessible to everyone, but this approach is a good way to implement an alternate version if one accessible version isn’t possible. screencasts or video tutorials from other vendors also have some good accessibility features, but these were balanced with serious accessibility problems. the main accessibility issues discovered include the following: alternate accessible versions: vendors who had captions and hosted their videos on youtube did have auto-generated youtube transcripts, but these were hard to find and were only useful if the captions were descriptive and accurate, which many were not. apart from elsevier’s sciencedirect tutorials, no vendors provided another format deliberately as an accessible alternative. captions: captions were missing or problematic in the tutorials of fourteen vendors, or 59 percent of the total. five (21 percent) of vendors provided no captions at all for their tutorials. nine (38 percent) had unedited, auto-generated youtube captions, which are highly inaccurate and therefore don’t provide usable access to the content for people who are deaf. 
tutorial not findable or playable on page: twelve vendors (50 percent) had tutorials that were not findable on the webpage or playable for people using a keyboard or screen-reading software. most of these issues are with tutorials on vendor sites, which were often flash-based or offered through non-youtube third-party sites like vimeo. four vendors (17 percent) offered access to their tutorials both through their own (inaccessible) website and youtube, which is findable and playable by screen-reading software. eight (33 percent), however, only provided access through their (inaccessible) webpages, which means that people using a keyboard or screen-reading software would not be able to use their tutorials. no visual cues to focus attention: eight vendors (33 percent) had no visual cues to focus attention in the video. visual cues help people with certain disabilities focus on the essential part of the screen that is being discussed, help everyone more easily interpret and follow what is happening, and are known to help facilitate successful multimedia learning.18 nondescriptive narration: six vendors (25 percent) had tutorials with audio narration that didn't sufficiently describe what was happening on the screen. narration needs to describe what is happening in enough detail so people who can't see the screen are not missing information available for sighted viewers. fuzzy visuals: five vendors (21 percent) had tutorials with visuals that were fuzzy and hard to see. this makes viewing difficult for people with low vision, and challenging even for people with normal vision. fuzzy audio or background music: three vendors (13 percent) had poor-quality audio narration or background music playing during narration. background music is distracting for those with hearing difficulties and makes it more difficult to focus on what is being said. eliminating extraneous sound also makes it easier for people to learn from multimedia.19 tutorials consisting only of text captions: three vendors (13 percent) had tutorials consisting of text captions with no narration. the text captions were not readable by screen-reading software, and no alternate accessible versions were provided. providing narration in tutorials is recommended for accessibility, since it allows people who can't see the screen to access the content more easily, and has been shown to improve learning and recall over on-screen text and graphics alone.20 recommendations and conclusions this study attempted to determine how accessible vendor-created database tutorials are, and whether academic librarians can use them instead of re-creating them locally. for recorded webinars, the answer is a clear no, since none were technically accessible for people using screen-reading software. for video or screencast tutorials, however, the answer is less clear. results showed that many vendors created tutorials with positive features like clear visuals and audio, being short and focused on one main point, and using descriptive narration. however, technical accessibility was much less successful, with 59 percent of vendors omitting usable captions and 50 percent presenting tutorials that couldn't be found on the page or played by people using screen-reading software. these technical accessibility issues prevent people with hearing, vision, or some mobility impairments from using the tutorials at all.
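librarians who want to run a comparable audit on their own tutorial collections can capture the scoring rules described in the method section in a short script. the following is a minimal sketch only, not the instrument used in this study: the record layout and field names (vendor, tutorial, item, nvda_pass, jaws_pass) are hypothetical, but the decision rule mirrors the one described above, counting a checklist item as passed only when the task succeeds under both screen-reading programs.

```python
# illustrative sketch only; the record layout and field names are hypothetical,
# not the checklist instrument used in this study.
from collections import defaultdict

# each record notes, for one checklist item on one tutorial, whether the task
# could be completed with nvda (firefox) and with jaws (internet explorer).
results = [
    {"vendor": "vendor a", "tutorial": "basic search", "item": "captions", "nvda_pass": True, "jaws_pass": True},
    {"vendor": "vendor a", "tutorial": "basic search", "item": "keyboard player controls", "nvda_pass": True, "jaws_pass": True},
    {"vendor": "vendor a", "tutorial": "advanced search", "item": "keyboard player controls", "nvda_pass": True, "jaws_pass": False},
    {"vendor": "vendor b", "tutorial": "cited reference search", "item": "captions", "nvda_pass": False, "jaws_pass": False},
]

def item_passes(record):
    # the study's rule: success in only one screen reader still counts as a failure,
    # because support needs to be consistent across assistive technologies.
    return record["nvda_pass"] and record["jaws_pass"]

tutorials_by_vendor = defaultdict(set)
failures_by_vendor = defaultdict(set)
for rec in results:
    tutorials_by_vendor[rec["vendor"]].add(rec["tutorial"])
    if not item_passes(rec):
        failures_by_vendor[rec["vendor"]].add(rec["tutorial"])

for vendor in sorted(tutorials_by_vendor):
    total = len(tutorials_by_vendor[vendor])
    failed = len(failures_by_vendor[vendor])
    print(f"{vendor}: {failed}/{total} tutorials fail at least one functionality item ({100 * failed / total:.0f}%)")
```

a tally like this only summarizes pass/fail counts; judgments about caption accuracy, descriptive narration, and the other checklist items in appendix 2 still have to be made by a human tester.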
although none of the tutorials studied met all the checklist criteria, some came close and could be used by librarians depending on local requirements, policies, and priorities for accessibility. in part, this study found that the accessibility of many tutorials depends on how they are presented. disappointingly, 50 percent of vendors had tutorials on their websites that were not findable or playable by people with disabilities. many vendors, however, hosted tutorials on youtube as well as their own site. in these cases, youtube was always a more accessible option accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 14 than the vendor site. youtube itself is relatively accessible, with both pages and players that are navigable by keyboard and by screen-reading software. there are options for accessibility settings in youtube, such as having captions display automatically, and more accessible third-party overlays are available for the youtube player. on vendor sites, there were more likely to be issues with flash and an inability for people using screen-reading software or keyboards to find and play videos. some vendors embed youtube videos on their site. even if the embedded videos are findable and playable, this method omits important accessibility features found on the youtube page, such as the text transcript. the results of this study show that using youtube where available is recommended. further, linking to youtube rather than embedding the video is preferred, unless a separate link to the transcript is made to provide an alternate accessible version. captions are another key accessibility problem identified in this study: nearly two-thirds had unusable captions. often, auto-generated youtube captions were present but were not usable. the presence of captions is not enough for accessibility; those captions need to be accurate and present the same content as the narration. youtube auto-captioning does not generate captions that are accurate enough to be useful without manual editing. youtube auto-generates transcripts from the captions, so if the captions are inaccurate the transcript will not be useful either. editing youtube auto-generated captions is necessary to ensure accessibility. a few accessibility issues found in this study would be easy to improve with some thought during tutorial creation. adding visual cues like arrows or highlighting to the screen to help people focus attention, or remembering that not everyone can see the screen while recording narration, can be easily achieved and would improve accessibility significantly. other issues would require more planning and effort to improve. given the widespread technical accessibility problems identified in this study, it is particularly important for people creating tutorials to provide alternate formats that are accessible if tutorials themselves are not accessible. almost no vendors do this currently, but it would have the most significant impact on accessibility for the broadest range of people. adding usable captions is the second most important area for improvement. to provide access for people who are deaf, captions need to be added or autogenerated youtube captions need to be edited for accuracy. both alternate formats and captions require some thought and effort to implement but ensure that tutorials will meet accessibility requirements and be usable by everyone. notes and bibliography 1. 
eamon tewell, “video tutorials in academic art libraries: a content analysis and review,” art documentation 29, no. 2 (2010): 53–61. information technology and libraries | december 2016 15 2. amanda s. clossen, “beyond the letter of the law: accessibility, universal design, and human-centered design in video tutorials,” pennsylvania libraries: research & practice 2, no. 1 (2014): 27–37, https://doi.org/10.5195/palrap.2014.43; joanne oud, “improving screencast accessibility for people with disabilities: guidelines and techniques,” internet reference services quarterly 16, no. 3 (2011): 129–44, https://doi.org/10.1080/10875301.2011.602304; kathleen pickens and jessica long, “click here! (and other ways to sabotage accessibility),” imagine, innovate, inspire: the proceedings of the acrl 2013 conference (chicago: acrl, 2013), 107–12. 3. deann barnard-brak, lucy lechtenberger, and william y. lan, “accommodation strategies of college students with disabilities,” qualitative report 15, no. 2 (2010): 411–29. 4. cyndi rowland et al., “universal design for the digital environment: transforming the institution,” educause review 45, no. 6 (2010): 14–28. 5. peter brophy and jenny craven, “web accessibility,” library trends 55, no. 4 (2008): 950–72. 6. kyunghye yoon, laura hulscher, and rachel dols, “accessibility and diversity in library and information science: inclusive information architecture for library websites,” library quarterly 86, no. 2 (2016): 213–29. 7. ruth v. small, william n. myhill, and lydia herring-harrington, “developing accessible libraries and inclusive librarians in the 21st century: examples from practice,” advances in librarianship 40 (2015): 73–88, https://doi.org/10.1108/s0065-2830201540; john carlo jaeger, paul t. wentz, and brian bertot, “libraries and the future of equal access for people with disabilities: legal frameworks, human rights, and social justice,” advances in librarianship 40 (2015): 237–53; yoon, hulscher, and dols, “accessibility and diversity in library and information science: inclusive information architecture for library websites.” 8. suzanne l. byerley, mary beth chambers, and mariyam thohira, “accessibility of web-based library databases: the vendors’ perspectives in 2007,” library hi tech 25, no. 4 (2007): 509– 27, https://doi.org/10.1108/07378830710840473; kelly dermody and norda majekodunmi, “online databases and the research experience for university students with print disabilities,” library hi tech 29, no. 1 (2011): 149–60, https://doi.org/10.1108/07378831111116976; jennifer tatomir and joan c. durrance, “overcoming the information gap: measuring the accessibility of library databases to adaptive technology users,” library hi tech 28, no. 4 (2010): 577–94, https://doi.org/10.1108/07378831011096240. 9. pickens and long, “click here!”; clossen, “beyond the letter of the law”; oud, “improving screencast accessibility for people with disabilities”; nichole a. martin and ross martin, “would you watch it? creating effective and engaging video tutorials,” journal of library & accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 16 information services in distance learning 9, no. 1–2 (2015): 40–56, https://doi.org/10.1080/1533290x.2014.946345. 10 . byerley, chambers, and thohira, “accessibility of web-based library databases.” 11. tatomir and durrance, “overcoming the information gap.” 12. 
dermody and majekodunmi, “online databases and the research experience for university students with print disabilities.” 13. laura delancey, “assessing the accuracy of vendor-supplied accessibility documentation,” library hi tech 33, no. 1 (2015): 103–13, https://doi.org/10.1108/lht-08-2014-0077. 14. jodi b. roberts, laura a. crittenden, and jason c. crittenden, “students with disabilities and online learning: a cross-institutional study of perceived satisfaction with accessibility compliance and services,” internet and higher education 14, no. 4 (2011): 242–50, https://doi.org/10.1016/j.iheduc.2011.05.004. 15. kari l. kumar and ron owston, “evaluating e-learning accessibility by automated and student-centered methods,” educational technology research and development 64, no. 2 (2015): 263–83, https://doi.org/10.1007/s11423-015-9413-6. 16. us access board, “draft information and communication technology ( ict ) standards and guidelines,” 36 cfr parts 1193 and 1194, rin 3014-aa37 (2015), https://www.accessboard.gov/attachments/article/1702/ict-proposed-rule.pdf. 17. pickens and long, “click here!”; clossen, “beyond the letter of the law”; martin and martin, “would you watch it?”; oud, “improving screencast accessibility for people with disabilities.” 18. see the signaling principle in richard e. mayer, multimedia learning, 2nd ed. (cambridge: cambridge university press, 2009): 108–17. 19. see the coherence principle, ibid., 89–107. 20. see the modality principle, ibid., 200–220. information technology and libraries | december 2016 17 appendix 1. list of vendors 1. acm 2. adam matthew 3. alexander st press 4. apa 5. atla 6. chemspider 7. cochrane library (webinars only) 8. ebsco 9. elsevier 10. factiva 11. gale 12. ieee 13. lexis nexis academic (tutorials and webinars) 14. marketline 15. mathscinet 16. ovid/wolters kluwer (tutorials and webinars) 17. oxford 18. proquest (tutorials and webinars) 19. pubmed 20. sage 21. scifinder 22. standard & poor/netadvantage 23. taylor and francis 24. web of knowledge/thompson reuters 25. zotero accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 18 appendix 2. 
tutorial accessibility evaluation checklist
functionality
☐ equivalent alternate format(s) are provided
☐ transcript/text version
☐ audio
☐ other ___________________________
☐ alternate formats provided are accessible
☐ alternate formats provided are findable on the page by screen reader
☐ screen-reading software can find the video on the webpage
☐ screen-reading software can access and play the video
☐ video-player functions can be operated by keyboard/screen-reading software
☐ interactive content can be accessed and used by keyboard/screen-reading software
☐ user has some control over timing (pause/rewind capability)
☐ alternate modes of presentation are available for all, meaning presented through text, visuals, narration, color, or shape
☐ synchronized closed captions are available for all audio
☐ audio/narration is descriptive
usability
☐ user controls if/when the video starts (no auto play)
☐ video is easy to use by screen-reading software
☐ clear, high-contrast visuals and text
☐ clear, high-contrast audio (no background noise/music)
☐ uses visual cues to focus attention (e.g., highlighting, arrows)
☐ is short and concise
☐ is clearly and logically organized
☐ has consistent navigation, look, and feel
☐ uses simple language, avoids jargon, and defines unfamiliar terms
☐ explicit structure with sections, headings to give viewers context
☐ learning outcome/goal clearly outlined and content focused on outcome
highlights of minutes information science and automation division board of directors meeting 1973 annual conference las vegas, nevada monday, june 25, 1973 the meeting was called to order by president ralph shoffner at 8:25 a.m. those present were: board: ralph shoffner, paul j. fasana, susan k. martin, donald p. hammer, and berniece coulter, secretary, isad. guests: frederick kilgour, james dolby, stephen salmon, james rizzolo, lavahn overmyer, douglas ferguson, and brett butler. minutes of midwinter meeting. there was a request that the minutes of the board meetings at midwinter (january 1973, washington, d.c.) be further edited, deleting and clarifying, for publication. mrs. susan martin, editor of jola, informed those present that the deadline for submission of copy for the march 1973 issue is the middle of july. president shoffner suggested a separate meeting be held during conference to revise the midwinter minutes. cooperation with asis/siglan. douglas ferguson, representing the american society for information science special interest group on library automation & networks (asis/siglan), presented a proposal for the board's consideration that isad cooperate with asis/siglan in the areas of publications, programs, and research proposals. the aims of such cooperation would be to reach people (for membership) and to save money. mr. ferguson was interested in the board's response on this matter. mrs. martin stated that cooperation in the publications area might be relatively easy for asis, but for ala it might be another matter. mr. ferguson felt that implementation would be the problem. he would like to focus on "specific purpose projects" functions: sharing of budget and membership. mr. fasana suggested that the three chairmen of the isad ad hoc committees (research topics, seminar and institute topics, and objectives) meet with mr. ferguson, and that charles husbands, isad's representative to asis, be included. president shoffner named jim dolby, don bosseau, and douglas ferguson
to set up a time and place to meet with those asis/siglan board members present at the conference to discuss isad and asis/siglan cooperation. report of jola editor. mrs. martin informed the board of the status of jola. the june 1972 issue had been mailed, the september 1972 issue would be mailed in two weeks (about the middle of july), and the december 1972 issue was in the galley stage and should be mailed within six weeks. the post office has been told that the mailing of the jola issues would be caught up by the end of the calendar year. the jola technical communications has completed publication of the 1972 issues and is ready to be incorporated into jola. she suggested that the board reconsider incorporating jola/tc into the journal since tc would become neither timely nor a newsletter. the decision to incorporate originally had been voted by the board as a result of ala publishing board's request that all divisions cut their budget item for publications by 10 percent. the 1973/74 budget was set up for a combined journal. mrs. martin asked for the board's opinion. president shoffner asked if it would be possible to return to a monthly publication as soon as cost allowed. mrs. martin replied that a reduction in the jola budget would have to be made to allow for monthly publication since tc was now incorporated into jola's budget. the question was referred to the isad editorial board. information science abstracts. ben ami lipetz is interested in sponsorship and cooperation in relation to isa. several associations are now sponsors. to become a sponsor the association makes an initial financial commitment and then gains advisory board capacity. isad at one time had a subscription drive for fifty new isa subscriptions from isad members, which was not fulfilled. motion by paul fasana that some attempt be made to evaluate isa and make recommendations to the board whether isad should consider promoting subscriptions to isa. seconded by susan martin. carried. report of the committee on objectives. mr. stephen salmon, chairman, reported that his committee on objectives would meet that morning at 10:00 a.m. and they had not yet discussed the draft report. he had talked with the members of the committee by telephone, however. mr. richard angell had written to president shoffner; the main point of his letter was that isad not accept the recommendations of the media group that isad incorporate them into the division. he argued that ala organization does not provide for types of materials, although it does provide for types of libraries and types of activities. he also felt that there was not a broad enough area of mutual concern, and that coo should deal with this matter. mr. salmon pointed out two matters: (1) when the full objectives committee met with representatives of the information technology discussion group, there did seem to be a community of effort; the "chemistry" seemed to be there, and it did seem to work well, and (2) it was expressed that "media" has been around ala for years, even decades, and may continue without a "home" while the best answer is still being determined. mr. salmon suggested that isad could attempt a solution at this time. mr. shoffner mentioned that one of his particular concerns was that the objectives committee very explicitly review and then express its opinion on the relation of the technology group to ala's audiovisual committee. mr.
salmon stated the committee's feelings were that isad objectives should be changed to include media in its scope. he asked whether the information technology discussion group should continue as a discussion group or become a committee, round table, or section. president shoffner said he had put "proposed audiovisual committee" on the agenda to determine whether or not there were strong objections to the information technology discussion group being housed in isad as a committee. if not, the objectives committee could be charged to come back to the board with recommendations as to the form of the group. mr. salmon stated, however, that the objectives committee was not an organization committee and the form of the group was not its concern. the establishment of a committee was provided for in the division bylaws. mr. fasana thought isad's objectives were broad enough as stated in the bylaws, article ii, section 1, the object of the division: particularly the words "and related technological developments." mr. salmon said the committee felt that media should be specifically provided for in the isad objectives statement rather than leaving it open to be read into the present statement. he mentioned that historically the "founding fathers" of isad had merely called the name of the division "library automation" but that the "information science" had been put in by coo to please the reference services division who were interested in that area. to place the matter before coo could take six years, mr. fasana stated. mr. salmon felt, however, that it may be the time to call coo's attention to this discussion group so that eventually they might eliminate the nineteen or so av committees which have no formal connection with ala's audiovisual committee. mr. kilgour wanted to know how the board felt about a name change of the division and mr. shoffner suggested he see steve salmon and larry auld about his suggestions. report of research topics ad hoc committee. mr. james l. dolby, chairman, expressed the feelings of his committee that the present research need lay in the area of improvement of library operations and suggested concentrating on planning and evaluating existing and proposed methods rather than on system "breakthroughs." a tremendous amount of data has been accumulated by presently operating automated systems. this data should now be put to use in research studies. first, acquisitions: no library is comprehensive, not even the library of congress. data should be collected that should enable the library to know the ratio of use to collectivity, e.g. lists of high-use books; funneling all acquisitions through a single source to reduce duplicate buying; the possible merging of data bases (upon which circulation data can shed some light); and setting up methods by which libraries can interchange data so that comparative analyses can be made. second, cataloging: mr. dolby made several recommendations: (1) develop measures of cost effectiveness of various access systems. he stressed a need to provide more access to library collections, looking into the means of expanding from two or three subject headings per book in a cataloging system to forty or fifty. we need to know what people are trying to do in libraries by systematic collection of information about their activities. (2) enumerate various types of information requests made by users. included in this would be the collection of data on in-library use.
( 3) determine needs in terms of coordinated planning, cooperation, and hardware and software transferability which should be confronted before the fact, rather than after, as more and more regional operations take shape. ( 4) develop the problem-solving and idea-producing capabilities of library staffs to a maximum. ( 5) develop a continuing education program for librarians covering !sad-related topics. ( 6) establish a curriculum committee to deal with problems in schools of librarianship and to make information science a teachable subject. conference planning committee report. chairman brett butler had two statements to make regarding future programs: ( 1) the theme of an institute program which had been postponed would be carried over to the 1974 new york annual conference: "library automation in the national libraries," enlarged to include other national libraries; ( 2) the 1975 san francisco meeting would be on "information science and library automation" and focus on the things dolby mentioned in his research topics ad hoc committee report. he further stated he would like to see more co-sponsored programs. wednesday, june 27,1973 president ralph shoffner opened the board meeting at 10:10 a.m. the following were in attendance: board-ralph shoffner, donald hammer, paul fasana> susan ma1tin, and berniece coulter, secretary, isad. guests-stephen salmon, james rizzolo, douglas ferguson, brett butler, david waite, velma veneziano, pearce grove, ronald miller, frederick kilgour, and lawrence w. s. auld. president shoffner pronounced a quorum present. highlights of minutes 149 seminar and institute topics. chairman ron miller said the committee had reviewed the literature and felt there was value in regard to continuing education programs. he said they had received summaries and statistics on previous seminars and felt the institutes should be continued. conference planning committee report. brett butler, chairman, summarized the seminars held during the year. the microforms seminar in detroit was not held as a separate seminar but incorporated into sessions of the national microform association's annual meeting. two other seminars were not held as planned but postponed. surplus monies in the preconference fund were to be used for publishing of the proceedings of the preconference. tapes had been professionally made of these proceedings. the 1975 meetings were being planned now with the goal of cooperating with asis. the program in new york on national libraries which was cancelled last midwinter (1973-washington, d.c. ) was being considered with the scope to be increased to include other national libraries-france, great britain, etc. the program could require more time than the normal time slots. maryann duggan was preparing a goals document to be distributed to each participating library. mr. butler continued with his report saying that maryann duggan would plan and coordinate the networks seminar for the spring. the focus would be on the proposed theme-"advertising library automation-how to share yom efforts." the general feeling is that isad should also do something with other associations in state library operations and library schools. mr. fasana stated that some feedback on the las vegas preconference had been related in such questions as "why was the registration fee for preconference so expensive this year?" mr. butler reported that a reduction in the price of the proceedings was offered to registrants. 
isad would have to subsidize ala publishing services for the discount offered these registrants because ala cannot sell at different prices to the membership. mr. miller stated that his committee felt that a good deal of the work now done by volunteers should be done by a staff member and those costs be included in the registration fees. this approach should be used for future seminars. mr. fasana said that some divisions list the analysis or a breakdown of costs on their advertising or program for preconferences. apparently, president shoffner said, the price was reasonable, judging from the response. mr. butler said another objection was the conflict with acrl's preconference on networks. had we had knowledge of their preconference previously, the two could have been coordinated. mr. shoffner said that in a large conference conflicts were to be expected. mrs. veneziano felt there were few people in attendance at the networks preconference who, had it not been held, would have attended isad's. president given authority to appoint committee members without individual approval. mr. kilgour asked that the board give him the authority to appoint committee members without approval of each individual as the isad bylaws stated. he asked for blanket approval. this was previously given to mr. shoffner. mr. kilgour asked that the board invoke section 2b of the bylaws and provide that the appointments last until the end of the president's term. mr. fasana suggested that a sense of "yes" be given. report of marbi committee. mrs. veneziano, the chairman, suggested the acronym "marbi" (machine-readable bibliographic information) be used for convenience in remembering the lengthy title of the committee. the committee was still concerned with trying to define its role and the mechanism to implement the role. one important consideration of the committee was to serve as a link between the members of ala and the library of congress in order to avoid the repetition of the problem which arose regarding isbd and marc records. henriette avram had prepared a position paper on this committee's relationship to any impending content changes to marc records. these would not be changes required as a result of changes in cataloging rules, over which lc has no say, but rather changes in the needs of the library community. the paper was not intended to set forth a permanent operation, but to propose guidelines. it outlined what would happen at the point where some possibility for change was discovered, and how lc and the committee would communicate. it did not, of course, detail the committee's communication with division members and the relationship of the committee to the marc users discussion group. the committee concluded that it should communicate at least with the marc subscribers, and that james rizzolo, chairman of the marc users discussion group, would assume the responsibility of circulating information on impending changes to the marc subscribers and marc users and get the information back to the committee, which would determine if there was a consensus and in its best judgment give a reply back to lc. a second activity of the committee would be to reach interested people through some means such as jola tc or lrts. the voluminous amount of papers should not be distributed generally as they become obsolete quickly, but a center is needed for storing these papers and the fact that they exist should be circulated to the library field in general.
copies could be made available for a price to those interested. mrs. veneziano hoped this could be worked out with someone at ala headquarters. another area, that of nonbibliographic data (e.g., uniform library codes, dealer codes, etc.), is of interest to the committee. mrs. veneziano's personal opinion was that though the function statement of the committee indicates responsibility for bibliographic information only, the committee should also be involved with anything which impacts the use of that bibliographic data. she expressed hesitancy to have many other committees working in this area as too much time is devoted to getting feedback from other committees before a decision can be made. the committee would like to propose adoption of a mechanism whereby it can set up a subcommittee, task force, or working group with a limited life-span which would study and react to very specific technical proposals and working papers, etc. which are developing informally at a national and international level. these subgroups must be very responsive so that the committee will not be placed in a position where it cannot take action readily. also there must be a flexible mechanism for establishing subcommittees. rtsd has strong feelings on setting up such a group without approval by the division. she did not think, however, there was any objection to creating task forces and felt that it was the only way to obtain expert comment on some of these materials. the feeling of the committee was that henriette avram's position paper should be accepted with minor provisos: the implementation (section 1-b) and the time frame allowed for reporting back to lc. henriette avram is to go back to lc and check these modifications. if they meet lc's approval, the committee will accept the position paper. the character set subcommittee, consisting of charles payne, david weisbrod, and michael malinconico, will study the latest draft working papers and comment to mrs. avram, who is on the task force. mr. fasana said the committee had power to set up subcommittees because of the board's previous approval on this; in addition, the function statement gives the committee the right to set up a task force. president shoffner pointed out that the results of deliberations require a joint submission to all three boards. mr. hammer volunteered help in any coordination which might be needed. comments regarding the distribution of the opinion survey that mrs. avram's report calls for were that it should not be general but that it be noted (perhaps in jola tc) that the survey is available. mr. shoffner suggested that the board accept john kountz's statement regarding the establishment of a committee on nonbibliographical data and reconsider the matter again at midwinter. john linford suggested that it would be best to expand the charge to the committee to include the nonbibliographical area. paul fasana stated that the committee's function statement now is so worded that it can include noncataloging data. the sense of the board was agreement that the authority already existed. telecommunications committee report. the new chairman, david waite, reported that the committee first discussed the committee's focus, as it was the desire of the board, as he understood it, to make some changes in this committee. in the past the activities of the committee were basically in cable tv.
the present members were not too interested in making that their prime target, but instead the electronic communications of bibliographic data. they are not going to just look at hot issues but are currently proceeding in the area of telecommunications information, at the same time keeping their eyes open for important developments in the technological field under the broad base of telecommunications. the two main focal points of the committee would be education and standards. in education the committee would try to communicate with decision makers as related to aspects of communications to be investigated and then overflow to the general library community. mrs. martin asked if there might not be a problem with the committee's taking on this role since seminars, institutes, etc., were the function of the conference planning committee. the planning for such seminars, etc., mr. butler said, on a six or nine month basis did not work adequately. there was no objection to the telecommunications committee functioning in this area. mr. kilgour remarked that at&t and other phone companies, as well as fcc, had a great deal going on with impact on telecommunications and with networks presently and he felt that ala should present a position to fcc. the committee should therefore inform itself extensively as to what is going on so that if it appears some action by ala was needed, we would be prepared. committee on objectives report. chairman stephen salmon summarized the discussion by the objectives committee of the three issues raised at the first session of the isad board meeting regarding the information technology discussion group: (1) how such a group should fit into the organizational structure of isad. the committee sensed a media committee was not an answer but felt a discussion group was appropriate. the media group should be continued even if transformed into a committee. (2) the restatement of the objectives and activities of the division to include the media group. consideration was given to paul fasana's words that "related technology" in the isad bylaws' objectives statement included educational technology already. but the committee agreed with the board that a change in the language would help clarify, and mr. kilgour had some recommendations in rewording the objectives statement to solve the problem. (3) terminology for the name of the division and the journal. they finally identified three possible name changes for the division: (a) information science and library automation, (b) information science and educational technology, and (c) information science and technology. the final decision was that the present name of the division, "information science and automation," was the best. the committee also thought the draft report should specifically include another objective, i.e., to offer expertise in this area to others in ala and other professional organizations like arl. mr. salmon listed the additions and changes made and included in the final draft of the committee's report. motion: paul fasana moved that the isad board accept and adopt the report of the objectives committee. seconded by susan k. martin. carried. mrs. martin remarked that the information technology discussion group was already within isad. mr.
shoffner explained that the board had accepted the group only for one year and during that year isad intended to determine whether this activity was within isad's scope. mrs. martin asked, if isad considers educational technology and audiovisual concerns to be within its scope, what the relationship would be with the other audiovisual committees in ala and also what coo's role would be? the board was not asserting what is out of scope with any other parts of ala, mr. shoffner answered, only what was within isad's scope. president shoffner thanked chairman salmon for his report and the committee for its work in carrying out the original charge as given and meeting the time schedule. he then declared the committee disbanded. there was some discussion on coordinating with other av committees in ala and what aspect of av isad would be concerned with. mr. kilgour suggested that the information technology discussion group should pursue its own goals and not concern itself with the coordination of all ala av groups. such coordination he felt was impossible. whether a number of committees or subcommittees in the discussion group could be formed was also discussed. mr. shoffner stated that these should be "units" of the discussion group, not "committees." he further said he was reluctant to establish committees and would do so only after a group of people committed to doing a job showed, over some continuing period of time, productive activity on a number of different tasks that relate to each other. report of editorial board. mrs. martin said that at the monday isad board meeting she had talked of retaining jola tc as a separate publication, but the final feeling of the editorial board was negative. the thought was to create a separate section within jola but with a different format, as the green sheets are inserted in the library association record ("liaison"). don bosseau, editor of jola tc, remarked that the editorial board had provided insight into another need which was for truly technical communications, e.g., a short summary which would show up later in a longer, detailed article. the editorial board felt that tc should be made into something that has more impact than news releases. isad/led education committee report. a written report was submitted by the committee. (see exhibit a.) cola report. the membership of the discussion group had increased to 145. chairman don bosseau, who has held that position since the incorporation of the group within isad, said ballots would be sent out shortly for the election of a new chairman. mr. bosseau also asked about control of membership in the group and stated that ala's guidelines indicate one person per institution as a maximum membership. the board corrected this idea by saying that this limitation was not ala's but the old cola limitation. there is no limit on membership by ala. mr. butler asked that the planning of cola, marc, and information technology discussion groups' meetings be coordinated. mrs. martin said that david weisbrod had suggested that there be a cola meeting at asis and that would be part of the cooperation between asis and isad in the program area. marc users discussion group. mr. james rizzolo told of his intent to make a survey by breaking up the mailing lists he had into three groups: (1) marc subscribers, (2) those interested in using marc, and (3) an informational group. mr. kilgour thought the group was called "marc subscribers" not "marc users." mr.
shoffner said the name had always been marc users. originally there had been the intent to set up a "marc subscribers" group but mr. culbertson had said that it would not fit into either isad or ala's structure. they then settled on marc users discussion group. it was stated that both marc and cola discussion groups should be in the program section of the ala conference program book. it was pointed out that program time can be requested by committee or discussion group chairmen. information technology discussion group. mr. shoffner requested mr. donald hammer to inform the isad information technology discussion group that the board would not establish an av committee, but intended to continue with the information technology discussion group, in response to their memo of march 2, 1973 requesting an av committee within isad. report to ala planning committee. mr. shoffner also requested mr. hammer to forward the objectives committee report on the long range plans of isad to the ala planning committee as a means to meet their request. (this report had been deferred from midwinter so that the final report of the objectives committee could first be heard.) rtsd computer filing committee. mr. fasana said he was asked by the rtsd board why isad had refused their request to appoint an isad member to the rtsd computer filing committee. mr. hammer said he would see that a committee member was appointed. mr. shoffner expressed appreciation to the board and turned over the gavel to president fred kilgour. the meeting was adjourned at 12:00 noon. exhibit a. june 25, 1973. minutes of the 1973 annual isad/led meeting. the 1973 annual meeting of isad/led convened june 25 at caesar's palace, atrium i, las vegas. present were members jim liesener, ann painter, and elaine svenonius; and visitors martha west (california state university, san jose), barbara fleming (university of nevada, reno), and philip heer (university of denver); in attendance were pauline atherton and charles davis. discussion centered on two topics: the disc questions as commented upon by library school faculties and the future course of disc. the general and specific comments on the disc questions given by library school faculties are given on the attached sheets. these sheets include, in addition to responses reported at ala, responses which arrived belatedly throughout the summer. general criticisms are primarily of two types: the questions are either too broad or they are outside the domain of information science. it was felt that had the use to which the questions are to be put (viz., to develop modules, not to examine graduating students) been clearer, the charge "too broad" would not have resulted. as to what is to be included in the domain of information science, this was precisely the point of the exercise of generating questions, and comments limiting or extending the domain of information science should be accorded consideration. at the june 25 meeting, the individual questions were discussed generally in light of comments received. following the discussion, participants at the meeting expressed informally, and with varying degrees of determination, interest in developing modules around certain of the questions. the meeting ended with a discussion of the future of disc.
a technical session is being planned by the es sig at los angeles in october: program modules for developing curricula in information science; the plan for module development will be advertised and some demonstration modules shown with a view to drawing up module specifications. also contemplated is a program by isad/led in january at midwinter ala. elaine svenonius, august 15, 1973.

editorial board thoughts: developing relentless collaborations and powerful partnerships. mark dehmlow. information technology and libraries | june 2017. with the end of the performance and fiscal year wrapping up, it seemed like a good time to reflect on what change initiatives we have engaged in over the past few years that have strengthened the organizational effectiveness of the it department in our library. my thoughts almost immediately drifted to our focus on collaboration. early in my career, it was the profession-wide culture of cross-institutional collaboration that convinced me that becoming a librarian would be the right career move. i am certain that the impetus to collaborate stems from our professional service commitment, a values-based system that at its core believes that the success of all helps the collective do their jobs better in the name of service to our patrons. and yet, over the years, i have heard stories of and observed firsthand internal competitions for resources, vilification of library it as siloed and opaque factions, and library it departments that have had strained relationships with their institution's central it organizations. as a part of our senior leadership team for the hesburgh libraries, two of my core professional interests are organizational effectiveness and staff satisfaction, especially in the face of a rapidly changing technology landscape, competition for talent in the it sector where it is hard to contend with commercial salaries, and the slow rate of attrition at the university. retaining talented it staff requires creating a work culture that is better than the commercial sector, a work culture that values work/life balance, innovation and experimentation, a culture of teamwork and camaraderie, and where there is a clear sense of strategic priority. to build these latter two qualities into our work culture, we have strategically emphasized durable internal and external coalitions with a tenacious sense of partnership. true collaboration reinforces a collective sense of goals, allows for maximal efficiency, discourages unnecessary or destructive competition, and opens the door to the coveted but seldom realized ability to "stop doing" through partnering with other units on campus that share a sense of priority around particular services. creating sustainable and significant internal collaboration requires etching it into the culture of the organization. making it a part of the organization's dna has to be prioritized and modeled by senior leadership, and it begins with advancing shared goals over singular agendas. in our senior leadership team, we have committed to each other as our primary team. we may advocate for staff and initiatives in our own verticals, but our drive is to be holistic stewards for the libraries, not just our functional departments. we give as much, if not more, weight to the objectives of the collective senior leadership team, which also helps in clarifying priorities.
mark dehmlow (mdehmlow@nd.edu), a member of the ital editorial board, is director of library information technology, hesburgh libraries, university of notre dame, south bend, in. our executive leadership models cooperation, cross-divisional problem solving, and collective strategic initiative planning. using this model, decisions get made more quickly, enhancing our ability to accomplish things on time, with a high level of quality, and with a considerable level of satisfaction for our staff and faculty. the it department is viewed less as a black box where decisions about what to work on are made behind the curtain and more as a group of talented staff who help our organization accomplish its priorities. when our it department needs to advocate for support and timely completion of work from individuals in other departments, the other senior managers help get their units mobilized. we see ourselves as part of the community and the community embraces us as part of them. historically, it has been tempting to view it as somewhat separate, a part of the production line, but in an age where every operation in the library is affected by technology, our workflows need to be more integrated and team based. the problems we are working on are more cross-disciplinary and require a plurality of expertise to solve. libraries are increasingly becoming an interconnected and interdependent ecosystem that requires thinking holistically about problems and a relentless commitment to building coalitions to drive our services. it may seem obvious that this would be a more effective way to work, yet i have spoken with many people at organizations where there is a clear culture of departmental objective separation and competition for resources. i have long appreciated the work environment at notre dame, in part because we strive to be an organization whose culture has been guided by our core institutional values: accountability, integrity, excellence in leadership, excellence in mission, and teamwork. these values not only drive our internal collaborations, but also the way in which different departments on campus work with each other. we have had a long-standing, positive relationship with our central office of information technologies (oit), one that has been tremendously cooperative but for many years has lacked interconnections at a variety of levels and a clear collaborative and strategic focus. in the last five years, our organizations have shifted their focus: the oit from emphasizing centralized, administrative, enterprise computing to decentralized, academic, enterprise computing, and the libraries from doing everything in house to leveraging services for standardized needs and focusing our staff's time on initiatives where they can create the most value. in part, we developed an in-house it department because we had service expectations that weren't a priority at the time for the oit. but during our strategic transitions, we have extended our working relationships at every level throughout our organizations, from our staff in the trenches to our managers and senior leaders. my focus as the director for library it over the past few years has been to look at ways we can enhance our capacity through partnerships. to that end, there are several interrelated initiatives that we have begun to engage in with the oit: 1. embedding an oit presence in the libraries
 2. shifting support for common it services to the oit, and 
 3. consolidating our customer communication through their service portal servicenow. 
the first step in this new collaboration with the oit was letting go of the past and revisiting where the oit and the libraries have strategic overlaps that may not have been aligned before. as two service organizations on campus with a deep concern for supporting the academic endeavor, it was easy to find strategic alignment with each other. for the libraries, we often get questions at our service points about how to change passwords or install printer drivers, needs that are part of the central it service portfolio. for the oit, the libraries are a major campus hub where hordes of students and faculty conduct research and work on assignments, particularly after classes, when many of the business units leave the university for the day. working closely with the libraries' director for teaching, research, and user services and the oit's senior director for user services, we began developing a collaboration grounded in our common desire to support end users, which resulted in creating an oit outpost in the libraries. while there are many libraries that have this kind of collaboration, this was a revolutionary step for us. this collaboration opened the door for us to begin a discussion about common technology services that we have been supporting internally: printing and general lab computing. it is important to us that these services function well for our end users, but they are not services that require library expertise to accomplish. the oit supports these services for much of campus, and as long as we have aligned, practical service-level expectations that are committed to excellence, the oit can handle that function much more efficiently and we can use our staff expertise to support other, emerging services that are core to the libraries. we are also working closely with the oit to leverage their it service portal, servicenow, as the libraries' service portal. given that our service portfolio is much broader than strictly it services, moving in this direction required a willingness from the oit to think outside of the box and allow us to customize the system to meet our service needs. it has required some reciprocation from the libraries as well: the servicenow platform is more expensive than others we could license, its functionality will require effort from our staff to customize, and it is requiring us to change workflows, especially in the public services areas. integrating our customer communication into this platform, though, will create a better user experience for our patrons by supporting a common interface they are experienced with, and it will allow us to more easily transfer both staff and patron general it questions to the oit. beginning to work in truly collaborative ways requires shifting the narrative around our relationships from a client/provider model to one of a coalition. redefining these relationships as partnerships puts both parties on equal footing around the planning table, where everyone has an equal stake in the objectives and outcomes. these partnerships don't come effortlessly; they require libraries to ardently become more visible on campus, to articulate the complementary value that we can contribute to campus initiatives, and to proactively request to join initiatives that we haven't participated in before. it also takes reaching out and helping campus partners see how we can collectively create value together, using our unique talents to successfully support the campus community.
and lastly, it takes engaging a more holistic view of the university and the way we steward its resources; sometimes that will mean allocating more resources for the common good versus taking the narrower view that we should only consider our own context when adopting solutions. but in the end, if we are willing to think about our role at the university in that broader context and build powerful partnerships, we will collectively be able to serve our end users better.

using augmented and virtual reality in information literacy instruction to reduce library anxiety in nontraditional and international students. angela sample. information technology and libraries | march 2020. https://doi.org/10.6017/ital.v39i1.11723. dr. angela sample (asample@oru.edu) is head of access services, oral roberts university. abstract: throughout its early years, the oral roberts university (oru) library held a place of pre-eminence on campus. oru's founder envisioned the library as central to all academic function and scholarship. under the direction of the founding dean of learning resources, the library was an early pioneer in innovative technologies and methods. however, over time, as is the case with many academic libraries, the library's reputation as an institution crucial to the academic work on campus had diminished. a team of librarians is now engaged in programs aimed at repositioning the library as the university's hub of learning. toward that goal, the library has long taught information literacy (il) to students and faculty through several traditional methods, including one-shot workshops and sessions tied to specific courses of study. now, in conjunction with disseminating augmented, virtual, and mixed reality (avmr) learning technologies, the library is redesigning instruction to align with various realities of higher education today, including uses of avmr in instruction and research, and following best practices from research into serving (1) online learners; (2) international learners not accustomed to western higher-education practices; and (3) learners returning to university study after being away from higher education for some time or having changed disciplines of study. the library is developing online tutorials targeted at nontraditional and international graduate students with various combinations of avmr, with the goal of diminishing library anxiety. numerous library and information science studies have shown a correlation between library anxiety and reduced library use, and library use has been linked to student learning, academic success, and retention.1 this paper focuses on il instruction methods under development by the library. current indicators are encouraging as the library embarks on the redesign of il instruction and early development of the inclusion of avmr in il instruction for nontraditional and international students. literature review: the patron approaches the reference desk, with eyes downcast. in a voice so soft that it is barely above a whisper, the patron mumbles, "is this where i can get help with research?" some variation on the above scenario is an occurrence long familiar to academic reference librarians.
in 1986, mellon put a name to this nervousness of patrons; she called it library anxiety.2 since then, librarians have implemented various measures to help put patrons at ease and minimize their library anxiety. scholars have studied many of these measures aimed at reducing library anxiety, both to determine the efficacy of such interventions and to understand better the causes of library anxiety. this paper describes one library's intervention: a virtual-reality tour that lets students learn about some of the services available at the library prior to their initial visit, in an attempt to reduce some aspects of their library anxiety. library anxiety: library and information science (lis) researchers have long recognized that anxiety related to libraries and research can have a detrimental effect on students. mizrachi described library anxiety as the feeling of being overwhelmed, intimidated, nervous, uncertain, or confused when using or contemplating use of the library and its resources to satisfy an information need. it is a state-based anxiety that can result in misconceptions or misapplications of library resources, procrastination, and avoidance of library tasks.3 since mellon's theoretical framing of library anxiety in 1986, researchers have studied a number of library-related anxieties, including research anxiety, information literacy anxiety, library technophobia, and computer anxiety. various studies have focused on different groups of students (freshmen, nontraditional students, and international students, to name a few) who may experience higher levels of library anxiety. another area that has been of interest to researchers is the study of the efficacy of various measures aimed at reducing the library anxiety of students. causes and factors: researchers have found several causes of library anxiety. in her seminal article, mellon used a grounded theory approach to understand and "describe students' fear of the library as library anxiety."4 mellon noted most of the students in her study described their feelings as being lost in the library, which mellon stated "stemmed from four causes: (1) the size of the library; (2) a lack of knowledge about where things were located; (3) how to begin; and (4) what to do."5 head and eisenberg also found a majority of students (84 percent) had difficulties in knowing where to begin.6 bostick and later jiao and onwuegbuzie named "five general antecedents of library anxiety . . . namely, barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers."7 barriers with staff are the feelings students have regarding the accessibility and approachability of library staff.8 affective barriers are students' self-perceptions of their competence in using the library and library resources.
affective barriers’ arise from feelings of inadequacy and can be heightened by the perception that others possess library skills that they alone do not.9 comfort with the library deals with the student’s perception of the library as a “safe and comforting environment.”10 knowledge of the library is students’ knowledge of “where things are located and how to find their way around in the building.”11 mechanical barriers refer to students’ perception of the reliability of machines in the library (e.g., copiers, printers, computers, etc.).12 researchers focused on investigating the information-seeking behavior of students have identified stages of library anxiety. in her work, kuhlthau identified six stages of information seeking in information technology and libraries march 2020 using augmented and virtual reality in information literacy instruction | sample 3 which students may experience library anxiety: task initiation, topic selection, prefocus exploration, focus formulation, information collection, and search closure.13 in blundell’s presentation of her theoretical model of the academic information search process (aisp) of undergraduate millennial students (figure 1), she described the varying levels of anxiety students may feel throughout this process depending upon their success at finding needed information.14 anxiety at stage 2: development/refinement “ranges from mild to extreme, depending on the success of the student’s aisp in finding information he/she believes is appropriate for addressing the academic need.”15 at stage 3, “based on information located through the aisp in stages 1 & 2, [the] student either fulfills [the] academic need with minimal anxiety, refocuses aisp with mid to high-level anxiety, or abandons the academic need completely with high/extreme levels of anxiety.”16 figure 1. blundell aisp model.17 although blundell studied undergraduate millennial students’ information-seeking behaviors, the same behaviors may also be descriptive of other groups of students. blundell omitted anxiety at or prior to stage 1 when the assignment is received by the student. one reason for the omission of anxiety in blundell’s model at stage 1 may be a seemingly paradoxical finding by many researchers regarding students’ inflated belief in their research skills as compared to their actual level of information literacy (il) skills.18 students with a high self-assessment of their il skills may feel confident at the onset of research, only experiencing anxiety when encountering low success rates when searching for information or when experiencing information overload. however, many other students may experience anxiety at the onset of receiving an assignment, particularly on a information technology and libraries march 2020 using augmented and virtual reality in information literacy instruction | sample 4 topic in which they have little or no knowledge. others may experience anxiety if they realize they do not know where to look for information, how to use library research tools, or feel apprehension at the thought of asking for help from a librarian. for example, library anxiety can result from the requirements of the assignment; most professors require peer-reviewed sources. many new students do not know what a peer-reviewed source is, much less how to find one. 
indeed, many of the causes of library anxiety described in mellon's and later jiao and onwuegbuzie's work can be positioned throughout all six of kuhlthau's and all three of blundell's stages of information seeking and could explain some of the potential steps blundell noted in her model. negative effects: in addition to the obvious discomfort students might feel, library anxiety, as with other forms of anxiety, can have a detrimental effect on students' academic performance. as mellon noted, "students become so anxious about having to gather information in a library for their research paper that they are unable to approach the problem logically or effectively."19 the findings from jiao and onwuegbuzie's numerous studies support the negative effect library anxiety can have on students' academic performance in various ways, including research performance, research proposal writing, and study habits.20 research has also shown the link between higher levels of library anxiety and avoidance of the library.21 avoidance of the library could hinder students' academic performance or retention; studies have linked library use to higher gpas and increased retention rates.22 other negative effects of library anxiety include the reluctance of students to ask for help from a librarian and the tendency to procrastinate until it is too late to do well on assignments. when library anxiety is at a level high enough to cause students to enter a panic mode, logical thinking, the ability to apply existing skills, and building or acquiring new skills can be impaired. at-risk student groups: acknowledging the negative effects library anxiety can have on students' academic performance, several studies have looked to determine whether particular demographic groups of students experience library anxiety at higher rates and what factors or causes may be most prevalent for a particular group. in one study conducted by jiao, onwuegbuzie, and lichtenstein, students who fell into the following groups tended to have the highest levels of library anxiety: "male, undergraduate, not speak english as their native language, have high levels of academic achievement, be employed either part- or full-time, and visit the library infrequently."23 some studies have focused on learning more about the library anxiety of a particular group. some of the groups investigated include graduate, international, and nontraditional students. still others have focused on possible racial differences in the prevalence of library anxiety. although a few studies have found library anxiety to be higher for undergraduate students than graduate students, one of the most often-studied groups at risk for library anxiety has been graduate students.24 these researchers have looked at a number of factors in relation to graduate students' library anxiety. in an early study, they found graduate students with the preferred learning style of visual learners tend to have higher levels of library anxiety.25 in another study of graduate students, they examined the relation between library anxiety and trait anxiety, defined as "the relative stable proneness within each person to react to situations seen as stressful."26 jiao and onwuegbuzie, together with bostick, investigated the potential relationship between race and library anxiety in 2004, a study they replicated in 2006.
in both, the researchers found caucasian american graduate students reported higher levels of library anxiety than their african american counterparts.27 another group frequently examined in library anxiety studies is international students. mizrachi noted "studies involving international students in american universities consistently show their levels of library anxiety to be much higher than their american peers."28 onwuegbuzie and jiao found international esl students "had higher levels of library anxiety associated with 'barriers with staff,' 'affective barriers,' and 'mechanical barriers,' and lower levels of library anxiety associated with 'knowledge of the library' than did native english speakers."29 later, jiao and onwuegbuzie found the most prevalent causes of library anxiety for international students were mechanical barriers (library technology) as the greatest source, followed by affective barriers.30 in the more recent pilot study by lu and adkins, the greatest barriers for international students were affective and staff barriers, while mechanical barriers, such as technologies, were no longer a significant cause of anxiety for most.31 collins and veal found adult learners in their study had the highest degree of library anxiety pertaining to affective barriers.32 in their study, kwon, onwuegbuzie, and alexander revealed graduate students who had higher levels of library anxiety resulting from affective barriers and knowledge of the library had weaker critical-thinking skills, lower self-confidence, less inquisitiveness, and reduced systematicity ("less disposed toward organized, logical, focused, and attentive inquiry").33 kwon found similar results in undergraduate students.34 interventions: recognizing the multiple causes and multidimensional aspects of library anxiety, librarians have devised a number of interventions aimed at addressing one or more of its causes. some of the means to address barriers with staff have focused on outreach, engaging library instruction, online presence, and other similar efforts to reach students and provide needed support for students' research. librarians have used information literacy instruction (ili), reference desk consultations, and print and online guides to address library anxiety stemming from affective barriers, knowledge of the library, and even the mechanical barriers arising from lack of technology skills. a common intervention is ili, which several studies have found to have some success in reducing students' library anxiety. bell explored students' levels of library anxiety before and after a one-credit il course.35 platt and platt examined the efficacy of two 50-minute ili sessions, required of students enrolled in the research methods in psychology course, in reducing library anxiety, and found "the greatest changes . . . were related primarily to knowledge of what resources are available in the library and how to access them."36 in contrast to the typical one-session il class, fleming-may, mays, and radom investigated and found a three-workshop instruction model correlated with students' increased confidence in using the library and lessening library anxiety.37
notwithstanding the benefits of library instruction sessions for students in relieving library anxiety, pellegrino found students were far more likely to ask a librarian for help when their instructor, rather than a librarian, encouraged or required them to do so.38 by familiarizing students with the location and arrangement of library services in the building, library orientations have been found to help relieve library anxiety.39 library orientations primarily aim to address one of the causes of library anxiety: a lack of knowledge of the library. these orientations often introduce students to various library staff, which may also help with the dimension of library anxiety due to barriers with staff. other interventions have been attempted with some success. martin and park found students were more apt to request assistance from the librarian if persuaded the consultation would save time.40 mcdaniel found in a study of graduate students that the use of peer mentors was effective in reducing affective barriers.41 robbins discussed the use of library events to help ease students' anxiety, but found in the follow-up survey many students were unaware of the events.42 diprince et al. discussed ways the use of a print guide can help alleviate library anxiety.43 oru library. oral roberts university. the oru library serves the students, faculty, and staff of oral roberts university (oru). oru is a small, private, not-for-profit, liberal arts college located in tulsa, oklahoma. founded in 1963 by oral roberts, oru has an enrollment of approximately 3,600 students. oru is an interdenominational christian institution focused on a whole-person education of spirit, mind, and body. oru offers more than 150 majors, minors, and pre-professional programs in a range of degree fields, including business, biology, engineering, nursing, ministry, and more.44 history: "the first building will be the library which is the core of the whole academic structure."45 (oral roberts, 1962). from the founding of oru, founder oral roberts had a vision of the library's centrality to academics.46 this set a precedent early in the history of the oru library of the importance of the library to the academic work of the students and faculty of oru. expanding on traditional views of the function of an academic library to serve mainly as the repository of books and articles, through the vision of early library administrators, the oru library emerged as one of the early adopters of electronic technology with the dairs (dial access information retrieval system) computer.47 throughout the years, due to a number of factors, the oru library receded from the forefront of pre-eminence in academics on campus. library practices followed the general trend of academic libraries. the oru library continued to acquire needed materials (e.g., books, journals, access to databases). library instruction likewise kept up with current models of instruction. the typical method of instruction to undergraduates has been teaching one or two sessions to a class at the request of the instructor. largely through the efforts of the instruction librarian, il became a required component of undergraduate education at oru. with rare exceptions, undergraduate students at oru are required, as a part of comp 102: composition ii, to attend two sessions of an il course.
other forms of ili include workshops and sessions for undergraduates working on their senior papers and other sessions for graduate and postgraduate students, all typically at the request of the instructors of classes. with the new addition of augmented, virtual, and mixed reality (avmr) learning technologies, at the behest of their dean, oru librarians have begun to look at ways to incorporate these technologies into their classes and daily work. several oru instructors are using avmr technologies in their classes.48 to help prepare students for the use of these technologies in their classes, one oru instruction librarian has begun to introduce students to avmr technologies. other oru instruction librarians are exploring ways to use avmr technologies to create visualizations of library and research concepts, such as a 3d visualization of how boolean logic works in database searches. oru instruction librarians are also exploring ways to incorporate avmr technologies into a new program of online ili. although still in very early stages of planning, the proposed online ili will include a virtual tour of the library. this paper focuses on the implementation and early feedback from a formative assessment on a virtual tour of the oru library. oru modular students: in addition to traditional 15-week semesters, two colleges at oru offer graduate modular programs, the college of education and the college of theology and ministry. many of the students who enroll in these programs are nontraditional students who are returning to college after some time. several of these students work full-time jobs and have family obligations in addition to their academic work. often, these students are not local to the tulsa campus; several are us students who live out of state and many others are international students. the modular classes offered by both programs can be a hybrid of online and modular format. the college of theology and ministry offers one-week courses on campus; the college of education offers two-and-a-half-day on-campus classes. modular classes are intensive due to the compressed nature of the curriculum. often, modular students are visiting campus for the first time, and in addition to locating their classes, are very busy with coursework. adding to these pressures, modular students may be using computer technologies in new ways. navigating the library's resources is yet another stressor for many of these students. students who are not familiar with the operations of an academic library may not be aware of library services or how to access those services. the project: in january 2017, the global learning center opened on the campus of oru. one hallmark of this renovated structure is the integration of avmr technologies.49 despite several professors on campus from various disciplines and colleges implementing avmr into their curriculum, students' use of the facilities was somewhat lower than had been hoped. in the fall 2018 semester, the idea of creating a virtual tour of the oru library arose from a conversation between the author and a colleague, dan eller. eller described an online ili course he envisioned for oru's graduate theology modular students. as a part of this course, he envisioned a virtual tour that could help students by reducing their library anxiety.
early in 2019, oru's associate vice president of technology and innovation, michael mathews, contacted dr. mark roberts, dean of learning resources (of which the oru library is a part), to propose making avmr learning technologies available through the oru library. dean roberts agreed and created an avmr team of library faculty to oversee this project. in the spring 2019 semester, the oru it department sent one of their employees, stephen guzman, to work with the library's avmr team to set up an avmr station and work in the library to help make these new technologies available and known to oru students. in addition to other avmr projects guzman helped the library's avmr team begin, he volunteered to take the 360 images when he learned of the library's desire to create a virtual tour of the library. guzman also helped in the selection of editing software, 3dvista, for which the library acquired a license. working with the 360 images guzman took and stitched together, the author used 3dvista to create a virtual tour of the library. this software allows for the addition of elements to the 360 images that make up the virtual tour to enhance the viewer's experience and to provide information and hyperlinks to external webpages with more information. some of the elements added to the oru library virtual tour are hotspots that enable a viewer to move from one area to another, icons that present pop-up windows with more information, and other icons that link to the online profiles of various library faculty. throughout the tour, consistent use of icons for the same functions is maintained. for example, icons with arrows allow the viewer to move from one location to another (figure 2), while icons with question marks displayed over library personnel (figure 3) open the personnel's profile webpage when clicked. icons that contain the letter "i" feature pop-up windows with information and related links. the tour begins from outside the building so new visitors will be able to recognize the building when they arrive on campus (figure 4). viewers can navigate through subsequent 360 images by clicking on the arrow icons so the viewer virtually travels the same path they will follow to enter the library when on campus. there are two other options to navigate the tour. the viewer can click on the small icons of scenes displayed on the left side of the screen to move to another area. the floor plans displayed at the upper right of the screen have red dots indicating the location of various scenes and, when clicked, move the viewer to that scene. figure 2. avmr station near the reference desk, oru library virtual tour. other elements of the tour include small icons of the scenes on the left of the screen. beneath these icons are the names of the various areas. the title of the current scene appears in yellow lettering, providing information to help orient the viewer. small floorplans located in the upper right side of the screen offer additional information on the location of the area (figure 3). viewers can toggle these floorplans on and off. another feature supplying location information is the dropdown menu for the floorplans (the dark blue bar at the upper right of the screen), which shows the floor level of the building on which the area is located. in the lower right of the screen, an information icon is available with details on what behavior to expect when clicking on icons and a description of the various ways to navigate the tour. figure 3. dean mark roberts near alexa and the self-checkout station at the circulation desk, oru library virtual tour. figure 4. oru campus, oru library tour.
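the tour's interactive elements follow a simple pattern: each 360-degree scene carries a set of hotspots, and each hotspot either moves the viewer to another scene, opens an informational pop-up, or links out to a faculty profile page, while scene names and floor numbers drive the thumbnail list and floorplan navigation. 3dvista stores all of this in its own project format; purely as an illustration of the kind of structure the tour encodes, the sketch below models scenes and hotspots in python, with scene names, urls, and file names that are hypothetical rather than taken from the actual tour.

```python
# hypothetical sketch of the scene/hotspot structure a virtual tour encodes.
# this is not 3dvista's project format; names, urls, and image files are placeholders.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Hotspot:
    kind: str                           # "arrow" (move between scenes), "info" (pop-up), or "profile" (question mark)
    label: str
    target_scene: Optional[str] = None  # used by "arrow" hotspots
    url: Optional[str] = None           # used by "profile" hotspots (librarian profile pages)
    popup_text: Optional[str] = None    # used by "info" hotspots

@dataclass
class Scene:
    name: str                           # shown as the scene title and in the thumbnail list
    floor: int                          # drives the floorplan dropdown and red-dot markers
    image_360: str                      # stitched 360-degree photo for this location
    hotspots: List[Hotspot] = field(default_factory=list)

# two illustrative scenes
tour = [
    Scene("campus entrance", floor=1, image_360="campus.jpg",
          hotspots=[Hotspot("arrow", "enter the library", target_scene="reference desk")]),
    Scene("reference desk", floor=1, image_360="reference.jpg",
          hotspots=[
              Hotspot("info", "ways to contact a librarian",
                      popup_text="chat, email, phone, or stop by the desk"),
              Hotspot("profile", "meet the librarian",
                      url="https://library.example.edu/staff/librarian"),
          ]),
]

def navigate(scenes: List[Scene], current: str, choice: str) -> str:
    """follow an arrow hotspot from the current scene, mimicking the tour's click-to-move navigation."""
    scene = next(s for s in scenes if s.name == current)
    for h in scene.hotspots:
        if h.kind == "arrow" and h.label == choice:
            return h.target_scene
    return current  # stay in place if the click was not a navigation hotspot

print(navigate(tour, "campus entrance", "enter the library"))  # -> "reference desk"
```

under this framing, clicking an arrow icon is simply a lookup of the target scene, which is what the small navigate function mimics; the thumbnail list and floorplan red dots are just alternate ways of selecting a scene name.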
methodology: the aim of the virtual tour of the library is to reduce several dimensions of students' library anxiety. the primary goal of the tour is to reduce anxiety related to knowledge of the library by familiarizing students with images and information regarding the building prior to their arrival on campus. another aim is to reduce barriers with staff, which we address by providing information along with images of library faculty. affective barriers and mechanical barriers are two of the most prevalent causes of library anxiety, which the intervention of the tour does not directly address. the hope is, however, that with the minimization of any anxiety stemming from knowledge of the building and barriers with staff, students will be encouraged to consult with librarians, particularly as information on the variety of ways to contact librarians is included on the information pop-up window on the reference desk. pre- and post-surveys: the pre- and post-surveys administered to students included 42 statements from bostick's library anxiety scale. bostick's library anxiety scale, developed in 1992, is a 5-point likert scale survey instrument that contains 43 statements. the pre-survey also contains demographic questions. the one statement omitted from bostick's original survey was number 40, "the change machines are always out of order," as the oru library does not have change machines.50 with the exception of the demographic questions, the post-survey is a duplicate of the pre-survey, with the same 42 statements. although several researchers have adapted bostick's library anxiety scale, such as blundell's adaptation to add "elements related specifically to information technology (both hardware such as computers, and software such as online research databases),"51 for the purposes of this preliminary inquiry, the researcher decided to use the original questions from the library anxiety scale. the original statements were used because reduction of library anxiety stemming from information technology use was not a goal of this study. administration of survey: a link to the pre-survey was posted on the homepage of the oru library. the author sent email invitations containing a link to the pre-survey to students enrolled in the june 2019 summer modular theology classes. the author met with groups of education modular students during the week they were on campus (june 24–30, 2019) to recruit participation. in a library session, another librarian encouraged her modular students to participate in the study. at the end of the pre-survey, a unique number and instructions to note the number were provided to participants to be used to log in to the post-survey. the link to the virtual tour appeared on the final screen of the pre-survey. the link to the post-survey was provided on the same page as the virtual tour, allowing participants to navigate to the post-survey when desired.
the surveys asked for no identifying information; however, the unique number provided on the pre-surveys and entered by the participants on the post-surveys allowed the researcher to link the participants' responses to both surveys. once the results were downloaded, each participant's pre- and post-survey responses were coded p1 through p7 to track any potential effects of the virtual tour on participants' responses. because of the low rate of participation, formal statistical analyses were not applied to these findings. the results were examined in two ways: each participant's pre- and post-survey responses were compared to determine if responses changed from pre- to post-survey, and the total number of responses on each point of the likert scale to each of the 42 statements was examined to determine trends in participants' levels of library anxiety.
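both analyses (the per-participant pre/post comparison and the per-statement tallies of likert responses) are simple to express in code. the sketch below is a minimal illustration only, assuming hypothetical csv exports with one row per participant keyed by the unique number and one column per statement coded 1 through 5; it is not the survey tool's actual export format, and it reports raw changes without classifying them as positive or negative, since that depends on how each statement is worded.

```python
# minimal sketch of the two analyses described above, assuming hypothetical
# exports "pre.csv" and "post.csv" with a participant_id column followed by
# one column per likert statement (responses coded 1-5).
import csv
from collections import Counter

def load(path):
    # read a survey export into {participant_id: row}, where row maps statement -> response
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return {row["participant_id"]: row for row in rows}

pre, post = load("pre.csv"), load("post.csv")
statements = [col for col in next(iter(pre.values())) if col != "participant_id"]

# analysis 1: per-participant change from pre- to post-survey on each statement
for pid in pre:
    if pid not in post:
        continue  # only participants who completed both surveys are compared
    for s in statements:
        before, after = int(pre[pid][s]), int(post[pid][s])
        if before != after:
            print(f"{pid}: {s!r} changed from {before} to {after}")

# analysis 2: per-statement tallies of responses on each likert point, pre vs. post
for s in statements:
    pre_counts = Counter(int(row[s]) for row in pre.values())
    post_counts = Counter(int(row[s]) for row in post.values())
    print(s, dict(sorted(pre_counts.items())), dict(sorted(post_counts.items())))
```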
results: although approximately 100 students enrolled in either the graduate theology or graduate education modular classes visited the campus june 24–30, 2019, participation in this preliminary study was extremely low. to date only seven participants have completed both the pre- and post-surveys. the responses from this formative assessment will be used by the oru library to guide future iterations of the virtual library tour and inclusion in ili. the following discusses initial findings from the pre- and post-surveys. most of the participants reported little or no discomfort or anxiety with using the library. all participants indicated they are us citizens, and all indicated some level of familiarity with the library. four reported they had often visited the library; three responded they had visited the library previously, but not often. of the seven participants, five indicated they are graduate students, one marked "other," and one reported doctoral-student status. ages of the participants varied from one at 20–29, one at 30–39, two at 40–49, and three at 50 years or over. the following describes the effect the virtual tour of the library had on participants' responses. interestingly, one participant showed no change in responses from pre- to post-survey. note: bostick's original categorization of the statements has been retained for all 42 of the statements on both instruments. knowledge of the library: the principal aim of the virtual tour was to reduce library anxiety related to knowledge of the library by acquainting students with "where things are located and how to find their way around in the building."52 bostick categorized 5 of the 42 statements as knowledge of the library. based on participants' responses, there is some indication the tour did help acquaint students with the library. the changes in participants' responses showed a greater positive trend after viewing the virtual tour, although on two statements responses showed a negative trend (table 1). table 1 shows the questions on which participants had a change in their responses from pre- to post-survey. the number in the positive column indicates the number of participants whose responses displayed a favorable change in perception of that statement following the virtual tour. the number in the negative column shows the number of participants whose responses on the post-survey showed a negative effect of the virtual tour.

statement positive negative
i don't feel physically safe in the library. 1 1
i enjoy learning new things about the library. 3 1
i want to learn how to do my own research. 1
the library is a safe place. 2
the library is an important part of my school. 2
totals 9 2
table 1. statements in knowledge of the library category that showed change on post-survey.

the number of strongly disagree responses to statements in this category was unchanged from pre- to post-survey. the only statement that received any responses of strongly disagree was "i don't feel physically safe in the library," with five such responses. taken together with the two responses of disagree to this statement, all the participants feel safe in the library. to the statement "the library is a safe place," all seven participants answered either agree (five on pre-survey, four on post-survey) or strongly agree (two on pre-survey, three on post-survey) (figure 6). curiously, responses to "i enjoy learning new things about the library" changed from no responses of disagree on the pre-survey to one response of disagree on the post-survey. the other shift in the number of responses of disagree was on the statement "the library is an important part of my school" (two on pre-survey, one on post-survey), indicating a slight improvement (figure 5). figure 5. comparison of strongly disagree and disagree responses in knowledge of the library category. to the statements in this category, none of the respondents replied undecided, except to the statement "i enjoy learning new things about the library," with one undecided response on the pre-survey and none on the post-survey. the other change in this category was to the statement "the library is an important part of my school," which moved from no responses of undecided on the pre-survey to one undecided on the post-survey. the respondents, for the most part, wanted to learn to do their own research, with five responses of agree or strongly agree on both the pre- and post-surveys. five of the participants felt the library is of importance (one agree and four strongly agree). six of the seven participants reported they enjoy learning new things about the library. the shift in responses from five agree and one strongly agree on the pre-survey to two agree and four strongly agree indicates the tour might have affected participants' views on this statement (figure 6). figure 6. comparison of strongly agree and agree responses in knowledge of the library category. affective barriers: while not a direct goal of the virtual tour, the responses of participants showed the most gains on the post-survey within the category of affective barriers. this seems to indicate that viewing the virtual tour improved students' self-perceptions of their competence in using the library and library resources. out of the 42 statements on each of the instruments, 12 are in bostick's affective barriers category. the statements in table 2 are those on which participants had a change in their responses from pre- to post-survey. the numbers in the positive column indicate the number of participant responses that improved on the post-survey. a number in the negative column indicates participants' post-survey responses that moved in a negative direction.
statement positive negative
a lot of the university is confusing to me. 2
i am unsure how to begin my research. 2
i can never find things i need in the library. 3
i don't know what resources are available in the library. 2
i don't know what to do next when the book i need is not on the shelf. 1
i feel comfortable using the library. 3
i get confused trying to find my way around the library. 2
i'm embarrassed that i don't know how to use the library. 1 1
the directions for using the computers are not clear.
totals 17 1
table 2. statements in affective barriers category that showed change on post-survey.

looking at the responses to statements in this category reveals some possible effects of the tour and some potential areas of library anxiety. the responses to "i don't know what resources are available in the library" were split on the pre-survey, with three responses of agree, one of strongly agree, one undecided, one of strongly disagree, and two of disagree. the post-survey responses showed almost no change; the only change was one additional response of undecided, with no strongly agree responses (figures 7.1, 8.1, 9.1). these findings indicate more information on what sources are available to patrons may be needed on the virtual tour. most of the respondents indicated confidence about where to begin research. on both pre- and post-surveys, there were five responses of strongly disagree or disagree to the statement "i am unsure how to begin my research" (figure 7.1). most indicated they feel confident in using the library based on the responses to the statements "i'm embarrassed that i don't know how to use the library," "i feel comfortable using the library," "i can never find things in the library," and "i get confused trying to find my way around the library" (figures 7.1, 7.2, 9.1, 9.2). responses were equally positive to the statements "the library won't let me check out as many items as i need," "a lot of the university is confusing to me," "i don't know what to do next when the book i need is not on the shelf," and "i can't find enough space in the library to study" (figures 7.1, 7.2, 8.1, 8.2, 9.1, 9.2). responses were divided on the statement "i feel like i'm bothering the reference librarian if i ask a question" (figures 8.2, 10.2). this finding needs further research to determine what is causing students to feel reluctance to ask the librarian for assistance. figure 7.1. comparison of strongly disagree and disagree responses in affective barriers category. figure 7.2. comparison of strongly disagree and disagree responses in affective barriers category. figure 8.1. comparison of undecided responses in affective barriers category. figure 8.2. comparison of undecided responses in affective barriers category. figure 9.1. comparison of strongly agree and agree responses in affective barriers category. figure 9.2. comparison of strongly agree and agree responses in affective barriers category.
mechanical barriers: although not a goal of the study, there was positive change in participants' responses on both statements in the category of mechanical barriers. it is unclear how the virtual tour might have caused the improvement in participants' perception of the reliability of machines in the library.

statement positive negative
the computer printers are often out of paper. 1
the copy machines are usually out of order. 1
totals 2
table 3. statements in mechanical barriers category that showed change on post-survey.

in this category, on both the pre- and post-surveys, there was one strongly disagree response to both statements. no respondents replied agree or strongly agree to the statements in this category. responses of disagree to both statements increased by one, from one disagree response on the pre-survey to two disagree responses on the post-survey. the number of undecided responses fell from five to four on the post-survey. as noted above, it is not clear what caused the change in responses. barriers with staff: a secondary goal of the tour was to reduce barriers with staff, and thus library anxiety, by providing information with images of library faculty. by providing information and images of the library faculty, this study sought to reduce the anxiety students may have regarding the accessibility and approachability of library staff. in this category, participants showed some positive effects of the virtual tour on how they viewed library staff. however, the responses of participants exhibited the most variability in this category, with almost an equal number of responses being positive or negative after viewing the tour. the reasons for this variance are unclear. in future studies, additional space for comments will be included on the surveys, as well as possible follow-up focus-group discussions, to determine the causes of negative trends in responses. table 4 shows the statements within this category on which participants had a change in their responses from pre- to post-survey. the number in the positive column indicates how many participant responses changed in a favorable direction on the post-survey. the number in the negative column indicates the number of participants whose post-survey responses moved in a negative direction. on the survey instruments, 12 of the 15 statements categorized as bostick's barriers with staff showed changes in responses.

statement positive negative
i can always ask a librarian if i don't know how to work a piece of equipment in the library. 1
i can't get help in the library at the times i need it. 1 1
if i can't find a book on the shelf the library staff will help me. 2 2
library staff don't have time to help me. 1 1
the librarians are unapproachable. 2
the library is a comfortable place to study. 2
the library staff doesn't care about students. 1 3
the library staff doesn't listen to students. 1
the reference librarians are not approachable. 2
the reference librarians are unhelpful. 2
the reference librarians don't have time to help me because they're always busy doing something else. 1 1
there is often no one available in the library to help me. 2 1
totals 15 12
table 4. statements in barriers with staff category that showed change on post-survey.

the findings in this category, overall, were favorable. most feel the librarians and library staff care and are responsive and available to students.
pre-survey responses indicated one or two of the participants felt librarians are unapproachable or unhelpful. post-survey responses reflected a positive change in participants' views on librarians' approachability and helpfulness. participants also reported the library to be a comfortable study location and that the rules are reasonable (figures 10.1, 10.2, 11.1, 11.2, 12.1, 12.2).

figure 10.1. comparison of strongly disagree and disagree responses in barriers with staff category.
figure 10.2. comparison of strongly disagree and disagree responses in barriers with staff category.
figure 11.1. comparison of undecided responses in barriers with staff category.
figure 11.2. comparison of undecided responses in barriers with staff category.
figure 12.1. comparison of strongly agree and agree responses in barriers with staff category.
figure 12.2. comparison of strongly agree and agree responses in barriers with staff category.

comfort with the library
according to collins and veal, comfort with the library is students' perception of the library as a "safe and comforting environment."53 out of the 42 statements, bostick placed 8 within this category, all of which showed some change in responses from pre-survey to post-survey. the changes reflected in this category were largely positive, but it is unclear how the virtual tour might have influenced participants' perceptions of statements such as "there is too much crime in the library" or "good instructions for using the library's computers are available." further investigation is needed to determine what may account for changes in perception on statements such as these. table 5 depicts the changes, both positive and negative, in participants' responses to the statements in this category.

statement | positive | negative
good instructions for using the library's computers are available. | 2 |
i don't understand the library's overdue fines. | 1 | 2
i feel comfortable in the library. | 2 |
i feel safe in the library. | 2 | 1
the library never has the materials i need. | | 1
the people who work at the circulation desk are helpful. | 3 | 1
the reference librarians are unfriendly. | 1 |
there is too much crime in the library. | 1 | 2
totals | 12 | 7

table 5. statements in comfort with the library category that showed change on post-survey.

the following bar graphs compare the responses on the pre-surveys to the post-survey responses within this category. as with other categories, responses were mostly favorable in this category (figures 13, 14, 15).

figure 13. comparison of strongly disagree and disagree responses in comfort with the library category.
figure 14. comparison of undecided responses in comfort with the library category.
figure 15. comparison of strongly agree and agree responses in comfort with the library category.

conclusion
the oru library has found the virtual tour to be of use in familiarizing students with the library.
anecdotal statements from students who viewed the tour during its creation expressed the wish that such a tour had been available when they began college and noted the assistance the tour will provide to new students. a limitation of this study is the low participation, with no participation from students in some of the groups that other studies have shown may have higher levels of library anxiety (e.g., new students, international students). however, given the indications of positive effects of the virtual tour from our study results and anecdotal statements, we are encouraged that this tool will assist our students in reducing library anxiety, with the result that they will visit and use the library more often, to their benefit. again, although participation was low, these results have also encouraged oru librarians to seek other ways to include avmr and other innovative technologies in our instruction, outreach, and services. the 360 virtual tour of the library is undergoing updates and additions to provide students with disabilities information on access points and accessible restrooms. other projects underway include incorporating avmr in il sessions, the addition of a digital sandbox with various technologies and equipment including a vr station, and the addition of vr equipment in our designated faculty research room for use by university faculty to learn and teach students how to use avmr technologies. the response from students and faculty to these new services has been enthusiastic and encouraging, suggesting that the oru library is positively influencing and supporting the academic work of oru faculty and students.

recommended reading
varnum, kenneth j. beyond reality: augmented, virtual, and mixed reality in the library. chicago: ala editions, 2019.
elliott, christine, marie rose, and jolanda-pieta van arnhem. augmented and virtual reality in libraries. lanham, md: rowman & littlefield, 2018.

endnotes
1 anthony j. onwuegbuzie and qun g. jiao, "information search performance and research achievement: an empirical test of the anxiety-expectation mediation model of library anxiety," journal of the american society for information science & technology 55, no. 1 (2004): 41–54, https://doi.org/10.1002/asi.10342; qun g. jiao and anthony j. onwuegbuzie, "is library anxiety important?," library review 48, no. 6 (1999), https://doi.org/10.1108/00242539910283732; qun g. jiao and anthony j. onwuegbuzie, library anxiety: the role of study habits (paper presented at the annual meeting of the mid-south educational research association (msera), bowling green, kentucky, november 15–17, 2000), http://files.eric.ed.gov/fulltext/ed448781.pdf.
2 constance a. mellon, "library anxiety: a grounded theory and its development," college & research libraries 47, no. 2 (1986), https://doi.org/10.5860/crl_47_02_160; see also constance a. mellon, "library anxiety: a grounded theory and its development," college & research libraries 76, no. 3 (2015), https://doi.org/10.5860/crl.76.3.276.
3 diane mizrachi, "library anxiety," encyclopedia of library and information sciences (boca raton, fl: crc press, 2017): 2782.
4 mellon, "library anxiety" (1986): 163; see also mellon, "library anxiety" (2015): 280.
5 mellon, "library anxiety" (1986): 162; see also mellon, "library anxiety" (2015): 278.
6 alison j. head and michael b.
eisenberg, truth be told: how college students evaluate and use information in the digital age: project information literacy progress report (university of washington's information school, 2010): 3.
7 sharon lee bostick, "the development and validation of the library anxiety scale" (phd diss., wayne state university, 1992); qun g. jiao and anthony j. onwuegbuzie, "antecedents of library anxiety," library quarterly 67, no. 4 (1997): 72, https://doi.org/10.1086/629972.
8 jiao and onwuegbuzie, "antecedents of library anxiety."
9 mellon, "library anxiety" (1986); see also mellon, "library anxiety" (2015); see also constance a. mellon, "attitudes: the forgotten dimension in library instruction," library journal 113, no. 14 (1988).
10 kathleen m. t. collins and robin e. veal, "off-campus adult learners' levels of library anxiety as a predictor of attitudes toward the internet," library & information science research 26, no. 1 (2004): 4, https://doi.org/10.1016/j.lisr.2003.11.002.
11 mizrachi, "library anxiety," 2784.
12 anthony j. onwuegbuzie, "writing a research proposal: the role of library anxiety, statistics anxiety, and composition anxiety," library & information science research 19, no. 1 (1997), https://doi.org/10.1016/s0740-8188(97)90003-7.
13 carol collier kuhlthau, "developing a model of the library search process: cognitive and affective aspects," research quarterly 28, no. (winter 1988), https://www.jstor.org/stable/25828262; carol c. kuhlthau, "inside the search process: information seeking from the user's perspective," journal of the american society for information science 42, no. 5 (1991), https://doi.org/10.1002/(sici)1097-4571(199106)42:5<361::aid-asi6>3.0.co;2-%23.
14 shelley blundell, "documenting the information-seeking experience of remedial undergraduate students," proceedings from the document academy 1, no. 1 (2014), https://doi.org/10.35492/docam/1/1/4.
15 blundell, "documenting the information-seeking experience," 5.
16 blundell, "documenting the information-seeking experience," 6.
17 used by permission of the author. retrieved from http://remedialundergraduateaisp.pbworks.com/w/file/88755941/modelrevised%20-%208.4.jpg.
18 blundell, "documenting the information-seeking experience"; melissa gross and don latham, "attaining information literacy: an investigation of the relationship between skill level, self estimates of skill, and library anxiety," library & information science research 29, no. 3 (2007), https://doi.org/10.1016/j.lisr.2007.04.012; melissa gross and don latham, "undergraduate perceptions of information literacy: defining, attaining, and self-assessing skills," college & research libraries 70, no. 4 (2009), https://doi.org/10.5860/0700336; melissa gross and don latham, "experiences with and perceptions of information: a phenomenographic study of first-year college students," library quarterly 81, no. 2 (2011), https://doi.org/10.1086/658867; melissa gross, "the impact of low-level skills on information-seeking behavior: implications of competency theory for research and practice," reference & user services quarterly (2005), https://www.jstor.org/stable/20864481.
19 mellon, "attitudes," 138; jiao and onwuegbuzie, "antecedents of library anxiety."
20 qun g. jiao and anthony j. onwuegbuzie, "perfectionism and library anxiety among graduate students," journal of academic librarianship 24, no. 5 (1998), https://doi.org/10.1016/s0099-1333(98)90073-8; jiao and onwuegbuzie, "is library anxiety important?"; qun g. jiao and anthony j. onwuegbuzie, "library anxiety among international students" (paper presented at the annual meeting of the mid-south educational research association, point clear, alabama, november 17–19, 1999), https://eric.ed.gov/?id=ed437973; qun g. jiao and anthony j. onwuegbuzie, "self-perception and library anxiety: an empirical study," library review 48, no. 3 (1999), https://doi.org/10.1108/00242539910270312; qun g. jiao and anthony j. onwuegbuzie, "identifying library anxiety through students' learning-modality preferences," library quarterly 69, no. 2 (1999), https://doi.org/10.1086/603054; qun g. jiao and anthony j. onwuegbuzie, library anxiety: the role of study habits; qun g. jiao and anthony j. onwuegbuzie, "library anxiety and characteristic strengths and weaknesses of graduate students' study habits," library review 50, no. 2 (2001), https://doi.org/10.1108/00242530110381118; qun g. jiao and anthony j. onwuegbuzie, "dimensions of library anxiety and social interdependence: implications for library services," library review 51, no. 2 (2002), https://doi.org/10.1108/00242530210418837; qun g. jiao and anthony j. onwuegbuzie, the relationship between library anxiety and reading ability (paper presented at the annual meeting of the mid-south educational research association, chattanooga, tennessee, november 6–8, 2002), https://eric.ed.gov/?id=ed478612; qun g. jiao and anthony j. onwuegbuzie, "reading ability as a predictor of library anxiety," library review 52, no. 4 (2003), https://doi.org/10.1108/00242530310470720; anthony j. onwuegbuzie and vicki l. waytowich, "the relationship between citation errors and library anxiety: an empirical study of doctoral students in education," information processing & management 44, no. 2 (2008), https://doi.org/10.1016/j.ipm.2007.05.007; onwuegbuzie, "writing a research proposal"; anthony j. onwuegbuzie and qun g. jiao, "i'll go to the library later: the relationship between academic procrastination and library anxiety," college & research libraries 61, no. 1 (2000), https://doi.org/10.5860/crl.61.1.45; onwuegbuzie and jiao, "information search performance and research achievement"; anthony j. onwuegbuzie, qun g. jiao, and sharon l. bostick, library anxiety: theory, research, and applications, vol. 1 (lanham, maryland: scarecrow press, 2004).
21 jiao and onwuegbuzie, "identifying library anxiety"; qun g. jiao, anthony j. onwuegbuzie, and art a. lichtenstein, "library anxiety: characteristics of 'at-risk' college students," library & information science research 18, no. 2 (1996), https://doi.org/10.1016/s0740-8188(96)90017-1; nahyun kwon, "a mixed-methods investigation of the relationship between critical thinking and library anxiety among undergraduate students in their information search process," college & research libraries 69, no. 2 (2008), https://doi.org/10.5860/crl.69.2.117; mellon, "attitudes."
22 gaby haddow, "academic library use and student retention: a quantitative analysis," library & information science research 35, no. 2 (2013), https://doi.org/10.1016/j.lisr.2012.12.002; adam murray, ashley ireland, and jana hackathorn, "the value of academic libraries: library services as a predictor of student retention," college & research libraries 77, no. 5 (2016), https://doi.org/10.5860/crl.77.5.631; krista m. soria, "factors predicting the importance of libraries and research activities for undergraduates," journal of academic librarianship 39, no. 6 (2013), https://doi.org/10.1016/j.acalib.2013.08.017; krista m. soria, jan fransen, and shane nackerud, "library use and undergraduate student outcomes: new evidence for students' retention and academic success," portal: libraries and the academy 13, no. 2 (2013), https://doi.org/10.1353/pla.2013.0010; krista m. soria, jan fransen, and shane nackerud, "stacks, serials, search engines, and students' success: first-year undergraduate students' library use, academic achievement, and retention," journal of academic librarianship 40, no. 1 (2014), https://doi.org/10.1016/j.acalib.2013.12.002; krista m. soria, jan fransen, and shane nackerud, "beyond books: the extended academic benefits of library use for first-year college students," college & research libraries 78, no. 1 (2017), https://doi.org/10.5860/crl.78.1.8.
23 jiao, onwuegbuzie, and lichtenstein, "library anxiety," 1.
24 jiao and onwuegbuzie, "identifying library anxiety"; see also bostick, "the development and validation"; barbara fister, julie gilbert, and amy ray fry, "aggregated interdisciplinary databases and the needs of undergraduate researchers," portal: libraries and the academy 8, no. 3 (2008), https://doi.org/10.1353/pla.0.0003; mellon, "library anxiety"; jiao and onwuegbuzie, "perfectionism and library anxiety among graduate students"; jiao and onwuegbuzie, "is library anxiety important?"; jiao and onwuegbuzie, "library anxiety among international students"; jiao and onwuegbuzie, "self-perception and library anxiety: an empirical study"; jiao and onwuegbuzie, "identifying library anxiety through students' learning-modality preferences"; jiao and onwuegbuzie, library anxiety: the role of study habits; jiao and onwuegbuzie, "library anxiety and characteristic strengths and weaknesses of graduate students' study habits"; jiao and onwuegbuzie, "dimensions of library anxiety and social interdependence"; jiao and onwuegbuzie, the relationship between library anxiety and reading ability; jiao and onwuegbuzie, "reading ability as a predictor of library anxiety"; onwuegbuzie and waytowich, "the relationship between citation errors and library anxiety"; onwuegbuzie, "writing a research proposal"; onwuegbuzie and jiao, "i'll go to the library later"; onwuegbuzie and jiao, "information search performance and research achievement"; onwuegbuzie, jiao, and bostick, library anxiety: theory, research, and applications.
25 onwuegbuzie and jiao, "the relationship"; anthony onwuegbuzie and qun g. jiao, "understanding library-anxious graduate students," library review 47, no. 4 (1998), https://doi.org/10.1108/00242539810212812.
26 jiao and onwuegbuzie, "is library anxiety important?"
27 qun g. jiao, anthony j. onwuegbuzie, and sharon l. bostick, "racial differences in library anxiety among graduate students," library review 53, no. 4 (2004), https://doi.org/10.1108/00242530410531857; qun g. jiao, anthony j. onwuegbuzie, and sharon l. bostick, "the relationship between race and library anxiety among graduate students: a replication study," information processing & management 42, no. 3 (2006), https://doi.org/10.1016/j.ipm.2005.03.018.
28 mizrachi, "library anxiety," 2784.
29 anthony j. onwuegbuzie and qun g. jiao, "academic library useage: a comparison of native and non-native english-speaking students," australian library journal 46, no. 3 (1997): 263, https://doi.org/10.1080/00049670.1997.10755807; jiao and onwuegbuzie, "antecedents of library anxiety."
30 jiao and onwuegbuzie, "library anxiety among international students."
31 yunhui lu and denice adkins, "library anxiety among international graduate students," proceedings of the american society for information science and technology 49, no. 1 (2012), https://doi.org/10.1002/meet.14504901319.
32 collins and veal, "off-campus adult."
33 nahyun kwon, anthony j. onwuegbuzie, and linda alexander, "critical thinking disposition and library anxiety: affective domains on the space of information seeking and use in academic libraries," college & research libraries 68, no. 3 (2007): 276, https://doi.org/10.5860/crl.68.3.268.
34 kwon, "a mixed-methods investigation."
35 judy carol bell, "student affect regarding library-based and web-based research before and after an information literacy course," journal of librarianship & information science 43, no. 2 (2011), https://doi.org/10.1177/0961000610383634.
36 jessica platt and tyson l. platt, "library anxiety among undergraduates enrolled in a research methods in psychology course," behavioral & social sciences librarian 32, no. 4 (2013): 248, https://doi.org/10.1080/01639269.2013.841464.
37 rachel a. fleming-may, regina mays, and rachel radom, "'i never had to use the library in high school': a library instruction program for at-risk students," portal: libraries and the academy 15, no. 3 (2015), https://doi.org/10.1353/pla.2015.0038.
38 catherine pellegrino, "does telling them to ask for help work?," reference & user services quarterly 51, no. 3 (2012), https://doi.org/10.5860/rusq.51n3.272.
39 kathy christie anders, stephanie j. graves, and elizabeth german, "using student volunteers in library orientations," practical academic librarianship: the international journal of the sla 6, no. 2 (2016): 17–30, http://hdl.handle.net/1969.1/166249.
40 pamela n. martin and lezlie park, "reference desk consultation assignment: an exploratory study of students' perceptions of reference service," reference & user services quarterly 49, no. 4 (2010), https://doi.org/10.5860/rusq.49n4.333.
41 sarah mcdaniel, "library roles in advancing graduate peer-tutor agency and integrated academic literacies," reference services review 46, no. 2 (2018), https://doi.org/10.1108/rsr-02-2018-0017.
42 elaine m. robbins, "breaking the ice: using non-traditional methods of student involvement to effect [sic] a welcoming college library environment," southeastern librarian 62, no. 1 (2014), https://digitalcommons.kennesaw.edu/seln/vol62/iss1/5.
43 elizabeth diprince et al., "don't panic!," reference & user services quarterly 55, no. 4 (2016), https://doi.org/10.5860/rusq.55n4.283.
44 oral roberts university, "about oru" (2019), https://www.oru.edu/admissions/undergraduate/.
45 oral roberts, our partnership with god [sound recording], eighth world outreach, oral roberts evangelistic association (tulsa, ok: abundant life recordings, 1962).
46 oral roberts, our partnership.
47 margaret m. grubiak, "an architecture for the electronic church: oral roberts university in tulsa, oklahoma," technology and culture 57, no. 2 (2016), https://doi.org/10.1353/tech.2016.0066.
48 stephanie hill, "oru receives innovation award," press release, may 2, 2017, http://www.oru.edu/news/oru_news/20170502-glc-innovation-award.php?locale=en.
49 hill, "oru receives."
50 bostick, "the development and validation," 160.
51 blundell, "documenting the information-seeking experience," 263.
52 mizrachi, "library anxiety," 2784.
53 collins and veal, "off-campus adult," 7.
facilitating research consultations using cloud services: experiences, preferences, and best practices
rebecca zuege kuglitsch, natalia tingle, and alexander watkins
information technology and libraries | march 2017 | https://doi.org/10.6017/ital.v36i1.8923

rebecca zuege kuglitsch (rebecca.kuglitsch@colorado.edu) is head, gemmill library of engineering, mathematics & physics, university of colorado boulder. natalia tingle (natalia.tingle@colorado.edu) is business collections & reference librarian, university of colorado boulder. alexander watkins (alexander.watkins@colorado.edu) is art & architecture librarian, university of colorado boulder.

abstract
the increasing complexity of the information ecosystem means that research consultations are increasingly important to meeting library users' needs. yet librarians struggle to balance escalating demands on their time. how can we embrace this expanded role and maintain accessibility to users while balancing competing demands on our time? one tool that allows us to better navigate this balance is google appointment calendar, part of google apps for education. it makes it easier than ever for students to book a consultation with a librarian, while at the same time allowing the librarian to better control their schedule. our experience suggests that both students and librarians felt it was a useful, efficient system.

introduction
the growing complexity of the information ecosystem means that research consultations are increasingly important to meeting library users' needs. although reference interactions in academic libraries have declined overall, in-depth research consultations have not followed that trend.1 these research consultations represent an increasingly large proportion of academic librarians' reference interactions, and offer important opportunities to follow up on information literacy instruction, support student academic success, and relieve library anxiety. the library literature has demonstrated a need for and appreciation of these services.2 moreover, students value face-to-face consultations because they provide an opportunity to talk through complex problems and questions while providing affective benefits such as relationship building and reassurance.3 it is evident that students seek out and value these services. but even as these services become increasingly important, librarians struggle to balance escalating demands on their time. how can we embrace this expanded role and maintain accessibility to users while managing competing priorities? we found little guidance in the literature to identify the most efficient technological tools to offer these services to undergraduates, so we began to explore options. one tool that allows us to better navigate this shifting landscape is google appointment calendar, part of google apps for education. it makes it easier for students to book a consultation with a librarian, while at the same time allowing the librarian to better control their schedule;
consequently, it is being adopted by many librarians at the university of colorado boulder. there are several other options available for librarians interested in calendar applications, such as youcanbook.me.4 however, on campuses using google apps for education, it may be easier to use a tool students are already familiar with and commonly use as part of their daily academic routines. moreover, the integration with apps for education solves some of the problems hess noted in the public version of google calendar appointments (which is also no longer available), such as appointments booked without identifying information and the extra step of logging in just for an appointment.5 because students are often already logged in due to using google apps for word processing, group work, and more, there is no extra step to log in for a simple appointment. our exploration of this tool suggests that it is helpful to librarians, but that it can be of benefit to students, too. research has proposed that students may hesitate to ask questions due to library anxiety. would scheduling an appointment using a calendaring system be less intimidating than emailing a librarian directly, for example? we set out to apply this technology in an environment of changing student preferences and expectations, explore how students received it, and establish effective practices for using it in an academic setting. since we are liaisons to science, social science, and humanities subject areas, we were able to work with a wide spread of undergraduate students in our exploration to see what might be most effective for us, and also for students from a variety of backgrounds.

why google calendar
we selected appointment booking via google calendar because of its ease of use and because the university of colorado boulder has google apps for education. this means that every student will have a google id and the option of using google calendar as part of their normal routine. in december 2012, google discontinued appointment calendars for general users and limited claimable appointment slots to google apps for education. for institutions that do not subscribe, it may be worth investigating third-party google calendar apps, some of which are free or freemium, such as calendly (https://calendly.com/), or springshare's similar subscription service, libcal (https://www.springshare.com/libcal/).

setting up google calendar
one of the benefits of google calendar is its ease of use. starting to set up the calendar for appointment slots is as simple as creating a new google calendar event and selecting appointment slots as the type of event. next, you can give your appointment slots a name that corresponds with the language your institution uses for research consultations and schedule them for the desired length of time. it is possible to schedule blocks of appointments that google will automatically break into shorter appointments of predetermined amounts of time. the authors created appointments lasting 30 minutes, 60 minutes, or a mix of both, depending on the expectations of our disciplines. it is also possible to create several simultaneous appointment slots, if you would like to accommodate small groups. as well as indicating time, each appointment also has a space to indicate location, particularly useful for librarians who might work in several branches or combine office hours in academic buildings with in-library office consultations. once the events are named and saved, the calendar can be shared.
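the appointment slots themselves are created through the calendar interface as just described; no code is required. that said, a librarian who wants to pre-populate many weeks of slots at once could approximate the same data entry with the google calendar api by generating individual bookable events. the following is a minimal sketch of that idea rather than part of the workflow described in this article: it assumes the google-api-python-client and google-auth libraries, oauth credentials already saved in token.json from a prior authorization flow, and placeholder values for the calendar id, location, dates, and time zone.

```python
# a minimal sketch, not the authors' workflow: pre-populate a week of 30-minute
# "available consultation" events with the google calendar api. assumes
# google-api-python-client and google-auth are installed and that token.json
# holds oauth credentials; calendar id, location, dates, and time zone are
# placeholder values.
from datetime import datetime, timedelta

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

creds = Credentials.from_authorized_user_file("token.json")
service = build("calendar", "v3", credentials=creds)

slot_length = timedelta(minutes=30)
first_slot = datetime(2017, 3, 6, 13, 0)   # monday, 1:00 p.m. local time

for day in range(5):                        # monday through friday
    slot_start = first_slot + timedelta(days=day)
    for _ in range(4):                      # four consecutive slots each afternoon
        event = {
            "summary": "available consultation",
            "location": "norlin library, room e206",   # hypothetical location
            "start": {"dateTime": slot_start.isoformat(),
                      "timeZone": "America/Denver"},
            "end": {"dateTime": (slot_start + slot_length).isoformat(),
                    "timeZone": "America/Denver"},
        }
        service.events().insert(calendarId="primary", body=event).execute()
        slot_start += slot_length
```

because these are ordinary events rather than true appointment slots, the claiming behavior described below (a slot disappearing once a student books it) still comes from the appointment-slot feature itself; the sketch only saves repetitive data entry.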
figure 1. create a new event, selecting 'appointment slots'.

appointment calendars are given a unique shareable url to direct users to available appointments; however, these urls are necessarily long and complicated, so we recommend using a link shortener. to obtain the very long url for an appointment calendar, click on 'edit details' in an appointment event. from there, it is possible to copy the link and use a link shortener to make a brief, understandable link.

figure 2. obtain the shareable link.

when a student uses the link to make an appointment, both the librarian and the student receive an email with the student's login name, email, appointment time, and other details. the slot immediately appears as taken on the calendar, so it is no longer available for other students, reducing confusion and double booking. receiving the student's email allows the librarian to initiate the reference interview and establish expectations.

figure 3. google calendar showing a variety of available appointments.

student impressions
we received positive feedback about the appointment calendars from students. students commented:
● "i like the ability to see all of the possible openings."
● "i already bookmarked that bit.ly, so you'll probably hear from me" (which we did, shortly thereafter).
● "i like to be able to 'schedule' a consultation, not request one. it seems more useful and immediate."
we kept track of how many students who made calendar appointments over two semesters kept them, and sent a short, informal survey to students who made appointments. no students who made a calendar appointment failed to attend their consultation. though our survey does not permit large-scale generalizations due to a very low response rate (4) and a small sample size (15), all of the students who responded and used the calendar found the experience of booking an appointment that way to be easy, convenient, and unintimidating. everyone who used the calendar indicated that they would prefer to use it again, and about half of the respondents who set up their appointments via email told us that they would prefer to book a consultation through an appointment calendar in the future. our anecdotal evidence in succeeding semesters aligns with this perception. we found that using appointment calendars can have many benefits for students:
● they can reduce student anxiety from having to compose and send an email.
● booking appointments can take less of their time. they book immediately without back-and-forth emailing. this also means there's no time to rethink the appointment and either never send the email or back out later.
● the appointment is placed on their calendar, meaning they automatically have a built-in reminder and don't need to search through their email to find the date and time of their appointment.
● since the appointment calendars eliminate back-and-forth scheduling and reduce email fatigue, students may be more willing to use email to discuss their topic and/or question with the librarian.

librarian impressions
our experience has been equally positive. we found that using the calendars radically streamlines the typical back-and-forth email exchanges for setting appointments. we emailed each student to confirm the appointment, but this single email is still a significant reduction of claim on the librarian's attention, from a minimum of three emails to schedule an appointment (which often realistically becomes five or more when negotiating a time) to two. additionally, librarians can put appointment slots in between meetings and other times when they might only have a spare hour, which are often too tedious to list when emailing. using appointment calendars lets librarians efficiently use their time even when it is fragmented. as well as facilitating efficient use of small amounts of time, appointment calendars also allow librarians to gently create boundaries. rather than having to deny appointments requested for late nights or weekends, students are guided to viable times. while the use of google calendar is entirely voluntary at the university of colorado boulder, we presented the tool at several reference librarian meetings with success, and several other librarians have happily adopted the tool. one librarian who adopted the tool said: "sending a student a calendar that they can use to request a meeting eliminates the twelve messages back and forth on when to schedule a meeting. i also like that it puts the meeting on both our calendars, reducing the number of no-shows."

best practices
our experiences and verbal feedback from students and librarians provided a foundation to develop best practices to minimize both librarian and student confusion. for students, confusion often centered around accessing the calendar, identifying which time slots were available, and identifying acceptable locations for appointments. the following best practices can help solve these difficulties.

use a link shortener and a consistent naming convention so the links are similar for multiple librarians. using a link shortener makes it easy for students to jot down the calendar url, either to manually enter into a browser later or to quickly get to the link and bookmark it. this makes it easy for students to file the link and return to it at point of need. using a consistent naming
using a term like “available consultation” made it clear to students that the appointments were not already booked. google calendar automatically makes booked appointments unavailable, eliminating the opposite frustration. carefully consider the location in the bookable appointment form. google calendar allows librarians to enter or leave empty the location. if the field is left empty, users can specify a location, and students often filled in a location when none was indicated. if a librarian is not mobile, or is available in certain places only at certain times, it is key to identify a location. for example, in our study, one librarian held weekly office hours in two academic buildings; it was particularly important to identify which times the librarian was available in the library versus the academic buildings. on the other hand, it may also make sense not to designate a location. another of the authors, serving a population that used the main library, one branch library, and research area of the campus with no onsite library services, chose not to enter any location in order to accommodate the extremely dispersed population. users frequently indicated in which location they would be willing to meet, an option the librarian wanted to support in order to underscore the availability of services wherever users were located on campus. schedule two weeks of availability. we found that students could almost always find a time that worked for them with two weeks of available appointments. moreover, other than recurring office hours, it was difficult for librarians to predict their schedule further into the future than a few weeks. librarian concerns centered around keeping calendars synchronized, providing enough lead time for users to book appointments, and publicizing the service. we found several best practices that eased these concerns. designate a day each week to update hours and clear conflicts on the calendar. if google calendar is not the primary calendaring software for the library, it can be challenging to synchronize calendars. google calendar sends a calendar invitation to the librarian when an appointment is claimed, which they can accept on their primary calendaring system, but conflicts that arise on the primary calendaring system are not automatically sent to google calendar. by selecting a day and habitually updating the google calendar and quickly checking for conflicts that have arisen with unclaimed slots, librarians can avoid forgetting to add slots or remove those that conflict with other late-arising obligations. advertise the link on the library web site, give out the calendar link during class sessions and give it to professors to embed in course management systems. while appointment calendars still information technology and libraries | march 2017 35 benefit librarian workflows without advertising, students need easy access to the calendar. for maximum user uptake, it is important to put the calendar link anywhere a librarian’s contact information can be found. we found it helpful to promote the link in classes, and that it was particularly effective when professors agreed to place the link in the class web site. this positions library research assistance next to assignments when they are given out and drafts when they are returned--hopefully reminding students that the library is available for assistance at moments in which they are most likely to seek it. 
reflections and conclusions
our experiences support the idea that online appointment calendars are appreciated by students, streamline work for librarians, and are easily adopted by both parties. more use of this technology, whether via google apps for education or another service, can be mutually beneficial to librarians and students. students using the calendar indicated that it was not more intimidating than emailing a librarian, and by removing the waiting period for a response, a calendar can prevent student distraction or students persuading themselves that they actually do not need help in the interim. by providing a calendar where students can quickly and simply book an appointment with a librarian for research assistance, librarians can support students seeking assistance, and thus ultimately bolster student success and increase the library's relevance.

references
1. naomi lederer and louise mort feldmann, "interactions: a study of office reference statistics," evidence based library and information practice 7, no. 2 (2012): 5–19.
2. ramirose attebury, nancy sprague, and nancy j. young, "a decade of personalized research assistance," reference services review 37, no. 2 (2009): 207–20, https://doi.org/10.1108/00907320910957233; trina j. magi and patricia e. mardeusz, "what students need from reference librarians: exploring the complexity of the individual consultation," college & research libraries news 74, no. 6 (2013): 288–91.
3. trina j. magi and patricia e. mardeusz, "why some students continue to value individual, face-to-face research consultations in a technology-rich world," college & research libraries 74, no. 6 (november 1, 2013): 605–18, https://doi.org/10.5860/crl12-363.
4. amanda nichols hess, "scheduling research consultations with youcanbook.me: low effort, high yield," college & research libraries news 75, no. 9 (october 1, 2014): 510–13.
5. hess, "scheduling research consultations with youcanbook.me: low effort, high yield," 511.

beyond information architecture: a systems integration approach to web-site design
krisellen maloney and paul j. bracke
information technology and libraries | december 2004 (vol. 23, no. 4)

krisellen maloney (maloneyk@u.library.arizona.edu) is director of technology at the university of arizona libraries, tucson. paul j. bracke (paul@ahsl.arizona.edu) is head of systems and networking at the arizona health sciences library, tucson.

users' needs and expectations regarding access to information have fundamentally changed, creating a disconnect between how users expect to use a library web site and how the site was designed. at the same time, library technical infrastructures include legacy systems that were not designed for the web environment. the authors propose a framework that combines elements of information architecture with approaches to incremental system design and implementation. the framework allows for the development of a web site that is responsive to changing user needs, while recognizing the need for libraries to adopt a cost-effective approach to implementation and maintenance.

the web has become the primary mode of information seeking and access for users of academic libraries. the rapid acceptance of web technologies is due, in part, to the ubiquity of the web browser, which presents a user interface that is recognized and understood by a broad range of users.
as libraries increase the amount of content and broaden the range of services available through their web sites, it is becoming evident that it will take more than a well-designed user interface to completely support users' information-seeking and access needs. the underlying technical infrastructure of the web site must also be organized to logically support the users' tasks. library technical infrastructures, largely designed to support traditional library processes, are being adapted to provide web access. as part of this adaptation process, they are not necessarily being reorganized to meet the changing expectations of web-savvy users, particularly younger users who are not familiar with traditional library organization methods such as the card catalog, print indexes, or other legacy tools. libraries must harness the power of the highly structured information systems that have long been a part of libraries and integrate these systems in new ways to support users' goals and objectives. part of this challenge will be answered by the development of new systems and technical standards, but these are only a partial solution to the problem. an important part of making library systems and web sites function as powerful discovery tools is to modernize the systems that provide existing services and content to support the changing needs and expectations of the user. emerging concepts of information architecture (ia) describe the system requirements from the user perspective but do not provide a mechanism to conceptually integrate existing functions and content, or to inform the requirements necessary to modernize and integrate the current system architecture. the authors propose a framework for approaching a comprehensive web-site implementation that combines components of ia and system modernization that have been successful in other industries. within this framework, those components are tailored for the unique aspects of information provision that characterize a library. the proposed framework expands the concept of ia to include functional and content requirements for the web site. this expansion identifies points within the conceptual and physical design where user requirements are constrained by the existing infrastructure. identification of these constraints begins an iterative design process in which some user requirements inform changes to the underlying system architecture. conversely, when the required changes to the underlying system architecture cannot be achieved, the constraints inform the conceptual design of the web site. the iterative nature of this approach acknowledges the usefulness of much of the existing infrastructure but provides an incremental approach to modernizing installed systems. this framework describes aspects of the conceptual- and physical-design elements that must be considered together and balanced to produce a web site that supports the goals and objectives of the user but is cost-effective and practical to implement.

information architecture and the problem of libraries
ia is both a characteristic of a web site and an emerging discipline. a number of authors have attempted to develop a formal definition of ia. wodtke presents a simple task-based definition, stating that an information architect "creates a blueprint for how to organize the web site so that it will meet all (business, end user) these needs."1 rosenfeld and morville present a four-part definition in which two parts focus on the practice, and two parts define ia as characteristic.
the first characteristic defines ia as a combination of "organization, labeling, and navigation schemes" while the second describes it as "the structural design of an information space to facilitate task completion and intuitive access to content."2 there is general agreement that ia provides a specification of the web site from the perspective of the user. the specification usually describes the organization, navigational elements, and labeling required to completely structure a user's web-site experience. ia is not synonymous with web-site design, but rather provides the conceptual foundation upon which a presentation design is based. web-site design adds presentation and graphical elements to ia to create the user experience.

library web sites provide a display platform by which library content and services can be accessed through a common user interface. most of the tools and services have been available for decades and, in response to user demand, are increasingly being made web-accessible in digital formats (virtual reference, full-text databases). despite this new access medium and format, the conceptual design of the underlying systems has not changed much. the library technical infrastructure is made up of many loosely coupled systems optimized to perform a single function or to support the work of a library department. library web sites do not present a sufficiently unified interface design or level of technical integration to match current users' mental models of information seeking and access.3 the systems have not been integrated to support users' overarching goals or meet the expectation of seamless access that they have developed when using other web sites (such as google or amazon). in many cases, users are still expected to understand aspects of the library that are now obsolete (card catalogs) in order to navigate the library's web site. for example, the process of finding a journal article using a typical library web site is based on a print paradigm and has changed little despite the advent of online discovery tools. in a print environment, users first looked at an index to identify an article of interest, then wrote down the citation, went to the card catalog, and there looked up the journal containing the article. if the library owned the journal, the user would then write down the call number and go to the shelves to find the article. this process has not necessarily changed much for many libraries, even though indexes, card catalogs, and journals are often available online. even more confusing is that the end result of some search processes within a library web site is not necessarily content, but a metadata representation of content that must be entered into another search box. although the first search is representative of the search of a traditional index and the second search is representative of the search of the card catalog, many of our users have no mental model for this multistep search process.
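the multistep lookup just described is what link resolvers were later built to automate: the citation identified in the index is encoded in a url and handed to a resolver that checks local holdings. as a rough illustration only (the resolver address and the sample citation below are invented, and the rft.* key names follow the openurl 1.0 scholarly journal format), such a link might be generated like this:

```python
# an illustrative sketch only: building an openurl 1.0 (kev) link for a journal
# article citation. the resolver base url and the sample citation are invented;
# the key names follow the standard journal metadata format.
from urllib.parse import urlencode

citation = {
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.genre": "article",
    "rft.atitle": "an example article title",
    "rft.jtitle": "an example journal",
    "rft.volume": "23",
    "rft.issue": "4",
    "rft.spage": "145",
    "rft.issn": "1234-5678",                # made-up issn for illustration
}

resolver_base = "https://resolver.example.edu/openurl"   # hypothetical resolver
print(resolver_base + "?" + urlencode(citation))
```

the point of the illustration is that the citation metadata travels with the request, so the user no longer has to re-key the journal title into the catalog by hand.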
users accustomed to the simple keyword search available through internet search engines may have great difficulty in understanding the need for the many steps involved in library use. there is an expectation that search systems and online content will be linked, regardless of the economic, legal, and technical factors that make these links difficult. while linking options in vendor databases and openurl resolvers have begun to simplify the electronic version of the process by automating some of the steps, the multistep process is still valid in many instances in most libraries. it is clear that library web sites must undergo a fundamental change in order to be responsive to the needs of the user. because library web sites appear to be similar to conventional web sites, it is tempting to adopt a general approach to ia to address users' needs. there are, however, several areas in which the general approach to ia does not adequately support the design needs for library web sites.

generalized ia approaches, such as those provided by rosenfeld and morville, do not provide adequate guidance regarding the organization and display of content from external sources. there is an unstated assumption that external sources will provide information in the format specified by the web-site architect. ia approaches suggest methods to completely describe the user experience, from the time a user first accesses a site to the point at which a user task is complete, regardless of the origin of the content or service accessed. for example, the content from each of amazon.com's commercial partners is packaged to operate like a part of the amazon.com site. in contrast, libraries often only have control of a user's experience up to the point at which they leave a library's servers. libraries guide users not only to local services and digitized collections, but to databases, journals, and more that are licensed from external sources and whose appearance is controlled by those sources. even when using a technical standard such as z39.50 to provide a local look and feel to remote resources, libraries do not necessarily have full control over the data format or elements of the content that is returned. this lack of local control over content is a limitation to libraries adopting common definitions of ia.

another design area that is not well supported by generalized approaches to ia is the integration of previously installed systems, such as library catalogs. these legacy systems provide important services that represent decades of development and collaboration, and are essential to the future of libraries. for example, libraries provide access to unique resources and systems ranging from online catalogs to abstracting and indexing databases to interlibrary loan (ill) networks. libraries are using web technologies to provide new access methods to library content and services. these technologies provide a thin veneer on systems that function in a manner unfamiliar to many users. the challenge then becomes to change what lies beneath the surface, the underlying functionality of the site, to support the needs of the user. using a generalized approach to ia, as applied in other settings, libraries would assess the needs of the user and develop a new, complete system that supports those needs. such an approach ignores the extensive, existing infrastructure of legacy systems in libraries that is still useful and that serves purposes beyond the user's web interface. what is
what is 146 information technology and libraries i december 2004 reproduced with permission of the copyright owner. further reproduction prohibited without permission. needed is a standard reference model for library services that provides a framework for access to services and content. this is a long-term goal that requires cooperation and agreement among libraries, and that would allow legacy systems to be repackaged in ways that are more flexible, meet changing user needs, and can be integrated into changing technology environments. because there are currently no such reference models, librarians need to develop other approaches to integrate existing legacy systems into a modernized web site. i extending the ia framework in this paper, the general definition of ia that has been proposed by several authors has been extended to incorporate the additional constraints that characterize library web sites.4 extended information architecture (eia) is the first half of the framework, and provides a complete conceptual design of the web site from the users' perspective. figure 1 depicts the elements and relationships within eia. the coordinating structure provides an overarching framework for the integration of the multiple service elements that provide much of the underlying functionality of the web site. the relationship between the coordinating structure and the service elements is iterative, with service elements constraining the coordinating structure and the coordinating structure informing the design of the service elements. the coordinating structure the coordinating structure contains many of the design elements that are found in generalized approaches to ia, including the organization, navigational structure, and labeling. these are the elements of a web site that, in concert, define the structure of the user interface without specifying the functionality and content underlying that interface. the framework emphasizes aspects of the generalized approaches that are most relevant to libraries and places them in relation to the service elements that specify the content and functionality of the site. the first element of the coordinating structure is the organization of the web site. organization refers to the logical groupings of the content and services that are available to the user. these groupings are not necessarily representative of physical-system implementations, but may be taskor subject-based instead. for example, many academic library web sites have primary groupings that include information resources, services, and user guides. although the information resources may include information from a range of systems (for instance, the catalog, abstracting and indexing databases, full-text databases, locally-developed exhibits), the logical grouping of information resources unifies the concept for the user. a site's organization scheme will often serve as the foundation for the primary navigational choices on a site's main menu or primary navigational bar. another component of the coordinating structure is the navigational structure of the site. navigational structures define the relationships between content and service elements of a site, and between groupings in the site's organization. these structures also include search tools and other link-management tools that help users locate needed content and services. there are usually two types of relationships that form a navigational structure. 
first is the definition of a global relationship scheme that outlines the primary navigational structure of the site. these schemes often define relationships between sections of a site's organization, but may also provide access to key pieces of functionality from any point within a site. in addition to the overarching global relationship scheme, there are often several locally or functionally defined relationship schemes that are used throughout the site. these local relationship schemes are usually located within a service or content grouping and provide logical connections within their defined grouping. both sets of relationships are designed to support a task and provide pathways for the user to move among the various elements of the site. other relationship schemes may be topic oriented, allowing the user to move easily among similar content sources. these logical relationships are later implemented within a user interface as tools such as menus, navigation bars, and navigation tabs when combined with labels and a visual design. customization and personalization are navigational structures that have gained a fair amount of attention in the library literature. both strategies allow a web site to be displayed differently, based on user characteristics. customization allows the user to create the relationships most suitable for his or her needs. this strategy has been explored by a number of libraries, although there is little convincing evidence that users implement such strategies in an intense or repeated manner.5 personalization allows a system designer to bring together a set of pages in a relationship that is meaningful for a user or a user group.

labels, the third element of the coordinating structure, provide signposts that communicate an integrated view of a web site's design to those who use it. it is important to define a labeling system that consistently and clearly communicates the meaning of the site to the user. accordingly, the labels should be constructed in the user's language, not the librarian's. for example, a user may not understand that an abstracting and indexing database will provide them with information regarding journal articles that are relevant to a topic of interest. in that case, the label "find an article" is more useful than "indexes."
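the "find an article" versus "indexes" example amounts to maintaining a mapping from system-oriented vocabulary to labels written in the user's language. the fragment below is a minimal sketch of that idea in python; the label pairs and the function name are invented for illustration and would, in practice, come from user testing rather than from this list.

# librarian or system vocabulary mapped to labels in the user's language.
# the pairs below are illustrative assumptions, not a recommended vocabulary.
user_labels = {
    "indexes": "find an article",
    "opac": "search the catalog",
    "ill": "request an item we don't own",
    "e-reserves": "course readings",
}

def label_for(system_term: str) -> str:
    """return the user-facing label, falling back to the raw term if unmapped."""
    return user_labels.get(system_term.lower(), system_term)

# rendering a navigation menu with consistent, user-language labels
for term in ["opac", "indexes", "ill"]:
    print(label_for(term))

keeping such a mapping in one place is one way to enforce the consistency of labels across menus, headings, and navigation bars.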
figure 1. an extended information architecture for developing a conceptual design of library web sites. (the figure pairs the coordinating structure, comprising organization, navigational structure, and labeling, with the service elements, comprising functional requirements, content requirements, content specifications, and functional specifications.)

labels are used to describe individual service or content units, but may also be used as headings to provide structural elements that augment the navigational scheme. the consistent use of labels as headings within the site not only increases user understanding of the site, but may also be explicitly constructed to support user tasks. an example of labeling to support tasks can be seen on the university libraries web site of the university of louisville, where, under the main heading for articles, the first subheading is "step 1: search article databases" and the second subheading is "step 2: search (the catalog) by journal title."

service elements

service elements are the second major component of extended information architecture, and represent the content and functionality of the web site. in this framework, the service elements serve a dual purpose: their definition involves specifying both the ideal requirements for functionality and content and the specifications of what is currently available. the definition process can then be used to identify points in the web site where new functions and content need to be added, or where existing functionality must be modernized. these additions and modifications may be achievable immediately, but in many cases an incremental plan for change may need to be developed. the service-element requirements, labeled as functional requirements and content requirements in figure 1, express the users' needs and expectations for the functional or content elements of the web site. the purpose of the requirements definitions is to describe the service elements that are necessary to allow a user to meet his or her goals or objectives in using the site. these requirements are a representation of the ideal composition of a web site, and inform not only the immediate implementation of the site but also the development of future systems and the modernization of existing systems. it is also important to note that the requirements should be developed to express user needs, not a particular implementation option. for example, it might be tempting to specify the implementation of a particular vendor's openurl resolver. this does not, however, describe how the system would function ideally from a user perspective. instead, an appropriate requirement would be that users should be able to link to full text from all citations in an abstracting and indexing database.
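as a concrete illustration of how such a full-text-linking requirement might eventually be exercised, the sketch below builds an openurl-style query for a citation. this is a hedged example only: the resolver base url, the citation dictionary, and the function name are assumptions made for the sketch, and a production resolver would accept a richer set of openurl fields.

from urllib.parse import urlencode

# hypothetical base url of the library's openurl link resolver
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def full_text_link(citation: dict) -> str:
    """build an openurl-style query string from a simple citation record.

    the keys follow common openurl conventions (genre, issn, date, volume,
    issue, spage, atitle); the citation dict itself is an assumption made
    for this illustration.
    """
    params = {
        "genre": "article",
        "issn": citation.get("issn", ""),
        "date": citation.get("year", ""),
        "volume": citation.get("volume", ""),
        "issue": citation.get("issue", ""),
        "spage": citation.get("start_page", ""),
        "atitle": citation.get("article_title", ""),
    }
    # drop empty fields so the resolver only sees what the database supplied
    params = {key: value for key, value in params.items() if value}
    return f"{RESOLVER_BASE}?{urlencode(params)}"

# example: a citation as it might be exported from an abstracting and indexing database
print(full_text_link({
    "issn": "1234-5678",
    "year": "2004",
    "volume": "23",
    "issue": "4",
    "start_page": "146",
    "article_title": "an example article title",
}))

the requirement itself remains implementation-neutral; whichever resolver is chosen simply has to accept a link of this general shape from every citation the user encounters.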
content requirements describe, more specifically, the content that is necessary to meet the users' goals and objectives. access to content is often the primary emphasis of a library web site, and the content requirements describe the intellectual content that should be accessible through the web site. examples of content that might be required are article citations, full-text articles, and multimedia objects. normally, these requirements will be closely connected with library-wide collection-development policies and priorities, and should be driven by subject specialists rather than systems personnel. these requirements inform the development of systems to meet the needs of the users. the content specifications describe the content that is available within the current systems. there are many reasons why content requirements and content specifications do not match, including the inability or choice of a library to acquire a particular piece, the unavailability of specified content, or technical incompatibilities between content and the library's infrastructure.

although content is sometimes viewed as the core component of a library web site, there is also a great deal of additional functionality that is provided to users. the functional requirements describe the users' needs and expectations of that functionality in the context of completing tasks on the web site. for example, ill forms found on many sites are easy for the user to fill out, although the most effective interface to ill for the user might not involve a form-based user interface at all. it might be a direct system-to-system interface from an openurl resolver to the ill software in which all citation data are transmitted for the user. this requirement is not necessarily obvious when considering ill in isolation, but is evident when considering it in the larger context of the users' goals and objectives for the entire web site. the functional specifications describe the functions as they exist in the installed base of systems and expose the functionality that is available to the user. when the specifications do not match the requirements, the users' expectations regarding the system will not be fully met. the economic and technical limitations of system implementation and modernization often reduce the speed at which the large base of previously installed systems can be modified to meet users' changing needs and expectations. it is thus critical to identify gaps between existing systems and desired systems and to discover areas where a web site will have characteristics that are not completely aligned with what the user needs or expects. when the service-element requirements do not match the service-element specifications of existing systems, an iterative design process begins. this process will be intertwined with the evaluation of the system architecture. gaps that can be addressed immediately should be incorporated into an implementation plan for the new web site. longer-term migration or development plans can be developed to fill gaps that cannot be addressed immediately. it is also important to acknowledge that developing and meeting service-element requirements is an iterative process. requirements will need to be revisited over time as user needs change, and requirements that are met now become the specifications that are evaluated in the future.
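the comparison of service-element requirements with service-element specifications described above is, at its core, a gap analysis. the sketch below is a minimal illustration of that bookkeeping; the requirement and specification labels are invented for the example, and a real inventory would be far more detailed and would distinguish functional from content elements.

# requirements describe what users need; specifications describe what the
# installed systems currently provide. the labels are illustrative only.
requirements = {
    "link citations to full text",
    "renew loans online",
    "request ill directly from a citation",
    "search catalog and databases together",
}
specifications = {
    "renew loans online",
    "link citations to full text",
}

gaps = requirements - specifications      # needs not yet met by any system
surplus = specifications - requirements   # functions no longer asked for

for need in sorted(gaps):
    print("gap to plan for:", need)
for extra in sorted(surplus):
    print("candidate for retirement or repackaging:", extra)

in the terms used above, items in the gap set feed either the immediate implementation plan or a longer-term migration plan, and requirements that are satisfied now become the specifications evaluated in the next iteration.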
interrelationships within eia

when the service-element requirements cannot be used to modify the service-element specifications, the service elements constrain the design of the web site and influence the design of the coordinating structure. the upward arrow in figure 1 labeled "constrains" indicates that the user experience is constrained by the specifications of content or functional elements that are not currently changeable. in such situations, the coordinating structure must be designed to provide additional context for the user to understand the purpose of the existing service elements. this explanatory role can be seen in the implementation of many web sites as formal parts of the organizational structure designed to explain the idiosyncrasies of the web site to the user. for example, many academic library web sites have tutorials, faqs, or sections labeled "how do i . . . ?" that provide tips on using aspects of the site that are not always evident to users. it is necessary to acknowledge the usefulness of the explanatory role of the coordinating structure in the iterative and incremental processes of web-site design. just as bibliographic instruction and adequate signage have allowed the user to navigate aspects of the traditional library that were not intuitive, the coordinating structure provides the conceptual signposts and other guidance required for users to navigate the web site effectively. at the same time, it is important to realize that the explanatory role would not be necessary if the web site's architecture and design were intuitive to the user. as the design of the service elements changes to accommodate the larger goals of the user, the explanatory function of the coordinating structure will be diminished. the main goal of library web-site design should be to reduce the explanatory role of the coordinating structure and to develop service elements that seamlessly support the goals and objectives of the user. until all service elements have been modernized to meet the needs of the user, the conceptual design of web sites will represent a compromise between what users require and what it is possible for users to do within the current legacy information infrastructure.

system architecture

while the conceptual design of the web site describes the needs of the user apart from the technical details of the implementation, the system architecture is the description of the system as it exists. in the case of library web sites, the system architecture is not limited to the functionality and data on the library's web server. instead, it also includes all of the core infrastructure, individual systems, and data access and storage mechanisms that make up the blueprint of the web site's back end as it has been built. the individual systems in the architecture may include locally controlled ones (for instance, an online catalog), but will also include remote systems such as abstracting and indexing databases mounted by a vendor. a definition of the design of the existing system plays a key role in the evolutionary specification of the system because it provides developers with a greater understanding of the possibilities and constraints of the existing infrastructure. in describing a system architecture, several formal representations can be used that capture various aspects of the system's capabilities at different levels of granularity. these include module views that provide static specifications of individual components; component and connector views that provide dynamic views of processes; and deployment views that incorporate hardware elements.7 the selection of representations is beyond the scope of this paper. typical elements of a system architecture can be seen in figure 2. for this paper, three classes of components are considered, although more may be introduced if applicable locally. the core-infrastructure components are fundamental services and information that support one or more systems or subsystems. in a typical library environment this includes authentication services, web platforms, and the network.
in some library environments, external units may maintain some or all of these components. for example, many college campuses maintain an authentication infrastructure in the campus computing office. overall, core infrastructure provides the glue for tying together the many applications that libraries attempt to integrate in their web sites. the system architecture should include details regarding the standards and interfaces that are used within the library technical infrastructure.

many of the applications in the library environment are off-the-shelf components that have been developed by external vendors. these off-the-shelf components may include the catalog, ill modules, electronic course reserves, and virtual-reference systems. although individual libraries may have some control over configuration options in these applications, they are likely to have little influence over the basic functionality or data formats provided by these systems. core functionality tends to change based on the demands of many libraries looking for similar functionality. despite the lack of functional control over these systems, components developed by external vendors may provide standards-based system interfaces to their functionality. these usually take the form of industry-supported standards or vendor-supplied application programming interfaces and give libraries some flexibility in working with these components. explicit descriptions of the available standard and proprietary interfaces should be included within the system architecture.

other applications may have been developed within the library and so can be changed more easily. examples of locally developed applications typically include subject pages, information about the library, and digital web exhibits and collections. although local development does provide more control over the appearance and functionality of a piece of software, it is not without problems. local development is often conducted using a bricolage approach, solving specific problems singly, without giving consideration to the larger networks of systems in which the solutions operate. when such approaches do not take into account larger issues of systems architecture, opportunities to solve a broader range of problems may be missed and subsequent repackaging of these solutions may be limited or impossible. libraries also frequently have a limited number of programmers, a shortage often remedied by pulling librarians or staff from other duties. while this certainly can allow libraries to meet some user needs, the lack of software-engineering skills in libraries may result in local solutions that are inflexible and that do not support standards for data storage or interchange. because the internal design of these applications is accessible and modifiable, the system architecture should include more extensive descriptions of the internal features and relationships that they contain. although this will not completely alleviate the problems of software maintenance, it will provide a better foundation for decisions regarding future migration.

figure 2. elements of a system architecture. (the figure lists three classes of components: applications, both off-the-shelf and locally developed, with the access mechanisms and standards for previously installed systems such as the catalog, interlibrary loan, electronic reserves, abstracting and indexing databases, content management systems, and legacy web content; core infrastructure, comprising authentication, web platforms, and the network; and information storage and access, covering storage structures and standards such as marc, dublin core, z39.50, and odbc.)
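one lightweight way to record the architectural descriptions called for above is to note, for each component, which class it belongs to, who controls it, and which interfaces it exposes. the sketch below is an illustration only; the component names, fields, and interface labels are assumptions made for the example rather than a prescribed schema.

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    component_class: str       # "core infrastructure", "application", or "storage/access"
    control: str               # "local", "vendor", or "remote"
    interfaces: list = field(default_factory=list)  # standards or apis exposed
    modifiable: bool = False   # can the library change its internal design?

architecture = [
    Component("online catalog", "application", "vendor",
              interfaces=["z39.50", "vendor api"], modifiable=False),
    Component("subject pages", "application", "local",
              interfaces=["html over http"], modifiable=True),
    Component("campus authentication", "core infrastructure", "remote",
              interfaces=["ldap"], modifiable=False),
]

# components whose gaps can only be closed through vendor or campus negotiation
for component in architecture:
    if not component.modifiable:
        print(f"{component.name}: constrained; interfaces: {', '.join(component.interfaces)}")

an inventory of this kind makes the later mapping between the conceptual design and the physical design easier, because the constraining components are already identified.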
finally, typical library architectures consist of links to resources that are licensed or organized on behalf of the user. these include abstracting and indexing databases, full-text content provided by publishers outside of the library, and general vetted internet sites. access to these resources is usually provided simply by linking the user to them, and libraries have no control over their technical implementations. newer federated-search technologies are integrating users' access to these sites, and to results from them, into the library infrastructure, and linking tools make the interrelationships between these systems more easily understood. nevertheless, integrating these resources into a web site in a manner that makes sense to library users is a challenge. the access mechanisms and information formats required to communicate with each site should be clearly documented within the system architecture.

interrelationship of the information and system architectures

reacting to the rapid pace of change can result in an ad hoc or haphazard approach to web-site design. the sections above describe a systematic approach to including and evaluating changes to the web site. in order to implement the changes and create a web site that is scalable and made of reusable components, it is necessary to evaluate, plan, and document all changes to the system. figure 3 graphically depicts the interrelationship between eia and the system architecture. user needs, as described by ia, should inform the development of technical infrastructure. the "informs" arrow, indicating that eia informs the design and development of the system architecture, depicts this interrelationship. the "constrains" arrow designates the reality that some aspects of the existing infrastructure cannot be changed within the current planning cycle and will limit the library's ability to immediately change the underlying content and function of the web site. when mapping the conceptual design to the physical design, there will be gaps that represent functionality that cannot be supported, either fully or in part, by the current system architecture and that thus constrain the full implementation of the conceptual design. if ia is then to be implemented as fully as possible, these gaps identify the modifications and additions that must be carefully evaluated, designed, and implemented within the underlying system architecture. gaps can be addressed in a variety of ways. if there is a total gap in functionality, a system can be developed or implemented to provide the desired functionality as part of the larger system architecture. this may result in a complete development project or in the specification of an off-the-shelf application to meet the newly identified demand.
in the case where an existing system has some of the required functionality but is not completely suitable for the users' goals and objectives, an incremental approach of modernization can be adopted. modernization surrounds "the legacy system with a software layer that hides the unwanted complexity of the old system and exports a modern interface."8 this is done to provide integration with a modern operating environment while retaining the data and, if desired, exposing the functions of the existing system. techniques range from screen scraping to the implementation of web services that export access to functions that are still relevant within the new context. all of these changes become part of the system architecture for future iterations of change. gaps that cannot be immediately added or changed to meet the specified requirements become constraints in the next iteration of conceptual design.

in the absence of a plan, the underlying systems will continue to undergo constant evolutionary changes, ostensibly to meet the changing needs and workflows of both users and staff. change comes from many sources, including local implementations and modifications, external vendors, and industry-wide changes in standards. this rapid but incremental change can produce a system that is very difficult to maintain and that provides few reusable modules.

figure 3. the interrelationship between the conceptual and physical design of the library web site. (the figure shows the extended information architecture and the system architecture linked by the informs and constrains relationships described in the text, with core infrastructure such as authentication, web platforms, and the network on the system side.)

having a well-documented implementation and integration plan will not guarantee that the library will not experience the negative effects of technological change, but it does allow a library to better manage change in meeting the needs of its users. the more explicitly and clearly the modifiable features are documented within the system architecture, the easier it will be to plan to fill the gaps.

conclusion

library users' mental models of library processes have fundamentally changed, creating a serious disconnect between how users expect to use a library web site and how the site was designed. in particular, user expectations regarding the number of steps that must be completed have changed. at the same time, library technical infrastructures are composed, in part, of legacy systems that provide great value and facilitate interlibrary resource sharing, but that were not designed for the web environment. it is essential that libraries develop new approaches to the conceptual design of web sites that support current and future changes both to user behaviors and to library systems architectures. in the long run, these approaches should contribute to the development of a reference model for the description of library services. the authors have proposed a complete framework for conceptual design and physical implementation that is responsive to changing user needs while recognizing the need for libraries to adopt an efficient and cost-effective approach to web-site design, implementation, and maintenance. functional and content needs of the user are identified and molded into a conceptual design based on a broadened perspective of the users' objectives.
mapping conceptual requirements to physical architectures is an important part of this framework, using an architectural representation in combination with descriptions of integration elements developed to support incremental and iterative change. the ability to respond is essential, necessitated by the rapid change in the technical and user environments in which libraries operate. the framework is designed to allow logical and informed decisions to be made throughout the process regarding when to create new systems, when to replace or modernize existing systems, and when to improve the conceptual signage of the web site.

references

1. christina wodtke, information architecture: blueprints for the web (indianapolis: new riders, 2003).
2. louis rosenfeld and peter morville, information architecture for the world wide web, 2nd ed. (cambridge, mass.: o'reilly, 2002), 4.
3. bob gerrity, theresa lyman, and ed tallent, "blurring services and resources: boston college's implementation of metalib and sfx," reference services review 30, no. 3 (2002): 229-41; barbara j. cockrell and elaine anderson jayne, "how do i find an article? insights from a web usability study," journal of academic librarianship 28, no. 3 (may 2002): 122-32.
4. jesse james garrett, the elements of user experience (indianapolis: new riders, 2002); rosenfeld and morville, information architecture.
5. james s. ghaphery and dan ream, "vcu's my library: librarians love it ... users? well maybe," information technology and libraries 19, no. 4 (dec. 2000): 186-90; james s. ghaphery, "my library at virginia commonwealth university: third year evaluation," d-lib magazine 8, no. 7/8 (july/aug. 2002), accessed july 16, 2003, www.dlib.org/dlib/july02/ghaphery/07ghaphery.html.
6. university of louisville libraries web site (2003), accessed july 16, 2003, http://library.louisville.edu.
7. craig larman, applying uml and patterns: an introduction to object-oriented analysis and design (new jersey: prentice hall ptr, 1998); martin fowler, analysis patterns: reusable object models (boston: addison-wesley, 1997); james rumbaugh, ivar jacobson, and grady booch, the unified modeling language reference manual (boston: addison-wesley, 1999); robert c. seacord, daniel plakosh, and grace a. lewis, modernizing legacy systems: software technologies, engineering processes, and business practices (boston: addison-wesley, 2003).
8. seacord, plakosh, and lewis, modernizing legacy systems, 9.

user authentication in the public area of academic libraries in north carolina

gillian (jill) d. ellern, robin hitch, and mark a. stoffan

gillian (jill) d. ellern (ellern@email.wcu.edu) is systems librarian, robin hitch (rhitch@email.wcu.edu) is tech support analyst, and mark a. stoffan (mstoffan@email.wcu.edu) is head, digital, access, and technology services, western carolina university, cullowhee, north carolina.

abstract

the clash of principles between protecting privacy and protecting security can create an impasse between libraries, campus it departments, and academic administration over authentication issues with the public area pcs in the library.
this research takes an in-depth look at the state of authentication practices within a specific region (i.e., all the academic libraries in north carolina) in an attempt to create a profile of those libraries that choose to authenticate or not. the researchers reviewed an extensive amount of data to identify the factors involved with this decision.

introduction

concerns surrounding usability, administration, and privacy with user authentication on public computers are not new issues for librarians. however, in recent years there has been increasing pressure on all types of libraries to require authentication of public computers for a variety of reasons. since the 9/11 tragedy, there has been increasing legislation such as the uniting and strengthening america by providing appropriate tools required to intercept and obstruct terrorism act of 2001 (usa patriot act) and the communications assistance for law enforcement act (calea). in response, administrators and campus it staff have become increasingly concerned about allowing open access anywhere on their campuses. restrictive licensing agreements for specialized software and web resources are also making it necessary or attractive to limit access to particular academic subgroups and populations. permitting access to secured campus storage from these computers can also force libraries to consider requiring authentication. and finally, the general state of the economy has increased user traffic to libraries, sometimes making it necessary to control the use of limited computer resources. authenticating can often make these changes easier to implement and can give the library more control over its it environment.

that being said, authentication comes at a price for librarians. authentication often creates ethical issues with regard to patron privacy, freedom of inquiry, the increased complexity of using public area machines, and restrictions on the open access needs of public or guest users. requiring a patron to log into a computer can make it possible for organizations outside the library's control to collect, review, and use data about a patron's searching habits or online behaviors. issues associated with managing patron logins can also create barriers to access as well as being time consuming and frustrating for both the patron and the library staff.1 while open, anonymous access does not completely protect against these issues, it can help to create an environment of free, private, and open access similar to the longstanding situation with the book collection in most libraries.
the  hunter  library  experience   while  working  on  the  implementation  of  a  new  campus-­‐wide  pay-­‐for-­‐print  solution  in  2009,   librarians  from  the  hunter  library  at  western  carolina  university  began  to  feel  pressured  by  the   campus  it  department  to  change  its  practice  of  allowing  anonymous  logins  to  all  the  computers  in   the  public  areas  of  the  library.    concerns  about  authenticating  users  on  library  public  area   machines  had  been  building  between  these  two  units  for  several  years.    the  resulting  clash  of   principles  between  protecting  privacy  and  protecting  security  came  to  a  head  over  this  project.   the  hunter  library  employees  perceived  that  there  needed  to  be  more  time  for  research  and   debate  before  implementing  the  preceded  mandate.  initially,  there  was  great  resistance  from   campus  it  staff  to  take  the  library’s  concerns  into  account,  but  eventually  a  compromise  was   worked  out  that  allowed  the  library  to  retain  anonymous  logins  on  its  public  computers.    the   confrontation  led  library  staff  to  investigate  the  practices  of  other  libraries,  particularly  within  the   university  of  north  carolina  (unc)  system  of  which  it  is  a  member.    it  seemed  a  logical   development  to  extend  the  initial  research  into  the  authentication  practices  throughout  the  state   of  north  carolina.   the  problem   one  of  the  first  questions  asked  by  western  carolina’s  library  administration  of  the  systems   department  was  what  other  libraries  in  the  area  were  doing.    in  our  case,  the  library  director   specifically  asked  how  many  of  west  carolina’s  sister  universities  were  authenticating  and  why.   anecdotally,  during  this  process,  it  seemed  that  many  other  university  of  north  carolina  system   libraries  reported  being  pressured  to  authenticate  their  public  computers  by  organizations   outside  the  library,  most  often  the  campus  it  department.   when  the  librarians  at  the  hunter  library  began  looking  at  research  to  support  their  position,   hard  data  and  practical  arguments  that  could  be  used  to  effectively  argue  their  case  against  this   change,  helpful  literature  seemed  to  be  lacking.  some  items  were  found  such  as  carlson,  writing  in   the  chronicle  of  higher  education,  who  reported  on  the  divide  between  access  and  security.  he   confirmed  that  other  librarians  also  have  ambivalent  feelings  about  authentication  issues  but  that   there  was  also  growing  understanding  in  libraries  about  the  potential  vulnerability  of  networks  or   misuse  of  their  resources.2     it  seemed  that  the  speed  at  which  authenticating  computers  in  the  public  areas  of  libraries  was   happening  across  the  country  had  not  really  allowed  the  literature  on  the  subject  to  quite  catch  up.     
information  technology  and  libraries  |  june  2015     105           those  studies  that  existed  such  as  spec  kits  seem  to  address  the  issue  from  the  perspective  of   larger  research  libraries  or  else  did  not  systematically  assess  other  specific  groups  of  libraries.3,4     there  were  questions  in  our  minds  about  whether  the  current  research  that  was  found  would   describe  the  trends  and  unique  situations  of  libraries  located  in  rural  areas  or  in  other  types  of   academic  libraries.  there  seemed  to  be  no  current  statewide  or  geographically  defined  analysis  of   authentication  practices  across  various  types  of  academic  libraries  in  a  specific  state  or  region,  nor   were  there  any  available  studies  creating  a  profile  of  libraries  more  likely  to  authenticate   computers  in  their  public  areas.  we  questioned  if  the  rural  nature  of  our  settings,  our  mission,  or   our  geographic  area  in  the  south  might  reinforce  or  hurt  our  position  with  it.    authentication   status  is  not  something  that  is  mentioned  in  the  ala  directory  nor  is  this  kind  of  information  often   given  on  a  library’s  web  site.    we  found  that  individuals  usually  need  to  call  or  visit  the  library   directly  if  they  want  to  know  about  a  library’s  authentication  practices.   during  the  initial  investigation,  the  need  for  this  kind  of  information  to  support  the  library’s   perspective  became  clear.    this  question  led  to  the  creation  of  this  survey  of  authentication   practices  in  a  larger  geographical  area  and  across  various  kinds  of  academic  libraries.    the  goals  of   this  research  were  to  determine  some  answers  to  the  following  questions:   • what  is  the  current  state  of  authentication  practices  in  the  public  area  of  academic  libraries   in  north  carolina?       • what  factors  caused  these  libraries  to  make  the  decisions  that  they  did  in  regards  to   authentication?   • could  you  predict  whether  an  academic  library  would  require  users  to  authenticate?   literature  review   a  number  of  studies  have  discussed  various  other  aspects  of  user  authentication  in  libraries,   including  privacy  and  academic  freedom  concerns,  guest  access  policies,  differing  views  of  privacy   and  access  between  library  and  campus  it  departments,  and  legislation  impacting  library   operations.  all  are  potential  factors  impacting  decisions  on  authentication  of  patron  accessible   computers  located  in  the  public  areas  of  library.   privacy  and  academic  freedom  about  the  use  of  a  library’s  collection  have  long  been  major   concerns  for  librarians  even  before  information  technology  was  introduced.  the  impact  of  9/11   and  the  patriot  act  made  the  discussion  of  computers  and  network  security,  especially  in  the   library  environment  much  more  entwined.    oblinger  discussed  online  access  concerns  in  the   context  of  academic  values,  focusing  on  unique  aspects  of  the  academic  mission.  
she  discussed  the   results  of  an  educause/internet2  computer  and  network  security  task  force  invitational   workshop  that  established  a  common  set  of  principles  as  a  starting  point  for  discussion:  civility   and  community,  academic  and  intellectual  freedom,  privacy  and  confidentiality,  equity  of  access  to   resources,  fairness,  and  ethics.  all  of  these  principles,  she  argues,  are  integral  to  the  environment     user  authentication  in  the  public  library  area  of  academic  libraries  in  north  carolina  |     106   ellern,  hitch,  and  stoffan   doi:  10.6017/ital.v34i2.5770   of  a  university  and  concluded  that  security  is  a  complex  topic  and  that  written,  top-­‐imposed   policies  alone  will  not  adequately  address  all  concerns.5  while  not  directly  addressing  the  issues  of   the  library’s  public  computer  access  in  particular,  she  established  a  framework  of  values  on  how   security  issues  relate  to  the  university  culture  of  freedom  and  openness.   dixon  in  an  article  written  for  library  administrators  discussed  privacy  practices  for  libraries   within  the  context  of  the  library  profession’s  ethical  concerns.  she  highlights  such  documents  as   the  code  of  ethics  of  the  american  library  association6,  the  fair  information  practices  adopted  by   the  organization  for  economic  cooperation  and  development7,  and  the  niso  best  practices  for   designing  web  services  in  the  library  context8.    she  also  reviews  a  variety  of  ways  that  patron  data   may  be  misused  or  compromised.  she  stated  that  all  the  ways  that  patron  data  can  be  be  stored  or   tracked  by  local  networks,  it  departments,  or  internet  service  providers  may  not  be  fully   understood  by  librarians.  while  most  librarians  ardently  maintain  the  privacy  of  patron   circulation  records,  she  points  out  that  similar  usage  data  on  online  activities  may  be  collected   without  the  librarians  or  their  patrons  being  aware.  dixon  studied  the  current  literature  and   maintained  that  libraries  need  to  be  closely  involved  in  decisions  about  the  collection  and   retention  of  patron  usage  data,  especially  when  patron  authentication  and  access  is  controlled  by   external  agencies  such  as  campus  or  city  it  departments,  because  of  a  tendency  for  security  to   prevail  over  privacy  and  free  inquiry.9  this  theme  was  of  major  importance  to  us  in  preparing  the   present  study  as  it  shows  that  we  are  not  alone  in  these  concerns.   carter  focused  on  the  balance  between  security  and  privacy  and  suggested  several  possible   scenarios  for  addressing  both  areas.  he  emphasized  librarian  values  involving  privacy  and   intellectual  freedom,  contrasting  the  librarian’s  focus  on  unrestricted  access  with  the  over-­‐arching   security  concerns  of  computing  professionals.  he  discussed  several  computer  access  policies  in   use  at  various  institutions  and  possible  approaches.  these  options  include  computer   authentication  (with  associated  privacy  concerns),  open  access  stations  visually  monitored  from   staffed  desks,  or  routine  purging  of  user  logs  at  the  end  of  each  session.  
he  also  suggested   librarians  lobby  state  legislatures  to  have  computer  usage  logs  included  in  laws  governing  the   confidentiality  of  library  records.10   still  and  kassabian  provided  a  good  summary  of  internet  access  issues  as  they  affected  academic   libraries  from  legal  and  ethical  perspectives.  they  suggested  that  librarians  focus  on  public   obligations,  free  speech  and  censorship,  and  potential  for  illegal  activities  occurring  on  library   workstations.  the  issues  highlighted  in  the  article  have  increased  in  the  15  years  since  the  article   was  written  but  it  remains  the  best  available  overview.11  the  arguments  put  forth  in  this  article   proved  relevant  for  us  in  understanding  the  multitude  of  viewpoints  regarding  authentication   even  before  9/11.     in  the  post-­‐9/11  era,  essex  discussed  the  usa-­‐patriot  act  and  its  implications  for  libraries  and   patron  privacy.  some  of  the  9/11  terrorists  were  reported  to  have  made  use  of  public  library   computers  in  the  days  before  the  attack.  this  has  led  to  heighted  concern  about  patron  privacy     information  technology  and  libraries  |  june  2015     107           among  librarians.  accurate  assessment  of  its  impact  is  difficult  due  to  restrictions  placed  on   libraries  in  even  disclosing  that  they  have  been  subjected  to  search.12  while  not  directly   addressing  authentication,  the  article  highlights  privacy  issues  surrounding  library  records  of  all   types.     one  of  the  arguments  in  not  requiring  authentication  in  the  public  area  is  the  use  by  unaffiliated   users  of  academic  libraries.    this  is  especially  true  in  rural  areas  where  an  academic  library  might   be  some  of  the  best-­‐funded,  comprehensive  and  accessible  resources  in  a  geographical  area.    even   in  urban  areas,  guest  access  by  unaffiliated  users  is  a  growing  issue  for  many  academic  libraries   because  of  limited  resources,  software  licensing  problems  and  public  access  to  campus   infrastructure.  while  most  institutions  have  traditionally  offered  basic  library  services  to   unaffiliated  patrons,  the  online  environment  has  raised  new  problems.  weber  and  lawrence   provided  one  of  the  best  studies  of  these  issues.    their  work  surveyed  association  of  research   libraries  (arl)  member  libraries  to  determine  the  extent  of  mandatory  logins  to  computer   workstations  and  document  how  online  access  was  provided  to  non-­‐affiliated  guest  users.  they   concentrated  their  study  questions  on  federal  and  canadian  depository  libraries  that  must   provide  some  type  of  access  to  online  government  information,  with  or  without  authentication.   less  than  half  of  respondents  reported  having  any  written  policies  governing  open  access  on   computers  or  guest  access  policies.  of  the  61  responding  libraries  to  the  survey,  32  required  that   affiliated  users  authenticate,  and  of  these  libraries  and  23  had  a  method  for  authenticating  guest   users.13  this  article,  which  was  published  just  as  this  study  was  testing  and  evaluating  the  survey   instrument,  proved  to  be  very  useful  as  we  worked  with  our  questions  in  qualtrics™      and  dealt   with  the  irb  requirements.     
courtney  explored  a  half-­‐century  of  changes  in  access  policies  for  unaffiliated  library  users.   viewing  the  situation  from  somewhat  early  in  the  shift  from  print  to  electronic  resources,  she   foresaw  the  potential  for  significantly  reduced  access  to  library  resources  for  non-­‐affiliated   patrons.  these  barriers  would  be  created  by  access  policy  issues  with  computing  infrastructure   and  licensing  limitations  by  database  vendors.    this  is  especially  true  if  a  library’s  licenses  or   policies  did  not  specifically  address  use  by  unaffiliated  users.  she  concluded  that  decisions  about   guest  access  to  online  library  resources  should  be  made  by  librarians  and  not  be  handed  over  to   vendors  or  campus  computing  staff.14  our  study  began  as  a  result  of  this  very  issue,  i.e.,  an  outside   entity  (campus  it)  determining  how  access  to  library  resources  should  be  controlled,  without   input  by  librarians  or  library  staff.   courtney  also  surveyed  814  academic  libraries  to  assess  their  policies  for  access  by  unaffiliated   users.    she  focused  on  all  library  services  including  building  access,  reference  assistance,  and   borrowing  privileges  in  addition  to  online  access.  many  libraries  were  also  cancelling  print   subscriptions  in  favor  of  online  access  and  she  questioned  the  impact  this  might  have  on  use  by   unaffiliated  users.    while  suggesting  little  correlation  between  decisions  to  cancel  paper   subscriptions  and  requiring  authentication  of  computer  workstations,  she  concluded  that  reduced     user  authentication  in  the  public  library  area  of  academic  libraries  in  north  carolina  |     108   ellern,  hitch,  and  stoffan   doi:  10.6017/ital.v34i2.5770   access  by  unaffiliated  users  would  be  an  unintended  consequence  of  this  change.15  this  article   proved  valuable  to  us  in  framing  our  study,  as  it  gave  us  some  idea  of  what  we  might  expect  to  find   and  provided  some  concepts  to  use  when  we  formulated  our  survey.     best-­‐nichols  surveyed  public  use  policies  in  11  nc  tax-­‐supported  academic  libraries  and  asked   similar  questions  to  our  own.  this  study  was  dated  and  didn’t  address  computer  resources,  but   some  of  the  same  issues  were  addressed.16  public  use  and  authentication  policies  have  the   potential  to  impact  one  another  and  how  the  library  responds.       courtney  called  on  librarians  to  conduct  a  carefully  thought  out  discussion  of  user  authentication   because  of  the  implications  for  public  access  and  freedom  of  inquiry.  while  librarians  are   traditionally  passionate  at  protecting  patron  privacy  involving  print  resources,  many  are  unaware   of  related  concerns  involving  online  authentication.  she  advocated  for  more  education  and  open   debate  of  the  issues  because  of  the  potential  gravity  of  leaving  decision-­‐making  in  the  hands  of   database  vendors  or  campus  it  departments.  
decisions  regarding  authentication  and  privacy   impact  library  services  and  access,  and  therefore  need  to  include  input  from  librarians.17  as  this   study  included  a  summary  of  the  reasons  for  authentication  as  provided  by  surveyed  libraries,  it   also  gave  us  another  reference  point  to  use  when  comparing  our  results  and  highlighted  the   intellectual  freedom  issues  that  were  often  missing  or  glossed  over  in  other  studies.   barsun  surveyed  the  web  sites  of  the  100  association  of  research  libraries  to  assess  services  to   unaffiliated  users  in  four  areas:  building  access,  circulation  policies,  interlibrary  loan  services,  and   access  to  online  databases.  61  member  libraries  responded  to  requests  for  data.  she  explored  the   question  of  whether  the  policies  governing  these  services  would  be  found  on  a  library’s  web  site.   she  perceived  a  possible  disparity  between  increasing  demand  for  services  generated  by  members   of  the  public  who  are  discovering  a  library’s  resources  via  online  searching  and  the  library’s  ability   or  willingness  to  serve  outside  users.  while  she  did  not  address  computer  authentication  issues   directly,  she  did  find  that  a  significant  percentage  of  academic  library  web  sites  were  ambiguous   about  stating  the  availability  of  non-­‐authenticated  access  to  databases  from  onsite  computers.18   this  ambiguity  could  possibly  be  related  to  vague  usage  agreements  with  database  vendors  that   do  not  clearly  state  whether  non-­‐affiliated  users  may  obtain  onsite  access  to  these  resources.  in   “secret  shopper”  visits  done  as  part  of  our  own  research,  we  saw  a  disparity  between  what  was   stated  on  a  library’s  web  site  and  the  reality  of  access  offered.     method   it  seemed  appropriate  to  start  this  project  with  a  regional  focus.      none  of  the  studies  available   looked  at  authentication  geographically.    because  colleges  and  universities  within  a  state  are  all   subjected  to  the  same  economic,  political  and  environmental  factors,  looking  at  the  libraries  might   help  provide  some  continuity  for  creating  a  relevant  profile  of  current  practices.    north  carolina   has  a  substantial  number  of  academic  libraries  (114)  with  a  wide  variety  of  demographics.     historically,  the  state  supports  a  strong  educational  system  with  one  of  the  first  public  university     information  technology  and  libraries  |  june  2015     109           systems.    together  with  the  17  universities  within  university  of  north  carolina  system,  the  state   has  59  public  community  colleges,  36  private  colleges  and  universities,  and  3  religious  institutions.   religious  colleges  are  identified  as  those  whose  primary  degree  is  in  divinity  or  theology.    (see   chart  1.)     chart  1.  survey  participation  by  type  of  academic  library.   work  had  been  started  to  identify  the  authentication  practices  of  other  unc  system  libraries,  so   the  researchers  expanded  the  data  to  include  the  other  academic  libraries  within  the  state.  
to   create  a  list  of  the  library’s  pertinent  information  for  this  investigation,  the  researchers  used  the   american  library  directory19,  the  nc  state  library’s  online  directories  of  libraries20,  and  visited   each  library’s  web  page  to  create  a  database.  the  researchers  augmented  each  library’s  data  to   include  information  including  the  type  of  academic  library  (public,  private,  unc  system  and   religious),  current  contact  information  on  personnel  who  might  be  able  to  answer  questions  on   authentication  policies  and  practices  in  that  library,  current  number  of  books,  institutional   enrollment  figures,  and  the  name  and  population  of  the  city  or  town  in  which  the  library  was   located.  the  library’s  responses  to  the  survey  were  also  tracked  in  the  database  with  spss  and   excel  employed  in  evaluating  the  collected  data.   a  western  carolina  institution  review  board  (irb)  “request  for  review  of  human  subject   research”  was  submitted  and  approved  using  the  following  statement:  “we  want  to  know  the   authentication  situation  for  all  the  college  libraries  in  north  carolina.”    the  researchers  discovered   quickly  that  the  definition  of  “authentication”  would  have  to  be  explained  to  the  review  board  and   many  of  the  responding  librarians  that  filled  out  the  survey.  the  research  goal  was  further   simplified  with  the  explanation  of  authentication  as  “how  do  patrons  identify  themselves  to  get     user  authentication  in  the  public  library  area  of  academic  libraries  in  north  carolina  |     110   ellern,  hitch,  and  stoffan   doi:  10.6017/ital.v34i2.5770   access  to  a  computer  in  the  public  area  of  a  library”  because  many  librarians  might  not  realize  that   what  they  do  is  “authentication”.       during  the  approval  phase,  there  was  some  question  about  whether  the  researchers  needed   formal  approval  because  much  of  the  information  could  be  collected  by  just  visiting  the  libraries  in   person.    the  researchers  saw  no  risk  of  potentially  disclosing  confidential  data.  however,  it  was   decided  that  it  was  better  to  go  through  the  approval  process,  since  the  survey  asked  the  librarians   whether  they  were  being  required  to  authenticate  by  outside  entities.    there  might  also  be  a  need   to  do  some  follow-­‐up  calls  and  there  was  a  plan  to  do  site  visits  to  the  local  libraries  in  order  test   the  data  for  accuracy.     the  qualtrics™  online  survey  system  was  used  to  create  the  survey  and  collect  the  responses.     contact  information  from  the  database  was  uploaded  to  the  survey  system  with  the  irb  approved   introductory  letter  to  each  library  contact  person  along  with  a  link  to  the  survey.    the  introductory   letter  described  the  goals  of  the  project  and  included  an  invitation  to  participate  as  well  as  refusal   language  as  required  by  the  irb  request.  the  same  language  was  used  in  the  follow  up  emails  and   phone  calls.   the  initial  (16)  surveys  were  administered  to  the  unc  system  libraries  in  october  –  december   2010  as  a  test  of  the  delivery  and  collection  system  on  qualtrics™,  with  the  rest  of  the  libraries   being  sent  the  survey  mid-­‐december  2010.         
in  the  spring  of  2011,  the  researchers  followed  initial  survey  with  a  second  letter  and  then  with   phone  calls  and  emails.  during  the  follow  up  calls,  some  librarians  chose  to  answer  the  survey   questions  with  the  researcher  filling  it  out  over  the  phone.    most  filled  out  the  survey  themselves.     the  final  surveys  were  completed  in  april  2011.    because  the  status  of  authentication  is  volatile,   this  survey  data  and  research  represents  a  snapshot  in  time  of  their  authentication  practices   between  october  2010  and  april  2011.  the  researchers  did  see  changes  happening  over  the   course  of  the  surveying  process  and  made  changes  to  any  data  collected  in  follow  up  contact  in   order  to  maintain  the  most  current  information  about  that  library  for  the  charts,  graphs  and   presentations  made  from  the  data.     in  fall  2011,  the  researchers  did  a  “secret  shopper”  type  expedition  to  the  nearest  academic   libraries  by  visiting  in  person  as  a  guest  user.    the  main  purpose  of  these  visits  was  to  check  the   data,  take  pictures  of  the  library  public  areas,  get  a  firsthand  experience  with  the  variety  of   authentication  practices,  and  talk  to  and  thank  the  librarians  that  participated.   the  survey   the  survey  asked  36  different  questions  using  a  variety  of  pull  down  lists,  check  boxes  and  fill  in   the  blank  questions.    qualtrics™  allows  for  the  survey  to  have  seven  branches,  or  skip  logic,  that   asked  further  questions  depending  upon  the  answer  given.    these  branches  allowed  the  survey   software  to  skip  particular  sections  or  ask  for  additional  information  depending  on  the  answers     information  technology  and  libraries  |  june  2015     111           supplied.    some  libraries,  especially  those  that  didn’t  authenticate  or  didn’t  know  specific  details,   might  be  asked  as  little  as  14  questions  while  others  received  all  36.  the  setup  of  computers  in  the   public  area  of  libraries  can  be  quite  variable,  especially  if  the  library  differentiates  between   student-­‐only  and  guest/public  use  only  workstations.  the  survey  questions  were  grouped  into   seven  basic  areas:  descriptive,  authentication,  student-­‐only  pcs,  guest/public  pcs,  wireless   access,  incident  reports,  and  computer  activity  logs.     the  full  survey  is  included  as  appendix  a.   initial  hypothesis   given  the  experience  at  the  hunter  library,  we  expected  the  following  factors  might  influence  a   decision  to  authenticate.    some  of  these  basic  assumptions  did  influence  our  selection  of  questions   in  the  seven  areas  of  the  survey.     
we expected to find:

• when the workstations were under the control of campus it, authentication would usually be required
• when the workstations were under the control of the library, authentication would probably not be required
• that factors such as population, enrollment, and book volume would play a role in decisions to authenticate
• that librarians would not be aware of what user information was being logged, whether or not authentication was required
• that a library that had experienced incidents involving the computers in its public area would be more likely to have authentication
• that authentication increased due to post-9/11 factors and their legal interpretations, which pressured libraries to authenticate

survey questions, responses, and general findings

the data collected from this survey, especially from those libraries that did authenticate, produced over 200 data points for each library. below are those that resulted in answers to questions posed at the outset, particularly those looking at overall authentication practices. further articles are planned to look at areas of inquiry regarding other related practices in the public areas of academic libraries geographically.

there are 114 academic libraries in north carolina. as a result of the follow-up emails and phone calls, this research survey got an exceptional 99.1% response rate (113 out of 114). once the appropriate librarians were contacted and understood the scope and purpose of this study, they were very cooperative and willing to fill out the survey. those who were contacted via phone mentioned that the original email was overlooked or lost. only one library refused to participate in the study.

individual libraries' demographics were collected in a database by using directory and online information. the data was matched with the survey data provided by the respondents to produce more in-depth analysis and create a profile of each library.

how many libraries in north carolina are authenticating? (chart 2)

the survey asked: "is any type of authentication required or mandated for using any of the pcs in the library's public area?" 66% (or 75) of libraries answered yes, they required authentication to use the pcs. (see chart 2.)

chart 2.

are some types of libraries more likely to authenticate? (chart 3)

while each type of library had a different overall total as compared to the other types, chart 3 shows how the percentages of authentication hold for each type. three out of the four types of libraries authenticate more often than not. of the 58 community college libraries, 60% (or 35) require users to authenticate. seventy-eight percent (78%) of the 36 private college libraries authenticate, and 11 of the 16 (or 69%) unc system libraries authenticate.
only the religious college libraries more often do not require users to authenticate: just 1 of the 3 (or 33%) authenticates, although this is a very small population in the survey. however, percentage-wise, community colleges are more likely not to require users to authenticate than private college libraries (40% vs. 22%), and the unc system libraries, which are public institutions, fall in the middle at 31%.

chart 3.

how many academic libraries were required to authenticate pcs in their public areas? (chart 4)

of the 75 libraries that required patrons to authenticate, when asked whether "they were required to use this authentication," 59 (52% of all 113 respondents) replied "yes." putting these data points together shows that 16 (or 14%) of the libraries authenticate even though they were not required to do so. some clues about why this was so were sought in the next question and during the follow-up phone calls.

chart 4.

why was authentication used?

libraries were asked, "do you know the reasons why authentication is being used?" if they answered "prevent misuse of resources" or "control the public's use of these pcs," then an additional question was asked: "what led the library to control the use of pcs?" this question had two check boxes ("inability of students to use the resources due to overuse by the public" and "computer abuse") and a third box to allow free-text entry. a library could check more than one box.

of those 75 libraries that authenticated, 60% (or 45) checked "prevent misuse of resources" and 48% (or 36) cited "controlling the public's use of these pcs" as reasons for authenticating. normalizing the data from the two questions and the free-text field, table 1 combines all answers to illustrate the number and percentage of each.

table 1.

in the course of the follow-up calls with those libraries that answered the survey over the phone, further insight was provided. one librarian said that their it department told them "authentication was the law and they had to do it." another answered that they were "on the bus line and so the public used their resources more than they expected and so they had to."
to get a better understanding of the scope and variety of these answers, here are some examples of the reasons cited in the free-text space: "all it's idea to do this," "best practices," "caution," "concerned they would be used for the wrong reasons," "control," "we found them misusing computer resources (porn, including child porn)," "control over college students searching of inappropriate websites, such as porn/explicit sites," "disruption," "ease of distributing applications," "fear of abuse on the part of legal," "legal issues regarding internet access," "making students accountable," "monitor use," "policy," "security of campus network," "security of machines after issues were raised at a conference," and "time."

who required that the libraries authenticate? (chart 5)

the survey asked, "what organization or group required or mandated the library to use authentication?" respondents were allowed to choose more than one of the five boxes. these choices included "the library itself," "it or some unit within it," "college or university administration," "other" (with a text box to explain), and "not sure." the results of this question are shown in chart 5. the survey revealed that the decision was solely the library's choice 25% of the time (or 28 libraries), that 22% of the time the library was mandated or required to authenticate by it or some unit within it (or 25 libraries), and that 4% of the time a library's college or university administration required or mandated authentication (or 4 libraries). collaborative decisions in 14 libraries involved more than one organization. of the 39 libraries that were involved with the authentication decision (28 that made the decision by themselves and 11 that were part of a collaborative decision), 55% (or 16) authenticated even though they were not required to do so.

chart 5.

what type of authentication is used?

authentication in libraries can take many forms. the most common method among libraries that authenticate was a centralized or networked system. almost sixty percent of the libraries used some form of this identified access (tables 2 and 3), with one library using some other independent system. twenty-five percent (or 19) of libraries that authenticate still use some form of paper sign-in sheets, and 21% (or 16) use pre-set or temporary logins or guest cards. fifteen percent (or 11) use pc-based sign-in or scheduling software, and 8% (or 6) use the library system in some form for authentication. a few libraries indicated that they bypass their authentication systems for guests, either by having staff log guests in or by disabling the system on selected pcs. we saw this during the "secret shopper" visits as well.

table 2.

do the forms of authentication used in libraries allow for user privacy?
when asked how they handle user privacy in authentication, 67% (or 50) of the 75 libraries that authenticate use a form of authentication that can identify the user. in other words, most users do not have privacy when using public computers in an academic library because they are required to use some form of centralized or networked authentication. the options in table 3 were presented to the respondents as possible forms of privacy methods. thirty-five percent (or 26) of libraries indicated that they provide some form of privacy for their patrons. anonymous access accounted for 28% (or 21) of the libraries.

table 3.

are librarians aware of the computer logging activity going on in the public area? (table 4)

all 113 respondents were asked two questions about the computer logging activities of their libraries: "do you know what computer activity logs are kept?" and "do you know how long computer activity logs are kept?" the second question was only asked if "unsure" was not checked. besides "unsure," responses on the survey included "authentication logs (who logged in)," "browsing history (kept on pc after reboot)," "browsing history (kept in centralized log files)," "scheduling logs (manual or software)," "software use logs," and "other." the respondents could select more than one answer. however, over half (52%) of the respondents were unsure whether the library kept any computer logs at all. authentication logs of who logged in were the most common, but those were kept in only 25% of the total libraries surveyed. a high percentage of libraries kept some kind of logs, but most respondents were unsure how long those records were kept. of the various types of logs, respondents using scheduling software were the most familiar with the length of time the software logs were kept. in one case, a respondent mentioned that the manual sign-in sheets were never thrown out and had been retained for years.

table 4. log retention: what kind of computer logs are kept and for how long (all 113 libraries)

computer activity logs                              number   % of libraries   unsure how long kept
unsure                                              59       52%              100%
authentication logs (who logged in)                 28       25%              60%
none                                                21       19%              --
browsing history (kept in centralized log files)    14       12%              86%
scheduling logs (manual or software)                10       9%               70%
browsing history (kept on pc after reboot)          7        6%               57%
software use logs                                   6        5%               33%
library system                                      4        4%               75%
other                                               2        2%               --

are past incidents factors in authenticating?

only three libraries reported breaches of privacy, and all three reported using authentication.

of the 75 libraries that do authenticate (chart 6, three bars on the right), 36 reported that they did have improper use of the pcs, while 29 reported that they did not and 10 did not know. of the 38 libraries that do not authenticate (chart 6, three bars on the left), 23 reported that they had no improper use of the pcs, while 13 stated that they did and 2 did not know. the known reports of improper use in the survey are higher when the library does authenticate and lower when the library doesn't.
chart 6.

when did libraries begin authenticating in their public areas?

of the 75 libraries that authenticate, only one implemented authentication more than ten years prior to the survey. fifty-one (or 67%) of the responding libraries began authenticating between 3 and 10 years ago. ten libraries implemented authentication in the year before the survey. this is consistent with the growth of security concerns in the post-9/11 decade. (chart 7)

chart 7.

discussion

since the introduction of computer technology to libraries, library staff and patrons have used different levels of authentication depending upon the application. while remote access to commercial services such as oclc cataloging subsystems or vendor databases has always used some form of authorization, usually a username and password, it has never been necessary or desirable for public access to the library's catalog system to have any kind of authorization requirement. most of the collections within an academic library have traditionally been housed in open-access stacks where anyone can freely access material on the shelves. printed indexes and other tools that provide in-depth access to these collections have traditionally been open as well. today, most libraries still make their library catalog and even some bibliographic discovery tools openly accessible and available over the web. this practice naturally extended to computer technology and other electronic reference tools until libraries began connecting them to campus and public networks.

the principle of free and open access to the materials and resources of the library, within the library walls, has been a fundamental characteristic of most public and academic libraries. there is an ethical commitment of librarians to a user's privacy and confidentiality that has deep roots in the first and fourth amendments of the us constitution, state laws, and the code of ethics of the ala.
article iii of the ala code states, "we protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted." traditionally, library staff do not identify patrons who walk through the door; they don't ask for identification when answering questions at the reference desk, nor do they identify patrons reading a book or magazine in the public areas of a library. schneider has emphasized that librarians have always valued user privacy and have been instrumental in the passing of many states' library privacy laws.23 usually, it is only when materials are checked out to a patron that a user's affiliation or authorization even gets questioned directly. frequently patrons can make use of materials within the library building with no record of what was accessed. we are seeing these traditional principles of open access to materials challenged as materials transition to electronic formats. it is becoming more common for patrons to have to authenticate before they can use what was once openly available. the data collected from this survey confirms this trend, with 66% of the libraries using some form of authentication in their public areas.

the widespread use of personally identifiable information is making it more difficult for librarians to protect the privacy and confidentiality of library users. although the writing was on the wall before 9/11 that some choices would have to be made with regard to privacy, no easy answer to the problem had yet been identified. librarians themselves are often uncertain about what information is collected and stored, as evidenced by our data (table 4). as more information becomes available only electronically, and because computers in the public areas are now used for much more than just accessing library catalog functions, it is becoming difficult to uphold the code of ethics and protect the privacy of users.

using authentication can also make it more difficult to use technology in the library. in order to authenticate, users may be required to start or restart a computer and/or log into or out of it. this takes time and requires the user to remember to log off the computer when finished. users often have difficulty keeping track of their user information and may require increased assistance (table 5).

table 5.

library staff or scheduling software may be required to help library guests obtain access to computer equipment. north carolina, like other states, does have laws governing the confidentiality of library records. librarians have long dealt with this situation by keeping as little data as possible; for example, many library circulation systems do not store data beyond the current checkout.
access logs that detail what resources a particular user has accessed would seem to fall under this legislation, although the wording in the law is vague.

information technology departments, legal counsel, and administrators, on the other hand, are often less concerned about privacy and intellectual freedom issues. more often their focus is on security, limiting access to those users affiliated with the institution, and monitoring use. being ready and able to provide data in response to subpoenas and court orders is often a priority. at western carolina university, illicit use of an unauthenticated computer in the student center led to an investigation by campus and county law enforcement. this case is still used as justification for needing to authenticate and monitor campus computer use even though the incident occurred many years ago. being able to track an individual's online activity is believed to increase security by ensuring adherence to institutional policies. authentication with individually assigned login credentials permits online activity to be traced to a specific account, whose owner can then be held accountable for the activity performed. librarians' responses to the survey indicate that these issues play a role in a library's decision to authenticate, as seen in the free-text responses in table 6.

tracking use through ip addresses, individual logins, and transaction logs allows scrutiny of users in cases of illegal or illicit use of computer resources. in many cases, this action is justified as being required by auditors or law enforcement agencies, though information regarding this is scarce. the authors of this article are not aware of any laws or auditing requirements in north carolina that require detailed tracking of library computer use.

some libraries indicated that it departments were concerned about the security of networks and/or computers. security can be undermined when generic accounts are used or when no authentication is required. by using individual logins, users can be restricted to specific network resources and can be monitored. when multiple computers use the same account for logging in, or when the login credentials are posted on each computer, security is compromised because use cannot be tracked to a specific user. in some libraries, these security issues have trumped librarians' concerns about intellectual freedom and privacy.

creating a profile as a result of these findings

given the number of characteristics collected about each library, it was assumed that some of the factors gathered might influence a decision to authenticate and allow for the possibility of creating a predictive profile. the data was collected from libraries within a fixed geographic region. the externally collected data and the survey data were coded and put into spss™, and a number of statistical tests were performed to find which factors might be statistically significant.
to further the geographical analysis, the data was also put into arcview™ to produce a map of north carolina, with different colored pins for academic libraries that authenticated versus those that did not, to see whether there was any pattern to the choice. (map 1)

to more completely explore the possible role that geographic information might play in the decision to authenticate, the population of the city or town in which the institution was located, enrollment, book volume, number of pcs, and total number of library it staff (scaled variables), as well as ordinal variables such as "who controlled the setup of the pcs," "do you differentiate between student and public pcs," and "known incidents of privacy and misuse," were also integrated into the analysis. the data collected could not predict whether an academic library would authenticate or not using logistic regression techniques, although those that differentiate between student and public pcs did have a higher probability. based on all our collected data and mapping, it is impossible to predict with any significance whether or not an academic library would authenticate.

so the short answer statistically is no. using all of the data collected, a statistically significant profile could not be created; however, there are general tendencies that the data was able to suggest.

map 1.

for those libraries that do authenticate, the average book volume is almost 400,000, the enrollment is around 5,600, the population of the city where the institution is located is 94,000, the total number of pcs in the public area is 54, and the average number of library it staff is 1.8.

for those libraries that do not authenticate, the average book volume is about 163,000, enrollment is around 3,000, the population is 53,000, the average number of pcs in the public area is about 39, and the average number of library it staff is 0.8.

libraries that authenticate tend to have statistically significant differences in book volume and in the number of pcs in the public area, with a t-test value of p < 1. student enrollment was the most statistically significant factor among those that authenticated, with a t-test value of p < 0.5. libraries that authenticate had many more students, more books, and a larger number of pcs in their public areas than libraries that didn't authenticate.

those libraries that didn't authenticate tended to be in smaller towns, more often had their public-area pcs set up by non-library it staff, and had fewer library it staff. sixty percent (60%) of the libraries that don't authenticate had zero library it staff.
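the comparison described above can be sketched briefly in code. the numbers below are invented stand-ins loosely modeled on the group averages reported here, not the survey data, and scipy and scikit-learn are used in place of the spss™ analysis the researchers actually ran.

    # illustrative sketch only: invented values, not the survey data.
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(42)

    # hypothetical enrollment and public-area pc counts for the two groups
    enroll_auth = rng.normal(5600, 2000, 75)   # 75 libraries that authenticate
    enroll_none = rng.normal(3000, 1500, 38)   # 38 libraries that do not
    pcs_auth = rng.normal(54, 20, 75)
    pcs_none = rng.normal(39, 15, 38)

    # two-sample t-test: does mean enrollment differ between the groups?
    t_stat, p_value = stats.ttest_ind(enroll_auth, enroll_none, equal_var=False)
    print(f"enrollment t = {t_stat:.2f}, p = {p_value:.4f}")

    # logistic regression: try to predict the authentication decision from
    # enrollment and pc count (standardized so the solver behaves well)
    X = np.column_stack([
        np.concatenate([enroll_auth, enroll_none]),
        np.concatenate([pcs_auth, pcs_none]),
    ])
    y = np.array([1] * 75 + [0] * 38)
    model = LogisticRegression().fit(StandardScaler().fit_transform(X), y)
    print("coefficients:", model.coef_, "intercept:", model.intercept_)

the t-test mirrors the group comparisons reported above, and the logistic regression mirrors the attempt to predict the authentication decision from the collected variables.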
while it was assumed at the outset of this research that the campus department responsible for the setup of the workstations in the public area (the library or it) would be a factor in whether authentication was used in the library, the data does not support this assumption statistically.

ethical questions about authentication as a result of these findings

there are a variety of reasons why a library might choose to authenticate despite the ethical issues associated with it; the protection and management of it resources and the mission of the institution are two likely scenarios. a library, especially one with a lot of use by unaffiliated users or guests, might choose to authenticate regardless of concerns in order to make sure its own users have preference for the pcs in the public area of the library. a private institution may choose to authenticate in order to limit access by members of the general public. of the 75 libraries that authenticate, 81% cited concerns about controlling use, overuse, and misuse. this study also found that in 25% of the total academic libraries, the library itself decided to authenticate without influence from external groups. this was a higher percentage than was expected. given librarians' professional concerns about intellectual freedom and privacy, we were very surprised that so many libraries chose to authenticate on their own.

we suspected that many librarians might not have a full understanding of the privacy issues created by requiring individual logins. based on this assumption, we expected that many librarians would not be fully aware of what user tracking data was being kept. examples include network authentication, tracking cookies, web browser history, and user sign-in sheets. the study found that librarians are often unsure of what data is being logged, with 51 (or 45%) of 113 libraries reporting this. only 19% reported knowing with certainty that no tracking data was kept. of those that did know that tracking data was being kept, most had no idea how long this data was retained.

conclusion

this study found that 66% (or 75) of the 113 surveyed north carolina academic libraries required some form of user authentication on their public computers. the researchers reviewed an extensive amount of data to identify the factors involved in this decision. these factors included individual demographics, such as city population, book volume, type of academic library, and enrollment. it was anticipated that by looking at a large pool of academic libraries within a specific region, a profile might emerge that would predict which libraries would choose to authenticate. even with comprehensive data about the 75 libraries that authenticated, a profile of a "typical" authenticating library could not be developed. the data did show two factors of statistical significance (enrollment and book volume) in a library's decision to authenticate.
however, the decision to authenticate could not be predicted. each library's decision to authenticate seems to be based on the unique situation of that library.

we expected to find that most libraries would authenticate due to pressure from external sources, such as campus it departments or administrators, or in response to incidents involving the computers in the public area. this study found that only 39% (or 44) of the libraries surveyed authenticated due to these factors, so our assumption was incorrect. surprisingly, we found that 25% (or 28) of libraries chose to authenticate on their own. the need to control the use of their limited resources seemed to take precedence over any other factors, including user privacy. we did expect to see a rise in the number of libraries that authenticated in the aftermath of 9/11; this we found to be true. looking at the prior research that defines an actual percentage of authentication in academic libraries, no matter how limited in scope (for example, just the arl libraries, responding libraries, etc.), there does seem to be a strong trend for academic libraries to authenticate.

our results, with 66% of academic libraries having authentication, support the conclusion that there is a continued trend of authentication that has steadily expanded over the past decade. this has happened in spite of librarians' traditional philosophy on access and academic freedom. libraries are seemingly relinquishing their ethical stance or have other priorities that make authentication an attractive solution to controlling the use of limited or licensed resources. our survey results show that many librarians may not fully understand the privacy risks inherent in authentication. slightly over half (52%) of the libraries reported that they did not know whether any computer or network log files were being kept, nor for how long they were kept.

the issues surrounding academic freedom, access to information, and privacy in the face of security concerns continue to affect library users. academic libraries in smaller communities are often the only nearby source of scholarly materials. traditionally these resources have been made available to community members, high school students, and others who require materials beyond the scope of the public or school library. as pointed out, restrictive authentication policies may hamper the ability of these groups to access the information they need. however, the data showed very little consistency to support this idea with respect to authentication in small towns and communities throughout the state.

some of the surveyed academic libraries made a strong statement that they are not authenticating the computers in their public areas and have every intention of continuing this practice.
these libraries are now in a distinct minority, and we expect their position will be continually challenged. for example, at western carolina university, we continue to provide open computers in the public areas of the library but are regularly pressed by our campus it department to implement authentication. we have so far been successful in resisting this pressure because of the commitment of our dean and librarians to preserving the privacy of our patrons.

further studies

as a follow-up to this study, we plan to contact the 38 libraries that did not authenticate to determine whether they now require authentication or have plans to do so. based on responses to this survey, we expect that many librarians are unaware of the degree to which authentication can undermine patron privacy. we suggest that an in-depth study be conducted to determine the degree of understanding among librarians about potential privacy issues with authentication, in the context of their longstanding professional position on academic freedom and patron confidentiality.

appendix a. survey questions

1. select the library you represent:

2. which library or library building are you reporting on?
• main library or the only library on campus
• medical library
• special library
• other

3. how many total pcs do you have in your library public area for the building you are reporting on?

4. how many library it or library systems staff does the library have?

5. does the library's it/systems staff control the setup of these pcs in the library public area?
• yes
• shared with it (campus computing center)
• it (campus computing center)
• no (please specify who does control the setup of these pcs)

authentication

6. is any type of authentication required or mandated to use any of the pcs in the library's public area?

7. were you required to use this authentication on any of the pcs in the library's public area?

8. what organization or group required or mandated the library to use authentication on pcs in the library public area?
• the library itself
• it or some unit within it
• other (please explain)
• not sure
• college/university administration

9. do you know the reasons authentication is being used?
• mandated by parent institution or group
• prevent misuse of resources
• other (please specify)
• control the public's use of these pcs

10. what led the library to control the use of pcs?
• inability of students to use the resource due to overuse by the public
• computer abuse
• other (please specify)

11. how are the users informed about the authentication policy?
• screen saver
• web page
• login or sign-on screen
• training session or other presentation
• other (please specify)
12. what form of authentication do you use?
• manual paper sign-in sheets
• individual pc-based sign-in or scheduling software
• centralized or networked authentication such as active directory, novell, or an erp (enterprise resource planning) system with a college/university-wide identifier
• pre-set or temporary authorization logins or guest cards handed out (please specify the length of time this is good for)
• other (please specify)

13. how does the library handle user privacy of authentication?
• anonymous access (each session is anonymous with repeat users not identified)
• identified access
• pseudonymous access with demographic identification (characteristics of users determined but not actual identity)
• pseudonymous access (repeat users identified but not the identity of a particular user)

14. when did you implement authentication of the pcs in the library public area?
• this year
• last year
• 3-5 years ago
• 5-10 years ago
• don't know

student-only pcs

15. do you differentiate between student-only pcs and guest/public use pcs in the library public area?

17. how many pcs are designated as student-only pcs in the library's public area?

18. do you require authentication to access student-only pcs in the library's public area?

19. what does authentication provide on a student-only pc once an affiliated person logs in?
• access to specialized software
• access to storage space
• printing
• internet access
• other (please specify)

20. once done with an authenticated session on a student-only pc, how is authentication on a pc removed?
• user is required to log out
• user is timed out
• other (please specify)

21. what authentication issues have you seen in your library with student-only pcs?
• id management issues from the user (e.g., forgetting passwords)
• id management issues from the network (e.g., updating changes in a timely fashion)
• timing-out issues
• authentication system becomes unavailable
• other (please specify)

guest/public pcs

22. how many pcs are designated for guest or public use in the library's public area?

23. describe the location of these guest/public use pcs.
• line-of-sight to library service desk
• all in one general area
• scattered throughout the library
• other (please specify)
• in several groups around the library

24. do you require authentication to access guest/public use pcs in the library's public area?

25. what does authentication allow for guests or the public that log in?
• limited software
• control, limit, or block web sites that can be accessed
• limited or different charge for printing
• timed or scheduled access
• internet access
• other (please specify)
• control, limit, or block access to library resources (such as databases or other subscription-based services)

26. are there different types of pcs in your library public area? check those that apply.
• all pcs are the same
• some have different types of software (like browser only)
• some have time or scheduling limitations
• some have printing limitations
• some have specialized equipment attached (like scanners, microfiche readers, etc.)
• some control, limit, or block web sites that can be accessed
• some control, limit, or block access to library resources (such as databases or other subscription-based services)
• other (please specify)

wireless access

27. do you have wireless access in your library public area?

28. do you require authentication for your wireless access in the library public area?

29. does the library have its own wireless policies different from the campus's policy?

30. what methods are used to give guests or the public access to your wireless network? check those that apply.
• no access for guests or the general public
• paperwork and/or signature required before access is given
• limited access by time
• open access
• limited access by resource (such as internet access only)
• other

incident reports

31. has your library had any incidents of breach of privacy that you know about?

32. has your library had any incidents of improper use of public pcs (such as cyber stalking, child pornography, terrorism, etc.)?

33. have these incidents required investigation or digital forensics work to be done?

34. who handled the work of investigation?
• library it or library systems staff
• it or campus computing center
• campus police
• other law enforcement
• unsure
• other (please specify)

computer activity logs

35. do you know what computer activity logs are kept? (if unsure, end; if not, ask question 36)
• authentication logs (who logged in)
• browsing history (kept on pc after reboot)
• browsing history (kept in centralized log files)
• scheduling logs (manual or software)
• software use logs
• none
• unsure
• other (please specify)

36. do you know how long computer activity logs are kept?
• 24 hours or less
• week
• month
• year
• unknown

references

1. pam dixon, "ethical risks and best practices," journal of library administration 47, no. 3/4 (may 2008): 157.

2. scott carlson, "to use that library computer, please identify yourself," chronicle of higher education, june 25, 2004, a39.

3. lori driscoll, library public access workstation authentication, spec kit 277 (washington, dc: association of research libraries, 2003).
4. martin cook and mark shelton, managing public computing, spec kit 302 (washington, dc: association of research libraries, 2007).

5. diana oblinger, "it security and academic values," in computer and network security in higher education, ed. mark luker and rodney petersen (jossey-bass, 2003): 1-13.

6. code of ethics of the american library association, http://www.ala.org/advocacy/proethics/codeofethics/codeethics.

7. fair information practices adopted by the organization for economic cooperation and development, http://www.oecd.org/sti/security-privacy.

8. "niso best practices for designing web services in the library context," niso rp-2006-01 (bethesda, md: national information standards organization, 2006).

9. dixon, "ethical issues implicit in library authentication and access management."

10. howard carter, "misuse of library public access computers: balancing privacy, accountability, and security," journal of library administration 36, no. 4 (april 2002): 29-48.

11. julie still and vibiana kassabian, "the mole's dilemma: ethical aspects of public internet access in academic libraries," internet reference services quarterly 4, no. 3 (january 1, 1999): 7-22.

12. don essex, "opposing the usa patriot act: the best alternative for american librarians," public libraries 43, no. 6 (november 2004): 331-340.

13. lynne weber and peg lawrence, "authentication and access: accommodating public users in an academic world," information technology & libraries 29, no. 3 (september 2010): 128-140.

14. nancy courtney, "barbarians at the gates: a half-century of unaffiliated users in academic libraries," journal of academic librarianship 27, no. 6 (november 2001): 473.

15. nancy courtney, "unaffiliated users' access to academic libraries: a survey," journal of academic librarianship 29, no. 1 (2003): 3-7.

16. barbara best-nichols, "community use of tax-supported academic libraries in north carolina: is unlimited access a right?" north carolina libraries 51 (fall 1993): 120-125.

17. nancy courtney, "authentication and library public access computers: a call for discussion," college & research libraries news 65, no. 5 (may 2004): 269-277.

18. rita barsun, "library web pages and policies toward 'outsiders': is the information there?" public services quarterly 1, no. 4 (october 2003): 11-27.

19. american library directory: a classified list of libraries in the united states and canada, with personnel and statistical data, 62nd ed. (new york: information today, 2009).

20. http://statelibrary.ncdcr.gov/ld/aboutlibraries/nclibrarydirectory2011.pdf.

21. karen schneider, "so they won't hate the wait: time control for workstations," american libraries 29, no. 11 (1998): 64.

22. code of ethics of the american library association.

23. karen schneider, "privacy: the next challenge," american libraries 30, no. 7 (1999): 98.
hoist by their own petard

a funny thing happened at ala midwinter. what's more, it was fascinating as well, for it was one of the loveliest examples of "communications dysfunction" i've ever seen. (dysfunction: impaired or abnormal functioning.) librarians-information scientists-have always been concerned with the transfer of information. in recent times, this concern has been explicitly identified as constituting the major component of the profession's domain. whether one interprets information to be the book, and discusses its transfer in terms of acquisitions, circulation, and interlibrary loan, or one interprets information to be the datum, and discusses transfer in terms of access, retrieval, and transfer, the fact remains that information transfer is the area of concern of the information profession.

yet, as is already evident from the paragraph above, the medium being used to relay the message, the unit which is basic to the process of information transfer, i.e., the word, is a fractious thing. one would think that informationalists would be among the most alert to this frailty of language; yet, though the problem has been addressed at great length by a great many, members of our profession have not been predominant among them. we, too, use words ever more loosely, violate structure ever more often, and transpose jargon ever more freely-unaware, and, apparently, uncaring that in the process we are vitiating the very foundation of our field.

and thus, at the palmer house in chicago, during a very balmy january midwinter meeting of the american library association, a select group of professional practitioners who had gathered together to work together found themselves caught in their own trap. they were unable to communicate! information specialists-listening without hearing, reading without comprehending, talking without communicating. it was almost frightening. "network" concerns got defined in terms of the need for reimbursement for interlibrary loan. the phrases "data base interchange," "machine-readable record exchange," and "networking" were being used interchangeably, engendering damaging misconceptions. the distinction between "contract negotiation assistance" (which clr will provide the anable serials group) and "contracting" (which clr is not doing here) was not made. legislative "networks" described procedural, not substantive, activity. the jargon of internal revenue code section 4942(j)(3) (operating foundation) and the jargon of the technical sector (operations) were interpreted as being synonymous. and the word standard lost its identity altogether.

the irony is overwhelming. like the old adage about the shoemaker's children who don't have shoes, it would appear that it is the information specialists who cannot communicate.-ruth l. tighe, new england library information network

evaluation of semi-automatic metadata generation tools: a survey of the current state of the art

jung-ran park and andrew brenza

information technology and libraries | september 2015

abstract

assessment of the current landscape of semi-automatic metadata generation tools is particularly important considering the rapid development of digital repositories and the recent explosion of big data.
utilization of semi-automatic metadata generation is critical in addressing these environmental changes and may be unavoidable in the future considering the costly and complex operation of manual metadata creation. to address such needs, this study examines the range of semi-automatic metadata generation tools (n = 39) while providing an analysis of their techniques, features, and functions. the study focuses on open-source tools that can be readily utilized in libraries and other memory institutions. the challenges and current barriers to implementation of these tools were identified. the greatest area of difficulty lies in the fact that the piecemeal development of most semi-automatic generation tools addresses only part of the problem of semi-automatic metadata generation, providing solutions for one or a few metadata elements but not the full range of elements. this indicates that significant local effort will be required to integrate the various tools into a coherent working whole. suggestions toward such efforts are presented for future developments that may assist information professionals with incorporating semi-automatic tools into their daily workflows.

jung-ran park (jung-ran.park@drexel.edu) is editor, journal of library metadata, and associate professor, college of computing and informatics, drexel university, philadelphia. andrew brenza (apb84@drexel.edu) is project assistant, college of computing and informatics, drexel university, philadelphia.

introduction

with the rapid increase in all types of information resources managed by libraries over the last few decades, the ability of the cataloging and metadata community to describe those resources has been severely strained. furthermore, the reality of stagnant and decreasing library budgets has prevented the library community from addressing this issue with concomitant staffing increases. nevertheless, the ability of libraries to make information resources accessible to their communities of users remains a central concern. thus there is a critical need to devise efficient and cost-effective ways of creating bibliographic records so that users are able to find, identify, and obtain the information resources they need.

one promising approach to managing the ever-increasing amount of information is semi-automatic metadata generation tools. semi-automatic metadata generation tools concern the use of software to create metadata records with varying degrees of supervision from a human specialist.1 in their ideal form, semi-automatic metadata generation tools are capable of extracting information from structured and unstructured information resources of all types and creating quality metadata that facilitates not only bibliographic record creation but also semantic interoperability, a critical factor for resource sharing and discovery in the networked environment.
through the use of semi-automatic metadata generation tools, the library community has the potential to address many issues related to the increase of information resources, the strain on library budgets, and the need to create high-quality, interoperable metadata records, and, ultimately, to improve the provision of information resources to users.

there are many potential benefits to semi-automatic metadata generation. the first is scalability. because of the quantity of information resources and the costly and time-consuming nature of manual metadata generation,2 it is increasingly apparent that there simply are not enough information professionals available to satisfy the metadata-generation needs of the library community. semi-automatic metadata generation, on the other hand, offers the promise of using high levels of computing power to manage large amounts of information resources. in addition to scalability, semi-automatic metadata generation also offers potential cost savings through a decrease in the time required to create effective records. furthermore, the time savings would allow information professionals to focus on tasks that are more conceptually demanding and thus not suitable for automatic generation. finally, because computers can perform repetitive tasks with relative consistency when compared to their human counterparts, automatic metadata generation promises the ability to create more consistent records. a potential increase in the consistency of quality metadata records would, in turn, increase the potential for interoperability and thereby the accessibility of information resources in general. thus semi-automatic metadata generation offers the potential not only to ease resource-description demands on the library community but also to improve resource discovery for its users.

goals of the study

assessment of the current landscape of semi-automatic metadata generation tools is particularly important considering the fast development of digital repositories and the recent explosion of data and information. utilization of semi-automatic metadata generation is critical to address such environmental changes and may be unavoidable in the future considering the costly and complex operation of manual metadata creation. even though there are promising experimental studies that exploit various methods and sources for semi-automatic metadata generation,3 there is a lack of studies assessing and evaluating the range of tools that have been developed, implemented, or improved. to address such needs, this study aims to examine the current landscape of semi-automatic metadata generation tools while providing an evaluative analysis of their techniques, features, and functions. the study primarily focuses on open-source tools that can be readily utilized in libraries and other memory institutions.
the study also highlights some of the challenges still facing the continued development of semi-automatic tools and the current barriers to their incorporation into the daily workflows for information organization and management. future directions for the further development of tools are also discussed.

toward this end, a critical review of the literature in relation to semi-automatic metadata generation tools published from 2004 to 2014 was conducted. databases such as library and information sciences abstracts and library, information science and technology abstracts were searched and germane articles identified through review of titles and abstracts. because the problem of creating viable tools for the reliable automatic generation of metadata is not a problem limited to the library and information science professions,4 database searches were expanded to include databases pertinent to computer science, including proquest computing, academic search premier, and applied science and technology. keywords such as "automatic metadata generation," "metadata extraction," "metadata tools," and "text mining," including their stems, were used to explore the databases. in addition to keyword searching, relevant articles were also identified within the reference sections of articles already deemed pertinent to the focus of the survey, as well as by expanding results lists through the application of relevant subject terms assigned to pertinent articles. to ensure that the latest, most reliable developments in automatic metadata generation were reviewed, various filters, such as date range and peer review, were employed. once tools were identified, their capabilities were tested (when possible), their features were noted, and overarching developments were determined.

the remainder of the article provides an overview of the primary techniques developed for the semi-automatic generation of metadata and a review of the open-source metadata generation tools that employ them. the challenges and current barriers to semi-automatic metadata tool implementation are described, as are suggestions for future developments that may assist information professionals with integration of semi-automatic tools within the daily workflow of technical services departments.

current techniques for the automatic generation of metadata

as opposed to manual metadata generation, semi-automatic metadata generation relies on machine methods to assist with or to complete the metadata-creation process. greenberg distinguished between two methods of automatic metadata generation: metadata extraction and metadata harvesting.5 metadata extraction in general employs automatic indexing and information retrieval techniques to generate structured metadata using the original content of resources. metadata harvesting, on the other hand, concerns a technique to automatically gather metadata from individual repositories in which metadata has been produced by semi-automatic or manual approaches.
the harvested metadata can be stored in a central repository for future resource retrieval.

within this dichotomy of extraction methods, there are several other more specific techniques that researchers have developed for the semi-automatic generation of metadata. polfreman et al. identified an additional six techniques that have been developed over the years: meta-tag harvesting, content extraction, automatic indexing, text and data mining, extrinsic data auto-generation, and social tagging.6 although the last technique is not properly a semi-automatic metadata generation technique, because it is used to generate metadata with a minimum of intervention required by metadata professionals it can be viewed as a possible mode to streamline the metadata creation process.

both greenberg and polfreman provide comprehensive, high-level characterizations of the techniques employed in current semi-automatic metadata generation tools. however, an evaluation of these techniques within the context of a broad survey of the tools themselves and a comprehensive enumeration of currently available tools are not addressed. thus, although these techniques will be examined for the remainder of this section, they serve simply as a framework through which this study provides a current and comprehensive analysis of the tools available for use today. each section provides an overview of the relevant technique, a discussion of the most current research related to it, and the tools that employ that technique.

the tables included in each section provide lists of the semi-automatic metadata generation tools (n = 39) evaluated in the course of this survey. the information presented in the tables is designed to provide a characterization of each tool: its name, its online location, the technique(s) used to generate metadata, and a brief description of the tool's functions and features. only those tools that are currently available for download or for use as web services at the time of this writing are included. furthermore, the listed tools have not been strictly limited to metadata-generation applications but also include some content management system software (cmss), as these generally provide some form of semi-automatic metadata extraction. typically, cmss are capable of extracting technical metadata as well as data that can be found in the meta-tags of information resources, such as the file name, and using that information as the title of a record.

meta-tag extraction

meta-tag extraction is a computing process whereby values for metadata fields are identified and populated through an examination of metadata tags within or attached to a document. in other words, it is a form of metadata harvesting and, possibly, conversion of that metadata into other formats. marcedit, the most widely used semi-automatic tool for the generation of metadata in us libraries,7 is an example of this technique.
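before turning to marcedit and the other tools in detail, a minimal sketch may make the technique concrete. the python fragment below is purely illustrative and is not drawn from any of the tools reviewed in this study; it uses the standard-library html parser, and its small tag-to-element mapping is an assumed, simplified crosswalk rather than a complete dublin core mapping.

    # a minimal, illustrative meta-tag harvester (not one of the reviewed tools):
    # it reads html text and maps common meta-tags to dublin core elements.
    from html.parser import HTMLParser

    # assumed, simplified mapping from meta-tag names to dublin core elements
    META_TO_DC = {
        "dc.title": "title", "dc.creator": "creator", "dc.date": "date",
        "description": "description", "keywords": "subject", "author": "creator",
    }

    class MetaTagHarvester(HTMLParser):
        def __init__(self):
            super().__init__()
            self.record = {}          # harvested dublin core values
            self.title_parts = []
            self.in_title = False

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta":
                name = (attrs.get("name") or "").lower()
                content = (attrs.get("content") or "").strip()
                dc_element = META_TO_DC.get(name)
                if dc_element and content:
                    self.record.setdefault(dc_element, content)
            elif tag == "title":
                self.in_title = True

        def handle_data(self, data):
            if self.in_title:
                self.title_parts.append(data)

        def handle_endtag(self, tag):
            if tag == "title":
                self.in_title = False
                # fall back to the <title> element when no dc.title meta-tag exists
                self.record.setdefault("title", "".join(self.title_parts).strip())

    def harvest(html_text):
        parser = MetaTagHarvester()
        parser.feed(html_text)
        return parser.record

    if __name__ == "__main__":
        sample = ('<html><head><title>sample page</title>'
                  '<meta name="DC.creator" content="park, jung-ran">'
                  '<meta name="keywords" content="metadata; cataloging"></head></html>')
        print(harvest(sample))  # prints the harvested dublin core dictionary

a real harvester would also have to fetch pages, handle character encodings, and cope with missing or inconsistent meta-tags, which is precisely the weakness discussed below.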
marcedit  essentially  harvests  metadata  from  open   archives  initiative  protocol  for  metadata  harvesting  (oai-­‐pmh)  compliant  records  and  offers  the   user  the  opportunity  to  convert  those  records  to  a  variety  of  formats,  including  machine-­‐readable   cataloging  (marc),  machine-­‐readable  cataloguing  in  xml  (marc  xml),  metadata  object   description  schema  (mods),  and  encoded  archival  description  (ead).  it  also  offers  the   capabilities  of  converting  records  from  any  of  the  supported  formats  to  any  of  the  other  supported   formats.   other  examples  of  this  technique  are  the  web  services  editor-­‐converter  dublin  core  metadata  and   firefox  dublin  core  viewer  extension.  both  of  these  programs  search  html  files  on  the  web  and   convert  information  found  in  html  meta-­‐tags  to  dublin  core  elements.  in  the  cases  of  marcedit     information  technology  and  libraries  |  september  2015       26   and  editor-­‐converter  dublin  core,  users  are  presented  with  the  converted  information  in  an   interface  that  allows  the  user  to  edit  or  refine  the  data.     figure  1  provides  an  illustration  of  the  extracted  metadata  of  the  new  york  times  homepage  using   editor-­‐converter  dublin  core,  while  figure  2  offers  an  illustration  of  the  editor  that  this  web   service  provides.         figure  1.  screenshot  of  extracted  dublin  core  metadata  using  editor-­‐converter  dublin  core.     figure  2.  screenshot  of  editor-­‐converter  dublin  core  editing  tool  (only  eight  of  the  sixteen  fields   are  visible  in  this  screenshot).     evaluation  of  semi-­‐automatic  metadata  generation  tools|  park  and  brenza     doi:  10.6017/ital.v34i3.5889   27   perhaps  the  biggest  weakness  to  this  type  of  tool  is  that  it  entirely  depends  on  the  quality  of  the   metadata  from  which  the  programs  harvest.  this  can  be  most  readily  seen  in  the  above  figure  by   the  lack  of  values  for  a  number  of  the  dublin  core  fields  for  the  the  new  york  times  website.   programs  that  solely  employ  the  technique  of  meta-­‐tag  harvesting  are  unable  to  infer  values  for   metadata  elements  that  are  not  already  populated  in  the  source.     table  1  lists  the  tools  that  support  meta-­‐tag  harvesting  either  as  the  sole  technique  or  as  one  of  a   suite  of  techniques  used  to  generate  metadata  from  resources.  of  the  thirty-­‐nine  tools  evaluated   for  this  study,  nineteen  support  meta-­‐tag  harvesting.   tool  name   location   techniques   functions/features   anvl/erc   kernel  metadata   conversion   toolkit   http://search.cpan.org/~jak/file-­‐ anvl/anvl     meta-­‐tag  harvester   a  utility  that  can  automatically   convert  records  in  the  anvl   format  into  other  formats  such  as   xml,  json  (javascript  object   notation),  turtle  or  plain,  among   others.   apache  poi  –   text  extractor   http://poi.apache.org/download.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   apache  poi  provides  basic  text   extraction  for  all  project   supported  file  formats.  in   addition  to  the  (plain)  text,   apache  poi  can  access  the   metadata  associated  with  a  given   file,  such  as  title  and  author.     
apache  tika   http://tika.apache.org/     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   built  on  apache  poi,  the  apache   tika  toolkit  detects  and  extracts   metadata  and  text  content  from   various  documents.   ariadne   harvester   http://sourceforge.net/projects/ariadn ekps/files/?source=navbar     meta-­‐tag  harvester   a  harvester  of  oai-­‐pmh   compliant  records  which  can  be   converted  to  various  other   schema  such  as  learning  object   metadata  (lom).       bibframe  tools   http://www.loc.gov/bibframe/implem entation/     meta-­‐tag  harvester   bibframe  offers  a  number  of   tools  for  the  conversion  of   marcxml  documents  to   bibframe  documents.    web   service  and  downloadable   software  are  both  available.   data  fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents  and  first   extracts  information  contained  in   meta-­‐tags.    if  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other   techniques  to  assign  values.     includes  a  focused  web  crawler   that  can  target  websites   concerning  a  specific  subject.           information  technology  and  libraries  |  september  2015       28   dublin  core  meta   toolkit   http://sourceforge.net/projects/dcmet atoolkit/files/?source=navbar     meta-­‐tag  harvester   transforms  data  collected  via   different  methods  into  dublin   core  (dc)  compatible  metadata.   dspace   http://www.dspace.org/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts  technical   information  regarding  file  format   and  size.    can  also  extract  some   information  from  meta-­‐tags.   editor-­‐converter   dublin  core   metadata   http://www.library.kr.ua/dc/dcedituni e.html   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  converting  them  to  dc.   embedded   metadata   extraction  tool   (emet)   http://www.artstor.org/global/g-­‐ html/download-­‐emet-­‐public.html     content  extractor;   emet  is  a  tool  designed  to   extract  metadata  embedded  in   jpeg  and  tiff  files.   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   firefox  dublin   core  viewer   extension   http://www.splintered.co.uk/experime nts/73/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  displaying  them  in  dublin   core.   marcedit   http://marcedit.reeset.net/   meta-­‐tag  harvester   harvests  oai-­‐pmh  compliant   data  and  converts  it  to  various   formats  including  dc  and  marc.   metatag   extractor   software   http://meta-­‐tag-­‐ extractor.software.informer.com/     meta-­‐tag  harvester   permits  customizable  extraction   features,  harvesting  meta-­‐tags  as   well  as  contact  information  from   websites.   my  meta  maker   http://old.isn-­‐ oldenburg.de/services/mmm/     meta-­‐tag  harvester   can  convert  manually  entered   data  into  dc.   photo  rdf-­‐gen   http://www.webposible.com/utilidade s/photo_rdf_generator_en.html   meta-­‐tag  harvester   generates  dublin  core  and   resource  description  framework   (rdf)  output  from  manually   entered  input.   
pymarc   https://github.com/edsu/pymarc     meta-­‐tag  harvester   scripting  tool  in  python  language   for  the  batch  processing  of  marc   records,  similar  to  marcedit.       repomman   http://www.hull.ac.uk/esig/repomman /index.html   meta-­‐tag  harvester;   content  extractor;   extrinsic  auto-­‐ generator   automatically  extracts  various   elements  for  documents   uploaded  to  fedora  such  as   author,  title,  description,  and  key   words,  among  others.    results  are   presented  to  user  for  review.   sherpa/romeo   http://www.sherpa.ac.uk/romeo/api.h tml     meta-­‐tag  harvester   a  machine-­‐to-­‐machine   application  program  interface   (api)  that  permits  the  automatic   look-­‐up  and  importation  of   publishers  and  journals.   url  and  metatag   extractor   http://www.metatagextractor.com/     meta-­‐tag  harvester   permits  the  targeted  searching  of   websites  and  extracts  urls  and   meta-­‐tags  from  those  sites.   table  1.  semi-­‐automatic  tools  that  support  meta-­‐tag  harvesting.     evaluation  of  semi-­‐automatic  metadata  generation  tools|  park  and  brenza     doi:  10.6017/ital.v34i3.5889   29   content  extraction     content  extraction  is  a  form  of  metadata  extraction  whereby  various  computing  techniques  are   used  to  extract  information  from  the  information  resource  itself.  in  other  words,  these  techniques   do  not  rely  on  the  identification  of  relevant  meta-­‐tags  for  the  population  of  metadata  values.  an   example  of  this  technique  is  the  kea  application,  a  program  developed  at  the  new  zealand  digital   library  that  uses  machine  learning,  term  frequency-­‐inverse  document  frequency  (tf.idf)  and   first-­‐occurrence  techniques  to  identify  and  assign  key  phrases  from  the  full  text  of  documents.8   the  major  advantage  of  this  type  of  technique  is  that  the  extraction  of  metadata  can  be  done   independently  of  the  quality  of  metadata  associated  with  any  given  information  resource.  another   example  of  a  tool  utilizing  this  technique  is  the  open  text  summarizer,  an  open-­‐source  program   that  offers  the  capability  of  reading  a  text  and  extracting  important  sentences  to  create  a  summary   as  well  as  to  assign  keywords.  figure  3  provides  a  screenshot  of  what  a  summarized  text  might   look  like  using  the  open  text  summarizer.           figure  3.  open  text  summarizer:  sample  summary  of  text.   another  form  of  this  technique  often  relies  on  the  predictable  structure  of  certain  types  of   documents  to  identify  candidate  values  for  metadata  elements.  for  instance,  because  of  the   reliable  format  of  scholarly  research  papers—which  generally  include  a  title,  author,  abstract,   introduction,  conclusion,  and  reference  sections  in  predictable  ways—this  format  can  be  exploited   by  machines  to  extract  metadata  values  from  them.  several  projects  have  been  able  to  exploit  this   technique  in  combination  with  machine  learning  algorithms  to  extract  various  forms  of  metadata.     
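the flavor of this structure-based approach can be suggested with a small, hypothetical python sketch; the rules below (first non-empty line as title, the following line as byline, and the text between an "abstract" heading and the next heading as description) are simplified assumptions, whereas the projects described next combine rules of this kind with trained classifiers.

    # an illustrative, rule-based extractor that exploits the predictable
    # structure of a research paper supplied as plain text. the heuristics
    # here are deliberately naive and are assumptions for the example only.
    import re

    def extract_structural_metadata(text):
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        record = {}
        if lines:
            record["title"] = lines[0]        # heuristic: first non-empty line
        if len(lines) > 1:
            record["creator"] = lines[1]      # heuristic: byline follows the title
        # heuristic: the abstract runs from the "abstract" heading to the next heading
        match = re.search(r"\babstract\b\s*(.+?)\s*(?:\bintroduction\b|\bkeywords\b)",
                          text, flags=re.IGNORECASE | re.DOTALL)
        if match:
            record["description"] = " ".join(match.group(1).split())
        return record

    sample = """a study of metadata quality
    jane smith, example university

    abstract
    this paper examines record quality in three repositories.

    introduction
    ..."""
    print(extract_structural_metadata(sample))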
for  instance,  in  the  randkte  project,  optical  character  recognition  software  was  used  to  scan  a   large  quantity  of  legal  documents  from  which,  because  of  the  regularity  of  the  documents’     information  technology  and  libraries  |  september  2015       30   structure,  structural  metadata  such  as  chapter,  section,  and  page  number  could  be  extracted.9  in   contrast,  the  kovacevic’s  project  used  the  predictable  structure  of  scholarly  articles,  converting   documents  from  pdf  to  html  files  while  preserving  the  formatting  details  and  used  classification   algorithms  to  extract  metadata  regarding  title,  author,  abstract,  and  keywords,  among  other   elements.10   table  2  lists  the  tools  that  support  content  extraction  either  as  the  sole  technique  or  as  one  of  a   suite  of  techniques  used  to  generate  metadata  from  resources.  of  the  thirty-­‐nine  tools  evaluated   for  this  study,  twenty  tools  support  some  form  of  content  extraction.   tool  name   location   techniques   functions/features   apache  poi— text  extractor   http://poi.apache.org/download.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   apache  poi  provides  basic  text   extraction  for  all  project   supported  file  formats.  in   addition  to  the  (plain)  text,   apache  poi  can  access  the   metadata  associated  with  a  given   file,  such  as  title  and  author.     apache   standol   https://stanbol.apache.org/     content  extractor;   automatic  indexer   extracts  semantic  metadata  from   pdf  and  text  files.  can  apply   extracted  terms  to  ontologies.   apache  tika   http://tika.apache.org/     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   built  on  apache  poi,  the  apache   tika  toolkit  detects  and  extracts   metadata  and  text  content  from   various  documents.   biblio  citation   parser   http://search.cpan.org/~mjewell/   biblio-­‐citation-­‐parser-­‐1.10/     content  extractor   a  set  of  modules  for  citation   parsing.   catmdedit   http://catmdedit.sourceforge.net/     content  extractor   catmdedit  allows  the  automatic   creation  of  metadata  for   collections  of  related  resources,   in  particular  spatial  series  that   arise  as  a  result  of  the   fragmentation  of  geometric   resources  into  datasets  of   manageable  size  and  similar   scale.   crossref   http://www.crossref.org/   simpletextquery/     content  extractor   this  web  service  returns  digital   object  identifiers  for  inputted   references.     data   fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents  and  first   extracts  information  contained  in   meta-­‐tags.  if  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other   techniques  to  assign  values.   includes  a  focused  web  crawler   that  can  target  websites   concerning  a  specific  subject.       
evaluation  of  semi-­‐automatic  metadata  generation  tools|  park  and  brenza     doi:  10.6017/ital.v34i3.5889   31   embedded   metadata   extraction   tool  (emet)   http://www.artstor.org/global/g   -­‐html/download-­‐emet-­‐public.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   emet  is  a  tool  designed  to  extract   metadata  embedded  in  jpeg  and   tiff  files.   freecite   http://freecite.library.brown.edu/     content  extractor   free  parsing  tool  for  the   extraction  of  reference   information.  can  be  downloaded   or  used  as  a  web  service.     general   architecture   for  text   engineering   (gate)   http://gate.ac.uk/overview.html     content  extractor;   automatic  indexer;   natural  language  processor  and   information  extractor.   kea     http://www.nzdl.org/kea/index_old   .html#download   content  extractor;   automatic  indexer   analyzes  the  full  texts  of   resources  and  extracts   keyphrases.  keyphrases  can  also   be  mapped  to  customized   ontologies  or  controlled   vocabularies  for  subject  term   assignment.   metagen   http://www.codeproject.com/articles /41910/metagen-­‐a-­‐project   -­‐metadata-­‐generator-­‐for-­‐visual-­‐st     content  extractor;   automatic  indexer   used  to  build  a  metadata   generator  for  silverlight  and   desktop  clr  projects,  metagen   can  be  used  as  a  replacement  for   static  reflection  (expression   trees),  reflection  (walking  the   stack),  and  various  other  means   for  deriving  the  name  of  a   property,  method,  or  field.     metagenerator   http://extensions.joomla.org/   extensions/site-­‐management/seo-­‐a   -­‐metadata/meta-­‐data/11038   content  extractor   a  plugin  that  automatically   generates  description  and   keyword  meta-­‐tags  by  pulling   text  from  joomla  content.  with   this  plugin  you  can  also  control   some  title  options  and  add  url   meta-­‐tags.     ont-­‐o-­‐mat   http://projects.semwebcentral.org/   projects/ontomat/     content  extractor   assists  user  with  annotation  of   websites  that  are  semantic  web-­‐ compliant.  may  now  include  a   feature  that  automatically   suggests  portions  of  the  website   to  annotate.   open  text   summarizer   http://libots.sourceforge.net/   content  extractor   extracts  pertinent  sentences  from   a  resource  to  build  a  free  text   description.     information  technology  and  libraries  |  september  2015       32   parscit   http://wing.comp.nus.edu.sg/parscit/ #ws     content  extractor   open-­‐source  string-­‐parsing   package  for  the  extraction  of   reference  information  from   scholarly  articles.   repomman   http://www.hull.ac.uk/esig/   repomman/index.html   meta-­‐tag  harvester;   content  extractor;   extrinsic  auto-­‐ generator   automatically  extracts  various   elements  for  documents   uploaded  to  fedora  such  as   author,  title,  description,  and  key   words,  among  others.  results  are   presented  to  user  for  review.   simple   automatic   metadata   generation   interface   (samgi)   http://hmdb.cs.kuleuven.be/amg/   download.php   content  extractor;   extrinsic  auto-­‐ generator   a  suite  of  tools  that  is  able  to   automatically  extract  metadata   elements  such  as  key  phrase  and   language  from  documents  as  well   as  from  the  context  in  which  a   document  exists.     
termine   http://www.nactem.ac.uk/software/termine/   content extractor   extracts keywords from texts through c-value analysis and acromine, an acronym identifier and dictionary. available as a free web service for academic use.
yahoo content analysis api   https://developer.yahoo.com/contentanalysis/   content extractor; automatic indexer   the content analysis web service detects entities/concepts, categories, and relationships within unstructured content. it ranks those detected entities/concepts by their overall relevance, resolves them if possible into wikipedia pages, and annotates tags with relevant metadata.

table 2. semi-automatic tools that support content extraction.

automatic indexing

in the same way as content extraction, automatic indexing involves the use of machine learning and rule-based algorithms to extract metadata values from within information resources themselves, rather than relying on the content of meta-tags applied to resources. however, this technique also involves the mapping of extracted metadata terms to controlled vocabularies such as the library of congress subject headings (lcsh), the getty thesaurus of geographic names (tgn), or the library of congress name authority file (lcnaf), or to domain-specific or locally developed ontologies. thus, in this technique, researchers use classifying and clustering algorithms to extract relevant metadata from texts. term-frequency statistics, or tf.idf, which determines the likelihood of keyword applicability through a term's relative frequency within a given document as opposed to its relative infrequency in related documents, are commonly used in this technique.

projects such as johns hopkins university's automatic name authority control (anac) tool utilize this technique to extract the names of composers within its sheet music collections and to assign the authorized form of those names based on comparisons with the lcnaf.11 erbs et al. also use this technique to extract key phrases from german educational documents, which are then used to assign index terms, thereby increasing the degree to which related documents are collocated within the repository and the consistency of subject term application.12

table 3 lists the tools that support automatic indexing either as the sole technique or as one of a suite of techniques used to generate metadata from resources. of the thirty-nine tools evaluated for this study, seven tools support some form of automatic indexing.

tool name   location   techniques   functions/features
apache poi—text extractor   http://poi.apache.org/download.html   content extractor; meta-tag harvester; extrinsic auto-generator   apache poi provides basic text extraction for all project supported file formats. in addition to the (plain) text, apache poi can access the metadata associated with a given file, such as title and author.
apache  tika   http://tika.apache.org/     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   built  on  apache  poi,  the  apache   tika  toolkit  detects  and  extracts   metadata  and  text  content  from   various  documents.   data   fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents  and  first   extracts  information  contained  in   meta-­‐tags.  if  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other  techniques   to  assign  values.  includes  a   focused  web  crawler  that  can   target  websites  concerning  a   specific  subject.     digital  record   object   identification   (droid)   http://www.nationalarchives.gov.uk/   information-­‐management/manage   -­‐information/preserving-­‐digital   -­‐records/droid/     extrinsic  auto-­‐ generator   droid  is  a  software  tool   developed  by  the  national   archives  to  perform  automated   batch  identification  of  file  formats.   dspace   http://www.dspace.org/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   automatically  extracts  technical   information  regarding  file  format   and  size.  can  also  extract  some   information  from  meta-­‐tags.   editor-­‐ converter   dublin  core   metadata   http://www.library.kr.ua/dc/   dceditunie.html   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  converting  them  to  dublin   core.     information  technology  and  libraries  |  september  2015       34   embedded   metadata   extraction   tool  (emet)   http://www.artstor.org/global/g   -­‐html/download-­‐emet-­‐public.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   emet  is  a  tool  designed  to  extract   metadata  embedded  in  jpeg  and   tiff  files.   firefox  dublin   core  viewer   extension   http://www.splintered.co.uk/   experiments/73/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  displaying  them  to  dublin   core.   jhove   http://jhove.sourceforge.net/   #implementation     extrinsic  auto-­‐ generator   extracts  metadata  regarding  file   format  and  size  as  well  as   validating  the  structure  of  the   identified  file  format.   national   library  of   new   zealand— metadata   extraction   tool   http://meta-­‐extractor   .sourceforge.net/     extrinsic  auto-­‐ generator   developed  by  the  national  library   of  new  zealand  to   programmatically  extract   preservation  metadata  from  a   range  of  file  formats  like  pdf   documents,  image  files,  sound   files,  microsoft  office  documents,   and  others.   omeka   http://omeka.org/     extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts  technical   information  regarding  file  format   and  size.     repomman   http://www.hull.ac.uk/esig/   repomman/index.html   meta-­‐tag  harvester;   content  extractor;   extrinsic  auto-­‐ generator   automatically  extracts  various   elements  for  documents  uploaded   to  fedora  such  as  author,  title,   description,  and  key  words,   among  others.  results  are   presented  to  user  for  review.   
simple automatic metadata generation interface (samgi)   http://hmdb.cs.kuleuven.be/amg/download.php   content extractor; extrinsic auto-generator   a suite of tools that is able to automatically extract metadata elements such as keyphrase and language from documents as well as from the context in which a document exists.

table 3. semi-automatic tools that support automatic indexing.

text and data mining

the two methods discussed above, content extraction and automatic indexing, rely on text- and data-mining techniques for the automatic extraction of metadata. in other words, these methods draw on machine-learning algorithms; on statistical analysis of term frequencies; on clustering techniques, which examine the frequency of term use across documents rather than relying on controlled vocabularies; and on classifying techniques, which exploit the conventional structure of documents, for the semi-automatic generation of metadata. because of the complexity of these techniques, few tools have been fully developed for application within real-world library settings. rather, most uses of these techniques have been developed to solve the problems of automatic metadata generation within the context of specific research projects.

there are two reasons for this. one is that, as many researchers have noted, the effectiveness of machine-learning techniques depends on the quality and quantity of the training data used to teach the system.13, 14, 15 because of the number and diversity of subject domains, as well as the sheer variety of document formats, many applications are designed to address the metadata needs of very specific subject domains and very specific types of documents. this is a point that kovacevic et al. make in stating that machine-learning techniques generally work best for documents of a similar type, like research papers.16 another issue, especially as it applies to automatic indexing, is the fact that, as gardner notes, controlled vocabularies such as the lcsh are too complicated and diverse in structure to be applied through semi-automatic means.17 although some open-source tools such as data fountains have made efforts to overcome this complexity, projects like it are the exception rather than the rule. these issues signify the difficulty of developing sophisticated semi-automatic metadata generation tools that have general applicability across a wide range of subject domains and format types. nevertheless, for semi-automatic metadata generation tools to become a reality for the library community, such complexity will have to be overcome.

there are, however, some tools that have broader applicability or can be customized to meet local needs. for instance, the kea keyphrase extractor offers the option of building local ontologies, or applying available ones, that can be used to refine the extraction process.
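the statistical core that such tools share can also be suggested in a few lines of python. the sketch below is a toy illustration rather than a reproduction of kea or data fountains: it scores candidate terms with tf.idf against a small assumed corpus and then checks the top-ranked terms against a tiny stand-in for a controlled vocabulary. the corpus, document, and vocabulary are invented for the example.

    # illustrative sketch: tf.idf term scoring plus naive controlled-vocabulary
    # matching. the corpus and vocabulary below are toy assumptions.
    import math
    import re
    from collections import Counter

    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def tf_idf_terms(document, corpus, top_n=5):
        doc_counts = Counter(tokenize(document))
        n_docs = len(corpus)
        scores = {}
        for term, tf in doc_counts.items():
            df = sum(1 for other in corpus if term in set(tokenize(other)))
            idf = math.log((1 + n_docs) / (1 + df)) + 1   # smoothed idf
            scores[term] = tf * idf
        return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]

    def map_to_vocabulary(terms, vocabulary):
        # exact matching against a tiny stand-in for lcsh or a local ontology;
        # unmatched terms are simply kept as uncontrolled keywords
        return {"subject": [t for t in terms if t in vocabulary],
                "keyword": [t for t in terms if t not in vocabulary]}

    corpus = ["library metadata records and cataloging practice",
              "machine learning for text classification",
              "digital repositories and metadata harvesting"]
    document = "automatic metadata generation for digital library repositories"
    vocabulary = {"metadata", "cataloging", "repositories"}
    print(map_to_vocabulary(tf_idf_terms(document, corpus), vocabulary))

even this toy version shows why stopword handling, training data, and vocabulary structure matter so much in practice.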
perhaps the most promising of all is the above-mentioned data fountains suite of tools developed by the university of california. the data fountains suite incorporates almost every one of the semi-automatic metadata techniques described in this study, including sophisticated content extraction and automatic indexing features. it also provides several ways to customize the suite in order to meet local needs.

extrinsic data auto-generation

extrinsic data auto-generation is the process of extracting metadata about an information resource that is not contained within the resource itself. extrinsic data auto-generation can involve the extraction of technical metadata such as file format and size but can also include the extraction of more complicated features such as the grade level of an educational resource or the intended audience for a document. the process of extracting technical metadata is perhaps one area of semi-automatic metadata generation that is in a high state of development; it is included in most cmss, such as dspace,18 as well as in other more sophisticated tools such as harvard's jhove, which can recognize at least twelve different kinds of textual, audio, and visual file formats.19 on the other hand, the problem of semi-automatically generating other types of extrinsic metadata, like grade level, is among the most difficult to solve.

as leibbrandt et al. note in their analysis of the use of artificial intelligence mechanisms to generate subject metadata for a repository of educational materials at education services australia, the extraction of extrinsic metadata such as grade level was much more difficult than the extraction of keywords because of the lack of information surrounding a resource's context within the resource itself.20 this difficulty can also be seen in the absence of tools that support the extraction of extrinsic data beyond those that harvest manually created metadata or extract technical metadata.

table 4 lists the tools that support extrinsic data auto-generation either as the sole technique or as one of a suite of techniques used to generate metadata from resources. of the thirty-nine tools evaluated for this study, thirteen tools support some form of extrinsic data auto-generation.

tool name   location   techniques   functions/features
apache poi—text extractor   http://poi.apache.org/download.html   content extractor; meta-tag harvester; extrinsic auto-generator   apache poi provides basic text extraction for all project supported file formats. in addition to the (plain) text, apache poi can access the metadata associated with a given file, such as title and author.
apache tika   http://tika.apache.org/   content extractor; meta-tag harvester; extrinsic auto-generator   built on apache poi, the apache tika toolkit detects and extracts metadata and text content from various documents.
data   fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents  and  first   extracts  information  contained  in   meta-­‐tags.  if  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other  techniques   to  assign  values.  includes  a   focused  web  crawler  that  can   target  websites  concerning  a   specific  subject.     digital  record   object   identification   (droid)   http://www.nationalarchives.gov.uk/   information-­‐management/manage   -­‐information/preserving-­‐digital   -­‐records/droid/     extrinsic  auto-­‐ generator   droid  is  a  software  tool   developed  by  the  national   archives  to  perform  automated   batch  identification  of  file  formats.   dspace   http://www.dspace.org/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   automatically  extracts  technical   information  regarding  file  format   and  size.  can  also  extract  some   information  from  meta-­‐tags.   editor-­‐ converter   dublin  core   metadata   http://www.library.kr.ua/dc/   dceditunie.html   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  converting  them  to  dublin   core.   embedded   metadata   extraction   tool  (emet)   http://www.artstor.org/global/g   -­‐html/download-­‐emet-­‐public.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   emet  is  a  tool  designed  to  extract   metadata  embedded  in  jpeg  and   tiff  files.   firefox  dublin   core  viewer   extension   http://www.splintered.co.uk/   experiments/73/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  displaying  them  to  dublin   core.   jhove   http://jhove.sourceforge.net/   extrinsic  auto-­‐ extracts  metadata  regarding  file     evaluation  of  semi-­‐automatic  metadata  generation  tools|  park  and  brenza     doi:  10.6017/ital.v34i3.5889   37   #implementation     generator   format  and  size  as  well  as   validating  the  structure  of  the   identified  file  format.   national   library  of   new   zealand— metadata   extraction   tool   http://meta-­‐extractor   .sourceforge.net/     extrinsic  auto-­‐ generator   developed  by  the  national  library   of  new  zealand  to   programmatically  extract   preservation  metadata  from  a   range  of  file  formats  like  pdf   documents,  image  files,  sound   files,  microsoft  office  documents,   and  others.   omeka   http://omeka.org/     extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts  technical   information  regarding  file  format   and  size.     repomman   http://www.hull.ac.uk/esig/   repomman/index.html   meta-­‐tag  harvester;   content  extractor;   extrinsic  auto-­‐ generator   automatically  extracts  various   elements  for  documents  uploaded   to  fedora  such  as  author,  title,   description,  and  key  words,   among  others.  results  are   presented  to  user  for  review.   simple   automatic   metadata   generation   interface   (samgi)   http://hmdb.cs.kuleuven.be/amg/   download.php   content  extractor;   extrinsic  auto-­‐ generator   a  suite  of  tools  that  is  able  to   automatically  extract  metadata   elements  such  as  keyphrase  and   language  from  documents  as  well   as  from  the  context  in  which  a   document  exists.       
table  4.  semi-­‐automatic  tools  that  support  extrinsic  data  auto-­‐generation.   social  tagging     social  tagging  is  now  a  familiar  form  of  subject  metadata  generation  although,  as  mentioned   previously,  it  is  not  properly  a  form  of  automatic  metadata  generation.  nevertheless,  because  of   the  relatively  low  cost  in  generating  and  maintaining  metadata  through  social  tagging  and  its   current  widespread  popularity,  a  few  projects  have  attempted  to  utilize  such  data  to  enhance   repositories.  for  instance,  linstaedt  et  al.  use  sophisticated  computer  programs  to  analyze  still   images  found  within  flickr  and  then  use  this  analysis  to  process  new  images  and  to  propagate   relevant  user  tags  to  those  images.21     in  a  slightly  more  complicated  example,  liu  and  qin  employ  machine-­‐learning  techniques  to   initially  process  and  assign  metadata,  including  subject  terms,  to  a  repository  of  documents   related  to  the  computer  science  profession.22  however,  this  proof  of  concept  project  also  permits   users  to  edit  the  fields  of  the  metadata  once  established.  the  user-­‐edited  tags  are  then   reprocessed  by  the  system  with  the  hope  of  improving  the  machine-­‐learning  mechanisms  of  the   database,  creating  a  kind  of  feedback  loop  for  the  system.  specifically,  the  improved  tags  are  used   by  the  system  to  suggest  and  assign  subject  terms  for  new  documents  as  well  as  to  improve   subject  description  of  existing  documents  within  the  repository.  although  these  two  examples   provide  instances  of  sophisticated  reprocessing  of  social  tag  metadata,  these  capabilities  do  not   seem  to  be  present  in  open-­‐source  tools  at  this  time.  nevertheless,  social  tagging  capabilities  are   offered  by  many  cmss  such  as  omeka.  these  social  tagging  capabilities  may  offer  a  means  to   enhance  subject  access  to  holdings.       information  technology  and  libraries  |  september  2015       38   table  5  below  lists  the  tools  that  support  social  tagging  either  as  the  sole  technique  or  as  one  of  a   suite  of  techniques  used  to  generate  metadata  from  resources.  of  the  thirty-­‐nine  tools  evaluated   for  this  study,  two  tools  support  some  form  of  social  tagging.   tool  name   location   techniques   functions/features   dspace   http://www.dspace.org/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts   technical  information   regarding  file  format  and  size.   can  also  extract  some   information  from  meta-­‐tags.   omeka   http://omeka.org/     extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts   technical  information   regarding  file  format  and  size.     table  5.  semi-­‐automatic  tools  that  support  social  tagging.   challenges  to  implementation   although  semi-­‐automatic  metadata  generation  tools  offer  many  benefits,  especially  in  regards  to   streamlining  the  metadata-­‐creation  process,  there  are  significant  barriers  to  the  widespread   adoption  and  implementation  of  these  tools.  one  problem  with  semi-­‐automatic  metadata   generation  tools  is  that  many  are  developed  locally  to  address  the  specific  needs  of  a  given  project   or  as  part  of  academic  research.  
this local, highly focused milieu for development means that the general applicability of the tools is potentially diminished. the local context may also hinder the widespread adoption of applications that would otherwise result in strong communities of users and provide further support for the development of applications in an open-source context. because of the highly specific nature of many current tools, their relevance to real-world processes of metadata creation within the broader context of libraries' diverse information management needs is not accounted for.

additionally, many tools are focused on solving one or, at most, a few metadata generation problems. for instance, the kea application is designed to use machine-learning techniques for the sole purpose of extracting keywords, the open text summarizer is limited to automatic extraction of summary descriptions and keywords, and editor-converter dublin core is designed to extract information in html meta-tags and map it to dublin core elements. because of the piecemeal development of semi-automatic generation tools, any comprehensive package of tools will require significant effort from the implementer to coordinate the selected applications and to produce results in a single output. this is, to say the least, a daunting task.

furthermore, a high degree of technical skill is required to implement these complex tools. many of the more sophisticated tools used to semi-automatically generate metadata, such as data fountains, kea, and apache stanbol, require competence in a variety of programming languages; significant knowledge of c++, python, and java is required to implement these systems properly. the high degree of technical knowledge needed to implement these tools means that many libraries and other institutions may not have the resources to begin implementing them, let alone incorporating them into the daily workflows of the metadata creation process. further, this high degree of technical expertise may require libraries to seek assistance outside of the library. in other words, librarians may need to build strong collaborative relationships with those who have the technical skills, expertise, and credentials to implement and maintain these complicated tools. as vellucci et al. note with regard to their development of the metadata education and research information commons (meric), a metadata-driven clearinghouse of education materials related to metadata, elaborate and multidisciplinary partnerships need to be firmly established for the ultimate success of such projects, including the sustained support of the highest levels of administration.23 these types of partnerships may be difficult to establish and maintain for the sustained implementation of complicated tools.
additionally, sustainable development of tools, especially with regard to the funding needed for continued development of open-source applications, appears to be a significant barrier to implementation. for instance, at the time of this writing, many of the tools that were touted in the literature as being most promising, such as dc dot, reggie, and describethis, are no longer available for implementation. beyond the fact that discontinuation hurts the potential adoption and continued development of semi-automatic tools within real-world library and other information settings, there is also the problem that settings that have in fact adopted tools may lose the technical support of a central developer and community of users. thus discontinuation may result in higher rates of tool obsolescence and increase the potential expenses of libraries that have implemented and then must change applications.

finally, the application of semi-automatic metadata tools remains relatively untested in real-world scenarios. as polfreman et al. note, most tests of automatic metadata generation tools have several problems, including small sample sizes, narrow scope of project domains, and experiments that lack true objectivity because systems are generally tested by their creators.24 for these reasons, libraries and other institutions may be reluctant to expend the resources needed to implement and fully integrate a complicated, promising, but ultimately untested, tool within their already strained workflows.

conclusion

semi-automatic metadata generation tools hold the promise of assisting information professionals with the management of ever-increasing quantities and types of information resources. using software that can create metadata records consistently and efficiently, semi-automatic metadata generation tools potentially offer significant cost and time savings. however, the full integration of these tools into the daily workflows of libraries and other information settings remains elusive. for instance, although many tools have been developed that address many of the more complicated aspects of semi-automatic metadata generation, including the extraction of information related to conceptually difficult areas of bibliographic description such as subject terms, open-ended resource descriptions, and keyword assignment, many of these tools are relevant only at the project level and are not applicable to the broader contexts needed by libraries. in other words, the current array of tools exists to solve experimental problems but has not been developed to the point that the library community can implement it in a meaningful way.

perhaps the greatest area of difficulty lies in the fact that most tools only address part of the problem of semi-automatic metadata generation, providing solutions to the semi-automatic generation of one or a few bibliographic elements but not the full range of elements.
this means that for libraries to truly have a comprehensive tool set for the semi-automatic generation of metadata records, significant local efforts will be required to integrate the various tools into a working whole. couple this issue with the instability of tool development and maintenance, and it appears that the library community may lack incentive to invest already strained and limited resources in the adoption of these tools.

thus it appears that a number of steps will need to be taken before the library community can seriously consider the incorporation of semi-automatic metadata generation tools within its daily workflows. first, the integration of these various tools into a coherent set of applications is likely the next step in the development of viable semi-automatic metadata generation. since most small libraries likely do not have the resources required to integrate these disparate tools, let alone incorporate them within existing library systems, a single package of tools will be needed simply from a resource perspective. second, considering the high level of technical expertise needed to implement the current array of tools, the integration must be accomplished in such a way as to foster implementation, utilization, and maintenance with a minimum of technical expertise. for instance, if an integrated set of tools that functioned across a wide range of subject domains and format types could be developed, the suite might be akin to the cmss currently employed by many libraries. furthermore, with a suite of tools that is relatively easy to use, adoption would likely increase. this might result in a stable community of users that would foster the further development of the tools in a sustainable manner. a comprehensive, relatively easy-to-implement set of tools might also foster independent testing of those tools; such independent testing is needed to provide an objective basis for tool evaluation and further development.

finally, designing automated workflows tailored to the subject domain and types of resources seems to be an essential step for integrating semi-automatic metadata generation tools into metadata creation. such workflows may delineate the data elements that can be generated by an automated meta-tag extractor from the data elements that need to be refined and manually created by cataloging and metadata professionals. to develop, maximize, and sustain semi-automatic metadata generation workflows, administrative support for finance, human resources, and training is critical.

thus, although many of the technical aspects of semi-automatic metadata generation are well on their way to being solved, many other barriers exist that might limit adoption. further, these barriers may have a negative influence on the continued, sustainable development of semi-automatic metadata generation tools.
nevertheless, there is a critical need for the library community to find ways to manage the recent explosion of data and information in cost-effective and efficient ways. semi-automatic metadata generation holds the promise to do just that.

acknowledgement

this study was supported by the institute of museum and library services.

references

1. jane greenberg, kristina spurgin, and abe crystal, "final report for the amega (automatic metadata generation applications) project."
2. sue ann gardner, "cresting toward the sea change," library resources & technical services 56, no. 2 (2012): 64–79, http://dx.doi.org/10.5860/lrts.56n2.64.
3. for details, see jung-ran park and caimei lu, "application of semi-automatic metadata generation in libraries: types, tools, and techniques," library & information science research 31, no. 4 (2009): 225–31, http://dx.doi.org/10.1016/j.lisr.2009.05.002.
4. erik mitchell, "trending tech services: programmatic tools and the implications of automation in the next generation of metadata," technical services quarterly 30, no. 3 (2013): 296–10, http://dx.doi.org/10.1080/07317131.2013.785802.
5. jane greenberg, "metadata extraction and harvesting: a comparison of two automatic metadata generation applications," journal of internet cataloging 6, no. 4 (2004): 59–82, http://dx.doi.org/10.1300/j141v06n04_05.
6. malcolm polfreman, vanda broughton, and andrew wilson, "metadata generation for resource discovery," jisc, 2008, http://www.jisc.ac.uk/whatwedo/programmes/resourcediscovery/autometgen.aspx.
7. park and lu, "application of semi-automatic metadata generation in libraries."
8. kea automatic keyphrase extraction homepage, http://www.nzdl.org/kea/index_old.html.
9. wilhelmina randtke, "automated metadata creation: possibilities and pitfalls," serials librarian 64, no. 1–4 (2013): 267–84, http://dx.doi.org/10.1080/0361526x.2013.760286.
10. aleksandar kovačević et al., "automatic extraction of metadata from scientific publications for cris systems," electronic library and information systems 45, no. 4 (2011): 376–96, http://dx.doi.org/10.1108/00330331111182094.
11. mark patton et al., "toward a metadata generation framework: a case study at johns hopkins university," d-lib magazine 10, no. 11 (2004), http://www.dlib.org/dlib/november04/choudhury/11choudhury.html.
12. nicolai erbs, iryna gurevych, and marc rittberger, "bringing order to digital libraries: from keyphrase extraction to index term assignment," d-lib magazine 19, no. 9/10 (2013), http://www.dlib.org/dlib/september13/erbs/09erbs.html.
13. polfreman, broughton, and wilson, "metadata generation for resource discovery."
14. randtke, "automated metadata creation."
15. xiaozhong liu and jian qin, "an interactive metadata model for structural, descriptive, and referential representation of scholarly output," journal of the association for information science & technology 65, no. 5 (2014): 964–83, http://dx.doi.org/10.1002/asi.23007.
kovačević  et  al.,  “automatic  extraction  of  metadata  from  scientific  publications  for  cris   systems.”   17.    gardner,  “cresting  toward  the  sea  change.”   18.    mary  kurtz,  “dublin  core,  dspace,  and  a  brief  analysis  of  three  university  repositories,”   information  technology  &  libraries  29,  no.  1  (2010):  40–46,   http://dx.doi.org/10.6017/ital.v29i1.3157.     19.    “jhove  -­‐  jstor/harvard  object  validation  environment,”  jstor,     http://jhove.sourceforge.net.   20.    richard  leibbrandt  et  al.,  “smart  collections:  can  artificial  intelligence  tools  and  techniques   assist  with  discovering,  evaluating  and  tagging  digital  learning  resources?”  international   association  of  school  librarianship:  selected  papers  from  the  annual  conference  (2010).   21.    stefanie  lindstaedt  et  al.,  “automatic  image  annotation  using  visual  content  and   folksonomies,”  multimedia  tools  &  applications  42,  no.  1  (2009):  97–113,   http://dx.doi.org/10.1007/s11042-­‐008-­‐0247-­‐7.   22.    liu  and  qin,  “an  interactive  metadata  model.”   23.    sherry  vellucci,  ingrid  hsieh-­‐yee,  and  william  moen,  “the  metadata  education  and  research   information  commons  (meric):  a  collaborative  teaching  and  research  initiative,”  education   for  information  25,  no.  3/4  (2007):  169–78.   24.    polfreman,  broughton,  and  wilson,  “metadata  generation  for  resource  discovery.”   27 automatic format recognition of marc bibliographic elements: a review and projection brett butler: butler associates, stanford, california. a review and discussion of the technique of automatic format recognition ( afr) of bibliographic data are presented. a comparison is made of the record-building facilities of the library of congress, the university of california (both afr techniques), and the ohio college library center (non-afr). a projection of a next logical generation is described. introduction the technique commonly identified as "format recognition" has more potential for radically changing the automation programs of libraries than any other technical issue today. while the development of marc has provided an international standard, and various computer developments provide increasingly lower operating costs, the investment in converting a catalog into machine-readable form has kept most libraries from integrating automated systems into their operations. the most expensive part of the conversion to machine-readable form has been the human editing required (generally by a cataloger) to identify the many variable portions of the marc-format cataloging record. a full cataloging record contains several hundred possible sections (or fields) in the marc format. research at the library of congress (lc) into this problem resulted in the concept of "format recognition" to reduce cataloging input costs. with the automatic format recognition ( afr) approach, an unedited cataloging entry is prepared (keypunched or otherwise converted to machine-readable form). then the afr computer program provides identification of the various elements of the catalog record through sophisticatedcomputer editing. a degree of human post-editing is generally assumed, but· the computer basically is assigned the responsibility of editing an un.(lae;n.1:itie~d block of text into a marc-format cataloging record. pioneering afr work at the library of congress is presently in use original cataloging input to the marc distribution service. 
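as a point of reference for the comparison that follows, the sketch below shows, in deliberately simplified python, the kind of transformation an afr program performs: an unedited block of cataloging text on one side, and a record with marc field tags and subfield codes assigned on the other. the sample entry is invented, and only a handful of fields (100 main entry, 245 title, 260 imprint, 300 collation) are shown.

# hypothetical example of the transformation an afr program performs:
# an unedited cataloging entry in, a tagged marc-style record out.
unedited_entry = (
    "smith, john a., 1932-\n"
    "an introduction to library automation / by john a. smith. --\n"
    "chicago : example press, 1973.\n"
    "xii, 245 p. ; 23 cm."
)

# the target structure: marc field tags with subfield codes, which the
# program must derive from the text above through computer editing.
recognized_record = {
    "100": {"a": "smith, john a.,", "d": "1932-"},                    # personal name main entry
    "245": {"a": "an introduction to library automation /",
            "c": "by john a. smith."},                                 # title statement
    "260": {"a": "chicago :", "b": "example press,", "c": "1973."},    # imprint
    "300": {"a": "xii, 245 p. ;", "c": "23 cm."},                      # collation
}

for tag, subfields in recognized_record.items():
    print(tag, " ".join(f"${code} {value}" for code, value in subfields.items()))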
this 28 journal of libm1·y automation vol. 7/1 march 1974 system is quite sophisticated because its output goal is a complete marc record with all fields, subfields, tags, and delimiters identified almost entirely through computer editing. the institute of library research (ilr) at the university of california, faced with the need to convert 800,000 catalog records to marc format, has developed a ·jess ambitious afr program which provides a level of identification sufficient to provide the desired book catalog bibliographic output, or to print catalog cards. . . · the aim of this paper is to examine these two afr strategies and consider their implications for input of two major classes of cataloging records: ( 1) lc or other cataloging records in standard card format; and ( 2) original cataloging not yet in card format. comparing the two afr strategies to an essentially non-afr format used at the ohio college library center for on-line ca:taloging input, we will propose a median strategy for original cataloging. format recognition ( ofr). the thesis is that differing strategies of input should be used for records already formatted into catalog card images and for those original cataloging items being input prior to card production. automatic format recognition an examination of the library of congress ( lc), university of california ( u c), ohio college library center ( oclc), and original format recognition ( ofr) strategies will show the operating differences .. a ·detailed field-by-field comparison of the nearly 500 distinct codes which can be identified in creation of a marc record is attached as appendix i. general comparisons can be made in several areas: input documents, manual coding, level of identification, input and processing costs, error correction, and flexibility in use. input documents-the lc/afr program operates from an uncoded typescript to a machine-readable record prepared through mt /st magnetic tape input. this typescript is, however, prepared from an lc cataloger's manuscript worksheet, in which thereis some inherent bibliographic order.· the lc/ afr program does not rely on this inherent order although its design takes advantage of the probable order in search strategies. lc/ afr could operate with keying of catalog cards, book catalog entries~ or any structure of bibliographic data. the uc program is designed more specifically to handle input of formatted catalog cards, and some of its afr strategy is based on· the sequence and physical indentation pattern on standard catalog cards. it would not work effectively on noncard format input without special recognition of some tagging conventions. the oclc program allows direct input to crt screen from any input docutnent; it requires complete identification of each cataloging field or subelement input. automatic fo1'mat recognition/butler 29 manual coding-lc/ afr requires minimal input coding. within the title paragraph, the title proper, the edition statement, and imprint are explicitly separated at input. series, subject, and other added entries are recognized initially from the roman and arabic numerals preceding them. aside jrom these items, virtually all marc fields are recognized by the computer editing program. uc/ afr inserts a code after the call number input, thus providing explicit identification at input. it also identifies each new indentation on the catalog card explicitly, thus implicitly identifying main entry, title, and certain other major cataloging blocks on the card. 
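two of the cues just mentioned, the numerals preceding tracings and the indentation pattern of a standard catalog card, can be illustrated with a short sketch. the regular expressions below are invented for illustration and fall far short of either production program.

import re

# hypothetical, much-simplified recognition cues of the kind described above.
def classify_tracings(tracing_lines):
    """separate subject tracings (arabic numerals) from other added
    entries (roman numerals), as at the bottom of an lc card."""
    subjects, added_entries = [], []
    for line in tracing_lines:
        if re.match(r"^\d+\.\s", line):             # e.g. "1. libraries--automation."
            subjects.append(re.sub(r"^\d+\.\s", "", line))
        elif re.match(r"^[ivxlc]+\.\s", line):      # e.g. "i. title."
            added_entries.append(re.sub(r"^[ivxlc]+\.\s", "", line))
    return subjects, added_entries

def split_card_blocks(card_text):
    """group a catalog-card image into blocks by indentation, the cue the
    uc program uses to locate main entry, title paragraph, and notes."""
    blocks, current = [], []
    for line in card_text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" ") and current:    # a new left-margin line starts a new block
            blocks.append(" ".join(current))
            current = []
        current.append(line.strip())
    if current:
        blocks.append(" ".join(current))
    return blocks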
the oclc input specifications require explicit coding, some of which is prompted by the crt screen. level of identification-lc/ afr provides the highest possible level of marc record identification, deriving practically every field, subfield, and other code if it is present in an individual cataloging record.~ in evaluation of this element of lc/ afr it should be realized that the needs of the library of congress in creating original marc records for nationwide distribution (and its own use) are much more sophisticated and complex than those of any individual user library or system. the uc/ afr approach reflects a more task-oriented approach, deriving a sufficient level of identification to separate major bibliographic elements. this technique is clearly sufficient to produce computer-generated catalog cards or similar output in a standard manner. however, ucjafr lacks several identifiers, such as specific delimitation of information in the imprint field, which would make feasible the use of its records for further computer-generated processes. the oclc input format is of variable level; many elements are optional and are noted with an asterisk in appendix i. at its most complete, the oclc format specifically excludes only a very few marc fields, most notably geographic area and bibliographic price. input and p1'ocessing costs-direct cost information has not been published for production costs of any of the format recognition systems. the library of congress has reported that ..... the format recognition technique is of considerable value in speeding up input and lowering the cost per record for processing."3 while formal reports have not been published, informed opinion has placed the cost of creation of a marc record at a level of $3.00 ± $.50. format recognition is credited with an increase in productivity of about one-third on input keying and an increase of over 80 percent in human editing/proofreading, and actual computer 0 a number of standard subdivi'sions of various fields were first announced as part of the marc format in the 5th edition of books: a marc format, which was published in 1972.1 consequently they are not specified in format recognition process for marc records, published in 1970, which was used as the reference for this paper.2 they are, however, clearly subfields which could be identified by expansions of afr. these elements are marked with a lower-case "r" in appendix i. 30 journal of libtary automation vol. 7/1 march 1974 processing times approximate those achieved with earlier library of congress marc processing programs.4 lt would seem that afr may have lowered library of congress marc processing costs to the level of $2.00 ± $.50. in the final report of the recon pilot project, cost simulation projections for full editing and format recognition editing were given as $3.46 and $3.06 per record, respectively.6 while full cost information has not been derived for the uc/ afr program itself, figures have been informally reported at library automation meetings indicating that the cost of record creation was approximately $1.00 per entry. included in this figure is computer editing of name and other entries against a computerized authority file, which is done manually in the lc/ afr system. this program is undeniably the least-cost effort to date providing a marc-format bibliographic record. no cost data are provided on the oclc on-line input system. 
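the recon simulation figures quoted above imply a saving of roughly 12 percent per record for format recognition editing over full editing. the short calculation below scales that difference to a file on the order of the 800,000-record conversion mentioned earlier; the scale-up is purely illustrative, since the uc project reported its own, much lower, unit cost of about $1.00 per entry.

# recon pilot project cost simulations quoted above, per record
full_editing_cost = 3.46
format_recognition_cost = 3.06

saving_per_record = full_editing_cost - format_recognition_cost      # about $0.40
relative_saving = saving_per_record / full_editing_cost              # about 11.6 percent

# purely illustrative scale-up to a file of the size uc faced
records = 800_000
print(f"saving per record: ${saving_per_record:.2f}")
print(f"relative saving:   {relative_saving:.1%}")
print(f"saving over {records:,} records: ${saving_per_record * records:,.0f}")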
it can be observed that the coding required is quite similar to the pre-afr system in use at the library of congress itself, and that on-line crt input had been evaluated at lc as a higher-cost input technique than the magnetictape typewriters currently providing marc input. lc is considering, though, on-line crt access for subsequent human editing of the marc record created through off-line input and afr editing. error rate and correction-any afr strategy, with present state of the art, generates some error above the normal keying rate observed with edited records. the strategy aims for lowest overall cost by catching these errors in a postprocessing edit which must be performed even for records edited prior to input. the library of congress reports, "the format recognition production rate of 8.4 records per hour (proofing only) . . . is slightly less than that (about 9.2 per hour) for proofing edited records. with format recognition records, the editors must be aware of the errors made by the program ... as well as keying mistakes."6 the savings in prekeyboard editing and increased keying rates more than make up for this slight decrease in postprocessing editing. at the library of congress, where afr is used for production of marc records, a full editing process aims at 100 percent accuracy of input. while such a goal is statistically unreachable, considerable effort· is expended by the marc distribution service to provide the most accurate output possible. from a systems perspective, errors existing in marc records are perhaps less reprehensible than errors in printed bibliographic output, simply because the distributed marc record can be updated by subsequent distribution of a "correction" record. it should be noted that some marc subscribers have voiced concern about the increased percentage of "correction" records, which the library of congress indicates come primarily from cataloging changes rather than input edit errors. the uc/ afr program clearly takes a statistical approach to bibliographic element input and processing. shoffner has indicated that the scale of the 1,000,000 record input project caused a reevaluation of the feasibility of traditional procedures. 7 the result is, in the uc/ afr implementaautomatic format recognition/butler 31 tion, a marc record essentially devoid of human editing. for a smaller scale of production, the uc approach could be combined with post-editing such as that used at lc to increase overall file accuracy. in passing, however, it should be noted that rather sophisticated verification techniques are used in the uc/ afr approach which could be of value in future approaches. these include, for instance, comparison of all words against a machine-readable english-language dictionary; words not found in the dictionary are output for manual editing as suspected keypunch errors. little information is available on the error rates and corrections in the oclc system. however, most records keyed to the oclc system are for a local member's catalog card production, so feedback is provided and presumably errors are corrected through re-inputting to obtain a proper set of catalog cards. there is no central control on the quality of locally entered oclc records at present, except for the encoding standards developed by oclc. flexibility in use-a number of considerations are appropriate herehow many types of format (catalog cards, worksheets, etc. 
) can be used as input, how many possible outputs can be developed from the derived marc format, how adaptable is the system to remote and multiple input locations, how many special equipment restrictions are there? the lc/ afr program is clearly the most flexible in ability to accept varying inputs and provide a flexible output. it is, however, not capable of any authority-file editing at present (this is done manually against lc's master files before input). while the input form could be used rather easily at remote locations, the marc afr programs themselves are not available for use outside the library of congress. the uc/afr program provides a rather minimal set of cataloging element subfields but does provide more sophisticated textual editing within the program. it is quite adaptable to remote input as long as the original "worksheet" is in catalog card format, a restriction which in effect requires a preinput human editing step for original cataloging input. the marc format provided would not be sufficient for some currently operating programs using the full marc format, but is quite sufficient for most bibliographic outputs. the oclc input program is dependent on visual editing at the time of crt keying. its flexibility in input is considerable, and outputs can approach a full marc record if all optional fields are identified. original format recognition a working conclusion of this review is that an afr program developed according to the strategy of the university of california will deliver a satisfactory marc-format record at a lower cost than other afr or nonafr alternatives. however, much of the efficiency of the ucjafr is based on the presence of an already existing lc-format catalog card from which to keyboard machine-readable data. for original cataloging to be keyboarded from a cataloger's worksheet, 32 journal of library automation vol. 7/1 march 1974 an original format . recognition strategy is proposed which · provides a somewhat more detailed format than the uc/ afr marc while retaining a generally flexible system and low input costs. several system considerations also guide the design of an ofr system designed for relatively general-purpose user input and multiple output functions: • no special equipment requirements for input keying; • no special knowledge of the marc format required; • minimal table-lookup or text searching in processing; • flexible options for depth of coding provided; and • sufficient depth of format derived for most applications. the ofr input strategy outlined in appendix i provides a much greater degree of explicit field coding at input than the afr programs outlined above. the basis for this decision is the judgment that this cataloging, being done originally by a professional, can readily be coded by element name prior to input. no effort is made to identify marc field elements which· occur with very low frequency, or which are of limited utility for most applications. for instance, the "meeting" type of entry occurs in all combinations, in only 1.8 percent of all records studied by the library of congress in its format recognition study. 8 marc elements requiring either extensive human editing or complex computer processing are likewise excluded from input, on a cost-utility basis. an example is the geographic area code, which must either be assigned by a knowledgeable editor or derived through extensive computer searching for the city /county of publication. 
however, where little penalty is attached to allowing input of coded information, the ofr format allows input for inclusion in the derived marc-format record. conclusion it is clear that the afr programs developed for specific needs by the library of congress and the university of california can be great factors for change in library automation strategies over the next decade. striking benefits in cost savings, ease of input, and subsequent processing are to be gained. the abbreviated outline of an original cataloging ( ofr) input strategy is simply a suggestion of a second generation of format recognition programs which will undoubtedly develop to serve more general needs for marc-format bibliographic input. references 1. u.s. library of congress, marc development office, books: a marc format. 5th ed. (washington, d.c.: u.s. government printing office, 1972). 2. format recognition process for marc records: a logical design (chicago: american library association, 1970). automatic format reeognitionjbvtler 33 3. henriette d. avram, lenore s. maruyama, and john c. rather, "automation activities in tl:te processing department of the library of congress," library resources & technical services 16:195-239 (spring 1972). · . 4. ibid., p.204, 206. 5. recon pilot project, recon pilot project final report, prepared by henriette d. avram. (washington, d.c.: library of congress, 1972). 6. avram, et al., "automation activities," p.206. 7. ralph m. shoffner, some implications of automatic recognition of bibliographic elements, technical paper no. 22. (berkeley, calif.: institute of library research, university of california, april 1971) . 8. format recognition process, p.48. appendix i format recognition input specifications code outline field tag the number listed is the field tag number of that bibliographic element in the marc format. each general field is listed first. following it are notes indicating areas within the field. fixed-field indicators within the field are listed first; each one's code number follows a slash after the field code (041/ 1 =field 41, indicator code 1). if there is more than one group of indicators, an additional code describes group 1 (il) or group 2 (i2). subfields within the field are alphabetic codes following a "+" sign after the field code ( 070+b =field 070, subfield b). field name the overall field name is listed first. fixed-field indicator names are listed at the first indenti'on under the field n arne. subfield names are listed at the second indention under the field n arne. treatment by program these codes indicate the processing provided for each field and subelement by the four computer processing systems considered. codes are slightly different for each column considered: lc the library of congress system. "r'' indicates that the element described is recognized by the program, rather than explicitly identified at input. "i" indicates the element is keyed and not recognized by the format recognition process. a small "r" denotes elements introduced to the marc format since afr documentation was published, but presumably treated by the afr program just as "r" elements. "0" indicates that the element marked is omitted from input altogether. uc the university of california system. codes are identical to those above, but the "r" code is not used. oclc the ohio college library center system. in addition to the above codes, "~" following any item denotes that input is optional. 
"i" code is used wherever an element is tagged even though the oclc programs create the marc format from these tags. ofr original format recognition proposals. codes are similar to those described in the previous paragraphs. field tag, indicator 015 015+a format recognition input specifications treatment by program field name lc uc oclc ofr national bibliography no. r 0 0 i~ 34 journal of library automation vol. 7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 025 overseas acquisition no. r 0 0 0 025ta 041 languages r r i i" 041/0 multilanguage indicator r 041/1 translation indicator r 041ta text/translation code r i 041tb summary language code r i 043 geographic area code r 0 0 0 043ta 049 holdings information 049ta holding library code 0 i i i 050 lc call number r r" i r" 050/0 book is in lc r 050/1 book not in lc r 050ta lc class number r i r 050tb book number r i r 051 lc copy statement r 0 0 0 051ta 051t b 051tc 060 natl. lib. medicine call no. r 0 r" 060ta nlm class number r i" 060t b nlm book number r i" 070 n.a.l. call number r 0 r" 070ta nal class number r 0 070t b nal book number r 0 082 dewey decimal classif. no. r 0 i r" 082ta ddc number r i 086 su. docs. classif. no. r 0 i 0 086ta su. docs. number r i 090 local call number ( lc) 0 r r 090ta lc class number i" r 090tb book number i" r 092 local call number (dewey) 0 0 .. i" 092ta dewey class number i" i" 092tb book number i" i" 100 personal name r r r 100/0,11 forename r 100/1, 11 single surname r 100/2,11 multiple surname r 100/3,11 name of family r 100/0,i2 main entry not subject r 100/1,i2 main entry is subject r loot a name r i loot b numeration r" i lootc title assoc. w /name r i lootd date r i loote relator r i lootk form subheading r i loott title of book r i lootl language r i" loot£ date of work r i" automatic format recognition/butler 35 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr ioo+p part of work r i" 110 corporate name r 0 i" 110/0,11 inverted surname r i" 11011,11 place or place + name r 110/2,11 direct-order name r i" 110/0,12 main entry not subject r 110/1,12 main entry is subject r 110+a name r i 110+b subordinate unit r i 110+c relator r i 110+k form subheading r i 110+t title of book r i 110+u nonprinting element r 0 110+1 language r i" 110+p part code r p' 110+f date of work r i" llo+g miscellaneous r i" ill conference or meeting, m.e. r 0 i 0 111/0,11 inverted surname r 111/1, 11 place or place+ name r 111/2,11 direct-order name r 111/0,12 main entry not subject r 111/1,12 main entry is subject r 111 +a name r, i 111 + b number r i 111+c place r i 11l+d date r i 111+e subordinate unit r i 111 +f date of publication r i" 111+g miscellaneous r i" 11l+k form subheading r i 111+1 language r i" 111+p part r i" 111+t title of book r i 130 uniform title heading, m.e. r 0 i i" 130,11 blank 130/0,12 main entry is not subject r 130/1,12 main entry is subject r 130+a uniform title heading r i 130+£ date of work r i" 130+ g miscellaneous r i" 130+h media qualifier r i" 130+k form subheading r i 130+1 language r i" 130+p part r i" 130+s alternate version r i" 130+t title of book r i 240 uniform title, supplied r r i i" 240/0,11 not printed on lc cards r 240/1,11 printed on lc cards r r 240+ a uniform title r r i 240+£ date of work i i" 240+k form subheading r, i 240+p part of work r i" 36 journal of librmy automation vol. 
7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc ,, ·. uc oclg ofr 240+s version r i" 241 romanized title r 0 i" 241/0, i1 not printed on lc cards r 241/1,!1 printed on lc cards r 241 +a romanized title r i"' 245 title r r i i 245/0, i1 no title added entry r r r 245/1, i1 title added entry r r r 245/0,i2 nonfiling field r 0 245+a short title r r i r 245+b subtitle r r i r 245+c title page transcription r r i r 250 edition statement r 0 i r 250+a edition r i 0 250+ b additional information r i 260 imprint statement r 0 i i 260/0 publisher not m.e. r i r 260/1 publisher is m.e. r i r 260+a place of publication r i r 260+b publisher r i r 260+c date of publication r i r 300 collation r r i r 300+a pagination or volume r r i r 300+b illustration r 0 i 0 300+c height r 0 i 0 350+a bibliographic price r 0 0 i 400 series, personal name r (r) i r 400/0, i1 forename r 400/1, i1 single surname r 400/2, i1 multiple surname r 400/3, i1 name of family r 400/0,i2 author not main entry r 400/1,i2 author is main entry r 400+a name r i r 400+b numeration r i 400+c title associated r i 400+d dates r i 400+e relator r i 400+k form subheading r i 400+f date of work r i"' 400+1 language r i"' 400+p part of work r i" 400+t title of book r i 400+v volume or number r i 410 series, corporate name r (r) i i 410/0, i1 inverted surname r r i" 410/1, i1 place, place + n arne r r 410/2, i1 direct-order name r r i" 410/0,12 author not main entry r r 410/1,12 author is main entry r r 410+a name r i 410+b subordinate unit r i 410+e relator r i automatic format recognition/butler 37 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 410+f date of work r i" 410+g miscellaneous r i" 410+k form subheading r i 410+1 language r i" 410+p part r i" 410+t title of book r i 410+u nonprinting element r 0 410+v volume r i 411 series, conference title r 0 i i" 411/0, ii inverted surname r 411/1, ii place, place+ name r 411/2, ii ·.direct-order name r 411/0, i2 author not main enhy r 411/1, i2 ,author is main enhy r 411+a name r i 4ll+b number r i 411+c place r i 41l+d date r i 411+e name subordinates r i 41l+f publication date r i" 411+g miscellaneous r i" 41l+k form subdivision r i 411+1 language r i" 4ll+p part r i" 411+t title of book r i 4ll+v volume r i 440 series, title r r i i 440+a title r r i r 440+v volume or number r i r 490 series, untraced or r r r t:raced differently i 490/0 . series not traced i 490/1 • series traced diff. r i r 490+a series name r r i r 500 bibliographic notes r r r 500+a general note r r i" 501 +a "bound with" r 0 502+a dissertation r i" 503+a bibliography history 0 0 504+a bibliography note r i 505 contents note r r 505/0 . contents complete r 505/1 · contents incomplete r 505/2 partial contents r 505+a contents note r i" 520+a abstract or annotation r i 600 subject a.e., personal r r i i 600/0, ii forename r 600/1, ii single surname r 60012, ii multiple surname r 600/3, ii name of family r 60010, i2 lc subject heading code r i 600/1, i2 annotated card heading r i 38 ] ournal of library automation vol. 
7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 600/2,12 nlm subject heading code r i 600/3, i2 nal subject heading code r 0 600/4, i2 other subject heading r i i 600+a name r i 600+b numeration r i 600+c associated title r i 600+d date r i 600+e relator r i 600+f date of work r i"' 600+k form subheading r i 600+1 language r i"' 600+t title of book r i 600+p part of book r i"' 600+x general subdivision r i 600+y period subdivision r i 600+z place subdivision r i 610 subject a.e., corporate r 0 i i 610/0,11 inverted surname r 610/1,11 place, place+ name r 610/2,11 direct-order name r 610/0,i2 lc subject heading code r i 610/1,i2 annotated card heading r i 610/2,i2 nlm subject heading code r i 610/3,12 nal subject heading code r 0 610/4,12 other subject heading r i i 610+a name r 0 1 i 610+b subordinate unit r i 610+e relator r i 610+f date of work r i"' 610+k form subheading r i 610+1 language r i"' 610+g miscellaneous r i"' 610+p part r i"' 610+t title of book r i 610+u nonprinting element r 0 610+x general subdivision r i 610+y period subdivision r i 610+z place subdivision r i 611 subject a.e., conference r 0 i 0 611/0,11 inverted surname r 611/1,11 place, place + n arne r 611/2,11 direct-order name r 611/0, i2 lc subject heading code r i 611/1, i2 annotated card heading r i 611/2, i2 nlm subject heading code r i 611/3, i2 nal subject heading code r 0 611/4, i2 other subject heading r i 611 +a name r i 611+b number r i 61l+c place r i 61l+d date r i 61l+e subordinate unit r i 611+f publication date r io 61l+g miscellaneous r i"' automatic format recognition/butler 39 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 6ll+k form subheading r i 611+1 language r i~ 61l+p part r i~ 61l+t title of book r i 6ll+x general subdivision r i 6ll+y . period subdivision r i 6ll+z place subdivision r i 630 subject a.e., uniform title r 0 i 0 630/0,i2 lc subject heading code r i 630/1,i2 annotated card heading r i 630/2, i2 nlm subject heading code r i 630/3,i2 nal subject heading code r 0 630/4, i2 other subject heading r i 630+a uniform title heading r i r 630+£ date of work r i~ 630+g miscellaneous r i~ 630+h media qualifier r i~ 630+k form subdivision r i 630+1 language r i~ 630+p part r i~ 630+s alternate version r i~ 630+t title r i 630+x general subdivision r i 630+y period subdivision r i 630+z place subdivision r i 650 subject a.e., topical r r i r 650/0, i2 lc subject heading code r i 650/1, i2 annotated card heading r i 650/2,i2 nlm subject heading code r i 650/3,i2 nal subject heading code r 0 650/4, i2 other subject heading r i i 650+a topical subject, place r i 650+b element after place r i 650+x general subdivision r i 650+y period subdivision r i 650+z place subdivision r i 651 subject a.e., geographic r 0 i 0 651/0, i2 lc subject heading code r i 651/1,i2 annotated card heading r i 651/2, i2 nlm subject heading code r i 65113,12 nal subject heading code r 0 65114,12 other subject heading r i 651+a geographic name, place r i 651+b element after place r i 651+x general subdivision r i 651+y period subdivision r i 651+z place subdivision r i 690 subject a.e., local topical 0 0 i~ 0 690+a topical subject, place 0 i 690+b element after place 0 i 690+x general subdivision 0 i 690+y period subdivision 0 i 690+z place subdivision 0 i 40 journal of libm1·y automation vol. 
7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 691 subject a.e., local geogr. 0 0 f 0 691+a geographic name, place 0 i 691+b element after place 0 i 691+x general subdivision 0 i 69l+y period subdivision 0 i 691+z place subdivision 0 i 700 other a.e., personal name r r i r 700/0, i1 forename r 700/1, i1 single surname r 700/2, i1 multiple surname r 700/3, i1 name of family r 700/0, i2 alternate entry r 700/1, i2 secondary entry r 700/2, i2 analytical entry r 700+a name r i 700+b numeration r i 700+c title associated r i 700+d date r i 700+e relator r i 700+f publication date r i~ 700+k form subheading r i 700+1 language r io 700+p part of work r i~ 700+t title of book r i 710 other a.e., corporate name r 0 i j~; 7~0/0, i1 inverted surname r 710/1, i1 place, place + n arne r 710/2, i1 direct-order name r 710/0, i2 alternate entry r 710/1, i2 secondary entry r 710/2, i2 analytical entry r 710+a name r i 710+b subordinate unit r i 710+e relator r i 710+f date of work r i~ 710+g miscellaneous r io 710+k form subheading r i 710+1 language r i~ 710+p part of work r i~ 710+t title of work r i 710+u nonprinting element r 0 711 other a.e., conference r 0 i i~ 711/0, i1 inverted surname r 711/1, i1 place, place+ name r 711/2, i1 direct-order name r 711/0, i2 alternate entry r 711/1, i2 secondary entry r 711/2, i2 analytical entry r 711+a name r i 711+b number r i 711+c place r i 711+d date r i 711+e subordinate units r i 711+f date of work r i~ . ' automatic for.mat recognition/butler 41 appendix 1 (continued) field•tag, treatment by program indicator field name lc uc oclc ofr 711tg miscellaneous r i" 711tk form subheading r i. 711tl language r i" 711tp part of work r ·i" 711tt title of book r i 730 other a.e., uniform title r 0 i r 730/0,i2 alternate entry r 730/1, i2 secondary entry r 730/2,i2 analytical entry r 730ta uniform title r i 730tf date of work r i" 730tg miscellaneous r i" 730th media qualifier r i" 730tk form subdivision r i" 730tl language r i" 730tp part of work r i" 730ts alternate version r i" 730tt title of work r i 740 other a.e., title traced differently r r i r 740/0, i2 alternate entry r 740/l,i2 secondary entry r 740/2,i2 analytical entry r 740ta title different r i 800 series a.e., personal r r i" i 800/0 forename r 800/1 single surname r 800/2 multiple surname r 800/3 name of family r soot a name r i sooth numeration r i boote title associated r i 800td dates r i boote relator r i 800tf date of work r i" 800tk form subheading r i 800tl language r i" 800tp part of work r i" 800+t title of work r i 800tv volume or number r i 810 series a.e., corporate r r i" i" 810/0 inverted surname r 810/1 place, placet name r 810/2 direct-order name r slota name r i 810tb subordinate unit r i 810te relator r i 810tf date of work r i" 810tg miscellaneous r i" 810tk form subheading r i 810tl language r i" 810tp part of work r i" slott title of work r i 810tu nonprinting element r 0 42 ] ournal of library automation vol. 
7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 810+v volume or number r i 811 series a.e., conference r 0 i<) 0 811/0 inverted surname r 811/1 place, place+ name r 811/2 direct-order name r 811+a name r i 811+b number r i 811+c place r i 811+d date r i 811+e subordinate unit r i 811+f date of work r io 811+g miscellaneous r jo 811+k form heading r i 811+1 language r i"' 811+p part of work r i"' 811+t title of book r i 811+v volume or number r i 840 series a.e., title r 0 i"' 0 840+a title r i 840+v volume or number r i 590+a local notes field 0 0 i"' 0 910+a user option data field 0 0 i"* 0 lib-s-mocs-kmc364-20141005043735 75 a cost effectiveness model for comparing various circulation systems thomas k. burgess: washington state university library two models for circulation systems costing are presented. both the auto~ mated and the manual models are based on experience gained in the analysis of circulation services at washington state university library. validation tests for the model assumptions are devised and explained. use of the models for cost effectiveness comparison and for cost prediction are discussed and examples are given showing their application. introduction many methods for analyzing cost effectiveness have been presented recently in the literature.1 one main difficulty with studies of effectiveness is in quantifying the benefits, or in the case of libraries, assigning values to the quantity or quality of the services offered. 2• 3 one way to circumvent this difficulty is to compare the costs of different methods of providing the same services. value assessment of the services is eliminated by keeping them constant as shown in most cost benefit studies.16 this, of course, is not always possible when comparing manual library systems with mechanized systems. library circulation systems, however, may fit this type of model with relative ease. for this reason, the models described below were developed to compare a manual with a mechanized system. they have the added advantage of allowing for the prediction of costs for either the manual or automated system based on certain circulation loads. the utilization of the models is probably best understood by working through an application. therefore, a description of these applications as performed at washington state university library will be used. assumptions based on practices peculiar to washington state university are removed by the model through the use of the activities definitions for our library. washington state university library has been operating a mechanized circulation system since 1967. based on past experience, the system has recently undergone major modifications to improve its capabilities. we consider it to be a highly efficient machine circulation system. thus, cost 76 journal of library automation vol. 6/2 june 1973 effectiveness comparison with a similar manual operation can provide information on effectiveness of automated circulation systems in general as well as on the wsu implementation. model consideration to insure that the comparisons were fair and that biases were held to a minimum, mathematical models had to be established with rather rigid constraints. validations of these models had to be devised to insure that extrapolations of the model results were meaningful. information about our manual system in operation prior to 1967 is sparse, as no analysis had been performed. 
it was decided that the manual model should, therefore, be a variant of the machine model, since our machine system includes a small manual system. if the models are to be useful to others, they should make very few assumptions about circulation tasks. therefore, the models should break out each specific task so costs can be accumulated. this also insures that only circulation tasks are counted. if total hours of staff assigned to circulation are used as the basic labor costs, their time at other library functions is included and would provide erroneous data. using a breakdown by tasks will allow use of the model even if major changes occur in organizational or physical rearrangement of the circulation functions. twenty-three basic activities were identified that would cover all circulation functions of our library. a similar list should be prepared for each library to be modeled.7 our list can be used as a guide. these functions and their definitions are listed in appendix a. fifteen functions represent activity for which both the quantity of the activities and the average time to perform it are required information for building the model. of the nine remaining activities, eight require only the measurement of total performance time. the last activity, computer operations, was subdivided into three parts: computer charges, library equipment rental costs, and computer personnel costs. the computer personnel costs represent time donated by the computer center to keypunch, decollate and burst printouts, and prepare and schedule jobs. these personnel costs are a part of the machine system and are not reflected otherwise in the computer charges. these three charges are summed and used as a single dollar figure in the model. in our machine system, as in many other circulation systems, it is impossible to split our computer cost for each circulation subfunction because we use integrated data bases which are charged as a single storage rental cost and not split up among the various programs. the collection of data for this study could have resulted in a sizeable effort and could have unduly biased the data which were to be collected.8, 9 for example, circulation clerks might have taken as much time to measure the circulation transactions as the circulation transactions themselves required. therefore, we requested supervisors to estimate the time necessary for these tasks, the number of transactions performed, and the percentage of staff and student hours used. these data were developed monthly for a three month period during the middle portion of a semester. validation of these estimates to insure their reasonability was accomplished by comparing the total time expended in circulation as reported in the collected data with the total time assigned to circulation activities as reported in the payroll records (the usual manner of estimating costs).10 a surprisingly high degree of correlation was found, primarily due to the fact that few of our circulation staff members have responsibilities outside of circulation. the payroll data also had to be adjusted to reflect actual hours used in circulation functions. a 25 percent figure was used for regular staff, to reflect holidays and leaves (8.4-10 percent), coffee breaks (8-12 percent), sickness (2 percent), tardiness and work slumps (3 percent), and miscellaneous (3 percent). by the same method 15 percent was determined for student help.
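the payroll adjustment described above reduces to a single multiplication per category; the sketch below makes the factors explicit. the example hour figures are invented.

# overhead factors reported above: payroll hours that are not actually
# available for circulation work.
STAFF_OVERHEAD = 0.25    # holidays/leaves, coffee breaks, sickness, tardiness, misc.
STUDENT_OVERHEAD = 0.15

def productive_hours(payroll_hours, overhead):
    """convert payroll hours to hours actually worked on circulation tasks."""
    return payroll_hours * (1.0 - overhead)

# invented example: 1,600 staff payroll hours and 900 student payroll hours a month
print(productive_hours(1600, STAFF_OVERHEAD))    # 1200.0
print(productive_hours(900, STUDENT_OVERHEAD))   # 765.0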
the difference in total hours between the two samples was less than 5 percent (appendix b, table 5). the study data were collected from five separate organizational areas (three circulation desks, technical service division, and the library administrative office) which are reasonably independent of each other; the monthly variation in activities reported by the various units was also closely correlated. for example, the percent increase in checkouts for a month was approximately the same at all three circulation desks.

model calculations-automated system
monthly totals were averaged for each activity's transaction time, number of transactions, and percentage of effort allocated to staff or student labor (appendix b, tables 1, 2, and 3). average hourly wages were developed separately for student (part-time help) and staff based on salaries of personnel allocated to circulation. the total hours and salaries were then calculated for staff and student help for each activity. the following example shows the formulas used in calculating some of the entries in the tables.

a1 manual checkout: transactions (a1t) times transaction time (a1tt) equals total time expended (a1te), adjusted to hours:
a1te = a1t · a1tt   [1]
total part-time help in hours (a1pth) equals (a1te) times the percentage of student effort (a1ppth):
a1pth = a1te · a1ppth   [2]
total staff hours (a1s) equals (a1te) times the percentage of staff help (a1ps):
a1s = a1te · a1ps   [3]
total salaries (a1ts) equals (a1pth) times student rate (rpth) plus (a1s) times staff rate (rs):
a1ts = a1pth · rpth + a1s · rs   [4]

x shelving:
total student hours  xpth = xte · xppth   [5]
total staff hours    xs = xte · xps   [6]
total salaries       xts = xs · rs + xpth · rpth   [7]

all other activities were calculated in the same manner as shown above. personnel hours used were totaled and multiplied by the hourly rates. the salary totals and the computer costs were then added together to get the total system cost per month (appendix b, table 6):
total salary cost = 1.15 Σ(i=a to v) ipth · rpth + 1.25 Σ(i=a to v) is · rs   [8]

figure 1 represents curves of monthly cost vs. monthly circulation. the automated system curve was determined from the initial model plotted point, and from extrapolations to other plotted points which were computed based on the following factors: a 25 percent increase or decrease in circulation will result in a 5 percent increase or decrease in computer costs. this estimate results from analyzing the computer processes. the bulk of the computer cost results from sorting and other total file processes which are reasonably insensitive to changes in volume of updating. a factor of 25 percent change in circulation results in a 30 percent change in personnel costs. the 5 percent differential may be conservative, but results from the need for additional supervisory support with its higher salary for each additional operational position added. using the above factors, several additional points were predicted and plotted and the automated system curve was drawn to fit these points (appendix b, table 7). validation of these factors was determined by using budget information and circulation data available from the year 1968 (appendix b, table 8). these data were used to establish a point on the graph. the 1968 costs were compared to the predicted cost as shown by the curve for the circulation volume in 1968. this provided a cost differential which was within 1 percent of the curve predicted costs (figure 1).
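a compact restatement of formulas [1] through [8], together with the extrapolation factors used to draw the automated-system curve, is sketched below. the activity list, transaction counts, times, rates, and computer charges are invented placeholders; the actual wsu figures appear in the appendices of the original study.

# sketch of the cost model, formulas [1] through [8]; all numbers are invented.
ACTIVITIES = {
    # name: (transactions per month, minutes per transaction,
    #        fraction student effort, fraction staff effort)
    "manual checkout": (1200, 1.5, 0.80, 0.20),
    "shelving":        (9000, 0.5, 0.95, 0.05),
    "snag searching":  (300, 4.0, 0.30, 0.70),
}
STUDENT_RATE = 2.10        # dollars per hour (placeholder)
STAFF_RATE = 4.25          # dollars per hour (placeholder)
COMPUTER_COSTS = 1900.0    # monthly machine, rental, and support charges (placeholder)

def monthly_cost(activities, computer_costs):
    student_hours = staff_hours = 0.0
    for count, minutes, p_student, p_staff in activities.values():
        time_expended = count * minutes / 60.0          # formula [1], in hours
        student_hours += time_expended * p_student      # formulas [2] and [5]
        staff_hours += time_expended * p_staff          # formulas [3] and [6]
    # formula [8]: the 15 and 25 percent non-productive-time factors are applied here
    salaries = 1.15 * student_hours * STUDENT_RATE + 1.25 * staff_hours * STAFF_RATE
    return salaries + computer_costs

def extrapolate(personnel_cost, computer_cost, circulation_change):
    """apply the reported factors: a 25 percent change in circulation gives roughly
    a 30 percent change in personnel cost and a 5 percent change in computer cost."""
    steps = circulation_change / 0.25
    return personnel_cost * (1 + 0.30 * steps) + computer_cost * (1 + 0.05 * steps)

print(round(monthly_cost(ACTIVITIES, COMPUTER_COSTS), 2))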
these data were adjusted to reflect annual circulation hours used in circulation in 1971. model cl\.lculations-manual system · the manual model was a modification of the automated model. obviously no machine costs were incurred, but costs for filing and retrieving cards from large tub files of book cards of items in circulation must be added to each' check-out or check-in procedure as well as to snags, holds, and other categories. since ·some· loaned materials are not included in our automated system, a small manual circulation operation runs parallel to the automated system and was included in the automated study. this small manual system served as the base activity for the manual model in the study. a cost effectiveness model/burgess 79 retrieval time from a card tub file is dependent on the number of the cards in the file. the tub file size is approximately equal to the size of the computer's circulation file. sample filing times were made on a catalog card file of comparable size. the results were an average of 40 seconds per item on timings of single records and of batches of alphabetized records to be filed. this figure was then used to extend the average time of the appropriate activities in the parallel manual systems data (activities al, b, il, k, m, and q). following the calculation method used in our automated system, data were developed from the 1971 circulation data and a curve was drawn for the manual system (appendix c and appendix b, table 7 and figure 1). validation of this curve by budget information available from 1967 shows that the difference in the predicted cost from actual cost was less than 2 percent (appendix c, table 5). this represents a significant correlation and validates the entire manual model. generalized use of the models as has been shown by the example of its use at wsu, the models provide for two functions: cost comparison of automated and manual circulation systems at the same levels of book circulation, and prediction of cost in either a manual or automated system at different levels of circulation. of course, combinations of these models may be made, such as: at what circulation levels are costs of both models equal? or, what will my costs .e • cron-over point 4,000 8/yjo 12,000 16,000 20/)00 24,000 28,000 32,000 36,000 monthly !m rrj is fi'e ill ie iw ck f?. ,;1. b u 0 i ;;.. 0 oj 0[&' 1s' i q ?:l1:> -lt.t e i q ~~ i 9 7 i 0 .:?'f ~ 0 ~ 0 q 7 3 08'7.$' /)1 unique # 00 09 7 update infor~l~tio!t d i l j l i i i i _l ll i ii i ijjjjjj 2-74 016 indexing 0 3 i 0 l j i i i i iii_ljljij 80 017 pay t.leth. 13 ').~··· 3 7-80 018 publisher ]) 11 n i :s r. ~ iii en/ name .l .. 4 7-40 019 publisher & :~ i.b jf.l.:.z:tritihi ia-ivl£1 i i j i i l_l_l_l i address l l j ll t i i 41-43 027 sullj .code 0 i 6 h4-h9 028 lf'.o.date 0 &, a lrl7j3j 51-7? 020 ublisher la2. e.w ll::i~iiljki i l.,t..j!t:--iwj 1yic61tc i f\1 i i i i i city,stat " 76-80 jo.:>1 pub.zip j (i) 0 jl9l fig. 3. libmry pe1'iodical data tmnsmittal form duplicated the subscription length field. (the subscription length field itself was to be kept intact for transfer to the history record. ) a one-position field (figure 3, item 002) could be programmed to suppress printing of a purchase order, as in the case of a canceled subscription, or to keep the order in "hold" if a budget problem arose near the end of the fiscal year. hold status would cause the order to be printed with the tag "pay when authorized" to call attention to this status. 
other fields shown in figautomated periodicals system/harp and heard 89 !ncc library periodical data 'rransmittal unique # p () 0 '9 '7 sheet #2 original inforh!\tion update il!formati on field 0 d 'd cc iteh descrip'l'io l ..• . -5 7-41 022 publisher ~ ck -l 'l i-ko 4 los' ~} l 0000 13-1'7 014 18-26 c cj ? lg 0 f 6 9 008 2'7-31 ,{: 0 stj 6 006 32 i i 010 33-80 'i 0 s;j. (., l l l l j b 013 7-12 ;? 8 5 ~¢ rooo 13-17 014 18-26 0 q 10 fc n"~' o! 008 2'7-31 {/ () 'l 0 0 006 32 i i r q 0 q tf 'f i 1111 010 33-80 i i b _q13 7-12 0 0 b 317 tf 000 13-17 c f, ?b2_ 014 18-26 ·c 9 ? 0 0 g 7 3 008 2'7-31 0 i %' 0 0 006 32 3 010 33-80 l111j b 013 7-12 000 13-17 014 18-26 n-r-l __ j 008 27-31 ~ --32 _r-·-,.-j-r-· -r-r--~ 010 33-80 ! i · i lj ~f+tr-· j--rtieb t-r-----i i i 1 i ii-tj l i ! j_ fig. 5. historical record form plete file on the history of each subscription. data necessary to maintain this file were the subscription length and cost fields (described above) and the addition of fields for the purchase order number (figure 3, item 013) plus the invoice number (figure 3, item 010). the computer program was wlitten so these data could be automatically transferred to the history record card at renewal time. automated periodicals system/harp and heard 91 superintencent of documents governmf.nt printing office washington, dc 20402 re• our public lands attention subscription claims department 02107/73 according to our records, we have not receive,d the following issueisi. if our su8scription is in order, kindly send our missing issues. volumf. 2?, issue n0.3••summer 1q72 -------.. thank you moraine valley community college library fig. 6. claims letter cpl•oct ba•d•l8220 morainf vallev cc~munity college library 10900 so 88th ave palos hills il 60465 claims data are transmitted as needed by providing the unique number of the title and the information concerning the missing issue( s). claims letters (figure 6) are then mailed in window envelopes, so no typing is required. when working with periodicals or serials one becomes accustomed to sudden or unusual changes that occur with or without notice. a few examples could be changes in title, frequency, or general publishing patterns. we wanted to provide our system with the ability to notify us that an investigative procedure had been completed and thus avoid many of the "why's" that recur. accordingly, we included a comment card (figure 4, card c) which can be updated as circumstances require. from the transmittal forms for the initial batch of titles, cards were keypunched and built into a magnetic tape file. the serials technician now submits updates or additions (e.g., for new titles) within the schedule provided by information systems and the tape is updated each month. the main printed report is run monthly (figure 7). this master list includes all bibliographic, holding, and renewal information. titles due for renewal in three months are flagged with asterisks. the technician 92 journal of library automation vol. 7/2 june 1974 m 0 r a i n e val ley c 0 m m un i t y c 0 l l e.g e _______ ___!.2{2117_3 ______________ _ periodical list page_ 67 r r 2 bu 01200 0875 00097 p dun's review 197z•oate 1965•1971 024502 0973•0875 m dun's rev! 
ew 666 fifth ave new york, ne\1 york ~~rkin~o~:~l~~r~6~~~y 86o~t ~~;3~~~~~~~0 --~ library 10900 south 88th ave palos hills, !l 60465 0310 062573 _b ____________ _ 10019 ___ ____:_ _______________ _ 016 history 00097 ebsco 0968•0869 00500 1 805276 ------------------~co o~7o oo70o 1 905441 -------------006374 06-_73 0970•,9873 01800 3 ' ______ ___:_____.:_____~~--"-----:__ _ ___:_ _ ______.:.___---'---~-'-----~-~------------******* history 00664 p early years .0373•date early years , one hale lane library, moraine valley comm. college, 10900 south 88th ave., palos hills, ill. 60465 022934 0373•0374 9/y 000 darien, conn. r r 1 t 00700 ------040273 06820 025 no _history record found 0374 a ------0-~~:~1 ~l ,eb~~~~2""-d~at~e-ll59•1 072 023-yr608'1:3-rzt4 m 10 r r 1 l 00595 1274 051013 bhistory ebony . 820 s. michigan av_e chicago, !lltn~!=._s__:..._....:____!6~0~6~05~~--'--~-. p 160464mrn88t090092 06/b 2 moraine vall comm 06080 '10900 s 88th ave 13 coll_lib palos hills, ill 60464 004 00098 ebsco 0868•0769 00500 1 805276 ebsco 0869•0770 00600 1 905441 _______ __:o:o:0_.:::63=-.7~3~05::__:•73 0870•0773 01200 3 history 00099 px ecology ·today 0371•0872 ecology today r r p ·ooooo 000000 000000000 m 000 box 180 ll-6 21'3 -----~s_t _mystic connecticut -~_b_b ___________ _ moraine vall~y comm coll lib 10900 s 88th ave palos hills, ill 60465 ceased pub•8/72--replaced with environment 00099 010859 02•71 u37l-u'zt2llll600 1 015680 02-73 0372•0273 00600 1 fig. 7. master periodical printout 019 i . 0000 . a determines the current subscription price and number of years to renew each flagged title and updates these fields. at the next monthly run, facsimile purchase orders (containing all revised data except the purchase order number) are printed (figure 8). the technician types up numbered purchase orders from these and forwards them to the business office in time for payment. we intended our system to utilize purchase order forms to be run directly on the computer. therefore, our present method of typing from facsimiles does seem wasted effort, but is looked on as a stopgap measure for the present and the inconvenience is tolerated while waiting for the more desirable method. if the computer forms are adopted, we may have increased conflict in price updates because there will be less opportunity for last minute corrections. however, we do plan to avoid as automated periodicals system/harp and heard 93 assoc. p.o. no ... .0 .hen.tion· ·· ·t.ia~kltv * ****************>lr>lr>lr.i.>lr>lri*>lrlilr>lr•iiii*h****i> .z i • vsiir.rfnevai. s•la§t~~tktltl~··r~·;i~b~~itj.;,.·. :.·,... .... ··= ··· ·>~~cit)'· * aovocate * * * * * * * * * * library * * palos.hllls ll 6046~ * • payment enclosed * * * * to this pri~da~~~;'.~~;~j~ ~j~~e~ ·.··• corr espfjnoene·~.·: •... :. • ·.... · · """'' ·: •·:· < ii< .,. fig. 8. facsimile purchase order * * * • * * * * much conflict as possible by plans to run actual purchase orders closer to the actual expiration date. the renewal procedure followed involves these steps: 1. check purchase order facsimiles for accuracy and match with renewal notices received. 2. check kardex for material arrival regularity. 3. type and forward purchase orders to business office. 4. update forms and send to information systems. 5. scan master list for flagged items and record their unique numbers and titles on update sheets. 94 journal of library automation vol. 7/2 june 1974. 6. update flagged items with renewal notices as follows: a. price. b. number of years for renewal. c. new subscription dates. d. 
method of payment. e. any changed information concerning publisher and mailing label. 7. update flagged items without renewal notices in the same manner, using the latest issue received. 8. as additional renewal notices for flagged items come in, make necessary updates. 9. send all updates to information systems at least three days before the master list and facsimiles are due to be run. price changes do occur between the time the item is flagged and the check is mailed. with most, though, notification is received from the publisher before the purchase order is actually typed, and corrections are made at that time. since the renewal process is linked to the expiration field, updating that field also causes transfer of data for the year just expired into the history record, as explained earlier. free materials, government depository items, and standing orders for which invoicing is known to be automatic are handled by filling the expiration date field with zeros. if a purchase order history record is needed, as with standing orders, these fields are updated at the time the invoice arrives. our master list does not contain headings to explain field descriptions. we place our master list in· a binder; a legend describing placement of field descriptions is attached to the inside of the front of the binder cover and is readily available for reference. we felt headings on each record would be clumsy, confusing, and would waste valuable printing space. codes and their explanations are attached to the inside of the back of the binder cover. to date, two revisions have been installed into the system: ( 1) in 1972 we decided to classify our holdings by subject. space was "found" for three digits, and we then proceeded to code our subjects (figure 3, item 027). our subject codes and their meanings are explained in figure 2a. ( 2) correspondence was assisted by having all necessary information in one location. the cost, purchase order number, and problem explanation were available by merely flipping the printout pages to the title in question on the master list. however, the date the purchase order was typed had to be looked up in order to effect an intelligent solution. six spaces were again "found" to provide this purchase order date (figure 3, item 028). actual computer programming was performed by information systems staff in bal, and programs are run on the college ibm 370-135 computer. automated pet·iodicals system/harp and heard 95 results it has not been possible to figure actual monetary costs for the library portion and maintenance of this system, nor to compare these costs with the manual system. libraries have traditionally been weak in figuring operation costs, and we confess to not having been very innovative in this area. we do not have specific itemized costs for our manual routines, so actual comparisons are not possible. a few figures concerning library time can be given. from october through december of 1971, when the initial phase was set up, the serials technician and public services librarian each contributed about 20 percent of their time, and a student aide worked 10 to 15 hours per week on the clerical part of the data transmittal. since that time the system has been operational for over two years, and some time approximations concerning updating, adding to the file, etc., are now available. 
with development behind us, time contributed by the serials technician, who is now solely responsible for the maintenance of the system, has dropped from 20 percent to between 5 and 15 percent. exact costs are difficult to extract, since this varies during the year according to the number of renewals due in particularly heavy expiration months as compared with those due in light expiration months. the library as part of the college is not charged for use of computer facilities. figures for machine time and keypunching are available and are as follows:
program: machine time (hr.), keypunch time (hr.)
periodical additions per 100 titles: .1, 8.0
periodical updates per 100 titles: .1, 2.0
purchase order printing: .5, .5
claim disbursements: .1, .2
miscellaneous reports: 3.0, .0
information systems has given their monetary cost in developing this system as $5,970 for programming time. they also figure program maintenance at $215 per year and the cost to run programs per year at $256. we can list important benefits we have derived. renewal problems have been eliminated. the few duplicate problems can be handled now as soon as they occur. our system handles all types of live subscriptions and the "dead file" as well. there is no more fussing with cards since we have a one-stop, clear record of holdings and histories, including the entire invoice and payment record for each subscription. at renewal time all the information for purchase orders is listed on a single-sheet facsimile. claim letters are done for us and we can call for various listings as they are needed. reports we receive are: master listing once a month, purchase order facsimiles once a month, claim letters as needed, fiscal year total cost reports, fiscal year area cost reports, subject lists as needed, holdings lists as needed, unique number lists as needed. conclusions many librarians having access to sophisticated computer facilities content themselves with producing a more or less elaborate holdings list. subscription placements and renewals are handled manually, often through a commercial agency. common agency problems such as overlapping and lapsed subscriptions are simply tolerated. we feel from our experience that if enough effort is expended to create a successfully operating holdings list, a small library does not require much further effort to add renewal, history record, and claiming functions. this eliminates agency problems, provides the ability to manipulate files for producing various reports, and, in our opinion, results in more efficient and convenient record-keeping. the size of our operation falls at the lower end of a range of libraries having holdings large enough to require at least one individual's time. translated into figures, we feel that any automated system would be wasted on holdings of under 150 periodicals. the crucial factor in relation to size is not really any magic number of holdings but the ratio of available staff time to the size of the holdings. this factor must be evaluated by libraries considering any type of automated system. we feel much of the success of our system has been dependent upon our initial planning, our staff availability, and our conviction that a change was necessary to eliminate the problems we were encountering with our manual system. also the availability of the computer facilities, the encouragement provided by our superiors, and adequate library staff and information systems staff all contributed to an efficient changeover.
acknowledgments gratitude is due moraine valley community college for its permission and support of this innovation. particular gratitude is due anabel sproat, head librarian, for her permission, support, and constant encouragement. the excellent work and friendly attitude of linda nemeth and the entire information systems staff who made this project a reality have been deeply appreciated. also, the capable assistance of student aide barbara hart (goeske) in the recording process proved to be a very valuable asset. automatic processing of personal names for filing foster m. palmer: associate university librarian, harvard university library, cambridge, massachusetts describes a method for preparing personal names already in machine readable form for processing by any standard computer sort program, determining filing order insofar as possible from normally available information rather than from special formating. prefix recognition is emphasized; multiword forename entries are a problem area. provision is made for an edit list of problems requiring human decision. possible extension of the method to titles is discussed. this paper describes a method of computerized filing of personal names for display in book catalogs or other lists intended for direct human consultation. the problem is to be distinguished from a related but different one: computerized storage for retrieval by means of a search key, in which machine rather than human convenience can determine the order. to the extent that filing is a purely mechanistic sorting process, it is ideally suited to computerization. however, it was early recognized that there are many possible complications in machine filing of library entries, even in the relatively straightforward area of personal names. some of these complications arise from such factors as upper-case codes, diacritic codes, and punctuation; others are the result of library rules or practices that call for departures from strict alphabetical order. while the latter are especially numerous in subject headings and titles, they affect names as well, for example, the custom of filing mc as if mac. while no general review of the literature on machine filing will be attempted here, attention will be called to selected contributions. nugent (1) described an approach to computerizing the library of congress filing rules and pointed out areas where the present rules do not lend themselves to mechanization. cartwright and shoffner (2) discussed four major ways of approaching a solution to the problem and concluded that a mixture of different methods would eventually be required. in a later publication cartwright (3) developed his ideas further and included a brief description of the present writer's then unpublished work. the principal monograph on the subject is that by hines and harris (4). they present a suggested filing code departing significantly from those in widespread use and propose that material be encoded in a certain fashion so that it will be ready for computer sorting. in particular, considerable dependence is placed on distinctions between single, double, and multiple blanks separating words or fields. in a recent paper, harris and hines restate their rules briefly and report on their later research (5). the present paper describes a different, virtually an opposite, approach.
rather than relying on special formating of the material at the time of encoding, the system described herein attempts to derive the necessary filing information from normally formated material. historically, it grew out of a desire to construct improved indexes for use at the harvard university library to the body of records distributed by the marc pilot project, in which there were field indicators and a limited number of delimiters within fields, but a general absence of information added expressly for the purpose of filing. while some early work embraced both personal names and titles, it was soon apparent that names by themselves presented a considerable challenge, and further consideration of the even more difficult areas of titles, corporate entries, and subject entries was deferred. a few comments on the possible applicability of the general method to titles will be made later. the concrete form which the work eventually took was an autocoder macro instruction for a second generation computer, an ibm 1401. (a macro instruction is a means of calling forth by means of a single instruction a more extensive routine already worked out and placed in the system "library.") since the 1401 was a fairly small computer, it was important that the algorithm not require an excessive number of instructions, and since the internal speed of the machine was only moderate, it was also important that processing be direct and economical. the method used, however, is by no means limited to a particular computer or a particular language. a partial version of the algorithm has been written in adpac, as an exercise in the evaluation of that language, and run on an ibm 360-65 using marc ii test data. the system is based on examination of names (previously identified as such by appropriate tags) and development of parallel sort keys consisting processing of personal names/palmer 187 only of letters, numerals, and blanks, readily processable by any standard computer sort package designed for alphanumeric information. the only requirements are that blank sort low and that the letters a z and the numerals 0-9 sort in their natural order; whether numbers are considered higher or lower than letters does not matter. processing starts at the beginning of the name and proceeds until one of three conditions prevails: the number of characters examined is equal to the length of the field as specified in the record; the number of characters developed in the sort key has reached a specified cut-off point or the default value of 40; or a delimiter indicating the end of the name, or the end of the name proper, is encountered (a search being then made beyond the delimiter for a date, which, if found, is added to the sort key). the sort key is derived by transferring letters (or, in the case of a date, numbers) from the source, with occasional modifications as described below, and inserting one of four filing codes at the end of each word or element of the name. in early work, single special characters were used as filing codes, but this was inappropriate as a general solution since the filing order of these characters depended on the collating sequence peculiar to a particular computer. furthermore, it was inconvenient because it involved changing all blanks to something else, since a blank within a name with its implication of something to follow should not file as low as whatever indicates the very end of the name. 
the idea of using a two-character code, the first always being blank so that any filing code will file ahead of any letter or date, was derived from nugent (1) and has been followed in all later work. only three filing codes were actually used in compiling indexes to the marc i tapes, and in the first description privately circulated by the author (6). however, at least four are now seen to be necessary, actual need to distinguish the second and third not yet having been encountered but being possible:
code (blank followed by:) and placement:
3: the end of the name including date if any.
5: between the name proper and a date.
6: the end of the surname.
7: the end of any other "word" of the name. (a word is any element followed by a blank, hyphen, comma, or period, except that prefixes which are identified as such are not considered separate words.)
the following examples illustrate the use of the codes and the general workings of the system. in this and later examples, the left hand column gives data in marc i format (where diacritics are represented by superscript numbers preceding the letters to which they apply, and the equal sign is a delimiter), and the right hand column gives the sort key as derived by the macro.
arthur -> arthur 3
arthur, joseph -> arthur 6joseph 3
arthur, joseph,=1875- -> arthur 6joseph 51875 3
arthur, joseph charles -> arthur 6joseph 7charles 3
arthur-behenna, k. -> arthur 7behenna 6k 3
arthur-petr2os, gabriele maria -> arthur 7petros 6gabriele 7maria 3
wilson, william -> wilson 6william 3
wilson, william,=1923- -> wilson 6william 51923 3
wilson, william lyne -> wilson 6william 7lyne 3
wilson-browne, a. e. -> wilson 7browne 6a 7e 3
the use of the numbers 3, 5, 6, and 7 is arbitrary to a degree. an interval was left between 3 and 5 so that the end of name code could be changed to 4 if the name were a subject rather than a main or added entry. no extra interval to accommodate added entry as distinguished from main entry was left because the author did not wish to encourage what he regards as an unwise practice. however, those who insist may easily substitute a new series of codes allowing for it. the distinction between end of name and end of surname serves to bring simple forename entries, that is those consisting of a single word, e.g. sophocles, ahead of similar surnames, e.g. sophocles, evangelinus apostolides. no serious work has yet been undertaken on the problem of processing complex forenames, but the distinctive tagging of forenames in marc ii has made available a growing body of experimental data and the codes 1 (and 2 for subject) are reserved for possible future use in this connection, without any intent of prejudging the question whether complex forename entries should come before similar surnames. it is the view of the author that the filing of complex forename entries is one of the areas in which all librarians are on most uncertain grounds in assessing the preference and convenience of readers. in handling such entries as alexander, mrs., or maurice, sister, the algorithm depends on the presence of a delimiter before mrs. or sister to avoid filing after alexander, milton or maurice, robert. such delimiters were in fact present in the marc pilot project data.
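to make the shape of these keys concrete, here is a minimal python sketch of key construction for the plain surname-comma-forenames pattern. it is an editorial illustration, not a transcription of palmer's autocoder macro: it assumes entries are already lower case and free of diacritic codes, and it omits the cutoff length, prefix recognition, and the doubt/edit-list machinery discussed later.

    import re

    # filing codes from the article: a blank followed by a digit, so every
    # code sorts ahead of any letter or numeral in a standard sort.
    END_OF_NAME, BEFORE_DATE, END_OF_SURNAME, END_OF_WORD = " 3", " 5", " 6", " 7"

    def sort_key(entry):
        # split off a date introduced by the "=" delimiter, keeping digits only
        name, _, date = entry.partition("=")
        date = "".join(ch for ch in date if ch.isdigit())
        # split the surname from the forenames on the first comma
        surname, _, forenames = name.partition(",")
        sur_words = [w for w in re.split(r"[ ,.\-]+", surname) if w]
        fore_words = [w for w in re.split(r"[ ,.\-]+", forenames) if w]
        key = ""
        for i, w in enumerate(sur_words):
            key += w
            if i < len(sur_words) - 1:
                key += END_OF_WORD        # interior element of a compound surname
            elif fore_words:
                key += END_OF_SURNAME     # surname ends here, forenames follow
        for i, w in enumerate(fore_words):
            key += w
            if i < len(fore_words) - 1:
                key += END_OF_WORD        # interior forename or initial
        if date:
            key += BEFORE_DATE + date     # the date files after the name proper
        return key + END_OF_NAME

    print(sort_key("arthur, joseph,=1875-"))   # arthur 6joseph 51875 3
    print(sort_key("wilson-browne, a. e."))    # wilson 7browne 6a 7e 3

run against the examples above, the sketch reproduces the published keys, which is enough to show why a plain alphanumeric sort of these keys yields the intended filing order.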
despite the limitations mentioned in dealing with multiple-word forename entries and with surnames lacking forenames, the algorithm is well suited to names in the normal modern pattern, namely a simple or compound surname followed by a comma and one or more given names or initials. furthermore, very specifically, it deals with prefix names. prefixes with apostrophes are taken care of by a general dropping out of apostrophes and other non-significant punctuation: [l'isle, guillaume de] lisle 6guillaume 7de 3 o'brian, robert enlow obrian 6robert 7 enlow 3 the same feature also handles such names as the following: prud'homme, louis arthur prudhomme 6louis 7arthur 3 ta'bois, roland tabois 6roland 3 processing of personal names/palmer 189 most prefixes, however, are dealt with by a specific search based on examining the first letter of each new "word" of the name. if the element begins with a, b, d, e, f, i, l, m, 0, s, t, v, or z, a branch is made to a prefix searching routine tailor-made for the particular letter. takin& names beginning with l as an example, if the second character is "e," "a,' or "o," a prefix may be present; otherwise the prefix search is discontinued. if still searching and the third character is a blank or a hyphen, a prefix is adjudged to be present. the letters "le," "la," or "lo" are moved to the sort key output field. three input and two output characters are counted, effectively skipping over the blank or hyphen. similarly, if the third character is an "s" followed by a blank or a hyphen, "les," "los," or "las" is moved with a count of four input and three output. otherwise there is no prefix. la place, pierre antoine de laplace 6pierre 7antoine 7de 3 las cases, philippe de las cases 6philippe 7 de 3 le fanu, joseph sheridan lefanu 6joseph 7sheridan 3 lo presti, salvatore iopresti 6salvatore 3 routines for other letters, similar in approach but varying in detail, produce similar results: degli antoni, carlo degliantoni 6carlo 3 de la roche, mazo delaroche 6mazo 3 fitz gibbon, constantine fitzgibbon 6constantine 3 van der bijl, hendrick johannes vanderbijl 6hendrick 7johannes 3 the search for prefixes and quasi-prefixes is not limited to the first surname. it is and quite plainly should be extended to given names: bundy, mcgeorge bundy 6macgeorge 3 bundy, mary lee bundy 6mary 7lee 3 whether it should be extended to later elements of compound surnames is problematical. bowing to the fact that filing is as much an art as a science, in practice a compromise was reached: the prefix search was extended to compounds, except when the prefix of the succeeding element begins with d. the exception was made to accommodate the large number of hispanic names in this pattern, since it seemed clearly preferable to file all the names beginning "perez de" before any of those beginning "perez del": p2erez, joaqu2in perez 6joaquin 3 p2erez de urbel, justo perez 7de 7urbel 6justo 3 p2erez del castillo, j os2e perez 7 del 7 castillo 6jose 3 p2erez gald2os, benito perez 7 galdos 6benito 3 perhaps skipping prefix treatment in subsequent elements should have been made the rule rather than the exception; but an exception would then have been required for "me," "st.," and perhaps others. a list of the prefixes and quasi-prefixes sought for is given in table 1. note that in some cases the result is considered doubtful, and a special signal is set. in such situations the program can then set another signal within the macro and reprocess the name using alternate rules. 
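as a further editorial sketch (again not the macro itself), the prefix-closing idea can be imitated by repeatedly folding a recognized prefix into the element that follows it; the prefix list below is a small assumed subset of table 1, and the mc/m'-to-mac expansion, the hispanic "d" exception, and the doubt signals are all omitted.

    # a deliberately small, assumed subset of the prefixes in table 1
    PREFIXES = {"de", "degli", "del", "della", "der", "des", "du",
                "fitz", "la", "las", "le", "les", "lo", "los", "van", "von"}

    def close_prefixes(words):
        """fold leading prefixes of a name element into the word that follows,
        so that ['le', 'fanu'] becomes ['lefanu'] and ['van', 'der', 'bijl']
        becomes ['vanderbijl']. purely illustrative."""
        out, run = [], ""
        for w in words:
            if w in PREFIXES:
                run += w              # prefix found: close it up, skipping the blank
            else:
                out.append(run + w)
                run = ""
        if run:
            out.append(run)           # name ended on a bare prefix; keep it as a word
        return out

    print(close_prefixes(["van", "der", "bijl"]))   # ['vanderbijl']
    print(close_prefixes(["de", "la", "roche"]))    # ['delaroche']
    print(close_prefixes(["fitz", "gibbon"]))       # ['fitzgibbon']

in the working macro the search was driven by the first letter of each new word and branched to letter-specific routines, which is both faster and more selective than the flat membership test used in this sketch.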
190 journal of library automation vol. 4/4 december, 1971 table 1. list of prefixes, etc., found by special search a 1, 4, 7 den st. 4, 15 a 2, 4 der 4, 11, 18 ste. 16 ab des te 4, 11 al 5 di ten 4 ai 3, 4, 6 do ter an 4, 7 dos 4, 11 the 1, 4, 8 ap du van 1, 17 at el 5 van 2, 4, 12, 17 aus 17 el 3, 4,6 van' ... 4,9 aus' ... 4, 9 fitz vande bar 10 im vanden bat 10 in 17 vander ben 10 la ver da las von 17 das 4, 12 le vande de 17 les vanden degli 1 lo vander dei los z 4, 5 del m' 4, 14 zu 17 della mac zum delle me 13 zur della 0 1. only when followed by blank. 2. only when followed by hyphen. 3. only when upper case. 4. "doubt" signal is set. 5. bypassed, i.e. dropped out and disregarded. 6. bypassed if "alternate" signal is on. 7. bypassed unless "alternate" signal is on. 8. bypassed if first word. 9. aus'm and van' t are closed up to "ausm" and "vant'' by the general dropping of apostrophes but no attempt is made at further special processing since their rarity would not justify the necessary elaboration of the algorithm. 10. not treated as prefix if special parameter is present. 11. not treated as prefix if "alternate" signal is on. 12. not treated as prefix unless "alternate" signal is on. 13. expanded to "mac". 14. expanded to "mac" unless "alternate" signal is on. 15. expanded to "saint". 16. expanded to "sainte". 17. another prefix may follow, as in de la. 18. previous notes do not apply when preceded by van or von. processing of personal names/palmer 191 diacritical marks on other than the first letter, or capitalization beyond the normal, such as all caps., would prevent proper processing. except as indicated, lower case is included along with upper, and prefixes followed by a hyphen are treated the same as those followed by a blank. the marc i corpus included several names with hyphenated prefixes, and fortuitously a method was available with the 1401 for giving the hyphen search almost a "free ride" along with that for the blank. since the code for hyphen was a single bit, the so-called b bit, and a blank was represented by no bits, a "branch if bit equal" instruction specifying all the other bits, a, 8, 4, 2, and 1, would branch if any character other than blank or hyphen was present. implementations for other machines may have to devote a disproportionate number of instructions to the search for the rare hyphenated prefixes, or else risk missing them. no doubt some other prefixes could be added to the list. "ua," for example, was considered but not included in the actual working macro after examination of a catalog of five million cards showed that only two beginning with these two letters were not for the prefix. the increase in processing time involved in adding another initial letter to the list of those looked for did not seem to be justified. in the program employing the macro for production of an index to names in the marc pilot project data, whenever the "doubt" signal was set, the name was printed on an edit list for human inspection. the name was then reprocessed with the "alternate" signal set and if a different output form was developed, this form also was printed. if the person reviewing the list accepted the first form, no special action was necessary. if the second was preferred, a card with an identifying number and the code 2 was punched; if a hand-made form was needed, this form was entered on a card with the code 3. 
these cards and the original output tape were then used to produce an edited output tape, in which the alternate forms were dropped unless a card directed otherwise. a second printed listing, recording the action taken, was also produced. the doubtful cases identified by the algorithm are not limited to the prefix problems described above. by far the commonest occasion for doubt was the presence of "a," "o," or "ii." was it a germanic umlaut, calling for translation for filing purposes to "ae," "oe," or "ue," or was it something else? this is not the place to debate the practice, followed in most american academic libraries, of filing umlauted letters as if spelled out with an "e." the major bibliographies covering the german book trade do so, but most german dictionaries and encydopedias do not; the example of other reference works and indexes is mixed. since the aim of the work described here was to produce an index of names that could be used comfortably by librarians used to the practice, a means of continuing it was sought. however, it would be manifestly improper to insert an "e" if the mark were a diaeresis rather than an umlaut; and, in the opinion of the writer, almost equally improper for hungarian, finnish, and turkish vowels. even 192 journal of library automation vol. 4/4 december, 1971 those who do file such vowels in these languages as if they were germanic do not usually do so for chinese. it should be noted here that not all transformations of special letters turn on the doubt signal. "a" is routinely translated to "aa" and icelandic thorn to "th." other occasions for signalling doubt include names with a suspiciously high number of words before the first comma. this provision was introduced in an attempt to catch some non-names in the original data which had been wrongly coded, e.g. women's association of the st. louis symphony. when found, a card with the code d was punched for the edit run to delete these entries entirely. statistics of processing for the entire corpus of marc pilot project data as cumulated and to some slight degree edited at the harvard university library will be useful in seeing the edit list in proper perspective. the entire file consisted of 47,884 records, 4,285 of which lacked names. the remaining 43,599 records contained 55,286 names ( or alleged names ). of these, 52,372 or 94.7% were judged to be purely routine. special processing of some sort not involving doubt (e.g., recognition of compound surname, expansion of "me" to "mac," closing up of apostrophe or nondoubtful prefix) was performed on 2,283 names, or 4.1%. the total number of doubtful names printed on the edit list was 631, or 1.1%. somewhat more than half of these ( 334) resulted in different forms on being reprocessed with the "alternate" signal on. in 562 of the 631 doubtful cases, or 89% of this group, the first or only form printed was accepted, so that no action beyond inspection was necessary. only 69 names, or not quite one out of 800 of the whole number, required the punching of a card-47 to indicate choice of the second form, 14 supplying a hand-made form, and 8 calling for deletion of non-names. subsequent changes in the macro would have reduced considerably the number of names requiring hand-made forms. it will be instructive to examine some of the names from the edit list to see what types of problems arise. the first selection of actual consecutive names (from lc card number 66-15363 through 66-17297) is rather typical: barnard, douglas st. 
paul barnard 6douglas 7saint 7paul 3 ekel4of, gunnar,= 1907ekeloef 6gunnar 51907 3 or: ekelof 6gunnar 51907 3 woolley, ai e. woolley 6al 7e 3 sch4onfeld, walther h. p., = 1888schoenfeld 6walther 7h 7p 51888 3 or: schonfeld 6walther 7h 7p 51888 3 ]4anner, michael jaenner 6michael 3 or: janner 6michael 3 m4 uller, alois,= 1924mueller 6alois 51924 3 or: muller 6alois 51924 3 huang, y4uan-shan huang 6yuean 7shan 3 or: huang 6yuan 7shan 3 m4uller, kurt,= 1903 mueller 6kurt 51903 3 or: muller 6kurt 51903 3 processing of personal names/palmer 193 note the dominance of simple umlauts; also, as a curiosity, the fact that all persons named "al" appear on the list because of the possibility that it might be an unhyphenated arabic prefix. note also that saint is treated as a separate word, not closed up as a prefix. "st." was originally put on the doubtful list with the thought that it might stand for sankt or szent instead of saint, although normal library practice would not use an abbreviation in such cases. its inclusion on the doubtful list was unexpectedly justified, however, by the occurrence of the name erlich, vera st. it seems likely that in this case "st." may stand for a patronymic, perhaps stojanova or stefanova, and there may be other occasions on which st. rather than s. is used as an abbreviation for such a name as stefan ( cf. the french use of ch. rather than simple c. as an abbreviation for charles). the only action required for the names in the list above would be to punch a "2" card for the chinese name huang, yuan-shan. indeed, just as the umlaut is the largest category on the edit list, so the non-umlauta diacritic that looks like an umlaut but does not call for insertion of "e"is the commonest occasion for punching an exception card. occasionally a diaeresis is found: lecomte du no 4 uy, pierre lecomte 7du 7nouey 6pierre 3 or: lecomte 7du 7nouy 6pierre 3 more common are certain front vowels in hungarian, finnish, or turkish, or the vowel ii in chinese as already encountered: f 4oldi, mih2aly foeldi 6mihaly 3 t4 olgyessy, juraj or: foldi 6mihaly 3 toelgyessy 6juraj 3 or: tolgyessy 6juraj 3 mettaelae 7portin 6raija 3 or: mettala 7portin 6raija 3 naervaenen 6sakari 3 or: narvanen 6sakari 3 inoenue 6e 3 or: inonu 6e 3 suemer 6mine 3 or: stuner 6mine 3 yue 6ying 7shih 3 or: yu 6ying 7shih 3 some libraries avoid the problem by treating all but the last of these as if umlauted, but determination of the correct category can usually be made at sight. occasionally a name gives pause, for example these two which both prove to be swiss and presumably germanic, although chonz may be romansh: ch4onz, selina r4 uede, thomas or: or: choenz 6selina 3 chonz 6selina 3 rueede 6thomas 3 ruede 6thomas 3 194 journal of library automation vol. 4/4 december, 1971 somewhat more troublesome are names where some but not all elements are germanic: vogt, ulya (g4oknil) ouchterlony, 40rjan vogt 6ulya 7 goeknil 3 or: vogt 6ulya 7 goknil 3 ouchterlony 6oerjan 3 or: ouchterlony 6orjan 3 ivanyi 7 gruenwald 6bela 3 or: ivanyi 7grunwald 6bela 3 although vogt is obviously germanic, ulya goknil is equally obviously not, and therefore the decision is that no umlaut is present. orjan, on the other hand, is a scandanavian forename, to be treated as umlauted even though coupled with a surname of scottish gaelic origin. bela ivanyigrunwald is a more difficult case. grunwald is of course germanic in origin, but can it be regarded as magyarized? 
in english we might assume that such a name is anglicized when the bearer starts writing it grunwald or gruenwald. however, the case is not so clear in hungarian, since that language also has the letter "u." discussion of such a point may seem to split hairs, but it does involve a significant difference between manual and machine systems. in a manual system, the question of whether to file as ivanyi-grunwald or as ivanyi-gruenwald would arise only in the exceedingly unlikely event that another name which would file between the two also occurred in the corpus. in a machine system, however, any difference, even this late in a distinctive name, could result in the various works of the author being misfiled among themselves, or a work about him filed before one by him. use of different codes to represent the same graphic, umlaut on the one hand or diaeresis or other non-umlaut on the other, would drastically reduce both the number of doubtful names aud lhe number of those for which an exception procedure is required. the harvard college library actually follows this practice. the library of congress experimented with it, but found that catalogers were reluctant in some cases to make the decision. contemplation of the case of bela ivanyi-grunwald gives the author more sympathy with this reluctance than he originally felt. in attempting to evaluate the method described above, one must acknowledge both strong points and limitations. on the one hand it is very gratifying to see aesop us and [a esopus] falling together despite differences in the capitalization of the "e" and the bracketing, and to find such sequences as the following, all without even being referred to the edit list under the rules then prevailing: aziz, khursheed kamal aziz ahmad al-azm, sadik j. azrael, jeremy r. ba maw, u baab, clarence theodore aziz 6khursheed 7kamal 3 aziz 7 ahmad 3 azm 6sadik 7j 3 azrael 6jeremy 7r 3 ba 7maw 6u 3 baab 6clarence 7theodore 3 processing of personal names/palmer 195 delgado, david j. del grande, john joseph delhom, louis a. delieb, eric delise, knoxie c. de lisser, r. lionel dell, ralph bishop dellinger, dave dell'isola, frank del mar, alexander delmar, anton delmar-morgan, edward locker delgado 6david 7j 3 delgrande 6john 7joseph 3 delhom 6louis 7 a 3 delieb 6eric 3 delise 6knoxie 7 c 3 delisser 6r 7lionel 3 dell 6ralph 7bishop 3 dellinger 6dave 3 dellisola 6frank 3 delmar 6alexander 3 delmar 6anton 3 delmar 7morgan 6edward 7locker 3 while it is certainly true that the system cannot survive without some provision for referring doubtful questions to a human editor, the number of these depends to a considerable extent on the filing and coding policies followed. provided forename entries are coded as such, the system does a good job of identifying possible problems. (presently, all multiple word forename entries are considered doubtful.) "u a" has already been cited as an example of a prefix deliberately omitted, and there are others which could be added at any time it is thought worth while. a more troublesome situation, pointed out by kelley cartwright, is the possible occurrence of "van" as a non-final element of an unhyphenated vietnamese name. the only way this could be prevented from misfiling by merging it with the next element would be to throw all "vans" including the numerous ones of dutch origin into the doubtful category, expanding the edit list more than twenty percent. this did not seem advisable, particularly since normal library usage is to hyphenate vietnamese compound names. 
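the doubt-and-alternate mechanism discussed above can also be sketched, again only as an editorial illustration with invented details: whenever a character that may or may not be a germanic umlaut is seen, both candidate forms are generated and the name is queued for human review, much as the edit list described earlier.

    UMLAUT = {"ä": "ae", "ö": "oe", "ü": "ue"}   # germanic umlaut reading
    PLAIN  = {"ä": "a",  "ö": "o",  "ü": "u"}    # diaeresis / non-germanic reading

    def doubtful_forms(name):
        """return (primary, alternate). alternate is None when nothing about
        the name is doubtful; otherwise the name belongs on the edit list."""
        if not any(ch in name for ch in UMLAUT):
            return name, None
        primary, alternate = name, name
        for ch in UMLAUT:
            primary = primary.replace(ch, UMLAUT[ch])
            alternate = alternate.replace(ch, PLAIN[ch])
        return primary, alternate

    edit_list = []
    for entry in ["müller, alois", "inönü, e.", "wilson, william"]:
        primary, alternate = doubtful_forms(entry)
        if alternate is not None:
            edit_list.append((entry, primary, alternate))   # reviewer picks one form

    print(edit_list)
    # [('müller, alois', 'mueller, alois', 'muller, alois'),
    #  ('inönü, e.', 'inoenue, e.', 'inonu, e.')]

the article's other routine transformations (å to aa, thorn to th) and the punched-card override step are left out of this sketch.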
up to this point the evaluation is quite favorable. the system can correctly process a very large proportion of names, including some which involve quite sophisticated points, without reference to a human editor, and it can call virtually all the rest to the attention of an editor. however, human review of problems means that there will be occasions when borderline cases are decided in different ways. if a permanent machine file of all established forms of names in the system is kept, both forms of each doubtful name could be checked against it so that decisions already made would not have to be repeated, thus saving the time of the editor as well as the hazard of differing decisions. it would of course be very expensive to keep such a file just for this purpose, but a file of this type would probably form a part of a comprehensive mechanized bibliographic system anyway. another area in which a mixed report would have to be given to the system is its extensibility to types of headings other than names. in work conducted on the same principles with a few thousand early titles from the marc pilot project, there were only two conspicuous problems, one of which may not in fact be a problem: the filing of numbers as such rather 196 journal of library automation vol. 4/4 december, 1971 than as if they were spelled out in the language of the title. true, the particular algorithm then in use did not provide for bringing numbers of differing length into logical order ("50 great ghost stories" before "200 years of watercolor painting in america" ), but this is a readily attainable refinement. the other problem is more refractory and is exemplified by titles beginning with prefix names, for example "de gaulle," "de soto," and "van gogh." names within titles could not receive the usual name treatment since there was no way of identifying them as such, and therefore the prefixes were filed as separate words. furthermore, while marc pilot project authors were quite a cosmopolitan lot, the titles were almost entirely in english. therefore, removal of initial articles was not much of a problem. there did not happen to be any work beginning "a to z of ... ". however, there was a book which, although in english and so coded, had a title beginning with a spanish article: "la vida," by the late oscar lewis. in working toward automatic removal of initial articles from titles, the usual assumption is that machine coding of the language of the work is available and will be checked first. this seems desirable both because it is probably more efficient in machine time than to check every title against a long list of possible articles in many languages, and because words that are articles in one lan~uage are not necessarily so in another. most occurrences of initial "die' are probably german articles, but some are other parts of speech in english, for example "die casting" or "die like a dog." if the umlaut is the common problem in names, the initial indefinite article which is the same as the numeral "one" in several languages may well be the most frequent occasion for doubt in processing of titles. "un" or "ein" will usually mean "a," to be dropped; but will sometimes mean "one," to be kept. there are certainly other problems, in addition to the one with prefix names already mentioned, including some that give trouble even in manual filing: "charles the first," "charles ii," "charles v et son temps." 
it may be that at some point in the cataloging process a reviser will have to be on the lookout for certain of these special situations and add flags to indicate that a title includes a prefix name, or that it begins with an article which would not be found by program, or that it does not begin with an article although it appears to do so, or that for some other reason it calls for a hand made key. the system described is not an absolute system, but absolute systems have their own tyrannies. if, as the author believes, cartwright and shoffner ( 2) are correct in thinking that a mixture of methods will be required in actual book catalog projects, then a system along the lines of the one described may well be a useful part of the mix. references 1. nugent, william r.: "the mechanization of the filing rules for the dictionary catalogs of the library of congress," library resources & technical services, 11 (spring 1967), 145-166. processing of personal names/palmer 197 2. cartwright, kelley l.; shoffner, ralph m.: catalogs in book form ([berkeley]; institute of library research, university of california, 1967 ), pp. 24-27. 3. cartwright, kelley l.: "mechanization and library filing rules," advances in librarianship, 1 ( 1970 ), 59-94. 4. hines, theodore c.; harris, jessica l.: computer filing of index, bibliographic, and catalog entries (newark, n.j.: bro-dart foundation, [ 1966]). 5. harris, jessica l.; hines, theodore c.: "the mechanization of the filing rules for library catalogs: dictionary or divided," library resources & technical services, 14 (fall 1970 ), 502-516. 6. palmer, foster m.: a macro instruction to process personal names for filing ([cambridge, mass.]: harvard university library, 1970 ). a copy of this document, which contains an autocoder listing of the actual working macro, has been deposited with the national auxiliary publications service, from which it can be obtained on microfiche (naps 01680 ) . in this version there are only three codes, 2 corresponding to 3 as used in this paper, 4 to both 5 and 6, and 6 to 7. there are also a few differences in the treatment of particular prefixes. the macro is made up of 579 cards, of which 125 are comments only. online ticketed-passes: a mid-tech leap in what libraries are for public libraries leading the way online ticketed-passes: a mid-tech leap in what libraries are for jeffrey davis information technology and libraries | june 2019 8 jeffrey davis (jtrappdavis@gmail.com) is branch manager at san diego public library, san diego, california. last year a library program received coverage from the new york times, the wall street journal, the magazines mental floss and travel+leisure, many local newspapers and tv outlets, online and trade publications like curbed, thrillist, and artforum, and more. that program is new york’s culture pass, a joint program of the new york, brooklyn, and queens public libraries. culture pass is an online ticketed-pass program providing access to area museums, gardens, performances, and other attractions. as the new york daily news wrote in their lede: “it’s hard to believe nobody thought of it sooner: a new york city library card can now get you into 33 museums free.” libraries had thought of it sooner, of course. museum pass programs in libraries began at least as early as 1995 at boston public library and the online ticketed model in 2011 at contra costa (ca) county library. the library profession has paid this “mid-tech” program too little attention, i think, but that may be starting to change. 
what are online ticketed-passes? the original museum pass programs in libraries circulate a physical pass that provides access to an attraction or group of attractions. sometimes libraries are able to negotiate free or discounted passes but many times the passes are purchased outright. the circulating model is still the most common for library pass programs, but it suffers from many limitations. passes by necessity are checked out for longer than they’re used. they sit waiting for pick up on hold shelves and in transit to their next location. long queues make it hard for patrons to predict when their requests will be filled, and therefore difficult to plan on using. for the participating attractions, physical passes are typically good anytime and so compete with memberships and paid admission. there are few ways to shape who borrows the passes in order to meet institutional goals. and there are few ways to limit repeat use by library patrons to both increase exposure and nudge users toward membership. as a result, most circulating pass programs only connect patrons to a small number of venues. despite these limitations, circulating passes have been incredibly popular: at writing there are 967 requests for san diego public library’s 73 passes to the new children’s museum. we sometimes see that sort of interest in a new bestseller, but this is a pass that sdpl has offered continuously since 2009. in 2011, contra costa county library launched the first “ticketed-pass” program, discover & go. discover & go replaced circulating physical passes with an online system with which patrons, remotely or in the library with staff assistance, retrieve day-passes — tickets — by available date or venue. this relatively simple and common-sense change makes an enormous difference. in addition to convenience and predictability for patrons, availability is markedly increased because venues are much more comfortable providing passes when they can manage their use: patrons can be restricted to a limited number of tickets per venue per year and venues can match the information technology and libraries | june 2019 9 number of tickets available to days that they are less busy. the latter preserves the value of their memberships while making use of their own “surplus capacity” to bring in new visitors and potential new members. funding and internal expectations at many venues carry obligations to reach underserved communities and the programs allow partner attractions to shape public access and receive reporting by patron zip code and other factors. the epass software behind discover & go is regional by design and supports sharing of tickets across multiple library systems in ways that are impractical to do with physical passes. as new library systems join the program, they bring new partner attractions into the shared collection with them. the oakland zoo, for example, needs only to negotiate with their contact at oakland public library to coordinate access for members of oakland, san francisco, and san jose public libraries. because of the increased attractiveness of participation, it’s been easier for libraries to bring venues into the program. in 2011, discover & go hoped for a launch collection of five museums but ultimately opened with forty. the success of ticketed-pass programs in turn attracts more partners. today, discover & go is available through 49 library systems in california and nevada with passes to 137 participating attractions. 
similarly, new york’s culture pass launched with 33 participating venues and has grown in less than a year to offer a collection of 49. while big city programs attract the most attention, pass programs are offered by county systems like alamace county (nc), consortiums like libraries in clackamas county (or), small cities like lawrence (ma), small towns like atkinson (nh), and statewide like the michigan activity pass which is available through over 600 library sites with tickets to 179 destinations plus state parks, camping, and historical sites. for each library, the participating destinations form a unique collection: a shelf of local riches, idiosyncratic and rooted in place. through various libraries one can find tickets for the basketball hall of fame, stone barns center for food and agriculture, dinosaur ridge, eric carle museum of picture book art, bushnell park carousel, california shakespeare theater, children’s museums, zoos, aquariums, botanical gardens, tours, classes, performances, and on to the met, moma, crocker, de young, and many, many, many more. for kids, “enrichments” like these are increasingly understood as essential parts of learning and exploration. for adults, access to our cultural treasures, including partners like san francisco’s museum of the african diaspora or chicago’s national museum of puerto rican arts & culture — besides being its own reward — enhances local connection and understanding. we’re also starting to see the ticketing platform itself become an asset to smaller organizations — craft studios, school performances, farm visits, nature centers, and more — that want to increase public access without having to take on a new ability. importantly, ticketed-pass programs are built on the core skills of librarians: information management, collection development, community outreach, user-centered design, customer service, and technological savvy. the technology discover & go was initially funded by a $45,000 grant from the bay area library and information system (balis) cooperative. contra costa contracted with library software company quipu group to develop the epass software that runs the program and that is also used by ny’s culture pass, public libraries leading the way: online ticketed passes | davis 10 https://doi.org/10.6017/ital.v38i2.11141 multnomah county (or) library’s my discovery pass, and a consortium of oregon libraries as cultural pass. ticketed-pass software is also offered by the libraryinsight and plymouth rocket companies and used by denver public library, seattle public library, the michigan activity pass, and others. the software consists of a web application with a responsive patron interface and connects over sip2 or vendor api to patron status information from the library ils. administrative tools set finegrained ticket availability, blackout dates, and policies including restrictions by patron age, library system, zip code, municipality, number of uses allowed globally and per venue, and more. recent improvements to epass include geolocation to identify nearby attractions and improved search filters. still in development are transfer of tickets between accounts, re-pooling of unclaimed tickets, and better handling of replaced library cards. the strength that comes from multi-system ticketed-pass programs also carries with it challenges on the patron account side. ilses each implement protocols and apis for working with patron account information differently and library systems maintain divergent policies around patron status. 
there’s a role for lita and for library consortia and state libraries to push for more attention to and consistency on patron account policies and standards. the emphasis in library automation is similarly shifting. our ilses originated to manage the circulation of physical items, a catalog-centric view. today, as robert anderson of quipu group suggested to me, a diverse range of online and offline services and non-catalog offerings orbit our users, calling for a new frame of reference: “it’s a patron-centric world now.” the vision library membership is the lynchpin of ticketed-pass and complementary programs in the technical sense, as above, and conceptually: library membership as one’s ticket to the world around. though i’m not aware of academic libraries offering ticketed-passes, they have been providing local access through membership. at many campuses, the library is the source for one’s library card which is also one’s campus id, onand off-campus cash card, transit pass, electronic key, print management, and more. that’s kind of remarkable and deserving of more attention. traditionally, librarians have responded to patron needs by providing information, resources, and services ourselves. new models and technologies are making it easier to complement this with the facilitation approach, of which online ticketed-passes are the quintessential example. we further increase access by reducing barriers of complexity, language, know-how, and social capital, for example, by maintaining community calendars of local goings-on or helping communities take advantage of nearby nature. online ticketed-pass programs will grow and take their place in the public’s expectations of libraries and librarians: that libraries are the place that help us (better, more equitably) access the resources and riches around us. powering this are important new tools for library technologists to interrogate and advance with the same attention we give to both more established and more speculative applications. identifying emerging relationships in healthcare domain journals via citation network analysis kuo-chung chu, hsin-ke lu, and wen-i liu information technology and libraries | march 2018 39 kuo-chung chu (kcchu@ntunhs.edu.tw) is professor, department of information management, and dean, college of health technology, national taipei university of nursing and health sciences; hsin-ke lu (sklu@sce.pccu.edu.tw) is associate professor, department of information management, and dean, school of continuing education, chinese culture university; wen-i liu (wenyi@ntunhs.edu.tw, corresponding author) is professor, department of nursing, and dean, college of nursing, national taipei university of nursing and health sciences. abstract online e-journal databases enable scholars to search the literature in a research domain or to crosssearch an interdisciplinary field. the key literature can thereby be efficiently mapped. this study builds a web-based citation analysis system consisting of four modules: (1) literature search; (2) statistics; (3) articles analysis; and (4) co-citation analysis. the system focuses on the pubmed central dataset and facilitates specific keyword searches in each research domain for authors, journals, and core issues. in addition, we use data mining techniques for co-citation analysis. the results could help researchers develop an in-depth understanding of the research domain. 
an automated system for co-citation analysis promises to facilitate understanding of the changing trends that affect the journal structure of research domains. the proposed system has the potential to become a value-added database of the healthcare domain, which will benefit researchers. introduction healthcare is a multidisciplinary research domain of medical services provided both inside and outside a hospital or clinical setting. article retrieval for systematic reviews in the domain is much more elusive than retrieval for reviews in clinical medicine because of the interdisciplinary nature of the field and the lack of a significant body of evaluative literature. other connecting research fields consist of the respective research fields of the application domain (i.e., the health sciences, including medicine and nursing).1 in addition, valuable knowledge and methods can be taken from the fields of psychology, the social sciences, economics, ethics, and law. further, the integration of those disciplines is attracting increasing interest.2 researchers may use bibliometrics to evaluate the influence of a paper or describe the relationship between citing and cited papers. citation analysis, one of several possible bibliometric approaches, is more popular than others because of the advent of information technologies.3 citation analysis counts the frequency of cited papers from a set of citing papers to determine the most influential scholars, publications, or universities in a discipline. it can be classified into two basic types: the first type counts only the citations in a paper that are authored by an individual, while the second type analyzes co-citations to identify intellectual links among authors in different articles. this paper focuses on the second type of citation analysis. small defined co-citation analysis as "the frequency with which two items of earlier literature are cited together by the later literature."4 it is not only the most important type of bibliometric analysis, but also the most sophisticated and popular method. many other methods originate from citation analysis, including document co-citation analysis, bibliographic coupling,5 author cocitation analysis,6 and co-word analysis.7 there are levels of co-citation analysis: document, author, and journal. co-citation could be used to establish a cluster or "core" of earlier literature.8 the pattern of links between documents can establish a structure to highlight the relationship of research areas. citation patterns change when previously less-cited papers are cited more frequently, or old papers are no longer cited. changing citation patterns imply the possibility of new developments in research areas; furthermore, we can investigate changing patterns to understand the scientific trend within a research domain.9 co-citation analysis can help obtain a global overview of research domains.10 the aim of this paper is to detect emerging issues in the healthcare research domain via citation network analysis. our results can provide a basis for knowledge that researchers can use to construct a search strategy. structural knowledge is intrinsic to problem solving.
because of the interdisciplinary nature of the healthcare domain and the broadness of the term, research is performed in several research fields, such as nursing, nursing informatics, long-term care, medical informatics, geriatrics, information technology, telecommunications, and so forth. although electronic journals enable searching by author, article, and journal title using keywords or full text, the results are limited to article content and references and therefore do not provide an in-depth understanding of the knowledge structure in a specific domain. the knowledge structure includes the core journals, core issues, the analysis of research trends, and the changes in focus of researchers. for a novice researcher, however, the literature survey remains a troublesome process in terms of precisely identifying the key articles that highlight the overview concept in a specific domain. the process is complicated and time-consuming, and it limits the number of articles collected for retrospective research. the objective of this paper is to provide information about the challenges and methodology of relevant literature retrieval by systematically reviewing the effectiveness of healthcare strategies. to this end, we build a platform for automatically gathering the full text of ejournals offered by the pubmed central (pmc) database.11 we then analyze the co-citation results to understand the research theme of the domain. methods this paper tries to build a value-added literature database system for co-citation analysis of healthcare research. the results of the analysis will be visually presented to provide the structure of the domain knowledge to increase the productivity of researchers. information technology and libraries | march 2018 41 dataset for co-citation analysis, a data source of related articles on healthcare is required. for this paper, the articles were retrieved from the pmc database using search terms related to the healthcare domain. to build the article analysis system, we used bibliometrics to locate the relevant references while analysis techniques were implemented by the association rule algorithm of data mining. the pmc database, which is produced by the us national institutes of health and is implemented and maintained by the us national center for biotechnology information of the us national library of medicine, provides electronic articles from more than one thousand full-text journals for free. we could understand the publication status from the open access subset (oas) and access to the oai (open archives initiative) protocol for metadata harvesting, which includes the full text in xml and pdf. regarding access permission, pmc offers a dataset of many open access journal articles. this paper used a dedicated xml-formatted dataset (https://www.ncbi.nlm.nih.gov/pmc/tools/oai/). the xml-formatted dataset followed the specification of dtd (document type definition) files, which are sorted by journal title. each article has a pmcid (pmc identification), which is useful for data analysis. in addition to the dataset, the pmc also provides several web services to help widely disseminate articles to researchers. pubmed central (pmc) citation database searching module citation module web view users data sourcemiddle-end pre-processeingback-end front-end xml files web serverdb server keyword co-citation module statistical module figure 1. the system architecture of citation analysis with four subsystems. 
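as an editorial sketch of what the parsing step can look like, the following python fragment reads one article's .nxml file and pulls out the citing article's pmcid and the journal titles in its reference list. the element names follow the nlm/jats dtd used in the pmc open access subset as i understand it; they are assumptions that may need adjusting for particular journals, and the authors' own system was written in php rather than python.

    import xml.etree.ElementTree as ET

    def cited_journals(nxml_path):
        """return (pmcid, [journal titles cited]) for one pmc .nxml article."""
        root = ET.parse(nxml_path).getroot()
        pmcid = None
        for aid in root.iter("article-id"):
            if aid.get("pub-id-type") in ("pmc", "pmcid"):
                pmcid = (aid.text or "").strip()
        journals = []
        for ref in root.iter("ref"):          # one <ref> per cited item
            src = ref.find(".//source")       # journal title of the cited item
            if src is not None and src.text:
                journals.append(src.text.strip().lower())
        return pmcid, journals

    # example call (file name follows the pattern described later in the article):
    # pmcid, refs = cited_journals("aapsj-10-1-2751445.nxml")

collecting these (pmcid, cited-journal list) pairs across the corpus is the raw material for the statistics and co-citation modules described below.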
https://www.ncbi.nlm.nih.gov/pmc/tools/oai/ identifying emerging issues in the healthcare domain | chu, lu, and liu 42 https://doi.org/10.6017/ital.v37i1.9595 system architecture our development environment consisted of the following four subsystems: front-end, middle-end, back-end, and pre-processing. the front-end creates a “web view,” a visualization of the results for our web-based co-citation analysis system. the system architecture is shown in figure 1. front-end development subsystem we used adobe dreamweaver cs5 as a visual development tool for the design of web templates. the php programming language was chosen to build the co-citation system that would be used to access and analyze the full-text articles. in terms of the data mining technique, we implemented the apriori algorithm with the php language.12 the results were exported as xml to a charting process, where we used amcharts (https://www.amcharts.com/), to create stock charts, column charts, pie charts, scatter charts, line charts, and so forth. middle-end server subsystem the system architecture was a microsoft windows-based environment with a xampp 2.5 web server platform (https://www.apachefriends.org/download.html). xampp is a cross-platform web development kit that consists of apache, mysql, php, and perl. it works across several operating systems, such as linux, windows, apache, macos, and oracle solaris, and provides ssl encryption, a phpmyadmin database management system, webalizer traffic management and control suite, a mail server (mercury mail transport system), and filezilla ftp server. back-end database subsystem to speed up co-citation analysis, the back-end database system used mysql 5.0.51b with interface phpmyadmin 2.11.7 for easy management of the database. mysql includes the following features: • using c and c++ to code programs, users can develop an application programming interface (api) through visual basic, c, c + +, eiffel, java, perl, php, python, ruby, and tcl languages with the multithreading capability that can be used in multi-cpu systems and easily linked to other databases. • performance of querying articles is quick because sql commands are optimally implemented, providing many additional commands and functions for a user-friendly and flexible operating database. an encryption mechanism is also offered to improve data confidentiality. • mysql can handle a large-scale dataset. the storage capacity is up to 2tb for win32 nts systems and up to 4tb for linux ext3 systems. • it provides the software myodbc as an odbc driver for connecting many programming languages, and it several languages and character sets to achieve localization and internationalization. pre-processing subsystem the pmc provides access to the article via oas, oai services, e-utilities, and ftp. we used ftp to download a compressed (zip) file packaged with a filename following the pattern “articles?-?.xml.tar.gz” on october 28, 2012 (ftp://ftp.ncbi.nlm.nih.gov/pub/pmc), where “?-?” is “0-9” or “a-z”. the size of the zip file was approximately 6.17gb. after extraction, the size of the articles was approximately 10gb. the 571,890 articles from 3,046 journals were grouped and https://www.amcharts.com/ https://www.apachefriends.org/download.html ftp://ftp.ncbi.nlm.nih.gov/pub/pmc information technology and libraries | march 2018 43 sorted by journal title in a folder labeled with an abbreviated title. 
an xml file would, for example, be named “aapsj-10-1-2751445.nxml,” where “aapsj” was the abbreviated title of the journal american association of pharmaceutical scientists journal, “10” was the volume of the journal, “1” was number of the issue, and “2751445” was the pmcid. we used related technologies for developing systems that include php language, array usage, and the apriori algorithm to analyze the articles and build the co-citation system.13 finally, several analysis modules were created to build an integrated co-citation system. research procedure the following is our seven-step research procedure to fulfill the integrated co-citation system: 1. parse xml file: select tags for construction of database; choose fields for co-citation analysis (for example, , , and ). 2. present web-based article: design webpage and css style; present web-based xml file by indexing variable . 3. build an abstract database: the database consists of several fields: , , , , , , and . 4. develop searching module: pass the keyword to the method “post” in sql query language and present the search result in the webpage. 5. develop statistical module: the statistical results include number of article and cited articles, the journals and authors cited in all articles, and the number of cited articles. 6. develop citation module: visually present the statistical results in several formats; rank searched journals; rank searched and cited journals in all the articles. 7. develop co-citation module: analyze the association between articles with the apriori algorithm. association rule algorithms the association rule (ar), usually represented by ab, means that the transaction containing item a also contains item b. there are many such rules in most of the dataset, but some were useless. to validate the rules, two indicators, support and confidence, can be applied. support, which means usefulness, is the number of times the rules feature in the transactions, whereas confidence means certainty, which is the probability that b occurs whenever the a occurs. we chose the rules for which the values of both support and confidence were greater than a predefined threshold. for example, a rule stipulating “toastjam” has support of 1.2 percent and confidence of 65 percent, implying that 1.2 percent of the transactions contain “toast” and “jam” and that 65 percent of the transactions containing “toast” also contained “jam.” the principle for generating the ar is based on two features of the documents: (1) find the highfrequency items that set their supports greater than the threshold; (2) for each dataset x and its subnet y, check the rule xy if the support is greater than the threshold, in which the rule xy means that the occurrence in the rule containing x also contains y. most studies focus on searching high-frequency item sets.14 the most popular approach for identifying the item sets is apriori algorithm, as shown in figure 2.15 the algorithm rationale is that if the support of item set i is less identifying emerging issues in the healthcare domain | chu, lu, and liu 44 https://doi.org/10.6017/ital.v37i1.9595 than or equal to the threshold, i is not a high-frequency item set. new item set i that inserts any item a into i would not be a high-frequency item set. according to the rationale, the apriori algorithm is an iteration-based approach. 
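the support and confidence measures defined above can be computed directly; the small sketch below (ours, not the authors' code) checks a rule a → b against a list of transactions, where each transaction might be the set of journals cited by one article. the iteration that finds the high-frequency item sets is described next.

# sketch: support and confidence of an association rule A -> B
# over a list of transactions (each transaction is a set of items).
def support(transactions, itemset):
    # Fraction of transactions containing every item in `itemset`.
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, a, b):
    # P(B | A): support of A union B divided by support of A.
    sup_a = support(transactions, a)
    return support(transactions, set(a) | set(b)) / sup_a if sup_a else 0.0

if __name__ == "__main__":
    baskets = [{"toast", "jam"}, {"toast"}, {"jam", "tea"}, {"toast", "jam", "tea"}]
    print(support(baskets, {"toast", "jam"}))        # 0.5
    print(confidence(baskets, {"toast"}, {"jam"}))   # about 0.67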
first, it generates candidate item set c1 by calculating the number of occurrences of each attribute and finding that the high-frequency item set l1 has support greater than the threshold. second, it generates item set c2 by joining l1 to c1, iteratively finding l2 and generating c3, and so on. 1: l1 = {large 1-item sets}; 2: for (k=2; lk-1; k++) do begin 3: ck = candidate_gen (lk-1); 4: for all transactions td do begin /* generate candidate k-dataset*/ 5: ct = subset (ck, t); 6: for all candidates c  ct do 7: c_count=c_count+1; 8: end 9: lk ={cck | c_count ≥ minsuppport} 10: end 11: return l =  lk; figure 2. the apriori algorithm. the apriori algorithm is one of the most commonly used methods for ar induction. the candidate_gen algorithm, as shown in figure 3, includes join and prune operations for generating candidate sets.16 steps 1 to 4 generate all possible candidate item sets c from lk-1. steps 5 to 8: delete the item set that is not a frequent item set by the apriori algorithm. step 9 returns candidate set ck to the main algorithm. 1: for each item set x1 lk-1 2: for each item set x2 lk-1 3: c = join (x1[1], x1[2], x1[k-2], x1[k-1], x2[k-1]) 4: where x1[1] = x2[1], x1[k-2] = x2[k-2], x1[k-1] < x2[k-1]; 5: for item sets c  ck do 6: for all (k-1)-subsets s of c do 7: if (s  lk-1) then add c to ck; 8: else delete c from ck; 9: return ck; figure 3. the candidate_gen algorithm. information technology and libraries | march 2018 45 results we searched the pmc database with keywords “healthcare,” “telecare,” “ecare,” “ehealthcare,” and “telemedicine” and located 681 articles with a combined 14,368 references. values were missing from the year field for 4 of the references; this was also the case for 635 of a total of 52,902 authors. according to the keyword search for the healthcare domain, a pie chart of the journal citation analysis, as shown in figure 4, the top-ranked journal in terms of citations was the british medical journal (bmj). it was cited approximately 439 times, 18.89 percent of the total, followed by the journal of the american medical association (jama), which was cited approximately 344 times, 14.80 percent of the total. the trend of healthcare citation 1852 to 2009 peaked in 2006 at approximately 1,419 citations, with more than half of the total occurring in this year. figure 4. top-cited journals in the healthcare domain by percentage of total citations (n = 2324) with the keyword search for the healthcare domain, figure 5 shows a pie chart of the author citations. the most-cited author was j. w. varni, professor of pediatric cardiology at the university of michigan mott children’s hospital in ann arbor. this author was cited approximately 149 times, equivalent to 23.24 percent of the total, followed by d. n. herndon, professor at the department of plastic and hand surgery, friedrich-alexander university of erlangen in germany. this author was cited approximately 73 times, 11.39 percent of the total. by identifying the affiliations of the topranked authors, researchers can access related information in their field of interest. the co-citation analysis was conducted using the apriori algorithm. the relationship of co-citation journals with a supporting degree greater than 38 from 1852 to 2009 is shown in figure 6. each identifying emerging issues in the healthcare domain | chu, lu, and liu 46 https://doi.org/10.6017/ital.v37i1.9595 journal was denoted by a node, where the node with double circle meant the journal is co-cited with the other in a citing article. 
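the apriori and candidate-generation pseudocode in figures 2 and 3 above can be condensed into a short python sketch; this is our paraphrase rather than the authors' php implementation, with each citing article's set of cited journals treated as one transaction, which is how co-cited journal pairs like those reported below can be counted.

# sketch of the Apriori iteration from figures 2 and 3:
# start from frequent 1-itemsets, then repeatedly join, prune, and count support.
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    counts = count({frozenset([i]) for i in items})           # candidate 1-itemsets
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    all_frequent = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # join step: unions of (k-1)-itemsets that form k-itemsets
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = count(candidates)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

if __name__ == "__main__":
    # each "transaction" is the set of journals cited together by one article
    citing = [{"bmj", "lancet"}, {"bmj", "jama", "lancet"}, {"bmj", "lancet"}, {"jama"}]
    print(apriori(citing, min_support=2))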
bmj, which covers the fields of evidence-based nursing care, obstetrics, healthcare, nursing knowledge and practices, and others, is the core journal of the healthcare domain. figure 5. top-cited authors in journals of the healthcare domain by percentage of total citations (n = 641) figure 6. the relationship of co-citation journals with bmj. information technology and libraries | march 2018 47 to identify the focus of the journal, we analyze the co-citation in three periods. in 1852–1907, journals are not in co-citation relationships; in 1908–61, five candidates had a supporting degree greater than 1 (see table 1); and in 1962–2009, twenty-eight candidates had a supporting degree greater than 14 (see table 2 (for example, bmj and lancet had sixty-eight co-citations). table 1. candidates in co-citation analysis with a supporting degree greater than 1 (1908–61). no journals no. of journals co-cited support 1 publ math inst hung acad sci, publ math 2 3 2 jaoa, j osteopath 2 1 3 antioch rev, j abnorm soc psychol 2 1 4 n engl j med, am surg 2 1 5 arch neurol psychiatry, j neurol psychopathol, z ges neurol psychiat 3 1 table 2. candidates in co-citation analysis with a supporting degree greater than 14 (1962–2009). no journals no. of journals co-cited support 1 bmj, lancet 2 68 2 bmj, jama 2 65 3 jama, med care 2 64 4 bmj, arch intern med 2 61 5 lancet, jama 2 52 6 soc sci med, bmj 2 52 7 jama, arch intern med 2 51 8 lancet, med care 2 50 9 crit care med, prehospital disaster med 2 49 10 n engl j med, bmj 2 49 11 n engl j med, lancet 2 49 12 n engl j med, jama 2 47 13 n engl j med, med care 2 47 14 qual saf health care, bmj 2 47 15 bmj, crit care med 2 42 16 med care, bmj 2 38 17 n engl j med, j bone miner res 2 33 identifying emerging issues in the healthcare domain | chu, lu, and liu 48 https://doi.org/10.6017/ital.v37i1.9595 18 n engl j med, j pediatr surg 2 26 19 lancet, j pediatr surg 2 25 20 jama, nature 2 25 21 lancet, jama, bmj 3 24 22 n engl j med, lancet, bmj 3 21 23 intensive care med, bmj 2 21 24 bmj, n engl j med, jama 3 20 25 n engl j med, jama, lancet 3 20 26 jama, med care, lancet 3 14 27 jama, med care, n engl j med 3 14 28 bmj, jama, lancet, n engl j med 4 14 the link of co-citation journals in three periods from 1852 to 2009 can be summarized as follows: (1) three journals were highly cited but were not in a co-citation relationship in 1852–1907 (see figure 7); (2) five clusters of the healthcare journals in co-citation relationships were found for the years 1908–61 (see figure 8); and (3) 1962–2009 had a distinct cluster of four journals within the healthcare domain (see figure 9). figure 7. the relationship of co-citation journals for the healthcare domain in 1852–1907. information technology and libraries | march 2018 49 figure 8. the relationship of co-citation journals for the healthcare domain in 1908–61. journals with double circles are co-cited with the other in a citing article. journals with triple circles are cocited with the other two in a citing article. figure 9. the relationship of co-citation journals for the healthcare domain in 1962–2009. the thick line and circle indicates the journals are co-cited in a citing article. conclusions identifying emerging issues in the healthcare domain | chu, lu, and liu 50 https://doi.org/10.6017/ital.v37i1.9595 this paper presented an automated literature system for co-citation analysis to facilitate understanding of the sequence structure of journal articles cited in the healthcare domain. 
the system visually presents the results of its analysis to help researchers quickly identify the key articles that provide an overview of the healthcare domain. this paper used the keywords related to healthcare for its analysis and found that bmj is a core journal in the domain. the co-citation analysis found a single cluster within the healthcare domain comprising four journals: bmj, jama, lancet, and the new england journal of medicine. this paper focused on a co-citation analysis of journals. authors, articles, and issues featured in the co-citation analysis can be further studied in an automated way. a period analysis of publication years is also important. further analyses can facilitate understanding of the changes in a research domain and the trend of research issues. in addition, the automatic generation of a map would be a worthwhile topic for the future study. acknowledgements this article was funded by the ministry of science and technology of taiwan (most), formerly known as national science council (nsc), with grant no: nsc 100-2410-h-227-003. for the remaining authors none were declared. all the authors have made significant contributions to the article and agree with its content. there is no known conflict of interest in this study. references 1 a. kitson et al., “what are the core elements of patient-centered care? a narrative review and synthesis of the literature from health policy, medicine and nursing,” journal of advanced nursing 69 (2013): 4–8, https://doi.org/10.1111/j.1365-2648.2012.06064.x. 2 s. j. brownsell et al., “future systems for remote health care,” journal of telemedicine and telecare 5 (1999): 145–48, https://doi.org/10.1258/1357633991933503; b. g. celler, n. h. lovell, and d. k. chan, “the potential impact of home telecare on clinical practice,” medical journal of australia 171 (1999): 518–20; r. walker et al., “what it will take to create new internet initiatives in health care,” journal of medical systems 27 (2003): 95–98, https://doi.org/10.1023/a:1021065330652. 3 i. marshakova-shaikevich, the standard impact factor as an evaluation tool of science fields and scientific journals,” scientometrics 35 (1996): 283–85, https://doi.org/10.1007/bf02018487; i. marshakova-shaikevich, “bibliometric maps of field of science,” information processing & management 41(2005):1536–45, https://doi.org/10.1016/j.ipm.2005.03.027; a. r. ramosrodrí guez and j. ruí z-navarro, “changes in the intellectual structure of strategic management research: a bibliometric study of the strategic management journal, 1980–2000,” strategic management journal 25, no. 10 (2004): 982–1000, https://doi.org/10.1002/smj.397. 4 h. small, “co-citation in the scientific literature: a new measure of the relationship between two documents,” journal of american society for information science 24 (1973): 266–68. https://doi.org/10.1111/j.1365-2648.2012.06064.x https://doi.org/10.1258/1357633991933503 https://doi.org/10.1023/a:1021065330652 https://doi.org/10.1007/bf02018487 https://doi.org/10.1016/j.ipm.2005.03.027 https://doi.org/10.1002/smj.397 information technology and libraries | march 2018 51 5 m. m. kessler, “bibliographic coupling between scientific papers,” american documentation 14 (1963): 10–25, https://doi.org/10.1002/asi.5090140103; b. h. weinberg, “bibliographic coupling: a review,” information storage and retrieval 10 (1974): 190–95. 6 h. d. white and b. c. 
griffith, “author cocitation: a literature measure of intellectual structure,” journal of the american society for information science 32 (1981): 164–70, https://doi.org/10.1002/asi.4630320302. 7 y. ding, g. g. chowdhury, and s. foo, “bibliometric cartography of information retrieval research by using co-word analysis,” information processing & management 37 no. 6 (november 2001): 818–20, https://doi.org/10.1016/s0306-4573(00)00051-0. 8 small, “co-citation,” 266. 9 d. sullivan et al., “understanding rapid theoretical change in particle physics: a month-bymonth co-citation analysis,” scientometrics 2 (1980): 312–16, https://doi.org/10.1007/bf02016351. 10 n. shibata et al., “detecting emerging research fronts based on topological measures in citation networks of scientific publications,” technovation 28 (2008): 762–70, https://doi.org/10.1016/j.technovation.2008.03.009. 11 weinberg, “bibliographic coupling.” 12 white and griffith, “author cocitation.” 13 r. agrawal and r. srikant. “fast algorithm for mining association rules in large databases” (paper, international conference on very large databases [vldb], september 12–15, 1994, santiago de chile). 14 r. agrawal, t. imielinski, and a. swami, “mining association rules between sets of items in large databases” (paper, acm sigmod international conference on management of data, washington, dc, may 25–28, 1993. 15 agrawal and srikant, “fast algorithm,” 3. 16 ibid., 4. https://doi.org/10.1002/asi.5090140103 https://doi.org/10.1002/asi.4630320302 https://doi.org/10.1016/s0306-4573(00)00051-0 https://doi.org/10.1007/bf02016351 https://doi.org/10.1016/j.technovation.2008.03.009 abstract introduction methods dataset system architecture front-end development subsystem middle-end server subsystem back-end database subsystem pre-processing subsystem research procedure association rule algorithms results conclusions acknowledgements references information technology and libraries at 50: the 1960s in review mark cyzyk information technology and libraries | march 2018 6 mark cyzyk (mcyzyk@jhu.edu), a member of lita and the ital editorial board, is the scholarly communication architect in the sheridan libraries, the johns hopkins university, baltimore, maryland. in the quarter century since graduating from library school, i have now and then run into someone who had what i consider to be a highly inaccurate and unintuitive view of librarians and information technology. seemingly, in their view, librarians are at worst luddites and at best technological neophytes. not so! in my view, librarians have always been at worst technological power users and at best true it innovators. one has only to scan the first issues of ital, or the journal of library automation as it was then called, to put such debate to rest. march 1968 saw the first issue of the first volume of the journal of library automation published. the first article of that inaugural issue sets the scene: “computer based acquisitions system at texas a&i university” by ned c. morris. here we find librarians not only employing computing technology to streamline library operations (using an ibm 1620 with 40k ram), but as the article points out, this new system for computerizing acquisitions was an adjunct to the systems they already had in place at texas a&i for circulation and serials management. this first article in the first issue of the first volume indicates that we’ve dipped a toe into a stream that was already swiftly flowing. 
the other bookend of that first issue, “the development and administration of automated systems in academic libraries” by harvard’s richard de gennaro, goes meta and takes a comprehensive look at how automated library systems were already being created and the various system development and implementation rubrics under which such development occurred. much in this article should resonate with current readers of ital. i knew immediately that this article was going to be a good read when i encountered, in the very first paragraph: development, administration, and operations are all bound up together and are in most cases carried on by the same staff. this situation will change in time, but it seems safe to assume that automated library systems will continue to be characterized by instability and change for the next several years. i’d say that was a safe assumption. the second and final volume of the 1960’s contains gems as well. the entirety of volume 2 issue 2 that year was devoted to “usa standard for a format for bibliographic information interchange on magnetic tape” a.k.a. marc ii. is it possible for something to be dry, yet fascinating? some titles of this second volume point to the wide range of technological projects underway in the library world in 1969: mailto:mcyzyk@jhu.edu the 1960s in review | cyzyk 7 https://doi.org/10.6017/ital.v37i1.10339 • “an automated music programmer (musprog)” by david f. harrison and randolph j. herber • “a fast algorithm for automatic classification” by r. t. dattola • “simon fraser university computer produced map catalogue” by brian phillips and gary rogers • “management planning for library systems development” by fred l. bellomy • “performance of ruecking’s word-compression method when applied to machine retrieval from a library catalog” by ben-ami lipetz, peter stangl, and kathryn f. taylor and this is only in the first two volumes. as this current 2018 volume of ital proceeds, we’ll be surveying the morphing information technology and libraries landscape through ital articles of the seventies, eighties, and nineties. i think you will see what i mean when i say that librarians have always been at worst technological power users, at best true it innovators. a simulation model for purchasing duplicate copies in a library w. y. arms: the open university, and t. p. walter: unilever limited. at the time this study was undertaken the authors were at the university of sussex. 73 p1'ovision of duplicate copies in a lib1'at'y requires knowledge of the demand fo1' each title. since di1'ect measu1'ement of demand is difficult a simulation model has been developed to estimate the demand for a book f1'om the number of times it has been loaned and hence to dete1·mine the number of copies required. special attention has been given to accurate calibration of the model. introduction a common difficulty in library management is deciding when to buy duplicate copies of a given book and how many copies to buy. a typical research library has several hundred thousand different works; many are lightly used but all are potential candidates for duplication. the problem which we faced at sussex university was how to obtain reliable forecasts of the demand for each title and to translate this into a purchasing policy. at present sussex spends between £10,000 and £20,000 ($22,00o-$44,000) per year on duplicate copies, and as the university grows this amount is increasing steadily. 
because of the large number of books in a library relatively little data are available about each title. records are kept of books on loan or removed from the library, but frequently these are the only routine data collected. few large libraries even manage inventory checks. we therefore looked for a system that could be implemented with the minimum of data collection, preferably one based on existing records. forecasts of demand if the demand for a particular book is known, it is possible, though not necessarily easy, to determine how many copies of that book are needed to achieve a specified level of service, such as a copy being available on 80 percent of the occasions that a reader requires the book. unfortunately demand cannot be measured directly, even retrospectively. records of the 74 journal of librm·y automation vol. 7/2 june 1974 number of times that a book is issued from the library contain no information about how many times the book was used within the library, nor how many readers failed to find a copy and went away unsatisfied. since both these factors are extremely difficult to measure, one of the central parts of our work was to develop a method of estimating them from data readily available. to forecast demand two lines of approach seemed reasonable: subjective estimation based on faculty reading lists; and forecasts based on the number of loans in previous years. in the past, sussex library has made extensive use of reading lists provided by faculty to decide how many copies to buy of each title. as the books most in demand are those recommended for undergraduate courses this seemed a sensible approach, though the number of copies required is not obvious even if the demand is known. webster analysed the effectiveness of these lists in predicting demand for specific titles and evaluated the purchasing rule being used, one copy for every ten students taking a course. 1 restricting his attention to books known to be in demand and marked in the catalog, he drew a random sample of 673 titles, about 4 percent of the books falling into this category. he compared the number of loans of each of these titles over a term· with data from the reading lists supplied at the beginning of the term. as the library had made a special effort to obtain reading lists for all courses taught that term, he had data on the number and type of students taking each course, the importance given to each text, and the subject areas involved. yet despite a thorough analysis of these data webster was able to find very little relationship between observed demand and reading list information. his work shows that faculty at the university have remarkably little knowledge of the books that their students read. in the sample some books strongly recommended to large groups of students were hardly used and some of the most heavily used works appeared on no reading list. the results of this study are fascinating from an educational viewpoint but less satisfying as operational research. the failure of this .. approach led us to predicting demand from records of the number of past loans. this divides into two parts: using the number of loans over a period to estimate what the total demand was during that period; and using this estimate of the demand in one period to forecast the demand in another. various evidence suggests that the latter is a sensible thing to do. the main demand for heavily used books comes from undergraduate courses. 
most faculty are loyal in their reading habits, recommending books they know rather than new ones, and each course tends to be repeated year after year with a syllabus that changes only gradually. the use of past circulation to forecast future use is fundamental to a markov model of book usage developed by morse and elston and tested with data from the m.i.t. engineering library. 2 for our work we have used the number of loans in a given term to predict the demand in the corresponding term a year later. simulation m odelj arms and walter 75 estimating the total demand in a period from the number of loans in that period is more difficult. this requires a model of the circulation system. mathematical approach several attempts have been made to apply the methods of inventory control or queueing theory to the problem of buying duplicates. for example, grant has recently described an operational system using the simple rule that the number of copies required to satisfy 95 percent of the demand is n (p,. + 2cr.)/t where n is the number of times that the book is issued during a period of t days and p,8 and cr8 are the mean and standard deviation of the time that each book is off the shelf when on loan. 3 this type of approach has the advantage of being straightforward to use. periodically a simple computer program analyzes the circulation history of each book in the library and prints a list of books requiring duplication. however, the method suffers from difficulties both mathematical and practical. to obtain the simple mathematical expression given above, several simplifying assumptions have to be made. for example, the expression ignores use of a book within the library, and identifies demand in a period with the number of loans within that period. practical difficulties in arriving at a more exact mathematical expression are discussed in the next section. difficulties in constructing a model the following are the main difficulties that we found in constructing a model, either mathematical or using simulation: 1. the most useful measure of the effectiveness of a duplication policy is satisfaction level, the proportion of readers who on approaching the shelves find a copy of the book there, but satisfaction level is almost impossible to measure directly since, although some unsatisfied readers ask that the book be held for them, most go away without comment. more or less equivalent is the percentage time on shelf, the proportion of time that at least one copy of the book is available. this can be measured directly, though a visit to the shelves is needed, and was found useful in validating our model. if the underlying demand is random these two measures of effectiveness have the same value. 2. use of books within the library is also difficult to measure. at sussex, as in most libraries, data are available only on the number of times that a book is lent out of the library. if a reader does not find a copy on the shelves or if he uses a book within the library but does not take it away then no record is generated. since various studies, notably that of fussier and simon, suggest that the amount of use within li76 ]oumal of libmry automation vol. 7/2 june 1974 braries often exceeds the number of loans recorded by a factor of three or more, if the number of loans is used to estimate demand a reasonable knowledge of within-library use is essential.4 3. the number of copies required to achieve a specified satisfaction level does not go up linearly with demand. 
since a reader is satisfied if he finds a single copy on the shelves, proportionately fewer duplicates are needed of the books most in demand. at sussex more than twenty copies are provided of several books and this nonlinearity is very noticeable. 4. the demand for a title is erratic, changing from term to term, from week to week, and from day to day, even if the mean demand is constant. over a period such as a term three different effects might be expected: a background random demand independent of university courses; sudden peaks when a book is required for a course taken by several students; and feedback caused by previously unsatisfied readers returning. 5. the circulation of books is surprisingly complicated. at sussex some books are designated short term loan and can be borrowed for up to four days only; the remainder are long term loan books and can be borrowed for up to six weeks. circulation data show that the time for which a book is off the shelf is not the same as the period for which it is lent, but has a heavily skewed distribution. few books are returned until near the due date; just before the book is due back there is a peak when most books are returned but many become overdue and the tail of the distribution dies away slowly. simulation as these various factors seemed too complex to derive usable mathematical results, we decided to use computer simulation of the book circulation. simulation of book circulation is not new. in particular it has been used at lancaster university by mackenzie et al. to decide loan periods.5 their report includes a good description of the general approach. the object of our simulation was to model the circulation process so that we could study the relationship between three groups of parameters: 1. observed data (number of copies available; number of loans); 2. total underlying demand; 3. measures of effectiveness (satisfaction level; percentage time on shelf). the results obtained from any simulation are only as accurate as the values given to the variables used to calibrate the model. as several of these values were not known at all accurately when the work was begun, special efforts were put into careful validation and calibration of the model.
a separate study was made for a small sample of books, to compare the percentage time on shelf estimated by the simulation with the actual time for which a copy was available, found by looking at the shelves. the results of this study were used to check the amount of use within the library. by this means we were able to verify the simulation model and calibrate it to a highly satisfactory level of accuracy. description of program the basic layout of the simulation is shown in figure 1. this is a time-advance model with a period of one day. the program has been coded in fortran and running on the icl 1904a computer at sussex takes about one second of machine time to simulate two years. this fast speed has enabled us to try a wide range of values for most parameters and to experiment with a variety of distributions of arrival times and book return dates. 1. satisfaction level at the beginning of each day the number of demands for that day is generated. the satisfaction level is taken as the proportion of these requests which can be satisfied from the books left on the shelf from the previous day and those returned during the simulated day. 2. within-library use the proportion of use that takes place within the library was a key parameter in calibrating the model. the first version of the simulation program assumed a figure of 25 percent use within the library. this was based on a small survey of the type of books being studied, standard texts used for undergraduate courses. the weakness of this survey was that it used a count of those books that were left lying in the library at the end of the day and did not make sufficient allowance for books reshelved by readers or by library staff during the day. the validation experiment showed a consistent difference between predicted and observed percentage time on shelf which could be corrected by changing the value of the within-library use parameter to 60 percent. 3. distribution of demand two distributions of demand have been used, poisson arrivals with a specified mean, and a step demand superimposed on a poisson process.
in both cases provision is made for a proportion of unsatisfied readers to return later. as the effect of this feedback is to introduce sharp peaks of demand, the two distributions have proved surprisingly similar in the results produced and most of the runs of the program have been done with random demand. a recent survey showed that 69 percent of readers who fail to find a book intend to return, but we do not know how many actually come back nor what the time interval is before they return. 6 the simulation proved to be insensitive to moderate changes of these parameters 78 journal of library automation vol. 7/2 june 1974 advance clock one day add returned books generate requests fig. 1. outline flowchart of simulation program generate :return date generate return date reader return date simulation model/ arms and walter 79 and for most runs 25 percent of unsatisfied readers were deemed to return after a delay which averaged two days. 4. period for which the book is off the shelf the simulation allows for a book to be borrowed within the library, in which case it is available again the next day, or to be lent from the library. if the book is lent, the return date is generated from one of two histograms which respectively refer to books available on short and long term loan. these histograms were derived from an analysis of all books returned during one week in autumn 1970, modified to reflect changes in the circulation system. validation experiment although the structure of the simulation is fairly straightforward several parameters used in the model have been estimated indirectly. validation of the model took two forms. firstly we ran the program with a wide range of values for the main parameters to see which most influence the results. secondly a small study was set up to measure the percentage time on shelf of a number of books. for each book, the actual availability was estimated by the simulation from the number of loans during the same period. twenty-eight books known to be in heavy demand were selected, half in physics and half in sociology. over a period of eight weeks the shelves were inspected once per day, at random times during the day, to see if a copy was available. the number of loans of each copy of each book during the period was noted and the library staff carried out a thorough check to determine whether any copies shown in the catalog had been lost, stolen, or had their loan category altered. the simulation was used to estimate the percentage time on shelf and this was plotted on a graph against the observed percentage. figure 2 shows the graph for the original values of the parameters. in this graph the x axis shows the percentage time on shelf predicted by the simulation; the y axis shows the percentage observed. if the model were perfect the points would lie near the line y = x, deviations being caused by y being a random variable. the graph in figure 2 is clearly convex downwards showing a consistent error in the model, with these values of the parameters. knowing that the simulation is sensitive to the parameter giving the proportion of use that takes place within the library and that our estimate of its value was not precise, a series of graphs were prepared varying this parameter. figure 3 shows the same observations plotted against predictions assuming 60 percent use within the library, the value which best predicts the observations. this graph is much closer to being linear than figure2. 
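as an illustration of the time-advance model outlined under "description of program" above, the sketch below steps through one simulated day at a time with poisson demand, a within-library-use share, and loan periods drawn from a small empirical distribution; every numeric value is a placeholder rather than a calibrated sussex parameter.

# sketch: daily time-advance loop for availability of one title.
# returns the simulated satisfaction level (share of requests that found a copy).
import math
import random

def poisson(lam):
    # Knuth's method for drawing a poisson-distributed count
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

def simulate(days=120, copies=3, mean_demand=1.5,
             within_library=0.60, loan_lengths=(2, 5, 10, 28)):
    on_shelf = copies
    out = []                                   # days remaining until each copy returns
    satisfied = requests = 0
    for _ in range(days):
        out = [d - 1 for d in out]
        on_shelf += sum(1 for d in out if d <= 0)    # returned copies go back on the shelf
        out = [d for d in out if d > 0]
        for _ in range(poisson(mean_demand)):
            requests += 1
            if on_shelf > 0:
                satisfied += 1
                on_shelf -= 1
                if random.random() < within_library:
                    out.append(1)                    # used within the library, back tomorrow
                else:
                    out.append(random.choice(loan_lengths))
    return satisfied / requests if requests else 1.0

if __name__ == "__main__":
    print(round(simulate(), 2))                      # estimated satisfaction level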
the next question is whether the nonlinearities in figure 3 are the type to be expected from y being a random variable. a very rough calculation helps to answer this question. if we make the dubious assumption that availability of a copy on a given day is independent of the days before and afterwards, then, for x given, y should be approximately normally distributed with mean x and variance x(1 - x)/n, where n is the number of days in the study (forty). if this calculation were exact, 95 percent of the observations of y would lie within two standard deviations of x, but, since the assumption of independence is definitely false, we would expect the number of observations which fall within the range to be less than 95 percent. the curves y = x ± 2{x(1 - x)/n}^½ have been added to figure 3. fig. 2. observed percentage time on shelf against predicted (25 percent use within library). fig. 3. observed percentage time on shelf against predicted (60 percent use within library) with 95 percent probability curves. two points lie well off all graphs and cannot be explained except as the result of books being stolen or lost during the period of the study. of the remaining twenty-six all but three lie within the curves. this shows that the simulation model as finally calibrated gives a very reasonable description of the situation. operational experience the results of this simulation have been used by library staff since the middle of 1971, initially on an experimental basis. a two-stage process is involved. from the computer based circulation system can be found the number of times that each short term loan copy has been circulated. from these figures the library staff can estimate the demand for a title over a given period. once the demand has been estimated the staff can use the simulation again to determine how many copies would have been required to have achieved a specified satisfaction level, perhaps 80 percent. if fewer copies are held by the library, orders are placed for extra copies. at present these procedures are done manually using tables, but the possibility exists of modifying the computer system to identify those titles which need extra duplication. the actual decision to purchase needs to be done by library staff who can take account of factors not included in the simulation, such as price and changes of undergraduate courses. conclusion although this work was carried out during 1971, we shall have little operational experience of the method in action until the computer circulation system is reorganized. in the past, different copies of the same book have been processed entirely independently, meaning that the total number of loans of a given title can only be found by manually adding up the number of loans of each copy. in the revised computer system this will be done automatically. experience will probably show that the best procedure combines use of the simulation model with reading lists and the skill of a librarian. one possible feature of a computer based system is that it could automatically indicate which books appear to require duplication.
the method used here would seem to apply equally well to other libraries. naturally the circulation patterns of other libraries are different, which means that a different simulation would be needed, but this work has shown that it is possible to calibrate a simulation accurately enough to examine the circulation of individual books. acknowledgments we would like to thank the many members of the university of sussex library staff who have helped at various stages, particularly p. t. stone who was closely involved throughout. references 1. p. f. webster, provision of duplicate copies in the university library, final year project report (university of sussex, 1971). 2. p. m. morse and c. r. elston, "a probability model for obsolescence," operations resem·ch 17:36-47 (1969). 3. r. s. grant, "predicting the need for multiple copies of books," journal of library automation 4:64-71 (june 1971). 4. h. h. fussier and j. l. simon, patterns in the use of books in large research libmries (chicago: univ. of chicago pr., 1969). 5. a. g. mackenzie et al., systems analysis of a university library. report to osti on project sl/ 52/02, 1969. 6. j. urquhart, private discussion, 1971. lib-s-mocs-kmc364-20140601051313 17 a truncated search key title index philip l. long: head, automated systems research and development and frederick g. kilgour: director, ohio college library center, columbus, ohio. an experiment showing that 3, 1, 1, 1 search keys derived from titles are sufficiently specific to be an efficient computerized, interactive index to a file of 135,938 marc ii records. this paper reports the findings of an experiment undertaken to design a title index to entries in the ohio college library center's on-line shared cataloging system. several large libraries participating in the center requested a title index because experience in those libraries had shown that the staff could locate entries in files more readily by title than by author and title. users of large author-title catalogs have long been aware of great difficulties in finding entries in such catalogs. since the center's computer program for producing an author-title index could be readily adapted to produce a title index, it was decided to add title access to the system. a previous paper has shown that truncated three-letter search keys derived from the first two words of a title are less specific than authortitle keys ( 1). earlier work had revealed that addition of only the first letter of another word in a title improved specificity ( 2) . therefore, the experiment was designed to test the specificity of keys consisting of the first three characters of the first non-english-article word of the title plus the first letter of a variable number of consecutive words. the experiment was also designed to produce an index that catalogers could use efficiently and that would operate efficiently in the computer system. it was assumed that the terminal user would have in hand the volume for which an entry was to be sought in the on-line catalog. the index was not to be designed for use by library users; subsequent experiments will be done to design an index for nonlibrarian users. other investigations into computerized, derived-key title indexes include 18 journal of library automation vol. 5/1 march, 1972 the previous paper in this series to which reference has already been made ( 1) and development of a title index in stanford's ballots system ( 3). 
although stanford has not published results observed from experiment or experience that describe the retrieval specificity of its technique, it is clear that the stanford procedure is not only more powerful than the one described in this paper but also more adaptable for user employment. the stanford index is probably less efficient. materials and methods a file of 135,938 marc ii records was used in this experiment. this file contains title-only and name-title entries, and keys were derived from titles in both types of entries. a key was extracted consisting of the first three characters of the first non-english-article word of each title plus the first character of each following word up to four. if there were fewer than four additional words, the key was left-justified, with trailing blank fill. only alphabetic and numeric characters were used in key derivation; alphabetic characters were forced to uppercase. all other characters were eliminated and the space occupied by an eliminated character was closed up before the key was derived. a total of 115,623 distinct keys was derived from the 135,938 entries. these 115,623 keys were then sorted. each key in the file was compared with the subsequent key or keys and equal comparisons were counted. a frequency distribution by identical keys was thus prepared, and a table constructed of percentages of numbers of equal comparisons based on the total number of distinct keys. this table contains the percentage of time for expected numbers of replies based on the assumption that each key had a probable use equal to all other keys. next, by eliminating the fourth single character and then the fourth and third, files of 3,1,1,1 and 3,1,1 keys were prepared from the 3,1,1,1,1 file. for example, the 3,1,1,1,1 key for raymond irwin's the heritage of the english library is her, 0, t, e , l; the 3,1,1,1 key for this title is her, 0 , t, e; and the 3,1,1 key, her, 0 , t. the same processing given to the 3,1,1,1,1 file was employed on these two files. results table 1 contains the maximum number of entries in 99 percent of replies. inspection of the table reveals that there is a large increase in specificity when the key is enlarged from 3,1,1 to 3,1,1,1; the maximum number of entries ( 99+ percent of the time) drops from twelve to five. however, when the key goes to 3,1,1,1,1, the number of entries per reply goes down only to four from five. the percentage of replies that contained a single entry was 67.8 for the 3,1,1 key, 84.0 for the 3,1,1,1 key, and 90.0 for the 3,1,1,1,1 key. a truncat ed search key / long and kilgour 19 table. 1. maximum number of entries in 99 percent of replies search key 3, 1,1 3, 1, 1,1 3, 1, 1, 1,1 title index entries maximum entries per reply 12 5 4 percentage of time 99.0 99.1 99.2 the irascope cathode ray tube terminals used in the oclc system can display nine truncated entries on the one screen, and it is felt that catalogers can use with ease up to two screensful of entries. therefore, the keys producing more than eighteen titles were listed. for 3,1,1,1,1 there were only 33; for 3,1,1,1 there were 67; and for 3,1,1 there were 357. the maximum number of identical keys was 321 for 3,1,1,1,1 and 3,1,1,1; the key was pro, b, b, b, b, most of which was d erived from "proceedings." for 3,1,1 the maximum was 417, for his, 0 , t "history of the." discussion it is clear from the findings that a 3,1,1 search key is not sufficiently specific to operate efficiently as a title index in a large file. 
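the key-derivation rules given under materials and methods can be restated as a short function; this is a reconstruction rather than the original code, and the list of english articles to skip is an assumption.

# sketch: derive a 3,1,1,1 (or 3,1,1,1,1) truncated search key from a title.
# keeps only letters and digits, skips leading english articles, upper-cases,
# and blank-fills when the title has too few words.
import re

ARTICLES = {"a", "an", "the"}    # assumed list of english articles to skip

def search_key(title, extra_words=3):
    words = [re.sub(r"[^0-9a-z]", "", w.lower()) for w in title.split()]
    words = [w for w in words if w]
    while words and words[0] in ARTICLES:
        words = words[1:]
    if not words:
        return ""
    parts = [words[0][:3].ljust(3)] + [
        (words[i][0] if i < len(words) else " ") for i in range(1, 1 + extra_words)
    ]
    return ",".join(p.upper() for p in parts)

if __name__ == "__main__":
    print(search_key("The Heritage of the English Library"))      # HER,O,T,E
    print(search_key("The Heritage of the English Library", 4))   # HER,O,T,E,L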
however, the 3,1,1,1 key appears to be sufficiently specific for efficient operation, while the 3,1,1,1,1 key does not appear to possess sufficient increased specificity to justify its additional complexity. the observation that there is a large increase in specificity between keys employing threeand four-title words that constitute markov strings suggests that the second and third words may be highly correlated. indeed this suggestion is substantiated b y the maximum case for 3,1,1-his, 0, t. in the more-than-eighteen group for 3,1,1,1, these characters occurred in seven keys for a total of 206 entries, and for 3,1,1,1,1 they did not occur at all in the more-than-eighteen group. conclusion this experiment has shown that a 3,1,1,1 or 3,1,1,1,1 derived search key is sufficiently specific to operate efficiently as a title index to a file of 135,938 marc ii records. since a previous paper observed that as a fil e of entries increases the number of entries per reply does not increase in a one-to-one ratio ( 1 ), it is likely that these keys will operate efficiently for files of considerably greater size. 20 journal of library automation vol. 5/1 march, 1972 references 1. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l. landgraff, "title-only entries retrieved by use of truncated search keys," l ournal of library automation 4:207-10 ( dec. 1971 ). 2. frederick g. kilgour, "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science 5: 133-36 ( 1968 ). 3. edwin b. parker, spires (stanford physics information retrieval system) 1969-70 annual report ( palo alto: stanford university, june 1970 ), p. 7778. business intelligence in the service of libraries articles business intelligence in the service of libraries danijela tešendić and danijela boberić krstićev information technology and libraries | december 2019 98 danijela tešendić (tesendic@uns.ac.rs) is associate professor, university of novi sad. danijela boberić krstićev (dboberic@uns.ac.rs) is associate professor, university of novi sad. abstract business intelligence (bi) refers to methodologies, analytical tools, and applications used for data analysis of business information. this article aims to illustrate an application of bi in libraries, as reporting modules in library management systems are usually inadequate for a comprehensive business analysis. the application of bi technology is presented as a case study of libraries using the bisis library management system in order to overcome shortcomings of an existing reporting module. both user requirements regarding reporting in bisis and already existing transactional databases are analysed during the development of a data warehouse model. based on that analysis, three data warehouse models have been proposed. also, examples of reports generated by an olap tool are given. by building the data warehouse and using olap tools, users of bisis can perform business analysis in a more user-friendly and interactive manner. they are not limited with predefined types of reports. librarians can easily generate customized reports tailored to the specific needs of the library. introduction organizations usually have a vast amount of data which increases on a daily basis. the success of an organization is directly related to its ability to provide relevant information in a timely manner. 
an organization must be able to transform raw data into valuable information that will enable better decision-making.1 for this reason, it is impossible to imagine an organization without an efficient reporting module as a part of its management information system. if we put libraries in a business context, they are very similar to any other organization. the common characteristic of each is that they have high demand for a variety of statistical reports in order to support their business. a library management system uses a transactional database to store and process relevant data. this database is designed in accordance with the main functionalities of the system. information used to make strategic decisions is usually obtained from historical and summarized data. however, the database model may have a complex structure and may not be suitable for performing analytical queries that are often very complex and involve aggregations. execution of those queries may be a time-consuming and resource-intensive process that can decrease performance as well as the availability of the system itself. also, creating such queries can require advanced it knowledge. these problems can be overcome by developing business intelligence systems. business intelligence (bi) refers to methodologies, analytical tools, and applications used for data analysis of business information. bi gives business managers and analysts the ability to conduct mailto:tesendic@uns.ac.rs mailto:dboberic@uns.ac.rs business intelligence in the service of libraries |tešendić and krstićev 99 https://doi.org/10.6017/ital.v38i4.10599 appropriate analyses. by analyzing historical and current data, decision-makers get valuable insights that enable them to make better, more-informed decisions. bi systems rely on a data warehouse as an information source. the data warehouse is a repository of data usually structured to be available in a form ready for analytical processing activities. 2 business intelligence systems do not exist as ready-made solutions for each organization, but need to be built in accordance with the characteristics of each organization using the appropriate methodology. this article proposes a data warehouse architecture and usage of olap tools in order to support bi in libraries. the application of bi technology is illustrated through a case study of libraries using the bisis library management system. the first step in implementation of bi was the creation of a data warehouse model considering data that exist in bisis and requirements regarding reporting. after the data warehouse model had been created, data were loaded into the data warehouse using olap tools. olap tools were also used for visualization of data stored in the data warehouse. reporting in bisis the bisis library management system has been in development since 1993 by the university of novi sad, serbia. currently, the bisis community comprises over forty medium-sized libraries in serbia.3 the primary modules of the bisis system include cataloguing, reporting, circulation, opac, bibliographic data interchange, and administration. 
bisis supports cataloguing according to unimarc and marc 21 formats using an xml editor for bibliographic material processing.4 the bisis search engine is implemented with a lucene engine.5 bisis supports z39.50 and sru protocols for the search and retrieval of bibliographic records.6 those protocols are also used for developing a bisis service for searching and downloading electronic materials by the audio library system for visually impaired people.7 in addition, bisis allows sharing of bibliographic records with the union catalogue of the university of novi sad.8 the circulation module features all standard activities for managing users: registration, charging, discharging, searching users and publications, and generating different kinds of reports, as well as user reminders.9 the reporting module of bisis is implemented using the jasperreports tool.10 however, this module has some limitations due to the fact that bisis works only with a transactional database and does not cope well with complex reports. firstly, in order to generate reports regarding library collections, it is necessary to process all bibliographic records stored in that transactional database. this activity significantly burdens the system and reduces its performance. to avoid this, reports are prepared in advance outside working hours, usually at night. consequently, those reports include only data collected before report generation. creating reports in this manner greatly reduces system load and speeds up presentation of the reports because they are already generated. however, some reports, such as those related to the financial aspects of the library (e.g., the number of new members and the balance at the end of the day), need to be created in real time. due to the execution in real time, those reports are ineffective and affect the performance of the entire system. the next limitation of this reporting module is that it has a set of predefined reports and the creation of new reports requires additional development. in the current deployment it is not possible to add new reports without engaging software developers. also, an additional obstacle is the fact that the data for generating reports are obtained from two different data sources (described in more detail in the following sections). for example, the report regarding the number information technology and libraries | december2019 100 of borrowed books by the udc (universal decimal classification) groups requires data about the udc groups from xml documents and data about book borrowing from the relational database. generating this kind of reports cannot be done in a timely and efficient manner. taking into account these shortcomings of the reporting module, it can be concluded that the application of business intelligence, primarily data warehouse and olap tools, could improve analytical data processing in the libraries using bisis. related work one of the basic components of the business intelligence system is a data warehouse. a data warehouse is a centralized database that stores historical data. those data are in principle unchangeable and they are obtained by collecting and processing data from various data sources. data warehouses are used as support for making business decisions.11 the data sources for a data warehouse can be diverse and may include transactional databases and different file formats. the process of integrating data from different data sources into a single database is called data warehousing. 
data warehousing includes extracting, transforming, and loading (etl) data into data warehouse.12 the goal of data warehousing is to extract useful data for further analysis from the huge amount of data that is potentially available. there are different approaches to modeling a data warehouse. these approaches can be classified in three different paradigms according to the origin of the information requirements: (1) supplydriven, (2) demand-driven, and (3) hybrids of these. a supply-driven approach is based on data that exist in the transactional database. these data are analyzed to determine which data are the most relevant for making business decisions, or which data should be part of the data warehouse. alternatively, a demand-driven approach is based on the end-user requirements which means that the data warehouse is modeled in a way that is possible to get answers to the questions asked by the users. the third approach is a hybrid approach and it combines the previous two approaches in the process of data warehouse modelling. the hybrid approach attempts to diminish the shortcomings of the previous two approaches. in the case of a supply-driven approach, the data warehouse will probably not meet the requirements of the end users, while in the demand-driven approach there may be no data to fill the created data warehouse. in an article published in 2009, romero and abelló gave an overall view of the research in the field of dimensional modeling of data warehouses.13 various examples of implementation of data-warehouse solutions in libraries can be found in the literature. in 2014, siguenza-guzman et al. described the design of a knowledge-based decision support system based on data-warehouse techniques that assists library managers making tactical decisions about the optimal use and leverage of their resources and services. when designing the data warehouse, the authors started from the requirements of the end users (demand -driven approach) and extracted data from heterogeneous sources.14 a similar approach has been used by yang and shieh, who started from the reports needed by public libraries in taiwan and through an iterative methodological approach modeled a data warehouse that meets all their reporting requirements.15 unlike the previously described articles where a demand-driven approach was used, we applied a hybrid approach to modeling data warehouse. we analyzed data sources that exist in bisis business intelligence in the service of libraries |tešendić and krstićev 101 https://doi.org/10.6017/ital.v38i4.10599 following a supply-driven approach, but we also analysed user requirements to identify the facts and dimensions for the dimensional data warehouse model. modeling the data warehouse in order to implement a data warehouse solution, the first step is to design a data model suitable for analytical data processing. a data warehouse usually stores data in a relational database and organizes them in so called dimensional models. unlike standard relational database models, those models are denormalized and provide easier data visualization. data can be presented as a cube with three, four or n-dimensions. analyzing such data is more intuitive and user-friendly. the dimensional model contains the following concepts: dimensions, facts, and measures. dimensions represent the parameters for data analysis while facts represent business entities, business transactions, or events that can be used in analyzing business processes. 
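to make these terms concrete before turning to specific models, the short sketch below sets up one fact table, two dimension tables, and a count measure, and then produces a typical analytical result by joining and aggregating them. it is written in python with pandas purely as an illustration; the table names, column names, and sample values (item_fact, language_dim, and so on) are invented for the example and are not the actual bisis warehouse schema.

```python
import pandas as pd

# dimension tables: the parameters by which facts will be analyzed
language_dim = pd.DataFrame(
    {"language_id": [1, 2, 3], "language": ["serbian", "english", "german"]}
)
year_dim = pd.DataFrame({"year_id": [1, 2], "year": [2018, 2019]})

# fact table: one row per business event (here, one acquired item);
# it carries the measure (the acquisition number, counted in reports)
# and foreign keys pointing into the dimension tables
item_fact = pd.DataFrame(
    {
        "acquisition_number": ["A-101", "A-102", "A-103", "A-104"],
        "language_id": [1, 1, 2, 3],
        "year_id": [1, 2, 2, 2],
    }
)

# a typical analytical query: number of acquired items per language and year,
# obtained by joining the fact table to its dimensions and counting the measure
report = (
    item_fact.merge(language_dim, on="language_id")
             .merge(year_dim, on="year_id")
             .groupby(["language", "year"])["acquisition_number"]
             .count()
             .rename("number_of_items")
             .reset_index()
)
print(report)
```

the same pattern, joining the fact table to the dimensions of interest and then aggregating the measure, is the shape of the analytical queries discussed in the remainder of this article.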
the most commonly used model in dimensional modeling is the star model. after identifying the facts and dimensions, a dimensional model almost always resembles a star, with one central fact and several dimensions that surround it. dimensions and facts are usually implemented as tables in the relational database. dimension tables contain primary keys and other attributes. fact tables contain numerical data as well as dimension tables keys. the measure is a numerical attribute of the fact table and can be obtained by aggregating data by certain dimensions. there are several approaches to modeling data warehouse and we followed a hybrid approach to design dimensional models presented in this article. this implies that both the existing data sources and the user requirements were considered while designing the final data-warehouse models. that modeling process involved the following activities: 1. analysis of existing data sources in bisis with identification of possible facts and dimensions, 2. analysis of user requirements regarding reporting, 3. refactoring of the facts and dimensions in accordance with the user requirements, and 4. design of dimensional models. analysis of data sources in bisis the first step in creating a data warehouse is an analysis of existing data sources. the bisis system uses two different data sources. bibliographic records are stored in xml documents, while circulation data, as well as holdings data regarding the items that are circulated, are stored in a relational database. in 2009, tešendić et al. described the bisis circulation database model.16 that model describes data about individual and corporate library members. data about members includes information about personal data, membership fees, as well as information about a member’s borrowed and returned items. bibliographic data in bisis are presented in unimarc format. dimić and surla in 2009 described the model for bibliographic records used in bisis.17 a bibliographic record is modeled as a list of fields and subfields. a field contains a name, values of the indicators and a list of subfields. a subfield contains a name and a value of that subfield. the data described by that model are stored in xml documents because the bibliographic record structure is not suitable for relational modeling. that structure is more in line with the document-oriented data storage approach. information technology and libraries | december2019 102 analysis of user requirements one of the essential functionalities of information systems, including library management systems, is to provide various statistical reports that should help the management of the library to make better business decisions. user requirements related to analytical processing in bisis can be grouped into several categories. the first category consists of requirements regarding reports on the library collections. examples of reports from this category are: • number of publications per language for a certain period of time; • number of publications by departments; • number of new publications for a certain period of time; and • number of publications by udc groups. the second category consists of requirements related to the circulation of library resources. examples of such reports are: • number of borrowed items by member category; • number of borrowed items by language of publication; • number of borrowed items by departments; • the most popular books; and • the most avid readers for a certain period. 
the third category consists of requirements related to the reports on financial elements of the library's business. some of the reports are: • number of new members on a daily basis with a financial balance; • number of members by membership category and gender; and • number of members per departments. analyzing user requirements, it was perceived that a new data warehouse have to be created using data from both data sources. this means that appropriate transformations of data from the relational database as well as from the bibliographic records documents need to be performed. data warehouse models taking into account the reporting requirements as well as the data that exist in bisis, appropriate dimensional models are designed. the proposed dimensional models were designed to meet all the needs for analytical processing, as well as to enable flexibility of the reporting process in bisis. for each of the observed groups of reports, a dimensional model was created as described below. model describing library collection data a dimensional model of the bisis data warehouse used for analytical processing of the library collection data is shown in figure 1. the data from this model are used to generate reports on the library collection. examples of such reports are accessions register, number of items by udc group, number of items by departments, etc. in generating all these reports, an acquisition number of an item has the main role and all reports are created either by counting the acquisition numbers or by displaying the acquisition business intelligence in the service of libraries |tešendić and krstićev 103 https://doi.org/10.6017/ital.v38i4.10599 numbers along with other data related to that item. therefore, the acquisition number represents the measure in this dimensional model. the central table in the model is the item table and it presents a fact table. this table contains the acquisition number and foreign keys from dimension tables. all other tables in the model are dimension tables. the publication table represents a dimension table containing bibliographic data from bibliographic records. only data that are needed for reports are extracted from bibliographic records and stored in this table. those data refer to the name of the author, the title of the publication, the publication’s isbn and udc number, the number of pages, keywords, and an identification number for the bibliographic record in the transactional database. the acquisition table represents a dimension that describes the publication's acquisition data such as a retail price, the name of the supplier, and the invoice number. the location table describes departments within the library where an item is stored. the status, publisher, language, and udc_group tables relate to information about the status of an item, publisher, language of the publication, and udc group to which an item belongs. the date and year tables represent the time dimensions. data in the date table are extracted from the date of an item acquisition and data in the year table are extracted from the publishing year. figure 1. dimensional model describing library collection data. information technology and libraries | december2019 104 model describing library circulation data a dimensional model of the bisis data warehouse used for the analytical processing of library circulation data is shown in figure 2. data from this model are used for generating statistical reports regarding usage of library resources. 
examples of such reports are the number of borrowed publications according to different criteria (such as user categories, language of publication, departmental affiliation of the user who borrowed the publication, etc.). these data can answer questions about the most popular books or the readers with the highest number of borrowed books. similar to the previous reporting group, the acquisition number of the item which was borrowed has the main role in generating those reports. all reports from this group are created by counting acquisition numbers of borrowed items and displaying data related to those checkouts. therefore, in this dimensional model, the acquisition number is a measure. the central table in the model is the lending table and is presented as a fact table. this table contains the acquisition number of the borrowed item and foreign keys from the dimension tables. all other tables in the model are dimension tables. the publication, publisher, year, acquisition, ucd_group, status, and language tables contain data from bibliographic records and the content of these tables have been already explained. the member, membershiptype, category, education, and gender tables represent the dimension tables containing information about library users. these data are only a subset of circulation data from transactional database. the location table describes departments within the library where items are borrowed. the date table represents the time dimension. the data in the date table are derived from the date of borrowing and the date of discharge of an item. business intelligence in the service of libraries |tešendić and krstićev 105 https://doi.org/10.6017/ital.v38i4.10599 figure 2. dimensional model describing library circulation data. model describing members’ data a dimensional model of the bisis data warehouse used for the analytical processing of members’ data is shown in figure 3. data from this model are used for generating statistical reports on library members, as well as for generating financial reports based on membership fees. examples of such reports are the number of members according to different criteria (such as department of registration, member category, type of membership, gender, or education level). also, this report group contains reports that include a financial balance (for example, a list of members with membership fees in a certain time period). the membership fee has the main role in generating these reports. all reports from this group are generated by counting or displaying members who have paid a membership fee or summarizing membership fees. therefore, in this dimensional model, membership fee is a measure. the main table in the model is the membership table and it presents a fact table. it contains the membership fee, which is the measure, and foreign keys from the dimension tables. information technology and libraries | december2019 106 all other tables in the model are dimension tables. tables member, membershiptype, category, education and gender represents the dimension tables that contain information about library members and the content of these tables was previously described. the table location describes departments within the library where user registration is performed. the table date represents the time dimension. data in the table date are based on the registration date and the date of the membership expiration. figure 3. dimensional model describing library members. 
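as a small, hypothetical illustration of how the members' model supports reports of this kind, the sketch below builds a toy membership fact table with its dimensions and cross-tabulates members and fees by category and gender for a chosen registration period. it again uses python with pandas; the table and column names and the sample data are invented for the example rather than taken from bisis.

```python
import pandas as pd

# dimension tables for the members' model (hypothetical contents)
category_dim = pd.DataFrame(
    {"category_id": [1, 2], "category": ["student", "retiree"]}
)
gender_dim = pd.DataFrame(
    {"gender_id": [1, 2], "gender": ["female", "male"]}
)
date_dim = pd.DataFrame(
    {"date_id": [1, 2, 3],
     "registration_date": pd.to_datetime(["2014-03-01", "2016-07-15", "2019-01-10"])}
)

# fact table: one row per paid membership; the membership fee is the measure
membership_fact = pd.DataFrame(
    {"member_id": [10, 11, 12, 13, 14],
     "category_id": [1, 1, 2, 2, 1],
     "gender_id": [1, 2, 1, 2, 2],
     "date_id": [1, 1, 2, 3, 2],
     "membership_fee": [300, 300, 150, 150, 300]}
)

# join facts to dimensions, restrict to registrations between 2014 and 2018,
# then cross-tabulate: number of members and total fees per category and gender
joined = (membership_fact
          .merge(category_dim, on="category_id")
          .merge(gender_dim, on="gender_id")
          .merge(date_dim, on="date_id"))
period = joined[joined["registration_date"].dt.year.between(2014, 2018)]

report = period.pivot_table(index="category", columns="gender",
                            values="membership_fee",
                            aggfunc=["count", "sum"], fill_value=0)
print(report)
```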
true value of a data warehouse in the previous sections, we presented models of data warehouse, but those models are unusable if they are not implemented and populated with data needed for business analysis. extracting, transforming, and loading (etl) processes are responsible for reshaping the relevant data from the source systems into useful data to be stored in the data warehouse. etl processes load data into a data warehouse, but that data warehouse is still only storage for those data. a real-time and interactive visualisation of those data will show the true benefits of data warehouse implementation in various organisations including libraries. to load as well as to analyze and visualize large volumes of data in data warehouses, various online analytical processing (olap) tools can be used.18 the usage of olap tools does not business intelligence in the service of libraries |tešendić and krstićev 107 https://doi.org/10.6017/ital.v38i4.10599 require a lot of programming knowledge in comparison to tools used for querying transactional databases. the interface of olap tools should provide a user with a comfortable working environment to perform analytical operations and to visualize query results without knowing programming techniques or structure of transactional database. there are various olap tools available on the market.19 when choosing an olap tool to be used in an organization, there are several important criteria to consider: the duration of query execution, user-oriented interface, the possibility of interactive reports, price of tool, automation of the etl process, etc.20pentaho bi system is one of the open-source olap tools which satisfies most of those criteria. among various features, pentaho supports creation of etl processes, data analysis, and reporting.21 implementation of etl processes can be a challenging task primarily because of the nature of the source systems. we used pentaho tool to transform data from bisis to the data warehouse, as well as to visualize data and generate statistical reports. etl processes modeling after creating a data-warehouse model, it is necessary to load data into the data warehouse. the first step in that process is to extract data from the data sources. those data may not be in accordance with the newly created data-warehouse model and appropriate transformations of data may be needed before loading. regarding the structure of the data sources, transformations can be implemented from scratch, or by using dedicated olap tools. both techniques are used in the development of our data warehouse. transformations that required data from bibliographic records were implemented from scratch because of complex data structure, while transformations that processed data from relational database are implemented using pentaho data integration (pdi) tool. pdi is a graphical tool that enables designing and testing etl processes without writing programming code. figures 4 and 5 show an example of transformations created and executed by that tool. those transformations have been applied to load members’ data from bisis relational database into the data warehouse. figure 4. transformations for loading members data. information technology and libraries | december2019 108 figure 5. the membershiptransformation process. an issue that may arise after an initial loading of a data warehouse relates to updating the data warehouse. 
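as a rough sketch of what such an etl step involves, the following python fragment extracts newly registered members from a hypothetical transactional table, transforms the raw attribute values into surrogate dimension keys, and loads the result into warehouse tables; it merely stands in for the pentaho data integration transformations shown in figures 4 and 5 and is not the actual bisis implementation, and all table and column names are invented. because the extraction is filtered by date, rerunning the same job each night also gives a simple form of incremental update.

```python
import sqlite3

# hypothetical transactional source and warehouse (in-memory only for this sketch)
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE member (
        member_no TEXT, gender TEXT, category TEXT,
        fee REAL, registration_date TEXT);
    INSERT INTO member VALUES
        ('M-1', 'female', 'student', 300, '2019-06-01'),
        ('M-2', 'male',   'retiree', 150, '2019-06-02');
""")

warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE gender_dim   (gender_id INTEGER PRIMARY KEY, gender TEXT UNIQUE);
    CREATE TABLE category_dim (category_id INTEGER PRIMARY KEY, category TEXT UNIQUE);
    CREATE TABLE membership_fact (
        member_no TEXT, gender_id INTEGER, category_id INTEGER,
        fee REAL, registration_date TEXT);
""")

def dimension_key(table, column, value):
    """Transform step: look up (or create) the surrogate key for a dimension value."""
    warehouse.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = warehouse.execute(
        f"SELECT rowid FROM {table} WHERE {column} = ?", (value,)).fetchone()
    return row[0]

def load_members(since):
    """Extract members registered after `since`, transform, and load into the warehouse.

    Rerunning this nightly with the previous load date acts as an incremental update."""
    rows = source.execute(
        "SELECT member_no, gender, category, fee, registration_date "
        "FROM member WHERE registration_date > ?", (since,))
    for member_no, gender, category, fee, reg_date in rows:
        warehouse.execute(
            "INSERT INTO membership_fact VALUES (?, ?, ?, ?, ?)",
            (member_no,
             dimension_key("gender_dim", "gender", gender),
             dimension_key("category_dim", "category", category),
             fee, reg_date))
    warehouse.commit()

load_members(since="2019-01-01")
print(warehouse.execute("SELECT COUNT(*) FROM membership_fact").fetchone()[0])
```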
in order to achieve a better performance of transactional databases, updates of the data warehouse should not be performed in real time. in the case of library management systems, those updates can be performed outside of working hours so data in the data warehouse will be up to date on a daily basis. an update algorithm can be defined as an etl process using olap tools or it can be implemented from scratch. data visualization the basic task of olap tools is to enable visualization of data stored in a data warehouse. the olap tools use multidimensional data representation, known as a cube, which allows a user to analyze data from different perspectives. olap cubes are built on dimensional models of a data warehouse and consist of dimensions and measures. dimensions form the cube structure and each cell of the cube holds a measure. measures are derived from the records in the fact table and dimensions are derived from the dimension tables. olap tools allow a user to select a part of the olap cube by setting an appropriate query and that part can be further analyzed by different dimensions. this process is performed by applying common operations on the cube which include slice and dice, drill down, roll up, and pivot.22 data that are results of operations on the cube can be visualized in the form of tables, charts, graphs, maps, etc. the main advantage of olap tools reflects is that end users can do their own analyses and reporting very efficiently. users can extract and view data from different points of view on demand. olap tools are valuable because they provide an easy way to analyze data using various graphical wizards. by analyzing data interactively, users are provided with feedback which can define the direction of further analysis. in order to visualize data from our data warehouse, we used the pentaho olap tool. we used it to create predefined reports identified during the analysis of user requirements as well as some interactive reports using operations on the olap cube. examples of generated reports are presented below in order to illustrate some features of the pentaho olap tool. an example of a report shown in figure 6 was obtained with a dice operation on the cube. the dice operation selects two or more dimensions from a given cube and provides a new sub-cube. in this particular example, we selected three dimensions: gender, member category, and registration date. business intelligence in the service of libraries |tešendić and krstićev 109 https://doi.org/10.6017/ital.v38i4.10599 figure 6. example of dice operation performed on the olap cube. information technology and libraries | december2019 110 figure 7. example of roll-up and drill-down operations performed on the olap cube. business intelligence in the service of libraries |tešendić and krstićev 111 https://doi.org/10.6017/ital.v38i4.10599 additionally, we analyzed only those data from 2014 to 2018. the result of this operation is presented in the form of nested pie charts. however, other forms of visualisation can be applied on the same data set very easily. in figure 7, a more complex report is presented. that report is obtained by performing combination of roll-up and drill-down operations. the roll-up operation performs aggregation on a cube reducing the dimensions. in our example, we aggregated the number of newly acquired publications for certain years ignoring all other dimension except the date dimension. 
a user can select a particular year, quarter, and month and analyze details of purchased publications in that period, such as title and author of the publication. this is an example of using drill-down operation on the cube. the result of that operation is presented as a table, as shown in figure 7. this report is interactive, because user can investigate data in more detail by performing other operations on the cube that are placed on the toolbar of the report. conclusion this article aims to illustrate an application of business intelligence in libraries, as reporting modules in library management systems are usually inadequate for a comprehensive business analysis. development of a data warehouse, which is the base of any business intelligence system, as well as usage of olap tools are presented. both user requirements regarding reporting in bisis and already-existing transactional databases are analyzed during the development of a datawarehouse model. based on that analysis, three data-warehouse models have been proposed. also, examples of reports generated by an olap tool are given. by building the data warehouse and using olap tools, users of bisis can perform business analysis in a more user-friendly manner than with other processes. users are not limited to predefined types of reports. librarians can easily generate customized reports tailored to the specific needs of the library. in this way, librarians work in a more comfortable environments, performing analytical operations interactively and visualizing query results without additional programming knowledge. the article presents the usage of pentaho olap tool, but the proposed data-warehouse model is independent of olap tools selection and any other tool can be integrated with the proposed data warehouse. references 1 ralph stair and george reynolds, fundamentals of information systems (cengage learning, 2017). 2 ramesh sharda, dursun delen, and efraim turban, business intelligence, analytics, and data science: a managerial perspective (pearson, 2016). 3 “bisis,” library management system bisis, accessed july 8, 2019, http://www.bisis.rs/korisnici/. 4 bojana dimić and dušan surla, “xml editor for unimarc and marc 21 cataloguing,” the electronic library 27, no. 3 (2009): 509-28, https://doi.org/10.1108/02640470910966934; bojana dimić surla,“eclipse editor for marc records,” information technology and libraries 31, no. 3 (2012): 65-75, https://doi.org/10.6017/ital.v31i3.2384; bojana dimić surla, “developing an eclipse editor for marc records using xtext,” software: practice and experience 43, no. 11 (2013): 1377-92, https://doi.org/10.1002/spe.2140. http://www.bisis.rs/korisnici/ https://doi.org/10.1108/02640470910966934 https://doi.org/10.6017/ital.v31i3.2384 https://doi.org/10.1002/spe.2140 information technology and libraries | december2019 112 5 branko milosavljević, danijela boberić, and dušan surla, “retrieval of bibliographic records using apache lucene,” the electronic library 28, no. 4 (2010): 525-39, https://doi.org/10.1108/02640471011065355. 6 danijela boberić and dušan surla,“ xml editor for search and retrieval of bibliographic records in the z39. 50 standard,” the electronic library 27, no. 3 (2009): 474-95; danijela boberić krstićev, “information retrieval using a middleware approach,” information technology and libraries 32, no. 
1 (2013): 54-69, https://doi.org/10.6017/ital.v32i1.1941; miroslav zarić, danijela boberić krstićev, and dušan surla, “multitarget/multiprotocol client application for search and retrieval of bibliographic records,” the electronic library 30, no. 3 (2012): 351-66, https://doi.org/10.1108/02640471211241636. 7 danijela tesendic and danijela boberic krsticev, “web service for connecting visually impaired people with libraries,” aslib journal of information management 67, no. 2 (2015): 230-43, https://doi.org/10.1108/ajim-11-2014-0149. 8 danijela boberić-krstićev and danijela tešendić,“ mixed approach in creating a university union catalogue,” the electronic library 33, no. 6 (2015): 970-89, https://doi.org/10.1108/el-022014-0026. 9 danijela tešendić, branko milosavljević, and dušan surla, “a library circulation system for city and special libraries,” the electronic library 27, no. 1 (2009): 162-86, https://doi.org/10.1108/02640470910934669; branko milosavljević and danijela tešendić, “software architecture of distributed client/server library circulation system,” the electronic library 28, no. 2 (2010): 286-99, https://doi.org/10.1108/02640471011033648; danijela tešendić, “data model for consortial circulation in libraries,” in proceedings of the fifth balkan conference in informatics, novi sad, serbia, september, 16-20, 2012. 10 danijela boberic and branko milosavljevic, “generating library material reports in software system bisis,” in proceedings of the 4th international conference on engineering technologies icet, 2009: 133-37. 11 william h. inmon, building the data warehouse (indianapolis: john wiley & sons, 2005); ralph kimball, the data warehouse toolkit: practical techniques for building dimensional data warehouses (ny: john willey & sons, 1996), 248, no. 4. 12 ralph kimball and joe caserta, the data warehouse etl toolkit: practical techniques for extracting, cleaning, conforming, and delivering data (indianapolis: john wiley& sons, 2004), 528. 13 oscar romero and alberto abelló, “a survey of multidimensional modeling methodologies,” international journal of data warehousing and mining (ijdwm) 5, no. 2 (2009): 1-23. 14 lorena siguenza guzman, victor saquicela, and dirk cattrysse,“design of an integrated decision support system for library holistic evaluation,”in proceedings of iatul conferences (2014):112. https://doi.org/10.1108/02640471011065355 https://doi.org/10.6017/ital.v32i1.1941 https://doi.org/10.1108/02640471211241636 https://doi.org/10.1108/ajim-11-2014-0149 https://doi.org/10.1108/el-02-2014-0026 https://doi.org/10.1108/el-02-2014-0026 https://doi.org/10.1108/02640470910934669 https://doi.org/10.1108/02640471011033648 business intelligence in the service of libraries |tešendić and krstićev 113 https://doi.org/10.6017/ital.v38i4.10599 15 yi-ting yang and jiann-cherng shieh, “data warehouse applications in libraries—the development of library management reports,” in advanced applied informatics (iiai-aai), 2016 5th iiai international congress on advanced applied informatics, 88-91. ieee, 2016, https://doi.org/10.1109/iiai-aai.2016.129. 16 tešendić, milosavljević, and surla, “a library circulation system,”162-86. 17 dimić and surla, “xml editor for unimarc,” 509-28. 18 paulraj ponniah, data warehousing fundamentals for it professionals (hoboken, nj: john wiley & sons, 2011). 19 “top 10 best analytical processing (olap) tools,” software testing help, https://www.softwaretestinghelp.com/best-olap-tools/. 
20 rick sherman, “how to evaluate and select the right bi tools,” https://searchbusinessanalytics.techtarget.com/buyersguide/a-buyers-guide-to-choosingthe-right-bi-analytics-tool. 21 doug moran, “pentaho community wiki,” https://wiki.pentaho.com/. 22 ponniah, “data warehousing,” 382-93. https://doi.org/10.1109/iiai-aai.2016.129 https://www.softwaretestinghelp.com/best-olap-tools/ https://searchbusinessanalytics.techtarget.com/buyersguide/a-buyers-guide-to-choosing-the-right-bi-analytics-tool https://searchbusinessanalytics.techtarget.com/buyersguide/a-buyers-guide-to-choosing-the-right-bi-analytics-tool https://wiki.pentaho.com/ abstract introduction reporting in bisis related work modeling the data warehouse analysis of data sources in bisis analysis of user requirements data warehouse models model describing library collection data model describing library circulation data model describing members’ data true value of a data warehouse etl processes modeling data visualization conclusion references 126 standards for library automation and isad's committee on technical standards for library automation (tesla) the 1'0le of isad's committee on technical standards for library automation is examined and discussed. a p1'dcedu1'e fo1' the reaction to and initiation of standards is described, with reference to relevant standards organizations. the development, implementation, and maintenance of standards might best be characterized as the complexity of simplification-complex insofar as a standard represents a universally applicable ideal which is usually the result of arduous negotiation and compromise; simplification since a standard, once recommended and followed in practice, forms a firm reference point for the achievement of specified objectives. thus, if a standard exists, it can be referenced or used immediately and variant wheels do not have to be invented. unfortunately, to use, reference, or advocate a standard requires an awareness of available standards or the process whereby standards evolve. it is at this point that standards again become complex-in fact, they become a maze, which perhaps can be characterized by questions such as: is there a standard already? who is responsible for it? where are copies of this standard available? and so on (a maze familiar, certainly, to all of us). n is precisely to address the mazelike aspect of the standards "game" that the committee on technical standards for library automation ( tesla) has been established. in short, tesla intends to act as a twoway clearinghouse, hopefully to bring user and supplier into a meaningful dialogue wherein the requirements of both might be satisfied. technical standards and data dependence within this context the emphasis by tesla shall be on technical standards for library automation (e.g., standards relating to electronic data processing devices and techniques). concurrently, however, there are instances where device and data become inseparably linked. for example, the standatds fot libraty automation 127 relationship between the physical dimensions of a machine-readable patron badge and the amount and, therefore, type of data which can be me~ chanically encoded in it; or the character set used by a terminal and the minimum processing potential and, thus, hardware which must be internal to the terminal to receive, transmit, and display that character set. 
because it would be foolish to ignore this relationship, tesla in its clearinghouse function will stress and foster the involvement of individuals or organizational units within the american library association wherever data-dependent technical standards are involved. ala-originated and maintained technical standards though certainly no mystery, there is little evidence that the direct cost and personal involvement for a published and practiced standard is popularly known. for example, it has been indicated by those within the standards business that an adopted standard might culminate an investment of over a million dollars and represent the expenditure of tens of man-years. the cost, for example, leading to and including the final publication by the american national standards institute (ansi) of the standard for bibliographic information interchange on magnetic tape (ansi z39.21971), more popularly known as marc, has not been published. it is suspected, however, that the cost of the marc standard was monumental. in short, and by way of this example, it can be safely assumed that neither the american library association nor isad nor tesla will become standards organizations in the strict sense of the word. in fact such a capability is not desirable, since organizations such as ansi, electronics industry association (eia), institute of electronic and electrical engineers (ieee), national microfilm association ( nma), etc., exist and are geared specifically to this activity. rather, the american library association and isad should, and must, participate actively in the standards processes available to them to insure a meaningful user-voice in the development of standards by those organizations. to provide for participation in the standards process at the membership level is precisely tesla' s role. thus, when placed in operation, such standards will reflect the library community's requirements, contributing to and fostering library automation rather than hindering it. at least one of the anticipated results would be the development of equipment addressing library needs directly, and so preclude the custom fabrication of specialty devices which while satisfying the needs of a few libraries-expensively, cannot satisfy libraries in generaleconomically. what is provided by tesla? tesla specifically has established a procedure whereby the membership of the american library association might either teact to proposals for standards regardless of origin, or initiate proposals for standards for membership reaction. the results of this procedure, whether reactive or initia128 journal of library automation vol. 7/2 june 1974 tive, would be communicated to the membership in terms of the status and position taken for each proposal, and to the originator and to ala's official representative in full detail for subsequent application. the tesla procedure the procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ala) standards proposals to provide recommendations to ala's representatives on existing, recognized standards organizations. to enter the procedure for an initiative standards proposal the member must complete an "initiative standards proposal" using the outline which follows: initiative standard proposal outline the following outline is designed to facilitate review by both the committee and the membership of initiative standards requirements and to expedite the handling of initiative standards proposals through the procedure. 
since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable). note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced on 8%" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number. i. title of initiative standard proposal (title). ii. initiator information (forward). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process. iv. purpose. state the purpose of standard proposal (scope and qualifications). v. description. briefly describe the standard proposal (specification of the standard). vi. relationship of other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks). vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification). standards for librm·y automation 129 viii. specifications. specify the standard proposal using record layouts, mechanical . drawings, and such related documentation aids as required in addition to text exposition where applicable (specification of the standard). please note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review. in addition, the outline permits the presentation of background and descriptive information which, while important during any evaluation, is a prerequisite to the development of a standard. the reactor ballot (figure 1) is to be used by members to voice their recommendations relative to initiative standards proposals. the reactor ballot permits both "for" and "againsf' votes to be explained permitting the capture of additional information which is necessary to document and communicate formal standards proposals to standards organizations outside the american library association. as you, the members, use the outline to present your standards protesla reactor ballot reactor information name __________________ __ title organization address city___ state ___ zip __ telephone ---------identification number for standard requirement -----------for ------------against reason for position: (use additional pages if required) fig. 1. tesla reactor ballot posals, tesla will publish them in ]ola-tc and solicit membership reaction via the reactor ballot. 
throughout the process tesla will insure that standards proposals are drawn to the attention of the applicable american library association division or committee. thus, internal review usually will proceed concurrent with membership review. from the review and the reactor ballot tesla will prepare a majority recommendation and a minority report on each standards proposal. the majority recom130 journal of library automation vol. 7/2 june 1974 receipt screen division rej/acp1 publish tally representative title/i.d. number date date date date date date date target fig. 2. tesla standards scoreboard mendation and minority report so developed will then be transmitted to the originator, and to the official american library association representative on the appropriate standards organization where it should prove a standards for library automation 131 source of guidance as official votes are cast. in addition, the status of each standards proposal will be reported by teslain jola-tc via the standards scoreboard (figure 2). the committee ( tesla) itself will be nonpartisan with regard to the proposals handled by it. however, the committee does reserve the right to reject proposals which after review are not found to relate to library automation. tesla's composition tesla is comprised of representatives both from the library community and library suppliers to insure a mix of both users and producers for its review of standards proposals. in addition, rotating membership on tesla will insure a continuing movement of voices from different segments of the library and library supplier communities to shortstop the pressing of vested interests. at this time, the members of tesla and the term for each are: ms. madeline henderson chairperson, task force on automation of library operations and federal library committee u.s. department of commerce/nos washington, dc 20234 term ends: 1974 mr. arthur brody chairman of the board bro-dart industries 1609 memorial ave. williamsport, pa 17701 term ends: 1975 dr. edmund a.' bowles data processing division international business machines corporation 10401 fernwood rd. bethesda, md 20034 term ends: 1974 mr. anthony w. miele assistant director technical services illinois state library centennial building springfield, il 62706 term ends: 1975 standards library mr. jay l. cunningham director, university-wide library automation program university of california, south hall annex berkeley, ca 94720 term ends: 1976 mr. richard e. uttman p.o. box 200 princeton, nj 08540 term ends: 1976 mr. leonard l. johnson director of media services greensboro public schools drawer v greensboro, nc 27402 term ends: 1975 mr. john c. kountz (chairman) associate for library automation office of the chancellor the california state university and colleges 5670 wilshire blvd., suite 900 los angeles, ca 90036 term ends: 1976 in addition to acting as a clearinghouse for standards for isad, and the maintenance of the standard proposal and reactor ballot procedure, tesla intends to urge the establishment of an ala collection of stan132 journal of library automation vol. 7/2 june 1974 dards applicable to libraries to handle requests for information from the library community. thus, while currently each member is left to "do it himself," there appear to be definite economies in the centralization of such a collection and the periodic publication of indices to relevant standards. 
sources of information relating to standards to provide a source of guidance at this time for the types of available standards, to list the many existing standards, and to index the originating standards organizations would consume several issues of jola. therefore, in the following are very brief recapitulations of the more relevant organizations impacting library automation standards. the list is very incomplete as might be expected. for those interested in a comprehensive review of standards, global engineering, 3950 campus drive, newport beach, ca 92660, maintains and annually publishes their directory of engineering document somces .. this directory, now in its second edition, lists over 2,000 standards organizations and the prefixes used by those organizations in publishing their standards, to permit global engineering's customers to specify standards for purchase. ( global engineering's primary function is the sale of original copies of standards and specifications.) american national standards institute, inc. (ansi)-the american national standards institute, inc. ( 1430 broadway, new york; ny 10019) does not write standards. rather, ansi has established the procedures and guidelines to be followed in the development of standards that will be labeled american national standards. the actual work is done by ansi members and other interested groups and individuals, using the ansi procedures in the development of standards. only after these groups have demonstrated to ansi's satisfaction that the proposed standard has been developed in accordance with the procedure established by ansi will it be approved and published as an american national standard. in addition, ansi publishes materials relating to standards, of which the ansi reporter, a journal dedicated to standards, probably represents the single best source for information relating to current standards issues. however, the scope of ansi is very broad. thus for library and library automation activities specific committees of ansi, rather than ansi itself, are relevant; specifically, ansi committees: ph5 (microfilm). see national microfilm association below. x3 ( computers and information processing); x4 (business machines and supplies). both of these subcommittees are sponsored by the computer and business equipment manufacturers' association ( cbema), which is also their secretariat. cbema ( 1828 l st. nw, washington, dc 20036) periodically provides indices to the published standards of x3. an insight into the breadth of x3's activities can be implied from figure 3, ansi x3 standards committee organization. 
figure 3. ansi x3 standards committee organization. x3 currently has over fifty member organizations. the ala representative on x3 is mr. james rizzolo of the new york public library. an excellent overview of x4's scope and activities was published in the secretary (nov. 1973) under the title "what's being done about office equipment standards." from the library viewpoint x4's activities in credit cards, typewriter keyboards, and forms are of interest. at this writing x4 has nine user, fifteen producer, and nine general interest members. the ala is not currently represented on x4. z39 (library work, documentation and related publishing practices) is sponsored by the council of national library associations. with thirty-six subcommittees (sc), z39 covers library-related activities from machine input records (sc/2) through standard order forms (sc/36). it was through z39 that marc became the american national standard (z39.2-1971). z39 publishes a quarterly entitled news about z39. z39 is located at the school of library science, university of north carolina, chapel hill, nc 27514. fifteen standards have been published by z39. the ala representative to z39 is fred blum of eastern michigan university, ypsilanti, whose excellent summary of z39 appears in the winter 1974 issue of library resources & technical services. national microfilm association (nma)-the national microfilm association has an organization of standards committees and a standards board as shown in figure 4. information relating to their standards is published from time to time in the journal of micrographics. a recent article, entitled "standards: nma standards committee scope of work" (vol. 7, no. 1 [sept. 1973]), briefly describes the subcommittees internal to nma's standards organization and the scope of each. of particular interest to libraries is the sponsorship by nma of ansi-ph5. micrographic standards are listed in an nma publication (rr1-1974 resource report). copies of this resource report may be obtained by contacting the nma at suite 1101, 8728 colesville rd., silver spring, md 20910. figure 4. nma standards organization.
international organization for standardization (iso)-this organization is truly international with representatives from thirty-five nations. the secretariat of iso is ansi (see above). while iso parallels ansi in its coverage, it differs organizationally. thus, the committees/subcommittees of ansi have in large measure their equivalent technical committees, subcommittees, and working groups in the iso. standards developed by the iso and published by them are reported regularly in the ansi reporter (referred to above). most recently, the january 11, 1974 ansi reporter contained an article outlining iso publications and describing five iso titles. marc, by the way, is also iso standard 2709. the iso technical committees (tc) of immediate interest to library automation are tc 37 (terminology), tc 46 (documentation), tc 95 (office machines), and tc 97 (computers and associated information processing systems, peripheral equipment and devices, and media related thereto). as an indication of the technical areas covered by tc 97, its organization is shown in figure 5. electronic industries association (eia)-the electronic industries association maintains a broad variety of standards for hardware and related peripheral equipment. such areas as cathode ray tube (crt) terminals, the luminescence of cathode ray tubes themselves, television transmission, and data communications are dealt with by the eia standards. an excellent source of eia standards is the publication produced by the eia entitled index of eia and jedec standards and engineering publications (1973 revision and no. 2). copies are available through the electronic industries association, engineering department, 2001 i st. nw, washington, dc 20006. the ala is not a member of the eia, by definition. the institute of electronic and electrical engineers (ieee)-the institute of electronic and electrical engineers, inc. is a professional organization which, in addition to its professional activities, maintains standards. many of these standards relate to library automation in such areas as keyboards for terminals and transmission types for data communications. while each monthly issue of the ieee publication spectrum contains annotated lists of new standards, a full index to the ieee standards is available by contacting the ieee headquarters, 345 e. 47 st., new york, ny 10017. national bureau of standards (nbs)-the national bureau of standards, u.s. department of commerce, has the responsibility within the federal government for monitoring and coordinating the development of information processing standards and publishing approved data standards for data elements and codes in data systems. thus, the national bureau of standards works closely with federal departments and agencies, the american national standards institute (ansi), and the international organization for standardization (iso). of specific interest to library automation are the federal information processing standards (fips) and the fips index published by nbs.
the annual fips index (fips pub 12-1) is a veritable gold mine of information relating to ansi, iso, federal government participation and representation in the standards process, and the role of nbs itself. fips 12-1 is available from the superintendent of documents, u.s. g.p.o., washington, dc 20402 (sd catalog no. c 13:52:12-1). while the material above should provide a brief overview of the standards arena in which tesla will function and some insight into the scope of standards activities, it is not to be construed as a definitive compilation of standards organizations. as indicated earlier, over 2,000 such organizations are known to be active currently. figure 5. iso/tc 97 organization chart. finally, an invitation. during the formative period of tesla a list of potential standards areas for library automation was developed. potential technical standards areas: 1. codes for libraries and library networks, including network hierarchy structures. 2. documentation for systems design, development, implementation, operation, and post-implementation review. 3. minimum display requirements for library crts, keyboards for terminals, and machine-readable character or code set to be used as label printed in book. 4. patron or user badge physical dimension(s) and minimum data elements. 5. book catalog layout (physical and minimum data elements): a. off-line print b. photocomposed c. microform 6. communication formats for inventory control (absorptive of interlibrary loan and local circulation). 7. data element dictionary content, format, and minimum vocabulary, and inventory identification minimum content. 8. inventory labels or identifiers (punched cards, labels, badges, or ...) physical dimensions and minimum data elements. 9. model/minimum specifications relating to hardware, software, and services procurement for library applications. 10. communication formats for library material procurement (absorptive of order, bid, invoice, and related follow-up). you are invited to review this list and voice your opinion of any or all areas indicated by means of the reactor ballot in jola-tc in this issue. or, if you have a requirement for a standard not included in this list, use the initiative standard proposal outline to collect and present your thoughts. henceforth, future issues of jola-tc will contain a reactor ballot and the scoreboard. the ball is in your court! send ballots and/or initiative standard proposals to: john kountz, chairman, isad-tesla, 5670 wilshire blvd., suite 900, los angeles, ca 90036.
statistical behavior of search keys abraham bookstein: graduate library school, university of chicago editor's note: the editor and author are aware that varying approaches may be taken to the problem presented here. readers are invited to respond in the form of a paper or a technical communication. in discussion about search keys, concern has been expressed as to how the number of items retrieved by a single value relates to collection size. this paper creates a statistical model that attempts to give some insight into this behavior. it is concluded that, in general, the observed behavior can be explained as being intrinsically statistical in nature rather than being a property of specific search keys. an attempt is made to relate this model to other research, and to indicate how this model may be made to yield more accurate predictions. introduction various experiments suggest that it may be possible to develop, as an access route into a file of bibliographic records, a search key whose values can be easily derived from such bibliographic data as is likely to be available to its users.1 (by the phrase "search key" we mean a key similar to the 3-3 or 3-1-1-1 keys used at ohio college library center and other places, which is made up by concatenating truncations of bibliographic data elements.) some concern, however, has been expressed regarding the nonuniqueness of these keys: if the number of items retrieved were often to exceed an amount easily handled by a user of the system, the value of this access route would be considerably diminished. accordingly, an important measure of search key performance is the frequency with which a large number of records is retrieved as the search key is applied to the file. this measure is related, for example, to how many memory accesses will be required, on the average, to retrieve all records satisfying a request; it is also an important consideration in deciding which display device should be installed in a system.2,3 after evaluating such a measure for a search key on a particular file, it is reasonable to ask how that measure will change over time, as the file increases in size. the nature of this variation has already been of concern to researchers in the field. kilgour, on the basis of a number of experiments carried out at oclc, notes that "there remains a major problem to be solved and a major question to be answered. the problem is constituted of those replies that contain a number of entries exceeding the optimal maximum. . . . the major question to be answered is how truncated search keys will perform on files ten and a hundred times the size of that used in this experiment."4 he elsewhere observes that "as a file of bibliographic entries increases, the maximum number of entries per reply does not increase in a one-to-one ratio. . . ."5 this paper presents a mathematical model that addresses itself to the problem defined by kilgour and attempts to explain his observation; it is suggested that the gross features of the behavior are statistical in nature and not properties of specific search keys. a view of collection growth the cause of the phenomenon observed by kilgour can best be understood by first considering a simple model which, while not itself valid, does cast light on the nature of the behavior. this first model neglects the effect of randomness both in the growth of the collection and in the arrival of requests.
it supposes our search key has the following property: regardless of collection size, the fraction of the collection retrieved by a particular search key value, $v_i$, is exactly given by a constant $f_i$; thus, if the file holds $N$ records, a request for $v_i$ will retrieve $n_i = f_i N$ records. this model similarly assumes that among any sizeable number of requests, the fraction of the time any particular search key value will occur is fixed; thus, for any subset of search key values, it is possible to determine how often members of that subset will occur among a set of requests. in particular, for any integer $n$, we can form the set of all the search key values that will retrieve less than $n$ items. we can then determine how often search key values from that set are requested. if, for example, requests for these values occur 99 percent of the time, then we can assert that 99 percent of the time less than $n$ items will be retrieved. if the file contains $N$ items, then these $n$ items constitute the fraction $f = n/N$ of the file. should the collection size increase to $\lambda N$, then the model predicts that 99 percent of the time less than $f \cdot (\lambda N) = \lambda n$ items would be retrieved. in other words, we have precisely the behavior kilgour observes does not occur. this argument shows that a simple deterministic model does not conform to experience with search keys. the model breaks down in two ways, which accounts for the discrepancy between the results derived from it and kilgour's observations: 1. in any actual library, the fraction of the time that a particular request will appear within a sequence of requests will vary; and 2. in comparing two different samples having the same size, the number of items having a given search key value will vary. the first of these factors is easily dealt with and its analysis will suggest the number of requests to use in a test of search key behavior in a given library. for a particular collection, let $s$ denote the set of search key values for which, say, twenty or more items are retrieved. we would like to find the fraction of the time that a request in $s$ occurs in the long run; suppose this value is in fact $q$. then among $M$ requests, the probability that $m$ members of $s$ occur is given by the binomial distribution $f_b(m \mid q, M)$. this distribution has a mean of $qM$ and a variance of $q(1-q)M$. should we desire to estimate the actual fraction of the time that twenty or more items will be retrieved, we can take a sample of $M$ requests and compute $\hat{q}$, the fraction of the requests with search key values in $s$; if we do so, we will usually get a value for $\hat{q}$ between $q - \frac{2}{\sqrt{M}}\sqrt{q(1-q)}$ and $q + \frac{2}{\sqrt{M}}\sqrt{q(1-q)}$. if, for example, $q = .01$ and $M = 10{,}000$, we would tend to find $\hat{q}$ in the interval $.01 \pm .002$. thus the effect of randomness in the arrival of requests can easily be controlled by increasing the number of requests considered; furthermore, the size of error can be predicted. we next introduce the second factor; its analysis will suggest how the behavior of search keys will change as the collection grows in size. for this purpose we adopt a model of collection growth which assumes that as items arrive, they are randomly distributed among the search key values in accordance with some probability distribution. 
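the two-standard-deviation bound just quoted is easy to evaluate numerically; the following minimal python sketch reproduces the $.01 \pm .002$ example and also inverts the relation to ask how many requests would be needed for a target precision (the inversion is a convenience added here, not a formula from the paper).

```python
import math

def estimate_interval(q, m):
    # interval q +/- (2/sqrt(m)) * sqrt(q*(1-q)) within which the observed
    # fraction of "large" requests will usually fall, per the binomial model
    half_width = 2.0 / math.sqrt(m) * math.sqrt(q * (1.0 - q))
    return q - half_width, q + half_width

def requests_needed(q, target_half_width):
    # smallest sample of requests whose two-sigma half-width is at most the target
    return math.ceil((2.0 / target_half_width) ** 2 * q * (1.0 - q))

print(estimate_interval(0.01, 10_000))   # about (0.008, 0.012), i.e. .01 +/- .002
print(requests_needed(0.01, 0.001))      # 39600 requests for a +/- .001 estimate
```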
if we suppose that the probability of an item being assigned a specified search key value, $v_i$, is $p_i$, then in a collection of $N$ items we may conclude that the probability of $n$ items having that value is given by the binomial distribution: $f_b(n \mid p_i, N) = \binom{N}{n} p_i^{\,n} (1 - p_i)^{N-n}$. if $g'(v_i)$ is the probability that the value $v_i$ is selected from the request population, then the probability that the "next" request retrieves $n$ items is given by $\sum_i g'(v_i)\, f_b(n \mid p_i, N) = \int g(p)\, f_b(n \mid p, N)\, dp$, where $g(p)\,dp \stackrel{\text{def}}{=} \sum_{p \le p_i \le p + dp} g'(v_i)$ is the probability that a request arrives with value $p_i$ in the interval $(p, p + dp)$, and will be treated as a continuous function.** since the expectation of the binomial distribution is given by $pN$, we have $N \int p\, g(p)\, dp \stackrel{\text{def}}{=} N\bar{p}$ as the expected number of items retrieved by a random request; since this is proportional to $N$, doubling the size of the collection will, on the average, double the amount of material retrieved. similarly, the variance, $\sigma^2$, is given by $N^2(\overline{p^2} - \bar{p}^{\,2}) + N \int p(1-p)\, g(p)\, dp$. should $\overline{p^2} - \bar{p}^{\,2}$, the variance of $p$, be small, this reduces to $N \int p(1-p)\, g(p)\, dp \stackrel{\text{def}}{=} \bar{\sigma}^2 N$, so that approximately 95 percent of the time the amount of material retrieved would be less than $N\bar{p} + 2\sqrt{N}\,\bar{\sigma} = N\left(\bar{p} + \frac{2\bar{\sigma}}{\sqrt{N}}\right)$. (** this result would more precisely be expressed as $\int f_b(n \mid p, N)\, dG(p)$, which has the form of a stieltjes integral. the expression used in the text is simpler and reasonably valid because of the vast number of values the search key can take.) it is the factor $2\bar{\sigma}/\sqrt{N}$, and its dependence on $N$, that may account for kilgour's nonlinearity, and not any property intrinsic in the nature of any type of search key. thus, to the extent that this model reflects what is really happening, the 95 percent point increases roughly proportionately with file size; the "constant" of proportionality, however, is the sum of two terms: the first is a true constant, and the second is a term that approaches zero as the file gets larger. in particular, this model suggests that we will never reach a leveling off point: as the file increases in size, the number of items retrieved will also increase, and the pattern of increase will become increasingly linear. up to this point this discussion has been qualitative in nature, being based upon general statistical considerations and making use of the normal approximation to some unknown distribution; its broad conclusions are, however, consistent with the findings of earlier workers and can explain certain unanticipated properties of search keys. to proceed further it will be necessary to restrict the form of the function $g(p)$; this will be attempted in the following section of this paper. relationship of model to earlier research. interest in access methods that are appropriate for files of bibliographic data has generated a considerable amount of empirical research on search key behavior. of necessity, this pioneering work has been of a descriptive nature, resulting in data showing search key behavior in specific environments. while these efforts have lent a good deal of insight into the nature of search keys, the basic weakness of such research lies in the difficulty of extending these findings to other situations. one purpose of a mathematical model such as the one being developed here is to provide this increased generality by representing in a concise and easily manipulated form the results of previous research. 
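the mixed (compound binomial) model above is easy to simulate, and simulation shows the qualitative behavior described: the 95 percent point grows sub-linearly for small files and becomes increasingly proportional to file size as the file grows. the following minimal python sketch assumes a beta form for the request distribution $g(p)$, with parameter values chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def percentile_of_retrievals(n_items, a, b, n_requests=200_000, pct=95):
    # each simulated request draws a retrieval probability p from an assumed
    # beta request distribution g(p), then retrieves binomial(n_items, p) records
    p = rng.beta(a, b, size=n_requests)
    return np.percentile(rng.binomial(n_items, p), pct)

a, b = 2.0, 2000.0                      # illustrative shape of g(p), not fitted data
for n_items in (1_000, 10_000, 100_000, 1_000_000):
    q95 = percentile_of_retrievals(n_items, a, b)
    # the per-record rate q95/n_items falls and then levels off as the file
    # grows, so growth in the 95 percent point becomes increasingly linear
    print(n_items, int(q95), round(q95 / n_items, 5))
```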
it is accordingly of interest to indicate the relationship between previous work on search keys and our model. research on search key performance has been of two kinds. the first kind seeks to answer the question: for any number, $n$, how many search key values retrieve $n$ items? the answer to this question depends only on the search key and the collection; it is independent of the pattern of request arrivals. the second kind of research involves the actual arrival of requests; it tries to answer the question: for any number $n$, how frequently will requests resulting in the retrieval of $n$ items occur? to discuss this research in terms of our model requires a closer examination of the function $g(p)$ previously defined. we recall that $g(p)\,dp \stackrel{\text{def}}{=} \sum_{p \le p_i \le p + dp} g'(v_i)$, with $dp$ being a small number. thus $g(p)$ is determined by two factors: a. the number of search key values in the interval $(p, p + dp)$. let us denote this value by $f(p)\,dp$, so $f(p)$ is the density of search keys at $p$. we make use here of the fact that although the number of possible search key values is finite, the number is very large, so their distribution can be thought of as continuous. b. the average probability of search keys, with values $p_i$ near $p$, being requested. we shall refer to this quantity as $g''(p)$. by combining these factors we have $g(p) = g''(p)\,f(p)$. in terms of this discussion, the first type of research described above is in fact estimating $f(p)$: if there are $s$ search key values that retrieve $n$ items from a collection of $N$ items, then $s$ is an estimate of $\frac{1}{N} f\!\left(\frac{n}{N}\right)$; this relation uses $n = pN$ and $dp = \frac{n+1}{N} - \frac{n}{N} = \frac{1}{N}$. the second kind of research directly estimates $g(p)$. guthrie, in a recent paper, provides a bridge between the two types of research by discussing his findings in terms of two models.6 one of his models, which asserts that each search key value has an equal chance of being requested, is equivalent to the assumption that $g''(p) = 1$, and $g(p) = f(p)$. guthrie finds that this is not an adequate representation of his data. guthrie's second model asserts that each item has an equal chance of being requested. in our terms this becomes $g''(p) \propto p$, and $g(p) \propto p\,f(p)$. this model, while an improvement over the first, still disagrees with the data. furthermore, these models do not estimate $f(p)$; even if guthrie's model were correct, we would not know the probability that $n$ items would be retrieved until we were told how many search key values contained $n$ items. in the next section we will try to remedy this situation by means of a two-parameter representation of $g(p)$. a representation of $f(p)$. to get a more detailed account of search key behavior by experiment is difficult since the two aspects of randomness already discussed are confounded; the experimenter only sees the combined effect. we will, however, try to estimate the distribution $g(p)$ by a distribution of the form $\frac{(\alpha + \beta + 1)!}{\alpha!\,\beta!}\, p^{\alpha} (1-p)^{\beta}$. we believe that such an attempt is reasonable on three grounds: a. it is not possible to find $g(p)$ exactly, and moreover, it is not clear that this would be desirable. we are interested in a reasonable approximation that is satisfactory for decision-making purposes; b. the above distribution assumes a wide variety of shapes as $\alpha$ and $\beta$ vary; it seems likely that values of $\alpha$ and $\beta$ can be found for which this distribution is close enough to $g(p)$; and c. 
this distribution is mathematically tractable. if we proceed using the above approximation for $g(p)$, we find: (i) the probability, $p(n)$, of $n$ items being retrieved is given by 1. $p(n) = \binom{N}{n} \frac{(\alpha+\beta+1)!}{\alpha!\,\beta!} \cdot \frac{(\alpha+n)!\,(N-n+\beta)!}{(\alpha+\beta+N+1)!}$; (ii) the expected number of items retrieved, $E$, is given by 2. $E = N\,\frac{\alpha+1}{\alpha+\beta+2}$; and (iii) the variance, $V$, of the number of items retrieved is given by 3. $V = N\,\frac{\alpha+1}{\alpha+\beta+2}\cdot\frac{\beta+1}{\alpha+\beta+3}\left(1 + \frac{N}{\alpha+\beta+2}\right)$. if the experiment is performed on a small sample, the expectation and variance can be computed and the values of $\alpha$ and $\beta$ estimated from the relations 4. $\beta = (\alpha+1)\,\frac{N-E}{E} - 1$, and 5. $\alpha = \frac{E\left(1-\frac{E}{N}\right) - \frac{V}{N}}{\frac{V}{E} - \left(1-\frac{E}{N}\right)} - 1$. usually $\frac{E}{N}$ will be much smaller than one; in this case we may use the approximations: 4'. $\beta = (\alpha+1)\,\frac{N}{E}$, and 5'. $\alpha = \frac{E}{\frac{V}{E} - 1} - 1$. once $\alpha$ and $\beta$ have been evaluated, we can compute the probabilities $p(n)$ for files of arbitrary size, and with these values we can make assertions regarding the probability of, say, more than 30 items being retrieved. a relation that can be derived from formula 1 and may be of use when comparing this model with experiment is: $\frac{p(n)}{p(n+1)} = \frac{\beta + N - n}{1 + \alpha + n} \cdot \frac{n+1}{N-n}$. the probability of zero retrievals is likely to be an extraordinary point in the distributions $g(p)$ and $p(n)$ since it is influenced by the knowledge that a user may have of the collection; this effect is likely to be encountered in a sampling process in which the requests have to be generated artificially. in such cases it would be advisable to treat $p(0)$ as an empirically derived parameter, $\theta$, and use the modified formula 6. $p'(n) = \theta$ if $n = 0$, and $p'(n) = (1-\theta)\,\frac{p(n)}{1 - p(0)}$ if $n \neq 0$. the value of $\theta$ can be estimated by the fraction of requests retrieving zero items; for sampling techniques using only productive requests, $\theta$ will be zero. $\alpha$ and $\beta$ can be calculated as before from the mean and variance of the sample. conclusion. the above discussion is intended as an attempt to provide some theoretical understanding of the puzzling behavior discovered in the use of search keys and also to provide some guide for those experimenting with samples of such files. we do, however, urge caution for the latter uses. an analysis similar to the above can be useful under several different circumstances, such as: determining the future behavior expected of a search key in a single library as the collection grows; determining the behavior for one library based upon experiments conducted on a different but similar library; and extrapolating from the performance of a search key in a sample of the collection to its performance in the full collection. if one wishes to compare two different libraries, one can note that as far as search key values are concerned, a particular library's collection can be thought of as a random sample of the larger population from which it selects its material, and accordingly the formula for $p(n)$ should be valid. in this case, if two different collections are drawn from the same population, the $g(p)$ refers to this population and the libraries are distinguished by the parameter $N$; when we are considering samples from a single library, then $N$ is the sample size and $g(p)$ refers to the library itself. no theoretical basis exists at present for estimating to what extent the populations being considered depend upon the type of library, if any, so this problem must be dealt with empirically. 
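as a numerical illustration of how formulas 1, 4', and 5' above might be used in practice, here is a minimal python sketch. the sample mean and variance fed to it are made-up numbers, not data from the paper, and the parameter estimates use the small-$E/N$ approximations 4' and 5'.

```python
import math

def estimate_parameters(mean, variance, sample_size):
    # approximations 4' and 5': alpha from the sample mean and variance,
    # then beta from alpha and the sample size
    alpha = mean / (variance / mean - 1.0) - 1.0
    beta = (alpha + 1.0) * sample_size / mean
    return alpha, beta

def retrieval_probabilities(alpha, beta, n_items, max_n):
    # formula 1, evaluated with log-gamma functions so that large files
    # do not overflow; returns p(0), p(1), ..., p(max_n)
    def log_p(n):
        return (math.lgamma(n_items + 1) - math.lgamma(n + 1) - math.lgamma(n_items - n + 1)
                + math.lgamma(alpha + beta + 2) - math.lgamma(alpha + 1) - math.lgamma(beta + 1)
                + math.lgamma(alpha + n + 1) + math.lgamma(n_items - n + beta + 1)
                - math.lgamma(alpha + beta + n_items + 2))
    return [math.exp(log_p(n)) for n in range(max_n + 1)]

# suppose a 10,000-record sample gave a mean of 2.1 and a variance of 4.8
# items retrieved per request (illustrative numbers only)
alpha, beta = estimate_parameters(2.1, 4.8, 10_000)
probs = retrieval_probabilities(alpha, beta, n_items=100_000, max_n=200)
print(round(alpha, 2), round(beta), round(sum(probs[31:]), 3))  # chance of more than 30 items
```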
we have assumed here that these populations are similar with regard to search key values. should these populations in fact vary, it is possible that they can be broken down, e.g., by language, into subpopulations that are stable and for each of which the analysis is valid. acknowledgments. this work was made possible by clr/neh grant no. e0-262-70-4658. i would like to express my gratitude to members of the university of chicago systems development office for their many comments and suggestions on this work. references 1. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l. landgraf, "title-only entries retrieved by use of truncated search key," journal of library automation 4:207-10 (dec. 1971). 2. a. bookstein, "double hashing," journal of the american society for information science 23:402-25 (nov.-dec. 1972). 3. a. bookstein, "hash coding with a non-unique search key," to be published in the journal of the american society for information science. 4. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l. landgraf, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys." preprint. 5. kilgour, long, leiderman, and landgraf, "title-only entries," p.209-10. 6. gerry p. guthrie and steven d. slifko, "analysis of search key retrieval on a large bibliographic file," journal of library automation 5:96-100 (june 1972). letter from the editor (september 2019). kenneth j. varnum. information technology and libraries | september 2019. https://doi.org/10.6017/ital.v38i3.11631 editorial board changes. thanks to the dozens of lita members who applied to join the board this spring. the large number of interested volunteers made the selection process challenging. i'm pleased to welcome six new members to the ital editorial board for two-year terms (2019-2021): • lori ayre (independent technology consultant) • jon goddard (north shore public library) • soo-yeon hwang (sam houston state university) • holli kubly (syracuse university) • brady lund (emporia state university) • paul swanson (minitex) in this issue. welcome to lita's new president, emily morton-owens. in her inaugural president's message, "sustaining lita," morton-owens discusses the many ways lita strives to provide a sustainable organization for its members. we also have the next edition of our "public libraries leading the way" column. this quarter's essay is by thomas lamanna, "on educating patrons on privacy and maximizing library resources." joining those essays are six excellent peer-reviewed articles: • "library-authored web content and the need for content strategy," by courtney mcdonald and heidi burkhardt • "use of language-learning apps as a tool for foreign language acquisition by academic libraries employees," by kathia ibacache • "is creative commons a panacea for managing digital humanities intellectual property rights?," by yi ding • "am i on the library website?," by suzanna conrad and christy stevens • "assessing the effectiveness of open access finding tools," by teresa auch schultz, elena azadbakht, jonathan bull, rosalind bucy, and jeremy floyd • "creating and deploying usb port covers at hudson county community college," by lotta sanchez and john delooper call for pllw contributions. if you work at a public library, you're invited to submit a proposal for a column in our "public libraries leading the way" series for 2020. 
our series has gotten off to a strong start with essays by thomas finley, jeffrey davis, and thomas lamanna. if you would like to add your voice, please submit a proposal through this google form. kenneth j. varnum, editor varnum@umich.edu september 2019. book reviews. basic fortran iv programming, by donald h. ford. homewood, illinois: richard d. irwin, inc., 1971. 254 pp. $7.95. fortran texts are now quite plentiful, so the main question in the reviewer's mind is: what does this book have to offer that no other book has? regrettably the answer must be nothing. there are many other good fortran books available. this has very little to distinguish it. that is not to say that it is not a good book. the quality of the book is good, the text is very readable, and there has been very good attention to the examples and proofreading. the book is suitable for an introductory course, or for self-study. it does not go completely into all the features of the language, as these are usually best left to the specific manuals relating to the machines available. the book does bring the student to a level where he will be able to use those manuals and the level where he will need to use those manuals. the book does come to the level necessary for the person who writes his programs with professional assistance. the author has chosen ansi basic fortran iv to be discussed in the book. in particular he relates this to the ibm/360 and 370 computers. this is a common language and is available on most machines with only minor modifications. this was a good choice for the level of book he intended to write, since he didn't want to go into the advanced features of the language. the author goes quickly to the heart of the matter in fortran programming, so that the reader can start using the computer right away. the basic material is well covered and gives a good introduction to the more advanced features which are available on most machines. the examples are well chosen so that they do not require any specialized knowledge; therefore the emphasis can be put on the programming aspects of the examples. he also has very good end-of-chapter problems, ranging in difficulty from straight repetition of text material to programming problems which will require a considerable amount of individual work. he has a good discussion of mixed mode arithmetic, one of the more difficult topics of fortran to explain. he also has a good discussion of input/output operations, and an explanation of formatting which is very good. this again is a difficult area of the language and has been well explained. 
discussing each of the statement types in fortran, he begins by giving the general form of the statement in a standardized way, which is very good for introductory purposes and for review and reference. the index in the book doesn't single these out, so somebody who wanted to use the book as a reference should make a self-index of these particular areas of the book where the general forms and statements are given. this is a good feature of the book. robert f. mathis. films: a marc format; specifications for magnetic tapes containing catalog records for motion pictures, filmstrips, and other pictorial media intended for projection. washington: marc development office, 1970. 65 pp. $0.65. this latest format issued by the marc development office is similar in organization to the previously issued formats, describing in turn the leader, record directory, control fields, and variable fields. three appendices give the variable field tags, indicators, and subfield codes applicable to this format, categories of films, and a sample record in the marc format. in addition to the motion pictures and filmstrips specified in the subtitle, the coverage of this format includes slides, transparencies, video tapes, and electronic video recordings. data elements describing these last two have not been defined completely as the marc development office feels that further investigation is needed in these areas. the bibliographic level for this format is for monograph material, i.e., material complete at time of issue or to be issued in a known number of parts. since most of the material covered by this format is entered under title, main entry fields (100, 110, 111, 130) have not been described. this exclusion also covers the equivalent fields in the 400s and 800s. main entry and other fields not listed in this format but required by a user can be obtained from books: a marc format. this format describes two kinds of data: that generally found on an lc printed card and that needed to describe films in archival collections. only the first category will be distributed in machine readable form on a regular basis. one innovation introduced in this format that can only be applauded by marc users is the adoption of the bnb practice of using the second indicator of title fields (241, 245, 440, 840, but not 740, where the second indicator had previously been assigned a different function) to specify the number of characters at the beginning of the entry which are to be ignored in filing. it is to be hoped that in the future this practice will be applied to books, serials, and other types of works as well as to films. judith hopkins. u.k. marc project, edited by a. e. jeffreys and t. d. wilson. newcastle upon tyne: oriel press, 1970. 116 pp. 25s. this volume, which reports the proceedings of a conference on the u.k. marc project held in march 1969, may be of as much interest in the usa as in britain; although the intake of british libraries is much smaller and the money available for experiments much less, the problems of developing and using marc effectively within these constraints are for this very reason of special interest. a. j. wells opened the conference with a paper introducing u.k. marc and closed it with a paper stating its relationship to the british national bibliography. points of interest are the need for standardisation among libraries (not surprisingly, this theme occurs throughout) and the differences between u.k. 
marc and l.c. marc (the latter being the odd one out, in its departures from aacr 67). disappointingly, no hint is given of additional national bibliographical products that might come from marc, such as cumulated and updated bibliographies on given subjects, or listings of children's books, etc. richard coward, with his usual clarity and conciseness, explains the planning and format of u.k. marc, in which he has been so centrally involved. as he says, "we have the technology to produce a marc service but we really need a higher level of technology to use it at anything like its full potential." r. bayly's paper on "user programs and package deals" is disappointing, dealing only with icl 1900 computers, and not comprehensively or clearly even with them. two papers discuss the problems of actually using marc: e. h. c. driver's "why marc?", which concludes that "the most efficient use of marc will be made by large library systems or groups of libraries," and f. h. ayres' "marc in a special library environment," which concludes that eventually all libraries will use the marc tape. mr. ayres discusses the proposed use of marc at awre aldermaston, and also gives a general (and highly optimistic) blueprint of the sort of way marc could be used in an all-through selection, acquisition, and cataloging system. (the four american experimental uses of marc reviewed by c. d. batty, at toronto, yale, rice, and indiana, are probably well enough known in the usa and canada.) keith davidson's discussion of filing problems is first class, and his paper is just as topical as when it was written, because little progress has been made since then. peter lewis, in "marc and the future in libraries," makes the point that whereas bnb cards provided a ready-made product for libraries, marc tapes will merely offer them a set of parts to put together themselves. of special interest to american audiences may be derek austin's paper, "subject retrieval in the u.k. marc," since the precis system to which it forms an introduction may represent a major breakthrough in machine-manipulable subject indexing. marc and its uses constitute one of the most rapidly developing areas of librarianship. regular conferences of this standard are needed to review progress from time to time. maurice b. line. lessons learned: a primo usability study. kelsey brett, ashley lierman, and cherie turner. information technology and libraries | march 2016. [kelsey brett (krbrett@ua.edu) is discovery systems librarian, ashley lierman (arlierman@uh.edu) is instructional design librarian, and cherie turner (ckturner2@uh.edu) is chemical sciences librarian, university of houston libraries, houston, texas.] abstract. the university of houston libraries implemented primo as the primary search option on the library website in may 2014. in may 2015, the libraries released a redesigned interface to improve user experience with the tool. the libraries took a user-centered approach to redesigning the primo interface, conducting a "think-aloud" usability test to gather user feedback and identify needed improvements. this article describes the method and findings from the usability study, the changes that were made to the primo interface as a result, and implications for discovery-system vendor relations and library instruction. introduction. index-based discovery systems have become commonplace in academic libraries over the past several years, and academic libraries have invested a great deal of time and money into implementing them. frequently, discovery platforms serve as the primary access point to library resources, and in some libraries they have even replaced traditional online public access catalogs. 
because of the prominence of these systems in academic libraries and the important function that they serve, libraries have a vested interest in presenting users with a positive and seamless experience while using a discovery system to find and access library information. libraries commonly conduct user testing on their discovery systems, make local customizations when possible, and sometimes even change products to present the most user-friendly experience possible. university of houston libraries has adopted new discovery technologies as they became available in an effort to provide simplified discovery and access to library resources. as a first step, the libraries implemented innovative interfaces' encore, a federated search tool, in 2007. when index-based discovery systems became available, the libraries saw them as a way to provide an improved and intuitive search experience. in 2010, the libraries implemented serials solutions' summon. after three years and a thorough process of evaluating priorities and investigating alternatives, the libraries made the decision to move to ex libris' primo, which was done in may of 2014. the libraries' intention was to continually assess and customize primo to improve functionality and user experience. the libraries conducted research and performed user testing, and in may 2015 a redesigned primo search results page was released. one of the activities that informed the primo redesign was a "think-aloud" usability test that required users to complete a set of two tasks using primo. this article will present the method and results of the testing as well as the customizations that were made to the discovery system as a result. it will also discuss some broader implications for library discovery and its effect on information literacy instruction. literature review. there is a substantial body of literature discussing usability testing of discovery systems. in the interest of brevity, we will focus solely on studies and overviews involving primo implementations, from which several patterns have emerged. multiple studies have indicated that users' responses to the system are generally positive; even in testing of very early versions by a development partner users responded positively overall.1 interestingly, some studies found that in many cases users rated primo positively in post-testing surveys even when their task completion rate in the testing had been low.2 multiple studies also found evidence that, although users may struggle with primo initially, the system is learnable over time. comeaux found that the time it took users to use facets or locate resources decreased significantly with each task they performed,3 while other studies saw the use of facets per task increase for each user over the course of the testing.4 user reactions to facets and other post-limiting functions in primo were divided. 
in one of the earliest studies, sadeh found that users responded positively to facets,5 and some authors found users came to use them heavily while searching,6 while others found that facets were generally underused.7 multiple studies found that users tended to repeat their searches with slightly different terms rather than use post-limiting options.8 thomsett-scott and reese, in a survey of the literature on discovery tools, reported evidence of a trend that users reacted more positively to post-limiting in earlier discovery studies,9 while the broader literature shows more negative reactions in more recent studies. this could indicate that shifts in the software, user expectations, or both may have decreased users’ interest in these options. a few specific types of usability problems seem common across tests of primo and other discovery systems. across a large number of studies, it has been found that users—especially undergraduate students—struggle to understand library and academic terminology used in discovery. some terminology changes were made after users had difficulty in the earliest usability tests of primo,10 but users continued to struggle with terms like hold and recall in item records.11 users also failed to understand the labels of limiters,12 and they also failed to recognize the internal names of repositories and collections.13 literature reviews on discovery systems have found terminology to be a common stumbling block for searchers across a wide number of individual studies.14 similarly, users often struggle to understand the scope of options available to them when searching and the holdings information in item records. users failed in multiple tests to distinguish between the article level and the journal level,15 could not interpret bibliographic information technology and libraries | march 2016 9 information sufficiently to determine that they had found the desired item,16 and chose incorrect options for scoping their searches.17 many studies found that users were unable to distinguish between multiple editions of a held item when all item types or editions were listed in the record.18 in other cases, users had difficulty interpreting locations and holdings information for physical items.19 among the needs and desires expressed by and for primo users in the literature, two in particular stand out. first, many users expressed a desire for more advanced search options; some wanted more complexity in certain facets and the ability to search within results,20 while other users simply wanted an advanced search option to be available.21 secondly, a large number of studies indicated that instruction on primo or other discovery systems was needed for users to search effectively. in some cases this was the conclusion of the researchers conducting the study,22 while in other cases users themselves either suggested or requested instruction on the system.23 it is also worth noting that it has been questioned whether usability testing as a whole is a sufficient mechanism for evaluating discovery-system functionality. prommann and zhang found that usability testing has focused almost exclusively on the technical functioning of the software and not adequately revealed the ability of discovery systems like primo to successfully complete users’ desired tasks.24 they proposed hierarchical task analysis (hta) as an alternative, to examine users’ most frequent desires and the capacity of discovery systems to meet them. 
prommann and zhang acknowledged, however, that as hta is completed by an expert on the system rather than by an actual user, some of the valuable information derived from usability testing (including terms and functions that users do not understand, however well-designed) is lost in the process; they concluded that a combination of the two systems of testing is ideal to retain the best of both. background at the university of houston libraries, the resource discovery systems department (rds) is responsible for the maintenance and development of primo. however, it is important to rds to gather feedback and foster buy-in from stakeholders in the library before making changes to the system. to that end, rds works with two committees to assess the system and make recommendations for its improvement. the discovery usability group and the discovery advisory group include members from public services, technical services, and systems; each member brings a unique perspective on discovery. the discovery usability group is charged with assessing the discovery system through a variety of methods including usability testing, focus groups, and user interviews. the discovery advisory group reviews results of user testing and makes recommendations for improvement. all changes to the discovery system are reviewed by the groups before they are released for public use. lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 10 in fall 2014, several months after the primo implementation, the discovery usability group conducted a focus group with student workers from the library’s information desk (a dual reference and circulation desk) to solicit feedback about the functionality of primo and suggestions for its improvement. in the meantime, the discovery advisory group was testing primo and evaluating primo sites at peer and aspirational institutions. the groups used the information collected through the focus group and research on primo to make recommendations for improvement. rds has access to a primo development sandbox, and many of the recommended changes were made in the sandbox environment and reviewed by the two groups prior to public release. changes to the search box can be seen in figure 1. rarely used tabs were replaced with a dropdown menu to the right of the search box to allow users to limit to “everything,” “books+,” or “digital library.” to increase visibility, links to “advanced search” and “browse search” were made larger and more spacing was added. live site: development sandbox: figure 1. search box in live site (above) and development sandbox (below) at time of testing changes were also made to create a cleaner and less cluttered search results page (see figure 2). more white space was added, and the links (or tabs) to “view online,” “request,” “details,” etc., were redesigned and renamed for clarity. for example, the “view online” link was renamed to “preview online” because it opens a box within the search results page that displays the item. the groups believed “preview online” more accurately represents what the link does. information technology and libraries | march 2016 11 live site: development sandbox: figure 2. search results in live site (above) and development sandbox (below) at time of testing the facets were also redesigned to look cleaner and larger to attract users’ attention (see figure 3). 
lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 12 live site: development sandbox: figure 3. facets in live site and development sandbox at time of testing both groups were happy with the changes to the primo development sandbox but wanted to test the effect of the changes on user search behavior before updating the live site. the discovery usability group conducted a usability test within the development sandbox. the goal of the test was to find out if users could effectively complete common research tasks using primo. with that goal in mind, the group developed a usability test and conducted it during the spring semester of 2015. information technology and libraries | march 2016 13 methodology the discovery usability group developed a usability test using a “think-aloud” methodology, where users were asked to verbalize their thought process as they completed research tasks through primo. four tasks were designed to mirror tasks that users are likely to complete for class assignments or for general research. to minimize the testing time, each participant completed two tasks, with the facilitators alternating between two sets of tasks from one participant to the next. test 1 task 1: you are trying to find an article that was cited in a paper you read recently. you have the following citation: clapp, e., & edwards, l. (2013). expanding our vision for the arts in education. harvard educational review, 83(1), 5–14. please find this article using onesearch [the public-facing name given to the libraries’ primo implementation]. task 2: you are doing a research project on the effects of video games on early childhood development. find a peer-reviewed article on this topic, using onesearch. test 2 task 1: recently your friend recommended the book the lighthouse by p. d. james. use onesearch to find out if you can check this book out from the library. task 2: you are writing a paper about the drug cartels’ influence on mexico’s relationship with the united states. find a newspaper article on this topic, using onesearch. two facilitators set up a table with a laptop in the front entrance of the library. they alternated between the facilitator and note-taker roles. another group member took on the role of “caller” and recruited library patrons to participate in the study. the caller set up a table visible to those passing by with library-branded t-shirts and umbrellas to incentivize participation. the caller explained what would be expected of the potential participant and went over the informedconsent document. after signing the form, the participant performed two tasks. after the test the participant received a library t-shirt or umbrella, and snacks. the facilitators used morae usability software to record the screen and audio of each test. participants were asked for permission to record their sessions, but could opt out. during the three hour testing period, fifteen library patrons participated in the study, and fourteen sessions were recorded. of the fifteen participants, thirteen were undergraduate students (four freshman, one sophomore, seven juniors, and two seniors), one was a graduate student, and one was a postbaccalaureate student. the majority of the participants were from the sciences, along with two students from the college of business and two from the school of communications. there were no participants from the humanities. 
lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 14 the facilitators took notes on a rubric (see table 1) that simplified the processes of coding and reviewing the recordings. after the usability testing, the facilitators reviewed the notes and recordings, coded them for common themes and breakdowns, and prepared a report of their findings and design recommendations. the facilitators sent the report, along with audio and screen recordings, to the discovery advisory group, who reviewed them along with rds. the discovery advisory group made additional design recommendations, and rds used the information and recommendations to implement additional customizations to the primo development sandbox. preliminary questions ask: what is your affiliation with the university of houston? year? major? ask: how often do you use the library website? for what purpose(s)? task 1 describe the steps the participant took to complete the task s/u ask: how did you feel about this task? what was simple? what was difficult? ask: is there anything that would make completing this task easier? task 2 describe the steps the participant took to complete the task s/u ask: how did you feel about this task? what was simple? what was difficult? ask: is there anything that would make completing this task easier? follow-up question ask: what can we do to improve the overall experience using onesearch? table 1. task completion rubric for test 1 information technology and libraries | march 2016 15 results test 1, task 1 you are trying to find an article that was cited in a paper you read recently. you have the following citation: clapp, e., & edwards, l. (2013). expanding our vision for the arts in education. harvard educational review, 83(1), 5–14. please find this article using onesearch. participant time on task task completion 1 1m 54s y 2 4m 13s y 3 1m 26s y 4 1m 17s y 5 1m 26s y (required assistance) 6 1m 43s y 7 1m 27s y 8 1m 5s y table 2. results for test 1, task 1 all eight participants successfully completed this task, although sophistication and efficiency varied between participants. some searched by the authors’ last names, which was not specific enough to return the item in question. four participants attempted to use advanced search or the drop-down menu to the right of the search box to pre-filter their results. two participants viewed the options in the drop-down menu, which were “everything,” “books+,” and “digital library,” and left it on the default “everything” search. when prompted, the participants explained that they were expecting the drop-down to contain title and/or author limiters. similarly, participants expected an author limiter in the advanced search. the citation format seemed to confuse participants, and they tended to search for the piece of information that was listed first—the authors—rather than the most unique piece of information—the title. if the first search did not return the correct item in the first few results, the participant would modify their search by searching for a different element of the citation or adding another element of the citation to the initial search until the item they were looking for appeared as one of the first few results. 
participant 5 thought they had successfully completed the lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 16 task, but the facilitator had to point out that the item they chose did not meet the citation exactly, and on the second try they found the correct item. participant 2 worked on the task for more than four minutes, significantly longer than the other seven participants. they immediately navigated to advanced search and filled out several fields in the advanced search form with the elements of the citation. if the search did not return their item, they added more elements until they finally found it. simply searching the title in the citation would have returned the item as the first search result. filling out the advanced search form with all of the information from the citation does not necessarily increase a user’s chances of finding the item in a discovery system, though it might do so when searching in an online catalog or subject database. the discovery advisory and usability groups made two recommendations to address some of the identified issues: include an author search option in the advanced search, and add an “articles+” option to the drop-down menu on the basic search. rds implemented both recommendations. the discovery usability group identified confusion around citations as a common breakdown during this task. the groups recommended providing instructional information about searching for known items to address this breakdown; however, rds is still working on an effective method to provide this information in a simple and visible way. test 1, task 2 you are doing a research project on the effects of video games on early childhood development. find a peer-reviewed article on this topic, using onesearch. participant time on task task completion 1 3m 44s y 2 2m 21s y 3 5m 23s y (required assistance) 4 2m 5s y 5 3m 32s y 6 2m 45s y 7 3m 8s y 8 3m 1s y (required assistance) table 3. results for test 1, task 2 all eight participants successfully found an article on this topic, but were less successful in determining whether the article was peer-reviewed. only one participant used the “peer-reviewed information technology and libraries | march 2016 17 journals” facet without being prompted. three users noticed the “[peer-reviewed journal]” note in the record information for search results, and used it to determine if the article was peer-reviewed. one participant went to the full-text of an article, and said it “seemed” like it was peer-reviewed and considered the task complete. the resource type facets were more heavily used during this task than the “peer-reviewed journals” facet, despite its being promoted to the top of the list of facets. two participants used the “articles” facet, and two participants used the “reviews” facet, thinking it limited to peer-reviewed articles. participants 3 and 8 needed help from the facilitator to determine whether a source was peer-reviewed. there was an overall misunderstanding of what peer-reviewed means, which affected participants’ confidence in completing the task. the design recommendations based on this task included changing the “peer-reviewed journals” facet to “peer-reviewed articles” or simply, “peer-reviewed.” rds changed the facet to “peerreviewed articles” to help alleviate confusion. 
additionally, the groups recommended emphasizing the “[peer-reviewed journal]” designations within the search results and providing a method for limiting to peer-reviewed materials before conducting a search. customization limitations of the system have prevented rds from implementing these design recommendations yet. a way to address the breakdowns caused by misunderstanding terminology also has yet to be identified. it was disheartening that participants did not use the “peer-reviewed journals” facet despite its being purposefully emphasized on the search results page. test 2, task 1 recently your friend recommended the book the lighthouse by p. d. james. use onesearch to find out if you can check this book out from the library. participant time on task task completion 1 1m 7s y 2 56s y 3 no recording y 4 2m 21s y 5 1m 8s y 6 2m 14s y 7 1m 15s y table 4. results for test 2, task 1 all seven participants were able to find this book using primo, but had difficulty in determining what to do once they found it. for this task every participant searched by title and found the book as the first search result. four users limited to “books+” before searching using the drop-down lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 18 menu, while the other three remained in the default “everything” search. only one participant used the locations tab within the search results to determine availability; the others clicked the title and went to the item’s catalog record. all participants were able to determine that the book was available in the library, but there was an overall lack of understanding about how to use the information in the catalog to check out a book. participant 1 said that they would write down the call number, take it to the information desk, and ask how to find it, which was the most sophisticated response of all seven participants. participant 4 spent nearly two minutes clicking through links in the opac expecting to find a “check out” button and only stopped when the facilitator stepped in. a recommended design change based on this task was to have call numbers in primo and the online catalog link to a stacks guide or map. this is a feature that may be developed in the future, but technical limitations prevented rds from implementing it in time for the release of the redesigned search interface. like the previous tasks, some of the breakdowns occurred because of a lack of understanding of library services. users easily figured out that there was a copy of the book in the library, but had little sense of what to do next. none of the participants successfully located the stacks guide or the request feature that would put the item on hold for them. steps should be taken to direct users to these features more effectively. test 2, task 2 you are writing a paper about the drug cartels’ influence on mexico’s relationship with the united states. find a newspaper article on this topic, using onesearch. participant time on task task completion 1 4m 45s y (required assistance) 2 59s y 3 no recording n 4 7m 47s y 5 2m 52s y 6 1m 33s y 7 1m 30s y table 5. results for test 2, task 2 this task was difficult for participants. two users limited their search initially to “digital library” using the drop-down menu, thinking it would be a place to find newspaper articles; their searches returned zero results. 
only two users used the “newspaper articles” facet without being prompted, and users did not seem to readily distinguish newspaper articles as a resource type. participants information technology and libraries | march 2016 19 did not notice the resource type icons without being prompted. several participants needed to be reminded that the task was to find a newspaper article, and not any other type of article. with guidance, most participants were able to complete the task. participant 4 remained on the task for almost eight minutes because of their dissatisfaction with the relevancy of the results to the prompt. interestingly, they found the “newspaper articles” facet and reapplied it after each modified search, suggesting that they learned to use system features as they went. one of the recommendations based on this task was to remove “digital library” as an option in the drop-down menu on the basic search. it was evident that “digital library” did not have the same meaning to end users as it does to internal users. this recommendation was easily implemented. another recommendation was to emphasize the resource type icons within the search results, but we have not determined a way to do so effectively. one suggestion from the discovery usability group was to exclude newspaper articles from the search results as a default, but no consensus was reached on this issue. limitations the discovery usability group identified limitations to the usability test that should be noted. testing was done in a high-traffic portion of the library’s lobby, which is used as study space by a broad range of students. participants were recruited from this study space, and we chose not to screen participants. the fifteen participants in the study did not constitute a representative sample. almost all participants were undergraduate students, and no humanities majors participated. the outcomes might have been different if our participants had included more experienced researchers or students from a broader range of disciplines. by adding screening questions or choosing a more neutral location, we would have limited the number of participants who could complete our testing. another limitation was that the participants started the usability test within the primo interface. because primo is integrated into the libraries’ website, users would typically begin searching the system from within the library homepage. the goals of the study required testing of our primo development sandbox, which was not yet available to the public, and therefore could not be accessed in the same way. this gave participants some additional options from the initial search pages that are not usually available through the main search interface. while testing an active version of the interface would be preferable, one of our goals was to understand how our modifications affected user behavior, so testing the unmodified version was not an acceptable substitute. additionally, the usability study presented tasks out of context and did not replicate a true user-searching experience. despite the limitations, we learned valuable lessons from the participants in this study. discussion users successfully completed the tasks in this usability study. unfortunately, they did not take advantage of many of the features that can make such tasks easier—particularly facets. 
this was lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 20 especially apparent when we asked users to find a peer-reviewed journal article (test 1, task 2). primo has a facet that will limit a search to only peer-reviewed journal articles, and only one out of eight participants used this facet during this task. participants appreciated the pre-search filtering options, and requested more of them (such as an author search), while post-search facets were underutilized. similarly, participants almost uniformly ignored the links, or tabs, within the search results, which would provide users with more information, a preview of the full-text, and additional features such as an email function. users bypassed these options and clicked on the title instead. the discovery usability group theorized that users clicked on the title of the item because that behavior would be successful in a more familiar search interface like google. the team customized the configuration so that a title click would open either the full-text of electronic items or the catalog record for physical items to accommodate users’ instinctive search behaviors. the tabs, though a prominent feature of the discovery system, have proved to have little value for users. throughout the implementation of discovery systems in academic libraries, both research studies and anecdotal evidence have suggested that users do not find end-user features like facets valuable; however, discovery system vendors have made no apparent attempt to reimagine the possibilities for search refinements. indeed, most of the findings in this study will present few surprises to anyone familiar with the discovery usability literature, which is itself concerning. as our literature review has shown, many of the same general usability issues have repeated throughout studies of primo since 2008, and most are very similar to usability issues in other, competitor discovery systems. this raises some concerns about the pace of innovation in the discovery field, and whether discovery vendors are genuinely taking into account the research findings about the needs of our users as they refine their products. in a recent article, david nelson and linda turney identified many issues with discovery facets in their current form that may be barriers to usage, particularly labeling and library jargon; we join them in urging vendors and libraries to collaborate more closely for deep analysis of actual facet usage by users, and to address those factors that have negatively affected facets’ value.25 during our usability study, a common barrier to the successful completion of a task was not the technology itself but a lack of understanding of the task. participants had difficulty deciphering a citation, which may have led to their tendency to search for a journal article by author and not by title. many participants struggled with using call numbers, and how to find and check out books in the library. peer review also proved to be a difficult or unfamiliar concept for many; when looking for peer-reviewed articles, some participants clicked on the “reviews” facet, which limited their searches to an inappropriate resource type. additionally, participants did not differentiate between journal articles and newspaper articles, which may indicate a broader inability to differentiate between scholarly and nonscholarly resources. 
this effect may be exaggerated by the high percentage of science students who participated, as these students may not have frequent need for newspaper articles. all of these challenges, however, are indicative of a deeper problem with terminology. regardless of how simple it is to limit a search to peer-reviewed articles, a user who does not understand what peer review means cannot complete the task with confidence or certainty. librarians struggle with presenting understandable language and avoiding library terminology; as we discovered, academic language, like “peer-reviewed” and “citation,” presents a similar problem. these are not issues that can be resolved with a technological solution. rather, we join previous authors in suggesting that instruction may be a reasonable way to address many usability issues in primo. from our findings and from those in the wider literature, we conclude that general instruction in information literacy is a prerequisite for effective use of this or any research tool, particularly for undergraduates. nichols et al. “recommend studying how to effectively provide instruction on primo searching and results interpretation,”26 but instruction on the use of a single tool is of limited utility to students in their academic lives. instead, libraries could bolster information literacy instruction on key concepts around the production and storage of information, scholarly communications, and differences in information types. teaching these concepts effectively should help to alleviate the most common user issues, including understanding terminology and different types of information, as well as helping students to understand key elements of research in general. this is a particularly important point to note for librarians working as advocates for information literacy instruction, especially in cases where administrators or faculty may feel that more advanced tools, like discovery systems, should make instruction obsolete.

conclusion

several changes were made to the primo interface in response to breakdowns identified during the usability study. resource discovery systems (rds) first implemented the changes in the primo development sandbox. after the discovery usability and advisory groups agreed on the changes, they were made available on the live site (see figures 4–6). the redesigned search results page became available to the general public between the spring and summer academic sessions of 2015. in addition to the changes that were made because of the usability study, rds made changes to the look and feel to make the search results interface more aesthetically pleasing and more in line with the university of houston brand.

figure 4. primo interface before usability testing (live site)

figure 5. primo interface during usability testing (development sandbox)

figure 6. primo interface after usability testing (live site)

many of the larger questions raised by this study, encompassing implications for instruction and our needs from discovery vendors, will require further study to address. the authors intend to continue to investigate these issues as additional usability testing is conducted and to use the data to support future vendor relations and instructional curriculum development discussions.

references
1. tamar sadeh, “user experience in the library: a case study,” new library world 109, no. 1/2 (2008): 7–24, doi:10.1108/03074800810845976.
2. aaron nichols et al., “kicking the tires: a usability study of the primo discovery tool,” journal of web librarianship 8, no. 2 (2014): 172–95, doi:10.1080/19322909.2014.903133; scott hanrath and miloche kottman, “use and usability of a discovery tool in an academic library,” journal of web librarianship 9, no. 1 (2015): 1–21, doi:10.1080/19322909.2014.983259.
3. david j. comeaux, “usability testing of a web-scale discovery system at an academic library,” college & undergraduate libraries 19, no. 2–4 (2012): 189–206, doi:10.1080/10691316.2012.695671.
4. kylie jarrett, “findit@flinders: user experiences of the primo discovery search solution,” australian academic & research libraries 43, no. 4 (2012): 278–300; nichols et al., “kicking the tires.”
5. sadeh, “user experience in the library.”
6. jarrett, “findit@flinders”; nichols et al., “kicking the tires.”
7. xi niu, tao zhang, and hsin-liang chen, “study of user search activities with two discovery tools at an academic library,” libraries faculty and staff scholarship and research 30, no. 5 (2014), doi:10.1080/10447318.2013.873281; hanrath and kottman, “use and usability of a discovery tool in an academic library.”
8. rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (2012): 186–207, doi:10.1353/lib.2012.0029; niu, zhang, and chen, “study of user search activities with two discovery tools at an academic library.”
9. beth thomsett-scott and patricia e. reese, “academic libraries and discovery tools: a survey of the literature,” college & undergraduate libraries 19, no. 2–4 (2012): 123–43, doi:10.1080/10691316.2012.697009.
10. sadeh, “user experience in the library.”
11. comeaux, “usability testing of a web-scale discovery system at an academic library.”
12. jessica mahoney and susan leach-murray, “implementation of a discovery layer: the franklin college experience,” college & undergraduate libraries 19, no. 2–4 (2012): 327–43, doi:10.1080/10691316.2012.693435.
13. joy marie perrin et al., “usability testing for greater impact: a primo case study,” information technology & libraries 33, no. 4 (2014): 57–67.
14. majors, “comparative user experiences of next-generation catalogue interfaces”; thomsett-scott and reese, “academic libraries and discovery tools.”
15. jarrett, “findit@flinders”; mahoney and leach-murray, “implementation of a discovery layer.”
16. jarrett, “findit@flinders”; mahoney and leach-murray, “implementation of a discovery layer”; nichols et al., “kicking the tires.”
17. jarrett, “findit@flinders”; mahoney and leach-murray, “implementation of a discovery layer”; perrin et al., “usability testing for greater impact: a primo case study.”
18. jarrett, “findit@flinders”; nichols et al., “kicking the tires”; hanrath and kottman, “use and usability of a discovery tool in an academic library”; majors, “comparative user experiences of next-generation catalogue interfaces.”
19. comeaux, “usability testing of a web-scale discovery system at an academic library”; thomsett-scott and reese, “academic libraries and discovery tools.”
20. jarrett, “findit@flinders.”
21. mahoney and leach-murray, “implementation of a discovery layer”; perrin et al., “usability testing for greater impact.”
22. mahoney and leach-murray, “implementation of a discovery layer”; nichols et al., “kicking the tires”; niu, zhang, and chen, “study of user search activities with two discovery tools at an academic library.”
23. thomsett-scott and reese, “academic libraries and discovery tools.”
24. tao zhang and merlen prommann, “applying hierarchical task analysis method to discovery layer evaluation,” information technology & libraries 34, no. 1 (2015): 77–105, doi:10.6017/ital.v34i1.5600.
25. david nelson and linda turney, “what’s in a word? rethinking facet headings in a discovery service,” information technology & libraries 34, no. 2 (2015): 76–91, doi:10.6017/ital.v34i2.5629.
26. nichols et al., “kicking the tires,” 184.

policies governing use of computing technology in academic libraries

jason vaughan

jason vaughan (jvaughan@ccmail.nevada.edu) is head of the library systems department at the university of nevada, las vegas.

the networked computing environment is a vital resource for academic libraries. ever-increasing use dictates the prudence of having a comprehensive computer-use policy in force. universities often have an overarching policy or policies governing the general use of computing technology that helps to safeguard the university equipment, software, and network against inappropriate use. libraries often benefit from having an adjunct policy that works to emphasize the existence and important points of higher-level policies, while also providing a local context for systems and policies pertinent to the library in particular. having computer-use policies at the university and library level helps provide a comprehensive, encompassing guide for the effective and appropriate use of this vital resource.

for clients of academic libraries, the computing environment and access to online information is an essential part of everyday service, every bit as vital as having a printed collection on the shelf. the computing environment has grown in positive ways: higher-caliber hardware and software, evolving methods of communication, and large quantities of accurate online information content. it has also grown in many negative ways: the propagation of worms and viruses, other methods of hacking and disruption, and inaccurate informational content. as the computing environment has grown, it has become essential to have adequate and regularly reviewed policies governing its use. often, if not always, overarching policies exist at a broad institutional or even larger systemwide level. such policies can govern the use of all university equipment, software, and network access within the library and elsewhere on campus, such as campus computer labs. a single policy may encompass every easily conceivable computing-related topic, or there may be several individual policies.
apart from any document drafted and enforced at the university level, various public laws exist that also govern appropriate computer-use behavior, whether in academia or on the beach. many institutions have separate policies governing employee use of computer resources; this paper focuses on student use of computing technologies. in some cases, the library and the additional campus student-computer infrastructure (for example, campus labs and dormitory computer access) are governed by the same organizational entity, so the higher-level policy and the library policy are de facto the same. in many instances, libraries have enacted additional computer-use policies. such policies may emphasize or augment certain points found in the institution-level policy(s), address concerns specific to the library environment, or both. this paper surveys the scope of what are most commonly referred to as “computer-use policies,” specifically, those geared toward the student-client population. common elements found in university-level policies (and often later emphasized in the library policy) are identified. a discussion on additional topics generally more specific to the library environment, and often found in library computer-use policies, follows. the final section takes a look at the computer-use environment at the university of nevada, las vegas (unlv), the various policies in force, and identifies where certain elements are spelled out: at the university level, the library level, or both.

policy basics

purpose and scope

policies can serve several purposes. a policy is defined as “a plan or course of action ... intended to influence and determine decisions, actions, and other matters. a course of action, guiding principle, or procedure considered expedient, prudent, or advantageous.”1 any sound university has a comprehensive computer-use policy readily available and visible to all members of the university community: faculty, staff, students, and visitors. some institutions have drafted a universal policy that seeks to cover all the pertinent bases pertaining to the use of computing technology. in some cases, these broad overarching policies have descriptive content as well as references to other related or subsidiary policies. in this way, they provide content and serve as an index to other policies. in other cases, no illusions are made about having a single, general, overarching policy; the university has multiple policies instead. policies can define what is permitted (use of computers for academic research) or not permitted (use of computers for nonacademic purposes, such as commercial or political interests). a policy is meant to guide behavior and the use of resources as they are meant to be used. in addition, policies can delve into procedure. for example, most policies contain a section on how to report suspected abuse and how suspected abuse is investigated, and outline potential penalties. policies buried in legalese may serve some purpose, but they may not do a good job of educating users on what is acceptable and not acceptable. perhaps the best approach is an appropriate balance between legalese and language most users will understand.
in addition, policies can also serve to help educate individuals on important topics, rather than merely stating what is allowed and what will get one in trouble. for example, a general policy statement might read, “you must keep your password confidential.” taken a step further, the policy could include recommendations pertaining to passwords, such as the minimum password length, inclusion of nonalphabetic characters, the recommendation to change the password regularly, and the mandate to never write down the password (see the brief sketch below).

characteristics of a policy: visibility, prominence, easily identifiable

a policy is most useful when it is highly visible and clearly identified as a policy that has been approved by some authoritative individual or body. students often sign a form or agree online to terms and conditions when their university accounts are established. web pages may have a disclaimer stating something to the effect of “use of (institution’s) resources is governed by ...” and provide a hyperlink to the various policies in place. or, a simple policies link may appear in the footer of every web page at the institutional site. some universities have gone a bit further. at the university of virginia, for example, students must complete an online quiz after reviewing the computer-use guidelines.2 in addition, they can choose to view the optional video. such components serve to enhance awareness of the various policies in place. a review of the library literature failed to uncover any articles focusing on computer-use policies in academic libraries. the author then selected several similar-sized (but not necessarily peer) institutions to unlv (doctoral-granting universities with a student population between twenty thousand and thirty thousand) and thoroughly examined their library web sites to see what, if any, policy components were explicitly highlighted. it quickly became evident that many libraries do not have a centrally visible, specifically titled, inclusive computer-use policy document. most, but not all, of the library web sites provided a link to the institutional-level computer-use policy. in some cases, library policies were not consolidated under a central page titled “policies and procedures” or “guidelines,” and, where they did appear, the context did not imply or state authoritatively that this was an official policy. there was no statement of who drafted the policy (which can lend some level of authority or credence), as well as no indicated creation or revision date. granted, many libraries have paper forms one must sign to obtain a library card, or they may state the rules in hardcopy posted within prominent computer-dense locations. still, with so much emphasis given to licensed database and internet resources, and with such heavy use of the computing environment, such policies should appear online in a prominent location. where better to provide a computer-use policy than online? perhaps all the libraries reviewed did have policies posted somewhere online. if the author could not easily find them, chances are a student would have difficulties as well. in sum, the location of the policy information and how it is labeled can make a tremendous difference.

revisions

policies should be reviewed on a regular basis. often, the initial policy likely goes through university counsel, the president’s administrative circles, and, perhaps, a board of regents or the equivalent. revisions may go through such avenues, or may be more streamlined.
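the password recommendations mentioned above under purpose and scope can be made concrete with a small sketch. the following python illustration is hypothetical: the particular thresholds (an eight-character minimum, at least one nonalphabetic character) are assumptions chosen for the example, not any institution's actual requirements, and rules such as changing passwords regularly or never writing them down are procedural and appear only in comments.

# hypothetical illustration of checkable password-policy rules; the
# minimum length and character requirement below are example values,
# not any university's actual policy. rules like "change the password
# regularly" and "never write down the password" are procedural and
# cannot be enforced by a check such as this one.

def password_policy_violations(candidate: str, minimum_length: int = 8) -> list[str]:
    """return a list of violated rules; an empty list means the
    candidate password satisfies these example checks."""
    violations = []
    if len(candidate) < minimum_length:
        violations.append(f"must be at least {minimum_length} characters long")
    if all(ch.isalpha() for ch in candidate):
        violations.append("must include at least one nonalphabetic character")
    return violations

if __name__ == "__main__":
    for attempt in ("library", "Libr@ry2004"):
        print(attempt, "->", password_policy_violations(attempt) or "acceptable")

expressing the rules this way also makes it easy for a policy document and an account-creation form to stay in agreement, since both can point to the same short list of checks.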
a frequent review of policies is mandated by evolving information technology. for example, cell phones with built-in cameras or internet-browsing capabilities, nonexistent a few years ago, are now becoming mainstream. with such an inconspicuous device, activities such as taking pictures of an exam or finding simple answers online are now possible. similarly, regularly installed critical updates are a central concept within windows’ latest version of operating-system software. such functionality failed to attract much attention until the increase in security exploits and associated media coverage. some policies, recently updated, now make mention of the need to keep operating systems patched.

why have a library policy?

while some libraries link to higher-level institutional policies and perhaps have a few rules stated on various scattered library web pages, other libraries have quite comprehensive policies that serve as an adjunct to (and certainly comply with) higher-institutional policies. there are several reasons to have a library policy. first, it adds visibility to whatever higher-level policy may be in place. a central feature of a library policy is that it often provides links (and thus, additional visibility) to other higher-level policies. a computer-use policy can never appear in too many places. (some libraries have the link in the footer of every web page.) a computer-use policy can be thought of as a speed limit sign. presumably, everyone knows that unless otherwise posted, the speed limit inside the city is thirty-five miles per hour, and outside it is fifty-five miles per hour. nevertheless, numerous speed-limit signs are in place to remind drivers of this. higher-level institutional policies often take a broad stroke, in that they pertain to and address computing technology in general, without addressing specific systems in detail. a second reason to have a local-library policy is to reflect rules governing local-library resources that are housed and managed by the library. such systems often include virtual reference, electronic reserves, laptop-checkout privileges, and the mass of electronic databases and full-text resources purchased and managed by libraries. such library-based systems do not necessarily make the radar of higher-level policies, yet have important considerations, such as copyright issues in the electronic age or privacy as it relates to e-mail and chat reference. in addition, libraries often have two large user groups that other campus entities do not have: university affiliates (faculty, staff, students) and nonuniversity affiliates (community users). while broader university policies generally apply to all users of computing technology, local-library policies can work to address all users of the library pcs, and make distinctions as to when, where, and what each group can use.

common computer-use policy elements

the following section outlines broad topics that are usually addressed within high-level, institutional policies. often, some or many of these same elements are later reemphasized or adapted by libraries, focusing on the library environment. in many cases, the policy is presented in a manner somewhat like breaking the seal on a new piece of software packaging.
essentially, if someone is using the university equipment or network, that person agrees to abide by all policies governing such use. an overarching policy frequently may end with a bulleted summary of the important points in the document. an important first part of the policy is a clear indication of who the policy applies to. this may be as broad as “anyone who sits down in front of university equipment or connects to the network,” or as specific as spelling out individual user groups (undergraduates, graduates, alumni, k–12 students). appendix a summarizes elements found in the various end-user computer policies in force at unlv and the unlv university libraries.

network and workstation security

network security is a universal topic addressed in computer-use policies. under this general aegis one often finds prohibitions against various forms of hacking, as well as recommendations for steps individual users should take to help better secure the overall network. there are also such policies as the prohibition of food and drink near computer workstations or on the furniture housing computer workstations. typical components related to network and workstation security include:
1. disruption of other computer systems or networks; deliberately altering or reconfiguring system files; use of ftp servers, peer-to-peer file sharing, or operation of other bandwidth-intensive services
2. creation of a virus; propagation of a virus
3. attempts at unauthorized access; theft of account ids or passwords
4. password information: individual users need to maintain a strong, confidential password
5. intentionally viewing, copying, modifying, or deleting other users’ files
6. a requirement to secure restrictions to files stored on university servers
7. recommendation or requirement to back up files
8. statement of ownership regarding equipment and software: the university, not the student, owns the equipment, network, and software
9. intentional physical damage: tampering with, marking, or reconfiguring equipment or infrastructure, such as unplugging network cables
10. food and drink policies

personal hardware and software

many universities allow students to attach their own laptops to the campus wired or wireless network(s). in addition to network connections, a growing number of consumer devices such as floppy disks, zip disks, and rewritable cd/dvd media have the potential to connect to university computers for the purpose of data transfer. today, the list has grown to include portable flash drives, digital cameras and camcorders, and mp3 players, among others. the attaching of personal equipment to university hardware may or may not be allowed. similarly, users may often try to install software on university-owned equipment. typical examples may include a game brought from home or any of the myriad pieces of software easily downloaded from the internet. some of the policy elements dealing with the use of personal hardware and software include:
1. connecting personal laptops to the university wired or wireless network(s)
2. use of current and up-to-date patched operating systems and antivirus programs running on personal equipment attached to the network
3. connecting, inserting, or interfacing such personal hardware as floppy disks, cds, flash drives, and digital cameras with university-owned hardware; liability regarding physical damage or data loss
4. limiting access to and mandating immediate reporting of stolen personal equipment (to deactivate registered mac addresses, for example)
5. downloading or installing personal or otherwise additional software onto university equipment
6. use of personal technology (cell phones, pdas) in classroom or test-taking environments

e-mail

e-mail privileges figure prominently in computer-use policies. some topics deal with security and network performance (sending a virus), while many deal with inappropriate use (making threats or sending obscene e-mails). other topics deal with both (such as sending spam, which is unsolicited, annoying, and consumes a lot of bandwidth). among the activities covered are prohibitions or statements regarding:
1. hiding identity, forging an e-mail address
2. initiating spam
3. subscribing others to mailing lists
4. disseminating obscene material or weblinks to such material
5. general guidelines on e-mail privileges, such as the size of an e-mail account, how long an account can be used after graduation, and e-mail retention
6. basic education regarding e-mail etiquette

printing

with the explosion of full-text resources, libraries and other student-computing facilities have experienced a tremendous growth in the volume of pages printed on library printers. at unlv libraries, for example, the printing volume for july 2002 to june 2003 was just shy of two million pages; the following year that had jumped to almost 2.4 million pages. various policies helping to govern printing may exist, such as honor-system guidelines (“don’t print more than ten pages per day”). some institutions or libraries have implemented cost-recovery systems, where students pay fixed amounts per black-and-white and color page printed through networked printers. standard policies regarding printer use cover:
1. mass printing of flyers or newsletters
2. tampering with or trying to load anything into paper trays (such as trying to load transparencies in a laser printer)
3. per-sheet print costs (color and black-and-white; by paper size)
4. refund policies
5. additional commonsense guidelines, such as “use print preview in browser”

personal web sites

many universities allow students to create personal web sites, hosted and served from university-owned equipment. customary policy items focusing on this privilege include:
1. general account guidelines: space limitations, backups, secure ftp requirements
2. use of school logo on personal web pages
3. statement of content responsibility or institutional disclaimer information
4. requirement to provide personal contact information
5. posting or hosting of obscene, questionable, or inappropriate content

intellectual property, copyright, or trademark

abuse of copyright, clearly a violation of federal law, is something that libraries and universities were concerned about long before computers hit the mainstream. widespread computing has introduced new avenues to potentially break copyright laws, such as peer-to-peer file sharing and dvd-movie duplication, to mention only two. a computer-use policy covering copyright will generally include:
1. general discussion of copyright and trademark law; links to comprehensive information on these topics
2. concept of educational “fair use”
3. copying or modification of licensed software, use of software as intended, use of unlicensed software
4. specific rules pertaining to electronic theses and dissertations
5. specific mention of the illegality of downloading copyrighted music and video files

appropriate- and priority-use guidelines

appropriate use is often covered in association with topics such as network security or intellectual property. however, appropriate- and priority-use rules can be an entire policy and would include:
1. mention of federal, state, and local laws
2. use of resources for theft or plagiarism
3. abuse, harassment, or making threats to others (via e-mail, instant messaging, or web page)
4. viewing material that may offend or harass others
5. legitimate versus prohibited use; use for nonacademic purposes such as commercial, advertising, political purposes, or games
6. academic freedom, internet filtering

privacy, data security, and monitoring

privacy and data security are tremendous issues within the computing environment. networking protocols and components of many software programs and operating systems by default keep track of many activities (browser history files and cache, dynamic host configuration protocol logs, and network account login logs, to mention a few). additional specialized tools can track specific sessions and provide additional information. just as credit-card companies, banks, and hospitals provide a privacy policy to their clients, so do many academic computer-use policies. such statements often address what logs are kept, how they are maintained, how they may be used, and who has access. in addition to the legitimate use of maintaining information, there is the general concept of questionable or outright malicious collection of information, through cookies, spybots, or browser hijacks. the following are concepts often addressed under the general heading of privacy:
1. cookies, spybots, other malicious software
2. what information is collected for evaluative, system management, and/or statistical purposes; use of cookies for this; how such information is used and reported
3. statement on routine monitoring or inspection of accounts or use; reasons information may be accessed (routine system maintenance, official university business, investigation of abuse, irregular usage patterns)
4. security of information stored on or transmitted by various campus resources
5. statement on general lack of security of public, multiuser workstations (browser cache, search history, recent documents)
6. disposition of information under certain circumstances (for example, if a student dies while enrolled, any personal university e-mail and stored files can be turned over to the executor of the will or parents)

abuse violations, investigations, and penalties

as policies generally are a statement of what is or is not permitted, or what is considered abuse, a clearly defined mechanism for reporting suspected abuse and policy violations can often be found. obviously, some abuse issues violate not only university policy, but also local, state, or federal law. investigations of suspected abuse are by their nature tied into the privacy and monitoring category. policy items detailing suspected abuse usually include:
1. how one can report suspected abuse
2. how requests for content, logging, or other account information are handled; how and by what entities abuse investigations are handled
3. potential penalties
4. how to appeal potential penalties; rights and responsibilities one may have in such a situation

other computer- or network-based services affecting the broad student population

universities operate any number of other computer- or network-based services for the broad academic community. such services may include provisioning of isp accounts, courseware, online registration, and digital institutional repositories. depending on the broad nature of these services, policy information particular to such systems can be specified at the broad policy level, especially if they have unique avenues of potential exploitation or abuse not covered in the general topics included elsewhere in the policy.

additional library-specific computer-use policy elements

many libraries elect to have their own, additional computer-use policies that serve as an adjunct to the larger university-level policy that generally governs the use of all computing resources on campus. libraries that have a formalized library computer-use policy often start with a statement of other policies governing the use of the library equipment and network: references to the university policies in place. the library policy may choose to include or paraphrase parts of the university policy deemed especially important or otherwise applicable to the specific library environment. important concepts governing university policies apply equally to library policies: purpose and comprehensiveness, visibility, and frequent review. libraries that have formalized computer-use policies often link them under common library web-site sections such as “information about the libraries” or “about the libraries.” library policies can help address items unique, special, or otherwise worthy of elaboration, such as specific systems in place or situations that may arise. they can also help provide guidelines and strategies to aid staff in policy enforcement. as an example of a library computer-use policy, appendix b provides the main unlv libraries computer-use policy.

public versus student use: allowances and priority use

many of the other entities on a university campus do not daily deal with the community at large (the non-university affiliates) as do academic libraries. this applies to most if not all public institutions, as well as many private institutions. the degree to which academic libraries embrace community users varies widely; often, a statement on which user groups are the primary clients is stated in a policy. such policy statements may discuss who may use what computers, what software components they have access to, and when access is allowed. in some cases, levels of access for students and the community are basically the same. community users may be allowed to use all software installed on the pc. more often, separate pcs with smaller software sets have been configured for community users or for specific access to government documents. in some cases, libraries allow some or all pcs to be used by anyone, student or nonstudent, but have technically configured the pc or network to prevent the community at large from using the full software set (such as common productivity suites). however, community users may be limited from using the productivity software (such as microsoft word) found on these pcs.
they may be restricted from using pcs on upper floors, or those reserved for special purposes, such as high-end graphics-development workstations. in addition, during crunch time (midterms and final exams), community users are often restricted to the few pcs set up and configured to allow access only to the library web page (not the web at large) and the online catalog. in addition, only students and staff can plug in their personal laptops to the library and campus network. regardless of whether it is crunch time, nonstudent users can be asked to leave if all pcs are in use and students are waiting. an in-house-authored program identifies accounts and whether particular users are students or nonstudents. more and more government information is available online. for libraries serving as government document repositories, all users have the right to freely access information distributed by the government. in 2005, the unlv libraries will begin limiting full web access to community users; they will only be permitted access to a limited set of web-based resources, such as government document web sites and library-licensed databases. on another note, many libraries have special adaptive workstations with additional software and hardware to facilitate access to library resources by disabled citizens. disabled individuals, enrolled at the university or not, are allowed to use these adaptive workstations.

laptop checkout privileges

many libraries today check out laptops for student use. at unlv libraries, faculty, staff, and students may check out lcd projectors and library-owned laptops and plug them into the network at any of the hundreds of available locations within the main library. more details on these privileges can be found in the article “bringing them in and checking them out: laptop use in the modern academic library.”3 as the university does not otherwise check out laptops to users or allow students to plug in their own laptops to the wired university network, the libraries had to come up with these additional specific policies.

licensed electronic resources: terms and conditions

academic libraries are generally the gatekeepers to many citation and full-text databases and electronic journals. each of the myriad subscription vendors has terms of use, violations of which can carry harsh penalties. for example, the unlv libraries had an incident where a vendor temporarily cut off access to its resource due to potential abuse detected from a single student. in this case, the user was downloading multiple pdf full-text files in an automated manner. this illustrates the need to have some statement in a library policy outlining the existence of such additional terms of use. vendors generally place a link at the top page of each of their resources related to this. for greater visibility, libraries should at least point out the existence of such terms of use for better exposure and potential compliance. in addition, some electronic resources have licensing agreements that simply do not permit community-user access. in these cases, library policy can simply state that some licensed resources may be accessed only by university affiliates.
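the affiliation-based distinctions described in this section (priority use for students, software and web restrictions for community users, and licensed resources limited to university affiliates) can be summarized in a small sketch. the python illustration below is hypothetical: the user categories, resource names, and rules are assumptions made for the example and do not represent the unlv libraries' in-house program.

# hypothetical sketch of affiliation-based access gating on public
# workstations; categories, resource names, and rules are example
# values, not the unlv libraries' actual in-house program.

UNIVERSITY_AFFILIATES = {"student", "faculty", "staff"}

# resources a community (non-affiliated) user may reach in this example
COMMUNITY_ALLOWED = {
    "library_website",
    "online_catalog",
    "government_documents",
    "licensed_databases",
}

def may_access(user_type: str, resource: str, crunch_time: bool = False) -> bool:
    """decide whether a workstation session may open a resource.

    affiliates get full access; community users get a reduced set, and
    during exam "crunch time" only the library website and catalog."""
    if user_type in UNIVERSITY_AFFILIATES:
        return True
    if crunch_time:
        return resource in {"library_website", "online_catalog"}
    return resource in COMMUNITY_ALLOWED

if __name__ == "__main__":
    print(may_access("community", "productivity_suite"))                     # False
    print(may_access("community", "government_documents"))                   # True
    print(may_access("community", "licensed_databases", crunch_time=True))   # False
    print(may_access("student", "productivity_suite"))                       # True

in practice the user's status would come from the library's patron records or the campus directory rather than from a hard-coded argument, but the same small set of rules could drive both the workstation configuration and the wording of the policy.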
electronic reserves

many libraries have set up electronic reserves systems to help distribute electronic full-text documents and streaming media content, among other things. additional policies may govern the use of such systems, such as making the system available only to currently enrolled students, and providing some boundaries in terms of what is acceptable for mounting on such a system. in addition, there is the whole area of copyright. e-reserve systems often have built-in methods to help better enforce copyright compliance in the electronic arena. additional policy statements can help educate faculty members on particulars related to copyright and e-reserves.

offsite access to licensed electronic resources

many libraries provide offsite access to their licensed resources to legitimate users via proxy servers or other methods. the policy regarding such access may address things such as who is permitted to access resources from offsite (such as students, staff, and faculty), and the requirement that the user be in good standing (such as no outstanding library-book fines). in some instances, universities have implemented broad authentication systems that, once logged on from an offsite location, allow the user into a range of university resources, including, potentially, library-licensed electronic resources. if such is the case, information pertaining to offsite access may be found in a higher-level policy.

electronic reference transactions

many libraries have installed (or plan to install) virtual-reference systems, or, at a minimum, have a simple e-mail reference service (“ask a librarian”). in addition, many collect library feedback or survey information through simple forms. in all cases, a record exists of the transaction. with virtual-reference systems, the record can include chat logs, e-mail reference inquiries, and urls of web pages accessed during the transaction. a policy governing the use of electronic-reference systems may address such things as which clientele may use the system, a statement on the confidentiality of the transaction, or a statement on whether the library maintains the electronic-transaction details. items such as hours of operation and response time to an e-mail question could be considered more procedural or informational than a policy issue.

statements on information literacy

while perhaps not a policy per se, many libraries have a computer-use policy statement to the effect that while the library may provide links to certain information, this does not serve as an endorsement or guarantee that the information is accurate, up-to-date, or has been verified. (such a statement posted on the library web site may provide additional exposure to the maxim that all that glitters is not gold.) statements that libraries do not regulate, organize, or otherwise verify the general mass of information on the internet may be included. obviously, many libraries have separate instruction sessions, awareness programs, and overall mission goals geared toward information literacy.

principles on intellectual freedom and internet filtering

statements by the american library association (ala) on intellectual freedom and internet filtering may well appear in an institutional policy and often are included in library policies.
filtering is something more likely to affect public and school libraries as opposed to academic libraries. still, underage children can and do use academic libraries. in such an environment, they may be intentionally or unintentionally exposed to questionable or obscene material. thus, a library computer-use policy can express the general concept behind the following:
1. intellectual freedom (freedom of speech; free, equal, unrestricted access);
2. the fact that academic libraries provide a variety of information expressing a variety of viewpoints;
3. the fact that this information is not filtered; and
4. the responsibility of parents to be aware of what their children may be viewing on library pcs.

some libraries have provided policy links to various sets of information from the office of intellectual freedom at ala’s web site, such as:
1. ala code of ethics
2. ala bill of rights
3. intellectual freedom principles for academic libraries: an interpretation of the library bill of rights
4. access to electronic information, services, and networks: an interpretation of the library bill of rights

some libraries also provide references to ala information pertaining to the usa patriot act and how law-enforcement inquiries are handled.

summary

computing is a vitally important tool in the academic environment. university and library computing resources receive constant and growing use for research, communication, and synthesizing information. just as computer use has grown, so have the dangers in the networked computing environment. universities often have an overarching policy or policies governing the general use of computing technology that help to safeguard the university equipment, software, and network against inappropriate use. libraries often benefit from having an adjunct policy that works to emphasize the existence and important points of higher-level policies, while also providing a local context for systems and policies pertinent to the library in particular. having computer-use policies at the university and library level helps provide a comprehensive, encompassing guide for the effective and appropriate use of this vital resource.

references

1. the american heritage college dictionary, 3rd edition (boston: houghton, 1997), 1058.
2. board of visitors of the university of virginia, “responsible computing at u.va.: a handbook for students,” accessed june 2, 2004, www.itc.virginia.edu/pubs/docs/respcomp/rchandbook03.html.
3. jason vaughan and brett burnes, “bringing them in and checking them out: laptop use in the modern academic library,” information technology and libraries 21 (2002): 52–62.
appendix a. systemwide, institutional, and library computing policies at unlv

appendix a summarizes, in a matrix, which of six policies addresses each end-user policy element discussed in this article: the scs nevadanet policy*, the uccsn computing resources policy**, the unlv student computer-use policy***, the unlv policy for posting information on the web†, the unlv libraries guidelines for library computer use††, and additional unlv libraries policies†††. its rows follow the groupings used in the article: general elements (references to higher-level policies, author or authority information, and approval or revision dates); network and workstation security; personal hardware and software; printing; e-mail; personal web sites; intellectual property, copyright, and trademark; appropriate- and priority-use guidelines; privacy; abuse violations, investigations, and penalties; other computer- or network-based services affecting the broad student population; and library-specific elements (public versus student use, the right to access government information, assistance for persons with disabilities, laptop and lcd projector checkout privileges, licensed electronic resources terms and conditions, offsite access to licensed electronic resources, electronic reference transactions, statements on information literacy, ala principles on academic freedom and internet filtering, and electronic reserves with related copyright).
notes

* the systems computing services nevadanet policy. among other responsibilities, scs provides and maintains the general internet connectivity for nevada’s higher education institutions, including unlv. the complete document can be accessed at www.scs.nevada.edu/nevadanet/nvpolicies.html.
** the university and community college system of nevada computing resources policy. uccsn is the system of higher education institutions in the state of nevada, governed by an elected board of regents. the complete document can be accessed at www.scs.nevada.edu/about/policy061899.html.
*** the complete document can be accessed at www.unlv.edu/infotech/itcc/scup.html.
† the complete document can be accessed at www.unlv.edu/infotech/itcc/www_policy.html.
†† the primary unlv libraries policy governing student computer use. provided in appendix b, the complete document can also be accessed at www.library.unlv.edu/services/policies/computeruse.html.
††† various other policies are in effect at the unlv libraries. some of these can be accessed at www.library.unlv.edu/services/policies/computeruse.html.

appendix b. unlv university libraries guidelines for library computer use

in pursuit of its goal to provide effective access to information resources in support of the university’s programs of teaching, research, and scholarly and creative production, the university libraries have adopted guidelines governing electronic access and use of licensed software. all those who use the libraries’ public computers must do so in a legal and ethical manner that demonstrates respect for the rights of other users and recognizes the importance of civility and responsibility when using resources in a shared academic environment.

authorized users

to gain authenticated access to the libraries’ computer network, all users of the university libraries public computers must be officially registered as a library borrower, a library computer user, or a guest user. a photo id is required. (exceptions may be made as needed when access to federal depository electronic resources is required.) priority use is granted to unlv students, faculty, and staff. as need arises, access restrictions may be imposed on nonuniversity users. in accordance with licensing and legal restrictions, nonuniversity users are restricted from using word-processing, spreadsheet, and other productivity and high-end multimedia software. during high-demand times, all users may have time restrictions placed on their computer use. if requested by library staff, all users must be prepared to show photo id to confirm their user status.

authorized and unauthorized use

public computers are to be used for academic research purposes only.
electronic information, services, software, and networks provided directly or indirectly by the university libraries shall be accessible, in accordance with licensing or contractual obligations and in accordance with existing unlv and university and community college system of nevada (uccsn) computing services policies (uccsn computing resources policy, www.scs.nevada.edu/about/policy061899.html; unlv faculty computer use policy, www.unlv.edu/infotech/itcc/fcup.html; student computer use policy, http://ccs.unlv.edu/scr/computeruse.asp). users are not permitted to:
1. copy any copyrighted software provided by unlv. it is a criminal offense to copy any software that is protected by copyright, and unlv will treat it as such
2. use licensed software in a manner inconsistent with the licensing arrangement. information on licenses is available through your instructor
3. copy, rename, alter, examine, or delete the files or programs of another person or unlv without permission
4. use a computer with the intent to intimidate, harass, or display hostility toward others (sending offensive messages or prominently displaying material that others might find offensive such as vulgar language, explicit sexual material, or material from hate groups)
5. create, disseminate, or run a self-replicating program (“virus”), whether destructive in nature or not
6. use a computer for business purposes
7. tamper with switch settings, move, reconfigure, or do anything that could damage terminals, computers, printers, or other equipment
8. collect, read, or destroy output other than your own work without the permission of the owner
9. use the computer account of another person with or without their permission unless it is designated for group work
10. use software not provided by unlv
11. access or attempt to access a host computer, either at unlv or through a network, without the owner’s permission, or through use of log-in information belonging to another person

internet and web use

the university libraries cannot control the information available over the internet and are not responsible for its content. the internet contains a wide variety of material, expressing many points of view. not all sources provide information that is accurate, complete, or current, and some may be offensive or disturbing to some viewers. users should properly evaluate internet resources according to their academic and research needs. links to other internet sites should not be construed as an endorsement by the libraries of the content or views contained therein. the university libraries respect the first amendment and support the concept of intellectual freedom. the libraries also endorse ala’s library bill of rights, which supports access to information and opposes censorship, labeling, and restricting access to information. in accordance with this policy, the university libraries do not use filters to restrict access to information on the internet or web. as with other library resources, restriction of a minor’s access to the internet or web is the responsibility of the parent or legal guardian.

printing

users are charged for printing no matter who supplies the paper. mass production of club flyers, newsletters, or posters is strictly prohibited.
if multiple copies are desired, users need to go to an appropriate copying facility such as campus reprographics. contact a staff member when using the color laser printer to avoid costly mistakes. the university libraries reserve the right to restrict user printing based on quantity and content (such as materials related to running an outside business).

copyright alert

many of the resources found on the internet or web are copyright protected. although the internet is a different medium from printed text, ownership and intellectual property rights still exist. check the documents for appropriate statements indicating ownership. most of the electronic software and journal articles available on library servers and computers are also copyrighted. users shall not violate the legal protection provided by copyrights and licenses held by the university libraries or others. users shall not make copies of any licensed or copyrighted computer program found on a library computer.

use of personal laptops and other equipment

students, faculty, and staff of the university are welcome to bring laptops with network cards and use them with our data drops to gain access to our network. the laptop must be registered in our laptop authentication system, and a valid library barcode is also required. users are responsible for notifying the library promptly if their registered laptop is lost or stolen, since they may be held responsible if their laptop is used to access and damage the network. users taking advantage of this service are required to abide by all uccsn and unlv computer policies. the libraries allow the use of the universal serial bus (usb) connections located in the front of the workstations. this includes use with portable usb-based devices such as flash-based memory readers (memory sticks, secure digital) and digital camera connections. the patron assumes all responsibility in attaching personal hardware to library workstations. the libraries are not responsible for any damage done to patron-owned items (hardware, software, or personal data) as a result of connecting such devices to library workstations. as with any use of library workstations, patrons must adhere to all uccsn, unlv, and university libraries' computing and network-use policies. patrons are responsible for the security of their personal hardware, software, and data.

inappropriate behavior

behavior that adversely affects the work of others and interferes with the ability of library staff to provide good service is considered inappropriate. it is expected that users of the libraries' public computers will be sensitive to the perspective of others and responsive to library staff's reasonable requests for changes in behavior and compliance with library and university policies. the university libraries and their staff reserve the right to remove any user(s) from a computer if they are in violation of any part of this policy and may deny further access to library computers and other library resources for repeat offenders. the libraries will pursue infractions or misconduct through the campus disciplinary channels and law enforcement as appropriate.
revised: march 3, 2004. updated: thursday, may 13, 2004. content provider: wendy starkweather, director of public services.

corporate author entry records retrieved by use of derived truncated search keys

alan l. landgraf, kunj b. rastogi, and philip l. long, the ohio college library center

an experiment was conducted to design a corporate author index to a large bibliographic file. the nature of corporate entries necessitates a different search key construction from that of personal names or titles. derivation of a search key to select distinct corporate entry records is discussed.

introduction

this paper describes the findings of an experiment conducted to design a corporate author index to entries in a large file of catalog records at the ohio college library center; a companion paper describes findings of a similar investigation into retrieval employing a personal author index.1 the center has operated an on-line, shared cataloging system since august 1971. in addition to a library of congress card number index, the system maintains truncated name-title and title index files. the user is thus able to retrieve entries employing truncated search keys. three previous papers report results of experiments which led to the design of the name-title and title indexes.2-4 for monographs having personal names as main entries, a truncated 3,3 search key consisting of the first three letters of the author's name plus the first three letters of the first non-english-article word of the title was judged to be satisfactory, in that this key yielded five or fewer entries per query in more than 99 percent of the cases when keys were selected at random.5 however, a recent study by guthrie and slifko reveals that a model which employs random selection of entries yields results closer to actual experience, and with a higher average number of entries per reply.6 a search key composed of the first five or four characters of the surname and the first or first and second initials makes possible efficient retrieval.7 however, the situation is different in the case of corporate entries because many corporate names begin with the same or similar words. for example, in the records examined, the initial words of more than 1,300 publications are "u.s. congress, house committee on . . . ." obviously a type of search key different from that which proved efficient for retrieving personal authors is required for retrieval of corporate entries.

material and methods

the experiment used a file of approximately 200,000 marc ii records having a total of 68,169 corporate name entries. corporate entries were extracted from the 110, 111, 410, 411, 710, 711, 810, and 811 fields in the records. a program edited the file to extract keys: initial english-language articles were removed from each entry; the words "united states," "u.s.," and "u. s." appearing anywhere in the entry were replaced with "us," and "great brit." and "great britain" were replaced with "gt brit." a blank was substituted for each subfield delimiter and associated code, and unwanted characters such as punctuation, diacritics, and special symbols were removed; the program also closed up the space that the unwanted character had occupied. one blank replaced multiple blanks.
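as a rough illustration of the editing pass just described, the sketch below normalizes a corporate name in python. it is a minimal approximation, not the original program: the exact article stop list and substitution table used in 1973 are not given in the paper, so the values here are assumptions.

```python
import re
import unicodedata

# assumed stop list and substitution table; the paper does not give the originals in full
ARTICLES = ("the ", "a ", "an ")
SUBSTITUTIONS = [
    ("united states", "us"),
    ("u. s.", "us"),
    ("u.s.", "us"),
    ("great britain", "gt brit"),
    ("great brit.", "gt brit"),
]

def normalize_corporate_entry(entry: str) -> str:
    """approximate the key-editing pass: drop a leading english article, apply the
    fixed substitutions, strip punctuation and diacritics (closing up the space they
    occupied), and let one blank replace multiple blanks."""
    text = entry.lower()
    for article in ARTICLES:
        if text.startswith(article):
            text = text[len(article):]
            break
    for old, new in SUBSTITUTIONS:
        text = text.replace(old, new)
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))  # drop diacritics
    text = re.sub(r"[^a-z0-9 ]+", "", text)  # remove punctuation and special symbols
    return re.sub(r" +", " ", text).strip()  # collapse runs of blanks

print(normalize_corporate_entry("U.S. Congress. House. Committee on Appropriations"))
# -> "us congress house committee on appropriations"
```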
the elements extracted consisted of five segments of eight characters each, representing the initial eight characters of the first five words of the corporate entry. segments containing fewer than eight characters were padded out with blanks. if a corporate name had fewer than five words, the remaining segments were blank. to study a given type of key, the file was sorted on a specified number of initial characters of each segment; these initial characters were then employed as search keys by a program which sequentially compared the characters in the key, counting distinct and identical keys.

results and discussion

table 1 presents the number of distinct keys and the maximum number of occurrences of identical keys for the structures studied in the experiment.

table 1. number of distinct keys and maximum number of identical entries per key for different key structures in 68,169 marc ii records.

key structure   number of distinct keys   distinct keys as percent of total records   maximum number of entries per key
8,8,8,8,8       22982   33.7   1304
8,8,8,8,0       20476   30.0   1305
8,8,8,0,0       16283   23.9   1802
4,2,2,2,2       22411   32.9   1307
4,2,2,2,1       22120   32.4   1308
4,2,2,2,0       19513   28.6   1311
4,2,2,1,0       18589   27.3   1311
4,2,2,0,0       14801   21.7   1807
3,3,2,2,2       22417   32.9   1307
3,3,2,2,1       22132   32.5   1308
3,3,2,2,0       19560   28.7   1311
3,3,2,1,0       18654   27.4   1311
3,3,2,0,0       14922   21.9   1806
2,2,2,2,2       22053   32.3   1307
2,2,2,2,1       21743   31.9   1308
2,2,2,2,0       19034   27.9   1311
2,2,2,1,0       18036   26.5   1311
2,2,2,0,0       13842   20.3   1807
1,1,1,1,1       19028   27.9   1308

the larger the number of distinct keys for a fixed number of entries in the file, the better the key will be for retrieval purposes. given two search keys which are more or less equally specific, the one which is simpler to use is preferable. the peculiarity of corporate-entry keys can be observed from table 1. even for the 8,8,8,8,8 key structure the percentage of distinct keys (33.7 percent) is low, and the maximum number of occurrences of an identical key (1304) is high. another observation revealed by table 1 is that as the key structure goes from five to three segments, there is a steady decrease in the percentage of distinct keys and consequently an increase in the maximum number of entries per key. however, a reduction in the number of characters in a segment does not cause a great deal of deterioration. for example, for 8,8,8,8,8 keys, the percentage of unique keys and the maximum number of entries per key are respectively 33.7 percent and 1304, while for 2,2,2,2,2 keys, the corresponding figures are 32.3 percent and 1307. thus, the 2,2,2,2,2 key structure seemed a good candidate for a corporate entries index, and therefore the number of entries per reply for this key structure was more intensely studied. on the average it is desirable that the number of replies per query be such that information by which the user can choose among the possible replies can be displayed on a single crt screen. this maximizes the utility of a computer system, since it minimizes the amount of system activity needed to promptly satisfy a user's request. since some query keys produce but one reply while others produce hundreds of candidate records, it is necessary to use the mathematics of probability to determine the likely long-term effect of a given choice of system parameters.
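the segment extraction and the distinct-key tally summarized in table 1 can be sketched as follows. this is an illustrative reconstruction, assuming entries have already been normalized as above; the sample entries and the collision shown are invented for the example.

```python
from collections import Counter

def derive_key(normalized_entry: str, structure=(2, 2, 2, 2, 2)) -> str:
    """build a truncated search key: take the first five words of the normalized
    corporate entry, keep the first eight characters of each (padding short or
    missing words with blanks), then truncate each segment to the lengths in
    `structure` -- e.g. (2, 2, 2, 2, 2) for the 2,2,2,2,2 key."""
    words = normalized_entry.split()[:5]
    segments = [(words[i] if i < len(words) else "")[:8].ljust(8) for i in range(5)]
    return "".join(seg[:n].ljust(n) for seg, n in zip(segments, structure))

def key_statistics(entries, structure=(2, 2, 2, 2, 2)):
    """number of distinct keys and the largest number of entries sharing one key,
    the two quantities reported per key structure in table 1."""
    counts = Counter(derive_key(e, structure) for e in entries)
    return len(counts), max(counts.values())

sample = [
    "us congress house committee on appropriations",
    "us congress house committee on agriculture",
    "american library association",
]
print(key_statistics(sample))  # -> (2, 2): both congressional entries collapse to "uscohocoon"
```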
using the approach indicated as useful by guthrie and slifko, the analysis of the effect of various choices of search key becomes the following. assume that every entry has an equal probability of being accessed. then, in attempting to retrieve each entry once, keys having i number of entries will cause a total of i^2 entries to be accessed. if f_i denotes the frequency of keys having i number of entries and m denotes the maximum allowable occurrences of any key in the file, the average number of entries per reply, \bar{y}, is given by:

\bar{y} = \frac{\sum_{i=1}^{m} i^2 f_i}{\sum_{i=1}^{m} i f_i}

where \sum_{i=1}^{m} i f_i is the number of entries in the file whose derived keys have a frequency of m or less.

the above formula yields the average number of entries per reply for the 2,2,2,2,2 key to be much larger than 20 for m > 100; but some 2,2,2,2,2 keys corresponded to more than 500 file entries. a typical crt display terminal can accommodate only ten or fewer entries per screen. therefore, if the average number of entries per reply is desired to be ten or fewer, it is necessary either to ignore entries with high multiplicity or to adopt a different scheme of storing and retrieving such items, in which case the mathematical result would be the same as ignoring high-frequency items. the average number of entries per reply was computed for five different values of m (19, 29, 39, 49, and 59); the results of these computations are in table 2, which reveals that if keys in the file are allowed a maximum recurrence of 39 entries per key, it would be possible to have keys in the main index for about 75 percent of total records, while entries for only 142 high-frequency keys would have to be shunted to a secondary index. in this case, the average number of entries per reply would be about eight.

table 2. average number of entries per reply for key structure 2,2,2,2,2 for various multiplicities of entries.

maximum frequency of any key in file   total records in file   percent of total records   number of distinct keys eliminated   average number of entries per reply
19   44174   64.8   389   5.0
29   48127   70.6   223   6.6
39   50854   74.6   142   8.1
49   52422   76.9   107   9.1
59   53513   78.5   87    10.1

table 3 gives the probability of number of entries per reply for the index file consisting of 50,854 (out of a total of 68,169) records with the maximum frequency of any key in the file being 39. for preparing this table the assumption is made that each entry in the file has an equal probability of being accessed. thus the probability of obtaining i entries per reply is given by:

p(i) = \frac{i f_i}{\sum_{j=1}^{m} j f_j}

where f_i is the frequency of keys occurring exactly i number of times in the index file.

table 3. probability of number of entries per reply for an index file using the 2,2,2,2,2 key.

number of entries   frequency   probability (percent)   cumulative probability (percent)
1    14820   29.1   29.1
2    2893    11.4   40.5
3    1276    7.5    48.0
4    726     5.7    53.7
5    427     4.2    57.9
6    312     3.7    61.6
7    248     3.4    65.0
8    195     3.1    68.1
9    150     2.6    70.7
10   120     2.4    73.1
11   78      1.7    74.8
12   88      2.1    76.9
13   56      1.4    78.3
14   71      1.9    80.2
15   62      1.9    82.1
16   48      1.5    83.6
17   41      1.3    84.9
18   28      1.0    85.9
19   24      0.9    86.8
20   22      0.9    87.7
21   18      0.7    88.4
22   16      0.7    89.1
23   23      1.1    90.2
24   25      1.1    91.3
25   13      0.7    92.0
26   9       0.4    92.4
27   12      0.7    93.1
28   18      1.0    94.1
29   10      0.5    94.6
30   11      0.7    95.3
31   11      0.7    96.0
32   13      0.8    96.8
33   6       0.4    97.2
34   9       0.6    97.8
35   7       0.4    98.2
36   6       0.5    98.7
37   11      0.8    99.5
38   5       0.3    99.8
39   2       0.2    100.0

an inspection of this table shows that in 87.7 percent of the time there would be 20 or fewer replies. this represents two screensful of information on a typical crt display.
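a small sketch of the two formulas above, assuming a frequency distribution of the kind behind tables 2 and 3; the sample distribution at the bottom is invented for illustration and is not the 1973 data.

```python
def average_entries_per_reply(freq, m):
    """freq[i] = number of distinct keys occurring exactly i times in the file.
    returns (y-bar, entries kept in the main index) using
    y-bar = sum(i^2 * f_i) / sum(i * f_i) over i = 1..m."""
    kept = sum(i * f for i, f in freq.items() if i <= m)          # entries whose keys occur <= m times
    weighted = sum(i * i * f for i, f in freq.items() if i <= m)  # entries touched when each is retrieved once
    return weighted / kept, kept

def reply_size_distribution(freq, m):
    """p(i) = i * f_i / sum(j * f_j): the chance a random lookup returns i entries."""
    kept = sum(j * f for j, f in freq.items() if j <= m)
    return {i: i * f / kept for i, f in freq.items() if i <= m}

# invented example: 100 keys occur once, 20 keys occur 5 times, 2 keys occur 50 times
freq = {1: 100, 5: 20, 50: 2}
print(average_entries_per_reply(freq, m=39))  # -> (3.0, 200); the two 50-entry keys are shunted aside
print(reply_size_distribution(freq, m=39))    # -> {1: 0.5, 5: 0.5}
```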
conclusion

a file containing only those entries for which the frequencies of 2,2,2,2,2 search keys is 39 or fewer would produce 20 or fewer entries per reply approximately 88 percent of the time, but such a file excludes 142 high-frequency keys for 17,315 of a total of 68,169 entries. therefore, a special technique for handling corporate-entry derived keys of high multiplicity is desirable.

references

1. a. l. landgraf and f. g. kilgour, "catalog records retrieved by personal author using derived search keys," journal of library automation 6:103-8 (june 1973).
2. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7:79-82 (1970).
3. f. g. kilgour, p. l. long, e. b. leiderman, and a. l. landgraf, "title-only entries retrieved by the use of truncated search keys," journal of library automation 4:207-10 (dec. 1971).
4. p. l. long and f. g. kilgour, "a truncated search key title index," journal of library automation 5:17-20 (march 1972).
5. kilgour, long, leiderman, "retrieval of bibliographic entries."
6. g. d. guthrie and s. d. slifko, "analysis of search key retrieval on a large bibliographic file," journal of library automation 5:96-100 (june 1972).
7. landgraf and kilgour, "catalog records retrieved."

static vs. dynamic tutorials: applying usability principles to evaluate online point-of-need instruction

benjamin turner, caroline fuchs, and anthony todman

abstract

this study had a two-fold purpose. one is to discover through the implementation of usability testing which mode of tutorial was more effective: screencasts containing audio/video directions (dynamic) or text-and-image tutorials (static). the other is to determine if online point-of-need tutorials were effective in helping undergraduate students use library resources. to this end, the authors conducted two rounds of usability tests consisting of three groups each, in which participants were asked to complete a database-searching task after viewing a text-and-image tutorial, an audio/video tutorial, or no tutorial. the authors found that web usability testing was a useful tutorial-testing tool, and that participants learned most effectively from text-and-image tutorials: in both rounds, those participants completed tasks more accurately and more quickly than those who received audio/video instruction or no instruction.

introduction

the provision of library instruction online has become increasingly important, given that more than one third of higher education students now take at least some of their courses online and that the number of students enrolling in online courses continues to increase more rapidly than the number of students in higher education as a whole.1 academic library websites reflect the growth of online education. by 1998, online versions of journals had become ubiquitous.2 in contrast, electronic books have been slower to be adopted in academic libraries, but there has been a steady and significant growth of their use in recent years.
between 2010 and 2011, for example, the average number of electronic books available at academic libraries in the united states increased by 93 percent.3

benjamin turner (turnerb@stjohns.edu) is associate professor and instructional librarian, caroline fuchs (fuchsc@stjohns.edu) is associate professor and outreach librarian, and anthony todman (todmana@stjohns.edu) is associate professor and reference and government documents librarian, st. john's university libraries, new york, new york.

with the increasing availability of library content online, many users bypass the "brick and mortar" library and go directly to its website.4 remote access to library collections has advantages in terms of convenience, which further underscores the importance of making library websites as intuitive as possible while offering quality instruction at point-of-need. a recent survey of 264 academic library websites found that 64 percent offered some form of online tutorials.5 the relative effectiveness of different types of tutorials in providing online, point-of-need library instruction is therefore an important consideration for library professionals. this study had a two-fold purpose. one is to discover through the implementation of usability testing which mode of tutorial was more effective: screencasts containing visual and audio directions (dynamic) or text-and-image tutorials (static). the other is to determine if online point-of-need tutorials were effective in helping undergraduate students use library resources. for the purpose of this study, researchers were less interested in the long-term effects of these tutorials on student research but rather focused on point-of-need instruction for database use.

st. john's university

st. john's university is a private, coeducational roman catholic university, founded in 1870 by the vincentian community. the university has three residential campuses within new york city and an academic center in oakdale, new york, as well as international campuses in rome, italy, and paris, france. the university comprises six schools and colleges: st. john's college of liberal arts and sciences; the school of education; the peter j. tobin college of business; the college of pharmacy and health sciences; the college of professional studies; and the school of law. there is a strong focus on online learning. special academic programs include accelerated three-year bachelor's degrees, five-year bachelor's/master's degrees in the graduate schools, a six-year bachelor's/jd from the school of law, and a six-year pharmd program. in fall 2013, total student enrollment was 20,729, with 15,773 registered undergraduates and 1,364 international students. during the 2012–13 academic year, 97 percent of undergraduate students received financial aid in the form of scholarships, loans, grants, and college work/study initiatives. the student body was 56 percent female and 44 percent male, representing 47 states and 116 countries. the diversity of the student population is reflected in the fact that 47 percent identified themselves as black, hispanic, asian, native hawaiian/pacific islander, american indian, alaska native, or multiracial. st. john's university has a library presence at four campuses: queens, staten island, manhattan, and rome, italy.
in addition to traditional or in-person interaction, both online and distance learning are integral parts of the library-tutorial and instruction environment. undergraduate students receive a laptop computer at no cost, and the entire campus is wireless accessible. full-time faculty members receive laptop computers as well. the university libraries provide 24/7 access to electronic resources, both on and off campus. the libraries' portal is located at http://www.stjohns.edu/libraries. an online catalog can be found at http://stjohns.waldo.kohalibrary.com. wireless computing and printing are available at the four campus library sites as well as in other areas across campus. library reference and research assistance services are delivered in person or electronically. library reserve services are accessible in either print or electronic formats. interlibrary loan has both domestic and international borrowing and lending via the illiad software platform. when the main queens campus library is not open for service, a 24/7 quiet study area is available for current students within the library space. library instructional services take place in formal classes that are requested by faculty, as well as library faculty-initiated workshops held in either the libraries' computerized classrooms or at other on-campus locations. there is no mandated information literacy session. during june 2012–may 2013, 333 instruction classes were offered to 4,435 students.

literature review

the library literature on online library tutorials might be divided into subcategories: early development of online instructional tutorials, library website usability testing, evaluation of online information-literacy instruction tutorials, best practices for the creation of library tutorials, and the best mediums for the creation of library tutorials.

early development of online instruction tutorials

the need to evaluate and assess the usefulness of online instructional tutorials is not new. although not explicitly related to today's environment, tobin and kesselman's work contains an early history detailing the design of internet-based information pages and their use in the library information environment.6 they also included the early guidelines of the association of college and research libraries (acrl), the international federation of library associations (ifla), and the american library association (ala). a study by dewald conducted around the same time evaluated twenty library tutorials according to the current best practices in library instruction, and concluded that "online tutorials cannot completely substitute for the human connection in learning"7 and should be designed specifically to support students' academic work. further, it was noted that tutorials should teach concepts, rather than mechanics, and incorporate active learning where possible.8 in a separate article, dewald argued that the web made possible new, creative ways of teaching library skills, through features such as linked tables of contents and the provision of immediate feedback through cgi scripts. users also were able to open secondary windows to practice the skills they learned as they moved through tutorials.
she further concluded that effective instructional content should not be text heavy, but rather include images and interactive features.9 another early study of online tutorials discussed the development of a self-paced web tutorial at seneca college in toronto, called “library research success,” which was designed to teach subject-specific and general research skills to first-year business majors. the creation of the tutorial was first requested by seneca college’s school of business management, which collaborated with seneca college library, the school’s centre for new technology, and centre for professional development in completing the project. the tutorial was a success, with overwhelmingly positive feedback from students and faculty members.10 despite such successful examples, a common concern expressed in early studies was that online tutorials would not be as effective as face-to-face instruction. one article compared and evaluated library skills instruction methods for first-year students at deakin university.11 another tracked the difference between cai (computer assisted instruction) without a personal librarian interaction and a more traditional library instruction incorporated into an english classroom setting, and which concluded that while useful, cai was not a good substitute for face-to-face instruction.12 library website usability testing as concern grew at the onset of the twenty-first century for the need to evaluate online library tutorials, articles on library website usability testing began to appear more frequently. in one study, the authors noted that they would not have identified problems with their website had they not done usability testing: “testers’ observations and the comments of the students participating in the test were invaluable in revealing where and why the site failed and helped evaluators to identify and prioritize the gross usability problems to be addressed.”13 librarians aiming to examine their patrons’ ability to independently navigate their library’s webpage to fulfill key research needs, conducted similar studies. at western michigan university (wmu), librarians investigated how researchers navigated the wmu library website in order to find three things: the title of a magazine article on affirmative action, the title of a journal article on endangered species, and a recent information technology and libraries | december 2015 34 newspaper about the senate race in new york state. they successfully used the data gathered to identify problems with their website and to establish goals and priorities in clarifying language and navigation on their site.14 more recently, researchers conducted a usability study with the aim of showing how librarians could build websites to better compete with nonlibrary search sites such as google, which would allow greater personalization by the individual user and more seamless integration into learning management systems.15 other researchers have studied the readability of content on academic library websites. in one such study, lim used a combination of readability formulas and focus groups to evaluate twenty-one academic library websites that serve significant numbers of academically underprepared students and/or students who spoke english as a second language. 
they concluded that the majority of information literacy content on library pages had poor readability, and that the lack of welldesigned and well-written information literacy content could undermine its effectiveness in serving users.16 kruger, ray, and knight employed a usability study to evaluate student knowledge of their library’s web resources. the study produced mixed results, with most students able to navigate to the library’s website and the opac, but large numbers unable to perform basic research tasks such as finding a journal article. the authors noted that such information would allow them to modify library instruction accordingly.17 another study focused on the use of language as it relates to awareness of relevant databases. at bowling green university library, staff members attempted to learn more about how users find and select databases through the library website’s electronic resources management system (erm). because of their study, the authors recommended that librarians should focus on promoting brand awareness of relevant databases among students in their subject disciplines by providing better database descriptions on the library webpages and by collaborating with subject faculty members.18 evaluation of online information-literacy instruction tutorials librarians at wayne state university conducted an assessment of their revamped information literacy tutorial, known as “re:search.”19 they distributed a multiplechoice knowledge questionnaire to seventy-two students participating in their 2010 wayne state federal trio student support service summer residential program, which was based on donald kirkpatrick’s evaluating training programs: the four levels.20 they concluded that their study highlighted some flaws in their tutorials, including navigational problems. as a result, they would consider partnering with wsu faculty in the future to develop better modules. one curious comment by the authors in their introduction warrants further discussion about assumptions made static vs. dynamic tutorials | turner, fuchs, and todman | doi: 10.6017/ital.v34i4.5831 35 by librarians regarding student research skills: “the internet has bolstered student confidence levels in their research abilities, increasing the demand for point-of-need instruction. students are accustomed to online learning, not only because of the shift in higher education to online coursework, but also because they have been leaning online through youtube, social networking, and other websites.”21 at purdue university, librarians evaluated the success of their seven-module online tutorial through the distribution of a post-test survey. these researchers found that the feedback received was essential for planning future versions of online instruction at their institution.22 a report from zayed university (united arab emirates) outlined an evaluation of infoasis, the university’s online information literacy tutorial, testing 4,000 female students with limited library proficiency and remedial english aptitudes.23 best practices for the creation of library tutorials other researchers developed guidelines and best practices for future planning and implementation. bowles-terry, hensley, and hinchliffe at the university of illinois conducted interviews to investigate the usability, findability, and instruction effectiveness of online video tutorials. although shorter than three minutes, students found the tutorials to be too lengthy, and would have preferred the option to skip ahead to pertinent sections. 
other participants found the tutorials too slow, while some preferred to read rather than watch and listen. on the basis of their study, the authors recommended a set of best practices for creating library video tutorials, including pace, length, content, look and feel, video versus text, findability, and interest in using video tutorials.24 at regis university library, librarians created online interactive animated tutorials and incorporated google analytics for use statistics and tutorial assessment, from which they developed a list of tips and suggestions for tutorial development. these included suggestions regarding the technical aspects such as screen resolution and accessibility. of some significance is that the data from the analytics suggest that the tutorials are being used both within and without the university. most useful here is the “best practices for creating and managing animated tutorials” found in the article’s appendix.25 best mediums for the creation of library tutorials other authors have explored the need to accommodate different learning styles in library tutorials rather than relying too heavily on text to convey information.26 at the university of leeds in the united kingdom, an information literacy tutorial was planned and created to support online distance learners in the geography postgraduate program. using an articulate presenter, the authors created a tutorial that information technology and libraries | december 2015 36 covered the same material that would be taught in a face-to-face session, and which incorporated visual, auditory, and textual elements. these researchers concluded that the online tutorial is supplemental and did not alleviate the need for face-toface instruction.27 to reach different types of learners, many librarians have begun to use adobe flash (formerly macromedia flash) to create multimodal online information literacy tutorials. authors who use flash note that learning how to use the software correctly represents a significant investment in time and effort.28 another study, conducted via a suny albany web design class, focused on the effect/outcome of teaching with web-based tutorials in addition to or instead of face-to-face interaction. the authors of this study pointed out that self-paced instruction, lab time, office, hours, and email exchange were all factors that are affecting web-based multimedia (wbmm) flash that were incorporated into instruction.29 rather than focusing purely on the content of online library instruction tutorials, some studies considered and evaluated the various tutorial-creating software tools. blevins and elton conducted a case study at the william e. laupus health sciences library at east caroline university, which set out “to determine the best practices for creating and delivering online database instruction tutorials for optimal information accessibility.”30 they produced “identical” tutorials using microsoft’s powerpoint, sonic foundry’s mediasite, and techsmith’s camtasia software. they chose to include powerpoint because “previous research has shown that online students prefer powerpoint presentations to video lectures.”31 their testing results indicated that participants found specific tutorial features to be most effective: video (33.3 percent), mouse movements (57.1 percent), instructor presence (28.6 percent), audio instruction only (28.6 percent), and interaction (28.6 percent). 
they concluded that camtasia tutorials provided optimal results for short sessions such as database instruction, and that mediasite was more appropriate for instruction requiring video and audio of the instructor plus screenshots. however, they also determined that powerpoint tutorials were an acceptable solution if cost were an important factor.32 in a separate study at florida atlantic university, researchers described the process of designing and creating library tutorials using the screencasting software camtasia. in addition to the creation of the tutorials themselves, the authors described how the project entailed the development of policies and guidelines for the creation of library tutorials, as well as training of librarians in using camtasia software.33 this study provides another good example of the time investment involved in the creation of multimedia tutorials. while the professional literature thus shows that flash-based tutorial software is popular among librarians, and the desire to accommodate students with different learning styles is a laudable goal, at least one study suggests that the time and money involved in the creation of multimedia tutorials could be better spent in other ways. a university of illinois urbana-champaign study found that students from different learning styles performed better after using tutorials made with a combination of text and screenshots than from tutorials created with camtasia software.34

method

usability testing for the evaluation of tutorials

to compare dynamic audio/video tutorials with text-and-image tutorials, the researchers employed usability testing, which is "watching people use something that you have created, with the intention of making it easier to use, or proving that it is easy to use."35 usability testing requires relatively small numbers of participants to provide meaningful results, and it does not require selection from a representative sample population.36

participants

table 1. breakdown of participants

group                           number of participants
control group 1                 5
text-and-image group 1          5
dynamic audio/video group 1     5
group 1 total                   15
control group 2                 5
text-and-image group 2          5
dynamic audio/video group 2     5
group 2 total                   15
total participants              30

thirty freshmen at st. john's university participated in this study. while usability-testing experts do not place a great deal of importance on recruiting participants from a specific target audience, the researchers wanted to choose users who were less likely to have had significant experience with university library database searching, since prior knowledge could make it harder to determine the effectiveness of the tutorials. they therefore chose freshmen as the participants in the study. they did not seek any other variables such as age, gender, ethnicity/culture, or any other demographic information. participants were recruited through the st. john's central portal, which is the main channel of internal communication at st. john's university, and through which mass emails can be sent to a targeted population of students. the email to students provided a registration link to a google form, which asked students to provide their name, year of study, time availability preference, and contact information. freshmen were selected from the response list.
as an incentive for participation, the student participants became eligible for a kindle fire tablet for each of the two rounds of the study. prior to beginning the study, the authors consulted st. john’s university’s office of institutional research, which oversees all research at the university, and provides approval for the study of human subjects. since this study focused on tutorials rather than the participants themselves, the authors were granted a waiver for the study. tests usability testing typically involves having participants complete a task or tasks in front of an observer. for this study, the authors designed two tasks that required participants to find articles in academic search premier ebsco database (asp ebsco). the first task, given to all participants in the first round of tests, was relatively simple, and consisted of three components: finding an article about climate change published in the journal lancet and downloading a copy of the citation for that article in mla format from the database. participants who attempted the first task were labeled “group 1” (see appendix i). the second task was given to all participants in the second round of tests and was more complex, comprising five components. participants were asked to find an article about the deepwater horizon spill from a peer-reviewed journal published after 2011 that included color photographs. as with the first task, these participants were also required to download a copy of the citation for the article in mla format from the database. participants who attempted the second task were labeled “group 2” (see appendix ii). group 1 and group 2 were divided into three subgroups each. the first subgroup was the control group and received no instruction. the second subgroup was given access to the dynamic audio/visual tutorial (see appendix iii). the third subgroup was given access to the static text-and-image tutorial instruction (see appendixes iv and v). each subgroup consisted of five unique participants. each participant was scheduled for a specific fifteen-minute time slot. tests were conducted in a small meeting room in the library, with one participant at a time working with the facilitator. as the participants entered the meeting room, the static vs. dynamic tutorials | turner, fuchs, and todman | doi: 10.6017/ital.v34i4.5831 39 facilitator greeted them and confirmed their identities. participants were provided with an information sheet (see appendix vi), which told participants that the session would be recorded, that the researchers were concerned with testing the libraryinstruction tutorials, not the participants themselves, and that the tests were confidential and anonymous. participants were also told that they could end the test at any time for any reason. additionally, the facilitator read aloud the points-ofinformation sheet. participants were invited to ask questions or voice concerns. for both rounds of tests, participants had use of a laptop computer with a browser window open to the asp ebsco home page. for those who received instruction, a second browser window was open to either the dynamic or the static tutorial. for members of the control group, no tutorial was available. those who received instruction were allowed to return to the tutorial at any point they wished. 
using adobe connect software, the testing activities, tutorials, participants’ attempt(s) at task(s), participants’ computer screen, and any conversation between the participants and the facilitators were simultaneously recorded and broadcast to a separate room, where the two other researchers observed, listened and took notes. the participants were asked to verbally describe the steps they were taking, as per the “think aloud” protocol that is essential to usability testing. recorded sessions were then available for later review by the research team. on completing the task, participants who received either the text-and-image or dynamic audio/video tutorial were asked to complete a short questionnaire giving feedback on the instruction received (see appendix vii). participants who received no instruction were not asked to provide feedback. tutorials the researchers created four tutorials for this study. two were flash-based dynamic audio/video tutorials created using techsmith’s jing software. the static text-and-image tutorials were created using microsoft word, which was then converted into a pdf document. the dynamic and static tutorials mirrored each other in terms of content, and were designed with the specific goal of helping participants complete the tasks successfully, though in both cases there was some variation between the tutorials and the tasks. the tutorials received by group 1, for instance, showed participants how to find articles about the occupy wall street movement, limiting the search to “published in the new york times,” and how to download the citation in mla format. the tutorials for group 2 showed participants how to find articles about climate change that included color photographs, limiting the search to peer-reviewed journals that were published after 2011. discussion information technology and libraries | december 2015 40 the results of the usability study revealed two things: participants benefited from library instruction, through which they evidently acquired new skills; and participants benefited more from static text-and-image tutorials than from the dynamic audio/video tutorials. in both rounds of tests, the participants who received the text-and-image tutorials performed the tasks more effectively than did members of the control group or those who viewed the dynamic tutorials. group 1 for the first round of tests, members of the control group spent longer on the task and made more mistakes than those who received either the dynamic or the static tutorial (see table 2). for example, one participant in the control group was unable to download the mla citation, and another in the control group ventured outside the asp ebsco database platform to find the correct citation format. when members of the control group did succeed, they did so without a clear search strategy, evidenced by their use of natural language instead of boolean connectors. (asp ebsco uses boolean connectors by default, and natural language is usually ineffective.) another participant reached several dead-ends in the search before finally succeeding. while most of the control group participants were at least partially successful in completing the task, it is reasonable to suspect that they would have given up in frustration in a non-test situation, and would have benefited from point-of-need instruction. 
table 2. task completion success and time, control group 1

                         control 1   control 2   control 3   control 4   control 5
relevant article         y           y           y           y           y
lancet                   y           y           y           y           y
mla citation             y           n           y           y           y
time on task (minutes)   8:28        2:49        6:30        2:41        1:42
average time on task: 4:26 mins.

the participants who received the static text-and-image tutorial performed the best, completing the task with the highest speed and with the greatest accuracy (see table 3). all five of the participants in this group managed to find appropriate articles and to download the citation in mla format, though several had difficulty with the final task. all were able to navigate to the "cite" feature effectively, but all participants chose to click on the "mla" link rather than simply copy the citation. clearer directions in the tutorial might alleviate this problem.

table 3. task completion success and time, text and image tutorial, group 1

                         t&i 1   t&i 2   t&i 3   t&i 4   t&i 5
relevant article         y       y       y       y       y
lancet                   y       y       y       y       y
mla citation             y       n       y       y       y
time on task (minutes)   2:01    3:00    2:21    2:40    3:15
average time on task: 2:39 mins.

participants who received the dynamic video tutorial were more successful than those in the control group, but spent significantly longer on task than did those who received the static tutorial (see table 4). interestingly, two of the participants searched for "climate change" as the "subject term" in asp ebsco, even though the tutorial did not instruct them to do so. (su subject term is one of the options in the drop-down menu in asp ebsco, which otherwise searches citation and abstract by default.) while "climate change" is a commonly accepted scientific term, and the searches produced relevant search results, it is not generally advisable to begin a search with controlled vocabulary terms.

table 4. task completion success and time, dynamic a/v tutorial, group 1

                         video 1   video 2   video 3   video 4   video 5
relevant article         y         y         y         y         y
lancet                   y         y         y         y         y
mla citation             y         y         y         y         y
time on task (minutes)   4:34      3:17      3:17      3:07      3:28
average time on task: 3:32 mins.

figure 1. average time on task in minutes, group 1

figure 2. successful task completion: group 1

group 2

the advantages of text-and-image instruction were more pronounced in the second round of tests, which involved a more complex task (see figure 3). as in the first round of tests, the participants in the control group had the lowest number of satisfactory task completions, and spent the greatest amount of time on task. although most of the participants in control group 2 had at least partial success in completing the task, most did so through trial and error, and showed a general lack of understanding of database terminology and functions. one participant, for example, attempted to use "peer-review" and "color photographs" as search terms. another attempted to search for "deepwater horizon" as a journal title. only two of the participants completed all components of the task successfully. two others partially completed the task: one found a suitable article with color photographs, but published in the nation, which is not peer-reviewed. one user failed to complete any part of the task and gave up in frustration (see table 5).
table 5. task completion success and time, control group 2

                         control 1   control 2   control 3   control 4   control 5
relevant article         y           n           y           y           y
peer-reviewed            y           n           y           n           y
publication date         y           n           n           y           y
color photos             y           n           y           y           y
mla citation             y           n           y           n           y
time on task (minutes)   1:51        7:39        2:54        9:16        7:55
average time on task: 5:55 mins.

in contrast, participants who received the text-and-image tutorial enjoyed the most success in round 2. three of the five participants who received the static tutorial completed all components of the task successfully. errors committed by the two others were related to publication date. participants in this group also completed the task more rapidly than those from the other two groups.

table 6. task completion success and time, text and image tutorial, group 2

                         t&i 1   t&i 2   t&i 3   t&i 4   t&i 5
relevant article         y       y       y       y       y
peer-reviewed            y       y       y       y       y
publication date         y       y       n       n       y
color photos             y       y       y       y       y
mla citation             y       y       y       n       y
time on task (minutes)   6:33    2:46    3:00    4:50    3:24
average time on task: 4:06 mins.

as in group 1, however, all but one of the participants who received the text-and-image tutorial first attempted to download the mla citation by clicking on the "mla" link, rather than simply copying the text. two of the participants referred back to the tutorials after they had begun the task, which was permissible according to the facilitator's instructions. this suggests that the text-and-image tutorials are suitable for quick reference and allow users to access needed information at a glance.

table 7. task completion success and time, a/v tutorial, group 2

                         video 1   video 2   video 3   video 4   video 5
relevant article         y         y         y         y         y
peer-reviewed            y         y         y         y         n
publication date         n         n         n         y         n
color photos             y         y         n         y         y
mla citation             y         y         n         y         y
time on task (minutes)   4:13      5:39      6:33      3:59      4:40
average time on task: 4:57 mins.

among the five participants who received the dynamic audio/visual tutorial, only one completed all five components of the test successfully. one was unable to locate the citation feature, while another failed to limit to peer-reviewed articles. four of the participants limited the publication date from 2011 to the present instead of 2012 to the present. all participants correctly used the publication limiter. although given the option, none chose to return to the dynamic tutorial after starting the task. this might be because of the length of the tutorial (more than three minutes) and the difficulty in navigating to specific sections.

as noted above, participants in all groups tended to make errors related to publication date, which may have stemmed from the wording of the task itself rather than misunderstanding the functionality of the database. the task required participants to find articles published after 2011, but many found articles published from 2011 onward. clearer wording of the task probably would have alleviated this problem.

figure 3. average time on task in minutes, group 2

figure 4. successful task completion, group 2
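a small illustrative sketch (not part of the original study) of how the per-group summaries reported above, average time on task and the share of participants completing every component, can be recomputed from raw results; the sample values are transcribed from table 5.

```python
def mmss_to_seconds(t: str) -> int:
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)

def summarize(times, successes):
    """average time on task (formatted m:ss) and the percentage of participants
    who completed every task component."""
    avg = sum(mmss_to_seconds(t) for t in times) / len(times)
    full_completions = 100 * sum(all(s) for s in successes) / len(successes)
    return f"{int(avg // 60)}:{int(avg % 60):02d}", full_completions

# control group 2, transcribed from table 5 (y = True, n = False); components in the
# order relevant article, peer-reviewed, publication date, color photos, mla citation
times = ["1:51", "7:39", "2:54", "9:16", "7:55"]
successes = [
    (True, True, True, True, True),
    (False, False, False, False, False),
    (True, True, False, True, True),
    (True, False, True, True, False),
    (True, True, True, True, True),
]
print(summarize(times, successes))  # -> ('5:55', 40.0)
```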
tutorial feedback

after completing the task, participants were asked to provide anonymous, written feedback on the instruction they received. (members of the control groups were not asked to provide feedback because the purpose of the study was to compare different types of library tutorials.) participants were asked ten questions, eight of which were on a likert scale and two of which were open-ended. although the feedback for both the static and dynamic tutorials was generally positive, the text-and-image tutorials received higher combined scores than the audio/visual tutorials on the likert-scale questions (see figures 5 and 6). participants' written feedback on the text-and-image tutorials was generally more positive than for the video tutorials. commenting on the text-and-image tutorial, one participant remarked that it was a "great resource," while another said that it was "very easy to use. will become really helpful when put into full effect." another observed that the tutorial "was pretty precise." not all the comments on the text-and-image tutorials were positive, however. more than one participant noted that the images used in the tutorials were blurry. one even suggested that "more animations to the text would make it much more open to people with different learning styles." the feedback on the video tutorials was generally positive, with comments such as "very straightforward," "helpful," "easy to follow," and "i would use this for school assignments." however, a common complaint about the dynamic tutorials was that the audio was not very clear. (this may be because of the quality of the microphone used for the recordings.) other participants seemed to criticize the layout of the database itself, saying that a bigger word size would have made it easier to follow. another complained that the dynamic tutorial was too simple, and that it should cover more advanced and in-depth topics.

figure 5. tutorial feedback likert score averages, group 1

figure 6. tutorial feedback likert score averages, group 2

conclusion

this study suggests that library users benefit from online library instruction at point-of-need, and that text-and-image tutorials are more effective than dynamic audio/visual tutorials for its provision. librarians should not assume that instructional tutorials must use flash or other video technology, especially given the learning curve, time, and financial commitments involved in creating video tutorial software. although the researchers in this study used the free software jing, learning to use it effectively was still a significant investment in time. more importantly, it is evident that the participants learned more and were more satisfied with text-and-image tutorials, which were more easily navigated and which allowed users to review tutorial content more easily than did the dynamic audio/video tutorials. this study corroborates the findings of mestre, who found that text-and-image tutorials were more effective than audio/video tutorials in teaching library skills.37 it also lends credence to the work of bowles-terry, hensley, and hinchliffe, who found that users preferred tutorials that allowed them to read quickly and navigate to pertinent sections rather than watch and listen.38 as lim suggests, it is important to create instructional material that is clearly written.39 this study further suggests that regardless of the technology used, librarians should focus on creating content that is relevant and helpful to our user population.
again, it is worth noting that the control group, without the aid of point-of-need instructional materials, achieved some success in completing the tasks. it is possible that the members of the control group gained important knowledge simply by being told about asp ebsco and that there was enough implied information in the tasks themselves to provide basic information about the content and functionalities of the database. this suggests that databases like asp ebsco are intuitive enough that people can learn how to use them independently. the higher number of serious errors, and the greater length of time members of the control group spent on tasks, however, shows that efforts to raise student awareness of databases and library resources should be coupled with point-of-need instruction. although the usability tests generally went smoothly, researchers did encounter occasional difficulties with the audio between the testing room and the observation room, when it became difficult to hear what the participant was saying as he or she was completing the task. fortunately, the researchers kept recordings of each test, which allowed them to review those where the audio quality was less than optimal. to save time and run the tests more efficiently, however, the researchers recommend purchasing a high-quality microphone like those used for teleconferences. furthermore, this study shows the broader value of usability testing of library instructional material. although participants who received the text-and-image tutorials performed better than either of the other two groups, the tests helped researchers identify two problems with the tutorials: users found the images blurry and often misinterpreted how to download citations in mla format. such information gleaned from the user's perspective would be valuable in creating future library online point-of-need instructional tutorials.

references

1. i. elaine allen and jeff seaman, "grade change: tracking online education in the united states, 2013," sloanconsortium.org, 2013, sloanconsortium.org/publications/survey/grade-change-2013.
2. m. walter, "as online journals advance, new challenges emerge," seybold report on internet publishing 3, no. 1 (1998).
3. rebecca miller, "dramatic growth," library journal 136, no. 17 (october 15, 2011): 32, www.thedigitalshift.com/2011/10/ebooks/dramatic-growth-ljs-second-annual-ebook-survey.
4. megan von isenburg, "undergraduate student use of the physical and virtual library varies according to academic discipline," evidence based library & information practice 5, no. 1 (april 2010): 130.
5. sharon q. yang and min chou, "promoting and teaching information literacy on the internet: surveying the web sites of 264 academic libraries in north america," journal of web librarianship 8, no. 1 (2014): 88–104, doi: 10.1080/19322909.2014.855586.
6. tess tobin and martin kesselman, "evaluation of web-based library instruction programs," www.eric.ed.gov/ericwebportal/contentdelivery/servlet/ericservlet?accno=ed441454.
7. nancy h. dewald, "transporting good library instruction practices into the web environment: an analysis of online tutorials," journal of academic librarianship 25, no. 1 (january 1999): 26–31.
8. ibid. 9. nancy h. dewald, “web-based library instruction: what is good pedagogy?,” information technology & libraries 18, no. 1 (march 1999): 26–31. 10. kelly a. donaldson, “library research success: designing an online tutorial to teach information literacy skills to first-year students,” internet & higher education 2, no. 4 (january 2, 1999): 237–51, doi: 10.1016/s1096-7516(00)00025-7. 11. marion churkovich and christine oughtred, “can an online tutorial pass the test for library instruction? an evaluation and comparison of library skills instruction methods for first year students at deakin university,” australian academic & research libraries 33, no. 1 (march 2002): 25–38. 12. stephanie michel, “what do they really think? assessing student and faculty perspectives of a web-based tutorial to library research,” college & research libraries 62, no. 4 (july 2001): 317–32. 13. brenda battleson, austin booth, and jane weintrop, “usability testing of an academic library web site: a case study,” journal of academic librarianship 27, no. 3 (may 2001): 194. 14. barbara j. cockrell and elaine anderson jayne, “how do i find an article? insights from a web usability study,” journal of academic librarianship 28, no. 3 (may 2002): 122–32, doi: 10.1016/s0099-1333(02)00279-3. 15. brian detlor and vivian lewis, “academic library web sites: current practice and future directions,” journal of academic librarianship 32, no. 3 (may 2006): 251–58, doi: 10.1016/j.acalib.2006.02.007. 16. adriene lim, “the readability of information literacy content on academic library web sites,” journal of academic librarianship 36, no. 4 (july 2010): 296–303, doi: 10.1016/j.acalib.2010.05.003. 17. janice krueger, ron l. ray, and lorrie knight, “applying web usability techniques to assess student awareness of library web resources,” journal of academic librarianship 30, no. 4 (july 2004): 285–93, doi: 10.1016/j.acalib.2004.04.002. 18. amy fry and linda rich, “usability testing for e-resource discovery: how students find and choose e-resources using library web sites,” journal of academic librarianship 37, no. 5 (september 2011): 386–401, doi: 10.1016/j.acalib.2011.06.003. 19. rebeca befus and katrina byrne, “redesigned with them in mind: evaluating an online library information literacy tutorial,” urban library journal 17, no. 1 (spring 2011): 1–26. 20. donald l. kirkpatrick, evaluating training programs: the four levels (san francisco: berrett-koehler; publishers group west [distributor], 1994). 21. rebeca befus and katrina byrne, “redesigned with them in mind: evaluating an online library information literacy tutorial,” urban library journal 17, no. 1 (spring 2011): 1–26. 22. sharon a. 
weiner et al., “biology and nursing students’ perceptions of a web-based information literacy tutorial,” communications in information literacy 5, no. 2 (september 2011): 187–201. 23. janet martin, jane birks, and fiona hunt, “designing for users: online information literacy in the middle east,” portal: libraries & the academy 10, no. 1 (january 2010): 57–73. 24. melissa bowles-terry, merinda kaye hensley, and lisa janicke hinchliffe, “best practices for online video tutorials in academic libraries: a study of student preferences and understanding,” communications in information literacy 4, no. 1 (march 2010): 17–28. 25. paul betty, “creation, management, and assessment of library screencasts: the regis libraries animated tutorials project,” journal of library administration 48, no. 3/4 (october 2008): 295–315 (special issue on the proceedings of the thirteenth off-campus library services conference, part 1), doi: 10.1080/01930820802289342. 26. lori s. mestre, “matching up learning styles with learning objects: what’s effective?,” journal of library administration 50, no. 7/8 (december 2010): 808–29, doi: 10.1080/01930826.2010.488975. 27. sara l. thornes, “creating an online tutorial to support information literacy and academic skills development,” journal of information literacy 6, no. 1 (june 2012): 81–95. 28. richard d. jones and simon bains, “using macromedia flash to create online information skills materials at edinburgh university library,” electronic library & information systems 37, no. 4 (december 2003): 242–50, www.era.lib.ed.ac.uk/handle/1842/248. 29. thomas p. mackey and jinwon ho, “exploring the relationships between web usability and students’ perceived learning in web-based multimedia (wbmm) tutorials,” computers & education 50, no. 1 (january 2008): 386–409. 30. amy blevins and c. w. elton, “an evaluation of three tutorial-creating software programs: camtasia, powerpoint, and mediasite,” journal of electronic resources in medical libraries 6, no. 1 (march 2009): 1–7, doi: 10.1080/15424060802705095. 31. ibid., 2. 32. ibid. 33. alyse ergood, kristy padron, and lauri rebar, “making library screencast tutorials: factors and processes,” internet reference services quarterly 17, no. 2 (april 2012): 95–107, doi: 10.1080/10875301.2012.725705. 34. lori s. mestre, “student preference for tutorial design: a usability study,” reference services review 40, no. 2 (may 2012): 258–76, http://dx.doi.org/10.1108/00907321211228318. 35. steve krug, rocket surgery made easy: the do-it-yourself guide to finding and fixing usability problems (berkeley, ca: new riders, 2010), 13. 36. jakob nielsen, “why you only need to test with 5 users,” nielsen norman group, march 19, 2000, www.nngroup.com/articles/why-you-only-need-to-test-with-5-users. 37. mestre, “student preference for tutorial design,” 258. 38. bowles-terry, hensley, and hinchliffe, “best practices for online video tutorials in academic libraries,” 22. 39. lim, “the readability of information literacy content on academic library web sites,” 302. appendix i. task 1 in academic search premier (ebsco), find an article about climate change, published in lancet. then copy a citation to the article in mla format. appendix ii. task 2 complete the following task using academic search premier (ebsco). take as long as you need. 
remember also to “think out loud” through the process. a) find an article about the deepwater horizon oil spill published in a peer-reviewed journal after 2011 that includes color photographs. b) after you find an article, copy its citation in mla format. appendix iii. dynamic audio/video tutorials group 1 (basic): http://screencast.com/t/5uln4h8xr group 2 (advanced): http://screencast.com/t/c9kzkgofx6 appendix iv. text-and-image tutorial 1 appendix v. text-and-image tutorial 2 appendix vi. information sheet st. john’s university libraries web site usability study information sheet thank you for participating in the sj libraries’ usability study! before beginning the test, please read the following: • the computer screen, your voice, and the voice of the facilitator will be recorded. • the results of this study may be published in an article, but no identifying information will be included in the article. • your participation in this study is totally confidential. • you may stop participating in the study at any time, and for any reason. appendix vii. tutorial questionnaire thank you for participating in the st. john’s university libraries’ tutorial usability study. please take a few moments to answer this brief survey. please refer to the following scale when answering the questionnaire, and circle the correct response. 1 = no, not at all 2 = not likely 3 = neutral (not sure, maybe) 4 = likely 5 = yes, absolutely 1. the tutorial was easy to follow. 1 2 3 4 5 2. i felt comfortable using the tutorial. 1 2 3 4 5 3. the graphics on the tutorial were easy to use. 1 2 3 4 5 4. the language/text on the tutorial was easy to understand. 1 2 3 4 5 5. i would use stj libraries’ tutorials on my own in the future. 1 2 3 4 5 6. i would recommend the stj libraries’ tutorials to my friends. 1 2 3 4 5 7. i was able to complete the tasks with ease. 1 2 3 4 5 8. i would be able to repeat the task now without the aid of the tutorial. 1 2 3 4 5 9. what changes would you make to the tutorial? additional comments and suggestions? in the name of the name: rdf literals, er attributes, and the potential to rethink the structures and visualizations of catalogs manolis peponakis abstract the aim of this study is to contribute to the field of machine-processable bibliographic data that is suitable for the semantic web. we examine the entity relationship (er) model, which has been selected by ifla as a “conceptual framework” in order to model the fr family (frbr, frad, and rda), and the problems er causes as we move towards the semantic web. subsequently, while maintaining the semantics of the aforementioned standards but rejecting the er as a conceptual framework for bibliographic data, this paper builds on the rdf (resource description framework) potential and documents how both the rdf and linked data’s rationale can affect the way we model bibliographic data. 
in this way, a new approach to bibliographic data emerges where the distinction between description and authorities is obsolete. instead, the integration of the authorities with descriptive information becomes fundamental so that a network of correlations can be established between the entities and the names by which the entities are known. naming is a vital issue for human cultures because names are not random sequences of characters or sounds that stand just as identifiers for the entities—they also have socio-cultural meanings and interpretations. thus, instead of describing indivisible resources, we could describe entities that appear in a variety of names on various resources. in this study, a method is proposed to connect the names with the entities they represent and, in this way, to document the provenance of these names by connecting specific resources with specific names. introduction the basic aim of this study is to contribute to the field of machine-processable bibliographic data. as to what constitutes “machine processable” we concur with the clarification of antoniou and van harmelen, who state, “in the literature the term machine-understandable is used quite often. we believe it is the wrong word because it gives the wrong impression. it is not necessary for intelligent agents to understand information; it is sufficient for them to process information effectively, which sometimes causes people to think the machine really understands.”1 also, in the bibliography used, the term “computationally processable” is used as a synonym for “machine-processable.” manolis peponakis (epepo@ekt.gr) is an information scientist at the national documentation centre, national hellenic research foundation, athens, greece. with regard to machine-processable bibliographic data, we have taken into consideration both the practice and theory of library and information science (lis) and computer science. from lis we have chosen the functional requirements for bibliographic records (frbr) and the functional requirements for authority data (frad) while making comparisons with the resource description and access (rda) standard. from the computer science domain we have chosen the resource description framework (rdf) as a basic mechanism for the semantic web. we examine the entity relationship (er) model (selected by ifla as a “conceptual framework” for the development of frbr),2 as well as the potential problems that may arise as we move towards the semantic web. having rejected the er model as a conceptual framework for bibliographic data, we have built on the potential of rdf and document how its rationale affects the modeling process. in the context of the semantic web and uniform resource identifiers (uris), the identification process has been transformed. for this reason we have performed an analysis of appellations and names as identifiers and also explored how we could move on from an era where controlled names play the role of identifiers to one where uris dominate: “while it is self-evident that labels and comments are important for constructing and using ontologies by humans, the owl standard does not pay much attention to them. the standard focuses on the syntax, structure and reasoning capabilities. . . . 
if the semantic web is to be queried by humans, there will be no other way than dealing with the ambiguousness of human language.”3 it is essential to build on the “library’s signature service, its catalog,”4 and use it to provide added-value services. but to get there, first there has to be “a shift in perspective, from locked-up databases of records to open data shared on the web.”5 this requires a transition from descriptions aimed at human readers to descriptions that put the emphasis on computational processes, escaping the rationale of records as condensed textual descriptions and moving towards more flexible and fruitful representations and visualizations. background frbr and rda the fr family has been growing for more than a decade. the first member of the family was the functional requirements for bibliographic records (frbr),6 the first version of which was published towards the end of the last century. subsequently, ifla decided to extend the model in order to cover authorities. during this process, the task of modeling the names was separated from the task of modeling the subjects. thus two new members were added to the family: the “functional requirements for authority data: a conceptual model” (frad) and the “functional requirements for subject authority data” (frsad).7,8 during the same period, the “resource description and access” (rda) standard was established as a set of cataloging rules to replace the aacr standard. according to its creators, the alignment with the fr family was crucial. as stated, “a key element in the design of rda is its alignment with the conceptual models for bibliographic and authority data developed by the international federation of library associations and institutions (ifla): functional requirements for bibliographic records [and] functional requirements for authority data.”9 this paper uses the fr family and the rda as a starting point but detects some problems and inconsistencies between these models. it sustains the basic semantics from these standards but rejects their structural formalism because it is quite problematic and lacks effectiveness in expressing highly machine-processable data. the effective processability of the data will be discussed in detail in the section “the impact of the representation scheme’s selection: rdf versus er.” among the fr family, the terminology is inconsistent and, as we pass from frbr to frad and frsad, even the perception angle of the general model undergoes change. in frbr (the first in order), there is no notion of the name as an entity. frad introduces this perception (frad also adds family as a new entity) and frsad goes a step further and introduces the concept of nomen instead of the concept of name. hence, despite the fact that each of the members of the fr family of models has been represented in rdf,10 there is no established consolidated edition yet that combines the different angles using a common model and terminology (vocabulary).11 these representations (one for each model) are available at ifla’s website.12 on the other hand, in the context of rda there may be more consistency regarding terminology, but, as is well established in the relevant literature, there are significant differences between the two models, i.e. 
the fr family and rda.13,14,15 due to these differences, there are no uris, not even in the rda registry, in the examples of our study.16 given the above, the terms appearing in the figures are a selection from the three texts of the fr family. thus, nomen (from frsad) is used instead of name (from frad) as a more abstract notion, and the attribute—property in the context of rdf—“has string” (from frad) is used to assign a specific literal to a nomen. in figures 2–5 we have used the “has appellation” (reversed “is appellation of”) relationship of frad.17 notes about terminology and graphs: how to read the figures in this paper two different sorts of figures appear. this covers the need to compare two different models and pinpoint the differences between them and the problems that arise from selecting the er model to express frbr. an explanation of the two major models follows in the next subsection. the first figure type follows the diagrams of the entity–relationship model and is used in figure 1. in this case: • the rectangles represent entities. • the oval shapes represent attributes. • the diamond-shaped boxes represent relationships. the second figure type has been created according to the rdf graphical representations and is used in figures 2–5. in these cases: • the oval shapes represent nodes that are identified by a uri and they could serve as objects or subjects for further expansion of the network. in figures 3–5 all the names were derived from the fr entities. • the line connectors between nodes represent the predicates (i.e., they are properties) and should also serve as uris. • the rectangle shapes represent literals consisting of a lexical form; a language code could apply in these cases. with or without language codes, these are the end points and they cannot be the subject of new connections. we follow the common modeling of language in rdf in which the literal itself contains a language code, for example “example”@en in standard turtle syntax, or in rdf/xml coding. we must note that this kind of modeling is quite a simplistic way of language modeling because there is no mechanism to declare more information about language, such as multiple scripts, which could apply in the context of the same language. the impact of the representation scheme’s selection: rdf versus er nowadays, all the information on library catalogs is created through and stored in computers. this technological infrastructure provides specific methods and dictates limitations for the catalog’s data management. hence, every model must take into consideration the basic rationale of the technological infrastructure that will curate and process the data. depending on the syntax capabilities of the representation model, expressing what we want to express becomes more or less easy and accurate, since “semantics is always going to have a close relationship with the field of syntax.”18 this establishes a vital relationship between what we want to do and how computers can do it. in this section we emphasize the limitations of the entity relationship (er) implementation, which frbr proposes, and show how syntax affects expressiveness and, accordingly, functionality. finally, we demonstrate how the selection of one implementation or another (in our case er vs. rdf) has serious implications, both for cataloging rules and for cataloging practice. 
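to make these rdf conventions concrete before comparing the two models, the following is a minimal sketch using python and the rdflib library; the example.org uris and the ex:hasCreator property are illustrative assumptions, not terms from the fr family or rda vocabularies.

```python
# a minimal sketch of the graph conventions described above: oval nodes
# become uris, connectors become uri-identified properties, and rectangles
# become plain literals that may carry a language code and cannot be the
# subject of further statements. all example.org names are assumptions.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)

# a chain of uri-identified nodes: work -> creator -> person
g.add((EX.work1, EX.hasCreator, EX.personX))

# a language-tagged literal terminates the chain (an end point)
g.add((EX.personX, RDFS.label, Literal("john smith", lang="en")))

print(g.serialize(format="turtle"))  # rdflib 6.x returns a str
```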
why do we compare these two specific models? the er model is the base that has been selected by ifla as a “conceptual framework”19 for the development of frbr, while frbr is the conceptual model upon which rda has been founded. subsequently, rda is also affected by the choice of the er model. on the other hand, rdf is the current conceptualization for resource description in the web of data. so, what kind of problems and conflicts arise from the implementations of each of these models? the basic rationale of er comprises three fundamental elements. there are entities; entities have attributes; and there are relationships between entities. it is also possible to declare cardinality constraints, upon which the fr family builds. then again, rdf implies quite a different model. “the core structure of the abstract syntax is a set of triples, each consisting of a subject, a predicate and an object. a set of such triples is called an rdf graph. an rdf graph can be visualized as a node and directed-arc diagram, in which each triple is represented as a node-arc-node link. . . . there can be three kinds of nodes in an rdf graph: iris, literals, and blank nodes.”20 “linking the object of one statement to the subject of another, via uris, results in a chain of linked statements, or linked data. this avoids the ambiguity of using natural language strings as headings to match statements. as a result, a literal object terminates a linked data chain, and literals are generally used for human-readable display data such as labels, notes, names, and so on.”21 as a representative example of the differences between the two models, let us consider “place of publication.” peponakis counts nine attributes of place and notices that, due to the fact that the er model does not allow links between attributes, there is no way to define explicitly whether these attributes address the same place or not.22 taking this problem into consideration, we demonstrate the transition from the er attributes approach to rdf implementations in figures 1–2. let us assume that there is person (x), who was born in london, is named john smith, and works at publisher (y). this publisher is located in london, where book (1), entitled history of london, has been published. for this specific book, person x was the lithographer. if we create a strict mapping to frbr entities, attributes, and relations, then we have the situation illustrated in figure 1. due to the fact that there is no way to link the four occurrences of london (inasmuch as there is no option to define relations between attributes in the er model), there is no way to be certain that london is the same in all cases. judging only by the name, it could stand for london in england, in ontario, in ohio, or elsewhere. figure 1. example of “place” as attribute of several entities the ifla working group has faced the problem with place and noted the following. the model does not, however, parallel entity relationships with attributes in all cases where such parallels could be drawn. for example, “place of publication/distribution” is defined as an attribute of the manifestation to reflect the statement appearing in the manifestation itself that indicates where it was published. 
inasmuch as the model also defines place as an entity it would have been possible to define an additional relationship linking the entity place either directly to the manifestation or indirectly through the entities person and corporate body which in turn are linked through the production relationship to the manifestation. to produce a fully developed data model further definition of that kind would be appropriate. but for the purposes of this study it was deemed unnecessary to have the conceptual model reflect all such possibilities.23 finally, they seem to avoid the problem and repeat their position in frad as well. in certain instances, the model treats an association between one entity and another simply as an attribute of the first entity. for example, the association between a person and the place in which the person was born could be expressed logically by defining a relationship (“born in”) between person and place. however, for the purposes of this study, it was deemed sufficient to treat place of birth simply as an attribute of person.24 for some reason the creators of the fr family have chosen not to “upgrade” the attributes of place into one and only one entity. furthermore, the same problem exists for many attributes, not only for place. thus, the problem has to do with the selection of er as “conceptual framework” and not with the specific entity of place. if we accept that “place of publication” must not be recorded as it appears on the resource, an rdf-based approach makes things clearer, as figure 2 shows. in this case, all attributes of place are promoted to the same rdf node and, instead of four repeats of the attribute with the value “london,” we reduce it to one and only one node with four connections to it. then, as illustrated by figure 2, we can be sure that all instances refer to the same london. figure 2. rdf-based representations of figure 1 in figure 2, it is assumed that there is no need to transcribe the literal of “place of publication” from the resource; i.e., we did not follow rule 2.8.1.4 of rda: “transcribe places of publication and publishers’ names as they appear on the source of information.” for cataloging rules that require the place to be recorded as it appears on the resource, readers can consult the subsection “place names” in this study. last but not least, rdf has another significant advantage compared to the er model: data coded in rdf are packed ready for use in the semantic web. on the contrary, data coded in er must undergo conversion—with all its implications—in order to be published in the semantic web. names, entities, and identities in this section, the significance of names as carriers of meaning is outlined and the importance of documenting the relations of names with the entities and identities they refer to is established. additionally, the basic approaches are presented for metadata generation for managing names. these approaches resulted in the distinction (dissociation) of authorities from the bibliographic records, which in turn led (both frbr/frad and rda) to the lack of potentially linking—in an explicit way—the entity with the names it goes by. this linking, as it is presented later in this text, is fundamental for the description and interpretation of the entity. 
in everyday communication, the usage of a name in a sentence plays the role of the identifier for the entity that this specific name indicates. if the speakers share a common background, there is no need for qualifiers other than the name in order to disambiguate information such as whether nick is person x or person y, or if the word “london” indicates the city in ohio or in england, etc. thus, the common background leads to a very limited context in which the interpretation of the name and the assignment to the appropriate entity is sufficient and accurate. however, the context of the internet is extended into a variety of possibilities, so there is need of a more precise way to identify specific entities. in this regard, a very essential issue is the distinction between the properties of the name and the properties of the entity that is represented by the specific name. the word “john” could be recognized as an english name, but we jump to a logical flaw if we assume that john knows english. a representative example of this kind of inference (syllogism) can be found in rayside and campbell.25 statement: “man is a species of animal. socrates is a man. therefore, socrates is a species of animal. . . . ‘man' is a three-lettered word. socrates is a man. therefore, socrates is a three-lettered word.” therefore the authorities of a catalog should embody a two-level modeling of the information they represent. the first has to do with the entities and the second with the names of these entities. consequently, there is the need to find a way to pass from names to the entities they indicate; and, from entities, to the various appellations that these entities have. in the name of the name: rdf literals, er attributes, and the potential to rethink the structures and visualizations of catalogs | peponakis |doi:10.6017/ital.v35i2.8749 26 in catalogs, it is kind of vague whether the change of a name signifies a new identity. niu states: “for example: the maiden name and the married name of an agent are normally not considered two separate identities, yet one pseudonym used for writing fiction and another pseudonym used for writing scientific works are often considered two different identities of an agent.”26 then there can be one individual with many identities. but there can also be one identity which incorporates many individuals: for example, a shared pseudonym for a group of authors. to deal with these problems, frad introduces the notion of persona, rejecting at the same time the idea that a person is equal to an individual. frad defines a person as an “individual or a persona or identity established or adopted by an individual or group.”27 the question that arises here is when the persona must be conceived as a new identity. yet, frad does not make a sufficient judgment; instead, they refer to cataloguing rules. “under some cataloguing rules, for example, authors are uniformly viewed as real individuals, and consequently specific instances of the bibliographic entity person always correspond to individuals. under other cataloguing rules, however, authors may be viewed in certain circumstances as establishing more than one bibliographic identity, and in that case a specific instance of the bibliographic entity person may correspond to a persona adopted by an individual rather than to the individual per se.”28 so there is no specific guidance if, for example, in the case of “religious relationship,”29 there must be one identity created with two alternative names or two different identities. 
rule 9.2.2.8 in rda does not elaborate further. still, even with the problem of identities solved, the matter of appellations itself could be extremely complicated, and this is widely addressed in the relevant literature.30,31,32 the viaf project confirms this with its very large data set.33 assigning all appellations as attributes is an easy way to model the variants of a name, but it is very simplistic because it “does not allow these appellations to have attributes of their own and neither does it allow the establishing of relationships among the appellations. . . . frad makes a big step forward: all appellations are defined as entities in their own right, thus allowing full modeling.”34 of course, frad’s approach is not a novelty in the domain of lis since library catalogs have been modeling names since the era of marc. in unimarc authorities,35 the control subfield $5 contains a coded value to indicate the relations between the names with values such as “k = name before the marriage,” “i = name in religion,” “d = acronym,” etc., and in marc 21 there is the corresponding subfield $w.36 frad puts these values on a more consistent and abstract level. frad also defines “relationships between persons, families, corporate bodies, and works” in section 5.3 and “relationships between their various names” in section 5.4.37 the distinction between authorities and descriptive information since the days of card catalogs and for as long as marc and aacr have been used, bibliographic records have been grounded in the dichotomy between descriptive information and control access points. the various types of headings stand for control access points. the original purpose of headings was alphabetical sorting. with the advent of computers, they were used as string identifiers to cluster and retrieve relevant bibliographic records. these bibliographic records had a body of descriptive information that was transcribed from the resource and remained unchanged. so the headings were the keys to the records and the records were surrogates for documents. “the elements of a bibliographic record . . . were designed to be read and comprehended by human beings, not by machines”38; established headings are not an exception. one of their basic characteristics was the precondition that they were unique in the context of a specific catalog, thereby avoiding ambiguity. in every case of synonymy, qualifiers (such as date of birth or profession) were added to disambiguate, while the names also played the role of a unique identifier. from this process, an issue emerges: the information that appears on the document has changed and the controlled name may be completely different from the name on the resource. this means that the cataloger performs a transformation of the information, and this transformation carries two dangers. first, by changing the name, there is the possibility of assigning the entity behind the name to a wrong entity. second, by disturbing the correspondence between the information on the resource and the information on the record of the resource, the record becomes a problematic surrogate of the resource. to surpass this obstacle, traditional catalogs split the information into two different areas: one with the established forms, i.e., the headings; and the second with the purely descriptive information, i.e., the information that must be transcribed from the resource. 
this is the reason why traditional library catalogs put much effort into transcribing information from resources and very detailed guidelines have been developed. on the other hand, current approaches to metadata creation (such as dublin core) seem to underestimate the importance of descriptive information while concentrating on the established forms of names. but how can we be sure that different literals communicate the same meaning? does this kind of simplification, perhaps, cause problems regarding the integrity of the information? names are not just sequences of characters (i.e., strings), but they carry latent information. it is known that there are women who wrote using male names (for example mary ann evans wrote as george eliot) and men who wrote using female names. there are also nicknames for groups (e.g., “richard henry” is a pseudonym for the collaborative works of richard butler and henry chance newton), etc. therefore, it is important not to ignore names and the forms in which they appear on the resources, but to model them in such a way that integration between authorities and descriptive information is feasible, and the names are efficiently machine-processable. integrating authorities with descriptive information as we have already stated, traditional library catalogs are built on the dichotomy between description and access points. this analysis aims to bring descriptive information and authorities closer, i.e., to connect the access points of catalogs with the description of the resource. the basic principle of the model presented in this section is to promote each verbal (lexical) representation of a name to a nomen, whether this form of the name derives from a controlled vocabulary or not. in the cases where this form appears in a specific vocabulary, appropriate properties could be used to indicate such a relation. in this section, some representative examples are presented. it is important to note, once again, that every node and relation in the following figures could (and must, in the context of the semantic web) be identified by a uri, except for the values in rectangles, which are rdf simple literals and therefore cannot be the subjects of further expansion. thus, the concatenation is the following: every individual (instance of the relevant class) acquires a uri. every individual is connected through the “has appellation” property (which acquires a uri) to a nomen (which also acquires a uri), and these nomens end up connected to a plain rdf literal, which is in natural language wording and cannot be subjected to further analysis. place names the problem of place as an attribute in frbr and frad has also been analyzed in the background analysis of the current paper, specifically in the subsection “the impact of the representation scheme’s selection: rdf versus er.” here, a solution to this problem that is compatible with the frbr/rda solution is proposed. by promoting every nomen of a place to an rdf node, there is the option of referring to the entity of place as a whole or to a specific appellation of this entity. so, the relation (property in the context of rdf) between the subjects of a work could be indicated by connecting work x with place z. 
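a minimal sketch of this pattern, again in python with rdflib, is given below; the properties ex:hasAppellation and ex:hasString loosely mirror frad’s “has appellation” and “has string” and, like the example.org uris, are illustrative assumptions rather than published vocabulary terms.

```python
# a minimal sketch of promoting every nomen of a place to an rdf node: the
# work is linked to the place entity itself, while the manifestation is
# linked to the specific nomen transcribed from the resource. the properties
# loosely mirror frad's "has appellation" and "has string"; all uris and
# property names here are illustrative assumptions.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)

# the place entity (place z) and two of its nomens
g.add((EX.placeZ, EX.hasAppellation, EX.nomen1))
g.add((EX.nomen1, EX.hasString, Literal("london", lang="en")))
g.add((EX.placeZ, EX.hasAppellation, EX.nomen2))
g.add((EX.nomen2, EX.hasString, Literal("londres", lang="fr")))

# the work is about the place as an entity, independent of any name form
g.add((EX.workX, EX.hasSubject, EX.placeZ))

# the manifestation is linked to the nomen that appears on the resource,
# documenting the provenance of that particular name form
g.add((EX.manifestation1, EX.placeOfPublication, EX.nomen1))
```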
on the other hand, according to rule 2.8.1.4 of rda, the place of publication for the manifestation must be transcribed as it appears on the source of information. but following the connections presented in figure 3, it is easy to assume that this specific nomen corresponds to the same entity, i.e., to the same place. figure 3. place personal names in the section “names, entities, and identities,” we analyzed many of the problems associated with personal names. here, a model is presented where the work (and expression) is connected directly with the author, whereas the manifestation is connected with a specific appellation, i.e., nomen, of this author. figure 4. statements of responsibility rda rule 2.4.1.4 states, “transcribe a statement of responsibility as it appears on the source of information.” but occasionally the statement of responsibility may contain phrases and not just names. in these cases, a solution similar to the metadata object description schema (mods) could be implemented where, if needed, the statement of responsibility is included in the note element using the attribute type="statement of responsibility." titles the management of titles in frbr and rda indicates a different point of view between the two standards. according to rda there is no title for the expression,39 and, as taniguchi states, this is a “significant difference between frbr and rda.”40 bibframe abides by the same principle of downgrading expression, since it entangles expression with work in an indivisible unit. in this regard, bibframe is closer to rda than to frbr. the notion of work has nothing to do with specific languages, even in the case when the work is a written text. therefore the assignment of the title of work to a specific appellation is an unnecessary limitation. on the contrary, the title of a manifestation is derived from a specific resource. we argue that between these two poles there is the title of expression, which could stand as a uniform title per language. figure 5. titles visualization of bibliographic records and cataloging rules resource description in the domain of lis—from cutter’s era to the present day—emphasizes static linear textual representations. according to the rda “0.1 key features,” “in rda, there is a clear line of separation between the guidelines and instructions on recording data and those on the presentation of data. this separation has been established in order to optimize flexibility in the storage and display of the data produced using rda. guidelines and instructions on recording data are covered in chapters 1 through 37; those on the presentation of data are covered in appendices d and e.” but the tables in the relevant appendices (d and e) contain guidelines that are mainly concentrated on punctuation issues, and they do not take into consideration the dynamics of current interactive user interface capabilities. 
as coyle and hillmann comment, “there are instructions for highly structured strings that are clearly not compatible with what we think of today as machine-manipulable data.”41 it is rather like producing high-tech cards: rda is faithful to the classical text-centric approaches that produce bibliographic records as a linear enumeration of attributes; thus, rda can be likened to a new suit that is quite old fashioned. traditional catalogs (from card catalogs to opacs and repository catalogs) were built upon the principle of creating autonomous records. frbr set this principle, i.e., one record for each resource, under dispute, while linked data abolishes it. this way, a gigantic graph of statements is created, while a certain part of these statements (not always the same) responds to or describes the desired information. thus, a more sophisticated method for showing the results emerges, if it does not impose itself. therefore, the issue is not to present a record that describes a specific resource, since this conceptualization tends to be obsolete altogether. consequently, the visualization has to be different, depending on the data structure as well as the available interface of the searcher. in this context, the analysis of this study tries to keep in balance the machine-processable character of rdf that builds on identifiers (uris), while paying attention to the linguistic representation of entities. we argue that the balance between them will result in highly accurate and efficient representations for both humans and software agents. let us consider the model for titles that has been introduced in this study. according to frbr, “if the work has appeared under varying titles (differing in form, language, etc.), a bibliographic agency normally selects one of those titles as the basis of a ‘uniform title’ for purposes of consistency in naming and referencing the work.”42 rda treats the case in a very similar way: rule 5.1.3 states, “the term ‘title of the work’ refers to a word, character, or group of words and/or characters by which a work is known. the term ‘preferred title for the work’ refers to the title or form of title chosen to identify the work. the preferred title is also the basis for the authorized access point representing that work.” in this study, we consider the aforementioned statements as a projection that springs from the days when records were static textual descriptions independent of interfaces. nowadays we are moving towards a much clearer distinction between the entity and its names. this is reflected in figure 5, in which the connection between a work and its author has nothing to do with specific names (appellations) but is based on uris. the selection of the appropriate name as a title for the specific work could be based on certain criteria such as the language of the interface: in this case, the title of the work will be the title in the user interface language, and if this is not possible (i.e., there is no title label in this language), then it could be the title in the catalog’s default language. following the kind of modeling proposed in the current study, the visualizations of data become more flexible and efficient in a variety of dynamic ways. 
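as one concrete illustration, here is a minimal python/rdflib sketch of such interface-driven selection, reusing the illustrative ex:hasAppellation / ex:hasString pattern sketched earlier; the uris and the pick_title helper are assumptions for illustration only, not part of the model itself.

```python
# a minimal sketch of interface-driven title selection over the illustrative
# ex:hasAppellation / ex:hasString pattern: prefer a title literal in the
# interface language, fall back to the catalog's default language, then to
# any available form. all uris are assumptions for illustration only.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

def pick_title(graph, work, ui_lang, default_lang="en"):
    """return the title literal matching ui_lang, else default_lang,
    else any available appellation string."""
    labels = [
        label
        for nomen in graph.objects(work, EX.hasAppellation)
        for label in graph.objects(nomen, EX.hasString)
    ]
    for preferred in (ui_lang, default_lang):
        for label in labels:
            if isinstance(label, Literal) and label.language == preferred:
                return label
    return labels[0] if labels else None

g = Graph()
g.add((EX.work1, EX.hasAppellation, EX.nomenEn))
g.add((EX.nomenEn, EX.hasString, Literal("the odyssey", lang="en")))
g.add((EX.work1, EX.hasAppellation, EX.nomenEl))
g.add((EX.nomenEl, EX.hasString, Literal("οδύσσεια", lang="el")))

print(pick_title(g, EX.work1, ui_lang="el"))  # -> οδύσσεια
```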
hence, we can isolate and display nodes and their connections, correlate them with the interface language or screen size (i.e., mobile phone or pc), create levels relative to the desired depth of analysis, personalize them upon the user’s request or habits, and so on. also, it becomes possible to display the data in forms other than textual. “as a result, humans, with their great visual pattern recognition skills, can comprehend data tremendously faster and more effectively through visualization than by reading the numerical or textual representation of the data.”43 in the name of the name: rdf literals, er attributes, and the potential to rethink the structures and visualizations of catalogs | peponakis |doi:10.6017/ital.v35i2.8749 32 as we have already mentioned, the syntax and the semantics are always going to have a close relationship, but it is crystal clear that, now more than ever, the current semantic web standards allow for greater flexibility. as dunsire et al. put it, the rdf approach is very different from the traditional library catalog record exemplified by marc21, where descriptions of multiple aspects of a resource are bound together by a specific syntax of tags, indicators, and subfields as a single identifiable stream of data that is manipulated as a whole. in rdf, the data must be separated out into single statements that can then be processed independently from one another; processing includes the aggregation of statements into a record-based view, but is not confined to any specific record schema or source for the data. statements or triples can be mixed and matched from many different sources to form many different kinds of user-friendly displays.44 in this framework, cataloging rules must reexamine their instructions in light of the new opportunities offered by technological advancements. discussion naming is a vital issue for human cultures. names are not random sequences of characters or sounds that stand just as identifiers for the entities, but they also have socio-cultural meanings and interpretations. recently, out of “political correctness” and fear of triggering racism, sweden changed the names of bird species that could potentially offend, such as “gypsy bird” and “negro.”45 therefore we cannot treat names just as random identifiers. in this study we examined how, instead of describing indivisible resources, we could describe entities that appear in a variety of names on various resources. we proposed a method for connecting the names to the entities they represent and, at the same time, we documented the provenance of these names by connecting specific resources with specific names. we illustrated how to establish connections between entities, connections between an entity and a specific name of another entity, as well as connections between one name and another name concerning one or two entities. in the proposed framework, we maintain the linguistic character of naming while modeling the names in a machine-processable way. this formalism allows for a high level of expressiveness and flexible descriptions that do not have a static, text-centric orientation, since the central point is not the establishment of the text values (i.e., heading) but the meaning of our statements. this study has shown that it is important to have the possibility to establish relationships both between entities and between specific appellations (nomens in the context of this study) of these entities. to achieve this we promoted every appellation to an rdf node. 
this is not something unheard of in the domain of rdf since this approach has also been adopted by w3c for the development of skos-xl.46 frbroo, which is another interpretation of increasing influence in the wider context of the fr family, adopts the same perspective.47 frbroo also gives the option to connect a specific name with a resource through the property “r64 used name (was name used by)” or to connect a name with someone who uses this specific name through the property “r63 named (was named by).” murray and tillett state that “cataloging is a process of making observations on resources”48; hence, the production of records is the result of the judgments made during this process. but in the context of traditional descriptive cataloging, the cataloger was not required to judge information in any way other than its category, i.e., to characterize whether the x set of characters corresponded to the name of an author, publisher, or place and so on. there was no obligation of assigning a particular name to a specific author, publisher, or place. in our approach, the cataloger interprets the information and supports the catalog’s potential to deliver added-value information. moreover, the initial information remains undifferentiated; hence, there is always the option of going back in order to generate new interpretations or validate existing ones. in recent years, there has been a significant increase in the attention given to multi-entity models of resource description.49 in this new environment, “the creation of one record per resource seems a deficient simplification.”50 rdf allows the transformation of universal bibliographic control to a giant global graph.51 in this manner, current approaches on resource description “cannot be considered as simple metadata describing a specific resource but more like some kind of knowledge related to the resource.”52 indeed, this knowledge can be computationally processable and exploitable. yet, to achieve this, “catalogers can only begin to work in this way if they are not held bound by the traditional definitions and conceptualizations of bibliographic records.”53 one critical issue is the isolation of parts (sets of statements) of this “giant graph” and the linking of these parts with something else; indeed, theory on this topic is starting to emerge.54 this is very essential because it allows for the creation of ad hoc clusters (i.e., the usage of a specific identity for an entity with all the names that have been assigned to this identity, in our context), which could be used as a set to link to some other entity. as a final remark, we could say that authorities manage controlled access points. in the semantic web, every uri is a controlled access point, and hence, the discrimination between description and authorities acquires a new meaning. in the context of machine-processable bibliographic data, the aim is to connect these two, i.e., the authorities with the description, and examine how one can support the other. however, since the emphasis is not on their individual management, we are drawn away from a mentality of “descriptive information versus access points” and towards one of “descriptive information as an access point.” acknowledgement the author wishes to thank henry scott, who assisted in the proofreading of the manuscript. 
references and notes 1. grigoris antoniou and frank van harmelen, a semantic web primer, 2nd ed. (cambridge, ma: mit press, 2008), 3. 2. ifla, functional requirements for bibliographic records: final report, as amended and corrected through february 2009, ifla series on bibliographic control, vol. 19 (munich: k.g. saur, 1998), 6. 3. daniel kless et al., “interoperability of knowledge organization systems with and through ontologies,” in classification & ontology: formal approaches and access to knowledge: proceedings of the international udc seminar 19–20 september 2011, the hague, the netherlands, organized by udc consortium, the hague, edited by aida slavic and edgardo civallero (würzburg: ergon, 2011), 63–64. 4. karen coyle and diane hillmann, “resource description and access (rda): cataloging rules for the 20th century,” d-lib magazine 13, no. 1/2 (january 2007): para. 2, doi:10.1045/january2007-coyle. 5. cory k. lampert and silvia b. southwick, “leading to linking: introducing linked data to academic library digital collections,” journal of library metadata 13, no. 2–3 (2013): 231, doi:10.1080/19386389.2013.826095. 6. ifla, functional requirements for bibliographic records. 7. ifla, functional requirements for authority data: a conceptual model, edited by glenn e. patton, ifla series on bibliographic control (munich: k.g. saur, 2009). 8. ifla, “functional requirements for subject authority data (frsad): a conceptual model” (ifla, 2010), http://www.ifla.org/files/assets/classification-and-indexing/functional-requirements-for-subject-authority-data/frsad-final-report.pdf. 9. ala, “rda toolkit: resource description and access,” sec. 0.3.1, accessed june 18, 2014, http://access.rdatoolkit.org/. 10. gordon dunsire, “representing the fr family in the semantic web,” cataloging & classification quarterly 50, no. 5–7 (2012): 724–41, doi:10.1080/01639374.2012.679881. 11. while this paper was under review, ifla released the draft “frbr-library reference model” (frbr-lrm), which is a consolidated edition for the fr family standards. it is developed according to the respective individual standards following the principles of the entity relationship modeling, which is challenged in this paper. taking into account the er modeling and the statement (available on p. 5 of the standard) that “the model is comprehensive at the conceptual level, but only indicative in terms of the attributes and relationships that are defined,” this consolidated edition could not be perceived as a standard that could be implemented directly as a property vocabulary qualifying for use in the rdf environment. 12. main page (for all fr) at http://iflastandards.info/ns/fr/; “frbr model” available at http://iflastandards.info/ns/fr/frbr/frbrer/; “frad model” available at http://iflastandards.info/ns/fr/frad/; “frsad model” available at http://iflastandards.info/ns/fr/frsad/. 
an addition to the previous is frbroo: the element set is available at http://iflastandards.info/ns/fr/frbr/frbroo/. 13. manolis peponakis, “conceptualizations of the cataloging object: a critique on current perceptions of frbr group 1 entities,” cataloging & classification quarterly 50, no. 5–7 (2012): 587–602, doi:10.1080/01639374.2012.681275. 14. pat riva and chris oliver, “evaluation of rda as an implementation of frbr and frad,” cataloging & classification quarterly 50, no. 5–7 (2012): 564–86, doi:10.1080/01639374.2012.680848. 15. shoichi taniguchi, “viewing rda from frbr and frad: does rda represent a different conceptual model?,” cataloging & classification quarterly 50, no. 8 (2012): 929–43, doi:10.1080/01639374.2012.712631. 16. rda registry is available at http://www.rdaregistry.info/. 17. the nomen entity and the “has appellation” (reversed “is appellation of”) property are also used by the frbr-lrm. 18. paul h. portner, what is meaning?: fundamentals of formal semantics (malden, ma: blackwell, 2005), 34. 19. ifla, functional requirements for bibliographic records, 19:6. 20. w3c, “rdf 1.1 concepts and abstract syntax: w3c recommendation,” february 25, 2014, http://www.w3.org/tr/2014/rec-rdf11-concepts-20140225/. 21. gordon dunsire, diane hillmann, and jon phipps, “reconsidering universal bibliographic control in light of the semantic web,” journal of library metadata 12, no. 2–3 (2012): 166, doi:10.1080/19386389.2012.699831. 22. manolis peponakis, “libraries’ metadata as data in the era of the semantic web: modeling a repository of master theses and phd dissertations for the web of data,” journal of library metadata 13, no. 4 (2013): 333, doi:10.1080/19386389.2013.846618. 23. ifla, functional requirements for bibliographic records, 19:32. 24. ifla, functional requirements for authority data: a conceptual model, 36–37. 25. derek rayside and gerard t. campbell, “an aristotelian understanding of object-oriented programming,” in proceedings of the 15th acm sigplan conference on object-oriented programming, systems, languages, and applications, oopsla ’00 (new york: acm, 2000), 350, doi:10.1145/353171.353194. 26. jinfang niu, “evolving landscape in name authority control,” cataloging & classification quarterly 51, no. 4 (2013): 405, doi:10.1080/01639374.2012.756843. 27. ifla, functional requirements for authority data: a conceptual model, 24. 28. ibid., 20. 29. “religious relationship” is the “relationship between a person and an identity that person assumes in a religious capacity”; for example the “relationship between the person known as thomas merton and that person’s name in religion, father louis” (ifla, 2009, 61–62). 30. junli diao, “‘fu hao,’ ‘fu hao,’ ‘fuhao,’ or ‘fu hao’? 
a cataloger’s navigation of an ancient chinese woman’s name,” cataloging & classification quarterly 53, no. 1 (2015): 71–87, doi:10.1080/01639374.2014.935543. 31. on byung-won, sang choi gyu, and jung soo-mok, “a case study for understanding the nature of redundant entities in bibliographic digital libraries,” program: electronic library and information systems 48, no. 3 (july 1, 2014): 246–71, doi:10.1108/prog-07-2012-0037. 32. neil r. smalheiser and vetle i. torvik, “author name disambiguation,” annual review of information science and technology 43, no. 1 (2009): 1–43, doi:10.1002/aris.2009.1440430113. 33. thomas b. hickey and jenny a. toves, “managing ambiguity in viaf,” d-lib magazine 20, no. 7/8 (2014), doi:10.1045/july2014-hickey. 34. martin doerr, pat riva, and maja žumer, “frbr entities: identity and identification,” cataloging & classification quarterly 50, no. 5–7 (2012): 524, doi:10.1080/01639374.2012.681252. 35. ifla, unimarc manual: authorities format, 2nd revised and enlarged edition, ubcim publications—new series, vol. 22 (munich: k.g. saur, 2001). 36. library of congress, “marc 21 format for authority data” (library of congress, april 18, 1999), http://www.loc.gov/marc/authority/. 37. ifla, functional requirements for authority data: a conceptual model. 38. martha m. yee, “frbrization: a method for turning online public findings lists into online public catalogs,” information technology and libraries 24, no. 2 (2005): 81, doi:10.6017/ital.v24i2.3368. 39. see frbr-rda mapping from joint steering committee for development of rda available at http://www.rda-jsc.org/docs/5rda-frbrrdamappingrev.pdf 40. taniguchi, “viewing rda from frbr and frad,” 934. 41. coyle and hillmann, “resource description and access (rda): cataloging rules for the 20th century,” sec. 8. information technology and libraries | june 2016 37 http://dx.doi.org/10.1080/01639374.2012.756843 http://dx.doi.org/10.1080/01639374.2014.935543 http://dx.doi.org/10.1108/prog-07-2012-0037 http://dx.doi.org/10.1002/aris.2009.1440430113 http://dx.doi.org/10.1045/july2014-hickey http://dx.doi.org/10.1080/01639374.2012.681252 http://www.loc.gov/marc/authority/ http://dx.doi.org/10.6017/ital.v24i2.3368 http://www.rda-jsc.org/docs/5rda-frbrrdamappingrev.pdf 42. ifla, functional requirements for bibliographic records, 19:33. 43. leonidas deligiannidis, amit p. sheth, and boanerges aleman-meza, “semantic analytics visualization,” in intelligence and security informatics, edited by sharad mehrotra et al., lecture notes in computer science 3975 (springer berlin heidelberg, 2006), 49, http://link.springer.com/chapter/10.1007/11760146_5. 44. dunsire, hillmann, and phipps, “reconsidering universal bibliographic control in light of the semantic web,” 166. 45. rick noack, “out of fear of racism, sweden changes the names of bird species,” washington post, february 24, 2015, http://www.washingtonpost.com/blogs/worldviews/wp/2015/02/24/out-of-fear-of­ racism-sweden-changes-the-names-of-bird-species/. 46. w3c, “skos extension for labels (skos-xl) namespace document—html variant,” 2009, http://www.w3.org/tr/2009/rec-skos-reference-20090818/skos-xl.html. 47. chryssoula bekiari et al., frbr object-oriented definition and mapping from frbrer, frad and frsad, version 2.0 (draft), 2013, http://www.cidoc­ crm.org/docs/frbr_oo//frbr_docs/frbroo_v2.0_draft_2013may.pdf. 48. robert j. murray and barbara b. tillett, “cataloging theory in search of graph theory and other ivory towers,” information technology and libraries 30, no. 
computer-based subject authority files at the university of minnesota libraries
audrey n. grosch: university of minnesota libraries

a computer-based system to produce listings of topical subject terms and geographically subdivided terms is described. the system files and their associated listings are called the subject authority file (saf) and the geographic authority file (gaf). conversion, operation, problems, and costs of the system are presented. details of the optical scanning conversion, with illustrations, show the relative ease of the technique for simple upper-case data files. program and data characteristics are illustrated with record layouts and sample listings.

introduction
as a corollary to the creation and maintenance of large library catalogs, it has become necessary for academic or research libraries to maintain authority files of various kinds, such as author name, subject, and series. in a manual cataloging system these files serve to unravel the mysteries of form, meaning, and usage to the cataloger.
they also serve as a control to help avoid conflicts, synonyms, or overlapping subjects. with a system of decentralized catalogs using different subject entries from a system's union catalog, some method must be derived to preserve such usage for the cataloger. a computer-based subject authority file provides that means. in january 1970, the university of minnesota libraries began studying the relationship of subject authority files to both the present manual cataloging system and to a planned mechanized system employing the marc ii format for storage of bibliographic data. minnesota's subject authority files are divided into two distinct logical files: subject authority and geographic authority subdivisions. the subject authority file (saf) contains all topical subject heading terms and their subdivisions down to nine levels of term, and geographic main headings, i.e., u.s. with nongeographic subdivisions. nonterm data such as origin, usage notes, "libraries using," and other kinds of information are contained in the saf. the geographic authority file (gaf) contains topical headings found in the saf, with geographical place names as subdivisions and indications of direct or indirect terms in geographic heading assignment. similar nonterm data as found in the saf are also found in the gaf. immediate and long-range benefits, together with the cost of conversion versus photocopying, showed that greater flexibility would be achieved through conversion to machine-readable form. some of the benefits were: 1) immediate assistance to the libraries performing their own decentralized cataloging, while providing cards to the union catalog at minnesota; 2) future assistance to our coordinate campus libraries should they wish to increase compatibility of their catalogs with the minneapolis campus union catalog; and 3) future provision of a machine-readable authority to enable linking of various subject vocabularies together for an on-line controlled-vocabulary subject searching system. when the decision had been made to convert the files to machine-readable form, we tried to determine what others had done regarding this application. although much previous work has been done on subject analysis, cataloging, vocabulary construction, and mechanization of bibliographic processes, very few designers have developed systems to support thesauri or subject heading files. in 1967 heald (1) reported on the system for the test-thesaurus of engineering and scientific terms. the following year hammond of aries corp. (2) described the nasa thesaurus, and way (3) outlined in detail the rand corporation library subject heading authority list (shal), mechanized using punch cards and computer in 1967. mount and kollin (4) described the use of the computer in the updating and revision of the subject heading list for applied science and technology index. of course several famous information systems use mechanized thesauri, among them the national library of medicine's medlars system with its mesh vocabulary and the department of defense ddc descriptors. in addition, the seventh edition of the library of congress subject headings utilized computer photocomposition. another reported work on subject headings in a mechanized system is that of the library of congress, in which a marc record for subject headings is discussed. avram et al. (5) give examples of this record and describe the system now under development at lc.
unfortunately for us, we completed the work herein reported in 1971, thereby not structuring our file to marc specifications. we mention this work here, as our file will lend itself to such a conversion, should we later require it.

data preparation and file conversion
the saf and gaf files comprised 59 catalog card drawers of information (about 115,000 lines of typed data). each file would be converted and maintained separately, but would use the same system design and processing programs. at a later stage, merging the files would be considered. moreover, the cost of the system would be lower if one design could be used for both files. two conversion methods were evaluated, keypunching and optical scanning. other methods would have lent themselves to this conversion, such as ibm magnetic tape selectric typewriters (mt/st) or an on-line system such as ibm's administrative terminal system (ats). however, because of the relatively small file size (under six million characters) and a desire for as economical a conversion as possible, only keypunching and optical scanning input were seriously considered. mt/st typewriters were ruled out because of cost and lack of locally available tape conversion equipment. keypunching was considered too slow in relation to typing. our assessment of optical scanning as the cheapest method was confirmed after completion of the conversion phase of the project, as an estimated $1,800 in total savings over keypunching. files were converted without intermediate coding, permitting the typists to transcribe directly from the subject and geographic authority card files. the data preparation was done by the catalog division's subject authority coordinator. this librarian edited the file to eliminate ambiguities before the typist received the drawer. otherwise, except for a quick check of the typist's finished sheets, the data were not examined again until after they were in machine-readable form on tape. this procedure worked very smoothly and caused the staff of the catalog division little inconvenience during the conversion phase. figure 1 shows the flow of the complete conversion activity.

fig. 1. conversion process for saf and gaf

equipment used for preparation of the data consisted of two ibm selectric typewriters model 715 with carbon ribbon, dual cam inhibitor, and 065 typing element (rabinow font). one machine had a pin-feed platen. this feature later proved to make no discernible difference in the quality of the typed output, but some typists stated that they preferred the pin-feed platen over the standard platen. the control data 915 page reader with a cdc 8092 teleprogrammer operating under grasp iii software was used for the conversion. block time was rented at a commercial service bureau for $50.00 per hour. library systems division personnel operated the system during these time periods. control data provided a system manual and debugging time in order to prepare for our operation during conversion. however, little assistance in handling the application was actually received from the control data personnel, who were familiar only with business data processing. a stock form, called the cdc 915 page reader form, procured from a
local forms vendor, was used. this form has a typing area of 9 1/2" x 13" marked off by faint blue lines. top and bottom alignment areas are provided to check for line skew. scanner throughput is increased by use of the longest permissible form with as much single-line data as possible. figure 2 shows a portion of a typed page from the saf.

fig. 2. saf input typing sample page

line 1 is the format recognition line, which was repeated on each sheet as a precaution against its loss by the optical scanner program during processing. such a loss of the format recognition line would have forced complete rerunning of the job. the remaining lines show the various data elements identified by tag characters. the complete set of tag characters is shown in table 1. the end-of-page symbol # is used on pages which terminate before the last physical line of the page to increase scanner throughput. the h symbol terminates each line and serves the same speed-increasing function.

table 1. conversion identification tags
  t: term
  d: departmental catalog in which the term is used
  n: scope note or general note on use of the term
  c: continuation line
  r: reference from which the term was verified if other than lc
  z: followed by s = see; by sa = see also; by x = see tracing; by xx = see also tracing
  x: geographic authority file cross-reference tracing (implied)

indentation spaces serve as a flag to the conversion program to show the level of the term or other data element. this technique decreased the number of characters to be typed, yet level errors were easy to detect during proofreading. subfield indicators for certain nonterm data completed the input format used during conversion. table 2 describes these indicators and the meaning of each subfield.

table 2. term subfield indicators
  $sgf: term also entered in gaf
  $dir: direct
  $ind: indirect
  $mnu: local university of minnesota subject term
  $prov: provisional term
  $mesh: medical subject heading term
  $nal: national agriculture library term

the gaf typed input is shown in figure 3. note the similarity between the two files, yet the presence of the variant treatment of an older term (social surveys in) from a newer term (social sciences). as a result the catalog division has now changed these old-form terms to conform with library of congress subject heading forms.

fig. 3. gaf input typing sample page
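the typed input format just described (tag character, indentation level, and a trailing subfield indicator) can be pictured with a small parsing sketch. the fragment below is purely illustrative and is not part of the original system, which was written in cobol and cdc 915 scanner code; the function name, the one-space-per-sublevel assumption, and the returned field names are ours.

    # illustrative sketch only: split a typed saf/gaf line into tag, level, text,
    # and subfield indicator, following tables 1 and 2 above. one leading space per
    # sublevel is an assumption made for this example.
    TAGS = {"t", "d", "n", "c", "r", "z", "x"}                              # table 1 tags
    SUBFIELDS = ("$sgf", "$dir", "$ind", "$mnu", "$prov", "$mesh", "$nal")  # table 2

    def parse_input_line(line):
        level = len(line) - len(line.lstrip(" ")) + 1   # indentation flags the term level
        body = line.strip()
        tag, _, rest = body.partition(" ")
        if tag not in TAGS:
            raise ValueError("unknown tag character: " + repr(tag))
        subfield = next((s for s in SUBFIELDS if rest.endswith(s)), None)
        if subfield:
            rest = rest[: -len(subfield)]
        return {"tag": tag, "level": level, "text": rest.strip(), "subfield": subfield}

    print(parse_input_line("  t social surveys in$dir"))
    # -> {'tag': 't', 'level': 3, 'text': 'social surveys in', 'subfield': '$dir'}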
during typing, error correction by typists was facilitated by the use of three special characters: one to delete an entire line, one to delete the preceding character, and one to type over a character so that it is deleted without inserting blanks. a program is typed on an optical scanning sheet in an assembly-level language for the cdc 915 page reader. it is then assembled into object code which operates the page reader and its controlling computer. an example of the program used in this conversion is shown in table 3. line 1 of this program defines the input-output and control characters, together with a coordinate to terminate reading of a line if data are not found on the line. it also defines the special characters described above for error correction, end of line, etc. line 2 specifies that a stock form (not preprinted) is to be read, giving the left-most and right-most character positions and the maximum number of lines per page, together with the first line number, to establish the scanning-area coordinates. these coordinates are expressed as three-digit octal values determined through use of a forms grid and ruler. line 3 describes the tape record format, including the field size, the blank fill character, left or right justification, and alphanumeric or numeric-only data field content. line 4 instructs the 8092 teleprogrammer unit to convert certain characters to octal values matching the cdc 3300 computer system, which are not identical to the normal 915 page reader octal values. the final e terminates reading of the program sheet. from this sheet grasp iii compiles an object program which is stored in the 8092 teleprogrammer memory, enabling scanner operation.

table 3. cdc 915 program for raw data tape creation
ictliblk,dsican, ? idlt,tieol,nieop,#ifmt,wlww istkid27,350,116,004lww e

system description and operation
the raw data tape created during optical scanning was used to build the saf and gaf data files. the magnetic tape coding is binary (odd parity) using 800 bpi density. a fixed-length record of 20 characters is used, with 100 records per physical block. as many 20-character format c (continuation of data) records are used as needed to achieve variable-length logical records. table 4 shows the three record formats used.

table 4. saf and gaf record formats
format a, control record: char. pos. 1, record type; 2-5, page number (1-9999); 6, column number (1-3); 7-14, file creation date (mm-dd-yy); 15, file identification (s = subject authority, g = geographic authority); 16-18, columns used (123 standard); 19-20, number of lines per page (75 standard).
format b, data record (initial): char. pos. 1, record type (t = term, x = reference term [gaf only], r = reference, d = departmental library, 1 = see, 2 = see also, 3 = see from, 4 = see also from); 2, level number (1-7); 3, sort exception code (n = numeric exception, h = hyphen exception, s = substitution exception, u = u.s. abbreviation, ' = gt. brit. abbreviation); 4, qualification code (6-bit binary: sgf = 1, dir = 2, ind = 4, prov = 8, mnu = 16, mesh = 32, nal = 48; combinations are stored by adding these values, e.g., 17 = mnu/sgf); 5-6, number of display lines for the item; 7-20, first 14 characters of the item.
format c, data record (continuation): char. pos. 1, record type; 2-20, continuation of item.
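as a rough illustration of how the fixed 20-character records of table 4 carry variable-length items, the sketch below reassembles logical items from a stream of physical records. it is not the production program (which was cobol on the cdc 3300); the simplified field handling, the function name, and the sample records are assumptions made for this example.

    # illustrative sketch only: rebuild variable-length logical items from fixed
    # 20-character records, treating "c" as a format c continuation record and
    # skipping the level/code fields in positions 2-6 of the initial record.
    def assemble_items(records):
        items = []
        for rec in records:
            rec = rec.ljust(20)[:20]          # every physical record is 20 characters
            rtype, payload = rec[0], rec[1:]
            if rtype == "c" and items:        # format c: positions 2-20 continue the item
                items[-1]["text"] += payload
            else:                             # initial record: positions 7-20 hold the
                items.append({"type": rtype, "text": payload[5:]})  # first 14 characters
        return [{"type": it["type"], "text": it["text"].rstrip()} for it in items]

    sample = ["t1 102social science",   # format b: term, level 1, 2 display lines
              "c research"]             # format c continuation
    print(assemble_items(sample))
    # -> [{'type': 't', 'text': 'social science research'}]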
to change or modify the file, keypunched cards are used; one transaction card is used for each correction for both saf and gaf files. table 5 shows the layout of this card.

table 5. saf and gaf transaction card
  columns 1-4: page of master list (1-9999)
  column 5: column of master list (1-3)
  columns 6-7: line of master list (1-80)
  columns 8-9: sequence number (00-99 or blank)
  column 10: deck number (0-9)
  column 11: continuation number (blank or 0-9)
  column 12: level number (1-7)
  column 13: transaction type (a = add, c = cancel, m = modify)
  columns 14-15: record type (t = term, xt = reference term [gaf], r = reference, d = departmental library, s = see, sa = see also, x = see from, xx = see also from)
  columns 16-80: data

catalogers in the wilson library (the university's largest and central library) and the bio-medical library use a 3 x 5 card as an input form. this card is filled in and transmitted to the librarian acting as subject coordinator. then the information is keypunched and prepared for submission to an updating run. the normal schedule as originally planned was to run a cumulative supplement monthly, with a quarterly full updating of the file. however, this schedule has been flexible, as the transaction volume has varied considerably from early estimates. currently updates are run quarterly to produce supplements, with a full listing annually. these updates vary from 5,000 to 14,000 transactions. the program for the system is written in cobol for the cdc 3300 computer operating under the master operating system. upon demand the program performs four basic functions on the data files: 1) creation of a cumulative supplement list from a transaction card deck; 2) updating of the tape files from the transaction card deck; 3) preparation of master lists either during the update process or independently; and 4) querying the file on the basis of user-defined search terms. parameter cards control the options available when supplements or master lists are to be run. the accept, deck, list, abort, line, space, and column parameters provide control over the cutoff for a new supplement, the transaction card list form, termination of the job if the number of error cards exceeds a given value, the number of lines per page of output, the number of blank lines before and after each transaction on the supplement, and whether a single- or double-column supplement is to be produced. figure 4 shows a sample from the saf supplement. the updating phase of the program creates the new master file and produces an update error listing accompanied by a report on composition of the file by level number, kind of data, and logical/physical record counts. the master list printout is also controlled through parameter cards. the line, column, and select options indicate the number of lines of data to be printed in each column, the number of columns per page, and which pages are to be listed. this latter feature permits supplying replacements for pages improperly printed or bound and suppression of printing when a program restart is necessary. figure 5 shows the most commonly used master list format. the file query function is performed upon demand to assist in file revision, to change a term throughout the file, or for other special purposes. the search items can be composed of any and/or combinations of record types, record levels, qualification codes, sort exception codes, and key words or phrases. a keyword search is a character-by-character search of file items. thus, by specifying a root word, all derivatives of the word formed by adding prefixes or suffixes will be identified. if these derivatives are not desired, a blank preceding and/or following the root word in the search key will prevent their display.

fig. 5. master list format using 3 column standard

although file conversion took five months to complete, the program to operate the system was delayed because of termination of the programmer originally assigned to the project. although the basic program features were ready in about 3-4 months, it was not until january of 1971 that the system was installed. during that year the staff gained experience in the system and cleansed the data of many ancient errors. by the end of the year, the system was an integral part of our catalog division support activities.
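the root-word behavior of the keyword query described above can be shown with a short sketch. the function name and the sample items are illustrative only and do not reproduce the original cobol query program.

    # illustrative sketch only: character-by-character keyword matching in which a
    # bare root such as "catalog" also finds derivatives, while a search key padded
    # with blanks matches only the exact word.
    def keyword_matches(search_key, item):
        return search_key in " " + item + " "    # pad the item so blank-delimited keys work

    items = ["library catalogs", "cataloging rules", "union catalog"]
    print([i for i in items if keyword_matches("catalog", i)])    # root word: all three match
    print([i for i in items if keyword_matches(" catalog ", i)])  # blank-delimited: exact word only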
costs
as was pointed out previously, there was consideration given to photocopying the authority files to provide a duplicate set for the bio-medical library. it was determined that this would cost $2,400 (60,000 cards @ $.04 each). this equalled the cost of the typing personnel and rental of optical scanning equipment. moreover, there would have to be duplicate cards and filing to maintain both files, with no assurance that they would remain exact duplicates of one another. in our opinion the benefits of this computer-based system offset the additional cost over the photocopying approach. to create these files completely cost $5,296.21 for all direct expenditures for clerical help, scanner time, typewriter purchase and rental, supplies, and cdc 3300 computer time. table 6 shows the breakdown of these costs.

table 6. conversion costs
  senior clerk typists @ $2.40 (2 fte for 3 mos.)     $1,810.56
  cdc 915 rental (20.1 hours @ $50 per hour)           1,007.50
  typewriter purchase                                    532.70
  typewriter rental (2 mos.)                              60.00
  magnetic tape                                           74.00
  cdc 915 forms                                          400.00
  cdc 3300 computer time @ $95.00/hr.                  1,411.45
  total cost                                          $5,296.21

during the conversion and development phase, salaries of the systems personnel were absorbed by the library so that only these direct costs were charged to the project. also, the library absorbed the subject coordinator's time for editing the file of cards prior to typing. two senior clerk-typists at $2.40 per hour each were employed for three months full time to type the data. operating costs are borne by the library, which requires a half-time librarian as subject coordinator and a student keypunch operator for 15-20 hours per week. the systems division provides program maintenance as required. supplies and computer time require about $2,100 per year if quarterly full lists are used with monthly supplements. some idea of the relative processing economy can be shown by examining some typical running times on the computer. the sizes of the saf and gaf files are respectively 4.35 and 1.75 million characters. a typical supplement with 12,000 transactions takes 45 minutes to print on the cdc 3300, equipped with a 1000 line-per-minute printer, for either saf or gaf. printing of a full master list for the saf and gaf takes 1 hour 25 minutes and 45 minutes respectively. updating the files takes about 1 hour 40 minutes for 12,000 transactions. a query of the file takes about 30 minutes. current computer and channel charges are $95 per hour.

general observations
our experience with this project has shown us the high reliability of the cdc 915 page reader as a conversion device. less than 1 percent of the total amount of data the page reader scanned was rejected. those errors rejected were easily spotted and retyped. no scanner-produced errors were found in the data; however, there was an occasional failure to pick up spaces when more than three occurred together. these errors were very infrequent and were discovered in the raw-data proofreading. these errors were corrected and, after the final output file was generated, we again checked for similar conditions and found everything in order with regard to term-level indication. with an upper-case file such as this, use of the cdc 915 is simple and easily accomplished. however, the library should not rely upon a scanner manufacturer or the installation where a unit is being leased to provide all the assistance required. the library will have to design its application and become familiar with the equipment in order to achieve best results. all optical scanning usage requires that certain care be exercised in the typing operation. lines must not be skewed, characters must not be blurred, and length of line can be critical even though the scan optics may be opened and closed over longer lines than are intended to be typed. further, it is imperative that the paper used in the scanning operation meet specifications for use with the chosen scanner. our experience indicates that a pin-feed platen is not necessary to maintain forms alignment if typists use care in initial alignment. we experienced some operational problems when we actually tried our program on the page reader. initially, the system would not compile our program. it was not due to a catastrophic error in our program, but rather a hardware fault in the 8092 teleprogrammer. in trying to read the program onto tape after compilation, the system consistently failed. we finally gave up trying and recompiled from the scanned input sheet at the beginning of each conversion run. no one at the data center could explain our failure to load, but we must assume an intermittent or undetected hardware problem. during the job run it was imperative that the scanner be watched closely, as occasionally it would stop reading or fail to feed a sheet. these were not difficult problems but did require occasional attention by the center's customer engineer. on one occasion the scanner failed during our run, and we could not achieve a timely repair. we rescheduled for the next week and then experienced no problem. after our experiences with the 915 page reader at the data center, we felt that we knew as much about the equipment as any of the operators we met while doing our production runs. we would not hesitate to use the page reader again for a simple file conversion, and would continue to handle the operation ourselves, as the center operators were no better able to run our job.

acknowledgments
the author wishes to thank mr. eugene d. lourey for developing the program for this system. mr. curt herbert deserves recognition for the preliminary design for the system and for initiating the optical scanning activities. also, mr. carl o. sandberg, who was responsible for the many details of the conversion portion and who now maintains these programs, contributed many significant design parameters. the staff of the catalog division, too, deserve our gratitude for their file cleansing and data editing during and after conversion.

references
1. j. heston heald, the making of test-thesaurus of engineering scientific terms (final report of project lex, u.s. office of naval research, nov. 1967), ad 661,001.
2. william hammond, construction of the nasa thesaurus, computer processing support, final report (aries corp., 1968), n 68-28811.
3. william way, "subject heading authority list, computer prepared," american documentation 19: 188-99 (april 1968).
4. ellis mount and richard kollin, "analysis and revision of subject headings for applied science and technology index," special libraries 60: 639-46 (dec. 1969).
5. henriette d. avram, lenore s. maruyama, and john c. rather, "automation activities in the processing department of the library of congress," library resources and technical services 16: 195-239 (spring 1972).
building library community through social media
scott w. h. young and doralyn rossmann

abstract
in this article academic librarians present and analyze a model for community-building through social media. findings demonstrate the importance of strategy and interactivity via social media for generating new connections with library users. details of this research include successful guidelines for building community and developing engagement online with social media. by applying intentional social media practices, the researchers' twitter user community grew 100 percent in one year, with a corresponding 275 percent increase in user interactions. using a community analysis approach, this research demonstrates that the principles of personality and interactivity can lead to community formation for targeted user groups. discussion includes the strategies and research approaches that were employed to build, study, and understand user community, including user-type analysis and action-object mapping. from this research a picture of the library as a member of an active academic community comes into focus.

introduction
this paper describes an academic library's approach to building community through twitter. much of the literature offers guidance to libraries on approaches to using social media as a marketing tool. the research presented here reframes that conversation to explore the role of social media as it relates to building community. the researchers' university library formed a social media group and implemented a social media guide to bring an intentional, personality-rich, and interaction-driven approach to its social media activity. quantitative analyses reveal a significant shift and increase in twitter follower population and interactions, and suggest promising opportunities for social media to strengthen the library's ties with academic communities.

literature review
research in libraries has long brought a critical analysis to the value, purpose, and practical usage of social media. glazer asked of library facebook usage, "clever outreach or costly diversion?"1 three years later, glazer presented a more developed perspective on facebook metrics and the nature of online engagement, but social media was still described as "puzzling and poorly defined."2 vucovich et al. furthermore note that "the usefulness of [social networking tools] has often proven elusive, and evaluating their impact is even harder to grasp in library settings."3

scott w. h. young (swyoung@montana.edu) is digital initiatives librarian and doralyn rossmann (doralyn@montana.edu) is head of collection development, montana state university library, bozeman.
li and li similarly observe that there "seems to be some confusion regarding what exactly social media is."4 social media has been experimented with and identified variously as a tool for enhancing the image of libraries,5 as a digital listening post,6 or as an intelligence gathering tool.7 with such a variety of perspectives and approaches, the discussion around social media in libraries has been somewhat disjointed. if there is a common thread through library social media research, however, it ties together the broadcast-based promotion and marketing of library resources and services, what li calls "the most notable achievement of many libraries that have adopted social media."8 this particularly common approach has been thoroughly examined.9,10,11,12,13,14,15 in evaluating the use of facebook at western michigan university's waldo library, sachs, eckel, and langan found that promotion and marketing was the only "truly successful" use for social media.16 a survey of estonian librarians revealed that facebook "is being used mainly for announcements; it is reduplicating libraries' web site[s]. interestingly librarians don't feel a reason to change anything or to do something differently."17 with this widespread approach to social media, much of the library literature is predominated by exploratory descriptions of current usage and implementation methods under the banner of promoting resources by meeting users where they are on social media.18,19,20,21,22,23,24,25,26,27 this research is effective at describing how social media is used, but it often does not extend the discussion to address the more difficult and valuable question of why social media is used. the literature of library science has not yet developed a significant body of research around the practice of social media beyond the broadcast-driven, how-to focus on marketing, promotion, and public-relations announcements. this deficiency was recognized by saw, who studied social networking preferences of international and domestic australian students, concluding, "to date, the majority of libraries that use social networking have used it as a marketing and promotional medium to push out information and announcements. our survey results strongly suggest that libraries need to further exploit the strengths of different social networking sites."28 from this strong emphasis on marketing and best practices emerges an opportunity to examine social media from another perspective, community building, which may represent an untapped strength of social networking sites for libraries.
while research in library and information science has predominantly developed around social media as marketing resource, a small subset has begun to investigate the community-building capabilities of social media.29,30,31,32 by making users feel connected to a community and increasing their knowledge of other members, "sites such as facebook can foster norms of reciprocity and trust and, therefore, create opportunities for collective action."33 lee, yen, and hsiao studied the value of interaction and information sharing on social media: "a sense of belonging is achieved when a friend replies to or 'likes' a post on facebook."34 lee found that facebook users perceived real-world social value from shared trust and shared vision developed and expressed through information-sharing on social media. research from oh, ozkaya, and larose indicated that users who engaged in a certain quality of social media interactivity perceived an enhanced sense of community and life satisfaction.35 broader discussion of social media as a tool for community-building has been advanced within the context of political activity, where social media is identified as a method for organizing civic action and revolutionary protests.36,37,38 related research focuses on the online social connections and "virtual communities" developed around common interests such as religion,39 health,40 education,41 social interests and norms,42 politics,43 web-video sharing,44 and reading.45 in these analyses, social media is framed as an online instrument utilized to draw together offline persons. hofer notes that communities formed online through social media activity can generate a sense of "online bonding social capital."46 further marking the online/offline boundary, research from grieve et al. investigates the value of social connectedness in online contexts, suggesting that social connectedness on facebook "is a distinct construct from face-to-face social connectedness."47 grieve et al. acknowledges that the research design was predicated on the assumptive existence of an online/offline divide, noting "it is possible that such a separation does not exist."48 around this online/offline separation has developed "digital dualism," a theoretical approach that interrogates the false boundaries and contrasts between an online world as distinct from an offline world.49,50 sociologist zeynep tufekci expressed this concisely: "in fact, the internet is not a world; it's part of the world."51 a central characteristic of community-building through social media is that the "online" experience is so connected and interwoven with the "offline" experience as to create a single seamless experience. this concept is related to a foundational study from ellison, steinfield, and lampe, who identified facebook as a valuable subject of research because of its "heavy usage patterns and technological capacities that bridge online and offline connections."52 they conclude, "online social network sites may play a role different from that described in early literature on virtual communities. online interactions do not necessarily remove people from their offline world but may indeed be used to support relationships."53 this paper builds on existing online community research while drawing on the critical theory of "digital dualism" to argue that communities built through social media do not reside in a separate "online" space, but rather are one element of a much more significant and valuable form of holistic connectedness. our research represents a further step in shifting the focus of library social media research and practice from marketing to community building, recasting library-led social media as a tool that enables users to join together and share in the commonalities of research, learning, and the university community. as library social media practice advances within the framework of community, it moves from a one-dimensional online broadcast platform to a multidimensional socially connected space that creates value for both the library and library users.

method
in may 2012, montana state university library convened a social media group (smg) to guide our
this  concept  is  related  to  a  foundational  study  from   ellison,  steinfield,  and  lampe,  who  identified  facebook  as  a  valuable  subject  of  research  because   of  its  “heavy  usage  patterns  and  technological  capacities  that  bridge  online  and  offline   connections.”52  they  conclude,  “online  social  network  sites  may  play  a  role  different  from  that   described  in  early  literature  on  virtual  communities.  online  interactions  do  not  necessarily   remove  people  from  their  offline  world  but  may  indeed  be  used  to  support  relationships.”53   this  paper  builds  on  existing  online  community  research  while  drawing  on  the  critical  theory  of   “digital  dualism”  to  argue  that  communities  built  through  social  media  do  not  reside  in  a  separate   “online”  space,  but  rather  are  one  element  of  a  much  more  significant  and  valuable  form  of  holistic   connectedness.  our  research  represents  a  further  step  in  shifting  the  focus  of  library  social  media   research  and  practice  from  marketing  to  community  building,  recasting  library-­‐led  social  media  as   a  tool  that  enables  users  to  join  together  and  share  in  the  commonalities  of  research,  learning,  and   the  university  community.  as  library  social  media  practice  advances  within  the  framework  of   community,  it  moves  from  a  one-­‐dimensional  online  broadcast  platform  to  a  multidimensional   socially  connected  space  that  creates  value  for  both  the  library  and  library  users.   method     in  may  2012,  montana  state  university  library  convened  a  social  media  group  (smg)  to  guide  our     building  library  community  through  social  media  |  young  and  rossmann   23   social  media  activity.  the  formation  of  smg  marked  an  important  shift  in  our  social  media  activity   and  was  crucial  in  building  a  strategic  and  programmatic  focus  around  social  media.  this  internal   committee,  comprising  three  librarians  and  one  library  staff  member,  aimed  to  build  a  community   of  student  participants  around  the  twitter  platform.  smg  then  created  a  social  media  guide  to   provide  structure  for  our  social  media  program.  this  guide  outlines  eight  principal  components  of   social  media  activity  (see  table  1).     social  media  guide  component   twitter  focus   audience  focus   undergraduate  and  graduate  students   goals   connect  with  students  and  build  community   values   availability,  care,  scholarship   activity  focus   information  sharing;  social  interaction   tone  &  tenor   welcoming,  warm,  energetic   posting  frequency   daily,  with  regular  monitoring  of  subsequent  interactions   posting  categories   student  life,  local  community   posting  personnel   1  librarian,  approximately  .10  fte     table  1.  social  media  activity  components   prior  to  the  formation  of  smg,  our  twitter  activity  featured  automated  posts  that  lacked  a  sense  of   presence  and  personality.  after  the  formation  of  smg,  our  twitter  activity  featured  hand-­‐crafted   posts  that  possessed  both  presence  and  personality.  to  measure  the  effectiveness  of  our  social   media  program,  we  divided  our  twitter  activity  into  two  categories  based  on  the  may  2012  date  of   smg’s  formation:  phase  1  (pre-­‐smg)  and  phase  2  (post-­‐smg).  
phase 1 user data included followers 1-514, those users who followed the library between november 2008, when the library joined twitter, and april 2012, the last month before the library formed smg. phase 2 included followers 515-937, those users who followed the library between may 2012, when the library formed smg, and august 2013, the end date of our research period. using corresponding dates to our user analysis, phase 1 tweet data included the library's tweets 1-329, which were posted between november 2008 and april 2012, and phase 2 included the library's tweets 330-998, which were posted between may 2012 and august 2013 (table 2). for the purposes of this research, phase 1 and phase 2 users and tweets were evaluated as distinct categories so that all corresponding tweets, followers, and interactions could be compared in relation to the formation date of smg. within twitter, "followers" are members of the user community, "tweets" are messages to the community, and "interactions" are the user behaviors of favoriting, retweeting, or replying. favorites are most commonly employed when users like a tweet. favoriting a tweet can indicate approval, for instance. a user may also share another user's tweet with their own followers by "retweeting."

table 2. comparison of phase 1 and 2 twitter activity
  phase 1: followers 1-514; tweets 1-329; nov. 2008-april 2012
  phase 2: followers 515-937; tweets 330-998; may 2012-august 2013

we employed three approaches for evaluating our twitter activity: user type analysis, action-object mapping, and interaction analysis. user type analysis aims to understand our community from a broad perspective by creating categories of users following the library's twitter account. after reviewing the accounts of each member of our user community, we collected them into one of the following nine groups: alumni, business, community member, faculty, library, librarian, other, spam, and student. categorization was based on a manual review of information found from each user's biographical profile, tweet content, account name, and a comparison against campus directories. action-object mapping is a quantitative method that describes the relationship between the performance of an activity (the action) and an external phenomenon (the object). action-object mapping aims to describe the interaction process between a system and its users.54,55,56,57 within the context of our study, the system is twitter, the object is an individual tweet, and the action is the user behavior in response to the object, i.e., a user marking a tweet as a favorite, retweeting a tweet, or replying to a tweet. we collected our library's tweets into sixteen object categories: blog post, book, database, event, external web resource, librarian, library space, local community, other libraries/universities, photo from archive, topics (libraries), service, students, think tank, hortative, and workshop.
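the action-object mapping can be pictured as a per-category tally of tweets and of tweets that drew at least one action. the sketch below is illustrative; the tweet field names are assumed, not taken from the article.

    # illustrative sketch only: count, per object category, how many tweets were posted
    # and how many received at least one action (favorite, retweet, or reply).
    from collections import defaultdict

    def action_object_tally(tweets):
        tally = defaultdict(lambda: {"tweets": 0, "with_action": 0})
        for t in tweets:
            acted = (t["favorites"] + t["retweets"] + t["replies"]) > 0
            tally[t["category"]]["tweets"] += 1
            tally[t["category"]]["with_action"] += int(acted)
        return dict(tally)

    sample = [{"category": "student life", "favorites": 4, "retweets": 2, "replies": 0},
              {"category": "workshop", "favorites": 0, "retweets": 0, "replies": 0}]
    print(action_object_tally(sample))
    # -> {'student life': {'tweets': 1, 'with_action': 1}, 'workshop': {'tweets': 1, 'with_action': 0}}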
interaction analysis serves as an extension of action-object mapping and aims to provide further details about the level of interaction between a system and its users. for this study we created an associated metric, "interaction rate," that measures the rate at which each object category received an action. within the context of our study, we have treated the "action" of action-object mapping and the "interaction" of twitter as equivalents. to identify the interaction rate, we used the following formula: "number of tweets within an object category that received an action" divided by "total number of tweets within an object category." interaction rate was calculated for each object category and for all tweets in phase 1 and in phase 2.

results
the changes in approach to the library's twitter presence through smg and the social media guide are evident in this study's results (figure 1). an analysis of user types in phase 1 reveals a large portion, 48 percent, were business followers. in comparison, the business percentage decreased to 30 percent in phase 2. the student percentage increased from 6 percent in phase 1 to 28 percent in phase 2, representing a 366 percent increase in student users. as noted earlier, the social media guide component "audience focus" for twitter is "undergraduate and graduate students" and includes the "goal" to "connect with students and build community" (table 1). the increase in the percentage of students in the follower population and the decrease in the business percentage of the population suggest progress towards this goal.

figure 1. comparison of twitter users by type

the object categorization for phase 1 shows a heavily skewed distribution of tweets in certain areas, while phase 2 has a more even and targeted distribution reflecting implementation of the social media guide components (figure 2). in phase 1, workshops is the most tweeted category, with 36 percent of all posts. library space represents 18 percent of tweets, while library events is third with 17 percent. the remaining 13 categories range from 5 percent to a fraction of a percent of tweets. phase 2 shows a more balanced and intentional distribution of tweets across all object categories, with a strong focus on the social media guide "posting category" of "student life," which accounted for 25 percent of tweets. library space consists of 11 percent of tweets, and external web resource composes 9 percent of tweets. the remaining categories range from 8 percent to 1 percent of tweets.

figure 2. comparison of tweets by content category

interaction rates were low in most object categories in phase 1 (see figure 3).
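the interaction rate defined above reduces to a simple ratio. the sketch below is illustrative; the per-phase counts of interacted tweets were not published, so the figures of 41 and 313 are back-calculated from the reported 12.5 and 46.8 percent rates and the known tweet totals (329 and 669).

    # illustrative sketch only: interaction rate as the share of a category's tweets
    # that received at least one action, expressed as a percentage.
    def interaction_rate(tweets_in_category, tweets_with_action):
        if tweets_in_category == 0:
            return 0.0
        return 100.0 * tweets_with_action / tweets_in_category

    print(round(interaction_rate(329, 41), 1))    # -> 12.5, the reported phase 1 overall rate
    print(round(interaction_rate(669, 313), 1))   # -> 46.8, the reported phase 2 overall rate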
interaction rates were low in most object categories in phase 1 (see figure 3). given that the social media guide has an "activity focus" of "social interaction," a tweet category with a high percentage of posting and a low interaction rate suggests a disconnect between tweet posting and meeting stated goals. for example, workshops represented a large percentage (36 percent) of the tweets but yielded a 0 percent interaction rate. library space was 18 percent of tweets but had only a 2 percent interaction rate. eleven of the 16 categories in phase 1 had no associated actions and thus a 0 percent interaction rate. the interaction rate for phase 1 was 12.5 percent. in essence, our action-object data and interaction rate data show us that during phase 1 we created content most frequently about topics of low interest to our community while we tweeted less frequently about topics of high interest to our community.

figure 3. interaction rates, phase 1

in contrast to phase 1, phase 2 demonstrates an increase in interaction rate across nearly every object category (figure 4, figure 5), especially student life and local community.

figure 4. interaction rates, phase 2

figure 5. interaction rate comparison

the local community category of tweets had the highest interaction rate at 68 percent. the student life category had the second highest interaction rate at 62 percent. only 2 of the 16 categories in phase 2 had no associated actions and thus a 0 percent interaction rate. the interaction rate for phase 2 was 46.8 percent, an increase of 275 percent from phase 1. in essence, our action-object data and interaction rate data show us that during phase 2 we created content most frequently about topics of higher interest to our community while we tweeted less frequently about topics of low interest to our community.

discussion

this research suggests a strong community-building capability of social media at our academic library. the shift in user types from phase 1 to phase 2, notably the increase in student twitter followers, indicates that the shape of our twitter community was directly affected by our social media program. likewise, the marked increase in interaction rate between phase 1 and phase 2 suggests the effectiveness of our programmatic community-focused approach.

the montana state university library social media program was fundamentally formed around an approach described by glazer: "be interesting, be interested."58 our twitter user community has thrived since we adopted this axiom. we have interpreted "interesting" as sharing original personality-rich content with our community and "interested" as regularly interacting with and responding to members of our community. the twofold theme of personality-rich content and interactivity-based behavior has allowed us to shape our phase 2 user community.
prior to the formation of smg, social media at the msu library was a rather drab affair. the library twitter account during that time was characterized by automated content, low responsiveness, no dedicated personnel, and no strategic vision. our resulting twitter community was composed mostly of businesses, at 48 percent of followers, with students representing just 6 percent of our followers. the resulting interaction rate of 12.5 percent reflects the broadcast-driven approach, personality-devoid content, and disengaged community that together characterized phase 1. following the formation of smg, the library twitter account benefitted from original and unique content, high responsiveness, dedicated personnel, and a strategic, goal-driven vision. our phase 2 twitter community underwent a transformation, with business representation decreasing to 30 percent and student representation increasing to 28 percent. the resulting interaction rate of 46.8 percent reflects our refocused community-driven program, personality-rich content, and engaged community of phase 2.

figure 6. typical phase 1 tweet

figure 6 illustrates a typical phase 1 tweet. the object category for this tweet is database, and it yielded no actions. the announcement of a new database trial was auto-generated from our library blog, a common method for sharing content during phase 1. this tweet is problematic for community-building for two primary reasons: its style and content lack the sense and personality of a human author, and it offers no compelling opportunity for interaction.

figure 7. typical phase 2 tweet

figure 7 illustrates a typical phase 2 tweet. the object category for this tweet is student life, and it yielded 6 actions (2 retweets and 4 favorites). the content relates to a meaningful and current event for our target student user community and is fashioned in such a way as to invite interaction by providing a strong sense of relevancy and personality. figure 8 further demonstrates the community effect of phase 2. in this example we have reminded our twitter community of the services available through the library, and one student user has replied. during our phase 2 twitter activity, we prioritized responsiveness, availability, and scholarship with the goal of connecting with students and building a sense of community. in many ways the series of tweets shown in figure 8 encapsulates our social media program. we were able to deliver resources to this student, who then associated these interactions with a sense of pride in the university. this example illustrates the overall connectedness afforded by social media. in contacting the library twitter account, this user asked a real-world research question. neither his inquiry nor our response was located strictly within an online world. while we pointed this user to an online resource, his remarks indicated "offline" feelings of satisfaction with the interaction.
lee  and  oh  found  that  social  media  interactivity  and  information  sharing  can  create  a  shared   vision  that  leads  to  a  sense  of  community  belonging.59,60  by  creating  personality-­‐rich  content  that   invites  two-­‐way  interaction,  our  strategic  social  media  program  has  helped  form  a  holistic   community  of  users  around  our  twitter  activity.         building  library  community  through  social  media  |  young  and  rossmann   31     figure  8.  phase  2  example,  community  effect   currently  our  work  addresses  the  formation  of  community  through  social  media.  a  next  step  will   introduce  a  wider  scope  by  addressing  the  value  of  community  formed  through  social  media.   there  is  a  rich  area  of  study  around  the  relationship  between  social  media  activity,  perceived   sense  of  community  and  connectedness,  and  student  success.  61,62,63,64,65  further  research  along   this  line  will  allow  us  to  explore  whether  a  library-­‐led  social  media  community  can  serve  as  an  aid   in  undergraduate  academic  performance  and  graduation  rate.  continued  and  extended  analysis     information  technology  and  libraries  |  march  2015      32   will  allow  us  to  increase  the  granularity  of  results,  for  example  by  mapping  user  types  to  action-­‐ object  pairs  and  identifying  the  interaction  rate  for  particular  users  such  as  students  and  faculty.   conclusion   in  articulating  and  realizing  an  intentional  and  strategic  social  media  program,  we  have  generated   results  that  demonstrate  the  community-­‐building  capability  of  social  media.  over  the  course  of   one  year,  we  transformed  our  social  media  activity  from  personality-­‐devoid  one-­‐way  broadcasting   to  personality-­‐rich  two-­‐way  interacting.  the  research  that  followed  this  fundamental  shift   provided  new  information  about  our  users  that  enabled  us  to  tailor  our  twitter  activity  and  shape   our  community  around  a  target  population  of  undergraduate  students.  in  so  doing,  we  have   formed  a  community  that  has  shown  new  interest  in  social  media  content  published  by  the  library.   following  the  application  of  our  social  media  program,  our  student  user  community  grew  by  366   percent  and  the  rate  of  interaction  with  our  community  grew  by  275  percent.  our  research   demonstrates  the  value  of  social  media  as  a  community-­‐building  tool,  and  our  model  can  guide   social  media  in  libraries  toward  this  purpose.   references   1. harry  glazer,  “clever  outreach  or  costly  diversion?  an  academic  library  evaluates  its   facebook  experience,”  college  &  research  libraries  news  70,  no.  1  (2009):  11,   http://crln.acrl.org/content/70/1/11.full.pdf+html.     2. harry  glazer,  “‘likes’  are  lovely,  but  do  they  lead  to  more  logins?  developing  metrics   for  academic  libraries’  facebook  pages,”  college  &  research  libraries  news  73,  no.  1   (2012):  20,  http://crln.acrl.org/content/73/1/18.full.pdf+html.   3. lee  a.  vucovich  et  al.,  “is  the  time  and  effort  worth  it?  one  library’s  evaluation  of  using   social  networking  tools  for  outreach,”  medical  reference  services  quarterly  32,  no.  1   (2013):  13,  http://dx.doi.org/10.1080/02763869.2013.749107.   4. 
xiang  li  and  tang  li,  “integrating  social  media  into  east  asia  library  services:  case   studies  at  university  of  colorado  and  yale  university,”  journal  of  east  asian  libraries  157,   no.  1  (2013):  24,   https://ojs.lib.byu.edu/spc/index.php/jeal/article/view/32663/30799.     5. colleen  cuddy,  jamie  graham,  and  emily  g.  morton-­‐owens,  “implementing  twitter  in  a   health  sciences  library,”  medical  reference  services  quarterly  29,  no.  4  (2010),   http://dx.doi.org/10.1080/02763869.2010.518915.     6. steven  bell,  “students  tweet  the  darndest  things  about  your  library—and  why  you   need  to  listen,”  reference  services  review  40,  no.  2  (2012),   http://dx.doi.org/10.1108/00907321211228264.         building  library  community  through  social  media  |  young  and  rossmann   33   7. robin  r.  sewell,  “who  is  following  us?  data  mining  a  library’s  twitter  followers,”   library  hi  tech  31,  no.  1  (2013),  http://dx.doi.org/10.1108/07378831311303994.     8. li  and  li,  “integrating  social  media  into  east  asia  library  services,”  25.   9. remi  castonguay,  “say  it  loud  spreading  the  word  with  facebook  and  twitter,”  college  &   research  libraries  news  72,  no.  7  (2011),   http://crln.acrl.org/content/72/7/412.full.pdf+html.     10. dianna  e.  sachs,  edward  j.  eckel,  and  kathleen  a.  langan,  “striking  a  balance:  effective   use  of  facebook  in  an  academic  library,”  internet  reference  services  quarterly  16,  nos.  1– 2  (2011),  http://dx.doi.org/10.1080/10875301.2011.572457.   11. christopher  chan,  “marketing  the  academic  library  with  online  social  network   advertising,”  library  management  33,  no.  8,  (2012),   http://dx.doi.org/10.1108/01435121211279849.   12. melissa  dennis,  “outreach  initiatives  in  academic  libraries,  2009–2011,”  reference   services  review  40,  no.  3,  (2012),  http://dx.doi.org/10.1108/00907321211254643.   13. melanie  griffin  and  tomaro  i.  taylor,  “of  fans,  friends,  and  followers:  methods  for   assessing  social  media  outreach  in  special  collections  repositories,”  journal  of  web   librarianship  7,  no.  3  (2013),  http://dx.doi.org/10.1080/19322909.2013.812471.     14. lili  luo,“marketing  via  social  media:  a  case  study,”  library  hi  tech  31,  no.  3  (2013),   http://dx.doi.org/10.1108/lht-­‐12-­‐2012-­‐0141.     15. li  and  li,  “integrating  social  media  into  east  asia  library  services,”  25.   16. sachs,  eckel,  and  langan,  “striking  a  balance,”  48.   17. jaana  roos,  “why  university  libraries  don’t  trust  facebook  marketing?,”  proceedings  of   the  21st  international  bobcatsss  conference  (2013):  164,   http://bobcatsss2013.bobcatsss.net/proceedings.pdf.   18. noa  aharony,  “twitter  use  in  libraries:  an  exploratory  analysis,”  journal  of  web   librarianship  4,  no.  4  (2010),  http://dx.doi.org/10.1080/19322909.2010.487766.   19. a.  r.  riza  ayu  and  a.  abrizah,  “do  you  facebook?  usage  and  applications  of  facebook   page  among  academic  libraries  in  malaysia,”  international  information  &  library  review   43,  no.  4  (2011),  http://dx.doi.org/10.1016/j.iilr.2011.10.005.   20. alton  y.  k.  chua  and  dion  h  goh.,  “a  study  of  web  2.0  applications  in  library  websites,”   library  &  information  science  research  32,  no.  3  (2010),   http://dx.doi.org/10.1016/j.lisr.2010.01.002.         
information  technology  and  libraries  |  march  2015      34   21. andrea  dickson  and  robert  p.  holley,  “social  networking  in  academic  libraries:  the   possibilities  and  the  concerns,”  new  library  world  111,  nos.  11/12  (2010),   http://dx.doi.org/10.1108/03074801011094840.     22. valerie  forrestal,  “making  twitter  work:  a  guide  for  the  uninitiated,  the  skeptical,  and   the  pragmatic,”  reference  librarian  52,  nos.  1–2  (2010),   http://dx.doi.org/10.1080/02763877.2011.527607.   23. gang  wan,  “how  academic  libraries  reach  users  on  facebook,”  college  &  undergraduate   libraries  18,  no.  4  (2011),  http://dx.doi.org/10.1080/10691316.2011.624944.   24. dora  yu-­‐ting  chen,  samuel  kai-­‐wah  chu,  and  shu-­‐qin  xu,  “how  do  libraries  use  social   networking  sites  to  interact  with  users,”  proceedings  of  the  american  society  for   information  science  and  technology  49,  no.  1  (2012),   http://dx.doi.org/10.1002/meet.14504901085.   25. rolando  garcia-­‐milian,  hannah  f.  norton,  and  michele  r.  tennant,  “the  presence  of   academic  health  sciences  libraries  on  facebook:  the  relationship  between  content  and   library  popularity,”  medical  reference  services  quarterly  31,  no.  2  (2012),   http://dx.doi.org/10.1080/02763869.2012.670588.   26. elaine  thornton,  “is  your  academic  library  pinning?  academic  libraries  and  pinterest,”   journal  of  web  librarianship  6,  no.  3  (2012),   http://dx.doi.org/10.1080/19322909.2012.702006.   27. katie  elson  anderson  and  julie  m.  still,  “librarians’  use  of  images  on  libguides  and  other   social  media  platforms,”  journal  of  web  librarianship  7,  no.  3  (2013),   http://dx.doi.org/10.1080/19322909.2013.812473.   28. grace  saw,  “social  media  for  international  students—it’s  not  all  about  facebook,”  library   management  34,  no.  3  (2013):  172,  http://dx.doi.org/10.1108/01435121311310860.   29. ligaya  ganster  and  bridget  schumacher,  “expanding  beyond  our  library  walls:  building   an  active  online  community  through  facebook,”  journal  of  web  librarianship  3,  no.  2   (2009),  http://dx.doi.org/10.1080/19322900902820929.     30. sebastián  valenzuela,  namsu  park,  and  kerk  f.  kee,  “is  there  social  capital  in  a  social   network  site?  facebook  use  and  college  students’  life  satisfaction,  trust,  and   participation,”  journal  of  computer-­‐mediated  communication  14,  no.  4  (2009),   http://dx.doi.org/10.1111/j.1083-­‐6101.2009.01474.x.   31. nancy  kim  phillips,  “academic  library  use  of  facebook:  building  relationships  with   students,”  journal  of  academic  librarianship  37,  no.  6  (2011),   http://dx.doi.org/10.1016/j.acalib.2011.07.008.       building  library  community  through  social  media  |  young  and  rossmann   35   32. tina  mccorkindale,  marcia  w.  distaso,  and  hilary  fussell  sisco,  “how  millennials  are   engaging  and  building  relationships  with  organizations  on  facebook,”  journal  of  social   media  in  society  2,  no.  1  (2013),   http://thejsms.org/index.php/tsmri/article/view/15/18.     33. valenzuela,  park,  and  kee,  “is  there  social  capital  in  a  social  network  site?,”  882.   34. maria  r.  lee,  david  c.  yen,  and  c.  y.  
hsiao,  “understanding  the  perceived  community   value  of  facebook  users,”  computers  in  human  behavior  35  (february  2014):  355,   http://dx.doi.org/10.1016/j.chb.2014.03.018.   35. hyun  jung  oh,  elif  ozkaya,  and  robert  larose,  “how  does  online  social  networking   enhance  life  satisfaction?  the  relationships  among  online  supportive  interaction,  affect,   perceived  social  support,  sense  of  community,  and  life  satisfaction,”  computers  in  human   behavior  30  (2014),  http://dx.doi.org/10.1016/j.chb.2013.07.053.   36. rowena  cullen  and  laura  sommer,  “participatory  democracy  and  the  value  of  online   community  networks:  an  exploration  of  online  and  offline  communities  engaged  in  civil   society  and  political  activity,”  government  information  quarterly  28,  no.  2  (2011),   http://dx.doi.org/10.1016/j.giq.2010.04.008.   37. mohamed  nanabhay  and  roxane  farmanfarmaian,  “from  spectacle  to  spectacular:  how   physical  space,  social  media  and  mainstream  broadcast  amplified  the  public  sphere  in   egypt’s  ‘revolution,’”  journal  of  north  african  studies  16,  no.  4  (2011),   http://dx.doi.org/10.1080/13629387.2011.639562.   38. nermeen  sayed,  “towards  the  egyptian  revolution:  activists  perceptions  of  social  media   for  mobilization,”  journal  of  arab  &  muslim  media  research  4,  nos.  2–3  (2012):  273–98,   http://dx.doi.org/10.1386/jammr.4.2-­‐3.273_1.   39. morton  a.  lieberman  and  andrew  winzelberg,  “the  relationship  between  religious   expression  and  outcomes  in  online  support  groups:  a  partial  replication,”  computers  in   human  behavior  25,  no.  3  (2009),  http://dx.doi.org/10.1016/j.chb.2008.11.003.   40. christopher  e.  beaudoin  and  chen-­‐chao  tao,  “benefiting  from  social  capital  in  online   support  groups:  an  empirical  study  of  cancer  patients,”  cyberpsychology  &  behavior:  the   impact  of  the  internet,  multimedia  and  virtual  reality  on  behavior  and  society  10,  no.  4   (2007),  http://dx.doi.org/10.1089/cpb.2007.9986.     41. manuela  tomai  et  al.,  “virtual  communities  in  schools  as  tools  to  promote  social  capital   with  high  schools  students,”  computers  &  education  54,  no.  1  (2010),   http://dx.doi.org/10.1016/j.compedu.2009.08.009.         information  technology  and  libraries  |  march  2015      36   42. edward  shih-­‐tse  wang  and  lily  shui-­‐lien  chen,  “forming  relationship  commitments  to   online  communities:  the  role  of  social  motivations,”  computers  in  human  behavior  28,   no.  2  (2012),  http://dx.doi.org/10.1016/j.chb.2011.11.002.   43. pippa  norris  and  david  jones,  “virtual  democracy,”  harvard  international  journal  of   press/politics  3,  no.  2  (1998),  http://dx.doi.org/10.1177/1081180x98003002001.     44. xu  cheng,  jiangchuan  liu,  and  cameron  dale,  “understanding  the  characteristics  of   internet  short  video  sharing:  a  youtube-­‐based  measurement  study,”  ieee  transactions   on  multimedia  15,  no.  5  (2013),  http://dx.doi.org/10.1109/tmm.2013.2265531.   45. nancy  foasberg,  “online  reading  communities:  from  book  clubs  to  book  blogs,”  journal   of  social  media  in  society  1,  no.1  (2012),   http://thejsms.org/index.php/tsmri/article/view/3/4.     46. 
matthias  hofer  and  viviane  aubert,  “perceived  bridging  and  bonding  social  capital  of   twitter:  differentiating  between  followers  and  followees,”  computers  in  human  behavior   29,  no.  6  (2013):  2137,  http://dx.doi.org/10.1016/j.chb.2013.04.038.     47. rachel  grieve  et.  al.,  “face-­‐to-­‐face  or  facebook:  can  social  connectedness  be  derived   online?,”  computers  in  human  behavior  29,  no.  3  (2013):  607,   http://dx.doi.org/10.1016/j.chb.2012.11.017.   48. ibid.,  608.   49. nathan  jurgenson,  “when  atoms  meet  bits:  social  media,  the  mobile  web  and  augmented   revolution,”  future  internet  4,  no.  1  (2012),  http://dx.doi.org/10.3390/fi4010083.   50. r.  stuart  geiger,  “bots,  bespoke,  code  and  the  materiality  of  software  platforms,”   information,  communication  &  society  17,  no.  3  (2014),   http://dx.doi.org/10.1080/1369118x.2013.873069.     51. zeynep  tufekci,  “the  social  internet:  frustrating,  enriching,  but  not  lonely,”  public   culture  26,  no.  1,  iss.  72  (2013):  14,  http://dx.doi.org/10.1215/08992363-­‐2346322.   52. nicole  b.  ellison,  charles  steinfield,  and  cliff  lampe,  “the  benefits  of  facebook  ‘friends’:   social  capital  and  college  students’  use  of  online  social  network  sites,”  journal  of   computer-­‐mediated  communication  12,  no.  4  (2007):  1144,   http://dx.doi.org/10.1111/j.1083-­‐6101.2007.00367.x.     53. ibid.,  1165.   54. roger  brown,  a  first  language:  the  early  stages  (cambridge:  harvard  university  press,   1973).       building  library  community  through  social  media  |  young  and  rossmann   37   55. mimi  zhang  and  bernard  j.  jansen,  “using  action-­‐object  pairs  as  a  conceptual  framework   for  transaction  log  analysis,”  in  handbook  of  research  on  web  log  analysis,  edited  by   bernard  j.  jansen,  amanda  spink,  and  isak  taksa  (hershey,  pa:  igi,  2008).   56. bernard  j.  jansen  and  mimi  zhang,  “twitter  power:  tweets  as  electronic  word  of  mouth,”   journal  of  the  american  society  for  information  science  &  technology  60,  no.  11  (2009),   http://dx.doi.org/10.1002/asi.v60:11.     57. sewell,  “who  is  following  us?”     58. glazer,  “‘likes’  are  lovely,”  20.   59. lee,  yen,  and  hsiao,  “understanding  the  perceived.“   60. oh,  ozkaya,  and  larose,  “how  does  online  social  networking.”     61. reynol  junco,  greg  heiberger,  and  eric  loken,  “the  effect  of  twitter  on  college  student   engagement  and  grades,”  journal  of  computer  assisted  learning  27,  no.  2  (2011),   http://dx.doi.org/10.1111/j.1365-­‐2729.2010.00387.x.   62. susannah  k.  brown  and  charles  a.  burdsal,  “an  exploration  of  sense  of  community  and   student  success  using  the  national  survey  of  student  engagement,”  journal  of  general   education  61,  no.  4  (2012),  http://dx.doi.org/10.1353/jge.2012.0039.   63. jill  l.  creighton  et  al.,  “i  just  look  it  up:  undergraduate  student  perception  of  social  media   use  in  their  academic  success,”  journal  of  social  media  in  society  2,  no.  2  (2013),   http://thejsms.org/index.php/tsmri/article/view/48/25.     64. david  c.  deandrea  et  al.,  “serious  social  media:  on  the  use  of  social  media  for  improving   students’  adjustment  to  college,”  the  internet  and  higher  education  15,  no.  1  (2012),   http://dx.doi.org/10.1016/j.iheduc.2011.05.009.   65. 
rebecca  gray  et  al.,  “examining  social  adjustment  to  college  in  the  age  of  social  media:   factors  influencing  successful  transitions  and  persistence,”  computers  &  education  67   (2013),  http://dx.doi.org/10.1016/j.compedu.2013.02.021.     user experience methods and maturity in academic libraries articles user experience methods and maturity in academic libraries scott w. h. young, zoe chao, and adam chandler information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11787 scott w. h. young (swyoung@montana.edu) is ux and assessment librarian, montana state university. zoe chao (chaoszuyu@gmail.com) is ux designer, truist financial. adam chandler (alc28@cornell.edu) is director of automation, assessment, and post-cataloging services, cornell university. abstract this article presents a mixed-methods study of the methods and maturity of user experience (ux) practice in academic libraries. the authors apply qualitative content analysis and quantitative statistical analysis to a research dataset derived from a survey of ux practitioners. results reveal the type and extent of ux methods currently in use by practitioners in academic libraries. themes extracted from the survey responses also reveal a set of factors that influence the development of ux maturity. analysis and discussion focus on organizational characteristics that influence ux methods and maturity. the authors conclude by offering a library-focused maturity scale with recommended practices for advancing ux maturity in academic libraries. introduction user experience (ux) is a design practice for creating tools and services from a user-centered perspective. academic libraries have been practicing ux for some time, with ux methods having been incorporated across the profession. however, there has been a lack of empirical data showing the extent of ux methods in use or state of ux maturity in libraries. to help illuminate these areas, we distributed a survey to ux practitioners working in academic libraries that inquired into methods and maturity. we followed a mixed-methods approach involving both qualitative content analysis and quantitative statistical analysis to analyze the dataset. our results reveal the mostand least-common ux methods currently in use in academic libraries. results also demonstrate specific organizational characteristics that help and hinder ux maturity. we conclude by offering a set of strategies for reaching higher levels of ux maturity. 
background and motivation: ux in academic libraries ux has been represented in the literature of library and information science for at least two decades, when “the human interaction involved in service use” was recognized as a factor affecting the value and impact of libraries.1 the practice of ux has expanded and evolved and is now a growing specialty in the librarianship profession.2 ux in libraries is motivated by a call to actively pay close attention to users’ unique and distinctive requirements, which allows libraries to more effectively design services for our communities.3 as a practice, ux is now beginning to be represented in graduate curricula, public services and research support, access services, space design, and web design.4 with its attunement to a set of practices and principles, ux can be viewed as a research and design methodology similar and related to other methodologies that focus on users, services, problem solving, participation, collaboration, and qualitative data analysis .5 notably, ux is related to human-centered design, service design, and participatory design.6 mailto:swyoung@montana.edu mailto:chaoszuyu@gmail.com mailto:alc28@cornell.edu information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 2 specific methods of ux practice are today wide-ranging. they include surveys, focus groups, interviews, contextual inquiry, journey mapping, usability testing, personas, card sorting, a/b testing, ecology maps, observations, ethnography, prototyping, and blueprinting.7 some ux methods are incorporated into agile development processes.8 though tools and techniques are available to library ux practitioners in abundance, the rate of adoption of these tools is less understood. in a notable contribution to this question, pshock showed through a nation-wide survey that the most familiar ux methods among library practitioners included usability testing, surveys, and focus groups.9 the question of methods is related to the question of maturity—how advanced is library ux practice? in addition to the rate of adoption of methods and tools, several different ux maturity models have been advanced in recent years. 
priestner derives maturity from four factors: culture of innovation, infrastructure agility, acceptance of failure, and library user focus.10 in discussing ux capacity in libraries, macdonald proposes a six-stage maturity model: unrecognized, recognized, considered, implemented, integrated, and institutionalized.11 sharon defines maturity as a combination of staff resources and organizational buy-in.12 similarly, sheldon-hess proposes a five-level scale of ux maturity, based primarily on the degree of implementation of ux practice and user-centered thinking in an organization.13 and even earlier, nielsen proposed an eight-level scale of ux maturity, starting with "hostility toward usability" and concluding with a "user-driven" organization.14 after reviewing a number of different maturity models, anderson reports that the most common hierarchies include the following steps: (1) absence/unawareness of ux research, (2) ux research awareness and ad hoc research, (3) adoption of ux research into projects, (4) maturing of ux research into an organizational focus, (5) integrated ux research across strategy, and (6) complete ux research culture.15 the field of library ux shows a clear and compelling interest in ux maturity, and we can benefit from further empirical evidence that can help illuminate the current state and future progress toward ux maturity, including the rate of adoption of methods, resource allocation toward ux, and organizational buy-in. the research presented in this paper is motivated by the need to provide current and comprehensive data to answer questions related to ux maturity in academic libraries.

methods

research questions

the research questions for this study are the following:

• rq1: how mature is ux practice within academic libraries?
• rq2: what factors influence ux maturity?

to answer these questions, we distributed a survey to ux practitioners working in academic libraries. survey responses were analyzed qualitatively using content analysis and quantitatively using statistical analysis.

survey participants

the team members sent out the survey on may 23, 2018, to library profession electronic discussion lists.16 of the 87 received responses, 74 included an institution name. we identified size and setting classification for these institutions using the carnegie classification of institutions of higher education (see table 1).17 eight of them could not be mapped to the carnegie classification because they are outside the united states (n = 6) or of different scopes (one research lab and one information school). six schools have more than one response; these responses are treated separately to represent the diversity of opinion and experience within an organization.

classification             response count   percentage
four-year, large           49[18]           56
four-year, medium          10[19]           11
university outside us      6                7
four-year, small           5                6
non-university             2                2
four-year, very small      1                1
two-year, very large       1                1
unspecified                13               15

table 1. institutional profiles of survey respondents, with response counts.

materials and procedure

our online survey was organized into two main parts. after an initial informed consent section, the survey investigated (1) demographics and ux methods and (2) ux maturity.
demographics and ux methods in the first main part of the survey, participants were asked to select among 20 different ux methods that “you personally use at least every year or two at your institution.” the list of methods is derived from the ux research cheat sheet by nielsen norman group.20 participants were asked to complete an optional free-text response question: “would you like to add a comment clarifying the way you completed [this question]?” ux maturity in the second main part of the survey, participants were asked to identify the ux maturity stage that “properly describes the current ux status” in their organization. the stages were adapted from the eight-stage scale of ux maturity proposed by nielsen norman group: • stage 1: hostility toward usability • stage 2: developer-centered ux • stage 3: skunkworks ux • stage 4: dedicated ux budget • stage 5: managed usability • stage 6: systematic user-centered design process • stage 7: integrated user-centered design information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 4 • stage 8: user-driven organization we concluded the survey by asking participants to optionally “explain why you selected that stage” with a free-text response. research data analysis content analysis we followed the methodology of content analysis.21 each qualitative survey response functioned as a meaning unit, with meaning units sorted into themes and subthemes. each article author coded units independently; themes were resolved through discussion among the author group. the process of coding via content analysis allowed us to identify overarching trends in ux practice and maturity. results are further discussed below. statistical analysis data preparation and statistical analysis were conducted using r version 3.4.1 (see table 2 for full r package). base r was used for our statistical analysis. other r packages utilized in the project are listed in the table below. r package name version ggplot2 3.0.0 tibble 2.1.1 dplyr 0.7.5 tidyr 0.8.1 stringr 1.4.0 readr 1.1.1 readxl 1.3.1 table 2. r packages used in the analysis data preparation the following steps were taken in the data analysis: 1. content analysis into themes (see above) 2. normalize institution names. we received more than one response from a few institutions. for these, the responses were treated as separate responses that happened to have the same demographics. 3. for responses that included institution names, we added a total student population variable to the response using values derived from wikipedia and the carnegie classification of institutions of higher education. information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 5 4. for variables we derived during the content analysis we coded them as 0 or 1 dummy variables, that is, 0 = not present, and 1 = present. coding them in this way allows us to bring them into a multiple linear regression model. 5. using an r script, we tested each response for the presence of the content analysis, 0 or 1. 6. plots were created using the r ggplot2 library. 7. linear regression models were conducted using the base r lm function. research dataset dataset, survey instrument, and r code are available through dryad at https://doi.org/10.5061/dryad.jwstqjq5d.22 survey respondents eighty-seven participants responded to one or more components of the survey. see table 3 for a breakdown of survey responses. 
survey question responses ux methods multiple choice: “please check the following ux methods that you personally use at least every year or two at your institution.” 81 ux methods free-text response: “would you like to add a comment clarifying the way you completed [the question related to ux methods]?” 20 ux maturity stage multiple choice: “which of the following [maturity stages] do you think properly describes the current ux status in your organization?” 79 ux maturity stage free-text response: “please explain why you selected that stage.” 54 table 3. survey responses. results our research results demonstrate that certain characteristics of a library organization are related to ux maturity. these characteristics include the type and extent of ux methods that are currently in use, as well as organizational factors such as leadership support, staffing, and collaboration. we further explicate below according to our two research questions. rq1: how mature is user experience practice within academic libraries? our survey also asked participants to identify which stage of the nielsen norman group maturity scale “properly describes the current ux status” in their organization. our findings indicate that most libraries are in a low-to-middle range of maturity, with more than 75% of respondents placing their organization at either stage 3, stage 4, or stage 5 (figure 1). https://doi.org/10.5061/dryad.jwstqjq5d information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 6 figure 1. histogram of responses by stage, showing that the majority of respondents placed their organization at either stage 3, stage 4, or stage 5. rq2: what factors influence ux maturity? overview of statistical analysis results we use linear regression for two different applications in this study (see appendix a for glossary of terms related to statistical analysis). the process of creating a statistical model allows us to see, with varying degrees of confidence, the impact of different variables on ux maturity stage. the results of the linear regression help us to tease out the variables with the most predictive value. using certain methods does not cause the library to be at a higher stage; rather, libraries that use certain methods tend to be at a higher stage, statistically. that is what is meant by “predictive” in this context. linear regression provides a ground truth in what we think we are seeing in survey responses: a useful general principle in science is that when you don’t know the true form of a relationship, start with something simple. a linear equation is perhaps the simplest way to describe a relationship between two or more variables and still get reasonably accurate predictions.23 the other reason we are using the linear regression output is to inform a possible future version of a ux maturity survey instrument, one more finely tuned to libraries than the nielsen instrument alone that we used in this iteration.24 we feel that our use of multiple linear regression is appropriate and helpful given the exploratory nature of our study. the complete output is available at https://doi.org/10.5061/dryad.jwstqjq5d. size of institution we used the institution’s student population, the number of full-time enrolled students, as our proxy for the size of the library. 
our assumption is that larger enrollment generally means a larger number of library staff. there are different ways that ux maturity level could be compared with student enrollment; because the range in our sample is very wide, from 1,098 to 200,000 students across the sample of institutions, we attempted to control for the vast differences in size between the smallest and largest institutions by sorting them from smallest to largest, 1 to 69 (the total number of cases in our dataset with both a stage and a population defined), and then assigning that rank to the institution as an additional demographic variable. we then created a simple linear regression model comparing maturity stage as a function of ranked size. the null hypothesis is that there is no relationship between ranked size of institution and stage. stage is the response variable and ranked size of the institution is the explanatory variable. the adjusted r-squared for this relationship is 0.027. this means that only about 3% of the variance is accounted for by the ranked size of the institution. the probability, or "p-value," of getting our observed result if the null hypothesis is true for this relationship is 0.095 (almost 10%). this exceeds the standard .05 confidence level commonly used in statistical analysis. therefore the size of the institution is not a reliable predictor of ux maturity level in our sample, a counterintuitive finding. the full statistical summary is available in the appendix.

methods currently in use by academic libraries

our next rq2 finding relates to the type and extent of ux methods that are currently in use in academic libraries. our survey asked participants to select which ux methods "you personally use at least every year or two at your institution." user surveys, usability testing, and user interviews stand out as the most commonly used. figure 2 shows response counts for all of the methods in the survey.

figure 2. number of respondents that selected each method in the survey, showing the type and extent of ux methods currently in use in academic libraries.

we then examined the number of methods in use per institution compared to the reported maturity stage (figure 3). the number of methods used per institution illustrates a trend: more methods used at an institution generally means the institution is at a higher stage of maturity.

figure 3. the number of methods used per institution, illustrating that more methods currently in use at an institution generally indicates a higher level of maturity.

another way of representing the same two variables (reported number of methods and maturity stage) is with a scatterplot and statistical test (figure 4). in this simple linear regression model we have two variables: the response variable is stage and the explanatory variable is the number of ux methods used in the past two years. in plotting these two variables on a chart, we can draw a line that minimizes the distance between the line and all of the points on the plot. like the chart above, the linear relationship between total methods and stage is clearly visible.
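the ranking step and the two simple regressions described above can be sketched with base r; the data frame below is invented and its column names are assumptions, so the output will not reproduce the study's coefficients (the deposited code and data are available through dryad).

# minimal sketch (invented example data): ranked size and two simple linear models
responses <- data.frame(
  stage         = c(3, 5, 4, 2, 6, 4),                        # self-reported maturity stage
  enrollment    = c(1098, 42000, 15000, 2300, 200000, 30000),
  total_methods = c(4, 12, 7, 2, 15, 8)                        # number of ux methods in use
)

# assign a rank from smallest to largest enrollment, as described above
responses$size_rank <- rank(responses$enrollment)

# maturity stage as a function of ranked institution size
size_model <- lm(stage ~ size_rank, data = responses)
summary(size_model)       # the study reported an adjusted r-squared of about 0.027

# maturity stage as a function of the number of ux methods in use
methods_model <- lm(stage ~ total_methods, data = responses)
summary(methods_model)    # total methods accounted for roughly 18% of the variance in the study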
the total number of methods practiced accounts for about 18% of the variance when predicting the correct maturity stage. (recall from our discussion about ranked size of institution that rank accounts for less than 3% of the variation, and is not even statistically significant.) in this case, the p-value is far below the 0.05 threshold, meaning the likelihood that we are seeing a relationship by random chance is very low. therefore total number of methods is predictive of stage. generally, the more methods respondents chose, the higher the maturity stage. we can see from this data that the number of methods used is more predictive of maturity stage than institution size. information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 10 figure 4. maturity stage compared against total number of methods used, showing the positive relationship between number of ux methods used and ux maturity stage. for a more granular view, figure 5 shows the relation of specific ux methods used in different ux research phases (as categorized in the survey question, with methods organized by discovery, exploration, listening, and testing) to reported maturity stage. information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 11 figure 5. showing the relation of specific ux methods to reported maturity stage. factors that influence ux methods: recency, formality, regularity we then applied content analysis to the free-text questions of our survey. following the question that asked participants to select among 20 different ux methods that “you personally use at least every year or two at your institution,” the free-text question asked, “would you like to add a comment clarifying the way you completed [this question]?” each of the 20 free-text responses to this question was counted and categorized as a “meaning unit.” themes were extracted from the free-text survey responses. we identified 3 themes across 20 meaning units: formality, regularity, and recency (see table 4). question would you like to add a comment clarifying the way you completed [this question related to ux methods]? thematic analysis theme definition number of meaning units* example meaning unit information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 12 recency how new or developed a library’s ux practice 7 “i am fairly new here and we are still developing a process that is well-rounded.” formality how formal or structured the ux practice is 9 “we are aware of many of the techniques mentioned, but we don’t have a formal process for implementing them.” regularity how often or frequently ux is practiced 4 “right now we are doing a workflow analysis of interlibrary loan, but once completed probably wouldn’t do that for another three to four years.” *each free-text response was counted and categorized as a single meaning unit. table 4. qualitative questions and thematic analysis for ux methods responses (n = 20). 
factors that influence ux maturity: leadership support, collaboration, ux lead, ux group, growth, and resources we also conducted a content analysis on the free-text responses to the survey question related to the ux maturity scale that asked participants to “explain why you selected that stage.” each of the 54 free-text responses to this question was counted and categorized as a “meaning unit.” themes were extracted from the free-text survey responses. we identified 7 themes across 54 meaning units: leadership support, collaboration, ux lead, ux group, growth, and resources, and strategic alignment (see table 5). question please explain why you selected [the current ux status in your organization]? thematic analysis theme definition number of meaning units* example meaning unit leadership support the degree to which ux work is seen, understood by, and supported by library leadership. 32 “just last year, the ux team moved into administration so that we can tie our work to strategic planning for the organization.” information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 13 ux group the presence of a committee or working group that conducts or otherwise supports ux work. 31 “i also chair a web working group which focuses on improving our website from a usability standpoint.” collaboration the degree to which ux work is collaboratively shared by individuals and departments throughout the library 30 “i don't know if ux has become a necessarily planned activity across the whole organization. i am team of one, and though i’ve tried, i haven’t been able to add anyone else to form an official ux team as well.” ux lead personnel assigned to ux work, especially a dedicated ux lead 30 “i have recently been hired to partially work with ux and another person has been appointed ux coordinator.” growth the degree to which expansion occurs around staffing, resources, and organizational understanding of ux work. 13 “we . . . will soon be posting a position for a ux librarian.” resources the amount of time and budgetary resources dedicated to ux. 10 “budget is our biggest constraint when it comes to ux testing.” strategic alignment the inclusion of ux or usercenteredness in strategic planning 2 “we do employ user research to determine where to target priorities and strategy. however, i do not think we have a robust process for iterative testing or participatory design yet.” * each free-text response was counted and categorized as a single meaning unit. table 5. qualitative questions and thematic analysis for ux maturity responses (n = 54). information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 14 this data can be visualized to show the relationships between ux maturity stage and the coded thematic responses (figure 6). figure 6. coded responses versus selected stage (0 indicates no comment related to that theme), showing that a lack of leadership support is often cited as a reason for not advancing past stage 3; the presence of dedicated staff in the form of a ux lead or a ux group is often cited as a reason for reaching stage 5. full ux maturity model: ux maturity as a function of ux methods in building a full model for the purposes of quantitative data analysis, we are attempting to predict the maturity stage based on the many different variables that appear in our dataset. 
this statistical exercise is a heuristic tool that can help us understand the survey responses and draw results from the dataset that reveal key characteristics of ux maturity in libraries. we approached building a full model using a modified backward stepwise approach. with this approach, we begin with the full range of variables and work backward, step by step, to focus only on those variables that combine to form a model that makes the best predictions about the response variable, the ux maturity stage, for each case. through this process, those variables that are less predictive are removed from the model one by one until we can settle on a model that explains the most variance.25 the modified backward stepwise "step" function used to create our model required 18 iterations before settling on the best version. using adjusted r-squared as our metric, our full model accounts for 62% of the variance for this dataset. adjusted r-squared is an appropriate measure because it allows us to include many variables but also includes a penalty for including too many variables (as a penalty, the adjusted r-squared value will decrease). with this model, we can make reasonable estimates of the maturity stage that a survey respondent selected by knowing which methods they use combined with the coded explanation the respondent provided via the free-text survey questions. the coded responses (see table 4 and table 5) provided measurable insights into the organizational context of our respondents' institutions, and this allows us to analyze and predict their respective maturity levels. with this additional information, we have a model that represents the multiple dimensions available in the dataset (see appendix b for additional data analysis).

in table 6, we show the relationship between specific ux methods and the ux maturity stages. we see here that journey mapping, for example, is a highly influential factor for ux maturity.

variable                       estimated influence on maturity stage   p-value (significance)
journey maps                   1.7                                     0.001***
design review                  1.3                                     0.010*
user interviews                0.9                                     0.047*
usability testing              0.8                                     0.158
benchmark testing              0.7                                     0.067
usability bug review           0.3                                     0.498
user stories                   0.2                                     0.440
requirements and constraints   0.2                                     0.514
user surveys                   -0.3                                    0.492
diary/camera studies           -0.6                                    0.325
faq review                     -0.8                                    0.062
prototype testing              -0.8                                    0.076
field studies                  -1.3                                    0.003**

*p < .05 (statistically significant result), **p < .01, ***p < .001 (a highly statistically significant result)

table 6. relationship between ux method variables and predicted maturity stage.
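the modified backward stepwise procedure described above can be approximated with base r's step function, as in the sketch below; the data are simulated, the variable names are assumptions, and step() selects by aic rather than by adjusted r-squared, so this is a simplification of the study's approach rather than its deposited code.

# minimal sketch: backward stepwise selection over dummy-coded (0/1) predictors.
# the data frame is simulated; the coefficients used in the simulation echo tables 6 and 7.
set.seed(1)
n <- 40
responses <- data.frame(
  journey_maps          = rbinom(n, 1, 0.3),
  design_review         = rbinom(n, 1, 0.3),
  user_interviews       = rbinom(n, 1, 0.6),
  leadership_support_no = rbinom(n, 1, 0.4),
  resources_yes         = rbinom(n, 1, 0.2)
)
responses$stage <- with(responses,
  4 + 1.7 * journey_maps + 1.3 * design_review + 0.9 * user_interviews -
  1.0 * leadership_support_no + 2.9 * resources_yes + rnorm(n))

# start from the full model, then drop the least useful variable step by step
full_model    <- lm(stage ~ ., data = responses)
reduced_model <- step(full_model, direction = "backward", trace = FALSE)
summary(reduced_model)    # the study's final model reached an adjusted r-squared of 0.62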
in table 7, we show the relationship between the coded responses from the free-text survey questions (presented in tables 4 and 5) and the ux maturity stages. we see through this analysis that variables such as "resources" are important for advancing maturity. similarly, we see that a lack of "leadership support" has a strong negative effect on maturity.

variable                  estimated influence on maturity stage   p-value (significance)
resources: yes            2.9                                     0.014*
collaboration: yes        0.8                                     0.147
growth: yes               0.2                                     0.561
resources: no             -0.2                                    0.615
ux lead: no               -0.5                                    0.216
leadership support: yes   -0.7                                    0.177
ux lead: yes              -0.9                                    0.038*
ux group: no              -0.9                                    0.022*
leadership support: no    -1.0                                    0.009**
strategic alignment: no   -2.8                                    0.012*

*p < .05 (statistically significant result), **p < .01, ***p < .001 (a highly statistically significant result)

table 7. relationship between organizational variables and predicted maturity stage, in descending order of influence on maturity stage.

a statistical example case: estimating ux maturity

to help the reader understand the statistical summary provided by our model, we take a close look at one case drawn from one actual survey participant. in this example case, the respondent's institution is a four-year, large university. the intercept for this multiple regression model happens to be 4.1119. the intercept in a multiple regression model represents the mean response (stage) when all the predictors are zero.26 it is a baseline. our example institution has practiced the following methods, with their respective influence on ux maturity included in parentheses:

• user interviews (+0.9521)
• usability testing (+0.7984)
• benchmark testing (+0.7124)
• usability bug review (+0.2692)
• field studies (-1.3346)
• prototype testing (-0.8454)
• user surveys (-0.3204)

additionally, this institution has the following organizational characteristics, with their respective influence on ux maturity included in parentheses:

• leadership support: yes (-0.6842)
• resources: no (-0.2192)

by adding these numbers together with the starting point (4.1119), we can calculate that the predicted stage for this institution is 3.44.

actual stage selected by survey respondent: 3
residual: -0.44
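the addition in this example can be checked in a few lines of r; the coefficient values below are the ones listed above.

# minimal check of the example case's arithmetic, using the coefficients listed above
intercept <- 4.1119

method_effects <- c(user_interviews      =  0.9521,
                    usability_testing    =  0.7984,
                    benchmark_testing    =  0.7124,
                    usability_bug_review =  0.2692,
                    field_studies        = -1.3346,
                    prototype_testing    = -0.8454,
                    user_surveys         = -0.3204)

org_effects <- c(leadership_support_yes = -0.6842,
                 resources_no           = -0.2192)

predicted_stage <- intercept + sum(method_effects) + sum(org_effects)
round(predicted_stage, 2)   # 3.44; the respondent selected stage 3, giving a residual of -0.44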
discussion in interpreting our results, we have identified four key areas that we wish to emphasize: the significance of leadership support, the importance of organization-wide collaboration, the role of applied ux methods, and the emerging theory and practice of ux and design in libraries. leadership support and strategic alignment a major theme evident in the results relates to leadership support and strategic alignment. as expressed by the survey respondents, leadership support is viewed as the degree to which ux work is seen, understood by, and supported by library leadership and organizational strategic planning. in particular, a lack of support and visioning from leadership exerts negative pressure on ux maturity. on the other hand, when ux is coordinated with leadership vision and situated into strategic planning, ux maturity was rated more highly. from a leadership perspective, ux maturity relies on an allocated budget and designated staff to move beyond an ad hoc approach and reach higher levels on the maturity scale. one might expect that the larger an institution, the more advanced the ux maturity stage. however, based on our data analysis, size of institution is not a significant factor in ux maturity. therefore the resources provided to library ux activities may not be about how large institutions are, but rather if leadership acknowledge the importance of ux and provide official, particularly financial, support. organizational collaboration another major theme was collaboration—the degree to which ux research is collaboratively shared by individuals and departments throughout the library. higher levels of ux maturity are driven by a widespread understanding of ux within an organization, with user research data information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 18 integrated into decision-making across multiple touchpoints. conversely, a lack of collaboration was a factor that hindered maturity. many respondents shared similar experiences, telling us that other staff or departments within the organization are not ready to embrace the potential of ux data, methods, and insights. we recognize that cultivating ux is an organic process that can result in uneven growth of ux within an organization. some units may be ready to move further and faster while others may hesitate to contribute or collaborate. not every department will immediately see the relevance or value of ux work for their area. accounting or human resources, for example, might consider ux as beyond the scope of their practice. thinking inclusively and holistically from the perspective of user-centered service design, however, opens up new connections between ux and the work of all departments across the organization. ux can help center those users—even internal users—who interact with service points such as accounting or human resources in ways that can improve the service experience for all involved. applied ux methods across the 20 methods that we included in the survey, our results indicate that the application of different ux methods varies widely in type and extent. many methods are in use to varying degrees. as the methods relate to maturity, we find that a greater number of methods in use during the previous two years was indicative of a higher maturity rating. in short, more methods lead to more maturity. 
the five most common methods included usability testing, user surveys, user interviews, accessibility evaluation, and field studies. these methods are similar in their ease of implementation and their wide representation in the library literature, and due to their commonness, they are not strongly indicative of ux maturity, high or low. the five least common methods included journey maps, benchmark testing, design review, faq review, and diary/camera studies. in this grouping we see a set of ux methods that are not as well known or widely discussed, but which can paint a more complete picture of the user experience. journey mapping in particular was strongly and positively influential on ux maturity in our statistical model. this result does not necessarily indicate that a library can boost ux maturity simply by creating a journey map. rather, we interpret this to indicate that the method itself is reflective of a coordinated ux effort in the institution. journey mapping aims to obtain a high-level overview of a user's interactions with every touch point to accomplish a task. as such, the successful implementation of a journey map relies on cross-functional and cross-departmental input and interpretation. this result calls for greater collaboration toward greater ux maturity.

ux as an emerging practice within libraries
many respondents focused on the newness or the maturity of their library's ux practice, and most responses connected low methods usage to the newness of the practice. in these responses, we see that ux in libraries is still a new field, and the practice is emerging with variations across institutions in terms of methods and maturity. we note that institutional size was not a factor that influenced maturity—some smaller institutions reported mature ux practice while some larger institutions reported lower ux maturity. in this result, we see that the amount of possible resources matters less than the intentional application of those resources in support of ux work. as institutions begin to see the value of ux and dedicate increasingly more resources relative to their budgets, ux maturity increases. our survey respondents shared a variety of experiences along this journey toward maturity. many told us "i'm new here" and that their library doesn't fully understand ux and isn't yet ready to include ux research in decision-making or strategic planning, or that the institution doesn't have a plan yet for how to integrate the ux librarian into library operations. still others reported that librarians in other units or library administrators are not required or encouraged to consult with the ux librarian or integrate ux research. in this way, many libraries continue a more traditional model of decision-making that does not regularly apply intentional methods to account for the voices of users. on the upper end of the maturity scale, on the other hand, we see wide adoption of ux as a legitimate area of work across units and within leadership groups. in this way, some libraries have demonstrated more responsiveness to ux and have more successfully integrated ux practices into strategic and operational workflows.
through the survey responses, we see a three-step progression that marks the emergence of ux as a trusted and legitimate methodology for understanding user experiences and designing library services: recency, formality, and regularity (table 4). in the earlier stages of maturity, survey respondents emphasize the newness or recency of a group or person assigned to conduct ux work. from there, a ux practice emerges as increasingly more formal as more ux methods are introduced more often into different contexts. finally, as a library reaches ux maturity, we see a frequent application of a wide variety of ux methods in all corners of the library and with many stakeholders, along with organizational decision-making that regularly includes ux research data.

a ux maturity scale for libraries
to help in understanding the ux maturity scale and the characteristics related to each of its stages, we have adapted the nielsen norman ux maturity scale for a library context. table 8 shows a set of organizational characteristics that correspond to the eight stages of ux maturity. the indicators in table 8 are presented as an approximate guideline for understanding and diagnosing ux maturity.

stage / key indicators
stage 1–2: apathy or hostility to ux practice; lack of resources and staff for ux
stage 3: ad hoc ux practices within the organization; ux is practiced, but unofficially and without dedicated resources or staff; leadership does not fully understand or support ux
stage 4: leadership beginning to understand and support ux; dedicated ux budget; ux is assigned fully or partly to a permanent position
stage 5: the ux lead or ux group collaborates with units across the organization and contributes ux data meaningfully to organizational and strategic decision-making
stage 6: ux research data is regularly included in projects and decision-making; a wide variety of methods are practiced regularly by multiple departments
stage 7–8: ux is practiced throughout the organization; decisions are made and resources are allocated only with ux insights as a guide
table 8. key indicators for ux maturity in academic libraries.

this scale reflects the research presented in this paper while building on related models and prior research (more granularity is available in stages 2–6 because we received more survey responses representing those stages). we note that our research is consonant with prior work in this area. priestner includes a greater focus on library users (in contrast to a focus on library staff) as a key driver of library ux maturity.27 macdonald reports that ux work is defined by applied methods, in particular, qualitative research.28 sharon describes a ux maturity model based on two primary factors: the presence of ux researchers on staff and whether the organization actually listens to and responds to ux research.29 finally, sheldon-hess bases library ux maturity on the extent of applied ux methods and the level of user-centeredness present in an organization, as indicated by the degree to which staff consider user perspectives in internal communications and decision-making.30 taken together, we see common strands that can help illuminate the key factors of ux maturity in libraries: applied methods, leadership support in the form of resources and strategic alignment, organizational collaboration, and decision-making that includes ux research.
strategies for climbing the maturity scale: toward a more user-centered library
our results reveal a few key barriers and boosts to higher maturity, and one key point of stagnation. across the maturity scale, important factors that positively influence maturity involve leadership support and resource allocation toward ux in the form of personnel and infrastructure such as physical space, materials, strategic direction, and a working budget. notably, respondents in our survey reported being stuck at stage 3 due to a lack of leadership support. for instance, when resource-related comments appeared, we primarily heard about a lack of resources, which impaired maturity. participants reported a mixture of personnel in support of ux work. some libraries have a staff member dedicated to ux but lack a committee structure to support and advocate for the work. other libraries do not have dedicated ux staff but had formed committee infrastructure to collaboratively move ux forward. participants who lacked either a ux group or a ux lead reported lower levels of maturity and were particularly stagnated at stage 3 (see figure 5 above). alternatively, libraries are boosted to stage 5 with the presence of a fully empowered ux lead who has the support of a ux group or committee that can network throughout the organization and drive collaboration and cross-functional implementation of ux methods and research data. we found that respondents from libraries that possessed both a dedicated ux staff and a ux group tended to place themselves higher on the maturity scale. for those who reside at stage 5, having a ux group or a ux lead are the two main themes present in the survey. to move forward to stage 5, a library needs to organize a ux group with an appointed lead to coordinate ux practice widely throughout the organization, including in library spaces, web presence, learning services, and digital initiatives. a systematic and cooperative ux approach planned by an official ux group and led by a designated ux lead is the key indicator of stage 5. the support for the group and its lead needs to come from not only leadership, but also colleagues throughout the library, which relates to the two major themes of leadership support and organizational collaboration. stage 7–8 is achievable only with significant investment in ux. given parent entity pressures, existing hierarchies, and prevailing non-user-centered cultures, libraries face a formidable set of challenges on the road to becoming user-centered organizations.31 this road is somewhat illuminated by the small number of survey respondents that marked themselves at a stage 7 or 8. highlights from their responses are instructive. one respondent told us,

we have multiple teams in the library to help with service design, conducting and gathering user research, and helping library staff think more about the user in their everyday work. we also have a couple special interest groups (sigs) dedicated to user research, ux, and assessment. we also have multiple departments within the library with ux expertise.

from this response, we can see the key characteristics of ux maturity: leadership support up the line along with wide-spread collaboration throughout the organization. staff infrastructures including multiple ux-oriented committees help drive and coordinate ux work.
this respondent also reported the recent hiring of an assessment librarian situated in the library's administration department who will help coordinate ux work throughout the organization. these elements work together to meaningfully integrate user perspectives into both digital and physical spaces and in multiple units. moreover, this respondent marked 19 out of 20 ux methods currently in use (all but diary/camera studies), thus reinforcing the symbiotic relationship between ux maturity and ux methods: the variety of methods in use are a signal of maturity, and correspondingly, a greater maturity allows the space and resources for the application of more and different methods. another survey respondent at stage 7 remarked the following,

my workplace has been very supportive in addressing ux issues both in digital and physical spaces. since being hired, i have created workflows that incorporate data that we gather from users. if there isn't data gathered in a certain area, we usually find a way to update workflows so that we can get that data. almost every project that i have worked on digitally and in the physical spaces at the library has been the result of ux/ui data that has been gathered from our users.

the elevated level of maturity at this library is especially reflected through the practice of "almost every project" being driven by user data. a truly user-centered library indeed integrates user data across all projects and advocates for the user at every opportunity. this respondent also marked a high variety of methods currently being practiced: 17 out of a possible 20 (methods not in use include diary/camera studies, user stories, and competitive analysis), further underscoring the two-way connection between methods and maturity. in further considering the upper reaches of maturity, we are inspired by an emerging theory of design-oriented librarianship that signals a professional paradigm shift such that ux could become recognized as a fundamental component of library research, education, and practice.32 by investing more in ux methods, practices, and principles, libraries can achieve greater value and empowerment for our communities by designing more user-centered services and tools.33 ultimately, achieving stage 7–8 will result from deeply integrating user-centeredness across all operational phases, strategic planning, and decision-making of a library organization.

limitations
we note a few limitations of our study. first, the ux stages used in the survey were defined by jakob nielsen in 2006 for corporate application, so the scale is perhaps a bit dated. further, the main goal of our statistical analysis is to develop a model that can accurately predict the ux maturity of a library based on the ux methods employed at the institution combined with organizational characteristics. allison outlines three broad categories of error in regression analysis:
• measurement error: very few variables can be measured with perfect accuracy, especially in the social sciences.
• sampling error: in many cases, our data are only a sample from a larger population, and the sample will never be exactly like the population.
• uncontrolled variation: [age and schooling] are surely not the only variables that affect a person's income, and these uncontrolled variables may "disturb" the relationship.34

in terms of measurement error, survey respondents may have bias when self-reporting maturity stage due to social pressures to produce desirable responses, meaning people tend to respond to self-report items in a manner that makes themselves look good.35 the resulting measurement error takes the form of over-reporting "desirable behavior" or under-reporting "undesirable behavior." this is evident in some responses for ux maturity stages. for example, one respondent chose stage 5—"managed usability"—but the comment described a slightly different picture:

i think we are still floundering between "dedicated ux budget" and "managed usability." . . . we are at the stage where people know they should consult with us, but either they don't or they do but don't really hear the results, they are using us to confirm what they want to hear.

in terms of sampling error, self-selection bias is a factor: our respondents might not be representative of the full population of ux librarians. we also did not make all of our questions mandatory, and as a result were not able to make use of all possible data within the scope of our survey. in terms of uncontrolled variation, our survey and statistical model do not fully account for all variables that influence ux maturity in libraries; for example, we included a limited list of ux methods, and we did not include questions that inquired specifically into the presence of a ux lead or ux group.

future directions
we see at least three paths forward for future research related to ux methods and maturity. first, librarianship would benefit from a ux maturity scale created specifically with and for our field's theoreticians and practitioners. we propose one such scale above, but our scale has not undergone further testing, research, or validation. we note especially the library ux maturity scales of sheldon-hess and macdonald, which could be further synthesized or built upon.36 second, a self-assessment tool for diagnosing ux maturity could be developed based on a validated maturity scale. and third, the theory advanced by clarke that librarianship can be usefully conceived of and practiced as a design discipline warrants further critical attention, especially as it relates to the application of ux methods and the development of ux maturity.37

conclusion
we applied a mixed-methods approach that involved content analysis and statistical analysis to a profession-wide survey. our research data and analysis demonstrate the type and extent of ux methods currently in use by academic libraries. the five most common methods are usability testing, user surveys, user interviews, accessibility evaluation, and field studies. the five least common methods are journey maps, benchmark testing, design review, faq review, and diary/camera studies. furthermore, we identify the organizational characteristics that help or hinder the development of ux maturity.
ux maturity in libraries is related to four key factors: the number of ux methods currently in use; the level of support from leadership in the form of strategic alignment, budget, and personnel; the extent of collaboration throughout the organization; and the degree to which organizational decisions are influenced by ux research. when one or more of these four connected factors advances, so too does ux maturity. we close by emphasizing three key factors for reaching higher levels of ux maturity. first, we encourage library leadership to see the value of ux and support its practice through strategic alignment and resource allocation. second, we encourage libraries to commit to integrating ux principles and practices across all units, especially into leadership groups and through organization-wide collaboration and workflows. third, ux methods should be reinforced and amplified with personnel, such as a standing ux group and a dedicated ux lead that can help direct ux work and enhance ux maturity. libraries have the promise and potential to more deeply practice ux. doing so can allow libraries to more deeply connect with users and reach higher levels of ux maturity, with the ultimate result of delivering tools and services that further empower our user communities.

appendix a: glossary of statistical terms

adjusted r-squared: adjusted r2 = variance of fitted model values / variance of response values. "the adjusted r-squared compares the descriptive power of regression models—two or more variables—that include a diverse number of independent variables—known as a predictor. every predictor or independent variable added to a model increases the r-squared value and never decreases it. so, a model that includes several predictors will return higher r-squared values and may seem to be a better fit. however, this result is due to it including more terms. the adjusted r-squared compensates for the addition of variables and only increases if the new predictor enhances the model above what would be obtained by probability. conversely, it will decrease when a predictor improves the model less than what is predicted by chance." source: https://www.investopedia.com/ask/answers/012615/whats-difference-between-rsquared-and-adjusted-rsquared.asp

confidence level: "the confidence level tells you how sure you can be. it is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. the 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. most researchers use the 95% confidence level." source: https://researchbasics.education.uconn.edu/confidence-intervals-and-levels/

confidence interval: "a confidence interval is an interval which has a known and controlled probability (generally 95% or 99%) to contain the true value." source: https://stats.oecd.org/glossary/detail.asp?id=5055

explained variance: "explained variance (also called explained variation) is used to measure the discrepancy between a model and actual data.
in other words, it’s the part of the model’s total variance that is explained by factors that are actually present and isn’t due to error variance.” source: https://www.statisticshowto.datasciencecentral.com/explained -variance-variation/ explanatory and response variables “the response variable is the focus of a question in a study or experiment. an explanatory variable is one that explains changes in that variable. it can be anything that might affect the https://www.investopedia.com/ask/answers/012615/whats-difference-between-rsquared-and-adjusted-rsquared.asp https://www.investopedia.com/ask/answers/012615/whats-difference-between-rsquared-and-adjusted-rsquared.asp https://researchbasics.education.uconn.edu/confidence-intervals-and-levels/ https://researchbasics.education.uconn.edu/confidence-intervals-and-levels/ https://stats.oecd.org/glossary/detail.asp?id=5055 https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/variance/ https://www.statisticshowto.datasciencecentral.com/explained-variance-variation/ https://www.statisticshowto.datasciencecentral.com/explained-variance-variation/ information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 25 response variable.” source: https://www.statisticshowto.datasciencecentral.com/explanato ry-variable/ multiple regression “multiple regression is a statistical method for studying the relationship between a single dependent [or response] variable and one or more independent [or explanatory] variables. it is unquestionably the most widely used statistical technique in the biological and physical sciences.”38 null hypothesis “in general, this term relates to a particular hypothesis under test, as distinct from the alternative hypotheses which are under consideration. it is therefore the hypothesis which determines the probability of the type i error. in some contexts, however, the term is restricted to an hypothesis under test of ‘no difference’.” source: https://stats.oecd.org/glossary/detail.asp?id=3767 probability or p-value “the p value is the probability of getting our observed result, or a more extreme result, if the null hypothesis is true.”39 simple linear regression “simple linear regression models the relationship between the magnitude of one variable and that of a second for example, as x increases, y also increases. or as x increases, y decreases.”40 statistical significance “statistical significance refers to the claim that a result from data generated by testing or experimentation is not likely to occur randomly or by chance but is instead likely to be attributable to a specific cause. having statistical significance is important for academic disciplines or practitioners that rely heavily on analyzing data and research, such as economics, finance, investing, medicine, physics, and biology. statistical significance can be considered strong or weak. when analyzing a data set and doing the necessary tests to discern whether one or more variables have an effect on an outcome, strong statistical significance helps support the fact that the results are real and not caused by luck or chance. simply stated, if a statistic has high significance then it's considered more reliable.” source: https://www.investopedia.com/terms/s/statisticalsignificance.asp variance “the variance is the mean square deviation of the variable around the average value. 
it reflects the dispersion of the empirical values around its mean." source: https://stats.oecd.org/glossary/detail.asp?id=5160

appendix b: additional data analysis
model: stage as a function of population rank
model: stage as a function of total methods
model: variables that combine to produce the most accurate stage predictions

endnotes

1 david liddle, "best value—the impact on libraries: practical steps in demonstrating best value," library management 20, no. 4 (june 1, 1999): 206–14, https://doi.org/10.1108/01435129910268982.

2 daniel pshock, "the user experience of libraries: serving the common good," user experience 17, no. 2 (2017), https://web.archive.org/web/20190822051708/http://uxpamagazine.org/the-user-experience-of-libraries/.

3 bruce massis, "the user experience (ux) in libraries," information and learning sciences 119, no. 3/4 (march 12, 2018): 241–44, https://doi.org/10.1108/ils-12-2017-0132.

4 rachel fleming-may et al., "experience assessment: designing an innovative curriculum for assessment and ux professionals," performance measurement and metrics 19, no. 1 (december 15, 2017): 30–39, https://doi.org/10.1108/pmm-09-2017-0036; rachel ivy clarke, satyen amonkar, and ann rosenblad, "design thinking and methods in library practice and graduate library education," journal of librarianship and information science (september 8, 2019), https://doi.org/10.1177/0961000619871989; aja bettencourt-mccarthy and dawn lowe-wincentsen, "how do undergraduates research? a user experience experience," ola quarterly 22, no. 3 (february 22, 2017): 20–25, https://doi.org/10.7710/1093-7374.1866; juan carlos rodriguez, kristin meyer, and brian merry, "understand, identify, and respond: the new focus of access services," portal: libraries and the academy 17, no. 2 (april 8, 2017): 321–35, https://doi.org/10.1353/pla.2017.0019; asha l. hegde, patricia m. boucher, and allison d. lavelle, "how do you work? understanding user needs for responsive study space design," college & research libraries 79, no. 7 (2018), https://doi.org/10.5860/crl.79.7.895; amy deschenes, "improving the library homepage through user research—without a total redesign," weave: journal of library user experience 1, no. 1 (2014), https://doi.org/10.3998/weave.12535642.0001.102.

5 amanda kraft, "parsing the acronyms of user-centered design," in 2019 ascue proceedings (association supporting computer users in education (ascue), myrtle beach, south carolina, 2019), 61–69, https://eric.ed.gov/?id=ed597115.

6 ideo, the field guide to human-centered design (san francisco: ideo, 2015); joe marquez and annie downey, library service design: a lita guide to holistic assessment, insight, and improvement (lanham, md: rowman & littlefield, 2016); scott w. h.
young and celina brownotter, "toward a more just library: participatory design with native american students," weave: journal of library user experience 1, no. 9 (2018), https://doi.org/10.3998/weave.12535642.0001.901.

7 aaron schmidt and amanda etches, useful, usable, desirable: applying user experience design to your library (chicago: ala editions, 2014); joe j. marquez and annie downey, getting started in service design: a how-to-do-it manual for librarians (chicago: american library association, 2017).

8 zoe chao, "rethinking user experience studies in libraries: the story of ux café," weave: journal of library user experience 2, no. 2 (2019), https://doi.org/10.3998/weave.12535642.0002.203.

9 daniel pshock, "results from the 2017 library user experience survey," designing for digital, march 6, 2018, https://web.archive.org/web/20190829163234/https://d4d2018.sched.com/event/dm8h/d16-02-results-from-the-2017-library-user-experience-survey.

10 andy priestner, "approaching maturity? ux adoption in libraries," in user experience in libraries: yearbook 2017, ed. andy priestner (cambridge, england: ux in libraries, 2017), 1–8.

11 craig m. macdonald, "'it takes a village': on ux librarianship and building ux capacity in libraries," journal of library administration 57, no. 2 (february 17, 2017): 194–214, https://doi.org/10.1080/01930826.2016.1232942.

12 tomer sharon, "ux research maturity model," prototypr (blog), 2016, https://web.archive.org/web/20190829163113/https://blog.prototypr.io/ux-research-maturity-model-9e9c6c0edb83?gi=c462f7ac4600.

13 coral sheldon-hess, "ux, consideration, and a cmmi-based model," coral sheldon-hess blog (blog), 2013, https://web.archive.org/web/20190117144529/http://www.sheldon-hess.org/coral/2013/07/ux-consideration-cmmi/.

14 jakob nielsen, "corporate ux maturity: stages 1–4," nielsen norman group, 2006, https://web.archive.org/web/20190709231540/https://www.nngroup.com/articles/ux-maturity-stages-1-4/; jakob nielsen, "corporate ux maturity: stages 5–8," nielsen norman group, 2006, https://web.archive.org/web/20190709231533/https://www.nngroup.com/articles/ux-maturity-stages-5-8/.

15 nikki anderson, "ux maturity: how to grow user research in your organization," medium, may 1, 2019, https://medium.com/researchops-community/ux-maturity-how-to-grow-user-research-in-your-organization-848715c3543.

16 including: the user experience working group under the digital libraries federation assessment interest group (dlf aig ux), code4lib, assessment listserv of association of research libraries (arl), access conference list, coalition for networked information (cni), library and information technology association (lita), library user experience (libux) slack channel, and ala user experience interest group.
17 current index and classification list available from http://www.carnegieclassifications.iu.edu/classification_descriptions/size_setting.php; data at time of analysis available from indiana university center for postsecondary research (2018), carnegie classifications 2018 public data file, https://web.archive.org/web/20191006220952/http://carnegieclassifications.iu.edu/downloads/ccihe2018-publicdatafile.xlsx.

18 five institutions appear in this count twice, that is, on five occasions, two persons from the same institution responded separately to the survey. we invited this type of response to capture diversity of opinion and experience within an organization.

19 one institution appears in this count twice, for the same reason as explained in the previous endnote.

20 susan farrell, "ux research cheat sheet," nielsen norman group, february 12, 2017, https://web.archive.org/web/20190828224735/https://www.nngroup.com/articles/ux-research-cheat-sheet/.

21 klaus krippendorff, content analysis: an introduction to its methodology, 3rd edition (los angeles: sage, 2012).

22 scott w. h. young, zoe chao, and adam chandler, "data from: user experience methods and maturity in academic libraries," distributed by dryad, https://doi.org/10.5061/dryad.jwstqjq5d.

23 paul d. allison, multiple regression: a primer (thousand oaks, ca: pine forge, 1999), 6.

24 we are aware that our use of linear regression with this small sample surely "over-fits" the dataset, that is, the model is unlikely to predict as accurately if applied to a different dataset. the model will undergo further refinement in the future.
25 we made a conscious choice to leave in some variables in this model that are statistically insignificant. we did so because it might be too early to fully dismiss these elements as unimportant; it could be that our sample was too small to really be certain. furthermore, our primary emphasis is in creating a model that does a good job of accurately predicting stage based on an array of different characteristics. removing all the nonsignificant variables in this model would actually lower the prediction accuracy. adjusted r-squared accounts for additional variables.

26 for more description of the multiple linear regression model, please see https://web.archive.org/web/20191006231250/https://newonlinecourses.science.psu.edu/stat462/node/131/.

27 priestner, "approaching maturity? ux adoption in libraries."

28 craig m. macdonald, "user experience librarians: user advocates, user researchers, usability evaluators, or all of the above?," proceedings of the association for information science and technology 52, no. 1 (2015): 1–10, https://doi.org/10.1002/pra2.2015.145052010055.

29 sharon, "ux research maturity model."

30 sheldon-hess, "ux, consideration, and a cmmi-based model."

31 macdonald, "'it takes a village.'"

32 rachel ivy clarke, "toward a design epistemology for librarianship," the library quarterly 88, no. 1 (2018): 41–59, https://doi.org/10.1086/694872; rachel ivy clarke, "how we done it good: research through design as a legitimate methodology for librarianship," library & information science research 40, no. 3 (july 1, 2018): 255–61, https://doi.org/10.1016/j.lisr.2018.09.007.

33 clarke, amonkar, and rosenblad, "design thinking and methods in library practice and graduate library education."

34 allison, multiple regression: a primer, 14.

35 delroy l. paulhus, "socially desirable responding: the evolution of a construct," in the role of constructs in psychological and educational measurement (mahwah, nj: lawrence erlbaum, 2002), 49–69.

36 sheldon-hess, "ux, consideration, and a cmmi-based model"; macdonald, "'it takes a village.'"

37 rachel ivy clarke, "design thinking for design librarians: rethinking art and design librarianship," in the handbook of art and design librarianship, ed. paul glassman and judy dyki, 2nd edition (chicago: ala neal-schuman, 2017), 41–49; clarke, "toward a design epistemology for librarianship"; clarke, "how we done it good"; clarke, amonkar, and rosenblad, "design thinking and methods in library practice and graduate library education"; shannon marie robinson, "critical design in librarianship: visual and narrative exploration for critical praxis," the library quarterly 89, no. 4 (october 1, 2019): 348–61, https://doi.org/10.1086/704965.

38 allison, multiple regression: a primer, 1.

39 geoff cumming, understanding the new statistics: effect sizes, confidence intervals, and meta-analysis (new york: routledge, 2012), 26.
40 peter bruce and andrew bruce, "regression and prediction," in practical statistics for data scientists: 50 essential concepts (sebastopol, ca: o'reilly media, 2017).

editor's comments: odds and ends
bob gerrity
information technology and libraries | june 2016

this issue marks the midpoint of information technology and libraries' fifth year as an open-access e-only journal. the move to online-only in 2012 was inevitable, as ital's print subscription base was no longer covering the costs of producing and distributing the print journal. moving to an e-only model using an open-source publishing platform (the public knowledge project's open journal systems) provided a low-cost production and distribution system that has allowed ital to continue publishing without requiring a large ongoing investment from lita. the move to open access, however, was not inevitable, and i commend lita for supporting that move and for continuing to provide a base subsidy that supports the journal's ongoing publication. i also thank the boston college libraries for their ongoing support in hosting ital along with a number of other oa journals. since ital is now open, access to it can no longer be offered as an exclusive benefit that comes with lita membership. regardless of the publishing model, though, ital has always relied on voluntary contributions of the time and expertise of reviewers and editors. i'd like to acknowledge the contributions of our past and current editorial board members, who play a key role in ensuring the ongoing quality and vitality of the journal. we will be adding a few additional board members shortly, to help ensure that reviews of submissions to the journal are completed as quickly and effectively as possible.
speaking of peer review, one of the recent innovative startups in the scholarly communication space is a company called publons, which tracks and verifies peer-review activity, providing a mechanism for academics to report (and possibly receive institutional credit for) their peer-review work, an undervalued part of the scholarly communication framework. (full disclosure: at university of queensland we are conducting a pilot project with publons, to integrate the peer-review activities of our academics into our institutional repository.) in addition to new approaches to peer review, such as publons and academic karma, there are quite a few recent examples of innovations in various aspects of scholarly communication that are worth keeping an eye on. these include new collaborative authoring tools such as overleaf, impact-measurement tools such as impactstory, and personal digital library platforms such as readcube. on a broader scale, initiatives such as peerj are building open access publishing platforms intended to dramatically improve the efficiency of and drive down the overall costs of scholarly publishing. february marked the 14th anniversary of a key trigger event in the open access movement—the launch of the budapest open access initiative in 2002.

bob gerrity (r.gerrity@uq.edu.au), a member of lita and the editor of information technology and libraries, is university librarian at the university of queensland, brisbane, australia.

much has happened in the 14 years since the budapest initiative, on various fronts:
o policy—introduction and widespread adoption of funder and institutional oa mandates;
o technology—development and widespread adoption of institutional repositories, recent development of mechanisms to facilitate the discovery of oa publications (e.g., share on the library side and chorus on the publisher side);
o publishing—establishment of new oa megajournals (e.g., plos, biomed central), embrace of hybrid oa models by mainstream commercial publishers.
yet despite all the hype, acrimony, and activity triggered by the oa movement, a recent analysis in the chronicle of higher education suggests the growth of oa has been slow and incremental: the percentage of research articles published annually in fully open-access format has increased at an average rate of around one percent a year, from 4.8% in 2008 to 12% in 2015. at this rate, the tipping point for oa still seems very far away. lots of energy has been and continues to be invested by different stakeholders in different approaches, and the green vs. gold argument still predominates. recent developments suggest momentum is gaining for a more radical shift. in december 2015, the max planck institute, a key player in the launch of oa with the berlin declaration on open access in 2003, hosted the 12th version of its annual oa conference to further the discussion around open access. ironically, unlike previous meetings and seemingly in philosophical conflict with the underpinnings of the oa movement, the meeting was by invitation only. given the topic, though, a "proposal to flip subscription journals to open access," the closed nature of the meeting is understandable.
underpinning the proposal was a 2015 paper from the max planck digital library that suggested that the amount of money currently being spent (largely by libraries) on journal subscriptions should be sufficient to fund research publication costs if applied to a "flipped" journal publishing business model, from subscription-based to gold open access.1 in the netherlands, the university sector has adopted a national approach in negotiating deals with several major publishers (springer, sage, elsevier, and wiley) that allow dutch authors to publish their papers as gold oa, without additional charges (but, depending on the publisher, with limits on total numbers and/or which journals are available within the deals).2 the so-called "dutch deal" by the vsnu (association of universities in the netherlands) and ukb (dutch consortium of university libraries and royal library) takes a national approach to flipping the model, attempting to bundle access rights for dutch readers with apc credits for dutch authors. the dutch government, which currently holds the eu presidency, is pushing hard for a europe-wide adoption of this approach. last month, the eu's competitiveness council agreed that all scientific papers should be freely available by 2020.3 meanwhile, in the us, the "pay it forward" research project at the university of california is examining what the institutional financial impact would be with a flipped model. the study is looking at existing institutional journal expenditures on subscriptions and modeling what a future, apc-based model would look like based on institutional research publication output and estimated average apc charges. who knows when or if a global flip might occur, but it does strike me that the scholarly publishing world is overdue for a major shakeup. from the point of view of a university librarian, focused on keeping journal subscription costs in line (unsuccessfully i might add), i think there is real danger in not considering what a flip to a gold model might look like. the commercial publishers we all complain about are successfully exploiting the gold model as an additional revenue stream which, for the most part, academic libraries have been ignoring, since the individual apcs typically are paid from someone else's budget. this has allowed the overall envelope of spending on research publication (subscriptions and apcs) to grow significantly. perhaps a more interesting question is what the impact of a flip on libraries would be. if gold oa became the predominant model, we would no longer need all of the complex systems we've built to manage subscriptions and user access. to quote homer simpson, "woohoo!" in the "watch this space" arena, ebsco's recently-launched open-source library services platform (lsp) initiative is beginning to take shape.
it now has a name—folio (for future of the libraries is open)—and as marshall breeding put it, the project "injects a new dynamic into the competitive landscape of academic library technology, pitting an open source framework backed by ebsco against a proprietary market dominated by ex libris, now owned by ebsco archrival proquest."4 publicly listed participants in the project include (in addition to ebsco) ole, index data, bywater, bibliolabs, and sirsi dynix.5 the platform release timetable calls for an initial, "technical preview" release of the code for the base platform in august 2016, and an anticipated release of the apps needed to operate a library in early 2018.6

1. ralf schimmer, kai karin geschuhn, and andreas vogler, disrupting the subscription journals' business model for the necessary large-scale transformation to open access (2015), doi:10.17617/1.3.
2. frank huysmans, vsnu-wiley: not such a big deal for open access, warekennis (blog), march 1, 2016, https://warekennis.nl/vsnu-wiley-not-such-a-big-deal-for-open-access/.
3. martin enserink, "in dramatic statement, european leaders call for 'immediate' open access to all scientific papers by 2020," science, may 27, 2016, doi:10.1126/science.aag0577.
4. marshall breeding, ebsco supports new open source project, american libraries, april 22, 2016, https://americanlibrariesmagazine.org/2016/04/22/ebsco-kuali-open-source-project/.
5. https://www.folio.org/collaboration.php.
6. https://www.folio.org/apps-timelines.php.

taking the long way around: improving the display of hathitrust records in the primo discovery system
jason alden bengtson and jason coleman
information technology and libraries | march 2019

jason bengtson (jbengtson@ksu.edu) is head of it services for kansas state university libraries. jason coleman (coleman@ksu.edu) is head of library user services for kansas state university libraries.

abstract
as with any shared format for serializing data, primo's pnx records have limits on the types of data which they pass along from the source records and into the primo tool. as a result of these limitations, pnx records do not currently have a provision for harvesting and transferring rights information about hathitrust holdings that the kansas state university (ksu) library system indexes through primo. this created a problem, since primo was defaulting to indicate that all hathitrust materials were available to ksu libraries (k-state libraries) patrons, when only a limited portion of them actually were. this disconnect was infuriating some library users, and creating difficulties for the public services librarians. there was a library-wide discussion about removing hathitrust holdings from primo altogether, but it was decided that such a solution was an overreaction. as a consequence, the library it department began a crash program to attempt to find a solution to the problem. the result was an application called hathigenius.

introduction
many information professionals will be aware of primo, the web scale discovery tool provided by ex libris.
web scale discovery services are designed to provide indexing and searching user experiences, not only for the library's holdings (as with a traditional online public access catalog), but also for many of a library's licensed and open access holdings. primo offers a variety of useful features for search and discovery, taking in data from manifold sources and serializing them into a common format for indexing within the tool. however, such applications are still relatively young, and the technologies powering them have not fully matured. the combination of this lack of maturity and deliberately closed architecture between vendors leads to several problems for the user. one of the most frustrating is errors in identifying full-text access availability. as with any shared format for serializing data, primo's pnx (primo normalized xml) records have limits on the types of data they pass from the source records into the primo tool. as a result of these limitations, pnx records do not currently have a provision for harvesting and transferring rights information about hathitrust holdings that the k-state libraries system indexes through primo. this created a problem in the k-state libraries' implementation, since primo was defaulting to indicate that all hathitrust materials were available to k-state libraries patrons, when only a limited portion of them actually were. this disconnect was infuriating some library users, and creating difficulties for the public services librarians. there was a library-wide discussion about removing hathitrust holdings from primo altogether, but it was decided that such a solution was an overreaction. as a consequence, the library it services department began a crash program to attempt to find a solution to the problem.

hathitrust's digital library as a collection in primo central
hathitrust was established in 2008 as a collaboration among several research libraries that were interested in preserving digital content. as of the beginning of march 2018, the collaborative's digital library contained more than sixteen million items, approximately 37 percent of which were in the public domain.1 ex libris' primo central index (pci), which serves as primo's built-in index of articles from various database providers, includes metadata for the vast majority of the items in hathitrust's digital library, providing inline frames within the original primo user interface to directly display full-text content of those items that the library has access to. libraries subscribing to primo choose whether or not to make these records available to their users. k-state libraries, like many other primo central clients, elected to activate hathitrust in its instance of primo, which it has branded with the name search it. the unmodified version of primo central identified all records from hathitrust's digital library as available online, regardless of the actual level of access provided to users. users who discovered a record for an item from hathitrust's digital library were presented with a conspicuous message indicating that full text was available and two links named view it and details. an example of the appearance of these search results is shown in figure 1. after clicking the "view it" tab, the center window would display the item's homepage from hathitrust's digital library inside an iframe.
public domain items would display the title page of the item and present users with an interface containing numerous visual indicators that they were viewing an ebook (see figure 2 for an example). items with copyright restrictions would display a message indicating that the item is not available online (see figure 3 for an example).

figure 1. two books from hathitrust as they appeared in search it prior to implementation of hathigenius.
figure 2. hathitrust result for an item in the public domain.
figure 3. hathitrust's homepage for an item that is not in the public domain.

despite the intentions evident in the design of the primo interface, availability of hathitrust records was not being accurately reflected in the list of returns. the size of the indices underlying web scale discovery systems and the number of configurations and settings that must be maintained locally introduce a variety of failure points that can intercede when patrons attempt to access subscribed resources.2 one of the failure points identified by sunshine and carter is inaccurate knowledgebase information. the scope of inaccurate information about hathitrust items in primo central index constituted a particularly egregious example of this type of failure.

patron reaction to misinformation about access to hathitrust
between the time hathitrust's digital library was activated in search it and the time the hathigenius application was installed, at least thirty patrons contacted k-state libraries to ask why they were unable to access a book in hathitrust when search it had indicated that full text was available for the book. many of these expressed frustration at frequently encountering this error (for an example, see figure 4).

1:08 26389957777759601093088133 i find it misleading that the search it function often finds a book i am interested in, but sometimes says it is available online; however, oftentimes it takes me to the hathi trust webpage for the book where i am told it is not available online. is this because our library has had to give up their subscription to this service?
1:08 me hi!
1:09 me that is definitely frustrating and we are trying to find a way to correct it.
1:10 me it does not have to do with our subscription, but rather the metadata we receive from hathitrust and its compatibility (or rather, incompatibility) with search it
1:11 26389957777759601093088133 okay, so i guess i better ask for the book i am seeking (the emperor's mirror) through ill.
1:11 me that'd probably be your best bet, but let me take a look one moment
1:14 me yes, ill does look best. please note that the ill department will be closed after today until january
1:14 26389957777759601093088133 got it. thanks. i hope the hathi trust issue is resolved soon. (i have seen this problem all semester and finally got so frustrated to ask about it.)
1:15 26389957777759601093088133 have a happy holiday!
1:15 me you as well! and yes, i hope we can figure it out asap
1:15 me (it's frustrating for us, too!)
1:20 26389957777759601093088133 has left the conversation

figure 4. chat transcript revealing frustration with inaccurate information about availability of items in hathitrust.
staff reaction to misinformation about access to hathitrust reference staff at k-state libraries use a ticketing system to report electronic resource access problems to a team of librarians who troubleshoot the underlying issues. shortly after the hathitrust library was activated in search it, reference staff submitted several tickets about problems with access to items in that collection. members of the troubleshooting team responded quickly and informed the reporting librarians that the problem was one beyond their control. this message was slow to reach the entirety of the reference staff and was not always understood as being applicable to the full range of access problems our patrons were experiencing. samples and healy note that this type of decentralization and reactive orientation is common in electronic resource troubleshooting.3 like them, k-state libraries recognized a need to develop best practices to obviate confusion. we also found ourselves pining for a tool such as that described by collins and murray that could automatically verify access for a large set of links.4 the extent of displeasure with the situation was so severe that some librarians stated they were loath to promote search it to students since several million records were so conspicuously inaccurate. information technology and libraries | march 2019 31 technical challenges the k-state libraries it department wanted to fix the situation, in order to provide accurate expectations to their users, but doing so presented severe technical challenges, the most significant of which stemmed from the lack of rights information in the pnx record in primo. without more accurate information on availability, user satisfaction seemed destined to remain low. research into patron use of discovery layers predicted this unsurprising dissatisfaction. oclc’s (2009) research into what patrons want from discovery system led the researchers to conclude that “a seamless, easy flow from discovery through delivery is critical to end users. this point may seem obvious, but it is important to remember that for many end users, without the delivery of something he or she wants or needs, discovery alone is a waste of time.”5 a later usability study reported: “some participants spent considerable time looking around for features they hoped or presumed existed that would support their path toward task completion.”6 additionally, the perceived need to customize discovery layers so that they reflect the needs of a particular research library is hardly new, or exclusive to k-state libraries. the same issue was confronted by catalogers at east carolina university, as well as catalogers at unc chapel hill.7 nonetheless, the challenge posed by discovery layers comes with opportunity, as james madison university discovered when their ebsco discovery service widget netted almost twice the usage of their previous library catalog widget, and as the university of colorado discovered when they observed users attempting to use the discovery layer search box in “google-like” ways that could potentially aid discovery layer creators (as well as library it departments) in both design and in setting expectations.8 as previously noted, primo’s results display is driven by pnx records (see figure 5 for an example). the single most fundamental challenge was finding a way to get to holdings rights information despite that data not being present in the pnx records, or, consequently, the search results that showed up in the presentation layer. 
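the way out of this bind, described in the next section, was to go around primo entirely and ask hathitrust itself about rights. as a rough illustration of what that kind of lookup involves, the sketch below queries hathitrust's bibliographic api for a single volume and reads the rights string off the response. it is written as a server-side (node-style) call because, as the authors explain below, javascript running on the primo page cannot request data across domains directly. the url pattern and the response fields other than usrightsstring (which the article itself cites) are assumptions based on hathitrust's public api documentation, not details taken from the article.

```javascript
// illustrative only: ask hathitrust what rights apply to one volume.
// url pattern and field names (apart from usRightsString) are assumptions
// and may not match the current api exactly.
async function lookupUsRights(htid) {
  const url = 'https://catalog.hathitrust.org/api/volumes/brief/htid/' +
    encodeURIComponent(htid) + '.json';
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error('hathitrust bib api returned ' + response.status);
  }
  const data = await response.json();
  // the brief response carries an items array; each item includes a
  // human-readable usRightsString such as "Full view" or "Limited (search-only)".
  const item = Array.isArray(data.items) ? data.items[0] : undefined;
  return item ? item.usRightsString : null;
}

// example: the volume whose pnx record appears in figure 5
// (http://hdl.handle.net/2027/uc1.32106011231518).
lookupUsRights('uc1.32106011231518')
  .then(function (rights) { console.log(rights); })
  .catch(function (err) { console.error(err); });
```

the value of a lookup like this is that it returns the one piece of information the pnx record cannot supply: whether the volume is actually viewable in full.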
there was no immediate option to create a solution that leveraged “server-side” resources, where the data itself resided and was transformed, since k-state libraries subscribes to primo as a hosted service, and ex libris provided no direct server-side access to k-state libraries. some alternative way had to be found to locate the rights data for individual records and populate it into the primo interface. upon assessing the situation, the assistant director, it (ad) decided that one potential approach would be to independently query the hathitrust bibliographic application programming interface (api) for rights information. this approach solved a number of fundamental problems, but also posited its own questions and challenges: 1. some server-side component would still be needed for part of the query . . . where would that live and how could it be made to communicate with the javascript k-state libraries had injected into its primo instance? 2. how to best isolate hathitrust object identifiers from primo and then use them to launch an api query? 3. how to keep those responses appropriately “pinned” to their corresponding entries on the primo page? 4. how would the hathitrust bibliographic api perform under load from search it queries? answering these questions would require significant research into the hathitrust bibliographic api documentation, and extensive experimentation. taking the long way around | bengston and coleman 32 https://doi.org/10.6017/ital.v38i1.10574 figure 5. a portion of the pnx record for http://hdl.handle.net/2027/uc1.32106011231518 (the second item shown in figure 1). building the application of these four questions, the first was easily the most challenging: where would the server-side component live and how would it work? the k-state libraries it services department had, in the past, made a number of significant modifications to the appearance and functionality of the primo application by adding javascript to the static html tiles used in the primo interface. however, generally speaking, javascript cannot successfully request data from outside of the domain of the web document it occupies. requesting data from an api across domains requires the mediation of a server-side appliance. the ad constructed one for this purpose, using the php programming language. this script would serve as an intermediary between the javascript in primo and the hathitrust api. the appliance accepted data from the primo javascript in the form of the contents of http variables (encoded in the url of the get request to the php appliance), then used those values to query the hathitrust api. however, since this server-side appliance did not reside in the same domain as k-state libraries’ primo instance, the problem of getting the returned api data from the php appliance to the javascript still remained. this problem was solved by treating the php appliance as a javascript file for purposes of the application. while javascript cannot load data from another domain, a web document may load actual javascript files from anywhere on the web. the hathigenius appliance takes advantage of this fact by calling the php appliance programmatically as a javascript file, with a javascript object notation (json) version of the identifiers of any hathitrust entries encoded as part of the url used to call the file. 
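to make the mechanism concrete, the sketch below illustrates the general script-injection pattern the authors describe: harvest hathitrust identifiers from the rendered results, call a server-side intermediary as though it were a javascript file, and let the returned variable drive the rewrite of each entry. every name in it (the appliance url, the data-htid attribute, the global variable, and the positive-state class) is a hypothetical stand-in for illustration rather than k-state libraries' actual code, and production code would need error handling and timing safeguards.

```javascript
// illustrative sketch of the cross-domain bridge described above.
// selectors, urls, and most names are hypothetical stand-ins.

// 1. harvest hathitrust identifiers (htids) from the result list, keeping
//    track of which page element each identifier came from.
function collectHathiEntries() {
  var pairs = [];
  document.querySelectorAll('.exl-result[data-htid]').forEach(function (entry) {
    pairs.push({ htid: entry.getAttribute('data-htid'), domId: entry.id });
  });
  return pairs;
}

// 2. call the server-side appliance by loading it as a <script> element.
//    script elements are exempt from the same-origin restriction, so the
//    php appliance can live on a different domain than the hosted primo.
function requestRights(pairs) {
  var ids = pairs.map(function (p) { return p.htid; });
  var script = document.createElement('script');
  script.src = 'https://apps.example.edu/hathigenius/availability.php?ids=' +
    encodeURIComponent(JSON.stringify(ids));
  // the appliance answers with what looks like an ordinary javascript file,
  // e.g.  var hathiRights = {"uc1.32106011231518": "Full view", "xyz": null};
  // which the browser executes, defining a global the page can then read.
  script.onload = function () {
    applyRights(pairs, window.hathiRights || {});
  };
  document.head.appendChild(script);
}

// 3. rewrite each result entry to reflect the rights data that came back.
function applyRights(pairs, rights) {
  pairs.forEach(function (p) {
    var entry = document.getElementById(p.domId);
    if (!entry) { return; }
    if (rights[p.htid]) {
      // assumed positive-state class; the article names only the
      // "maybe available" and "not available" classes.
      entry.classList.add('EXLResultStatusAvailable');
    } else {
      entry.classList.add('EXLResultStatusMaybeAvailable'); // "check the view it tab"
    }
  });
}

requestRights(collectHathiEntries());
```

the essential point of the pattern is that the browser never makes a cross-domain data request at all; it simply executes a script, which is why the data can cross domains without any server-side access to the hosted primo instance.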
the php script runs the queries against the api and returns a javascript file consisting of a single variable containing the json data encoding the availability information for the hathitrust entries as supplied from the bibliographic api . . . essentially appearing to the browser as a standard javascript file. information technology and libraries | march 2019 33 the second and third problems were intrinsically interrelated, and essentially boiled down to finding a unique identifier to use in an api query from the hathitrust entries. the most effective way to handle these queries was to use the “htid” identifier, which was largely unique to hathitrust entries, could be easily extracted from any entries that contained it, and would form the basis of the php script’s request to the hathitrust restful api to obtain rights information. in the process of harvesting the htid, hathigenius also copies the id for the object in the webpage that serves as the entry in the list of primo returns containing that htid. as the data is moved back and forth for processing, the htids, and later the resultant json data, remain paired to the object id for the entry in the list of returns. when hathigenius receives the results of the api query, it can then easily rewrite those entries to reflect the rights data it obtained. the fourth question has been fully answered with time. to this point, well over a year after hathigenius was activated in production, library it has not observed any failure of the api to deliver the requested results in testing, and no issues to that effect have been reported by users. log data indicates that, even under heavy load, the api is performing to expectations. further modifications originally, the hathigenius application supplied definitive states of available or unavailable for each entry. however, some experimentation showed this approach to be less than optimal. since the bibliographic api cannot be queried by kansas state university as a specific user, but rather was being queried for general access rights, the possibility still existed for false negatives in the future, if kansas state university’s level of access to hathitrust changed. the data returned from the api queries, when drilled down, just consisted of the usrightsstring property from the api, which corresponded to open-source availability, and did not account for any additional items available to the library by license in the future. after the application had been active for a short time, to mitigate this potential issue, the “not available” state (consisting of an application of the “exlresultstatusnotavailable” class to the hathitrust entry) was “softened” into an application of the “exlresultstatusmaybeavailable” class and verbiage asking users to check the “view it” tab for availability. a few weeks after deployment, it received a ticket indicating hathigenius was failing to work properly. the source of the problem proved to be detailed bibliographic pages for items in a search results list, which were linking out from the search entries. these pages used a different class and object structure than the search results pages in primo, requiring that an additional module be built into hathigenius to account for them. once the new module was added to the application and put into place, the problem was resolved. a second issue presented itself some weeks later, when a few false negatives were reported. 
at first, the assistant director assumed that licensing had changed, creating a disparity between the access information from the usrightsstring property and the library’s actual holdings. however, upon investigation it was clear that hathigenius was dropping some of the calls to the hathitrust bibliographic api. the api itself was performing as expected under load, however, and the failure proved to be coming from an unexpected source. the php script used by hathigenius to interface with the api was employing the curl module, which, in turn, was using its own, less secure certificate to establish a secure socket layer (ssl) connection to the hathitrust server. once the taking the long way around | bengston and coleman 34 https://doi.org/10.6017/ital.v38i1.10574 script was refactored to employ the simpler file_get_contents function, which relied upon the server’s main ssl certificate, the problem was fully resolved. hathigenius also had a limited vulnerability to bad actors. while the internal script’s destination hardwiring prevented hathigenius from being used as a generic tool to anonymously query apis, the library did encounter a situation in which a (probably inadvertently) malicious bot repeatedly pinged the script, causing it to use up system resources until it interrupted other services on the host machine. modifications were added to the script to provide a simple check against requests originating from primo. additionally, restrictions were placed on the script so that excessive resource use would cause it to be intermittently deactivated. while not perfect solutions, these measures have prevented a repeat of the earlier incident. k-state libraries has recently finished work on its version of the new primo user interface (primo new ui), which was moved into production this year. the new interface has a completely different client-side structure, requiring a very different approach to integrating hathigenius.9 appearance of hathitrust results in primo after hathigenius when the hathigenius api does not find a usrights property, we configured primo to display a yellow dot and the text “please check availability with the view it tab” (see figure 6 for an example). as noted earlier, we originally considered this preferable to displaying a red dot and the text “not available online,” because there might be instances in which the item is actually available in full view through hathitrust despite the absence of usrights in the record. figure 6. two books for which hathigenius found no usrights in hathitrust. when the hathigenuis api finds usrights, we configured primo to display a green dot and text “available online” (see figure 7 for an example). information technology and libraries | march 2019 35 figure 7. a book for which hathigenius found usrights. patron response since the beginning of 2017, the reference staff at k-state libraries have received no reports of patrons encountering situations in the original user interface in which primo indicates that full text is available but hathitrust is only providing a preview. however, a small number of patrons (at least four) expressed confusion at seeing a result in primo and discovering that the full-text is not available. some of those patrons noted that they saw the text “please check availability with the view it tab,” and inferred that this was meant to state that the full-text was available. others indicated that they never considered that we would include results for books that we do not own. 
these responses add to the body of literature documenting user expectations that everything should be available in full-text in an online library and that systems should be easy to use.10 internal response in order to gauge the feelings of k-state libraries’ staff who regularly assist patrons with reference questions, the authors crafted a brief survey (included in appendix a). respondents were asked to indicate whether they had noticed a positive change following implementation of hathigenius, a negative change, or no change at all. they were also invited to share comments. the survey was distributed to thirty individuals. twelve (40 percent) of those thirty responded to the survey. the survey response indicated a great deal of ambivalence by reference staff toward the change, with four individuals (33 percent) indicating they had not noticed a difference, and another four (33 percent) indicating that they had noticed a difference, but that it had not improved the quality of search results. only two (17 percent) of the respondents revealed that they had noticed an improvement in the quality of the search results. one (9 percent) respondent indicated that they felt that the hathitrust results had gotten noticeably worse since the introduction of hathigenius, although they did not elaborate on this in the survey question which invited further comment. the remaining respondent stated that they did not have an opinion. four comments were left by respondents, including one which indicated displeasure with the new, softer verbiage for hathitrust “negatives,” and one who claimed that the problem of false positives persisted, despite such feedback not being seen by the authors through any of the statistical modalities currently used for recording reference transactions. one user praised hathigenius, while another related broad displeasure with the decision to include hathitrust records in search it. that individual claimed that almost none of the results from hathitrust were available and stated that the hope engendered by the presence of the hathitrust results and the corresponding suggestion to check the view it tab was always dashed, to the detriment of patron satisfaction. taking the long way around | bengston and coleman 36 https://doi.org/10.6017/ital.v38i1.10574 the new ui as previously mentioned, in late 2018, k-state libraries adopted the primo new ui created by ex libris. this new user interface was built in angular, and changed many aspects about how hathigenius had to be integrated into primo. the k-state libraries’ it department completed a refactoring (reworking application code to change how an application works, but not what it does) of hathigenius to integrate it with the new ui and released it into production in september 2018. as an interesting aside, the it department did not initially prioritize the reintegration of hathigenius, due to the ambivalence of the response to the application evidenced by the survey conducted for this paper. however, shortly after search it was switched over to the new ui, complaints about the hathitrust results again displaying inaccurate availability information began to come in to the it department via both email and tickets from reference staff. as the stridence of the response increased, the project was reprioritized, and the work completed. future directions as previously mentioned, hathigenius currently uses the very rough “usrightsstring” property value from the hathitrust bibliographic api. 
however, the api also delivers much more granular rights data for digital objects. a future version of the app may inspect these more granular rights codes and compare them to rights data from k-state libraries in order to more definitively provide access determinations for hathitrust results in primo should the licensing of hathitrust holdings be changed. similarly, since htid technically only resolves to the volume level, a future version may additionally harvest the hathitrust record number, which appears to be extractable from the primo entries. based on feedback from the survey, the “soft negative” verbiage used in hathigenius was replaced with a firmer negative. this decision proved especially sagacious given that, once the early issues with certificates and communication with the hathitrust bibliographic api were sorted out, the accuracy of the tool seemed to be fully satisfactory. another problem with the “soft negative” was the fact that it asked users to click on the view-it tab, when many users simply chose to ignore the tabs and links in the search results, instead clicking on the article title, as found in a usability study on primo conducted by the university of houston libraries.11 it is also worth noting the one survey respondent who is apparently not seeing an improvement in hathitrust accuracy. if the continued difficulties they have indicated can be documented and replicated, the it department can examine those complaints to investigate where the tool may be failing. discussion one interesting feature of this experience is the seeming disconnect between library reference support staff and users in terms of the perception of the efficacy of the tool. this disconnect is all the more curious given the negative reaction displayed by reference support staff when hathigenius became unavailable temporarily upon introduction of the primo new ui. part of this perceived disconnect may be a result of the fact that staff were given a survey instrument, while the reactions of users have been determined largely via null results (a lack of complaints to, or information technology and libraries | march 2019 37 requests for assistance from, service point staff). however, given the dramatic drop in user complaints compared to the ambivalent reaction to the tool by most of the survey respondents, it appears that the staff had a much less enthusiastic response to the intervention than patrons. a few possibilities occur to the authors, including a general dislike for the discovery layer by reference librarians, a general disinclination toward a technological solution by some respondents, or the initial perception by at least part of the reference staff that the problem was not significant. as noted by fagan et al., the pivot toward discovery layers has not been a comfortable one for many librarians.12 until further research can be conducted on this, and reactions to similar customization interventions, these possibilities remain speculation. one particular feature of note with hathigenius is the use of what one of the authors refers to as “sidewise development” to solve problems that seem to be intractable within a proprietary, or open source, web-based tool. while not a new methodology in and of itself, the author has mainly encountered this type of design in ad-hoc creations, rather than as a systematic approach to problem-solving. 
instead of relying upon the capabilities of primo, this type of customization made its own query to a relevant api and blended that external data with the data available from primo seamlessly within the application’s presentation layer in order to facilitate a solution to a known problem. the solution created in this fashion was portable, and unaffected by most updates to primo itself. even the transition to the new ui required changes to the “hooks” and timing used by the javascript, rather than any substantial rewrite of the core engines of the application. this methodology has been used repeatedly by k-state libraries it services to solve problems where other interventions would have necessitated the creation of specialized modules, or the rewriting of source code; both of which would be substantially affected by updates to the product itself, and which would have been difficult to improve or version without down time to the affected product. similar solutions have seen tools independently query an application’s database in order to inject the data back into the application’s presentation layer, bypassing the core functionality of the application. conclusion reactions at this point from users, and at least some library staff, have been positive. while not a perfect tool, hathigenius has improved the user experience, removing a point of frustration and an area of disconnect between the library and its users. the application itself is fully replicable by other institutions (as is the general model of sideways development), allowing them to improve the utility of their primo instances. as with many possible customizations to discovery layers, hathigenius provides fertile ground for additional work, research, and refinement, as libraries struggle to find the most effective ways to implement discovery tools within their own environments. beyond hathigenius itself, the sideways development method provides a powerful tool for libraries to improve the tools they use by integrating additional functionality at the presentation layer level. tackling the problem of inaccurate full-text links in discovery layers is only one application of this approach, but it is an important one. as libraries continue to strive to improve the results and usability of their search offerings, the ability to add local customizations and improvements will be an essential feature for vendors to consider. taking the long way around | bengston and coleman 38 https://doi.org/10.6017/ital.v38i1.10574 appendix a. feedback survey q1 in january 2017, the library began applying a tool (called hathigenius) to the hathitrust results in primo in order to eliminate the problem of “false positives.” in other words, primo would report that all of the hathitrust results it returned were available online as full text, when many were not. we would like your feedback about the impact of this change from your perspective. q2 which of the following statements best describes your opinion about the impact of hathigenius? o i haven’t noticed a difference. o i feel that search it’s presentation of hathitrust results in search it has become noticeably better since hathigenius was implemented. o i feel that search it’s presentation of hathitrust results in search it has become noticeably worse since hathigenius was implemented. o i have noticed a difference, but i feel that search it’s presentation of hathitrust results is about the same quality as it was before hathigenius was implemented. o no opinion. 
q3 please share any comments you have about hathigenius or any ideas you have for improving the display of hathitrust’s records in search it. information technology and libraries | march 2019 39 references 1 hathitrust digital library, “welcome to hathitrust!” accessed march 4, 2018, https://www.hathitrust.org/about. 2 sunshine carter and stacie traill, “essential skills and knowledge for troubleshooting eresources access issues in a web-scale discovery environment,” journal of electronic resources librarianship 29, no. 1 (2017): 7, https://doi.org/10.1080/1941126x.2017.1270096. 3 jacquie samples and ciara healy, “making it look easy: maintaining the magic of access,” serials review 40, no. 2 (2014): 114, https://doi.org/10.1080/00987913.2014.929483. 4 maria collins and william t. murray, “seesau: university of georgia’s electronic journal verification system,” serials review 35, no. 2 (2009): 80, https://doi.org/10.1080/00987913.2009.10765216. 5 karen calhoun, diane cellentani, and oclc, eds., online catalogs: what users and librarians want: an oclc report (dublin, ohio: oclc, 2009): 20, https://www.oclc.org/content/dam/oclc/reports/onlinecatalogs/fullreport.pdf. 6 rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends; baltimore 61, no. 1 (summer 2012): 191, https://scholarcommons.scu.edu/cgi/viewcontent.cgi?article=1132&context=library. 7 marlena barber, christopher holden, and janet l. mayo, “customizing an open source discovery layer at east carolina university libraries “the cataloger’s role in developing a replacement for a traditional online catalog,” library resources & technical services 60, no. 3 (july 2016): 184, https://journals.ala.org/index.php/lrts/article/view/6039; benjamin pennell and jill sexton, “implementing a real-time suggestion service in a library discovery layer,” code4lib journal, no. 10 (june 2010): 5, https://journal.code4lib.org/articles/3022. 8 jody condit fagan et al., “usability test results for a discovery tool in an academic library,” information technology and libraries 31, no. 1 (march 2008): 99, https://doi.org/10.6017/ital.v31i1.1855. 9 dan moore and nathan mealey, “consortial-based customizations for new primo ui,” the code4lib journal, no. 34 (october 25, 2016), http://journal.code4lib.org/articles/11948. 10 lesley m. moyo, “electronic libraries and the emergence of new service paradigms,” the electronic library, 22, no. 3 (2004): 221, https://www.emeraldinsight.com/doi/full/10.1108/02640470410541615. 11 kelsey brett, ashley lierman, and cherie turner, “lessons learned: a primo usability study,” information technology and libraries, 35, no. 1 (march 2016): 20, https://ejournals.bc.edu/ojs/index.php/ital/article/view/8965. 12 fagan et al., “usability test results for a discovery tool in an academic library,” 84. lib-mocs-kmc364-20140106084054 a computer system for effective management of a medical library network 213 richard e. nance and w. kenneth wickham: computer science/ operations research center, institute of technology, southern methodist university, dallas, texas, and maryann duggan: systems analyst, south central regional medical library program, dallas, texas trips (talon reporting and information processing system) is an interactive software system for generating reports to nlm on regional medical library network activity and constitutes a vital part of a network management information system (nemis) for the south central regional medical library program. 
implemented on a pdp-10/sru 1108 interfaced system, trips accepts paper tape input describing network transactions and generates output statistics on disposition of requests, elapsed time for completing filled requests, time to clear unfilled requests, arrival time distribution of requests by day of month, and various other measures of activity and/or performance. emphasized in the trips design are flexibility, extensibility, and system integrity. processing costs, neglecting preparation of input (which may be accomplished in several ways), are estimated at $.05 per transaction, a transaction being the transmittal of a message from one library to another. introduction the talon (texas, arkansas, louisiana, oklahoma, and new mexico) regional medical library program is one of twelve regional programs established by the medical library assistance act of 1965. the regional programs form an intermediate link in a national biomedical information network with the national library of medicine (nlm) at the apex. unlike most of the regional programs, which formed around a single library, talon evolved as a consortium of eleven large medical resource libraries with administrative headquarters in dallas. a major focus of the talon program is the maintenance of a document delivery service, created in march 1970, to enable rapid access to published medical information. twx units located in ten of the resource libraries and at talon headquarters in dallas comprise the major communication channel. in july 1970 a joint program was initiated to develop a statistical reporting system for the talon document delivery network. design and development of the system were done by the computer science/operations research center at southern methodist university, while training and operational procedures were developed by talon personnel. both parties in the effort view the statistical reporting system as a vital first step in providing talon administrators with a comprehensive network management information system (nemis). an overview of this statistical reporting system, designated as trips (talon reporting and information processing system), and its relation to nemis is discussed in the following paragraphs. the objectives and design characteristics of nemis are stated in (1). design requirements two considerations shaped the requirements for a network management information service (nemis) for talon: 1) in what environment would talon function? 2) what should be the objectives of a network management information service, and what part does a statistical reporting system play in its development? the talon staff and the design team spent an intensive period in joint discussion of these two questions. talon environment the talon document delivery network operates in an expansive geographical area (figure 1). the decentralized structure of the network enables information transfer between any two resource libraries. in addition, talon headquarters serves as a switching center, accepting loan requests, locating documents, and relaying requests to holding libraries. a requirement placed on talon by nlm is the submission of monthly, quarterly, and annual reports giving statistical data on network activity.
these statistics provide details on: 1) requests received by channel used (mail, telephone, twx, other), 2) disposition of requests (rejected, accepted and filled , accepted and unfilled), 3) response time for filled requests, 4) response time for unfilled requests, 5) most frequent user libraries, 6) requests received from each of the other regions, and 7) non-medlars reference inquiries. a medical library networkjnance 215 • fig. 1. location of the eleven resource libraries and talon headquarters. monthly reports require cumulative statistics on year-to-date performance, and each of the eleven resource libraries and talon headquarters is required to submit a report on its activity. needs and objectives while the immediate need of the talon network was to develop a system to eliminate manual preparation of nlm reports, an initial decision was made to develop software also capable of assisting talon management in policy and decision making. eventual need for a network management information system ( nemis) being recognized, the talon reporting and information processing system (trips) was designed as the first step in the creation of nemis. provision of information in a form suitable for analytical studies of policy and decision makinge.g., the message distribution problem described by nance ( 2) -placed some stringent requirements on trips. for instance, the identification of primitive data elements could not be made from report considerations only; an overall decision had to be made that no sub-item of information would ever be required for a data element. in addition the system demanded flexibility and extensibility, since it was to operate in a highly dynamic environment. these characteristics are quite apparent in the design of trips. 216 journal of library automation vol. 4/4 december, 1971 trips design trips is viewed as a system consisting of hardware and software components. the description of this system considers: 1) the input, 2) the software subsystems (set of programs), 3) hardware components, and 4) the output. emphasis is placed on providing an overview, and no effort is made to give a detailed description. the environment in which trips is to operate is defined in a single file ( for25.dat). this file assigns network parameters, e.g., number of reporting libraries, library codes, and library titles. the file is accessed by subprograms written in fortran iv and dystal ( 3), the latter being a set of fortran iv subprograms, termed dystal functions, that perform primitive list processing and dynamic storage allocation operations. because it requires only fortran iv trips can be implemented easily on most computers. input a transaction log, maintained by each regional library and talon headquarters, constitutes the basic input to trips. copies of log sheets are used to create paper tape description of the transactions. if and when compatibility is achieved between standard twx units and telephone entry to computer systems, the input could be entered directly by each regional library. (this is technically possible at present. ) currently, talon headquarters is converting the transaction descriptions to machine readable form. initial data entry under normal circumstances is pictured in figure 2, which shows the sequence of operations and file accesses in two phases: 1) data entry and 2) report generation. data entry in tum comprises 1) collecting statistics, 2) diagnosis and verification of input data and 3) backup of original verified input data. 
trips is designed to be extremely sensitive to input data. all data is subjected to an error analysis, and a specific file (for22.dat) is used to collect errors detected or diagnosed in the error analysis routine. only verified data records are transmitted to the statistical accumulation file (for20.dat). software subsystems trips comprises seven subsystems or modules. within each module are several fortran iv subprograms, dystal functions, and/or pdp-10 systems programs discussed under hardware components in the following section:
newy: run at the beginning of each year, newy builds an in-core data structure and transfers it to disk for each resource library in the network. it further creates the original data backup disk file (for23.dat). after disk formatting, record (the accessing and storage module) may be activated to begin accumulating statistics for the new year.
fig. 2. trips structure (statistical collection and report generation phases, producing reimbursable and non-reimbursable statistics reports).
newq: run between quarters, newq purges the past quarter statistics for each library and prepares file for23.dat for the next quarter. the report for the quarter must be generated before newq is executed.
newm: run between months, newm purges the monthly statistics for each regional library and prepares file for23.dat for the backing up of next month's data.
dumpl: the utility module causes a dystal dump of the data base.
record: the accessing and storage module record incorporates the error diagnosis on input and the entry of validated data records into file for23.dat. no data record with an indicated error is permitted, and erroneous records are flagged for exception reporting. the error report (ermes.dat) may be printed on the teletype or line printer after execution of record.
report: the reporting module report generates all reimbursable statistics on a month-to-date, quarter-to-date, and year-to-date basis.
manage: utilization of trips as a network management tool is afforded by manage, which combines statistics from reimbursable and non-reimbursable transactions to generate a report providing measures of total network activity and performance.
the primary files used by the software subsystems are described briefly in table 1.
table 1. primary files in trips (for each file: name and type, function, comments)
for25.dat (ascii): contains the system definition parameters and initialization values. created from card input to assure proper format.
for20.dat (binary): statistical accumulation for validated data records. two parts: (1) input translator data structure, and (2) statistical data base.
for21.dat (ascii): generation of reports from information in for20.dat. carriage control characters must be included to generate reports.
for22.dat (ascii): collects data records diagnosed as in error. errors accumulated in for22.dat are transmitted to ermes.dat for output.
for23.dat (ascii): enables creation and updating of the backup magnetic tape. each month's validated records added to tape.
for24.dat (binary): enables recovery read of backup tape. tape information stored prior to transfer of file information to for20.dat.
ermes.dat (ascii): serves to output messages on data records diagnosed as in error. if 6 or less errors occur, ermes is not created and messages are output to the teletype.
if more than 6 errors, an estimate of typing time is given to user who has option of printing them on the teletype or in a report form on the line printer. a medical library networkjnance 219 a major concern in any management information system is the system integrity. in addition to the diagnosis of input data, trips concatenates sequential copies of disk file for23.dat to provide a magnetic tape backup containing all valid data records for the current year. a failsafe tape, containing all trips programs, is also maintained. hardware components conversion of transaction information to machine readable form is done off line currently. using a standard twx with ascii code, paper tapes are created and spliced together. fed through a paper tape reader to a pdp-10 (digital equipment company), the input data is submitted to trips. control of trips is interactive, with the user monitoring program execution from a teletype. all file operations are accomplished using the pdp-10 via the teletype, and the output reports are created on a high-speed line printer. with sm,u's pdp-10 and sru 1108 interface, report generation can be done on line printers at remote terminals to the sru 1108 as well. output trips output consists of a report for each library in the network and a composite for the entire network. the report may be limited to reimbursable statistics or include all statistics. information includes: 1) errors encountered in the input phase, 2) number of requests received by channel, 3 ) disposition of requests (i.e., rejected, accepted/ filled, accepted/ unfilled, etc. ) , 4) elapsed time for completing :filled requests or clearing unfilled requests, 5) geographic origin of requests, 6) titles for which no holdings were located within the region, 7 ) types of requesting institutions, 8) arrival time distribution of requests by day of month, 9) invoice for reimbursement by talon, 10 ) node/ network dependency coefficient as described by ( 4). summary trips is now entering its operational phase. training of personnel at the resource libraries is concluded, and data on transactions are being entered into the system. input errors have decreased significantly ( from fifteen or twenty percent to approximately two percent ). talon personnel are enthusiastic, and needless to say the regional library staffs are happy to see a bothersome, time-consuming manual task eliminated. in summary, the following characteristics of trips deserve repeating: 1) with its modular construction, it is flexible and extensible. 220 journal of library automation vol. 4/4 december, 1971 2) implemented in dystal and fortran iv, it should allow installation on most computers without major modifications. 3) designed to operate in an interactive environment, it can be modified easily to function in a batch processing environment. 4) trips is extremely sensitive to system integrity, providing diagnosis of input data, reporting of errors, magnetic tape backup of data files, and a system failsafe tape. 5) definition of primitive data elements and the structural design of trips enable it to serve as the nucleus of a network management information system ( nemis) as well as to generate reports required by nlm. 6) currently accepting paper tape as the input medium, trips could be modified easily to accept punched card input and with more extensive changes could derive the input information during the message transfer among libraries. 
finally, the processing cost of operating trips, neglecting the conversion to paper tape, is estimated to be $.05 per transaction (a message transfer from one library to another). extensive and thorough documentation of trips has been provided. availability of this documentation is under review by the funding agency. acknowledgment work described in this article was done under contract hew phs 1 g04 lm 00785-01, administered by the south central regional medical library program of the national library of medicine. the authors express their appreciation to dr. u. narayan bhat and dr. donald d. hendricks for their contributions to this work. references 1. "nemis -a network management information system," status report of the south central regional medical library program, october 26, 1970. 2. nance, richard e.: "an analytical model of a library network," journal of the american society for information science, 21: (jan.-feb. 1970), 58-66. 3. sakoda, james m.: dyst aldynamic storage allocation language manual, (providence, r. i.: brown university, 1965). 4. duggan, maryann, "library network analysis and planning (libnat)," journal of library automation, 2: (1969), 157-175. journey with veterans: virtual reality program using google expeditions public libraries leading the way journey with veterans virtual reality program using google expeditions jessica hall information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12857 jessica hall (jessica.hall@fresnolibrary.org) community librarian, fresno county public library. © 2020. “where would you like to go?” is the question of the day. we have stood atop the great wall of china, swam with sea lions in the galapagos islands, and walked along the vast red sands of mars. each journey was unique and available through the library. as a community librarian in charge of outreach to seniors and veterans, i first learned about the virtual tour idea from a colleague who returned from a conference excited to tell me about a workshop she had attended. the workshop she had taken described a program which utilized google expeditions to take seniors on virtual tours. this idea stayed with me for months until fresno county public library obtained the $3000 value of libraries grant, which was funded by the california library services act. as a part of this grant, $2905 went to purchase a google expeditions kit and supplied to create a virtual reality program called journey with veterans. the kit includes 5 viewers and 1 tablet. a viewer is basically a google cardboard except the case is plastic and there is a smartphone inside of the case. during the program, i use the table to select and run each tour. the tour i select on the tablet is projected to the 5 viewers so participants can experience it. in this manner, veterans can explore places without physically having to travel anywhere. the journey with veterans program took the technology to the veterans instead of requiring them to come into the library. the two locations that were chosen were the veterans home of california fresno and the community living center at the va medical center in fresno, ca. from the time the program began in september 2019 to march 2020, when the pandemic shutdown brought a halt to the program, the library hosted 26 sessions at these two locations with 182 veterans. in sessions where more than 5 people were in attendance, the viewers were shared between the participants. 
the tablet and smartphones inside of the viewers have an app installed on them called google expeditions which is the software that runs the tours. one hotspot, which was already owned by the library, was used for this program. it is a requirement that all the viewers and the tablet are connected to the same wifi. having a portable wifi connection was necessary to run this program in locations where there was not access to a strong internet connection. each tour is a selection of still 360-degree views. the landscape does not move. instead, the participant turns their head around, up and down to look at the entire scene. the control tablet included additional menu items not seen by participants. these items included scripts that i can read off about the landscape we are looking at and suggested points of interest that i could highlight for participants. when i selected the point of interest on the tablet, the participant would see arrows pointing to that area of their screen. the participant would follow the arrows by turning their head in the direction that was indicated. the participants knew they were looking at the area of interest when the arrows disappeared and was replaced by a white circle surrounding the relevant portion of the screen. mailto:jessica.hall@fresnolibrary.org information technology and libraries december 2020 journey with veterans | hall 2 the viewers did not have straps attached to them and there was no way to attach straps to them. therefore, the viewer could not be strapped to the participant’s head. instead, the participant had to hold up the viewer the entire time they wished to look through it. this presented a challenge for participants who did not have the ability to hold the viewer on their own. at the locations i went to, there were staff available to help and they would hold the viewer up to a participant’s eyes. in some cases, one staff person held the viewer up for the participant while another would turn the participant’s wheelchair in a circle so they could see the entire image. each program lasted 30-45 minutes but the amount of time looking through the viewer was kept to around 15-20 minutes. the rest of the time was filled with talking about the location that we are viewing. for the veterans in memory care at the veterans home of california fresno, this program was designed with the hope that it would allow the veterans to reminisce about places they had visited and lived in and encouraged them to talk about their experiences. some of the participants had been to the countries that we visited virtually and they reminisced on their time there. at every session, the participants shared their enthusiasm and eagerness to continue the program. the program once was tried with music. on one of my first visits to the community living center at the va medical center, a participant asked if he could play music in the background. since i had thought about incorporating music into the program, i agreed, and the participant played some classical music from his own device. though it was a good idea, the execution did not work well. the music was coming from one location, which made it too loud when one stood near it but too quiet once one walked too far away. i found the music difficult to talk over while giving the tour. i believe that incorporating sounds of the location we visit, such as the sounds of the countryside or a big city would make the experience more immersive. however, i have yet to find a way to do so successfully. 
after the grant ended, i continued the program at both locations. the partnership i had created at the veterans home of california-fresno grew into a second program, storytime with veterans, which was requested specifically by the residents. i alternated my visits so that some weeks we did a virtual reality program and some weeks i read to them. one time, there was a miscommunication: the activity coordinator thought i had come to read a story, but i was under the impression that it was a virtual reality week and so i had brought the google expeditions with me. the solution was to do both. one of the google expeditions tours is a very short and much abridged virtual reality version of twenty thousand leagues under the sea by jules verne. the tour uses artwork to represent scenes from the book, and each scene tells a different part of the story. the veterans home's residents were treated to both a story and a virtual reality tour at the same time. up until the library's shutdown in mid-march due to covid-19, i was in the process of expanding the use of the google expeditions but was unable to continue. since then, the equipment has not been used. restarting the program now presents multiple challenges, not the least of which is sanitizing the devices. sanitation was a consideration even before covid-19, and sanitary virtual reality masks were acquired using grant funds as part of the initial program. these masks look like strips of cloth that line the eyes, with strings that hook around the ears to hold them in place. cleaning products were also purchased and used to clean the devices after each program. before covid-19, a viewer could be handled by multiple people before it was cleaned. i always handled them first to prepare them for use. then i handed each one to the participant. occasionally they were also handled by staff. i always cleaned the viewers right after the program ended but not during the program. with the current covid-19 restrictions, the sanitation practices previously used are inadequate. i do not know the future of the program in a post-covid-19 world, but i intend to begin the program again when it becomes safe to do so, and i will incorporate all required precautions and restrictions. i look forward to once more being able to take veterans on exciting virtual journeys. editorial board thoughts digital faculty development cinthya ippoliti cinthya ippoliti (cinthya.ippoliti@ucdenver.edu) is director, auraria library, colorado. the role of libraries within faculty development is not a new concept. librarians have offered workshops and consultations for faculty on everything from designing effective research assignments to scholarly impact and open educational resources. in recent months, however, both acrl and educause have highlighted new expectations for faculty to develop skills in supporting students within a digital environment. as part of acrl's "keeping up with…" series, katelyn handler and lauren hays1 discuss the rise of faculty learning communities that cover topics such as universal design, instructional design, and assessment. effective teaching has also recently become the focus of many institutions' efforts in increasing student success and retention, and faculty play a central role in students' academic experience.
in addition, the educause horizon report echoes these sentiments, positing that “the role of full-time faculty and adjuncts alike includes being key stakeholders in the adoption and scaling of digital solutions; as such, faculty need to be included in the evaluation, planning, and implementation of any teaching and learning initiative.”2 finally, maha bali and autumn caines mention that “when offering workshops and evidence-based approaches, educational development centers make decisions on behalf of educators based on what has worked in the past for the majority.”3 they call for a new model that blends digital pedagogy, identity, networks, and scholarship where the experience is focused on “participants negotiating multiple online contexts through various online tools that span open and more private spaces to create a networked learning experience and an ongoing institutionally based online community.”4 so how does the library fit into this context? what we are talking about here goes far beyond merely providing access to tools and materials for faculty. it requires a deep tripartite partnership with educators and the centers for faculty development, as each partner brings something unique to the table that cannot be covered by one area alone. the interesting element here is a dichotomy where this type of engagement can span both in-person and virtual environments as faculty utilize both to teach and connect with colleagues as part of their own development. the lines between these two worlds suddenly blur and it is experience and connectivity that are at the center of the interactions rather than the tools themselves. while librarians may not be able to provide direct support in terms of instructional technologies, they can certainly inform efforts to integrate open and critical pedagogy and scholarship into faculty development programming and into the curriculum. libraries can take the lead on providing the theoretical foundation and application for these efforts while the specifics of tools and approaches can be covered by other entities. bali and caines also observe that bringing together disparate teaching philosophies and skill sets under this broader umbrella of digital support and pedagogy can help provide professional development opportunities for faculty, especially adjuncts, who may not have the ability to participate otherwise. this opportunity can act as a powerful catalyst to influence their teaching by implementing, and therefore modeling, a best-practices approach so that they are thinking about digial faculty develoment | ippoliti 6 https://doi.org/10.6017/ital.v38i2.11091 bringing students together in a similar fashion even if they are not teaching exclusively online, but especially if they are.5 open pedagogy can accomplish this in a variety of ways. bronwyn hegarty defines eight areas that constitute open pedagogy: (1) participatory technologies; (2) people, openness, and trust; (3) sharing ideas and resources; (4) connected community; 5) learner generated; (6) reflective practice; and (7) peer review.6 these elements are applicable to both faculty development practices, as well as pedagogical ones. just as faculty might interact with one another in this manner, so can they collaborate with their students utilizing these methods. by being able to change the course materials and think about the ways in which those activities shape their learning, students can view the act of repurposing information as a way to help them define and achieve their learning goals. 
this highlights the fact that an environment where this is possible must exist as a starting point and it also underlines the importance of the instructor’s role in fostering this environment. having a cohort of colleagues, for both instructors and students, can “facilitate student access to existing knowledge, and empower them to critique it, dismantle it, and create new knowledge.”7 this interaction emphasizes a twoway experience where both students and instructors can learn from one another. this is very much in keeping with the theme of digital content, as by the very nature of these types of activities, the tools and methods must lend themselves to being manipulated and repurposed, and this can only occur in a digital environment. finally, in a recent posting on the open oregon blog, silvia lin hanick and amy hofer discuss how open pedagogy can also influence how librarians interact with faculty and students. specifically, they state that “open education is simultaneously content and practice”8 and that by integrating these practices into the classroom, students are learning about issues such as intellectual property and the value of information, by acting “like practitioners” 9 where they take on “a disciplinary perspective and engage with a community of practice.”10 this is a potentially pivotal element to take into consideration when analyzing the landscape of library-related instruction, because it frees the librarian from feeling as if everything rests on that one-time instructional opportunity. the development of a community of practitioners which includes the students, faculty, and the librarian has the potential to provide learning opportunities along the way. including the librarian as part of this model makes sense not only as a way to signal the critical role the librarian plays in the classroom, but also as a way to stress that thinking about, and practicing library-related activities is (or should be) as much part of the course as any other exercise. information technology and libraries | june 2019 7 references 1 katelyn handler and lauren hays, “keeping up with…faculty development,” association of college and research libraries, last modified 2019, http://www.ala.org/acrl/publications/keeping_up_with/faculty_development. 2 “horizon report,” educause, last modified 2019, https://library.educause.edu//media/files/library/2019/2/2019horizonreportpreview.pdf. 3 maha bali and autumm caines. “a call for promoting ownership, equity, and agency in faculty development via connected learning.” international journal of educational technology in higher education 15, no. 1 (2018): 3. 4 bali, “a call for promoting ownership, equity, and agency in faculty development,” 9. 5 ibid, 3. 6 bronwyn hegarty, “attributes of open pedagogy: a model for using open educational resources,” last modified, 2015, https://upload.wikimedia.org/wikipedia/commons/c/ca/ed_tech_hegarty_2015_article_attri butes_of_open_pedagogy.pdf. 7 kris shaffer, “the critical textbook,” last modified 2014, http://hybridpedagogy.org/criticaltextbook/. 8 silvia lin hanick and amy hofer, “opening the framework: connecting open education practices and information literacy,” open oregon, last modified 2017, http://openoregon.org/openingthe-framework/. 
9 "opening the framework."
10 "opening the framework."

creating and managing a repository of past exam papers

mariya maistrovskaya and rachel wang

information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11837

mariya maistrovskaya (mariya.maistrovskaya@utoronto.ca) is digital publishing librarian, university of toronto. rachel wang (rachel.wang@utoronto.ca) is application programmer analyst, university of toronto.

abstract

exam period can be a stressful time for students, and having examples of past papers to help prepare for tests can be extremely helpful. it is possible that past exams are already shared on your campus: by professors in their specific courses, via student unions or groups, or between individual students. in this article, we go over the workflows and infrastructure that support the systematic collection of past exam papers, the provision of access to them, and their management in a repository. we discuss platform-agnostic considerations of opt-in versus opt-out submission, access restriction, discovery, retention schedules, and more. finally, we share the university of toronto setup, including a dedicated instance of dspace, batch metadata creation and ingest scripts, and our submission and retention workflows, which take into account the varying needs of stakeholders across our three campuses.

background

the university of toronto (u of t) is the largest academic institution in canada. it spans three campuses and serves more than 90,000 students through its 700 undergraduate and 200 graduate programs.1 the university of toronto structure is the product of its rich history and is thus largely decentralized. as a result, the management of undergraduate exams is carried out individually by each major faculty at the downtown (st. george) campus, and centrally at the university of toronto mississauga (utm) and university of toronto scarborough (utsc) campuses.

the faculty of arts and science (fas) at the st. george campus has traditionally made exams from its departments available to students. in the pre-internet era, students were able to consult print and bound exams in departmental and college libraries' reference collections. with the rise of online technologies, the fas registrar's office seized the opportunity to make access to past exams more equitable for students and worked with the university of toronto libraries (utl) information technology services (its) to digitize exams and make them available online. they were initially shared electronically via the gopher protocol and later via docutek eres, one of the first available course e-reserves systems. after the utl became an early adopter of the dspace (https://duraspace.org/dspace/) open-source platform for its institutional repository in 2003, the utl its created a separate instance of dspace to serve as a repository of old exams. the repository makes the last three years of exams from the fas, utm, and utsc available online in pdf. about 5,500 exam papers are available to students with a u of t login at any given time.

discussed below are some of the considerations in establishing and maintaining a repository of old exams on campus, along with practical recommendations and shared workflows from the utl.
considerations in establishing a repository of old exams

if you are looking to establish a repository of old exams, these are some of the considerations to take into account when planning a new service or evaluating an existing one.

the source of old exams

depending on the level of centralization on your campus, exams may be administered by individual academic departments or submitted by instructors and administrators to a single location and managed centrally. the stakeholders involved in this process may include the office of the registrar, campus it, departmental admins or libraries, etc. establishing a relationship with these stakeholders is key to getting access to the files. when arranging to receive electronic files, consider whether they could be accompanied by existing metadata. alternatively, if the university archives or records management already receive copies of campus exams, you may be able to obtain them there. print versions will need to be digitized for online access; later in this article we share a metadata creation strategy for this scenario. it is also possible that exams may be collected in less formal ways, for example via exam drives run by student unions and groups.

the utl works closely with the fas registrar's office to receive a batch of exams annually. the utl receives a copy of print fas exams, which are digitized by the its staff. the utl also receives exams from two u of t campuses, utm and utsc, which arrive in electronic format via the campus libraries. the u of t engineering society and the faculty of law each maintain their own exam repositories, and the arts and science student union maintains a bank of term tests donated by students.

content hosting and management

one of the key questions to answer is which campus department or unit will be responsible for hosting the exams, managing content collection, processing and uploads, and providing technical and user support. these responsibilities may fall within the purview of a single unit or may be shared between stakeholders. here are some examples of the tasks to consider:

1. collecting exams from faculty or receiving them from a central location
2. managing restrictions (exams that will not be made available online)
3. digitizing exams received in print
4. creating metadata or converting metadata received with the files
5. uploading exams to the online repository
6. removing exams from the online repository
7. providing technical support and maintenance (e.g., platform upgrades, troubleshooting)
8. providing user support (e.g., assistance with locating exams)

at u of t, tasks 1–2 are taken care of by the registrar's offices at fas and utm and by the library at utsc. tasks 3–8 are performed centrally by the utl its, with the exception of digitization for exams received from the utm and utsc campuses. further details and considerations related to the content management system and processing pipelines are outlined in the "infrastructure and workflows" section below.

collection scope

depending on the sources of your exams, you may need to establish scope rules for what gets included in the collection. for example:

• will you only include final exams?
• will term tests also be included?
• will solutions be posted with the exams?
• will additional materials, such as course syllabi, also be included?

at the utl, only final exams are included in the repository, and no answers are supplied.

exam retention

making old exams available online is always a balancing act between the interests of students, who want access to past test questions, and the interests of instructors, who may have a limited pool of questions to draw from or who may teach different course content over time and want to ensure that the questions continue to be relevant. at the utl, in consultation with campus partners, this balance was achieved by posting only the three most recent years of exams in the repository. as soon as a new batch is received, the utl removes the batch of exams that is more than three years old.

opt-in versus opt-out approach

where exam collection is driven centrally, by a registrar's office for example, that office may require that all past exams be made available to students. as with the retention considerations, the needs of instructors who draw questions from a limited pool can be accommodated via opt-outs, individual exam restrictions, and ad hoc take-down requests. an alternative approach to exam collection is an opt-in model in which faculty choose to submit exam questions on their own schedule.

at the utl, the fas and the utm campus both operate under the opt-out model. the utl receives all exam questions in regular batches unless they have been restricted at an instructor's request. occasional withdrawal requests from instructors require approval from the registrar's office. conversely, the utsc campus operates under the opt-in model, where individual departments submit their exams to the library. while this model provides the most flexibility, the volume of exams received from this campus is consequently relatively small.

repository access

when making old exams available online, one of the things to consider is who will have access to them. will the exams only be available to students of the respective academic department, to all students, or to the general public? will access be possible on campus as well as off campus? if the decision is made to restrict access, is there an existing authorization infrastructure in place that the repository could take advantage of, such as institutional single sign-on or the library's proxy access? at the utl, access to the old exams repository is provided through ezproxy, in the same fashion as subscription resources made available via the library.

discoverability and promotion

how will students find out about the exams available in the repository? will the repository be advertised via the library's website, promoted by course instructors, or linked with other course materials? considering the challenge of promoting a resource like this alongside a variety of other library resources, it is preferable to make it known to students via the same channels through which they receive other course information. for many institutions this would be the learning management system or the course information system. at u of t, the old exams repository is linked from the library website. previously, the link was embedded in the university's learning management system course template.
with a recent transition to a new learning management system, such exposure has yet to be reestablished.

infrastructure and workflows

minimum cms requirements

a repository of old exams does not require a specific content management system (cms) or an off-the-shelf platform. your institution may already have all the components in place to make it happen. here are the minimum requirements you will want to see in such a system:

• file upload by staff (preferably in batch)
• file download by end users
• basic descriptive metadata
• a search/browse interface
• access control/authentication (if you choose to restrict access)

the utl uses a stand-alone instance of dspace for its old exams repository. dspace is open-source software for digital repositories used across the globe, primarily in academic institutions. the utl chose this platform because it was already running an instance of dspace for its institutional repository (ir) and had the infrastructure and expertise on site. however, this is not a solution we would recommend to an institution with no existing dspace experience. while dspace is an open-source platform, maintaining it locally requires significant staff expertise that may not be warranted considering that a collection of exams would use only a fraction of its robust functionality. if you do consider using dspace, a hosted solution may be preferable where local it resources and expertise are limited.

distributing past exams via an existing digital repository

an institution that already maintains a digital repository may consider adding exams as a collection within the existing infrastructure. when choosing to do so, it is important to consider whether the exams use case differs from your ir use case, and whether the new collection will fit the existing mission and policies. differences may include the following:

• access level. ir missions tend to revolve around providing openly accessible materials, whereas exams may need to be restricted. will your repository allow selective access restrictions on the exams collection?
• longevity. ir materials are usually intended to be kept long term, whereas exams may be on a retention schedule. for that reason, it also does not make sense to assign permanent identifiers to exams as many repositories do for their other materials.
• file types and metadata. unlike the variety of research outputs and metadata usually captured in an ir, exams have uniform metadata and a uniform object type. this makes them suitable for batch transformations and uploads.

batch metadata creation options

because of the uniform object type, exams are well suited to batch processing, transformations, and uploads. at the utl, metadata is created from the filenames of scanned pdf files by a python script.2 the script breaks the filename up into dublin core metadata fields based on the pattern shown in figure 1; figure 2 shows a snippet of the script populating the dublin core metadata fields.

figure 1. file-naming pattern for metadata creation at utl.

figure 2. a screenshot of the utl script generating dublin core metadata from filenames.

once metadata is generated, a second python script (figure 3) packages the pdf and metadata file into a dspace simple archive (dsa), the format that dspace accepts for batch ingests.
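the utl's production scripts are openly available on github (see the endnotes below); the following is only a rough sketch of the same filename-to-dublin-core pattern. the filename convention, field choices, and helper names used here are illustrative assumptions, not the utl's actual conventions, and the snippet is meant to show the general shape of such a pipeline rather than a drop-in implementation.

```python
# illustrative sketch only: a hypothetical filename pattern and field mapping,
# not the utl's actual ingest scripts (those are linked in the endnotes).
import shutil
from pathlib import Path
from xml.sax.saxutils import escape

def parse_exam_filename(pdf_path: Path) -> dict:
    """split a hypothetical filename like 'UTSG_CHM136_2019_APR.pdf' into metadata parts."""
    campus, course, year, session = pdf_path.stem.split("_")
    return {
        "title": f"{course} {session} {year} final exam",  # dc.title
        "date": year,                                       # dc.date.issued
        "subject": course,                                  # dc.subject
        "publisher": campus,                                # dc.publisher (campus code)
    }

def write_simple_archive_item(pdf_path: Path, meta: dict, batch_dir: Path) -> Path:
    """create one dspace simple archive item folder: dublin_core.xml, contents, and the pdf."""
    item_dir = batch_dir / pdf_path.stem
    item_dir.mkdir(parents=True, exist_ok=True)
    fields = [("title", None, meta["title"]),
              ("date", "issued", meta["date"]),
              ("subject", None, meta["subject"]),
              ("publisher", None, meta["publisher"])]
    xml = ['<?xml version="1.0" encoding="UTF-8"?>', "<dublin_core>"]
    for element, qualifier, value in fields:
        qual = f' qualifier="{qualifier}"' if qualifier else ""
        xml.append(f'  <dcvalue element="{element}"{qual}>{escape(value)}</dcvalue>')
    xml.append("</dublin_core>")
    (item_dir / "dublin_core.xml").write_text("\n".join(xml), encoding="utf-8")
    (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    shutil.copy(pdf_path, item_dir / pdf_path.name)
    return item_dir

if __name__ == "__main__":
    batch = Path("dsa_batch")
    for pdf in sorted(Path("scanned_exams").glob("*.pdf")):
        write_simple_archive_item(pdf, parse_exam_filename(pdf), batch)
```

a batch directory built this way can then be handed to dspace's native batch import, which matches the ingest step described next.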
figure 3. a screenshot of the utl script packaging a pdf and metadata into a dspace simple archive.

the dspace simple archive (dsa) then gets batch uploaded into the respective campus and exam-period collections (figure 4) using dspace's native batch import functionality. figure 5 shows what an individual exam record looks like in the repository. after a new batch is uploaded, collections older than three years are removed from the repository. the utl's exam processing scripts are openly available on github under an apache license 2.0 (https://github.com/utlib/dspace-exams-ingest-scripts/).

figure 4. a screenshot of collections in the utl's old exams repository.

figure 5. a screenshot of a record in the utl's old exams repository.

conclusion

having access to examples of past exam questions can be extremely helpful to students preparing for upcoming tests. it is possible that old exams are already being shared on your campus in official or unofficial ways, in print or electronically. facilitating online sharing of electronic copies means that all students, on and off campus, will have equitable access to these valuable resources. we hope that the considerations and workflows outlined in this article will help institutions establish such services locally.

acknowledgements

the authors would like to acknowledge the utl librarians and staff who contributed to the setup and maintenance of the old exams repository over the years: marlene van ballegooie, metadata technologies manager, who operated the filename-to-dublin core metadata crosswalk; sean xiao zhao, former applications programmer analyst, who converted it into python; and sian meikle, associate chief librarian for digital strategies and technology, who was at the inception of the original exam-sharing service and provided valuable historical context and feedback on this article.

endnotes

1 university of toronto, "quick facts," accessed november 4, 2019, https://www.utoronto.ca/about-u-of-t/quick-facts.
2 university of toronto libraries, "exam metadata generation and ingest for dspace," github repository, last modified september 20, 2019, https://github.com/utlib/dspace-exams-ingest-scripts/.

application of the variety-generator approach to searches of personal names in bibliographic data bases-part 2. optimization of key-sets, and evaluation of their retrieval efficiency

dirk w. fokker and michael f. lynch: postgraduate school of librarianship and information science, university of sheffield, england.
keys consisting of variable-length character strings from the front and rear of surnames, derived by analysis of author names in a particular data base, are used to provide approximate representations of author names. when combined in appropriate ratios, and used together with keys for each of the first two initials of personal names, they provide a high degree of discrimination in search. methods for optimization of key-sets are described, and the performance of key-sets varying in size between 150 and 300 is determined at file sizes of up to 50,000 name entries. the effects of varying the proportion of the queries present in the file are also examined. the results obtained with fixed-length keys are compared with those for variable-length keys, showing the latter to be greatly superior. implications of the work for a variety of types of information systems are discussed.

introduction

in part i of this series the development of variety generators, or sets of variable-length keys with high relative entropies of occurrence, from the initial and terminal character strings of authors' surnames was described.1 their purpose, used singly or in combination, is to provide a high and constant degree of discrimination among personal names so as to facilitate searches for them. in this paper the selection of optimal combinations of the keys and the evaluation of their efficiency in search are described. the performance of combined key-sets of various compositions is determined at a range of file sizes and compared with fixed-length keys. in addition, the extent of statistical associations among keys from different positions in the names is determined.

balancing of key-sets

the relative entropies of distribution of the first and last letters of the surnames of authors in the file of 100,000 entries from the inspec data base differ significantly, the former being 0.92 and the latter 0.86. as a result, a larger key-set has to be produced from the back of the surnames to reach the same value of the relative entropy as that of a key-set of a given size from the front of the surname. for instance, the value of 0.954 is reached by a key-set comprising 41 keys from the front of the name, but a set of 101 keys from the back is needed to attain this value. it seemed reasonable to assume that keys from the front and rear should be combined in different proportions in order to maximize the relative entropy of the combined system, and that their proportions should reflect the redundancies of each distribution (redundancy = 1 - hr). in order to test this, a series of combined key-sets of different total sizes was produced, in which the proportions of keys were varied around the ratio of the redundancies of the first and last character positions, i.e., (1 - 0.92):(1 - 0.86), or 8:14. the relative entropies of the name representations provided by combining these key-sets with keys for the first and second initials were determined by applying them to the 50,000-name file, and the entropy value was used to determine the optimal ratio of keys. in one case, the correlation between the value of the relative entropy and retrieval efficiency, as measured by the precision ratio, was also studied, and shown to be high. the sizes of the combined key-sets studied were 148 and 296, with an intermediate set of 254 keys.
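the balancing procedure can be made concrete with a small sketch. the code below is an illustrative reconstruction, not the authors' original programs; it represents each name by its longest matching front key, longest matching rear key, and first two initials, and then computes the relative entropy h/hmax of the resulting representations for a given candidate key-set. the toy names and key-sets are invented for demonstration only.

```python
# illustrative reconstruction of the key-set balancing calculation, not the authors' code.
import math
from collections import Counter

def longest_prefix(s, keys):
    """longest key in the set that matches the front of s (fall back to the first letter)."""
    for n in range(len(s), 0, -1):
        if s[:n] in keys:
            return s[:n]
    return s[:1]

def longest_suffix(s, keys):
    """longest key in the set that matches the rear of s (fall back to the last letter)."""
    for n in range(len(s), 0, -1):
        if s[-n:] in keys:
            return s[-n:]
    return s[-1:]

def represent(surname, initials, front, rear):
    s = surname.lower()
    padded = (initials.lower() + "  ")[:2]   # a space stands in for a missing initial
    return (longest_prefix(s, front), longest_suffix(s, rear), padded[0], padded[1])

def relative_entropy(names, front, rear):
    """h / hmax of the representation distribution, with hmax = log2(file size)."""
    counts = Counter(represent(sn, ini, front, rear) for sn, ini in names)
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(total)

if __name__ == "__main__":
    # toy file and toy key-sets; real key-sets would come from frequency analysis of the data base
    names = [("smith", "jd"), ("schmidt", "a"), ("jones", "mk"), ("johnson", "p")]
    front_keys = {"s", "sc", "j", "jo"}
    rear_keys = {"h", "t", "s", "son"}
    print(round(relative_entropy(names, front_keys, rear_keys), 3))
```

in the paper's terms, the candidate key-set whose representations give the highest value of this ratio (with hmax = log2 50,000 for the test file) is the balanced set.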
the values of 148 and 296 were chosen in view of the projected implementation in the serial-parallel file organization.2 this relates the size of the key-set to the number of blocks on one cylinder of a disc. (the 30mbyte disc cartridges available to us have 296 blocks per cylinder.) otherwise the choice of key-set is arbitrary, and can be varied at will. the minimum key-set size is 106, consisting of 26 letters each for the first and last letter of the surname, and 27 ( 26 letters and the space symbol) each for the first and second initials. the numbers of n-gram keys ( n ::::,. 2) required for the key-sets numbering 148, 254, and 296 in size are . thus 42, 148, and 190. full details are given of the composition of the first and third of these sets. a slight refinement to key-set generation was employed to ensure as close an approximation to equifrequency as possible, especially with the smallest key-sets. precise application of a threshold frequency may occasionally result in arbitrary inclusion of either very high or very low frequency keys. thus, if almost all the occurrences of a longer key are accounted for by a shorter key (as with -mann and -ann), only the shorter n-gram is included. va1'iety-generato1· approach/fokker and lynch 203 optimal set of 148 keys the number of n-gram keys ( n ::::::,. 2) to be added to the minimum set of 106 keys is 42, the presumed optimum proportion being 8:14, which implies about 16 keys from the front of the name and 26 from the back. in order to examine the relationship between the ratio of keys from the front and rear of the surname and the relative entropy of the combined sets, the ratios were varied at intervals between 1:1 and 1:3 so that the numbers of n-grams varied from 21 and 21 to 11 and 31 respectively. for each ratio the keys were applied to the 50,000 name entries, and the distribution of the resultant descriptions determined. the ratios, the number of n-gram keys, and the relative entropies of the distributions are shown in table 1. the maximum value of the entropy is taken to be log250,000. in this case the balancing point, with the key-set including 16 n-gram keys table 1. relation between ratio of n-grams f1'0m f1'dnt and rear of surname, entropy of combined key-sets, and retrieval efficiency for a series of sets of 148 keys ratio numbm· of n-gram number of diffm·ent relative · precision(%) of n-gram keys representations entropy (file size= keys front back in 50,000 entries of system 25,000) 1:1 21 21 33,485 0.9450 71.5 3:4 18 24 33,501 0.9450 71.3 17:25 17 25 33,434 0.9447 70.9 8:13 16 26'* 33,454 0.9453 72.2 5:9 15 27 33,402 0.9450 72.0 1:2 14 28 33,378 0.9449 72.1. 1:3 11 31 33,126 0.9437 71.5 total number of different name entries = 41,469. '* key-set with highest relative entropy. from the front and 26 from the back, corresponds with the ratio of the redundancies of the first and last letters of the surnames. table 2 shows the composition of the optimal key-set of 148 keys, while table 3 gives the distribution of the name representations compiled from the combined key-set, and its corresponding relative entropy. optimal set of 296 keys a similar procedure to that used for the optimal148-key key-set was also applied in this instance. here the ratios of front and rear n-gram keys varied from 57 and 133 to 69 and 121 respectively. for each of the sets chosen, the distributions of the entries resulting from application of the combined key-sets to the file of 50,000 names were determined. 
these showed virtually no difference in terms of the relative entropy alone, although the total number of different entries differed slightly between keysets, and the highest value was used to choose the optimal set, detailed in table 4. the range of combinations studied is shown in table 5, and the distribution of the entries for the optimal set is given in table 6. .. , -::: 204 journal of library automation vol. 7/3 september 197 4 table 2. composition of balanced key-set of 148 keys keys from front of surname ( 42) : key p• key p• key p• key p• a .035 g .055 ma .030 sh .016 b .020 h .035 n .025 st .016 ba .020 ha .021 0 .017 t .040 be .017 i .013 p .038 u .005 bo .014 j .017 pa .014 v .025 br .014 k .041 q .001 w .040 c .036 ka .017 r .032 x ch .016 ko .017 ro .017 y .011 d .044 l .033 s .049 z .013 e .018 le .014 sa ,016 f .034 m .050 sc .015 keys from rear of surname (52) : a .060 ii .015 nn .010 is .012 ra .010 ki .015 on .018 t .042 va .015 j .001 son .027 u .013 b .003 k .033 0 .028 v .001 c .005 l .013 ko .013 ev .018 d .030 el .012 p .004 ov .026 e .068 ll .016 q .001 ,kov .012 f .006 m .013 r .016 nov .on g .012 n .009 er .064 w .005 ng .014 an .020 ler .013 x .003 h .020 man .017 ner .010 y .031 ch .017 en .025 s .055 ey .012 i .044 in .039 es .015 z .013 keys from first initial: 27 characters keys from second initial: 27 characters table 3. frequencies of entries represented by optimall48-key key-set in a file of 50,000 names frequency number of entries with f frequencyf 1 24,363 2 5,622 3 1,850 4 757 5 372 6 193 7 103 8 68 9 32 10 24 11-15 54 16--20 11 21-30 4 33 1 total number of different entries = 33,454 maximum number of possible combinations= 1,592,136 (i.e., 42 x 52 x 27") h = 14.7553 hmax = 15.6096{log,50,000) hr = 0.9453 variety-generator approach/fokker and lynch 205 table 4. composition of balanced key-set of 296 keys keys from front of surname ( 87) : a bu e ha ki ma ni ra si wa al c f he ko mar 0 re so we an qa fr ho kr mc p ri st wi b ch g hu ku me pa ro t x ba co ga i l mi pe s ta y bar d go j la mo po sa u z be da gr jo le mu pr sc v bo de gu k ll n q se· va br do h ka m na r sh w keys from rear of surname ( 155) : a ld ng vskii el lin r or nt sov ca nd ang ki ll tin ar s rt w da rd lng ski all nn er as ert x ka e rg wski ell on ber es st y ma de h li m son der nes tt ay na ee ch ni am lson ger is ett ey ina ge ich ri n nson nger ns u ley ra ke vich ti an rson her ins v ky ta le gh j man ton ier os ev ry va ne sh k rman 0 ker rs ov z ova re th ak yan ko ler ss kov tz wa se ith ck en nko ller ts ikov ya te i ek sen no mer us lov b f ai ik in to ner t nov c ff hi l ein p ser dt anov d g ii al kin q ter et rov keys from first initial: 27 characters keys from second initial: 27 characters table 5. relation between ratio of n-grams from front and rear of surname and entropy of combined key-sets for a series of sets of 296 keys (file size= 50,000) ratio ofn-gram keys 3:7 61:129 13:25 69:121 number of n-gram keys front 57 61 65 69 back 133 129'* 125 121 '* key-set with highest number of different entries. number of different representations 39,182 39,191 39,186 39,179 relative entropy of system 0.9679 0.9679 0.9679 0.9679 in this instance, the ratio of n-gram keys from the front and back of the surnames has been displaced from the ratio of the redundancies of the first and last characters of the surnames, i.e., 8:14 (1:1.7). here the ratio is roughly 1:2. 
this is undoubtedly due to the fact that the relative entropies of key-sets from the back of the surname increase less rapidly than those of key-sets from the front, and hence larger sets must be employed. evaluation of retrieval effectiveness the keys in the optimized key-sets represent name entries in an approxi,, i' i: 206 ] oumal of librm·y automation vol. 7 i 3 september 197 4 table 6. frequencies of entries represented by optimal key-set of 296 keys in a file of 50,000 names frequency f 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 total number of different entries = 39,191 number of entries with frequencyf 31,705 5,394 1,371 442 164 63 27 12 4 3 2 2 1 1 maximum number of possible combinations= 9,830,565 (i.e., 87 x 155 x 27') h = 15.108 hmax = 15.6096(log,50,000) hr = 0.9679 mate manner only, so that when a search for a name is performed, additional entries represented by the same combination of keys are identified. while these may be eliminated in a subsequent character-by-character match of the candidate hits, the proportion of unwanted items should remain low if the method is to offer advantages. in evaluating the effectiveness of the key-sets in the retrieval, the names in the search file were represented by concatenating the codes for the keys from the front and back of the surnames and the initials, and subjecting the query names to the same procedure. the matching procedure produced lists of candidate entries, of which the desired entries were a subset. the final determination was carried out manually. the tests were performed first with names sampled from the search file, so that correct items were retrieved for each query. since searches for name entries may be performed with varying probabilities that the authors' names are present in the file (especially in current-awareness searches), varying proportions of names of the same provenance, but known not to be present in the search file, were also added. in these cases candidate items were selected which included none of the desired entries. recall tests were also performed and recall shown to be complete. the measure used in determining the performance of the variety-generator search method is the precision ratio, defined as the ratio of correctly identified names to all names retrieved. it is presented both as the ratio of averages (i.e., the summation of items retrieved in the search and calculation of the average) and as the average of ratios (i.e., averaging the val'iety-genemtor app1'0ach/fokker and lynch 207 figures for individual searches). the latter gives higher figures, since many of the individual searches give 100 percent precision ratios. the precision ratio was found to be dependent on file size and to fall somewhat as the size of file increases. this is due to the fact that the keysets provided only a limited, if very high, total number of possible combinations, while the total possible variety of personal names is virtually unlimited. the evaluation was performed with a sample of 700 names, selected by interval sampling. this number ensured a 99 percent confidence limit in the results. a comparison of the interval sampled query names with randomly sampled names showed that no bias was introduced by interval sampling. a test to confirm that the retrieval effectiveness reached a peak at the maximum value of the relative entropy of a balanced key-set was performed first. this was carried out on a file of 25,000 names, using as queries names selected from the file and the optimal 148-key key-set. 
as shown in table 1, the values of the precision ratio (ratio of averages) and of the relative entropy both peak at the same ratio of n-gram keys from the front and back of the surnames. the performance of the optimal key-sets of 148, 254, and 296 keys with files of 10,000, 25,000, and 50,000 names is shown in table 7. calculated as the ratio of averages, the smallest key-set ( 148 keys) shows a precision ratio of 64 percent with a file of 50,000 names, which means that of every three names identified in the variety-generator search, two are those desired. with the largest key-set ( 296 keys), this rises to nine correctly identified names in every ten retrieved at this stage. on the other hand, calculated as the average of ratios, the precision ratios rise to 81 percent and 94 percent respectively. for smaller file sizes-typical, for instance, of current-awareness searches-the figures for all of these are cottespondingly higher. table 7. precision ratios obtained in variety-generator searches of personal names-queries sampled from sea1'ch file (confidence level= 99 pm·cent) precision as ratio of averages (%) : file size 50,000 25,000 10,000 precision as average of ratios (%) : file size 50,000 25,000 10,000 148 64 71 84 148 81 87 93 key-set size 254 87 90 93 key-set size 254 91 95 97 296 90 91 94 296 94 96 97 '; ~;: 208 journal of library automation vol. 7/3 september 1974 the effect of sampling from a larger file, so that increasing proportions of the names searched for are not present in the search file, is shown in table 8 for a file of 25,000 names. in this case, the proportion of correctly identified names in the total falls, so that overall performance is somewhat reduced. thus, depending both on file size and on the expected proportion of queries identifying hits, the key-set size can be adjusted to reach a desired level of performance. in addition, tests to determine the table b. effect of varying proportion of query names not present in search file of 25,000 names, using 296 keys (ratio of averages) %of names not precision% number of names number of names in search file (ratio of averages) ret1·ieved correctly retrieved 21 90 766 691 42 85 595 505 61 83 449 371 74 76 319 242 84 68 228 154 applicability of a key-set optimized for one file of 50,000 names to another file of the same provenance and size were carried out. the three key-sets derived from the first file were applied to the second, query names sampled from the latter, and the precision ratios determined. some reduction in performance was observed; expressed as ratio of averages, the precision with the 296-key key-set fell from 90 to 83 percent, with the 254-key keyset from 87 to 82 percent, and with the 148-key key-set from 64 to 56 percent, figures which seem unlikely to prejudice the net performance in any marked way. nonetheless, monitoring of performance and of data base name characteristics over a period of operation might well be advisable. distribution characteristics of other types of keys it is particularly instructive to examine the distribution characteristics of other types of keys, including those of fixed length, generated from various positions in the names, and to compare them with those of the optimal key-sets employed in the variety-generator approach. to this end, the file of 50,000 names was processed to produce the following keys or keysets: 1. initial digram of surname. 2. initial trigram of surname. 3. key-set of ninety-four n-grams from the front of the surname, with first and second initials. 4. 
key-set consisting of first and last character of surname, with first and second initials. the figures (table 9) show clearly that all have distributions which leave no doubt as to their relative inadequacy in resolving power, where this is defined as the ratio of distinct name representations provided by the key-set used to the number of different name entries ( 41,469) in the file. at the digram level, the value of the resolving power is 0.009, i.e., each vm·iety-generator approach/fokker and lynch 209 digram represents, on average, 110 different name entries, while no fewer than thirty-two specific digrams each represent between 500 and 1,000 different names. at the trigram level, the value of the resolving power rises to 0.08, a tenfold increase; however, one trigram still represents between 500 and 1,000 different names. use of the first and last letters of the surname plus the initials again increases the value of the resolving power to 0.627, or 1.6 distinct names per entry; eight of the representations now account for between thirty-one table 9. distributions of a variety of other representations of personal names in a file of 50,000 entries 94 n-grams from first and last frequency initial digram initial trigram front of surname letter of surname f of surname of surname plus 2 initials plus 2 initials 1 40 735 8,964 16,346 2 22 428 3,929 4,919 3 16 249 1,884 2,025 4 11 197 1,006 973 5 7 170 646 581 6 7 110 397 340 7 10 112 234 224 8 4 98 186 146 9 7 81 144 92 10 5 66 108 72 11 6 61 70 49 12 2 56 88 36 13 5 51 74 33 14 1 48 50 24 15 2 35 51 23 16 3 37 36 25 17 2 35 29 15 18 3 33 29 11 19 8 35 28 6 20 8 40 23 5 21-30 21 207 127 49 31-40 23 109 47 8 41-50 13 88 13 51-100 36 142 3 101-200 24 62 201-500 57 15 501-1000 32 1 total 375 3,301 18,166 26,002 resolving power .009 .080 .438 .627 and forty distinct entries. in contrast, however, the key-set of 148 keys comprising ninety-four n-gram keys from the front of the name and the first and second initials, although almost 50 percent larger than the fourcharacter representation, has a resolving power of only 0.438 (or 2.28 entries per representation). this contrast provides particularly strong evidence for the superiority of keys from the front and rear of the surnames over those from the front alone, even when the latter are variable in •' 210 journal of library automation vol. 7/3 september 1974 length. as expected, the precision ratio of the four-character representation is low, at 37 percent (ratio of averages), compared with 64 percent for the optimal148-key key-set. extent of statistical association among keys thus far, the frequency of occurrence of variable-length character strings from the front and back of the surnames is the only factor considered in their selection as keys. it is well known in other areas that statistical associations among keys can influence the effectiveness of their combinations. 3 where a strong positive association between two keys exists, their intersection results in only a small reduction of the number of items retrieved over that obtained by using each independently. when the association is strongly negative, the result of intersection may be much greater than that predicted on the basis of the product of the individual probabilities of the keys. 
to assess the extent of associations among keys from the front and rear of surnames and initials, sets of both fixedand variable-length keys from each of these positions were examined.· the kendall correlation coefficient v was calculated for each of the twenty most frequent combinations of these. this is related to the chi-square value by the expression x2 =m v2 where m is the file size, or 50,000. table 10 shows the values of the association coefficient for certain of the characters in the full name. those above .012 are significant at a 99 percent confidence level. positive associations are table 10. a8sociation coefficients for sets of the most frequent digrams from various positions in personal names first and last first letter of surname first and second letters of surname and first initial initials digram v digram v digram v kv .064 kv .054 hv .078 wr .050 hj .027 mv .069 ka .038 br -.024 kv .069 hn .028 sj -.023 rv -.055 sa .024 dj .022 dv -.053 sn .024 bg .018 tv .053 cn .022 ka .018 jv -.045 kn -.020 cj ,018 sv .034 ma .014 sd .015 fv .033 kr -.011 sv .013 nv -.029 sv ,010 mm .011 gv .022 rn .010 mj ,007 lv -.022 bn -.008 bj ,005 iv -.019 br .008 sg -.004 av -.019 mn -.007 sr .004 cv -.018 sr .007 ba .004 pv .017 mr .004 ma ,004 wv -.014 si -.002 sm -.003 yv .010 gn .001 mr .002 bv .005 ln .001 sa -.000 ev -.002 variety-generator app1'0ach/fokker and lynch 211 more frequent than negative. the figures indicate that intersection of certain of these characters as keys in search would result in some slight diminution in performance against that expected. the figures for the association coefficients among the twenty most frequent combinations of keys from the front and back of surnames in the 148and 296-key key-sets show magnitudes (mostly positive) which are substantially greater than those for single characters (see table 11). the reasons for these values are obvious; in certain instances, e.g., miller, jones, and martin, common complete names are apparent, while in one case, lee, an overlap between keys from the front and rear exists. in others, linguistic variations on common names can be discerned, as with br n-brown or braun. table 11. association coefficients in the twenty most frequent key combinations from front and back of surnames in two key-sets key-set size key-set size 148 296 keys v keys v s h .146 s ith .343 j son .127 jo nson .297 sc er .104 jo nes .278 w s .043 an rson .274 t a .038 si gh .249 t i .038 le ee .221 w er .038 mu ller .214 c e .034 ta or .195 f er .033 gu ta .168 p s .025 br n .160 d e .023 mi ller .151 l e .022 mar tin .145 w e .022 wi s .137 g in .020 f her .133 m e .009 sc der .121 s a .008 sa to .110 g e .006 t as .084 m a .005 sc er .069 m er -.004 ch en .055 g er -.000 t son .050 such associations are inevitable. when the selection of keys is based solely on frequency, some deviation from the ideal of independence must result, becoming larger as the size of the key-sets increases, and as the length of certain of the keys increases. however, since its effect in the most extreme cases is merely to lead to virtually exact definition of the most frequent surnames, no particular disadvantage results. possible implementations of the variety-generator name search approach the variety-generator approach permits a number of possible implementations of searches for personal names to be considered, if only in outline f ( f•j/ 212 journal of library automation vol. 7/3 september 1974 at this stage, using a variety of file organization methods. 
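a small sketch may help make the association measure concrete. assuming that v is the usual coefficient of association for a 2 × 2 contingency table (the phi coefficient), so that chi-square = m v**2 as stated above, it can be computed directly from the co-occurrence counts of two keys over the file. the key-occurrence vectors below are invented toy data, not figures from the study.

```python
# illustrative sketch: association coefficient between two binary keys,
# assuming v is the standard 2x2 (phi-style) coefficient, so chi-square = m * v**2.
import math

def association(front_hits, rear_hits):
    """v for two key-occurrence vectors over the same file of m names."""
    m = len(front_hits)
    a = sum(1 for f, r in zip(front_hits, rear_hits) if f and r)       # both keys occur
    b = sum(1 for f, r in zip(front_hits, rear_hits) if f and not r)   # front key only
    c = sum(1 for f, r in zip(front_hits, rear_hits) if not f and r)   # rear key only
    d = m - a - b - c                                                  # neither key
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

if __name__ == "__main__":
    # toy file of 8 names: does a front key such as "s-" co-occur with a rear key such as "-ith"?
    front = [True, True, True, False, False, False, True, False]
    rear  = [True, True, False, False, False, False, True, False]
    v = association(front, rear)
    print(round(v, 3), round(len(front) * v**2, 3))   # v and the implied chi-square
```

large positive values of v for pairs such as s- and -ith simply reflect very common complete surnames, which is the pattern visible in table 11.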
the most widely known methods (apart from purely sequential files) are direct access (utilizing hash-addressing), chained, and index sequential files. direct application of the concatenated key-numbers as the basis for hash-address computation appears attractive in instances where the personal name is used alone or in combination (as, for instance, with a part of the document title). the almost random distribution of the bits in this code should result in a general diminution of the collision and overflow problems commonly encountered with fixed-length keys. since only four keys are used to represent each name, and the four sets of keys from which these are selected are limited in number and of approximately equal probability, the keys can be used to construct chained indexes, to which, however, the usual constraints still apply. index sequential storage again offers opportunities, in particular since the low variety of key types means that the sorting operations which this entails can be eliminated. in effect, each name entry would be represented by an entry in each of four lists of document numbers or addresses, and documents retrieved by intersection of the lists. while four such numbers are stored for each name, in contrast to a single entry for the more conventional name list, the removal of the name list itself would more than compensate for the additional storage required for the lists. in the index sequential mode, the lists of document addresses or numbers stored with each key are more or less equally long. they may thus be replaced by bit-vectors in which the position of a bit corresponds to a name or document number. if the number of keys bears a simple relation to the number of blocks on a disc cylinder, the vectors can be stored in predetermined positions within a cylinder, resulting in the serial-parallel file. the usefulness of this file organization has yet to be fully evaluated; however, it also promises substantial economies in storage. on average, only four of the bits are set at the positions in the vectors corresponding to the name or document entry. on average, then, the density of 1-bits is very low, and long runs of zeros occur in the vectors. they can, therefore, be compressed using run-length coding, for instance as applied by bradley.3· 4 preliminary work with the 296-key key-set has indicated already that a gross compression ratio of nine to one is attainable, so that the explicit storage requirements to identify the association between a name and a document number would be just over thirty bits. conclusions the work described here relates solely to searches for individual occurrences of personal names. clearly, in operational systems in which one or more author names are associated with a particular bibliographical item, it will be necessary to provide for description of each of these for access. if this is provided solely on the basis of a document number, some false coordination will occur-for instance, when the initials of one entry are variety-generator approach/fokker and lynch 213 combined with the surname of another. a number of strategies can be envisaged to overcome this problem. , the performance figures show clearly that a small number of characteristics-between 100 and 300 in this study-are sufficient to characterize the entries in large files of personal names and to provide a high degree of resolution in searches for them. 
while performance in much larger files, involving the extension of key-set sizes to larger munbers, has yet to be studied, the logical application of the concept of variety generation would appear to open the way to novel approaches to searches for documents associated with particular personal names, which seem likely to offer advantages in terms of the overall economic performance of search systems, not only in bibliographic but also in more general computer-based information systems. acknowledgments we thank m. d. martin of the institution of electrical engineers for provision of a part of the inspec data base and of file-handling software, and the potchefstroom university for c.h.e. (south mrica) for awarding a national grant to d. fokker to pursue this work. we also thank dr. i. j. barton and dr. g. w. adamson for valuable discussions, and the former for n-gram generation programs. references 1. d. w. fokker and m. f. lynch, "application of the variety-generator approach to searches of personal names in bibliographic data bases-part 1. microstructure of personal authors' names," journal of library automation 7:105-18 (june 1974). 2. i. j. barton, s. e. creasey, m. f. lynch, and m. j. snell, "an information-theoretic approach to text searching in direct-access systems," communications of the acm (in press). 3. s. d. bradley, "optimizing a scheme for run-length encoding," proceedings of the ieee 57:108-9 (1969). 4. m. f. lynch, "compression of bibliographic files using an adaptation of runlength coding," information storage and retrieval 9:207-14 (1973). r /' i, marcive: a cooperative automated library system virginia m. bowden: systems analyst, the university of texas health science center at san antonio, and ruby b. miller: head cataloger, trinity university, san antonio, texas. 183 the marcive library system is a batch computer system utilizing both the marc tapes and local cataloging to provide catalog cal'ds, book catalogs, and selective bibliographies for five academic libra1·ies in san antonio, texas. the development of the system is traced and present procedures are described. batch retrieval from the marc 1·ecords plus the modification of these records costs less than twenty cents per title. computer costs fo1' retrieval, modification, and card production average six-ty-six cents per title, between seven and ten cents per card. the attributes and limitations of the marcive system are compm·ed with those of the oclc system. in san antonio, texas, a unique cooperative effort in library automation has developed, involving the libraries of five diverse institutions: trinity university, the university of texas health science center at san antonio (uthscsa), san antonio college (sac), the university of texas at san antonio (utsa), and st. mary's university. these institutions are utilizing the marcive library system which was developed by and for one library, that of trinity university. the marcive system is a batch, disc oriented computer system utilizing both local cataloging and the marc tapes to produce catalog cards, book catalogs, selective bibliographies, and other products. development the trinity university library has been involved in library automation since 1966.1 when the library reclassified its collection from dewey to the library of congress classification in1966, a simplified machine-readable format was developed and used for storage on computer. this format contained the following bibliographic elements: accession number, call number, author, title, and imprint date. 
in 1969 the library decided to reformat the computer data base into a marc ii compatible format in order 184 ] ournal of library automation vol. 7 i 3 september 197 4 to build a data base of bibliographic records that could be the basis for all future automated systems within the library. the resulting system, marcive, was designed jointly by the head cataloger, ruby b. miller, and the library programmer, paul jackson, a graduate student in trinity's department of computer science. since in 1969 literature on completed library automation projects was sparse, no other system was used as a guide. the marcive format was based on the designers' interpretation of the 1969 edition of the marc manual. the name, marcive, evolved when the programmer facetiously claimed that his format was so advanced he would call it the marc iv format. the computer room operating staff, ignoring the space between the marc and iv, combined the two, producing marciv. an e was added later for ease of pronunciation. the marcive system was designed initially as a system for data storage and retrieval. the update, select, and acquisitions list programs were operative in september 1970. the next month uthscsa inquired as to the possibility of producing catalog cards as part of the marcive system. within the brief span of three months, by january 1971, trinity university library produced 4,289 catalog cards and uthscsa produced 1,719 catalog cards via marcive. in february 1974, the five participating libraries produced a total of 29,000 catalog cards, with trinity accounting for 10,740 cards. continued development of the marcive system was delayed in 1971 by changes in computer center personnel and equipment. in 1972 new programs were developed to incorporate the marc tapes into the marcive system. the size of the marc data base, which is now held on three discs, was a major problem. modifications were included to accept input from magnetic tape and typewriter terminals using the apl language as well as keypunched cards. the original restriction of the system to classifications with one to three alphabetic letters followed by numbers, such as used by lc and nlm, was modified to accept dewey decimal classification to accommodate san antonio college. this restriction had been incorporated in an attempt to insure that the call number would be properly formatted, thus simplifying retrieval in the select program and grouping in the acquisitions list and update programs. computer configuration the marcive system is a disc oriented system which was programmed for an ibm 360/44 using the mft operating system. this computer model was designed for scientific programming and was manufactured in limited quantities. the programs were written in basic assembly language since adequate higher level language compilers for the 360 i 44 were not available at the trinity computer center. in 1971 the programs were converted to run under dos, and in 1972 they were converted for processing on the ibm 370/155 using the os processing system. since the initial promarcne/bowden and miller 185 grams were written in basic assembly language, the subsequent programs have also been written this way. marcive format the marcive format is an adaptation of the marc ii format. the definition of the marc ii format is a", .. format which is intended for the interchange of bibliographic records on magnetic tape. it has not been designed as a record format for retention within the files of any specific organization ... 
[it is] a generalized structure which can be used to transmit between systems records describing all forms of material capable of bibliographic descriptions . . . the methods of recording and identifying data should provide for maximum manipulability leading to ease of conversion to other formats for various uses."2 adaptation of the marc ii format is common among users. an analysis by the recon task force found much variation among the use of the fixed fields, tags, indicators, and subfields. 3 the oclc system can regenerate marc ii records from oclc records although they contain only 78 percent of the number of characters in the original marc ii record. 4 the developers of the marcive system studied the marc manual and decided that the leader and directmy were not necessary for program manipulation. such information can be generated by a conversion program. the marc mnemonic codes were chosen instead of the numeric ones because all bibliographic data were being coded locally and it was felt that mnemonics would be easier to work with. the mnemonic codes are the ones designated in the marc manuals except that "si" was substituted for "se." rules for assigning indicators, subfields, and delimiters are those described by marc. the basic structure of the marcive format is illustrated in figure 1. the differences between marcive and marc are as follows: 1. marcive's leader consists of three fields: length of disc space, status code, and length of record. in converting marc the following elements of the marc leader are incorporated in the marcive leader fields: length of disc space, status code, and length of record. 2. marcive does not contain the marc record directory, but rather places the tags and subfield codes in front of the actual data. 3. in the conversion from marc ii to marcive, fixed fields such as date of publication are omitted. 4. all data elements in marcive are treated as variable tags even though they contain fixed field data. 5. marcive uses the mnemonic code names for the input of data rather than the numeric marc codes. for example "mep" is used for coding a person as main entry rather than "llo." the mnemonic tag names are stored in the machine format and not the numeric marc tags. ,, ·' 186 j oumal of libra1'y automation vol. 7 i 3 september 197 4 '""d ~ .... ~ 0 i5 cj " "' .oj "' "' .oj § ~ s s " '""' p:; ql "' ql ~ cj "' fin fin-data data elements ..sl 0"' ~ z z <:<:: "' 0 <:<:: <:<:: 1"'1 fjp< ~ tag .g elements bil .g bil ~ b!l"' fl "' "' .s "' bil c/) e-< c/) e-< "' "' " "' p >-1 "' >-1 length of disc space. this identifies the number of seventy-two byte blocks a record uses. the marcive records average 350 characters or three to six blocks. blank. this field is used by the update program. length of record. identifies the actual number of characters a record .contains. fin tag. this is the marcive control tag and must precede each record. it contains four subfields: accession number, type of material, location of material, and call number. tag name. after the fin tag, any of the marcive tags may be input as long as they conform to the proper sequence (i.e., main entry must pi·ecede title). each tag is followed by its subfield codes and the data elements. fig. 1. marcive fo1•mat st1'uctu1'e. 6. all first indicators are input except for the first indicator in the contents note. 7. most of the second indicators are not input, except for the filing indicators which are included in the marcive format. 8. marcive adds one variable tag to the marc format called "fin." 
it serves the function of the marc 090 local holdings tag. the fin tag must be the first variable tag in each marcive record and must contain four data elements: ( 1) accession number; ( 2) type of material code (monograph, serial, etc.); ( 3) location of material within library (reference, reserve, etc.); ( 4) local call number. even though marcive is not a pure marc format, there has been an attempt to code most of the data elements into marcive. a marcive to marc conversion is being written by one of the marcive libraries in order to merge its marcive data base with a purchased marc data base. marcive master data bases each of the m arcive users maintains a separate data base of its holdings, which is called its marcive master. this master file contains a complete bibliographic record for each title cataloged by the library, including marc cataloging and local cataloging. when a library modifies a marc record, the modified record is recorded in that library's marcive master. the various libraries' marcive masters have not been merged, although this is being considered. each library has prefaced all of its accession numbers with a unique library code just in case a merged data base is desired. marc-con data base the largest data base in the system is the marc-converted data base, marcive/bowden and miller 187 hereafter referred to as marc-con. this data base contains only pure marc data that have been converted into marcive machine format. no original cataloging or local modifications of marc are contained in the marc-con data base. marcive programs convert-this program reformats the weekly marc tapes into the marcive machine format. marc-update-this program merges the weekly converted marc tape with the marc-con disc file. an index sequential ( isam) file containing lc card number, fifty characters of the title, and the disc address of the marc reoord is generated. the isam file is in lc card number order. in 1974 the marc-con data base filled three 3330 disc packs. there are three tape back-up files: one file consisting of original marc records, one of the marc-con records, and a third with the isam file. deleted records and replaced records are annually purged from the marc-con files. a new set of back-up tapes for the disc packs is created every three months in order to facilitate regeneration of the disc packs should damage occur. marc-list-this program lists marc records in title sequence from the tape. once every six to eight weeks the list is cumulated and printed. these lists are used for searching until the annual cumulation of the nuc is received. this provides current listings of records on the marc tapes that are not easily available in the national union catalog. this listing will be eliminated in 1974, when access by title to the marc-con data base is available. marc-search-this program searches for lc numbers on the marc-con file using the isam file. a file of the matched records is produced on tape or disc as specified along with a listing of these records. this listing contains the marc-con complete bibliographic entry (figure 2). although access is currently only by lc card number, access by title algorithm ( 3, 1, 1) is expected in 197 4. replace-the purpose of this program is to modify marc-con records to fit the needs of the individual library. these modifications can be done automatically to all records or on a single record basis by the library. 
the automatic changes are specified on a control card and include twenty-two options, such as assignment of an accession number, usage of the dewey class number instead of lc, and changing "u.s." in subject headings to "united states." an example of a single modification would be the changing of a series entry from traced to untraced. most marcive participants use a combination of automatic and single changes. the output from the replace program may be input to all other marcive programs, such as edit, catalog card, update, etc.

edit: this program verifies the format of the input. valid tags and subfields, as well as the correct sequence of tags, are checked. multiple spaces are compressed to one, implied subfields are added, and a limited number of punctuation marks are generated. actual bibliographic data are not checked, so spelling errors are not detected by the program. those titles which do not conform to specifications are rejected and an explanatory message is generated. a library may choose one of three forms of listings of output: (1) full-edit, (2) mini-edit, or (3) error-edit. the full-edit

[figure 2: search listing of marc-con data, showing two sample records (a translation of aristophanes' plays and franco monti's african masks) with their fin, lc card number, language, classification, main entry, title, imprint, collation, price, series, and note tags.]
fig. 2. search listing of marc-con data.

[sample edit listing: record 950564, kimber's anatomy and physiology, with its fin, main entry, and title tags.]

html tag, so most urls were easily identified for removal. the data cleaning process has been described here in a linear fashion for ease of understanding, but over the course of the project it was actually an iterative process, as more cleaning issues were discovered during analysis. based on the analyses performed in the related literature, the practicum student wrote code to test five topic modeling algorithms: (1) latent dirichlet allocation (lda), (2) phrase-lda (lda applied to phrases instead of words), (3) biterm topic modeling (btm), (4) dirichlet mixture modeling (dmm), and (5) non-negative matrix factorization (nmf). ultimately, the processing power and time required to implement btm meant that this algorithm could not be implemented for this project. however, the other four models (lda, phrase-lda, dmm, and nmf) were all successfully implemented. all code related to this project, including the cleaning and analysis, is available on github (https://github.com/mozeran/uiuc-chat-log-analysis).

results
outputs of the lda, phrase-lda, dmm, and nmf modeling algorithms are shown in tables 1 through 4.
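as context for the tables that follow, a minimal sketch of the cleaning-plus-lda pipeline described above, assuming nltk and gensim (both listed in note 11), might look like this. the sample transcripts, topic count, and pass count are illustrative stand-ins, not the project's data or settings; the authors' actual code is in the github repository linked above.

```python
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from gensim import corpora, models

# assumes the nltk stopword list has been downloaded: nltk.download("stopwords")
stemmer = PorterStemmer()
stop = set(stopwords.words("english"))

def clean(transcript):
    # drop urls, lowercase, keep alphabetic tokens, remove stop words, stem
    text = re.sub(r"http\S+", " ", transcript.lower())
    tokens = re.findall(r"[a-z']+", text)
    return [stemmer.stem(t) for t in tokens if t not in stop]

chat_transcripts = [  # hypothetical stand-ins for the pii-scrubbed transcripts
    "hi, can you help me find the full text of this journal article?",
    "i am looking for a book on course reserve at the undergraduate library.",
]
docs = [clean(t) for t in chat_transcripts]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

# print the top 10 words per topic, in the same spirit as table 1
for topic_id, words in lda.show_topics(num_topics=-1, num_words=10, formatted=False):
    print(topic_id + 1, " ".join(w for w, _ in words))
```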
after removing common stop words, the remaining words were put into lowercase and stemmed before the topic modeling algorithms were applied. the objective of the stemming process was to convert singular and plural versions of a word to a hybrid form so that they are treated as the same word. thus, many words ending in "y" are shown ending in "i": for instance, "library" and "libraries" would both be converted to "librari" and thus be treated as the same word. the phrase "easi search" refers to "easy search," the all-in-one search box on the library homepage. the word "ugl" refers to the undergraduate library (ugl). the word "remov" showed up in the topic lists surprisingly frequently, probably because patron pii was replaced with the word "removed." since explicitly denoting the removal of pii is unlikely to be of import, it makes sense in the future to simply remove the pii without replacement.

table 1: lda (top 10 words in each topic)
topic 1: music map laptop remov find ok one also may score
topic 2: look search find help databas thank use articl research would
topic 3: book librari thank help check look remov reserv would els
topic 4: help use student find articl librari hi look tri question
topic 5: request librari account item thank ok get help loan number
topic 6: thank chat good know one night go okay think hi
topic 7: thank look librari remov help would contact inform find like
topic 8: search articl databas click thank journal help page ok find
topic 9: articl thank journal access look help remov full link find
topic 10: access tri link thank use work get campu remov let

table 2: phrase-lda (top 10 phrases in each topic)
topic 1: interlibrari loan, lose chat, chat servic, lower level, chat open, writer workshop, spring break, studi room, call ugl, add chat
topic 2: good night, great day, good day, good luck, drop menu, sound good, nice day, ye great, remov thank welcom, make sens
topic 3: anyth els, tri find, abl find, find anyth, feel free, ll tri, social scienc, tri access, ll back, abl access
topic 4: easi search, academ search, find articl, search box, tri search, databas subject, search bar, search term, databas search, search databas
topic 5: graduat student, grad student, peer review, undergrad student, illinoi undergrad, scholarli sourc, univers illinoi, undergradu student, primari sourc, googl scholar
topic 6: main librari, librari catalog, librari account, librari homepag, call number, librari websit, netid password, main stack, creat account, borrow id
topic 7: page remov, click link, open new tab, link remov, send link, remov click, left side, remov link, page click, error messag
topic 8: give one moment, contact inform, moment pleas, faculti staff, give minut, pleas contact, email address, staff member, faculti member, unit state
topic 9: full text, journal articl, access articl, find articl, databas journal, light blue, articl titl, titl articl, journal databas, found articl
topic 10: request book, request item, check book, doubl check, print copi, cours reserv, copi avail, physic copi, book avail, copi past

table 3: dmm (top 10 words in each topic)
topic 1: work open chat way onlin say specif avail day sourc
topic 2: check titl research much onlin avail day text sourc say
topic 3: pleas sourc day onlin titl found right hello may take
topic 4: chat also copi pleas think onlin undergrad sourc work way
topic 5: pleas sorri found item chat way right open work time
topic 6: found also right much think could research undergrad sorri way
topic 7: contact hello account sorri could ask titl moment may think
topic 8: copi onlin sorri ask think say right also much sourc
topic 9: much research way may right think open take hello result
topic 10: abl avail also titl catalog pleas say campu onlin take

table 4: nmf (top 10 words in each topic)
topic 1: request take titl today moment way item may place say
topic 2: specif start type journal topic research tab way subject result
topic 3: ugl today ask wonder call may contact peopl someon talk
topic 4: sourc univers scholarli research servic resourc tell illinoi guid librarian
topic 5: account log set vpn us password id say campu problem
topic 6: main locat undergradu call tab review two circul ugl number
topic 7: reserv class time undergradu cours websit show im titl onlin
topic 8: text full troubl problem still pdf websit onlin send moment
topic 9: chat night hey yeah oh well time tonight take yep
topic 10: unfortun uiuc onlin wonder version graduat print seem way grad

discussion
interpreting the results of a topic model can be a bit of a guessing game. none of these algorithms look at the semantic meaning of words, so the resulting topics are not based on semantics. each algorithm simply employs a different method of mathematically determining the likelihood that words are related to each other. when this likelihood is high enough (as defined by the algorithm), the words are listed within the same topic. identifying topics mathematically is much quicker than a person hand-coding conversations. however, automatic classification also means that the resulting topics could make absolutely no sense to people, who understand the semantic meaning of the words within a topic.

this lack of coherent meaning is most evident in the results of the dmm model (table 3). for instance, the words that comprise topic 1 are the following: "work open chat way online say specify available day source." it is difficult to imagine what overarching concept links all, or even most, of these words. only a few words appear to have any significance at all: "open" could refer to open access, or to the library's open hours; "online" may refer to finding resources online, or to the fact that a student is taking online classes; and "source" is likely some reference to a research resource. these words barely relate to each other semantically, and the remaining seven words don't provide much clarification. thus, it appears that dmm is not a particularly good topic modeling algorithm for library chat reference.

the results from the lda model (table 1) appear slightly more comprehensible. in topic 2, for instance, the words are as follows: "look search find help database thank use article research would." while not all the words relate to each other, a common theme could emerge from the words look, search, find, database, article, and research. it is possible that topic 2 identified chat conversations where a patron needed help finding research articles. even topic 6, at first glance a silly list of words, makes some sense: "thank chat good know one night go okay think hi." greetings and sign-offs probably comprised a good number of the total words in the corpus, so it is understandable that a "greetings" topic could be mathematically identified. overall, lda appears to have potential for topic modeling chat reference, but it probably needs to be further tweaked.
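the phrase-lda results in table 2 require multi-word tokens such as "easi search" rather than single stems. the article does not say how the phrases were extracted; one common approach, sketched here with gensim's phrases model (gensim appears in note 11), is to learn frequent word pairs from the stemmed token lists and feed the merged tokens to the same lda call shown earlier. the min_count and threshold values and the tiny token lists are illustrative only, and gensim joins the pair with an underscore rather than the space shown in table 2.

```python
from gensim.models import Phrases

# stemmed token lists; tiny stand-ins for the `docs` built in the previous sketch
docs = [
    ["thank", "interlibrari", "loan", "request", "pleas"],
    ["interlibrari", "loan", "take", "week", "thank"],
]

# learn frequent word pairs; min_count and threshold are illustrative guesses
bigram = Phrases(docs, min_count=1, threshold=1.0)
phrase_docs = [bigram[d] for d in docs]
print(phrase_docs)  # "interlibrari" and "loan" come back merged as a single token
```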
when applying the lda model to phrases (table 2), coherence increases within the phrases, but the topics are not always as coherent. topic 1 includes the following phrases: "interlibrary loan, lose chat, chat service, lower level, chat open, writer workshop, spring break, study room, call ugl, add chat." each phrase, individually, makes perfect sense in the context of this library; as a collection, however, the phrases don't comprise one coherent topic. four of the phrases explicitly mention chat services (an interesting meta-topic), while the rest appear completely unrelated. on the other hand, topic 10 does show more semantic relation between the phrases: "request book, request item, check book, double check, print copy, course reserve, copy available, physical copy, book available, copy past." it seems pretty clear that this topic refers to books, whether on reserve, being requested, or being checked for availability. with the wide difference in topic coherence, the phrase-lda algorithm is not perfect for topic modeling chat reference, but further exploration is warranted.

the final algorithm, nmf (table 4), is also imperfect. it is possible to distill each topic into an actual semantic concept, but there is almost always at least one word that makes it a little less clear. topic 5 probably provides the best coherence: "account log set vpn use password id say campus problem." it seems clear this topic refers to identity verification, likely for off-campus use of library resources. the other topics given by the algorithm have more confusing elements, such as topic 1, where the relatively meaningless words may, way, and say all appear. it is interesting that kohler found nmf to work very well, while the results above are not nearly as coherent as those identified in her implementation.12 this is a perfect example of how the tuning of many different parameters can affect the ultimate results of each topic modeling algorithm. it is also why the authors think it is worth continuing to explore how to improve the implementation of the lda, phrase-lda, and nmf algorithms for chat conversations, as well as share the original code for others to test and revise. it will take many different projects at many different libraries before an optimum topic model implementation is found for chat reference.

next steps
for the most part, the more coherent results from the lda and nmf topic modeling algorithms support anecdotal understanding of the primary themes in chat conversations. currently, two members of the research & information services unit, the department responsible for scheduling the chat reference service at the main library, are examining the model outputs to determine whether any of the results are strong enough at this stage to suggest changes to services or resources. they will also share the results with the chat coordinators at other libraries on campus in case the results indicate changes for them. additionally, results will be shared with the library's web working group, since repeated questions about the same services or locations may suggest the need to display them in a more prominent place on the library website or provide a more discoverable online path to them. since this was a pilot project that used a fairly small data set, it is anticipated that years of transcripts, along with improved topic model implementation, will reveal even more significant and robust themes.
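as one concrete starting point for the kind of re-implementation and tuning discussed here, a minimal sklearn sketch of fitting nmf on tf-idf vectors of cleaned transcripts is below (sklearn appears in note 11). the vectorizer settings, topic count, and sample texts are illustrative guesses, not the project's settings.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

cleaned_transcripts = [  # stand-ins for the stemmed, pii-scrubbed chat texts
    "look search find help databas articl research",
    "request book check reserv copi avail",
]

vectorizer = TfidfVectorizer(max_df=0.95, min_df=1)
tfidf = vectorizer.fit_transform(cleaned_transcripts)

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
doc_topics = nmf.fit_transform(tfidf)          # document-by-topic weights

terms = vectorizer.get_feature_names_out()
for k, row in enumerate(nmf.components_):      # topic-by-term weights
    top = row.argsort()[::-1][:10]
    print(f"topic {k + 1}:", " ".join(terms[i] for i in top))
```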
with the encouraging results of this pilot project, there is much to continue to explore.13 one future question is whether there are differences between fall and spring semesters. if some topics arise more frequently in one semester than the other, perhaps the library needs to offer more workshops during that semester. alternatively, perhaps support materials should be created (such as handouts or online guides) that emphasize the related services and place them more prominently, while withdrawing or de-emphasizing them in the other semester. another area for further analysis is how the topics that emerge in the late-night chat interactions compare to other times of day. this will help the library design more relevant training materials for the graduate assistants who staff those shifts, or potentially change who is staffing the shifts. also of interest is comparing the text written by the chat operators versus the chat users, as this would further spotlight the terminology that patrons use. if patrons are using significantly different terms from staff, then modifying the language of the library's website may reduce confusion. there are also improvements to make to the data cleaning process, such as better identifying when to remove stop words and when to remove punctuation. these steps weren't perfectly aligned, which is why, for example, the "ll" that appears in topic 3 of the phrase-lda results (table 2) is most likely a mutation of contractions like "i'll," "we'll," and "you'll." generating "ll" as a word from multiple different contractions not only created a meaningless word, but since "ll" occurred more frequently than any unique contraction, it was potentially treated as more important by the topic modeling algorithms.

conclusion
this project has demonstrated that topic modeling is one possible way to employ automated methods to analyze chat reference, with mixed success. the library will continue to improve chat reference analysis based on this project experience. the authors hope that other libraries will use the lessons from this project and the code in github as a starting point to employ similar analysis for their own chat reference. in fact, a related project at the university of northern iowa library is evidence of growing interest in topic modeling of chat reference transcripts.14 considering how frequently patrons use chat reference, it is important for libraries to explore and embrace whatever methods will allow them to assess and improve such services.

acknowledgements
the authors wish to acknowledge the research and publication committee of the university of illinois at urbana-champaign library, which provided support for the completion of this research. many thanks are owed to xinyu tian, our practicum student, for the extensive work he did in identifying relevant literature and developing the project code.

notes
1 jo kibbee, david ward, and wei ma, "virtual service, real data: results of a pilot study," reference services review 30, no. 1 (mar. 1, 2002): 25–36, https://doi.org/10.1108/00907320210416519.
2 the library uses the read scale (reference effort assessment data scale), which allows reference transactions to be translated into a numerical scale that takes into account the effort, skills, knowledge, teaching moment, techniques, and tools used by the staff in the transaction. see readscale.org for more information.
3 david ward and m. kathleen kern, "combining im and vendor-based chat: a report from the frontlines of an integrated service," portal: libraries and the academy 6, no. 4 (oct. 2006): 417–29, https://doi.org/10.1353/pla.2006.0058; joann jacoby et al., "the value of chat reference services: a pilot study," portal: libraries and the academy 16, no. 1 (jan. 2016): 109–29, https://doi.org/10.1353/pla.2016.0013; david ward, "using virtual reference transcripts for staff training," reference services review 31, no. 1 (2003): 46–56, https://doi.org/10.1108/00907320310460915.
4 robin brown, "lifting the veil: analyzing collaborative virtual reference transcripts to demonstrate value and make recommendations for practice," reference & user services quarterly 57, no. 1 (fall 2017): 42–47, https://doi.org/10.5860/rusq.57.1.6441; maryvon côté, svetlana kochkina, and tara mawhinney, "do you want to chat? reevaluating organization of virtual reference service at an academic library," reference & user services quarterly 56, no. 1 (fall 2016): 36–46, https://doi.org/10.5860/rusq.56n1.36; donna goda and corinne bisshop, "frequency and content of chat questions by time of semester at the university of central florida: implications for training, staffing and marketing," public services quarterly 4, no. 4 (dec. 2008): 291–316, https://doi.org/10.1080/15228950802285593; kelsey keyes and ellie dworak, "staffing chat reference with undergraduate student assistants at an academic library: a standards-based assessment," the journal of academic librarianship 43, no. 6 (2017): 469–78, https://doi.org/10.1016/j.acalib.2017.09.001; michael mungin, "stats don't tell the whole story: using qualitative data analysis of chat reference transcripts to assess and improve services," journal of library & information services in distance learning 11, no. 1–2 (jan. 2017): 25–36, https://doi.org/10.1080/1533290x.2016.1223965.
5 shu z. schiller, "chat for chat: mediated learning in online chat virtual reference service," computers in human behavior 65 (dec. 2016): 651–65, https://doi.org/10.1016/j.chb.2016.06.053.
6 ellie kohler, "what do your library chats say?: how to analyze webchat transcripts for sentiment and topic extraction," in brick & click libraries conference proceedings (brick & click, maryville, mo: northwest missouri state university, 2017), 138–48, https://www.nwmissouri.edu/library/brickandclick/presentations/eproceedings.pdf.
7 kohler, 141.
8 for example: guan-bin chen and hung-yu kao, "re-organized topic modeling for microblogging data," in proceedings of the ase bigdata & socialinformatics 2015, ase bd&si '15 (new york, ny: acm, 2015), 35:1–35:8, https://doi.org/10.1145/2818869.2818875.
9 x. cheng et al., "btm: topic modeling over short texts," ieee transactions on knowledge and data engineering 26, no. 12 (dec. 2014): 2,928–41, https://doi.org/10.1109/tkde.2014.2313872.
10 for example: chenliang li et al., "topic modeling for short texts with auxiliary word embeddings," in proceedings of the 39th international acm sigir conference on research and development in information retrieval (acm press, 2016), 165–74, https://doi.org/10.1145/2911451.2911499.
11 we used the python packages gensim, langid, nltk, numpy, pandas, re, sklearn, and stop_words for data cleaning and analysis.
12 kohler, "what do your library chats say?"
13 the library implemented new chat reference software after this project was completed, so analysis of chat conversations that took place after the spring 2018 semester will require a reworking of the data collection and cleaning processes.
14 hyunseung koh and mark fienup, "library chat analysis: a navigation tool" (poster, dec. 5, 2018), https://libraryassessment.org/wp-content/uploads/2018/11/58-kohfienuplibrarychatanalysis.pdf.

applying hierarchical task analysis method to discovery layer evaluation
merlen prommann and tao zhang
information technology and libraries | march 2015

abstract
while usability tests have been helpful in evaluating the success or failure of implementing discovery layers in the library context, the focus of usability tests has remained on the search interface rather than the discovery process for users. the informal site- and context-specific usability tests have offered little to test the rigor of the discovery layers against the user goals, motivations, and workflows they have been designed to support. this study proposes hierarchical task analysis (hta) as an important complementary evaluation method to usability testing of discovery layers. relevant literature is reviewed for the discovery layers and the hta method. as no previous application of hta to the evaluation of discovery layers was found, this paper presents the application of hta as an expert-based and workflow-centered (e.g., retrieving a relevant book or a journal article) method for evaluating discovery layers. purdue university's primo by ex libris was used to map eleven use cases as hta charts. nielsen's goal composition theory was used as an analytical framework to evaluate the goal charts from two perspectives: (a) users' physical interactions (i.e., clicks), and (b) users' cognitive steps (i.e., decision points for what to do next). a brief comparison of hta and usability test findings is offered as a way of conclusion.

introduction
discovery layers are relatively new third-party software components that offer a google-like, web-scale search interface for library users to find information held in the library catalog and beyond. libraries are increasingly utilizing these to offer a better user experience to their patrons. while popular in application, the discussion about discovery layer implementation and evaluation remains limited.[1][2]

a majority of reported case studies discussing discovery layer implementations are based on informal usability tests that involve a small sample of users in a specific context. the resulting data sets are often incomplete and the scenarios are hard to generalize.[3] discovery layers have a number of technical advantages over traditional federated search and cover a much wider range of library resources. however, they are not without limitations. questions have remained scarce about the workflow of discovery layers and how well they help users achieve their goals.
merlen prommann (mpromann@purdue.edu) is user experience researcher and designer, purdue university libraries. tao zhang (zhan1022@purdue.edu) is user experience specialist, purdue university libraries.

beth thomsett-scott and patricia e. reese offered an extensive overview of the literature discussing the disconnect between what library websites offer and what their users would like.[1] on the one hand, library directors deal with a great variety of faculty perceptions in terms of what the role of the library is and how faculty approach research differently. the ithaka s+r library survey of not-for-profit four-year academic institutions in the us suggests a real diversity of american academic libraries as they seek to develop services with sustained value.[4] for the common library website user, irrelevant search results and unfamiliar library taxonomy (e.g., call numbers, multiple locations, item formats, etc.) are the two most common gaps.[3] michael khoo and catherine hall demonstrated how users, primarily college students, have become so accustomed to the search functionalities on the internet that they are reluctant to use library websites for their research.[5] no doubt, the launch of google scholar in 2005 was another driver for librarians to move from traditional federated searching to something faster and more comprehensive.[1] while literature encouraging google-like search experiences is abundant, khoo and hall have warned designers not to take users' preferences towards google at face value. they studied users' mental models, defining a mental model as "a model that people have of themselves, others, the environment, and the things with which they interact, such as technologies," and concluded that users often do not understand the complexities of how search functions actually work or what is useful about them.[5]

a more systematic examination of the tasks that discovery layers are designed to support is needed. this paper introduces hierarchical task analysis (henceforth hta) as an expert method to evaluate discovery layers from a task-oriented perspective. it aims to complement usability testing. for more than 40 years, hta has been the primary methodology for studying systems' sub-goal hierarchies, for it presents the opportunity to provide insights into key workflow issues. with expertise in applying hta and being frequent users of the purdue university libraries website for personal academic needs, we mapped user tasks into several flow charts based on three task scenarios: (1) finding an article, (2) finding a book, and (3) finding an ebook. jakob nielsen's "goal composition" heuristics (generalization, integration, and user control mechanisms)[6] were used as an analytical framework to evaluate the user experience of the ex libris primo® discovery layer implemented at purdue university libraries.
the goal composition heuristics focus on multifunctionality and the idea of servicing many possible user goals at once. for instance, generalization allows users to use one feature on more objects. integration allows each feature to be used in combination with other facilities. control mechanisms allow users to inspect and amend how the computer carries out the instructions. we discussed the key issues with other library colleagues to meet nielsen's five-expert rule and avoid a loss in the quality of insights.[7] nielsen studied the value of participant volume in usability tests and concluded that after the fifth user, researchers are wasting their time by observing the same findings and not learning much new. a comparison to usability study findings, as presented by fagan et al, is offered as a way of conclusion.[3]

related work

discovery layers
the traditional federated search technology offers the overall benefit of searching many databases at once.[8][1] yet it has been known to frustrate users, as they often do not know which databases to include in their search. emily alling and rachel naismith aggregated common findings from a number of studies involving the traditional federated search technology.[9] besides slow response time, other key causes of frustrating inefficiency were limited information about search results, information overload due to the lack of filters, and the fact that results were not ranked in order of relevance (see also [2][1]).

new tools, termed "discovery," "discovery tools,"[2][10] "discovery layers," or "next generation catalogs,"[11] have become increasingly popular and have provided the hope of eliminating some of the issues with traditional federated search. generally, they are third-party interfaces that use pre-indexing to provide speedy discovery of relevant materials across millions of records of local library collections, from books and articles to databases and digital archives. furthermore, some systems (e.g., the ex libris primo central index) aggregate hundreds of millions of scholarly e-resources, including journal articles, e-books, reviews, legal documents, and more, harvested from primary and secondary publishers and aggregators and from open-access repositories. discovery layers are projected to help create the next generation of federated search engines that utilize a single search index of metadata to search the rising volume of resources available to libraries.[2][11][10][1] while not yet systematic, results from a number of usability studies on these discovery layers point to the benefits they offer.

the most noteworthy benefit of a discovery layer is its seemingly easy-to-use unified search interface.
jerry caswell and john d. wynstra studied the implementation of ex libris metalib centralized indexes, based on the federated search technology, at the university of northern iowa library.[8] they confirmed how the easily accessible unified interface helped users to search multiple relevant databases simultaneously and more efficiently. lyle ford concluded that the summon discovery layer by serials solutions fulfilled students' expectations to be able to search books and articles together.[12] susan johns-smith pointed out another key benefit to users: customizability.[10] the summon discovery layer allowed users to determine how much of the machine-readable cataloging (marc) record was displayed. the study also confirmed how the unified interface, aligning the look and feel among databases, increased the ease of use for end users. michael gorrell described how one of the key providers, ebsco, gathered input from users and considered design features of popular websites to implement new technologies in the ebscohost interface.[13] some of the features that ease the usability of ebscohost are a dynamic date slider, an article preview hover, and expandable features for various facets, such as subject and publication.[2]

another key benefit of discovery systems is the speed of results retrieval. the primo discovery layer by ex libris has been complimented for its ability to reduce the time it takes to conclude a search session while maximizing the volume of relevant results per search session.[14] it was suggested that in so doing the tool helps introduce users to new content types. yuji tosaka and cathy weng reported how records with richer metadata tend to be found more frequently and lead to more circulation.[15] similarly, luther and kelly reported an increase in overall downloads, while the use of individual databases decreased.[16] these studies point to a trend of enhanced distribution of discovery and knowledge.

with the additional metadata of item records, however, there is also an increased likelihood of inconsistencies across the databases that are brought together in a centralized index. a study by graham stone offered a comprehensive report on the implementation process of the summon discovery layer at the university of huddersfield, highlighting major inconsistencies in cataloging practices and the difficulties they caused in providing consistent journal holdings and titles.[17] this casts a shadow on the promise of better findability.

jeff wisniewski[18] and williams and foster[2] are among the many who espouse discovery layers as a step towards a truly single search function that is flexible while allowing needed customizability. these new tools, however, are not without their limitations. the majority of usability studies reinforce similar results and focus on the user interface. fagan et al, for example, studied the usability of ebsco discovery service at james madison university (jmu).
while most tasks were accomplished successfully, the study confirmed previous warnings that users do not understand the complexities of search and identified several interface issues: (1) users desire single search, but willingly use multiple options for search; (2) lack of visibility for the option to sort search results; and (3) difficulty in finding journal articles.[3]

yang and wagner offer one case where the aim was to evaluate discovery layers against a checklist of 12 features that would define a true 'next generation catalogue':
(1) single point of entry to all library information,
(2) state-of-the-art web interface (e.g., google and amazon),
(3) enriched content (e.g., book cover images, ratings and comments),
(4) faceted navigation for search results,
(5) simple keyword search on every page,
(6) more precise relevancy (with circulation statistics a contributing factor),
(7) automatic spell check,
(8) recommendations to related materials (common in commercial sites, e.g., amazon),
(9) allowing users to add data to records (e.g., reviews),
(10) rss feeds to allow users to follow top circulating books or topic-related updates in the library catalogue,
(11) links to social networking sites to allow users to share their resources,
(12) stable urls that can be easily copied, pasted and shared.[11]

they used this list to evaluate seven open source and ten proprietary discovery layers, revealing how only a few of them can be considered true 'next generation catalogs' supporting the users' needs that are common on the web. all of the tools included in their study lacked precision in retrieving relevant search results, e.g., based on transaction data. the authors were impressed with the open source discovery layers libraryfind and vufind, which had 10 of the 12 features, leaving vendors of proprietary discovery layers ranking lower (see figure 1).

figure 1. 17 discovery layers (x-axis) were evaluated against a checklist of 12 features expected of the next generation catalogue (y-axis). [bar chart scores: 9, 9, 9, 8, 7.5, 7, 7, 7, 6, 6, 6, 5, 5, 4, 2, 1 (yang and wagner 2010, 707)]

yang and wagner theorized that the relative lack of innovation among commercial discovery layers is due to practical reasons: vendors create their new discovery layers to run alongside older ones, rather than attempting to alter the proprietary code of the integrated library system's (ils) online public access catalog (opac).
they pointed to the need for "libraries, vendors and the open source community [...] to cooperate and work together in a spirit of optimism and collegiality to make the true next generation catalogs a reality."[11] at the same time, the university of michigan article discovery working group reported that vendors are becoming more cooperative and allowing coordination among products, increasing the potential of web-scale discovery services.[19] how to evaluate and optimize user workflow across these coordinating products remains a practical challenge. in this study, we propose hta as a prospectively helpful method to evaluate user workflow through these increasingly complex products.

hierarchical task analysis
with roots in taylorism*, industrial psychology, and system processes, task analyses continue to offer valuable insights into the balance of efficiency and effectiveness in human-computer interaction scenarios.[20][21] historically, frank and lillian gilbreth (1911) set forth the principle of hierarchical task analysis (hta) when they broke down and studied the individual steps involved in laying bricks. they reduced the bricklaying process from about 18 movements down to four (in [21]). but it was john annett and keith d. duncan (1967) who introduced hta as a method to better evaluate the personnel training needs of an organization. they used it to break apart behavioral aspects of complex tasks such as planning, diagnosis, and decision-making (see [22][21]).

hta helps break users' goals into subtasks and actions, usually in the visual form of a graphic chart. it offers a practical model for goal execution, allowing designers to map user goals to the system's varying task levels and evaluate their feasibility [23]. in so doing, hta offers the structure with which to learn about tasks and highlight any unnecessary steps and potential errors that might occur during task performance, whether cognitive or physical [24][25]. its strength lies in its dual approach to evaluation: on the one hand, user interface elements are mapped at an extremely low and detailed level (down to individual buttons), while on the other hand, each of these interface elements gets mapped to the user's high-level cognitive tasks (the cognitive load). this informs a rigorous design approach, where each detail accounts for the high-level user task it needs to support.

the main limitation of classical hta is its system-centric focus, which does not account for the wider context the tasks under examination exist in.
the field of human-computer interaction has shifted our understanding of cognition from an individual information-processing model to a networked and contextually defined set of interactions, where the task under analysis is no longer confined to a desktop but "extends into a complex network of information and computer-mediated interactions" [26]. the task-step-focused hta does not have the ability to account for the rich social and physical contexts that increasingly mediated and multifaceted activities are embedded in. hta has therefore been reiterated with additional theories and heuristics, so as to better account for the increasingly more complete understanding of human activity.

advanced task models and analysis methods have been developed based on the principle of hta. stuart k. card, thomas p. moran, and allen newell [27] proposed an engineering model of human performance, goms (goals, operators, methods, and selection), to map how task environment features determine what and when users know about the task [20]. goms has been expanded to cope with rising complexities (e.g., [28][29][30]), but the models have become largely impractical in the process [20]. instead of simplistically suggesting cognitive errors are due to interface design, cognitive task analysis (cta) attempts to address the underlying mental processes that most often give rise to errors [24]. given the lack of a structural understanding of cognitive processes, the analysis of cognitive tasks has remained problematic to implement [20][31]. activity theory models people as active decision makers [20]. it explains how users convert goals into a set of motives and how they seek to execute those motives as a set of interactions in a given situational condition. these situational conditions either help or prevent the user from achieving the intended goal. activity theory is beginning to offer a coherent foundation for accounting for the task context [20], but it has yet to offer a disciplined set of methods to execute this theory in the form of a task analysis.

even though task analyses have seen much improvement, adaptation, and usage in their near-40-year-long existence, their core benefit (aiding an understanding of the tasks users need to perform to achieve their desired goals) has remained the same. until activity theory, cta, and other contextual approaches are developed into more readily applicable analysis frameworks, classical hta, with additional layers of heuristics guiding the analysis, remains the practical option [21]. nielsen's goal composition [6] offers one such set of heuristics applicable to the web context.

* taylorism is the application of scientific method to the analysis of work, so as to make it more efficient and cost-effective.
it presents usability concepts such as reuse, multitasking, automated use, and recovering and retrieving, to name a few, so as to systematically evaluate hta charts representing the interplay between an interface and the user.

utility of hta for evaluating discovery layers
usability testing has become the norm in validating the effectiveness and ease of use of library websites. yet, thirteen years ago, brenda battleson, austin booth, and jane weintrop [32] emphasized the need to support user tasks as the crucial element of user-centered design. in comparison to usability testing, hta offers a more comprehensive model for the analysis of how well discovery layers support users' tasks in the contemporary library context. considering the strengths of the hta method and the current need for vendors to simplify the workflows in increasingly complex systems, it is surprising that hta has not yet been applied to the evaluation of discovery layers.

this paper introduces hierarchical task analysis (hta) as a solution to systematically evaluate the workflow of discovery layers as a technology that helps users accomplish specific tasks, herein, retrieving relevant items from the library catalog and other scholarly collections. nielsen's [6] goal composition heuristics, designed to evaluate usability in the web context, are used to guide the evaluation of the user workflow via the hta task maps. as a process-specific (vs. context-specific) approach, hta can help achieve a more systematic examination of the tasks discovery layers should support, such as finding an article, a book, or an ebook, and help vendors coordinate to achieve the full potential of web-scale discovery services.

method: applying hta to primo by ex libris
the object of this study was purdue university's library website, which was re-launched with ex libris' primo in january 2013 (figure 2) to serve the growing student and faculty community. its 3.6 million indexed records are visited over 1.1 million times every year. roughly 34% of these visits are to electronic books. according to sharon q. yang and kurt wagner [11], who studied 17 different discovery layers, primo ranked the best among the commercial discovery layer products, coming fourth after the open source tools libraryfind, vufind, and scriblio in the overall rankings. we will evaluate how efficiently and effectively the primo search interface supports the tasks of purdue libraries users.

figure 2. purdue library front page and search box.

based on our three-year experience of user studies and usability testing of the library website, we identified finding an article, a book, and an ebook as the three major representative scenarios of purdue library usage. to test how primo helps its users and how many cognitive steps it requires of them, each of the three scenarios was broken into three or four specific case studies.
the case studies were designed to account for the different availability categories present in the current primo system, e.g., 'full text available', 'partial availability', 'restricted access', or 'no access'. this is because the different availabilities present users with different possible frustrations and obstacles to task accomplishment. this system-design perspective could offer a comparable baseline for discovery layer evaluation across libraries. a full list of the eleven case studies can be seen below:

find an article:
case 1. the library has only a full electronic text.
case 2. the library has the correct issue of the journal in print, which contains the article, as well as a full electronic text.
case 3. the library has the correct issue of the journal, which contains the article, only in print.
case 4. the library does not have the full text, either in print or electronically. a possible option is to use an interlibrary loan (henceforth ill) request.

find a book (print copy):
case 5. the library has the book and the book is on the shelf.
case 6. the library has the book, but the book is in a restricted place, such as the hicks repository. the user has to request the book.
case 7. the library has the book, but it is either on the shelf or in a repository. the user would like to request the book.
case 8. the library does not have the book. possible options are uborrow† or ill.

find an ebook:
case 9. the library has the full text of the ebook.
case 10. the ebook is shown in search results but the library does not have full text.
case 11. the book is not shown in search results. a possible option is to use uborrow or ill.

it is generally accepted that hta is not a complex analysis method, but since it offers general guiding principles rather than a rigorous step-by-step guide, it can be tricky to implement [24][20][21][23]. both authors of this study have expertise in applying hta and are frequent users of the purdue library's website. we are familiar with the library's commonly reported system errors; however, all of our case studies result from a randomized topic search, not from specific reported items. to achieve consistent hta charts, one author carried out the identified use cases on a part-time basis over a two-month period. each case was executed on the purdue library website, using the primo discovery layer. an on-campus hewlett-packard (hp) desktop computer with internet explorer and a personal macbook laptop with safari and google chrome were used to identify any possible inconsistencies between user experiences on different operating systems.

† uborrow is a federated catalog and direct consortial borrowing service provided by the committee on institutional cooperation (cic).
uborrow allows users to search for, and request, available books from all cic libraries, which includes all universities in the big ten as well as the university of chicago and the center for research libraries.

as per stanton's [21] statement that "hta is a living documentation of the sub-goal hierarchy that only exists in the latest state of revision," mapping the hta charts was an iterative process between the two authors. according to david embrey [24], "the analyst needs to develop a measure of skill [in the task] in order to analyze a task effectively" (2). this measure of skill was developed in the process of finding real examples (via a randomized topic search) from the purdue library catalog to match the structural cases listed above. for instance, 'case 1. the library has only the electronic full text' was turned into a case goal: '0 find the conference proceeding on network-assisted underwater acoustic communication'. a full list of referenced case studies is below:

find an article:
case 1. find the article "network-assisted underwater acoustic communication" (yang and kevin, 2012).
case 2. find the article "comparison of simple potential functions for simulating liquid water" (jorgensen et al., 1983).
case 3. find the journal design annual "graphis inc" (2008).
case 4. find the article "a technique for murine irradiation in a controlled gas environment" (walb, m. c. et al., 2012).

find a book (in print):
case 5. find the book show me the numbers: designing tables and graphs to enlighten (few, 2004).
case 6. find the book the love of cats and place a request for it (metcalf, 1973).
case 7. find the book the prince and place a request for it (machiavelli).
case 8. find the book the design history reader by maffei and houze (2010) (uborrow or ill).

find an ebook:
case 9. find the ebook handbook of usability testing: how to plan, design and conduct effective tests (rubin and chisnell, 2008).
case 10. find the ebook the science of awakening consciousness: our ancient wisdom (partly available via hathi trust).
case 11. find the ebook ancient awakening by matthew bryan laube (uborrow).

hta descriptions are generally diagrammatic or tabular. since diagrams are easier to assimilate and promise the identification of a larger number of sub-goals [23], the diagrammatic description method was preferred (figure 2). each analysis started with the establishment of sub-goals, such as 'browse the library website' and 'retrieve the article', and followed with the identification of the individual small steps that make the sub-goal possible, e.g., 'press search' and 'click on 2, to go to page 2' (figures 3-5). then, additional iterations were made to include: (1) cognitive steps, where
information on the screen needs to be evaluated before the next step can be taken (e.g., identifying the correct url to open from the initial results set), and (2) cognitive decision points between multiple options for users to choose from. for instance, items can be requested either via interlibrary loan (ill) or uborrow, presenting the user with an a or b option, requiring cognitive effort to make a choice. such parallel paths were color-coded in yellow (figure 2). both physical and cognitive steps were recorded into xmind‡, a free mind-mapping software. they were color-coded black and gray, respectively, helping visualize the volume of cognitive decision points and steps (i.e., the cognitive load).

figure 3. full hta chart for the 'find a book' scenario (case 5). created in xmind.
figure 4. zoom in to steps 1 and 2 of the hta map for the 'find a book' scenario (case 5). created in xmind.

‡ xmind is a free mind-mapping software that allows structured presentation of steps, multiple coding references, and the addition of images, links, and extensive notes. http://www.xmind.net/

figure 5. zoom in to step 3 of the hta map for the 'find a book' scenario (case 5). created in xmind.
figure 6. zoom in to step 4 of the hta map of the 'find a book' scenario (case 5). created in xmind.

to organize the decision flow chart, the original hierarchical numbering scheme for hta, which requires every sub-goal to be uniquely numbered with an integer in numerical sequence [21], was strictly followed. visual (screen captures) and verbal notes on efficient and inefficient design factors were taken during the hta mapping process and linked directly to the tasks they applied to. steps where the interface design guided the user to the next step were marked 'fluent' with a green tick (figures 3 and 4). steps that were likely to mislead users from the optimal path to item retrieval and were a burden to the user's workflow were marked with a red 'x' (see figures 4 and 5). one major advantage of the diagram format is its visual and structural representation of sub-goals and their steps in a spatial manner (see figures 2-5). this is useful for gaining a quick overview of the workflow [21].

when exactly to stop the analysis has remained undefined for hta [21]. it is at the discretion of the analyst to evaluate if there is a need to re-describe every sub-goal down to the most basic level, or whether the failure to perform that sub-goal is, in fact, consequential to the study results. we decided to stop the evaluation at the point where the user located (a shelf number or reserve pick-up number) or received the sought item via download. furthermore, steps that were perceived as possible when impossible in actuality were transcribed into the diagrams.
article  scenario  case  1   offers  an  example:  once  the  desired  search  result  was  identified,  its  green  dot  for  ‘full  text  available’     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       90   was  likely  to  be  perceived  as  clickable,  when  in  actuality  it  was  not.  the  user  is  required  to  click  on   the  title  or  open  the  tab  ‘find  online’  to  access  the  external  digital  library  and  download  the   desired  article  (see  figure  7).       figure  7.  article  scenario  (case1)  two  search  results,  where  green  'full  text  available'  may  be   perceived  as  clickable.   task  analysis  focuses  on  the  properties  of  the  task  rather  than  the  user.  this  requires  expert   evaluation  in  place  of  involving  users  in  the  study.  as  stated  above,  both  of  the  authors  are   working  experts  in  the  field  of  user  experience  in  the  library  context,  thoroughly  aware  of  the   tasks  under  analysis  and  how  they  are  executed  on  a  daily  basis.  a  group  of  12  (librarians,   reference  service  staff,  system  administrators  and  developers)  were  asked  to  review  the  hta   charts  on  a  monthly  basis.  feedback  and  implications  of  identified  issues  were  discussed  as  a   group.  according  to  nielsen  [7]  it  takes  five  experts  (double  specialist  in  nielsen’s  terms,  is  an   expert  in  usability  as  well  as  in  the  particular  technology  employed  by  the  software.)  to  not  have   significant  loss  of  findings  (see  figure  7).  based  on  this  enumeration,  the  final  versions  of  the  hta   charts  offer  accurate  representations  of  the  primo  workflow  in  the  three  use  scenarios  of  finding   an  article,  finding  a  book  and  finding  an  ebook  at  purdue  university  libraries.       information  technology  and  libraries  |  march  2015   91     figure  8.  average  proportion  of  usability  problems  found  as  a  function  of  number  of  evaluators  in   a  group  performing  heuristic  evaluation  [7].   results     the  reason  for  mapping  primo’s  workflows  in  hta  charts  was  to  identify  key  workflow  and   usability  issues  of  a  widely  used  discovery  layer  in  scenarios  and  contexts  it  was  designed  to   serve.  the  resulting  hta  diagrams  offered  insights  into  fluent  steps  (green  ticks),  as  well  as   workflow  issues  (red  ‘x’)  present  in  primo,  as  applied  at  purdue  university  libraries.  it  is  due  to   space  limitations,  that  only  the  main  findings  of  the  hta  will  be  discussed.  the  full  results  are   published  on  purdue  university  research  repository§.  table  1  presents  how  many  parallel  routes   (a  vs.  b  route),  physical  steps  (clicks),  cognitive  evaluation  steps,  likely  errors  and  well  guided   steps  each  of  the  use  cases  had.     on  average  it  took  between  20  to  30  steps  to  find  a  relevant  item  within  primo.  even  though  no   ideal  step  count  has  been  identified  for  the  library  context,  this  is  quite  high  in  the  general  context   of  the  web,  where  fast  task  accomplishment  is  generally  expected.  paul  chojecki  [33]  tested  how   too  many  options  impact  usability  on  a  website.  he  revealed  that  the  average  step  count  to  lead  to   higher  satisfaction  levels  is  6  (vs.  18,16  average  steps  at  purdue  libraries).  
in our study, the majority of the steps were physical pressing of a button or filter selection; however, cognitive steps took up just under a half of the steps in nearly all cases. the majority of cases flow well, as the strengths (fluent well guided steps) of primo outweigh its less guided steps that easily lend themselves to the chance of error.

§ task analysis cases and results for ex libris primo. https://purr.purdue.edu/publications/1738

content type                                  articles                  books                   ebooks
case number                                   1    2    3    4   avg    5    6    7    8   avg    9   10   11   avg
no. of decision points (between a & b)
  to retrieve an item                         5    8    4    4    5     4    5    5    2    4     6    3    2    4
minimum steps possible to retrieve an item
  (clicks + cognitive decisions)             18   27   16   30   23    18   25   28   24   24    22   19   19   20
of these minimum steps, how many were
  cognitive (information evaluation was
  needed to proceed)                          4    8    9   13    9     6    9    7    7    7     4    6    4    5
maximum steps it can take to retrieve an
  item (clicks + cognitive decisions)        26   35   23   36   30    22   31   33   28   29    32   23   22   26
of these maximum steps, how many were
  cognitive                                  10   17   14   15   14    10   13   16    8   12     9    8    5    7
errors (steps that mislead from optimal
  item retrieval)                             3   15    4    8    8     2    2    4    3    3    13    1    2    5
fluent well guided steps to item retrieval   11   11    9    8   10     7    8    7    5    7     6    4    3    5

table 1. table listing each case's key task measures, and each scenario's averages.

between the three item search scenarios – articles, books and ebooks – the retrieval of articles was least guided and required the highest number of decisions from the user (5, vs. 4 for books and 4 for ebooks on average). retrieving an article (between 23-30 steps on average) or a book (24-29 steps on average) took more steps to accomplish than finding a relevant ebook (20-26 steps on average). the high volume of steps (max 30 steps on average) it required to retrieve an article, as well as its high error rate (8), were due to the higher amount of cognitive steps (12 steps on average) required to identify the correct article and to locate a hard copy (instead of the relatively easily retrievable online copy). in the book scenario, the challenge was also two-fold: on the one hand, it was challenging to verify the right book when there were many similar results (this explains the high number of 12 cognitive steps on average); on the other hand, the flow to place a request for a book was also a challenge. the latter was a key contributor to the higher amount of physical steps required for retrieving a book (max 29 on average).
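the per-case counts in table 1 can be read directly off the hta charts. as a rough illustration (not the authors' xmind tooling – the labels and the simplification of one choice point per sub-goal are assumptions of this sketch), the hierarchy and its annotations can be modelled as a small tree in python and the same measures tallied in one traversal; the scenario averages in the table are then simple per-scenario means of these per-case counts:

# a minimal sketch (not the authors' xmind charts): each sub-goal lists its own
# physical/cognitive steps and, optionally, one choice point between alternative
# sub-goals (the yellow a/b routes). all labels here are illustrative.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Step:
    label: str
    kind: str = "physical"      # "physical" (click, type) or "cognitive" (evaluate the screen)
    mark: Optional[str] = None  # "fluent" (green tick), "error" (red x), or None

@dataclass
class SubGoal:
    label: str
    steps: List[Step] = field(default_factory=list)
    alternatives: List["SubGoal"] = field(default_factory=list)

def measures(goal: SubGoal) -> dict:
    """tally the per-case counts reported in table 1 for one hta chart."""
    m = {"decision_points": 1 if goal.alternatives else 0,
         "min_steps": len(goal.steps),
         "max_steps": len(goal.steps),
         "cognitive_on_min_path": sum(s.kind == "cognitive" for s in goal.steps),
         "errors": sum(s.mark == "error" for s in goal.steps),
         "fluent": sum(s.mark == "fluent" for s in goal.steps)}
    if goal.alternatives:
        child = [measures(a) for a in goal.alternatives]
        shortest = min(child, key=lambda c: c["min_steps"])
        m["min_steps"] += shortest["min_steps"]
        m["cognitive_on_min_path"] += shortest["cognitive_on_min_path"]
        m["max_steps"] += max(c["max_steps"] for c in child)
        for c in child:
            m["decision_points"] += c["decision_points"]
            m["errors"] += c["errors"]
            m["fluent"] += c["fluent"]
    return m

# toy fragment of the 'find a book' chart (case 5): a request/uborrow choice point
request = SubGoal("4.1 sign in and place a request",
                  steps=[Step("click 'sign in to request'", "physical", "fluent")])
uborrow = SubGoal("4.2 order via uborrow",
                  steps=[Step("compare ill and uborrow options", "cognitive", "error"),
                         Step("open uborrow", "physical")])
retrieve = SubGoal("4 retrieve, locate or order the item",
                   steps=[Step("check availability dot and tag", "cognitive")],
                   alternatives=[request, uborrow])
print(measures(retrieve))

modelling the charts this way also makes it straightforward to recompute the counts whenever a chart is revised, which suits stanton's view of hta as a living document in its latest state of revision.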
information  technology  and  libraries  |  march  2015   93   common  to  all  eleven  cases,  whether  articles  or  books,  was  the  four  sub-­‐goal-­‐process:  1)  browse   the  library  website,  2)  find  results,  3)  open  the  page  of  the  desired  item,  and  4)  retrieve,  locate   or  order  the  item.  the  first  two  offered  near  identical  experiences,  no  matter  the  search  scenario   or  case.  third  and  fourth  sub-­‐goals,  however,  presented  different  workflow  issues  depending  on   the  product  searched  and  its  availability,  e.g.  ‘in  print’  or  ‘online’.  as  such,  general  results  will  be   presented  for  the  first  two  themes,  while  scenario  specific  overviews  will  be  provided  for  the   latter  two  themes.   browsing  the  library  website   browsing  the  library  website  was  easy  and  supported  different  user  tasks.  the  simple  url   (lib.purdue.edu)  was  memorable  and  appeared  first  in  the  search  results.  the  immediate   availability  of  sub-­‐menus,  such  as  databases  and  catalogs,  offered  speedy  searching  for  the   frequent  users.  the  choice  between:  a)  general  url,  or  b)  sub-­‐menu,  was  the  first  key  decision   point  users  of  primo  at  purdue  libraries  were  presented  with.     the  purdue  libraries’  home  page  (revisit  figure  1)  had  a  simple  design  with  a  clear,  central  and   visible  search  box.  just  above  it  were  search  filters  for  articles,  books  and  the  web.  this  was  the   second  key  decision  point  users  were  presented  with:  a)  they  could  either  type  into  the  search  bar   without  selecting  any  filters,  or  b)  they  could  select  a  filter  to  aid  the  focus  of  their  results  to  a   specific  item  type.  browsing  the  library  website  offers  an  efficient  and  fluent  workflow,  with   ebooks  being  the  only  exception.  it  was  hard  to  know  whether  they  were  grouped  under  articles   or  books  &  media  filters.  confusingly  (at  the  time  of  the  study)  purdue  libraries  listed  ebooks  that   had  no  physical  copies  under  articles,  while  other  ebooks  that  purdue  had  physical  version  of  (in   addition  to  the  digital  ones)  under  books  &  media.  this  was  not  explained  in  the  interface,  nor  was   there  a  readily  available  tooltip.   finding  relevant  results     figure  9.  search  results  for  article  (case2)  ‘comparison  of  simple  potential  functions  for   simulating  liquid  water’     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       94   primo  presented  the  search  results  in  an  algorithmic  order  of  relevance  offering  additional  pages   for  every  20  items  appearing  in  the  search  results.  the  search  bar  was  then  minimized  at  the  top   of  the  page,  available  for  easy  editing.  the  page  was  divided  into  two  key  sections,  where  the  first   quarter  entailed  filters  (e.g.  year  of  publishing,  resource  type,  author,  journal,  etc.),  and  the  other   three  quarters  was  left  for  search  results  (see  figure  8).  the  majority  of  cognitive  decisions  across   scenarios  were  made  on  this  results  page.  this  was  due  to  the  need  to  pick  up  the  cues  to  identify   and  verify  the  accurate  item  being  searched.  
the  value  of  these  cognitive  steps  lies  in  their  leading   of  the  user  to  the  next  physical  steps.  as  discussed  in  the  next  section,  opening  the  page  of  the   desired  item,  there  were  several  elements  that  succeeded  and  failed  at  guiding  the  user  to  their   accurate  search  result.     search  results  were  considered  relevant  when  the  search  presented  results  in  the  general  topic   area  of  the  searched  item.  most  cases  in  most  scenarios  led  to  relevant  results,  however,  book   case  8  and  ebook  case  11,  provided  only  unrelated  results.  generally,  books  and  ebooks  were   easy  to  identify  as  available.  this  was  due  to  their  typically  short  titles,  which  took  less  effort  to   read.  journal  articles,  on  the  other  hand,  have  longer  titles  and  required  more  cognitive  effort  to   be  verified.     article  case  4,  book  case  6  and  ebook  case  10  had  relevant  but  restricted  results.  the  color-­‐ coding  system  that  indicated  the  level  of  availability  for  the  presented  search  results:  green  (fully   available),  orange  (partly  available)  or  gray  (not  available)  dots  –  was  followed  by  an  explanatory   availability  tag,  e.g.  'available  online'  or  'full  text  available'  etc.  tabs  represented  additional  cues,   offering  additional  information,  e.g.  ‘find  in  print’.  these  appeared  in  a  supplementary  way  where   applicable.  for  example,  if  an  item  was  not  available,  its  dot  was  gray  and  it  neither  had  the  'find   in  print'  nor  'find  online'  tab.  instead,  it  had  a  'request'  tab,  guiding  the  user  towards  an  available   alternative  action.  restricted  availability  items,  such  as  a  book  in  a  closed  repository,  had  an   orange  indicator  for  partial  availability.  for  these,  primo  still  offered  the  'find  in  print'  or  'find   online'  tab,  whichever  was  appropriate.  while  the  overall  presentation  of  item  availability  was   clear  and  color-­‐coding  consistent,  the  mechanisms  were  not  without  their  errors,  as  discussed   below.   opening  the  page  of  the  desired  item   this  sub-­‐goal  comprised  of  two  main  steps:  1)  information  driven  cognitive  steps,  which  help  the   user  identify  the  correct  item,  and  2)  user  interface  guided  physical  steps  that  resulted  in  opening   the  page  of  the  desired  item.     frequent  strengths  that  helped  the  identification  of  relevant  items  across  the  scenarios  were  the   clearly  identifiable  labels  underneath  the  image  icons  (e.g.  'book’,  'article',  ‘conference  proceeding'),   hierarchically  structured  information  about  the  items  (title,  key  details,  availability)  and   perceivably  clickable  links  (blue  with  an  underlined  hover  effect).  the  labels  and  hierarchically   presented  details  (e.g.  year,  journal,  issue,  volume,  etc.)  helped  the  workflow  to  remain  smooth,     information  technology  and  libraries  |  march  2015   95   minimizing  the  need  to  use  side  filters.  the  immediate  details  reduced  the  need  to  open   additional  pages,  cutting  down  the  steps  needed  to  accomplish  the  task.  the  hover  effect  of  item   titles  made  the  link  look  and  feel  clickable,  guiding  the  user  closer  to  retrieving  the  item.  
color-coding all clickable links in the same blue was also an effective design feature, even though bolded availability labels were equally prominent and clickable. this was especially true for articles, where the 'full text available' tags corresponded to users' goal to immediately download the sought item (figure 9).

the most frequent causes of errors were duplicated search results. generally, primo groups multiple versions of the same item into one search result and offers a link: 'see all results'. in line with graham stone's [17] study, which highlighted the problem of cataloging inconsistencies, primo struggled to consistently group all overlapping search result items. both book and article scenarios suffered from at least one duplicate search result case due to inconsistent details. article scenario case 2 offers an example, where jorgensen et al's "comparison of simple potential functions for simulating liquid water" (1983) had two separate results for the same journal article of the same year (first two results in figure 9). problematically, the two results offered different details for the journal issue and page numbers. this is likely to cause referencing problems for primo users.

duplicated search results were also an issue for book scenarios. the most frequent causes for this were instances where authors' first and last names were presented in reverse order (see also figure 9 for article case 2), the books had different print editions, or the editor's name was used in place of the author's. book scenario case 7, machiavelli's "the prince", resulted in extremely varied results, requiring 16 cognitive steps and 33 physical steps before a desired item could be verified. this is where search filters were most handy. problematically, in case 7, machiavelli – the author – did not even appear in the author filter list, while ebrary inc was listed. again, this points to inconsistent metadata and the effects it can have on usability, as discussed by stone [17].

other workflow issues were presented by design details such as the additional information boxes underneath the item information, e.g. 'find in print', 'details' and 'find online'. they opened a small scrollable box that maintained the overall page view but was difficult to scroll: the arrow kept slipping outside of the box, scrolling the entire site's page instead of the content inside the box. in addition, the information boxes did not work well with chrome. this was especially problematic on the macbook, where after a couple of searches the boxes failed to list the details and left the user with an unaccomplished task. by comparison, safari on a mac and internet explorer on a pc never had such issues.

retrieving the items (call number or downloading the pdf)

the last sub-goal was to retrieve the item of interest.
this  often  comprised  of  multiple  decision   points:  whether  to  retrieve  the  pdf  version  from  online  or  identify  a  call  number  for  the  physical     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       96   copy  or  whether  to  place  a  request,  ordering  it  via  inter  library  loan  (ill)  or  uborrow.  each   option  is  briefly  discussed  below.     ebooks  and  articles,  if  available  online,  offered  efficient  online  availability.  if  an  article  was   identified  for  retrieval,  there  were  two  options  to  access  the  link  to  the  database,  e.g.  ‘view  this   record  in  acm’:  a)  via  the  full  view  of  the  item,  or  b)  small  ‘find  online’  preview  box  discussed   above.  where  more  than  one  database  was  available,  information  about  the  publication  range  the   library  holds  helped  identify  the  right  link  to  download  the  pdf  on  the  link-­‐resolver  page.  one  of   the  key  benefits  of  having  links  from  within  primo  to  the  full  texts  was  the  fact  that  they  opened  in   new  browser  windows  or  tabs,  without  interference  to  other  ongoing  search.  while  a  few  of  the   pdf  links  to  downloadable  texts  were  difficult  to  find  through  some  external  database  sites,  once   found,  they  all  opened  in  adobe  reader  with  easy  options  to  either  'save'  or  ‘print’  the  material.     ebooks  were  available  via  ebrary  or  ebl  libraries.  while  the  latter  offers  some  novel  uses,  such  as   audio  (i.e.  read  aloud),  neither  of  the  two  platforms  was  easy  to  use.  while  reading  online  was   possible,  downloading  an  ebook  was  challenging.  the  platform  seemed  to  offer  good  options:  a)   download  by  chapter,  b)  download  by  page  numbers,  or  c)  download  the  full  book  for  14  days.  in   actuality,  however,  these  were  all  unavailable.  ebook  case  9  had  chapters  longer  than  the  60-­‐page   limit  per  day.  page  numbers  proved  difficult  to  use,  as  the  book’s  numbers  did  not  match  the  pdf’s   page  numbers.  this  made  it  hard  to  keep  track  of  what  was  downloaded  and  where  one  left  off  to   continue  later  (due  to  imposed  time-­‐limits).  the  14-­‐day  full  access  option  was  only  available  in   adobe  digital  editions  software  (an  ebook  reader  software  by  adobe  systems  built  with  adobe   flash),  which  was  neither  available  on  most  campus  computers  nor  on  personal  laptops.     the  least  demanding  and  most  fluent  of  all  retrieval  options  was  the  process  of  identifying  the   location  and  call  number  for  physical  copies.  inconsistent  metadata,  however,  posed  some   challenges.    book  case  5  offered  a  merged  search  result  of  two  books,  but  listed  them  with   different  call  numbers  in  the  ‘find  in  print’  tab.  libraries  have  many  physical  copies  of  the  same   book,  but  identifying  consistency  in  call  number  is  a  cognitive  step  that  helps  verify  the   similarities  or  differences  between  the  two  results.  the  different  call  numbers  raised  doubts  about   which  item  to  choose,  slowing  the  workflow  for  the  task  and  increasing  the  number  of  cognitive   steps  required  to  accomplish  the  task.     compared  to  books,  finding  an  article  in  print  format  was  hardly  straightforward.  
the main cause for error when looking up hard copies of journals was the fact that individual journal issues did not have individual call numbers at purdue libraries; instead, there was one call number per periodical, covering the entire journal series. article case 2, for example, offered the journal code 530.5 j821 in the 'find in print' tab. in general, the tab suffered from too much information, poor layout and an unhelpful information hierarchy, all of which slowed down the cognitive tasks of verifying whether an item was relevant or not. it listed 'location' and 'holdings range' as the first pieces of information, wherein 'holdings range' included not just hard-copy-related information but listed digital items as well, even though this tab was for the physical version of the item. to illustrate, article case 2 claimed to have holdings for 1900–2013, whereas hard copies were only available for 1900–2000 and digital copies for 2001–2013.

each scenario had one or two cases where there were neither physical nor digital options available. the sub-goal commonly comprised a decision between three options: a) placing a request, b) ordering an item via interlibrary loan (ill), or c) ordering an item via uborrow. while the 'signing in to request' option and ill were easy to use with few required steps, there was a lack of guidance on how to choose between the three options. frequently, ill and uborrow appeared as equal options adjacent to one another, leaving the next step unguided. of all three, placing a request via uborrow was the hardest to accomplish. it often failed to present any relevant results on the first results page of the uborrow system, requiring the use of advanced search and filters. for instance, book case 6 was 'not requestable' via uborrow. when it did list the sought-for item in the search results, it looped back to purdue's own closed repository (which remained unavailable).

discussion

the goal of this study was to utilize hta to examine the workflow of the primo discovery layer at purdue university libraries. nielsen's [6] goal composition heuristics were used to extend the task-based analysis and understand the tasks in the context of discovery layers in libraries. three key usability domains – generalization, integration and user control mechanisms – were used as an analytical framework to draw usability conclusions about how primo was supporting, if at all, successful completion of the three scenarios. the next three sub-sections evaluate and offer design solutions on the three usability domains mentioned above. overall, this study confirmed primo's ability to reduce the workload for users to find their materials. primo is flexible and intuitive, permitting efficient search and successful retrieval of library materials, while offering the possibility of many search sessions at once [14]. a comparison to usability test results is offered by way of conclusion.
generalization mechanisms

primo can be considered a flexible discovery layer, as it helps users achieve many goals with a minimal number of steps. it makes use of several generalization mechanisms that allow users to apply their tasks towards many goals at once. for instance, the library website result in google offers not only the main url but also seven sub-links to specialist library site locations, such as opening hours and databases. this makes primo accessible and relevant for a broader array of people who are likely to have different goals. for instance, some may seek to enter a specific database, instead of having to open primo's landing page and entering the search terms. another may wish to utilize 'find', which guides the user, one step at a time, via a process of definition elimination, closer to the item they are looking for; yet another may simply want to know the opening times.

similarly, the primo search function saves already typed information, both on its landing page and its results page. this facilitates search by requiring query entry only once, while allowing end users to click on different filters to narrow the results in different ways. as part of the work done towards one search can be reused for another, e.g. by content, journal, or topic type, the system can ease the work effort required of users. this is further supported by the system saving already typed keywords when returning to the main search page from the search results, which allows for a fluid search experience where the user adjusts a keyword thread with minimal typing until they find what they are looking for.

a key problem for primo is its inability to manage inconsistent meta-data. the tendency to group different versions of the same search result together is helpful, as it reduces information noise. in an effort to enhance the speed at which the relevancy of search results can be evaluated, the system seeks to highlight any differences in the meta-data. if inconsistencies in meta-data cause the same search result to appear as separate items, this is likely to affect the cognitive steps and therefore the workload and efficiency with which the user is able to accomplish identification.

it is clear from previous studies that if discovery layers were to become the next generation catalogs [11], and were to enhance the speed of knowledge distribution as has been hoped by tosaka and weng [15] and luther and kelly [16], then mutual agreement is needed on how meta-data from disparate sources is to be reconciled [17]. understanding that users' cognitive workload should be minimized (by offering fewer options and more directive guidance) for more efficient decision-making, library items should have accurate details in their meta-data, e.g. consistent and thorough volume, issue and page numbers for journal articles, correct print and reprint years for books, and item type (conference proceeding vs. journal article).
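to make the meta-data point concrete, the following python fragment is a purely illustrative sketch (it is not primo's grouping logic): records are grouped on a normalised author/title/year key; without such normalisation, the two catalogue entries for the case 2 article – one with the author in 'last, first' order and one in 'first last' order – would surface as separate 'duplicate' results:

# illustrative sketch only: group records by a normalised author/title/year key,
# showing how reversed author names split one work into several results unless
# the key is normalised. this is not primo's actual grouping implementation.
import re

def norm_key(record: dict) -> tuple:
    # keep only the family name, regardless of 'last, first' or 'first last' order
    author = record.get("author", "")
    parts = re.split(r"[,\s]+", author.strip().lower())
    family = parts[0] if "," in author else (parts[-1] if parts else "")
    title = re.sub(r"\W+", " ", record.get("title", "").lower()).strip()
    return (family, title, record.get("year"))

records = [
    {"author": "jorgensen, william l.", "title": "comparison of simple potential functions for simulating liquid water", "year": 1983},
    {"author": "william l. jorgensen",  "title": "comparison of simple potential functions for simulating liquid water", "year": 1983},
]
groups = {}
for r in records:
    groups.setdefault(norm_key(r), []).append(r)
print(len(groups))  # 1: with normalisation the two catalogue entries collapse into one result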
integration  mechanisms   the  discovery  layer’s  ability  to  increase  the  number  of  search  sessions  [14]  at  any  one  time  is   possible  due  to  its  flexibility  to  support  multitasking.  primo  achieves  this  with  its  own  individual   features  used  in  combination  with  other  system  facilities  and  external  sources.  for  instance,   primo’s  design  allows  users  to  review  and  compare  several  search  results  at  once  via  the  ‘find  in   print’  or  ‘details’  tabs.  although  not  perfect,  since  the  small  boxes  are  hard  to  scroll  within,  the   information  can  save  the  user  the  need  and  additional  steps  of  opening  many  new  windows  and   having  to  click  between  them  just  for  reviewing  search  results.  instead,  many  ‘detail’  boxes  of   similar  results  may  be  opened  and  viewed  at  once,  allowing  for  effective  visual  comparison.  this   integration  mechanism  allows  a  fluent  transition  from  skimming  the  search  results  to  another   temporary  action  of  gaining  insight  about  the  relevance  of  an  item.  most  importantly,  this  is   accomplished  without  requiring  the  user  to  open  a  new  browser  page  or  tab,  where  they  would   have  to  break  from  their  overall  search  flow  and  remember  the  details  (instead  of  visually   comparing  them),  making  it  hard  to  resume  from  where  they  left  off.     a  contrary  integration  mechanic  that  primo  makes  use  of  is  its  smooth  automated  connectivity  to   external  sites,  such  as  databases,  ebrary,  ill,  etc.  new  browser  pages  are  used  to  allow  the   continuation  of  a  task  outside  of  primo  itself  without  forcing  the  user  out  of  the  system  to  the     information  technology  and  libraries  |  march  2015   99   library  service  or  full  text.  primo  users  can  skim  search  results,  identify  relevant  resources  and   open  them  in  new  browser  pages  for  later  reviewing.     what  is  missing,  however,  is  the  opportunity  to  easily  save  and  resume  a  search.  retrieving  the   search  result  or  saving  it  under  ones’  login  details  would  benefit  users  who  recall  items  of  interest   from  previous  searches  and  would  like  to  repeat  the  results  without  having  to  remember  the   keywords  or  search  process  they  used.  it  is  not  obvious  how  to  locate  the  save  search  session   option  in  primo’s  interface.   user  control  mechanisms   yang  and  wagner  [11]  ranked  primo  highest  among  the  vendors,  primarily  for  its  good  user   control  mechanisms,  which  allow  users  to  inspect  and  change  the  search  functions  on  an  ongoing   basis.  primo  does  a  good  job  at  presenting  search  results  in  a  quick  and  organized  manner.  it   allows  for  the  needed  ‘undo’  functionality  and  continued  attachment  and  removal  of  filters,  while   saving  the  last  keywords  when  clicking  the  back  button  from  search  results.  the  continuously   available  small  search  box  also  offers  the  flexibility  for  the  user  to  change  search  parameters   easily.  in  summary,  primo  offers  agile  searching,  while  accounting  for  a  few  different  discovery   mental  models.     
however,  if  primo  wants  to  preserve  its  current  effectiveness  and  make  the  jump  towards  a  single   search  function  that  is  truly  flexible  and  allows  for  much  needed  customizability  [18][2],  it  needs   to  allow  for  several  similar  user  goals  to  be  easily  executable  without  confusion  about  the  likely   outcome.  the  most  prominent  current  system  error  for  primo,  as  it  has  been  applied  in  the  purdue   libraries,  is  its  inability  to  differentiate  ebooks  from  journal  articles  or  books.  it  would  support   users  goals  to  be  able  to  start  and  finish  an  ebook  related  tasks  at  the  home  page’s  search  box.   currently,  users  have  the  cognitive  burden  to  consider  whether  ebooks  are  more  likely  to  be   found  under  ‘books  &  media’  or  ‘journals’.  currently,  primo,  as  applied  to  its  implementation  at   purdue  libraries  at  the  time  of  this  study,  does  not  support  goals  to  search  for  content  type,  e.g.  an   ebook.  this  however,  is  increasingly  popular  among  the  student  population  who  want  ebooks  on   their  tablets  and  phones  instead  of  carrying  heavy  books  in  their  backpacks.     another  key  pain-­‐point  for  current  users  is  the  identification  of  specific  journals  in  physical  form,   say  for  archival  research.  currently,  each  journal  issue  is  listed  individually  in  the  ‘find  in  print’   section,  even  though  the  journals  only  have  one  call  number.  listing  all  volumes  and  issues  of  each   periodical  overwhelms  the  user  with  too  much  information  and  prevents  the  effective   accomplishment  of  the  task  of  locating  a  specific  journal  issue.  since  there  is  only  one  call  number   available  for  the  entire  journal  sequence,  it  may  lead  to  better  clarity  and  usability  if  the   information  was  reduced.  instead  of  listing  all  possible  journal  issues,  a  range  or  ranges  (if   incomplete  set  of  issues)  that  the  library  has  physically  present  should  be  listed.  in  article  case  2,   for  instance,  there  are  five  items  for  the  year  1983.  why  lead  the  user  to  look  at  a  range  where   there  is  no  possible  option?     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       100   comparing  hta  to  a  usability  test   usability  tests  benefit  from  the  invaluable  direct  input  from  the  end  user.  at  the  same  time   usability  studies,  as  constructed  conditions,  offer  limited  opportunities  to  learn  about  users’  real   motivations  and  goals  and  how  the  discovery  layers  support  or  fail  to  support  their  tasks.  fagan   et  al  [3]  conducted  a  usability  test  with  eight  students  and  two  faculty  members  to  learn  about   usability  issues  and  user  satisfaction  with  discovery  layers.  they  measured  time,  accuracy  and   completion  rate  for  nine  specific  tasks,  and  obtained  insights  from  task  observations  and  post-­‐test   surveys.  they  reported  on  issues  with  users  not  following  directions  (93),  the  prevalence  of  time   outs,  users  skipping  tasks,  and  variable  task  times.    
these  results  all  point  to  a  mismatch  between   the  user  goals  and  the  study  tasks  and  offer  an  incomplete  picture  about  the  system’s  ability  to   support  user  goals  that  are  accomplished  via  specific  tasks.   expert  evaluation  based  hta  method  does  not  require  users’  direct  input.  hta  offers  a  method  to   achieve  a  relatively  complete  evaluation  of  how  low-­‐level  interface  facets  support  users’  high-­‐ level  cognitive  tasks.  hta  measures  the  system  designs  quality  in  supporting  a  specific  task   needed  to  accomplish  a  user  goal.  instead  of  measuring  time,  physical  and  cognitive  tasks  are   measured  in  number  of  steps.  instead  of  accuracy  and  completion  rate,  fluent  workflow  steps  and   mistaken  steps  are  counted.  the  two  methods  offer  opposite  strengths,  making  them  a  good   complements.  given  hta’s  system-­‐centric  approach,  it  can  better  inform  which  tasks  would  be   useful  in  usability  testing.   to  compare  the  our  research  findings  with  usability  tests,  fagan  et  al  [3]  confirmed  some  of  the   previously  established  findings  that  journal  titles  are  difficult  to  locate  via  the  library  home  page   (vs.  databases),  that  filters  are  handy  when  they  are  needed  and  that  users’  mental  models  have  a   preference  for  a  google-­‐like  single  search-­‐box.  for  instance,  students  and  even  librarians,  struggle   to  understand  what  is  being  searched  in  each  system  and  how  results  are  ranked  (see  also  [5]).   the  hta  method  applied  in  this  study  was  also  able  to  confirm  that  journal  titles  are  more   difficult  to  identify  than  books  and  ebooks,  the  flexibility  benefit  offered  by  filters  and  identify  the   single  search  box  as  a  fluent  system  design.  since,  hta  does  not  rely  on  the  user  to  tell  why  these   results  are  true,  hta,  as  applied  in  this  study,  helped  expert  evaluators  understand  the  reasons   for  these  findings  via  self-­‐directed  execution  and  discussion  with  colleagues  later.  depending  on   the  task  design,  either  usability  testing  or  hta  offer  the  capabilities  to  identify  cases  such  as   confusion  about  how  to  start  an  ebook  search  in  primo.  taking  a  system  design  approach  to  task   design  offers  a  path  to  a  systematic  understanding  of  discovery  layer  usability,  which  lends  itself   to  easier  comparison  and  external  validity.     in  terms  of  specific  interface  features,  usability  tests  are  good  for  evaluating  the  visibility  of   specific  features.  for  example,  fagan  et  al  [3]  asked  their  participants  to  (1)  search  on  speech   pathology,  (2)  find  a  way  to  limit  search  results  to  audiology,  and  then  (3)  limit  their  search   results  to  peer-­‐reviewed  (task  3  in  [3],  p.  95).  by  measuring  completion  rate,  they  were  able  to   identify  the  relative  failure  of  ‘peer-­‐reviewed’  over  ‘audiology’  filters,  but  they  were  left  “unclear     information  technology  and  libraries  |  march  2015   101   [about]  why  the  remaining  participants  did  not  attempt  to  alter  the  search  results  to  ‘peer  reviewed,”   failing  to  accomplish  the  task  [3].  in  comparison,  hta  as  an  analytical  rather  than  observational   methodology,  leads  to  more  synthesized  results.  
in  addition  to  insights  into  possible  gaps   between  system  design  and  mental  models,  hta  as  a  goal-­‐oriented  approach,  concerns  itself  with   issues  of  workflow  (how  well  the  system  guides  the  user  to  accomplishing  their  task)  and   efficiency  (minimizing  the  number  of  steps  required  to  finish  a  task).  these  are  less  obvious  to   identify  with  usability  tests,  where  participants  are  not  impacted  by  their  routine  goals,  time   pressures  and  consequently  their  patience  may  be  more  tolerant  as  a  result.   the  application  of  hta  helped  identify  key  workflow  issues  and  map  them  to  specific  design   elements.  for  instance,  the  lack  of  ebooks  as  a  search  filter  meant  that  the  current  system  did  not   support  content  form  based  searching  well  for  two  mains  forms:  articles  and  books.  compared  to   usability  tests  that  focus  on  specific  fabricated  search  processes,  hta  aims  to  map  all  possible   routes  the  system’s  design  offers  to  accomplish  a  goal,  allowing  for  their  parallel  existence  during   the  analysis.  this  system-­‐centered  approach  to  task  evaluation,  we  argue,  is  the  key  benefit  hta   can  offer  towards  a  more  systematic  evaluation  of  discovery  layers,  where  different  user  groups   would  have  varying  levels  of  assistance  needs.  hta  task-­‐analysis  allows  for  the  nuanced   understanding  that  results  can  differ  as  the  context  of  use  differs.  that  applies  even  to  the   contextual  difference  between  user  test  participants  and  routine  library  users.     conclusion   discovery  layers  are  advancing  the  search  experiences  libraries  can  offer.  with  increasing   efficiency,  increased  ease  of  use  and  more  relevant  results,  scholarly  search  has  become  a  far  less   frustrating  experience.  while  google  is  still  perceived  as  the  holy  grail  of  discovery  experiences,  in   reality  it  may  not  be  quite  what  scholarly  users  are  after  [5].  the  application  of  discovery  layers   has  focused  on  eliminating  the  limitations  that  plagued  the  traditional  federated  search  and   improving  the  search  index  coverage  and  performance.  usability  studies  have  been  effective  in   verifying  these  benefits  and  key  interface  issues.  moving  forward,  studies  on  discovery  layers   should  focus  more  on  the  significance  of  discovery  layers  on  user  experience.   this  study  presents  the  expert  evaluation  based  hta  methods  as  a  complementary  way  to   systematically  evaluate  popular  discovery  layers.  it  is  the  system  design  and  goal-­‐oriented   evaluation  approach  that  offers  the  prospects  of  a  more  thorough  body  of  research  on  discovery   layers  than  usability  alone.  using  hta  as  a  systematic  preliminary  study  guiding  formal  usability   testing  offers  one  way  to  achieve  more  comparable  study  results  on  applications  of  discovery   layers.  it  is  through  comparisons  that  the  discussion  of  discovery  and  user  experience  can  gain  a   more  focused  research  attention.  as  such,  hta  can  help  vendors  to  achieve  the  full  potential  of   web-­‐scale  discovery  services.     to  better  understand  and  ultimately  design  to  their  full  potential,  systematic  studies  are  needed   on  discovery  layers.  
this  study  is  the  first  attempt  to  apply  hta  towards  systematically  analyzing   user  workflow  and  interaction  issues  on  discovery  layers.  the  authors  hope  to  see  more  work  in     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       102   this  area,  with  the  hope  of  achieving  true  next  generation  catalogs  that  can  enhance  knowledge   distribution.         references   [1]   beth  thomsett-­‐scott  and  patricia  e.  reese,  “academic  libraries  and  discovery  tools:  a   survey  of  the  literature,”  college  &  undergraduate  libraries  19,  no.  2–4  (april  2012):  123– 143.  http://dx.doi.org/10.1080/10691316.2012.697009.     [2]     sarah  c.  williams  and  anita  k.  foster,  “promise  fulfilled?  an  ebsco  discovery  service   usability  study,”  journal  of  web  librarianship  5,  no.  3  (jul.  2011):  179–198.   http://dx.doi.org/10.1080/19322909.2011.597590.     [3]     jody  condit  fagan,  meris  a.  mandernach,  carl  s.  nelson,  jonathan  r.  paulo,  and  grover   saunders,  “usability  test  results  for  a  discovery  tool  in  an  academic  library,”  information   technology  and  libraries  31,  no.  1  (mar.  2012):  83–112,  mar.  2012.   http://dx.doi.org/10.6017/ital.v31i1.1855.   [4]     roger  c.  schonfeld  and  matthew  p.  long,  “ithaka  s+r  us  library  survey  2013,”  ithaka  s+r,     survey  2,  mar.  2014.  http://sr.ithaka.org/research-­‐publications/ithaka-­‐sr-­‐us-­‐library-­‐survey-­‐ 2013.   [5]     michael  khoo  and  catherin  hall,  “what  would  ‘google’  do?  users’  mental  models  of  a  digital   library  search  engine,”  in  theory  and  practice  of  digital  libraries,  ed.  panayiotis  zaphiris,   george  buchanan,  edie  rasmussen,  and  fernando  loizides,  1-­‐12  (berlin  heidelberg,   springer:  2012).  http://dx.doi.org/10.1007/978-­‐3-­‐642-­‐33290-­‐6_1.   [6]     jakob  nielsen,  “goal  composition:  extending  task  analysis  to  predict  things  people  may   want  to  do,”  goal  composition:  extending  task  analysis  to  predict  things  people  may  want  to   do,  01-­‐jan-­‐1994.  http://www.nngroup.com/articles/goal-­‐composition/.   [7]     jakob  nielsen,  “finding  usability  problems  through  heuristic  evaluation,”  in  proceedings  of   the  sigchi  conference  on  human  factors  in  computing  systems,  373-­‐380  (new  york,  ny,   acm:  1992).  http://dx.doi.org/10.1145/142750.142834.   [8]     jerry  v.  caswell  and  john  d.  wynstra,  “improving  the  search  experience:  federated  search   and  the  library  gateway,”  library  hi  tech  28,  no.  3  (sep.  2010):  391–401.   http://dx.doi.org/10.1108/07378831011076648.   [9]     emily  r.  alling  and  rachael  naismith,  “protocol  analysis  of  a  federated  search  tool:   designing  for  users,”  internet  reference  services  quarterly  12,  no.  1/2,  (2007):  195–210.   http://dx.doi.org/10.1300/j136v12n01_10.   [10]   susan  johns-­‐smith,  “evaluation  and  implementation  of  a  discovery  tool,”  kansas  library   association  college  and  university  libraries  section  proceedings  2,  no.  1  (jan.  2012):  17–23.     information  technology  and  libraries  |  march  2015   103   [11]   sharon  q.  yang  and  kurt  wagner,  “evaluating  and  comparing  discovery  tools:  how  close  are   we  towards  next  generation  catalog?,”  library  hi  tech  28,  no.  4  (nov.  2010):  690–709.   http://dx.doi.org/10.1108/07378831011096312. 
  [12]   lyle  ford,  “better  than  google  scholar?,”  presentation,  advance  program  for  internet   librarian  2010,  monterey,  california,  25-­‐oct-­‐2010.   [13]   michael  gorrell,  “the  21st  century  searcher:  how  the  growth  of  search  engines  affected  the   redesign  of  ebscohost,”  against  the  grain  20,  no.  3  (2008):  22,  24.   [14]   sian  harris,  “discovery  services  sift  through  expert  resources,”  research  information,  no.  53,  ,   (apr.  2011):  18–20.   http://www.researchinformation.info/features/feature.php?feature_id=315.   [15]   yuji  tosaka  and  cathy  weng,  “reexamining  content-­‐enriched  access:  its  effect  on  usage  and   discovery,”  college  &  research  libraries  72,  no.  5  (sep.  2011):  pp.  412–427.   http://dx.doi.org/10.5860/.   [16]   judy  luther  and  maureen  c.  kelly,  “the  next  generation  of  discovery,”  library  journal  136,   no.  5  (march  15,  2011):  66-­‐71.     [17]   graham  stone,  “searching  life,  the  universe  and  everything?  the  implementation  of   summon  at  the  university  of  huddersfield,”  liber  quarterly  20,  no.  1  (2010):  25–51.   http://liber.library.uu.nl/index.php/lq/article/view/7974.   [18]   jeff  wisniewski,  “web  scale  discovery:  the  future’s  so  bright,  i  gotta  wear  shades,”  online   34,  no.  4  (aug.  2010):  55–57.   [19]   gaurav  bhatnagar,  scott  dennis,  gabriel  duque,  sara  henry,  mark  maceachern,  stephanie   teasley,  and  ken  varnum,  “university  of  michigan  library  article  discovery  working  group   final  report,”  university  of  michigan  library,  jan.  2010,   http://www.lib.umich.edu/files/adwg/final-­‐report.pdf     [20]   abe  crystal  and  beth  ellington,  “task  analysis  and  human-­‐computer  interaction:  approaches,   techniques,  and  levels  of  analysis”  in  amcis  2004  proeedings,  paper  391,   http://aisel.aisnet.org/amcis2004/391.    [21]  neville  a.  stanton,  “hierarchical  task  analysis:  developments,  applications,  and  extensions,”   applied  ergonomics  37,  no.  1  (2006):  55–79.   [22]   john  annett  and  neville  a.  stanton,  eds.  task  analysis,  1  edition.  london ;  new  york:  crc   press,  2000.   [23]   sarah  k.  felipe,  anne  e.  adams,  wendy  a.  rogers,  and  arthur  d.  fisk,  “training  novices  on   hierarchical  task  analysis,”  proceedings  of  the  human  factors  and  ergonomics  society   annual  meeting  54,  no.  23,  (sep.  2010):  2005–2009,   http://dx.doi.org/10.1177/154193121005402321.     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       104   [24]   d.  embrey,  “task  analysis  techniques,”  human  reliability  associates  ltd,  vol.  1,  2000.   [25]   j.  reason,  “combating  omission  errors  through  task  analysis  and  good  reminders,”  quality  &   safety  health  care  11,  no.  1  (mar.  2002):  40–44,  http://dx.doi.org/10.1136/qhc.11.1.40.   [26]   james  hollan,  edwin  hutchins,  and  david  kirsh,  “distributed  cognition:  toward  a  new   foundation  for  human-­‐computer  interaction  research,”  acm  trans.  comput.-­‐hum.  interact   7,  no.  2  (jun.  2000):  174–196,    http://dx.doi.org/10.1145/353485.353487.   [27]   stuart  k.  card,  allen  newell,  and  thomas  p.  moran,  the  psychology  of  human-­‐computer   interaction.  hillsdale,  nj,  usa:  l.  erlbaum  associates  inc.,  1983.   [28]   stephen  j.  payne  and  t.  r.  g.  
green,  “the  structure  of  command  languages:  an  experiment  on   task-­‐action  grammar,”  international  journal  of  man-­‐machine  studies  30,  no.  2  (feb.  1989):   213–234.   [29]   bonnie  e.  john  and  david  e.  kieras,  “using  goms  for  user  interface  design  and  evaluation:   which  technique?,”  acm  transactions  on  computer-­‐human  interactions  3,  no.  4  (dec.  1996):   287–319,  http://dx.doi.org/10.1145/235833.236050.   [30]   david  e.  kieras  and  david  e.  meyer,  “an  overview  of  the  epic  architecture  for  cognition  and   performance  with  application  to  human-­‐computer  interaction,”  human-­‐computer   interaction  12,  no.  4  (dec.  1997):  391–438,   http://dx.doi.org/10.1207/s15327051hci1204_4.   [31]   laura  g.  militello  and  robert  j.  hutton,  “applied  cognitive  task  analysis  (acta):  a   practitioner’s  toolkit  for  understanding  cognitive  task  demands,”  ergonomics  41,  no.  11   (nov.  1998):    1618–1641,  http://dx.doi.org/10.1080/001401398186108.   [32]   brenda  battleson,  austin  booth,  and  jane  weintrop,  “usability  testing  of  an  academic  library   web  site:  a  case  study,”  the  journal  of  academic  librarianship  27,  no.  3  (may  2001):  188– 198.   [33]   paul  chojecki,  “how  to  increase  website  usability  with  link  annotations,”  in  20th   international  symposium  on  human  factors  in  telecommunication.  6th  european  colloquium   for  user-­‐friendly  product  information.  proceedings,  2006,  p.  8.             case  study  references:     information  technology  and  libraries  |  march  2015   105     find  an  article:   case  1.  yang,  t.  c.,  and  kevin  d.  heaney.  "network-­‐assisted  underwater  acoustic   communications."  in  proceedings  of  the  seventh  acm  international  conference  on  underwater   networks  and  systems,  p.  37.  acm,  2012.   case  2.  jorgensen,  william  l.,  jayaraman  chandrasekhar,  jeffry  d.  madura,  roger  w.  impey,  and   michael  l.  klein.  "comparison  of  simple  potential  functions  for  simulating  liquid  water."  the   journal  of  chemical  physics  79  (1983):  926.   case  3.  “design  annual.”  graphis  inc.,  2008   case  4.  walb,  m.  c.,  j.  e.  moore,  a.  attia,  k.  t.  wheeler,  m.  s.  miller,  and  m.  t.  munley.  "a   technique  for  murine  irradiation  in  a  controlled  gas  environment."  biomedical  sciences   instrumentation  48  (2012):  470.   find  a  book  (physical):   case  5.  few,  stephen.  show  me  the  numbers:  designing  tables  and  graphs  to  enlighten.  vol.  1,   no.  1.  oakland,  ca:  analytics  press,  2004.   case  6.  metcalf,  christine.  the  love  of  cats.  crescent  books,  1973.   case  7.  machiavelli,  niccolò,  and  leo  paul  s.  de  alvarez.  1989.  the  prince.  prospect  heights,  ill:   waveland  press.   case  8.  lees-­‐maffei,  grace,  and  rebecca  houze,  eds.  the  design  history  reader.  berg,  2010.   find  an  ebook:   case  9.  rubin,  jeffrey,  and  dana  chisnell.  handbook  of  usability  testing:  how  to  plan,  design,   and  conduct  effective  tests.  wiley  technical  communication  library,  2008.   case  10.  rubin,  jeffrey,  and  dana  chisnell.  handbook  of  usability  testing:  how  to  plan,  design,   and  conduct  effective  tests.  wiley  technical  communication  library,  2008.   case  11.  laube,  matthew  bryan.  ancient  awakening.  2010.       
letter from the editor: the core question letter from the editor the core question kenneth j. varnum information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.12137 as i write this, the members of the association for library collections and technical services (alcts), the library leadership and management association (llama), and lita are voting to merge into a new consolidated division, core: leadership, infrastructure, futures. this merger is essential to the continuing activities that we library technologists rely on. the lita board has indicated that if the merger does not go through, lita will be forced to dissolve over the coming year. the merger will enrich lita members’ opportunities. lita has long focused on the library technology practitioner. that has been our core competency, born of a time when technology was the new thing in libraries. we technologists know—the entire information profession knows— that technology is no longer an addition to a library, but is the way society operates, for a huge portion of our work and life. core reflects this evolutionary change. similar evolutions have taken place in technical services and collections development areas; those functions have been forever changed by the wave of technologies that we have implemented over the past half century. core brings together the practitioners and technologies that make libraries run, and combines them with the library leadership areas that many of us aspire to, or end up taking on, as our careers develop. when i joined lita over a decade ago, i was myself moving from “doer” to “manager.” now that my role is largely project and personnel management, the skills and conversations i seek for personal growth are often found in other parts of ala, and beyond. yet, the focus—the core, if i may—of what i do is still in the center of the venn diagram of technology, people, and data. i voted to support core and hope that all of you who belong to lita, alcts, and/or llama, will do the same. sincerely, kenneth j. varnum, editor varnum@umich.edu march 2020 https://core.ala.org/ mailto:varnum@umich.edu efficiently processing and storing library linked data using apache spark and parquet kumar sharma, ujjal marjit, and utpal biswas information technology and libraries | september 2018 29 kumar sharma (kumar.asom@gmail.com) is research scholar, department of computer science and engineering; ujjal marjit (marjitujjal@gmail.com) is system-in-charge, center for information resource management (cirm); and utpal biswas (utpal01in@yahoo.com) is professor, department of computer science and engineering, the university of kalyani, india. abstract resource description framework (rdf) is a commonly used data model in the semantic web environment. libraries and various other communities have been using the rdf data model to store valuable data after it is extracted from traditional storage systems. however, because of the large volume of the data, processing and storing it is becoming a nightmare for traditional datamanagement tools. this challenge demands a scalable and distributed system that can manage data in parallel. in this article, a distributed solution is proposed for efficiently processing and storing the large volume of library linked data stored in traditional storage systems. apache spark is used for parallel processing of large data sets and a column-oriented schema is proposed for storing rdf data. 
the storage system is built on top of hadoop distributed file systems (hdfs) and uses the apache parquet format to store data in a compressed form. the experimental evaluation showed that storage requirements were reduced significantly as compared to jena tdb, sesame, rdf/xml, and n-triples file formats. sparql queries are processed using spark sql to query the compressed data. the experimental evaluation showed a good query response time, which significantly reduces as the number of worker nodes increases. introduction more and more organizations, communities, and research-development centers are using semantic web technologies to represent data using rdf. libraries have been trying to replace the cataloging system using a linked-data technique such as bibframe.1 libraries have received much attention on transitioning marc cataloging data into rdf format.2 data stored in various other formats such as relational databases, csv, and html have already begun their journey toward the open-data movement.3 libraries have participated in the evolution of linked open data (lod) to make data an essential part of the web.4 various researchers have explored areas related to library data and linked data. in particular, transitioning legacy library data into linked data has dominated most of the research works. other areas include researching the impact of linked library data, investigating how privacy and security can be maintained, and exploring the potential effects of having open linked library data. obviously, a linked-data approach for publishing data on the web brings many benefits to libraries. first, once isolated library data currently stored using traditional cataloging systems (marc) becomes a part of the web, it can be shared, reused, and consumed by web users.5 this promotes the cross-domain sharing of knowledge hidden in the library data, opening the library as a rich source of information. online library users can share more information using linked library resources since every library mailto:kumar.asom@gmail.com mailto:marjitujjal@gmail.com mailto:utpal01in@yahoo.com efficiently processing and storing library linked data | sharma, marjit, and biswas 30 https://doi.org/10.6017/ital.v37i3.10177 resource is crawlable on the web via uniform resource identifiers (uri). most importantly, library data benefits from linked-data technology’s real advantages, such as interoperability, integration with other systems, data crosswalks, and smart federated search.6 numerous approaches have evolved for making the vision of the semantic web a success. no doubt, they have succeeded in making the library a part of the web, but there remain issues related to library big data. the term big data refers to data or information that cannot be processed using traditional software systems.7 the volume of such data is so large that it requires advanced technologies for processing and storing the information. libraries also have real concerns with large volumes of data during and after the transition to linked data. the main challenges are in processing and storage. during conversion from library data to rdf, the process can become stalled because of the large volumes of data. once the data is successfully converted into rdf formats, there are storage issues. finally, even if the data is somehow stored using common rdf triple stores, it is difficult to retrieve and filter. this is a challenging problem that every librarian must give attention to. 
librarians should know the real nature of library big data, which causes problems in analyzing data and decision making. librarians must also know the technologies that can resolve these issues. the rate of data generation and the complexity of the data itself are constantly increasing. traditional data-management tools are becoming incapable of managing the data. that is why the definition of big data has been characterized by five vs—volume, velocity, variety, value, and veracity.8 • volume is the amount of the data. • velocity is the data-generation rate (which is high in this case). • variety refers to the heterogeneous nature of the data. • value refers to the actual use of the data after the extraction. • veracity is the quality or trustworthiness of the data. to handle the five vs of big data, distributed technologies such as commodity hardware, parallel processing frameworks, and optimized storage systems are needed. commodity hardware reduces the cost of setting up a distributed environment and can be managed with very limited configurations. a parallel processing system can process distributed data in parallel to reduce processing time. an optimized storage system is required to store the large volume of data, supporting scalability to accommodate more data on demand. with these library requirements to tackle the challenges posed by library big data, a distributed solution is proposed. this approach is based on apache hadoop, apache spark, and a column-oriented storage system to process largesize data and to store the processed data in a compressed form. bibliographic rdf data from british national library and the national library of portugal have been used for this experiment. these bibliographic data are processed using apache spark and stored using apache parquet format. the stored data can be queried using sparql queries for which spark sql is used to execute queries. given an existing rdf dataset, we designed a schema for storing rdf data using a columnoriented database. using column-oriented design with apache parquet and spark sql as the query information technology and libraries | september 2018 31 processor, a distributed rdf storage system was implemented that can store any amount of rdf data by increasing the number of distributed nodes as needed. literature review while big data continues to rise, library data are still in traditional storage systems isolated from the web. to continue working with the web, libraries must redesign the way they format data and contribute toward the web of data. to serve library data to other communities, libraries must integrate their data with the web. attempts to do this have been made by several researchers. the task of integration cannot be achieved by only librarians; rather, it requires a team of experts in the field of library and information technology. the advanced way for integrating resources is with linked-data technology by assigning uris to every piece of library data. with this goal, there exist various projects related to the convergence of library data and linked data. one of these, bibframe, is an initiative to transition bibliographic resources into linked-data representation. bibframe aims to replace traditional cataloging standards such as marc and unimarc using the concept of publishing structured data on the web. marc formats cannot be exchanged easily with nonlibrary systems. 
the marc standard also suffers from inconsistencies, errors, and inability to express relationships between records and fields within the record. that is why mostly bibliographic resources stored in marc standards are targeted for conversion.9 other works include the open-data initiative from the british national library, library catalog to linked opendata conversion, exposing library data as linked data, and building a knowledge graph to reshape the library staff directory.10 linked data is fully dependent on rdf. rdf reveals graph-like structures where resources are linked with one another. thus, rdf can improve on marc standards because of its strong ability to link related resources. this system of revealing everything as a graph helps in building a network of library resources and other data on the web. this also makes for fast search functionality. in addition, searching a topic or book could bring similar graphs from other library resources, leading to the creation of linked-data service.11 such a service has been implemented by the german national library to provide bibliographic and authority data in rdf format, by the europeana linked open data with access to open metadata on millions of books and multimedia data, and by the library of congress linked data service.12 there is less discussion of library big data. though big data in general is in active research, the library domain has received much less attention than the broader concept of big data and its challenges. this could be because most of librarians working with linked data are from nontechnical backgrounds. now is the right time for libraries to give priority to adopting big data technologies to overcome challenges posed by big data. wang et al. have discussed library big data issues and challenges.13 they made some statements about whether library data belongs to the big data category. obviously, library data belongs to big data since it fulfills some of the characteristics of big data, such as volume, variety, and velocity. wang et al. also raise some of libraries’ challenges related to library big data, such as lacking teams of experts, inability to adopt big data due to budgetary issues, and technical challenges. finally, they point out that to take advantage of the web’s full potential, library data must be transformed into a format that can be accessible beyond the library using technologies like semantic web and linked data. the web has already started its work related to big-data challenges. libraries need to transition their data into an advanced format with the ability to handle big-data issues. the main problems efficiently processing and storing library linked data | sharma, marjit, and biswas 32 https://doi.org/10.6017/ital.v37i3.10177 related to library big data happen at data transformation and storage. to store and retrieve large amounts of data, we need commodity hardware that can handle trillions of rdf triples, requiring terabytes or petabytes of disk space. as of now, there are semantic web frameworks such as jena and sesame to handle rdf data, but these frameworks are not scalable for large rdf graphs.14 jena is a java-based framework for building semantic web and linked-data applications. it is basically a semantic web programming framework that provides java libraries for dealing with rdf data. jena tdb is the component of jena for storing and querying rdf data. 15 it is designed to work in a single-node environment. 
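as a point of reference for the single-node tools mentioned above and compared against later in the evaluation, a minimal jena tdb workflow can be sketched as follows. this sketch is illustrative only and is not taken from the authors' code; the store directory, input file name, and query are hypothetical, and it assumes a current apache jena release with the tdb1 api.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.tdb.TDBFactory;

public class JenaTdbExample {
  public static void main(String[] args) {
    // tdb keeps its triple indexes in a local directory on a single node
    Dataset dataset = TDBFactory.createDataset("/data/tdb-store"); // hypothetical location

    // load an n-triples file into the default graph
    dataset.begin(ReadWrite.WRITE);
    Model model = dataset.getDefaultModel();
    RDFDataMgr.read(model, "catalogue.nt"); // hypothetical input file
    dataset.commit();
    dataset.end();

    // run a sparql query against the store
    dataset.begin(ReadWrite.READ);
    String q = "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }";
    try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
      ResultSet rs = qe.execSelect();
      ResultSetFormatter.out(System.out, rs);
    }
    dataset.end();
  }
}
```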
sesame is also a semantic web framework for processing, storing, and querying rdf data. basically, sesame is a web-based architecture for storing and querying rdf data as well as schema information. 16 background this section briefly describes the structure of rdf triples, apache spark along with its features and column-oriented database system, and apache parquet. structure of rdf triples rdf is a schema-less data model. it implies that the data is not fixed to a specific schema, so it does not need to conform to any predefined schema. unlike in relational tables, where we define columns during schema definition and those columns must contain the required type of data, in rdf we can have any number of properties and data using any kind of vocabulary. we only need vocabulary terms to embed properties. the vocabulary is created using domain ontology, which represents the schemas. to describe library resources we need a library-domain ontology. for example, to define a book and its properties one can use the bookont ontology.17 bookont is a book-structure ontology designed for an optimized book search and retrieval process. however, it is not mandatory to use existing ontology and all the properties defined under it. we can use terms from a newly created ontology or mixed ontologies with required properties. rdf represents resources in the form of subject, predicate, and object. the subject is the resource being described, identified by a uri. this subject can have any number of property-value pairs. this way representation of a resource is called knowledge representation, where everything is defined as a knowledge in the form of entity attribute value (eav). in rdf, the basic unit of information is a triple t, such that t = {subject, predicate, object}. such information when stored on disk is called a triplestore. the collection of rdf triples is called an rdf database. an rdf database is specially designed to store linked data to make the web more useful by interlinking data from different sources in a meaningful way. the real advantage of rdf is its support of the common data model. rdf is the standard way for publishing meaningful data on the web, and this is backed by linked data. linked data provides some rules about how data can be published on the web by following the rdf data model.18 with such a common data model, one can integrate data from any sources by inserting new property-value pairs without altering database schema. another important purpose of rdf is to provide resources to be processable by software agents on the web. rdf triples are of two types: literal triples and linked triples. literal triples consist of a uri referenced subject and a literal object (scalar value) joined by a predicate. in linked triples, both the subject and the object consist of uris linking by the predicate. this type of linking is called rdf link, which is the basis for interlinking the resources.19 rdf data are queried using the sparql query language.20 sparql is a graph-matching query language and is used to retrieve information technology and libraries | september 2018 33 triples from the triple store. the sparql queries are also called semantic queries. like sql queries, sparql also finds and retrieves the information stored in the triplestore. 
a sparql query is composed of five main components:21 • the prefix declaration part is used to abbreviate the uris; • the dataset definition is used to specify the rdf dataset from which the data is to be fetched; • the result clause is used to specify what information is needed to be fetched, which can be select, construct, describe, and ask; • the query pattern is used to specify the search conditions; and • the query modifiers are used to rearrange query results using order by, limit etc. hadoop and mapreduce hadoop is open-source software that supports distributed processing of large datasets on machine clusters.22 two core components—hadoop distributed file system (hdfs) and mapreduce— make distributed storage and computation of processing jobs possible.23 hdfs is the storage component, whereas mapreduce is a distributed data-processing framework, the computational model of hadoop based on java. the mapreduce algorithm consists of two main tasks: map and reduce. the map task takes a set of data as input and produces another set of data with individual components in the form of key/value pairs or tuples. the output of the map task goes to the reduce task, which combines common key/value pairs into a smaller set of tuples. hdfs and mapreduce are based on driver/worker architecture consisting of driver and worker nodes having different roles. an hdfs driver node is called the name-node while the worker node is called the data-node. the name-node is responsible for managing names and data blocks. data blocks are present in the data-nodes. data-nodes are distributed across each machine, responsible for actual data storage. similarly, the mapreduce driver node is called the job-tracker and the worker node is called the task-tracker. job-tracker is responsible for scheduling jobs on task-trackers. task-tracker again is distributed across each machine along with the data-nodes, responsible for processing map and reducing tasks as instructed by the job-tracker. the concept of hadoop implies that the set of data to be processed is broken into smaller forms that can be processed individually and independently. this way, tasks can be assigned to multiple processors to process the data, and eventually it becomes easy to scale data processing over multiple computing nodes. once a mapreduce program is written, the program can be scaled to run over thousands of machines in a cluster. spark and resilient distributed datasets (rdd) apache spark is an in-memory cluster computing platform, which is a faster batch-processing framework than mapreduce. more importantly, it supports in-memory processing of tasks along with data, so querying data is much faster than disk-based engines. the core of spark is the resilient distributed dataset (rdd). rdd is a fundamental data structure of spark that holds a distributed collection of data where data cannot be modified. rather, data modification yields another immutable collection of data (or rdd). this process is called rdd transformation. for example, figure 1 depicts an example of rdd transformation. the distributed processing and efficiently processing and storing library linked data | sharma, marjit, and biswas 34 https://doi.org/10.6017/ital.v37i3.10177 transformation of data is managed by rdd. rdds are fault-tolerant, meaning that the lost data is recoverable using lineage graph of rdds.24 spark constructs a direct acyclic graph (dag) of a sequence of computations that needed to be performed on data. 
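to make the map and reduce steps described above concrete, the sketch below counts word occurrences using spark's java api, the framework adopted later in the article: the map-style transformations emit (word, 1) key/value pairs, reducebykey combines the values for each common key into a smaller set of tuples, and the chain of transformations forms the kind of dag just described. the input and output paths are hypothetical and the snippet is illustrative rather than part of the authors' implementation.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCount {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("word-count"));

    JavaRDD<String> lines = sc.textFile("hdfs:///library/catalogue.txt"); // hypothetical input

    // map side: split each line into words and emit (word, 1) key/value pairs
    JavaRDD<String> words = lines.flatMap(l -> Arrays.asList(l.split("\\s+")).iterator());
    JavaPairRDD<String, Integer> pairs = words.mapToPair(w -> new Tuple2<>(w, 1));

    // reduce side: combine the values of each common key into a single count
    JavaPairRDD<String, Integer> counts = pairs.reduceByKey(Integer::sum);

    counts.saveAsTextFile("hdfs:///library/word-counts"); // hypothetical output
    sc.close();
  }
}
```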
spark has the most powerful computing engine that allows most of the computations in multistage memory. because of this multistage in-memory computation engine, it provides better performance at reading and writing data than the mapreduce paradigm.25 it aims at speed, ease of use, extensibility, and interactive analytics. spark relies on concepts such as rdd, dag, spark context, transformations, and actions. spark context is an execution environment in which rdds and broadcasting variables can be created. spark context is also called the master of a spark application and allows accessing the cluster through a resource manager. data transformation happens in the spark application when the data is loaded from a data-store into rdds and some filter or map functions are performed to produce a new set of rdds. when the set of computations is created, forming a dag, it does not perform any execution; rather, it prepares for execution in the end, like a lazy loading process. some examples of actions are data extraction or collection and getting the count of words. transformations are the sequence of events, and action is the final execution of the underlying logic. figure 1. rdd transformations. the execution model of spark is shown in figure 2. the execution model is based on the driver/worker architecture consisting of the driver and the worker processes. the driver process creates the spark context and schedules tasks based on the available worker nodes. initially, the master process must be started, then creating worker nodes follows. the driver takes the responsibility of converting a user’s application into several tasks. these tasks are distributed among the workers. the executors are the main components of every spark application. executors actually perform data processing, reading and writing data to the external sources and the storage system. the spark manager is responsible for resource allocation and deallocation to the spark job. basically, spark is only a computation model. it is not related to storage of data, which is a different concept. it only helps in computations and data analytics in a distributed manner. for distributed execution, the task is distributed among the connected nodes so that every node can perform tasks at the same time; it performs the desired operation and notifies the master upon completion of the task. information technology and libraries | september 2018 35 figure 2 execution model of spark. in mapreduce, read/write operations happen between disk and memory, making job computation slower than spark. rdds resolve this by allowing fault-tolerant, distributed, in-memory computations. in rdd, the first load of data is read from disk and then a write-to-disk operation may take place depending upon the program. the operations between first read and last write happen in memory. data on rdds are lazily evaluated, i.e., during rdd transformations, data will not take part until any action is called on the final rdd, which triggers the job execution. the chain of rdd transformations creates dependencies between rdds. each dependency has a function for calculating its data and a pointer to its parent rdd. spark divides rdd dependencies into stages and tasks, then it sends them to workers for execution. hence, an rdd does not actually hold the data; rather, it either loads data from disk or from another rdd and performs some actions on the data for producing results. 
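the lazy-evaluation behaviour described above can be observed directly: transformations only record lineage, and nothing is read or computed until an action is invoked. the following minimal sketch, with a hypothetical input path, prints the lineage of a short transformation chain and then triggers execution with count(); it is illustrative and not the authors' code.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LazyLineage {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("lazy-lineage"));

    // transformations only describe the computation; no data is read yet
    JavaRDD<String> lines   = sc.textFile("hdfs:///library/bnb.nt"); // hypothetical input
    JavaRDD<String> trimmed = lines.map(String::trim);
    JavaRDD<String> triples = trimmed.filter(l -> !l.isEmpty() && !l.startsWith("#"));

    // the lineage (chain of parent rdds) that spark will divide into stages and tasks
    System.out.println(triples.toDebugString());

    // count() is an action: only now is the dag scheduled and executed on the workers
    System.out.println("triples: " + triples.count());

    sc.close();
  }
}
```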
one of the important features of rdd is its fault tolerance, because of which it can retain and recompute any of the unsuccessful partitions due to node failures. rdds have built-in methods for saving data into files. for example, when an rdd calls saveastextfile(), its data are written to the specified text file line by line. there are numerous options for storing data in different formats, such as json, csv, sequence files, and object files. all these file formats can be saved directly into hdfs or normal file systems.

spark sql and dataframe

spark sql is a query interface for processing structured data using sql style on the distributed collection of data. that means it is used for querying structured data stored in hdfs (like hive) and parquet. spark sql runs on top of spark as a library and provides higher optimization. the spark dataframe is an api (application programming interface) that can perform relational operations on rdds and external data sources such as hive and parquet. like rdds, a spark dataframe is also a collection of structured records that can be manipulated by spark sql. it evaluates operations lazily to perform relational optimizations.26 a dataframe is created using rdds along with the schema information. for example, the java code snippet below creates a dataframe using an rdd and a schema called rdftriple (the rdf-triple schema will be discussed in the proposed approach).

JavaRDD<String> n_triples = marc_records.map(new TextToString());
JavaRDD<RDFTriple> rdf_triples = n_triples.map(new LinesToRDFFunction());
Dataset<Row> dataframe = sparkSession.createDataFrame(rdf_triples, RDFTriple.class);
dataframe.write().parquet("/full-path/rdfdata.parquet");

the spark dataframe uses memory management wisely by saving data in off-heap memory and provides an optimized execution plan. conceptually, a dataframe is equivalent to a relational table with richer optimization and supports sql queries over its data. so a dataframe is used for storing data in tables. structured data from a spark dataframe can be saved into the parquet file format as shown in the above code snippet.

column-oriented database

a database is a persistent collection of records. these records are accessed via queries. the system that stores data and processes queries to retrieve data is called a database system. such systems use indexes or iteration over the records to find the required information stored in the database. indexes are auxiliary, dictionary-like data structures that keep references to individual records. indexing is efficient in some cases; however, it requires two lookup operations and slows down access time. data scanning, or iteration over each record, resolves the query by finding the exact location of the records. it is inefficient when the size of the data is too large. as the data-generation rate is constantly increasing, more and more data is going to be stored on disk. for such fast-growing data, we need a system that can accommodate more data than traditional storage systems and, at the same time, keep query-processing time low. when the data gets too large, indexing and record scanning will be costly during querying. hence, a satisfying solution is the columnar-storage system, which stores data by columns rather than by rows.
27 a column-oriented database system stores data in corresponding columns, and each column is stored in a separate file into the disk. this makes data access time much quicker. since each column is stored separately, any required data can directly be accessed instead of reading all the data. that means any column can be used as an index, making it auto-indexing. that is why the column-oriented representation is much faster than the row-oriented representation. apart from this, data is stored in the compressed form. each column is compressed using a different scheme. in the column-oriented database, the compression is always efficient as all the values belong to the same data type. hence, column-oriented databases require less disk space, as they do not need additional storage for indexes since the data is stored within the indexes themselves. consider an example where a database table named “book” consisting of columns “bookid,” “title,” and “price.” following a column-oriented approach, all the values for bookid are stored together under the “bookid” column, all the values for title are stored together under “title” column. and so on as shown in figure 3. information technology and libraries | september 2018 37 figure 3 an example of an entity and its row and column representation. apache parquet parquet is a top-level apache project that stores data in column-oriented fashion, highly compressed and densely packed in the disk.28 it is a self-describing data format that embeds schema within the data itself. it supports efficient compression and encoding schemes that allows lowering data-storage costs and maximizes the effectiveness of querying data. parquet has added advantages, such as limiting the i/o operation and storing data in compressed form using the snappy method developed by google and used in its production environment. hence it is designed especially for space and query efficiency. snappy aims at compressing petabytes of data in minimal amounts of time, and especially aims for resolving big data issues.29 the data compression rate is more than 250 mb/sec, and decompression rate is more than 500 mb/sec. these compression and decompression rates are for a single core of a system having a core i7 processor in 64-bit mode. it is even faster than the fastest mode of zlib compression algorithm.30 parquet is implemented using column-striping and assembly-language algorithms that are optimized for storing large data-blocks.31 it supports nested data structures in which each value of the same column is stored in contiguous memory locations.32 apache parquet is flexible and can work with many programming languages because it is implemented using apache thrift (https://thrift.apache.org/). a parquet file is divided into row groups and metadata at the end of the file. each row group is divided into column values (or column chunks), such as column 1, column 2, and so on as shown in figure 4. each column value is divided into pages, and each page consists of the page header, repetition levels, definition levels, and values. the footer of the file contains various metadata, such as file metadata, column metadata, and page-header metadata. the metadata information is required to locate and find the values, just like indexing. https://thrift.apache.org/ efficiently processing and storing library linked data | sharma, marjit, and biswas 38 https://doi.org/10.6017/ital.v37i3.10177 figure 4 parquet file structure. 
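putting the two background pieces together, a spark dataframe with the three triple columns can be written to parquet with an explicit snappy codec and read back for sql querying. this is a minimal sketch rather than the authors' code: the paths and the example triple are hypothetical, the `triples` view name is an assumption, and snappy is already the default parquet codec in recent spark releases.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ParquetRoundTrip {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("parquet-round-trip").getOrCreate();

    // a three-column (subject, predicate, object) schema, mirroring the storage schema
    StructType schema = new StructType()
        .add("subject", DataTypes.StringType)
        .add("predicate", DataTypes.StringType)
        .add("object", DataTypes.StringType);

    List<Row> rows = Arrays.asList(
        RowFactory.create("<http://example.org/book/1>",
                          "<http://purl.org/dc/terms/title>",
                          "\"a sample title\""));
    Dataset<Row> triples = spark.createDataFrame(rows, schema);

    // each column is written separately and compressed (the codec is made explicit here)
    triples.write().option("compression", "snappy").parquet("hdfs:///library/rdfdata.parquet");

    // load the compressed columns back and query them with spark sql
    spark.read().parquet("hdfs:///library/rdfdata.parquet").createOrReplaceTempView("triples");
    spark.sql("SELECT COUNT(*) AS count FROM triples").show();

    spark.stop();
  }
}
```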
the proposed approach the proposed approach relies on spark’s core apis—rdd, spark sql, and dataframe—which can operate on large datasets. rdd is used to load the initial data from the input file, process the data and transform them into triple structure. spark dataframe is used to load the data from rdd into the triple structure and send the transformed rdf data into a parquet file. spark sql is used to fetch the data stored in the parquet file. processing rdf data processing rdf data from large rdf/xml files requires breaking the file into smaller file components. general data-processing systems cannot handle large files because they face memory issues. at this stage, the proposed approach can process the data using an n-triples file, hence individual rdf/xml files again need to be converted into the n-triples file format. the process of breaking rdf/xml file into smaller file components and then converting them into n-triples format depends upon the size of the input file. if it is not more than 500 mb then it is directly converted into n-triples file format. multiple rdf/xml files are converted into individual ntriples file formats, which are again combined into one n-triples file, as the proposed spark application reads input from a single file. information technology and libraries | september 2018 39 schema to store rdf data a simple rdf schema with three triple entities has been designed. this schema is an rdf triple view, which is the building block of the rdf storage schema proposed in this work. the rdf triple view is a simple java class consisting of three attributes—subject, predicate, and object. given an rdf dataset d, consisting of a set of rdf triples t, in either rdf/xml or n-triples format, the dataset is transformed into a format that can be processed by a spark application. further, the dataset is transformed into a line-based format where the individual triple statement is placed in a line separated by a new-line (\n) character. a line contains three components—subject, predicate, and object separated by a space. here each line is unique, using the combined information of subject, predicate, and object. given an rdf triple structure ti, ti = (si, pi, oi) and ti ∈ t, for each t an instance of rdf triple view is created to hold the triple information. the columnar schema organizes triple information into three components, storing each component separately as subject, predicate, and object columns (figure 5). figure 5. rdf triple view. rdf storage we store the rdf data based on rdf triple view, which is the main schema for storing data in the triple representation. we do not need any indexing or additional information related to subject, predicate, or object to be stored on the disk. since we can have any number of temporary dataframe tables in memory, join operations can be performed using these tables to filter the data. in the absence of expensive indexing and additional triple information, storage area can be reduced significantly. apart from this, the compression technique used in apache parquet reduces lot more space than storing in other triple stores. in figure 6, we illustrate the data-storing process. efficiently processing and storing library linked data | sharma, marjit, and biswas 40 https://doi.org/10.6017/ital.v37i3.10177 figure 6. data-storing process in hdfs. the collection of triple instances is loaded into an rdd. at the end, the collection of triple instances is loaded into spark dataframe. 
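the rdf triple view is described above as a simple java class with subject, predicate, and object attributes, fed from a line-based n-triples representation in which the three components are separated by spaces. the sketch below follows that description; the field names, the bean accessors, and the fromline() helper (which a function such as the linestordffunction in the earlier snippet could delegate to) are assumptions, and the parsing rule is the simplified space-separated one described in the text.

```java
import java.io.Serializable;

// "rdf triple view": a plain java bean with subject, predicate, and object fields,
// which spark can use to infer the three-column schema via reflection
public class RDFTriple implements Serializable {
  private String subject;
  private String predicate;
  private String object;

  public RDFTriple() {}

  public RDFTriple(String subject, String predicate, String object) {
    this.subject = subject;
    this.predicate = predicate;
    this.object = object;
  }

  public String getSubject() { return subject; }
  public void setSubject(String subject) { this.subject = subject; }
  public String getPredicate() { return predicate; }
  public void setPredicate(String predicate) { this.predicate = predicate; }
  public String getObject() { return object; }
  public void setObject(String object) { this.object = object; }

  // parse one n-triples line: subject, predicate, and object separated by spaces,
  // where the object may itself contain spaces (for example, a quoted literal)
  public static RDFTriple fromLine(String line) {
    String trimmed = line.trim();
    if (trimmed.endsWith(".")) { // drop the terminating dot of the statement
      trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
    }
    String[] parts = trimmed.split("\\s+", 3);
    return new RDFTriple(parts[0], parts[1], parts[2]);
  }
}
```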
spark dataframes are equivalent to the rdbms tables and support both structured and unstructured data formats. using a single schema, multiple dataframes can be used and can be registered as temporary tables in the memory, where highlevel sql queries can be executed on top of them. here the concept of using multiple dataframes with a single schema is motivated to avoid joins and indexing. in the final step, the spark dataframe is saved into hdfs files in the parquet format. from the parquet file, the data can be loaded back into dataframes in memory and queried using spark sql. fetching data from storage given an rdf dataset d, a sparql query q, and a columnar-schema s, we use s to translate q to q' to perform queries on top of s. here, the answer of query q' on top of s is equal to the answer of q on top of d. query mappings m are used to transform sparql queries into spark sql queries. for querying, first the data is loaded into a spark dataframe from parquet files. to query data using sparql, queries must follow basic graph patterns (bgp). a bgp is a set of triple patterns similar to an rdf triple (s, p, o) where any of s, p, and o can be query variables or literals. bgp is used for matching a triple pattern to an rdf graph. this process is called binding between query variables and rdf terms. the statements listed under the where clause is known as bgp consisting of query patterns. for example, the query “select ?name ?mbox where {?x foaf:name ?name . ?x foaf:mbox ?mbox .}” has two query patterns. to evaluate the query containing two query patterns, one join is required. based on the total number of query patterns, information technology and libraries | september 2018 41 we need one less number of joins. that is, for n number of query patterns we need n-1 joins to resolve the values. figure 7 illustrates the process of query execution. figure 7. process of query execution. evaluation to evaluate the proposed approach we compare the storage size with file-based storage systems such as n-triples files and rdf/xml files. we also compare with standard triple stores such as jena tdb and sesame. the data-storing time is compared with jena tdb, sesame, and parquet, having one, two, and three worker nodes respectively. finally, for the purposes of the experiment, some sparql queries are selected and tested over rdf data stored in parquet format into hdfs. the query performance is tested on the distributed system having one, two, and three worker nodes respectively. in the following subsections, we show the results for each of the above comparisons. datasets for evaluation, we use two datasets. dataset 1 contains bibliographic data from the national library of portugal (nlp) (http://opendata.bnportugal.pt/eng_linked_data.htm). from nlp, we choose the nlp catalogue datasets in rdf/xml formats. the datasets are freely available to reuse and contain metadata information from nlp catalogue, the national bibliographic database, the portuguese national bibliography, and the national digital library. the datasets are available as linked data, which were produced in the context of the european library. the size of the rdf/xml file is 6.46 gb with more than 45 billion rdf triples. http://opendata.bnportugal.pt/eng_linked_data.htm efficiently processing and storing library linked data | sharma, marjit, and biswas 42 https://doi.org/10.6017/ital.v37i3.10177 dataset 2 contains bibliographic data from the british national library (https://www.bl.uk/bibliographic/download.html). 
from the british national bibliography collection we choose the bnb lod books dataset. the datasets are publicly available and contain bibliographic records of different categories, such as books, locations, bibliographic resources, persons, organizations, and agents. the datasets are divided into sixty-seven files in rdf format. however, we combine them into one file in n-triples format to fit the requirement of the large size of the input data. the combined file is 22.52 gb and contains more than 16 billion rdf resources in n-triples format, making it suitable for the proposed approach. from this conversion, we get more than 150 billion rdf triples. figure 8. data storage time for different file formats. figure 9. disk size for different file formats. disk storage figure 8 shows the data-storing time using sesame, jena tdb, and parquet for the above two datasets. data from raw rdf files are stored in jena tdb and sesame. individual files are processed for storing into jena tdb and sesame to avoid memory overflow as jena or sesame models cannot load data at once from the large files. to store data in parquet format we run the program separately on different worker nodes. figure 9 presents the total disk size required for each of these file formats and triple stores for the two datasets. https://www.bl.uk/bibliographic/download.html information technology and libraries | september 2018 43 query performance for testing, the sparql queries are converted manually at this stage. we run some of the selected queries over bibliographic rdf data stored in parquet file format in hdfs. we run the following type of queries on worker nodes 1, 2 and 3 respectively. the queries are listed below: q1) the first query is to fetch the count of rdf triples present in the storage. query: select (count(*) as ?count) where ?s ?p ?o . q2) the second query is to fetch the entire dataset in spo format. it fetches data in the n triples format. query: select * { ?s ?p ?o } . q3) the third query is to fetch resources that belong to books with the subject “english language composition and exercises.” query: select ?s where ?x rdf:type bibo:book . ?x dc:subject . q4) the fourth query is to fetch resources that belong to books with the subject “english language composition and exercises” and creator “palmer frederick.” query: select ?s where ?x rdf:type bibo:book . ?x dc:subject . ?x dc:creator . q5) the fifth query is to fetch objects having predicate dcterms:ispartof. query: select ?name where ?s dcterms:ispartof ?name . figure 10 shows the query response time for the above queries on different worker nodes for two different datasets. the queries are executed in the distributed environment. it shows that increasing the number of worker nodes decreases the query response time. efficiently processing and storing library linked data | sharma, marjit, and biswas 44 https://doi.org/10.6017/ital.v37i3.10177 figure 10. query response time with different numbers of worker nodes. query comparison for comparing query response time, the proposed approach is tested with the first dataset as mentioned above. though at this stage the proposed approach requires further research to be compared with other distributed triple storage systems. also, it requires more worker nodes and larger datasets compatible for parallel processing in the distributed environment. with a smaller setup, it will be hard to analyze the performance of the individual approaches, as they may produce similar results. 
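since the sparql queries are translated into spark sql by hand at this stage, it may help to see what such a translation looks like for a multi-pattern query. taking the two-pattern foaf example from the earlier section (select ?name ?mbox where { ?x foaf:name ?name . ?x foaf:mbox ?mbox }), the n-1 rule gives one self-join over the triple table. the sketch below assumes a temporary view named triples with subject, predicate, and object columns and that predicate values are stored as full uris in angle brackets; it is illustrative rather than the authors' exact translation.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparqlToSparkSql {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("sparql-to-spark-sql").getOrCreate();

    spark.read().parquet("hdfs:///library/rdfdata.parquet").createOrReplaceTempView("triples");

    // two triple patterns sharing ?x translate into one self-join on the subject column
    Dataset<Row> result = spark.sql(
        "SELECT t1.object AS name, t2.object AS mbox "
      + "FROM triples t1 "
      + "JOIN triples t2 ON t1.subject = t2.subject "
      + "WHERE t1.predicate = '<http://xmlns.com/foaf/0.1/name>' "
      + "  AND t2.predicate = '<http://xmlns.com/foaf/0.1/mbox>'");

    result.show();
    spark.stop();
  }
}
```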
we compare the proposed approach with the standard jena tdb solution in a single-node environment. the following sparql queries are tested against dataset 1. prefix rdf: prefix dc: prefix rdau: prefix foaf: q1. select (count(*) as ?count) { ?s ?p ?o } q2. select * { ?s ?p ?o } q3. select ?x where { ?x rdf:type dc:bibliographicresource. } q4. select ?x where { ?x rdf:type . ?x rdau:p60339 'time out lisboa'. } q5. select ?s where {?s dc:ispartof . ?s foaf:page 'http://www.theeuropeanlibrary.org/tel4/record/3000115318515'. } information technology and libraries | september 2018 45 figure 11. query comparison. we are interested in measuring the query response time with the above queries. first, we test with jena tdb. we then test the proposed approach on a single-node environment. we execute the above set of queries multiple times to record the average performance. as mentioned above, no indexing is used in the storage. rdf triples are stored as they appeared in the n-triples file. queries are executed without indexing and are still getting better performance than jena tdb, as shown in figure 11. discussion in this article, we claim that apache spark and column-oriented databases can resolve library big data issues. especially when dealing with rdf data, spark can perform far better than other approaches because of its in-memory processing ability. concerning rdf data storage, the column-oriented database is suitable for storing the large volume of data because of its scalability, fast data loading, and highly efficient data compression and partitioning. a column-oriented database system requires less disk, reducing the storage area. as a proof, we have shown the data storage comparison and the performance of the columnar-storage for rdf data using parquet formats in hdfs. as shown in the results, apache parquet takes much less disk space as compared to other storage systems. also, the data-storing time is relatively very small as compared to others. we observed that the result of query 2 is the entire dataset stored in parquet format. the size of this resultant dataset is 22.52 gb, which is the same as the original size. the same dataset when stored with parquet format is reduced to 2.89 gb. this shows that parquet is a very optimized efficiently processing and storing library linked data | sharma, marjit, and biswas 46 https://doi.org/10.6017/ital.v37i3.10177 storage system that can reduce the storage cost. we have shown the query response time for five different sparql queries on distributed nodes for two different datasets. we believe with better schema for storing rdf triples the proposed approach can be improved, and with the used technologies a fast and reliable triple store can be designed. conclusion and future work librarians all over the globe should give priority to integrating library data with the web to enable cross-domain sharing of library data. to do this, they must pay attention to current trends in big data technologies. because the data-generation rate is increasing in every domain, traditional data processing and storage systems are becoming ineffective because of the scale and complexity of the data. in this article, we present a distributed solution for processing and storing a large volume of library linked data. from the experiment, we observe that the processing of large volume of the data takes significantly less time using the proposed approach. also, the storage area is reduced significantly as compared to other storage systems. 
in the future we plan to optimize the current approach using advanced technologies such as graphx, machine learning tools, and other big -data technologies for even faster data processing, searching, and analyzing. references 1 eric miller et al., “bibliographic framework as a web of data: linked data model and supporting services,” library of congress, november 11, 2012, https://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf. 2 brighid m. gonzales, “linking libraries to the web: linked data and the future of the bibliographic record,” information technology and libraries 33 no. 4 (2014): 10, https://doi.org/10.6017/ital.v33i4.5631; myung-ja k. han et al., “exposing library holdings metadata in rdf using schema.org semantics,” in international conference on dublin core and metadata applications dc-2015, são paulo, brazil, september 1–4, 2015, pp. 41–49, http://dcevents.dublincore.org/intconf/dc-2015/paper/view/328/363. 3 franck michel et al., “translation of relational and non-relational databases into rdf with xr2rml,” in proceedings of the 11th international conference on web information systems and technologies, lisbon, portugal, 2015, pp. 443–54, https://doi.org/10.5220/0005448304430454; varish mulwad, tim finin, and anupam joshi, “automatically generating government linked data from tables,” working notes of aaai fall symposium on open government knowledge: ai opportunities and challenges 4, no. 3 (2011), https://ebiquity.umbc.edu/_file_directory_/papers/582.pdf; matthew rowe, “data.dcs: converting legacy data into linked data,” ldow 628 (2010), http://ceur-ws.org/vol628/ldow2010_paper01.pdf. 4 virginia schilling, “transforming library metadata into linked library data,” association for library collections and technical services, september 25, 2012, http://www.ala.org/alcts/resources/org/cat/research/linked-data. 5 getaneh alemu et al., “linked data for libraries: benefits of a conceptual shift from libraryspecific record structures to rdf-based data models,” new library world 113, no. 11/12 (2012): 549–70 (2012), https://doi.org/10.1108/03074801211282920. https://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf https://doi.org/10.6017/ital.v33i4.5631 http://dcevents.dublincore.org/intconf/dc-2015/paper/view/328/363 https://doi.org/10.5220/0005448304430454 https://ebiquity.umbc.edu/_file_directory_/papers/582.pdf http://ceur-ws.org/vol-628/ldow2010_paper01.pdf http://ceur-ws.org/vol-628/ldow2010_paper01.pdf http://www.ala.org/alcts/resources/org/cat/research/linked-data https://doi.org/10.1108/03074801211282920 information technology and libraries | september 2018 47 6 lisa goddard and gillian byrne, “the strongest link: libraries and linked data,” d-lib magazine, 16, no. 11/12 (2010), https://doi.org/10.1045/november2010-byrne. 7 t. nasser and r. s. tariq, “big data challenges,” journal of computer engineering & information technology 4, no. 3 (2015), https://doi.org/10.4172/2324-9307.1000133. 8 alexandru adrian tole, “big data challenges,” database systems journal 4, no. 3 (2013): 31–40, http://dbjournal.ro/archive/13/13_4.pdf. 9 carol jean godby and karen smith-yoshimura, “from records to things: managing the transition from legacy library metadata to linked data,” bulletin of the association for information science and technology 43, no. 2 (2017): 18–23, https://doi.org/10.1002/bul2.2017.1720430209. 
10 corine deliot, “publishing the british national bibliography as linked open data,” catalogue & index, issue 174 (2014): 13–18, http://www.bl.uk/bibliographic/pdfs/publishing_bnb_as_lod.pdf; gustavo candela et al., “migration of a library catalogue into rda linked open data,” semantic web 9, no. 4 (2017): 481–91, https://doi.org/10.3233/sw-170274; martin malmsten, “exposing library data as linked data,” ifla satellite preconference sponsored by the information technology section: emerging trends in 2009, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.181.860&rep=rep1&type=pdf ; keri thompson and joel richard, “moving our data to the semantic web: leveraging a content management system to create the linked open library,” journal of library metadata 13, no. 2– 3 (2013): 290–309, https://doi.org/10.1080/19386389.2013.828551; jason a. clark and scott w. h. young, “linked data is people: building a knowledge graph to reshape the library staff directory,” code4lib journal 36 (2017), http://journal.code4lib.org/articles/12320; martin malmsten, “making a library catalogue part of the semantic web,” humbolt university of berlin, 2008, https://doi.org/10.18452/1260. 11 r. hastings, “linked data in libraries: status and future direction,” computers in libraries 35, no. 9 (2015): 12–28, http://www.infotoday.com/cilmag/nov15/hastings--linked-data-inlibraries.shtml. 12 mirjam keßler, “linked open data of the german national library,” in eco4r workshop lod of dnb, 2010; antoine isaac, robina clayphan, and bernhard haslhofer, “europeana: moving to linked open data,” information standards quarterly 24, no. 2/3 (2012)<>; carol jean godby and ray denenberg, “common ground: exploring compatibilities between the linked data models of the library of congress and oclc,” oclc online computer library center, 2015, https://files.eric.ed.gov/fulltext/ed564824.pdf. 13 chunning wang et al., “exposing library data with big data technology: a review,” 2016 ieee/acis 15th international conference on computer and information science (icis), pp. 1-6, https://doi.org/10.1109/icis.2016.7550937. 14 b. mcbride, “jena: a semantic web toolkit,” ieee internet computing 6, no. 6 (2002): 55–59, https://doi.org/10.1109/mic.2002.1067737; jeen broekstra, arjohn kampman, and frank van https://doi.org/10.1045/november2010-byrne https://doi.org/10.4172/2324-9307.1000133 http://dbjournal.ro/archive/13/13_4.pdf https://doi.org/10.1002/bul2.2017.1720430209 http://www.bl.uk/bibliographic/pdfs/publishing_bnb_as_lod.pdf https://doi.org/10.3233/sw-170274 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.181.860&rep=rep1&type=pdf https://doi.org/10.1080/19386389.2013.828551 http://journal.code4lib.org/articles/12320 https://doi.org/10.18452/1260 http://www.infotoday.com/cilmag/nov15/hastings--linked-data-in-libraries.shtml http://www.infotoday.com/cilmag/nov15/hastings--linked-data-in-libraries.shtml https://files.eric.ed.gov/fulltext/ed564824.pdf https://doi.org/10.1109/icis.2016.7550937 https://doi.org/10.1109/mic.2002.1067737 efficiently processing and storing library linked data | sharma, marjit, and biswas 48 https://doi.org/10.6017/ital.v37i3.10177 harmelen, “sesame: a generic architecture for storing and querying rdf and rdf schema,” international semantic web conference, ed. j. davies, d. fensel, and f. van harmelen (berlin and heidelberg: springer, 2002), https://doi.org/10.1002/0470858060.ch5. 15 “apache jena—tdb,” apache jena, accessed august 22, 2018, https://jena.apache.org/documentation/tdb/. 
16 “sesame (framework),” everipedia, july 15, 2016, https://everipedia.org/wiki/sesame_(framework)/. 17 asim ullah et al., “bookont: a comprehensive book structural ontology for book search and retrieval,” 2016 international conference on frontiers of information technology (fit), 211– 16, https://doi.org/10.1109/fit.2016.046. 18 tom heath and christian bizer, “linked data: evolving the web into a global data space,” synthesis lectures on the semantic web: theory and technology 1, no. 1 (2011): 1–136, https://doi.org/10.2200/s00334ed1v01y201102wbe001. 19 christian bizer et al., “linked data on the web (ldow2008),” proceeding of the 17th international conference on world wide web—www 08, 2008, pp. 1265–66 (2008), https://doi.org/10.1145/1367497.1367760. 20 eric prud and andy seaborne, “sparql query language for rdf,” w3c recommendation, january 15, 2008, https://www.w3.org/tr/rdf-sparql-query/. 21 devin gaffney, “how to use sparql,” datagov wiki rss, last modified april 7, 2010, https://data-gov.tw.rpi.edu/wiki/how_to_use_sparql. 22 tom white, hadoop: the definitive guide (sebastopol, ca: o’reilly media,, 2012), https://www.isical.ac.in/~acmsc/wbda2015/slides/hg/oreilly.hadoop.the.definitive.guide. 3rd.edition.jan.2012.pdf. 23 dhruba borthakur, “the hadoop distributed file system: architecture and design,” hadoop project website, 2007, http://svn.apache.org/repos/asf/hadoop/common/tags/release0.16.3/docs/hdfs_design.pdf; seema maitrey and c. k. jha, “mapreduce: simplified data analysis of big data,” procedia computer science 57 (2015), 563–71 (2015), https://doi.org/10.1016/j.procs.2015.07.392. 24 michael armbrust et al., “spark sql: relational data processing in spark,” in proceedings of the 2015 acm sigmod international conference on management of data (new york: acm, 2015), 1383–94, https://doi.org/10.1145/2723372.2742797. 25 abdul ghaffar shoro and tariq rahim soomro, “big data analysis: apache spark perspective,” global journal of computer science and technology 15, no. 1 (2015), https://globaljournals.org/gjcst_volume15/2-big-data-analysis.pdf. 26 salman salloum et al., “big data analytics on apache spark,” international journal of data science and analytics 1, no. 3–4 (2016): 145–64, https://doi.org/10.1007/s41060-016-0027-9. https://doi.org/10.1002/0470858060.ch5 https://jena.apache.org/documentation/tdb/ https://everipedia.org/wiki/sesame_(framework)/ https://doi.org/10.1109/fit.2016.046 https://doi.org/10.2200/s00334ed1v01y201102wbe001 https://doi.org/10.1145/1367497.1367760 https://www.w3.org/tr/rdf-sparql-query/ https://data-gov.tw.rpi.edu/wiki/how_to_use_sparql https://www.isical.ac.in/~acmsc/wbda2015/slides/hg/oreilly.hadoop.the.definitive.guide.3rd.edition.jan.2012.pdf https://www.isical.ac.in/~acmsc/wbda2015/slides/hg/oreilly.hadoop.the.definitive.guide.3rd.edition.jan.2012.pdf http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.16.3/docs/hdfs_design.pdf http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.16.3/docs/hdfs_design.pdf https://doi.org/10.1016/j.procs.2015.07.392 https://doi.org/10.1145/2723372.2742797 https://globaljournals.org/gjcst_volume15/2-big-data-analysis.pdf https://doi.org/10.1007/s41060-016-0027-9 information technology and libraries | september 2018 49 27 daniel j. abadi, samuel r. madden, and nabil hachem, “column-stores vs. row-stores: how different are they really?,” in proceedings of the 2008 acm sigmod international conference on management of data (new york: acm, 2008), 967–80, https://doi.org/10.1145/1376616.1376712. 
28 deepak vohra, "apache parquet," in practical hadoop ecosystem (berkeley, ca: apress, 2016), 325–35, https://doi.org/10.1007/978-1-4842-2199-0_8.
29 "google/snappy," github, january 4, 2018, https://github.com/google/snappy.
30 jean-loup gailly and mark adler, "zlib compression library," 2004, https://www.repository.cam.ac.uk/bitstream/handle/1810/3486/rfc1951.txt?sequence=4.
31 sergey melnik et al., "dremel: interactive analysis of web-scale datasets," proceedings of the vldb endowment 3, no. 1–2 (2010): 330–39, https://doi.org/10.14778/1920841.1920886.
32 marcel kornacker et al., "impala: a modern, open-source sql engine for hadoop," in proceedings of the 7th biennial conference on innovative data systems research, asilomar, california, january 4–7, 2015, http://www.inf.ufpr.br/eduardo/ensino/ci763/papers/cidr15_paper28.pdf.

is this a geolibrary? a case of the idaho geospatial data center
maria anna jankowska and piotr jankowski
information technology and libraries | march 2000

the article presents the idaho geospatial data center (igdc), a digital library of public-domain geographic data for the state of idaho. the design and implementation of igdc are introduced as part of the larger context of a geolibrary model. the article presents methodology and tools used to build igdc with the focus on a geolibrary map browser. the use of igdc is evaluated from the perspective of access and demand for geographic data. finally, the article offers recommendations for future development of geospatial data centers.

in the era of integrated transnational economies, demand for fast and easy access to information has become one of the great challenges faced by the traditional repositories of information—libraries. globalization and the growth of market-based economies have brought about, faster than ever before, acquisition and dissemination of data, and the increasing demand for open access to information, unrestricted by time and location. these demands are mobilizing libraries to adopt digital information technologies and create new methods of cataloging, storing, and disseminating information in digital formats. libraries encounter new challenges constantly. participation in the global information infrastructure requires them to support public demand for new information services, to help society in the process of self-education, and to promote the internet as a tool for sharing information. these tasks are becoming easier to accomplish thanks to the growing number of digital libraries.
since 1994, when the digital library initiative originated as part of the national information infrastructure program, the internet has accommodated many digital libraries with spatial data content. for example, the electronic environmental library project at the university of california, berkeley (http://elib.cs.berkeley.edu/) provides botanical and geographic data; the university of michigan digital library teaching and learning project (www.si.umich.edu/umdl/) focuses on earth and space sciences; carnegie mellon's informedia digital video library (www.informedia.cs.cmu.edu) distributes digital video, audio, and images with text; and the alexandria digital library at santa barbara (http://alexandria.sdc.ucsb.edu/) provides geographically referenced information. the alexandria digital library is of special interest in this article because it implements a model of a geolibrary. a geolibrary stores georeferenced information searchable by geographic location in addition to traditional searching methods such as by author, title, and subject. the purpose of this article is to present the idaho geospatial data center (igdc) in the larger context of a geolibrary model. igdc is a digital library of public-domain geographic and statistical data for the state of idaho. the article discusses methodology and tools used to build igdc and contrasts its capabilities with a geolibrary model. the usage of igdc is evaluated from the perspective of access and demand for geographic data. finally, the article offers recommendations for future development of geospatial data centers.

maria anna jankowska (majanko@uidaho.edu) is associate network resources librarian, university of idaho library, and piotr jankowski (piotrj@uidaho.edu) is associate professor, department of geography, university of idaho, moscow, idaho.

geographic information systems for public services

terms such as digital, electronic, virtual, or image libraries have existed long enough to inspire diverse interpretations. the broad definition by covi and king concentrates on the main objective of digital libraries, which is the collection of electronic resources and services for the delivery of materials in different formats.1 the common motivation for initiatives leading to the development of digital libraries is to allow conventional libraries to move beyond their traditional roles of gathering, selecting, organizing, accessing, and preserving information. digital libraries provide new tools allowing their users not only to access the existing data but also to create new information. the creation of new information using the existing data sources is essential to the very idea of the digital library. since the information in a digital library exists in virtual form, it can be manipulated instantaneously by computer-based information-processing tools. this is not possible using traditional information media (e.g., paper, microfilm), where the information must first be transferred from non-digital into digital format. since late 1994, when the u.s. national science foundation founded the alexandria digital library project, the number of internet sites devoted to spatially referenced information has grown dramatically. today, it would require a serious expenditure of time and effort to visit all geographic data sites created by state agencies, universities, and commercial organizations.
in 1997 karl musser wrote, "there are now more than 140 sites featuring interactive maps, most of which have been created in the last two years."2 this incredible boom in publishing spatial data is possible thanks to geographic information system (gis) technology and data development efforts brought about by the rapidly increasing use of gis. this new technology provides its users with capabilities to automate, search, query, manage, and analyze geographic data using the methods of spatial analysis supported by data visualization. traditionally, geographic data were presented on maps considered as public assets. according to a norwegian survey, the aggregate benefit accrued from using maps was three times the total cost of their production, even though maps provided only static information.3 today, the conventional distribution of geographic data on printed maps has become less efficient than distributing them in digital format through wide-area data networks. this happened largely due to gis's ability to separate data storage from data presentation. as a result, data can be presented in a dynamic way, according to users' needs. often gis is termed a "data mixing system" because it can process data from different sources and formats such as vector-format maps with full topological and attribute information, digital images of scanned maps and photos, satellite data, video data, text data, tabular data, and databases.4 all of these data types provide a rich informational infrastructure about locations and properties of entities and phenomena distributed in terrestrial and subterrestrial space. the definition of gis changes according to the discipline using it. gis can be used as a map-making machine, a 3-d visualization tool, and as an analytical, planning, collaboration, and business information management tool. today, it is hard to find a planning agency, city engineering department, or utility company (not to mention individual internet users) that has not used digital maps. this is why the number of users seeking spatial data in digital format has increased so dramatically. data discovery can be, for gis users, the most time-consuming part of using the technology.5 as a result, libraries are faced with the growing demand for services that help discover, retrieve, and manipulate spatial data. the web greatly improved the availability and accessibility of spatial data but, at the same time, stimulated public interest in using geographic information. the continuing migration to popular operating systems (i.e., the microsoft windows family) and the adoption of their common functionality has brought gis software to many desktops. tools such as arcview gis from environmental systems research institute, inc. (esri, www.esri.com) or mapinfo from mapinfo corporation (www.mapinfo.com) have become popular gis desktop systems. new software tools such as arcexplorer, released by esri, are focused on making gis more accessible, simpler, and available for use by the public. by taking advantage of the popularity of the web, attempts are being made to gain a wider acceptance of gis.
in the wake of the simplification of gis tools and improved access to spatial data, an exciting new area of gis use has recently emerged: public participation gis.6 public participation gis by definition is a pluralistic, inclusive, and nondiscriminatory tool that focuses on the possibility of reducing the marginalization of societies by means of introducing geographic information operable on a local level.7 it promotes an understanding of spatial problems by those who are most likely to be affected by the implementation of problem solutions, and it encourages transfer of control and knowledge to these parties. this approach leads to a broader use of gis tools and spatial data and creates new challenges for libraries storing and serving geographic data in digital formats. broadening the use of data and gis tools requires attention to data access. traditional libraries have often fulfilled the crucial role of being an impartial information provider for all parties involved in public decision-making processes. will they be capable of serving society in this capacity in the digital age?

geolibrary as a repository of georeferenced information

according to brandon plewe, the user of spatial data can choose among seven types of distributed geographic information services available on the internet.8 they range from raw data download, through static map display, metadata search, dynamic map browsing, data processing, and web-based gis query and analysis, to net-savvy gis software. yet another important new category of geographic data service that can be added to this list is the geolibrary. goodchild defines a geolibrary as a library filled with georeferenced information where the primary basis of representation and retrieval is the spatial footprint that determines location by geographic coordinates. "the footprints can be precise, when they refer to areas with precise boundaries, or they can be fuzzy when the limits of the area are unclear."9 according to buttenfield, "the value of a geolibrary is that catalogs and other indexing tools can be used to attach explicit locational information to implicit or fuzzy requests, and once accomplished, can provide links to specific books, maps, photographs, and other materials."10 a geolibrary is distinguished from a traditional library in being fully electronic, with digital tools to access digital catalogs and indexes. it is anticipated that most of the information is archived in digital form. the value of a geolibrary is that it can be more than a traditional, physical library in electronic form.11

since its introduction, the concept of a geolibrary has been synonymous with the alexandria digital library (adl) project. adl was once defined as an internet-based archive providing comprehensive browsing and retrieval services for maps, images, and spatial information.12 a more recent definition characterizes adl as a geolibrary where a primary attribute of collection objects is their location on earth, represented by geographic footprints. a footprint is the latitude and longitude values that represent a point, a bounding box, a linear feature, or a complete polygonal boundary.13 according to goodchild (1998), a geolibrary's components include:

• the browser: a specialized software application running on the user's computer and providing access to the geolibrary via a computer network.
• the basemap: a geographic frame of reference for the browser's searches. a basemap provides the image of an area corresponding to the geographical extent of the geolibrary collection. for a worldwide collection this would be an image of the earth; for a statewide collection this could be the image of a state. the basemap may be potentially large, in which case it is more advantageous to include it in the browser than to download it from a geolibrary server each time the geolibrary is accessed.
• the gazetteer: an index that links place names to the map. the gazetteer allows geographic searches by place name instead of by area.
• server catalogs: collection catalogs maintained on distributed computer servers. the servers can be accessed over a network with the browser, using basic client-server architecture.
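to make the footprint and gazetteer ideas concrete, the following minimal sketch (written in python purely for illustration; it is not part of the adl or igdc software, and the place names, coordinates, and function names are invented) models a catalog entry whose footprint is a bounding box and shows how a gazetteer could turn an implicit place-name request into an explicit footprint:

```python
from dataclasses import dataclass

@dataclass
class Footprint:
    """rectangular spatial footprint in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

@dataclass
class CatalogEntry:
    """a georeferenced collection object: descriptive metadata plus a footprint."""
    title: str
    footprint: Footprint

# hypothetical gazetteer linking place names to footprints; coordinates are illustrative only
GAZETTEER = {
    "latah county, idaho": Footprint(west=-117.04, south=46.42, east=-116.33, north=47.00),
    "moscow, idaho": Footprint(west=-117.03, south=46.71, east=-116.96, north=46.76),
}

def footprint_for(place_name: str) -> Footprint:
    """turn a fuzzy request ("material about moscow, idaho") into an explicit footprint."""
    return GAZETTEER[place_name.lower()]
```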
the value of a geolibrary lies in providing open access to a multitude of information with geographic footprints, regardless of the storage medium. because all information in a digital library is stored using the same digital medium, traditional problems of physical storage, accessibility, portability, and concurrent use (e.g., many patrons wanting to view the one and only copy of a map) do not exist.

idaho geospatial data center

in 1996, inspired by the adl project, a team of geographers, geologists, and librarians started to work on a digital library of public-domain geographic data for the state of idaho. the main goal of the project was the development of a geographic digital data repository accessible through a flexible browsing tool. the project was funded by a grant from the idaho board of education's technology incentive program. the project resulted in the creation of the idaho geospatial data center (igdc, http://geolibrary.uidaho.edu). the first in the state of idaho, this digital library comprises a database containing geospatial datasets and geolibrary software that facilitates access, browsing, and retrieval of data in popular gis data formats, including digital line graph (dlg), digital raster graphics (drg), usgs digital elevation model (dem), and u.s. bureau of the census tiger boundary files for the state of idaho. the site also provides an interactive visual analysis of selected demographic and economic data for idaho counties. additionally, the site provides interactive links to other idaho and national spatial data repositories.

the key component of the library is the geolibrary software. the name "geolibrary" here is not synonymous with the model of a geolibrary defined by goodchild (1998); it was adopted rather as a reference to the geolibrary browser, one of the components of a geolibrary. the geolibrary browser (gl) supports online retrieval of spatial information related to the state of idaho. it was implemented using microsoft visual basic 5.0/6.0 and esri mapobjects technology. the software allows users to query an area of interest using a search based on map selection, as well as selection by area name (based on the usgs 7.5-minute quad naming convention). queries return gis data including dems, dlgs, drgs, and tiger files. queries are intended both for professionals seeking gis-format data and for nonprofessionals seeking topographic reference maps in the drg format. the interface of gl consists of three panels resembling the microsoft outlook user interface. our intent in designing the interface was to have panels that would be used in the following order.
first, the map panel is used to explore the geographic coverage of the geolibrary and to select the area of interest. next, the query panel is used to execute a query, and finally the results panel allows the user to analyze results and to download spatial data. users can also take a shortcut, going directly to the query panel and typing their query. both approaches result in the output being displayed as a list of files available for download from participating servers.

the map panel (figure 1) includes a navigable map of idaho, a vertical command toolbar, and a map finder tool. the command toolbar allows the user to zoom in, zoom out, pan the map, identify by name the entities visible on the map canvas, and select a geographic area of interest. geographic entity name identification was implemented as a dynamic feature whereby the name of the entity changes as the user moves the mouse over the map. spatial selection provides a tool to select a rectangular area of interest directly on the map canvas. the map finder provides additional means to simplify exploration of the map: the user can select a county or a quad name and zoom in on the selected geographic unit.

figure 1. map panel. the vertical toolbar provides zooming, panning, labeling, and simple feature querying capabilities. the map finder allows finding and selecting an area by county or usgs quad name. the screen copy here presents the selection of latah county in idaho.

the query panel (figure 2) allows the user to perform a query based either on the selection made on the map or on a new selection using one of the available query tools (figure 3). in the latter case, the user can enter geographic coordinates (in decimal degrees) defining the area of interest. this approach is equivalent to selecting a rectangular area directly on the map, and it will return all data files that spatially intersect with the selected area. optionally, the user can handpick quads of interest from a list. finally, a name can be entered to execute a more flexible query; for instance, a search containing the word "moscow" returns spatial data related to the three quads containing "moscow" within their names. the query is executed when the user presses the query button, and after the results are received the application automatically switches to the results panel.

figure 2. query panel. the interface was set to query the spatial selection from the map panel.

figure 3. query panel. the query is based on the selection of usgs quads. optionally, the user can enter geographic coordinates of the area or a text string to search.
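the spatial part of such a query amounts to a bounding-box intersection test between the user's selected area and the footprint of each cataloged file. the following sketch, a simplification in python under assumed data (the quad names, coordinates, and helper names are invented for illustration and do not reproduce the gl implementation), shows the idea:

```python
from typing import NamedTuple, Optional

class BBox(NamedTuple):
    """bounding box in decimal degrees: west, south, east, north."""
    west: float
    south: float
    east: float
    north: float

class DataFile(NamedTuple):
    quad_name: str   # usgs 7.5-minute quad name
    file_type: str   # e.g., "dem", "dlg", "drg", "tiger"
    bbox: BBox

def overlaps(a: BBox, b: BBox) -> bool:
    # boxes intersect unless one lies entirely to one side of the other
    return not (a.east < b.west or b.east < a.west or
                a.north < b.south or b.north < a.south)

def query(catalog: list[DataFile], area: BBox, name: Optional[str] = None) -> list[DataFile]:
    """return files whose footprint intersects the selected area,
    optionally restricted to quads whose name contains the given text."""
    hits = [f for f in catalog if overlaps(f.bbox, area)]
    if name:
        hits = [f for f in hits if name.lower() in f.quad_name.lower()]
    return hits

# illustrative catalog entries; coordinates are invented for the example
catalog = [
    DataFile("moscow east", "drg", BBox(-117.000, 46.700, -116.875, 46.800)),
    DataFile("moscow west", "dem", BBox(-117.125, 46.700, -117.000, 46.800)),
]
print(query(catalog, BBox(-117.2, 46.6, -116.9, 46.9), name="moscow"))
```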
the results panel shows the outcome of the query and includes important information about the data files: their size, type, projection, scale, the name of the server providing the data, and the access path (figure 4). based on this information, the user has the option of manually connecting to the server using the ftp protocol and retrieving the selected files. a much more convenient approach, however, is to rely on the gl software to retrieve the files automatically through the software interface. as an option, the result of the query can also be exported to a plain html document that contains links to all listed files. this feature can be very useful when many files have been selected and the user has slow or limited-time internet access: the user can open the saved list of files in a web browser and download individual files as needed, without having to download all the files at once and tie up the internet connection for a long period of time.

the results panel also provides a flexible way to review and organize the outcomes of queries before commencing the download. one can sort files by name, size, scale, projection, and server name. this feature may be useful if the user decides to retrieve data of only one type (e.g., dems) or one scale, or when the user prefers to connect only to a specific server. in addition, individual records as well as entire file types can be deselected to prevent files from being downloaded, and the user can remove selected files to scale down the set of data in the list.

figure 4. the results panel. results of a query can be sorted; individual items can be removed from the list or deselected to prevent them from being downloaded.

one of the most important assets of the gl browser is that all of the user activities described up to this point, with the exception of file download, take place entirely on the client side without any network traffic. in fact, area and file selection as well as queries do not require an active internet connection. map exploration is based on vector-format maps contained in the gl software, and queries are run against a local database. such an approach limits bandwidth consumption and unnecessary network traffic; an internet connection is necessary only to retrieve the selected files.

the vulnerability of the client-side approach to data query is that it can leave the user with a potentially outdated local database. to prevent this problem, gl is equipped with a database synchronization mechanism that allows users to keep up with server database updates. the client-side database contained in the gl software, which mirrors the schema of the server database, can be synchronized automatically or at the user's request. in either case, the gl client contacts the database synchronizer on the server side, which handles all necessary processes. since the synchronization is limited to database record updates, network traffic is kept low, making gl suitable for limited internet connections.

igdc is an open solution. new local datasets can be added or removed, making the collection easily adaptable to different geographical areas. in addition, datasets can physically reside on multiple servers, taking full advantage of the internet's distributed nature.
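the article does not document the actual igdc synchronization protocol, so the following is only a hedged sketch of the general pattern it describes: the client asks the server for records changed since its last synchronization and merges only those into its local mirror. the endpoint, payload fields, and record format below are assumptions made for illustration.

```python
import json
import urllib.request

# hypothetical endpoint; not the real igdc server interface
SYNC_URL = "https://example.org/igdc/sync"

def synchronize(local_records: dict[str, dict], last_sync: str) -> str:
    """fetch only catalog records changed since the last synchronization and
    merge them into the local mirror of the server database schema."""
    with urllib.request.urlopen(f"{SYNC_URL}?since={last_sync}") as resp:
        payload = json.load(resp)             # assumed shape: {"timestamp", "changed", "deleted"}
    for record in payload["changed"]:
        local_records[record["id"]] = record  # insert or update a changed record
    for record_id in payload["deleted"]:
        local_records.pop(record_id, None)    # drop records removed on the server
    return payload["timestamp"]               # remember for the next incremental sync
```

because only changed records travel over the network, the traffic stays small, which matches the article's point about suitability for limited internet connections.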
evaluation of igdc use

geospatial information is among the most common public information needs; almost 80 percent of all information is geographic in nature. published research reflecting those needs and the role of libraries in resolving them is not extensive. the efforts of federal, state, and local agencies collecting digital geospatial data and the growth of gis created an interest in the role of libraries as repositories of geospatial data.14 the main obstacle to providing access to digital spatial information is its complexity. this is why a user-friendly interface is critical for presenting spatially referenced information.15 the igdc has been a first attempt at creating a user-friendly interface in the form of a map-based data browser allowing users to access and retrieve geographic datasets about idaho. in order to track and evaluate the use of geospatial data, webtrends software was installed on the igdc server. the webtrends software produces customized web log statistics and allows tracking information on traffic and dataset dissemination.

during a one-year timeframe the number of successful hits was more than twenty-five thousand. almost 40 percent of users came from the .com domain, 35 percent were .net domain users, 15 percent were .org, and 10 percent were .edu users (figure 5). tracking the geographic origin of users by state, the largest number of users came from virginia, followed by washington, california, ohio, and idaho. the high number of users from virginia can be explained by the linking of the igdc site to one of the most popular geospatial data sites in the country, the united states geological survey (usgs) site. eighty-four percent of user sessions were from the united states; the rest originated from sweden, canada, and germany. the average number of hits per day on weekdays was around one hundred. the most popular retrievable information was digital raster graphics (drg) data, which present scanned images of usgs standard-series topographic maps at 1:24,000 scale. digital elevation models (dem) and digital line graphs (dlg) were less popular, and the tiger boundary files for the state of idaho were in small demand. the popularity of drg-format maps and the fact that most users accessed igdc via the usgs web site make it plausible to speculate that most of the users were non-gis specialists interested in general reference geographic information about idaho, including topography and basic land use information.

figure 5. distribution of igdc users in percent by origin domain (.com 40 percent, .net 35 percent, .org 15 percent, .edu 10 percent).

since the opening of igdc for public use (april 1998), the geolibrary map browser was downloaded 1,352 times. the software proved to be relatively easy for the public to use. out of forty-four bug reports and user questions submitted to igdc, most were concerned with filling out the software registration form and not with software failure. the igdc project spurred an interest in geographic information among students, faculty, and librarians at the university of idaho. in direct response to this interest, the university of idaho library installed a new dedicated computer at the reference desk with geolibrary software to access, view, and retrieve igdc data.
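as a rough illustration of the kind of log summary that a commercial package such as webtrends produces, the sketch below tallies hits per top-level domain from a web access log. it is not webtrends and not the igdc setup; it assumes a common-log-format file with resolved host names, and the path and regex are placeholders.

```python
import re
from collections import Counter

# matches the leading host field of a common-log-format line (an assumption about the log layout)
LOG_LINE = re.compile(r"^(?P<host>\S+) ")

def domain_distribution(log_path: str) -> dict[str, float]:
    """return the percentage of hits per top-level domain (.com, .net, .org, .edu, ...)."""
    counts = Counter()
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            match = LOG_LINE.match(line)
            if not match:
                continue
            host = match.group("host")
            tld = host.rsplit(".", 1)[-1].lower()  # e.g., "uidaho.edu" -> "edu"
            counts["." + tld] += 1
    total = sum(counts.values()) or 1
    return {domain: 100.0 * n / total for domain, n in counts.items()}
```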
conclusion

idaho geospatial data center is the first geospatial digital library for the state of idaho. it does not fulfill all requirements of the geolibrary model proposed by goodchild and others. the igdc has only two components of the geolibrary model: the geolibrary map browser and the basemap. the main difference between the geolibrary map browser and the web-based browser solutions adopted by other spatial repositories is its client-side approach to geospatial data query and selection. spatial data query is done locally on the user's machine, using the library database schema contained in the geolibrary map browser. this saves time by eliminating client-server communication delays during data searches, gives the user an experience of almost instantaneous response to queries, and reduces network communication to the data download time.

in comparison with the geolibrary model, igdc is missing the gazetteer. this component could help improve the ease of user navigation through a geospatial data collection. another useful component would be online mapping and spatial data visualization services. the idea of such services is to provide the user with a simple-to-operate mapping tool for visualizing and exploring the results of user-run queries. one such service, currently under implementation at igdc, is thematic mapping of economic and demographic variables for idaho using descartes software.16 descartes is a knowledge-based system supporting users in the design and utilization of thematic maps. the knowledge base incorporates domain-independent visualization rules determining which map presentation technique to employ in response to the user's selection of variables. an intelligent map generator such as descartes can enhance the utility of a geolibrary by providing tools to transform georeferenced data into information.

references and notes
1. l. covi and r. king, "organizational dimensions of effective digital library use: closed rational and open natural systems models," journal of the american society for information science 47, no. 9 (1996): 697.
2. k. musser, "interactive mapping on the world wide web" (1997), accessed march 6, 2000, www.min.net/~boggan/mapping/thesis.htm.
3. t. bernhardsen, geographic information systems (arendal, norway: viak it and norwegian mapping authority, 1992), 2.
4. ibid., 4.
5. j. stone, "stocking your gis data library," issues in science and technology librarianship (winter 1999), accessed march 6, 2000, www.library.ucsb.edu/istl/99-winter/article1.html.
6. p. schroeder, "gis in public participation settings" (1997), accessed june 2, 1999, www.spatial.maine.edu/ucgis/testproc/schroeder/ucgisdft.htm.
7. w. j. craig and others, "empowerment, marginalization, and public participation gis," report of a specialist meeting held under the auspices of the varenius project, santa barbara, california, oct. 15-17, 1998, ncgia, uc santa barbara.
8. b. plewe, gis online: information retrieval, mapping, and the internet (santa fe, n.m.: onword pr., 1997), 71-91.
9. m. f. goodchild, "the geolibrary," in innovations in gis 5: selected papers from the fifth national conference on gis research uk (gisruk), ed. s. carver (london: taylor and francis, 1998), 59, accessed march 6, 2000, www.geog.ucsb.edu/~good/geolibrary.html.
10. b. p. buttenfield, "making the case for distributed geolibraries" (1998), accessed march 6, 2000, www.nap.edu/html/geolibraries/app_b.html.
11. ibid.
12. m. rock, "monitoring user navigation through the alexandria digital library" (master's thesis abstract, 1998), accessed march 6, 2000, http://greenwich.colorado.edu/projects/rockm.htm.
13. l. l. hill and others, "geographic names: the implementation of a gazetteer in a georeferenced digital library," d-lib magazine 5, no. 1 (1999),
accessed march 6, 2000, www.dlib.org/dlib/january99/hill/01hill.html.
14. m. gluck and others, "public librarians' views of the public's geospatial information needs," library quarterly 66, no. 4 (1996): 409.
15. b. p. buttenfield, "user evaluation for the alexandria digital library project" (1995), accessed march 6, 2000, http://edfu.lis.uiuc.edu/allerton/95/s2/buttenfield.html.
16. g. andrienko and others, "thematic mapping in the internet: exploring census data with descartes," in proceedings of telegeo '99, first international workshop on telegeoprocessing, lyon, may 6-7, ed. r. laurini (seiten, france: claude bernard univ. of lyon, 1999), 138-45.

president's message: for the record
aimee fifarek

aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az.

for a long time, i've had an idea that when a new president of the united states is elected, sometime after he's sworn in, amid all of the briefings, a wizened old man sits down with him to have the talk. in my imagination the messenger is some cross between the templar knight from indiana jones and the last crusade and the international express man from neil gaiman and terry pratchett's good omens: officious yet wise. he tells the new president the why of it all, the real reasons why important things have happened in the ways they have, making all the decisions that seemed so wrong now seem inevitable. and, probably not for the first time, the new president thinks to himself, "what have i gotten myself into?"

this is clearly reflective of my desire for there to be, if not a reason for everything that happens, then at least some record of it all that can be reviewed, synthesized, and mined for meaning by future leaders. it's the librarian in me, i suppose. although being lita president bears absolutely no resemblance to being president of the united states, i have been thinking about this little imagining of mine a lot lately. this is probably because, now that i am midway through my presidential cycle (vice president, president, past president), i realize how much of what i've done has been marked by the absence of such a record. i did not receive a "how to be lita president" manual along with my gavel, and no one gave me the lita version of the talk. the one person who could have done it, lita executive director jenny levine, was as new to her position as i was to mine, so we have learned together and asked many questions of those around us with more experience.

we are in the midst of election season, and will soon have a new president-elect. bohyun kim and david lee king are both excellent candidates (http://litablog.org/2017/01/meet-your-candidates-for-the-2017-lita-election/); those of you who have not yet voted have a difficult choice. in order to make a little progress toward developing that how-to guide, i thought i'd document a few of the things i've learned since becoming lita president.

being lita president also means being president of a division of the american library association.

when i was elected, i expected to manage the business of the library and information technology association: board meetings, committee appointments, presidential programs, and lita forums. seeing the board complete the lita strategic plan (http://www.ala.org/lita/about/strategic) was a great accomplishment at this level.
while it’s possible for a division leader to have minimal interactions with “big ala” during their aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az. president’s message | fifarek https://doi.org/10.6017/ital.v36i1.9808 2 term and still be successful, my priority for my presidential year—increasing value litans receive from membership, especially those who are not able to attend in-person conferences—meant that i needed to learn more about how ala works. after a year and a half, i have a much better understanding of the association’s budgeting, publishing, and technology practices, and how all of these are impacted by declines in membership and decreasing revenues. future lita leaders are going to need to continue to be engaged at the larger organizational level if we are to be able to use lita’s technological knowledge and expertise to support ala’s efforts to maximize efficiency while minimizing costs. being lita president means speaking not just to, but for, an incredibly diverse community. my plan when i became lita president was to blog on a more regular basis. however, i didn’t expect some of my first communications to be about a mass shooting in dallas (in advance of the forum in ft. worth) or working with the board to craft a statement on inclusivity after the us presidential election. the proverbial curse “may you live in interesting times” has certainly been true this year. having to speak to the lita community about those issues made me acutely aware of my responsibility to adequately represent you when we’ve also been asked to weigh in on technology policy issues at the federal level such as the call for increased gun violence research and rescinding isp regulations on privacy protection. the decision by the board to include advocacy and information policy as a primary focus for the strategic plan was certainly prescient. we are fortunate that our president elect, andromeda yelton, is both well-versed in the issues and able to speak eloquently to them1. being lita president means being part of more than one team i’m continually amazed at the hard work and dedication that board members (http://www.ala.org/lita/about/board), committee and interest group chairs (http://www.ala.org/lita/about/committees/chairs), and anyone who fits into involvement our member persona (http://litablog.org/2017/03/who-are-lita-members-lita-personas/). the success of lita as an organization is entirely due to the time and passion of this team. but when you become lita president-elect you get a new team—the other division vice presidents. this cohort travels to ala hq in chicago in october after they are elected to meet each other and the incoming ala president and learn about the structure of ala. i have learned much from the other presidents this year, and we have had a number of truly productive discussions about how the divisions can collaborate and learn from each other to more effectively serve our members. lita is directly benefitting from the expertise of the other groups and they are in turn looking to us for both our technical skillset and the successes we’ve had over 50-years as an association. information technologies and libraries | march 1017 3 consider this a new preface to the how to be lita president manual. 
i hope that my successors find it useful, and that it will serve as an inspiration for any litans out there who are thinking about putting their name on the ballot in future years. it has been a marvelous and educational experience. and the gavel is pretty cool, too.

references
1. "making ala great again," publishers weekly, feb. 17, 2017, http://www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/72814making-ala-great-again.html.

the internet as a source of academic research information: findings of two pilot studies
harry m. kibirige and lisa depalo

as a source of serious subject-oriented information, the internet has been a powerful feature in the information arena since its inception in the last quarter of the twentieth century. it was, however, initially restricted to government contractors or major research universities operating under the aegis of the advanced research projects agency network (arpanet).1 in the 1990s, the content and use of the internet expanded to include mundane subjects covered in business, industry, education, government, entertainment, and a host of other areas. it has become a magnanimous network of networks whose size, impact, and content often elude serious scholarly measurement.2 opening the internet to common usage literally opened the flood gates of what has come to be known as the information superhighway. currently, there is virtually no subject that cannot be found on the internet in one form or another.

there is both hype and reality as to what the internet can generate in terms of substantive information. in their daily pursuits of information, information professionals as well as end users are challenged with regard to what their expectations are and what actually is delivered in terms of tangible information products and services on the net. academic users are a special breed in that both faculty and students have specific topics covered in their courses of study or faculty research agendas for which they need information. the use of electronic resources found on and off the internet is becoming increasingly vital for education and training in academic environments.3 five basic elements often are required in the electronic resources that academic information seekers desire: accessibility, timeliness, readability, relevance, and authority. the internet excels in the first three, but depending on how and from where the information is gathered, it may not be so reliable with regard to the last two.

the two pilot studies discussed in this article involved four academic institutions and were conducted by the researchers approximately twelve months apart. one, covering two institutions, was done in the fall of 1997; it was replicated with another two institutions in the spring of 1999. the main goal of the studies was to investigate how academic users perceive search engines and subject-oriented databases as sources of topical information. the basic underlying question was, "when faced with a topical subject, what is the users' predominant recourse: online databases (which may include cd-rom or dvd databases) or search engines?"
our results indicated a predominant preference for search engines for the group taken as a whole. further analysis using nonparametric correlation coefficients (kendall's tau_b and spearman's rho), however, indicated that those who use the internet monthly or weekly had high correlations with online databases as their preferred predominant information sources. on the other hand, daily users tended to have high correlations with search engines as their preferred predominant information sources.

information-seeking behavior of academic users

over the years, several studies have been conducted on how users seek and find information relevant to their needs. for the purposes of our analysis three categories will be used: the undergraduate, the graduate, and the post-doctoral research faculty user. while the levels at which the needed information may be articulated and packaged may differ, the five basic required elements in the electronic information resources needed by academics, already identified, remain the same. the internet has, however, added another dimension to the information-seeking behavior of all academics in that much of the needed information, if and when found, has a higher chance of appearing as full text (sometimes defined as viewdata) on the internet.4 with viewdata the end user has the ultimate in information seeking and acquisition in that he or she will get text, images, and sound in one, two, or more resources on the net. the process also may be accomplished in one sitting or search session at the computer terminal. the internet thus may be more likely to generate viewdata than conventional databases, which have for a long time been associated with the less desirable citations. in many instances, and with a little persistence, it can provide the analogy of "one-stop shopping" whereby a user can get the viewdata needed for a topic. this may explain the tendency to try the internet first as a potential information source, even for experienced searchers. to be effective, such searching needs experience and a lot of patience while sifting through pages of useless verbiage, as the information sources often are garnered from several sites.

harry m. kibirige is associate professor at queens college, city university of new york, and lisa depalo is assistant professor at college of staten island, city university of new york.

categories of academic users have varying levels of expertise in information seeking and have different characteristics in their information-seeking behavior.

undergraduate users

undergraduates are at the lowest point on the totem pole with regard to expertise in information seeking at any academic institution. there is more to the information needs of undergraduate students than can be revealed during the reference interview process. there are the pervading needs that the information age has created, which can be met only by those who possess critical thinking skills. critical thinking skills are imperative to much more than completing college-level assignments; they are also imperative to surviving in the job market once students graduate.
this premise was set forth in the 1992 united states government report from the department of labor and the secretary's commission on achieving necessary skills (scans) entitled skills and tasks for jobs: a scans report for america 2000. this report defined two types of skills needed to excel in the workplace and labeled them competencies and foundations. effective reference and instruction services can help students develop the critical thinking skills needed to meet the information competency in particular, since it pertains to one who "acquires and evaluates information, organizes and maintains information, interprets and communicates information, and uses computers to process information."5 acquiring and evaluating information can be particularly difficult for undergraduates in the information age, since one is bombarded with data in print and electronic formats. one can easily determine the reliability of print sources by looking at the name of the author, editor, or publisher. the internet, however, has become a popular choice for students who need to do research. it has gained the reputation of providing all that one needs right at one's fingertips. the problem is that one cannot readily discern what is reliable and what is not without some instruction.

it may be argued that undergraduates' information seeking is somewhat eased by the general guidance they get from faculty in the classroom. there is the general professorial lecture, which outlines the topics to be covered during the course as well as associated relevant readings used to broaden the subjects covered. in addition, there is the textbook, which elaborates on material covered in class. finally, there are journal articles and other information sources, which ordinarily are placed on reserve. as far as subject content covered in class lectures and discussion is concerned, information is usually well organized and accessible. at that level, information seeking is minimal and often guided by the dictates of the professor. but then enters the term paper, and the student's whole peace of mind with regard to information-gathering habits is disturbed. the term paper brings many unknowns to the undergraduate. the magnitude of the subject to be covered is initially fuzzy. the resources needed to get background as well as specific information are also fuzzy. furthermore, even when the resources are a little clearer, sifting through them and making a rational selection of relevant material may be problematic. the whole academic exercise entails learning and using new information tools, many of which were not covered in high school. computers and other electronic equipment have accentuated the undergraduates' mesmerization in their information-seeking effort.

a trait that most undergraduates exhibit in their information-seeking behavior is approaching the reference librarian for suggestions of leads to information sources needed for the term paper topic. they also may request the librarian to evaluate the sources as to their relevance, and sometimes even ask him or her to fetch the actual material needed.6 with the advent of the internet and other electronic resources, online or otherwise (e.g., dialog, lexis-nexis, cd-roms, dvds, and tapes), the undergraduate may go directly to the internet terminal and thus skip the librarian's counsel and hand-holding, which used to be vital for accessing printed material.
unless the undergraduate student is well groomed in searching the internet, this relatively new tendency to act independently of the information professional may result in hours of useless roaming on the net with little relevant information retrieved.

the graduate user

in their study of business students, atkinson and figueroa found that graduates reported fewer hours spent in the library than undergraduates.7 the researchers did not attempt to explain why that was so. perhaps because of their search skills, graduates do more focused information seeking and do not waste much of their time browsing and floundering in the unknown information abyss within the library. the researchers reported an equal interest in searching internet resources and online databases (e.g., lexis-nexis, dow jones, and abi/inform) among graduates and undergraduates. however, their research was done at the end of 1995 and the beginning of 1996, before the proliferation of search engines on the internet. as an information searcher, the graduate is more sophisticated than the undergraduate. subject coverage is usually more clearly defined in many of the assignments encountered. he or she has gone through most of the pitfalls of the undergraduate experience and can select a subject and research it relatively well. most likely due to the nature of their assignments, undergraduates' information needs may be satisfied by simple information systems that allow users to browse, and their searches also tend to be less exhaustive than graduates'. graduates, on the other hand, are faced with relatively narrower subjects and prefer to conduct more comprehensive searches.8

the post-doctoral researcher-faculty

faculty have mastered the art of getting relevant information. many belong to the informal invisible college and attend professional conferences, both of which are used to get information for teaching and research. hart's study found that formal sources, which may be found in the personal and college or university library, are more important in faculty's information-seeking effort than informal ones.9 according to hart, this information-seeking characteristic would be applicable to printed and electronic resources found on the internet. although our research did not specifically test it, online databases tend to direct the end user to formalized, definitive, and tested resources more than internet search engines do. this would minimize user search time and maximize the relevance of the information needed by research faculty. in other words, while the listserv might be one of the internet substitutes for the invisible college, information found on it would be more acceptable to research faculty if it directs him or her to reliable and verifiable databases, i.e., information from cendata (the u.s. census bureau information database), edgar (the u.s. securities and exchange commission database), or dow jones.

developments in the electronic resources arena have made many hard copies less popular. subject-oriented databases can be searched either in the library or in faculty offices. curtis et al. researched the information-seeking behavior of health sciences faculty and found a relatively new and growing information-seeking characteristic. according to curtis et al., faculty tend to prefer to search electronic resources from their offices rather than go to the library.10
that is not surprising, for if a faculty member can access library catalogs and electronic databases, some of which can provide viewdata (full text), it is not necessary for him or her to go to the library for some of the information needed. in addition, if cd-rom databases are on a local area network accessible via the college online catalog, faculty may seldom go to a library whose resources are on the network via a library web site, telnet, or the traditional dial-up.

the pilot studies

with the general information-seeking behavior of academic users in mind, the researchers decided to investigate the use of search engines for information sources in academe in the new york metropolitan area. search engines were contrasted with databases, which may be url (uniform resource locator) accessible online via an internet browser, stand-alone on cd-rom, or on cd-rom towers linked by a library local area network. in her article on web search engines, schwartz discussed recent studies done on their performance. she pointed out that the end user is not often a participant in such studies.11 although our research was not on evaluation, we deliberately focused on the end user to gather statistics on the perception of web search engine utility in internet surfing and information seeking. kassel evaluated search engines, indicating their variety and complexity when used to search the internet.12 other relevant literature indicated the difficulty of navigating the internet for both the information professional and the end user. it also indicated how direct access to databases was a shortcut to retrieving some topical information. our periodic observations of internet users revealed heavy use of search engines. we suspected that end users use them to get topical information which might otherwise be easily obtained from online databases. consequently, we thought it necessary to conduct a study on end-user perception.

objectives

our objectives in embarking on the pilot studies were to:
1. find the frequency of internet use by end users. this would allow us to check whether there is a correlation between frequency of internet use and perception of search engine utility.
2. find the most popular search engine. examining the most popular search engine with respect to indexing policy might indicate whether it would generate more topical subject-type information.
3. gauge the use of online and cd-rom databases in the library. to help end users recognize what databases were involved in the research, common databases were listed on the questionnaire as examples.
4. gauge the use of search engines in libraries and information centers. common search engines likewise were listed to help the end user identify what they were.
5. relate the results to pragmatic library and information-center functioning in providing information.

methodology

four metropolitan new york academic institutions were selected: borough of manhattan community college; iona college; queens college of the city university of new york; and wagner college. the main criterion for selection was ease of access for the researchers. a composite sample of users was selected from these institutions to participate in the studies. the sample used was dynamic and self-selected in that whoever used the
"internet terminal" was a potential research subject. only end users as opposed to information professionals/librarians were used in the study . while subjects sat at the terminal, they were requested to complete the questionnaire and return it to the reference/information desk. simplicity dictated the design of the research and data collection instrument (questionnaire). it was one page, multicolored, and was entitled "internet use questionnaire." we estimated that it would take the subjects four to seven minutes to complete. our assumption daily 46% ivlonthly 9% weekly 45% in designing it to be simple and least time-consuming was that since the subjects were sitting at the terminals, they were time conscious. figure 1. frequency of internet use while subjects were asked to complete the questionnaire, they had the option not to. forty copies of the questionnaire were given to each academic institutional librar y, making a total of 160. useable returns were 155, or 97 percent. in addition to the questionnaire, we conducted exit interviews with some of the subjects who were using the internet terminals after they handed in the completed questionnaires. the purpose of the interviews was to have some idea as to how the users perceived the utility of the internet in getting electronic-based information . four questions were used: 1. how do you find the internet as an information source? 2. did you get what you needed from the internet ? 3. do you have a favorite search engine? 4. is there any point when you would seek the assistance of the reference librarian/information specialist? analysis of the data was done using the spss (statistical package for social science) package. we used descriptive statistics for general group tendencies-frequency of internet use and preferred sources for topical subject search. for inferential statistics we preferred the non-parametric pairwise two-tailed correlation coefficients, kendall's tau_b and spearman's rho statistics . microsoft's excel program package was used to draw some of the illustrations. results the study revealed that an overwhelming majority of subjects (91 percent) use the internet at least once a week (this includes those who use it daily) . an almost equal number (45 percent) use it weekly-(at least once a week); 46 percent use it at least once a day (see figure 1). as figure 2 shows, search engines are the predominant preferred tools for searching topical subjects on the search engine 84% figure 2. preferred sources for subject search online db 16% internet as contrasted to online and cd-rom databases. we used the two-tailed pairwise correlation coefficients to see whether there are correlations between frequency of internet use and tool preferences. as table 1 and table 2 indicate, subjects who used the internet monthly or weekly had high correlations with online databases . daily users, however, tended to have high correlations with search engines as tools to get to topical subject information sources. i interpretations and conclusions search engines certainly provide the most common access points utilized by library /information center users to get to electronic resources on the internet . unfortunately, the average user seems to have the impression that the internet is a be-all and almost a panacea to all information problems. kassel suggests 14 information technology and libraries i march 2000 reproduced with permission of the copyright owner. further reproduction prohibited without permission. 
interpretations and conclusions

search engines certainly provide the most common access points utilized by library and information center users to get to electronic resources on the internet. unfortunately, the average user seems to have the impression that the internet is a be-all and almost a panacea for all information problems. kassel suggests that, at best, search engines seem to reach just about half of the web pages available on the internet.13 sullivan has given several reasons why search engine coverage is incomplete and search results sometimes may be misleading.14 among the most cogent reasons are: documents may be changed after they have been picked up for inclusion; deleted materials may be displayed as available; and web sites or files which are password accessible are not covered. much of the information needed in academe is proprietary and available via database vendors. using search engines as the main recourse to topical information shortchanges the user and may lead to frustration unless the high user expectations are tempered by constant education by the information specialist.

the pilot studies do not give conclusive answers as to why the weekly and monthly internet users correlated with those who use online and cd-rom databases. it might be that they search the internet via search engines as supplements to conventional online sources. alternatively, they may search using search engines on an exploratory basis when they begin a relatively new subject. daily users who correlated with search engines might have mistaken the highway function of search engines for the actual sources, for example edgar, medline, or eric. it might have been the problem of confusing "the end" with the "means to the end."

table 1. nonparametric correlations (spearman's rho) among frequency-of-use categories (daily, monthly, weekly) and preferred sources (search engines, online databases).

implications for information professionals

our studies indicated that a majority of the users in the sample preferred search engines as access points to the internet for topical information. the interest in search engines correlated with the state university of new york at albany study, which also indicated their predominant use in searching the internet.15 while the albany study was general, ours related the search engines to getting topical information and the use of online databases as an alternative. our findings point to the need to re-educate the internet user in several aspects of the superhighway. first, content: only a fraction of the possible sites (approximately one half) are indexed by the search engines. second, authority: because it is so easy to self-publish on the internet, a lot of information of low integrity or factual inaccuracy, for instance, may be mistaken for reliable sources. third, transiency of information found on the internet must be pointed out; the maxim "here today, gone tomorrow" is appropriate for several web sites on the internet.
table 2. nonparametric correlations (kendall's tau_b) among frequency-of-use categories (daily, monthly, weekly) and preferred sources (search engines, online databases).

finally, information professionals must emphasize in their training the proven online databases to which users should go directly, if and when those databases are provided by the library or information center. information professionals have a direct link to providing users with guidance to proven online databases, specifically during course-integrated instruction. education for the end user is paramount to the optimum utilization of electronic information sources. a well-developed information resources instruction program is needed in conjunction with the one-on-one instruction that takes place every day at the reference/information desk. such instruction programs must be cumulative if they are to be effective in an age of burgeoning choices for end users, who can more and more often choose to be remote users of information resources. in an academic environment, early intervention at the freshman level is paramount, but instruction also must be pursued in a structured manner at the upper levels. many college and university information resources instruction programs are based on a one-shot, approximately fifty-minute session, which often is executed as an orientation to the library or information center. such a method of instruction offers no guarantee that further guidance will be sought, either at the behest of a teaching faculty member in the form of course-integrated instruction or on an individual level at the reference desk. developing effective ways to integrate information resources instruction into the lives of end users is one of the challenges information professionals face in the new millennium, with an increase in the use of electronic resources found on the internet.

references and notes
1. jon guice, "looking backward and forward at the internet," the information society 14, no. 3 (july/sept. 1998): 201-11.
2. g. mcmurdo, "the net by numbers," journal of information science 22, no. 5 (1996): 1397-411.
3. n. l. pelzer and others, "library use and information seeking behavior of veterinary medical students revisited in the electronic environment," bulletin of the medical library association 86, no. 3 (july 1998): 346-55.
4. harry m. kibirige, "viewdata," in encyclopedia of electrical and electronics engineering, vol. 23, ed. g. webster (new york: john wiley, 1999), 223-31.
5. department of labor, the secretary's commission on achieving necessary skills, skills and tasks for jobs (washington, d.c.: department of labor, 1992).
6. gloria l. leckie, "desperately seeking citations: uncovering faculty assumptions about the undergraduate search process," journal of academic librarianship 22, no. 3 (1996): 202-208.
7. joseph d. atkinson and miguel figueroa, "information seeking behavior of business students: a research study," the reference librarian 58 (1997): 59-73.
8. deborah shaw, "bibliographic database searching by graduate students in language and literature: search strategies, systems interfaces, and relevance judgements," library & information science research 17, no. 4 (fall 1995): 327-45.
9. richard l.
hart, "information gathering among the faculty of a comprehensive college : formality and globality," journal of academic librarianship 23, no . 1 (jan. 1997): 21-27. 10. k. l. curtis and others, "information-seeking behavior of health science faculty: the impact of new information technologies," bulletin of the medical library association 85, no . 4 (oct. 1997): 402-10. 11. candy schwartz, "web search engines," journal of the american society for information science 49, no. 11 (sept. 1998) 973-82. 12. amelia kassel, "internet power searching : finding pearls in a zillion grains of sand," information outlook (apr . 1999): 28-32. 13. ibid. 14. danny sullivan , "search engine coverage study published," search engine watch. accessed march 11, 2000, www .searchenginewatch.com. / sereport/99 /os-size.html. 15. wei peter he, "what are they doing on the internet?: study of information seeking behaviors," internet reference services quarterly 1, no. 1 (1996): 31-51 . 16 information technology and libraries i march 2000 lib-s-mocs-kmc364-20141005045405 bibcon-a general purpose software system for marc-based book catalog production liz gibson: assistant systems analyst, california state library, sacramento. 237 the bibcon file management system, designed for use on ibm 360 system equipment, performs two basic functions: (1) it creates marc structured, bibliographic records from untagged input data; (2) from these records it produces page image output for book catalogs. the system accepts data from several different input devices and can produce a variety of output formats by line printer, photocomposition, or computer output microform (com). introduction bibcon is a general purpose data management system for bibliographic records control (i.e., for creating, manipulating, formatting and outputting of marc structured bibliographic records from catalog card input data). the system, shown in figure 1, consists of seven basic programs which functionally divide into two parts: (a) four programs for creation and correction of marc-like records; and (b) three programs and an ibm utility sort for formation of book catalog entries from these records. obviously, a detailed description of such a large and complicated system is impossible in one journal article. a detailed description of the system specifications and user instructions has been prepared and published by the california state library.1 the bibcon system was cooperatively developed by the institute of library research, berkeley; the library systems development project, santa barbara; and the library systems offices at the santa cruz and berkeley campuses of the university of california. the system was developed in response to the needs of the university of california ( uc) and of the california state library ( csl) for efficient production of author, title, and added entry listings of their monographic holdings for distribution to their respective clientele groups. the general system requirements for both libraries were the same: (a) with a minimum of expensive manual keying, bibliographic data 238 journal of library automation vol. 6/4 december 1973 i nitial input data processor vjo~oco}fp. conversion program f1l( update processor allto>ianc field ~~cog. mal\cilrec. forma'tter sked output sort~recoih) cre.o\tt)r p.~>cero~u outi'\i'rpace formatter fig. l. bibcon: basic system schematic inmallnflri' data l'i\oc:essor pblm'sus proof listing forma. 
the bibcon system was cooperatively developed by the institute of library research, berkeley; the library systems development project, santa barbara; and the library systems offices at the santa cruz and berkeley campuses of the university of california. the system was developed in response to the needs of the university of california (uc) and of the california state library (csl) for efficient production of author, title, and added entry listings of their monographic holdings for distribution to their respective clientele groups. the general system requirements for both libraries were the same: (a) with a minimum of expensive manual keying, bibliographic data must be prepared for book catalog production, with any of the standard catalog entries as keys; (b) provision must be made for the widest feasible variety of columnar output formats; (c) the format for any machine-readable records must be compatible with the marc standard. the system has been installed with revisions and modifications on an ibm 360 model 50 computer used by the california state library.
fig. 2. variable field tags-afr-marc ii: 090 call number; 100 main entry; 240 uniform title; 245 title; 300 collation; 500 notes; 650 subject added entry; 700 author added entry; 740 title added entry (traced differently); 810 series added entry (traced differently); 099 remaining unspecified data; 400 series, traced (personal); 410 series, traced (corporate); 440 series, traced (title); 490 series, untraced or traced differently.
all programs in this version are written in the ibm basic assembler language (bal) instead of the original combination of bal and cobol. in its first version, bibcon processed monographic records exclusively. various programs have now been modified so that the system will also process serial records in a simplified marc serials format. this article, however, will describe only the system for processing monographic records. the system has been used to produce catalogs of monographs for uc santa cruz, uc san diego, and the one million record supplement to the uc catalog of books.2-4 portions of the system were used to produce the initial copies of the university of california union list of serials. the california state library automation project is using this basic file management system to process both monographic and serial records for the production of several book catalogs. these will include, principally, the california union list of periodicals, reflecting the periodical holdings of libraries throughout california, the california state library list of periodicals, and the catalog of books in the california state library.
automatic field recognition (afr)
at the heart of the system is the program which creates marc-like records from unedited input data. this program, called automatic field recognition or afr, identifies control and variable fields and creates a leader and record directory for each record submitted to it. in order to accomplish this, when a record is submitted to the program, it first sets aside areas into which data for each of the four parts can be placed.
fig. 3. variable field tags-lc-marc ii (the full lc-marc ii tag list, from the 010 lc card number through the 8xx series added entries)
the field identification progresses on the basis of two signal symbols which are inserted between fields during input and on the basis of the order and content of the fields. when a control or variable field is identified, a standard marc record directory entry is created, containing the afr-marc ii field tag, the length of the field, and the starting character position of the field (figure 2). necessary indicators and subfield delimiters are also created and placed in their proper positions in the field's data stream, and the field, along with its field terminator, is placed into the area set aside for data fields.
afr-marc ii records
it is important to emphasize that the system produces marc-like records rather than full marc records. while the basic record structure is exactly like that of standard library of congress marc, distinctions such as personal versus corporate main entry are not shown by the field tagging, and the degree of subfield delimiting is extremely restricted.5 compare the list of variable field tags for afr-marc ii (automatic field recognition marc ii) records to that for lc-marc ii (library of congress marc ii) records (figures 2 and 3). at present, afr-marc ii provides detailed subfield tagging for only two fields, call number (090) and title (245). this lack of detailed discrimination causes no problem, however, for output of book catalog entries. it can affect filing sequence, since ala filing rules depend on such distinctions as personal versus corporate author to determine proper sorting. the decision to omit detailed subfield discrimination is a concession to cost. the two principal developers (uc and csl) decided that, for book catalog production, detailed subfield delimiting would be of little value and that the benefits of such detail (i.e., the ability to sort according to lc filing rules) would not justify the added costs in editing, input, programming, and processing which would be required to provide this detail.
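to make the directory-building step concrete, the following is a minimal illustrative sketch, written in python rather than the system's assembler; the separator symbol, the tag-guessing heuristics, and the record layout are invented stand-ins for this sketch, not bibcon's actual logic.

# illustrative sketch of afr-style directory building: each keyed field gets
# a directory entry recording a guessed tag, its length, and its starting
# character position. symbols and heuristics are simplified stand-ins.

FIELD_SEP = "/"          # stand-in for the field separator signal symbol
FIELD_TERMINATOR = "\x1e"

def guess_tag(position, text):
    """crude stand-in for afr's order-and-content heuristics."""
    if position == 0:
        return "090"                     # call number keyed first
    if position == 1:
        return "100"                     # main entry
    if text and text[0].isdigit():
        return "300"                     # collation often starts with paging
    return "245" if position == 2 else "500"

def build_record(input_string):
    fields = input_string.split(FIELD_SEP)
    directory, data = [], ""
    for pos, text in enumerate(fields):
        field_data = text.strip() + FIELD_TERMINATOR
        directory.append({"tag": guess_tag(pos, text),
                          "length": len(field_data),
                          "start": len(data)})
        data += field_data
    return {"directory": directory, "data": data}

print(build_record("Z7164.S66U5/United Nations Educational, Scientific and Cultural Organization./Education for community development/49 p. 28 cm."))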
a sample of an afr-marc ii record created by the automatic field recognition program is shown in figures 4 and 6. it can be contrasted with the lc-marc ii record for the same title (figures 5 and 7). both a machine-based representation (figures 4 and 5) and a formatted output example (figures 6 and 7) of the record are shown.
fig. 4. sample library of congress card in afr-marc ii format
fig. 5. sample library of congress card in lc-marc ii format
input data
afr creates marc structured records from unedited input data. to what does "unedited" input refer? without a program such as afr, each marc field tag, subfield code, indicator, etc., for every marc or marc-like record must be manually supplied by a human editor. with afr the input keyer simply indicates that some field is beginning; it is then up to the afr program to identify the field. afr will accept input created by a variety of methods. the decision on input method is based principally on cost. since input costs can vary widely as a result of various local conditions, provision has been made in the bibcon system to accept data in card or tape format. keypunch and optical character recognition (ocr) input are the two methods used thus far. a sample ocr input record appears in figure 8. while input instructions will vary according to the input method used, the four basic keying requirements remain the same: 1. begin an input record with an identification number. 2. place a field separator symbol before each field (i.e., each indention on the catalog card). 3. place a different symbol (called a "location" symbol) after the call number and after the library location data. 4. end each input record with an end-of-record symbol.
fig. 6. afr-marc ii printsus output format
variations on these four basic rules may be required because of restrictions of the input device used, because of variations in content or form of the input data, or because output specifications require nonstandard treatment by the programs. the task of manipulating the varying input into a form which is acceptable to afr is performed by a program called preafr.
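as an illustration of these keying conventions, here is a small sketch in python; the "/" separator, "=" location symbol, and "$" end-of-record symbol are stand-ins chosen for readability, not the project's actual codes.

# an illustrative record keyed according to the four requirements above.
record = (
    "0000001"            # 1. identification number begins the record
    "/Z7164.S66U5="      # 2./3. field separator before the call number, location symbol after it
    "/REF="              #      library location data, again closed by the location symbol
    "/United Nations Educational, Scientific and Cultural Organization."
    "/Education for community development; a selected bibliography"
    "/49 p. 28 cm."
    "$"                  # 4. end-of-record symbol
)

# a keying check: every record must start with an id and end with "$".
assert record[:7].isdigit() and record.endswith("$")
fields = record[7:-1].split("/")[1:]   # drop the id and the end symbol, then split on separators
print(len(fields), "fields keyed")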
pre automatic field recognition (preafr)
this program provides the interface between any one of the different input methods and the afr program. basically, preafr accepts data from keypunched cards, and ocr preafr accepts it from tape records. both forms of the preprocessing program combine input data segments until an end-of-record symbol is reached, indicating that all the data for one bibliographic record have been assembled. a character by character search is made, and special characters and diacriticals which were input as special codes are translated into the values necessary for output processing.
fig. 7. lc-marc ii printsus output format
in addition, the program can perform several editing and checking functions. these functions are optional and are dependent upon the input equipment and upon the wishes of the user. options such as deletion of data on the basis of special input symbols, checking to determine that the record control number is valid, and production of a file of control numbers for records in which data could not be interpreted by the input device are standard. because this program provides the interface between different, nonstandard input methods and one standard record formatting program, it is very user-dependent. the basic logic will remain the same, but individual options will have to be added or subtracted by each separate user.
fig. 8. sample ocr input (data are from the catalog card shown in figure 5)
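a small illustrative sketch, in python, of the preprocessing just described: segments are accumulated until an end-of-record symbol appears, and special input codes are translated. the segment contents, codes, and symbols here are invented examples rather than preafr's actual conventions.

# preafr-style preprocessing sketch: assemble whole logical records from
# input segments, then translate special input codes.
END_OF_RECORD = "$"
SPECIAL_CODES = {"{{": "[", "}}": "]", "=": ""}   # invented stand-ins for ocr special codes

def translate(text):
    """replace special input codes (shown with str.replace for brevity;
    the original makes a character-by-character pass)."""
    for code, value in SPECIAL_CODES.items():
        text = text.replace(code, value)
    return text

def assemble_records(segments):
    """combine input segments until an end-of-record symbol is reached."""
    buffer, records = "", []
    for seg in segments:
        buffer += seg
        while END_OF_RECORD in buffer:
            rec, _, buffer = buffer.partition(END_OF_RECORD)
            records.append(translate(rec))
    return records

print(assemble_records(["0000001/=Education for comm", "unity development$0000002/=Day nurseries$"]))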
fig. 9. preafr output data, printed from tape record
preafr produces a file of variable length, machine-readable records (figure 9) which are passed to afr for formatting into a marc structure with limited marc ii tagging as described in the section on afr.
record proofing and correcting
printsus
printsus is an output program which provides formatted afr-marc ii records, showing field tag, subfield delimiters, indicators, etc. this printout is designed for proofing of the marc records created by afr. samples of this type of output appear in figures 6 and 7.
fix
by processing data according to "fix commands," this program corrects records in marc format, operating as a context editor. corrections can be made to content or structure. entire records can be deleted and new records can be created using fix "correction" statements. when any change is made, fix automatically updates the record's leader and directory to reflect the record as changed. there are two input files: bibliographic records, in marc format, and the fix correction data. the input records file must be in marc format and must be in the same order (by record i.d. numbers) as the fix correction data file in order to successfully update the records. the fix program method of making corrections is based on the fix expression, which can be considered as a "language," with rules of grammar governing the structure of expressions (sentences), the order of elements within the expressions, and the possible contents of each element (see figure 10).
fig. 10. sample fix data, illustrating fix operations
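the following is a minimal sketch, in python, of a fix-style correction that substitutes text in one field and then updates the directory so lengths and starting positions stay consistent; the record layout follows the simplified structure of the earlier sketches, not real marc, and the sample data echoes the "report.san francisco" correction shown in figure 10.

# fix-style context editing sketch: change field content, then keep the
# directory (lengths and starting positions) consistent with the new data.
def apply_fix(record, tag, old, new):
    for entry in record["directory"]:
        if entry["tag"] == tag:
            start, length = entry["start"], entry["length"]
            field = record["data"][start:start + length].replace(old, new)
            record["data"] = record["data"][:start] + field + record["data"][start + length:]
            delta = len(field) - length
            entry["length"] += delta
            # shift the starting positions of every later field (the "directory update")
            for later in record["directory"]:
                if later["start"] > start:
                    later["start"] += delta
    return record

record = {"directory": [{"tag": "245", "start": 0, "length": 14},
                        {"tag": "300", "start": 14, "length": 8}],
          "data": "san francisco.v. 24cm."}
print(apply_fix(record, "245", "san francisco.", "report.san francisco."))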
output processor
the output processor consists of three programs and an ibm utility sort program. these general-purpose programs, which are designed to create book catalog page output, allow a variety of options for sorting as well as formatting.
sort key edit (sked)
this program performs two major functions (figure 11), as follows: (a) from a single marc record it creates a record for each point of access to that record as specified by the program user; and (b) it establishes a 256 character sort key at the head of each record extracted. the file is then passed to an ibm sort package for sequencing.
record extraction
sked does not actually extract data from the original marc record. instead, it replicates the full record for each access point specified. it is left to the biblist program to extract the required data from these records. thus, if a particular bibliographic record should have five access points (one for main entry, one for title, two for subjects, and one for some other added entry), sked would output five full marc records. essentially the only differences in the output sked records would be in the data found in the sort keys prefixed to each record. the record for main entry access would contain main entry data as its first element; the title entry access record would contain title data first, etc.
sort key creation
data for the sort key are selected on the basis of user-specified tables.
fig. 11. general description of sked subsystem (a field table, field control table, and subfield control table govern which fields are extracted, how the 256-byte sort keys are built, and how the replicated sked records are created)
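a compact illustrative sketch, in python, of the replication-plus-sort-key behavior described above; the 256-byte key length comes from the article, while the access-point tags, the normalization, and the record layout are simplified stand-ins (the sample name and title echo the smith/first world war example in figure 11).

# sked-style sketch: replicate the record once per access point and prefix a
# fixed-length sort key whose leading element is that access point's data.
KEY_LENGTH = 256
ACCESS_TAGS = ["100", "245", "650", "700"]   # main entry, title, subject, added entry

def normalize(text):
    return "".join(ch for ch in text.upper() if ch.isalnum() or ch == " ")

def sked(record):
    copies = []
    fields = {e["tag"]: record["data"][e["start"]:e["start"] + e["length"]]
              for e in record["directory"]}
    for tag in ACCESS_TAGS:
        if tag in fields:
            # access-point data first, then the main entry, as in the figure
            key = normalize(fields[tag]) + " " + normalize(fields.get("100", ""))
            copies.append({"sort_key": key[:KEY_LENGTH].ljust(KEY_LENGTH), "record": record})
    return sorted(copies, key=lambda c: c["sort_key"])

record = {"directory": [{"tag": "100", "start": 0, "length": 12},
                        {"tag": "245", "start": 12, "length": 15}],
          "data": "Smith, John.First World War"}
for copy in sked(record):
    print(copy["sort_key"].rstrip())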
sample sked tables (author/title list)
fig. 15. sample bibcon output: csl education catalog
additionally, portions of the software have been transferred successfully to the hennepin county library, minnesota.
disadvantages
1. personnel dependency: bal: the system is written in basic assembler language, thus necessitating the services of an experienced programmer. marc: because the system operates upon the marc structured record format, the average programmer may well have a difficult time in dealing with the added complexities introduced by this aspect. options: the wide range of options provided by the system necessitates highly complex programs which may be difficult for the average programmer to grasp readily.
2. equipment dependency: ibm: because the programs are written in ibm basic assembler language, the system is presently usable on ibm equipment only.
conclusion
the bibcon-360 system is a versatile and inexpensive method for producing book catalogs when a wide range of format options are required and when the catalogs must contain bibliographic information with more than one entry or access point per bibliographic record. if a simple, main entry catalog is needed, microfilm reproduction of the catalog cards may still be much cheaper. bibcon-360 is most useful for producing large scale catalogs (e.g., union catalogs) to be distributed widely, to assist in the effort to provide the widest possible dissemination of library information at the least possible cost.
references
1. california. state library, sacramento. automation project, a users' manual for bibcon 360; a file management system for bibliographic records control (sacramento: california state library, 1972), 274p. (this manual, produced in limited quantities, is now available only on interlibrary loan.)
2. university of california, santa cruz, author-title catalog of the university library (santa cruz: university of california, 1970), 32 v.
3. university of california, san diego, author-title catalog; subject catalog (san diego: san diego medical society-university library, 1969), 350p.
4. california. university. institute of library research, university of california union catalog of monographs; cataloged by the nine campuses from 1963 through 1967; a supplement to the catalogs of the university libraries at berkeley and los angeles published in 1963 (berkeley: university of california, 1972), 47 v.
5. u.s. library of congress. information systems office, marc manuals used by the library of congress (chicago: ala, 1970), p.42.
6. california. state library, sacramento, recent works in the california state library in science and technology (sacramento: california state library, 1972), p.426.
7. california. state library, sacramento, special education problems; a catalog of materials in the california state library (unpublished). (this topical catalog was output only to test refinements to the bibcon-360 programs. it was not published, but the sample pages produced illustrate further refinements in formatting and sorting routines.)
editorial board thoughts
a&i databases: the next frontier to discover
mark dehmlow
information technology and libraries | march 2015
i think it is fair to say that the discovery technology space is a relatively mature market segment, not complete, but mature. much of the easy-to-negotiate content has been negotiated, and many of the systems on the market are above or approaching a billion records. this would seem a lot, but there is a whole slice of tremendously valuable content still not fully available across all platforms, namely the specialized subject abstracting and indexing database content. this content has significant value for the discovery community—many of those databases go further back than content pulled from journal publishers or full-text databases. equally important is that they represent an important portion of humanities and social sciences content that is less well represented in discovery systems than stem content.
for  vendors  of  a&i  content,  the   concerns  are  clear  and  realistic,  differently  from  journal  publishers  whose  metadata  is  meant  to   direct  users  to  their  main  content  (full  text),  the  metadata  for  a&i  publishers  is  the  main  content.     according  to  a  recent  nfais  report,  a  major  concern  for  them  is  that  if  they  include  their  content   in  discovery  systems,  they  “risk  loss  of  brand  awareness”  and  the  implications  are  that  institutions   will  be  more  likely  to  cancel  those  subscriptions.1    the  focus  therefore  seems  to  have  been  how  to   optimize  the  visibility  of  their  content  in  discovery  systems  before  being  willing  to  share  it.       in  addition  to  the  nfais  report,  some  of  the  conversations  i  have  seen  on  the  topic  seem  to  focus   on  wanting  discovery  system  providers  to  meet  a  more  complex  set  of  requirements  that  will   maximize  leveraging  the  rich  metadata  contained  in  those  resources,  the  idea  being  that  utilizing   that  metadata  in  specific  ways  will  increase  the  visibility  of  the  content.    in  principle  i  think  it  is  a   commendable  goal  to  maximize  the  value  of  the  comprehensive  metadata  a&i  records  contain,   and  the  complexities  of  including  a&i  data  into  discovery  systems  need  to  be  carefully  considered   -­‐  namely  blending  multiple  subject  and  authority  vocabularies,  and  ensuring  that  metadata   records  are  appropriately  balanced  with  full  text  in  the  relevancy  algorithm.  but  i  also  worry  that   setting  too  many  requirements  that  are  too  complicated  will  lead  to  delayed  access  and  biased   search  results.    it  is  important  that  this  content  is  blended  in  a  meaningful  way,  but  determining   relevancy  is  a  complex  endeavor,  and  it  is  critically  important  for  relevancy  to  be  unbiased  from   the  content  provider  perspective  and  instead  focus  on  the  user,  their  query,  and  the  context  of   their  search.       another  concern  that  i  have  heard  articulated  is  that  results  in  discovery  services  are  unlikely  to     be  as  good  as  native  a&i  systems  because  of  the  already  mentioned  blending  issues.    this  is  likely     mark  dehmlow  (mark.dehmlow@nd.edu),  a  member  of  the  ital  editorial  board,  is  program   director,  library  information  technology,  university  of  notre  dame,  south  bend,  in.       editorial  board  thoughts:  a&i  databases  |  dehmlow     2   to  be  true,  but  i  think  it  is  critical  to  focus  on  the  purpose  of  discovery  systems.    as  donald   hawkins  recently  wrote  in  a  summary  of  a  workshop  called  “information  discovery  and  the   future  of  abstracting  and  indexing  services,”  “a&i  services  provide  precision  discipline-­‐specific   searching  for  expert  researchers,  and  discovery  services  provide  quick  access  to  full  text.”2     hawkins  indicates  that  discovery  systems  are  not  meant  to  be  sophisticated  search  tools,  but   rather  a  quick  means  to  search  a  broad  range  of  scholarly  resources  and  i  think  sometimes  a  quick   starting  point  for  researchers.    
because  of  the  nature  of  merging  billions  of  scholarly  records  into  a   single  system,  discovery  systems  will  never  be  able  to  provide  the  same  experience  as  a  native  a&i   system,  nor  should  they.    over  time,  they  may  become  better  tuned  to  provide  a  better  overall   experience  for  the  three  different  types  of  searchers  we  have  in  higher  education:  novice  users  like   undergraduates  looking  for  a  quick  resource,  advanced  users  like  graduate  students  and  faculty   looking  for  more  comprehensive  topical  coverage,  and  expert  users  like  librarians  who  want   sophisticated  search  features  to  hone  in  on  the  perfect  few  resources.    many  of  the  discovery   systems  are  working  on  building  these  features,  but  the  industry  will  take  time  to  solve  this   problem,  and  i  tend  to  look  at  things  from  the  lense  of  our  end  users—non-­‐inclusion  of  this   content  directly  impacts  their  overall  discovery  experience.   one  might  ask,  if  the  discovery  system  experience  isn’t  as  precise  and  complete  as  the  native  a&i   experience,  why  bother?    in  addition  to  broadening  the  subject  scope  by  including  many  of  the   more  narrow  and  deep  subject  metadata,  there  is  also  the  importance  of  serendipitous  finding.     that  content,  in  the  context  of  a  quick  user  search,  may  drive  the  user  to  just  the  right  thing  that   they  need.    in  addition,  my  belief  is  that  with  that  content,  we  can  build  search  systems  that  are   deeper  than  google  scholar,  and  by  extension  provide  our  end  users  with  a  superior  search   experience.    and  so  i  advocate  for  innovating  now  instead  of  waiting  to  work  out  all  of  the  details.     i  am  not  suggesting  moving  forward  callously,  but  swiftly.    the  work  that  niso  has  done  on  the   open  data  initiative  has  resulted  in  some  good  recommendations  about  how  to  proceed.    for   example,  they  have  suggested  two  usage  metrics  that  could  be  valuable  for  measuring  a&i  content   use  in  discovery  systems:  search  counts  (by  collection  and  customer  for  a&i  databases)  and   results  clicks  (number  of  times  an  end  user  clicks  on  a  content  provider’s  content  in  a  set  of   results).3     while  i  think  these  types  of  metrics  are  aligned  with  the  types  of  measures  that  libraries  evaluate   a&i  database  usage  by,  i  think  at  the  same  time  they  don’t  really  say  much  about  the  overall  value   of  the  resources  themselves.    sometimes  in  the  library  profession,  our  obsession  for  counting  stuff   loses  connection  with  collecting  metrics  that  actually  say  something  about  impact.    of  the  two   counts,  i  could  see  perhaps  counting  the  result  clicks  as  having  more  value.    in  this  instance,   knowing  that  a  user  found  something  of  interest  from  a  specific  resource  at  the  very  least  indicates   that  it  led  the  user  some  place.    i  think  the  measure  of  search  counts  by  collection  is  less  useful.    at   best  it  indicates  that  the  resource  was  searched,  but  it  tells  us  nothing  about  who  was  searching   for  an  item,  what  they  found,  or  what  they  subsequently  did  with  the  item  once  they  found  it.    i  do   think  we  in  libraries  need  to  consider  the  bigger  picture.   
 regardless  of  the  number  of  searches   information  technology  and  libraries  |  march  2015     3   (which  doesn’t  really  tell  us  anything  anyway),  we  need  to  recognize  the  value  alone  of  including   the  a&i  content,  and  instead  of  trying  to  determine  the  value  of  the  resource  by  the  number  of   times  it  was  searched,  focus  more  on  the  breadth  of  exposure  that  content  is  getting  by  inclusion   in  the  discovery  system.   i  think  a  more  useful  technical  requirement  for  discovery  providers  would  be  to  provide  pathways   to  specific  a&i  resources  within  the  context  of  a  user’s  search—not  dissimilar  to  how  google   places  sponsored  content  at  the  top  of  their  search  results,  a  kind  of  promotional  widget.    in  this   case,  using  metadata  returned  from  the  query,  the  systems  could  calculate  which  one  or  two   specific  resources  would  guide  the  user  to  more  in  depth  research.    by  virtue  of  inclusion  of  the   resource  in  the  discovery  system,  those  resources  could  become  part  of  the  promotional  widget.     this  would  guide  users  back  to  the  native  a&i  resource  which  both  libraries  and  a&i  providers   want,  and  it  would  do  that  in  a  more  intuitive  and  meaningful  way  for  the  end  user.   all  of  the  parties  involved  in  the  discovery  discussion  can  bring  something  to  the  table  if  we  want   to  solve  these  issues  in  a  timely  way.    i  hope  that  a&i  publishers  and  discovery  system  providers   make  haste  and  get  agreements  underway  for  content  sharing  and  i  would  recommend  that   instead  of  focusing  on  requiring  finished  implementations  based  in  complex  requirement  before   loading  content,  both  of  them  should  instead  focus  on  some  achievable  short  and  long  term  goals.     integrating  a&i  content  perfectly  will  take  some  time  to  complete  and  the  longer  we  wait,  the   longer  our  users  have  a  sub-­‐optimal  discovery  experience.    discovery  providers  need  to  make  long   term  commitments  to  developing  mechanisms  that  satisfy  usage  metrics  for  a&i  content,  although   i  would  recommend  defining  measures  that  have  true  value.    a&i  providers  should  be  measured  in   their  demands:  while  their  stakes  in  system  integration  is  real,  there  runs  a  risk  of  content   providers  vying  for  their  content  to  be  preferred  when  relevancy  neutrality  is  paramount  for  a   discovery  system  to  be  effective.    i  think  it  is  worth  lauding  the  efforts  of  a  few  trailblazing  a&i   publishers  such  as  thomson  reuters  and  proquest  who  have  made  agreements  with  some  of  the   discovery  providers  and  are  sharing  their  a&i  content  already,  providing  some  precedent  for   sharing  a&i  content.    lastly,  libraries  and  knowledge  workers  need  to  develop  better  means  for   calculating  overall  resource  value,  moving  beyond  strict  counts  to  thinking  of  ways  to  determine   the  overall  scholarly/pedagogical  impact  of  those  resources  and  they  need  to  make  the  fact  alone   that  an  a&i  publisher  shares  its  data  with  a  discovery  provider  indicate  significant  value  for  the   resource.                       editorial  board  thoughts:  a&i  databases  |  dehmlow     4     references     1.    
nfais, recommended practices: discovery systems. nfais, 2013. https://nfais.memberclicks.net/assets/docs/bestpractices/recommended_practices_final_aug_2013.pdf.
2. hawkins, donald t., "information discovery and the future of abstracting and indexing services: an nfais workshop." against the grain, 2013. http://www.against-the-grain.com/2013/08/information-discovery-and-the-future-of-abstracting-and-indexing-services-an-nfais-workshop/.
3. open discovery initiative working group, open discovery initiative: promoting transparency in discovery. baltimore: niso, 2014. http://www.niso.org/apps/group_public/download.php/13388/rp-19-2014_odi.pdf.
editorial board thoughts: the importance of staff change management in the face of the growing "cloud"
mark dehmlow
information technology and libraries | march 2016
mark dehmlow (mdehmlow@nd.edu), a member of lita and the ital editorial board, is the director, information technology program, hesburgh libraries, university of notre dame, south bend, indiana.
the library vendor market likes to throw around the word "cloud" to make their offerings seem innovative and significant. in many ways, much of what the library it market refers to as "cloud," especially saas (software as a service) offerings, is really just a fancier term for hosted services. the real gravitas behind the label came from grid computing: large, interconnected, and quickly deployable infrastructure like amazon's aws or microsoft's azure platforms. infrastructure at that scale and that level of geographic distribution was revolutionary when it emerged. still, these offerings at their core are basically iaas (infrastructure as a service) bundled as a menu of services, so i think the most broadly applicable synonym for the "cloud" could be "it as a service" in its various forms. outsourcing in this way isn't entirely new to libraries. the function and structure of oclc has arguably been one of the earlier instantiations of "it as a service" for libraries vis-à-vis the marc record aggregation and distribution that oclc has been doing for decades. the more recent trend toward hosted it services has been relatively easy for non-it units in our library; to most library staff, a service looks no different based on where it is hosted, and with many services implementing apis for libraries, that distinction is becoming less significant for our application developers too. for many of our technology staff, who have built careers around systems administration, application development, systems integration, and application management, hosted services represent a threat not only to their livelihoods but in some ways also to their philosophical perspectives, which are grounded in open source and do-it-yourself beliefs. in many ways the "cloud" for the it segment of our profession is perhaps more synonymous with change, and with change comes the need for effective management of that change, especially for the human element of our organizations. recently, our office of information technologies started an initiative to move 80% of their technology infrastructure into the cloud. they have proposed an inverted pyramid structure for determining where it solutions should reside — focusing first on hosted software as a service solutions for the largest segment of applications, followed by hosting those applications we would have typically installed locally onto a platform or infrastructure as a service provider, and then limiting only those applications that have specialized technical or legal needs to reside on premise.
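as a toy illustration of that ordering, and only an illustration (the criteria names are invented placeholders, not our actual policy), the decision flow might be sketched as:

# toy decision helper reflecting the inverted-pyramid order described above.
def placement(needs_special_compliance: bool, vendor_saas_available: bool) -> str:
    if needs_special_compliance:
        return "on premise"             # specialized technical or legal needs
    if vendor_saas_available:
        return "software as a service"  # first preference for most applications
    return "platform/infrastructure as a service (hosted)"

print(placement(False, True))   # -> software as a service
print(placement(False, False))  # -> platform/infrastructure as a service (hosted)
print(placement(True, False))   # -> on premise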
this is a big shift for our it staff, especially, but not limited to, our systems administrators. the iaas platform our university is migrating to is amazon web services, and their infrastructure is largely accessible via a web dashboard, so that the myriad tasks our systems administrators took days and weeks to do can now, in some adjusted way, be accomplished with a few clicks. this example is on one extreme end of the spectrum as far as it change goes, but simultaneously, we have looked at the vendor market to lease pre-packaged tools that support standard functions in academic libraries and can be locally branded and configured with our data — things like course guides, a-z journal lists, scheduling events, etc. the overarching goals of these efforts are cost savings and increased velocity and resiliency of infrastructure, but also, and perhaps more important, flexibility in how we invest our staff time. if we are able to move high level tasks from staff to a platform, then we will be able to reallocate our staff's time and considerable talent to take on the constant stream of new, high level technology needs. partnering with the university, we are aiming towards their defined goal of moving 80% of our technical infrastructure into the "cloud." we have adopted their overall strategy of approach to systems infrastructure, at least in principle, and are integrating into our own strategy significant consideration for the impact of these changes on our staff. our organization has recognized that people form not only habits around process, but also personal and emotional attachments to why we do things the way we do them, both from a philosophical as well as a pragmatic perspective. our approach to staff change is layered as well as long term. we know that getting from shock to acceptance is not an overnight process and that staff who adopt our overarching goals and strategy as their own will be more successful in the long term. to make this transition, we have developed several strategic approaches:
1. explaining the case: my experience is that staff can live through most changes as long as they understand why. helping them gain that understanding can take some time, but ultimately having that comprehension will help them fully understand our strategic goals as well as help them make decisions that are in alignment with the overall approach. i often find it is important to remember that, as managers, we have been a part of all of the change conversations and we have had time to assimilate ideas, discuss points of view, and process the implications of change. each of our staff needs to go through the same process, and it is up to leadership to guide them through that process and ensure they get to participate in similar conversations. it is tempting to want to hit an initiative running, but there is significant value in seeding those discussions gradually over time to more holistically integrate staff into the broader vision.
it is important to explain the case for change multiple times and actively listen to staff thoughts and concerns and to remember to lay out the context for change, why it is important, and how we intend to accomplish things. then reassure, reassure, and reassure. the threats to staff may seem innocuous or unfounded to managers, but staff need to feel secure during a process to ultimately buy in. 2. consistency and persistence: staff acceptance doesn’t always come easy — nor should it necessarily. listening and integrating their perspectives into the planning and information technology and libraries | march 2016 5 implementation process can help demonstrate that they matter, but equally important is that they feel our approach is built on something solid. stability is reinforced through consistency in messaging. not only in individual consistency, but also team consistency, and upper management consistency — everyone should be able to support and explain messaging around a particular change. any time staff approach me and say, “it was much easier to do it this other way,” i talk about the efficiency we will garner through this change and how we will be able to train and repurpose staff in the future. the more they hear the message, the more ingrained it becomes, and the more normative it begins to feel. 3. training and investment: it futures require investment, not just in infrastructure, but also in skill development. we continue to invest significantly in providing some level of training on new technologies that we implement. that training will not only prove to staff that you are invested in their development as well as their job security, but it will also give them the tools they need to be successful in implementing new technologies. change is anxiety inducing because it exposes so many unknowns. providing training helps build confidence and competence for staff, reducing anxieties and providing some added engagement in the process. it also gives them exposure to the real world implementation of technologies where they can begin to see the benefits that you have been communicating for themselves. 4. envisioning the future: improvements and roles — one of the initial benefits we will be getting from recouping staff time is around shoring up our processes. we have generally had a more ad hoc approach to managing the day to day. it has been difficult to institute a strong technical change management process, in part, because of time. we will be able to remove that consideration from our excuses as we take advantage of the “cloud.” the net effect will be that we will do our work more thoughtfully and less ad hoc and use better defined processes that will meet group-developed expectations. in addition to doing things better, we do expect to do things differently. with fewer tasks at the operational level, we believe we will be able to transition staff into newly defined roles. 
some of these roles include devops engineers, a hybrid of application engineering (the dev) and systems administration (the ops), these staff will help design automation and continuous integration processes that allow developers to focus on their programming and less on the environment they are deploying their applications in; financial engineers who will take system requirements and calculate costs in somewhat complex technical cloud environments; systems architects who will be focused on understanding the smorgasbord of options that can be tied together to provide a service to meet expected response performance, disaster recovery, uptime, and other requirements; and business analysts who will focus on taking technical requirements and looking at all of the potential approaches to solve that need whether it be a hosted service, a locally developed solution, an implementation of an open source system, or some integration of all or some of the editorial board thoughts: the importance of staff change management in the face of the growing “cloud” | dehmlow | doi: 10.6017/ital.v35i1.8965 6 above. this list is by no means exhaustive, but i think it forms a good foundation on which to help staff develop their skill set along with our changing environment. i believe it is important to remind those of us who are managing it departments in libraries that in many ways the easiest parts of change are the logistics. the technology we work with is bounded by sets of guidelines that define how they are used and ensure that if they are implemented properly, they will work effectively. people on the other hand are not bounded as neatly by stringent rules. they are guided by diverse backgrounds, personalities, experiences, and feelings. they can be unpredictable, difficult to fully figure out, and behaviorally inconsistent. and yet, they are the great constant in our organizations and therefore require significant attention. our field needs “servant leaders” dedicated to supporting and developing staff, and not just being competent at implementing technologies. those managers who invest in staff, their well-being, development, and sense of engagement in their jobs, will find their organizations are able to tackle most anything. but those who ignore their staffs’ needs over pragmatic goals will likely find their organizations struggling to move quickly and instead spend too much energy overcoming resistance instead of energizing change. article explainable artificial intelligence (xai) adoption and advocacy michael ridley information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14683 michael ridley (mridley@uoguelph.ca) is librarian, university of guelph. © 2022. abstract the field of explainable artificial intelligence (xai) advances techniques, processes, and strategies that provide explanations for the predictions, recommendations, and decisions of opaque and complex machine learning systems. increasingly academic libraries are providing library users with systems, services, and collections created and delivered by machine learning. academic libraries should adopt xai as a tool set to verify and validate these resources, and advocate for public policy regarding xai that serves libraries, the academy, and the public interest. 
introduction explainable artificial intelligence (xai) is a subfield of artificial intelligence (ai) that provides explanations for the predictions, recommendations, and decisions of intelligent systems.1 machine learning is rapidly becoming an integral part of academic libraries. xai is a set of techniques, processes, and strategies that libraries should adopt and advocate for to ensure that machine learning appropriately serves librarianship, the academy, and the public interest. knowingly or not, libraries acquire and provide access to systems, services, and collections infused and directed by machine learning methods, and library users are engaged in information behavior (e.g., seeking, using, managing) facilitated or augmented by machine learning. machine learning in library and information science (lis), as with many other fields, has become ubiquitous. however, this technology is often opaque and complex, yet consequential. there are significant concerns about bias, unfairness, and veracity.2 there are troubling questions about user agency and power imbalances.3 while lis has a long-standing interest in ai and intelligent information systems generally, 4 it has only recently turned its attention to xai and how it affects the field and how the field might influence it.5 xai is a critical lens through which to view machine learning in libraries. it is also a set of techniques, processes, and strategies essential to influencing and shaping this stil l emerging technology: research libraries have a unique and important opportunity to shape the development, deployment, and use of intelligent systems in a manner consistent with the values of scholarship and librarianship. the area of explainable artificial intelligence is only one component of this, but in many ways, it may be the most important.6 dismissing engagement with xai because it is “highly technical and impenetrable to those outside that community” is neither acceptable nor increasingly possible.7 artificial intelligence is the essential substrate of contemporary information systems and xai is a tool set for critical assessment and accountability. the details matter and must be understood if libraries are to have a place at the table as xai, and machine learning, evolves and further deepens its effect on lis. mailto:mridley@uoguelph.ca information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 2 this paper provides an overview of xai with key definitions, a historical context, and examples of xai techniques, strategies, and processes that form the basis of the field. it considers areas where xai and academic libraries intersect. the dual emphasis is on xai as a toolset for libraries to adopt and xai as an area for public policy advocacy. what is xai? xai is plagued by definitional problems.8 some definitions are focused solely and narrowly on the technical concepts while others focus only on the broad social and political dimensions. lacking “a theory of explainable ai, with a formal and universally agreed definition of what explanations are,”9 the fundamentals of this field are still being explored, often from different disciplinary perspectives.10 critical algorithm studies position machine learning as socio-techno-informational systems.11 as such, a definition of xai must encompass not just the techniques, as important and necessary as they are, but also the context within which xai operates. 
the us defense advanced research projects agency (darpa) description of xai captures the breadth and scope of the field. the purpose of xai is for ai systems to have "the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future"12 and to "enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners."13 xai is needed to:
1. generate trust, transparency, and understanding;
2. ensure compliance with regulations and legislation;
3. mitigate risk;
4. generate accountable, reliable, and sound models for justification;
5. minimize or mitigate bias, unfairness, and misinterpretation in model performance and interpretation; and
6. validate models and validate explanations generated by xai.14
xai consists of testable and unambiguous proofs, various verification and validation methods that assess influence and veracity, and authorizations that define requirements or mandate auditing within a public policy framework. xai is not a new consideration. explainability has been a preoccupation of computer science since the early days of expert systems in the late twentieth century.15 however, the 2018 introduction of the general data protection regulation (gdpr) by the european union (eu) shifted explainability from a purely technical issue to one with an additional and urgent focus on public policy.16 while the presence of a "right to explanation" in the gdpr is highly contested,17 industry groups and jurisdictions beyond the eu recognized its inevitability, spurring an explosion in xai research and development.18
types of xai
taxonomies of xai types are classified based on their scope and mechanism.19 local explanations interpret the decisions of a machine learning model used in a specific instance (i.e., involving data and context relevant to the circumstance). global explanations interpret the model more generally (i.e., involving all the training data and relevant contexts). in black-box or model-agnostic explanations, only the input and the output of the machine learning model are required, while white-box or model-specific explanations require more detailed information regarding the processing or design of the model. another way to categorize xai is as proofs, validations, and authorizations. proofs are testable, traceable, and unambiguous explanations demonstrable through causal links, logic statements, or transparent processes. typically, proofs are only available for ai systems that use "inherently interpretable" techniques such as rules, decision trees, or linear regressions.20 validations are explanations that confirm the veracity of the ai system. these verifications occur through testing procedures, reproducibility, approximations and abstractions, and justifications. authorizations are explanations that result from processes in which third parties provide some form of standard, ratification, prohibition, or audit. authorizations might pertain to the ai model, its operation in specific instances, or even the process by which the ai was created. they can be provided by professional groups, nongovernmental organizations, governments and government agencies, and third parties in the public and private sector. academic libraries can adopt proofs and validations as means to interrogate information systems and resources.
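the distinction between proofs and validations can be made concrete. the following is a minimal sketch, not drawn from the article, of an "inherently interpretable" classifier whose printed rules function as a proof; the toy task (flagging digitized items for manual review), its two features, and the use of scikit-learn are assumptions made purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# hypothetical toy task: flag a digitized item for manual review from two features
X = [[0.95, 12], [0.40, 300], [0.88, 45], [0.35, 20], [0.92, 500], [0.50, 15]]
y = [0, 1, 0, 1, 0, 1]  # 1 = flag for manual review
feature_names = ["ocr_confidence", "page_count"]

# a shallow decision tree is "inherently interpretable": its rules are the explanation
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# the printed rule set is a testable, traceable account of every prediction the model makes
print(export_text(clf, feature_names=feature_names))
print("prediction for [0.42, 80]:", clf.predict([[0.42, 80]])[0])
```

because every prediction can be traced through the printed rules, an explanation of this kind is testable and unambiguous in the sense described above; validations are needed precisely when a model cannot be reduced to such a trace.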
these resources include collections, which are increasingly machine learning systems themselves or developed with machine learning methods. the recognition of "collections as data" is an important shift in this direction.21 where appropriate, proofs and validations should accompany content and systems derived from machine learning. libraries must also engage with xai as authorizations to assess the public policy implications that exist, are emergent, or are necessary. library advocacy is currently lacking in this area. the requirement for policy and governance frameworks is a reminder that machine learning is "far from being purely mechanistic, it is deeply, inescapably human"22 and that, while complex and opaque, "the 'black box' is full of people."23
prerequisites to an xai strategy
three questions are important for any xai strategy:
• what constitutes a good explanation?
• who is the explanation for?
• how will the explanation be provided?
explanations are context specific. the "goodness" of an explanation is dependent on the needs and objectives of the explainee (a user) and the explainer (an xai). following research from the fields of psychology and cognitive science, keil suggests five reasons why someone wants an explanation: (1) to predict similar events in the future, (2) to diagnose, (3) to assess blame or guilt, (4) to justify or rationalize an action, and (5) for aesthetic pleasure.24 for most people, explanations need not be complete or even fully accurate.25 as a result, who the explanation is for is critical to a good explanation. different audiences have different priorities. system developers are primarily interested in performance explanations, while clients focus on effectiveness or efficacy, professionals are concerned about veracity, and regulators are interested in policy implications. nonexpert, lay users of a system want explanations that build trust and provide accountability. a good explanation is also affected by its presentation. there are temporal and format considerations. explanations can be provided or available in real time and continuously as the process occurs (hence partial explanations) or post hoc and in summary form. interactive explanations are widely preferred but are not always appropriate or actionable.26 studies have compared textual, visual, and multimodal formats with differing results. familiar textual responses or simple visual explanations such as venn diagrams are often most effective for nonexpert users.27 drawing from philosophy, psychology, and cognitive science, miller recommends four approaches for xai.28 explanations are contrastive: when people want to know the "why" of something, "people do not ask why event p happened, but rather why event p happened instead of some event q." explanations are selected: "humans are adept at selecting one or two causes from a sometimes infinite number of causes to be the explanation." explanations are social: "they are a transfer of knowledge, presented as part of a conversation or interaction, and are thus presented relative to the explainer's beliefs about the explainee's beliefs." finally, miller cautions against using probabilities and statistical relationships and encourages references to causes.
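miller's observation that explanations are contrastive suggests a simple computational analogue: a counterfactual search for the smallest change to an input that would have produced a different decision (why p rather than q). the sketch below is illustrative only; the logistic-regression model, the feature names, and the search range are hypothetical stand-ins for any black-box system that exposes a predict function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical training data: two features, binary outcome
X = np.array([[0.2, 1.0], [0.9, 5.0], [0.4, 2.0], [0.8, 6.0], [0.3, 1.0], [0.7, 4.0]])
y = np.array([0, 1, 0, 1, 0, 1])
model = LogisticRegression().fit(X, y)

def contrastive_explanation(model, x, feature_names, max_shift=1.0, steps=50):
    """find the smallest tried single-feature shift that flips the model's prediction."""
    original = model.predict([x])[0]
    best = None
    deltas = sorted(np.linspace(-max_shift, max_shift, steps), key=abs)  # smallest shifts first
    for j, name in enumerate(feature_names):
        for delta in deltas:
            x_cf = x.copy()
            x_cf[j] += delta
            if model.predict([x_cf])[0] != original:
                if best is None or abs(delta) < abs(best[1]):
                    best = (name, delta, model.predict([x_cf])[0])
                break  # first (smallest) flipping shift found for this feature
    return original, best

pred, flip = contrastive_explanation(model, np.array([0.45, 2.5]), ["relevance", "downloads"])
if flip:
    print(f"predicted {pred}; it would have been {flip[2]} had {flip[0]} differed by {flip[1]:+.2f}")
else:
    print(f"predicted {pred}; no single-feature change within the search range flips it")
```

a search of this kind also respects miller's caution about probabilities, since the answer it returns is a concrete alternative rather than a statistical score.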
burrell identifies three key barriers to explainability: concealment, the limited technical understanding of the user, and an incompatibility between the user (human) and algorithmic reasoning.29 while concealment is deliberate, it may or may not be justified. protecting ip and trade secrets is acceptable while obscuring processes to purposively deceive users is not. regulations are a tool to moderate the former and minimize the latter. the technical limitations of users and the incompatibility between users and algorithms suggest two remedies. first is enhancing algorithmic literacy. algorithmic literacy is a “a set of competencies that enables individuals to critically evaluate ai technologies; communicate and collaborate effectively with ai; and use ai as a tool online, at home, and in the workplace.”30 libraries have a key role in advancing algorithmic literacy in their communities.31 just as libraries championed information literacy through the promulgation of standards and principles, the provision of diverse educational programming, and the engagement of the broad academic community, so too can libraries be central to efforts to enhance algorithmic literacy. second is a requirement that xai must be sensitive to the abilities and needs of different users. a survey of the key challenges and research direction of xai identified 39 issues, including the need to understand and enhance the user experience, match xai to user expertise, and explain the competencies of ai systems to users.32 this is the essence of human-centered explainable ai (hcxai). among hcxai principles are the importance of context (regarding user objectives, decision consequences, timing, modality, and intended audience), the value of using hybrid explanation methods that complement and extend each other, and the power of contrastive examples and approaches.33 proofs and validations xai that provide proofs or validations can be adopted by libraries to assess and evaluate machine learning utilized in systems, services, and collections. since proofs pertain to already interpretable systems, the four examples provided focus on validations: feature audit, approximation and abstraction, reproducibility, and xai by ai. these techniques may require access to, or information about, the machine learning model. this would include such characteristics as the algorithms used, settings of the parameters and hyperparameters, optimization choices, and the training data. while all these may not be normally information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 5 available, designers of machine learning systems in consequential settings should expect to provide, indeed be required to provide, such access. similarly, vendors of library content or systems utilizing machine learning should make explanatory proofs and validations available for library inspection. feature audit feature audit is an explanatory strategy that attempts to reveal the key features (e.g., characteristics of the data or settings of the hyperparameters used to the differentiate data) that have a primary role in the prediction of the algorithm. by isolating these features, it is possible to explain the key components of the decision. feature audit is a standard technique of linear regression, but it is made more difficult in machine learning because of the complexity of the information space (e.g., billions of parameters and high dimensionality). 
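one widely used form of feature audit is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. the sketch below uses synthetic data in which only the first feature actually drives the outcome; the data and model are hypothetical, and the technique is offered as an illustration rather than as a method prescribed by the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# synthetic data: three candidate features, but only feature 0 actually drives the label
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.2 * rng.normal(size=500) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# shuffle each feature in turn and record the mean drop in accuracy it causes
audit = permutation_importance(model, X, y, n_repeats=20, random_state=0)
for name, drop in zip(["feature_0", "feature_1", "feature_2"], audit.importances_mean):
    print(f"{name}: mean accuracy drop when shuffled = {drop:.3f}")
```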
there are various feature audit techniques34 but all of them are “decompositional” in that they attempt to reduce the work of the algorithm to its component parts and then use those results as an explanation.35 feature audit can highlight bias or inaccuracy by revealing incongruence between the data and the prediction. more advanced feature audit techniques (e.g., gradient feature auditing) recognize that features can indirectly influence other features and that these features are not easily detectable as separate, influential elements.36 this interaction among features challenges the strict decompositional approach to feature audit and will likely lead to an increased focus on the relational analysis among and between elements. approximation and abstraction approximation and abstraction are techniques that create a more simplified model to explain the more complex model.37 people seek and accept explanations that “satisfice”38 and are coherent with existing beliefs.39 this recognizes that “an explanation has greater power than an alternative if it makes what is being explained less surprising.”40 approaches such as “model distillation”41 or the “model agnostic” feature reduction of the local interpretable model-agnostic explanations (lime) tool create a simplified presentation of the algorithmic model.42 this approximation or abstraction may compromise accuracy, but it provides an accessible representation that enhances understandability. a different type of approximation or abstraction is a narrative of the machine learning processes utilized that provides sufficient documentation for a reader to act as an explanation of the outcomes. an exemplary case of this is lithium-ion batteries: a machine-generated summary of current research published by springer nature and written by beta writer, an ai or more accurately a suite of algorithms.43 a collaboration of machine learning and human editors, the full production cycle of the book is documented in the introduction.44 in lieu of being able to interrogate the system directly, this detailed account provides an explanation of the system allowing readers to assess the strengths, limitations, and confidence levels of the algorithmic processes and offers a model of what might be necessary for future ai generated texts.45 libraries can utilize this documentation in acquisition or licensing decisions and subsequently make it available as user guides when resources are added to the collection. reproducibility replication is a verification strategy fundamental to science. being able to independently reproduce results in different settings provides evidence of veracity and supports user trust. however, documented problems in reproducing machine learning studies have questioned the information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 6 generalizability of these approaches and undermined their explanatory capacity. 
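the elements that replication depends on, such as the exact dataset, the software environment, and control of randomization, can be captured in a small machine-readable manifest that accompanies a study or a vendor deliverable. the sketch below is a hypothetical illustration (the dataset path and manifest file name are placeholders), not a standard drawn from the article.

```python
import hashlib
import json
import platform
import random
import sys
from pathlib import Path

SEED = 42
random.seed(SEED)  # randomization control for the experiment itself

def file_sha256(path):
    """hash the dataset so others can confirm they hold exactly the same file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

dataset = Path("training_data.csv")  # hypothetical placeholder for the training data file

manifest = {
    "random_seed": SEED,
    "python_version": sys.version,
    "platform": platform.platform(),
    "dataset_sha256": file_sha256(dataset) if dataset.exists() else None,
}

with open("reproducibility_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
print(json.dumps(manifest, indent=2))
```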
an analysis of text mining studies using machine learning for citation screening in the preparation of systematic reviews, for example, revealed a lack of the key elements needed for replicability (e.g., access to research datasets, the software environments used, randomization control, and detail on new methods proposed or employed).46 in response, a "reproducibility challenge" was created by the international conference on learning representations (iclr) to validate 2018 conference submissions and has continued in subsequent meetings.47 more rigorous replication through the availability of all necessary components and the development of standards will be important to this type of verification.48
xai by ai
the inherent complexity and opacity of unsupervised learning or reinforcement learning suggests, as xai researcher trevor darrell puts it, "the solution to explainable ai is more ai."49 in this approach to explanation, oversight ai are positioned as intermediaries between an ai and its users: workers have supervisors; businesses have accountants; schoolteachers have principals. we suggest that the time has come to develop ai oversight systems ("ai guardians") that will seek to ensure that the various smart machines will not stray from the guidelines their programmers have provided.50 while the prospect of ai guardians may be dystopic, oversight systems performing roles that validate, interrogate, and report are common in code-checking tools. generative adversarial networks (gans) have been used to create counterfactual explanations of another machine learning model to enhance explainability.51 with strategic organizational and staffing changes to enhance capabilities, libraries can design and deploy such oversight or adversarial tools with objectives appropriate to the requirements and norms of libraries and the academy.
authorization
xai that results from authorizations is an area where public policy engagement is needed to ensure xai, and machine learning, are appropriately serving libraries, the academy, and the public at large. three examples are provided: codes and standards, regulation, and audit.
codes and standards
one approach to explanation, supported by the ai industry and professional organizations, is voluntary codes or standards that encourage explanatory capabilities. these nonbinding principles are a type of self-regulation and are widely promoted as a means of assurance.52 the association for computing machinery's statement on algorithms highlights seven principles as guides to system design and use: awareness; access and redress; accountability; explanation; data provenance; auditability; and validation and testing. however, the language used is tentative and conditional. designers are "encouraged" to provide explanations and to "encourage" a means for interrogation and auditing "where harm is suspected" (i.e., a post hoc process).
despite this, the statement concludes with a strong position on accountability if not explainability: “institutions should be held responsible for decisions made by the algorithms that they use, even if it is not feasible to explain in detail how the algorithms produce their results.”53 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 7 unfortunately, the optimism for self-regulation in explainability is undercut by the poor experience with voluntary mechanisms regarding privacy protection.54 in addition, library associations, library system vendors, and scholarly publishers have been slow to endorse any codes or standards regarding explainability. regulation the most common recommendation for ai oversight and authorization to ensure explainability is the creation of a regulatory agency. specific suggestions include a “neutral data arbiter” with investigative powers like the us federal trade commission,55 a food and drug administration “for algorithms,”56 a standing “commission on artificial intelligence,”57 quasi-governmental agencies such as the council of europe,58 and a hybrid agency model combining certification and liability.59 such agencies would have legislated or delegated powers to investigate, certify, license, and arbitrate on matters relating to ai and algorithms, including their design, use, and effects. there are few calls for an international regulatory agency despite digitally porous national boundaries and the global reach of machine learning.60 that almost no such agencies have been created reveals the strength and influence of the large corporations responsible for developing and deploying most machine learning tools and systems.61 reports comparing regulatory approaches to ai among the european union, the united kingdom, the united states, and canada indicate significantly different approaches but with most proceeding with a “light touch” to avoid competitive disadvantages in a multitrillion dollar global marketplace.62 the introduction of the draft eu artificial intelligence act marks the first major jurisdiction to propose specific ai legislation.63 while the act is fulsome about high-risk ai, it is silent on any notion of “explainable” ai, preferring to focus on the less specific idea of “trustworthy artificial intelligence.” with this the eu appears to retreat from the idea of explainability in the gdpr. an exception to this inertia or backtracking is the development and use of algorithmic impact assessments in both governments and industry. these instruments help prospective users of an algorithmic decision-making system determine levels of explanatory requirements and standards to meet those requirements.64 canada has been a leader in this area with a protocol covering use of these systems in the federal government.65 some identify due process as a possible, if limited, remedy for explainability.66 however, a landmark us case suggests otherwise. in state v. 
loomis, regarding the use of compas, an algorithmic sentencing system, the court ruled on the role of explanation in due process:67 the wisconsin supreme court held that a trial court’s use of an algorithmic risk assessment in sentencing did not violate the defendant’s due process rights even though the methodology used to produce the assessment was disclosed neither to the court nor to the defendant.68 the petition of the loomis case to the us supreme court was denied, so a higher court ruling on this issue is unavailable.69 advocacy for regulations regarding explainability should be a central concern for libraries. without strong regulatory oversight requiring disclosure and accountability, machine learning information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 8 systems will remain black boxes and presence of these consequential systems in the lives of users will be obscured. audit a commonly recommended approach to ai oversight and explanation is third-party auditing.70 the use of audit and principles of auditing are widely accepted in a variety of areas. 71 in a library context, auditing of ai can be thought of as a reviewing process to achieve transparency or to determine product compliance. auditing is typically done after system implementation, but it can be accomplished at any stage. it is possible to audit design specifications, completed code, cognitive models, or periodic audits of specific decisions.72 the keys to successful audit oversight are clear audit goals and objectives (e.g., what is being audited and for what purpose), acknowledged expertise of the auditors, authority of the auditors to recommend, and authorization of the auditors to investigate. any such auditing responsibility for xai would require the trust of stakeholders such as ai designers, government regulators, industry representatives as well as users themselves. critics of the audit approach have focused on lack of auditor expertise, algorithmic complexity, and the need for approaches that assess the algorithmic system prior to its release. 73 while most audit recommendations assume a public agency in this role, an innovative suggestion is a crowdsourced audit (a form of audit study that involves the recruitment of testers to anonymously assess an algorithmic system; an xai form of the “secret shopper”).74 this approach resembles techniques used by consumer advocates and might indicate the rise of public activists into the xai arena. the complexity of algorithms suggests that a precondition for an audit is “auditability.”75 this would require that ai be designed in such a way that an audit is possible (i.e., inspectable in some manner) while, presumably, not impairing its predictive performance. sandvig et al. propose regulatory changes because “rather than regulating for transparency or misbehavior, we find this situation argues for ‘regulation toward auditability’.”76 auditing is not without its difficulties. 
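the crowdsourced, "secret shopper" style of audit described above can be approximated in code by submitting matched inputs that differ only in one attribute and comparing the system's outputs. in the sketch below the scoring function is a hypothetical stand-in for a vendor system reachable only through its outputs, and the audited attribute and test records are invented for illustration.

```python
import random

def score_item(record):
    """hypothetical stand-in for an opaque vendor scoring system known only by its outputs."""
    random.seed(hash(frozenset(record.items())) % (2**32))
    return round(random.uniform(0, 1), 3)

def paired_audit(records, attribute, value_a, value_b):
    """send matched pairs differing only in one attribute and return the mean score gap."""
    gaps = []
    for record in records:
        a = dict(record, **{attribute: value_a})
        b = dict(record, **{attribute: value_b})
        gaps.append(score_item(a) - score_item(b))
    return sum(gaps) / len(gaps)

# invented test records; only the audited attribute is varied between the paired requests
test_records = [{"subject": s, "year": y} for s in ("history", "biology") for y in (1995, 2015)]
gap = paired_audit(test_records, attribute="language", value_a="english", value_b="spanish")
print(f"mean score gap (english minus spanish): {gap:+.3f}")
```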
there are no industry standards for algorithmic auditing.77 a high-profile development was the recent launch of orcaa (orcaarisk.com), an algorithmic auditing company started by cathy o'neil, a data scientist who has written extensively about the perils of uncontrolled algorithms.78 however, the legitimacy of third-party auditing has been criticized as lacking public transparency and the capacity to demand change.79 while libraries may not be able to create their own auditing capacity, whether collectively or individually, they are encouraged to engage with the emerging algorithmic auditing community to shape auditing practices appropriate for scholarly communication.
xai as discovery
while xai is primarily a means to validate and authorize machine learning systems, another use of xai is gaining attention. since xai can find new information latent in large and complex datasets, discovery is promoted as "one of the most important achievements of the entire algorithmic explainability project."80 alkhateeb asks "can scientific discovery really be automated?" while invoking the earlier work of swanson, which mined the medical literature for new knowledge by connecting seemingly unrelated articles through search.81 an emerging reason for libraries to adopt xai may be as a powerful discovery tool.
conclusion
our lives have become "algorithmically mediated"82 where we are "dependent on computational spectacles to see the world."83 academic libraries are now sites where systems, services, and collections are increasingly shaped and provided by machine learning. the predictions, recommendations, and decisions of machine learning systems are powerful as well as consequential. however, "the danger is not so much in delegating cognitive tasks, but in distancing ourselves from—or in not knowing about—the nature and precise mechanisms of that delegation."84 taddeo notes that "delegation without supervision characterises the presence of trust."85 xai is an essential tool to build that trust. geoffrey hinton, a central figure in the development of machine learning,86 argues that requiring an explanation from an ai system would be "a complete disaster" and that trust and acceptance should be based on the system's performance, not its explainability.87 this is consistent with the view of many that "if algorithms that cannot be easily explained consistently make better decisions in certain areas, then policymakers should not require an explanation."88 both these views are at odds with the tenets of critical thought and assessment, and both challenge norms of algorithmic accountability. xai is a dual opportunity for libraries. on one hand, it is a set of techniques, processes, and strategies that enable the interrogation of the algorithmically driven resources that libraries provide to their users. on the other hand, it is a public policy arena where advocacy is necessary to promote and uphold the values of librarianship, the academy, and the public interest in the face of powerful new technologies. many disciplines have engaged with xai as machine learning has impacted their fields.89 xai has been called a "disruptive force" in lis,90 warranting the growing interest in how xai affects the field and how the field might influence it.
information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 10 endnotes 1 vijay arya et al., “one explanation does not fit all: a toolkit and taxonomy of ai explainability techniques,” arxiv:1909.03012 [cs, stat], 2019, http://arxiv.org/abs/1909.03012; shane t. mueller et al., “explanation in human-ai systems: a literature meta-review, synopsis of key ideas and publications, and bibliography for explainable ai,” arxiv:1902.01876 [cs], 2019, http://arxiv.org/abs/1902.01876; ingrid nunes and dietmar jannach, “a systematic review and taxonomy of explanations in decision support and recommender systems,” user modeling and user-adapted interaction 27, no. 3 (2017): 393–444, https://doi.org/10.1007/s11257-017-9195-0; gesina schwalbe and bettina finzel, “xai method properties: a (meta-) study,” arxiv:2105.07190 [cs], 2021, http://arxiv.org/abs/2105.07190. 2 safiya noble, algorithms of oppression: how search engines reinforce racism (new york: new york university press, 2018); frank pasquale, the black box society: the secret algorithms that control money and information (cambridge, mass.: harvard university press, 2015); sara wachter-boettcher, technically wrong: sexist apps, biased algorithms, and other threats of toxic tech (new york: w. w. norton, 2017). 3 abeba birhane et al., “the values encoded in machine learning research,” arxiv:2106.15590 [cs], 2021, http://arxiv.org/abs/2106.15590; taina bucher, if ... then: algorithmic power and politics (new york: oxford university press, 2018); sarah myers west, meredith whittaker, and kate crawford, discriminating systems: gender, race, and power in ai (ai now institute, 2019), https://ainowinstitute.org/discriminatingsystems.html. 4 rao aluri and donald e. riggs, “application of expert systems to libraries,” ed. joe a. hewitt, advances in library automation and networking 2 (1988): 1–43; ryan cordell, machine learning + libraries: a report on the state of the field (washington dc: library of congress, 2020), https://labs.loc.gov/static/labs/work/reports/cordell-loc-ml-report.pdf; jason griffey, ed., “artificial intelligence and machine learning in libraries,” library technology reports 55, no. 1 (2019), https://doi.org/10.5860/ltr.55n1; guoying liu, “the application of intelligent agents in libraries: a survey,” program: electronic library and information systems 45, no. 1 (2011): 78–97, https://doi.org/10.1108/00330331111107411; linda c. smith, “artificial intelligence in information retrieval systems,” information processing and management 12, no. 3 (1976): 189–222, https://doi.org/10.1016/0306-4573(76)90005-4. 5 jenny bunn, “working in contexts for which transparency is important: a recordkeeping view of explainable artificial intelligence (xai),” records management journal (london, england) 30, no. 2 (2020): 143–53, https://doi.org/10.1108/rmj-08-2019-0038; cordell, “machine learning + libraries”; andrew m. 
cox, the impact of ai, machine learning, automation and robotics on the information professions (cilip, 2021), http://www.cilip.org.uk/resource/resmgr/cilip/research/tech_review/cilip_–_ai_report__final_lo.pdf; daniel johnson, machine learning, libraries, and cross-disciplinary research: possibilities and provocations (notre dame, indiana: hesburgh libraries, university of notre dame, 2020), https://dx.doi.org/10.7274/r0-wxg0-pe06; sarah lippincott, mapping the current landscape of research library engagement with emerging technologies in research and learning (washington dc: association of research libraries, 2020), https://www.arl.org/wp-content/uploads/2020/03/2020.03.25-emerging-technologies http://arxiv.org/abs/1909.03012 http://arxiv.org/abs/1902.01876 https://doi.org/10.1007/s11257-017-9195-0 http://arxiv.org/abs/2105.07190 http://arxiv.org/abs/2106.15590 https://ainowinstitute.org/discriminatingsystems.html https://labs.loc.gov/static/labs/work/reports/cordell-loc-ml-report.pdf https://doi.org/10.5860/ltr.55n1 https://doi.org/10.1108/00330331111107411 https://doi.org/10.1016/0306-4573(76)90005-4 https://doi.org/10.1108/rmj-08-2019-0038 http://www.cilip.org.uk/resource/resmgr/cilip/research/tech_review/cilip_–_ai_report_-_final_lo.pdf http://www.cilip.org.uk/resource/resmgr/cilip/research/tech_review/cilip_–_ai_report_-_final_lo.pdf https://dx.doi.org/10.7274/r0-wxg0-pe06 https://www.arl.org/wp-content/uploads/2020/03/2020.03.25-emerging-technologies-landscape-summary.pdf information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 11 landscape-summary.pdf; thomas padilla, responsible operations. data science, machine learning, and ai in libraries (dublin, oh: oclc research, 2019), https://doi.org/10.25333/xk7z-9g97; michael ridley, “explainable artificial intelligence,” research library issues, no. 299 (2019): 28–46, https://doi.org/10.29242/rli.299.3. 6 ridley, “explainable artificial intelligence,” 42. 7 bunn, “working in contexts for which transparency is important,” 151. 8 sebastian palacio et al., “xai handbook: towards a unified framework for explainable ai,” arxiv:2105.06677 [cs], 2021, http://arxiv.org/abs/2105.06677; sahil verma et al., “pitfalls of explainable ml: an industry perspective,” in mlsys journe workshop, 2021, http://arxiv.org/abs/2106.07758; giulia vilone and luca longo, “explainable artificial intelligence: a systematic review,” arxiv:2006.00093 [cs], 2020, http://arxiv.org/abs/2006.00093. 9 wojciech samek and klaus-robert muller, “towards explainable artificial intelligence,” in explainable ai: interpreting, explaining and visualizing deep learning, ed. wojciech samek et al., lecture notes in artificial intelligence 11700 (cham: springer international publishing, 2019), 17. 10 mueller et al., “explanation in human-ai systems.” 11 isto huvila et al., “information behavior and practices research informing information systems design,” journal of the association for information science and technology, 2021, 1–15, https://doi.org/10.1002/asi.24611. 12 darpa, explainable artificial intelligence (xai) (arlington, va: darpa, 2016), http://www.darpa.mil/attachments/darpa-baa-16-53.pdf. 13 matt turek, “explainable artificial intelligence (xai),” darpa, https://www.darpa.mil/program/explainable-artificial-intelligence. 
14 julie gerlings, arisa shollo, and ioanna constantiou, “reviewing the need for explainable artificial intelligence (xai),” in proceedings of the hawaii international conference on system sciences, 2020, http://arxiv.org/abs/2012.01007. 15 william j. clancey, “the epistemology of a rule-based expert system—a framework for explanation,” artificial intelligence 20, no. 3 (1983): 215–51, https://doi.org/10.1016/00043702(83)90008-5; william swartout, “xplain: a system for creating and explaining expert consulting programs,” artificial intelligence 21 (1983): 285–325; william swartout, cecile paris, and johanna moore, “design for explainable expert systems,” ieee expert-intelligent systems & their applications 6, no. 3 (1991): 58–64, https://doi.org/10.1109/64.87686. 16 european union, “regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016,” 2016, http://eur-lex.europa.eu/legalcontent/en/txt/?uri=celex:32016r0679. https://www.arl.org/wp-content/uploads/2020/03/2020.03.25-emerging-technologies-landscape-summary.pdf https://doi.org/10.25333/xk7z-9g97 https://doi.org/10.29242/rli.299.3 http://arxiv.org/abs/2105.06677 http://arxiv.org/abs/2006.00093 https://doi.org/10.1002/asi.24611 http://www.darpa.mil/attachments/darpa-baa-16-53.pdf https://www.darpa.mil/program/explainable-artificial-intelligence http://arxiv.org/abs/2012.01007 https://doi.org/10.1016/0004-3702(83)90008-5 https://doi.org/10.1016/0004-3702(83)90008-5 https://doi.org/10.1109/64.87686 http://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:32016r0679 http://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:32016r0679 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 12 17 lilian edwards and michael veale, “slave to the algorithm? why a ‘right to explanation’ is probably not the remedy you are looking for,” duke law & technology review 16 (2017): 18–84; bryce goodman and seth flaxman, “european union regulations on algorithmic decision making and a ‘right to explanation’,” ai magazine 38, no. 3 (2017): 50–57, https://doi.org/10.1609/aimag.v38i3.2741; margot e. kaminski, “the right to explanation, explained,” berkeley technology law journal 34, no. 1 (2019): 189–218, https://doi.org/10.15779/z38td9n83h; sandra wachter, brent mittelstadt, and luciano floridi, “why a right to explanation of automated decision-making does not exist in the general data protection regulation,” international data privacy law 7, no. 2 (2017): 76–99, https://doi.org/10.1093/idpl/ipx005. 18 amina adadi and mohammed berrada, “peeking inside the black-box: a survey on explainable artificial intelligence (xai),” ieee access 6 (2018): 52138–60, https://doi.org/10.1109/access.2018.2870052; mueller et al., “explanation in human-ai systems”; vilone and longo, “explainable artificial intelligence.” 19 schwalbe and finzel, “xai method properties.” 20 or biran and courtenay cotton, “explanation and justification in machine learning: a survey” (international joint conference on artificial intelligence, workshop on explainable artificial intelligence (xai), melbourne, 2017), http://www.cs.columbia.edu/~orb/papers/xai_survey_paper_2017.pdf. 21 padilla, responsible operations. 22 jenna burrell and marion fourcade, “the society of algorithms,” annual review of sociology 47, no. 1 (2021): 231, https://doi.org/10.1146/annurev-soc-090820-020800. 23 nick seaver, “seeing like an infrastructure: avidity and difference in algorithmic recommendation,” cultural studies 35, no. 
4–5 (2021): 775, https://doi.org/10.1080/09502386.2021.1895248. 24 frank c. keil, “explanation and understanding,” annual review of psychology 57 (2006): 227– 54, https://doi.org/10.1146/annurev.psych.57.102904.190100. 25 donald a. norman, “some observations on mental models,” in mental models, ed. dedre gentner and albert l. stevens (new york: psychology press, 1983), 7–14. 26 ashraf abdul et al., “trends and trajectories for explainable, accountable, and intelligible systems: an hci research agenda,” in proceedings of the 2018 chi conference on human factors in computing systems, chi ’18 (new york: acm, 2018), 582:1–582:18, https://doi.org/10.1145/3173574.3174156; joachim diederich, “methods for the explanation of machine learning processes and results for non-experts,” psyarxiv, 2018, https://doi.org/10.31234/osf.io/54eub. 27 pigi kouki et al., “user preferences for hybrid explanations,” in proceedings of the eleventh acm conference on recommender systems, recsys ’17 (new york, ny: acm, 2017), 84–88, https://doi.org/10.1145/3109859.3109915. https://doi.org/10.1609/aimag.v38i3.2741 https://doi.org/10.15779/z38td9n83h https://doi.org/10.1093/idpl/ipx005 https://doi.org/10.1109/access.2018.2870052 http://www.cs.columbia.edu/~orb/papers/xai_survey_paper_2017.pdf https://doi.org/10.1146/annurev-soc-090820-020800 https://doi.org/10.1080/09502386.2021.1895248 https://doi.org/10.1146/annurev.psych.57.102904.190100 https://doi.org/10.1145/3173574.3174156 https://doi.org/10.31234/osf.io/54eub https://doi.org/10.1145/3109859.3109915 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 13 28 tim miller, “explanation in artificial intelligence: insights from the social sciences,” artificial intelligence 267 (2019): 3, https://doi.org/10.1016/j.artint.2018.07.007. 29 jenna burrell, “how the machine ‘thinks’: understanding opacity in machine learning algorithms,” big data & society 3, no. 1 (2016), https://doi.org/10.1177/2053951715622512. 30 duri long and brian magerko, “what is ai literacy? competencies and design considerations,” in proceedings of the 2020 chi conference on human factors in computing systems, chi ’20 (honolulu, hi: association for computing machinery, 2020), 2, https://doi.org/10.1145/3313831.3376727. 31 michael ridley and danica pawlick-potts, “algorithmic literacy and the role for libraries,” information technology and libraries 40, no. 2 (2021), https://doi.org/doi.org/10.6017/ital.v40i2.12963. 32 waddah saeed and christian omlin, “explainable ai (xai): a systematic meta-survey of current challenges and future opportunities,” arxiv:2111.06420 [cs], 2021, http://arxiv.org/abs/2111.06420. 33 shane t. mueller et al., “principles of explanation in human-ai systems” (explainable agency in artificial intelligence workshop, aaai 2021), http://arxiv.org/abs/2102.04972. 34 sebastian bach et al., “on pixel-wise explanations for non-linear classifier decisions by layerwise relevance propagation,” plos one 10, no. 7 (2015): e0130140, https://doi.org/10.1371/journal.pone.0130140; biran and cotton, “explanation and justification in machine learning: a survey”; chris brinton, “a framework for explanation of machine learning decisions” (ijcai-17 workshop on explainable ai (xai), melbourne: ijcai, 2017), http://www.intelligentrobots.org/files/ijcai2017/ijcai-17_xai_ws_proceedings.pdf; chris olah, alexander mordvintsev, and ludwig schubert, “feature visualization,” distill, november 7, 2017, https://doi.org/10.23915/distill.00007. 
35 edwards and veale, “slave to the algorithm?” 36 philip adler et al., “auditing black-box models for indirect influence,” knowledge and information systems 54 (2018): 95–122, https://doi.org/10.1007/s10115-017-1116-3. 37 alisa bokulich, “how scientific models can explain,” synthese 180, no. 1 (2011): 33–45, https://doi.org/10.1007/s11229-009-9565-1; keil, “explanation and understanding.” 38 herbert a. simon, “what is an ‘explanation’ of behavior?,” psychological science 3, no. 3 (1992): 150–61, https://doi.org/10.1111/j.1467-9280.1992.tb00017.x. 39 norbert schwarz et al., “ease of retrieval as information: another look at the availability heuristic,” journal of personality and social psychology 61, no. 2 (1991): 195–202, https://doi.org/10.1037/0022-3514.61.2.195; paul thagard, “evaluating explanations in law, science, and everyday life,” current directions in psychological science 15, no. 3 (2006): 141– 45, https://doi.org/10.1111/j.0963-7214.2006.00424.x. https://doi.org/10.1016/j.artint.2018.07.007 https://doi.org/10.1177/2053951715622512 https://doi.org/10.1145/3313831.3376727 https://doi.org/doi.org/10.6017/ital.v40i2.12963 http://arxiv.org/abs/2111.06420 http://arxiv.org/abs/2102.04972 https://doi.org/10.1371/journal.pone.0130140 http://www.intelligentrobots.org/files/ijcai2017/ijcai-17_xai_ws_proceedings.pdf https://doi.org/10.23915/distill.00007 https://doi.org/10.1007/s10115-017-1116-3 https://doi.org/10.1007/s11229-009-9565-1 https://doi.org/10.1111/j.1467-9280.1992.tb00017.x https://doi.org/10.1037/0022-3514.61.2.195 https://doi.org/10.1111/j.0963-7214.2006.00424.x information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 14 40 tania lombrozo, “explanatory preferences shape learning and inference,” trends in cognitive sciences 20, no. 10 (2016): 756, https://doi.org/10.1016/j.tics.2016.08.001. 41 sarah tan et al., “detecting bias in black-box models using transparent model distillation,” arxiv:1710.06169 [cs, stat], november 18, 2017, http://arxiv.org/abs/1710.06169. 42 marco tulio ribeiro, sameer singh, and carlos guestrin, “model-agnostic interpretability of machine learning,” arxiv:1606.05386 [cs, stat], 2016, http://arxiv.org/abs/1606.05386. 43 beta writer, lithium-ion batteries: a machine-generated summary of current research (heidelberg: springer nature, 2019), https://link.springer.com/book/10.1007/978-3-03016800-1. 44 henning schoenenberger, christian chiarcos, and niko schenk, preface to lithium-ion batteries; a machine-generated summary of current research, by beta writer, (heidelberg: springer international publishing, 2019). 45 michael ridley, “machine information behaviour,” in the rise of ai: implications and applications of artificial intelligence in academic libraries, ed. sandy hervieux and amanda wheatley (association of college and university libraries, 2022). 46 babatunde kazeem olorisade, pearl brereton, and peter andras, “reproducibility of studies on text mining for citation screening in systematic reviews: evaluation and checklist,” journal of biomedical informatics 73 (2017): 1–13, https://doi.org/10.1016/j.jbi.2017.07.010; babatunde k. olorisade, pearl brereton, and peter andras, “reproducibility in machine learning-based studies: an example of text mining,” in reproducibility in ml workshop (international conference on machine learning, sydney, australia, 2017), https://openreview.net/pdf?id=by4l2pbq-. 
47 joelle pineau, “reproducibility challenge,” october 6, 2017, http://www.cs.mcgill.ca/~jpineau/iclr2018-reproducibilitychallenge.html. 48 benjamin haibe-kains et al., “transparency and reproducibility in artificial intelligence,” nature 586, no. 7829 (2020): e14–e16, https://doi.org/10.1038/s41586-020-2766-y; benjamin j. heil et al., “reproducibility standards for machine learning in the life sciences,” nature methods, august 30, 2021, https://doi.org/10.1038/s41592-021-01256-7. 49 cliff kuang, “can a.i. be taught to explain itself?,” the new york times magazine, november 21, 2017, 50, https://nyti.ms/2hr1s15. 50 amitai etzioni and oren etzioni, “incorporating ethics into artificial intelligence,” the journal of ethics 21, no. 4 (2017): 403–18, https://doi.org/10.1007/s10892-017-9252-2. 51 kamran alipour et al., “improving users’ mental model with attention-directed counterfactual edits,” applied ai letters, 2021, e47, https://doi.org/10.1002/ail2.47. 52 association for computing machinery, statement on algorithmic transparency and accountability (new york: acm, 2017), http://www.acm.org/binaries/content/assets/publicpolicy/2017_joint_statement_algorithms.pdf; alex campolo et al., ai now 2017 report (new https://doi.org/10.1016/j.tics.2016.08.001 http://arxiv.org/abs/1710.06169 http://arxiv.org/abs/1606.05386 https://link.springer.com/book/10.1007/978-3-030-16800-1 https://link.springer.com/book/10.1007/978-3-030-16800-1 https://doi.org/10.1016/j.jbi.2017.07.010 https://openreview.net/pdf?id=by4l2pbqhttp://www.cs.mcgill.ca/~jpineau/iclr2018-reproducibilitychallenge.html https://doi.org/10.1038/s41586-020-2766-y https://doi.org/10.1038/s41592-021-01256-7 https://nyti.ms/2hr1s15 https://doi.org/10.1007/s10892-017-9252-2 https://doi.org/10.1002/ail2.47 http://www.acm.org/binaries/content/assets/public-policy/2017_joint_statement_algorithms.pdf http://www.acm.org/binaries/content/assets/public-policy/2017_joint_statement_algorithms.pdf information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 15 york: ai now institute, 2017); ieee, ethically aligned design: a vision for prioritizing human wellbeing with artificial intelligence and autonomous systems (new york: ieee, 2019), https://standards.ieee.org/content/dam/ieeestandards/standards/web/documents/other/ead1e.pdf. 53 association for computing machinery, statement on algorithmic transparency and accountability, 2. 54 lilian edwards and michael veale, “enslaving the algorithm: from a ‘right to an explanation’ to a ‘right to better decisions’?,” ieee security & privacy 16, no. 3 (2018): 46–54. 55 kate crawford and jason schultz, “big data and due process: toward a framework to redress predictive privacy harms,” boston college law review 55, no. 1 (2014): 93–128. 56 andrew tutt, “an fda for algorithms,” administrative law review 69, no. 1 (2017): 83–123. 57 corinne cath et al., “artificial intelligence and the ‘good society’: the us, eu, and uk approach,” science and engineering ethics, march 28, 2017, https://doi.org/10.1007/s11948-017-9901-7. 58 edwards and veale, “slave to the algorithm?” 59 matthew u. scherer, “regulating artificial intelligence systems: risks, challenges, competencies, and strategies,” harvard journal of law & technology 29, no. 2 (2016): 353– 400. 60 roger brownsword, “from erewhon to alphago: for the sake of human dignity, should we destroy the machines?,” law, innovation and technology 9, no. 1 (january 2, 2017): 117–53, https://doi.org/10.1080/17579961.2017.1303927. 
61 birhane et al., “the values encoded in machine learning research”; ana brandusescu, artificial intelligence policy and funding in canada: public investments, private interests (montreal: centre for interdisciplinary research on montreal, mcgill university, 2021). 62 cath et al., “artificial intelligence and the ‘good society’”; law commission of ontario and céline castets-renard, comparing european and canadian ai regulation, 2021, https://www.lcocdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulationfinal-november-2021.pdf. 63 european commission, “artificial intelligence act,” 2021, https://eur-lex.europa.eu/legalcontent/en/txt/?uri=celex:52021pc0206. 64 dillon reisman et al., algorithmic impact assessment: a practical framework for public agency accountability (new york: ai now institute, 2018), https://ainowinstitute.org/aiareport2018.pdf. 65 treasury board of canada secretariat, “directive on automated decision-making,” 2019, http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead1e.pdf https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead1e.pdf https://doi.org/10.1007/s11948-017-9901-7 https://doi.org/10.1080/17579961.2017.1303927 https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf https://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:52021pc0206 https://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:52021pc0206 https://ainowinstitute.org/aiareport2018.pdf http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 16 66 danielle keats citron and frank pasquale, “the scored society: due process for automated predictions,” washington law review 89 (2014): 1–33; scherer, “regulating artificial intelligence systems.” 67 julia angwin et al., “machine bias,” propublica, may 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. 68 “state v. loomis,” harvard law review 130, no. 5 (2017), https://harvardlawreview.org/2017/03/state-v-loomis/. 69 “loomis v. wisconsin,” scotusblog, june 26, 2017, http://www.scotusblog.com/casefiles/cases/loomis-v-wisconsin/. 70 brownsword, “from erewhon to alphago”; campolo et al., ai now 2017 report; ieee, ethically aligned design; pasquale, the black box society: the secret algorithms that control money and information; wachter, mittelstadt, and floridi, “why a right to explanation.” 71 michael power, the audit society: rituals of verification (oxford: oxford university press, 1997). 72 alfred ng, “can auditing eliminate bias from algorithms?,” the markup, february 23, 2021, https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-fromalgorithms. 73 joshua alexander knoll, “accountable algorithms” (phd diss, princeton university, 2015). 
74 christian sandvig et al., “auditing algorithms: research methods for detecting discrimination on internet platforms,” data and discrimination: converting critical concerns into productive inquiry, 2014, http://wwwpersonal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20-%20ica%202014%20data%20and%20discrimination%20preconference.pdf. 75 association for computing machinery, statement on algorithmic transparency and accountability. 76 sandvig et al., “auditing algorithms,” 17. 77 ng, “can auditing eliminate bias from algorithms?” 78 cathy o’neil, weapons of math destruction: how big data increases inequality and threatens democracy (new york: crown, 2016). 79 emanuel moss et al., assembling accountability: algorithmic impact assessment for the public interest (data & society, 2021), https://datasociety.net/wpcontent/uploads/2021/06/assembling-accountability.pdf. 80 david s. watson and luciano floridi, “the explanation game: a formal framework for interpretable machine learning,” synthese (dordrecht) 198, no. 10 (2020): 9214, https://doi.org/10.1007/s11229-020-02629-9. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing https://harvardlawreview.org/2017/03/state-v-loomis/ http://www.scotusblog.com/case-files/cases/loomis-v-wisconsin/ http://www.scotusblog.com/case-files/cases/loomis-v-wisconsin/ https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-from-algorithms https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-from-algorithms http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf https://datasociety.net/wp-content/uploads/2021/06/assembling-accountability.pdf https://datasociety.net/wp-content/uploads/2021/06/assembling-accountability.pdf https://doi.org/10.1007/s11229-020-02629-9 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 17 81 ahmed alkhateeb, “science has outgrown the human mind and its limited capacities,” aeon, april 24, 2017, https://aeon.co/ideas/science-has-outgrown-the-human-mind-and-its-limitedcapacities; don r. swanson, “undiscovered public knowledge,” the library quarterly 56, no. 2 (1986): 103–18; don r. swanson, “medical literature as a potential source of new knowledge.,” bulletin of the medical library association 78, no. 1 (1990): 29–37. 82 jack anderson, “understanding and interpreting algorithms: toward a hermeneutics of algorithms,” media, culture & society 42, no. 7–8 (2020): 1479–94, https://doi.org/10.1177/0163443720919373. 83 ed finn, “algorithm of the enlightenment,” issues in science and technology 33, no. 3 (2017): 24. 84 jos de mul and bibi van den berg, “remote control: human autonomy in the age of computermediated agency,” in law, human agency, and autonomic computing, ed. mireille hildebrandt and antoinette rouvroy (abingdon: routledge, 2011), 59. 85 mariarosaria taddeo, “trusting digital technologies correctly,” minds and machines 27, no. 4 (2017): 565, https://doi.org/10.1007/s11023-017-9450-5. 
86 cade metz, genius makers: the mavericks who brought ai to google, facebook, and the world (dutton, 2021). 87 tom simonite, “google’s ai guru wants computers to think more like brains,” wired, december 12, 2018, https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/. 88 nick wallace, “eu’s right to explanation: a harmful restriction on artificial intelligence,” techzone, january 25, 2017, http://www.techzone360.com/topics/techzone/articles/2017/01/25/429101-eus-rightexplanation-harmful-restriction-artificial-intelligence.htm#. 89 mueller et al., “explanation in human-ai systems.” 90 bunn, “working in contexts for which transparency is important,” 143. https://aeon.co/ideas/science-has-outgrown-the-human-mind-and-its-limited-capacities https://aeon.co/ideas/science-has-outgrown-the-human-mind-and-its-limited-capacities https://doi.org/10.1177/0163443720919373 https://doi.org/10.1007/s11023-017-9450-5 https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/ http://www.techzone360.com/topics/techzone/articles/2017/01/25/429101-eus-right-explanation-harmful-restriction-artificial-intelligence.htm http://www.techzone360.com/topics/techzone/articles/2017/01/25/429101-eus-right-explanation-harmful-restriction-artificial-intelligence.htm abstract introduction what is xai? types of xai prerequisites to an xai strategy proofs and validations feature audit approximation and abstraction reproducibility xai by ai authorization codes and standards regulation audit xai as discovery conclusion endnotes 66 journal of lihm1'y automation vol. 7/1 march 1974 book reviews computer systems in the library: a handbook for managers and designers. by stanley j. swihart and beryl f. hefley. a wiley-becker & hayes series book. los angeles: melville publishing company, 1973. 388p. once every year or two, either in england or the united states, a book appears attempting to explain computer systems to librarians. this book, compute?' systems in the library, is the most recent of the introductory texts. it starts off with a chapter entitled "why automate?" which skims ve1y lightly and uncritically over the often-repeated reasons for using computers. in this instance, money is included as a reason to automate, for we are told that "when properly planned, unit operating costs are normally reduced when a function is automated." automation's impact on the library's research and development budget is not discussed. the book then proceeds to the six chapters which occupy the bull< of the book they cover the automation of six major librmy functions: catalog publication, circulation, acquisitions, cataloging, catalog reference services, and serials. each chapter consists of a description of one or two apparently existing automated systems, with a complete discussion of how the system functions, what files are involved, the data in each file, coding and formats used in the files, and reproductions of various output products from each file. unfortunately, we are not told where each of these systems exists, and the systems often appear to use techniques that are suitable only for very small libraries. for example, in the circulation system that is described, a packet of prepunched book cards is to be carried in the book; each time the book is charged or discharged one of the cards is removed, with the last card serving as a signal to create a new deck of cards. 
little mention is made of the data collection terminals that are so commonly used in automated circulation systems, with the result that the description is very closely linked to a single system, with little opportunity for the reader to compare various methods or techniques of information handling. the latter part of the book addresses itself to some general problems, including the interlibrary sharing of data and programs; the planning, implementation, and control of automation projects; and brief discussions of input and output problems, the protection of records, and some considerations in choosing hardware. three appendixes offer a 2,500-word exclusion list for kwic indexes, a set of model keypunching rules for a corporate library, and a thirty-three-item bibliography in which the majority of works listed were published between 1964 and 1968. a major weakness of the book seems to be its lack of critical focus. library automation problems are treated as being not particularly difficult; in fact, "the authors can see no serious or major disadvantages to automation in libraries. the situation," we are told, "can be compared with the disadvantages of using typewriters or telephones." this reviewer finds it difficult to know what sort of audience these words, and the entire book, are addressed to. though subtitled "a handbook for managers and designers," it would be an inexperienced manager indeed who needed to be told that "in its mode of operation, a keypunch is quite similar to a typewriter. a key must be struck for each character . . . ," or that "the catalog master file may be stored on magnetic tape reels or on magnetic disks." the experienced librarian, on the other hand, will not be pleased to learn that "many libraries with computer systems have given up the library of congress [filing] system for mel mac and have placed mac in order between mab and mad, and me between mb and md." nor will anyone associated with libraries be pleased to discover that "computer centers not only can, but frequently do, lose information. from time to time complete files are erased. there is almost no way to ensure that information will not be inadvertently erased." the librarian who is already involved in automated systems will not need this book; the librarian who wishes to learn about automation and the systems analyst who needs to understand library systems will do well to read other sources in addition to this one. peter simmons university of british columbia the metropolitan library. edited by ralph w. conant and kathleen molz. cambridge, mass.: m.i.t. press, 1972. 333p. $10.00. the editors describe this book as a sequel to the important public library and the city (1965), also published by m.i.t. press. the focus again is on the concerns of metropolitan public librarians, combining the viewpoints of specialists from library and social science disciplines. of the eighteen papers included, only three, by john trebbel, john bystrom, and kathleen molz, concentrate on the implications of present and future technology on public library service. their papers offer a general, if hard-nosed, approach to the need for specific research into the economic, behavioral, professional, and technological barriers impeding the advent of the automated millenium. micrographics, reprography, computers, facsimile transmission, telecommunications hardware, and technology are considered essential components of information transbook reviews 61 fer with which libraries must become compatible-and comfortable. 
the imperative need for and conduct of long-range research in telecommunications is outlined by bystrom, including aspects of research necessary for both a national telecommunications network linking all types of libraries and the local use of community cablevision by individual library outlets. the three authors devote considerable head-shaking to the chilling reality of financing technological adaptations and innovations in libraries-the "snake in eden" according to trebbel. governments, specifically national governments, are cited as the logical sources of the enormous sums required for automated library and information services of whatever kind. molz warns repeatedly and forcefully that libraries, while not discarding the book, must change their priorities. continued dependence on print as the prime information transfer medium is insupportable. the public library must adapt to a multimedia world. none of the foregoing is new to information scientists or specialists in automation, but as concerned participants in the knowledge business they should find these papers of general interest. lois m. bewley university of british columbia
library space information model based on gis — a case study of shanghai jiao tong university yaqi shen information technology and libraries | september 2018 yaqi shen (yqshen@sjtu.edu.cn) is a librarian at shanghai jiao tong university. abstract in this paper, a library-space information model (lsim) based on a geographical information system (gis) was built to visually show the bookshelf location of each book through the display interface of various terminals. taking shanghai jiao tong university library as an example, both spatial information and attribute information were integrated into the model. in the spatial information, the reading room layout, bookshelves, reference desks, and so on were constructed with different attributes. the bookshelf layer was the key attribute of the bookshelves, and each book was linked to one bookshelf layer. through the field of bookshelf layer, the book in the query system can be connected with the bookshelf-layer information of the lsim. with the help of this model, readers can search books visually in the query system and find the books’ positions accurately. it can also be used in the inquiry of special-collection resources. additionally, librarians can use this model to analyze books’ circulation status, and books with similar subjects that are frequently circulated can be recommended to readers. the library’s permanent assets (chairs, tables, etc.) could be managed visually in the model. this paper used gis as a tool to solve the problem of accurate positioning, simultaneously providing better services for readers and realizing visual management of books for librarians. introduction geographical information systems (gis) are powerful tools that can edit, store, analyze, display, and manage geographical data.
early in 1992, several association of research libraries (arl) institutions, including the university of georgia, harvard university, north carolina state university, and southern illinois university, launched the gis literacy project and carried out an extensive survey about the possible applications of gis in libraries.1 since then, studies about the application of gis in library research have attracted more and more attention.2 gis is effective for library-planning efforts, such as investigating library-service areas, modeling the implications of the opening and closing of library services, informing initial location decisions, and so on.3 the university of idaho library adopted gis to link variables such as age, race, income, and education from the 2000 us census with the service-area maps of two proposed branch libraries. based on the thematic maps created, the demographic information about potential library users can be displayed. most importantly, the maps were also helpful for improving the library-service planning. koontz et al. from florida state university investigated the reasons for public-library closure by using gis. the authors presented a methodology using gis to describe libraries’ geographic market to illustrate the effects of facility location, relocation, and permanent closure on potential users. sin used gis with inequality measures and multiple regressions to analyze statistics from the public-libraries survey and the census-tract data. then the nationwide library space information model based on gis | shen 100 https://doi.org/10.6017/ital.v37i3.10308 multivariate study of the neighborhood-level variations was investigated, and the public libraries’ funding and service landscapes were mapped. gis can also provide strong support for the library accessibility.4 in south wales, united kingdom, a case study about a preliminary analysis of spatial variations in accessibility of library services was carried out based on a gis model. park further measured the public-library accessibility accurately and provided realistic analysis by using gis, including descriptive and statistical analyses and a road network–based distance measure. in another paper, park went a step further to measure readers’ travel time and distance while they are using the library. in addition to using gis for library planning and accessibility, it can be also applied to managing the collections, including the physical documents and digital databases of an academic library.5 solar and radovan from the national and university library of slovenia explored the possibility of creating a virtual collection of diverse materials like maps and pictorial documents using gis. they connected spatial data with other pictorial elements, including views and portrait images with hyperlinks.6 coyle from rochester public library studied the implementation of gis in the library collection. he believed that libraries that implemented gis early on would have an intellectual advantage over those coming on board later.7 sedighi conducted research about gis as a decisionsupport system in analyzing geospatial data in the databases of an academic library. 
by using the analysis functions of the system, a range of features could be indicated; for example, the spatial relationships of data based on the educational course can be analyzed.8 boda used a 3d virtuallibrary model to represent the most prominent and celebrated collection of classical antiquity in the alexandria library.9 beyond the applications mentioned above, some libraries have used gis techniques to analyze reader behaviors.10 xia developed gis into an analytical tool for examining the relationships between the height of the shelf and the frequency of book use, revealing that readers tended to pull books off shelves that are easily reachable by human eyes and hands. mandel used gis to map the most popular routes that readers took when entering the library. based on the seating sweeps method, mandel adopted maps to depict use of tables and computers. the research results of both xia and mandel can provide the information of readers’ behavior whereby the books’ positions, and accordingly the entry routes and facilities’ evaluation can be adjusted strategically. though lots of work has been done about the application of gis to the library, there are few reports about visually showing the exact position of each book through the library-catalog display interface, which is of great importance both for the readers and the librarians. xia located library items with gis and pointed out that updating the starting and ending call numbers for each shelf could be the most tedious work.11 specifically, gis cannot tell if the book is not in its correct location or is being used by somebody else. xia advised combining gis with radio frequency identification (rfid), both of which have the capability of tracing the location of each book. stackmap, a library-mapping tool providing a collection-mapping product for librarians, was being used at the hampton library.12 the shanghai jiao tong university library built an interface that would use gis to identify the specific location of each book in the catalog. a gis model that includes spatial and attribute information was constructed. the connection of gis, rfid, and opac was discussed in detail. additionally, the relationship between the bookshelves and patrons’ behavior was studied deeply. information technology and libraries | septmber 2018 101 it is hoped that this gis model will bring convenient services for readers and efficient management for librarians. methodology background in 1984, shanghai jiao tong university circulation system was built based on barcode-reader technology. the first automated library-management system (lms), minisis and image library integrated system, was implemented in 1988. in 1993, the second lms, the unify online multiuser system, was implemented. in 1994, an open public access catalogue (opac) system was built based on the unils, allowing readers to query the library bibliographic record through the computer. in 1998, the third automated lms, a client/server–based tool, was built based on the horizon lms. in 2008, we launched the aleph integrated library system (ils). in the same year, primo, a resource discovery and access system, was introduced. in 2009, the our explore interface was built based on the primo system, providing the services of resource retrieval and access.13 rfid technology was introduced in 2014, and now readers can borrow or return books through self-service machines. 
users can find a book via the opac or our explore system in the shanghai jiao tong university library homepage (http://www.lib.sjtu.edu.cn/index.php?m=content&c=index&lang=en), a screen shot of which is shown in figure 1. book information can be found through the systems, but the exact position of the books cannot be exhibited in the system. at the library reference desk, the question readers ask most frequently is where they can find a certain book. the chinese library classification (clc) system is used to organize the collections in the shanghai jiao tong university. the librarians are very familiar with the classification. however, it is hard for the inexperienced users to understand, even if they have been trained. although static maps can guide patrons to find the books, patrons sometimes still have difficulties finding the books. if the readers can get the exact bookshelf location for a book through the opac or our explore system, the users’ experience could be improved significantly, and much of readers’ time for finding the books could be saved. therefore, it is necessary to introduce gis to the library with the aim of visually showing the position of each book. furthermore, library managers need to plan the budget at the end of every year. the arrangement of different subjects should be considered in the planning. although the usage of the collections by the ils provides reference for the planning, a library-space information model (lsim) would bring a new insight. software there are many kinds of gis software in this research field, including commercial products such as arcgis, mapinfo, and mapgis as well as free and open-source software (foss) solutions. taking foss and arcgis for example, foss can provide a broader context of the open-source software movement and developments in gis.14 no single foss package can match all the functionality that arcgis has for creating thematic maps; therefore, the function of spatial analysis and data processing of arcgis is more powerful. the software used in this study is arcgis 10.3 trial version. http://www.lib.sjtu.edu.cn/index.php?m=content&c=index&lang=en library space information model based on gis | shen 102 https://doi.org/10.6017/ital.v37i3.10308 figure 1. opac and our explore in the shanghai jiao tong university library homepage. methods there are two modules in the lsim, including spatial information and attributes information, as shown in figure 2. spatial information, including the building position, the reading-room layout, bookshelf information, and so on, is transferred to shapefile style. remote-sensing information is used to set the geographic location of the library. these elements are constructed with different attributes, and 2d-attribute and 3d-multipatch data are stored in the geodatabase. arcmap and arcscene are used to generate the 2d and 3d maps and analyze the readers’ behavior. we connect the spatial information with data from the opac, our explore, and rfid. the query fields (which we call “general information”) in the opac are title, author, keyword, call number, issn, isbn, system number, barcode, collection location, and publisher. in the our explore system, readers can not only search the general information, but also refine the search results by specific fields, such as topic, author, collection location, published date, and clc. the functions of book reserving and renewing are also supported by these two systems. 
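to make the two-module structure concrete, the following sketch (written here in python purely for illustration; the field names, types, and values are assumptions rather than the library's actual geodatabase schema) models a bookshelf as a spatial feature and a bookshelf layer as the attribute record that later links to the rfid and query systems.

# a minimal, illustrative sketch of the two lsim modules described above; the
# field names and the example values are assumptions, not the library's schema.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ShelfFeature:
    """spatial module: one bookshelf drawn in the reading-room layout."""
    shelf_id: str                          # label on the floor plan, e.g. row/column
    footprint: List[Tuple[float, float]]   # 2d polygon vertices (x, y)
    height_m: float                        # used for the 3d (multipatch-style) view


@dataclass
class ShelfLayerAttribute:
    """attribute module: one layer (tier) of a shelf; the key linking field."""
    bookshelf_layer: str                   # e.g. "a4r042c04", matched to rfid records
    shelf_id: str                          # which spatial feature it belongs to
    layer_no: int                          # counted from the bottom of the shelf
    centroid: Tuple[float, float, float]   # (x, y, height) used to place the marker


@dataclass
class Lsim:
    """toy geodatabase holding both modules, keyed for quick lookups."""
    shelves: Dict[str, ShelfFeature] = field(default_factory=dict)
    layers: Dict[str, ShelfLayerAttribute] = field(default_factory=dict)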
rfid is introduced to the shanghai jiao tong library to allow self-service, and the fields include collection location, subject, issn, isbn, barcode, and so on. barcode is the common field in all three systems and is used to connect them. in the rfid system, the bookshelf is the unique identification of each shelf in the bookshelves. in the shanghai jiao tong university library, the first-book location method is used to manage books in the rfid system. the first book on each bookshelf is recorded as a different bookshelf location, and the books on one bookshelf are assigned to the same bookshelf location. the books are ordered and arranged according to the call number. a book’s current status can be obtained in the information technology and libraries | septmber 2018 103 rfid system by shelf inventory. the books that are borrowed by patrons or not on the right shelf would be recorded in the rfid system. the key attribute information in the lsim is the bookshelf layer, which is used to describe the book’s position. the field of the bookshelf layer is connected with the rfid data. taking the bookshelf layer of rfid as the attribute field, the position of a book can be located by the bookshelf layer in the lsim. compared to xia’s research, it is easier to get the bookshelf-layer information based on the rfid in the lsim.17 figure 2. research flowchart. the connection of the opac, rfid, and lsim is shown in figure 3. when the reader locates a book in the opac or our explore, the barcode will be shown in the system. the bookshelf layer in the rfid system can be retrieved through the barcode immediately. the map of the reading room has been embedded in the opac. furthermore, the coordinates of the book (x, y, height) can be shown through the bookshelf layer. the index of each bookshelf coordination is created in the opac, rfid system, and lsim. the field of the map presentation is built in the opac, and the search interface is supported by the arcmap and arcscene. the url link is the content of the field, and its content is varied with the different bookshelves. in short, when the reader searches one book, the related bookshelf coordination is highlighted in the map. through the bookshelf layer field, the book information in the query system can be connected with that of lsim. faculty and students can search books in the query system visually. as shown in figure 2, spatial information and attribute information are connected in the lsim. furthermore, a lsim based on gis is built to provide better services for readers and enhance librarians’ visual management. library space information model based on gis | shen 104 https://doi.org/10.6017/ital.v37i3.10308 figure 3. the connection of the opac, rfid system, and lsim. figure 4. finding a book in the our explore system. information technology and libraries | septmber 2018 105 figure 5a. the visual position of the book with the call number r318-53/3 (2d). figure 5b. the visual position of the book with the call number r318-53/3 (3d). library space information model based on gis | shen 106 https://doi.org/10.6017/ital.v37i3.10308 discussion providing services for readers by lsim visual query in the reading room when a book about biological medicine is required, it can be searched by using the keyword “biological medicine” in our explore. then, as shown in figure 4, a book titled amalgamation within evolution can be found with the clc call number r318-53/3. readers can find the book with the call number in the corresponding reading room. 
however, if the lsim is applied, the search results include not only the text information about the book’s location, but also a visual map. firstly, the barcode of the book (32832872) is identified and passed to the bookshelf layer. the bookshelf layer (a4r042c04) will be found in the lsim. then the book’s spatial position can be shown on a visual map. figures 5a and 5b show the 2d and 3d visual position of the book with the call number r318-53/3, and these two results can be switched in the system. the red arrow is the book’s position. based on the visual position, readers can find the book more conveniently. the reading rooms in shanghai jiao tong university library are organized by subject. in each reading room, the books with related categories are distributed together. figures 5a and 5b show the layout of one reading room. the books with the large clc classes, i.e., o, p, q, r, and s, were studied as an example in the reading room in this paper. the red triangles represent chairs and the light green rectangles represents desks. shelves are alphabetically labeled. the reference desk, office area, group study room, storehouse, inquiry machines, printers, and stairs are also shown. special collections in different reading rooms in the shanghai jiao tong university library, there are many special collections, such as contract documents, tsung-dao lee’s manuscripts, alumni theses, important findings of research teams, and so on. because of their rarity, these special collections do not circulate and can only be read in the reading rooms. furthermore, these collections are located in different branch libraries. the geographical information of these resources can be input into the model. scholars can use lsim to achieve the exact positions of these resources, go directly to the related area, and quickly find these special items. library analysis and management book-borrowing situation analysis using gis, it is also possible to show how often books circulate based on their physical location. as shown in figure 6, each rectangle represents a shelf in the reading room. the books with the same topic are placed on the same shelf. the number labeled on the shelf represents the average borrowing frequency of the books on this shelf. different colors mean different frequency, with scale of five to one hundred. the clc classes o, p, and q appearing on the right of the shelves represent mathematical sciences and chemistry, astronomy and geosciences, and bioscience, respectively. information technology and libraries | septmber 2018 107 figure 6. average borrowing frequency of the books on each shelf in one reading room. based on analysis of the relationship between borrowing frequency and subject category, the hot spots of the professional fields can be found and shown. in turn, books related to the hot spots can be recommended to readers. taking class o as an example, the shelf position of the highest borrowing frequency (100) is in row 9, column 2. according to the query system, the theme of the books on this shelf is high polymer chemistry. the books with high borrowing frequency can be highlighted both on the bookshelf and in the query system. if the higher-borrowing-frequency books on the remote shelves meet school discipline development policy, the purchases of these books will be increased. books related to the subjects with the higher borrowing frequency on the taller or lower shelves will also be considered, and vice versa. 
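the lookup chain and the shelf-level circulation summary described above can be expressed as two small routines. the sketch below is illustrative only: the dictionaries stand in for the opac, rfid, and lsim stores, the coordinates are invented, and only the example barcode (32832872) and bookshelf layer (a4r042c04) come from the text.

# illustrative only: dictionaries stand in for the opac, rfid, and lsim stores.
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# rfid store: barcode -> bookshelf layer (the barcode is the common join field)
rfid_layer_by_barcode: Dict[str, str] = {"32832872": "a4r042c04"}

# lsim attribute module: bookshelf layer -> (x, y, height) map coordinates
lsim_coords_by_layer: Dict[str, Tuple[float, float, float]] = {
    "a4r042c04": (121.4, 31.0, 1.6),   # invented coordinates for the sketch
}


def locate_book(barcode: str) -> Tuple[str, Tuple[float, float, float]]:
    """follow the opac -> rfid -> lsim chain: barcode to layer to coordinates."""
    layer = rfid_layer_by_barcode[barcode]
    return layer, lsim_coords_by_layer[layer]


def average_borrowing_by_shelf(
    shelf_of_book: Dict[str, str],   # barcode -> shelf id for every book on the shelves
    loans: List[str],                # one barcode per historical loan
) -> Dict[str, float]:
    """average loans per book for each shelf, as labeled on the figure 6 map."""
    loan_count: Dict[str, int] = defaultdict(int)
    for barcode in loans:
        loan_count[barcode] += 1
    per_shelf: Dict[str, List[int]] = defaultdict(list)
    for barcode, shelf_id in shelf_of_book.items():
        per_shelf[shelf_id].append(loan_count[barcode])
    return {shelf: round(mean(counts), 1) for shelf, counts in per_shelf.items()}

with these stand-ins, locate_book("32832872") returns ("a4r042c04", (121.4, 31.0, 1.6)), mirroring the flow traced above for the book with call number r318-53/3.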
permanent-assets management permanent assets such as chairs, desks, shelves, inquiry machines, printers, etc., can be managed in this model. information about permanent assets (such as their status, spatial position, etc.) was input in the model, as is shown in figures 5a and 5b. librarians can find the visual positions of permanent assets at any time, and readers can conveniently find the inquiry machines or printers to search books and print documents. library space information model based on gis | shen 108 https://doi.org/10.6017/ital.v37i3.10308 future directions the lsim is only tested in one reading room and is still experimental. this model will be expanded to the whole library, providing visual information of library books and materials. in the process of using this model, gis potentiality in the library will be exploited to provide better services for readers and managers. conclusion based on readers’ need of the book position in the library, the lsim is built to visually show the exact bookshelf layer of the book. spatial and attribute information is combined into the model. based on the model, readers can search for books and find books’ positions. meanwhile, many special collections located in the different branches can be easily found in the model. the gis model not only brings convenience to readers, but also supports the library’s analysis and management. librarians can analyze books’ circulation history based on the relationship between the books’ borrowing frequency and subject categories. books with higher borrowing frequency and ones related them can be recommended to the readers. then the number of the purchased books with the higher borrowing frequency in the remote, taller, or lower places will be increased based on the above analysis. permanent assets can also be managed, and librarians can conveniently find the status and spatial position of the inquiry machines, printers, and so on. in short, the application of gis in the library will bring a visual insight into the library, providing a better reader experience and better library management. acknowledgements i thank guo jing, chen jiayi and huang qinling, shanghai jiao tong university library, for their advice on the structure of this article and the grammar of the written english. i also thank liu min and peng xia, east china normal university, for their help in the model building. research was funded by the “fundamental research funds for the central universities" (grant 17jcya13), shanghai jiao tong university. information technology and libraries | septmber 2018 109 endnotes 1 d. kevin davie, james fox, and barbara preece, the arl geographic information systems literacy project. spec kit 238 and spec flyer 238 (washington, dc: association of research libraries, 1999). 2 b. w. bishop and l. h. mandel, “utilizing geographic information systems (gis) in library research,” library hi tech 4, no. 4 (2010): 536–47. 3 karen hertel and nancy sprague, “gis and census data: tools for library planning,” library hi tech 25, no. 2 (2007): 246–59, https://doi.org/10.1108/07378830710755009; christie m. koontz, dean k. jue, and bradley wade bishop, “public library facility closure: an investigation of reasons for closure and effects on geographic market areas,” library information science research 31, no. 2 (2009): 84–91, https://doi.org/10.1016/j.lisr.2008.12.002; sei-ching joanna sin, “neighborhood disparities in access to information resources: measuring and mappin g u.s. 
public libraries’ funding and service landscapes,” library and information science research 33, no. 1 (2011): 41–53, https://doi.org/10.1016/j.lisr.2010.06.002. 4 gary higgs, mitch langford, and richard fry, “investigating variations in the provision of digital services in public libraries using network-based gis models,” library and information science research 35, no. 1 (2013): 24–32, https://doi.org/10.1016/j.lisr.2012.09.002; sung jae park, “measuring public library accessibility: a case study using gis,” library and information science research 34, no. 1 (2012): 13–21, https://doi.org/10.1016/j.lisr.2011.07.007; sung jae park, “measuring travel time and distance in library use,” library hi tech 30, no. 1 (2012): 151–69, https://doi.org/10.1108/07378831211213274. 5 wang xuemei et al., “applications and researches of geographic information system technologies in bibliometrics,” earth science informatics 7, no. 3 (2014): 147–52, https://doi.org/10.1007/s12145-013-0132-4. 6 renata solar and dalibor radovan, “use of gis for presentation of the map and pictorial collection of the national and university library of slovenia,” information technology and libraries 24, no. 4 (2005): 196–200, https://doi.org/10.6017/ital.v24i4.3385. 7 andrew coyle, “interior library gis,” library hi tech 29, no. 3 (2011): 529–49, https://doi.org/10.1108/07378831111174468. 8 mehri sedighi, “application of geographic information system (gis) in analyzing geospatial information of academic library databases,” electronic library 30, no. 3 (2012): 367–76, https://doi.org/10.1108/02640471211241645. 9 istván boda et al., “a 3d virtual library model: representing verbal and multimedia content in three dimensional space,” qualitative and quantitative methods in libraries 4, no. 4 (2017): 891–901. 10 xia jingfeng, “using gis to measure in-library book-use behavior,” information technology and libraries 23, no. 4 (2004): 184–91, https://doi.org/10.6017/ital.v23i4.9663; lauren h. mandel, “toward an understanding of library patron wayfinding: observing patrons’ entry routes in a public library,” library and information science research 32, no. 2 (2010): 116–30, https://doi.org/10.1016/j.lisr.2009.12.004; lauren h. mandel, “geographic information systems: tools for displaying in-library use data,” information technology and libraries 29, no. 1 (2010): 47–52, https://doi.org/10.6017/ital.v29i1.3158. 11 xia jingfeng, “locating library items by gis technology,” collection management 30, no. 1 (2005): 63–72, https://doi.org/10.1300/j105v30n01_07. 12 matt enis, “technology: capira adds stackmap,” library journal 139, no. 13 (2014): 17. 13 chen jin, the history of shanghai jiao tong university library (shanghai: shanghai jiao tong university press, 2013). 14 francis p. donnelly, “evaluating open source gis for libraries,” library hi tech 28, no. 1 (2010): 131–51, https://doi.org/10.1108/07378831011026742.
president’s message thomas dowling information technology and libraries | december 2015 doi: 10.6017/ital.v34i4.9150 the lita governing board has had a productive autumn, and i wanted to share a few highlights. keeping an eye on how better to understand and improve the member experience, we have a couple of new groups getting down to work. lita local task force i'm writing this shortly after returning from lita forum 2015, which was a fantastic meeting. i'm glad that so many people were able to attend, and i hope even more will come to forum 2016. but we know many members cannot regularly travel to national meetings, and even the best online experience can lack the serendipitous benefits that so often come from face-to-face meetings. the new lita local task force will be responsible for creating a toolkit to facilitate local groups’ ability to host events, including information on event planning, accessibility, and ensuring an inclusive culture at meetings. so you’ll be able to host a lita event in your own backyard! (if your backyard has a couple of meeting rooms and good wireless.) forum assessment and alternatives task force as we begin work on lita local events, we are also turning our eyes to our national meeting. planning the next lita forum is essentially a year-round process. we assess the work we’ve done on previous forums, of course, but the annual schedule often doesn’t afford an opportunity to strategically rethink what forum is and how it can best serve the members. to address that issue, we’re convening another new task force, on forum assessment and alternatives. this group will look critically at how forum advances our strategic priorities, and will also look at other library technology conferences to help identify how forum can continue to distinguish itself in a rapidly changing environment. lita personas task force finally, as i write this, the board is in the final stages of creating a personas task force as a tool for better understanding our current and potential new members. a well-constructed set of personas, representing both people who are lita members and people who aren’t—but who could be or should be—will become a valuable tool for membership development, programming, communications, assessment, and other purposes. each of these task forces will work throughout 2016 and deliver their results by midwinter 2017. it is worth noting that we could only convene these groups because we have a strong list of volunteers on tap. if you haven’t filled out a lita volunteer form recently, please consider doing so at http://www.ala.org/lita/about/committees. thomas dowling (dowlintp@wfu.edu) is lita president 2015-16 and director of technologies, z. smith reynolds library, wake forest university, winston-salem, north carolina.
communication a library website migration: project planning in the midst of a pandemic isabel vargas ochoa information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.14801 isabel vargas ochoa (ivargas2@csustan.edu) is stockton campus & web services librarian, california state university, stanislaus. © 2022. abstract this article provides a background on the migration of the california state university (csu), stanislaus library website from an open-source platform to a content management system specifically designed for library websites. before the migration, there was a trial of different content management systems (cms), a student usability study, and consultations with outside web and systems librarians to acquire better insight on their experiences migrating a library website and their familiarity with the different cms trialed.1 the evaluation process, website design, and usability study began before the pandemic and the global shift to remote services. however, despite this shift, the timeline for the migration was not altered and the migration was completed as planned. within a year, the library website migration planning, designing, trialing, and structural organization was completed using a modified waterfall model approach. background completed under a sudden time limit, the website migration project for the california state university (csu), stanislaus library website is both distinctive and relevant to other libraries who plan to complete a redesign of their website, on both desktop and mobile screens, to meet accessibility requirements under a limited schedule caused by unforeseen circumstances—in this case, a global pandemic and sudden shift to remote work. the website migration project included a reconsideration of the content management system (cms) the library was hosted on. csu stanislaus, surrounded by agricultural landscapes and settled in the central valley, is a hispanic-serving and minority-majority university. ethnic minorities make up 70% of total enrollment and three-fourths of the undergraduates are first-generation students.2 in fall 2021, a little over 8,800 fte (full-time equivalent) students were enrolled and total enrollment reached 10,500.3 the university has two campuses, turlock and stockton, and four colleges: the college of science; the college of business administration; the college of arts, humanities and social sciences; and the college of education, kinesiology and social work. the csu stanislaus library website has been designed and redesigned over twenty years for services and content updates, university and library rebranding, and to comply with web accessibility requirements. before the library website migration in 2020, at the start of the covid-19 pandemic, the website was developed and produced using the drupal platform (version 7), an open-source cms. the contents of the library website have been updated from time to time since its first years and the website’s front-end design has been modified during the past few years. before the website migration from drupal to springshare llc, the library explored various cms, including wordpress and joomla. initially, staff encountered issues on the former library website hosted on drupal.
over the years, the library’s website became difficult to maintain, due to the continuously modified written framework, and to implement new branding across the website’s content and overall theme. mailto:ivargas2@csustan.edu information technology and libraries december 2022 a library website migration | ochoa 2 the objective of the website migration project was to effectively enhance the website’s interface for usability and to meet current standards and guidelines for accessibility. additionally, the university was set to launch a new design of the institutional website, which required that the library emulate the university website’s design for uniformity. in preparation for the university website redesign, it was necessary for the library to model the new university design, explore cms, and migrate the library website. a possible migration from drupal 7 to drupal 8 was investigated from 2017 to 2018, when the former web services librarian was in the process of redesigning the website; however, the redesign and migration was not completed. it was discovered around the same time that editing or upgrading the initial design and development of the library website, which utilized a community developed design heavily customized over the years, would make the migration extremely difficult. any editing that triggered modification of the locally customized theme caused other elements of the website to break or collapse, particularly the website’s layout design, including the header, footer, and menu. it quickly became apparent that it would be more sensible to begin the redesign of the website infrastructure on a platform starting from scratch. a new design would also facilitate the application of the latest accessibility and usability standards. a complete rebuild would also afford the library the option to consider other web management platforms. this would evidently be a challenging feat. so, a modified version of the waterfall model was adopted. for this migration, a simple cascading approach was chosen as it worked best with the natural flow of the library’s planned migration. the waterfall model consists of the following objective processes: requirement analysis, design, construction, acceptance testing, and transition .4 for the planned migration, the requirement analysis was confirming that i would have a local and cloud server available for trialing the cms and developing a website design. the design and construction processes would be complete when the new website design was created, the cms trials were finalized, and outside web services and systems librarians were consulted. the testing phase would be complete when the student usability studies were concluded. as explored in my previous article, “navigation design and library terminology,” a user-centered usability study was conducted to assist in the library website redesign and create the website prototype. the prototype was designed to assess the library website’s front-end elements as well as the layout theme and overall design. lastly, the transition, or migration process, would be the final planned objective in the approach. as the web services librarian, i worked as the website migration project manager. 
the project manager migrated the final and redesigned library website and website content, conducted the student usability studies, tested the cms and created a cms recommendation for the library, and consulted with outside librarians on their experience migrating a website and using drupal or springshare as their library website cms. timeline and an unexpected pandemic the cms trials began in fall 2019 and continued until summer 2020. the former library website used drupal 7, so drupal 8 and springshare’s libguides cms were set to be trialed. the trials consisted of developing and designing a new library website on the platforms. experiences were documented and the design process was recorded for analysis and comparison of the platforms. this information was used to determine which platform would best support the new website. consultations were also sought from various web services librarians, system librarians, and information technology professionals.
table 1. timeline of the website migration
fall 2019: trial springshare libguides cms;5 develop library website design prototype; consult outside web and systems librarians on their migration to libguides cms.
spring 2020: trial drupal (version 8); test website design prototype through a student usability study;6 complete, compare, and analyze libguides cms and drupal platform trials.
summer 2020: finalize library website design; migrate former library website content to final chosen platform; complete library website migration.
cms trials libguides cms and drupal were the systems trialed for the library website. drupal was used for more than two decades and consideration was given to upgrading to the latest version of the platform, drupal 8. springshare libguides cms was trialed as library staff and faculty were already familiar with libguides and subscribed to several other springshare applications, including libanswers and libchat for virtual research consultations, libcal for reserving library spaces, and libinsight for user analytics. trialing of libguides cms began in fall 2019. a website prototype (design, theme, homepage) was developed and designed using the platform. the platform was analyzed and explored heavily since it was a platform that had not been previously utilized by our library. libguides cms offers unlimited advanced customized groups. features like publishing workflow management, discussion boards, internal sites, various account types, password protections and ip restrictions, and courseware integration via lti were also researched during the test. in terms of content creation and maintenance, there were limitations under libguides cms. libguides cms has a built-in framework, ideal for libraries, with default settings that may disrupt or limit complex customization. at the time of the trial, additional support from springshare was needed to override default settings. also, libguides cms, compared to drupal, did not provide an option for tracking revisions on guides and content. drupal, a highly programmable free open-source website platform, was the previous platform the library website was hosted on. like all upgrades, drupal 8 offered a series of new features and improvements, from framework to themes.
as a highly programmable platform, it requires information technology and libraries december 2022 a library website migration | ochoa 4 mindful designing and programming to establish the infrastructure and design. drupal 8 was tested and trialed on a development site in february 2020. the development site on drupal was utilized for the final evaluation of the cms. when the campus was ordered to partially close in march 2020, the home page and foundational design were completed; however, the development site was inaccessible remotely due to a block by the campus firewall. the development site resided on a protected local server on campus, and special permissions were required for remote access. unfortunately, i was not granted the special permissions required in due time, so the development site on drupal was put on hold. based on projections after creating a foundational design in february 2020, it would have taken about six months to complete the overall website design. remotely, i continued the drupal 8 evaluation, considered the results of librarian consultations and the literature on drupal as a cms. consultations and cms comparisons generally, the difference observed between both platforms is that libguides cms is a content management system primarily designed and maintained for library websites, whereas drupal is a framework for all sites, including highly customized websites. to support the cms comparisons, six web services and system librarians were consulted prior to the migration. the librarians were from distinct institutions: two community colleges and four 4year universities ranging from 2,000 enrolled students to 30,000 enrolled students, and library departments from 10 to over 200 library personnel. a systems librarian from a university of about 2,000 enrolled undergraduate students, regarding their experience migrating their website from drupal to libguides cms, shared that, “[their migration] took a couple months . . . we worked with campus it and springshare to ‘flip the switch’.” a digital services librarian from a university of over 10,000 enrolled students explained, “the entire transition probably took about 6–8 months.” the time to migrate a library website would depend on the size of the website, which was also influenced by the size of the campus. with more than 10,000 students, the csu stanislaus library website migration project was scheduled to be completed by the end of the summer semester, from june 2020 to august 2020. creating a new website from scratch on drupal proved to be a longer process than creating a new website on libguides cms. a systems librarian explained that “[libguides cms] is also quite streamlined compared to trying to maintain a more complex platform like drupal, which makes it a bit easier for librarians who are not full-time professional coders.” still, libguides cms is not as robust and did not offer the level of customized creation that drupal offered for general websites. the systems librarian added that although it is helpful to have a web services or systems librarian who can code full time, “turnover happens and some libraries can’t be sure there will always be someone on hand who is comfortable doing that coding.” ideally, having a full-time programmer is valuable for any library managing their own website; however, it is currently not the case for our university library. 
a user interface developer from a university of over 30,000 enrolled undergraduate students described their experience using a large amount of css to override default settings in libguides cms. they explained, “we have a large amount of overriding css, not to mention that it makes [it] messy. when building a site [in libguides] you can do whatever you want as long as you know where to put your code, utilize the js libraries springshare uses, implement css to override their information technology and libraries december 2022 a library website migration | ochoa 5 system default items/styles and use their api.” before finalizing our migration, we were required to contact the springshare technical support team to override default settings in libguides cms. however, the default settings are implemented to guide nonprogramming librarians when creating web content. a web services librarian from a university with 2,000 undergraduate students enrolled stated, “i don't think [libguides cms] will be anything like a drupal or wordpress cms. but, i do believe that their software is the perfect niche for libraries and librarians.” libguides cms required getting used to as csu stanislaus library staff were accustomed to hosting the website on a cms that allowed file transfer protocol (ftp). as a systems librarian explained, it can be difficult to organize a large amount of coding in libguides cms since “you don’t get your own server that you can configure and use for things like ftp storage.” the experiences shared by librarians were similar throughout our process for creating the new site and design, and these consultations in particular were not only insightful but helped prepare for and organize the structure of the website before actually migrating the content. an additional component considered before choosing a cms was the technical support and server options available for each. libguides cms is cloud-based and hosted by springshare, which currently uses amazon web services (aws). upgrades are implemented by springshare overnight, as well as minute-by-minute base backups. for the most part, systems and web librarians were satisfied with springshare technical support to implement these changes. with drupal, the institution can choose whether to host their site on a cloud or local server. during our migration in summer 2020, we sought assistance from springshare technical support to modify security certificates and custom domain names. if a site is hosted on drupal, the librarian can implement security certificates and update custom name domains without having to contact the drupal technical support team. it is fundamental for a library to consider these features as well, especially if under a set timeline. these consultations with developers and with systems and web librarians aided in the understanding of what libguides cms and drupal offered based on general comparisons, programming, customization, and technical support. cms accessibility compliance the accessibility levels of each cms also supported the final decision of the chosen cms. according to the web content accessibility guidelines (wcag), there are three levels of accessibility conformance: a (lowest), aa (midrange), and aaa (highest).7 currently, the target level of accessibility for csu campus websites is aa, which also includes all the guidelines found under conformance a. 
regardless of the foundational framework for both libguides cms and drupal, it was determined after exploring accessibility on these cms that developers should regularly test the design and content customization for accessibility. ultimately, the accessibility levels of a library website and its mobile responsiveness are dependent on the local management and develo pment of the sites. website design: usability study concurrent to the consultations, trials, and design development, a usability study was conducted in february 2020 with a total of 38 university student participants, including undergraduate information technology and libraries december 2022 a library website migration | ochoa 6 students, from freshman to seniors, and graduate students. the usability study was organized to test the website navigation design prototype that was built and used during the cms trials. students’ feedback would guide the decision of whether to design an audience-based navigation menu or a topic-based navigation menu. the study was conducted in a closed and monitored library room with laptops prepared. students were asked to answer questions and complete tasks to test the website design prototype menu navigation design. each student’s actions were recorded through screen recordings and visual observation, while assigning numbers to students to ensure anonymity, e.g., student 1, student 2, etc. the following seven tasks were used for the student usability study: 1. find research help on citing legal documents—a california statute—in apa style citation. 2. find the library hours during spring break. 3. find information on the library study spaces hours and location. 4. you’re a student at the stanislaus state stockton campus and you need to request a book from turlock to be sent to stockton. fill out the request-a-book form. 5. you are a graduate student and you need to submit your thesis online. fill out the thesis submission form. 6. for your history class, you need to find information on the university’s history in the university archives and special collections. find information on the university archives and special collections. 7. find any article on salmon migration in portland, or. you need to print it, email it to yourself, and you also need the article cited. during each study, students’ actions were screen recorded using snagit, screen capture and screen recording software installed on the laptops. data collected included the ease of access in terms of navigation behavior, the number of clicks, web pages visited, and the time it took for students to complete each of the seven tasks. that data was recorded and analyzed from the anonymously saved screen recorded videos. students were also asked to answer questions at the end of the study in the form of a written survey, which was then collected and utilized to support the final decision of the outcome of the prototype design. the results of the usability study provided the library with a variety of outcomes and several elements were integrated to lead the redesign of the library website’s header and main menu. the results of the study showed that the prototype’s design navigation, an audience-based navigation, was not as user friendly as predicted; therefore, the library website prototype design would need to be edited and modified to revert to the current navigation design of the existing website, which is a topic-based navigation. 
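because each session was scored on clicks, pages visited, and time per task, a small summary routine along the following lines (a hypothetical python sketch; the record format is an assumption, since the study coded these measures from screen recordings rather than from structured logs) shows how the per-task comparison could be tabulated.

# hedged sketch: aggregates per-task usability metrics of the kind collected in
# the study (clicks, pages visited, seconds to completion, task completion).
from statistics import mean
from typing import Dict, List, NamedTuple


class TaskObservation(NamedTuple):
    student: str       # anonymized label, e.g. "student 1"
    task_no: int       # 1-7, matching the seven tasks listed above
    clicks: int
    pages_visited: int
    seconds: float
    completed: bool


def summarize_by_task(observations: List[TaskObservation]) -> Dict[int, Dict[str, float]]:
    """per-task averages plus completion rate, for comparing the seven tasks."""
    by_task: Dict[int, List[TaskObservation]] = {}
    for obs in observations:
        by_task.setdefault(obs.task_no, []).append(obs)
    summary: Dict[int, Dict[str, float]] = {}
    for task_no, rows in sorted(by_task.items()):
        summary[task_no] = {
            "avg_clicks": round(mean(r.clicks for r in rows), 1),
            "avg_pages": round(mean(r.pages_visited for r in rows), 1),
            "avg_seconds": round(mean(r.seconds for r in rows), 1),
            "completion_rate": round(sum(r.completed for r in rows) / len(rows), 2),
        }
    return summary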
students had difficulties with the audience-based navigation design since it required them to select an “audience type” under the menu (fig. 1). their selection was determined by assessing where they believed the information was found. since most students did not understand the structure of the website, they did not know how to utilize the audience-based navigation to complete the seven tasks. although they found that the navigation design of the website was clear and simple, it required a “getting used to.” information technology and libraries december 2022 a library website migration | ochoa 7 the results also highlighted the effects of the use of library terms. to make menu links exceptionally user friendly, clear and common terminology was added. an additional component was a search-all search box for the website, which was advocated by the student participants. based on navigation results, the main menu and submenus were also structured to not only be clear and organized, but for popular pages to be mapped and linked in more than one menu. figure 1. screenshot of the audience-based navigation design developed for the library website prototype. figure 2. screenshot of the topic-based navigation in the former library website. the design structure of the website relied on the organization and management of the website pages. to maintain a congruent structure, it was necessary to choose a navigation design that met the needs of our students. in this case, the results determined that the topic-based navigation was preferred; thus, the management of website pages and submenus was modified to fit this navigation. the usability study was focused on testing the navigation design of the website and the navigation main menu. given the helpful feedback from users and having more participants than expected, it would have been beneficial to also test other aspects of the website in this study. the home page is the landing point and statistically the most visited web page of the library’s website. it is the hotspot for students and our university’s community to find the catalog, resources to fulfill their research needs, upcoming news and events, the reservation platform, and more. however, as the website redesign progressed, there were challenges in designing the primary components of the website home page, such as assessing what elements were fundamental to have on the library website, following web accessibility requirements, and following the university website’s new theme and design. ultimately, the components of the former library website’s home page were migrated in its similar structure to the redesigned information technology and libraries december 2022 a library website migration | ochoa 8 website. yet, adding questions and tasks on the usability of the library home page and its content components would have certainly aided in not only the redesign and migration, bu t the direction of the library website’s future development. this information will be a focal point for future usability studies of the library’s website. the final migration the greatest challenge throughout the project was the time constraint due to the covid-19 pandemic. because the pandemic brought several unforeseen obstacles into staff work schedules, it was challenging to manage the time needed to complete the project and simultaneously work around tasks surfaced by the pandemic. 
however, staff committed to stay on schedule and to complete the migration project before the start of the fall semester despite the circumstances. after transitioning to remote library services, emphasis was placed on developing the website and web content. even more so, the migration project served to ensure that the library was providing an enhanced and accessible desktop and mobile website for users who were now working from home. additionally, this included the management of web services, on top of the migration project. a concise and organized schedule was necessary and although time management of the different projects and tasks offset by the pandemic was challenging, the web services librarian was fortunate to have support from the library information technology staff. after the cms trials and after the website prototype usability study, libguides cms was chosen as the content management system for the university library website. because the library was looking for an easy-to-use platform, utilizing libguides cms reduced the time needed to build an infrastructure and allowed simple website content management, maintenance, and an improvement over the former website’s accessibility and mobile responsiveness. the platform worked well for the campus and the library; however, each library should evaluate its respective department priorities, along with what is expected, desired, and needed for their individual library website to successfully showcase services and programs to users. following a modified waterfall model approach proved to be a success for the website migration project due to existing resources and scheduled timeline for implementation. in a future virtual renovation or redesign of the library website, the library will explore various project planning models pertinent to the future proposal’s desired outcomes. endnotes 1 isabel vargas ochoa, “navigation design and library terminology,” information technology and libraries 39, no. 4 (2020). 2 “diversity and equity data portal,” california state university, stanislaus, 2021, https://www.csustan.edu/iea/diversity-and-equity-data-portal. 3 “quick facts,” california state university, stanislaus, 2021, https://www.csustan.edu/iea/institutional-data/quick-facts. 4 bob hughes and roger ireland, project management for it-related projects, 3rd ed. (swindon, uk: bcs learning and development, 2019). 5 vargas ochoa, “navigation design and library terminology.” 6 vargas ochoa, “navigation design and library terminology.” 7 “web content accessibility guidelines (wcag) 2 level aaa conformance,” w3c web accessibility initiative (wai), 13 july 2020, https://www.w3.org/wai/wcag2aaa-conformance.
the minnesota union list of serials audrey n. grosch: university of minnesota libraries this paper describes development of a marc serials format union catalog of serials called the minnesota union list of serials (muls).
the preliminary edition, published august 1972, contains over 37,000 main entries in 1,566 text pages produced through photocomposition in news gothic typefont using the full marc character set. the total number of entries is over 59,000, including cross-references. conceptualization and scope of the system as well as its design, data conversion, computer and programming support, photocomposition, costs, and problems are discussed.
introduction
this paper has been prepared to inform the profession of the development of the minnesota union list of serials (muls), the data base of which represents significant differences from those of previously reported union lists. as one can see from figure 1, a muls preliminary edition sample page, muls is a full bibliographic union serials catalog. it uses the marc serials format as its formal structure. the preliminary edition contained 37,289 university of minnesota serial titles. the file now includes holdings of the minneapolis public library, eight private colleges, and ten minnesota state agency libraries including the minnesota historical society. augmentation of the data is continuing so that all minnesota academic, large public, and selected special libraries will be included in the coming two years. several years ago, the university of minnesota began investigating the development of its own unified serials catalog as a first stage in the development of an automated serials management system. at that time many libraries in the state had developed their own serials lists, and regional consortia had created lists of their members' holdings. these resources, coupled with networking through the minitex (minnesota inter-library teletype exchange) program, made possible the muls initial development. the minitex program links together seventy-two libraries via teletype to the university of minnesota for rapid interchange of library materials. state supported academic institutions, public libraries, and private colleges in the local metropolitan area also participate in this program. in spring 1971, it became apparent that a union list had become a necessity if the expanding minitex program and the university were to provide maximum benefit for minnesota's library users.
fig. 1. minnesota union list of serials. preliminary edition sample page.
our state's library environment features:
• one large academic research library-the university of minnesota;
• many smaller academic libraries in the 75,000-250,000 volume class;
• two large public library systems-the minneapolis public library and the st. paul public library;
• one private research library-the james jerome hill reference library-which serves as a nucleus for the metropolitan area private college library network called clic (cooperating libraries in consortium); and
• some library automation activities among these libraries, with the largest automation staff and activity at the university of minnesota.
the parallel developments of networking and systems design at the university made possible the proposal to the minitex program advisory board for funds to develop the system and publish the first union list. in summer 1971 this program received approval and work was begun in mid-august. on september 1, 1972, the preliminary edition of muls was published and distributed to participating university and minitex network members. following is a report of this work, its results and problems.
program scope
obviously, to create a system capable of eventually including library holdings state-wide and to convert such data requires definition of an initial and future scope. the initial scope was defined as: conversion of the university of minnesota libraries' actively received titles, departmental libraries' complete titles, and inactive titles in the libraries' periodical division; and development of a batch input tape software system capable of supporting initial conversion, correction, and updating to produce the preliminary edition of muls. the future scope would potentially include the augmentation of the muls data base with the following non-university of minnesota holdings:
a. eight metropolitan area private colleges in the clic network, with production of a clic union list for their members' use;
b. minneapolis public library serials and unique titles from other public libraries of over 50,000 volumes, with production of a public libraries union list;
c. holdings of all state agencies, which would include the minnesota historical society, state law library, state department of health, and legislative reference library, with production of a union list for their internal resource sharing;
d. state supported colleges' holdings;
e. university of minnesota inactive general collection serials, thereby completing access to the state's largest research library;
f. private college holdings outside of the metropolitan area clic institutions; and
g. selected special libraries' holdings.
at the moment of this writing we have the initial scope completed, are just completing a, b, and c, and have planned work on d and e for 1973. in view of this scope the initial muls magnetic tape system was based on the marc format to permit:
• publication of a photocomposed or line-printer-method full union list;
• publication of regional combination or individual library lists using an ibm 1403 line printer equipped with the ala graphic print train;
• storage of complete and verified information on each serial as known, together with the source of the cataloging data;
• extraction of the data via individual libraries to assist those wishing to develop automated serials management systems including check-in, claiming, binding, etc.;
• conversion of the file to other storage media such as disk;
• fulfillment of the smallest to the largest libraries' needs for bibliographic detail; and
• extension to a fully automated resource sharing system which would further improve the benefits of library cooperation.
with this picture of the program scope, the design factors, data conversion, computer system, programs, photocomposition, costs, and problems will be described below.
system design
the easiest way to look at the muls design is to gain an understanding of the muls marc record content as shown in table 1. this record is the basic unit which is entered, including all associated cross-references or added entries to be made. it in turn generates each of these secondary entries in the file. in this brief description we will assume the reader is familiar with the marc serials record as described in serials: a marc format: preliminary edition and its addendum no. 1.1, 2 there are some differences between the muls format and the lc marc format, most importantly the addition of a sort field (tag 249) and the subfield arrangement for holding fields (tag 850). other variations have been indicated in table 1, which uses the same organization as that contained in the lc format description referred to above. figure 2 shows a page from a master-file listing. note entry no. 2074000. this listing is formatted with the sequence number of the record appearing on the first line, followed by the bibliographic level and the remaining leader information. next the record directory entries are found for fields 008-950 as applicable.
table 1. muls marc record content
a. leader
1. logical record length-five characters
2. record status = 1 for marc record
3. legend = 4 for added entry (aet) or cross-reference (xrf) entry
a. type of record-not used (blank)
b. bibliographic level = s
c. two blank characters
4. indicator count = 2
5. subfield code count = 2
6. base address of data = 5 characters
7. sequence number = 7 characters
b. record directory
1. variable field tag = 3 characters
2. field length = 4 characters
3. starting character position = 5 characters
c. control fields-008 fixed length data elements
1. date typed
2. publication status
5. country of publication code
9. type of serial designator
10. physical medium designator
12. form of content
a. type of material code
b. nature of contents codes
13. government publication indicator
14. conference publication designator
20. language code
21. modified record designator
22. cataloging source code
d. variable fields
1. indicators: in general we have not followed lc in the use of indicators. one exception is the use of filing indicator for the 100 and 200 series tags, which we implemented before seeing that this feature was provided in the addendum no. 1 to the lc format. therefore, the indicators except as above are both blank.
2. subfield codes: except for the holdings statements (tag 850) we have generally followed lc philosophy.
for tag 850 we now precede the $a subfield with a $z subfield, suppressed on printing, which contains the 4 digit number identifying each specific holding library, which is also found at the end of the 008 field.
3. variable fields currently used:
010 lc card number
022 standard serial number
041 languages
100 main entry-personal name
110 main entry-corporate name
111 main entry-conference or meeting
200 title as it appears on piece
245 full title
249 sort key from 100 or 200 series tags, stored in collating codes and limited to 120 characters
250 edition statement
260 imprint
500 general note
501 bound with note
515 note for explanation of dates, volumes, etc.
525 supplement note
555 cumulative index note
730 added entry
850 holdings
950 cross-reference tracing
note: we have followed lc numbering for the above data elements, and have substituted blanks on the tape record for those elements omitted. we have also expanded the 008 field to include a variable number of 4 character elements which contain the index number of each holdings location listed in the $z subfield of tag 850.
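to make the record layout in table 1 concrete, here is a minimal, hypothetical sketch of how a directory built from 12-character entries (3-character tag, 4-character field length, 5-character starting position) could be unpacked. the field widths follow table 1, but the sample record, the function names, and the use of python are illustrative conveniences only; they are not the muls programs themselves.

```python
# illustrative sketch only: unpack a directory built of 12-character entries
# (tag: 3 chars, field length: 4 chars, starting position: 5 chars), as
# described for the muls marc record. the sample data below is invented.
def parse_directory(directory: str):
    """yield (tag, length, start) tuples from a 12-char-per-entry directory."""
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        if len(entry) < 12:
            break  # ignore a trailing partial entry
        tag = entry[0:3]
        length = int(entry[3:7])
        start = int(entry[7:12])
        yield tag, length, start

def extract_fields(directory: str, data: str):
    """map each tag to the slice of the data area its directory entry points at."""
    return {tag: data[start:start + length]
            for tag, length, start in parse_directory(directory)}

if __name__ == "__main__":
    # two made-up entries: tag 245 (20 chars at offset 0), tag 850 (12 chars at offset 20)
    directory = "245002000000" + "850001200020"
    data = "$aexample title.    " + "$zmnu$ahold."
    print(extract_fields(directory, data))
```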
fig. 2. master-file listing.
on the next line of the listing are the 008 fixed length data elements, with the last four digits the holdings location index number, which is the same as the suppressed $z subfield in the 850 field. then the variable fields are listed in numeric sequence. note the subfields as indicated by $z, $b, etc. the number to the left of each $a is the marc tag number. another departure from marc is to store the call number as a subfield of the holdings statement since it may vary among participating libraries. to contrast how the information is stored and how it appears when published, the same record is shown in the left column of figure 1. also, the next record shown is generated from an added entry tag 730 in this parent record. we have prepared a detailed coding manual which is followed by our coders; this document presents various examples of conditions and details the full system structural requirements. these changes in the format were made to simplify wherever possible, to provide for conditions which the original lc format did not cover, and to preserve the marc structure with full text. with the exception of subject headings, all bibliographic text is stored. other marc tags may be added to the system at any time.
the initial system was tape-based, as our computer system at that time did not have uncommitted disk drives. also, we needed to gain some detailed knowledge of the file and record characteristics to most effectively design the disk-based system. this knowledge could be gained easily after some basic data were stored in the system. since programmer time was our most precious commodity, this phased approach was used to: (1) achieve enough support on the tape system to permit publication of the preliminary edition of muls while gathering file and data characteristics; and (2) bring into operation a disk-based system with completely automatic added-entry correction and generation, coupled with very flexible correction procedures.
data conversion
various methods of data conversion were investigated. two requirements seemed obvious in our system-compilation of data on a code sheet and efficient, accurate keyboarding. further, since the marc character set was being used, any potential device had to provide a minimal keying situation to accommodate this character set. compilation of data on a code sheet was necessary because multiple files in multiple locations would be checked to gather all of the information. keyboarding had to be efficient as it was initially estimated that some 25 million characters would be entered before we were ready to publish the union list. the ibm model v record only magnetic tape selectric typewriter (mt/st) was chosen as offering the best approach for high volume, short duration use.
three machines, each equipped with the special marc element and key buttons, were leased. typists easily corrected their discovered errors on these units. each typist followed detailed typing instructions and, after mastering the coding manual practices and procedures, was a trained coder. during july i august 1971 all training aids were prepared, forms designed, and staff recruited. the initial staff complement received their training during the last two weeks of august. during september the data gathering staff was brought to full strength and consisted of: project director editors (librarians-library assistants) senior clerk-typists clerks (students) 1 fte 4 fte 6 fte 12 fte full-time equivalents are used as staff were in many cases part time or temporarily lent to the project. during the period august 1971-june 15, 1972, which comprised the total data preparation time for the preliminary edition, five librarians and thirty-five students actually were trained and participated in the project. it took about six weeks to bring most of the staff to an acceptable performance level. some students found the work too complex or detailed and voluntarily left the project. one clerk-typist did not gain sufficient proficiency to pass out of a trainee status and was terminated at the end of her probation period. thereafter, with a staff of this size, performance problems were minimal. ,i 174 journal of library automation vol. 6/3 september 1973 the data to be included in the preliminary edition comprised the university's • currently received, centrally recorded serials ( 20,000 titles); • inactive periodical division titles ( 8,000 titles); • coordinate campus locations of the university ( 4,000 titles); • complete departmental library titles excluding the bio-medical library ( 6,000 titles). the bio-medical library was excluded due to its present mechanized serials system which would be used to produce a separate serials list, issued as volume 3 of muls to the university and the minitex participating libraries. this separate publication was necessary due to the short time in which the initial data were to be collected. however, the bio-medical library is now also being included in the body of the muls data base. these four categories of serials necessitated quite different approaches dependent upon the available check-in files, shelflists, or catalogs. for example: to capture data on the currently received, centrally recorded titles we photocopied the kardex drawers from the serial check-in file maintained in our headquarters library. these running titles were checked against the official card catalog in the library. if the title was found, the bibliographic infom1ation was transcribed, together with all kardex and catalog locations. if not, the kardex data were copied onto a code sheet for subsequent verification together with its listed location. about 5 percent of the time the photocopied sheet was illegible. these entries had to be transcribed from the check-in file, verified, and then passed on to the next step. when bibliographic data had been assembled on the code sheets they were edited in groups, each group accompanied by its photocopied sheet. corrections were entered by editors, the catalog or check-in file was rechecked as necessary, and then the sheets were sorted by holding location. next all holdings information was procured from the remote location to make sure it was the most reliable information. finally, the sheets were returned to be rechecked and typed. 
"mopping up" occurred at each holding location to encode inactive titles and uncataloged serials. when a title could not be verified, the piece itself was used to develop the main entry, added entries, and other pertinent cataloging information. similar procedures were used on the inactive periodical division shelflist. departmental library locations involved the use of shelf-locator visible indexes and shelflists, coupled with check-in files and branch catalogs. coordinate campus locations outside the twin cities metropolitan area required the checking of title/holdings listings provided by these campus libraries. many entry problems resulted, because variant cataloging approaches were used in many of these libraries. typing and subsequent input were done as coding sheets became ready for keyboarding and were therefore in random order. over 40,000 individual records were typed, each averaging about 480 characters (an approximate minnesota union list of serials/ grosch 175 18 million keystrokes). during the period february-june 15, 1972, when the complete file was proofread from the thirteen volume master-file listing, another 5 million keystrokes were required to delete, to reenter, and to correct entries and associated cross-references. our final keyboarding stroke count was exceedingly close to our original estimate of 25 million characters. the proofreading portion of the data conversion took twice as long as originally anticipated, causing a delay of two months in photocomposition scheduling. proofreading was completed on june 15, 1972, and on the following monday the photocomposition vendor received the final output tape. due to some format changes and continued systems problems the photocomposition output was not received until july 21. printing and binding followed and on august 28 the preliminary edition, consisting of 1,566 text pages in two class a bound volumes, was ready for distribution. computer system two computer systems were used in muls production. one system was used to convert mt /st cassette tapes and involved initially an ibm 2495 cassette converter coupled to an ibm 360/ 20 system. this configuration was replaced by off-line tape conversion using a data action tape pooler and the same computer for code conversion and record blocking. twohour to one-day service was provided by this service center, located in a local insurance company. the raw data tape resulting from the above process then required processing on the second computer system, an ibm 360/50 at the university of minnesota. all programs are written for the cobol f compiler and operate under os/ mft using 1600 cpi magnetic tape. two 80k core partitions are required for the updating and printing programs. the ala graphic print train is used to print the file and control listings. figure 2 was printed with this character set. programs muls programs for the present tape system were conceived as two sets: ( 1) conversion, file creation, and updating; and ( 2) printing functions. the first set performs the following functions: • identification and checking of fields for validity, tagging, and structure from the raw input tape; • creation of marc-type main entries; • creation of secondary entries generated from the added entry (tag 730) and cross-reference (tag 950) fields; • creation of correction and deletion entries; • sorting of main entries and the generated secondary entries in alphabetical sequence; • sorting of correction and deletion entries in sequence number order; • addition of new records; "' ~~ .. 
,,, ,, 'i'' i ... j• !'i · · ~-· 176 journal of library automation vol. 6/ 3 september 1973 • deletion of an old record; • addition of a new variable field, including holdings statements; • substitution of data in a variable field; • deletion of a variable field; • production of a transaction file reflecting changes to the data base; and • generation of a new master tape, which can include resequencing the entire file andjor producing a work list of the file. however, any change in a !00, 200, 730, or 950 tag requires deletion of the complete record with its secondary entries, and reentry of the record in its changed form. this is because a two-pass update would be required in the tape system to automatically coltect secondary entries as well as to generate them. the second set of programs perfom1s the following functions: • printing of a formatted work list selectively by location or combination of locations, diacritical printing preceding the character to which it applies; and • printing of a conventional union list format which closely duplicates the design of the photocomposed page in figure 1. selectivity by location or groups of locations is present and all diacritical characters are overprinted as in the photocomposed list. photocomposition the preliminary edition of muls, as shown in figure 1, was photocomposed by a twin cities firm using a harris fototronic crt composition system and an ibm 370/ 145 computer system. we chose the lowest bidder which was fortunately a local firm. the bid required the vendor to program from our marc format master file tape an input tape for the photocomposer which would produce the specified format, using the marc character set in a font to be chosen from sample text pages. the vendor's bid included programming, composing, and procurement of several of the characters used by marc which were not in his current font repertoire. a test tape was provided to the vendor for his developmental use, together with documentation on the marc muls system. after seeing the initial result of our specified format we were not pleased with the result. the reason for this was compounded by the fact that: • the vendor had not followed some of the suggestions; • the vendor had made some unspecified changes; • the program had injected some data errors and other unacceptable conditions; and • ·the library, in its total lack of experience with this variable density form of display, had no idea of the real effect of its proposed format in getting efficient character density coupled with attractiveness. each of the design problems was looked at in order to adjust character minnesota union list of setials j grosch 177 size, column length or width, continuation line placement, display form (bold, regular, oblique), and relative data element placement. four iterations were required to finally produce the format shown in figure 1. as a result, our photocomposition and printing costs were half the costs had the original format been developed. style and readability also improved dramatically. the choice of type font was made by comparing sample pages in both serif and sans serif styles, including times roman and other well-known fonts. various library staff members were asked to vote on their preferred font. news gothic was an overwhehning favorite by both public and technical services oriented librarians. the photocomposition vendor had produced many catalogs and books using other special alphabets and characters, but had not previously done any catalog from a marc format tape. 
this made possible a high degree of expertise on their part in handling our special character requirements, but added some developmental problems because of lack of marc format experience. except for superscripts, subscripts, and the underline, all marc characters have been needed to display the text. our advice to those considering catalog photocomposition is to request bids, as the price on this service has continued to drop. the page price will be dependent upon the services perfom1ed. in our case the vendor handled all composition programming. one can estimate that at a minimum 40 percent-50 percent of the page charge would be involved in this service. also, the size of the job will cause a variance in the price a vendor will quote-the larger the number of pages, the cheaper the cost per page. on a very large application it may be to the library's advantage, if resources permit, to train their own programmer to program the composing device. however, we feel that our best needs were served b y contracting for this support as our programming staff was limited and did not have any prior composing-machine experience. costs the expenditure to produce a computer-based serials catalog will vary dependent upon salary and equipment rates and the conditions found in the library system. in the case of muls, condition of the files used ranged from disastrous to excellent, yet with only fragmentary information in each file. moreover, entry forms varied greatly among the many check-in, shelflist, and catalog files. therefore, data collection was much more expensive than it would have been had we keyboarded directly from one existing file of data. to present some idea of costs for others planning similar activities, we have developed some average costing information from our expenditures. each main entry in muls costs $2.81 on an average, figuring all known actual charges or subsidized costs. this main entry cost includes all associated secondary entries, which is about one secondary entry generated per h ll, h i" ii"• rr! i; ~ ;;jl i i" i'" 178 journal of library automation vol. 6/3 september 1973 1.5 main entries. this $2.81 breaks down to approximately $1.00 for design, programming, and administrative costs; $1.40 for data conversion; and $.41 for photocomposition, final printing and binding. let us look at some specific items which figure into this average cost per record to give the reader some idea of what is reasonable to expect in a project of this sort. a good example is conversion of mt/st cassette tapes to computer compatible magnetic tape, including code conversion and blocking of the records. our per-cassette conversion cost varied from $.50 to ·$2.00 per cassette. this variance was caused by a change from online to off-line conversion and the problem of handling cassette tapes which did not have the proper stop code at their end. our actual billed average throughout the whole project was $. 73 per cassette. if no tapes had been prepared omitting stop codes and if total off-line conversion had been used, our average would have been $.50 per cassette. a typical cassette tape averaged seventy-five new marc entries, so this was a very economical charge for this method. another specific cost to examine is computer time. on our ibm 360/50 system, time is billed as time on/off the system and not according to some calculation of cpu /channel/storage/peripheral device usage. normally an internal university rate is a great deal cheaper than a commercial rate for the same equipment. 
however, the billing method used in our system has probably increased our costs for computer time over the cpu time method of billing, since the user is at the mercy of contending with other jobs on the system at the same time; i.e., waiting for his processing turn. this has had a noticeable effect in our case; run times to update the file have varied from four to six hours machine onjoff time almost independent of the number of transactions being processed. photocomposition page rates over the last few years have been dropping as competition in this area has flourished. two years ago it was common to receive quotes of $6.00 per page or even higher. most prices we received were under this figure; but at the time our contract was · signed, our successful bidder, who also was our lowest bidder, quoted $2.60 per page. this included full programming support to convert our marc format tape for creation of the photocomposer input tape. today rates much lower than this can be found. moreover, rates under $1.00 per page can be obtained if the customer is able to create his own input or driver tape for the photocomposition device, making this method considerably more attractive for even low volume per-page printing. in the case of muls, one photocomposed page equals ten double column computer printed pages without photoreduction. photoreduction can cut computer output pages about one-third, yet obviously not to the limit achieved through the photocomposition method. therefore, considerable printing costs can be saved dependent upon the number of copies of each page printed. minnesota union list of serialsj grosch 179 problems the problems encountered during this project and its daily operation presently have been, for the most part, those commonly found in any large scale project. the large volume of data, less than ideal computer environment, condition of the original data, and large staff required to produce this effort all magnify many problems which seem unimportant in a small or short term project. in general these problems fall into the following categories: ( 1) data handling and bibliographic; ( 2) communications; ( 3) estimating; and ( 4) hardware or computer related problems. data handling and bibliographic. those who create and use research library catalogs can appreciate the formidable physical problem in any data conversion activity. a half century or more of cataloging variations must be brought together; mistakes in the original data, differences in format of cards, and spelling or usage inconsistencies must be weeded. couple this situation with a new staff, large in number but containing few professionals. the result could be disastrous if proper decision-making and problem identification did not occur. not knowing the magnitude of these problems we decided on almost verbatim transcription of records but spelling out all abbreviated words in any filing field. when our first file listing appeared-some 40,000 main entries plus 30,000 secondary entries-we saw that the filing arrangement was very poor due mainly to spelling variants, failure to consistently follow instructions to spell out abbreviated terms (which somehow escaped editing), and different entry forms for the same body. transcription of data from the original source was very accurate but because of these problems in the original data our proofreading resulted in some change occurring in about 10,000 of these 70,000 records. 
the use of punctuation marks in main entries varied so much that some corporate entries were filing in five or six separate groups in the list, each separated perhaps by several pages. the great shocker was the arrangement under the united states, as some coders had copied exactly from the card without spelling out u.s. and inserting a period and space. about a dozen entries had failed to be caught by the editors and appeared as one block. then, to compound the problem, others spelled out united states but forgot to insert a period after it. moreover, very early in the project the typists incorrectly inserted 2 spaces after the period. in all, there were six forms to the u.s. entries alone, with only one being correct. this lesson taught us that no matter how well instructions and examples are prepared misunderstanding can result; and, of course, editors and others will not catch all possible errors. however, these major errors were eliminated before publication. with the large volume of data and limited funds our conversion process was quite streamlined with most of the errorchecking resulting after the data were on tape and displayed in their proper relation to other records. few keyboarding errors occurred which 180 journal of library automation vol. 6/3 september 1973 were not caught at typing. the predominant errors resided in the nature of the original data, or in the lack of some piece of information from three or four different files which may have been checked in building the full record. communications. in any large project effective communication is necessary to improve quality of work and progress toward completion of the scheduled task. frequently scheduled meetings of the staff were used to inform all project members of decisions, receive their suggestions and criticisms, and develop coordinated work assignments among the teams of each editor /librarian. all typing personnel were trained as coders and were periodically relieved of typing to code. this gave them an insight into detecting problems for referral to the professional staff, renewed their knowledge of proper format, and provided more variety in their work. all project members were capable of performing tasks of coding,. control list checking, and proofreading. the most capable clerical staff also assisted the editors in editorial work. it was felt that our use of the team approach, unified training, frequent staff meetings, and very detailed written documentation served to channel communication with a resultant minimization of these problems-once the first few months of the project had passed. estimating. in most data conversion work accurate estimating is required on many matters. some estimates we made were very accurate, such as basic time and staff to complete initial coding, typing time and staff, and supplies needed. however, other estimates were not very accurate. for example, the time to edit and correct the file once basic data collection was completed was double our original estimate and required more typing than anticipated. this caused the publication schedule to be delayed two months. difficulties at the computer center and at the photocomposition vendor caused another two months delay, even though it is doubtful that our photocomposition firm would have been ready had we met our original estimate. our original target was publication not later than two months after the basic data collection period of six months, i.e., in eight months. 
however, on a project of this size, and with the addition of about 7,000 more titles than we had originally estimated, we did not feel that the fiftyfour weeks really required was excessive. computer time was also difficult to estimate because of the time on / off the system. dependent upon the nature of the other jobs on the computer, this time varied greatly, for updating runs were almost independent of the number of transactions. there is always room for improvement in estimating, and, obviously, we have learned many things from this experience to use in further work. hardware/computer center. our largest problem was creating firm computer scheduling commitments on our campus ibm 360/50 computer, which serves the business functions of the university. all other campus computing facilities use control data equipment which is six-bit character, word oriented. with the extended character set requirement and the availability minnesota union list of serialsj grosch 181 of the ibm 360/ 50, which we were already using for other work with the ala graphic print train, it was natural for us to choose this system. current facilities are now satisfactory to permit our tape batch system operation and the development of our new disk-based batch system. tape pooling operations for the mt / st have caused some problems due to equipment changes at our vendor. we have now switched to a new conversion source as our former vendor upgraded his data entry system to keyto-disk. the three mt/ st typewriters we leased pedormed quite reliably, but one machine seemed to have more down time than the others. now that our typing load is down1 we have cancelled two model v s and will maintain two machines. we are now choosing a new system for key input to cassette tape. on the new equipment we will do our proofreading and initial correction off-line resulting in a further cost saving. this was not possible previously as our typing load required two-shift operation on all machines during the preliminary edition preparation time. conclusion a great amount of effort has been expended to achieve a unified serials data base to serve minnesota's libraries. it is our hope that this system can continue to be developed in as flexible a way as possible so that future needs can be supported through the system. only the imagination of those involved in networking is the limit to identifying the future needs to be met through access to this data base. of course, we would hope that one day our data could benefit the development of other similar programs in other states and, perhaps more importantly, in achieving a true national serials data base. acknowledgment many staff members at the university and other institutions contributed their invaluable counsel as we h~ve proceeded on the development of the system and the data base. the muls project staff particularly receives our deep gratitude for its yeoman effort. special commendation is due mr. don norris for systems design and principal programming support. mr. carl sandberg, who wrote all printer output programs, also contributed invaluable assistance to the project. the minitex program and university library administration receive our appreciation for placing their confidence .in the systems division. muls and its support system is truly a product resulting from the coordinated concern and interest of the aforementioned individuals and groups. references i. u.s. library of congress. information systems office. serials: a marc format. preliminary edition. 
washington, d.c.: library of congress, 1970 (l.c. 73-606842).
2. u.s. library of congress. marc development office. serials: a marc format. addendum no. 1. washington, d.c.: library of congress, june 1971.
letter from the editor
kenneth j. varnum
information technology and libraries | march 2019
https://doi.org/10.6017/ital.v38i1.10992
the current (march 2019) issue of information technology and libraries sees the first of what i know will be many exciting contributions to our new "public libraries leading the way" column. this feature (announced in december 2018) shines a spotlight on technology-based innovations from the public library perspective. the first column, "the democratization of artificial intelligence: one library's approach," by thomas finley of the frisco (texas) public library, discusses how his library has developed a teaching and technology lending program around artificial intelligence, creating kits that community members can take home and use to explore artificial intelligence through a practical, hands-on approach. if you have a public library perspective on technology that you would like to share in a conversational, 1000-1500-word column, submit a proposal. full details and a link to the proposal submission form can be found on the lita blog. i look forward to hearing your ideas. in addition to the premiere column in this series, the current issue includes the lita president's column from bohyun kim, updating us on the 2019 ala midwinter meeting, particularly on the status of the proposed alcts/llama/lita merger, and our regular editorial board thoughts column, contributed this quarter by kevin ford, on the importance of user stories in successful technology projects. articles in this issue cover these topics: improving sitewide navigation; improving the display of hathitrust records in primo; using linked data to create a geographic discovery system; measuring information system project success; a systematic approach towards web preservation; and determining textbook cost, formats, and licensing. i hope you enjoy reading the issue, whether you explore just one article or read it "cover to cover." as always, if you want to share the research or practical experience you have gained as an article in ital, get in touch with me at varnum@umich.edu. sincerely, kenneth j. varnum, editor varnum@umich.edu march 2019
president's message
twitter nodes to networks: thoughts on the #litaforum
rachel vacek
information technologies and libraries | december 2014
one thing that never ceases to amaze me is the technological talent and creativity of my library colleagues. the lita forum is a gathering of intelligent, fun, and passionate people who want to talk about technology and learn from one another. i suppose many conferences have lots of opportunities to network, but the size and friendliness of the forum makes it feel more like a comfortable place among friends. the use of technology always inspires me, and the networking and reconnecting with friends are rejuvenating. so many more people are sharing their research and their presentations through twitter, and it's fantastic in so many ways.
so  no  matter  what  concurrent  session  you  were  in,  or  if  you  couldn’t   even  make  it  to  albuquerque  this  year,  you  can  still  view  most  of  the  presentations,  listen  to  the   keynotes,  see  pictures  of  attendees,  follow  the  backchannel,  and  engage  with  everyone  on  twitter.   with  libraries  having  more  tight  budgets,  it’s  extremely  important  that  we  continue  to  learn   virtually.  there  are  plenty  of  online  workshops  and  webinars,  but  often  they  still  cost  money,  don’t   usually  encourage  much  communication  between  attendees,  and  “attending”  the  lita  forum  only   through  twitter  is  not  only  free,  but  the  learning  and  sharing  is  more  organic.  you  have  the   opportunity  to  engage  with  attendees,  observers,  and  even  the  presenters  themselves.  structured   workshops  have  their  place  for  focused,  more  in-­‐depth  learning  on  a  particular  topic,  and  they  are   definitely  still  needed  and  very  popular.  i  enjoy  our  lita  educational  programs  and  highly   recommend  them.  however,  interacting  with  twitter  throughout  the  forum  was  like  a  giant  social   playground  for  me,  and  i  could  engage  as  much  as  or  as  little  as  i  liked.  it’s  a  different  user   experience  than  so  many  other  more  traditional  learning  environments.   twitter  was  born  in  mid  2006  and  the  paradigm  shift  started  happening  a  few  years  later,  but  the   ways  people  are  socially  engaging  with  one  another  through  twitter  has  changed  drastically  since   then.1    people  aren’t  just  regurgitating  what  the  presenters  are  saying,  but  are  responding  to   speakers  and  others  in  the  physical  and  virtual  audience.  people  are  talking  more  in  depth  about   what  they  are  learning  and  supplementing  talks  with  links  to  sites,  videos,  images,  and  reports   that  might  have  been  mentioned.  they  are  coding  and  sharing  their  code  while  at  the  conference.   they  are  blogging  about  their  experiences  and  sharing  those  links.  they  are  extending  their   networks.       the  conference  theme  this  year  was  “from  node  to  network”  and  reflecting  on  my  own   conference  experience  and  reviewing  all  the  twitter  data,  i  don’t  think  the  2014  lita  forum     rachel  vacek  (revacek@uh.edu)  is  lita  president  2014-­‐15  and  head  of  web  services,  university   libraries,  university  of  houston,  houston,  texas.     president’s  message  |  vacek       2   planning  committee,  led  by  ken  varnum  from  the  university  of  michigan,  could  have  chosen  a   better  theme.     as  previously  mentioned,  the  ways  in  which  we  are  using  twitter  have  been  significantly  changing   the  way  we  learn  and  interact.  when  combing  through  the  #litaform  tweets  for  the  gems,  i  found   many  links  to  tools  that  analyze  and  visually  display  unique  information  about  tweets  from  the   forum.  the  love  of  data  is  not  uncommon  in  libraries,  and  neither  is  the  analysis  of  that  data.     the  tagsarchive2  contains  lots  of  twitter  data  from  the  forum.  as  you  can  see  in  image  1,   between  november  1,  2013,  and  november  17,  2014,  (the  same  tag  for  the  forum  was  used  for   the  2013  forum)  there  were  5,454  tweets,  4,390  of  which  were  unique,  not  just  retweets.  
there   were  1,394  links  within  those  tweets,  demonstrating  that  we  aren’t  just  repeating  what  the   speakers  are  saying;  we  are  enriching  our  networks  with  more  easily  accessible  information.     image  1.  archive  of  #litaforum  tweets  through  tags   the  data  also  tells  stories.  for  example,  @cm_harlow  by  far  tweeted  more  than  everyone  else  with   881  tweets,  @thestackscat  had  the  highest  retweet  rate  at  90%,  and  @varnum  with  the  lowest         information  technologies  and  libraries  |  december  1014         3   retweet  rate  at  1%.  i  was  able  to  look  at  every  single  tweet  in  a  google  spreadsheet,  complete  with   timestamps  and  links  to  user  profiles.  all  this  is  rich  data  and  quite  informative,  but  tagsexplorer,   developed  by  @mhawksey,  is  also  quite  an  impressive  data  visualization  tool  that  shows   connections  between  the  twitter  handles.  (see  image  2.)     image  2.  tagsexplorer  data  visualization  and  top  conversationalists   additionally,  you  can  see  whom  you  retweeted  and  who  retweeted  you,3  again  demonstrating  the   power  of  rich,  structured  data.  (see  image  3.)  all  of  these  tools  improve  our  ability  to  share,  reflect,   archive,  and  network  within  lita  and  beyond  our  typical,  often  comfortable  library  boundaries.   tweets  also  don’t  last  forever  on  the  web,  but  they  do  when  they  are  archived.4    one  conference   attendee,  @kayiwa,  used  a  tool  called  twarc  (https://github.com/edsu/twarc),  a  command-­‐line   tool  for  archiving  json  twitter  search  results  before  they  disappear.  looking  through  the  tweets,   you  will  learn  that  a  great  number  of  attendees  experienced  altitude  sickness  due  to   albuquerque’s  elevation,  which  is  around  5,000  feet  above  sea  level.  the  most  popular  and   desired  food  to  were  enchiladas  with  green  chili.  many  were  impressed  with  the  scenery,   mountains,  and  endless  blue  skies  of  the  city,  as  evidenced  by  the  number  of  images  of  outdoor   landscapes  and  sky  shots.       president’s  message  |  vacek       4     image  3.  connections  between  @vacekrae’s  retweets  and  who  she  was  retweeted  by   there  were  two  packed  pre-­‐conferences  at  the  lita  forum.  dean  krafft  and  jon  corson-­‐rikert   from  cornell  university  library  taught  attendees  about  a  very  hot  topic:  linked  data  and  “how   libraries  can  make  use  of  linked  open  data  to  share  information  about  library  resources  and  to   improve  discovery,  access,  and  understanding  for  library  users.”    the  hashtag  #linkeddata  was   used  382  times  across  all  the  forum’s  tweets  –  clearly  conversation  went  beyond  the  workshop.   also,  francis  kayiwa,  of  kayiwa  consulting,  and  eric  phetteplace  from  the  california  college  of   arts,  helped  attendees  “learn  python  by  playing  with  library  data”  in  the  second,  equally  as   popular  pre-­‐conference.  (see  image  4.)     image  4         information  technologies  and  libraries  |  december  1014         5   the  forum  this  year  also  had  three  exceptional  keynote  speakers.    annmarie  thomas,  @amptmn,   an  engineering  professor  from  the  university  of  st.  
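the tweet and retweet counts quoted above come out of exactly this kind of archive. as a small illustration, the sketch below tallies tweets and retweets per account from a line-delimited json file such as the one twarc writes; the file name and the json keys used here are assumptions about the archive layout, not a documented twarc interface.

```python
# illustrative sketch: count tweets and retweets per account from a
# line-delimited json archive (one tweet object per line, as twarc produces).
# the filename and the "user"/"screen_name"/"retweeted_status" keys are
# assumptions about the archive layout, used here only for demonstration.
import json
from collections import Counter

tweets_per_user = Counter()
retweets_per_user = Counter()

with open("litaforum.jsonl", encoding="utf-8") as archive:
    for line in archive:
        tweet = json.loads(line)
        user = tweet.get("user", {}).get("screen_name", "unknown")
        tweets_per_user[user] += 1
        if "retweeted_status" in tweet:
            retweets_per_user[user] += 1

for user, total in tweets_per_user.most_common(10):
    share = retweets_per_user[user] / total
    print(f"@{user}: {total} tweets, {share:.0%} retweets")
```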
thomas  in  minnesota,  kicked  off  the  forum   and  shared  her  enthusiasm  and  passion  for  makerspaces,  squishy  circuits,  and  how  to  engage  kids   in  engineering  and  science  in  incredibly  creative  ways.  i  was  truly  inspired  by  her  passion  for   making  and  sharing  with  others.    she  reminded  us  that  all  children  are  makers,  and  as  adults  we   need  to  remember  to  be  curious,  explore,  and  play.    there  are  129  tweets  that  capture  not  only  her   fun  presentation  but  also  her  vision  for  making  in  the  future.  (see  image  5.)     image  5   the  second  keynote  speaker  was  lorcan  dempsey,  @lorcand,  the  vice  president,  oclc  research   and  chief  strategist.    he’s  known  primarily  for  the  research  he  presents  through  his  weblog,   http://orweblog.oclc.org,  where  he  makes  observations  on  the  way  users  interact  with  technology   and  the  discoverability  of  all  that  libraries  have  to  offer,  from  collections  to  services  to  expertise.   he  wants  to  make  library  data  more  usable.  in  his  talk,  he  explained  how  some  technologies  such   as  mobile  devices  and  irs  are  having  huge  effects  on  user  behaviors.    “the  network  reshapes   society  and  society  reshapes  the  network.”  what  was  nice  also  is  that  lorcan’s  talk  complimented   annmarie’s  talk  about  making  and  sharing.  users  are  going  from  consumption  to  creation,  and  we,   as  libraries,  need  to  be  offering  our  services  and  content  in  the  users’  workflows.    we  need  to   share  our  resources,  make  them  more  discoverable.    why?    “discovery  often  happens  elsewhere.”     check  out  the  123  posts  on  the  twitter  archive,  which  includes  links  to  his  presentation.    (see   image  6.)     image  6     president’s  message  |  vacek       6   kortney  ryan  ziegler,  @fakerapper,  is  the  founder  trans*h4ck  and  the  closing  keynote  speaker.     his  work  focuses  on  supporting  trans-­‐created  technology,  trans  entrepreneurs,  and  trans-­‐led   startups.  he’s  led  hackathons  and  helped  create  safe  spaces  for  the  trans  community.    his  work  is   so  important  and  many  of  the  apps  help  to  address  the  social  inequalities  that  the  trans   community  still  faces.    for  example,  he  mentioned  that  it’s  still  legal  in  36  states  to  be  fired  for   being  trans.    but  there  are  174  tweets  captured  at  the  forum  that  give  examples  of  the  web  tools   created,  and  ideas  about  how  libraries  can  be  inclusive  and  more  supportive  of  the  trans   community.    (see  image  7.)     image  7   the  sessions  themselves  were  excellent,  and  many  sparked  conversations  long  after  the   presentation.  lightning  talks  were  engaging,  fast,  and  fun.  posters  were  both  beautiful  and   informative.  overarching  terms  that  i  heard  repeatedly  and  saw  among  the  tweets  were:  open   graph,  openrefine,  social  media,  makerspaces,  bibframe,  library  labs,  leadership,  support,   community,  analytics,  assessment,  engagement,  inclusivity,  diversity,  agile  development,  open   access,  linked  data,  vivo,  dataone,  discovery  systems,  discoverability,  librarybox,  islandora,  and   institutional  repositories.  
below  are  some  highlights:           information  technologies  and  libraries  |  december  1014         7       there  were  so  many  opportunities  to  network  at  sessions,  on  breaks,  at  the  networking  dinners,   and  even  at  game  night.  i  see  networking  as  a  huge  benefit  of  a  small  conference,  and  networking   can  lead  to  some  pretty  amazing  things.    for  example,  whitni  watkins,  @nimblelibrarian  and  one   of  lita’s  invaluable  volunteers  for  the  forum,  was  so  inspired  by  a  conversation  on  openrefine   that  she  created  a  list  where  people  could  sign  up  to  learn  more  and  get  some  hands-­‐on  playing   time  with  the  tool.    on  her  blog,5  whitni  says,  “…most  if  not  all  of  those  who  came  left  with  a  bit     president’s  message  |  vacek       8   more  knowledge  of  the  program  than  before  and  we  opened  a  door  of  possibility  for  those  who   hadn’t  any  clue  as  to  what  openrefine  could  do.”   another  example  of  great  networking  is  where  tabby  farney,  @sharebrarian,  and  cody  behles,   @cbehles,  decided  to  create  a  lita  metrics  interest  group.    at  one  of  the  networking  dinners,  they   discussed  their  passion  for  altmetrics  and  web  analytics  but  noticed  that  there  wasn’t  an  existing   group,  and  felt  spurred  to  create  one.         the  technology  and  information  sharing,  the  networking,  the  collaborating,  and  the  strategizing  –   these  are  all  components  that  make  up  the  lita  forum.    twitter  is  just  another  technology   platform  to  help  us  connect  with  one  another.    we  are  all  just  nodes,  and  technology  enables  us  to   both  become  the  network  and  to  network  more  effectively.   but  finally,  i  want  to  acknowledge  and  thank  our  sponsors,  many  of  which  are  also  lita  members.   we  could  not  have  run  the  forum  without  the  generous  funds  from  ebsco,  springshare,  @mire,   innovative,  and  oclc.  on  behalf  of  lita,  i  truly  appreciate  their  support.   i  want  to  leave  you  with  one  more  image  that  was  created  by  @kayiwa  using  the  most  tweeted   words  from  all  the  posts.6  next  year’s  forum  is  in  minneapolis,  and  i  hope  to  see  you  there.           information  technologies  and  libraries  |  december  1014         9   references     1.  http://consumercentric.biz/wordpress/?p=106   2.https://docs.google.com/spreadsheet/pub?key=0asyivmoyhk87dfnfx196v1e2m2zqtvlhq2j vs2fsdee&output=html     3.  http://msk0.org/lita2014/litaforum-­‐directed-­‐retweets.html   4.  http://msk0.org/lita2014/lita2014.html   5.  http://nimblelibrarian.wordpress.com/2014/11/14/lita-­‐forum-­‐2014-­‐a-­‐recap/   6.  http://msk0.org/lita2014/litaforum-­‐wordcloud.html mobile website use and advanced researchers: understanding library users at a university marine sciences branch campus mary j. markland, hannah gascho rempel, and laurie bridges information technology and libraries | december 2017 7 abstract this exploratory study examined the use of the oregon state university libraries website via mobile devices by advanced researchers at an off-campus branch location. branch campus–affiliated faculty, staff, and graduate students were invited to participate in a survey to determine what their research behaviors are via mobile devices, including frequency of their mobile library website use and the tasks they were attempting to complete. 
findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. results of this survey will be used to address knowledge gaps around library resources and research tools and to generate more ways to study advanced researchers’ use of library services via mobile devices. introduction as use of mobile devices has expanded in the academic environment, so has the practice of gathering data from multiple sources about what mobile resources are and are not being used. this data informs the design decisions and resource investments libraries make in mobile tools. web analytics is one tool that allows researchers to discover which devices patrons use to access library webpages. but web analytics data do not show what patrons want to do and what hurdles they face when using the library website via a mobile device. web analytics also lacks nuance in that it cannot distinguish user characteristics, such as whether users are novice or advanced researchers, which may affect how these users interact with a mobile device. user surveys are another tool for gathering data on mobile behaviors. user surveys help overcome some of the limitations of web analytics data by directly asking users about their perceived research skills and the resources they use on a mobile device. as is the case at most libraries, oregon state university libraries serves a diverse range of users. we were interested in learning whether advanced researchers—particularly advanced researchers who work at a branch campus—use the library’s resources differently than main mary j. markland (mary.markland@oregonstate.edu), is head, guin library; hannah gascho rempel (hannah.rempel@oregonstate.edu) is science librarian and coordinator of graduate student success services; and laurie bridges (laurie.bridges@oregonstate.edu) is instruction and outreach librarian, oregon state university libraries and press. mailto:mary.markland@oregonstate.edu mailto:hannah.rempel@oregonstate.edu mailto:laurie.bridges@oregonstate.edu mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 8 campus users. we were chiefly interested in these advanced researchers because of the mobile nature of their work. they are graduate students and faculty in the field of marine science who work in a variety of locations, including their offices, labs, and in the field (which can include rivers, lakes, and the ocean). we focused on the use of the library website via mobile devices as one way to determine whether specific library services should be adapted to best meet the needs of this targeted user community. oregon state university (osu) is oregon’s land-grant university; its home campus is in corvallis, oregon. hatfield marine science center (hmsc) in newport is a branch campus that includes a branch library. guin library at hmsc serves osu students and faculty from across the osu colleges along with the co-located federal and state agencies of the national oceanic and atmospheric administration (noaa), us fish and wildlife service, environmental protection agency (epa), united states geological survey (usgs), united states department of agriculture (usda), and the oregon department of fish and wildlife. 
the guin library is in newport, which is forty-five miles from the main campus. like many other branch libraries, guin library was established at a time when providing a print collection close to where researchers and students work was paramount, but today it must adapt its services to meet the changing information needs of its user base. branch libraries are typically designed to serve a clientele or subject area, which can create a different institutional culture from the main library. guin library serves advanced undergraduates, graduate students, and scientific researchers. hmsc’s distance from corvallis, the small size of the researcher community, and the shared focus on a research area—marine sciences—create a distinct culture. while guin library is often referred to as the “heart of hmsc,” the number of in-person library users is decreasing. this decline is not unexpected as numerous studies have shown that faculty and graduate students have fewer needs that require an in-person trip to the library.1 studies have also shown that faculty and graduate students can be unaware of the services and resources that libraries provide, thereby continuing the cycle of underuse. 2 to learn more about the needs of hmsc’s advanced researchers, this exploratory study examined their research behaviors via mobile devices. the goals of this study were to • determine if and with what frequency advanced researchers at hmsc use the osu libraries website via mobile devices; • gather a list of tasks advanced users attempt to accomplish when they visit the osu libraries website on a mobile device; and • determine whether the mobile behaviors of these advanced researchers are different from those of researchers from the main osu campus (including undergraduate students), and if so, whether these differences warrant alternative modes of design or service delivery. information technology and libraries | december 2017 9 literature review the conversation about how best to design mobile library websites has shifted over the past decade. early in the mobile-adoption process some libraries focused on creating special websites or apps that worked with mobile devices.3 while libraries globally might still be creating mobilespecific websites and apps,4 us libraries are trending toward responsively designed websites as a more user-friendly option and a simpler solution for most libraries with limited staff and budgets.5 most of the literature on mobile-device use in higher education is focused on undergraduates across a wide range of majors who are using a standard academic library. 6 to help provide context for how libraries have designed their websites for mobile users, some of those specific findings will be shared later. but because our study focused on graduate students and faculty in a sciencefocused branch library, we will begin with a discussion of what is known about more advanced researchers’ use of library services and their mobile-device habits. several themes emerged from the literature on graduate students’ relationships with libraries. in an ironic twist, faculty think graduate students are being assisted by the library while librarians think faculty are providing graduate students with the help they need to be successful.7 this results in many graduate students end up using their library’s resources in an entirely disintermediated way. 
graduate students, especially those in the sciences, visit the physical library less often and use online resources more than undergraduate students.8 most graduate students start their research process with assistance from academic staff, such as advisors and committee members,9 and are unaware of many library services and resources.10 as frequent virtual-library users who receive little guidance on how to use the library's tools, graduate students need a library website that is clear in scope and purpose, offers help, and has targeted services.11 compared to reports on undergraduate use of mobile devices to access their library's website, relatively few studies have focused on graduate-student or faculty mobile behaviors. a recent survey of japanese library and information science (lis) students compared undergraduate and graduate students' usage of mobile devices to access library services and found slight differences. however, both groups reported accessing libraries as last on their list of preferred smartphone uses.12 aharony examined the mobile use behaviors of israeli lis graduate students and found approximately half of these graduate students used smartphones, perceived them to be useful and easy tools for use in their everyday life, and could transfer those habits to library searching behaviors.13 when looking specifically at how patrons use library services via a mobile device, rempel and bridges found the top reason graduate students at their main campus used the osu libraries website via mobile devices was to find information on library hours, followed by finding a book and researching a topic.14 barnett-ellis and vann surveyed their small university and found that both undergraduate and graduate students were more than twice as likely to use mobile devices as their faculty and staff; a majority of students also indicated they were likely to use mobile devices to conduct research.15 finally, survey results showed graduate students in hofstra university's college of education reported accessing library materials via a mobile device twice as often as other student groups. in addition, these graduate students reported being comfortable reading articles up to five pages long on their mobile devices. graduate students were also more likely to be at home when using their mobile device to access the library, a finding the authors attributed to education graduate students frequently being employed as full-time teachers.16 research on how faculty members use library resources characterizes a population that is confident in their literature-searching skills, prefers to search on their own, and has little direct contact with the library.17 faculty researchers highly value convenience;18 they rely primarily on electronic access to journal articles but prefer print access to monographs.19 faculty tend to be self-trained at using search tools, such as pubmed or other online databases, and therefore are not always aware of the more in-depth functionality of these tools.20 in contrast to graduate students, rempel and bridges found that faculty using the library website via mobile devices were less interested in information about the physical library, such as library hours, and were more likely to be researching a topic.21 medical faculty are one of the few faculty groups whose mobile-research behaviors have been specifically examined. a survey administered by bushhousen et al.
at a medical university revealed that a third of respondents used mobile apps for research-related activities.22 findings by boruff and storie indicate that one of the biggest barriers to mobile use in health-related academic settings was wireless access.23 thus apps that did not require the user to be connected to the internet were highly desired. faculty and graduate students in health-related academic settings saw a role for the library in advocating for better wireless infrastructure, providing access to a targeted set of heavily used resources, and providing online guides or in-person tutorials on mobile apps or procedures specific to their institution.24 according to the literature, most design decisions for library mobile sites have been made on the basis of information collected about undergraduate students' behavior at main campuses. to help inform our understanding of how recent decisions have been made, the remainder of the literature review focuses on what is known about undergraduate students' mobile behavior. undergraduate students are very comfortable using mobile technologies and perceive themselves to be skilled with these devices. according to the 2015 educause center for analysis and research (ecar) study of undergraduate students and information technology, most undergraduate students consider themselves sophisticated technology users who are engaged with information technologies.25 undergraduate students mainly use their smartphones for nonclass activities. but students indicate they could be more effective technology users if they were more skilled at tools such as the learning management system, online collaboration tools, e-books, or laptops and smartphones in class. of interest to libraries is the ecar participants' top area of reported interest, "search tools to find reference or other information online for class work."26 however, when a mobile library site is in place, usage rates have been found to be lower than anticipated. in a study of undergraduate science students, salisbury et al. found only 2 percent of respondents reported using their cell phones to access library databases or the library's catalog every hour or daily, despite 66 percent of the students browsing the internet using their mobile phone hourly or daily. salisbury et al. speculated that users need to be told about mobile-optimized library resources if libraries want to increase usage.27 rempel and bridges used a pop-up interrupt survey while users were accessing the osu libraries mobile site.28 this approach allowed a larger cross-section of library users to be surveyed. it also reduced memory errors by capturing their activities in real time. activities that had been included in the mobile site because of their perceived usefulness in a mobile environment, such as directions, asking a librarian a question, and the coffee shop webcam, were rarely cited as a reason for visiting the mobile site. the osu libraries branch at hmsc is entering a new era. a marine studies initiative will result in the building of a new multidisciplinary research campus at hmsc that aims to serve five hundred undergraduate students. the change in demographics and the increase in students who will need to be served have prompted guin library staff to explore how the current population of advanced researchers interacts with library resources.
in addition, examining the ways undergraduate students at the main campus use these tools will help with planning for the upcoming changes in the user community. methods this study used an online qualtrics survey to gather information about how frequently advanced researchers (graduate students, faculty, and affiliated scientists at a branch library for marine science) use the osu libraries website via mobile devices, what they search for, and other ways they use mobile devices to support their research behaviors. a recruitment email with a link to the survey was sent to three discussion lists used by hmsc community in spring 2016. the survey was available for four weeks, and a reminder email was sent one week before the survey closed. the invitation email included a link to an informedconsent document. once the consent document had been reviewed, users were taken to the survey via a second link. respondents could provide an email address to receive a three-dollar coffee card for participating in the study, but their email address was recorded in a separate survey location to preserve their anonymity. the invitation email indicated that this survey was about using the website via a mobile device, and the first survey question asked users if they had ever accessed the library website on a mobile device. if they answered “no,” they were immediately taken to the end of the survey and were not recorded as a participant in the study. a similar survey was conducted with users from osu’s main campus in 2012–13 and again in 2015. the results from 2012–13 have been published previously,29 but the results from 2015 have not. while the focus of the present study is on the mobile behaviors of advanced researchers in the hmsc community, data from the 2015 main-campus study is used to provide a comparison to the broader osu community. osu main-campus respondents in 2015 and hmsc participants in 2016 both answered closedand open-ended questions that explored participants’ general mobiledevice behaviors and behaviors specific to using the osu libraries website via mobile devices. mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 12 however, the hmsc survey also asked questions about behaviors related to using the osu (nonlibrary) website via a mobile device and participants’ mobile scholarly reading and writing behaviors. the survey concluded with several demographic questions. the survey data was analyzed using qualtrics’ cross-tab functionality and microsoft excel to observe trends and potential differences between user groups. open-ended responses were examined for common themes. twenty-three members of the hmsc community completed the survey, whereas one hundred participants responded to the 2015 main campus survey. participation in the 2015 survey was capped at one hundred respondents because limited incentives were available. the participation difference between the two surveys reflects several differences between the two sampled communities. the most obvious difference is size. the osu community comprises more than thirty-six thousand students, faculty, and staff; the hmsc community is approximately five hundred students, researchers, and faculty—some of whom are also included as part of the larger osu community. 
the second factor influencing response rates relates to the difference in size between the two communities, but is more striking in the hmsc community: the survey relied on a self-selected group of users who indicated they had a history using the library website via a mobile device. therefore, it is not possible to estimate the population size of mobile-device library-website users specific to the branch library or the main campus library. this limitation means that the results from this study cannot be used to generalize findings to all users who visit a library website via mobile devices; instead the results are intended to present a case that other libraries may compare with behaviors observed on their own campuses. sharing the behaviors of advanced researchers at a branch campus is particularly valuable as this population has historically been understudied. results and discussion participant demographics and devices used of the twenty-three respondents to the hmsc mobile behaviors survey, 13 (62 percent) were graduate students, 7 (34 percent) were faculty (this category includes faculty researchers and courtesy faculty), and one respondent was an noaa employee. two participants declined to declare their affiliation. of the 97 respondents to the 2015 osu main-campus survey who shared their affiliation, 16 (16 percent) were graduate students, 5 (5 percent) were faculty members, and 69 (71 percent) were undergraduates. respondents varied in the types of mobile devices they used when doing library research. smartphones were used by 78 percent (18 respondents) and 22 percent (5 respondents) used a tablet. apple (15 respondents) was the most common device brand used, although six of the respondents used an android phone or tablet. compared to the general population’s device ownership, these respondents are more likely to own apple devices, but the two major device types owned (apple and android) match market trends.30 information technology and libraries | december 2017 13 frequency of library site use on mobile devices most of the hmsc respondents are infrequent users of the library website via mobile devices: 50 percent (11 respondents) did so less than once a month; 41 percent (9 respondents) did so at least once a month; and 9 percent (2 respondents) did so at least once a week. the low level of library website usage via mobile devices was especially notable as this population reports being heavy users of the library website via laptops or desktop computers, with 82 percent (18 respondents) visiting the library website via those tools at least once a week. researchers at hmsc used the library website via mobile devices much less often than the 2015 main-campus respondents (undergraduates, graduate students, and faculty). no hmsc respondents visited the mobile site daily compared to 10 percent of main-campus users, and only 9 percent of hmsc respondents visited weekly compared to 28 percent of main-campus users (see figure 1). figure 1. 2016 hmsc participants vs. 2015 osu main-campus participants reported frequency of library website visits via a mobile device by percent of responses. while hmsc advanced researchers share some mobile behaviors with main-campus students, this exploratory study demonstrates they do not use the library website via mobile devices as frequently. some possible reasons for this are researchers rarely spend time coming and going to and from classes and therefore do not have small gaps of time to fill throughout their day. 
instead, their daily schedule involves being in the field or in the lab collecting and analyzing data. alternatively, they are frequently involved in writing-intensive projects such as drafting journal articles or grant proposals. they carve out specific periods to do research and do not appear to be filling time with short bursts of literature searching. they can work on laptops and do not need to multitask on a phone or tablet between classes or in other situations. mobile-device ownership among hmsc graduate students might also be limited because of personal budgets that do not allow for owning multiple mobile devices or for having the most recent model. in addition, this group of scientists may not be on the front edge of personal technologies, especially compared to medical researchers, because few mobile apps are designed specifically for the research needs of marine scientists. where researchers are when using mobile devices for library tasks. because mobile devices facilitate connecting to resources from many locations, and because advanced researchers conduct research in a range of settings—including the field, the office, and home—we asked respondents where they were most likely to use the library website via a mobile device. thirty-two percent were most likely to be at home, 27 percent in transit, 18 percent at work, and 9 percent in the field. the popularity of using the library website via mobile devices while in transit was somewhat unexpected, but perhaps should not have been because many people try to maximize their travel time by multitasking on mobile devices. the distance from the main campus might explain this finding because a local bus service provides an easy way to travel to and from the main campus, and the hour-long trip would provide opportunities for multitasking via a mobile device. relatively few respondents used mobile devices to access the library website while at work. previous studies show that a lack of reliable campus wireless internet access can affect students' ability to use mobile technology.31 hmsc also struggles to provide consistent wireless access, and signals are spotty in many areas of our campus. despite signal boosters in guin library, wireless access is still limited at times. in addition, cell phone service is equally spotty both at hmsc and up and down the coast of oregon. it is much less frustrating to work on a device that has a wired connection to the internet while at hmsc. these respondents did use mobile devices while at home, which might indicate they had a better wireless signal there. alternatively, working from home on a mobile device might indicate that they compartmentalize their library-research time as an activity to do at home instead of in the office. researchers used their mobile devices to access the library while in the field less than originally expected, but upon further reflection, it made sense that researchers would be less likely to use library resources during periods of data collection for oceanic or other water-based research projects because of their focused involvement during that stage. the water-based research also increases the risk of losing mobile devices.
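the group-by-group comparisons reported in this and the surrounding sections were produced with qualtrics' cross-tab reports and excel, as described in the methods. purely as a hedged illustration, the short python sketch below performs the same kind of tabulation on a hypothetical csv export of the responses; the file name and the column names (affiliation, library_site_frequency) are placeholders, not the survey's actual field names.

```python
import pandas as pd

# hypothetical export of the survey: one row per respondent, with the
# affiliation question and the "how often do you visit the library website
# on a mobile device" question stored as plain-text columns
responses = pd.read_csv("hmsc_mobile_survey.csv")

# cross-tabulate affiliation against reported visit frequency,
# normalizing within each affiliation so the cells read as percentages
table = (
    pd.crosstab(
        responses["affiliation"],
        responses["library_site_frequency"],
        normalize="index",
    )
    * 100
).round(1)

print(table)
```

the same two-line crosstab call, swapping in other question columns, reproduces the content-type and location breakdowns discussed in the results.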
library resources accessed via mobile devices. to learn more about how these respondents used the library website, we asked them to choose what they were searching for from a list of options. respondents could choose as many options as applied to their searching behaviors. hmsc respondents' primary reason for visiting the library's site via a mobile device was to find a specific source: 68 percent looked for an article, 45 percent for a journal, 36 percent for a book, and 14 percent for a thesis. many of the hmsc respondents also looked for procedural or library-specific information: 36 percent looked for hours, 32 percent for my account information, 18 percent for interlibrary loan, 14 percent for contact information, 9 percent for how to borrow and request books, 9 percent for workshop information, and 9 percent for oregon estuaries bibliographies—a unique resource provided by the hmsc library. fifty-five percent of searches were for a specific source and 43 percent were for procedural or library-specific information. notably missing from this list were respondents who reported searching via their mobile device for directions to the library. compared to the 2015 osu libraries main-campus survey respondents, hmsc respondents were much more likely to visit the library website via a mobile device to look for an article (68 percent vs. 37 percent), find a journal (45 percent vs. 23 percent), access my account information (32 percent vs. 7 percent), use interlibrary loan (18 percent vs. 5 percent), or find contact information (14 percent vs. 1 percent). however, unlike hmsc participants, who do not have access to course reserves at the branch library, 7 percent of osu main-campus respondents used their mobile devices to find course reserves on the library website. see figure 2. figure 2. 2016 hmsc vs. 2015 osu main-campus participants reported searches while visiting the library website via a mobile device by percent of responses. it is possible that hmsc users with different affiliations might use the library site via a mobile device differently. these exploratory findings show that graduate students used the greatest variety of content via mobile devices. graduate students as a group reported using 11 of the 14 provided content choices via a mobile device while faculty reported using 8 of the 14. graduate students were the largest group (62 percent of respondents), which might explain why as a group they searched for more types of content via mobile devices. interestingly, faculty members and faculty researchers reported looking for a thesis via a mobile device, but no graduate students did. perhaps these graduate students had not yet learned about the usefulness of referencing past theses as a starting point for their own thesis writing. or perhaps they were only familiar with searching for journal articles on a topic. in contrast, faculty members might have been searching for specific theses for which they had provided advising or mentoring support. to help us make decisions about how to best direct users to library content via mobile devices, we asked respondents to indicate their searching behaviors and preferences.
of the 16 hmsc respondents who answered this question, 12 (75 percent) used our web-scale discovery search box via mobile devices; 4 (25 percent) reported that they did not. presumably these latter searchers were navigating to another database to find their sources. of 16 respondents, only 6 (38 percent) indicated that they looked for a specific library database (as opposed to the discovery tool) when using a mobile device. those respondents who were looking for a database tended to be looking for the web of science database, which makes sense for their field of study. when conducting searches for sources on their mobile devices, hmsc respondents employed a variety of search strategies: the 12 respondents who replied used a combination of author (75 percent), journal title (67 percent), keyword (67 percent), and book title (50 percent) searches when starting at the mobile version of the discovery tool. when asked about their preferred way to find sources, a majority of hmsc respondents reported that they tended to prefer a combination of searching and menu navigation while using the library website from mobile devices, while the remainder were evenly divided between preferring menu-driven and search-driven discovery. while osu libraries does not currently provide links to any specific apps for source discovery, such as pubmed mobile or jstor browser, 13 (62 percent) of the hmsc respondents indicated they would be somewhat or very likely to use an app to access and use library services. this finding connects to the issue of reliable wireless access. medical graduate students had a wider array of apps available to them, but the primary reason they wanted to use these apps was because they provided a better searching experience in hospitals that had intermittent wireless access—an experience to which researchers at hmsc could relate.32 university website use behaviors on mobile devices. to help situate respondents' library use behaviors on mobile devices in comparison to the way they use other academic resources on mobile devices, we asked hmsc respondents to describe their visits to resources on the osu (nonlibrary) website via mobile devices. compared to their use of the library site on a mobile device, respondents' use of university services was higher: 43 percent (9 respondents) visited the university's website via a mobile device at least once a week compared to only 9 percent (2 respondents) who visited the library site with that frequency. this makes sense because of the integral function many of these university services play in most university employees' regular workflow. respondents indicated visiting key university sites including myosu (a portal webpage, visited by 60 percent of respondents), the hmsc webpage (55 percent), canvas (the university's learning management system, visited by 50 percent of respondents), and webmail (45 percent). see figure 3. figure 3. university webpages hmsc respondents access on a mobile device by percent of responses. university resources such as campus maps, parking locations, and the graduate school website were frequently used by this population. the use of the first two makes sense as hmsc users are located off-site and need to use maps and parking guidance when they visit the main campus. the use of the graduate school website makes sense because the respondents were primarily graduate students and graduate school guidelines are a necessary source of information. interestingly, our advanced users are similar to undergraduates in that they primarily read email, information from social networking sites, and news on their mobile devices.33 other research behaviors on mobile devices
interestingly, our advanced users are similar to undergraduates in that they primarily read email, information from social networking sites, and news on their mobile devices. 33 other research behaviors on mobile devices mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 18 we wanted to know what other research-related behaviors the hmsc respondents are engaged in via mobile devices to determine if there might be additional ways to support researchers’ workflows. we specifically asked about respondents’ reading, writing, and note-taking behaviors to learn how well these respondents have integrated them with their mobile usage behaviors. all respondents reported reading on their mobile device (see figure 4). email represented the most common reading activity (95 percent), followed by “quick reading” activities, such as reading social networking posts (81 percent), current news (81 percent), and blog posts (62 percent). smaller numbers used their mobile devices for academic or long-form reading, such as reading scholarly articles (33 percent) or books (19 percent). of those respondents who read articles and books on their mobile devices, only respondents highlighted or took notes using their mobile device. seven respondents used a citation manager on their mobile device: three used endnote, one used mendeley, one used pages, and one used zotero. one respondent used evernote on their mobile device, and one advanced user reported using specific data and database management software, websites, and apps related to their projects. more advanced and interactive mobilereading features, such as online spatial landmarks, might be needed before reading scholarly articles on mobile devices becomes more common.34 figure 4. what hmsc respondents reported reading on a mobile device by percent of responses. limitations this exploratory study had several limitations, most of which reflect the nature of doing research with a small population at a branch campus. this study had a small sample size, which limited observations of this population; however, future studies could use research techniques such as interviews or ethnographic studies to gather deep qualitative information about mobile-use 19% 33% 62% 81% 81% 95% 0% 20% 40% 60% 80% 100% 120% books academic or scholarly articles blog posts current news social networking posts (facebook, twitter, etc.) email percent of responses information technology and libraries | december 2017 19 behaviors in this population. a second limitation was that previous studies of the osu libraries mobile website used google analytics to compare survey results with what users were actually doing on the library website. unfortunately, this was not possible for this study. because of how hmsc’s network was set up, anyone at hmsc using the osu internet connections is assigned an ip address that shows a corvallis, oregon, location rather than a newport, oregon, location, which rendered parsing hmsc-specific users in google analytics impossible. the research behaviors of advanced researchers at a branch campus has not been well-examined; despite its limitations, this study provides beneficial insights into the behaviors of this user population. conclusion focusing on how advanced researchers at a branch campus use mobile devices while accessing library and other campus information provides a snapshot of key trends among this user group. 
these exploratory findings show that these advanced researchers are infrequent users of library resources via mobile devices and, contrary to our initial expectations, are not using mobile devices as a research resource while conducting field-based research. findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. the results of this survey will be used to address the hmsc knowledge gaps around use of library resources and research tools via mobile devices. both graduate students and faculty lack awareness of library resources and services and have unsophisticated library research skills.35 while the osu main campus has library workshops for graduate students and faculty, these workshops have been inconsistently duplicated at the guin library. because the people working at hmsc come from such a wide variety of departments across osu that focus on marine sciences, hmsc has never had a library orientation. the results indicate possible value in devising ways to promote guin library's resources and services locally, which could include highlighting the availability of mobile library access. while several participants mentioned using research tools like evernote, pages, or zotero on their mobile devices, most participants did not report enhancing their mobile research experience with these mobile-friendly tools. workshops specifically modeling how to use mobile-friendly tools and apps such as dropbox, evernote, goodreader, or browzine could help introduce the benefits of these tools to these advanced researchers. because wireless access is even more of a concern for researchers at this branch location than for researchers at the main campus, database-specific apps will be explored to determine if the use of searching apps could help alleviate inconsistent wireless access. if database apps that are appropriate for marine science researchers are available, these will be promoted to this user population. future research might involve follow-up interviews, focus groups, or ethnographic studies, which could expand the knowledge of these researchers' mobile-device behaviors and their perceptions of mobile devices. exploring the technology usage by these advanced researchers in their labs, including electronic lab notebooks or other tools, might be an interesting contrast to their use of mobile devices. in addition, as the hmsc campus grows with the expansion of the marine studies initiative, increasing numbers of undergraduates will use guin library. the ecar 2015 statistics show that current undergraduates own multiple internet-capable devices.36 presumably, these hmsc undergraduates will be likely to follow the trends seen in the ecar data. certainly, the plans to expand hmsc's internet and wireless infrastructure will affect all its users. our mobile survey gave us insights into how a sample of the hmsc population uses the library's resources and services. these observations will allow guin library to expand its services for the hmsc campus. we encourage other librarians to explore their unique user populations when evaluating services and resources.
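one practical follow-up to the google analytics limitation noted in the limitations section (hmsc traffic resolving to a corvallis ip address) is to work from the web server's own access logs, where requests can at least be split into mobile and non-mobile visits by user-agent string. the sketch below is a simplified illustration under assumed conditions: an apache-style combined log format, a hypothetical log path, and a deliberately crude set of mobile markers. it is not a description of osu libraries' actual analytics setup, and a production analysis would normally use a maintained user-agent parsing library.

```python
import re
from collections import Counter

# very rough mobile indicators; real user-agent detection is messier than this
MOBILE_HINTS = re.compile(r"Mobile|Android|iPhone|iPad", re.IGNORECASE)
# the user agent is the last quoted field on a combined-format log line
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:  # hypothetical path
    for line in log:
        match = UA_FIELD.search(line)
        if not match:
            continue
        agent = match.group(1)
        counts["mobile" if MOBILE_HINTS.search(agent) else "desktop/other"] += 1

total = sum(counts.values()) or 1
for device, n in counts.most_common():
    print(f"{device}: {n} requests ({100 * n / total:.1f}%)")
```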
references 1 maria anna jankowska, “identifying university professors’ information needs in the challenging environment of information and communication technologies,” journal of academic librarianship 30, no. 1 (2004): 51–66, https://doi.org/10.1016/j.jal.2003.11.007; pali u. kuruppu and anne marie gruber, “understanding the information needs of academic scholars in agricultural and biological sciences,” journal of academic librarianship 32, no. 6 (2006): 609–23; lotta haglund and per olsson, “the impact on university libraries of changes in information behavior among academic researchers: a multiple case study,” journal of academic librarianship 34, no. 1 (2008): 52–59, https://doi.org/10.1016/j.acalib.2007.11.010; nirmala gunapala, “meeting the needs of the ‘invisible university’: identifying information needs of postdoctoral scholars in the sciences,” issues in science and technology librarianship, no. 77 (summer 2014), https://doi.org/10.5062/f4b8563p. 2 tina chrzastowski and lura joseph, “surveying graduate and professional students’ perspectives on library services, facilities and collections at the university of illinois at urbanachampaign: does subject discipline continue to influence library use?,” issues in science and technology librarianship no. 45 (winter 2006), https://doi.org/10.5062/f4dz068j; kuruppu and gruber, “understanding the information needs of academic scholars in agricultural and biological sciences”; haglund and olsson, “the impact on university libraries of changes in information behavior among academic researchers.” 3 ellyssa kroski, “on the move with the mobile web: libraries and mobile technologies,” library technology reports 44, no. 5 (2008): 1–48, https://doi.org/10.5860/ltr.44n5. 4 paula torres-pérez, eva méndez-rodríguez, and enrique orduna-malea, “mobile web adoption in top ranked university libraries: a preliminary study,” journal of academic librarianship 42, no. 4 (2016): 329–39, https://doi.org/10.1016/j.acalib.2016.05.011. 5 david j. comeaux, “web design trends in academic libraries—a longitudinal study,” journal of web librarianship 11, no. 1 (2017), 1–15, https://doi.org/10.1080/19322909.2016.1230031; https://doi.org/10.1016/j.jal.2003.11.007 https://doi.org/10.1016/j.acalib.2007.11.010 https://doi.org/10.5062/f4b8563p https://doi.org/10.5062/f4dz068j https://doi.org/10.5860/ltr.44n5 https://doi.org/10.1016/j.acalib.2016.05.011 https://doi.org/10.1080/19322909.2016.1230031 information technology and libraries | december 2017 21 zebulin evelhoch, “mobile web site ease of use: an analysis of orbis cascade alliance member web sites,” journal of web librarianship 10, no. 2 (2016): 101–23, https://doi.org/10.1080/19322909.2016.1167649. 6 barbara blummer and jeffrey m. kenton, “academic libraries’ mobile initiatives and research from 2010 to the present: identifying themes in the literature,” in handbook of research on mobile devices and applications in higher education settings, ed. laura briz-ponce, juan juanesméndez, and josé francisco garcía-peñalvo (hershey, pa: igi global, 2016), 118–39. 7 jankowska, “identifying university professors’ information needs in the challenging environment of information and communication technologies.” 8 chrzastowski and joseph, “surveying graduate and professional students’ perspectives on library services, facilities and collections at the university of illinois at urbana-champaign.” 9 carole a. 
george et al., “scholarly use of information: graduate students’ information seeking behaviour,” information research 11, no. 4 (2006), http://www.informationr.net/ir/114/paper272.html. 10 kristin hoffman et al., “library research skills: a needs assessment for graduate student workshops,” issues in science and technology librarianship 53 (winter-spring 2008), https://doi.org/10.5062/f48p5xfc; hannah gascho rempel and jeanne davidson, “providing information literacy instruction to graduate students through literature review workshops,” issues in science and technology librarianship 53 (winter-spring 2008), https://doi.org/10.5062/f44x55rg. 11 jankowska, “identifying university professors’ information needs in the challenging environment of information and communication technologies.” 12 ka po lau et al., “educational usage of mobile devices: differences between postgraduate and undergraduate students,” journal of academic librarianship 43, no. 3 (may 2017), 201–8, https://doi.org/10.1016/j.acalib.2017.03.004. 13 noa aharony, “mobile libraries: librarians’ and students’ perspectives,” college & research libraries 75, no. 2 (2014): 202–17, https://doi.org/10.5860/crl12-415. 14 hannah gashco rempel and laurie m. bridges, “that was then, this is now: replacing the mobile-optimized site with responsive design,” information technology and libraries 32, no. 4 (2013): 8–24, https://doi.org/10.6017/ital.v32i4.4636. 15 paula barnett-ellis and charlcie pettway vann, “the library right there in my hand: determining user needs for mobile services at a medium-sized regional university,” southeastern librarian 62, no. 2 (2014): 10–15. https://doi.org/10.1080/19322909.2016.1167649 http://www.informationr.net/ir/11-4/paper272.html http://www.informationr.net/ir/11-4/paper272.html https://doi.org/10.5062/f48p5xfc https://doi.org/10.5062/f44x55rg https://doi.org/10.1016/j.acalib.2017.03.004 https://doi.org/10.5860/crl12-415 https://doi.org/10.6017/ital.v32i4.4636 mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 22 16 william t. caniano and amy catalano, “academic libraries and mobile devices: user and reader preferences,” reference librarian 55, no. 4 (2014), 298–317, https://doi.org/10.1080/02763877.2014.929910. 17 haglund and olsson, “the impact on university libraries of changes in information behavior among academic researchers.” 18 kuruppu and gruber, “understanding the information needs of academic scholars in agricultural and biological sciences.” 19 christine wolff, alisa b. rod, and roger c. schonfeld, “ithaka s+r us faculty survey 2015,” ithaka s+r, april 4, 2016, http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey2015/. 20 m. macedo-rouet et al., “how do scientists select articles in the pubmed database? an empirical study of criteria and strategies,” revue européenne de psychologie appliquée/european review of applied psychology 62, no. 2 (2012): 63–72. 21 rempel and bridges, “that was then, this is now.” 22 ellie bushhousen et al., “smartphone use at a university health science center,” medical reference services quarterly 32, no. 1 (2013): 52–72, https://doi.org/10.1080/02763869.2013.749134. 23 jill t. boruff and dale storie, “mobile devices in medicine: a survey of how medical students, residents, and faculty use smartphones and other mobile devices to find information,” journal of the medical library association 102, no. 1 (2014): 22–30, https://doi.org/10.3163/15365050.102.1.006. 
24 bushhousen et al., “smartphone use at a university health science center”; boruff and storie, “mobile devices in medicine.” 25 eden dahlstrom et al., “ecar study of students and information technology, 2015 ," research report, educause center for analysis and research, 2015, https://library.educause.edu/~/media/files/library/2015/8/ers1510ss.pdf?la=en. 26 ibid., 24. 27 lutishoor salisbury, jozef laincz, and jeremy j. smith, “science and technology undergraduate students’ use of the internet, cell phones and social networking sites to access library information,” issues in science and technology librarianship 69 (spring 2012), https://doi.org/10.5062/f4sb43pd. 28 rempel and bridges, “that was then, this is now.” 29 ibid. https://doi.org/10.1080/02763877.2014.929910 http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/ http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/ https://doi.org/10.1080/02763869.2013.749134 https://doi.org/10.3163/1536-5050.102.1.006 https://doi.org/10.3163/1536-5050.102.1.006 https://library.educause.edu/~/media/files/library/2015/8/ers1510ss.pdf?la=en https://doi.org/10.5062/f4sb43pd information technology and libraries | december 2017 23 30 “mobile/tablet operating system market share,” netmarketshare, march 2017, https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8&qpcustomd=1. 31 boruff and storie, “mobile devices in medicine”; patrick lo et al., “use of smartphones by art and design students for accessing library services and learning,” library hi tech 34, no. 2 (2016): 224–38, https://doi.org/10.1108/lht-02-2016-0015. 32 boruff and storie, “mobile devices in medicine.” 33 dahlstrom et al., “ecar study of students and information technology, 2015.” 34 caroline myrberg and ninna wiberg, “screen vs. paper: what is the difference for reading and learning?” insights 28, no. 2 (2015): 49–54, https://doi.org/10.1629/uksg.236. 35 barnett-ellis and vann, “the library right there in my hand”; haglund and olsson, “the impact on university libraries of changes in information behavior among academic researchers”; hoffman et al., “library research skills”; kuruppu and gruber, “understanding the information needs of academic scholars in agricultural and biological sciences”; lau et al., “educational usage of mobile devices”; macedo-rouet et al., “how do scientists select articles in the pubmed database?” 36 dahlstrom et al., “ecar study of students and information technology, 2015.” https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8&qpcustomd=1 https://doi.org/10.1108/lht-02-2016-0015 https://doi.org/10.1629/uksg.236 abstract introduction literature review methods results and discussion participant demographics and devices used frequency of library site use on mobile devices where researchers are when using mobile devices for library tasks library resources accessed via mobile devices university website use behaviors on mobile devices other research behaviors on mobile devices limitations conclusion references letter from the editor kenneth j. varnum information technology and libraries | december 2018 1 https://doi.org/10.6017/ital.v37i4.10852 as 2018 draws to a close, so does our celebration of information technology and libraries’ 50th anniversary. in the final “ital at 50” column, editorial board member steven bowers takes a look at the 1990s. 
much as for steven, for me this decade was where my career direction and interests crystallized around the then-newfangled "world wide web." taking a look at the topics covered in ital over those ten years, it's clear that plus ça change, plus c'est la même chose: the more things change, the more they stay the same. we were exploring then questions of how the burgeoning internet would allow libraries to provide new services and be more efficient and helpful in improving existing ones. user experience, distributed data and the challenges that causes, who has access to technology and who does not: all topics as vibrant and concerning then as they are now. with the end of our look back at the last 50 years, we are taking the opportunity to start something new in 2019. there will be a new quarterly column, "public libraries leading the way," to highlight a technology-based innovation from a public library perspective. topics we are interested in include the following, but proposals on any other technology topic are welcome. • virtual and augmented reality • artificial intelligence • big data • internet of things • 3-d printing and makerspaces • robotics • drones • geographic information systems and mapping • diversity, equity, and inclusion and technology • privacy and cyber-security • library analytics and data-driven services • anything else related to public libraries and innovations in technology. columns will be in the 1,000-1,500 word range and may include illustrations. these will not be research articles, but are meant to share practical experience with technology development or uses within the library. if you are interested in contributing a column, please submit a brief summary of your idea (https://goo.gl/forms/mcz2kdltiwypsnq43). i'm grateful to the ital editorial board, and especially to ida joiner and laurie willis, for their guidance in shaping this concept. regardless of whether you work in a public, or any other, library, i'm always happy to talk with you about how your experience and knowledge could be published as an article in ital. get in touch with me at varnum@umich.edu. kenneth j. varnum, editor, december 2018. technical communications. reports: library projects and activities. light-pen technology at the university of south carolina - the south carolina circulation system. for some years at the university of south carolina studies have been underway to perfect a computer-based circulation system, and every avenue of input has been explored. about three years ago the new light-pen inventory control system which was being developed for use in the retail trade became known to the library. this device, using a light-readable label about one inch square in size, seemed to be a far better input device for an inventory control system utilizing identification cards and books than the traditional punched card. after as much research as could be done in such a new field, the library staff examined light-pen systems marketed by ncr, checkpoint-plessey, and the monarch marking system, a subsidiary of pitney-bowes. all of these systems were similar in technology but the hardware and interest of the companies varied. monarch and plessey were very interested and willing to cooperate with libraries. plessey (through checkpoint) has developed and is marketing a library circulation system.
at carolina the decision was made to develop an in-house batch system using the monarch light-pen and its technology coupled to a digital equipment corporation pdp-11/10, 16k, dual dectape system. the basis for the functioning of the system is the light-pen stations at the circulation desk where the books are charged and discharged by running the light-pen rapidly over the light-readable label on the patron's id card and the label in the front inside cover of the book. one quick pass over the id card sets the system, followed by a pass over each book label in succession. at present the library has three light-pen stations and anticipates adding a fourth, which will be sufficient for meeting all of the main library's needs for the foreseeable future. the light-pen control boxes were constructed on campus. they turn the system on, and have message lights, trouble lights, and charge points for up to five different due dates. any number of light-pen stations can be attached. a hazeltine 2000 crt is used to show all transactions on the screen as they occur and to serve as the system console. a decwriter, used as a line printer, insures a backup system and gives a printout of transactions. the decwriter was selected because its thirty cps speed is adequate, it is highly reliable, and the price is right. the pdp-11 is used as a batch controller. it does not convert the label data to human-readable data; this is done at the central computer center. each night after the circulation desk closes, a telephone link is made to the university's central computer and the day's scanned data are pumped into the big computer. while the system is a batch design, it incorporates the features of an on-line system without the high cost. if a patron inquires about an item, a glance at the updated patron report and/or an inquiry into the current activity file through the system console can answer questions on the location of all books in the system. unlike some systems, this one has not required that the library change its hours of operation or data input. when the library opens in the morning, all reports are distributed. the library gets its in-process, or charge, file, in clear text without borrower information for use by patrons to see which books are charged out. complete circulation files with borrower information are furnished to the circulation staff on com fiche. a total of 10,335 charge records is on each four-by-six-inch fiche. the record includes patron name, status, and social security number, call number of the book, item number, date checked out, date due, author, and title. in addition, there is a field for charging to graduate carrels within the library. notices are periodically written for overdue books and there is an indication in the charge file showing how many notices have been sent. personal reserves or holds can be placed on a book by simply keying the book number into the crt. when a book is returned, a message light on the controller lights so that the staff member will know that the book has a hold on it. similar procedures will put a "hold" on any borrower whom the library needs to reach. the printouts are generally by call number; however, it is possible to get lists of all books by borrower, or in other formats. statistics are obtainable in almost any configuration, including number of books checked out to different categories of borrowers, by individual title, etc.
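to make the nightly batch step concrete, here is a loose modern sketch of the update logic in python rather than the pdp-11 code the library actually ran; the transaction file layout (one row per light-pen scan pair, flagged as a charge or a discharge) and every field name in it are invented for illustration, not taken from the south carolina system.

```python
import csv
from datetime import date

# hypothetical day file produced at the light-pen stations, with columns:
# action (charge/discharge), patron_id, item_label, due_date
def apply_transactions(charge_file: dict, day_file: str, holds: set) -> list:
    """update the in-process (charge) file from one day's scans and
    report any returned items that carry a hold."""
    flagged = []
    with open(day_file, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            item = row["item_label"]
            if row["action"] == "charge":
                charge_file[item] = {
                    "patron_id": row["patron_id"],
                    "date_out": date.today().isoformat(),
                    "date_due": row["due_date"],
                    "notices_sent": 0,
                }
            else:  # discharge: drop the record, flag items another reader is waiting for
                charge_file.pop(item, None)
                if item in holds:
                    flagged.append(item)
    return flagged

# example usage with invented values:
# charges = {}; holds = {"QA76.9 .D3 T66"}
# print(apply_transactions(charges, "scans-1974-01-15.csv", holds))
```

the point of the sketch is only that the core of such a batch run is a small, deterministic update over the day's scans; the heavy lifting in 1974 was the hardware, the label conversion, and the report formatting done at the central computer center.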
maintenance contracts, telephone charges, and miscellaneous operational costs add up to about $357.04 monthly. labels cost $1.70 per thousand. the total cost of the system amortized over a five-year period is no more than $975.00 per month. after that the only continuing costs will be maintenance contracts and labels. additional light-pen stations can be added in the same building for about $1,200 each. light-pen stations can be added in branch libraries for about $2,400 each. not included in the cost figures is central computer time, which is held to a minimum by the batch features and software development. at the university of south carolina computer services are not charged to the individual department but are treated as a campus-wide service. all of the com fiche is produced on the campus. the entire project was developed and put into operation between april 1973 and january 1974. the first books were officially charged out on january 15, 1974, and the system has been in continuous operation since then. needless to say, problems of a special nature had to be solved, for example, issuing id cards with light-readable labels to 20,000 students in two days, acquiring labels which are permanent for the books, constructing the control boxes, which are unique, and solving the usual telephone and computer difficulties. suffice it to say, the entire project was planned and became operational in less than eight months without employing additional staff. although the system is still being refined, the performance has been spectacular. - kenneth f. toombs, director of libraries, university of south carolina.

public access cable tv information center at new york public library
the new york public library (nypl), recognizing public access cable television as an important social tool, has assembled in the mid-manhattan library one of the most extensive collections, and possibly the first, on the subject. noncommercial public access television has been characterized as a people-oriented television system that can respond to and reflect society in terms of culture, language, history, experience, and race. the collection is designed for readers seeking information on all aspects of public access cable tv, both practical and theoretical, with a significant portion devoted to television as a community tool. the mid-manhattan collection includes books, pamphlets, periodicals, and microforms. related materials are also collected to document television activities of programming intentions similar to public access tv. it is hoped that the mid-manhattan project will provide a prototype for other libraries beginning collections of public access cable tv. the collection's book materials emphasize three main areas of interest: programming and the audience, the educational potential of public access television, and legislative controls. pamphlet materials include information on ethnic involvement, women's groups, conferences and conventions, library activities in video, bibliographies, and other current topics, and are accessible through a vertical file index. the subject headings in the file reflect the "rule of probable association," whereby the first meaningful word (after the assumed word television) is used. if the reader knows what he wants, aided by the cross-reference system, he can easily identify the proper subject file.
the collection also includes periodical indexes leading to a wide variety of journal articles on different aspects of video. the eric (educational resources information center) microfiche series is available from 1971 to date and includes much published and unpublished research. a special feature of the collection is the card file which lists hard-to-find information concerning organizations, associations, and periodicals in the field. such commercial groups as video cassette manufacturers as well as alternative groups making tapes can all be located in the file. contact: richard hecht, history and social science department, mid-manhattan library, 8 e. 40th st., new york, ny 10018. reports on regional projects and activities sal/net-satellite libmry information network this project is designed to experiment in the extension of library services to sparsely populated regions of the rocky mountain and northern plains states. the project has been awarded "designated user" status on the communications technology satellite to be launched by nasa in late 1975. salinet is one of the first attempts to experiment in delivering library services via satellite. a group meeting in denver described plans to use the world's most powerful communications satellite as an extension of local library resources for residents of twelve mountain and plains states. the national space agency, the multistate federation of rocky mountain states, and several library oriented groups and agencies serving the area will pool their expertise and resources in the program, vvhich will begin planning late this year. the library information and development program is a new passenger on the educational satellite which will demonstrate new means of helping to teach residents of far-flung portions of the rocky mountain states and assist them in their information needs during a hvo-year period beginning next fall. four interests are represented in the library oriented project, which bears the acronym of salinet -satellite library information network. the university of denver graduate school of librarianship, the university of kansas libraries, the wyoming state library, and natrona county (wyoming) library are the principals in the consortium. each institution is responsible for certain portions of the library program, which will benefit both libraries and their patrons in the mountain and plains states. dr. margaret knox goggin, dean of the du graduate school of librarianship, is principal investigator on the library program. her co-workers representing other members of the consortium include kenneth e. dowlin, director of natrona county library, casper, wyoming; william williams, wyoming state librarian; and robert malinowsky, assistant director for public service, education, and statistics at the university of kansas libraries. also taking part in the salinet program are the bibliographical center for research, rocky mountain region, inc.; the federation of rocky mountain states; and the mountain-plains library association. these groups will assist with programming, broadcast, and engineering requirements, utilization, and research. the proposed program will utilize fiftysix satellite ground stations which will be in place as part of the federation of rocky mountain states' satellite technology demonstrations. twenty participating libraries in the states of north dakota, south dakota, nebraska, and kansas will 230 1 ournal of library automation vol. 
7 i 3 september 197 4 the student as "physician" has selected in managing the case. the learning center offers this and other kinds of audiovisual programs designed to enhance textbook and classroom learning. computers, video cassettes, slide projectors, and models enable the student to experience close up and at his or her own speed areas of medicine that often cannot he presented as well in lectures or textbooks. in addition to the audiovisual materials, the learning center provides students with periodicals, lecture notes, and reference texts. students can watch dissections, see examples of blood cell abnormalities, hear the sounds of healthy and defective heartbeats, and examine oversize plastic models of the brain, the heart, and other parts of the human anatomy. and, with the computer learning programs, the students can participate in a case and make choices to guide its outcome. medical students learning about iron metabolism last year were split into two groups by their instructor so that half attended traditional classroom lectures and half learned the unit from the computer. final examinations showed no difference between the two groups, according to their teacher, dr. james mcarthur, formerly associate professor of medicine. the students preferred human teachers in small group tutorial sections for this unit, mcarthur said, but generally they were in favor of computer instruction. mcarthur, now assistant director of the health sciences learning resources center at the university of washington in seattle, said that the crowd of students usually found at the learning center is an indication of its success. announcements resolution whereas, the american library association is the chief advocate for librarians and laymen seeking to provide citizens of the united states with the highest quality library and information service, and whereas, a major effort will be required of this association and of all supporters of libraries in the next few years as the country's leaders determine longrange national positions in such matters as copyright, intellectual freedom, federal support of libraries, and a national plan for libraries and information services, and whereas, the effectiveness of this effort will depend on the concerted effort of all those concerned with library service, including library users, citizens groups, government officials and librarians themselves from all aspects and ranks of the profession; therefore let it be resolved that all the committees, chapters and divisions of the american library association take definite steps to increase mutual efforts within the association and with other associations seeking ways to strengthen the common effort toward the provision of quality library service to all people. and let it be further resolved that chapter councilors, division officers, the legislation assembly and chairpersons of committees and round tables, affiliated organizations and related groups transmit this resolution to members of their respective units. adopted by the ala legislation committee on july 9, 1974. isad p1·esident receives awar.d frederick g. kilgour, director of the ohio college library center, received the margaret mann citation for 1974, on july 9 at the program meeting of the resources and technical services division of the american library association, during the annual conference of ala in new york city, the week of july 7-13, 197 4. the award recognizes outstanding professional achievement in the areas of cataloging or classification. mr. 
kilgour received his a.b. from harvard and studied library service at columbia while working at the harvard college library. he worked at the office of strategic services in washington, d.c. and later became deputy director of the 232 journal of library automation vol. 7 i 3 september 197 4 loo, ontario) and edwin buchinski universite d'ottawa): canadian marc, canadian cataloging task group, union lists of serials, prospects for cooperation, unique cataloging problems (e.g., dual language requirements) , large serial data bases. registration will be $70.00 to members of either ala or asis, $85.00 to nonmembers, and $20.00 to library school students. registration includes one lunch, a reception, and a copy of the marc serials manual. for hotel reservation information and a registration blank, write to donald p. hammer, isad, american library association, 50 e. huron st., chicago, il 60611. washington university school of medicine library-book catalog the washington university school of medicine library, st. louis, announces the publication of its catalog of books 1970-1973, containing all entries for monographs cataloged at this library from january 1, 1970 to december 31, 1973. the first part, the register, consists of the complete citations arranged in the order cataloged. the second part consists of three alphabetical indexes to the register -name, title and series, and subject indexes. the catalog is on thirty microfiche, 24x reduction, and the price is $15.00. orders can be filled or additional information obtained from doris bole£, assistant librarian for technical and informational services. proceedings of info1·matics/ ucla symposium available on tapes high fidelity recordings of proceedings from the annual data processing symposium held march 27-29, 1974, at the university of california, los angeles, are now available on cassette tapes. the subject of this year's conference, cosponsored by informatics inc., los angeles, and ucla, was "information systems and networks: the new world of information retrieval available to your organization through computer networks." the complete program, recorded by convention seminar cassettes, north hoilywood, can be ordered by session or in total for review whenever convenient. cassettes one and two cover session one, the evolution of interactive information systems; cassettes three and four include session two, data bases; cassettes five and six, session three, on-line information retrieval systems; cassettes seven and eight, session four, cost effectiveness of information retrieval systems and networks; and cassettes nine and ten, session five, information networks in the 1980s. each set of two cassettes covering one session is priced at $10.95. the entire series can be purchased for $49.95 in an easy-to-store cassette album file. prices include postage and handling. to order, contact convention seminar cassettes, 13356 sherman way, north hollywood, ca 91605; tel: (213) 765-2777. nonprint media institute a nonprint media institute will be held in galveston, texas on october 15, 1974, southwestern library association's annual conference registration day. the one-day institute, sponsored by swla, will feature morning speakers, including pearce grove discussing progress in resolving differences among three cataloging standards for nonprint media, and vivian schrader, head of the a v section of library of congress, reporting on the progress of lc's nonprint cataloging standards. 
afternoon informal discussion forums will focus on technical service handling of art prints, microforms, films, kits, phonorecords, and audiotape. the nonprint media institute is open to members and nonmembers of swla, but is limited to 150 registrants. registration fee is $20.00. for registration, hotel reservations, and transportation information, write: ann adams, head cataloger, houston public library, 500 mckinney, houston, tx 77002. international standm·ds for cataloging: an institute on isbd, issn, nsdp and chapter 6, aacr the seventh annual institute of the li234 journal of library automation vol. 7/3 september 1974 zurkowski, president, information industry association, 4720 montgomery lane, bethesda, md 20014; tel: (301) 6544150. commercial services and developments 10,000 computer program abstracts in ncpas data base the national computer program abstract service ( n cp as) , a clearinghouse for computer program abstracts, has categorized over 10,000 abstracts into 142 subject areas in its latest newsletter. these abstracts of simulation models, application and computational programs, and information retrieval systems are derived from business, government, industry, military, and universities. all fields of knowledge are included and are grouped into the following general categories: biosciences, medical sciences, business, manufacturing, management, education, libraries, environment, ecology, nature, government (federal, state, local), urban affairs, legal, humanities, specific industries, publfc utilities, military, science, and engineering. this service should be of value to a present or potential user of computer programs, a vendor with a program to sell, or a professor developing programs in the academic community. programs can be listed in the data base free of charge. the service is problem oriented. the program abstract information is disseminated in two forms: ( 1) a program index newsletter which includes a detailed index of the available subjects and the number of abstracts available for each subject (updated quarterly) -the newsletter cost is $10.00 per year ( $5.00 additional for foreign airmail) ; and ( 2) a sub;ect abstract report which includes all the abstracts available in the ncp as data base on a particular subject identi'fied in the progmm index newsletter-the abstract report cost is $10.00 for the rrst 200 abstracts and $5.00 for up to each additional 200 abstracts ( $5.00 additional for foreign airmail). for additional information contact: ncpas, p.o. box 3783, washington, dc 20007. communications by telephone a lfghtweight communications system about the size of a suitcase is now being introduced that can take the travel-and the cost-out of meetings. the new solid state system is the darome edu-com, a portable self-contained communications unit with four microphones, that uses regular telephone lines. edu-com enables groups of people in different places across the country to confer together as easily as if they were all in the same room. the cost of a onehour meeting with participants located coast-to-coast is a few hundred dollars. manufactured by darome, inc., of harvard, illinois, makers of modular sound systems equipment, the edu-com unit plugs into an inexpensive, standard telephone coupler, a device supplied by the telephone company. the number of locations that can be included in a darome edu-com conference is practically unlimited. to participate, each location need only be equipped with a darome unit and a telephone coupler. 
then, rather than having just one meeting at a time, it is possible to hold any number of meetings in any number of places at the same time. before an edu-com session begins, the organizer of the meeting telephones a special conference call telephone operator and gives the names, locations, and telephone numbers of the groups to be reached and the time of the meeting. charges begin only when all locations have been tied together by the operator and the conference is ready to start. the rate for the darome edu-com meetings is much lower than for direct dialing the places individually. the rate is equal to the cost of calling only the farthest city participating in the conference. for example, a one-hour meeting that originates in chicago and includes groups of people who participate in new york, newark, huntington, greensboro, atlanta, orlando, detroit, denver, and san diego would cost $280. 236 journal of library automation vol. 7/3 september 1974 the wall street journal and banon's magazine. anthony a. barnett, senior vice-president of bunker ramo, said test installations in five stockbrokerage firms over the past month "have allowed us to shake down the system and prepare for nationwide marketing." mr. barnett said fifty of the news retrieval systems were sold before formal introduction. "we're encouraged by the marketing prospects among stockbrokerage firms and financial institutions but believe the market among corporations may have even greater potential," he added. the news retrieval system permits instantaneous recall of stories on 6,000 companies listed on the new york and american stock exchanges and traded over-thecounter. users also are able to retrieve news of twenty-five industry groups, fifteen government agencies, and several general categories. mr. barnett said that at the outset customers will be able to recall from the ffie any story that has appeared in the last three months. dj news-recall was developed as a joint venture by bunker ramo and dow jones, which publishes the wall stmet joumal, barron's, and the dow jones news service. the joint venture, dow jones-bunker ramo news retrieval service inc., in turn will market the data base to distributors for resale. bunker ramo's information systems division is the charter distributor for dj news-recall. mr. barnett said the basic charge for dj news-recall to users of bunker ramo's system 7 will be $175 a month per office ·plus $25 for each video terminal having access to the news retrieval service. on-line access to the compendex data base engineering index, inc., announces the availability of its computer-readable data base, compendex, through on-line access. two organizations are currently providing this di'rect mode of bibliographic search: lockheed information systems and system development corporation. using the latest in data communications services, users requiring access to the compendex files may interact with the system via their own in-house terminal, thus providing the convenience and speed of "on-demand" searches. compendex is the machine-readable version of ei's monthly and provides abstracts/bibliographic citations covering worldwide developments in all fields of engineering. both s.d.c. and lockheed, utilizing the most modern system technology, afford the user the opportunity to maintain an actual "dialog" with major bases. this is done without imposing an overly complicated or difficult command language on those addressing the system. 
on-line access now adds a new dimension to those requiring searches of the ei data base. for further information interested individuals and organizations may contact: lockheed information service, 3251 hanover st., palo alto, ca 94304; tel: ( 415) 493-44ll (east coast office: lockheed-405 lexington ave., new york, ny 10017; tel: (212) 697-7171), or s.d.c. search service, system development corporation, 2500 colorado ave., santa monica, ca 90406; tel: (213) 393-94ll (east coast office: s.d.c.5827 columbia pike, falls church, va 22041; tel: ( 703) 820-2220) . standards the isad committee on technical standards for library automation invites your participation in the standards game editor's note: use of the following guidelines and forms is described in the mticle by john kountz in the june 1974 issue of jola. the tesla reactor ballot will also appear in subsequent issues of technical communications for mader use, and the tesla standards sc01'eboard will be presented as cumulated results warrant its publication. to use, photocopy or otherwise duplicate the forms presented in jolatc, fill out these copies, and mail them be added to complete a twelve-state test bed representing all categories of libraries. with the involvement of all these points, half of which will be in two-way communication with other points via the satellite, the library information project hopes to accomplish three primary goals: 1. improving individual and organization capacities for getting information. 2. demonstrating and testing cost effectiveness in using technological advances to disseminate information. 3. developing user "markets" for information utilizing satellite distribution. the program will try to help individual users of information and community-level groups such as governmental agencies, businesses, and other organizations. on a regional level, bibliographic information will be transmitted to libraries in a "compressed data format." with such a fonnat, a library in a remote area of north dakota may have access to most needed information about resources available from large and specialized centers, such as the denver public library's special conservation library or western history collection. the proposed satellite information program will also be used to train librarians, both at a professional and paraprofessional level. the in-service program will be aimed at helping librarians to better assist their patrons in getting information. all these major aspects-public information programming at the individual level, technology dissemination at the community level, compressed bibliographical data transmission, and in-service training-will be accomplished in a total of fifty hours per year of programming, reports dr. william e. rapp, vice-president of the federation of rocky mountain states. the limited time available for this programming in coordination with other programs planned for the satellite project place a premium on solid advance preparation of material to be transmitted, and speed of transmission, he notes. for example, the transmission of the technical communications 229 compressed bibliographical data would be in twoto three-minute segments at the end of other programming. technology dissemination, a community-level program, would be handled in a total of fifteen hours of satellfte use a year-an average of fifteen to twenty minutes per week. the largest segment of time, for inservice training of librarians, is twenty hours per year-which breaks down to less than half an hour a week on the average. 
but if the available time on the satellite is used to its full potential, dean goggin belfeves the population of the entire rocky mountain and plains region will benefit tremendously. the combined resources of major libraries and two major universities could be shared instantly with communities and residents of the region. new horizons cu1'e by compute1'-a learning expe1'ience it has been a long day for the physician, and at 10:30 p.m. he is getting ready to go horne and have his first meal since breakfast. but the phone rings and the caller from university of minnesota hospital tells him that an infant has just been brought in from outstate minnesota for diagnosis. the baby's hometown physician has noted that the baby hasn't eaten well for several days and he can't decide what's wrong. the case apparently isn't urgent, so the physician can either go for a bite to eat and return later or go straight to the hospital to look at the baby. this is the first of a series of choices presented to medical students in this imaginary case history. it's offered as a computerized learning program for medical and health sciences students at the university of minnesota's learning center. for a student playing the part of the physician in this case, each choice he or she makes presents new difficulties in the case--which call for more choices. ultimately, the imaginary infant either dies or survives, depending on what options i i ' i i i. office of intelligence collection at the department of state fn washington. he was librarian at the yale medical library and then associate librarian for research and development of the yale university library. he has been active in library and library-related organizations since the beginning of. his career and has served on many committees. he was managing editor of the yale journal of biology and medicine; and has written numerous articles for professional journals. his professional interests are computerization of libraries and information retrieval. the text of the citation reads in part: " ... awarded in 1974 to frederick g. kilgour for his success in organizing and putting into operation the first practical centralized computer bibli:ographical center. he has been the principal inr.uence behind an emerging trend toward cooperation in technical services. . . . as director of the ohio college library center he has made the library of congress marc data base a practical and useful product, stimulating i'nterest throughout the country and the profession .... his tireless efforts represent an outstanding contribution to the technical improvements of cataloging and classification and the introduction of new techniques of recognized importance." institute on automated serials contml the information science and automation division (isad) of the american library association and the american society for information science will cosponsor a preconference institute on "automated serials control: national and international considerations." the institute will be held on october 11 and 12, 1974 in atlanta, georgia immediately before the asis annual conference, which begins on october 13. the institute and the conference will both be held in the atlanta regency hyatt house. 
the purposes of the institute will be: ( 1) to present in-depth discussions on the new and dramatic developments in fhe serials field and their implications for the library and the library systems development communities; .and ( 2) to provide a technical communications 231 survey of the progress made to date in automated serials systems. formal presentations by acknowledged experts actively involved in the field will be provided, and ample opportunity will be available for informal discussion between the participants and a panel of concerned professionals. the panel will represent various views related to current and future developments as well as the national and international consequences. among other things, the program will include the following: elaine wood (lc marc development office) : marc serials format and serials processing at lc-a tutorial. in addition, this session will include discussion on national and international standards, and will emphasize the difference between marc serial and marc monograph formats. joseph howard (lc serials record division) : cataloging considerations. the proposed changes to the cataloging rules -isbd, aacr, various points concerning entry and other unique cataloging problems. henriette a vram and lucia rather (lc marc development office): international considerations. prospects for international exchange of data, mechanisms for exchange, problems posed by differing practices and conventions, international developments in machine-readable cataloging. paul fasana (new york public library) : impact of national developments and of automation on library services. general consideration of automation's impact on library services with emphasis on serials control. recommendations concerning national developments. linda crismond (university of southern california): review of serials systems and system considerations. acquisitions, cataloging, check-in, claiming, etc. problems posed by holdings notations, volatility of data, linking entries, etc. lois upham (university of minnesota): conser (consolidated serials project). joseph price (nsdp): international serials data system and nsdp. cynthia pugsley (university of waterbrary institutes planning committee will be held october 18-19, 1974, at rickey's hyatt house hotel, palo alto, california. paul w. winkler, principal descriptive cataloger, library of congress, will speak on the application of the international standard bibliographic description to monographs and on related topics. the establi:shment of bibliographic control of serials through international standard serial numbers, chapter 6 of the angloame1'ican cataloging rules, and the national serials data program will be presented by richard anable, coordinator, conser project. the program is designed to be of particular interest to technical services librarians, serials librarians, bibliographers, and administrators. registration for the two-day meeting is limited; the fee is $20.00 and includes two luncheons. further information, including a list of hotel accommodations, will be mailed to applicants. registrants of the 1972 and 1973 institutes will automatically receive registration forms. others may obtain forms by writing joseph e. ryus, 2858 oxford ave., richmond, ca 94806, or by telephoning him during weekday hours at the university of california, berkeley, ( 415) 642-4144. all registration forms will be mailed early in september. 
the library institutes planning committee is a nonprofit organization composed of eight librarians from county, special, and university libraries in northern california. previous institutes have featured ralph ellsworth, j. mcree elrod, seymour lubetzky, ellsworth mason, daniel melcher, john c. rather, joseph a. rosenthal, and paul w. winkler. info1'mation industry associations expanded micropublishing and data base programs major policy steps have been taken by the information industry association ( iia), making the work of the associatirm more understandable and more relevant to information industry companies. it changed the title of the government micropublishing committee to micropublishing and it directed the establishment technical communications 233 of a data base committee. "regardless of the media information companies and other publishers currently use in delivering information," iia presi'dent, paul g. zurkowski, said, "competition and rising costs are forcing them into consideration of alternative methods. iia member companies will be able to focus their energies most effectively on the industry-wide problems through these new committees." micropublishing committee chairman, henry powell, bell & howell, bethesda, at a recent meeting of the committee spelled out several areas of concern to micropublishers which will be the subject of committee action: 1. how can micropublishers protect their investment from unfair competition of unscrupulous competitors who misappropriate the micropublishers' work product and market essentially "reprinted" versions of the original microfilm. 2. library relations. joint library-industry steps toward mutual understanding and cooperation. 3. z-39 standards committee recommended standards covering what micropublishers can say about their products. what is a volume equivalent in microform? what information should be included on each mfcrofiche and where on the header or title section of the microfilm product? 4. a program to educate users as to the operational benefits of micropublished materials. the data base committee is being formed with the participation of both data base creating companies and those offering public access to various data bases. the area of interest to this committee will embrace the status of data bases under existing proprietary rights laws, communications capabilities and rates as controlled by the fcc, unfair competition legislation pending in congress, and such other problems as those created for the industry by university computer centers marketing access to similar data bases, but without full cost recovery. for further information contact paul g. simply by pressing the lever on one of the edu-com microphones and speaking into it, anyone in any of those cities will be heard in the meeting i'n every other city as if he were right there in the room. in addition, an automatic slide projector can also be plugged into the unit. during a presentation, the speaker can change the slides simultaneously in all the locations equipped with slides. a cassette player-recorder can be plugged into the darome edu-com unit, either to provide the program or record the session. vvhen used alone, the darome educom can also serve as a public address system for a single meeting. for further information, contact darome, inc., 711 e. diggins st., harvard, il 60033. automated news clipping, indexing and retrieval system (ancirs) image systems, inc. 
of culver city, california, has developed an automated system for the indexing and retrieval of news clippings. while ancirs (pronounced answers) is geared for use in the newspaper library, terminals located at remote sites provide access to the system for business, industry, education, and law enforcement and other government agencies. the microfiche terminals, whfch are controlled by a minicomputer, are each capable of storing 325,000 clippings and 1 million lines of index and search terms. ancirs has a capacity in excess of 1.25 million listings. access to a page of index hstings or to the full text of any clipping requires less than four seconds. paper copy of any or all of the selected clippings can be produced at the terminal at the touch of a button. multiple terminals can share the same minicomputer. a unique off-line/ on-line indexing system generates subject term lists from story headlines and other key words, names, and places in the story as selected by the indexer. when indexing a story, the indexer keys in the first letters of the subject terms to be assigned. this causes the terms currently in use to be displayed at the terminal, allowing the indexer to automatically assign the appropriate terms to the story being indexed. if the term is technical communications 235 not already in use it may be entered by completing the typing of the term. after new terms have been entered and old terms assigned, a magnetic tape is produced for the off-line program. the off-line program prepares three lists for computer output to microfiche: 1. headlines permuted by the key words in each headline interleaved with subject terms selected from the stories. 2. a category list by classification and subclass. 3. each story headline in date order. to perform a search, the user keys i'n the first few letters of the search term. this causes the appropriate portion of list one to be displayed. each item on the page has a line number. keying the line number ( s) selects the desired term ( s) and causes the most recent clipping to be displayed. if the selected term is too general, i.e., a category heading, the appropriate portion of hst two is automatically displayed so that a more precise selection may be made. the selection in this instance is also made by entering the line number(s) of the desired term(s). these selections may be combined logically with other selections to further narrow the search. once the terms have been selected and the most recent story i's displayed it is then possible to page back through previous related stories. the hard copy of any story can be requested at any time and is produced by the unit in ten seconds. ancirs' low cost makes it the ideal tool for researchers and decision makers who must have at their fingertips complete facts on world, national, and local events. machine-readable data bases news retrieval service bunker ramo corporation and dow jones & co. inc. announced the start of dj news-recall, a computerized news retrieval service based on stories appearing on the dow jones news service and in to the tesla chairman, mr. john c. kountz, associate for librmy automation, of/ice of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036. the procedure this procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ala) standards proposals to provide recommendations to ala's representatives on existing, recognized standards organizations. 
to enter the procedure for an initiative standards proposal you must complete an "initiative standards proposal" using the outline which follows: initiative standard proposal outlinethe following outline and forms are designed to facilitate review by both the isad committee on technical standards for library automation (tesla) and the membership of initiative standards requirements and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable). note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced on 8%" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number. i. title of initiative standard proposal (title). technical communications 237 ii. initiator information (forward). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process. iv. purpose. state the purpose of standard proposal (scope and qualifications) . v. description. briefly describe the standard proposal (specification of the standard). vi. relationship of other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks). vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification). viii. specifications. specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required in addition to text exposition where applicable (specification of the standard). kindly note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review. in addition, the outline permits the presentation of background and descriptive information which, while important during any evalu238 journal of libmry automation vol. 7 i 3 september 197 4 ation, is a prerequisite to the development of a standard. the reactor ballot is to be used by members to voice their recommendations relative to initiative standards proposals. the reactor ballot permits both "for" and "against" votes to be explained, permitting the capture of additional information · which is necessary to document and communicate formal standards proposals to standards organizations outside of the american library association. 
tesla reactor ballot
reactor information: name, title, organization, address, city, state, zip, telephone.
identification number for standard requirement: ___
for / against
reason for position: (use additional pages if required)
as you, the members, use the outline to present your standards proposals, tesla will publish them in jola-tc and solicit membership reaction via the reactor ballot. throughout the process tesla will ensure that standards proposals are drawn to the attention of the applicable american library association division or committee. thus, internal review usually will proceed concurrently with membership review. from the review and the reactor ballot tesla will prepare a "majority recommendation" and a "minority report" on each standards proposal. the majority recommendation and minority report so developed will then be transmitted to the originator, and to the official american library association representative on the appropriate standards organization, where it should prove a source of guidance as official votes are cast. in addition, the status of each standards proposal will be reported by tesla in jola-tc via the standards scoreboard. the committee (tesla) itself will be nonpartisan with regard to the proposals handled by it. however, the committee does reserve the right to reject proposals which after review are not found to relate to library automation.

an invitation from tesla
during the formative period of tesla the list of potential standards areas for library automation, below, was developed. you are invited to review the list below and voice your opinion of any or all areas indicated by means of the reactor ballot. or, if you have a requirement for a standard not included in this list, use the initiative standard proposal outline to collect and present your thoughts.
potential technical standards areas:
1. codes for library and library network, including network hierarchy structures.
2. documentation for systems design, development, implementation, operation, and postimplementation review.
3. minimum display requirements for library crts, keyboards for terminals, and machine-readable character or code set to be used as label printed in book.
4. patron or user badge physical dimension(s) and minimum data elements.
5. book catalog layout (physical and minimum data elements): a. off-line print, b. photocomposed, c. microform.
6. communication formats for inventory control (absorptive of interlibrary loan and local circulation).
7. data element dictionary content, format, and minimum vocabulary, and inventory identification minimum content.
8. inventory labels or identifiers (punched cards, labels, badges, or ...) physical dimensions and minimum data elements.
9. model/minimum specifications relating to hardware, software, and services procurement for library applications.
10. communications formats for library material procurement (absorptive of order, bid, invoice, and related follow-up).

input to the editor: i have reviewed mr. joe rosenthal's incisive survey of the marc types which appear to be eminent. unfortunately, in his studies he seems to have overlooked the one marc type which will pose the greatest problem and relates to "unmarced" books: nonmarc, the grand universe of records which yet remain to be placed in this most noble of formats, and its international counterpart, originating in holland, nedermarc, which stems from "i nedermarc, you nedermarc, all them systems nedermarc ...." hopefully, someone will solve the problem explicit in these forms of marc. as a result, until someone solves this problem, we are all without marc. john kountz, associate for library automation, california state university and colleges

managing in-library use data: putting a web geographic information systems platform through its paces
bruce godfrey and rick stoddart
information technology and libraries | june 2018
bruce godfrey (bgodfrey@uidaho.edu) is gis librarian and rick stoddart (rstoddart@uidaho.edu) is education librarian at the university of idaho library.

abstract
web geographic information system (gis) platforms have matured to a point where they offer attractive capabilities for collecting, analyzing, sharing, and visualizing in-library use data for space-assessment initiatives. as these platforms continue to evolve, it is reasonable to conclude that enhancements to these platforms will not only offer librarians more opportunities to collect in-library use data to inform the use of physical space in their buildings, but also that they will potentially provide opportunities to more easily share database schemas for defining learning spaces and observations associated with those spaces. this article proposes using web gis, as opposed to traditional desktop gis, as an approach for collecting, managing, documenting, analyzing, visualizing, and sharing in-library use data and goes on to highlight the process for utilizing the esri arcgis online platform for a pilot project by an academic library for this purpose.

introduction
a geographic information system (gis) is a computer program for working with geographic data. a gis is an ideal tool for capturing data about library learning spaces because they can be described by a geographic area. the learning spaces might be small or large, irregularly shaped or symmetrical; either way, the shape can be described by a set of geographic coordinates. tools for storing, managing, documenting, analyzing, and visualizing geographic data can all be found in a gis. the locations and shapes of geographic features (such as library learning spaces) as well as attributes of those features (such as the type of learning space) can be captured in a gis. the roots of giss stretch back to the 1960s. goodchild characterizes giss' advances in spatial analysis during the 1970s and the growth of gis in the 1980s, coinciding with the proliferation and affordability of desktop computers.1 the enhancement of gis software from desktop computer applications to online platforms has been underway for some time. the origins of web gis can be traced back to the 1990s, but it is only since the mid-2000s that products have really matured to a point where they can be viable alternatives to their desktop counterparts. web gis first appeared in 1993, when xerox corporation's palo alto research center created an online map viewer.2 their map viewer, running in a web browser, was the first demonstration of performing gis tasks without gis software installed on a local computer. even though this early web-based gis application had limited capabilities, the potential of performing gis operations from computers anywhere and anytime was recognized. the possible capabilities of web gis began to be more fully discussed in the mid-1990s.3 web gis software became available in earnest in 1996 as gis companies began releasing commercial offerings.4 the first two decades of this century have seen web gis explode in functionality and scope to become an integral part of most giss.
in late 2012, a collaborative mapping platform hosted by esri (environmental systems research institute) named arcgis online (https://www.arcgis.com/) was released. esri is a gis software company that was founded in 1969, and its products are used by more than 7,000 colleges and universities across the globe.5 the collaborative platform enables users to create, manage, analyze, store, and share maps, applications, and data on the internet. gis software continues to evolve from desktop computer programs to specialized software applications (i.e., apps) that are part of a web-focused platform. this transformation is profoundly growing the accessibility of the technology to a broader array of users. what was once a technology reserved for geographic information professionals because of its complexity and cost has now been streamlined and put in the hands of nonprofessionals who want to take advantage of its many possibilities. it is no longer reserved for academic disciplines such as geographic information science and remote sensing science; instead, gis has seen its use grow in the humanities and social sciences to the point where libraries are developing targeted services for these disciplines.6 professionals are afforded the ability to share their data more easily, and nonprofessionals are able to utilize those data to create information and knowledge more easily. this transformation bodes well for libraries because it lowers technological hurdles that might have precluded the technology's use for space-assessment and other place-based initiatives in the past. now that software-as-a-service (saas) mapping platforms such as mango, gis cloud, and arcgis online enable users to access capabilities over the internet, there is no server software for users to install or licensing to configure. additionally, the training required by personnel to gather, utilize, and manage data has been greatly reduced compared to that required by desktop predecessors. academic libraries, and libraries in general, stand to gain from this evolution.

the use of desktop gis for space assessment
the value of space-planning efforts in libraries and the observational methods employed to conduct such activities have been well articulated in library research. the use of desktop gis as a tool for collecting in-library use data in academic libraries has been present for more than a decade. bishop and mandel show that libraries' use of gis falls into two broad categories, analyzing service area populations and facilities management, the latter of which encompasses "in-library use and occupancy of library study space."7 work related to the use of gis to study library-patron spaces is discussed below. in the past twenty years, academic libraries have seen many transformations in their roles on college and university campuses. gis technologies have helped document and respond to those transformations. xia outlined the value of using gis as a tool for space management in academic libraries more than a decade ago because of its "capacity for analyzing spatial data and interactive information."8 in one study, xia describes using esri arcview 3.x desktop software for library space management. arcview was esri's first gis software to have a graphical user interface; predecessors had command-line interfaces.
xia mentions the use of, at that time, the emerging arcgis product, which went on to replace arcview 3.x. gis proved to be a valuable tool for xia to track the spatial distribution of books in the library environment.9 xia went on to measure and visualize the occupancy of study space using arcview.10 lastly, xia used arcview as an item-locating system within the physical space of the library.11 more recently, mandel utilized mapwindow, an open-source desktop gis originally developed at idaho state university, for creating maps of fictional in-library use data.12 mandel's process demonstrated how a gis could be utilized to visualize the use of library spaces for marketing materials and services as well as graphically depicting a library's value. coyle argued for the use of gis as a tool to analyze the interior space of the library, and specifically the library collection itself, while not implementing a system with any specific gis package.13 given and archibald detailed their use of visual traffic sweeps as an approach to collect and visualize in-library use data.14 their workflow involved utilizing a microsoft excel spreadsheet to capture data and then importing the data into arcgis to query and visualize them. therefore, gis wasn't used for data capture; it was used toward the end of the process to visualize these data. while this body of work details the use of desktop gis for working with in-library use data, collaborative web gis platforms now offer opportunities to advance existing research in this arena by streamlining data-collection workflows, sharing database schemas, and enabling broader collaboration with peers, thereby potentially creating opportunities for new research. fusing the capabilities of these new platforms with traditional observational methods of gathering data on how people are using library spaces extends the body of knowledge and offers interesting new opportunities for research, such as cross-institutional comparisons. it is critical for twenty-first-century academic libraries to collect such data to continue to evolve with the changing needs of digital-age campus research and culture.

utilizing a cloud-based platform for learning space assessment
discussed below is the approach employed for this pilot project to use web gis to collect, manage, share, and visualize information about library learning spaces. this pilot project utilized the esri arcgis online platform and client applications accessing that platform (see figure 1). collector for arcgis (http://doc.arcgis.com/en/collector), a ready-made app, was used for data collection. arcgis desktop (http://desktop.arcgis.com) was used at the outset to create the initial database schema. a custom html/javascript web application was developed to better enable library administrators to visualize the data as a map, table, or chart. prior to the implementation of this pilot project, the circulation department conducted floor sweeps for safety purposes (e.g., making sure certain doors were locked), but space-assessment data had never been gathered for the library.

research study location
all observations were taken during fall 2016 and spring 2017 at the university of idaho library and the gary strong curriculum center. this article focuses on the implementation of the platform for use at the library. the first floor of the university of idaho library underwent a remodel during winter 2016.
the remodel included new furniture and different configurations of areas better customized for learning and studying. spaces such as group study areas, booths, and brainstorming spaces figured prominently in the remodel. additionally, expanded food and beverage options and proximity to open seating areas located near natural light provide a welcoming environment. library hours were also expanded to 24 hours per day, 5 days a week. with these changes arose the desire to digitally collect data to learn about the use of these new locations by patrons. utilizing these data to inform decision-making about future changes to the physical spaces in the library, as well as connecting library learning spaces to campus learning outcomes, were goals of this research.

figure 1. infrastructure for the pilot project.

selecting the arcgis online platform
using locally existing resources to implement this pilot project was a requirement. funding was not available to purchase server software or hardware. personnel time could be carved out of existing positions for this effort, but money was not available to hire additional personnel. the university of idaho library does not have a dedicated it unit, so choices were limited. purchasing business-intelligence software such as tableau was cost-prohibitive. an open-source tool such as suma, developed by north carolina state university libraries, was not a practical option in this case because the system requirements did not align with the expertise of existing personnel.15 fortunately, the arcgis online platform was available for this research at no cost to the library, and existing personnel had experience using the platform. the university of idaho participates in, and contributes financially to, a state of idaho higher education site license for esri software. the software is then available to personnel across the institution for research, teaching, and, to a lesser extent at this time, administrative purposes. since arcgis online is a cloud platform, there is no server software to install and update and no server hardware to configure. additionally, the university of idaho gis librarian was familiar with the capabilities of the platform and available to actively participate in this research. in short, researchers' access to and existing expertise with the arcgis online platform, coupled with the extensive capabilities of the platform itself, made it the best choice for this research.

pilot project design
a public services librarian and the gis librarian assumed leadership roles for the pilot project. the public services librarian led tasks associated with defining the learning (i.e., the data-collection) spaces, defining the data fields and domains for those spaces, and overseeing personnel responsible for collecting these data. the gis librarian led tasks associated with creating the database schema, creating the geographic features representing the learning spaces, creating a web application to visualize the data, and managing content on the arcgis online platform. library personnel were responsible for collecting the data.

gathering ancillary data
having building floor plans in a digital format was helpful for data collectors to orient themselves in the space when looking at a map on a mobile device.
our research team was able to acquire georeferenced building floor plans for our institution from the information technology services unit on campus. each of the library’s four floors was published to arcgis online as a hosted tile layer to serve as a frame of reference for data collectors.

managing content and users

arcgis online provides the ability to create and define groups. groups are collections of items that can be shared among named users. individual user accounts were created for each project participant, and a group was created containing the items for this pilot project to be shared among those users. this approach allowed all data associated with the project to remain private and be shared only among personnel participating in the project.

database design

the primary knowledge product resulting from this research was a web application containing a two-dimensional map, tables, and charts. a geodatabase, which is an assemblage of geographic datasets, needed to be designed and created to provide data to the web application.16 designing a geodatabase begins with defining the operational layers required to gather information.17 for this pilot project, one operational layer depicting individual learning spaces was required (see table 1).

table 1. description of the learning spaces layer
  layer: learning spaces
  map use: define areas intended for a specific type of learning
  data source: digitized using building floor plans as a frame of reference
  representation: polygons

the learning spaces layer was used to store the geometry of the individual learning spaces. a table to store observations for each learning space was needed, and a relationship between each individual space and the observations for that space was required (see figure 2). the relationship binds observations to their appropriate learning spaces and was defined to allow one learning space to relate to many observations.

figure 2. data elements of the geodatabase.

fields, analogous to columns in a spreadsheet, were defined for the learning spaces layer and the observations table to store descriptive information. for example, a friendly name was assigned to each learning space. additionally, domains were defined to manage valid values for specific fields. domains were necessary for quality control and quality assurance to enforce data integrity, enabling data collectors to pick items from lists rather than having to type the item names. this feature eliminates potential data-collection errors. field names, data types, field descriptions, and domains for this pilot project can be found in the appendix.

defining data-collection spaces

a template was created to define the information required to create each learning space feature. these features were created by digitizing them on a computer screen for each of the four floors of the library, using the building floor plans as a frame of reference. ten learning spaces were defined for the first floor of the library and one each for floors 2, 3, and 4. a map for each floor was created and published to arcgis online as a hosted feature layer.18 each map contained two layers: one for the floor plan and one for the learning spaces (figure 3). library personnel used these maps to collect data.
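to make the fields, domains, and relationship just described concrete, the following sketch (python, illustrative only) expresses the observations table from the appendix as a plain dictionary loosely resembling a layer-definition fragment; the field names, coded values, and one-to-many relationship are taken from the appendix, while the surrounding structure is an approximation rather than the exact json arcgis online expects.

```python
import json

# illustrative sketch only: field names, types, and coded values mirror the
# appendix; the enclosing structure only approximates an esri layer definition.
observations_table = {
    "name": "space_assessment_data",
    "fields": [
        {"name": "globalid", "type": "guid", "description": "global identifier"},
        {"name": "spaceid", "type": "string", "description": "space identifier",
         "domain": "spaceid"},
        {"name": "type_of_usage", "type": "smallinteger",
         "description": "type of usage", "domain": "typeofusage"},
        {"name": "number_of_users", "type": "smallinteger",
         "description": "number of users"},
        {"name": "comments", "type": "string", "description": "general comments"},
    ],
    "domains": {
        # coded values keep collectors picking from a list instead of typing
        "typeofusage": {
            0: "browsing stacks",
            1: "individual studying",
            2: "lounging",
            3: "meeting / group study",
            4: "service point (circulation / reference / its help)",
            5: "using library computers",
        }
    },
    # one learning space relates to many observations
    "relationship": {"origin": "space_assessment_areas",
                     "cardinality": "onetomany", "key": "spaceid"},
}

print(json.dumps(observations_table, indent=2))
```

a real deployment would author these elements in arcgis desktop or through esri's documented layer-definition json rather than this simplified form.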
data collection

data collection was accomplished using collector for arcgis installed on mobile devices. this eliminated the need for any software-development costs for data collection. collector for arcgis is a ready-made arcgis online application designed to provide an easy-to-use interface for collecting location-based data. the software was installed on a variety of devices, including a samsung galaxy tablet, a surface tablet, and an apple ipad. the online collection mode was enabled during collection, so data were transferred to arcgis online in real time. the software can collect data in an offline mode, but because strong internet connections were available in both campus buildings, the online mode was used. the collection workflow consisted of library personnel traversing the floors of the library and recording the number of users in each space, what the users were doing in the space, and additional contextual comments if necessary. library staff were encouraged to use their own expertise and observational cues (e.g., textbooks present) when recording data associated with patron activities in library spaces. the date, time, and name of the data collector were recorded automatically, an option available through the arcgis online platform. the user interface for the software was friendly and intuitive and required minimal training (figure 4). a list was provided to select the type of use for the selected space. data were accessible via arcgis online immediately following collection.

figure 3. first floor learning spaces of the university of idaho library overlaid on the building floor plan.

figure 4. the collector for arcgis user interface utilized for data collection.
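as an illustration of that immediate access, a minimal sketch follows (python, using the standard requests and csv libraries) that pulls collected observation records from a hosted feature layer's rest query endpoint and writes them to a csv file; the service url, layer index, and token are placeholders and would need to match a given organization's service.

```python
import csv
import requests

# placeholders: substitute your organization's hosted feature service url,
# the index of the observations layer/table, and a valid arcgis online token.
LAYER_URL = ("https://services.arcgis.com/EXAMPLE/arcgis/rest/services/"
             "space_assessment/FeatureServer/1")
TOKEN = "YOUR_ARCGIS_ONLINE_TOKEN"

params = {
    "where": "1=1",             # all observations; narrow with a date clause if desired
    "outFields": "*",           # return every attribute field
    "returnGeometry": "false",  # observations are tabular records, not geometry
    "f": "json",
    "token": TOKEN,
}
resp = requests.get(f"{LAYER_URL}/query", params=params, timeout=30)
resp.raise_for_status()
features = resp.json().get("features", [])

# flatten the attribute dictionaries into csv rows for analysis in excel or pandas
rows = [f["attributes"] for f in features]
if rows:
    with open("observations.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
print(f"exported {len(rows)} observation records")
```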
results of using web gis

web gis, specifically arcgis online, offered the functionality required for collecting and managing in-library use data. additionally, the platform offers librarians supplementary opportunities for collaborative space-assessment projects. while the arcgis online platform proved to be useful for this pilot project, some of the advantages and limitations encountered are discussed below.

advantage: ease of use through targeted applications

esri software has been used in academia for decades. while the early command-line versions and later desktop versions were the playground of those with gis training, web gis applications have a decidedly friendlier interface because of the ability to customize applications on the platform for specific purposes. for example, applications with management functionality can be separated from applications intended for data gathering. the need for excessive functionality to be included in one interface is replaced with a more modular framework, resulting in less complex user interfaces than those seen in many desktop gis programs. while some personnel involved with this project had used esri software for many years and were familiar with the capabilities of arcgis online, they had not used the platform for data collection prior to this project. managing users and content for the project proved to be straightforward. it was made even easier when enterprise logins were configured, which allowed personnel to sign in using their institutional username and password. authoring the database schema, creating the necessary maps, and publishing those maps as hosted services was not complicated for those with basic desktop and web gis knowledge. those responsible for collecting data needed little training with collector for arcgis to begin data collection. finally, librarians with no gis background were able to export the data to a familiar format (comma-separated values) to begin analysis using software such as excel. in short, authoring the database and map services remains best handled by those with gis experience; however, targeted application interfaces enable users without gis experience to collect and work with data.

advantage: participation in enterprise architecture

conducting library research on a platform that many faculty, students, and staff are beginning to use for research, learning, and administration places librarians within the same collaborative space as the communities they are serving. in the case of this research, our need for building floor plans presented opportunities to discuss enterprise gis at our institution more broadly by sharing this information. interaction took place between the library, facilities services, and information technology services, resulting in the cultivation of relationships around data sharing. furthermore, integration of our enterprise security with the arcgis online platform adds a level of legitimacy to geospatial data management efforts.

advantage: potential for cross-institutional collaborative projects

the potential for cross-institutional collaboration on library-space assessment and other projects should not be overlooked when using the arcgis online platform. such collaborations are even more manageable because esri software is used by more than 7,000 colleges and universities across the globe. even though cross-institutional collaboration was not a goal of this research, the opportunities for projects or programs of this nature became abundantly clear. items created in arcgis online can be shared between organizations. simply sharing a library-space-assessment database schema with librarians at other institutions would allow them to quickly implement a similar project on the arcgis online platform. this opens the door to new research opportunities. the functionality exists for one institution to host a database that personnel from multiple institutions could populate. a single dataset containing the learning spaces of multiple institutions, with multiple contributors, could be created, managed, and analyzed collaboratively. this could enable lower-resource libraries to participate in projects with larger institutions as economies of scale are realized. and it offers the ability to undertake projects across multiple institutions to explore broader space-assessment or other research questions.

limitation: updating hosted feature service schemas

the ability to author and edit schemas entirely in arcgis online has not yet matured to the point where it matches the abilities of its desktop counterpart. specifically, updating a published schema is currently difficult to accomplish in arcgis online because a user-friendly interface does not exist. however, the task can be accomplished by editing the javascript object notation (json) of the hosted feature service. while this is a current limitation for managers of the hosted feature service and not for data collectors, it is anticipated that it will be addressed in future updates.
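a minimal sketch of that json-editing approach appears below (python with requests); it assumes the administrative rest endpoint and updateDefinition operation that esri documents for hosted feature services, and the service url, the property being changed, and the token handling are all placeholders, so the operation and payload should be verified against current esri documentation before use.

```python
import json
import requests

# placeholders: the admin endpoint of a hosted feature service layer and a token.
ADMIN_LAYER_URL = ("https://services.arcgis.com/EXAMPLE/arcgis/rest/admin/services/"
                   "space_assessment/FeatureServer/1")
TOKEN = "YOUR_ARCGIS_ONLINE_TOKEN"

# 1. fetch the current layer definition as json to see what can be edited.
current = requests.get(ADMIN_LAYER_URL, params={"f": "json", "token": TOKEN},
                       timeout=30).json()
print("existing fields:", [f["name"] for f in current.get("fields", [])])

# 2. build the fragment of the definition to change; here, an illustrative tweak
#    to the layer's display field (other schema properties could be edited similarly).
patch = {"displayField": "spaceid"}

# 3. push the edited fragment back through the updateDefinition operation.
resp = requests.post(f"{ADMIN_LAYER_URL}/updateDefinition",
                     data={"updateDefinition": json.dumps(patch),
                           "f": "json", "token": TOKEN},
                     timeout=60)
print(resp.json())
```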
limitation: user interface for standards-based metadata

items created as part of the pilot project were documented using the metadata editor provided in arcgis online. arcgis online users can create and maintain geospatial standards-based metadata for content. however, the user interface for creating metadata based on either the iso 19115 series or the federal geographic data committee (fgdc) content standard for digital geospatial metadata (csdgm) could be improved by reducing its complexity and allowing batch updates of specific elements. item documentation for the platform focuses on creating and editing elements of arcgis-format metadata. it should be noted, and potentially added as a point of concern for librarians, that the ability to author and edit metadata based on the iso and csdgm standards was introduced three years after the initial release of arcgis online.

limitation: visualizing data in related tables

the ability to visualize data collected as part of this project using ready-made applications in arcgis online yielded unsatisfactory results. the primary limitation was related to working with repeated measurements for the learning spaces. ready-made applications like web appbuilder and operations dashboard have limited support for a user-friendly presentation of repeated learning-space observations. therefore, a custom web application was developed by a university of idaho student using the esri javascript application programming interface (api). the application provides the ability to select a date range, a time scope (e.g., daytime, nighttime, all hours), a building, and a floor to visualize the data. the learning spaces are colored by the total number of users in a space on the basis of the parameters selected (see figure 5). for each individual space, a chart and table can be displayed to gain further insight (see figures 6 and 7).

figure 5. map view of the space assessment dashboard application.

figure 6. chart view of the space assessment dashboard application.

figure 7. table view of the space assessment dashboard application.
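the aggregation behind the dashboard's map view can also be prototyped outside the esri javascript api; the sketch below (python with pandas, illustrative only) filters exported observation records to a date range and daytime hours and totals users per space. it assumes the records were exported as csv (for example, by the query sketch earlier) and that editor tracking recorded a creation timestamp; the timestamp column name used here is a placeholder.

```python
import pandas as pd

# illustrative only: column names (especially the timestamp field) are placeholders
# and must match whatever the actual csv export from arcgis online contains.
obs = pd.read_csv("observations.csv")

# editor tracking typically stores creation time as milliseconds since the epoch;
# adjust the unit if the export already contains a human-readable date.
obs["created"] = pd.to_datetime(obs["CreationDate"], unit="ms")

# parameters mirroring the dashboard controls: a date range and a daytime scope.
start, end = "2016-09-01", "2016-12-16"
daytime = obs["created"].dt.hour.between(8, 17)
in_range = obs["created"].between(start, end)

subset = obs[in_range & daytime]

# total users per learning space over the selected scope, the figure each
# space is colored by in the dashboard's map view.
totals = (subset.groupby("spaceid")["number_of_users"]
                .sum()
                .sort_values(ascending=False))
print(totals)
```

swapping the hour window or grouping by type_of_usage instead of spaceid reproduces the dashboard's other scopes and chart views.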
limitations: data-collection software issues

using collector for arcgis on devices running windows 10 proved frustrating because of a documented bug in collector. a “you are not connected to the internet” error would appear randomly, even when there was a valid internet connection. a workaround was implemented to circumvent the issue, but it was a source of frustration for data-collection staff. offline data-collection mode was tried to see whether it was a more favorable option; however, the date and time of data collection are not captured in offline mode, so that potential workflow was abandoned. no issues were encountered by data collectors who used the samsung galaxy (running the android operating system) or an apple ipad.

conclusions

web-based gis platforms such as arcgis online have evolved to the point where they offer the functionality required for collecting and managing in-library use data. the arcgis online platform performed commendably for this pilot project. while arcgis desktop was used to author the original database schema in this project, it is reasonable to conclude that it is only a matter of time until the functionality required to complete the entire workflow in the web-based platform is available. using mobile and desktop devices outfitted with the collector for arcgis application proved to be a practical way to collect real-time in-library use data. managing project users and the items those users were able to access was straightforward. while the visualization tools for repeated-measurements data are currently limited in arcgis online, the data are accessible as a web service, and the sky is the limit on custom web-application development. looking ahead, adjusting schemas to capture height above and below ground level to take advantage of 3d data models and visualization is intriguing. use of this model may be beneficial for space-assessment projects that seek to gather data more broadly across institutions. finally, a noteworthy realization from this research is the potential for inter-institutional and cross-institutional collaboration on library space-assessment projects, or other projects for that matter. librarians can begin embracing the web gis movement alongside those in the communities they participate in and serve. opportunities to create efficiencies are possible through the simple sharing of database schemas. additionally, the ability for one institution to host a database enabling personnel at multiple institutions, or at multiple libraries within larger institutions, to contribute data is available and ready for further research.

appendix: schemas for each object in the geodatabase used for data collection

building name table and associated domain values
  domain name: buildingname
  description: name of the building
  field type: smallinteger; domain type: codedvalue
  coded values: 0 = library; 1 = education

space identifier table and associated domain values
  domain name: spaceid
  description: identifier for the area
  field type: string; domain type: codedvalue
  coded values: 1a = group study; 1b = café; 1c = landing; 1d = computer lab; 1e = individual/small group study; 1f = mill (134); 1g = group study (133); 1h = group study (132); 1i = group study (131); 1j = classroom (120); 2a = 2nd floor; 3a = 3rd floor; 4a = 4th floor; 3a_1 = imtc area 1; 3b_1 = imtc area 2; 3c_1 = imtc area 3; 3d_1 = imtc area 4

type of use table and associated domain values
  domain name: typeofusage
  description: type of usage of the area
  field type: smallinteger; domain type: codedvalue
  coded values: 0 = browsing stacks; 1 = individual studying; 2 = lounging; 3 = meeting / group study; 4 = service point (circulation / reference / its help); 5 = using library computers

space assessment areas feature class (field, data type, description, domain)
  globalid, guid, global identifier
  spaceid, string, space identifier, spaceid
  floor, string, building floor
  bldgname, smallinteger, building name, buildingname

space assessment areas observations table (field, data type, description, domain)
  type_of_usage, smallinteger, type of usage, typeofusage
  number_of_users, smallinteger, number of users
  globalid, guid, global identifier
  spaceid, string, space identifier, spaceid
  comments, string, general comments

space assessment areas feature class to observations relationship class
  cardinality: onetomany
  isattributed: false
  iscomposite: false
  forward path label: space_assessment_data
  backward path label: space_assessment_areas
  description: relationship between the space assessment areas and data collected
  origin class name: space_assessment_areas; origin primary key: spaceid; origin foreign key: spaceid

references
1 michael f. goodchild, “part 1. spatial analysts and gis practitioners,” journal of geographical systems 2, no. 1 (2000): 5–10, https://doi.org/10.1007/s101090050022.
2 pinde fu and jiulin sun, web gis: principles and applications (redlands, ca: esri, 2011), 7.
3 suzana dragićević, “the potential of web-based gis,” journal of geographical systems 6, no. 2 (2004): 79–81, https://doi.org/10.1007/s10109-004-0133-4.
4 fu and sun, web gis, 9.
5 “who we are,” esri, accessed october 17, 2017, http://www.esri.com/about-esri#who-we-are.
6 ningning kong, michael fosmire, and benjamin dewayne branch, “developing library gis services for humanities and social science: an action research approach,” college & research libraries 78, no. 4 (2017): 413–27, https://doi.org/10.5860/crl.78.4.413.
7 bradley wade bishop and lauren h. mandel, “utilizing geographic information systems (gis) in library research,” library hi tech 28, no. 4 (2010): 543, https://doi.org/10.1108/07378831011096213.
8 jingfeng xia, “library space management: a gis proposal,” library hi tech 22, no. 4 (2004): 375, https://doi.org/10.1108/07378830410570476.
9 jingfeng xia, “gis in the management of library pick-up books,” library hi tech 22, no. 2 (2004): 209–16, https://doi.org/10.1108/07378830410543520.
10 jingfeng xia, “visualizing occupancy of library study space with gis maps,” new library world 106, no. 5/6 (2005): 219–33, https://doi.org/10.1108/03074800510595832.
11 jingfeng xia, “locating library items by gis technology,” collection management 30, no. 1 (2005): 63–72, https://doi.org/10.1300/j105v30n01_07.
12 lauren h. mandel, “geographic information systems: tools for displaying in-library use data,” information technology & libraries 29, no. 1 (2010): 47–52, https://doi.org/10.6017/ital.v29i1.3158.
13 andrew coyle, “interior library gis,” library hi tech 29, no. 3 (2011): 529–49, https://doi.org/10.1108/07378831111174468.
14 lisa m. given and heather archibald, “visual traffic sweeps (vts): a research method for mapping user activities in the library space,” library & information science research 37, no. 2 (2015): 100–108, https://doi.org/10.1016/j.lisr.2015.02.005.
15 “suma,” north carolina state university libraries, accessed october 17, 2017, https://www.lib.ncsu.edu/projects/suma.
16 “what is a geodatabase?,” esri, accessed october 17, 2017, http://desktop.arcgis.com/en/arcmap/10.4/manage-data/geodatabases/what-is-a-geodatabase.htm.
17 “geodatabase design steps,” esri, accessed october 17, 2017, http://desktop.arcgis.com/en/arcmap/10.4/manage-data/geodatabases/geodatabase-design-steps.htm.
18 “hosted layers,” esri, accessed october 17, 2017, http://doc.arcgis.com/en/arcgis-online/share-maps/hosted-web-layers.htm.
book reviews

in the beginning ... was the command line
by neal stephenson. new york: avon books, inc., 1999. 151p. $10 (isbn 0-380-81593-1)

neal stephenson is best known for his cyberfiction, including snow crash and most recently cryptonomicon. in the beginning ... was the command line is a quite different kettle of fish. command line is a short book with a succinct message: the command line is a good thing, because the full power of the computer is only available to those who can access the command line and type in the magic commands that make things happen. stephenson learned this lesson the hard way, after first spending much time as a macintosh-devoted gui-head. the revelation came when he lost a document he was editing on his powerbook, completely and without a trace, forever irretrievable.

actually, i say the book has a succinct message, but it has many messages and many metaphors, all artfully constructed by a master of prose. stephenson constructs his arguments along multiple lines, providing a discursive tour through windows, macintosh, and unix history, offering personal history as well as his own take on the economics of the software industry. for example, he believes that microsoft would be better off as an applications company rather than carrying the millstone of a family of operating systems. as for apple, he suggests that they have been doing their best to destroy themselves for years, so far unsuccessfully (but give them time).

the real meat of the book is whether, in fact, it is better to offer people the flash of metaphor with the recognition that power and certain levels of choice are lost, as with graphical user interfaces exemplified by windows and the macintosh, or whether it is better to have at least some access to the command line interface, which ms-dos offered and members of the unix family (e.g., linux) afford. this is, in fact, both a silly and important question at the same time. silly because many people would wonder why anyone would want command line access to any software. silly because others might wonder why you couldn't have both.
important, or at least apparently important, because we seem to have become, without much warning, a world wrapped in guis of one sort or another. important in the library automation world, because end-user tools are moving increasingly toward gui-based or web-based interfaces without textbased alternatives (except, perhaps, lynx or similar web browsers, which have their own problems). for much of the book, stephenson dances around the question, among others, of why not both gui and text-based interfaces, and finally finds the answer in the be operating system. my question is, why not as many interfaces as it takes, of whatever sort? to repeat the trite saw, there are two kinds of people in the world, those who divide the world into two kinds of people and those who don't. stephenson has a lot of fun trying to make the division in this case, then ultimately comes out from behind the posturing and admits that he believes in the availability of both worlds. there are many people who do, indeed, want hard things hidden from them, at least some of the time. when i am dealing with an automated teller machine, i don't want to have to use mechanical levers or pedals as i might have needed were atms invented in an earlier age, nor do i want to type in commands, although i am comfortable using a command line environment in my workplace. i just want to be prompted through a minimal number of steps to walk away with some cash from my checking account. the world is a complicated and challenging place to navigate. some people tom zillner, editor would like to be helped by other people in this navigation, although many have found that they would far rather deal with the dumbeddown interface of an atm machine than to interact with not-so-friendly, underpaid bank tellers. similarly, many people want to accomplish a particular task requiring the use of a computer and don't mind having the details hidden from them, no matter how much power knowing the details would provide. or, they want to do that at least some of the time. as an example in the library world, let's consider a nai:ve patron who enters the library desiring to perform a known-item search. such a user might be quite comfortable with an interface with a single type-in box and a set of clickable buttons labeled title, author and subject. or maybe just a single button "click to start search." although nai:ve users may consult library staff, who are most often more friendly than bank tellers, many people want to find their own materials. at the same time, more sophisticated users want more sophisticated capabilities and interfaces from the same catalogs. although vendors have gotten better at providing a couple of levels of complexity and corresponding user interfaces, why not go further? there aren't just two kinds of people. there are lots of kinds of people, with lots of kinds of information needs, representing lots of experience levels. why the restrictions at the user interface? in the history of microcomputing, stephenson points to the evolution of two major players, microsoft and apple, with linux coming on strong and be representing an interesting offshoot. i think the important insight implicit in what stephenson discusses is that much of the appearance and behavior of windows and the macintosh desktop are historically based artifacts. in order to maintain backward compatibility with existing applications, the windows and macintosh book reviews 103 reproduced with permission of the copyright owner. 
further reproduction prohibited without permission. operating systems have picked up a great deal of "cruft," computer code that allows multitasking and other improvements cobbled on to the fragile inner shell of ancient code required for compatibility with older applications. at the same time, stephenson invokes the familiar refrain that the user interfaces of both platforms are tied to a tired set of metaphors that attempt to mimic the real-world office (e.g., desktop, folder) but do not do so with any kind of useful fidelity. in the library world, i think a similar kind of lineage might be traced from command line interfaces to the current windowsand web-based front-ends. although many libraries and librarians have faced painful conversion processes over the years in moving through generations of automated systems, it might be interesting to see if there are still traces of underlying code that owe their existence to backward compatibility. where does stephenson turn in the face of the inelegance of the windows and macintosh worlds? he finds solace in the power and integrity of linux. it may take a long time to successfully install the operating system and get it to function with all of the hardware components of a particular computer configuration, but it has all that power, and all of those cool applications carefully constructed by people who care. bugs are fixed quickly. it's a community effort. that's all very appealing, particularly when compared to the appalling response (or lack of it) to windows or macintosh bugs. the problem is that so far most of us aren't equipped to deal with the steep curve required to install linux on personal computers, and the corporate or library environment usually isn't politically prepared for linux to be adopted as an institutionwide standard. so, while linux boxes are frequent choices for servers, they are not widespread personal pc choices. nor r.hould they be until easy installation tools are available. again, stephenson is ambivalent. on the one hand, he recognizes that there are many people who don't want the kind of power offered by being so close to the machine if it means becoming experts in arcane commands and codes. even though he wants the power and simplicity, and decries the limitations imposed by the gui, he recognizes that linux is not for everyone. he's right. most people use computers to get some work done (or to play). to the extent that the software gets in the way, it isn't operating properly. by that criterion, none of the three environments described are particularly useful in a desktop world. in spite of the fact that the old metaphors have been rightly criticized for years for their tiredness, there doesn't seem to be much movement beyond them, except in limited research operating environments and applications. similarly, it seems, in the library and information world, at least in most people's routine interactions with opacs and databases. yes, i am waffling, because i'm sure that someone could point out the "snarfle n 1 virtual reality interface to the lc catalog that affords a walkthrough browsing experience," but of course only six computer science researchers have actually experienced the snarfletm interface, and it requires a $25,000 workstation and $10,000 in virtual reality gear to work, plus it is s-1-o-w. pardon the sarcastic riff, but there is a lot of wonderful user interface work that is certainly not finding its way onto mainstream computer users' desktops, or to the library or information center. 
so what's the answer? criticism is fun, because critics don't necessarily have to provide a positive account to match their nay-saying function. if things are bleak in the world of the user interface, both on the average user desktop and on the library desk104 information technology and libraries i june 2000 top as well, what is to be done? for a taste of what is to come in the library world, take a look at mylibrary (http:/ /my.lib.ncsu.edu/), which allows profiling of user preferences and customization based on academic discipline. similarly, there are a number of web portals and other sites that allow customization for users (e.g., my yahoo, my excite, etc.). suppose that these first steps in customization are carried further, so that each user's unique profile generates a unique user interface experience across all databases he or she deals with in a session. the interface unification could be accomplished across heterogeneous databases in a couple of different ways. a simple initial step that many libraries already employ is to obtain databases from a single aggregator, so that a uniform interface is presented to the user. for example, oclc' s first search offers a single interface to a number of commercial databases. this type of solution is not possible for libraries that need access to a diverse array of databases not available through a single aggregator or vendor. of course, this situation can present patrons and staff with a bewildering array of interfaces and search methods. a more elaborate solution is to employ z39.50 to access the databases and build a single interface at the front end. there may be aggregators that already use this strategy with the databases they provide, but in the future perhaps there would be an incentive to offer unified interfaces with fine-grain customization possible by users. getting back to stephenson's more generalized view of the user interface, i think there are also opportunities here for more finegrained customization. stephenson points to the beos, which apparently allows both command-line and guibased interactions, as an example of what can be done when an operating system is constructed anew, from the bottom up, with no pre-existing reproduced with permission of the copyright owner. further reproduction prohibited without permission. audience to satisfy. at the same time, and in contrast, stephenson extols the power of open software development, which he believes is most apparent in operating systems, the production of which he describes as money-losing propositions. yet, linux is tremendously successful without, for the most part, commercial gain for developers. can this same model be applied to interface and other development in the library world? in this example, might not some group of librarian coders (or coder librarians) work together to put mylibrary together with z39.50 capabilities and customization of interfaces to produce a little slice of paradise for library patrons? promising moves are being made within the library community to get open source efforts off the ground. this could be one of many especially useful and fruitful projects to come out of open software development for libraries. although his book is ostensibly about a few issues that elicit yawns from most of the world, stephenson is really using in the beginning . . . was the command line to look at a much bigger picture than simply the command line versus the gui at its microscopic level. 
stephenson looks at the cloaking, obfuscation or replacement of underlying text by images and multimedia as contributing to the decline of civilization. that seems like a radical claim, but at heart it is the one that stephenson makes in his discussion of the disney-ification of the world-that visual metaphors and explanations oversimplify and obscure the truth. in fact, stephenson goes further, discussing this trend toward anti-word as our attempt at an antidote for the kind of intellectualism that resulted in a lot of death, pain, and suffering for people in the twentieth century. he, as a person who lives by words and loves the intellectual life, thinks we've gone too far, reaching a state of cultural relativism where there is neither good nor bad remammg. this discussion includes my favorite quote of the book: the problem is that once you have done away with the ability to make judgments as to right and wrong, true and false, etc., there's no real culture left. all that remains is clog dancing and macrame. the ability to make judgments, to believe things, is the entire point of having a culture. i think this is why guys with machine guns sometimes pop up in places like luxor and begin pumping bullets into westerners .... when their sons come home wearing chicago bulls caps with the bills turned sidewavs, the dads go out of their minds. (p. 56) it's a pretty startling move to try to connect up the decline in use of the command line to an anti-intellectualism following world war ii that resulted in cultural relativism. i think it actually has some merit, although in the case of visual interfaces versus the command line the ethical import is minimal, i.e., i don't believe my decision to accomplish certain tasks using visual metaphors contributes to the decline of civilization, and i think the fact that i like to work on other tasks utilizing a command line won't serve to save our written culture. it's too much of a stretch. i think that something stephenson misses in his discussion of the replacement of the written word by visual images is that there is still a creative force and judgment involved in the creation of the images. there is still script writing. isn't this, after all, what a writer does in any case, creating images, metaphorically, through his or her work? certainly, we are moving through a perilous time, when the world really is changing from a reliance on the written word to more dependence on the visual. there will be many things lost in this transition. plato had some major, wellfounded doubts about the transition from greece's oral cultural tradition to a written one. the change happened anyway. civilization has been declining for a long time. my fearless prediction is that it will continue to decline for a long time. i think stephenson has done a masterful job of writing a brief glimpse of the overall picture that represents the state of culture and intellectual life in the world today, and has also made some important points about the economics and character of the world of software and operating environments. his writing skills make this fairly short book a pleasurable read and a worthwhile one. as i did, i think you might find this long essay a useful starting point for thoughts about issues large and small.-tom zillner, wils the cathedral & the • bazaar: musings on linux and open source by an accidental revolutionary by eric s. raymond, sebastopol, calif.: o'reilly, 1999. 288p. 
$19.95 (isbn 156592-724-9) this short essay examines, in the guise of a book review, the concept of a "gift culture" and how it may or may not be related to librarianship. as a result of this examination, and with a few qualifications, i believe my judgements about open source software and librarianship are true: open source software development and librarianship have a number of similarities-both are examples of gift cultures. i have recently read a book about open source software development by eric raymond. the cathedral & the bazaar describes the environment of free software and tries to explain why some programmers are willing to give away the products of their labors. it describes the "hacker milieu" as a "gift culture": book reviews 105 reproduced with permission of the copyright owner. further reproduction prohibited without permission. gift cultures are adaptations not to scarcity but to abundance. they arise in populations that do not have significant material scarcity problems with survival goods. we can observe gift cultures in action among aboriginal cultures living in ecozones with mild climates and abundant food. we can also observe them in certain strata of our own society, especially in show business and among the very wealthy. 1 raymond alludes to the definition of "gift cultures," but not enough to satisfy my curiosity. being the good librarian, i was off to the reference department for more specific answers. more often than not, i found information about "gift exchange" and "gift economies" as opposed to "gift cultures." (yes, i did look on the internet but found little.) probably one of the earliest and more comprehensive studies of gift exchange was written by marcell mauss. 2 in his analysis he says gifts, with their three obligations of giving, receiving, and repaying, are in aspects of almost all societies. the process of gift giving strengthens cooperation, competitiveness, and antagonism. it reveals itself in religious, legal, moral, economic, aesthetic, morphological, and mythological aspects of life.3 as gregory states, for the industrial capitalist economies, gifts are nothing but presents or things given, and "that is all that needs to be said on the matter." ironically for economists, gifts have value and consequently have implications for commodity exchange. 4 he goes on to review studies about gift giving from an anthropological view, studies focusing on tribal communities of various american indians, cultures from new guinea and melanesia, and even ancient roman, hindu, and germanic societies: the key to understanding gift giving is apprehension of the fact that things in tribal economics are produced by nonalienated labor. this creates a special bond between a producer and his/her product, a bond that is broken in a capitalistic societv based on alienated wage-labor. 5 ingold, in "introduction to social life," echoes many of the things summarized by gregory when he states that industrialization is concerned exclusively with the dynamics of commodity production. clearly in non-industrial societies, where these conditions do not obtain, the significance of work will be very different. for one thing, people retain control over their own capacity to work and over other productive means, and their activities are carried on in the context of their relationships with kin and community. indeed their work may have the strengthening or regeneration of these relationships as its principle objective. 
6 in short, the exchange of gifts forges relationships between partners and emphasizes qualitative as opposed to quantitative terms. the producer of the product (or service) takes a personal interest in production, and when the product is given away as a gift it is difficult to quantify the value of the item. therefore, along with the product or service, less tangible elements-such as obligations, promises, respect, and interpersonal relationships-are exchanged. as i read raymond and others i continually saw similarities between librarianship and gift cultures, and therefore similarities between librarianship and open source software development. while the summaries outlined above do not necessarily mention the "abundance" alluded to by raymond, the existence of abundance is more than mere speculation. potlatch, "a ceremonial feast of the american indians of the northwest coast marked by the host's lavish distribution of gifts or sometimes destruction of property to demonstrate wealth and generosity with the 106 information technology and libraries i june 2000 expectation of eventual reciprocation," is an excellent example.? libraries have an abundance of data and information. (i won't go into whether or not they have an abundance of knowledge or wisdom of the ages. that is another essay.) libraries do not exchange this data and information for money; you don't have to have your credit card ready as you leave the door. libraries don't accept checks. instead the exchange is much less tangible. first of all, based on my experience, most librarians simply take pride in their ability to collect, organize, and disseminate data and information in an effective manner. they are curious. they enjoy learning things for learning's sake. it is a sort of platonic end in itself. librarians, generally speaking, just like what they do and they certainly aren't in it for the money. you won't get rich by becoming a librarian. information is not free. it requires time and energy to create, collect, and share, but when an information exchange does take place, it is usually intangible, not monetary, in nature. information is intangible. it is difficult to assign it a monetary value, especially in a digital environment where it can be duplicated effortlessly: an exchange process is a process whereby two or more individuals (or groups) exchange goods or services for items of value. in library land, one of these individuals is almost always a librarian. the other individuals include tax payers, students, faculty, or in the case of special libraries, fellow employees. the items of value are information and information services exchanged for a perception of worth-a rating valuing the services rendered. this perception of worth, a highly intangible and difficult thing to measure, is something the user of library services "pays," not to libraries and librarians, but to administrators and decision-makers. ultimately, these payments reproduced with permission of the copyright owner. further reproduction prohibited without permission. manifest themselves as tax dollars or other administrative support. as the perception of worth decreases so do tax dollars and support. 8 therefore, when information exchanges take place in libraries, librarians hope their clientele will support the goals of the library to administrators when issues of funding arise. librarians believe that "free" information ("think free speech, not free beer") will improve society. it will allow people to grow spiritually and intellectually. 
it will improve humankind's situation in the world. libraries are only perceived as beneficial when they give away this data and information. that is their purpose, and they, generally speaking, do this without regard to fees or tangible exchanges. in many ways i believe open source software development, as articulated by raymond, is very similar to the principles of librarianship. first and foremost they are similar in the idea of sharing information. both camps put a premium on open access. both camps are gift cultures and gain reputation by the amount of "stuff" they give away. what people do with the information, whether it be source code or journal articles, is up to them. both camps hope the shared information will be used to improve our place in the world. just as jefferson's informed public is necessary for democracy, open source software is necessary for the improvement of computer applications. second, human interactions are a necessary part of the mixture in both librarianship and open source development. open source development requires people skills by source code maintainers. it requires an understanding of the problem the computer application is intended to solve, since the maintainer must be able to "patch" the software, both to add functionality and to repair bugs. this, in turn, requires interactions both with other developers and with users who request repairs or enhancements. similarly, librarians understand that information-seeking behavior is a human process. while databases and many "digital libraries" house information, these collections are really "data stores" and are only manifested as information after the assignment of value is given to the data and interrelations between data are created. third, it has been stated that open source development will remove the necessity for programmers. yet raymond posits that no such thing will happen. if anything, there will be an increased need for programmers. similarly, many librarians feared the advent of the web because they believed their jobs would be in jeopardy. ironically, librarianship is flowering under new rubrics such as information architects and knowledge managers. it has also been brought to my attention by kevin clarke (kevin_clarke@unc.edu) that both institutions use peer-review: your cultural take (gift culture) on "open source" is interesting. i've been mostly thinking in material terms but you are right, i think, in your assessment. one thing you didn't mention is that, like academic librarians, open source folks participate in a peer-review type process. index to advertisers all of this is happening because of an information economy. it sure is an exciting time to be a librarian, especially a librarian who can build relational databases and program on a unix computer. acknowledgements thank you to art rhyno (arhyno@ server.uwindsor.ca) who encouraged me to post the original version of this text.-eric lease morgan, north carolina state university, raleigh, north carolina references 1. the cathedral & the bazaar: musings on linux and open source by an accidental revolutionary, 99. 2. m. mauss, the gift: forms and functions of exchange in archaic societies (new york: norton, 1967). 3. s. lukes, "mauss, marcel," in international encyclopedia of the social sciences, d. l. sills, ed. (new york: macmillian), vol 10, 80. 4. c. a. gregory, "gifts," in the new pa/grave: a dictionary of eeconomics, j. eatwell and others, eds. (new york: stockton pr., 1987), vol. 4, 524. 5. ibid. 6. t. 
ingold, "introduction to social life," in companion encyclopedia of anthropology, t. ingold, ed (new york: routledge, 1984), 747. 7. the merriam-webster online dictionary, http://search.eb.com/ cgi-bin/ dictionary?va=potlatch 8. e. l. morgan, "marketing future libraries." accessed apr. 27, 2000, www.lib.ncsu.edu/ staff/ morgan/ cil/ marketing. info usa library technologies, inc. lita mit press cover 2 cover 3 58, 69, cover 4 95 book reviews 107 libraryvpn: a new tool to protect patron privacy public libraries leading the way libraryvpn a new tool to protect patron privacy chuck mcandrew information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.12391 chuck mcandrew (chuck.mcandrew@leblibrary.com) is information technology librarian, lebanon (nh) public libraries. due to increased public awareness of online surveillance, a rise in massive data breaches, and spikes in identity theft, there is a high demand for privacy enhancing services. vpn (virtual private network) services are a proven way to protect online security and privacy. vpn’s effectiveness and ease of use have led to a boom in vpn service providers globally. vpns protect privacy and security by offering an encrypted tunnel from the user’s device to the vpn provider. vpns ensure that no one who is on the same network as the user can learn anything about their traffic except that they are connecting to a vpn. this prevents surveillance of data from any source, including commercial snooping such as your isp trying to monetize your browsing habits by selling your data, malicious snooping such as a fake wifi hotspot in an airport hoping to steal your data, or government-level surveillance that can target political activists and reporters in repressive countries. some people might ask why we need a vpn as https becomes more ubiquitous and provides end to end encryption for your web traffic. https will encrypt the content that goes over the network, but metadata such as the site you are connecting to, how long you are there, and where you go next are all unprotected. additionally, some very important network protocols, such as dns, are unencrypted and anyone can see them. a vpn eliminates all of those issues. however, there are two major problems with current vpn offerings. first, all reliable vpn solutions require a paid subscription. this puts them out of reach of economically vulnerable populations who often have no access to the internet in their homes. in order to access online services, they may rely on public internet connections such as those provided by restaurants, coffee shops, and libraries. using publicly accessible networks without the security benefits of a vpn puts people’s security and privacy at great risk. this risk could be eliminated by providing free access to a high-quality vpn service. the second problem is that using a vpn requires people to place their trust in whatever vpn company they use. some (especially free solutions) have proven not to be worthy of that trust by containing malware or leaking and even outright selling customer data. companies that abuse customer data are taking advantage of vulnerable populations who are unable to afford more expensive solutions or who do not have the knowledge to protect themselves. together, these two problems create a situation where having security and privacy is only available to those who can afford it and have the knowledge to protect themselves. libraries are ideally positioned to help with this situation. 
libraries work to provide privacy and security to people every day. this can mean teaching classes, making privacy resources available, and even advocating for privacyfriendly laws. mailto:chuck.mcandrew@leblibrary.com https://www.forbes.com/sites/forbestechcouncil/2018/07/10/the-future-of-the-vpn-market/#5b08fd8e2e4d https://research.csiro.au/ng/wp-content/uploads/sites/106/2016/08/paper-1.pdf information technology and libraries june 2020 libraryvpn | mcandrew 2 libraries are also located in almost every community in the united states and enjoy a high level of trust from the public. librarians can be thought of as being a physical vpn. people who come into libraries know that what they read and information that they seek out will be protected by the library. in fact, libraries have helped to get laws protecting the library records of patrons in all 50 states of the usa. people know that when a library offers a service to their community it isn’t because they want to sell their information or show them advertisements. with libraries, our patrons are not the product. libraries also already provide many online services to all members of their community, regardless of financial circumstances. examples include access to online databases, language learning software, and online access to periodicals such as the new york times or consumer reports. many of these services would cost too much for individual patrons to access individually. by pooling their resources, communities are able to make more services available to all of their citizens. to help address the above issues, the lebanon public libraries, in partnership with the westchester (new york) library system, the leap encryption access project (https://leap.se/), and tj lamanna (emerging technology librarian from cherry hill public library and library freedom institute graduate) started the libraryvpn project. this project will allow libraries to offer a vpn to their patrons. patrons will be able to download the libraryvpn application on a device of their choosing and connect to their library’s vpn server from wherever they are. libraryvpn was first conceived a number of years ago, but the real start of the project was when it received an imls national leadership grant (lg-36-19-0071-19) in 2019. this grant was to develop integrations between leap’s existing vpn solution and integrated library systems using sip2 which will allow library patrons to sign in to libraryvpn using their library card. this grant also included development of a windows client (there was already a mac and linux client) and alpha testing at the lebanon public libraries and westchester library system. we are currently working on moving into the testing phase of the software, and planning phase two of this project. phase two of libraryvpn will involve expanding our testing to up to 12 libraries and conducting end-user testing with patrons and library staff. we have submitted an application for imls funding for phase two and are actively looking for libraries that are excited about protecting patron privacy and would like to help us beta test this software. if you work for a library that would be interested in participating, you can reach us via email at libraryvpn@riseup.net or @libraryvpn on twitter. if you would like to help out with this project in another way, we would love to have more help. please reach out. we currently are thinking about three deployment models for libraries in phase two. first would be an on-premises deployment. 
this would be for larger library systems with their own servers and it staff. libraryvpn is free and open source software and can be deployed by anyone. since it uses sip2 to connect to your ils, it should work with any ils that supports the sip2 protocol. this deployment model has the advantage of not requiring any hosting fees but does require the library system to have staff that can deploy and manage public facing services. drawbacks to this approach would include higher bandwidth use and dealing with abuse complaints. phase 2 testing should give us better data about how much of an issue this will be, but https://leap.se/ mailto:libraryvpn@riseup.net information technology and libraries june 2020 libraryvpn | mcandrew 3 our experience hosting a tor exit node at the lebanon public libraries suggest that it won’t be too bad to deal with. our second deployment model would be cloud hosting. if a library has it staff who can deploy services to the cloud, they could host their own libraryvpn service without needing their own hardware. however, when deploying to the cloud, there will be ongoing costs for running the servers and bandwidth used. figuring out how much bandwidth an average user will consume is part of the data we are hoping to get from our phase 2 testing so we can offer guidelines to libraries who choose to deploy their own libraryvpn service. finally, we are looking at a hosted version of libraryvpn. we anticipate that smaller systems that do not have dedicated servers or it staff will be interested in this option. in this case, there would be ongoing hosting and support costs, but managing the service would not be any more complicated than subscribing to any other service the library hosts for their patrons. libraryvpn is a new project that is pushing library services outside of the library to where the library is. we want to make sure that all of our patrons are protected, not just those with the financial ability and technical know-how to get their own vpn service. as librarians, we understand that privacy and intellectual freedom are joined, and we want to maximize both. as the american library association’s code of ethics says, “we protect each library user's right to privacy and confidentiality.” http://www.ala.org/tools/ethics 10980 2019038 editor lita president’s message updates from the 2019 ala midwinter meeting bohyun kim information technology and libraries | march 2019 2 bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, rhode island. in this president’s message, i would like to provide some updates from the 2019 ala midwinter meeting held in seattle, washington. first, as many of you know, the potential merger of lita with alcts and llama has been temporarily put on hold, due to an initial timeline that was rather ambitious and the lack of time required to deliberate on and resolve some issues in the transition plan to meet that timeline.1 these updates were also shared at the lita town hall during the midwinter meeting, where many lita members spent time discussing topics such as the draft mission and vision statements for the new division, what makes people feel at home in a division, in which areas lita should redouble its focus, and which activities lita may be able to set aside without losing its identity. valuable feedback and thoughts were provided by town hall participants. 
in that feedback, many emphasized the importance of building and retaining a community of library technologists around lita values, programming, resources, advocacy, service activities, and networking opportunities. the merger-related discussion is to resume this spring, and the leadership of lita, alcts, and llama will make every effort to ensure the best future for the three divisions at this time of great flux and change. second, lita is looking into introducing some changes to the lita forum. in the feedback and thoughts gathered at the lita town hall, the lita forum was also mentioned as one of the valuable lita offerings to its members. the origin of the lita forum goes back to lita's first national conference held in baltimore in 1983.2 since then, the lita forum has become a cherished venue for many library technologists, a place where they meet other like-minded people in the field, learn from one another, share ideas and experience, and look for more ways in which technology can be utilized to better serve libraries and library patrons. initially, the steering committee hoped that all three divisions would participate in putting together the lita forum, with a wider range of content encompassing the interests not only of lita members but also of those in alcts and llama, and in a virtual format in order to engage more members who cannot easily travel; the event was to be held some time in spring 2020. at the time this idea was conceived more than a year ago, it was assumed that all preparations for the member vote regarding the merger would have been nearly completed by the time of the midwinter meeting. however, the steering committee unfortunately ran out of time for that preparation. merger planning also took up almost the entirety of the time that the leadership and the staff of the three divisions had available. this resulted in an unfortunate delay in proper forum planning. with the merger conversation on hold at this point and the new timeline for the merger likely to be set back by at least a year, the changed circumstances for the forum planning had to be reviewed. after a lively and thoughtful discussion at the midwinter meeting, the lita board decided that, considering how much work remains to be done regarding merger planning, it may not be practical or feasible to have the next lita forum be the first virtual and joint one. however, there was a lot of interest in and excitement about trying a virtual format since it will allow lita to reach and serve the needs of more lita members than the traditional in-person meeting. it was also pointed out that the virtual format may provide an opportunity for lita to experiment with different and more unconventional conference program formats, which could be a welcome change for lita members. the lita board, however, also acknowledged the value of a physical conference where people get to meet one another in person, which cannot be easily transferred to a virtual conference. if the virtual conference experiment takes place and is successful, lita may hold its forum alternating every year between two different formats, virtual and physical. planning for and running a fully virtual conference at the scale of a multi-day national forum will require additional time and careful consideration since it will be the first time the lita forum planning committee and the lita office attempt this. logistics management is likely to be quite different in a virtual conference.
attendee expectations and the user experience will also differ significantly between a virtual conference and a physical one. as the first step of this investigation, the lita forum planning committee will explore what the ideal lita virtual forum may look like in terms of programming formats and participant experience. the lita office and the finance advisory committee will also look into the financial side of running the lita forum in a virtual format. at this time, it is not yet determined when the next lita forum will be held and whether it will be a virtual or a physical one. once these investigations are completed, however, the lita board should be able to decide on the most appropriate path towards the next lita forum. stay tuned for what exciting changes may be coming to the lita forum. third, i would like to mention that lita issued a statement regarding the incidents of aggressive behavior, racism, and harassment reported at the 2019 ala midwinter meeting.3 along with the statement, the lita board has decided to commit funds to provide an online bystander / allyship training, which we hope will equip lita members with tools that empower active and effective allyship, recognize and undo oppressive behaviors and systems, and promote the practice of cultural humility, thereby collectively increasing our collaborative capacity. the lita statement and the board decision were received positively by many lita members. other ala divisions such as alcts, alsc, asgcla, llama, united, and yalsa have already expressed interest in working together with lita on this, and the lita board is looking into a few options to choose from. more information about the training will be provided soon. lastly, i am thrilled to announce that the lita president's program at the upcoming ala annual conference in washington, d.c., in june will feature meredith broussard, a data journalist and the author of artificial unintelligence: how computers misunderstand the world, as the speaker. in her book, broussard delves into many problems surrounding techno-chauvinism, a mindset of blind optimism about technology and an abundant lack of caution about how new technologies will be used. she further details how this simplistic worldview, which prioritizes building new things and efficient code above social conventions and human interactions, often misinterprets a complex social issue as a technical problem and results in a reckless disregard for public safety and the public good. reviewing the early history of computing and digital technology, broussard observes: "we have a small, elite group of men who tend to overestimate their mathematical abilities, who have systematically excluded women and people of color in favor of machines for centuries, who tend to want to make science fiction real, who have little regard for social convention, who don't believe that social norms or rules apply to them, who have unused piles of government money sitting around, and who have adopted the ideological rhetoric of far-right libertarian anarcho-capitalists. what could possibly go wrong?"4 i invite all of you to come to this program for more insight and a deeper understanding of what recent technology innovations involving artificial intelligence (ai) and big data mean for our everyday life and where they may be headed.
the program information is available in the ala 2019 annual conference scheduler at https://www.eventscribe.com/2019/alaannual/fspopup.asp?mode=presinfo&presentationid=519109.

endnotes

1 the official announcement can be found at the lita blog. see bohyun kim, "update on new division discussions," lita blog, january 26, 2019, https://litablog.org/2019/01/update-on-new-division-discussions/.
2 stephen r. salmon, "lita's first twenty-five years: a brief history," library information technology association (lita), september 28, 2006, http://www.ala.org/lita/about/history/1st25years.
3 "lita's statement in response to incidents at ala midwinter 2019," lita blog, february 4, 2019, https://litablog.org/2019/02/litas-statement-in-response-to-incidents-at-ala-midwinter-2019/.
4 meredith broussard, artificial unintelligence: how computers misunderstand the world (cambridge, massachusetts: the mit press, 2018), p. 85.

lita president's message: a framework for member success
emily morton-owens

emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the assistant university librarian for digital library development & systems at the university of pennsylvania libraries.

this column represents my final venue to reflect on our potential merger with alcts and llama before the vote. after a busy midwinter meeting with lots of intense discussions about the steering committee on organizational effectiveness (scoe)'s recommendations, the divisions, the merger, ala finances, and more, my thoughts keep turning in a particularly wonkish direction: towards our organization. so many of the challenges before us hinge on one particular dilemma. for those of us who are most involved in ala and lita, the organization (our committees, offices, processes, bylaws, etc.) may be familiar and supportive. but for new members looking for a foothold, or library workers who don't see themselves in our association, our organization may look like a barrier. moreover, many of our financial challenges are connected to our organization. the organization must evolve, but we must achieve this without losing what makes us loyal members. while ala and lita have specific audiences of library workers and technologists, we have a lot in common with other membership organizations. one of the responsibilities of the lita vice-president is attendance at a workshop put on by the american society of association executives, where we learn how to steward an organization. representatives from many different groups attended this workshop, where i had a chance to discuss challenges with leaders from medical and manufacturing associations, and i learned that these challenges are often orthogonal to the subject matter at hand. everyone was dealing with the need to balance membership cost and value, how to give members a voice while allowing for agile decision-making, and how to put on events that are great for attendees without becoming the only way to get value from membership. hearkening back even further, i worked as a library-school intern at a library with a long run of german- and french-language serials that i retrospectively cataloged. one batch that has always stuck in my mind is the planning materials for international congresses that were held in the early 20th century by the international societies for horticulture and botany.
these events were massive undertakings held at multi-year intervals, gradually planned by international mail. interested parties would receive a lavish printed prospectus, with registration and travel arrangements starting several years in advance. the most interesting documents pertained to the events planned for the mid to late 1930s in europe. these events were cancelled or fell short of intentions because of pre-world war ii political pressures. the congress schedules did not resume until 1950 or later, with some radical changes—for example, german was no longer used as the language of science, and the geographic distribution of events increased significantly in the later 20th century. when i first encountered this material, i was intrigued by how the war affected science. looking back now, i see a dual case study in organizations weathering a crisis whose magnitude we can only imagine, and then reinventing themselves on the other side. both of these organizations still exist and continue to meet, by the way—and i can't help but feel that reinvention is the key to survival. our organizational framework is a key part of the challenge for both ala and lita. i have no doubt that members remain excited about our key issues for advocacy, our subjects for continuing education, and our opportunities for networking. but we have concerns about how we make those things happen. in lita, for example, continuing education requires a massive effort on the part of both member volunteers and staff to organize. we need to brainstorm relevant topics, recruit qualified instructors, schedule and promote the events, and finally run the sessions, collect feedback, and arrange payment for the instructors. this takes the time of the same people we'd like to have creating newsletters and booking conference speakers. meanwhile, right across the hall at ala headquarters, we have staff from alcts and llama doing the same things. these inefficiencies hit at the heart of our financial problems. at the ala level, scoe has proposed ideas like a single set of dues structures for divisions, and a single set of policies and procedures for all round tables. these changes would reduce the overhead required to operate these groups as unique entities, a financial benefit, while also making it easier for members to afford, join, and move between them, a membership benefit. that framework also offers us an opportunity to improve our associations. members have been asking how the association can act more responsively on issues of diversity, equity, and inclusion—for example, how can we have incident response that is proactive and sensitive to member needs while recognizing the complexities of navigating that space as a member-based organization. this is a chance to live up to our aspirations as a community. the actions lita has taken to extend all forms of participation to members who can only participate remotely/online are a way to make us more accessible to library workers regardless of finances or home circumstances. bylaws and policies may not be the most glamorous part of associations but they are the levers we can employ to change the character of our community. coming back to core, we can observe elements of the plan that are responding to both threats and opportunities. members of alcts, llama, and lita know that financial pressures are a major impetus for the merger effort.
but, in the hope of achieving a positive reinvention, the merger planning steering committee put most of its emphasis on the opportunity side. the diagram of intersecting interests for core's six proposed sections (https://core.ala.org/core-overlap/) is a demonstration of the new frontiers of collaboration that core will offer members. the proposed structure of core retains committees while also offering a more nimble way to instantiate interest groups. moreover, the process of creating core reflects the kind of transparent process we want to see in the future. the steering committee and the communications sub-committee crossed not just the three divisions but also different levels of experience and types of prior participation in the divisions. the communications group answered freeform questions, held twitter amas, and held numerous forums to collect ideas and feelings about the project. zoom meetings and twitter are not new media, but the sustained effort that went into soliciting and responding to feedback through these channels is a new mode for our divisions. the lita board recently issued a statement (https://litablog.org/2020/02/news-regarding-the-future-of-lita-after-the-core-vote/) explaining that if the core vote does not succeed, we don't see a viable financial path forward and will be spending the latter half of 2020 and the beginning of 2021 working toward an orderly dissolution of lita. it is tempting to approach this crossroads from a place of disappointment or fear. we cannot yet say precisely what it will be like to be a member of core. but when i look at the organizational structure core offers us, i feel hopeful about it being a framework in which members will find their home and flourish. the new division includes what we need for a rich member experience coupled with a streamlined structure that makes it easier to be involved in the ways and to the extent that make sense for you. in fifty years, perhaps a future member of core will be writing a letter to their members, looking back at this moment of technological and organizational disruption and reflecting on how we reinvented our organization at the moment it needed it most.

the shared cataloging system of the ohio college library center
frederick g. kilgour, philip l. long, alan l. landgraf, and john a. wyckoff: ohio college library center, columbus, ohio

development and implementation of an off-line catalog card production system and an on-line shared cataloging system are described. in off-line production, average cost per card for 529,893 catalog cards in finished form and alphabetized for filing was 6.57 cents. an account is given of system design and equipment selection for the on-line system. file organization and programs are described, and the on-line cataloging system is discussed. the system is easy to use, efficient, reliable, and cost beneficial.

the ohio college library center (oclc) is a not-for-profit corporation chartered by the state of ohio on 6 july 1967. ohio colleges and universities may become members of the center; forty-nine institutions are participating in 1971/72.
the center may also work with other regional centers that may "become a part of any national electronic network for bibliographic communication." the objectives of oclc are to increase the availability to individual students and faculty of resources in ohio's academic libraries, and at the same time to decrease the rate of rise of library costs per student. the oclc system complies with national and international standards and has been designed to operate as a node in a future national network as well as to attain the more immediate target of providing computer support to ohio academic libraries. the system is based on a central computer with a large, random access, secondary memory, and cathode ray tube terminals which are connected to the central computer by a network of telephone circuits. the large secondary memory contains a file of bibliographic records and indexes to the bibliographic record file. access to this central file from the remote terminals located in member libraries requires fewer than five seconds. oclc will eventually have five on-line subsystems: 1) shared cataloging; 2) serials control; 3) technical processing; 4) remote catalog access and circulation control; and 5) access by subject and title. this paper concentrates on cataloging; the other subsystems are not operational at the present time. figure 1 presents the general file design of the system. the shared cataloging system has been the first on-line subsystem to be activated, and the files and indexes it employs are depicted in figure 1 by the heavy black lines and arrows. as can be seen in the figure, much of the system required for shared cataloging is common with the other four subsystems.

[figure 1. general file design; shared cataloging subsystem in heavy lines.]

the three main goals of shared cataloging are: 1) catalog cards printed to meet varying requirements of members; 2) an on-line union catalog; and 3) a communications system for requesting interlibrary loans. in addition, the bibliographic and location information in the system can be used for other purposes such as book selection and purchasing. the only description of an on-line cataloging system that had appeared in the literature during the development of the oclc system is that of the shawnee mission (kansas) public schools (1). the shawnee mission cataloging system produces uniform cards from a fixed-length, non-marc record. the oclc system uses a variable-length marc record and has great flexibility for production of cards in various formats. there are a number of reports describing off-line catalog card production systems, including systems at the georgia institute of technology (2), the new england library information network (nelinet) (3), and the university of chicago (4). the flexibility of the oclc system distinguishes it from these three systems as well.

catalog card production-off-line

an off-line catalog card production system based on a file of marc ii records was activated a year before the on-line system (5).
oclc supplied member libraries with request cards (punch cards prepunched with symbols for each holding library within an institution). for each title for which catalog cards were needed, members transcribed library of congress (lc) card numbers onto a request card. members sent batches of cards to oclc at least once a week. at oclc, the lc card numbers were keypunched into the cards and new requests were combined with unfilled requests to be searched against the marc ii file. by the spring of 1971, over 70 percent of titles requested were found the first time they were searched. the selected marc ii records were then submitted to a formatting program that produced print images on magnetic tape for all cards required by a member library. the number of cards to be printed was determined by the number of tracings on the catalog record and the number of catalogs into which cards were to go, including a regional union catalog (the cleveland regional union catalog) and the national union catalog. individual cards were formatted according to options originally selected by the member library. these options included: 1) presence or absence of tracings and holdings information on each of nine different types of cards; 2) three different indentions for added entries and subject headings; 3) a choice of upper-case or upper- and lower-case characters for each type of added entry and subject heading; and 4) many formats for call numbers. oclc returned cards to members in finished form, alphabetized within packs for filing in specific local catalogs. the primary objective of off-line operation was the production of catalog cards at a lower cost than manual methods in oclc member libraries. early activation of off-line catalog card production did reduce costs and gave some members an opportunity to take advantage of normal staff turnover by not filling vacated positions in anticipation of further savings after activation of the on-line system. other objectives of off-line operation were the automated simulation of on-line activity in member libraries and development and implementation of catalog card production in preparation for card production in an on-line operation. the number of catalog card variations required by members, even after members had reviewed and accepted detailed designs of card products, proved to be higher than anticipated. more than one man-year was expended after activation of the off-line system in further development and implementation to take care of the formats and card dissemination variations requested by specific libraries. the one year advance start on catalog production made possible by using marc ii records in the off-line mode proved to be a far greater blessing than anticipated, for it would have been literally impossible to have activated on-line operation and catalog card production simultaneously. a major goal of oclc card production is elimination of uniformity required by standardized procedures. the oclc goal is to facilitate cooperative cataloging without imposing on the cooperators. the cost to attain this goal is slight, for although there is a single expense to establish a decision point in a computer program, the cost of selection among three or thirty alternatives during program execution is infinitesimal. design of catalog cards and format options began four months before off-line activities.
two general meetings of the oclc membership were held at which card formats were reviewed and agreed upon in a general sense. next, the oclc staff published a description of catalog card production and procedures for participation (6). this publication was reviewed by the membership and format variations were reported for inclusion in the procedure. members reported few variations at this time, but when implementation for individual members was undertaken, it was necessary to build many additional options into the computer programs. to assist the oclc staff in defining options for off-line catalog products and on-line procedures, an advisory committee on cataloging was established. this committee met several times and provided much needed guidance and counsel. the catalog card format options that members could select were extensive. for example, although the position of the call number was fixed in the upper left-hand corner of the card, there were 24 basic formats for lc call numbers, and libraries using the dewey decimal classification could format their call numbers as they wished. in general, the greatest number of format options are associated with call numbers, probably because there has never been a standard procedure for call number construction.

programs

because designing, writing, coding, and debugging of catalog card production programs can cost tens of thousands of dollars, oclc sought existing card production programs that could run on computers at ohio state university, which is the generous host of the ohio college library center. only two programs were located that could both produce cards in the manner required by oclc and run on osu computers. card production costs were not available for one of the programs, but because analysis suggested that the design of the program would create very high card costs, this program was not selected. the other program had been written and used at the yale university library, and although the card production costs were high, it was known that changes could be made to increase efficiency. thus, arrangements were made to obtain and run the yale programs at osu. members were free to choose a variety of format options and submitted on a catalog profile questionnaire (figure 2) their specifications for each catalog. holdings information and tracings could be printed on any or all of nine types of cards: 1) shelf list; 2) main entry; 3) topical subject; 4) name as subject; 5) geographic subject; 6) personal and corporate added entries; 7) title added entry; 8) author-type series added entry; and 9) title-type series added entry. subject headings and added entries could have top-of-card or bottom-of-card placement and could be printed in all upper-case or in upper- and lower-case characters. any type of subject heading and added entry could begin at the left edge of the card or at the first, second, or third indention. other options are described in the manual for oclc catalog card production (5). the data received on catalog profile questionnaires were transferred to punch cards, and a computer program written in snobol iv embedded the information in the form of a pack definition table (pdt) in one of the principal catalog production programs, named convert (cnvt). each pdt defined the cards to go into the catalogs of one holding library, a holding library being a collection with its own catalog.
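to make the idea of a pack definition table more concrete, here is a small illustrative sketch in python. the field names and data are invented for clarity; the real pdt was a table embedded in the cnvt assembler program, so this is only a reading of how per-catalog options of the kind listed above, combined with the tracings on a record, determine how many card images a title generates.

    # hypothetical illustration of a pack definition: which card types one
    # receiving catalog gets and how they are formatted. field names are invented.
    pack_definition = {
        "holding_library": "main library",
        "receiving_catalog": "public catalog",
        "card_types": ["main entry", "topical subject", "added entry", "title added entry"],
        "print_tracings_on": {"shelf list", "main entry"},
        "indention_for_added_entries": 2,        # 0 = left edge, 1-3 = indentions
        "added_entry_case": "upper and lower",   # or "upper"
    }

    def card_images_for_title(subject_headings: int, added_entries: int, packs: list) -> int:
        """count card images for one title: one card per applicable card type per pack,
        with one card per subject heading or added entry where those types are selected."""
        total = 0
        for pack in packs:
            for card_type in pack["card_types"]:
                if card_type.endswith("subject"):
                    total += subject_headings
                elif card_type == "added entry":
                    total += added_entries
                else:
                    total += 1          # shelf list, main entry, title added entry, etc.
        return total

    shelf_pack = {**pack_definition, "receiving_catalog": "shelf list", "card_types": ["shelf list"]}
    print(card_images_for_title(subject_headings=2, added_entries=1,
                                packs=[pack_definition, shelf_pack]))   # -> 6

a title with two subject headings and one added entry sent to these two hypothetical packs yields six cards, in line with the average of about six cards per title reported later in this article for one university library.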
the first major program in the processing sequence was prepros, which was written in ibm 360 basic assembler language (bal) and run on an ibm 360/75. prepros converted records from the weekly marc ii tapes to an oclc internal processing format, including conversion of marc ii characters from ascii to ebcdic code. this program also parsed lc call numbers and partially formatted them. it also checked for end-of-field and end-of-record characters and verified the length of record. finally, it wrote the output records in lc card number sequence into huge variable format blocks of 20,644 characters. the large blocks reduced computer costs since the pricing algorithm employed on the ibm 360/75 imposed a charge for each physical read and write operation. the magnetic tape output weekly by prepros was then submitted to cnvt together with the old master file of bibliographic records in lc card number order and a file of request cards that had been sorted in lc card number order. cnvt merged the records on the weekly tape with the master file and then matched the requests by lc card number. when a match was obtained, cnvt deleted some fields from the bibliographic record and formatted the call number according to the specifications of the library that had originated the request. it then wrote the modified record and associated pdt's onto an output tape in external ibm 7094 binary-coded-decimal (bcd) character code with the record format converted to that of the yale bibliographic system. the second principal product of cnvt was the new master tape of bibliographic records that would become the old master for the next week's run. cnvt also punched out a card bearing the lc card number for each request card for which there was a match. these punch cards were used to withdraw cards from the request card file so that they would not be submitted again. cnvt was first run on an ibm 360/50. the tape file of modified records and pdt's was then submitted to expand, a modified yale program written in mad and run on an ibm 7094.

[figure 2. catalog profile questionnaire. the form asks the member to define the pack for a receiving catalog, to name the holding library or collection whose cards the pack contains, to name the receiving catalog into which the pack will go, and, if that catalog is not in the holding library, to give the stamp to appear above the call number.]

by combining the number of tracings and pdt requirements, expand developed a card image for each catalog card required by the requesting library. it also prepared a sort tag for each image so that the image could be subsequently sorted by library into packs and alphabetized within each pack. expand essentially did the formatting of catalog cards except for the complex lc call number formatting carried out by cnvt. the file of card images was passed to a program named build print tape (bldpt) written in bal and run on the ibm 360/75. bldpt first converted the external ibm 7094 bcd characters to ebcdic.
next bldpt sorted the images, and finally, it arranged the images on a single tape to allow printing on continuous, two-up catalog card forms; the first half of the sorted file was printed on the left-hand cards and the second half on the right. the print program was also written in bal but run on an ibm 360/50. it was designed so that either the entire file or a segment as small as four cards could be printed; the latter feature was of greatest use in reprinting cards that for one of several reasons were not satisfactorily printed during the first run. cards were printed six lines to an inch, and the print train used was a modified version of the train designed by the university of chicago, which in turn was a modified version of the ibm tn train. the printer attached to the ibm 360/50 was an ibm 1403 n1 printer. this printer appears to be superior to any other high-speed printer currently available, but to obtain a product of high quality, it was necessary to fine-tune the printer, to use a mylar ribbon from which the ink does not flake off, and to experiment with various mechanical settings to determine the best setting for tension on the card forms and for forms thickness. above all, patience in large amounts was required during initial weeks when it seemed as though a messy appearance would never be eliminated. oclc off-line catalog card production programs were written in assembler language and higher level languages. use of higher level languages for character manipulation incurs unnecessarily high costs. therefore, for a large production system like oclc, it is absolutely required that processing programs and subroutines that manipulate all characters, character by character, be written in an assembler language to obtain efficient programs that run at low cost. programs that do not manipulate characters, such as the oclc program for embedding pdt's in cnvt, may well be written in a higher level language.

materials and equipment-a summary

off-line catalog production was based on availability of marc ii records on magnetic tapes disseminated weekly by the library of congress. without the marc ii tapes, the off-line procedure could not have operated. each week, the new marc ii records were added to the previous cumulated master file, also on magnetic tape, and previously unfilled and new requests were run against the updated file. osu computers employed were an ibm 360/75, an ibm 360/50, an ibm 7094, and an ibm 1620. the run procedure was complex and therefore somewhat inefficient, but this inefficiency was traded off against a predictably high expense to write a new card formatting program. members submitted a request for card production on a punch card on which the member had written an lc card number. members could specify a recycling period of from one to thirty-six weeks for running their request cards against the marc ii file before unfulfilled requests would be returned. in general, request cards bore lc card numbers for that section of the marc ii file that was complete; at first, the file was inclusive for only "7" series numbers, but in early 1971 the recon file for "69" numbers was added. request cards often numbered several thousand a week. catalog card forms are the now-familiar two-up, continuous forms with tractor holes along each side for mechanical driving. the card stock is permalife, one of the longest-lived paper stocks available.
a thin slit of about one thirty-second of an inch in height converts each three-inch vertical section of card stock to 75 mm. the lowest price paid in a lot of a half million cards has been $8.065 per thousand. after having been printed, the card forms are trimmed on a modified uarco forms trimmer, model number 1721-1. this trimmer makes four continuous cuts in the forms and produces cards with horizontal dimensions of 125 mm. cards are stacked in their original order as printed and are therefore in filing order. the trimmer operates at quoted speeds of 115 and 170 feet per minute, or 920 and 1,360 cards per minute. measurements of speeds of operations confirmed these ratings.

results

the off-line catalog production system produced 529,893 catalog cards from july 1970 through august 1971 at an average cost of 6.57 cents per card. this cost includes over twenty separate cost elements plus a three-quarter cent charge for overhead. the firm of haskins & sells, certified public accountants, reviewed the costing procedures that oclc employs, found that all direct costs were being included, and recommended the three-quarter cent overhead charge. the number of extension cards varies from library to library depending almost entirely on the types of cards on which libraries have elected to print tracings. however, one university library with a half-dozen department libraries and requiring tracings on only shelf list and main entry cards averages approximately six cards per title. cataloging using the oclc off-line system results in a decrease of staff requirements, and some libraries that used the system during most of the year found that they needed fewer staff in cataloging. reduction of staff by taking advantage of normal staff turnover facilitated financial preparation for the oclc on-line system in these libraries.

evaluation

despite the obvious inefficiencies generated by running production computer programs on four different computers in two different locations, and despite inefficiencies in the programs themselves, computer costs to process marc ii tapes and to format catalog cards, but not to print them, were 2.27 cents per card. as will be shown later, newer and more efficient programs have halved this cost, but even at 2.27 cents per card for formatting and .33 cents per card for printing, the cost of oclc off-line card production is less than half the cost of more traditional card production methods (7). two features originally designed into the system were never implemented, somewhat diminishing the usefulness of the system for some libraries. one of the unimplemented features was a technique for deleting, changing, or adding a field to a marc record (this capability exists in the on-line system). absence of this procedure meant that libraries had to accept lc cataloging without modification except to call numbers. the second missing feature was the ability to print multiple holding locations on cards (this capability also exists in the on-line system), although it was possible to print multiple holdings in one location. this deficiency limited the usefulness of the system for large libraries processing duplicates into two or more collections. both of these features could have been activated, but shortage of available time prior to activation of the on-line system prevented their implementation. figure 3 shows the high quality of the catalog cards produced.
subsequent to attainment of this level of quality, there have been no complaints from members except in cases where a piece of chaff from the card forms went through the printer and caused omission of characters. oclc continues to vary the design of its continuous forms to achieve completely chaff-free stock. the shortest possible time in which cards could be received by the member library after submitting a request card was ten days, but it is doubtful that this response time was often achieved. the minimum average response time for the three-quarters of requests for which a marc record was located on the first run was two weeks. delays at a computer center or incorrect submission of a run could extend this delay to three and four weeks, and unfortunately such delays were cumulative for subsequent requests until the "weekly" runs were made sufficiently more often than weekly to catch up. if another delay occurred during a catch-up period, the response time further degraded. during the fourteen months of operation, there were two serious delays. the amount of normal turnover that occurred in oclc libraries during the fourteen months and that was taken advantage of by not filling positions was too small to reduce the financial burden incurred in starting up the on-line system. a few libraries demonstrated that it was possible to take advantage of such attrition. however, 20 percent of the libraries did not participate in the on-line system, and perhaps half of those who did participate were uncertain as to whether the on-line cataloging system would operate or would operate at a saving. when feasibility of on-line shared cataloging has been substantiated and other centers begin to implement similar systems, it should be possible to activate off-line catalog production sufficiently in advance of on-line implementation to enable participants to take adequate advantage of normal attrition to minimize, or nearly eliminate, additional expenditures. experience such as that of oclc will enable new centers to calculate the number of months of off-line production required to reduce salary expenditures by an amount needed to finance the on-line system.

shared cataloging-on-line

the cataloging objectives of the on-line shared cataloging system are to supply a cataloger with cataloging information when and where the cataloger needs the information and to reduce the per-unit cost of cataloging. catalog products of the system are the same as those of the off-line system: catalog cards in final form, alphabetized for filing in specific catalogs; the on-line system is not limited to marc ii records but also allows cataloging input by member libraries.
(the harbrace series in business and econo•i cso) dc 430.5 • z9 c34 oako intersectoral capital ~tows in the econoaic dewelopaent o~ taiwan, 1 89 51960. lee, tena-hui • intersectoral capital ~lows in th e econoaic developaent o~ taiwan, 1 8&~1960. ithaca (n.y.] cornell univ ers it y press [ 1971 1 xxt 197 p• 23 cao an out&rowth o~ the author's the~is , cornell oniwersit:y, 1968. bibliography: p• (183)-1 8 1. 0 a"rnt 76-1 59031 ( fl(;uh e fig. 3. computer-produced catalog cards. hed uced 25%) dates all cataloging done in modern european alphabets, builds a union catalog of holdings in oclc member libraries as cataloging is done. one library, wright state university, is converting its entire catalog to machinereadable form in the oclc on-line catalog. the third major goal is a communications system for transacting interlibrary loans. system design and equipment selection figure 4 depicts the basic design of computer and communication comshared cataloging systemf kilgour, et al. 167 ponents for th e comprehensive system comprised of the five subsystems described in th e introduction. the machine system for shared cataloging was designed to be a subsystem of the total system so that subsequent modules could be added with minimal dismption . similarly, the logical d esign of the shared cataloging subsystem was constructed so that the modules of shared cataloging would be common to the remaining file requirements as shown in figure 1. design of the on-line shared cataloging system began with a redefinition of the catalog products of off-line catalog production ( 5) . in this exercise, the advisory committee on cataloging, comprised of members from seven libraries, contributed valuable assistance. the committee was also most helpful in designing the formats of displays to appear on terminal screens. important decisions in the design of the computer, communications, and terminal systems were those involving mass storage devices and terminals. random access storage was the only type feasible for achieving the objective of supplying a user with bibliographic information when and where he needed it. hence, random access memory devices were selected for the comprehensive system and ipso facto for shared cataloging. data channel system file catalog f1 l e data channe i memory drive contr ol data channe l ----connect1on made 1f cpu #i malfunct ions ·connect1on made if cpu #2 ma l funct1ons fig. 4. computer and communication system. 168 ]oum.al of library automation vol. 5/ 3 september, 1972 the cathode ray tube (crt) type of terminal was selected primarily because of its speed and ease of use by a cataloger. crt terminals are far more flexible in operation than are typewriter terminals from the viewpoint of both the user and machine system designer. for these reasons, crt terminals can enhance the amount of work done by the system as a whole. it was originally planned to select a computer without the assistance of computerized simulation, but in the course of time, it became clear that it was impossible to cope with the interaction among the large number of variable computer characteristics without computerized simulation. therefore, a contract was let to comress, a firm well known for its work in computer simulation. ten computer manufacturers made proposals to oclc for equipment to operate the five subsystems at peak loading (an average five requests per second over the period of an hour ) . 
all ten proposed computer systems failed because simulation revealed inefficiencies in their operating systems for oclc requirements. oclc and comress staff then proposed a modification in operating systems, which the manufacturers accepted. the next series of trials revealed that more than half of the computers or secondary memory files would have to be utilized over 100 percent of the time to process the projected traffic. as a result of these findings , one computer manufacturer withdrew its proposal, and five others changed proposals by upgrading their systems. on the final simulation runs, the percent of simulated computer utilization ranged from 19.70 percent to 114.31 percent. a subsequent investigation of predictable delays due to queuing in such a system showed that unacceptable delays could arise if computer utilization rose above 30 percent at peak traffic. three manufacturers proposed computer systems that were under 30 percent utilization and, for these, a trade-off study was made that included such characteristics as cost, reliability, time to install the applications system, and simplicity of program design. the findings of the simulation and trade-off studies provided the basis of the decision to select a xerox data systems sigma 5 computer. major components of the oclc sigma 5 are the central processing unit (cpu), three banks of core memory with a total capacity of 48 thousand 32-bit words or 192 thousand 8-bit bytes, a high speed disk secondary memory, 10 disk-pack spindles with total capacity of 250,000,000 bytes plus two spare spindles, two magnetic tape drives, two multiplexor channels, five communications controllers, a card reader, card punch, and printer. the character code is ebcdic. figure 5 illustrates the sigma 5 configuration at oclc. in this configuration, the burden of operating communications does not fall on the cpu so that there is no requirement for "cycle stealing" that slows processing by a cpu. the lease cost to oclc of the equipment represented in figure 5 is $16,317 monthly. the listed monthly lease of the equipment is $21,421 from which an educational discount of 10 percent is deducted. (the remaining difference is due to a rebate because the original order included secondary shared cataloging system j kilgour, et al. 169 memory units that xds was to obtain from another manufacturer who proved incapable of supplying units that fulfilled specifications. hence, xds was forced to supply other memory units having a higher list price but has done so at a cost per bit of the units originally ordered.) the printer furnished with the sigma 5 does not provide the high-quality printing required for library use. at the present time, oclc prints catalog cards on an osu ibm 1403-n1 printer that without doubt provides the highest quality printing currently available from a line printer. however, oclc is designing an interface between a sigma 5 and an ibm 1403 memory bonk no. i --dolo ---control memory bonk no. 2 memory bonk no. 3 i i i --------------~ l i 1 r----------r sigma 5 cpu multiplexor opera! or's console cord punch magnetic tope units cord reader dolo bose disk bonk no. i doto bose disk bonk no. 2 _______ !j bus-shor in g 1---+----, mull iplexor fig. 5. xds sigma 5 configuration. 170 journal of library automation vol. 5/3 september, 1972 printer; xds is also developing a new type of printer that will provide high quality output. when the sigma 5 can produce quality printing, it will be fully qualified to be used for nodes in national networks. 
as has already been stated, the crt-type terminal was selected because of its ease of use. moreover, the simulation study confirmed that crt terminals would place far less burden on the central computer and therefore, for the oclc system, would make possible selection of a less expensive computer than would be required to drive typewriter terminals. although typewriter terminals cost less, the total cost could be higher for a system employing typewriter terminals than for one using crt's because of greater central computer expense. library requirements for a crt terminal are: 1) that the terminals have the capability of displaying upper- and lower-case characters and diacritical marks; 2) that the image on the screen be highly legible and visible; 3) that the terminal possess a large repertoire of editing capabilities; and 4) that interaction with the central computer and files be simple and swift. system requirements were: 1) that the terminal accept and generate ascii code; 2) that it make minimal demands for message transmissions from and to the central site; 3) that it have the capability of operating with at least a score of other terminals on the same dedicated line; and 4) that its cost, including service at remote sites, be about $150 per month. data were collected on crt's produced by fifteen manufacturers, and three machines were identified as being prime candidates for selection. oclc carried out a trade-off study in which thirty-three characteristics were assessed for these three machines. one of the thirty-three (reliability) could not be judged for any of the three because none had yet reached the market. for the remaining characteristics, the irascope lte excelled or equaled the other two terminals for twenty-eight characteristics, including all nineteen characteristics of importance to the oclc user. moreover, the irascope was outstandingly superior in its ability to perform continuous insertion of characters, line wrap-around during insertion of characters, repositioning of characters so that each line ends in a complete word, and full use of its memory. however, the irascope was the most expensive at $175 a month, as compared with $153 and $166. nevertheless, the irascope was selected because of its obvious superiority. pilot operation by library staffs has not produced complaints concerning visibility or operability; complaints during pilot operation have sprung from failures caused by a variety of bugs in telephone systems and a couple of bugs in the terminals that were subsequently exterminated. the number of terminals needed by a member library for shared cataloging was calculated on the assumption that six titles could be processed per terminal-hour. it was also assumed that a library might have only one staff member to use the terminal throughout the year. it was further assumed that as much as three months of the terminal operator's time would be lost to vacations, sick leave, and breaks. at the rate of six titles per terminal-hour and with 2,000 working hours in a year, 12,000 titles would be processed annually assuming full-time use. since only nine months was assumed to be available, it was estimated that 9,000 titles would be processed at each terminal. in large libraries where there would be more than one staff member to operate a terminal, there would be three months of time available to do input cataloging, and since only a few libraries will be obtaining less than 75 percent of cataloging from the central system, it appears that a formula of one terminal for every 9,000 titles or fraction thereof cataloged annually would give each library sufficient terminal-hours. in actual operation, operators have been able to work at twice the assumed rate of six titles per terminal-hour, so there is reason to believe that these guidelines will provide adequate terminal capability.
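expressed as a small calculation (illustrative only; the figures simply restate the assumptions above), the terminal guideline works out as follows.

    # one terminal per 9,000 titles (or fraction thereof) cataloged annually,
    # derived from 6 titles per terminal-hour and roughly 1,500 productive
    # terminal-hours (nine months of a 2,000-hour year) per terminal.
    import math

    TITLES_PER_TERMINAL_HOUR = 6
    PRODUCTIVE_HOURS_PER_YEAR = 1500
    TITLES_PER_TERMINAL_YEAR = TITLES_PER_TERMINAL_HOUR * PRODUCTIVE_HOURS_PER_YEAR  # 9,000

    def terminals_needed(annual_titles_cataloged: int) -> int:
        return math.ceil(annual_titles_cataloged / TITLES_PER_TERMINAL_YEAR)

    print(terminals_needed(25000))   # -> 3 terminals for a library cataloging 25,000 titles a year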
file organization

the primary data that will enter the total system are bibliographic records, and since the system is being designed to conform to standards, the national standard for bibliographic interchange on magnetic tape (8) has been complied with in file design. in other words, the system can produce marc records from records in the oclc file format; more specifically, the system can regenerate marc ii records from oclc records derived originally from marc ii records, although an oclc record contains only 78 percent of the number of characters in the original marc ii record. similarly, the system can generate marc ii records from original cataloging input by member libraries. the simulation study clearly showed that bibliographic data would have to be accessed in the shortest possible time if the system were to avoid generating frustrating delays at the terminal. imitation of library manual files or of standard computer techniques for file searching would not provide sufficient efficiency. oclc, therefore, set about developing a file organization and an access method that would take advantage of the computation speeds of computers. oclc research on access methods has produced several reports (9,10,11) and has developed a technique for deriving truncated search keys that is efficient for retrieval of single entries from large files. these findings have been employed in the present system, which contained over 600,000 catalog records in april 1973, arranged in a sequential file on disks and indexed by a library of congress card-number index, an author-title index, and a title index. the research program on access methods did not, however, investigate methods for storing and retrieving records. research on file organization included experiments directed toward development of a file organization that would minimize processing time for retrieval of entries or for the discovery that an entry is not in the file. since the oclc system is designed for on-line entry of data into the data base, it was not possible to consider a physically sequential file for the index files.
this research also produced two powerful mathematical tools for predicting retrieval behavior of such files, and a design technique for optimizing record blocking in such files so that, on the average, only one to two physical accesses to the file storage device are needed to retrieve the desired information. the files displayed in figure 1 are constructed by a single file-building program designed so that additional modules can be embedded in the program. the program accepts a bibliographic record, assigns an address for it in the main sequential file, and places the record at that address. having determined the bibliographic record address, the program next derives the author-title search key and constructs an author-title index file entry which contains the pointer to the bibliographic record. then the program produces an lc card number index entry and a title index entry, each of which contains the same pointer to the bibliographic record. when a bibliographic record is used for catalog card production, an entry is made in the holdings file. when the first holdings entry is made for a bibliographic record, a pointer to the holdings entry is placed in that record; the pointer to each subsequent holdings entry is placed in the previous holdings entry. an entry is made at the same time in the call number index containing a pointer to the holdings entry. this file organization operates with efficiency and economy. the files containing the large bibliographic records and their associated holdings information are sequential, and hence, are highly economical in disk space. the technique used ensures that only a low percentage of available disk area need be reserved for growth of these large sequential files. disk units can be added as needed. each fixed-length record in the scatter-store files is less than 3 percent of the size of an average bibliographic record, and since 25 percent to 50 percent of these files are unoccupied, the empty disk area is small because of the small record lengths. sequential files the bibliographic record file and holdings file are sequential files, the holdings file being a logical extension of the bibliographic record file. a record is loaded into a free position made available by deletion of a record or into the position following the last record. whenever a new version of a shared cataloging system/kilgour, et al. 173 record updates the version already in the file, the new record is placed in the same location as the old if it will fit; otherwise, it is placed at the end of the file and pointers in the indexes are changed. there is a third, small sequential file containing unique notes for specific copies, dash entries, and extra added entries. each bibliographic record contains the information in a marc ii record. each record also contains a 128-bit subrecord capable of listing up to 128 institutions that could hold the item described by the record. at the present time, only 49 of the 128 bits are used since there are 49 institutions participating in oclc. the record also includes pointers to entries in index files, so that the data base may be readily updated, and a pointer to the beginning of the list of holdings for the record. in addition, each record has a small directory for the construction of truncated author-title-date entries, which are displayed to allow a user to make a choice whenever a search key indexes two or more records. 
although each bibliographic record includes all information in a standard marc ii record, records in the bibliographic record file have been reduced to 78 percent of the size of the communication record, largely by reducing redundancy in structural information. oclc intends to compress bibliographic records further by reducing redundancy in text, employing compression techniques similar to those described in the literature (13,14). the holdings file contains a string of holdings records for each bibliographic record; individual records are chained with pointers. information in each record includes the identity of the holding institution and the holding library within the institution, a list of each physical item of multiple or partial holdings, the call number, and pointers to the next record in the chain and to the call number index. the last record in the chain also has a back-pointer to the associated bibliographic record. whenever there is a unique note, dash entry, or extra added entry coupled to a holding, that holding has a pointer to a location in the third sequential file in which the note or entry resides. index files indexes include an author-title index, a title index, and an lc card number index. research and development are under way leading to implementation of an author and added author index and a call number index. a class number index will be developed and implemented in the future. with the exception of the class number index, which by its nature is required to be a sequentially accessible file, the oclc indexes are scatter storage files. the construction of and access to a scatter storage file involves the calculation of a home address for the record and the resolution of the collisions that occur when two or more records have the same home address. the calculation of a home address comprises derivation of a search key from the record to be stored or retrieved and the hashing or randomizing of the key to obtain an integer, relative record address that is converted to a storage home address. the findings of oclc research on search keys have been reported (9,10,11). the hashing procedure employs a pseudo-random number generator of the multiplicative type: home address = rem(k · x_s / m), where k is the multiplier 65539, x_s is the binary numerical value of the search key, and m is the modulus, which is set equal to the size of the index file; 'rem' denotes that only the remainder of the division on the right-hand side is used. philip l. long and his associates have shown that the efficiency of a scatter storage file is rapidly degraded when the loading of the file exceeds 75 percent (12); therefore, oclc initially loads files at 50 percent of physical capacity. hence, the modulus is chosen to be twice the size of the initial number of records to be loaded. when 75 percent occupancy is reached, a new modulus is chosen and the file is regenerated. collisions are resolved using the quadratic residue search method proposed by a. c. day (15) and shown to be efficient (12). in this method, a new location is calculated when the home address is full; the first new location has the value (home address + 2), the second (home address + 6), the third (home address + 12), and so on until an empty location is found if a record is being placed in the file, or the end of the entry chain is found if records are being retrieved.
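the hashing and collision-handling scheme just described can be sketched as follows. this is an illustration only, assuming a straightforward key-to-integer conversion; the constant 65539 and the +2, +6, +12, ... probe offsets come from the description above, while the function names and everything else are our own, not the oclc assembly-language routines.

```python
# illustrative sketch of the multiplicative hash and of day's quadratic
# residue probing as described above; not the oclc implementation.

K = 65539  # multiplier quoted in the text

def home_address(search_key, m):
    """home address = rem(k * x_s / m), with x_s the numeric value of the key."""
    x = int.from_bytes(search_key.encode("ascii"), "big")  # assumed conversion
    return (K * x) % m

def probe_sequence(home, m):
    """yield home, home + 2, home + 6, home + 12, ... modulo the file size m."""
    offset, step = 0, 2
    for _ in range(m):             # bounded for the sketch; the real file is
        yield (home + offset) % m  # regenerated before it ever fills
        offset += step
        step += 2

def store(table, key, pointer):
    """place an index entry at the first empty location on the probe path."""
    m = len(table)
    for slot in probe_sequence(home_address(key, m), m):
        if table[slot] is None:
            table[slot] = (key, pointer)
            return slot
    raise RuntimeError("file full; in practice it is rebuilt at 75 percent occupancy")

def lookup(table, key):
    """collect every pointer whose entry matches the key; stop at an empty
    slot, which marks the end of the entry chain."""
    m, matches = len(table), []
    for slot in probe_sequence(home_address(key, m), m):
        entry = table[slot]
        if entry is None:
            break
        if entry[0] == key:
            matches.append(entry[1])
    return matches
```

because the loading is kept between 50 and 75 percent, the entry chains stay short, which is what keeps both hits and misses to a small number of probes.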
when the file size m is a prime having the form 4n + 3, where n is an integer, the entire file may be examined by 1n searches. retrieval techniques the retrieval of a record or records from the oclc data base is achieved in fractions of a second when a single request is put to the file, and rarely exceeds a second when queuing delays are introduced by simultaneous operation of upwards of 50 terminals. response time at the terminal is greater than these figures because of the low communication line data rate, but terminal response time rarely exceeds five seconds. figure 6 shows the map of a record in the author-title index file and the title file. in the author-title file, the search key is a 3,3 key with the first trigram being the first three characters of the author entry and the second being the first three characters of the first word of the title that is not an english article (9). for example, "str,cha" is the search key for b. h. streeter's the chained library. however, any or all of the characters in the trigrams may be all in lower case. the author-title index also indexes title-only entries, but the title index provides a more efficient access to this type of entry. the pointer in the record map in figure 6 is the address of the bibliographic record from which the search key was d erived. the entry chain indicator bit is set to 0 (zero) if there is another record in the entry chain and to 1 if the record is last in the chain. when this bit is 0, the search skips to the next record as calculated by day's skip algorithm. the shared cataloging systemjkilgour, et al. 175 bibliographic record presence indicator bit is set to 0 (zero) to indicate that the bibliographic record associated with this index entry has been deleted; it is set to 1 to indicate that the bibliographic record is present. an author-title search of the data base is initiated by transmission of a 3,3 key from a terminal. a message parser analyzes the message and identifies it as a 3,3 author-title search key by the presence of the comma and by there not being more than three characters on either side of that comma. next, the hashing algorithm calculates the home address and the location is checked for the presence of a record. if no record is present, a message is sent to the terminal stating that there is no entry for the key submitted and suggesting other action to be taken. if a record is present and its key matches the key submitted and if the entry-chain indicator bit signifies that the record at the home address is the only record in the chain, the bibliographic record which matches the key submitted is displayed on the terminal screen. if the entry-chain bit signifies that there are additional records in the chain, those records are located by use of the skip algorithm. if more than one record possesses the same key as that submitted, truncated author-titledate entries derived from the matching bibliographic records are displayed with consecutive numbering on the terminal screen. the user then indicates by number the entry containing information pertaining to the desired work, and the program displays the full bibliographic record. the title-index record has the same map as the author-title record and is depicted in figure 6. the title index is also constructed and searched in entry chain indicator bit 4 bytes bibliographic record pointer nometitle search key bibliographic record presence indicator bit bibliographic record pointer title search key fig. 6. author-1'itte and title index records. 
8 bytes 176 ]ou,-nal of library automation vol. 5/ 3 september, 1972 the same manner as the author-title index. the title search key ( 3,1,1,1) consists of the first three characters of the first word of the title that is not an english article plus the initial character of each of the next three words. commas separate the characters derived from each word. the title search key is "cha,l," for b. h. streeter's the chained libmry, the three commas signifying that the message is a title search key. the bibliographic record pointer and the two indicator bits have the same function as in the authortitle record. figure 7 exhibits the map for a record in the lc card number index. the three left-most bytes in the lc card number section contain an alphabetic prefix to a number where this is present, or, more usually, three blanks when there is no alphabetic prefix. similarly the right-most byte contains a supplement number or is blank. the middle four bytes contain eight digits packed two digits to a byte after the digits to the right of the dash have been, when necessary, left-filled with zeroes to a total of six digits. the dash is then discarded. for example, lc card number 68-54216 would be 68054216 before being packed. the pointer and the two indicator bits have the same function as in the author-title index record. an lc card number search is started with the transmission of an lc card number as the request. the parser identifies the message as an lc card number search by determining that there is a dash in the string of characters and that there are numeric characters in the two positions immediately to the left of the dash. the remainder of the search procedure duplicates that for the author-title index. on-line programs as is the case with all routinely used oclc programs, the on-line programs are written in assembly language to achieve the utmost efficiency in processing. in addition, every effort has been made to design programs to run in the fastest possible time. in other words, one of the main goals of the oclc on-line operation is lowest possible cost. the simulation study had shown that it was necessary to modify the operating system of the xds sigma 5 so that the work area of the operating system would be identical with that of the applications programs. the xds real-time batch monitor, which is one of the operating systems furnished by xds for the sigma 5, has been extensively altered, and one of the alterations is the change to a single work area. another major change to the operating system was building into it the capability for multiprogramming. at the present time, the on-line foreground of the system operates two tasks in that two polling sequences are running simultaneously, and the background runs batch jobs at the same time. this new monitor is called the on-line bibliographic monitor ( obm). an extension of obm is named motherhood (mh); mh supervises the operation of the on-line programs. mh also keeps track of the activities of these programs and compiles statistics of these activities. in addition, mh shared cataloging systemjkilgovr, et al. 177 contains some utility programs such as the disk and terminal 1/0 routines. the principal on-line application program is catalog (cat); its functions are described in detail in the subsequent sections entitled cataloging with existing bibliographic information and input cataloging. in general, cat accepts requests from terminals, parses them to identify the type of request, and then takes appropriate action. 
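the key-derivation and request-classification rules spelled out above (the 3,3 author-title key, the 3,1,1,1 title key, and the lc card number normalization) can be pulled together in one sketch. the python below is an illustration of those rules only; the article list, the function names, and the handling of short titles are assumptions, not the oclc parser.

```python
# illustrative sketch of the search-key and request-parsing rules described
# in the text; not the oclc implementation.

ARTICLES = {"the", "a", "an"}  # assumed list of english articles to skip

def author_title_key(author, title):
    """3,3 key: first 3 characters of the author entry, a comma, then the
    first 3 characters of the first title word that is not an english article."""
    words = [w for w in title.lower().split() if w not in ARTICLES]
    return author.lower()[:3] + "," + words[0][:3]

def title_key(title):
    """3,1,1,1 key: 3 characters of the first non-article word, then the
    initial character of each of the next three words, comma separated."""
    words = [w for w in title.lower().split() if w not in ARTICLES]
    parts = [words[0][:3]] + [w[0] for w in words[1:4]]
    parts += [""] * (4 - len(parts))     # trailing commas when the title is short
    return ",".join(parts)

def normalize_lccn(number):
    """left-fill the digits right of the dash to six and drop the dash,
    e.g. '68-54216' -> '68054216'."""
    left, right = number.split("-")
    return left + right.zfill(6)

def classify(request):
    """decide which index a terminal message addresses, per the rules above."""
    if "-" in request:
        left = request.split("-")[0]
        if len(left) >= 2 and left[-2:].isdigit():
            return "lc card number search"
    if request.count(",") == 3:
        return "title search"            # three commas signal a 3,1,1,1 key
    if request.count(",") == 1:
        a, b = request.split(",")
        if len(a) <= 3 and len(b) <= 3:
            return "author-title search"
    return "unrecognized"

# author_title_key("streeter", "the chained library") -> "str,cha"
# title_key("the chained library") -> "cha,l,,"  (the three commas mark a title search)
# normalize_lccn("68-54216") -> "68054216"
```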
if a request is for a bibliographic record, cat identifies it as such, and if there is only one bibliographic record in the reply, cat formats the record in one of its work area buffers and sends the formatted record to the terminal for display. if more than one record is in the reply, cat formats truncated records and puts them out for display. after a single bibliographic record has been displayed, cat modifies the computer memory image of the record in accordance with update requests from the terminal. for example, fields such as edition statement or subject headings may be deleted or altered, and new fields may be added. when the request is received from the terminal to produce catalog cards from the record as revised or unrevised, cat writes the current computer memory image of the record onto a tape to be used as input to the catalog card production programs. the catalog card production programs operate off-line, and the first processing program is convert ( cnvt), which formats some of the fields and call numbers. the major activity of cnvt is the latter, for libraries require a vast number of options to set up their call numbers for printing. cnvt also automatically places symbols used to indicate oversized books above, below, or within call numbers as required. format is the second program; it receives partially formatted records from cnvt. format expands each record into the total number of card images corresponding to the total cards required by the requesting library 4 bytes bibliographic record pointer lbibliogrophic 8 bytes lc cord number record presence indicator bit entry choin indicator bit fig. 7. library of congress card number index record. 178 ] ournal of libm1·y automation vol. 5/3 september, 1972 for each particular title. format determines this total from the number of tracings and pack definition tables previously submitted by the library that define the printing of formats of cards to go into each catalog. format, which is an extensive revision of expand, contains many options not found in the old off-line catalog card production system. format can set up a contents note on any particular card, and puts tracings at the bottom of a card when tracings are requested. the author entry normally occurs on the third line, but if a subject heading or added entry is two or more lines long, format moves the author entry down on the card so that a blank line separates the added entry from the author entry. in other words, each card is formatted individually. the major benefit of this feature, which allows the body of the catalog data to float up and down the card, is that the text on most cards can start high up on the card, thereby reducing the number of extension cards. the omission of tracings from added entry cards has a similar effect. table 1 presents the percentage of extension cards in a sample of 126,738 oclc cards for 18,182 titles produced for twenty-five or more libraries during a seventeen-day period, compared with extension cards in library of congress printed cards and in a sample of nelinet cards "for over 1,300 titles" ( 16). the table shows that the oclc mixture of cards with and without tracings and with the floating body of text yields about 10.8 percent more extension cards compared to library of congress printed cards. 
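the "about 10.8 percent" comparison at the end of the preceding sentence can be roughly checked against table 1, shown just below, under one plausible reading (an assumption of ours, not a statement from the paper): if what is compared is the share of card sets that run past a single card, then 100 - 77.2 = 22.8 percent of oclc card sets need at least one extension card, against 100 - 87.8 = 12.2 percent for library of congress printed cards, a difference of 10.6 percentage points, in line with the quoted figure once rounding or a slightly different counting convention is allowed for.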
were libraries to restore the original meaning to the phrase "main entry" by printing tracings only on main entry cards, the percentage of extension cards in computer-produced catalog cards printed six lines to the inch would probably be less than for lc cards.

table 1. extension catalog card percentages

number of cards    oclc marc ii cards    library of congress printed cards    nelinet marc ii cards
1                  77.2                  87.8                                 79.9
2                  18.9                  10.0                                 16.7
3                   2.5                   1.6                                  2.5
4                   1.1                    .3                                   .6
5                    .2                    .2                                   .1
6                    .1                                                         .2

format also sets up a sort key for each record, and a sort program sorts the card images by institution, library, catalog, and by entry or call number within each catalog pack. another program, build-print-tape (bpt), arranges the sorted images on tape so that cards are printed in consecutive order in two columns on two-up card stock. finally, a print program prints the cards on an ibm 1403 n1 printer attached to an ibm 360/50 computer. cataloging with existing bibliographic information this section describes cataloging using a bibliographic record already in the central file; the next section, entitled input cataloging, describes cataloging when there is no record in the system for the item being cataloged. the cataloger at the terminal first searches for an existing record, using the lc card number found on the verso of the title page or elsewhere. if the response is negative or if there is no card number available, the cataloger searches by title or by author and title, using the 3,1,1,1 or 3,3 search keys respectively. if these searches are unproductive, the cataloger does input cataloging. when a search does produce a record, the cataloger reviews the record to see if it correctly describes the book at hand. if it is the correct record and if the library uses library of congress call numbers, the cataloger transmits a request for card production by depressing two keys on the keyboard. cataloging is then complete. if the lc call number is not used, the cataloger constructs and keys in a new number and then transmits the produce-cards request. if the record does not describe the book as the cataloger wishes, the record may be edited. the cataloger may remove a field or element, such as a subject heading. information within a field may be changed by replacing existing characters, such as changing an imprint date by overtyping, by inserting characters, or by deleting characters. finally, a new field such as an additional subject heading may be added. when the editing process is complete, the cataloger can request that the record on the screen be reformatted according to the alterations. having reviewed the reformatted version, the cataloger may proceed to card production. when a cataloger has edited a record for card production, the alterations in the record are not made in the record in the bibliographic record file. rather, the changes are made only in the version of the record that is to be used for card production. the edited version of the record is retained in an archive file after catalog card production so that cards may be produced again from the same record for the same library, should the need arise in the future. the author index currently under development will enable a cataloger to determine the titles of works in the file by a given author. the call number index, also currently being developed, will make it possible for a cataloger to determine whether or not a call number has been used before in his library.
the class number index that will be developed in the future will provide the capability of determining titles that have recently been placed under a given class number or, if none is under the number, the class number and titles on each side of the given number. input cataloging input cataloging is undertaken when there is no bibliographic record in the file for the book at hand. to do input cataloging, the cataloger requests that a work form be displayed on the screen (figure 8). the cataloger then proceeds to fill in the work form by keyboarding the catalog data, and transmitting the data to the computer field by field as each is completed. as shown in figure 8, a paragraph mark terminates each field; each dash is to be filled in by the cataloger for each field used. input cataloging may be original cataloging or may use cataloging data obtained from some source other than the oclc system. fig. 8. workform for a dewey library. when the catalog data has been input, revised, and correctly displayed on the terminal screen, the cataloger requests catalog card production. in the case of new cataloging, not only are cards produced, but also the new record is added to the file and indexed so that it is available within seconds to other users. if a marc ii record for the same book is subsequently added to the file, it replaces the input-cataloging record but does not disturb the holdings information. union catalog each display of a bibliographic record contains a list of symbols for those member institutions that possess the title. in other words, the central file is also a union catalog of the holdings of oclc member libraries, although in the early months of operation these holdings data are very incomplete. nevertheless, they will approach completeness with the passage of time and with retrospective conversion of catalog data. titles cataloged during the operation of the off-line system have been included in the union catalog. the union catalog function is an important function of the shared cataloging system, for it makes available to students and faculties, through the increased information available to staff members, the resources of academic institutions throughout ohio. libraries also use the union catalog as a selection tool, since they can dispense with expensive purchases of little-used materials residing in a neighboring library. members also use the file to obtain bibliographic data to be used in ordering. assessment with over nine hundred thousand holdings recorded in the union catalog as of april 1973, it is clear that having this type of information immediately at hand will greatly improve services to students and faculties. enlargement of holdings recorded will enhance the union-catalog value of the system. wright state university is in the process of converting its holdings using the oclc system, and the ohio state university libraries, the largest collection in the state, has already converted its shelf list in truncated form. the osu holdings information will soon be available to oclc members. members using the oclc system report a large reduction in cataloging effort.
two libraries using lc classification report that they are cataloging at a rate in excess of ten titles per terminal hour when cataloging already exists in the system. libraries using dewey classification are experiencing a somewhat lower rate. the original cost benefit studies were done on the basis of a calculated rate of six titles per hour for those books for which there were already cataloging data in the system. the net savings will be realized when the file has reached sufficient size to enable the largest libraries to locate records for 65 percent of their cataloging and for the smallest to find 95 percent. to reach this level, members collectively would have to use 182 journal of library automation vol. 5/3 september, 1972 existing bibliographic information to catalog 350,000 titles in the course of a year, or an average of approximately 1,460 titles for the total system per working day. it was thought that this rate would be attained by the end of the second year of operation. however, at the end of the first month of on-line operation, over a thousand titles per day were being cataloged. the new catalog card production programs operating on the sigma 5 are much more efficient than the programs used in the older off-line system. earlier in this paper it was reported that cost of the older programs to format catalog cards, but not to print them, was 2.27 cents per card. if costs of the sigma 5 are calculated at commercial rates, the new programs format cards at 2.21 cents per card. however, if actual costs to oclc are used and with the total cost being assigned to one shift, the cost of formatting each card becomes 0.86 cents. the total cost of producing catalog cards is, of course, much more than the cost to format them on a computer. nevertheless, either the 2.21 cents or 0.86 cents rate might serve as a criterion for measuring the efficiency of computerized catalog card production. the low terminal response-time delay for the operation of seventy terminals is a good gauge of the efficiency of the on-line system. in particular, the file organization is efficient, for it enables retrieval of a single entry swiftly from a file of over 600,000 records. moreover, no serious degradation in retrieval efficiency is expected to arise as the result of the growth of file size. the system operates from 7:00 a.m. to 7:00 p.m. on mondays through fridays, and at times the interval between system downtimes has exceeded a week. it is rare that the system will be down on successive days, and when a problem does occur, the system can be restored within a minute or two. moreover, when the system goes down, only two terminals will occasionally lose data, and most of the time, there is no loss of data. hence, it can be concluded that the hardware and software are highly reliable. in summary, it can be said that the oclc on-line shared cataloging system is easy to use, efficient, reliable, and cost beneficial. acknowledgments the research and development reported in this paper were partially supported by office of education contract no. oec-0-70-2209 ( 506), council on library resources grant no. clr-489, national agricultural library contract no. 12-03-01-5-70, and an l.s.c.a. title iii grant from the ohio state library board. references 1. ellen washy miller and b. j. hodges, "shawnee mission's on-line cataloging system," ]ola 4:13-26 (march 1971). shared cataloging systemjkilgour, et al. 183 2. john p . 
kennedy, "a local marc project : the georgia t ech library," in proceedings of th e 1968 clinic on library a pplications of data processing. (u rbana , ill.: university of illinois gradu ate school of library science, 1969 ) p . 199-215. 3. new england board of high er education, new england librm·y information netw01·k; final r ep01t on council on library resources grant #443. (feb. 1970 ). 4. charles t. payne and robert s. mcgee, the university of chicago bibliographic data processing system: documentation and report supplement, (chicago, ill. : university of chicago library, april1971). 5. judith hopkins, manual for oclc catalog card production (feb. 1971). 6. ohio college library center, pt·eliminary description of catalog cards produced from marc ii data (sept. 1969). 7. f. g. kilgour, "libraries-evolving, computerizing, personalizing," american libraries 3:141-47 ( feb. 1972). 8. american national standards institute, american national standard for bibliographic information interchange on magnetic tape (new york: american national standards institute, 1971 ). 9. f. g. kilgour, p. l. long, and e. b. l eiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7:79-82 ( 1970 ). 10. f. g. kilgour, p. l. long, e. b. leiderman, and a. l. landgraf, "titleonly entries retrieved by use of truncated search keys," lola 4: 207-210 (dec. 1971 ) . 11. philip l. long, and f. g. kilgour, "a truncated search key title index," lola 5:17-20 ( mar. 1972). 12. p. l. long, k. b. l. rastogi, j. e. rush, and j. a. w yckoff, "large on-line files of bibliographic data : an efficient design and a mathematical predictor of re trieval behavior." ifip congress '71: ljubljana -aug ust 1971. ( amsterdam, north holland publishing co., 1971 ). bookle t ta-3, 145-149. 13. martin snyderman and bernard hunt, "the myriad virtues of text compaction," datamation 16:36-40 (dec. 1970). 14. w. d. schieb er and g. w. thomas, "an algorithm for compaction of alphanumeric data," lola 4 :198-206 (dec. 1970). 15. a. c. day, "full table quadratic searching for scatter storage," communications of the acm 13:481 (aug. 1970). 16. new england board of higher education, new england libmry in formation . .. , p. 100-101. letter to the editor ann kucera information technology and libraries | june 2018 9 https://doi.org/10.6017/ital.v37i2.10407 dear editorial board, regarding “halfway home: user centered design and library websites” in the march 2018 issue of information technology and libraries (ital), i thought there were some interesting points. i think that your assertion, however, that user centered design automatically eliminates anything from a website that your main user group did not expressly ask for is faulty. when someone brings up the fact that user centered design is not statistically significant, i interpret that as a misunderstanding of what user centered design is. our academic library websites are not research projects so why would we gather statistically significant information about them? our academic library websites are (or should be) helpful to students and faculty and constantly changing to meet their needs. if librarians perpetuate a misunderstanding of user centered design, my fear is that misunderstanding could perpetuate stagnation and a refusal to change our technology/user interfaces in a rapidly changing environment and do our patrons and ourselves a disservice. 
user centered design is a set of tools to help us gather information about users and their needs. the information gathered informs the design but does not dictate the design and needs to be part of an iterative process. the web design team at your institution demonstrated user centered design when they added floor maps back into the web site when a group of users pointed out that it was causing problems for the main users at your institution. while valuable experience from librarians and other staff is critical to take into account, it is sometimes difficult to determine which pieces of the puzzle provide comfort to those who work at the library vs. which pieces assist students in their studies. i applaud your willingness to “clear the slate” and reduce the amount of information you were maintaining on your website. i’m guessing you may have removed dozens of links from your website. you only mentioned adding one category of information back into the design. i would say your user centered design process is working quite well. ann kucera systems librarian central michigan university https://doi.org/10.6017/ital.v37i1.10338 the provision of mobile services in us urban libraries ya jun guo, yan quan liu, and arlene bielefield information technology and libraries | june 2018 78 ya jun guo (yadon0619@hotmail.com) is associate professor of information and library science at zhengzhou university of aeronautics, china. yan quan liu (liuy1@southernct.edu) is professor of information and library science at southern connecticut state university. arlene bielefield (bielefielda1@southernct.edu) is professor in information and library science at southern connecticut state university. . abstract to determine the present situation regarding services provided to mobile users in us urban libraries, the authors surveyed 138 urban libraries council members utilizing a combination of mobile visits, content analysis, and librarian interviews. the results show that nearly 95% of these libraries have at least one mobile website, mobile catalog, or mobile app. the libraries actively applied new approaches to meet each local community’s remote-access needs via new technologies, including app download links, mobile reference services, scan isbn, location navigation, and mobile printing. mobile services that libraries provide today are timely, convenient, and universally applicable. introduction the mobile internet has had a major impact on people’s lives and on how information is found located and accessed. today, library patrons are untethered from and free of the limitations of the desktop computer.1 the popularity of mobile devices has changed the relationship between libraries and patrons. mobile technology allows libraries to have the kind of connectivity with their patrons that did not exist previously. patrons no longer think that it is necessary for them to be physically in the library building to use library services, and they are eager to obtain 24/7 access to library resources anywhere using their mobile devices. mobile patrons need mobile libraries to provide them with services. in other words, “patrons want to have a library in their pocket.”2 as a result, libraries around the world are exploring and developing mobile services. according to the state of america’s libraries 2017 report by the american library association, the 50 us states, the district of columbia, and outlying territories have 8,895 public library administrative units (as well as 7,641 branches and bookmobiles). 
the vital role public libraries play in their communities has also expanded.3 as part of the main role of public libraries, us urban libraries need to embrace the developmental trend of the mobile internet to better serve their communities. the provision of mobile services in us urban libraries is worthy of study and is of great significance as a model for how other public libraries plan and implement their mobile services. mailto:yadon0619@hotmail.com mailto:liuy1@southernct.edu mailto:bielefielda1@southernct.edu the provision of mobile services in us urban libraries | guo, liu, and bielefield 79 https://doi.org/10.6017/ital.v37i2.10170 literature review definition and types of mobile devices and mobile services as early as 1991, mark weiser proposed “ubiquitous computing,” pointing out how people could obtain and handle information at anytime, anywhere, and in any way.4 with this expectation, the possibilities of using personal digital assistants (pdas) as mobile web browsers were researched in 1995.5 in combination with a wireless modem, library users are able to use pdas to access information services whenever they are needed. today, mobile devices are generally defined as units small enough to carry around in a pocket, falling into the categories of pdas, mobile phones, and personal media players.6 for many researchers, laptops are not included in the definition of mobile devices. although wireless laptops purportedly offer the opportunity to go “anywhere in the home,” laptops are generally used in a small set of locations, rather than moving fluidly through the home; wireless laptops are portable, but not mobile.7 in contrast, lippincott suggested that mobile devices should include laptops, netbooks, notebook computers, cell phones, audio players such as mp3 players, cameras, and other items.8 according to the “mobile strategy report” by the california digital library, mobile phones, e-readers, mp3 players, tablets, gaming devices, and pdas are common mobile devices.9 each mobile device has its own characteristics and the potential to connect to the internet from anywhere with a wi-fi network, driving widespread use and thus the provision of library mobile services. mobile services are services libraries offer to patrons via their mobile devices. these services as described herein comprise two categories: traditional library services modified to be available via mobile devices and services created for mobile devices.10 pope et al. listed several mobile services, including sms or text-messaging services, the my info quest project, digital collections, audiobooks, applications, and mobile-friendly websites.11 the california digital library pointed out that a growing number of university and public libraries are offering mobile services. 
libraries are creating mobile versions of library websites, using text messaging to communicate with patrons, developing mobile catalog searching, providing access to resources, and creating new tools and services, particularly for mobile devices.12 the most recognized mobile services in university libraries are mobile sites, mobile apps, mobile opacs, mobile access to databases, text messaging services, qr codes, augmented reality, and e books.13 both academic and public libraries’ use of web 2.0 applications and services include blogs, wikis, phone apps, qr codes, mash-ups, video or audio sharing, customized webpages, social media and social networking, and types of social tagging.14 this study focuses on the two most common mobile devices, mobile phones and tablets, and on the services provided to library patrons and local communities through mobile websites, mobile apps, and mobile catalogs. status of mobile services in us libraries mobile devices present a new and exciting opportunity for libraries of all types to provide information to people of all ages on the go, wherever they are.15 it is generally observed that there is an increased use of mobile technology in the library environment. information technology and libraries | june 2018 80 librarians see their users increasingly using mobile phones instead of laptops and desktop computers to search the catalog, check the library’s opening hours, and maintain contact with library staff.16 in an earlier investigation of 766 librarians, spires found that there was very little demand for services for mobile devices as of august 2007. at that time, relatively few libraries (18%) purchased content specifically for wireless handheld device use, and very few libraries (15%) reformatted content for these devices.17 however, a survey of public libraries completed by the american library association between september and november 2011 indicated interesting changes: 15% of library websites are optimized for mobile devices, and 12% of libraries use scanned codes (e.g. qr codes), and 7% of libraries have developed smartphone applications for access to library services; 36% of urban libraries have websites optimized for mobile devices, compared to 9% of rural libraries; 76% of libraries offer access to e-books; 70% of libraries use social networking tools such as facebook. 18 later studies revealed more significant changes. 99 association of research libraries member libraries were surveyed in 2012 to identify how many had optimized at least some services for the mobile web. apps were not investigated. the result showed that 83 libraries (84%) had a mobile website.19 a study in 2015 by liu and briggs showed that the top 100 university libraries in the united states offered one or more mobile services, with mobile websites, mobile access to the library catalog, mobile access to the library’s databases, e-books, and text messaging services being the most common. qr codes and augmented reality were less common.20 kim noted that “libraries are acknowledging that people expect to do just about everything on mobile devices and that more and more people are now using a mobile device as their primary access point for the web.”21 although librarians may have previously underestimated what people wanted to do using mobile devices, there is a growing understanding of the potential of these access points. 
research design survey samples while a growing number of users tend to access information remotely, urban libraries, as the most popular public-sector institutions and community centers, are facing great challenges in addressing the growing need for mobile services. the urban libraries council (ulc) (https://www.urbanlibraries.org), as an authoritative source founded in 1971, is the premier membership association of north america’s leading public library systems. ulc’s member libraries are in communities throughout the united states and canada, comprising a mix of institutions with varying revenue sources and governance structures, and serving communities with populations of differing sizes. ulc’s website lists 145 us and canadian urban libraries. since this study focused only on us urban libraries, 138 libraries were chosen as the study targets, and all were examined. https://www.urbanlibraries.org/ the provision of mobile services in us urban libraries | guo, liu, and bielefield 81 https://doi.org/10.6017/ital.v37i2.10170 table 1. the survey and examples of survey results. contents options example no.1: pima county public library … example no.138: milwaukee public library components of mobile websites 1 account login; 2 catalog search; 3 contact us; 4 downloadables; 5 events; 6 interlibrary loan; 7 kids & teens; 8 locations and hours; 9 meeting room; 10 recent arrivals; 11 recommendations; 12 social media; 13 suggest a purchase; 14 support 1, 2, 3, 4, 5, 7, 8, 9, 10, 12, 13, 14. 1, 2, 3, 4, 5, 7, 8, 9, 12, 13, 14. components of mobile apps 1 account login; 2 barcode wallet; 3 bestsellers; 4 catalog search; 5 contact us; 6 downloadables; 7 events; 8 full website; 9 interlibrary loan; 10 just ordered; 11 kids & teens; 12 locations and hours; 13 meeting room; 14 my bookshelf; 15 my library; 16 pay fines; 17 popular this week; 18 recent arrivals; 19 recommendations; 20 scan isbn; 21 social media; 22 suggest a purchase; 21 support 1, 4, 5, 6, 7, 8, 12, 15, 18, 20, 21. 1, 4, 5, 6, 7, 8, 12, 17, 20, 21. mobile reference services 1 chat/im; 2 social medias; 3 text/sms; 4 web form - 1, 3, 4. social media 1 blog; 2 facebook; 3 flickr; 4 goodreads; 5 google+; 6 instagram; 7 linkedin; 8 pinterest; 9 tumblr; 10 twitter; 11 youtube 1, 2, 3, 6, 8, 10, 11. 1, 2, 6, 8, 10. mobile reservation services 1 reserve a computer; 2 reserve a librarian; 3 reserve a meeting room; 4 reserve a museum pass; 5 reserve a study room; 6 reserve exhibit space - 3. mobile printing 1 mobile printing; 2 no mobile/ wi-fi printing; 3 wifi printing 3. 2. apps or databases 1 axis 360; 2 biblioboard; 3 bookflix;4 brainfuse; 5 career transitions; 6 cloud library; 7 driving -tests.org; 8 ebscohost; 9 flipster; 10 freading; 11 freegal; 12 gale virtual; 13 hoopla; 14 instant flix; 15 learning express; 16 lynda.com; 17 mango languages; 18 master file; 19 morningstar; 20 new york times; 21 novelist; 22 one click digital; 23 overdrive; 24 reference usa; 25 safari; 26 tumble book; 27 tutor.com; 28 world book; 29 worldcat; 30 zinio. 4, 11, 14, 22, 23, 26, 28, 30. 4, 8, 11,12, 13, 15, 17, 18, 19, 21, 23, 24, 30. information technology and libraries | june 2018 82 survey methods as mobile services are offered basically via wireless systems and mobile devices, a combination of research methods, including mobile website visits, content analysis, and librarian interviews, were applied for data collection. 
specifically, librarian interviews were employed as a verification and supplemental process to ensure that survey data were accurate and exhaustive. first, the authors utilized an iphone, an android mobile phone, and an ipad to access the websites of the 138 us urban libraries in the study sample to ascertain if these libraries have mobile websites or mobile catalogs and whether the platforms are operated properly. then the authors checked whether these libraries have mobile apps that can be downloaded from the apple app store or the google play store. the survey was conducted from june 18 to june 24, 2017. next, the authors went through all the mobile websites and the mobile apps the libraries provide to check the mobile services offered. the authors used a specially designed survey to collect data about each library’s mobile website and app (see table 1). the procedure of survey content analysis was conducted between june 25 and july 24, 2017, with the examination of each library’s services taking approximately 30 minutes. finally, for those libraries that had no mobile websites or mobile apps found through the website visits, the authors made interview requests to staff librarians via their online reference services such as live chat, web form and email. an additional purpose of this step was to confirm the accuracy of the survey data collected from website visits. the survey was conducted from july 22 to august 3, 2017. results and analysis results from the examination of mobile website visits, content analysis, and librarian interviews revealed what services us urban libraries provided as mobile services, how they were provided, and which were commonly provided. how many libraries provide mobile services? over 83% of us urban libraries have developed their own mobile websites (see figure 1) for communities they serve. the mobile website is currently the most popular service platform for mobile users. the provision of mobile services in us urban libraries | guo, liu, and bielefield 83 https://doi.org/10.6017/ital.v37i2.10170 figure 1. types of mobile services provided by libraries. promisingly, each test of these websites through the authors’ mobile devices, either smartphones or tablets, confirmed that all the study subjects can be accessed 100% of the time. these library websites, however, are not entirely built specially for mobile devices. while the majority of urban libraries have transformed their desktop websites into mobile sites with proper responsive design, about 17% are just smaller versions of their desktop websites (see figure 2). a responsive mobile website can react or change according to the needs of the users and the mobile device they’re viewing it on to achieve a good layout and content display. here, text and images change from a three-column to a single-column layout, and unnecessary images are hidden. the web address of a responsively designed mobile website is the same as the desktop website. responsive design is described as a long-term solution for addressing both designers’ and users’ needs.22 the survey found that 59% of libraries now have apps. our analysis of the earliest version of apps records indicate that los angeles public library was the first to use an app, in august 2010. mobile apps have advantages and disadvantages compared to mobile websites, and many libraries compared them and chose between the two. 
skokie (illinois) public library, as of october 2015, is no longer supporting the library’s mobile app because they claim the library’s website offers a better mobile experience. they also offer an easy access solution like that for a mobile app, with a message displayed to users: “miss having an icon on your home screen? bookmark the site to your home screen and you’ll have an icon to take you directly to this site.” 83% 59% 22% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% mobile website mobile app mobile catalog information technology and libraries | june 2018 84 figure 2. the smaller versions of the desktop website and the specially designed mobile website the proportion of libraries providing mobile catalog services is only 22%. libraries can use multiple options to create one or more mobile service platforms. nearly half (46%) of us urban libraries have both mobile websites and mobile apps. according to the survey, 95% of libraries have at least one mobile website, mobile catalog, or mobile app. a survey the authors conducted in april 2014 found that only 81% of the urban libraries had at least one mobile website, mobile catalog, or mobile app (see figure 3). clearly, libraries are paying increasing attention to mobile services, and providing mobile services has become the unavoidable choice of libraries nowadays. figure 3. changes in the proportion of libraries that provide mobile services from 2014 to 2017. 19% 81% 2014 no mobile services at least one mobile service 5% 95% 2017 no mobile services at least one mobile service the provision of mobile services in us urban libraries | guo, liu, and bielefield 85 https://doi.org/10.6017/ital.v37i2.10170 what content do the mobile websites offer? through mobile website visits and content analysis, it was found that some types of information are available at all libraries, including “account login,” “events,” “locations and hours,” “contact us,” and “social media” (see figure 4). figure 4. components of mobile websites the proportion of library mobile sites that offer “support” and “downloadables” is 96% and 95%, respectively. among them, “support” generally includes donations to the library foundation, donation of books and other materials, and providing volunteer services; “downloadables” generally include e-books, e-magazines, and music. a total of 86% of the urban libraries set up “kids” and “teens” sections, providing specialized information services, such as storytime, games, events, book lists, homework help, volunteer information, and college information. a majority (62%) of libraries provide interlibrary loan information on mobile websites, but one library, palo alto (california) city library, no longer offers the costly interlibrary loan service as of july 2011. more than half (56%) of the libraries set up a “suggest a purchase” function and generally ask readers to provide title, author, publisher, year published, format, and other information in web form. some libraries display “recommendations” (26%) on their mobile websites. denver public library has a special column recommending books for children and teenagers and offers personalized reading suggestions: “tell us what you like to read and we’ll send you our recommendations in about a week.” many mobile websites will pop hints to the libraries’ mobile apps and link to the apple app store or the google play store after automatically identifying the user’s mobile phone operating system. 
this is helpful for promoting the use of the libraries’ apps, and it also provides great convenience for users. 100% 100% 100% 100% 100% 99% 96% 95% 86% 74% 62% 56% 32% 26% 0% 20% 40% 60% 80% 100% account login events locations and hours contact us social media catalog search support downloadables kids & teens meeting room interlibrary loan suggest a purchase recent arrivals recommendations http://www.marinlibrary.org/events/?trumbaembed=filter3%3dstorytimes information technology and libraries | june 2018 86 what content do the mobile apps offer? the content of mobile websites in libraries is basically the same, but the content of their mobile apps varies widely. the reason is that the understanding of the various libraries about the functions an app should offer differs from one library to another. some of these apps were designed by software vendors, such as boopsie, sirsidynix, and bibliocommoms, but some were designed by the libraries themselves, leading to the absence of a uniform standard or template for the app design. survey results show that only “account login” and “catalog search” are available in all apps (see figure 5). “locations and hours” accounts for a high proportion of apps at 96%. the “locations” feature in many libraries apps, with the help of gps, helps users find their nearest library location. figure 5. components of mobile apps about 85% of apps provide “contact us.” click “contact us” in poudre river public library district and some other libraries’ apps, and you can directly call the library or send text messages via email. “scan isbn” is a unique feature of mobile apps, and 75% of apps provide this functionality. if a library user finds a book they need in a bookstore or elsewhere, they can scan the isbn to can see if that book is in the library’s collection. apps designed by bibliocommoms all have “bestsellers”, “recently reviewed”, “just ordered” and “my library” (see chart figure 6). in “my library,” the “checked out” section contains red alerts for “overdue,” yellow alerts for “due soon,” and “total items.” the “holds” section contains “ready for pickup,” “active holds,” and “paused holds.”. the “my shelves” section contains “completed,” “in progress,” and “for later.” in this way, users can clearly see the details of the books they have 100% 100% 96% 89% 85% 77% 75% 68% 46% 27% 24% 19% 18% 16% 16% 10% 6% 5% 3% 0% 20% 40% 60% 80% 100% account login catalog search locations and hours downloadables contact us events scan isbn social media full website recent arrivals bestsellers recently reviewed popular this week just ordered my library my bookshelf pay fines barcode wallet kids & teens the provision of mobile services in us urban libraries | guo, liu, and bielefield 87 https://doi.org/10.6017/ital.v37i2.10170 borrowed and intend to borrow. apps designed by boopsie generally have “popular this week” to tell users which books have been borrowed more recently. figure 6. an app designed by bibliocommoms. only 3% of apps have “kids” and “teens” sections, which differs greatly from the percentage of mobile websites that offer those sections (86%). what mobile reference services do libraries provide? according to the survey, the most common way for us urban libraries to provide mobile reference service is a web form, which is available in 86% of surveyed libraries (see figure 7). related to “call us,” a web form has the advantage of being independent from the library’s working hours. 
although users fill out and submit a web form, it is similar to email and, generally, librarians respond to the user’s e-mail address, but it does not require users to enter their own email system, as they only need to fill in the content required by the web form. therefore, it is more convenient to use. the authors believe that providing only an email address is not mobile reference service. the survey found that 6% of libraries do not have mobile reference services. information technology and libraries | june 2018 88 figure 7. mobile reference services provided by libraries. currently, 43% of libraries offer chat and instant messaging (im) services, which allow users to communicate with librarians instantly. for example, when gwinnett county (georgia) public library’s mobile website is visited, an “ask us” dialog box appears in the upper right corner of the site, which allows visitors to chat with librarians. outside of the library’s work hours, the box displays “sorry, chat is offline but you can still get help” (see figure 8). the county of los angeles public library provides four options for im. they are aim, google talk, yahoo! messenger, and msn messenger. figure 8. “ask us” on gwinnett county public library’s mobile website 86% 43% 33% 8% 0% 20% 40% 60% 80% 100% web form chat/im text/sms social media the provision of mobile services in us urban libraries | guo, liu, and bielefield 89 https://doi.org/10.6017/ital.v37i2.10170 all the florida urban libraries surveyed offer reference services via the web form, chat, and text because an “ask a librarian” service administered by the tampa bay library consortium provides florida residents with those mobile reference services. the survey shows that only 8% of the libraries provide social media reference service in “ask a librarian.” the social media that provides reference service is either facebook or twitter. in fact, 100% of libraries have social media, and 100% of libraries have facebook and twitter, but most libraries do not use them to provide reference services. what social media do the libraries use? survey results showed that 100% of mobile websites display links to their social media, usually in the prominent position of the front page of the websites; 68% of apps have social media links. facebook and twitter are social media leaders, and now all libraries’ mobile websites have both (see figure 9). the survey conducted in 2014 showed that facebook and twitter had the highest occupancy rate, but only 61% of libraries offered facebook and 53% offered twitter. it is obvious that libraries have made great progress in the last three years in the application of social media. figure 9. social media being used by libraries. instagram and pinterest are both photo social media, and they are used 76% and 49%, respectively. as the leading social media in the video field, youtube is used by 67% of libraries. what mobile reservation services do libraries provide? mobile reservation services were found in 78% of all libraries’ mobile services. a majority (62%) of the libraries allow online reservation of a meeting room via web form or other forms, and 14% allow reserving a study room (see figure 10). some libraries only reserve a study or meeting room via phone. 100% 100% 76% 67% 57% 49% 41% 19% 12% 12% 9% 0% 20% 40% 60% 80% 100% facebook twitter instagram youtube blog pinterest flickr tumblr linkedin google+ goodreads information technology and libraries | june 2018 90 figure 10. mobile reservation services provided by libraries. 
a few libraries provide instant online access to free and low-cost tickets to museums, science centers, zoos, theatres, and other fun local cultural venues with discover & go. a total of 14% of the libraries provide “reserve a librarian” service, allowing patrons to reserve a free session with a reference librarian or subject specialist at the library. in addition, several libraries, such as pasadena public library, allow reserving of exhibit space. how many libraries provide mobile printing? mobile printing services allow patrons to print to a library printer from outside the library or from their mobile device. patrons’ print jobs are available for pick up at the library. already, 43% of the libraries provide mobile printing service (see figure 11). it is expected that more libraries will provide this service. to print from a mobile device, patrons need to download an app that supports mobile printing. printeron is the more commonly used app, which has been used by oakland public library, and san mateo county (california) libraries, and others. however, san diego public library uses the your print cloud print system, and santa clara county (california) library uses smart alec. san mateo county libraries offers wireless printing from smartphones, tablets, and laptops at all of its locations, and its wireless printing includes mobile printing, web printing, and email printing. in addition, 14% of libraries offer wireless printing services but do not provide mobile printing services. for example, live oak public libraries in savannah, georgia, states that printing from laptops (pc and mac) is available in all branches, but they don’t have apps that support printing from tablets or mobile phones. 62% 20% 15% 14% 14% 4% 0% 10% 20% 30% 40% 50% 60% 70% reserve a meeting room reserve a computer reserve a museum pass reserve a study room reserve a librarian reserve exhibit space the provision of mobile services in us urban libraries | guo, liu, and bielefield 91 https://doi.org/10.6017/ital.v37i2.10170 figure 11. the proportion of libraries that offer mobile printing. what apps or databases do libraries provide for patrons? four main software programs found to be used to display e-books of the surveyed libraries are overdrive (93%), hoopla (64%), tumblebook (61%), and cloud library (48%). for audiobooks, overdrive (93%) and hoopla (64%) are the most popular; oneclickdigital is used by 48%. most libraries (74%) use zinio for e-magazines, and 48% use the music software freegal. overdrive is the most common application in libraries (see table 2). table 2. the proportion of apps or databases being used in libraries. apps or databases % of libraries providing apps or databases % of libraries providing overdrive 93 world book 46 novelist 79 new york times 44 referenceusa 74 masterfile 43 zinio 74 ebscohost 43 learningexpress 69 flipster 29 gale virtual 68 bookflix 28 hoopla 64 brainfuse 22 morningstar 64 tutor.com 17 mango languages 61 safari 17 tumblebook 61 driving-tests.org 16 lynda.com 57 biblioboard 12 worldcat 51 career transitions 12 freegal 48 axis 360 11 oneclick digital 48 instantflix 10 cloud library 48 freading 9 mobile printing 43% no wireless/mobile printing 42% wireless printing 14% information technology and libraries | june 2018 92 the libraries provide users with various types of databases. 
survey statistics show that the widely used databases include referenceusa (business), mango languages (language learning), learningexpress and career transitions (job and career), lynda.com and tutor.com (education), morningstar (investment), world book (encyclopedias), worldcat (library resources worldwide), new york times (newspaper articles), driving-tests.org (test preparation), and safari (technology).

conclusion

this study shows that mobile services have become popular in us urban libraries as of summer 2017, with 95% offering one or more types of mobile service. responsive mobile websites and mobile apps are the main platforms of current mobile services. us urban libraries are striving hard to meet local communities' remote-access needs via new technologies. compared with desktop websites, mobile websites and apps offer services that are more accessible, smarter, and more interactive for local users. some mobile websites automatically prompt the user to install the library's app; many libraries' apps offer a "scan isbn" function, making it convenient for the user to scan a book title at any time to see if it is in the library's collection; "location" provides gps positioning and navigation services for users; and "contact us" links directly to telephone, text, and email. libraries are actively developing and adding more mobile services, such as mobile reservation services and mobile printing services. the development of mobile technology has provided the support libraries need to offer mobile services. a future in which users can access library services anytime, anywhere, and in any way is getting closer and closer.

acknowledgements

this work was supported by grant no. 14ctq028 from the national social science foundation of china.

references

1. jason griffey, mobile technology and libraries (new york: neal-schuman, 2010).
2. meredith farkas, "a library in your pocket," american libraries 41 (2010): 38.
3. american library association, "the state of america's libraries 2017: a report from the american library association," special report, american libraries, april 2017, http://www.ala.org/news/sites/ala.org.news/files/content/state-of-americas-libraries-report-2017.pdf.
4. mark weiser, "the computer for the 21st century," scientific american 265, no. 3 (1991): 94–104.
5. stefan gessler and andreas kotulla, "pdas as mobile www browsers," computer networks and isdn systems 28, no. 1–2 (1995): 53–59.
6. georgina parsons, "information provision for he distance learners using mobile devices," electronic library 28, no. 2 (2010): 231–44, https://doi.org/10.1108/02640471011033594.
7. allison woodruff et al., "portable, but not mobile: a study of wireless laptops in the home," international conference on pervasive computing 4480 (2007): 216–33, https://doi.org/10.1007/978-3-540-72037-9_13.
8. joan k. lippincott, "a mobile future for academic libraries," reference services review 38, no. 2 (2010): 205–13.
9. rachel hu and alison meir, "mobile strategy report," california digital library, august 18, 2010, https://confluence.ucop.edu/download/attachments/26476757/cdl+mobile+device+user+research_final.pdf?version=1.
10. yan quan liu and sarah briggs, "a library in the palm of your hand: mobile services in top 100 university libraries," information technology & libraries 34, no. 2 (2015): 133–48, https://doi.org/10.6017/ital.v34i2.5650.
11. kitty pope et al., "twenty-first century library must-haves: mobile library services," searcher 18, no. 3 (2010): 44–47.
12. hu and meir, "mobile strategy report."
13. liu and briggs, "a library in the palm of your hand."
14. kalah rogers, "academic and public libraries' use of web 2.0 applications and services in mississippi," slis connecting 4, no. 1 (2015), https://doi.org/10.18785/slis.0401.08.
15. pope et al., "twenty-first century library must-haves."
16. lorraine paterson and low boon, "usability inspection of digital libraries: a case study," ariadne 63, no. 1 (2010): 11, https://doi.org/10.1007/s00799-003-0074-4. [website lists h. rex hartson, priya shivakumar, and manuel a. pérez-quiñones as the authors]
17. todd spires, "handheld librarians: a survey of librarian and library patron use of wireless handheld devices," internet reference services quarterly 13, no. 4 (2008): 287–309, https://doi.org/10.1080/10875300802326327.
18. american library association, "libraries connect communities 2011-2012," last modified june 2012, http://connect.ala.org/files/68293/2012.67b%20plfts%20results.pdf.
19. barry trott and rebecca jackson, "mobile academic libraries," reference & user services quarterly 52, no. 3 (2013): 174–78.
20. liu and briggs, "a library in the palm of your hand."
21. bohyun kim, "the present and future of the library mobile experience," library technology reports 49, no. 6 (2013): 15–28.
22. hannah gascho rempel and laurie bridges, "that was then, this is now: replacing the mobile-optimized site with responsive design," information technology & libraries 32, no. 4 (2013): 8–24, https://doi.org/10.6017/ital.v32i4.4636.
evaluating web-scale discovery: a step-by-step guide

joseph deodato

information technology and libraries | june 2015

abstract

selecting a web-scale discovery service is a large and important undertaking that involves a significant investment of time, staff, and resources. finding the right match begins with a thorough and carefully planned evaluation process. to be successful, this process should be inclusive, goal-oriented, data-driven, user-centered, and transparent. the following article offers a step-by-step guide for developing a web-scale discovery evaluation plan rooted in these five key principles based on best practices synthesized from the literature as well as the author's own experiences coordinating the evaluation process at rutgers university. the goal is to offer academic libraries that are considering acquiring a web-scale discovery service a blueprint for planning a structured and comprehensive evaluation process.

introduction

as the volume and variety of information resources continue to multiply, the library search environment has become increasingly fragmented. instead of providing a unified, central point of access to its collections, the library offers an assortment of pathways to disparate silos of information. to the seasoned researcher familiar with these resources and experienced with a variety of search tools and strategies, this maze of options may be easy to navigate. but for the novice user who is less accustomed to these tools and even less attuned to the idiosyncrasies of each one's own unique interface, the sheer amount of choice can be overwhelming. even if the user manages to find their way to the appropriate resource, figuring out how to use it effectively becomes yet another challenge. this is at least partly due to the fact that the expectations and behaviors of today's library users have been profoundly shaped by their experiences on the web. popular sites like google and amazon offer simple, intuitive interfaces that search across a wide range of content to deliver immediate, relevant, and useful results. in comparison, library search interfaces often appear antiquated, confusing, and cumbersome. as a result, users are increasingly relying on information sources that they know to be of inferior quality, but are simply easier to find. as luther and kelly note, the biggest challenge academic libraries face in today's abundant but fragmented information landscape is "to offer an experience that has the simplicity of google—which users expect—while searching the library's rich digital and print collections—which users need."1 in an effort to better serve the needs of these users and improve access to library content, libraries have begun turning to new technologies capable of providing deep discovery of their vast scholarly collections from a single, easy-to-use interface. these technologies are known as web-scale discovery services.

joseph deodato (jdeodato@rutgers.edu) is digital user services librarian at rutgers university, new brunswick, new jersey.
to paraphrase hoeppner, a web-scale discovery service is a large central index paired with a richly featured user interface providing a single point of access to the library's local, open access, and subscription collections.2 unlike federated search, which broadcasts queries in real time to multiple indexes and merges the retrieved results into a single set, web-scale discovery relies on a central index of preharvested data. discovery vendors contract with content providers to index their metadata and full-text content, which is combined with the library's own local collections and made accessible via a unified index. this approach allows for rapid search, retrieval, and ranking of a broad range of content within a single interface, including materials from the library's catalog, licensed databases, institutional repository, and digital collections. web-scale discovery services also offer a variety of features and functionality that users have come to expect from modern search tools. features such as autocorrect, relevance ranking, and faceted browsing make it easier for users to locate library materials more efficiently, while enhanced content such as cover images, ratings, and reviews offers an enriched user experience and provides useful contextual information for evaluating results.

commercial discovery products entered the market in 2007 at a time when academic libraries were feeling pressure to compete with newer and more efficient search tools like google scholar. to improve the library search experience and stem the seemingly rising tide of defecting users, academic libraries were quick to adopt discovery solutions that promised improved access and increased usage of their collections. yet despite the significant impact these technologies have on staff and users, libraries have not always undertaken a formal evaluation process when selecting a discovery product. some were early adopters that selected a product at a time when few other options existed on the market. others served as beta sites for particular vendors or simply chose the product offered by their existing ils or federated search provider. still others had a selection decision made for them by their library director or consortium. however, despite rapid adoption, the web-scale discovery market has only just begun to mature. as products emerge from their initial release and more information about them becomes available, the library community has gained a better understanding of how web-scale discovery services work and their particular strengths and weaknesses. in fact, some libraries that have already implemented a discovery service are currently considering switching products. whether your library is new to the discovery marketplace or poised for reentry, this article is intended to help you navigate to the best product to meet the needs of your institution.
it  covers  the  entire  process  from  soup  to  nuts   from  conducting  product  research  and  drafting  organizational  requirements  to  setting  up  local   trials  and  coordinating  user  testing.  by  combining  guiding  principles  with  practical  examples,  this   article  aims  to  offer  an  evaluation  model  rooted  in  best  practices  that  can  be  adapted  by  other   academic  libraries.   literature  review   as  the  adoption  of  web-­‐scale  discovery  services  continues  to  rise,  a  growing  body  of  literature  has   emerged  to  help  librarians  evaluate  and  select  the  right  product.  moore  and  greene  provide  a     information  technology  and  libraries  |  june  2015     21   useful  review  of  this  literature  summarizing  key  trends  such  as  the  timeframe  for  evaluation,  the   type  of  staff  involved,  the  products  being  evaluated,  and  the  methods  and  criteria  used  by   evaluators.3  much  of  the  early  literature  on  this  subject  focuses  on  comparisons  of  product   features  and  functionality.  rowe,  for  example,  offers  comparative  reviews  of  leading  commercial   services  on  the  basis  of  criteria  such  as  content,  user  interface,  pricing,  and  contract  options.4  yang   and  wagner  compare  commercial  and  open  source  discovery  tools  using  a  checklist  of  user   interface  features  that  includes  search  options,  faceted  navigation,  result  ranking,  and  web  2.0   features.5  vaughan  provides  an  in-­‐depth  look  at  discovery  services  that  includes  an  introduction   to  key  concepts,  detailed  profiles  on  each  major  service  provider,  and  a  list  of  questions  to   consider  when  selecting  a  product.6  a  number  of  authors  have  provided  useful  lists  of  criteria  to   help  guide  product  evaluations.  hoeppner,  for  example,  offers  a  list  of  key  factors  such  as  breadth   and  depth  of  indexing,  search  and  refinement  options,  branding  and  customization,  and  tools  for   saving,  organizing,  and  exporting  results.7  luther  and  kelly  and  hoseth  provide  a  similar  list  of   end-­‐user  features  but  also  include  institutional  considerations  such  as  library  goals,  cost,  vendor   support,  and  compatibility  with  existing  technologies.8     while  these  works  are  helpful  for  getting  a  better  sense  of  what  to  look  for  when  shopping  for  a   web-­‐scale  discovery  service,  they  do  not  offer  guidance  on  how  to  design  a  structured  evaluation   plan.  indeed,  many  library  evaluations  have  tended  to  rely  on  what  can  be  described  as  the   checklist  method  of  evaluation.  this  typically  involves  creating  a  checklist  of  desirable  features   and  then  evaluating  products  on  the  basis  of  whether  they  provide  these  features.  for  example,  in   developing  an  evaluation  process  for  rider  university,  chickering  and  yang  compiled  a  list  of   sixteen  user  interface  features,  examined  live  product  installations,  and  ranked  each  product   according  to  the  number  of  features  offered.9  brubaker,  leach-­‐murray,  and  parker  employed  a   similar  process  to  select  a  discovery  service  for  the  twenty-­‐three  members  of  the  private   academic  library  network  of  indiana  (palni).10  these  types  of  evaluations  suffer  from  a  number   of  limitations.  
first,  they  tend  to  rely  on  vendor  marketing  materials  or  reviews  of   implementations  at  other  institutions  rather  than  local  trials  and  testing.  second,  product   requirements  are  typically  given  equal  weight  rather  than  prioritized  according  to  importance.   third,  these  requirements  tend  to  focus  predominantly  on  user  interface  features  while  neglecting   equally  important  back  end  functionality  and  institutional  considerations.  finally,  these   evaluations  do  not  always  include  input  or  participation  from  library  staff,  users,  and  stakeholders.   the  first  published  work  to  offer  a  structured  model  for  evaluating  web-­‐scale  discovery  services   was  vaughan’s  “investigations  into  library  web-­‐scale  discovery  services.”11  vaughan  outlines  the   evaluation  process  employed  at  university  of  nevada,  las  vegas  (unlv),  which,  in  addition  to   developing  a  checklist  of  product  requirements,  also  included  staff  surveys,  interviews  with  early   adopters,  vendor  demonstrations,  and  coverage  analysis.  the  author  also  provides  several  useful   appendixes  with  templates  and  documents  that  librarians  can  use  to  guide  their  own  evaluation.   vaughan’s  work  also  appears  in  popp  and  dallis’  must-­‐read  compendium  planning  and   implementing  resource  discovery  tools  in  academic  libraries.12  this  substantial  volume  presents     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   22   forty  chapters  on  planning,  implementing,  and  maintaining  web-­‐scale  discovery  services,   including  an  entire  section  devoted  to  evaluation  and  selection.  in  it,  vaughan  elaborates  on  the   unlv  model  and  offers  useful  recommendations  for  creating  an  evaluation  team,  educating  library   staff,  and  communicating  with  vendors.13  metz-­‐wiseman  et  al.  offer  an  overview  of  best  practices   for  selecting  a  web-­‐scale  discovery  service  on  the  basis  of  interviews  with  librarians  from  fifteen   academic  institutions.14  freivalds  and  lush  of  penn  state  university  explain  how  to  select  a  web-­‐ scale  discovery  service  through  a  request  for  proposal  (rfp)  process.15  bietila  and  olson  describe   a  series  of  tests  that  were  done  at  the  university  of  chicago  to  evaluate  the  coverage  and   functionality  of  different  discovery  tools.16  chapman  et  al.  explain  how  personas,  surveys,  and   usability  testing  were  used  to  develop  a  user-­‐centered  evaluation  process  at  university  of   michigan.17     the  following  article  attempts  to  build  on  this  existing  literature,  combining  the  best  elements   from  evaluation  methods  employed  at  other  institutions  as  well  as  the  author’s  own,  with  the  aim   of  providing  a  comprehensive,  step-­‐by-­‐step  guide  to  evaluating  web-­‐scale  discovery  services   rooted  in  best  practices.   background   rutgers,  the  state  university  of  new  jersey,  is  a  public  research  university  consisting  of  thirty-­‐two   schools  and  colleges  offering  degrees  in  the  liberal  arts  and  sciences  as  well  as  programs  in   professional  and  continuing  education.  the  university  is  distributed  across  three  regional   campuses  serving  more  than  65,000  students  and  24,000  faculty  and  staff.  
the  rutgers  university   libraries  comprise  twenty-­‐six  libraries  and  centers  with  a  combined  collection  of  more  than  10.5   million  print  and  electronic  holdings.  the  libraries’  collections  and  services  support  the   curriculum  of  the  university’s  many  degree  programs  as  well  as  advanced  research  in  all  major   academic  disciplines.   in  january  2013,  the  libraries  appointed  a  cross-­‐departmental  team  to  research,  evaluate,  and   recommend  the  selection  of  a  web-­‐scale  discovery  service.  the  impetus  for  this  initiative  derived   from  a  demonstrated  need  to  improve  the  user  search  experience  on  the  basis  of  data  collected   over  the  last  several  years  through  ethnographic  studies,  user  surveys,  and  informal  interactions   at  the  reference  desk  and  in  the  classroom.  users  reported  high  levels  of  dissatisfaction  with   existing  library  search  tools  such  as  the  catalog  and  electronic  databases,  which  they  found   confusing  and  difficult  to  navigate.  above  all,  users  demanded  a  simple,  intuitive  starting  point   from  which  to  search  and  access  the  library’s  collections.  accordingly,  the  libraries  began   investigating  ways  to  improve  access  with  web-­‐scale  discovery.  the  evaluation  team  examined   offerings  from  four  leading  web-­‐scale  discovery  providers,  including  ebsco  discovery  service,   proquest’s  summon,  ex  libris’  primo,  and  oclc’s  worldcat  local.  the  process  lasted   approximately  nine  months  and  included  extensive  product  and  user  research,  vendor   demonstrations,  an  rfp,  reference  interviews,  trials,  surveys,  and  product  testing.  see  appendix  a   for  an  overview  of  the  evaluation  plan.     information  technology  and  libraries  |  june  2015     23   by  the  time  it  began  its  evaluation,  rutgers  was  already  a  latecomer  to  the  discovery  game.  most  of   our  peers  had  already  been  using  web-­‐scale  discovery  services  for  many  years.  however,  rutgers’   less-­‐than-­‐stellar  experience  with  federated  search  had  led  it  to  adopt  a  more  cautious  attitude   toward  the  latest  and  greatest  of  library  “holy  grails.”  this  wait-­‐and-­‐see  approach  proved  highly   beneficial  in  the  end  as  it  allowed  time  for  the  discovery  market  to  mature  and  gave  the  evaluation   team  an  opportunity  to  learn  from  the  successes  and  failures  of  early  adopters.  in  planning  its   evaluation,  the  rutgers  team  was  able  to  draw  on  the  experiences  of  earlier  pioneers  such  as   unlv,  penn  state,  the  university  of  chicago,  and  the  university  of  michigan.  it  was  on  the   metaphorical  shoulders  of  these  library  giants  that  rutgers  built  its  own  successful  evaluation   process.  what  follows  is  a  step-­‐by-­‐step  guide  for  evaluating  and  selecting  a  web-­‐scale  discovery   service  on  the  basis  of  best  practices  synthesized  from  the  literature  as  well  as  the  author’s  own   experiences  coordinating  the  evaluation  process  at  rutgers.  given  the  rapidly  changing  nature  of   the  discovery  market,  the  focus  of  this  article  is  on  the  process  rather  than  the  results  of  rutgers’   evaluation.  
while the results will undoubtedly be outdated by the time this article goes to press, the process is likely to remain relevant and useful for years to come.

form an evaluation team

the first step in selecting a web-scale discovery service is appointing a team that will be responsible for conducting the evaluation. composition of the team will vary depending on local practice and staffing, but should include representatives from a broad cross section of library units, including collections, public services, technical services, and systems. institutions with multiple campuses, schools, or library branches will want to make sure the interests of these constituencies are also represented. if feasible, the library should consider including actual users on the evaluation team. these may be members of an existing user advisory board or recruits from among the library's student employees and faculty liaisons. including users on your evaluation team will keep the process focused on user needs and ensure that the library selects the best product to meet them.

there are many reasons for establishing an inclusive evaluation team. first, discovery tools have broad implications for a wide range of library services and functions. therefore a diversity of library expertise is required for an informed and comprehensive evaluation. reference and instruction librarians will need to evaluate the functionality of the tool, the quality of results, and its role in the research process. collections staff will need to assess scope of coverage and congruency with the library's existing subscriptions. access services will need to assess how the tool handles local holdings information and integrates with borrowing and delivery services like interlibrary loan. catalogers will need to evaluate metadata requirements and procedures for harvesting local records. it staff will need to assess technical requirements and compatibility with existing infrastructure and systems.

second, depending on the size and goals of the institution, the product may be expected to serve a wide community of users with different needs, skill levels, and academic backgrounds. large
as  noted,   discovery  tools  impact  a  wide  range  of  library  services  and  therefore  require  careful  evaluation   from  the  perspectives  of  multiple  stakeholders.  furthermore,  these  tools  dramatically  change  the   nature  of  library  research,  and  not  everyone  in  your  organization  may  view  this  change  as  being   for  the  better.  despite  growing  rates  of  adoption,  debates  over  the  value  and  utility  of  web-­‐scale   discovery  continue  to  divide  librarians.18  according  to  one  survey,  securing  staff  buy-­‐in  is  the   biggest  challenge  academic  libraries  face  when  implementing  a  web-­‐scale  discovery  service.19   ensuring  broad  involvement  early  in  the  process  will  help  to  secure  organizational  buy-­‐in  and   support  for  the  selected  product.   while  broad  representation  is  important,  having  a  large  and  diverse  team  can  sometimes  slow   down  the  process;  schedules  can  be  difficult  to  coordinate,  members  may  have  competing  views  or   demands  on  their  time,  meetings  can  lose  focus  or  wander  off  topic,  etc.  the  more  members  on   your  evaluation  team,  the  more  difficult  the  team  may  be  to  manage.  one  strategy  for  managing  a   large  group  might  be  to  create  a  smaller,  core  team  with  all  other  members  serving  on  an  ad  hoc   basis.  the  core  team  functions  as  a  steering  committee  to  manage  the  project  and  calls  on  the  ad   hoc  members  at  different  stages  in  the  evaluation  process  where  their  input  and  expertise  is   needed.  another  strategy  would  be  to  break  the  larger  group  into  several  functional  teams,  each   responsible  for  evaluating  specific  aspects  of  the  discovery  tool.  for  example,  one  team  might   focus  on  functionality,  another  on  technology,  a  third  on  administration,  etc.  this  method  also  has   the  advantage  of  distributing  the  workload  among  team  members  and  breaking  down  a  complex   evaluation  process  into  discrete,  more  manageable  parts.   like  any  other  committee  or  taskforce,  your  evaluation  team  should  have  a  charge  outlining  its   responsibilities,  timetable  of  deliverables,  reporting  structure,  and  membership.  the  charge   should  also  include  a  vision  or  goals  statement  that  explicitly  states  the  underlying  assumptions   and  premises  of  the  discovery  tool,  its  purpose,  and  how  it  supports  the  library’s  larger  mission  of   connecting  users  with  information.20  although  frequently  highlighted  in  the  literature,  the   importance  of  defining  institutional  goals  for  discovery  is  often  overlooked  or  taken  for  granted.21   having  a  vision  statement  is  crucial  to  the  success  of  the  project  for  multiple  reasons.  first,  it   frames  the  evaluation  process  by  establishing  mutually  agreed-­‐upon  goals  and  priorities  for  the   product.  before  the  evaluation  can  begin,  the  team  must  have  a  clear  understanding  of  what   problems  the  discovery  service  is  expected  to  solve,  who  it  is  intended  to  serve,  and  how  it     information  technology  and  libraries  |  june  2015     25   supports  the  library’s  strategic  goals.  is  the  service  primarily  intended  for  undergraduates,  or  is  it   also  expected  to  serve  graduate  students  and  faculty?  
is  it  a  one-­‐stop  shop  for  all  information   needs,  a  starting  point  in  a  multi-­‐step  research  process,  or  merely  a  useful  tool  for  general  and   interdisciplinary  research?  second,  having  a  clear  vision  for  the  product  will  help  guide   implementation  and  assessment.  it  will  not  only  help  the  library  decide  how  to  configure  the   product  and  what  features  to  prioritize,  but  also  offer  explicit  benchmarks  by  which  to  evaluate   performance.  finally,  aligning  web-­‐scale  discovery  with  the  library’s  strategic  plan  will  help  put   the  project  in  wider  context  and  secure  buy-­‐in  across  all  units  in  the  organization.  having  a  clear   understanding  of  how  the  product  will  be  integrated  with  and  support  other  library  services  will   help  minimize  common  misunderstandings  and  ensure  wider  adoption.   educate  library  stakeholders   despite  the  quick  maturation  and  adoption  of  web-­‐scale  discovery  services,  these  technologies  are   still  relatively  new.  many  librarians  in  your  organization,  including  those  on  the  evaluation  team,   may  only  possess  a  cursory  understanding  of  what  these  tools  are  and  how  they  function.  creating   an  inclusive  evaluation  process  requires  having  an  informed  staff  that  can  participate  in  the   discussions  and  decision-­‐making  processes  leading  to  product  selection.  therefore  the  first  task  of   your  evaluation  team  should  be  to  educate  themselves  and  their  colleagues  on  the  ins  and  outs  of   web-­‐scale  discovery  services.  this  should  include  performing  a  literature  review,  collecting   information  about  products  currently  on  the  market,  and  reviewing  live  implementations  at  other   institutions.   at  rutgers,  the  evaluation  team  conducted  an  extensive  literature  review  that  resulted  in   annotated  bibliography  covering  all  aspects  of  web-­‐scale  discovery,  including  general   introductions,  product  reviews,  and  methodologies  for  evaluation,  implementation,  and   assessment.  all  team  members  were  encouraged  to  read  this  literature  to  familiarize  themselves   with  relevant  terminology,  products,  and  best  practices.  the  team  also  collected  product   information  from  vendor  websites  and  reviewed  live  implementations  at  other  institutions.  in  this   way,  members  were  able  to  familiarize  themselves  with  the  different  features  and  functionality   offered  by  each  vendor.   once  the  team  has  done  its  research,  it  can  begin  sharing  its  findings  with  the  rest  of  the  library   community.  vaughan  recommends  establishing  a  quick  and  easy  means  of  disseminating   information  such  as  an  internal  staff  website,  blog,  or  wiki  that  staff  can  visit  on  their  own  time.22   the  rutgers  team  created  a  private  libguide  that  served  as  a  central  repository  for  all  information   related  to  the  evaluation  process,  including  a  brief  introduction  to  web-­‐scale  discovery,   information  about  each  product,  recorded  vendor  demonstrations,  links  to  live  implementations,   and  an  annotated  bibliography.  also  included  was  information  about  the  team’s  ongoing  work,   including  the  group’s  charge,  timeline,  meeting  minutes,  and  reports.  
in  addition  to  maintaining  an   online  presence,  the  team  also  held  a  series  of  public  forums  and  workshops  to  educate  staff  about   the  nature  of  web-­‐scale  discovery  as  well  as  provide  updates  on  the  evaluation  process  and     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   26   respond  to  questions  and  concerns.  by  providing  staff  with  a  foundation  for  understanding  web-­‐ scale  discovery  and  the  process  by  which  these  products  were  to  be  evaluated,  the  team  sought  to   maximize  the  engagement  and  participation  of  the  larger  library  community.   schedule  vendor  demonstrations   once  everyone  has  a  conceptual  understanding  of  what  web-­‐scale  discovery  services  do  and  how   they  work,  it  is  time  to  begin  inviting  onsite  vendor  demonstrations.  these  presentations  give   library  staff  an  opportunity  to  see  these  products  in  action  and  ask  vendors  in-­‐depth  questions.   sessions  are  usually  led  by  a  sales  representative  and  product  manager  and  typically  include  a   brief  history  of  the  product’s  development,  a  demonstration  of  key  features  and  functionality,  and   an  audience  question-­‐and-­‐answer  period.  to  provide  a  level  playing  field  for  comparison,  the   evaluation  team  may  wish  to  submit  a  list  of  topics  or  questions  for  each  vendor  to  address  in   their  presentation.  this  could  be  a  general  outline  of  key  areas  of  interest  identified  by  the   evaluation  team  or  a  list  of  specific  questions  solicited  from  the  wider  library  community.  vaughan   offers  a  useful  list  of  questions  that  librarians  may  wish  to  consider  to  structure  vendor   demonstrations.23  one  tactic  used  by  the  evaluation  team  at  auburn  university  involved  requiring   vendors  to  use  their  products  to  answer  a  series  of  actual  reference  questions.24  this  not  only   precluded  them  from  using  canned  searches  that  might  only  showcase  the  strengths  of  their   products,  but  also  gave  librarians  a  better  sense  of  how  these  products  would  perform  out  in  the   wild  against  real  user  queries.  another  approach  might  be  to  invite  actual  users  to  the   demonstrations.  whether  you  are  fortunate  enough  to  have  users  on  your  evaluation  team  or  able   to  encourage  a  few  library  student  workers  to  attend,  your  users  may  raise  important  questions   that  your  staff  has  overlooked.   vendor  demonstrations  should  only  be  scheduled  after  the  evaluation  team  has  had  an   opportunity  to  educate  the  wider  library  community.  an  informed  staff  will  get  more  out  of  the   demos  and  be  better  equipped  to  ask  focused  questions.  as  vaughan  suggests,  demonstrations   should  be  scheduled  in  close  proximity  (preferably  within  the  same  month)  to  sustain  staff   engagement,  facilitate  retention  of  details,  and  make  it  easier  to  compare  services.25  with  the   vendor’s  permission,  libraries  should  also  consider  recording  these  sessions  and  making  them   available  to  staff  members  who  are  unable  to  attend.  at  the  conclusion  of  each  demonstration,   staff  should  be  invited  to  offer  their  feedback  on  the  presentation  or  ask  any  follow-­‐up  questions.   
this  can  be  accomplished  by  distributing  a  brief  paper  or  online  survey  to  the  attendees.   create  an  evaluation  rubric   perhaps  the  most  important  part  of  the  evaluation  process  is  developing  a  list  of  key  criteria  that   will  be  used  to  evaluate  and  compare  vendor  offerings.  once  the  evaluation  team  has  a  better   understanding  of  what  these  products  can  do  and  the  different  features  and  functionality  offered   by  each  vendor,  it  can  begin  defining  the  ideal  discovery  environment  for  its  institution.  this  often   takes  the  form  of  a  list  of  desirable  features  or  product  requirements.  the  process  for  generating     information  technology  and  libraries  |  june  2015     27   these  criteria  tends  to  vary  by  institution.  in  some  cases,  they  are  defined  by  the  team  leader  or   based  on  criteria  used  for  past  technology  purchases.26  in  other  cases,  criteria  are  compiled   through  a  review  of  the  literature.27  in  yet  other  cases,  they  are  developed  and  refined  with  input   from  library  staff  through  staff  surveys  and  meetings.28   one  important  element  missing  from  all  of  these  approaches  is  the  user.  to  ensure  the  evaluation   team  selects  the  best  tool  for  library  users,  product  requirements  should  be  firmly  rooted  in  an   assessment  of  user  needs.  the  university  of  michigan,  for  example,  used  persona  analysis  to   identify  common  user  needs  and  distilled  these  into  a  list  of  tangible  features  that  could  be  used   for  product  evaluation.29  other  tactics  for  assessing  user  needs  and  expectations  might  include   user  surveys,  interviews,  or  focus  groups.  these  tools  can  be  useful  for  gathering  information   about  what  users  want  from  your  web-­‐scale  discovery  system.  however,  these  methods  should  be   used  with  caution,  as  users  themselves  don’t  always  know  what  they  want,  particularly  from  a   product  they  have  never  used.  furthermore,  as  usability  experts  have  pointed  out,  what  users  say   they  want  may  not  be  what  they  actually  need.30  therefore  it  is  important  to  validate  data   collected  from  surveys  and  focus  groups  with  usability  testing.  to  reliably  determine  whether  a   product  meets  the  needs  of  your  users,  it  is  best  to  observe  what  users  actually  do  rather  than   what  they  say  they  do.   if  the  evaluation  team  has  a  short  timeframe  or  is  unable  to  undertake  extensive  user  research,  it   may  be  able  to  develop  product  requirements  on  the  basis  of  existing  research.  at  rutgers,  for   example,  the  libraries’  department  of  planning  and  assessment  conducts  a  standing  survey  to   collect  information  about  users’  opinions  of  and  satisfaction  with  library  services.  the  evaluation   team  was  able  to  use  this  data  to  learn  more  about  what  users  like  and  don’t  like  about  the   library’s  current  search  environment.  the  team  analyzed  more  than  700  user  comments  collected   from  2009  to  2012  related  to  the  library’s  catalog  and  electronic  resources.  comments  were   mapped  to  specific  types  of  features  and  functionality  that  users  want  or  expect  from  a  library   search  tool.  
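as a minimal sketch of this kind of tallying, assuming the mapping of each comment to one or more feature labels has already been done by the team (the sample comments and mappings below are invented for illustration, not drawn from the rutgers data):

```python
from collections import Counter

# each analyzed comment, already mapped by hand to one or more feature labels
# (these mappings are invented for illustration)
mapped_comments = [
    ["single point of access"],
    ["smart search (autocorrect/autocomplete)", "improved relevance ranking"],
    ["single point of access", "improved relevance ranking"],
    ["faceted browsing"],
]

# count how many comments mention each feature and list them by frequency
tally = Counter(label for comment in mapped_comments for label in comment)
for feature, count in tally.most_common():
    print(f"{feature}: {count} comment(s)")
```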
since most users don't typically articulate their needs in terms of concrete technical requirements, some interpretation was required on the part of the evaluation team. for example, the average user may not necessarily know what faceted browsing is, but a suggestion that there be "a way to browse through books by category instead of always having to use the search box" could reasonably be interpreted as a request for this feature. features were ranked in order of importance by the number of comments made about them. some of the most "requested" features included single point of access, "smart" search functionality such as autocorrect and autocomplete, and improved relevance ranking.

of course, user needs are not the only criteria to be considered when choosing a discovery service. organizational and staff needs must also be taken into account. user input is important for defining the functionality of the public interface, but staff input is necessary for determining back-end functionality and organizational fit. to the list of user requirements, the evaluation team added institutional requirements related to factors such as cost, coverage, customizability, and support. the team then conducted a library-wide survey inviting all staff to rank these requirements in order of importance and offer any additional requirements that should be factored into the evaluation.

combining the input from library staff and users, the evaluation team drafted a list of fifty-five product requirements (see appendix b), which became the basis for a comprehensive evaluation rubric that would be used to evaluate and ultimately select a web-scale discovery service. the design of the rubric was largely modeled after the one developed at penn state.31 requirements were arranged into five categories: content, functionality, usability, administration, and technology. each category was allocated to a subteam according to area of expertise that would be responsible for that portion of the evaluation. each requirement was assigned a weight according to its degree of importance: 3 = mandatory, 2 = desired, 1 = optional. each product was given a score based on how well it met each requirement: 3 = fully meets, 2 = partially meets, 1 = barely meets, 0 = does not meet. the total number of points awarded for each requirement was calculated by multiplying weight by score. the final score for each product was calculated by summing up the total number of points awarded (see appendix c).

this scoring method was particularly helpful in minimizing the influence of bias on the evaluation process. keep in mind that some stakeholders may possess personal preferences for or against a particular product because of current or past relations with the vendor, their experiences with the product while at another institution, or their perception of how the product might impact their own work.
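the weight-times-score arithmetic just described can be sketched compactly; a minimal illustration, in which only the weighting scheme comes from the rubric itself while the requirement names and sample scores are hypothetical:

```python
# weights: 3 = mandatory, 2 = desired, 1 = optional
weights = {                       # hypothetical requirement names
    "single point of access": 3,
    "faceted browsing": 2,
    "cover images and reviews": 1,
}

def final_score(product_scores: dict) -> int:
    """sum of weight * score, where each score is 0 (does not meet) through 3 (fully meets)."""
    return sum(weights[req] * product_scores.get(req, 0) for req in weights)

# hypothetical scores for one vendor's product
vendor_a = {"single point of access": 3, "faceted browsing": 2, "cover images and reviews": 3}
print(final_score(vendor_a))  # 3*3 + 2*2 + 1*3 = 16
```

laying the arithmetic out this way also makes it easy to recompute the totals if the library later decides to re-weight a requirement.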
by  establishing  a  set  of  predefined  criteria,  rooted  in  local  needs  and  measured   according  to  clear  and  consistent  standards,  the  team  adopted  an  evaluation  model  that  was  not   only  user-­‐centered,  but  also  allowed  for  a  fair,  unbiased,  and  systematic  evaluation  of  vendor   offerings.  this  is  particularly  important  for  libraries  that  must  go  through  a  formal  procurement   process  to  purchase  a  web-­‐scale  discovery  service.   draft  the  rfp   once  the  evaluation  team  has  defined  its  product  requirements  and  established  a  method  for   evaluating  the  products  in  the  marketplace,  it  can  set  to  work  drafting  a  formal  rfp.  some   institutions  may  be  able  to  forego  the  rfp  process.  others,  like  rutgers,  are  required  to  go  through   a  competitive  bidding  process  for  any  goods  and  services  purchased  over  a  certain  dollar  amount.   the  only  published  model  on  selecting  a  discovery  service  through  the  rfp  process  is  offered  by   freivalds  and  lush.32  the  authors  provide  a  brief  overview  of  the  pros  and  cons  of  using  an  rfp,   describe  the  process  developed  at  penn  state,  and  offer  several  useful  templates  to  help  guide  the   evaluation.   the  rfp  lets  vendors  know  that  the  organization  is  interested  in  their  product,  outlines  the   organization’s  requirements  for  said  product,  and  gives  the  vendors  an  opportunity  to  explain  in   detail  how  their  product  meets  these  requirements.  rfps  are  usually  written  in  collaboration  with     information  technology  and  libraries  |  june  2015     29   your  university’s  purchasing  department  who  typically  provides  a  template  for  this  purpose.  at  a   minimum,  your  rfp  should  include  the  following:   • background  information  about  the  library,  including  size,  user  population,  holdings,  and   existing  technical  infrastructure   • a  description  of  the  product  being  sought,  including  product  requirements,  services  and   support  expected  from  the  vendor,  and  the  anticipated  timeline  for  implementation   • a  summary  of  the  criteria  that  will  be  used  to  evaluate  proposals,  the  deadline  for   submission,  and  the  preferred  format  of  responses   • any  additional  terms  or  conditions  such  as  requiring  vendors  to  provide  references,  onsite   demonstrations,  trial  subscriptions,  or  access  to  support  and  technical  documentation   • information  about  who  to  contact  regarding  questions  related  to  the  rfp   rfps  are  useful  not  only  because  they  force  the  library  to  clearly  articulate  its  needs  for  web-­‐scale   discovery,  but  also  because  they  produce  a  detailed,  written  record  of  product  information  that   can  be  referenced  throughout  the  evaluation  process.  the  key  component  of  rutgers’  rfp  was  a   comprehensive,  135-­‐item  questionnaire  that  asked  vendors  to  spell  out  in  painstaking  detail  the   design,  technical,  and  functional  specifications  of  their  products  (see  appendix  d).  many  of  the   questions  were  either  borrowed  from  the  existing  literature  or  submitted  by  members  of  the   evaluation  team.  all  questions  were  directly  mapped  to  criteria  from  the  team’s  evaluation  rubric.   
the  responses  were  used  to  determine  how  well  each  product  met  these  criteria  and  factored  into   product  scoring.  vendors  were  given  one  month  to  respond  to  the  rfp.   interview  current  customers   while  vendor  marketing  materials,  demonstrations,  and  questionnaires  are  important  sources  of   product  information,  vendor  claims  should  not  simply  be  taken  at  face  value.  to  obtain  an   impartial  assessment  of  the  products  under  consideration,  the  evaluation  team  should  reach  out  to   current  customers.  there  are  several  ways  to  identify  current  discovery  service  subscribers.  many   published  overviews  of  web-­‐scale  discovery  services  offer  lists  of  example  implementations  for   each  major  discovery  provider.33  most  vendors  also  provide  a  list  of  subscribers  on  their  website   or  community  wiki  (or  will  provide  one  on  request).  and,  of  course,  there  is  also  marshall   breeding’s  invaluable  website,  library  technology  guides,  which  provides  up-­‐to-­‐date  information   about  technology  products  used  by  libraries  around  the  world.34  the  advanced  search  allows  you   to  filter  libraries  by  criteria  such  as  type,  collection  size,  geographic  area,  and  ils,  thereby  making   it  easier  to  identify  institutions  similar  to  your  own.   as  part  of  the  rfp  process,  all  four  vendors  were  required  to  provide  references  for  three  current   academic  library  customers  of  equivalent  size  and  classification  to  rutgers.  these  twelve   references  were  then  invited  to  take  an  online  survey  asking  them  to  share  their  opinions  of  and   experiences  with  the  product  (see  appendix  e).  the  survey  consisted  of  a  series  of  likert-­‐scale   questions  asking  each  reference  to  rate  their  satisfaction  with  various  functions  and  features  of     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   30   their  discovery  service.  this  was  followed  by  many  in-­‐depth  written  response  questions  regarding   topics  such  as  coverage,  quality  of  results,  interface  usability,  customization,  and  support.  follow-­‐ up  phone  interviews  were  conducted  in  cases  where  additional  information  or  clarification  was   needed.   the  surveys  permitted  the  evaluation  team  to  collect  feedback  from  current  customers  in  a  way   that  was  minimally  obtrusive  while  allowing  for  easy  analysis  and  comparison  of  responses.  it  also   provided  a  necessary  counterbalance  to  vendor  claims  by  giving  the  team  a  much  more  candid   view  of  each  product’s  strengths  and  weaknesses.  the  reference  interviews  helped  highlight   issues  and  areas  of  concern  that  were  frequently  minimized  or  glossed  over  in  communications   with  vendors  such  as  gaps  in  coverage,  inconsistent  metadata,  duplicate  results,  discoverability  of   local  collections,  and  problems  with  known-­‐item  searching.   configure  and  test  local  trials   although  the  evaluation  team  should  strive  to  collect  as  much  product  information  from  as  many   sources  as  possible,  no  amount  of  research  can  effectively  substitute  for  a  good  old-­‐fashioned  trial   evaluation.  
conducting trials using the library's own collections and local settings is the best way to gain first-hand insight into how a discovery service works. for some libraries, the expenditure of time and effort involved in configuring a web-scale discovery service can make the prospect of conducting trials prohibitive. as a result, many discovery evaluations tend to rely on testing existing implementations at other institutions. however, this method of evaluation only scratches the surface. for one thing, the evaluation team is only able to observe the front-end functionality of the public interface. but setting up a local trial gives the library an opportunity to peek under the hood and learn about back-end administration, explore configuration and customization options, attain a deeper understanding of the composition of the central index, and get a better feel for what it is like working with the vendor. second, discovery services are highly customizable, and the availability of certain features, functionality, and types of content varies by institution. as hoeppner points out, no individual site is capable of demonstrating the "full range of possibilities" available from any vendor.35 the presence or absence of certain features has as much to do with local library decisions as with any inherent limitations of the product. finally, establishing trials gives the evaluation team an opportunity to see how a particular discovery service performs within its own local environment. the ability to see how the product works with the library's own records, ils, link resolver, and authentication system allows the team to evaluate the compatibility of the discovery service with the library's existing technical infrastructure.

at rutgers, one of the goals of the rfp was to help narrow the pool of potential candidates from four to two. the evaluation team was asked to review vendor responses and apply the evaluation rubric to assign each a preliminary score on the basis of how well they met the library's requirements. the two top-scoring candidates would then be selected for a trial evaluation that would allow the team to conduct further testing and make a final recommendation. however, after the proposals were reviewed, the scores for three of the products were so close that the team decided to trial all three. the one remaining product scored notably lower than its competitors and was dropped from further consideration.

configuring trials for three different web-scale discovery services was no easy task, to be sure. an implementation team was formed to work with the vendors to get the trials up and running. the team received basic training for each product and was given full access to support and technical documentation. working with the vendors, the implementation team set to work loading the library's records and configuring local settings.
for the most part, the trials were basic out-of-the-box implementations with minimal customization. the vendors were willing to do much of the configuration work for us, but it was important that the team learn and understand the administrative functionality of each product, as this was an integral part of the evaluation process. all vendors agreed to a three-month trial period during which the evaluation team ran their products through a series of tests assessing three key areas: coverage, usability, and relevance ranking.

the importance of product testing cannot be overstated. as previously mentioned, web-scale discovery affects a wide variety of library services and, in most cases, will likely serve as the central point of access to the library's collections. before committing to a product, the library should have an opportunity to conduct independent testing to validate vendor claims and ensure that their products function according to the library's expectations. to ensure that critical issues are uncovered, testing should strive to simulate as much as possible the environment and behavior of your users by employing sample searches and strategies that they themselves would use. in fact, wherever possible, users should be invited to participate in testing and offer their feedback about the products under consideration. testing checklists and scripts must also be created to guide testers and ensure consistency throughout the process. as mandernach and condit fagan point out, although product testing is time-consuming and labor-intensive, it will ultimately save the time of your users and staff, who would otherwise be the first to encounter any bugs, and help avoid early unfavorable impressions of the product.36

the first test the evaluation team conducted aimed at evaluating the coverage and quality of indexing of each discovery product (see appendix f). loosely borrowing from methods employed at the university of chicago, twelve library subject specialists were recruited to help assess coverage within their discipline.37 each subject specialist was asked to perform three search queries representing popular research topics in their discipline and compare the results from each discovery service with respect to breadth of coverage and quality of indexing. in scoring each product, subject specialists were asked to consider the following questions:

• do the search results demonstrate broad coverage of the variety of subjects, formats, and content types represented in the library's collection?
• do any particular types of content seem to dominate the results (books, journal articles, newspapers, book reviews, reference materials, etc.)?
• are the library's local collections adequately represented in the results?
• do any relevant resources appear to be missing from the search results (e.g., results from an especially relevant database or journal)?
• do item records contain complete and accurate source information?
• do item records contain sufficient metadata (citations, subject headings, abstracts, etc.) to help users identify and evaluate results?

participants were asked to rate the performance of each discovery service in terms of coverage and indexing on a scale of 1 to 3 (1 = poor, 2 = average, 3 = good). although results varied by discipline, one product received the highest average scores in both areas. in their observations, participants frequently noted that it appeared to have better coverage and produce a greater variety of sources, while results from the other two products tended to be dominated by specific source types like newspapers or reference books. the same product was also noted to have more complete metadata, while the other two frequently produced results that lacked additional information like abstracts and subject terms.

the second test aimed to evaluate the usability of each discovery service. five undergraduate students of varying grade levels and areas of study were invited to participate in a task-based usability test (see appendix g). the purpose of the test was to assess users’ ability to use these products to complete common research tasks and to determine which product best meets their needs. students were asked to use all three products to complete five tasks while sharing their thoughts aloud. for the purposes of testing, products were referred to by letters (a, b, c) rather than by name. because participants were asked to complete the same tasks using each product, it was assumed that their ability to complete tasks might improve as the test progressed. accordingly, product order was randomized to minimize this potential bias. each session lasted approximately forty-five minutes and included a pre-test questionnaire to collect background information about the participant as well as a post-test questionnaire to ascertain their opinions of the products being tested. because users were being asked to test three different products, the number of tasks was kept to a minimum and focused only on basic product functionality. more comprehensive usability testing would be conducted after selection to help guide implementation and improve the selected product.
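the article does not describe the exact randomization scheme used to assign product order, so the short python sketch below illustrates only one way such counterbalancing might be arranged; the participant labels and the rotation through the six possible orderings are assumptions made purely for illustration.

```python
import itertools

# products were referred to by letters rather than by name during testing
products = ["a", "b", "c"]

# hypothetical participant labels; the study used five undergraduate students
participants = ["p1", "p2", "p3", "p4", "p5"]

# all six possible orderings of the three products
orders = list(itertools.permutations(products))

# rotate through the orderings so that no single order dominates the sessions
for i, participant in enumerate(participants):
    order = orders[i % len(orders)]
    print(participant, "tests the products in the order:", " -> ".join(order))
```

with only five participants and six possible orderings, no assignment can use every order equally often; a simple rotation like this (or a random shuffle per participant) merely spreads any learning effect as evenly as practical across the products.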
using each product, participants were asked to find three relevant sources on a topic, email the results to themselves, and attempt to obtain full text for at least one item. although the team noted potential problems in users’ interactions with all of the products, participants had slightly higher success rates with one product than with the others. furthermore, in the post-test questionnaire, four out of five users stated that they preferred this product to the other two, noting that they found it easier to navigate, obtained more relevant results, and had notably less difficulty accessing full text. a follow-up question asked participants how these products compared with the search tools currently offered by the library. almost all participants cited disappointing previous experiences with library databases and the catalog and suggested that a discovery tool might make finding materials easier. however, several users also suggested that none of these tools were “perfect” and that, while these discovery services may have the “potential” to improve their library experience, all could use a good deal of improvement, particularly in returning relevant results.

therefore, the evaluation team embarked on a third and final test of its top three discovery candidates, the goal of which was to evaluate relevance ranking. while usability testing is helpful for highlighting problems with the design of an interface, it is not always the best method for assessing the quality of results. in user testing, students frequently retrieved or selected results that were not relevant to the topic, and it was not always clear whether this outcome was attributable to a flaw in product design or to the users’ own ability to construct effective search queries and evaluate results. determining relevance is a subjective process and one that requires a certain level of expertise in the relevant subject area. therefore, to assess relevance ranking among the competing discovery services, the evaluation team turned once again to its library subject specialists.

echoing countless other user studies, our testing indicated that most users do not often scroll beyond the first page of results. a discovery service that harvests content from a wide variety of sources must therefore have an effective ranking algorithm capable of surfacing the most useful and relevant results. to evaluate relevance ranking, subject specialists were asked to construct a search query related to their area of expertise, perform this search in each discovery tool, and rate the relevance of the first ten results. results were recorded in the exact order retrieved and rated on a scale of 0–3 (0 = not relevant, 1 = somewhat relevant, 2 = relevant, 3 = very relevant).

two values were used to evaluate the relevance-ranking algorithm of each discovery service. relevance was assessed by calculating cumulative gain, the sum of all relevance scores. for example, if the first ten results returned by a discovery product each received a score of 3 because they were all deemed to be “very relevant,” the product would receive a cumulative gain score of 30. ranking was assessed by calculating discounted cumulative gain, which discounts the relevance score of results on the basis of where they appear in the rankings. on the assumption that the relevance of results should decrease with rank, each result after the first was assigned a discount factor of 1/log2(i), where i is the result’s rank. the relevance score of each result is multiplied by this discount factor to produce its discounted gain; for example, a result with a relevance score of 3 but a rank of 4 is discounted to 3/log2(4) = 1.5. discounted cumulative gain is the sum of all of these discounted gain scores.38
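to make the arithmetic concrete, the following python sketch computes cumulative gain and discounted cumulative gain for a single search; the ten relevance ratings are invented for illustration and are not drawn from the rutgers data. in the actual evaluation, the same calculations were embedded as formulas in an excel worksheet, as described below.

```python
from math import log2

# hypothetical relevance ratings (0-3) for the first ten results of one search,
# recorded in the exact order in which the discovery service returned them
ratings = [3, 2, 3, 3, 1, 0, 2, 1, 0, 2]

# cumulative gain: the simple sum of all relevance scores
cumulative_gain = sum(ratings)  # 17 for this example

# discounted cumulative gain: the first result keeps its full score, and each
# result after the first is multiplied by a discount factor of 1/log2(rank),
# so a score of 3 at rank 4 contributes 3/log2(4) = 1.5
discounted_cumulative_gain = ratings[0] + sum(
    score / log2(rank) for rank, score in enumerate(ratings[1:], start=2)
)

print(cumulative_gain, round(discounted_cumulative_gain, 2))
```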
eighteen librarians conducted a total of twenty-six searches. using a microsoft excel worksheet, participants were asked to record their search query, the titles of the first ten results, and the relevance score of each result (see appendix h). formulas for cumulative gain and discounted cumulative gain were embedded in the worksheet so that these values were calculated automatically. after all the values were calculated, one product had once again outperformed all others. in the majority of searches conducted, librarians rated its results as more relevant than those of its competitors. however, librarians were quick to point out that they were not entirely satisfied with the results from any of the three products. in their observations, they noted many of the same issues that were raised in previous rounds of testing, such as incomplete metadata, duplicate results, and overrepresentation of certain types of content.

at the end of the trial period, the evaluation team once again invited feedback from the library staff. an online library-wide survey was distributed in which staff members were asked to rank each discovery product according to several key requirements drawn from the team’s evaluation rubric. each requirement was accompanied by one or more questions for participants to consider in their evaluation, and the final question asked participants to rank the three candidates in order of preference. links to the trial implementations of all three products were included in the survey, and the email announcement also included a link to the team’s website, where participants could find more information about web-scale discovery. because participating in the survey required staff to review and interact with all three products, the team estimated that it would take forty-five minutes to an hour to complete (depending on the staff member’s familiarity with the products). given the amount of time and effort required for participation, relevant committees were also encouraged to review the trials and submit their evaluations as a group. the response rate for the survey was much lower than expected, possibly because of the amount of effort involved or because a large number of staff did not feel qualified to comment on certain aspects of the evaluation. however, among the staff members who did respond, one product was rated more highly than all others, and it was the same product that had received the highest scores in all three rounds of testing.

make final recommendation

at this stage in the process, your evaluation team should have collected enough data to make an informed selection decision.
your decision should take into consideration all of the information gathered throughout the evaluation process, including user and product research, vendor demonstrations, rfp responses, customer references, staff and user feedback, trials, and product testing. in preparation for the evaluation team’s final meeting, each subteam was asked to revisit the evaluation rubric. using all of the information that had been collected and made available on the team’s website, each subteam was asked to score the remaining three candidates on the basis of how well they met the requirements in its assigned category and to submit a report explaining the rationale for its scores. at the final meeting, a representative from each subteam presented its report to the larger group, and the entire team reviewed the scores awarded to each product. once a consensus was reached on the scoring, the final results were tabulated and the product that received the highest total score was selected.
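the rubric in appendix c assigns each requirement a weight (1 = optional, 2 = desired, 3 = mandatory) and each product a score (0 = does not meet through 3 = fully meets), with points calculated as weight × score. as a rough sketch of how the final tabulation might be done, the python fragment below totals weighted points for a few requirements taken from the rubric; the weights and scores shown are hypothetical and are not the values awarded at rutgers.

```python
# weights: 1 = optional, 2 = desired, 3 = mandatory (values here are hypothetical)
weights = {"relevancy ranking": 3, "faceted browsing": 3, "visual searching": 1}

# scores: 0 = does not meet ... 3 = fully meets (values here are hypothetical)
scores = {
    "product a": {"relevancy ranking": 2, "faceted browsing": 3, "visual searching": 0},
    "product b": {"relevancy ranking": 3, "faceted browsing": 2, "visual searching": 1},
}

# points = weight x score; a product's total is the sum of its points
for product, product_scores in scores.items():
    total = sum(weights[req] * score for req, score in product_scores.items())
    print(product, "total points:", total)
```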
once the evaluation team has reached a conclusion, its decision needs to be communicated to library stakeholders. the team’s findings should be compiled in a final report that includes a brief introduction to the subject of web-scale discovery, the factors motivating the library’s decision to acquire a discovery service, an overview of the methods that were used to evaluate these services, and a summary of the team’s final recommendation. of course, considering that few people in your organization may ever actually read the report, the team should seek out additional opportunities to present its findings to the community. the rutgers evaluation team presented its recommendation report on three different occasions. the first was a joint meeting of the library’s two major governing councils. after securing the support of the councils, the group’s recommendation was presented at a meeting of library administrators for final approval. once approved, a third and final presentation was given at an all-staff meeting and included a demonstration of the selected product. by taking special care to communicate the team’s decision openly and to make transparent the process used to reach it, the evaluation team not only demonstrated the depth of its research but also secured organizational buy-in and support for its recommendation.

conclusion

selecting a web-scale discovery service is a large and important undertaking that involves a significant investment of time, staff, and resources. finding the right match begins with a thorough and carefully planned evaluation process. the evaluation process outlined here is intended as a blueprint that similar institutions may wish to follow. however, every library has different needs, means, and goals, and while this process served rutgers well, certain elements may not be applicable to your institution. regardless of what method your library chooses, it should strive to create an evaluation process that is inclusive, goal-oriented, data-driven, user-centered, and transparent.

inclusive

web-scale discovery impacts a wide variety of library services and functions; therefore a complete and informed evaluation requires the participation and expertise of a broad cross section of library units. furthermore, as with the adoption of any new technology, the implementation of a web-scale discovery service can be potentially disruptive. these products introduce significant and sometimes controversial changes to staff workflows, user behavior, and library usage. ensuring broad involvement in the evaluation process can help allay potential concerns, reduce tensions, and ensure wider adoption.

goal-oriented

it can be easy to be seduced by new technologies simply because they are new, but merely adopting these technologies without taking the time to reflect on and communicate their purpose and goals can be a recipe for disaster. to select the best discovery tool for your library, evaluators must have a clear understanding of the problems it is intended to solve, the audience it is meant to serve, and the role it will play within the library’s larger mission. articulating the library’s vision and goals for web-scale discovery is crucial for establishing an evaluation plan, developing a prioritized list of product requirements, understanding what questions to ask vendors, and setting benchmarks by which to evaluate performance.

data-driven

to ensure an informed, fair, and impartial evaluation, evaluators should strive to incorporate data-driven practices into all of their decision-making. many library stakeholders, including members of the evaluation team, may enter the evaluation process with preexisting views on web-scale discovery, untested assumptions about user behavior, or strong opinions about specific products and vendors. to minimize the influence of these potential biases on the selection process, it is important that the team be able to demonstrate the rationale for its decisions through verifiable data. evaluating web-scale discovery services requires extensive research and should include data collected through user research, staff surveys, collections analysis, and product testing. all of this data should be carefully collected, analyzed, and used to inform the team’s final recommendation.

user-centered

if the purpose of adopting a web-scale discovery service is to better serve your users, then you should try as much as possible to involve users in the evaluation and selection process. this means including users on the evaluation team, grounding product requirements in user research, and gathering user feedback through surveys, focus groups, and product testing. this last step is especially important.
no other piece of information gathered throughout the evaluation process will be as helpful or revealing as actually watching users use these products to complete real-life research tasks. user testing is the best and, frankly, the only way to validate claims from both vendors and librarians about what your users want and need from your library’s search environment.

transparent

because web-scale discovery impacts library staff and users in significant ways, its reception within academic libraries has been somewhat mixed. as previously mentioned, securing staff buy-in is often one of the most difficult obstacles libraries face when introducing a new web-scale discovery service. while encouraging broad participation in the evaluation process helps facilitate buy-in, not every library stakeholder will be able to participate. therefore it is important that the evaluation team make a special effort to communicate its work and keep the library community updated on its progress. this can be done by creating a staff website or blog devoted to the evaluation process, sending periodic updates via the library’s electronic discussion list, holding public forums and demonstrations, regularly soliciting staff feedback through surveys and polls, and widely distributing the team’s findings and final report. these communications should help secure organizational support by making clear that the team’s recommendations are based on a thorough evaluation that is inclusive, goal-oriented, data-driven, user-centered, and transparent.

appendix a. overview of web-scale discovery evaluation plan

form an evaluation team: create an evaluation team representing a broad cross section of library units. draft a charge outlining the library’s goals for web-scale discovery and the team’s responsibilities, timetable, reporting structure, and membership. 1

educate library stakeholders: create a staff website or blog to disseminate information about web-scale discovery and the evaluation process. host workshops and public forums to educate staff, share information, and maximize community participation. 2

schedule vendor demonstrations: invite vendors for onsite product demonstrations. schedule visits in close proximity and provide vendors with an outline or list of questions in advance. invite all members of the library community to attend and offer feedback. 3

create an evaluation rubric: create a comprehensive, prioritized list of product requirements rooted in staff and user needs. develop a fair and consistent scoring method for determining how each product meets these requirements. 4

draft the rfp: if required, draft an rfp to solicit bids from vendors. include information about your library, a summary of your product requirements and evaluation criteria, and any terms or conditions of the bidding process. 5

interview current customers: obtain candid assessments of each product by interviewing current customers. ask customers to share their experiences and offer assessments on factors such as coverage, design, functionality, customizability, and vendor support.
6 configure and test local trials after narrowing down the options, select the top candidates for a trial evaluation. test the products with users and staff to evaluate and compare coverage, functionality, and result quality. 7 make final recommendation make an informed recommendation based on all of the information collected. compile the results of your research in a final report and communicate the team’s findings to the library community. 8   evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   38   appendix  b.  product  requirements  for  a  web-­‐scale  discovery  service   #   requirement   description   questions  to  consider   1   content           1.1   scope   provides  access  to  the  broadest   possible  spectrum  of  library   content  including  books,   periodicals,  audiovisual   materials,  institutional   repository  items,  digital   collections,  and  open  access   content   with  how  many  publishers  and   aggregators  does  the  vendor  have   license  agreements?  are  there  any   notable  exclusions?  how  many   total  unique  items  are  included  in   the  central  index?  how  many  open   access  resources  are  included?   what  percentage  of  content  is   mutually  licensed?  what  is  the   approximate  disciplinary,  format,   and  date  breakdown  of  the  central   index?  what  types  of  local  content   can  be  ingested  into  the  index  (ils   records,  institutional  repository   items,  digital  collections,  research   guides,  webpages,  etc.)?  can  the   library  customize  what  content  is   exposed  to  its  users?   1.2   depth   provides  the  richest  possible   metadata  for  all  indexed  items,   including  citations,  descriptors,   abstracts,  and  full  text   what  level  of  indexing  is  provided?   what  percentage  of  items  contains   only  citations?  what  percentage   includes  abstracts?  what   percentage  includes  full  text?   1.3   currency   provides  regular  and  timely   updates  of  licensed  content  as   well  as  on-­‐demand  updates  of   local  content     how  frequently  is  the  central  index   updated?  how  frequently  are  local   records  ingested?  can  the  library   initiate  a  manual  harvest  of  local   records?  can  the  library  initiate  a   manual  harvest  of  a  specific  subset   of  local  records?     information  technology  and  libraries  |  june  2015     39   1.4   data  quality   provides  clear  and  consistent   indexing  of  records  from  a   variety  of  different  sources  and   in  a  variety  of  different  formats     what  record  formats  are   supported?  what  metadata  fields   are  required  for  indexing?  how  is   metadata  from  different  sources   normalized  into  a  universal   metadata  schema?  how  are   controlled  vocabularies  created?  to   what  degree  can  collections  from   different  sources  have  their  own   unique  field  information  displayed   and/or  calculated  into  the   relevancy-­‐ranking  algorithm  for   retrieval  purposes?   1.5   language   supports  indexing  and   searching  of  foreign-­‐language   materials  using  non-­‐roman   characters   does  the  product  support  indexing   and  searching  of  foreign-­‐language   materials  using  non-­‐roman   characters?  what  languages  and   character  sets  are  supported?   
1.6   federated   searching   supports  incorporation  of   content  not  included  in  the   central  index  via  federated   searching   does  the  vendor  offer  federated   searching  of  sources  not  included   in  the  central  index?  how  are  these   sources  integrated  into  search   results?  is  there  an  additional  cost   for  adding  connectors  to  these   sources?   1.7   unlicensed  content   includes  and  makes   discoverable  additional  content   not  owned  or  licensed  by  the   library   are  local  collections  from  other   libraries  using  the  discovery   service  exposed  to  all  customers?   are  users  able  to  search  content   that  is  included  in  the  central  index   but  not  licensed  or  owned  by  the   host  library?     2   functionality           2.1   smart  searching   provides  “smart”  search   features  such  as  autocomplete,   autocorrect,  autostemming,   thesaurus  matching,  stop-­‐word   filtering,  keyword  highlighting,   etc.   what  “smart”  features  are  included   in  the  search  engine?  are  these   features  customizable?  can  they  be   enabled  or  disabled  by  the  library?     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   40   2.2   advanced  searching   provides  advanced  search   options  such  as  field  searching,   boolean  operators,  proximity   searching,  nesting,   wildcard/truncation,  etc.   what  types  of  advanced  search   options  are  available?  are  these   options  customizable?  can  they  be   enabled  or  disabled  by  the  library?   2.3   search  limits   provides  limits  for  refining   search  results  according  to   specified  criteria  such  as  peer-­‐ review  status,  full-­‐text   availability,  or  location   does  the  product  include   appropriate  limits  for  filtering   search  results?     2.4   faceted  browsing   allows  users  to  browse  the   index  by  facets  such  as  format,   author,  subject,  region,  era,  etc.   what  types  of  facets  are  available   for  browsing?  can  users  select   multiple  facets  in  different   categories?  are  facets  easy  to  add   or  remove  from  a  search?  are  facet   categories,  labels,  and  ordering   customizable?  can  facets  be   customized  by  format  or  material   type  (e.g.,  music,  film,  etc.)?   2.5   scoped  searching   provides  discipline-­‐,  format-­‐,  or   location-­‐specific  search  options   that  allow  searches  to  be   limited  to  a  set  of  predefined   resources  or  criteria   can  the  library  construct  scoped   search  portals  for  specific  campus   libraries,  disciplines,  or  formats?   can  these  portals  be  customized   with  different  search  options,   facets,  relevancy  ranking,  or  record   displays?   2.6   visual  searching   provides  visual  search  and   browse  options  such  as  tag   clouds,  cluster  maps,  virtual   shelf  browsing,  geo-­‐browsing,   etc.   does  the  product  provide  any   options  for  visualizing  search   results  beyond  text-­‐based  lists?  can   data  visualization  tools  be   integrated  into  search  result   display  with  additional   programming?   
2.7   relevancy  ranking   provides  useful  results  using  an   effective  and  locally   customizable  relevancy  ranking   algorithm   what  criteria  are  used  to  determine   relevancy  (term  frequency  and   placement,  format,  document   length,  publication  date,  user   behavior,  scholarly  value,  etc.)?   how  does  it  rank  items  with   varying  levels  of  metadata  (e.g.,   citation  only  vs.  citation  +  full  text)?   is  relevancy  ranking  customizable     information  technology  and  libraries  |  june  2015     41   by  the  library?  by  the  user?     2.8   deduplication   has  an  effective  method  for   identifying  and  managing   duplicate  records  within  results   does  the  product  employ  an   effective  method  of  deduplication?   2.9   record  grouping   groups  different  manifestations   of  the  same  work  together  in  a   single  record  or  cluster   does  the  product  employ  frbr  or   some  similar  method  to  group   multiple  manifestations  of  the  same   work?   2.10   result  sorting   provides  alternative  options   for  sorting  results  by  criteria   such  as  date,  title,  author,  call   number,  etc.   what  options  does  the  product   offer  for  sorting  results?   2.11   item  holdings   provides  real-­‐time  local   holdings  and  availability   information  within  search   results   how  does  the  product  provide  local   holdings  and  availability   information?  is  this  information   displayed  in  real-­‐time?  is  this   information  displayed  on  the   results  screen  or  only  within  the   item  record?   2.12   openurl   supports  openurl  linking  to   facilitate  seamless  access  from   search  results  to  electronic  full   text  and  related  services   how  does  the  product  provide   access  to  the  library’s  licensed  full-­‐ text  content?  are  openurl  links   displayed  on  the  results  screen  or   only  in  the  item  record?     2.13   native  record   linking   provides  direct  links  to  original   records  in  their  native  source   does  the  product  offer  direct  links   to  original  records  allowing  users   to  easily  navigate  from  the   discovery  service  to  the  record   source,  whether  it  is  a  subscription   database,  the  library  catalog,  or  the   institutional  repository?   2.14   output  options   provides  useful  output  options   such  as  print,  email,  text,  cite,   export,  etc.   what  output  options  does  the   product  offer?  what  citation   formats  are  supported?  which   citation  managers  are  supported?   are  export  options  customizable?     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   42   2.15   personalization   provides  personalization   features  that  allow  users  to   customize  preferences,  save   results,  bookmark  items,  create   lists,  etc.   what  personalization  features  does   the  product  offer?  are  these   features  linked  to  a  personal   account  or  only  session-­‐based?   must  users  create  their  own   accounts  or  can  accounts  be   automatically  linked  to  their   institutional  id?   2.16   recommendations   provides  recommendations  to   help  users  locate  similar  items   or  related  resources   does  the  product  provide  item   recommendations  to  help  users   locate  similar  items?  
does  the   product  provide  database   recommendations  to  help  users   identify  specialized  databases   related  to  their  topic?   2.17   account   management   allows  users  to  access  their   library  account  for  activities   such  as  renewing  loans,  placing   holds  and  requests,  paying   fines,  viewing  borrowing   history,  etc.   can  the  product  be  integrated  with   the  library’s  ils  to  provide   seamless  access  to  user  account   management  functions?  does  the   vendor  provide  any  drivers  or   technical  support  for  this  purpose?   2.18   guest  access   allows  users  to  search  and   retrieve  records  without   requiring  authentication   does  the  vendor  allow  for  “guest   access”  to  the  service?  are  users   required  to  authenticate  to  search   or  only  when  requesting  access  to   licensed  content?   2.19   context-­‐sensitive   services   interacts  with  university   identity  and  course-­‐ management  systems  to  deliver   customized  services  on  the   basis  of  user  status  and   affiliation   can  the  product  be  configured  to   interact  with  university  identity   and  course-­‐management  systems   to  deliver  customized  services  on   the  basis  of  user  status  and   affiliation?  does  the  vendor   provide  any  drivers  or  technical   support  for  this  purpose?   2.20   context-­‐sensitive   delivery  options   displays  context  sensitive   delivery  options  based  on  the   item’s  format,  status,  and   availability   can  the  product  be  configured  to   interact  with  the  library’s  ill  and   consortium  borrowing  services  to   display  context-­‐sensitive  delivery   options  for  unavailable  local   holdings?  does  the  vendor  provide   any  drivers  or  technical  support  for   this  purpose?     information  technology  and  libraries  |  june  2015     43       2.21   location  mapping   supports  dynamic  library   mapping  to  help  users   physically  locate  items  on  the   shelf   can  the  product  be  configured  to   support  location  mapping  by   linking  the  call  numbers  of  physical   items  to  online  library  maps?  what   additional  programming  is   required?   2.22   custom  widgets   supports  the  integration  of   custom  library  widgets  such  as   live  chat   can  the  library’s  chat  service  be   embedded  into  the  interface  to   provide  live  user  support?  where   can  it  be  embedded?  search  page?   result  screen?     2.23   featured  items   highlights  new,  featured,  or   popular  items  such  as  recent   acquisitions,  recreational   reading,  or  heavily  borrowed  or   downloaded  items   can  the  product  be  configured  to   dynamically  highlight  specific  items   or  collections  in  the  library?     2.24   alerts   provides  customizable  alerts  or   rss  feeds  to  inform  users  about   new  items  related  to  their   research  or  area  of  study   does  the  product  offer   customizable  alerts  or  rss  feeds?   2.25   user-­‐submitted   content   supports  user-­‐submitted   content  such  as  tags,  ratings,   comments,  and  reviews   what  types  of  user-­‐submitted   content  does  the  product  support?   is  this  content  only  available  to  the   host  library  or  is  it  shared  among   all  subscribers  of  the  service?  can   these  features  be  optionally   enabled  or  disabled?     
2.26   social  media   integration   allows  users  to  seamlessly   share  items  via  social  media   such  as  facebook,  twitter,   delicious,  etc.   what  types  of  social  media  sharing   does  the  product  support?  can   these  features  be  enabled  or   disabled?       evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   44   3   usability           3.1   design   provides  a  modern,   aesthetically  appealing  design   that  is  locally  customizable   does  the  product  have  a  modern,   aesthetically  pleasing  design?  is  it   easy  to  locate  all  important   elements  of  the  interface?  are   colors,  graphics,  and  spacing  used   effectively  to  organize  content?   what  aspects  of  the  interface  are   locally  customizable  (color  scheme,   branding,  navigation  menus,  result   display,  item  records,  etc.)?  can  the   library  apply  its  own  custom   stylesheets  or  is  customization   limited  to  a  set  or  predefined   options?   3.2   navigation   provides  an  interface  that  is   easy  to  use  and  navigate  with   little  or  no  specialized   knowledge     is  the  interface  intuitive  and  easy  to   navigate?  does  it  use  familiar   navigational  elements  and  intuitive   icons  and  labels?  are  links  clearly   and  consistently  labeled?  do  they   allow  the  user  to  easily  move  from   page  to  page  (forward  and  back)?   do  they  take  the  user  where  he  or   she  expects  to  go?   3.3   accessibility     meets  ada  and  section  508   accessibility  requirements   does  the  product  meet  ada  and   section  508  accessibility   requirements?   3.4   internationalization   provides  translations  of  the   user  interface  in  multiple   languages   does  the  vendor  offer  translations   of  the  interface  in  multiple   languages?  which  languages  are   supported?  does  this  include   translations  of  customized  text?   3.5   help   provides  user  help  screens  that   are  thorough,  easy  to   understand,  context-­‐sensitive,   and  customizable   are  product  help  screens  thorough,   easy  to  navigate,  and  easy  to   understand?  are  help  screens   general  or  context-­‐sensitive  (i.e.,   relevant  to  the  user’s  current   location  within  the  system)?  are   help  screens  customizable?       information  technology  and  libraries  |  june  2015     45   3.6   record  display   provides  multiple  record   displays  with  varying  levels  of   information  (e.g.,  preview,  brief   view,  full  view,  staff  view,  etc.)   are  record  displays  well  organized   and  easily  scannable?  does  the   product  offer  multiple  record   displays  with  varying  levels  of   information?  what  types  of  record   displays  are  available?  can  record   displays  be  customized  by  item   type  or  search  portal?   3.7   enriched  content   supports  integration  of   enriched  content  from  third-­‐ party  providers  such  as  cover   images,  table  of  contents,   author  biographies,  reviews,   excerpts,  journal  rankings,   citation  counts,  etc.   what  types  of  enriched  content   does  the  vendor  provide  or   support?  is  there  an  additional  cost   for  this  content?   
3.8   format  icons   provides  intuitive  icons  to   indicate  the  format  of  items   within  search  results   does  the  product  provide  any  icons   or  visual  cues  to  help  users  easily   recognize  the  formats  of  the  variety   of  items  displayed  in  search   results?  is  this  information   displayed  on  the  results  screen  or   only  within  the  item  record?  how   does  the  product  define  formats?   are  these  definitions  customizable?   3.9   persistent  urls   provides  short,  persistent  links   to  item  records,  search  queries,   and  browse  categories   does  the  product  offer  persistent   links  to  item  records?  what  about   persistent  links  to  canned  searches   and  browse  categories?  are  these   links  sufficiently  short  and  user-­‐ friendly?   4   administration           4.1   cost   is  offered  at  a  price  that  is   within  the  library’s  budget  and   proportional  to  the  value  of  the   service   how  is  product  pricing  calculated?   what  is  the  total  cost  of  the  service   including  initial  upfront  costs  and   ongoing  costs  for  subscription  and   technical  support?  what  additional   costs  would  be  incurred  for  add-­‐on   services  (e.g.,  federated  search,   recommender  services,  enriched   content,  customer  support,  etc.)?   4.2   implementation   is  capable  of  being   implemented  within  the   what  is  the  estimated  timeframe   for  implementation,  including     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   46       library’s  designated  timeframe   loading  of  local  records  and   configuration  and  customization  of   the  platform?   4.3   user  community   is  widely  used  and  respected   among  the  library’s  peer   institutions   how  many  subscribers  does  the   product  have?  what  percentage  of   subscribers  are  college  or   university  libraries?  how  do   current  subscribers  view  the   service?   4.4   support     is  supported  by  high-­‐quality   customer  service,  training,  and   product  documentation   does  the  vendor  provide  adequate   support,  training,  and  help   documentation?  what  forms  of   customer  support  are  offered?  how   adequate  is  the  vendor’s   documentation  regarding  content   agreements,  metadata  schema,   ranking  algorithms,  apis,  etc.?  does   the  vendor  provide  on-­‐site  and   online  training?  is  there  any   additional  cost  associated  with   training?   4.5   administrative   tools   is  supported  by  a  robust,  easy-­‐ to-­‐use  administrative  interface   and  customization  tools   does  the  product  have  an  easy  to   use  administrative  interface?  does   it  support  multiple  administrator   logins  and  roles?  what  tools  are   provided  for  product  customization   and  administering  access  control?   4.6   statistics  reporting   includes  a  robust  statistical   reporting  modules  for   monitoring  and  analyzing   product  usage     does  the  vendor  offer  a  means  of   capturing  and  reporting  system   and  usage  statistics?  what  kinds  of   data  are  included  in  such  reports?   in  what  formats  are  these  reports   available?  is  the  data  exportable?     
information  technology  and  libraries  |  june  2015     47       5   technology           5.1   development     is  a  sufficiently  mature  product   supported  by  a  stable  codebase   and  progressive  development   cycle   is  the  product  sufficiently  mature   and  supported  by  a  stable   codebase?  is  development   informed  by  a  dedicated  user’s   advisory  group?  how  frequently   are  improvements  and   enhancements  made  to  the  service?   is  there  a  formal  mechanism  by   which  customers  can  suggest,  rank,   and  monitor  the  status  of   enhancement  requests?  what   major  enhancements  are  planned   for  the  next  3–5  years?   5.2   authentication   is  compatible  with  the  library’s   authentication  protocols     does  the  product  allow  for  ip-­‐ authentication  for  on-­‐site  users  and   proxy  access  for  remote  users?   what  authentication  methods  are   supported  (e.g.,  ldap,  cas,   shibboleth,  etc.)?   5.3   browser   compatibility   is  compatible  with  all  major   web  browsers   what  browsers  does  the  vendor   currently  support?   5.4   mobile  access   is  accessible  on  mobile  devices   is  the  product  accessible  on  mobile   devices  via  a  mobile  optimized  web   interface  or  app?  does  the  mobile   version  include  the  same  features   and  functionality  of  the  desktop   version?     5.5   portability   can  be  embedded  in  external   platforms  such  as  library   research  guides,  course   management  systems,  or   university  portals   can  custom  search  boxes  be   created  and  embedded  in  external   platforms  such  as  library  research   guides,  course  management   systems,  or  university  portals?     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   48           5.6   interoperability   includes  a  robust  api  and  is   interoperable  with  other   major  library  systems  such  as   the  ils,  ill,  proxy  server,  link   resolver,  institutional   repository,  etc.     is  the  product  interoperable   with  other  major  library  systems   such  as  the  ils,  ill,  proxy  server,   link  resolver,  institutional   repository,  etc.?  does  the  vendor   offer  a  robust  api  that  can  be   used  to  extract  data  from  the   central  index  or  pair  it  with  a   different  interface?  what  types   of  data  can  be  extracted  with  the   api?   5.7   consortia  support   supports  multiple  product   instances  or  configurations  for   a  multilibrary  environment   can  the  technology  support   multiple  institutions  on  the  same   installation,  each  with  its  own   unique  instance  and  configuration   of  the  product?  is  there  an   additional  cost  for  this  service?     information  technology  and  libraries  |  june  2015     49   appendix  c.  
sample  web-­‐scale  discovery  evaluation  rubric   category   functionality   product   product  a     requirement   weight   score   points     notes   2.1  smart  searching           2.2  advanced   searching           2.3  search  limits           2.4  faceted  browsing           2.5  scoped  searching           2.6  visual  searching           2.7  relevancy  ranking           2.8  deduplication           2.9  record  grouping           2.10  result  sorting           2.11  item  holdings           2.12  openurl           2.13  native  record   linking           2.14  output  options           2.11  item  holdings                 weight  scale   1  =  optional   2  =  desired   3  =  mandatory   scoring  scale   0  =  does  not  meet   1  =  barely  meets   2  =  partially  meets   3  =  fully  meets   points  =  weight  ×  score   explanation  and   rationale  for  score     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   50   appendix  d.  web-­‐scale  discovery  vendor  questionnaire     1.  content       1.1  scope   with  how  many  content  publishers  and  aggregators  have  you  forged  content  agreements?   are  there  any  publishers  or  aggregators  with  whom  you  have  exclusive  agreements  that  prohibit   or  limit  them  from  making  their  content  available  to  competing  discovery  vendors?  if  so,  which   ones?   does  your  central  index  exclude  any  of  the  publishers  and  aggregators  listed  in  appendix  y  [not   reproduced  here]?  if  so,  which  ones?   how  many  total  unique  items  are  included  in  your  central  index?     what  is  the  approximate  disciplinary  breakdown  of  the  central  index?  what  percentage  of  content   pertains  to  subjects  in  the  humanities?  what  percentage  in  the  sciences?  what  percentage  in  the   social  sciences?   what  is  the  approximate  format  breakdown  of  the  central  index?  what  percentage  of  content   derives  from  scholarly  journals?  what  percentage  derives  from  magazines,  newspapers,  and  trade   publications?  what  percentage  derives  from  conference  proceedings?  what  percentage  derives   from  monographs?  what  percentage  derives  from  other  publications?   what  is  the  publication  date  range  of  the  central  index?  what  is  the  bulk  publication  date  range   (i.e.,  the  date  range  in  which  the  majority  of  content  was  published)?   does  your  index  include  content  from  open  access  repositories  such  as  doaj,  hathitrust,  and   arxiv?  if  so,  which  ones?   does  your  index  include  oclc  worldcat  catalog  records?  if  so,  do  these  records  include  holdings   information?   what  types  of  local  content  can  be  ingested  into  the  index  (e.g.,  library  catalog  records,   institutional  repository  items,  digital  collections,  research  guides,  library  web  pages,  etc.)?   can  your  service  host  or  provide  access  to  items  within  a  consortia  or  shared  catalog  like  the   pennsylvania  academic  library  consortium  (palci)  or  committee  on  institutional  cooperation   (cic)?   are  local  collections  (ils  records,  digital  collections,  institutional  repositories,  etc.)  from  libraries   that  use  your  discovery  service  exposed  to  all  customers?     information  technology  and  libraries  |  june  2015     51   can  the  library  customize  its  holdings  within  the  central  index?  
can  the  library  choose  what   content  to  expose  to  its  users?   1.2  depth   what  level  of  indexing  do  you  typically  provide  in  your  central  index?    what  percentage  of  items   contains  only  citations?  what  percentage  includes  abstracts?  what  percentage  includes  full  text?     1.3  currency   how  frequently  is  the  central  index  updated?   how  often  do  you  harvest  and  ingest  metadata  for  the  library’s  local  content?  how  long  does  it   typically  take  for  such  updates  to  appear  in  the  central  index?   can  the  library  initiate  a  manual  harvest  of  local  records?  can  the  library  initiate  a  manual  harvest   of  a  specific  subset  of  local  records?   1.4  data  quality   with  what  metadata  schemas  (marc,  mets,  mods,  ead,  etc.)  does  your  discovery  platform  work?     do  you  currently  support  rda  records?  if  not,  do  you  have  any  plans  to  do  so  in  the  near  future?   what  metadata  is  required  for  a  local  resource  to  be  indexed  and  discoverable  within  your   platform?   how  is  metadata  from  different  sources  normalized  into  a  universal  metadata  schema?     to  what  degree  can  collections  from  different  sources  have  their  own  unique  field  information   displayed  and/or  calculated  into  the  relevancy-­‐ranking  algorithm  for  retrieval  purposes?   do  you  provide  authority  control?  how  are  controlled  vocabularies  for  subjects,  names,  and  titles   established?     1.5  language   does  your  product  support  indexing  and  searching  of  foreign  language  materials  using  non-­‐ roman  characters?  what  languages  and  character  sets  are  supported?   1.6  federated  searching     how  does  your  product  make  provisions  for  sources  not  included  in  your  central  index?  is  it   possible  to  incorporate  these  sources  via  federated  search?  how  are  federated  search  results     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   52   displayed  with  the  results  from  the  central  index?  is  there  an  additional  cost  for  implementing   federated  search  connectors  to  these  resources?     1.7  unlicensed  content   are  end  users  able  to  search  content  that  is  included  in  your  central  index  but  not  licensed  or   owned  by  the  library?  if  so,  does  your  system  provide  a  locally  customizable  message  to  the  user   or  does  the  user  just  receive  the  publisher/aggregator  message  encouraging  them  to  purchase  the   article?  can  the  library  opt  not  to  expose  content  it  does  not  license  to  its  users?     2.  functionality     2.1  “smart”  searching   does  your  product  include  autocomplete  or  predictive  search  functionality?  how  are   autocomplete  predictions  populated?   does  your  product  include  autocorrect  or  “did  you  mean  .  .  .  ”  suggestions  to  correct  misspelled   queries?  how  are  autocorrect  suggestions  populated?     does  your  product  support  search  query  stemming  to  automatically  retrieve  search  terms  with   variant  endings  (e.g.,  car/cars)?   does  your  product  support  thesaurus  matching  to  retrieve  synonyms  and  related  words  (e.g.,   car/automobile)?         does  your  product  support  stop  word  filtering  to  automatically  remove  common  stop  words  (e.g.,   a,  an,  on,  from,  the,  etc.)  
from  search  queries?   does  your  product  support  search  term  highlighting  to  automatically  highlight  search  terms  found   within  results?     how  does  your  product  handle  zero  result  or  “dead  end”  searches?  please  describe  what  happens   when  a  user  searches  for  an  item  that  is  not  included  in  the  central  index  or  the  library’s  local   holdings  but  may  be  available  through  interlibrary  loan.   does  your  product  include  any  other  “smart”  search  features  that  you  think  enhance  the  usability   of  your  product?   are  all  of  the  above  mentioned  search  features  customizable  by  the  library?  can  they  be  optionally   enabled  or  disabled?     2.2  advanced  searching     information  technology  and  libraries  |  june  2015     53   does  your  product  support  boolean  searching  that  allows  users  to  combine  search  terms  using   operators  such  as  and,  or,  and  not?     does  your  product  support  fielded  searching  that  allows  users  to  search  for  terms  within  specific   metadata  fields  (e.g.,  title,  author,  subject,  etc.)?   does  your  product  support  phrase  searching  that  allows  users  to  search  for  exact  phrases?   does  your  product  support  proximity  searching  that  allows  users  to  search  for  terms  within  a   specified  distance  from  one  another?   does  your  product  support  nested  searching  to  allow  users  to  specify  relationships  between   search  terms  and  determine  the  order  in  which  they  will  be  searched?   does  your  product  support  wildcard  and  truncation  searching  that  allow  users  to  retrieve   variations  of  their  search  terms?   does  your  product  include  any  other  advanced  search  features  that  you  think  enhance  the   usability  of  your  product?   are  all  of  the  above  mentioned  search  features  customizable  by  the  library?  can  they  be  optionally   enabled  or  disabled?   2.3  search  limits   does  your  product  offer  search  limits  for  limiting  results  according  to  predetermined  criteria  such   as  peer-­‐review  status  or  full  text  availability?   2.4  faceted  browsing   does  your  product  support  faceted  browsing  of  results  by  attributes  such  as  format,  author,   subject,  region,  era,  etc.?  if  so,  what  types  of  facets  are  available  for  browsing?     is  faceted  browsing  possible  before  as  well  after  the  execution  of  a  search?     can  users  select  multiple  facets  in  different  categories?     are  facet  categories,  labels,  and  ordering  customizable  by  the  library?     can  specialized  materials  be  assigned  different  facets  in  accordance  with  their  unique  attributes   (e.g.,  allowing  users  to  browse  music  materials  by  unique  attributes  such  as  medium  of   performance,  musical  key/range,  recording  format,  etc.)?     2.5  scoped  searching     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   54   does  your  product  support  the  construction  of  multiple  scoped  search  portals  for  specific  campus   libraries,  disciplines  (medicine),  or  formats  (music/video)?     if  so,  what  aspects  of  these  search  portals  are  customizable  (branding,  search  options,  facets,   relevancy  ranking,  record  displays,  etc.)?   
2.6  visual  searching   does  your  product  provide  any  options  for  visualizing  search  results  beyond  text-­‐based  lists,  such   as  cluster  maps,  tag  clouds,  image  carousels,  etc.?     2.7  relevancy  ranking   please  describe  your  relevancy  ranking  algorithm.  in  particular,  please  describe  what  criteria  are   used  to  determine  relevancy  (term  frequency/placement,  item  format/length,  publication  date,   user  behavior,  scholarly  value,  etc.)  and  how  is  each  weighted?   how  does  your  product  rank  items  with  varying  levels  of  metadata  (e.g.,  citation  only  vs.  citation,   abstract,  and  full  text)?     is  relevancy  ranking  customizable  by  the  library?     can  relevancy  ranking  be  customized  by  end  users?   2.8  deduplication   how  does  your  product  identify  and  manage  duplicate  records?   2.9  record  grouping   does  your  product  employ  a  frbr-­‐ized  method  to  group  different  manifestations  of  the  same   work?   2.10  result  sorting   what  options  does  your  product  offer  for  sorting  results?   2.11  item  holdings   how  does  your  product  retrieve  and  display  availability  data  for  local  physical  holdings?  is  there  a   delay  in  harvesting  this  data  or  is  it  presented  in  real  time?  is  item  location  and  availability   displayed  in  the  results  list  or  only  in  the  item  record?       2.12  openurl     information  technology  and  libraries  |  june  2015     55   how  does  your  product  provide  access  to  the  library’s  licensed  full  text  content?   are  openurl  links  displayed  on  the  results  screen  or  only  in  the  item  record?   2.13  native  record  linking   does  your  product  offer  direct  links  to  original  records  in  their  native  source  (e.g.,  library  catalog,   institutional  repository,  third-­‐party  databases,  etc.)?   2.14  output  options   what  output  options  does  your  product  offer  (e.g.,  print,  save,  email,  sms,  cite,  export)?     if  you  offer  a  citation  function,  what  citation  formats  does  your  product  support  (mla,  apa,   chicago,  etc.)?   if  you  offer  an  export  function,  which  citation  managers  does  your  product  support  (e.g.,  refworks,   endnote,  zotero,  mendeley,  easybib,  etc.)?     are  citation  and  export  options  locally  customizable?  can  they  be  customized  by  search  portal?   2.15  personalization   does  your  product  offer  any  personalization  features  that  allow  users  to  customize  preferences,   save  results,  create  lists,  bookmark  items,  etc.?  are  these  features  linked  to  a  personal  account  or   are  they  session-­‐based?   if  personal  accounts  are  supported,  must  users  create  their  own  accounts  or  can  account  creation   be  based  on  the  university’s  cas/ldap  identity  management  system?   2.16  recommendations   does  your  product  provide  item  recommendations  to  help  users  locate  similar  items?  on  what   criteria  are  these  recommendations  based?   is  your  product  capable  of  referring  users  to  specialized  databases  based  on  their  search  query?   (for  example,  can  a  search  for  “autism”  trigger  database  recommendations  suggesting  that  the   user  try  their  search  in  psycinfo  or  pubmed?)  
if  so,  does  your  product  just  provide  links  to  these   resources  or  does  it  allow  the  user  to  launch  a  new  search  by  passing  their  query  to  the   recommended  database?     2.17  account  management   can  your  product  be  integrated  with  the  library’s  ils  (sirsidynix  symphony)  to  provide  users   access  to  its  account  management  functions  (e.g.,  renewing  loans,  placing  holds/requests,  viewing   borrowing  history,  etc.)?  if  so,  do  you  provide  any  drivers  or  technical  support  for  this  purpose?     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   56   2.18  guest  access   are  users  permitted  “guest  access”  to  the  service?  are  users  required  to  authenticate  in  order  to   search  or  only  when  requesting  access  to  licensed  content?   2.19  context-­‐sensitive  services   could  your  product  be  configured  to  interact  with  our  university  course  management  systems   (sakai,  blackboard,  and  ecollege)  to  deliver  customized  services  based  on  user  status  and   affiliation?  if  so,  do  you  provide  any  drivers  or  technical  support  for  this  purpose?   2.20  context-­‐sensitive  delivery  options   could  your  product  be  configured  to  interact  with  the  library’s  interlibrary  loan  (illiad)  and   consortium  borrowing  services  (ezborrow  and  uborrow)  to  display  context-­‐sensitive  delivery   options  for  unavailable  local  holdings?  if  so,  do  you  provide  any  drivers  or  technical  support  for   this  purpose?   2.21  location  mapping   could  your  product  be  configured  to  support  location  mapping  by  linking  the  call  numbers  of   physical  items  to  library  maps?   2.22  custom  widgets   does  your  product  support  the  integration  of  custom  library  widgets  such  as  live  chat?  where  can   these  widgets  be  embedded?   2.23  featured  items   could  your  product  be  configured  to  highlight  specific  library  items  such  as  recent  acquisitions,   popular  items,  or  featured  collections?     2.24  alerts   does  your  product  offer  customizable  alerts  or  rss  feeds  to  inform  users  about  new  items  related   to  their  research  or  area  of  study?   2.25  user-­‐submitted  content   does  your  product  support  user-­‐generated  content  such  as  tags,  ratings,  comments,  and  reviews?     is  user-­‐generated  content  only  available  to  the  host  library  or  is  it  shared  among  all  subscribers  of   your  service?   can  these  features  be  optionally  enabled  or  disabled?       information  technology  and  libraries  |  june  2015     57   2.26  social  media  integration   does  your  product  allow  users  to  seamlessly  share  items  via  social  media  such  as  facebook,   google+,  and  twitter?     can  these  features  be  optionally  enabled  or  disabled?   3.  usability       3.1  design   describe  how  your  product  incorporates  established  best  practices  in  usability.  what  usability   testing  have  you  performed  and/or  do  you  conduct  on  an  ongoing  basis?   what  aspects  of  the  interface’s  design  are  locally  customizable  (e.g.,  color  scheme,  branding,   display,  etc.)?     can  the  library  apply  its  own  custom  stylesheets  or  is  customization  limited  to  a  set  or  predefined   options?   
3.2  navigation   what  aspects  of  the  interface’s  navigation  are  locally  customizable  (e.g.,  menus,  pagination,  facets,   etc.)?     3.3  accessibility   does  your  product  meet  ada  and  section  508  accessibility  requirements?  what  steps  have  you   taken  beyond  section  508  requirements  to  make  your  product  more  accessible  to  people  with   disabilities?     3.4  internationalization   do  you  offer  translations  of  the  interface  in  multiple  languages?  which  languages  are  supported?   does  this  include  translation  of  any  locally  customized  text?   3.5  help   does  your  product  include  help  screens  to  assist  users  in  using  and  navigating  the  system?     are  help  screens  general  or  context-­‐sensitive  (i.e.,  relevant  to  the  user’s  current  location  within   the  system)?     are  help  screens  locally  customizable?   3.6  record  display     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   58   does  your  product  offer  multiple  record  displays  with  varying  levels  of  information?  what  types   of  record  displays  are  available  (e.g.,  preview,  brief  view,  full  view,  staff  view,  etc.)?   can  record  displays  be  customizable  by  item  type  or  metadata  (e.g.,  marc-­‐based  book  record  vs.   mods-­‐based  repository  record)?   can  record  displays  be  customizable  by  search  portal  (e.g.,  a  biosciences  search  portal  that   displays  medical  rather  than  lc  subject  headings  and  call  numbers)?   3.7  enriched  content   does  your  product  provide  or  support  the  integration  of  enriched  content  such  as  cover  images,   tables  of  contents,  author  biographies,  reviews,  excerpts,  journal  rankings,  citation  counts,  etc.?  if   so,  what  types  of  content  does  this  include?  is  there  an  additional  cost  for  this  content?   3.8  format  icons   does  your  product  provide  any  icons  or  visual  cues  to  help  users  easily  recognize  the  formats  of   the  variety  of  items  displayed  in  search  results?   how  does  your  product  define  formats?  are  these  definitions  readily  available  to  end  users?  are   these  definitions  customizable?   3.10  persistent  urls   does  your  product  offer  persistent  links  to  item  records?   does  your  product  offer  persistent  links  to  search  queries  and  browse  categories?   4.  administration     4.1  cost   briefly  describe  your  product  pricing  model  for  academic  library  customers.   4.2  implementation   can  you  meet  the  timetable  defined  in  appendix  z  [not  reproduced  here]?    if  not,  which  milestones   cannot  be  met  or  which  conditions  must  the  libraries  address  in  order  to  meet  the  milestones?   are  you  currently  working  on  web-­‐scale  discovery  implementations  at  any  other  large  institutions?     4.3  user  community   how  many  live,  active  installations  (i.e.,  where  the  product  is  currently  available  to  end-­‐users)  do   you  currently  have?     information  technology  and  libraries  |  june  2015     59   how  many  additional  customers  have  committed  to  the  product?   how  many  of  your  total  customers  are  college  or  university  libraries?   4.4  support   what  customer  support  services  and  hours  of  availability  do  you  provide  for  reporting  and/or   troubleshooting  technical  problems?   
do you have a help ticket tracking system for monitoring and notifying clients of the status of outstanding support issues?
do you offer a support website with up-to-date product documentation, manuals, tutorials, and faqs?
do you provide on-site and online training for library staff?
do you provide on-site and online training for end users?
briefly describe any consulting services you may provide above and beyond support services included with subscription (e.g., consulting services related to harvesting of a unique library resource for which an ingest/transform/normalize routine does not already exist).
do you have regular public meetings for users to share experiences and provide feedback on the product? if so, where and how often are these meetings held?
what other communication avenues do you provide for users to communicate with your company and also with each other (e.g., listserv, blog, social media)?

4.5 administration
what kinds of tools are provided for local administration and customization of the product?
does your product support multiple administrator logins and roles?

4.6 statistics reporting
what statistics reporting capabilities are included with your product? what kinds of data are available to track and assess collection management and product usage? in what formats are these reports available? is the data exportable?
is it possible to integrate third-party analytic tools such as google analytics in order to collect usage data?

5. technology

5.1 development
in what month and year did product development begin?
what key features differentiate your product from those of your competitors?
how frequently are enhancements and upgrades made to the service?
please describe the major enhancements you expect to implement in the next year.
please describe the future direction or major enhancements you envision for the product in the next 3–5 years.
is there a formal mechanism by which customers may make, rank, and monitor the status of enhancement requests?
do you have a dedicated user's advisory group to test and provide feedback on product development?

5.2 authentication
what authentication methods does your product support (e.g., ldap, cas, shibboleth, etc.)?

5.3 browser compatibility
please provide a list of currently supported web browsers.

5.4 mobile access
is the product accessible on mobile devices via a mobile-optimized web interface or app?
does the mobile version include the same features and functionality as the desktop version?

5.5 portability
can custom search boxes be created and embedded in external platforms such as the library's research guides, course management systems, or university portals?

5.6 interoperability
does your product include an api that can be used to extract data from the central index or pair it with a different interface? what types of data can be extracted with the api? do you provide documentation and instruction on the functionality and use of your api?
are there any known compatibility issues with your product and any of the following systems or platforms?
• drupal
• vufind
• sirsidynix symphony
• fedora commons
• ezproxy
• illiad
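to make the interoperability questions above concrete, the following minimal sketch shows the general shape of pulling records from a discovery service's central index over http. it is an illustration only: the endpoint url, api-key header, query parameters, and json field names are invented placeholders rather than any particular vendor's documented api, and real products differ in protocol and in what record data they expose.

import requests  # third-party http client (pip install requests)

# hypothetical discovery-layer search endpoint and key; real vendors differ
BASE_URL = "https://discovery.example.edu/api/v1/search"
API_KEY = "replace-with-your-key"

def search_central_index(query, limit=10):
    # ask the central index for brief records matching the query string
    response = requests.get(
        BASE_URL,
        params={"q": query, "limit": limit, "format": "json"},
        headers={"Authorization": f"apikey {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    # "records", "title", and "source" are placeholder field names
    return response.json().get("records", [])

for record in search_central_index('"media violence" AND children'):
    print(record.get("title"), "|", record.get("source"))

a vendor's answer to 5.6 can then be checked against this kind of call: whether such an endpoint exists, whether it is documented, and which fields and result counts it actually returns.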
5.7 consortia support
can your product support multiple institutions on the same installation, each with its own unique instance and configuration of the product? is there any additional cost for this service?

appendix e. web-scale discovery customer questionnaire

institutional background
please tell us a little bit about your library.
what is the name of your college or university?
which web-scale discovery service is currently in use at your library?
☐ ebsco discovery service (eds)
☐ primo central (ex libris)
☐ summon (proquest)
☐ worldcat local (oclc)
☐ other ________________
when was your current web-scale discovery service selected (month, year)?
how long did it take to implement (even in beta form) your current web-scale discovery service?
which of the following types of content are included in your web-scale discovery service? (check all that apply)
☐ library catalog records
☐ periodical indexes and databases
☐ open access content
☐ institutional repository records
☐ local digital collections (other than your institutional repository)
☐ library research guides
☐ library web pages
☐ other ________________

rate your satisfaction
on a scale of 1 (low) to 5 (high), please rate your satisfaction with the following aspects of your web-scale discovery service.

content
how satisfied are you with the scope, depth, and currency of coverage provided by your web-scale discovery service?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

functionality
how satisfied are you with the search functionality, performance, and result quality of your web-scale discovery service?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

usability
how satisfied are you with the design, layout, navigability, and overall ease of use of your web-scale discovery interface?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

administration
how satisfied are you with the administrative, customization, and reporting tools offered by your web-scale discovery service?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

technology
how satisfied are you with the level of interoperability between your web-scale discovery service and other library systems such as your ils, knowledge base, link resolver, and institutional repository?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

overall
overall, how satisfied are you with your institution's web-scale discovery service?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

questions
please share your experiences with your web-scale discovery service by responding to the following questions.
briefly describe your reasons for implementing a web-scale discovery service. what role does this service play at your library? how is it intended to benefit your users? what types of users is it intended to serve?
does your web-scale discovery service have any notable gaps in coverage? if so, how do you compensate for those gaps or make users aware of resources that are not included in the service?
are you satisfied with the relevance of the results returned by your web-scale discovery service? have you noticed any particular anomalies within search results?
does your web-scale discovery service lack any specific features or functions that you wish were available?
are there any particular aspects of your web-scale discovery service that you wish were customizable but are not?
did you face any particular challenges integrating your web-scale discovery service with other library systems such as your ils, knowledge base, and link resolver?
how responsive has the vendor been in providing technical support, resolving problems, and responding to enhancement requests? have they provided adequate training and documentation to support your implementation?
in general, how have users responded to the introduction of this service? has their response been positive, negative, or mixed?
in general, how have librarians responded to the introduction of this service? has their response been positive, negative, or mixed?
what has been the impact of implementing a web-scale discovery service on the overall usage of your collection? have you noticed any fluctuations in circulation, full text downloads, or usage of subject-specific databases?
has your institution conducted any assessment or usability studies of your web-scale discovery service? if so, please briefly describe the key findings of these studies.
please share any additional thoughts or advice that you think might be helpful to other libraries currently exploring web-scale discovery services.

appendix f. sample worksheet for web-scale discovery coverage test

instructions
construct 3 search queries representing commonly researched topics in your discipline. test your queries in each discovery product and compare the results. for each product, record the number of results retrieved and rate the quality of coverage and indexing. use the space below your ratings to explain your rationale and record any notes or observations. rate coverage and indexing on a scale of 1 to 3 (1 = poor, 2 = average, 3 = good).
in  your   evaluation,  please  consider  the  following:       coverage   indexing   •  do  the  search  results  demonstrate  broad   coverage  of  the  variety  of  subjects,  formats,   and  content  types  represented  in  the   library’s  collection?  (hint:  use  facets  to   examine  the  breakdown  of  results  by   source  type  or  collection).     •  do  any  particular  types  of  content  seem   to  dominate  the  results  (books,  journal   articles,  newspapers,  book  reviews,   reference  materials,  etc.)?   •  are  the  library’s  local  collections   adequately  represented  in  the  results?   •  do  any  relevant  resources  appear  to  be   missing  from  the  search  results  (e.g.,   results  from  an  especially  relevant   database  or  journal)?   •  do  item  records  contain  complete  and   accurate  source  information?   •  do  item  records  contain  sufficient   metadata  (citation,  subject  headings,   abstracts,  etc.)  to  help  users  identify  and   evaluate  results?               evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   66     example   product   product  b   reviewer   reviewer  #2   discipline   history   query   results   coverage   indexing   kw:  slavery  and   “united  states”   181,457   1  (poor)   3  (good)   the  majority  of  results   appear  to  be  from   newspapers  and  periodicals.   some  items  designated  as   “journals”  are  actually   magazines.  there  are  a  large   number  of  duplicate  records.   some  major  works  on  this   subject  are  not  represented  in   the  results.     depth  of  indexing  varies  by   publication  but  most  include   abstracts  and  subject   headings.  some  records  only   include  citations,  but   citations  appear  to  be   complete  and  accurate.             information  technology  and  libraries  |  june  2015     67   appendix  g.  sample  worksheet  for  web-­‐scale  discovery  usability  test   pre-­‐test  questionnaire     before  beginning  the  test,  ask  the  user  for  the  following  information.   status       �  undergraduate     �  graduate     �  faculty   �  staff     �  other   major/department   ___________________________   what  resource  do  you  use  most  often  for  scholarly  research?          ___________________________   on  a  scale  of  1  to  5,  how  would  you  rate  your  ability  to  find  information  using  library   resources?   low   �  1       �  2       �  3     �  4     �  5   high   on  a  scale  of  1  to  5,  how  would  you  rate  your  ability  to  find  information  using  google  or   other  search  engines?   low   �  1       �  2       �  3     �  4     �  5   high     scenarios   ask  the  user  to  complete  the  following  tasks  using  each  product  while  sharing  their   thoughts  aloud.   1.  you  are  writing  a  research  paper  for  your  communications  course.  you’ve  recently  been   discussing  how  social  media  sites  like  facebook  collect  and  store  large  amounts  of  personal   data.  you  decide  to  write  a  paper  that  answers  the  question:  “are  social  networking  sites  a   threat  to  privacy?”  use  the  search  tool  to  find  sources  that  will  help  you  support  your   argument.   2.  from  the  first  10  results,  select  those  that  you  would  use  to  learn  more  about  this  topic   and  email  them  to  yourself.  if  none  of  the  results  seem  useful,  do  not  select  any.   3.  
if you were writing a paper on this topic, how satisfied would you be with these results?
☐ very dissatisfied ☐ dissatisfied ☐ no opinion ☐ satisfied ☐ very satisfied
4. from the first 10 results, attempt to access an item for which full text is available online.
5. now that you've seen the first 10 results, what would you do next?
☐ decide you have enough information and stop
☐ continue and review the next set of results
☐ revise your search and try again
☐ exit and try your search in another library database (which one?)
☐ exit and try your search in google or another search engine
☐ other (please explain)

post-test questionnaire
after the user has used all three products, ask them about their experiences.
based on your experience, please rank the three search tools you've seen in order of preference.
how would you compare these search tools with the search options currently offered by the library?

appendix h. sample worksheet for web-scale discovery relevance test

instructions
conduct the same search query in each discovery product and rate the relevance of the first 10 results using the scale provided. for each query, record your search condition, terms, and limiters. for each product, record the first 10 results in the exact order they appear, rank the relevance of each result using the relevance scale, and explain the rationale for your score. all calculations will be tabulated automatically.

relevance scale
0 = not relevant: not at all relevant to the topic, exact duplicate of a previous result, or not enough information in the record or full text to determine relevance
1 = somewhat relevant: somewhat relevant but does not address all of the concepts or criteria specified in the search query, e.g., addresses only part of the topic, is too broad or narrow in scope, is not in the specified format, etc.
2 = relevant: relevant to the topic, but the topic may not be the primary or central subject of the work, or the work is too brief or dated to be useful; a resource that the user might select
3 = very relevant: completely relevant; exactly on topic; addresses all concepts and criteria included in the search query; a resource that the user would likely select

calculations
cumulative gain: measure of overall relevance based on the sum of all relevance scores.
discount factor (1/log2(i)): penalization of relevance based on ranking. assuming that relevance decreases with rank, each result after the first is associated with a discount factor based on log base 2. the discount factor is calculated as 1/log2(i), where i = rank. the discount factor of result #6 is calculated as 1 divided by the logarithm of 6 with base 2, or 1/log2(6) = 0.39.
discounted gain: discounted gain is calculated by multiplying a result's relevance score by its discount factor. the discounted gain of a result with a relevance score of 3 and a discount factor of 0.39 is 3 × 0.39, or 1.17.
discounted cumulative gain: measure of overall discounted gain based on the sum of all discounted gain scores.
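the arithmetic described above can be expressed compactly in code. the short python sketch below (not part of the original worksheet) computes cumulative gain, per-result discount factors, discounted gains, and discounted cumulative gain from an ordered list of relevance scores, following the worksheet's convention that the first result is not discounted; the sample scores are taken from the example that follows.

from math import log2

def discount_factor(rank):
    # worksheet convention: result 1 is not discounted,
    # every later result is discounted by 1 / log2(rank)
    return 1.0 if rank == 1 else 1.0 / log2(rank)

def relevance_summary(scores):
    # scores is an ordered list of 0-3 relevance ratings, one per result
    cumulative_gain = sum(scores)
    discounted_gains = [score * discount_factor(rank)
                        for rank, score in enumerate(scores, start=1)]
    return cumulative_gain, discounted_gains, sum(discounted_gains)

# relevance scores for ranks 1-10 in the product c example below
scores = [0, 3, 2, 3, 1, 3, 2, 2, 1, 2]
cg, gains, dcg = relevance_summary(scores)
print(cg)             # 19 (cumulative gain)
print(round(dcg, 2))  # 9.65 (discounted cumulative gain)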
example
product: product c
reviewer: reviewer #3
search condition: seeking peer reviewed articles about the impact of media violence on children
search terms: "mass media" and violence and children
limits: peer reviewed
cumulative gain: 19
discounted cumulative gain: 9.65

rank 1 (relevance 0; discount factor 1.00; discounted gain 0): "effects of media ratings on children and adolescents: a litmus test of the forbidden fruit effect." research article suggesting that ratings do not influence children's perceptions of films or video games. not relevant; does not discuss impact of media violence on children.

rank 2 (relevance 3; discount factor 1.00; discounted gain 3): "media violence associations with the form and function of aggression among elementary school children." research article demonstrating a positive association between media violence exposure and levels of physical and relational aggression in grade school students. very relevant.

rank 3 (relevance 2; discount factor 0.63; discounted gain 1.26): "harmful effects of media on children and adolescents." review article discussing the influence of media on negative child behaviors such as violence, substance abuse, and sexual promiscuity. relevant but does not focus exclusively on media violence.

rank 4 (relevance 3; discount factor 0.50; discounted gain 1.5): "the influence of media violence on children." review article examining opposing views on media violence and its impact on children. very relevant.

rank 5 (relevance 1; discount factor 0.43; discounted gain 0.43): "remote control childhood: combating the hazards of media culture in schools." review article discussing the harmful effects of mass media on child behavior and learning as well as strategies educators can use to counteract them. somewhat relevant but does not focus exclusively on media violence and discussion is limited to the educational context.

rank 6 (relevance 3; discount factor 0.39; discounted gain 1.17): "media violence, physical aggression, and relational aggression in school age children." research article on the impact of media violence on childhood aggression in relation to different types of aggression, media, and time periods. very relevant.

rank 7 (relevance 2; discount factor 0.36; discounted gain 0.72): "do you see what i see? parent and child reports of parental monitoring of media." research article examining the effectiveness of parental monitoring of children's violent media consumption. relevant but focused less on the effects of media violence than strategies for mitigating them.
rank 8 (relevance 2; discount factor 0.33; discounted gain 0.66): "exposure to media violence and young children with and without disabilities: powerful opportunities for family-professional partnerships." review article discussing the impact of media violence on children with and without disabilities and recommendations for addressing this through family-professional partnerships. relevant but slightly more specific than required.

rank 9 (relevance 1; discount factor 0.32; discounted gain 0.32): "kitle iletisim araçlarindan televizyonun 3-6 yas grubundaki çocuklarin davranislari üzerine etkisi." research article demonstrating a positive correlation between media violence exposure and aggressive behavior in grade school students. seems very relevant but article is in turkish.

rank 10 (relevance 2; discount factor 0.30; discounted gain 0.60): "sex and violence: is exposure to media content harmful to children?" review article discussing how exposure to violent or sexually explicit media influences child behavior and what librarians can do about it. relevant but less than two pages long.

references

1. judy luther and maureen c. kelly, "the next generation of discovery," library journal 136, no. 5 (2011): 66.
2. athena hoeppner, "the ins and outs of evaluating web-scale discovery services," computers in libraries 32, no. 3 (2012): 8.
3. kate b. moore and courtney greene, "choosing discovery: a literature review on the selection and evaluation of discovery layers," journal of web librarianship 6, no. 3 (2012): 145–63, http://dx.doi.org/10.1080/19322909.2012.689602.
4. ronda rowe, "web-scale discovery: a review of summon, ebsco discovery service, and worldcat local," charleston advisor 12, no. 1 (2010): 5–10, http://dx.doi.org/10.5260/chara.12.1.5; ronda rowe, "encore synergy, primo central," charleston advisor 12, no. 4 (2011): 11–15, http://dx.doi.org/10.5260/chara.12.4.11.
5. sharon q. yang and kurt wagner, "evaluating and comparing discovery tools: how close are we towards the next generation catalog?" library hi tech 28, no. 4 (2010): 690–709, http://dx.doi.org/10.1108/07378831011096312.
6. jason vaughan, "web scale discovery services," library technology reports 47, no. 1 (2011): 5–61, http://dx.doi.org/10.5860/ltr.47n1.
7. hoeppner, "the ins and outs of evaluating web-scale discovery services."
8. luther and kelly, "the next generation of discovery"; amy hoseth, "criteria to consider when evaluating web-based discovery tools," in planning and implementing resource discovery tools in academic libraries, ed. mary p. popp and diane dallis (hershey, pa: information science reference, 2012), 90–103, http://dx.doi.org/10.4018/978-1-4666-1821-3.ch006.
9. f. william chickering and sharon q. yang, "evaluation and comparison of discovery tools: an update," information technology & libraries 33, no. 2 (2014): 5–30, http://dx.doi.org/10.6017/ital.v33i2.3471.
10. noah brubaker, susan leach-murray, and sherri parker, "shapes in the cloud: finding the right discovery layer," online 35, no. 2 (2011): 20–26.
11.
jason  vaughan,  “investigations  into  library  web-­‐scale  discovery  services,”  information   technology  &  libraries  31,  no.  1  (2012):  32–82,  http://dx.doi.org/10.6017/ital.v31i1.1916.   12.    mary  p.  popp  and  diane  dallis,  eds.,  planning  and  implementing  resource  discovery  tools  in   academic  libraries  (hershey,  pa:  information  science  reference,  2012),   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.   13.    jason  vaughan,  “evaluating  and  selecting  a  library  web-­‐scale  discovery  service,”  in  planning   and  implementing  resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp  and  diane     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   74     dallis  (hershey,  pa:  information  science  reference,  2012),  59–76,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch004.   14.    monica  metz-­‐wiseman  et  al.,  “best  practices  for  selecting  the  best  fit,”  in  planning  and   implementing  resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp  and  diane   dallis  (hershey,  pa:  information  science  reference,  2012),  77–89,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch005.   15.    david  freivalds  and  binky  lush,  “thinking  inside  the  grid:  selecting  a  discovery  system   through  the  rfp  process,”  in  planning  and  implementing  resource  discovery  tools  in  academic   libraries,  ed.  mary  p.  popp  and  diane  dallis  (hershey,  pa:  information  science  reference,   2012),  104–21,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch007.     16.    david  bietila  and  tod  olson,  “designing  an  evaluation  process  for  resource  discovery  tools,”   in  planning  and  implementing  resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp   and  diane  dallis  (hershey,  pa:  information  science  reference,  2012),  122–36,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch008.   17.    suzanne  chapman  et  al.,  “developing  a  user-­‐centered  article  discovery  environment,”  in   planning  and  implementing  resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp   and  diane  dallis  (hershey,  pa:  information  science  reference,  2012),  194–224,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch012.   18.    lynn  d.  lampert  and  katherine  s.  dabbour,  “librarian  perspectives  on  teaching  metasearch   and  federated  search  technologies,”  internet  reference  services  quarterly  12,  no.3/4  (2007):   253–78,  http://dx.doi.org/10.1300/j136v12n03_02;  william  breitbach,  “web-­‐scale   discovery:  a  library  of  babel?”  in  planning  and  implementing  resource  discovery  tools  in   academic  libraries,  ed.  mary  p.  popp  and  diane  dallis  (hershey,  pa:  information  science   reference,  2012),  637–45,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch038.   19.    metz-­‐wiseman  et  al.,  “best  practices  for  selecting  the  best  fit,”  81.   20.    meris  a.  mandernach  and  jody  condit  fagan,  “creating  organizational  buy-­‐in:  overcoming   challenges  to  a  library-­‐wide  discovery  tool  implementation,”  in  planning  and  implementing   resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp  and  diane  dallis  (hershey,   pa:  information  science  reference,  2012),  422,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐ 3.ch024.   21.    
david p. brennan, "details, details, details: issues in planning for, implementing, and using resource discovery tools," in planning and implementing resource discovery tools in academic libraries, ed. mary p. popp and diane dallis (hershey, pa: information science reference, 2012), 44–56, http://dx.doi.org/10.4018/978-1-4666-1821-3.ch003; hoseth, "criteria to consider when evaluating web-based discovery tools"; mandernach and condit fagan, "creating organizational buy-in."
22. vaughan, "evaluating and selecting a library web-scale discovery service," 64.
23. ibid., 81.
24. nadine p. ellero, "an unexpected discovery: one library's experience with web-scale discovery service (wsds) evaluation and assessment," journal of library administration 53, no. 5–6 (2014): 323–43, http://dx.doi.org/10.1080/01930826.2013.876824.
25. vaughan, "evaluating and selecting a library web-scale discovery service," 66.
26. hoseth, "criteria to consider when evaluating web-based discovery tools."
27. yang and wagner, "evaluating and comparing discovery tools"; chickering and yang, "evaluation and comparison of discovery tools"; bietila and olson, "designing an evaluation process for resource discovery tools."
28. vaughan, "investigations into library web-scale discovery services"; vaughan, "evaluating and selecting a library web-scale discovery service"; freivalds and lush, "thinking inside the grid"; brubaker, leach-murray, and parker, "shapes in the cloud."
29. chapman et al., "developing a user-centered article discovery environment."
30. jakob nielsen, "first rule of usability? don't listen to users," nielsen norman group, last modified august 5, 2001, accessed august 5, 2014, http://www.nngroup.com/articles/first-rule-of-usability-dont-listen-to-users.
31. freivalds and lush, "thinking inside the grid."
32. ibid.
33. matthew b. hoy, "an introduction to web scale discovery systems," medical reference services quarterly 31, no. 3 (2012): 323–29, http://dx.doi.org/10.1080/02763869.2012.698186; vaughan, "web scale discovery services"; vaughan, "investigations into library web-scale discovery services"; hoeppner, "the ins and outs of evaluating web-scale discovery services"; chickering and yang, "evaluation and comparison of discovery tools."
34. marshall breeding, "major discovery products," library technology guides, accessed august 5, 2014, http://librarytechnology.org/discovery.
35. hoeppner, "the ins and outs of evaluating web-scale discovery services," 40.
36. mandernach and condit fagan, "creating organizational buy-in," 429.
37. bietila and olson, "designing an evaluation process for resource discovery tools."
38. special thanks to rutgers' associate university librarian for digital library systems, grace agnew, for designing this testing method.

book reviews

proceedings of the conference on interlibrary communications and information networks, edited by joseph becker, sponsored by the american library association and the u.s.
office of education, bureau of libraries and educational technology held at airlie house, warrenton, virginia, september 28, 1970-0ctober 2, 1970. chicago: american library association, 1971. 347p to see how rapidly the field of library networking and communications has moved in recent times, one need only try to review a conference on the subject some years after it was held. what was fresh, imaginative, innovative, or blue-sky has become accepted or gone beyond; errors in thinking or bad guesses as to the future have been shown up; and the blue sky has been divided into lower stratospheres and outer space for ease of working. under these circumstances one can only review such proceedings as history. the assumptions on which the conference was based were the traditional ones of librarians and information scientists-that access to information should be the right of anyone without regard to geographical or economic position, and that pooling of resources (here by networking operations) is one of the best ways to reach that goal. since 1970 both of these assumptions have been questioned, but at the time of the conference there were no opposing voices. the final conclusions, of course, were based on these assumptions. national systems were recommended, both governmental and private, with the establishment of a public corporation (such as the corporation for public broadcasting) as the central stimulator, coordinator, and regulator, to be served by input from a large number of groups. funding, the attendees decided, should be pluralistic, from public, private, and foundation sources (are there any others?), but with the federal government bearing the largest burden of support. since it is deemed desirable to give the widest chance for all individuals to use these networks, it was recommended that fee-forservice prices should be kept low through subventions of the telecommunications costs by libraries and information centers. and since new techniques and methods need to be learned, both education and research in the field must be strengthened and enlarged. since the basic components of networks of libraries and information centers was conceived as being: 1. bibliographic access to media 2. mediation of user request to information book reviews 245 3. delivery of media to users 4. education traditional questions of bibliographic description, the most useful form of public services (including such things as interviewing requestors, seeking information on the existence of answers, locating the answers physically, providing them, evaluating them and obtaining feedback), as well as the best ways to set up networks were discussed at length. moreover, since new technologies have sometimes been touted as the answer to many of these problems, a whole section on network technology was included. such subjects as telecommunications, cable television, and computers were examined; here most of the recommendations still remain to be carried out. the organization proposed for these networks again plowed old ground. the conferees felt that one should use the tremendous national and disciplinary resources already established (the library of congress, the national library of medicine, the national agricultural library, chemical abstracts, etc.); there should be a coordinating body to minimize duplication of effort and assure across-the-board coverage; the systems must be sold to legislators if public money is to be provided; and more research on the best networking operations is necessary. 
above all in almost every section of the report and in the preface the then-new national commission on libraries and information science was referred to as the great savior. together with requests for public money, it might be said, this was the thread binding all sections of the conference together. was this conference necessary? could it have brought forth something more useful than the gentle spoof in irwin pizer's poem "hiawatha's network?" it was undoubtedly very inspiring for those at the conferenceall 100 of them-who probably learned more over the cocktail glass and dinner plate than at the formal sessions, and who learned as they grappled with the difficulties of consensus-making. but need the proceedings have been published? is everything ever said at a meeting always worth preserving? how about the concept of ephemera rather than total recall? would not a short summary of the recommendations have sufficed? estelle brodman december_ital_fifarek_final president’s message: focus on information ethics aimee fifarek information technologies and libraries | december 2016 1 just a few weeks ago we held yet another successful lita forum1, this time in fort worth, tx. tight travel budgets and time constraints mean that only a few hundred people get to attend forum each year, but that is one of the things that make it a great conference. because of its size you have a realistic chance of meeting everyone there, whether it’s at game night, one of the many networking dinners, or just for during hallway chitchat after a session. and the sessions really do give you something to talk about. this year i couldn’t help but notice a theme. among all the talk about makerspace technologies, analytics, and specific software platforms, the one bubble that kept rising to the surface was information ethics. why are you doing what you are doing with the information you have, and should you really be doing it? have you stopped to think what impact collecting, posting, sharing that information is going to have on the world around you? in a post-election environment replete with talk of fake news and other forms of deliberate misinformation, lita forum presenters seem to have tapped in to the zeitgeist. tara robertson, in her closing keynote2, talked about the harm digitizing analog materials can do when what is depicted is sensitive to individuals and communities. waldo jaquith of us open data talked about how a government decision to limit options on a birth certificate to either “white” or “colored” effectively wiped the native population out of political existence in virginia. and sam kome from claremont colleges talked about how well-meaning librarians can facilitate privacy invasion merely by collecting operational statistics3. there were many other examples brought out by forum speakers but these in particular emphasized the real consequences the serious consequences the use of data – intentional or not – can have on people. i think it is time for librarians4 to get more vocal about information ethics and the role we play in educating the population about humane information use. our profession has always been forward thinking about information literacy and is traditionally known for helping our communities make judgements about the information they consume. 
but we have not done enough to declare our expertise in the information economy, to stand up and say “we’re librarians – this is what we do.” now, more than ever, people need the skills to think critically about the information they are consuming via all kinds of media, understand the consequences of allowing algorithms to shape their information universe, and make quality judgments about trading their personal information for goods and services. to quote from unesco: aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az. president’s message | fifarek https://doi.org/10.6017/ital.v35i4.9602 2 changes brought about by the rapid development of information and communication technologies (ict) not only open tremendous opportunities to humankind but also pose unprecedented ethical challenges. ensuring that information society is based upon principles of mutual respect and the observance of human rights is one of the major ethical challenges of the 21st century.5 i challenge all librarians to make a commitment to propagating information ethics, both personally and professionally. make an effort to get out of your social media echo chamber6 and engage with uncomfortable ideas. when you see biased information being shared consider it a “teachable moment” and highlight the spin or present more neutral information. and if your library is not actively making information literacy and information ethics part of its programming and instruction, then do what you can to change it. offer to be on a panel, create a curriculum, or host a program that includes key concepts relating to information “ownership, access, privacy, security, and community”7. the focus of the libraries transform campaign this year is all about our expertise: “because the best search engine in the library is the librarian”8 it’s our time to shine. references 1. http://forum.lita.org/home/ 2. http://forum.lita.org/speakers/tara-robertson/ 3. http://forum.lita.org/sessions/patron-activity-monitoring-and-privacy-protection/ 4. as always, when i use the term “librarian” my intention is to include any person who works in a library and is skilled in information and library science, not to limit the reference to those who hold a library degree. 5. http://en.unesco.org/themes/ethics-information 6. https://www.wnyc.org/story/buzzfeed-echo-chamber-online-news-politics/ 7. https://en.wikipedia.org/wiki/information_ethics 8. http://www.ilovelibraries.org/librariestransform/ gaps in it and library services at small academic libraries in canada jasmine hoover information technology and libraries | december 2018 15 jasmine hoover (jasmine_hoover@cbu.ca) is scholarly resources librarian, cape breton university, sydney, nova scotia, canada. abstract modern academic libraries are hubs of technology, yet the gap between the library and it is an issue at several small university libraries across canada that can inhibit innovation and lead to diminished student experience. this paper outlines results of a survey of small (<5,000 fte) universities in canada, focusing on it and the library when it comes to organizational structure, staffing, and location. it then discusses higher level as well as smaller scale solutions to this issue. 
introduction

modern academic libraries are hubs of technology, yet existing staffing, organizational structures, physical proximity, and traditional ways of doing things in higher education have maintained a gap between the library and it, which is an issue at several small university libraries across canada. libraries today are largely online, which means managing access to resources, using online tools for reference and research, designing websites, and more. the physical space in libraries is now a place to interact with new technologies, visualize data, a place for research support including open access repositories and data management, and other digital research initiatives.1 these library functions often require a staffing complement to support them with a level of specialization in information technology (it). however, though the offerings of the library have changed drastically over the years, smaller university libraries have struggled to support the growing need for it services. larger universities (over 5,000 fte) have managed this influx of demand and usage of new technologies in libraries by having their own library it services to manage software and technologies to support research, teaching, and learning. many also offer student and user-facing technical support with it help desks within the library. smaller universities (below 5,000 fte) often do not have the resources to have their own it department or staff and find themselves not able to help researchers with modern digital scholarship, not able to support new systems and software, and not working as closely with it as they would like or need. also, the it department is generally not responsible for this kind of work, as it is outside of institution-wide software support. this paper outlines the current status of it and the library when it comes to organizational structure, physical location, and collaboration in small academic libraries across canada. it then outlines strategies that can be used in smaller libraries to help bridge the gap, as well as recommendations for administrators when considering organizational changes to better serve a modern research atmosphere.

current status at small canadian universities

the technologies behind modern library services are often complex, as libraries need to securely manage access to online resources (both on and off campus); support faculty as they research and teach using new software and technologies; and support new models for publishing that include open-access repositories, data management, open education resources, and more. library staff deal with technology issues that come up daily, with several non-it library staff members troubleshooting and solving various issues that arise. library users run into all kinds of technical issues and reach out for help. in nova scotia, our library consortium offers live help, an online library chat service distributed throughout eleven academic institutions in nova scotia. statistics kept on the type of question asked on this service from january 2010 to march 2018 show that 26 percent of the over 68,000 questions asked are technical in nature, with topics including difficulty accessing online resources, login troubles, and other technical issues.2 for this study, 18 out of the 21 universities with fte >1,000 and <5,000 in canada were surveyed.
excluded were universities that were "sister" institutions of larger universities which utilized the same library system and french-only-speaking universities. twelve university libraries responded to an online survey which asked questions concerning organization and collaboration focused on it, the library, and educational technology. results (see figure 1) show that organizational reporting structures in higher education vary when it comes to it and the library. fifty percent of the survey respondents reported that their it department reports to the ceo/cfo or vp administrative, 25 percent of it departments report to a cio, 17 percent report to a provost/vp academic, and 8 percent report to a vp finance.

figure 1. which of the following best describes how your it organization reports?

all of the libraries in this survey, on the other hand, report to a provost or vp academic. this makes sense, as libraries are generally considered academic while it is usually associated with operations. however, there have been recent changes to some university library structures in canada that might indicate new thinking when it comes to organizational structure and the relationship between these units. in 2018, it was announced that there would be restructuring at brandon university which removed the university librarian position altogether (as well as the director of it services), and placed the library under a chief information officer. this would bring the library and it under one reporting structure.3 in an opposite move, mount allison university recently proposed to eliminate the top librarian position and have an academic dean split the responsibility of the library and their academic unit.4 after local outcry, this move was reversed and the job ad is out for a head librarian. it is hard to say if these are signals of upcoming change in the future of library reporting, or a temporary solution in a time of budget restrictions. however, half of the survey respondents mentioned that there has been some recent reorganization or planned reorganization related to it and the library at their institutions. only 33 percent of small university libraries surveyed have their own it department or staff. one of those libraries has an it specialist who splits time between the library and their it department. the other 67 percent have no it department or staff in the library (see figure 2).

figure 2. does your library have its own it department?

when asked, "is there anything you would like changed about the current organization when it comes to it and the library?," all of the libraries without in-library it support mentioned a desire for either a position in the library responsible for it; greater collaboration between it and the library; or a specific person within the it department who they could contact regarding it. student experience, including their experience with technology, is important according to a recent educause study. this 2017 educause study outlines the importance of it and of support for students when it comes to wi-fi and other technical support.5 one recommendation from this report is to have it help desks more visible and available. not only is the library a convenient location, but as we have already seen, students are increasingly using technologies in the library and often run into issues.
it makes sense then to have an it help desk within the library, as the majority of larger university libraries in canada already offer. when asked about it help desks in the library, three of the responding university libraries (25 percent) have help desks staffed by it services, one (8 percent) had a help desk staffed by library staff, and another (8 percent) had an after-hours help desk staffed by it services. the remaining 59 percent have no it help available in the library (see figure 3).

figure 3. does your library have an it help desk?

the physical location of the two units is also important. in this survey, 75 percent of respondents replied that the library and the it department are in separate spaces while 25 percent share a common space. studies have shown that physical proximity in the workplace can lead to greater collaboration. an mit study showed that physical proximity drives collaboration between researchers on university campuses.6 as one of the common themes in the survey was the desire for more collaboration, a physical change of location could have a great impact. when asked about changes people would like to see with the current organization of it and the library, many mention a need for more collaboration due to interrelated responsibilities. common suggestions included library it staff, having an it help desk in the library, or a specific person in it they could contact directly for help or who had shared responsibilities between it and the library. another suggestion was a committee that would bring together members from both units to strengthen communication.

what can be done?

in the larger view, university administrations need to look for outdated governance and organizational structures that are in place. as universities shift their goals and focus over time, they need to adapt structures and staffing accordingly. chong and tan describe it governance as being of utmost importance, claiming there needs to be strategic alignment between it and organizational strategies and objectives.7 carraway describes universities with a high level of it governance maturity and effectiveness as those where "it initiatives are aligned with the institution's strategic priorities and prioritized among the university's portfolio projects."8 effective it governance, focused on collaboration and communication, is associated with greater integration of innovation into institutional processes. also, it governance was found to be more effective under a delegated model that empowers it governance bodies than under a cio-centric model. the majority of universities surveyed showed common governance structures of it, with most as separate units reporting to a cfo/vp admin or similar. the inclusion of faculty, students, and business units in it governance committees was associated with a stronger innovation culture.9 stakeholder inclusion is an important characteristic of it governance maturity. students, as consumers of it, and faculty should both have a seat at the table when it comes to it governance.
carraway found that an increased level of student engagement in it governance correlates with a high level of innovation culture.10 university administration should take a good look at how it is governed, who has input, and how it governance affects the university's objectives. the reporting structure of libraries has generally gone unchanged, with most respondents confirming that their library reports to an academic vice president. budget constraints at two canadian universities have affected library structures of late; however, there has been little research done on the ideal governance structure of libraries in higher education. both it and the library in smaller canadian universities could consider governance committees that include students, faculty, and other stakeholders in order to be more innovative and effective.

the it unit is an interesting one: its model in higher education has moved back and forth among three main structures: centralized, decentralized, and federated. centralized, where a central hub runs it services for the university, is the most common structure found at the surveyed universities. decentralized, where it services are spread throughout the organization, would automatically mean the library (and other units) had it staff. a federated model would also lead to local library it work being done by specific people who work for and out of a central it office but are assigned to specific areas. federated structures offer centralized control with decentralized functions in faculties and units. chong and tan believe that federated structures are more appropriate for a collaborative network such as a university.11 their study found that a federated structure, combined with coordinated communication, led to higher effectiveness. nugroho maintains that decentralized organizations such as universities need to regularly review their it governance structure, as both technology and the organization itself change.12 he maintains that effective governance does not happen by coincidence, and it governance is not a static concept.

library staffing also needs to change based on the needs of users and the goals of the organization. some even suggest that libraries reorganize every few years to keep staff flexible, take advantage of new opportunities, and foster growth.13 in 2011, we saw bell and shank's work on the blended librarian, which advocated for librarianship with educational technology and instructional design skills.14 according to the 2015 arl statistics, we continue to see nontraditional professional jobs increasing in the library. in 2015, the top three new-hire categories included two nontraditional categories: digital specialists and functional specialists.15 acrl statistics from 2016 showed that over the previous five years, 61 percent of libraries repurposed or cross-trained staff to better support new technologies or services.16 we saw in the survey that of the more than 68,000 research questions fielded by librarians across nova scotia since 2010, just over one quarter were technical in nature. library administration at smaller universities, looking at these numbers, should respond by ensuring that technical knowledge and skills are written into job ads, as they are increasingly in demand, or that staff are trained appropriately. physical location is also important.
we’ve seen from the survey results that there is a lack of physical connectedness between the library and it in smaller canadian universities. wineman et al. studied various organizations and their physical proximity. they state: “social networks play important roles in structuring communication, collaboration, access to knowledge and knowledge transformation.”17 they suggest that innovation is a process that occurs at the crossroad between social and physical space. cramton points out that “maintaining mutual knowledge is a central problem of geographically dispersed collaboration.”18 if it is not possible to change the organizational structure or governance to ensure more communication and knowledge sharing, physical spaces such as an it desk in the library is another way for the library and it staff to be in regular contact. a 2017 mit study recommended that institutions keen to support the crossdisciplinary collaborative activity that is vital to research and practice, may need to adopt “a new approach to socio-spatial organisation that may ultimately enrich the design and operation of places for knowledge creation.”19 we could apply the same thinking to institutions interested in supporting collaborative activity between the library, it, and newer-yet-related initiatives such as educational technology and digital research centers. proximity to collaborators should be considered as one option to enhance outcomes and innovation between the library and it. organizational structures and models, physical locations, and governance are all large-scale factors that should be considered when looking at the relationship between it and the library. there are also smaller-scale practical ideas that can help. these ideas will be discussed below. an important first step is to start the conversation. the author’s institution has begun thinking about the gaps in our services and support for research, especially when it comes to support for technologies needed for modern research and publication that are often housed in the library. factors which have helped start this conversation include: funding mandates related to open access and data management; new services or initiatives that researchers or units would like to start; which require it and library specialization; and planning for a future in higher education that increasingly relies on up to date technologies to support research, publishing, and teaching. a conversation is beginning between researchers, administration, the library, and other stakeholders which will lead to a collaborative solution to some of these issues. it’s important that there is interest and initiation from administration, but also that other stakeholders are involved from the onset. many universities have developed new positions, new units, or worked these positions into it or the library to fill this gap, but the solution needs to fit each institution and their goals. often times when there is no it staff in the library, technical issues are managed by one or two technical-minded staff members. equipping frontline service providers may help alleviate some of this work by enabling many staff to solve common technical issues. here at the author’s institution, the librarian in charge of access has begun presenting common technical/access issues during a monthly reference meeting. the goal is to have all staff who field questions from users have a basic understanding of how the systems work in the library, what to do if they see issues, and whom they can contact. 
in libraries where there is not a strong it presence, it is important to enable staff to be comfortable with the basic issues that will come up. this also ensures that there is not just one person who can answer common technical/access questions. if someone staffing the reference or circulation desk encounters users with these issues, they can explain why they are happening and what the library is going to do to help. the plan is to create a library technical manual out of these quick presentations that can act as a resource for all staff or as a training manual for new staff. at each of these presentations, a survey is administered. the survey has four questions and asks participants about their comfort level in dealing with technical/access questions both before and after the presentation. one hundred percent of staff answered that, after the presentation, they felt more comfortable when encountering the issues described. this is not a suitable replacement for the specialized it skills needed in libraries; however, it can alleviate some of the pressure put on select people in smaller academic libraries. library staff can, and do, actively work to learn new skills through formal training and professional development. we saw from the acrl survey that many libraries are working to cross-train staff in order to keep up with technological demands. learning new skills and pursuing educational opportunities can go a long way and should be encouraged by library administration.

the benefit of having it staff dedicated to the library is obvious, and libraries should continually push for this. results of the survey showed that library staff would prefer to have a person to contact with issues specific to the library: issues can be dealt with promptly; it personnel working in or assigned to the library have an understanding of the systems involved; communication is easier, as there is a point person to contact; and the library has control over the products and services it offers. however, if that is not possible within the organization, a good system of communication is important. a timely system of contacting it and resolving issues can go a long way. chong and tan maintain that a coordinated communication system is key for it in an organization.20 a commonly used system for technical issues is the ticket system, where issues can be submitted by users and then answered and tracked by it. this is a very useful system for it staff; however, users often cannot track their own ticket, see a timeline for completion, or know who is on the other end to contact with more information. it is a good idea to meet regularly with it, formally or informally, to discuss issues, build relationships with colleagues, and get a better sense of how each unit works. on the library end, it is important to keep statistics on technical issues sent to it and the time elapsed before the issues are resolved. these statistics can be used to demonstrate the need for library-specific it staff, encourage better communication between departments, or demonstrate a problem with the current way issues are communicated. having statistics will help libraries if and when the time comes that new positions can be created. at the author's institution we use springshare's libanswers software to track all technical issues, including those sent on to it.
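as an illustration of the kind of statistics-keeping described above, the short python sketch below summarizes time-to-resolution figures from a spreadsheet of ticket data. it is a minimal sketch, not the author's actual workflow: the file name issues.csv and the column names opened, resolved, and sent_to_it are hypothetical placeholders for whatever fields a library's ticketing or libanswers export actually contains.

```python
"""Summarize technical-issue tickets exported to a CSV file.

A minimal sketch: assumes columns named 'opened', 'resolved', and
'sent_to_it' (yes/no). Adjust the names to match your own export.
"""
import csv
from datetime import datetime
from statistics import mean, median

def parse(ts):
    # Timestamps are assumed to look like "2018-09-04 14:32"; adjust as needed.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

hours_to_resolve = []   # elapsed time for each resolved ticket
sent_to_it = 0          # tickets escalated to the central IT department

with open("issues.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["sent_to_it"].strip().lower() == "yes":
            sent_to_it += 1
        if row["resolved"]:  # skip tickets that are still open
            delta = parse(row["resolved"]) - parse(row["opened"])
            hours_to_resolve.append(delta.total_seconds() / 3600)

print(f"resolved tickets: {len(hours_to_resolve)}")
print(f"escalated to it:  {sent_to_it}")
if hours_to_resolve:
    print(f"mean hours to resolve:   {mean(hours_to_resolve):.1f}")
    print(f"median hours to resolve: {median(hours_to_resolve):.1f}")
```

figures such as the median hours to resolution map directly onto the uses described above: demonstrating the need for library-specific it staff or documenting problems with the current way issues are communicated.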
the libanswers software records dates and times, captures important details and resolutions for technical issues, and exports useful statistics.

in smaller canadian university libraries there is a growing need for it support. however, little has been done by way of organizational structure, staffing, or physical proximity between these two units to allow universities to better serve their students and faculty. this paper outlined the current situation in several smaller university libraries in canada and provided some high-level as well as local solutions to this problem.

appendix a: it, library, and educational technology organization
*required
1. institution name *
2. total student population
3. which of the following best describes how your it organization reports? mark only one oval. reports to ceo/cfo/vp admin; reports to cio; reports to provost/vp academic; reports to dean of library/head of library; other:
4. which of the following best describes how the dean/head of library/university librarian reports? mark only one oval. reports to the ceo/cfo/vp admin; reports to provost/vp academic; reports to university president; other:
5. which of the following best describes it's relationship to the library? mark only one oval. it and the library are not at all part of the same reporting structure; it is a part of the library reporting structure; it and the library report to the same person, but are separate departments; other:
6. which of the following describes the physical location of it and the library? mark only one oval. located in separate spaces; share a physical location; other:
7. does your library have its own it department? mark only one oval. yes, they are library employees; yes, they are employed by it services and work in the library; no; other:
8. does your library have an it help desk? mark only one oval. yes, they are library employees; yes, they are employed by it services; no; other:
9. have there been any major reorganizations (that you are aware of) related to it and library services in the last ten years?
10. is there anything you would like changed about your current organization when it comes to it and the library?
11. who is in charge of educational technology/academic technology at your university? mark only one oval. library; it; educational technology is a separate unit/office; educational technology duties are split up among the library/it/other; other:
12. which of the following describes the physical location of educational technology? mark only one oval. ed tech is located in or shares space with the library; ed tech is located in or shares space with it; ed tech has its own space; no ed tech unit; other:
13. what would you include as roles of an educational technology unit? mark all that apply. media design/production; research and development (testing technologies, emerging tech); instructional design and development; faculty development; learning spaces; assessment (learning outcomes, course evaluations); distance/online learning support; training on course software/technologies related to teaching and learning; managing classroom technologies; other:
14. have there been any changes (that you know of) related to educational technology services in the last ten years?
15.
is there anything you would like changed about your current organization when it comes to educational technology services and the library?
16. may i use direct quotes in my research/publication? (no names or institutions will be attributed to a quote.) mark only one oval. yes; no

references
1 tibor koltay, "are you ready? tasks and roles for academic libraries in supporting research 2.0," new library world 117, no. 1/2 (january 11, 2016): 94–104, https://doi.org/10.1108/nlw-09-2015-0062.
2 "instant messaging service—statistics data entry page," novanet, accessed june 5, 2018, https://util.library.dal.ca/livehelp/liveh3lp/admin/livehelp/chatentry.php.
3 "brandon university will eliminate 15% of senior administration to help tackle budget cut," brandon university, march 15, 2018, https://www.brandonu.ca/news/2018/03/15/brandon-university-will-eliminate-15-of-senior-administration-to-help-tackle-budget-cut/.
4 joseph tunney, "mount a proposal to phase out top librarian makes students, staff want to make noise," cbc news, january 18, 2018, https://www.cbc.ca/news/canada/new-brunswick/mount-allison-university-librarian-1.4492297.
5 d. christopher brooks and jeffrey pomerantz, "ecar study of undergraduate students and information technology," educause, october 18, 2017, accessed june 7, 2017, https://library.educause.edu/resources/2017/10/ecar-study-of-undergraduate-students-and-information-technology-2017.
6 matthew claudel et al., "an exploration of collaborative scientific production at mit through spatial organization and institutional affiliation," plos one 12, no. 6 (2017), https://doi.org/10.1371/journal.pone.0179334.
7 josephine chong and felix b. tan, "it governance in collaborative networks: a socio-technical perspective," pacific asia journal of the association for information systems 4, no. 2 (2012).
8 deborah louise carraway, "information technology governance maturity and technology innovation in higher education: factors in effectiveness" (master's diss., the university of north carolina at greensboro, 2015), 113.
9 ibid., 89.
10 ibid.
11 chong and tan, "it governance in collaborative networks: a socio-technical perspective," 44.
12 heru nugroho, "conceptual model of it governance for higher education based on cobit 5 framework," journal of theoretical and applied information technology 60, no. 2 (february 2014): 6.
13 gillian s. gremmels, "staffing trends in college and university libraries," reference services review 41, no. 2 (2013): 233–52, https://doi.org/10.1108/00907321311326165.
14 john d. shank and steven bell, "blended librarianship," reference & user services quarterly 51, no. 2 (winter 2011): 105–10.
15 stanley wilder, "hiring and staffing trends in arl libraries," association of research libraries, october 2017, https://www.arl.org/storage/documents/publications/rli-2017-stanley-wilder-article2.pdf.
16 "new acrl publication: 2016 academic library trends and statistics," news and press center, july 20, 2017, http://www.ala.org/news/member-news/2017/07/new-acrl-publication-2016-academic-library-trends-and-statistics.
17 jean wineman et al., "spatial layout, social structure, and innovation in organizations," environment and planning b: planning and design 41, no. 6 (december 1, 2014): 1,100–112, https://doi.org/10.1068/b130074p.
18 catherine durnell cramton, "the mutual knowledge problem and its consequences for dispersed collaboration," organization science 12, no. 3 (may–june 2001): 346–71, https://doi.org/10.1287/orsc.12.3.346.10098.
19 claudel et al., "an exploration of collaborative scientific production at mit through spatial organization and institutional affiliation," 2.
20 chong and tan, "it governance in collaborative networks: a socio-technical perspective," 44.

academic libraries on social media: finding the students and the information they want
heather howard, sarah huber, lisa carter, and elizabeth moore
information technology and libraries | march 2018
heather howard (howar198@purdue.edu) is assistant professor of library science; sarah huber (huber47@purdue.edu) is assistant professor of library science; lisa carter (carte241@purdue.edu) is library assistant; and elizabeth moore (moore658@purdue.edu) is library assistant and student supervisor at purdue university.
librarians from purdue university wanted to determine which social media platforms students use, which platforms they would like the library to use, and what content they would like to see from the library on each of these platforms.
we conducted a survey at four of the nine campus libraries to determine student social media habits and preferences. results show that students currently use facebook, youtube, and snapchat more than other social media types; however, students responded that they would like to see the library on facebook, instagram, and twitter. students wanted nearly all types of content from the libraries on facebook, twitter, and instagram, but they did not want to receive business news or content related to library resources on snapchat. youtube was seen as a resource for library service information. we intend to use this information to develop improved communication channels, a clear social media presence, and a cohesive message from all campus libraries. introduction in his book tell everyone: why we share and why it matters, alfred hermida states, “people are not hooked on youtube, twitter or facebook but on each other. tools and services come and go; what is constant is our human urge to share.”1 libraries are places of connection, where people connect with information, technologies, ideas, and each other. as such, libraries look for ways to increase this connection through communication. social media is a key component of how students communicate with classmates, families, friends, and other external entities. it is essential for libraries to communicate with students regarding services, collections, events, library logistics, and more. purdue university is a large, land-grant university located in west lafayette, indiana, with an enrollment of more than forty thousand. the purdue libraries consist of nine libraries, presented collectively on the social media platforms facebook and twitter since 2009 and youtube since 2012. going forward, the purdue libraries want to ensure it establishes a cohesive message and brand that is communicated to students on platforms they use and on which they will engage with it. the purpose of this study was to determine which social media platforms the students are currently using, which platforms they would like the library to use, and what content they would like to see from the libraries on each of these platforms. mailto:howar198@purdue.edu mailto:huber47@purdue.edu mailto:carte241@purdue.edu mailto:moore658@purdue.edu academic libraries on social media | howard, huber, carter, and moore 9 https://doi.org/10.6017/ital.v37i1.10160 literature review academic libraries and social media academic libraries have been slow to accept social media as a venue for either promoting their services or academic purposes. 
a 2007 study of 126 academic librarians found that only 12 percent of those surveyed “identified academic potential or possible benefits” of facebook while 54 percent saw absolutely no value in social media.2 however, the mission of academic libraries has shifted in the last decade from being a repository of knowledge to being a conduit for information literacy; new roles include being a catalyst for on-campus collaboration and a facilitator for scholarly publication within contemporary academic librarianship.3 academic librarians have responded to this change, with many now believing that “social media, which empowers libraries to connect with and engage its diverse stakeholder groups, has a vital role to play in moving academic libraries beyond their traditional borders and helping them engage new stakeholder groups.”4 student perceptions about academic libraries on social media as the use of social media has grown with college-aged students, so has an increasing acceptance of academic libraries using social media to communicate. a pew research center report from 2005 showed just 7 percent of eighteen to twenty-nine year olds using social media. by 2016, 86 percent were using social media.5 in 2007 the oclc asked 511 college students from six different countries to share their thoughts on libraries using social networking sites. this survey revealed that “most college students would be unlikely to participate in social networking services offered by a library,” with just 13 percent of students believing libraries have a place on social media.6 however, just two years later (in 2009), a shift was seen: students were open to connecting with academic libraries, as observed in a survey of 366 freshmen at valparaiso university. when asked their thoughts on the library sending announcements and communications to them via facebook or myspace (a social media powerhouse at the time), 42.6 percent answered they would be “more receptive to information received in this way than any other response.” a smaller group, 12.3 percent, responded more negatively to this approach. students showed concern for their privacy and the level of professionalism, as a quote from a student illustrates: “facebook is to stay in touch with friends or teachers from the past. email is for announcements. stick with that!!!” 7 as students report becoming more open to academic libraries on social media, the question of whether they will engage through social media emerges. a recent study from western oregon university’s hammersley library asked this question with promising results. forty percent of students said they were either “very likely “or “somewhat likely” to follow the library on instagram and twitter, as opposed to wanting communications being sent to them directly through social media (for example, a facebook message). pinterest followed, with 33 percent of students saying they were either “very likely” or “somewhat likely” to follow the library using this platform.8 throughout the literature, students have shown an interest in information about the libraries that is useful to them. in another survey given to undergraduate students from three information technology classes at florida state university, one question examined the perceived importance of different library social media postings to students. 
the report showed students considered postings related to operations updates, study support, and events as the most important.9 in the hammersly study noted above, 78 percent and 87 percent of respondents said information technology and libraries | march 2018 10 they were either “very interested” or “somewhat interested,” respectively, in every category relating to library resources presented in the survey, but “interesting/fun websites and memes” received the least interest from participants.10 the literature shows an increase in students being receptive to academic libraries on social media. results vary campus to campus and students are leery of libraries reaching out to them via social media, but they have an increasingly positive view about content posted that will help them with the library. research questions the aim of this project was to investigate the social media behaviors of purdue university students as they relate to the libraries, and to develop evidence-based practices for managing the library’s social media accounts. the project focused on three research questions: 1. what social media platforms are students using? 2. what social media platforms do students want the library to use? 3. what kind of content do students want from the library on each of these platforms? methods we created the survey using the web-based qualtrics survey software. it was distributed in electronic form only, and it was promoted to potential respondents via table tents in the libraries, bookmarks at the library desk, facebook posts, and in-classroom promotion. potential respondents were advised that the survey was anonymous and voluntary. the survey consisted of closed questions, though many questions contained an open-ended field for answers that did not fall into the provided choices. inspiration for some of the options in our survey questions came from the hammersly library study, as we felt they did a good job capturing information about the social media usage of their patrons.11 our survey asked what social media platforms students use, what they use them for, how often they visit the library, how likely they are to follow the library on social media, which platforms they want the library to have, and what content they would like from the library on each of those platforms. the social media platforms included were facebook, flickr, g+, instagram, linkedin, pinterest, qzone, renren, snapchat, tumblr, twitter, youtube, and yik yak.12 there were also open-ended spaces where participants could write in additional platforms. the survey originally ran for three weeks in only the business library early in the spring 2017 semester, as its intended purpose was to inform how the business library would manage social media. after that survey was completed, we decided to replicate the survey in three additional libraries (humanities, social science, and education; engineering; and the main undergraduate libraries). this was done to expand the dataset and reach additional students in a variety of disciplines. these libraries were chosen because they were the libraries in which the authors work, with the hope to expand to additional libraries in the future. the second survey also lasted for three weeks starting in mid-april of the spring 2017 semester. as a participation incentive, students who completed the initial survey and the second survey had an opportunity to enter a drawing for a $25 visa gift card. 
the survey was advertised across four different campus libraries and promoted in several ways to reach different populations. though the results are not from a random sample of the student population, they are broad enough that we intend to apply them to our entire student population.

results
survey
the survey was completed by 128 students. an additional 13 students began the survey but did not complete it; we removed their results from the analysis. the breakdown of respondents was 10 percent freshmen (n = 13), 22 percent sophomore (n = 28), 27 percent junior (n = 35), 20 percent senior (n = 25), and 21 percent graduate or professional (n = 27).

library usage
the students were asked how frequently they visit the library to determine if the survey was reaching a population of regular or infrequent library visitors. the results showed that the students who completed the survey were primarily frequent library users, with 93 percent (n = 119) visiting once a week or more.

social media platforms
the students were asked to identify which social media platforms they used and how frequently they used them. the most popular social media platforms were determined by combining the number of students who said they used them daily or weekly. the top five were facebook (n = 114, 88 percent), youtube (n = 102, 79 percent), snapchat (n = 90, 70 percent), instagram (n = 85, 66 percent), and twitter (n = 41, 32 percent). full results are in table 1.

table 1. usage frequency by platform
social media platform | daily | weekly | monthly | < once per month | never
facebook | 94 (72.87%) | 20 (15.50%) | 5 (3.88%) | 5 (3.88%) | 4 (3.10%)
flickr | 0 (0.00%) | 1 (0.78%) | 2 (1.55%) | 8 (6.20%) | 117 (90.70%)
g+ | 3 (2.33%) | 6 (4.65%) | 4 (3.10%) | 16 (12.40%) | 99 (76.74%)
instagram | 68 (52.71%) | 17 (13.18%) | 5 (3.88%) | 11 (8.53%) | 27 (20.93%)
linkedin | 9 (6.98%) | 29 (22.48%) | 22 (17.05%) | 22 (17.05%) | 46 (35.66%)
pinterest | 12 (9.30%) | 12 (9.30%) | 16 (12.40%) | 19 (14.73%) | 69 (53.49%)
qzone | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 4 (3.10%) | 124 (96.12%)
renren | 0 (0.00%) | 0 (0.00%) | 1 (0.78%) | 3 (2.33%) | 124 (96.12%)
snapchat | 84 (65.12%) | 6 (4.65%) | 6 (4.65%) | 7 (5.43%) | 25 (19.38%)
tumblr | 7 (5.43%) | 2 (1.55%) | 7 (5.43%) | 11 (8.53%) | 101 (78.29%)
twitter | 28 (21.71%) | 13 (10.08%) | 12 (9.30%) | 9 (6.98%) | 66 (51.16%)
youtube | 58 (44.96%) | 44 (34.11%) | 15 (11.63%) | 4 (3.10%) | 7 (5.43%)
yik yak | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 11 (8.53%) | 117 (90.70%)
other: email | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: groupme | 3 (2.33%) | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: reddit | 2 (1.55%) | 2 (1.55%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: skype | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 1 (0.78%) | 0 (0.00%)
other: vine | 0 (0.00%) | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: wechat | 3 (2.33%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: weibo | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: whatsapp | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)

social media activity
next, students were asked how much time they spend on social media doing the following activities: watching videos, keeping in touch with friends/family, sharing photos, keeping in touch with classmates/professors, learning about campus events, doing research, getting news, or following public figures.
table 2 shows that students overwhelmingly use social media daily or weekly to watch videos (94 percent, n = 120), keep in touch with family/friends (93 percent, n = 119), and get news (81 percent, n = 104). the least popular activities, those that students do less than once per month or never, were doing research (47 percent, n = 60) and following public figures (34 percent, n = 45).

social media and the library
the students were asked how likely they are to follow the libraries on social media. the response to this was primarily positive, with 57 percent of respondents saying they are either extremely likely or somewhat likely to follow the library. one response for this question was inexplicably null, so for this question n = 127. figure 1 contains the full results.

table 2. social media activity
social media activity | daily | weekly | monthly | < once per month | never
watch videos | 85 (66.41%) | 35 (27.34%) | 1 (0.78%) | 4 (3.13%) | 3 (2.34%)
keep in touch with friends/family | 89 (69.53%) | 30 (23.44%) | 6 (4.69%) | 2 (1.56%) | 1 (0.78%)
share photos | 32 (25%) | 33 (25.78%) | 38 (29.69%) | 20 (15.63%) | 5 (3.91%)
keep in touch with classmates/professors | 34 (26.56%) | 47 (36.72%) | 21 (16.41%) | 19 (14.84%) | 7 (5.47%)
learn about campus events | 24 (18.75%) | 53 (41.41%) | 29 (22.66%) | 18 (14.06%) | 4 (3.13%)
do research | 24 (18.75%) | 26 (20.31%) | 18 (14.06%) | 23 (17.97%) | 37 (28.91%)
get news | 66 (51.56%) | 38 (29.69%) | 7 (5.47%) | 9 (7.03%) | 8 (6.25%)
follow public figures | 34 (26.56%) | 30 (23.44%) | 20 (15.63%) | 19 (14.84%) | 24 (18.75%)
other | 2 (1.56%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%)

figure 1. library social media follows: how likely are you to follow the library on social media? (response options: extremely likely, somewhat likely, neither likely nor unlikely, somewhat unlikely, extremely unlikely.)

the students were asked which social media platforms they thought the library should be on. five rose to the top of the results: facebook (82 percent, n = 105), instagram (55 percent, n = 70), twitter (40 percent, n = 51), snapchat (34 percent, n = 44), and youtube (29 percent, n = 37). full results can be seen in figure 2. after a student selected a platform they wanted the library to be on, logic built into the survey directed them to an additional question that asked what content they would like to see from the library on that platform. content included library logistics (hours, events, etc.), research techniques and tips, how to use library resources and services, library resource info (database instruction/tips, journal availability, etc.), business news, library news (e.g., if the library wins an award), campus-wide info/events, and interesting/fun websites and memes. for facebook, students widely selected all types of content, with the most selections made for library logistics (n = 73) and the fewest made for business news (n = 33). for instagram, students wanted all content except business news (n = 18). snapchat was similar, except along with business news (n = 8), students were also not interested in receiving content related to library resource information (n = 9). twitter was similar to facebook in that all content was widely selected. youtube had a focus on library services, with the three most-selected content options being research techniques and tips (n = 20), how to use library resources and services (n = 19), and library resource info (n = 16).
table 3 contains the full results.

figure 2. library social media presence: what social media platform should the library be on? facebook, 105; g+, 7; instagram, 70; linkedin, 23; pinterest, 10; qzone, 1; renren, 1; snapchat, 44; tumblr, 5; twitter, 51; youtube, 37.

table 3. library social media content by platform (what type of content would you like to see from the library?)
content type | facebook (n = 105) | g+ (n = 7) | instagram (n = 70) | linkedin (n = 23) | pinterest (n = 10) | snapchat (n = 44) | tumblr (n = 5) | twitter (n = 51) | youtube (n = 37)
library logistics (hours, events, etc.) | 73 (69.52%) | 2 (28.57%) | 34 (48.57%) | 7 (30.43%) | 4 (40%) | 23 (52.27%) | 2 (40%) | 32 (62.75%) | 8 (21.62%)
research techniques & tips | 52 (49.52%) | 3 (42.85%) | 28 (40%) | 13 (56.53%) | 7 (70%) | 19 (43.18%) | 3 (60%) | 27 (52.94%) | 20 (54.05%)
how to use library resources & services | 53 (50.48%) | 3 (42.85%) | 26 (37.14%) | 8 (34.78%) | 7 (70%) | 16 (36.36%) | 3 (60%) | 25 (49.02%) | 19 (51.35%)
library resource info (database instruction/tips, journal availability, etc.) | 53 (50.48%) | 3 (42.85%) | 22 (31.42%) | 8 (34.78%) | 6 (60%) | 9 (20.45%) | 2 (40%) | 23 (45.10%) | 16 (43.24%)
business news | 33 (31.43%) | 2 (28.57%) | 18 (25.71%) | 13 (56.52%) | 3 (30%) | 8 (18.18%) | 2 (40%) | 17 (33.33%) | 7 (18.92%)
library news (e.g., if the library wins an award) | 49 (46.67%) | 3 (42.85%) | 37 (52.86%) | 12 (52.17%) | 5 (50%) | 19 (43.18%) | 3 (60%) | 24 (47.06%) | 7 (18.92%)
campus-wide info/events | 73 (69.52%) | 3 (42.85%) | 42 (60%) | 5 (21.74%) | 5 (50%) | 26 (59.09%) | 2 (40%) | 35 (68.63%) | 13 (35.14%)
interesting/fun websites & memes | 48 (45.71%) | 0 | 41 (58.57%) | 2 (8.70%) | 10 (100%) | 30 (68.18%) | 3 (60%) | 26 (50.98%) | 12 (32.43%)
other | 1 (0.95%) | 0 | 2 (2.86%) | 0 | 1 (10%) | 2 (4.55%) | 0 | 2 (3.92%) | 1 (2.70%)

discussion
historically, libraries have used social media as a marketing tool.13 with social media's ever-increasing popularity with young adults, academic libraries have actively established a presence on several platforms.14 our survey shows that our students follow this trend, using social media regularly and for a variety of activities. we were surprised that facebook turned out to be the most widely used by our students, as much has been written in the last few years about teens and young adults leaving the platform.15 a november 2016 survey, however, found that 65 percent of teens said they used facebook daily, a large increase from 59 percent in november 2014. though snapchat and instagram are preferred, teens continue to use facebook for its utility in scheduling events or keeping in touch regarding homework.16 students do seem receptive to following the library on different platforms and report wanting primarily library-related content from us, including more in-depth content such as research techniques and database instruction.

limitations and future work
findings from this study give insight into opportunities for libraries to reach university students through social media. we acknowledge that only limited generalizations can be made because of the way the survey was conducted. our internal recruitment methods led to a selection bias in our surveyed population, as advertisement of the survey took place either in the chosen libraries or on the purdue libraries' existing facebook page.
because of this, our sample consists primarily of students who visit the library or already follow the library on facebook. we hope to alter this in future surveys by expanding our recruitment to other physical spaces across campus. in addition, we plan to add questions that first establish a better understanding of students’ opinions of libraries being on social media before asking what social media they would like to see libraries use. this would potentially avoid leading students to an answer. further, we are concerned we took for granted students’ understanding of library resources; that is, we may have made distinctions librarians understand, but students may not. in future studies, we plan to rephrase, and possibly combine, questions in a way that will be clear to people less familiar with library resources and services. we believe confusion with these questions created contradictory responses. for example, “research help through social media” received a low response rate, but “information on research techniques and tips” received a much higher response rate. additionally, a limitation of using a survey to collect behavior information is that respondents do not always report how they actually behave. using methods such as focus groups, interviews, text mining, or usability studies could provide a more holistic view of student behavior. duplication of this study on a yearly or semi-yearly basis across all libraries could help us see how social media preferences change over time and across a larger sample of our population. this study aimed to provide a broad view of a large university’s student body by surveying across different subject libraries. with the changes discussed, we think a revised survey could give us the detailed information we need to build a more effective social media strategy that reaches both library users and non-users. conclusion this study improved our understanding of the social media usage and preferences of purdue students. from these results, we intend to develop better communication channels, a clear social media presence, and a more cohesive message across the purdue libraries. under the direction of our new director of strategic communication, a social media committee was formed with representatives from each of the libraries to contribute content for social media. the committee will consider expanding the purdue libraries’ social media presence to communication channels where students have said they are and would like us to be. as social media usage is ever-changing, we recommend repeated surveys such as this to better understand where on social media students want to see their libraries and what information they want to receive from them. academic libraries on social media | howard, huber, carter, and moore 17 https://doi.org/10.6017/ital.v37i1.10160 references 1 alfred hermida, tell everyone: why we share and why it matters (toronto: doubleday canada, 2014), 1. 2 laurie charnigo and paula barnett-ellis, “checking out facebook.com: the impact of a digital trend on academic libraries,” information technology and libraries 26, no. 1 (march 2007): 23–34, https://doi.org/10.6017/ital.v26i1.3286. 3 stephen bell, lorcan dempsey, and barbara fister, new roles for the road ahead: essays commissioned for the acrl’s 75th anniversary (chicago: association of college and research libraries, 2015). 4 amanda harrison et al., “social media use in academic libraries: a phenomenological study,” journal of academic librarianship 43, no. 
3 (may 1, 2017): 248–56, https://doi.org/10.1016/j.acalib.2017.02.014. 5 “social media fact sheet,” pew research center, january 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/. 6 online computer library center, sharing, privacy and trust in our networked world: a report to the oclc membership, (dublin, ohio: oclc, 2007)), https://eric.ed.gov/?id=ed532599. 7 ruth sara connell, “academic libraries, facebook and myspace, and student outreach: a survey of student opinion,” portal: libraries and the academy 9, no. 1 (january 8, 2009): 25–36, https://doi.org/10.1353/pla.0.0036. 8 elizabeth brookbank, “so much social media, so little time: using student feedback to guide academic library social media strategy,” journal of electronic resources librarianship 27, no. 4 (2015): 232–47, https://doi.org/10.1080/1941126x.2015.1092344. 9 besiki stvilia and leila gibradze, “examining undergraduate students’ priorities for academic library services and social media communication,” journal of academic librarianship 43, no. 3 (may 1, 2017): 257–62, https://doi.org/10.1016/j.acalib.2017.02.013. 10 brookbank, “so much social media, so little time.” 11 stvilia and gibradze, “examining undergraduate students’ priorities.” 12 qzone and renren are chinese social media platforms. 13 curtis r. rogers, “social media, libraries, and web 2.0: how american libraries are using new tools for public relations and to attract new users,” south carolina state library, may 22, 2009, http://dc.statelibrary.sc.gov/bitstream/handle/10827/6738/scsl_social_media_libraries_20 09-5.pdf?sequence=1; jakob harnesk and marie-madeleine salmon, “social media usage in libraries in europe—survey findings,” linkedin slideshare slideshow presentation, august https://doi.org/10.6017/ital.v26i1.3286 https://doi.org/10.1016/j.acalib.2017.02.014 http://www.pewinternet.org/fact-sheet/social-media/ https://eric.ed.gov/?id=ed532599 https://doi.org/10.1353/pla.0.0036 https://doi.org/10.1080/1941126x.2015.1092344 https://doi.org/10.1016/j.acalib.2017.02.013 http://dc.statelibrary.sc.gov/bitstream/handle/10827/6738/scsl_social_media_libraries_2009-5.pdf?sequence=1 http://dc.statelibrary.sc.gov/bitstream/handle/10827/6738/scsl_social_media_libraries_2009-5.pdf?sequence=1 information technology and libraries | march 2018 18 10, 2010, https://www.slideshare.net/jhoussiere/social-media-usage-in-libraries-in-europesurvey-teaser. 14 “social media fact sheet.” 15 daniel miller, “facebook’s so uncool, but it’s morphing into a different beast,” the conversation, 2013, http://theconversation.com/facebooks-so-uncool-but-its-morphing-into-a-differentbeast-21548; ryan bradley, “understanding facebook’s lost generation of teens,” fast company, june 16, 2014, https://www.fastcompany.com/3031259/these-kids-today; nico lang, “why teens are leaving facebook: it’s ‘meaningless,’” washington post, february 21, 2015, https://www.washingtonpost.com/news/the-intersect/wp/2015/02/21/why-teensare-leaving-facebook-its-meaningless/?utm_term=.1f9dd4903662. 16 alison mccarthy, “survey finds us teens upped daily facebook usage in 2016,” emarketer, january 28, 2017, https://www.emarketer.com/article/survey-finds-us-teens-upped-dailyfacebook-usage-2016/1015053. 
letter from the editor
kenneth j. varnum
information technology and libraries | december 2017
https://doi.org/10.6017/ital.v36i4.10237
i am excited to have been appointed editor of information technology and libraries as the journal enters its 50th year. originally published as the journal of library automation, ital has a long history of tracking the rapid-fire changes in technology as it relates to libraries. much as it has over the past 50 years, technology will continue to change not just the way libraries offer services to their communities, but the way we conceptualize what it is we do. if past is prologue, i have no doubt the next decades will continue to amaze, probably in ways even the most adventurous trend-forecaster won't get quite right. in the context of the rapid change in how we do our work, what we do will remain the same: collecting, preserving, and providing access to the information and artefacts of our culture, whatever that may be. i would like ital to grow and expand, while keeping its core essence the same. that core is high-quality, relevant, and informative articles, reviewed by our peers, and made available to the world. but i think there is more we can do for lita and the library technology profession by expanding the scope and impact of the journal through seeking and soliciting articles from a wider range of librarians, adding more case studies to the research articles that are at the journal's core, and being more rapidly responsive to the evolving technology landscape in front of us. to that end, i invite you to think broadly about researching, documenting, and describing the technology-related work you do so that others can learn about it. i welcome questions about how your project might fit into ital, and look forward to working with you. i'd like to close by extending my thanks to bob gerrity, who served as ital's editor for the past six years and stewarded the journal's transition to an open-access publication. i am grateful for his service to ital, lita, and the profession. sincerely, kenneth j. varnum, editor (varnum@umich.edu)

content designators for machine-readable records: a working paper
henriette d. avram and kay d. guiles: marc development office, library of congress, washington, d.c.
under the auspices of the international federation of library associations' committees on cataloging and mechanization, an international working group on content designators was formed to attempt to resolve the differences in the content designators assigned by national agencies to their machine-readable bibliographic records. the members of the ifla working group are: henriette d. avram, chairman, marc development office, library of congress; kay d. guiles, secretary, marc development office, library of congress; edwin buchinski, research and planning branch, national library of canada; marc chauveinc, bibliotheque interuniversitaire de grenoble, section science, domaine universitaire, france; richard coward, british library planning secretariat, department of education & science, united kingdom; r. erezepky, deutsche bibliothek, german federal republic; f. poncet, bibliotheque nationale, paris, france; mogens weitemeyer, det kongelige bibliotek, denmark. all working papers emanating from the ifla working group will be submitted to the international standards organization technical committee 46, subcommittee 4, working group on content designators.

prior to any attempt to standardize the content designators for the international exchange of bibliographic data in machine-readable form, it is necessary to agree on certain basic points from which all future work will be derived. this first working paper is a statement of: 1) the obstacles that presently exist which prevent the effective international interchange of bibliographic data in machine-readable form; 2) the scope of concern for the ifla working group; and 3) the definition of terms included in the broader term "content designators." if an international standard format can be derived, it would greatly facilitate the use in this country of machine-readable bibliographic records issued by other national agencies. it should also contribute significantly to the expansion of marc to other languages by the library of congress. at present, the assignment of content designators in most national systems is so varied that tailor-made programs must be written to translate each agency's records into the united states marc format. the international communications format might become the common denominator between all countries, each national system maintaining its own national version.

introduction
the international organization for standardization standard for bibliographic information interchange on magnetic tape (1) has recently been adopted, following on the adoption of the american national standard (2). these events, along with the implementation of the united states and the united kingdom marc projects and similar projects in other countries, have emphasized the importance of the international exchange of bibliographic data in machine-readable form. there are many problems to be resolved before we can approach a truly universal bibliographic system. many of these have been described in an article by dr. franz kaltwasser (3). basic to the exchange of bibliographic data is the requirement for an interchange format which can be used to transmit records representing the bibliographic descriptions of different forms of material (such as records for books, serials, and films) and related records (such as authority records for authors and for subject terms). a format for machine-readable bibliographic records is composed of the following three elements:
1.
the structure of the record, which is the physical representation of the information on the machine-readable medium.
2. the content designators (tags, indicators, and data element identifiers (4)) for the record, which are means of identifying data elements or providing additional information about a data element.
3. the content of the record, which is the data itself, i.e., the author's name, title, etc.

obstacles
the structure of the record, as described in ansi z39.2-1971 and in the iso standard on bibliographic information interchange on magnetic tape, has been fairly well accepted by the international bibliographic community. however, events have shown that as the different agencies examine their requirements and establish the content of their machine-readable records, the content and the content designators so established are not the same across all systems. this lack of uniformity is the result of at least four principal factors:
1. the different functions performed by various bibliographic agencies. bibliographic services are provided by many types of organizations issuing a variety of products. these products are dissimilar because the uses made of them vary, reflecting dissimilarities in the principal functions of the agencies involved. the main products of some of the different bibliographic services are briefly described as follows: catalogs serve to index the collections of individual libraries by author, title, subject, and series. to enable a user to find a physical volume rather than merely a bibliographic reference, catalogs also provide a location code. a unique form of entry for each name or topical heading used as an access point is maintained by means of authority files. the various access points serve to bring together works by the same author, works with the same title, works on the same subject, and works within the same series. a unique bibliographic description of each item makes it possible to distinguish between different works with the same title, and different editions of the same work. national bibliographies provide an awareness service for those items published within a country during a given time period. a national bibliography is not a catalog, since it is not based on or limited to any single collection, nor is it concerned with providing access to the physical item itself. abstracting and indexing services are principally concerned with indexing technical report literature and individual articles from journals and composite works. because these services generally index more specialized materials and are aimed at the specialist in a particular discipline, more complete indexing by means of a relatively large number of very specific subject terms is the rule. like the national bibliography, the abstracting and indexing service is not concerned with a single collection or, in most cases, with providing access to the item on the shelf.
2. the lack of internationally accepted cataloging practices. the paris conference of 1961, which resulted in the paris principles, set the framework for an international cataloging code. following the conference, progress in standardization was evident in the work begun on the formulation of cataloging codes embodying, in varying degrees, the paris principles. one such code is the anglo-american cataloging rules (aacr) (5).
however, we are concerned with the present, and the differences that exist in the cataloging codes of various countries do create differences in the content that may affect content designation of machine-readable bibliographic records. the differences between cataloging rules practiced in the library community and in the information community ( 6) are even more prominent. in the united states, these differences are clearly seen in a comparison between aacr and the cosati rules (7). even more significant is the fact that in preparing entries for abstracting and indexing services, it is common practice to use a name as it appears on the document, without attempting to distinguish it from names of other persons so as to bring together the works of a single author. in addition, cataloging practice in the information community often requires inclusion of data elements that 210 journal of librm·y automation vol. 5/4 december, 1972 are not used in the library community (e.g., organizational affiliation). it is obvious that these differences in practice are serious obstacles to achieving agreement on details of content designation for machine-readable records used in each environment. 3. lack of agreement on organization of data content in machinereadable records in different bibliographic communities. bibliographic data can be organized in machine-readable form in many different ways. for example, one approach could be the grouping of data elements by bibliographic function, such as main entry, title, etc.; another approach could be the grouping together of information by type, such as all personal names, all corporate names, etc. there are pros and cons associated with each of these groupings. this difference in organization exists in some instances between the library community and the information community. for the present discussion, it is not appropriate to analyze the relative merits between the two points of view. it must be emphasized, however, that there is no optimum organization, and that a variety of users will use the data in a variety of ways. it is certainly true that any given system can define, upon agreement of its members, a particular use to be made of the data exchanged and, in this case, perhaps an optimum data organization can be defined ("perhaps" is used because hardware is another variable that comes into play). 4. lack of agreement as to the functions of content designators. there is a lack of agreement as to the functions of content designators, as well as a misunderstanding, in some instances, of the rationale for the assignment of certain of them to specific data elements. the lack of agreement as to the functions of content designators is clearly seen when one examines the use of the data element identifiers in the different national formats. for example, in some cases the data element identifier is assigned to the data element according to its value in a collation sequence (e.g., a is smaller than b, b is smaller than c). the result is a prescribed order, from the smallest value to the largest, for selecting the data elements to build a sort key for file arrangement. in other systems, the data element identifier assigned to a data element is for the unique identification of that data element. there is no prescribed ordering built into the data element identifiers; the identification of the data elements allows them to be selected according to the requirements of the user to build a sort key for file arrangement. 
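the two uses of data element identifiers just described can be contrasted in a short sketch (the identifier codes and the ordering table below are invented): when the identifiers carry a collation value, the sort key for file arrangement follows directly from identifier order; when they merely identify, the selecting and ordering of elements is left to the requirements of the user.

```python
# hypothetical subfields of one data field: (data element identifier, content)
subfields = [("c", "london"), ("a", "coward, richard"), ("b", "british library")]

# approach 1: the identifiers carry a collation value (a before b before c), so a
# prescribed order for building the sort key is implied by the identifiers themselves.
key_by_collation = " ".join(text for _, text in sorted(subfields))

# approach 2: the identifiers only identify; the elements are selected and ordered
# according to the requirements of the user (an invented preference here).
preferred_order = ["b", "a", "c"]
key_by_selection = " ".join(
    text for code in preferred_order for c, text in subfields if c == code
)

print(key_by_collation)  # coward, richard british library london
print(key_by_selection)  # british library coward, richard london
```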
data element identifiers in some cases are tag dependent, i.e., they identify the same data elements consistently when used with a particular tag and data field, regardless of the combination of data elements present in the data field for any particular record. in other cases, the data element identifiers are tag, indicator, and data dependent, i.e. , the meaning of the data element identifiers changes and the data element identifiers are assigned to different data elements, depending upon the combination of data elements occurring in a data field for a particular record. content designat01·s j avram and guiles 211 scope the scope of responsibility for the ifla working group is to investigate the present assignment of content designators for the purpose of determining those areas in which there is uniformity of assignment and those areas in which there is not uniformity. once this has been done, the working group's next task is to explore how best these differences can be accommodated so as to arrive at a standard for the international interchange of bibliographic data. within that scope, the working group will first be concerned with the requirements for the international library community, i.e., libraries and national bibliographies. the magnitude of this assignment is such that it appears unwise to impose the additional problems of the needs of the information community concurrently. if the attempt is made to do so, and the result of the effort is failure, it will not be clear whether we failed because the task was too difficult or whether it is not possible to merge two communities with significant variation throughout their systems. on the other hand, if only the library community is approached at this time, the result of the effort can be success; but if the result is failure, at least one factor will be clear if only in a negative sense: there will be no lingering question as to whether the attempt might have succeeded had the problems of only one community been addressed at one time. in summary, it may be stated that our attempt to standardize content designators within the library community will be complicated by: 1) the lack of an international cataloging code; 2) the dissimilarities in the products of various agencies created by the different functions performed by those agencies; and 3) the lack of an agreement on the functions of the content designators themselves. the lack of agreement on an international cataloging code will have an impact on our work, but is an area which is out of scope for the working group, and therefore can be considered a variable over which there is no control. the dissimilarities in the functions of the different bibliographic services are also a given. however, since it was possible to work around these differences in the formulation of the international standard bibliographic description, it may be possible to do so for the standardization of content designators. therefore, within the two variables given above, our emphasis should be placed on attempting to resolve the lack of agreement on the functions of content designators and then we can proceed to attempt to standardize the assignment of tags, indicators, and data element identifiers. the present paper concentrates on the substance of the problem, namely, a statement of the definition of tags, indicators and data element identifiers and their functions, i.e., the information they are intended to provide to a system processing bibliographic data. 
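the distinction drawn above between tag dependent and tag, indicator, and data dependent identifiers can likewise be sketched (the tag and codes are invented): in the first case a code can be interpreted from the tag alone; in the second its meaning shifts with the combination of data elements present in the field, which is what complicates editing and processing.

```python
# hypothetical imprint field with tag "260"; codes and meanings are invented.

# tag-dependent identifiers: code "a" always means place, "b" always publisher,
# "c" always date, no matter which elements happen to occur in a given record.
TAG_DEPENDENT = {"260": {"a": "place", "b": "publisher", "c": "date"}}

def meaning_tag_dependent(tag, code, present_codes):
    # present_codes is ignored: the tag and code alone determine the meaning
    return TAG_DEPENDENT[tag][code]

# data-dependent identifiers: the meaning of "a" shifts with the combination of
# elements actually present (e.g., when no place is given, "a" labels the publisher).
def meaning_data_dependent(tag, code, present_codes):
    if tag == "260" and code == "a":
        return "place" if "b" in present_codes else "publisher"
    return TAG_DEPENDENT[tag][code]

print(meaning_tag_dependent("260", "a", {"a", "c"}))   # place
print(meaning_data_dependent("260", "a", {"a", "c"}))  # publisher: meaning has shifted
```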
the concept of a supermarc has been discussed in the literature (8, 9) as an international system for exchange, leaving the various national systems as they now exist. each country would have an agency that would translate its own machine-readable record into that of the supermarc system; likewise, each agency would translate the supermarc record from national bibliographic systems into its own format for processing within the country concerned. at the international level, there would be only one record format. this concept has the theoretical advantage of eliminating the difficulties inherent in seeking agreement internationally. however, what has not been addressed is the problem inherent in this concept, namely, the problem associated with any switching language. this may be illustrated in the following manner. consider the case of a national agency (called system 1) whose format is not detailed in regard to content and/or content designation. when system 1 translates to supermarc, the result will be a supermarc record, but it will be a supermarc record still only defined at the level of detail of the limited record of system 1. this will be true regardless of the level of detail at which supermarc is originally defined. likewise, when a national agency (called system 2) accepts records from system 1 via supermarc and translates the supermarc records into its own format, the resulting records will be the limited records of system 1, regardless of the detail of system 2's local records. this may be schematically represented as follows:

system 1 (little detail) -> supermarc (great detail): the supermarc record contains no more detail than the system 1 record
supermarc record from system 1 -> system 2 (great detail): the system 2 record contains no more detail than the supermarc record from system 1

the result of this analysis suggests that systems with formats of less detail than that of supermarc must permanently upgrade their national formats to the level of detail of supermarc while systems with formats more detailed than supermarc must be prepared to accept the fact that records from other countries will probably require significant modification. therefore, although national variation is allowed in a supermarc system, the international community still faces all the problems of international agreement, i.e., arriving at an acceptable level of content designation for supermarc.

content designators

bibliographic records in machine-readable form permit the manipulation of data and allow greater flexibility for the creation of a variety of products. the full potential of machine-readable files has not been exploited to date, but based on experience and the projection of this experience into the future, it may be said that the variety of uses of machine-readable cataloging data will be limited only by the imagination of the user. among the possible products are printed catalog cards, book catalogs, special bibliographies, special indexes, book preparation materials, crt display of cataloging information, management statistics (analysis of data by type of material, subject, language, date, or other parameters), etc. all of the above are possible in a wide variety of output formats. in order to produce these various tools, there are four basic operations (10) which are performed on the data. 1.
store-the storage operation is the internal (to the computer) management of the data, i.e., how files are organized, the type of accessing technique ( s) used, and the data elements (e.g., author, title) selected as keys to the complete bibliographic record. 2. retrieve-the retrieval operation is used here in its broadest sense, to cover the following kinds of retrieval: the retrieval of a single element from a record; the retrieval of a known item, such as the selection of a record by unique number or author and title; the retrieval of a category of records, such as those for all french language monographs on a particular subject with an imprint date of 1968 or later; the retrieval of all bibliographic records for a particular form of material, e.g., serials. (the latter retrieval capability allows segmentation of files not only for display purposes but also for the implementation of certain file organization techniques. ) 3. arrange-the arrange operation puts information in a sequence that is most useful for the user of the product, i.e., an alphabetic sequence or a systematic arrangement. 4. display-the display operation as used in this context implies formatting, the purpose of the operation being to make the information human-readable, e.g., display on a crt, computer printout, and photocomposed output. for example, to display a particular catalog record on a crt device, the record must be retrieved from the data base by a known number or other means of access and formatted for display; or, to prepare a special bibliography, all records satisfying a particular search argument are retrieved from the data base, arranged in some predefined order, formatted and printed. the storage operation is implicit in the examples. in order to perform these four basic operations through machine manipulation, content designators are assigned to the data content of the record. therefore, it may be stated that the function of content designators is to provide the means for the user to store, retrieve, arrange, and display information in a variety of ways to suit his needs. there are three types of content designators currently in use: tags, indicators, and data element identifiers. for the purposes of standardization, agreement must not only be reached on the definition of those three elements but also on other basic issues. the definitions for the elements are given below, as well as a general discussion of some of the decisions that must be made concerning each of the elements, prior to attempting to achieve standardization. 1. a tag is a series of characters used to identify or name the main content of an associated data field ( 11). the designation of main 214 journal of library automation vol. 5/4 december, 1972 content does not require that a data field contain all possible data elements (units of information) all the time. for example, the imprint may be defined as a data field containing the data elements, place, publisher, date of publication, printer, address of printer. the tag for the data field called imprint would be the same if only a partial set of the data elements existed for any single occurrence of the data field in a bibliographic record. should the method of assigning tags be simply to assign a unique series of characters to a data field whereby the characters have no meaning other than to name the main content of the data field? or is it desirable to give values to the characters making up the tag? 
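the two options just posed for tag assignment can be contrasted in a brief sketch (all values are invented): an opaque tag only names the data field, while a value-bearing tag encodes, say, function in its first character and type of entry in the remaining characters, so that a program can act on either part.

```python
# hypothetical value-bearing tag scheme: first character = function of the field,
# remaining characters = type of entry. an opaque tag carries no such meaning.
FUNCTION = {"1": "main entry", "7": "added entry", "6": "subject entry"}
ENTRY_TYPE = {"00": "personal name", "10": "corporate name", "30": "uniform title"}

def describe_value_bearing_tag(tag):
    """interpret a tag whose characters have been given values."""
    return f"{FUNCTION[tag[0]]} / {ENTRY_TYPE[tag[1:]]}"

def describe_opaque_tag(tag, name_table):
    """an opaque tag means nothing beyond the name looked up for it."""
    return name_table[tag]

print(describe_value_bearing_tag("100"))               # main entry / personal name
print(describe_value_bearing_tag("610"))               # subject entry / corporate name
print(describe_opaque_tag("x42", {"x42": "imprint"}))  # imprint
```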
in the latter case, a tag may identify a data field both by function and type of entry, thus allowing greater flexibility in internal organization of the data as well as its formatting for output. 2. an indicator is a character associated with a tag to supply additional information about the data field or parameters for the processing of the data field. indicators are tag dependent because they provide both descriptive and processing information about a data field. should alphabetic characters as well as numeric characters be assigned to indicators? should the character b (blank) always mean a null condition and the character 0 (zero) have a value or a meaning? should indicators with the same values and meanings be used for different data fields and their associated tags where the situation warrants this equality? for example, a personal name may be a main entry, an added entry, or a subject entry. if it is deemed desirable to further describe the type of personal name such as forename, single surname, multiple surname, or name of family, the indicators set for each of the data fields mentioned above would have the same value and the same meaning. this technique has the advantage of simplifying machine coding for the processing of different functional fields containing the same types of entries. 3. a data element identifier is a code consisting of one or more characters used to identify individual data elements within a data field. the data element identifier precedes the data element which it identifies ( 12). should data element identifiers be given a value, i.e., file arrangement value, other than the identification of the data element? should data element identifiers be tag dependent only or tag, indicator, and data dependent? should the same data element identifiers be assigned, so far as is possible, to the same data element regardless of the field in which the data element occurs? should data element identifiers be restricted to alphabetic characters or should they be expanded to allow the use of numerics and symbols? the assignment of a filing value to a data element identifier is intended to minimize the effort required to create software for filing. however, assigning filing values to data element identifiers results in identifiers that are tag, indicator, and data dependent. on the other hand, without assigning content designators/ avram and guiles 215 filing values to the data element identifiers and using computer filing algorithms, the system can avoid data dependent codes, thus ensuring maximum consistency across all fields. for example, the use of the same data element identifier assigned to a title wherever a title appears in the record allows the flexibility of selecting all titles by data element identifier. furthermore, tag, indicator, and data dependent data element identifiers create additional complexity in the editing procedure ( 13). although fixed fields are not content designators, they do take on similar characteristics as to function, i.e., to provide the means for the user to store, retrieve, arrange, and display information in a variety of ways to suit his needs. therefore, they should be considered by the working group along with the content designators. a fixed field is one in which every occurrence of the field has a length of the same fixed value regardless of changes in the contents of the field from occurrence to occurrence. 
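the personal-name example above, in which the same indicator values describe the type of name whether the field is a main entry, an added entry, or a subject entry, can be sketched as follows (tags and indicator values are invented); the contents of fixed fields are taken up after the sketch.

```python
# one hypothetical indicator table shared by every personal-name field, whatever
# the function of the field, so a single routine can process all of them.
NAME_TYPE = {"0": "forename", "1": "single surname", "2": "multiple surname", "3": "name of family"}

fields = [
    {"tag": "100", "function": "main entry",    "indicator": "1", "content": "guiles, kay d."},
    {"tag": "700", "function": "added entry",   "indicator": "0", "content": "mogens"},
    {"tag": "600", "function": "subject entry", "indicator": "3", "content": "weitemeyer (family)"},
]

for f in fields:
    # the same lookup works for all three functional fields, which is the
    # simplification of machine coding argued for in the text.
    print(f'{f["function"]}: {f["content"]} [{NAME_TYPE[f["indicator"]]}]')
```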
the contents of the fixed field can actually be data content, e.g., date of imprint; or a code representing data content, e.g., type of illustration; or a code representing information about the record, e.g., language of the record; or data concerned with the processing of the record, e.g., date entered on file. here again, certain basic issues must be resolved. should the character b (blank) be used to signify a null condition, e.g., in a record without any type of illustration b (blank) would be used? should the codes that represent more than two possible conditions be alphabetic or numeric? should the characters 1 (one) and 0 (zero) be used to indicate an on-off condition, e.g., a book contains an index to its own contents ( 1) or it does not ( 0 )? it is important to keep in mind the eventual necessity of correlating the content designators and fixed fields for all the formats defined for different forms of material (books, serials, maps, films, music, etc.) . by adhering as much as possible to the same content designators and fixed fields, the processing of different forms of material will be facilitated in terms of the software required to perform a particular process and to combine all forms of material in a single product, such as a book catalog. references 1. international organization for standardization. bibliographic information interchange-format for magnetic tape recording. draft international standard iso/dis 2709. technical committee iso/tc 46 secretariat (germany), 1972. 2. american national standards institute. american national standard for bibliographic information interchange on magnetic tape. ansi z39.2-1971. new york: american national standards institute, 1971. 3. franz georg kaltwasser, "the quest for universal bibliographical control," unesco bulletin for libraries, 25:252-259 (sept./oct. 1971). 4. data element identifiers have more commonly been referred to as subfield codes. 216 journal of library automation vol. 5/ 4 december, 1972 5. anglo-american cataloging rules. prepared by the american library association ... north american text. chicago: american library association, 1967. 6. the term bibliographic services has been used to include all agencies concerned with bibliographic products. for this paper such agencies have been further subdivided into two communities: the library community, defined as including libraries and national bibliographies; and the information community, defined as including the abstracting and indexing services. this broad definition has been used for the sake of simplicity. 7. committee on scientific and technical information. standard for descriptive cataloging of government scientific and technical reports. washington: committee on scientific and technical information, 1966. 8. r. e. coward, "marc: national and international cooperation," in international seminar on the marc format and the exchange of bibliographic data in machine-readable form, berlin, 1971: the exchange of bibliographic data and the marc format, ( mi.inchenpullach, 1972), 17-23. 9. roderick m. duchesne, "marc and supermarc," in international seminar on the marc format ... , p. 37-56. 10. these basic operations are not used in this context to mean basic machine operations such as add, subtract, multiply, and divide. 11. a data field is a variable length field containing bibliographic or other data not intended to supply parameters to the processing of the bibliographic record, i.e., content data only. 12. 
there are in existence formats in which the data element identifier is a single character, i.e., a delimiter. in this case, there is no explicit identification function built into the data element identifier. if, in the particular data field, the data elements are all of the same type, such as a multiname data field, then the meaning of the delimiter is implicit. 13. editing is used in this context to mean the human or machine assignment of content designators. reproduced with permission of the copyright owner. further reproduction prohibited without permission. site license initiatives in the united kingdom: the psli and nesli experience borin, jacqueline information technology and libraries; mar 2000; 19, 1; proquest pg. 42 l site license initiatives in the united kingdom: the psli and nesli experience jacqueline borin this article examines the development of site licensing within the united kingdom higher education community. in particular, it looks at haw the pressure to make better use of dwindling fiscal resources led ta the conclusion that information technology and its exploitation was necessary in order to create an effective library service. these conclusions, reached in the follett report of 1993, led to the establishment of a pilot site license initiative and then a national electronic site license initiative. the focus of this article is these initiatives and the issues they faced, which included off-site access, definition of a site and perhaps most importantly, the unbundling of print and electronic journals. increased competition for institution funding around the world has resulted in an erosion of library funding. in the united states state universities are receiving a decreasing portion of their funds from the state while private universities are forced to limit tuition increases due to outside market forces. in the united kingdom the entitlement to free higher education is currently under attack and losing ground. today's economic pressures are requiring individual libraries to make better use of their fiscal resources while the emphasis moves from being a repository for information to providing access to information. jacqueline sorin (jborin@csusm.edu) is coordinator of reference and electronic resources, library and information services, california state university, san marcos. as in the united states, the use of consortia for cost sharing in the united kingdom is becoming imperative as producers produce more electronic materials and make them available in full-text formats. consortia, while originally formed to cooperate on interlibrary loans and union catalogs, have recently taken on a new role, driven by financial expediency, in negotiating electronic licenses for their members, and the percentage of vendor contracts with consortia are rising. academic libraries cannot afford the prevalent pricing model that asks for the current print price plus an electronic surcharge plus projected inflation surcharges, therefore group purchasing power allows higher education institutions to leverage the money they have and to provide resources that would otherwise be unavailable. advantages for the vendor include one negotiator and one technical person for the consortia as a whole. 
in addition, the use of consortia provide greater leverage in pushing for the need for stable archiving and for retaining the principles of fair use within the electronic environment as well as reminding publishers of the need for flexible and multiple economic models to deal with the diverse needs and funding structures of consortia. i during the spring of 1998, while visiting academic libraries in the united kingdom, i looked at an existing initiative within the uk higher education community-the pilot site license initiative (psli), which had begun as a response to the follett report and to rising journal prices. at the time the three-year initiative was nearing its end and its successor, the national electronic site license initiative (nesli), was already the topic of much discussion. i history the concept of site licensing in the united kingdom higher education 42 information technology and libraries i march 2000 community had already been established, since 1988, by the combined higher education software team (chest), based at the university of bath. chest has negotiated site licenses with software suppliers and some large database producers through two different methods. either the supplier sells a national license to chest, which passes it on to the individual institution or chest sells licenses to the institution on the suppliers behalf and passes the fees on to them (see figure 1). chest works closely with national information services and systems (niss). niss provides a focal point for the uk education and research communities to access information resources. niss's web service, the niss information gateway, provides a host for chest information such as ebsco masterfile and oclc netfirst. most chest agreements are institution-wide site licenses that allow for all noncommercial use of the product, normally for five years to allow for incorporation into the curriculum. once an institution signs up it is committed for the full term of the agreement. chest is not in the business of either evaluating products or differentiating among competing suppliers. evaluations and purchase decisions are left up to the individual institutions.2 chest does set up and support e-mail discussion lists for each agreement so that users can discuss features and problems of the product among themselves. they also send out electronic news bulletins to provide advance warning of forthcoming agreements and to assess level of interest in future agreements. chest operates in a similar manner to many library consortia in the united states. the major differences are that it sells to higher education institutions as a whole so the products they sell include not only databases but also for example, software programs. this is also beginning to change in reproduced with permission of the copyright owner. further reproduction prohibited without permission. the united states. a recent article in the chronicle of higher education mentions that institutions will not stop with library databases, "in the future we'll be negotiating site licenses for software and all sorts of things . . . not just databases."3 although chest is substantially self-funding it is strongly supported (as is niss) by the joint information systems committee (jisc) of the higher education funding councils of england (hefce). the majority of public funding for higher education funding in the united kingdom is funneled through the hefcs (one each for england, scotland, wales, and northern ireland). 
one of the jisc committees, the information services subcommittee (issc), which in 1997 became part of the committee for electronic information (cei) defined principles for the delivery of content. 4 they were: • free at the point of use; • subscriptions not transaction based; • lowest common denominator; • universality; • commonality of interfaces and • mass instruction. i follett report in 1993 an investigation into how to deal with the pressures on library resources caused by the rapid expansion of student numbers and the worldwide explosion in academic knowledge and information was undertaken by the joint funding council 's libraries review group, chaired by sir brian follett. this investigation resulted in the follett report. one of the key conclusions of the report was "the exploitation of it is essential to create the effective higher education and public research establishments software, data , training needs ! chest © chest (university of bath) 1996 figure 1. chest diagram chest deals , chest offers negotiations software , data, training materials t it product suppliers library service of the future ." the review group recommended that as a starting point "a pilot initiative between a small number of institutions and a similar number of publishing houses should be sponsored by the funding councils to demonstrate in practical terms how material can be handled and distributed electronically." 5 as a consequence £15 million was allocated to an electronic libraries program, managed by jisc on behalf of hefce. the electronic libraries program was to "engage the higher education community in developing and shaping th e implementation of the electronic library." 6 this project provided a body of electronic resources and services for uk higher education and influenced a cultural shift towards the acceptance and use of electronic resources instead of more traditional information storage and access methods. psli in may 1995 a pilot site license initiative subsidized by the funding councils was set up to : • test if the site license concept could provide wider access to journals for those in the academic community; • see if it would allow more flexibility in the use of scholarly material ; • test the methods for dissemination of scholarly material to the higher education sector in a variety of formats ; • test legal models for a national site license program; and • explore the possibility for increased value for money from scholarly journals.7 sixty-five publishers were invited by hefce to participate for three years commencing january 1, 1996. hefce was also responsible through jisc for the funding of the elib program, but no formal links were established between the elib project and communications i borin 43 reproduced with permission of the copyright owner. further reproduction prohibited without permission. the psli. 8 the final selection of four companies included academic press ltd., blackwell publishers ltd., blackwell science ltd., and iop publishing ltd. the publishers agreed to offer print journals to higher education institutions for discounts of between 30 and 40 percent over the three year period as well as electronic access as available. originally the electronic journals were supposed to be the subsidiary component of the agreement but by the end of the agreement they had become the major focus. 
the psli achieved almost 100 percent take up among the higher education institutions due to the anticipated savings through the program.9 hefce did not specify how the publishers were to deliver their content. iopp hosted the journals on their own server, for example, while academic press linked their ideal server to the journals online service at the university of bath. one of the key provisions of the site license was the unlimited rights of authorized users to make photocopies (including their use within course packs) of the journals. academic press and iopp provided full-text access to all their journals while blackwell and blackwell science only allowed reading of full text where a print subscription existed. an integral part of the psli was that the funding from hefce to the higher education institutions was top sliced to support the discounted price offered to the institutions. several assessments of the initiative were made and a final evaluation of the pilot was concluded at the end of 1997. initial surveys indicated subscription savings through the program (average annual savings were approximately £11,800 per annum) and the first report of the evaluation team showed a wide level of support for the project despite major problems with lack of communication in a timely manner.10 the team recommended an extension of the psli to include more publishers and more emphasis on electronic delivery. one concern that was raised was ease of access, students had to know which system a journal they required was on. this was not easily discernible or user friendly. evaluations by focus groups showed users wanted one single access point to all electronic journals.11 also unresolved was the need for one consistent interface to the electronic journals and a solution to the archiving issue. at the end of the psli, hefce handed the next phase over to jisc. in the fall of 1997 jisc announced that a nesli would be set up and a new steering group was established. nesli was to be an electronic-only scheme and the invitation to tender went out at the end of 1997 with a decision to be made mid-1998. national electronic site license initiative nesli, a three-year jisc funded program, began on january 1, 1999 although the "official" launch was held at the british library on june 15, 1999. it is an initiative to deliver a national electronic journal service to the united kingdom higher education and research community (approximately 180 institutions) and is a successor program to the pilot site license initiative {psli). in may 1998 jisc appointed a consortium of swets and zeitlinger and manchester computing {university of manchester) to act as a managing agent (swets and blackwell ltd. announced in june 1999 their intention to combine swets subscription service and blackwell's information services, the two subscription agency services). the managing agent represents the higher education institutions in negotiations with publishers, manages delivery of the electronic material through a single web interface and oversees day-to-day operation of the program including the handling of subscriptions.12 44 information technology and libraries i march 2000 the managing agent also encourages the widespread acceptance by publishers of a standard model site license, one of the objectives of this being to reduce the number and diversity of site definitions used by publishers. 
other important provisions of the model site license addressed the issues of walk-in use by clients and the need for publishers to provide access to material previously subscribed to when a subscription is cancelled. the subscription model is currently the prevalent option although they are also working towards a pay-per-view option.13 priority has been given to publishers who had been involved in the psli and to those publishers participating in swetsnet, the delivery mechanism for the nesli. swetsnet is an electronic journal aggregation service that offers access to and management of internet journals. its search engine allows searching and browsing through titles from all publishers with links to the full-text articles. nesli is not a mandatory initiative, the higher education institutions can choose whether to participate in proposals and can pursue their own arrangements individually or through their own consortiums if they wish. while psli was basically a printbased initiative limited to a small number of publishers and funded via top slicing, nesli is an electronic initiative aimed at involving many more publishers. it is designed to be self-funding, although it did receive some start-up funding. although it is an electronic initiative, proposals that include print will be considered, as it is still not easy to separate print and electronic materials.14 the initiative addresses the most effective use, access, and purchase of electronic journals in the academic library community. its aims include: • access control-for on-site and remote users; • cost; reproduced with permission of the copyright owner. further reproduction prohibited without permission. • definition of a site; • archiving; and • unbundling print from electronic. access to swetsnet, the delivery mechanism for journals included in nesli, has now been supplemented by the option of athens authentication. athens, an authentication system developed by niss, provides individuals affiliated with higher education institutions a single username and password for all electronic services they have permission to access. athens is linked to swetsnet to ensure access for off-site, remote, and distance learners who do not have a fixed ip address. this supplements swetsnet's ip address authentication, which does not allow for individual access to toc and sdi alerting. a help desk is available for all nesli users through the university of manchester. the definition of a site is being addressed by the nesli model site license, which tries to standardize site definitions (including access from places that authorized users work or study, including homes and residence halls); interlibrary loan (supplying an authorized user of another library a single paper copy of an electronic original of a individual document); walk-in-users; access to subscribed material in perpetuity (it provides for an archive to be made of the licensed material with access to the archive permissible after termination of the license); and inclusion of material in course packs. jisc' s nesli steering group approved the model nesli site license on may 11, 1999 for use by the nesli managing agent.15 the managing agent asks publishers to accept the model license with as few alterations as possible. during the term of the initiative the managing agent will be working on additional value added services. 
these include links from key indexing and abstracting services, provision of access via z39.50, linking from library opacs, creation of catalog records and assessing a model for ejournal delivery via subject clusters. in particular, they have begun to look at the technical issues concerned with providing marc records for all electronic journals included in nesli offers. additionally they will be looking at solutions for longer term archiving of electronic journals to provide a comfort level for librarians purchasing electronic only copies.16 two offers that have been made under the nesli umbrella so far are blackwell sciences for 130 electronic journals and johns hopkins university press for 46 electronic titles. most recently two additional vendors have been added to the list. elsevier has made a proposal to deliver full text content via the publishers sciencedirect platform that includes the full text of more than 1,000 elsevier science journals along with those of other publishers. a total of more than 3,800 journals would be included in the service.17 mcb university press, an independent niche publisher, is offering access to 114 full text journals and secondary information in the area of management through it's emerald intelligence + fulltext service. similarly, here in the united states, california state university (csu) put out for competitive tender a contract for the building of a customized database of 1200+ electronic journals based on the print titles subscribed to by 15 or more of the 22 campuses-journal access core collection oacc). the journals will be made available via pharos, a new unified information access system for the csu. like ohiolink, a consortium of 74 ohio libraries, it will provide a common interface to electronic journals for students and faculty and will facilitate the development of distance learning programs.18 by unbundling the journals, libraries will no longer be required to pay for journals they do not want or need leading to moderate price savings. additional savings can be realized through the lowering of overhead costs achieved by system wide purchasing of core resources. other issues being addressed within the jacc rfp included archiving and perpetual access to journal articles the university system has paid for, availability of e-journals in multiple formats, interlibrary loan of electronic documents, currency of content and cost value at the journal-title level. 19 currently 500 core journals are being provided under the jacc by ebsco information services and the csu plans on expanding those offerings. i conclusion as we move into the next millennium library consortia will continue to work together with vendors to further customize journal offerings. however it is still far too early to say whether nesli will be successful or whether it will succeed in getting the publishing industry to accept the model site license. if it is to work within the higher education community, it will depend greatly on the flexibility and willingness of the publishers of scholarly journals. it has made a start by developing a license that sets a wider definition of a site and that deals realistically with the question of off-site access. by encouraging the unbundling of electronic and print subscriptions nesli allows services to be tailored to specific needs of the information community, but it remains to be seen how many publishers are prepared to accept unbundled deals at this stage. 
also as technology stabilizes and libraries acquire increasingly larger electronic collections, we will not be able to rely on license negotiations as the only way to influence pricing, access, and distribution. an additional problem that remains unaddressed by either psli or nesli is the pressure on academics to publish in traditional journals and the corcommunications i borin 45 reproduced with permission of the copyright owner. further reproduction prohibited without permission. responding rise in scholarly journal prices. nesli neither encourages nor hinders changes in scholarly communication and therefore the question of restructuring the scholarly communication process remains.20 references and notes 1. barbara mcfadden and arnold hirshon, "hanging together to avoid hanging separately: opportunities for academic libraries and consortia," information technology and libraries 17, no. 1 (march 1998): 36. see also international coalition of library consortia, "statement of current perspective and preferred practices for the selection and purchase of electronic information," information technology and libraries 17, no. 1 (march 1998): 45. 2. martin s. white, "from psli to nesli: site licensing for electronic journals," new review of academic librarianship 3, (1997): 139-50. see also chest. chest: software, data, and information for education (1996). 3. thomas j. deloughry, "library consortia save members money on electronic materials," the chronicle of higher education (feb. 9, 1996): a21. 4. information services subcommittee, "principles for the delivery of content." accessed nov. 17, 1999, www.jisc.ac.uk/ pub97 /nl_97.html#issc. 5. joint funding council's libraries review group. the follett report. (dec. 1993): accessed nov. 20, 1999, www.niss.ac. uk/ education/ hefc / follett/report/. 6. john kirriemuir, "background of the elib programme." accessed nov. 21, 1999, www.ukoln.ac.uk/services.elib/ background/history.html. 7. psli evaluation team, "uk pilot site license initiative: a progress report," serials 10, no. 1 (1997): 17-20. 8. white, "from psli to nesli," 149. 9. tony kidd, "electronic journals: their introduction and exploitation in academic libraries in the uk," serials review 24, no. 1 (1998): 7-14. 10. jill taylor roe, "united we save, divided we spend: current purchasing trends in serials acquisitions in the uk academic sector," serials review 24, no. 1 (1998): ~11. psli evaluation team, "uk pilot site license initiative," 17-20. 12. beverly friedgood, "the uk national site licensing initiative," serials 11, no. 1 (1998): 37-39. 13. university of manchester and swets & zeitlinger, nesli: national electronic site license initiative (1999). accessed nov. 21, 1999, www.nesli.ac.uk/. 14. nesli brochure, "further information for librarians." accessed nov. 21, 1999, www.nesli.ac.uk/ nesli-librarians-leaflet.html. 15. a copy of the model site license is available on the nesli web site. accessed nov. 22, 1999, www.nesli.ac.uk/ mode1license8.html. 16. albert prior, "nesli progress through collaboration," learned publishing 12, no. 1 (1999). 17. science direct. accessed nov. 24, 1999, www.sciencedirect.com. 18. declan butler, "the writing is on the web for science journals in print," nature 397, oan. 211998). 19. the journal access core collection request for proposal. accessed nov. 22, 1999, www.calstate.edu/tier3/ cs+p/rfp_ifb/980160/980160.pdf. 20. frederick j. friend, "uk pilot site license initiative: is it guiding libraries away from disaster on the rocks of price rises?" 
serials 9, no. 2 (1996): 129-33. a low-cost library database solution mark england, lura joseph, and nern w. schlecht two locally created databases are made available to the world via the web using an inexpensive but highly functional search engine created in-house. the technology consists of a microcomputer running unix to serve relational databases. cgi forms created using the programming language perl offer flexible interface designs for database users and database maintainers. many libraries maintain indexes to local collections or resources and create databases or bibliographies con46 information technology and libraries i march 2000 cerning subjects of local or regional interest. these local resource indexes are of great value to researchers. the web provides an inexpensive means for broadly disseminating these indexes. for example, kilcullen has described a nonsearchable, webbased newspaper index that uses microsoft access 97.1 jacso has written about the use of java applets to publish small directories and bibliographies.2 sturr has discussed the use of wais software to provide searchable online indexes.3 many of the web-based local databases and search interfaces currently used by libraries may: • have problems with functionality; • lack provisions for efficient searching; • be based on unreliable software; • be based on software and hardware that is expensive to purchase or implement; • be difficult for patrons to use; and • be difficult for staff to maintain. after trying several alternatives, staff members at the north dakota state university libraries have implemented an inexpensive but highly functional and reliable solution. we are now providing searchable indexes on the web using a microcomputer running unix to serve relational databases. cgi forms created at the north dakota state university libraries using the programming language perl offer flexible interface designs for database users and database maintainers. this article describes how we have implemark england (england@badlands. nodak.edu) is assistant director, lura joseph (ljoseph@badlands.nodak.edu) is physical sciences librarian, and nem w. schlecht (schlecht@plains.nodak.edu) is a systems administrator at the north dakota state university libraries, fargo, north dakota. automated storage & retrieval system: from storage to service articles automated storage & retrieval system: from storage to service justin kovalcik and mike villalobos information technology and libraries | december 2019 114 justin kovalcik (jdkovalcik@gmail.com) is director of library information technology, csun oviatt library. mike villalobos (mike.villalobos@csun.edu) is guest services supervisor, csun oviatt library. abstract the california state university, northridge (csun) oviatt library was the first library in the world to integrate an automated storage and retrieval system (as/rs) into its operations. the as/rs continues to provide efficient space management for the library. however, added value has been identified in materials security and inventory as well as customer service. the concept of library as space, paired with improved services and efficiencies, has resulted in the as/rs becoming a critical component of library operations and future strategy. staffing, service, and security opportunities paired with support and maintenance challenges, enable the library to provide a unique critique and assessment of an as/rs. 
introduction “space is a premium” is a phrase not unique to libraries; however, due to the inclusive and open environment promoted by libraries, their floor space is especially attractive to those within and outside of the building’s traditional walls. in many libraries, the majority of floor space is used to house a library’s collection. in the past, as collections grew, floor space became increasingly limited. faced with expanding expectations and demands, libraries struggled to identify a balance between transforming space for new services while adding materials to a growing collection. in addition to management activities like weeding, other solutions such as offsite storage and compact shelving rose in popularity as a method to create library space in the absence o f new building construction. years later as collections move away from print and physical materials, libraries are beginning to reexamine their building’s space and envision new features and services. “now that so many library holdings are accessible digitally, academic libraries have the opportunity to make use of their physical space in new and innovative ways.”1 the csun oviatt library took a novel approach and launched the world’s first automated storage and retrieval system (as/rs) in 1991 as a storage solution to resolve its building space limitations. the project was a california state university (csu) system chancellor’s office initiative that cost more than $2 million to implement and began in 1989. the original concept “came from the warehousing industry, where it had been used by business enterprises for years.”2 by leveraging and storing physical materials in the as/rs, the csun oviatt library is able to create space within the library for new activities and services. “instead of simply storing information materials, the library space can and should evolve to meet current academic needs by transforming into an environment that encourages collaborative work.”3 mailto:jdkovalcik@gmail.com mailto:mike.villalobos@csun.edu automated storage & retrieval system | kovalcik and villalobos 115 https://doi.org/10.6017/ital.v38i4.11273 unfortunately, as the first stewards of an as/rs, csun made decisions that led to mismanagement and neglect resulting in the as/rs facing many challenges in becoming a stable and reliable component of the library. however, recent efforts have sought to resolve these issues and resulted in system updates, management, and functionality. whereas in the past low-use materials were placed in as/rs to create space for new materials, now materials are moved into the as/rs to create space for patrons, secure collections, and improve customer service. as part of this critical review, the functionality and maintenance along with the historical and current management of the as/rs will be examined. background csun is the second-largest member of the twenty-three-campus csu system. the diverse university community includes over 38,000 students and more than 4,000 employees.4 consisting of nine colleges offering 60 baccalaureate degrees, 41 master’s degrees, 28 credentials in education, and various extended learning and special programs, csun provides a diverse community with numerous opportunities for scholarly success.5 the csun oviatt library’s as/rs is an imposing and impressive area of the library that routinely attracts onlookers and has become part of the campus tour. 
the as/rs is housed in the library’s east wing and occupies an area that is 8,000 square feet and 40 feet high arranged into six aisles. the 13,260 steel bins, each 2 feet x 4 feet, in heights of 6, 10, 12, 15, and 18 inches, are stored on both sides of the aisles enabling the as/rs to store an estimated 1.2 million items.6 each aisle has a storage retrieval machine (srm) that performs automatic, semiautomatic, and manual “picks” and “deposits” of the bins.7 the as/rs was assessed in 2014 as responsibilities, support, and expectations of the system shifted and previous configurations were no longer viable. discontinued and failing equipment, unsupported server software, inconsistent training and use, and decreased local support and management were identified as impediments for greater involvement in library projects and operations. campus provided funding in 2015 to update the server software as well as major hardware components on three of the six aisles. divided into two phases, the server software upgrade was completed in may 2017 followed by the hardware upgrade in january 2019.8 literature review the continued growth of student, faculty, and academic programs along with evolving expectations and needs since the late 1980s has required the library to analyze library services and examine the building’s physical space and storage capacity. in the late 1980s, identifying space for increasing printed materials was the main contributing factor in implementing the as/rs. in the mid-2010s, creating space within the library for new services was dependent on a stable and reliable as/rs. “the conventional way of solving the space problem by adding new buildings and off-site storage facilities was untenable.”9 a benefit of an as/rs, as creaghe and davis predicted in 1986 was, “the probable slow transition from books to electronic media, an aaf [automated access facility] may postpone the need for future library construction indefinitely.”10 the as/rs has enabled the library to create space by removing physical materials while enhancing customer service, material security, and inventory control. “the role of the library as service has been evolving in lockstep with user needs. the current transformative process that takes place in academia has a powerful impact on at least two functional areas of the library: information technology and libraries | december 2019 116 library as space and library as collection.”11 in addition, the “increased security the aaf … offers will save patrons time that would be spent looking for books on the open shelves that may be in use in the library, on the waiting shelves, misplaced, or missing.”12 in subsequent years, library services have evolved to include computer labs with multiple high-use printers/scanners/copiers, instructional spaces, individual and group study spaces, makerspaces, etc., in addition to campus entities that have required large amounts of physical space within the library. “it is well-known that academic libraries have storage problems. traditional remedies for this situation—used in libraries across the nation—include off-site storage for less used volumes, as well as, more recently, innovative compact shelving. these solutions help, but each has its disadvantages, and both are far from ideal. . . . 
when the eastern michigan university library had the opportunity to move into a new building, we saw that an as/rs system would enable us to gain open space for activities such as computer labs, training rooms, a cafe, meeting rooms, and seating for students studying.”13 the as/rs provides all the space advantages provided by off-site storage and compact shelving while adding much more value while mitigating negatives of off-site time delays and the confusion of accessing and using compact shelving. staffing & usage 1991–1994 following the 80/20 principle, low-use items were initially selected for storage in the as/rs. “when the storage policy was being developed in [the] 1990s, the 80/20 principle was firmly espoused by librarians. . . . thus, by moving lower-use materials to as/rs, the library could still ensure that more than 80% of the use of the materials occurs on volumes available in the open stacks.”14 low-use items were identified if one of the following three conditions was met: (1) the item’s last circulation date was more than five years ago; (2) the item was a non-circulating periodical; or (3) items that were not designed to leave an area and received little patron usage such as the reference collection. in 1991, the as/rs was loaded with 800,000 low-use items and went live for the first time later that year. staffing for the initial as/rs department consisted of one full-time as/rs supervisor (40 hours/week), one part-time as/rs repair technician (20 hours/week), and 40 hours a week of dedicated student employees, for a total of 100 hours a week of dedicated as/rs management. the as/rs was largely utilized as a specialized service for internal library operations with limited patron-initiated requests. as/rs operations were uniquely created and customized for each as/rs operator as well as the desired task needing to be performed. skills were developed internally with knowledge and training shared by word of mouth or accompanied with limited documentation. 2000 mid-2000s the as/rs department functioned in this manner until the 1994 northridge earthquake struck the campus directly and required partial building reconstruction to the library. although there was no damage to the as/rs itself or its surrounding structure, extensive damage occurred in the wings of the library. the damage resulted in the library building being closed and inaccessible. when the library reopened in 2000, it was determined that due to previous as/rs low usage that a dedicated department was no longer warranted. the as/rs supervisor position was dissolved, the student employee budget was eliminated, and the as/rs technician position was not replaced after the employee retired in 2008. as/rs operational responsibilities were consolidated into the circulation department and as/rs administration into the systems department. both circulation automated storage & retrieval system | kovalcik and villalobos 117 https://doi.org/10.6017/ital.v38i4.11273 and systems departments redefined their roles and responsibilities to include the as/rs without additional budgetary funding, staffing, or training. in order for as/rs operations to be absorbed by these departments, changes had to occur in the administration, operating procedures, staffing assignments, and access to the as/rs. all five circulation staff members and twenty student employees received informal training by members of the former as/rs department in the daily operations of the as/rs. 
the circulation members also received additional training for first-tier troubleshooting of as/rs operations such as bin alignments, emergency stops, and inventory audits. the as/rs repair technician remained in the systems department; however, as/rs troubleshooting responsibility was shared among the systems support specialists and dedicated as/rs support was lost. the administrative tasks of scheduling preventive maintenance services (pms), resolving as/rs hardware/equipment issues with the vendor, and maintaining the server software remained with the head of the systems department. without a dedicated department providing oversight for the as/rs, issues and problems began to occur frequently. circulation had neither the training nor the resources available to master procedures or enforce quality control measures. similarly, the systems department became increasingly removed from daily operations. many issues were not reported at all and came to be viewed as system quirks requiring workarounds or as limitations of the system. for issues that were reported, troubleshooting had to start all over again, and systems relied on circulation staff being able to replicate the issue in order to demonstrate the problem. systems personnel retained little knowledge of how to perform daily operations, and troubleshooting became more complex and problematic as different operators had different levels of knowledge and skill that accompanied their unique procedures.

mid-2000s–2015

these issues became further exacerbated when areas outside of circulation were given full access to the as/rs in the mid-2000s. employees from different departments of the library began entering and accessing the as/rs area and operated the as/rs based on knowledge and skills they learned informally. student assistants from these other departments also began accessing the area and performing tasks on behalf of their informally trained supervisors. further, without access control, employees as well as students ventured into the "pit" area of the as/rs where the srms move and end-of-aisle operations occur. this area contains many hazards and is unsafe without proper training. during this period, the special collections and archives (sc/a) department loaded thousands of uncataloged, high-use items into the as/rs that required specialized service from circulation. these items were categorized as "non-library of congress," and inventory records were entered into the as/rs software manually by various library employees. in addition, paper copies were created and maintained as an independent inventory by sc/a. over the years, the sc/a paper inventory copies were found to be insufficiently labeled, misidentified, or missing. therefore, the as/rs software inventory database and the sc/a paper copy inventory contained conflicts that could not be reconciled. to resolve this situation, an audit of sc/a materials was completed in spring 2019 to locate inventory that was thought to be missing. all bound journals and current periodicals were eventually loaded into the as/rs as well, causing other departments and areas to rely on the as/rs more heavily. departments such as interlibrary loan and reserves, as well as patrons, began requesting materials stored in the as/rs more routinely and frequently. the as/rs transformed from a storage space with limited usage to an active area with simultaneous usage requests of different types throughout the day.
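the kind of reconciliation performed by an inventory audit such as the sc/a audit described above can be illustrated with a minimal sketch; the identifiers below are hypothetical examples, not actual library records, and the sketch simply shows how conflicts between the as/rs software inventory and a separately maintained list might be flagged.

```python
# minimal sketch: flag conflicts between two independently maintained inventories.
# the identifiers are hypothetical examples, not actual library records.
asrs_inventory = {"sca-0001", "sca-0002", "sca-0004"}   # items recorded in the as/rs software
paper_inventory = {"sca-0001", "sca-0003", "sca-0004"}  # items on the department's paper list

only_in_asrs = asrs_inventory - paper_inventory    # recorded in the system but missing from the paper list
only_on_paper = paper_inventory - asrs_inventory   # on the paper list but unknown to the system

print("in as/rs software only:", sorted(only_in_asrs))
print("on paper inventory only:", sorted(only_on_paper))
```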
without a dedicated staff to organize, troubleshoot, and provide quality control, there was an abundance of errors that led to long waits for materials, interdepartmental conflicts, and unresolved errors. high-use materials from sc/a, as well as currently received periodicals from the main collection, were the catalysts that drove and eventually warranted change in the as/rs usage model from storage to service. the inclusion of these materials created new primary customers identified as internal library departments: sc/a and interlibrary loan (ill). with over 4,000 materials contained in the as/rs, sc/a requires prompt service for processing archival material into the as/rs and filling specialized patron requests for these materials. in addition, ill processes over 500 periodical requests per month that utilize and depend on as/rs services. the additional storage and requests created an uptick in overall as/rs utilization that carried over into circulation desk operations as well.

2015–present

the move from storage to service was not only inevitable due to an evolving as/rs inventory, but was necessary in order to regain quality control and manage the library-wide projects that involved the as/rs. the increased usage of and reliance on the as/rs required that the system be well maintained and managed. administration of the as/rs remains within systems, and circulation student employees continue to provide supervised assistance to the as/rs. the crucial change that emerged within circulation was the need for a dedicated operations and project manager. an as/rs lead position was created with responsibilities for the daily operations and management of the system and service. however, this was not a complete return to the original staffing concept of the early 1990s. the concept for this new position focuses on project management and system operations rather than the original sole attention to system operations. the as/rs lead is the point of contact for all library projects that utilize the as/rs, for relaying any as/rs issues or concerns to systems, and for daily as/rs usage. this shift is necessary due to the increased demand and reliance on the system, which has changed its charge from storage to service.

customer service

the library noted over time that the as/rs could be used as a tool in weeding and other collection shift projects to create space and aid in reorganizing materials. as more high-use materials were loaded into the as/rs, the indirect advantages of the as/rs became more apparent. patrons request materials stored within the as/rs through the library's website and pick up the materials at the circulation desk. there is no need for patrons to navigate the library, successfully use the classification system, and search shelves to locate an item that may or may not be there. as kirsch notes, "the ability to request items electronically and pick them up within minutes eliminates the user's frustration at searching the aisles and floors of an unfamiliar library."15 the vast majority of library patrons are csun students who commute and must make the best use of their time while on campus. housing items in the as/rs creates the opportunity to have hundreds of thousands of items all picked up and returned to one central location. this makes it far easier for library patrons, especially users with mobility challenges, to engage with a plethora of library materials.
the time allotted for library research and/or enjoyment becomes more productive as their desired materials are delivered within minutes of arriving in the building. as heinrich and willis state, "the provision of the nimble, just-in-time collection becomes paramount, and the demand for as/rs increases exponentially."16 as/rs items are more readily available than shelved items on the floor, as it takes minutes to have as/rs items returned and made available once again. "they may be lost, stolen, misshelved, or simply still on their way back to the shelves from circulation—we actually have no way of knowing where they are without a lengthy manual search process, which may take days. . . . unlike books on the open shelves, returned storage books are immediately and easily 'reshelved' and quickly available again."17 another advantage is that there is no need to keep materials in call-number order, with its unpleasant reality of missing and misshelved items. items in the as/rs are assigned bin locations that can only be accessed by an operator- or user-initiated request. the workflow required to remove a material from the as/rs involves multiple scans and procedures that provide a level of accountability that does not exist for items stored on floor shelves. further, users are assured of an item's availability within the system. storing materials in the as/rs ensures that items are always checked out when they leave the library and not sitting unaccounted for in library offices and processing areas. it also avoids patron frustration over misshelved, recently checked-out, or missing items.

security

the decision to follow the 80/20 principle and place low-use items in the as/rs meant high-use items remained freely available to library patrons on the open shelves of each floor. this resulted in high-use items being available for patron browsing and checkout, as well as for patron misuse and theft. the sole means of securing these high-use items involved tattle-tape and installing security gates at the main entrance. therefore, the development of policies and procedures for the enforcement of these gates was also required. beyond the inherent cost, maintenance, and issue of ensuring items are sensitized and desensitized correctly, gate enforcement became another issue that rested upon the circulation department. assuming theft would occur by exiting through the gates at the main entrance of the library, enforcement is limited in the actions that may be performed by library employees. touching, impeding the path of, following, detaining, or searching library patrons are restricted actions reserved for campus authorities such as the police, not library employees. rather than attempting to enforce a security mechanism over which we have no authority, the as/rs provides an alternative for the security of high-use and valuable materials. storing items in the as/rs eliminates the possibility of theft or damage by visitors and places control and accountability over the internal use of materials. "there would be far fewer instances of mutilation and fewer missing items."18 further, access to the as/rs area was restricted from all library personnel to only circulation and systems employees, with limited exceptions. individual logins also provided a method of control and accountability, as each operator is required to use a personal account rather than a departmental account to perform actions on the as/rs. materials stored in the as/rs are, "more significantly . . .
safer from theft and vandalism."19

inventory

conducting a full inventory of a library collection is time consuming, expensive, and often inaccurate by the time of completion. missing or lost items, shelf reading projects, in-process items, etc. create overhead for library employees and generate frustration for patrons searching for an item. massive, library-wide projects such as collection shifts and weeding are common endeavors undertaken to create space, remove outdated materials, and improve collection efficiency. however, actions taken on an open-shelves collection are time consuming, costly, and inefficient, and they affect patron activities. these projects typically involve months of work and multiple departments to complete. items stored within the as/rs do not experience these challenges because the system is managed by a full-time employee throughout the year and not on a project basis. the system is capable of performing inventory audits and does not affect public services. therefore, while the annual cost of managing an item on an open shelf is $0.079, the cost of storing the same item in the as/rs is $0.02 per year (see note 20). routine and spot audits ensure an accurate inventory, confirm the capacity level of the system, and establish best management of the bins. as/rs inventory audits are highly accurate and much more efficient than shelf reading, with little impact to patron services. "while this takes some staff time, it is far less time-consuming than shelf reading or searching for misshelved books."21 storing materials in the as/rs is more efficient than on open shelves; however, bin management is essential in ensuring bins are configured in the best arrangement to achieve optimal efficiency. the size and configuration of bins directly affect storage capacity. the type of storage, random or dedicated, also influences capacity, efficiency, and accessibility of items. the 13,260 steel bins in the as/rs range in height from 6 to 18 inches. the most commonly used bins are the 10- and 12-inch bins; however, there is a finite number of these bin heights. unfortunately, the smallest and largest bins are rarely used due to material sizes and weight capacity; therefore, as/rs optimal capacity is unattainable and the number of materials eligible for loading is limited by the number of bins available. the library also determined that dedicated, rather than random, bin storage aided in locating specialized materials, reduced loading and retrieval errors, and enhanced accessibility by placing highly used bins in reachable locations. in the event an srm breaks down and an aisle becomes nonfunctional for retrieving bins, strategically placing the highest-use and specialized materials in bins that can be manually pulled is a proactive strategy. however, this requires dedicated bins with an accurate and known inventory that has been arranged in accessible locations.

lessons learned

disasters & security

in 1994, the as/rs proved to be a much more stable and secure environment than the open stacks when it successfully endured a magnitude 6.9 earthquake. the reshelving of more than 300,000 items required a crew of more than thirty personnel over a year to complete. many items were destroyed from the impact of falling to the floor and being buried underneath hundreds of other items.
the as/rs, in contrast, consisted of over 800,000 items and successfully sustained the brunt of the earthquake's impact with no damage to any of the stored items. unfortunately, the materials that had been loaded into the as/rs in 1991 were low-use items that were viewed as one step from weeding. therefore, high-use items stored on open shelves were damaged and required the long process of recovery and reconstruction: identifying and cataloging damaged and undamaged materials, disposal of those damaged, renovation of the area, and purchase of new items. the low-use items stored in the as/rs, by contrast, required only that a few bins that had slightly shifted be pushed back fully into their slots. as/rs items have proven to be more secure from misplacement, theft, and physical damage from earthquakes as compared to items on open shelves.

maintenance, support, and modernization

the csun oviatt library has received two major updates to the as/rs since it was installed in 1991. in 2011, the as/rs received updates for communication and positioning components. the second major update occurred in two phases between 2016 and 2018 and focused on software and equipment. in phase one, server and client-side software was updated from the original software created in 1989. in phase two, half the srms received new motors, drives, and controllers. due to the many years of reliance on preventive maintenance (pm) visits and avoidance of modernization, our vendors were unable to provide support for the as/rs software and had difficulty locating equipment that had become obsolete. preventive maintenance visits were used to maintain the status quo and are not a long-term strategy for maintaining a large investment and critical component of business operations. creaghe and davis note that, "current industrial facility managers report that with a proper aaf [automated access facility] maintenance program, it is realistic to expect the system to be up 95-98 percent of the time."22 pm service is essential for long-term as/rs success; however, preventive maintenance alone is incapable of modernization and of ensuring equipment and software do not become obsolete. maintenance is not the same as support; rather, maintenance is an aspect of support. support includes points of contact who are available for troubleshooting, spare supplies on hand for quick repairs, a life-cycle strategy for major components, and long-term planning and budgeting. kirsch attested the following, describing eastern michigan university's strategy: "although the dean is proud and excited about this technology, he acknowledges that just like any computerized technology, when it's down, it's down. to avoid system problems, emu bought a twenty-year supply of major spare parts and employs the equivalent of one-and-a-half full-time workers to care for its automated storage and retrieval system."23 a system that relies solely on preventive maintenance will quickly become obsolete and require large and expensive projects in the future if the system is to continue functioning. further, modernization provides an avenue for new features and functions to be realized that increase functionality and efficiency.

networking

the csun oviatt library on average receives three to four visits a year, along with multiple emails and phone conversations, from different libraries requesting information regarding the as/rs. these conversations aid the library by presenting the as/rs from different perspectives and force the library to review current practices.
the library has learned through speaking with many different libraries that the needs, design, and configuration of an as/rs can be as unique as the libraries inquiring. the csun oviatt library, for example, is much different from the three other csu system libraries that have an as/rs. because our system was outdated, it has been difficult to form or establish meaningful groups or share information, since the systems are all different from each other. as more conversations occur and systems become more modern and standard, there is potential for knowledge sharing as well as group lobbying efforts for features and pricing.

buy-in

user confidence in any system is required in order for that system to be successful. convincing a user base to accept moving materials from readily available open shelves into steel bins housed within inaccessible 40-foot-high aisles will be difficult if the system is consistently down. therefore, the better the as/rs is managed and supported, the more reliable and dependable the system will be, and the more likely user confidence will grow. informing stakeholders of long-term planning and welcoming feedback demonstrates that the system is being supported and managed with an ongoing strategy that is part of future library operations. similarly, administrators need confirmation that large investments and mission-critical services are stable, reliable, and efficient. creating a new line item in the budget for as/rs support and equipment life-cycle management requires justification along with a firm understanding of the system. in addition, staffing and organizational responsibilities must also be reviewed in order to establish an environment that is successful and efficient. continuous assessments of the as/rs regarding downtime, projects involved, services and efficiencies provided, etc., aid in providing an illustration of the importance and impact of the system on library operations as a whole.

recording usage and statistics

unfortunately, usage statistics were not recorded for the as/rs prior to june 2017. therefore, data is unavailable to analyze previous system usage, maintenance, downtime, or project involvement. data-driven decisions require the collection of statistics for system analysis and assessment. following the server software and hardware updates, efforts have been taken to record project statistics, inventory audits, and srm faults, as well as public and internal paging requests.

conclusion

the as/rs remains, as heinrich & willis described it, "a time-tested innovation."24 through lessons learned and objective assessment, the library is positioning the as/rs to be a critical component of future development and strategy. by expanding the role of the as/rs to include functions beyond low-use storage, the library discovered efficiencies in material security, customer service, inventory accountability, and strategic planning. the csun oviatt library has learned, experienced, and adjusted its perception, treatment, and usage of the as/rs over the past thirty years. factors such as access to the area, staffing, and inventory auditing are easily overlooked, while other potential functions such as material security and customer service may not be identified without ongoing analysis and assessment. critical review, without a limited or biased perception, has enabled the library to realize the greater functionality the as/rs is able to provide.
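the per-item cost comparison cited in the inventory section above can be made concrete with a minimal worked sketch; it simply applies the unit-cost method described in note 20 (annual student budget divided by the number of items covered), using the budget and item figures given there.

```python
# minimal sketch of the unit-cost method described in note 20:
# annual student budget for managing a collection divided by the number of items it covers.
def per_item_annual_cost(annual_budget_usd: float, item_count: int) -> float:
    return annual_budget_usd / item_count

open_shelves = per_item_annual_cost(31_500, 400_000)  # about $0.079 per item per year
asrs_storage = per_item_annual_cost(18_000, 900_000)  # about $0.02 per item per year
print(f"open shelves: ${open_shelves:.3f} per item, as/rs: ${asrs_storage:.3f} per item")
```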
notes

1 shira atkinson and kirsten lee, "design and implementation of a study room reservation system: lessons from a pilot program using google calendar," college & research libraries 79, no. 7 (2018): 916-30, https://doi.org/10.5860/crl.79.7.916.

2 helen heinrich and eric willis, "automated storage and retrieval system: a time-tested innovation," library management 35, no. 6/7 (august 5, 2014): 444-53, https://doi.org/10.1108/lm-09-2013-0086.

3 atkinson and lee, "design and implementation of a study room reservation system," 916-30.

4 "about csun," california state university, northridge, february 2, 2019, https://www.csun.edu/about-csun.

5 "colleges," california state university, northridge, may 8, 2019, https://www.csun.edu/academic-affairs/colleges.

6 estimated as/rs capacity was calculated by determining the average size and weight of an item for each size of bin along with the most common bin layout. the average item was then used to determine how many could be stored along the width and length (and, if appropriate, height) of the bin and then multiplied. many factors affect the overall capacity, including: bin layout (with or without dividers), stored item type (book, box, records, etc.), weight of the items, and operator determination of full, partial, or empty bin designation. the as/rs mini-loaders have a weight limit of 450 pounds including the weight of the bin.

7 "automated storage and retrieval system (as/rs)," csun oviatt library, https://library.csun.edu/about/asrs.

8 "automated storage and retrieval system (as/rs)," csun oviatt library, https://library.csun.edu/about/asrs.

9 heinrich and willis, "automated storage and retrieval system," 444-53.

10 norma s. creaghe and douglas a. davis, "hard copy in transition: an automated storage and retrieval facility for low-use library materials," college & research libraries 47, no. 5 (september 1986): 495-99, https://doi.org/10.5860/crl_47_05_495.

11 heinrich and willis, "automated storage and retrieval system," 444-53.

12 creaghe and davis, "hard copy in transition," 495-99.

13 linda shirato, sarah cogan, and sandra yee, "the impact of an automated storage and retrieval system on public services," reference services review 29, no. 3 (september 2001): 253-61, https://doi.org/10.1108/eum0000000006545.

14 heinrich and willis, "automated storage and retrieval system," 444-53.

15 sarah e. kirsch, "automated storage and retrieval—the next generation: how northridge's success is spurring a revolution in library storage and circulation," paper presented at the acrl 9th national conference, detroit, michigan, april 8-11, 1999, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/pdf/kirsch99.pdf.

16 heinrich and willis, "automated storage and retrieval system," 444-53.

17 shirato, cogan, and yee, "the impact of an automated storage and retrieval system," 253-61.

18 kirsch, "automated storage and retrieval."

19 shirato, cogan, and yee, "the impact of an automated storage and retrieval system," 253-61.
20 cost of material management was calculated by removing building operational costs (lighting, hvac, carpet, accessibility/open hours, etc.) and focusing on the management of the material instead. the management of materials (or unit cost) is determined by dividing the total amount of fixed and variable costs by the total number of units; the $31,500 annual shelving student budget divided by 400,000 items equals $0.079 per material per year in open shelves; the $18,000 annual as/rs student budget divided by 900,000 items equals $0.02 per material per year in the as/rs.

21 shirato, cogan, and yee, "the impact of an automated storage and retrieval system," 253-61.

22 creaghe and davis, "hard copy in transition," 495-99.

23 kirsch, "automated storage and retrieval."

24 heinrich and willis, "automated storage and retrieval system," 444-53.

one-stop place for presenting scholarly research. staff support includes consultation in any aspect of the bailiwick project, including design issues, interface development, and training in software. staff members do not provide programming, nor do they do any work in researching or assembling sites. each faculty member is assigned an information arcade consultant at the point of submitting a bailiwick application. the consultant serves as a primary contact person for technical support, troubleshooting, basic interface design guidance, and referrals to other staff both in the libraries and on campus. at present, the level of staffing has been sufficient to accommodate this sort of assistance, which is not unlike the assistance provided to any patron who walks in the door of the information arcade. as a computing facility, the information arcade provides public access to a host of multimedia development workstations for scanning images, slides, and text, and for digitizing video and audio. at these multimedia stations, a large suite of multimedia integration software and web publishing software is made available for public use. staff at the public services desk have a strong background in multimedia development and web design and can provide some one-on-one training on a walk-in basis beyond technical support and troubleshooting. all of these hardware and software resources are available to bailiwick content providers, who can choose to do their development work in the information arcade or at their home or office. finally, since there is a close relationship between the information arcade and the university libraries web site, system administration and web server support is all handled in-house as well. there are few artificial barriers imposed by the technology, thereby permitting content providers to focus on their creative expression and scholarly work.
with only minimal reallocation of existing resources, the university of iowa libraries has been able to launch the bailiwick project and continue to develop it at a modest pace. one of the components most essential for its continued success, however, is the ability to scale up to meet the expected demand over the next several years. technical infrastructure challenges are not overwhelming as yet. an analysis still needs to be made to determine how quickly creators are developing their sites, what the implications are for network delivery of these resources, what reasonable projections there are for disk space, and who is using the resources. perhaps more importantly, though, adequate staffing will always remain a concern. some faculty wish to work more closely with library staff consultants than time allows, and the consultants would certainly find it enriching to be more intimately involved with the development of each bailiwick site. marketing of the bailiwick project has been discreet (to say the least) because of the limited staffing available. however, embedded in the collaboration inherent in bailiwicks is the potential for stronger involvement with faculty in obtaining grant funding to support the development of specific bailiwick sites.

a model for research libraries

bailiwick is a project that allows the university of iowa libraries, and specifically the information arcade, to focus on the integration of technology, multimedia, and hypertext in the context of scholarship and research. to date, most of the bailiwick sites represent disciplines in the arts, humanities, and social sciences. this matches the overall clientele of the information arcade (given its location in the university of iowa's main library), but it also reflects the fact that these disciplines have been traditionally undersupported with respect to technology. nevertheless, individual faculty in these disciplines have integrated some of the most creative applications of the technology in their everyday teaching and research, in part because of the existence of the information arcade and the groundwork laid by the libraries for the past several years. with the information arcade's visibility on campus, and with similar resources and support in the information commons, a sister facility in the hardin library for the health sciences, the university of iowa libraries are well regarded on campus as a leader in information technology, electronic publishing, and new media. thus, faculty and students alike are accustomed to turning to the libraries for innovation in technology, and the bailiwick project is a natural fit. bailiwick is now fully integrated as part of a palette of new technology services and scholarly resources included within the libraries' support of teaching, learning, and research at the university of iowa.

engelond: a model for faculty-librarian collaboration in the information age

scott walter

scott walter (walter.123@osu.edu), formerly humanities and education reference librarian, university of missouri-kansas city, now is information services librarian, ohio state university.

the question of how best to incorporate information literacy instruction into the academic curriculum has long been a leading concern of academic librarians. in
recent years, this issue has grown beyond the boundaries of professional librarianship and has become a general concern regularly addressed by classroom faculty, educational administrators, and even regional accrediting organizations and state legislatures. this essay reports on the success of a pilot program in course-integrated information literacy instruction in the field of medieval studies. the author's experience with the "engelond" project provides a model for the ways in which information literacy instruction can be effectively integrated into the academic curriculum, and for the ways in which a successful pilot program can both lead the way for further development of the general instructional program in an academic library, and serve as a springboard for future collaborative projects between classroom faculty and academic librarians.

in 1989 the chronicle of higher education reported on the proceedings of a conference on teaching and technology held near the richmond, indiana campus of earlham college.1 conference speakers identified a number of concerns for those involved in teaching and learning at the end of the twentieth century. chief among these were recent advances in information technology that threatened "to leave students adrift in a sea of information." earlham college librarian evan i. farber and his fellow speakers called upon conference attendees to develop new teaching strategies that would help students learn how to evaluate and make use of the "masses of information" now accessible to them through emergent information technologies, and to embrace a collaborative teaching model that would allow academic librarians and classroom faculty members to work together in developing instructional objectives appropriate to the information age. the concerns expressed by these faculty and administrators for the information literacy skills of their students may have still seemed unusual to the general educational community in the late 1980s, but, as behrens and breivik have demonstrated, such concerns have been a leading issue for academic librarians for more than twenty years. according to its most popular definition, information literacy may be understood as "[the ability] to recognize when information is needed and ... the ability to locate, evaluate, and use effectively the needed information."2 it has become increasingly clear over the past decade that educators at every level consider information literacy a critical educational issue in contemporary society. perhaps the most frequently cited example of concern among educational policy-makers for the information literacy skills of the student body can be found in ernest boyer's report to the carnegie foundation, college: the undergraduate experience in america (1987), in which the author concludes that "all undergraduates should be introduced to the full range of resources for learning on campus," and that students should spend "at least as much time in the library ... as they spend in classes."3 but while boyer's report may be the most familiar example of such concern, it is hardly unique. as breivik and gee have described, a small group of educational leaders have regularly expressed similar concerns over the past several decades. moreover, as bodi et al.
among others, have demonstrated, the rise in professional interest in information literacy issues among librarians in the past decade is closely related to more general concerns among the educational community, especially the desire to foster critical thinking skills among the student body. by the mid-1990s, professional organizations such as the national education association, accrediting bodies such as the middle states association of colleges and schools, and even state legislators began to incorporate information literacy competencies into proposals for educational reform at both the secondary and the post-secondary levels. the confluence over the past decade of new priorities in educational reform with rapid developments in information technology provided a perfect opportunity for academic librarians to develop and implement formal information literacy programs on their campuses, and to assume a higher profile in terms of classroom instruction. for the past two years, a pilot project has been underway at the miller nichols library of the university of missouri-kansas city that not only fosters collaborative relations between classroom faculty members and librarians, but promotes the development of higher-order information literacy skills among participating members of the student body. engelond: resources for 14th-century english studies (www.umkc.edu/lib/engelond/) incorporates traditional library instruction in information access as well as instruction in how to apply critical thinking skills to the contemporary information environment into the academic curriculum of participating courses in the field of medieval studies. our experience with the engelond project provides a model for the ways in which information literacy instruction can be effectively integrated into the academic curriculum, and for the ways in which a successful pilot program can both lead the way for further development of the general instructional program in an academic library, and serve as a springboard for future collaborative projects between classroom faculty members and librarians.

the impetus for collaboration

"most medieval web sites are dreck," or so wrote linda e. voigts, curators' professor of english at the university of missouri-kansas city, in a recent review of her participation in the engelond project for the medieval academy news. describing the impetus for the development of the project in terms of a complaint increasingly common among members of the classroom faculty, voigts provides a number of examples from recent years in which students made extensive, but inappropriate, use of web-based information resources in their academic research. in one example, voigts describes a student who made the mistake of relying heavily on what appeared to be an authoritative essay for her report on medieval medical practices. the report was actually authored by a radiologist "with little apparent knowledge of either the middle ages or of premodern medicine." "how can those of us who teach the middle ages," voigts asked, "help our students find in the morass of rubbish on the internet the relatively few pearls?
how can we foster skills for distinguishing between true pearls and those glittery paste jewels that dissolve upon close examination?"4 by the time voigts approached the miller nichols library during the fall 1997 semester for suggestions about the best ways to teach her students how to "sift the web" in their search for resources suitable for academic research in medieval studies, the issue of faculty-librarian collaboration in internet instruction was a familiar one. in a representative review of the literature, jayne and vander meer identified three "common approaches" that libraries have taken to the problem of teaching students how to apply critical thinking skills to the use of web-based information resources: (1) the development of generic evaluative criteria that may be applied to web-based information resources; (2) the inclusion of web-based information resources as simply one more material type to be evaluated during the course of one's research (i.e., adding the web to the litany of resources, popular and scholarly, print and electronic, typically addressed in a general instructional session); and (3) working with faculty to integrate critical thinking skills into an academic assignment that asks students to use or evaluate web-based information resources relevant to their coursework.5 while the engelond project focused primarily on the last of these options, our work on the project also fostered the use of the first two approaches in our broader instructional program.

engelond's landscape

the engelond web site provides access to a number of resources for participating students. these resources may be categorized as course-specific (e.g., course syllabi), information literacy-related (e.g., a set of evaluative criteria for use with web-based information resources), or multimedia (e.g., sound recordings of voigts reading excerpts from chaucer's works in middle english). all of these resources are accessible from the engelond home page (www.umkc.edu/lib/engelond/) (see figure 1).

figure 1. engelond home page
several links are also provided throughout the site to resources housed on the library's web site, including access to electronic databases and subject-specific guides to relevant resources in the print collection. although students make use of all of these resources during the course of the semester, the emphasis in this essay will be on describing the nature and use of the information literacy-related resources. as behrens and euster have noted, recent interest in information literacy instruction has been guided to a degree by concern over student ability to make effective use of new forms of information technology. this concern is addressed in the engelond project by its "internet resources" page, through which students are acquainted with the architecture of the internet and are provided with annotated references (and links) to a number of electronic resources (including web portals) that will allow them to begin their research in medieval studies. students making use of the page are introduced, for example, to a variety of the different types of information resources available through the internet, including web sites, telnet sites, news groups, and discussion lists. users are also directed to related resources on the library web site, including a guide to print resources for the study of chaucer and an annotated guide to web-based information resources generally useful for the study of literature.6 also provided on the engelond site is a discussion of evaluative criteria that students might apply to their selection of web-based information resources for academic research. designed to address voigts' initial concern about the issue of teaching students how to apply critical thinking skills to their use of the web, the "criteria" page provides a general discussion of the nature of web-based information resources, the ways in which such resources differ from traditional resources, and the kinds of questions that students must ask of any web-based resource before making use of it in their academic work. reflecting the idea that information literacy skills are best taught in connection with a specific subject matter, the "criteria" page includes references to a number of illustrative examples of web-based resources in medieval studies. this page also reflects the evolutionary nature of the engelond project, since new illustrations are added as each successive group of student users discovers different examples (both positive and negative). also included on this page is a link to the library's "quick reference guide to evaluating resources on the world wide web," a generic version of the criteria developed for use with the broader instruction program at the miller nichols library.7 while the resources described above introduce students to the information landscape in the field of medieval studies and provide them with evaluative tools tailored to subject-specific concerns in making use of web-based information resources in their academic work, the final information literacy-related resource made available through the engelond site is perhaps of the greatest interest. the "class picks" page presents the results of participating students' web site evaluation assignments.
on this page, users will find student evaluations of web-based resources in medieval studies that draw not only on the information literacy skills provided through traditional library instruction, but also on the subject-specific knowledge that students gain as part of their academic coursework. jayne and vander meer wrote that faculty-librarian collaboration in internet instruction is most effective when students are asked to draw both on generic information literacy skills and on information and evaluative criteria specific to the subject matter being addressed.8 as they concluded, "[to] benefit fully from the web's potential, students need training and guidance from librarians and faculty." incorporating discussions of site design, organization of information, and veracity of content, the web site evaluations found on the "class picks" page demonstrate that participating students have learned both from the librarian and the scholar, and have begun to consider the best ways to incorporate web-based information resources into their day-to-day academic work. in a review of "the harvard chaucer page" (http://icg.fas.harvard.edu/~chaucer/), for example, students note the general appeal of the site, but criticize it both for technical problems in its design and for editorial choices that limit its utility for academic research: the harvard chaucer is an insightful, colorful look at the author and his times, but is dappled conspicuously with misspellings, repeated phrases, sentence fragments, broken links, and unfinished pages. translations of medieval texts provided on the site are often anonymous, making it hard to tell if the translation is credible and an acceptable resource for serious research in chaucer studies. if one is interested in pursuing a topic found on the harvard chaucer, s/he is well advised to explore the site for ideas and background information, but to go elsewhere for authoritative sources ...9 in another review, this one of "the medieval feminist index" (www.haverford.edu/library/reference/mschaus/mfi/mfi.html), students provide a discussion of the scholarly authority of the site as well as a description of the results retrieved in sample searches of the index for materials relevant to the study of chaucer.10 the review concludes with further examples of issues relevant to chaucer studies that might be effectively investigated with information identified through this resource. in both reviews, students demonstrate the ability to critically evaluate a web site both for its design and for its content, and the ability to express the strengths and weaknesses of a site from the point of view of a student concerned with how to make use of a web-based information resource in his or her academic work. as a result, the reviews found on the "class picks" page not only demonstrate the successful approach to course-integrated information literacy instruction promoted through the engelond project, but also provide a useful student resource in their own right.

the collaborative approach

in her review of faculty-librarian partnerships in information literacy instruction, smalley wrote that, in the best-case scenario, "the student gains mastery in using some portion of internet resources, as well as exposure to resources intrinsically important to disciplinary pursuits.
in doing the web-based exercises, students see information seeking and evaluation as essential parts of problem solving within the field of study."11 the three information literacy-related resources found on the engelond site, "internet resources," "criteria," and "class picks," demonstrate one approach to providing course-integrated information literacy instruction in such a way that the classroom faculty member and the academic librarian can work collaboratively and productively to meet their mutual instructional goals. both the classroom faculty member and the cooperating librarian are able to meet their instructional goals using the engelond model because of the collaborative nature of the information literacy instruction provided to the participating students. students enrolled in voigts' chaucer class during the winter 1999 semester received information literacy instruction focused both on information access and critical thinking while completing successive iterations of the web site evaluation assignment required for the course. a brief overview of the collaborative teaching process should suggest ways in which the participating faculty member and librarian were able to draw successfully both on generic information literacy skills and on subject-specific knowledge while conducting course-integrated library instruction using the engelond site. participating students during the winter 1999 semester began with a general introduction to the electronic resources available through the miller nichols library at the university of missouri-kansas city (e.g., using the online catalog and databases such as the mla bibliography). students were then presented with an introduction to the problem of applying critical thinking skills to the use of web-based information resources, as described on engelond's "criteria" page. following this introductory session conducted by the cooperating librarian, the cooperating faculty member provided students with a number of illustrative examples of the inappropriate use of electronic resources for academic research in medieval studies. from the beginning, the librarian and the faculty member modeled an integrated approach to the evaluation of information resources for their students, one that drew both on generic critical thinking skills and on specific examples of how such skills might be applied to resources in their field. following this initial session (which took place during the first week of the semester), students were asked to complete an evaluation of a web site containing information they might consider using as part of their academic work. individual sites were chosen from among those accessible through the subject-specific web portals provided on the "internet resources" page. students were provided both with the library's "quick reference guide to evaluating resources on the world wide web" and with the more extensive description of web site evaluation available on the "criteria" page. students completed these initial reviews over the following week and submitted copies to both the faculty member and the librarian. in preparation for the second instructional session (which took place during the third week of the semester), the faculty member and the librarian evaluated each review twice (individually, and then together).
reviews were evaluated for the clarity of their criticism of a site, both from the point of view of information organization and design and from the point of view of the significance of the information for student research in the field. sites that seemed to merit further review by the entire class were selected from this pool of evaluations and were discussed in greater detail by the instructors. the second instructional session took the form of an extended review of the sites selected in the meeting described above. in each case, students were asked to describe their reaction to the site in question. in cases where more than one student had evaluated the same site, each student was asked to present one or two distinct points from his or her review. the instructors then presented their reactions to the site. again, the librarian and the faculty member modeled for the students an approach to the critical evaluation of information resources that drew not only on the professional expertise of the librarian, but also on the scholarly expertise of the faculty member. by the end of this session, students had been exposed to three separate critiques of the selected web sites: the student's opinion of how the information presented on the site might be used in academic research; the librarian's opinion of how effectively the information was organized and presented, and how its authority, currency, etc., might differ from that of comparable print resources; and, finally, the faculty member's opinion of the place and value of the information provided on the site in the broader scheme of the discipline. following this session, the students were assigned to groups in order to develop more detailed evaluations of the web sites discussed in class. as before, these assignments were submitted both to the faculty member and to the librarian. after further review by both instructors, the assignments were returned to the students for a third (and final) iteration, and then mounted to the "class picks" page. by the conclusion of this assignment, participating students had learned not only how to apply critical thinking skills to web-based information resources, but had begun to think about the nature of electronic information and the many forms that such information can take. the web site evaluations included on the "class picks" page demonstrate the students' ability to successfully evaluate a web-based information resource both for its design and for its content, and to suggest the academic situations in which its use might be warranted for a student of medieval literature.

evaluating engelond

during the winter 1999 semester, we attempted to evaluate the success of the information literacy instruction provided through the engelond project. while the web site evaluations produced by the students provided one obvious measure of our instructional success, we attempted to learn more about the ways in which students used the materials provided through the engelond site by polling users and by examining use patterns on the site.
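the use-pattern examination mentioned above can be illustrated with a minimal sketch; it assumes a standard combined-format web server access log, and the sample line and page paths are hypothetical examples rather than actual engelond log data.

```python
from collections import Counter
from urllib.parse import urlparse

def page_hits(log_lines):
    """tally requests per path from combined-format access log lines."""
    hits = Counter()
    for line in log_lines:
        try:
            request = line.split('"')[1]               # e.g. 'GET /lib/engelond/criteria.html HTTP/1.0'
            path = urlparse(request.split()[1]).path   # keep only the requested path
        except IndexError:
            continue                                   # skip malformed lines
        hits[path] += 1
    return hits

# hypothetical sample line; a real analysis would read the server's access log file
sample = ['1.2.3.4 - - [15/jan/1999:10:00:00 -0600] "GET /lib/engelond/criteria.html HTTP/1.0" 200 5120']
print(page_hits(sample).most_common())
```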
both of these latter measures confirmed what the instructors already suspected: students enrolled in participating courses were making heavy use of the information literacy-related resources housed on the engelond site and saw the skills fostered by those resources as a valuable complement to the disciplinary knowledge being gained in the traditional classroom. as part of a general evaluation of the instructional services provided by the library during the course of the semester, students participating in the engelond project were asked open-ended questions such as: "what features of the engelond web site did you find most useful as a student in this course?"; "how did the existence of the engelond site and the collaboration between your classroom instructor and the library enhance your learning experience in this course?"; and "what aspects of the library instruction that you received as part of this course do you believe will be useful to you in other courses or in regards to lifelong learning?" among the specific items cited most often by students as being useful to them in their academic work were two of the information literacy-related resources: "internet resources" and "class picks." likewise, information literacy skills such as familiarity with the structure of the internet and the ability to critically evaluate web-based information resources were listed by almost every student as skills that would be useful both in other academic courses and in their daily lives. moreover, two graduate students who were participants reported that their experience with engelond had led them to incorporate information literacy instruction into the undergraduate courses that they taught themselves. any conclusions about the appeal of the information literacy-related resources housed on the engelond site based on these narrative responses were reinforced by a study of the use statistics for the same period. through the first three months of the winter 1999 semester (january-march), the engelond site recorded approximately one thousand "hits" on its main page.12 in each month, the most frequently accessed pages were the three information literacy-related resources described above, with the "criteria" page regularly recording the greatest number of hits. among the other most-frequently visited pages on the site were the multimedia resource page ("audio-visual"), the "syllabi" page, and the "quick reference guide to chaucer" (housed on the library web site, but accessible through the "internet resources" page). taken in conjunction with the narrative responses provided on the evaluation form, these use statistics suggest that the information literacy resources provided through the engelond site have become a fully integrated, and greatly appreciated, feature of the academic curriculum in medieval studies in the department of english at the university of missouri-kansas city.

a model for future collaboration

the engelond project has not only been a success with students who have enrolled in participating courses, but has had a significant influence on the broader instructional program at the miller nichols library. it has served as a template for future collaborative efforts between the classroom faculty and the library in terms of integrating information technology and information literacy into the academic curriculum.
in terms of the instructional program at the miller nichols library, our experience with engelond helped lay the groundwork for the development of new instructional materials and for new instructional programs. it was through engelond, for example, that we first provided electronic access to our point-of-service guides to library materials in various subjects (e.g., the "library guide to chaucer"). as of the end of the winter 1999 semester, we have made almost all of our pathfinders available on the library web site and are now considering ways in which these might be effectively incorporated into the work being done by our faculty in developing web-based coursework. also, it was through engelond that our subject specialists started collecting and annotating web-based information resources of potential use to our students and faculty. now, subject specialists are developing "subject guides" to web-based resources in a number of fields and promoting their use among faculty members who, like voigts, are concerned about the quality of the web-based information being used by their students in their academic work. both our pathfinders and our subject guides to web-based resources are available online (www.umkc.edu/lib/instruction/guides/index.html). finally, the instructional session on the critical evaluation of web-based resources that has been the centerpiece of library instruction for the engelond project has now been adapted for inclusion in our normal round of instructional workshops. while support for such innovations in our instructional program clearly existed within the library prior to the initiation of the engelond project, the project's success has provided an important spur to the development of instructional services in the library. the commitment to collaborative instructional programming demonstrated by the engelond project has also helped pave the way for the development of the university of missouri-kansas city's new technology for learning and teaching (tlt) center. housed in the miller nichols library, the tlt center offers faculty workshops in the use of information technology and a place in which classroom faculty, subject specialists, and educational technologists may collaborate on the development of projects such as engelond. further information on the tlt center is available online (www.umkc.edu/tltc/) (see figure 2).

figure 2. tltc home page

initiating a culture of collaboration between members of the classroom faculty and academic librarians can be a difficult task (as so much of the literature has shown).
in reviewing our experience with engelond, we have benefited from the suggestions that hardesty made some years ago about the means of supporting the adoption across campus of an innovative instructional model: (1) the librarian must present information literacy instruction in such a way that it does not threaten the role of the classroom faculty member as an authority in the subject matter of the course; (2) the new approach to instructional collaboration must be adopted on a limited basis at first, rather than requiring that all instructional programs immediately adopt the new approach; and (3) the results of a successful pilot project must be "readily visible to others" on campus.13 designed as a pilot project, engelond has successfully demonstrated that classroom faculty and academic librarians can collaborate to meet their mutual instructional objectives, both in terms of information literacy instruction and in terms of academic course content. as information technology continues to gain a central place in the educational mission of the college and university, it is likely that the sphere of mutual instructional objectives between classroom faculty and academic librarians will only increase. our careful approach to raising the instructional profile of librarians on campus has been rewarded, too, both by an increasing number of faculty members seeking course-related instruction in our electronic classroom as part of the regular instructional program of the library, and by the institutional commitment of resources to the tlt center, which will become the nexus of instructional collaboration between faculty and librarians on our campus. during the 1999-2000 academic year, no fewer than three academic courses in medieval studies will make use of the engelond site. as more faculty become aware of the services provided by the tlt center, such collaborative approaches to information literacy instruction will likely become more evident across a variety of disciplines. the lessons learned over the past two years of project development will be invaluable as we move to provide course-integrated information literacy instruction to an increasing number of students in an increasingly broad variety of courses.

acknowledgments

the engelond project has benefited from the work of a number of individuals over the past two years, especially ted p. sheldon, director of libraries at the university of missouri-kansas city, and marilyn carbonell, assistant director for collection development, both of whom were instrumental in developing the plan for a pilot project in course-integrated information literacy instruction with professor voigts. the design for the engelond site was developed by john laroe, former multimedia design technologist at the miller nichols library. the original text for the site was written by voigts, laroe, and t. michael kelly, former humanities reference librarian at the miller nichols library. additional text and resources for the site have been developed over the past year by voigts and myself. in addition, a number of librarians and staff members in the public services division of the miller nichols library devoted time to critiquing the site and to assisting with the creation of the embedded audio files.
these contributions may not always be evident to the students who benefit from the project, but they were instrumental in our ability to successfully meet our instructional objectives during the 1998-99 academic year.

references and notes

1. thomas j. deloughry, "professors are urged to devise strategies to help students deal with 'information explosion' spurred by technology," chronicle of higher education 35 (march 8, 1989), a13, a15.

2. shirley j. behrens, "a conceptual analysis and historical overview of information literacy," college & research libraries 55 (july 1994): 309-22; patricia senn breivik, student learning in the information age (phoenix, ariz.: oryx pr., 1998); "final report of the american library association presidential committee on information literacy" (1989), as reproduced in breivik, student learning in the information age, 121-37 (quotation is from pp. 121-22). for another recent overview of the development of the theory and practice of information literacy at every level of american education over the past two decades, see kathleen l. spitzer and others, information literacy: essential skills for the information age (syracuse, n.y.: eric clearinghouse on information and technology, 1998).

3. ernest l. boyer, college: the undergraduate experience in america (new york: harper & row, 1987), 165; patricia senn breivik and e. gordon gee, information literacy: revolution in the library (new york: macmillan, 1989); sonia bodi, "critical thinking and bibliographic instruction: the relationship," journal of academic librarianship 14 (july 1988): 150-53; barbara b. moran, "library/classroom partnerships for the 1990s," c&rl news 51 (june 1990): 511-14; sonia bodi, "collaborating with faculty in teaching critical thinking: the role of librarians," research strategies 10 (spring 1992): 69-76; hannelore b. rader, "information literacy and the undergraduate curriculum," library trends 44 (fall 1995): 270-78; spitzer and others, information literacy; and breivik, student learning in the information age, 7-8. on the relationship between trends in educational reform favoring the development of critical thinking skills and their relationship to the place of information literacy instruction in higher education, see also joanne r. euster, "the academic library: its place and role in the institution," in academic libraries: their rationale and role in american higher education, gerard b. mccabe and ruth j. person eds. (westport: greenwood pr., 1995), 7; craig gibson, "critical thinking: implications for instruction," rq 35 (fall 1995): 27-35.

4. linda ehrsam voigts, "teaching students to sift the web," medieval academy news (nov. 1998): 5.

5. elaine jayne and patricia vander meer, "the library's role in academic instructional use of the world wide web," research strategies 15 (1997): 125. see also topsy n. smalley, "partnering with faculty to interweave internet instruction into college coursework," reference services review 26 (summer 1998): 19-27.

6. behrens, "a conceptual analysis and historical overview of information literacy," 312; euster, "the academic library," 6; scott walter, "umkc university libraries: quick reference guide to chaucer," accessed sept. 24, 1999, www.umkc.edu/lib/instruction/guides/chaucer.html; scott walter, "umkc university libraries: subject guide to literature," accessed sept. 24, 1999, www.umkc.edu/lib/instruction/guides/literature.html. all references to specific pages on the engelond site will be made to the page title, e.g., "internet resources."
because engelond has been designed in a frameset, it will be easier for interested readers to access the main page at the url provided in the text and then make use of the navigational buttons provided there.

7. scott walter, "umkc university libraries: quick reference guide to evaluating resources on the world wide web," accessed sept. 24, 1999, www.umkc.edu/lib/instruction/guides/webeval.html.

8. jayne and vander meer, "the library's role in academic instructional use of the world wide web," 125.

9. laura arruda and others, review of "the harvard chaucer page," accessed sept. 24, 1999, www.umkc.edu/lib/engelond.

10. sherrida d. harris and jennifer kearney, review of "the medieval feminist index: scholarship on women, sexuality, and gender," accessed sept. 24, 1999, www.umkc.edu/lib/engelond.

11. smalley, "partnering with faculty to interweave internet instruction into college coursework," 20.

12. in january 1999 engelond received 368 hits, with the three most frequently accessed items being "criteria" (157), "internet resources" (130), and "class picks" (128). in february the total number of hits dropped to 216, with the most frequently accessed items being "criteria" (130), "audio-visual" (59), and "internet resources" and "class picks" (both with 46). in march the total number of hits was 323, with the favorite resources again being "criteria" (113), "internet resources" (74), and "class picks" (65). statistics are based on a study of the daily use logs, accessed sept. 24, 1999, www.umkc.edu/_reports/.

13. larry hardesty, "the role of the classroom faculty in bibliographic instruction," in teaching librarians to teach: on-the-job training for bibliographic instruction librarians, alice f. clark and kay f. jones eds. (metuchen: scarecrow pr., 1986), 171-72.

digitization of libraries, archives, and museums in russia

heesop kim and nadezhda maltceva

information technology and libraries | december 2022
https://doi.org/10.6017/ital.v41i4.13783

heesop kim (heesop@knu.ac.kr) is professor, kyungpook national university. nadezhda maltceva (nadyamaltceva7@gmail.com) is graduate student, kyungpook national university. © 2022.

abstract

this paper discusses the digitization of cultural heritage in russian libraries, archives, and museums. in order to achieve the research goals, both quantitative and qualitative research methodologies were adopted: the current status of legislative principles related to digitization was analyzed through a literature review, and the circumstances of the latest digitization projects were analyzed through literature and website review. the results showed that these institutions seem quite successful in that they provide a wide range of services for users to access the digital collections. however, the main constraints on digitization within libraries, archives, and museums in russia are connected with the scale of the work, dispersal of rare books throughout the country, and low level of document usage.

introduction

culture is one of the most important aspects of human activity. libraries, archives, and museums (lams) in the russian federation store some of the richest cultural and historical heritage collections, some of which can be classified as world cultural treasures. as is true with other countries, lams in russia are grappling with the challenges of digitizing their unique cultural treasures.
in this regard, these repositories are implementing digital technologies to carry out the digitization, preservation, indexing, search, and accessibility of cultural heritage more effectively and efficiently. information technologies can be used to preserve national knowledge and experience.1 the digitization of cultural heritage is one of the changes that has occurred at the present stage of the global information society. researchers have made many attempts to define the concept of digital culture, which is considered to be a phenomenon that manifests itself through art, creativity, and self-realization, by implementing information technologies.2 the need for digitization of unique cultural heritage has caused the rapid development of digital libraries, archives, and museums, described collectively as digital lams, the multidisciplinary institutions that change the way people retrieve and access information. researchers and specialists involved in the digitization of information resources in lams work together to preserve the cultural heritage of the russian federation using modern information technologies. as pronina noted, the digitization of cultural heritage began to develop actively in many countries, including russia, around the same time.3 many researchers have analyzed digitization issues in russia. for example, lopatina and neretin discussed the modernization of the system of cultural information resources and the history of preserving digital cultural heritage in russia.4 astakhova pointed out the problem of the digitization of cultural heritage and the transformation of art objects into 3d models.5 miroshnichenko et al. discussed the problem of organizing digital documents in the state archives and pointed out the issues of providing digitized archival documents for wide access through open electronic resources.6 despite a long history of improvements in digitization policies and programs, issues still exist in the major cultural repositories, and russia's level and scope of digitization research are still lagging behind many european countries.7 therefore, three primary research questions guide this study:

1. what is the policy to regulate the digitization of cultural heritage in russia?
2. what is the status of the digitization of cultural heritage in russia?
3. what are the constraints related to digitization in russia?

in addition, there is not enough research that fully reflects the current activities of digitization practices in lams in russia. by analyzing this matter, the authors hope to present the state of cultural heritage digitization in russia and uncover problems and limitations in this field.

benefits of digitization in a cultural heritage repository

before answering the key research questions, it is worth exploring the ultimate benefits of digitization in cultural heritage repositories. digitization refers to converting an analogue source into a digital version.8 a large proportion of the collections in cultural heritage repositories comprise not only materials that are born digital, but also many resources that were not originally created in digital form and have since been digitized. digitization involves three major stages.9 the first stage is related to preparing objects for digitization and the actual process of digitizing them.
the second stage is concerned with the processing required to make the materials easily accessible to users. this involves a number of editorial and processing activities including cataloguing, indexing, compression, and storage, as well as applying appropriate standards for text and multimedia file formats to meet the needs of online digital lams. the third stage includes the preservation and maintenance of the digitized collections and the services built upon them.10 the benefits of digitization are improved access and preservation. items, once digitized, can be used by many people from different places simultaneously at any point in time. unlike printed or analogue collections, digitized collections are not damaged by heavy and frequent usage, which helps in the preservation of information. according to ifla's guidelines, several benefits come from having digitized materials. organizations digitize

1. to increase access, where there is high demand from users and the library or archive has the desire to improve access to a specific collection;
2. to improve services to an expanding user group by providing enhanced access to the institution's resources with respect to education and lifelong learning;
3. to reduce the handling and use of fragile or heavily used original material and create a backup copy for endangered material such as brittle books or documents;
4. to give the institution opportunities for the development of its technical infrastructure and staff skill capacity;
5. to develop collaborative resources, sharing partnerships with other institutions to create virtual collections and increase worldwide access;
6. to seek partnerships with other institutions to capitalize on the economic advantages of a shared approach; and
7. to take advantage of financial opportunities, for example the likelihood of securing funding to implement a program, or of a particular project being able to generate significant income.11

while digitization has benefits, there are also some problems. the most obvious one is related to the quality of the digitized objects. in the course of digitizing, we may lose some important aspects of the original document. another problem is related to access management. proper mechanisms need to be put in place to determine the authenticity of materials, as well as to control unauthorized access and use. the success of digitization projects depends not only on technology but also on project planning. since digitization is a relatively new process, institutions may concentrate on technology before deciding on a project's purpose. however, technology should never drive digitization projects; instead, user needs should be determined first, and only then should a technology appropriate to those needs be selected to meet a project's objectives.
the best practices for planning a digitization project can be summarized as follows: determine the copyright status of the materials; identify the intended audience of the materials; determine whether it is technically feasible to capture the information; insist on the highest quality of technical work that the institution can afford; factor in costs and capabilities for long-term maintenance of the digitized images; cultivate a high level of staff involvement; write a project plan, budget, timeline, and other planning documents; budget time for staff training; and plan a workflow based upon the results of scanning and cataloging a representative sample of material.12

policies regulating digitization of cultural heritage in russia

policy development should take place early, at the time of selection, to guide both the selection of materials and the management of digital objects. this policy should formulate the goals of the digitization project, identify materials, set selection criteria, define the means of access to digitized collections, set standards for image and metadata capture and for preservation of the original materials, and state the institutional commitment to the long-term preservation of digital content.13 as stated by russian law, the cultural heritage of the peoples of the russian federation includes material and spiritual values created in the past, as well as monuments and historical and cultural territories and objects significant for the preservation and development of the identity of the russian federation and all its peoples, and their contribution to world civilization.14 the decree of the president of the russian federation "on approval of the fundamentals of state cultural policy" extended the definition of cultural heritage by including documents, books, photos, art objects, and other cultural treasures that represent the knowledge and ideas of people throughout the centuries. the government emphasized the role of the information environment and modern technologies by addressing them at the legislative level. in the presidential decree "on approval of the fundamentals of state cultural policy," the concept of the information environment is separately distinguished, defined as a set of mass media, radio, and television broadcasting, and the internet; the textual and visual information materials disseminated through them; as well as the creation of digital archives, libraries, and digitized museum collections.15 another important part of the government policy is to provide open access to cultural heritage objects. the problem of access was confirmed in the state program culture of russia (2012–2018), which stipulated the need to provide access to cultural heritage in digital forms as well as to create and support resources that provide access to cultural heritage objects on the internet and in the national electronic library, one of the main digital repositories in the country.16 access to digital cultural heritage was also considered in the state program information society (2011–2020). the subprogram information environment ensured equal access to the media environment, including objects of digital cultural heritage.
the program aimed to reduce the gap in access to cultural heritage objects in different regions across the russian federation.17 the digitization of cultural heritage and the creation of digital archives is one of the characteristics of innovative changes in the cultural sphere of the information society. the law "on archival affairs" notes that a significant part of the information resources of the archives has historical and cultural value and should be considered as part of the digital cultural heritage collection, the digitization of which is required.18 with regard to libraries, on january 22, 2020, the state duma of the russian federation adopted the draft law "on amendments to the federal law on librarianship" in terms of improving the procedure for state registration of rare books (rare books are defined as handwritten books or printed publications that have an outstanding spiritual or material value; have a special historical, scientific, cultural significance; and for which a special regime for accounting, storage, and use has been established), which aimed at ensuring legal protection of rare books by improving the system of protection of the items of the national library. the law reflects the criteria for classifying valuable documents as rare books and fixes the main stages of their registration. in the case of museums, a federal law from 1996 aimed to establish the national catalog of the russian federation museum collections. at first this national catalog was created for inventory purposes, and then it was transformed into an online database to ensure open access to russia's cultural heritage (http://kremlin.ru/events/administration/21027). annual reports "on the state of culture in the russian federation" reflect the overall situation and changes in libraries, archives, and museums. some researchers emphasized the need to develop a unified regulatory framework for cultural heritage preservation practices. particularly, shapovalova stressed that the leader in this discussion should be the government, which plays a crucial role in the legal regulation of cultural heritage policy and is responsible for the development of initiatives.19 however, lialkova and naumov criticized the fact that russian policy discusses the digitization of only a few cultural objects, but does not define the legal status of such objects and does not cover objects originally created in digital form.20 kozlova considered the issues of russian digital culture within the framework of the obligatory library copies (legal deposit) system.21 since 1994, the national library of russia has accepted electronic media according to the federal law "on the obligatory copy of documents," which established the legal deposit system; the bibliographic records of deposited electronic media are available online in the electronic catalog "russian electronic editions." acquisitions librarians use this catalog as a national bibliographic resource for adding electronic editions to their collections.
dzhigo addressed issues of digital preservation of cultural heritage and also paid attention to the federal legal deposit law.22 yumasheva dealt with the content of russian normative and methodological regulation of the process of digital copying of historical and cultural heritage in russian libraries and museums.23 kruglikova considered theoretical and practical issues of legislation for the preservation and popularization of cultural heritage in the modern world.24 shapovalova suggested introducing the term "digital cultural heritage objects" at the legislative level, recognizing the concept of preserving cultural heritage, and providing virtual access to such objects on a larger scale.25 a review of the literature reveals various studies that discuss cultural heritage preservation using modern technologies. the majority of researchers identified issues in this field. digitization practices are carried out mainly by the state libraries, archives, and museums, which seek to preserve cultural heritage objects in a better methodological and legislative way, and less development is seen in smaller local lams. researchers express the value of preservation of cultural materials and the need to analyze and improve legislative procedures. to this day, the government recognizes the importance of digital preservation; however, the term "digital cultural heritage" is not mentioned and the legal status of such digitized objects is not defined. in addition, legislative documents do not cover the regulation of objects originally created in digital format. moreover, we can see a large gap between the accumulation of materials and the degree of their use, despite the fact that the government seems to support open access to digital cultural heritage objects.

digitization projects of cultural heritage in russia

to analyze the circumstances of the latest projects related to digitization, we investigated the relevant websites from may 2021 to june 2022. in this study, we chose a few representative institutions, including some national projects, based on their reputation, authority, and the scope of their collections. the data on digitization practices and current projects were collected. the list of institutions is shown in table 1. as shown in table 1, the authors selected the russian national library, national electronic library, russian state library, and presidential library as the largest and most well-known libraries in russia. among the archives chosen for the analysis, the archival fonds was selected because it unites the archives in russia in one system, and the national digital archive was selected because its main goal is to preserve and archive key russian digital resources. as for the museums, the state hermitage museum, the state russian museum, and the state museum of fine arts named after a. s. pushkin were chosen for this study because they hold the richest collections of russian cultural heritage and play a vital role in the replenishment of the national catalogue of the russian federation museum collections, the main goal of which is to unite museums across the country. by analyzing the websites of these selected libraries, archives, and museums, we can gain insight into what projects have been undertaken to preserve cultural heritage and what are the main drawbacks of this field.
however, it is true that some institutions do not share the latest information on digitized items. in the case of libraries and archives, the numbers are fairly public on the website, but it is difficult to prove exactly when the objects were digitized. moreover, not all museums share information about recently digitized objects; in those cases, quantitatively counting what is available online is the only way to gauge digitization practices. therefore, the authors used a manual method for data collection and counted the number of digitized materials available on the website. indeed, this could be one of the limitations of this work, as some institutions do not disclose the exact size of their digitized collections, the digitized copies of some institutions could not be counted manually due to the huge amount of data, and some websites may not be up to date.

table 1. institutions responsible for digitization of cultural heritage in russian lams

libraries:
■■ russian national library (http://nlr.ru/eng/ra2403/digital-library); size: 650,000 scanned copies. as of the beginning of 2019, the digital library included scanned copies of books, magazines, newspapers, music publications, graphic materials, maps, audio recordings, and more. the scanned materials include items from the national library of russia and from partner libraries, publishing organizations, authors, and readers.
■■ national electronic library (https://rusneb.ru); size: 1,700,000 digitized books.26 the nel project was designed to provide internet users with access to digitized documents from russian libraries, museums, and archives. nel combines rare books and manuscripts, periodicals, and sheet music collected from all major russian libraries.
■■ russian state library (https://www.rsl.ru); size: 1,500,000 documents. this is the largest public library in russia; the digital collection contains copies of valuable and most requested publications, as well as documents originally created in electronic form. the electronic catalog contains information on more than 21 million publications, 1.5 million of which have been digitized.
■■ presidential library (https://www.prlib.ru/en); size: 1,000,000 units. the presidential library is a nationwide electronic repository of digital copies of the most important documents of the history of russia. the volume of the presidential library collections is more than a million storage units including digital copies of books and journals, archival documents, audio and video recordings, photographs, films, dissertation abstracts, and other materials.

archives:
■■ archival fonds of russia (central fonds catalog) (https://cfc.rusarchives.ru/cfc-search/); size: 959,576 archival fonds.27 annually, the volume of documents of the archival fonds of the russian federation increases by an average of 1.7 million units. as of december 13, 2020, the central fonds catalog included 959,576 items from 13 federal archives and 2,225 state and municipal archives of the russian federation.
■■ national digital archive (https://ruarxive.org); size: 282 websites.28 the purpose of this initiative is to find and preserve websites and other digital materials of high public value and at risk of destruction. the nda project collects official accounts on social networks, official websites of government bodies and political parties, and historical data. however, not many websites were collected in comparison with other countries' initiatives. unlike the internet archive, the nda project makes a complete copy of everything that is on a site, including archive channels on twitter, instagram, and telegram.

museums:
■■ national catalogue of the russian federation museum collections (https://goskatalog.ru/portal/#/); size: 23,193,078 units. the catalog is an electronic database containing basic information about each museum item and each museum collection included in the museum fonds of the russian federation. according to the latest statistics (2020), over 23 million units were recorded in the national museum catalog. however, the total amount of museum objects across russia is more than 84 million.
■■ state hermitage museum (https://www.hermitagemuseum.org); size: 400,000 units. the state hermitage museum is the second largest museum in the world. the hermitage exposition is gradually moving online; this process is slow and very laborious. the entire collection of the hermitage has not been digitized, but the website already contains 400,000 exhibits (approximately only one tenth of the entire collection). the online collection includes paintings, sculptures, numismatics, archaeological finds, and other exhibits.
■■ state russian museum (https://www.rusmuseum.ru/collections/); size: 3,682 (the number of digitized collections was manually counted on the website). this is the world's largest museum of russian art. the collection of the museum has about 400,000 exhibits and covers all historical periods of russian art. at the moment only a small part of the collection is available on the museum website in digitized form. however, the museum maintains the virtual state russian museum branch project, the main goal of which is to give free access to digital and printed materials from other institutions online.
■■ state museum of fine arts named after a. s. pushkin (https://pushkinmuseum.art); size: 334,000. as of march 1, 2019, the museum's database contained information on 670,000 museum items, 334,000 (49%) of which have images. in total there are about 683,000 images in the database (not counting special photography) with a volume of about 35 tb.

figure 1. screenshots of the websites of some of the institutions listed in table 1 (russian national library, national electronic library, russian national digital archive, state hermitage museum).

a further analysis of russian museums shows that 2,773 state and municipal museums have more than 84 million items, but only a few are displayed in digital form.
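the count reported in table 1 for the state russian museum was made by hand on the museum's website. purely as an illustration of how such a tally could be scripted rather than performed manually, the sketch below walks a hypothetical paginated collection listing and counts the item entries; the url pattern, the "collection-item" css selector, and the use of the requests and beautifulsoup libraries are assumptions of this sketch, not part of the authors' methodology.

```python
# illustrative only: tally the items listed on a hypothetical paginated
# online collection. the url pattern and the "collection-item" css class
# are assumptions for the sketch, not the markup of any actual museum site.
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://museum.example/collection?page={page}"  # hypothetical

def count_listed_items(max_pages: int = 500) -> int:
    """walk listing pages and count item entries until an empty page appears."""
    total = 0
    for page in range(1, max_pages + 1):
        resp = requests.get(BASE_URL.format(page=page), timeout=30)
        if resp.status_code != 200:
            break  # stop on missing pages or server errors
        soup = BeautifulSoup(resp.text, "html.parser")
        items = soup.select(".collection-item")  # assumed item markup
        if not items:
            break  # an empty page signals the end of the listing
        total += len(items)
    return total

if __name__ == "__main__":
    print("digitized items listed online:", count_listed_items())
```

even an automated tally of this kind would remain approximate, since a site may list only part of its digitized holdings or lag behind the actual digitization figures, which is why the counts in table 1 should be read as indicative.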
biryukova et al. reviewed the interdisciplinary approach to preserving cultural heritage and creating virtual museums.29 povroznik also analyzed virtual museums that preserve the cultural heritage of the russian federation.30 the author concluded that virtual museums and their resources need to be studied, developed, and improved further. kondratyev et al. considered the issues of digital heritage preservation from the security, integrity, and accessibility perspective, and analyzed the concept of a smart museum.31 lapteva and pikov described the experience of students of the institute for the humanities of siberian federal university working with the state russian museum and the state hermitage museum, the leading russian museums that play an important role in the country's digitization practices.32 the authors noted that implementing modern information technologies in museums creates a comfortable infrastructure for the audience by preserving and representing cultural heritage in interactive contexts.

findings

digitization in russian libraries

creating a digital collection has become a normal library activity in russia.33 within the framework of the main directions of development of activities to preserve russian library collections from 2011 to 2020, one of the main programs of the national library is the digitization of rare books. rare books, according to the federal law "on amendments to the federal law on librarianship," include handwritten books or printed publications that have outstanding material value or special historical, scientific, and/or cultural significance.34 thus, the law elevated the book to the same level of protection as other objects of cultural heritage at the national level. the website of the register of rare books (https://knpam.rusneb.ru), hosted by the russian state library, became a part of the national library collection preservation program developed in 2001. from 2001 to 2009, the subprogram rare books of the russian federation was created to provide a regulatory framework and methodological support for all areas of library activities related to the preservation of library collections. this program includes not only libraries but also other institutions such as museums, archives, and scientific and educational institutions. however, in order to implement the state registration of rare books, it is necessary to further develop regulatory documents that can control the reference procedure and registration procedure of rare books. another initiative for book preservation is the federal project digital culture, designed to provide citizens with wide access to the country's unique cultural heritage. it was expected that ten to twenty libraries from different russian regions would take part in the digitization project, each offering at least 50 documents from their collections to the project. however, the problems of this program are related to the scale of the work, as well as the dispersal of rare books throughout the country. as the 2011–2020 library preservation report emphasizes, many of these rare books remain unknown to the wider scholarly community. approximately half of the valuable collections available in the country's repositories are not described as integral objects of cultural and historical heritage.
the russian state library noted that the main problems associated with rare books include the comprehensive work needed to identify and record rare and valuable books; ensuring the safety and security of the books; the special equipment required to copy valuable materials; and the need for proper storage as the most important condition for their preservation. another main center of digitization is the digital library of the national library of russia (nlr, https://nlr.ru). the digital library is an open and accessible information resource that includes over 650,000 digitized copies of books, magazines, newspapers, music publications, graphic materials, maps, plans, atlases, and audio recordings. the digitized materials include items from the holdings of the national library of russia, partner libraries, publishing organizations, authors, and even readers. for now, the digital collection of the library includes various collections such as landmarks of the nlr, rare books, rossica, maps, and manuscripts. hosted by russia's national library in 2004, the national electronic library (nel, https://rusneb.ru/) was launched to create an electronic library sponsored by the russian federal ministry of culture. the nel is a service that searches the full text of scanned books and magazines that have been processed using optical character recognition and converted into text. it is stored in a digital database available through the internet and mobile applications. one of the main tasks of the nel is the integration of the libraries of the russian federation into a single information network. as of june 15, 2022, the nel collection had a total of 5 million artifacts including electronic copies of books, educational and periodical literature, dissertations and abstracts, monographs, patents, notes, and visual and cartographic publications. the russian state library became the main operator of the national electronic library project in 2014. since 2015, the national library of russia has expanded its digitization program and the site now publishes a list of publications that require digitization. readers vote for publications directly on the site by clicking the vote for digitalization button. for example, as of november 2020, a list of 1,998 publications on a variety of topics ranging from physics and mathematical literature to psychology and music was available for voting.

digitization in russian archives

archives have historical, scientific, social, and cultural significance, and they are an essential part of the process of preserving russian cultural heritage. digitization projects in russia began as an element of the digital cataloging of the largest archives from the 1980s to the 1990s. initially, the main purpose of the digitization project was to create digital copies to ensure the preservation of original archive documents and to avoid circulating rare originals, or originals in poor condition, in the reading room. since then, digitization has become an integral part of creating digital archives in russia.35 currently, one of the main goals of digitizing archival documents is to provide legal entities and individuals with open access to archival documents of the russian federation. the main archival center is the archives fond of the russian federation (http://archives.ru/af.shtml).
the archives fond has more than 609 million items from the early eleventh century to the present and performs important functions to preserve historical memory, replenish information resources, and provide access to the public. the main task of digitization is to preserve russia's cultural and historical heritage. each year, the total volume of archives across russia increases by an average of 1.7 million items. despite the relatively small amount of equipment for digitization, we can still see progress. in 2015, 8,750 documents were digitized, while in 2019, the annual total had reached 27,518 documents. this increase in the number of digital documents shows that digital copy production is directly related to equipment acquisition. however, the researchers found that the level of use of these documents was not high and tended to decrease. for example, in 2015, there were 18,155 document views, while in 2019, there were only 19,417 document views. therefore, it is necessary not only to promote the services of the archive agency but also to increase the demand for archive documents. a portal was created under the auspices of the archives fond of the russian federation (http://www.rusarchives.ru) to promote archival services for users and to organize all archives throughout russia. the portal collects information resources of russian archives on the internet and publishes archival directories and regulations. its establishment was an important breakthrough in organizing access to the documents of the archives fond of the russian federation. since 2012, the website has operated the central catalog software complex, which provides information on the composition of federal and regional digitized fonds. as reported by the federal archival agency, 32 virtual exhibition projects are posted on the official website and portals of the federal archival agency. this website provides information about online archive projects, including virtual exhibitions, digital collections, and inter-archive projects. users can search for materials on the website by three publication types: virtual exhibition, document collection, and inter-archive project. the projects cover four subjects: the great patriotic war, statehood of russia, the soviet era, and space exploration. the federal archival agency's main website also provides five catalogs and databases that guide users through digitized collections. this list includes the central stock catalog (http://cfc.rusarchives.ru/cfc-search), state register of unique documents (https://unikdoc.rusarchives.ru/), guides to russian archives (https://guides.rusarchives.ru/), electronic inventories of federal archives (https://rusarchives.ru/elektronnye-opisi-federalnyh-arhivov), database of declassified cases and documents of the federal archives (http://unsecret.rusarchives.ru/), and database on the places of storage of documents on personnel (http://ls.rusarchives.ru/). as of january 1, 2022, 859 documents were included in the state register of unique documents of the archival fund of the russian federation. a total of more than 98,000 documents are stored in the database.
a project to digitize documents from the soviet era is still in progress, and new collections of digitized copies of archival documents stored in federal archives across russia will be displayed on the website in the future (http://sovdoc.rusarchives.ru/#main). one of the major drawbacks of the digitization process in russia is that archival agencies and cultural heritage materials are scattered throughout the country. to develop digital archiving initiatives in different regions of russia, the culture of russia (2012–2018) program was developed. archives of the constituent entities of the russian federation can take part in this program and get funding from the regional budget to digitize collections as a part of the regional program for the development of archival affairs.36 despite some improvements and ongoing projects, there are still no initiatives for the long-term preservation of born-digital materials and no requirements for mandatory long-term preservation of information. however, the national digital archive (https://ruarxive.org) was created to find and preserve websites and other digital materials that have a high public value and are at risk of destruction. this initiative proposes the general idea of archiving modern digital heritage and consists of many projects. the main one is preserved government, which aims to preserve official materials in the following areas: official accounts on social networks; official sites of government managers, officials, and political parties; historical documents; and especially databases. future plans include developing tools that will help collect digital materials faster and more efficiently and also better systematize what has already been collected.

digitization in russian museums

the active introduction of information technology into museums began at the end of the twentieth century. a new area of study, museum informatics, has emerged in russian higher-education institutions. this area of study focuses on museum work and modern information technology to develop and improve museum activities.37 museums have developed many digitization projects to preserve their collections and give free and easy access to cultural heritage items. the modern russian museum system consists of about 2,773 museums, although the exact number of museums is not known. since the 1970s, the rationale for russian museum digitization practices has been quite similar to that of many other countries, finding that information and collection management are needed to ensure that museum objects are listed and properly preserved. the museums plan to create electronic collections, open valuable collections to the public, create a state catalog of the museum collection of the russian federation (https://goskatalog.ru/portal/#/), and integrate all works from all museums in russia. as of 2020, more than 23 million museum items are registered in the national catalog of the museum collection. the catalog is planned to be complete by 2026, when metadata and images of the museums' collections are included in the register and posted online. digitization of museum collections is an important process that has recently received stable support from the government.
the national information society (2011–2020) program includes a project to create a new virtual museum based on the collections of the country's largest national museum. the term "virtual museum" is used to characterize various projects linked to digital technology in virtual and museum space.38 it can be represented by a collection of works of art on the internet and the publication of the museum's electronic expositions. currently, there are about 300 virtual museums across the country (https://www.culture.ru/museums). the most-visited museums are the state hermitage museum in st. petersburg (https://www.hermitagemuseum.org/), the state tretyakov gallery (https://www.tretyakovgallery.ru), and the state russian museum (http://en.rusmuseum.ru). these museums offer users a wide range of activities, including the use of modern technology. for example, since 2003 the state russian museum, which holds the world's largest collection of russian art, has implemented the russian museum: virtual branch project, opening virtual branches in museums, universities, cultural centers, and institutions of additional education around the country. thanks to computer technology and digitization, thousands of russian residents in near and far places have access to the value of russian culture, russia's historical and artistic past, and the richest collection of russian art. international business machines (ibm) collaborated with the hermitage museum to make it one of the most technologically advanced museums in the world. ibm built the state hermitage museum website in 1997, later called the "world's best online museum" by national geographic traveler.39 the hermitage has unique experience in developing digitization programs and uploading collections to websites. currently, the museum holds more than 3 million items, and the online archives presented on its website provide easy search and the possibility of creating one's own collection on the website. in 2020, the hermitage released a documentary feature film in virtual reality (vr) format, "vr—hermitage: immersion in history with konstantin khabenskiy" (https://www.khabenskiy.com/filmography-vr-hermitage-immersion-in-history-with-konstantin-khabenskiy/). visitors can tour the history of the hermitage in vr, based on the most important events from the eighteenth century to the present. the pushkin museum, the largest museum of european art in moscow, offers another example of using vr technology. the joy of museums offers virtual tours of more than 60,000 museums and historic sites around the world, including the pushkin museum (https://joyofmuseums.com/museums/russian-federation/moscow-museums/pushkin-museum/). virtual museums can display electronic versions of exhibits longer than actual museum exhibitions limited by region and time zone, and have the means to record information about past exhibits, including electronic collections of exhibits, as well as data on opening times and concepts.
for example, the website of the state tretyakov gallery contains a virtual archive of past exhibitions. therefore, the virtual museum has considerable research potential and is actively contributing to the preservation of cultural heritage. digital copies of original works of culture and art form an electronic archive of great value from two perspectives. on the one hand, they preserve rarities for future generations, give users broad access to the rarest and most historically significant artworks, and open up possibilities for research. on the other hand, they offer museums opportunities for commercial use of artifacts, additional sponsorship, and investment proposals.

conclusions and further study

the two most obvious benefits of digitization are improved access and preservation, so that libraries, archives, and museums can represent russian culture and introduce rare and unique cultural heritage artifacts to future generations. in this work, we have addressed some legislative principles and outlined major digitization projects. the general problems of digitization in russia are related to the scale of the work, the low and declining use of digitized documents, and the dispersal of rare books nationwide. in the case of libraries, one of the problems of digitization is also related to the uneven distribution of rare books throughout the country. the most important materials are concentrated in the largest federal library, and many rare books are housed in many central libraries in various parts of the russian federation. work with rare books should be planned as a long-term activity performed at different levels. in the case of archives and museums, one of the major drawbacks of digitization is the dispersal of national archives and cultural heritage materials across the country. based on this preliminary study, there are several further research topics that can enhance understanding of the digitization of cultural heritage in russia. in particular, since digitization is a complex process that requires both management and technology, future research needs to be divided into three aspects: management, technology, and content.

endnotes

1. g. a. kruglikova, "use of information technologies in preservation and popularization of cultural heritage," advances in social science, education and humanities research 437 (2020): 446–50.

2. g. m. shapovalova, "digital culture and digital heritage—doctrinal definitions in the field of culture at the stage of development of modern russian legislation. the territory of new opportunities" [in russian], the herald of vladivostok state university of economics and service 10, no. 4 (2018): 81–89.

3. l. a. pronina, "information technologies preserving cultural heritage. analytics of cultural studies," 2008, https://cyberleninka.ru/article/n/informatsionnye-tehnologii-v-sohranenii-kulturnogo-naslediya/viewer.
neretin, “preservation of digital cultural heritage in a single electronic knowledge space,” bulletin mguki 5, no. 85 (2018): 74–80. 5 y. s. astakhova, “cultural heritage in the digital age. human in digital reality: technological risks,” materials of the v international scientific and practical conference (2020): 204–6. 6 m. a. miroshnichenko, y. v. shevchenko, and r. s. ohrimenko, “preservation of the historical heritage of state archives by digitalizing archive documents” [in russian], вестник академии знаний 37, no. 2 (2020): 188–94. 7 inna kizhner et al., “accessing russian culture online: the scope of digitization in museum s across russia,” digital scholarship in the humanities 19 (2019): 350–67, https://doi.org/10.1093/llc/fqy035. 8 s. d. lee, digital imaging: a practical handbook (new york: neal-schuman publishers, inc., 2001). 9 s. tanner and b. robinson, “the higher education digitisation service (heds): access in the future, preserving the past,” serials 11 (1998): 127–31; g. a. young, “technical advisory service for images (tasi),” 2003, http://www.jiscmail.ac.uk/files/newsletter/issue3_03/; https://cyberleninka.ru/article/n/informatsionnye-tehnologii-v-sohranenii-kulturnogo-naslediya/viewer https://cyberleninka.ru/article/n/informatsionnye-tehnologii-v-sohranenii-kulturnogo-naslediya/viewer https://doi.org/10.1093/llc/fqy035 http://www.jiscmail.ac.uk/files/newsletter/issue3_03/ information technology and libraries december 2022 digitization of libraries, archives, and museums in russia | kim and malceva 15 “preservation services,” harvard library, https://preservation.library.harvard.edu/digitization. 10 g. g. chowdhury and s. chowdhury, introduction to digital libraries (london: facet publishing, 2003), https://doi.org/10.1016/b978-1-84334-599-2.50006-4. 11 j. mcilwaine et al., “guidelines for digitization projects for collections and holdings in the public domain, particularly those held by libraries and archives” (draft) (unesco, march 2002), 6– 7, https://www.ifla.org/wp-content/uploads/2019/05/assets/preservation-andconservation/publications/digitization-projects-guidelines.pdf. 12 m. note, managing image collections: a practical guide (oxford: chandos publishing, 2011). 13 mcilwaine et al., “guidelines,” 51–52. 14 fundamentals of the legislation of the russian federation on culture, http://www.consultant.ru/document/cons_doc_law_1870/068694c3b5a06683b5e5a2d480 bb399b9a7e3dcc/. 15 decree of the president of the russian federation of december 24, 2014 no. 808, on approval of the fundamentals of state cultural policy, http://kremlin.ru/acts/bank/39208. 16 v. zvereva, ‘‘state propaganda and popular culture in the russian-speaking internet,” in freedom of expression in russia’s new mediasphere, ed. mariëlle wijermars and katja lehtisaari (routledge: abingdon, oxon, 2020), 225–47, https://doi.org/10.4324/9780429437205-12. 17 s. l. yablochnikov, m. n. mahiboroda, and o. v. pochekaeva, “information aspects in the field of modern public administration and law,” in 2020 international conference on engineering management of communication and technology (emctech), 1–5; u. chimittsyrenova, “a research proposal information society: copyright (presumption of access to the digital cultural heritage),” colloquium journal, no. 11-3 (2017): 22–24. голопристанський міськрайонний центр зайнятості = голопристанский районный центр занятости. 18 g. m. shapovalova, “information society: from digital archives to digital cultural heritage,” international research journal 5, no. 47 (2016): 177–81. 
19 g. m. shapovalova, “the global information society changing the world: the copyright or the presumption of access to digital cultural heritage,” society: politics, economics, law, 2016. 20 s. b. lialkova and v. b. naumov, “the development of regulation of the protection of cultural heritage in the digital age: the experience of the european union,” информационное общество 1 (2020): 29–41. 21 e. kozlova, “russia’s digital cultural heritage in the legal deposit system,” slavic & east european information resources 12, no. 2-3 (2011): 188–91. 22 a. a. dzhigo, “preserving russia’s digital cultural heritage: acquisition of electronic documents in russian libraries and information centers,” slavic & east european information resources 14, no. 2-3 (2013): 219–23. https://preservation.library.harvard.edu/digitization https://doi.org/10.1016/b978-1-84334-599-2.50006-4 https://www.ifla.org/wp-content/uploads/2019/05/assets/preservation-and-conservation/publications/digitization-projects-guidelines.pdf https://www.ifla.org/wp-content/uploads/2019/05/assets/preservation-and-conservation/publications/digitization-projects-guidelines.pdf http://www.consultant.ru/document/cons_doc_law_1870/068694c3b5a06683b5e5a2d480bb399b9a7e3dcc/ http://www.consultant.ru/document/cons_doc_law_1870/068694c3b5a06683b5e5a2d480bb399b9a7e3dcc/ http://kremlin.ru/acts/bank/39208 https://www.worldcat.org/search?q=au=%22wijermars,%20marie%cc%88lle%22 https://doi.org/10.4324/9780429437205-12 information technology and libraries december 2022 digitization of libraries, archives, and museums in russia | kim and malceva 16 23 y. y. yumasheva, “digitizing russian cultural heritage: normative and methodical regulation,” bulletin of the ural federal university humanitarian sciences 3, no. 117 (2013): 2–7. 24 g. a. kruglikova, “use of information technologies in preservation and popularization of cultural heritage,” advances in social science, education and humanities research 437 (2020): 446–50. 25 g. m. shapovalova, “the concept of digital cultural heritage and its genesis: theoretical and legal analysis, the territory of new opportunities” [in russian], the herald of vladivostok state university of economics and service 9, no. 4 (2017): 159–68. 26 a. annenkov, “national electronic library of russia: it’s not yet on fire, but the time to save it is now [in russian], http://d-russia.ru/nacionalnaya-elektronnaya-biblioteka-rossii-eshhyone-gorela-no-spasat-uzhe-pora.html. 27 saa dictionary of archives terminology. a “fonds” is the entire body of records of an organization, family, or individual that have been created and accumulated as the result of an organic process reflecting the functions of the creator. 28 airtable, https://airtable.com/shro1hise7wgurxg5/tblhdxawiv5avtn7y. 29 m. v. biryukova et al., “interdisciplinary aspects of digital preservation of cultural heritage in russia” [in russian], european journal of science and theology 13, no. 4 (2017): 149–60. 30 n. povroznik, “virtual museums and cultural heritage: challenges and solution,” https://www.researchgate.net/profile/nadezhdapovroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_ solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritagechallenges-and-solutions.pdf. 31 d. v. kondratyev et al., “problems of preservation of digital cultural heritage in the context of information security,” history and archives (2013): 36–51. 32 m. a. lapteva and n. o. 
pikov, “visualization technology in museum: from the experience of sibfu collaboration with the museums of russia,” journal of siberian federal university humanities & social sciences 7, no. 9 (2016): 1674–81. 33 g. a. evstigneeva, the ideology of digitization of library collections on the example of the russian national public library for science and technology, library collections: problems and solutions, 2014, http://www.gpntb.ru/ntb/ntb/2014/3/ntb_3_8_2014.pdf. 34 main directions of development of activities for the preservation of library collections in the russian federation for 2011–2020, https://kp.rsl.ru/assets/files/documents/maindirections.pdf. 35 g. m. shapovalova, “the concept of digital cultural heritage,” 159–68. 36 o. a. kolchenko and e. a. bryukhanova, “the main directions of archiving informatization in the context of electronic society development,” vestnik tomskogo gosudarstvennogo universiteta—tomsk state university journal 443 (2019): 114–18. http://d-russia.ru/nacionalnaya-elektronnaya-biblioteka-rossii-eshhyo-ne-gorela-no-spasat-uzhe-pora.html http://d-russia.ru/nacionalnaya-elektronnaya-biblioteka-rossii-eshhyo-ne-gorela-no-spasat-uzhe-pora.html https://airtable.com/shro1hise7wgurxg5/tblhdxawiv5avtn7y https://www.researchgate.net/profile/nadezhda-povroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritage-challenges-and-solutions.pdf https://www.researchgate.net/profile/nadezhda-povroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritage-challenges-and-solutions.pdf https://www.researchgate.net/profile/nadezhda-povroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritage-challenges-and-solutions.pdf https://www.researchgate.net/profile/nadezhda-povroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritage-challenges-and-solutions.pdf http://www.gpntb.ru/ntb/ntb/2014/3/ntb_3_8_2014.pdf https://kp.rsl.ru/assets/files/documents/main-directions.pdf https://kp.rsl.ru/assets/files/documents/main-directions.pdf information technology and libraries december 2022 digitization of libraries, archives, and museums in russia | kim and malceva 17 37 g. p. nesgovorova, “modern information, communication and digital technologies in the preservation of cultural and scientific heritage and the development of museums: problems of intellectualization and quality of informatics systems” (2006): 153–61, https://www.iis.nsk.su/files/articles/sbor_kas_13_nesgovorova.pdf. 38 n. g. povroznik, “virtual museum: preservation and representation of historical and cultural heritage,” perm university bulletin 4, no. 31 (2015): 2013–21. 39 the preservation of culture through technology, https://www.ibm.com/ibm/history/ibm100/us/en/icons/preservation/ . 
library discovery products: discovering user expectations through failure analysis
irina trapido
information technology and libraries | september 2016 | doi:10.6017/ital.v35i2.9190
irina trapido (itrapido@stanford.edu) is electronic resources librarian at stanford university libraries, stanford, california.
abstract
as the new generation of discovery systems evolves and gains maturity, it is important to continually focus on how users interact with these tools and what areas they find problematic. this study looks at user interactions within searchworks, a discovery system developed by stanford university libraries, with an emphasis on identifying and analyzing problematic and failed searches. our findings indicate that users still experience difficulties conducting author and subject searches, could benefit from enhanced support for browsing, and expect their overall search experience to be more closely aligned with that on popular web destinations. the article also offers practical recommendations pertaining to metadata, functionality, and scope of the search system that could help address some of the most common problems encountered by the users.
introduction
in recent years, rapid modernization of online catalogs has brought library discovery to the forefront of research efforts in the library community, giving libraries an opportunity to take a fresh look at such important issues as the scope of the library catalog, metadata creation practices, and the future of library discovery in general. while there is an abundance of studies looking at various aspects of planning, implementation, use, and acceptance of these new discovery environments, surprisingly little research focuses specifically on user failure. the present study aims to address this gap by identifying and analyzing potentially problematic or failed searches. it is hoped that focusing on common error patterns will help us gain a better understanding of users' mental models, needs, and expectations that should be considered when designing discovery systems, creating metadata, and interacting with library patrons.
terminology
in this paper, we adopt a broad definition of discovery products as "tools and interfaces that a library implements to provide patrons the ability to search its collections and gain access to materials."1 these products can be further subdivided into the following categories:
• online catalogs (opacs)—patron-facing modules of an integrated library system.
• discovery layers (also referred to as "discovery interfaces" or "next-generation library catalogs")—new catalog interfaces, decoupled from the integrated library system and offering enhanced functionality, such as faceted navigation, relevance-ranked results, as well as the ability to incorporate content from institutional repositories and digital libraries.
• web-scale discovery tools, which in addition to providing all interface features and functionality of next generation catalogs, broaden the scope of discovery by systematically aggregating content from library catalogs, subscription databases, and institutional digital repositories into a central index. literature review to identify and investigate problems that end users experience in the course of their regular searching activities, we analyzed digital traces of user interactions with the system recorded in the system’s log files. this method, commonly referred to as transaction log analysis, has been a popular way of studying information-seeking in a digital environment since the first online search systems came into existence, allowing researchers to monitor system use and gain insight into the users’ search process. server logs have been used extensively to examine user interactions with web search engines, consistently showing that web searchers tend to engage in short search sessions, enter brief search statements, do not browse the results beyond the first page, and rarely resort to advanced searching.2 a similar picture has emerged from transaction log studies of library catalogs. researchers have found that library users employ the same surface strategies: queries within library discovery tools are equally short and simply constructed; 3 the majority of search sessions consist of only one or two actions.4 patrons commonly accept the system’s default search settings and rarely take advantage of a rich set of search features traditionally offered by online catalogs, such as boolean searching, index browsing, term truncation, and fielded searching.5 although advanced searching in library discovery layers is uncommon, faceted navigation, a new feature introduced into library catalogs in the mid-2000s, quickly became an integral part of the users’ search process. research has shown that facets in library discovery interfaces are used both in conjunction with text searching, as a search refinement tool, and as a way to browse the collection with no search term entered.6 a recent study that analyzed interaction patterns in a faceted library interface at the north carolina state university using log data and user experiments demonstrated that users of faceted interfaces tend to issue shorter queries, go through fewer iterations of query reformulation, and scan deeper along the result list than those who use nonfaceted search systems. the authors also concluded that facets increase search accuracy, especially for complex and open-ended tasks, and improve user satisfaction.7 information technology and libraries | september 2016 11 another traditional use of transaction logs has been to gauge the performance of library catalogs, mostly through measuring success and failure rates. while the exact percentage of failed searches varied dramatically depending on the system’s search capabilities, interface design, the size of the underlying database, and, most importantly, on the researchers’ definition of an unsuccessful search, the conclusion was the same: the incidence of failure in library opacs was extremely high.8 in addition to reporting error rates, these studies also looked at the distribution of errors by search type (title, author, or subject search) and categorized sources of searching failure. 
most researchers agreed that typing errors and misspellings accounted for a significant portion of failed searches and were common across all search types.9 subject searching, which remained the most problematic area, often failed because of a mismatch between the search terms chosen by the user and the controlled vocabulary contained in the library records, suggesting that users experienced considerable difficulties in formulating subject queries with library of congress subject headings.10 other errors reported by researchers, such as the selection of the wrong search index or the inclusion of the initial article for title searches, were also caused by users' lack of conceptual understanding of the search process and the system's functions.11 these research findings were reinforced by multiple observational studies and user interviews, which showed that patrons found library catalogs "illogical," "counter-intuitive," and "intimidating,"12 and that patrons were unwilling to learn the intricacies of catalog searching.13 instead, users expected simple, fast, and easy searching across the entire range of library collections, relevance-ranked results that exactly matched what users expected to find, and convenient and seamless transition from discovery to access.14 today's library discovery systems have come a long way: they offer one-stop search for a wide array of library resources, intuitive interfaces that require minimal training to be searched effectively, facets to help users narrow down the result set, and much more.15 but are today's patrons always successful in their searches? usability studies of next-generation catalogs and, more recently, of web-scale discovery systems have pointed to patron difficulties associated with the use of certain facets, mostly because of terminological issues and inconsistencies in the underlying metadata.16 researchers also reported that users had trouble interpreting and evaluating the results of their search;17 users were also confused as to what resources were covered by the search tool.18 our study builds on this line of research by systematically analyzing real-life problematic searches as reported by library users and recorded in transaction logs.
background
stanford university is a private, four-year or above research university offering undergraduate and graduate degrees in a wide range of disciplines to about sixteen thousand students. the study analyzed the use of searchworks, a discovery platform developed by stanford university libraries. searchworks features a single search box with a link to advanced search on every page, relevance-ranked results, faceted navigation, enhanced textual and visual content (summaries, tables of contents, book cover images, etc.), as well as "browse shelf" functionality. searchworks offers searching and browsing of catalog records and digital repository objects in a single interface; however, it does not allow article-level searching. searchworks was developed on the basis of blacklight (projectblacklight.org), an open-source application for searching and interacting with collections of digital objects.19 thanks to blacklight's flexibility and extensibility, searchworks enables discovery across an increasingly diverse range of collections (marc catalog records, archival materials, sound recordings, images, geospatial data, etc.)
and allows new features and improvements to be added continuously (e.g., https://library.stanford.edu/blogs/stanford-libraries-blog/2014/09/searchworks-30-released).
study objectives
the goal of the present study was twofold. first, we sought to determine how patrons interact with the discovery system: which features they use and with what frequency. second, this study aimed to identify and analyze problems that users encounter in their search process.
method
this study used data comprising four years of searchworks use, which was recorded in apache solr logs. the analysis was performed at the aggregate level; no attempts were made to identify individual searchers from the logs. at the preprocessing stage, we created and used a series of perl scripts to clean and parse the data and extract only those transactions where the user entered a search query and/or selected at least one facet value. page views of individual records were excluded from the analysis. the resulting output file contained the following parameters for each transaction: a time stamp, search mode used (basic or advanced), query terms, search index ("all fields," "author," "title," "subject," etc.), facets selected, and the number of results returned. the query stream was subsequently partitioned into task-based search sessions using a combination of syntactic features (word co-occurrence across multiple transactions) and temporal features (session time-outs: we used fifteen minutes of inactivity as a boundary between search sessions). the analysis was conducted over the following datasets:
dataset 1. aggregate data of approximately 6 million search transactions conducted between february 13, 2011, and december 31, 2014. we performed quantitative analysis of this set to identify general patterns of system use.
dataset 2. a sample of 5,101 search sessions containing 11,478 failed or potentially problematic interactions performed in the basic search mode and 2,719 sessions containing 3,600 advanced searches, annotated with query intent and potential cause of the problem. the searches were performed during eleven twenty-four-hour periods, representing different years, academic quarters, times of the school year (beginning of the quarter, midterms, finals, breaks), and days of the week. this dataset was analyzed to identify common sources of user failure.
dataset 3. user feedback messages submitted to searchworks between january 2011 and december 2014 through the "feedback" link, which appears on every searchworks page. while the majority of feedback messages were error and bug reports, this dataset also contained valuable information about how users employed various features of the discovery layer, what problems they encountered, and what features they felt would improve their search experience.
for the manual analysis of dataset 2, all searches within a search session were reconstructed in searchworks and, in some cases, also in external sources such as worldcat, google scholar, and google.
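the temporal part of this partitioning is straightforward to illustrate. the following is a minimal sketch (in python, although the study itself used perl scripts) of grouping a chronologically sorted transaction stream into sessions with the fifteen-minute inactivity boundary described above; the record layout and field names are hypothetical, not the actual solr log format.

```python
# minimal sketch of time-out-based session partitioning; field names are
# hypothetical and the syntactic (word co-occurrence) signal is omitted.
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=15)  # inactivity boundary used in the study

def partition_sessions(transactions):
    """group chronologically sorted transaction dicts into sessions.

    each transaction is assumed to look like:
    {"ts": "2014-03-01T10:15:02", "query": "tolstoy war and peace",
     "mode": "basic", "index": "all_fields", "facets": [], "num_results": 12}
    """
    sessions, current, last_ts = [], [], None
    for t in transactions:
        ts = datetime.fromisoformat(t["ts"])
        # start a new session whenever the gap since the last action is too long
        if last_ts is not None and ts - last_ts > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append(t)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions
```

in practice, word overlap between consecutive queries would then be used, as the study describes, to split or join sessions that the simple time-out rule gets wrong.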
they were subsequently assigned to one of the following categories: known-item searches (searches for a specific resource by title, combination of title and author, a standard number such as issn or isbn, or a call number), author searches (queries for a specific person or organization responsible for or contributing to a resource), topical searches, browse searches (searches for a subset of the library collection, e.g., “rock operas,” “graphic novels,” “dvds,” etc.), invalid queries, and queries where the search intent could not be established. to identify potentially problematic transactions, the following heuristic was employed: we selected all search sessions where at least one transaction failed to retrieve any records, as well as sessions consisting predominantly of known-item or author searches, where the user repeated or reformulated the query three or more times within a five-minute time frame. we hypothesized that this search pattern could be part of the normal query formulation process for topical searches, but it could serve as an indicator of the user’s dissatisfaction with the results of the initial query for known-item and author searches. we identified seventeen distinct types of problems, which we further aggregated into the following five groups: input errors, absence of the resource from the collection, queries at the wrong level of granularity, erroneous or too restrictive use of limiters, and mismatch between the search terms entered and the library metadata. each search transaction in dataset 2 was manually reviewed and assigned to one or more of these error categories. findings usage patterns our analysis of the aggregate data suggests that keyword searching remains the primary interaction paradigm with the library discovery system, accounting for 76 percent of all searches. however, users also increasingly take advantage of facets both for browsing and refining their searches: the use of facets grew from 25 percent in 2011 to 41 percent in 2014. library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 14 although both the basic and the advanced search modes allow for “fielded” searches, where the user can specify which element of the record to search (author, title, subject, etc.), searchers rarely made use of this feature, relying mostly on the system’s defaults (the “all fields” search option in the basic search mode): users selected a specific search index in less than 25 percent of all basic searches. advanced searching was infrequent and declining (from 11 percent in 2011 to 4 percent in 2014). typically, users engaged in short sessions with a mean session length of 1.5 queries. search queries were brief: 2.9 terms per query on average. single terms made up 23 percent of queries; 26 percent had two terms, and 19 percent had three terms. error patterns the breakdown of errors by category and search mode is shown in figure 1. in the following sections, we describe and analyze different types of errors. figure 1. breakdown of errors by category and search mode input errors input errors accounted for the largest proportion of problematic searches in the basic search mode (29 percent) and for 5 percent of problems in the advanced search. 
while the majority of such errors occurred at the level of individual words (misspellings or typographical errors), entire search statements were also imprecise and erroneous (e.g., “diary of an economic hit man” instead of “confessions of an economic hit man” and “dostoevsky war and peace” instead of “tolstoy war and peace”). it is noteworthy that in 46 percent of all search sessions containing information technology and libraries | september 2016 15 problems of this type, users subsequently entered a corrected query. however, if such errors occurred in a personal name, they were almost half as likely to be corrected. absence of the item sought from the collection queries for materials that were not in the library’s collection accounted for about a quarter of all potentially problematic searches. in the advanced search modality, where the query is matched against a specific search field, such queries typically resulted in zero hits and can hardly be considered failures per se. however, in the default cross-field search, users were often faced with the problem of false hits and had to issue multiple progressively more specific queries to ascertain that the desired resource was absent from the collection. queries at the wrong level of granularity a substantial number of user queries failed because they were posed at the level of specificity not supported by the catalog. such queries accounted for the largest percentage of problematic advanced searches (63 percent), where they consisted almost exclusively of article-level searching: users either tried to locate a specific article (often by copying the entire citation or its part from external sources) or conducted highly specific topical searches more suitable for a fulltext database. in the basic search mode, the proportion of searches at the wrong granularity level was much lower, but still substantial (20 percent). in addition to searches for articles and narrowly defined subject searches, users also attempted to search for other types of more granular content, such as book chapters, individual papers in conference proceedings, poems, songs, etc. erroneous or too restrictive use of limiters another common source of failure was the selection of the wrong search index or a facet that was too restrictive to yield any results. the majority of these errors were purely mechanical: users failed to clear out search refinements from their previous search or entered query terms into the wrong search field. however, our analysis also revealed several conceptual errors, typically stemming from a misunderstanding of the meaning and purpose of certain limiters. for example, “online,” “database,” and “journal/periodical” facets were often perceived by the user as a possible route to article-level content. even seemingly straightforward limiters such as “date” caused confusion, especially when applied to serial publications: users attempted to employ this facet to drill down to the desired journal issue or article, most likely acting on the assumption that the system included article-level metadata. lack of correspondence between the users’ search terms and the library metadata a significant number of problems in this group involved searches for non-english materials. 
when performed in their english transliteration, such queries often failed because of users’ lack of library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 16 familiarity with the transliteration rules established by the library community, whereas searches in the vernacular scripts tended to produce incomplete or no results because not all bibliographic records in the database contained parallel non-roman script fields. author and title searches often failed because of the users’ tendency to enter abbreviated queries. for example, personal name searches where the user truncated the author’s first or middle name to an initial while the bibliographic records only contained this name in its full form were extremely likely to fail. abbreviations were also used in searches for journals, conference proceedings, and occasionally even for book titles (e.g., “ai: a modern approach” instead of “artificial intelligence: a modern approach”). such queries were successful only if the abbreviation used by the searcher was included in the bibliographic records as a variant title. a somewhat related problem occurred when the title of a resource contained a numeral in its spelled out form but was entered as a digit by the user. because these title variations are not always recorded as additional access points in the bibliographic records, the desired item either did not appear in the result set or was buried too deep to be discovered. topical searches within the subject index were also prone to failure, mostly because patrons were unaware that such searches require the use of precise terms from controlled vocabularies and resorted to natural language searching instead. user feedback our analysis of user feedback revealed substantial differences in how various user groups approach the search system and which areas of it they find problematic. students were often frustrated by the absence of spelling suggestions, which, as one user put it, “left the users wander [to?] in the dark” as to the cause of searching failure. this user group also found certain social features desirable: for example, one user suggested that having ratings for books would be helpful in his choice of a good programming book. by contrast, faculty and researchers were more concerned about the lack of the more advanced features, such as cross-reference searching and left-anchored browsing of the title, subject, and author indexes. however, there were several areas that both groups found problematic: students and faculty alike saw the system’s inability to assist in the selection of the correct form of the author’s name as a major barrier to effective author searching and also converged on the need for more granular access to formats of audiovisual materials. discussion scope of the discovery system the results of our analysis point to users’ lack of understanding of what is covered by the discovery layer. users are often unaware of the existence of separate specialized search interfaces for different categories of materials and assume that the library discovery layer offers google-like information technology and libraries | september 2016 17 searching across the entire range of library resource types. 
moreover, they are confused by the multiple search modalities offered by the discovery layer: one of the common misconceptions in searchworks is that the advanced search will allow the user to access additional content rather than offer a different way of searching the same catalog data. in addition to the expanded scope of the discovery tools, there is also a growing expectation of greater depth of coverage. according to our data, searching in a discovery layer occurs at several levels: the entire resource (book, journal title, music recording), its smaller integral units (book chapters, journal articles, individual musical compositions, etc.), and full text. user search strategies the search strategies employed by searchworks users are heavily influenced by their experiences with web search engines. users tend to engage in brief search sessions and use short queries, which is consistent with the general patterns of web searching. they rely on relevance ranking and are often reluctant to examine search results in any depth: if the desired item does not appear within the first few hits, users tend to rework their initial search statement (often with only a minimal change to the search terms) rather than scrolling down to the bottom of the results screen or looking beyond the first page of results. given these search patterns, it is crucial to fine-tune relevance-ranking algorithms to the extent that the most relevant results are displayed not just on the first page but are included in the first few hits. while this is typically the case for unique and specific queries, more general searches could benefit from a relevance-ranking algorithm that would leverage the popularity of a resource as measured by its circulation statistics. adding this dimension to relevance determination would help users make sense of large result sets generated by broad topical queries (e.g., “quantum mechanics,” “linear algebra,” “microeconomics”) by ranking more popular or introductory materials higher than more specialized ones. it could also provide some guidance to the user trying to choose between different editions of the same resource and improve the quality of results of author searches by ranking works created by the author before critical and biographical materials. users’ query formulation strategies are also modeled by google, where making search terms as specific as possible is often the only way to increase the precision of a search. faceted search systems, however, require a different approach: the user is expected to conduct a broad search and subsequently focus it by superimposing facets on the results. qualifying the search upfront through keywords rather than facets is not only ineffective, but may actually lead to failure. for example, a common search pattern is to add the format of a resource as a search term (e.g., “fortune magazine,” “science journal,” “gre e-book,” “nicole lopez dissertation,” “woody allen movies”), and because the format information is coded rather than spelled out in the bibliographic records, such queries either result in zero hits or produce irrelevant results. 
in a similar vein, making the query overly restrictive by including the year of publication, publisher, or edition information often causes empty retrievals because the library might not have the edition specified by the user or because the query does not match the data in the bibliographic record. thus our study lends further weight to claims that even in today's reality of sophisticated discovery environments and unmediated searching, library users can still benefit from learning the best search techniques that are specifically tailored to faceted interfaces.20
error tolerance
input errors remain one of the major sources of failure in library discovery layers. users have become increasingly reliant on error recovery features that they find elsewhere on the web, such as "did you mean . . ." suggestions, automatic spelling corrections, and helpful suggestions on how to proceed in situations where the initial search resulted in no hits. but perhaps even more crucial are error-prevention mechanisms, such as query autocomplete, which helps users avoid spelling and typographical errors and provides interactive search assistance and instant feedback during the query formulation process. our visual analysis of the logs from the most recent years revealed an interesting search pattern, where the user enters only the beginning of the search query and then increments it by one or two letters:
pr
pro
proq
proque
proques
proquest
such search patterns indicate that users expect the system to offer query expansion options and show the extent to which the query autocomplete feature (currently missing from searchworks) has become an organic part of the users' search process.
topical searching
while next-generation discovery systems represent a significant step toward enabling more sophisticated topical discovery, a number of challenges still remain. apart from mechanical errors, such as misspellings and wrong search index selections, the majority of zero-hit topical searches were caused by a mismatch between the user's query and the vocabulary in the system's index. in many cases such queries were formulated too narrowly, reflecting the users' underlying belief that the discovery layer offers full-text searching across all of the library's resources. in addition to keyword searching, libraries have traditionally offered a more sophisticated and precise way of accessing subject information in the form of library of congress subject headings (lcsh). however, our results indicate that these tools remain largely underused: users took advantage of this feature in only 21 percent of all subject searches in our sample. we also found that 95 percent of lcsh usage came from clicks on subject heading links within individual bibliographic records rather than from "subject" facets, corroborating the results of earlier studies.21 there is a whole range of measures that could help patrons leverage the power of controlled vocabulary searching. they include raising the level of patron familiarity with the lcshs, integrating cross-references for authorized subject terms, enabling more sophisticated facet-based access to subject information by allowing users to manipulate facets independently, and exposing hierarchical and associative relationships among lcshs.
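to make the cross-reference and hierarchy ideas above concrete, the following is a minimal sketch of how "see" references and broader/narrower relationships from an authority vocabulary might be used to redirect a user's natural-language term to an authorized heading and to offer expansion links. the tiny vocabulary shown is invented for illustration and is not actual lcsh data or the searchworks implementation.

```python
# sketch of cross-reference resolution and hierarchy-based expansion;
# the vocabulary below is invented for illustration, not real lcsh data.
AUTHORIZED = {
    "cookery": {"broader": ["home economics"], "narrower": ["cookery, french"]},
    "home economics": {"broader": [], "narrower": ["cookery"]},
    "cookery, french": {"broader": ["cookery"], "narrower": []},
}
# "see" references map a user's entry term to the authorized heading
SEE_REFERENCES = {"cooking": "cookery", "french cooking": "cookery, french"}

def resolve_subject(term):
    """map a user-supplied term to an authorized heading, if one is known."""
    term = term.lower().strip()
    if term in AUTHORIZED:
        return term
    return SEE_REFERENCES.get(term)

def expansion_options(term):
    """return the broader/narrower headings an interface could expose as links."""
    heading = resolve_subject(term)
    if heading is None:
        return {}
    rel = AUTHORIZED[heading]
    return {"use": heading, "broader": rel["broader"], "narrower": rel["narrower"]}

# e.g., expansion_options("french cooking")
# -> {"use": "cookery, french", "broader": ["cookery"], "narrower": []}
```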
ideally, once the user has identified a helpful controlled vocabulary term, it should be possible to expand, refine, or change the focus of a search through broader, narrower, and related terms in the lcsh's hierarchy as well as to discover various aspects of a topic through browse lists of topical subdivisions or via facets.
known-item searching
important as it is for the discovery layer to facilitate topical exploration, our data suggests that searchworks remains, first and foremost, a known-item lookup tool. while a typical searchworks user rarely has problems with known-work searches, our analysis of clusters of closely related searches has revealed several situations where users' known-item search experience could be improved. for example, when the desired resource is not in the library's collection, the user is rarely left with empty result sets because of automatic word-stemming and cross-field searching. while this is a boon for exploratory searching, it becomes a problem when the user needs to ensure that the item sought is not included in the library's collection. another common scenario arises when the query is too generic, imprecise, or simply erroneous, or when the search string entered by the user does not match the metadata in the bibliographic record, causing the most relevant resources to be pushed too far down the results list to be discoverable. providing helpful "did you mean . . ." suggestions could potentially help the user distinguish between these two scenarios. another feature that would substantially benefit the user struggling with the problem of noisy retrievals is highlighting the user's search terms in retrieved records. displaying search matches could alleviate some of the concerns over lack of transparency as to why seemingly irrelevant results are retrieved, repeatedly expressed in user feedback, as well as expedite the process of relevance assessment.
author searching
author searching remains problematic because of a convergence of factors:
a. misspellings. according to our data, typographical errors and misspellings are by far the most common problem in author searching. when such errors occur in personal names, they are much more difficult to identify than errors in the title, and in the absence of index-based spell-checking mechanisms, they often require the use of external sources to be corrected.
b. mismatch between the form and fullness of the name entered by the user and the form of the name in the bibliographic record. for example, a user's search for "d. reynolds" will retrieve records where "d" and "reynolds" appear anywhere in the record (or anywhere in the author fields, if the user opts for a more focused "author" search), but will not bring up records where the author's name is recorded as "reynolds, david."
c. lack of cross-reference searching of the lc name authority file. if the user searches for a variant name represented by a cross-reference on an authority record, she might not be directed to the authorized form of the name.
d. lack of name disambiguation, which is especially problematic when the search is for a common name. while the process of name authority control ensures the uniqueness of name headings, it does not necessarily provide information that would help users distinguish between authors.
for instance, the user often has to know the author's middle name or date of birth to choose the correct entry, as exemplified by the following choices in the "author" facet resulting from the query "david kelly":
kelly, david
kelly, david (david d.)
kelly, david (david francis)
kelly, david f.
kelly, david h.
kelly, david patrick
kelly, david st. leger
kelly, david t.
kelly, david, 1929 july 11–
kelly, david, 1929–
kelly, david, 1929–2012
kelly, david, 1938–
kelly, david, 1948–
kelly, david, 1950–
kelly, david, 1959–
e. errors and inaccuracies in the bibliographic records. given the past practice of creating undifferentiated personal-name authority records, it is not uncommon to have one name heading for different authors or contributors. conversely, situations where a single person is identified by multiple headings (largely because some records still contain obsolete or variant forms of a personal name) are also prevalent and may become a significant barrier to effective retrieval as they create multiple facet values for the same author or contributor.
f. inability to perform an exhaustive search on the author's name. a fielded "author" search will miss the records where the name does not appear in the "author" fields but appears elsewhere in the bibliographic record.
g. relevance ranking. because search terms occurring in the title have more weight than search terms in the "author" fields, works about an author are ranked higher than works of the author.
browsing
like many other next-generation discovery systems, searchworks features faceted navigation, which facilitates both general-purpose browsing and more targeted search. in searchworks, facets are displayed from the outset, providing a high-level overview of the collection and jumping-off points for further exploration. rather than having to guess the entry vocabulary, the searcher may just choose from the available facets and explore the entire collection along a specific dimension. however, findings from our manual analysis of the query stream suggest that facets as a browsing tool might not be used to their fullest potential: users often resort to keyword searching when faceted browsing would have been a more optimal strategy. there are at least two factors that contribute to this trend. the first is users' lack of awareness of this interface feature: it is common for searchworks users to issue queries such as "dissertations," "theses," and "newspapers" instead of selecting the appropriate value of the "format" facet. second, many of the facets that could be useful in the discovery process are not available as top-level browsing categories. for example, users expect more granular faceting of audiovisual resources, which would include the ability to browse by content type ("computer games," "video games") and genre ("feature films," "documentaries," "tv series," "romantic comedies"). another category of resources commonly accessed by browsing is theses and dissertations. users frequently try to browse dissertations by field or discipline (issuing searches such as "linguistics thesis," "dissertations aeronautics," "phd thesis economics," "biophysics thesis"), by program or department and by the level of study (undergraduate, master's, doctoral), and could benefit from a set of facets dedicated to these categories.
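browse entry points of this kind are typically little more than faceted queries with no search terms. since searchworks runs on solr (its logs, per the method section, are apache solr logs), a dissertations browse page could be backed by a zero-query faceted request along the following lines; the facet field names and the core url are hypothetical, not searchworks's actual schema.

```python
# sketch of a zero-query faceted solr request that could back a browse view;
# field names and the solr url are hypothetical.
import json
import urllib.parse
import urllib.request

def browse_facets(solr_select_url, facet_fields, filters=None):
    """fetch facet value counts for a browse view; filters narrow the set."""
    params = [
        ("q", "*:*"),     # match everything: browsing, not keyword searching
        ("rows", "0"),    # only the facet counts are needed, not records
        ("facet", "true"),
        ("wt", "json"),
    ]
    params += [("facet.field", f) for f in facet_fields]
    params += [("fq", f) for f in (filters or [])]
    url = solr_select_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["facet_counts"]["facet_fields"]

# e.g., a dissertations browse page might request department and level facets,
# restricted to the thesis format:
# browse_facets("http://localhost:8983/solr/catalog/select",
#               ["department_facet", "thesis_level_facet"],
#               filters=['format:"thesis/dissertation"'])
```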
browsing for books could be enhanced by additional faceting related to intellectual content, such as genre and literary form (e.g., “fantasy,” “graphic novels,” “autobiography,” “poetry”) and audience (e.g., “children’s books”). users also want to be able to browse for specific subsets of materials on the basis of their location (e.g., permanent reserves at the engineering library). browsing for new acquisitions with the option of limiting to a specific topic is also a highly desirable feature. library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 22 while some browsing categories are common across all types of resources, others only apply to specific types of materials (e.g., music, cartographic/geospatial materials, audiovisual resources, etc.). for example, there is a strong demand among music searchers for systematic browsing by specific musical instruments and their combinations. ideally, the system should offer both an optimal set of initial browse options and intuitive context-specific ways to progressively limit or expand the search. offering such browsing tools may require improvements in system design as well as significant data remediation and enhancement because much of the metadata that could be used to create these browsing categories is often scattered across multiple fixed and variable fields in the bibliographic records, inconsistently recorded, or not present at all. one of the hallmarks of modern discovery tools has been their increased focus on developing tools that would facilitate serendipitous browsing. searchworks was one of the pioneers to offer virtual “browse shelf” feature, which is aimed at emulating browsing the shelves in a physical library. however, because this functionality relies on the classification number, it does not allow browsing of many other important groups of materials, such as multimedia resources, rare books, or archival resources. call-number proximity is only one of the many dimensions that could be leveraged to create more opportunities for serendipitous discoveries. other methods of associating related content might include recommendations based on subject similarity, authorship, keyword associations, forward and backward citations, and use. implications for practice addressing the issues that we identified would involve improvements in several areas: • scope. our findings indicate that library users increasingly perceive the discovery interface as a portal to all of the library’s resources. meeting this need goes far beyond offering the ability to search multiple content sources from a single search box: it is just as important to help users make sense of the results of their search and to provide easy and convenient ways to access the resources that they have discovered. and whatever the scope of the library discovery layer is, it needs to be communicated to the user with maximum clarity. • functionality. users expect a robust and fault-tolerant search system with a rich suite of search-assistance features, such as index-based alternative spelling suggestions, result screens displaying keywords in context, and query auto-completion mechanisms. 
these features, many of which have become deeply embedded into user search processes elsewhere on the web, could prevent or alleviate a substantial number of issues related to problematic user queries (misspellings, typographical errors, imprecise queries, etc.), enable more efficient recovery from errors by guiding users to improved results, and facilitate discovery of foreign-language materials. equally important is the continued focus on relevance ranking algorithms, which ideally should move beyond simple keyword matching techniques toward incorporating social data as well as leveraging the semantics of the query itself and offering more intelligent and possibly more personalized results depending on the context of the search.
• metadata. the quality of the user experience in the discovery environments depends as much on the metadata as it does on the functionality of the discovery layer. thus it remains extremely important to ensure consistency, granularity, and uniformity of metadata, especially as libraries are increasingly faced with the problem of integrating heterogeneous pools of metadata into a single discovery tool.
conclusions and future directions
the analysis of the transaction log data and user feedback has helped us identify several common patterns of search failure, which in turn can reveal important assumptions and expectations that users bring to library discovery. these expectations pertain primarily to the system's functionality: in addition to simple, intuitive, and visually appealing interfaces and relevance-ranked results, users expect a sophisticated search system that would consistently produce relevant results even for incomplete, inaccurate, or erroneous queries. users also expect a more centralized, comprehensive, and inclusive search environment that would enable more in-depth discovery by offering article-level, chapter-level, and full-text searching. finally, the results of this study have underscored the continued need for a more flexible and adaptive system that would be easy to use for novices while offering advanced functionality and more control over the search process for the "power" users, a system that would provide targeted support for the different types of information behavior (known-item look-up, author searching, topical exploration, browsing) and would facilitate both general inquiry and very specialized searches (e.g., searches for music, cartographic and geospatial materials, digital collections of images, etc.). just like discovery itself, building discovery tools is a dynamic, complex, iterative process that requires intimate knowledge of ever-changing and evolving user needs and expectations. it is hoped that ongoing focus on user problems and frustrations in the new discovery environments can complement other assessment methods by identifying unmet user needs, thus helping create a more holistic and nuanced picture of users' search and discovery behaviors.
references
1. marshall breeding, "library resource discovery products: context, library perspectives, and vendor positions," library technology reports 50, no. 1 (2014): 5–58.
2. craig silverstein et al., "analysis of a very large web search engine query log," sigir forum 33, no. 1 (1999): 6–12; bernard j.
jansen, amanda spink, and tefko saracevic, "real life, real users, and real needs: a study and analysis of user queries on the web," information processing & management 36, no. 2 (2000): 207–27, http://dx.doi.org/10.1016/s0306-4573(99)00056-4; amanda spink, bernard j. jansen, and h. cenk ozmultu, "use of query reformulation and relevance feedback by excite users," internet research 10, no. 4 (2000): 317–28; amanda spink et al., "searching the web: the public and their queries," journal of the american society for information science & technology 52, no. 3 (2001): 226–34; bernard j. jansen and amanda spink, "an analysis of web searching by european alltheweb.com users," information processing & management 41, no. 2 (2005): 361–81, http://dx.doi.org/10.1016/s0306-4573(03)00067-0.
3. cory lown and bradley hemminger, "extracting user interaction information from the transaction logs of a faceted navigation opac," code4lib 7, june 26, 2009, http://journal.code4lib.org/articles/1633; eng pwey lau and dion ho-lian goh, "in search of query patterns: a case study of a university opac," information processing & management 42, no. 5 (2006): 1316–29, http://dx.doi.org/10.1016/j.ipm.2006.02.003; heather moulaison, "opac queries at a medium-sized academic library: a transaction log analysis," library resources & technical services 52, no. 4 (2008): 230–37.
4. william h. mischo et al., "user search activities within an academic library gateway: implications for web-scale discovery systems," in planning and implementing resource discovery tools in academic libraries, edited by mary pagliero popp and diane dallis, 153–73 (hershey, pa: information science reference, 2012); xi niu, tao zhang, and hsin-liang chen, "study of user search activities with two discovery tools at an academic library," international journal of human-computer interaction 30, no. 5 (2014): 422–33, http://dx.doi.org/10.1080/10447318.2013.873281.
5. eng pwey lau and dion ho-lian goh, "in search of query patterns"; niu, zhang, and chen, "study of user search activities with two discovery tools at an academic library."
6. lown and hemminger, "extracting user interaction information"; kristin antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology & libraries 25, no. 3 (2006): 128–39; niu, zhang, and chen, "study of user search activities with two discovery tools at an academic library."
7. xi niu and bradley hemminger, "analyzing the interaction patterns in a faceted search interface," journal of the association for information science & technology 66, no. 5 (2015): 1030–47, http://dx.doi.org/10.1002/asi.23227.
8. steven d. zink, "monitoring user search success through transaction log analysis: the wolfpac example," reference services review 19, no. 1 (1991): 49–56; deborah d. blecic et al., "using transaction log analysis to improve opac retrieval results," college & research libraries 59, no.
1 (1998): 39–50; holly yu and margo young, "the impact of web search engines on subject searching in opac," information technology & libraries 23, no. 4 (2004): 168–80; moulaison, "opac queries at a medium-sized academic library."
9. thomas peters, "when smart people fail," journal of academic librarianship 15, no. 5 (1989): 267–73; zink, "monitoring user search success through transaction log analysis"; rhonda h. hunter, "successes and failures of patrons searching the online catalog at a large academic library: a transaction log analysis," reference quarterly (spring 1991): 395–402.
10. karen antell and jie huang, "subject searching success: transaction logs, patron perceptions, and implications for library instruction," reference & user services quarterly 48, no. 1 (2008): 68–76; hunter, "successes and failures of patrons searching the online catalog at a large academic library"; peters, "when smart people fail."
11. peters, "when smart people fail"; moulaison, "opac queries at a medium-sized academic library"; blecic et al., "using transaction log analysis to improve opac retrieval results."
12. lynn silipigni connaway, debra wilcox johnson, and susan e. searing, "online catalogs from the users' perspective: the use of focus group interviews," college & research libraries 58, no. 5 (1997): 403–20, http://dx.doi.org/10.5860/crl.58.5.403.
13. karl v. fast and d. grant campbell, "'i still like google': university student perceptions of searching opacs and the web," asist proceedings 41 (2004): 138–46; eric novotny, "i don't think i click: a protocol analysis study of use of a library online catalog in the internet age," college & research libraries 65, no. 6 (2004): 525–37, http://dx.doi.org/10.5860/crl.65.6.525.
14. xi niu et al., "national study of information seeking behavior of academic researchers in the united states," journal of the american society for information science & technology 61, no. 5 (2010): 869–90, http://dx.doi.org/10.1002/asi.21307; lynn silipigni connaway, timothy j. dickey, and marie l. radford, "if it is too inconvenient i'm not going after it: convenience as a critical factor in information-seeking behaviors," library & information science research 33, no. 3 (2011): 179–90; karen calhoun, joanne cantrell, peggy gallagher, and janet hawk, online catalogs: what users and librarians want: an oclc report (dublin, oh: oclc online computer library center, 2009).
15. f. william chickering and sharon q. young, "evaluation and comparison of discovery tools: an update," information technology & libraries 33, no. 2 (2014): 5–30, http://dx.doi.org/10.6017/ital.v33i2.3471.
16. william denton and sarah j. coysh, "usability testing of vufind at an academic library," library hi tech 29, no. 2 (2011): 301–19, http://dx.doi.org/10.1108/07378831111138189; jennifer emanuel, "usability of the vufind next-generation online catalog," information technology & libraries 30, no.
1 (2011): 44–52; erin dorris cassidy et al., “student searching with ebsco discovery: a usability study,” journal of electronic resources librarianship 26, no. 1 (2014): 17–35, http://dx.doi.org/10.1080/1941126x.2014.877331. 17. sarah c. williams and anita k. foster, “promise fulfilled? an ebsco discovery service usability study,” journal of web librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (2012): 186–207; andrew d. asher, lynda m. duke, and suzanne wilson, “paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources,” college & research libraries 74, no. 5 (2013): 464–88. 18. jody condit fagan et al., “usability test results for a discovery tool in an academic library,” information technology & libraries 31, no. 1 (2012): 83–112; megan johnson, “usability test results for encore in an academic library,” information technology & libraries 32, no. 3 (2013): 59–85. 19. elizabeth (bess) sadler, “project blacklight: a next generation library catalog at a first generation university,” library hi tech 27, no. 1 (2009): 57–67, http://dx.doi.org/10.1108/07378830910942919; bess sadler, “stanford’s searchworks: unified discovery for collections?” in more library mashups: exploring new ways to deliver library data, edited by nicole c. engard, 247–60 (london: facet, 2015). 20. andrew d. asher, lynda m. duke, and suzanne wilson, “paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources,” college & research libraries 74, no. 5 (2013): 464–88; kelly meadow and james meadow, “search query quality and web-scale discovery: a qualitative and quantitative analysis,” college & undergraduate libraries 19, no. 2–4 (2012): 163–75, http://dx.doi.org/10.1080/10691316.2012.693434. 21. sarah c. williams and anita k. foster, “promise fulfilled? an ebsco discovery service usability study,” journal of web librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; kathleen bauer and alice peterson-hart, “does faceted display in a library catalog increase use of subject headings?” library hi tech 30, no. 2 (2012): 347–58, http://dx.doi.org/10.1108/07378831211240003. a computer output microfilm serials list for patron use william saffady: wayne state university, detroit, michigan. library literature generally assumes that com is better suited to staff rather than patron use applications. this paper describes a com serials holdings list intended for patron use. the application and conversion from paper to com are described.
emphasis is placed on the selection of an appropriate microformat and easily operable viewing equipment as conditions of success for patron use. as a marriage of dynamic information-handling technologies, computer output microfilm (com) is a systems tool of potentially great significance to librarians. several libraries have reported successful com applications initiated within the last few years. the two most recent - fischer's description of four com-generated reports used by the los angeles public libraries and bolef's account of a com book catalog at the washington university school of medicine library - stress the time, space, and cost savings so frequently reported in analyses of the advantages of com.1, 2 this article describes the substitution of microfilm for paper as the computer output medium in one of the most common library automation applications, a serials holdings list intended for use by library patrons. it is interesting that, at a time when librarians are insisting on the importance of patron acceptance of technological innovation, the recent literature reports com applications intended solely for staff use. bolef, in fact, lists staff rather than patron use among the characteristics of potentially successful library com applications. the report that follows suggests, however, that careful attention to the selection of an appropriate microformat and viewing equipment can successfully extend the effectiveness of com to include patron-use library automation applications. the application the union list of serials in the wayne state university libraries is a computer-generated alphabetical listing, by title, of serials held by the wayne state university library system and some biomedical libraries in the detroit metropolitan area. sullivan describes it as "informative in purpose and conventional in method."3 as with many similar applications, serials holdings were automated in order to unify and disseminate hitherto separate, local records. the list is primarily a location device, giving for each title the location within the library system and information on the holdings at each location. it is updated monthly, the july 1974 issue totalling 1,431 pages. in paper form, twenty copies produced on an ibm 1403 line printer using four-ply carbon-interleaved forms were distributed for use throughout the library system. the list shares some of the characteristics that have marked other successful com applications.4 it consists of many pages and has a sizeable distribution. quick retrieval of information is essential. use is for reference rather than reading. there is no need to annotate the list and no need for paper copies, although the latter requirement would not rule out the use of com for this particular application. patrons simply consult the list to determine whether the library's holdings include a particular serial and then proceed to the indicated location. it is interesting that serials holdings lists, long recognized as an excellent introductory library automation application, should also prove an excellent first application for com. complexities of format and viewing equipment selection aside, the conversion of output from paper to microfilm presented no problems.
since the wayne state university computing and data processing center does not have com capability, the university libraries, after careful consideration of several vendors, contracted with the mark larwood company, a microfilm service bureau equipped with a gould beta com 700l recorder. the beta com is a crt-type com recorder with an uppercase and lowercase character set, forms-overlay capability, proportional spacing, underlining, superscripts, subscripts, italics, and a universal camera capable of producing 16, 35, 70, and 105mm microformats at several reduction ratios. a decisive factor in the selection of this particular vendor was the beta com's dedicated pdp-8/l minicomputer that enables the com recorder to accept an ibm 1403 print tape, thereby greatly simplifying conversion and eliminating the expense of reprogramming. microformat selection as ballou notes, discussions of com have tended to concentrate more on the computer than on micrographics, but for a patron-use com application the selection of an appropriate microformat is of the greatest importance.5 however, there has been an unfortunate emphasis placed, both in the literature of micrographics and by vendors, on microfiche, the format now dominating the industry, especially in com applications. such emphasis ignores the fundamental rule of systems design, that form follows function. each of the microformats has strengths and weaknesses that must be analyzed with reference to the application at hand. for a patron-use, com-generated serials holdings list, ease of use with a minimum of patron film handling is a paramount consideration. microfiche is clearly unsuitable for a list of over 1,400 pages. even at 42x reduction, the patron would be forced to choose from among seven fiches, each containing 208 pages. the difficulties of handling and loading, combined with library staff involvement in a program of user instruction, make fiche an unattractive choice. instead, the relatively large size of the holdings list suggests that one of the 16mm roll formats offers the best prospects of containing present size and future growth within a single microform. the disadvantages of the conventional 16mm open spool - the necessity of threading film onto a take-up reel before viewing - can be minimized by using a magazine-type film housing. the popular cartridge format eliminates much film handling, but cartridge readers are very expensive, necessitating a considerable investment where many readers are required. even with the cartridge, it is still possible for a patron to unwind the film from the take-up reel, necessitating rethreading before viewing. fortunately, microfilm cassettes overcome this difficulty. unlike the cartridge format, 16mm cassettes feature self-contained supply and take-up reels. the film cannot be completely unwound from the take-up reel and the cassette can be removed from the viewer at any time without rewinding. patron film handling is virtually eliminated. the cassette format has proven very popular with british libraries, where it has been used with satisfactory results in com applications.6 viewing equipment success in format choice is contingent on the selection of appropriate viewing equipment. as larkworthy and brown point out, the best viewer for patron-use com applications is one that can easily be operated by the least mechanically inclined person.7 fortunately, cassette viewers, while limited in number, tend to be very easy to operate.
the viewer chosen for use with the union list of serials, the memorex 1644 autoviewer, features a simple control panel, fixed 24x reduction, easily operated focus and scan knobs, motorized film drive for high-speed searching, and a manual hand control for more precise image positioning. the screen measures eleven by fourteen inches in size, with sufficient brightness for comfortable ambient light viewing. other cassette viewers examined, however satisfactory they might be in other respects, failed to meet the peculiar requirements of this particular application. discussion since its introduction in april 1974, the com-generated union list of serials in the wayne state university libraries has enjoyed a satisfactory reception. patrons have learned to consult the com list with little difficulty. the selection of an appropriate microformat and easily operated viewing equipment have kept staff involvement in patron instruction to a minimum. there appears to be no reason for limiting potential library com applications to those used primarily or solely by staff members. given the severity of the current paper shortage, the consequent rise in paper prices, and serious questions about the availability of paper at any price, com merits serious consideration as an alternative output medium for the widest range of library automation applications. references 1. mary l. fischer, "the use of com at the los angeles public library," the journal of micrographics 6:205-10 (may 1973). 2. doris bolef, "computer-output microfilm," special libraries 65:169-75 (april 1974). 3. howard a. sullivan, "metropolitan detroit's network: wayne state university library's serials automation project," medical library association bulletin 56:269-71 (july 1968). 4. see, for example, auerbach on computer output microfilm (princeton: auerbach publishers, 1972), p.1-10. 5. hubbard w. ballou, "microform technology," in carlos cuadra, ed., annual review of information science and technology, v.8 (washington, d.c.: american society for information science, 1973), p.139. 6. d. r. g. buckle and thomas french, "the application of microform to manual and machine readable catalogues," program 6:187-203 (july 1972). 7. graham larkworthy and cyril brown, "library catalogs on microfilm," library association record 73:231-32 (dec. 1971). lita president's message: moving forward with lita bohyun kim information technology and libraries | june 2019 bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, ri. i am happy to share some updates on what i covered in my previous column. first of all, i am excited to report that the merger planning of lita, alcts, and llama is back on track. the merger planning had been temporarily put on hold due to the target date for the merger being delayed from fall 2019 to fall 2020, as announced earlier this year. after taking some time after the 2019 ala midwinter meeting, the current leadership of lita, alcts, and llama met, reviewed the work that we have accomplished so far, and decided that the remaining work will now go to the capable hands of the president-elects of lita, alcts, and llama, who were elected this april.
during their term, this new cohort of president-elects will build on the work done by the cross-divisional working groups, in order to present the three-division merger for the membership vote in spring 2020 with more details. another piece of good news is that lita, alcts, and llama will begin experimenting with joint programming in order to kickstart our collaboration while the merger planning continues. the lita board decided to hold the next lita forum in fall 2020. alcts is also planning for its second virtual alcts exchange to take place in spring 2020. lita, alcts, and llama will work together on both program committees of the lita forum and the alcts exchange to provide a wider and more interesting range of programs at both conferences. if the membership vote result is in favor of the three-division merger, then the new division will be officially formed in fall 2020, and the planned 2020 lita forum may become the first conference of the new division. shortly after the 2019 ala midwinter meeting, the lita board decided to commit funds to create and disseminate an online allyship training to address the issues of aggressive behavior, racism, and harassment reported at the midwinter meeting.1 since then, the lita staff and the lita board of directors have been closely working with the ala office and several other divisions, alcts, alsc, asgcla, pla, rusa, and united, reviewing options. it is likely that this training will follow the "train-the-trainer" model, in order to generate and expand the pool of allyship trainers who will develop and run lita's online allyship training for lita members. our goal is to expand our collective capacity to strengthen active and effective allyship, recognize and undo oppressive behaviors and systems, and promote the practice of cultural humility, which requires ongoing efforts, not just a one-time event. we hope to be able to announce more details soon once the final plan is determined. i would also like to highlight the lita award winners who will be celebrated at the 2019 ala annual conference in washington, d.c. and to thank the members of the award committees for their hard work.2 the 2019 lita/ex libris student writing award will go to sharon han, a master of science in library and information science candidate at the university of illinois school of information sciences, for her paper, "weathering the twitter storm: early uses of social media as a disaster response tool for public libraries during hurricane sandy," which is included in this issue. charles mcclure and john price wilkin were selected as the 2019 winners of the lita/oclc frederick g. kilgour award for research in library and information technology and the hugh c. atkinson memorial award sponsored by acrl, alcts, llama, and lita, respectively. charles mcclure is the francis eppes professor of information studies in the school of information and the director of the information use management and policy institute at florida state university. john price wilkin is the juanita j. and robert e. simpson dean of libraries at the university of illinois at urbana-champaign.
the north carolina state university libraries will receive the 2019 lita/library hi tech award for outstanding communication in library and information technology, which recognizes outstanding individuals or institutions for their long-term contributions to the education of the library and information science technology field and is sponsored by lita and emerald publishing. other not-to-be-missed lita highlights at the 2019 ala annual conference in washington d.c. include the lita top tech trends program widely known for its insightful overview of emerging technologies, the lita president’s program with meredith broussard, a data journalist and the author of artificial unintelligence: how computers misunderstand the world3 as the speaker, and the lita happy hour, a lively social gathering of all library technologists and technologyenthusiasts. the lita avram camp is also preparing for another terrific all-day discussion and activities this year for women and non-binary library technologists to examine the shared challenges, to network, and to support one another. the lita imagineering interest group has put together another fantastic program, “agency, consent, and power in science fiction and fantasy,” featuring four sci-fi authors: sarah gailey, malka older, john scalzi, and martha wells. the lita membership committee is also preparing a virtual lita kickoff orientation for those who are newly attending the ala annual conference. in this last column that i write as the lita president, i would like to express my sincere gratitude to the dedicated lita board of directors, the always fantastic lita staff, and many lita leaders and members whose creativity, passion, and energy continue to drive lita forward. serving as the chief elected officer of one of the leading membership association in library technology has been a true honor to me, and having such a great team of people to work with has been of tremendous help to me in tackling many dauting tasks. it is often said that all lita presidents face unique challenges during their terms. i can say that this has been certainly true during my term. working together with the alcts and the llama leadership on the three-division merger was a valuable experience and a privilege. while we could not move things as quickly as we hoped, we have built a great foundation for the next phase of the planning and learned many things together along the way. last but not least, i would like to thank everyone who stood for the election and congratulate all newly-elected lita officers: evviva weinraub for the president-elect, hong ma and galen charlton for board of directors at large, and jodie gambill for the lita councilor. i am confident that led by the incoming lita president, emily morton-owens, the capable and dedicated lita leadership will continue to accomplish many great things with energetic and forward-thinking lita members in coming years. the future of lita is brighter with these new lita leaders. good luck and thank you for your service! lita president’s message: moving forward with lita | kim 4 https://doi.org/10.6017/ital.v38i2.11093 endnotes 1 “lita’s statement in response to incidents at ala midwinter 2019,” lita blog, february 4, 2019, https://litablog.org/2019/02/litas-statement-in-response-to-incidents-at-ala-midwinter2019/. 2 “lita awards & scholarships,” library information technology association (lita), http://www.ala.org/lita/awards. 
3 meredith broussard, artificial unintelligence: how computers misunderstand the world (cambridge, massachusetts: the mit press, 2018). from dreamweaver to drupal: a university library website case study jesi buell and mark sandford information technology and libraries | june 2018 118 jesi buell (jbuell@colgate.edu) is instruction and design and web librarian and mark sandford (msandford@colgate.edu) is systems librarian at colgate university, hamilton, new york. abstract in 2016, colgate university libraries began converting their static html website to the drupal platform. this article outlines the process librarians used to complete this project using only in-house resources and minimal funding. for libraries and similar institutions considering the move to a content management system, this case study can provide a starting point and highlight important issues. introduction the literature available on website design and usability is predominantly focused on business or marketing websites. what separates library websites from other informational or commercial websites is the complexity of the information architecture—they contain both intricate informational and transactional functions. website managers need to maintain congruity between many interrelated but disparate tools in a singular interface and navigational system. libraries are also often challenged with finding individuals who possess the appropriate skills to build and maintain a secure, accessible, attractive, and easy-to-use website. in contrast to libraries, commercial companies employ a team of designers, developers, content managers, and specialists to triage internal and external issues. they can also spend months or years perfecting a website and, of course, all these factors have great costs associated with them. given that many commercial websites need a team of highly skilled workers with copious time and funding, how can librarians be expected to give their patrons similar experiences to sites like google? this case study will outline how a small team of librarians completely overhauled their fragmented, dreamweaver-based website to a more secure, organized, and appealing open-source platform with drupal within a tight timeline and very few financial consequences. it includes a timeline of major milestones in the appendix. goals and objectives the first necessity for restructuring the colgate university libraries’ website was building a team that had the skills and knowledge necessary to perform this task. the website overhaul was spearheaded by jesi buell, instructional design and web librarian, and mark sandford, systems librarian. buell has a user experience (ux) design and editing background while sandford has systems, cataloging, and server experience. they were advised by web development committee (wdc) members cindy li, associate director of library technology and digital initiatives, and debbie krahmer, digital learning and media librarian. together, the group understood trends in digital librarianship, the needs of the libraries’ patrons, as well as website and catalog design and mailto:jbuell@colgate.edu mailto:msandford@colgate.edu from dreamweaver to drupal | buell and sandford 119 https://doi.org/10.6017/ital.v37i2.10113 maintenance. the first thing the wdc did was outline its goals and objectives, and this documented weaknesses the group wanted to address with a new website. 
the wdc identified four main improvements colgate libraries needed to make to the website: improve design colgate libraries’ old website suffered from varied design and language use across pages and various tools (libguides, catalog, etc.). this led to an inconsistent and often frustrating user experience and detracted from the user’s sense of a single, cohesive website. the wdc also wanted to improve and update the aesthetic quality of the website. while many of these changes could have been made with an overhaul of the existing site, the wdc would have still needed to address the underlying cause. responsibility for content was decentralized, and content creation relied too heavily on technical expertise with dreamweaver. further, the ad hoc nature of the content—the product of years of “fitting in” content without a holistic approach—meant that changes to visual style could not be accomplished by changing a single css file. there were far too many exceptions to make changes simply. improve usability the wdc needed to make sure all the webpages were responsive and accessible. a restructuring of layout and information architecture (ia) was also necessary to improve findability of resources. on the old site, some content was hidden behind several layers of links. with no platform to ensure or enforce accessibility standards, website managers had to trust that all content creators were conscious of best practices or, failing that, pages had to be re-edited to improve accessibility. improve content creation and governance a common source of library staff frustration was the authoring experience using dreamweaver. there was no way to track when a webpage was changed or see who had made those changes. situations occurred where content was deleted or changed in error, and no one else knew until a patron discovered a mistake. staff could also mistakenly push out outdated versions of pages. it was not an ideal situation, and it was impossible for an individual (the web librarian) to monitor hundreds of pieces of content for daily changes to check for accuracy. the only other option would be narrow access to only those on the wdc, but that would mean everyone had to wait for the web librarian to push content live, which would also be frustrating. beyond the security and workflow issues, many of the library staff felt uncomfortable adding or editing content because dreamweaver requires some coding knowledge (html, css, javascript). therefore, the group wanted to install a content management system (cms) that provided a wysiwyg (what you see is what you get) content editor so that no coding knowledge would be needed. unite disparate sites (website, blog, and database list) under one updated url on a single secure server colgate libraries’ website functionality suffered from what marshall breeding describes as “a fragmented user experience.”1 the libraries website’s main address was http://exlibris.colgate.edu. however, different tools lived under other urls—one for a blog, another for the database list, yet another still for the mobile site librarians had to maintain information technology and libraries | june 2018 120 because the main website was not responsive. additionally, some portions of the website had been set up on other servers because of various limitations in the windows.net environment and inhouse skills. this was further complicated by the fact that most specialized interactivity or visual components had to be created from scratch by existing staff. 
the libraries’ blog was on an externally hosted wordpress site, and the database a–z list was on a custom-coded php page. a unified domain would make usage statistics easier to track and analyze. additionally, it would eliminate the need for multiple credentials for the various external sites. custom code, be it in php, .net, or any other language, also needs to be regularly updated as new security vulnerabilities arise.2 moving to a well-maintained cms would help alleviate that burden. by establishing goals and objectives, the wdc had identified that it wanted a cms to help with better governance, easier maintenance, and ways to disperse web maintenance responsibilities across library faculty. it was important to choose a cms platform that offered a wysiwyg editor so that content authoring did not require coding knowledge. additionally, the group wanted to update the site’s aesthetic and navigational designs. the wdc also decided that this was the optimal time to introduce a discovery layer (since all these changes would be one entirely new experience for colgate users) rather than smaller, continual changes that would require users to keep readjusting how they used the website. the backend complexity of updating both the website platform and implementing a discovery layer required abundant and detailed planning. however, while there was a lot of overlap in the preparatory work for implementing the discovery layer as well the cms, this article will focus primarily on the cms. planning after the wdc had detailed goals and objectives, and the proposal to update the libraries’ website platform was accepted by library faculty, the group had to take several steps to plan the implementation. the first steps in planning dealt with analysis. content analysis the web librarian conducted a content analysis of the existing website. using microsoft excel to document the pages and the omni group’s omnigraffle to organize the spreadsheet into a diagram, she cataloged each page and the navigation that connected that page to other pages. this can be extremely laborious but was necessary because some content was inherited from past employees over the course of a decade, and no one knew exactly what content was live on the website. this visual representation allowed for content creators to see redundancy in both content and navigation. it also made it easy for them to identify old content and combine or reorder pages. needs analysis the wdc wanted to make sure it considered more than the content creators’ needs. this group surveyed colgate faculty, staff, and students to learn what they would like to see improved or changed. the web librarian conducted several ux studies with both students and faculty, and this elucidated several key areas in need of improvement. from dreamweaver to drupal | buell and sandford 121 https://doi.org/10.6017/ital.v37i2.10113 peer analysis peer analysis involves thoroughly investigating peer institution’s websites to analyze how they organize both their content and their site navigation. it also gives insight into what other services and tools they provide. it is important to choose institutions similar in size and academic focus. colgate university is a small, liberal arts institution that only serves an undergraduate population, so the libraries would not seek to emulate a large university that serves graduate populations or distance learners. 
peer analysis is an excellent opportunity to see where a website is not measuring up to other websites as well as to borrow ideas from peers to customize for your specific patrons. evaluating platforms now that the group knew what the libraries had and what the libraries wanted from our web presence, it was time to evaluate the available options. this involved evaluating cms products and discovery layer platforms. the wdc researched different cmss and listed positives and negatives. ultimately, the group determined that drupal best satisfied the majority of colgate’s identified needs. a separate committee was formed to evaluate the major discovery-layer services with the understanding that any option could be integrated into the main website as a search box. budgeting as free, open-source software, drupal does not require a subscription or licensing fee. campus it provided a virtual server for the website at no cost to the libraries. budgeting was organized by the associate director of library technology and digital initiatives and the university librarian. money was set aside in case a consultant or developer was needed, but the web and systems librarians were able to execute the conversion from dreamweaver to drupal without external support. if future development support is needed for specific projects, it can be budgeted for and purchased as needed. the last step was creating a timeline defining achievable goals, ownership (who oversees completing the goal and who needs to be involved with the work), and date of completion. timeline the timeline was outlined as follows: october 2015–january 2016 halfway through the fall 2015 semester, the wdc began to create a proposal for changes to be made to the website. this proposal would be submitted to the university librarian for consideration by december 1. in the meantime, the web librarian completed a content inventory, peer analysis, and ux studies. she also gathered faculty and staff feedback on the current website through suggestion-box commentary, one-on-one interviews, online questionnaires, and anecdotal stories. by the deadline for the proposal, this additional information was condensed and presented to the university librarian. after incorporating suggested changes made by the university librarian, the wdc was able to present both the proposal and results from various studies to the library faculty on january 4, information technology and libraries | june 2018 122 2016. at the end of the meeting, the faculty voted to move forward and adopt the proposed changes. february 2016 february was spent meeting with stakeholders, both internal and external to the libraries, to gather concerns, necessary content, and ideas for improvements. the wdc members shared the responsibility of running these meetings. all members from the following departments were interviewed: research and instruction, borrowing services, acquisitions, library administration, cataloging, government documents, information literacy, special collections and university archives, and the science library. together, the wdc also met with members from it and communications. it was vital that these sessions identify several components. first, what content was important to retain on the new site, and why? the act of justification made stakeholders evaluate whether the information was necessary and useful to the libraries’ users. the wdc also asked the stakeholders to identify changes they wanted to see made to the website. 
the answers ranged from minor aesthetic tweaks to major navigational overhauls. last, it was important to understand how specific changes might impact workflows and functionality for tools outside colgate libraries’ own website. for example, the wdc had to update information with the communications department so that the libraries’ website would be findable on the university’s app. all the answers the wdc received were compiled into a report, and the web librarian used this information to inform design decisions moving forward. march 2016 while the associate director of library technology and digital initiatives coordinated demos from discovery layer vendors, the wdc also met to choose the final template from three options designed by the web librarian. the web and systems librarians also met to create a list of developers in case assistance was needed in the development of the drupal site. the wdc team researched potential developers and inquired about their pricing. the web librarian began to create wireframe templates of the different types of pages and page components (homepage, hours blocks, blogs, forms, etc.). she also began transferring existing content from the old website to the new website. this process, in addition to the development of new content identified by stakeholders, was to be completed by mid-summer. meanwhile, the systems librarian began to consolidate the external sites under drupal to the extent possible. while libguides lives externally to drupal and maintains its own url that the libraries’ website links out to, he was able to bring the database a–z list, blog, and analytics into the drupal platform. this entailed setting up new content types in drupal to accommodate various functional requirements for the a–z list and assist in creating pages to search for and display database information. from dreamweaver to drupal | buell and sandford 123 https://doi.org/10.6017/ital.v37i2.10113 april–may 2016 drupal allows for various models of permissions and authentication. by default, accounts can be created within the drupal system and roles and permissions assigned to individuals as needed. the ldap (lightweight directory access protocol) module allowed us to tie authentication to university accounts and includes the ability to tie drupal permissions to active directory roles and groups. connecting drupal to the university ldap server required the assistance of it infrastructure staff but was straightforward. it staff provided the connection information for the drupal module’s configuration and created a resource account for the drupal module to use to connect to the ldap service. as currently implemented, the ldap module simply verifies credentials and, if a local drupal account does not exist, creates one for the user. permissions for staff are added to accounts after account creation as needed as a part of the onboarding process. permissions in drupal can be highly granular. since one of the goals of the migration to drupal was to simplify maintenance of the website, the wdc decided to begin with a relatively simple, permissive approach. currently, all library staff can edit any page. because of drupal’s ability to track and revert changes easily, undoing a problematic edit is a simple procedure, and because all changes are tied to an individual login, problems can be addressed through training as needed. the wdc discussed a more fragmented approach that tied editing privileges to specific parts of the site but decided against it. 
the wdc team felt it was better to begin with the presumption of trustworthiness, expecting staff to only make changes to pages they were personally responsible for. additionally, trying to divide the site into logical pieces, then accounting for the inevitable exceptions, would be complicated and time-consuming. the wdc reserved the right to begin restricting permissions in the future, but thus far this has proven unnecessary. july–august 2016 as the libraries ramped up to the official launch, it was crucial to educate the library faculty and staff so they could become independent back-end content creators. both the web and systems librarians held multiple training sessions for the libraries employees so that everyone felt comfortable both editing and generating content. the associate director of library technology and digital initiatives drafted a campus-wide email announcing the new website and discovery layer at this point. it was sent out a month in advance of the official launch. the new website launched in two parts. the soft launch occurred on august 1, 2016. the web and systems librarians set up a link to the new website on the old site so that users could choose between getting acclimated to the new website or using the tool they were used to in the frantic weeks leading up to the beginning of the semester. august 15, 2016, was the official launch. at this point, the http://exlibris.colgate.edu dreamweaver-based website was retired, and it redirected all traffic heading to the old url to the new drupal-based website at http://cul.colgate.edu. because drupal’s url structure and information architecture differed from the old website, the wdc decided that mapping every page on the old site to the new one would be too time consuming. while it was acknowledged that this may cause some disruption (as it would break existing links), it seemed necessary for keeping the project moving forward. library staff updated all external links possible. the google search operator “inurl” allowed us to identify other sites information technology and libraries | june 2018 124 outside the libraries’ control that pointed to the old website. the wdc reached out to the maintainers of those few sites as appropriate. the biggest risk the libraries took by not redirecting all urls to the correct content was the potential to disrupt faculty who had bookmarked content or had direct urls in course materials. however, the wdc team received very few complaints about the new site, and most users agreed that the improvements to the site far outweighed any temporary inconveniences caused by it. if nothing else, the simplified architecture made finding content easier, so direct links and bookmarks became far less important than they once were. implementation and future steps by strictly following the timeline and working closely together, the web librarian and systems librarian were able to launch colgate libraries’ new website in time for the 2016 fall semester. the wdc team was able to pull off this feat within eight months without spending any extra money. the timeline above only gives a high-level view of the steps the wdc took to accomplish this task. the librarians who worked on this project cannot overemphasize the complexity of this endeavor, especially with a small team. however, a website conversion is feasible with organization, time, and with the online support the drupal community provides (especially the community of libraries on the drupal platform). 
it is also critical to have in-house personnel that have technical (coding and server-side) knowledge, project management knowledge, and information architecture and design knowledge. the response from incoming and returning students and faculty to the updated look and improved usability of the libraries’ digital content was overwhelmingly positive. following best design practices, in january 2017 more ux testing was conducted with student and teaching faculty participants to gauge their reactions to the new website. 3 users overwhelmingly found the new website to be both more aesthetically pleasing and usable than the old website. on the back end, the libraries’ content is now more secure, responsive, and accessible because the libraries are using a cms. library faculty and staff have been able to add or remove content that they are responsible for, but the website can still maintain a consistent look and feel across all pages. governance has been improved exponentially as library staff have been able to easily and quickly contribute to the website’s content without administrative delays. as the team moves forward, the wdc plans to investigate different advanced drupal tools, implementing an intranet, and better leveraging google analytics. as with all library endeavors, improvement requires continued effort and attention. from dreamweaver to drupal | buell and sandford 125 https://doi.org/10.6017/ital.v37i2.10113 appendix: detailed timeline 1. october 2015 a. began discussion with wdc to create proposal for website changes (web librarian) 2. november–december 2015 a. complete content inventory (web librarian) b. complete peer analysis (web librarian) c. complete ux studies (web librarian) d. gather faculty and staff feedback on current website (web librarian) 3. december 1, 2015 a. submit proposal to change from dreamweaver to drupal to university librarian for consideration and approval (web librarian) 4. january 4, 2016 a. submit revised proposal to library faculty for consideration and approval (web librarian) 5. january 2016 a. set up test drupal site (systems librarian) 6. february 2016 a. complete meetings with departments to gather feedback on concerns, content, and ideas for improvements (library department meetings were split among wdc members) 7. march 2016 a. demo primo, ex libris, and summon for library faculty and staff consideration (associate director of library technology and digital initiatives) b. from three options, choose template for our website (web librarian—approval by the wdc and then the library faculty) c. create list of developers in case we need assistance (web librarian and systems librarian) d. create wireframe templates for homepage (web librarian) e. begin transferring content from old website to new website and create new content with other stakeholders—to be completed by mid-summer (web librarian) f. begin consolidating multifarious external sites under drupal as much as possible (systems librarian) 8. april 2016 a. get drupal working with the ldap (systems librarian) b. agree on permissions and roles for back-end users (systems librarian—with approval by wdc) c. agree on discovery layer choice (associate director of library technology and digital initiatives) d. meet with outside stakeholders—communications, it, administration 9. may 2016 a. integrate discovery layer search (systems librarian) 10. july 2016 a. 
provide training for library faculty and staff as back-end content creators (web librarian) b. prepare campus-wide email to announce new website and discovery layer with our new url (associate director of library technology and digital initiatives and web librarian) 11. august 1, 2016 a. set up a link on our old site (http://exlibris.colgate.edu) so for two weeks users could choose between using the old interface or start getting acclimated to the new website before the fall semester started (systems librarian) 12. august 15, 2016 a. official launch: we retire our http://exlibris.colgate.edu dreamweaver-based website and redirect all traffic headed to our old url to our new drupal-based website at http://cul.colgate.edu (systems librarian) 13. september–october 2016 a. update and get approval from library faculty for a new web style guide and governance guide (web librarian) 14. january 2017 a. conduct ux studies of students and faculty to see how people are using both the new website and the new discovery layer; gather feedback and ideas for improvement (web librarian) bibliography breeding, marshall. "smarter libraries through technology: strategies for creating a unified web presence." smart libraries newsletter 36, no. 11 (november 2016): 1–2. general onefile (accessed august 3, 2017). http://go.galegroup.com/ps/i.do?p=itof&sw=w&v=2.1&it=r&id=gale%7ca471553487. naudi, tamara. "nearly all websites have serious security vulnerabilities--new research shows." database and network journal 45, no. 4 (2015): 25. general onefile (accessed august 3, 2017). http://bi.galegroup.com/essentials/article/gale%7ca427422281. raward, roslyn. "academic library website design principles: development of a checklist." australian academic & research libraries 32, no. 2 (2001): 123–36. http://dx.doi.org/10.1080/00048623.2001.10755151. 1 marshall breeding, "smarter libraries through technology: strategies for creating a unified web presence," smart libraries newsletter 36, no. 11 (november 2016): 1–2. general onefile. 2 tamara naudi, "nearly all websites have serious security vulnerabilities--new research shows," database and network journal 45, no. 4 (2015): 25. general onefile. 3 roslyn raward, "academic library website design principles: development of a checklist," australian academic & research libraries 32, no. 2 (2001): 123–36. http://dx.doi.org/10.1080/00048623.2001.10755151. emergency remote library instruction and tech tools: a matter of equity during a pandemic kathia ibacache, amanda rybin koob, and eric vance information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12751 abstract during spring 2020, emergency remote teaching became the norm for hundreds of higher education institutions in the united states due to the covid-19 pandemic. librarians were suddenly tasked with moving in-person services and resources online.
for librarians with instruction responsibilities, this online mandate meant deciding between synchronous and asynchronous sessions, learning new technologies and tools for active learning, and vetting these same tools for security issues and ada compliance. in an effort to understand our shared and unique experiences with emergency remote teaching, the authors surveyed 202 academic instruction librarians in order to answer the following questions: (1) what technology tools are academic librarians using to deliver content and engage student participation in emergency remote library sessions during covid-19? (2) what do instruction librarians perceive as the strengths and weaknesses of these tools? (3) what digital literacy gaps are instruction librarians identifying right now that may prevent access to equitable information literacy instruction online? this study will deliver and discuss findings from the survey as well as make recommendations toward best practices for utilizing technology tools and assessing them for equity and student engagement. kathia salomé ibacache oliva (kathia.ibacache@colorado.edu) is romance languages librarian, assistant professor, university of colorado boulder. amanda rybin koob (amanda.rybinkoob@colorado.edu) is literature and humanities librarian, assistant professor, university of colorado boulder. eric vance (eric.vance@colorado.edu) is associate professor of applied mathematics and director of lisa (laboratory for interdisciplinary statistical analysis), university of colorado boulder. © 2021. introduction the worldwide covid-19 pandemic has had important repercussions for university libraries. all library services, including information literacy instruction, moved online in a matter of days, creating a wave of needs that required immediate response. with the closure of university campuses all around the world, academic libraries encountered an unprecedented test of their adaptation abilities. although online education has been around for many years, widespread use of the remote classroom may have been unprecedented for many librarians until the spring of 2020. this type of online learning, as charles hodges et al. explain, is significantly different from the otherwise established domains of online and distance learning because it is unplanned, rushed, and happening in the midst of a crisis.1 as they note, "emergency remote teaching has emerged as a common alternative term" to differentiate from standard online education prior to the pandemic.2 the authors recognize the different and sometimes overlapping personal and professional impacts covid-19 has had on our communities, both inside and outside of the classroom. rather than broadly assessing emergency remote teaching, the authors are looking at what jody greene, referring to teaching during the covid-19 pandemic, calls "specific technological tools and flexible teaching practices."3 this paper is concerned with issues of equity, student engagement, and technology tools that could be used to facilitate library instruction during emergency remote teaching. the authors seek to answer the following questions: (1) what technology tools are academic librarians using to deliver content and engage student participation in emergency remote library sessions during covid-19?
(2) what do instruction librarians perceive as the strengths and weaknesses of these tools? (3) what digital literacy gaps are instruction librarians identifying since covid-19 that may prevent equitable access to information literacy instruction online? literature review technology tools facilitated a quick transition online in march 2020, enabling librarians to interact with students despite the move to emergency remote teaching. however, this fast transition and its associated learning curve accentuated issues of student engagement including equity and accessibility. there is a dearth of existing literature on teaching and learning online during times of great societal stress, with some notable exceptions, including a recent piece about university closures and moving to online classes during student-led protests in south africa from 2015 to 2017.4 as such, this literature review considers some of the barriers that contribute to inequitable information access in online learning, as well as digital literacy definitions. here we consider both ongoing challenges to equitable online access and specific challenges for the current covid-19 pandemic. barriers to equitable student access in online learning equity in academic libraries is widely represented in the scholarship through topics including disability, race, class, and salary gaps among librarians.5 however, as our ongoing pandemic illustrates, there is a strong need for more literature regarding students’ equitable online access to information during times that call for emergency remote teaching. the issue of equity may be considered in terms of external and internal challenges, which affect students differently. external barriers include low bandwidth and lack of devices. some researchers advise letting students communicate through chat instead of a webcam, since webcam use increases bandwidth consumption.6 understandably, colleges may need to provide computers and wireless hotspots to students who lack access to computers or to the internet.7 moreover, a 2018 pew fact tank publication noted that 15 percent of homes with school-age students (6–17 years old) do not have access to high-speed connection, and this digital divide particularly affects teens and their ability to be involved with homework.8 although this data focused on school-age students, these issues probably affected some college students during the pandemic. students may also be experiencing internal barriers such as language differences, lack of self regulation, lack of previous educational experience, and stress, all of which may affect academic performance. for example, one study found that language barriers challenged international students during remote web conferences with librarians.9 another study of international students showed that their academic success relied significantly on a variety of internal characteristics, such as self-regulation.10 additionally, a survey of students taking online courses showed that previous educational experience, including with online learning or within a given discipline, supported completion of those courses.11 moreover, stress is an internal barrier for students that may have external causes and is likely affecting librarians, faculty, and students during covid -19. 
scholars note that stress changes peoples’ use of technology, and this stress manifests differently depending on individual identity markers, such as gender and experience.12 information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 3 technology tools, digital literacy, student engagement in addition to barriers to equitable access, the digital age that has characterized the late 20th and 21st centuries, has prompted the advent of multiple technology tools that may be used in online library sessions, including emergency remote library instruction. these tools are meant to facilitate instruction and engagement, but they require students and instructors to be comfortable with technology. in the case of higher education, this level of comfort involves digital literacy competencies that surpass what is known as traditional textual literacy. the american library association’s (ala) digital literacy task force defines digital literacy as “the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills.”13 during the pandemic, the technical and cognitive skills of library instructors and students may be compromised due to stress as well as individual situations and specific environments. one of the technical challenges for remote library sessions stems from the need for instructor s to use tools to achieve flexibility and hybridity. librarians steven j. bell and john shank, addressing the challenges of new technologies for librarianship, coined the term “blended librarian” in 2004 to denote a librarian who combines traditional skills with those involving knowledge of hardware and software as applied in the teaching and learning process.14 the concept of the “blended librarian” may be outdated, but it encompasses the notion that librarians are expected to be comfortable with technology. again, librarians are now facing the mandate of presenting information literacy and library resources online, navigating between and facilitating the use of multiple technology tools and formats. it is worth considering how well our tools meet this mandate. although remote learning may be more amenable to some learners than others, there is consensus on the benefits of using technology for teaching and learning even if a learning curve exists for instructors. for example, researchers examining school support for classroom technology found that teachers supported enhanced technology integration even if it surpassed their own technology skills.15 notwithstanding the benefits perceived by teachers, there are also some drawbacks in the use of technology in the classroom, especially for distance learning. digital technologies researcher jesper aagaard, reporting part of a study on “technological mediation in the classroom” refers to two processes: “outside in,” where students use educational technologies to acquire knowledge in the classroom, and “inside out,” where students use technology tools to withdraw from the classroom visiting non-related websites.16 for instruction librarians, student engagement is paramount; therefore, redirecting students who leave the digital classroom is important, though it can be difficult to know when this occurs. a number of reasons could explain why students may disengage in a distance learning setting, one of them being the lack of digital literacy. 
moreover, the belief that higher education students in the 21st century are technologically savvy may be misleading. citing mark prensky, who originated the terms “digital native” and “digital immigrant,” wan ng explains that the phrase “digital natives” describes those people born in 1980 and after whose lives have been shaped by technology.17 ng found that while the students in his study were very comfortable with technologies such as word processor software, youtube, and facebook, they were not as comfortable using technologies to create content.18 there may be a digital literacy divide between knowing and using a technology for social media and using a technology to create online content such as web pages and blogs. similarly, ng found that when presented with unfamiliar technology, students spent less time learning the new technology and instead focused on preparation of content.19 this finding may be of concern to instruction librarians who use a myriad of tools during emergency remote teaching. it is important to consider that these digital literacy divides could stem from factors not related to a student’s age group. researchers ellen johanna helsper and rebecca eynon question the notion that a person may be called a digital native if they were born after 1980. these authors state that there are variables other than generational differences that could define a person as a digital native, such as gender, education, experience, and interaction with technology.20 therefore, even when people grow up in technological environments, they may not be considered digital natives. to minimize a gap in equity, lecture design, even for one-time library sessions, offers an opportunity to think of technology tools that could increase students’ participation and prompt learning. david ellis, studying classroom resources to enhance student engagement, notes that padlet, a web 2.0 technology, supports interaction and learning.21 seyed abdollah shahrokni, reviewing another web 2.0 technology, playposit, as a video tool for language instruction, states that it “can support learning in language classrooms” if used in a lecture design that includes relevant questions.22 lecture design applies to all types of settings: in person, flipped, and distance learning. approaches should be applied consistently to help students become more digitally literate and bridge equity issues where possible. jurgen schulte et al., providing examples of “new” librarian roles in a science curriculum, note that digital literacy enables better learning.23 in the case of emergency remote teaching, instruction librarians may promote digital literacies through the use of technologies that increase students’ engagement and their “outside in” participation in the teaching and learning process. considering these challenges, the authors seek to identify the strengths and weaknesses of technology tools used by librarians and the digital literacy gaps that may prevent access to equitable library instruction. methods instrument the authors used a six-question qualtrics survey approved by the institutional review board at the university of colorado boulder. the survey was open for two weeks, between may 10 and may 24, 2020. it is worth noting that the questions were specific to this timeframe, and some responses indicated that instruction librarians were still finishing up spring semester 2020.
the survey received 202 responses. however, the number of responses to each question varied as answers were not required. the data collected were both quantitative and qualitative, reflecting respondents’ practices, perceptions, and personal knowledge. respondents answered two multiple-choice and four free-text questions. for the multiple-choice questions, participants could choose all the options that applied and enter their own choice as well. the multiple-choice questions gathered data on the technology tools that librarians used to deliver content or to engage with students during covid-19. these questions distinguished between content delivery platforms (like zoom) and technology tools used for student engagement (like padlet). the technology tools included in the multiple-choice questions were chosen based on the authors’ knowledge of their potential relevance to instruction librarians. the final four qualitative questions collected information about respondents’ perceptions of strengths and weaknesses of technology tools, as well as digital literacy gaps identified during covid-19 and other challenges to equitable instruction. qualtrics provided a report, which the authors organized in a spreadsheet used to analyze the data and create the figures. the following survey questions were asked:
1. what content delivery technology have you used to create your distance learning library sessions during covid-19?
2. what technology tools have you used to enhance student engagement in your distance learning library sessions during covid-19?
3. what are the strengths of the technology tools you’re using right now?
4. what are the weaknesses of the technology tools you’re using right now?
5. what digital literacy gaps have you identified in your students since covid-19 closures? ala’s digital literacy task force defines digital literacy as “the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills.”
6. what other challenges exist in your ability to effectively provide equitable information literacy instruction during this time?
please see appendix a for the complete survey instrument. participants the survey was distributed through email to five listservs associated with academic libraries and library organizations: the seminar on the acquisition of latin american library materials (salalm) listserv, the information literacy instruction discussion listserv, the library instruction round table (lirt) listserv, the lita instructional technologies interest group listserv, and the literature in english discussion list. these organizations were chosen due to their connection with library instruction in academic libraries and the authors’ subject specialty affiliations (romance languages and english and american literature). grounded theory approach the data for questions 3, 4, 5, and 6 were analyzed using a basic grounded theory approach, where the authors collected themes and patterns from the responses rather than approaching the data with pre-existing hypotheses.24 based on their observations, the authors categorized responses according to an agreed-upon set of keywords. in addition, after coding the data separately, the researchers examined every answer together to ensure consistency and reliability.
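to make the keyword-coding step concrete, the following is a minimal sketch of how free-text responses could be assigned to agreed-upon categories and tallied. the categories and trigger phrases shown are hypothetical illustrations, not the authors’ actual codebook, and the function names are invented for the example.

```python
# minimal sketch: keyword-based coding of free-text survey responses.
# the codebook below is hypothetical; the authors' real categories and keywords are not reproduced here.
from collections import Counter

CODEBOOK = {
    "easy to use": ["easy", "intuitive", "familiar"],
    "interactive/collaborative": ["interactive", "collaborat", "engag"],
    "bandwidth": ["bandwidth", "internet", "connection"],
}

def code_response(text, codebook=CODEBOOK):
    """return the set of categories whose trigger phrases appear in one response."""
    text = text.lower()
    return {cat for cat, keys in codebook.items() if any(k in text for k in keys)}

def tally(responses, codebook=CODEBOOK):
    """count how many responses mention each category at least once."""
    counts = Counter()
    for r in responses:
        counts.update(code_response(r, codebook))
    return counts

if __name__ == "__main__":
    sample = [
        "It's easy to get started and everyone already knows Zoom.",
        "Low bandwidth makes the video freeze.",
        "Padlet is interactive and collaborative.",
    ]
    print(tally(sample))
```

in a workflow like the one described above, each coder could run (or hand-apply) a codebook like this independently, and the resulting tallies could then be compared line by line when reconciling codes.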
a mixed-methods survey with a grounded theory approach to analysis allowed for a larger number of responses than qualitative interviews. the survey format also allowed for quicker solicitation and analysis of data, given the urgency of the topic and the authors’ desire to provide recommendations to colleagues in a timely manner. findings popularity of technology tools figure 1 shows respondent selections from the list of content delivery tools provided by the authors. a large number of respondents used libguides as a content delivery tool during covid-19, followed closely by the video conferencing tool zoom. however, although libguides and zoom displayed a substantial amount of concurrence among the respondents, fewer than half of the respondents used the rest of the technology tools shown in figure 1. these data suggest that a large number of the respondents were able to deliver library instruction via synchronous learning through zoom or by providing resources asynchronously via libguides, and thus had the opportunity to have at least some engagement with students. figure 1 also shows that more respondents used snagit and screencast-o-matic to create videos than playposit. similarly, a little over one-eighth of respondents used the graphic design tool canva to create content, although this tool had better usage than adobe illustrator, which was only used by one respondent. in addition, the communication software google hangouts was largely not used by respondents. the authors listed formative and pear deck in the survey options as well, but these were not selected by any respondents (not shown in figure 1). figure 1. respondent selections to question 1: what content delivery technology have you used to create your distance learning library sessions during covid-19? figure 2 represents the tools used by the 95 respondents who selected other and entered additional tools in a free-text box in question 1. tools mentioned, such as webex, camtasia, panopto, and kaltura capture, were used for video conferencing but to a lesser extent than zoom. similarly, only six respondents reported using narrated powerpoint. interestingly, these tools were still used by more people than playposit. respondents mentioned a wide array of other technology tools in the free-text box (see appendix b); however, none of these tools were used individually by more than three respondents. figure 2. other content delivery technology used to create distance learning library sessions during covid-19. the survey also asked about technology tools used for student engagement in distance learning library sessions during covid-19. the authors distinguish these tools from content delivery tools, as they are often utilized in conjunction with some of the tools mentioned in figure 1 to facilitate student interaction.
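the counts behind figures 1 and 2 are simple tallies of “select all that apply” responses. a minimal sketch of how such a tally might be computed from the exported survey spreadsheet is shown below; the file name, column label, and comma-separated export format are assumptions, since the actual qualtrics export layout is not described here.

```python
# minimal sketch: tallying multi-select ("select all that apply") survey responses.
# "survey_export.csv" and the column name "q1" are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("survey_export.csv")

tools = (
    df["q1"]
    .dropna()
    .str.split(",")     # one respondent's selections -> list of tool names
    .explode()          # one row per selected tool
    .str.strip()
    .str.lower()
)

counts = tools.value_counts()   # number of respondents per tool, as in figure 1
print(counts.head(10))
# counts.plot(kind="barh")      # optional: quick horizontal bar chart of the tallies
```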
figure 3 shows that, among the tools listed by the authors, respondents preferred the application google forms, found in google drive and google classroom, as more than one-third of respondents indicated they used this application to enhance student engagement. although representing fewer than half of the respondents, 18 more people selected google forms over poll everywhere, the tool with the second-best representation. moreover, poll everywhere and padlet, two online tools that enable student participation through custom-made polls and post-it boards, were each utilized by about one-fourth of participants. the game-based learning platform kahoot was used by nearly one-fifth of respondents, and mentimeter, another interactive platform allowing students to answer multiple-choice and open-ended questions, was used by 11 respondents. less than five percent of the respondents used the interactive technology tools flipgrid, answergarden, jamboard, mural, slido, and socrative. no respondents indicated they used pear deck, google drawings, quizalize, gosoapbox, and yo teach! (not shown in figure 3). in addition, 42 respondents entered the names of technology tools they used to enhance student engagement in the other free-text option. similar to the responses in the free-text answer for question 1, respondents provided a broad list of technology tools. two of the tools listed displayed a higher number of concurrences: eight respondents mentioned zoom polls and four mentioned springshare libwizard. an additional 20 tools were used by just one participant each (see appendix c). figure 3. respondent selections to question 2: what technology tools have you used to enhance student engagement in your distance learning library sessions during covid-19? strengths and weaknesses of technology tools instruction librarians also described the perceived strengths of the technology tools they used. figure 4 shows that a little less than half of the respondents agreed “easy to use” was an important consideration for technology tools, making it the most frequently mentioned strength. responses showed interest in ease of use for librarians, students, and faculty alike. for example, respondents included the phrases “our learners were comfortable with them,” “it’s easy to get started,” and “everyone already knows zoom.” in addition, nearly one-fourth of participants selected the strength “interactive/collaborative” followed at a distance by the strength “flexible,” which dropped dramatically to 15 percent. in fact, the number of respondents who noted “interactive/collaborative” was almost quadruple the number of respondents who mentioned the less popular choices “supported by it” or “captioning functionality.” fewer than 19 participants acknowledged that it was important for the technology tools to enable remote instruction, include recording functionality and screen-sharing functionality, and to be able to enhance communication. only 11 participants wrote that it was important for the tool to be readily available. respondents referred to other strengths not included in figure 4 due to their infrequency. nonetheless, some of these strengths offer unique insights. for example, four respondents noted that they favor free tools.
in addition, three respondents stated that it was beneficial to repurpose content created with technology tools. two respondents mentioned that they preferred tools that do not require download and/or account creation. another respondent mentioned that mobile-friendly tools were most helpful for engaging students. figure 4. respondent answers to question 3: what are the strengths of the technology tools you’re using right now? respondents also shared their observations around technology tool weaknesses. figure 5 shows that several perceived weaknesses were the inverse of strengths from figure 4, including that tools were “difficult to use” or “not interactive or engaging.” figure 5 also indicates that respondents were divided as to the most significant weaknesses. in fact, not even one-fourth of respondents selected the most frequent response, “not interactive or engaging,” displaying a lack of concurrence. the second most-repeated weakness referred to bandwidth requirements, with 27 respondents worrying about the lack of requisite internet access. the authors joined together seven weaknesses mentioned by respondents as “other functional limitations.” these weaknesses included “lack of screen capture,” “connection failures,” “lack of captioning,” “lack of recording capabilities,” “limited sharing screen,” “freezing video,” and “video quality.” each of these specific limitations was only mentioned a couple of times, but together these functional limitations were mentioned by 17 respondents. again, there were some specific weaknesses mentioned by only a few respondents. some of the highlights included tech overload or too many tools to choose from (two respondents), computer storage requirements (three respondents), and that the tools are not flexible enough or easy to integrate into other systems such as canvas or libguides (four respondents). two people observed that the tech tools they used had no weaknesses. interestingly, 18 respondents included keywords and phrases in their answers (not shown in figure 5) that were not directly related to tool weaknesses, but rather described other issues affecting teaching and learning in a remote setting. these included students lacking computers or having only cell phones (seven respondents), students’ limited technology skills or attitudes about remote learning (six respondents), students’ home setups (three respondents), and limited familiarity with the tools among teaching faculty (two respondents). these kinds of responses illustrate the wide range of interconnected factors impacting librarians’ experiences engaging with students and technology during covid-19. finally, 26 percent of librarians answering this question mentioned some weaknesses related to zoom (not shown in figure 5). to illustrate, some comments included “active learning in zoom [sic] is difficult . . .”; “zoom recordings take up a lot of space and our college is running out of room . . .”; “zoom doesn’t work as well when using wifi [sic], as opposed to connecting through a network”; “it is easy to zone out and not pay attention to zoom [sic]”; “with zoom it is difficult to interact with students on a one-to-one basis as they breakout [sic] to conduct research”; and “students tend to not have cameras on . . .
and it’s hard to tell if they are actually paying attention.” these observations may show that while respondents favor using tools like zoom, they are also aware of important limitations. figure 5. respondent answers to question 4: what are the weaknesses of the technology tools you’re using right now? digital literacy gaps beyond describing the technology tools used, respondents were asked to identify digital literacy gaps that they noticed in students during covid-19 closures. as stated above, the authors defined digital literacy in the survey question according to the ala digital literacy task force definition. still, answers to this question provoked a wide range of responses as seen in figure 6. the most frequently recurring response was that digital literacy gaps were the same as those perceived before the pandemic, although only 25 respondents agreed on this. digital literacy gaps observed by respondents included “lack of tech skills in general,” “problems evaluating information online,” “ineffective search strategies,” “difficulty communicating online,” “problems using library resources in general,” “problems using online resources,” “problems using library databases,” and “understanding citation and plagiarism.” the second-biggest category, “lack of tech skills in general,” included varied responses such as “some of my students lack a basic understanding of . . . browsers, upload/download, url versus link, activate/enable a feature etc.”; “students have trouble navigating multiple windows”; and “students are having a hard time trying something new which involves more than a single click or two.” eleven other respondents noted that it was too early to evaluate digital literacy gaps during emergency remote teaching. one respondent offered insight about the possibility that librarians missed gaps because they were not able to meet with all students. as they stated, “students who have access and are in contact with librarians seem to have adequate skills. i don’t know how many students simply lack internet access, and i don’t know how many need the library and don’t figure out how to access it. . . .” ideas for reaching more students who may not have access to in-class library sessions are mentioned in the recommendations section below. when asked about digital literacy gaps, some respondents mentioned student experiences during covid-19 that were not directly related to digital literacy and therefore are not included in figure 6. however, the authors considered this information relevant because it provided insight into perceived challenges students faced. the authors separated such responses into two groups: external challenges and internal challenges. external challenges mostly involved technology access rather than digital literacy per se, with 22 respondents mentioning lack of tech access as a barrier or gap. it is worth noting that respondents mentioned this lack more than any individual digital literacy gap shown in figure 6. fifteen respondents also noted that students may lack internet access at home, while five percent mentioned a home environment that was not ideal or conducive to learning. although these external challenges are not explicitly related to digital literacy, the fact that they are mentioned here may indicate that respondents perceived these challenges as interrelated during the covid-19 pandemic.
internal challenges included concepts that may be seen as related to digital literacy but are not explicitly included in the ala task force definition. in fact, many of these challenges had to do with pandemic-specific difficulties such as “emotional issues” arising during covid-19 (10 respondents). five respondents worried about information overload, while two respondents each mentioned that students were less likely to ask for help and more likely to have problems following directions during emergency remote learning. figure 6. respondent answers to question 5: what digital literacy gaps have you identified in your students since covid-19 closures? the last survey question asked respondents to reflect on any other challenges that may have impacted their ability to effectively provide equitable library instruction during emergency remote learning. figure 7 displays an array of responses, including some related to technology tools, home environments, and institutional support. nonetheless, technology access from home was perceived as the most important challenge (39 respondents) followed closely by internet issues (35 respondents). for both challenges, the authors included responses that specified lack of tech access for students, teaching faculty, or instruction librarians. many respondents did not specify who lacked access. however, one could argue that lack of access by any of those three groups may impede connection and student engagement. other challenges such as home environment, fewer library instruction sessions, communication barriers with students, lack of student engagement, no time to plan, emotional distress, and issues with synchronous or asynchronous instruction affected 11 percent or less of respondents each. additionally, the data indicated that librarians perceived more communication barriers with students (14 respondents) than with faculty (nine respondents). in figure 7, “asynchronous/synchronous” refers to problems encountered by respondents that had to do, in general, with the unique challenges of presenting content online either asynchronously or synchronously. for example, respondents mentioned being unsure whether students were engaging with asynchronous content. they also mentioned being asked by faculty to use one format over another, despite librarian preferences. one respondent focused specifically on the need for flexibility when addressing equity: “asynchronous instruction does not allow the real time adaptation to student needs (cognitive and technical).” even though figure 7 relates to challenges experienced in providing equitable library instruction, respondents showed that there was also an emotional factor surrounding these challenges.
two revealing responses to the question about challenges included “my kids running around in the background, not having an actual office, being expected to work 40 hours a week while homeschooling and running a household” and “some students [are] more or less in shock from the pandemic; some students have illness in the family; some students have economic issues, some students just don’t learn well with online learning only.” other comments stated personal challenges, such as the “stress of living in [the] epicenter of [a] global pandemic” and “my own mental and emotional capacity.” figure 7. respondent answers to question 6: what other challenges exist in your ability to effectively provide equitable information literacy instruction during this time? because there was often little consensus among responses, the authors created word clouds for all four qualitative questions (figure 8). each of these questions showed students at the center of instruction librarians’ responses, which is not surprising given their roles and the subject of this survey. the purpose of emergency remote teaching and learning is, at its core, to continue to connect students with resources and to engage them in their learning, even and especially when it is challenging to do so. still, it is meaningful to see students at the heart of these data. information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 14 figure 8. word cloud visualization for each qualitative question answer set. challenges and limitations many of the challenges encountered while analyzing data had to do with creating meaningful keyword codes for the qualitative survey questions. this coding was challenging because respondents expressed varied experiences and opinions and there was no significant consensus regarding tools used, tool weaknesses, digital literacy gaps, or other challenges. in contrast, respondents frequently referred to students’ lack of technology and internet access, even when the question at hand did not explicitly address this. these challenges speak both to the varied experiences of and institutional responses to covid-19, as well as perceived lack of tech or internet access among students as a primary barrier to effective emergency remote teaching. further, while some questions signaled a clear answer, others required interpretation. to illustrate, respondents used the term “accessibility” inconsistently. some respondents used this term to refer to accessibility for students with disabilities, and others used it to refer to “availability.” therefore, the authors employed contextual clues to determine meaning. regardless, if the meaning remained unclear, then these answers were not considered for coding. similarly, respondents didn’t always use the same language to describe the same concepts. for example, a participant noted that “the technology we have is limited to lecturing and answering questions and providing documents and videos online. we don’t have polls enabled . . . .” the information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 15 authors interpreted this to mean that the technology tools didn’t allow for robust engagement with students, though the respondent didn’t specifically mention the word “engagement.” again, if context or meaning was unclear, those responses were not coded. 
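for readers who want to produce something like the word clouds in figure 8 from their own response sets, a minimal sketch using the python wordcloud package is shown below. the package choice and the input file name are assumptions; the authors do not state which tool generated figure 8.

```python
# minimal sketch: building a word cloud from one question's free-text answers.
# "q3_strengths.txt" is a hypothetical file containing one response per line.
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

with open("q3_strengths.txt", encoding="utf-8") as f:
    text = f.read()

cloud = WordCloud(
    width=800,
    height=400,
    background_color="white",
    stopwords=STOPWORDS,      # drop common english words so response terms dominate
).generate(text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.savefig("q3_wordcloud.png", dpi=150, bbox_inches="tight")
```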
another challenge occurred in analyzing responses to question 5: what digital literacy gaps have you identified in your students since covid-19 closures? some respondents appeared to be unfamiliar with the term “digital literacy,” even though a definition was provided within the question. some respondents referred to hardware access, home environment, tech access, or psychological stress rather than explicitly reflecting on digital literacy gaps as included in ala’s task force definition. these responses could indicate either confusion around the definition of digital literacy or, as suggested above, the perception of all these factors being codependent or interrelated. limitations of the study included the design of the survey itself. for example, respondents received a list of tools for questions 1 and 2, which may have meant that they were more likely to select these than to remember other tools that they used and add them to the other category accordingly. questions 3 through 6, in contrast, did not include any multiple-choice options, which may have limited the thoroughness of responses. for example, the average number of responses to question 3 was 2.08 strengths mentioned per respondent. we think it is likely that respondents would have indicated more strengths had they been presented with a list rather than only a free-text box. the authors also did not define the difference between content delivery tools and tools for student engagement in the survey. for this reason, there was some overlap noted in the responses for questions 1 and 2. also, respondents mentioned tools for engagement that were sometimes features of content delivery tools, such as webex whiteboards and lms discussion forums. the vast landscape of tools used meant that our survey could not account for all possible manifestations of technology for content delivery and student engagement. discussion questions 1 and 2: technology tools instruction librarians are using to deliver content and engage students the instruction librarians who answered the survey have widely used technology tools such as libguides and zoom in their library seminars during covid-19. however, as the data show, librarians have also used many other technology tools to create and deliver emergency remote library sessions during covid-19, due perhaps to the wide array of tools available. while libguides and zoom exhibit a high percentage of usage, this result was expected because libguides is a well-known tool used by academic librarians and, according to the company, zoom became more prominent as a tool during covid-19.25 the relatively low usage of adobe illustrator is also somewhat predictable because this tool not only requires a subscription, but also may have a higher learning curve than other free graphic editor and design programs. data raised some further questions about the role of information technology (it) departments. are instruction librarians reaching out to their respective universities’ it departments to learn about technology tools available to them and vice versa? are it offices willing and able to provide training via video conferencing if in-person training is not available due to the pandemic? do it departments offer enough promotion to advertise these tools? these questions are not addressed in this manuscript but are important avenues for further research.
only six percent of respondents recorded “supported by it” as a strength of the technology tools they were using. this low percentage may appear striking but could be understood under the premise that as the pandemic set in across the united states and instruction librarians rushed to prepare and present online sessions, librarians relied on the tools that were most familiar to them instead of learning a new technology tool. the data from this survey would seem to corroborate this, as so many respondents chose “ease of use” as an important strength. one interesting detail worth addressing about the tools is that respondents mentioned other more than they selected the options the authors provided, which may imply either that the authors did not include the most-used tools or that the number and/or variety of tools is so wide that it is difficult to reach a consensus. one wonders whether, had the tools mentioned in other been included among the listed options, the number of respondents using those tools would have been higher. questions 3 and 4: the strengths and weaknesses of tools as they affect student engagement the authors wanted to know what tools respondents had used to enhance student engagement. data show that google forms is the tool that most of the respondents have used for this purpose. however, fewer respondents used tools that are purposely designed to increase interaction in online sessions, such as kahoot and mentimeter, which do not even require a fee for using their basic features. respondents’ perceptions of the strengths and weaknesses of the tools they have used provided useful information. in terms of tools strengthening student engagement, the responses were not as conclusive, as 40 of 150 respondents found these tools interactive or collaborative, and an even lower number of these respondents thought of these tools as flexible or helpful in enabling remote instruction. one could argue that ada capabilities are features that may facilitate student engagement. however, when respondents were asked about the strengths of technology tools they had used, accessibility was not often mentioned as a strength. moreover, data showed that only eight respondents referred to ada problems and three of them voiced concern over captioning capabilities, which is considered a relevant ada feature. there was no mention of alt text for images, screen-readable and software-neutral file formats, or the importance of users’ ability to change the color and font settings on their devices to see the content. in fact, only three respondents specifically mentioned issues with videos in terms of their audio quality, lack of auto closed-captioning, and freezing images. respondents noted wide-ranging effects of tool weaknesses on both instruction librarians and students. to illustrate, the weaknesses “time intensive,” “not designed for teaching,” and “no feedback or assessment” likely affected instruction librarians at a personal level as they prepared for and assessed their teaching. in contrast, the weaknesses “ada problems,” “not interactive or engaging,” “difficult to use,” and “makes communication difficult” might primarily impact students. other concerns respondents stated that may influence student engagement included poor bandwidth, which affects internet access and causes connection issues.
for example, even if librarians try to improve the video quality in zoom by disabling the higher definition option, or start a session with audio conferencing only, which will decrease the amount of bandwidth needed, students with poor bandwidth may still not be able to engage. therefore, in situations of emergency remote learning, if students lack bandwidth or an appropriate home environment, learning and engaging may become a challenge. information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 17 questions 5 and 6: digital literacy gaps and equitable instruction a number of factors affect the ability to provide equitable library instruction on the librarian side and to engage with equitable library instruction on the students’ side. one of these factors is home environment, including access to computers, good bandwidth, or an appropriate working station. our data specify that 15 respondents perceived “home environment” as a challenge when providing equitable library instruction. in addition, some respondents noticed home environment issues when asked about digital literacy gaps in relation to shared spaces and lack of computers and access. it’s possible that equity issues increased during the covid-19 shutdown, which raises the question of whether there is a correlation between the issues affecting students, librarians, and faculty. data showed that of 139 participants, a little over one-fourth of them considered “tech access from home” and “internet issues” as challenges in the ability to provide equitable library instruction. these challenges, along with “emotional issues,” were perceived to affect not only students but also librarians and faculty. although responses recorded librarians’ perceptions on equity issues, often including their own experiences, data revealed that respondents presumed faculty and students were having similar issues. data also exhibited that respondents perceived other challenges, such as “fewer library sessions” and “student lack of engagement,” that may affect students directly. fewer library sessions are a challenge that may be further addressed forthwith. however, students’ lack of engagement is a difficulty that may require thoughtful outreach, collaboration with mental health offices at the campus level, and reflective and inclusive lecture design. these challenges may have a negative impact on receiving an equitable learning experience. in fact, less commonly acknowledged gaps may, in some ways, be more important than those frequently mentioned. a well-known gap can be addressed because there is consensus that the gap exists and poses a barrier to equity in education. unnoticed or overlooked gaps, in contrast, are more difficult to address but may be no less important as barriers to equitable education. equity issues may also arise as a result of lack of digital literacy skills in students. students with higher digital literacy are deemed to perform better in an emergency remote library instruction setting, may be more prone to stay in tune and engaged with the lesson, and have less emotional stress by feeling confident. however, as wan ng explains above, those recognized as digital natives may not necessarily have digital literacy, even if they are comfortable with social media tools.26 data do not tell us the age of students; regardless, digital literacy gaps were detected by respondents. 
these perceived gaps in digital literacies (evaluating information, communicating online, applying search strategies, using library resources and databases, understanding plagiarism and citation, and using online resources) are important for librarians to address during emergency remote learning. last, the lack of consensus may be explained by the complexity of the concept of digital literacy. it is possible that many of these gaps existed before, but librarians recognized them as new during emergency remote learning. one response illustrates this idea: “the closure has prompted many more students to request help in every step of the digital literacy process. i’m not sure if students typically ask each other, or their professors/instructors. regardless, it’s exposed that not all students know things i’d assumed they did.” whether these gaps are new or not remains unclear, as evidenced by another respondent who stated, “nothing new to the covid era.” information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 18 recommendations these recommendations seek to address some of the issues that arose in the data, especially those regarding equity and emergency remote library instruction. to illustrate, one respondent summed up the current situation while also posing a question that appears valuable: “not all of our students have the same access to stable technology and internet, nor do they all respond to online teaching strategies in the same ways. how do we create equitable and accessible learning opportunities?” while the authors do not have all the answers, based on the analysis of the data and emerging themes, some recommendations may help instruction librarians move forward through the covid-19 crisis. technology and equity the authors realize that a budget is essential for the implementation of recommendations that may reduce both inequitable access to information and lack of digital literacy. nonetheless, the recommendations below intend to offer guidance on ways to improve equitable access, digital literacy, and student engagement during emergency remote library sessions. one external digital barrier for students engaging in emergency remote library sessions was the lack of equipment at home, possibly due to economic hardship. university libraries could provide kits containing a chromebook, webcam, microphone, wi-fi hotspot, and headphones to increase equitable access. access to this equipment may help students feel supported and understood, with a sense of dignity. these offerings should be in coordination with other campus units who may provide similar services, such as student affairs and it departments. likewise, a coordinated marketing and outreach effort at the campus level may enhance the visibility of equipment available for student use. as stated above, “ease of use” rose to the top as the most-frequently mentioned technology tool strength, which is understandable given the many stressors educators and students may be experiencing during covid-19. however, it is important to keep in mind that tools should be “easy to use” not just for librarians and teaching faculty, but especially for students. 
nonetheless, given the difficulty of assessing instructional technology and library information literacy sessions right now, it is challenging to know whether students find the technology tools that librarians choose truly “easy to use.” compounding the perception that tools are easy to use is the possibility that tools may not be ada accessible. though the survey did not ask about accessibility explicitly, and while the authors did not vet the tools listed in the survey for their accessibility features, the authors wonder how many tools are fully accessible to all learners. instead of choosing tools for their perceived ease of use, a further recommendation is to move beyond valuing what’s easy to critically reflect on whether tools are fully accessible to students with visual or hearing impairments or learning differences. if the answer is no or unclear, perhaps using basic content delivery tools that are vetted for accessibility features is the better option. it is recommended to follow best practices for using those tools (for example, by referring to guidance from campus it departments). if instruction librarians consider themselves “blended,” or perhaps even so well-versed in technology that a term like “blended librarian” is no longer needed, they should also prioritize flexible, responsive, and intentional use of technology in their lecture design. if a tool that they assumed would be easy to use for all students is proving challenging for some, librarians should have alternative options and extra support at the ready. they may also ask themselves whether information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 19 use of a technology tool furthers the learning process and outcomes of the course, or if technology is added for its own sake. in addition, avoiding use of extra tools and technology that does not genuinely enhance lecture goals and priorities may help students avoid stress related to technology, which could further students’ emotional well-being during this fraught time. being clear with students about which tools will be used and for what purpose may help students who would otherwise struggle with layered content delivery and engagement tools. a glossary of these tools, along with when and how they’ll be used and links to technical support, could be a helpful support document for students. communication and equity it is worth exploring librarian, student, and faculty communication not explicitly focused on technology. some respondents mentioned outreach and connection challenges that have less to do with technology and more to do with other stressors and limitations. for example, some librarians reported receiving fewer requests for information literacy sessions or library support than usual, and some speculated that this was because of the quick move to emergency remote learning, lack of time to plan, and the possibility that a library session was “extra” and faculty were trying to simplify. there are several ways to address this challenge. librarians can attempt to meet students and faculty where they are by offering multimodal learning opportunities, including both synchronous and asynchronous offerings (zoom meetings, prerecorded videos, tutorials/quizzes, canvas discussion posts, and libguides are a few options). it is also paramount to make sure librarians are reachable at the point-of-need, which may mean extended weekend and evening hours on the virtual ask-a-librarian desk. 
also imperative is ensuring that virtual services, as well as consultation request links and/or email addresses, are clear and visible to students and faculty on the library’s website. some survey respondents mentioned that communication with faculty was difficult, and this may have contributed to fewer instruction requests. while it is understandable that faculty may have been less responsive to librarian outreach for a variety of reasons, there are some ways to encourage faculty communications. for example, librarians could provide simple, bulleted lists with updated information on services and offerings, individual attention (focused on specific classes and topics), and options, acknowledging that some faculty will simply not want to share classroom time during emergency remote teaching. librarians can also work to bridge the disconnect between it and their departments by proactively reaching out to learn about best practices not only for technology use, but also for ada accommodations. even when information literacy sessions are requested, faculty may not always share student accommodation needs. librarians can ask for help from it or other units on campus (such as centers for teaching and learning) to make sure that their communication techniques are aligned with inclusive, user-centered approaches to teaching and learning with technology. as professionals in a unique role serving both students and faculty, librarians may also check in on a person-to-person basis with both groups. acknowledging that we are people with mental and physical health needs working together in difficult circumstances is one way of connecting with students and faculty in an authentic way. emergency remote teaching and learning is different information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 20 from typical remote or online learning and being clear about that might also help everyone adjust expectations and extend compassion. professional development and personal support while emergency remote teaching and learning may not seem like the best time for professional development, it is important to acknowledge that librarians deserve support in navigating this unprecedented time. even as we clearly want to help students who may be especially vulnerable during covid-19, there is a sense of being overwhelmed, and librarians may not always know where to start. while there are online webinars and discussions that provide advice about how to best help students during covid-19, the authors recommend a more specific approach targeting digital literacy gaps and support systems for librarians. in reviewing survey responses to perceived digital literacy gaps and other challenges, it became clear that not all librarians are well-versed in digital literacy concepts. if librarians have time to take one approach to professional development as it relates to instruction and information literacy, the authors recommend learning more about digital literacy competencies and thinking critically about how emergency remote library instruction design can address those competencies and potential gaps. of course, stresses of the pandemic are impacting librarians as well as faculty and students. it is important that we connect with colleagues and support systems during this time. 
one option might be to form a community with colleagues to determine best practices for use of technology in instruction, among other relevant topics (examples at the authors’ library include anti-racist actions and a caregiver’s support group). librarians should also prioritize their own health (mental and physical) and stress management. the recommendations are everywhere but bear repeating: connect with family and friends, exercise, take time away from the computer, and make sure to rest. librarians should be kind to themselves and their colleagues and offer or ask for support when needed. conclusion as of spring 2021, the covid-19 pandemic is not yet over. it remains unclear whether and when academic library instruction will return to the old normal. the data collected and analyzed for this paper, as well as the discussion and recommendations, can inform how instruction librarians respond to student needs and challenges as everyone continues to cope with life during emergency remote learning. especially compelling are the data shared about the strengths and weaknesses of technology tools used to enhance student engagement in library instruction. these data provide parameters that may help other instruction librarians make decisions when choosing a technology tool and be prepared to troubleshoot when issues arise. a concerning finding was that digital literacy, as defined by ala’s digital literacy task force, is a subject that may not be widely understood by instructors. although our pool of respondents was small, instruction librarians may need a broader understanding of what digital literacies look like in practice when dealing with emergency remote teaching and a diverse student population. while instruction librarians’ experiences and perceptions are one important piece of the puzzle, especially in acknowledging shared challenges, it is important to recognize that students may have needs, digital literacy or otherwise, that educators are missing. though assessment is difficult right now, reflection and attention to the whole student experience is necessary. working with colleagues on campus to provide technology, including laptop computers and wi-fi hotspots, as well as evaluating our content delivery and engagement tech tools for ada accessibility, are examples of ways that instruction librarians can connect students with unmet needs to resources during this difficult time. examining instruction librarians’ ongoing response to the pandemic, while challenging, will help libraries become more emergency-responsive and better able to meet the needs of diverse students in the 21st century. acknowledgement we would like to thank moria woodruff from the university of colorado boulder writing center for her help revising this manuscript. appendix a: survey instrument distance learning during a pandemic: a matter of equity we are curious to hear about your experiences of library instruction during the abrupt shift to online learning. in particular, we are researching librarians’ use of technology tools for online content delivery and student engagement during covid-19. this survey should take less than ten minutes to complete. your answers will be anonymous. please do not include personally identifiable information.
participation in the survey indicates your consent for us to use the data collected in a forthcoming research paper about using online technology tools to teach information literacy or library seminars during covid-19. the survey will be open through sunday, may 24th. thank you for your participation!

q1 what content delivery technology have you used to create your distance learning library sessions during covid-19? select as many as apply.
zoom
microsoft teams
libguides
course management system (e.g., canvas)
formative
pear deck
adobe illustrator
snagit
screencast-o-matic
playposit
google hangouts
google classrooms
canva (graphic design tool)
other

q2 what technology tools have you used to enhance student engagement in your distance learning library sessions during covid-19? select as many as apply.
padlet
answergarden
kahoot!
mentimeter
flipgrid
slido
socrative
jamboard
pear deck
mural
google drawings
google forms
quizalize
gosoapbox
poll everywhere
yo teach!
other

q3 what are the strengths of the technology tools you’re using right now?

q4 what are the weaknesses of the technology tools you’re using right now?

q5 what digital literacy gaps have you identified in your students since covid-19 closures? ala’s digital literacy task force defines digital literacy as “the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills.”

q6 what other challenges exist in your ability to effectively provide equitable library instruction during this time?

appendix b: tools mentioned by three or fewer respondents, question 1, option other

ninety-five respondents answered other to question 1: what content delivery technology have you used to create your synchronous and asynchronous distance learning library sessions during covid-19?
tool name | type of tool | number of respondents
bluejeans | online meetings | 3
google meet | online meetings | 3
jing / techsmith capture | screen capture | 3
blackboard ensemble | video creation | 2
imovie | video editing | 2
guide on the side | interactive tutorials | 2
kapwing | video and image editing | 2
libchat | communications service | 2
piktochart | graphics editing | 2
techsmith relay | video creation | 2
thinglink | multimedia editing | 2
adobe indesign | desktop publishing | 1
adobe photoshop | graphics editing | 1
adobe premiere pro | video editing | 1
amazon chime | communications service | 1
audacity | audio editing | 1
chat (in general) | communications service | 1
clideo | video editing | 1
faststone capture | screen capture | 1
genially | interactive content creation | 1
google sheets | web-based spreadsheets | 1
gotomeeting | online meetings | 1
microsoft bookings | scheduling | 1
microsoft stream | video sharing | 1
powtoons | video creation | 1
pressbooks | content management | 1
prezi video | video creation | 1
qualtrics | surveys | 1
quicktime | multimedia editing | 1
screenflow | video editing and screen capture | 1
springshare libwizard | interactive tutorials and forms | 1
telephone | communications service | 1
videoscribe | animated video creation | 1
vimeo | video sharing | 1
whatsapp | communications service | 1

appendix c: tools mentioned by one respondent, question 2, option other

forty-two respondents answered other to question 2: what technology tools have you used to enhance student engagement in your distance learning library sessions or courses during covid-19? each tool was used by only one respondent.

tool name | type of tool
articulate storyline | interactive e-learning modules
calendly | scheduling
camtasia | video editing and screen recording
canva quizzes | quizzes
google voice | communications service
h5p | programming language for websites
handout | (not a technology tool)
html/css | programming language for websites
knight lab tools | storytelling
lms discussion forums | discussions
microsoft powerpoint | presentation platform
microsoft word | word processor
nearpod | interactive lessons
parlay | discussions
qualtrics | surveys
remind | communications service
speakpipe | communications service
twine | storytelling
voicethread | video, voice, and text commenting
webex whiteboard | drawing tool

endnotes
1 charles hodges et al., “the difference between emergency remote teaching and online learning,” educause review (2020), https://er.educause.edu/articles/2020/3/the-difference-between-emergency-remote-teaching-and-online-learning.
2 hodges et al., “the difference.”
3 jody greene, “how (not) to evaluate teaching during a pandemic,” chronicle of higher education (2020), https://www-chronicle-com.colorado.idm.oclc.org/article/how-not-to-evaluate-teaching/248434.
4 laura czerniewicz, “what we learnt from ‘going online’ during university shutdowns in south africa,” philoned (2020), https://philonedtech.com/what-we-learnt-from-going-online-during-university-shutdowns-in-south-africa/.
5 for scholarship on equity and librarianship see joanne oud, “systematic workplace barriers for academic librarians with disabilities,” college & research libraries 80, no.
2 (2019), https://doi.org/10.5860/crl.80.2.169; amanda l. folk, “reframing information literacy as academic cultural capital: a critical and equity-based foundation for practice, assessment, and scholarship,” college & research libraries 80, no. 5 (2019), https://doi.org/10.5860/crl.80.5.658; scott seaman, carol krismann, and nancy carter, “salary market equity at the university of colorado at boulder libraries: a case study followup,” college & research libraries 64, no. 5 (2003), https://doi.org/10.5860/crl.64.5.390; freeda brook, dave ellenwood, and althea eannace lazzaro, “in pursuit of antiracist social justice: denaturalizing whiteness in the academic library,” library trends 64, no. 2 (2015), https://doi.org/10.1353/lib.2015.0048; isabel gonzalez-smith, juleah swanson, and azusa tanaka, “unpacking identity: racial, ethnic, and professional identity and academic librarians of color,” in the librarian stereotype: deconstructing perceptions and presentations of information work, ed. nicole pagowsky and miriam rigby (chicago: association of college and research libraries, 2014), 149–73. 6 tom riedel and paul betty, “real time with the librarian: using web conferencing software to connect to distance students,” journal of library & information services in distance learning 7, no. 1–2 (2013): 101, https://doi.org/10.1080/1533290x.2012.705616. 7 keith shaw, “colleges expand vpn capacity, conferencing to answer covid-19,” network world (online) (2020): 1. 8 monica anderson and andrew perrin, “nearly one-in-five teens can’t always finish their homework because of the digital divide,” pew research center fact tank news in the numbers, october 26, 2018, https://www.pewresearch.org/fact-tank/2018/10/26/nearlyone-in-five-teens-cant-always-finish-their-homework-because-of-the-digital-divide/. 9 julie arnold lietzau and barbara j. mann, “breaking out of the asynchronous box: suing web conferencing in distance learning,” journal of library & information services in distance learning 3, no. 3–4 (2009): 113, https://doi.org/10.1080/15332900903375291. 
10 aek phakiti, david hirsh, and lindy woodrow, "it's not only english: effects of other individual factors on english language learning and academic learning of esl international students in australia," journal of research in international education 12, no. 3 (2013): 248.
11 t. v. semenova and l. m. rudakova, "barriers to taking massive open online courses," russian education & society 58, no. 3 (2016): 242, https://doi.org/10.1080/10609393.2016.1242992.
12 xinghua wang, seng chee tan, and lu li, "technostress in university students' technology-enhanced learning: an investigation from multidimensional person-environment misfit," computers in human behavior 105 (2020): 2, https://doi.org/10.1016/j.chb.2019.106208.
13 "digital literacy," ala literacy clearinghouse, accessed may 16, 2021, https://literacy.ala.org/digital-literacy/.
14 steven j. bell and john shank, "the blended librarian: a blueprint for redefining the teaching and learning role of academic librarians," college & research libraries news 65, no. 7 (2004): 374, https://doi.org/10.5860/crln.65.7.7297.
15 vanessa w. vongkulluksn, kui xie, and margaret a. bowman, "the role of value on teachers' internalization of external barriers and externalization of personal beliefs for classroom technology integration," computers & education 118 (2018): 79, https://doi.org/10.1016/j.compedu.2017.11.009.
16 jesper aagaard, "breaking down barriers: the ambivalent nature of technologies in the classroom," new media & society 19, no. 7 (2017): 1138, https://doi.org/10.1177/1461444816631505.
17 wan ng, "can we teach digital natives digital literacy?" computers & education 59, no. 3 (2012): 1065, https://doi.org/10.1016/j.compedu.2012.04.016.
18 ng, "can we teach," 1071–72.
19 ng, "can we teach," 1072.
20 ellen helsper and rebecca eynon, "digital natives: where is the evidence?," british educational research journal 36, no. 3 (2010): 515, https://doi.org/10.1080/01411920902989227.
21 david ellis, "using padlet to increase student engagement in lectures," in proceedings of the 14th european conference on e-learning (ecel 2015), ed. amanda jefferies and marija cubric (reading, uk: academic conferences and publishing international limited): 195.
22 seyed abdollah shahrokni, "playposit: using interactive videos in language education," teaching english with technology 18, no. 1 (2018): 106.
23 jurgen schulte et al., "shaping the future of academic libraries: authentic learning for the next generation," college & research libraries 79, no. 5 (2018): 688, https://doi.org/10.5860/crl.79.5.685.
24 chiara faggiolani, "perceived identity: applying grounded theory in libraries," jlis.it: italian journal of library and information science 2, no. 1 (2011): 4592, https://doi.org/10.4403/jlis.it-4592.
25 "over 700 universities and colleges now use zoom!" zoom blog, july 15, 2013, https://blog.zoom.us/over-700-universities-and-colleges-now-use-zoom-video-conferencing/.
26 ng, "can we teach," 1071–72.

editorial board thoughts: who will use this and why? user stories and use cases
kevin m. ford
information technology and libraries | march 2019
kevin m. ford (kefo@loc.gov) is librarian, linked data specialist, library of congress.

perhaps i'm that guy. the one always asking for either a "user story" or a "use case," and sometimes both. they are tools employed in software or system engineering to capture how, and importantly why, actors (often human users, but not necessarily) interact with a system. both have protagonists, but one is more of a creative narrative, while the other reads like a strict, unvarnished retelling. user stories relate what an actor wants to do and why. use cases detail, to varying degrees, how that actor might go about realizing that desire. the concepts, though distinct, are often confused and conflated.
and, because they classify as jargon, the concepts have sometimes been employed outside of technology to capture what an actor needs, the path the actor takes to his or her objective, including any decisions that might be made along the way, and all of this effort is undertaken in order to identify the best solution. by giving the actors a starring role, user stories and use cases ensure focus is on the actors, their inputs, and the expected outcome. they protect against incorporating unnecessary elements, which could clutter and, even worse, weaken the end product, and they create a baseline understanding by which the result can be measured. and so i find myself frequently asking in meetings, and mumbling in hallways: “what’s the use case for that?” or “is there a user story? if not, then why are we doing it?” you get the idea. it’s a little ironic that i would become this person. not because i didn’t believe in user stories and use cases – quite the contrary, i’ve always believed in the importance and utility of them – but because of a book i was assigned during graduate coursework for my lis degree and my initial reaction. it’s not just an unassuming book, it has a downright boring appearance, as one might expect of a book entitled “use case modeling.”1 it’s a shocking 347 pages. it was a joint endeavor by two authors: kurt bittner and ian spence. i think i read it, but i can’t honestly recall. i assume i did because i was that type of student and i had a long chicago el commute at the time. in any case, i know beyond doubt that i was assigned this book, dutifully obtained it, and then picked it up, thumbed through it, rolled my eyes, and probably said, “ugh, really?” and that’s just it. the joke’s on me. the concepts, and as such the book, which i’ve moved across the country a couple of times, remain near-daily constants in my life. as a developer, i basically don’t do anything without a user story and a use case, especially one whose steps (including preconditions, alternatives, variables, triggers, and final outcome) haven’t been reasonably sketched out. “sketched out” is an interesting phrase because one would think that if entire books were being authored on the topic of use cases, for example, then use cases would be complicated and involved affairs. they can be, but they need not be. the same holds for user stories. imagine you were designing a cataloging system, here’s an example of the latter: as a librarian i want my student catalogers to be guided through selection of vocabulary terms to improve both their accuracy and speed.2 editorial board thoughts: who will use this and why? | ford 6 https://doi.org/10.6017/ital.v38i1.10979 that single-sentence user story identifies the actors (student catalogers), what they need (a “guided … selection of vocabulary terms”), and why (“to improve their accuracy and speed”). the use case would explore how the student catalogers (the actors) would interact with the system to realize that user story. the use case might be narrowly defined (“adding controlled terms to records”) or might be part of a broader use case (“cataloging records”), but in either instance the use case might go to some length to describe the interaction between the student catalogers and the system in order to generate a clear understanding of the various interactions. 
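to make the shape of such a use case concrete, here is a minimal sketch of how the student-cataloger scenario above might be written down as a structured record. the actor, steps, and field names are illustrative assumptions for this column, not details taken from the ld4l use case cited in the endnotes.

```python
# A minimal, illustrative sketch of the use case behind the user story above.
# The actor, steps, and field names are assumptions made for illustration;
# they are not taken from the cited LD4L use case.
use_case = {
    "name": "Adding controlled terms to records",
    "actor": "Student cataloger",
    "preconditions": [
        "Cataloger is authenticated in the cataloging system",
        "A bibliographic record is open for editing",
    ],
    "trigger": "Cataloger begins typing in a subject or genre field",
    "main_flow": [
        "System suggests matching terms from the configured controlled vocabulary",
        "Cataloger selects a suggested term",
        "System stores the term's label and its identifier (URI) in the record",
    ],
    "alternatives": [
        "No match found: cataloger flags the heading for review by a librarian",
    ],
    "outcome": "Record contains vocabulary terms entered faster and with fewer errors",
}

def summarize(uc: dict) -> str:
    """Render a one-line summary, e.g., for a planning-meeting agenda."""
    return f"{uc['actor']} -> {uc['name']}: {uc['outcome']}"

if __name__ == "__main__":
    print(summarize(use_case))
```

even an informal structure like this forces the questions a use case is meant to answer: who is acting, what must be true beforehand, what the main flow is, and what counts as success.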
by doing this, the use case helps to identify functional requirements and it clearly articulates user/system expectations, which can be reviewed by stakeholders before work begins and used to verify delivery of the final product. as i have presented this, using these tools might strike you as overly formal and time-consuming. in many circumstances they might be, if the developer has sufficient user and domain knowledge (rare, very, very rare) and especially if the “solution” is not an entirely new system but just an enhancement or augmentation to an existing system. yet, whether it is a completely new system being developed by someone who has long and profound experience with the domain or a simple enhancement, it may be worth entertaining the questions/process if even informally. i find it is often sufficient to ask “who will use this and why?” essentially i’m asking for the “user story” but dispensing with the jargon. doing so may lead to additional questions, the answers to which would likely check the boxes of a “use case” even if the effort is not identified as such, and it certainly ensures the user-driven nature and need of the request. this might all sound obvious, but i like to think of it as defensive programming, which is like defensive driving. yes, the driver coming up to the stop sign on my right is going to stop, but i take my foot off the gas and position it over the brake just in case. likewise, i’m confident the functional requirements i’m being handed have been fully considered and address a user need, but i’m going to ask for the user story anyway. i’m also leery of scope creep which, if i were to continue the driving analogy, would be equivalent to driving to one store because you need to, but then also driving to two additional stores for items you think might be good to have but for which you have no present need. it’s time-consuming, you’ve complicated your project, you’ve added expense to your budget, and the extra items might be of little or no use in the end. the number of times i’ve been in meetings in which new, additional features are discussed because the designers think it is a good idea (that is, there has been no actual user request or input sought) is alarmingly high. that’s when i pipe up, “is there a user story? if not, then why are we doing it?” user stories and use cases help focus any development project on those who stand to benefit, i.e. the project’s stakeholders, and can guard simultaneously against insufficient planning and software bloat. and the concepts, though most often thought of with respect to large-scale projects, apply in all circumstances, from the smallest feature request to an existing system to the redesign of a complex system. if you are not in the habit of asking, try it next time: who will use this and why? endnotes 1 kurt bittner and ian spence, use case modeling (boston: addison-wesley, 2003). also useful: alistair cockburn, writing effective use cases (boston: addison-wesley, 2001). information technology and libraries | march 2019 7 2 “use case 3.4: authority tool for more accurate data entry,” linked data for libraries (ld4l), accessed march 1, 2019, https://wiki.duraspace.org/display/ld4l/use+case+3.4%3a+authority+tool+for+more+accur ate+data+entry. digital collections are a sprint, not a marathon: adapting scrum project management techniques to library digital initiatives michael j. 
dulock and holley long information technology and libraries | december 2015 5 abstract this article describes a case study in which a small team from the digital initiatives group and metadata services department at the university of colorado boulder (cu-boulder) libraries conducted a pilot of the scrum project management framework. the pilot team organized digital initiatives work into short, fixed intervals called sprints—a key component of scrum. working for more than a year in the modified framework yielded significant improvements to digital collection work, including increased production of digital objects and surrogate records, accelerated publication of digital collections, and an increase in the number of concurrent projects. adoption of sprints has improved communication and cooperation between participants, reinforced teamwork, and enhanced their ability to adapt to shifting priorities. introduction libraries in recent years have freely adapted methodologies from other disciplines in an effort to improve library services. for example, librarians have • employed usability testing techniques to enhance users’ experience with digital libraries interfaces,1 improve the utility of library websites,2 and determine the efficacy of a visual search interface for a commercial library database;3 • adopted participatory design methods to identify information visualizations that could augment digital library services4 and determine user needs in new library buildings;5 and • utilized principles of continuous process improvement to enhance workflows for book acquisition and implementation of serial title changes in a technical services unit.6 librarians often come to the profession with disciplinary knowledge from an undergraduate degree unrelated to librarianship, so it should come as no surprise that they bring some of that disciplinary knowledge to their work. the interdisciplinary nature of librarianship also creates an environment that is amenable to adoption or adaptation of techniques from a variety of sources, not only those originating in library science. in this paper, the authors describe their experiences michael j. dulock (michael.dulock@colorado.edu) is assistant professor and metadata librarian, university of colorado boulder. holley long (longh@uncw.edu), previously assistant professor and systems librarian for digital initiatives at university of colorado, boulder, is digital initiatives librarian, randall library, university of north carolina wilmington. mailto:michael.dulock@colorado.edu mailto:longh@uncw.edu digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 6 in applying a modified scrum management framework to facilitate digital collection production. they begin by elucidating the fundamentals of scrum and then describes a pilot project using aspects of the methodology. they discuss the outcomes of the pilot and posit additional features of scrum that may be adopted in the future. 
fundamentals of scrum project management the scrum project management framework—one of several techniques under the rubric of agile project management—originated in software development, and has been applied in a variety of library contexts including the development of digital library platforms7 and library web applications.8 scrum’s salient characteristics include self-managing teams that organize their work into “short iterations of clearly defined deliverables” and focus on “communication over documentation.”9 the scrum primer: a lightweight guide to the theory and practice of scrum describes the roles, tools, and processes involved in this project management technique.10 scrum teams are cross-functional and consist of five to nine members who are cross-trained to perform multiple tasks. in addition to the team, two individuals serve specialized roles, scrum master and product owner. the scrum master is responsible for ensuring that scrum principles are followed and for removing any obstacles that hinder the team’s productivity. hence the scrum master is not a project manager, but a facilitator. the product owner’s role is to manage the product by identifying and prioritizing its features. this individual represents the stakeholders’ interests and is ultimately responsible for the product’s value. the team divides their work into short, fixed intervals called sprints that typically last two to four weeks and are never extended. at the beginning of each sprint, the team meets to select and commit to completing a set of deliverables. once these goals are set, they remain stable for the duration; course corrections can occur in later sprints. in software development, the scrum team aims to complete a unit of work that stands on its own and is fully functional, known as a potentially shippable increment. it is selected from an itemized list of product features called the product backlog. the backlog is established at the outset of development and consists of a comprehensive list of tasks that must occur to complete the product. a well-constructed backlog has four characteristics. first, it is prioritized with the features that will yield the highest return on investment at the top of the list. second, the backlog is appropriately detailed, so that the tasks at the top of the list are well-defined whereas those at the bottom may be more vaguely demarcated. third, each task receives an estimation for the amount of effort required to complete it, which helps the team to project a timeline for the product. finally, the backlog evolves in response to new developments. individual tasks may be added, deleted, divided, or reprioritized over the life of the project. during the course of a sprint, team members meet to plan the sprint, check-in on a daily basis, and then debrief at the conclusion of the sprint. they begin with a two-part planning meeting in which the product owner reviews the highest priority tasks with the team. in the second half of the meeting, the team and the scrum master determine how many of the tasks can be accomplished in information technologies and libraries |december 2015 7 the given timeframe, thus defining the goals for the sprint. this meeting generally lasts no longer than four hours for a two-week sprint. every day, the team holds a brief meeting to get organized and stay on track. 
during these “daily scrums,” each team member shares three pieces of information: what has been accomplished since the previous meeting, what will be accomplished before the next meeting, and what, if any, obstacles are impeding the work. these fifteen-minute meetings provide the team with a valuable opportunity to communicate and coordinate their efforts. sprints conclude with two meetings, a review and retrospective. during the review, the team inspects the deliverables that were produced during that sprint. the retrospective provides an opportunity to discuss the process, what is working well, and what needs to be adjusted. figure 1. typical meeting schedule for a two-week sprint evidence in the literature suggests that scrum improves both outcomes and process. one metaanalysis of 274 programming case studies found that implementing scrum led to improved productivity as well as greater customer satisfaction, product quality, team motivation, and cost reduction.11 proponents of this project management technique find that it leads to a more flexible and efficient process. scrum’s brief iterative work cycles and evolving product backlog promote adaptability so the team can address the inevitable changes that occur over the life of a project. by contrast, traditional project management techniques have been criticized for requiring too much time upfront on planning and being too rigid to respond to changes in later stages of the project.12 scrum also promotes communication over documentation,13 resulting in less administrative overhead as well as increased accountability and trust between team members. scrum pilot at university of colorado boulder libraries the university of colorado boulder (cu-boulder) libraries digital initiatives team was interested in adopting scrum because of its incremental approach to completing large projects, its focus on communication, and its flexibility. these attributes meshed well with the group’s goals to publish larger collections more quickly and to more effectively multitask the production of multiple high digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 8 priority collections. the group’s staffing model and approach to collection building prior to the scrum pilot is described here to provide some context for this choice of project management tool. digital collection proposals are vetted by a working group composed of ten members, the digital library management group (dlmg), to ensure that major considerations such as copyright status are fully investigated before undertaking the collection. approved proposals are prioritized by the appropriate collection manager as high, medium, or low and then placed in a queue for scanning and metadata provisioning. a core group of individuals generally works on all digital collections, including the metadata librarian, the digital initiatives librarian, and one or both of the digitization lab managers. additionally, the team frequently includes the subject specialist who nominated the collection for digitization, staff catalogers, and other library staff members whose expertise is required. at any given time, the queue may contain as many as fifteen collections, and the core team works on several of them concurrently to address the separate needs of participating departments. while this approach allows the teams to distribute resources more equitably across departments, progress on individual collections can be slower than if they are addressed one at a time. 
prior to implementing aspects of scrum, the team also completed the scanning and metadata records for every object in the collection before it was published. as a result, publication of larger collections trailed behind smaller collections. the details of digital collection production vary depending of the nature of the project, but the process usually follows the same broad outline. unless the entire collection will be digitized, the collection manager chooses a selection of materials on the basis of criteria such as research value, rarity, curatorial considerations, copyright status, physical condition, feasibility for scanning, and availability of metadata. photographs and paper-based materials are then evaluated by the preservation department to ensure that they are in suitable condition for scanning. likewise, the media lab manager evaluates audio and video media for condition issues such as sticky shed syndrome, which will affect digitization.14 depending on format, the material is then digitized by the digitization lab manager or the media lab manager and their student assistants according to locally established workflows that conform to nationally recognized best practices. once digitized, student assistants apply post-processing procedures as appropriate and required, such as running ocr (optical character recognition) software to convert images to text or equalizing levels on an audio file. the lab managers then check the files for quality assurance and move the files to the appropriate location on the server. the metadata librarian creates a metadata template appropriate to the material being digitized by using industry standards such as visual resources association core (vra core), metadata object description schema (mods), pbcore, and dublin core (dc). metadata creation methods depend on the existence of legacy metadata for the analog materials and in what format legacy metadata is contained. the metadata librarian, along with his staff and/or student assistants, adapts legacy metadata into a format that can be ingested by the digital library software or creates records directly in the software when there is no legacy metadata. metadata is formatted or created in accordance with existing input standards such as cataloging cultural objects (cco) and resource description and access (rda), and it is enhanced information technologies and libraries |december 2015 9 as much as possible using controlled vocabularies such as the art and architectural thesaurus (aat) and library of congress subject headings. the metadata librarian performs quality assurance on the metadata records during creation and before the collection is published. in the final stages, the collection is created in the digital library software, at which time search and display options are established: thumbnail labels, default collection sorting, faceted browsing fields, etc. then the files and metadata are uploaded and published online. the highlight of the cu-boulder digital library is the twenty-seven collections drawn from local holdings in archives, special collections department, music library, and earth sciences and map library, among others. the library also contains purchased content and “luna commons” collections created by institutions that use the same digital library platform, for a total of more than 185,000 images, texts, maps, audio recordings, and videos. 
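as a rough illustration of the metadata templating step described above, the following sketch maps a row of legacy inventory data into a simple dublin core record and runs a basic quality-assurance check. the field names, the required-element list, and the sample data are assumptions made for illustration; they are not the cu-boulder team's actual templates or standards profile.

```python
# A minimal sketch of a metadata templating step: mapping legacy descriptive data
# into a simple Dublin Core record and checking required fields before ingest.
# Field choices, the required-field list, and the sample row are illustrative assumptions.
from typing import Dict, List

REQUIRED_DC_FIELDS = ["title", "date", "type", "rights"]  # assumed local policy

def legacy_to_dublin_core(legacy: Dict[str, str]) -> Dict[str, List[str]]:
    """Map a legacy inventory row (e.g., from a spreadsheet) to Dublin Core elements."""
    record = {
        "title": [legacy.get("item_title", "").strip()],
        "creator": [legacy.get("photographer", "").strip()],
        "date": [legacy.get("date_taken", "").strip()],
        "subject": [s.strip() for s in legacy.get("topics", "").split(";") if s.strip()],
        "type": ["StillImage"],
        "rights": [legacy.get("rights_statement", "").strip()],
    }
    # Drop empty elements rather than publishing blank fields.
    return {k: v for k, v in record.items() if any(v)}

def missing_required(record: Dict[str, List[str]]) -> List[str]:
    """Quality-assurance check: which required elements are absent?"""
    return [f for f in REQUIRED_DC_FIELDS if f not in record]

if __name__ == "__main__":
    row = {"item_title": "Columbine Mine, 1927", "date_taken": "1927",
           "topics": "Coal mines; Labor strikes", "rights_statement": "Public domain"}
    dc = legacy_to_dublin_core(row)
    print(dc)
    print("missing:", missing_required(dc))  # empty here; flags gaps on sparser legacy rows
```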
the following four collections were created during the scrum pilot and illustrate the types of materials available in the cuboulder digital library: the colorado coal project consists of video and audio interviews, transcripts, and slides collected between 1974 and 1979 by the university of colorado coal project. the project was funded by the colorado humanities program and the national endowment for the humanities to create an ethnographic record of the history of coal mining in the western united states from immigration and daily life in the coal camps to labor conditions and strikes, including ludlow (1913–14) and columbine (1927). the mining maps collection provides access to scanned maps of various mines, lodes, and claims in colorado from the late 1800s to the early 1900s. these maps come from a variety of creators, including private publishers and us government agencies. digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 10 the vasulka media archive showcases the work of pioneering video artists steina and woody vasulka and contains some of their cutting-edge studies in video that experiment with form, content, and presentation. steina, an icelander, educated in music at the prague conservatory of music, and woody, a graduate of prague's film academy, arrived in new york city just in time for the new media explosion. they brought with them their experience of the european media awakening, which helped them blend seamlessly into the youth media revolution of the late sixties and early seventies in the united states. the 3d natural history collection comprises one hundred archaeology and paleontology specimens from the rocky mountain and southwest regions, including baskets, moccasins, animal figurines, game pieces, jewelry, tools, and other everyday objects from the freemont, clovis, and ancestral puebloan cultures as well as a selection of vertebrate, invertebrate, and track paleontology specimens from the mesozoic through the cenozoic eras (250 ma to the present). the diffusion of effort across multiple collections and a slower publication rate for larger collections offered opportunities for improvement. after attending a conference session on scrum project management for web development projects, one of the team members recognized scrum’s potential to improve production processes since the technique divides large projects into manageable subtasks that can be accomplished in regular, short intervals.15 this approach would allow the team to switch between different high priority collections at regularly defined intervals to facilitate steady progress on competing priorities. working in sprints would also make it easier to publish smaller portions of a large collection at regular intervals. thus scrum held the potential to increase the production rate for larger collections and make the team’s progress more transparent to users and colleagues. in april 2013, a small team of cu-boulder librarians and staff initiated a pilot to assess the effect on processes and outcomes for digital collection production. 
rather than involving individuals from all affected units, regardless of their level of engagement in a particular project, the scrum pilot was limited to the three individuals who were involved in most, if not all, of the projects information technologies and libraries |december 2015 11 undertaken: the digital initiatives librarian, metadata librarian, and digitization lab manager.16 by including these three individuals, the major functions of metadata provision, digitization, and publication were covered in the trial with no disruption to the existing workflows or organizational structures. selecting this group also ensured that scrum would be tested in a broad range of scenarios and on collections from several different departments. to begin, the team met to review the scrum project management framework and considered how best to pilot the technique. taking a pragmatic approach, they only adopted those aspects of scrum that were deemed most likely to result in improved outcomes. if the pilot were successful, other aspects of scrum could be incrementally incorporated later. the group discussed how scrum roles, processes, and tools could be adapted to digital collection workflows and determined that sprints would likely have the highest return on investment. they also chose to adapt and hybridize certain aspects of the planning meeting and daily scrum to achieve goals that were not being met by other existing meetings. sprint planning and end meetings were combined so that all three participants knew what each had completed and what was targeted for the next sprint. select activities of sprint planning and end meetings were already a part of the monthly dlmg meetings, making additional sprint meetings redundant. daily scrum meetings were excluded as the team felt that daily meetings would not produce enough benefit to justify the costs. in addition, two of the three participants have numerous responsibilities that lie outside of projects subject to the scrum pilot, so each person does not necessarily perform scrum-related work every day. however, the short meeting time was adopted into the planning/end meeting, as were elements of the three core questions of the daily scrum meeting, with some modifications. the questions addressed in the biweekly meetings are: what have you done since the last meeting? what are you planning for the next meeting? what impediments, if any, did you encounter during the sprint? the latter question was sometimes addressed mid-sprint through emails, phone calls, or one-off meetings that include a larger or different group of stakeholders. the team adopted the two-week duration typical of scrum sprints for the pilot. this has proven to be a good medium-term timeframe. it was short enough that the team could adjust priorities quickly, but long enough to complete significant work. the team chose to combine the sprint planning and sprint review meetings into a single meeting. part of the motivation for a trial of the scrum technique was to minimize additional time away from projects while maximizing information transfer during the meetings. a single biweekly planning/review meeting was determined to be sufficient to report accomplishments and set goals yet substantial and free of irrelevant content without being overly burdensome as “yet another meeting.” at each sprint meeting, each participant reported on results from the previous sprint. work that was completed allowed the next phase of a project to proceed. 
based on the results of the last sprint, each team member set measurable goals that could be realistically met in the next twoweek sprint. there has been a concerted effort to keep the meetings short, limited to about twenty digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 12 to twenty-five minutes. to enforce this habit, the sprint meetings were scheduled to begin twenty minutes before other regularly scheduled meetings for most or all of the participants. this helped keep participants on-topic and reinforced the transfer-of-information aspect of the meetings, with minimal leeway for extraneous topics. reflection the modified scrum methodology described above has been in place for more than a year. there have been several positive outcomes resulting from this practice. beginning with the most practical, production has become more regular than it was before scrum was implemented. the nature of digital initiatives in this environment dictates that many projects are in progress at once, in various stages of completion. the production work, such as digitizing media or creating metadata records, has become more consistent and regular. instead of production peaks and valleys, there is more of a straight line as portions of projects are finished and others come online. this in turn has resulted in faster publication of collections. in 2013, the team published six new collections, twice as many as the previous year. the ability to put all hands on deck for a project for a two-week period can increase productivity. since sprints allow for short, concentrated bursts of work on a single project, smaller projects can be completed in a few sprints and larger projects can be divided into “potentially shippable units” and thus published incrementally. another benefit of scrum is that the variability of the two-week sprint cycle allows the team to work on more collections concurrently. for example, during a given sprint, scanning is underway for one collection, a metadata template is being constructed for another, the analog material in a third is being examined for pre-scanning preservation assessment, and a fourth collection is being published. while this type of multitasking occurred before the team piloted sprints, the scrum project management framework lends more structure and coordination to the various team members’ efforts. collection building activities can be broken down into subtasks that are accomplished in nonconsecutive sprints without undercutting the team’s concerted efforts. as a result, the team can juggle competing priorities much more effectively. the team is working with multiple stakeholders at any given time, each of whom may have several projects planned or in progress. as focus shifts among stakeholders and their respective projects, the scrum team is able to adjust quickly to align with those priorities, even if only for a single sprint. this also makes it easier to respond to emerging requests or address small, focused projects on the basis of events such as exhibits or course assignments. additional benefits of the scrum methodology pertain to communication and work style among the three scrum participants. the frequent, short meetings are densely packed and highly focused. each person has only a few minutes to describe what has been accomplished, explain problems encountered, troubleshoot solutions, and share plans for the next sprint. 
the return on the time investment of twenty minutes every two weeks is significant—there is no time to waste on issues that do not pertain directly to the projects underway, just completed, or about to start. a further result is that the group’s sense of itself as a team is enhanced. as stated above, the three scrum information technologies and libraries |december 2015 13 participants do not all work in the same administrative unit within the library. though they shared frequent communication by email as projects progressed, regular sprint meetings have fostered a closer sense of team. the participants know from sprint to sprint what the others are doing; they can assist one another with problems face-to-face and coordinate with one another so that work segments progress toward production in a logical sequence. with more than a year of experience with scrum, the pilot team has determined that several aspects of the methodology have worked well in our environment. in general, the sprint pattern fits well with existing operating modes. the monthly dlmg meeting, which includes a large and diverse group, provides an opportunity to discuss priorities, review project proposals, establish standards, and make strategic decisions. the bi-weekly sprint meetings dovetail nicely, with one meeting taking place at a midpoint between dlmg meetings, and one just prior to dlmg meetings. this allows the three scrum participants to focus on strategic items during the dlmg meeting but keep a close eye on operational items in between. the scrum methodology has also accommodated the competing priorities that the three participants must balance on an ongoing basis. there is considerable variation between participants in terms of roles and responsibilities, but the division of work into sprints has given the team greater opportunity to fit production work in with other responsibilities, such as supervision and training; scholarly research and writing; service performed for disciplinary organizations; infrastructure building; and planning, research, and design work for future projects. the two-week sprint duration is a productive time interval during which the team can set and reach incremental goals, whether that is starting and finishing a small project on short notice, making a big push on a large-scale project, or continuing gradual progress on a large, deliberatelypaced initiative. the brief meetings ensure that participants focus on the previous sprint and the upcoming sprint. there is usually just enough time to discuss accomplishments, goals, and obstacles, with some time left to troubleshoot as necessary. the meeting schedule and structure allows each individual to set his or her own goals so that he or she can make maximum progress during the sprint. this in turn feeds into accountability. there is always an external check on one’s progress—the next meeting comes up in two weeks, creating an effective deadline (which also sometimes corresponds to a project deadline). it becomes easier to stay on task and keep goals in sight with the sprint report looming in a matter of days. at the same time, scrum helps to define each person’s role and clarifies how roles align with each other. some tasks are completely independent, while others must be done in sequence and depend on another’s work. the sprint schedule allows large, complex projects to be divided into manageable pieces so that each sprint can result in a sense of accomplishment, even if it may require many sprint cycles to actually complete a project. 
this is especially true for large digital initiatives. for instance, completing the entire project may take a year, but subsets of a collection may be published in phases at more frequent intervals in the meantime. digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 14 summary of benefits ● enhanced ability to manage multiple concurrent projects ● published large collections incrementally, increasing responsiveness to users and other stakeholders ● improved team building ● increased communication and accountability among team members future considerations based on these outcomes, the team can safely say that it met its objectives for the test pilot. one of the reasons that it was feasible to try this when the participants were already highly committed is that the pilot used a small portion of the scrum methodology and was not too rigid in its approach. the team felt that a hybrid of the scrum planning and scrum review meeting held twice a month would provide the benefits without overburdening schedules with additional meetings. there were also plans to have a virtual email check-in every other week to loosely achieve the goals of the daily scrum meeting, that is, to improve communication and accountability. the email check-in fell by the wayside; the team found it wasn’t necessary because there were already adequate opportunities to check-in with each other over the course of a two-week sprint. the team has found the sprints and modified scrum meetings to be highly useful and relatively easy to incorporate into their workflows. the next phase of the pilot will implement product backlogs and burn down charts, diagrams showing how much work remains for the team in a single sprint, with the goal of tracking collections’ progress at the item level through each step of the planning, selection, preservation assessment, digitization, metadata provisioning, and publication workflows. figure 2. hypothetical backlog for the first sprint of a digital collection17 information technologies and libraries |december 2015 15 scrum backlogs are arranged on the basis of a task’s perceived benefit for customers. to adapt backlogs for digital collection production work, the backlog task list’s order will instead be based in part on the workflow sequence. for example, pieces from the physical collection must be selected before preservation staff can assess them. additionally, the backlog items will be sequenced according to the materials’ research value or complexity. for instance, the digitization of a folder of significant correspondence from an archival collection would be assigned a higher priority in the backlog than the digitization of newspaper clippings of minor importance from the same collection. or, materials that are easy to scan would be listed in the backlog ahead of fragile or complex items that require more time to complete. this will allow the team to publish the most valuable items from the collection more quickly. according to scrum best practices, backlogs are also appropriately detailed. in the context of digital collection production work, collections’ backlogs would begin with a standard template of high-level activities: materials’ selection, copyright analysis, preservation assessment, digitization, metadata creation, and publication. as the team progresses through backlog items, they will become increasingly detailed. backlogs also evolve. 
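a minimal sketch of what such a collection backlog might look like in practice appears below: items carry a workflow phase, a research-value rating, and an effort estimate, so the list can be prioritized and the remaining effort plotted on a burn down chart. the tasks, ratings, and hour estimates are invented for illustration and are not the pilot team's actual backlog.

```python
# A minimal sketch of a digital-collection backlog ordered first by workflow
# sequence and then by research value, with effort estimates that support a
# simple burn-down calculation. All values here are invented for illustration.
from dataclasses import dataclass

WORKFLOW_ORDER = ["selection", "copyright", "preservation", "digitization", "metadata", "publication"]

@dataclass
class BacklogItem:
    phase: str           # one of WORKFLOW_ORDER
    description: str
    research_value: int  # 1 (low) to 5 (high), set by the collection manager
    effort_hours: float
    done: bool = False

def prioritized(backlog):
    """Workflow sequence first, then higher research value first."""
    return sorted(backlog, key=lambda i: (WORKFLOW_ORDER.index(i.phase), -i.research_value))

def remaining_effort(backlog) -> float:
    """The number a burn-down chart would plot at the end of each sprint."""
    return sum(i.effort_hours for i in backlog if not i.done)

if __name__ == "__main__":
    backlog = [
        BacklogItem("digitization", "scan correspondence folder 1", 5, 12),
        BacklogItem("digitization", "scan newspaper clippings", 2, 8),
        BacklogItem("metadata", "build mods template", 4, 6),
        BacklogItem("selection", "pull fragile oversize maps", 3, 4),
    ]
    for item in prioritized(backlog):
        print(f"{item.phase:12} {item.description:32} value={item.research_value} est={item.effort_hours}h")
    print("remaining effort:", remaining_effort(backlog), "hours")
```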
scrum’s ability to respond to change has been one of its strongest assets in this environment and therefore the backlog’s ability to evolve will make it a valuable addition to the team’s process. for example, materials that a collection manager uncovers and adds to the project late in the process can be easily incorporated into the backlog or materials in the collection that are needed to support an upcoming instruction session can be moved up in the backlog for the next sprint. in this way, the backlog will support the team’s goal to nimbly respond to shifting priorities and emerging opportunities. figure 3. hypothetical burn down chart18 digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 16 the final relevant feature of a backlog, the “effort estimates,” taken in conjunction with the burn down chart will help the team develop better metrics for estimating the time and resources required to complete a collection. when items are added to the backlog, team members estimate the amount of effort needed to complete it. the burn down chart illustrates how much work remains and, in general practice, is updated on a daily basis. given that the team has truncated the scrum meeting schedule, this may occur on a weekly basis, but will nonetheless benefit the team in several ways. initially, it will keep the team on track and provide valuable and detailed information for stakeholders on the collections’ progress. as the team accrues old burn down charts from completed collections, they can use the data to hone their ability to estimate the amount of time and resources needed to complete a given project. conclusion through the pilot conducted for digital initiatives at cu-boulder libraries, application of aspects of the scrum project management framework has demonstrated significant benefits with no discernable downside. adoption of sprint planning and end meetings resulted in several positive outcomes for the participants. digital collection production has become more regular; work can be underway on more collections simultaneously; and collections are, on average, published more quickly. in addition, communication and cooperation among the sprint pilot participants have increased and strengthened the sense of teamwork among them. the sprint schedule has blended well with existing digital initiatives meetings and workflows, and has enhanced the team’s ability to handle ever-shifting priorities. additional aspects of scrum, such as product backlogs and burn down charts, will be incorporated into the participants’ workflows to allow them to better track the work done at the item level, provide more detailed information for stakeholders during the course of a project, and predict how much time and effort will be required for future projects. the positive results of this pilot demonstrate the benefits to be gained by looking outside standard library practice and adopting techniques developed in another discipline. given the range of activities performed in libraries, the possibilities to improve workflows and increase efficiency are limitless as long as those doing the work keep an open mind and a sharp eye out for methodologies that could ultimately benefit their work, and in turn, their users. references 1. sueli mara ferreira and denise nunes pithan, “usability of digital libraries,” oclc systems & services: international digital library perspectives 21, no. 4 (2005): 316, doi: 10.1108/10650750510631695. 2. danielle a. 
becker and lauren yannotta, “modeling a library web site redesign process: developing a user-centered web site through usability testing,” information technology & libraries 32, no. 1 (2013): 11, doi: 10.6017/ital.v32i1.2311. 3. jodi condit fagan, “usability testing of a large, multidisciplinary library database: basic search and visual search,” information technology & libraries 25 no. 3 (2006): 140–41, 10.6017/ital.v25i3.3345. http://dx.doi.org/10.1108/10650750510631695 http://dx.doi.org/10.6017/ital.v32i1.2311 http://dx.doi.org/10.6017/ital.v25i3.3345 information technologies and libraries |december 2015 17 4. panayiotis zaphiris, kulvinder gill, terry h.-y. ma, stephanie wilson and helen petrie, “exploring the use of information visualization for digital libraries,” new review of information networking 10, no. 1 (2004): 58, doi: 10.1080/1361457042000304136. 5. benjamin meunier and olaf eigenbrodt, “more than bricks and mortar: building a community of users through library design,” journal of library administration 54 no. 3 (2014): 218–19, 10.1080/01930826.2014.915166. 6. lisa a. palmer and barbara c. ingrassia, “utilizing the power of continuous process improvement in technical services,” journal of hospital librarianship 5 no. 3 (2005): 94–95, 10.1300/j186v05n03_09. 7. javier d. fernández et al., “agile dl: building a delos-conformed digital library using agile software development,” in research and advanced technology for digital libraries, edited by birte christensen-dalsgaard et al. (berlin: springer-verlag, 2008), 398–9, doi: 10.1007/978-3540-87599-4_44. 8. michelle frisque, “using scrum to streamline web applications development and improve transparency” (paper presented at the 13th annual lita national forum, atlanta, georgia, september 30–october 3, 2010). 9. frank h. cervone, “understanding agile project management methods using scrum,” oclc systems & services 27, no. 1 (2011): 19, 10.1108/10650751111106528. 10. pete deemer, gabrielle benefield, craig larman, and bas vodde, “the scrum primer: a lightweight guide to the theory and practice of scrum," (2012), 3-15, www.infoq.com/minibooks/scrum_primer. 11. eliza s. f. cardozo et al., “scrum and productivity in software projects: a systematic literature review” (paper presented at the 14th international conference on evaluation and assessment in software engineering (ease), 2010), 3. 12. cervone, “understanding agile project management,” 18. 13. ibid., 19. 14. sticky shed syndrome refers to the degradation of magnetic tape where the binder separates from the carrier. the binder can then stick to the playback equipment rendering the tape unplayable. 15. frisque, “using scrum.” 16. the media lab manager responsible for audio and video digitization did not participate because his lab offers fee-based services to the public and thus has long-established business processes in place that would not have blended easily with sprints. 17. figure 2 is based on illustration created by mountain goat software, “sprint backlog,” https://www.mountaingoatsoftware.com/agile/scrum/sprint-backlog. 18. figure 3 is adapted from template created by expert project management, “burn down chart template,” www.expertprogrammanagement.com/wpcontent/uploads/templates/burndown.xls. 
http://dx.doi.org/10.1080/1361457042000304136 http://dx.doi.org/10.1080/01930826.2014.915166 http://dx.doi.org/10.1300/j186v05n03_09 http://dx.doi.org/10.1007/978-3-540-87599-4_44 http://dx.doi.org/10.1007/978-3-540-87599-4_44 http://dx.doi.org/10.1108/10650751111106528 https://www.mountaingoatsoftware.com/agile/scrum/sprint-backlog http://www.expertprogrammanagement.com/wp-content/uploads/templates/burndown.xls http://www.expertprogrammanagement.com/wp-content/uploads/templates/burndown.xls application level security in a public library: a case study richard thomchick and tonia san nicolas-rocca information technology and libraries | december 2018 107 richard thomchick (richardt@vmware.com) is mlis, san josé state university. tonia san nicolas-rocca (tonia.sannicolas-rocca@sjsu.edu) is assistant professor in the school of information at san josé state university. abstract libraries have historically made great efforts to ensure the confidentiality of patron personally identifiable information (pii), but the rapid, widespread adoption of information technology and the internet have given rise to new privacy and security challenges. hypertext transport protocol secure (https) is a form of hypertext transport protocol (http) that enables secure communication over the public internet and provides a deterministic way to guarantee data confidentiality so that attackers cannot eavesdrop on communications. https has been used to protect sensitive information exchanges, but security exploits such as passive and active attacks have exposed the need to implement https in a more rigorous and pervasive manner. this report is intended to shed light on the state of https implementation in libraries, and to suggest ways in which libraries can evaluate and improve application security so that they can better protect the confidentiality of pii about library patrons. introduction patron privacy is fundamental to the practice of librarianship in the united states (u.s.). libraries have historically made great efforts to ensure the confidentiality of personally identifiable information (pii), but the rapid, widespread adoption of information technology and the internet have given rise to new privacy and security challenges. the usa patriot act, the rollback of the federal communications commission rules prohibiting internet service providers from selling customer browsing histories without the customer’s permission, along with electronic surveillance efforts by the national security agency (nsa) and other government agencies, have further intensified privacy concerns about sensitive information that is transmitted over the public internet when patrons interact with electronic library resources through online systems such as an online public access catalog (opac). 1 hypertext transport protocol secure (https) is a form of hypertext transport protocol (http) that enables secure communication over the public internet and provides a deterministic way to guarantee data confidentiality so that attackers cannot eavesdrop on communications. https has been used to protect sensitive information exchanges (i.e., e-commerce transactions, user authentication, etc.). in practice, however, security exploits such as man-in-the-middle attacks have demonstrated the relative ease with which an attacker can transparently eavesdrop on or hijack http traffic by targeting gaps in https implementation. 
there is little or no evidence in the literature that libraries are aware of the associated vulnerabilities, threats, or risks, or that researchers have evaluated the use of https in library web applications. this report is intended to shed light on the state of https implementation in libraries, and to suggest ways in which libraries can evaluate and improve application security so that they can better protect the mailto:richardt@vmware.com mailto:tonia.sannicolas-rocca@sjsu.edu application level security in a public library |thomchick and san nicolas-rocca 108 https://doi.org/10.6017/ital.v37i4.10405 confidentiality of pii about library patrons. the structure of this paper is as follows. first, we review the literature on privacy as it pertains to librarianship and cybersecurity. we then describe the testing and research methods used to evaluate https implementation. a discussion on the results of the findings is presented. finally, we explain the limitations and suggest future research directions. literature review the research begins with a survey of the literature on the topic of confidentiality as it pertains to patron privacy; the impact of information technology on libraries; and the use of https as a security control to protect the confidentiality of patron data when it is transmitted over the public internet. while there is ample literature on the topic of patron privacy, there appears to be a lack of empirical studies that measure the use of https to protect the privacy of data transmitted to and from patrons when they use library web applications.2 the primal importance of patron privacy patron privacy has long been one of the most important principles of the library profession in the u.s. as early as 1939, the code of ethics for librarians explicitly stated, “it is the librarian’s obligation to treat as confidential any private information obtained through contact with li brary patrons.”3 the concept of privacy as applied to personal and circulation data in library records began to appear in the library literature not long after the passage of the u.s. privacy act of 1974.4 today, the american library association (ala) regards privacy as “fundamental to the ethics and practice of librarianship,” and has formally adopted a policy regarding the confidentiality of personally identifiable information (pii) about library users, which asserts, “confidentiality exists when a library is in possession of personally identifiable information about users and keeps that information private on their behalf.”5 this policy affirms language from the ala code of ethics, and states that “confidentiality extends to information sought or received and resources consulted, borrowed, acquired or transmitted including database search records, reference questions and interviews, circulation records, interlibrary loan records, information about materials downloaded or placed on ‘hold’ or ‘reserve,’ and other personally identifiable information about uses of library materials, programs, facilities, or services.” 6 with the advent of new technologies used in libraries to support information discovery, more challenges arise to protect patron privacy.7 the impact of information technology on patron privacy researchers have studied the impact of information technology on patron privacy for several decades. 
early research by harter and machovec discussed the data privacy challenges arising from the use of automated systems in the library, and the associated ethical considerations for librarians who create, view, modify, and use patron records.8 fouty addressed issues regarding the privacy of patron data contained in library databases, arguing that online patron records provide more information about individual library users, more quickly, than traditional paperbased files.9 agnew and miller presented a hypothetical case involving the transmission of an obscene email from a library computer, and an ensuing fbi inquiry, as a method of examining privacy issues that arise from patron internet use at the library.10 in addition, merry pointed to the potential for violations of patron privacy brought about by tracking of personal information attached to electronic text supplied by publishers.11 information technology and libraries | december 2018 109 the consensus from the literature, as articulated by fifarek, is that technology has given rise to new privacy challenges, and that the adoption of technology in the library has outpaced efforts to maintain patron privacy.12 this sentiment was echoed and amplified by john berry, former ala president, who commented that there are “deeper issues that arise from the impact of converting information to digitized, online formats” and critiqued the library profession for having “not built protections for such fundamental rights as those to free expression, privacy, and freedom.”13 ala affirmed these findings and validated much of the prevailing research in a report from the library information technology association, which concluded, “user records have also expanded beyond the standard lists of library cardholders and circulation records as libraries begin to use electronic communication methods such as electronic mail for reference services, and as they provide access to computer, web and printing use.”14 in more recent years, library systems have made increasing use of network communication protocols such as http and focus of the literature has shifted towards internet technologies in response to the growth of trends such as cloud computing and web 2.0. mavodza characterizes the relevance of cloud computing as “unavoidable” and expounds on the ways in which software-as-aservice (saas), platform as a service (paas), and infrastructure as a service (iaas) and other cloud computing models “bring to the forefront considerations about . . . information security [and] privacy . . . that the librarian has to be knowledgeable about.”15 levy and bérard caution that nextgeneration library systems and web-based solutions are “a breakthrough but need careful scrutiny” of security, privacy, and related issues such as data provenance (i.e., where the information is physically stored, which can potentially affect security and privacy compliance requirements). 16 protecting patron privacy in the “library 2.0” era “library 2.0” is an approach to librarianship that emphasizes engagement and multidirectional interaction with library patrons. 
although this model is “broader than just online communication and collaboration” and “encompasses both physical and virtual spaces,” there can be no doubt that “library 2.0 is rooted in the global web 2.0 discussion,” and that libraries have made increasing use of web 2.0 technologies to engage patrons.17 the library 2.0 model disrupts many traditional practices for protecting privacy, such as limited tracking of user activity, short-term data retention policies, and anonymous browsing of physical materials. instead, as zimmer states, “the norms of web 2.0 promote the open sharing of information—often personal information—and the design of many library 2.0 services capitalize on access to patron information and might require additional tracking, collection, and aggregation of patron activities.”18 as ala cautioned in their study on privacy and confidentiality, “libraries that provide materials over websites controlled by the library must determine the appropriate use of any data describing user activity logged or gathered by the web server software.”19 the dilemma facing libraries in the library 2.0 era, then, is how to appropriately leverage user information while maintaining patron privacy. many library systems require users to validate their identity through the use of a username, password, pin code, or another unique identifier for access to their library circulation records and other personal information.20 however, several studies suggest the authentication process itself spawns a trail of personally identifiable information about library patrons that must be kept confidential.21 there is discussion in the literature about the value of using https and ssl certificates to protect patron privacy and build a high level of trust with users, and general awareness about importance of encrypting communications that involve sensitive information, such as “payment for fines and fees via the opac” or when “patrons are required to enter personal application level security in a public library |thomchick and san nicolas-rocca 110 https://doi.org/10.6017/ital.v37i4.10405 details such as addresses, phone numbers, usernames, and/or passwords.”22 however, as breeding observed, many opacs and other library automation software products “don't use ssl by default, even when processing these personalization features.” 23 these observations call library privacy practices into question, and are concerning since “hackers have identified library ilss as vulnerable, especially when libraries do not enforce strict system security protocols.” 24 one of the challenges facing libraries is the perception that “a library's basic website and online catalog functions don't need enhanced security.”25 as a matter-of-fact, one of the most common complaints against https implementation in libraries has been: “we don’t serve any sensitive information.”26 these beliefs may be based on the historical practice of using https selectively to secure “sensitive” information and operations such as user authentication. but in recent years, it has become clear that selective https implementation is not an adequate defense. the electronic frontier foundation (eff) cautions, “some site operators provide only the login page over https, on the theory that only the user’s password is sensitive. these sites’ users are vulnerable to passive and active attacks.”27 passive attacks do not alter systems or data. during a passive attack, a hacker will attempt to listen in on communications over a network. 
eavesdropping is an example of a passive attack.28 active attacks alter systems or data. during this type of attack, a hacker will attempt to break into a system to make changes to transmitted or stored data, or introduce data into the system. examples of active attacks include man-in-the-middle, impersonation, and session hijacking.29 http exploits web servers typically generate unique session token ids for authenticated users and transmit them to the browser, where they are cached in the form of cookies. session hijacking is a type of attack that “compromises the session token by stealing or predicting a valid session token to gain unauthorized access to the web server,” often by using a network sniffer to capture a valid session id that can be used to gain access to the server.30 session hijacking is not a new problem, but the release of the firesheep attack kit in 2010 increased awareness about the inherent insecurity of http and the need for persistent https.31 in the wake of firesheep’s release and several major security breaches, senator charles schumer, in a letter to yahoo!, twitter, and amazon, characterized http as a “welcome mat for would-be hackers” and urged the technology industry to implement better security as quickly as possible.32 these and other events prompted several major site operators, including google, facebook, paypal, and twitter, to switch from partial to pervasive https. today these sites transmit virtually all web application traffic over https. security researchers from these companies, as well as from several standards organizations such as electronic frontier foundation (eff), internet engineering task force (ietf), and open web application security project have shared their experiences and recommendations to help other website operators implement https effectively.33 these include encrypting the entire session, avoiding mixed content, configuring cookies correctly, using valid ssl certificates, and enabling hsts to enforce https. testing techniques used to evaluate https implementation there is little or no evidence in the literature that libraries are aware of the associated vulnerabilities, threats, or risks, or that researchers have evaluated the use of https in library web applications. however, there are many methods that libraries can use to evaluate https and information technology and libraries | december 2018 111 ssl/tls implementation, including automated software tools and heuristic evaluations. these methods can be combined for deeper analysis. automated software tools among the most widely used automated analysis software tools is ssl server test from qualys ssl labs. this online service “performs a deep analysis of the configuration of any ssl web server on the public internet” and provides a visual summary as well as detailed information about authentication (certification and certificate chains) and configuration (protocols, key strength, cipher suites, and protocol details).34 users can optionally post the results to a central “board” that acts as a clearinghouse for identifying “insecure” and “trusted” sites. another popular tool is sslscan, a command-line application that, as the name implies, quickly “queries ssl services, such as https, in order to determine the ciphers that are supported.”35 however, these tools are limited in that they only report specific types of data and do not provide a holistic view of https implementation. 
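as a concrete illustration of the automated tools just described, the commands below show how a library might run such checks against its own catalog host from a unix shell. the hostname catalog.example.org is a placeholder rather than a site examined in this study, and output will vary with the installed tool version.

    # enumerate the protocols and ciphers offered by an opac host
    sslscan catalog.example.org

    # the qualys ssl labs analysis can also be requested non-interactively
    # through its public api, which returns the scan results as json
    curl "https://api.ssllabs.com/api/v3/analyze?host=catalog.example.org"

either report can then be compared against the configuration flaws discussed later in this article, such as support for retired ssl versions or missing tls 1.2 support.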
heuristic evaluations in addition to automated software tools, librarians can also use heuristic evaluations to manually inspect the gray areas of https implementation, either to validate the results of automated software or to examine aspects not included in the functionality of these tools. one example is httpsnow, a service that lets users report and view information about how websites use https. httpsnow enables this activity by providing heuristics that non-technical audiences can use to derive a relatively accurate assessment of https deployment on any particular website or application. the project documentation includes descriptions of, and guidance for identifying, http-related vulnerabilities such as use of http during authenticated user sessions, presence of mixed content (instances in which content on a webpage is transmitted via https while other content elements are transmitted via http), insecure cookie configurations, and use of invalid ssl certificates. research methodology a combination of heuristic and automated methods was used to evaluate https implementation in a public library web application to determine how many security vulnerabilities exist in the application and assess to the potential privacy risks to the library’s patrons. research location this research project was conducted at a public library in the western us that we will call west coast public library (wcpl). this library was established in 1908 and employs ninety staff and approximately forty volunteers. in addition, it has approximately 91,000 cardholders. as part of its operations, wcpl runs a public-facing website and an integrated library system (ils) that includes an opac with personalization for authenticated users. test to conduct the test, a valid wcpl library patron account was created and used to authenticate one of the authors for access to account information and personalized features of wcpl’s opac. next, the google chrome web browser was used to visit wcpl’s public-facing website. a valid patron name, library card number, and eight-digit pin number were then used to gain access to online account information. several tasks were performed to evaluate https usage. a sample search application level security in a public library |thomchick and san nicolas-rocca 112 https://doi.org/10.6017/ital.v37i4.10405 query for the keyword “recipes” was performed in the opac while logged in. the description pages for two of the resources listed in the search engine result page (one printed resource and one electronic resource) were clicked on and viewed. the electronic resource was added to the online account’s “book cart” and the book cart page was viewed. during these activities, httpsnow heuristics were applied to individual webpages and to the user session as a whole. the web browser’s url address window was inspected to determine whether some or all pages were transmitted via http or https. the url icon in the browser’s address bar was clicked on to view a list of the cookies that the application set in the browser. each cookie was inspected for the text, "send for: encrypted connections only," which indicates that the cookie is secure. individual webpages were checked for the presence of mixed (encrypted and unencrypted) content. information about individual ssl certificates was inspected to determine their validity and encryption key length. all domain and subdomain names encountered during these activ ities were documented. 
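the browser-based heuristics described above can also be partially scripted. the sketch below is not part of the original study, which relied on manual inspection in chrome; it is a minimal perl illustration, using the widely available lwp::useragent module, of three of the checks: whether a request is silently downgraded from https to http, whether session cookies carry the secure attribute, and whether an https page pulls in subresources over plain http. the catalog url and paths are hypothetical.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    # hypothetical opac page; substitute the page being evaluated
    my $url = 'https://catalog.example.org/patroninfo';

    my $ua  = LWP::UserAgent->new( max_redirect => 5 );
    my $res = $ua->get($url);

    # 1. detect a downgrade: the final uri after redirects should still be https
    my $final = $res->request->uri->as_string;
    print "requested $url\ndelivered $final\n";
    print "warning: https request was redirected to plain http\n"
        if $final =~ m{^http://};

    # 2. inspect set-cookie headers for the secure attribute
    for my $cookie ( $res->header('Set-Cookie') ) {
        my $flag = ( $cookie =~ /;\s*secure/i ) ? 'secure' : 'not secure';
        print "cookie: $cookie [$flag]\n";
    }

    # 3. flag mixed content: subresources (images, scripts, iframes) fetched over http
    my $html = $res->decoded_content || '';
    while ( $html =~ /src\s*=\s*["'](http:\/\/[^"']+)["']/gi ) {
        print "insecure resource reference: $1\n";
    }

a script of this kind cannot replace a full vulnerability assessment, but it makes it easy to repeat the same spot checks after each vendor or configuration change.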
the google chrome web browser was then used to access the qualys ssl server test tool. each domain name encountered was submitted. test results were then examined to determine whether any authentication or configuration flaws exist in wcpl’s web applications. results and discussion given the recommendations suggested by several organizations (e.g., eff, ietf, owasp), we evaluated wcpl’s web application to determine how many security vulnerabilities exist in the application, and assess the potential privacy risks to the library’s patrons. the results of tests, as discussed below, suggest that wcpl’s web application processes a number of vulnerabilities that could potentially be exploited by attackers and compromise the confidentiality of pii about library patrons. this is not surprising given the lack of research on https implementation, as well as the general consensus in the literature that technology adoption has outpaced efforts to maintain patron privacy. based on the results of these tests, wcpl’s website and ils span across several domains. some of these domains appear to be operated by wcpl, while others appear to be part of a hosted environment operated by the ils vendor. based on this information, it is reasonable to conclude that wcpl’s ils utilizes a “hybrid cloud” model. in addition, random use of https is observed in the opac interface during the testing process. this is discussed in the following sections. use of http during authenticated user sessions library patrons use wcpl’s website and opac to access and search for books and other material available through the library. given the results of the tests, wcpl does not use https pervasively across its entire web application. during the test, we found that wcpl’s website is transmitted via http by default. this was after manually entering in the url with an “https” prefix, which resulted in a redirect to the unencrypted “http” page. we continued to test wcpl’s website and opac by performing a query using the search bar located on the patron account page. we found that wcpl’s opac transmits some pages over http and others over https. for example, when a search query is performed in the search bar located on the patron account page, the search engine results page is sometimes served over https, and sometimes over http (see figure 1). this behavior is not limited to specific pages; rather it appears to be random. this security flaw leaves library patrons vulnerable to passive and active attacks that exploit gaps in https implementation, which allows an attacker to eavesdrop on and hijack a user-session providing the attacker with access to private information. information technology and libraries | december 2018 113 figure 1. results of the library’s use of https. presence of mixed content when a library patron visits a webpage served over https, the connection with the web server is encrypted, and therefore, safeguarded from attack. if an https webpage includes content retrieved via http, the webpage is only partially encrypted, leaving the unencrypted content vulnerable to attackers. analysis of wcpl’s website did not reveal any explicit use of mixed content on the public-facing portion of the site. test results, however, detected unencrypted content sources on some pages of the library’s online catalog. this, unfortunately, puts patron privacy at risk as attackers can intercept the http resources when an https webpage loads content such as an image, iframe or font over http. 
this compromises the security of what is perceived to be a secure site by enabling an attacker to exploit an insecure css file or javascript function, leading to disclosure of sensitive data, malicious website redirect, man-in-the-middle attacks, phishing, and other active attacks.36 insecure cookie management cookies are small text files, sent from a web server and stored on user computers via web browsers. cookies can be divided into two categories: session and persistent. persistent cookies are stored on the user’s hard drive until they are erased or expire. unlike persistent cookies, session cookies are stored in memory and erased once the user closes their browser. provided that computer settings allow for it, cookies are created when a user visits a website. cookies can be set up such that communication is limited to encrypted communication, and can be used to remember login credentials, previous information entered into forms, such as name, mailing address, email address, and the like. cookies can also be used to monitor the number of times a user visits a website, the pages a user visits, and the amount of time spent on a webpage. application level security in a public library |thomchick and san nicolas-rocca 114 https://doi.org/10.6017/ital.v37i4.10405 the results of the tests suggest that wcpl’s cookie policies are inconsistent. we found two types of cookies present. within one domain, the web application uses a jsession cookie that is configured to send for “secure connections only.” this indicates that the session id cookie is encrypted during transmission. another domain uses an asp.net session id that is configured to send for any connection, which means the session id could be transmitted in an unencrypted format. cookies transmitted in an unencrypted format could be intercepted by an attacker in order to eavesdrop on or hijack user sessions. this leaves user privacy vulnerable given the type of information contained within cookies. flawed encryption protocol support transport layer security (tls) is a protocol designed to provide secure communication over the web. websites using tls, therefore, provide a secure communication path between their web servers and web browsers preventing eavesdropping, hijacking, and other active attacks. this study employed the ssl server test from qualys ssl labs to perform an analysis of wcpl’s web applications. results of the qualys test (see figure 2) indicate that the site does not support tls 1.2, which means the server may be vulnerable to passive and active attacks, thereby providing hackers with access to data passed between a web server and web browser accessing the server. in addition, the application’s server platform supports ssl 2.0, which is insecure because it is subject to a number of passive and active attacks leading to loss of confidentiality, privacy, and integrity. figure 2. qualys scanning service results. the vulnerabilities discovered during the testing process may be a result of uncoordinated security. this is concerning because it is a by-product of the cloud computing approach used to operate wcpl’s ils. while libraries may have acclimated to the challenge of coordinating security measures across a distributed application, they now face the added complexity of coordinating information technology and libraries | december 2018 115 security measures with their vendors, who themselves may also utilize additional cloud-based offerings from third parties. 
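the protocol-support findings reported above can be reproduced with the openssl command-line client, which attempts a handshake using one protocol version at a time. the host below is a placeholder, and flags for retired protocols such as sslv2 and sslv3 are only available in older openssl builds, so these probes are best treated as a rough local cross-check of a qualys report rather than a definitive test.

    # succeeds only if the server completes a tls 1.2 handshake
    openssl s_client -connect catalog.example.org:443 -tls1_2 < /dev/null

    # should be refused by a correctly configured server; current openssl
    # builds omit the -ssl2/-ssl3 options, so a legacy build is needed here
    openssl s_client -connect catalog.example.org:443 -ssl3 < /dev/null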
as cloud technology adoption increases and cloud-based infrastructures become more complex and distributed, attackers will likely attempt to find and exploit systems with inconsistent or uneven security measures, and libraries will need to work closely with information technology vendors to ensure tight coordination of security measures. unencrypted communication using http affects the privacy, security, and integrity of patron data. passive attacks such as eavesdropping, and active attacks such as hijacking, man -in-the-middle, and phishing can reveal patron login credentials, search history, identity, and other sensitive information that, according to ala, should be kept private and confidential. given the results of the testing done in this study, it is clear that wcpl needs to revisit and strengthen their web application security measures by, according to organizations within the security community, using https pervasively across the entire web application, avoiding mixed content, configuring cookies limited to encrypted communication, using valid ssl certificates, and enabling hsts to enforce https. implementing improvements to https will mitigate attacks by strengthening the integrity of wcpl’s web applications, which in turn, will help protect the privacy and confidentiality of library patrons. limitations and future research this research was performed at a public library in the western u.s. therefore, future research is needed to study the implementation of https to increase patron privacy at other public libraries, libraries in other parts of the u.s. and in other countries. it would also be valuable to conduct similar research at libraries of different types, including academic, law, medical, and other types of special libraries. ssl server test from qualys ssl labs and httpsnow were used to evaluate the use of https at wcpl. the use of other evaluation techniques may generate different results. while a major limitation of this study is the evaluation of a single public library and the implementation of https to ensure patron privacy, a next phase of research should further investigate the policies in place that are used to safeguard patron privacy. these include security education, training, and awareness programs, as well as access controls. furthermore, library 2.0 and cloud computing are fundamental to libraries, but create risks that could impact the ability to keep patron pii safeguarded. as such, future research should evaluate the impact library 2.0 and cloud computing applications have on maintaining the confidentiality of patron information. conclusion the library profession has long been a staunch defender of privacy rights, and the literature reviewed indicates strong awareness and concern about the rapid pace of information technology and its impact on the confidentiality of personally identifiable information about library patrons. much work has been done to educate librarians and patrons about the risks facing them and the measures they can take to protect themselves. however, the research and experimentation presented in this report strongly suggest that there is a need for wcpl and other libraries to reassess and strengthen their https implementations. https is not a panacea for mitigating web application risks, but it can help libraries give patrons the assurance of knowing they take security and privacy seriously, and that reasonable steps are being taken to protect them. 
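the remediation steps recommended above (pervasive https, restricted protocol support, secure cookies, and hsts) correspond to a small number of web-server directives. the fragment below is a hedged sketch for the apache http server with mod_ssl and mod_headers enabled; the study does not identify wcpl's actual server platform, and on a hosted ils the equivalent changes would have to be requested from the vendor.

    <VirtualHost *:80>
        ServerName catalog.example.org
        # send every plain-http request to the encrypted site
        Redirect permanent / https://catalog.example.org/
    </VirtualHost>

    <VirtualHost *:443>
        ServerName catalog.example.org
        SSLEngine on
        SSLCertificateFile    /etc/ssl/certs/catalog.example.org.crt
        SSLCertificateKeyFile /etc/ssl/private/catalog.example.org.key
        # drop the legacy protocols flagged by the qualys scan
        SSLProtocol all -SSLv2 -SSLv3
        # instruct browsers to refuse plain http for this host in the future
        Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
    </VirtualHost>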
finally, this report concludes that further research on library application security should be conducted to assess the overall state of application security in public, academic, and special libraries, with the application level security in a public library |thomchick and san nicolas-rocca 116 https://doi.org/10.6017/ital.v37i4.10405 long-term objective of enabling ala and other professional institutions to develop policies and best practices to guide the secure adoption of library 2.0 and cloud computing technologies within a socially connected world. references 1 jon brodkin, “president trump delivers final blow to web browsing privacy rules,” ars technica (april 3, 2017), https://arstechnica.com/tech-policy/2017/04/trumps-signaturemakes-it-official-isp-privacy-rules-are-dead/. 2 shayna pekala, “privacy and user experience in 21st century library discovery,” information technology and libraries 36, no. 2 (2017): 48–58, https://doi.org/10.6017/ital.v36i2.9817. 3 american library association, “history of the code of ethics: 1939 code of ethics for librarians,” accessed may 11, 2018, http://www.ala.org/template.cfm?section=history1&template=/contentmanagement/conte ntdisplay.cfm&contentid=8875. 4 joyce crooks, “civil liberties, libraries, and computers,” library journal 101, no. 3 (1976): 482– 87; stephen harter and charles c. busha, “libraries and privacy legislation,” library journal 101, no. 3 (1976): 475–81; kathleen g. fouty, “online patron records and privacy: service vs. security,” journal of academic librarianship 19, no. 5 (1993): 289–93, https://doi.org/10.1016/0099-1333(93)90024-y. 5 “code of ethics of the american library association,” american library association, amended january 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics; “privacy: an interpretation of the library bill of rights,” american library association, amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy. 6 american library association, “privacy: an interpretation of the library bill of rights,” amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy. 7 pekala, “privacy and user,” pp. 48–58. 8 harter and busha, “libraries and privacy legislation,” pp. 475–81; george s. machovec, “data security and privacy in the age of automated library systems,” information intelligence, online libraries, and microcomputers 6, no. 1 (1988). 9 fouty, “online patron records and privacy, pp. 289–93. 10 grace j. agnew and rex miller, “how do you manage?,” library journal 121, no. 2 (1996): 54. 11 lois k. merry, “hey, look who took this out!—privacy in the electronic library,” journal of interlibrary loan, document delivery & information supply 6, no. 4 (1996): 35–44, https://doi.org/10.1300/j110v06n04_04. 
https://arstechnica.com/tech-policy/2017/04/trumps-signature-makes-it-official-isp-privacy-rules-are-dead/ https://arstechnica.com/tech-policy/2017/04/trumps-signature-makes-it-official-isp-privacy-rules-are-dead/ https://doi.org/10.6017/ital.v36i2.9817 http://www.ala.org/template.cfm?section=history1&template=/contentmanagement/contentdisplay.cfm&contentid=8875 http://www.ala.org/template.cfm?section=history1&template=/contentmanagement/contentdisplay.cfm&contentid=8875 https://doi.org/10.1016/0099-1333(93)90024-y http://www.ala.org/advocacy/proethics/codeofethics/codeethics http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy https://doi.org/10.1300/j110v06n04_04 information technology and libraries | december 2018 117 12 aimee fifarek, “technology and privacy in the academic library,” online information review 26, no. 6 (2002): 366–74, https://doi.org/10.1108/14684520210452691. 13 john n. berry iii, “digital democracy: not yet!,” library journal 125, no. 1 (2000): 6. 14 american library association, “appendix—privacy and confidentiality in the electronic environment,” september 28, 2006, http://www.ala.org/lita/involve/taskforces/dissolved/privacy/appendix. 15 judith mavodza, “the impact of cloud computing on the future of academic library practices and services,” new library world 114, no. 3/4 (2012): 132–41, https://doi.org/10.1108/03074801311304041. 16 richard levy, “library in the cloud with diamonds: a critical evaluation of the future of library management systems,” library hi tech news 30, no. 3 (2013): 9–13, https://doi.org/10.1108/lhtn-11-2012-0071; raymond bérard, “next generation library systems: new opportunities and threats,” bibliothek, forschung und praxis 37, no. 1 (2013): 52–58, https://doi.org/10.1515/bfp-2013-0008. 17 michael stephens, “the hyperlinked library: a ttw white paper,” accessed may 13, 2018, http://tametheweb.com/2011/02/21/hyperlinkedlibrary2011/; michael zimmer, “patron privacy in the ‘2.0’ era.” journal of information ethics 22, no. 1 (2013): 44–59, https://doi.org/10.3172/jie.22.1.44. 18 zimmer, “patron privacy in the ‘2.0’ era,” p. 44. 19 “the american library association’s task force on privacy and confidentiality in the electronic environment,” american library association, final report july 7, 2000, http://www.ala.org/lita/about/taskforces/dissolved/privacy. 20 library information technology association (lita), accessed may 11, 2018, http://www.ala.org/lita/. 21 library information technology association (lita), accessed may 11, 2018, http://www.ala.org/lita/; pam dixon, “ethical issues implicit in library authentication and access management: risks and best practices,” journal of library administration 47, no. 3 (2008): 141–62, https://doi.org/10.1080/01930820802186480; eric p. delozier, “anonymity and authenticity in the cloud: issues and applications,” oclc systems and services: international digital library perspectives 29, no. 2 (2012): 65–77, https://doi.org/10.1108/10650751311319278. 22 marshall breeding, “building trust through secure web sites,” computers in libraries 25, no. 6 (2006), p. 24. 23 breeding, “building trust,” p. 25. 
https://doi.org/10.1108/14684520210452691 http://www.ala.org/lita/involve/taskforces/dissolved/privacy/appendix https://doi.org/10.1108/03074801311304041 https://doi.org/10.1108/lhtn-11-2012-0071 https://doi.org/10.1515/bfp-2013-0008 http://tametheweb.com/2011/02/21/hyperlinkedlibrary2011/ https://doi.org/10.3172/jie.22.1.44 http://www.ala.org/lita/about/taskforces/dissolved/privacy http://www.ala.org/lita/ http://www.ala.org/lita/ https://doi.org/10.1080/01930820802186480 https://doi.org/10.1108/10650751311319278 application level security in a public library |thomchick and san nicolas-rocca 118 https://doi.org/10.6017/ital.v37i4.10405 24 barbara swatt engstrom et al., “evaluating patron privacy on your ils: how to protect the confidentiality of your patron information,” aall spectrum 10, no 6 (2006): 4–19. 25 breeding, “building trust,” p. 26. 26 tj lamana, “the state of https in libraries,” intellectual freedom blog, the office for intellectual freedom of the american library association (2017), https://www.oif.ala.org/oif/?p=11883. 27 chris palmer and yan zhu, “how to deploy https correctly,” electronic frontier foundation, updated february 9, 2017, https://www.eff.org/https-everywhere/deploying-https. 28 computer security resource center, “glossary,” national institute of standards and technology, accessed may 12, 2018, https://csrc.nist.gov/glossary/?term=491#alphaindexdiv. 29 computer security resource center, “glossary,” national institute of standards and technology, accessed may 12, 2018, https://csrc.nist.gov/glossary/?term=2817. 30 open web application security project, “session hijacking attack,” last modified august 14, 2014, https://www.owasp.org/index.php/session_hijacking_attack; open web application security project, “session management cheat sheet,” last modified september 11, 2017, https://www.owasp.org/index.php/session_management_cheat_sheet. 31 eric butler, “firesheep,” (2010), http://codebutler.com/firesheep/; audrey watters, “zuckerberg's page hacked, now facebook to offer ‘always on’ https," accessed may 16, 2018, https://readwrite.com/2011/01/26/zuckerbergs_facebook_page_hacked_and_now_facebook/ . 32 info security magazine, “senator schumer: current internet security “welcome mat for wouldbe hackers,” (march 2, 2011), http://www.infosecurity-magazine.com/view/16328/senator schumer-current-internetsecurity-welcome-mat-for-wouldbe-hackers/. 33 palmer and zhu, “how to deploy https correctly”; internet engineering task force, “recommendations for secure use of transport layer security (tls) and datagram transport layer security (dtls),” (may, 2015), https://tools.ietf.org/html/bcp195; open web application security project, “session management cheat sheet,” last modified september 11, 2017, https://www.owasp.org/index.php/session_management_cheat_sheet. 34 qualys ssl labs, “ssl/tls deployment best practices,” accessed may 18, 2018, https://www.ssllabs.com/projects/best-practices/. 35 sourceforge, “sslscan—fast ssl scanner,” last updated april 24, 2013, http://sourceforge.net/projects/sslscan/. 
36 palmer and zhu, "how to deploy https correctly."

a low-cost library database solution. england, mark; lura, joseph; schlecht, nem w. information technology and libraries; mar 2000; 19, 1; proquest pg. 46

responding rise in scholarly journal prices. nesli neither encourages nor hinders changes in scholarly communication and therefore the question of restructuring the scholarly communication process remains.20

references and notes
1. barbara mcfadden and arnold hirshon, "hanging together to avoid hanging separately: opportunities for academic libraries and consortia," information technology and libraries 17, no. 1 (march 1998): 36. see also international coalition of library consortia, "statement of current perspective and preferred practices for the selection and purchase of electronic information," information technology and libraries 17, no. 1 (march 1998): 45.
2. martin s. white, "from psli to nesli: site licensing for electronic journals," new review of academic librarianship 3 (1997): 139-50. see also chest, chest: software, data, and information for education (1996).
3. thomas j. deloughry, "library consortia save members money on electronic materials," the chronicle of higher education (feb. 9, 1996): a21.
4. information services subcommittee, "principles for the delivery of content," accessed nov. 17, 1999, www.jisc.ac.uk/pub97/nl_97.html#issc.
5. joint funding council's libraries review group, the follett report (dec. 1993), accessed nov. 20, 1999, www.niss.ac.uk/education/hefc/follett/report/.
6. john kirriemuir, "background of the elib programme," accessed nov. 21, 1999, www.ukoln.ac.uk/services/elib/background/history.html.
7. psli evaluation team, "uk pilot site license initiative: a progress report," serials 10, no. 1 (1997): 17-20.
8. white, "from psli to nesli," 149.
9. tony kidd, "electronic journals: their introduction and exploitation in academic libraries in the uk," serials review 24, no. 1 (1998): 7-14.
10.
jill taylor roe, "united we save, divided we spend: current purchasing trends in serials acquisitions in the uk academic sector," serials review 24, no. 1 (1998): ~11. psli evaluation team, "uk pilot site license initiative," 17-20. 12. beverly friedgood, "the uk national site licensing initiative," serials 11, no. 1 (1998): 37-39 . 13. university of manchester and swets & zeitlinger, nesli: national electronic site license initiative (1999). accessed nov. 21, 1999, www.nesli.ac.uk/. 14. nesli brochure, "further information for librarians." accessed nov . 21, 1999, www .nesli .ac.uk/ nesli-librarians-leaflet.html. 15. a copy of the model site license is available on the nesli web site . accessed nov . 22, 1999, www .nesli .ac .uk/ mode1license8.html . 16. albert prior, "nesli progress through collaboration," learned publishing 12, no . 1 (1999). 17. science direct. accessed nov. 24, 1999, www .sciencedirect.com. 18. declan butler, "the writing is on the web for science journals in print," nature 397, oan. 211998) . 19. the journal access core collection request for proposal. accessed nov . 22, 1999, www .calstate.edu/tier3/ cs+p/rfp_ifb/980160/980160.pdf . 20. frederick j. friend, "uk pilot site license initiative: is it guiding libraries away from disaster on the rocks of price rises?" serials 9, no. 2 (1996): 129-33. a low-cost library database solution mark england, lura joseph, and nem w. schlecht two locally created databases are made available to the world via the web using an inexpensive but highly functional search engine created in-house. the technology consists of a microcomputer running unix to serve relational databases. cgi forms created using the programming language perl offer flexible interface designs for database users and database maintainers. many libraries maintain indexes to local collections or resources and create databases or bibliographies con46 information technology and libraries i march 2000 cerning subjects of local or regional interest. these local resource indexes are of great value to researchers. the web provides an inexpensive means for broadly disseminating these indexes. for example, kilcullen has described a nonsearchable, webbased newspaper index that uses microsoft access 97.1 jacso has written about the use of java applets to publish small directories and bibliographies.2 sturr has discussed the use of wais software to provide searchable online indexes.3 many of the web-based local databases and search interfaces currently used by libraries may: • have problems with functionality; • lack provisions for efficient searching; • be based on unreliable software; • be based on software and hardware that is expensive to purchase or implement; • be difficult for patrons to use; and • be difficult for staff to maintain. after trying several alternatives, staff members at the north dakota state university libraries have implemented an inexpensive but highly functional and reliable solution. we are now providing searchable indexes on the web using a microcomputer running unix to serve relational databases. cgi forms created at the north dakota state university libraries using the programming language perl offer flexible interface designs for database users and database maintainers. this article describes how we have implemark england (england@badlands . nodak.edu) is assistant director, lura joseph (ljoseph@badlands.nodak.edu) is physical sciences librarian, and nem w. 
schlecht (schlecht@plains.nodak.edu) is a systems administrator at the north dakota state university libraries, fargo, north dakota. reproduced with permission of the copyright owner. further reproduction prohibited without permission. mented this technology to distribute two local databases to the world via the web. it is hoped that recounting our experiences will facilitate other such projects . i creating the databases the two databases that we selected to use as demonstrations of this technology are a community newspaper index and a bibliography of publications related to north dakota geology. the forum index the farg o forum is a daily newspaper published in fargo, north dakota. it began publication in 1879 and is the paper of record for north dakota . for many years, the north dakota state university libraries have maintained an index to the forum. beginning with the selective indexing of notable events and editions, we started offering full-text indexing of the entire paper in 1996. until early in the 1980s, all indexing was done manually and preserved on cards or paper. then for several years , indexing was done on one of the university's mainframe computers . starting in 1987, microcomputers were used to compile the index, first using dbase and then using procite as the database management software . printed copies of the database were sold annually to subscribing libraries and businesses . starting in the summer of 1996, th e library made arrangements with the publisher of the paper to acquire digital copy of the text of each newspaper. in early 1997, the ndsu libraries began a project to place all of our forum indexes on the web. dbase, pro-cite, wordperfect, or microsoft access computer files existed for the newspaper index from 1879 to 1975, 1988, and from 1990 to 1996. all other data was unavailable or unreadable. printed indexes from 1976 to 1987 and 1989 were scanned using a hewlett packard 4c scanner fitted with a page feeder . optical character recognition was accomplished using the software omnipage pro. once experience was gained with scanner and software settings, the scanning went very quickly with very few errors appearing in the data. various members of the library staff volunteered to check and edit the data, and the digitizing of approximately 1,500 pages was completed in about three weeks. all data were checked and normalized using microsoft's excel spreadsheet software and then saved as tab-delimited text. programmer's file editor was used to do the final text editing. because of variations in the completeness of the indexing, three separate relational database tables were created: one each for the years 1879-1975, 1976-1996, and 1996-the present. the collective bibliography of north dakota geology in 1996 a project was initiated to combine three bibliographies of north dakota geology and to make the final product searchable and browsable on the web. all three of the original print bibliographies were published by the north dakota geological survey. scott published the first bibliography as a thesis . it is a bibliography of all then-known north dakota geological literature published between 1805 and 1960, and most entries are annotated. 4 the second print bibliography, also by scott, focuses on north dakota geological literature published in the years 1960 through 1979, and also includes some material omitted in the first bibliography .5 most entries in the second bibliography include annotations in the form of keywords or keyword phrases. 
the third bibliography covers the years 1980 through 1993, and is not annotated.6 all three bibliographies are indexed. the third bibliography was available in digital format, whereas the first two were in print format only. library staff members began rekeying the two print bibliographies using microsoft word. the remaining pages were digitally scanned using a new hewlett packard 4c scanner and the optical character recognition software omnipage pro. there were many errors in the resulting text. different font sizes in the original documents may have contributed to optical recognition errors. editing of the scanned pages was nearly as time consuming and tedious as rekeying the documents. the microsoft word documents were saved as text files and combined as a single text file. programmer's file editor was used as a final editor to remove any line breaks or other undesirable formatting. each record was edited to occupy one line, and each field was delimited by two asterisks. asterisks were used because there were many occurrences of commas, semicolons, and other symbols that would have made it difficult to parse any other way. because italics were removed by converting to a text file, some errors were made in parsing. in retrospect, parsing should have been done before the document was saved as a text file. punctuation between fields was removed because the database would be converted to a large table. it would have been better to leave the punctuation intact, since it cannot easily be put back in for the output to be presented in bibliographic form. the alphabetical additions to publication dates (e.g., baker, 1966a) were left intact to aid in hand-cutting and pasting index terms into the records at a later date. initially, the resulting document was converted to a microsoft access file so that it would be in a table format. however, many of the fields were well over the 256-character limit on individual fields. to solve this problem, the data were imported into a relational database called mysql, which allows large data fields called "blobs." running under unix, mysql is very flexible and powerful.

[figure 1: secure database editing interface]

i database and search engine design
we examined the features and capabilities of various online bibliographies and indexes when deciding on our search interfaces and search engine designs. we wanted our databases to be both searchable and browsable and, in the case of the collective bibliography of north dakota geology, we wanted to provide the option of receiving search results accurately in a specific bibliographic format. we wanted both simple and advanced search capabilities, including the ability to do highly sophisticated boolean searching.
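to make the data-model discussion above concrete, two short sketches follow. they are reconstructions rather than the authors' actual code (which, as noted later, is available from the ndsu libraries site). the first is a hypothetical mysql table for the combined bibliography, with column names suggested by the editing form in figure 1 and text columns large enough for annotations that exceed the 256-character field limit mentioned above. the second anticipates the perl dbi/cgi pattern described in the next paragraph: a term from an html search form is run against the table, here through a placeholder so the generated sql is safe from stray quotes in the input.

    CREATE TABLE geology_bib (
      record_number INT NOT NULL PRIMARY KEY,
      author        VARCHAR(255),
      pub_date      VARCHAR(20),
      title         VARCHAR(255),
      source        VARCHAR(255),
      annotations   TEXT,          -- "blob"-style column for long annotations
      index_terms   TEXT
    );

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;
    use DBI;

    my $q      = CGI->new;
    my $author = $q->param('author') || '';

    # connection details are placeholders
    my $dbh = DBI->connect( 'DBI:mysql:database=ndgeology;host=localhost',
                            'webuser', 'secret', { RaiseError => 1 } );

    my $sth = $dbh->prepare(
        'SELECT author, pub_date, title, source FROM geology_bib
         WHERE author LIKE ? ORDER BY author, pub_date' );
    $sth->execute( "%$author%" );

    print $q->header('text/html'), "<ul>\n";
    while ( my ($au, $dt, $ti, $so) = $sth->fetchrow_array ) {
        print "<li>$au ($dt). $ti. $so</li>\n";
    }
    print "</ul>\n";
    $dbh->disconnect;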
finally, we wanted to provide those maintaining the databases with the ability to easily add, delete, and change records from within simple forms on the web and immediately see the results of this editing . mysql uses a perl interface, dbi (database independent interface), which makes accessing the database simple from a perl script. essentially, a sql statement is generated, based on data from an html form. this sql statement is then run against the mysql database, returning matching rows that the same script can handle and display as needed. all of the dynamically generated pages in this database are created this way. using both mysql and perl provided a nice, elegant way to integrate database functionality with the web. the databases were installed on a server and made available via the web. it soon became apparent that there were problems with large numbers of returns . depending upon the client machine's hardware configuration, browsers could lock up the 48 information technology and libraries i march 2000 machine. while an efficient search should not result in such a large number of hits, we decided to limit returns to reduce this problem. following suggestions from users, various search tips were added, and some search interface terminology was changed. from a secure gateway , it is possible to call up different forms that allow individual records to be displayed, edited, and saved (see figure 1). new records are added by using a simple html form . it is also possible to bulk-load large numbers of records by using a special perl program to load the data directly from a text file. i advantages of the unix/mysql solution after first using glimpse, a popular web search engine, under linux, a free unix platform, and then microsoft's internet information server (iis) software on a windows nt platform to search the forum newspaper index, we settled on using mysql on a microcomputer running linux and the apache web server. we found we could write perl scripts that allowed users to make very sophisticated searches of the data from within very simple web forms. mysql is stable, reliable, free, and offers a high degree of functionality, flexibility, and efficiency. apache is reliable, extendible, very fast, free, and offers tight control of data access. initially, each story received from the newspaper was maintained as a separate file on a microcomputer. by having the stories as separate files, it was easy to set up glimpse as a searching tool for the articles. although it did provide a nice preview of a workable system, glimpse did not provide enough flexibility in how records were displayed, organized, or searched. it was not meant for managing data of this sort. windows nt, although a popular and successful it solution, was reproduced with permission of the copyright owner. further reproduction prohibited without permission. found to be somewhat cumbersome to implement and did not provide enough flexibility. the installation of these tools was easy, but it was difficult to obtain a high level of database and web integration . reliability and cost were also concerns . we found that unix was more stable and practically eliminated any unavailability of the data . perl, mysql, and apache were ultimately used to manage, store, and deliver the data. although these products are available for windows nt, their native platform is unix. by running these products on unix, we were able to take advantage of all the features offered by each of the products. 
we found that mysql offered the flexibility and power to manage both sets of data efficiently. also, to load the data into a relational database such as mysql required the data to be normalized. normalized data are data that are separated into logically separate components. to normalize data often takes some extra effort, as fields must be defined to contain certain types of data, but in the end the data is easier to manage and well organized. by having articles and bibliographies in a relational database, we are able to easily make updates, additions, and generate output or reports on the data in many different ways. there are several web servers available on the market today . however, apache is often singled out as being the most popular server . apache, like perl and mysql, is available free for all uses (educational and commercial). using apache and .htaccess control files, we are able to restrict access to administrative pages where data are added or modified. many extensions for apache are available to increase web performance in different situations. for example, a module for apache allows the web server to execute perl code within the server without the need to run the regular perl interpreter. i conclusion and future plans work is under way to refine and update the collective bibliography of north dakota geology. because bibliography number three was not annotated, index terms are being added to facilitate searching and retrieval of citations. we have recently updated the collective bibliography of north dakota geology to include citations to publications through 1998, and we plan to update the database annually. additionally, we receive monthly updates of forum articles, which are added using a simple perl script as soon as they are received. we have successfully implemented a number of other databases using these methods. we realize that this unix/ mysql solution is likely to be most helpful to other academic libraries: there are generally students and staff available on many campuses who are capable of programming in perl and maintaining sql databases on unix servers. our perl scripts are available at the url ww.lib.ndsu .nodak.edu/ kids. references and notes 1. m . kilcullen, "publishing a newspaper index on the world wide web using microsoft access 97," the indexer 20, no . 4 (1997): 195-96 . 2. p . jacso, "publishing textual databases on the web," information today 15, no . 11 (1998): 33, 36 3. n .o . sturr, "wais: an internet tool for full-text indexing," computers in libraries 15 (june 1995): 52-54. 4. m .w . scott, annotated bibliography of the geology of north dakota 1806-1959 north dakota geological survey miscellaneous series, no. 49 . (grand forks , n .d .: north dakota geological survey , 1972). 5. m . w . scott , annotated bibliography of the geology of north dakota 1960-1979 north dakota geological survey miscellaneous series, no. 60. (grand forks, n.d.: north dakota geological survey, 1981). 6. l. greenwood and others, bibliography of the geology of north dakota 1980-1993 north dakota geological survey miscellaneous series, no. 83. (bismarck, n .d .: north dakota geological survey, 1996). related urls linux homepage: www.linux.org/ mysql homepage: www.mysql.com/ perl homepage: www.perl.com/ apache homepage: www.apache.org/ ndsu forum index: www.lib.ndsu. 
nodak.edu/forum/ collective bibliography of north dakota geology: www.lib.ndsu.nodak.edu/ ndgs/ communications i england, joseph, and schlecht 49 “am i on the library website?”: a libguides usability study articles “am i on the library website?”: a libguides usability study suzanna conrad and christy stevens information technology and libraries | september 2019 49 suzanna conrad (suzanna.conrad@csus.edu) is associate dean for digital technologies and resource management at california state university, sacramento. christy stevens (crstevens@sfsu.edu) is the associate university librarian at san francisco state university. abstract in spring 2015, the cal poly pomona university library conducted usability testing with ten student testers to establish recommendations and guide the migration process from libguides version 1 to version 2. this case study describes the results of the testing as well as raises additional questions regarding the general effectiveness of libguides, especially when students rely heavily on search to find library resources. introduction guides designed to help users with research have long been included among a suite of reference services offered by academic libraries, though terminology, formats, and mediums of delivery have evolved over the years. print “pathfinders,” developed and popularized by the model library program of project intrex at mit in the 1970s, are the precursor to today’s online research guides, now a ubiquitous resource featured on academic library websites.1 pathfinders were designed to function as a “kind of map to the resources of the library,” helping “beginners who seek instruction in gathering the fundamental literature of a field new to them in every respect” find their way in a complex library environment.2 with the advent of the internet, pathfinders evolved into online “research guides,” which tend to be organized around subjects or courses. in the late 1990s and early 2000s, creating guides online required a level of technological expertise that many librarians did not possess, such as html-coding knowledge or the ability to use web development applications like adobe dreamweaver. as a result, many librarians could not create their own online guides and relied upon webmasters to upload and update content. the online guide landscape changed again in 2007 with the introduction of springshare’s libguides, a content management solution that quickly became a wildly popular library product.3 as of december 2018, 614,811 guides had been published by 181,896 librarians, at 4,743 institutions in 56 countries.4 the popularity of libguides is due in part to its removal of technological barriers to online guide creation, making it possible for those without web-design experience to create content. libguides is also a particularly attractive product for libraries constrained by campus or library web templates, affording librarians and library staff the freedom to design pages without requiring higher level permissions to websites. despite these advantages, in the absence of oversight, libguides sites can develop into microsites within the library’s larger web presence. inexperienced content creators can inadvertently develop guides that are difficult to use, lacking consistent templates and containing overwhelming amounts of information. as a result, libraries mailto:suzanna.conrad@csus.edu mailto:crstevens@sfsu.edu am i on the library website? 
often find it useful to develop local standards and best practices in order to enhance the user experience.5 like many academic libraries, the cal poly pomona university library uses the libguides platform to provide the campus community with course and subject guides. in 2015, librarians began discussing plans to migrate from libguides version one to the version two platform. these discussions led to broader conversations about libguides-related issues and concerns, some of which had arisen during website focus group sessions conducted in early 2015. the focus groups were designed to provide the library with a better understanding of students' library website preferences. students reported frustration with search options on the library website as well as confusion regarding inconsistent headers. even though the focus group questions were related to the library website, two participants commented on the library's libguides as well. the library was using a modified version of the library website header for vendor-provided services, including libguides, so it was sometimes unclear to students when they had navigated to an external site.6 to complicate matters, the library also occasionally used libguides for other, non-research-related library pages, such as a page delineating the library's hours, because of the ease of updating that the platform affords. one student, who had landed on the libguides page detailing the library's hours, described feeling confused about where she was on the library website. she explained that she had tried to use the search box on the libguides page to navigate away from the hours page, apparently unaware that it was only an internal libguides search. as a result, she did not receive any results for her query. the language the student used to describe the experience clearly revealed her disorientation and perplexity: "something popped up called libguides and then i put what i was looking for and that was nothing. it said no search found. i don't even know what that was, so i just went to the main page." another participant, who also tried to search for a research-related topic after landing on a libguides page, stated, "i tried putting my topics. i even tried refining my topic, but then it took me to the guide thing." accustomed to using a search function to find information on a topic, this student did not interpret the research guide she had landed upon as a potentially useful tool that could help with her research. she expected that her search would produce search results in the form of a list of potentially relevant books or articles. the appearance of a research guide instead was misaligned with her intentions and expectations and therefore confusing to her.7 given both the libguides-related issues that emerged during the library website focus groups and the library's plan to migrate from libguides version one to version two in the near future, the library's digital initiatives librarian and head of reference and instruction decided to conduct usability testing focused specifically on libguides. in addition to testing the usability of specific libguides features, such as navigational tabs and subtabs, we were also interested in determining whether some of the insights gleaned from the library website focus groups and from prior user surveys and usability testing regarding users' web expectations, preferences, and behaviors were also relevant in the libguides environment.
specifically, prior data had indicated users were unlikely to differentiate between the library’s website and vendor-provided content, such as libguides, libanswers, the library catalog, etc. findings also suggested that rather than intentionally selecting databases that were appropriate to their topics, students often resorted to searching in the first box they saw. this included searching for articles and books on their topics using search boxes that were not designed for that purpose, such as the database search box on the library’s a-z database page and the libguides site search tool for searching all guides. although many students did not always resort to searching first (many did attempt to browse to information technology and libraries | september 2019 51 specific library services), if they were not immediately successful, they would then type terms from the usability testing task into the first available search box.8 finally, we were also aware that many of our current libguides contained elements that were inconsistent with website search and design best practices as well as misaligned with typical website users’ behaviors and expectations, as described by usability experts like jakob nielsen. as such, we wanted to test the usability of some of these potentially problematic elements to determine whether they negatively impacted the user experience in the libguides environment. if they did, we would have institution-specific data that we could leverage to develop local recommendations for libguides standards and best practices that would better meet students’ needs. literature review the growth of libguides since springshare’s founding in 2007, libguides have been widely embraced by academic libraries.9 in 2011, ghaphery and white visited the websites of 99 arl university libraries in the united states and found that 68 percent used libguides as their research guides platform. they also surveyed librarians from 188 institutions, 82 percent of which were college or university libraries, and found that 69 percent of respondents reported they used libguides.10 as of december 2018, springshare’s libguides community website indicated that 1,620 academic libraries in the united states and a total of 1,823 academic libraries around the world, not counting law and medical libraries, were using the libguides platform.11 libguides’ popularity is due in part to its user-friendly format, which eliminates most technical barriers to entry for would be guide authors. for example, anderson and springs surveyed librarians at rutgers university and found they were more likely to update and use libguides than previous static subject guides that were located on the library website and maintained by the webmaster, to whom subject specialists submitted content and any needed changes.12 the majority of librarians reported that having direct access to the libguides system would increase how often they updated their guides. moreover, after implementing the new libguides system, 52 percent said they would update guides as needed, and 14 percent said they would update guides weekly; prior to implementation, only 36 percent stated they would update guides as needed, and none said they would do so weekly. libguides usability testing and user studies although much literature has been published on the usability of library websites,13 fewer studies have focused on research guides or libguides specifically. of these, several focused on navigation and layout issues. 
for example, in their 2012 libguides navigation study, pittsley and memmott confirmed their initial hypothesis that the standard libguides navigation tabs, located in a horizontal line near the top of each page, can sometimes go unnoticed, a phenomenon referred to as "banner blindness." as a result of their findings, librarians at their institution decided to increase the tab height in all libguides, and some librarians also chose to add content menus on the homepages of each of their guides. they moved additional elements from the header to the bottom of the guide under the theory that decreased complexity would contribute to increased tab navigation recognition.14 sonsteby and dejonghe examined the efficacy of libguides' tabbed navigational interface as well as design issues that caused usability problems. they identified user preferences, such as users' desire for a visible search box that behaved like a discovery tool, and design issues that frequently led to confusion, such as search boxes that searched for limited types of content, like journal titles. they also found that jargon confused users, and that guides containing too many tabs that were inconsistently labeled led to both confusion and the perception that guides were "cluttered" and "busy."15 thorngate and hoden explored the effectiveness of libguides version two designs, specifically focusing on use of columns, navigation, and the integration of libguides into the library website. they found that two-column navigation is the most usable, users are more likely to notice left navigation over horizontal tabs, and students do not view libguides as a separate platform, expecting instead for it to live coherently within the library's website.16 almeida and tidal employed a mixed-methods approach to gather user feedback about libguides, including usage of "paper prototyping, advanced scribbling, task analysis, tap, and semi-structured interviews."17 the researchers intended to "translate user design and learning modality preferences into executable design principles," but found that no one layout filled all students' needs or learning modalities.18 ouellette's 2011 study differed from many libguides-focused articles in that rather than assigning participants usability tasks, it employed in-depth interviews with 11 students to explore how they used subject guides created on the libguides platform and the features they liked and disliked about them. like some of the aforementioned studies, ouellette found that students did not like horizontal tabbed navigation, preferring instead the more common left-side navigation that has become standard on the web. however, the study was also able to explore issues that many of the usability task-focused studies did not, including whether and how students use subject guides to accomplish their own research-related academic work. ouellette found that students "do not use subject guides, or at least not unless it is a last resort."19 reasons provided for non-use included not knowing that they existed, preferring to search the open web, and not perceiving a need to use them, preferring instead to search for information rather than browsing a guide.20 such findings call into question the wisdom of expending time and resources on creating guides.
however, ouellette asserted that students were more likely to use research guides when they were stuck, when they were required to find information in a new discipline, or when their instructors explicitly suggested that they use them.21 nevertheless, most students who had used libguides reported that they had done so solely "to find the best database for locating journal articles."22 indeed, ouellette found that the majority of "participants had only ever clicked on the tab leading to the database section of a guide," a finding that was consistent with staley's 2007 study, which found that databases are the most commonly used subject guide section.23 ouellette concluded that libguides creators should therefore emphasize databases on their guides. however, two developments implicitly call into question the need for duplicating such information on library subject guides: the more recent widespread adoption of discovery systems that search across databases, which in many cases makes it unnecessary for students to select a specific database, and the common practice of aggregating relevant databases under disciplinary subject headings on library databases pages. if users can easily find such information elsewhere, these conclusions also cast doubt on the effectiveness of the entire libguides enterprise.

information retrieval behaviors: search and browse preferences

in 1997, usability expert jakob nielsen reported that more than half of web users are "search dominant," meaning that they go directly to a search function when they arrive at a website rather than clicking links. in contrast, only a fifth of users are "link dominant," preferring to navigate sites by clicking on links rather than searching. the rest of the users employ mixed strategies, switching between searching and clicking on links in accordance with what appears to be the most promising strategy within the context of a specific page.24 while some researchers have questioned the prevalence of search dominance, nielsen's mobile usability studies have indicated an even stronger tendency toward search dominance when users access websites on their mobile devices.25 moreover, by 2011, nielsen's research had indicated that search dominance is a user behavior that gets stronger every year, and that "many users are so reliant on search that it's undermining their problem-solving abilities." specifically, nielsen found that users exhibited an increasing reluctance to experiment with different strategies to find the information they needed when their initial search strategy failed.26 nielsen attributes the search dominance phenomenon to two main user preferences. the first is that search allows users to "assert independence from websites' attempt to direct how they use the web."27 the second is that search functions as an "escape hatch when they are stuck in navigation. when they can't find a reasonable place to go next, they often turn to the site's search function." nielsen developed a number of best practices based on these usability testing results, including that search should be made available from every page in a website, since it is not possible to predict when users will feel lost.
additionally, given that users quickly scan sites for a box where they can type in words, search should be configured as a box and not a link, it should be located at the top of the page where users can easily spot it, and it should be wide enough to accommodate a typical number of search terms.28 nielsen’s usability studies have shed light not only on where search should be located but also on how search should function. in 2005, nielsen reported that searchers “now have precise expectations for the behavior of search” and that “designs that invoke this mental model but work differently are confusing.”29 specifically, searchers’ “firm mental model” for how search should work includes “a box where they can type words, a button labeled ‘search’ that they click to run the search, [and] a list of top results that’s linear, prioritized, and appears on a new page.” moreover, nielsen found that searchers want all search boxes on all websites to function in the same way as typical search engines and that any deviation from this design causes usability issues. he specifically highlighted scoped searches as problematic, pointing out that searches that only cover a subsite are generally misleading to users, most of whom are unlikely to consider what th e search box is actually searching.30 while there is much evidence to support nielsen’s claims about the prevalence of search dominance, other studies have suggested that users themselves are not necessarily always search or link dominant. rather, some websites lend themselves better to searching or exploring links, and users often adjust their behaviors accordingly.31 although we did not find studies that specifically discussed the search and browse preferences and behaviors of libguides users, we did find studies of library website use that suggested that though users often exhibit search -dominant tendencies, they also often rely on a mixed approach to library website navigation. for example, hess and hristova’s 2016 study of users’ searching and browsing tendencies explored how students access library tutorials and online learning objects. specifically, they compared searching from a search box on the tutorials landing page, using a tag cloud under a search box, and browsing links.32 google analytics data revealed that students employed a mixed approach, equally relying upon both searching and clicking links to access the library’s tutorials.33 similarly, han and wolfram analyzed clickstream data from 1.3 million sessions in an image repository and determined that the two most common actions (86 percent of actions) were simple search and am i on the library website? | conrad and stevens 54 https://doi.org/10.6017/ital.v38i3.10977 click actions.34 however, users in this study exhibited a tendency toward search dominance, conducting simple searches in 70 percent of the actions.35 niu, zhang, and chen presented a mixed methods study analyzing search transaction logs and conducting usability testing f ocused on comparing the discovery layers vufind and primo. browsing in the context of their study included browsing search results. 
they found that most search sessions were very brief, and students searched using two or three keywords.36 xie and joo tested how thirty-one participants went about finding items on a website, classifying their approaches into what they described as eight "search tactics," including explorative methods, such as browsing.37 over 88 percent of users conducted at least one search query, and 75 percent employed "iterative exploration," browsing and evaluating both internal and external links on the site "until they were satisfied or they quit."38 only four of thirty-one, or 6.7 percent, did "whole site exploration," a tactic which included browsing and evaluating most of the available information on a website, looking through every page on the site to find the desired information.39

method

this study addresses the following research questions:

1. when prompted to find a research guide, are students more likely to click links or type terms into a search box to find the guide?
2. are students more likely to successfully accomplish usability tasks directing them to find specific information on a libguide when using a guide with horizontal or vertical tabs?
3. how likely are students to click on subtabs?
4. how and to what extent does a one-, two-, or three-column content design layout affect students' ability to find information on a libguide?
5. how and to what extent do students use embedded search boxes in libguides?
6. do students confuse screenshots of search boxes with functioning search tools?

in 2015, the university library had access to two versions of libguides: the live version one instance and a beta version two instance. in order to answer our research questions and make data-informed design decisions that would improve the usability of our libguides, we compared the usability of existing research guides in libguides version one to test sites on libguides version two. version two guides differed from version one guides in several ways. version two guides were better aligned with nielsen's recommendations regarding search box placement and function. every libguide page included a header identical to the library website's header, which contained a global search box that searched both library resources and the library's website. the inclusion of a visible discovery tool in the header was consistent with usability recommendations in the literature40 as well as our own prior library website usability tests, which indicated many users preferred searching for resources over finding a path to them by clicking through a series of links.

in mid-april 2015, ten students were scheduled to test libguides. each student attempted the same seven tasks, but five students tested the current version of libguides and five students tested version two. the sessions were recorded using camtasia, and students completed usability tasks on a laptop that was hooked up to a large television monitor, allowing the two librarians who were in the room to observe how students navigated the library's website and libguides platform. one librarian served as the moderator and the other managed the recording technology.41 although additional members of the web team were interested in viewing the test sessions, only two librarians sat in on the sessions in order to avoid overwhelming the students. the moderator read tasks aloud and students were instructed to think aloud while completing each task, narrating their thought processes and navigational decisions.
students were recruited via a website usage and perceptions survey sent out the prior quarter, which included a question as to whether they would be interested in participating in usability testing. the students who received this survey were selected from a randomized sample provided by the university’s institutional research office. the sample included both lower division students in the first or second year of their studies and transfer students. students were also recruited in information literacy instruction sessions for lower-level english courses as well as in a creditbearing information literacy course taught by librarians. survey respondents and students from the targeted classes who indicated that they would be interested in participating in usability testing were subsequently contacted via email. students with appropriate testing day availability were selected. students from the various colleges were represented, including engineering; business administration; letters, arts and social sciences; education and integrative studies; and hospitality management. all of the participants were undergraduates and most were lower division students. we chose to focus on recruiting lower division students because we wanted to ensure that our guides were usable by students with the least amount of library experience; many lower division students are unaware of library services and may not have taken a library instruction session or a library information literacy course. however, while the goal was to recruit lower division students, scheduling difficulties, including three no-shows, led us to recruit students on-the-fly who were in the library, regardless of their lower division or upper division status. task 1 in both rounds of usability testing, students were prompted to find a “research guide” to help them write a paper on climate change for a com 100 class. students started from the homepage of the library. two possible success routes included browsing to a featured links section on the homepage where a “research guides” link was listed (see figure 1) or searching via the top level “onesearch” discovery layer search box, displayed in figure 2, which delivered results, including articles from databases, books from the catalog, library website pages, and libguides pages, in a bento-box format. the purpose of this task was to determine if students browsed or searched to find research guides. we defined browsing as clicking on links, menus, or images to arrive at a result, whereas searching involved typing words and phrases into a search box. am i on the library website? | conrad and stevens 56 https://doi.org/10.6017/ital.v38i3.10977 figure 1. featured links section on library homepage. figure 2. onesearch search box on library homepage. task 2 task 2 was designed to compare the usability of libguides version one’s horizontal tab orientation with version two’s left navigation tab option. students were provided with a scenario in which they were asked to compare two public opinion polls on the topic of climate change for the same com 100 class. we displayed the appropriate research guide for the students and instructed them to find a list of public opinion polls. the phrase “public opinion polls” appeared in the navigation of both versions of the guide. figure 3 displays the research guide with horizontal tab navigation and figure 4 with vertical, left tab navigation. information technology and libraries | september 2019 57 figure 3. horizontal tab navigation. am i on the library website? 
figure 4. left tab navigation.

task 3

in the third scenario, students were informed that their professor recommended that they use a library "research guide" to find articles for a research paper assignment in an apparel merchandising and management class. students were instructed to find the product development articles on the research guide. the phrase "product development" appeared as a subtab in both versions of the guide. this task was intended to test whether students navigated to subtabs in libguides. as shown in figure 5, the subtab located on the horizontal navigation menu appeared when scrolled over but was otherwise not immediately visible. in contrast, figure 6 shows how the navigation was automatically popped open on the left tab navigation menu so that subtabs were always visible, a newly available option in libguides version two.

figure 5. horizontal subtab options.

figure 6. left tab navigation with lower subtabs automatically open.

task 4

on the same apparel merchandising and management libguide, students were asked where they would go to find additional books on the topic of product development. the librarian who designed this libguide had included search widgets in separate boxes on the page that searched the catalog and the discovery layer "onesearch." we were interested in seeing whether students would use the embedded search boxes to search for books. this functionality was identical in both the version one and version two instances of the guide, as shown in figure 7.

figure 7. embedded catalog search and embedded discovery layer search.

task 5

in the fifth scenario, students were told that they were designing an earthquake-resistant structure for a civil engineering class. as part of that process, they were required to review seismic load provisions. we asked them to locate the asce standards on seismic loads using a research guide we opened for them. the asce standard was located on the "codes & standards" page, which could be accessed by clicking on the "codes & standards" tab. the version one instance of the guide was two-columned, and a link to the asce seismic load standard was available in the second column on the right, per figure 8. the version two instance of the guide used a single, centered column, and the user had to scroll down the page to find the standard, per figure 9. we wanted to see if students noticed content in columns on the right, as many of our libguides featured books, articles, and other resources in columns on the right side of the page, or whether guides with content in a single central column were easier for students to use.

figure 8. two-column design with horizontal tabs.

figure 9. two-column design with left tab navigation.

task 6

because librarians sometimes included screenshots of search interfaces in their guides, we were interested in testing whether students mistook these images of search tools for actual search boxes. in task six, we opened a civil engineering libguide for students and told them to find an online handbook or reference source on the topic of finite element analysis. as shown in figure 10, a screenshot of a search box was accompanied by instructional text explaining how to find specific types of handbooks.
within this libguide, there were also screenshots of the onesearch discovery layer as well as a screenshot of a "findit" link resolver button.

figure 10. screenshots used for instruction.

task 7

the final task was designed to test whether it was more difficult for students to find content in a two- or three-column guide. students were instructed to do background research on motivation and classroom learning for a psychology course. they were told to find an "encyclopedic source" on this topic. within each version of the psychology libguide, there was a section called "useful books for background research." as shown in figure 11, in the version one libguide, books useful for background research were displayed in the third column on the right side of the page. the version two libguide displayed those same books in the first column under the left navigation options.

figure 11. books displayed in third column.

figure 12. two-column display with books in the left column.

results

searching vs. browsing to find libguides

understanding how students navigate and use libguides is important, but if they have difficulty finding the libguides from the library homepage, usability of the actual guides is moot. of the ten students tested, six used the onesearch discovery layer located on the library's homepage to search for a guide designed to help them write a paper on climate change for a com 100 class. frequently used search terms included "research guide," "communication guides," "climate change," "climate change research guide," "faculty guides," and "com 100." of these students, two used search as their only strategy, typing search queries into whichever search box they discovered. neither of these students was successful at locating the correct guide. the remaining four students used mixed strategies; they started by searching and resorted to browsing after the search did not deliver exact results. two of these students were eventually successful in finding the specific research guide; two were not. of the six students who searched using the discovery layer, only one did not find the libguides landing page at all. in general, it seems that the task and student expectations during testing were not aligned with the way the guide was constructed. only one student went to the controversial topics guide because "climate change is a controversial topic." one student thought the guide would be titled "climate change" and another thought there might be a subject librarian dedicated to climate change. students would search for keywords corresponding with their course and topic, but generally they did not make the leap to focus more broadly on controversial topics. only one student browsed directly to the "research guides" link on the homepage and found the guide under subject guides for "communication" on the first try. another student navigated to a "services and help" page from the main website navigation and found a group of libguides labeled "user guides," designed specifically for new students, faculty, staff, and visitors; however, the student did not find any other libguides relevant to the task at hand.
the remaining two students navigated to pages with siloed content; one student clicked the library catalog link on the library homepage and began searching using the keywords “climate change.” the other student clicked on the “databases” link. upon arriving at the databases a-z page, the student chose a subject area (science) and searched for the phrase “faculty guides” in the databases search box. the student was unable to find the research guide because our libguides were not indexed in this search box; only database names were listed. only three out of ten students found the guide; the rest gave up. two of the successful participants employed mixed strategies that began with searching and included some browsing; the third student browsed directly to the guide without searching. testers in the libguides version one environment attempted the task an average of 3.8 times before achieving success or giving up compared to an average of 3.2 attempts per tester in version two testing. we defined an attempt as browsing or searching for a result until the student tried a different strategy or started over. for instance, if a student tried to browse to a guide and then chose to search after not finding what they were looking for, that constituted two attempts. testers in both rounds began on the same library website. one major difference between the two research guides landing pages was the search boxes; one was an internal libguides search box (version one) and one was a global onesearch box (version two). it is possible that testers in round two made fewer attempts because of the inclusion of the onesearch box. for those testing with the libguides search box in version one, three searched on the libguides landing page. from both rounds, eight of the students located the libguides landing page, regardless of whether or not they found the correct guide. the two students who did not find the correct guide did land in libguides, but they arrived at specific libguides pages that served other purposes (one found a onesearch help guide and the other landed on a new users’ guide). navigation, tabs, and layout navigation, tab options (including subtab usage), and layouts were evaluated in tasks two, three, five, and seven. as mentioned in the method section, the first group of five students who tested the interface used the version one libguides with horizontal navigation and hidden subtabs. the second round of five students used the version two libguides with left navigation and popped open subtabs. students in both rounds were able to find items in the main navigation (not including subtabs) at consistent rates, with those in the second round with left navigation completing all tasks significantly faster than the first-round testers (38 seconds faster on average across all tasks). in task two, students were asked to find public opinion polls, which they could access by clicking a “public opinion polls” link on the main navigation. in both rounds, regardless of horizontal or vertical navigation, nine of the students clicked on the tab for the polls. only one student testing on version two was unable to find the tab. students in version one testing with horizontal navigation attempted this task two times on average before successfully finding the tab; students testing on version two with vertical navigation attempted 1.4 times before finding the tab with the polls or giving up. am i on the library website? 
when asked in task three to find articles on product development, which were included on a "product development" subtab under the primary "library databases amm" tab, nine out of ten students were unable to locate the subtab. in libguides version one, this subtab was only viewable after clicking the main "library databases amm" tab. in libguides version two, this subtab was popped open and immediately visible underneath the "library databases amm" tab. a version two tester was the only student who clicked on the "product development" subtab. students attempted this task 1.8 times on average in version one testing compared to 1.2 times for those testing version two. it is worth noting that six of the students found product development articles by searching via other means (onesearch, databases, and other library website links); they just did not find the articles on the libguide shown. while they still successfully found resources, they did not find them on the guide we were testing.

in task five, we asked students to find the asce standards on seismic loads on a specific guide. the version one guide used a two-column design while the version two guide with the same content utilized a single column for all content. while six students found the standards (three in round one and three in round two), only four of ten testers overall did so by browsing to the resource. three of the students who chose to browse were in round one and the fourth student was from round two. in version one testing with the two-column design, two students found the standards after making two attempts to browse the guide. both of these students used the libguides "search this guide" function to find the correct page for the standards using the keywords "asce standards" and "asce." the third successful student in this round used a mixed strategy of searching and browsing. she used the search terms "asce standards on seismic loads" and then searched for "seismic loads" twice in the same search box. she landed on the correct tab of the libguide, scrolling over the correct standard multiple times, but only found the standards after the sixth attempt. during version two testing, which included the one-column design and global search box, only one student browsed to the standards on the libguide. this student scrolled up and down the main libguide page, clicked on the left navigation option for "find books," then the left navigation option for "codes & standards," and scrolled down to find the correct item. four out of five version two testers bypassed browsing altogether, instead using the onesearch box in the page header to try to find the asce standards. two of those students found the specific asce standards that were featured on the libguide; the other two found asce standards, just not the specific item we intended for them to find. the four students who did not find the specific standards were equally distributed across both testing groups. on average, students attempted to complete the task 3.6 times in version one testing and 1.6 times in version two testing before either finding the resource or giving up.

task seven asked students to find an encyclopedic source using a three-column design in version one and a two-column design in version two. the version one guide listed encyclopedias in the right-most column of a three-column layout and the version two guide included them under the left navigation in a two-column design.
only three students found the encyclopedia mentioned in task seven, two of whom completed the task using version two’s two-column display. only one student was able to locate the encyclopedia in the third column in version one testing. the seven students who were unable to find the encyclopedia all attempted to search when they were unable to find the encyclopedia by browsing. six of these seven students searched for the keywords “motivation and classroom learning” and the seventh for “motivation and learning.” those who landed in onesearch (six out of seven) received many results and were unable to find encyclopedias. one student searched within libanswers for “encyclopedia” and found britannica. information technology and libraries | september 2019 67 one student attempted to refine by facets, thinking that “encyclopedia” would be a facet similar to “book” or “article.” using search, especially onesearch, to attempt to find an encyclopedia was ultimately unsuccessful for the students. search terms students chose were far too general for them to complete the task successfully. students in version one testing attempted this task 2.4 times compared to 3.2 times for version two testers. embedded search boxes & screenshots of search boxes embedded search boxes and screenshots of search boxes were tested in tasks four and six. the header used in version one libguides was limited, defaulting to searching within the guide, and the additional options on the dropdown menu next to the search box did not include a global “onesearch.” in version two guides, a onesearch box that searched most library resources (articles, books, library webpages, and libguides) was included. during task four, which asked students who were already on a specific guide how they would go about finding additional books on product development, version one testers were much more likely to use embedded search box widgets in the guide content. three of the five students in version one testing used the search widgets on the page to either search the catalog or search onesearch. the remaining two students in that round used a header search or browsed. one of these students used the libguides “search this guide” function in libguides and searched for “producte [sic] development books.” this student did not notice the typo in the search term and subsequently navigated out of libguides to the library website via the library link on the libguides header. the user then searched the catalog for “product development” and was able to locate books. a fifth student in the version one testing round did not use embedded search box widgets or the libguides search. she browsed through two guide pages and then gave up. in version two testing, three of five students used the global onesearch box to find the product development books. the remaining two students chose to search the millennium catalog linked from a “books and articles” tab on the main website header, finding books via that route. during testing of both versions, students tried an average of 1.5 times to complete the task before achieving success or acknowledging failure. nine out of ten testers found books on the topic of product development. the one tester who did not find the books attempted to complete the task one time; she found product development articles from the prior task and said she would click on the same links (for individual article titles) to find books. 
in task six, half of the ten students from both rounds attempted to click on screenshots of search boxes or unlinked "findit" buttons. a screenshot of the onesearch box and a knovel search box were embedded in the test engineering guide. two users in version one testing and one tester in version two testing attempted to click on the onesearch screenshot. one student in version two testing attempted to click on the knovel search box screenshot. one student from version one testing tried to click on a "findit" button for the link resolver.

comparisons between rounds

we recorded how many attempts were needed to complete tasks in each round. in round one, which tested libguides version one, students took an average of 2.74 tries to complete the tasks. in round two, which focused on libguides version two, students took an average of two tries to complete tasks. average attempts per task are displayed in figure 13. we also timed the rounds to see how many minutes it took students to complete all of the tasks. in the first round, it took 16:07 minutes on average and in the second round 15:29 minutes. this does not appear to constitute an important difference, but there was one tester in round two who narrated his experiences very explicitly and in great detail. his session lasted 23 minutes. if his testing is excluded, then round two had a shorter average of 13:30 minutes. despite the lower total time spent testing, task success was nearly equal between the two rounds. details on individual testing times per participant are in figure 14. in round one, testers successfully completed 24 tasks, whether or not they completed them in the manner we predicted. round two was slightly lower, with 23 successfully completed tasks. success was, however, subjective. in task three, we wanted to test whether students found a list of articles on a libguide on a certain topic. nearly all of the students (nine out of ten) found articles on the topic, but only one of them found them via the method we had anticipated. other tasks produced similar results, where students found resources that technically fulfilled the task we had asked them to complete even though they did not exercise the feature of the interface we were hoping to test. in these cases, we counted this as a success, as they had fulfilled the task as written.

figure 13. attempts per task for libguides v1 compared to libguides v2.

figure 14. total time per participant for libguides v1 compared to libguides v2.

discussion

there were several overarching themes that we discovered during the testing of libguides versions one and two. the first relates to nielsen's conception of search dominance and its implications for finding guides as well as resources within guides. task one, which asked students to navigate to a relevant libguide from the library homepage, revealed that students were much more likely to search for a guide than to navigate to one by using links. although the library homepage in our study included a clearly demarcated "research guides" link, only one tester clicked on it. in contrast, six of the ten students used search as their first strategy, and an additional two of ten first clicked on a link and then switched to search as their next strategy.
although our initial search-focused research question and related task looked specifically at how students navigate to guides, most of the other tasks provided additional insight into how students navigate within them as well. our findings are consistent with nielsen's observation that search functions as an "escape hatch" when users get "stuck in navigation."42 many students we tested used mixed strategies to find content, often resorting to searching for content when they were confused, lost, or impatient. while one student explicitly stated that search is a backup for when he cannot find something via browsing, the search behaviors of many other students suggested that they were "search-dominant," preferring searching over browsing both on library website pages and from within libguides. similar to nielsen's studies on reliance on search engine results, students were unlikely to change their search strategies even if they were not receiving helpful results. students did not engage in what xie and joo referred to as "whole site exploration," browsing and evaluating most of the available information on a website to accomplish the assigned tasks.43 while research guides are sometimes designed to function as linear pathways that lead students through the research process, or as comprehensive resources that introduce students to a series of tools and resources that could be useful in the research process, the students we tested did not approach guides in this way. rather than starting on the first tab and comprehensively exploring a guide tab by tab and content box by content box, students ignored most of the content on the page, searching instead to find the specific information they needed. our testers' search behaviors were also consonant with nielsen's observation that scoped searches are inconsistent with users' mental models of how search should function. nielsen found that search boxes that only cover a subsection of a site are generally confusing to users and negatively impact users' ability to find what they are looking for on a site. in our study, several students used scoped search boxes, both on library website pages and within libguides, to find content that the search did not index. version two testers had access to a search box on every page that aligned with their global search expectations, and they used it frequently, so much so that their preference for search disrupted some of the usability questions we were trying to answer in our tasks. for example, users' tendency to search instead of browse interfered with our ability to clearly discern whether it was easier for students to find content on pages with one-, two-, or three-column content designs (many students did not even attempt to find content in the columns). students' global search expectations also have implications for their ability to find libguides that they have been told exist or to discover the existence of libguides that might help them with their research. for example, students with search-dominant tendencies who attempt to use a library search tool that does not index libguides or the content within libguides will be unlikely to find them.
while students did use search boxes embedded within libguides content areas, version two testers had access to a global search box located at the top right-hand side of every libguides page, and as a result, they were more likely to use the global search than the embedded search boxes. this behavior is consistent with nielsen’s assertion that for ease of use, search should consist of a box “at the top of the page, usually in the right-hand corner,” that is “wide enough to contain the typical query.”44 version two testers were quick to find and use the search box in the header that fit this description. although students often used search boxes, and global ones in particular, to accomplish usability testing tasks, they were sometimes impeded by screenshots of search boxes and links. several students clicked on them thinking they were live, unable to immediately distinguish that they differed from the functional embedded search boxes that some of the guides also included. as nielsen observed, “users often move fast and furiously when they’re looking for search. as we’ve seen in recent studies, they typically scan the homepage looking for ‘the little box where i can type.’”45 librarians sometimes use screenshots of search boxes in an effort to provide helpful visuals to accompany instructional content (text) focusing on how to access and use a specific resource. because many students scan the page for a search box so that they can quickly find needed information rather than carefully reading material in the content boxes, it could be argued that these screenshots inadvertently confuse students and impede usability. another way to look at this issue, however, may be that guide content can be misaligned with user expectations and contexts. a user looking to search for articles on a topic who stumbles on a guide may have no reason to do anything other than look for a search box. in contrast, a user introduced to a guide in the context of a course who is asked to read through the content and explore three listed resources in preparation for a discussion to occur in the next class meeting will likely have a very different orientation to the guide and perception of its purpose and usefulness. information technology and libraries | september 2019 71 students’ search behaviors also made us question the efficacy of linking to specific books or articles within a libguide. in tasks three through seven, many of the students used onesearch or the library catalog to search for specific books or articles rather than referencing the guide where potentially useful resources were listed. for example, while trying to find the com 100 guide during task one, one student commented, “i never really look for stuff. i just go to the databases.” version two testers, who had access to a global search in the header of every libguides page, were even more likely to navigate away from the guides to find books or articles. while several studies in the literature had suggested that vertical tab navigation may be more usable than horizontal tab navigation, our study did not bear this out, as students in both rounds were able to find items on vertical and horizontal navigation menus at relatively consistent rates. 
similarly, one-, two-, and three-column content design did not appear to affect users’ abilities to find information and links on a page; however, users’ tendency to search rather than bro wse interfered with the relevant task’s intention of comparing the browsability of different content column designs, and therefore more targeted research on this question is needed. one student commented on the pointlessness of content in second columns, stating “nobody ever looks on the right side, i always look on the left cause everything’s usually on the left side. because you don’t read from right to left, it’s left to right.” he was, nevertheless, able to complete the task regardless of the multi-column design. subtab placement in libguides versions one and two was very different from each other; version one subtabs were invisible to users unless they hovered over the main menu item on the horizontal menu, while version two allowed us to make subtabs immediately visible on the vertical menu, without any action needed by the user to uncover their existence. given the subtabs’ visibility, we had anticipated that version two testers would be more likely to find and use subtabs, but this turned out not to be the case. only one out of ten students found the relevant subtab. although the successful tester was using libguides version two in which the subtab was visible, the fact that nine out of ten testers failed to see the subtab, regardless of whether it was immediately visible or not, suggests that subtab usage may not be an effective navigation strategy. results from all tasks also suggested that students might not understand what research guides are or how guides might help them with their research. like many libraries, the cal poly pomona university library did not refer to libguides by their product name on the library website, labeling them “research guides” instead in an effort to make their intended purpose clearer. testing revealed, however, that students are not predisposed to think of a “research guide” as a useful tool to help them get started on their research. one student said, “i’m not sure what the definition of a research guide is.” when prompted to think more about what it might be, the student guessed that it was a pamphlet with “something to help me guide the research.” the student did not offer any additional guesses about what specifically that help might look like. moreover, students’ tendency to resort to search itself can also be interpreted as evidence that they are confused about how guides are supposed to help them with research. instead of reading or skimming information on the guides, students used search as a strategy to attempt to complete the tasks an average of 70 percent of the time across both rounds. many of their searches navigated students away from the very guides that were designed to help them. the tendency to navigate away from guides was likely increased by the content included in the guides we tested, since many incorporated search boxes and links that pointed to external systems, such as the catalog, the discovery layer, libanswers, etc. however, many students’ first attempts to am i on the library website? | conrad and stevens 72 https://doi.org/10.6017/ital.v38i3.10977 accomplish the tasks given them involved immediately navigating away from libguides. others navigated away shortly after an initial attempt or two to complete the task within the guide. all but one student navigated away from libguides to complete tasks; four did so more than five times. 
eight of ten students used onesearch in the header or from the library homepage; the other two used embedded onesearch boxes on the libguides. results also suggested that it might be easier for students to find guides that are explicitly associated with their courses, through either the guides’ titles or other searchable metadata, than to find and understand the relevance of general research guides. even though general research guides might be relevant to the subject matter of students’ courses, guides that explicitly reference a course or courses are easily discoverable and their relevance is more immediately obvious. for instance, the first task asked students to find a “research guide” to help them write a paper on climate change for a com 100 class. we wanted to see whether students would find the “controversial topics” research guide that was designed for com 100 and that included the course number in the guide’s metadata. mentioning the course number in the task seemed to make it more actionable as an assignment they might expect from a professor. when students searched for “com 100,” they were more likely to find the controversial topics guide; two of three students who found the guide searched using the course number. if course numbers had not been included, they might not have found the guide as searching for the course number brought up the correct guide as the one result. two additional students unsuccessfully attempted to find the guide by searching for “com100,” without a space. had the libguides search been more effective, or had librarians included both versions of the course code with and without a space, more students would likely have found the guide. limitations limitations of this study include weaknesses in both our usability tasks and the content of some of the libguides, which made it difficult to answer our research questions. we may have tested too many different features at once, which can be a pitfall of usability testing in general. some tasks, such as tasks five and seven, tested both navigation placement and column layouts. in task five, for instance, there were multiple factors that could have led to success or failure; did a student overlook the asce standards because of column layout or tab placement or was the layout moot because the search box was comprehensive enough to allow them to complete the task without browsing the guide’s content? similarly, task two tested a guide with seven tabs. it is not clear if the students who did not click on a tab missed it because of the placement of the navigation on the page or because the navigation contained too many options. weaknesses in the content of many of the libguides used in the study led to additional limitations. many of the libguides were text heavy and included jargon. one student even commented, “ it’s a lot of words here, so i really don’t want to read them.” although we set out to test the usability of different navigation layouts and template designs, factors such as content overload or use of jargon could have influenced success or failure. the wording of task seven, for example, was particularly problematic and led to unclear results. students were instructed to find an “encyclopedic source” in an attempt to see if they would click on books listed in a third column in version one testing compared to a left column in version two testing. the column header was titled “useful books for background research” and the box included encyclopedias. 
students appeared to struggle with the idea of what constituted an “encyclopedic source.” when one student was specifically asked what she thought the term meant, she responded, “not sure.” based on the results of this task, it was difficult to discern if the interface or the wording of the task resulted in task completion failures. the contrived nature of usability testing itself might also have affected our results. for example, one student exhibited a tendency to rush through tasks, a behavior that may have been due to experiencing content overload, anxiety over being observed during the testing process, time limitations of which we were unaware, etc. on the other hand, behavior that we perceived to be rushing might be consistent with the students’ normal approach to navigating websites. whatever the case, it is important to keep in mind that usability testing puts users on the spot because they are testing an interface in front of an audience. the usability testing context can therefore influence user behavior, including the number of times students might attempt to find a resource or complete a given task. some students might be impatient or uncomfortable with the process, resulting in attempts to complete the testing as quickly as possible, including giving up on tasks more quickly than they would in a more natural setting. conversely, other students might be more likely to expend more time and effort when performing in front of an audience than they would privately. conclusion usability testing was effective for revealing some of the difficulties students encounter when using our libguides and our website and for prompting reflection on the types of content they include, how that content is presented, and the contexts in which that content may or may not be useful to our students. analysis of the data from our study and a review of the literature within the context of existing political realities and constraints within our library led to our development of several data-informed recommendations for libguides creators, most of which were adopted. one of the most important recommendations was that libguides should use the same header that is on the library’s main website, which includes a global search box. use of the shared header not only would provide a consistent look and feel but would also provide users with a global search box at the top of the page that is aligned with their mental model of how search should function. our testing confirmed that many students prefer to use global search boxes to find information rather than browsing, or in addition to browsing when they get stuck. while some librarians were not thrilled with what they viewed as the privileging or predominance of the discovery layer on their guides, preferring to direct students to specific databases instead of onesearch, this recommendation was ultimately accepted due to the compelling nature of the usability data we were able to share. our recommendation that subtabs should be avoided was also accepted because the data were compelling: 90 percent of users failed to find links located on subtabs. we also recommended that librarians should evaluate the importance of all content on their guides to minimize student confusion when browsing. while we acknowledged that there might be contexts when screenshots of search boxes would be useful, we encouraged librarians to think carefully about their use and to avoid them when possible. 
additionally, librarians were encouraged to evaluate whether the content they were adding was of core importance to the libguide, reflecting on the degree to which it added value or possibly detracted from the libguide, perhaps by virtue of lack of relevance or content overload. content boxes consisting of suggested books on a general subject guide were used as an example, given the difficulty of providing useful book suggestions to students working on wildly different topics. while results from our rounds of usability testing did not indicate that left-side vertical navigation was decidedly more usable than horizontal navigation at the top of the page, we nevertheless recommended that all guides should use left tab navigation, for consistency’s sake across guides, because left-side navigation has become standard on the web, and because other libguide studies have suggested that left-side navigation is easier to use than horizontal navigation, due to issues such as “banner blindness.”46 the librarians agreed, and a template was set up in the administrative console requiring that all public-facing libguides use left tab navigation. based on other usability studies in the literature as well, we also recommended that guides should include a maximum of seven main content tabs.47 although our study did not provide any actionable data about the relative usability of one-, two-, and three-column content designs, other articles in the literature had emphasized the importance of consistency and of avoiding a busy look with too much content. in order to avoid both a busy look and guides that looked decidedly different from each other due to an inconsistent number of columns, we therefore recommended that all guides should utilize a two-column layout, with the left column reserved for navigation and all content appearing in a single main column. however, future iterations of libguides usability testing should attempt to find ways to test whether limiting content to a single column is indeed more usable than dispersing it across two or more columns. the group voted on many of our recommendations, and several were simple to implement and oversee because they could translate into design decisions that could be set up as default, unchangeable options within the libguides administration module. other recommendations were more difficult to operationalize and enforce. for example, because our findings indicated that students attempted to search for course numbers to find a guide that they were told was relevant to their research for a specific class, another one of our recommendations to the librarians’ group was to include, as appropriate, various course numbers in their guides’ metadata in order to make the guides more discoverable and more immediately relevant to students’ coursework. this recommendation is not one that a libguides administrator could enforce due to issues revolving around subject matter and curriculum knowledge. the issue of context, and specifically the connection between courses and guides that has the potential to underscore their relevance and purpose to students, also caused us to question the effectiveness of general subject guides in assisting students with their research. 
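the course-number recommendation has to be carried out by individual guide owners, but it is easy to spot-check. the sketch below is illustrative only and is not part of the study: it assumes a hypothetical csv export of guide titles and metadata (guides.csv, with columns guide_title and metadata) and flags guides whose metadata carries a course code in only one spelling, since our testers searched for both “com 100” and “com100.”

```python
import csv
import re

# hypothetical export of guide metadata, e.g. columns: guide_title, metadata
GUIDES_CSV = "guides.csv"

# matches course codes written either as "COM 100" or "COM100"
COURSE_CODE = re.compile(r"\b([a-z]{2,4})\s?(\d{3})\b", re.IGNORECASE)

def spellings(match):
    """Return the spaced and unspaced spellings of a matched course code."""
    dept, num = match.group(1).upper(), match.group(2)
    return f"{dept} {num}", f"{dept}{num}"

with open(GUIDES_CSV, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        text = row["metadata"].upper()
        for match in COURSE_CODE.finditer(text):
            spaced, unspaced = spellings(match)
            # flag guides that carry only one spelling, so a search for the
            # other form (e.g. "com100" instead of "com 100") would miss them
            if spaced not in text or unspaced not in text:
                print(f'{row["guide_title"]}: add both "{spaced}" and "{unspaced}"')
```

in practice the same check could be run against whatever export or reporting an administrator already has; the csv file here is just a stand-in for that data.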
if students are more likely to understand the relevance and purpose of a libguide when it is explicitly connected to their specific class or assignment and less likely to make the connection between a general research guide and their coursework, then the creation and maintenance of general subject guides might not be worth the time and effort librarians invest in them. this question is made more pressing by studies in the literature that indicate both low usage and shallow use of guides, such as using them primarily to find a database.48 while this question did not lead to a specific recommendation to the librarians’ group, we have since reflected that the return on investment issue might be effectively addressed via closer collaboration with faculty in the disciplines. if research guides are more clearly aligned with specific research assignments in specific courses, and if faculty members instruct their students to consult library research guides and integrate libguides and other library resources into learning management systems, perhaps use and return on investment would improve. researchers like hess and hristova, for example, found that online tutorials that are required in specific courses show high usage.49 the connection between course integration and usage may hold true with libguides as well. regardless, students’ frequent lack of understanding of what guides are designed to do and their tendency to navigate quickly away from them rather than exploring them suggests that reconceptualizing what guides are designed to do, and what needs they are meant to meet in which specific contexts, might prove to be a useful exercise. a guide designed as an instructional tool to teach specific concepts, topic generation processes, search strategies, citation practices, etc. within the context of a specific assignment for a specific course may well be immediately perceived as relevant to students in that course. such a guide discussed in the context of a class might also be perceived as more useful than guides consisting of lists of resources and tools, which are unlikely to be interpreted as helpful by students who stumble upon them while seeking research assistance on the library’s website. as such, thinking about how and in what context students are likely to find guides, and how material might be presented so that guides are quickly perceived as a potentially relevant resource worth exploring, might also prove useful. the importance of talking to users cannot be overemphasized; without collecting user feedback, whether through usability testing or another method, it is difficult to know how students perceive and use libguides or any other library online service. getting user input on navigation flow, template design, and search functionality can provide valuable details that can help libraries improve the usability of their online resources. it is also important to note that in our rapidly changing environment, users’ needs and preferences also change. as such, collecting and analyzing user feedback to inform user-centered design should be a fluid process, not a one-time effort. admittedly, it can sometimes be challenging to make collective design decisions, particularly when librarians have strong opinions grounded in their own personal experiences working with students that conflict with usability testing data. 
although it is necessary to incorporate user feedback into the design process, it is also important to be open to compromise in order to achieve stakeholder buy-in for some usability-informed changes. as with many library services, usage of libguides is contingent at least in part on awareness, as students are unlikely to use services of which they are unaware or are unlikely to discover due to the limitations of a library’s search tools. given the prevalence of search dominance among our users, we should not assume that simply placing a “research guides” link on a webpage will lead to usage. increased outreach, better integration with the content of specific courses and assignments, and a thorough review of libguides content by those creating the guides with an eye toward the specific contexts in which they are likely to be used, taught, serendipitously discovered, etc. is necessary to ensure that the research guides librarians create are worth the time they invest in them. additional studies focusing on why students do or do not use specific types of research guides, the contexts in which they are most useful, how students use them, and the specific content in guides that students find most helpful are needed to determine whether and to what extent they are aligned with students’ information-seeking preferences, behaviors, and needs, as well as how they might be improved to increase their use and usefulness. am i on the library website? | conrad and stevens 76 https://doi.org/10.6017/ital.v38i3.10977 appendix 1: libguides usability testing tasks purpose: seeing how students browse or search to get to research guides task 1: you are writing a research paper on the topic of climate change for your com 100 class. your teacher told you that the library has a “research guide” that will help you write your paper. find the guide. start: library homepage purpose: testing tab orientation on top task 2: you need to compare two public opinion polls on the topic of climate change for your com 100 class. find a list of public opinion polls on the research guide shown. start: http://libguides.library.cpp.edu/controversialtopics or http://csupomona.beta.libguides.com/controversial-topics purpose: testing subtabs task 3: you are writing a research paper for your apparel merchandising & management class on the topic of product development. your teacher told you that the library has a “research guide” that includes a list of articles on product development. find the product development articles on this research guide. start: http://libguides.library.cpp.edu/amm or http://csupomona.beta.libguides.com/amm purpose: testing searching within the libguides pages task 4: if you were going to look for additional books on the topic of product development, what would you do next? start: http://libguides.library.cpp.edu/amm or http://csupomona.beta.libguides.com/amm purpose: testing two-tab column design task 5: you are designing an earthquake-resistant structure for your civil engineering course and need to review seismic load provisions. locate the asce standards on seismic loads. use the research guide we open for you. start: http://libguides.library.cpp.edu/civil or http://csupomona.beta.libguides.com/civilengineering purpose: seeing if including screenshots of search boxes is problematic task 6: your professor also asks you to find an online handbook or reference source on the topic of finite element analysis. locate an online handbook or reference source on this topic. 
start: http://libguides.library.cpp.edu/civil or http://csupomona.beta.libguides.com/civilengineering purpose: seeing if three-columns are noticeable task 7: find resources that might be good for background research on motivation and classroom learning for a psychology course. find an encyclopedic source on this topic. start: http://libguides.library.cpp.edu/psychology or http://csupomona.beta.libguides.com/psychology references 1 william hemmig, “online pathfinders: toward an experience-centered model,” reference services review 33, no. 1 (february 2005): 67, https://dx.doi.org/10.1108/00907320510581397. 2 charles h. stevens, marie p. canfield, and jeffrey t. gardner, “library pathfinders: a new possibility for cooperative reference service,” college & research libraries 34, no. 1 (january 1973): 41, https://doi.org/10.5860/crl_34_01_40. 3 “about springshare,” springshare, accessed may 7, 2017, https://springshare.com/about.html. 4 “libguides community,” accessed december 4, 2018, https://community.libguides.com/?action=0. 5 see, for example, alisa c. gonzalez and theresa westbrock, “reaching out with libguides: establishing a working set of best practices,” journal of library administration 50, no. 5/6 (september 7, 2010): 638–56, https://doi.org/10.1080/01930826.2010.488941. 6 suzanna conrad and nathasha alvarez, “conversations with web site users: using focus groups to open discussion and improve user experience,” the journal of web librarianship 10, no. 2 (2016): 74, https://doi.org/10.1080/19322909.2016.1161572. 7 ibid., 74. 8 suzanna conrad and julie shen, “designing a user-centric web site for handheld devices: incorporating data-driven decision-making techniques with surveys and usability testing,” the journal of web librarianship 8, no. 4 (2014): 349-83, https://doi.org/10.1080/19322909.2014.969796. 9 “about springshare.” 10 jimmy ghaphery and erin white, “library use of web-based research guides,” information technology and libraries 31, no. 1 (2012): 21-31, https://doi.org/10.6017/ital.v31i1.1830. 11 “libguides community,” accessed december 4, 2018, https://community.libguides.com/?action=0&inst_type=1. 12 katie e. anderson and gene r. springs, “assessing librarian expectations before and after libguides implementation,” practical academic librarianship: the international journal of the sla academic division 6, no. 1 (2016): 19-38, https://journals.tdl.org/pal/index.php/pal/article/view/19. 13 examples include: troy a. swanson and jeremy green, “why we are not google: lessons from a library web site usability study,” the journal of academic librarianship 37, no. 
3 (2011): 22229, https://doi.org/10.1016/j.acalib.2011.02.014; judith z. emde, sara e. morris, and monica claassen-wilson, “testing an academic library website for usability with faculty and graduate students,” evidence based library and information practice 4, no. 4 (2009): 24-36, https://doi.org/10.18438/b8tk7q; heather jeffcoat king and catherine m. jannik, “redesigning for usability: information architecture and usability testing for georgia tech https://dx.doi.org/10.1108/00907320510581397 https://doi.org/10.5860/crl_34_01_40 https://springshare.com/about.html https://community.libguides.com/?action=0 https://doi.org/10.1080/01930826.2010.488941 https://doi.org/10.1080/01930826.2010.488941 https://doi.org/10.1080/19322909.2014.969796 https://doi.org/10.6017/ital.v31i1.1830 https://community.libguides.com/?action=0&inst_type=1 https://journals.tdl.org/pal/index.php/pal/article/view/19 https://doi.org/10.1016/j.acalib.2011.02.014 https://doi.org/10.18438/b8tk7q information technology and libraries | september 2019 79 library’s website,” oclc systems & services 21, no. 3 (2005): 235-43, https://doi.org/10.1108/10650750510612425; danielle a. becker and lauren yannotta, “modeling a library website redesign process: developing a user-centered website through usability testing,” information technology and libraries 32, no. 1 (2013): 6-22, https://doi.org/10.6017/ital.v32i1.2311; darren chase, “the perfect storm: examining user experience and conducting a usability test to investigate a disruptive academic library web site redevelopment,” the journal of web librarianship 10, no. 1 (2016): 28-44, https://doi.org/10.1080/19322909.2015.1124740; andrew r. clark et al., “taking action on usability testing findings: simmons college library case study,” the serials librarian 71, no. 3-4 (2016): 186-96, https://doi.org/10.1080/0361526x.2016.1245170; anthony s. chow, michelle bridges, and patrician commander, “the website design and usability of us academic and public libraries: findings from a nationwide study,” reference & user services quarterly 53, no. 3 (2014): 253-65, https://journals.ala.org/index.php/rusq/article/view/3244/3427; gricel dominguez, sarah j. hammill, and ava iuliano brillat, “toward a usable academic library web site: a case study of tried and tested usability practices,” the journal of web librarianship 9, no. 2-3 (2015), https://doi.org/10.1080/19322909.2015.1076710; junior tidal, “one site to rule them all, redux: the second round of usability testing of a responsively designed web site,” the journal of web librarianship 11, no. 1 (2017): 16-34, https://doi.org/10.1080/19322909.2016.1243458. 14 kate a. pittsley and sara memmott, “improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides,” information technology and libraries 31, no. 3 (2012): 52-64, https://doi.org/10.6017/ital.v31i3.1880. 15 alec sonsteby and jennifer dejonghe, “usability testing, user-centered design, and libguides subject guides: a case study,” the journal of web librarianship 7, no. 1 (2013): 83-94, http://dx.doi.org/10.1080/19322909.2013.747366. 16 sarah thorngate and allison hoden, “exploratory usability testing of user interface options in libguides 2,” college & research libraries 78, no. 6 (2017), https://doi.org/10.5860/crl.78.6.844. 17 nora almeida and junior tidal, “mixed methods not mixed messages: improving libguides with student usability data,” evidence based library and information practice 12, no. 
4 (2017): 66, https://academicworks.cuny.edu/ny_pubs/166/. 18 ibid., 63; 71. 19 dana ouellette, “subject guides in academic libraries: a user-centered study of uses and perceptions,” canadian journal of information and library science 35, no. 4 (december 2011): 436–51, https://doi.org/10.1353/ils.2011.0024. 20 ibid., 442. 21 ibid., 442-43. https://doi.org/10.1108/10650750510612425 https://doi.org/10.6017/ital.v32i1.2311 https://doi.org/10.1080/19322909.2015.1124740 https://doi.org/10.1080/0361526x.2016.1245170 https://journals.ala.org/index.php/rusq/article/view/3244/3427 https://journals.ala.org/index.php/rusq/article/view/3244/3427 https://doi.org/10.1080/19322909.2015.1076710 https://doi.org/10.1080/19322909.2016.1243458 https://doi.org/10.6017/ital.v31i3.1880 http://dx.doi.org/10.1080/19322909.2013.747366 https://doi.org/10.5860/crl.78.6.844 https://academicworks.cuny.edu/ny_pubs/166/ https://doi.org/10.1353/ils.2011.0024 am i on the library website? | conrad and stevens 80 https://doi.org/10.6017/ital.v38i3.10977 22 ibid., 443. 23 ibid., 443; shannon m. staley, “academic subject guides: a case study of use at san jose state university,” college & research libraries 68, no. 2 (march 2007): 119–39, https://doi.org/10.5860/crl.68.2.119. 24 jakob nielsen, “search and you may find,” nielsen norman group, last modified july 15, 1997, https://www.nngroup.com/articles/search-and-you-may-find/. 25 jakob nielsen, “macintosh: 25 years,” nielsen norman group, last modified february 2, 2 009, https://www.nngroup.com/articles/macintosh-25-years/; jakob nielsen and raluca budiu, mobile usability (berkeley: new riders, 2013), chap. 2, o’reilly. 26 jakob nielsen, “incompetent research skills curb users’ problem solving,” nielsen norman group, last modified april 11, 2011, https://www.nngroup.com/articles/incompetent-searchskills/. 27 jakob nielsen, “search: visible and simple,” nielsen norman group, last modified may 13, 2001, https://www.nngroup.com/articles/search-visible-and-simple/. 28 ibid. 29 jakob nielsen, “mental models for search are getting firmer,” nielsen norman group, last modified may 9, 2005, https://www.nngroup.com/articles/mental-models-for-search/. 30 ibid. 31 erik ojakaar and jared m. spool, getting them to what they want: eight best practices to get users to the content they want (and to content they didn’t know they wanted) (bradford, ma: uie reports: best practices series, 2001). 32 amanda nichols hess and mariela hristova, “to search or to browse: how users navigate a new interface for online library tutorials,” college & undergraduate libraries 23, no. 2 (2016): 173, https://doi.org/10.1080/10691316.2014.963274. 33 ibid., 176. 34 hyejung han and dietmar wolfram, “an exploration of search session patterns in an imagebased digital library,” journal of information science 42, no. 4 (2016): 483, https://doi.org/10.1177/0165551515598952. 35 ibid., 487. 36 xi niu, tao zhang, and hsin-liang chen, “study of user search activities with two discovery tools at an academic library,” international journal of human-computer interaction 30 (2014): 431, https://doi.org/10.1080/10447318.2013.873281. 37 iris xie and soohyung joo, “tales from the field: search strategies applied in web searching,” future internet 2 (2010): 268-69, https://doi.org/10.3390/fi2030259. 
38 ibid., 275; 267-68. 39 ibid., 268-69. 40 sonsteby and dejonghe, “usability testing, user-centered design,” 83-94. 41 we experienced technical difficulties when capturing screens and audio simultaneously in camtasia. the audio did not sync in real time with the testing and we had to correct sync issues after the fact. a full technical test of screen capture and recording technology might have resolved this issue. 42 nielsen, “search: visible and simple.” 43 nielsen, “search and you may find”; nielsen, “incompetent research skills”; iris xie and soohyung joo, “tales from the field,” 268-69. 44 jakob nielsen, “search: visible and simple.” 45 ibid. 46 pittsley and memmott, “improving independent student navigation,” 52-64. 47 e.g., sonsteby and dejonghe, “usability testing, user-centered design,” 83-94. 48 ouellette, “subject guides in academic libraries,” 448; brenda reeb and susan gibbons, “students, librarians, and subject guides: improving a poor rate of return,” portal: libraries and the academy 4, no. 1 (january 22, 2004): 124, https://dx.doi.org/10.1353/pla.2004.0020; staley, “academic subject guides,” 119–39. 49 hess and hristova, “to search or to browse,” 174. 
information security in libraries: examining the effects of knowledge transfer tonia san nicolas-rocca and richard j. burkhard tonia san nicolas-rocca (tonia.sannicolas-rocca@sjsu.edu) is assistant professor in the school of information at san jose state university. richard j. burkhard (richard.burkhard@sjsu.edu) is professor in the school of information systems and technology in the college of business at san jose state university. abstract libraries in the united states handle sensitive patron information, including personally identifiable information and circulation records. with libraries providing services to millions of patrons across the u.s., it is important that they understand the importance of patron privacy and how to protect it. this study investigates how knowledge transferred within an online cybersecurity education course affects library employee information security practices. the results of this study suggest that knowledge transfer does have a positive effect on library employee information security and risk management practices. introduction libraries across the u.s. 
provide a wide range of services and resources to society. libraries of all types are viewed as important parts of their communities, offering a place for research, to learn about technology, to access accurate and unbiased information, and a place that inspires and sparks creativity. as a result, there were over 171 million registered public library users in the u.s. in 2016.1 a library is a collection of information resources and services made available to the community in which it serves. the american library association (ala) affirms the ethical imperative to provide unrestricted access to information and to guard against impediments to open inquiry.2 further, in all areas of librarianship, best practice leaves the library user in control of as many choices as possible.3 in a library, the right to privacy is the right to open inquiry without having the subject of one’s interest examined or scrutinized by others.4 many library resources require the use of a library card. to obtain a library card in the u.s. one must provide official photo identification showing personally identifiable information (pii), such as name, address, telephone number, and email address. pii connects library users or patrons with, for example, items checked out, and websites visited. as such, pii has the potential to build up an image of a library patron that could potentially be used to assess the patron’s character. in response, the ala developed a policy concerning the confidentiality of pii about library users.5 confidentiality extends to “information sought or received and resources consulted, borrowed, acquired or transmitted,” and includes, but is not limited to, database search records, reference interviews, circulation records, interlibrary loan records, and other personally identifiable uses of library materials, facilities, or services.6 in more recent years, the ala has further specified that the right of patrons to privacy applies to any information that can link “choices of taste, interest, or research with an individual.”7 when library users recognize or fear that their privacy or information technology and libraries | june 2019 59 confidentiality is compromised, true freedom of inquiry no longer exists. therefore, it is imperative that libraries use extra care when handling patron personally identifiable information. while librarians and other library employees may understand the importance of data protection, they generally don’t have the resources available to assess information security risk, employ risk mitigation strategies, or offer security education, training, or awareness (seta) programs. this is of particular concern as libraries increasingly have access to databases of both proprietary and personal information.8 seta programs are risk mitigation strategies employed by organizations worldwide to increase and maintain end-user compliance of information security and privacy policies. in libraries, information systems are widely used to provide services to patrons, however, there is little known about information security practices in libraries.9 given the sensitivity of the data libraries handle, and the lack of information security resources available to them, it is important for those currently or planning to work in the library environment to develop the knowledge necessary to identify risks and develop and employ risk mitigation strategies to protect information and information resources they are entrusted with. 
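one concrete, commonly recommended risk mitigation strategy of the kind discussed above is to avoid retaining linkable patron identifiers in the first place. the sketch below is purely illustrative and is not drawn from the study: it assumes a hypothetical circulation log (circulation_raw.csv) and shows keyed hashing, or pseudonymization, of patron barcodes before records are kept for statistics, so that reading histories are no longer tied directly to a patron’s pii.

```python
import csv
import hashlib
import hmac
import os

# secret key held by the library and never stored with the statistics
# (hypothetical; in practice it would come from a managed key store,
# not be regenerated with os.urandom on every run)
SECRET_KEY = os.urandom(32)

def pseudonymize(patron_barcode: str) -> str:
    """Replace a patron barcode with a keyed hash so circulation statistics
    can be retained without a directly identifying value."""
    return hmac.new(SECRET_KEY, patron_barcode.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# hypothetical raw log columns: patron_barcode, item_id, checkout_date
with open("circulation_raw.csv", newline="", encoding="utf-8") as src, \
     open("circulation_stats.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["patron_token", "item_id", "checkout_date"])
    writer.writeheader()
    for row in reader:
        writer.writerow({
            "patron_token": pseudonymize(row["patron_barcode"]),
            "item_id": row["item_id"],
            "checkout_date": row["checkout_date"],
        })
```

discarding the key later effectively anonymizes the retained records, though whether that satisfies a given confidentiality policy is a local decision, and most libraries lack the staffing and training to put even simple safeguards like this in place consistently.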
therefore, the research question in this present study is: how can cybersecurity education strengthen information security practices in libraries? currently, there is a dearth of research on information security practices in libraries.10 this is an important research gap to acknowledge given that patron privacy is fundamental to the practice of librarianship in the u.s., and the advancement in technology coupled with federal regulations adds to the challenges of keeping patron privacy safe.11 thus, this study contributes to current literature by evaluating the effects of knowledge transfer as a means to strengthen information security within libraries. furthermore, this study will offer a preliminary investigation as to whether knowledge utilization leads to motivation and participation in information security risk management activities within libraries. the remainder of this paper proceeds as follows: first, a review of knowledge transfer is covered. a description of the cybersecurity course, including students and course material, is provided. data collection and analysis are then presented. this is followed by a discussion of the findings, limitations, and future research. literature review knowledge transfer in seta knowledge transfer through seta programs plays a key role in the development and implementation of cybersecurity practices.12 knowledge is transferred when learning takes place and when the recipient of that knowledge understands the intricacies and implications associated with that knowledge so that he or she can apply it.13 for example, in a security education program, an educator may transfer knowledge about information security risks to users who learn and apply the knowledge to increase patron privacy. application of the knowledge is evidenced by users who are able to identify risks to patron data and implement risk mitigation strategies that serve to protect patron information and information system assets. knowledge transfer can be influenced by four factors: absorptive capacity, communication, motivation, and user participation.14 this study evaluates the extent to which knowledge transferred from a cybersecurity course strengthens information security practices within libraries. this study adapts the theoretical model as proposed by spears & san nicolas-rocca (2015) (see figure 1) to examine the effects of cybersecurity education on information security practices in libraries.15 figure 1. factors of knowledge transfer leading to knowledge utilization. absorptive capacity absorptive capacity is the ability of a recipient to recognize the importance and value of externally sourced knowledge, to assimilate it, and to apply it; it has been found to be positively related to knowledge transfer.16 activating a student’s prior knowledge could enhance their ability to process new information.17 that is, knowledge transfer is more likely to take place between the instructor and students enrolled in a cybersecurity course if the student has existing knowledge or has had experience in some related area. for the present study, students stated that prior to enrolling in the cybersecurity course, they had little to no knowledge of cybersecurity. one student mentioned, “while i am the director of a small academic library, i have no understanding of cybersecurity. 
i am taking this course to learn about cybersecurity so that i can better secure the library i work in and to share the information with those who work in the library.” another student mentioned, “my goal is to work in a public library after graduation. i am taking this course because i keep hearing about cybersecurity breaches in the news, and i want to learn more about cybersecurity because i think it will help me in my future job.” while all of the students enrolled in the course had no cybersecurity experience, all of them had some understanding of principle 3 in the ala code of ethics, which states, “we protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”18 understanding of principle 3 in the code of ethics demonstrates existing knowledge in some related area with regards to cybersecurity, albeit limited knowledge. given this understanding, students should have the ability to process new information from the cybersecurity course. information technology and libraries | june 2019 61 communication the success of any seta program depends on the ability of the instructor to effectively communicate the applicability and practical purpose of the material to be mastered, as distinguished from abstract or conceptual learning.19 according to current research, knowledge transfer can only occur if communication is effective in terms of type, amount, competence, and usefulness.20 for the present study, students were enrolled in an online graduate level cybersecurity course at a university we call mountain view university (mvu). we changed the name to protect the privacy of the research participants. while research suggests that the best form of communication for knowledge transfer is face-to-face communication, the cybersecurity course at mvu is only offered online.21 therefore, communication relating to the course was conducted via course management software, email, video conferencing, discussion board, and prerecorded videos. motivation motivation can be a significant influence on knowledge transfer.22 that is, an individual’s motivation to participate in seta programs has been found to influence the extent to which knowledge is transferred.23 specifically, without motivation, a trainee may fail to use information shared with them about methods used to protect and safeguard patron privacy. in this present study, research participants voluntarily enrolled in the cybersecurity course. the cybersecurity course is not a core course or a class required for graduation. therefore, enrolling in the course implies motivation to learn about cybersecurity by participating in course activities and completing assigned work. user participation user participation in information security activities may influence effective knowledge transfer initiatives.24 according to previous research, when users participate in cybersecurity activities, security safeguards were more aligned with organizational objectives and were more effectively designed and performed within the organization.25 for the present study, given that students enrolled in the cybersecurity course, it is expected that they will participate in information security risk management activities, such as the completion of personal and organizational risk management projects. cybersecurity course information this study will examine whether cybersecurity education strengthens information security practices within libraries. 
based on the model in figure 1, students enrolled in the cybersecurity course (motivation), and therefore, were expected to participate in all course activities and complete assigned work (user participation), such as isrm assignments. isrm assignments are described in the course material section below. as per figure 2, the cybersecurity course was offered online, and used multiple forms of communication, including email, video conferencing, discussion board, and pre-recorded videos (communication). students were able to access these resources through canvas, a learning management system. students came into the class with some understanding of principle 3 in the ala code of ethics. therefore, given that this knowledge is in a “related area,” students may be able to process new information relating to cybersecurity (absorptive capacity). as per the above information and as depicted in figure 1, motivation, user participation, communication, and absorptive capacity will lead to knowledge transfer. therefore, this study will focus on how knowledge transfer, as a means to strengthen information security, leads to knowledge utilization by cybersecurity students within information organizations. information security in libraries | san-nicolas-rocca and burkhard 62 https://doi.org/10.6017/ital.v38i2.10973 specifically, this study will explore the possibility of knowledge utilization leading to motivation, and participation in isrm initiatives in libraries. figure 2. knowledge transfer elements: cybersecurity knowledge transfer for information organizations. course material the course was offered to graduate students at mountain view university. course material was created based on the national institute of technology special publication (nist sp) 800-53 and 60, as well as federal information processing standards (fips) publications 199 and 200. the focus of the course was information security risk management (isrm). course requirements included lab exercises, discussion posts relating to current cybersecurity findings and news reports, and isrm assignments. isrm assignments included a personal risk management assignment, which then led to the completion of an organizational risk management project (ormp). students completed the ormp for various libraries, healthcare institutions, pharmaceutical companies, government organizations, and small businesses. with instructor approval, students were allowed to select the organization they wanted to work with. the objective of the course was for students to obtain an understanding of isrm and be able to apply what they have learned to the workplace. course communication seta programs depend strongly on the ability of the knowledge source to effectively communicate the importance and applicability of the knowledge shared. current research suggests that the type of communication medium, relevance and usefulness of the information, and competency of the instructor can affect knowledge transfer. given that face-to-face communication is considered the best method for successful knowledge transfer, it is important to understand if online communication methods were effective in the cybersecurity course described herein as the main focus of this study is to determine if knowledge transfer leads to knowledge utilization. 
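the isrm assignments described above ask students to identify and categorize organizational risks in the spirit of the nist and fips guidance the course draws on. as a tiny worked illustration of that general technique, the sketch below rates confidentiality, integrity, and availability impact for an information system and takes the high-water mark as the overall category, in the style of fips 199/200. it is not material from the course; the example systems and ratings are invented.

```python
# fips 199-style security categorization: rate potential impact on
# confidentiality (c), integrity (i), and availability (a),
# then take the high-water mark as the overall category.
LEVELS = {"low": 1, "moderate": 2, "high": 3}

def high_water_mark(confidentiality: str, integrity: str, availability: str) -> str:
    """Overall category is the highest of the three impact ratings."""
    return max(confidentiality, integrity, availability, key=LEVELS.__getitem__)

# invented examples for a small library, for illustration only
systems = {
    "patron database (pii, circulation records)": ("high", "moderate", "moderate"),
    "public events calendar": ("low", "low", "moderate"),
}

for name, (c, i, a) in systems.items():
    print(f"{name}: overall impact = {high_water_mark(c, i, a)} (c={c}, i={i}, a={a})")
```

a worked categorization like this is only the starting point of a risk management assignment; selecting and justifying mitigations for the higher-impact systems is where most of the effort lies.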
according to table 1, respondents “strongly agree” or “agree” that the materials used, relevance of communication, comprehension of instructor communication, and the amount of time communicating about cybersecurity in the course was effective (data collection described in section, data collection and analysis. information technology and libraries | june 2019 63 questions response strongly agree agree neither agree nor disagree disagree strongly disagree medium: the material used in the cybersecurity course i took at mvu communicated security lessons effectively. 12 (50%) 12 (50%) 0 (0.00%) 0 (0.00%) 0 (0.00%) relevance: communication during the cybersecurity course i took at mvu was effective in focusing on things i needed to know about cybersecurity for my job. 10 (45.45%) 12 (54.55%) 0 (0.00%) 0 (0.00%) 0 (0.00%) comprehension: in the cybersecurity course i took at mvu, the instructor’s oral and/or written communication with me was understandable. 12 (54.55%) 10 (45.45%) 0 (0.00%) 0 (0.00%) 0 (0.00%) amount: in the cybersecurity course i took at mvu, the amount of time communicating about cybersecurity was sufficient. 12 (54.55%) 10 (45.45%) 0 (0.00%) 0 (0.00%) 0 (0.00%) table 1. effectiveness of communication in cybersecurity course. data collection and analysis the purpose of this study is to determine if knowledge transfer through cybersecurity education, as a means to strengthen information security, leads to knowledge utilization within libraries. specifically, this study will examine if research participants will engage in isrm activities after completion of the cybersecurity education course. the model in figure 1 is examined via survey instrument by the authors. the survey instrument was available to former students who completed an online, semester long, cybersecurity course from fall 2013 through fall 2017. one hundred and twenty-six former students completed one of eight cybersecurity courses, and all were asked to participate in this study. thirty-nine students accessed the survey, but only thirty-eight agreed to participate. of those who agreed to participate in the survey, only twenty-two work in a library in the u.s. or a u.s. territory. of the other sixteen participants, twelve do not currently work within a library environment, and four do not have a job. therefore, responses from twenty-two research participants who work in a library in the u.s. or u.s. territory will be reported in this study. table 2 provides a list of the types of libraries the twenty-two research participants work in. type of library environment response (22) academic library 3 (13.64%) public library 11 (50%) school library (k-12) 2 (9.09%) special library 6 (27.27%) table 2. types of libraries research participants work in. information security in libraries | san-nicolas-rocca and burkhard 64 https://doi.org/10.6017/ital.v38i2.10973 having knowledge and an understanding of information security policies, work processes, and information and information system use within a library environment, a knowledge recipient may understand the value of the knowledge shared with them through effective seta programs and utilize the new knowledge to protect information and information resources. 
according to table 3, most survey participants stated that they have average to excellent knowledge of their library’s computing-related policies, work processes that handle sensitive patron information, how access to patron information is granted, and how internal staff tend to use computing devices to access organizational information. a few respondents stated that their knowledge is below average. questions: response excellent above average average below average poor how would you rate your knowledge of your organization’s computing-related policies for internal staff computer usage? 4 (18.18%) 10 (45.45%) 8 (36.36%) 0 (0.00%) 0 (0.00%) how would you rate your knowledge of your library’s work processes that handle sensitive patron information? 4 (18.18%) 11 (50%) 6 (27.27%) 1 (4.55%) 0 (0.00%) within the organization you work for, how would you rate your knowledge of how access to patron information is granted? 3 (13.64%) 12 (54.55%) 5 (22.73%) 2 (9.10%) 0 (0.00%) how would you rate your knowledge on how internal staff tend to use computing devices to access organizational information? 2 (9.10%) 11 (50%) 8 (36.36%) 1 (4.55%) 0 (0.00%) table 3. knowledge of organization’s computing-related policies. knowledge transfer for this study, knowledge transfer is measured as the extent to which the cybersecurity student acquired knowledge or understands the key educational objective. according to table 4 below, all survey participants stated that during the cybersecurity course, they acquired knowledge on information security risks, and solutions to manage information security risks within organizations. furthermore, 91 percent of the twenty-two survey participants stated that they gained an understanding of the feasibility to implement solutions and potential impact of not implementing solutions to manage information security risk within the organizations in which they work. this is consistent with previous research that has measured knowledge transfer.26 question: during the cybersecurity course i took at mvu, i _________. response acquired knowledge on information security risks within the organization. 22 (100%) acquired knowledge on solutions to manage information security risks identified within my organization. 22 (100%) gained an understanding of the feasibility to implement solutions to manage information security risks identified within my organization. 20 (90.90%) gained an understanding of the potential impact of not implementing solutions to manage information security risks identified within my organization. 20 (90.90%) information technology and libraries | june 2019 65 table 4. indicators of knowledge transfer. knowledge utilization the desired outcome of knowledge transfer is knowledge utilization.27 this study is interested in the extent to which cybersecurity students have been engaged in information security risk management initiatives in their workplace since the completion of the cybersecurity course. according to table 5, twelve of the twenty-two survey participants have utilized the knowledge transferred to them from the cybersecurity course within the libraries in which they work. of the twelve survey participants, ten performed security procedures within the organization on an ad hoc, informal basis. seven worked on defining new or revised security policies. four implemented new or revised security procedures for organizational staff to follow, and two evaluated at least one security safeguard to determine whether it is being followed by organizational staff. 
question: since the completion of the cybersecurity course i took at mvu, i have ______ (please check all that apply). response performed security procedures within the organization on an ad hoc, informal basis. 10 (83.33%) worked on defining new or revised security policies. 7 (58.33%) implemented new or revised security procedures for organizational staff to follow. 4 (33.33%) evaluated at least one security safeguard to determine whether it is being followed by organizational staff. 2 (16.66%) not performed any security procedures within the organization. 10 (45.45%) table 5. indicators of knowledge utilization in the library. participation knowledge transfer through cybersecurity education may influence a cybersecurity student to utilize the knowledge they have gained by participating in isrm activities. according to table 6, sixteen of the twenty-two survey participants have participated in isrm activities in the library in which they work since the completion of the cybersecurity course. fifteen communicated with internal senior management on training materials. seven performed a policy review and communicated with internal senior management on training materials. five worked on a security questionnaire, one had an interview with an external collaborator, and another research participant analyzed their library’s business or it process workflow. question: since the completion of the cybersecurity course you took at mvu, have you performed any of the following activities within the workplace: (please check all that apply) response security questionnaire 5 (31.25%) interview with external collaborator (i.e. trainers) 1 (6.25%) policy review 7 (43.75%) business or it process workflow analysis 1 (6.25%) communication with internal peers or staff on training materials 15 (93.75%) communicate with internal senior management on training materials 7 (43.75%) i have not performed any security activities in my workplace 6 (14.29%) table 6. participation in isrm activities. participation may also include discussions on isrm activities. according to table 7, sixteen of the twenty-two survey participants have participated in discussion on isrm activities within the information security in libraries | san-nicolas-rocca and burkhard 66 https://doi.org/10.6017/ital.v38i2.10973 libraries they are currently working at. fifteen survey participants participated in discussions on physical security, and ten had discussions on password policy. seven survey participants had discussions on user provisioning, and six had discussions on encryption. four survey participants had discussions on mobile devices, and another four had discussions on vendor security question: since the completion of the cybersecurity course you took at mvu, have you participated in discussions on the following areas of security? (check all that apply) response password policy 10 (62.5%) user provisioning (i.e., establishing or revoking user logons and system authorization) 7 (43.75%) mobile device 4 (25%) encryption 6 (37.5%) vendor security 4 (25%) physical security 15 (93.75%) disaster recovery, business continuity, or security incident response 6 (37.50%) i have not participated in any discussions relating to security in my workplace 6 (27.27%) table 7. participation in discussions on isrm activities. participation in cybersecurity education may lead to formal responsibility or accountability of isrm activities. 
according to table 8, nine of the twenty-two survey respondents stated that since the completion of the cybersecurity course, they are formally responsible or accountable for isrm in the libraries in which they work. three research participants are responsible for identifying organizational members to participate in cybersecurity training. five survey participants stated that they are responsible for communicating results on cybersecurity training to upper management, peers, and staff. three research participants are responsible for organizational compliance with government regulations. two are responsible for communicating organizational risk to the board of directors, and one research participant is responsible for organizational compliance of funder requirements. question: since the completion of the cybersecurity course you took at mvu, are you formally responsible or accountable in the following ways? (check all that apply) response identifying organizational members to participate in cybersecurity training 3 (33.33%) communicating results to upper management 5 (55.56%) communicating results to peers or staff 5 (55.56%) responsible for organizational compliance of funder requirements 1 (1.11%) responsible for organizational compliance with government regulations 3 (33.33%) responsible for internal audit 0 (0%) responsible for communicating organizational risk to the board of directors 2 (22.22%) i am not formally responsible for security in my workplace 13 (59.10%) table 8. participation via accountability of isrm activities. motivation an objective of seta programs is to motivate knowledge recipients to comply with information security policies that serve to protect information and information resources. as such, cybersecurity education may motivate students to comply with organizational information security policies that serve to protect information and information resources. according to table 9, information technology and libraries | june 2019 67 since the completion of the cybersecurity course, eighteen of the twenty-two survey participants stated that they believe it is important to protect patron sensitive data. two respondents stated that they wholeheartedly feel responsible to protect their patrons from harm, and another two stated that they would be embarrassed if their organization experienced a data breach. since the completion of the cybersecurity course i took at mvu, _________. response i wholeheartedly feel responsible to protect our patrons from harm. 2 (9.10%) i believe it is important to protect our patrons’ sensitive data. 18 (81.82%) i would be embarrassed if my organization experienced a data breach. 2 (9.10%) my job could be in jeopardy if my organization were to experience a data breach. 0 (0.00%) i do not care about cybersecurity in my organization. 0 (0.00%) table 9. motivation to protect patron privacy. discussion the purpose of this study was to evaluate the effects of knowledge transfer as a means to strengthen information security within libraries. given the results from the survey instrument, the findings suggest that knowledge transfer through cybersecurity education can lead to knowledge utilization. specifically, knowledge transfer through cybersecurity education may influence a library employee to utilize the knowledge they have gained by participating in discussions about, and the accountability and responsibility of isrm activities. in addition, participating in seta programs. 
seta programs are implemented within organizations as a means to increase compliance with information security policies. the findings suggest that library employees who completed a cybersecurity education course believe that it is important to, or feel that they have a responsibility to, protect patron private information. a couple of research participants stated that they would feel embarrassed if their organization experienced a data breach. a student enrolled in a cybersecurity education course may develop an understanding of and value the information that is passed on from the knowledge source about isrm activities. with ongoing development and implementation of seta programs, activating a student’s prior knowledge of isrm activities could enhance their ability to process new information and apply it to their jobs. limitations and future research this research was conducted based on an online cybersecurity course offered at a university located in the western u.s. therefore, future research is needed to study how cybersecurity courses in other parts of the u.s. and internationally affect knowledge transfer as a means to strengthen isrm initiatives in libraries and other information organizations. it would also be valuable to conduct a modified version of this research within a classroom-based, face-to-face cybersecurity course. furthermore, studies of seta programs implemented in libraries in the united states and internationally would add to this research area. there were 126 potential research participants identified, and although all were asked to participate, only thirty-eight completed the online survey. of the thirty-eight completed surveys, responses from twenty-two participants were reported in this article. participation from additional research participants may have generated different results. while a major limitation of this study is its small sample and exploratory, pilot-study focus, a next phase of research should further investigate what type of seta programs would be most effective in different library environments. while cybersecurity education may not be feasible for all library employees to obtain, examining and implementing the most effective seta program for each library environment could strengthen cybersecurity practices in libraries across the u.s. a future study instrument should take into account the factors that influence knowledge transfer (absorptive capacity, communication, motivation, and user participation) as a means to strengthen isrm practices. a common and important outcome for seta programs is user compliance with information security policies. as such, a future study should test library employee knowledge of, and compliance with, information security policies. conclusion u.s. libraries handle sensitive patron information, including personally identifiable information and circulation records. with libraries providing services to millions of patrons across the united states, it is important that they understand the importance of patron privacy and how to protect it. this study investigated how knowledge transferred within an online cybersecurity education course as a means to strengthen information security risk management affects library employee information security practices. the results of this study suggest that knowledge transfer does have a positive effect on library employee information security and risk management practices. 
references 1 “public library survey (pls) data and reports,” institute of museum and library services, retrieved on june 10, 2018 from https://www.imls.gov/research-evaluation/datacollection/public-libraries-survey/explore-pls-data/pls-data. 2 “policy concerning confidentiality of personally identifiable information about library users,” american library association, july 7, 2006, http://www.ala.org/advocacy/intfreedom/statementspols/otherpolicies/policyconcerning; “professional ethics,” american library association, may 19, 2017, http://www.ala.org/tools/ethics. 3 “privacy: an interpretation of the library bill of rights,” american library association, amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy. 4 ibid. 5 “policy concerning confidentiality of personally identifiable information about library users,” american library association; “code of ethics of the american library association,” american library association, amended jan. 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics. 6 “policy concerning confidentiality of personally identifiable information about library users,” american library association; “code of ethics of the american library association,” american library association. 7 “privacy: an interpretation of the library bill of rights,” american library association. 8 samuel t.c. thompson, “helping the hacker? library information, security, and social engineering,” information technology and libraries 25, no. 4 (2006): 222-25, https://doi.org/10.6017/ital.v25i4.3355. 9 roesnita ismail and awang ngah zainab, “assessing the status of library information systems security,” journal of librarianship and information science 45, no. 3 (2013): 232-47, https://doi.org/10.1177/0961000613477676. 10 ibid. 11 shayna pekala, “privacy and user experience in 21st century library discovery,” information technology and libraries 36, no. 2 (2017): 48-58, https://doi.org/10.6017/ital.v36i2.9817. 12 tonia san nicolas-rocca, benjamin schooley, and janine l. spears, “exploring the effect of knowledge transfer practices on user compliance to is security practices,” international journal of knowledge management 10, no. 2 (2014): 62-78, https://doi.org/10.4018/ijkm.2014040105; janine spears and tonia san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” international journal of knowledge management 11, no. 4 (2015): 52-69, https://doi.org/10.4018/ijkm.2015100104. 13 dong-gil ko, laurie j. kirsch, and william r. king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” mis quarterly 29, no. 1 (2005): 59-85, https://doi.org/10.2307/25148668. 14 spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69; dana minbaeva et al., “mnc knowledge transfer, subsidiary absorptive capacity and hrm,” journal of international business studies 45, no. 1 (2014): 38-51, https://doi.org/10.1057/jibs.2013.43; geordie stewart and david lacey, “death by a thousand facts: criticising the technocratic approach to information security awareness,” information management & computer security 20, no. 1 (2012): 29-38, https://doi.org/10.1108/09685221211219182; mark wilson et al., “information technology training requirements: a role- and performance-based model” (nist special publication 800-16), national institute of standards and technology, (2018), https://www.nist.gov/publications/information-technology-security-training-requirementsrole-and-performance-based-model; san nicolas-rocca, schooley, and spears, “exploring the effect of knowledge transfer practices on user compliance to is security practices,” 62-78. 15 spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69. 16 janine l. spears and henri barki, “user participation in information systems security risk management,” mis quarterly 34, no. 3 (2010): 503-22, https://doi.org/10.2307/25750689; piya shedden, tobias ruighaver, and atif ahmad, “risk management standards-the perception of ease of use,” journal of information systems security 6, no. 3 (2010): 23-41. 17 shedden, ruighaver, and ahmad, “risk management standards-the perception of ease of use,” 23-41; janne hagen, eirik albrechtsen, and stig ole johnsen, “the long-term effects of information security e-learning on organizational learning,” information management & computer security 19, no. 3 (2011): 140-54, https://doi.org/10.1108/09685221111153537. 18 “code of ethics of the american library association,” american library association. 19 spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69; wilson et al., “information technology training requirements: a role- and performance-based model” (nist special publication 800-16). 20 thompson s.h. teo and anol bhattacherjee, “knowledge transfer and utilization in it outsourcing partnerships: a preliminary model of antecedents and outcomes,” information & management 51, no. 2 (2014): 177-86, https://doi.org/10.1016/j.im.2013.12.001; ko, kirsch, and king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” 59-85; minbaeva et al., “mnc knowledge transfer, subsidiary absorptive capacity and hrm,” 38-51; geordie stewart and david lacey, “death by a thousand facts: criticising the technocratic approach to information security awareness,” information management & computer security 20, no. 1 (2012): 29-38, https://doi.org/10.1108/09685221211219182. 21 martin spraggon and virginia bodolica, “a multidimensional taxonomy of intra-firm knowledge transfer processes,” journal of business research 65, no. 9 (2012): 1273-82, https://doi.org/10.1016/j.jbusres.2011.10.043; shizhong chen et al., “toward understanding inter-organizational knowledge transfer needs in smes: insight from a uk investigation,” journal of knowledge management 10, no. 3 (2006): 6-23, https://doi.org/10.1108/13673270610670821. 22 maryam alavi and dorothy e. leidner, “review: knowledge management and knowledge management systems: conceptual foundations and research issues,” mis quarterly 25, no. 1 (2001): 107-36, https://doi.org/10.2307/3250961. 23 ko, kirsch, and king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” 59-85.
24 san nicolas-rocca, schooley, and spears, “exploring the effect of knowledge transfer practices on user compliance to is security practices,” 62-78; spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69. 25 spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69; spears and barki, “user participation in information systems security risk management,” 503-22. 26 san nicolas-rocca, schooley, and spears, “exploring the effect of knowledge transfer practices on user compliance to is security practices,” 62-78; janine l. spears and tonia san nicolas-rocca, “information security capacity building in community-based organizations: examining the effects of knowledge transfer,” 49th hawaii international conference on system sciences (hicss), koloa, hi, 2016, 4011-20, https://doi.org/10.1109/hicss.2016.498; ko, kirsch, and king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” 59-85. 27 ko, kirsch, and king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” 59-85; teo and bhattacherjee, “knowledge transfer and utilization in it outsourcing partnerships: a preliminary model of antecedents and outcomes,” 177-86. editorial board thoughts virtual reality: the next big thing for libraries to consider breanne kirsch information technology and libraries | december 2019 4 breanne kirsch (breanne.kirsch@briarcliff.edu) is university librarian, briar cliff university. i had the pleasure of attending the educause annual conference from october 14-17, 2019. this was my first time at educause, but i was impressed with the variety of programs, vendors, and options for learning about technology and higher education. after recently completing my coursework for a second master's in educational technology, i was curious to see what new technologies would be highlighted at educause. i found out about some new trends, such as the growth of esports in high schools and higher education. in esports, players or teams compete through computers in video game competitions.1 there were over 20 programs and sessions about virtual reality at educause. since there were so many programs about virtual reality at educause, i wanted to share a little of what i learned, including how some higher education institutions are creating vr content, how they are using pre-created content, and how vr is being used in libraries. since virtual reality is still new to many higher education institutions, i wasn't sure how many would be creating content, but i did attend a couple of sessions about how 360-degree content is being created. virtual reality content creation seems to happen most frequently in the medical field so students can practice different procedures that may not happen very frequently in their jobs, allowing them to experience a wider variety of procedures that they will eventually encounter in the workplace. health sciences libraries are generally ahead of the curve in providing vr services to patrons.2 additionally, stem areas are finding more uses for vr, such as vr laboratories, so that expensive lab equipment does not need to be purchased but students can still participate in vr lab experiences. creating vr content using tools such as unity can be difficult and time-consuming.
some educators are using 360-degree cameras to create virtual settings that can be used by students but are easier to create. tim fuller and rich kappel spoke about how they used a 360-degree camera and matterport scans to create 360-degree virtual environments for students to explore and engage with robotics technology. tags can then be added to include pictures, videos, or links to websites with more information. this creates a shareable link that can be sent to students. i was able to use my iphone and the google street view app to create a 360-degree tour of my library. it is not of high enough quality to view in virtual reality with an oculus go or other vr headset, but it is a great starting point for creating a 360-degree virtual tour of a library on a budget. this was free (since i already had an iphone). there is a wide variety of freely available 360-degree content that can be used by educators in the classroom, and more is being created. what does this mean for libraries? while quick virtual tours can be created with smartphones, higher quality vr experiences can also be created by librarians using a 360-degree video camera. these experiences could be used to teach students information literacy skills or search strategies in a vr environment. while this would be harder to do right now with the technologies available, it could become easier down the road. meanwhile, librarians can create 360-degree virtual tours. libraries can offer vr services, such as a vr lab or checking out standalone vr headsets, such as the oculus go or oculus quest. just like with the makerspaces trend, libraries are well situated to support virtual reality in education. our library circulates an oculus go, and when we were considering adding a virtual reality headset, there were some risks we considered prior to purchasing it. there are health risks for some people when using virtual reality headsets, such as motion sickness, dizziness, and, in some cases, epileptic seizures. it is important to explain this to students before they check out the device, so they know to immediately quit using the oculus go if they have an adverse reaction. additionally, we keep cleaning wipes with the oculus go to help keep it sanitary when multiple people are using it. a tablet or smartphone needs to be associated with the oculus go in order to update apps or download new apps. therefore, a passcode needs to be added so students can't purchase paid apps on the oculus go with the associated credit card. privacy can also be a concern, especially when using the social apps, which is why i decided not to download the social apps on the oculus go at this time. some of the scary apps, such as the face your fear app, can cause students to scream, so it is important that students realize how realistic the experiences are before using them. one final consideration when offering vr services is staffing. there needs to be someone trained in the library who can help teach students how to use the vr headset and experiences. i've trained each of our student workers in how to use the headset so they can show other students. while these are some important considerations when deciding whether to offer vr services or not, i believe the benefits outweigh the risks.
virtual reality is expected to continue to grow, especially with wireless headsets, such as the oculus go and oculus quest, available. it is important for libraries to be ready to offer support with virtual reality, just as we've offered support for prior technologies including tablets, laptops, computers, 3d printers, etc. libraries can start small, by circulating an oculus go or creating a 360-degree library tour. libraries with more resources could create a vr lab or provide support for creating vr content, such as 360-degree video cameras or tools like unity. it will be exciting to see how libraries can support vr in the future. further readings van arnhem, jolanda-pieta, christine elliott, and marie rose. augmented and virtual reality in libraries. lanham: rowman & littlefield, 2018. varnum, kenneth j. beyond reality: augmented, virtual, and mixed reality in the library. chicago: ala editions, 2019. endnotes 1 matthew a. pluss, kyle j. m. bennett, andrew r. novak, derek panchuk, aaron j. coutts, and job fransen, “esports: the chess of the 21st century,” frontiers in psychology 10, no. 156, 2019, https://doi.org/10.3389/fpsyg.2019.00156. 2 susan lessick and michelle kraft, “facing reality: the growth of virtual reality and health sciences libraries,” journal of the medical library association 105, no. 4, 2017, https://doi.org/10.5195/jmla.2017.329. measuring information system project success through a software-assisted qualitative content analysis jin xiu guo information technology and libraries | march 2019 53 jin xiu guo (jin.x.guo@stonybrook.edu) is director of collections and resource management, frank melville, jr. memorial library, stony brook university abstract information system (is)/it project success is a growing interest in management due to its high impact on organizational change and effectiveness. libraries have been adopting integrated library systems (ils) to manage services and resources for years. it is essential for librarians to understand the mechanism of is project management in order to successfully bring technology innovation to the organization. this study develops a theoretical model of measuring is project success and tests it in an ils merger project through a software-assisted qualitative content analysis. the model addresses project success through three constructs: (1) project management process, (2) project outcomes, and (3) contextual factors. the results indicate project management success alone cannot guarantee project success; project outputs and contextual factors also influence success through the leadership of the project manager throughout the lifecycle. the study not only confirms the proposed model in a post-project evaluation, but also signifies that project assessment can reinforce organizational learning, increase the chance of achieving success, and maximize overall returns for an organization. the qualitative content analysis with nvivo 11 has provided a new research method for project managers to self-assess is/it project success systematically and learn from their experiences throughout the project lifecycle. introduction information technology (it) project success has drawn more attention in the last two decades due to its high impact on organizational change. more companies have pursued innovation to gain business advantages through is projects.
in the united kingdom alone, 21 percent of the increase in gross value in manufacturing and construction happens through complex products and is development projects. however, the implementation of is projects has not been as successful as practitioners hoped. nicholas and hidding reported that only 35 percent of it projects were completed on time and budget and met the project requirements.1 the u.s. office of electronic government and information technology (oegit) noted that only 25 percent of 1,400 projects reached the office's goals and that more than $21 billion spent on it projects was in jeopardy.2 in the european union, about 20 to 30 percent of contracted it/is projects could not meet the stakeholders' expectations, causing a loss of ₤70 billion, or $99 billion.3 although some it projects are considered successful from the perspective of project management, project sponsors hardly recognize the results as leading to organizational effectiveness. it is critical for it practitioners to explore new methods to articulate what it project success is and then improve project performance. traditionally, the measurement of it project success focuses on internal measures such as project time, cost, risk, and quality, which address project efficiency. in recent years, external measures, such as product satisfaction and organizational effectiveness, have gained more attention. moreover, contextual factors such as top management support, project managers' qualifications, system vendors, implementation consultants, and adaptation to change have shown critical effects on project success. there is still a lack of literature on post-project evaluation and on the merger of multiple information systems (is). notably, the consolidation of information systems of different organizations creates additional challenges for the new organizations. diverse cultures and leadership styles may create barriers for managers to gain the trust of employees who used to work at a different institution. nevertheless, adaptation to change for all staff is necessary in the course of the merger. the need for addressing the impact of these factors on is project success is increasing. libraries have adopted the ils to manage services and resources for the last two decades. the next generation of systems, cloud-based library management systems, is now replacing existing ils. to improve the efficiency of higher education, consolidation of public universities or colleges is still a viable alternative. it is essential for librarians to understand the mechanism of is project management in order to successfully bring technology innovation to the organization. this study fills the gap by examining is project success factors and developing a model to measure is project success. the model can help practitioners better understand is project success and improve the chance of success. the author first provides a historical account of the definitions of project success and the measures adopted, and then applies the model in a post-project evaluation at an academic library. theoretical background researchers and practitioners have been seeking it project success through both quantitative and qualitative studies to find out what makes a successful it project and how a project manager can make better decisions to increase the chance of project success. this review examines how project success is defined and what criteria practitioners employ for measurement.
it projects can be at different levels of complexity. for instance, a project of enterprise resource planning (erp) implementation is more complicated and requires more resources to deploy across organizational functions. this type of project might quickly overrun its budget and deadline. as a result, studies on erp implementation success draw more attention. cảrstea believes that project success is achieving the targets that an organization has set and can be measured against time, cost, quality, final results obtained, resources, the degree of automation, and international standards with a flexible evaluation system. he suggests that project managers may analyze the discrepancies between current and new goals to self-evaluate progress.4 although this method emphasizes project efficiency, the self-developed evaluation system has shown the potential for it project managers to control the planning and organization of multiple it projects within the organization. instead of studying the project management process alone, tsai et al. incorporate system providers, implementation consultants, and the achievement level of project management into delone and mclean's modified is success model. they describe erp project success as efficient deployment and enhancement of organizational effectiveness. the success indicators include the accomplishment level of project management and the degree of improvement of is performance. the metrics of project management are fulfilling the business implementation goal, top management support, budget, time, communication, and troubleshooting, while the system performance dimensions include achieving integration of systems for system quality, information quality, system use, user satisfaction, and individual and organizational impacts. the authors applied the research model to a quantitative study to test five hypotheses with servqual (service quality) instruments. the results show that the services provided by system vendors and implementation consultants are correlated with project management, and project management in turn with system performance.5 it is worth mentioning that this measurement integrates project management into the is success model and confirms the contribution of project management to erp performance that leads to the improvement of organizational effectiveness. both studies indicate is project measures should comprise the dimensions of project management success and business goals. with a similar interest in erp, young and jordan investigate the impact of top management support (tms) on erp implementation success through descriptive case studies. the authors regard project success as the delivery of “expected benefits” and the achievement of “above average performance.” the findings of the research reveal that tms is the most important critical success factor (csf) that affects it project success through the involvement of top management in project planning, result follow-ups, and the facilitation of management problem-solving, but project management success alone does not guarantee project success that results in organizational productiveness.6 researchers are also interested in different perspectives of it project success. irani believes is project appraisal should incorporate investment evaluation into the project lifecycle.
a project manager evaluates is impacts before, during, and after the investment is secured to dynamically justify the investment and ensure the project is in alignment with the organizational strategy. the author also points out that post-project evaluation is lacking in current project management, so organizations lose a great opportunity to learn and to optimize their project management.7 furthermore, peslak inspects the relationship between it project success and overall it returns from the viewpoint of financial executives. the author defines it project success as organizational success in which staying abreast of technology, the ability to measure projects, and balanced managerial control over projects positively affect it project success, and project success in turn affects overall it returns.8 likewise, lacerda and ensslin develop a conceptual model from the standpoint of external consultants to assess software projects. the theoretical framework contains the hierarchical structure of value, analysis, and recommendation, where they identify performance descriptors and analyze project values to improve the decision process in the course of the consultation.9 nicholas and hidding discover, through a series of interviews with it project managers, that business goals, time for learning and reflection, and flexibility of the product are associated with project success.10 additionally, researchers make efforts to explain project outcomes to better understand project success. thomas and fernández believe project success varies from company to company, but the success criteria should consist of project management, technical system, and business goals that underscore business continuity, met business objectives, and delivery of benefits.11 another study performed by kutsch also shows that the achievement of business purpose; benefit to the owner; the satisfaction of owners, users, and stakeholders; achieving pre-stated objectives; quality; cost and time; and satisfaction of the team are sequentially significant variables affecting project outcomes.12 the study further attests that organizational effectiveness is an essential criterion of it project success. interestingly, researchers also examine individual success indicators such as quality and risk to deepen their understanding of project success. geraldi, kutsch, and turner think project quality has eight attributes, including (1) a commitment to quality, (2) enabling capabilities, (3) completeness, (4) clarity, (5) integration, (6) adaptability, and (7) compliance, along with (8) value-adding and met requirements.13 among them, enabling capabilities and adaptability are comparatively new. this discovery discloses that project quality is evaluated vigorously across the project lifecycle, which is consistent with cảrstea's finding that project managers need to assess their projects regularly to recognize project controls and safety and achieve project goals. such practices create agility for software development projects and secure the resources needed for development. summary of literature the literature review indicates that it is necessary to define project performance criteria and outcomes to measure is project success. is project success is the achievement of both the project management process and the project goals. when measuring an is project, practitioners should also consider the impacts of contextual factors throughout the project lifecycle.
system vendors, consultants' services, management support, communication, adaptation to change, time for learning and reflection, product flexibility, and project complexity are environmental influences. it is essential for practitioners to create an opportunity for organizational learning and improve future project success through a post-project evaluation. figure 1. the relationship between project success and organizational effectiveness (projects support business cases, which support business goals and objectives, which in turn drive organizational effectiveness). project success model the purpose of this study is to develop the measurement of is project success based on the findings of the literature review. therefore, the first step is to define project success. project success comprises project management success and the achievement of business goals. in previous studies, practitioners emphasized project management success but paid less attention to project outcomes, which leads to many unexplained project failures. for example, some it projects did not meet the business goals but conformed to the criteria of project management success; such a project might be considered successful from the perspective of the project management process although it failed to attain the project goals. the relationship between is projects and organizational effectiveness is described in figure 1. each is project makes at least one business case, and each business case contributes to at least one business objective. an is project is successful if the project outcomes reach the business goals, resulting in organizational effectiveness. the purpose of project performance criteria is to measure project progress throughout the lifecycle. without standards, a project manager could lose control over the project and most likely fail. as a result, the next step is to identify the measures of project success. the indicators of project management success have been widely studied and tested. project scope, time, cost, quality, and risk are at the top of the metrics list. the literature review shows researchers employ business continuity, achieving business objectives, delivery of benefits, and the perceived value of a project to measure project outcomes. it is noteworthy that contextual factors also impact project success; influences such as top management support (tms), user involvement, system vendors, the project manager's qualifications, communication, project complexity, and adaptation to change need to be measured as well. hence, the author proposes a measurement model as shown in figure 2. figure 2. model for measuring is project success. three constructs affect is project success in this model. the project management process is a tool to help a project manager attain success, where project performance criteria are identified to control quality and assess progress throughout the lifecycle. on the other hand, project outcomes entail project goals to ensure ultimate project success. the contextual factors may contribute to success directly or indirectly by influencing the project management process or the organizational environment, such as change management. therefore, a project manager has to examine all three constructs when assessing project success. to demonstrate the application of the model, the author conducted a case study on an ils merger project.
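to keep the three constructs and their indicators in view, here is a toy sketch of the model as a data structure. it is only an illustration: the python dataclass fields restate the indicators listed above, and the pass/fail logic is a simplification for reading purposes, not the author's instrument.

```python
from dataclasses import dataclass

@dataclass
class ProjectManagementProcess:      # performance criteria tracked across the lifecycle
    scope_met: bool
    on_time: bool
    on_budget: bool
    quality_acceptable: bool
    risk_controlled: bool
    communication_effective: bool

@dataclass
class ProjectOutcomes:               # ultimate project goals
    business_continuity: bool
    business_objectives_met: bool
    benefits_delivered: bool

@dataclass
class ContextualFactors:             # environmental influences, direct or indirect
    top_management_support: bool
    user_and_vendor_involvement: bool
    manager_qualified: bool
    adaptation_to_change: bool

def is_project_success(process: ProjectManagementProcess,
                       outcomes: ProjectOutcomes,
                       context: ContextualFactors) -> bool:
    # simplification of the model: success requires all three constructs to hold
    return (all(vars(process).values())
            and all(vars(outcomes).values())
            and all(vars(context).values()))
```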
case study: a post-project evaluation background in november 2013, the board of regents of the university system of georgia announced the consolidation of kennesaw state university (ksu) and southern polytechnic state university (spsu). the merger of the two state university libraries was one of the main tasks and involved merging two integrated library systems (ils). the project involved removing duplicate bibliographic and customer records between the two libraries and merging the relational databases that contain financial, bibliographic, transactional, vendor, and customer data. the ils provider, ex libris, and the two university libraries executed the merger with the support of galileo interconnected libraries (gil) it staff. the ils merger implementation team comprised two it experts from ex libris and fourteen ils users from the two libraries across five functional units: acquisition, cataloging, circulation and interlibrary loan, serials, and system administration. ksu/spsu and ex libris had a project manager on each side, and the author was the ksu/spsu project manager. the gil support team facilitated the implementation of the merger. the project goal was to operate the two libraries with a consolidated ils by july 2015 without interrupting services to students, staff, and faculty. the project was completed within eighty-one days, and the consolidated university libraries were operating uniformly by the target date. the team also won the 2015 georgia library association team award due to its success. methodology the methodologies adopted in previous research include interviews and surveys. both methods need to collect feedback from stakeholders during the post-project period, and it can be challenging to reach project stakeholders once the project is completed. however, many written communications, including project documentation, emails, and reports, are invaluable data for project managers to assess project success. researchers have utilized software to assist content analysis in qualitative studies. hoover and koerber used nvivo to analyze data such as text, interview transcripts, photographs, and audio and video recordings by coding and retrieving to understand sophisticated relations among those data.14 researchers think that computer-assisted qualitative data analysis (caqdas) has created new research practices and helped data analysis, research management, and theory development, such that caqdas has become a synonym for qualitative research.15 balan's team manually coded and categorized the dimensions identified in concept analysis, then employed concept mapping to present data relationships, an integration of qualitative and quantitative methods.16 the word tag cloud in nvivo is a technique to assess the relevance of the gathered data to the research topic, while the treemap is a tool to extract new themes, along with their relationships, from the study data.17 hutchison et al. believed that caqdas could facilitate grounded theory investigation. the group utilized the memo in nvivo to monitor emerging trends and justify the research purpose and theoretical sampling procedures. they also used the model-building tool to visualize their analytical observations.18 a study on content analysis of news articles indicated nvivo could assist qualitative research through data organization, idea management, querying data, and modeling.
the research group also raised a concern about analytical reliability because qualitative data analysis is a highly interpretive method. therefore, they suggested utilizing double coding and comparison of codes by different researchers to resolve this problem.19 paulus's team suggested researchers should write a description of the software to allow audiences unfamiliar with the tool not only to appreciate its role in the study, but also to understand precisely how the software enhances their analyses.20 in this case study, the author adopted nvivo 11 to conduct a content analysis to test the proposed model by measuring is project success, a qualitative method for practitioners to assess a project with textual data in the post-project period. data collection the data gathered in this study include the email communications between the project manager and stakeholders, the reports of the university consolidation operational work group (owg), and project committee reports. after reviewing all document data to ensure relevancy to the research topic, the author imported 878 emails, twenty-five owg reports, and sixty-three project committee reports into nvivo 11. content analysis process the software: nvivo 11. nvivo 11 is a software package that allows researchers to collaborate and conduct qualitative studies. researchers can import various types of raw data, including social media, into nvivo 11 to store, manage, and share the data throughout the research process. however, initially learning and mastering the software can pose a difficult hurdle for researchers performing a software-assisted qualitative study. data preparation and import. nvivo 11 can process documents (ms word, pdf, or rtf), surveys, audio, video, and images. researchers may import outlook emails saved as .msg files into nvivo 11 directly. it is also noted that emails imported into nvivo become pdfs and that any supported attachments are imported as well. in this study, the owg and committee reports in either ms word or pdf were imported to nvivo directly. to ensure the email content was relevant to the project, the author opened nvivo 11 and outlook 2010 simultaneously, and then dragged each email into the sources list view of nvivo 11 (see figure 3) after reviewing it. figure 3. sources list view in nvivo 11. coding. coding is a way of categorizing all references to a specific topic, theme, person, or other entity. the process of coding can help researchers to identify patterns and theories in research data.21 in this study, the author adopted coding using queries to answer the following research questions: • what is is project success? • what are the factors that affect is project success? • how do these factors influence is project success? below are the steps of coding source data: • run the query of word frequency in all data sets using the criteria of the one hundred most frequent words with a minimal five-character length, including exact matches, stemmed words, and synonyms. • review the word list, remove irrelevant words from the list, and re-run the query until the words are accurate and relevant to the research topic. • create the parent nodes (e.g., contextual factors, project management process, project outcomes) and child nodes (e.g.,
top management support, manager's qualifications, project involvement) based on the proposed model, and then save the results of word frequency in the respective nodes (see the coding in figure 4). • run the query of word frequency with the same criteria in the context nodes (within each parent node). • review the results of word frequency and save each new word as a new node. • review all node references and sources, merge relevant nodes, and remove irrelevant ones as needed. figure 4. coding using queries. findings and discussion an overview of content analysis previous studies have shown that visualization tools such as models, charts, and treemaps provided by nvivo can be helpful to present the findings of qualitative studies.22 therefore, the author used the model tool to gain a better understanding and overview of key themes in the ils merger project. since the number of emails is much larger than the number of reports, the author decided to display the themes of emails and reports separately. figures 5 and 6 are the word treemaps for the emails and reports, respectively. figure 5. email tree map. figure 6. report tree map. the treemap is the visualization of the results of the word frequency queries. in figure 5, the concepts of patron, barcode, missing, fines, charge, circulation, and policy are library user transactional data, while order, vendor, complete, wilson, lines, taylor, and holding show procurement information. the themes of production, mapping, duplicate, matching, location, cataloging, and process stand for library resource data. hence, the acquisition, bibliographic, patron, and transactional data are the primary content migrated to the new ils. the names mentioned, such as russell, adriana meryll, trotter, and david, reveal the involvement of system and service providers and top management. figure 6 displays more details on library resource data such as serials, codes, bibliographic, ebook, format, journal, and print. the user transactional data also appear. the subjects of production, implement, identify, training, mapping, match, finish, matrix, plan, procedure, campus, and urgent indicate the project management process. the term “accepted,” in contrast, shows one of the project outcomes. the treemaps shown above demonstrate that project management process, the involvement of users and system providers, top management, and project outcomes are the representatives of project success, which implies that project success means succeeding in the project management process, achieving the project outcomes, and engaging top management, system users, and providers. how do these factors come together to impact project success? the next step is to examine the relationships among these variables and their interactions. relationships among constructs to analyze the concepts of contextual factors, project management process, and project outcomes further, the author utilized the model tool to create project maps. project maps are graphic representations of the data in a project, which help illustrate the relationships among constructs and answer the research questions of this study. the author further inspected each construct node by creating project maps. figure 7. project management process map. figure 7 shows the relationships among the variables that affect the project management process.
the child nodes of communication, project cost, quality, risk, time, and scope are the influencers of the project management process. their respective child nodes, such as barcodes, missing, and deadline, are the results of coding source data and support how the concepts of communication, cost, quality, risk, scope, and time correspondingly affect the project management process. figure 8. contextual factor map. contextual factors have not been thoroughly discussed in previous project management practices. figure 8 illustrates the results of coding source data within this construct. the engagement of users and vendors, and their feedback, signifies the variable of project involvement. the node of top management also confirms its parent node of top management support. furthermore, jin, as the project manager, is associated with the node of project manager's qualifications. she could affect project success either directly or indirectly through contextual factors. figure 9. project outcomes map. figure 9 represents the themes: downtime, service satisfaction, and acceptance are the child nodes of business continuity, delivery of benefits, and project deliverables, respectively. the pdf reference source supports the subjects of “satisfaction of service” and “conditional acceptance” as the child nodes of “delivery of benefits” and “project deliverables,” respectively. thus, business continuity, delivery of benefits, and project deliverables are the core factors to be considered when assessing project outcomes. figures 7, 8, and 9 have demonstrated that the project would not be successful if the project management process was not executed appropriately, contextual factors were not fully met, or preferred project outcomes were not delivered. in other words, if any one of the three project variables above is not executed or delivered appropriately, the project could fail. the role of the project manager figures 7, 8, and 9 have signified that the three constructs can affect project success, but they do not tell how project management process, project outcomes, and contextual factors play together in this role. consequently, the author hoped to identify the connections between project items and to see if there are gaps or isolated items unexplained by the proposed model. to create such a project map in nvivo 11, the author chose emails as project items and added the issues associated with the project manager, jin, to the map. figure 10. manager's project map. this case study tests the proposed model in a post-project assessment, and the manager's project map in figure 10 illustrates this purpose well. the project manager, jin, led the project to success by influencing the project management process, project outcomes, and contextual factors. the project success in this case includes the contribution to the consolidation of two state universities and the maximization of library resources for the organization. the outcomes of the merger project are to deliver a consolidated ils and to provide library services for the new university continuously. figure 10 clearly indicates jin managed business continuity and project deliverables through downtime and load acceptance.
among contextual factors, the project manager executed project involvement through engaging system users and vendors and gathering user feedback. she also involved top management, david, in the project directly. senior management empowered jin to make decisions on the project. as a manager, her qualifications enabled her to cope with the complexity of the project. the project documentation has verified the manager's ability to govern the project. for instance, figure 11 is the project framework that the manager created according to the pmbok (project management body of knowledge). hence, a qualified project manager can directly make impacts on project success through contextual factors. figure 11. ksu library ils merger project management framework. meanwhile, the nodes of barcode, mappings, missing, patrons, and vendors confirm the manager's role in project quality control. the coding of deadline, cost-consolidation, communication, and risk control indicates the manager put her effort into project time, cost, and communication management and risk mitigation, correspondingly. figure 10 reveals the project manager is the core of the project team and makes significant impacts on project success by influencing the project management process, contextual factors, and project outcomes. a project manager must fully understand project outputs, have the ability to execute project plans in the business environment, and communicate with different stakeholders at the corresponding levels through various channels, since communication becomes challenging when a project involves more people from different sections of the business. people decode messages differently. multiple communication chains can help stakeholders gain consistent and accurate information directly. for example, this project manager utilized formal reports, group discussions, training, and weekly coordination meetings to share information and seek feedback. the functional groups are the governance structure of the project. in the phase of test and production loads, the leaders of functional groups communicated problems to the project manager more frequently to ensure the manager could resolve issues in collaboration with related stakeholders (e.g., ex libris) in a timely way. in the meantime, the project manager regularly communicated expectations to the responsible it staff to prevent additional waiting time for feeding the merged ils with patron data, by verifying the feeder during the test load, which helped meet the deadlines of the campus it projects. the manager mitigated risk by implementing the project plan thoughtfully throughout the project. it was the project manager who connected the three variables (project management process, project outcomes, and contextual factors) with project success. conclusions libraries have used the ils to manage resources and services for decades. with the exponential growth of digital information, is innovation continues to be one of the most effective drivers of library transformation. therefore, it is crucial for libraries to effectively manage is/it projects to achieve organizational goals. this study develops a model of is project success. the model employs three constructs, namely project management process, project outcomes, and contextual factors, to measure is project success.
project management success cannot bring is project success unless the project results achieve business goals and lead to the improvement of organizational effectiveness. the project manager makes important impacts on project success by delivering project outcomes through implementing the project management process and making use of contextual factors throughout the project. the research methodology, software-assisted qualitative content analysis, can be an approach for library practitioners to develop or test a theoretical model. a post-project evaluation can create an excellent opportunity for organizational learning and help managers to manage talent better and improve the chances of project success in the future. future research libraries have moved into a new era that is full of new and disruptive technologies, which affect library services, operations, and decisions on a daily basis. is projects will continue bringing innovations to library services and programs. a theoretical framework could provide librarians with a methodology to manage is projects successfully. notably, the u.s. senate has unanimously approved the program management improvement and accountability act (pmiaa) to enhance project and program management practices to maximize efficiency in the federal government.23 project management has become a must-have skill for today's library leaders. there are many opportunities for managers to test the is project success model through their practices. future studies may combine quantitative and qualitative methods to assess and enhance the model further. each institution has different goals and contextual indicators that the author has not mentioned in this study. these factors might shift from minor to major or vice versa due to different organizational cultures. practitioners can also use nvivo to collaborate on double coding to increase analytical reliability. a software-assisted qualitative content analysis will help library leaders to understand project management better and experiment with solutions in a complex information world. acknowledgements this work would not have been possible without the support of the ksu library system administration and the team efforts from the ksu voyager consolidation committee, gil support, and the ex libris team. i am grateful to all of those with whom i have had the privilege to work during this project. references 1 john nicholas and gezinus hidding, “management principles associated with it project success,” international journal of management and information systems 14, no. 5 (nov. 2, 2010): 147-56, https://doi.org/10.19030/ijmis.v14i5.22. 2 alan r. peslak, “information technology project management and project success,” international journal of information technology project management 3, no. 3 (july 2012): 31-44, https://doi.org/10.4018/jitpm.2012070103. 3 udechukwu ojiako, eric johansen, and david greenwood, “a qualitative re-construction of project measurement criteria,” industrial management & data systems 108, no. 3 (mar. 2008): 405-17, https://doi.org/10.1108/02635570810858796. 4 claudia-georgeta cảrstea, “it project management—cost, time and quality,” economy transdisciplinarity cognition 17, no. 1 (mar. 2014): 28-34, http://www.ugb.ro/etc/etc2014no1/07_carstea_c..pdf. 5 wen-hsien tsai et al., “an empirical investigation of the impacts of internal/external facilitators on the project success of erp: a structural equation model,” decision support systems 50, no. 2 (jan.
2011): 480-90, https://doi.org/10.1016/j.dss.2010.11.005. 6 raymond young and ernest jordan, “top management support: mantra or necessity?” international journal of project management 26, no. 7 (oct. 2008): 713-25, https://doi.org/10.1016/j.ijproman.2008.06.001. 7 z. irani, “investment evaluation within project management: an information systems perspective,” the journal of the operational research society 61, no. 6 (june 2010): 917-28, https://doi.org/10.1057/jors.2010.10. 8 peslak, “information technology project management and project success,” 31-44. 9 tadeau oliveira de lacerda, leonardo ensslin, and sandra rolim ensslin, “a performance measurement view of it project management,” international journal of productivity and performance management 60, no. 2 (2011): 132-51, https://doi.org/10.1108/17410401111101476. 10 nicholas and hidding, “management principles,” 153. 11 graeme thomas and walter fernández, “success in it projects: a matter of definition?” international journal of project management 26, no. 7 (oct. 2008): 733-42, https://doi.org/10.1016/j.ijproman.2008.06.003. 12 elmar kutsch, “the measurement of performance in it projects,” international journal of electronic business 5, no. 4 (2007): 415, https://doi.org/10.1504/ijeb.2007.014786. 13 joana g. geraldi, elmar kutsch, and neil turner, “towards a conceptualisation of quality in information technology projects,” international journal of project management 29, no. 5 (july 2011): 557-67, https://doi.org/10.1016/j.ijproman.2010.06.004. 14 ryan s. hoover and amy l. koerber, “using nvivo to answer the challenges of qualitative research in professional communication: benefits and best practices tutorial,” ieee transactions on professional communication 54, no. 1 (mar. 2011): 68-82, https://doi.org/10.1109/tpc.2009.2036896. 15 erika goble et al., “habits of mind and the split-mind effect: when computer-assisted qualitative data analysis software is used in phenomenological research,” forum: qualitative social research 13, no. 2 (may 2012): 1-22, https://doi.org/10.17169/fqs-13.2.1709. 16 peter balan et al., “concept mapping as a methodical and transparent data analysis process,” in handbook of qualitative organizational research (london: routledge, 2015): 318-30, https://doi.org/10.4324/9781315849072. 17 syed zubair haider and muhammad dilshad, “higher education and global development: a cross cultural qualitative study in pakistan,” higher education for the future 2, no. 2 (july 2015): 175-93, https://doi.org/10.1177/2347631114558185. 18 andrew john hutchison, lynne halley johnston, and jeff david breckon, “using qsr-nvivo to facilitate the development of a grounded theory project: an account of a worked example,” international journal of social research methodology 13, no. 4 (oct. 2010): 283-302, https://doi.org/10.1080/13645570902996301. 19 florian kaefer, juliet roper, and paresha sinha, “a software-assisted qualitative content analysis of news articles: example and reflections,” forum: qualitative social research 16, no. 2 (may 2015): 1-20, https://doi.org/10.17169/fqs-16.2.2123. 20 trena paulus et al., “the discourse of qdas: reporting practices of atlas.ti and nvivo users with implications for best practices,” international journal of social research methodology 20, no. 1 (jan. 2017): 35-47, https://doi.org/10.1080/13645579.2015.1102454. 21 “about coding,” nvivo help (melbourne, australia: qsr international, 2018), accessed apr.
3, 2018, http://helpnv11.qsrinternational.com/desktop/concepts/about_coding.htm?rhsearch=coding&rhsyns=. 22 paulus et al., “discourse of qdas,” 41. 23 “u.s. senate unanimously approves the program management improvement and accountability act,” business wire (dec. 2016), accessed nov. 10, 2017, http://www.businesswire.com/news/home/20161201006499/en/u.s.-senate-unanimouslyapproves-program-management-improvement. articles integrated technologies of blockchain and biometrics based on wireless sensor network for library management meng-hsuan fu information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.11883 meng-hsuan fu (msfu@mail.shu.edu.tw) is assistant professor, shih hsin university (taiwan). © 2020. abstract the internet of things (iot) is built on a strong internet infrastructure and many wireless sensor devices. presently, radio frequency identification embedded (rfid-embedded) smart cards are ubiquitous, used for many things including student id cards, transportation cards, bank cards, prepaid cards, and citizenship cards. one example of places that require smart cards is libraries. each library, such as a university library, city library, local library, or community library, has its own card, and the user must bring the appropriate card to enter a library and borrow material. however, it is inconvenient to bring various cards to access different libraries. wireless infrastructure has been well developed, and iot devices are connected through this infrastructure. moreover, the development of biometric identification technologies has continued to advance. blockchain methodologies have been successfully adopted in various fields. this paper proposes the blockmetrics library based on integrated technologies using blockchain and finger-vein biometrics, which are adopted into a library collection management and access control system. the library collection is managed by image recognition, rfid, and wireless sensor technologies. in addition, a biometric system is connected to a library collection control system, enabling the borrowing procedure to consist of only two steps. first, the user uses a biometric recognition device for authentication and then performs a collection scan with the rfid devices. all the records are recorded in a personal borrowing blockchain, which is a peer-to-peer transfer system and permanent data storage. in addition, the user can check the status of his or her borrowed items across various libraries in the personal borrowing blockchain. the blockmetrics library is based on an integration of technologies that include blockchain, biometrics, and wireless sensor technologies to improve the smart library. introduction the internet of things (iot) connects individual objects together through their unique address or tag, based on sensor devices and a wireless network infrastructure. presently, “smart living” (a term that includes concepts such as the smart home, smart city, smart university, smart government, and smart transportation) is based on the iot, which plays a key role in achieving a convenient and secure living environment.
gartner, a data analytics company that presents the top ten strategic technology trends for the coming year at the end of each year, listed blockchain as one of the top ten in 2017, 2018, 2019, and 2020.1 the fact that blockchain has been named a top strategic technology trend for four consecutive years reflects sustained interest among technology experts and developers. in a blockchain, a block is the basic storage unit where data is saved and protected with cryptography and complex algorithms. peer-to-peer transfer is adopted when data or information is exchanged without the need for a third party. in other words, data is transferred directly from node to node or user to user thanks to the decentralized nature of the blockchain. in addition, a blockchain is authorized and maintained by all nodes in the same blockchain network. each node has an equal right (also known as equal weight) to access the blockchain and authorize new transactions. thus, all transactions are published and broadcast to all nodes, and content cannot be altered by a single node or a minority of nodes. additionally, transaction content is secured by cryptography and complex secure algorithms. therefore, transactions occur and are preserved within a fully secure and private network. in practice, blockchain has been applied in various fields including finance, medicine, academia, and logistics. blockchain has also been adopted for personal transaction records because of its privacy and security properties and because it offers immutable and permanent data storage. in this research, blockchain technologies are adopted to store the records of collections borrowed from various libraries in a personal borrowing blockchain.

table 1. definitions of key terms of blockchain

blockchain: a blockchain comprises many blocks. it is characterized by security, decentralization, immutability, distributed ledgers, a transparent log, and irreversible data storage.
block: a block is the basic unit in a blockchain. each block consists of a block header with a nonce, the previous block hash, a timestamp, and the merkle root, plus the transactions in the block body.
nonce: the counter of the algorithm; the hash value changes whenever the nonce is modified.
merkle root: a secure hash algorithm (sha) is used in the merkle root to transform data into a meaningless hash value.
transaction: each transaction is composed of an address, hash, index, and timestamp. all transactions are stored in blocks permanently.
hash: a secure hash algorithm (sha) transforms input data into meaningless output data, called a hash, consisting of letters and digits, in order to protect data content during transmission.
biometrics: the use of human physical characteristics, including finger veins, the iris, voice, and facial features, for recognition.
sensor network: a sensor is a small, portable node with a data-recording function and a power source. a sensor network is composed of many sensors linked by a communication infrastructure.
iot: the internet of things (iot) is a system that connects sensors and devices together over the internet. many iot applications have been adopted, such as the smart home, health care, and smart transportation.
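to make the terms in table 1 concrete, the following minimal python sketch shows how transactions can be hashed with sha-256, combined into a merkle root, and linked to the previous block through its hash. it is an illustrative toy only, not the blockmetrics implementation, and all names and values in it are invented for the example.

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    # a secure hash algorithm (sha-256) maps arbitrary input to a fixed-length digest
    return hashlib.sha256(data).hexdigest()

def merkle_root(tx_hashes) -> str:
    # pair transaction hashes and re-hash until a single root value remains
    if not tx_hashes:
        return sha256(b"")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256((a + b).encode()) for a, b in zip(level[0::2], level[1::2])]
    return level[0]

class Block:
    """toy block: transactions, a merkle root, and a link to the previous block."""
    def __init__(self, transactions, previous_hash):
        self.timestamp = int(time.time())
        self.transactions = transactions
        self.previous_hash = previous_hash  # chains this block to its predecessor
        self.merkle = merkle_root(
            [sha256(json.dumps(tx, sort_keys=True).encode()) for tx in transactions])
        self.nonce = 0

    def hash(self) -> str:
        header = f"{self.previous_hash}{self.merkle}{self.timestamp}{self.nonce}"
        return sha256(header.encode())

# example: a first block holding one hypothetical borrowing transaction
genesis = Block([{"user": "alice", "item": "isbn-0001", "action": "borrow"}], "0" * 64)
print(genesis.hash())
```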
although the iot and wireless networks are well developed, people still carry many different rfid-embedded cards, such as public transportation cards, credit cards, student cards, medical cards, identification cards, membership cards, or library cards. an rfid-embedded card is issued for each place or purpose, requiring the user to bring the appropriate cards to access the corresponding functions. in this study, the library is the setting to which blockchain technologies are applied, because currently each library has its own library card for entering the library and borrowing material. this implies that users may have to carry several library cards to visit a university library, a community library, and a district library on the same day. here, biometrics can be adopted to solve the problem of having to carry many access-control cards and manage various borrowing policies. in this study, the blockmetrics library is designed based on the technologies of blockchain and biometrics within the environment of a wireless sensor network with iot devices. borrowing records are transferred and stored through blockchain technologies, automatic library access control is managed by biometric identification, and the borrowing and returning of library materials are handled under a wireless sensor network with iot devices to create a convenient, efficient, and secure library environment. the key terms of blockchain and the related terms for biometrics, sensor networks, and the iot applied in this research are defined in table 1. related works blockchain technology nakamoto presented bitcoin as a peer-to-peer electronic cash system that uses blockchain technologies, which include smart contracts, cryptography, decentralization, and consensus through proof of work. because this electronic system is based on cryptography, a trusted third party is not required in the payment mechanism. additionally, peer-to-peer technology and a timestamp server are adopted, and each block is given a hash in serial order. this procedure solves the problem of double-spending during payment.2 in addition, proof of work is used in a decentralized system for authentication by most nodes in the blockchain network. each node has equal rights to compete to receive a block, and each node can vote to authenticate a new block.3 košt'ál et al. define proof of work (pow) as an asymmetric method with complex calculations whose difficulty is adjusted according to the problem-solving duration.4 however, pow has drawbacks, such as high power consumption and the fact that some users can control the blockchain if their share of the nodes in the same blockchain network reaches 51 percent.5 despite the possible presence of malicious parties, the information in a blockchain is difficult to modify because of the distributed ledger methodology, in which each node has the same copy of the ledger, making it difficult for a single node or a minority of nodes to change or destroy the stored data.6 a block is the basic unit in a blockchain; in other words, a blockchain is composed of connected blocks. one of the blockchain technologies is the distributed ledger, in which a ledger can be distributed to every node all over the world.7 each block is composed of a block header and a block body. the block header is a combination of the version, previous block hash, merkle root, timestamp, difficulty target, and nonce.
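the header fields just listed, together with the proof-of-work search described above, can be sketched as follows. the field widths follow the common bitcoin layout summarized in table 2, and the difficulty check is simplified to a leading-zero prefix, so this is a toy illustration under those assumptions rather than the article's system or a real mining implementation.

```python
import hashlib
import struct
import time

def pow_search(version, prev_hash, merkle_root, bits, difficulty_prefix="0000"):
    """brute-force the nonce until the double-sha-256 header hash meets a toy target."""
    timestamp = int(time.time())
    nonce = 0
    while True:
        # 4 + 32 + 32 + 4 + 4 + 4 bytes = the 80-byte header layout shown in table 2
        header = struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                             timestamp, bits, nonce)
        digest = hashlib.sha256(hashlib.sha256(header).digest()).hexdigest()
        if digest.startswith(difficulty_prefix):  # stand-in for the real difficulty check
            return nonce, digest
        nonce += 1

# example: "mine" a header whose previous hash and merkle root are all zero bytes
nonce, digest = pow_search(version=1, prev_hash=bytes(32),
                           merkle_root=bytes(32), bits=0x1d00ffff)
print(nonce, digest)
```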
the block size field occupies 4 bytes, the block header is 80 bytes in total, and the transaction counter takes between 1 and 9 bytes (see table 2).

table 2. block elements

block size: 4 bytes (size field)
block header (80 bytes): 4 bytes version; 32 bytes previous block hash; 32 bytes merkle root; 4 bytes timestamp; 4 bytes difficulty target; 4 bytes nonce
block body: 1-9 bytes transaction count, followed by the transactions

blockchain technologies have been applied to various fields including finance, art, hygiene, healthcare, and academic certificates. for example, in healthcare, user medical records are stored in a blockchain so users can check their health conditions and share them with their family members in advance. in business, blockchain is adopted in supply chain management for monitoring activities during goods production. in the academic field, certificates are permanently saved in a blockchain where users can retrieve them from their mobile devices and show them upon request.8 because blockchain is protected by cryptography and offers privacy, reliability, and decentralization, an increasing number of applications are beginning to adopt it. as an application for a library system, a blockchain, in combination with biometrics within a wireless infrastructure, can be adopted for personal borrowing records. library borrowing management each library has its own regulations. for example, the national central library (ncl) in taiwan has created reader service directions in which the general principles and rules for library card application, reader access to library materials, reference services, requests for library materials, violations, and supplementary provisions are clearly stated. according to the ncl reader service directions, citizens are required to present their national id card and foreigners are asked to present their passport to apply for a library card. users are allowed to access the library when they have a valid library card. those who have library cards but have forgotten to bring them can apply for a temporary library card to enter the library, but this is limited to three times.9 this rule is specific to the ncl; other libraries in the same country have their own regulations. another example is the taipei public library. citizens can apply for a library card using their citizen's id card, passport, educational certificate, or residence permit. a taipei citizen can apply for a family library card using their family certificate. users can borrow material and return it to any of the libraries in taipei city. however, these policies apply only to users who hold library cards issued by libraries in taipei city.10 as for university libraries, each library also has its own regulations. for instance, shih hsin university (shu) issues its own library card for access to its library. alumni are requested to present their id cards and a photo to apply for a library card in person. the number of items and their loan periods are clearly stated in the rules set by the shu library.11 again, the regulations are individually set by each university library. biometrics rfid-embedded smart cards such as student cards, transportation cards, and bank cards are widely used; however, they can be stolen, lost, or forgotten at home.
biometrics is becoming more widely used in access-control systems for homes, offices, buildings, government facilities, and libraries. for these systems, the fingerprint is one of the most commonly used biometrics. users place their finger on a read device, usually a touch panel. this method ensures a unique identity, is easy to use and widely accepted, boasts a high scan speed, and is difficult to falsify. however, its effectiveness is influenced by the age of the user and the presence of moisture, wounds, dust, or particles on the finger, in addition to the concern for hygiene because of the use of touch devices. face recognition has been used in various applications such as unlocking smart devices, performing security checks and community surveillance, and maintaining home security. this method of biometric identification is convenient, widely accepted, difficult to falsify, and can be applied without the awareness of the person. however, limitations to face recognition include errors that can occur due to lighting, facial expression, and cosmetics. also, privacy is an issue in face recognition because it may take place at a distance without the user’s consent. another form of biometric identification uses the iris as an inner biometric indicator because th e iris is unique for each person. nevertheless, this method is also prone to errors that can be caused by bad lighting and the possible presence of diseases such as diabetes or glaucoma. devices used for iris recognition are expensive, and thus rarely adopted in biometrics.12 speech recognition is used for equipment control such as a smart device switch, however, it can be affected by noise, physical conditions of the user, or weather. vein recognition using finger or palm veins is becoming more prevalent as a form of biometric identification for banks or access control but can be limited by the possible presence of bruises or a lower temperature. however, vein recognition ensures a unique identity, is easy to use, convenient, accurate, and widely accepted, thus, many businesses are adopting vein recognition for various usages. to summarize, biometric identification is convenient, reduces the error rate in recognition, and is difficult to falsify. therefore, biometric identification is suitable for access control. blockmetrics library the blockmetrics library is based on the integration of blockchain and biometric technologies in a wireless sensor network with iot devices. figure 1 shows the blockmetrics library architecture with its bottom-up structure consisting of five layers: hardware, software, internet, transfer and security, and application. all components are sequentially described in detail from the bottom to the top layers. in the hardware layer, sensor nodes are physically located on library collection shelves, entrance gates, and relevant equipment to be further connected with the upper layers. rfid tags are attached to each item in the library, including books, audio resources, and videos. tag information is read and transferred by rfid readers. the biometric devices used in this study include fingerprint readers, palm and finger-vein identifiers, and face or iris recognition devices for biometric authentication when users enter libraries or borrow collections. all images including action images, collection images, and surveillance images are recorded with cameras. the ground surveillance, library collection recognition, image processing, and user identification are manipulated by graphics processing units. 
touch panels are used for typing or searching for information and there is a particular process for user registration. for general input and output of information, i/o devices include speakers, microphones, keyboards, and monitors. the entrance gate is connected to biometric devices and recognition systems for automatic access control. microprocessors and servers, which make up the core of the hardware, handle all the functions that run in the operating system. data and programs are run and securely saved on a large information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 6 memory drive. data transmission occurs through the wireless data collector, and data collection and transfer in the library are based on a wireless environment. in the blockmetrics library, a library collection database is used to store and maintain all the library material information in a local library for backup usage, and a blockchain is used to record personal borrowing and returning history. figure 1. blockmetrics library architecture in the software layer, an open-source word processor (such as writer, impress, or calc provided by libreoffice) is used to record library collections and handle library affairs. biometric recognition identifies a user’s biological features collected from biometric devices such as a fingervein recognition device, which is adopted in this research. all images and videos include ground information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 7 and entrance surveillance and library material borrowing and returning are recorded by cameras. all images and videos are operated by and processed with video-processing software. the data of the images, videos, personal information, and library collections is managed and saved through an image and file management system as well as a database management system. the software programs associated with creating, modifying, and maintaining library processes are written with open-source programming codes, in particular, python and r, which were scored as two of the top ten programming languages by ieee spectrum in 2019.13 there are various functions saved as packages that are free to download and can be modified and reproduced into a customized program for specific purposes. the hardware houses the cpu, which runs the general library operations, and software programs are maintained by the operating system and management information system. library stock management, collection search, personal registration, and other library-related functions can be designed and developed through app inventor. each borrow and return record is connected with the personal identification recognized from the finger-vein reader and recognition system and is saved as a transaction. the transactions that take place in a specific period are saved in a block that can be connected to another block to form a personal borrowing blockchain. the internet technology layer is built upon the hardware and software structure. the main purpose of internet technologies is to connect the equipment and devices with the internet, in which the internet plays the role of an intermediary for all devices communicating and cooperating together in the library. in the internet technology layer, bluetooth connects devices such as earphones, speakers, and audio guidance within short distances. 
files are exchanged between smart devices, including smartphones, tablets, or ipads, through near-field communication, which is a contact-less sensor device. rfid is adopted for collection borrowing and returning services in the library. because fiber-optic cables have been ubiquitously planted within infrastructure with the development of smart cities, most libraries have also been built with them. users or vehicles are more easily and more accurately located by the global positioning system (gps), which also assists with image recognition when a material is taken from the shelves. sensors transfer the sensing data to the relative devices for recording or processing under the infrastructure of the wireless sensor network. the library is currently built with a wi-fi environment, but li-fi is one of the future trends that involve creating a wireless environment just with light. mobile devices operate under wireless communications and most countries provide 4g and 4g+ with some supporting even 5g. the internet technology layer is the tools provider for intercommunication among devices. data security and transmission reliability are extremely important issues when various equipment is linked together and connected to the internet. the user interface is the bridge between the user and the devices. in other words, the user gives commands to the devices or software through a user interface or app. biometric recognition devices, rfid readers, and entrance control equipment are connected via the internet in this study. the devices send the information to the corresponding devices for specific purposes in a specified order. collected data such as private user identification are secured by cryptography utilized in blockchain technology. the finger-vein identification used as personal identification is combined with the borrow and return records, stored as transactions, and secured under a secure hash algorithm before being saved into a blockchain. all data and personal identification are transferred under the corresponding secure methodologies. information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 8 in the blockmetrics library, self-check-in and self-checkout rely mostly on rfid technology and finger-vein biometrics. the borrow and return records are stored in a personal borrowing blockchain, where records are saved in the blockchains of user and library, biometrics system, and library servers. entrance control is automatically managed by finger-vein recognition. library stock management is particularly based on image assistance and rfid technology. new user registration is performed only through a few identification questions and finger-vein characteristics extraction. the blockmetrics library is without a circulation desk environment and has an automated borrow and return mechanism through a single sign at the entrance and exit. the five layers in the blockmetrics library architecture communicate with each other such that operations are inseparably related. the blockmetrics library scenario is described in the next section. scenario in this section, the scenarios for registration, entry, and material borrowing and returning in the blockmetrics library will be described in detail. in figures 2 and 3, the user side indicates the actual user actions and is represented as solid lines and the background shows mostly background operations and is indicated as dotted lines. 
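before walking through the figures, a minimal sketch of how registration and later verification against stored finger-vein templates might be modeled is shown below. feature extraction is treated as a black box that yields a numeric vector; the in-memory store, function names, and threshold are assumptions made for illustration and are not part of the blockmetrics design.

```python
import math

# hypothetical in-memory template store; a real deployment would use the
# biometrics database described in the architecture
templates = {}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def register(user_id, finger_vein_features):
    # one-time registration: keep the extracted feature vector for this user
    templates[user_id] = finger_vein_features

def verify(finger_vein_features, threshold=0.95):
    # entrance control: return the best-matching user id, or None to keep the gate closed
    best_id, best_score = None, 0.0
    for user_id, stored in templates.items():
        score = cosine_similarity(finger_vein_features, stored)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None

# example: register a user and verify a slightly noisy probe vector
register("alice", [0.12, 0.80, 0.33, 0.56])
print(verify([0.13, 0.79, 0.35, 0.55]))  # expected: "alice"
```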
in figure 2, when a new user comes into the blockmetrics library, the registration procedure starts with a biometric pattern extraction and recognition of the user. finger-vein authentication is selected as the personal biometrics for entrance and material borrowing. on the user side, registration is completed with only two steps. the first is finger-vein extraction and the second is to simply provide personal information. the biometric recognition data is processed and stored in the appropriate database, which is linked to the personal identification management system. personal information is secured through the cryptography used in blockchain technology, thus, all information is securely stored. the registration procedure is performed only once at the first entry, and afterward all registered users can enter the blockmetrics library using finger-vein authentication. biometric recognition proceeds with a biometrics database that verifies user identity followed by verification results sent to the entrance control management for entrance guarding. the entrance is automatically controlled because the results from the biometric recognition step are sent as the rules for entrance control. users will be permitted to enter the library when they pass the biometrics recognition step. users do not have to bring their library cards to enter or borrow material, increasing convenience and decreasing identity infringement when library cards get lost. figure 3 shows the scenario of library material borrowing and returning. on the user side, library material borrowing consists of four simple steps performed by the user: 1) retrieving items, 2) authenticating with the finger-vein recognition device, 3) placing items on the rfid reader, and 4) exiting the library. when the user removes a book from the shelf, an infrared detector is triggered, recognizing that a book was removed from the shelf. then, image recognition identifies the specific book and the book’s status is marked as charged to the user in the stock database. if the user wants to leave the library without borrowing anything, the user just scans their finger with the finger-vein device to open the entrance gate. if the user wants to borrow library materials such as books or videos, the borrowing procedure is quickly completed after finger-vein scanning and placing all material including books and videos under the rfid read area. in the background, the user’s recognition results from the finger-vein scan are saved in the biometrics database, which is connected to the blockchain. when the library materials are placed together in the rfid read area, all the tags are read at once while the materials’ statuses in the information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 9 database are updated. user information and material borrowing information are linked and saved as transactions that are stored in the personal borrowing blockchain. figure 2. blockmetrics library scenario—registration and entry information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 10 figure 3. blockmetrics library scenario – borrowing and returning to return library material, the user only needs to put the library materials in the specific area with the rfid reader and the return procedure is completed. the rfid tags of returned materials are read and recorded and their status in the stock database is updated. 
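as a rough illustration of how a borrow event could become a transaction in the personal borrowing blockchain, the sketch below links the verified user, the rfid tags read in one pass, and the hash of the previous record. the field names and the in-memory chain are assumptions for the example; the article does not specify a transaction format.

```python
import hashlib
import json
import time

def record_borrow(user_id, rfid_tags, library, chain, stock):
    """append one borrow event to a toy personal borrowing blockchain.

    user_id comes from finger-vein verification and rfid_tags from a single
    bulk read at the rfid area; the field names here are illustrative only.
    """
    for tag in rfid_tags:
        stock[tag] = f"charged:{user_id}"  # update the local stock database
    transaction = {
        "index": len(chain),
        "timestamp": int(time.time()),
        "library": library,
        "user": user_id,
        "items": rfid_tags,
        "action": "borrow",
        "previous_hash": chain[-1]["hash"] if chain else "0" * 64,
    }
    payload = json.dumps(transaction, sort_keys=True).encode()
    transaction["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(transaction)
    return transaction

# example: one user borrows two items from a hypothetical branch library
chain, stock = [], {"tag-001": "on shelf", "tag-002": "on shelf"}
record_borrow("alice", ["tag-001", "tag-002"], "main library", chain, stock)
print(chain[-1]["hash"], stock["tag-001"])
```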
personal borrow and return records are saved as transactions and stored in the personal borrowing blockchain as well. limitations some biometric technologies, such as facial recognition or fingerprint recognition systems, have already been adopted by libraries. these tools have increased the efficiency of access and borrowing procedures. however, all the records include some personal information (e.g., fingerprints, historical borrowing records, logs of library access) that is still stored in the individual library's database. the blockchain model may not be suitable for every current library system because the database design of each library is unknown. at present, library classification systems are set by each library individually. therefore, integrating library information among national or international libraries will be a huge task. thus, how to establish general regulations for all libraries to develop and manage library information will need additional research. after that, the information management system should be designed and built by collecting diverse comments from all library managers. the work should be completed by interdisciplinary experts in library management, information engineering, biometrics system design, and data management. the costs may include management committees, collection coding design, system development, hardware layout, and related training plans. also, there may be unpredictable privacy issues that may not become known until the system is in practical operation. lastly, some users will need an adaptation period when new technologies are implemented; its duration can depend on how smooth the interface design is, whether the system is easy and clear to use, and what benefits the technologies bring to users' lives. in summary, the limitations are: 1) integrating library information such as stock data and serial numbers, 2) establishing general regulations, 3) creating a consistent library management system, 4) the cost of the system, 5) the potential for privacy breaches, and 6) library patron resistance or reluctance to use the technology. conclusion in this research, the blockmetrics library is designed under a wireless sensor network infrastructure combined with blockchain, biometric, iot, and rfid technologies. the library access control system is based on finger-vein biometric recognition, in which users register their finger-vein information through biometric devices and input personal information via various i/o devices. thus, automatic and secure library access control is achieved through biometric recognition. additionally, image recognition, gps, and rfid are adopted in library collection management, providing a simplified way to borrow and return library material. blockchain technologies are utilized to record the personal borrowing history of collections from various libraries in a personal borrowing blockchain, where records are permanently stored. users can clearly see their borrowing status through their own blockchain and manage their borrowing information through an application. to summarize, users can enter the library with finger-vein recognition instead of a specific library card. then, if they would like to check out library material, the user can retrieve the items, pass them through the rfid reader, scan their finger vein, and go.
the blockmetrics library is designed for convenience and security, which are achieved by combining a wireless sensor network with the integration of blockchain and biometric technologies. this method eliminates the inconvenience of having to bring many library cards, increases the efficiency of collection borrowing procedures, and simplifies the management of collection borrowing from different libraries. adoption of these biometric technologies is still in its early stages. some libraries have begun using different tools, but few libraries have adopted all of them. it simplifies both accessing and borrowing procedures, and all the records are still stored in a particular library’s database for private access only. the development of the blockmetrics library will help to integrate biometric technologies and blockchain under the infrastructure of wireless sensor network to maintain library-accessing recognition, library collections, library users, borrowing records crossing libraries to raise the user convenience and satisfactions, library management efficiency, and library security. in the near future, the library transaction formula in a blockchain will be developed for collection borrowing storage. the library collection serial numbers will be considered in information management system as well. information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 12 endnotes 1 gartner, “smart with gartner, gartner top 10 strategic technology trends for 2020,” https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technology-trendsfor-2020/; gartner, “smart with gartner, gartner top 10 strategic technology trends for 2019,” https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technologytrends-for-2019/; gartner, “smart with gartner, gartner top 10 strategic technology trends for 2018,” https://www.gartner.com/smarterwithgartner/gartner-top-10-strategictechnology-trends-for-2018/; gartner, “smart with gartner, gartner top 10 strategic technology trends for 2017,” https://www.gartner.com/smarterwithgartner/gartners-top10-technology-trends-2017/. 2 satoshi nakamoto, “bitcoin: a peer-to-peer electronic cash system” (2009), https://bitcoin.org/bitcoin.pdf. 3 david shrier, weige wu, and alex pentland, “blockchain & infrastructure (identity, data security),” connection science & engineering, massachusetts institute of technology, 2016. 4 kristián košt’ál et al., “on transition between pow and pos,” international symposium elmar (2018). 5 thomas p. keenan, “alice in blockchains: surprising security pitfalls in pow and pos blockchain systems,” 15th annual conference on privacy, security and trust (2017); takeshi ogawa, hayato kima, and noriharu miyaho, “proposal of proof-of-lucky-id (pol) to solve the problems of pow and pos,” ieee international conference on internet of things and ieee green computing and communications and ieee cyber, physical and social computing and ieee smart data (2018). 6 quoc khanh nguyen, quang vang dang, “blockchain technology for the advancement of the future,” 4th international conference on green technology and sustainable development, (2018); nir kshetri and jeffrey voas, “blockchain in developing countries,” it professional, 20, no.2 (2018): 11-14. 7 shangping wang, yinglong zhang, and yaling zhang, “a blockchain-based framework for data sharing with fine-grained access control in decentralized storage systems,” 2018 ieee access, 6 (2018):38437-38450. 
8 pinyaphat tasatanattakool and chian techapanupreeda, “blockchain: challenges and applications,” 2018 international conference on information networking (icoin), (2018), https://doi.org/10.1109/icoin.2018.8343163; abderahman rejeb, john g. keogh and horst treiblmaier, “leveraging the internet of things and blockchain technology in supply chain management,” future internet, 11, no. 7 (2019): 161; stanislaw p. stawicki, michael s. firstenberg, and thomas j. papadimos, “what’s new in academic medicine? blockchain technology in health-care: bigger, better, fairer, faster, and leaner,” international journal of academic medicine, 4, no. 1 (2018): 1-11; guang chen et al., “exploring blockchain technology and its potential applications for education,” smart learning environments, 5, no. 1 (2018), https://doi.org/10.1186/s40561-017-0050-x; asma khatoon, “a blockchain-based smart contract system for healthcare management,” electronics, 9, no. 1 (2020): 94. 9 national central library, “national central library reader service directions,” november 11, 2016, https://enwww.ncl.edu.tw/content_26.html. 10 taipei public library, “regulation of circulation services,” june 13, 2018, https://english.tpml.gov.taipei/cp.aspx?n=af5cca6fc258864e. 11 shih hsin university library, “library regulations, access to shu libraries,” http://lib.shu.edu.tw/e_orders_enter.htm; shih hsin university library, “library regulations, borrowing policies,” accessed september 25, 2019, http://lib.shu.edu.tw/e_orders_borrows.htm. 12 sudhinder singh chowhan and ganeshchandra shinde, “iris biometrics recognition application in security management,” 2008 congress on image and signal processing. 13 stephen cass, “the top programming languages 2019,” ieee spectrum (2019), https://spectrum.ieee.org/computing/software/the-top-programming-languages-2019. everyone’s invited: a website usability study involving multiple library stakeholders elena azadbakht, john blair, and lisa jones information technology and libraries | december 2017 34 elena azadbakht (elena.azadbakht@usm.edu) is health and nursing librarian and assistant professor, john blair (john.blair@usm.edu) is web services coordinator, and lisa jones (lisa.r.jones@usm.edu) is head of finance and information technology, university of southern mississippi, hattiesburg, mississippi. abstract this article describes a usability study of the university of southern mississippi libraries website conducted in early 2016. the study involved six participants from each of four key user groups— undergraduate students, graduate students, faculty, and library employees—and consisted of six typical library search tasks, such as finding a book and an article on a topic, locating a journal by title, and looking up hours of operation.
library employees and graduate students completed the study’s tasks most successfully, whereas undergraduate students performed relatively simple searches and relied on the libraries’ discovery tool, primo. the study’s results displayed several problematic features that affected each user group, including library employees. these results increased internal buy-in for usability-related changes to the library website in a later redesign. introduction within the last decade, usability testing has become a common way for libraries to assess their websites. eager to gain a better understanding of how users experience our website, we assembled a two-person team and conducted the first usability study of the university of southern mississippi libraries website in february 2016. the web advisory committee—which is tasked with developing, maintaining, and enhancing the libraries’ online presence—wanted to determine if the content on the website was organized in a way that made sense to users and facilitated the efficient use of the libraries’ online resources. our usability study involved six participants from each of the following library user groups: undergraduate students, graduate students, faculty, and library employees. student and faculty participants represented several academic disciplines and departments. all of the library employees involved in the study work in public-facing roles. the web advisory committee and libraries’ administration wanted to know how each of these groups differ in their website use and whether they have difficulty with the same architecture or features. usability testing helped illuminate which aspects of the website’s design might be hindering users from accomplishing key tasks, thereby identifying where and how improvement needed to be made. we included library employees in this study to compare their approach to the website to that of other users in the mailto:elena.azadbakht@usm.edu mailto:john.blair@usm.edu mailto:lisa.r.jones@usm.edu everyone’s invited | azadbakht, blair, and jones 35 https://doi.org/10.6017/ital.v36i4.9959 hope of increasing internal stakeholders’ buy-in for recommendations resulting from this study. this article will discuss the usability study’s design, results, and recommendations as well as the implications of the study’s findings for similarly situated academic libraries. we will give special consideration to how the behavior of library employees compared to that of other groups. literature review the literature on library-website user experience and usability is extensive. in 2007, blummer conducted a literature review of research related to academic-library websites, including usability studies. her article provides an overview of the goals and outcomes of early library-website usability studies. 1 more recent articles focus on a portion or aspect of a library’s website such as the homepage, federated search or discovery tool, or subject guides. fagan published an article in 2010 that reviews user studies of faceted browsing and outlines several best practices for designing studies that focus on next-generation catalogs or discovery tools. 2 other library-website studies have reported on the habits of user groups, with undergraduates being the most commonly studied constituent group. emde, morris, and claassen-wilson observed university of kansas faculty and graduate students’ use of the library website, which had been recently redesigned, including a new federated search tool. 
3 many of the study’s participants gravitated toward the subject-specific resources they were familiar with and either missed or avoided using the website’s new features. when asked for their opinions on the federated search tool, several participants said that while it was not a tool they saw themselves using, they did see how it might be a helpful for undergraduate students who were still new to research. the researchers also provided the participants with an article citation and asked them to locate it using the using the library’s website or online resources. while half the participants did use the website’s “e-journals” link, others were less successful. some who had the most difficulty “search[ed] for the journal title in a search box that was set up to search database titles.” 4 this led emde, morris, and claassen-wilson to observe that “locating journal articles from known citations is a difficult concept even for some advanced researchers.” turner’s 2011 article describes the result of a usability study at syracuse university library that included both students and library staff. participants were asked to start at the library’s homepage and complete five tasks designed to emulate the types of searches a typical library user might perform, such as finding a specific book, a multimedia item, an article in the journal nature, and primary sources pertaining to a historic event. 5 when asked to find toni morrison’s beloved, most staff members used the library’s traditional online catalog whereas students almost always began their searches with the federated search tool located on the homepage. participants of both types were less successful at locating a primary source, although this task highlighted key differences in each groups’ approach to searching the library website. since library staff were more familiar than students with the library’s collections and online search tools, they relied more on facets and limiters to narrow their searches, and some even began their searches by navigating to the library’s webpage for special collections. information technology and libraries |december 2017 36 library staff tended to be more persistent; draw upon their greater knowledge of the library’s collections, website, and search tools; and use special syntax in their searchers, like inverting an author’s first and last names. “library staff took more time, on average, to locate materials,” writes turner, because of their “interest in trying alternative strategies.” 6 students, on the other hand, usually included more detail than necessary in their search queries (such as adding a word related to the format they were searching for after their keywords) and could not always differentiate various types of catalog records, for example, the record for a book review and the record for the book itself. turner concludes that the students’ mental models for searching online and their experiences with other web-search environments influence their expectations of how library search tools work and that library-website design should take these mental models into consideration. research on the search behaviors of students versus more experienced researchers or subject experts also has implications for library website design. two recent articles explore the different mental models or mindsets students bring to a search. the students in asher and duke’s 2012 study “generally treated all search boxes as the equivalent of a google search box” and used very simple keyword searches. 
7 this tracked with holman’s 2010 study, which likewise found that the students she observed relied on simple search strategies and did not understand how search interfaces and systems are structured. 8 methods our research team consisted of the libraries’ health and nursing librarian and the web services coordinator. we worked closely with the head of finance and information technology in designing and running the usability study. a two-week period in mid-february 2016 was chosen for usability testing to avoid losing potential participants to midterms or spring break. we posted a call for participants to two university discussion lists, on the libraries website, and on social media (facebook and twitter). we also reached out directly to faculty in academic departments we regularly work with and emailed library employees directly. we directed nonlibrary participants to a web form on the libraries website to provide their name, contact information, university affiliation/class standing, and availability. the health and nursing librarian followed up with and scheduled participants on the basis of their availability. each student participant received a ten-dollar print card and each faculty participant received a ten-dollar starbucks gift card. to record the testing sessions, we needed a free or low-cost software option. since the libraries already had a subscription to screencast-o-matic to develop video tutorials, and the tool allows for simultaneous screen, audio, and video capture, so we decided to use it to record all testing sessions. we also used a spare laptop with an embedded camera and microphone. the health and nursing librarian served as both facilitator and note-taker for most usability testing sessions. participants were given six tasks to complete. we encouraged participants to everyone’s invited | azadbakht, blair, and jones 37 https://doi.org/10.6017/ital.v36i4.9959 narrate as they completed each task. the sessions began with simple, secondary navigational questions like the following: • how late is our main library open on a typical monday night? • how could you contact a librarian for help? • where would you find more information about services offered by the library? next, we asked the participants to complete tasks designed to assess their ability to search for specific library resources and to illuminate any difficulty users might have navigating the website in the process. each of the three tasks focused on a particular library-resource type, including books, articles, and journals: • find a book about rabbits. • find an article about rabbits. • check to see if we have a subscription/access to a journal called nature. after the usability testing was complete, we reviewed the recordings and notes and coded them. for each task, we calculated time to completion and documented the various paths participants took to answer each question, noting any issues they encountered. we also compared the four user groups in our analysis. limitations although we controlled for user type (undergraduate, graduate, faculty, or library employee) in the recruitment of study participants, we did not screen by academic discipline. doing so would have hindered our team’s ability to include enough graduate students and faculty members in the study, as nearly all the volunteers from these two groups were from humanities or social science fields. 
the results might have differed slightly had the study successfully managed to include more faculty from the so-called hard sciences and allied health fields. additionally, the order in which we asked participants to attempt the tasks might have affected how they approached some of the later tasks. if a participant chose to search for a book using the primo discovery tool, for example, they might be more inclined to use it to complete the next task (find an article) rather than navigate to a different online resource or tool. despite these limitations, usability testing has helped improve the website in key ways. we plan to correct for these limitations in future studies. results every group included a participant who failed to complete at least one of the six tasks. an adequate answer to each of the study’s six tasks can be found within one or two pages/clicks from the libraries homepage (figure 1). the average distance to a solution remained at about two page loads across all of the study’s participants, despite a few individual “website safaris.” information technology and libraries |december 2017 38 figure 1. university of southern mississippi libraries’ homepage. graduate students tended to complete tasks the quickest and were generally as successful as library employees. they preferred to use primo for finding books but tended to favor the list of scholarly databases on the “articles & databases” page to find articles and journals. undergraduates were the second fastest group, but many struggled to complete one or more of the six tasks. they had the most trouble finding books and locating the journal by title. undergraduates generally performed simple searches and had trouble recovering from missteps. they were heavy users of primo, relying on the discovery tool more than any other group. the other two user groups, faculty and library employees, were slower at completing tasks. of the two, faculty took the longest to complete any task and failed to complete tasks at a similar rate as undergraduates. likewise, this group favored primo nearly as often. in contrast, library employees took almost as long as faculty to complete tasks but were much more successful. as a group, library employees demonstrated the different paths users could take to complete each task but favored those paths they identified as the “preferred” method for finding an item or resource over the fastest route. everyone’s invited | azadbakht, blair, and jones 39 https://doi.org/10.6017/ital.v36i4.9959 the majority of study participants across all user groups had little trouble with the first three tasks. although most participants favored the less direct path to the libraries’ hours—missing the direct link at the top of the homepage (figure 2)—they spent relatively little time on this task. likewise, virtually all participants took note of the links to our “ask-a-librarian” and “services” pages located in our homepage’s main navigation menu. this portion of the usability study alerted us to the need for a more prominent display of our opening hours on the homepage. figure 2. link to “hours” from the homepage. of the second set of tasks—find a book, find an article, and determine if we have access to nature—the first and last proved the most challenging for participants. one undergraduate was unable to complete the book task, and one faculty member took nearly eight minutes to do so—the longest time to completion of any task by any user in the study. primo was the most preferred method for finding a book. 
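as described in the methods, each recorded session was coded for time to completion, the path taken, and success. a minimal sketch of how such coded observations might be summarized by user group is shown below; the field names and toy values are assumptions for illustration, not the authors' actual data or coding scheme.

```python
from statistics import mean
from collections import defaultdict

# toy coded observations; the fields and values are invented, not the authors' data
sessions = [
    {"group": "undergraduate", "task": "find a book", "seconds": 95, "completed": True, "pages": 2},
    {"group": "faculty", "task": "find a book", "seconds": 470, "completed": True, "pages": 4},
    {"group": "library employee", "task": "find nature", "seconds": 210, "completed": False, "pages": 3},
]

by_group = defaultdict(list)
for s in sessions:
    by_group[s["group"]].append(s)

for group, rows in by_group.items():
    print(group,
          f"avg time {mean(r['seconds'] for r in rows):.0f}s",
          f"avg pages {mean(r['pages'] for r in rows):.1f}",
          f"completion {mean(1 if r['completed'] else 0 for r in rows):.0%}")
```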
although an option for searching our classic catalog (which uses innovative interfaces’ millennium integrated library system) is contained within a search widget on the homepage, primo is the default search option and therefore users’ default choice. interestingly, even after statements from some faculty such as “i don’t love primo,” “primo isn’t the best,” and “the [classic catalog] is better,” these participants proceeded to use primo to find a book. library employees were evenly split between primo and classic catalog. one undergraduate student, graduate student, and library employee were unable to determine whether we have access to nature. this task was the most time consuming for library employees because there are multiple ways to approach this question and library employees tended to favor the most consistently successful yet most time-consuming options (e.g., searching within the classic catalog). lacking a clear option in the main navigation bar, the most popular path started information technology and libraries |december 2017 40 with our “articles & databases” page, but the answer was most often successfully found using primo. several participants tried using the “search for databases” search box on the “articles & databases” page, which yielded no results because it searches only our database list. the search widget on the homepage that includes primo has an option for searching e-journals by title, as shown in figure 3. however, nearly all nonlibrary employees missed this feature. participants from both the undergraduate and graduate student user groups had trouble with this task, including those who were ultimately successful. unfortunately, many of the undergraduates could not differentiate a journal from an article, and while graduate students were aware of the distinction, a few indicated that they were not used to the idea of finding articles from a specific journal. figure 3. e-journals search tab. when it came to finding articles, undergraduates, as well as several faculty and a few library employees, gravitated toward primo. others, particularly graduate students and library employees, opted to search a specific database—most often academic search premier or jstor. however, those who used primo to answer this question arrived at an answer two to three times faster because of the discovery tool’s accessibility from a search widget on the homepage. regardless of the tool or resource they used, most participants found a sufficient result or two. common breakdowns despite the clear label “search for databases,” at least one participant from each user group, including library employees, attempted to enter a book title, journal name, or keyword into the libguides’ database search tool on our “articles & databases” page (figure 4). some participants attempted this repeatedly despite getting no results. others did not try a search but stated, with everyone’s invited | azadbakht, blair, and jones 41 https://doi.org/10.6017/ital.v36i4.9959 confidence, that entering a journal, book, or article title into the “search for databases” field would yield a relevant result. a few participants also attempted this with the search box on our research guides (libguides) page, which searches only within the content of the libguides themselves. across all groups, when not starting at the homepage, many participants had difficulty finding books because no clear menu option exists for finding books like it does for articles (our “articles & databases” page). 
this was difficulty was compounded by many participants struggling to return to the libraries homepage from within the website’s subpages. those participants who were able to navigate back to the homepage were reminded of the primo search box located there and used it to search for books. figure 4. “search for databases” box on the “articles & databases” page. another breakdown was the “help & faq” page (figure 5). participants who turned there for help at any point in the study spent a relatively long time trying to find a usable answer and often ended up more confused than before. in fact, only one in three participants managed to use “help & faq” successfully because the faq consists of many questions with answers on many different pages and subpages. this portion of the website had not been updated in several years and therefore the questions were not listed in order of frequency. information technology and libraries |december 2017 42 figure 5. the answer to the “how do i find books?” faq item leads to several subpages. discussion using the results of the study, we made several recommendations to the libraries’ web advisory committee and administration: (1) display our hours of operation on the homepage; (2) remove the search boxes from the “articles & databases” and “research guides” pages; (3) condense the “help & faq” pages; and (4) create a “find books” option on the homepage. all of these recommendations were taken into account during a recent redesign of the website. we also considered each user group’s performance and its implications for website design as well as instruction and outreach efforts. first, our team suggested that the current day’s hours of operation be featured prominently on the website’s front page. despite “how late is our main library open on a typical monday night?” being one of two tasks that had a 100 percent completion rate, this change is easy to make, adds convenience, and addresses a long-voiced complaint. several participants expressed a desire to see this change implemented. moreover, this is something many of our peer libraries provide on their websites. the team’s next recommendation was to remove the “find databases by title” search box from the “article & databases” page. during the study, participants who had a particular database in mind opted to navigate directly to that database rather than search for it. another such search box exists on the “research guides” page. although most of the participants did not encounter this search box during the study, those that did also mistook it for a general search tool. participants everyone’s invited | azadbakht, blair, and jones 43 https://doi.org/10.6017/ital.v36i4.9959 from all groups, especially undergraduate students, assumed that any search box on the libraries’ website was designed to search for and within resources like article databases and the online catalog, regardless of how the search box was labeled. given our findings, libraries with similar search boxes might also consider removing these from their websites. another recommended change was to condense the “help & faq” section of the website considerably. the “help & faq” section was too large and unwieldy for participants to use successfully without becoming visibly frustrated, defeating its purpose. moreover, google analytics showed that only nine of the more than one hundred “help & faq” pages were used with any regularity. going forward, we will work to identify the roughly ten most important questions to feature in this section. 
the final major recommendation was to consider adding a top-level menu item called “find books” that would provide users with a means to escape the depths of the site and direct them to primo or the classic catalog. when participants would get stuck on the book-finding task, they looked for a parallel to the “articles & databases” menu option. a “getting started” page or libguide could take this idea a step further by also including brief, straightforward instructions on finding articles and journals by title. in effect, this option would be another way to condense and reinvent some of the topics originally addressed in the “help & faq” pages. comparing each user group’s average performance helped illuminate the strengths and weaknesses of the website’s design. we suspect that graduate students were the fastest and nearly most successful group because they are early in their academic careers and doing a great deal of their own research (as compared to faculty). many of them are also responsible for teaching introductory courses and are working closely with first-year students who are just learning how to do research. faculty, because their research tends to be on narrower topics, were familiar with the specific resources and tools they use in their work but were less able to efficiently navigate the parts of the website with which they have less experience. moreover, individual faculty varied widely in their comfort level with technology, and this affected their ability to complete certain tasks. conclusion the results of our website usability study echo those found elsewhere in the literature. students approach library search interfaces as if they were google and generally conduct very simple searches. without knowledge of the libraries’ digital environment and without the research skills library employees possess, undergraduates in our study tended to favor the most direct route to the answer—if they could identify it. this group had the most trouble with library and academic terminology or concepts like the difference between an article and a journal. though not as quick as the graduate students, undergraduates completed tasks swiftly, mainly becau se of their reliance on the primo discovery tool. however, undergraduate students were less able to recover from missteps; more of them confused the “find databases by title” search tool for an article search tool than participants from any other group. since undergraduates compose the bulk of our user information technology and libraries |december 2017 44 base and are the least experienced researchers, we decided to focus our redesign on solutions that will help them use the website more easily. although all of the library employees in our study work in public-facing roles, not all of them provide regular research help or teach information literacy. since most of them are very familiar with our website and online resources, they approached the tasks more methodically and thoroughly than other participants. library employees tended to choose the search strategy or path to discovery that would yield the highest-quality result or they would demonstrate multiple ways of completing a given task, including any necessary workarounds. the inclusion of library employees yielded the most powerful tool in our research team’s arsenal. 
holding this group's "correct" methods side by side with equally valid methods of discovery helped shake loose rigid thinking, and the fact that some library employees were unable to complete certain tasks shocked all parties in attendance when we presented our findings to stakeholders. any potential argument that student, faculty, and staff missteps were the result of improper instruction and not of a usability issue was countered by evidence that the same missteps were sometimes made by library staff. not only was this an eye-opening revelation to our entire staff, it served as the evidence our team needed to break through entrenched resistance to making any changes. we were met with almost instant, even enthusiastic, buy-in to our redesign recommendations from the libraries' administration. therefore, we highly recommend that other academic libraries consider including library staff as participants in their website usability studies.
editorial: i inhaled john f. helmer information technology and libraries | june 2000
this editorial introduces the third special issue of information technology and libraries dedicated to library consortia, and the second primarily aimed at surveying consortial activities outside the united states.1 the concept of a special consortial issue began in 1997 as an outgrowth of a sporadic and wide-ranging discussion with jim kopp, editor of ital 1996-98. at the time, jim and i were involved in the creation and maturation of the orbis consortium in oregon and washington.
jim was a member and later chair of the governing council, and i was the chief volunteer staff person, finding myself increasingly absorbed by consortial work. our discussions lasted more than a year and were sustained by many e-mail messages and several enjoyable conversations over bottles of nut brown ale. in the mid-1990s it seemed obvious that we were witnessing the beginning of a renaissance in library consortia. consortia had been around for many years but now established groups were showing renewed vigor and new groups seemed to be forming every day. why was this happening? what were all these consortia doing? jim and i discussed these questions and speculated on future roles for library consortia and their impact on member libraries. library consortia seemed an ideal topic for a special issue of ital. my initial goal as guest editor of ital was to take a snapshot of a variety of consortia and begin to better understand the implications of the explosive growth we were witnessing. while assembling the march 1998 issue i soon realized that consortia were all over the map, both figuratively and literally. a small amount of study revealed a tremendous variety of consortia and a truly worldwide distribution. although american consortia were starting to receive attention in the professional literature, a great deal of important work was occurring abroad. this realization gave rise to the september 1999 issue and the present issue dedicated to consortia from around the world. in addition to six articles from the united states, these three special issues of ital include contributions from south africa, canada, israel, spain, australia, brazil, china, italy, micronesia, and the united kingdom. taken together these groups represent a dizzying array of organizing principles, membership models, governance structures, and funding models. although most are geographically defined, the type of library they serve also defines many. virtually all license electronic resources for their membership but many offer a wide variety of other services including shared catalogs, union catalogs, patron-initiated borrowing systems, authentication systems, cooperative collection development, digitizing, instruction, preservation, courier systems, and shared human resources. each consortium is formed by unique political and cultural circumstances, but a few themes are common to all. it is clear that the technology of the web, the increasing importance of electronic resources, and advances in resource-sharing systems have created new opportunities for consortia. beyond these technological and economic motivations, i believe that in consortia we see the librarian's instinct for collaboration being brought to bear at a time of great uncertainty and rapid change. librarians often forget that as a profession we collaborate and cooperate with an ease seldom seen in other endeavors. there is safety in numbers and in uncertain times it helps to confer with others, spread risk over a larger group, and speak with a collective voice. library consortia fulfill these functions very well and their future continues to look bright. as i conclude my duties as guest editor i would like to thank jim kopp for sparking my interest in this project and for several years of stimulating conversation. special thanks are due to managing editors ann jones and judith carter as well as the helpful and professional staff at ala production services.
obstacles of language and time differences make composing and editing a publication such as this unusually challenging. the quality and cohesiveness of these issues of ital are due in large measure to the efforts of these individuals.
john f. helmer (jhelmer@darkwing.uoregon.edu) is executive director, orbis library consortium.
in inhaling the spore, the editorial introduction to the first special consortial issue, i compared a librarian's involvement in consortia to the cameroonian stink ant's inhalation of a contagious spore. the effect of this spore is featured in mr. wilson's cabinet of wonder, lawrence weschler's remarkable history of the museum of jurassic technology.2 weschler explains that, once inhaled, the spore lodges in the brain and "immediately begins to grow, quickly fomenting bizarre behavioral changes in its ant host." although the concept of a consortial spore is somewhat extreme (or "icky" according to my nine-year-old daughter) the editorial was an accurate reflection of my own sense of being inexorably drawn into a consortium, drawn not so much against my will but as a willing, crazed participant. at the time i was nominally working for the university of oregon library system and vainly trying to keep consortial work in perspective. by the time of my second editorial, epidemiology of the consortial spore, i was exploring consortia around the world but still laboring under the illusion that i could keep my own consortium at arm's length. i must have failed since, as of this writing, i have left my position at the uo and now serve as the executive director of the orbis library consortium. like the cameroonian stink ant, i have inhaled the spore and am now happily laboring under its influence. references and notes 1. see ital 17, no. 1 (mar.
1998) and ital 18, no. 3 (sept. 1999). 2. lawrence weschler, mr. wilson's cabinet of wonder (new york: vintage books, 1995). the museum of jurassic technology (www.mjt.org) is located in culver city, calif. see www.mjt.org/exhibits/stinkant.html for more on the cameroonian stink ant.
public libraries leading the way we can do it for free! using freeware for online patron engagement karin suni and christopher a. brown information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13257 karin suni (sunik@freelibrary.org) is curator, theatre collection, the free library of philadelphia. christopher a. brown (brownc@freelibrary.org) is curator, children's literature research collection, the free library of philadelphia. © 2021. "public libraries leading the way" is a regular column spotlighting technology in public libraries. in the early weeks of the pandemic, the special collections division of the free library of philadelphia (https://freelibrary.org/) responded to the library's call for fun and interactive online engagement. initially staff members released games and buzzfeed-inspired lists via various social media accounts to amuse patrons, distract from the lockdown, and provide educational programming. as the list of activities grew, we realized this content needed a more substantial home; the return on investment of time for the development and production of an online game to be released once on social media was not sufficient. activities and passive programming that took hours to create could easily fall victim to social media's algorithms and be quickly buried in a patron's feed. the free library's official blog was an insufficient option because it promoted all library programming, and our goal was to highlight the value of our division and the materials housed within it. we resolved these issues by creating an online repository solely with freeware systems (https://bit.ly/funwithflpspeccoll). the repository provides a stable landing page wherein the special collections division content builds meaningful connections with patrons of all ages. this model can be readily adapted and is a valuable tool for library workers promoting their own online engagement. repository framework it was clear that our division could not add to the burden of an overworked it staff by requesting support for digital engagement. we needed to seek external alternatives that would interest patrons and could be managed with limited training. before we began our search, we brainstormed a list of requirements:
• an inexpensive and user-friendly hosting platform
• a pleasing look and easy navigation
• the ability to be updated frequently and easily
• the flexibility to adapt and expand as our requirements change
our search led us to the google suite of products, specifically google sites and google drawings. google sites and google drawings integrated perfectly with each other, and we appreciated their usability and relative simplicity. once we selected the software, we knew we needed a list of best practices to guide the repository's creation:
● to establish a visual connection with our official website, the repository would primarily use the free library's branded color scheme.
● all thumbnails created would be square, allowing us to reuse the image as promotional material on different social media accounts.
● all members of the division can create content, but the ability to update and edit the repository would remain limited to ensure consistency.
these guidelines have proven effective. the color scheme and thumbnail rules formed a framework wherein we could work productively without "reinventing the wheel." limiting administrative abilities has allowed us to maintain a controlled vocabulary within the repository, better unifying the content. repository software the google suite, specifically google sites, is advantageous for library workers looking to create professional-looking content quickly. it is free with a google account and built-in templates allow users to build a fully functional website within a few hours with little-to-no design experience. as with all freeware, google sites has quirks. the foremost is that while there are options for customization, these options are finite. there are a limited number of layout, header, and font designs meaning that anyone using the software must temper their vision to fit within the confines of the program. google drawings is far more flexible, in part because it is a much simpler program. users familiar with software like powerpoint or ms paint have the ability to design images for headers, thumbnails, etc. two drawbacks we encountered with this freeware are the restrictions on image upload size (a consideration for our division given the archival files used in our digital collections) and the limited ability to create word art. for our division, the advantages of these software products outweigh their limitations. content framework the repository houses programming devised primarily with freeware. an early discovery was a suite of activities from flippity (https://www.flippity.net). designed for educational use, flippity provides templates for a variety of online activities including memory games, matching games, and board games. our primary focus has been on the first two, although we continue to explore new aspects of this suite as templates are added. flippity works with google sheets and can integrate images from google drawings. jigsaw planet (https://jigsawplanet.net/) has been used extensively by libraries and museums during the pandemic. it allows creators to easily turn images into puzzles that are played online, either on the site itself or through embedding the puzzle. the site allows registered users to access leaderboards, and it allows creators to track how many times puzzles have been played. in addition to the ease of use, the major benefit of jigsaw planet is that the patron can customize their experience by changing the number of pieces to fit their preferred level of difficulty. the desire for audio and video content has surged over the last several months, and we have sought to meet that need through the use of a variety of software. in regard to video, youtube is not a new tool, but the majority of our pre-pandemic programs were not filmed. with the shift to crowdcast and zoom, we now have a library of online lectures and other events that have been uploaded to youtube and can be viewed repeatedly and at any time. with a dedicated home for this content, we have been inspired to seek out older videos of special collections programming across multiple channels and link them to the repository.
one of the newest additions to our offerings has been the podcast story search from special collections (http://bit.ly/flpstorysearch), which explores stories based on, inspired by, or connected to material artifacts. the podcast is recorded and edited using zencastr and audacity and is posted on anchor, which also distributes it to major listening apps. in recent weeks, our division has added images, blog posts, and additional content for current and past exhibitions. this is the first formal exhibition compilation since the special collections division began in 2015, and we are delighted that it is available for the public to explore. the material is arranged using templates and tools available in google sites, allowing patrons to view image carousels, exhibition tags, and past programs. the inclusion of this material marks a shift away from the repository functioning as a response to the need for pandemic-related content to a living history of our division and our work promoting the special collections of the free library. accessibility accessibility and equity of access lie at the core of library service. sadly, we were not initially focused on this point, and our content was not fully accessible, e.g., text was presented in thumbnails only which limited the use of screen readers to relay information. as the content expanded, we sought to make the space as inclusive as the freeware limits allowed. alternative text was added to images and information was not limited within thumbnails. this is an ongoing process, but one that is necessary to reach as many patrons as possible. analytics site visits and other statistics for a library's online presence are always important, but especially so during the pandemic when restricted physical access has driven more patrons to online resources. our plan for capturing this information was two-pronged. first, we used bit.ly to create customized, trackable links for our content. these are used within the repository and on social media and in other online promotions. this has proven to increase repository traffic while providing information on how patrons discover our content. the statistics generated from bit.ly are only available for 30 days for free accounts, albeit in a rolling 30-day window. knowing this, we transcribe the statistics monthly into a spreadsheet to maintain a consistent account of patron access. our second prong is google analytics, a freeware option that only tracks data within the repository. google analytics connects a single google account to google sites, but the integration is seamless and the data remains available indefinitely. this provides a visual breakdown of statistics, including maps and graphs that are easily shared with other stakeholders. by using both tools we are able to surmise who is visiting the repository, where they are finding the links, and which sections are popular with our patrons. conclusion the special collections repository was created in response to a growing need for online patron engagement during the early weeks of the pandemic. our division strove to engage the public with fun, educational programming and activities primarily using freeware. this has proven to be successful with the general public and members of our division.
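a minimal sketch of the monthly bit.ly transcription step described in the analytics section above, assuming the bitly v4 api's clicks/summary endpoint; the access token, tracked link, and output file shown are placeholders rather than the division's actual configuration.

```python
import csv
import datetime

import requests

API_BASE = "https://api-ssl.bitly.com/v4"
ACCESS_TOKEN = "YOUR_BITLY_TOKEN"          # placeholder token
BITLINKS = ["bit.ly/funwithflpspeccoll"]   # links to track, in domain/hash form


def clicks_last_30_days(bitlink):
    """return total clicks on one bitlink over the rolling 30-day window."""
    resp = requests.get(
        f"{API_BASE}/bitlinks/{bitlink}/clicks/summary",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        params={"unit": "day", "units": 30},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["total_clicks"]


def append_monthly_counts(path="bitly_stats.csv"):
    """append one dated row per link so counts outlive the 30-day window."""
    today = datetime.date.today().isoformat()
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        for link in BITLINKS:
            writer.writerow([today, link, clicks_last_30_days(link)])


if __name__ == "__main__":
    append_monthly_counts()
```

run once a month, by hand or on a schedule, this reproduces the spreadsheet transcription without depending on the free tier's rolling 30-day limit.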
the statistics from the site have both informed content creation and engendered a better appreciation for the repository from our administration. as we move forward, the repository is evolving into a comprehensive collection of what the special collections division does and how we meet the need for patron engagement online and in person. it is a framework that can be used by library workers across a multitude of areas and specialties, housing activities from story times and passive programming to book clubs and lectures.
alexa, are you listening? an exploration of smart voice assistant use and privacy in libraries miriam e. sweeney and emma davis information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12363 miriam e. sweeney (mesweeney1@ua.edu) is associate professor, university of alabama. emma davis (edavispatsfan@gmail.com) is library specialist, hoover public library. abstract smart voice assistants have expanded from personal use in the home to applications in public services and educational spaces. the library and information science (lis) trade literature suggests that libraries are part of this trend; however, there are few empirical studies that explore how libraries are implementing smart voice assistants in their services, and how these libraries are mitigating the potential patron data privacy issues posed by these technologies. this study fills this gap by reporting on the results of a national survey that documents how libraries are integrating voice assistant technologies (e.g., amazon echo, google home) into their services, programming, and checkout programs. the survey also surfaces some of the key privacy concerns of library workers in regard to implementing voice assistants in library services. we find that although voice assistant use might not be mainstreamed in library services in high numbers (yet), libraries are clearly experimenting with (and having internal conversations with their staff about) using these technologies. the responses to our survey indicate that library workers have many savvy privacy concerns about the use of voice assistants in library services that are critical to address in advance of library institutions riding the wave of emerging technology adoption. this research has important implications for developing library practices, policies, and education opportunities that place patron privacy as a central part of digital literacy in an information landscape characterized by ubiquitous smart surveillant technologies.
introduction smart voice assistant use has expanded from personal uses in the home to new applications in customer services, healthcare, e-government, and educational spaces, raising questions from groups like the american civil liberties union (aclu), among others, about the data privacy implications of these technologies in public and shared spaces.1 libraries are part of the voice assistant adoption trend, as documented in the american libraries magazine article "your library needs to speak to you" by carrie smith.2 smith gives examples of school, public, and academic libraries adopting smart voice assistants like amazon's alexa and echo devices for a range of services and programming including "event calendars, catalog searches, holds, and advocacy." nicole hennig points out that there are tremendous opportunities for voice assistants to assist "people with disabilities, the elderly, and people who can't easily type."3 in these ways, voice assistants are often presented in the trade literature as part of an exciting new wave of emerging smart technology services that libraries can "get ahead of" and potentially harness for public service and community engagement. at the same time, the key privacy issues inherent in voice assistants are often downplayed as secondary concerns while librarians are encouraged to press forward and experiment with smart technology adoption. we argue that the privacy concerns surrounding voice assistant use in libraries should be treated as fundamental questions for library workers to consider as a part of upholding the core values of patron privacy and confidentiality in library services. voice assistant use in libraries is still nascent, reflecting the emerging nature of these technologies. given this, it is not surprising that very few empirical studies have explored voice assistant use and potential data privacy implications for libraries. our research is intended as an exploratory study that contributes to advancing knowledge in this area. the goals of this study are to begin mapping smart voice assistant use in libraries, to assess how aware library workers are of privacy concerns involving these technologies, and document how library workers are educating patrons about privacy and voice assistant use. these are necessary first steps for developing library practices, policies, and education opportunities for voice assistant use that prioritize privacy as a central part of digital literacy in an information landscape characterized by ubiquitous smart surveillant technologies and diminishing data privacy protections. review of literature what is a voice assistant? voice assistants are a type of digital assistant technology, also known as virtual assistants, and can be broadly defined as computer programs designed with human characteristics that act on behalf of users in digital environments using voice interfaces.4 apple's siri, microsoft's cortana, and amazon's alexa are prevalent examples of smart digital assistants that use voice recognition and natural language user interfacing to help learn users' preferences, answer questions, and manage a variety of applications and personal information.
voice assistants can run on multiple devices and be seamlessly integrated across platforms including networked internet of things (iot) gadgets like smart speakers (e.g., amazon echo and google home) and other smart-home technologies (e.g., nest or ring), along with mobile devices, smart watches, personal computers, and numerous third-party applications. ubiquitous "always on" features are offered as a convenience to users who can use "wake words" (e.g., "hey, siri"; "alexa"; "ok google") to initiate queries and commands. amazon's smart speakers and intelligent digital assistants are rapidly becoming pervasive home and personal technologies, with the amazon echo leading the market in 2019 with 61 percent market share, followed distantly by the google home device with 24 percent market share.5 a recent united states survey by clutch reported that nearly half of people surveyed owned a voice assistant, with one-third planning to purchase one in the next three years.6 additionally, the clutch survey found that 69 percent of voice assistant owners used their devices every day.7 the popularity of voice assistants for personal use has driven the expansion of these technologies for customer service applications outside of the home in shared and public spaces, including in educational settings and health care. in this landscape it is perhaps not surprising that librarians are following suit and exploring the service potentials of voice assistants for libraries. libraries and voice assistant use the american library association's (ala) center for the future of libraries initiative identified "voice control" as a trend in their 2017 report, anticipating the relevance of voice assistant technologies for libraries.8 the capability of voice assistants to integrate across platforms through customized applications—which amazon calls "skills" and google refers to as "actions"—allows libraries to create specialized uses for these technologies as a part of their regular information services. additionally, existing third-party vendors like overdrive (for e-book lending) and hoopla (multimedia lending) that most public libraries use are preconfigured to connect to voice assistants like amazon's alexa. there are many creative and potentially helpful ways that voice assistants could be integrated into the library setting, including enhancing read-along with music and effects, providing accessible services for elderly patrons or individuals with disabilities, and providing an alternative access point for common library queries and institutional information (e.g., searching titles, placing holds, requesting library event information).9 some libraries have started experimenting with voice assistant services in the library.
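to make concrete what a library-built "skill" can look like, the sketch below uses the alexa skills kit sdk for python; it is illustrative only, and the intent name, spoken responses, and hours value are hypothetical rather than drawn from any library described in this article.

```python
# illustrative sketch of a small library "skill" using the alexa skills kit
# sdk for python (ask-sdk-core); the intent name and responses are hypothetical.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_request_type, is_intent_name


class LaunchHandler(AbstractRequestHandler):
    """handles the bare invocation, e.g., 'alexa, open my library'."""

    def can_handle(self, handler_input):
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input):
        speech = ("Welcome to the library. You can ask about today's hours "
                  "or upcoming events.")
        return handler_input.response_builder.speak(speech).ask(speech).response


class LibraryHoursHandler(AbstractRequestHandler):
    """handles a hypothetical custom intent such as 'what are your hours today?'."""

    def can_handle(self, handler_input):
        return is_intent_name("LibraryHoursIntent")(handler_input)

    def handle(self, handler_input):
        # a real skill would look this up from the library's own hours data
        speech = "The main library is open today from 9 a.m. to 9 p.m."
        return handler_input.response_builder.speak(speech).response


sb = SkillBuilder()
sb.add_request_handler(LaunchHandler())
sb.add_request_handler(LibraryHoursHandler())

# entry point when the skill is deployed as an aws lambda function
lambda_handler = sb.lambda_handler()
```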
for example, iowa state university staff developed alexa skills for their library so that users could find out information about library history and library collections.10 other libraries are using voice assistants to strategically engage their communities, as when the spokane public library placed amazon echo dots in the library so patrons could ask questions about upcoming bond elections, an issue that directly impacts library funding.11 the worthington (oh) libraries are integrating voice assistant technologies into technology training and “petting zoo kits” which allow their patrons to try out emerging technologies.12 the king county (wa) library system is taking a novel approach and experimenting with developing their own voice assistant, libro.13 these examples point to the many applications and creative approaches libraries are experimenting with to bring voice assistant technology to their services. data privacy issues as convenient as voice assistants may be for library services, the underlying data infrastructures of these technologies are tightly controlled by the technology companies that design and sell them. the lack of library control (and transparency) over these infrastructures raises questions about how the core values of privacy and confidentiality can be guaranteed in the library setting. 14 voice assistant technologies capture a wide range of intimate user information in the form of biometric data (e.g., voice recognition), consumer habits, internet-based transactions, personally identifiable information (pii), and geographical information.15 the ubiquitous “always on” feature that makes these technologies so convenient also flags important privacy questions about the extent of user interactions that are recorded; how these files are processed, transcribed, and stored; and how local, state or other law enforcement agencies might compel or otherwise use these records.16 recently amazon has confirmed that they have employees dedicated to listening to recordings from echo devices in order to help “eliminate the gaps in alexa’s understanding of human speech and help it better respond to commands,” which is concerning for patron privacy in the library context.17 researchers at northeastern university and imperial college london recently did a study about how often smart speakers record “accidentally” and whether or not they are constantly recording. the study found no evidence to support the theory that these devices are constantly recording, however the researchers did report that smart speakers are accidentally activated around 19 times a day, on average. 18 these reports aside, there is still much unknown about what these companies, and the companies they contract out work to, do with the personal data collected from voice assistants. lastly, amazon is a known collaborator with us government agencies like homeland security and immigration and customs enforcement (ice), hosting their biometric data on amazon web services (aws).19 amazon has a reputation for being one of the least transparent technology companies in terms of data sharing practices, and has routinely evaded questions about if/how much of customers’ echo data has been turned over to federal authorities.20 given this data environment, the fact that libraries are beginning to experiment with voice assistant integration in their services poses important questions for patron data privacy and confidentiality. 
ala provides library privacy guidelines for third-party vendors that clearly detail expectations for use, aggregation, retention, and disclosure of user data.21 while this document has been helpful for guiding license agreements with digital content providers, program facilitators, and other libraries, it does not quite capture the range of complexities that emerging smart technologies pose in the app-driven iot landscape. this area is ripe for study, and more information about how libraries of different types are approaching the use of voice assistants is necessary for developing responsive professional practices that center issues of privacy and critical digital literacy. our survey explores some of these issues with the purpose of beginning to document voice assistant use, and associated privacy concerns, in library services. research methods four main research questions guide this study: (1) how are libraries using smart voice assistant technologies as a part of their library services? (2) how aware are library workers of how voice assistants integrate with third-party digital content platforms? (3) are libraries educating library patrons about the privacy implications of smart voice assistant technologies? and (4) what kinds of privacy concerns do library workers have about the use of smart voice assistant technologies in their library services and programming? to address these questions, we developed an online survey using qualtrics web software, and distributed it in fall 2019 to 1,929 public and academic libraries across the us via email solicitation.22 the survey consisted of a mix of 31 multiple choice and open-ended questions designed to address different aspects of the stated research questions (see appendix a). since most of the examples of library voice assistant use detailed in the lis trade literature came from public and academic libraries, these were the library types we identified as most likely to already be experimenting with voice assistants in services and programming. using purposive sampling techniques, we selected 30 public libraries for each state that represented a range of rural and metropolitan service areas. we selected approximately 10-20 academic libraries per state, with the actual number varying based on the total number of universities and colleges in a given state. we identified a cross-section of large state schools, private colleges, and community colleges in each state to account for the variety of higher education institutional settings for academic libraries. we sent email solicitations to each public library, targeting email addresses for library directors where possible. for libraries that had centralized email services, we solicited participation using the contact forms available on the libraries' websites. email solicitations to the academic libraries targeted library employees with job titles that included: emerging technology, user services, user experiences, head of public services, and head of technology. our survey analysis documents the numbers of reported uses, and kinds of integration, of voice assistant technologies across library applications and services. we conducted a qualitative content analysis of the short answer responses, with both researchers independently coding participant comments for emergent themes and categories.
as a part of this process both researchers compared and negotiated categories in two iterations of coding to arrive at a common codebook, which was then applied in the final pass of the responses. these categories have some distinct features, but also have many overlapping components. comments that embodied multiple themes were included in all categories that were relevant for describing them, meaning a particular comment might be included in multiple categories. the following sections report on the key findings of this study, organizing the discussion around our original research questions. findings participant demographics we received 86 total responses for the survey, with the majority of respondents (61 percent) reporting affiliation with public libraries, followed by respondents from academic libraries (38 percent), with one respondent from a school library (1 percent).23 the participants represented libraries from 42 states across the us.24 the vast majority of public library respondents (65 percent) reported serving populations of 25,000 or more, though there was also a large share of responses from libraries serving smaller populations of 2,500-9,999. the majority of academic library respondents work for small and medium-sized institutions serving populations between 2,500 and 9,999 (table 1), with nearly a third of respondents representing medium to large institutions. admittedly, these are rough demographic sketches to help quickly identify which types of libraries might be using voice assistants. more granular demographic detail would be useful in future studies to further understand how factors like institution type, geographical region, access to resources, and service community demographics shape decisions about emerging technology adoption in libraries.
table 1. size of service population by library type
                   total    public   academic   school
total count        84       51       32         1
2,500 or less      11.9%    2.0%     28.1%      0.0%
2,500-9,999        25.0%    19.6%    34.4%      0.0%
10,000-25,000      16.7%    11.8%    25.0%      0.0%
25,000+            44.0%    64.7%    12.5%      0.0%
i'm not sure       2.4%     2.0%     0.0%       100.0%
how are libraries using smart voice assistant technologies as a part of their library services? only five respondents (6 percent) in our study reported that their library is currently using amazon echo, google home, or apple siri devices for patron services and programming. of the voice assistant adopters, three were public libraries using amazon echo and google home devices, and two were academic libraries using amazon echo and apple siri (table 2).
table 2. voice assistant device by library type
              total   public   academic   school
amazon echo   3       1        2          0
google home   2       2        0          0
apple siri    1       0        1          0
librarians described using voice assistants to "provide basic info about the library and resources," and on an "ad hoc basis" to promote the library-specific alexa skills and google home actions. other reported uses included "translation services" and as a part of "technology petting zoos."25 we asked librarians to describe where voice assistants were located in the library to get a better idea of the spatial arrangements of these technologies, which could be important for considering potential surveillant concerns. several libraries reported that they had voice assistants sitting at front service desks or reference desks for patrons to use in both adult and children's service areas,
as well as at circulation desks. when it comes to using these devices in library programming, the most common response was for use in technology petting zoos and in technology classes where patrons can see technology demonstrations and ask library staff questions, or get one-on-one tutoring sessions: "our technology department holds regular 'tech drop-in's' and carries out one on one assistance by appointment. in the context of these patrons will sometimes bring in their own devices or ask questions about the use of digital assistants." other programming applications that librarians mentioned for voice assistants included trivia, 3-d printing, and makerspaces. two libraries (one public and one academic) reported that they were circulating apple siri devices (e.g., ipads) and amazon alexa products (e.g., echo) for checkout. how aware are library workers of how voice assistants integrate with third-party digital content platforms? the majority of library workers surveyed (70 percent) reported that their libraries use third-party digital media platforms like overdrive and hoopla to provide multimedia content like e-books and streaming video to patrons. both of these platforms support integration with voice assistants like amazon alexa through "skills" (the alexa equivalent of an application). patrons are able to download a skill for their alexa-enabled device to access digital content through these platforms, which are often linked to their library accounts (e.g., "alexa, ask hoopla how many borrows i have remaining.").26 around 14 percent of the respondents reported that they were aware that overdrive and hoopla integrated with voice assistants, and 3 percent of all respondents reported that their libraries actively inform patrons about amazon alexa skills for these services. when patrons begin connecting their personal voice assistant devices with third-party digital content providers that are also linked to their library accounts, different terms of service agreements and privacy policies overlap, creating a complex data rights landscape. almost a third of our respondents (29 percent) replied that they were aware that amazon has different privacy policies from overdrive and hoopla, with 22 percent responding that they were unaware of these differences (the rest were unsure or did not respond). only 15 percent of respondents reported that their libraries provided patrons with information about overdrive and hoopla's privacy policies. one library worker offered that, "when helping a patron or informing them that we use overdrive they are encouraged to read all the privacy info." however, no libraries in this study reported sharing information about amazon's privacy policies with patrons, which might also apply to linked accounts. lastly, 34 percent of the library workers indicated that they were familiar with the ala guidelines on privacy that pertain to third-party vendors, and 16 percent reported that their library actively refers to these guidelines in information materials for patrons. for instance, "we have a privacy policy on our website, which was based on the ala library privacy checklists.
it states that our vendors have different privacy policies than we do." these responses indicate that while some library workers are aware of the privacy implications of the integration of voice assistants into third-party digital content platforms, there are opportunities to increase staff and patron awareness about the intersecting privacy policies and terms of service in this landscape. are libraries educating library patrons about voice assistant technologies as a part of services and programming? we were curious if the libraries who used voice assistants in their services were taking any particular measures to inform patrons about the privacy implications of these technologies, or offering any other kinds of specific privacy "best practices" guides for use (e.g., how to erase your data records, adjust settings, etc.). the two libraries who reported circulating voice assistants indicated that they did not include any privacy information with voice assistant devices at checkout. similarly, we asked library workers about the kinds of technology classes or programming that their libraries were offering, since these might be sites where there is potential to educate or provide information about privacy issues raised by smart technologies like voice assistants. we found that 49 (56 percent) of the libraries represented in the survey (37 public, 12 academic) offer technology courses for the public. of these, 39 libraries (24 public, 15 academic) responded "yes" to our question asking if aspects of "data privacy or data literacy" are included as part of these classes or other related programming.27 only 3 libraries (2 public, 1 academic) were able to report that their library offers data literacy education that specifically addresses voice assistant technologies. library workers provided many examples of the kinds of data literacy information that their libraries typically provided in technology classes and programming. twelve respondents said that their libraries offered some sort of broad data literacy class and several cited classes specifically targeted at personal data practices and security. topics taught in these classes included: understanding your personal risk profile; password managers and security; how to understand and protect your digital footprint; and sessions on facebook and google where staff "walk users through how to find their information and make decisions about it." several respondents identified information literacy topics in conjunction with data literacy, noting that their library teaches classes about identifying "fake news," phishing scams, and evaluating the authority of websites and website content. none of the responses specifically named issues around privacy or data capture by voice assistants or other smart technologies as topics covered in library technology classes. several library workers noted that technology classes were offered at their libraries through one-on-one sessions, geared to individually address what patrons had questions about. based on these responses it is unclear how in-depth, or if at all, these one-on-one sessions might go into informing patrons about privacy best practices and risks when using smart technologies like voice assistants. what kinds of privacy concerns do library workers have about the use of smart voice assistant technologies in their library services and programming?
just over half of the library workers surveyed (52 percent) answered "yes" to the question: "do you have any privacy concerns about the use of amazon echo, google home, or apple siri devices in the library?" of the other responses, 16 percent reported "no" concerns and 15 percent answered "i'm not sure." those who answered yes were asked to further describe their privacy concerns, resulting in robust descriptions that demonstrated a savvy understanding of the voice assistant data landscape. we characterized library workers' concerns about voice assistants in the library by five major categories: data access and use; surveillance and "always on" features; procedure and operations; legal issues; and professional responsibility. data access and use by far the most prevalent privacy concerns focused on questions about who has access to data collected by smart voice assistants and how this data might be used (or misused) by different parties. library workers were the most concerned about the reach of access that the three major voice assistant parent companies (amazon, google, apple) have to patron data, closely accompanied by concerns with the selling of this data to third-party vendors: "there are known risks in the logging practices of the assistant vendor (amazon, google, apple). there are potentially greater, and unknown, risks of privacy and data security problems with third-party integrators that libraries are working with to create the alexa skills, google home actions, etc." "these devices are tied to user accounts for vendors that sell goods and services. there are opportunities to make purchases that we do not want to present to our patrons." "as currently constituted, most of these devices' privacy policies require owners to allow voice recordings to be sent to cloud services for transcription and, in some instances, for storage and for re-listening by staff or 3rd-party contractors." another library worker added that they were concerned about the willingness of these parent companies to "share personal, private data with law enforcement agencies." this observation underscores what is potentially at stake in terms of patron vulnerability in this data environment. several concerns focused on patrons "unwittingly leaving their sensitive information on devices that we might use." "being that anything we use in the library, or check out to our patrons is shared, i have privacy concerns for what data and recordings will be collected by the services while they are either in use in the library or while they are in the patron's possession." while some of these concerns were tied back to how parent companies might use this data, others were equally wary of the potentials for "storing information that can be accessed between patron uses" or by library staff: "as with computers in the info commons, i would be concerned whether user information is scrubbed after each user. or would one user's information persist and become available to a subsequent user." "i would not want to be able to identify the patron who used the device. in this case, we cannot. we circulate ipads as assistive devices.
as soon as the item is returned, the checkout record is purged." lastly, library workers expressed cybersecurity concerns about voice assistants, wondering about how voice assistants might be hacked or otherwise manipulated by malicious actors: "the library is public space, these devices are not known for being secure. a device would have to be registered to some university account, but would be prone to algorithmic manipulation from public voice inputs if that makes sense?" "just the idea that they (everything!) is [sic] hackable, and hostage-able, and so on, creeps me out personally, but also in terms of privacy and confidentiality of users of that technology." "alexa and google home can be hacked to phish passwords and other sensitive information." taken together, these concerns gesture to the opacity of the data environment in terms of who might have access to data (companies, law enforcement, patrons, library staff, hackers) and how this data might be used (advertising and marketing, exposure of personal patron information, state surveillance, and exploitation). surveillance and "always on" features the second major area of concern that library workers expressed was about the surveillance potentials of voice assistants via their passive listening features. in order for voice assistants to respond to their various wake words, they need to be "always on" and listening. while there is a difference between always listening and always recording (which recent studies suggest is not happening), library workers remained wary about devices "constantly monitoring staff or patrons." these concerns have some obvious overlap with the data access and use theme, but differ in that they are specifically concerned with the act of surveilling—monitoring—patron activities, use patterns, and personal information. three respondents in this category couched their data privacy concerns in terms of ability to exert some control over their data (e.g., deleting data), or the ability to grant permission/consent to be recorded: "these devices are intended for use in the home. they offer some protections for users with management access. for example, the google assistant allows review and deletion of recording history. for users without such access there are no such protections." "...they [voice assistants] are intended to for use inside a single household, learning the voices, habits and preferences of those household members. i feel that this kind of personal information should be the individual's choice to make and not the library's [sic]." "my concern is that my personal data is being collected without my permission. the same concern applies to patrons of the library. having them present and turned on captures people's conversations and they may not be aware that is happening." as these comments suggest, passive listening in public spaces opens up the potential for surveilling patrons and library staff who are not intending to interact with the devices, or who have no knowledge that the device is present. in other words, while some patrons might opt to use a voice assistant to ask a question or look a book up in a library catalog, patrons (and library staff) who are merely talking in the vicinity of these devices may still be listened to and recorded by these devices without their knowledge or consent.
this group of privacy concerns conveys a lack of transparency around data collection and surveillance in voice assistants, pointing to larger power differentials between parent companies and users in terms of control over data collection and management. procedure and operations library workers discussed the operational challenges that voice assistants present to staff in terms of establishing routine procedures that ensure patron privacy and confidentiality in between patron use: "how do we make sure no residual information remains in the device before someone else uses it or that if used during a program 'private' information isn't being broadcast to other devices in the area?" another library worker alluded to some of the operational considerations that already accompany library use and lending of personal computing devices, "clearing data, purchasing, maintaining, we already have ipads and other devices and their management with our staff has been a challenge." this comment points to the extra staff labor that underpins technology services, which is often not considered as a part of infrastructure for offering these services. similarly, there is a sense from these comments that establishing procedures to maintain privacy and confidentiality is critical for voice assistants. failure to erase or secure patron data could lead to inadvertently exposing sensitive or personally identifiable information (pii). "patron's [sic] may inadvertently be saving their information or staff may forget to delete information causing the previous patrons sensitive information to remain for the next patron to discover." while google home and amazon alexa devices do provide the ability for individual recordings to be deleted by the account holder, in the case of shared library use of voice assistants, it would likely be incumbent on a library staff member to access and delete recordings. this raises ethical, legal, and operational questions for library staff required to manage any patron data collected by voice assistants. in any case, procedural concerns are a reminder that library staff have an active role to take in ensuring patron privacy. legal issues library workers in this study identified three legal issues posed by voice assistants in the library. the first legal issue raised was the potential for violation of the family educational rights and privacy act (ferpa)—the federal law that protects the privacy of student education records—due to the collection of pii by voice assistants. library workers in many academic settings are required to maintain compliance with ferpa. one of the respondents was concerned that by using voice assistants in their services, libraries would be putting themselves in a position to potentially violate this law.
a second set of concerns focused on questions about the liability of the library (or individual library workers) if a patron's pii is misused by technology companies or the third-party vendors who have access to user data: "i have great concerns regarding the use of this technology in a library setting since it might expose the library to potential liability if, more likely when patron data is misused by the technology providers." related to this concern, another library worker asked, "who owns the info?" questions about rights and ownership of personal data by technology companies, itself a fraught and opaque legal area, require more ethical and legal probing as libraries become intermediaries to patron use of voice assistants. lastly, one library worker cited concerns about librarians' ability to uphold first amendment rights with voice assistants. "we take our mandated role to uphold first amendment rights and patron privacy very seriously. there are too many issues with the way these for-profit companies collect, store and potentially use information. we see no benefits of service gained that offset these concerns. we are also concerned about the way owners of these products use their wealth to leverage political influence." this comment identifies privacy as a necessary condition for facilitating free speech, contrasting this with a sketch of the political and economic motives underlying voice assistant development. the concerns raised by these library workers point to the complexity of managing patron data in the context of a variety of existing legal frameworks. professional responsibility three respondents explicitly placed privacy concerns in the context of their professional responsibility as library workers to "protect" patrons and patron privacy. a fourth respondent voiced a twin concern about "the library's inability to protect privacy and patron information" (emphasis added). beyond descriptions of protecting patrons, these library workers framed privacy as a professional value. comments such as, "we take our mandated role to uphold first amendment rights and patron privacy very seriously," emphasize privacy as a professional charge. these kinds of comments tacitly draw on lis professional core values and ethics statements to position responsible professional practice as the action of upholding privacy. as a result, professional identity is discursively constructed by these library workers as a function of valuing privacy. the following comment, particularly, draws an identity-based line between "us" (library professionals) and "them" (technology companies) that is based on divergent values surrounding privacy: "since one of the main concerns we (should) have as library professionals is patron privacy; 'teaming up' with technology providers who do not have that level of concern is problematic at best." the assertion that library core values may be in conflict with the technology providers that are designing voice assistants is very astute, and important for libraries to consider when weighing the decision to experiment with these (and other) emerging smart technologies. discussion: key considerations for library professionals our research suggests that library use of voice assistants poses many as-of-yet unresolved privacy issues for library staff and patrons alike.
though voice assistant use is still fairly nascent across public and academic libraries, our study confirms that these tools are already being adopted by some libraries. the adoption of these, and other, smart technologies, is likely to keep trending in library services across institution types, paralleling market trends for personal adoption of voice assistants. many library workers in our study expressed astute concerns about voice assistants, raising important questions about how patron data was collected, managed, and used across the data lifecycle of these technologies. this is a critical moment, then, for the library profession to take stock of questions of privacy surrounding voice assistants, and an opportunity to set a broader professional agenda for data-privacy that encompasses the complexities of smart technology use in library services. in this spirit, we have identified several main areas of concern that emerged from our study, posited as key considerations about voice assistants for library professionals to grapple with. information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 12 circulation procedures for libraries who are, or are considering, lending voice assistant-enabled technologies, clear lending rules are needed for patrons that set guidelines for disconnecting their personal amazon, apple, or google accounts before returning the device. likewise, it is important to develop procedures for library staff to anticipate instances when patrons forget to disconnect their personal accounts. library workers cannot, and should not, be responsible for disconnecting personal accounts as a protective measure for both staff and patrons, since doing so asks library workers to access and take responsibility for personal patron data, including pii. one suggestion might be to require devices to be restored to factory settings, which could be verified by a library staff member at time of device return. libraries might also consider including privacy best practices with these devices that outline known privacy risks and provide information about how to adjust settings to limit data sharing or delete records in personal accounts where applicable (e.g., amazon). third-party digital content platforms the integration of voice assistants in third-party digital content platforms licensed by libraries is becoming more common, pointing to the complexity of upholding patron data privacy throughout these layered and linked services. this issue speaks to the difficulties navigating overlapping privacy statements and terms of service agreements, which is not unique to voice assistants but does indicate the need for more data protections and consumer-oriented information policies. ala already does advocacy work on these issues and provides many helpful guidelines , such as the library privacy guidelines for vendors (http://www.ala.org/advocacy/privacy/guidelines/vendors). still, the data environment is very much characterized by the unequal power differential between technology companies and users. we are in dire need of more robust information policy frameworks that are predicated on transparency, strict parameters for data collection and use, corporate accountability, and user control and agency. a promising example of this is the general data protection regulation (gdpr) implemented in the european union in 2018. something similar is needed in the us to regulate corporate data-sharing practices and give users more control over their data. 
this would be beneficial across the board for the public, as well as to library patrons using their personal voice assistant devices to access library resources. education opportunities for expanding digital literacy library workers in our study reported a range of technology education and digital literacy programming initiatives in their libraries, though none that specifically addressed voice assistants. this suggests that library technology programming might not be targeting the kinds of specific privacy concerns posed by smart technologies like voice assistants. as smart technologies like voice assistants become more common for household/personal use, it would make sense to expand library programming initiatives to include informational sessions that incorporate data privacy considerations for smart technologies in addition to skills-driven sessions. additionally, some survey responses indicated that library workers may have some knowledge gaps or a lack of concern about voice assistant use. this might point to a need for expanded education, training, and professional development around data privacy issues and emerging technologies for library workers. there has already been a large push in the field to expand digital literacy, defined by ala as "the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills."29 however, this definition of digital literacy falls short of considering the role of assessing data collection, storage, and use as a core part of digital knowledge. expanding digital literacy training, for both staff and patrons, to include awareness of the data ecosystems and privacy concerns that undergird smart technologies is a must for responsive library services. surveilling patrons and staff voice assistants placed in public service areas, in the library stacks, and in public gathering areas within the library raise the ethical issue of recording patrons (and staff) who either do not wish to be recorded or do not even know they may be recorded. in the case of library staff, this poses a labor issue where staff may be asked to work in areas where devices may be listening to their interactions for the duration of their shift. for patrons, this could compromise privacy in reference transactions and in other information-seeking activities, as well as capture other personal interactions that take place in the library setting. it is critical that libraries be transparent about using voice assistant technologies, be upfront about the potential privacy harms of these technologies, and abide by "opt-in" rather than "opt-out" frameworks. library workers should consider treating voice assistant records in the same way they have historically treated circulation records, opting to either delete these records or not collect them (meaning, not use voice assistants) at all. unlike circulation records, however, library workers have far less control over the data captured by voice assistants. this data is stored in the cloud on privately owned servers that remain outside of library control and oversight. given the incredibly low bar for federal access to information under the usa patriot act, actively facilitating the collection of patron and staff interactions, particularly without informed consent, should give librarians pause.
opt to not adopt in light of the issues raised in this study, library workers need to seriously weigh whether the benefits of using voice assistants in libraries at this point in time outweigh the vast privacy concerns that we have outlined here. as it stands, these technologies are not currently filling a gap in library services that cannot be otherwise met by more traditional service models that carry fewer potential harms for our patron communities. importantly, not all patrons are equally vulnerable to harm or exploitation in these data environments. for instance, there is a wealth of research that demonstrates the multitude of ways that black, indigenous, people of color, lgbtq+, women, and low-income communities are subjected to higher levels of surveillance and data profiling that result in harassment, discrimination, economic penalties, and legal persecution.30 as the current national political landscape is aflame in protests against police violence and anti-black racism, it is important to identify surveillance technologies as policing technologies. libraries need to consider that these tools, as extensions of policing data networks, may directly endanger, particularly, black, latinx, and indigenous people who are already subjected to over-policing. in this sense, concerns about patron data privacy are high-stakes and are deeply linked to the professional core value of social responsibility.31 libraries should consider not using voice assistants until key data privacy concerns are addressed, more robust data protections are in place at a federal level, and the blanket authority for federal agencies and law enforcement to compel user data is revoked. this is not a technophobic stance. on the contrary, we are suggesting that library workers could serve an important role as privacy advocates, which includes critically evaluating the role of emerging technologies in their communities on behalf of the public interest. a key part of this must include the library profession taking responsibility for the use of surveillance technologies in their institutions, since these technologies are deeply implicated in the policing of disenfranchised communities by state and federal authorities. conclusion we view this study as a modest starting point for mapping some of the many privacy issues associated with voice assistant use in library services and programming and hope it points a way forward for future research. future research might address specific case studies of voice assistant use in libraries, data mapping of patron data through third-party library services, use and privacy issues across different institution types, patron digital literacies with voice assistants, and library policies for smart technologies more generally. plural and diverse vantage points are needed to understand the potential impacts of these technologies across different community types. such research is critical for developing best practices, guidelines, policies, and education opportunities for voice assistant use (and other smart technologies) that prioritize patron privacy and confidentiality. the use of voice assistants in libraries raises questions about the responsibility of libraries and librarians to actively engage patron data privacy concerns when considering integrating these technologies into services and programming.
indeed, we encourage library workers to consider informed non-adoption of these technologies as a socially responsible professional stance until the key issues we have outlined are addressed. while it is, of course, important for library workers to remain current and innovative in their services, it is also paramount that patron privacy (as a function of safety) stays at the forefront of library services. in other words, it is the responsibility of library workers to anticipate potential privacy issues associated with emerging technologies, rather than treating privacy as a secondary concern to technology adoption. there are tremendous opportunities for library workers to lead the data privacy charge—in collaboration with community stakeholders—in pursuit of privacy-centered library services that are accountable to community members, particularly those who are mostly likely to be harmed by these technologies. information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 15 appendix a: survey instrument 1. by selecting the “i agree” button below, i hereby certify: that i am 19 years old or older; that i have read and understand the above consent form; and that this action indicates my willingness to voluntarily take part in the study. a. i agree to participate in the research study described above. b. i do not agree to participate in the research study described above. 2. do you work in a library setting? a. yes b. no 3. what kind of library do you work at? a. public b. academic c. school d. other, please specify [fill in the blank] 4. what is the size of your library’s service population? a. 2,500 or less b. 2,500-9,999 c. 10,000-25,000 d. 25,000+ e. i’m not sure. 5. what state is your library located in? [fill in the blank] 6. does your library have amazon echo devices, google home devices, or apple siri devices available for use by patrons? a. yes b. no c. i’m not sure. 7. which of the following digital assistant devices does your library have available for patrons to use? a. amazon echo devices b. apple siri devices c. google home devices d. other products, please specify: [fill in the blank] 8. please provide some examples of how your library patrons use the library's digital assistant technologies. [short answer] 9. could you describe where these digital assistant technologies are located in the library? [short answer] information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 16 10. does your library use amazon echo devices, google home devices or apple’s siri devices in any of the following kinds of programming? (select all that apply) a. tech “petting zoos” b. trivia c. homework help d. technology classes e. makerspaces f. not listed, please specify: [fill in description] g. none of the above 11. for the programs you selected, briefly explain how the devices are integrated into programming. [short answer] 12. does your library circulate amazon echo, google home, and apple siri devices to the public for checkout? a. yes b. no c. i’m not sure. 13. which devices do you circulate? a. amazon echo devices b. apple siri devices c. google home devices d. other products, please specify [fill in the blank] 14. do you provide any privacy information and/or best practice information with the device at checkout? a. yes b. no c. i’m not sure 15. if so, briefly explain what kind of privacy or best practices information you include. examples of content covered in this information would be helpful. [short answer] 16. 
do you have any privacy concerns about the use of amazon echo, google home, or apple siri devices in the library? a. yes b. no c. i’m not sure 17. could you describe these privacy concerns? [short answer] 18. does your library offer any sort of technology courses to the public? a. yes b. no c. i’m not sure information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 17 19. does your library teach data privacy or data literacy as part of the library's programming? a. yes b. no c. i’m not sure 20. does your library offer any data literacy education in programming that specifically addresses digital assistants? a. yes b. no c. i’m not sure 21. what kinds of data literacy information is provided in these courses taught at your library? please provide some examples: [short answer] 22. does your library use any of the following services? select all that apply: a. overdrive/libby b. hoopla c. none of the above 23. are you aware that both overdrive and hoopla have amazon echo application integration (called "skills")? a. yes b. no c. i’m not sure 24. does your library inform patrons about amazon echo skills on overdrive and/or hoopla? a. yes b. no c. i’m not sure 25. are you aware that amazon's privacy policies differ from those of overdrive and hoopla? a. yes b. no c. i’m not sure 26. does your library provide any information to patrons about overdrive and hoopla's privacy policies? a. yes b. no c. i’m not sure 27. does your library provide any information to patrons about amazon's privacy policies? a. yes b. no c. i’m not sure 28. please provide a brief description of the information that you are providing to patrons on this subject, including where this information is located for patron access. [short answer] information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 18 29. are you aware of the guidelines that the american library association (ala) provides on privacy as it pertains to third party electronic vendors? a. yes b. no 30. does your library use or refer to these privacy guidelines in any informational materials for patrons? a. yes b. no c. i’m not sure 31. please describe these informational materials, including how and where they are distributed to patrons: [short answer] information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 19 endnotes 1 benjamin herald, “teacher’s aide or surveillance nightmare? alexa hits the classroom,” digital education, education week, june 26, 2018, http://blogs.edweek.org/edweek/digitaleducation/2018/06/alexa_in_the_classroom_teacher s_surveillance.html?cmp=soc-shr-fb. 2 carrie smith, “your library needs to speak to you,” american libraries (june 3, 2019), https://americanlibrariesmagazine.org/2019/06/03/voice-assistants-your-library-needs-tospeak-to-you/. 3 nicole hennig, siri, alexa, and other digital assistants: the librarian’s quick guide (santa barbara, ca: libraries unlimited, 2018) 33–8. 4 adapted from: brenda laurel, “interface agents: metaphors with character,” human values and the design of computer technology (1997): 207–19, cambridge university press. 5 emily clark, “alexa, are you listening? how people use voice assistants,” https://clutch.co/appdevelopers/resources/alexa-listening-how-people-use-voice-assistants. 6 clark, “alexa, are you listening? how people use voice assistants.” 7 clark, “alexa, are you listening? 
how people use voice assistants." 8 "voice control," american library association, http://www.ala.org/tools/future/trends/voicecontrol. 9 shannon liao, "google home will play music and sound effects when you read disney storybooks," https://www.theverge.com/2018/10/29/18037466/google-home-disney-music-moana-incredibles-coco-storytime; hennig, siri, alexa, and other digital assistants, 35; susan allen and avneet sarang, "serving patrons using voice assistants at worthington," online searcher 42, no. 6 (november-december 2018): 49–52. 10 smith, "your library needs to speak to you." 11 smith, "your library needs to speak to you." 12 allen and sarang, "serving patrons using voice assistants at worthington." 13 king county library system, "voice assistants, connecting you to your library," https://kcls.org/voice/. 14 "core values of librarianship," american library association, http://www.ala.org/advocacy/intfreedom/corevalues. 15 miriam e. sweeney, "digital assistants," in uncertain archives: critical keywords for big data, ed. nanna bonde thylstrup, daniela agostinho, annie ring, catherine d'ignazio, and kristin veel (baltimore, md: mit press, 2021), 151–60. 16 anthony cuthbertson, "amazon admits employees listen to audio from echo devices," https://www.independent.co.uk/life-style/gadgets-and-tech/news/amazon-alexa-echo-listening-spy-security-a8865056.html. 17 matt day et al., "amazon workers are listening to what you tell alexa," https://www.bloomberg.com/news/articles/2019-04-10/is-anyone-listening-to-you-on-alexa-a-global-team-reviews-audio. 18 daniel j. dubois et al., "when speakers are all ears: understanding when smart speakers mistakenly record conversations," mon(iot)r, february 14, 2020, https://moniotrlab.ccis.neu.edu/smart-speakers-study/. 19 karen hao, "amazon is the invisible backbone of ice's immigration crackdown," mit technology review, october 16, 2019, https://www.technologyreview.com/s/612335/amazon-is-the-invisible-backbone-behind-ices-immigration-crackdown/. 20 zack whittaker, "echo is listening, but amazon's not talking," zdnet, january 16, 2018, https://www.zdnet.com/article/amazon-the-least-transparent-tech-company/.
21 american library association, "library privacy guidelines for vendors," http://www.ala.org/advocacy/privacy/guidelines/vendors. 22 this research protocol (19-08-2671) was approved in october 2019 by the university of alabama's institutional review board (irb). 23 note, participants were not required to answer every question, so some questions have fewer than 86 total responses due to participants electing to not respond. also, even though we were targeting public and academic libraries, we did receive a response from someone identifying their institution as a school library and decided to include it in the results. 24 we did not receive responses from libraries in arizona, arkansas, connecticut, delaware, pennsylvania, vermont, virginia, or wyoming. 25 "technology petting zoos" are areas where patrons can experiment or try out technologies and gadgets. 26 hoopla, "alexa, meet hoopla," july 16, 2018, http://hub.hoopladigital.com/whats-new/2018/7/alexa-meet-hoopla. 27 we purposely couched questions about "data literacy" and "data privacy" broadly in the survey to allow for a range of interpretations by respondents in an attempt to capture the range of information that might be taught under this umbrella. 28 daniel j. dubois et al., "when speakers are all ears: understanding when smart speakers mistakenly record conversations." 29 american library association, "digital literacy," https://literacy.ala.org/digital-literacy/. 30 examples of critical research in this area include: toby beauchamp, going stealth: transgender politics and u.s. surveillance practices (durham, london: duke university press, 2019); virginia eubanks, automating inequality: how high-tech tools profile, police, and punish the poor (st. martin's press, 2018); safiya u. noble, algorithms of oppression: how search engines reinforce racism (new york: nyu press, 2018). 31 "core values of librarianship," american library association.
measuring library broadband networks to address knowledge gaps and data caps chris ritzo, colin rhinesmith, and jie jiang information technology and libraries | september 2022 https://doi.org/10.6017/ital.v41i3.13775 chris ritzo, mslis (critzo@afutures.xyz) is consultant/owner, anemophlious futures llc. colin rhinesmith, phd (crhinesmith@metro.org) is director, digital equity research center, metropolitan new york library council. jie jiang, mslis (jie.jiang@simmons.edu) is a doctoral student at the simmons university school of library and information science. © 2022. abstract in this paper, we present findings from a three-year research project funded by the us institute of museum and library services that examined how advanced broadband measurement capabilities can support the infrastructure and services needed to respond to the digital demands of public library users across the us. previous studies have identified the ongoing broadband challenges of public libraries while also highlighting the increasing digital expectations of their patrons. however, few large-scale research efforts have collected automated, longitudinal measurement data on library broadband speeds and quality of service at a local, granular level inside public libraries over time, including when buildings are closed. this research seeks to address this gap in the literature through the following research question: how can public libraries utilize broadband measurement tools to develop a better understanding of the broadband speeds and quality of service that public libraries receive? in response, quantitative measurement data were gathered from an open-source broadband measurement system that was both developed for the research and deployed at 30 public libraries across the us. findings from our analysis of the data revealed that ookla measurements over time can confirm when the library's internet connection matches expected service levels and when it does not. when measurements are not consistent with expected service levels, libraries can observe the differences and correlate this with additional local information about the causes. ongoing measurements conducted by the library enable local control and monitoring of this vital service and support critique and interrogation of the differences between internet measurement platforms. in addition, we learned that speed tests are useful for examining these trends but are only a small part of assessing an internet connection and how well it can be used for specific purposes.
these findings have implications for state library agencies and federal policymakers interested in having access to data on observed versus advertised speeds and quality of service of public library broadband connections nationwide. introduction the covid-19 pandemic exposed the severity of the digital divide in the united states. during this time, lack of access to computers and the internet has been highlighted among individuals and families with limited monthly incomes in tribal, rural, and urban communities where broadband is neither available nor affordable. decades of research have shown that this digital divide is further deepened along racial and ethnic lines. wealthier, white, and more educated individuals consistently have higher rates of home computer and broadband ownership. many without this societal privilege rely on their local public libraries and other community spaces to fill these gaps. the pandemic has also underscored just how significant public libraries have been in addressing people's need for computers and high-speed internet. last year, for example, mainstream news organizations shared several stories about children, parents, and teachers all relying on wireless internet access while seated outside in school and public library parking lots, which happens both during and outside library hours.1 much less attention has been paid, however, to the broadband infrastructure and technical support that public schools and libraries need to meet the digital demands of their communities. in 2018, our research team, composed of researchers and practitioners at the simmons university school of library and information science, measurement lab (m-lab), and internet2, received a grant (award #71-18-0110-18) from the us institute of museum and library services (imls). the purpose was to investigate how advanced broadband measurement capabilities can inform the capacity of the nation's public libraries to support the online software applications and social and technical infrastructure needed to promote the national digital platform.2 in this paper, we present findings from this study, which seeks to address a gap in understanding, particularly among researchers, practitioners, and policymakers, about the speeds and quality of service of public library internet connections across the united states. through our research we learned that there are significant gaps in knowledge about broadband speeds and quality of service measures that are impacting the ability of public libraries to support their communities' digital needs. in this context, we hope the quantitative data and analysis presented in this paper contribute to the scholarship on broadband measurement in libraries, as well as to expanding awareness and understanding of broadband data. more concretely, we hope this paper helps to raise awareness of the urgent need for shared knowledge about broadband data and infrastructure that supports digital services in public libraries. we begin the paper with a brief review of key studies that have highlighted the important role of public libraries in promoting digital equity, as well as studies that have discussed the importance of measuring broadband connectivity in public libraries.
we concentrate particularly on those studies that have sought to elucidate the opportunities and challenges of both connecting public libraries with high-speed internet connections and educating public librarians, other researchers, and policymakers about what is meant by broadband infrastructure and services. we then present our findings from the quantitative analysis of our broadband measurement data, which highlights the ways in which ongoing, locally collected measurements can enhance libraries' understanding of their internet service and help inform interactions with patrons and it service providers. the paper concludes with a discussion of the contribution of our research to the scholarship, and we briefly discuss the implications for state and federal policymakers interested in better understanding the role that library broadband measurement data can play in promoting healthy digital equity ecosystems. literature review digital inclusion and broadband measurement in public libraries public libraries in the united states have been committed to bridging the digital gap by providing free public access to computers, internet, and digital literacy skills for decades.3 for example, in their study of how public libraries respond to inquiries about the digital divide through participatory forms, schenck-hamlin, han, and schenck-hamlin found that public libraries have been recognized as the "first and last" resort for internet access particularly "for those unable to afford high-speed connections at home."4 further, bertot, real, and jaeger affirmed this idea with their digital inclusion survey data, collected over several years, by stating, "america's public libraries are an important force for bridging this (digital) divide, with 62.1% of these outlets reporting that they are the only free providers of internet access inclusive of computers in their communities."5 in addition to providing public access to computers and the internet, us public libraries have placed an emphasis on promoting the general public's awareness and skills around broadband through delivering free digital literacy training sessions, as well as hosting civic discussions around the topic of broadband connections with their patrons.6 to further illustrate how public libraries narrow the digital divide, deguzman, jain, and loureiro explained that telemedicine has become a new norm in today's medical visits, which quickly became a reality during the covid-19 pandemic. in their article, the authors show how public libraries can play a critical role in bridging this "digital health divide" that exists in many communities.7 as a bottom-up means to promote digital inclusion in the us, the role that public libraries have played to promote digital inclusion and equity cannot be overlooked. however, as jaeger et al. explained in their study on how public libraries address the digital divide and digital inclusion, "one curious constant across policy approaches to digital divides in many, though not all, nations has been the failure to involve librarians in the formulation of definitions, policies, or other aspects of the policy-making process."8 it is within this space that public librarians and the technological staff who support them can play an important role in co-designing the tools, skills, and knowledge needed to better understand broadband in public libraries.
broadband measurement in public libraries many public libraries, particularly small, rural, and tribal libraries, face ongoing challenges in gaining accurate information about their broadband speeds and quality of service. this lack of information can limit their capacity to provide a wide range of applications and services to the community. as bertot, real, and jaeger concluded, one of the big challenges that public libraries have been dealing with is that the speed of public library internet connections “can vary significantly according to local population density.”9 in reaction to this situation, public librarians have shown great interest and need to acquire knowledge about their libraries’ current broadband performance.10 digital inclusion scholars have proposed topics that future research on public libraries and broadband measurement should explore.11 these topics include how to better inform public librarians in order to assist them in planning, as well as how to deliver sufficient and quality broadband connections to the community. other topics include looking at how to help public libraries justify the need for more workstations and bandwidth using data coming from “empirical measures, especially longitudinal measures.”12 these and other questions remain largely unanswered in the academic literature. the measuring library broadband networks (mlbn) project and research design research questions and significance of study our research sought to address this gap in the scholarship on broadband measurement in public libraries through the following research question: how can public libraries utilize broadband measurement tools and training materials to develop a better understanding of the broadband speeds and quality of service that public libraries receive? in response, our research team gathered quantitative data from an open-source broadband measurement system that was both developed for this study and deployed at 30 public libraries across the us. our research is significant because answers to these questions can help strengthen public libraries as essential anchor institutions and partners in providing data to address the digital needs of their communities. the findings from our study can also assist public libraries in responding to the challenges of developing a more integrated, equitable, and dynamic set of infrastructures for delivering public computing access and digital library services. information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 4 project overview and research design the measuring library broadband networks (mlbn) project (https://slis.simmons.edu/blogs/mlbn/) was originally conceptualized to be completed in four phases during the two-year grant period. 
during the first phase, we organized a “participatory design” workshop with our 10 first-year public libraries who agreed to serve as part of our research panel on the project.13 findings from our analysis of the qualitative data gathered during the workshop revealed that public libraries wanted access to broadband measurement data in order to: (1) better communicate with their patrons about their library’s broadband connectivity, (2) respond to their communities’ digital needs, and (3) justify the importance of robust internet connectivity to their funders.14 our analysis revealed early on in the project that knowledge gaps existed around the performance of public library broadband networks, patron and staff experiences using the library’s internet connection, and the meaning and value of measurements such as speed tests. during the second phase of the project, we applied what we learned from insights gained during the workshop to our site visits with the 10 participating first-year public libraries. during our fieldwork at the libraries, we sought to interview four different groups of people: (1) library staff, (2) library administrators, (3) it staff, and (4) it administrators. the purpose was to gain multiple perspectives on the same sets of questions, which would provide additional qualitative data to help answer our research questions. in a few of the libraries, the library administrator was also the primary it professional on site. in other words, depending on the size of the organization, librarians often wore several hats, which is certainly not uncommon for small, rural, and tribal libraries. in addition to conducting interviews with these four groups, we also held focus groups with patrons on site at each of the libraries. during this process, we were able to learn more about the context, character, and communities of our partner libraries and gain a better sense of what it is like to work at and/or be a patron of each library, as well as why public libraries might need an open-source broadband measurement system. the other main goal during this phase was to learn more about and document the process of installing our broadband measurement devices. through this process, we gained additional insights into the nuances of the network configurations at each location and refined our device configurations and setup instructions in response. ultimately, we sought to identify potential barriers to the measurement devices working properly in the networks of our second-year libraries, when we would not have the luxury of being there in person. at the conclusion of the research program in march 2021, we asked participating library and/or it staff to complete a final evaluation survey. twenty libraries responded to a range of questions, two of which related to their understanding of the library’s internet connectivity and network management practices: “is there an overall download and/or upload cap on the connection to the entire library building?” and “is there a cap on individual devices using the internet at the library?” eight libraries responded to one or both of the above questions; their responses are in table a.2 in appendix b. training manual during phases 2 and 3, we worked with carson block, a well-known library technology expert and consultant who helped us to develop our mlbn training manual (https://dataverse.harvard.edu/dataset.xhtml?persistentid=doi:10.7910/dvn/8xxxzq). 
the development of the manual was led by chris ritzo, carson block, and colin rhinesmith to assist our second-year participating public libraries in being able to install the measurement devices on their own. the manual provides a comprehensive overview of our mlbn project, including what we learned in the first year of the project about why public libraries would want to measure broadband at their libraries. section 2 focuses on the setup instructions that public libraries would need to install the devices to measure both wired and wireless internet connections. we also provided details on the hardware used, device management, and data collection, as well as the data visualization platform developed for the project, which allows public libraries to access and use the broadband data gathered from the devices installed on their network. the manual includes complete information about how the measurement platform used in this study can be set up independently for future use by any library, institution, or individual. we knew this manual would be essential for scaling to the 60 total public libraries for our project and for any additional library after the end of our grant. final cohort of participating libraries libraries participating in this research were recruited primarily through the suggestion of the project's advisory board, many of whom represented state library agencies, regional research and education networks, or other intermediary organizations working with or supporting public libraries. though the covid-19 pandemic limited our ability to scale up to our goal of 60 libraries, ultimately 30 libraries were recruited to participate in the research. appendix a lists the final cohort of participating libraries, the specific branch where measurements were conducted, the city, state, and the library's imls code. broadband measurement data collection quantitative measurements of the network connections at participating libraries were collected using the murakami software developed by m-lab, running from a dedicated, on-premise measurement computer/device.15 this software provides tests from two large platforms for crowdsourced speed tests: m-lab's network diagnostic tool versions 5 and 7 (ndt) and speedtest-cli, an open source client using the ookla platform.16 ndt is a network performance test of a connection's bulk transport, conforming to the internet engineering task force's (ietf) rfc 3148.17 m-lab provides two ndt testing protocols (ndt5 and ndt7), each measuring different aspects of the transmission control protocol (tcp).18 all versions of ndt measure upload and download speeds and latency using a single tcp stream between the computer running the test and the nearest m-lab server. ookla is a commercial company that created the network performance test speedtest.net.19 ookla's test also measures upload and download speeds, as well as latency, but provides the option to measure using a single tcp stream or using multiple streams.
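to make the two kinds of tests more concrete, the following is a minimal sketch of how a single ookla-style measurement could be scripted with the open-source speedtest python module (the library behind speedtest-cli). this is an illustration only, not the mlbn implementation: the field names come from that module, and the single- versus multi-stream distinction is only approximated here with its threads parameter.

```python
# minimal sketch: one ookla-style measurement using the open-source
# "speedtest" python module (pip install speedtest-cli). illustrative only;
# the mlbn devices ran m-lab's murakami wrapper around speedtest-cli and
# the ndt clients rather than this exact script.
import json
import speedtest

def run_ookla_style_test(single_stream=False):
    st = speedtest.Speedtest()
    st.get_best_server()                    # choose the lowest-latency server
    threads = 1 if single_stream else None  # None = module default (multiple streams)
    st.download(threads=threads)            # download throughput, bits/sec
    st.upload(threads=threads)              # upload throughput, bits/sec
    r = st.results.dict()                   # timestamp, ping, server info, etc.
    return {
        "timestamp": r["timestamp"],
        "download_mbps": r["download"] / 1e6,
        "upload_mbps": r["upload"] / 1e6,
        "latency_ms": r["ping"],
        "server": r["server"]["host"],
    }

if __name__ == "__main__":
    print(json.dumps(run_ookla_style_test(), indent=2))
```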
the primary differences between these two platforms’ tests are the use of single or multiple tcp streams and the location of testing servers.20 at each location (with a few exceptions), two devices were configured using network details supplied by library or it staff and shipped to the library with setup instructions (in some cases, depending on network complexity, only one device was installed). one device was connected to the switch or router where internet service connected the location (egress). the other device was connected to an available switch port on the same virtual local area network (vlan) as wifi access points. the intention was to measure the capacity of the entire location using the egress device, and the capacity of a single wifi access point (ap) to serve multiple patrons using the wifi ap device. once connected, each device ran tests approximately six randomized times within each 24-hour period. information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 6 each murakami device ran four tests: ndt 5, ndt 7, ookla single-stream, and ookla multi-stream. each test result was exported to an archive in google cloud. this data was imported into bigquery and analyzed in datastudio.21 data from the 2019 public libraries survey (pls) from imls was also included to describe each library’s locale, service population, and number of public computers, annual computer use sessions, and annual wifi sessions reported.22 public data provided by both ookla and m-lab for the counties in which each mlbn library was located were loaded to compare each platform’s reported aggregate measurements for the surrounding area to the measurements conducted at the libraries.23 aggregate public data for the surrounding counties in our analyses excluded all measurements from the libraries themselves. along with the data itself, specific details on our data import, cleaning, and analysis are provided in our publicly available mlbn dataverse (https://dataverse.harvard.edu/dataverse/mlbn), hosted by harvard university. limitations the covid-19 pandemic created challenges for the research team in scaling up to 60 libraries, as was planned at the beginning of the project. therefore, we had to limit our outreach and engagement during 2020. when we asked the final 30 public libraries that were able to participate in the research whether they had closed their doors during the pandemic, all of them said yes. however, all the libraries reported that they continued to provide wireless internet access, even though their buildings were closed to the public during this time. although we were unable to scale up to 60 public libraries, we were still absolutely thrilled with the response we received from the libraries that were able to participate. the ndt 7 tests in our program uncovered a now-resolved bug where measurements were limited by the performance of our selected premise devices, which lack proper support for encryption.24 this is observable in some of our data as a large jump in measured speeds from ndt 7 after november 1, 2020. 
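as a rough illustration of the randomized scheduling described above (approximately six tests in each 24-hour period, with results exported for later analysis), the sketch below picks random times within each day, runs a measurement at each, and appends the result as newline-delimited json, a format that bigquery can load directly. the output file name is hypothetical and the measurement callable is assumed to be something like run_ookla_style_test() from the previous sketch; the production devices used the murakami runner and exported archives to google cloud storage instead.

```python
# hedged sketch of randomized daily test scheduling; not the murakami code.
import json
import random
import time
from datetime import datetime, timedelta, timezone

TESTS_PER_DAY = 6                      # roughly six randomized runs per day
RESULTS_FILE = "measurements.jsonl"    # hypothetical local output path

def schedule_for_day(day_start, tests_per_day=TESTS_PER_DAY):
    """return sorted, random run times within one 24-hour window."""
    offsets = sorted(random.uniform(0, 24 * 3600) for _ in range(tests_per_day))
    return [day_start + timedelta(seconds=s) for s in offsets]

def run_forever(measure):
    """measure is a callable returning one result dict,
    e.g., run_ookla_style_test from the previous sketch."""
    while True:
        day_start = datetime.now(timezone.utc)
        for run_at in schedule_for_day(day_start):
            wait = (run_at - datetime.now(timezone.utc)).total_seconds()
            if wait > 0:
                time.sleep(wait)
            record = measure()
            with open(RESULTS_FILE, "a") as f:   # newline-delimited json
                f.write(json.dumps(record) + "\n")
```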
the jump in measured throughput from ndt 7 tests after november 1, 2020 reflected when encrypted ndt 7 tests were disabled and began running unencrypted.25 findings individual libraries’ data data collected at each library is provided in an interactive mlbn datastudio report, along with summary information about the library from the 2019 public libraries survey.26 aggregated download, upload, and latency metrics from measurements conducted at each library can be viewed on page 2, individual library data (see figure 1 for an example).27 https://dataverse.harvard.edu/dataverse/mlbn information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 7 figure 1. individual library data page for andover, massachusetts.28 a unique feature of the report is a map of server locations to which tests were conducted. this feature demonstrates the different topologies of the ookla and m-lab platforms and enables analysis of measurements to specific servers. if a library’s internet service provider (isp) hosts an ookla server, it can be selected to display only measurements of the isp’s network, as shown in figure 2. the federal communications commission (fcc) distinguishes this topology as on-net, when the server and client are both within the same network, in contrast to off-net, where the server and client are in different networks.29 figure 2. individual library data page for clarkston, michigan, merit networks’ nearest ookla server selected.30 information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 8 by selecting all servers, we can observe the wide geographic range of ookla servers used. conversely, when we select one of the ndt tests, we can see that m-lab servers are only hosted in large metropolitan areas, as seen in figure 3. this demonstrates key differences in the server locations of these two measurement platforms and how the data from each relates to the fcc’s national broadband standard.31 clarkston, michigan—all ookla servers selected32 clarkston, michigan—all m-lab servers selected33 figure 3. test server locations for clarkston, michigan—all ookla and m-lab servers selected. additional aggregate speeds are provided on the individual library data page to communicate general measurement trends over time: maximum upload and download speed by month, day, hour, and weekday (see figures 4–7). this allows a library to confirm advertised speeds, as seen in figure 4 where the connection at clarkston, michigan, was measured consistently at just under 100 mbps symmetric download and upload. we also observe where measurements are not always consistent, as seen below in figures 5 and 7. in figure 5 we observe a dip in the upload median for westchester county, new york, in late june 2020, and a drop in upload median in late october 2020. with additional information, a library could correlate these observations with network outages, service changes, or network management changes. for example, the change in october 2020 could have been a network management change or service change to 200 mbps symmetric. in figure 7 we observe a trend that many librarians will recognize: a slight dip in median speeds over the peak hours of use. information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 9 figure 4. max speeds by month—clarkston, michigan.34 figure 5. 
daily aggregate speeds—westchester county, new york.35 information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 10 figure 6. weekday aggregate speeds—pasco county, florida.36 figure 7. hourly aggregate speeds—estherville, iowa.37 with additional local knowledge about network use, conditions, and events, library and it staff can use ongoing measurements to confirm and explain service changes or uncover issues that are not previously known. many mlbn libraries sought ongoing measurements of internet service to confirm service delivery levels, and some shared their expected service speeds in our final program evaluation survey. the list of libraries that shared their expected service levels are listed at the end of this paper. using these reported speeds as an example, we can observe where the overall measured speeds were consistent with the service levels and where they were not, using the ookla multi-stream measurements. bennington (vt) free library reported a 100 mbps symmetric connection as their expected service level, and the monthly maximum speeds range between 93 and 98 mbps.38 similar results were seen in live oak, georgia; monroe county, michigan; and sheridan, arkansas.39 in other cases, the reported service levels did not match the measurements. in pasco county, florida, measurements indicate a 50 mbps symmetric connection where the reported service level was 100 mbps download and 25 mbps upload.40 and in ventura county, california, our information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 11 measurements confirm an ~300 mbps symmetric connection but the reported service level was 1 gbps symmetric. finally, in several cases measurements may confirm anomalies or changes in the library’s internet service. in these examples we do not have local knowledge of changes in service or events that might explain anomalies observed in the data, but we can nonetheless observe that a change happened and make an inference about the causes. some examples include: • graham county, arizona—possible service delivery change in may 2020 from ~100/10 (download/upload mbps) to ~300/3041 • traverse city, michigan—possible service delivery change in january 2021 from ~80/5 to ~300/2042 • waltham, massachusetts—change to symmetric download and upload in november 2020 from ~50/25 to ~50/5043 • truro, massachusetts—observed changes in symmetry of upload and download measurements in june 2020 and march 2021 are perhaps indicators of testing changes in network management to adapt to changing needs44 • westchester county, new york—observed dip in some upload measurements in late june 2020 at specific times of day for unknown reason45 comparing average monthly maximum speeds the final two pages (5 and 6) of our data studio report display the maximum overall speeds and the average monthly maximum speeds measured for each library, filterable by imls code, access media, and type of isp.46 figure 8 shows a report for the average maximum speeds per test at libraries connected with fiber. figure 8. average maximum speeds per test measured at mlbn libraries connected with fiber. 
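for readers who want to reproduce these aggregations outside of bigquery and data studio, the following is a small pandas sketch of the same logic: maximum measured download and upload per library, test type, and month; the average of those monthly maxima; and median speeds by hour of day, which is useful for spotting the peak-hour dips noted above. the csv file and column names are assumptions for illustration, not the mlbn schema.

```python
# hedged sketch of the aggregations described above, assuming per-test results
# exported to a csv with (hypothetical) columns: library, test_name, timestamp,
# download_mbps, upload_mbps, latency_ms.
import pandas as pd

df = pd.read_csv("mlbn_results.csv", parse_dates=["timestamp"])
df["month"] = df["timestamp"].dt.to_period("M")
df["hour"] = df["timestamp"].dt.hour

# maximum measured speeds per library, test type, and month
monthly_max = (
    df.groupby(["library", "test_name", "month"])[["download_mbps", "upload_mbps"]]
      .max()
      .reset_index()
)

# average of those monthly maxima per library and test type
avg_monthly_max = (
    monthly_max.groupby(["library", "test_name"])[["download_mbps", "upload_mbps"]]
               .mean()
               .reset_index()
)

# median speeds by hour of day, e.g., to spot dips during peak use
hourly_median = (
    df.groupby(["library", "test_name", "hour"])[["download_mbps", "upload_mbps"]]
      .median()
      .reset_index()
)
```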
information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 12 comparison of measurements and related data to support increased understanding of network measurement within the public library community, we also compared measurements from the public libraries that participated in our study to the public data of crowdsourced measurements from the two large scale internet measurement platforms used in our research measurements, ookla and m-lab. we can observe the differences or similarities in measurements between the tests conducted from the libraries’ premise devices and the publicly released data in aggregate for the surrounding county. the weighted average speeds and latency are provided by quarter, since ookla’s public data limits more granularity. figure 9. comparing individual library data to public datasets—twin falls, idaho, 2020 q4.47 on page 4 of the data studio report we can observe whether the measured speeds in mlbn libraries were lower or higher than measurements from the surrounding county (see figure 9), along with the percentage difference between the two sources (see figure 10).48 while these differences are interesting to observe, and in some cases seem quite pronounced, this is not a finding that explains whether libraries are getting better or worse speeds than their communities. this is a coarse comparison that we might think of as a kind of litmus test for further inquiry, rather than findings that tell a definitive story. ookla public data aggregates all measurements from all isps together, while measurements from mlbn libraries are from one. a subscription to ookla speedtest intelligence might enable more direct comparisons on a per isp basis. information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 13 figure 10. mlbn data studio report, page 4—was public data or library data higher or lower? discussion our research builds on the digital inclusion survey that used a version of the ookla test in a supplemental speed test study in which, “libraries were asked to run the speed test when the library was closed, during regular hours of operation, and when usage was light, normal, or heavy, by the librarians’ estimation.”49 the software created by mlbn to collect ongoing, randomized measurements extends the idea of the digital inclusion survey’s supplemental speed test, making it a source of monitoring data that can be correlated with each location’s connection plans, service tiers, costs, and other metadata. this approach aligns with the methods used by the fcc by using a dedicated, on-premise device.50 the mlbn system also goes further, providing a framework for any open-source measurement to be added as an available test. since the conclusion of our research, several new tests have either been added or are being considered. 51 as new measurement tools and analyses are developed by the research community, the mlbn system can incorporate them and bridge network science researchers’ understandings to anchor institutions and the general public. while speed tests have been used in this research and its predecessors, we find there is a gap in public understanding about what these tests can tell us. one important outcome of our study is that understanding the experience of using the internet, measuring it, and regulating it all need additional measurements and approaches that go beyond speed tests alone. 
speed tests and the platforms that support them are very different. internet service plans may focus on upload and download speeds, as does telecommunications regulation at the fcc, but these tests offer only simple and incomplete assessments. the ookla and m-lab platforms provide two different controlled experiments designed to measure internet protocols and performance for different segments of the internet's topology. they both use data generated specifically for the measurement itself. but these tests do not measure our experiences using the internet in general. for that we need network science researchers to develop new measurement methods and analyses. we advocate for even more nuance and additional metrics in the measurement and understanding of internet service beyond speed. this perspective aligns with the researcher and network science community who are designing new measurements to account for user experience, content delivery, and latency issues, all of which are often incorrectly assumed to be measured by speed tests.52

we took the approach of using multiple tests and platforms to provide complementary measurements of different aspects of delivered internet service. if we need to confirm advertised service levels, we can look at the ookla test data. in this analysis, we used the average monthly maximum speeds as measured by the ookla multi-stream test since it was used in the digital inclusion survey and most closely aligns with isps' terms of service.53 m-lab's ndt, on the other hand, provides a diagnostic measurement of how well the underlying tcp protocol is performing over the measured path. that measurement path for m-lab tests always traverses the boundaries between networks. if we want to assess our isp's connectivity to the internet beyond the last-mile network it maintains, we can examine the ndt data. however different the methodologies and measurements of ookla's speed test and m-lab's ndt test are, we agree with the internet measurement community that measurements of throughput or speed should not be the only focus of assessing a connection's quality, and a broader public understanding that includes other measurements and nuances needs to be cultivated.54 researchers' understanding of the usefulness of both ookla and ndt measurements is not static, as recent analyses of their public datasets have shown.55 the research community is also exploring new metrics derived from various data sources that may eventually provide analyses that speak to the user experience of using the internet, as well as other technical factors that can influence performance such as latency, bufferbloat, and responsiveness.56

what does this all mean for public libraries?

building on the speed test used in the digital inclusion survey, the mlbn measurement system enables communities to collect ongoing measurements using a dedicated premise device, leveraging available open-source measurement tools, instead of running periodic or occasional tests. using this longitudinal data, libraries can confirm expected service levels using ookla test results, uncover where there is a mismatch in understanding of service levels or network management practice, or compare a library's measured service to the surrounding community using public data sources. additional measurements like m-lab's ndt can assess a library's connectivity beyond the isp's network.
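the comparison with the surrounding community mentioned above can also be sketched. this is an illustration rather than the published report's code: the file names, the column names (quarter, download_mbps, county_download_mbps), and the use of a quarterly median as the library-side summary are assumptions made for the example.

import pandas as pd

library = pd.read_csv("library_ookla_results.csv", parse_dates=["timestamp"])
county = pd.read_csv("county_public_aggregates.csv")  # assumed: one row per quarter, e.g., "2020Q4"

# quarterly median of the library's own download measurements
lib_quarterly = (
    library.assign(quarter=library["timestamp"].dt.to_period("Q").astype(str))
           .groupby("quarter")["download_mbps"]
           .median()
           .rename("library_download_mbps")
           .to_frame()
)

merged = county.merge(lib_quarterly, left_on="quarter", right_index=True)
merged["pct_difference"] = (
    100 * (merged["library_download_mbps"] - merged["county_download_mbps"])
    / merged["county_download_mbps"]
)
print(merged[["quarter", "library_download_mbps",
              "county_download_mbps", "pct_difference"]])

as noted earlier, the resulting percentage difference is best read as a prompt for further inquiry rather than as evidence that a library's service is better or worse than its community's.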
the resulting measurements are useful for interrogating the differences in platforms, their tests, and regulatory or funding benchmarks. but while speed tests can provide useful metrics for understanding general trends and anomalies, an appropriate understanding of what they do and do not measure is also needed. speed tests that demonstrate advertised levels do not necessarily mean that users of that network will not experience slowness as content is delivered to their computers over the same connection. as the federal government prepares to distribute infrastructure dollars to states to improve internet access and service quality, libraries and other public institutions in those states will need specific data and an understanding of its nuances and differences.

conclusion

in this paper, we sought to promote greater understanding about the speeds and quality of service of public library internet connections, an understudied area within library and information science, as well as among broadband policymakers. library staff and administrators need information to understand and communicate about a library's network capacities, management practices, and diagnostic or monitoring information. the availability of measurements from different sources can help build shared understanding about a library's internet connectivity between library staff and it or network administration personnel. and while speed tests are admittedly limited in what they can tell us about internet capacity, library staff who have access to these types of measurements, as well as other information provided by a library's it staff, would be better equipped to engage with patrons around questions of internet stability and capacity in support of the library's mission.

from our observations of the data collected, the mlbn measurement system can be used to enhance understanding of the library's internet service and network management. for the subscribed service levels identified at mlbn libraries, ookla measurements show whether the library's measured connection speeds matched the expected service levels and when they did not. when measurements are not consistent with expected service levels, libraries can observe the differences and correlate them with additional local information. ongoing measurements conducted by the library enable local control and monitoring of this vital service and support critique and interrogation of the differences between internet measurement platforms, their topologies, tests, and data, from the perspective of the library doing the measurement. speed tests are useful for examining these trends but may not always be indicative of a user's experience accessing and using internet content and services. new research and leadership from the internet measurement community are needed to provide more nuanced and authentic assessments of both network performance and user experience. emerging research and analyses published openly can be added to the mlbn system to support increased public understanding of internet connection quality and user experience.
we hope this paper and our research will help support public libraries interested in ongoing measurement and assessment of their internet service, as well as contribute to discussions among state and federal policymakers interested in better understanding the key role public libraries play in their local digital equity ecosystems.

acknowledgments

funding statement
this work was supported by an award (#lg-71-18-0110-18) from the us institute of museum and library services national leadership grants for libraries program.

data accessibility
the datasets supporting this article have been uploaded to the harvard dataverse, located here: https://dataverse.harvard.edu/dataverse/mlbn

appendix a: all participating libraries

table a.1. the final cohort of participating libraries, the specific branch where measurements were conducted, the city, state, and the library's imls code

library | branch (if applicable) | city | state | imls region code
andover memorial hall library | — | andover | ma | 21 suburb, large
arkansas river valley regional library | arkansas river valley | dardanelle | ar | 33 town, remote
avery mitchell yancy (amy) regional library | avery morrison library | newland | nc | 42 rural, distant
bennington free library | — | bennington | vt | 32 town, distant
caruthersville public library | — | caruthersville | mo | 33 town, remote
cherokee public library | — | cherokee | ia | 33 town, remote
clarkston independence district library | main branch | clarkston | mi | 21 suburb, large
cochise county library district | elfrida | elfrida | az | 32 town, distant
denver public library | central library | denver | co | 11 city, large
estherville public library | — | estherville | ia | 33 town, remote
mid arkansas regional library system | grant county library | sheridan | ar | 33 town, remote
caswell county library | gunn memorial public library | yanceyville | nc | 42 rural, distant
hall county library system | gainesville | gainesville | ga | 13 city, small
hollis public library | — | hollis | ak | 43 rural, remote
live oak public libraries | bull street library | savannah | ga | 13 city, small
monroe county library system | bedford branch library | temperance | mi | 21 suburb, large
multnomah county library | st. johns branch | portland | or | 12 city, midsize
pasco county library | regency park library | new port richey | fl | 21 suburb, large
pryor public library | — | pryor | ok | 32 town, distant
union county library system | the public library for union county | lewisburg | pa | 32 town, distant
safford city-graham county library | — | safford | az | 32 town, distant
saline county library | — | benton | ar | 33 town, remote
saint paul public library | rondo branch, central branch | saint paul | mn | 11 city, large
the ferguson library | — | stamford | ct | 12 city, midsize
traverse area district library | kingsley branch library | kingsley | mi | 21 suburb, large
truro public library | — | truro | ma | 21 suburb, large
twin falls public library | — | twin falls | id | 33 town, remote
ventura county public library | ep foster branch, admin branch | ventura | ca | 21 suburb, large
waltham public library | main library | waltham | ma | 21 suburb, large
westchester county public library | hendrick hudson free library, library system datacenter | montrose | ny | 21 suburb, large

appendix b: final program evaluation survey

table a.2. final evaluation responses on internet connectivity and network management

library | survey respondent role(s) | service tier from survey (download/upload) | per device limit imposed
bennington free library | library staff, it staff | 100/100 | —
live oak public libraries | it staff | 300/300 | —
monroe county library system | network administrator | 50/20 | —
pasco county library | network administrator | 100/25 | —
grant county library | library administrator | 15/15 | —
public library for union county | it staff | 10 mb/s | —
ventura county public library | it staff | 1000/1000 | —
waltham public library | library staff, it staff | 100/100 | 50 mb/s

endnotes

1 the editorial board, "doing schoolwork in the parking lot is not a solution," the new york times, july 18, 2020, https://www.nytimes.com/2020/07/18/opinion/sunday/broadband-internet-access-civil-rights.html; kathleen gray, "these buses bring school to students," the new york times, december 17, 2020, https://www.nytimes.com/interactive/2020/12/17/us/school-bus-remote-learning-wifi.html; cecilia kang, "parking lots have become a digital lifeline," the new york times, may 5, 2020, https://www.nytimes.com/2020/05/05/technology/parking-lots-wifi-coronavirus.html; dan levin, "in rural 'dead zones,' school comes on a flash drive," the new york times, november 13, 2020, https://www.nytimes.com/2020/11/13/us/wifi-dead-zones-schools.html.

2 institute of museum and library services, "lg-71-18-0110-18," accessed august 25, 2021, https://www.imls.gov/grants/awarded/lg-71-18-0110-18-0.

3 john carlo bertot, brian real, and paul t. jaeger, "public libraries building digital inclusive communities: data and findings from the 2013 digital inclusion survey," library quarterly 86, no. 3 (2016): 270–89, https://doi.org/10.1086/686674; donna schenck-hamlin, soo-hye han, and bill schenck-hamlin, "library-led forums on broadband: an inquiry into public deliberation," library quarterly 84, no. 3 (july 2014): 278–93; sharon strover, brian whitacre, colin rhinesmith, and alexis schrubbe, "the digital inclusion role of rural libraries: social inequities through space and place," media, culture & society 42, no. 2 (2020), https://doi.org/10.1177/0163443719853504.

4 schenck-hamlin, han, and schenck-hamlin, "library-led forums on broadband," 280.
5 bertot, real, and jaeger, "public libraries building digital inclusive communities," 271.

6 schenck-hamlin, han, and schenck-hamlin, "library-led forums on broadband."

7 pamela b. deguzman, neha jain, and christine g. loureiro, "public libraries as partners in telemedicine delivery: a review and research agenda," public library quarterly 41, no. 3 (may 2022): 294–304.

8 paul t. jaeger, john carlo bertot, kim m. thompson, sarah m. katz, and elizabeth j. decoster, "the intersection of public policy and public access: digital divides, digital literacy, digital inclusion, and public libraries," public library quarterly 31, no. 1 (january 2012): 4.

9 bertot, real, and jaeger, "public libraries building digital inclusive communities," 276.

10 colin rhinesmith et al., "co-designing an open source broadband measurement system with public libraries," in eds. larry stillman, misita anwar, colin rhinesmith, and vanessa rhinesmith, proceedings—17th cirn conference 6-8 november 2019, monash university prato centre, italy: "whose agenda: action, research, & politics" (department of human centred computing, monash university, 2020): 153–76, https://www.researchgate.net/publication/341882544_co-designing_an_open_source_broadband_measurement_system_with_public_libraries.

11 john carlo bertot and charles r. mcclure, "assessing sufficiency and quality of bandwidth for public libraries," information technology and libraries 26, no. 1 (march 2007): 14–22; lauren h. mandel, bradley w. bishop, charles r. mcclure, john carlo bertot, and paul t. jaeger, "broadband for public libraries: importance, issues, and research needs," government information quarterly 27, no. 3 (january 1, 2010): 280–91.

12 mandel, bishop, mcclure, bertot, and jaeger, "broadband for public libraries," 388.

13 douglas schuler and aki namioka, eds., participatory design: principles and practices (hillsdale, nj: lawrence erlbaum associates, inc., 1993).

14 rhinesmith et al., "co-designing."

15 measurement lab, "m-lab/murakami: run automated internet measurement tests in a docker container" (2021), https://github.com/m-lab/murakami/.
16 measurement lab, "m-lab/ndt5-client-go: ndt5 reference client implementation in go" (2021), https://github.com/m-lab/ndt5-client-go; sivel, "sivel/speedtest-cli: command line interface for testing internet bandwidth using speedtest.net" (2021), https://github.com/sivel/speedtest-cli.

17 measurement lab, "ndt (network diagnostic tool)," https://www.measurementlab.net/tests/ndt/.

18 lai yi ohlsen, matt mathis, and stephen soltesz, "evolution of ndt," measurement lab (blog), august 5, 2020, https://www.measurementlab.net/blog/evolution-of-ndt/.

19 ookla, "speedtest," https://www.speedtest.net/.

20 measurement lab, "where are m-lab servers hosted?", https://support.measurementlab.net/help/en-us/9-platform/2-where-are-m-lab-servers-hosted.

21 mlbn data studio report, page 1—overview of mlbn libraries, https://datastudio.google.com/u/0/reporting/0dff817b-0e0e-446e-a3b3-406121291124/page/gxxib.

22 institute of museum and library services, "public libraries survey," https://www.imls.gov/research-evaluation/data-collection/public-libraries-survey.

23 ookla, "ookla's open data initiative," https://www.ookla.com/ookla-for-good/open-data; measurement lab, "data overview," https://www.measurementlab.net/data/.

24 measurement lab, "detect cpu capabilities and set scheme accordingly by robertodauria · pull request #62 · m-lab/ndt7-client-go" (2021), https://github.com/m-lab/ndt7-client-go/pull/62; measurement lab, "m-lab/ndt7-client-go: ndt7 reference client implementation in go" (2021), https://github.com/m-lab/ndt7-client-go.

25 measurement lab, "add to ndt7 runner to force all tests to be non-tls" (2020), https://github.com/m-lab/murakami/commit/3770d6b63ebd9ad62b3754e0642bd0e9216e171e.

26 mlbn data studio report, page 1—overview of mlbn libraries, https://datastudio.google.com/s/ghtw2gm-vfi.

27 mlbn data studio report, page 2—individual library data, https://datastudio.google.com/s/kppdgf3i8c4.

28 mlbn data studio report page 2—individual library data page, andover, massachusetts, https://datastudio.google.com/s/unpy5z1m21g.

29 fcc office of engineering and technology, "technical appendix to the tenth mba report" (n.d.), 24, accessed june 22, 2021, http://data.fcc.gov/download/measuring-broadband-america/2020/technical-appendix-fixed-2020.pdf.

30 mlbn data studio report page 2—individual library data for clarkston, michigan, merit networks' nearest ookla server selected, https://datastudio.google.com/s/pqpcybmhaj8.
31 ookla, “the speedtest server network,” accessed august 16, 2021, https://www.ookla.com/speedtest-servers; m-lab, “ndt data in ntia indicators of broadband need,” accessed february 15, 2022, https://www.measurementlab.net/blog/ntia/. 32 mlbn data studio report page 2—clarkston, michigan—all ookla servers selected, https://datastudio.google.com/s/oilgppl47rc. 33 mlbn data studio report page 2—clarkston, michigan—all m-lab servers selected, https://datastudio.google.com/s/j79kg_wmyrc. 34 mlbn data studio report page 2—max speeds by month—clarkston, michigan https://datastudio.google.com/s/hyljk_mpvl4. 35 mlbn data studio report page 2—daily speeds—westchester county, ny https://datastudio.google.com/s/ixgn8v4r3ia. 36 mlbn data studio report page 2—weekday aggregate speeds—pasco county, fl https://datastudio.google.com/s/gadtydi2g9k. 37 mlbn data studio report—hourly speeds—estherville, ia https://datastudio.google.com/s/n5pe2y9rjyg. 38 mlbn data studio report page 2—individual library data (bennington, vt and speedtestmulti-stream selected) https://datastudio.google.com/s/njreltxejge. 39 mlbn data studio report page 2—individual library data—ookla multi-stream measurements for live oak, ga, https://datastudio.google.com/s/kild8kkltza; mlbn data studio report page 2—individual library data—ookla multi-stream measurements for monroe county, mi, https://datastudio.google.com/s/p140swjihjm; mlbn data studio report page 2—individual https://github.com/m-lab/murakami/commit/3770d6b63ebd9ad62b3754e0642bd0e9216e171e https://github.com/m-lab/murakami/commit/3770d6b63ebd9ad62b3754e0642bd0e9216e171e https://datastudio.google.com/s/ghtw2gm-vfi https://datastudio.google.com/s/kppdgf3i8c4 https://datastudio.google.com/s/unpy5z1m21g http://data.fcc.gov/download/measuring-broadband-america/2020/technical-appendix-fixed-2020.pdf http://data.fcc.gov/download/measuring-broadband-america/2020/technical-appendix-fixed-2020.pdf https://datastudio.google.com/s/pqpcybmhaj8 https://www.ookla.com/speedtest-servers https://www.measurementlab.net/blog/ntia/ https://datastudio.google.com/s/oilgppl47rc https://datastudio.google.com/s/j79kg_wmyrc https://datastudio.google.com/s/hyljk_mpvl4 https://datastudio.google.com/s/ixgn8v4r3ia https://datastudio.google.com/s/gadtydi2g9k https://datastudio.google.com/s/n5pe2y9rjyg https://datastudio.google.com/s/njreltxejge https://datastudio.google.com/s/kild8kkltza https://datastudio.google.com/s/p140swjihjm information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 22 library data—ookla multi-stream measurements for sheridan, ar, https://datastudio.google.com/s/pehrfxosx6s. 40 mlbn data studio report page 2—individual library data—ookla multi-stream measurements for pasco county, fl, https://datastudio.google.com/s/uxgjlfv44ze. 41 mlbn data studio report page 2—individual library data—graham county, az—observing maximum ookla measured speeds by month, https://datastudio.google.com/s/ieal3b2vbu8. 42 mlbn data studio report page 2—individual library data—traverse city, mi – observing maximum ookla measured speeds by month, https://datastudio.google.com/s/tljy900tx-q. 43 mlbn data studio report page 2—individual library data—waltham, ma—observing maximum ookla measured speeds by month, https://datastudio.google.com/s/pxzdrb0chb0. 44 mlbn data studio report page 2—individual library data—truro, ma—observing maximum ookla measured speeds by month, https://datastudio.google.com/s/okcwp-_f6qs. 
45 mlbn data studio report page 2—individual library data—westchester county, ny—observing daily aggregate speeds between june 17-29, 2020 and hourly aggregate speeds for tests in the 8 a.m., 10 a.m., 3 p.m., and 10 p.m. columns, https://datastudio.google.com/s/vleg-ynkoqc.

46 mlbn data studio report page 5—comparison of overall maximum speeds by test (fiber access media selected), https://datastudio.google.com/s/ii2f7onuchi; mlbn data studio report page 6—average monthly maximum speeds per test (all libraries selected), https://datastudio.google.com/s/oo7xal_kv4k.

47 mlbn data studio report page 3—comparing individual library data to public datasets—twin falls, id, 2020 q4, https://datastudio.google.com/s/vqq38wyraho.

48 mlbn data studio report page 4—was public data or library data higher or lower?, https://datastudio.google.com/s/meb4s-cflcw.

49 bertot, real, and jaeger, "public libraries building digital inclusive communities," 271; american library association, "library broadband speed test shows increased capacity; room still for improvement" (press release), https://www.ala.org/news/press-releases/2015/04/library-broadband-speed-test-shows-increased-capacity-room-still-improvement.

50 federal communications commission, "measuring broadband america," https://www.fcc.gov/general/measuring-broadband-america.

51 measurement lab, "add new runner for ooniprobe-cli," https://github.com/m-lab/murakami/pull/103; measurement lab, "add fast.com test runner," https://github.com/m-lab/murakami/issues/48; "data science institute," university of chicago, https://cdac.uchicago.edu/; university of chicago data science institute, "netrics—active measurements of internet performance," https://github.com/chicago-cdac/nm-exp-active-netrics/.

52 internet architecture board, "measuring network quality for end-users, 2021," https://www.iab.org/activities/workshops/network-quality/; david d. clark and sara wedeman, "measurement, meaning and purpose: exploring the m-lab ndt dataset" (august 2, 2021), https://ssrn.com/abstract=3898339 or http://dx.doi.org/10.2139/ssrn.3898339.
53 bertot, real, and jaeger, "public libraries building digital inclusive communities."

54 internet architecture board, "measuring network quality for end-users, 2021" (call for papers), accessed august 26, 2021, https://www.iab.org/activities/workshops/network-quality/; lai yi ohlsen and chris ritzo, "ndt data in ntia indicators of broadband need," measurement lab (blog), july 15, 2021, https://www.measurementlab.net/blog/ntia/.

55 clark and wedeman, "measurement, meaning and purpose."

56 lai yi ohlsen, "m-lab research fellows – sprint 2022," measurement lab (blog), january 13, 2022, https://www.measurementlab.net/blog/research-fellow-announcement/; lai yi ohlsen, "upcoming m-lab community call discussing latency, bufferbloat, responsiveness," measurement lab (blog), august 18, 2021, https://www.measurementlab.net/blog/community-call-announcement/; broadband internet technical advisory group (bitag), "latency explained," https://bitag.org/latency-explained.php; internet architecture board, "measuring network quality for end-users 2021," https://www.iab.org/activities/workshops/network-quality/; caida, "nsf workshop on overcoming measurement barriers to internet research (wombir 2021)," https://www.caida.org/workshops/wombir/2101/; caida, "2nd nsf workshop on overcoming measurement barriers to internet research (wombir-2)," https://www.caida.org/workshops/wombir/2104/; ietf, "responsiveness under working conditions" (draft), https://datatracker.ietf.org/doc/draft-cpaasch-ippm-responsiveness/.

article

black, white, and grey: the wicked problem of virtual reality in libraries

gillian d. ellern and laura cruz

information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.12915

gillian d. ellern (ellern@email.wcu.edu) is associate professor and systems librarian, hunter library, western carolina university. laura cruz (lxc601@psu.edu) is associate research professor, schreyer institute for teaching excellence, the pennsylvania state university. © 2021.
abstract

this study seeks to extend wicked problems analysis within the context of a library's support for virtual reality (vr) and the related extended reality (xr) emerging technologies. the researchers conducted 11 interviews with 13 librarians, embedded it staff, and/or faculty members who were involved in administering, managing, or planning a virtual reality lab or classroom in a library (or similar unit) in a higher education setting. the qualitative analysis of the interviews identified clusters of challenges, which are categorized as either emergent (but solvable) such as portability and training; complicated (but possible) such as licensing and ethics; and/or wicked (but tameable). the respondents framed their role in supporting the wickedness of vr/xr in three basic ways: library as gateway, library as learning partner, and library as maker. five taming strategies were suggested from this research to help librarians wrestle with these challenges of advocating for a vision of vr/xr on their respective campuses. this research also hints at a larger role for librarians in the research of technology diffusion and what that might mean to their role in higher education in the future.

introduction

political scientists horst rittel and melvin webber coined the term "wicked problems" in the early 1970s to refer to problems that were sufficiently complex that they defied conventional problem-solving methods.1 initially framed as broad social problems, such as food security or climate change, wicked problems are characterized by having ambiguous parameters, shifting requirements and/or stakeholders, and, perhaps more importantly, "no determinable stopping point."2 such problems are called wicked because they are "diabolical, in that they resist the usual attempts to resolve them."3 without the possibility of a clear solution, the end product of wicked problems analysis is not to solve the problem but rather to find ways to "tame" it, an approach which runs counter to conventional models of not only planning but also reasoning.4 if taming is the last step in wicked problems analysis, a critical first step is to determine if a given challenge is, in fact, wicked, as that will then determine what tools, perspectives, and strategies will need to be brought to the table. simple problems can be resolved by matching them to known solutions, more complicated problems may be addressed by analyzing engineering solutions, but super complex/messy/wicked problems require an entirely different mindset.5 persistent frustration with the limitations of conventional problem-solving models has led to a proliferation of studies identifying a host of wicked problems, ranging from the global (covid-19 response) to the local (dysfunctional families).6 the present study seeks to apply the framework of wicked problem analysis to the question of the role of academic libraries in supporting emerging technologies, using the integration of vr/xr as a case study.

literature review

the wicked problem of libraries and technology has been recognized by a number of scholars, each using a different frame of reference, as perhaps fits the inherent ambiguity of a wicked problem. scholars have identified electronic data management, research data management, and ebooks as library problems that are wicked in nature, and howley notes that the question of public access touches on larger social issues that could be described as wicked.7
a recent article by williams and willet identifies makerspace technology as boundary work, suggesting that it challenges conventional roles and relationships held by libraries and librarians, an approach which implies the existence of a wicked problem.8 despite these exceptions, at least one set of library scholars has noted that "there are very few applications [of wicked problems] in librarianship."9 the present study seeks to make the case that the application of the wicked problems framework to the question of the role of libraries in emerging technology can illuminate new strategies, roles, and pathways forward.

while research on wicked problems in libraries may be limited, the role of libraries in the curation, development, and dissemination of virtual reality (vr)—or, using the more encompassing term, extended reality (xr)—has been extensively written about by library scholars. although it could be argued that the current output reflects the nascent stages of vr/xr as a research field, as scholars explore a library's role with virtual reality (vr), mixed reality (mr), augmented reality (ar), and everything associated with them such as virtual worlds or 3d 360-degree videos, it is clear that, to date, the published works about vr/xr largely fall into two camps: the visionary and the applied. the former contains studies advocating for the integration of vr/xr (and related technologies) as part of a vision of the future for libraries; the latter are applied studies that booth labels as "technorealistic."10 in other words, these are descriptions of established practice or suggestions of practical strategies for how a library (or librarian) can actually implement a vr/xr lab or related program.11 what remains in shorter supply are critical and/or empirical studies that situate the development of vr/xr as an institutional capacity within larger, arguably wicked, questions of the evolving purpose and position of libraries.

the case of vr/xr presents a distinctive perspective on the wicked problem of the technological orientation of academic libraries. unlike issues such as electronic records management, vr/xr is not part of the core technological infrastructure of a library, nor does it touch directly upon prior core administrative functions, such as collection development or access services. rather, it is perceived as an extension of library services, particularly those related to the evolving educational mission of the academic library and its role as a broader facilitator of information literacy across disciplines. as one library scholar remarks:

as libraries are increasingly called upon to support knowledge exchange beyond traditional books and journals, the creation of novel types of research infrastructure will shape the preservation and access expectations of constituents.12

the present study looks at how librarians navigate, or tame, the myriad of challenges that arise not just from rethinking how an academic library engages with technology, but from pushing the boundaries of what library work is (or could be).

as the emergence of vr/xr technology begins to cast a larger shadow over higher education, many librarians have argued that academic libraries associated with institutions with high research activity are especially well situated to take on a leadership role, an opportunity that they had largely missed with recent related technologies such as 3d printing.
not wanting to be left behind, these libraries have embraced vr/xr technology at a relatively rapid rate. a recent (unpublished) study noted that in 2015, only 3% (n=4) out of the 125 sampled research universities had a vr/xr presence; by 2020, that percentage increased to 66% (n=77), a rate which appears to be outstripping that of technology competitors such as gis, institutional repositories, and data visualization services.13 given the relatively high up-front resource investment required to support vr/xr, it would appear that many university libraries are doubling down on the prospect that vr/xr will be an integral part of their future. the degree to which the rapid adoption of vr/xr will live up to its promise remains to be seen, but the present study seeks to illuminate how current librarians are seeking to tame this potentially savage beast.

methodology

this irb-approved study is based on the qualitative analysis of eleven interviews with thirteen librarians (8), embedded it staff (3), or faculty members (2), all of whom were involved with the adoption of vr/xr technology at their respective libraries. the inclusion criteria for the study were described in the consent document as those people "currently involved in administering, managing, or planning a virtual reality lab or classroom in a library (or similar unit) in a higher education setting." to identify potential participants, the researchers conducted a web search using the terms "library" and "virtual reality" or "vr" and then utilized a snowball sampling method to generate a list of potential interviewees that included multiple library types (e.g., academic research libraries [arls], public libraries) as well as institutional types (e.g., community colleges). one large library had multiple participants, including one librarian and two support staff responsible for the vr room. taken collectively, these participants' institutions included community colleges (3), public libraries (2), medical libraries (4), and academic research libraries (4), located in either the united states (10) or canada (1). the pool of us educational institutions (10) represented five different carnegie classifications: associate's colleges, doctoral universities, doctoral/professional universities, master's colleges & universities, and special focus four-year institutions. these comprised a mix of small-, medium-, and large-sized institutions (by full-time enrollment, or fte). all the organizations in this study (11) were public institutions.

each interviewee received a copy of the possible interview topics in advance, including a list of potential challenges faced by libraries seeking to integrate vr/xr (see appendix a). the list of challenges was crafted from a literature search, as well as the personal experience of one of the researchers, a librarian who oversees a vr lab. each hour-long, semi-structured interview was conducted via zoom, machine transcribed with kaltura, and further edited manually by the researchers. the transcripts then underwent three rounds of coding. first, the researchers independently reviewed the body of transcripts in their entirety and identified emergent themes. in the second round of coding, potential themes were merged into semi-structured coding guidelines, which were used to code each interview separately. in the third and final round, the themes were re-evaluated and adjusted based on feedback from the previous rounds, leading to the identification of a problem-based typology (emergent, complex, wicked).
from our process, we gained insight into a myriad of challenges facing libraries as they work to integrate vr/xr into the work that they do. that insight has, in turn, led to the development of a conceptual framework that we believe will be useful to others seeking to wrestle with these challenges in the future.

table 1. equipment, staffing, and funding for vr/xr spaces in participating libraries

location of vr service in library | number of pcs connected | number of mobile headsets | types of pc headsets | types of mobile headsets | staffing | one time funding | continuing funding
room | 2 | 10 | htc (vive and pro) | oculus go, spectra vr | 2 staff | yes | no
entrance | 1 | 0 | oculus rift sv | — | 2 staff | yes | as needed
room | 4 | 8 | htc vive pro | oculus quest and microsoft hololens | 1 staff, 3 students | yes | no, but planned
room | 5 | 1 | oculus rift s and htc vive pro | oculus quest | 3 staff, 3 students | yes | as needed
room, mobile vr | 1 | 3 in circ, several in office | htc vive cosmos | oculus quest, oculus go, samsung odyssey, lenovo mirage solo, hololens, playstation vr and google cardboard | 2 staff, 8 students | no | as needed
mobile vr, entrance | 2 | 6 | htc vive, oculus rift | oculus go, oculus quest | all circ staff (2/3 per shift) | yes | no
room | 3 | 7 | oculus rift, htc vive pro, htc vive standard | google cardboard, insignia vr viewers | 2 staff | yes | no, but planned
mobile vr, entrance | 4 | 30 | htc vive, oculus rift | oculus quest, google cardboard or plastic viewers | circ staff at each of 4 locations | no | yes

results

library vr/xr spaces

even within our relatively small sample of institutions, we found a fairly wide range of practice regarding vr/xr library labs, with considerable variance in location, number, and manufacturer of headsets, staffing, and funding, as seen in table 1.

challenges

through our coding process, we identified clusters of those challenges, which we categorized as either emergent (but solvable), complicated (but possible), and/or wicked (but tameable).

emergent challenges

our respondents identified a number of challenges that are frequently associated with the adoption of emergent technology, regardless of who is choosing to adopt it or what they are choosing to adopt. in other words, any person or place adopting xr (or other types of emergent technology) at this stage of its development is likely to run into similar issues.

portability and mobility

portability (or lack thereof) was frequently referenced as a limitation of the current technology. the most common headsets purchased for the first generation of library vr lab spaces have physical cords and sensors that have to be plugged in (to high-performing computers) during use. one intrepid librarian even described carting around her bulky alienware desktop computers and video displays between campuses, but needing to find a better way because, "it made the computer folks very angry because it's so delicate and our sidewalks are so bumpy." she now uses an alienware laptop and some sturdy tripods (for the base stations) on these trips. the lack of portability not only limited the ability of libraries to take vr/xr out of the library for events and in-class presentations, but it also exacerbated existing space constraints, with users having to be literally tethered to cpus, screens, and base stations.
as one of our respondents put it, "the biggest issue is that it's in one place and it's stuck there." in this case, manufacturers are aware of the limitations on mobility, and it appears as though wireless headsets will be the next wave of adaptation by the industry. several wireless headsets have already come and gone, as vendors continue to work to overcome both technological and human-centered challenges. the oculus go, google cardboard, and google daydream have all been brought to the market and subsequently been discontinued.14 only one of the libraries we spoke to indicated that they had purchased a wireless headset, and that headset (the oculus go) turned out to be of limited utility. while this next generation of headsets will likely solve a number of operability issues, it also has the potential to compound another challenge noted by most of our respondents, i.e., a lack of sustainable funding for equipment refresh. the majority of our respondents (6 of 8, or 75%) indicated that they purchased their equipment through one-time funding sources, whether internal or external grants (n=6), end-of-year funds, or some combination of these.

vr/xr training

vr/xr experience remains new to most people outside of the gaming world, so it has fallen largely on librarians to develop introductory training protocols at the level of access to the technology. there are distinctive challenges in introducing vr/xr to a broader audience. some of those challenges may be physical. in its earlier stages, a number of users experienced symptoms such as nausea or seasickness, and while these have been lessened with higher refresh rates and movable lenses, other virtual reality induced symptoms and effects (vrise) continue to emerge with studies of longer-term use.15 two of our librarians expressed concerns that other long-term effects may still be unknown, and both of the public libraries included in this study banned vr/xr use for patrons under 14 years of age until more is known about how it affects developing brains, a recommendation that is now supported by most vendors as well. even for those who do not suffer from physical symptoms, the technology can be disorienting and uncomfortable. this contributes to higher levels of anxiety, which, somewhat ironically, vr/xr has been shown to alleviate in some clinical trials.16

for these reasons, vr/xr labs require staffing not just to safeguard the equipment and ensure its appropriate use, but also to coach users through their new experience. as one of our respondents described her experience, "a lot of people will put on the vr headset and not move because they're used to computer displays being two-dimensional … it is not common knowledge yet that you can move around and this environment [moves] with you. and they [new users] will just stand there." coaching someone to move around in a virtual reality environment is not a straightforward endeavor either, as one of our librarians relates: "how do you interact with somebody who can't see you in a way that's respectful? because that can be kind of disconcerting if you've got a headset on and all of sudden somebody touches your hand?" one of our respondents drew upon her experience as a swimming coach to develop a set of "non-touching" verbal protocols for her student lab assistants to utilize in working with clients who are new to the interior mobility of virtual worlds.
other challenges identified by our respondents that might fit into the emerging technologies category include the following: liability, aspects of licensing, physical space modifications, room and equipment management, training curriculum, logistics of engaging with multiple users, availability of apps/games, equipment installation, and evaluation procedures. this list could perhaps also include the need to not only educate patrons on what the emerging technology can do but to advocate for its future significance. as one of our respondents stated, "i think you can write about it and speak about as much as you want. it's a matter of getting them in there."

complicated challenges

unlike emergent issues, complicated challenges are unlikely to be resolved without concerted intervention and leadership and, even then, it is possible that a single or clear solution may not be readily identified. challenges that fall into this category may be described as grey areas, in which future directions remain scattered, unclear, or uncertain. embracing these complexities means that libraries looking to adopt vr/xr currently must be willing to venture out on their own, embracing both the opportunities and the risks inherent in forecasting future technology use.

licensing

an example of one of these complicated challenges that emerged from our interviews is the issue of licensing. many vr/xr titles are available for free, through services such as steam and the oculus store. all of our respondents indicated that they acquired content via these services. other popular vr/xr academic titles, such as 3d organon anatomy and google tilt brush, are licensed and potential users must pay a fee to access the full functionality of the tool. the challenge is that these licenses are most often offered on an individual basis ("for home use only"), a reflection of the primary customer base for vr/xr content creators, e.g., gamers. a number of distributors do offer institutional licenses, but these are primarily for use in companies, with a relatively stable and readily identifiable list of employee users or limited number of stations. some vr/xr distributors
our respondents indicated that this oversight may be changing, however, as four of the librarians we interviewed reported that game developers reached out to them and negotiated deals in which libraries would receive equipment in exchange for beta-testing new titles with student populations. that said, awareness does not equate to priority, as one of our respondents noted, “i am concerned that we will be one of the last audiences that get some consideration in terms of the functionality that meets the library’s needs.” even if these issues are resolved in the context of vr/xr specifically, it seems unlikely that the complicated problem of “library as customer” will persist with the advent of new technologies and new technology providers. ethics the challenge of vendor relationships is compounded by other emergent ethical issues surrounding the integration of vr/xr into the library. several of the ethical concerns raised by our interviewees are connected to broader social concerns with technology use, such as issues of privacy and security, and others are related to long-standing ethical debates within libraries, such as the degree to which content should be limited by the library. our interviewees had divided opinions, for example, on whether or not the vr/xr lab should offer games. on one hand, the availability of games brought students to the library and engaged them with the new technology. on the other hand, the provision of games constitutes, for some stakeholders, a potentially significant shift away from an academic or scholarly mission for the library. as one respondent put it, “i can’t say that libraries have traditionally not been a place for people to have fun, but i think that’s something that … rubs some people a little bit the wrong way.” another stated, “my big concern at the beginning was that we would put this in and people would [say] … that’s for video games. why did the library buy video games?” the question of including popular content should be a familiar debate to librarians, but the issue is ratcheted up a notch when engagement may also include actions, such as shooting, that may be especially sensitive for college campuses. as one interviewee reflected, “we are a university in the south. and if you had a bunch of white male students that love to go play this game, is that going to make somebody from another group feel uncomfortable or unwelcome or feel like this is not a space for them?” as this example implies, unlike the often private act of reading, vr/xr experiences often take place in virtual places that are at least quasi-public, a venue for which few ethical precedents exist (yet). conversations on the legal and ethical implications of fully virtual information technology and libraries december 2021 black, white, and grey | ellern and cruz 8 crimes, such as rape or robbery, for example, constitute a lively, but so far unresolved, scholarly conversation.17 wicked problems where the challenges faced by libraries get most complicated, however, is when the integration of vr/xr touch upon the more fundamental question of the appropriate roles for libraries in the digital age. our respondents framed their vr labs and services largely within existing roles, e.g., gateway or learning partner, with some attention to emerging roles, such as maker, but they also acknowledged that this adaptation was awkward, solutions were (often) makeshift, and anomalies persist. 
this suggests the potential for paradigm shifts in the role(s) libraries can play in shaping the intersections of knowledge between the “real” and virtual worlds. library as gateway a number of our respondents connected the library’s adoption of vr/xr technology to its role in providing access to technology for those who may not otherwise have it. this role was especially pronounced in the case of academic libraries located in public universities and public libraries serving a defined community. as one of the respondents described their role, “we’re pleased to have them come and learn how to use these technologies because they’re new and we’re trying to make it more democratized that students can come and use it. they don’t have to pay for it. they don’t have to worry about like a lab being locked away from them. they can come in anytime there’s a staff member and use the stuff where here it will provide them tutorials and instruction if they want to use it.” similarly, another respondent stated, “libraries … offer an entry-level kind of way to engage with this technology in a free way where anyone who is even remotely curious, even if it doesn’t have anything to do with … anything academic, can engage with this stuff.” a third respondent stated that the case they made internally (to their library colleagues) was “to explain the importance of the library philosophy of having equitable access to resources … books are a resource, but technology is also a resource.” we have characterized this role as a gateway, rather than strictly as an access issue, because it also encompasses a vision of a pathway, one which starts in the library but may continue to other places, whether specialized labs in the discipline, in the workforce, or as part of their everyday lives. as one of our respondents put it, “we’re very much about these technologies. they’re here; they’re coming; they’re going to be a big thing soon. and we want our students to know what they are and be comfortable with them. so, we try to position ourselves as a place where they can start learning.” this gateway function is, however, characterized by competing stakeholders, both inside and outside of the library—a defining characteristic of wicked problems. this latter is perhaps best illustrated by looking at issues of accessibility. as the statements above attest, librarians see one of their primary service roles as providing access to technologies such as vr/xr to people who might not otherwise have it. that same sentiment, though, can be flipped on its head when taking other aspects of accessibility into consideration. most vr/xr programs are not ada compliant, whether they are being offered in the physical or virtual public spaces of the library. in its current form, vr/xr is an inherently visual technology, so those who are visually impaired cannot utilize it to the same extent as others. most vr/xr programs require physical movements that may not be possible for those with limited mobility. our librarians have created a few hacks, or workarounds, to provide short-term accommodations for individual students (e.g., a verbal narration of visual interactions), but generally speaking, the technology is not fully accessible. information technology and libraries december 2021 black, white, and grey | ellern and cruz 9 library as learning partner several of our respondents indicated that they saw the library’s adoption of vr/xr technology as an extension of their role as partners in the learning enterprise. 
this role could be conceived directly, in that the librarian mediates between classroom needs and available vr/xr titles and capabilities. this form of direct mediation could be responsive, i.e., identifying options in response to requests received, or proactive, i.e., identifying options and then reaching out to faculty who might wish to avail themselves of them. integrating vr/xr material into the library's ils the role is especially critical at this stage of vr/xr development, as none of the libraries we spoke to had integrated the available titles into their online, public-facing catalogs or integrated library system (ils). in other words, if a patron wants to know what titles are available, the best way to find out would be to ask the librarian directly and/or visit the vr/xr lab in person. as one librarian put it, "there's not the infrastructure or the architecture we have around a book. if you were, say, a student in a history class and you wanted to study this thing, there's no way to discover that as part of the more general resources of the library." several of our respondents were developing workarounds, such as libguides and web-based directories, but none of these would be accessible through a general search of the library catalog or citation databases. determining how to catalog and/or curate vr/xr artifacts may be challenging and time-consuming, but it is a problem that has an eventual solution. what is less clear, however, is what the long-term role of the library may be beyond this cataloging function. our respondents consistently indicated that this remains one of the lesser-developed roles for vr/xr in the library, and many identified raising faculty awareness especially as a high priority. while several identified this as essentially a "marketing problem," it would appear that the challenge extends more deeply. many librarians do not have additional degrees in either educational development or instructional design, which encompasses the practice of matching learning outcomes to technology tools. the two most successful examples of matching learning outcomes to library-based vr/xr that we heard of were faculty-driven, one a project to scan actual human body parts for use in a vr setting; the other a criminal justice project related to empathy education using virtual encounters. these kinds of alignment activities can only occur if there is a tool available to match the proposed learning outcome. most of our respondents lamented the limited availability of titles that are appropriate for use in academic settings, so even if awareness were raised, there may not be sufficient content to meet academic needs. as one librarian suggested, "students will say, i've seen the anatomy tool, but right now i'm taking chemistry or i'm taking genetics. do you have anything that will help me with that? i'm a visual learner. i really liked this format. and that's been challenging for [us] because it's so new. there's not a coherence in terms of the titles and subject areas that you get." and another characterized the issue this way, "it's like the bargain video bin at walmart. 
sometimes you have to dig through to find something because it’s just, it’s so new right now.” the issue of availability may seem like an emergent technology issue (as above), but the challenge is further compounded by limitations on capacity, as most library vr labs can only hold one class at a time, and even then, the numbers may be limited, necessitating workarounds such as rotations, remote screen-casting, or extended office hours. even with multiple headsets, most of the time students cannot be in the same virtual reality space together. despite these challenges, information technology and libraries december 2021 black, white, and grey | ellern and cruz 10 many of our respondents were focused on optimizing current capacities, at least in part because of pressure to justify the continued expenditure of both personnel time and equipment costs. this precarious state of affairs is reflective both of tightening university budgets as well as the frequent present of internal sources of resistance from more traditionally-minded colleagues within the library itself (noted tactfully by three of our respondents). bearing all of these factors in mind, it would seem that the question of the long-term sustainability and scalability of vr/xr as a learning service for libraries remains unresolved. library as maker there may be another way to frame vr/xr in the context of libraries. in several cases (n=3), our respondents framed vr/xr not as an extension of classroom-focused service, but rather of support for the research enterprise. as one of our respondents described it, “if they’re still working on a project and they need a thing for this academic project. and then we’re just providing a new way to provide that service, closing some of the research cycle loop, that we’re now part of a different part of that same loop of creating things.” this is a reflection of the changing nature of outputs from scholarly research. previously confined largely to print artifacts, e.g., peer-reviewed journals, researchers are facing an increasing number of choices when it comes to ways to represent the scholarship being created, e.g., knowledge artifacts. this can include artifacts created in, through, or with vr/xr. several of the librarians (n=4) we spoke to mentioned that their vr/xr lab came packaged, in a sense, along with their 3d printing stations. in each case, the librarians noted that the utility of the 3d printers had resonated more readily with library users, and two indicated that they had aspirations to link the two processes in an effort to boost interest in the vr/xr space. for example, one respondent indicated that they wanted users to be able to create an object in a vr/xr program, such as google tilt brush, and then print their creation on an associated 3d printer. libraries have long provided non-3d printing services, largely as ancillary services to support researchers, so this example may, at first glance, appear to be simply a slightly more hightech version of a pre-existing service. these made objects, too, could potentially be stored, cataloged, and disseminated through the library system and/or via a dedicated database such as sketchfab.com. in our interviews, however, the respondents hinted that this linkage (between vr/xr and 3d printing) may actually be a first step towards a more fundamental shift in re-imagining the role of the library vis-à-vis technology. 
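the tilt brush-to-printer aspiration described above is, at a technical level, a mesh-conversion workflow: an object authored in vr is exported as a mesh file, checked for printability, and re-exported in a format a slicer accepts. the sketch below is a minimal illustration of that pipeline, assuming python and the open-source trimesh library; the file names are hypothetical and this is not any respondent's actual setup.

```python
# minimal sketch: prepare a vr-authored mesh (e.g., an obj exported from a
# painting/sculpting app) for 3d printing. assumes the open-source `trimesh`
# library; file names are illustrative only, not a respondent's real workflow.
import trimesh

def vr_export_to_printable_stl(obj_path: str, stl_path: str, max_mm: float = 150.0) -> None:
    mesh = trimesh.load(obj_path, force="mesh")   # collapse any scene into a single mesh
    if not mesh.is_watertight:
        mesh.fill_holes()                         # slicers generally need a closed surface
    # scale the model so its largest dimension fits a (hypothetical) build volume
    largest = float(mesh.bounding_box.extents.max())
    if largest > 0:
        mesh.apply_scale(max_mm / largest)
    mesh.export(stl_path)                         # stl is what most slicers expect

if __name__ == "__main__":
    vr_export_to_printable_stl("student_sketch.obj", "student_sketch.stl")
```

the point of the sketch is simply that the vr-to-printer link the respondents aspired to is a short, automatable step rather than a new service category in itself.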
rather than functioning primarily as service providers, emerging technology librarians have the opportunity to become more active (co-)creators of content and facilitators of change. in one case, the vr/xr lab director, also a faculty member, developed partnerships with strategic programs on campus, such as the office of admissions, to generate original content that was specific to their institution. fortunately, the faculty member was able to draw on coding skills she had gained in prior professional roles. in another case, the library partnered with an external developer to generate original content with direct relevance to the community—a project that served to generate interest in the library, vr/xr, and local issues, all at the same time. there is a fundamental difference between a library hosting a maker space and becoming a maker itself. while librarians have traditionally characterized themselves as facilitators of knowledge rather than knowledge creators, there is some evidence that this shift may not be quite as profound as it might appear. this shift began with libraries and librarians scanning digitized items information technology and libraries december 2021 black, white, and grey | ellern and cruz 11 of their siloed special collections and archives. the resulting databases are often treated as published works in and of themselves with the library acting as curator and publisher. in addition, librarians currently hold faculty rank at many research universities and actively present and publish both in library-focused journals, thematic journals (e.g., information literacy), as well as in other venues, often alongside faculty partners.18 the embedded curricular model places librarians in the role of learning designers and as creators of extended, discipline-specific content. it should be noted, too, that content development is not the only “creator” role available. when you build a knowledge management system (like a library catalog), the choices you make serve not just to organize knowledge, but also, to shape that knowledge and, yes, create physical and cognitive pathways to and through it.19 it is perhaps not a coincidence that identifying pathways has been identified as a signature taming strategy for wicked problems. discussion: taming wicked problems our study frames the adoption of vr/xr technology by academic libraries as embedded in the larger wicked problem of library reinvention in a digital age. that said, one of the fundamental characteristics of a wicked problem is not that it is very difficult to solve, but that it is intrinsically unsolvable (or nearly so). this may explain why the question of libraries and technology seems to be a conversation that never goes away, as the question involves a perpetually moving target, embedded in the ever-shifting social, economic, and political dynamics that are taking place well beyond the walls of any library.20 this characterization does not mean, however, that we should not keep trying a variety of strategies to untangle these wicked knots. taming strategy 1: embracing wickedness in a recent essay about learning in higher education, randy bass characterized the wicked problem designation as potentially liberating, rather than discouraging. 
embracing wickedness serves to move the conversation from thinking of libraries as broken or backward (and therefore, in need of solutions), to a view of the question as a grand challenge, a continual thought experiment that requires ongoing inquiry, thoughtful consideration, and an expansive, rather than reductive, perspective.21 as a grand challenge, the question of libraries and emerging technologies such as vr/xr becomes less of a mad scramble to maintain relevance and more of a scholarly conversation that enhances the role of the library as an inclusive and pluralistic space. in this framework, the questions of whether or not a library should embrace new technology or technology-related service are not bounded by the intrinsic qualities of that technology itself, nor does it mean that libraries everywhere will need to land upon the same, or even similar, technologies, but rather they might seek convergence in the role of libraries as tamers of these wicked problems. taming strategy 2: integrating adaptability the librarians we spoke to generally described their units as falling under the category of “early majority” in roger’s well-known diffusion of technology model, in that they wanted to see evidence that vr/ xr will be useful to others before committing their resources, but they also want to serve a gateway role in introducing promising new technologies to their patrons.22 much of the research on technology diffusion, however, has focused on either end of the curve, i.e., the innovators or non-innovators, and comparatively less research has been done on the role played by those in the middle, such as these libraries.23 by positioning themselves as early majority adopters, academic libraries would potentially be able to articulate a clear and distinct role for themselves vis-à-vis other units within the university that support technology-enabled learning; information technology and libraries december 2021 black, white, and grey | ellern and cruz 12 while also giving themselves the ability to leverage more resources outside of the library itself. the model also has the advantage of providing a sustainable model of re-invention. as a given technology matures along the continuum, the library’s role recedes, enabling it to embrace the next emerging technology. as one of our respondents pointed out, their library used to give training on how to use a mouse and, one day, gateway training for vr is likely to go the same route. taming strategy 3: building networks because wicked problems are complex and ill-defined, taming them is often done by connecting to others with different perspectives.24 our respondents were largely emerging technology librarians who used a number of on-the-ground strategies to tame the wickedness of the task of advocating for a vision of vr/xr on their respective campuses. most of these strategies required creating relationships beyond the walls of the library, e.g., building organizational networks, connecting to community organizations, developing joint, shared, or embedded positions; cultivating faculty champions in academic units, and initiating shared programming. these collaborative strategies resonate with another characteristic of wicked problems, e.g., that they require the ability to think across conventional organizational and disciplinary siloes. taming strategy 4: exercising interdisciplinary imagination and what other role at a university has more experience with this kind of intellectual dexterity than a librarian? 
our respondents mentioned working with faculty from 14 different disciplines in the context of their responses to our interview questions, and that’s without being asked. as higher education increasingly shifts its attention towards addressing wicked problems, then librarians may be well poised to serve a gateway role in modeling, supporting, and conducting what is now being called “convergent” research.25 this has been described as transdisciplinary inquiry that integrates knowledge from multiple data sources, disciplinary perspectives, and lived experiences in order to confront the world’s most complex problems.26 taming strategy 5: modeling as learning partners in the future of higher education, librarians will have a role to play in developing our students’ abilities to tame these same wicked problems.27 this partnership is not limited to the kind of information and digital literacy needed for cross-disciplinary research. taming wicked problems requires more than a specific set of knowledge or skills, but rather a certain disposition, e.g., a willingness to engage in answering seemingly impossible questions; the flexibility to find pathways through those challenges; the ability to persevere through short-term setbacks; and, above all else, the motivation to support the ability of others to flourish.28 this same set of wicked qualities could easily be applied to all of the respondents in our study, each of whom have succeeded because of their deeply held, intrinsic passion for (and commitment to) the possibilities for what technology and libraries can do together. conclusion the library remains a model of not just individual, but also organizational resiliency. as new technologies such as vr/xr arise, the library as an institution will find ways to weather emerging challenges, resolve complicated problems, and disentangle super complex, i.e., wicked, dilemmas, each of which requires the cultivation of distinctive knowledge, skills, and dispositions. in this study, we argue that the strategies associated with wicked problem solving can serve to strengthen the ability of libraries (and librarians) to serve an active role in our collective future, whether that future is “real” or virtual (or both). information technology and libraries december 2021 black, white, and grey | ellern and cruz 13 appendix a – email to interviewees subject: we are interested in your experience with virtual reality at your library/university/college a research study interview request hi invitee, you are being invited to participate in a research study of how universities navigate the integration of virtual reality labs. you were selected as a possible participant because of your experience in managing or implementing such labs. your participation entails a 45–60 minute interview, conducted through zoom. we will be especially interested in how you, your library, or your university navigated one of the following “grey areas” where a situation is ill-defined or not readily conforming to a category or an existing set of rules or policies. these include but not limited to your professional perspective in one or more of the following: 1. physical and software liability 2. licensing and infringement 3. user accounts with the university and/or with the vendor 4. physical space modifications needed for vr 5. room and equipment management 6. separating collection development policies from equipment and use policies 7. use policies for the equipment, software, and users 8. controlling the vr equipment and software 9. 
time, research, and staff needed to run this service 10. training and learning curve for users (both faculty and students) 11. logistics of using the vr room for a class and within a class 12. integrating vr into a college course 13. selecting appropriate vr items to purchase 14. evaluating vr items 15. paying for vr items including approval, licensing, purchasing processes 16. installing and maintaining vr items including regular updates, the user installing software/games, management of hardware/software, repair, etc. 17. budget for vr (amount, repair, one-time/continuing) 18. a vr topic of your choice of course, we will not be able to cover all of these areas listed above during our short interview with you. we are sending them so you can begin thinking about these vr challenges and prioritize them. based on our own experience, we think you have important insight to share about some of them that will be beneficial to the broader university and library communities. information technology and libraries december 2021 black, white, and grey | ellern and cruz 14 appendix b – interview protocol 1. tell us about the history of you/your library with vr. 2. how have you/your library navigated one of the following grey areas (drawn from working with vr in libraries) where a situation is ill-defined or not readily conforming to a category or an existing set of rules or policies? these include but are not limited to your professional perspective in one or more of the following (from the list we sent you in our invitation email): • physical and software liability • licensing and infringement • user accounts with the university and/or with the vendor • physical space modifications needed for vr • room and equipment management • separating collection development policies from equipment and use policies • use policies for the equipment, software, and users • controlling the vr equipment and software • time, research, and staff needed to run this service • training and learning curve for users (both faculty and students) • logistics of using the vr room for a class and within a class • integrating vr into a college course • selecting appropriate vr items to purchase • evaluating vr items • paying for vr items including approval, licensing, purchasing processes • installing and maintaining vr items including regular updates, the user installing software/games, management of hardware/software, repair, etc. • budget for vr (amount, repair, one-time/continuing) • a vr topic of your choice please describe an occasion where you were faced with one of these complex, challenging, and/or potentially insurmountable obstacles in integrating vr into your library (or university more broadly). how did you navigate this challenge? 3. please describe one way in which the values, practices, and ethos of librarianship may have been challenged by the integration of a vr lab and the purchase and curation of vr artifacts. . information technology and libraries december 2021 black, white, and grey | ellern and cruz 15 endnotes 1 horst w. j. rittel and melvin m. webber, “dilemmas in a general theory of planning,” policy sciences 4, no. 2 (1973): 155–69. 2 cameron tonkinwise, “design for transitions—from and to what?” design philosophy papers 13, no. 1 (may 2015): 15, http://dx.doi.org/10.1080/14487136.2015.1085686. 3 valerie a. brown, john harris, and jacqueline russell, tackling wicked problems: through the transdisciplinary imagination (london: taylor & francis group, 2010): 302, ebook central. 4 bayard l. 
catron, “on taming wicked problems,” dialogue 3, no. 3 (1981): 13–16; luke houghton, “engaging alternative cognitive pathways for taming wicked problems,” emergence : complexity and organization 17, no. 1 (2015), https://www.researchgate.net/publication/282282336_engaging_alternative_cognitive_path ways_for_taming_wicked_problems_a_case_study. 5 catron, “on taming wicked problems”; falk daviter, “coping, taming or solving: alternative approaches to the governance of wicked problems,” policy studies 38, no. 6 (november 2017): 571–88, https://doi.org/10.1080/01442872.2017.1384543; david j. snowden and mary e. boone, “a leader’s framework for decision making,” harvard business review (november 1, 2007), https://hbr.org/2007/11/a-leaders-framework-for-decision-making. 6 natallia pashkevich, “wicked problems: background and current state,” philosophia reformata 85, no. 2 (november 4, 2020): 119–24, https://doi.org/10.1163/23528230-8502a008. 7 andrew m. cox, mary anne kennan, liz lyon, and stephen pinfield, “developments in research data management in academic libraries: towards an understanding of research data service maturity,” journal of the association for information science and technology 68, no. 9 (2017): 2182–2200, https://doi.org/10.1002/asi.23781; julie mcleod and sue childs, “a strategic approach to making sense of the ‘wicked’ problem of erm,” records management journal 23, no. 2 (2013): 104–35, http://dx.doi.org/10.1108/rmj-04-2013-0009; shelley wilkin and peter g. underwood, “research on e-book usage in academic libraries: ‘tame’ solution or a ‘wicked problem’?” south african journal of libraries & information science 81, no. 2 (july 2015): 11– 18, https://doi.org/10.7553/81-2-1560; brendan howley, “libraries, prosperity’s wicked problems, and the gifting economy," information today 33, no. 6 (july 2016): 14–15, proquest. 8 rachel d. williams and rebekah willett, “makerspaces and boundary work: the role of librarians as educators in public library makerspaces,” journal of librarianship and information science 51, no. 3 (september 2019): 801–13, https://doi.org/10.1177/0961000617742467. 9 cox, pinfield, and smith, “moving a brick building.” 10 matt cook et al., “challenges and strategies for educational virtual reality,” information technology and libraries 38, no. 4 (december 16, 2019): 25–48, https://doi.org/10.6017/ital.v38i4.11075; kung jin lee, w. e. king, negin dahya, and jin ha lee, “librarian perspectives on the role of virtual reality in public libraries,” proceedings of the association for information science and technology 57, no. 1 (2020): e254, https://doi.org/10.1002/pra2.254; hannah pope, “virtual and augmented reality in information technology and libraries december 2021 black, white, and grey | ellern and cruz 16 libraries,” library technology reports 54, no. 6 (september 8, 2018): 1–25; felicia ann smith, “‘virtual reality in libraries is common sense,’” library hi tech news 36, no. 6 (august 28, 2019): 10–13, https://doi.org/10.1108/lhtn-06-2019-0040; char booth, “from technolust to technorealism,” public services quarterly 5, no. 2 (june 2009): 139–42, https://doi.org/10.1080/15228950902868504. 11 megan frost, michael goates, sarah cheng, and jed johnston, “virtual reality: a survey of use at an academic library,” information technology and libraries 39, no. 1 (march 2020): 1–12. 
https://doi.org/10.6017/ital.v39i1.11369; jennifer grayburn, zack lischer-katz, kristina golubiewski-davis, and veronica ikeshoji-orlati, 3d/vr in the academic library: emerging practices and trends (washington, dc: council on library and information resources, 2019), https://eric.ed.gov/?id=ed597662; susan lessick and michelle kraft, “facing reality: the growth of virtual reality and health sciences libraries,” journal of the medical library association: jmla 105, no. 4 (october 2017): 407–17, https://doi.org/10.5195/jmla.2017.329; kenneth j. varnum, ed. beyond reality: augmented, virtual, and mixed reality in the library (chicago: american library association, 2019); richard smith and oliver bridle, “using virtual reality to create real world collaborations,” proceedings of the iatul conferences. paper 5 (2018): 10, https://docs.lib.purdue.edu/iatul/2018/collaboration/5/; carl r. grant and stephen rhind-tutt, “is your library ready for the reality of virtual reality? what you need to know and why it belongs in your library,” in o, wind, if winter comes, can spring be far behind? (charleston conference, 2019), https://doi.org/10.5703/1288284317070; dorothy carol ogdon, “hololens and vive pro: virtual reality headsets,” journal of the medical library association: jmla 107, no. 1 (january 2019): 118–21, https://doi.org/10.5195/jmla.2019.602. 12 grayburn et al., 3d/vr in the academic library, 8. 13 douglas bates, “library service study,” unpublished data, june 2, 2020; andrew m. cox, mary anne kennan, liz lyon, and stephen pinfield, “developments in research data management in academic libraries: towards an understanding of research data service maturity,” journal of the association for information science and technology 68, no. 9 (2017): 2182–2200, https://doi.org/10.1002/asi.23781; priti jain, “new trends and future applications/directions of institutional repositories in academic institutions,” library review 60, no. 2 (2011): 125–41, http://dx.doi.org/10.1108/00242531111113078; janice g. norris and elka tenner, “gis in academic business libraries: the future,” journal of business & finance librarianship 6, no. 1 (september 2000): 23, https://doi.org/10.1300/j109v06n01_03. 14 ross rubin, “vendors face the tough reality of affordable vr,” zdnet (july 13, 2020), https://www.zdnet.com/article/vendors-face-the-tough-reality-of-affordable-vr/. 15 sarah sharples, sue cobb, amanda moody, and john r. wilson, “virtual reality induced symptoms and effects (vrise): comparison of head mounted display (hmd), desktop and projection display systems.” displays 29, no. 2 (march 1, 2008): 58–69, https://doi.org/10.1016/j.displa.2007.09.005. 16 emily carl et al., “virtual reality exposure therapy for anxiety and related disorders: a metaanalysis of randomized controlled trials,” journal of anxiety disorders 61 (january 1, 2019): 27–36, https://doi.org/10.1016/j.janxdis.2018.08.003. information technology and libraries december 2021 black, white, and grey | ellern and cruz 17 17 edward castronova, on virtual economies, (rochester, ny: social science research network, july 1, 2002), https://papers.ssrn.com/abstract=338500. 18 barbara i. dewey, “the embedded librarian: strategic campus collaborations,” resource sharing & information networks 17, no. 1/2 (march 2004): 5–17; alessia zanin-yost, “academic collaborations: linking the role of the liaison/embedded librarian to teaching and learning,” college & undergraduate libraries 25, no. 2 (april 2018): 150–63, https://doi.org/10.1080/10691316.2018.1455548. 
19 xiaoping sheng and lin sun, “developing knowledge innovation culture of libraries,” library management 28, no. 1/2 (january 9, 2007): 36–52, https://doi.org/10.1108/01435120710723536. 20 lorcan dempsey, “libraries and the informational future: some notes,” information services & use 32, no. 3/4 (july 2012): 201–12, https://doi.org/10.3233/isu-2012-0670. 21 randall bass, “what’s the problem now?” to improve the academy: a journal of educational development 39, no. 1 (spring 2020), https://doi.org/10.3998/tia.17063888.0039.102; kate crowley and brian w. head, “the enduring challenge of ‘wicked problems’: revisiting rittel and webber,” policy sciences 50, no. 4 (december 1, 2017): 539–47, https://doi.org/10.1007/s11077-017-9302-4. 22 brady d. lund, isaiah omame, solomon tijani, and daniel agbaji, “perceptions toward artificial intelligence among academic library employees and alignment with the diffusion of innovations’ adopter categories,” college & research libraries 81, no. 5 (july 2020): 865–82, https://doi.org/10.5860/crl.81.5.865. 23 david a. abrahams, “technology adoption in higher education: a framework for identifying and prioritising issues and barriers to adoption of instructional technology,” journal of applied research in higher education 2, no. 2 (2010): 34–49, https://doi.org/10.1108/17581184201000012. 24 tilmann lindberg, christine noweski, and christoph meinel, “evolving discourses on design thinking: how design cognition inspires meta-disciplinary creative collaboration,” technoetic arts: a journal of speculative research 8, no. 1 (may 2010): 31–37, https://doi.org/10.1386/tear.8.1.31/1; nancy roberts, “wicked problems and network approaches to resolution,” international public management review 1, no. 1 (2000): 1–19. 25 heather leary and samuel severance, “using design-based research to solve wicked problems,” icls 2020 proceedings (june 2020): 1805-6, https://repository.isls.org/bitstream/1/6452/1/1805-1806.pdf; deborah l mulligan and patrick alan danaher, “the wicked problems of researching within the educational margins: some possibilities and problems,” in researching within the educational margins: strategies for communicating and articulating voices, ed. deborah l. mulligan and patrick alan danaher, (cham, switzerland: palgrave macmillan, 2020): 23–39, https://doi.org/10.1007/978-3-03048845-1_2. information technology and libraries december 2021 black, white, and grey | ellern and cruz 18 26 brown, harris, and russell, tackling wicked problems, ebook central; chris burman, marota aphane, and naftali mollel, “the taming wicked problems framework: reflections in the making,” journal for new generation sciences 15 (april 20, 2018): 51–73, https://www.researchgate.net/publication/324646298_the_taming_wicked_problems_fram ework_reflections_in_the_making; “convergence research at nsf,” national science foundation,” accessed october 21, 2021, https://www.nsf.gov/od/oia/convergence/. 27 alex jorgensen and kara lindaman, “practicing democracy on wicked problems through deliberation: essentials for civic learning and student development,” journal of management policy and practice 21, no. 2 (2020): 28–39, https://www.proquest.com/scholarlyjournals/practicing-democracy-on-wicked-problems-through/docview/2435720594/se-2; paul hanstedt, creating wicked students: designing courses for a complex world (sterling, virginia: stylus publishing, 2018), ebook central. 28 ronald barnett, “learning for an unknown future,” higher education research & development 31, no. 
1 (february 1, 2012): 65–77, https://doi.org/10.1080/07294360.2012.642841; stephanie wilson and lisa zamberlan, “design for an unknown future: amplified roles for collaboration, new design knowledge, and creativity,” design issues 31, no. 2 (spring 2015): 3–15, https://doi.org/10.1162/desi_a_00318; robin kundis craig, “resilience theory and wicked problems,” vanderbilt law review 73, no. 6 (december 2020): 1733–75, proquest; larry j leifer and martin steinert, “dancing with ambiguity: causality behavior, design thinking, and triple-loop-learning,” information knowledge systems management 10, no. 1–4 (march 2011): 151–73. 20180926 10703 editor president’s message: rebuilding our identity, together bohyun kim information technology and libraries | september 2018 2 bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, ri. ital is the official journal of lita (library and information technology association), and if you are a reader of the ital journal, it is highly likely that you are a member of lita and/or one who is deeply interested in library technology. it is my pleasure to write this column to update all of you about the exciting discussion that is currently underway in lita and two other ala divisions, alcts (association for library collections and technical services), and llama (library leadership and management association). as many of you know, lita began discussing the potential merger with two other ala divisions, alcts and llama, last year.1 what initially prompted the discussion was the prospect of continuing budget deficits in all three divisions. but the resulting conversation has proved that financial viability is not the entire story of the change that we want to bring about. at the 2018 ala annual conference in new orleans, the three boards of lita, alcts, and llama held a joint meeting open to members and non-members alike to solicit and share our collective thoughts, suggestions, concerns, and hopes about the potential three-division realignment. at this meeting attended by approximately 75 people, participants expressed their support for creating a new division with the following key elements. • retain and build upon the best elements of each division. • embrace the breakdown of silos and positive risk-taking to better collaborate and move our profession forward. • build a strong culture of innovation, energy, and inspiration. • be more transparent, responsive, agile, and less bureaucratic • excel in diversity, equity, and inclusion. • support members in all stages of their careers, those with the least means to travel for in-person participation, in particular. • provide member-driven interactions and good value for the membership fee. these ideas have made it clear that members of all three divisions see the goal of realignment as something much more fundamental than financial sustainability. they have validated the shared belief among the lita, alcts, and llama boards that the ultimate goal of realignment is to create a division that better serves and benefits members, not to simply recover the division’s financial health. while the criteria for the success of a new combined division received almost unanimous endorsement at the meeting, opinions about how to realize such success varied. there were understandable concerns associated with combining three small-sized associations into one large one. 
for example, how will we reconcile three distinctly different cultures in lita, alcts, and llama? how will the new association ensure itself to be more transparent, responsive, and rebuilding our identity, together | kim 3 https://doi.org/10.6017/ital.v37i3.10703 nimble than the individual divisions prior to the merger? could the larger size of the new division make it more difficult for small groups with special interests to get needed support for their programs? many requested that the leadership of the three divisions provide more specific vision and details. as a group, the leaders of lita, alcts, and llama are committed to hashing out those details. with the aim of providing fuller information about what the new division would look like at the 2019 midwinter conference, we have already formed working groups, one for finances and the other for communication and are currently working to create two more on operations and activities. these four teams will work closely together with the current leadership of lita, alcts, and llama, to prepare the most important information about the proposed new division, so that the boards and the members of three divisions can review and provide feedback for needed adjustments. our goal is to present essential information that will allow the members to vote with confidence on the proposal to form one new division on the ala ballot in the spring of 2019. if the membership vote passes, then we will be taking the proposal to the ala committee on organization for finalization. on this occasion, i would also like to bring to everyone’s attention to an inherent tension between the two ideas that many of us hold as association members regarding alignment. one is that more member involvement in determining alignment-related details at an early stage is essential to the success of the new division. the other is that we can decide whether we will support the new division or not, only after the leadership first presents us with a clear, specific, and detailed picture of what the new division will look like. the problem is that we cannot have both at the same time. as members, if we want to be involved at an early stage of reorganization, we will have to accept that there will be no ready-made set of clear and specific details about the division waiting for us to simply say yes or no. we will be required to work through our collective ideas to decide on those details ourselves. it will be a messy, iterative, and somewhat confusing process for all of us. there is no doubt that this will be hard work for both the lita leadership and lita members. but it is also an amazing opportunity. imagine a new division, where (a) innovative ideas and projects are shared and tested through open conversation and collaboration among library professionals in a variety of functional areas such as systems and technology, metadata and cataloging, and management and administration, (b) frank and inspiring dialogues take place between front-line librarians and administrators about vexing issues and exciting challenges, and (c) new librarians learn the ropes, are supported throughout their careers going through changes in their responsibilities as well as areas of specialization, are mentored to be future leaders, and get to develop the next generation of leaders as they themselves achieve their goals. furthermore, i believe that the process of building this kind of new association from the ground up will be a truly rewarding experience. 
we had an opportunity to discuss and share our collective hope and vision for the new division at the joint meeting, and that vision is an inspiring one: a division that is member-driven, nimble and responsive, transparent and inclusive, and not afraid to take risks. can we create a new association that breaks down our own silos and builds bridges for better communication and collaboration to move our profession forward? information technology and libraries | september 2018 4 my hope is that we can model and embody the change we want to see, starting in the reorganization process itself. if we want to build a new association that is inclusive, transparent, and nimble, we should be able to build such an association in precisely that manner: inclusively, transparently, and nimbly. if we are successful, our identity as members of this new division will be rebuilt as the very spirit and energy of continuing innovation, experimentation, and collaboration across different functional silos of librarianship, rather than as what we have in our job titles. many lita members and ital readers are leaders in their field and care deeply about the continued success and innovation of lita and ital. i would like to invite all of you to participate in this effort of three-division alignment and to inform and lead our way together. while the boards of three divisions are working on the proposal, there will be multiple calls for member participation. keep your eye out for new updates that will be posted in the ala connect community, “alcts/llama/lita alignment discussion” at https://connect.ala.org/communities/community-home?communitykey=047c1c0e-17b9-45b6a8f6-3c18dc0023f5. all information in this group site is viewable to the public. lita, alcts, and llama members can also join the group, post suggestions and feedback, and subscribe to updates. where would you like lita to be next year, and the year after? let us take lita there, together. endnote 1 andromeda yelton, “president’s message,” information technology and libraries 37, no. 1 (march 19, 2018): 2–3, https://doi.org/10.6017/ital.v37i1.10386. reproduced with permission of the copyright owner. further reproduction prohibited without permission. new strategies in library services organization: consortia university libraries in spain miguel duarte barrionuevo information technology and libraries; jun 2000; 19, 2; proquest pg. 96 new strategies in library services organization: consortia university libraries in spain miguel duarte barrionuevo new political, economic, and technological developments, as well as the growth of information markets, in spain have created a foundation for the creation of library consortia. the author describes the process by which different regions in spain have organized university library consortia. s panish libraries are public entities that depend either on central or local governments and are funded through either the national general budget or the regional government (comunidades aut6nomas) budget. on one hand, the player at the national level is the education and culture ministry, which contributes to the fifty-two state public libraries and shares jurisdiction with the regional government. on the other hand, universities are self-governed institutions of a public nature regulated by the ley de reforma universitaria, or university reform law, which was approved by the spanish parliament in 1983 to promote scientific study and greater selfgovernment of spanish universities. 
universities have their own budget, and they are mainly funded by the regional government. the university library system is currently made of about fifty public libraries and twelve private libraries. since the second half of the 1980s, a new philosophy concerning public services has spread in spain, as in other european countries: a philosophy calling for higher quality and more efficiency in the management and administration of the public capital. there has also arisen a claim to the government's satisfactory use of public funds as a social right, as well as a claim to a return on that capital in social terms. this is where libraries' public services come into play. there is a clear interest in all the aspects related to the introduction of new techniques in management. quality management, effectiveness and efficiency measuring, costs control, services assessment, and users content or analysis from the stakeholders' point of view are concepts that emerge in university libraries. in order to adjust to the circumstances, universities are changing their management procedures, and university libraries have been forced into managing their "business" according to managerial criteria. the commonality of their activities, and the relaxation of geographical boundaries fostered by information technologies, have encouraged libraries to join consortia in order to remain relevant in the current library services context. such concepts as the "electronic," "digital," and miguel duarte barrionuevo is head director of the central library of the university of cadiz (andalucia), and an active contributor of the university libraries consortium of andaluc1a. 96 information technology and libraries i june 2000 "virtual" libraries lead, from my point of view, to a different configuration in the library services context; they have pushed the library managers to consider strategically where they are and what is their most adequate position within this new configuration. departments dealing with information are to be wider, more heterogeneous, and multidisciplinary. new organization strategies need to be defined in order to offer services in a different way when library managers are forced to obtain the best results out of their limited resources, the organization of consortia represents a qualitative leap forward in cooperation, efficiency, and cost-savings. library consortia aim to share resources and to promote participation on the basis of the mutual benefit of the libraries involved and, although the concepts of cooperation, coordination, and sharing resources are not new in the library world, the organization of library consortia introduces a major level of commitment and involvement among the participants. i new settings, new facts libraries are going through a crisis. a library is still an institution with a strong traditional character, but its traditional duties as depository of knowledge no longer justify its costs, and the crisis is exacerbated by an accelerated technological and informative revolution. 1 within the changing atmosphere of the spanish university in the last few years, goals and objectives are affected by a number of socioeconomic, institutional, and technological factors, as well as others with an internal character that push these institutions to move toward change as an opportunity to maintain continuous improvement. materials and services are more expensive, and technology is more sophisticated every day, which leads to a need for strong investments. 
the public financing funds are more and more limited while the costs are growing. the university, in general, is suffering from a lack of efficiency and organizational flexibility; staff rejects monotonous tasks and holds high expectations; the fast dynamics of the implementation of information technology in the last few years has caused a very serious imbalance in the skill levels of people and in job-position demands. all these factors generate a new setting of weaknesses and hopes to which the university libraries have to respond in order to maintain their competitive advantages. i technology technology has recently become a strategic element in the development of libraries. technology is more and more sophisticated and its life is shorter. its use implies reproduced with permission of the copyright owner. further reproduction prohibited without permission. the need of strong investments in computer and communication infrastructure. i economical pressure on information market agents materials costs have diversified and arc more and more costly, with annual growths far exceeding even inflation rate levels. an absolute change has been produced in the supply and demand of the information market, which causes the agent's utter disorientation: the publishing sector is adapting very slowly to the electronic context; the distribution sector needs a deep technological and organizational transformation (few spanish suppliers offer added value services such as cataloguing, outsourcing, or material preparation-puvill libras, or filial multinationals such as blackwell or dawson are exceptions). electronic data interchange, a european standard like sisac, is not a standard format among the sector and there is not a national supplier that offers services of the approval plans type. additionally, the agents of the information market are very conditioned by the change of the demand orientation. specialized users (teachers, researchers, thesis students, etc.) demand from libraries electronic resources, quick information, and access at all times from remote locations. this conflicts with the restrictive tendencies in the maintenance of the public services and drastic budget cuts. libraries are forced to obtain the highest possible ratio of efficiency in the use of the fewest resources. i total quality management implementation and other management techniques the result is implementation of total quality management (tqm), which guarantees quality of services. it is important to consider tqm as an instrument that develops organizational strategies. it is a continuous process developed in order to replace obsolete types of organization, to orient the corporate activity as a permanent basis to the processing optimization, and to obtain a coherent relation between the efficacy in the reaching of objectives and the efficiency in the use of resources. changes in the editorial industry, the budget cuts, the quick expansion of electronic resources, the new price politics, and the problems related to copyright and intellectual property form the new setting. in this context, the consortia organization is considered by the university and library managers as a means to face the challenges which the new settings imply, to unify their pressure capacity with regard to the different agents, and to take advantage of the system's strength in order to adjust to the new situation and improve their competitive advantage. 
i adequate information technologies the spanish university libraries are connected to the academic information network upheld by rediris, a scientific-technical installation that depends on the science and technology office of the prime minister. the main line that maintains the redlris services is formed by seventeen nodes in each region (comunidad aut6noma), connected by atm circuits on atm accesses of 34/155 mbps. each node is formed by a set of communication equipment that allows the coordination of the main transmission means and of the access lines from the centers of each regions. redlris participates in the ten-34 project, which aims at building up an ip paneuropean net of 34 mbps, that interconnects us with the different academic and research nets and that is planned to become a ten-135 in 1999.2 on the other hand, the region (comunidad aut6noma) incorporates added value elements to the net segments they manage, such as faster access speeds that allow centralized architecture (for instance, union catalogue consortia libraries of galicia is managed through a broad band net of 155 mbps). the region also allows access to databases in cd-rom and electronic formats orientated to the final users in a regional context. for instance, the scientific computer center of andalucia manages twenty-two databases in cd-rom and other electronic formats that can be searched by all the andalusian universities and research centers through the andalusian scientific research net. homogeneous automation level the automation process of the library services, initiated at the end of the decade of the '80s, is practically completed. dobys-libis, libertas, vrls, absys, and sabini are the most widely used library management systems. 3 since 1997 some libraries have updated their library automation system to unicom (sirsi) and innopac (innovative interfaces). the spanish university libraries have a homogeneous automation level and can establish projects from the consortia perspective, such as regional union catalogs, sharing electronic information resources, and shared purchase policies. favourable political situation traditionally, the cooperative efforts have obtained little offical support. however, in the last years, a positive attitude can be perceived from the academic authorities in new stragetiges in library services organization i barrionuevo 97 reproduced with permission of the copyright owner. further reproduction prohibited without permission. relation to cooperation activities and the cooperative projects development, both as an answer to the need to reduce costs by sharing resources and as a means to face the growing and unstoppable demand from the users. the initiatives for the consortia organization are supported by highest academic level institutional agreements among the universities: principals and vice-principals of research (such is the case of the consortia of andalucia and madrid) or they are the result of initiatives taken by the autonomous government (galicia consortium) or a confluence of interests between the autonomous government and the universities (catalufla consortium). 
remote access to end users' information resources following the automation projects and the network technologies and data transmission development, most university libraries have made projects for all information resources integration and maintain a wide group of services: campuswide networks, catalogs, databases in cdrom (e .g., indice espanol de ciencias sociales y humanidades, indice espanol de ciencia y tecnologia, aranzadi legislaci6n y jurisprudencia, medline, abi inform, academic search) , e-mail , and remote access via internet. access to dll resources is available through the libraries management system opac web. there is access to any of these resources from any point connected to the network, whether from terminal servers, workstations, pcs, unix stations, or macs. i cooperation in spain up to the middle of the '80s, university libraries were separate realities with scattered funds and disorganized services; they were not structured as a system and they were lacking any tradition or mentality of cooperation. in a 1994 poll, only 40 percent of university library directors declared that cooperation among libraries was important. 4 we could say that the cooperation initiatives depend on the will of the people who obtain little support from the government. therefore, two different stages could be set: one in which cooperation is the result of personal actions, taken with no institutional support, in which local projects are undertaken ; or one in which individual initiatives are taken by the people in charge of libraries and a certain concern from the central government converge. will to share resources spain did not join the movement toward library automation until the '80s . at this time, the cooperative tenden98 information technology and libraries i june 2000 cies now associated with information and communication technologies were only slightly realized in the libraries. eventually, however, a consolidation of efforts took place, helping to bring about, at the end of the '80s and beginning of the '90s, some important cooperative initiatives out of which some specialized union catalogs could be brought. some of the first cooperative initiatives arose from the association of specialized libraries. 5 among these we can point out the coordinating committee of biomedical documentation, whose mission was to promote the cooperation and rationalization of document resources in the field of biomedicine. this committee holds conferences and maintains a union catalog of the daily publications on health services accessible through internet. 6 documat, created in 1988, groups together the libraries specializing in mathematics and maintains a union catalog of journals on which basis are organized plans of shared acquisition . mecano groups together the libraries of the schools of engineering and maintains a union catalog accessible through internet? early cooperative initiatives were also promoted by the library automation systems users groups. red universitaria espanola de dobis / libis began in 1990 when twelve universities using the system decide to create an online union catalog maintained by the university of oviedo. the libertas spanish users group maintains its union catalog associated with sls database, accessible online from bristol. 
rueca is the union catalog of absys users.8 need to cooperate in the early '80s a forum started in universities that attempted to influence the writing of the university statutes (as a result of the ley de reforma universitaria) and establish a general criterion for regulations. as a result of this debate, two documents have been published and have proved to be essential for subsequent cooperative development.9 some reports from conferences on university libraries held in 1989 in the university complutense of madrid had a wide influence at the national level, and the same year, fundesco produced a report about the state-of-the-art in automation in the spanish university libraries.10 the situation that is repeated in these reports about the libraries is extremely pessimistic. their evolution from 1985 to 1995 has been perfectly described by m. taladriz and l. anglada as "the lack of recognition of the role of university libraries ... the dispersion of bibliographical funds ... the general disorganization of the library services ...."11 in 1988, red de bibliotecas universitarias (rebiun, university libraries network) was created. although initially only nine university libraries were involved, the number grew to seventeen during the following years. the cooperative activities were centralized, and they obtained remarkable results in training, the improvement of library interlending, and in the publishing on cd-rom of bibliographical records from participant libraries. at the same time, and thanks to the celebration of the ifla congress in barcelona in 1993, the general need to create a wider discussion forum including all the university libraries and to obtain better cooperation and coordination was established. this idea crystallized with the creation of the conferencia de directores de bibliotecas universitarias y cientificas (cobiduce, the conference of university and scientific libraries directors). the first working meeting was held in november 1993.12 this led to the merging of rebiun with cobiduce in order to concentrate all the cooperation efforts into a single institution. a single institution, which kept the name of rebiun, was created in 1996. in 1998, rebiun became the local committee of the conferencia de rectores de las universidades espanolas (crue, conference of spanish university principals). rebiun has become the organization that oversees all the cooperation and coordination efforts in spanish academic libraries. rebiun activities include a union catalog published on cd-rom, "regulations for university and scientific libraries," agreements on interlibrary loans, and activities in different working groups.13 i university libraries' consortia in the past few years the transfer of powers to the autonomous regions on education and culture, a consequence of a constitutional order, has brought about another political and administrative context for the achievement of the libraries' objectives. the autonomous regions are now working on the design of regional development plans or regional information systems that are related, unfailingly, to the cooperative activity of the libraries of the territory. this initiative can be applied to university libraries as well as any other type of library, which, through their institutions, request their autonomous governments' assistance or funding in order to achieve cooperative projects. 
or it could be done the other way round: a government can outline an action plan for its libraries and suggest it to the potential participants. thus, the basis for consortia development was set in the second half of the '90s, encouraged by events like the conference held in cadiz, organized by the university carlos iii de madrid and university of cadiz libraries and ebsco information services (spanish branch) in 1998.

catalonia consortium of university libraries (consorcio de bibliotecas universitarias de cataluña)

we could sum up the earlier situation in catalonia as follows: the existence of newly automated libraries, few automated records, the use of their own automation systems, and the existence of only three universities. some cooperation background had already developed at this time: cruc, caps, and the joint selection of an automation system carried out by the universidad autónoma de barcelona and the universidad politécnica de cataluña. it was not until the '90s that positive factors combined to move the cooperative movement a step forward in catalonia. these positive factors were a homogeneous state of automation among university libraries, a good communications network, and the use of standards for library data recording. the previous cooperative movements and an analysis of the worldwide evolution of libraries helped in the building of a united view in which cooperation appeared as an additional instrument for the improvement of the library world. the university library directors of catalonia considered cooperation a way to accelerate the evolution of libraries, to create new services, to facilitate changes, and to save expenses. with this conviction, they wrote a proposal for the creation of a library network in catalonia, which in 1993 resulted in the interconnection of the university libraries in catalonia, followed in 1995 by the first steps toward the creation of the union catalog of the universities of catalonia. this catalog was fully operative in early 1996. at the end of 1996 the university library consortium of catalonia (cbuc) was created with the task of improving library services through cooperation.14 its objectives are:

• to create new working tools
• to improve services
• to build a digital library
• to take better advantage of resources
• to face together the changing role of libraries

the cbuc comprises the university of barcelona, the universidad autónoma de barcelona, the polytechnic university of catalonia, pompeu fabra university, the university of girona, the university of lleida, rovira i virgili university, the open university of catalonia, and the library of catalonia. the direction of cbuc is determined by a board of representatives from each of the institutions, an executive committee of six members, and a technical committee of library directors. a staff of seven runs the cbuc office, and different working groups audit active plans and study possible issues of concern.
university libraries consortium of the madrid region (comunidad autonoma de madrid)

the public university libraries based in the madrid region (universidad de alcala, universidad carlos iii, universidad complutense, universidad politecnica, universidad rey juan carlos, and universidad nacional de educación a distancia) are developing many cooperation programs with the following objectives:

• to facilitate access to information resources
• to improve the existing library services
• to test and promote the use of information and communication technologies
• to reduce costs by sharing resources15

two programs have already been initiated:

interlibrary loan. an agreement to obtain a faster delivery system for books and journal articles has been established. using the services of a private courier company, maximum delivery time from one university to another will be set to forty-eight hours. this service started working on the first of september.

training. different courses for the joint training of library staff are being organized on a cooperative basis.

in the future, other programs will be developed, including a union catalog (with the creation of a collective database that will also save cataloging costs by sharing bibliographical resources) and an electronic library, which will allow common access to electronic resources.

galician libraries consortium

the galician libraries consortium is the result of a regional government initiative.16 in november 1996 the xunta de galicia signed an agreement of scientific and technological collaboration with fujitsu icl spain in which the company agreed to develop the telecommunications infrastructure of the community: the galician information highway (agi: autopista gallega de la información). inaugurated in 1997, agi serves as the basis for projects with great political and social appeal. three projects were embarked upon:

• tele-teaching,
• tele-medicine, and
• access to libraries.

users have access to a loan service by which a loan may be requested from any library in the consortium. the loan works as it would in a local setting, with the same limitations, controls, and blocking of any other local loan system. the request to the system is sent online and is fulfilled within twenty-four to forty-eight hours. the consortium originally was to encompass all types of libraries, but as the project advanced, it was decided to restrict the collaboration to university libraries. this allowed the project to move forward with greater speed, because the member libraries had more narrowly defined interests and concerns. the xunta de galicia prepared the "protocol of intentions," which has been signed by the highest representatives of the three galician universities (universidad de santiago, universidad de la coruña, and universidad de vigo). this protocol is characterized by two essential ideas:

1. allow adequate time for planning individual incorporation into the consortium, so that each institution may participate at the rate it deems appropriate.
2. create a permanent working commission formed by representatives of the institutions involved, which will:
• answer existing and future questions;
• define the model of consortium that each organization desires to establish through specific objectives; and
• promote adequate measures in order to attain the objectives that have been designed.
andalucian university libraries consortium

in the era of the internet, electronic documents, and the virtual library, maintaining independent libraries is no longer tenable. in addition, the efforts needed to face the challenges of the information society and the changes that society is demanding of universities are destined to become weaknesses more than strengths in those institutions that face them individually. there are many reasons why it is advisable for libraries to approach these challenges collaboratively:

• the productivity and competitiveness that society demands of the universities
• the huge technological opportunities to share information
• the importance of the changes that are taking place in the products and services that the information market offers
• the high cost of the new products (e.g., e-journals)
• the need for very specialized knowledge in order to activate some of these services
• the growing demands of library users

the andalucian university libraries concluded that if they wished to stay current with information technologies, if they wished to continue implementing improved services, and if they wished to do so within their budgets, solid cooperation mechanisms would have to be established. in march 1998 the andalucian vice-principals of research requested the directors of the andalucian university libraries to analyze possible cooperative activities among the university libraries of the community. two goals were set in this meeting:

• the analysis of library automation products currently on the market.
• the analysis of the current individual management systems within the andalucian libraries (which, though automation varied among them, were each considered to be outdated) and the potential for sharing resources with the present systems, which is difficult because currently available systems may not be compatible with z39.50.

the object of this analysis is to define essential requirements so that the new systems to be implemented facilitate possible cooperative actions. this possible integration will not be simple: the university pablo de olavide, recently created, is planning to purchase its own system; the universities of seville, granada, and cordoba are using dobis-libis; and the universities of cadiz and malaga are using libertas and are preparing to update to innopac. the andalucian university libraries have studied some of the systems that the spanish market offers: absys (baratz, document systems), amicus (elias), innopac (sls), sabini (sabini library automation), and unicorn (sirsi). they are preparing a catalog of electronic information resources available in the andalucian university libraries in order to know which resources are available and preferred by the different universities. the andalucian university libraries consortium is in an early stage; while its organizational structure and functions are defined, its tasks are still being elaborated. the delegate commission of the vice-principals of research of the andalucian universities is responsible for this work. the commission is presided over by the vice-principal of the university of seville and formed by the directors of the andalucian libraries and the juridical consultant of the university of cordoba.
the commission will produce a working paper that outlines the main facets of the organization, based on the following general principles:

• to add value to the research computer network
• to favor the use of technologies that contribute to the improvement of production times and the design of efficient processes
• to apply economies of scale:
  • in the purchase of products and services
  • in repetitive tasks and activities
• to favor the use of information resources among the members of the andalusian universities and society in general

in order for the project to succeed, the following conditions must exist:

• a homogeneous situation among the libraries in terms of regulations and technical instruments used in the description of materials, data format, and information interchange format;
• the andalucian universities are connected with high-speed fiber-optic lines (32 mb);
• the administrative framework is clearly defined; and
• the responsible members of the andalusian university libraries are convinced that cooperation will substantially improve the quality of the library services in each university.

additionally, the following advantages must result:

• decline or leveling of production expenses
• economies of scale in the purchase of products such as computer systems, databases, and journal and electronic information subscriptions
• shared technical support
• shared training costs
• shared information resources through interlibrary loan

conclusions

the ultimate goal of cooperation is to join users and the documents and information they need; establishing relations among participant institutions is a means to that end. consortia represent the possibility of testing alternatives to the traditional automated library. they represent the potential to offer the best library services to a wider number of users with all the resources they possess. beyond simple cooperation that unites efforts and resources, consortia represent the possibility of testing innovative formulas of process management and service organization from a regional perspective.

references

1. miguel duarte, "evaluación del rendimiento aplicando sistemas de gestión de calidad, la experiencia de la biblioteca de la universidad de cadiz" [performance assessment implementing total quality management systems: the university library of cadiz experience], in xv jornadas de gerencia universitaria: modelos de financiación, evaluación y mejora de la calidad de la gestión de los servicios [15th university managers meeting: financing models, assessment, and quality assurance of services] (cadiz: university pr., 1997), 309-10; marta torres, "el impacto de las autopistas de la información para la comunidad academica y los bibliotecarios" [the impact of the information highway on the academic community and librarians], in autopistas de la información: el reto del siglo xxi (madrid: editorial complutense, 1996), 37-55.

2. victor castelo, "¿sueñan los informáticos con bibliotecas electrónicas?" [do computer scientists dream of electronic libraries?], round table in seminario sobre consorcios de bibliotecas [library consortia conference] (cadiz: university pr., 1999), 130; see also www.rediris.es, accessed apr. 24, 2000.

3. m. jimenez and alice keefer, "library automation in spain," program 26, no.
3 (1992): 225-37; assumpció estivill, "automation of university libraries in spain," telephasa seminar on innovative information services and information handling (tilburg, june 10-12, 1991); rebiun's statistical annual offers data about catalog automation.

4. luis anglada and margarita taladriz, "pasado, presente y futuro de las bibliotecas universitarias espanolas" [past, present, and future of spanish university libraries], in ix jornadas de bibliotecas de andalucía (granada: asociación andaluza de bibliotecarios, 1996), 108-31.

5. l. anglada, "cooperació bibliotecaria a espanya" [library cooperation in spain], item 95, no. 16: 51-67.

6. see www.doc6.es/cdb, accessed apr. 24, 2000.

7. see http://biblioteca.upv.es/bib/mecano, accessed apr. 24, 2000.

8. see www.uned.es/bibliote/biblio/ruedo.htm and www.baratz.es/rueca, accessed apr. 24, 2000.

9. "the library in the university: report on the university libraries in spain, produced by a working team formed by university librarians and teachers" (madrid: ministry of general culture of the book and libraries, 1985); "university libraries: recommendations about their regulations, conference on university libraries, 'castillo magalia,' las navas del marques, avila, may 27-28, 1986" (madrid: library coordination centre, 1987).

10. situación de las bibliotecas universitarias dependientes del mec [state of the art of the academic libraries under the education ministry] (madrid: universidad complutense, biblioteca, 1988); estudio sobre normalización e informatización de las bibliotecas científicas espanolas [study on standardization and automation of the spanish scientific libraries] (fundesco, 1989, unpublished).

11. luis anglada and margarita taladriz, 108.

12. see consorcios de bibliotecas [library consortia conference], maribel gomez campillejo, ed. (cadiz: cadiz univ. pr., 1999).

13. see www2.uji.es/rebiun, accessed apr. 24, 2000.

14. for more information about cbuc, see www.cbuc.es, accessed apr. 24, 2000.

15. marta torres, "los consorcios, forma de organización bibliotecaria en el s. xxi: una aproximación desde la perspectiva espanola" [consortia as a form of library organization in the 21st century: an approach from the spanish perspective], in consorcios de bibliotecas [library consortia conference], 17-35.

16. santiago raya, "el consorcio de bibliotecas de galicia" [galician library consortium], in consorcios de bibliotecas [library consortia conference], 117-25.

scientific serial lists

dana l. roth: central library, indian institute of technology, kanpur, u.p., india

this article describes the need for user-oriented serial lists and the development of such a list in the california institute of technology library. the results of conversion from eam to edp equipment and subsequent utilization of com (computer-output-microfilm) are reported.

introduction

prior to the dedication of the millikan memorial library, which houses the divisional collections in chemistry, biology, mathematics, physics, engineering, and humanities, the libraries at the california institute of technology were largely autonomous, reflecting the immediate needs of each division, and exhibited little attempt at interdivisional coordination of library purchases. with centralization of the major science collections, it became apparent that any efforts to reduce duplication, promote more effective library usage, and provide assistance in interdisciplinary research efforts would require a published list of serials and journals (1).

scientists vs. librarians

it is certainly a truism that serial publications constitute the backbone of a library's research collection.
particularly in the sciences, where serial publications serve as the primary record of past accomplishments, studies have shown that over 80 percent of the references cited in basic source journals are to serials (see table 1). citation of serials rather than monographs was greater in chemistry than in other sciences, and the overall order may reflect the efficiency of the respective abstracting/indexing services. in spite of the scientist's heavy dependence on serials, it appears that in most libraries little attempt has been made to reconcile the library record with practices found in the scientific literature.

table 1. percentage of citations to serials found in basic source journals for various scientific disciplines*

discipline | percentage of citations to serials
chemistry | 93.6
physiology | 90.8
physics | 88.8
mathematics | 76.8

*c. h. brown, scientific serials (chicago: association of college and research libraries, 1956).

this is in part due to the general acceptance of the library of congress dictum that serials should be cataloged according to the general principles laid down for monographs. fortunately, monographs are generally cited in the scientific literature by entries (author/title) which invariably appear in the library catalog. serials, however, present the special problems of so-called indistinctive titles, frequent title changes, and common reference to the abbreviated form of their title. most american libraries have followed the library of congress/union list practices and as a result have long suffered user complaints about the use of corporate entries for so-called indistinctive titles, entries under the latest form of title, and the treatment of prepositions and conjunctions as filing elements. these practices have been defended as attempts to extend the reference value of the catalog, but in doing so they create a number of problems and ambiguities which are only partially resolved by the annoying use of see references. the recent surge of interest in making the library "relevant" and more intimately involved with its users' needs must take into account that, in the minds of scientists, it is presumptuous to require them to remember cataloging rules when the library could just as well accommodate the scientific form. in recognition of the long-standing scientific tradition of describing serials by their titles (which considerably predates the corporate entry syndrome), the logical solution would be to provide title added entries for those serials whose main entry is in corporate form (2).

specific problems

1. even if scientists were to remember the basic rules for society publications and similar corporate entries, how are the exceptions shown in table 2 to be reconciled?

table 2. an example of the difficulties encountered in translating abbreviations of scientific journal titles into lc entries

abbreviation | scientific form of title | union list entry
bull acad pol sci | bulletin de l'academie ... | polska akademia nauk ...
pnas | proceedings of the national academy ... | national ...
jacs | journal of the american chemical ... | american ...
berichte | berichte der deutschen chemischen ... | deutsche ...
comp. rend. | comptes rendus ... | academie des sciences ...
ber. bunsen... | berichte der bunsen... | deutsche bunsen...
bull. soc. chim. belges | bulletin des societes ... | bulletin des societes ...
bull. soc. chim. france | bulletin de la societe chimique de france | societe chimique de france
2. the practice of cataloging serials under their latest title serves mainly as an obstruction to determining the library's holdings, since references given in the scientific literature and citations obtained from abstracting/indexing services are obviously to the title currently in use. another important factor that is sometimes overlooked is the requirement of a classified shelf arrangement. otherwise, since the title of the bound volume corresponds to the title in use at the time of binding, you have the ambiguity of the catalog referring to the latest title and the shelf locator referring back to the earlier title. these problems are further complicated by the long delays and backlogs in recataloging. in many large libraries this is a major function of serials catalogers, and it is estimated that it takes 50 percent longer to recatalog than to catalog originally (3).

3. the jargon of scientists when discussing or requesting information about various periodicals is replete with acronyms and abbreviated forms. jacs, pnas, berichte, comptes rendus, annalen all have well-defined meanings in scientific literature and conversation because of the well-developed title entries and abbreviations given in physics abstracts, chemical abstracts, and the world list of scientific periodicals. the use of prepositions and conjunctions as filing elements constrains these scientists to being able to translate these abbreviations only into title entries where the omitted words are obvious, e.g., journal of the american chemical society, but often causes problems with titles like journal of the less-common metals.

the cal tech serials list: objectives and procedures

the publication of a serials list oriented to the needs of scientists must then provide for: scientific title entries for corporate and society publications, treatment of each title change as the cessation of the old title, and omission of prepositions and conjunctions as filing elements. these practices will increase the number of entries by about 40 percent over the number of current titles, but in terms of user appreciation the extra expense is amply justified. the list can then be a logical extension of the library's reference service and offers the opportunity of facilitating the research efforts of its users by obviating the need to remember cataloging rules or visit the library to determine its holdings. input to the serials list was derived from the library's serials card catalog. the information was typed on oversize card stock and included the full main entry, holdings, and divisional library location, with additional data cards, as required, to reflect title changes. with this data base, an extensive search of the world list of scientific periodicals and the list of periodicals abstracted by chemical abstracts was made to determine the additional scientific title entries to be incorporated in the list.
(each departmental library provides a shelf locator which relates the various forms of entry in the serials list to that chosen for the bindery title and subsequent shelf location.) prepositions and conjunctions were replaced with ellipses in the final typing of the multilith stencils required for the manual publication of the first edition of the cal tech serials list (4). during the spring of 1969, the decision was made to employ edp techniques in the publication of the second edition of the list. as an interim housekeeping device between editions, the author maintained an in-house supplement on punch cards using a single-card format. this experience indicated an unacceptable severity of title abbreviation, which was obviated by adopting a two-card format. this is consistent with the ibm 360 system, wherein input records are read two cards at a time, and thus the unit record may be thought of as a "super" card of 160 columns (of which only a maximum of 131 columns can be printed on a given line, the remaining 29 columns being used for internal records). the unit serials record consists of the title, holdings, divisional library, serial number, and spacing command (see table 3). the unit records were created directly from the existing serial list and the cumulated supplement by in-house clerical staff. this obviated the usual requirement of coding the data for keypunch operators. subsequent to the preparation of the unit records, having an alphabetical sequence of punched cards, it was a simple matter to program the computer to serially number each second card, using one letter and six digits. an example of the distribution of titles one might expect is given in table 4.

table 3. the unit serials record

card no. | columns | field designation
1 | 1-75 | title
2 | 1-27 | holdings
2 | 29-32 | divisional library
2 | 72-78 | serial no.
2 | 80 | spacing command

table 4. distribution of titles by initial letter

letter | number of title entries
a | 1,024
b-d | 1,126
e-i | 1,199
j-m | 1,272
n-r | 1,413
s-z | 1,471

while the data conversion was being performed, a series of programs was written. these programs were designed to create a master tape, update the tape, and produce a variety of listings. these listings, in addition to the required 131-column printout for the serial list, include the 160-column printout (in sequential 80-column units) and printouts for individual divisional libraries which can be annotated with shelf locations. the data base was then transferred from punch cards to magnetic tape, and subsequent additions and changes involve punch cards and tape-one-onto-tape-two operations. as a protective device, tape one and tape two are the current and previously current tapes, respectively. thus in the case of accident the preceding tape can again be updated. as a further precaution the original punch card data base and update decks are on file. the economic justification for the use of edp equipment in libraries is based upon the necessity of maintaining current records that can be published at regular intervals. in the special case of serial lists this involves the periodic merging of small numbers of new and corrected unit records with the much larger number of unit records in the existing data base. the use of serially numbered unit records allows the relatively easy machine function of merging numbered items, in contrast with the difficulties involved in merging large alphabetical fields.
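to make the record layout and the batch update concrete, the sketch below restates the two-card unit record of table 3 and the serial-number merge in python. this is purely a modern illustration under the field positions given in table 3, not the original 1970s tape programs; the names (UnitRecord, parse_unit_record, merge_by_serial_number) are hypothetical.

```python
# a minimal sketch of the two-card unit serials record (table 3) and of
# merging a batch of updates into the master file by serial number.
from dataclasses import dataclass

@dataclass
class UnitRecord:
    title: str      # card 1, columns 1-75
    holdings: str   # card 2, columns 1-27
    division: str   # card 2, columns 29-32
    serial_no: str  # card 2, columns 72-78 (one letter and six digits)
    spacing: str    # card 2, column 80 (spacing command)

def parse_unit_record(card1: str, card2: str) -> UnitRecord:
    """Build one unit record from a pair of 80-column card images."""
    return UnitRecord(
        title=card1[0:75].rstrip(),
        holdings=card2[0:27].rstrip(),
        division=card2[28:32].rstrip(),
        serial_no=card2[71:78],
        spacing=card2[79:80],
    )

def merge_by_serial_number(master, updates):
    """Merge new and corrected records into the master list.

    An update whose serial number already exists replaces the old record;
    a record with a new serial number is inserted in sequence.
    """
    merged = {rec.serial_no: rec for rec in master}
    merged.update({rec.serial_no: rec for rec in updates})
    return [merged[key] for key in sorted(merged)]
```

the design point illustrated here is the one the article makes: keying the merge on a single short, strictly ordered serial number keeps the periodic update cheap, in contrast with merging on long, irregular alphabetical title fields.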
recent advances in reprographic technology suggested that com (computer-output-microfilm) could be utilized to produce a quality catalog, free of the normal objections to "computer printout." the flexibility of currently available com units allows the acceptance, as input, of a normal print tape from most computer systems (ibm, burroughs, univac) without reformatting (6). the print processors resident in the front-end computer of the fr-80, for example, allow for upper- and lowercase, bold characters, column format, pagination, and sixty-four character sizes. variation in character size allows a maximum density of 170 characters per line and 120 lines per (8 1/2 x 11) page. the application of com equipment requires the production of a "print tape." this is simply a coded version of the current tape which contains the additional instructions necessary for spacing the unit records, defining the page size, and inserting "continued on next page" statements. the use of the spacing command instruction, as an integral part of the unit record, allows all the information on a given title to remain in one unit and easily provides for a blank line before the next title (see table 5).

table 5. data presentation and relative spacing

title | holdings | divisional library
faraday society, london | |
  discussions | 1,1947+ | chem
  | 10,1951+ | c eng
  symposia | 1,1968+ | chem
  | 1,1968+ | c eng
  transactions | 1,1905+ | chem
  | 46,1950+ | c eng
farber-zeitung | 1889-1918 | chem

the additional problem of keeping the information on one title together on a given page or providing a "continued on next page" statement was solved by analyzing the information in the eighty-ninth line of each page to determine whether to print another line, insert the "continued on next page" instruction, or begin the title on the next page. once the film is generated, it is a simple matter to produce plates for the multilith production of hard copy (7). the choice of a ninety-lines-per-page format was influenced, in part, by our desire to use the serials list to break down the reluctance shown by faculty and students toward microformats. this format results in a one-third reduction of the 112-column computer printout and enables our 5,000 current titles to be accommodated on two microfiches (152 pages).

footnotes

1. for the purposes of this article, periodical and serial are synonymous and refer to publications which may be suspended or cease but never conclude. the term "serials list" should be restricted to publications which record only serial titles (and supplementary information to distinguish between similar titles), holdings, and internal records. library catalogs and union lists are quite sufficient sources for relating a title to its successor or precedent, and providing full bibliographic detail.

2. p. a. richmond and m. k. gill, "accommodation of nonstandard entries in a serials list made by computer," journal of the american society for information science 11:240 (1970); dana l. roth, "letters to the editor; comments on the 'accommodation of nonstandard entries ...,'" journal of the american society for information science (in press).

3. andrew d. osborn, serial publications (chicago: american library association, 1955).

4. e. r. moser, serials and journals in the c.i.t. libraries (pasadena: california institute of technology, 1967).

5. dana l. roth, serials and journals in the c.i.t. libraries (2nd ed.; pasadena: california institute of technology, 1970).
6. robert f. gildenberg, "technology profile; computer output microfilm," modern data 3:78 (1970).

7. computer micrographics, inc., los angeles, california.

virtual reality: a survey of use at an academic library

megan frost, michael goates, sarah cheng, and jed johnston

information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11369

megan frost (megan@byu.edu) is physiological sciences librarian, brigham young university. michael goates (michael_goates@byu.edu) is life sciences librarian, brigham young university. sarah cheng is an undergraduate student, brigham young university. jed johnston (jed_johnston@byu.edu) is innovation lab manager, brigham young university.

abstract

we conducted a survey to inform the expansion of a virtual reality (vr) service in our library. the survey assessed user experience, demographics, academic interests in vr, and methods of discovery. currently our institution offers one htc vive vr system that can be reserved and used by patrons within the library, but we would like to expand the service to meet the interests and needs of our patrons. we found use among all measured demographics and sufficient patron interest for us to justify expansion of our current services. the data resulting from this survey and the subsequent focus groups can be used to inform other academic libraries exploring or developing similar vr services.

introduction

virtual reality (vr) is commonly defined as an experience in which a user remains physically within their real world while entering a virtual world (comprising three-dimensional objects) using a headset with a computer or a mobile device.1 vr is part of a spectrum of related technologies ranging from mostly real experiences to completely virtual experiences, such as augmented reality, augmented virtuality, and mixed reality.2 extended reality (xr) is a term often used when describing these technologies as a whole. many different xr devices and services are available in academic libraries. the most popular xr devices used in libraries are the htc vive, the oculus rift by facebook, and google cardboard.3 other common xr devices include gearvr by samsung and playstation virtual reality by sony.4 the htc vive and oculus rift are technologies that provide an immersive virtual-reality experience. google cardboard provides both non-immersive virtual reality and augmented reality experiences, while mixed reality is provided through various technologies such as microsoft's hololens and mixed-reality headsets from hp, acer, and magic leap. in addition, many academic libraries are using augmented reality apps that can be downloaded on patrons' personal mobile devices.5 academic libraries are starting to offer various xr services to increase engagement with patrons and teach information literacy.6 despite the increase in xr service offerings, there is little consistency in the devices used or in how these services are developed at academic libraries, and there is substantial variation in the types of services offered.
for example, some libraries make vr headsets available for in-house activities, such as storytelling, virtual travel, virtual gaming, and the development of new skills.7 other libraries, notably ryerson university library and archives in toronto, let students and faculty borrow their oculus rift headsets for two or three days at a time.8 some university libraries lend out headsets or 360-degree cameras or provide a virtual-reality space for students to develop content.9 the university of utah library offers an open-door, drop-in vr workshop once a week.10 claude moore health sciences library at the university of virginia implemented a project that educated its students and staff on the uses of vr in the health field through a combination of large-group demonstrations, one-on-one consultations, and workshops.11 the xr field is developing quickly, and xr services have the potential to benefit students academically. some universities are already offering classes on vr platforms.12 this is particularly true in fields that are high risk or potentially discomforting. for example, students in medical fields benefit by practicing virtually before attempting surgery on a human body.13 in addition to potential surgical benefits, the university of new england has been utilizing xr technology to teach empathy to its medical and other health profession students by putting the learner in the place of their patients.14 other examples of xr usage in the health fields include a recent attempt to introduce vr in anatomic pathology education and the use of virtual operating rooms to train nurses and educate the public.15 one recent study measured the effectiveness of using vr platforms in engineering education and found a drastic improvement in student performance.16 many educational institutions outside of the university setting have also started exploring how xr could be used to enhance students' educational experience. this technology has already progressed from being considered a novelty to being an established tool to engage learners.17 one of the perceived benefits of xr use in public libraries by both library patrons and staff is the ability of xr technology to inspire curiosity and a desire to learn.18 in some school programs, students are able to advance their learning through xr apps that allow them not only to absorb information but also to experience what they are learning through hands-on activities and full immersion without danger (e.g., hazardous science experiments) or high cost (e.g., traveling to another country).19 xr has the potential to increase the overall engagement of students, which, according to carini, kuh, and klein's 2006 study, is correlated to how well students learn.20 xr has the ability to capture the attention of students and eliminate distractions. this is particularly true for students with attention deficit disorder, anxiety disorders, or impulse-control disorder.21 the application of xr goes beyond traditional classroom settings. a case study assessing the benefits of vr in american football training found that players showed an overall improvement of 30 percent after experiencing game plays created by their coaches in a virtual environment.22 although these studies were not conducted in an academic library or university setting, their results are transferable.
it is beneficial to academic libraries to provide technologies to their patrons that enhance and advance their learning. currently, xr apps available for purchase on the google app store are still limited. most app development comes from private companies; however, some universities are giving their students the opportunity to develop xr content.23

objectives

at brigham young university, we want our vr services to foster the link between academic achievement and virtual reality. in order to do this effectively, our first objective is to determine which vr services will be of most benefit to our patrons. to inform the expansion of future vr services, we conducted a survey of patrons using current vr services in the library. this survey is also intended to help other libraries that are developing vr services and potentially developers interested in creating academic content for students. we were primarily interested in user experience, demographics, academic interests in vr, and methods of discovery.

methods

during one semester, january through april 2018, we asked individuals to complete a questionnaire following their use of the library's htc vive system. this questionnaire was administered through an online qualtrics survey that was distributed via email to patrons after using the library's vr equipment. it consisted of thirteen questions that gathered basic demographic information as well as information on patron interests and experiences with the library's vr services. the complete survey used in this study can be found in appendix a. currently the harold b. lee library at brigham young university offers one htc vive vr system that can be used on site in the science and engineering area of the library. it is primarily operated by student employees who work at the science and engineering reference desk. time slots are reserved through an online registration system on the library's website. in order to gather more in-depth, qualitative data on patron experience with the library's vr services, we also conducted a focus group with vr users. we recruited participants by adding a question at the end of the qualtrics survey asking whether the responder would be interested in participating in a focus group. all focus group participants received a complimentary lunch. during the focus group, we asked a series of five questions to gain a deeper understanding of users' vr experience at the library. in particular, we asked participants to explain what went well during their vr experience in the library, what difficulties they experienced, how they envisioned using vr for both academic and extracurricular purposes, and what type of vr content (e.g., software or equipment) they would like the library to acquire. the focus group facilitator asked follow-up questions for clarification as needed. the session was audio recorded, and participant responses were transcribed and coded for themes.

results and discussion

demographics

the most frequent users of the vr equipment in the library were male students in the science, technology, engineering, or mathematics (stem) disciplines. the percentage of male students at brigham young university is roughly 50 percent, but over 70 percent of our survey respondents were male. that stated, there was considerable use among all measured demographics, as shown in figure 1. over one third of responders were not students.
university faculty made up 11 percent of responders during the survey period. the proportion of faculty who responded was higher than the university's faculty-to-student ratio and likely the result of directly advertising the service to non-student university employees. because some users informed librarians that they had brought spouses and children to use the equipment, we estimate that the 7 percent of responders who were neither students nor university employees mostly consisted of family or friends accompanying students or employees. over one third of student responders were majoring in disciplines outside of science, technology, engineering, and mathematics. this number is small when compared to the number of students in these majors across campus (approximately 63 percent of students on campus are not majoring in stem disciplines); however, it demonstrates that there is an interest in vr technology throughout the university. as the vr services are located in the science and engineering area of the library, it is not surprising that more students majoring in these disciplines used these services when compared to students majoring in other disciplines. in fact, 15 percent of responders learned about the services at the reference desk, where they could see other patrons using the vr equipment. the most common discovery method, however, was the various forms of advertisements targeted to both students and employees of brigham young university, as shown in figure 2.

figure 1. demographics.

figure 2. most effective discovery methods: advertisement and word-of-mouth.

only 7 percent of responders identified research or class assignments as their primary reason for using the services. the large majority of use, as shown in figure 3, was simply for entertainment or fun. this was not unexpected, especially as most of the users were trying the technology for the first time (see figure 4). however, because we purchased the equipment with the intent to support academic pursuits on campus, we hoped to see a higher percentage of academic use.

figure 3. most responders came because it sounded fun.

figure 4. most responders were first-time users.

faculty use was higher than expected (see figure 5). eleven percent of users during our survey period were faculty. the majority of these responders indicated an interest in potentially using vr technology with their students (see figure 6). while this interest was positive, faculty member suggestions for classroom use remained hypothetical, without any concrete intentions for implementation. this suggests that although faculty interest exists, faculty may need to be informed of specific application ideas in order to be more likely to incorporate this technology into their courses.

figure 5. faculty were interested in trying the vr equipment.

figure 6. faculty were interested in using vr academically.

a clear majority (72 percent) indicated an intention of returning to the library to use the service again (see figure 7).
figure 7. most responders intend to return.

because our vr services were a small pilot program at the time of the survey, we did not offer a large number of paid apps to users. table 1 displays the most common apps used by survey responders. most users tried google earth during their session, and employees at the reference desk often recommended this app to new users. another common app for new users was the lab, which includes a few small games showcasing the current capabilities of vr. google tiltbrush is an app for creating 3d art. virtual jerusalem is an app that was created by faculty at brigham young university and allows users to walk around and explore the jerusalem temple mount during the time of christ. the fifth-most-used app we offered was 3d organon vr anatomy, which teaches human anatomy.

table 1. top five apps used.
1. google earth
2. the lab
3. tiltbrush by google
4. virtual jerusalem
5. 3d organon vr anatomy

focus group data

we conducted a total of three focus group sessions. each session included between five and eight participants, for a total of twenty-one focus group participants. because we were primarily interested in student responses, we limited focus group participants to students enrolled at brigham young university. the participants were asked to describe what did or did not go well during their vr session. when describing what went well during their vr session, many participants responded with positive comments about the quality of service the library employees provided during their session. most participants expressed satisfaction with the number and quality of the apps provided by the library. during all three focus groups, participants mentioned that they liked how easy it was to sign up for the vr services. the most common problems reported by participants related to health or safety concerns, such as feeling dizzy, bumping into objects in the room because of the lack of space, and tripping over the headset wire. others reported problems related to the level of personal or social comfort with the vr services, such as feeling self-conscious using vr in a semi-open space not exclusively devoted to vr services or being told to be quieter. when asked about ways the library could improve its vr services, the students suggested solutions to many of these problems. a frequent recommendation was that the library dedicate a space to vr. the reasons for this suggestion included minimizing the risk of accidentally bumping into objects, reducing the embarrassment of using the vr equipment in front of spectators, and allowing participants to become more fully immersed in the vr experience without worrying about being too loud. other common suggestions included providing more than one headset for multiple patrons to use for gaming purposes or team projects, acquiring wireless headsets to eliminate wire tripping hazards, and providing more online training videos to reduce reliance on library workers for common troubleshooting problems. participants did not provide actionable suggestions on ways to decrease dizziness while operating vr equipment. when asked about how the students could see themselves using vr academically, many responded with some of the more well-known uses of vr technology, such as potential uses in science, medicine, engineering, and the military.
however, some students had a very hard time determining how vr could be applied to humanities fields such as english. after some discussion, most students were able to see the relevance of vr in their field, but some said that they most likely would not pursue those functions of vr, using vr exclusively for extracurricular activities. in contrast to the lack of academic uses envisioned by focus group participants, participants had substantially more ideas about how they would use vr for extracurricular purposes, including playing games for stress relief, exercising, exploring the world, and watching movies. many expressed interest in using vr for extracurricular learning outside their majors, such as virtually being part of significant historic events, exploring ecosystems, and visiting museums or other significant landmarks. students expressed interest in exploring the many possibilities provided by vr technology but were not especially aware of or interested in how vr might apply to their specific field of study unless they were in an engineering, medical, or other science-related discipline.

conclusions

vr is a rapidly growing field, and academic libraries are already providing students access to this technology. in our study, we found considerable interest across campus in using vr in the library; however, the academic interest and use were not as high as we hoped. future marketing to faculty might benefit from specifically suggesting ideas for academic uses or collaboration. even though our current vr services are located at the science and engineering help desk, nearly 40 percent of users were not in stem disciplines. this is encouraging and suggests value in marketing future vr services to all library patrons. we also found sufficient patron interest to justify exploring related vr services, such as offering classes on creating content and acquiring less expensive headsets that can be borrowed outside of the library. although this survey was limited to one university, we believe the results can be used to inform other academic libraries as they develop similar vr services.

endnotes

1 susan lessick and michelle kraft, "facing reality: the growth of virtual reality and health sciences libraries," journal of the medical library association: jmla 105, no. 4 (2017): 407.

2 paul milgram et al., "augmented reality: a class of displays on the reality-virtuality continuum," in telemanipulator and telepresence technologies 2351 (international society for optics and photonics, 1995), 282-92.

3 hannah pope, "incorporating virtual and augmented reality in libraries," library technology reports 54, no. 6 (2018): 8.

4 sarah howard, kevin serpanchy, and kim lewin, "virtual reality content for higher education curriculum," proceedings of vala (melbourne, australia: libraries, technology and the future inc., 2018), 2.

5 zois koukopoulos and dimitrios koukopoulos, "usage scenarios and evaluation of augmented reality and social services for libraries," in digital heritage. progress in cultural heritage: documentation, preservation, and protection (springer international, 2018), 134-41; leanna fry balci, "using augmented reality to engage students in the library," information today europe/ili365 (november 17, 2017), https://www.infotoday.eu/articles/editorial/featured-articles/using-augmented-reality-to-engage-students-in-the-library-121763.aspx.
6 bruce massis, "using virtual and augmented reality in the library," new library world 116, nos. 11-12 (2015): 789, https://doi.org/10.1108/nlw-08-2015-0054.

7 adetoun a. oyelude, "virtual and augmented reality in libraries and the education sector," library hi tech news 34, no. 4 (2017): 3, https://doi.org/10.1108/lhtn-04-2017-0019.

8 weina wang, kelly kimberley, and fangmin wang, "meeting the needs of post-millennial: lending hot devices enables innovative library services," computers in libraries (april 2017): 7.

9 "oxford libguides: virtual reality: borrowing vr equipment," bodleian libraries, https://ox.libguides.com/vr/borrowing; "virtual reality services," penn state university libraries, https://libraries.psu.edu/services/virtual-reality-services; "vr studio," north carolina state, https://www.lib.ncsu.edu/spaces/vr-studio.

10 oyelude, "virtual and augmented reality," 3.

11 lessick and kraft, "facing reality: the growth of virtual reality," 409.

12 oyelude, "virtual and augmented reality," 3.

13 medhat alaker, greg r. wynn, and tan arulampalam, "virtual reality training in laparoscopic surgery: a systematic review & meta-analysis," international journal of surgery 29 (2016): 86, https://doi.org/10.1016/j.ijsu.2016.03.034.

14 elizabeth dyer, barbara j. swartzlander, and marilyn r. gugliucci, "using virtual reality in medical education to teach empathy," journal of the medical library association: jmla 106, no. 4 (2018): 498, https://doi.org/10.5195/jmla.2018.518.

15 emilio madrigal, shyam prajapati, and juan hernandez-prera, "introducing a virtual reality experience in anatomic pathology education," american journal of clinical pathology 146, no. 4 (2016): 462, https://doi.org/10.1093/ajcp/aqw133; nils fredrik kleven et al., "training nurses and educating the public using a virtual operating room with oculus rift," ieee (2014): 1, https://doi.org/10.1109/vsmm.2014.7136687.

16 wadee alhalabi, "virtual reality systems enhance students' achievements in engineering education," behaviour & information technology 35, no. 11 (2016): 925, https://doi.org/10.1080/0144929x.2016.1212931.

17 patricia brown, "how to transform your classroom with augmented reality—edsurge news," edsurge, november 2, 2015, https://www.edsurge.com/news/2015-11-02-how-to-transform-your-classroom-with-augmented-reality.

18 negin dahya et al., "virtual reality in public libraries," university of washington information school, https://ischool.uw.edu/vrinlibraries.

19 del siegle, "seeing is believing: using virtual and augmented reality to enhance student learning," gifted child today 42, no. 1 (2019): 46, https://doi.org/10.1177/1076217518804854.
20 guillaume loup et al., "immersion and persistence: improving learners' engagement in authentic learning situations," 11th european conference on technology enhanced learning (2016): 414, https://doi.org/10.1007/978-3-319-45153-4_35; robert carini, george kuh, and stephen klein, "student engagement and student learning: testing the linkages," research in higher education 47, no. 1 (2006): 23-4, https://doi.org/10.1007/s11162-005-8150-9.

21 mariano alcaniz, elena olmos-raya, and luis abad, "use of virtual reality for neurodevelopmental disorders: a review of the state of the art and future agenda," medicina (buenos aires) 79, nos. 77–81 (2019): 419-20, https://doi.org/10.21565/ozelegitimdergisi.448322.

22 yazhou huang, lloyd churches, and brendan reilly, "a case study on virtual reality american football training," proceedings of the 2015 virtual reality international conference 6 (2015): 3, https://doi.org/10.1145/2806173.2806178.

23 "media lab," massachusetts institute of technology, https://libraries.psu.edu/services/virtual-reality-services; "the ischool technology resources at fsu: virtual reality," florida state university libguides, https://guides.lib.fsu.edu/ischooltech/vr.

it is our flagship: surveying the landscape of digital interactive displays in learning environments

lydia zvyagintseva

information technology and libraries | june 2018 https://doi.org/10.6017/ital.v37i2.9987

lydia zvyagintseva (lzvyagintseva@epl.ca) is the digital exhibits librarian at the edmonton public library in edmonton, alberta.

abstract

this paper presents the findings of an environmental scan conducted as part of a digital exhibits intern librarian project at the edmonton public library in 2016. as part of the library's 2016–2018 business plan objective to define the vision for a digital exhibits service, this research project aimed to understand the current landscape of digital displays in learning institutions globally. the resulting study consisted of 39 structured interviews with libraries, museums, galleries, schools, and creative design studios. the environmental scan explored the technical infrastructure of digital displays, their user groups, various uses for the technologies within organizational contexts, the content sources, scheduling models, and resourcing needs for this emergent service. additionally, broader themes surrounding challenges and successes were also included in the study.
despite the variety of approaches taken among learning institutions in supporting digital displays, the majority of organizations have expressed a high degree of satisfaction with these technologies.

introduction

in 2020, the stanley a. milner library, the central branch of the edmonton (alberta) public library (epl), will reopen after extensive renovations to both the interior and exterior of the building. as part of the interior renovations, epl will have installed a large digital interactive display wall modeled after the cube at queensland university of technology (qut) in brisbane, australia. to prepare for the launch of this new technology service, epl hired a digital exhibits intern librarian in 2016, whose role consisted of conducting research to inform the library in defining the vision for a digital display wall serving as a shared community platform for all manner of digitally accessible and interactive exhibits. as a result, the author carried out an environmental scan and a literature review related to digital displays, as well as their consequent service contexts. for the purposes of this paper, "digital displays" refers to the technology and hardware used to showcase information, whereas "digital exhibits" refers to content and software used on those displays. wherever the service of running, managing, or using this technology is discussed, it is framed as "digital display service" and concerns both technical and organizational aspects of using this technology in a learning institution.

method

the data were collected between may 30 and august 20, 2016. a series of structured interviews was conducted by skype, phone, and email. the study population was assembled by searching google and google news for keywords such as "digital interactive and library," "interactive display," "public display," or "visualization wall" to identify organizations that have installed digital displays. a list of the study population was expanded by reviewing websites of creative studios specializing in interactive experiences and through a snowball effect once the interviews had begun. a small number of vendors, consisting primarily of creative agencies specializing in digital interactive services, were also included in the study population. participants were then recruited by email. the goal of this project was to gain a broad understanding of the emergent technology, content, and service model landscape related to digital displays. as a result, structured interviews were deemed to be the most appropriate method of data collection because of their capacity to generate a large amount of qualitative and quantitative data. in total, 39 interviews were conducted. a list of interview questions prepared for the interviews is included in appendix a. additionally, a complete list of the study population can also be found in appendix b. predominantly, organizations from canada, the united states, australia, and new zealand are represented in this study.

literature review

definitions

• public displays, a term used in the literature to refer to a particular type of digital display, can refer to "small or large sized screens that are placed indoor . . .
or outdoor for public viewing and usage" and which may be interactive to support information browsing and searching activities.1 in public displays, a large proportion of users are passers-by and thus first-time users.2 in academic environments, these technologies may be referred to as "video walls" and have been characterized as display technologies with little interactivity and input from users, often located in high-traffic, public areas with content prepared ahead of time and scheduled for display according to particular priorities.3

• semi-public displays, on the other hand, can be understood as systems intended to be used by "members of a small, co-located group within a confined physical space, and not general passers-by."4 in academic environments, they have been referred to as "visualization spaces" or "visualization studios," and can be defined as workspaces with real-time content displayed for analysis or interpretation, often placed in libraries or research department units.5 for the purposes of this paper, "digital displays" refers to both public and semi-public displays, as organizations interviewed as part of this study had both types of displays, occasionally simultaneously.

• honeypot effect describes how people interacting with an information system, such as a public display, stimulate other users to observe, approach, and engage in interaction with that system.6 this phenomenon extends beyond digital displays to tourism, art, or retail environments, where a site of interest attracts the attention of passers-by and draws them to participate in that site.

interactivity

the area of interactivity with public displays has been studied by many researchers, with three commonly used modes of interaction clearly identified: touch, gesture, and remote modes.

• touch (or multi-touch): this is the most common way users interact with personal mobile devices such as smartphones and tablets. multi-touch interaction on public displays should support many individuals interacting with the digital screen simultaneously, since many users expect immediate access and will not take turns. for example, some technologies studied in this report support up to 30 touch points at any given time, while others, like qut's the cube, allow for a near infinite number of touch points. though studies show that this technique is fast and natural, it also requires additional physical effort from the user.7 while touch interaction using infrared sensors has a high touch recognition rate, its shortcomings have been identified as being expensive and being influenced by light interference, such as light around the touch screen.8

• gesture: this is interaction through movement of the user's hands, arms, or entire body, recognized by sensors such as the microsoft kinect or leap motion systems. although studies show that this type of interaction is quick and intuitive, it also brings "a cognitive load to the users together with the increased concern of performing gestures in public spaces."9 specifically, body gestures were found not to be well suited to passing-by interaction, unlike hand gestures, which can be performed while walking.
hand gestures also have an acceptable mental, physical, and temporal workload.10 research into gesture-based interaction shows that "more movement can negatively influence recall" and is therefore not suited for informational exhibits.11 similarly, people consider gestures to be too much work "when they require two hands and large movements" to execute.12 not surprisingly, research suggests that gestures deemed to be socially acceptable for public spaces are small, unobtrusive ones that mimic everyday actions. they are also more likely to be adopted by users.

• remote: these are interactions using another device, such as mobile phones, tablets, virtual-reality headsets, game controllers, and other special devices. connection protocols may include bluetooth, sms messaging, near-field communication, radio-frequency identification, wireless-network connectivity, and other methods. mobile-based interaction with public displays has received a lot of attention in research, media, and commercial environments because this mode allows users to interact from a variable distance with minimal physical effort. however, users often find mobile interaction with a public display "too technical and inconvenient" because it requires sophisticated levels of digital literacy in addition to having access to a suitable device.13 some suggest that using personal devices for input also helps "avoid occlusion and offers interaction at a distance" without requiring multi-touch or gesture-based interactions.14 as well, subjects in studies on mobile interaction often indicate their preference for this mode because of its low mental effort and low physical demand. however, it is possible that these studies focused on users with high degrees of digital literacy rather than the general public with varying degrees of access to and comfort with mobile technologies.

user engagement

attracting user attention is not necessarily guaranteed by virtue of having a public display. according to research, the most significant factors that influence user engagement with public digital displays are age, display content, and social context.

age

hinrichs found that children were the first to engage in interaction with public displays and would often recruit adults accompanying them toward the installation.15 on the other hand, hinrichs found adults to be more hesitant in approaching the installation: "they would often look at it from a distance before deciding to explore it further."16 these findings suggest that designing for children first is an effective strategy for enticing interaction from users of all ages.

display content

studies on engagement in public digital display environments indicate that both passive and active types of engagement exist with digital displays. the role of emotion in the content displayed also cannot be overlooked. specifically, clinch et al. state that people typically pay attention to displays "only when they expected the content to be of interest to them" and that they are "more likely to expect interesting content in a university context rather than within commercial premises."17 in other words, the context in which the display is situated affects user expectations and primes them for interaction. the dominant communication pattern in existing display and signage systems has been narrowcast, a model in which displays are essentially seen as distribution points for centrally created content without much consideration for users.
this model of messaging exists in commercial spaces, such as malls, but also in public areas like transit centers, university campuses, and other spaces where crowds of people may gather or pass by. observational studies indicate that people tend to perceive this type of content as not relevant to them and ignore it.18 for public displays to be engaging to end users, in other words, "there needs to be some kind of reciprocal interaction."19 in public spaces, interactive displays may be more successful than noninteractive displays in engaging viewers and making city centers livelier and more attractive.20 in terms of precise measures of attention to such displays, studies of average attention time correlate age with responsiveness to digital signage. children (1–14 years) are more receptive than adults, and men spend more time observing digital signage than women.21 studies also indicate significantly higher average attention times for observing dynamic content as compared to static content.22 scholars like buerger suggest that designers of applications for public digital displays should assume that viewers are not willing "to spend more than a few seconds to determine whether a display is of interest."23 instead, they recommend presenting informational content with minimal text and in such a way that the most important information can be determined in two to three seconds. in a museum context, the average interaction time with the digital display was between two and five minutes, which was also the average time people spent exploring analog exhibits.24 dynamic, game-like exhibits at the cube incorporate all the above findings to make interaction interesting and short, and to draw the attention of children first.

social context

social context is another aspect that has been studied extensively in the field of human-computer interaction, and it provides many valuable lessons for applying evidence-based practices to technology service planning in libraries. many scholars have observed the honeypot effect as related to interaction with digital displays in public settings. this effect describes how users who are actively engaged with the display perform two important functions: they entice passers-by to become actively engaged users themselves, and they demonstrate how to interact with the technology without formal instruction. many argue that a conducive social context can "overcome a poor physical space, but an inappropriate social context can inhibit interaction" even in physical spaces where engagement with the technology is encouraged.25 this finding relates to the use of gestures on public displays. researchers also found that contextual social factors such as age and being around others in a public setting do, in fact, influence the choice of multi-touch gestures.
hinrichs suggests enabling a variety of gestures for each action—accommodating different hand postures and a large number of touch points, for example—to support fluid gesture sequences and social interactions.26 a major deterrent to users' interaction with large public displays has been identified as the potential for social embarrassment.27 as an implication, the authors suggest positioning the display along thoroughfares of traffic and improving how the interaction principles of the display are communicated implicitly to bystanders, thus continually instructing new users on techniques of interaction.28

findings

technical and hardware landscape

the average age of public displays was around three years, indicating an early stage of development of this type of service among learning institutions. such technologies first appeared in europe more than 10 years ago (for example, the most widely cited early example of a public display is the citywall in helsinki in 2007).29 however, adoption in north america did not start until around 2013. the median year for the installation of these technologies among organizations studied in this report is 2014. among public institutions represented in the study population, such as public libraries and museums, digital displays were most frequently installed in 2015. while most organizations have only one display space, it was not unusual to find several within a single organization. for example, for the purposes of this study, the researcher has counted the cube as three display spaces, as documentation and promotional literature on the technology cites "3 separate display zones." as a result, the average number of display spaces in the population of this study is 1.75. the following modes of interaction with digital displays, beyond displaying video content, have been observed in the study population, in descending order of frequency:

• sound (79%). while research on human-computer interaction is inconclusive about best practices related to incorporating sound into digital interactive displays, it is clear, among the organizations interviewed in the environmental scan, that sound is a major component of digital exhibits and should not be overlooked.

• touch or multi-touch (46%). this finding highlights that support for multi-user interaction is not consistent across the study population (a brief input-handling sketch follows at the end of this subsection).

• gesture (25%): these include tools such as microsoft kinect, leap motion, or other systems for detecting movement for interaction.

• mobile (14%). while some researchers in the human-computer interaction field suggest mobile is the most effective way to bridge the divide between large public displays, personalization of content, and user engagement, mobile interactivity is not used frequently to engage with digital displays in the study population. one outlier is north carolina state university library, which takes a holistic, "massively responsive design" approach in which responsive web design principles are applied to content that can be displayed effectively at once online, on digital display walls, and on mobile devices while optimizing institutional resources dedicated to supporting visualization services.

further, as in the broader personal computing environment, the microsoft windows operating system dominates display systems, with 61% of the organizations choosing a windows machine to power their digital display.
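several of the interaction modes above (touch, gesture, and mobile) ultimately reach the exhibit software as a stream of input events that must cope with many users at once. the sketch below is only a rough illustration and is not drawn from any system in the study: it assumes a browser-based exhibit and uses the standard pointer events api to track concurrent touch points. the element id and the 30-point cap are hypothetical, the latter echoing the touch-point limits mentioned in the literature review.

```typescript
// minimal sketch: tracking concurrent touch points on a wall-sized display.
// assumes a browser-based exhibit rendered into an element with id "exhibit"
// (hypothetical); the 30-point cap mirrors limits reported for some hardware.

type TouchPoint = { x: number; y: number; startedAt: number };

const MAX_POINTS = 30;                         // assumed hardware limit
const active = new Map<number, TouchPoint>();  // keyed by pointerId

const surface = document.getElementById("exhibit")!;

surface.addEventListener("pointerdown", (e: PointerEvent) => {
  if (e.pointerType !== "touch" || active.size >= MAX_POINTS) return;
  active.set(e.pointerId, { x: e.clientX, y: e.clientY, startedAt: performance.now() });
});

surface.addEventListener("pointermove", (e: PointerEvent) => {
  const p = active.get(e.pointerId);
  if (p) { p.x = e.clientX; p.y = e.clientY; }   // update only this finger's position
});

for (const end of ["pointerup", "pointercancel"] as const) {
  surface.addEventListener(end, (e: PointerEvent) => active.delete(e.pointerId));
}

// an exhibit's render loop can treat `active` as the current set of touches,
// so several visitors can interact at once without taking turns.
```

nothing in this sketch is specific to any organization interviewed; it simply illustrates why supporting many individuals interacting simultaneously is as much an application concern as a hardware one.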
a fifth (21%) of all organizations have some form of networked computing infrastructure, such as the cube with its capacity to process exhibit content using 30 servers. in contrast, the majority (79%) of organizations interviewed have a single computer powering the display. this finding is perhaps not surprising, given that few institutions have dedicated it teams to support a single technology service like the cube.

users and use cases

understanding primary audiences was also important for this study, as the organizational user base defines the context for digital exhibits. the breakdown of these audiences is summarized in figure 1. for example, the university of oregon ford alumni center's digital interactive display focuses primarily on showcasing the success of its alumni, with a goal of recruiting new students to the university. however, the interactive exhibits also serve the general public through tours and events on the university of oregon campus. other organizations with digital displays, such as all saints anglican school and the philadelphia museum of art, also target specific audiences, so planning for exhibits may be easier in those contexts than in organizations like the university of waterloo stratford campus, with its display wall at the downtown campus that receives visitor traffic from students, faculty, and the public.

[figure 1. audience types for digital displays in the study population: academic 44%, public 33%, both public and academic 22%.]

digital displays serve various purposes, which depend on the context of the organization in which they exist, their technical functionality, their primary audience, their service design, and other factors. interview participants were asked about the various uses for these technologies at their institutions. a single display could have multiple functions within a single institution. the following list summarizes these multiple uses:

1. educational (67%), such as displaying digital collections, archives, historical maps, and other informational content. these activities can be summarized in the words of one participant as "education via browse"—in other words, self-guided discovery rather than formal instruction.
2. fun or entertainment (56%), including art exhibitions, film screenings, games, playful exhibits, and other engaging content to entice users.
3. communication (47%), which can be considered a form of digital signage to promote library or institutional services and marketing content. displays can also deliver presentations and communicate scholarly work.
4. teaching (42%), including formal and semi-formal instruction, workshops, student presentations, and student course-work showcases.
5. events (31%), such as public tours, conferences, guest speakers, special events, galas, and other social activities near or using the display.
6. community engagement (28%), including participation from community members through content contribution, showing local content, using the display technology as an outreach tool, and other strategies to build relationships with user communities.
7. research (22%), where the display functions as a tool that facilitates scholarly activities like data collection, analysis, and peer review. many study participants acknowledged challenges in using digital displays for this purpose and have identified other services that might support this use more effectively.
content types and management

in the words of deakin university librarians, "content is critical, but the message is king," so it was particularly important for the author to understand the current digital display landscape as it relates to content.30 specifically, the research project encompassed the variety of content used on digital displays as well as how it is created, managed, shared, and received by the audiences of various organizations interviewed in this study. as can be observed in figure 2, all organizations supported 2d content, such as images, video, audio, presentation slides, and other visual and textual material. however, dynamic forms of content, such as social media feeds, interactive maps, and websites, were less prevalent.

[figure 2. types of content supported by digital displays in the study population: static 2d 100%, dynamic web 61%, dynamic 3d 57%.]

discussions around interest in emergent, immersive, and dynamic 3d content such as games and virtual and augmented reality also came up frequently in the study interviews, and the researcher found that these types of content were supported in only 16 (57%) of the 28 total cases. this number is lower than the total number of interviewees because not all organizations interviewed had content to manage or display. in addition, many organizations recognized that they would likely be exploring ways to present 3d games or immersive environments through their digital display in the near future. not surprisingly, the creative agencies included in this study revealed an awareness and active development of content of this nature, noting "rising demand and interest in 3d and game-like environments." furthermore, projects involving motion detection, the internet of things, and other sensor-based interactions are also seeing a rise in demand, according to study participants.

in terms of managing various types of content, 20 (71%) of the organizations interviewed had used some form of content management system (cms), while the rest did not use any tool to manage or organize content. of those organizations that used a cms, 15 (75%) relied on a vendor-supplied system, such as tools by fourwinds interactive, visix, or nec live. the remaining 5 (18% of all organizations) created a custom solution without going to a vendor. this finding suggests that since the majority of content supported by organizations with digital displays is 2d, current vendor solutions for managing that content are sufficient for the study population at this point. it is unclear how the rise in demand for dynamic, game-like content will be supported by vendors in the coming years.

[figure 3. content management systems for digital displays: vendor-supplied system 53%, in-house created system 18%, no system 18%, unknown 11%.]

table 1 reflects the distribution of approaches to managing content observed in the study population.
table 1. content management in study population

content management        responses   %
vendor-supplied system        15      54
in-house created system        5      18
no system                      5      18
unknown                        3      10

middleware, automation, and exhibit management

middleware can be described as the layer of software between the operating system and applications running on the display, especially in a networked computing environment. for example, most organizations studied in the environmental scan supported a windows environment with a range of exhibit applications, like slideshows, web browsers, and executable files, such as games. middleware can simplify and automate the process of starting up, switching between, and shutting off display applications on a set schedule (a brief illustrative sketch of such a scheduler appears below, after the discussion of content sources). as figure 4 demonstrates, the majority of the organizations in the study population (17, or 61%) did not have a middleware solution. however, this group was heterogeneous: 14 organizations (50%) did not require a middleware solution because they ran content semi-permanently or relied on user-supplied content, in which case the display functioned as a teaching tool. the remaining three organizations (11%) manually managed scheduling and switching between exhibit content. in such cases, a middleware solution would be valuable for the management of content, especially as the number of applications grows, but it was not present in these organizations. comparatively, 10 organizations (36%) used a custom solution, such as a combination of windows or linux scripts, to manage automation and scheduling of content on the display. one organization (3%) did not specify their approach to managing content. these findings suggest that no formalized solution to automating and managing software currently exists among the study population. in addition to organizing content, digital-exhibits services involve scheduling or automating content to meet user needs according to the time of day, special events, or seasonal relevance. as a result, the middleware technology solution supports sustainable management of displays and predictable sharing of content for end users. this environmental scan revealed that digital exhibits and interactive experiences are still in the early days of development. it is possible that new solutions for managing content both at the application and the middleware level may emerge in the coming years, but they are currently limited.

[figure 4. middleware solutions in the study population: none 61%, custom 36%, unknown 3%.]

sources of content

when finding sources of content to be displayed on digital displays, organizations interviewed used multiple strategies simultaneously. table 2 below brings together the findings related to this theme.

table 2. content sources for digital exhibits

content source               %
external/commissioned        64
user-supplied                64
internal/in-house            50
collaborative with partner   43

for example, many organizations rely on their users to generate and submit material (18, or 64%); others commission vendors to create exhibits for them (18, or 64%). in 50% of all cases, organizations also produce content for exhibits in-house. in other words, most organizations used a combination of all sources to generate content for their digital displays. only a few use a single source of content, such as the semi-permanent historical exhibit at henrico county public library.
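returning to the middleware discussion above: none of the organizations' in-house scripts were available to the author, so the following is only a minimal sketch of the job such a layer performs, assuming a host where each exhibit is a standalone executable and the schedule is a hard-coded list. the file paths, hours, and exhibit names are invented for illustration and do not describe any organization in the study.

```typescript
// minimal middleware sketch (node.js): launch whichever exhibit is scheduled
// for the current hour and stop the previous one. paths and hours are
// hypothetical placeholders, not taken from any organization in the study.
import { spawn, ChildProcess } from "node:child_process";

interface Slot { startHour: number; endHour: number; command: string; args: string[] }

const schedule: Slot[] = [
  { startHour: 9,  endHour: 17, command: "C:\\exhibits\\local-history.exe", args: [] },
  { startHour: 17, endHour: 21, command: "C:\\exhibits\\student-showcase.exe", args: ["--kiosk"] },
];

let running: { slot: Slot; proc: ChildProcess } | null = null;

function currentSlot(now = new Date()): Slot | undefined {
  const h = now.getHours();
  return schedule.find((s) => h >= s.startHour && h < s.endHour);
}

function tick(): void {
  const next = currentSlot();
  if (running && running.slot !== next) {   // the scheduled exhibit has changed
    running.proc.kill();
    running = null;
  }
  if (!running && next) {
    const proc = spawn(next.command, next.args, { stdio: "ignore" });
    proc.on("exit", () => { if (running?.proc === proc) running = null; }); // allow restart on crash
    running = { slot: next, proc };
  }
}

setInterval(tick, 60_000);  // check the schedule once a minute
tick();
```

a real deployment would add logging, a configuration file, and graceful shutdown, but the core of the "automation and scheduling" layer described by study participants can be this thin.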
other organizations, like the duke media wall, rely entirely on their users to supply content, employing a "for students by students" model of content creation. additionally, only 12 (43%) of the organizations interviewed had explored or established some form of partnership for creating exhibits. primarily, these partnerships existed with departments, centers, institutes, campus units, and/or students in academic settings, such as the computer science department, faculty of graduate studies, and international studies. other examples of partnerships were with similar civic, educational, cultural, and heritage organizations, such as municipal libraries, historical societies, art galleries, museums, and nonprofits. examples included study participants working with ars electronica, local symphony orchestras, harvard space science, and nasa on digital exhibits. clearly, a variety of approaches were taken in the study population to come up with digital exhibits content.

content creation guidelines

seven organizations (19%) in the study population publicly shared content guidelines aimed at simplifying the process of engaging users in creating exhibits. these guidelines were analyzed, and key elements were identified that are necessary for users to know in order to contribute in a meaningful way, thereby lowering the barrier to participation. these elements include the resolution of the display screen(s), touch capability, ambient light around the display space, required file formats, and maximum file size. a complete list of organizations with such guidelines, along with websites where these guidelines can be found, is included in appendix c. based on the analysis of this limited sample, the bare minimum for community participation guidelines would include clearly outlining

• the scope, purpose, audience, and curatorial policy of the digital exhibits service;
• the technical specifications, such as the resolution, aspect ratio, and file formats supported by the display;
• the design guidelines, such as colors, templates, and other visual elements;
• the contact information of the digital exhibits coordinator; and
• the online or email submission form.

it should be noted, however, that such specifications are primarily useful when a cms exists and the content solicited from users is at least somewhat standardized. for example, images, slides, or webpages may be easier for community partners to contribute than video games or 3d interactive content. no examples of guidelines for the latter were observed in the study.

content scheduling

whereas the middleware section of this study examined the technical approaches to content management and automation, this section explores the frequency of exhibit rotation from a service design perspective. as can be observed in figure 5, no consistent or dominant model for exhibit scheduling has been identified in the study population. generally, approaches to scheduling digital exhibits reflect organizational contexts. for example, museums typically design an exhibit and display it on a permanent basis, while academic institutions change displays of student work or scholarly communication once per semester. the following scheduling models have emerged, in descending order of frequency in the study population.

[figure 5. content scheduling distribution in the study population: unstructured 29%, seasonal 29%, permanent 21%, monthly 10%, weekly 7%, daily 4%.]

1. unstructured (29%): no formal approach, policy, or expectation is identified by the organization regarding displaying exhibits.
this model is largely related to the early stage of service development in this domain, lack of staff capacity to support the service, and/or responsiveness to user needs. one study participant, for example, referred to this loose approach by noting that "no formalized approach and no official policy exists." for example, institutions may have frameworks for what types of content are acceptable but no specific requirements on the content subjects. institutions adopting a lab space model (see figure 6) for digital displays largely belong to this category. in other words, content is created on the fly through workshops, data analysis, and other situations as needed by users. in this case, no formal scheduling is required apart from space reservations.
2. seasonal (29%), which can be defined as a period from three to six months and includes semester-based scheduling in academic institutions. many organizations operate on a quarterly basis, so it would seem logical that content refresh cycles reflect the broader workflow of the organization.
3. permanent (21%): in the cases of museums, permanent exhibits may mean displaying content indefinitely or until the next hardware refresh, which might reconfigure the entire interactive display service. no specific date ranges were cited for this model.
4. monthly (10%): this pattern was observed among academic libraries, with production of "monthly playlists" featuring curated book lists or other monthly specials.
5. weekly (7%): north carolina state university and deakin university libraries aim to have fresh content up once per week; they achieve this in part by formalizing the roles needed to support their digital display and visualization services.
6. daily (4%): only griffith university ensures that new content is available every day on its #seemore display; it does this largely by relying on standardized external and internal inputs, such as weather updates and content from the university marketing department.

staffing and skills

one key element of the digital exhibits research project included investigating staffing models required to support a service of this nature. not surprisingly, the theme around resource needs for digital exhibits emerged in most interviews conducted. several participants noted that one "can't just throw up content and leave it," while others advised to "have expertise on staff before tech is installed." data gathered shows that the average full-time equivalent (fte) needed to support digital display services in organizations interviewed was 2.97—around three full-time staff members. in addition, 74% of the organizations studied had maintenance or support contracts with various vendors, including av integrators, cms specialists, creative studios that produced original content, or hardware suppliers. hardware and av integrators typically provided a 12-month contract for technical troubleshooting, while creative studios ensured a 3-month support contract for digital exhibits they designed. the average time to create an original, interactive exhibit was between 9 and 12 months, according to the data provided by creative agencies, the cube teams, and learning organizations who have in-house teams creating exhibits regularly.
this length of time varies with the complexity of the interaction designed, the depth of the exhibit "narrative," and the modes of input supported by the exhibit application. additionally, it was important to understand the curatorial labor behind digital exhibits; the author did not necessarily speak with the curator of exhibits, and this work may be carried out by multiple individuals within organizations with digital displays or creative studios. in 20 (57%) of the cases, the person interviewed also curated some or all of the content for the digital display in their respective institutions. in five (14%) of the cases, the individual interviewed was not a curator for any of the content, because there was no need for curation in the first place. for example, displays in these cases were used for analysis or teaching and therefore did not require prepared content. in the rest of the cases (10, or 29%), a creative agency vendor, another member of the team, or a community partner was responsible for the curation of exhibit content. this finding suggests that, while a significant number of organizations outsource the design and curation of exhibits, the majority retain control over this process. therefore, dedicating resources to curation, organization, and management of exhibit content is deemed significant by the organizations represented in the study.

in terms of the capacity to carry out digital display services, skills that have been identified by study participants as being important to supporting work of this nature include the following:

1. technical skills (such as the ability to troubleshoot), general interest in technology, and flexibility and willingness to learn new things (74%)
2. design, visual, and creative sensibility (40%), as this type of work is primarily a visual experience
3. software-development or programming-language knowledge (31%)
4. communication, collaboration, and relationship-building (25%)
5. project management (20%)
6. audiovisual and media skills (14%), as digital exhibits are "as much an av experience as an it experience," according to one study participant
7. curatorial, organizational, and content-management skills (11%)

the most frequent dedicated roles mentioned in the interviews are shown in table 3.

table 3. types of roles significant to digital exhibits work

position                                      responses   %
developer/programmer                              11      31
project manager                                    8      23
graphic designer                                   6      17
user experience or user interface designer         4      11
it systems administrator                           4      11
av or media specialist                             4      11

the relatively low percentages represented in this table suggest that the skills mentioned above are distributed among various team members or combined in a single role, as may be the case in small institutions or those without formalized services with dedicated roles. nevertheless, the presence of specific job titles indicates understanding of the various skill sets needed to run a service that uses digital displays.

challenges and successes

many challenges were identified by study participants related to initiating and supporting a service that uses digital displays for learning. clearly, multiple challenges could be associated with the services related to digital displays within a single organization. however, many successes and lessons learned were also shared by interviewees, often overlapping with identified challenges.
this pattern suggests that some organizations can pursue strategies that address challenges faced by their library or museum colleagues while perhaps lacking resources or capacity in other areas related to this type of service. for example, some organizations have observed a lack of user engagement because of the limited interactivity of the technology solution they used. others have had successful user engagement largely by investing in technology solutions that provide a range of modes of interaction. it is important to learn from both these areas to anticipate possible pain points and to be able to capitalize on successes that lead to industry recognition and engagement from library customers. table 4 summarizes the range of challenges identified.

table 4. challenges related to digital display services

challenge identified      responses   %
technical                     14      41
content                       11      33
costs                         11      33
user expectations             11      33
workflow                      10      29
service design                 9      26
time                           8      24
organizational culture         8      24
user engagement                7      20

as reflected in table 4, several key challenges have been discussed:

1. technical, such as troubleshooting the technology, keeping up with new technologies or upgrades, and finding software solutions appropriate for the hardware selected.
2. content, such as coming up with original content or curating existing sources. in the words of one participant, "quality and refresh of content is key—it has to be meaningful, interesting, and new." this clearly presents a resource requirement.
3. costs, such as the financial commitment to the service, the unseen costs in putting exhibits together, software licensing, and hardware upgrades.
4. user expectations, such as keeping the service at its full potential and using the maximum functionality of the hardware and software solutions. according to study participants, users "may not want what they think or they say they want," and to some extent, "such technologies are almost an expectation now, and not as exciting for users."
5. workflow or project-management strategies specifically related to emergent multimedia experiences that require new cycles of development and testing.
6. time to plan, source, create, troubleshoot, launch, and improve exhibits.
7. service design, such as thinking holistically about the functions of the technology within the larger organizational structure. as one study participant stated, organizations "cannot disregard the reality of the service being tied to a physical space" in that these types of technologies are both a virtual and physical customer experience.
8. organizational culture and policy, in terms of adapting project-based approaches to planning and resourcing services, getting institutional support, and educating all staff about the purpose, function, and benefits of the service.
9. user engagement, particularly keeping users interested in the exhibits and continually finding new and exciting content. various participants have found that "linger time is between 30 seconds to few minutes" and content being displayed needs to be "something interesting, unique, and succinct, but not a destination in itself."

despite the clear challenges with delivering digital exhibits services, organizations that participated in this study have identified keys to success (see table 5).
table 5. successes and lessons learned in using digital displays

successful approach or lesson identified   responses   %
user engagement and interactivity              16      47
service design                                 14      41
"wow" factor                                   12      35
organizational leadership                      12      35
technology solution                            10      29
flexibility                                    10      29
communication and collaboration                10      29
project management                              9      26
team and skill sets                             9      26

as reflected in table 5, several approaches have been discussed:

• user engagement and interactivity, particularly for those institutions that invested in highly interactive and immersive experiences; the rewards are seen in the interest and enthusiasm of their user groups.
• service design: organizations that have carefully planned the service have found that this technology was successfully serving the needs of their user communities.
• promotion and "wow factor" that has brought attention to the organization and the service. it is not surprising that digital displays are central points on tours for dignitaries, political figures, and external guests. further, many have commented that they "did not imagine a library could be involved in such an innovative experiment," and others have added that their digital displays have "created new conversations that did not exist before."
• leadership and vision at the organizational level, which secures support and resources as well as defines the scope of the service to ensure its sustainability and success: "money is not necessarily the only barrier to doing this service, but risk taking, culture."
• technology solution, where "everything works" and both the organization and users of the service are happy with the functionality, features, and performance of the chosen solution.
• flexibility and willingness to learn new things, including being open to agile project-management methods, taking risks, and continually learning new tools, technologies, and processes as the service matures.
• communication and collaboration, both internally among stakeholders and externally by building community partnerships, new audiences, and user participation in content creation. for example, one study participant noted that the technology "has contributed to giving the museum a new audience of primarily young people and families—a key objective held in 2010 at the commencement of the gallery refurbishments."
• workflow and project management for those embracing the new approaches required to bring multiple skill sets together to create engaging new exhibits. as one participant has put it, "these types of approaches require testing, improvement, a new workflow and lifecycle for the projects."
• having the right team with appropriate skills to support the service, though this theme was rated as being less significant than designing services effectively and securing institutional support for the technology service. in other words, study participants noted that having in-house programming or design skills is not enough without a proper definition of success for digital exhibits services.

perceptions

institutional and user reception of digital displays as a service to pursue in learning organizations has been identified as overwhelmingly positive, with 87% of the organizations noting positive feedback. for example, one study participant noted the positive attention received by the wider community for the digital display, stating "it is our flagship and people are in general impressed by both the potential and some of the existing content."
some participants have gone as far as to say that the reception among users has been "through the roof" and they have "never had a negative feedback comment" about their display. this finding indicates a high degree of satisfaction with such technologies by organizations that pursued a digital display. table 6 further explores the range of perceptions observed in the study.

table 6. perception of digital display services

perception                        responses   %
positive                              20      87
hesitation or uncertainty              7      30
concerns about purpose                 4      17
concerns about user engagement         4      17
concerns about costs                   3      13
negative                               3      13

a minority (13%) have noted some negative perceptions, largely related to concerns about costs or functionality of the technology; 30% have observed uncertainty and hesitation on the part of staff and users in terms of engagement, as well as interrogation of the technology's purpose in the organization. for example, one study participant summarizes this mixed sentiment by saying, "the perception is that it's really neat and worthwhile for exploring new ways of teaching, but that the same features and functions could be achieved with less (which we think is a good thing!)." it is helpful to note this trend in perception, as any new service will likely bring a mixture of excitement, hesitation, and occasional opposition. interestingly, these reactions have originated both from the staff of organizations interviewed and their communities of users.

discussion

the findings from this study indicate that the functions of digital displays are highly dependent on the organizational context in which the displays exist. this context, in turn, defines the nature of the services delivered through the digital display. for example, figure 6 can be useful in classifying the various ways digital displays appear in the study population, from research and teaching-oriented lab spaces to public spaces with passive messaging or active, immersive, game-like digital experiences.

[figure 6. types of digital displays in the study population.]

as such, visualization walls might belong in the "lab spaces" category that typically appears in academic libraries or research units and do not require content planning and scheduling. what we might call "digital interactive exhibits" tend to appear in museums and galleries with a primarily public audience and may have a permanent, seasonal, or monthly rotation schedule. however, despite a range of approaches taken to provide content and in terms of use of these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing the costs of interactive projects. despite these common concerns, the digital-exhibits services were perceived as being overwhelmingly satisfactory in all types of organizations included in this study because they brought new audiences to the organization and were often seen as "showpieces" in the broader community. the data gathered in the environmental scan demonstrates that there is currently little consistency among digital displays in learning environments. this lack of consistency is seen in content-development methods among study participants, their programming, content management, technology solutions, and even the naming of the display (and, by extension, the display service).
for example, this study revealed that evidently no "open platform" for managing content at the application or the middleware level currently exists. a small number of software tools are used by organizations to support digital displays, but their use is in no way standardized, as compared to nearly every other area of library services. there is some indication that digital-display services may become more standardized in the coming years, and more tools, solutions, vendors, and communities of practice will be available. for example, many signage cmss are currently on the market, and the number of game-like immersive experience companies is growing, suggesting extension of these services to libraries in the coming years. only a few software tools exist for creating exhibits, such as intuiface and touchdesigner, though no free, open-source versions of exhibit software are currently available. as well, the growing number of digital exhibits and interactive media companies currently focuses on turnkey—rather than software-as-a-service or platform—solutions.

in contrast, some consistency exists in the staffing needs and skills required to support the digital-exhibits service. a majority of organizations interviewed agreed that design, software development, systems administration, and project-management skills are needed to ensure digital-exhibits services run sustainably in a learning organization. in addition, the lack of public library representation in this study makes it challenging to draw parallels to the library context. adapting museum practices is also not necessarily reliable, as there is rarely a mandate to engage communities and partner on content creation, as there is in libraries. for example, only the el paso (texas) museum of history engages the local community to source and organize content. these findings suggest that digital displays are a growing domain, and more solutions are likely to emerge in the coming years.

the cube, compared to the rest of the study population, is a unique service model because it successfully brings together most elements examined in the environmental scan. for example, to ensure continual engagement with the digital display, the cube schedules exhibits on a regular basis and employs user interface designers, systems administrators, software engineers, and project managers. it also extends the content through community engagement, public tours, and stem programming. it has created an in-house middleware solution to simplify exhibit delivery and has chosen unity3d as its platform of choice for exhibit development.

limitations

only organizations from english-speaking countries were interviewed as part of the environmental scan. it is therefore unclear if access to organizations from non–english-speaking countries would have produced new themes and significantly different results. in addition, as with all environmental scans, the data is limited by the degree of understanding, knowledge, and willingness to share information of the individual being interviewed. particularly, individuals with whom the author spoke may or may not have been technology or service leads for the digital display at their respective institutions. thus, the study participants had a range of understanding of hardware specifications, functionality, and service-design components associated with digital displays.
for example, having access to technology leads would likely have provided more nuanced responses around the middleware solutions and the underlying technical infrastructure required to support this service. a small number of vendors were also interviewed as part of the environmental scan even though vendors did not necessarily have digital displays or service models parallel to libraries or museums. they are included in appendix b. nevertheless, gathering data from this group was deemed relevant to the study, as creative agencies have formalized staffing models and clearly identified skill sets necessary to support services of this nature. in addition, this group possesses knowledge of best practices, workflows, and project-management processes related to exhibit development. finally, this environmental scan also did not capture any interaction with direct users of digital displays, whose experiences and perceptions of these technologies may or may not support the findings gathered from the organizations interviewed. these limitations were addressed by increasing the sample size of the study within the time and resource constraints of the research project.

conclusion

the findings of this study show that the functions of digital-display technologies and their related services are highly dependent on the organizational context in which they exist. however, despite a range of approaches taken to provide content and in terms of use of these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing costs of interactive projects. despite these common concerns, digital displays were perceived as being overwhelmingly positive in all types of organizations interviewed in this study, as they brought new audiences to the organization and were often seen as "showpieces" in the broader community. the successes and lessons learned from the study population are meant to provide a broader perspective on this maturing domain as well as help inform planning processes for future digital exhibits in learning organizations.

appendix a. environmental scan questions

digital exhibits environmental scan interview questions—museums, libraries, public organizations

1. what are the technical specifications of the digital interactive technology at your institution?
2. who are the primary users of this technology (those interacting with the platform)? is there anyone you thought would use it and isn't?
3. what are the primary uses for the technology (events, presentations, analysis, workshops)?
4. what types of content are supported by the technology (video, images, audio, maps, text, games, 3d, all of the above)?
5. where is content created and how is this content managed?
6. what is the schedule for the content and how is it prioritized?
7. can you estimate the fte (full-time equivalent) of staff members involved in supporting this technology/service, both directly and indirectly? what does indirect support for this technology entail?
8. in your experience, what kinds of skills are necessary in order to support this service?
9. have partnerships with other organizations producing content to be exhibited been established or explored?
10. what challenges have you encountered in providing this service?
11. what have been some keys to the successes in supporting this service?
12. what has been the biggest success of this service and what has been the biggest disappointment?
13. what is the perception of this technology in the institution more broadly?
14. are there any other institutions you suggest we contact to learn more about similar technologies?

digital exhibits environmental scan interview questions: vendors
1. what is the relationship between the creative studio and hardware/fabrication? do you do everything or work with av integrators instead to put together touch interactives?
2. who have been the primary users of the interactive exhibits and projects you have completed?
3. who writes the use cases when creating a digital interactive exhibit?
4. what types of content are supported by the technology (video, images, audio, maps, text, games, 3d, all of the above)? do you see a rise in interest for 3d and game-like environments, and do you have internal expertise to support it?
5. where is content created for the exhibits and how is this content managed? who curates?
6. what timespan or lifecycle do you design for?
7. how big is your team? how long do projects typically take to create?
8. what types of expertise do you have in house? what might a project team look like?
9. to what extent is there a goal of sharing knowledge back with the company from clients or users?
10. what challenges have you encountered in providing this service?
11. what have been some keys to the successes in supporting this service?

appendix b: study population in environmental scan
organization | location | date interviewed
all saints anglican school | merrimac, australia | july 25, 2016
anode | nashville, tn | july 22, 2016
belle & wissell | seattle, wa | july 26, 2016
bradman museum | bowral, australia | july 10, 2016
brown university library | providence, ri | june 3, 2016
university of calgary library and cultural resources | calgary, ab | june 2, 2016
deakin university library | geelong, australia | june 14, 2016
university of colorado denver library | denver, co | june 24, 2016
duke university library | durham, nc | august 17, 2016
el paso museum of history | el paso, tx | june 24, 2016
georgia state university library | atlanta, ga | june 10, 2016
gibson group | wellington, new zealand | july 16, 2016
henrico county public library | henrico, va | august 9, 2016
ideum | corrales, nm | july 26, 2016
indiana university bloomington library | bloomington, in | may 31, 2016
interactive mechanics | philadelphia, pa | august 2, 2016
johns hopkins university library | baltimore, md | june 20, 2016
nashville public library | nashville, tn | july 22, 2016
north carolina state university library | raleigh, nc | june 8, 2016
university of north carolina at chapel hill library | chapel hill, nc | june 2, 2016
university of nebraska omaha | omaha, ne | june 16, 2016
omaha do space | omaha, ne | july 11, 2016
university of oregon alumni center | eugene, or | june 7, 2016
philadelphia museum of art | philadelphia, pa | august 10, 2016
queensland university of technology | brisbane, australia | june 30, july 29, and august 16, 2016
société des arts technologiques | montreal, qc | august 8, 2016
second story | portland, or | july 28, 2016
st. louis university | st. louis, mo | july 4, 2016
stanford university library | stanford, ca | july 22, 2016
university of illinois at chicago | chicago, il | june 22, 2016
university of mary washington | fredericksburg, va | july 7, 2016
visibull | waterloo, on | august 12, 2016
university of waterloo stratford campus | stratford, on | june 22, 2016
yale university center for science and social science information | new haven, ct | july 13, 2016

appendix c: digital content publishing guidelines
organization name | guidelines website
deakin university library | http://www.deakin.edu.au/library/projects/sparking-true-imagination
duke university | https://wiki.duke.edu/display/lmw/lmw+home
griffith university | https://intranet.secure.griffith.edu.au/work/digital-signage/seemore
north carolina state university library | http://www.lib.ncsu.edu/videowalls
university of colorado denver | http://library.auraria.edu/discoverywall
university of calgary library and cultural resources | http://lcr.ucalgary.ca/media-walls
university of waterloo stratford campus | https://uwaterloo.ca/stratford-campus/research/christie-microtiles-wall

references
1 flora salim and usman haque, "urban computing in the wild: a survey on large scale participation and citizen engagement with ubiquitous computing, cyber physical systems, and internet of things," international journal of human-computer studies 81 (september 2015): 31–48, https://doi.org/10.1016/j.ijhcs.2015.03.003.
2 peter peltonen et al., "it's mine, don't touch! interactions at a large multi-touch display in a city center," proceedings of the sigchi conference on human factors in computing systems, florence, italy, april 5–10, 2008, 1285–94, https://doi.org/10.1145/1357054.1357255.
3 shawna sadler, mike nutt, and renee reaume, "managing public video walls in academic library" (presentation, cni spring 2015 meeting, seattle, washington, april 13–14, 2015), http://dro.deakin.edu.au/eserv/du:30073322/sadler-managing-2015.pdf.
4 peltonen et al., "it's mine, don't touch!"
5 john brosz, e. patrick rashleigh, and josh boyer, "experiences with high resolution display walls in academic libraries" (presentation, cni fall 2015 meeting, washington, dc, december 13–14, 2015), https://www.cni.org/wp-content/uploads/2015/12/cni_experiences_brosz.pdf; bryan sinclair, jill sexton, and joseph hurley, "visualization on the big screen: hands-on immersive environments designed for student and faculty collaboration" (presentation, cni spring 2015 meeting, seattle, washington, april 13–14, 2015), https://scholarworks.gsu.edu/univ_lib_facpres/29/.
6 niels wouters et al., "uncovering the honeypot effect: how audiences engage with public interactive systems," dis '16: proceedings of the 2016 acm conference on designing interactive systems, brisbane, australia, june 4–8, 2016, 516, https://doi.org/10.1145/2901790.2901796.
7 gonzalo parra, joris klerkx, and erik duval, "understanding engagement with interactive public displays: an awareness campaign in the wild," proceedings of the international symposium on pervasive displays, copenhagen, denmark, june 3–4, 2014, 180–85, https://doi.org/10.1145/2611009.2611020; ekaterina kurdyukova, mohammad obaid, and elisabeth andre, "direct, bodily or mobile interaction?," proceedings of the 11th international conference on mobile and ubiquitous multimedia, ulm, germany, december 4–6, 2012, https://doi.org/10.1145/2406367.2406421; tongyan ning et al., "no need to stop: menu techniques for passing by public displays," proceedings of the 2011 annual conference on human factors in computing systems, vancouver, british columbia, https://www.gillesbailly.fr/publis/bailly_chi11.pdf.
8 jung soo lee et al., "a study on digital signage interaction using mobile device," international journal of information and electronics engineering 5, no. 5 (2015): 394–97, https://doi.org/10.7763/ijiee.2015.v5.566.
9 parra et al., "understanding engagement," 181.
10 parra et al., "understanding engagement," 181; robert walter, gilles bailly, and jorg müller, "strikeapose: revealing mid-air gestures on public displays," proceedings of the sigchi conference on human factors in computing systems, paris, france, april 27–may 2, 2013, 841–850, https://doi.org/10.1145/2470654.2470774.
11 philipp panhey et al., "what people really remember: understanding cognitive effects when interacting with large displays," proceedings of the 2015 international conference on interactive tabletops & surfaces, madeira, portugal, november 15–18, 2015, 103–6, https://doi.org/10.1145/2817721.2817732.
12 christopher ackad et al., "an in-the-wild study of learning mid-air gestures to browse hierarchical information at a large interactive public display," proceedings of the 2015 acm international joint conference on pervasive and ubiquitous computing, osaka, japan, september 7–11, 2015, 1227–38, https://doi.org/10.1145/2750858.2807532.
13 parra et al., "understanding engagement," 181; kurdyukova, obaid, and andre, 2012, n.p.
14 jouni vepsäläinen et al., "web-based public-screen gaming: insights from deployments," ieee pervasive computing 15, no. 3 (2016): 40–46, https://ieeexplore.ieee.org/document/7508836/.
15 uta hinrichs, holly schmidt, and sheelagh carpendale, "emdialog: bringing information visualization into the museum," ieee transactions on visualization and computer graphics 14, no. 6 (november 2008): 1181–1188, https://doi.org/10.1109/tvcg.2008.127.
16 hinrichs, schmidt, and carpendale, "emdialog."
17 sarah clinch et al., "reflections on the long-term use of an experimental digital signage system," proceedings of the 13th international conference on ubiquitous computing, beijing, china, september 17–21, 2011, 133–142, https://doi.org/10.1145/2030112.2030132.
18 elaine m. huang, anna koster, and jan borchers, "overcoming assumptions and uncovering practices: when does the public really look at public displays?," proceedings of the 6th international conference on pervasive computing, sydney, australia, may 19–22, 2008, 228–243, https://doi.org/10.1007/978-3-540-79576-6_14; jorg muller et al., "looking glass: a field study on noticing interactivity of a shop window," proceedings of the sigchi conference on human factors in computing systems, austin, texas, may 5–10, 2012, 297–306, https://doi.org/10.1145/2207676.2207718.
19 salim and haque, "urban computing in the wild," 35.
20 mettina veenstra et al., "should public displays be interactive? evaluating the impact of interactivity on audience engagement," proceedings of the 4th international symposium on pervasive displays, saarbruecken, germany, june 10–12, 2015, 15–21, https://doi.org/10.1145/2757710.2757727.
21 clinch et al., "reflections."
22 robert ravnik and franc solina, "audience measurement of digital signage: qualitative study in real-world environment using computer vision," interacting with computers 25, no. 3 (2013), https://doi.org/10.1093/iwc/iws023.
23 neal buerger, "types of public interactive display technologies and how to motivate users to interact," media informatics advanced seminar on ubiquitous computing, 2011, ed. doris hausen, bettina conradi, alina hang, fabian hennecke, sven kratz, sebastian lohmann, hendrik richter, andreas butz, and heinrich hussmann (university of munich, department of computer science, media informatics group, 2011), https://pdfs.semanticscholar.org/533a/4ef7780403e8072346d574cf288e89fc442d.pdf.
24 c. g. screven, "information design in informal settings: museums and other public spaces," in information design, ed. robert e. jacobson (cambridge, ma: mit press, 2000), 131–192.
25 parra et al., "understanding engagement," 181.
26 uta hinrichs and sheelagh carpendale, "gestures in the wild: studying multi-touch gesture sequences on interactive tabletop exhibits," proceedings of the sigchi conference on human factors in computing systems, vancouver, british columbia, may 7–12, 2011, 3023–32, https://doi.org/10.1145/1978942.1979391.
27 harry brignull and yvonne rogers, "enticing people to interact with large public displays in public spaces," interact '03, proceedings of the international conference on human-computer interaction, zurich, switzerland, september 1–5, 2003, 17–24, ed. matthias rauterberg, marino menozzi, and janet wesson (tokyo: ios press, 2003), http://www.idemployee.id.tue.nl/g.w.m.rauterberg/conferences/interact2003/interact2003-p17.pdf.
28 peltonen et al., "it's mine, don't touch!"
29 peltonen et al., "it's mine, don't touch!"
30 anne horn, bernadette lingham, and sue owen, "library learning spaces in the digital age," proceedings of the 35th annual international association of scientific and technological university libraries conference, espoo, finland, june 2–5, 2014, http://docs.lib.purdue.edu/iatul/2014/libraryspace/2.

rarely analyzed: the relationship between digital and physical rare books collections
allison mccormack and rachel wittmann
information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.13415
allison mccormack (allie.mccormack@utah.edu) is the original cataloger for special collections, university of utah. rachel wittmann (rachel.wittmann@utah.edu) is the digital curation librarian, university of utah. © 2022.

abstract
the relationship between physical and digitized rare books can be complex and, at times, nebulous. when building a digital library, should showcasing a representative slice of the physical collection be the goal? should stakeholders focus on preservation, high-use items, or other concerns? to explore these conundrums, a special collections librarian and a digital services librarian performed a comparative analysis of their library's physical and digital rare books collections. after exporting marc metadata for the rare books from their ils, the librarians examined the place of publication, publication date, and broad subject range of the collection. they used this data to create a variety of visualizations with the freely available data visualization tool tableau public. next, the authors downloaded the rare books metadata from the digital library and created illuminating data visualizations. were the geographic, temporal, and subject scopes of the digital library similar to those of the physical rare books collection? if not, what accounted for the differences? the implications of these and other findings will be explored.

introduction
as of august 2019, the special collections division of the university of utah j. willard marriott library held over 256,000 printed works and archival collections.
approximately 22% of the collection, or just over 55,000 works, belongs to the rare books department (https://lib.utah.edu/collections/rarebooks/), which contains not only books but serials, maps, manuscripts, ephemera, and other formats. the collection covers over 4,000 years of human history, with its earliest piece, a cuneiform tablet, dating to the mid-twenty-third century bce; contains works from nearly 100 different countries; and represents a wide variety of topics, including the exploration and settlement of the american west and the history of the book. the rare books department, a subset of special collections, specifically seeks to document the history of written human communication and actively collects historical items to enhance teaching and research at the university of utah. the marriott library has been adding digitized works from the rare books department to its digital library (https://collections.lib.utah.edu/) for over 25 years. approximately 780 works, or 1.42% of the rare books collection, have been digitized to date. however, no formal collection development plan was ever written, and rare books were selected for digitization by both curators and patrons. unfortunately, the reason a particular item was digitized is not recorded in the system: it is unclear if age, research value, physical condition, a desire to bring forward underrepresented stories, or a combination of these and other factors influenced the decision to digitize a rare book. this piecemeal approach to digital library collection development, while not uncommon, made it difficult for library staff and patrons to determine the relationship between the digital and physical collections of rare books. it also presented challenges when library staff attempted to communicate the scope and intent of the digital library to patrons, who assumed that the digitized items were representative of the overall collection. given their expertise in library metadata, the authors decided to analyze both traditional library catalog records and digital library records for the rare books collection and explore whether the digital collection was proportionally representative of the physical collection or if it differed in geographic, temporal, or subject scope in a meaningful way. they then created a series of data visualizations to better communicate information about the library's rare books holdings.

literature review
while much has been written about methods and criteria for selecting special collections items to be digitized and the effects of digitization on collection accessibility, few authors have discussed the relationships between digital collections and the physical collections from which they were sourced.
in their highly detailed treatise on selection strategies for digitization, ooghe and moreels identify representativity, a method that “aims for a final selection that provides a representative view of the original collections,” as one of 25 selection criteria for digitization projects.1 however, alexandra mills notes that “without a thorough understanding of the institution and collections, it is impossible to create truly representative collections.”2 because many digitization initiatives are undertaken in response to user requests, preservation concerns, or the availability of projectbased funding, it is likely that most libraries do not plan for their digital collections to be representative of their overall special collections holdings. as peter michel states, the digital collections at the university of nevada, las vegas, were explicitly built with popular history and popular culture in mind and were never intended to be “surrogates of the collection.”3 bradley daigle of the university of virginia explained that digitization could be undertaken to alleviate preservation concerns, respond to defined research needs, or to brand certain online content, but this approach could give the mistaken impression “that only the important materials are digitized.”4 despite the gaps in the literature, having an explicit collection development policy is still considered paramount; indeed, it is the very first principle listed in the national information standards organization (niso)’s framework for building “good” digital collections.5 to investigate this type of documentation further, a google search was employed using the search term “digital collection development policy site:edu”. this yielded 10 publicly accessible digital collection development policies from academic libraries in the united states: 6 • amherst college library (https://www.amherst.edu/library/services/digital/digitalcolldev) • emerson college archives and special collections (https://www.emerson.edu/policies/digital-collections-development-policy) • colorado state university libraries (https://lib.colostate.edu/digital-collectiondevelopment-policy/) • florida atlantic university digital library (https://library.fau.edu/policy/digital-librarycollection-development-policy) • georgetown university library (https://www.library.georgetown.edu/digital-projectpolicy) • northern illinois university digital library (https://digital.lib.niu.edu/policy/collectiondevelopment-policy) https://www.amherst.edu/library/services/digital/digitalcolldev https://www.emerson.edu/policies/digital-collections-development-policy https://lib.colostate.edu/digital-collection-development-policy/ https://lib.colostate.edu/digital-collection-development-policy/ https://library.fau.edu/policy/digital-library-collection-development-policy https://library.fau.edu/policy/digital-library-collection-development-policy https://www.library.georgetown.edu/digital-project-policy https://www.library.georgetown.edu/digital-project-policy https://digital.lib.niu.edu/policy/collection-development-policy https://digital.lib.niu.edu/policy/collection-development-policy information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 3 • oregon health and sciences university digital collections (https://www.ohsu.edu/library/ohsu-digital-collections-development-policy) • university of north texas university libraries (https://library.unt.edu/policies/collection-development-digital-collections/) • wesleyan university digital library 
(https://digitalcollections.wesleyan.edu/about/whatwe-collect) • williams college special collections (https://specialcollections.williams.edu/collectiondevelopment-policies/digital-collections/) in reviewing the sample of 10 universities’ digital collection development policies, homogenous content becomes apparent. almost all of the policies include a mission statement, scope, and selection criteria for potential digital collection items. all policies include criteria that physical materials should meet in order to qualify for digitization. the most common criteria for digitization are materials that are rare or unique, high-use, fragile, important to institutional or regional history, and/or support campus curriculum or faculty research. in addition, the clearance to publish materials online is ubiquitous among the policies. materials eligible for online display must either be in the public domain or intellectual property rights are held by the institution, and materials currently under copyright must receive permission from the copyright holder. a measured approach to digitization qualification has been employed by the university of north texas (unt) libraries’ digital collections and the northern illinois university digital library (niudl). unt libraries’ digital collections policy lists levels of criteria that materials must meet in order to be digitized and included in the digital library; to qualify for digitization, all criteria on level one must be met while only one criterion from level two is needed. niudl includes a priority factor rubric which includes criteria categories and corresponding numerical scale with a maximum point of 35, the higher value signifying an elevated priority. six of the 10 policies include prioritizing materials that support diversity and inclusion missions on campus. amherst college has leveraged their digital collection development policy to include content that would increase perspectives of underrepresented groups within the digital collections and traditionally underrepresented groups more broadly. niudl includes marginalized groups as a collection priority area in order to “deepen public understanding of the histories of people of color and other communities and populations whose work, experiences, and perspectives have been insufficiently recognized or unattended” and lists over 20 such groups. the collection candidate’s relationship to other collections is outlined in four of the 10 policies. georgetown university requires that “the materials form a coherent collection, fill gaps in existing collections, or complement existing collection strengths.” amherst college evaluates whether digitization would “enhance public awareness of archives’ collection strengths.” another function of a digital collection development policy is to inform the public on the scope and provenance of contents in their digital library. the unt digital collection policy includes a section outlining the content contributors, including partners, which can be beneficial for large-scale digital libraries that host collections from multiple partners. unt is also exemplary in defining collection curators and their responsibilities while underscoring the nature of this role, likely changing over time and not set to an individual. 
with no written digital collection development policy regarding special collections at the marriott library, the authors would first have to analyze both the physical and digital special collections before determining what factors may have influenced the digitization of these materials. libraries are gathering massive amounts of data, ranging from the metadata of their varied collections to patron usage statistics of both physical and digital collections. interpretation of the ever-growing accumulation of data can quickly become complex. by visualizing data, we are able to interpret large and often messy sets of data while processing multiple aspects of the data concurrently. for example, the ohio state university (osu) libraries used tableau desktop to combine data from various departments in order to better manage and explore information.7 tableau was osu's data visualization software of choice due to its ease of use and accessibility, and the program was also used to create dashboards that blend data from various sources for real-time visualizations.

bibliographic metadata cleanup
to understand the marriott library's collections, one must first understand the relevant metadata, which for the rare books department is in the machine-readable cataloging (marc) format. a popular criticism of marc, commonly used in traditional library cataloging, is that the schema is highly regulated and, at times, redundant. however, for the purposes of this project, those qualities proved to be a boon. an older, uncorrected record in the digital library might list london as the place of publication for a particular book, but it was not immediately apparent if that referred to london, england; london, ontario; or london, ohio. however, a marc record would not only list a book's city of publication in the 260 or 264 field but would also contain a two- or three-letter code in the 008 field that specified the country, us state, canadian province or territory, or australian state or territory in which it was published. for this reason, the authors decided to base their analysis on marc record data from the physical collection instead of the dublin core metadata used in the digital library. in order to tease out the relationships between our digital and physical collections, each of the approximately 55,000 rare books bibliographic records stored in alma, the marriott library's cloud-based library services platform, would have to have a common set of data points that could be compared. for the purposes of this analysis, the authors chose to investigate the place of publication and the subject of each work. despite the relative rigidity of marc metadata, some of the alma records lacked country of publication data in the 008 field. these records were not incorrect, but merely outdated: some had been copied directly from paper catalog cards when the library first transitioned to a computer-based cataloging system, while others were created using different metadata standards.
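to make the distinction between the fixed-field code and the transcribed imprint concrete, the sketch below shows how the two values can be read side by side with the pymarc library. this is a hypothetical illustration rather than the study's actual workflow (the authors used alma normalization rules and an excel export); the file name and the exact review logic are assumptions, while the field positions follow standard marc 21 conventions, where the place-of-publication code occupies positions 15–17 of the 008.

```python
# a minimal, hypothetical sketch: compare the marc 008 place-of-publication code
# (positions 15-17) with the transcribed imprint (260/264 $a) for each exported
# bibliographic record. "rare_books.mrc" is an assumed file name.
from collections import Counter

from pymarc import MARCReader

country_counts = Counter()

with open("rare_books.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is None:  # skip records pymarc could not parse
            continue

        # control field 008: positions 15-17 hold the geographic code
        fields_008 = record.get_fields("008")
        code = fields_008[0].data[15:18].strip() if fields_008 else ""

        # transcribed imprint: 260 $a (older records) or 264 $a (rda records)
        imprint = ""
        for tag in ("260", "264"):
            for field in record.get_fields(tag):
                subfields_a = field.get_subfields("a")
                if subfields_a:
                    imprint = subfields_a[0]
                    break
            if imprint:
                break

        # flag records whose fixed field is blank or coded "unknown" ("xx")
        if not code or code == "xx":
            print(f"needs review: code={code!r} imprint={imprint!r}")
        country_counts[code or "(blank)"] += 1

print(country_counts.most_common(10))
```

in the study itself, the equivalent check and the bulk corrections were carried out with alma normalization rules rather than an external script, as described below.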
approximately 6,000 rare books either completely lacked a country code in the 008 field or had data that could possibly be enhanced by, for example, replacing a code for the united states with a code for a particular state. instead of editing all 6,000 records by hand, the cataloger wrote several metadata normalization rules in alma to automatically correct the most obvious errors. records that listed chicago as the place of publication were assigned the marc geographic code for illinois, while those that were published in lugduni batavorum, the latin designation for leiden, were given the geographic code for the netherlands. however, 3,000 records were unable to be enhanced in this manner, either because their place of publication was an ambiguous city name like cambridge or because the place of publication was listed as unknown. the cataloger examined each record individually and was ultimately unable to assign a marc geographic code to 1,682 records, most of which were arabic manuscripts or advertising pamphlets that simply did not list a place of publication or creation. while these records would be excluded from the place of publication analysis, they could be mined for data on other topics. with the marc records as complete as possible, the metadata was exported from alma into an excel spreadsheet and given to the metadata librarian for further manipulation.

metadata transformation & visualization creation
the next phase involved standardizing the raw metadata to create the human-readable data, rather than marc codes, necessary to produce data visualizations. once the physical rare books' bibliographic metadata was updated in alma, it was then exported as a comma-separated values file. the raw data export produced a massive spreadsheet containing over 50,000 marc records. these records included two- and three-letter location codes for the place of publication from the library of congress marc code list for geographic areas. two-letter codes are used for most countries, while three-letter codes are used for states within the united states, provinces within canada, and territories within the united kingdom. while this additional level of location data was available for books from the united kingdom and canada, it was decided to review the collection at a country level for consistency and map display. books from the united states, however, were analyzed on a state level, considering the research is germane to an american institution. using a list correlating these codes to the location name provided by the library of congress (https://www.loc.gov/marc/countries/countries_code.html), a vlookup formula was used in microsoft excel to add the location names to the marc records. the vlookup formula pulls in data from one table to another as long as the two tables have one data field in common. in this exercise, both tables of data contained the library of congress location codes; therefore the lc location codes were used to add the location names to the table containing the marc metadata. once the location names were added, there were some additional quality control steps required, as lc location names that included outdated country names posed issues when mapping the data to current country names and boundaries. for example, we combined the codes for east germany and west berlin into the one representing contemporary germany.
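the vlookup step described above can also be reproduced outside of excel. the sketch below is not the authors' actual spreadsheet; it is a hedged illustration of the same code-to-name join using pandas, where the file names, column names, and the example superseded-code mappings are all assumptions made for the sake of the example.

```python
# hypothetical sketch of the vlookup step in pandas: join the exported marc
# records to the library of congress geographic code list on the shared code column.
import pandas as pd

# assumed input files: the alma export and a two-column code list
# (e.g., "ilu" -> "illinois", "ne" -> "netherlands") derived from
# https://www.loc.gov/marc/countries/
records = pd.read_csv("rare_books_export.csv")        # includes a "country_code" column
code_list = pd.read_csv("marc_geographic_codes.csv")  # columns: "country_code", "location_name"

# a left join keeps every bibliographic record, even those with no usable code,
# just as vlookup leaves unmatched rows visible for review
merged = records.merge(code_list, on="country_code", how="left")

# records whose code did not match the lc list need manual review, much like the
# ambiguous "cambridge" and "unknown" cases described above
needs_review = merged[merged["location_name"].isna()]
print(f"{len(needs_review)} records need manual review")

# collapse superseded codes to current country names before mapping in tableau;
# the specific code values here are illustrative only
superseded = {"ge": "germany", "gw": "germany"}
merged["location_name"] = (
    merged["country_code"].map(superseded).fillna(merged["location_name"])
)

merged.to_csv("rare_books_with_locations.csv", index=False)
```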
for countries that have since been dissolved and rezoned to multiple countries, e.g., the ussr and czechoslovakia, these records were manually checked for city names and then added to the current country. once this process was completed, the results showed the rare books were published in 97 countries and all 50 united states, as well as the district of columbia. examining the subject content of the rare books physical collection was another aspect of analysis for this project. in contemplating this analysis, using the lc subject heading field was considered, however, faceting of lc subject headings and the structure of the exported data posed too many issues for a rather simple analysis. instead, the library of congress call number was used to extract high-level lc classification information for each work by separating the first two letters of the call numbers included in the exported marc metadata, which indicated lc class and subclass. to add the lc class and subclass names to these letters, a vlookup formula was used again to match the letter codes to the list of lc classification categories. once classification categor ies were added to the 55,000 records, works from all 21 lc master classes and 190 subclasses were represented in the rare books collection. in addition to the physical rare books collection held at the marriott library, there is a selection of this collection that has been digitized and is accessible in the marriott digital library. the rare books digital collection (https://collections.lib.utah.edu/search?facet_setname_s=uum_rbc) comprises 780 works, although this number includes unique records for individual volumes within a series and therefore is not a true comparison to marc metadata records, which contain one record for a series. for example, the silver reef miner, a newspaper “devoted to the mining interests of southern utah” published during the late nineteenth century, has 30 individual volumes in the digital library, but these are represented in just one marc record. in order to compare the digital collection to the physical collection, the datasets would need to have https://www.loc.gov/marc/countries/countries_code.html https://collections.lib.utah.edu/search?facet_setname_s=uum_rbc information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 6 consistent data for comparison, namely place of publication and lc classification derived from call numbers. the digital collection metadata is in the dublin core schema, which does not include all of the metadata found in the marc metadata, nor does it use the same format. while there is a dublin core spatial element used to capture geographic data on what the item is about, this does not always align neatly with the location of an item’s publication. for example, reise in das innere nord-america in den jahren 1832 bis 1834 (2 volumes) is a book printed in germany that documents an expedition to north america in 1832–1834 and includes illustrations of native american people from the swiss artist karl bodmer. for these volumes, the appropriate dublin core spatial data would include the specific regions the expedition traveled to in north america; in the marc 26x field, however, it contains koblenz, germany, the city where the volumes were published. call number data was included for many digitized works, but not in a consistent format. 
in order to use the same data to compare the physical rare books collection to the digital one, the digital collection metadata was updated with the improved/accurate call numbers found in the marc metadata. another improvement to the digital collection metadata was the addition of the metadata management system (mms) id unique numerical identifiers that aid in locating a record in the alma system. when the rare books’ descriptive metadata was originally converted to dublin core during the digitization process, some titles and call numbers were changed and became different from their physical counterparts. the inclusion of the mms id allows for a consistent identifier between the physical and digital collections. when selecting data visualization software, being able to create a map of the places where books in the rare collection were published was a priority. considering the goal of creating an easily replicable workflow for other libraries, the authors sought a freely accessible program that did not require advanced geospatial skills, unlike esri’s arcgis software. tableau software is a data visualization software package with both a public and desktop version. the tableau desktop version requires a subscription fee while tableau public is open access. for the purposes of this study, tableau public offered open access and mapping features that are enabled without any geospatial knowledge necessary. analysis creating a variety of data visualizations allowed information about the rare books physical and digital collections to be more apparent than merely browsing entries in a spreadsheet. for example, there are numerous geographic disparities between the two collections of rare materials as shown in the american states in which works from the collections were published. while books from all 50 states are found in the physical collection (fig. 1), only 18 states are represented in the digital library (fig. 2), with new york being the state in which the highest number of books were published. as new york city has long been a major publishing center in the united states, the authors were not surprised by this. however, the subsequent states were quite different: california and utah ranked second and third for the physical collection, while massachusetts and pennsylvania claimed those spots for the digital library. the authors believe several factors might influence this discrepancy. first, works can only be added to the digital library if they are no longer in copyright, and states with longer histories of european-american settlement are more likely to have published books that are now out of copyright. furthermore, these older books are more likely to be in a fragile condition and therefore may have been digitized to decrease the amount of physical handling to which they are subjected. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 7 figure 1. marriott library physical rare books by us state. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 8 figure 2. marriott library digital rare books by us state. there are other discrepancies when comparing the country of publication between the physical (fig. 3) and digital collections (fig. 4). while 61% of the physical rare books were published in the united states, only 20% of the digitized works were published in this country. 
the authors expected to see egypt rank highly in the physical collection, as many of the rare books were purchased by former university of utah professor dr. aziz atiya to support the middle east center for research he founded; similarly high in rank, britain, germany, france, and italy were all major centers for the early printing and publishing trade in early modern europe. however, there is strong geographic bias in the digital collection, as only north america, western europe, and one african country are represented online. copyright may again play a factor, as the earliest books from non-western countries in the collection often date to the twentieth century, but a eurocentric or other bias cannot immediately be discounted. while the physical collection contains many more european imprints than from the global south, it is much more diverse than the digital collection. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 9 figure 3. marriott library physical rare books by country. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 10 figure 4. marriott library digital rare books by country. the analysis of the subjects represented in the collection proved to be somewhat challenging to study. due to the nature and structure of library of congress subject headings, which attempt to mirror natural language and may be composed of “strings” of phrases to represent complex topics, no tableau public visualization could be created that effectively grouped similar content areas together without looking quite fragmented. instead, the authors based their analysis of subjects on library of congress classification numbers (i.e., call numbers) assigned to works, which, though not exact, can be understood as distillations of the subject of a work.8 once again there were considerable differences between the physical and digital rare books collections (fig. 5). as in many generalized special collections, literature and history make up significant portions of the physical collection. however, works on bibliography, or the study of books and book history, comprise a notable percentage of the collection. many of these are modern works on book history and special collections librarianship and therefore are unable to be digitized due to copyright law. nearly 9% of the digital collection is on the sciences, though these works comprise only 3% of the physical collection. while this portion of the holdings may be relatively small, it contains many scientific high points such as vesalius’ de humani corporis fabrica, early printings of ancient mathematical texts, and the journals of major scientific societies, which may have been digitized both for physical preservation as well as high interest on the part of students and faculty on campus. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 11 figure 5. percentage of rare books physical and digital collections by library of congress class. next steps now that the first phase of the project is complete, the authors would like to conduct additional analyses. first, they plan to compare the usage statistics of the digital rare books collection to the circulation statistics of the physical collection. this method of inquiry was not possible at the start of the project, as circulation information for the rare books was previously not tracked in the integrated library system. 
now that rare books are checked out to patrons for use in the special collections reading room, this data can be quickly pulled from alma. once there is a year’s worth of circulation data for the rare books unhindered by the changes necessitated by the coronavirus pandemic, the authors will compare the usage statistics of the digital collection for the same time period. do patrons in the reading room look at similar materials to online patrons, or are their interests vastly different? are some rare books used so frequently that they would benefit from the added physical security that digitization brings? information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 12 the authors also plan to pull annual usage statistics from the digitized rare books and share this with special collections division leadership. online patrons are still library patrons, and the division can use the viewing data to show the national and international reach of the collection. relatedly, the authors will investigate the digital library usage data in more depth. do patrons from utah, the united states, and the world look at similar materials, or are there geographic divides among the online patrons? do countries that are home to a majority of the university’s international student body have higher viewership numbers? finally, the authors wish to convene a group of stakeholders to create a formal collection development plan for the rare books component of the digital library. given the library’s limited resources, it is imperative that digitization be done thoughtfully and systematically. there is a good rationale for creating a digital collection that is representative of the physical rare books collection as well as one that highlights certain collection areas. both material fragility and the modern scholarly emphasis on highlighting the stories of people of color, women, and other underrepresented groups in library collections provide strong counterarguments to making digital libraries strictly representative of their physical counterparts. since informal conversations with patrons of the marriott library revealed that they assumed the digital library was representative of the collection overall, it is imperative that this assumption be either confirmed or disclaimed in a publicly viewable statement. in the case of the rare books department, the authors are in favor of a focused, rather than representative, collection development policy. firstly, many of the books in the collection are under copyright and therefore cannot be digitized, while other materials like reference sources for rare books librarians will be of limited interest to the general public. furthermore, complex items such as artists’ books are often poor candidates for digitization, as they may have movable components that cannot be captured accurately in a still photograph. as for what should be included online, the authors fully support equity, diversity, and inclusion efforts at the university of utah and would like to see the digital library highlight materials from marginalized groups whenever possible. usage statistics from the physical and digital collections, when they become available, should also inform the collection development policy to encourage traffic to the digital library. 
whatever is ultimately decided, however, the clarity a written policy provides will help streamline decision-making and ultimately help both library staff and patrons understand and search within the digital library much more effectively.

endnotes
1 bart ooghe and dries moreels, "analysing selection for digitisation: current practices and common incentives," d-lib magazine 15, no. 9/10 (2009), https://doi.org/10.1045/september2009-ooghe.
2 alexandra mills, "user impact on selection, digitization, and the development of digital special collections," new review of academic librarianship 21, no. 2 (2015): 166, https://doi.org/10.1080/13614533.2015.1042117.
3 peter michel, "digitizing special collections: to boldly go where we've been before," library hi tech 23, no. 3 (2005): 382, https://doi.org/10.1108/07378830510621793.
4 bradley j. daigle, "the digital transformation of special collections," journal of library administration 52, no. 3–4 (2012): 253, https://doi.org/10.1080/01930826.2012.684504.
5 niso framework working group, a framework of guidance for building good digital collections (2007), https://www.imls.gov/sites/default/files/publications/documents/framework3.pdf.
6 the urls in the following list were accurate as of march 2, 2022.
7 sarah anne murphy, "data visualization and rapid analytics: applying tableau desktop to support library decision-making," journal of web librarianship 7, no. 4 (2013): 465–76, https://doi.org/10.1080/19322909.2013.825148.
8 readers who do not work with marc metadata may not be familiar with how library of congress call numbers are assigned. created in 1891, the classification system is based on 21 classes designated by a single letter; subclasses add one or two letters to the initial class. catalogers must choose which one of the classes to assign to a particular work. the subject headings may guide a cataloger towards a certain class, but there is not a 1:1 relationship between subject headings and call number classes.

hackathons and libraries: the evolving landscape 2014–2020
meris mandernach longmeier
information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.13389
meris mandernach longmeier (longmeier.10@osu.edu) is head of research services, the ohio state university libraries. © 2021.

abstract
libraries foster a thriving campus culture and function as "third space," not directly tied to a discipline.1 libraries support both formal and informal learning, have multipurpose spaces, and serve as a connection point for their communities. for these reasons, they are an ideal location for events, such as hackathons, that align with library priorities of outreach, data and information literacy, and engagement focused on social good.
libraries can act solely as a host for events or they can embed in the planning process by building community partnerships, developing themes for the event, or harnessing the expertise already present in the library staff. this article, focusing on years from 2014 to 2020, will highlight the history and evolution of hackathons in libraries as outreach events and as a focus for using library materials, data, workflows, and content. introduction as a means of introduction to hackathons for those unfamiliar with these events, the following definition was developed after reviewing the literature. hackathons are time-bound events where participants gather to build technology projects, learn from each other and experts, and create innovative solutions that are often judged for prizes. while hacking can have negative connotations when it comes to security vulnerabilities, typically for hackathon events hacking refers to modifying original lines of code or devices with the intent of creating a workable prototype or product. events may have a specific theme (use of a particular dataset or project based on a designated platform) or may be open-ended with challenges focused on innovation or social good. while hackathons have been a staple in software and hardware design for decades, the first hackathons with a library focus were sponsored by vendors, focused on topics such as accessibility and adaptive technology for their content and platforms.2 other industry hackathons focused on re-envisioning the role of the book in 2013 and 2014.3 as hackathons became more popular at colleges and universities, library participation evolved from content provider to event host. these partnerships were beneficial to libraries interested in shifting the perception of libraries from books to newer areas of expertise around data and information literacy. however, many libraries realized that by partnering in planning the events greater possibilities existed to educate participants about library content and staff expertise. some examples include working with public library communities to highlight text as data, having academic subject librarians work with departmental faculty to embed events within curriculum and assignments, and for both academic and public libraries to promote library-produced and publicly available datasets.4 information technology and libraries december 2021 hackathons and libraries |longmeier 2 there are many roles that libraries can take in these events. libraries can act as event hosts where they provide the space at a cost or for free.5 in other cases, library staff become collaborators and in addition to space may assist with planning logistics, judging, building partnerships, and have some staff present at the events.6 in public libraries this often includes building relationships with the city or specific segments of the community based on the theme of the event. on college campuses, it may be a partnership with a specific disciplines or campus it or an outside sponsor. in this way, the libraries are building and sustaining the event due to aligned priorities with the other partners. another option would be for the library to be the primary sponsor, where the library may provide prizes, the theme for the hackathon, as well as many of the items listed above.7 however, instead of specific categories, it should be viewed as a continuum of partnership and the amount of involvement with the event should align with the library’s priorities of what it hopes to accomplish through the event. 
how involved in event planning specific libraries want to be may depend on the depth of the existing partnerships as well as how many resources the library wants to commit to the event. libraries have always existed as curators and distributors of knowledge. some libraries are using hackathons to advance both their image and their practices. libraries are evolving into new roles and have grown to support more creative endeavors, such as the maker movement. this shift of libraries from book-provider to social facilitator and information co-creator aligns with hackathon events. the physical spaces themselves are ideal to support public outreach events and libraries are already providing makerspaces or similar services that would overlap with a hackathon audience.8 additionally, the spaces afforded by libraries allow flexibility and creativity to flourish, ideas to be exchanged, and different disciplines to mingle and co-produce. library staff focused on software development may have projects that would benefit from outside perspectives as well. in recent years libraries have become stewards of digital collections that can be used and reused in innovative ways. many libraries have chosen wikipedia edit-a-thons as a means of engaging with the public and enhancing access to materials.9 similarly, the collections-as-data movement is blossoming and allowing galleries, libraries, archives, and museum (glam) institutions to rethink the possible ways of interacting with collections. many public libraries are partnering with local or regional governments to build awareness of data sources and build bridges with the community around how they would like to interact with the data.10 additionally, as data science continues to grow in importance in both public and academic libraries, data fluency, data cleaning, and data visualization could be themes for a hackathon or data-thon.11 for those unfamiliar with these events, table 1 provides some generalized definitions created by the author of the different types of events and their intended purpose. for some organizations, there are ways to support these events that consume fewer resources or require less technical knowledge, such as an edit-a-thons or code jams. information technology and libraries december 2021 hackathons and libraries |longmeier 3 table 1. 
defining common hackathon and hackathon-like events, purpose, and typical size of events
type of event | definition | purpose | size of event
hackathon | a team-based, sprint-like event focused on hardware or software that brings together programmers, graphic designers, interface designers, project managers, or domain experts; can be open-ended idea generation or for a specific provided theme | build a working prototype, typically software | up to 1,000 participants, usually determined by space available
idea fest | business pitch competition where individuals or teams pitch a solution or new company (startup) idea to a panel of judges | deliver an elevator pitch for an idea, could be to secure funding | <100
coding contest or code jam | an individual or team competition to work through algorithmic puzzles or on specific code provided | learning to code or solve challenges through coding; may produce a pitch at the end rather than a product | 20–50
edit-a-thon | an event where users improve content in a specific online community; can focus on a theme (art, country, city) or type of material (map) | improving information in online communities such as wikipedia, openstreetmap, or localwiki | 20–100
datathon | a data-science-focused event where participants are given a dataset and a limited amount of time to test, build, and explore solutions | usually a visualization or software development around a particular dataset | 50–100
makeathon | hardware-focused hackathon | build working prototype of hardware | up to 300 participants

methods
to find articles in the library and information science literature related to hackathons and libraries, the author searched the association for computing machinery (acm) digital library, scopus, library literature and information science, and library and information science and technology abstracts (lista) databases. in scopus and the acm digital library, the most successful searches included the following: (hackathon* or makeathon*) and library; in library literature and information science and lista databases, the most successful searches included:
the second category, those that use library as source, focused on highlighting library spaces or services, workflows, or collections as the theme of the events. additionally, a few articles in the second category discussed how to prepare or clean library data or library sources before the event to ensure that participants were able to use the materials during the time-bound event. in some cases where the source materials were from the libraries, the event also occurred in the library; thus, some articles fit into both categories and are highlighted in both sections.

results: library as place

the following summaries of hackathons and libraries as places for events will be grouped into two subgenres: library spaces and outreach events. libraries, both public and academic, are ideal locations for hosting large, technology-driven events given the usual amenities of ample parking, ubiquitous wi-fi, adequate outlets, and at times already having 24-hour spaces built into their infrastructure. more and more libraries are offering generous food and drink policies, a benefit, as sustenance is a mainstay at these multiday events. additionally, libraries already host a number of outreach events and serve as a community information hub.

using libraries as event hosts for hackathons

a number of articles detail the use of library spaces to host hackathon events.12 the university of michigan library, a local hackerspace (all hands active), and the ann arbor district library teamed up to host a hackathon focused on oculus rift.13 this event grew out of a larger partnership with the community and sought to mix teams to include participants from all three areas. the 2018 article by demeter et al. highlights lessons learned from florida state university library and many of the planning steps involved when hosting large outreach events in library spaces.14 while the library initially hosted a 36-hour event, hackfsu, as a favor to the provost in the first year, they continue to host the event, providing library staff as mentors and logistical support. after the first year they started charging the student organization for use of the space and direct staffing costs for the hours beyond normal operating hours. while focused primarily on providing a central campus space, the library also sees it as a way to highlight the teaching and learning role of the library. similarly, nandi and mandernach detail the steps involved in planning hackathon events and some benefits of choosing the library as a location for the event.15 at ohio state, hackathon events in 2014 and 2015 were held in the library due to twenty-four-hour spaces, interest by the libraries in supporting innovative endeavors on campus, and a participant size (100–200 attendees) that could be accommodated in the space. other events chose academic libraries as locations for hackathons due to their central location on campus.16 an initial summary of library hackathons was captured by r. c. davis, who detailed that libraries may be motivated to host such events as they align with library principles of "community, innovation, and outreach."17 she points out that libraries are ideal locations because of small modular workspaces paired with a large space for final presentations. additionally, adequate and sufficiently strong wi-fi or hardwired connections, a multitude of power outlets, and 24-hour spaces are appealing for these kinds of events.
event planners should know that the necessities include free food and multidisciplinary involvement. davis details ways to plan smaller events, such as code days or edit-a-thons, if staffing does not allow for a large hackathon event. in all cases, the libraries serve a purpose to either campus or community as the location and sometimes also provide staff for the events.

hackathons as library outreach

hackathon events are a great way to reach out to the community and provide a fresh look into libraries as purveyors of information focused on more than books. at the 2014 computers in libraries conference, chief library officer mary lee kennedy delivered a keynote sharing stories of the new york public library's experiences hosting wikipedia edit-a-thons and other hackathons at various branches since 2014.18 the goals for these outreach events were to highlight strategic priorities around making knowledge accessible, re-examine the library purpose, and spark connections. early library hackathon events focused on outreach included topics such as accessibility or designing library mobile apps.19 more recent events have focused on outreach but with an eye toward sharing content as part of the coding contest.20 even library associations have hosted preconference hacking events to highlight what libraries are doing to foster innovation.21 the future libraries product forge, a four-day event, was hosted in collaboration with the scottish library and information council and delivered by product forge, a company focused on running hackathons that tackle challenging social issues. the 2016 event focused specifically on public libraries in scotland, and seven teams, composed mainly of students from a local university, worked with public library staff and users as well as regional experts in technology, design, and business.22 the goals of the event were to raise awareness of digital innovation with library services, generate enthusiasm for approaches to digital service design, and codesign new services around digital ventures. participants created novel products including digital signage, a game for young readers, a tool for collecting user stories about library services, and an app to reserve specific library spaces. another common focus for library hackathon outreach events is the theme of data and data literacy. in july 2016, the los angeles public library hosted the civic information lab's immigration hackathon.23 this outreach event gathered 100 participants to address local issues around immigration. the library, motivated by establishing itself as a "welcoming, trusting environment," wanted to be a "prominent destination of immigrant and life-enrichment information and programs and services."24 newcastle libraries ran two day-long events focused on promoting data they released under an open license as part of the commons are forever project.25 they used both events to educate users about tools such as github and data visualization tools, and to run a gif-making session with historical photographs. similarly, toronto public library hosted a series of open data hackathons to highlight the role of the libraries in civic issue discourse, data literacy, and data education.26 their events combined the hackathon with other panel presentations and resources focused on mentorship and connection-building in the technology sector.
the library also used the event to promote their open data policy, build awareness around the data provided by the library for the community, and highlight their role in facilitating conversations around civic issues through data literacy and data education. edmonton public library hosted its first hackathon in 2014 for international open data day. one of the main drivers was to build the relationship with their local government.27 they built their event around the tenets laid out in the open data hackathon how-to guide and by a blog post about the city of vancouver’s 2013 international open data day hackathon.28 they took a structured approach to documenting expectations of both partners around areas such as resources, staffing, and costs, which served as a roadmap for the hackathon and the partnership. the library provided the event space, coffee and pizza, an emcee, tech help and wi-fi, door prizes and “best idea” prize, and promotional material. the city recruited participants and provided an orientation, promotional banners, and a keynote. the event led to a deeper partnership with the city and additional hacking events. in these ways, the hackathon served a greater purpose of community building and awareness around data, the role the library plays in interpreting data, and how the libraries serve as a resource hub to the community. events supporting library teaching mission at academic institutions, the events often focus on outreach to their own campus community. in 2015, adelphi university hosted their first hackathon and the libraries funded the event themselves rather than seeking outside funding.29 the article details the considerable lessons learned through the process as well as a step-by-step guide to planning a smaller event. similarly, york university science and engineering library hosted hackfests in the library and embedded an event as part of an introductory computer science course.30 shujah highlighted some of the benefits to the library hosting a hackathon included: establishing libraries as part of the research landscape, providing a constructive space for innovation and innate collaborative environment, highlighting the commitment to openness and democratizing knowledge, and acknowledging the library’s role in boosting critical thinking and information literacy concepts. shin, vela, and evans highlight a community hackathon at washington state university college of medicine where a group of librarians from multiple institutions staffed a research station throughout the event.31 while the station was underutilized by participants, as only seven questions were asked during the event, the libraries deemed their participation a success as it worked as an outreach and promotion mechanism for both library services and expertise. at some public libraries, the focus of the hackathon is on education and teaching basic coding skills. whether called a coding contest, hackathon, or tech sandbox, there are opportunities for programming with a focus on learning and skill-building and fun.32 santa clara county library district used a peer-to-peer approach for mentoring and hosted a hackathon in 2015 for middle and high-school students.33 the library staff facilitated the event planning and recruited judges from the community, but the bulk of the event was coordinated by the students. 
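the structured approach edmonton took to documenting partner expectations, like the planning guides discussed in the next section, suggests capturing roles, resources, and costs explicitly before the event. the following is a minimal, hypothetical sketch of how such a shared expectations record might be kept in a structured form; the field names, staffing numbers, and cost assignments are invented for illustration, while the listed contributions paraphrase the edmonton example above.

```python
# Hypothetical sketch only: a structured record of partner expectations for a
# library-city hackathon, loosely paraphrasing the Edmonton example above.
# Field names, staffing numbers, and cost assignments are invented for illustration.

partnership_roadmap = {
    "event": "international open data day hackathon",
    "library_provides": [
        "event space, tech help, and wi-fi",
        "coffee and pizza",
        "emcee",
        "door prizes and a 'best idea' prize",
        "promotional material",
    ],
    "city_provides": [
        "participant recruitment",
        "orientation and keynote",
        "promotional banners",
    ],
    "staffing": {"library_staff_on_site": 3, "city_staff_on_site": 2},  # invented values
    "costs": {"catering": "library", "printing": "shared"},             # invented values
}

def unassigned_areas(roadmap: dict) -> list:
    """Return any expectation areas that the partners have not yet documented."""
    required = {"library_provides", "city_provides", "staffing", "costs"}
    return sorted(required - roadmap.keys())

if __name__ == "__main__":
    print(unassigned_areas(partnership_roadmap))  # prints [] once every area is covered
```

keeping the record in a shared, structured form makes it easy to see at a glance which responsibilities remain unassigned before the event and to reuse the same roadmap for subsequent hacking events with the partner.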
considerations when hosting events in library spaces

a couple of substantive reports provide overarching recommendations and considerations for hosting hackathons in library spaces, including planning checklists, tips on getting funding, building partnerships with local community officials, and thinking through the event systematically. recently, the digital public library of america (dpla) created a hackathon planning guide that details a number of logistical issues to address during the planning phases, both pre- and post-event.34 this report highlights specific considerations for galleries, libraries, archives, and museums that are looking to host a hackathon. after hosting a successful hackathon, librarians at new york university created a libguide called hack your library, which is a planning guide for other libraries considering hosting a similar event.35 the engage respond innovate final report: the value of hackathons in public libraries was put together following an event the carnegie uk trust sponsored.36 this guide highlights some of the challenges present with hackathons, including intellectual property of the creations, prizes, participant diversity, and complications that arise from choosing either specific themes or open-ended challenges. it also highlights some of the main reasons a library should consider hackathons and other coding events, including ways to promote new roles of libraries within communities, promote specific collections, capitalize on community expertise, gain insight about users, help users build new skills and improve digital literacy, and develop tools that increase access to materials. finally, the report points out that hosting an event will not be the sole solution to a library's innovation problem. yet if the library is clear on why it wants to hold a hackathon, being deliberate about the expectations and outcomes it is trying to achieve will increase the chances for success.

results: library as source

the other category of articles about hackathons and libraries focuses on the library as the source for the challenge or theme of the hackathon. the following summaries highlight articles where the libraries provided the challenges around library spaces or services, library datasets, workflows, or collections as the theme for the hackathon. this section also details steps involved in cleaning data for use/re-use in time-bound events.

using hackathons to improve library services and spaces

a few articles discuss libraries that proposed hackathon themes around improving library services. a 2016 article describes how adelphi university libraries hosted a hackathon and provided the theme of developing library mobile apps and web software applications.37 the winning student team created an app for library group study meetups. similarly, the librarians from the university of illinois tried three approaches for library app development: a student competition, a project in a computer science course, and a coding camp. with the adventure code camp, students co-designed with librarians over the course of two days.38 they advertised to specific departments and courses; ten students were selected, with six ultimately participating in the two-day coding camp. students were sent a package of library data, available apis, and brief tutorials on coding languages that may be useful. mentors and coaches were available throughout the coding camp.
the authors provided tips for others trying to replicate their approach, as well as insights from the students about interest in developing apps that include library data but that don't solely focus on library services. the following year the librarians hosted a coding contest focused specifically on app development related to library services and spaces.39 the library sponsored the event and served as both a traditional client and partner in the design process. ultimately, six teams with a total of 26 individuals participated, and each app was "required to address student needs for discovery of and access to information about library services, collections, and/or facilities" but not duplicate existing library mobile apps. they based their approach on the massachusetts institute of technology's entrepreneurship competition. through this process, co-ownership was preferred, and many teams set up a licensing agreement as part of the competition to handle intellectual property for the software. students had two weeks to complete the apps and were judged by both library and campus it administration. this article details what they learned through the process given the amount of attrition from selection of teams to final product presentations.

the new york university school of engineering worked with the libraries and used a hackathon theme of noise issues to coincide with the renovation of the library.40 the libraries created a libguide to provide structured information about the event itself (https://guides.nyu.edu/hackdibner). they used the event to market the new maker space and held workshops there leading up to the event. in the inaugural year they held the event over the course of two semesters and saw a lot of attrition due to the event length. in the second year, following focus groups with participants, they designed a library hackathon with four goals: 1) appeal to a large base of the student population, 2) create a triangle of engagement between the student and the library, the library and the faculty, and the faculty and the students, 3) provide an adaptable model to other libraries, and 4) highlight the development of student information literacy skills.41 the second year's approach required more work from the participants, who had to pitch an initial concept, provide a written proposal, and give a final presentation. library staff and guest speakers offered workshops to help students hone their skills. the planners evaluated the event through surveys and student focus groups. overall, the students applied what they learned about information literacy and were highly engaged with the codesign approach to library service improvements.

similarly, mcgowan highlights two hackathons at purdue that focused on inclusive healthcare and how the libraries applied design thinking processes as part of the events.42 the librarian wanted to encourage health sciences students to examine health data challenges. to support this goal, she applied the blended librarians adapted addie model (blaam) as a guide to developing a service to prepare students to participate in a hackathon. a number of pre-event training sessions were held in the libraries and covered topics such as research data management, openrefine and data cleaning, gephi for data visualization, and javascript.
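as a concrete illustration of what the data-cleaning portion of such pre-event preparation can involve, the following is a minimal sketch assuming a hypothetical collections csv and the pandas library; the file name and column names are invented for the example and are not drawn from any of the cited events.

```python
# Hypothetical sketch: preparing a small collections dataset ahead of a time-bound event.
# Assumes pandas is installed; the file name and column names are invented examples.
import pandas as pd

df = pd.read_csv("collection_items.csv")

# Normalize column names and trim stray whitespace in text fields.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].str.strip()

# Drop rows without a stable identifier and standardize dates so that
# participants do not spend event time on basic cleanup.
df = df.dropna(subset=["identifier"])
df["date_created"] = pd.to_datetime(df["date_created"], errors="coerce")

# Export the cleaned file plus a simple data dictionary describing each column,
# since documentation is as important to participants as the cleaning itself.
df.to_csv("collection_items_clean.csv", index=False)
data_dictionary = pd.DataFrame(
    {"column": df.columns, "dtype": df.dtypes.astype(str).values,
     "missing_values": df.isna().sum().values}
)
data_dictionary.to_csv("collection_items_data_dictionary.csv", index=False)
```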
while this initial approach was in tandem with the hackathon events, students reported that they needed assistance in finding and cleaning datasets for use. in this case, developing library services to prepare for hackathon events ended up out of alignment with both the library's mission and the participants' expectations.

using library materials for hackathon themes

several events have focused on library as source, where the library's materials or processes serve as the theme of the hackathon, particularly around digital humanities (dh) topics.43 in september 2016, over 100 participants worked with materials from the special collections of hamburg state and university library, a space that serves both the university and the public.44 it followed the process established by coding da vinci (https://codingdavinci.de/en), an event that occurred in 2014 and 2015. the event at hamburg state and university library had a kick-off day for sharing available datasets, brainstorming projects using library materials, and team-building opportunities. the event had a second day of programming, and then teams had six weeks to complete their projects. some exemplary products included a sticker printer that would print old photographs, a quiz app based on engraving plates, and a project using a social media platform to bring the engravings to the public. the event was successful and resulted in opening additional data from the institution.

several examples focus on highlighting digital humanities approaches as part of the events. in 2016, four teams from across european institutions participated over five days in kibbutz lotan in the arava region of israel to develop linguistic tools for tibetan buddhist studies with the goal of revealing their collections to the public.45 the planning team recruited international scholars to participate in prestructured teams (teams consisted of computer scientists as well as a tibetan scholar) in israel. although it was less a traditional hackathon than a coding contest around a specific task, the event highlighted tools and methods for understanding literary texts. the format of the event for encouraging interdisciplinary efforts in the computational humanities was deemed successful, and it was repeated the next year with a focus on manuscripts and computer-vision approaches. recently, the university of waterloo detailed a series of datathons using archives unleashed to engage the community in an open-source digital humanities project.46 the goal of the events was to engage dh practitioners with the web archive analysis tools and attempt to build a web archiving analysis community.

in 2016, the american museum of natural history in new york hosted their third annual hackathon event, hack the stacks, with more than 100 participants.47 the event focused on creating innovative solutions for libraries or archives that would "animate, organize, and enable greater access to the increasing body of digitized content." ten tasks were available for participants to work on, ranging from a unified search interface to reassembling fragments of scientific notebooks and creating timelines of archival photos of the museum itself. in addition to planning the tasks, the library staff ensured that the databases and applications could handle the additional traffic. a multitude of platforms were provided (omeka, dspace, the catalog, apis, archivesspace, etc.) for hackers to use.
all prototypes that were developed were open source and deposited on github at "hack the stacks."48 some cultural institutions have used hackathons as a means of outreach and publicity and then have showcased the outputs at the museums. vhacks, a hackathon at the vatican, was held in 2018 and gathered 24 teams from 30 countries for a 36-hour event.49 the three themes for the event focused on social inclusion, interfaith dialogue, and migrants and refugees. a winner was announced for each thematic area, and sponsors enticed participants to continue working on their projects by holding a venture capitalist pitch a few weeks after the event. another program, museomix, concentrates on a three-day rapid prototyping event where outputs are highlighted in the museum or cultural institution.50 this event has happened annually in november since 2011, and the goal is to create interdisciplinary networks and encourage innovation and community partnership.

improving library workflows and processes

other hackathons have focused on library staff working on library processes themselves. bergland, davis, and traill detail a two-day event, catdoc hack doc, hosted by the university of minnesota data management and access department and focused on increasing documentation by library staff.51 this article details the logistics of preparing for the event as well as a summary of the work completed. they based their approach on the islandora collaboration group's template on how to run a hack/doc.52 they were pleased with the workflow overall, refined some of the steps, and held it again for library staff the following year. similarly, dunsire highlights using a hackathon format to encourage adoption of the cataloging approach of resource description and access (rda) through a "jane-athon."53 events occurred at library conferences or in conjunction with other hackathon events, such as the thing-athon at harvard, with the intention of promoting the use of rda, helping users understand its utility, and sparking discussions. this approach proved useful in uncovering some limitations with rda as well as valuable feedback that could be incorporated into its ongoing development.

considerations when using libraries as source

if libraries are interested in hosting a hackathon where the library plays a more central role, there are several sources of ready-to-use library and museum data that could allow the host to also serve as the content provider. the digital public library of america released a hackathon guide, glam hack-in-a-box: a short guide for helping you organize a glam hackathon, with several sources at the end for finding data related to libraries.54 the university of glasgow began a project called global history hackathons, which seeks to improve access and excitement around global history research.55 additionally, candela et al.
detail the new array of sources for sharing glam data for reuse in multiple ways, including using data in hackathon projects.56 planners could look to the collections-as-data conversations for other data sources that could be adapted for hackathon projects.57 when thinking about hackathons and cultural institutions, sustainability of projects and choice of platforms are important considerations for planners.58 ultimately, the top priority when providing a dataset is to ensure that it is clean and that enough details about the dataset are available for participants to make use of it in their designs given the time constraints of most events.

discussion

hackathons often have a dual purpose of educating the participants and serving as an advertisement for the sponsor's platform or content. participants develop a working prototype or improve their coding abilities; sponsors, including libraries, can benefit from rapid prototyping and idea generation using either their platforms or content. while usable apps or new ideas are a welcome outcome, even if the applications are not used, the events still feed into the larger goal of marketing libraries and their data, building relationships with local communities, or drawing attention to social good. there are benefits to libraries in either hosting or collaborating on the events. in both areas, those of library as space and library as source, hackathons help realign user expectations of libraries. if libraries choose to become involved with hackathons or other coding or data contests, the library should be deliberate in its goals and intended outcomes, as those will help shape both the event and its planning.

libraries are naturally aligned with teaching and learning, are already offering co-curricular programming, and typically serve as physical and communication hubs for their campuses. libraries already prioritize outreach and engagement with constituents both on campuses and in the community. therefore, when programs align with library priorities of data literacy, data fluency, and information evaluation, it is a natural fit to propose involvement in hosting hackathons. many libraries are able to customize their spaces, services, and vendor interfaces, which is a benefit when thinking about having libraries as a theme for an event. other benefits exist for the hackathon event planners when partnering with a library. hackathon planners should consider reaching out to libraries as they already serve as cross-disciplinary event spaces, host many other outreach events, and are often connected to other campus and community stakeholders and communication outlets. since students from all disciplines and colleges already use the library spaces on college campuses, they are an ideal location for fostering collaborations across different colleges and majors. public libraries function as community gathering spots as well. as libraries consider hosting events, several articles provide overarching tips for planning and hosting hackathons and other time-bound events.59 table 2 provides an overview of articles and the areas of coverage for planning topics.
table 2. selected articles for tips on planning hackathon events, based on common article theme areas. the planning topics covered across these articles are location details; sample agenda and timelines; power and computing; mentors/judging; and further readings.

carruthers (2014) — covers four of the five planning topics.
nelson & kashyap (2014) — covers all five planning topics.
jansen-dings, dijk, and van westen (2017) — covers three of the five planning topics.
bogdanov & isaac-menard (2016) — covers three of the five planning topics.
nandi & mandernach (2016) — covers three of the five planning topics.
grant (2017) — covers all five planning topics.

as library data becomes more open and reusable, hackathons will be a way to highlight data availability, promote its use and reuse, and reach out to the community. one issue when considering library collections as potential hackathon themes is that libraries will need to ensure the data are cleaned and contain sufficient metadata so that the data are ready to use. additionally, if there are programming language restrictions for ongoing maintenance by the library after the event, those should be specified when advertising the event. ultimately, the libraries will likely not control the intellectual property (ip) of the tool or visualization developed, but several libraries have specified the ultimate ip as part of the event details, either as open source or co-owned.60 often the goal of the event is the promotion of specific materials or building awareness of a collection rather than any byproduct created during the event. however, it is important for the library to be clear about its intent when advertising to participants. the collections-as-data movement will continue to evolve, and there will be a multitude of library resources that could be mined for use at hackathons or other similar events.

while libraries provide an ideal location and have access to data that can be used for an event, they can also leverage their wealth of experts. library staff can serve as judges, mentors, and connectors to the wider campus or community. events could highlight specific expertise when hackathons focus on particular approaches (data visualization), processes (metadata management or documentation), or codesign of services (physical spaces). table 3 provides examples of hackathon events from a variety of library contexts. hackathons are a great way for libraries to serve as a connector to others on campus or in their communities. if libraries are not interested in or able to host an event themselves, library staff can act as mentors or event judges. at smaller schools, library staff can partner with other campus units to plan a hackathon; similarly, smaller public libraries could work with community organizations to host events. at a smaller scale, if staffing is a concern or a full hackathon is unrealistic, a coding contest or datathon, both of which typically have a shorter duration, might be an option. edit-a-thons are even easier to host as they require only an introduction to the editing process, ample computer space (or laptop hook-ups), and a small food budget. some edit-a-thon events happen in a single afternoon.
table 3. selected hackathon event summaries from various library contexts, based on themes and products of the event

carruthers (2014) — type of library: public + city; size of event: 29 participants; time for event: 1 day; purpose of event: highlight open data from the libraries; role of the library: event space, coffee + pizza, emcee, some prizes, assessment; output: building partnerships with the city, getting dataset requests.

ward, hahn, mestre (2015) — type of library: academic; size of event: 6 teams (25 participants); time for event: 2 weeks; purpose of event: develop apps using library data; role of the library: event sponsor, mentor; output: app development for the library using library data.

mititelu & grosu (2016) — type of library: academic; size of event: 100 participants; time for event: 48 hours; purpose of event: bring together tech students; role of the library: event space; output: app development for sponsors.

nandi & mandernach (2016) — type of library: academic; size of event: 200 participants; time for event: 36 hours; purpose of event: bring together tech students; role of the library: event space, planning logistics, judges; output: various apps, not library related.

baione (2017) — type of library: private museum; size of event: 100 participants; time for event: 2 days; purpose of event: animate, organize, and enable greater access to digitized content from the library; role of the library: create challenges, event space, judges; output: open source apps for glam institutions.

theise (2017) — type of library: academic + public; size of event: 100 participants; time for event: 2 days + 6-week sprint; purpose of event: cultural hackathon to highlight library data and resources; role of the library: event space, challenges, datasets for hacking; output: highlighted data available for use, created apps focused on library materials.

almogi et al. (2019) — type of library: academic; size of event: 23 participants; time for event: 5 days; purpose of event: develop linguistic tools for buddhist studies; role of the library: provided cleaned datasets for manipulation; output: linguistic tools for buddhist studies.

one area for iteration around these events relates to timing. while most hackathons last 24–36 hours, some are run over the course of a one- or two-month period where coding happens remotely with a few scheduled check-ins with mentors before judging and presentations. this notion of a remote event may have more appeal for collections-as-data–themed events, as experts are more likely to be available for keynotes or mentoring. if the process instead of the product is the focus of the event, then providing a flexible structure may be more appealing to participants. if a library has more limited resources or capacity, stretching the event out over a longer period would allow for sustained interactions. however, libraries should be aware that the longer the event period, the greater the attrition of the participants. an area for future research includes assessment of library participation in events. a couple of articles highlighted the value the libraries found in the events, but it is unclear whether the participants also gained value from the libraries.61 typically, post-event surveys have focused on the participant experience or the overall event space, rather than whether the event affected participants' view of the libraries, which would be another area of interest for future research.62

conclusion

in the realm of hackathons and libraries, hackathon themes were originally a way that vendors could highlight new content or improve interfaces. libraries followed this trend and used events to reach out to constituents, make connections with their communities, and highlight evolving library services. with the growth of flexible spaces, ample technology support, and more relaxed food policies, libraries have become ideal event locations.
as the collections-as-data movement evolves, there will be more opportunities to develop services related to these data and other library data which would lend themselves easily as themes for hackathons, edit-a-thons, or datathons. libraries thinking about hosting events will need to weigh the amount of time and resources they want to invest with the intended goals of hosting an event. planning is essential whether the library is the event host, a collaborator, or a sponsor of a hackathon. for those libraries that are unable to host a full hackathon, smaller events, such as a datathon or edit-a-thon, are possibilities to provide support without the same time and resource commitment. given the growing popularity of hackathons and other coding contests, they may be a catch-all for solving several library issues simultaneously: updating the library’s image as being more than book-centric, supporting the collections-as-data movement, and a new way of engaging community partners. acknowledgements thank you to jody condit fagan for providing valuable suggestions on a draft of this paper and to the two anonymous reviewers whose feedback improved the quality of this manuscript. endnotes 1 james k. elmborg, “libraries as the spaces between us: recognizing and valuing the third space,” reference and user services quarterly 50, no. 4 (2011): 338–50. 2 “a brief open source timeline: roots of the movement,” online searcher 39, no. 5 (2015): 44–45; patrick timony, “accessibility and the maker movement: a case study of the adaptive technology program at district of columbia public library,” in accessibility for persons with disabilities and the inclusive future of libraries, advances in librarianship, vol. 40, (emerald group publishing limited, 2015), 51–58; kurt schiller, “elsevier challenges library community,” information today 28, no. 7 (july 2011): 10; eric lease morgan, “worldcat information technology and libraries december 2021 hackathons and libraries |longmeier 14 hackathon,” infomotions mini-musings (blog), last modified november 9, 2008, http://infomotions.com/blog/2008/11/worldcat-hackathon/; margaret heller, “creating quick solutions and having fun: the joy of hackathons,” acrl techconnect (blog), last modified july 23, 2012, http://acrl.ala.org/techconnect/post/creating-quick-solutions-andhaving-fun-the-joy-of-hackathons. 3 clemens neudecker, “working together to improve text digitization techniques: 2nd succeed hackathon at the university of alicante,” impact centre of confidence in digitisation blog, last updated april 22, 2014, https://www.digitisation.eu/succeed-2nd-hackathon/; porter anderson, “futurebook hack,” bookseller no. 5628 (june 20, 2014): 20–21; sarah shaffi, “inaugural hack crowns its diamond project,” bookseller no. 5628 (june 20, 2014): 18–19. 4 rose sliger krause, james rosenzweig, and paul victor jr. “out of the vault: developing a wikipedia edit-a-thon to enhance public programming for university archives and special collections,” journal of western archives 8, no. 1 (2017): 3; stanislav bogdanov and rachel isaac-menard, “hack the library: organizing aldelphi [sic] university libraries’ first hackathon,” college and research libraries news 77, no. 4 (2016): 180–83; matt enis, “civic data partnerships,” library journal 145, no. 1 (2020): 26–28; alex carruthers, “open data day hackathon 2014 at edmonton public library,” partnership: the canadian journal of library & information practice & research 9 no. 
2 (2014): 1–13, https://doi.org/10.21083/partnership.v9i2.3121; sarah shujah, “organizing and embedding a library hackfest into a 1st year course,” information outlook 18, no. 5 (2014): 32–48; lindsay anderberg, matthew frenkel, and mikolaj wilk, “project shhh! a library design contest for engineering students,” in american society for engineering education 2018 annual conference proceedings (2018): paper id 21058, https://cms.jee.org/30900. 5 michelle demeter et al., “send in the crowds: planning and benefiting from large-scale academic library events,” marketing libraries journal 2 no. 1 (2018): 86–95, https://bearworks.missouristate.edu/cgi/viewcontent.cgi?article=1089&context=articles-lib. 6 jamie lausch vander broek and emily puckett rodgers, “better together: responsive community programming at the um library,” journal of library administration 55, no. 2 (2015): 131–41; arnab nandi and meris mandernach, “hackathons as an informal learning platform,” in sigcse 2016 – proceedings of the 47th acm technical symposium on computing science education (february 2016): 346–51, https://doi.org/10.1145/2839509.2844590; lindsay anderberg, matthew frenkel, and mikolaj wilk, “hack your library: engage students in information literacy through a technology-themed competition,” in american society for engineering education 2019 annual conference proceedings, (2019): paper id 26221, https://peer.asee.org/32883; anna grant, hackathons: a practical guide, insights from the future libraries project forge hackathon (carnegieuk trust, 2017), https://www.carnegieuktrust.org.uk/publications/hackathons-practical-guide/; carruthers, “open data day hackathon 2014 at edmonton public library”; chad nelson and nabil kashyap, glam hack-in-a-box: a short guide for helping you organize a glam hackathon (digital public library of america, summer 2014), http://dpla.wpengine.com/wpcontent/uploads/2018/01/dpla_hackathonguide_forcommunityreps_9-4-14-1.pdf. information technology and libraries december 2021 hackathons and libraries |longmeier 15 7 david ward, james hahn, and lori mestre, “adventure code camp: library mobile design in the backcountry,” information technology and libraries 33, no. 3 (2014): 45–52; david ward, james hahn, and lori mestre, “designing mobile technology to enhance library space use: findings from an undergraduate student competition,” journal of learning spaces 4, no. 1 (2015): 30–40. 8 ann marie l. davis, “current trends and goals in the development of makerspaces at new england college and research libraries,” information technology and libraries 37, no. 2 (2018): 94–117, https://doi.org/10.6017/ital.v37i2.9825; mark bieraugel and stern neill, “ascending bloom’s pyramid: fostering student creativity and innovation in academic library spaces,” college & research libraries 78, no. 1 (2017): 35–52; elyssa kroski, the makerspace librarian’s sourcebook (chicago: ala editions, 2017); angela pashia, “empty bowls in the library: makerspaces meet service,” college & research libraries news 76 no. 2 (2015): 79–82; h. michele moorefield-lang, “makers in the library: case studies of 3d printers and maker spaces in library settings,” library hi tech 32, no. 4 (2014): 583–93; adetoun a. oyelude, “virtual reality (vr) and augmented reality (ar) in libraries and museums,” library hi tech news 35, no. 5 (2018) 1–4. 
9 krause, rosenzweig, and victor jr., “out of the vault”; ed yong, “edit-a-thon gets women scientists into wikipedia,” nature news (october 22, 2012), https://doi.org/10.1038/nature.2012.11636; angela l. pratesi et al., “rod library art+feminism wikipedia edit-a-thon,” community engagement celebration day (2018): 10, https://scholarworks.uni.edu/communityday/2018/all/10; maitrayee ghosh, “hack the library! a first timer’s look at the 29th computers in libraries conference in washington, dc,” library hi tech news 31, no. 5 (2014): 1–4, https://doi.org/10.1108/lhtn-05-20140031. 10 carruthers, “open data day hackathon 2014 at edmonton public library”; bob warburton, “civic center,” library journal 141, no. 15 (2016): 38. 11 matt burton et al., shifting to data savvy: the future of data science in libraries (project report, university of pittsburgh, pittsburgh, pa, 2018): 1–24, https://d-scholarship.pitt.edu/33891/. 12 vander broek and rodgers, “better together”; nandi and mandernach, “hackathons as an informal learning platform”; robin camille davis, “hackathons for libraries and librarians,” behavioral & social sciences librarian 35, no. 2 (2016): 87–91; bogdanov and isaac-menard, “hack the library”; ward, hahn, and mestre, “adventure code camp”; ward, hahn, and mestre, “designing mobile technology to enhance library space use”; demeter et al., “send in the crowds”; carruthers, “open data day hackathon 2014 at edmonton public library.” 13 vander broek and rodgers, “better together.” 14 demeter et al., “send in the crowds.” 15 nandi and mandernach, “hackathons as an informal learning platform.” 16 eduard mititelu and vlad-alexandru grosu, “hackathon event at the university politehnica of bucharest,” international journal of information security & cybercrime 6, no. 1 (2017): 97–98; information technology and libraries december 2021 hackathons and libraries |longmeier 16 orna almogi et al., “a hackathon for classical tibetan,” journal of data mining and digital humanities, episciences.org, special issue on computer-aided processing of intertextuality in ancient languages, hal-01371751v3 (2019): 1–10, https://jdmdh.episciences.org/5058/pdf. 17 davis, “hackathons for libraries and librarians.” 18 ghosh, “hack the library!” 19 timony, “accessibility and the maker movement”; ward, hahn, and mestre, “adventure code camp.” 20 gérald estadieu and carlos sena caires, “hacking: toward a creative methodology for cultural institutions,” (presented at the viii lisbon summer school for the study of culture “cuber+cipher+culture”, september 2017); andrea valdez, “the vatican hosts a hackathon,” wired magazine, last updated march 7, 2018, https://www.wired.com/story/vaticanhackathon-2018/; leonardo moura de araujo, “hacking cultural heritage: the hackathon as a method for heritage interpretation,” (phd diss., university of bremen, 2018): 181–231, 235– 38. 21 thomas finley, “innovation lab: a conference highlight,” texas library journal 94, no. 2 (summer 2018): 61–62. 22 grant, hackathons: a practical guide. 23 warburton, “civic center.” 24 warburton, “civic center.” 25 aude charillon and luke burton, “engaging citizens with data the belongs to them,” cilip update magazine (november 2016). 26 enis, “civic data partnerships.” 27 carruthers, “open data day hackathon 2014 at edmonton public library.” 28 kevin mcarthur, herb lainchbury, and donna horn, “open data hackathon how to guide v. 
1.0,” october 2012, https://docs.google.com/document/d/1fbuisdtiibaz9u2tr7sgv6gddlov_ahbafjqhxsknb0/e dit?pli=1; david eaves, “open data day 2013 in vancouver,” eaves.ca (blog), march 11, 2013, https://eaves.ca/2013/03/11/open-data-day-2013-in-vancouver/. 29 bogdanov and isaac-menard, “hack the library.” 30 shujah, “organizing and embedding a library hackfest into a 1st year course.” 31 nancy shin, kathryn vela, and kelly evans, “the research role of the librarian at a community health hackathon—a technical report,” journal of medical systems 44 (2020): 36. 32 geri diorio, “programming by the book,” voices of youth advocates 35, no. 4, (2012): 326–327. information technology and libraries december 2021 hackathons and libraries |longmeier 17 33 lauren barack and matt enis, “where teens teach,” school library journal (april 2016): 30. 34 nelson and kashyap, glam hack-in-a-box. 35 lindsay anderberg, matthew frenkel, and mikolaj wilk, “hack your library: a library competition toolkit,” june 6, 2019, https://wp.nyu.edu/hackyourlibrary/; anderberg, frenkel, and wilk, “hack your library: engage students in information literacy through a technologythemed competition.” 36 anna grant, engage. respond. innovate. the value of hackathons in public libraries (carnegieuk trust, 2020), https://www.carnegieuktrust.org.uk/publications/engage-respond-innovatethe-value-of-hackathons-in-public-libraries/. 37 bogdanov and isaac-menard. “hack the library.” 38 ward, hahn, and mestre, “adventure code camp.” 39 ward, hahn, and mestre, “designing mobile technology to enhance library space use.” 40 anderberg, frenkel, and wilk, “project shhh!” 41 anderberg, frenkel, and wilk, “hack your library: engage students in information literacy through a technology-themed competition.” 42 bethany mcgowan, “the role of the university library in creating inclusive healthcare hackathons: a case study with design-thinking processes,” international federation of library associations and institutions 45, no. 3 (2019): 246–53, https://doi.org/10.1177/0340035219854214. 43 marco büchler et al., “digital humanities hackathon on text re-use ‘don’t leave your data problems at home!’” electronic text reuse acquisition project, event held july 27–31, 2015, http://www.etrap.eu/tutorials/2015-goettingen/; helsinki centre for digital humanities, “helsinki digital humanities hackathon 2017 #dhh17,” event held may 15–19, 2017, https://www.helsinki.fi/en/helsinki-centre-for-digital-humanities/dhh-hackathon/helsinkidigital-humanities-hackathon-2017-dhh17. 44 antje theise, “open cultural data hackathon coding da vinci–bring the digital commons to life,” in ifla wlic 2017 wroclaw poland, session 231—rare books and special collections (2017), http://library.ifla.org/id/eprint/1785. 45 almogi et al., “a hackathon for classical tibetan.” 46 samantha fritz et al., “fostering community engagement through datathon events: the archives unleased experience,” digital humanities quarterly 15, no. 1 (2021): 1–13, http://digitalhumanities.org/dhq/vol/15/1/000536/000536.html. 47 tom baione, “hackathon & 21st-century challenges.” library journal 142, no. 2 (2017): 14–17. information technology and libraries december 2021 hackathons and libraries |longmeier 18 48 american museum of natural history, “hack the stacks,” https://www.amnh.org/learnteach/adults/hackathon/hack-the-stacks, https://github.com/amnh/hackthestacks/wiki, https://github.com/hackthestacks. 
49 andrea valdez, “inside the vatican’s first-ever hackathon: this is the holy see of the 21st century,” wired magazine, march 12, 2018, https://www.wired.com/story/inside-vhacksfirst-ever-vatican-hackathon/. 50 museomix, “concept,” accessed march, 29, 2021, https://www.museomix.org/en/concept/. 51 kristi bergland, kalan knudson davis, and stacie traill, “catdoc hackdoc: tools and processes for managing documentation lifecycle, workflows, and accessibility,” cataloging and classification quarterly 57, no. 7–8 (2019): 463–95. 52 islandora collaboration group, “templates: how to run a hack/doc,” last modified december 5, 2017, https://github.com/islandora-collaborationgroup/icg_information/tree/master/templates_how_to_run_a_hack_doc. 53 gordon dunsire, “toward an internationalization of rda management and development,” italian journal of library and information science 7, no. 2 (may 2016): 308–31. http://dx.doi.org/10.4403/jlis.it-11708 54 nelson and kashyap, glam hack-in-a-box. 55 hannah-louise clark, “global history hackathons information,” accessed april 19, 2021, https://www.gla.ac.uk/schools/socialpolitical/research/economicsocialhistory/projects/glob al%20historyhackathons/history%20hackathons/. 56 gustavo candela et al., “reusing digital collections from glam institutions,” journal of information science (august 2020): 1–10, https://doi.org/10.1177/0165551520950246. 57 thomas padilla, “on a collections as data imperative,” uc santa barbara, 2017, https://escholarship.org/uc/item/9881c8sv; rachel wittmann et al., “from digital library to open datasets,” information technology and libraries 38, no. 4 (2019): 49–61, https://doi.org/10.6017/ital.v38i4.11101; sandra tuppen, stephen rose, and loukia drosopoulou, “library catalogue records as a research resource: introducing ‘a big data history of music,’” fontes artis musicae 63, no. 2 (2016): 67–88. 58 moura de araujo, “hacking cultural heritage.” 59 grant, hackathons: a practical guide; grant, engage. respond. innovate.; joshua tauberer, “hackathon guide,” accessed march 26, 2021, https://hackathon.guide/; alexander nolte et al., “how to organize a hackathon—a planning kit,” arxiv preprint arxiv:2008.08025 (2020), https://arxiv.org/abs/2008.08025v2; ivonne jansen-dings, dick van dijk, and robin van westen, hacking culture: a how-to guide for hackathons in the cultural sector, waag society, (2017): 1–41. https://waag.org/sites/waag/files/media/publicaties/es-hacking-culturesingle-pages-print.pdf. 60 ward, hahn, and mestre, “designing mobile technology to enhance library space use.” information technology and libraries december 2021 hackathons and libraries |longmeier 19 61 mcgowan, “the role of the university library in creating inclusive healthcare hackathons.” 62 nandi and mandernach, “hackathons as an informal learning platform”; carruthers, “open data day hackathon 2014 at edmonton public library.” meeting users where they are: delivering dynamic content and services through a campus portal communications meeting users where they are delivering dynamic content and services through a campus portal graham sherriff, dan desanto, daisy benson, and gary s. atwood information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11519 graham sherriff (graham.sherriff@uvm.edu) is instructional design librarian, university of vermont. dan desanto (ddesanto@uvm.edu) is instruction librarian, university of vermont. daisy benson (daisy.benson@uvm.edu) is library instruction coordinator, university of vermont. gary s. 
atwood (gatwood@uvm.edu) is education librarian, university of vermont. abstract campus portals are one of the most visible and frequently used online spaces for students, offering one-stop access to key services for learning and academic self-management. this case study reports how instruction librarians at the university of vermont collaborated with portal developers in the registrar’s office to develop high-impact, point-of-need content for a dedicated “library” page. this content was then created in libguides and published using the application programming interfaces (apis) for libguides boxes. initial usage data and analytics show that traffic to the libraries’ portal page has been substantially and consistently higher than expected. the next phase for the project will be the creation of customized library content that is responsive to the student’s user profile. introduction for many academic institutions, campus portals (also referred to as enterprise portals) are one of students’ most frequently used means of interacting with their institutions. campus portals are websites that provide students and other campus constituents with a “one-stop shop” experience, with easy access to a selection of key services for learning and academic self -management. typically, portals provide features that make it possible for students to obtain course information, manage course enrollment, view grades, manage financial accounts, and access information about campus activities. for faculty and staff, campus portals provide access to administrative resources related to teaching, human relations, and more. these campus portals are different from library portals, which some libraries implemented in the 2000s as a way to centralize access to key library services.1 currently, the public-facing websites of many colleges and universities serve a crucial role in marketing the institution to prospective students. this creates an incentive to be as comprehensive as possible and to showcase the full breadth of programs, services, offices, and facilities. a common disadvantage to this approach to institutional web design is information overload: an overwhelming array of labels and links that diminish the ability of current affiliates to find and access the services they need. these sites are designed for external users for whom the research and educational functions of the library are a low priority. campus portals, however, are designed for internal users and can take a more selective approach. they give student and faculty users a view of campus services that aligns with their priorities and places them in a convenient interface. in this sense, they are tools for information management. campus portals play a critical role in students’ daily lives because they do much more than simply present information. 
carden observes that campus portals have these key characteristics:

• allow a single user authentication and authorization step at the initial point of contact to be applied to all (or most) other entities within the portal;
• allow multiple types and sources of information to be displayed on a single composite screen (multiple "channels");
• provide automated personalization of the selection of channels offered, based on each user's characteristics, on the groups to which each user belongs, and possibly on the way in which the system has historically been used;
• allow user personalization of the selection of channels displayed and the look-and-feel of the interface, based on personal preferences;
• provide a consistent style of access to diverse information sources, including "revealing" legacy applications through a new consistent interface; and
• facilitate transaction processing as well as simple data access.2

in sum, enterprise portals use a combination of advanced technologies that have the ability to present both static and user-responsive information in a space reserved for affiliates of the university. these abilities present an attractive venue for libraries to leverage the capabilities of a campus portal to present users with dynamic, personalized instructional experiences—in a space where users are. this aligns with the principles of user-centered design, which emphasizes the need to empathize with users' needs and perspectives. simplicity, efficiency, convenience, and responsiveness to each user's individual circumstances are critical.3

the idea of presenting libraries' content through a campus portal is not a new one. stoffel and cunningham surveyed libraries in 2004 and, while finding that "library participation in campus portals is . . . relatively rare," of the sixteen self-selected responding campuses, ten had a library tab or a dedicated library channel within their campus portal, while two more had a channel or tab under development.4 the types of library integration described in most examples consisted of using the portal's campus authentication to link to a user's library account and view borrowed books, fines, holds, and announcements. while resources like federated searches, research guides, and lists of journals and databases appeared in some respondents' portals, they largely appeared as static content rather than responding to the user's profile. since 2004, portals have remained a core part of the university of vermont's information delivery system, but portal integration remains relatively rare among libraries, and most have done little to integrate new tools such as research guides or develop instructional content that leverages a portal's user-responsive design. as a result, there is little in the literature on libraries' integration of content into campus portals, but a small number of case studies provide proof of concept, such as lehigh university, california state university-sacramento, and arizona state university.5 these case studies also illustrate the importance of cross-campus collaboration. our project required some critical elements, specifically access to the campus portal and a method for publishing content.
the projects described in the case studies were successful partly because they were able to apply advanced programming expertise that was not available to our group, such as api coding. instead, our group was able to obtain these critical inputs through a partnership with the university of vermont registrar’s office. at the university of vermont, the campus portal uses the banner product licensed from ellucian and is branded “myuvm.” it is administered by the registrar’s office. librarians have observed that it is central to students’ academic lives. students go to myuvm as their pathway to many of the online services and tools that they use. they go there to check email, log in to the learning management system (lms), check grades, add, drop, or withdraw from courses, check their schedule, and more. they go there to carry out tasks. figure 1. screenshot of myuvm (https://myuvm.uvm.edu as it was on march 1, 2019). the importance of myuvm is communicated to university of vermont students at orientation. in this way, first-year students learn at the earliest point, even before their academic programs begin, that the portal is their primary gateway for access to campus academic services. this shapes their view of the services available to them and how those services are organized. it also shapes how they reach those services and how they interact with them. at the same time, the selective principle underlying the campus portal means that if something is not present, it is less visible and less accessible, and there is a risk of signaling to students that it is not important to their daily lives or their academic performance. methods the characteristics of campus portals and their contents motivated instruction librarians to explore the possibility of integrating library services into myuvm. in 2014, the university of vermont libraries’ educational services working group—a small cross-libraries group of librarians who work on a variety of projects supporting classroom instruction and research assistance—began by defining the desirable scope of possible portal content. the educational services working group quickly determined that library content included in the portal should be designed to conform with the principle of priority-based selectivity employed across the portal as a whole. this content should not attempt to represent the full suite of library information and services available. this would replicate the websites of the three libraries on campus and would risk creating overload and disorientation, in a similar way to institutional websites. it is common for actionable and instructional material to become buried beneath links on a library homepage, and the homepages of our three libraries’ websites are no different. our hope was to reposition selected instructional content such as research guides, databases-by-subject, chat reference, and liaison librarian contacts in a venue with which students are used to interacting. the goal of the project was the strategic positioning of dynamic, responsive information about research services in a venue with which students frequently interact. research librarians would select and organize the most important and pertinent instructional content.
such selectivity fit well within the portal’s principle for curating content: high-use tools and services that directly support students’ priorities. thus the objective for this project would not be the re-creation of the library websites within myuvm. it was also determined that the scope would exclude content that might be considered marketing or engagement for its own sake, for the same purpose of minimizing users’ cognitive load and helping them to quickly find the features they need. the myuvm developers in the registrar’s office were enthusiastic about working with us on this project, which partly reflects an increased attention across campus to equitable access to student services for all users—something that is important for its own sake, but also for the purposes of accreditation. following preliminary discussions in early 2018, myuvm developers created a test “libraries” page, equivalent to a full screen of content, and assigned to our group the privileges necessary to view it in the myuvm test environment. each page in myuvm is composed of a series of content boxes or channels. in developing our new page, our task was to develop content for the desired channels. we began our process for composing the page with a card-sorting exercise that identified priorities for the content that should be highlighted. the participants were the group’s members, in order to expedite initial decisions about content that could be tested with users at a later point in the project. items that figured prominently in this process were the libraries’ “ask a librarian” service, research guides, and search tools (discovery layer, databases, and journal directory). this confirmed that our group’s priorities centered on users’ transactional interactions with library services and not merely the one-way promotion of library information. the results of the card sorting were then translated into a wireframe (see figure 2). each square in the wireframe represented a channel for which we would need to create the appropriate content:
• ask a librarian (contact details for the libraries’ research assistance services)
• research guides (subject and class guides)
• search our collections (search tools for the discovery layer, databases, and journal directory)
• research roadmap (the libraries’ suite of tutorials on foundational research skills)
• featured content (a channel for rotating or temporary time-specific content)
• libraries (a box with a link to each of the three libraries on campus; we later added a channel for each library)
the wireframe also envisaged the inclusion of a pop-out chat widget. figure 2. wireframe for library content. as noted, the project needed a process that would enable our group to create and publish this content autonomously, but without requiring advanced programming skills on our part. we learned that myuvm is capable of publishing content pushed from a webpage by using its url. this meant that we could create content in libguides, a platform with which our group was very familiar, and then push the content of an individual libguide box to a myuvm channel simply by providing the libguide box urls to the portal developers. this method offers several advantages. importantly, it meant that our group had direct control of the box content and was able to publish it without needing the myuvm developers to review and authorize every edit.
those involved in this project faced important decisions early in the process regarding which resources we deemed essential for inclusion and best suited to this new online context. once items were selected, it was important to keep user behaviors in mind as we prioritized “above the fold” content. students are used to quickly popping into the portal, finding what they need, and popping out. we tried to place interactive content that fit this use pattern in high-visibility places and moved content that required more sustained reading and attention further down the page. a challenge faced during the design process was our campus’s lack of a unified, cross-libraries web presence. the three libraries on our campus have separate websites, but the university of vermont portal required that we present a unified “libraries” presence. in some cases, such as links back to library webpages, we were easily able to treat the three libraries separately. in other cases, such as our research guides, we were able to merge resources from multiple libraries. in still other cases, such as our chat widgets, we had to make decisions about which library’s resource would be featured and which other versions would be secondary. the prototyping and testing phases revealed that some content needed to be adjusted in order to display in myuvm as desired. libguides’ tabbed boxes and gallery boxes did not display correctly. also, some style coding inherited from the libguides boxes needed to be adjusted in order to display cleanly. one item, “course reserves,” was present in the wireframe but not the page at the time of implementation. we continue to work on the development of a widget for searching “course reserves” holdings. the version of the “library” page at the time of going live is shown in figure 3. figure 3. screenshot of the “library” page in myuvm. the “research guides” channel has a dropdown menu for subject guides and another for class guides. these menus were created using libguides widgets, meaning that they update automatically as guides are published and unpublished, and do not require any manual maintenance. the “search our collections” channel includes three access points to the libraries’ collections. this contrasts with the libraries’ websites, which display only the discovery layer search box. the latter approach has the advantage of promoting one-stop searching, but also the disadvantage of overwhelming users with non-relevant results. channels on the left side of the page are less dynamic and interactive. at the top, links to the three libraries on campus provide highly visible quick access for students looking for the libraries’ websites. similarly, the “ask a librarian” channel quickly gets students to reference and consultation services at their home library. the “you should know” channel provides a space for rotating content to be changed based on time-of-year, events on campus, or other perceived student needs. results the “library” page in myuvm went live in january 2019, at the same time that spring semester classes began. our preliminary review of results from the semester, based on data collected from myuvm, libguides statistics, and google analytics, has identified several positive outcomes.
myuvm data showed that there were 18,891 visits to the “library” page during the period from mid-january to the end of march, a period of eleven weeks when classes were in session. this volume of traffic substantially exceeded our group’s expectations for the first months following implementation, during a period when we were only beginning to promote awareness of the page. data also showed that usage during this period was generally consistent. the most significant variation in traffic was a small peak in late february that corresponded with a high point in the level of library instruction. libguides statistics showed an overall increase in usage of subject guides, though it is not possible to attribute this to the myuvm project with complete certainty. in addition, however, we also observed that for many of our guides during this period, myuvm was among the top referring sites. libguides statistics also recorded unexpectedly large increases in usage for the “research roadmap” that we attribute primarily to the myuvm project. four sections of the “research roadmap” experienced increases of more than 100 percent during the january-march period. the research roadmap’s “more help” page showed a 65 percent drop in visits, but a possible explanation for this is that the highlighting of sections in myuvm is providing more-immediate help to our users in finding what they need and promoting independent use of instructional materials by students. libchat statistics indicated a significant increase in chat reference transactions at howe library, the university of vermont’s central library: a 23 percent increase over the count for the fall 2018 semester, with the implementation of the myuvm project being the only reasonable explanation. all initial data appear to show that users are finding and continuing to use the “library” tab in the portal. they are discovering guides and using the embedded chat widget. we plan to gather more usage data for other channels on the page to better inform our picture of what users are doing once they find and view the “library” tab. as campus portals have become a ubiquitous part of university life, revisiting the library’s role in these portals seems worthwhile, especially given that commonplace design tools like libguides dramatically lower the technological acumen needed for creating content. future directions the next step for this project is to leverage the ability of a campus portal to create a myuvm homepage library channel that customizes the display of content, based on unique user characteristics. when the user logs in, they are routed to the portal’s landing page, which is dynamically created based upon their student or faculty status, enrollment in a college or school, level of study (graduate/undergraduate), or number of years attending the university of vermont. this page has the ability to conform to the user in even more granular ways and dynamically display content based upon their major or other demographic categories such as study abroad status, veteran status, or first-year status. by leveraging the portal’s ability to display user-specific content, the university of vermont libraries have the ability to customize instructional content tailored to a user’s information needs and place that content in a channel that will display alongside other channels on the myuvm homepage.
a first-year history major’s library channel could contain tutorials on working with primary sources, a link to their liaison librarian, links to digitized newspaper collections, and help guides for chicago citation style. a graduate student in nursing might see information about evidence-based practices for developing a clinical question, help guides for using pubmed and cinahl, and resources for point-of-care. a faculty member in psychology might find tutorials for creating alerts in their favorite journals, information about copyright and reserves material, or information about citation-management software. in each case, the portal pushes resources and assistance to each user that best fits their specific need, as informed by the librarians best equipped to address that need. this last step of placing dynamic content on the myuvm homepage will require a great deal of coordination with liaison librarians both to identify the most pertinent disciplinary information to place in the portal and to identify the times of year when certain information is most relevant. to keep portal content dynamic and pertinent to users, a system will need to be created for releasing and removing content on a regular basis, and this scheduling of content will require the input of liaison librarians. the educational services working group will need to manage this scheduling, as well as the enforcement of portal design conventions in coordination with the myuvm developers. although this management may end up being complex, it is not insurmountable, and our next steps will be both to create a system for content creation and management and to begin to create test content for a sample of user groups. we also plan to gather more data and expand our analytics capabilities to assess how users are using content on the myuvm “library” page and examine which features are most popular, how much traffic is being driven back to our websites, and how users are interacting with the features on the page. conclusion our project has confirmed our initial inclination that students go to myuvm as a finding tool for inter-campus resources. also, faculty have reported accessing library resources through the portal and directing their students to that pathway as well. the immediate high use and consistency of use indicate that we have placed our selected libraries’ resources in a high-traffic venue. instead of attempting to coax students to our web outpost in the wilds of the internet, we have placed an exit ramp from a highway they already travel. this has proven overwhelmingly effective and confirms, on our campus at least, the literature from the mid-2000s pointing out the opportunity created for libraries by campuses’ institutional adoption of portal systems. in all, the project has been a worthwhile venture for the university of vermont libraries. we have observed immediate use and better-than-expected levels of traffic, as well as continued use throughout the semester. it appears that once students wear a path to resources in myuvm, they are continuing to use that path as a way to access library content. we look forward to further customizing that content in the near future.
acknowledgements we gratefully acknowledge david alles, portal developer, and naima dennis, senior assistant registrar for technology, in the university of vermont office of the registrar, for their contributions to the design and development of this project. endnotes 1 scott garrison, anne prestamo, and juan carlos rodriguez, “putting library discovery where users are,” in planning and implementing resource discovery tools in academic libraries, ed. mary pagliero popp and diane dallis (hershey, pa: information science reference, 2012), 391, https://doi.org/10.4018/978-1-4666-1821-3.ch022; bruce stoffel and jim cunningham, “library participation in campus web portals: an initial survey,” reference services review 33, no. 2 (june 1, 2005): 145-46, https://doi.org/10.1108/00907320510597354. 2 mark carden, “library portals and enterprise portals: why libraries need to be at the centre of enterprise portal projects,” information services & use 24, no. 4 (2004): 172–73, https://doi.org/10.3233/isu-2004-24402. 3 ilka datig, “walking in your users’ shoes: an introduction to user experience research as a tool for developing user-centered libraries,” college & undergraduate libraries 22, nos. 3–4 (2015): 235–37, https://doi.org/10.1080/10691316.2015.1060143; steven j. bell, “staying true to the core: designing the future academic library experience,” portal: libraries and the academy 14, no. 3 (2014): 369–82, https://doi.org/10.1353/pla.2014.0021. 4 stoffel and cunningham, “library participation in campus web portals,” 145-46. 5 tim mcgeary, “mylibrary: the library’s response to the campus portal,” online information review 29, no. 4 (2005): 365–73, https://doi.org/10.1108/14684520510617811; garrison, prestamo, and rodriguez, “putting library discovery where users are,” 393-94. pal: toward a recommendation system for manuscripts scott ziegler and richard shrake information technology and libraries | september 2018 scott ziegler (sziegler1@lsu.edu) is head of digital programs and services, louisiana state university libraries. prior to this position, ziegler was the head of digital scholarship and technology, american philosophical society. richard shrake (shraker13@gmail.com) is a library technology consultant based in burlington, vermont. abstract book-recommendation systems are increasingly common, from amazon to public library interfaces. however, for archives and special collections, such automated assistance has been rare. this is partly due to the complexity of descriptions (finding aids describing whole collections) and partly due to the complexity of the collections themselves (what is this collection about and how is it related to another collection?). the american philosophical society library is using circulation data collected through the collection-management software package, aeon, to automate recommendations.
in our system, which we’re calling pal (people also liked), recommendations are offered in two ways: based on interests (“you’re interested in x, other people interested in x looked at these collections”) and on specific requests (“you’ve looked at y, other people who looked at y also looked at these collections”). this article will discuss the development of pal and plans for the system. we will also discuss ongoing concerns and issues, how patron privacy is protected, and the possibility of generalizing beyond any specific software solution. introduction the american philosophical society library (aps) is an independent research library in philadelphia. founded in 1743, the library houses a wide variety of material in early american history, history of science, and native american linguistics. the majority of the library’s holdings are manuscripts, with a large amount of audio material, maps, and graphics, nearly all of which are described in finding aids created using encoded archival description (ead) standards. like similar institutions, the aps has long struggled to find new ways to help library users discover material relevant to their research. in addition to traditional in-person, email, and phone reference, the aps has spent years creating search and browse interfaces, subject guides, and web exhibitions to promote the collections.1 as part of these ongoing efforts to connect users with collections, the aps is working on an automated recommendation system to reuse circulation data gathered through aeon. developed by atlas systems, aeon is a “request and workflow management software specifically designed for special collections libraries and archives,” and it enables the aps to gather statistics on both the use of our manuscript collections and on aspects of the library’s users.2 the automated recommendation system, which we’re calling pal, for “people also liked,” is an ongoing effort. this article presents a snapshot of current work. literature review the benefits of recommendations in library opacs have long been recognized. writing in 2008 about the library recommendation system bibtip, itself started in the early 2000s, mönnich and spiering observe that “library services are well suited for the adoption of recommendation systems, especially services that support the user in search of literature in the catalog.” by 2011 oclc research and the information school at the university of sheffield began exploring a recommendation system for oclc’s worldcat.3 recommendations for library opacs commonly fall into one of two categories, content-based or collaborative filtering. content-based recommendations pair specific users with library items based on the metadata of the item and what is known about the user. for example, if a user indicates in some way that they enjoy mystery novels, items identified as mystery novels might be recommended to them. collaborative filtering combines users in some way and creates recommendations for one user based on the preferences of another user. there can be a dark side to recommendations. the algorithms that determine which users are similar and thus which recommendations to make are not often understood.
writing about algorithms in library discovery systems broadly, reidsma points out that “in librarianship over the past few decades, the profession has had to grapple with the perception that computers are better at finding relevant information than people.”4 the algorithms that are doing the finding, however, often carry the same hidden biases that their programmers have. reidsma encourages a broader understanding of algorithms in general and deeper understanding of recommendation algorithms in particular. the history of recommendation systems in libraries has informed the ongoing development of pal. we use both the content-based and the collaborative filtering approach to offering recommendations to users. for the purposes of communicating them to nontechnical patrons, we refer to them as “interest-based” and “request-based,” respectively. furthermore, we are cautious about the role algorithms play in determining which recommendations users see. our help text reinforces the continued importance of working directly with in-house experts, and we promote pal as one tool among the many offered by the library. we are not aware of any literature on the development of recommendation tools for archives or special-collections libraries. the nature of the material held in these institutions presents special challenges. for example, unlike book collections, many manuscript and archival collections are described in aggregate: one description might refer to many letters. these issues are discussed in detail below. putting data to use: recommendations based on interests and requests the use of aeon allows the aps to gather and store data, including both data that users supply through the registration form and data concerning which collections are requested. pal uses both types of data to create recommendations. interest-based recommendations the first type of recommendation uses self-identified research interest data that researchers supply when creating an aeon account. when registering, a user has the option to select from a list of sixty-four topics grouped into seven broad categories (figure 1). the aps selected these interests based on suggestions from researchers as well as categories common in the field of academic history. upon signing in, a registered user sees a list of links (figure 2); each link leads to a full-page view of collection recommendations (figure 3). these recommendations follow the model, “you’re interested in x, other people interested in x looked at these collections.” request-based recommendations using the circulation data that aeon collects, we are able to automate recommendations in pal based on request information. upon clicking a request link in a finding aid, the user is presented with a list of recommendations on the sidebar in aeon (figure 4). each link opens the finding aid for the collection listed. figure 1. list of interests a user sees when registering for the first time. a user can also revisit this list to modify their choices at any point by following links through the aeon interface. the selected interests generate recommendations. figure 2. list of links appearing on the right-hand sidebar, based on interests that users select. figure 3.
recommended collections, based on interest, showing collection name (with a link to finding aid), call number, number of requests, and number of users who have requested from the collections. the user sees this list after clicking an option from the sidebar, as shown in figure 2. figure 4. request-based recommendation links appearing on the right-hand sidebar after a patron requests an item from a finding aid. the process currently, the data that drives these two functions is obtained from a semidynamic process via daily, automated sql query exports. usernames are employed to tie together requests and interests but are subsequently purged from the data before the results are presented to users and staff. this section explains the process in detail and presents code snippets where available. all code is available on github.5 interest-based recommendations for interest-based recommendations, we employ two queries. the first query pulls every collection requested by a user for each topic for which that user has expressed an interest. the second aggregates the data for every user in the system. the following queries get data from the microsoft sql database, via a microsoft access intermediary, that aeon uses to store data. because of the number of interest options in the registration form, and the character length of some of them (“early america colonial history,” for example), we encode the interests in shortened form. “early america colonial history” becomes “ea-colhist” so as not to run into character limits in the database. this section explores each of these queries in more detail and provides example code. the first query gathers research topics for all users who are not staff (user status is ‘researcher’), and where at least one research topic is chosen (‘researchtopics’ is not null). the data is exported into an xml file that we call “aeonmssreg.”

select aeondata.dbo.users.researchtopics,
    aeondata.dbo.transactions.callnumber,
    aeondata.dbo.transactions.location
from aeondata.dbo.transactions
inner join aeondata.dbo.users
    on (aeondata.dbo.users.username = aeondata.dbo.transactions.username)
    and (aeondata.dbo.transactions.username = aeondata.dbo.users.username)
where (((aeondata.dbo.users.researchtopics) is not null)
    and ((aeondata.dbo.transactions.callnumber) like 'mss%'
        or (aeondata.dbo.transactions.callnumber) like 'aps.%')
    and ((aeondata.dbo.users.status)='researcher'))
for xml raw ('aeonmssreq'), root ('dataroot'), elements;

the second query combines all data for all users and exports an xml file ‘aeonmssusers.’

select distinct aeondata.dbo.users.researchtopics,
    aeondata.dbo.transactions.callnumber,
    aeondata.dbo.transactions.location,
    aeondata.dbo.transactions.username
from aeondata.dbo.transactions
inner join aeondata.dbo.users
    on (aeondata.dbo.users.username = aeondata.dbo.transactions.username)
    and (aeondata.dbo.transactions.username = aeondata.dbo.users.username)
where (((aeondata.dbo.users.researchtopics) is not null)
    and ((aeondata.dbo.transactions.callnumber) like 'mss%'
        or (aeondata.dbo.transactions.callnumber) like 'aps.%')
    and ((aeondata.dbo.users.status)='researcher'))
for xml raw ('aeonmssusers'), root ('dataroot'), elements;

each query produces an xml file. these files are parsed using xsl stylesheets into subsets for each research interest.
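for readers who do not work in xslt, the per-interest grouping can be sketched in python. this is an illustration only, not the stylesheets the library actually runs; it assumes the row and element names implied by the for xml clause above (dataroot, aeonmssreq, researchtopics, callnumber) and a comma-separated researchtopics value.

# illustrative python sketch: group requested call numbers by research topic
# using the interest-based xml export described above.
import xml.etree.ElementTree as ET
from collections import defaultdict

tree = ET.parse("aeonmssreq.xml")            # assumed file name for the export
by_topic = defaultdict(list)

for row in tree.getroot().findall("aeonmssreq"):
    topics = (row.findtext("researchtopics") or "").split(",")
    callnumber = row.findtext("callnumber")
    for topic in (t.strip() for t in topics):
        if topic and callnumber:
            by_topic[topic].append(callnumber)

# by_topic now maps each interest to the call numbers requested by users
# who expressed that interest, ready to be counted per collection.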
the stylesheets also generate counts of users requesting a collection and number of total requests for a collection by users sharing an interest. an example is the stylesheet for the topic “early america colonial history,” which pulls from the xml file “aeonmssreg.” this process is repeated for each interest. the data from the query that we modify with xslt is presented as html that we insert into aeon templates. this html includes the collection name (linked to finding aid), call number, number of requests, and number of users in a table. see figure 3 for how this appears to the user. the xsl is wrapped in html that presents the results as a table.

the introductory text in that html reads, “the collections most frequently requested from researchers who expressed an interest in [the selected topic] are listed below with links to each collection’s finding aid and the number of times each collection has been requested,” and the table’s column headings are collection, call number, # of requests, and # of users.
to ensure a user only sees the links that match the interests they have selected, we use javascript to determine the expressed interests of the current user and display the corresponding links to the html pages in a sidebar. this approach works well, but we must account for two quirks. the first is that many interests in the database do not conform to the current list of options because many users predate our current registration form and wrote in free-form interests. secondly, aeon stores the research information as an array rather than in a separate table, so we must account for the fact that the aeon database contains an array of values that includes both controlled and uncontrolled vocabulary. first, we set the array as a variable so we can look for a value that matches our controlled vocabulary and separate the array into individual values for manipulation:

// use var message to check for presence of controlled list of topics
var message = "<#user field='researchtopics'>";
// use var values to separate topics that are collected in one string
var values = "<#user field='researchtopics'>".split(",");

we also create variables to generate the html entries and outbound links once we have extracted our research topics:

// open, middle, and close hold the html fragments used to assemble each link
var open = ""

next we set a conditional to determine if one of our controlled vocabulary terms appears in the array:

// determine if user has an interest topic from the controlled list
if ((message.indexOf("ea-colhis") > -1) || (message.indexOf("ea-amrev") > -1) || (message.indexOf("ea-earlynat") > -1) || (message.indexOf("ea-antebellum") > -1) || …

if the array contains a value from our controlled vocabulary, we generate a link and translate our internal code back into a human-friendly research topic (“ea-colhist,” for example, becomes once again “early american colonial history”):

for (var i = 0; i < values.length; ++i) {
    if (values[i]=="ea-colhis"){
        document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america-colonial history" + close);}
    else if (values[i]=="ea-amrev"){
        document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america american revolution" + close);}
    else if (values[i]=="ea-earlynat"){
        document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america early national" + close);}
    else if (values[i]=="ea-antebellum"){
        document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america antebellum" + close);}
    …

see figure 2 for how this appears to the user. users only see the links that correspond to their stated interest. if the array does not contain a value from our controlled vocabulary, we display the research-topic interests associated with the user account, note that we don’t currently have a recommendation, and provide a link to update the research topics for the account.

else {document.getElementById("notopic").innerHTML = "you expressed interest in: <#user field='researchtopics'> we are unable to provide a specific collection recommendation for you. please visit our user profile page to select from our list of research topics."}

request-based recommendations in addition to interest-based recommendations, pal supplies recommendations based on past requests a user has made. this section details how these recommendations are generated. aeon allows users to request materials directly from a finding aid (see figure 6). to generate our request-based recommendations we employ a query depicting the call number and user of every request in the system and export the results to an xml file called “aeonlikecollections.”

select subquery.callnumber,
    subquery.username,
    iif(right(subquery.trimlocation,1)='.',left(subquery.trimlocation,len(subquery.trimlocation)-1),subquery.trimlocation) as finallocation
from (
    select distinct aeondata.dbo.transactions.callnumber,
        aeondata.dbo.transactions.username,
        iif(charindex(':',[location])>0,left([location],charindex(':',[location])-1),[location]) as trimlocation
    from aeondata.dbo.transactions
    inner join aeondata.dbo.users
        on (aeondata.dbo.users.username = aeondata.dbo.transactions.username)
        and (aeondata.dbo.transactions.username = aeondata.dbo.users.username)
    where (((aeondata.dbo.transactions.callnumber) like 'mss%'
            or (aeondata.dbo.transactions.callnumber) like 'aps.%')
        and ((aeondata.dbo.transactions.location) is not null)
        and ((aeondata.dbo.users.status)='researcher'))) subquery
order by subquery.callnumber
for xml raw ('aeonlikecollections'), root ('dataroot'), elements;

we then process the “aeonlikecollections” file through a series of xslt stylesheets, creating lists of every other collection that every user of the current collection has requested. first the stylesheets remove collections that have only been requested once. then we count the number of times each collection has been requested: we sort on the collection name and username and then re-sort to combine groups of requested collections with users who have requested each collection. we then create a new xml file that is organized by our collection groupings. the following values are from a populated xml file generated by the xslt stylesheets.

mss.497.3.b63c mss.497.3.b63c american council of learned societies … 94
mss.ms.coll.200 mss.ms.coll.200 miscellaneous manuscripts collection … 92

we use javascript to determine the call number of the user’s current request and display the list of other collections that users who have requested the current collection have also requested. see figure 4 for how these links appear to the user. all of the exports and processing are handled automatically through a daily scheduled task. usernames are the only personally identifiable data contained in these processes; they are used for counting purposes, but they are removed from the final products through the xslt processing on an internal administrative server, are never stored in the aeon web directory, and are never available for other library users or staff to see. potential pitfalls and what to do about them pal allows us to see new things about our users, and we hope that our users are able to see new collections in the library. however, there are potential pitfalls to the way we’ve been working on this project.
we’re calling the two biggest pitfalls the “bias toward well-described collections” and the “problem of aboutness.” the bias toward well-described collections the bias toward well-described collections is best understood by examining how the aps integrates aeon into our finding aids. we offer request links at every available level of description: collection, series, folder, and item. if a patron spends all day in our reading room and looks at the entirety of an item-level collection, they could have made between twenty and one hundred individual requests from that collection. for our statistics, each request will be counted as that collection being used. figure 6 shows a collection described at the item level; each item can be individually requested, giving the impression that this collection is very heavily used even if it is only one patron doing all the requesting. figure 6. finding aid of collection described at the item level. a patron making their way through this collection could make as many as one hundred individual requests. for collections described at the collection level, however, the patron has only one link to click to see the entire collection. to pal, it looks like that collection was only used once, as shown in figure 7. a patron sitting all day in our reading room looking at a collection with little description might use the collection more heavily than a patron clicking select items in a well-described collection. however, when we review the numbers, all we see is that the well-described collections get more clicks. figure 7. screenshot of finding aid with only collection-level description. this collection has only one request link, the “special request” link at the top right. a patron looking through the entirety of this collection will only log a single request from the point of view of our statistics. the problem of aboutness when we speak of the problem of aboutness, we draw attention to the fact that manuscript collections can be about many different things. one researcher might come to a collection for one reason, another researcher for another reason. a good example at the aps library is the william parker foulke papers.6 this collection contains approximately three thousand items and represents a wide variety of the interests of the eponymous mr. foulke. he discovered the first full dinosaur skeleton, promoted prison reform, worked toward abolition, and championed arctic exploration. a patron looking at this collection could be interested in any of these topics, or others. pal, however, isn’t able to account for these nuances. if a researcher interested in prison reform requests items from the foulke papers, they’ll see the same suggestion as a researcher who came to the collection for arctic exploration. what to do about this identifying these pitfalls is a good first step to avoiding them, but it’s only a first step. there are technical solutions, and we’ll continue to explore them. for example, the bias toward well-described collections is mitigated by showing both the number of requests and the number of users who have requested from a collection (see figure 3). we hope that by presenting both numbers, we move a little toward overcoming this bias. however, we’re also interested in the nontechnical approaches to these issues.
as mentioned in the introduction, the aps relies heavily on traditional reference service, both remote and in-house. nontechnical solutions acknowledge the shortcomings of any constructed solution and inject a healthy amount of humility into our work. additionally, the subject guides, search tools, and web exhibitions all form an ecosystem of discovery and access to supplement pal. future steps using data outside of aeon we have begun exploring options for using the recommendation data outside of aeon. one early prototype surfaces a link in our primary search interface. for example, searching for the william parker foulke papers shows a link to what people who requested from this collection also looked at. see figures 8 and 9. generalizing for other repositories there are ways to integrate the use of aeon with ead finding aids. the systems that the aps has developed to collect data for automated recommendations take advantage of our infrastructure. we’d like for other repositories to be able to use pal. it is our hope that an institution using aeon in a different way will help us generalize this system. generalizing beyond aeon pal is currently configured to pull data out of the microsoft sql database used by aeon. however, all the manipulation is done outside of aeon and is therefore generalizable to data collected in other ways. because archives and special collections have long held statistics in different types of systems, we hope to be able to generalize beyond the aeon use case if there is any interest in this from other repositories. integrating pal into aeon conversations with atlas staff about pal have been positive, and there is interest in building many of the features into future releases of aeon. as of this writing, an open uservoice forum topic is taking votes and comments about this integration.7 figure 8. a link in the search results that leads to recommendations based on finding aid search. clicking on the link “pal recommendations: patrons who used henry howard houston, ii papers also used these collections” will open an html page with a list of links to finding aids. figure 9. html link of recommended finding aids based on search. conclusion the aps is trying to add to the already robust options for users to find relevant manuscript collections. in addition to traditional reference, web exhibitions, and online search and browse tools, we have started reusing circulation data and self-identified user interests to automate recommendations. this new system fits within the ecosystem of tools we already supply. this is a snapshot of where the pal recommendation project is as of this writing, and we hope to work with other special collections libraries and archives to continue to grow the tool. if you are interested, we hope you reach out. endnotes 1 “subject guides and bibliographies,” american philosophical society, accessed february 27, 2018, https://amphilsoc.org/library/guides; “exhibitions,” american philosophical society, accessed february 27, 2018, https://amphilsoc.org/library/exhibit; “galleries,” american philosophical society, accessed february 27, 2018, https://diglib.amphilsoc.org/galleries. 2 “aeon,” atlas systems, accessed february 27, 2018, https://www.atlas-sys.com/aeon/.
3 michael mönnich and marcus spiering, “adding value to the library catalog by implementing a recommendation system,” d-lib magazine 14, no. 5/6 (2008), https://doi.org/10.1045/may2008-monnich. 4 matthew reidsma, “algorithmic bias in library discovery systems,” matthew reidsma (blog), march 11, 2016, https://matthew.reidsrow.com/articles/173. 5 “americanphilosophicalsociety/pal,” american philosophical society, last modified september 11, 2017, https://github.com/americanphilosophicalsociety/pal. 6 “william parker foulke papers, 1840–1865,” american philosophical society, accessed february 27, 2018, https://search.amphilsoc.org/collections/view?docid=ead/mss.b.f826-ead.xml. 7 “recommendation system to suggest items to researchers based on users with the same research topic,” atlas systems, accessed february 27, 2018, https://uservoice.atlas-sys.com/forums/568075-aeon-ideas/suggestions/18893335-recommendation-system-to-suggest-items-to-research. using machine learning and natural language processing to analyze library chat reference transcripts yongming wang information technology and libraries | september 2022 https://doi.org/10.6017/ital.v41i3.14967 yongming wang (wangyo@tcnj.edu) is systems librarian, the college of new jersey. © 2022. abstract the use of artificial intelligence and machine learning has rapidly become a standard technology across all industries and businesses for gaining insight and predicting the future. in recent years, the library community has begun looking at ways to improve library services by applying ai and machine learning techniques to library data. chat reference in libraries generates a large amount of data in the form of transcripts. this study uses machine learning and natural language processing methods to analyze one academic library’s chat transcripts over a period of eight years. the machine learning model built for this study tries to classify chat questions into two categories: reference and nonreference questions.
the purpose is for the model to predict the category of future questions so that incoming questions can be channeled to the appropriate library departments or staff. introduction since the beginning of this century, artificial intelligence (ai) and machine learning (ml) have been used in almost all industries and businesses to gain knowledge and insights and predict the future. the large amount of data available has helped to accelerate the application of ai and ml at stunning speed. to follow this technology trend, the library community has begun looking at ways to improve library services by applying ai and ml techniques to library data. stanford university library is one of the pioneers in the research and application of ml and ai in the library. the mission of its library ai initiative states: “the library ai initiative is a program to identify, design, and enact applications of artificial intelligence that will help us make our rich collections more easily discoverable, accessible, and analyzable.”1 in 2019, stanford university library hosted the second international conference on ai for libraries, archives, and museums, titled fantastic futures.2 many academic libraries have implemented chat reference services as a way to support student learning and academic research on campus. chat reference serves as an important channel to connect the library’s resources and services to the campus community.3 the college of new jersey library is a midsize academic library that serves a campus with 7,000 college students, most of them undergraduates. the library began to use springshare’s libchat in 2014. the chat service is freely accessible online from the library’s website, and anyone can initiate a chat by asking an initial question through the chat box. approximately 8,000 chat transactions have been accumulated over the past eight years. this study aims to use machine learning and natural language processing (nlp) techniques to build a classification model to categorize all available questions into two categories: reference and nonreference. by doing so, we hope that the model can automatically classify future chat questions received into either the reference question category or the nonreference question category, and channel the question to the appropriate library department or staff. literature review traditionally, the analysis of chat transcripts has used qualitative or simple quantitative methods (e.g., chat frequency, duration). to better understand chat service quality and patrons’ information needs, librarians must manually review and read through chat transcripts, which requires a lot of time and effort.4 in recent years, however, the library field has started to witness the application of ai and ml techniques to analyze library data, including chat transcripts, in order to quickly and efficiently gain more insight into user information needs and information seeking patterns. megan ozeran and piper martin used topic modeling, an ml method, to analyze library chat reference conversations.
the purpose of their project was to identify the most popular topics asked by library patrons in order to improve the chat reference service and to train the library staff.5 the brigham young university library implemented a machine learning–based tool to perform various text analyses on transcripts of chat reference to gauge patron satisfaction levels and to classify patrons’ questions into several categories.6 jeremy walker and jason coleman used ml and nlp techniques to build models that predict the relative difficulty of incoming chat reference questions. they tested their large sample of chat transcripts on hundreds of models. their aim was to help library professionals and management improve chat reference services in the library.7 another ml topic modeling project was carried out by hyunseung koh and mark fienup. their study applied plsa (probabilistic latent semantic analysis) to library chat data over a period of four years, resulting in more accurate and interpretable topics and subjects compared with results by human qualitative evaluation.8 another interesting ml project on chat reference data was conducted by ellie kohler. this project used a machine learning model to analyze chat transcripts for sentiment and topic extraction.9 in addition to library chat data, ml has also been used to analyze other library data, including library digital collections and library tweet data. jeremiah flannery applied nlp summarization techniques to a special library digital collection of catholic pamphlets. this project tried to automatically generate a summary for each digitized pamphlet by using nlp’s bert extractive technique and the gensim python package.10 sultan m. al-daihani and alan abrahams conducted a text mining analysis of academic libraries’ tweets. they used a tool called pamtat, developed by the pamplin college of business at virginia polytechnic institute and state university. pamtat is a microsoft excel–based interface to the nlp nltk package written in python. the purpose of their analysis was to try to identify the most common topics or subject keywords of the tweets by 10 large academic libraries. in addition, they also ran the harvard general inquirer for semantic and sentiment analysis of the tweets.11 other applications of ml techniques in the academic library include analyzing library operations such as acquisition. in 2019, kevin w. walker and zhehan jiang from the university of alabama used a machine learning method called adaptive boosting (adaboost) to predict demand-driven acquisition (dda).12 carlos g. figuerola, francisco javier garcia marco, and maria pinto used the topic modeling technique, specifically latent dirichlet allocation, to identify the main topics and categories of the 92,705 publications in the domain of library and information science from 1978 to 2014.13 pair (projects in artificial intelligence registry) is a repository and online global directory of ai projects in higher education. it is maintained by the university of oklahoma libraries. the aim of pair is to foster cross-institutional collaboration and to support grant activity in the field of artificial intelligence and machine learning in higher education.14 public libraries have started to seriously look at the application and impact of ai in the library.
frisco public library in texas has developed a series of applications and programs to help train library staff in ai. they also developed artificial intelligence maker kits, including the google aiy voice kit, for circulation. they even provide introductory python lessons to the public.15 background of nlp and ml natural language processing is a multidisciplinary field that involves linguistics, computer science, and machine learning. by using computer algorithms, nlp tries to build a machine learning model that is applied to large amounts of data in order to make predictions or decisions. the data in nlp is natural language data, that is, data in plain and unstructured textual form in any language. there are many types of applications of nlp and ml in business and people’s daily lives. especially with the popularity of the internet, there is a tremendous increase in and accumulation of textual data, such as social media networks and customer online chat services. major applications of nlp include sentiment analysis on social media data, topic modeling in digital humanities, text classification, speech recognition, and search box autocorrect and autocompletion. the use cases are countless. in general, there are two types of ml: supervised learning and unsupervised learning. in supervised learning, the dataset fed to the model is labelled in advance to classify data or predict outcomes accurately, whereas unsupervised learning is a type of algorithm that learns patterns from unlabeled or untagged data. no matter which type, all ml and nlp techniques involve a series of general steps in any project, also called the ml/nlp pipeline.
1. data collection, which involves obtaining the raw textual data and usually means downloading data from some remote server or service.
2. data preprocessing, which is necessary for any project, large or small, because the raw textual data is unstructured data and is not ready to be fed to the model for computing processes. data preprocessing usually includes removing punctuation, changing all letters to lowercase, tokenization, removing stop words, and stemming or lemmatization.
3. feature engineering, which is optional but often very useful.
4. text vectorization, which is the final step before feeding the data to the model. the purpose is to transform the text into some kind of value in numbers.
5. model building, evaluation, and optimization, which involves multiple cycles until the optimal or desired results are achieved.
6. implementation, which is the final step in implementing the model to the real world.
methodology for this ml/nlp project, the raw data came from the chat transcripts repository downloaded from springshare’s server. from 2014 to 2021, a total of 8,000 chat reference transactions were logged. these transactions formed the raw dataset for model building and testing in this project. because of the nature of the data, i.e., textual data, python was chosen for this project. the two major python packages used in the project are nltk and scikit-learn. nltk (natural language toolkit) is a suite of libraries and programs for natural language processing of the english language. nltk supports classification, tokenization, stemming, tagging, parsing, and semantic reasoning functionalities. scikit-learn is a python module built on numpy, scipy, and matplotlib. featuring various classification, regression, and clustering algorithms, including support-vector machines, random forest, gradient boosting, k-means, and dbscan, scikit-learn is a simple and efficient tool for predictive data analysis and one of the most popular python modules for any ml project.
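as a rough illustration of how these pieces fit together, the sketch below trains a simple reference/nonreference classifier along the lines of the pipeline above. it is a minimal example, not the code used in this study: the file name and column names (chat_questions.xlsx, question, label) are assumptions, and tf-idf features with logistic regression are only one reasonable configuration among many.

# minimal sketch of the pipeline: load labeled questions, preprocess the text,
# vectorize it, then train and evaluate a binary classifier.
# file name, column names, and model choice are illustrative assumptions.
import re
import pandas as pd
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

data = pd.read_excel("chat_questions.xlsx")        # columns: question, label (yes/no)
stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    # lowercase, keep only letters, drop stop words, and stem each token
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in stop_words)

x = data["question"].astype(str).map(preprocess)
y = data["label"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer()
x_train_vec = vectorizer.fit_transform(x_train)
x_test_vec = vectorizer.transform(x_test)

model = LogisticRegression(max_iter=1000)
model.fit(x_train_vec, y_train)
print(classification_report(y_test, model.predict(x_test_vec)))

other vectorizers or classifiers can be swapped in without changing this overall structure.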
scikit-learn features various classification, regression, and clustering algorithms, including support-vector machines, random forests, gradient boosting, k-means, and dbscan, and is a simple and efficient tool for predictive data analysis. it is one of the most popular python modules for any ml project. data collection data collection includes both data gathering and data preparation. data gathering is the process of downloading the 8,000 initial questions into an excel file. data preparation deals with the initial data cleanup, such as removing blank rows. the most important task of data preparation is data labeling. because this is a supervised-learning ml project, all questions must be labeled by hand as either a reference question (label=yes) or a nonreference question (label=no). then all labeled questions (the dataset) are fed to the ml model for either training or testing purposes. see table 1 for an example of data after the preparation step.

table 1. sample questions with yes or no labels

question sequential number | label | question
3979 | yes | working on an alumni reunion presentation. i need to know …
3980 | yes | would a book with this call number: ds559.8.d7 g68 1991 …
3981 | no | would a rutgers student be able to take out a textbook from …
3982 | yes | would i be able to find mathematics textbooks by pearson on …
3983 | no | would i be able to log in to find an article if i am an alumni of …
3984 | yes | would it be possible to help me find a online essay?
3985 | no | would like to renew: huguenots [videorecording] / music by …
3986 | yes | would like to request for a course description catalog from fall …
3987 | no | would someone be able to ask room 414 to quiet down please?
3988 | no | would someone be able to come up to floor 3 and tell people to …

data preprocessing data preprocessing is the first programming step in the pipeline of this ml/nlp project. data preprocessing transforms the raw data into a more digestible form so that the ml model can perform better and achieve the desired results. one of the purposes of data preprocessing is to remove insignificant and nonmeaningful words such as “a,” “the,” “and,” etc., as well as punctuation, from the textual data. removing nonmeaningful and stop words from the corpus allows the ml model to produce better results because it then deals only with significant and meaningful words. it is also necessary to apply lowercase formatting to all letters. while we as humans know that lowercase and uppercase words have the same meaning, the computer will treat them as having different meanings. for example, “Cat” and “cat” are two different words to the computer. tokenization involves splitting each sentence into a list of individual words, typically by breaking the text at the spaces between words. the last step of data preprocessing is stemming or lemmatizing, which finds the common semantic root of a group of related words. in other words, this process explicitly correlates words with similar meanings. for instance, run, running, and runner will become “run”; library and libraries will become “librari”; goose and geese will become “goose.” feature engineering involves creating a new feature or transforming an existing feature. the purpose of feature engineering is to help the model make better predictions. this step is optional but often very helpful if done right. in this project, a new feature called “question length” was created.
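as an illustration of the preprocessing and feature engineering steps just described, a hedged python sketch using nltk might look like the following; the column names and the choice of the porter stemmer are assumptions, not necessarily the choices made in this project.

import string
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(question):
    # remove punctuation and change all letters to lowercase
    text = "".join(ch for ch in question if ch not in string.punctuation).lower()
    # tokenization: split the sentence into individual words
    tokens = text.split()
    # remove stop words such as "a", "the", and "and"
    tokens = [t for t in tokens if t not in stop_words]
    # stemming: "running" becomes "run", "libraries" becomes "librari"
    return [stemmer.stem(t) for t in tokens]

df = pd.read_excel("chat_questions_labeled.xlsx")   # hypothetical file name
# feature engineering: question length, computed from the original question text
df["question_len"] = df["question"].str.len()
df["question_lemma"] = df["question"].apply(preprocess)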
“question length” was based on the assumption that the average length of reference questions is longer than the average length of nonreference questions. if such is the case, the ml model will benefit by using this new feature to make better decisions. figure 1 is a histogram of question length distribution. the distribution of reference questions is represented in blue; nonreference questions are represented in yellow. figure 2 shows a sample result list following completion of data preprocessing and feature engineering. from left to right, it lists the result after each step. the question length feature (the question_len column) appears immediately after the original question because it is computed from the original question before any other steps are applied. the question_lemma column is the result after all preprocessing steps. figure 1. histogram of question length distribution. figure 2. results from data preprocessing and feature engineering. text vectorization the purpose of text vectorization is to transform the text data into numeric data so that the ml algorithms and python can understand and use that data to build a model. the basic idea is to build an n-dimensional vector of numerical features that represents some object. the three most popular text vectorizations are count vectorization, n-grams vectorization, and tf-idf vectorization. tf-idf stands for term frequency–inverse document frequency. because tf-idf weights each term by how often it appears in a question relative to how common it is across all questions, it is generally the most accurate of the three. figure 3 shows the result of tf-idf vectorization. figure 3. result of tf-idf vectorization. model building, testing, and evaluation the first step of model building is to divide the dataset into two sets, one for model training and one for model testing. normally we use 80% of the data for training and 20% for testing. after feeding the training data to the model, we feed the testing data as new data to the model to predict the yes or no label based on the pattern that the model builds through the training data. the testing data were initially labeled by humans and are 100 percent accurate. by comparing the labels predicted by the model with the human-assigned labels in the testing data, we can see how the model performs and make changes, if necessary, to the model parameters. scikit-learn contains several ml models. this project used two popular models: random forest and gradient boosting. the random forest model builds many decision trees and computes them in parallel; the final decision is made by majority vote. because the trees are built at the same time, it is efficient and fast. the gradient boosting model builds one tree at a time. each new tree helps correct errors made by previously trained trees, and the model is boosted (optimized) step by step through reward or penalty. in theory, gradient boosting should yield better results than random forest. nevertheless, it is slower and consumes more resources. the confusion matrix was used to evaluate the performance of the two models. three measures are derived from the confusion matrix: accuracy, precision, and recall.
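a condensed sketch of the vectorization, train/test split, and model comparison described above, using scikit-learn, might look like the following; parameter values are left at their defaults, and the label column is assumed to hold the yes/no values from table 1.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

# tf-idf vectorization: transform the preprocessed question text into weighted numbers
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df["question_lemma"].str.join(" "))
y = df["label"]                      # "yes" (reference) or "no" (nonreference)

# 80% of the data for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("random forest", RandomForestClassifier()),
                    ("gradient boosting", GradientBoostingClassifier())]:
    model.fit(X_train, y_train)           # train on the labeled training data
    predicted = model.predict(X_test)     # predict labels for the held-out test data
    print(name,
          "accuracy:", accuracy_score(y_test, predicted),
          "precision:", precision_score(y_test, predicted, pos_label="yes"),
          "recall:", recall_score(y_test, predicted, pos_label="yes"))

# the question_len feature could be appended to the tf-idf matrix with
# scipy.sparse.hstack; it is omitted here to keep the sketch short.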
accuracy equals true positive plus true negative, divided by the total. precision equals true positive divided by true positive plus false positive. recall equals true positive divided by true positive plus false negative. usually there is a tradeoff between precision and recall. recall reflects the rate of false negatives (a higher recall means fewer false negatives), and precision reflects the rate of false positives (a higher precision means fewer false positives). a false negative means that the model predicts a reference question as a nonreference question. a false positive means that the model predicts a nonreference question as a reference question. which is more important for the model to catch, false positives or false negatives? the answer depends on the actual situation. in our case, a false negative is more serious than a false positive because we did not want real reference questions to be predicted as nonreference questions. however, it was acceptable if nonreference questions were predicted as reference questions. therefore, we wanted as few false negatives as possible, which meant the largest recall value possible. results and analysis table 2 lists the results from both models.

table 2. results of random forest model and gradient boosting model

model | precision | recall | accuracy | fit time | predict time
random forest model | 0.914 | 0.964 | 0.912 | 2.489 s | 0.15 s
gradient boosting model | 0.904 | 0.948 | 0.894 | 97.786 s | 0.064 s

in general, any values above 0.9 (90%) are very good. looking at and comparing those results, we can see that both models performed well. nevertheless, the random forest model had better results than the gradient boosting model on all three measures. in addition, the fit time of the random forest model was much shorter than that of the gradient boosting model. even though the predict time of the random forest model is slightly longer than that of the gradient boosting model, the difference is relatively insignificant. therefore, the random forest model was chosen as the final model for this project. conclusion and future work in this pilot study, we used nlp and ml classification modeling to divide patrons’ chat questions into two categories: reference questions and nonreference questions. the purpose of the model is to predict the category of future questions received through chat so that library staff and professionals can provide faster, more efficient reference services. two machine learning models were tested: random forest and gradient boosting. after comparing results from each model, it was concluded that the random forest model showed better results. what is the next step after the model is built? a potential use of this model is to implement it as a plugin or feature enhancement for the online chat application. the model can function as a filter, directing incoming questions to reference librarians if the question is predicted to be a reference question, or to library staff or graduate student assistants if it is predicted to be a nonreference question. this will be especially useful for libraries with busy online chat services. further work can be done to extend the model to multiple categories.
for example, a multicategory model can go beyond two categories and include categories for information seeking, citation help, printing help, noise complaints, interlibrary loan questions, spams, etc. thus, the model can send the question to the relevant department or library personnel accordingly. endnotes 1 “stanford university library ai initiative,” stanford university library, https://library.stanford.edu/projects/artificial-intelligence. 2 “fantastic futures: 2nd international conference on ai for libraries, archives, and museums,” (2019), stanford university library, https://library.stanford.edu/projects/fantastic-futures. 3 christina m. desai and stephanie j. graves, “cyberspace or face-to-face: the teachable moment and changing reference mediums,” reference & user services quarterly 47, no. 3 (spring 2008): 242–55, https://www.jstor.org/stable/20864890. 4 sharon q. yang and heather a. dalal, “delivering virtual reference services on the web: an investigation into the current practice by academic libraries,” journal of academic librarianship 41, no. 1 (november 2015): 68–86, https://doi.org/10.1016/j.acalib.2014.10.003. 5 megan ozeran and piper martin, “good night, good day, good luck: applying topic modeling to chat reference transcripts,” information technology and libraries 38, no. 2 (june 2019): 49– 57, https://doi.org/10.6017/ital.v38i2.10921. 6 christopher brousseau, justin johnson, and curtis thacker, “machine learning based chat analysis,” code4lib journal, no. 50 (2021), https://journal.code4lib.org/articles/15660. 7 jeremy walker and jason coleman, “using machine learning to predict chat difficulty,” college & research libraries 82, no. 5 (2021), https://doi.org/10.5860/crl.82.5.683. 8 hyunseung koh and mark fienup, “topic modeling as a tool for analyzing library chat transcripts,” information technology and libraries 40, no. 3 (2021), https://doi.org/10.6017/ital.v40i3.13333. 9 ellie kohler, “what do your library chats say? how to analyze webchat transcripts for sentiment and topic extraction” (17th annual brick & click libraries conference, maryville, missouri: northwest missouri state university, 2017). 10 jeremiah flannery, “using nlp to generate marc summary fields for notre dame’s catholic pamphlets,” international journal of librarianship 5, no.1 (2020): 20–35, https://doi.org/10.23974/ijol.2020.vol5.1.158. 11 sultan m. al-daihani and alan abrahams, “a text mining analysis of academic libraries’ tweets,” the journal of academic librarianship 42, no. 2 (2016): 135–43, https://doi.org/10.1016/j.acalib.2015.12.014. https://library.stanford.edu/projects/artificial-intelligence https://library.stanford.edu/projects/fantastic-futures https://www.jstor.org/stable/20864890 https://doi.org/10.1016/j.acalib.2014.10.003 https://doi.org/10.6017/ital.v38i2.10921 https://journal.code4lib.org/articles/15660 https://doi.org/10.5860/crl.82.5.683 https://doi.org/10.6017/ital.v40i3.13333 https://doi.org/10.23974/ijol.2020.vol5.1.158 https://doi.org/10.1016/j.acalib.2015.12.014 information technology and libraries september 2022 using machine learning and natural language processing to analyze library chat reference transcripts wang 10 12 kevin w. walker and zhehan jiang, “application of adaptive boosting (adaboost) in demand driven acquisition (dda) prediction: a machine-learning approach,” the journal of academic librarianship 45, no. 3 (2019): 203–12, https://doi.org/10.1016/j.acalib.2019.02.013. 13 carlos g. 
figuerola, francisco javier garcia marco, and maria pinto, “mapping the evolution of library and information science (1978–2014) using topic modeling on lisa,” scientometrics 112 (2017): 1507–35, https://doi.org/10.1007/s11192-017-2432-9. 14 “projects in artificial intelligence registry (pair): a registry for ai projects in higher ed,” university of oklahoma libraries, https://pair.libraries.ou.edu/. 15 thomas finley, “the democratization of artificial intelligence: one library’s approach,” information technology and libraries 38, no. 1 (2019): 8–13, https://doi.org/10.6017/ital.v38i1.10974. https://doi.org/10.1016/j.acalib.2019.02.013 https://doi.org/10.1007/s11192-017-2432-9 https://pair.libraries.ou.edu/ https://doi.org/10.6017/ital.v38i1.10974 abstract introduction literature review background of nlp and ml methodology data collection data preprocessing text vectorization model building, testing, and evaluation results and analysis conclusion and future work endnotes seeing through ontologies editorial board thoughts seeing through vocabularies kevin ford information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.12367 kevin ford (kevinford@loc.gov) is librarian, linked data specialist in the library of congress’s network development and marc standards office. he works on the library’s bibframe initiative, and similar projects, such as mads/rdf, and is a member of the ital editorial board. the ideas and opinions expressed here are those of the author and do not necessarily reflect those of his employer. “ontologies” are popular in library land. “vocabularies” are popular too, but it seems that the library profession prefers “ontologies” over “vocabularies” when it comes to defining classes and properties that attempt to encapsulate some realm of knowledge. bibframe, mads/rdf, bibo, premis, and frbr are well-known “ontologies” in use in the library community.1 they were defined either by librarians or to be used mainly in the library space, or both. skos, foaf, dublin core, and schema are well known “vocabularies.”2 they are used widely by libraries though none were created by librarians or specifically for library use. in all cases, those ontologies and vocabularies were created for the very purpose of publication for broader use, which is one of the primary objectives behind creating one: to define a common set of metadata elements to facilitate the description and sharing of data within a group or groups of users. ontologies and vocabularies are common when working with rdf (resource description framework), a very simple data model in which information is expressed as a series of triple statements, each consisting of three parts: a subject, a predicate, and an object. the types of ontologies and vocabularies referred to here are in fact defined using rdf—thing a is a class and thing z is a property. those using any given ontology or vocabulary employ the defined classes and properties to further describe their things, for a lack of a better word. it is useful to provide an example. the first block of triples below represents class and property definitions in rdf schema (rdfs), which provides some very basic means to define classes and properties and some relationships between them, such as the domains and ranges for properties. the second block is instance data. 
ontovoc:book rdf:type rdfs:class
ontovoc:authoredby rdf:type rdf:property
ontovoc:authorof rdf:type rdf:property

ex:12345 rdf:type ontovoc:book
ex:12345 ontovoc:authoredby ex:abcde

ontovoc:book is defined as a class and ontovoc:authoredby is defined as a property. using those declarations, it is possible to then assert that ex:12345, which is an identifier, is of type ontovoc:book and was authored by ex:abcde, an identifier for the author. is the first block—the definitions—an “ontology” or a “vocabulary?” putting aside the question for now, air quotes—in this case literal quotes—have been employed around “ontologies” and “vocabularies” to suggest that these are more terms of art than technical distinctions, though it must also be acknowledged that there is a technical distinction to be made. ontologies in the rdf space frequently, if not always, use classes and properties from the web ontology language (known as owl) to define a specific realm’s classes and properties and how they relate to each other within that realm of knowledge. this is because owl is a more expressive definition language than basic rdfs. using owl, and considering the example above, ontovoc:authoredby could be defined as an inverse of ontovoc:authorof.

ontovoc:authoredby owl:inverseof ontovoc:authorof

in this way, and given the little instance data above (the two triples that begin ex:12345), it is then possible to infer the following bit of knowledge:

ex:abcde ontovoc:authorof ex:12345

now that the owl:inverseof triple/declaration has been added to the definitions, it’s worth reasking: do the definitions represent an “ontology” or a “vocabulary?” a purist might answer “not an ontology,” but only because those statements have not been combined in a document, which itself has been given a uri and declared to be an owl:ontology. that’s the actual owl class that says, “this is an owl ontology.” but let’s say those statements had been added to a document published at a uri and declared to be an owl:ontology. is it an ontology now? perhaps in a strict sense the answer is “yes.” but in a practical sense few would view those four declarations, wrapped neatly in a document that has been given a uri and called an ontology, as an “ontology.” it doesn’t quite rise to the occasion—“ontologies” almost always have a broader scope and employ more formal semantics—making its use a term of art, often, rather than a real technical distinction. yet, based on the same narrow definition (a published document declaring itself to be an owl:ontology) combined with a far more extensive set of class and property definitions with defined relationships between them, it is possible to describe foaf as an ontology.3 but it is widely known as, and understood as, a “vocabulary.” (there is also an experimental version of schema as owl.4) and that gets to the crux of the issue in many ways.
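the same example can be run in code. the following is a small, hedged python sketch using the rdflib package (not anything from this piece): the namespace uris are made up, and the owl:inverseof rule is applied by hand rather than by a full owl reasoner, which would derive the same triple.

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

ONTOVOC = Namespace("http://example.org/ontovoc/")   # hypothetical namespace
EX = Namespace("http://example.org/ex/")             # hypothetical namespace

g = Graph()
# the definitions from the first block
g.add((ONTOVOC.book, RDF.type, RDFS.Class))
g.add((ONTOVOC.authoredby, RDF.type, RDF.Property))
g.add((ONTOVOC.authorof, RDF.type, RDF.Property))
g.add((ONTOVOC.authoredby, OWL.inverseOf, ONTOVOC.authorof))
# the instance data from the second block
g.add((EX["12345"], RDF.type, ONTOVOC.book))
g.add((EX["12345"], ONTOVOC.authoredby, EX.abcde))

# apply the owl:inverseof rule by hand: for every (s, p, o) where p has an
# inverse q, also assert (o, q, s)
inferred = []
for p, _, q in g.triples((None, OWL.inverseOf, None)):
    for s, _, o in g.triples((None, p, None)):
        inferred.append((o, q, s))
for triple in inferred:
    g.add(triple)

# the inferred knowledge: ex:abcde ontovoc:authorof ex:12345
print((EX.abcde, ONTOVOC.authorof, EX["12345"]) in g)   # True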
putting aside the technical distinction that can be argued to identify something as an “ontology” versus a “vocabulary,” there are non-technical semantics at work here—what was earlier described as a “term of art”—about when, how, and why something is deemed an “ontology” versus a “vocabulary.” the library community appears to think of their creations as “ontologies” and not “vocabularies,” even when the documentation tends to avoid the word “ontology.” for example, the opening sentence of the bibframe and mads/rdf documentation very clearly introduces each as a “vocabulary,” as does frbr in rdf.5 on the surface they may be presented as “vocabularies,” which they are of course, but despite this prominent self-declaration they are not seen in the same light as foaf or schema but instead as something more exacting, which they also are. it is worth contemplating why they are viewed principally as “ontologies” and to examine whether this has been beneficial. perhaps the ideas behind designating something a “vocabulary” are, in fact, more in line with the way libraries operate, whereas “ontologies” represent an ideal (and who doesn’t set their sights on the ideal?), striving toward which only exposes shortcomings and sows confusion. the answer to “why” is historical and probably derives from a combination of lofty thinking, traditional standards practices, and good ol’ misunderstanding. traditional standards practices favor more formal approaches. libraries’ decades-long experience with xml and xml schema information technology and libraries june 2020 seeing through vocabularies | ford 3 contributed significantly to this mindset. xml schema provides a way to describe the precise construction of an xml document and it can then be used to validate the xml document. xml schema defines what elements and attributes are permitted in the xml document and frequently dictates their order. it can further constrain the values of an element or attribute to a select list of options. in many ways, xml schema was the very expression of metadata quality control. librarians swooned. with the right controls and technology in place, it was impossible to produce poor, variable metadata. in the case of semantic modelling, owl is certainly a more formal approach. it’s founded in description logics whose expressions take the form of occult-like mathematics, at least as viewed by a librarian with a humanities background. owl can be used to declare domains and ranges for properties. one can also designate a property as a datatype property, meaning it takes a value such as a string or a date, as its value, or an object property, which means it will reference another rdf resource as its object. but these declarations are actually more about inferencing—deriving information by applying the ontology against some instance data—and not about restrictions, constraints, or validation. to be clear, there are ways to apply restrictions in owl—“wine can be either red or white”—but this is a form of advanced owl modelling that is not well understood and not often implemented, and virtually never in ontologies designed by librarians. conversely, indicating a domain for a property, for example, is easy, relatively straightforward, and seductive because it gives the appearance that the property can only be used with resources of a specific class. consider: the domain of ontovoc:authoredby is ontovoc:book. that does not mean that the ontovoc:authoredby can only be used with a ontovoc:book resource. 
it means that whatever resource uses ontovoc:authoredby must therefore be a ontovoc:book. defining that domain for that property is not restricting its use only to books; it allows one to derive the additional knowledge that the thing it is used with must be a book even if it doesn’t identify itself as one. this may seem like a subtle distinction and/or it may seem like tortured logic, but if it does it may suggest that one’s point of view, one’s mindset, favors constraints, restrictions, and validations. and that’s ok. that’s library training and conditioning, completely reinforced in our daily work. it’s what has been taught in library schools for decades and practiced by library professionals even longer. names should be entered “last name, first name” and any middle initial, if known, included. the data in this field should only be a three-character language code from this approved list of language codes. these rules and the consistency resulting from these rules are what make library data so often very high quality. google loves marc records from our community for this very reason. wishing to exert strong control at the definition level when creating a model or metadata scheme with an eye to data quality, it is a natural inclination for librarians to gravitate to a more formal means of defining a model, especially one that seems to promise constraints. so, despite these models self-describing at a high-level as vocabularies, the models themselves employ a considerable amount of owl at the technical level, which becomes the focus of any users wishing to implement the model. users comprehend these models as something more than a vocabulary and therefore view the model through this more complex lens. unfortunately, because owl is poorly understood (sometimes by creators and sometimes by users, and sometimes by both), this leads to various problems. on the one hand, creators and users believe there are technical restrictions or constraints where there are, in fact, none. when this happens, the “constraint” is information technology and libraries june 2020 seeing through vocabularies | ford 4 either identified as a problem (“consider removing the range for this property”) or—and this is more damaging—the property (read: model/vocabulary/ontology) is avoided. even when it is recognized that the “constraint” is not a real restriction (just a means to infer knowledge), forging ahead can generate new issues. when faced with a domain and range declaration, for example, forging ahead can result in inaccurate, imprecise, or simply undesirable inferences. most of the currently open “issues” (about 50 at the time of writing) about bibframe follow a basic pattern: 1) there is a declaration about this property or this class that makes it difficult to use because of how it has been defined with owl; 2) we cannot really use it presently because it would cause potential inferencing issues; 3) consider altering the owl definitions.6 pursuing an (owl) ontology, while formal and seemingly comforting because it feels a little like constraining the metadata schema, can result in confusion and a lack of adoption. given that vocabularies and ontologies are developed and published to encourage users to describe their data in a way that fosters wide consumption by others, this is unfortunate to say the least. it is notable that skos, foaf, dublin core, and schema have very different scopes and potentially much wider user bases than the more library-specific ontologies (bibframe, mads/rdf, bibo, etc.). 
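before drawing that lesson out, the domain behavior just described can be made concrete with another hedged rdflib sketch (again with made-up uris and names): declaring a domain does not block the property's use on non-books; applying the rdfs rule simply types the resource as a book, whether or not that was wanted.

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

ONTOVOC = Namespace("http://example.org/ontovoc/")   # hypothetical namespace
EX = Namespace("http://example.org/ex/")

g = Graph()
g.add((ONTOVOC.authoredby, RDFS.domain, ONTOVOC.book))
# a resource that is *not* described as a book still uses the property
g.add((EX.article9, ONTOVOC.authoredby, EX.abcde))

# apply the rdfs domain rule by hand: anything that uses this predicate is
# typed with the predicate's declared domain
inferred = []
for p, _, cls in g.triples((None, RDFS.domain, None)):
    for s, _, _ in g.triples((None, p, None)):
        inferred.append((s, RDF.type, cls))
for triple in inferred:
    g.add(triple)

# nothing was rejected; instead the article is now inferred to be a book
print((EX.article9, RDF.type, ONTOVOC.book) in g)   # True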
there is something to be learned here: the smaller the domain, the more effective an ontology might be; the larger the universe, a more general approach may be better. it is further true that foaf, dublin core, and schema define specific domains and ranges for many of their properties, but they have strived for clarity and simplicity. the creators of schema, for example, eschewed the formal semantics behind rdfs and owl and redefine domain and range to better match their needs and (perhaps unexpectedly) most users’ automatic understanding.7 what is generally true is that each of the “vocabularies” approached the creation and defining of their models so as to minimize the use of formal semantics, and promoted this as a feature. in this way, they limited or removed altogether the actual or psychological barriers to adoption. their offering was more accessible, less fussy. bearing in mind the differences in scale and scope, they have been rewarded with a wider adopter base and passionate advocates. the decision to create a “vocabulary” or an “ontology” is a technical one and a political one, both of which must be in alignment. it’s a mindset and it is a statement. it is entirely possible to define the model at a technical level using owl, making it by definition an ontology, but to have it be perceived, and used, as a vocabulary because it is flexible and not strictly defined. likewise, it is not enough to call something a vocabulary, but in reality be a model burdened with formal semantics that is then expected to be adopted and used widely. if the objective is to fashion a (pseudo?) restrictive metadata set with rules that inform its use, and which is strongly bonded with a specific community, develop an “ontology,” but recognize that this may result in confusion and lack of uptake. if, however, the desire is to cultivate a metadata element set that is flexible, readily useable, and positioned to grow in the future because it employs fewer rules and formal semantics, create a “vocabulary.” that’s really what is being communicated when we encounter ontologies and vocabularies. interestingly, the political difference between “vocabulary” and “ontology” appears, in fact, to be understood by librarians: library models self-identify as “vocabularies.” but once past those introductory remarks, the truth is exposed quickly in the widespread use of owl, revealing beyond doubt that it is not a flexible, accommodating vocabulary but a strictly defined model. to dispense with the air quotes: as librarians we’re creating ontologies and calling them vocabularies. we really want to be creating vocabularies that are ontologies in name only. information technology and libraries june 2020 seeing through vocabularies | ford 5 endnotes 1 “bibframe ontology,” library of congress, accessed may 21, 2020, http://id.loc.gov/ontologies/bibframe.html; “mads/rdf (metadata authority description schema in rdf),” library of congress, accessed may 21, 2020, http://id.loc.gov/ontologies/madsrdf/v1.html; “bibliographic ontology specification,” the bibliographic ontology, accessed may 21, 2020, http://bibliontology.com/; “premis 3 ontology,” premis editorial committee, accessed may 21, 2020, http://id.loc.gov/ontologies/premis3.html; ian davis and richard newman, “expression of core frbr concepts in rdf,” accessed may 21, 2020, https://vocab.org/frbr/. 
2 alistair miles and sean bechhofer, editors, “skos simple knowledge organization system reference,” w3c, accessed may 21, 2020, https://www.w3.org/tr/skos-reference/; dan brickley and libby miller, “foaf vocabulary specification 0.99,” accessed may 21, 2020, http://xmlns.com/foaf/spec/; “dcmi metadata expressed in rdf schema language,” dublin core™ metadata initiative, accessed may 21, 2020, https://www.dublincore.org/schemas/rdfs/; “welcome to schema.org,” schema.org, accessed may 21, 2020, http://schema.org/. 3 “foaf ontology,” xmlns.com, accessed may 21, 2020, http://xmlns.com/foaf/spec/index.rdf. 4 see “owl” at “developers,” schema.org, accessed may 21, 2020, https://schema.org/docs/developers.html. 5 see “bibframe ontology” and “mads/rdf (metadata authority description schema in rdf)” above. 6 “issues,” bibframe ontology at github, accessed 21 may 2020, https://github.com/lcnetdev/bibframe-ontology/issues. 7 r.v. guha, dan brickley, and steve macbeth, “schema.org: evolution of structured data on the web,” acmqueue 15, no. 9 (15 december 2015): 14, https://dl.acm.org/ft_gateway.cfm?id=2857276&ftid=1652365&dwn=1. http://id.loc.gov/ontologies/bibframe.html http://id.loc.gov/ontologies/madsrdf/v1.html http://bibliontology.com/ http://id.loc.gov/ontologies/premis3.html https://vocab.org/frbr/ https://www.w3.org/tr/skos-reference/ http://xmlns.com/foaf/spec/ https://www.dublincore.org/schemas/rdfs/ http://schema.org/ http://xmlns.com/foaf/spec/index.rdf https://schema.org/docs/developers.html https://github.com/lcnetdev/bibframe-ontology/issues https://dl.acm.org/ft_gateway.cfm?id=2857276&ftid=1652365&dwn=1 endnotes 10844 20190318 galley library services navigation: improving the online user experience brian rennick information technology and libraries | march 2019 14 brian rennick (brian_rennick@byu.edu) is aul for library it, brigham young university. abstract while the discoverability of traditional information resources is often the focus of library website design, there is also a need to help users find other services such as equipment, study rooms, and programs. a recent assessment of the brigham young university library website identified nearly two hundred services. many of these service descriptions were buried deep in the site, making them difficult to locate. this article will describe a web application that was developed to improve service discovery and to help ensure the accuracy and maintainability of service information on an academic library website. introduction the brigham young university library released a new version of its website in 2014. multiple usability studies were conducted to inform the design of the new site. during these studies, the web designers observed that when a user did not see what they were looking for on the homepage, they were likely to click on the “services” link as the next best option. the word services appeared to be an effective catch-all term. web designers asked themselves, “what is a library service?” they concluded that a library service could be anything public-facing that meets the needs of a user. using this broad definition, services could include: • library materials—both digital and physical (e.g. books, dvds) • material services (e.g. course reserve, interlibrary loan) • equipment and technology (e.g. computers, cameras, tripods) • help and guidance (e.g. research assistance, computer assistance) • locations (e.g. group study rooms, classrooms, help desks) • programs (e.g. 
friends of the library, lectures) because libraries offer so many diverse services, structuring a website to effectively promote them all brings many challenges. for instance, a common approach to presenting library services on a website is to have a menu that lists a few of the most popular or important services. the last menu item will normally be a link to a web page for “other services” that provides a more comprehensive service list. such an all-inclusive listing of library services on a single web page can easily lead to information overload for users. where do services belong in a library website’s information architecture? determining the one correct path is not easy because there are multiple valid ways to organize services into web pages. services could be arranged by department, service category, user group (undergraduates, graduates, faculty, visitors, alumni), or any number of other ways. an ideal system would allow users to follow the path that makes the most sense to them. information technology and libraries | march 2019 15 user expectations for a single (google-like) search box add to the challenges for service listings.1 a single search box, also known as a metasearch system, web-scale discovery service, or federated search, combines search results from multiple library sources. a study at the university of colorado found that users expected to locate services by entering keywords into the single search box on the library’s homepage.2 for example, the users attempted to search for “interlibrary loan” and “chat with a librarian” using the single search box. it is unrealistic to expect all users to follow a specific series of links in order to find the one correct path to information about a service when they are accustomed to google-style searching. even when a user manages to locate the correct web page where a service is described, the pertinent information can still be difficult to pinpoint when service descriptions are buried in paragraphs. users need to be able to quickly perform a visual scan of a web page to locate service information. kozak and hartley suggest that “bulleted lists are easier to read, easier to search and easier to remember than continuous prose.”3 the ongoing maintenance of service listings poses another significant challenge. for large academic libraries, up-to-date service information is difficult to maintain because it is typically scattered throughout a website. each department may have its own set of web pages and service listings. department pages created and maintained by different individuals end up with inconsistent design, organization, and voice. services that are common to multiple departments will have duplicate listings with different descriptions. maintenance of accurate information becomes an issue as services change; tracking down all of the references to a discontinued or modified service requires extensive searching of the website. literature review studies and commentaries regarding the information architecture of academic library websites have been covered extensively in the literature.4 a few articles specifically address the way that library services are organized on websites. library services are a significant component of academic library website content. clyde studied one hundred library websites from thirteen countries in order to compare common features and to determine some of the purposes for a library website.5 purposes for the sites varied. 
some focused on providing information about the library and its services while others functioned more like a portal, providing links to internet resources. cohen and still developed a list of core content for academic library websites by examining pages from university and two-year college sites.6 they organized the content into categories: library information, reference, research, instruction, and functionalities. liu surveyed arl libraries to get an overview of the state of web page development.7 the subsequent spec kit identifies services commonly found on academic library websites. yang and dalal studied a random sample of academic library websites to see which web–based reference services were offered and how they were presented.8 they also examined the differing terminology used to describe the services. the choice of terminology used on library websites impacts the findability of services. dewey compared academic websites from thirteen member libraries of a consortium to determine how findable service links were on the sites.9 the service links used in the evaluation covered “access, reference, information, and user education” categories. the study measured the number of clicks from the homepage that were required to find information about a service. dewey found library services navigation | rennick 16 https://doi.org/10.6017/ital.v38i1.10844 inconsistent use of terminology used to describe library services from one site to another. dewey posited that extensive use of library jargon could, in a sense, hide links from users. the overall conclusion was that the websites contained “too much information poorly placed.” a study of an academic library website by mcgillis and toms also found that participants struggled with terminology when attempting to locate services.10 the website reflected “traditional library structures” instead of using categories that were meaningful to users. the decision on where to place library services on a website is an important step in the design process. as part of their proposal to establish a benchmarking program for academic library websites, hightower, shih, and tilghman created classifications for the web pages they studied.11 library services were assigned to the “directional” category instead of representing a separate category. vaughan described a history of changes to an academic website that took place from 1996–2000.12 an interesting change was that, after multiple redesigns, the web designers combined two categories into a single “library services” category in order to simplify top level navigation on the home page. comeaux studied thirty-seven academic library websites to see how design elements evolved between 2012 and 2015.13 a portion of the study compiled terms used as navigation labels. the term “about” was the most common navigation label followed by “services” as the second most common. use of the term “services” as a main navigation label increased in popularity from 2012 to 2015. several researchers suggest organizing library services into web pages or portals that target different audiences. gullikson et al. studied usability issues related to the information architecture of an academic website and discovered that study participants followed different paths in their attempts to locate service information on the site.14 some users found items easily while others were unsuccessful. menu labels were not universally understood. 
the researchers identified a need for multiple access points to information in order to accommodate different mental models. they suggested employing multiple information organizational schemes, such as categorizing links by function, frequency of use, and target user group. adams and cassner analyzed the websites of arl libraries to see how services for distance education students and faculty were presented.15 they recommend strategies for helping distance students navigate the website, including maintaining a web page designed specifically for distance students that avoided jargon and clearly described services. detlor and lewis envisioned academic library websites as “sophisticated guidance systems which support users across a wide spectrum of information seeking behaviors—from goal-directed search to wayward browsing.”16 they reviewed arl library websites to see which important features were present or absent. their coding methodology was adopted by gardner, juricek, and xu in their study of how library web pages can meet the needs of campus faculty.17 liu proposed a conceptual model for an improved academic library website that would be organized into portals designed for specific user groups, such as undergraduates, faculty, or visitors.18 some of the arl websites studied by the researcher already implemented portals by user group. a more recent approach for locating library services has been to include website search results when using the single search from the homepage. for example, the north carolina state libraries website includes library-wide site search results when using the single search.19 the wayne state university libraries single search displays results from a university-wide site search.20 information technology and libraries | march 2019 17 an influential report produced by andrew pace provides practical advice for designing library websites.21 in the report, pace described the library services that should be included on a site and stressed that website design affects the discoverability and delivery of these services: “whether requiring minimal maintenance or constant upkeep, the extensibility of the design and flexibility of a site’s architecture ultimately saves the library time, money, hassle, and user frustration.”22 the web application described in this article aims to achieve these goals in terms of service discoverability and website maintainability. a services web application in an effort to tackle the challenges of services navigation and maintenance, the brigham young university library developed a web application for organizing services that allows multiple routes to service information. the application, known internally as “services,” was built using django, an open-source python web framework. the application incorporates a comprehensive list of library services and a map of service relationships. each service is assigned one or more categories, locations, and service areas within the application: • categories and subcategories—broad groupings of services (e.g., research help, for faculty, printing and copying) • locations—physical or virtual places within the library where services can be found (e.g., help desks, rooms) • service areas— library departments or other organizational units that offer services (e.g., humanities, special collections) services can have multiple categories, locations, and service areas and some service areas have multiple locations within the library (see figure 1). 
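the article does not publish the application's schema, but the relationships described above can be sketched as django models; every model and field name below is an assumption made for illustration, not the byu library's actual code.

from django.db import models

class Category(models.Model):
    # broad grouping of services, e.g., research help, for faculty, printing and copying
    name = models.CharField(max_length=200)
    parent = models.ForeignKey("self", null=True, blank=True,
                               on_delete=models.CASCADE)   # allows subcategories

class Location(models.Model):
    # a physical or virtual place in the library, e.g., a help desk or room
    name = models.CharField(max_length=200)

class ServiceArea(models.Model):
    # a department or other organizational unit, e.g., humanities, special collections
    name = models.CharField(max_length=200)
    locations = models.ManyToManyField(Location)   # some areas span multiple locations

class Service(models.Model):
    name = models.CharField(max_length=200)
    description = models.TextField()
    # a service can have multiple categories, locations, and service areas
    categories = models.ManyToManyField(Category)
    locations = models.ManyToManyField(Location)
    service_areas = models.ManyToManyField(ServiceArea)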
service information can also include links to related services. these links facilitate the serendipitous discovery of additional services (see figure 2). service information is stored in a relational database that joins connected entities together. an html template is used to format service information from the database in order to generate web pages for each of the services. maintaining the data in this manner ensures that changes made to service information in the database flow through to all of the associated web pages. adding or modifying entries automatically triggers the generation of new html for only the impacted services. generating static content by using triggers keeps the web pages up-to-date without the performance hit of real-time dynamic page generation. library services navigation | rennick 18 https://doi.org/10.6017/ital.v38i1.10844 figure 1. sample map illustrating relationships between services (on the left side) and service area locations (on the right side). information technology and libraries | march 2019 19 figure 2. sample map of how related service web pages are linked. library services navigation | rennick 20 https://doi.org/10.6017/ital.v38i1.10844 user scenarios the following examples of navigation paths typify how the web application can help users locate services. in each case there are multiple alternative paths that could be followed to find the same information. scenario 1. a student is looking for a computer that has music notation software installed. clicking the “services” link on the library homepage leads to a summary of library services. the student clicks the “public computers” link found under the “featured services” heading and is presented with detailed information about the computers. in the bullet points listed in the “overview” section there is a link to “see the list of software available on these computers.” following this link the student is able to learn that the desired software is available in the library’s music and dance media lab. scenario 2. while visiting a web page for the faculty delivery service, a professor notices a link to the category “for faculty.” following the link leads to a page that highlights some of the library services provided exclusively to campus faculty. the professor clicks the link “faculty expedited book orders” and is taken to a web page that describes the service and provides an online form for requesting a book. scenario 3. a student would like to borrow a camera for a class project. entering “digital cameras” into the main search box on the library homepage produces a link to “digital cameras (dslr)” listed under the “library services” heading at the top of the search results. following the link leads to a web page with information about the library’s digital camera offerings. the web page provides links to related services, including the library’s video production studio. the student decides to reserve the studio instead of checking out a camera. anatomy of a services web page each service web page is divided into sections to help users quickly find the type of information they seek. each section represents an information module with a specific purpose and an identifying design; the sections are color coded and displayed in a consistent order on each page. this helps users to find the same kind of information in the same place on every service page. 
major sections include: • title • description • keywords • hours • location • contact • overview • call to action • frequently asked questions • additional resources • related services • categories information technology and libraries | march 2019 21 a few of the sections require an explanation. the hours, location, and contact sections are links located directly below the title and description. clicking these links displays the section content. the overview section is intended to provide brief bullet points near the top of the web page so that users can quickly scan the most important information about the service. the call to action section follows these bullet points and contains one or more links to web applications that facilitate use of the service. examples of calls to action include: • place a hold • reserve a group study room • register for an advanced writing class • submit an interlibrary loan request most of the sections are optional since not all sections apply to every service. the services web pages can also include raw html that is embedded in a section in order to provide unique formatting for those services that do not neatly fit the standard layout. for example, the public computers page includes a section that displays the current availability of computers for each floor of the library. the look and feel of services web pages can be extended to other pages on the library website. library departments have web pages that provide information about personnel, mission, location, and services offered. some of these pages have been converted to a format that resembles the services layout in an effort to add cohesiveness to the library website. the department pages have sections similar to services pages such as hours, location, contact information, and an overview with bullet points. the pages can automatically display links to all of the services available in the department. because department pages are part of the services application and are connected to services with a relational database, changes to service information remains in sync across the entire website. this helps alleviate the problem of out-of-date department web pages. searching for services services can be located by submitting a query in a search box or by following links found on the main services web page. the services search engine matches words from the query with words found in a service name or associated tags. each service is tagged with keywords, phrases, or synonyms to increase the likelihood of successful searching. users may not be familiar with library jargon and will search for services using a variety of terms. it is impossible to name library services in a way that is understood by everyone, especially since academic library services target both students and faculty. a study on library services and user-centered language found that: “the choices of the graduate students did not always mirror those of the faculty. this highlights the inherent challenge of marketing services—the target audiences for the same service can have very different opinions and preferences.”23 services can have multi-word phrases assigned in addition to individual keywords. for example, the data management service has the following synonyms assigned: data curation, data management plan, and dmp. new keywords and phrases can be identified by reviewing search queries in the system log files and by conducting usability studies. library services navigation | rennick 22 https://doi.org/10.6017/ital.v38i1.10844 figure 3. 
the interlibrary loan service web page. in addition to using a search box on the services web pages, users can search for services using the single search box on the library’s homepage. the single search box returns a link to matching services as part of search results when the search engine recognizes services keywords in a query. the services application has an api that makes keywords and other service information available to the single search box application. figure 4. search for a service from the single search box on the library’s homepage. figure 5. json results from the services api.

{"status": 200, "results": [{"url": "https://lib.byu.edu/services/datamanagement/", "type": "service", "name": "data management", "slug": "datamanagement", "description": "through our institutional repository scholarsarchive, faculty can store research data. this is particularly useful for faculty who must develop data management plans for research projects funded by grants.", "keywords": ["data curation", "dmp", "data management plan", "data storage", "open access"]}], "total": 1, "query": "dmp"}

to facilitate browsing, services are organized into three groups on the services web page: featured services, categories, and service areas. the featured services group highlights the most commonly sought-after services. categories are organized by the type of service or the target audience. the service areas group directs users to services available in library departments or units. the services web page does not list every service but instead directs users to web pages based on categories or service areas that list individual services. the services search feature can also include links to non-services. for example, library policies are not services yet users occasionally search for them on the services page (the library website posts policy documents on the about page). in order to minimize user frustration with searching, links to non-services are included in search results so that users can be redirected to the desired pages. to help with optimization for external search engines such as google, each services page has a user-friendly url that clearly identifies the service. for example, the 3d printer service has the url https://lib.byu.edu/services/3d-printers/. each web page also includes the service name in an embedded html title tag. conclusion adopting a broad view of what represents a service has altered the library’s approach to the information architecture of the website. the services web application offers several innovations for improving library service discoverability and maintenance, including:

• standardized organization of service information
• attaching keywords/aliases to service descriptions
• an api for integration with the single search box on the homepage
• links to related services
• generation of web pages from a relational database

usability tests were conducted throughout the development of the services application. follow-up assessments are planned for the future in order to verify that the application works as expected and to identify potential adjustments to the design. the services application shows promise as an effective tool for facilitating the discovery of services and increasing the reliability and uniformity of service information.
acknowledgements the author gratefully acknowledges the contributions of grant zabriskie for the original concept and design of the services application and ben crowder for the implementation. references 1 cory lown, tito sierra, and josh boyer, “how users search the library from a single search box,” college & research libraries 74, no. 3 (may 2013): 227-41, https://doi.org/10.5860/crl-321. 2 rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (summer 2012): 186–207, https://doi.org/10.1353/lib.2012.0029. 3 marcin kozak and james hartley, “writing the conclusions: how do bullet-points help?” journal of information science 37 no. 2 (feb. 2011): 221–24, https://doi.org/10.1177/0165551511399333. 4 barbara a. blummer, “a literature review of academic library web page studies,” journal of web librarianship 1 no. 1 (2007): 45–64, https://doi.org/10.1300/j502v01n01_04; galina letnikova, “usability testing of academic library web sites: a selective annotated bibliography,” internet reference services quarterly 8 no. 4 (2004): 53–68, https://doi.org/10.1300/j136v08n04_04. information technology and libraries | march 2019 25 5 laurel a. clyde, “the library as information provider: the home page,” the electronic library 14 no. 6 (dec. 1996): 549–58, https://doi.org/10.1108/eb045522. 6 laura b. cohen and julie m. still, “a comparison of research university and two-year college library web sites: content, functionality, and form,” college & research libraries 60 no. 3 (1999): 275–89, https://doi.org/10.5860/crl.60.3.275. 7 yaping peter liu, “web page development and management: a spec kit,” association of research libraries (1999): https://hdl.handle.net/2027/mdp.39015042087232. 8 sharon q. yang and heather a. dalal, “delivering virtual reference services on the web: an investigation into the current practice by academic libraries,” journal of academic librarianship 41 no. 1 (2015): 68–86, https://doi.org/10.1016/j.acalib.2014.10.003. 9 barbara i. dewey, “in search of services: analyzing the findability of links on cic university libraries’ web pages,” information technology and libraries, 18 no. 4 (1999): 210–13, http://www.ala.org/sites/ala.org.acrl/files/content/conferences/pdf/dewey99.pdf. 10 louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62 no. 4 (july 2001): 355–67, https://doi.org/10.5860/crl.62.4.355. 11 christy hightower, julie shih, and adam tilghman, “recommendations for benchmarking web site usage among academic libraries,” college & research libraries 59 no. 1 (jan. 1998): 61–79, https://crl.acrl.org/index.php/crl/article/viewfile/15182/16628. 12 jason vaughan, “three iterations of an academic library web site,” information technology and libraries 20 no. 2 (june 2001): 81–92, https://search.proquest.com/docview/215832160. 13 david j. comeaux, “web design trends in academic libraries—a longitudinal study,” journal of web librarianship 11 no. 1 (2017): 1–15, https://doi.org/10.1080/19322909.2016.1230031. 14 shelly gullikson et al., “the impact of information architecture on academic web site usability,” the electronic library 17 no. 5 (oct. 1999): 293–304, https://doi.org/10.1108/02640479910330714. 15 kate e. adams and mary cassner, “content and design of academic library web sites for distance learners: an analysis of arl libraries,” journal of library administration 37 no. 1/2 (2002): 3–13, https://doi.org/10.1300/j111v37n01_02. 
16 brian detlor and vivian lewis, “academic library web sites: current practice and future directions,” journal of academic librarianship 32 no. 3 (may 2006): 251–58, https://doi.org/10.1016/j.acalib.2006.02.007. 17 susan j. gardner, john eric juricek, and f. grace xu, “an analysis of academic library web pages for faculty,” journal of academic librarianship 34 no. 1 (jan. 2008): 6–24, https://doi.org/10.1016/j.acalib.2007.11.006. library services navigation | rennick 26 https://doi.org/10.6017/ital.v38i1.10844 18 shu liu, “engaging users: the future of academic library web sites,” college & research libraries 69 no. 1 (jan. 2008): 6–27, https://doi.org/10.5860/crl.69.1.6. 19 kevin beswick, “quicksearch,” north carolina state university libraries, accessed nov. 28, 2018, https://www.lib.ncsu.edu/projects/quicksearch. 20 cole hudson and graham hukill, “one-to-many: building a single-search interface for disparate resources,” in exploring discovery: the front door to your library’s licensed and digitized content, ed. kenneth j. varnum (chicago: ala editions, 2016), 141–53, http://digitalcommons.wayne.edu/libsp/114. 21 andrew k. pace, “optimizing library web services: a usability approach,” library technology reports 38 no. 2 (mar./apr. 2002): 1–87, https://doi.org/10.5860/ltr.38n2. 22 ibid. 23 allison r. benedetti, “promoting library services with user-centered language,” libraries & the academy 17 no. 2 (apr. 2017): 217-34, https://doi.org/10.1353/pla.2017.0013. mitigating bias in metadata: a use case using homosaurus linked data article mitigating bias in metadata a use case using homosaurus linked data juliet l. hardesty and allison nolan information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13053 juliet l. hardesty (jlhardes@iu.edu) is metadata analyst, indiana university. allison nolan (anolan147@gmail.com) is library and information science graduate student, indiana university. © 2021. abstract controlled vocabularies used in cultural heritage organizations (galleries, libraries, archives, and museums) are a helpful way to standardize terminology but can also result in misrepresentation or exclusion of systemically marginalized groups. library of congress subject headings (lcsh) is one example of a widely used yet problematic controlled vocabulary for subject headings. in some cases, systemically marginalized groups are creating controlled vocabularies that better reflect their terminology. when a widely used vocabulary like lcsh and a controlled vocabulary from a marginalized community are both available as linked data, it is possible to incorporate the terminology from the marginalized community as an overlay or replacement for outdated or absent terms from more widely used vocabularies. this paper provides a use case for examining how the homosaurus, an lgbtq+ linked data controlled vocabulary, can provide an augmented and updated search experience to mitigate bias within a system that only uses lcsh for subject headings. introduction controlled vocabularies are a vital part of how individuals and communities are understood and discussed in scholarly discourse and research. controlled vocabularies are also a way to standardize terminology and allow items to be grouped by common subjects for easier discovery and access points. 
while larger, more universally recognized vocabularies like the library of congress subject headings (lcsh) exist, they are often slow to be updated and they reflect a largely white, heterosexual, cisgender, male, christian-centric point of view.1 when the terminology used to define a systemically marginalized group is determined by those outside of the group, often the terms are outdated or reflect a biased perspective.2 the prevalence and continued use of outdated metadata and vocabularies in discovery systems creates a cycle of biased search practices that can be difficult to break without the help of information professionals and outside resources. controlled vocabularies that have been created by or have the input of marginalized communities tend to be more inclusive and up to date. unfortunately, these vocabularies often are not known to the public or to researchers not well versed in metadata practices. providing access to controlled vocabularies created by marginalized communities and linking them to existing vocabularies such as lcsh can help make the search process more representative of the people who are using discovery systems and can connect them to resources that better represent themselves and their needs in a complex information world. lcsh terms are available as linked data, a format that enables online machine-readable connections between concepts and terms, and there needs to be an effort to make systems using lcsh terms more inclusive and representative of marginalized communities.

the project described in this article built and gathered feedback on a proof-of-concept javascript application to show how defined connections between vocabularies can be used to provide alternative and often enhanced access to library catalog resources. in this instance, simple knowledge organization system (skos) relationships link lcsh subject terms to the homosaurus linked data vocabulary, an "international linked data vocabulary of lgbtq terms that supports improved access to lgbtq resources within cultural institutions."3 skos is "a common data model [from the w3c] for sharing and linking knowledge organization systems via the web."4 this project uses skos:exactmatch relationships defined by the homosaurus to enable researchers to use homosaurus terms to search a library catalog and retrieve relevant results based on the connected lcsh terms that are already in the catalog record.5 subject searches are conducted when the homosaurus term and the lcsh term match exactly, since the lcsh term's presence in the library catalog record indicates a specific grouping of records could have this subject term applied. if the homosaurus term does not match exactly to the lcsh term, a keyword search is conducted using the homosaurus term to retrieve library catalog results where the homosaurus term appears in any indexed field in the catalog record, including creator-supplied title and abstract information. using a vocabulary like the homosaurus this way helps to connect researchers to resources that more accurately reflect systemically marginalized communities and potentially more accurately reflects the researchers themselves.
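as a concrete illustration of the search behavior just described, the following sketch builds an iucat search link from a single term. it is a simplified reading of the approach and not the project's actual code (which is linked later in this article): the subject-search url pattern follows the iucat example cited in endnote 27, while the keyword field name "all_fields" and the term-object property names are assumptions made for this example.

// minimal sketch of the subject-vs-keyword logic, assuming a term object that carries
// a homosaurus preferred label and, when the community has defined one, the lcsh label
// connected by skos:exactmatch. not the application's actual code.
function buildIucatSearchUrl(term) {
  const base = "https://iucat.iu.edu/?";
  if (term.lcshExactMatch) {
    // an exact match exists: run a subject search on the connected lcsh heading,
    // since that heading is what actually appears in the catalog records
    return base + new URLSearchParams({
      search_field: "subject",
      q: term.lcshExactMatch,
    }).toString();
  }
  // no exact match: fall back to a keyword search on the homosaurus term itself,
  // so it can still match titles, abstracts, and other indexed fields
  return base + new URLSearchParams({
    search_field: "all_fields", // assumed name for the catalog's keyword search field
    q: term.prefLabel,
  }).toString();
}

// example from the article: the homosaurus term "transgenderism" is connected to the
// lcsh heading "gender nonconformity" by skos:exactmatch, so it becomes a subject search.
buildIucatSearchUrl({ prefLabel: "transgenderism", lcshExactMatch: "gender nonconformity" });

a hypothetical term with no exact match would instead produce a keyword search on its own label.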
by providing connections for users that they would otherwise have difficulty finding without the help of a librarian or other information professional, projects such as this one hope to combat the cycle of biased metadata and biased research practices that has dominated academic research. literature review students in higher education who identify as members of systemically marginalized communities can continue to experience marginalization within higher educational institutions, and the academic library setting is no exception. brook, ellenwood, and lazzaro provide analysis of multiple studies showing the effect of mostly white staffing in academic libraries, the impact this can have on reference services provided to patrons from marginalized communities, and the overwhelming and intimidating spaces in sizable academic libraries that can be “compounded for students who already feel that they do not belong on campus on the basis of their race.” 6 when considering how this experience impacts using an online library catalog or digital repository system for conducting research, these same students can find themselves not well represented.7 additionally, crossing disciplines to capture intersectionalities of an identity can be complicated by narrow controlled vocabulary terms which compound problems that already make interdisciplinary research difficult.8 drabinski proposes that the library catalog should be treated as a biased text that requires critical thinking to understand.9 subject headings from authorities such as the library of congress will never be unbiased as attitudes, perspectives, and identities change over time. it is therefore important to leverage information literacy competency standards put forward by the association of college & research libraries and teach students how to critically engage the library catalog as another information source. library instruction is one way to ease the challenges faced by marginalized researchers in higher education, helping researchers effectively use a system like a library catalog that incorporates biased subject headings. however, with interdisciplinary research, materials are often dispersed across information systems and physical locations, and there is still the challenge to identify and locate everything relevant to the research topic.10 using available fields within the library catalog record itself (the 590 in marc, for example) can identify cross-disciplinary resources. examples are provided by hogan for black lgbtq resources and latina lesbian literature.11 what all of these efforts seem to point to is what hannah buckland proposes: changing the framing of catalog records from “aboutness” to information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 3 “fromness,” providing “culturally-responsive metadata” that j. l. colbert recognizes can create an “equitable subject access” experience that “center[s] the information needs and information seeking behaviors of those whom our systems disenfranchise.”12 these changes can often only be implemented locally due to language variation and localized community relevance; but colbert then considers how linked open data might prove useful to combine or relate different subject or community vocabularies. “when we decenter the idea that for every concept there is one controlled term to describe it, we allow the play of seemingly opposite ways of thinking. . . . 
a linked open data catalog allows libraries to complement, replace, or even reject the standards that have been decided for us and our patrons.”13 librarians and archivists have suggested and tried other methods to mitigate the impact of systemic marginalization. these efforts go beyond the use of controlled vocabularies in the creation of catalog records. one of the earliest and most significant examples of this is dorothy porter’s work in organizing the collections she managed at howard university. up to that point in the 1930s and 1940s, dewey decimal classification (ddc) was used to organize works on the shelf. many libraries of the time were predominantly white institutions and dorothy porter remembered them using ddc to shelve anything by a black author or about the black experience under the ddc heading for colonization (325) or slavery (326).14 porter instead organized her collections based on subject matter, genre, and author, categorizing the work based on what it was about rather than the race of the author or the race of any people mentioned in the work. this subtle yet fundamental shift shows the real impact that libraries have on access to collections for their audiences. hope a. olson and dennis ward created a proof-of-concept microsoft access database interface connecting mary ellen capek’s a women’s thesaurus to the dewey decimal classification scheme to offer an end user interface for searching a ddc system using the thesaurus terminology. the idea, initially from joan mitchell (then editor of ddc), was to develop “a means of making ddc accessible from the point of view of a marginalized knowledge domain—in particular, creating a means of browsing ddc from a feminist/women’s studies perspective.”15 variables were defined from characteristics of different classifications to enable a systematic match to thesaurus terms. dorothy berry’s work at university of minnesota libraries to gather and digitize african american-related materials from across archival collections for aggregating in umbra search african american history shows an option for pulling a collection from other collections and highlighting what would otherwise remain marginalized items from marginalized communities.16 discovering these materials required searching with a variety of terms used over time to refer to african americans. adding collection level context at the folder level for these materials allows aggregation without losing original place and context, while at the same time centering the marginalized communities represented in these materials by gathering them from these various and marginalized original locations. archives for black lives in philadelphia is “a loose association of archivists, librarians, and allied professionals in the philadelphia and delaware valley area responding to the issues raised by the black lives matter movement.” within this group, the anti-racist description working group has compiled an annotated bibliography and metadata recommendations to address racist and antiblack archival description.17 the recommendations focus on the black community but can be applied more broadly when describing records by and about any marginalized community. the information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 4 recommendations include decentering “neutrality” and “objectivity” for “respect” and “care,” particularly when deciding on controlled vocabulary terms to use in archival description. 
specific recommendations to use “terminology that black people use to describe themselves,” to recognize that this “terminology changes over time, so description will be an iterative process,” and to consult “alternative cataloging schemes created by the subjects of the records being described when and if they are available” provide an approach that looks for descriptive terms from within the community and moves away from terms applied to a community by others.18 paying attention to the controlled vocabularies applied to archival description helps to change the narrative and the power structure of the historical record, centering those who have been marginalized and oppressed and increasing discoverability and access to their stories and perspectives. allowing for changes in controlled vocabulary terms keeps systems flexible enough to accommodate changes in a community’s terminology over time. linked data relationships can connect term changes for more comprehensive searching while also identifying the current controlled vocabulary term to use. the lavender library, archives, and cultural exchange (llace) community archives in sacramento, california is an archive for a marginalized community.19 in developing archival and circulating library collections that serve the queer community, the library collections use a thesaurus of queer terms from dee michel for classification and the archival collections use subject headings from michel’s thesaurus along with lcsh.20 the focus, again, begins with the community being served and recognizes that widely used controlled vocabularies like lcsh do not serve these collections or communities well. starting with a community-specific vocabulary and then connecting lcsh terms centers the collections and community first and then makes connections to the larger library and archives community possible. other efforts have used alternatives or supplements to common vocabularies and schemes. the xwi7xwa library’s use of the brian deer classification system at the university of british columbia incorporates names and terminology from the first nations community to better represent that community beyond what something like library of congress classification provides. using accurate names of nations and peoples, according to the head librarian, ann doyle, helps create identity among users of the collection and “shapes the research and types of questions that people ask.”21 the national indian law library began cataloging using local terminology only. as it moved records online and sought to be more discoverable and cooperative with other libraries, this local terminology was synchronized with lcsh and specialized terms for federal indian law and tribal law were kept as a supplement.22 doing this work is not only about changing terms on catalog records but also learning and making connections with communities who have been marginalized by these systems. farnel et al. 
explain the process of decolonizing both the library catalog and digital collections description at university of alberta libraries through investigation, analysis, partnering with other institutions doing this work and, most importantly, reaching out to indigenous communities represented in these records to engage and learn about the most appropriate terminology to use.23 different methods and attempts to center the marginalized in cataloging and collection description show it is possible and essential to voice the concerns of those least represented in order to have the most impact on all researchers using these resources.

widely used controlled vocabularies like lcsh continue to be a major way to aggregate collections and provide common access points. groups like the association for library collections and technical services' cataloging and metadata management section subject analysis committee continue to work to change terms in these vocabularies to provide better and more accurate representation for systemically marginalized communities, but the process is slow and will likely never be enough.24 incorporating vocabularies from systemically marginalized communities for use either on the cataloging/description side or for researchers to use for search and discovery offers possibilities for more inclusive experiences that center marginalized voices and expand the options for research questions to ask and answer.

methodology

to test this idea that connections provided between a systemically marginalized community's controlled vocabulary and a more generalized vocabulary like lcsh could be helpful, a proof-of-concept information retrieval aid was conceived. the idea was to create a lightweight javascript application that could use a select set of terms from the homosaurus (http://homosaurus.org), an lgbtq+ vocabulary originally created by ihlia lgbt heritage (https://www.ihlia.nl/?lang=en) and now also used in its linked data form by the digital transgender archive (https://www.digitaltransgenderarchive.net), to connect to lcsh terms and provide search links against a library catalog (iucat, https://iucat.iu.edu, indiana university's online library catalog) that uses lcsh for subject headings.

homosaurus version 1 was used initially and did not identify connections to lcsh terms. analysis of homosaurus terms against lcsh terms suggested some connections could be made and for initial construction of the proof-of-concept application these were used, but with the recognition that these connections were not coming from the community vocabulary. this was a problem since the point in mitigating bias is to use the community's definitions and any outside interpretations are necessarily not going to reflect the community's intentions. as the application concept continued to form and the initial term comparison work continued, homosaurus version 2 was released containing explicit connections to lcsh terms, using skos:exactmatch for mapping those connections. those connections in version 2 are not expressed as linked data but are provided in the vocabulary's site for each term.
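to make the data being worked with here concrete, the sketch below shows the kind of simplified term record the proof-of-concept can assemble once a homosaurus term and its community-supplied lcsh connection are combined. the property names are assumptions chosen for readability, not the actual json-ld keys in the homosaurus download; the values come from the "transgenderism" example discussed later in this article.

// illustrative term record only; property names are simplified assumptions, not the
// homosaurus json-ld keys. values follow the example discussed later in the article.
const exampleTerm = {
  id: "http://homosaurus.org/v2/transgenderism",
  prefLabel: "transgenderism",
  // the description supplied by the vocabulary, needed so the term is not mistaken
  // for a currently preferred community term
  description:
    "pathologizing term often used in the medicalization of transgender people; use only in historical context",
  // the skos:exactmatch connection defined by the homosaurus community, recorded as
  // the lcsh label so the application can run a subject search against the catalog
  lcshExactMatch: "gender nonconformity",
  // broader, narrower, related, and use-for terms drive the bubble navigation
  broader: [],
  narrower: [],
  related: [],
  useFor: [],
};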
the proof-of-concept work switched to using select terms from homosaurus version 2 in order to make use of the lcsh connections now being provided by the community.25 the proof-of-concept application used the select set of homosaurus version 2 terms downloaded as json-ld and added in the lcsh terms using the supplied skos:exactmatch relationship. the user interface provided visual connections from the selected homosaurus term to its narrower, broader, and related terms within homosaurus. any exact matches to lcsh terms and any use for terms homosaurus indicated should be replaced by this term were provided together. the visual layout for the application is directly influenced by the ihlia lgbt heritage collections browse interface.26 in ihlia’s system, after searching for a term (“love,” for example), the interface provides broader, narrower, related, and used for terms as suggestions for other ways to discover items in these collections in a visually connected bubble layout surrounding the search term. those connections are linked and can be used to navigate ihlia’s controlled vocabulary, which also happens to be powered by a local non-linked data form of the homosaurus vocabulary. in the proof-of-concept application, for terms where there is an lcsh exact match, the lcsh term was used for the connection to search iucat and was only revealed on screen if the exact match (lcsh) bubble was clicked by the user (see fig. 1). http://homosaurus.org/ https://www.ihlia.nl/?lang=en https://www.digitaltransgenderarchive.net/ https://iucat.iu.edu/ information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 6 figure 1. information retrieval aid showing the homosaurus term “transgenderism” linked to search iucat. exact match (lcsh) shows the lcsh term “gender nonconformity” (also linked to search in iucat) along with narrower, broader, and related homosaurus terms. the initial proof-of-concept information retrieval aid javascript application was shared with and tested by olivia adams, a graduate student at indiana university working as the library coordinator for the lgbtq+ culture center library at indiana university (https://lgbtq.indiana.edu/programs-services/library/index.html). this library has adapted the llace classification system, the shelving organizational scheme developed by the lavender library in sacramento, california (http://lavenderlibrary.com), for organizing its own physical collection of resources. the lgbtq+ culture center library also has its own online library catalog that makes use of an established local list of tags for items included in that system (https://www.librarycat.org/lib/iuglbtlibrary/). the information retrieval aid application was first presented to the lgbtq+ culture center library coordinator for general impressions and feedback. additionally, specific tasks were proposed. please note that the proposed tasks use a vocabulary term as an example that is offensive and outdated. the results of this testing, along with feedback from the homosaurus editorial board, clarified the need to change the information retrieval aid to supply this additional contextual information (available in homosaurus as a description for the term). 
the tasks presented for trying the information retrieval aid were the following:

• you want to find resources at iu about transgenderism. what do you think of the resources that iucat is offering through this information retrieval aid?
• how do the homosaurus terms you are seeing here compare to the llace classification terms or the tags/subjects you use in the lgbtq+ library catalog?
• what is the importance of transparency for the lcsh terms in relation to community values (for terms that are different and only shown in the hidden section right now)?

"transgenderism" is a term homosaurus connects to lcsh's term "gender nonconformity" with an exact match relationship (http://homosaurus.org/v2/transgenderism). to provide results for answering the first question, the proof-of-concept information retrieval aid interface showed the homosaurus term with a linked search in iucat that provided results using the lcsh term as a subject search.27 the second question was asked to get a sense of the relevance of the homosaurus terms to the collections organized and housed in the lgbtq+ culture center library. the third question about the importance of transparency for the lcsh terms in relation to community values was meant to investigate how a system like this proof-of-concept information retrieval aid might be used by the community of researchers and patrons using the culture center's library, and if the mechanism to mask the lcsh term in favor of the homosaurus term is useful or not.

the code for this javascript web application in its current state is available on github at https://github.com/jlhardes/metadatabias. the initial proof-of-concept application was developed by justina kaiser, at the time an information and library science graduate student at indiana university. the current code is a fork of her project, also available on github (https://github.com/juskaise/metadatabias).

discussion

sharing this proof-of-concept information retrieval aid using homosaurus terms with the lgbtq+ culture center librarian revealed the importance of usability testing and being receptive to a community's needs. an introduction and explanation of the controlled vocabulary and the community it represents was a recommended addition since the term list presented was not initially easily identified. additionally, the interface terminology of narrower/related/broader/exact match/use for is familiar in the library world but not necessarily for the casual user. this terminology is still in use by the information retrieval aid but is under review for updated labels that are easier to understand. this initial version kept any use for terms hidden unless the user clicked on that bubble in the interface to see them. the reasoning was to give more emphasis to the homosaurus term and to keep any potentially derogatory or harmful terms still in use by lcsh out of the way of researchers (even though the searches conducted against the catalog might need to use those terms if no other linked data connection is available). feedback here was helpful: hiding terms that homosaurus does not recommend might hinder discovering results if the researcher wants to search on a term that is no longer used by the community or is considered derogatory or harmful.
this is a useful lesson in that covering up the past is not helpful to those in a marginalized community who have experience with that marginalization or those trying to learn about the past experiences of a marginalized community. also, being able to find all relevant resources can mean a variety of terms (both current in the community and no longer current) might be necessary. the homosaurus editorial board also explained that use for terms are sometimes slang terms and are not always considered derogatory. this information is helpful in figuring out how to present lcsh terms in the interface in the context of the homosaurus terms. additionally, moving use for terms next to related terms connected these sets of terms better than placing use for terms with exact match terms.

further feedback from the homosaurus editorial board regarding the example term used for testing showed the terms and their connections to other terms do not supply enough information to express the full meaning of the term within the community. without supplying the homosaurus description for the term "transgenderism" ("pathologizing term often used in the medicalization of transgender people; use only in historical context," see http://homosaurus.org/v2/transgenderism), the term can come across in the information retrieval aid as a preferred term from the community when, in fact, it is not. this was a critical update needed for the information retrieval aid to be effective as a research tool.

in using the proof-of-concept interface to search against iucat, it was noted by the lgbtq+ culture center librarian that using the lcsh term to conduct a subject search against the catalog might not produce useful results if the homosaurus term is not an actual exact match to the lcsh term. in this case the homosaurus term should be searched in the catalog as a keyword instead of a subject, so the search is conducted on all indexed fields in the catalog record. in the example tried for the term "transgenderism" the skos:exactmatch relationship is defined as the lcsh term "gender nonconformity" (see fig. 1). even though the relationship is identified in homosaurus as an exact match, searching for "gender nonconformity" as a subject term in the catalog (267 results) and "transgenderism" as a keyword in the catalog (289 results) arrives at different result sets with different types of entries (see figs. 2 and 3). use for terms, while not always representative of the community providing the vocabulary, do have possible historical relevance if present in supplied information (such as a title) and can be connected to the catalog via keyword searching as well.

there is an importance to revealing these differences within the library catalog and providing results that reflect the terms used by the community. the library's applied terminology via subjects organizes a different set of resources compared to searching for terminology available via titles or other information supplied by authors and creators. when considering who is part of a community and who is not in this scenario, there are benefits to trying to work around or in addition to the library's applied organizational scheme. subject searching in the catalog provides another view (and set of results) for those familiar with the community's terminology. those approaching a research topic from outside of a community are able to learn more about how to find resources most effectively, moving from the catalog's terminology to the community's terminology.

after trying the proof-of-concept information retrieval aid, the lgbtq+ culture center librarian provided feedback that this could be useful for people new to studying the lgbtq+ community and unfamiliar with the community's terminology. with an introduction and explanation of the controlled vocabulary in place and an easy-to-follow interface to guide users through the vocabulary terms, effective searches against the catalog that also reveal terminology used by the community and differences between that terminology and the catalog's terminology can be both educational and useful for research.

figure 2. searching indiana university's online library catalog (iucat) for the lcsh term "gender nonconformity" as subject shows 267 results.

figure 3. searching indiana university's online library catalog for the homosaurus term "transgenderism" as keyword shows 289 results.

one of the largest obstacles to connecting marginalized communities to reliable, representative controlled vocabularies is the lack of controlled vocabularies that are readily available as linked data. unless an individual or organization has made the effort to establish connections between a community's vocabulary and lcsh, the representative vocabularies stand alone and remain difficult to discover or use. the proof-of-concept testing of this project illustrates not only the need for connections to community-created controlled vocabularies, but also that having access to those vocabularies can result in more accurate and effective searches and usage of catalog resources. although vocabularies like lcsh contain outdated terms, having access to a variety of terms that are acceptable at different points in a community's history can be useful for researchers who may not be as informed about certain systemically marginalized communities and whether certain terms have been completely eliminated, reclaimed, or replaced by more accurate terminology.

efforts to mitigate bias in metadata via linked data are representative of a larger effort to correct a long-standing issue in libraries and other fields where the voices and perspectives of marginalized individuals have been overshadowed by the voices and needs of the majority. in addition to working to update large, generalized vocabularies and trying to incorporate these voices and perspectives, this change in method is meant to add those voices and center their importance. by linking community-created vocabularies and placing them front and center in the search process, metadata can become a tool with which to center the voices of marginalized communities and move toward a more equitable method of searching, finding, and using resources.

conclusion

the information retrieval aid is still progressing beyond a proof-of-concept but it has seen significant updates since its initial implementation. figure 1 shows the initial proof-of-concept that was tested.
introductory information has been added to explain the homosaurus vocabulary and the information retrieval aid tool itself. more terms are available (although still not the full set of homosaurus version 2 terms) and the term list in json-ld is being used to automatically populate the term list in the interface. if available, the term description is provided for more complete context. additionally, no terms are hidden in the bubble navigation and use for is located with related terms now. future work for this project includes incorporating the full list of homosaurus terms; reconsidering the category names (narrower/related/broader/exact match/use for) to determine if there are better labels to use for these categories that will be easier to understand for a general research audience; and testing the tool with researchers new to lgbtq+ terminology as well as those more knowledgeable about the lgbtq+ community and its terminology and history. additional areas of work that welcome investigation include automating the term list generated for use with the information retrieval aid (via api calls, for example) to help reflect any changes or updates made to the community vocabulary over time; the technical implications of connecting this information retrieval aid to a search engine beyond indiana university’s online library catalog; and using this tool with controlled vocabularies from other systemically marginalized communities, such as the bc first nations subject headings, the glossary of disability terms from the north carolina council on developmental disabilities, or atria: women’s thesaurus from the institute on gender equality and women’s history.28 what difference does it make to use a different search engine that incorporates lcsh terms? likewise, is it possible to connect other linked data (or non-linked data) controlled vocabularies from systemically marginalized communities and is that effective for retrieving information and improving research outcomes? the work so far shows the possibility of centering systemically marginalized voices by using the system more effectively, making linked data work to connect and update the terminology and search terms available for research. acknowledgements the authors would like to thank the lgbtq+ culture center librarian at indiana university for spring 2020, olivia adams, for her helpful review and feedback of the initial proof -of-concept information retrieval aid. we would also like to thank brian m. watson, editorial board member of homosaurus.org, for their help with using homosaurus version 2 terms and the homosaurus editorial board, particularly k. j. rawson, for reviewing and supplying article feedback. the authors also acknowledge the work of justina kaiser who created the initial code behind the information retrieval aid. information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 12 endnotes 1 hope a. olson, “mapping beyond dewey’s boundaries: constructing classificatory space for marginalized knowledge domains,” library trends 47, no. 2 (fall 1998): 238. 2 the term “systemically marginalized group,” used recently by dr. 
nicki washington from duke university at the september 3, 2020, indiana university center for women in technology talk, “‘bring a folding chair’: understanding and addressing issues of race in the context of stem,” was revealing to the authors as a better term to use than “historically marginalized communities.” this is significant in that it emphasizes the continued oppression and marginalization of these communities, rather than viewing these communities’ struggles as something of the past that has been overcome/surmounted. 3 “mission, history, editorial board,” homosaurus vocabulary site, accessed march 2, 2021, http://homosaurus.org/about. 4 “skos simple knowledge organization system reference,” w3, published august 18, 2009, https://www.w3.org/tr/skos-reference/. 5 “skos:exactmatch,” skos simple knowledge organization system namespace document—html variant, 18 august 2009 recommendation edition, w3, last modified august 6, 2011, https://www.w3.org/2009/08/skos-reference/skos.html#exactmatch. 6 freeda brook, david ellenwood, and althea eannace lazzaro, “in pursuit of antiracist social justice: denaturalizing whiteness in the academic library,” library trends 64, no. 2 (fall 2015): 259, https://muse.jhu.edu/article/610078. 7 holly tomren, “classification, bias, and american indian materials” (san jose state university, 2003), http://ailasacc.pbworks.com/f/biasclassification2004.pdf. 8 amelia koford, “how disability studies scholars interact with subject headings,” cataloging & classification quarterly 52, no. 4 (2014), https://doi.org/10/gf542p. 9 emily drabinski, “queering the catalog: queer theory and the politics of correction,” library quarterly: information, community, policy 83, no. 2 (april 2013), https://www.jstor.org/stable/10.1086/669547. 10 sara a. howard and steven a. knowlton, “browsing through bias: the library of congress classification and subject headings for african american studies and lgbtqia studies,” library trends 67, no. 1 (summer 2018), https://doi.org/10.1353/lib.2018.0026. 11 kristen hogan, “‘breaking secrets’ in the catalog: proposing the black queer studies collection at the university of texas at austin,” progressive librarian 34 (2010), http://www.progressivelibrariansguild.org/pl/pl34_35/050.pdf. 12 j. l. colbert [ https://orcid.org/0000-0001-5733-5168], “patron-driven subject access: how librarians can mitigate that ‘power to name’,” in the library with the lead pipe, november 15, 2017, http://www.inthelibrarywiththeleadpipe.org/2017/patron-driven-subject-access-howlibrarians-can-mitigate-that-power-to-name/. http://homosaurus.org/about https://www.w3.org/tr/skos-reference/ https://www.w3.org/2009/08/skos-reference/skos.html#exactmatch https://muse.jhu.edu/article/610078 http://ailasacc.pbworks.com/f/biasclassification2004.pdf https://doi.org/10/gf542p https://www.jstor.org/stable/10.1086/669547 https://doi.org/10.1353/lib.2018.0026 http://www.progressivelibrariansguild.org/pl/pl34_35/050.pdf http://www.inthelibrarywiththeleadpipe.org/2017/patron-driven-subject-access-how-librarians-can-mitigate-that-power-to-name/ http://www.inthelibrarywiththeleadpipe.org/2017/patron-driven-subject-access-how-librarians-can-mitigate-that-power-to-name/ information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 13 13 j. l. colbert, “patron-driven subject access.” 14 avril johnson madison and dorothy porter wesley, “dorothy burnett porter wesley: enterprising steward of black culture,” public historian 17, no. 
1 (winter 1995): 25, https://www.jstor.org/stable/3378349; janet sims-wood, dorothy porter wesley at howard university: building a legacy of black history (charleston, sc: the history press, 2014), 39; zita cristina nunes, “cataloging black knowledge: how dorothy porter assembled and organized a premier africana research collection,” perspectives on history: the news magazine of the american historical association (november 20, 2018), https://www.historians.org/publications-and-directories/perspectives-on-history/december2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premierafricana-research-collection. 15 hope a. olson and dennis b. ward, “feminist locales in dewey’s landscape: mapping a marginalized knowledge domain,” in knowledge organization for information retrieval: proceedings of the sixth international study conference on classification research (the hague, netherlands: international federation for information documentation, 1997), 129. 16 dorothy berry, “digitizing and enhancing description across collections to make african american materials more discoverable on umbra search african american history,” the design for diversity learning toolkit, northeastern university libraries, august 2, 2018, https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-acrosscollections-to-make-african-american-materials-more-discoverable-on-umbra-search-africanamerican-history/. 17 alexis a. antracoli et al., anti-racist description resources (philadelphia, pa: archives for black lives in philadelphia, 2019), i, https://archivesforblacklives.files.wordpress.com/2019/10/ardr_final.pdf. 18 antracoli et al., “anti-racist description resources,” 5. 19 diana k. wakimoto, debra l. hansen, and christine bruce, “the case of llace: challenges, triumphs, and lessons of a community archives,” american archivist 76, no. 2 (fall/winter 2013), http://www.jstor.org/stable/43490362. 20 according to the article, “the word queer is used throughout this article as the most general, over-arching term to describe communities and individuals who support llace and make it possible.” diana k. wakimoto et al., “case of llace,” 439; dee michel, ed., gay studies thesaurus, rev. ed. (urbana, il, 1990). 21 catelynne sahadath, “classifying the margins: using alternative classification schemes to empower diverse and marginalized users,” feliciter 59, no. 3 (june 2013): 16. 22 monica martens, “creating a supplemental thesaurus to lcsh for a specialized collection: the experience of the national indian law library,” law library journal 98, no. 2 (spring 2006). 
https://www.jstor.org/stable/3378349 https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://archivesforblacklives.files.wordpress.com/2019/10/ardr_final.pdf http://www.jstor.org/stable/43490362 information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 14 23 sharon farnel et al., “rethinking representation: indigenous peoples and contexts at the university of alberta libraries,” international journal of information, diversity, & inclusion 2, no. 3 (2018), https://doi.org/10.33137/ijidi.v2i3.32190. 24 alcts is a division of the american library association— http://www.ala.org/alcts/mgrps/camms/cmtes/ats-ccssac; sac working group, “report of the sac working group on alternatives to lcsh ‘illegal aliens,’” american library association institutional repository, submitted june 19, 2020, https://alair.ala.org/bitstream/handle/11213/14582/sac20-ac_report_sac-working-groupon-alternatives-to-lcsh-illegal-aliens.pdf. 25 this is a moment to acknowledge the work of several homosaurus editorial board members, including brian m. watson, who is studying and working with linked data at university of british columbia; chloe noland from american jewish university; and walter “cat” walker from the william h. hannon library and one national gay and lesbian archives. there was never a request to add these lcsh term connections, but the timing was incredibly helpful, and the effort greatly appreciated. 26 example search for term “love” that results in browsable terms in a visual interface: https://www.ihlia.nl/search/index.jsp?q%3asearch=love&q%3azoekterm.row1.field3=&lang =en. 
27 "gender nonconformity," (search results, iucat, indiana university, accessed march 2, 2021), https://iucat.iu.edu/?utf8=%26%2310004%3b&search_field=subject&q=gender+nonconformity.

28 bc first nations subject headings (vancouver, bc: xwi7xwa library first nations house of learning, march 2, 2009), http://branchxwi7xwa.sites.olt.ubc.ca/files/2011/09/bcfn.pdf; "glossary of disability terms," north carolina council on developmental disabilities, accessed march 8, 2021, https://nccdd.org/welcome/glossary-and-terms/category/glossary-of-disability-terms; "search in the women's thesaurus," atria—institute on gender equality and women's history, accessed march 8, 2021, https://institute-genderequality.org/library-archive/collection/thesaurus.

current trends and goals in the development of makerspaces at new england college and research libraries
ann marie l. davis
information technology and libraries | june 2018
ann marie l. davis (davis.5257@osu.edu) is faculty librarian of japanese studies at the ohio state university.

abstract

this study investigates why and which types of college and research libraries (crls) are currently developing makerspaces (or an equivalent space) for their communities. based on an online survey and phone interviews with a sample population of crls in new england, i found that 26 crls had or were in the process of developing a makerspace in this region. in addition, several other crls were actively promoting and diffusing the maker ethos. of these libraries, most were motivated to promote open access to new technologies, literacies, and stem-related knowledge.

introduction and overview

makerspaces, alternatively known as hackerspaces, tech shops, and fab labs, are trendy new sites where people of all ages and backgrounds gather to experiment and learn.
born of a global community movement, makerspaces bring the do-it-yourself (diy) approach to communities of tinkerers using technologies including 3d printers, robotics, metaland woodworking, and arts and crafts.1 building on this philosophy of shared discovery, public libraries have been creating free programs and open makerspaces since 2011.2 given their potential for community engagement, college and research libraries (crls) have also been joining the movement in growing numbers.3 in recent years, makerspaces in crls have generated positive press in popular and academic journals. despite the optimism, scholarly research that measures their impact is sparse. for example, current library and information science literature overlooks why and how various crls choose to create and maintain their respective makerspace. likewise, there is scant data on the institutional objectives, frameworks, and experiences that characterize current crl makerspace initiatives.4 this study begins to fill this gap by investigating why and which types of crls are creating makerspaces (or an equivalent room or space) for their library communities. specifically, it focuses on libraries at four-year colleges and research universities in new england. throughout this study, makerspace is used interchangeably with other terms, including maker labs and innovation spaces, to reflect the variation in names and objectives that underlie the current trends. in exploring their motives and experiences, this article provides a snapshot of the current makerspace movement in crls. mailto:davis.5257@osu.edu current trends and goals in the development of makerspaces | davis 95 https://doi.org/10.6017/ital.v37i2.9825 the study finds that the number of crls actively involved in the makerspace movement is growing. in addition to more than two dozen that have or are in the process of developing a makerspace, another dozen crls have staff who support the diffusion of maker technologies, such as 3d printing and crafting tools that support active learning and discovery, in the campus library and beyond.5 comprising research and liberal arts schools, public and private, and small and large, the crls involved with makerspaces are strikingly diverse. despite these differences, this population is united by common objectives to promote new literacies, provide open access to new technologies, and foster a cooperative ethos of making. literature review the body of literature on library makerspaces is brief, descriptive, and often didactic. given the newness of the maker movement in public and academic libraries, many articles focus on early success stories and defining the movement vis-à-vis the mission of the library. for instance, laura britton, known for having created the first makerspace in a public library (the fayetteville free library’s fabulous laboratory), defines a makerspace as “a place where people come together to create and collaborate, to share resources, knowledge, and stuff.”6 this definition, she determines, is strikingly similar to that of the library. most literature on makerspaces appears in academic blogs, professional websites, and popular magazines. 
among the most frequently cited is tj mccue’s article, which celebrates britton’s (née smedley) fablab while distilling the intellectual underpinnings of the makerspace ethos.7 phillip torrone, editor of make: magazine, supports smedley’s project as an example of “rebuilding” or “retooling” our public spaces.8 within this camp, david lankes, professor of information studies at syracuse university, applauds such work as activist and community-oriented librarianship.9 many authors emphasize the philosophical “fit,” or intersection, of public makerspaces with the principles of librarianship. building on torrone’s work, j. l. balas claims that creating access to resources for learning and making is in keeping with the “library’s historical role of providing access to the ‘tools of knowledge.’”10 others emphasize the hands-on, participatory, and intergenerational features of the maker movement, which has the potential to bridge the digital divide.11 still others identify areas of literacy, innovation, and ste(a)m skills where library makerspaces can have a broad impact. while public libraries often focus on early childhood or adult education, crls adopt separate frameworks for information literacy. like public libraries, they aim to build (meta)literacies and ste(a)m skills. nevertheless, their programs often tailor to curricular goals in the arts and sciences or specialized degrees in engineering, education, and business. this is especially true of crls situated within large, research-intensive universities. considering their specific missions and aims, this study seeks to identify the goals and challenges that reinforce the development of makerspaces in undergraduate and research environments. research design and method data presented in this study was gathered from library directors (or their designees) through an online survey and oral telephone interviews. after choosing a sampling frame of crls in new england, i developed a three-path survey, sent invitations, and collected and analyzed data using the online platform surveymonkey. the survey was distributed following review by the information technology and libraries | june 2018 96 institutional review board (irb) at southern connecticut state university, where i completed a master of library science (mls) degree. survey population to assess generalized findings for the larger population in north america, i chose a clustersampling approach that limited the survey population to the crls in new england. in generating the sampling frame, i included four-year and advanced-degree institutions based on the assumption that libraries at these schools supported specialized, research, or field-specific degrees. i omitted for-profit and two-year institutions, based on the assumption that they are driven by separate business models. this process generated a contact list of 182 library directors at the designated crls in connecticut, maine, massachusetts, new hampshire, rhode island, and vermont. survey design the purpose of the survey was to gather basic data about the size and structure of the respondents’ institutions and to gain insights on their views and practices regarding makerspaces (the survey is reproduced in the appendix). the first page of the survey contained a statement of consent, including my contact information and that of my irb. after a short set of preliminary questions, the survey branched into one of three paths based on respondents’ answers about makerspaces. 
the respondents were thus categorized into one of three groups: path one (p1) for those with no makerspace and no plans to create one, path two (p2) for those with plans to develop a makerspace in the near future, and path three (p3) for those already running a makerspace in their libraries. p3 was the longest section of the survey, containing several questions about p3 experiences with makerspaces such as staffing, programing, and objectives. data collection in summer 2015, brief email invitations and two reminders were sent to the targeted population.12 to increase the participation rate, i sometimes wrote personal emails and made direct phone calls to crls known to have makerspace. for cold-call interviews, i developed a script explaining the nature of the online survey. after obtaining informed consent, i proceeded to ask the questions in the online survey and manually enter the participants’ responses at the time of the interview. on a few occasions, online respondents followed up with personal emails volunteering to discuss their library’s experiences in more detail. i took advantage of these invitations, which often provided unique and welcome insights. in analyzing the responses, i used tabulated frequencies for quantitative results and sorted qualitative data into two different categories. the first category was identified as “short and objective” and coded and analyzed numerically. the longer, more “subjective and value-driven” data was analyzed for common trends, relationships, and patterns. within this second category, i also identified outlier responses that suggested possible exceptions to common experiences. results the survey closed after one month of data collection. at this time, 55 of 182 potential respondents had participated, yielding a response rate of 30.2%. among these participants, the survey achieved a 100.0% response rate (9 completed surveys of 9 targeted crls) among libraries that were current trends and goals in the development of makerspaces | davis 97 https://doi.org/10.6017/ital.v37i2.9825 currently operating makerspaces. i created a list of all known crl makerspaces in new england based on an exhaustive website search of all crls in this region. subsequent interviews with the managers of the makerspaces on this list revealed no other hidden or unknown makerspaces in this region. of the 55 respondents, 29 (52.7%) were in p1, 17 (30.9%) were in p2, and 9 (16.4%) were in p3. (see figure 1.) figure 1. survey participants’ (n = 55) current crl efforts and plans to develop and operate a makerspace. among respondents in p2 and p3, the majority (13 of 23) indicated that they were from libraries that served a student population of 4,999 people or fewer, while only one library served a population of 30,000 or more (see figure 2). in terms of sheer numbers, makerspaces might seem to be gaining traction at smaller crls, but proportionally, one cannot say that smaller crls are adopting makerspaces at a higher rate because the majority of survey participants have student populations of 19,999 or less (51, or 91.1%). the number of institutions with populations over 20,000 were in a clear minority (5, or 8.9%). (see figure 3.) information technology and libraries | june 2018 98 figure 2. p2 and p3 crls with makerspaces or concrete plans to develop a makerspace. figure 3. the majority of crls (67.2%) that participated in the survey had a population of 4,999 students or less. only 1.8% of schools that participated had a population of 30,000 students or more. 
current trends and goals in the development of makerspaces | davis 99 https://doi.org/10.6017/ital.v37i2.9825 crls with no makerspace (p1 = 29) in the first part of the survey, the majority of p1 respondents demonstrated positive views toward makerspaces despite having no plans to create one in the near future. budgetary and space limitations aside, many were relatively open to the possibility of developing a makerspace in a more distant future. in the words of one respondent, “we have several areas within the library that present a heavy demand on our budget. in [the] future, we would love to consider a makerspace, and whether it would be a sensible and appropriate investment that would benefit our students.” when asked what their reasons were for not having a makerspace, some respondents (8, or 27.6%) said they had not given it much thought, but most (21, or 72.4%) offered specific answers. among these, the most frequently cited reason (11, or 37.8%) was that a library makerspace would be redundant: such spaces and labs were already offered in other departments within the institution or in the broader community. at one crl, for example, the respondent said the library did not want to compete with faculty initiatives elsewhere on campus. other reasons included that makerspaces were expensive and not a priority. some (5, or 17.2%) libraries preferred to allocate their funds to different types of spaces such as “a very good book arts studio/workshop” or “simulation labs.” some (6, or 20.6%) shared concerns about a lack of space, staff, or simply “a good culture of collaboration [on campus].” merging these sentiments, one respondent concluded, “people still need the library to be fairly quiet. . . . having makerspace equipment in our library would be too distracting.” while some were skeptical (sharing concerns about potential hazards or that makerspaces were simply “the flavor of the month”), the majority (roughly 60%) were open and enthusiastic. one respondent, in fact, held a leadership position in a community makerspace beyond campus. according to this librarian, 3d printers, scanners, and laser cutters were sure to become more common, and crls would no doubt eventually develop “a formal space for making stuff.” crls with plans for a makerspace in the near future (p2 = 17) the second section of the survey (p2) focused primarily on the motivations and means by which this cohort planned to develop a makerspace. when asked why they were creating a makerspace, the most common response was to promote learning and literacy (15 respondents, or 88.2%). in addition, a large majority (12 respondents, or 70.6%) felt that makerspaces helped to promote the library as relevant, particularly in the digital age. three more reasons that earned top scores (10 respondents each, or 58.2%) were being inspired by the ethos of making, creating a complement to digital repositories and scholarship initiatives, and providing access to expensive machines or tools. additional reasons included building outreach and responding to community requests.13 (see figure 4.) information technology and libraries | june 2018 100 figure 4. rationale behind p2 respondents’ decision to plan a makerspace (n = 17). while p2 respondents indicated a clear decision to create a makerspace, their timeframes were noticeably different. 
i categorized their open responses into one of six timeframes: “within six months,” “within one year,” “within two years,” “within four years,” “within six years,” and “unknown.” the result presented a clear trimodal distribution with three subgroups: six crls with plans to open within 18 months, five with plans to open within the next two years, and six with plans to open after three or more years (see figure 5). in addition to their timeframe, p2 respondents were also asked about their plans for financing their future makerspaces. based on their open responses, the following six funding sources emerged: • the library budget, including surplus moneys or capital project funds • internal funding, including from campus constituents • donations and gifts • external grants • cost recovery plans, including small charges to users • not sure/in progress current trends and goals in the development of makerspaces | davis 101 https://doi.org/10.6017/ital.v37i2.9825 figure 5. p2 respondents’ timeframe for developing the makerspace (n = 17). with seven mentions, the most common of the above funding was the “library budget.” with two mentions each, the least common sources were “cost recovery” and “not sure/in progress.” among those who mentioned external grant applications, one respondent mentioned a focus on women and stem opportunities, and another specifically discussed attempts at grants from the institute of museum and library services. (see figure 6.) figure 6. p2respondents’ plans for gathering and financing makerspace (n = 17). regarding target user groups, some respondents focused on opportunities to enhance specific disciplinary knowledge, while others emphasized a general need for creating a free and open environment. one respondent mentioned that at her state-funded library, the space would be “geared to younger [primary and secondary school] ages,” “student teachers,” and “librarians on practicum assignments.” by contrast, another respondent at a large, private, carnegie r1 information technology and libraries | june 2018 102 university emphasized that the space was earmarked for the undergraduate and graduate students. in contrast to the cohort in p1, a notable number in p2 chose to create a makerspace despite the existence of maker-oriented research labs elsewhere on campus. as one respondent noted, the university was still “lacking a physical space where people could transition between technologies” and an open environment “where students doing projects for faculty” could come, especially later in the evenings. another respondent at a similarly large, private institution explained that his colleagues recognized that most labs at their university were earmarked for specific professional schools. as a result, his colleagues came up with a strategy to provide self-service 3d printing stations at the media center, located in the library at the heart of campus. crls with operating makerspaces (p3 = 9) the final section of the survey (p3) focused on the motivations and means by which crls with makerspaces already in operation chose to develop and maintain their sites. in addition, this section gathered information on p3 crl funding decisions, service models, and types of users in their makerspaces. of the nine respondents in this path, all had makerspaces that had opened within the last three years. among these, roughly a third (4) had been in operation from one to two years; another third (3) had operated for two to three years; and two had opened within the last year. (see table 1.) 
table 1. length of time the crl makerspace has been in operation for p3 respondents (n = 9). age of crl makerspace or lab—p3 answer options responses % less than 6 months 1 11.1 6–12 months 1 11.1 1–2 years 4 44.4 2–3 years 3 33.3 more than 3 years 0 0.0 total responses 9 100.0 priorities and rationale the reasons behind p3 decisions to make a makerspace were slightly different from those of p2. while “promoting literacy and learning” was still a top priority, two other reasons, “promoting the maker culture of making” and “providing access to expensive machinery,” were deemed equally important (6 respondents, or 66.7%, for each). other significant priorities included “promoting community outreach” (4 respondents, or 44.4%), “promoting the library as relevant” and in “direct response to community requests” (3 respondents, or 33.3%, for each). (see figure 7.) current trends and goals in the development of makerspaces | davis 103 https://doi.org/10.6017/ital.v37i2.9825 figure 7. rationale behind p3 respondents’ decision to develop and maintain a makerspace (n = 9). the answer of “other” was also given top priority (5 respondents, or 55.6%). i conclude that this indicated a strong desire among respondents to express in their own words their library’s unique decisions and circumstances. (their free responses to this question are discussed below.) a familiar theme in the responses of the five respondents who elaborated on their choice of “other” was the desire to situate a makerspace in the central and open environment of the campus library. as one participant noted, there were “other access points and labs on campus,” but those labs were “more siloed” or cut off from the general population. by contrast, the campus library aimed to serve a broader population and anticipated a general “student need.” later, the same respondent added that the makerspace was an opportunity to promote social justice, cultivate student clubs, and encourage engagement at the hub of the campus community. this type of ecumenical thinking was manifested in a similar remark that the library’s role was to reinforce other learning environments on campus. one respondent saw the makerspace as an additional resource “that complemented the maker opportunities that we have had in our curriculum resource center for decades.” likewise, the library makerspace was intended to offer opportunities to a range of users on campus and beyond. funding, staffing, and service models when prompted to discuss how they gathered the resources for their makerspaces, the largest group (4 respondents) stated that a significant means for funding was through gifts and donations. thus, the majority of crl makerspaces in new england depended primarily on contributions from friends of the library, university/college alumni, and donors. the second most common source (3 respondents) was through the library budget, including surplus money at the end of the year. making use of grant money and cost recovery were mentioned by two library participants, and internal and constituent support was useful for two libraries. (see figure 8.) information technology and libraries | june 2018 104 figure 8. p3 methods for gathering and financing a makerspace (n = 9). among these, a particularly noteworthy case was a makerspace that had originated from a new student club focused on 3d printing. originally based in a student dorm, the club was funded by a campus student union, which allocated grant money to students through a budget derived from the college tuition. 
as the club quickly grew, it found significant support in the library, which subsequently provided space (on the top floor of the library), staff, and financial support from surplus funds in the library budget. as this example would suggest, the sum of the responses showed that financing the makerspaces depended on a combination of strategies. one participant summarized it best: “we’ve slowly accumulated resources over time, using different funding for different pieces. some grant funding. mostly annual budget.” regarding service models, more than half of these libraries (five) currently offer a combination of programming and open lab time where users could make appointments or just drop in. by contrast, two of the libraries offered programs only, and did not offer an open lab; another two did the opposite, offering no programming but an open makerspace at designated times. of the latter, one is open monday to friday from 8 a.m. to 4 p.m., and the other is open during regular hours, with spaces that “can be booked ahead for classes or projects.” most labs supported drop-in visitors and were open evenings and weekends. at one makerspace, where there was increasingly heavy demand, the staff required students to submit proposals with project goals. (see table 2.) while some libraries brought in community experts, others held faculty programs, and some scheduled lab time for individual classes. one makerspace prioritized not only the campus, but also the broader community, and thus featured programs for local high schools and seniors. responses from this library emphasized the social justice thread that inspired their work and the community culture that they aimed to foster. current trends and goals in the development of makerspaces | davis 105 https://doi.org/10.6017/ital.v37i2.9825 table 2. model for services offered in the crl makerspace or 3d printing lab do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use? answer options responses % yes, we offer the following types of programs. 2 22.2 no, we simply leave the makerspace/lab open at the specific times. 2 22.2 we do both. we offer the programs and leave the makerspace/lab open at specific times. 5 55.6 as this data would suggest, most makerspaces were used by students (undergraduates and graduates) and faculty, in addition to local experts and generational groups. survey responses showed that undergraduate students were the most common users (9 of 9 respondents checked this group as the most frequent type of user), and faculty and graduate students were the second and third most common (8 of 9 respondents checked these groups as most frequent) user groups in the labs. local entrepreneurs, artists, designers, craftspeople, and campus and library staff also use the makerspaces. (see figure 9.) when prompted to identify “other” categories, one respondent specifically listed “learners, makers, sharers, studiers, [and] clubs.” figure 9. of the different types of users listed above, p3 respondents ranked them in order of who used the makerspace or equivalent lab most often (n = 9). the number and type of staff that managed and operated the makerspaces also varied widely at the nine crls in p3. seven of the crls employed full-time, dedicated staff, among whom four participants checked off the “dedicated staff”–only options. 
of the remaining two crls, one information technology and libraries | june 2018 106 reported staffing the makerspace with only one student, and one reported not having any staff working in the makerspace. i assume that the makerspace with no employees is managed by staff and students who are assigned to other, unspecified library departments or work groups. (see figure 10.) figure 10. the staffing situations at the p3 respondents (n = 9), where each respondent is assigned a letter from “a” to “i.” library programing was also diverse in terms of targeted audiences, speakers, and learning objectives. instructional workshops varied from 3d scanning and printing to soldering, felt making, sewing, knitting, robotics, and programming (e.g., raspberry pi.) the type of equipment contained in each lab is likely correlated to the range in programming; however, investigating these links was beyond the scope of this study. regarding this equipment, the size and activity of the participant crls varied considerably. some responses were more specific than others, and thus the resulting dataset was incomplete (see table 3.) challenges and philosophies of crl makerspaces the final portion of the survey invited participants to freely offer their thoughts about operating a crl makerspace. what follows below is a summary of the two most prominent themes that emerged: the challenges of building the lab and the social philosophies that framed these initiatives. in terms of challenges, the most common hurdle noted was the tremendous learning curve involved in establishing, maintaining, and promoting a makerspace. setting up some of the 3d printers, for example, required knowledge about electrical networks, computer systems, and safety policies at a federal and local level. once the hardware was running, lab managers needed to know how the machines interfaced with different and challenging software applications. communication skills were also critical, as one respondent reported, “printing anything and everything takes knowledge, experience.” communicating with stakeholders and users in accessible and proactive ways required strong teaching and customer service skills. current trends and goals in the development of makerspaces | davis 107 https://doi.org/10.6017/ital.v37i2.9825 table 3. the types of tools and equipment used at p3 crl respondents (n = 8), which are assigned letters from a to h. major equipment offered by individual library makerspaces or equivalent labs—path 3 crl label response text a die cut machine, 3d printer, 3d pens, raspberry pi, arduino, makey makey, art supplies, sewing supplies, pretty much anything anyone asks for we will try to get. b 2 makerbot replicators, 1 digital scanner, 1 othermill c 3d printing, 3d scanning, and laser cutting. d 3d printing, 3d scanning, laser cutting, vinyl cutting, large format printing, cnc machine, media production/postproduction. e no response f 3 creatorx, 1 powerspec, 3 m3d, 2 replicator 2, 1 replicator2x, 1 makergear, 1 leapfrogxl, 1 ultimaker, 1type a,1 deltaprinter, 1 delta maker, 2 printrbot, 2 filabots, 2x-box kinect for scanning, 2 oculus rifts, embedded systems cabinet with soldering stations, solar panels and micro controllers etc, 1 formlabs sla, 1 muve sla, rova 5, a bunch of quadcopters g 3d printers (4 printers, 3 models), 3d scanning/digitizing equipment (3 models), raspberry pi, arduino, a laser cutter and engraving system, poster printer, digital drawing tablets, gopro, a variety of editing and design software, a number of tools (e.g. 
dremel, soldering iron, wrenches, pliers, hammers, etc.), and a number of consumable or misc. items (e.g. paint, electrical tape, acetone, safety equipment, led lights, screws and nails, etc.) h 48 printers (all makerbot brand), 35 replicator 5th gen (a moderate size printer, 5 replicator z18 printers (larger built size), and 5 replicator minis, 3 replicator 2x) 5 makerbot digitzers (turntable scanners 8" by 8") 1 cubify sense hand scanner 7 still cameras for photogrammetry 21 i-mac computers 2 mac pros 2 wacom graphics tablets (thinking about complementing other resources at other labs on campus) another challenge that often came up was that of managing resources. as one respondent warned, crls should beware the “early adoption of certain technologies,” which can become “quickly information technology and libraries | june 2018 108 outdated by a rapidly growing field.” for others, it was a challenge to recruit the right staff that could run and fix machines in constant need of repair. in addition to hiring people with manufacturing and teaching skills, a successful lab required individuals who were savvy about outreach and community needs. despite such challenges, many respondents were eager to discuss the aspirations and rewards of crl makerspaces. above all, respondents focused on the pedagogical opportunities on the one hand, and the potential for outreach and social justice on the other. one participant conceded that measuring advances in literacy and education was “intangible,” but he saw great value in “giving students the experience of seeing their ideas come to fruition.” the excitement that this created for one student manifested in a buzz, and subsequently a “fever” or groundswell, in which more users came in to tinker and learn. meanwhile, the learning that took place among future professionals on campus was “critical,” even when results did not “go viral.” the aspiration to create human connections within and beyond campus was another striking theme. according to one respondent, the makerspace had “enabled some incredibly fruitful collaborations with different departments on campus.” this “fantastic outcome” was becoming more and more visible as the maker community grew. other crl makerspaces took pride in fostering a type of learning that was explicitly collaborative, exciting, and even “fun” for users. this in turn meant that some libraries were becoming “very popular,” generating a lot of “good pr,” and becoming central in the lives of new types of library users. along these lines, some respondents aimed to leverage the power of the makerspace to achieve social justice goals that resonated with core values of librarianship. according to one enthusiastic participant, the ethos of sharing was alive and strong among the staff and the many students who saw their participation in the lab as a lifestyle and culture of collaborating. in another initiative, the respondent looked forward to eventually offering grants to those users who proposed meaningful ways to use the makerspace to create practical value for the community. from this perspective, there was added value in having the 3d printing lab situated specifically on a college or university campus. 
according to this respondent, the unique quality of the crl makerspace was that by virtue of its location amid numerous and energetic young people, it was ripe for exploitation by those “who had great ideas and time and energy to do good.” discussion the aim of this study was to explore why and which types of crls had developed makerspaces (or an equivalent space) for their communities. of the 56 respondents, roughly half (46%) were p2 and p3 libraries who were currently developing or operating a makerspace, respectively. data from this survey indicated that none of the p2 or p3 crls fit a mold or pattern in terms of their size, educational models, or classifications. upon analyzing the data, i found that the differentiators between the three groups were less clearly defined than originally anticipated. in one example of blurred lines, at least two respondents in p1 indicated that they were more actively engaged with makerspaces than two respondents in p2. despite not having physical labs within their libraries, these p1 respondents were in the process of actively supporting or making plans for a makerspace within their crl community. one p1 respondent, for example, served on the planning board for a local community makerspace and had therefore “thoroughly investigated and used” the makerspace at a current trends and goals in the development of makerspaces | davis 109 https://doi.org/10.6017/ital.v37i2.9825 neighboring university. based on his knowledge, he decided to develop a complementary initiative (e.g., a book arts workshop) at his university library. although his library did not yet have a formal makerspace, he felt confident that the diffusion of 3d printers would come to his library in the near future. another p1 respondent was responsible for administering faculty teaching and innovation grants. among the recent grant recipients were two faculty collaborators who used the library’s funds to build a makerspace at a campus location that was separate from the library. although the makerspace was not directly developed by the respondent’s library, it was nevertheless a direct product of his library’s programmatic support. the respondent reported that for this reason, his library did not want to compete with its own faculty initiatives. in another example of blurred distinctions, one librarian in p2 was as deeply immersed in providing access and education on makerspaces as his colleagues in p3. although he was not clear on when or how his library would finance a future makerspace, his library already offered many of the same services and workshops as p3 libraries. as a “maker in the library,” he offered noncredit-bearing 3d printing seminars to students and offered trial 3d printing services in the library for graduates of the 3d printing seminar. in addition, he made appearances at relevant campus events. when the university museum ran a 3d printing day, for instance, he participated as an expert panelist and gave public demonstrations on library-owned 3d printers and a scanner kinect bar. in sum, despite the respondents’ categorization in p1 and p2, they sometimes shared more in common with the cohorts in p2 and p3, respectively. given their library’s programmatic involvement in creating and endorsing the maker movement, these respondents were more than just “interested” or “open to” the prospect of creating a makerspace. while only 16% of crls (p3 = 9) responded as actively operating a makerspace, another 30% (p2 = 17) were involved in developing a makerspace in the near future. 
moreover, the number of crls formally involved with the diffusion of maker technologies was not limited to just these two groups. although some makerspaces were not directly run by the library, they had come to fruition because of librarybased funding, grants, and professional support. and although some libraries did not have immediate plans for a makerspace, they were already promoting maker technologies and the maker ethos in other significant ways. conclusion this study is one of the first comprehensive and comparative studies on crl makerspace programs and their respective goals, policies, and outcomes. while the number of current crl makerspaces is relatively low, the data suggests that the population is increasing; a growing number of crls are involved in the makerspace movement. more than two dozen crls were planning to develop makerspaces in the near future, helping to diffuse maker technologies through crl programming, and/or supporting nonlibrary maker initiatives on campus and beyond. in addition, some crls were buying equipment, hiring dedicated staff, offering relevant workshops and demonstrations, and supporting community efforts to build labs beyond the library. although the author aimed to find structural commonalities between crls in groups p2 and p3, none were found. respondents in these groups came from institutions of all sizes , a wide variety information technology and libraries | june 2018 110 of endowment levels, and both public and private funding models, and they ranged in emphasis from the liberal arts to professional certifications and graduate-level research. although a majority of crl respondents were not currently making plans to create a makerspace, many respondents were enthusiastic about current trends, and some even promoted the maker movement in unexpected ways. acknowledging the steady diffusion of 3d printers, many anticipated using such technologies in the future to promote traditional library values and goals. respondents in p2 and p3 indicated that their primary rationale for developing a makerspace was to promote learning and literacy. other prominent reasons included promoting library outreach and the maker culture of learning. data from crls with makerspaces indicated that these benefits were often symbiotic and correlated to strong ideas about universal access to emergent tools and practices in learning. unexpected challenges for developing and operating makerspaces include staffing them with highly skilled, knowledgeable, and service-oriented employees. learning the necessary skills— including operating the printers, troubleshooting models, and maintaining a safe environment, to name a few—was time-consuming and labor intensive. the majority of funding for crls with or planning maker labs came from internal budgets, gifts and donors, and some grants. while some p1 crls indicated that their reason for not developing makerspaces was a lack of community interest, p2 and p3 crls were not necessarily motivated by user requests or needs, nor was lack of explicit need or interest a deterrent. on the contrary, a few reported a desire to promote the campus library as ahead of the curve by keeping in front of student and community needs. in a similar contradiction, some p1 respondents reported that their libraries did not want to compete with other labs on campus. respondents from p2 and p3, however, wanted to offer an alternative to the more siloed or structured model of departmentor lab-funded makerspaces. 
although makerspaces were sometimes forming in other parts of campus, some p2 and p3 crls felt there was a gap in accessibility and therefore aimed to offer more open and flexible spaces. a final salient theme among p2 and p3 respondents was their commitment to equity of access and issues of social justice. above all, they saw a unique fit for makerspaces in their crl philosophies to serve the greater good. among other advantages, crls were in a unique position to leverage the power of the makerspaces to take advantage of campus communities of “cognitive surplus” and millennial aspirations to share and create spontaneous communities of knowledge. given the amount of resources that are required to create and maintain a makerspace, this research will be useful for crls considering such a space in the future. the present data suggests that no one type of library currently has a monopoly on maker spaces; regardless of size or funding levels, the common thread among p2 and p3 crls was simply a commitment to providing access to emergent technologies and supporting new literacies. while annual budgets and grant applications were critical for some libraries, the majority of crls funded the bulk of their makerspaces through gifts and donations. future studies on the characteristics and challenges of p2 and p3 populations beyond those in new england will certainly amplify our understanding of these trends. current trends and goals in the development of makerspaces | davis 111 https://doi.org/10.6017/ital.v37i2.9825 appendix: survey questions informed consent current trends in the development of makerspaces and 3d printing labs at new england college and research libraries consent for the participation in a research study southern connecticut state university purpose you are invited to participate in a research project conducted by ann marie l. davis, a masters student in library and information studies at southern connecticut state university. the purpose of this project is to investigate the experiences and goals of college and research libraries (crls) that currently have or are making plans to have an open makerspace (or an equivalent room or space). the results from this study will be included in a special project report for the mls degree and the basis for an article to submit for peer-review. procedures if you decide to participate, you will volunteer to take a fifteen-minute online survey. risks and inconveniences there are no known risks associated with this research; other than taking a short amount of time, the survey should not burden you or infringe on your privacy in any way. potential benefits and incentive by participating in this research, you will be contributing to our understanding of current trends and practices with regards to community learning labs in crls. in addition, you will be providing useful knowledge that can support other libraries in making more informed decisions as they potentially develop their own makerspaces in the future. voluntary participation your participation in this research study is voluntary. you may choose not to participate and you may withdraw your consent to participate at any time. you will not be penalized in any way should you decide not to participate or withdraw from this study. protection of confidentiality the survey is anonymous and does not ask for sensitive or confidential information. contact information before you consent, please ask any questions on any aspect of this study that is unclear to you. 
you may contact me at my student email address at any time: xxx@owls.southernct.edu. if you have questions regarding your rights as a research participant, you may contact the southern connecticut state institutional review board at (203) xxx-xxxx. information technology and libraries | june 2018 112 consent by proceeding to the next page, you confirm that you understand the purpose of this research, the nature of this survey and the possible burdens and risks as well as benefits that you may experience. by proceeding, this indicates that you have read this consent form, understand it , and give your consent to participate and allow your responses to be used in this research. acrl survey on makerspaces and 3d printers q1. what is the size of your college or university? • 4,999 students or less • 5,000–9,999 students • 10,000–19,999 students • 20,000–29,999 students • 30,000 students or more q2. how would you categorize your institution? (please check all that apply) • private • public • doctorate-granting university (awards 20 or more doctorates) • master’s college or university (awards 50 or more master’s degrees, but fewer than 20 doctorates) liberal arts and sciences college • other q3. do any of the libraries at your institution have a makerspace or equivalent hands-on learning lab (including a 3-d printing station or lab)? • yes [if “yes,” respondents are directed to question 14] • no [if “no,” respondents are directed to question 4] q4. do any of the libraries at your institution have plans to develop a makerspace or equivalent learning lab in the near future? • yes [if “yes,” respondents are directed to question 8] • no [if “no,” respondents are directed to question 5] path one (crls with no makerspace, no plans for makerspace) q5. are there specific reasons why your institution has decided not to pursue developing a makerspace or equivalent lab in the near future? • no reasons. we have not given much thought to makerspaces for our library. • yes q6. thank you for your participation. would you like a copy of the results when the report is completed? if yes, please enter your email address in the space provided. current trends and goals in the development of makerspaces | davis 113 https://doi.org/10.6017/ital.v37i2.9825 • no • yes (please enter your email address below) q7. you have almost concluded this survey. before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. if no comments, please click “next” to end the survey. path two [crls with plans to build a makerspace] q8. what are the main goals that motivated your library’s decision to develop a makerspace or equivalent lab? (please check all that apply) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs • other q9. of these goals, please rank them in order of their level of priority for your library. (choose “n/a” for goals that you did not select in the previous question) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs • other q10. 
what is your library’s time frame for developing a makerspace or equivalent lab? q11. what are your library’s current plans for gathering and/or financing the resources needed for developing and maintaining the makerspace or equivalent lab? q12. thank you for your participation. would you like a copy of the results when the report is completed? • no • yes (please enter your email address below) q13. you have almost concluded this survey. before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. if no comments, please click “next” to end the survey. information technology and libraries | june 2018 114 path three [crls with a makerspace] q14. how long have you had your makerspace or equivalent learning lab? • less than 6 months • 6–12 months • 1–2 years • 2–3 years • more than 3 years q15. what were the main goals that motivated your library's decision to develop a makerspace or equivalent lab? (please check all that apply) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs other q16. of these goals, please rank them in order of their level of priority for your library. (choose “n/a” for goals that you did not select in the previous question) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs • other q17. how did your library gather and/or finance the resources needed for developing and maintaining the makerspace or equivalent learning lab? q18. do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use? • yes, we offer the following types of programs: • no, we simply leave the makerspace/lab open at the following times (please note times and/or if a reservation is required): • we do both. we offer the following types of programs and leave the makerspace/lab open at the following times (please note types of programs, times open, and if a reservation is required): current trends and goals in the development of makerspaces | davis 115 https://doi.org/10.6017/ital.v37i2.9825 q19. what type of community members tend to use your library's makerspace or equivalent lab most? (please check all that apply) • undergraduate researchers • graduate researchers • faculty • staff • general public • local artists, designers, or craftspeople • local entrepreneurs • other q20. of the cohorts chosen above, please rank them in order of who uses the makerspace or equivalent lab most often. (use “n/a” for cohorts that are not relevant to your space or lab) • undergraduate researchers • graduate researchers • faculty • staff • general public • local artists, designers, or craftspeople • local entrepreneurs • other q21. how many dedicated staff does your library currently employ for the makerspace or equivalent? • 0 • 1 • 2 • 3 • other q22. where is your makerspace or equivalent lab located? q23. what is the title or name of your makerspace or equivalent lab, and if known, what were the reasons behind this particular name? q24. 
what major equipment and services does your library makerspace or equivalent lab provide? q25. what unexpected considerations, challenges, or failures has your library faced in developing and maintaining the makerspace or equivalent lab? q26. how would you assess the benefits or “return on investment” of having a makerspace or equivalent lab? q27. thank you for your participation. would you like a copy of the final results when the report is completed? if yes, please enter your email address in the space provided. information technology and libraries | june 2018 116 • no • yes (please enter your email address below) q28. you have almost concluded this survey. before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. if no comments, please click “next” to end the survey. references and notes 1 laura britton, “a fabulous laboratory: the makerspace at fayetteville free library,” public libraries 51, no. 4 (july/august 2012): 30–33, http://publiclibrariesonline.org/2012/10/afabulous-labaratory-the-makerspace-at-fayetteville-free-library/; madelynn martiniere, “hack the world: how the maker movement is impacting innovation: from diy geige,” medium, october 27, 2014, https://medium.com/@mmartiniere/hack-the-world-how-the-makermovement-is-impacting-innovation-bbc0b46bd820#.3mnhow4jz. 2 david v. loertscher, “maker spaces and the learning commons,” teacher librarian 39, no. 6 (october 2012): 45–46, accessed december 9, 2016, library, information science & technology abstracts with full text, ebscohost; jon kalish, “libraries make room for high-tech ‘hackerspaces,’” national public radio, december 25, 2011, http://www.npr.org/2011/12/10/143401182/libraries-make-room-for-high-techhackerspaces; diane slatter and zaana howard, “a place to make, hack, and learn: makerspaces in australian public libraries,” australian library journal 62, no. 4: 272–84, https://doi.org/10.1080/00049670.2013.853335. 3 sharon crawford barniskis, “makerspaces and teaching artists,” teaching artist journal 12, no. 1: 6–14. 4 anne wong and helen partridge, “making as learning: makerspaces in universities,” australian academic & research libraries 47, no. 3 (september 2016): 143–59, https://doi.org/10.1080/00048623.2016.1228163. 5 erich purpur et al., “refocusing mobile makerspace outreach efforts internally as professional development,” library hi tech 34, no. 1 (2016): 130–42. 6 britton, “a fabulous laboratory,” 30. 7 tj mccue, “first public library to create a maker space,” forbes, november 15, 2011, http://www.forbes.com/sites/tjmccue/2011/11/15/first-public-library-to-create-a-makerspace/. 8 phillip torrone, “is it time to rebuild and retool public libraries and make ‘techshops’?,” make:, march 20, 2011, http://makezine.com/2011/03/10/is-it-time-to-rebuild-retool-publiclibraries-and-make-techshops/. 9 r. david lankes, “killing librarianship,” (keynote speech, new england library association annual conference, october 3, 2011, burlington, vermont), https://davidlankes.org/killinglibrarianship/. 
10 janet l. balas, “do makerspaces add value to libraries?,” computers in libraries 32, no. 9 (november 2012): 33. 11 balas, “do makerspaces add value to libraries?,” 33; adrian g. smith et al., “grassroots digital fabrication and makerspaces: reconfiguring, relocating and recalibrating innovation?” (working paper, university of sussex, spru working paper swps, falmer, brighton, september 2013), https://doi.org/10.2139/ssrn.2731835. 12 the number of and interval between emails corresponded roughly with dillman’s “five-contact framework” as outlined in carolyn hank, mary wilkins jordan, and barbara m. wildemuth, “survey research,” in applications of social research methods to questions in information and library science, edited by barbara wildemuth, 256–69 (westport, ct: libraries unlimited, 2009), 261. 13 in choosing these priorities, respondents were asked to select as many of the reasons that applied to their own crl.
measuring journal linking success from a discovery service kenyon stuart, ken varnum, and judith ahronheim information technology and libraries | march 2015
abstract online linking to full text via third-party link-resolution services, such as serials solutions 360 link or ex libris’ sfx, has become a popular method of access to users in academic libraries.
this article describes several attempts made over the course of the past three years at the university of michigan to gather data on linkage failure: the method used, the limiting factors, the changes made in methods, an analysis of the data collected, and a report of steps taken locally because of the studies. it is hoped that the experiences at one institution may be applicable more broadly and, perhaps, produce a stronger data-driven effort at improving linking services.
introduction
online linking via vended services has become a popular method of access to full text for users in academic libraries. but not all user transactions result in access to the desired full text. maintaining information that allows the user to reach full text is a shared responsibility among assorted vendors, publishers, aggregators, local catalogers, and electronic access specialists. the collection of information used in getting to full text can be thought of as a supply chain. to maintain this chain, libraries need to enhance the basic information about the contents of each vendor package—a collection of journals bundled for sale to libraries—with added details about local licenses and holdings. these added details need to be maintained over time. since links, platforms, contracts, and subscriptions change frequently, this can be a time-consuming process. when links are unsuccessfully constructed within each system, considerable troubleshooting of a very complex process is required to determine where the problem lies. because so much of the transaction is invisible to the user, linking services have come to be taken for granted by the community, and performance expectations are very high. failure to reach full text reflects poorly on the institutions that offer the links, so there is considerable interest for and value to the institution in improving performance.
kenyon stuart (kstuart@umich.edu) is senior information resources specialist, ken varnum (kvarnum@umich.edu) is web systems manager, and judith ahronheim (jaheim@umich.edu) is head, electronic resource access unit, university of michigan library, ann arbor, michigan.
improving the success rate for users can best be achieved by acquiring a solid understanding of the nature and frequency of problems that inhibit full-text retrieval. while anecdotal data and handling of individual complaints can provide incremental improvement, larger improvement resulting from systematic changes requires more substantial data, data that characterizes the extent of linking failure and the categories of situations that inhibit it.
literature review
openurl link resolvers are “tool[s] that helps library users connect to their institutions’ electronic resources.
the data that drives such a tool is stored in a knowledge base.”1 since the codification of the openurl as an ansi/niso standard in 2004,2 openurl has become, in a sense, the glue that holds the infrastructure of traditional library research together, connecting citations and full text. it is well recognized that link resolution is an imperfect science. understanding what and how openurls fail is a time-consuming and labor-intensive process, typically conducted through analysis of log files recording attempts by users to access a full-text item via openurl. research has been conducted from the perspective of openurl providers, showing which metadata elements encoded in an openurl were most common and most significant in leading to an appropriate full-text version of the article being cited. in 2010, chandler, wiley, and leblanc reported on a systematic approach they devised, as part of a mellon grant, to review the outbound openurls from l’année philologique.3 they began with an analysis of the metadata elements included in each openurl and compared this to the standard. they found that elements critical to the delivery of a full-text item, such as the article’s starting page, were never included in the openurls generated by l’année philologique.4 their work led to the creation of the improving openurls through analytics (iota) working group within the national information standards organization (niso). iota, in turn, was focused on improving openurl link quality at the provider end. “the quality of the data in the link resolver knowledge base itself is outside the scope of iota; this is being addressed through the niso kbart initiative.”5,6 where iota provided tools to content providers for improving their outbound openurls, kbart provided tools to knowledge base and linking tool providers for improving their data. pesch, in a study to validate the iota process, discovered that well-formed openurls were generally successful, however:
the quality of the openurl links is just part of the equation. setting the proper expectations for end users also need to be taken into consideration. librarians can help by educating their users about what is expected behavior for a link resolver and end user frustrations can also be reduced if librarians take advantage of the features most content providers offer to control when openurl links display and what the links say. where possible the link text should indicate to the user what they will get when they click it.7
missing from the standards-based work described above is the role of the openurl middleman, the library. price and trainor describe a method for reviewing openurl data and identifying the root causes of failures.8 through testing of actual openurls in each of their systems, they arrived at a series of steps that could be taken by other libraries to proactively raise openurl resolution success rates.
several specific recommendations include “optimize top 100 most requested journals” and “optimize top ten full text target providers.”9 that is, make sure that openurls leading to content from the most frequently used journals and content sources are tested and are functioning correctly. chen describes a similar analysis of broken link reports derived from bradley university library’s sfx implementation over four years, with a summary of the common reasons links failed.10 similarly, o’neill conducted a small usability study whose recommendations included providing “a system of support accessible from the page where users experience difficulty,”11 although her recommendations focused on inline, context-appropriate help rather than error-reporting mechanisms. not found in the literature are systematic approaches that a library can take to proactively collect problem reports and manage the knowledge base accordingly.
method
we have taken a two-pronged approach to improving link resolution quality, each relying on a different kind of input. the first uses problem reports submitted by users of our summon™-powered article discovery tool, articlesplus.12 the second focuses on the most commonly accessed full-text titles in our environment, based on reports from 360 link. we have developed this dual approach in the expectation that we will catch more problems on lesser-used full-text sources through the first approach, and problems whose resolution will benefit the most individuals through the second.
user reports
the university of michigan library uses summon as the primary article discovery tool. when a user completes a search and clicks the “mget it” button (see figure 1)—mget it is our local brand for the entire full-text delivery process—the user is directed to the actual document through one of two mechanisms:
1. access to the full-text article through a summon index-enhanced direct link. (some of summon’s full-text content providers contribute a url to summon for direct access to the full text. this is known as index-enhanced direct linking [direct linking].)
2. access to the full-text article through the university library’s link resolver, 360 link. at this point, one of two things can happen:
a. the university library has configured a number of full-text sources as “direct to full text” links. when a citation leads to one of these sources, the user is directed to the article, or as close to it as the content provider’s site allows (sometimes to an issue table of contents, sometimes to a list of items in the appropriate volume, and—rarely in this instance—to the journal’s front page; the last outcome is rare in our environment because the university library prefers full-text links that get closer to the article and has configured 360 link for that outcome).
b. for those full-text sources that do not have direct-to-article links, 360 link is configured to provide a range of possible delivery mechanisms, including journal-, volume-, or issue-level entry points, document-delivery options (for cases where the library does not license any full-text online sources), the library catalog (for identifying print holdings for a journal), and so on.
from the user perspective, mechanisms 1 and 2a are essentially identical. in both cases, a click on the mget it icon takes the user to the full text in a new browser window. if the link does not lead to the correct article for any reason, there is no way in the new window for the library to collect that information. users may consider item 2b results as failures because the article is not immediately perceptible, even if the article is actually available in full text after two or more subsequent clicks. because of this user perception, we interpreted 2b results as “failures.”
figure 1. sample citation from articlesplus
in an attempt to understand this type of problem, following the advice given by o’neill and chen, we provide a problem-reporting link in the articlesplus search-results interface each time the full-text icon appears (see the right side of figure 1). when the user clicks this problem-reporting link, they are taken to a qualtrics survey form that asks for several basic pieces of information from the user but also captures the citation information for the article the user was trying to reach (see figure 2).
figure 2. survey questionnaire for reporting linking problems
this survey instrument asks the user to characterize the type of delivery failure with one of four common problems, along with an “other” text field:
• there was no article
• i got the wrong article
• i ended up at a page on the journal’s web site, but not the article
• i was asked to log in to the publisher’s site
• something else happened (please explain):
the form also asks for any additional comments and requires that the user provide an email address so that library staff can contact the user with a resolution (often including a functioning full-text link) or to ask for more information. in addition to the information requested from the user, hidden fields on this form capture the summon record id for the article, the ip address of the user’s computer (to help us identify if the problem could be related to our ezproxy configuration), a time and date stamp of the report’s submission, and the brand and version of web browser being used. the results of the form are sent by email to the university library’s ask a librarian service, the library’s online reference desk.
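as an illustration of what a single report contains once the visible answers and the hidden fields are combined, the sketch below models the record as a small python structure. the field names are our own shorthand rather than the qualtrics survey's actual variable names, and the sample values are invented.

```python
# Hypothetical model of one problem report as described above: the user-supplied
# answers plus the hidden fields captured by the form. Field names and sample
# values are illustrative only, not the real Qualtrics variable names.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ProblemReport:
    problem_type: str      # one of the five choices offered to the user
    comments: str          # free-text description of what went wrong
    email: str             # required so staff can follow up with a resolution
    summon_record_id: str  # hidden field: identifies the citation in Summon
    ip_address: str        # hidden field: helps rule out proxy/EZproxy issues
    reported_at: datetime  # hidden field: time and date stamp of the submission
    user_agent: str        # hidden field: browser brand and version

report = ProblemReport(
    problem_type="i ended up at a page on the journal's web site, but not the article",
    comments="landed on the issue table of contents instead of the article",
    email="patron@example.edu",
    summon_record_id="example-record-id",
    ip_address="198.51.100.7",          # documentation-range address, not real
    reported_at=datetime(2014, 10, 6, 14, 32),
    user_agent="Firefox 32",
)
print(report.problem_type)
```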
ask  a  librarian  staff  make  sure  that  the  problem  is  not  associated   with  the  individual  user’s  account  (that  they  are  entitled  to  get  full  text,  that  they  were  accessing   the  item  from  on  campus  or  via  the  proxy  server  or  vpn,  etc.).  when  user-­‐centric  problems  are   ruled  out,  the  problem  is  passed  on  to  the  library’s  electronic  access  unit  in  technical  services  for   further  analysis  and  resolution.   random  sampling   user-­‐reported  problems  are  only  one  picture  of  issues  in  the  linking  process.  we  were  concerned   that  user  reports  might  not  be  the  complete  story.  we  wanted  to  ensure  that  our  samples   represented  the  full  range  of  patron  experiences,  not  just  that  of  the  ones  who  reported.  so,  to  get   a  different  perspective,  we  instituted  a  series  of  random  sample  testing  using  logs  of  document   requests  from  the  link  resolver,  360  link.   2011  linking  review   our  first  large-­‐scale  review  of  linking  from  articlesplus  was  conducted  in  2011.  this  first  approach   was  based  on  a  log  of  the  summon  records  that  had  appeared  in  patron  searches  and  for  which   our  link  resolver  link  had  been  clicked.  for  this  test  we  chose  a  slice  of  the  log  covering  the  period   from  january  30–february  12,  2011.  this  period  was  chosen  because  it  was  well  into  the  academic   term  and  before  spring  break,  so  it  would  provide  a  representative  sample  of  the  searches  people   had  performed.  the  resulting  slice  contained  13,161  records.  for  each  record  the  log  contained   the  summon  id  of  the  record.  we  used  this  to  remove  duplicate  records  from  the  log  to  ensure  we   were  not  testing  linking  for  the  same  record  more  than  once,  leaving  us  with  a  spreadsheet  of   10,497  records,  one  record  per  row.  from  the  remaining  records  we  chose  a  sample  of  685   records  using  a  random  number  generator  tool,  research  randomizer   (http://www.randomizer.org/form.htm),  to  produce  a  random,  nonduplicating  list  of  685   numbers  with  values  from  1  to  10,497.  each  of  the  685  numbers  produced  was  matched  to  the   corresponding  row  in  the  spreadsheet  starting  with  the  first  record  listed  in  the  spreadsheet.  for   each  record  we  collected  the  data  in  figure  3.                 information  technology  and  libraries  |  march  2015   58   1.  the  summon  id  of  the  record   2.  the  raw  openurl  provided  with  the  record.   3.  a  version  of  the  openurl  that  may  have  been  locally  edited  to  put  dates  in  a  standard   format.   4.  the  final  url  provided  to  the  user  for  linking  to  the  resource.  this  would  usually  be   the  openurl  from  #3  containing  the  metadata  used  by  the  link  resolver  to  build  its   full-­‐text  links.  currently  it  is  an  intermediary  url  provided  by  the  summon  api.  this   url  may  lead  to  an  openurl  or  to  a  direct  link  to  the  resource  in  the  summon  record.   5.  the  classification  of  the  link  in  the  summon  record.  this  was  either  “full  text  online”   or  “citation-­‐only.”   6.  the  date  the  link  resolver  link  was  clicked.   7.  the  page  in  the  summon  search  results  the  link  resolver  link  was  found.   8.  
the  position  within  the  page  of  search  results  where  the  link  resolver  link  was  located.   9.  the  search  query  that  produced  the  search  results.   figure  3.  data  points  collected   the  results  from  this  review  were  somewhat  disappointing,  with  only  69.5%  of  the  citations   tested  leading  directly  to  full  text.  at  the  time  direct  linking  did  not  yet  exist,  so  “direct  to  full  text”   linking  was  only  available  through  the  1-­‐click  feature  of  360  link.  the  1-­‐click  feature  attempts  to   lead  patrons  directly  to  the  full  text  of  a  resource  without  first  going  through  the  360  link  menu.   1-­‐click  was  used  for  579  or  84.5%  of  the  citations  tested  with  15.3%  leading  to  the  360  link  menu.   of  the  citations  that  used  1-­‐click,  476  or  82.2%  led  directly  to  full  text,  so  when  1-­‐click  was  used  it   was  rather  successful.  links  for  about  30.5%  of  the  citations  led  either  to  a  failed  attempt  to  reach   full  text  through  1-­‐click  or  directly  to  the  360  link  menu.  the  2011  review  included  looking  at  the   full-­‐text  links  that  360  link  indicated  should  lead  directly  to  the  full  text  as  opposed  to  the  journal,   volume  or  issue  level.  when  we  reviewed  all  of  the  “direct  to  full  text”  links  generated  by  360  link,   not  only  the  ones  used  by  1-­‐click,  we  found  a  variety  of  reasons  why  those  links  did  not  succeed  in   leading  to  the  full  text.  the  top  five  reasons  found  for  linking  failures  are  the  following:   1. incomplete  target  collection   2. incorrect  syntax  in  the  article/chapter  link  generated  by  360  link   3. incorrect  metadata  in  the  summon  openurl   4. article  not  individually  indexed   5. target  error  in  targeturl  translation       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   59   collectively,  these  reasons  were  associated  with  the  failure  of  71.5%  of  the  “direct  to  full  text”   links.  as  we  will  show  later,  these  problems  were  also  noted  in  our  most  recent  review  of  linking   quality.   move  to  quarterly  testing   after  this  review  in  2011,  we  decided  to  perform  quarterly  testing  of  the  linking  so  we  would  have   current  data  on  the  quality  of  linking.  this  would  give  us  information  on  the  effectiveness  of  any   changes  we  and  proquest  had  made  independently  to  improve  the  linking.  we  could  see  where   linking  problems  found  in  previous  testing  had  been  resolved  and  where  new  ones  might  exist.     however,  we  needed  to  change  how  we  created  our  sample.  while  the  data  gathered  in  2011   provided  much  insight  into  the  workings  of  360  link,  testing  the  685  records  produced  2,210  full-­‐ text  links.  gathering  the  data  for  such  a  large  number  of  links  required  two  months  of  part-­‐time   effort  by  two  staff  members  as  well  as  an  additional  month  of  part-­‐time  effort  by  one  staff  member   for  analysis.  this  would  not  be  workable  for  quarterly  testing.  as  an  alternative  we  decided  to  test   two  records  from  each  of  the  100  serials  most  accessed  through  the  link  resolver.  
this  gave  us  a   sample  we  could  test  and  analyze  within  a  quarter  based  on  serials  that  our  patrons  were  using.   we  felt  that  we  could  gather  data  for  such  a  sample  within  three  to  four  weeks  instead  of  two   months.  the  list  was  generated  using  the  “click-­‐through  statistics  by  title  and  issn  (journal  title)”   usage  report  generated  through  the  proquest  client  center  administration  gui.  we  searched  for   each  serial  title  within  summon  using  the  serial’s  issn  or  the  serial’s  title  when  the  issn  was  not   available.     we  ordered  the  results  by  date,  with  the  newest  records  first.  we  wanted  an  article  within  the  first   two  to  three  pages  of  results  so  we  would  have  a  recent  article,  but  not  one  so  new  it  was  not  yet   available  through  the  resources  that  provide  access  to  the  serial.  then  we  reordered  the  results  to   show  the  oldest  records  first  and  chose  an  article  from  the  first  or  second  page  of  results.  our  goal   was  to  choose  an  article  at  random  from  the  second  or  third  page  while  ignoring  the  actual  content   of  the  article  so  as  not  to  introduce  a  selection  bias  by  publisher  or  journal.  another  area  where   our  sample  was  not  truly  random  involved  supplement  issues  of  journals.  one  problem  we  found   with  the  samples  collected  was  that  they  contained  few  items  from  supplemental  issues  of   journals.  linking  to  articles  in  supplements  is  particularly  difficult  because  of  the  different  ways   supplement  information  is  represented  among  different  databases.  to  attempt  to  capture  linking   information  in  this  case  we  added  records  for  articles  in  supplemental  issues.  those  records  were   chosen  from  journals  found  in  earlier  testing  to  contain  supplemental  issues.  we  searched   summon  for  articles  within  those  supplemental  issues  and  selected  one  or  two  to  add  to  our   sample.   one  notable  thing  is  the  introduction  of  direct  linking  in  our  summon  implementation  between   the  reviews  for  the  first  and  second  quarters  of  2012.  proquest  developed  direct  linking  to       information  technology  and  libraries  |  march  2015   60   improve  linking  to  resources  (including  but  not  limited  to  full  text  of  articles)  through  summon.   instead  of  using  an  openurl  which  must  be  sent  to  a  link  resolver,  direct  linking  uses   information  received  from  the  providers  of  the  records  in  summon  to  create  links  directly  to   resources  through  those  providers.  ideally,  since  these  links  use  information  from  those  providers,   direct  linking  would  not  have  the  problems  found  with  openurl  linking  through  a  link  resolver   such  as  360  link.  not  all  links  from  summon  use  direct  linking,  and  as  a  result  we  had  to  take  into   account  the  possibility  that  any  link  we  clicked  could  use  either  openurl  linking  or  direct  linking.   current  review:  back  to  random  sampling   while  the  above  sampling  method  resulted  in  useful  data,  we  also  found  it  had  some  limitations.   when  we  performed  the  review  for  the  second  quarter  of  2012,  we  found  a  significant  increase  in   the  effectiveness  of  360  link  since  the  first  quarter  2012  review.  
this  is  further  described  in  the   findings  section  of  this  article.  we  were  able  to  trace  some  of  this  improvement  to  changes   proquest  had  made  to  360  link  and  to  the  openurls  produced  from  summon.  however,  we  were   unable  to  fully  trace  the  cause  of  the  improvement  and  were  unable  to  determine  if  this  was  real   improvement  that  would  be  persistent.  to  resolve  these  problems,  we  have  returned  to  using  a   random  sample  in  our  latest  review,  but  with  a  change  in  methods.     current  review:  determining  the  sample  size   we  wanted  to  perform  a  review  that  would  be  statistically  relevant  and  could  help  us  determine  if   any  changes  in  linking  quality  were  persistent  and  not  just  a  one-­‐time  event.  instead  of  testing  a   single  sample  each  quarter  we  decided  to  test  a  sample  each  month  over  a  period  of  months.  one   concern  with  this  was  the  sample  size,  as  we  wanted  a  sample  that  would  be  statistically  valid  but   not  so  large  we  could  not  test  it  within  a  single  month.  we  determined  that  a  sample  size  of  300   would  be  sufficient  to  determine  if  any  month-­‐to-­‐month  changes  represent  a  real  change.   however,  in  previous  testing  we  had  learned  that  because  of  re-­‐indexing  of  the  summon  records,   summon  ids  that  were  valid  when  a  patron  performed  a  search  might  no  longer  be  valid  by  the   time  of  our  testing.  we  wanted  a  sample  of  300  still-­‐valid  records,  so  we  selected  a  random  sample   larger  than  that  amount.  so,  we  decided  to  test  600  records  each  month  to  determine  if  the   summon  ids  were  still  valid.   current  review:  methods   when  generating  each  month's  sample  we  used  the  same  method  as  in  2011.  we  asked  our  web   systems  group  for  the  logs  of  full-­‐text  requests  from  the  library’s  summon  interface  for  the  period   of  november  2012–february  2013.13  we  processed  each  month’s  log  file  within  two  months  of  the   user  interactions.  to  generate  the  600-­‐record  sample,  after  removing  records  with  duplicate   summon  ids,  we  used  a  random  number  generator  tool,  research  randomizer,  to  produce  a   random,  nonduplicating  list  of  600  numbers  with  values  from  1  to  the  number  of  unique  records.   each  of  the  600  numbers  produced  was  matched  to  the  corresponding  row  in  the  spreadsheet  of       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   61   records  with  unique  summon  ids.  once  the  600  records  were  tested  and  we  had  a  subset  with   valid  summon  ids,  we  generated  a  list  of  300  random,  nonduplicating  numbers  with  values  from  1   to  the  number  of  records  with  valid  summon  ids.  each  of  the  300  numbers  produced  was  matched   to  the  corresponding  row  in  a  spreadsheet  of  the  subset  of  records  with  valid  summon  ids.  this   gave  us  the  300-­‐record  sample  for  analysis.     testing  was  performed  by  two  people,  a  permanent  staff  member  and  a  student  hired  to  assist   with  the  testing.  the  staff  member  was  already  familiar  with  the  data  gathering  and  recording   procedure  and  trained  the  student  on  this  procedure.  
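for readers who want to reproduce this kind of draw without a separate random-number web tool, a minimal sketch follows. it assumes the monthly log has been exported to a csv file with a summon_id column; the file path, column name, and validity check are placeholders rather than the library's actual tooling, and python's random.sample simply plays the role of research randomizer by producing a random, nonduplicating selection. as a side note, a sample of 300 gives a 95% margin of error of roughly plus or minus 6 percentage points for a proportion near 50% (1.96 × √(0.25/300) ≈ 0.057), which is consistent with treating month-to-month shifts larger than that as real changes.

```python
# minimal sketch of the two-stage sample described above. assumes the log has been
# exported to csv with a "summon_id" column; file path, column name, and the
# validity check are placeholders rather than the library's actual tooling.
import csv
import random

def draw_monthly_sample(log_path, first_draw=600, final_size=300, seed=None):
    rng = random.Random(seed)

    # keep only the first occurrence of each summon id (remove duplicate records)
    seen, records = set(), []
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            sid = row["summon_id"]
            if sid not in seen:
                seen.add(sid)
                records.append(row)

    # stage 1: random, nonduplicating draw of up to 600 unique records
    stage1 = rng.sample(records, min(first_draw, len(records)))

    # keep the records whose summon ids are still valid after re-indexing
    valid = [r for r in stage1 if summon_id_still_valid(r["summon_id"])]

    # stage 2: random, nonduplicating draw of the final 300-record test sample
    return rng.sample(valid, min(final_size, len(valid)))

def summon_id_still_valid(summon_id):
    # placeholder: in the study this was checked by retrieving the record itself
    return True
```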
the  student  was  introduced  to  the  library’s   summon  implementation  and  shown  how  to  recognize  and  understand  the  different  possible   linking  types:  summon  direct  linking,  360  link  using  1-­‐click,  and  360  link  leading  to  the  360   link  menu.  once  this  background  was  provided,  the  student  was  introduced  to  the  procedure  for   gathering  and  recording  data.  the  student  was  given  suggestions  on  how  to  find  the  article  on  the   target  site  if  the  link  did  not  lead  directly  to  the  article  and  how  to  perform  some  basic  analysis  to   determine  why  the  link  did  not  function  as  expected.  the  permanent  staff  member  reviewed  the   analysis  of  the  links  that  did  not  lead  to  full  text  and  applied  a  code  to  describe  the  reason  for  the   failure.     based  on  our  2011  testing,  we  expected  to  see  one  of  two  general  results  in  the  current  round.     1. 360  link  would  attempt  to  connect  directly  to  the  article  because  of  our  activating  the  1-­‐ click  feature  of  360  link  when  we  implemented  the  link  resolver.  with  1-­‐click,  360  link   attempts  to  lead  patrons  directly  to  the  full  text  of  a  resource  without  first  having  to  go   through  the  link  resolver  menu.  even  with  1-­‐click  active  we  provide  patrons  a  link  leading   to  the  full  360  link  menu,  which  may  have  other  options  for  leading  to  the  full  text  as  well   as  links  to  search  for  the  journal  or  book  in  our  catalog.     2. the  other  possible  result  was  the  link  from  summon  leading  directly  to  the  360  link  menu.     once  direct  linking  was  implemented  after  we  began  this  round,  a  third  result  became  possible  (a   direct  link  from  summon  to  the  full  text).     for  each  record  we  collected  the  data  shown  in  figure  4.                 information  technology  and  libraries  |  march  2015   62   1.      date  the  link  from  summon  record  was  tested.   2.      the  url  of  the  summon  record.   3.    *the  openurl  generated  by  clicking  the  link  from  summon.  this  was  the  url  in  the   address  bar  of  the  page  to  which  the  link  led.  this  is  not  available  when  direct  linking  is   used.   4.      the  issn  of  the  serial  or  isbn  of  the  book.   5.      the  doi  of  the  article/book  chapter  if  it  was  available.   6.      the  citation  for  the  article  as  shown  in  the  360  link  menu  or  in  the  summon  record  if   direct  linking  was  used.   7.    *each  package  (collection  of  journals  bundled  together  in  the  knowledgebase)  for  which   360  link  produced  an  electronic  link  for  that  citation.   8.    *the  order  in  the  list  of  electronic  resources  in  which  the  package  in  #7  appeared  in  the   360  link  menu.   9.    *the  linking  level  assigned  to  the  link  by  360  link.  this  level  indicates  how  close  to  the   article  the  link  should  lead  the  patron,  with  article-­‐level  or  chapter-­‐level  links  ideally   taking  the  patron  directly  to  the  article/book  chapter.  the  linking  levels  recorded  in  our   testing  starting  with  the  closest  to  full  text  were  article/book  chapter,  issue,  volume,   journal/book  and  database.   10.  *for  article-­‐level  links,  the  url  that  360  link  used  to  attempt  to  connect  to  the  article.   11.  
for  all  full-­‐text  links  in  the  360  link  menu,  the  url  to  which  the  links  led.  this  was  the   link  displayed  in  the  browser  address  bar.   12.  a  code  assigned  to  that  link  describing  the  results.   13.  a  note  indicating  if  full  text  was  available  on  the  site  to  which  the  link  led.  this  was  only   an  indicator  of  whether  or  not  full  text  could  be  accessed  on  that  site  not  an  indicator  of   the  success  of  1-­‐click/direct  linking  or  the  article-­‐level  link.   14.  a  note  if  this  was  the  link  used  by  1-­‐click.   15.  a  note  if  direct  linking  was  used.   16.  a  note  if  the  link  was  for  a  citation  where  1-­‐click  was  not  used  and  clicking  the  link  in   summon  led  directly  to  the  360  link  menu.   17.  notes  providing  more  detail  for  the  results  described  by  #12.  this  included  error   messages,  search  strings  shown  on  the  target  site,  and  any  unusual  behavior.  the  notes   also  included  conclusions  reached  regarding  the  cause(s)  of  any  problems.   *  collected  only  if  the  link  resolver  was  used.   figure  4.  data  collected  from  sample       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   63   each  link  was  categorized  based  on  whether  it  led  to  the  full  text.  then  the  links  that  failed  were   further  categorized  on  the  basis  of  the  reason  for  failure  (see  figure  5  for  failure  categories).     1.        incorrect  metadata  in  the  summon  openurl.   2.        incomplete  metadata  in  the  summon  openurl.   3.        difference  in  the  metadata  between  summon  and  the  target.  in  this  case  we  were  unable  to   determine  which  site  had  the  correct  metadata.   4.        inaccurate  data  in  the  knowledgebase.  this  includes  incorrect  url  and  incorrect   issn/isbn.   5.        incorrect  coverage  in  the  knowledgebase.   6.        link  resolver  insufficiency.  the  link  resolver  has  not  been  configured  to  provide  deep   linking.  this  may  be  something  that  we  could  configure  or  something  that  would  require   changes  in  360  link.   7.        incorrect  syntax  in  the  article/chapter  link  generated  by  360  link.   8.        target  site  does  not  appear  to  support  linking  to  article/chapter  level.   9.        article  not  individually  indexed.  this  often  happens  with  conference  abstracts  and  book   reviews  which  are  combined  in  a  single  section.   10.    translation  error  of  the  “targeturl”  by  target  site.   11.    incomplete  target  collection.  site  is  missing  full  text  for  items  that  should  be  available  on   the  site.   12.    incorrect  metadata  on  the  target  site.   13.    citation-­‐only  record  in  summon.  summon  indicates  only  the  citation  is  available  so  access   to  full  text  is  not  expected.   14.    error  indicating  cookie  could  not  be  downloaded  from  target  site.  this  sometimes   happened  with  1-­‐click  but  the  same  link  would  work  from  the  360  link  menu.   15.    item  does  not  appear  to  have  a  doi.  the  360  link  menu  may  provide  an  option  to  search  for   the  doi.  sometimes  these  searches  fail  and  we  are  unable  to  find  a  doi  for  the  item.   16.    miscellaneous.  results  that  do  not  fall  into  the  other  categories.  
generally  used  for  links  in   packages  for  which  360  link  only  provides  journal/book-­‐level  linking  such  as  directory  of   open  access  journals  (doaj).   17.    unknown.  the  link  failed  with  no  identifiable  cause.     figure  5.  list  of  failure  categories  assigned       information  technology  and  libraries  |  march  2015   64   user-­‐reported  problems   in  march  2012,  we  began  recording  the  number  of  full-­‐text  clicks  in  articlesplus  search  results   (using  google  analytics  events).  for  each  month,  we  calculated  the  number  of  problems  reported   per  1,000  searches  and  per  1,000  full-­‐text  clicks.  graphed  over  time,  the  number  of  problem   reports  in  both  categories  shows  an  overall  decline.  see  figures  6  and  7.     figure  6.  problems  reported  per  1,000  articlesplus  searches  (june  2011–april  2014)     figure  7.  problems  reported  per  1,000  articlesplus  full-­‐text  clicks  (march  2012-­‐april  2014)       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   65   our  active  work  to  update  the  summon  and  360  link  knowledge  bases  began  in  june  2011.  the   change  to  summon  direct  linking  happened  on  february  27,  2012,  at  a  time  when  we  were   particularly  dissatisfied  with  the  volume  of  problems  reported.  we  felt  the  poor  quality  of   openurl  resolution  was  a  strong  argument  in  favor  of  activating  summon  direct  linking.  we   believe  this  change  led  to  a  noticeable  improvement  in  the  number  of  problems  reported  per   1,000  searches  (see  figure  6).  we  do  not  have  data  for  clicks  on  the  full-­‐text  links  in  our   articlesplus  interface  prior  to  march  2012,  but  do  know  that  reports  per  1,000  full-­‐text  clicks  have   been  on  the  decline  as  well  (see  figure  7).   findings   summary  of  random-­‐sample  testing  of  link  success     in  early  2013  we  tested  linking  from  articlesplus  to  gather  data  on  the  effectiveness  of  the  linking   and  to  attempt  to  determine  if  there  were  any  month-­‐to-­‐month  changes  in  the  effectiveness  that   could  indicate  persistent  changes  in  linking  quality.  in  this  section  we  will  review  the  data   collected  from  the  four  samples  used  in  this  testing.  we  will  discuss  the  different  paths  to  full  text,   direct  linking  vs.  openurl  linking  through  360  link,  and  their  relative  effectiveness.  we  will  also   discuss  the  reasons  we  found  for  links  to  not  lead  to  full  text.   paths  to  full-­‐text  access   as  shown  below  (see  table  1)  most  of  the  records  tested  in  summon  used  direct  linking  to   attempt  to  reach  the  full  text.  the  percentage  varied  with  each  sample  tested  but  they  ranged  from   61%  to  70%.  the  remaining  records  used  360  link  to  attempt  to  reach  the  full  text.  most  of  the   time  when  360  link  was  used,  1-­‐click  was  also  used  to  reach  the  full  text.  between  direct  linking   and  1-­‐click  about  93%  to  94%  of  the  time  an  attempt  was  made  to  lead  users  directly  to  the  full   text  of  the  article  without  first  going  through  the  360  link  menu.     
type of linking | sample 1 november 2012 | sample 2 december 2012 | sample 3 january 2013 | sample 4 january 2013
direct linking | 205 (68.3%) | 210 (70.0%) | 184 (61.3%) | 190 (63.3%)
360 link/1-click | 77 (25.7%) | 70 (23.3%) | 98 (32.7%) | 87 (29.0%)
360 link/360 link menu | 18 (6.0%) | 20 (6.7%) | 18 (6.0%) | 23 (7.7%)
total | 300 | 300 | 300 | 300

table 1. type of linking

attempts to reach the full text through direct linking and 1-click were rather successful. in the testing, we were able to reach full text through those methods from 79% to about 84% of the time (see table 2). the remaining cases were situations where direct linking/1-click did not lead directly to the full text or where we reached the 360 link menu.

path to full text | sample 1 november 2012 | sample 2 december 2012 | sample 3 january 2013 | sample 4 january 2013
direct linking | 197 (65.7%) | 204 (68.0%) | 173 (57.7%) | 185 (61.7%)
360 link/1-click | 45 (15.0%) | 47 (15.7%) | 64 (21.3%) | 55 (18.3%)
total out of 300 | 242 (80.7%) | 251 (83.7%) | 237 (79.0%) | 240 (80.0%)

table 2. percentage of citations leading directly to full text

table 3 contains the same data but adjusted to remove results that summon correctly indicated were citation-only. instead of calculating the percentages based on the full 300-citation samples, they are calculated based on the sample minus the citation-only records. the last row shows the number of records excluded from the full samples.

path to full text | sample 1 november 2012 | sample 2 december 2012 | sample 3 january 2013 | sample 4 january 2013
direct linking | 197 (65.9%) | 204 (69.2%) | 173 (59.0%) | 185 (62.5%)
360 link/1-click | 45 (15.1%) | 47 (15.9%) | 64 (21.8%) | 55 (18.6%)
total | 242 (80.9%) | 251 (85.1%) | 237 (80.9%) | 240 (81.1%)
records excluded | 1 | 5 | 7 | 4

table 3. percentage of citations leading directly to full text, excluding citation-only results

link failures with summon direct linking and 360 link 1-click

the next two tables show the results of linking for the records that used direct linking and the citations that used 1-click through 360 link. records that used direct linking were more likely to lead testers to full text than 360 link with 1-click. for the four samples, direct linking led to full text more than 90% of the time, while 1-click led to full text from about 58% to about 67% of the time.

for those records using direct linking where direct linking did not lead directly to the text, the result was usually a page that did not have a link to full text (see table 4).

result | sample 1 nov. 2012 (n = 205) | sample 2 dec. 2012 (n = 210) | sample 3 jan. 2013 (n = 184) | sample 4 jan. 2013 (n = 190)
full text/page with full-text link | 197 (96.1%) | 204 (97.1%) | 173 (94.0%) | 185 (97.4%)
abstract/citation only | 6 (2.9%) | 5 (2.4%) | 6 (3.3%) | 5 (2.6%)
unable to access full text through available full-text link | 1 (0.5%) | 1 (0.5%) | 3 (1.6%) | 0 (0.0%)
error and no full-text link on target | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
listing of volumes/issues | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 0 (0.0%)
wrong article | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 0 (0.0%)
minor results14 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)

table 4. results with direct linking

for 360 link with 1-click, the results that did not lead to full text were more varied (see table 5). the top reasons for failure included the link leading to an error indicating the article was not available even though full text for the article was available on the site, the link leading to a list of search results, and the link leading to the table of contents for the journal issue or book. in the last case, most of those results were book chapters for which 360 link only generated a link to the main page for the book instead of a link to the chapter.

result | sample 1 nov. 2012 (n = 77) | sample 2 dec. 2012 (n = 70) | sample 3 jan. 2013 (n = 98) | sample 4 jan. 2013 (n = 87)
full text/page with full-text link | 45 (58.4%) | 47 (67.1%) | 64 (65.3%) | 55 (63.2%)
table of contents | 12 (15.6%) | 6 (8.6%) | 10 (10.2%) | 6 (6.9%)
error but full text available | 6 (7.8%) | 11 (15.7%) | 10 (10.2%) | 18 (20.7%)
results list | 6 (7.8%) | 2 (2.9%) | 10 (10.2%) | 4 (4.6%)
error and no full-text link on target | 6 (7.8%) | 1 (1.4%) | 2 (2.0%) | 2 (2.3%)
wrong article | 1 (1.3%) | 1 (1.4%) | 1 (1.0%) | 2 (2.3%)
other | 1 (1.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
abstract/citation only | 0 (0.0%) | 0 (0.0%) | 1 (1.0%) | 0 (0.0%)
unable to access full text through available full-text link | 0 (0.0%) | 1 (1.4%) | 0 (0.0%) | 0 (0.0%)
search box | 0 (0.0%) | 1 (1.4%) | 0 (0.0%) | 0 (0.0%)
minor results15 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)

table 5. results with 360 link: citations using 1-click

link analysis for all 360 link clicks

unlike the above tables, which show the results on a citation basis, the table below shows the results for all links produced by 360 link (see table 6). this includes the following:

1. links used for 1-click.
2. links in the 360 link menu that were not used for 1-click when 360 link attempted to link to full text using 1-click.
3. links in the 360 link menu where clicking the link in summon led directly to the 360 link menu instead of using 1-click.
result | sample 1 nov. 2012 (n = 167) | sample 2 dec. 2012 (n = 158) | sample 3 jan. 2013 (n = 184) | sample 4 jan. 2013 (n = 172)
full text/page with full-text link | 81 (48.5%) | 84 (53.2%) | 103 (56.0%) | 87 (50.6%)
abstract/citation only | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 0 (0.0%)
unable to access full text through available full-text link | 0 (0.0%) | 1 (0.6%) | 0 (0.0%) | 1 (0.6%)
error but full text available | 9 (5.4%) | 14 (8.9%) | 17 (9.2%) | 23 (13.4%)
error and full text not accessible through full-text link on target | 1 (0.6%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
error and no full-text link on target | 10 (6.0%) | 1 (0.6%) | 6 (3.3%) | 5 (2.9%)
failed to find doi through link in 360 link menu | 3 (1.8%) | 5 (3.2%) | 5 (2.7%) | 8 (4.7%)
main journal page | 22 (13.2%) | 24 (15.2%) | 17 (9.2%) | 15 (8.7%)
other | 2 (1.2%) | 0 (0.0%) | 1 (0.5%) | 2 (1.2%)
360 link menu with no full-text links | 0 (0.0%) | 2 (1.3%) | 3 (1.6%) | 3 (1.7%)
results list | 9 (5.4%) | 4 (2.5%) | 10 (5.4%) | 3 (1.7%)
search box | 6 (3.6%) | 7 (4.4%) | 5 (2.7%) | 8 (4.7%)
table of contents | 12 (7.2%) | 6 (3.8%) | 10 (5.4%) | 9 (5.2%)
listing of volumes/issues | 9 (5.4%) | 9 (5.7%) | 5 (2.7%) | 6 (3.5%)
wrong article | 3 (1.8%) | 1 (0.6%) | 1 (0.5%) | 2 (1.2%)

table 6. results with 360 link: all links produced by 360 link

in addition to recording what happened, we attempted to determine why links failed to reach full text. even though direct linking is very effective, it is not 100% effective in linking to full text. when excluding records that indicated that only the citation, not full text, would be available through summon, most of the problems were due to incorrect information in summon (see table 7): either the link produced by summon incorrectly led to an error or an abstract when full text was available on the target site, or summon incorrectly indicated that access to full text was available.

reason | sample 1 nov. 2012 (n = 8) | sample 2 dec. 2012 (n = 6) | sample 3 jan. 2013 (n = 11) | sample 4 jan. 2013 (n = 5)
citation-only record in summon | 1 (12.5%) | 3 (50.0%) | 4 (36.4%) | 1 (20.0%)
incomplete target collection | 1 (12.5%) | 0 (0.0%) | 1 (9.1%) | 1 (20.0%)
incorrect coverage in knowledgebase | 0 (0.0%) | 0 (0.0%) | 2 (18.2%) | 0 (0.0%)
summon has incorrect link | 3 (37.5%) | 1 (16.7%) | 2 (18.2%) | 2 (40.0%)
summon incorrectly indicating available access to full text | 3 (37.5%) | 2 (33.3%) | 2 (18.2%) | 1 (20.0%)

table 7. reasons for linking failure to link to full text through direct linking

table 8 shows the reasons links generated by 360 link and used for 1-click did not lead to full text. most of the failures were caused by three general problems:

1. incorrect metadata in summon
2. incorrect syntax in the article/chapter link generated by 360 link
3. target site does not support linking to the article/chapter level

reason | sample 1 nov. 2012 (n = 32) | sample 2 dec. 2012 (n = 23) | sample 3 jan. 2013 (n = 34) | sample 4 jan. 2013 (n = 32)
incorrect metadata in the summon openurl | 2 (6.3%) | 4 (17.4%) | 3 (8.8%) | 4 (12.5%)
incomplete metadata in the summon openurl | 0 (0.0%) | 2 (8.7%) | 0 (0.0%) | 0 (0.0%)
difference in metadata between summon and the target | 1 (3.1%) | 5 (21.7%) | 0 (0.0%) | 2 (6.3%)
inaccurate data in knowledgebase | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (3.1%)
incorrect coverage in knowledgebase | 0 (0.0%) | 1 (4.3%) | 0 (0.0%) | 0 (0.0%)
link resolver insufficiency | 2 (6.3%) | 0 (0.0%) | 1 (2.9%) | 0 (0.0%)
incorrect syntax in the article/chapter link generated by 360 link | 6 (18.8%) | 3 (13.0%) | 10 (29.4%) | 7 (21.9%)
target site does not support linking to article/chapter level | 11 (34.3%) | 4 (17.4%) | 5 (14.7%) | 6 (18.8%)
article not individually indexed | 0 (0.0%) | 1 (4.3%) | 3 (8.8%) | 5 (15.6%)
target error in targeturl translation | 0 (0.0%) | 0 (0.0%) | 5 (14.7%) | 3 (9.4%)
incomplete target collection | 8 (25.0%) | 1 (4.3%) | 1 (2.9%) | 3 (9.4%)
incorrect metadata on the target site | 0 (0.0%) | 1 (4.3%) | 0 (0.0%) | 1 (3.1%)
citation-only record in summon | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
cookie | 2 (6.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
item does not appear to have a doi | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
miscellaneous | 0 (0.0%) | 0 (0.0%) | 4 (11.8%) | 0 (0.0%)
unknown | 0 (0.0%) | 1 (4.3%) | 2 (5.9%) | 0 (0.0%)

table 8. reasons for linking failure to link to full text through 1-click

broadening our view of 360 link to include all links generated by 360 link during the testing, not only the ones used by 1-click (see table 9), we see more causes of failure than with 1-click. most of the failures were caused by five general problems:

1. incorrect metadata in summon.
2. link resolver insufficiency. we mostly used this classification when 360 link only provided links to the main journal page or database page instead of links to the article and we thought it might have been possible to generate a link to the article. sometimes this was due to configuration changes that we could have made, and sometimes it was because 360 link would only create article links if particular metadata was available, even if other sufficient identifying metadata was present.
3. incorrect syntax in the article/chapter link generated by 360 link.
4. target site does not support linking to the article/chapter level.
5. miscellaneous. most of the links that fell in this category were ones that were intended to go to the main journal page by design. these were for journals that are not in vendor-specific packages in the knowledgebase but in large general packages with many journals on different platforms. because there is no common linking syntax, article-level linking is not possible. this includes packages containing open-access titles such as the directory of open access journals (doaj) and packages of subscription titles that are not listed in vendor-specific packages in the knowledgebase.
reason | sample 1 nov. 2012 (n = 86) | sample 2 dec. 2012 (n = 74) | sample 3 jan. 2013 (n = 81) | sample 4 jan. 2013 (n = 89)
incorrect metadata in the summon openurl | 9 (10.5%) | 5 (6.8%) | 4 (4.9%) | 8 (9.0%)
incomplete metadata in the summon openurl | 0 (0.0%) | 2 (2.7%) | 1 (1.2%) | 3 (3.4%)
difference in metadata between summon and the target | 1 (1.2%) | 6 (8.1%) | 2 (2.5%) | 2 (2.2%)
inaccurate data in knowledgebase | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) | 5 (5.6%)
incorrect coverage in knowledgebase | 3 (3.5%) | 1 (1.4%) | 2 (2.5%) | 1 (1.1%)
link resolver insufficiency | 20 (23.3%) | 15 (20.3%) | 9 (11.1%) | 8 (9.0%)
incorrect syntax in the article/chapter link generated by 360 link | 7 (8.1%) | 3 (4.1%) | 10 (12.3%) | 11 (12.4%)
target site does not support linking to article/chapter level | 17 (19.8%) | 6 (8.1%) | 9 (11.1%) | 10 (11.2%)
article not individually indexed | 0 (0.0%) | 1 (1.4%) | 3 (3.7%) | 5 (5.6%)
target error in targeturl translation | 1 (1.2%) | 3 (4.1%) | 7 (8.6%) | 3 (3.4%)
incomplete target collection | 11 (12.8%) | 2 (2.7%) | 5 (6.2%) | 5 (5.6%)
incorrect metadata on the target site | 0 (0.0%) | 1 (1.4%) | 0 (0.0%) | 1 (1.1%)
citation-only record in summon | 0 (0.0%) | 2 (2.7%) | 3 (3.7%) | 3 (3.4%)
cookie | 2 (2.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
item does not appear to have a doi | 2 (2.3%) | 4 (5.4%) | 5 (6.2%) | 7 (7.9%)
miscellaneous | 13 (15.1%) | 22 (29.7%) | 18 (22.2%) | 17 (19.1%)
unknown | 0 (0.0%) | 1 (1.4%) | 2 (2.5%) | 0 (0.0%)

table 9. reasons for linking failure to link to full text for all 360 link links

comparison of user reports and random samples

when we look at user-reported problems during the same period over which we conducted our manual process (november 1, 2012–january 29, 2013), we see that users reported a problem roughly 0.2% of the time (0.187% of searches resulted in a problem report, while 0.228% of full-text clicks resulted in a problem report). see table 10.

sample period | problems reported | articlesplus searches | mget it clicks | problems reported per search | problems reported per mget it click
11/1/2012–11/30/2012 | 225 | 111,062 | 95,218 | 0.203% | 0.236%
12/1/2012–12/31/2012 | 105 | 74,848 | 58,346 | 0.140% | 0.180%
1/1/2013–1/29/2013 | 100 | 44,204 | 34,692 | 0.226% | 0.288%
overall | 430 | 230,114 | 188,256 | 0.187% | 0.228%

table 10. user problem reports during the sample period

the number of user-reported errors is significantly less than what we found through our systematic sampling (see table 2). where the error rate based on user reports would be roughly 0.2%, the more systematic approach showed a 20% error rate. relying solely on user reports of errors to judge the reliability of full-text links dramatically underreports true problems by a factor of 100.

conclusions and next steps

comparison of user reports to random sample testing indicates a significant underreporting of problems on the part of users. while we have not conducted similar studies across other vendor databases, we suspect that user-generated reports likewise significantly lag behind true errors.
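the arithmetic behind the "factor of 100" comparison can be checked quickly from the published totals; the short calculation below uses the overall figures from table 10 and an approximate 20% failure rate taken from tables 2 and 3 (the exact factor depends on whether searches or full-text clicks serve as the denominator, but it lands near 100 either way).

```python
# quick check of the underreporting comparison, using totals from table 10
# and the roughly 80% success rate from tables 2 and 3.
reports = 430
searches = 230_114
fulltext_clicks = 188_256

rate_per_search = reports / searches          # about 0.187%
rate_per_click = reports / fulltext_clicks    # about 0.228%
sampled_failure_rate = 1 - 0.80               # about 20%

print(f"reported per search: {rate_per_search:.3%}")
print(f"reported per click : {rate_per_click:.3%}")
print(f"factor (searches)  : {sampled_failure_rate / rate_per_search:.0f}")  # ~107
print(f"factor (clicks)    : {sampled_failure_rate / rate_per_click:.0f}")   # ~88
```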
future  research  in  this  area  is  recommended.     the  number  of  problems  discovered  in  full-­‐text  items  that  are  linked  via  an  openurl  is   discouraging;  however,  the  ability  of  the  summon  discovery  service  to  provide  accurate  access  to   full  text  is  an  overall  positive  because  of  its  direct  link  functionality.  more  than  95%  of  direct-­‐ linked  articles  in  our  research  led  to  the  correct  resource  (table  3).  one-­‐click  (openurl)   resolution  was  noticeably  poorer,  with  about  60%  of  requests  leading  directly  to  the  correct  full-­‐ text  item.  more  alarming,  we  found  that,  of  full-­‐text  requests  linked  through  an  openurl,  a  large       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   75   portion—20%—fail.  the  direct  links  (the  result  of  publisher-­‐discovery  service  negotiations)  are   much  more  effective.  this  discourages  us  from  feeling  any  complacency  about  the  effectiveness  of   our  openurl  link  resolution  tools.  the  effort  spent  maintaining  our  link  resolution  knowledge   base  does  not  make  a  long-­‐term  difference  in  the  link  resolution  quality.     based  on  the  data  we  have  collected,  it  would  appear  that  more  work  needs  to  be  done  if  openurl   is  to  continue  as  a  working  standard.  while  our  data  shows  that  direct  linking  offers  improved   service  for  the  user  as  an  immediate  reward,  we  do  feel  some  concern  about  the  longer-­‐term  effect   of  closed  and  proprietary  access  paths  on  the  broader  scholarly  environment.  from  the  library’s   perspective,  the  trend  to  direct  linking  creates  the  risk  of  vendor  lock-­‐in  because  the  vendor-­‐ created  direct  links  will  not  work  after  the  library’s  business  relationship  with  the  vendor  ends.   an  openurl  is  less  tightly  bound  to  the  vendor  that  provided  it.  this  lock-­‐in  increases  the  cost  of   changing  vendors.  the  emergence  of  direct  links  is  a  two-­‐edged  sword:  users  gain  reliability  but   libraries  lose  flexibility  and  the  ability  to  adapt.   the  impetus  for  improving  openurl  linking  must  come  from  libraries  because  vendors  do  not   have  a  strong  incentive  to  take  the  lead  in  this  effort,  especially  when  it  interferes  with  their   competitive  advantage.  we  recommend  that  libraries  collaborate  more  actively  on  identifying   patterns  of  failure  in  openurl  link  resolution  and  remedies  for  those  issues  so  that  openurl   continues  as  a  viable  and  open  method  for  full-­‐text  access.  with  more  data  on  the  failure  modes   for  openurl  transactions,  libraries  and  content  providers  may  be  able  to  implement  systematic   improvements  in  standardized  linking  performance.  we  hope  that  the  methods  and  data  we  have   presented  form  a  helpful  beginning  step  in  this  activity.   acknowledgement   the  authors  thank  kat  hagedorn  and  heather  shoecraft  for  their  comments  on  a  draft  of  this   manuscript.   references     1.     niso/uksg  kbart  working  group,  kbart:  knowledge  bases  and  related  tools,  january  2010,   http://www.uksg.org/sites/uksg.org/files/kbart_phase_i_recommended_practice.pdf.     2.     
national  information  standards  organization  (niso),  “ansi/niso  z39.88  -­‐  the  openurl   framework  for  context-­‐sensitive  services,”  may  13,  2010,   http://www.niso.org/kst/reports/standards?step=2&project_key=d5320409c5160be4697dc 046613f71b9a773cd9e.     3.     adam  chandler,  glen  wiley,  and  jim  leblanc,  “towards  transparent  and  scalable  openurl   quality  metrics,”  d-­‐lib  magazine  17,  no.  3/4  (march  2011),   http://dx.doi.org/10.1045/march2011-­‐chandler.       information  technology  and  libraries  |  march  2015   76     4.     ibid.   5.     national  information  standards  organization  (niso),  improving  openurls  through  analytics   (iota):  recommendations  for  link  resolver  providers,  april  26,  2013,   http://www.niso.org/apps/group_public/download.php/10811/rp-­‐21-­‐2013_iota.pdf.   6.     niso/uksg  kbart  working  group,  kbart:  knowledge  bases  and  related  tools.   7.     oliver  pesch,  “improving  openurl  linking,”  serials  librarian  63,  no.  2  (2012):  135–45,   http://dx.doi.org/10.1080/0361526x.2012.689465.   8     jason  price  and  cindi  trainor,  “chapter  3:  digging  into  the  data:  exposing  the  causes  of   resolver  failure,”  library  technology  reports  46,  no.  7  (october  2010):  15–26.   9.     ibid.,  26.   10.    xiaotian  chen,  “broken-­‐link  reports  from  sfx  users,”  serials  review  38,  no.  4  (december   2012):  222–27,  http://dx.doi.org/10.1016/j.serrev.2012.09.002.     11.    lois  o’neill,  “scaffolding  openurl  results,”  reference  services  quarterly  14,  no.  1–2  (2009):   13–35,  http://dx.doi.org/10.1080/10875300902961940.   12.    http://www.lib.umich.edu/.  see  the  articlesplus  tab  of  the  search  box.   13.    one  problem  we  had  in  testing  was  that  log  data  for  february  2013  was  not  preserved.  this   would  have  been  used  to  build  the  sample  tested  in  april  2013.  to  get  around  this  we  decided   to  take  two  samples  from  the  january  2013  log.   14.    the  “minor  results”  row  is  a  combination  of  all  results  that  did  not  represent  at  least  0.5%  of   the  records  using  direct  linking  for  at  least  one  sample.  this  includes  the  following  results:   error  but  full  text  available,  error  and  full  text  not  accessible  through  full  text  link  on  target,   main  journal  page,  360  link  menu  with  no  full  text  links,  results  list,  search  box,  table  of   contents,  and  other.   15.   the  “minor  results”  row  is  a  combination  of  all  results  that  did  not  represent  at  least  0.5%  of   the  records  using  360  link  for  at  least  one  sample.  this  includes  the  following  results:  error   and  full  text  not  accessible  through  full  text  link  on  target,  main  journal  page,  360  link  menu   with  no  full  text  links,  listing  of  volumes/issues. critical success factors for integrated library system implementation in academic libraries: a qualitative study shea-tinn yeh and zhiping walter information technology and libraries | september 2016 27 abstract integrated library systems (ilss) support the entire business operations of an academic library from acquiring and processing library resources to making them available to user communities and preserving them for future use. as libraries’ needs evolve, there is a pressing demand for libraries to migrate from one generation of ils to the next. 
this complex migration process often requires significant financial and personnel investment, but its success is by no means guaranteed. we draw on enterprise resource planning and critical success factors (csfs) literature to identify the most salient csfs for ils migration success through a qualitative study with four cases. we found that careful selection process, top management involvement, vendor support, project team competence, staff user involvement, interdepartmental communication, data analysis and conversion, project management and project tracking, staff user education and training, and managing staff user emotions are the most salient csfs that determine the success of a migration project. introduction the first generation of integrated library systems (ilss) were developed specifically for library operations focused on the selection, acquisition, cataloging, and circulation of print collections. as libraries’ nonprint materials steadily grow, the print-centric ilss became less and less efficient in supporting libraries’ daily operations. recent years have seen an emergence of a new generation of ilss, commonly called library services platforms (lsps), that takes into account the management of both print and electronic collections. lsps take advantage of cloud computing and network advancements to provide economies of scale and to allow a library to better share data with other libraries. furthermore, lsps unify the entire suite of library operations to provide efficient workflow at the back end and advanced online discovery tools at the front end for the library.1 given the claimed benefits of the emerging lsp and the fact that vendors are phasing out support for their legacy ilss, we project that more libraries will be migrating to lsps as the systems mature and libraries’ needs evolve. shea-tinn yeh (sheila.yeh@du.edu) is assistant professor and library digital infrastructure and technology coordinator, university of denver libraries. zhiping walter (zhiping.walter@ucdenver.edu) is associate professor, business school, university of colorado denver. mailto:sheila.yeh@du.edu mailto:zhiping.walter@ucdenver.edu) critical success factors for integrated library system implementation in academic libraries: a qualitative study | yeh and walter |doi:10.6017/ital.v35i2.9255 28 migrating from one generation of ils to another is a significant initiative that affects the entire library operation.2 because of its scale and complexity, the migration project is not always smooth and often fraught with problems, with some projects falling behind migration completion schedule.3, 4, 5 in addition, committing to a new system often results in significant financial and personnel costs for an academic library.6 understandably, there is considerable trepidation before, during, and after the migration process.6, 7 what contributes to a smooth migration process and a successful migration project? this is an urgent question at present and an enduring question for the future. this is because, as libraries continue to evolve, their operations and management needs are destined to outgrow functionalities of the current generation of ils. therefore migration to a new generation of ils is destined to occur periodically for a library. in this research, we study critical success factors (csfs) that contribute to a smooth migration process and a successful migration project defined as on-time and on-budget project completion and a smooth implementation process. 
to achieve our research goal, we anchor our theoretical foundation in the enterprise resource planning (erp) system-implementation literature. erp is “business process management software that allows an organization to use a system of integrated applications to manage the business and automate many back office functions related to technology, services and human resources.”9 since a complete ils is formed from a suite of integrated functions to manage a broad range of library processes, it is in fact an erp for libraries.10 a literature review of csfs for erp system implementation success revealed more than ninety csf factors.11, 12 the contribution of our research is in identifying, through qualitative research method, the most salient csfs that contribute to the success of a library system migration project from one generation of ils to another. results of this study can help library administrators to improve the chance of success and decrease the level of anxiety during a migration project now and in the future. the remainder of the article is organized as follows: section 2 reviews erp, ils, lsp, csfs, and information system success measurement described in the literature. section 3 describes the guided interviews that have been conducted to identify the csfs, the results, and the analysis of the results. finally, we offer conclusions and limitations as well as recommend future work. literature review erp is business-management software comprising a suite of integrated applications that an organization can use to collect, store, manage, and interpret data from many business activities, including product planning, manufacturing, service delivery, marketing and sales, and human resources. the core idea of an erp system is to integrate both the data and the process dimensions in a business so that transactions can be monitored and analyzed for planning and strategic purposes.13 modules of the system cover different functions within a company and are linked so users can see what is happening in all areas of the company. an erp system can improve a business’s back offices as well as its front-end functions, with both operational and strategic benefits.14 some of the benefits include reliability in information access, data and operations information technology and libraries | september 2016 29 redundancy, data retrieval and reporting efficiency, easy module extension, and internet commerce capability. just like an erp system for a business, a complete library management solution comprises a suite of integrated applications that manage a broad range of library processes including circulation, acquisition, cataloging, electronic resources management, and system administration. lsps, the current generation of library management systems, are designed to manage both physical and digital collections. lsps follow the service-oriented architecture (soa) and can be deployed through multitenant software as a service (saas) distribution model.15 in addition to supporting all library functions, lsps integrate with other university systems, such as student registry and finance, and provide front-end for library patrons in a cloud environment that leverages a global network of systems for discovery of a wide array of resources.16 since an lsp is essentially an enterprise system for library functions, csfs of erp implementation success could guide lsp implementation. 
csfs are conditions that must be met for an implementation to be successful.17 more than ninety csfs have been identified for erp implementation success.18, 19 those csfs have been classified according to various schemes, but we found the strategic versus tactical classification most relevant to the library context.20 strategic factors address the big picture involving the breakdown of goals into do-able items. tactical factors, on the other hand, are the methods to accomplish the doable items that lead to achieving the goals.21 by examining the entire list of csfs from both the strategic and the tactical perspectives, we identify top csfs for library-management-solution implementation and migration success, defined as on-time and on-budget delivery as well as smooth implementation process,22, 23 through a qualitative study. method we conducted semi-structured interviews with open-ended questions to identify the most salient csfs for implementation success. since we needed to reduce more than ninety csfs in the literature to a list of most salient csfs in the library context and to potentially identify new csfs, a qualitative-interview approach was more suitable than a quantitative-survey approach. a twostep process was used to arrive at the final list. first, we evaluated all csfs in the literature and identified a subset of csfs that might be most relevant for library-systems implementation.24 second, this csfs subset was used to develop an interview guide for semistructured interviews conducted later to further reduce this subset. open-ended questions were also used during the interviews to elicit additional csfs. an institutional review board (irb) application was submitted and approved. the result of this two-step process is a list of ten csfs discussed in the results section, with nine csfs coming from our initial list and one csf emerging from the interviews. the criterion for recruiting study libraries is that the library has implemented a new lsp within the last three years. this is because the lsp is the current generation of ils, and it is only within the last few years that various lsp vendors began to promote and implement the lsps. a critical success factors for integrated library system implementation in academic libraries: a qualitative study | yeh and walter |doi:10.6017/ital.v35i2.9255 30 recruitment email was sent to libraries listed as adopters on various vendors’ press release sites. participating recipients referred the interview request to appropriate migration team members whom we later contacted to schedule interviews. this resulted in up to five people from each participating library being interviewed in person or via skype. their positions are listed in table 1. interviews were recorded, transcribed, and cleaned. emails to the same interviewees were used for follow-up questions as needed. after interviews with each library, qualitative data analysis was performed to identify csfs that emerged from the interviews. interviews continued until no new csfs emerged in the last interview. in total, staff from four libraries were interviewed between october 2014 and march 2015 about their implementation process and experience from staff user perspective. the design and implementation of discovery public interface experience was not part of this inquiry. table 1 summarizes characteristics of the four libraries. case numbers instead of university names are used to protect identities of participating libraries and interviewees. 
case 1: private university; student population 11,000+; operating budget 11 million; 150 library employees; project length 6 months; ils used before: millennium; lsp implemented: sierra; reasons for migration: discontinued vendor system support, servers out of warranty, vendor gave incentives; positions of interviewees: head of systems and module experts.
case 2: public university; student population 32,000+; operating budget 13 million; 400 library employees; project length 9 months; ils used before: aleph; lsp implemented: alma; reasons for migration: outdated servers, servers out of warranty; positions of interviewees: heads of systems.
case 3: public university; student population 2,400+; operating budget 1.5 million; 17 library employees; project length 6 months; ils used before: evergreen; lsp implemented: sierra; reasons for migration: in need of a robust system that provides a discovery layer; positions of interviewees: director of library and head of systems.
case 4: private university; student population 2,700+; operating budget 1.3 million; 13.5 library employees; project length 9 months; ils used before: voyager; lsp implemented: sierra; reasons for migration: in need of a modern system demonstrating the library's moving with the times; positions of interviewees: director of library.
table 1. summary of case study site characteristics.
results the following csfs emerged from the interviews: careful selection process, top management involvement, vendor support, project team competence, staff user involvement, interdepartmental communication, data analysis and conversion, project management and project tracking, staff user education and training, and managing staff user emotions. we discuss each csf next. careful selection process most ilss are commercial, off-the-shelf software systems that can vary dramatically in functionality from system to system.25 for example, some packages are more suitable for large institutions while others are more suitable for smaller ones. to mitigate risks in productivity or transaction loss and to minimize system and implementation costs, a library needs to determine the best "fitness-of-use" system. such a determination is the outcome of a careful selection process. although there is no commonly accepted technique, method, or tool for this process, all selection processes share common key steps suggested in the literature.26 they are the following, as applied to library-systems selection: define stakeholder requirements, search for products, create a short list of the most promising candidates based on a set of "must-have" requirements, evaluate the candidates on the short list, and analyze the evaluation data to make a selection. in addition, if the server option was chosen instead of the cloud option, the selected hardware needs to satisfy system requirements for the final configuration. a careful selection process emerged as a csf that affected the implementation outcome for all four libraries. all cases were migrating to an lsp system. some systems can be offered as locally installed systems, which require appropriate in-house it and hardware capabilities. case 1 did not consider its it capability when deciding on a turnkey system. as a result, the library experienced difficulties in setting up the infrastructure in-house during the implementation. each of the other three cases considered the candidate system's compatibility with the legacy system, the match between library needs and system functionalities, system maturity, migration costs, data storage needs, and vendor support before and during the implementation as well as continued vendor support throughout the life of the new system. even though each of the three libraries arrived at its system choice differently, on reflection, interviewees expressed relief and satisfaction with their decisions to choose their respective systems. "we were in the position where our servers were out of date and warranty, needed to be replaced. the servers were too small.
we had sizing issues and we couldn't update to the most recent version of aleph . . . alma being a cloud based solution will eliminate our need to be 'in the server business.'" (case 2). "we went through a very extensive formal process to select this system." (case 3) top management involvement successful implementation requires strong leadership by executives who understand, support, and champion the project.27 when this involvement trickles down through the organizational hierarchy, it leads to an organizational commitment, which is required for implementation success in complex projects.28, 29 since library-system implementation is a complex project that (if done correctly) will transform the entire library and reposition it for better efficiency, strong leadership is critical as well. in all four cases, top management were involved in the final decisions of their respective system choices. in cases 1 and 2, top management also took charge in securing funding for the migration projects. interviewees stressed that top management support was very important in their respective project implementations. "the top level management took the recommendations from the systems librarians at the time, with the blessing of the council determined whether they want to proceed with the product alma, and had funding conversations with the financial people." (case 2) "we have faculty library committee, faculty governance oversight. we showed them webinars of the products we considered before we signed them, so we have faculty representation on board. we held open forum and were inclusive in our invitations." (case 4) vendor support with a new technology, it is critical to acquire external technical expertise, often from the vendor, to facilitate successful implementation.30 effective vendor support includes adequate and high-quality technical support during and after implementation, sufficient training provided for both the project team and staff users, and positive relationships between all parties in the project.31 additionally, there should be adequate knowledge transfer between the vendor consultants and the clients, which can be achieved by defining roles, achieving shared understanding, and enhancing relationships through competent communication.32, 33 in the case of library-system implementations, vendor support is particularly important because of the complexity of each new generation of the system and the library personnel's knowledge gap in understanding the nuts and bolts of the new system. effective vendor support was identified in each case as a critical success factor determining the implementation outcome, even though the form of vendor support varied from case to case. in case 1, the vendor sent different consultants with various areas of expertise as project managers on the basis of the project phase. in case 2, the vendor sent one consultant who served as the main project manager. in case 3, the vendor provided a project manager and a team of technicians. in case 4, consultants were shared across multiple consortium libraries that were implementing the system at the same time. no matter how vendor support was provided, it was essential for implementation success, as indicated by interviewees.
“the vendor has been very supportive and provides a group of experts throughout the process, some are knowledgeable in server business while others are skilled project managers.” (case 1) project team competence since library-system migration affects all functional areas of a library, members of the implementation team need to be cross-functional. furthermore, members with both business information technology and libraries | september 2016 33 knowledge and technology knowhow are especially crucial for implementation success.34 competence of vendor consultants assigned to the project also influences implementation success, as discussed earlier. additionally, it is important to have an in-house project leader who champions the project and who has the essential skills and authority to set goals that legitimize change.35 having a competent project team was essential for implementation success for each of our cases. in each case, the vendor provided the project manager and the library provided a co-manager who was a champion figure. other team members came from various functional areas such as acquisition, circulation, cataloging, electronic resources management, and system administration. for example, in case 1, the technology librarian participated as a co-project manager. the projectmanagement team comprised module experts within the library and from functional areas. in addition, the university’s technology services department lent technical support during early stages of implementation when servers need to be set up. the interviewees all stressed the importance of project-team competence. “without the infrastructure knowledge from the university’s technology team and their time and full support to negotiate with the vendor, the migration project would not have been possible.” (case 1) “the university’s it made sure that we are in compliance with campus policies and expectations for securities.” (case 2) staff user involvement it is important that the project team involve staff users early on, otherwise the implementation process may be bumpy. when end users are involved in decisions relating to system selection and implementation, they are more invested in and concerned with the success of the system, which in turn leads to greater system use and user satisfaction.36, 37 as such, it is one of the most cited critical success factors in erp implementation.38 because personal relevance to the system is just as important for library-system implementation, effective staff user involvement with implementation is positively related to implementation success. staff user involvement has emerged as a main success factor in all our cases and contributed to the implementation project outcome. in case 1, staff users were not consulted as to whether an lsp was necessary for the library, although they were informed of the reasons for implementation. additionally, staff users were not involved when the project timetable was negotiated. this lack of early staff user involvement led to considerable stress down the road, which made the implementation process bumpy. the other three cases involved staff users early on; as a result, staff users experienced much less stress and frustration down the road. specifically, in case 2, the staff users were educated about the need for migration through staff meetings, town hall meetings, supervisory meetings, council meetings, and forums. 
many product-demo sessions were conducted for the staff so they would have the knowledge to participate before the final decision was made. there were daily internal newsletters conveying implementation news throughout the implementation months. in case 3, the entire library was involved with the selection of a new system. while the key staff (such as the circulation manager, acquisition manager, and reference manager) had more input than others, everyone offered input about the project. as such, buy-in for the new system was strong among all stakeholders. in case 4, staff users were involved early on through open forums and webinars. the following quotes are examples of interviewee sentiment concerning staff user involvement: "everybody is involved in choosing the system; partially because evergreen had been so problematic. we wanted to make sure that everyone is on board." (case 3) "migration is the most time consuming aspect of the library staff work during the time of the project, without their buy-ins, it is difficult to have a successful project." (case 4) interdepartmental communication the importance of effective communications across functional and departmental boundaries is well known in the information-systems-implementation literature.39 with consultants coming from the vendor, project team members coming from different functional areas, and staff users having different perceptions and understandings of the implementation project, the importance of effective communications between all involved cannot be overstated. communications should start early, be consistent and continuous throughout the various stages of the implementation process, and include a system overview, the rationale for implementation, briefings on process changes, and the establishment of contact points.40 expectations and goals should be communicated to all stakeholders and to all levels of the organization.41 the effectiveness of interdepartmental communication affected the implementation outcome in all our cases. in case 1, the library's project manager was designated to communicate with the vendor when issues arose, such as hardware and software configurations, system backup and use, and task assignments. the formal project plan was established using the web-based basecamp so that team members in different roles with different responsibilities could communicate and work together online. regular meetings were held and emails were exchanged between project team members. however, there was a lack of effective interdepartmental communication with staff who were not on the project team. this resulted in the absence of necessary system testing that would have detected some data-integrity issues. such issues later caused the system to be offline for days, which brought much frustration and stress to everyone. in the other three cases, all actors were well informed through news releases, meetings, presentations, and webinars. concerns were communicated to the project team and addressed in a timely manner. as a result, the level of frustration was very low for those three cases. data analysis and conversion a fundamental requirement for the effectiveness of an erp system is the accuracy of its data,42 and the same is true for a library system.
data types in a legacy ils are often of an outdated format and can differ from the formats supported by a new library system. conversion from one format to another can be an overwhelming process, especially when there is no existing expertise in the library. since migrating legacy data to the new system is essential, effective data analysis for conversion is a critical success factor for implementation success. the smoothness of each of the four implementation cases was related to the project team's data analysis and conversion efforts. in case 1, the library did not spend any effort to analyze, convert, or clean the data. as a result, the system experienced data-integrity issues after it went live. the other three libraries either devoted time to clean and convert the data or had a third party do the data cleaning. as a result, no system issues arose from data-integrity problems. interviewees from case 2 told us, "we elected to freeze the data 30 days sooner in terms of bibliographic data, so that we can do an authority control project with a third party vendor." project management and project tracking according to the erp implementation literature, effective project-management practices are critical for implementation success. such practices include defining clear objectives, establishing a formal implementation plan, designing a realistic work plan, and establishing resource requirements.43 the formal implementation plan needs to identify the modules to be implemented, the tasks to be undertaken, and all technical and nontechnical issues to be considered.44 project progress must be carefully monitored through meetings and reports.45, 46 effective project management and tracking affected the implementation outcome in all our cases. a popular project-management and tracking tool is basecamp, a web-based project management and collaboration tool initially released in 2004.47 it offers discussion boards, to-do lists, file sharing, milestone management, event tracking, and a messaging system that help project teams stay organized and connected despite their different locations. all cases used basecamp for project management and tracking, which contributed to on-time and on-budget project completion for all cases. staff user education and training a new system often frustrates users who do not receive adequate training in its functionalities and use.48 when feeling frustrated and stressed, users may avoid using the system. proper and adequate training will soothe users and eliminate their reluctance to use the new system, which in turn helps realize productivity gains.49, 50 training processes should consider factors such as the training curriculum, user commitment, trainers' personal skills and competence, as well as the training schedule, budget, evaluation, and methods.51 effective staff user training emerged as a critical success factor in all our cases. in case 1, staff users had access to a vendor-supplied preview portal, which simulated system functionalities. staff users were so familiar with the new system by the time the system went live that they were eager to engage with it. in cases 2, 3, and 4, staff users were trained through demo products, online video training, q&a, and on-site training sessions conducted by the vendor.
these training materials and sessions served to ease staff users' feelings of uncertainty and anxiety, as the following quotes show: "the online training videos were provided to all staff in the library and followed up with q&a sessions which members of the committee will host in their respective areas. . . . then ex libris did a week long onsite training workshop serve for the final deep configuration issues. . . . we know that there are staff users who want to be ahead of the game, yet there are always people who don't want to learn until the day before they go live." (case 2) "we have a training package with several onsite visits, each one is for a few days. the trainer focused on one aspect of the system. it was more than watching the videos online. because of the small staff here, almost everyone attended at least one training." (case 3) "the trainers varied with their expertise, we developed fondness for some more than others. the training is functional in nature. the vendor's priority was about trainer availability and to keep the project on time. we became familiar with trainers' expertise; we were able to request the right trainer with the job." (case 4) managing staff user emotions although education and training eases user anxiety, it does not completely eliminate it. emotions felt by users early in the implementation of a new system have important effects on the use of the system later on.52 how to manage staff user anxiety and negative emotions when they appear has emerged as a critical success factor in all our cases, as shown in the following quotes: "there were so many things going on in the library during the migration go-live week. the unknown of the migration success made staff users uncomfortable. should the migration date be decided in consideration of other initiatives, the frustration experienced would have been a lot less and might not have been ignored during the going-live week." (case 1) "the frustration was just change; it was the fact that we have to learn something new. . . . primarily the frustration was handled by the lead." (case 2) "there was a challenge, especially early on, in getting people to engage with the manuals and the literature in documentation. it is as if everyone is being asked to learn a new language. . . . the key relationship between the onsite coordinator and the project manager on the vendor side is important. when those two exchange information and handle frustration diplomatically, this bridge between the two organizations can smooth over a lot of rough feathers on either or both sides." (case 4) this final csf did not come directly from the ninety-plus csfs that we started with, although it aligned closely with the "change management" category.53 this csf emerged mostly from the interview process. summary of results the results of the case studies for each critical factor are summarized in table 2. implementation project outcome is summarized in table 3. an implementation is considered successful if it was completed on time and on budget and if the implementation process was smooth, as reflected in the number and degree of unexpected problems along the way.
critical success factor (case 1 / case 2 / case 3 / case 4):
careful selection process: no / yes / yes / yes
top management involvement: yes / yes / yes / yes
vendor support: yes / yes / yes / yes
project team competence: yes / yes / yes / yes
staff user involvement: no / yes / yes / yes
interdepartmental communication: no / yes / yes / yes
data analysis & conversion: no / yes / yes / yes
project management and tracking: yes / yes / yes / yes
staff user education and training: yes / yes / yes / yes
managing staff user emotions: no / yes / yes / yes
table 2. summary of case study critical success factors findings.
implementation success measure (case 1 / case 2 / case 3 / case 4):
on-time implementation: yes / yes / yes / yes
on-budget implementation: yes / yes / yes / yes
smoothness of implementation: no (staff users experienced data-integrity issues, system downtime, as well as anxiety and stress with the system implementation process) / yes / yes / yes
table 3. summary of case study implementation success measures.
discussion and conclusions the implementation of a new ils is a large-scale undertaking that affects every aspect of a library's operations as well as every staff user's workflow process. as such, it is imperative for library administrators to understand what factors contribute to a successful implementation. our qualitative study shows that there are two categories of csfs: strategic and tactical. from the strategic perspective, top management involvement, vendor support, staff user involvement, interdepartmental communication, and staff user emotion management are critical. from the tactical perspective, project team competence, project management and project tracking, data analysis and conversion, and staff user education and training to break down the technical barrier greatly affect the implementation outcome. in addition, selection of the final system from a variety of choices and options requires careful consideration of both strategic and tactical issues. each factor identified is important in its own right during the implementation process. combined, they complement each other to guide an implementation to success. among the list of csfs identified, the role of staff user emotion management was not identified during the theoretical phase of the study; it only emerged as an important csf during the interviews. top management involvement, vendor support, project team competence, project management and tracking, and staff user education and training are csfs that were somewhat intuitive, and they were implemented by all cases. however, a library may select an end system without careful consideration. it may also be unaware of the importance of involving users early on, the importance of opening clear lines of interdepartmental communication, or the importance of performing data analysis and conversion before the implementation. staff user emotion management, especially, is at risk of being an afterthought in an implementation. by identifying the most salient csfs, this study offers practical contributions to academic library leaders and administrators in understanding how critical success factors play a role in ensuring a smooth and successful ils implementation. although csfs have been extensively studied in the discipline of information-systems management, this is the first study to apply csfs in the library context.
since library management has unique challenges compared to businesses, identifying csfs for library-system-implementation success is important not only for the current migration to lsps but also for future migrations to future generations of ilss as the needs of libraries continue to evolve. as with any empirical research, there are limitations to this study. the number of academic libraries interviewed is small, although no new information was discovered after the fourth interview. the vendors represented in this study are only two of the many in the market providing lsps to libraries. with these aforementioned limitations, the results of this study may not be generalizable to libraries implementing an lsp with vendors other than innovative interfaces and ex libris. additionally, the results may not be generalizable to nonacademic libraries. this research can be extended to validate the proposed csfs quantitatively by performing survey research in academic libraries. studying interactions between the identified factors would offer an even greater contribution. this research can also be replicated in other types of libraries to test whether the findings generalize. in addition, case libraries 3 and 4 both expressed that the lsp changes the public interface that is used by external users, and they wished to have had more opportunities for outreach prior to the implementation. although the design and implementation of the public interface was not considered within the scope of this research, this comment is insightful because it may imply that future studies should consider a project champion to be a critical success factor. the project champion must have the people skills and positional authority to introduce change and achieve buy-in from staff users.54, 55 references 1. richard m. jost, selecting and implementing an integrated library system: the most important decision you will ever make (boston: chandos, 2015). 2. ibid., 3. 3. suzanne julich, donna hirst, and brian thompson, "a case study of ils migration: aleph500 at the university of iowa," library hi tech 21, no. 1 (2003): 44–55, http://dx.doi.org/10.1108/07378830310467391. 4. zahiruddin khurshid, "migration from dobis libis to horizon at kfupm," library hi tech 24, no. 3 (2006): 440–51, http://dx.doi.org/10.1108/07378830610692190. 5. vandana singh, "experiences of migrating to an open-source integrated library system," information technology & libraries 32, no. 1 (2013): 36–53. 6. jost, "selecting and implementing an integrated library system." 7. yongming wang and trevor a. dawes, "the next generation integrated library system: a promise fulfilled," information technology & libraries 31, no. 3 (2012): 76–84. 8. keith kelley, carrie c. leatherman, and geraldine rinna, "is it really time to replace your ils with a next-generation option?" computers in libraries 33, no. 8 (2013): 11–15. 9. vangie beal, "erp—enterprise resource planning," webopedia, http://www.webopedia.com/term/e/erp.html. 10. "library management system," tangient llc, https://libtechrfp.wikispaces.com/library+management+system. 11. christopher p. holland and ben light, "a critical success factors model for erp implementation," ieee software 16, no. 3 (1999): 30–36, http://dx.doi.org/10.1109/52.765784. 12. levi shaul and doron tauber, "critical success factors in enterprise resource planning systems: review of the last decade," acm computing surveys 45, no. 4 (2013): 1–39, http://dx.doi.org/10.1145/2501654.2501669.
13. yahia zare mehrjerdi, "enterprise resource planning: risk and benefit analysis," business strategy series 11, no. 5 (2010): 308–24, http://dx.doi.org/10.1108/17515631011080722. 14. mohammad a. rashid, liaquat hossain, and jon david patrick, "the evolution of erp systems: a historical perspective," in enterprise resource planning: global opportunities and challenges (hershey, pa: idea group, 2002). 15. marshall breeding, "library systems report 2014: competition and strategic cooperation," american libraries 45, no. 5 (2014): 21–33. 16. sharon yang, "from integrated library systems to library management services: time for change?" library hi tech news 30, no. 2 (2013): 1–8, http://dx.doi.org/10.1108/lhtn-02-2013-0006. 17. shahin dezdar, "strategic and tactical factors for successful erp projects: insights from an asian country," management research review 35, no. 11 (2012): 1070–87, http://dx.doi.org/10.1108/14637151111182693. 18. ibid. 19. shahin dezdar and ainin sulaiman, "successful enterprise resource planning implementation: taxonomy of critical factors," industrial management & data systems 109, no. 8 (2009): 1037–52, http://dx.doi.org/10.1108/02635570910991283. 20. sherry finney and martin corbett, "erp implementation: a compilation and analysis of critical success factors," business process management journal 13, no. 3 (2007): 329–47, http://dx.doi.org/10.1108/14637150710752272. 21. f. pearce, business building and promotion: strategic and tactical planning (houston: pearman cooperation alliance, 2004). 22. jennifer bresnahan, "mixed messages," cio (may 16, 1996), 72, http://dx.doi.org/10.1016/j.jchf.2013.07.005. 23. majed al-mashari, abdullah al-mudimigh, and mohamed zairi, "enterprise resource planning: a taxonomy of critical factors," european journal of operational research 146, no. 2 (2003): 352–64, http://dx.doi.org/10.1016/s0377-2217(02)00554-4. 24. shaul and tauber, "critical success factors in enterprise resource planning systems." 25. h. akkermans and k. van helden, "vicious and virtuous cycles in erp implementation: a case study of interrelations between critical success factors," european journal of information systems 11, no. 1 (2002): 35–46, http://dx.doi.org/10.1057/palgrave.ejis.3000418. 26. abdallah mohamed, guenther ruhe, and armin eberlein, "cots selection: past, present, and future" (paper presented at the 14th annual ieee international conference and workshops on the engineering of computer-based systems, 2007), http://dx.doi.org/10.1109/ecbs.2007.28. 27. m. michael umble, elisabeth j. umble, and ronald r. haft, "enterprise resource planning: implementation procedures and critical success factors," european journal of operational research 146, no. 2 (2003): 241–57, http://dx.doi.org/10.1016/s0377-2217(02)00547-7. 28. jim johnson, "chaos: the dollar drain of it project failures," application development trends 2, no. 1 (1995): 41–47.
29. prasad bingi, maneesh k. sharma, and jayanth k. godla, "critical issues affecting an erp implementation," information systems management 16, no. 3 (1999): 7–14, http://dx.doi.org/10.1201/1078/43197.16.3.19990601/313. 30. mary sumner, "critical success factors in enterprise wide information management systems projects," in proceedings of the 1999 acm sigcpr conference on computer personnel research (new york: acm, 1999), http://dx.doi.org/10.1145/299513.299722. 31. eric t. g. wang et al., "the consistency among facilitating factors and erp implementation success: a holistic view of fit," journal of systems & software 81, no. 9 (2008): 1609–21, http://dx.doi.org/10.1016/j.jss.2007.11.722. 32. dong-gil ko, laurie j. kirsch, and william r. king, "antecedents of knowledge transfer from consultants to clients in enterprise system implementations," mis quarterly 29, no. 1 (2005): 59–85. 33. al-mashari, "enterprise resource planning." 34. fiona fui-hoon nah and santiago delgado, "critical success factors for enterprise resource planning implementation and upgrade," journal of computer information systems 46, no. 5 (2006): 99–113. 35. liang zhang et al., "a framework of erp systems implementation success in china: an empirical study," international journal of production economics 98, no. 1 (2005): 56–80, http://dx.doi.org/10.1016/j.ijpe.2004.09.004. 36. ann-marie k. baronas and meryl reis louis, "restoring a sense of control during implementation: how user involvement leads to system acceptance," mis quarterly 12, no. 1 (1988): 111–24. 37. joseph esteves, joan pastor, and joseph casanovas, "a goals/questions/metrics plan for monitoring user involvement and participation in erp implementation projects," ie working paper, march 11, 2004, http://dx.doi.org/10.2139/ssrn.1019991. 38. khaled al-fawaz, zahran al-salti, and tillal eldabi, "critical success factors in erp implementation: a review" (paper presented at the european and mediterranean conference on information systems, dubai, may 25–26, 2008). 39. h. akkermans and k. van helden, "vicious and virtuous cycles in erp implementation: a case study of interrelations between critical success factors," european journal of information systems 11, no. 1 (2002): 35–46, http://dx.doi.org/10.1057/palgrave.ejis.3000418. 40. nancy bancroft, henning seip, and andrea sprengel, implementing sap r/3: how to introduce a large system into a large organisation (greenwich, uk: manning, 1998). 41. nah, "critical success factors."
42. toni m. somers and klara nelson, "the impact of critical success factors across the stages of enterprise resource planning implementations," in proceedings of the 34th hawaii international conference on system sciences, 2001, http://dx.doi.org/10.1109/hicss.2001.927129. 43. shi-ming huang et al., "assessing risk in erp projects: identify and prioritize the factors," industrial management & data systems 104, no. 8 (2004): 681–88, http://dx.doi.org/10.1108/02635570410561672. 44. nah, "erp implementation." 45. umble, "enterprise resources planning." 46. nah, "erp implementation." 47. "basecamp, in a nutshell," basecamp, https://basecamp.com/about/press. 48. nah, "erp implementation." 49. umble, "enterprise resources planning." 50. mo adam mahmood et al., "variables affecting information technology end-user satisfaction: a meta-analysis of the empirical literature," international journal of human-computer studies 52, no. 4 (2000): 751–71, http://dx.doi.org/10.1006/ijhc.1999.0353. 51. iuliana dorobat and floarea nastase, "training issues in erp implementations," accounting & management information systems 11, no. 4 (2012): 621–36. 52. anne beaudry and alain pinsonneault, "the other side of acceptance: studying the direct and indirect effects of emotions on information technology use," mis quarterly 34, no. 4 (2010): 689–710. 53. shaul and tauber, "critical success factors in enterprise resource planning systems." 54. andrew lawrence norton et al., "ensuring benefits realisation from erp ii: the csf phasing model," journal of enterprise information management 26, no. 3 (2013): 218–34, http://dx.doi.org/10.1108/17410391311325207. 55. chong hwa chee, "human factor for successful erp2 implementation," new straits times, july 28, 2003, https://www.highbeam.com/doc/1p1-76161040.html.
enhancing visibility of vendor accessibility documentation samuel kent willis and faye o'reilly information technology and libraries | september 2018 samuel kent willis (samuel.willis@wichita.edu) is assistant professor and technology development librarian and faye o'reilly (faye.oreilly@wichita.edu) is assistant professor and digital resources librarian at wichita state university. abstract with higher education increasingly being online or having online components, it is important to ensure that online materials are accessible for persons with print and other disabilities. library-related research has focused on the need for academic libraries to have accessible websites, in part to reach patrons who are participating in distance-education programs. a key component of a library's website, however, is the materials it avails to patrons through vendor platforms outside the direct control of the library, making it more involved to address accessibility concerns. librarians must communicate the need for accessible digital files to vendors so they will prioritize it.
in much the same way as contracted workers constructing a physical space for a federal or federally funded agency must follow ada standards for accessibility, so software vendors should be required to design virtual spaces to be accessible. a main objective of this study was to determine a method of increasing the visibility of vendor accessibility documentation for the benefit of our users. it is important that we, as service providers for the public good, act as a bridge between vendors and the patrons we serve. introduction the world wide web was developed late in 1989 but reached the public sector the following year and quickly gained prominence.1 around this same time (1990), the americans with disabilities act (ada) was also passed, so when it was written, the role of the web had yet to take shape. websites and online content, while not included specifically in the ada, have been increasingly emphasized when institutions examine the accessibility of their resources for persons with disabilities. more recent legislation, as well as legal-settlement agreements (including with colleges and universities), have included—and even emphasized—the importance of accessible online content. researchers have argued that in requiring facilities to be accessible, the ada must include digital accessibility.2 with higher education increasingly being online or having online components, it is important to ensure that online materials are accessible for persons with print and other disabilities, many of whom may have received more extensive support in primary and secondary schools. unless accessibility is pursued with purpose, the level of education and educational materials available for students with disabilities will be severely limited.3 literature review legislation and existing guidelines equal access to information for all patrons is a foundational goal of libraries. in higher education, accessible information and communications technology allows users of all abilities to focus on learning without undue burden.4 colleges and universities are required by law to provide reasonable accommodations to allow an individual with a disability to participate fully in the programs and activities of the university. according to title ii of the ada, discrimination on the basis of disability by any state or local government and its agencies is strictly prohibited.5 section 504 of the rehabilitation act of 1973 also prohibits discrimination on the basis of disability by any program or activity receiving federal assistance.6 the department of education stated, "public educational institutions that are subject to education's section 504 regulations because they receive federal financial assistance from us are also subject to the title ii regulations because they are public entities (e.g., school districts, state educational agencies, public institutions of vocational education and public colleges and universities)."7 this piece of legislation usually manifests itself in the physical learning space—wheelchair ramps, braille textbook options, interpreters, and more—but finds little application in the digital spaces of a university, especially in the library's online research presence.
this is an alarming revelation; much higher learning today takes place in an online environment, and inaccessible library resources are a contributing factor to the challenges in higher education faced by users with disabilities. to be considered accessible, a digital space, such as a website, online-learning management system, or a research discovery layer, and any word documents, pdfs, and multimedia presented therein, should be formatted in such a way that it is compatible with assistive technologies, such as screen-reading software. a website should also be navigable without a mouse, using visual or auditory cues. content on a website ought to be clearly and logically organized, with skip-navigation links to jump directly to the page's main content. images should have alternative text descriptions, known as "alt text," that are brief and informative, describing the content and role of the image. links should likewise have clear descriptions of the target page. these and similar considerations aim to help persons with impairments that may make reading a monitor or screen difficult.8 digital spaces like a research database are considered electronic information technology (eit). eit is defined as "information technology and any equipment or interconnected system or subsystem of equipment that is used in the creation, conversion or duplication of data or information."9 recently this terminology has been converted to information and communications technology (ict) as per the final rule updating section 508 in early 2017, but the essence of what it means remains unchanged.10 legislation regarding digital accessibility exists, specifically section 508 of the rehabilitation act of 1973, but only federal agencies and institutions receiving federal aid are required to abide by these statutes. lawmakers considered technology a growing part of daily life in 1998 and amended the rehabilitation act with section 508, requiring federal agencies to make their ict accessible to people with disabilities.11 in 2017, these standards were updated with a final rule that modernized guidelines for accessibility of future ict.12 any research databases or other applications used by college and university libraries to facilitate online learning would be considered ict and thereby subject to section 508 requirements. it is evident that libraries not only have legal reasons to comply with section 508 but ethical reasons as well, because making library collections and services universally available is a core value of the library community.13 in addition to legislation, the world wide web accessibility initiative (wai) created the web content accessibility guidelines (wcag) in 1999 in response to the growing need for web accessibility and to promote universal design. these standards, created for web-content creators and web-tool developers, are continually updated as new technologies and capabilities emerge—with version 2.0 being released in 2008—and apply specifically to web content and design. many of these guidelines were absorbed by the 2017 refresh of section 508 of the rehabilitation act of 1973.14 with fourteen guidelines assigned priority levels 1–3, wcag 2.0 and subsequent revisions to date offer three levels of conformance with digital-accessibility guidelines: level a, the most basic level, meaning all mandatory level 1 guidelines are met; level aa, meaning priority levels 1 and 2 are met; and level aaa, meaning priority levels 1–3 are met.
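the markup-level expectations described above (alt text on images, descriptive link text, and skip-navigation links) can be checked, at least roughly, in an automated way. the following is a minimal illustrative sketch, not drawn from this study, that uses python and the beautifulsoup library to flag a few such issues in a small hypothetical html fragment; a real audit would pair checks like these with wcag-aware tools and manual screen-reader testing.

# minimal sketch: flag a few of the accessibility markers discussed above.
# the html fragment is hypothetical and the checks cover only a small subset of wcag.
from bs4 import BeautifulSoup

sample_html = """
<html><body>
  <a href="#main">skip to main content</a>
  <img src="logo.png">
  <a href="/report.pdf">click here</a>
  <div id="main"><p>database search results ...</p></div>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# images should carry brief, informative alternative text
for img in soup.find_all("img"):
    if not img.get("alt"):
        print("image missing alt text:", img.get("src"))

# link text should describe the target page rather than saying "click here"
for link in soup.find_all("a"):
    if link.get_text(strip=True).lower() in {"click here", "here", "more"}:
        print("non-descriptive link text:", link.get("href"))

# a skip-navigation link lets keyboard users jump past repeated navigation
if not any("skip" in a.get_text(strip=True).lower() for a in soup.find_all("a")):
    print("no skip-navigation link found")

automated checks of this kind approximate only a handful of criteria; as the next paragraphs explain, conformance claims and usability testing are still needed to judge whether content is truly accessible.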
these conformance levels are important because many ict vendors make their claims of conformance with wcag standards by displaying the wai-provided icons or by using statements that refer to the level of conformance.15 wcag 2.0 guidelines alone are not enough to determine fully if a website or other digital content is truly accessible. accessibility also depends in part on having an intuitive layout for a variety of users, which can only be verified through usability testing.16 it is crucial that librarians understand what is required for a product or service to be considered accessible, and a firm grasp of wcag 2.0 and its conformance levels will enrich a librarian's understanding of web accessibility and section 508 regulations.17 a voluntary product accessibility template (vpat) is a self-assessment document that vendors are required to complete only if they wish to sell their products to the federal government or any institution that chooses to require them. the quality of vpats varies, but essentially they list the section 508 standards and specify, for each one, whether the product fully supports it, partially supports it, or does not support it, or whether the standard is not applicable. there is then a space for the vendor to provide an explanation of limitations. since these are voluntary self-assessments, these documents can sometimes be brief and incomplete, but even brief statements can be specific enough to relatively easily verify the claims of support. because libraries are portals to online content, including e-books, e-journals, databases, streaming media, and more, which are provided largely by third-party vendors, libraries face unique struggles when attempting to comply with federal regulations. notions of equality and equal access are inherent to libraries and important for the maintenance of a democratic society, which makes accessibility within libraries' digital content a concerning ethics issue.18 having little control over how ict is designed, libraries still must figure out how to address accessibility needs within third-party ict. in 2012, the association of research libraries (arl) joint task force on services to patrons with print disabilities encouraged libraries to require publishers to implement industry best practices, comply with legal requirements for accessibility, include language in publisher and vendor contracts to address accessibility, and request documentation like vpats.19 the task force's report was vital in the creation and direction of this study. existing literature and studies as library professionals, we may often make assumptions about the accessibility of a third-party resource when the reality is that greater importance is placed on the design of a product; accessibility components are either added as special features or included once the design work is completed.20 tatomir and durrance conducted a study on the compatibility of thirty-two library databases with a set of guidelines for accessibility they called the tatomir accessibility checklist.21 this list included checking the usability of these databases with a screen reader and a refreshable braille display. they found that 44 percent of the databases were inaccessible, with an additional 28 percent being only "marginally accessible," based on their criteria.
this suggests major problems exist within vendor database platforms.22 building on this research, western kentucky university libraries conducted a study on vpats from vendors to determine how accessible seventeen of their databases were.23 the university libraries ran an accessibility scan on those databases and compared the results with the vendors' vpats, finding that the templates from the vendors were accurate about 80 percent of the time. most of the vendors did not address the accessibility of portable document format (pdf) files in their vpat statements, though it was an important component of their services. pertinent to this study, western kentucky's work looked for accessibility documentation on vendors' websites and, when none was found, contacted the vendors requesting this information. this study was unique for targeting vendor-supplied vpats rather than only examining the databases themselves or tutorials from vendors. as mentioned previously, this was only done for the libraries' main database vendors. mune and agee published an article on the ebooks accessibility project (eap) funded by affordable learning solutions at the california state university system. in this project, the researchers compared academic e-book platforms to e-reader platforms used for popular trade publications. they gathered data on the top sixteen library e-book vendors at san jose state university based on patron usage and title and holdings counts. the results indicated that academic e-book platforms were less accessible than nonacademic platforms, largely because of hesitance in adopting the epub 3 format, which by default has superior navigation and document structure to pdf or html, the common academic options.24 while this study focused solely on the accessibility of e-book materials, a method for contacting vendors used in the eap study was adapted for the current study and applied at a larger scale. the eap researchers attempted to locate the vendors' vpats online, and they contacted the vendors at least twice to request a vpat or other accessibility statement when none was located. it is noteworthy that of the sixteen vendors, all but one (94 percent) provided eap with some form of accessibility documentation, though less than half (44 percent) had a vpat available.25 another study, by joanne oud, examined vendor-supplied database video tutorials. half of the twenty-four vendors examined in oud's study had tutorials in formats that were not accessible by keyboard or screen reader. this was largely because many of these tutorials were flash-based.26 shockwave flash is neither accessible for persons with disabilities nor good for usability on modern browsers.27 oud's findings suggest that tutorial content would be more widely accessible if it were placed on youtube or another platform that had transcripts and captions available. while the focus of the study was different from our own, it was similar in that oud examined the accessibility of vendor materials apart from the journals and collections. also, oud noted that to make use of vendor tutorials, the website on which they are housed must likewise be accessible and the videos easy to find, but this is often not the case.28 other studies suggest that vendor websites and platforms often impede access to information.
vendor platforms often have inaccessible pdfs, or the links to the full-text options are not easily located. delancey's study also found that more than three-fourths of the vendors examined had images without alternative text and frames without titles, resulting in many users with visual impairments being left out of the content of these images and frames entirely. of particular note, however, was the finding that not one of the vendors in this study had all forms—buttons, search boxes, and other browser navigation tools—labeled correctly, leaving the sites difficult to navigate.29 beyond whether the information itself is accessible, the question inevitably arises, can the desired information even be reached? one way or another, the content on these platforms must be accessible and easy to find. part of the motivation behind the current study stems from what delancey put so well: "only one vendor (out of seventeen), project muse, had a publicly available vpat on their website, though 9 others supplied this documentation upon request in under a week."30 the first step in improving accessibility of resources for our patrons is to discuss accessibility with them—to determine how accessible information resources are today and identify areas of need. if a vpat or, minimally, any form of an accessibility statement is not easily discoverable on a vendor's website—even if it is available upon request—users with disabilities as well as other users are not able to benefit from this information. are the vendors making it a priority in this case? additionally, since 41 percent of the vendors delancey examined had no vpat at all, what can be done before and aside from reaching out to vendors and stressing the importance of accessibility and of making statements on accessibility easy to find? from legal responsibilities to the dismal reality of digital accessibility, the task of improving library service for patrons with disabilities is daunting, even with the empowering ethical drivers of the library value system. ostergaard created "strategies for acquiring accessible electronic information sources," an excellent guide that helps librarians develop an accessibility plan, informed by her own library's commitment to accessibility. steps 3 and 4 of ostergaard's strategies are particularly relevant to the current study. step 3, "communicating with vendors," involves inquiring about the accessibility of electronic products, asking about any future plans for the accessibility of the product, and requesting vpats or other vendor-supplied accessibility documentation. step 3 also recommends that librarians request that vendors meet wcag 2.0 best practices and incorporate a clause in license agreements that clearly defines the accessibility of their products as further demonstration of dedication to accessibility. such communication, it is hoped, would also lead to improved product development.31 once vendors are contacted, ostergaard outlines in step 4 the importance of documenting vendor communication regarding digital accessibility and further suggests assigning a person or team to review the information received. ostergaard's library changed the name of their acquisitions budget to "access budget," reallocating a portion of their budget to review existing subscriptions, purchase accessible replacements, or, in some cases, convert materials to an accessible format.
the documentation review allowed the library to make informed decisions about collections and service availability on behalf of library users, but no mention was made of involving users in this process. the article provided a letter template that encompassed the aforementioned concepts and a request for assessment documentation, such as vpats and official statements of compliance. the ostergaard template served as a foundation for the language used in vendor communication for the current study, particularly the request for a vpat or other accessibility documentation.32 there have been no studies that suggest a way to implement easily discoverable vendor accessibility documentation—even when said documentation is not readily available to the public on the vendors' sites. delancey suggested creating "an open repository for both vendor supplied documentation, and the results of any usability testing," but this was suggested for internal library use, not public dissemination.33 if this documentation is made more easily available, we can increase patron involvement in the discussion of the accessibility of vendor-supplied library resources. research methods library-related research has focused on the need for academic libraries to have accessible websites, in part to reach patrons who are participating in distance-education programs.34 a key component of a library's website is the materials it avails to patrons from vendors, like databases and database aggregators. since, however, these materials are accessed via vendor platforms, they are outside the direct control of the library, making it more difficult to address accessibility concerns. some vendors have put forward significant effort in addressing accessibility needs. some offer a built-in text-to-speech feature for html files or provide documents in a variety of formats, including txt and mp3 files, thereby offering a format that works well with common screen-reading programs or providing a sound file directly. this is of particular benefit to patrons with print disabilities.35 other vendors, such as ebook central (formerly ebrary), have worked to eliminate their flash dependencies. this is recognized as a positive step toward making vendor content usable for all. streaming video and other nonprint-based library materials must also be accessible. a person with visual impairments may be able to hear the soundtrack of a video, but unless an accurate description is provided of what is being presented visually, he or she will miss out on information such as the names of those speaking. to complicate matters further, hearing-impaired users of these databases will not be privy to what is verbalized unless accurate captions and transcripts, or an interpreter, are made available for the videos. captions and transcripts are sometimes made available but can easily be incomplete or incorrect. for example, alexander street press provided closed captioning and transcripts for some collections but not others. even when the captions or transcripts existed, as with a video we tested from ethnographic videos online, they were of low quality, transcribing the word "object" as "old pics," "house" as "mess," and so forth. one vendor, docuseek, had subtitles to translate from spanish, but no closed captioning or transcript available.
hearing-impaired users could not make full use of the video because the subtitles did not include all the information presented in the soundtrack. (transcripts can also be useful to visually impaired users using screen readers.) films on demand had better captions and transcripts, but the transcripts did not include all the words on the screen, such as the title. regardless of the medium, there are multiple ways to provide accessible versions, but they are seldom automatic. librarians must communicate the need for accessible digital files to vendors so that vendors will prioritize it. as long as libraries—one of vendors' main customer groups—accept these offerings whether or not they are accessible to persons with disabilities, vendors have little reason to put great effort into making these improvements.

as colker pointed out, commercial vendors are not required to comply with ada regulations under title ii or title iii.36 vendors may also face resource restrictions that hinder their ability to improve their platforms' accessibility.37 they are businesses, so it is natural that they would commit a concerted effort to reformat and enhance their platforms and records only if the benefits are expected to outweigh the costs; they must first be made aware of the issue and know that it is important to libraries and their patrons. in much the same way that contracted workers constructing a physical space for a federal or federally funded agency must follow ada standards for accessibility, software vendors should be required to design virtual spaces to be accessible. this comparison was made by the department of education more than twenty years ago, and designing for accessibility from the start has the added benefit of greatly reducing the need for accommodation after the fact.38 according to cardenes, "at a minimum, a public entity has a duty to solve barriers to information access that the public entity's purchasing choices create."39 oswal stressed the importance of integrating the blind user experience into the development of databases from the beginning, as well as identifying steps useful for guiding library users after the fact. merely following the rules set out in federal regulations is not enough to provide exemplary service to library patrons; patrons themselves must be involved in the process to fully address accessibility needs.40

process and findings

the first objective of this study was to gain a better understanding of the accessibility of our library's vendor-provided digital resources through a review of vendor-provided accessibility documentation. the second objective was to determine a method of increasing the visibility of accessibility documentation for the benefit of our users and to communicate to them our commitment to improving service to users with disabilities. with a digital collection consisting of 270 databases, more than 750,000 e-books and e-journals, and more than 12 million streaming media titles, it was difficult to identify an appropriate sample. we needed a collection that would serve as an illustrative cross-section of our library's digital holdings and, more importantly, one that would have the largest impact on our users. we also needed to establish a strategy for obtaining accessibility documentation regarding third-party content and to create a delivery method for the vpats and other documentation we discovered in the course of the study.
similar to other institutions, our library maintains a directory of the most used and most useful databases on the library's homepage in the form of the a–z list (http://libresources.wichita.edu/az.php). determinations of usefulness are based on input from our reference librarians, who connect with user needs directly, while use is determined from annual usage statistics compiled per standard library procedures. users can browse this directory by subject, search by title, and sort by database type (full-text, streaming media, etc.), and the a–z list is a convenient place for users to begin their research. the directory also served as a convenient place to begin this study, as it presented us with a sample that not only reflected the needs and habits of our patrons but also offered an excellent and diverse list of vendors to work with. beginning with a list of all subscribed databases (270 in 2016) exported directly from the a–z list's backend, we sorted the list by vendor and determined that 74 vendors would be investigated. university materials indexed by the directory (i.e., the institutional repository and libguides) were excluded from this study.

as visibility of accessibility documentation is the concern of this study, our investigation began by visiting each database or vendor site and conducting a web search to obtain any information about accessibility. we were looking for mentions of the following keywords: "section 508" or "section 504," "w3" or "wcag," "vpat," "ada," and simply "accessibility." some sites were intuitive: thirty-four vendors (45 percent) had statements that were found online. examples of commonly used documentation, which for the purposes of this study will be referred to as accessibility statements, included "accessibility policy," "section 508 compliance," or "accessibility statement." of those thirty-four vendors who posted accessibility documentation online, eleven provided a vpat, or a link to one, in their accessibility statements.

if we could not find an accessibility statement on the site, the vendor was first contacted via email requesting information and documentation regarding the accessibility of its product, using a form letter inspired by the ostergaard template.41 the email address was either found online—typically through "contact us" or technical-support links—or taken from the list of vendor contacts maintained in the library management system if no other contact could be found. if a response was not received within thirty days, the vendor was contacted a second time, a suggestion gleaned from mune and agee's work.42 after all vendors included in the study had been contacted, any who did not provide a vpat were contacted a final time with a specific request for a vpat. for vendors who responded that they could not provide a vpat or other accessibility statement, we used a screenshot of their response as documentation. the form letter (see appendix a) used in the current study made it known to vendors that their responses would be posted publicly for the benefit of our users. twelve of the remaining vendors responded to our email inquiries with vpats, and seven vendors responded with other accessibility documentation.

figure 1. results of vendor query for accessibility documentation.
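to make the keyword scan described above concrete, the sketch below shows one way such a check could be scripted. it is a minimal illustration under our own assumptions, not the procedure used in the study (which was carried out by hand); the vendor names, urls, and helper function are hypothetical.

# minimal sketch; hypothetical vendor urls, written only to illustrate the keyword scan
# note: plain substring matching is crude (e.g., "ada" also matches "metadata"); shown for brevity
import requests

ACCESSIBILITY_KEYWORDS = [
    "section 508", "section 504", "w3", "wcag", "vpat", "ada", "accessibility",
]

def find_accessibility_mentions(url):
    """return the accessibility-related keywords found on a vendor page."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    page_text = response.text.lower()
    return [keyword for keyword in ACCESSIBILITY_KEYWORDS if keyword in page_text]

if __name__ == "__main__":
    # stand-ins for the 74 vendor sites reviewed in the study
    vendor_pages = {
        "example vendor a": "https://vendor-a.example.com/accessibility",
        "example vendor b": "https://vendor-b.example.com/about",
    }
    for vendor, url in vendor_pages.items():
        hits = find_accessibility_mentions(url)
        print(vendor + ": " + (", ".join(hits) if hits else "no accessibility keywords found"))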
in total, eleven vpats (15 percent) were found online, and vpats from twelve vendors (16 percent) were received in response to our emailed requests. twenty-three vendors (31 percent) had other accessibility documentation available online, while seven vendors (9 percent) provided other accessibility documentation in response to email inquiries. eight vendors (11 percent) responded that they had no official statements or documentation to offer, and thirteen vendors (18 percent) did not respond (see figure 1).

with the documentation compiled, we needed to establish an appropriate delivery system that would make this accessibility information visible to library users and thereby further the accessibility effort. our collection cross-section, the a–z list, was chosen as a suitable location not only to store but also to convey this documentation to users because of its prominence in our library's online research presence. we created a clickable icon to be embedded in the databases' entries in our a–z list, which is built in libguides (a springshare product). clicking the icon takes the user to the vendor's statement page, directly to the vpat, or to a page we created in libguides to store screen captures of vendor emails and the vpats we received as attachments. if a vpat was available, we linked to it above any other documentation because vpats present a more rigorous analysis of the accessibility of third-party-created ict. libguides was determined to be a suitable place to house this documentation not only because it makes the information easy for patrons to find, but also because springshare has built libguides in an increasingly accessible manner and has documented its efforts with vpats for each product (see appendix b).
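the mapping behind those icons is simple: each a–z entry points to the preferred piece of documentation for its vendor, with a vpat taking precedence over any other statement. the links themselves were added through the libguides interface; the sketch below only illustrates that preference logic with a hypothetical csv of findings, a hypothetical icon url, and markup of our own invention.

# minimal sketch; the csv file, icon url, and markup are hypothetical illustrations
import csv

ICON_URL = "https://library.example.edu/images/accessibility-icon.png"  # hypothetical

def preferred_link(row):
    """prefer a vpat link over any other accessibility statement."""
    return row.get("vpat_url") or row.get("statement_url") or ""

def icon_markup(database, link):
    """build clickable-icon markup for an a-z entry, with descriptive alt text."""
    return ('<a href="' + link + '">'
            '<img src="' + ICON_URL + '" alt="accessibility documentation for ' + database + '"></a>')

with open("vendor_accessibility.csv", newline="") as handle:  # hypothetical file
    for row in csv.DictReader(handle):
        link = preferred_link(row)
        if link:
            print(icon_markup(row["database"], link))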
further study

it is expected that some of the information provided by the vendors is incomplete or inaccurate, even despite their best efforts, so the information we provide to patrons from and about the vendors might at times lead our patrons astray. we briefly examined the vpats acquired through this project to inform our work moving forward and found errors in at least half of them. some vendors claimed that skip navigation was available when none was found, while another would have benefited from it but marked it "not applicable." others were too brief to be useful, as no explanations were given for their claims. building on the current research, we intend, in collaboration with patrons with disabilities, to further verify the accuracy of key statements made by vendors in their vpats and other accessibility documentation. this analysis will give vendors concrete feedback on how their sites could be further improved. as stated earlier, giving patrons access requires more than following a set of guidelines; it requires dialog to ensure their needs are fully met.43 it also requires testing the platforms used to retrieve documents, not just making the documents themselves accessible.

as one author put it so well, "a lack of technological access is a solvable problem, but only if it is made a priority."44 because vendors are not directly subject to enforcement of section 508 and other statutes regarding the accessibility of the products they provide to libraries, vpats are truly voluntary. as such, the level of effort and detail in the product assessments is inconsistent, and the accuracy of the documentation is questionable. we intend to remain involved in the digital-accessibility initiative, in part through analysis of our digital-library presence, utilizing user input and expanding users' role in improving the user experience. this will enable us to further improve our library's service to users with disabilities. if we, as library professionals and institutions, stand together and each say our part, vendors will realize this is an important issue to address. it is also important that we, as service providers for the public good, act as a bridge between these vendors—who at times do not make useful service information available to their customers—and the patrons we serve. it may be a small step, but providing links to vendors' vpats and other accessibility statements right where patrons need them is an important way of meeting patrons where they are and showing them that help is available. we can show patrons that we care and will work with them to improve the currently limited accessibility not only of scholarly information itself, but also of the platforms in which it is housed.

appendix a: accessibility documentation request email template

subject line: vpat request

thank you for the information you provided answering our inquiry regarding the accessibility of your electronic product. wichita state university libraries has set a goal of improving the accessibility of the electronic and information technology we provide to our patrons. in accordance with section 504 of the rehabilitation act and title ii of the americans with disabilities act, do you happen to have a voluntary product accessibility template (vpat) available, or have you made plans to do further accessibility testing on your product? the vpat documentation can be found on the u.s. department of state website: http://www.state.gov/m/irm/impact/126343.htm.

appendix b: vpat and other accessibility documentation urls used in the databases a–z list

(list current as of october 20, 2017. library subscriptions may have changed. vendors may have updated urls or added additional documentation since october 20. research on this project is ongoing. please see http://libresources.wichita.edu/az.php for a current list of vendor accessibility documentation.)
vendor: accessibility documentation url or status

aapg (american association of petroleum geologists): no accessibility documentation available
abc-clio: no response
acls (american council of learned societies): http://www.humanitiesebook.org/about/for-librarians/#ada-compliance-and-accessibility
acm (association of computing machinery): https://www.acm.org/accessibility
acs (american chemical society): https://www.acs.org/content/acs/en/accessibility-statement.html
adam matthew digital: http://libresources.wichita.edu/c.php?g=583127&p=4026332
aiaa (american institute of aeronautics & astronautics): http://libresources.wichita.edu/ld.php?content_id=32264954
alexander street press: https://alexanderstreet.com/page/accessibility-statement
american institute of physics: http://www.scitation.org/faqs
american mathematical society: http://www.ams.org/about-us/vpat-mathscinet-2014-ams.pdf
apa (american psychological association): http://www.apa.org/about/accessibility.aspx
asm international: no response
asme (american society of mechanical engineers): no accessibility documentation available
astm: no accessibility documentation available
bioone: http://www.bioone.org/page/resources/accessibility
books 24x7: https://documentation.skillsoft.com/bkb/qrc/assistiveqrc.pdf
britannica: http://help.eb.com/bolae/accessibility_policy.htm
business expert press: http://media2.proquest.com/documents/ebookcentral_vpat.pdf
cabell's: no response
cambridge crystallographic data centre: https://www.ccdc.cam.ac.uk/termsandconditions/
cambridge university press: http://www.cambridge.org/about-us/accessibility/
cas: no accessibility documentation available
clcd (children's literature comprehensive database): no response
conference board: http://www.conferenceboard.ca/accessibility/resources.aspx?aspxautodetectcookiesupport=1
cq press: http://library.cqpress.com/cqresearcher/html/public/vpat.html
credo reference: https://credoreference.zendesk.com/hc/en-us/articles/201429069-accessibility
datazoa: http://libresources.wichita.edu/accessibilitystatements/datazoavpat
docuseek2: https://docuseek2.wikispaces.com/section+508+compliance+statement
ebsco: https://www.ebscohost.com/government/full-508-accessibility
ei engineering village: https://www.elsevier.com/solutions/engineering-village/features/accessibility
elsevier: https://www.elsevier.com/solutions/sciencedirect/support/web-accessibility
gale: https://support.gale.com/technical/618
google: https://www.google.com/accessibility/initiatives-research.html
hathitrust: https://www.hathitrust.org/accessibility
heinonline: https://www.wshein.com/accessibility/
ibisworld: no response
ieee: https://www.ieee.org/accessibility_statement.html
infobase learning: http://support.infobaselearning.com/index.php?/tech_support/knowledgebase/article/view/1318/0/ada-usability-statement
infogroup: http://libresources.wichita.edu/c.php?g=583127&p=4286285
institute of physics: http://iopscience.iop.org/page/accessibility
interdok: no response
jstor: https://about.jstor.org/accessibility/
kanopy: https://help.kanopystreaming.com/hc/en-us/articles/210691557-what-is-kanopy-s-position-on-accessibility
lexisnexis: http://www.lexisnexis.com/gsa/76/accessible.asp
library of congress: https://www.congress.gov/accessibility
mergent: no accessibility documentation available
national academies press: no response
national library of medicine: https://www.nlm.nih.gov/accessibility.html
naxos: http://libresources.wichita.edu/c.php?g=583127&p=4287131
ncjrs: https://www.justice.gov/accessibility/accessibility-information
newsbank: http://libresources.wichita.edu/c.php?g=583127&p=4457078
oclc: https://www.oclc.org/en/policies/accessibility.html
ovid: http://ovidsupport.custhelp.com/app/answers/detail/a_id/5909/~/is-the-ovid-interface-section-508-compliant%3f
oxford university press: https://global.oup.com/academic/accessibility/?cc=us&lang=en&
projectmuse: https://muse.jhu.edu/accessibility
proquest: http://media2.proquest.com/documents/proquest_academic_vpat.pdf, http://media2.proquest.com/documents/ebookcentral_vpat.pdf
readex: http://uniaccessig.org/lua/wp-content/uploads/2014/11/readex.pdf
sage: https://us.sagepub.com/en-us/nam/accessibility-0
salem press: no response
sbrnet: no response
springer: https://github.com/springernature/vpat/blob/master/springerlink.md
standard & poor's: no response
swank: no accessibility documentation available (http://libresources.wichita.edu/accessibilitystatements/swankaccessibility)
taylor & francis: http://libresources.wichita.edu/c.php?g=583127&p=4539268
thomson reuters: https://clarivate.com/wp-content/uploads/2018/02/pacr_wos_5.27_jan-2018_v1.0.pdf
us department of commerce: http://osec.doc.gov/accessibility/accessibliity_statement.html
us department of education: https://www2.ed.gov/notices/accessibility/index.html
us government printing office: https://www.gpo.gov/accessibility
university of chicago: no accessibility documentation available
university of michigan: https://www.press.umich.edu/about#accessibility
uptodate: http://libresources.wichita.edu/c.php?g=583127&p=4691631
valueline: http://libresources.wichita.edu/accessibilitystatements/valuelineaccessibility
wrds (wharton research data services): https://wrds-www.wharton.upenn.edu/pages/wrds-508-compliance/
wiley: http://olabout.wiley.com/wileycda/section/id-406157.html

references

1 neil savage, "weaving the web," communications of the acm 60, no. 6 (june 2017): 22.
2 ruth colker, "the americans with disabilities act is outdated," drake law review 63, no. 3 (2015): 799.
3 colker, "the americans with disabilities act," 817; joanne oud, "accessibility of vendor-created database tutorials for people with disabilities," information technology and libraries 35, no. 4 (2016): 13–14.
4 laura delancey and kirsten ostergaard, "accessibility for electronic resources librarians," serials librarian 71, no. 3–4 (2016): 181, https://doi.org/10.1080/0361526x.2016.1254134.
5 americans with disabilities act of 1990, pub. l. no. 101-336, 104 stat. 327 (1990).
6 rehabilitation act of 1973, pub. l. no. 93-112, 87 stat. 355 (1973).
7 discrimination on the basis of disability in federally assisted programs and activities, 77 fed. reg. 14,972 (march 14, 2012) (to be codified at 34 cfr pt. 104).
8 delancey and ostergaard, "accessibility for electronic resources," 180.
9 architectural and transportation barriers compliance board, 65 fed. reg. 80,500, 80,524 (december 21, 2000) (to be codified at 36 cfr pt. 1194).
10 architectural and transportation barriers compliance board, 82 fed. reg. 5,790 (january 19, 2017) (to be codified at 36 cfr pt. 1193-1194).
11 29 usc §794d, at 289 (2016).
12 architectural and transportation barriers compliance board, 82 fed. reg. 5,790, 5,791 (january 19, 2017) (to be codified at 36 cfr pt. 1193-1194).
13 paul t. jaeger, "section 508 goes to the library: complying with federal legal standards to produce accessible electronic and information technology in libraries," information technology and disabilities 8, no. 2 (2002), http://link.galegroup.com/apps/doc/a207644357/aone?u=9211haea&sid=aone&xid=4c7f77da.
14 architectural and transportation barriers compliance board, 82 fed. reg. 5,790, 5,791 (january 19, 2017) (to be codified at 36 cfr pt. 1193-1194).
15 ben caldwell et al., eds., "web content accessibility guidelines (wcag) 2.0," last modified december 11, 2008, http://www.w3.org/tr/2008/rec-wcag20-20081211/.
16 laura delancey, "assessing the accuracy of vendor-supplied accessibility documentation," library hi tech 33, no. 1 (2015): 108.
17 kirsten ostergaard, "accessibility from scratch: one library's journey to prioritize the accessibility of electronic information resources," serials librarian 69, no. 2 (2015): 159, https://doi.org/10.1080/0361526x.2015.1069777.
18 jaeger, "section 508."
19 mary case et al., eds., "report of the arl joint task force on services to patrons with print disabilities," association of research libraries, november 2, 2012, p. 29, http://www.arl.org/storage/documents/publications/print-disabilities-tfreport02nov12.pdf.
20 delancey and ostergaard, "accessibility for electronic resources," 180.
21 jennifer tatomir and joan c. durrance, "overcoming the information gap: measuring the accessibility of library databases to adaptive technology users," library hi tech 28, no. 4 (2010): 581, https://doi.org/10.1108/07378831011096240.
22 tatomir and durrance, "overcoming the information gap," 584.
23 delancey, "assessing the accuracy," 104–5.
24 christina mune and ann agee, "are e-books for everyone? an evaluation of academic e-book platforms' accessibility features," journal of electronic resources librarianship 28, no. 3 (2016): 172–75, https://doi.org/10.1080/1941126x.2016.1200927.
25 mune and agee, "are e-books for everyone?," 175.
26 joanne oud, "accessibility of vendor-created database tutorials for people with disabilities," information technology and libraries 35, no. 4 (2016): 12, https://doi.org/10.6017/ital.v35i4.9469.
27 mark hachman, "tested: how flash destroys your browser's performance," pc world, august 7, 2015, https://www.pcworld.com/article/2960741/browsers/tested-how-flash-destroys-your-browsers-performance.html.
28 oud, "accessibility of vendor-created database tutorials," 12.
29 delancey, "assessing the accuracy," 106–7.
30 delancey, "assessing the accuracy," 105.
31 kirsten ostergaard, "accessibility from scratch: one library's journey to prioritize the accessibility of electronic information resources," serials librarian 69, no. 2 (2015): 162–65, https://doi.org/10.1080/0361526x.2015.1069777.
32 ostergaard, "accessibility from scratch," 164.
33 delancey, "assessing the accuracy," 111.
34 cynthia guyer and michelle uzeta, "assistive technology obligations for postsecondary education institutions," journal of access services 6, no. 1/2 (2009): 29; oud, "accessibility of vendor-created database tutorials," 7.
35 mune and agee, "are e-books for everyone?," 173.
36 colker, "the americans with disabilities act," 792–93.
37 delancey, "assessing the accuracy," 107.
38 colker, "the americans with disabilities act," 814; mune and agee, "are e-books for everyone?," 182.
39 adriana cardenes to dr. james rosser, april 7, 1997, private collection, quoted in colker, "the americans with disabilities act is outdated," 815.
40 sushil k. oswal, "access to digital library databases in higher education: design problems and infrastructural gaps," work 48, no. 3 (2014): 316.
oswal, “access to digital library databases in higher education: design problems and infrastructural gaps,” work 48, no. 3 (2014): 316. 41 ostergaard, “accessibility from scratch,” 164. 42 mune and agee, “are e-books for everyone?,” 175. 43 delancey, “assessing the accuracy,” 108; mune and agee, “are e-books for everyone?,” 181. 44 colker, “the americans with disabilities act,” 817. abstract introduction literature review legislation and existing guidelines existing literature and studies research methods process and findings further study appendix a: accessibility documentation request email template appendix b: vpat and other accessibility documentation urls used in the databases a–z list. reproduced with permission of the copyright owner. further reproduction prohibited without permission. the impact of information technology on library anxiety: the role of computer attitudes jiao, qun g;onwuegbuzie, anthony j information technology and libraries; dec 2004; 23, 4; proquest pg. 138 the impact of information technology on library anxiety: oun g. jiao and anthony j. onwuegbuzie the role of computer attitudes over the past two decades, computer-based technologies have become dominant forces to shape and reshape the products and services the academic library has to offer. the application of library technologies has had a profound impact on the way library resources are being used. although many students continue to experience high levels of library anxiety, it is likely that the new technologies in the library have led to them experiencing other forms of negative affective states that may be, in part, a function of their attitude towards computers. this study investigates whether students' computer attitudes predict levels of library anxiety. c omputers and information technologies have experienced considerable growth over the past two decades. as such, familiarity with computers is rapidly becoming a basic skill and a prerequisite for many tasks. although not every college student is equally prepared for the rising demand of computer skills in the !nformation age, computer literacy is increasingly becommg a gatekeeper for students' academic success. 1 gaps in computer literacy and skills can leave many students behind not only in their academic achievement but also in their future job-market success. the unprecedented pace of technological change in the development of digital information networks and electronic services in recent years has helped to expand the role of the academic library. once only a storehouse of printed materials, it is now a technology-laden information network where students can conduct research in a mixed print and digital-resource environment, experience the use of advanced information technologies, and hone their computer skills. yet, many students are struggling to cope with the changes brought on by the rapid advances of information teclmologies. academic libraries of various sizes have spent a large percentage of their material budget on electronic commercial content, and the trend will continue.' these days, college students are faced with the choices of ever-changing modes of electronic accessing tools, interfaces, and protocols along with the traditional print resources in the library. the fact that the same journal article may be available in multiple vendors' aggregator oun g. jiao (gerryjiao@baruch.cuny.edu) is reference libraria~ and_ associate professor at newman library, baruch college, city university of new york, and anthony j. 
sites (such as ebscohost and gale group) makes navigation through these bibliographic databases more complex and challenging. relevant sources must be identified and navigation protocols must be learned before appropriate information and content can be found. furthermore, having located a citation, students still have to search the library online catalog to find out if the journal or book is available in the library and, if not, know how to make an interlibrary loan request either on paper or electronically.3 anxiety levels can be high and patience levels can be low at varying times of conducting library research.4

that students experience various levels of apprehension when using academic libraries is not a new phenomenon. indeed, the phenomenon is prevalent among college students in the united states and many other countries, and is widely known as library anxiety. mellon first coined the term in her study in which she noted that 75 percent to 85 percent of undergraduate students described their initial library experiences in terms of anxiety.5 according to mellon, feelings of anxiety stem from the relative size of the library; a lack of knowledge about the location of materials, equipment, and resources of the library; how to initiate library research; or how to proceed with a library search.6

library anxiety is an unpleasant feeling or emotional state, with physiological and behavioral concomitants, that comes to the fore in library settings. typically, library-anxious students experience negative emotions, including ruminations, tension, fear, and mental disorganization, which prevent them from using the library effectively.7 a student who experiences library anxiety usually undergoes either emotional or physical discomfort when faced with any library or library-related task.8 library anxiety may arise from a lack of self-confidence in conducting research, lack of prior exposure to academic libraries, the inability to see the relevance of libraries to one's field of interest, and lack of familiarity with library equipment and technologies. library anxiety is often accorded special attention because of its debilitating effects on students' academic achievement.9

although many students continue to experience high levels of library anxiety, it is likely that the new technologies and electronic databases in libraries have led to students experiencing other forms of negative affective states. in particular, it is likely that the library anxiety experienced by students is, in part, a function of their attitudes toward computers. consistent with this assertion, mizrachi and shoham and mizrachi reported a statistically significant relationship between library anxiety and computer attitudes.10 they noted in their research that home and work usage of computers, computer games, word processors, computer spreadsheets, and the internet are all related, to varying degrees, to the dimensions of library anxiety found among israeli students. similarly, jerabek, meyer, and kordinak found levels of computer anxiety to be related to levels of library anxiety for both men and women.11 these studies focused exclusively on undergraduate students.
however, no study has examined this relationship among graduate students, a population that uses the academic library more than any other student population. over the past fifteen years, a large body of research literature on computer attitudes has been generated. in particular, many researchers have studied the relationship between computer attitudes and computer use.12 the importance of beliefs and attitudes toward computers and technologies is widely acknowledged.13 students' computer attitudes arguably affect their willingness to engage in computer-related activities in colleges and universities, where effectively using library electronic resources represents an increasingly important part of a college education. negative computer attitudes may inhibit students' interest in learning to use library resources and thereby weaken their academic performance, while at the same time elevating levels of library anxiety. mcinerney, mcinerney, and sinclair observed that negative perceptions about computers among student teachers may accompany feelings of anxiety, including worries about being embarrassed, looking foolish, and even damaging the computer equipment.14 further, there is often a negative relationship between prior experience with computers and the computer anxiety experienced by individuals.15

until recently, library anxiety has been interpreted only in the context of the library setting—that is, as a phenomenon that occurs while students are undertaking library tasks. jiao, onwuegbuzie, and lichtenstein defined library anxiety as "an uncomfortable feeling or emotional disposition, experienced in a library setting, which has cognitive, affective, physiological, and behavioral ramifications."16 at the same time, unprecedented technological advancement has had a profound impact on the products and services offered by academic libraries. students now are able to conduct sophisticated library searches from the comfort of their homes. it is clear that the construct of library anxiety needs to be expanded in the new library and information environment, incorporating into its definition other variables that are relevant to the changing library and information context. because many library users spend a significant portion of their time using computer-based technologies to conduct information searches, it is natural to ask: to what extent does library anxiety stem from students' prior attitudes toward and experiences with computers and library technologies? however, with the exception of the studies conducted by mizrachi and shoham and mizrachi on israeli undergraduate students, this link has not been examined.17 thus, the present study investigated the relationship between computer attitudes and library anxiety in the rapidly changing library and information environment. as such, the current inquiry replicated the work of mizrachi, shoham and mizrachi, and jerabek, meyer, and kordinak by examining the degree to which computer attitudes predict levels of library anxiety among graduate students in the united states.18 it was expected that findings from this study would help to increase understanding of the construct of library anxiety. indeed, research in this area has become critical in higher education, where educators are responsible for graduating students with the skills necessary to thrive and to lead in the rapidly changing technological environment of the twenty-first century.
method

participants

participants were ninety-four african american graduate students enrolled in the college of education at a historically black college and university in the eastern u.s. all participants were solicited in either a statistics or a measurement course at the time that the investigation took place. in order to participate in the study, students were required to sign an informed-consent document that was given during the first class session of the semester. the majority of the participants were female. ages of the participants ranged from twenty-two to sixty-two years (mean = 30.40, sd = 8.75).

instruments and procedure

all participants were administered two scales, namely, the computer attitude scale (cas) and the library anxiety scale (las). the cas, developed by loyd and gressard, contains forty likert-type items that assess individuals' attitudes toward computers and the use of computers.19 this instrument consists of the following four scales, which can be used separately: (1) anxiety or fear of computers; (2) confidence in the ability to use computers; (3) liking or enjoying working with computers; and (4) computer usefulness. loyd and gressard reported coefficient alpha reliability coefficients of .86, .91, .91, and .95 for scores pertaining to computer anxiety, computer confidence, computer liking, and the total scale, respectively. for the present study, the score reliabilities were as follows:
• computer anxiety, .84 (95 percent confidence interval [ci] = .79, .88);
• computer confidence, .81 (95 percent ci = .75, .86);
• computer liking, .89 (95 percent ci = .85, .92); and
• computer usefulness, .76 (95 percent ci = .68, .83).

the las, developed by bostick, contains forty-three 5-point likert-format items that assess levels of library anxiety experienced by college students.20 it contains the following five subscales:
1. barriers with staff;
2. affective barriers;
3. comfort with the library;
4. knowledge of the library; and
5. mechanical barriers.

a high score on any subscale represents a high level of anxiety in that area. jiao and onwuegbuzie, in their examination of the score reliability reported for the las in the extant literature, found that it has typically been in the adequate to high range for the subscale and total-scale scores.21 based on their analysis, onwuegbuzie, jiao, and bostick concluded that "not only does the [las] produce scores that yield extremely reliable estimates, but also these estimates are remarkably consistent across samples with different cultures, nationalities, ages, years of study, gender composition, educational majors, and so forth."22 for the current investigation, the subscales generated scores for the combined sample that had classical theory alpha reliability coefficients of .89 (95 percent ci = .85, .92) for barriers with staff, .84 (95 percent ci = .79, .88) for affective barriers, .53 (95 percent ci = .37, .66) for comfort with the library, .62 (95 percent ci = .48, .73) for knowledge of the library, and .70 (95 percent ci = .58, .79) for mechanical barriers.
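for readers unfamiliar with the reliability estimates reported above, the classical theory alpha reliability coefficient is cronbach's coefficient alpha; the article reports only the resulting values, but the conventional definition (stated here for reference, not taken from the article) is

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),

where k is the number of items in a subscale, \sigma^{2}_{Y_i} is the variance of scores on item i, and \sigma^{2}_{X} is the variance of the total subscale score.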
analysis

a canonical correlation analysis was conducted to identify a combination of library-anxiety dimensions (barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers) that might be simultaneously related to a combination of computer-attitude dimensions (computer anxiety, computer liking, computer confidence, and computer usefulness). canonical correlation analysis is used to examine the relationship between two sets of variables whereby each set contains more than one variable.23 in the present investigation, the five dimensions of library anxiety were treated as the dependent multivariate set of variables, and the four dimensions of computer attitudes formed the independent multivariate profile. the number of canonical functions (factors) that can be produced for a given dataset is equal to the number of variables in the smaller of the two variable sets. because the library-anxiety set contained five dimensions and the computer-attitude set contained four variables, four canonical functions were generated. for any significant canonical correlation, the standardized canonical-function coefficients and structure coefficients were then interpreted. standardized canonical-function coefficients are computed weights that are applied to each variable in a given set in order to obtain the composite variate used in the canonical correlation analysis. as such, standardized canonical-function coefficients are equivalent to factor-pattern coefficients in factor analysis or to beta coefficients in a regression analysis.24 conversely, structure coefficients represent the correlations between a given variable and the scores on the canonical composite (latent variable) in the set to which the variable belongs.25 thus, structure coefficients indicate the degree to which each variable is related to the canonical composite for the variable set. indeed, structure coefficients are essentially bivariate correlation coefficients that range in value between -1.0 and +1.0 inclusive.26 the square of the structure coefficient yields the proportion of variance that the original variable shares linearly with the canonical variate.
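to make this machinery concrete, the sketch below computes canonical correlations and structure coefficients of the kind reported in tables 1 and 2, using synthetic data and scikit-learn's cca implementation. it is an illustration of the general method under our own assumptions, not a reproduction of the authors' analysis (the statistical software they used is not stated), and the variable sets are random stand-ins rather than the study's data.

# illustrative sketch with synthetic data; not the authors' analysis
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 94                        # sample size in the study
X = rng.normal(size=(n, 5))   # stand-in for the five library-anxiety subscales
Y = rng.normal(size=(n, 4))   # stand-in for the four computer-attitude subscales

n_functions = min(X.shape[1], Y.shape[1])   # number of functions = size of the smaller set
cca = CCA(n_components=n_functions).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)              # canonical variate scores for each set

for i in range(n_functions):
    rc = np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1]   # canonical correlation for function i
    print(f"function {i + 1}: rc = {rc:.2f}, shared variance rc^2 = {rc ** 2:.1%}")

# structure coefficients for the first function: correlation of each original variable
# with its own set's canonical variate; squaring one gives the variance it shares
structure_x = [np.corrcoef(X[:, j], X_c[:, 0])[0, 1] for j in range(X.shape[1])]
structure_y = [np.corrcoef(Y[:, j], Y_c[:, 0])[0, 1] for j in range(Y.shape[1])]
print("library-anxiety structure coefficients:", np.round(structure_x, 2))
print("computer-attitude structure coefficients:", np.round(structure_y, 2))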
results

table 1 presents the intercorrelations among the five dimensions of library anxiety and the four dimensions of computer attitudes. of particular interest were the twenty correlations between the library-anxiety subscale scores and the computer-attitude subscale scores. it can be seen that, after applying the bonferroni adjustment, four of these relationships were statistically significant. specifically, computer liking was statistically significantly related to affective barriers, knowledge of the library, and comfort with the library. using cohen's criteria of .1, .3, and .5 for small, medium, and large relationships, respectively, the first two relationships (involving affective barriers and knowledge of the library) were medium, and the third relationship (between computer liking and comfort with the library) was large.27 in addition to these three relationships, the association between computer usefulness and knowledge of the library also was statistically significant, with a medium effect size.

table 1. intercorrelations among the library-anxiety subscales and computer-attitude subscales

subscale                       2     3     4     5     6     7     8     9
1. barriers with staff        .64*  .63*  .49*  .46*  .02   .05  -.27  -.09
2. affective barriers               .52*  .56*  .40*  -.05  .02  -.37* -.23
3. comfort with the library               .56*  .44*  -.19  -.20  -.55* -.16
4. knowledge of the library                     .39*  -.21  -.11  -.37* -.32*
5. mechanical barriers                                -.13  -.01  -.18   .04
6. computer anxiety                                          .77*  .48*  .46*
7. computer confidence                                             .67*  .36*
8. computer liking                                                       .43*
9. computer usefulness
* indicates a statistically significant relationship after the bonferroni adjustment.

the correlation matrix in table 1 was used to examine the multivariate relationship between library anxiety and computer attitudes. this relationship was assessed via a canonical correlation analysis. the canonical analysis revealed that the four canonical correlations combined were statistically significant (p < .0001). also, when the first canonical root was removed, the remaining three canonical roots were not statistically significant. in fact, removal of subsequent canonical roots did not lead to statistical significance. together, these results suggested that only the first canonical function was statistically significant. this first canonical root also was practically significant (rc1 = .63), contributing 40.8 percent (rc1²) to the shared variance, which represents a large effect size.28

data pertaining to the first canonical root are presented in table 2, which provides both standardized function coefficients and structure coefficients. using a cutoff correlation of .3, the standardized canonical-function coefficients revealed that affective barriers, comfort with the library, and knowledge of the library made important contributions to the library-anxiety set, with affective barriers and comfort with the library making similarly large contributions.29 with regard to the computer-attitude set, computer anxiety, computer liking, and computer confidence made noteworthy contributions, with the latter two dimensions making the most noteworthy contributions. the structure coefficients revealed that all five dimensions of library anxiety made important contributions to the first canonical variate. the square of the structure coefficient indicated that barriers with staff, affective barriers, comfort with the library, and knowledge of the library made similarly large contributions, explaining 67.2 percent, 72.3 percent, 72.3 percent, and 60.8 percent of the variance, respectively. with regard to the computer-attitude set, computer liking and computer usefulness made important contributions. these variables explained 64.0 percent and 16.8 percent of the variance, respectively. comparing the standardized and structure coefficients indicated that computer anxiety and computer confidence served as suppressor variables because the standardized coefficients associated with these variables were large, whereas the corresponding structure coefficients were relatively small.30 suppressor variables are variables that assist in the prediction of dependent variables due to their correlation with other independent variables.31 thus, the inclusion of computer anxiety and computer confidence in the canonical correlation model strengthened the multivariate relationship between library anxiety and computer attitudes.

table 2. canonical solution for the first function: relationship between library-anxiety subscales and computer-attitude subscales

theme                        standardized coefficient   structure coefficient   structure² (%)
library-anxiety subscale
barriers with staff                  .17                       .82*                 67.2
affective barriers                   .40*                      .85*                 72.3
comfort with the library             .39*                      .85*                 72.3
knowledge of the library             .31*                      .78*                 60.8
mechanical barriers                 -.12                       .39*                 15.2
computer-attitude subscale
computer anxiety                    -.31*                     -.22                   4.8
computer confidence                  .98*                      .13                   1.7
computer liking                    -1.25*                     -.80*                 64.0
computer usefulness                 -.13                      -.41*                 16.8
* loadings with effect sizes larger than .3.

discussion

the purpose of this study was to investigate the relationship between computer attitudes and library anxiety among african american graduate students. specifically, the multivariate link between these two constructs was examined. a canonical correlation analysis revealed a strong multivariate relationship between library anxiety and computer attitudes. the library-anxiety subscale scores and computer-attitude subscale scores shared 40.82 percent of the common variance. specifically, computer liking and computer usefulness were related simultaneously to the following five dimensions of library anxiety: barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers. computer anxiety and computer confidence served as suppressor variables. thus, computer attitudes predict levels of library anxiety.
as such, the present findings are consistent with those of mizrachi and shoham and mizrachi, who found a statistically significant relationship between computer attitudes and the following seven dimensions of the hebrew library anxiety scale, a modified version of the las developed by the authors for their israeli sample:
1. staff,
2. knowledge,
3. language,
4. physical comfort,
5. library computer comfort,
6. library policies and hours, and
7. resources.32

according to its authors, the staff factor refers to students' attitudes toward librarians and library staff and their perceived accessibility. the knowledge factor pertains to how students rate their own library expertise. the language factor relates to the extent to which using english-language searches and materials yields discomfort. physical comfort evaluates how much the physical facility negatively affects students' satisfaction and comfort with the library. library computer comfort assesses the perceived trustworthiness of library computer facilities and the quality of directions for using them. library policies and hours concerns students' attitudes toward library rules, regulations, and hours of operation. finally, resources refers to the perceived availability of the desired material in the library collection. the correlations between the dimensions of library anxiety and computer attitudes ranged from .11 (physical comfort) to .47 (knowledge). the current results also replicate those of jerabek, meyer, and kordinak, who found levels of computer anxiety to be related to levels of library anxiety for both men and women.33

nevertheless, caution should be exercised in generalizing the current findings to all graduate students. though the present study examined the association between library anxiety and computer attitudes among african american graduate students, it should not be assumed that this relationship would hold for other racial groups. jiao, onwuegbuzie, and bostick found that african american students attending a research-intensive institution reported statistically significantly lower levels of library anxiety associated with barriers with staff, affective barriers, and comfort with the library than did caucasian american graduate students enrolled at a doctoral-granting institution, with effect sizes ranging from moderate to large.34 in a follow-up study, jiao and onwuegbuzie compared african american and caucasian american students with respect to library anxiety, controlling for educational background by selecting both racial groups from the same institution.35 no statistically significant racial differences were found in library anxiety for any of the five dimensions of the las. however, across all five library-anxiety measures, the african american sample reported lower scores than did the caucasian american sample. in fact, using the test of trend by onwuegbuzie and levin, they found that the consistency with which the african american graduate students had lower levels of library anxiety than did the caucasian american students was both statistically and practically significant.36 thus, jiao and onwuegbuzie's results, alongside those of jiao, onwuegbuzie, and bostick, suggest that racial differences in library anxiety prevail.37

future research should therefore investigate whether the relationship between library anxiety and computer attitudes found in the present study among african american graduate students also exists among caucasian american graduate students, as well as among other racial groups. further, the causal direction of the relationship found in the current study should be investigated. that is, future studies should investigate whether library anxiety places a person more at risk for experiencing poor computer attitudes, or whether the converse is true. more research also is needed to determine how computer attitudes might play a role in the library context. notwithstanding, it appears that the construct of library anxiety can be expanded to include the construct of computer attitudes. indeed, one implication of the findings is that bostick's las should be modified to include dimensions of computer attitudes.38 such a modification likely would facilitate the identification of library-anxious students. by identifying students with high levels of library anxiety and poor computer attitudes, library educators and others could help them improve their dispositions and provide them with the skills necessary to negotiate the rapidly changing technological environment, thereby putting them in a better position to be lifelong learners.

references

1. susan m. piotrowski, computer training: pathway from extinction (eric document reproduction service, ed 348955, 1992).
2. thomas h. hogan, "drexel university moves aggressively from print to electronic access for journals (interview with carol hansen montgomery, dean of libraries)," computers in libraries 21, no. 5 (may 2001): 22-27.
3. m. claire stewart and h. frank cervone, "building a new infrastructure for digital media: northwestern university library," information technology and libraries 22, no. 2 (june 2003): 69-74.
4. carol c. kuhlthau, "longitudinal case studies of the information search process of users in libraries," library and information science research 10 (july 1988): 257-304;
kuhlthau, "inside the search process: information seeking from the user's perspective," journal of the american society for information science 42, no. 5 (june 1991): 361-71; carol c. kuhlthau, seeking meaning: a process approach to library and information services (norwood, n.j.: ablex, 1993); carol c. kuhlthau, "students and the information search process: zones of intervention for librarians," advances in librarianship 18 (1994): 57-72; carol c. kuhlthau et al., "validating a model of the search process: a comparison of academic, public, and school library users," library and information science research 12, no. 1 (jan.-mar. 1990): 5-31. 5. constance a. mellon, "library anxiety: a grounded theory and its development," college & research libraries 47, no. 2 (mar. 1986): 160-65. 6. ibid. 7. qun g. jiao, anthony j. onwuegbuzie, and art lichtenstein, "library anxiety: characteristics of' at-risk' college students," library and information science research 18 (spring 1996): 151-63. 8. constance a. mellon, "attitudes: the forgotten dimension in library instruction," library journal 113 (sept. 1, 1988): 137-39; constance a. mellon, "library anxiety and the nontraditional student," in reaching and teaching diverse library user groups, ed. teresa b. mensching (ann arbor, mich.: pierian, 1989), 77-81; anthony j. onwuegbuzie, "writing a research proposal: the role of library anxiety, statistics anxiety, and composition anxiety," library and information science research 19, no. 1 (1997): 5-33. 9. anthony j. onwuegbuzie and qun g. jiao, "information search performance and research achievement: an empirical test of the anxiety-expectation model of library anxiety," fournal of the american society for information science and technology (jasist) 55, no. 1 (2004): 41-54; anthony j. onwuegbuzie, qun g. jiao, and sharon l. bostick, library anxiety: theory, research, and applications (lanham, md.: scarecrow, 2004). 10. diane mizrachi, "library anxiety and computer attitudes among israeli b.ed. students" (master's thesis, bar-ilan university, israel, 2000); snunith shoham and diane mizrachi, "library anxiety among undergraduates: a study of israeli b.ed. students," journal of academic librarianship 27, no. 4 (july 2001): 305-11. 11. ann j. jerabek, linda s. meyer, and thomas s. kordinak, "'library anxiety' and 'computer anxiety': measures, validity, and research implications," library and information science research 23, no. 3 (2001): 277-89. 12. muhamad a. al-khaldi and ibrahim m. al-jabri, "the relationship of attitudes to computer utilization: new evidence from a developing nation," computers in human behavior 9, no. 1 (jan. 1998): 23-42; margaret cox, valeria rhodes, and jennifer hall, "the use of computer-assisted learning in primary schools: some factors affecting uptake," computers in education 12, no. 1 (1988), 173-78; gayle v. davidson and scott d. ritchie, "attitudes toward integrating computers into the classroom: what parents, teachers, and students report," journal of computing in childhood education 5, no. 1 (1994): 3-27; donald g. gardner, richard l. dukes, and richard discenza, "computer use, self-confidence, and attitudes: a causal analysis," computers in human behavior 9, no. 4 (winter 1993): 427-40; robin h. kay, "predicting student teacher commitment to the use of computers," journal of educational computing research 6, no. 3 (1990): 299-309. 13. 
deborah bandalos and jeri benson, "testing the factor structure invariance of a computer attitude scale over two grouping conditions," educational and psychological measurement 50, no. 1 (spring 1990): 49-60; frank m. bernt and alan c. bugbee jr., "factors influencing student resistance to computer administered testing," journal of research on computing in education 22, no. 3 (spring 1990): 265-75; michel dupagne and kathy a. krendl, "teacher's attitudes toward computers: a review of the literature," journal of research on computing in education 24, no. 3 (spring 1992): 420-29; elizabeth mowrer-popiel, constance pollard, and richard pollard, "an analysis of the perceptions of preservice teachers toward technology and its use in the classroom," journal of instructional psychology 21, no. 2 (june 1994): 131-38; jennifer d. shapka and michel ferrari, "computerrelated attitudes and actions of teacher candidates," computers in human behavior 19, no. 3 (may 2003): 319-34. 14. valentina mclnerney, dennis m. mclnerney, and kenneth e. sinclair, "student teachers, computer anxiety, and computer experience," journal of educational computing research 11, no. 1 (1994): 27-50. 15. susan e. jennings and anthony j. onwuegbuzie, "computer attitudes as a function of age, gender, math attitude, and developmental status," journal of educational computing research 25, no. 4 (2001): 367-84. 16. jiao, onwuegbuzie, and lichtenstein, "library anxiety," 152. 17. mizrachi, "library anxiety and computer attitudes"; shoham and mizrachi, "library anxiety among undergraduates." 18. mizrachi, "library anxiety and computer attitudes"; shoham and mizrachi, "library anxiety among undergraduates"; the impact of information technology on library anxiety i jiao and onwuegbuzie 143 reproduced with permission of the copyright owner. further reproduction prohibited without permission. jerabek, meyer, and kordinak, '"library anxiety' and 'computer anxiety."' 19. brenda h. loyd and clarice gressard, "the effects of sex, age, and computer experience on computer attitudes" aeds journal 18, no. 2 (1984): 67-77. 20. sharon l. bostick, "the development and validation of the library anxiety scale" (ph.d. diss, wayne state university, 1992). 21. qun g. jiao and anthony j. onwuegbuzie, "reliability generalization of the library anxiety scale scores: initial findings/' (unpublished manuscript, 2002). 22. onwuegbuzie, jiao, and bostick, library anxiety, 22. 23. norman cliff and david j. krus, "interpretation of canonical analyses: rotated versus unrotated solutions," psychometrica 41, no. 1 (mar. 1976): 35-42; richard b. darlington, sharon l. weinberg, and herbert j. walberg, "canonical variate analysis and related techniques," review of educational research 42, no. 4 (fall 1973): 131-43; bruce thompson, "canonical correlation: recent extensions for modeling educational processes" (paper presented at the annual meeting of the american educational research association, boston, mass., apr. 7-11, 1980) (eric, ed 199269); bruce thompson, canonical correlation analysis: uses and interpretations (newbury park, calif.: sage, 1984); bruce thompson, "canonical correlation analysis: an explanation with comments on correct practice" (paper presented at the annual meeting of the american educational research association, new orleans, la., apr. 
5-9, 1988) (eric, ed 295957); bruce thompson, "variable importance in multiple regression and canonical correlation" (paper presented at the annual meeting of the american educational research association, boston, mass., april 16-20, 1990) (eric, ed 317615). 24. margery e. arnold, "the relationship of canonical correlation analysis to other parametric methods" (paper presented at the annual meeting of the southwest educational research association, new orleans, la., jan. 1996) (eric, ed 395994). 25. thompson, "canonical correlation: recent extensions." 26. ibid. 27. jacob cohen, statistical power analysis for the behavioral sciences (new york: wiley, 1988). 28. ibid. 29. zarrel v. lambert and richard m. durand, "some precautions in using canonical analysis," journal of marketing research 12, no. 4 (nov. 1975): 468-75. 30. anthony j. onwuegbuzie and larry g. daniel, "typology of analytical and interpretational errors in quantitative and qualitative educational research," current issues in education 6, no. 2 (feb. 2003). accessed nov. 13, 2003,http://cie.ed.asu.edu/ volume6/number2/. 31. barbara g. tabachnick and linda s. fidell, using multivariate statistics, 3rd ed. (new york: harper), 1996. 32. mizrachi, "library anxiety and computer attitudes"; shoham and mizrachi, "library anxiety among undergraduates." 33. jerabek, meyer, and kordinak, '"library anxiety' and 'computer anxiety."' 34. qun g. jiao, anthony j. onwuegbuzie, and sharon l. bostick, "racial differences in library anxiety among graduate students," library review 53, no. 4 (2004): 228-35. 35. qun g. jiao and anthony j. onwuegbuzie, "library anxiety: a function of race?" (unpublished manuscript, 2003). 36. anthony j. onwuegbuzie and joel r. levin, "a proposed three-step method for assessing the statistical and practical significance of multiple hypothesis tests" (paper presented at the annual meeting of the american educational research association, san diego, calif., apr. 12-16, 2004). 37. jiao, onwuegbuzie, and bostick, "racial differences in library anxiety." 38. bostick, "the development and validation of the library anxiety scale." 144 information technology and libraries i december 2004 president’s message andromeda yelton information technology and libraries | december 2017 2 andromeda yelton (andromeda.yelton@gmail.com) is lita president 2017-18 and senior software engineer, mit libraries, cambridge, united states. before i dive into my column, i’d like to recognize and thank bob gerrity for his six years of service as ital’s editor in chief. he oversaw our shift from a traditional print journal to a fully online one, recognized by micah vandegrift and chealsye bowley as having the strongest open-access policies of all lis journals (http://www.inthelibrarywiththeleadpipe.org/2014/healthyself/). i’d like to further extend a welcome to ken varnum as our new editor in chief. ken’s distinguished record of lita service includes stints on the ital editorial board and the lita board of directors, so he knows the journal very well and i am enthusiastic about its future under his lead. i’m particularly curious to see what will be discussed in ital under ken’s leadership because i’ve just come back from two outstanding conferences which drove home the significance of the issues we wrestle with in library technology, and i’m looking forward to a third. in early november, i attended lita forum in scenic denver. 
the schedule was packed with sessions on intriguing topics – too many, of course, for me to attend them all – but two in particular stand out to me. in one, sam kome detailed how he’s going about a privacy audit at the claremont colleges library. he walked us through an extensive – and sometimes surprising – list of places personally identifiable information can lurk on library and campus systems, and talked through what his library absolutely needs (which is less than he’d thought, and far less than the library has been logging without thinking about it). in the other, mary catherine lockmiller took a design thinking approach to serving transgender populations. she shared a fantastic, practical libguide (http://libguides.southmountaincc.edu/transgenderresources), but the part that stuck with me most is her statement that many trans people may never physically enter a library because public spaces are not safe spaces; for this population, our electronic services are our public services. as technologists, we create the point of first, and maybe only, contact. a week later, i attended the inaugural data for black lives conference (http://d4bl.org/) at the mit media lab, steps from my office. this was – and i think everyone in the room felt it – something genuinely new. from the galvanizing topic, to the sophisticated visual and auditory design, to the frisson of genius and creativity buzzing all around a room of artists, activists, professors, poets, data scientists and software engineers, it was a remarkable experience for us all. those of you who heard dr. safiya noble speak at thomas dowling’s lita president’s program in 2016 are familiar with algorithmic bias. numerous speakers discussed this at d4bl: the ways that racial disparities in underlying data sets can be replicated, magnified, and given a veneer of objective power when run through the black boxes that power predictive policing or risk assessment for bail hearings. absent and messy data was a theme as well: in a moment that would make many librarians chuckle (and then wince) knowingly, a panel of music industry executives estimated that 40% of their metadata is wrong, thus making it impossible to credit and compensate artists appropriately. mailto:andromeda.yelton@gmail.com) https://www.google.com/url?q=http://www.inthelibrarywiththeleadpipe.org/2014/healthyself/&sa=d&ust=1512118443864000&usg=afqjcnedfyl-ywfgnadmdzfcrvvnmhlhhq http://libguides.southmountaincc.edu/transgenderresources http://d4bl.org/) president’s message | yelton 3 https://doi.org/10.6017/ital.v36i4.10238 and yet – in a memorable keynote – dr. ruha benjamin called on us not only to collect data about black death, as she showed us an image of the ambulance bill sent to tamir rice’s family, but to listen to our artists and poets as we use our data to imagine black life – this in front of an image of wakanda. with our data and our creativity, what new worlds can we map? several of my mit colleagues also attended d4bl, and as we discussed it afterward we started thinking about how these ideas can drive our own work. how does the imaginary world of wakanda connect to the archival imaginary, and what worlds can we empower our own creators to imagine with what we collect and preserve? how can we use our data literacy and access to sometimes un-googleable resources to help community groups collate data on important issues that are not tracked by our public institutions, such as police violence (https://mappingpoliceviolence.org/) or racial disparities in setting bail? 
with these ideas swirling in my mind, i am looking forward with tremendous excitement to lita forum 2018. building on the work of our forum assessment task force, we’ll be doing a lot of things differently; in particular, aiming for lots of hands-on, interactive sessions. this will be a conference where, whether you’re a presenter or an attendee, you’ll be able to do things. and these last two conferences have driven home for me how very much there is to do in of library technology. our work to select, collect, preserve, clean, and provide access to data can indeed have enormous impact. technology services are front-line services. https://mappingpoliceviolence.org/) reproduced with permission of the copyright owner. further reproduction prohibited without permission. the internet, the world wide web, library web browsers, and library web servers jian-zhong, zhou information technology and libraries; mar 2000; 19, 1; proquest pg. 50 tutorial the internet, the world wide web, library web browsers, and library web servers jian-zhong (joe) zhou this article first examines the difference between two very familiar and sometimes synonymous terms, the internet and the web. the article then explains the relationship between the web's protocol http and other high-level internet protocols, such as telnet and ftp, as well as provides a brief history of web development. next, the article analyzes the mechanism in which a web browser (client) "talks" to a web server on the internet. finally, the article studies the market growth for web browsers and web servers between 1993 and 1999. two statistical sources were used in the web market analysis: a survey conducted by the university of delaware libraries for the 122 members of the association of research libraries, and the data for the entire web industry from different web survey agencies. many librarians are now dealing with the internet and the web on a daily basis. while the web is sometimes synonymous with the internet in many people's minds, the two terms are quite distinct, and they refer to different but related concepts in the modem computerized telecommunication system. the internet is nothing more than many small computer networks that have been wired together and allow electronic information to be sent from one network to the next around the world . a piece of data from joe zhou (joezhou@udel.edu) is associate librarian at the university of delaware library, newark. beijing, china may traverse more than a dozen networks while making its way to washington, d.c. we can compare the internet to the great wall of china, which was built in the qin dynasty around the third century b.c. by connecting many existing short defense walls built by previous feudal states . the great wall not only served as a national defense system for ancient china, but also as a fast military communication system. a border alarm was raised by means of smoke signals by day, and beacon fires at night, ignited by burning a mixture of wolf dung , sulfur, and saltpeter. the alarm signal could be relayed over many beacon-fire towers from the western end of the great wall to the eastern end (4,500 miles away) within a day . this was considered light speed two thousand years ago. however, while the great wall transferred the message in a linear mode, the internet is a multidimensional network. the web is a late-comer to the internet, one of the many types of high-level data exchange protocols on the internet. 
before the web, there was telnet, the traditional command-driven style of interaction. there was ftp, a file transfer protocol useful for retrieving information from large file archives. there was usenet, a communal bulletin board and news system. there was also e-mail for individual information exchange, and e-mail lists, for one-to-many broadcasts. in addition, there was gopher, a campus-wide information system shared among universities and research institutions, and wais, a powerful search and retrieval system developed by thinking machines, inc. in 1990 tim berners-lee and robert cailliau at cern (www.cern.ch), the european laboratory for particle physics, created a new information system called "world wide web" (www). designed to help the cern scientists with the increasingly confusing task of exchanging information on the internet, the web system was to act as a unifying force, a system that would seamlessly bind all file protocols into a single point of access. instead of having to invoke different programs to retrieve information via various protocols, users would be able to use a single program, called a "browser," and allow it to handle all the details of retrieving and displaying information. in december 1993 www received the ima award, and in 1995 berners-lee and cailliau received the association for computing machinery (acm) software system award for its development. the web is best known for its ability to combine text with graphics and other multimedia on the internet. in addition, the web has some other key features that make it stand out from earlier internet information exchange protocols. since the web is a late-comer to the internet, it has to be backwards compatible with other communications protocols in addition to its native language, hypertext transfer protocol (http). among the foreign languages spoken by web browsers are telnet, ftp, and other high-level communication protocols mentioned earlier. this support for foreign protocols lets people use a single piece of software, the web browser, to access information without worrying about shifting from protocol to protocol and software incompatibility. despite different high-level protocols, including http for the web, there is one thing in common for all parts of the internet: tcp/ip, the lower level of the internet protocol. tcp/ip is responsible for establishing the connection between two computers on the internet and guarantees that the data can be sent and received intact. the format and content of the data are left for high-level communication protocols to manage, among which the web is the best known. at the tcp/ip level all computers "are created equal." two computers establish a connection and start to communicate. in reality, however, most conversations are asymmetric. the end user's machine (the client) usually sends a short request for information, and the remote machine (the server) answers with a long-winded response. the medium is the internet. the common language on the internet can be the web or any other high-level protocol. on the web, the client is the web browser; it handles the user's request for a document. the first web browser, ncsa mosaic, developed by the national center for supercomputing applications (ncsa) at the university of illinois at urbana-champaign, was released in mid-november 1993 for unix, windows, and macintosh platforms.
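the asymmetric conversation described above, a short client request answered by a much longer server response, is easy to see from code. the sketch below is only an illustration and not part of the original tutorial: it opens a tcp connection with python's standard http client, sends a single get request, and prints the status line, two response headers, and the size of the returned document. the host name is just an example.

```python
# minimal illustration of the browser-server exchange: a short request over
# tcp, a much longer response. the host below is only an example.
import http.client

host = "example.org"
conn = http.client.HTTPConnection(host, 80, timeout=10)

# the client's half of the conversation is tiny: one request line plus headers
conn.request("GET", "/", headers={"User-Agent": "demo-browser/0.1"})

resp = conn.getresponse()      # the server replies over the same connection
body = resp.read()

print(resp.status, resp.reason)                 # e.g. 200 OK
print("server:", resp.getheader("Server"))      # which server software answered
print("content-type:", resp.getheader("Content-Type"))
print("response size:", len(body), "bytes")

conn.close()
```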
version 3.0 of ncsa mosaic is available at www. ncsa. uiuc.ed u/ sdg /software/ mosaic. both source code and binaries are free for academic use. mosaic lost market share to netscape after its key developer left ncsa and joined netscape. even after mosaic introduced an innovative 32-bit version in early 1997, which can perform feats that other major browsers had not even thought of back then, mosaic remained out of the major browsers' market. the two most widely-used browsers today are microsoft's internet explorer (ie) and netscape's navigator (part of the netscape communicator suite). recent web browser surveys conducted by different internet survey companies such as www.zonaresearch.com/ browserstudy, www.psrinc.com/ trends.htm, and www .statmarket. com all indicate that ie is the market leader with more than 60 percent market share, leaving navigator with between 35 percent and 40 percent. in 1995 ie had only 1 percent share versus navigator's more than 90 percent, an unimaginable rise critics have attributed to microsoft's strategy of bundling the browser with its near-monopoly windows operating system. however, a survey conducted in december 1998 by the university of delaware library of 122 members of the association of research libraries (arl) showed that netscape still remained the market leader among big academic libraries. more than 90 percent of arl libraries supported netscape, and about 50 percent also supported ie. most arl libraries supported both browsers, and unlike the browser industry survey mentioned earlier, in which only one product can be picked as the primary browser , the sum of the percentages for the arl survey was greater than 100 percent. the main function of the web browser is to request a document available from a specific server through the internet using the information in the document's url. the server on a remote machine returns the document usually physically stored on one of the server's disks. with the use of common gateway interface (cgi), the documents do not have to be static. rather, they can be synthesized at the point of being requested by cgi scripts running on the server's side of the connection . in some database-driven web servers that make the core of today's e-commerce, the documents provided may never exist as physical files but are generated as needed from database records . the web server can be run on almost any computer, and server software is available for almost all operating systems, such as unix, windows 95/98/nt, macintosh, and os / 2. according to the university of delaware library's 1998 survey of internet web servers among arl member libraries, more than 32 percent of arl libraries chose apache as their web server software, followed by the netscape series at 29.32 percent, ncsa httpd at 11.28 percent, and microsoft internet information server (iis) at 7.52 percent. in july 1999 the author checked the netcraft survey at www .netcraft. com/survey . the top three web server software programs for more than 6.5 million web sites are apache (56.35 percent) , microsoft-hs (22.33 percent), and netscape (5.65 percent). the netcraft survey also provides the historical market share information of major web servers since august 1995. ncsa httpd was the first web server software released, about the same time as the release of mosaic in 1993. however, it slipped from the number-one position with more than 90 percent market share in 1993, and almost 60 percent in 1995, to less than 1 percent in july 1999. 
it is no longer supported by ncsa; however, httpd remains a popular choice for web servers due to its small size, fast performance, and solid collection of features. the "inertia effect" of the existing sites (if it runs well, why bother to change?) will likely keep ncsa on the major web server software list for some time. ncsa httpd is free, but available only for the unix platform. it is available from http://hoohoo.ncsa.uiuc.edu. however, when the author visited the site in july 1999, the following message appeared on the main page: "the ncsa httpd is no longer under development. it is an unsupported product. we recommend that you check out the apache server, instead of installing our server." most people who use only web browsers may have heard of apache only as an indian nation or a military helicopter, not the most popular web server software with more than 50 percent market share. it was first introduced as a set of fixes or "patches" to the ncsa httpd. apache 1.0 was released in december 1995 as open-source server software by a group of webmasters who named themselves the apache group. open-source means the source code is available and freely distributed, and it is the key to apache's attractiveness and popularity. the apache group members were ncsa users who decided to coordinate development work on the server software after ncsa stopped. in july 1999 the apache group announced that it was establishing a more formal organization called the apache software foundation (asf). in the future, the asf (www.apache.org) will monitor development of the free software, but it will remain a "not-for-profit" foundation. apache is high-end, enterprise-level server software and can be run on os/2, unix (including linux), and windows platforms, but a mac version is still not available. the netscape series includes netscape-enterprise, netscape-fasttrack, netscape-commerce, and netscape-communication. enterprise is a high-end, enterprise-level server while fasttrack serves as an entry-level server for small workgroups. netscape supports both the unix and the windows nt platforms. the other major commercial web server, microsoft internet information server (iis), as of 1999, is only available for the windows platform. however, one advantage of iis over netscape is that it can be downloaded for free as part of the windows option pack. in addition, iis can handle ms office documents very well. while both the microsoft and netscape brand names are well recognized by millions of end users, a name alone does not necessarily equate to large market share, nor does a deep pocket. apache remains the top web server despite intense competition. one of the keys to apache's success, in addition to its outstanding performance, lies in its open-source code movement and active user support on a wide basis. the web server of choice for the macintosh platform is webstar. however, due to the limitations of the operating system networking software, the performance of macintosh-based servers has not been great. webstar can be downloaded as a free evaluation release from www.starnine.com/webstar. the web server market is dynamic and competition intense. there are more than sixty web server products on the top list (of web servers with more than one thousand web sites) as of july 1999, and newcomers are being added frequently.
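server surveys such as the netcraft figures cited above rest on a simple observation: most servers announce their software in the server response header. the sketch below is a hypothetical, simplified version of that idea; the host list is made up for illustration, and python's standard urllib is used in place of whatever tooling the survey agencies actually run.

```python
# tally the advertised server software for a few hosts, in the spirit of the
# web server surveys discussed above; the hosts are only examples.
from collections import Counter
from urllib.request import Request, urlopen

hosts = ["www.example.org", "www.example.com", "www.example.net"]
tally = Counter()

for host in hosts:
    req = Request(f"http://{host}/", method="HEAD")
    try:
        with urlopen(req, timeout=10) as resp:
            # the server header names the software, e.g. "Apache" or "Microsoft-IIS"
            server = resp.headers.get("Server", "unknown")
    except OSError:
        server = "unreachable"
    tally[server.split("/")[0]] += 1    # keep the product name, drop the version

for software, count in tally.most_common():
    print(software, count, "of", len(hosts), "hosts")
```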
acknowledgments the author thanks peter liu, head of the systems department at the university of delaware library, for providing the web survey data of arl libraries . after this article was submitted, the survey data was published by arl in 1999 as spec kit 246: web page development and management. the author also wants to thank his dear wife min yang for her technical assistance. min is webmaster and system administrator for the web site at a. i. dupont nemours foundation and hospital for children, http:/ /kidshealth.org. a framework for measuring relevancy in discovery environments article a framework for measuring relevancy in discovery environments blake l. galbreath, alex merrill, and corey m. johnson information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12835 abstract discovery environments are ubiquitous in academic libraries but studying their effectiveness and use in an academic environment has mostly centered around user satisfaction, experience, and task analysis. this study aims to create a quantitative, reproducible framework to test the relevancy of results and the overall success of washington state university’s discovery environment (primo by ex libris). within this framework, the authors use bibliographic citations from student research papers submitted as part of a required university class as the proxy for relevancy. in the context of this study, the researchers created a testing model that includes: (1) a process to produce machine-generated keywords from a corpus of research papers to compare against a set of human-created keywords, (2) a machine process to query a discovery environment to produce search result lists to compare against citation lists, and (3) four metrics to measure the comparative success of different search strategies and the relevancy of the results. this framework is used to move beyond a sentiment or task-based analysis to measure if materials cited in student papers appear in the results list of a production discovery environment. while this initial test of the framework produced fewer matches between researcher-generated search results and student bibliography sources than expected, the authors note that faceted searches represent a greater success rate when compared to open-ended searches. future work will include comparative (a/b) testing of commonly deployed discovery layer configurations and limiters to measure the impact of local decisions on discovery layer efficacy as well as noting where in the results list a citation match occurs. introduction discovery environments are ubiquitous in academic libraries as all but two libraries in the association of research libraries (arl) report using a discovery environment, and they continue to gain traction in other library settings.1 the one-stop shopping model of discovery environments is one of their most alluring features as it closely resembles searching the open web. this familiarity allows users who are accustomed to searching the web to feel comfortable searching the library catalog without fear of encountering a “failed” search (zero result set). discovery environments seldom fail to return results as even the most rudimentary or naïve search strategy will return something for a user. this idea of “returning something” has been anecdotally noted as a positive as it ensures the user does not give up and allows novices to be successful with limited search sophistication or prior instruction from information professionals. 
one of the potential negatives to this approach however is the sheer volume of material that is returned per search query. library discovery environments often present thousands, if not millions, of search results from an initial search query. this emulation of google is essentially blake l. galbreath (blake.galbreath@wsu.edu) is core services librarian, washington state university. alex merrill (merrilla@wsu.edu) is head of library systems and technical operations, washington state university. corey m. johnson (coreyj@wsu.edu) is instruction & assessment librarian, washington state university. © 2021. mailto:blake.galbreath@wsu.edu mailto:merrilla@wsu.edu mailto:coreyj@wsu.edu information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 2 making the time-honored study of relevancy (precision/recall) moot. how can one determine the number of relevant documents in a search query if the number of documents returned is becoming limitless? this study aims to create a quantitative, reproducible framework to test the relevancy of results returned from, and the overall efficacy of, a library discovery environment, in this case, ex libris primo. within this framework, the authors compare the results returned in model primo search queries against the bibliographic citations used in students’ research papers. background the university common requirements (ucore) curriculum, implemented in fall 2012, was a major redesign of the washington state university (wsu) undergraduate general education program. ucore is comprised of required categories of classes designed to build student proficiency in the seven undergraduate learning goals.2 roots of contemporary issues (rci) is the sole mandated undergraduate course under the ucore system.3 during the 2018–2019 academic year, over 4,500 students were enrolled in rci at wsu, the vast majority being first-year students. this paper utilizes data from the rci library research project, a term-length research experience with four central assignments designed to familiarize students with the fundamentals of quality research and a cumulative research paper where they utilize the skills learned. the research project components are spaced evenly throughout the term; students are guided along the research process from general topic formation, to research question generation, to thesis statement defense in the final paper. students are tasked with finding sources of particular resource types (e.g., journal articles), describing the value of these sources for their research, and citing them properly in chicago style. wsu libraries uses the discovery environment primo, an ex libris product, to provide resources to its patrons.4 specifically, wsu libraries uses the new user interface version of primo, which incorporates search results from the primo central index (pci) in its default search. primo, like all discovery environments, provides results with a wide variety of resource types so rci students can use it at all stages of the term research project. students use it in the pursuit of contemporary newspaper articles, history monographs, history journal articles, and primary sources. in this article, the authors focus on the versatility of primo, using rci student paper bibliographies as the central data source for the project. literature review the need for assessment of library resources and services in higher education has been welldocumented. 
libraries are increasingly asked to provide tangible evidence they aid student information literacy skill development and thus advance achievement of institutional learning outcomes. accrediting bodies acknowledge, “the importance of information literacy skills, and most accreditation standards have strengthened their emphasis on the teaching roles of libraries.”5 oakleaf and kaske also stress the importance of librarians choosing assessments that can contribute to university-wide assessment efforts, noting they are preferable to assessments that only benefit libraries.6 the washington state university libraries is committed to assessment of its resources and services, with primo as a central target resource, and with large, lowerundergraduate courses as a primary area of focus. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 3 there are numerous papers which document usability testing of primo. prommann and zhang (2015) analyzed the efficiency of primo through hierarchical task analysis (hta). they counted the number of physical and cognitive steps necessary to get to records or full text of known items and concluded that primo is “a flexible discovery layer as it helps achieve many goals with minimum amount [sic] of steps.”7 although many of these studies articulate avenues of success in terms of user interaction with the discovery environment, there are also reports of difficulties in a variety of categories. students have problems with source retrieval, for example, understanding availability status terminology and labels, and using link resolvers and interlibrary loan.8 dalal et al. (2015) demonstrated that retrieving the full text of an article in a discovery environment is sometimes unintuitive for students and involves navigating multiple interfaces. 9 users also have issues using facets to find particular resource types or distinguishing between them. 10 while the study addressed in this paper does not directly address user difficulties with primo functionality, issues with source retrieval point to a plausible explanation for the few matches between the model search results and student paper bibliographies. it is possible students saw many of the same sources from the model searches in their results, but ultimately did not secure those sources because of the difficulties outlined above. in other words, some source selection choices are based mostly on availability, not as much on relevance. source relevancy is an active area of research for web-based discovery services, in terms of comparative studies to disciplinary subject databases. evelhoch and zebulin analyzed two years of usage data from both primo and a selection of subject databases, concluding that users have difficulty finding relevant sources in primo or they are not available. 11 based on users’ judgments, lee and chung, determined that ebsco discovery service was less effective than a set of education and library subject databases in terms of source relevance. 
12 another study illustrated that while students preferred discovery environments, the articles they selected from the subject (indexing and abstracting) databases were more authoritative.13 finally, librarians are posited to believe that subject databases are superior to discovery environments in terms of the relevancy of search results and disciplinary coverage.14 conclusions about source relevancy are complicated by the fact that students infrequently look beyond a first page of results lists.15 researchers have also explored the idea of primo user satisfaction through the presence of relevant results. in one instance, using online questionnaires and in-person focus groups, researchers found users had a high level of satisfaction with their institution’s discovery environment, largely attributed to the quality of search results over ease of use.16 hamlett and georgas (2019) conducted a mixed-methods user experience study to understand student perceptions of relevancy in primo. this study found that participants believed primo to return relevant results (with an average score of 8.3 out of 10). however, some of the qualitative responses indicated that the keywords used did not actually yield relevant results. 17 many other methods and measures have been executed in determining the value and usefulness of primo. huurdeman, aamodt, and heggo analyzed a dataset of 50 popular queries in primo. they deemed a query successful if the first 10 results included the (likely) targeted resource and found that 58% of the queries from the popular searches dataset had been successful, while 20% were unsuccessful, and 22% could not be determined. their approach assumed there is one intended document per query and that the authors can surmise what it is.18 the research presented in the remainder of this article below is unique in that the authors explore user judgment of source relevance (satisfaction) as a function of whether sources in the model primo searches for their topics existed in the students’ papers’ bibliographies. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 4 methods research questions the impetus for this study was to understand the factors that play a role in establishing a framework to test the relevancy of results returned from primo. the authors attempted to answer the following questions: • how effective is primo at returning relevant results? • to what extent does faceting improve search results? • which search strategies are the most effective within the given framework? • how can the researchers refine the framework for future investigations into relevancy? • what are the implications of this study for end users? data collection the authors began with a sample of 100 randomly selected and anonymized research papers that were submitted to the roots of contemporary issues (rci) courses in fall 2018 and spring 2019 semesters. the study used a two-pronged approach to generate keywords for model primo search queries. for one approach, keywords were machine-generated via a word-vector generation process. for the other, keywords were human-generated by a student research assistant to approximate natural language queries. keywords and queries, machine a rapidminer (https://rapidminer.com/) word-vector generation process with term-frequency schema converted the research papers into keywords, which the authors then used to generate search queries. 
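the full rapidminer operator chain is described in the paragraphs that follow. as a rough, simplified stand-in for that kind of term-frequency keyword extraction (not the authors' actual process), the sketch below lowercases a paper, tokenizes on non-letters, drops a small set of stop words, applies a toy suffix-stripping stemmer in place of the snowball stemmer, and counts the most frequent unigrams and bigrams; the stop-word list and stemmer are assumptions made for illustration.

```python
# simplified term-frequency keyword extraction, loosely modeled on the
# rapidminer steps described in the text; stop list and stemmer are toy stand-ins.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "was",
              "were", "is", "are", "for", "by", "with", "that", "this", "it"}

def stem(token):
    # crude stand-in for a snowball stemmer: strip a few common suffixes
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token

def ngram_counts(text, max_terms=20):
    tokens = [t for t in re.split(r"[^a-z]+", text.lower()) if t]
    tokens = [stem(t) for t in tokens if t not in STOP_WORDS and len(t) > 1]
    counts = Counter(tokens)                                           # unigrams
    counts.update("_".join(pair) for pair in zip(tokens, tokens[1:]))  # bigrams
    return counts.most_common(max_terms)

paper_text = "The Atlantic slave trade reshaped societies across Africa ..."  # full paper text here
for ngram, freq in ngram_counts(paper_text):
    print(ngram, freq)
```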
within the main routine, the process documents from files operator, rapidminer transformed the texts into lower case and tokenized the final papers according to non -letters. rapidminer then filtered the data by those tokens representing nouns and adjectives, removed english stop words, and filtered tokens by length, with a minimum of one character and maximum of 50 characters. the researchers then applied a snowball stemmer for english words and generated 20 n-grams per paper, each with a maximum length of four. table 1 illustrates the product of the word-vector generation process. throughout this example research paper, "trade” occurred 40 times, "slave” occurred 34 times, “slave” and “trade” occurred together 26 times, "africa” occurred 18 times, "impact” occurred 16 times, “african” occurred 11 times, and "peopl” occurred 10 times. table 1. example n-grams and frequency as retrieved from rapidminer n-gram number of occurrences trade 40 slave 34 slave_trade 26 africa 18 impact 16 african 11 peopl 10 ... ... https://rapidminer.com/ information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 5 number of n-grams after compiling the data in rapidminer, the authors created a process to select those n-grams to use in the model primo search queries. huurdeman, aamodt, and heggo (2018) found that users included an average of 2.6 terms per query in their popular searches dataset.19 in a report by ex libris, stohn indicates that most topic-search queries contain five or fewer words.20 in order to investigate both ends of this spectrum, this study constructed short-length queries, consisting of two n-grams, and full-length queries, consisting of four n-grams, using the following rubric to help systematize the construction. rubric to select n-grams for shortand full-length queries pick terms that satisfy the following criteria: 1. n-grams that occur more frequently in a paper are preferred to those that occur less frequently. 2. if two n-grams appear to be structural derivatives of the same word (e.g., korea and korean), select the shortest n-gram and truncate it. 3. if one or more of the top terms appear in a later 2-gram, use the 2-gram as a phrase search. 4. ignore n-grams with repeating terms (e.g., south_africa_africa). 5. truncate all terms (using asterisk or question mark), except the first term of a phrase search, unless the first term is not a complete word (e.g., “busi* meeting*”). 6. for terms or phrases that end in truncated “i”, use the truncated version of the term and its truncated “y” counterpart, and combine both with an or operator (e.g., countri* or country*). 7. ignore all 3and 4-grams as they have a propensity to create nonsensical phrase searches (e.g., racism_polic_brutal). 8. if abbreviations are encountered, expand them for searching purposes (e.g., us is “united states”), except in cases where they are more commonly known by their abbreviation (e.g., ddt). 9. ignore results of contractions (e.g., ‘t) in case of a tie in the selection of an n-gram, sequence the following rules for selection: 1. preference proper nouns over other nouns and adjectives. if there are multiple proper nouns, preference place-name proper nouns over other proper nouns. 2. preference the n-gram that occurs in the greatest number of two or more n-grams later in the list. 3. preference longer words over shorter words. 4. group all the tied n-grams with a series of or statements. 
note: this may result in the selection of more than four total n-grams. referring to the example n-grams from table 1, an illustration of this method is shown in the following steps: 1. arrange terms from highest to lowest frequency. 2. select slave_trade as first n-gram, since “trade” and “slave” both occur in later n-gram. truncate to “slave trade*”. 3. select africa since it has the next greatest number of occurrences. combine africa with african since they are structural derivatives of one another. truncate to africa*. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 6 at this point, the first two selected n-grams—slave_trade and africa—become the keywords of the short-length query “slave trade*” and africa*. 4. select impact since it has the next greatest number of occurrences. truncate to impact*. 5. select peopl since it has the next greatest number of occurrences. truncate to peopl*. finally, the first four selected n-grams—slave_trade, africa, impact, and peopl—become the keywords of the full-length query “slave trade*” and africa* and impact* and peopl*. on average, after stop words and booleans were removed, the full-length queries in this study were 5.69 keywords long, while the short-length queries were 3.11 keywords long. keywords and queries, natural language in addition to the machine-oriented keyword process, the authors employed a student research assistant to create human-generated phrases, consisting of 3–10 words, which served as synopses for each of the 100 papers. this study then used these phrases as proxies for creating natural language search queries. for the same example research paper cited in table 1 above, this student created the summary phrase history and effects of the slave trade. this phrase in its entirety became the natural language query. on average, after stop words and booleans were removed, the natural language queries used in this study were 3.95 keywords long. search results using the three keyword-generation strategies outlined above, the authors constructed search queries and ran them against the ex libris’ primo search api endpoint. table 2 summarizes example result sets from the above short-length query, full-length query, and natural language query. for each of the keyword-generation strategies, the authors constructed search queries along four parameters: queries that used no faceting (open-ended), queries that faceted to articles only (articles), queries that faceted to books and ebooks only (books), and queries that faceted to newspaper articles only (newspapers). in all, there were 12 search-query constructions (three query types by four faceting modes) for fall 2018 and 12 for spring 2019. to construct a baseline for the search comparisons, the researchers designed the initial search to be open-ended. that is, the study assumed that patrons most often use the default, basic search functionality, with no facets selected. a segment of the rci instruction specifically encourages students to incorporate materials with resource types articles, books, and newspaper articles into their research papers. the authors therefore assumed that these students would most likely utilize facets corresponding to these resource types in their more specific queries and mirrored this behavior in the comparative searches. 
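as a companion to the query-construction steps above, the sketch below shows roughly what a request to the primo search api could look like from python. the endpoint url, the q=any,contains query syntax, the parameter names, and the shape of the json response are stated from general familiarity with ex libris rest apis rather than taken from the article, so they should be read as assumptions, and the api key and view identifiers are placeholders.

```python
# rough sketch of sending one constructed query to a primo search api endpoint
# and collecting result titles; endpoint, parameters, and response shape are
# assumptions, and the key/view values are placeholders.
import requests

API_URL = "https://api-na.hosted.exlibrisgroup.com/primo/v1/search"  # assumed endpoint
API_KEY = "YOUR_API_KEY"                                             # placeholder

def top_titles(query, limit=50):
    params = {
        "q": f"any,contains,{query}",   # assumed primo query syntax
        "vid": "INSTITUTION_VIEW",      # placeholder view id
        "tab": "default_tab",           # placeholder tab
        "scope": "default_scope",       # placeholder scope
        "limit": limit,
        "apikey": API_KEY,
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    titles = []
    for doc in resp.json().get("docs", []):          # assumed response structure
        display = doc.get("pnx", {}).get("display", {})
        title = display.get("title", [""])
        titles.append(title[0] if title else "")
    return titles

for t in top_titles('"slave trade*" AND africa*'):
    print(t)
```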
each primo search api returned titles for the top 50 results, moving beyond users’ usual search behavior in an effort to provide more flexibility to the initial steps of the relevancy framework. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 7 table 2. first-occurring result titles for query types: short-length, full-length, and natural language queries query type query first-occurring result titles short-length “slave trade*” and africa* the atlantic slave trade the atlantic slave trade : a census the atlantic slave trade legacy of the trans-atlantic slave trade : hearing before the subcommittee on the constitution, civil rights, and civil liberties of the committee on the judiciary, house of representatives, one hundred tenth congress, first session, december 18, 2007. ... full-length “slave trade*” and africa* and impact* and peopl* the atlantic slave trade the atlantic slave trade : effects on economies, societies, and peoples in africa, the americas, and europe slave trades, 1500–1800 : globalization of forced labour african voices of the atlantic slave trade : beyond the silence and the shame ... naturallanguage history and effects of the slave trade urban history, the slave trade, and the atlantic world 1500–1900 the atlantic slave trade and british abolition, 1760– 1810 the decolonization of african education and history the united states and the transatlantic slave trade to the americas, 1776–1867 ... a student research assistant harvested all the citations used across the 100 example papers to create an inventory of 730 bibliographic citations. using the excel fuzzy lookup add-in, the authors then compared this bibliographic inventory against the 60,000 titles that were returned information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 8 via the primo search api. this add-in fuzzy matches rows between two different tables and assigns a similarity score for each match. the study focused attention on rows with matching scores of .80 and above to further investigate potential matches. using the fuzzy matches as a starting point, the authors confirmed or denied matches by hand, using title and resource type as the main criteria. table 3. sample comparison of citations used in research papers against results returned from primo search api fuzzy score citation title citation resource type results title result resource type confirmed match 1.0000 a short history of biological warfare article a short history of biological warfare article yes 0.9933 the female madlady women, madness, and english culture, 1830– 1980 print book the female malady : women, madness, and english culture, 1830–1980 print book yes 0.9778 industrial revolution web resource the industrial revolution e-book no 0.9037 drug use & abuse print book drug use and abuse : a comprehensive introduction print book no results source citation data description this study compared citations gathered from a random sample of 100 research papers from the two semesters of all sections of history 105/305 taught at washington state university (wsu) from fall 2018 to spring 2019. table 4 below gives a descriptive breakdown of the citations by resource type. the student research assistant identified and categorized the source citation list. 
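before turning to that breakdown, the fuzzy-matching step just described can be approximated in plain python. the sketch below uses difflib's similarity ratio as a stand-in for the excel fuzzy lookup add-in's own scoring, flags candidate pairs at or above the .80 cutoff used in the study, and leaves confirmation to a manual title and resource-type check; the example titles are taken from table 3, and the scores produced here will differ from the add-in's.

```python
# approximate fuzzy title matching between cited sources and api result titles;
# difflib's ratio is a stand-in for the excel fuzzy lookup add-in's similarity score.
from difflib import SequenceMatcher

THRESHOLD = 0.80  # cutoff used in the study to flag candidates for hand checking

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_matches(cited_titles, result_titles):
    # yield (cited, result, score) pairs at or above the threshold
    for cited in cited_titles:
        for result in result_titles:
            score = similarity(cited, result)
            if score >= THRESHOLD:
                yield cited, result, score

cited = ["a short history of biological warfare", "drug use & abuse"]
results = ["a short history of biological warfare",
           "drug use and abuse : a comprehensive introduction"]

for cited_title, result_title, score in candidate_matches(cited, results):
    print(f"{score:.2f}  {cited_title}  ->  {result_title}")
```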
information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 9 table 4. total source citations resource type fall 2018 (% of total) spring 2019 (% of total) book chapter 7 (1.94%) 4 (1.08%) books (e-books/print) 107 (29.72%) 96 (25.95%) newspaper article 63 (17.50%) 60 (16.22%) journal article 84 (23.33%) 99 (26.76%) reference entry 6 (1.67%) 6 (1.62%) other/ cannot determine 10 (2.78%) 15 (4.05%) web document 81 (22.50%) 90 (24.32%) magazine article 1 (.28%) n/a newspaper/magazine article 1 (.28%) n/a semester citation count 360 (100%) 370 (100%) total citation count 730 target citations list data the citations collected from the papers were then compared against 60,000 citations retrieved from the wsu primo search api endpoint on july 24, 2020, as described previously in the methods section. to better account for the differing numbers of citations among resource types in the source data and to normalize reporting across query types and semesters, most results are presented as a percentage and referred to as the matching success rate. for example, the natural language query had six matches out of a possible 360 citations in the open-ended search for citations from the fall of 2018. the matching success rate of the open-ended search in the fall of 2018 therefore is calculated at 1.67% (see table 5). table 6 below shows the percentage results for short queries, and table 7 for full queries. for information about the raw source numbers and target data, please see the open science framework project site.21 when all query types and faceting modes are considered, the matching success rate almost uniformly increased from fall 2018 to spring 2019. the largest difference in matching success rate was observed in the full-query articles only search at 8.91% as shown in table 7. the open-ended search observed the smallest difference in positive movement and the anomaly of a diminishing success rate. across the natural language and full-query types the open-ended search exhibited the least amount of positive difference in success rate, at 1.04% and 0.26% respectively, and the short-query open-ended search had a small negative change in success rate at −0.36%. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 10 table 5. natural language query results success rate fall 2018 spring 2019 % difference open-ended search 1.67% 2.70% 1.04% articles only 4.76% 9.09% 4.33% books only 3.74% 11.46% 7.72% newspapers only 0.00% 1.67% 1.67% table 6. short-query results success rate fall 2018 spring 2019 % difference open-ended search 3.33% 2.97% −0.36% articles only 3.57% 5.05% 1.48% books only 9.35% 10.42% 1.07% newspapers only 0.00% 3.33% 3.33% table 7. full-query results success rate fall 2018 spring 2019 % difference open-ended search 0.56% 0.81% 0.26% articles only 1.19% 10.10% 8.91% books only 0.93% 5.21% 4.27% newspapers only 0.00% 5.00% 5.00% total unique matches across all three search strategies and their four iterations, the researchers also note a raw count of matches which helps to determine how an overall search strategy is performing at finding matching citations. as the reader might expect, this metric includes a matching citation once across all four iterations of a search strategy. 
meaning, even if a source citation appears in both the open-ended search and the books only search, that source citation is only counted once for the purpose of this metric. for example, in the natural language query in fall 2018, six citations were matched in the openended search. four of the citations were articles and two were books. some of the matches in the articles and books searches were redundant to the open-ended search. considering only unique matches in the articles, books, and newspaper searches, the authors calculated the total number of unique matches. when the target searches were compared, the researchers matched two additional citations in the books only citations list. when the authors add the two additional matches, there were a total of eight unique citation matches across all iterations of the natural language search (open-ended search, books only, articles only, newspapers only). the total unique matches number and the corresponding success rate of the total unique matches for each search strategy is shown in table 8. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 11 table 8. total unique matches fall 2018 spring 2019 % difference natural language query 8 (2.22%) 22 (5.95%) 3.72% short query 14 (3.89%) 16 (4.32%) 0.44% full query 3 (0.83%) 18 (4.86%) 4.03% matches added by faceting another metric used to measure overall effectiveness of faceted searching is the percentage of matching citations that are new to the results list when limited to a certain resource type— matches added by faceting. meaning, what matching citations were not present in the open-ended search results but are then matched when the results list is reduced to only a single resource type. in table 9, the percentage of matches that are new and only to be found in a targeted search result varies greatly. between both semesters and among all search iterations, the smallest percentage of matches added by faceting is 14.29% and the largest is 83.33%. table 9. matches added by faceting fall 2018 spring 2019 % difference natural language query 2 (25.00%) 12 (54.55%) 29.55% short query 2 (14.29%) 5 (31.25%) 16.96% full query 1 (33.33%) 15 (83.33%) 50.00% comparing search strategies the matching success rate across search strategies (natural, short, full) and iterations is a mixed result and does not allow for very useful comparison beyond descriptions of difference which are outlined in the comparison tables (tables 5–7). to better compare the search strategies as a whole, as opposed to how a particular iterative search performed relative to another open or targeted search, the researchers used a weighted success rate of the total unique matches from both semesters as the proxy for overall performance and the point of comparison among the three search strategies. the comparison of this weighted success rate shows no difference in overall success rate between the natural language query (4.11%) and the short query (4.11%). the search strategy that was demonstrably different in weighted success rate is the full query at a lagging 2.88%. see table 10 for comparison and calculation details. table 10. weighted success rate of total unique matches natural language query (2.22%*360)+(5.95%*370)/730 4.11% (0.04109589) short query (3.89%*360)+(4.32%*370)/730 4.11% (0.04109589) full query (0.83%*360)+(4.86%*370)/730 2.88% (0.02876712) discussion how effective is primo at returning relevant results? 
discussion

how effective is primo at returning relevant results?

according to the preliminary findings, primo is relatively ineffective at providing search results that match the citations used by the student researchers. the matching success rates of the open-ended searches range from 0.56% to 3.33%. the possible reasons for these low numbers are numerous and varied: students may have intended to use sources from the auto-generated results lists but been unable to locate the full text; many sources may have been found on the open internet outside the discovery layer; and open-ended searches may have been flooded with rarely cited reference materials and very contemporary newspaper articles (see more about these ideas below). future research aims to understand more clearly which potential factors are present and to what degree they impact the matching success rates.

to what extent does faceting improve search results?

faceting within primo leads to better results, although the matching success rates remain low. the faceted searches contain the only matching success rates above ten percent: 10.10% (full query, articles only), 10.42% (short query, books only), and 11.46% (natural language query, books only). the data show that the majority of unique matches found by the 2019 full-length and natural language search strategies occurs within the faceted searches (83.33% and 54.55%, respectively). it is interesting to note that these represent the two longer query strings, on average. future testing will reveal whether there is a relationship between query length and the percentage of matches added by faceting.

which search strategies are the most effective within the given framework?

looking at the search strategies holistically, the researchers note that the total unique matches increased from fall 2018 to spring 2019 across all three query types. this increase was expected behavior, partially due to the fact that primo relevancy ranking algorithms assume that patrons prefer newer materials.22 the weighted success rate is an attempt to understand each search strategy's performance over the 2018–2019 academic year, as opposed to comparing one semester to the other. by this metric, the consistent performance of the short-length query is as effective overall as the more dynamic performance of the natural language query. the researchers look forward to adding more data to this metric to understand in which direction the average might move.

how to refine the framework for future investigations into relevancy

the most popular resource types used in the source citations were books, journal articles, web documents, and newspaper articles. together, these categories comprised approximately 93% of all resource types in both fall 2018 and spring 2019. however, not all areas were equally accessible within washington state university's discovery layer configuration. the heavy reliance on web documents in the source citations was somewhat problematic, given that web documents did not constitute a faceted resource type in wsu libraries' primo prior to this study. therefore, the authors will need to better account for web documents in future testing. the assessment of newspaper articles also proved to be problematic, given their proclivity to inundate primo search results with numerous and recent documents.
the sheer number of newspaper articles published and indexed every year in primo for general and introductory topics can dilute the pool of possible target citations greatly. for example, a scan of the matching newspaper articles reveals that 67% (4/6) were published in 2018. in future studies, the researchers will limit publication dates for target citations to the appropriate time period (e.g., an upper limit of may 2019 would be placed on publication dates for papers written in spring 2019) or collect data closer to the submission of research papers. in 11 out of 12 cases, matching success rates were better in spring 2019 than fall 2018, most likely due to recency. it is common for discovery environments, and true for the environment used in this study, to present content information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 13 sorted by relevance and then publication date. therefore, the researchers expected to and did find an increased matching success rate closer to the date of testing, with the one exception of the short-length, open-ended search query. this anomaly led researchers to dig more deeply into the target citations to see if a cause could be determined. researchers found a larger than expected number of citations for resource types that are underrepresented in source citations. for example, the reference entry resource type surfaced prominently in the open-ended search for several of the queries, diluting the pool of target citations with entries that had little chance of appearing in the source citation lists. in one standout example, there were four separate reference entries titled simply “taiping rebellion.” the discovery environment gave preference to these four separate reference entries over other, more substantive works, that are more likely to be cited in an academic paper. the researchers surmise this is partly a function of the relevancy ranking algorithm that gives greater weight to matches in the title, author, and subject fields.23 depending on the search and the configuration of the discovery environment, it is possible that reference entries would push other results from books, articles, and newspaper resource types farther down the results list, making them less and less visible in an open-ended search for a given topic. this dilution of the target citations with resource types that are not emphasized or widely used in source citations is another area the researchers aim to isolate and examine in further rounds of testing. in addition to the source recency and particular source type issues explained above, the authors did not take into account source availability, nor where sources were found by students, which remains a confounding factor on matching success rate. subsequent studies will capture whether sources are present in the local deployment of primo during the time frame the students were conducting research. this issue will be further addressed and mitigated by analyzing urls provided within student source citations. implications of this study for end users the matching success rate in the open-ended search when compared to the type-limited searches leads to a discussion of how to define and present the default search of the discovery environment to best serve an academic population. more pointedly, it opens the discussion of what resource types to include within that default search to return the most relevant and useful results and not just the most results. 
in this case, the argument could be made that excluding several resource types (e.g., reference entries) would surface resources that are more likely to be cited in a researcher’s scholarship. based on the number of matches that were introduced by performing a faceted search, it is evident that researchers still need to utilize a search strategy which includes using search filters and limiters (prior to or following the initial search) and other search tactics in a discovery environment to return relevant results. the notion that an open-ended “one and done search,” for even the most introductory of topics, will be successful in retrieving many usable and citable resources in the first page or two of results is not supported by the results of this study. conclusions and next steps as the common adage goes, “it’s not what you say, it’s what you do.” in this study, the saying applies as the researchers move beyond what sources students think are relevant to the sources students ultimately use in their papers. the current slate of discovery environment research projects focuses largely on users’ affective connections to discovery environments, often information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 14 compared to other kinds of academic databases, and places users in temporary, hypothetical research scenarios in order to judge source relevance.24 in juxtaposition, the rci research project is a term-length (10–14 weeks) venture; students have a significant amount of time and the aid of a scaffolded set of assignments, to bolster their source relevance assessment skills and authority. methodologies which closely mirror the authentic experiences and curriculum of the students are those which arguably will provide a more accurate picture of the value of the discovery environment in an academic setting. the authors of this study took the first steps in building a relevancy rating system for discovery environments. to standardize their preliminary results, they generated four metrics: matching success rate, total unique matches, matches added by faceted search, and weighted success rate. while the results of this study do not allow the researchers to draw statistical conclusions regarding the dominance of one search strategy over another in returning relevant results, the frequencies showed a better match (success) rate with faceted than non-faceted searching. discovery environments are commonly advertised as providing an easy to use, one-stop location for academic research needs, but the reality is more complex. students need to engage these systems with multiple search refinements to find valuable materials. this investigation was also the initial attempt to create a machine-generated framework to test the relevancy of web-based discovery environment’s results. as the authors look to build upon this preliminary study, there are several avenues to pursue that will enhance the methodology of the framework. one avenue is a refinement of the boundaries of the testing framework. this boundary refinement includes a re-examination of the criteria for inclusion in both the source citations and the search results list. in the current study, all student citations were deemed viable regardless of whether the source citation was able to be verified and accessed. this led to the inclusion of citations of lecture notes and other such materials that are not generally expected to appear in a discovery environment. 
the authors will also re-examine the inclusion of newspapers and reference works in open-ended searching. these two resource types are large in number, are not indexed very well, and often do not have descriptive titles. a portion of the next round of research will be dedicated to comparative testing (a/b) of generally deployed discovery environment configurations. another avenue of exploration is determining where in the results list a citation appears, not just the binary positive or negative, and measuring any impact based on behavior of the search (i.e., search construction) or behavior and configuration of the discovery environment. refining the methodology of the current framework will result in fewer potentially confounding factors and allow librarians to regain an understanding of relevancy when it comes to teaching discovery layers to student researchers. these next steps will contribute to the overall picture concerning the value and efficacy of web-based discovery environments that is steadily taking shape. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 15 endnotes 1 marshall breeding, “library technology guides: academic members of the association of research libraries: index-based discovery services,” library technology guides, https://librarytechnology.org/libraries/arl/discovery.pl. 2 “student learning goals,” washington state university common requirements, 2018, https://ucore.wsu.edu/about/learning-goals. 3 “welcome to the roots of contemporary issues,” washington state university department of history, 2017, https://ucore.wsu.edu/faculty/curriculum/root/. 4 “search it,” washington state university libraries, 2020, https://searchit.libraries.wsu.edu/. 5 megan oakleaf and neal kaske, “guiding questions for assessing information literacy in higher education,” portal: libraries and the academy 9, no. 2 (2009): 277, https://doi.org/10.1353/pla.0.0046. 6 oakleaf and kaske, “guiding questions.” 7 marlen prommann and tao zhang, “applying hierarchical task analysis method to discovery layer evaluation,” information technology and libraries 34, no. 1 (2015): 97, https://doi.org/10.6017/ital.v34i1.5600. 8 rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (2012): 186–207, https://doi.org/10.1353/lib.2012.0029; david comeaux, “usability testing of a web-scale discovery system at an academic library,” college & undergraduate libraries 19, no. 2–4 (2012): 199, https://doi.org/10.1080/10691316.2012.695671; greta kliewer et al., “using primo for undergraduate research: a usability study,” library hi tech 34, no. 4 (2016): 566–84, http://doi.org/10.1108/lht-05-2016-0052; blake galbreath, corey m. johnson, and erin hvizdak, “primo new user interface,” information technology and libraries 37, no. 2 (2018): 10–33, https://doi.org/10.6017/ital.v37i2.10191. 9 heather dalal, amy kimura, and melissa hofmann, “searching in the wild: observing information-seeking behavior in a discovery tool” (association of college & research libraries 2015 conference proceedings, march 25–28, 2015): 668–75, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/201 5/dalal_kimura_hofmann.pdf. 10 comeaux, “usability testing”; xi niu, tao zhang, and hsin-liang chen, “study of user search activities with two discovery tools at an academic library,” international journal of humancomputer interaction 30, no. 
5 (2014): 422–33, https://doi.org/10.1080/10447318.2013.873281; kevin patrick seeber, “teaching ‘format as a process’ in an era of web-scale discovery,” reference services review 43, no. 1 (2015): 19–30, https://doi.org/10.1108/rsr-07-2014-0023; kylie jarret, “findit@flinders: user experiences of the primo discovery search solution,” australian academic & research libraries 43, no. 4 (2012): 278–99, https://doi.org/10.1080/00048623.2012.10722288; aaron nichols et al., “kicking the tires: a usability study of the primo discovery tool,” journal of web librarianship 8, no. 2 (2014): 172–95, https://doi.org/10.1080/19322909.2014.903133; kelsey renee brett, ashley lierman, and cherie turner, “lessons learned: a primo usability study,” information technology and libraries 35, no. 1 (2016): 7–25, https://doi.org/10.6017/ital.v35i1.8965; galbreath, johnson, and hvizdak, “primo new user interface.” 11 zebulin evelhoch, “where users find the answer: discovery layers versus database,” journal of electronic resources librarianship 30, no. 4 (2018): 205–15, https://doi.org/10.1080/1941126x.2018.1521092. 12 boram lee and eunkyung chung, “an analysis of web-scale discovery services from the perspective of user’s relevance judgement,” journal of academic librarianship 42 (2016): 529–34, https://doi.org/10.1016/j.acalib.2016.06.016. 13 sarah p. c. dahlen and kathlene hanson, “preference vs. authority: a comparison of student searching in a subject-specific indexing and abstracting database and a customized discovery layer,” college & research libraries 78, no. 7 (2017): 878–97, https://doi.org/10.5860/crl.78.7.878. 14 stefanie buck and christina steffy, “promising practices in instruction of discovery tools,” communications in information literacy 7, no. 1 (2013): 66–80, https://doi.org/10.15760/comminfolit.2013.7.1.135; anita k. foster, “determining librarian research preferences: a comparison survey of web-scale discovery systems and subject databases,” journal of academic librarianship 44 (2018): 330–36, https://doi.org/10.1016/j.acalib.2018.04.001. 15 diane cmor and xin li, “beyond boolean, towards thinking: discovery systems and information literacy,” 2012 iatul proceedings, paper 7, https://docs.lib.purdue.edu/iatul/2012/papers/7/; kliewer et al., “using primo”; alexandra hamlett and helen georgas, “in the wake of discovery: student perceptions, integration, and instructional design,” journal of web librarianship 13, no. 3 (2019): 230–45, https://doi.org/10.1080/19322909.2019.1598919.
16 courtney lundrigan, kevin manuel, and may yan, “‘pretty rad’: explorations in user satisfaction with a discovery layer at ryerson university,” college & research libraries 76, no. 1 (2015): 43–62, https://doi.org/10.5860/crl.76.1.43. 17 hamlett and georgas, “in the wake of discovery.” 18 hugo c. huurdeman, mikaela aamodt, and dan michael heggo, “‘more than meets the eye’—analyzing the success of user queries in oria,” nordic journal of information literacy in higher education 10, no. 1 (2018): 18–36, https://doi.org/10.15845/noril.v10i1.270. 19 huurdeman, aamodt, and heggo, “more than meets the eye.” 20 christina stohn, “how do users search and discover?: findings from ex libris user research,” ex libris, 2015, https://www.exlibrisgroup.com/blog/ex-libris-user-studies-how-do-users-search-and-discover/. 21 alex merrill and blake l. galbreath, “a framework for measuring relevancy in discovery environments,” 2020, https://osf.io/ve3kp/. 22 “primo search discovery: search, ranking, and beyond,” ex libris, 2015, https://www.exlibrisgroup.com/products/primo-discovery-service/relevance-ranking/. 23 “primo search discovery,” 3. 24 lee and chung, “an analysis of web-scale discovery services”; dahlen and hanson, “preference vs. authority”; lundrigan, manuel, and yan, “pretty rad”; hamlett and georgas, “in the wake of discovery.”

president’s message: 50 years
andromeda yelton
information technologies and libraries | september 2017

fifty years. lita was voted into existence (as isad, the information science and automation division) in detroit at midwinter 1966. therefore we have just completed our first fifty years, a fact celebrated (thanks to our 50th anniversary task force) with a slide show and cake at annual in chicago. it’s truly humbling to take office upon this milestone. looking back, some of the true giants of library technology have held this office.
in 1971-72, jesse shera, who in his wide-ranging career challenged librarians to think deeply about the epistemological and sociological dimensions of librarianship; ala makes several awards in his name today. in 1973-74 and again in 1974-75, frederick kilgour, the founding director of oclc, who also has an eponymous award. in 1975-76, henriette avram, the mother of marc, herself. moreover, thanks to the work of countless lita volunteers, much of this history is available openaccess. i strongly recommend reading http://www.ala.org/lita/about/history/ for an overview of the remarkable people and key issues across our history. you can also read papers by avram and kilgour, among many others, in the archives of this very publication. in fact, reading the ital archives is deeply engaging. it turns out library technology has changed a bit in 50 years! (i trust that isn’t a shock to you.) the first articles (in what was then the journal of library automation) are all about instituting first-time computer systems to automate traditional library functions such as acquisitions, cataloging, and finance. the following passage caught my eye: “a functioning technical processing system in a two-year community college library utilizes a model 2201 friden flexowriter with punch card control and tab card reading units, an ibm 026 key punch, and an ibm 1440 computer, with two tape and two disc drives, to produce all acquisitions and catalog files based primarily on a single typing at the time of initiating an order” (“an integrated computer based technical processing system in a small college library”, jack w. scott; https://doi.org/10.6017/ital.v1i3.2931.) how many of us are still using punch cards today? and, indeed, how many of us are automating libraries for the first time? the topics discussed among lita members today are far more wideranging: user experience, privacy, accessibility. they’re more likely to be about assessing and improving existing systems than creating new ones, and more likely to center on patron-facing technologies. andromeda yelton (andromeda.yelton@gmail.com) is lita president 2017-18 and owner/consultant of small beautiful useful llc. president’s message | yelton https://doi.org/10.6017/ital.v36i3.10086 2 and yet, with a few substitutions — say, “raspberry pi” for “friden flexowriter” — the blockquote above would not be out of place today. then as now, lita members were doing something exciting, yet deeply practical, that cleverly repurposes new technology to make library experiences better for both patrons and staff. our job descriptions have changed enormously in fifty years; in fact, the lita board charged a task force to develop lita member personas, so that we can better understand whom we serve, and work to align our publications, online education, conference programming, and committee work toward your needs. (you can see an overview of the task force’s stellar work on litablog: http://litablog.org/2017/03/who-are-lita-members-lita-personas/.) at the same time, the spirit of pragmatic creativity that runs throughout the first issues of the journal of library automation continues to animate lita members today. i’m looking forward to seeing where we go in our next fifty years. 
communications

creating and deploying usb port covers at hudson county community college

lotta sanchez and john delooper
information technology and libraries | september 2019

lotta sanchez (lsanchez@hccc.edu) is library associate – technology, hudson county community college. john delooper (john.delooper@lehman.cuny.edu) is web services – online learning librarian, lehman college, city university of new york.

abstract

in 2016, hudson county (nj) community college (hccc) deployed several wireless keyboards and mice with its imac computers. shortly after deployment, library staff found that each device’s required usb receiver (a.k.a. dongle) would disappear frequently. as a result, hccc library staff developed and deployed 3d printed port covers to enclose these dongles. this, for a time, proved very successful in preventing the issue. this article will discuss the development of these port covers, their deployment, and what worked and did not work about the project.

introduction

3d printing was invented in the 1980s but remained a niche product until emerging as a mainstream technology beginning in 2009. it has been speculated that the growth in popularity was due to several factors, most notably the expiration of patents on technologies such as fused deposition modeling.1 the expiration of this patent led to the emergence of several new companies such as makerbot, which developed and released lower priced 3d printers in an effort to popularize 3d printing.2 nevertheless, early 3d printers were still more expensive than most individual consumers could afford. as with laser printers in the 1980s, many libraries combined their role in the growing makerspace movement with their community purchasing power to bring this new technology to libraries across the united states.3 libraries thus became focal points in the nascent consumer 3d printing movement, frequently providing both training and access to supplies and equipment. as the popularity of 3d printing grew, new communities of 3d printing users emerged and began to design and share artwork and practical objects created with 3d printing technology, often via communities like thingiverse and shapeways.

3d printing at hudson county community college

in august 2014, the hudson county (nj) community college (hccc) library moved into a larger facility, nearly doubling its square footage. at this time, many libraries were beginning to open makerspaces, which are facilities for collaboration where the “emphasis is on creating with technology,” and hccc saw an opportunity to join this movement.4 given the results of student feedback surveys, and the observed popularity of 3d printing in public libraries, hccc librarians sought to purchase a 3d printer as a signature technology for the new makerspace. to support the new makerspace, the library’s staff implemented a series of workshops to teach students how to use the 3d printer and create their own projects. in addition, when the makerspace was not in use, the library’s administration allowed staff to experiment with the 3d printer, as well as all technologies housed in the makerspace, to allow them to better understand and promote these tools.
about hudson county community college

as per its 2017-18 hccc factbook, hccc is an urban institution “offering courses and classes in a wide variety of disciplines and studies in one of the most densely populated and ethnically diverse areas of the united states.”5 as of fall 2017, hccc’s full-time-equivalent student population is 7,712, and includes students representing “more than 90 nationalities.” many of these students hail from outside of the united states, “nearly 58 percent of whom speak a language other than english in their homes.” hccc’s demographics also skew young, with students ages 20 through 29 comprising approximately 52 percent of enrolled students. more recently, hccc has also increased its enrollment of high school students, as “the number of students under the age of 18, who are mostly enrolled through hccc’s various high school initiative programs, has more than quadrupled over the past five years.” as with many other community colleges, hccc’s student body includes approximately a 6:4 ratio of female to male students.

the mac usb dilemma

as part of the move to a new facility, the library purchased several new technologies such as computers including dell pcs and apple imacs (macs). the dell pcs came with wireless keyboards and mice, and in march 2016, the macs were switched to wireless keyboards and mice as well because their original keyboards and mice began to break down and needed replacement. students reported to library staff that the wireless keyboards and mice were a good investment, as they made it easier to move keyboards for better collaboration and for ease of storing backpacks and textbooks on desks. on both the dell pcs and the macs, the wireless keyboards and mice required the use of a small usb receiver, known as a dongle, to connect to the computer. as the wireless keyboards were installed, several library staff members raised concerns that wireless keyboards and mice would be tempting targets for theft by patrons. surprisingly, theft of keyboards and mice did not come to pass. since deployment, library staff reported no incidents of theft of any keyboards or mice. however, an unexpected type of theft soon emerged. library employees noticed that on the imacs, the type-a usb dongles, which were needed for the computers to receive input from the keyboards and mice, started disappearing. staff observed that this seemed to be a problem only among the library’s 18 macs, not its 57 dell computers, which also had wireless keyboards and wireless receivers. anecdotal observation suggested that this phenomenon emerged due to the dell’s dark color scheme, which obscured each computer’s usb ports and rendered the dongles inconspicuous. in contrast, the imacs had sleek aluminum finishes, on which the dongles were more visible, and seemed to be perceived by students as flash drives (see figures 1-5).

figure 1. hccc imac (back with dongle shown).
figure 2. hccc dell pc (back with dongle shown).
figure 3. hccc imac with usb dongle closeup.
figure 4. hccc pc with usb dongle closeup.
figure 5. comparison of mac and pc usb ports.
these perceptions were confirmed as students started visiting service points with dongles from the macs and turning them in to library staff as “lost flash drives.” as there was frequently a lag between when a dongle was turned over to staff and the device’s initial disappearance, students began to report frustration that they would try to use a mac and find that the mouse and keyboard could not communicate with the computer. this would cause them to assume that the mac was broken, and library staff would respond by taking the computer out of service until a tech could examine it, often several hours or even days later depending on staffing. during the first semester that these keyboards and mice were deployed, the library found that almost every usb receiver was lost or stolen. this resulted in over $300 of unplanned expenses. in addition, library staff spent dozens of hours inspecting the imacs after students reported non-functioning keyboards, determining what issue was occurring, ordering replacement parts, and connecting new dongles, a process also referred to as “pairing.” to address this problem, hccc’s director of library technology sought solutions from the library’s technology staff. at a staff meeting in the spring 2016 semester, most of the members of the technology unit suggested that the library address the disappearing dongle issue by purchasing new wired keyboards and mice. the director of the technology unit felt that this was a premature solution to the issue, as he and the library administration preferred a solution that allowed the library to continue to use the wireless keyboards and mice, which were both costly and requested by the institution’s student community. during this meeting, the idea of finding port covers for the dongles arose, and one of the library’s technology associates suggested using the library’s 3d printer to create a cover that inserts into one type a usb port and would cover the dongle in the adjacent slot. the library’s technology director asked her to create a prototype, and the technology associate began work on creating this port cover.

methodology

to create the 3d-printed port cover, the technology associate began with an online search of the 3d-printing community thingiverse, looking to see if any other 3d port covers already existed. she hoped to find an existing port cover that was both functional and easy to manufacture—in other words, quick to print, since the library’s makerbot replicator often took hours to print intricate designs and frequently jammed, due to an extruder design flaw that was common to fifth generation replicator printers.6 a thingiverse search found several varieties of port covers, but each was designed solely to occupy a port in order to prevent dust or corrosion, not to cover or hide dongles or other peripherals. since none of the existing designs adequately met the library’s needs, the technology associate created her own design using tinkercad, a web-based computer aided design (cad) program (see figure 6).

figure 6. picture of port cover design in tinkercad.

since each of hccc’s imac computers contained four type a ports, students would often attach other peripherals such as phone charge/sync cables or flash drives.
the technology associate thus designed a cover that would not hinder students who wanted to insert their usb flash drives or other devices, as is depicted in figures 7-10.

figure 7. closeup of port cover.
figure 8. alternate angle of port cover.
figure 9. picture of dongle with port cover installed.
figure 10. port cover allowed space to utilize other usb ports for flash drives, etc.

she then exported the tinkercad design as an stl (stereolithography) file, and printed prototypes on the makerbot replicator using pla filament. finding that her initial measurements did not quite fit, she adjusted the models one millimeter at a time, and reprinted them until the fit was secure and the dongles were covered. at this point, she printed enough covers for each mac, along with a few spares in case covers broke or wore out during normal operation.

results

at the beginning of the fall 2016 semester, the port covers were deployed to each of the library’s macs. during that semester, the technology associate monitored the effectiveness of the port covers. by the end of the semester, four port covers had disappeared, along with one dongle. at the beginning of the spring 2017 semester, the missing dongle was replaced, and replacement port covers were printed and deployed to the machines from which the port covers disappeared. again, the success of the port cover installation project was monitored. during this period, four port covers disappeared, along with two dongles. after the spring semester, the technology associate conferred with the director of library technology, and they decided that given the relatively low cost of the 3d-printer filament used to print the covers, and the greatly reduced receiver theft rate, this was an acceptable loss. they therefore decided to continue utilizing the port covers. but, in the fall 2017 semester, five covers and each of their corresponding dongles disappeared. then, during the spring 2018 semester, all of the port covers disappeared at least once, as did the associated dongles. in total, 20 dongles were lost during that semester. the director of library technology and the technology associate conferred once again, and decided that due to this increase in theft, and a concurrent change in the college’s purchasing process, the library would abandon the 3d-printed port cover experiment.

analysis

after two seemingly successful semesters, library staff were proud of the changes that resulted from deploying the port cover. yet given the reoccurrence of the theft pattern in subsequent semesters, they started to worry that printing new port covers was not a sustainable practice. to that end, the technology associate considered several theories as to what would cause the port covers to disappear. for instance, research by keizer, lindenberg, and steg found that acts of social disorder (such as graffiti or litter) will spread if not stopped promptly.7 under this framework, it could be suggested that the library was too slow to respond to missing covers, and thus permitted the loss of the dongles due to insufficient action or maintenance.
this theory seems logical since following an enrollment decline that began in fall 2016, a hiring freeze was instituted so as staff members left the institution, few positions were replaced. indeed, as of fall 2018, hccc’s staff is 75 percent part-time and part-timers are subject to renewal or dismissal every six months. in addition, many library employees are student workers, who often leave at graduation, and other part-time staff tend to find full-time employment or leave the library for full-time work at rates that may exceed other institutions who have more permanent staff. with limited staff resources, many of the library’s employees noted anecdotally that they were not able to give as much attention toward preventative maintenance on library computers as they had in prior semesters. therefore, they did not have time to proactively monitor equipment such as port covers and dongles. it is also possible that a novelty factor was at play. perhaps when the covers were first deployed, the brightly colored filaments stood out on the aluminum computers, making students more likely information technology and libraries | september 2019 101 to notice them and alter their behaviors accordingly. if this was the case, new students who began their coursework in subsequent semesters would not have known that port covers were an additional piece that had been added to the library’s computers in response to prior issues. following this speculation, the library’s patrons who removed port covers in fall 2017 and spring 2018 might have thought they were removing damaged or nonfunctional flash drives similarly to the students who brought what they believed were lost flash drives to library staff during the spring 2016 semester. finally, the difference in semesters could also have been due to random chance, in which case, no staff action could have affected the rate at which port covers disappeared. conclusion and future research being unsure of which of these analyses was most correct, the technology associate had planned to learn from the sudden resurgence in thefts in several ways. she planned to experiment with adding signage about the importance of dongles and the usage of port covers, and to interview student mac users to find out their perceptions about the port covers, as well as possible ideas and student-generated suggestions to prevent future thefts. she also considered designing and experimenting with printing more elaborate port covers to see if increased visibility or an elaborate shape would change theft rates. however, a complication arose during the 2018 and 2019 fiscal years. during this time, the college’s finance office changed its purchasing procedures. first, they eliminated the library’s technology budget, centralizing all technology purchases in a “pool,” whose total budget was uncertain. to make purchases from this pool, departments had to create detailed needs justification and obtain approvals from four high-level executives, in addition to the preexisting procedure of obtaining quotes and getting department head and vice president approval. while the library was eventually able to obtain funds from this process, navigating the pool process typically took about six months per purchase, which meant that, in effect, replacement dongles had to come from existing supplies. 
in addition, the supplies budget line, which was greatly reduced due to the enrollment decline, also came under increased scrutiny, and the purchasing department began to refuse to approve the purchase of batteries. while many of the mac keyboards were solar powered, and thus did not require batteries, all of their wireless mice, along with the wireless keyboards and mice on the windows pcs, required the use of either aa or aaa batteries. as battery supplies dwindled, the purchasing department did eventually agree to allow purchase of more batteries, under the condition that the library begin going through the pools process to purchase wired keyboards and mice. in the meantime, the technology associate continues to monitor wireless dongles, reprint port covers, and swap wired keyboards from the library’s spare parts inventory for wireless ones as dongles have disappeared. the creation of 3d-printed port covers was successful at preventing equipment loss at hccc for only two semesters before failing to fulfill that purpose. library staff speculated about the cause of this change but were unable to make that determination with certainty before budgetary changes caused the end of the 3d-printed port cover experiment. nevertheless, this project proved valuable to the library to better learn about 3d-printing technology, and to experiment with its practical uses in the library environment.

endnotes

1 filemon schoffer, “how expiring patents are ushering in the next generation of 3d printing,” techcrunch (blog), june 5, 2016, http://social.techcrunch.com/2016/05/15/how-expiring-patents-are-ushering-in-the-next-generation-of-3d-printing/.
2 christopher mims, “3d printing will explode in 2014, thanks to the expiration of key patents,” quartz (blog), july 21, 2013, https://qz.com/106483/3d-printing-will-explode-in-2014-thanks-to-the-expiration-of-key-patents/.
3 jason griffey, “absolutely fab-ulous,” library technology reports 48, no. 3 (april 2012): 21–24, https://journals.ala.org/index.php/ltr/article/view/4794.
4 caitlin bagley, “what is a makerspace? creativity in the library,” ala techsource, december 20, 2012, http://www.ala.org/tools/article/ala-techsource/what-makerspace-creativity-library; united for libraries, american library association office for information technology policy, and public library association, “progress in the making: an introduction to 3d printing and public policy,” september 2014, http://www.ala.org/advocacy/sites/ala.org.advocacy/files/content/advleg/pp/hometip-3d_printing_tipsheet_version_9_final.pdf.
5 hudson county community college, “fact book 2017-2018,” 2018, https://www.hccc.edu/uploadedfiles/pages/explore_hccc/visiting_hccc(1)/factbook-%20final%20web%20version.pdf.
6 adi robertson, “makerbot is replacing its most ill-fated 3d printing product,” the verge (blog), january 4, 2016, https://www.theverge.com/2016/1/4/10677740/new-makerbot-smart-extruder-plus-3d-printer-ces-2016.
7 kees keizer, siegwart lindenberg, and linda steg, “the spreading of disorder,” science 322, no. 5908 (2008): 1681–85.
electronic library for scientific journals: consortium project in brazil

rosaly favero krzyzanowski and rosane taruhn
information technology and libraries; jun 2000; 19, 2; pg. 61

making information available for the acquisition and transmission of human knowledge is the focal point of this paper, which describes the creation of a consortium for the university and research institute libraries in the state of sao paulo, brazil. through sharing and cooperation, the project will facilitate information access and minimize acquisition costs of international scientific periodicals, consequently increasing user satisfaction. to underscore the advantages of this procedure, the objectives, management, and implementation stages of the project are detailed, as submitted to the research support foundation of the state of sao paulo (fapesp).

production, organization, and acquisition of knowledge

in 1851, predicting the imminent growth in information, which in fact exploded in volume one hundred years later, joseph henri of the smithsonian institute voiced his opinion that the progress of mankind is based on research, study, and investigation, which generate wisdom, knowledge or, simply, information. he stated that for practically every item of interest there is some record of knowledge pertinent to it, “and unless this mass of information be properly arranged, and the means furnished by which its content may be ascertained, literature as well as science will be overwhelmed by their own unwieldy bulk. the pile will begin to totter under its own weight, and all the additions we may heap upon it will tend to add to the extension of the base, without increasing the elevation and dignity of the edifice.”1 at the threshold of the twenty-first century, these words become more self-evident by the day.
there are enormous archives of knowledge from which people extract parts, allowing them to advance and progress in science, technology, and the humanities. until some decades back, recovery from these archives was essentially a manual task consisting of written work and organization. today's technologies provide auxiliary tools to transmit this knowledge. although information is a cultural and social asset, it now is purchased at high prices. making these enormous archives available in a clear and organized manner by using the proper technology is currently the greatest challenge for all those involved in knowledge management-the production, organization, and transmission of information.

the advent and implications of electronic publications

among the major contributions of the industrial era, outstanding are the evolution and growth of information publishing and printing facilities that use tools to record, store, and distribute information. in the last ten years, the first steps were taken toward the storage and reproduction of sounds and images in new multimedia formats. technological advances also have brought new possibilities in accessing and disseminating information. electronic publishing has been particularly effective in accelerating access and contributing to the generation of additional knowledge; consequently, an exponential increase in data has taken place, most notably in the second half of the twentieth century. current journals numbered about 10,000 at the beginning of the century; by the year 2000 the number had reached an estimated 1 million.2 as a result, specialized literature has been warning about a possible crisis in the traditional system of scientific publications on paper. in addition to the difficulty of financing the publication of these works, the prices of subscriptions to scientific periodicals on paper have been rising every year. at times, this makes it impracticable to update collections in all libraries, which interferes substantially in development. on the other hand, access to electronic scientific publications via the internet is proving to be an alternative for maintaining these collections at lower cost. it also provides greater agility in publishing and distributing the periodical, and in the final user's accessing of the information. due to this, it is important that institutions that wish to support and promote research developed by their scientific communities facilitate access to these publications on electronic media. to paraphrase line, we can say that although publishers are still uncertain as to all the aspects of transmitting information electronically, because authors and institutions will be increasingly able to distribute their works on the web without the direct involvement of publishers, there is an escalation in electronic publications being published by scientific publishers.3

rosaly favero krzyzanowski is technical director of the integrated library system of the university of sao paulo-sibi/usp, brazil. rosane taruhn is director of the development and maintenance of holdings service of the technical department of the university of sao paulo-sibi/usp, brazil.
line also says that one of the reasons for the growth in the number of electronic publications is "that it is technically possible to make them [journals] accessible in this way, and in fact easy and cheap, since nearly all text goes through a digital version on the way to publication. secondly, journal publishers believe that electronic versions provide a second market in addition to that for their printed versions, or at least in an expanded market, since many users will be the same."4 it is important to point out that the scientific periodical, be it paper or electronic, must ensure market value and academic community receptivity, have a staff qualified for scientific publishing, be consistent in publishing release dates, comply with international standards, and use established distribution and sales mechanisms.5 line goes further: "electronic publication as an 'extra' to printed publication has few added costs of journal publication other than those of printing, and publishers are not going to want to make less money from electronic journals than they do from printed ones. while printed journals once acquired can be used and reused without extra cost, each access to an electronic article has to be paid for. and although the costs of storage and binding may be saved, these are offset by the costs of printing out."6 he then notes that this technology demands an active equipment and telecommunication infrastructure. another point he addresses is the need for users to master the search strategies required to efficiently recover information, thus reducing the time spent and costs. in turn, saunders points out that, depending on the contracts made with the publishers or their agents:

libraries, through their development, formation, and maintenance policies, should be receptive to this transition by accommodating the different means of communication to the different user needs and striving for a new balance. these policies should certainly stress the cooperation and sharing of remote access to the information demanded. budget estimates should, therefore, foresee, in addition to the subscriptions to electronic titles with complete texts, other possible items like licensing rates for multi-user remote access and the right to copy articles on electronic media to paper, depending on the contracts made with the publishers or their agents.7

electronic publication consortiums

catering to mutual interests by setting up a library consortium to select, acquire, maintain, and preserve electronic information is one means of reducing or sharing costs as well as expanding the universe of information available to users and ensuring a successful outcome. resources-physical, human, financial, and electronic-are combined for the common good; in this case, the consortium, as shown in figure 1, which was extracted and adapted from an oclc institute.8

figure 1. infrastructure resources for consortium formation.

the consortium presupposes invigoration of cooperative activities among member libraries by promoting the central administration of electronic publication databases as part of a shared library system visible to all and replete with access facilities.
in addition to putting in place simplified, reciprocal lending programs and spurring the cooperative development of collections and their storing, the consortium has the objective of implementing information distribution by electronic means, provided that copyright and fair use rights are complied with.9 on the other hand, "the research library community is committed to working with publishers and database producers to develop model agreements that deploy licenses that do not contract around fair use or other copyright provisions. in this way, one seeks to insure the library practices being disseminated, especially interlibrary lending."10 experience shows that acquiring publications through consortia has brought great benefits and has equally favored different size institutions that would not be able to afford single subscriptions, whether on paper or in electronic format. north american and european universities have been opting for this type of alliance to augment investment cost-benefit. important examples of these consortia currently operative are:

• washington research library consortium, washington, d.c., www.wric.org;
• university system of georgia, galileo project, www.galileo.peachnet.edu;
• committee on institutional cooperation, michigan, www.cedar.cic.net/cic; and
• ohio library and information network, ohio link, www.ohiolink.edu.

the electronic consortium in the state of sao paulo

considering that brazilian institutions also are being affected by the high cost of maintaining periodical collections and that alternative means of distributing this information are available, the model used abroad has shown itself as appropriate for developing the international scientific publications electronic library in the state of sao paulo. the location has a favorable information infrastructure available, particularly that of the electronic network of the academic network of sao paulo (ansp), thanks to the support of the research support foundation of the state of sao paulo (fapesp).11 growing user demand for direct, convenient access to information in the state of sao paulo also was a factor in location choice. the final decision was to compose the consortium of five sao paulo state universities-universidade de sao paulo (usp), universidade estadual paulista (unesp), universidade de campinas (unicamp), universidade federal de sao carlos (ufscar), and universidade federal de sao paulo (unifesp)-as well as the latin american and caribbean center for health science information (bireme). the consortium's goal was to make available to the member institutions' entire scientific community-10,492 faculty and researchers-rapid access to the complete, updated texts of the elsevier science scientific journals. this publishing house, an umbrella for north holland, pergamon press, butterworth-heinemann, and excerpta medica, presently publishes electronic versions of its journals. selection of the member institutions that would serve as a pilot group for this project was based on prior experience with the cooperative work in preparing the unibibli collective catalog cd-rom, which, using bireme/opas/oms technology, consolidates the collections of these three universities. the project was initially funded by fapesp; since its fourth edition the cd-rom has been published through funds provided by the universities themselves, by means of a signed agreement.
moreover, the choice of elsevier science, which could be justified solely by its premier ranking in the global publishing market, is also due to the fact that consortium member institutions maintain subscriptions to a great number (606) of this publishing house's titles on paper. already fully available on electronic media, these titles are components of a representative collection initiating the building of the international scientific publications electronic library in the state of sao paulo. furthermore, the majority of the titles are indexed on the institute for scientific information's web of science site, which has been at the disposal of researchers and libraries in the state of sao paulo since 1998. consortium objectives the consortium was formed to contribute to the development of research through the acquisition of electronic publications for the state of sao paulo's scientific community. using the ansp network, in addition to augmenting and speeding up access to current scientific information in all the member institutions, the consortium will:
• increase the cost-benefit per subscription;
• promote the rational use of funds;
• ensure continuous subscription to these periodicals;
• increase the universe of publications available to users through collection sharing;
• guarantee local storage of the information acquired and thus ensure the collection's maintenance and its continual use by present and future researchers; and
• develop the technical capabilities of the personnel of the state of sao paulo institutions in operating and using electronic publication databases.
initially, the project will not interfere in the current process of acquiring periodicals on paper and in distributing collections in member institutions. however, as electronic collection utilization becomes predominant, duplicate subscriptions on paper may be eliminated so as to allow new subscriptions to be available to the consortium at no additional cost. implementation of the electronic library for international scientific publications implementation of this project includes the following stages already achieved:
• constitution of the consortium by the six member institutions; and
• setup of an administrative board.
the following stages are in progress:
• purchase of hardware (central server) and management software; and
• estimate for the installation of the operating system.
the following stages are planned:
• training for qualified personnel and maintenance of the infrastructure built up;
• acquisition and implementation of the electronic library on the central server; and
• permanent utilization assessment.
[figure 2. reference database and full-text interconnectivity to optimize information access. the diagram links the bireme and fapesp servers, the web of science (8,000 titles) and current contents connect (ccc, 9,000 titles) reference databases, and the scielo (scientific electronic library online, 100 titles) and international scientific periodical electronic library (606 titles) full-text databases to the users in the consortium institutions.]
the pilot project proposes that the central server, for storage and availability of electronic scientific periodical collections on the ansp network, be located at fapesp in order to facilitate development of an electronic bank. in the future, the bank should, in addition to the collection in mind for the project, include international collections of other publishing houses: the scielo collection of brazilian scientific journals (project fapesp/bireme) as well as the web of science and current contents connect reference databases (see figure 2). consortium management the electronic library will be administered by the consortium's administrative board, made up of a general coordinator, an operations coordinator, and directors and coordinators of the library systems and central libraries of member institutions, as well as consultants recommended by fapesp. the administrative board shall be in charge of the implementation, operation, dissemination, and assessment of electronic library utilization. it also is charged with supervising qualified personnel training in order to guarantee the success of the project. an agreement was signed specifying the consortium's objective, its constitution, the manner in which it shall be executed, and the obligations established for consortium members. shortly, a contract to use elsevier science electronic publications shall be signed by fapesp and by the provider. the agreement's documents and use license were drawn up in compliance with the principles for licensing electronic resources recommended by the american library association, published in final version at the 1997 american library association annual conference.12 recovery system and information use evaluation research on electronic media suggests that use of a single software program that offers different strategies and forms of interacting for searching the collections requires an evaluation of the efficiency of individual research strategies. this evaluation is critical for preparation of guidelines that orient the choice of systems and proper training programs.13 for the electronic library, the challenge of measuring not only the amount of file use but also the efficacy and efficiency of its information access systems and training for its users is an imperative task. in the project described, evaluation shall be made by indicators that demonstrate use of the electronic library and of the collections on paper, per journal title, subject researched, user institution, number of accesses per day, and user satisfaction regarding service provided (interface, response time, text copies), among other factors to be studied. final remarks the way in which electronic media are read by users is a code far beyond the written, because sound and image are being added increasingly. in this first generation of electronic publications, fapesp supported the availability of web of science and of scielo and the creation of the international scientific publications electronic library in the state of sao paulo. the possible introduction of current contents connect will trigger an extraordinary leap in research development, facilitating access to scientific information and the acquisition and transmission of human knowledge as well as enhancing the cooperative and sharing enterprise of member libraries. references and notes
1. annual report of the board of regents of the smithsonian institution ... during the year 1851 (washington, d.c., 1852), 22.
2. leo wieers, "a vision of the library of the future," in developing the library of the future: the tilburg experience, h. geleijnse and c. grootaers, eds. (tilburg, the netherlands: tilburg univ., 1994), 1-11.
3. m. b. line, "the case for retaining printed lis journals," ifla journal 24, no. 1 (oct./nov. 1998): 15-19.
4. ibid.
5. r. f. krzyzanowski, "administracao de revistas cientificas," in reuniao anual da sociedade de pesquisa odontologica, aguas de sao pedro, 14, 1997. (lecture)
6. line, "the case for retaining printed lis journals."
7. l. m. saunders, "transforming acquisitions to support virtual libraries," information technology and libraries 14, no. 1 (mar. 1995): 41-46.
8. oclc institute, oclc institute seminar: information technology trends for the global library community, 1997, ohio (dublin, ohio: oclc institute/the andrew w. mellon foundation/fundacao getulio vargas/bibliodata library network, 1997).
9. a definition of fair use is the "legal use of information: permission to reproduce texts for the purposes of teaching, study, commentary or other specific social purposes." found in j. s. d. o'connor, "intellectual property: an association of research libraries statement of principles." accessed july 28, 1999, http://arl.cni.org/scomm/copyright/principles.html.
10. statement of current perspective and preferred practices for the selection and purchase of electronic information. icolc statement on electronic information. accessed july 2, 1998, www.library.yale.edu/consortia/statement.html.
11. r. f. krzyzanowski and others, biblioteca eletronica de publicacoes cientificas internacionais para as universidades e institutos de pesquisa do estado de sao paulo. sao paulo, 1998 (project presented to fapesp, fundacao de amparo a pesquisa do estado de sao paulo).
12. b. e. c. schottlaender, "the development of national principles to guide librarians in licensing electronic resources," library acquisitions: practice and theory 22, no. 1 (spring 1998): 49-54.
13. w. s. lang and m. grigsby, "statistics for measuring the efficiency of electronic information retrieval," journal of the american society for information science 47, no. 2 (feb. 1996): 159-66.
local hosting of faculty-created open education resources: launching pressbooks (communication) joseph letriz information technology and libraries | march 2022 https://doi.org/10.6017/ital.v41i1.13803 joseph letriz (jletriz@dbq.edu) is the electronic systems librarian, university of dubuque. © 2022. abstract rising costs of secondary education institutions, coupled with the inflated cost of textbooks, have forced students to make decisions on whether they can afford the primary materials for their classes. publishers working to supply digital access codes, which limit the ability of students to copy, print, or share the materials, or resell the textbook after the course is over, have further pushed students into forgoing purchasing materials. in recent years, institutions have moved to support oer (open education resources) initiatives to provide students a cost-free primary text or supplement to their materials. this allows students unfettered access to quality resources that help drive engagement in courses, from homework to discussions.
while larger institutions or in-state partnerships with resource sharing consortiums, such as the mnpals cooperation with the state of minnesota, provide access to platforms like pressbooks, smaller institutions and private colleges don’t always have the ability to negotiate these types of relationships. in this case study, i will cover the foundations necessary to start a low-cost, self-hosted solution to support faculty creation of oer material and the available resources that the university of dubuque utilized in their development process. this overview will briefly cover the skills and knowledge needed to support the growth of this initiative with minimal complexity and as little jargon as possible. introduction at the university of dubuque, the library installed, configured, and deployed an instance of pressbooks to support faculty development of open education resources (oer). the university of dubuque is a small, private university with a total full-time enrollment (fte) of about 2,000. two library personnel lead the deployment of the resource. as many universities find themselves grappling with an increase in textbook costs and other barriers to students’ access to quality information, libraries have emerged as a natural partner within institutions to identify, curate, and provide access to quality oer. okamoto points towards a variety of ways that libraries have managed this, including the community college consortium for open education resources (cccoer), which includes “150 member colleges … promot[ing] oer adoption to enhance teaching and learning.”1 braddlee and vanscoy state that librarians hold an important role in “supporting faculty and students in expanding the range of oer” through a number of methods, referencing prior research that okamoto performed.2 from a number of interviews from similarly sized liberal arts colleges, schleicher et al. state that librarians leading the initiatives for oer “may need technical skills … to assist faculty in developing oer projects.” 3 the benefits for students in terms of cost alone show that oer supported projects, such as the launch of pressbooks at the university of dubuque, has longstanding benefits for faculty, students, and the library.4 mailto:jletriz@dbq.edu information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 2 pressbooks is an open-source book content management system, making the software free for anyone to utilize, customize, and remix. with an open-source software as the basis for this project, the university of dubuque could view and change any of the underlying codebase to fit their exact needs to provide a platform for faculty to publish and develop oer content for their classes and students. the overlaying interface and configuration of pressbooks is built upon a fresh installation of the wordpress blog hosting system utilizing the multisite feature. these two systems are free to install, configure, and deploy on a locally hosted or cloud-based network. larger consortiums, which can consist of state level organizations, universities, and partnerships with businesses, may have the flexibility in spending to fund a hosted solution from the company itself. the cost of paying for a hosted solution can vary depending on the needs of the community served. 
a pressbooks edu single network plan, hosted by pressbooks, can cost $7,000 a year for the silver plan or $14,000 a year for the gold plan.5 at the university of dubuque, we opted for the low-cost solution of locally hosting our installation, which involved configuring the software locally and providing our own support for the faculty and students utilizing it. in this case study, i will detail how we successfully deployed the instance of pressbooks for the university of dubuque. this case study will cover the documentation used, the systems and services utilized to support the network, and the timeline from beginning the project to its successful launch. documentation to start the installation process, there needs to be a web server to host the pressbooks instance. at the university of dubuque, we used an already configured amazon web services (aws) account to set up the server that pressbooks would run on. aws offers a variety of tiers for its web server hosting, from the smallest available configuration that can be used for free to larger, more powerful instances for public access. at the university of dubuque, we opted for the aws t3a.large instance type, which gave us access to a faster server load for processing the installation and running the instance operations, as well as better network bandwidth.6 once we had the instance type selected, aws allowed for configuration of a variety of operating system (os) installations that come preconfigured or an à la carte option. we chose the same os platform that we utilize for our digital repository, a c-panel, centos 7 instance, as we already owned an educational software license for it. c-panel offers a reduced cost, education license available for any institutions with a .edu domain. the application to receive an educational license for the c-panel account takes little time to fill out and the only cost associated with the initial application is a $30 processing fee.7 once c-panel activated the education license on our primary platform, the license was utilized on the other instances without having to worry about multisite or platform license fees. aws launches the instance in the ec2 services page listed under the account, which details the instance’s setup, volumes attached to the instance that the software gets installed on (with additional volumes available to add onto the instance if necessary), and the ability to create snapshots of the instance for backups and restoration of the installation configuration. aws categorizes volumes as the primary storage devices for the installation, akin to a virtual hard drive, while the snapshots function as a copy of that storage device. during the configuration process, aws provides additional information about all of the options available in their ec2 service. as these are not directly relevant here, i will not go into detail about them. at the university of dubuque, we had preconfigured security groups and snapshot schedules set up that we applied to the pressbooks instance before we installed the underlying software. information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 3 the primary documentation used for the platform setup before installing the platform came from the pressbooks documentation site.8 the documentation begins by directing users to wordpress and their famous 5-minute install. 
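the article describes the ec2 provisioning entirely in terms of the aws console, but the same setup can be scripted. the sketch below is a minimal, hypothetical example in python with boto3 of launching a t3a.large instance with an attached ebs volume; the region, ami id, key pair, and security group id are placeholders and are not values from the university of dubuque deployment.

```python
# hypothetical sketch of provisioning the kind of ec2 instance described above;
# the region, ami id, key pair, and security group id are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",             # placeholder: a centos 7 ami id
    InstanceType="t3a.large",                    # instance type named in the article
    KeyName="pressbooks-admin",                  # placeholder key pair for ssh access
    SecurityGroupIds=["sg-0123456789abcdef0"],   # placeholder: preconfigured security group
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": 60, "VolumeType": "gp3"},  # primary volume for the install
    }],
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "pressbooks"}],
    }],
)

print(response["Instances"][0]["InstanceId"])
```

scripting the launch is optional; the same configuration can be made entirely through the aws console, as described above, and either way the wordpress and pressbooks installation proceeds on top of the running instance.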
wordpress documents are available on their website (https://wordpress.com/support/); installation directions include prompts that guide users through the entire setup and configuration process. once the wordpress installation process is complete, pressbooks can be set up on top of the wordpress site by following the installation directions from the pressbooks documentation. the beginning portion of configuring the wordpress site for pressbooks involves editing the configuration file for wordpress to allow for multisite setups on the single instance of wordpress. once the pressbooks site is installed, pressbooks will require additional plugins through wordpress in order for pressbooks to function correctly. again, the installation documentation for pressbooks walks through each of the necessary plugins, providing directions on how to configure the files for the installation to work correctly, how to start the configuration of users and appearance, and how to begin the creation of digital materials on the site. access to pressbooks can be set up through the installation itself, using plugins to link the installation to microsoft office accounts, google accounts, or any others that might be used. once this final step is completed (which will vary institution by institution depending on what service the institution utilizes for their primary authentication method ), the pressbooks site is ready to be utilized. there are two kinds of regular maintenance needed to keep the installation up to date. the first relates to the pressbooks and wordpress installations and updates, changes to configurations, additions, or deletions for the instance. most of these software updates, configuration changes, and plugin updates are handled through the pressbooks interface under the network manager administrator menu. since pressbooks is a layered software that’s built on the wordpress platform, all of the network configuration options use the same wordpress tools and user interface. the second kind of maintenance is done through a terminal command-line interface (cli) connection to the aws instance. this includes server maintenance tasks, which can be preconfigured through a script run on the server or handled by an administrator with sufficient knowledge of the system. the cli can also locate the error logs to pinpoint any errors that may have happened during setup and configuration. this maintenance can run on a monthly schedule, usually to ensure that web hosting software or internet access services are running correctly on the aws instance in addition to any server updates for the os and installed platform. at a smaller institution, the work on pressbooks can be handled by a librarian or professional staff member, as wordpress makes the procedure as simple as possible for anyone. the command-line interface work, if an os is installed without a user interface, can be handled by either a librarian familiar with terminal commands or a member of the institution’s help desk or information technology support personnel. any additional dependencies outside of what comes with wordpress are easily handled through the same network manager administrator menu. most installs include a number of default configuration options, such as uploading documents, printing from a pdf, or view functionalities. at the university of dubuque, any additional dependencies were all installed using wordpress and configured on pressbooks without any need to access the server directly. 
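as a concrete illustration of the monthly command-line maintenance mentioned above, the following is a minimal, hypothetical python script that could be scheduled with cron on the instance. the package manager (yum) and the service names (httpd, mysqld) are assumptions based on a cpanel/centos 7 server, not details taken from the article, and would need to be adjusted to the actual installation.

```python
# hypothetical monthly maintenance sketch for a centos 7 / cpanel instance;
# package manager and service names are assumptions, not from the article.
import subprocess
from datetime import datetime

LOG = "/var/log/pressbooks-maintenance.log"      # assumed log location (run as root)
SERVICES = ["httpd", "mysqld"]                   # assumed service names

def run(cmd):
    """run a shell command and return its combined output for the log."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}"

with open(LOG, "a") as log:
    log.write(f"--- maintenance run {datetime.now().isoformat()} ---\n")
    # apply os and package updates
    log.write(run(["yum", "-y", "update"]))
    # confirm that the web and database services are still active
    for service in SERVICES:
        log.write(run(["systemctl", "is-active", service]))
```

a crontab entry such as `0 3 1 * * /usr/bin/python3 /root/maintenance.py` (itself a hypothetical path) would run this on the first of each month; anything surfaced in the error logs would still be reviewed by hand, and the routine pressbooks work continues to happen in the wordpress interface without touching the server directly.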
for a smaller institution, this makes the process of approaching a self -hosted solution sustainable over time, as it does not require specialized knowledge of servers to handle pressbooks once it is installed. https://wordpress.com/support/ information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 4 working with faculty to add materials and address concerns when we launched pressbooks at the university of dubuque (http://pressbooks.dbq.edu) and wanted to showcase how using the platform would be advantageous, we worked with a geology professor who had already created his own textbook for his entry level geology course. the pdf he created was over 170 pages long and included all of the terminology, concepts, and example questions the students would see on the quizzes.9 we worked with the professor to get the original word document of his textbook, complete with his own layout structure, font, and headings, correctly formatted to import into pressbooks. the system manages the import process by utilizing very basic formatting of the document, identifying chapters based on the heading types.10 essentially, the library staff worked with the professor to sanitize the document of all unnecessary formatting, laid out the primary chapter headings in the document using the word heading formats that are supported, and then processed the document through the pressbooks tools for importing. with the primary example uploaded and ready to showcase to the faculty members, our library director began fielding the requests of other faculty members at the university of dubuque.11 the current process for working with faculty involves sending any interested faculty the list of required reading that pressbooks has hosted on their website. this includes materials related to creating the content directly in pressbooks, importing the content from a word document if authors already have something they want to use, and setting up an account as an author on pressbooks. in addition to the geology professor mentioned above, two additional faculty members in vastly different departments, computer information science and philosophy, used our pressbooks instance to curate their materials for their students. as the instance is built on a wordpress multisite installation, library staff are able to install and configure a variety of additional material for faculty—enabling practice quizzes, the list of glossary terms to study, and other material—either through the native pressbooks interface or with the assistance of opensource plugins such as h5p, a plugin that allows community-created videos, presentations, quizzes, and interactive content to be created, shared, and reused. all of these additional configuration options, including adding the additional tools for faculty, are handled through the network manager administrator menu. faculty with questions or needs for assistance can reach out to library personnel directly through email or by setting up a teams or zoom call to walk through the problem they might have or express a need that they can assist with. looking back/reflections throughout the process, the university of dubuque’s work came to fruition through the efforts of one librarian focused on the application and server side management and a library worker who was familiar with mysql query language and data management. this partnership proved invaluable in working with the nuances of configuring the sql database to the necessary specifications. 
for any institution looking to have an uncustomized database, the wordpress installation configuration options work without any additional knowledge or customization necessary. the library’s access to the aws instance from the campus needed involvement by the campus information technology department’s help desk to approve the ip address on file for the dns configuration. in simpler terms, once the library set up the aws instance with an elastic ip address (the term amazon uses on aws to refer to their ranges of ip addresses) and configured the domain information on pressbooks through the installation, the library provided that information to the help desk and they updated the necessary documents and certificates. the last http://pressbooks.dbq.edu/ information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 5 piece of the process, inviting users to utilize installation, required the most patience and is an ongoing process. in setting up the accounts locally, for more restricted access, pressbooks provides only temporary account status to any created user accounts. this means if a faculty member has an account created for them by the institution in july but doesn’t attempt to sign in to the account until september, pressbooks will not hold onto that information in the sql database. after a default period of three days, which is customizable through the wordpress configuration options, the username is not retained by the system and the new account creation process has to begin again. there are options to link the installation to a single sign-on system, such as microsoft’s adfs or a program such as shibboleth or google apps. directions for setting up this configuration option are also available as part of pressbooks documentation on their website. at the university of dubuque, having a small fte allows for more time to work closely with a faculty member throughout the oer creation process, as the faculty are more flexible with their time. the current process of creating accounts as needed, on an individual basis, wor ks well when handling limited requests. larger institutions that would utilize this method of configuration might find it easier to streamline the request through a single sign-on system, an authentication method that is automated through an administrator or pressbooks or another program. additional needs after the rollout of the pressbooks site to the campus community, we encountered additional needs for our instance that weren’t configured as part of the base installation of the site. for faculty members registered for an account on the site, pressbooks would allow their account to have basic user access to the features necessary to start creating their oer. however, this did not allow for the usage of a majority of the features that pressbooks offers. part of this disconnect stemmed from the way the accounts were created on our multisite instance. accounts created need to be manually added and confirmed as an existing account on the pressbooks site as an author in order to allow access to the full suite of options available for the oer creation. the other hang-up in access for faculty came from the way pressbooks handles email for new registration, password change information, or any type of communication. prior to june 2020, developers were able to simply connect a wordpress site, or other sites, to gmail using a simple authentication of their account using their username and password. 
in june 2020, google required users who want to utilize gmail to send emails from a new site, or in this case a locally created instance of a wordpress multisite for pressbooks, to authenticate their account information by authorizing the site through a google developer api, paying for access to the plugin that would allow for configuration of gmail, outlook, or other email providers, or rely on the site maintainers to configure their own email services through the server itself.12 if it is built into the budget of the university to purchase and subscribe to a service provided by a plugin owner, that option works without additional server configuration. our institution, however, was limited in its payment options and was unable to utilize standard forms of payment required by the plugin providers. as such, we are manually reviewing the registration requests for the site and creating accounts on pressbooks on an as-needed basis. information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 6 concluding thoughts the university of dubuque’s initial introduction to pressbooks came from attending the library technical conference 2019, held at macalester college in minnesota. while there, representatives from the mnpals consortium walked through the work done between the university of minnesota and the state library system to integrate their instance of pressbooks throughout public library systems and university systems.13 the work done at our institution is at a significantly smaller scale, only being utilized by faculty members at the university and members of the university community, including adjunct professors and professional staff. while work on a consortium level can proceed quickly, as there are multiple parties involved in the creation of the resource, we at the university of dubuque had a small number of people immediately working on the project. the discussion between the two personnel in the library handling the system work and the director of the library took the longest amount of time, followed by a couple of months between contact with pressbooks about pricing, hosting through them, and conversations at the state level attempting to gauge interest from additional parties to partner with. initial conversations at local state conferences, with the larger public institution librarians participating in discussions, didn’t evolve into an actionable plan. from there, the planning for the setup of the aws instance to install wordpress and pressbooks took a month to set up. another two weeks were spent working with the mysql database to customize it to the university’s needs and upload the instructor book used as the pilot upload. from start to finish, seven months passed to the launch and rollout of the product. since the launch of the platform, work has started on identifying faculty who would benefit from using pressbooks, with surveys across the institution to glean insight into what faculty are currently doing, how they and their students can benefit from this, and all the steps involved. with the work done at the university of dubuque, operating as a private, smaller university allowed for more flexibility in our adoption of technology, a more focused approach to introducing new systems to the university at large, and a less bureaucratic approach to seeking approval. in the library, we recognized that we were in a unique position to begin this development and implementation rapidly for the university and took advantage of that. 
endnotes
1 karen okamoto, “making higher education more affordable, one course reading at a time: academic libraries as key advocates for open access textbooks and educational resources,” public services quarterly 9, no. 4 (2013): 4.
2 dr. braddlee and amy vanscoy, “bridging the chasm: faculty support roles for academic librarians in the adoption of open educational resources,” college & research libraries (may 2019): 429.
3 caitlin a. schleicher, christopher a. barnes, and ronald a. joslin, “oer initiatives at liberal arts colleges: building support at three small, private institutions,” journal of librarianship and scholarly communication 8 (2020): 16.
4 jennifer snoek-brown, dale coleman, and candice watkins, “from spark to flame, lighting the way for sustainable student oer advocacy framework at a community college,” scholarly communication 82, no. 8 (2021): 2.
5 pressbooksedu, pressbooksedu plans q3 2019, received july 26, 2019, adobe pdf.
6 “amazon ec2 t3 instances,” amazon, last modified september 14, 2021, https://aws.amazon.com/ec2/instance-types/t3/.
7 “educational license application,” cpanel, accessed march 14, 2022, https://input.cpanel.net/s3/edu.
8 “installation,” pressbooks documentation, pressbooks, last modified february 23, 2022, https://docs.pressbooks.org/installation/.
9 dale easley, “the story of the earth,” dale easley, september 1, 2021, http://www.daleeasley.com/resources/physical/geomain.pdf.
10 “import from a word document,” pressbooks user guide, accessed march 14, 2022, https://guide.pressbooks.com/chapter/bring-your-content-into-pressbooks/#chapter-156-section-3.
11 dale easley, the story of earth (dubuque: university of dubuque pressbooks), http://pressbooks.dbq.edu/storyoftheearth/.
12 “how to upgrade to oauth2 security for existing google/gmail accounts,” postbox, accessed september 1, 2021, https://support.postbox-inc.com/hc/en-us/articles/218446767-how-to-upgrade-to-oauth2-security-for-existing-google-gmail-accounts.
13 “about the minnesota libraries publishing project,” minnesota libraries publishing project, accessed september 1, 2021, https://mlpp.pressbooks.pub/about-the-minnesota-library-publishing-project/.
a library website redesign in the time of covid: a chronological case study (communication) erin rushton and bern mulligan information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.15101 erin rushton (erushton@binghamton.edu) is head of digital initiatives and resource discovery, binghamton university libraries. bern mulligan (mulligan@binghamton.edu) is associate librarian emeritus, binghamton university libraries. © 2022. he found a glimmer of hope in the ruins of disaster…. gabriel garcía márquez, love in the time of cholera abstract in november 2019, binghamton university libraries initiated a website redesign project. our goal was to create a user-centered, data-informed website with refreshed content and upgraded functionality. originally, our redesign plan included in-person card-sorting activities, focus groups, and usability studies, but when the libraries went remote in march 2020 due to the covid-19 pandemic, we had to quickly reassess and adapt our processes and workflows. in this article, we will discuss how we completed this significant project remotely by relying on effective project management, communication, teamwork, and flexibility. introduction website redesign projects can be daunting, even during normal circumstances. this article will outline how we accomplished a website redesign project in a reasonable timeframe during the unprecedented circumstances of the covid-19 pandemic. binghamton university is part of the state university of new york (suny) system. founded in 1946, it has an enrollment of about 18,000 graduate and undergraduate students. binghamton university libraries are an important part of the university, serving as the center of the university’s intellectual community. our website is the libraries’ most important tool for “scaling up” services to our users. it is as important as the physical library and became even more so with the digital demands imposed by covid, particularly the importance of access to streaming video during the pandemic. a truism in website redesign is that your current website is never more popular than when you take it down. however, our redesigned website was successfully launched to general approbation and is considered a functional and cosmetic improvement over the old website. we were pleasantly surprised how little negative feedback we received. people just started using it, which may be the highest compliment paid to a design team.
as we will highlight throughout the article, we believe that the success of this project was the result of the following: • a dedicated/functional web team, • the ability to meet frequently and on a moment’s notice, • the ability to focus almost exclusively on the project; and • effective project management. mailto:erushton@binghamton.edu_ mailto:mulligan@binghamton.edu information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 2 premise for the redesign the last time the libraries had fully redesigned the website was in 2013. that project was significant because we had migrated from a locally hosted site to the university’s web content management system, omniupdate. highlights of the 2013 redesign included a fresh look and feel with binghamton university colors, updated site architecture and navigation, and expanded search options from the home page. up until 2013, we had typically redesigned our website on a five-year cycle. as such, we began discussing the possibility of another redesign in mid-2018. we also thought it was an opportune time to redesign the website because the libraries had a new dean, we were offering new services and initiatives, and we had become more mindful of accessibility issues and responsive design. the web team the libraries’ web team is the group that leads website redesign projects and maintains the website. the team generally meets as needed between redesign projects. in anticipation of the planned 2019–2020 redesign, however, we began scheduling more regular meetings. our preferred meeting times were friday mornings. one of the prerequisites for these meetings was to take turns bringing bagels and juice. we quickly got to know what kind of bagel each of us preferred. in november 2019, the team consisted of ben coury, bern mulligan, erin rushton (chair), and dave vose. except for ben, the other members of the web team had participated in several past library website redesigns. ben had recently been hired as the libraries’ digital web designer. he brought high-end programming skills to the team and expertise and knowledge about user experience and accessibility which were integral to the success of the project. it was an advantage to have a small, dedicated, and agile team that collaborated and communicated well. this positive chemistry or esprit de corps among members allowed us to debate any controversial issues professionally, not personally. we internalized the team’s mission and worked single-mindedly toward its successful completion. binghamton university is a google campus, which meant we had access to the full suite of google apps (e.g., gmail, google calendar, google drive, google groups, etc.). at the beginning of the redesign, we had already begun to use google to create, share, and archive our committee documents. project timeline this is the timeline for the project, which took ten months to complete. it occurred basically in two phases: before covid and after covid. much of the work that occurred before covid was completed in person. all of what occurred after covid was done virtually: via email, phone, and, as was the case with most organizations during the pandemic, via zoom. 
• november 2019: planning phase • december 2019–march 11, 2020: in-person meetings with library constituent groups • march 12, 2020: meeting with student advisory group • march 17, 2020: covid shutdown • april 15, 2020: meeting with communications and marketing information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 3 • april–june 2020: website architecture and content review • july 2020: creating templates, migrating content, and designing the home page • august 2020: final meetings and details • august 19, 2020: successful launch of the new website when planning a website redesign project, it is understood that there will be a lot of meetings. as the pace of the project accelerates and as the deadline approaches, so does the frequency of meetings. during the ten months of this project, we met 64 times: 16 in-person meetings from november 2012 to march 12, 2020; 25 virtual meetings from march 17 to the end of june; and 23 virtual meetings in july and august. during july and august until launch, the other aspects of our jobs took a back seat to the proj ect; it’s pretty much all we worked on. we sometimes met twice a day, and that’s not counting the incidental phone calls between individual members for questions about sticking points in the process. november 2019: planning phase the libraries’ user interface steering committee (uisc) has oversight for the various public library interfaces, including the website. the committee consists of the web team members and representatives from different departments in the libraries, including public services, technic al services, and special collections. the uisc helped us establish the goals and objectives for the redesign project, gave us feedback about ideas during the redesign process, and monitored our ongoing progress. there were four goals that became apparent almost immediately. over the years, many website development requests had been postponed, so our first goal was to accommodate these improvements and enhancements. and since university communications and marketing, the unit on campus responsible for the entire university web domain, had updated templates since our last redesign, our second goal was to utilize the new templates. our third goal was to create a usercentered, data-informed website. finally, our fourth goal was to address accessibility issues on the website and make it easier for users to navigate. december 2019–march 11, 2020: in-person meetings with library constituent groups once the goals had been established, we began scheduling meetings with a variety of library constituent groups. these included preservation, reader services, special collections, and dean’s council. it was important for us to get input from these groups because we wanted everyone to feel represented by the project since the website is the gateway for many of our services and resources. we also wanted the redesign process to be transparent so that everyone knew what was happening. at each meeting, we discussed how the website was working for their area and what improvements they wanted to see. we also provided a snapshot of the website for each area which showed current usage statistics (see fig. 1). information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 4 figure 1. screenshot of a website inventory spreadsheet for special collections. 
march 12, 2020: meeting with student advisory group we met with the student library advisory committee (slac) on march 12. this group offered open, two-way communication between the libraries and the student population and provided a mechanism for the libraries to solicit feedback on specific issues as needed. as an incentive for students attending this meeting, we provided free pizza and snacks. at this meeting, we asked the students a variety of questions, including why they used the website and what they liked and disliked about it. some of the answers we received were surprising to us. for one thing, few of the students began their research from our website. while they had used the website, they were not particularly familiar with most of its features. a second consensus of the group was that if they wanted to know something about the libraries (e.g., our hours), they just googled it. finally, although our ask a librarian service was linked from several places on the website, none of the students in the group had ever noticed it and were unaware of the service. these revelations further informed what we wanted to address in the redesign. march 17, 2020: covid shutdown on march 17, the university abruptly closed in-person services due to the growing pandemic. for a few weeks after the shutdown, the priority for all university employees was transitioning to remote work and providing virtual services. once we felt ready to resume the project, we had to reassess how we would continue it remotely. there were two significant challenges: how to continue our committee work as a distributed team and how to continue gathering user feedback given that no one was physically on campus. we discovered that transitioning into a distributed team worked well for the project. we no longer had to reserve meeting spaces and set up laptops and projectors. instead, we could quickly organize zoom meetings, sometimes on the fly, when we had something we wanted to discuss. and all our committee files were already on google drive. as a result, we were better able to focus almost exclusively on the project and were less impacted by the distractions that often occur in the office environment. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 5 unfortunately, we were unable to conduct any of the focus groups with teaching faculty or the “guerrilla” usability testing with students that we had planned. however, we were glad to have had the in-person meetings with some of our constituents before the work-from-home phase because we had established a good foundation of what we needed to accomplish in the redesign. april 15, 2020: meeting with communications and marketing one of the first groups we met with virtually was university communications and marketing (c&m). as mentioned above, this is the unit on campus that is responsible for omniupdate, the university’s website platform. the purpose of this meeting was to discuss our plans and timeline and to clarify the role of c&m in the project. although we now had our own web designer in ben, we knew that c&m had oversight over the entire university web presence and would have to decide whether our redesign fit in with the rest of the domain in terms of its appearance and accessibility. april-june 2020: website architecture and content review the next significant part of the project was reviewing our current pages. 
during preliminary planning the year before, we had a student list all pages of the libraries’ website in a google spreadsheet which also included google analytics from the three previous years. we literally spent hours poring over this document (see fig. 2). it helped us identify which pages would migrate mostly as is, which pages would need additional review, which pages would be converted to libguides and vice versa, and which pages would be deleted. figure 2. screenshot of a spreadsheet listing all library web pages. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 6 originally, we had planned to do a physical card sorting like we had done in past redesigns. we had even created the cards and had scheduled an all-day card-sorting event, complete with pizza. but covid changed all that. since we were meeting virtually, we had to think of another way to accomplish this. a breakthrough occurred when ben introduced trello, an online collaboration and project management tool that allowed us to work together on the new website’s architecture in real time via zoom (see fig. 3). figure 3. screenshot of a trello board. we recreated the six existing main navigation categories and created “cards” for each web page. trello made it easy to move these virtual cards around into the different categories. we had received feedback that six categories were perhaps too many for users to choose from. we spent a number of meetings in may discussing/debating what the new navigation categories should be and where pages fit under them. we decided to fold the locations & collections and special collections sections into about, and search & find into research, because we felt these changes made more logical and functional sense. we added a top-level link to my account because this was also something that users had suggested should be more prominent. another aspect of the libraries’ website redesign project was the content of libguides. initially, they were meant to be subject guides, but over the years some of our web pages were converted into libguides. as part of the redesign, we worked closely with the collections and instruction department to make decisions about where content should be located. most libguides are now subject or research guides while descriptions of libraries’ services are web pages. july 2020: creating templates, migrating content, and designing the home page by the end of june, we had accomplished the following: • met with available constituencies, • identified what content we needed to migrate, • identified what content needed to be created; and • had a new website architecture. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 7 at this point, we felt we needed to focus on the actual migration of content and the design of the home page. on july 2, we met virtually with collections and instruction to discuss how the search box on the home page would look and function. it was no mean feat coming to consensus on this. we were on a tight schedule, but we wanted to be sure that we heard everyone’s ideas. taking all feedback into consideration, ben then had the ultimate challenge of coding and designing a functioning search box. for the last three weeks of july, we met daily, and sometimes twice a day, as we began the daunting process of migrating all the lower-level pages. we were definitely feeling the pressure to complete the project on time. 
since we were using the new university templates, designing the lower-level pages was relatively straightforward. ben had customized the template and provided us with a migration guide. the guide included instructions on how to create and format new pages. this allowed the other members of the web team to migrate content while he focused on the more complicated aspects of the project. all migrated content was reviewed to ensure that it was current. for some pages, this required input from several departments. to facilitate the updating, we copied the content of every page we were migrating into a google doc to allow for collaborative editing. once the content was updated, it would be copied and pasted into the new template. the screenshot in figure 4 represents the redesign for most of the lower-level pages. figure 4. screenshot of a new lower-level page. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 8 the most significant changes were the relocation of the navigation column from left to right and some new features that were included in omniupdate, including the ability to add contact information. we had to do some formatting, such as adding headings and hyperlinks, and some metadata work, such as creating page descriptions and keywords. we also had to pick a photo for the banner of each page. we had planned to have the university photographer take new photos, but because there were no students and hardly any staff on campus, we had to rely on pictures of our spaces he had already taken. we were tracking the migration on a spreadsheet (see fig. 5). this document contained the new architecture of the website and had links to the google docs mentioned previously. the spreadsheet also noted who was responsible for reviewing the content and the status of each page. figure 5. screenshot of the spreadsheet used to track the migration of webpages. while this mass migration was taking place, ben focused on creating the pages that required additional coding and customization such as the ask a librarian page, the staff directory, and library tutorials. he also worked on the design of the home page. one of the tools ben used to help with the mock-up was adobe xd (see fig. 6). information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 9 figure 6. screenshot of initial design of home page in adobe xd. adobe xd is a design tool in the adobe suite used for prototyping user interfaces. he created an interactive wireframe for the home page and landing pages. this allowed us to discuss the interface and make changes without a lot of time spent on mock-ups. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 10 august 2020: final meetings and details we met with the user interface steering committee on august 5 to get their input on the redesign. on august 7, we met with communications and marketing again to go over our design and get final approval for our home page. we also previewed the site to library people at an all-staff meeting later that day. the final week before site launch was dedicated to the special collections pages that had embedded links for finding aids and to working on the ask a librarian page. and the last day was spent going through and making sure everything was in order before launch. 
august 19, 2020: successful launch of the new website we launched the redesigned website on august 19 (see fig. 7). some of its new features include • a large “hero” photo which can easily be changed depending on what the libraries want to promote; • a redesigned search box; • popular links featured on the home page; • a news section which pulls from our blog software and allows for automatic updates; and • a prominent, visually attractive place on the home page for special collections since it was bumped from the top navigation bar. figure 7. partial screenshot of the new library home page. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 11 conclusion the process that we had initially envisioned did not work out because of the covid-19 pandemic. although we did meet with our student advisory group, we never got to hold any focus groups with teaching faculty or usability studies with students. but thanks to zoom and other online tools, we were still able to gather some user feedback. we also had a new website ready before fall 2020. while there was certainly unanticipated stress in tackling a project like this in the middle of a pandemic, we felt that working remotely in some ways helped us to be more productive. we were better able to focus almost exclusively on this project and were less impacted by the distractions that often occur in the office environment. we also felt that the quick adoption of zoom made us more agile about scheduling and holding meetings. despite some of the challenges that we faced throughout the project, the redesigned website is a success. since the launch, we have made a few minor changes to the overall architecture of the site. the most significant change was adding a giving link to the navigation menu at the req uest of our dean and the binghamton university foundation. our library website is never static, as we continue to update our home page with news and events and change our hero banner to reflect the priorities of the libraries. while we have no plans for another major redesign in the near future, we are open to making changes and improvements as needed. bibliography becker, danielle a., and lauren yannotta. “modeling a library web site redesign process: developing a user-centered web site through usability testing.” information technology and libraries 32, no. 1 (2013): 6–22. buell, jesi, and mark sandford. “from dreamweaver to drupal: a university library website case study.” information technology and libraries 37, no. 2 (2018): 118–26. wu, jin, and janis f. brown. “website redesign: a case study.” medical reference services quarterly 35, no. 2 (2016): 158–74. zhu, candice. “website makeover: transforming the user experience starting from scratch.” computers in libraries 41, no. 6 (2021): 21–6. 
text analysis and visualization research on the hetu dangse during the qing dynasty of china
zhiyu wang, jingyu wu, guang yu, and zhiping song
information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13279
zhiyu wang (mikemike248@gmail.com) is phd candidate, school of management, harbin institute of technology and associate professor, school of history, liaoning university. jingyu wu (734665532@qq.com) is graduate student, school of history, liaoning university. guang yu (yug@hit.edu.cn) is professor, school of management, harbin institute of technology. zhiping song (1367123893@qq.com) is graduate student, school of history, liaoning university. © 2021.

abstract
in traditional historical research, interpreting historical documents subjectively and manually causes problems such as one-sided understanding, selective analysis, and one-way knowledge connection. in this study, we aim to use machine learning to automatically analyze and explore historical documents from a text analysis and visualization perspective. this technology addresses the problem of analyzing large-scale historical data that is difficult for humans to read and intuitively understand. we use the historical documents of the qing dynasty hetu dangse, preserved in the archives of liaoning province, as data analysis samples. china's hetu dangse is the largest qing dynasty thematic archive with manchu and chinese characters in the world. through word frequency analysis, correlation analysis, co-word clustering, the word2vec model, and svm (support vector machine) algorithms, we visualize historical documents, reveal the relationships between the functions of government departments in the shengjing area of the qing dynasty, achieve the automatic classification of historical archives, improve the efficient use of historical materials, and build connections between historical knowledge. through this, archivists can be guided practically in the management and compilation of historical materials.

introduction
china has a long history documented in numerous archives. at present, various local archive departments preserve large numbers of historical documents from different periods. owing to the development of china's archive digitization, archive management departments at all levels have established digital archive abstracts, catalogs, and subject indexes of the historical documents in their collections, realizing online retrieval of historical archives. with in-depth research on chinese history, simple catalog retrieval cannot satisfy researchers' demand for related knowledge in historical archives. owing to the limitations of the catalog retrieval system, complex catalog data still need to be read manually.
however, it is difficult to view the overall picture of the recorded content and impossible to easily distinguish important information in historical materials; this leads to various difficulties, such as the compilation of historical materials for chinese historical researchers. thus, in this study, we aim to use text analysis and visualization methods in machine learning to conduct data mining analysis of historical document data. these methods will help us discover the logical relationships of historical records and their purposes, accomplish visual presentations of historical entities and knowledge discovered in historiography, improve knowledge representation and automatic classification of historical data, and provide valuable information for historical archive researchers. mailto:mikemike248@gmail.com mailto:734665532@qq.com mailto:yug@hit.edu.cn mailto:1367123893@qq.com information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 2 during the process of analyzing traditional manual methods for interpreting historical documents, we find the following phenomena: macro description, single angle, selective analysis, and one-way knowledge connection, among others. for example, the hetu dangse preserved in the liaoning archives contains a total of 1,149 volumes and 127,000 pages, making it difficult to fully grasp and understand the overall content of such documents. relying on manual reading and analysis of entire archives is an unrealistic task. therefore, this paper proposes using machine learning, natural language processing (nlp), and other technologies to address various problems from traditional manual reading. first, information from historical documents can be revealed from different angles, and this allows the content of the documents to be displayed more comprehensively and scientifically through visual charts. second, use of objective quantitative analysis methods, such as text analysis and nlp, prevents subjective interpretations of the same content. third, nlp and other technologies can solve the problem of calculating massive text training data sets while forming systematic knowledge that avoids the omission and one-sided understanding of knowledge in the historical archive. the application of machine learning in historical data analysis has attracted the attention of researchers in management, history, and computer science. tao used the latent dirichlet allocation (lda) topic modeling algorithm to analyze the themes of documents from 1700 to 1800 included in the german archives, providing a more three-dimensional interpretation and explanation of the spiritual world of germany during the eighteenth century.1 chinese scholars kaixu et al. proposed a method of automatic sentence punctuation based on conditional random fields in ancient chinese.2 this method was proved to better solve the problem of automatic punctuation processing compared with the single-layer conditional random field strategy in ancient chinese as tested on the two corpora of the analects and records of the grand historian. swiss and south african scholars stauffer, fischer, and riesen, and chinese scholars wu, wang, and ma used the kws technology and deep reinforcement learning to automatically recognize handwritten pictures in historical documents.3 solar and radovan used the national and university library of slovenia’s historical pictures and maps as research data. 
using gis technology, they created a novel display method, and interdisciplinary data resource web application to access and research the data.4 chinese scholars dong et al. and polish scholars kuna and kowalski used the webgis technology to conduct efficient management and visualization research on historical data of natural disasters in ancient china and russia. 5 meanwhile, latvian scholars ivanovs and varfolomeyev and dutch scholars schreiber et al. used web technology to develop a web service platform and explored the intelligent environment of cultural heritage service utilization.6 korean scholars kim et al. used machine learning technology to determine the complex relationships between tasks of various classes in a specific historical period through the network of historical figures.7 judging from results in related fields, the semantic analysis and visualization of historical archives in an intelligent way are gradually moving from statistical description to knowledge mining. these results provide theoretical feasibility and practical technical experience for this study. at present, research on historical documents mainly focuses on the retrieval and utilization of historical material databases. since the words, semantics, grammar, and sentence patterns recorded in historical materials differ from modern texts, using data mining technologies such as machine learning and nlp to intelligently identify historical documents and organize historical data will help us more than traditional methods. this requires the cooperation of artificial intelligence and historical researchers to establish an effective method of historical big data information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 3 analysis to achieve the transformation from traditional manual historical document analysis to automatic artificial intelligence analysis methods. in this paper, we use machine learning and data visualization as a tool to identify differently the content of the historical documents from traditional literature reading, reveal valuable information in the content of historical documents, and promote more systematic, efficient, and detailed understanding of the literature. related technology definition to perform text analysis and visualization of the hetu dangse, we use machine learning technology such as word vector processing, the svm (support vector machines) model and network analysis. word vector is a numerical vector representation of a word’s literal and implicit meaning.8 we segmented the hetu dangse’s catalog data and used the word2vec model to transform the segmented data’s word vector form into a set of 50-dimensional numerical vectors representing a catalog’s vector data set. to accurately visualize historical document records’ relationship features, we reduced the vector data set’s dimensionality. dimensionality reduction, or dimension reduction, is data’s transformation from a highinto a low-dimensional space so that the representation retains some of the original data’s meaningful properties, ideally close to its intrinsic dimension.9 after dimensionality reduction, each catalog data in the vector data set is reduced from 50 to 2 dimensions to facilitate flat display. we used the svm model and network analysis technology to analyze the vector data set. 
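to make the dimensionality-reduction step concrete, the following python sketch reduces a set of 50-dimensional record vectors to two dimensions with scikit-learn's t-sne implementation. it is only an illustration of the idea described above: the array of catalog vectors is a random placeholder, and the parameters shown are defaults rather than the authors' settings (their reported t-sne parameters appear later in the results section).

```python
# illustrative sketch (not the authors' code): reduce 50-dimensional catalog
# vectors to 2 dimensions so the records can be plotted on a flat map.
import numpy as np
from sklearn.manifold import TSNE

# placeholder: suppose each of 300 catalog records has already been turned
# into a 50-dimensional vector built from its word2vec word vectors
catalog_vectors = np.random.rand(300, 50)

tsne = TSNE(n_components=2, init="pca", random_state=42)
points_2d = tsne.fit_transform(catalog_vectors)

print(points_2d.shape)  # (300, 2): one (x, y) coordinate per record
```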
the svm model is a set of supervised learning methods used for classification, regression, and outlier detection.10 it is given a vector data set as training to represent historical document records as points in space, and learns independently through the kernel algorithm. using the algorithm, it maps the separated new records to the same space, and predicts their category based on which side of the interval they fall. network analysis techniques derive from network theory, a computer science system demonstrating social networks’ powerful influences. network analysis technology’s characteristics determine that it is suitable for books and historical archives’ visualization in the library and information science field, because the visualization technique involves mapping entities’ relationships based on the symmetry or asymmetry of their relative proximity.11 thus, it helps to discover historical documents’ knowledge relevance. for example, citation network analysis can identify emerging relationships in healthcare domain journals.12 sample data preprocessing and classification this study uses the catalog of the qing dynasty historical archives from the hetu dangse collected by the liaoning archives as the research sample to conduct text analysis and visualization research. china’s hetu dangse is the largest qing dynasty thematic archive with manchu and chinese characters both in domestic and international. the hetu dangse is the official document of communication between shengjing general yamen, the wubu of shengjing and fengtian office, and the document communicated between the beijing internal affairs office in charge and the liubu of beijing during the qing dynasty. the hetu dangse was published from 2015 to 2018, including the hetu dangse·kangxi period (56 volumes), hetu dangse·yongzheng period (30 volumes), hetu dangse·qianlong period (24 volumes), hetu dangse·qianlong period (17 volumes), hetu dangse·daoguang period (52 volumes), hetu dangse·jiaqing period (58 volumes), hetu dangse·qianlong period official documents (46 volumes), hetu dangse·qianlong period official documents (46 volumes), and hetu dangse·general list (16 volumes).13 the hetu dangse is an information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 4 important document for studying the history of the qing dynasty. owing to the special status of shengjing in the qing dynasty, it has a unique historical significance as the companion capital of beijing and the hometown of the qing royal family. this provides original evidence from this time for studying politics, economy, culture, history, and natural ecology in northeast china. in this study, we preprocess the catalog data of the hetu dangse by performing text segmentation, creating a corpus, and labeling data before using text analysis and visualization technology to analyze the catalog data of hetu dangse. first, we use word frequency analysis and statistics to study the functions of institutions. second, we use the co-word clustering algorithm to quantify and visualize the institutional relationships. finally, we use the svm model to automatically classify and explore the catalog data of the hetu dangse. figure 1 illustrates this process. figure 1. text analysis flowchart. 
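as a concrete preview of the preprocessing flow in figure 1, the sketch below shows one way to segment a catalog title and drop stop words in python with jieba, one of the segmentation tools named in the next section. the user-dictionary file name is hypothetical, the stop words are the examples given in the data cleaning section, and the sample title is reassembled from the corpus excerpt in table 1; this is not the authors' actual code.

```python
# illustrative preprocessing sketch (not the authors' code): segment a catalog
# title with jieba and remove stop words. the user dictionary is a hypothetical
# file of domain terms so institution names are kept as single tokens.
import jieba

jieba.load_userdict("hetudangse_userdict.txt")  # hypothetical custom dictionary

stop_words = {"为", "请", "之"}  # examples of low-content words removed in data cleaning

def segment(title):
    """segment a catalog title and drop stop words."""
    return [w for w in jieba.lcut(title) if w.strip() and w not in stop_words]

# sample title reassembled from the corpus excerpt in table 1
print(segment("盛京掌关防佐领为缉拿逃人舒廷官事咨盛京刑部"))
```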
data preparation and preprocessing we collected 95,680 catalog data items in the hetu dangse of the liaoning archives, including 25,148 items from the kangxi period; 1,096 items from the yongzheng period; 23,819 items from the qianlong period; 20,730 items from the jiaqing period; and 15,887 items from the daoguang period. the content of each catalog data includes three parts: title information, time of publication (chinese lunar calendar), and responsible agency. the proportion for each period was not evenly distributed in the catalog data of the hetu dangse with the kangxi period catalog data having the highest proportion (26.2%). through the catalog data information, we can perform an in -depth analysis of the content of the hetu dangse from the three perspectives: institutional functions, institutional relationships, and topic classification. data cleaning as the text recorded in the archives of the hetu dangse are manchu and ancient chinese, using chinese word segmentation tools (jieba, snownlp, thulac, etc.) based on modern chinese will cause errors. therefore, it is necessary to construct a special text corpus for word segmentation. first, we construct a stop vocabulary list to remove words with little impact on semantics in the hetu dangse, such as for (为), please (请) and of (之). second, we use the word segmentation tools mentioned above for preliminary word segmentation and then perform part-of-speech tagging and word segmentation corrections based on the word segmentation results. the title part of the catalog data of the hetu dangse mainly contains three dimensions of information: the record title of the catalog, issuing institution, and receiving institution. accordingly, we set a total of four types of tags in the text corpus: issuing institution, receiving institution, record type, and keywords. the receiving institution and the issuing institution correspond to the institutions at the beginning and the end of the catalog, respectively, such as the words shengjing zhangguan fang zuoling, and shengjing ministry of justice. the record type is the front word of the receiving institution, such as counseling (咨) and please (请). the keywords are words that can represent the overall semantics information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 5 in the record title of the catalog, such as arrest (缉拿) and advance (进送). table 1 presents the corpus we developed. table 1. hetu dangse corpus num word property1 property 2 1 盛京掌关防佐领 organization noun 2 为 stop_words preposition 3 缉拿 keywords verb 4 逃人 keywords noun 5 舒廷 name noun 6 官事 stop_words noun 7 咨 keywords verb 8 盛京刑部 organization noun 9 正白旗佐领 organization noun 10 兆麟 name noun 11 呈 stop_words preposition 12 为 stop_words preposition 13 交纳 keywords verb 14 壮丁 keywords noun 15 银两事 keywords noun ┋ ┋ ┋ ┋ 61047 收讫事 keywords noun 61048 盛京佐领 organization noun label data to improve the utilization efficiency of the hetu dangse and show the document content information from multiple angles, we use a supervised machine learning method to automatically classify the catalog data of the hetu dangse. therefore, the original catalog data set must be labeled. we determine the classification and label of the hetu dangse catalog according to the chinese archives classification law, chapter 12. table 2 presents the 11 categories of the catalog. 
with this, we complete the hetu dangse catalog sampling classification and labeling laying the foundation for automatic catalog classification. the hetu dangse has a total of 95,680 catalog records involving five periods: kangxi, yongzheng, qianlong, jiaqing, and daoguang. we randomly select 500 records from each period and manually label these 2,500 records as the sample data set. the data classification after manual labeling is shown in figure 2. the overall distribution is relatively even, making it suitable for machine learning processing. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 6 table 2. data labels num category 1 type of official documents (政务种类) 2 palace, royal family and eight banners affairs(宫 廷、皇族及八旗事务) 3 bureaucracy, officials(职官、吏役) 4 military(军事) 5 politics and law(政法) 6 sino-foreign relations(中外关系) 7 culture, education, health and scientific cultural study(文化、教育、卫生及科学文化研究) 8 finance(财政) 9 agriculture, water conservancy, animal husbandry (农业、水利、畜牧业) 10 building(建筑) 11 transportation, post and telecommunication(交 通、邮电) figure 2. percentage of the hetu dangse catalog data label chart. results in this study, we used the catalog data of the hetu dangse as a sample to analyze and reveal the hetu dangse catalog data from three perspectives: institutional function, institutional relationship, and automatic classification. this will improve usage efficiency of the hetu dangse, thus improving information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 7 researchers’ mastery of relevant information about the document. to achieve the functional requirements of text analysis, we adopted four methods: word vector conversion, word frequency analysis, co-word clustering, and the svm model. word vector conversion of text catalog data the automatic classification of machine-learning technology is based on vector data sets. thus, the hetu dangse text catalog data set must be vectorized before automatic classification. currently, word vector conversion technology mainly includes methods such as one-hot, word2vec, and glove. hetu dangse records the history of the qing dynasty for more than 200 years. there are inevitable relationships among the contents recorded in the documents, indicating that they are not isolated from each other. the word2vec model provides an efficient implementation of cbow and skip-gram architectures for computing vector representations of words, both of which are simple neural network models with one hidden layer. the word2vec model produces word vectors as outputs from inputting the text corpus. this method generates a vocabulary from the input words and then learns the word vectors via backpropagation and stochastic gradient descent.14 this makes the word2vec model more suitable for catalog data from hetu dangse. word2vec includes the cbow model and the skip-gram model, which can enrich the semantic relevance depending on the context, and it is more suitable for the semantic relevance of historical documents such as the hetu dangse. therefore, we adopt the skip-gram model to analyze the catalog data of hetu dangse. we extracted the features of word vectors in catalog data from the corpus, input them into the word2vec model, imported the gensim library in python, trained the vector embeddings, and obtained the htd.model.bin vector file and htd.text.model model file. 
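a minimal sketch of that training step with gensim is shown below. only the skip-gram setting, the 50-dimensional vector size, and the two output file names come from the article; the token lists and the remaining parameters are illustrative placeholders.

```python
# condensed sketch of the training step described above (not the authors' code),
# assuming `segmented_titles` is a list of token lists from the segmentation step.
from gensim.models import Word2Vec

segmented_titles = [
    ["盛京掌关防佐领", "缉拿", "逃人", "咨", "盛京刑部"],
    ["正白旗佐领", "交纳", "壮丁", "银两事"],
]  # in practice: one token list per catalog record

model = Word2Vec(
    sentences=segmented_titles,
    vector_size=50,   # the 50-dimensional vectors mentioned earlier
    sg=1,             # 1 = skip-gram architecture
    window=5,
    min_count=1,
    workers=4,
)

model.save("htd.text.model")
model.wv.save_word2vec_format("htd.model.bin", binary=True)

# query the most similar words to a given term, as in the example that follows
print(model.wv.most_similar("缉拿", topn=3))
```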
the correlation between each word in the hetu dangse catalog can be found by implementing the model. for example, if the word bannerman (旗人) is input into the model, the most relevant words are minren (民人, with 0.84726 relevance), accused (被控, with 0.812017), and robbery (抢劫, with 0.795359). to visualize the ethnic relationships recorded in the hetu dangse catalog, we input the first 300 words of the word vector into the trained word2vec model and performed dimensionality reduction to realize a planar graph. to understand the structure of the data intuitively, we used the t-sne algorithm to reduce the dimensions of the word vectors. t-sne is a type of nonlinear dimensionality reduction used to ensure that similar data points in high-dimensional space are as close as possible in low-dimensional space. we set the embedded space dimension parameter of t-sne to 2 and the initialization parameter to pca. this makes it more globally stable than random initialization. the maximum number of optimization iterations is 5,000. figure 3 presents the results.

in figure 3, the terms sanling, yongling, zhaoling, prime minister, and fuling form clusters. in shengjing, the qing set up the sanling prime minister's office, and the prime minister's mausoleum affairs minister was appointed concurrently by general shengjing. near fujinmen, the sanling prime minister's office was established. in the 30th year of guangxu, the government office was changed to the prime minister's office of shengjing mausoleum affairs, and the governor of the three provinces concurrently served. under the sanling prime minister's office, the sanling office was set up to undertake the sacrifice and repair affairs of the three tombs (xinbin yongling, shenyang fuling, and zhaoling).15 therefore, the clustering in figure 3 verifies the close relationship between the sanling prime minister's office and the tombs.

figure 3. 2d t-sne visualization of word2vec vectors.

analysis of the relationship between the documents received and sent of the institution
with the statistics of the text data obtained after word segmentation, we can find the quantitative relationship between the documents received and sent by the institution, using the pearson correlation coefficient to judge whether there is a correlation between the number of documents received and the number of documents sent by the same institution.

ρ(r,s) = cov(R,S) / (σ_r σ_s)   (3.1)

we suppose that the pearson correlation coefficient between the number of documents received and the number of documents sent is ρ(r,s), with r = {r1, r2, r3, ..., r11}. here, r is the variable set of documents received from the institutional sample. set s = {s1, s2, s3, ..., s11} is the variable set of documents sent by the institutional sample. by dividing the covariance of r and s by the product of their respective standard deviations, we can obtain the value of the correlation coefficient of the documents sent and received by the same institution.

mining the relationship between institutions' sending and receiving documents based on co-word clustering
to mine the relationship between the institutions' sending and receiving documents, we adopt a co-word clustering algorithm to generate a visualized network map of institutional relationships. the global co-occurrence rate represents the probability of two words appearing together in all the data sets.
in large-scale data sets, if two words often appear together in the text, these two words are considered to be strongly related to the semantics.16 clustering is a method that places objects into a group by similarity or dissimilarity. thus, keywords with high correlation to each other tend to be placed in the same cluster. social network analysis, which evaluates the unique structure of interrelationships among individuals, has been extensively used in social science, psychological science, management science, and scientometrics. 17 we can obtain a sociogram from the institutional function analysis. the main purpose of the sociogram is to provide information information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 9 about the relationship between institutions’ sending and receiving documents. in the sociogram, each member of a network is described by a “vertex” or “node.” vertices represent high-frequency words, and the sizes of the nodes indicate the occurrence frequency. the smaller the size of a node, the lower the occurrence frequency. lines depict the relationships between two institutions. they exist between two keywords, indicating that they received or sent documents to each other. the thickness is proportional to the correlation between the keywords. the thicker the line between the two keywords, the stronger the connection. using this rationale, the map visualization and network characteristics (centrality, density, core-periphery structure, strategic diagram, and network chart) were obtained by analyzing pearson’s correlation matrix or other similarity matrices.18 in this study, we conducted network analysis on a binary matrix to display the relationships between the documents sent and received by the institutions in the shengjing area during the qing dynasty recorded in the hetu dangse. further, we extracted the receiving institution and issuing institution from each record of catalog data in the hetu dangse, and then we composed a new data set with the following data from the receiving institution: issuing institution and title content. we used python to convert the new data set to endnote format and import it into vosviewer1.6.15 to calculate and draw a visual map of the new data set. van eck and waltman of the netherlands’ leiden university developed vosviewer, a metrological analysis software used for constructing and visualizing network graphs.19 although the software’s development principle is based on documents’ co-citation principles, it can be applied to the construction of data network knowledge graphs in various fields. combined with the co -word clustering algorithm, we can create an entity connection network map for historical documents through vosviewer software to reflect the recorded content. automatic classification method of historical archives catalog based on the svm model we used the svm model in machine learning for automatic classification. the svm model has the advantages of strong generalization, low error rate, strong learning ability, and support for small sample data sets, making it suitable for historical archive catalog data samples with small sample characteristics. therefore, we attempted to classify the catalog data set of hetu dangse using the svm model. first, we divided the vectorized labeled data set into a training set and a testing set. the training set accounts for 70% of the data, and the testing set accounts for 30%. 
to ensure the accuracy of the model prediction, we adopted a random division method to avoid overfitting. second, we used a linear kernel in the svm model and grid search to find the best parameter. various combinations of the penalty coefficient (c) and gamma parameter in the svm model were tested based on their accuracy ranked from high to low. we then determined the best parameter combination. after the model was established, we validated the predictive performance of the model from multiple perspectives such as precision, recall, and f1 score to ensure the generalization ability and availability of the model. we set the penalty coefficients to 10, 100, 200, and 300, while the gamma parameters are set to 0.1, 0.25, 0.5, and 0.75. we used the precision evaluation criteria to find the optimal parameter combination of the model and then imported them. the penalty coefficient is set to the x-axis, the gamma parameter set to the y-axis, and the precision set to the z-axis. we implemented the model to obtain the visualization that is shown in figure 4. clearly, the optimal parameter combination is a penalty coefficient of 10 and a gamma parameter of 0.075. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 10 figure 4. svm grid search parameter tuning diagram. discussion the history of a nation is the foundation on which it is built. historical documents are the witnesses and recorders of history. through the study of historical documents, we can go back to the past, cherish the present, and look forward to the future. an increasing number of scholars have studied these documents in recent years due to their importance. the hetu dangse records the document communications between institutions in shengjing (now shenyang) and beijing during the qing dynasty. it is an important historical document that cannot be ignored when information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 11 studying the history of northeast china during the qing dynasty. here, we use the catalog data of the hetu dangse as the sample data to test the machine learning methods previously mentioned. we explore the results from the perspectives of institutional function, institutional relationship, and automatic classification to determine the feasibility of our methods. functions of institutions the number of institutions involved in the hetu dangse is over 150. these functional departments formed the governance system of the shengjing area during the qing dynasty. to gain a deeper understanding of the qing dynasty’s ruling system in the shengjing area, the functions of these institutions should be examined. this study analyzes and studies the functions of the institutions in the shengjing area through the number of documents and the frequency of content of the sending and receiving institutions. analysis of the number of documents received and sent by institutions by sorting and statistically analyzing the catalog data of hetu dangse, we obtained data on the number of documents received and sent by institutions in the shengjing area recorded in the hetu dangse. we set the vertical axis as the total number of communicated documents, number of issued documents, and number of received documents. we set the horizontal axis as the names of the institutions and then drew a histogram. 
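a rough sketch of that counting and plotting step is given below, assuming the catalog has been loaded into a pandas dataframe; the file and column names are hypothetical, and the pearson calculation mirrors equation 3.1 for whatever institutions are chosen as the sample.

```python
# illustrative sketch (not the authors' code): count documents sent and received
# per institution, plot the top institutions, and compute the pearson
# correlation between the two counts (cf. equation 3.1).
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

catalog = pd.read_csv("hetudangse_catalog.csv")  # hypothetical file and columns

sent = catalog["issuing_institution"].value_counts()
received = catalog["receiving_institution"].value_counts()

counts = pd.DataFrame({"sent": sent, "received": received}).fillna(0)
counts["total"] = counts["sent"] + counts["received"]
top10 = counts.sort_values("total", ascending=False).head(10)

# institutions on the horizontal axis, document counts on the vertical axis
top10[["total", "sent", "received"]].plot(kind="bar")
plt.tight_layout()
plt.savefig("documents_by_institution.png")

# correlation between numbers of documents sent and received by the same institutions
rho, _ = pearsonr(counts["sent"], counts["received"])
print(round(rho, 2))
```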
this study analyzes the number of institutional archives of the hetu dangse catalog from three perspectives: total number of sent and received documents, number of received documents, and number of issued documents to find the institutions with the highest research value in the shengjing area. in the histogram shown in figure 5(a), the top three institutions in total number of communicated documents are shengjing internal affairs office, shengjing zuoling, and shengjing ministry of revenue. we can also observe that the top 10 institutions have different volumes of their respective documents received and sent by institutions. therefore, the ranking of the total number of communicated documents is not directly related to the respective rankings of the number of documents received and the number of documents sent. in figure 5(b), we can observe that the top three institutions in number of documents received in the hetu dangse are shengjing internal affairs office, shengjing ministry of revenue, and shengjing general yamen. figure 5(c) shows the top three institutions in number of documents sent in the hetu dangse are shengjing internal affairs office, shengjing zuoling, and shengjing general yamen. the total number of communicated documents, number of documents sent, and number of documents received by the shengjing internal affairs office all rank first; this indicates that the shengjing internal affairs office is the most important department of the ruling system in the qing dynasty during the shengjing area. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 12 figure 5. number of documents received and sent by institutions. a b c information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 13 by using the number of documents received and sent by the institutions, we calculated the pearson correlation coefficient to determine if the number of documents received and sent by the same institution is relevant. as institutional samples, we selected the shengjing internal affairs office, shengjing ministry of revenue, (beijing) internal affairs office in charge, shengjing zuoling, shengjing ministry of works, shengjing ministry of justice, shengjing general yamen, shengjing close defense zuoling, shengjing ministry of war, fengtian general yamen, and shengjing ministry of rites. through calculation, the result of pearson correlation coefficient is 0.69 (save two decimal places), so there is a correlation between the number of sent and received documents, as shown in figure 6. figure 6. scatter plot of pearson correlation coefficient. the hetu dangse is a copy of official documents dealing with the royal affairs of the shengjing internal affairs office during the qing dynasty. it contains the official documents between the shengjing internal affairs office and the beijing internal affairs office in charge, the liubu, etc. and the local shengjing general yamen, fengtian office, the wubu of shengjing, and other yamens.16 thus, there exist a large stock of documents with the shengjing internal affairs office as the sending and receiving agency. the wubu of shengjing, shengjing general yamen, shengjing zuoling, and other institutions are important hubs for the operation of institutions in shengjing. they played an important role in maintaining and stabilizing the society of shengjing. 
the number of documents is second in importance only to the shengjing internal affairs office. analysis of the frequency of documents received and sent by institutions to further explore the functions of institutions with research value, we extracted the contents of the catalogs from the top three institutions in total number of documents sent and received: shengjing internal affairs office, shengjing ministry of revenue, and shengjing zuoling. we then classified the catalogs of the aforementioned institutions according to receipts and postings. subsequently, we used word segmentation and word frequency statistics to process the two types information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 14 of catalog information and draw comparison diagrams to explore their specific functions in the hetu dangse. as shown in figure 7, we can roughly divide the obtained segmentation words into two categories. one is the name of the communicated official document institutions, such as the ministry of revenue, the ministry of justice, and the ministry of rites on the side of the word frequency (see fig. 7[a]). the other is the name of the official document content and the words zhuangtou (庄头), dimu (地亩), and zhuangding (壮丁) on the side of the frequency of the words in the documents sent. through a comparative analysis of the top 10 words received and sent by the same institution, we conclude that the institutions with a close relationship between receiving and sending documents are not the same. for example, the ministry of revenue of shengjing internal affairs office ranks first in the frequency of documents sent by institutions, while the shengjing zuoling ranks first for receiving institutions (see fig. 7[b]). the contents of documents sent and received by the same institution are different. figure 7(c) shows how the affairs sent by shengjing zuoling to ula (乌拉), forage (粮草), and license (执照) differ from those represented by the zhuangtou (庄头), accounting (会计), and close defense (关防) in the frequency of documents sent and frequency of receipts, respectively. based on previous research on the functions of shengjing’s institutions, the shengjing internal affairs office was set up in the companion capital of shengjing during the qing dynasty to be in charge of shengjing cemetery, sacrifice, organization of staff transfer, and other matters. 20 this relates to the meaning of words such as sacrifice (祭祀) in figure 7(a). the functions of the shengjing ministry of revenue were represented in guangxu’s great qing huidian. the cashiers in charge of taxation in shengjing, number of annual losses in official villages, and banner land were carefully recorded. the expenditures were distinguished and the accounting obeyed the regulations according to the beijing ministry of revenue at the end of the year.21 this is related to the meaning of words, such as dimu (地亩), land sale (卖地), and money and grain (钱粮) in figure 7(b). 
in fu yonggong and guan jialu’s research of shengjing zuoling’s functions, shengjing zuoling handled the transfer communicated documents; supervised and urged the various departments of guangchu, duyu, zhangyi, accounting, construction, and qingfeng to undertake matters; managed officials and various people; maintained the shengjing palace and the warehouse; selected women to send to beijing inspect; heard all types of cases; undertook the emperor’s general letter; managed the ula people and tributes; and accepted the emperor or the internal affairs office in charge, among other tasks.22 this is connected to the meaning of words such as ula (乌拉), close defense (关防) and license (执照) in figure 7(c). information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 15 figure 7. word frequency comparison of documents received (in blue) and sent (in orange) by institutions. a b c information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 16 institutional relationship analysis to further study the governance structure of the shengjing area, we not only need to understand the functions of each institution but also explore the overlap between functions of institutions. the catalog data of the hetu dangse consist of three parts: receiving institutions, issuing institutions, and record title of the catalog. a document often includes two institutions, th e receiving institution and the issuing institution, and it is certain that the content of a document relates closely to the functions between the two institutions. by observing the closeness between the number of institutions through visualizations, we conducted a quantitative analysis of consistent catalog data of the receiving and issuing institutions in the hetu dangse to provide reliable data for further research in the intersection of institutional functions in shengjing area. results of institutional connection analysis using the co-word clustering algorithm, we counted the number of archive catalog data consistent with the receiving and issuing institutions. we set the vertical axis as the issuing institution and the horizontal axis as the receiving institution to obtain figure 8. the numbers inside the boxes represent the quantity of catalog data that are consistent with the issuing institution. to facilitate measurements in the statistical process, records less than or equal to 50 communicated documents between the receiving institution and the issuing institution have been zeroed out. as shown in figure 8, the institutions having close relations with the documents recorded in the hetu dangse are concentrated in the issuing institutions shengjing zuoling and shengjing internal affairs office, and the receiving institutions shengjing internal affairs office and shengjing zuoling. among the receiving institutions, the number of documents received by the shengjing internal affairs office from shengjing general yamen reached as high as 11,936. the top three documents received by shengjing zuoling were fengtian general yamen (2,265 pieces), shengjing ministry of revenue (1,527 pieces), and shengjing ministry of justice (1,520 pieces). it is worth noting that there are less than 50 documents from shengjing zuoling in the shengjing internal affairs office. 
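the pair counting behind figure 8, and the network map built from it in the next section, can be sketched in python as follows. the authors used vosviewer for the actual map, so the networkx graph here is only a stand-in, and the file and column names are hypothetical.

```python
# illustrative sketch (not the authors' code): count catalog records per
# issuing/receiving institution pair and build a small weighted network.
import pandas as pd
import networkx as nx

catalog = pd.read_csv("hetudangse_catalog.csv")  # hypothetical file and columns

# issuing institutions on one axis, receiving institutions on the other
pairs = pd.crosstab(catalog["issuing_institution"], catalog["receiving_institution"])

# zero out weak links (50 or fewer communicated documents), as in figure 8
pairs = pairs.where(pairs > 50, 0)

# turn the matrix into a weighted network of sending/receiving relationships
graph = nx.DiGraph()
for issuer, row in pairs.iterrows():
    for receiver, count in row.items():
        if count > 0:
            graph.add_edge(issuer, receiver, weight=int(count))

print(graph.number_of_nodes(), graph.number_of_edges())
```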
the overlapping functions of the institutions in the shengjing area enabled individual offices to play bureaucratic games, passing responsibility to other offices, leading to low efficiency in handling affairs. for example, the military and political power in the shengjing area was jointly controlled by the shengjing general office and the shengjing ministry of war. the shengjing area’s tax power was controlled by the shengjing ministry of revenue and fengtian office and their subordinate offices. this phenomenon ran through the entire qing dynasty. research on the cr ossfunctionality of institutions has always been a hot topic in qing historiography. by analyzing the official documents between the institutional functions, we can further explore the overlap as well as the advantages and disadvantages of the qing dynasty shengjing ruling system to study the history of shengjing institutions in the qing dynasty more thoroughly providing a reference for the design of current institutions. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 17 figure 8. relationship of communicated documents by the hetu dangse institutions diagram. visualization of institutional network map we used the hetu dangse catalog as sample data and the co-word clustering algorithm to obtain the close relationship between institutions and the appearance frequency of institutions. we drew a visual network diagram by virtue of vosviewer1.6.15 to obtain figure 9. in figure 9, institutions are represented by default as a circle with their names. the size of the label and the circle of an institution are determined by the weight of the item. the higher the weight of an item, the larger the label and the circle of the item. for some items, labels may not be displayed to avoid overlapping labels. the color of an institution is determined by the cluster the institutions belong to, and lines between items represent links. as shown in figure 9, the relationships between the institutions and departments in the hetu dangse form three core groups: the shengjing internal affairs office (in charge), shengjing zuoling, and beijing internal affairs office in charge. however, the relationships between the three groups are not similar; the distance between the group (beijing) internal affairs office in charge and the two other groups is relatively large. the group at the core of shengjing internal affairs office and the group at the core of shengjing zuoling are closely connected to each other through the wubu of information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 18 shengjing (shengjing ministry of revenue, shengjing ministry of rites, shengjing ministry of war, shengjing ministry of justice, and shengjing ministry of works). further, there are two larger individuals: fengtian general yamen and shengjing general yamen. fengtian general yamen and shengjing zuoling are closely related to each other, and the relationship between shengjing general yamen and shengjing internal affairs office is relatively close. figure 9. co-occurrence of institutions network map. the city of shengjing was the companion capital of the qing dynasty. the qing government implemented special governance measures in these areas that differed greatly from those of direct inland provinces.23 to ensure the stable rule of the shengjing area, the qing dynasty performed the following tasks. 
first, the qing dynasty set up a general garrison as the highest military and political chief in the shengjing area to be responsible for all military and political affairs within its jurisdiction. second, they established the fengtian office, a capital of the same level as the shuntian office, to rule the common people of the shengjing area. the states and counties, as well as the garrison banner officer, which was under the rule of general garrison, were local administrative institutions under the fengtian office. these institutions implemented the dual management rule of the bannerman and common people. third, as the companion capital, the shengjing area followed the ming dynasty companion capital system to set up the wubu of shengjing to maintain power. in addition, the shengjing internal affairs office, which was in charge of palace affairs, communicated with the beijing internal affairs office in charge. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 19 results of automatic classification analysis catalogs are important information resources in the field of historical archives. the classification of archival catalogs can not only link relevant information in archives or archive fonds, improve researchers’ utilization efficiency, and save time to search for required archives, but it can also be shown to readers in clusters. as the hetu dangse catalog is a series of historical documents stored for a long period of time, its original classification system does not suit well existing archival management methods. the hetu dangse has a total of 1,149 volumes and 127,000 pages. each volume contains a different number of documents and the ink characters on chinese art paper are in manchu and chinese. reading and categorizing the full text of the hetu dangse not only requires a lot of manpower, material, and financial resources but also extremely high requirements for the classified staff. they need to possess a good knowledge of manchu, archival science, document taxonomy, and other related disciplines. therefore, sorting and organizing the content of the hetu dangse is an impractical task that relies on manual reading and comprehension. to address this problem, we used the svm model of machine learning to automatically classify and explore the catalog data of the hetu dangse. this model further demonstrates the relevance of the knowledge between documents in the hetu dangse and facilitates an in-depth analysis. we imported the vectorized labeled data set into the svm model and selected the optimal parameter combination to run the model. to visualize the data results, the 50-dimensional word vector is reduced to a 2-dimensional word vector using the t-distributed random neighborhood embedding algorithm. we used the svm model to establish a hyperplane visualized in 2dimensional form. the legend only in figure 10 shows the data distribution of the six categories with the highest proportion owing to the large number of categorized data. to test the classification effect of the svm model, we used precision and recall as metrics and calculated the f1 score to validate the model. the results are presented in table 3. based on the created svm model, 95,680 catalog data of the hetu dangse were predicted and classified. the results are shown in figure 11. 
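the classification workflow described above can be condensed into the following scikit-learn sketch. the 70/30 split, the candidate values for the penalty coefficient c and for gamma, and the precision/recall/f1 validation come from the article; the input arrays are placeholders, and because gamma has no effect on a linear kernel in scikit-learn, the sketch uses the default rbf kernel as an illustrative stand-in.

```python
# condensed sketch of the classification workflow (not the authors' code):
# 70/30 split, grid search over C and gamma, and precision/recall/F1 validation.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score

vectors = np.load("catalog_vectors.npy")  # hypothetical: one 50-dim vector per record
labels = np.load("catalog_labels.npy")    # hypothetical: one of the 11 category labels

x_train, x_test, y_train, y_test = train_test_split(
    vectors, labels, test_size=0.30, random_state=42
)

param_grid = {"C": [10, 100, 200, 300], "gamma": [0.1, 0.25, 0.5, 0.75]}
search = GridSearchCV(SVC(), param_grid, scoring="precision_macro", cv=5)
search.fit(x_train, y_train)

predicted = search.predict(x_test)
print(search.best_params_)
print(precision_score(y_test, predicted, average="macro"))
print(recall_score(y_test, predicted, average="macro"))
print(f1_score(y_test, predicted, average="macro"))
```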
although there are certain deficiencies in accuracy and other aspects, the model has a positive impact on the content research, management, utilization, and retrieval and discovery of the hetu dangse.

table 3. svm model validation
parameter | result
precision | 0.736
recall | 0.717
f1 score | 0.716

figure 10. svm decision region boundary.
figure 11. hetu dangse catalog data prediction classification.

conclusion
in this study, we used machine learning to analyze and visualize the catalog data of the hetu dangse, revealing the functional relationships of the qing dynasty shengjing regional institutions recorded in this historical document and showing the communication relationships between institutions. using the svm model, we achieved automatic classification of the hetu dangse catalog from the category perspective. owing to the massive archives of historical materials in ancient china, the fonts of many historical materials cannot be recognized by computers or humans. the digitization of catalogs has become a digital bridge between researchers and historical documents. this not only achieves a concise summary and refinement of the documents but also greatly improves their utilization efficiency for researchers. the svm model can "learn" from the labeled sample data and realize automatic classification of large amounts of unlabeled catalog data. through automatic classification of catalog data, historical researchers and archive managers can use and manage large numbers of historical documents and catalog data more effectively, greatly increasing their utilization. the co-occurrence algorithm can reveal the patterns written into the catalog data itself, discover the distances between catalog records, and form clusters, providing a clearer direction for researchers using historical documents. the algorithm also saves researchers the time spent identifying documents without purpose and makes the presentation of historical documents' content clearer to readers. this paper improves archivists' awareness of archive data compilation and management. first, data is observed, topics are identified, and potential relationships between them are found and established to improve the compilation of historical archives. second, a visual presentation method and carrier is chosen, and the established relationships are visualized via the web browser for users to access and utilize. scientometric research methods can thus promote the transformation of historical research and of archives management and compilation from traditional explanatory scholarship to truth-seeking scholarship. currently, the application of machine learning technology has gradually extended from applied disciplines to traditional fields such as literature, art, and sociology. however, there are still many opportunities in the field of historical research. this study used methods from the field of artificial intelligence to conduct text mining and visualization of historical archive catalog data, and it proposes a new digital and intelligent solution for researching chinese historical documents.
with the development of science and technology, research methods for historical documents are undergoing constant changes from the traditional manual subjective analysis of historical data to relying on quantitative analysis represented by deep learning and data mining technology. it is an irreversible trend to research historical documents more comprehensively, accurately, and scientifically by means of artificial intelligence and other technologies on the scientific frontier. for future work, we plan to conduct research on the qing dynasty historical documents from a deeper semantic analysis level, construct a knowledge graph through the method of named entity recognition, and construct an ontological model transforming historical documents into a structured knowledge base to discover new knowledge from historical documents in an automated manner. acknowledgments funding statement this work was supported by the general program of the national natural science foundation of china [grant number 72074060], the research foundation of the ministry of education of china [grant number 20jhq012], and the national social science fund of china [grant number 16btq089]. data accessibility the data sets supporting this article have been uploaded as part of the supplementary material. https://drive.google.com/drive/folders/1bzs17otruyva_qkbshmf836ygdti40y0?usp=sharing https://drive.google.com/drive/folders/1bzs17otruyva_qkbshmf836ygdti40y0?usp=sharing information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 22 competing interests we have no competing interests. endnotes 1 wang tao, “data mining of german historical documents in the 18th century, taking topic models as examples,” xuehai 1, no. 20 (2017): 206–16, https://doi.org/10.16091/j.cnki.cn321308/c.2017.01.021. 2 kaixu zhang and yunqing xia, “crf-based approach to sentence segmentation and punctuation for ancient chinese prose,” journal of tsinghua university (science and technology) 10, no. 27 (2009): 39–49, https://doi.org/10.16511/j.cnki.qhdxxb.2009.10.027. 3 michael stauffer, andreas fischer, and kaspar riesen, “keyword spotting in historical handwritten documents based on graph matching,” pattern recognition 81 (2018): 240–53, https://doi.org/10.1016/j.patcog.2018.04.001; wu sihang et al., “precise detection of chinese characters in historical documents with deep reinforcement learning,” pattern recognition 107 (2020): 107503, https://doi.org/10.1016/j.patcog.2020.107503. 4 renata solar and dalibor radovan, “use of gis for presentation of the map and pictorial collection of the national and university library of slovenia,” information technology and libraries 24, no. 4 (2005): 196–200, https://doi.org/10.6017/ital.v24i4.3385. 5 shaochun dong et al., “semantic enhanced webgis approach to visualize chinese historical natural hazards,” journal of cultural heritage 14, no. 3 (2013): 181–89, https://doi.org/10.1016/j.culher.2012.06.009; jakub kuna and łukasz kowalski, “exploring a non-existent city via historical gis system by the example of the jewish district ‘podzamcze’ in lublin (poland),” journal of cultural heritage 46 (2020): 328–34, https://doi.org/10.1016/j.culher.2020.07.010. 
6 aleksandrs ivanovs and aleksey varfolomeyev, “service-oriented architecture of intelligent environment for historical records studies,” procedia computer science 104 (2017): 57–64, http://doi.org/10.1016/j.procs.2017.01.062; guus schreiber et al., “semantic annotation and search of cultural-heritage collections: the multimedian e-culture demonstrator,” journal of web semantics 6, no. 4 (2008): 243–49, https://doi.org/10.1016/j.websem.2008.08.001. 7 m kim et al., “inference on historical factions based on multi-layered network of historical figures,” expert systems with applications 161 (2020): 113703, http://doi.org/10.1016/j.eswa.2020.113703. 8 hobson lane, cole howard, hannes hapke, natural language processing in action: understanding, analyzing, and generating text with python (new york: manning publications, 2019), 165. 9 laurens van der maaten, eric postma, and jaap van den herik, “dimensionality reduction: a comparative review,” tilburg university technical report, ticc-tr 2009-005 (2009), https://lvdmaaten.github.io/publications/papers/tr_dimensionality_reduction_review_200 9.pdf. https://doi.org/10.16091/j.cnki.cn32-1308/c.2017.01.021 https://doi.org/10.16091/j.cnki.cn32-1308/c.2017.01.021 https://doi.org/10.16511/j.cnki.qhdxxb.2009.10.027 https://doi.org/ https://doi.org/10.1016/j.patcog.2018.04.001 https://doi.org/10.1016/j.patcog.2020.107503 https://doi.org/10.6017/ital.v24i4.3385 https://doi.org/10.1016/j.culher.2012.06.009 https://doi.org/10.1016/j.culher.2020.07.010 http://doi.org/10.1016/j.procs.2017.01.062 https://doi.org/10.1016/j.websem.2008.08.001 http://doi.org/ https://doi.org/10.1016/j.eswa.2020.113703 https://lvdmaaten.github.io/publications/papers/tr_dimensionality_reduction_review_2009.pdf https://lvdmaaten.github.io/publications/papers/tr_dimensionality_reduction_review_2009.pdf information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 23 10 gavin hackeling, mastering machine learning with scikit-learn (birmingham: packt publishing, 2017). 11 richard smiraglia, domain analysis for knowledge organization: tools for ontology extraction (oxford: chandos publishing, 2015). 12 kuo-chung chu, hsin-ke lu, and wen-i liu, “identifying emerging relationship in healthcare domain journals via citation network analysis,” information technology and libraries 37, no. 1 (2018): 39–51, https://doi.org/10.6017/ital.v37i1.9595. 13 archives of liaoning province in china, “the hetu dangse series archives publication,” qing history research 6, no. 2 (2009): 1. 14 amit kumar sharma, sandeep chaurasia, and devesh kumar srivastava, “sentimental short sentences classification by using cnn deep learning model with fine tuned word2vec,” procedia computer science 167 (2020): 1139–47, https://doi.org/10.1016/j.procs.2020.03.416. 15 b hongxi, “research on the sanling management institutions of the qing dynasty outside the pass,” manchu minority research 4, no. 12 (1997): 38–56. 16 guangli zhu et al., “building multi-subtopic bi-level network for micro-blog hot topic based on feature co-occurrence and semantic community division,” journal of network and computer applications 170 (2020): 102815, https://doi.org/10.1016/j.jnca.2020.102815. 17 s. ravikumar, ashutosh agrahari, and s. n. singh, “mapping the intellectual structure of scientometrics: a co-word analysis of the journal scientometrics (2005–2010),” scientometrics 102 (2015): 929–55, https://doi.org/10.1007/s11192-014-1402-8. 
18 jiming hu and yin zhang, “research patterns and trends of recommendation system in china using co-word analysis,” information processing and management 51, no. 4 (2015): 329–39, https://doi.org/10.1016/j.ipm.2015.02.002.
19 nees jan van eck and ludo waltman, “software survey: vosviewer, a computer program for bibliometric mapping,” scientometrics 84, no. 2 (2010): 523–38, https://doi.org/10.1007/s11192-009-0146-3.
20 z yanchang and l xinzhu, “the study of the function of shengjing office from the use of the official communication — an academic investigation based on hetu dangse,” shanxi archives 8, no. 12 (2020): 179–88.
21 shengjing ministry of revenue, guangxu’s great qing huidian volume 25 (zhonghua book company, 1991), 211–12.
22 f yonggong and g jialu, “brief introduction of shengjing upper three banners baoyi zuoling,” historical archives 9, no. 30 (1992): 93–97.
23 wangyue, “research on the yamens and their affair relationships in shengjing area,” shenyang palace museum journal 1, no. 31 (2011): 67–77.

accessible, dynamic web content using instagram
jaci wilkinson
jaci wilkinson (jaci.wilkinson@umontana.edu) is web services librarian at the university of montana.
abstract this is a case study in dynamic content creation using instagram’s application program interface (api). an embedded feed of the mansfield library archives and special collections’ (asc) most recent instagram posts was created for their website’s homepage. the process to harness instagram’s api highlighted competing interests: web services’ desire to most efficiently manage content, asc staff’s investment in the latest social media trends, and everyone’s institutional commitment to accessibility.
introduction the mansfield library archives and special collections (asc) at the university of montana had a simple enough request. their homepage had been static for years and it was not possible to add more content creation to anyone’s workload. however, they had a robust instagram account with more than one thousand followers. was there any way to synchronize workflows with an instagram embed on the homepage? the solution was more complicated than we thought. we developed an instagram embed, but in the process grappled with some fundamental questions of technology in the library.
how do we streamline the creation and sharing of ephemeral, dynamic content? how do we reconcile web accessibility standards with the innovative new platforms we want to incorporate on our websites? libraries have invested heavily in social media to improve their approachability, reduce library anxiety, and interact with their users. at the mansfield library, this investment has paid off for asc. this unit was an early adaptor of instagram, a photo and short video–sharing application with the public or approved followers. the asc instagram account launched in january 2015, and staff quickly settled on the persona of “banjo cat” to share collection items and relevant history. banjo cat was inspired by a whimsical nineteenth-century photograph in asc of a cat playing a banjo (see figure 1). asc now has about 1,200 followers including many other libraries, archives, and special collections. in fact, connecting to a wider community of similar institutions was a driving factor in creating an instagram account. the asc staff member who updates the account said, while we have lots of interactions with patrons on facebook we have basically zero interactions with other institutions. instagram is all about interacting with other institutions, sharing ideas for posts, commenting on posts. so by learning about this community and participating and interacting with it we are able to . . . learn about programs and ideas that we would probably not have access to otherwise. 1 mailto:jaci.wilkinson@umontana.edu accessible, dynamic web content using instagram | wilkinson 20 https://doi.org/10.6017/ital.v37i1.10230 figure 1. banjo cat by l. a. de ribas. mansfield library archives and special collections. 1880s. but while asc’s social media thrived, its website was bereft of dynamic content. given that the asc homepage is the ninth most visited page on the library site, it felt like a wasted opportunity to let such a highly trafficked area lack engaging, current, and appealing content. it seemed only natural to harness the energy put into the asc instagram account and embed that same light-hearted, community-oriented, and collection-focused content on the asc homepage. literature review libraries are enthusiastic adopters of social media; one study even shows that as of 2013, 94 percent of academic libraries had a social media presence.2 a 2006 library journal article observed the following about myspace, then a popular social media platform: “given the popularity and reach of this powerful social network, libraries have a chance to be leaders on their college campuses and in the larger community by realizing the possibilities of using social networking sites like myspace to bring their services to the public.” 3 this open-minded spirit and willingness to try new technology trends was shrewd. pew research reports that as of 2016, 69 percent of americans use some type of social media. 4 social media use has grown more representative of the population: the percentage of older adults on at least one social media site continues to increase.5 for academic libraries, the pull of facebook was immediately strong because of the initial requirement for users to have a .edu address. academic libraries very early on attempted to connect with students about services, resources, and spaces using facebook.6 information technology and libraries | march 2018 21 dynamic content is a gateway to building interest toward and buy-in to an institution. 
in user experience literature, “user delight” is “a positive emotional affect that a user may have when interacting with a device or interface.”7 in walter’s hierarchy of user needs, pleasure tops all other needs.8 figure 2. aaron walter’s hierarchy of user needs, from therese fessenden, “a theory of user delight: why usability is the foundation for delightful experiences,” nielsen norman group, march 25, 2017, https://www.nngroup.com/articles/theory-user-delight/. using social media to engage users with special collections has its own niche. special collections are typically housed in closed stacks and have no digital equivalent. often the materials housed in special collections are rare, fragile, exotic, beautiful, and unusual; a study of library blogs and social media found that those with higher aesthetic value received more visitors and more revisits.9 social media “gives users an idea of what the collection offers while it promotes and potentially gains foot traffic.”10 it has even been suggested that social media gives special collections the opportunity to stand in when digitization isn’t possible: “instead of digitizing a whole collection, librarians can highlight important parts of the collection with a snippet of its history.”11 in creating ucla’s powell library instagram account, librarian danielle salomon https://www.nngroup.com/articles/theory-user-delight/ accessible, dynamic web content using instagram | wilkinson 22 https://doi.org/10.6017/ital.v37i1.10230 writes, “special collections items and digital library images can be a treasure trove of social media content. one of our library’s goals is to increase students’ exposure to special collections items, so we draw heavily from these collections.”12 instagram is a relative newcomer to social media, but it has been consistently successful since its inception in 2010.13 as of 2016, 28 percent of americans use instagram, up from 11 percent in 2013.14 facebook bought instagram in 2012 and has since bolstered the application’s success by making the two platforms easy to navigate and share between. after vine, a short video application, was shuttered in 2017, instagram’s ability to take and post short videos has increased its value. instagram is distinct in that it is mobile-dependent: it is difficult to run the application through a web browser, and only one device can operate an instagram account. within the library community, instagram’s adoption has been strongest in academic libraries. this is tied to the high number of instagram users who are college-age.15 another reason libraries select instagram is because it has more diverse users than other social media applications, specifically african americans and latinos.16 in a 2016 study, instagram was the second-most pick among college students at western oregon university when asked what social media application the library should use (twitter came in first). the most popular use of instagram in academic libraries is familiarizing students with services, resources, and spaces. uses include first-year instruction activities to combat library anxiety and mini-contests that ask users to identify what posted photos are of.17 ucla’s powell library discovered students posting instagram photos of their spaces, so they initially joined to repost those photos and interact with those users. instagram makes a library seem approachable. 
librarian joanna hare reflected on this discovery: “instagram is really powerful in that respect because you can just snap a few photos [and] show what’s going on . . . so that students don’t view the library as being intimidating.” 18 approachability is augmented by delegating photography and posting tasks to library student employees. social media is less often seen as a way to help create dynamic content for a library’s website. the exceptions to this trend have come from institutions with substantial technology resources. north carolina state university created an open source software that adds photos posted by anyone on instagram to a library photo collection when a certain hashtag is used.19 the university of nebraska’s calvin t. ryan library created an rss feed that disseminates blog posts to twitter, facebook, and the library homepage. posts from followed accounts in twitter and facebook are also a part of the resulting feed. the rss feed requires use of a third-party tool called dlvr.it (https://dlvrit.com/), which supports many other social media applications, but not instagram. a notable absence in literature on social media use in libraries is any mention of accessibility concerns. the “improving the accessibility of social media for public service” toolkit developed by a group of us government offices is a useful resource that includes specific guidelines on making instagram posts more accessible.20 the toolkit explains that “more and more organizations are using social media to conduct outreach, recruit job candidates and encourage workplace productivity. . . . but not all social media content is accessible to people with certain disabilities, which limits the reach and effectiveness of these platforms. and with 20% of the population estimated to have a disability, government agencies have an obligation to ensure that their messages, services and products are as inclusive as possible.”21 given the stated importance of social media in library literature, the lack of conversation about accessibility and social media is a barrier to inclusivity. https://dlvrit.com/ information technology and libraries | march 2018 23 mansfield library archives and special collections’ instagram feed dynamic content was lacking from any part of the asc website, but staff had a dearth of time and knowledge of the content management system to create web content. there was a drive to solve this problem because a new web services librarian had recently been hired. when the web services librarian learned of asc’s thriving instagram presence, she pursued the possibility of including that content on the asc website. she felt that, in addition to being more efficient, content creation should stay in-house given the highly specialized nature of asc’s collections, spaces, and resources. the ideal solution would allow asc staff to create and manage an instagram feed unassisted; the web services librarian sought the simplest possible solution for them. our content management system and instagram’s developer website were first consulted with the hope that one provided an automated embed or plugin. our content management system, cascade, could pull in content from facebook and twitter but not instagram, and instagram did not have an automated feed creator. after more research, we learned that third-party instagram feed embeds are the only possible way to create an instagram feed without using instagram’s api. 
the api was considered a last-resort option because we knew that asc staff could not manage the code themselves. the idea of using any third-party service was undesirable because of a lack of control, stability, and accessibility. if the service has technical issues or goes out of business, it would be very noticeable given the visibility of asc’s homepage. in 2012, a student advocacy organization at the university of montana filed a civil rights complaint with the us department of education focusing on disabled students’ unequal access to electronic and information technologies. since then, the mansfield library has been proactive in eliminating barriers to access.22 given this history, we are wary of the accessibility of third-party applications for someone using assistive technology, most likely a screen reader. juicer (https://www.juicer.io/), for example, is a freely available service for an instagram feed, but in exchange it retains its branding prominently at the top of the feed. an example of juicer in use can be found on the home page of the baltimore aquarium (http://aqua.org/). tests of juicer showed that it was not accessible for a screen reader. finally, it didn’t fit our need: juicer curated posts from other users depending on the hashtags and reposts, but we only wanted to feature our own content. the unpredictability of other accounts’ posts ending up on the asc homepage was not desirable. instagram’s developer site did not make finding a solution easy. the page titled “embedding” is about embedding individual posts on a webpage, not a whole feed.23 this content does not even link out to an explanation of how to embed a feed. the “authentication” page is where the process begins because calling the api requires a token for an authenticated instagram account user.24 a user is authenticated by creating a client id and then receiving an access token. another interesting roadblock provided by the instagram developer site is that the “authentication” page provides no further information about using the access token to call the api. it took outside research to finally figure out the steps needed to make the api requests for asc’s feed.25 php code is used to call the api and copy the three most recent asc instagram posts to a local server file. (using javascript to call the api is a poor choice because that code will make the account’s access token public. if anyone sees this token they can use it themselves to pull your feed using the instagram api.) css replicates the look and feel of instagram with white, minimalistic icons and a simple photo display that darkens and shows the beginning of the description when a user’s mouse hovers over it. all code from this project is freely available in github.26 there is a catch to this embedded feed process. the directions given through instagram and by the online sources we used only took us to sandbox mode (in web development, sandbox refers to a restricted or test version of a final product). in sandbox, instagram limits the number of requests to the api. unfortunately, a request was made every time someone went to the asc page. the initial feed stopped working in minutes because we did not realize what this limitation of sandbox mode meant.
another look at the instagram developer site taught us that the only way to leave sandbox was to have our “app,” as instagram called it, reviewed.27 in other words, instagram has only set up their api to be used for full application development (like juicer). we decided not to leave sandbox mode because of uncertainty about what instagram’s review process would entail. if our app was rejected, would they force us to discontinue our work? the timeline for the approval process was also uncertain. distrust and uncertainty, unfortunately, guided our decision-making at this stage. instead of undergoing the review process, the php code was reconfigured to call the api only once a day. this made the feed less dynamic because it was not updating in real time. for our purposes this was not a problem; the asc instagram account is updated at most once or twice a week anyway. as a result, we are “scraping” asc’s instagram account. although “crawling, scraping, and caching” are prohibited by instagram’s terms of use, other instagram feeds in github have similar workarounds and point out that a plugin/scraper “uses (the) same endpoint that instagram is using in their own site, so it’s arguable if the toc [terms of use] can prohibit the use of openly available information.”28 while figuring out how to work with the instagram api, a major accessibility roadblock cropped up: there was no place for the alt text, the descriptive information about the image that is used by assistive technologies for users with low vision. besides taking or uploading a photo, the only other actions offered to create a new post were to write a caption, tag people, or add a location. only the caption allowed for a text string. without alt text, not only is the instagram feed unintelligible to a screen reader, but it disturbs a screen reader user’s interaction with all other content on that page. an asc staff member discovered a solution when she noticed a joshua tree national park instagram post with alt text at the bottom of the caption. although initially put off by the “wordiness,” we concluded this was the only logical way to move forward. the benefits of this format of alt text came into focus as we moved through the project: the asc staff member was able to choose the desired alt text without any additional steps or skills, and we grew to relish the opportunity to explain to curious users what the #alttext hashtag meant and why it was important to us. php code isolates all text after #alttext and displays that as the alt text to a screen reader. since the instagram feed was implemented, it has been interesting to follow how the instagram developer site has changed and grown. although facebook has owned instagram for five years, the instagram developer site is only now starting to link out to facebook developer content. most recently, the instagram developer site has been advertising the instagram graph api for use by business accounts. this type of development is useless for us because we have a personal instagram account, not a business account. and the function of the instagram graph api is focused on the internal user and analytics, not the end user and user experience. even if the instagram graph api were available for personal accounts, it is worth asking if this type of data collection would be of use to an organization that doesn’t have the labor of a devoted marketing team.
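to make the moving parts described above concrete, the following is a minimal sketch of the kind of server-side script the project relied on, written in php like the production code. it is not the project’s actual code (that is available in the github repository cited in the references); the endpoint and json field names follow the legacy instagram api as it was documented at the time and should be treated as assumptions, that api has since been retired, and the access token value is a hypothetical placeholder.

<?php
// illustrative sketch only, not the asc feed's production code.
// endpoint and json field names follow the legacy (now retired) instagram api
// and are assumptions; ACCESS_TOKEN is a hypothetical placeholder.
$accessToken = 'ACCESS_TOKEN';                    // kept server-side, never exposed to the browser
$cacheFile   = __DIR__ . '/instagram-cache.json'; // local copy of the most recent posts
$cacheMaxAge = 24 * 60 * 60;                      // refresh at most once a day to respect sandbox limits

// call the api only when the cached copy is missing or more than a day old
if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > $cacheMaxAge) {
    $url  = 'https://api.instagram.com/v1/users/self/media/recent/'
          . '?count=3&access_token=' . urlencode($accessToken);
    $body = @file_get_contents($url);             // requires allow_url_fopen; curl would also work
    if ($body !== false) {
        file_put_contents($cacheFile, $body);
    }
}

$feed  = file_exists($cacheFile) ? json_decode(file_get_contents($cacheFile), true) : null;
$posts = isset($feed['data']) ? $feed['data'] : array();

foreach ($posts as $post) {
    $caption = isset($post['caption']['text']) ? $post['caption']['text'] : '';

    // everything after the #alttext hashtag becomes the image's alt attribute;
    // whatever precedes it is the visible description shown on hover
    $altText     = '';
    $description = $caption;
    $marker      = stripos($caption, '#alttext');
    if ($marker !== false) {
        $altText     = trim(substr($caption, $marker + strlen('#alttext')));
        $description = trim(substr($caption, 0, $marker));
    }

    printf(
        '<a class="ig-post" href="%s"><img src="%s" alt="%s"><span class="ig-caption">%s</span></a>',
        htmlspecialchars($post['link']),
        htmlspecialchars($post['images']['standard_resolution']['url']),
        htmlspecialchars($altText),
        htmlspecialchars($description)
    );
}

keeping the token and the api call in server-side php, and serving visitors from a cached json file refreshed at most once a day, is what keeps the token private and the request volume within the sandbox limit; the #alttext parsing gives the staff member full control of the alt text from within the ordinary posting workflow.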
dynamic content through social media and web content provides opportunities to create user delight because it focuses on visually appealing, fun, timely, and interesting information. for archives, special collections, and other cultural heritage institutions, this content is particularly useful because it provides a look into collections that are interesting and rare but also fragile and housed in closed stacks. these positives are tempered by the reality many of these institutions face: budgets are tight, staffs are small, and technical expertise might be lacking. this paper demonstrates how important and useful social media can be in creating dynamic website content. unfortunately, there is a gap in library literature on accessibility and social media; even if social media content is ephemeral or lacks specific utility, libraries need to pay more attention to the various ways users access resources and information through social media, especially if that same content appears on the institution’s website. the asc’s embedded homepage instagram feed fits their needs, is accessible, and builds community around their unique collections. by providing all the code created in this project in github,29 including the css we used, our hope is that institutions interested in this instagram feed model could replicate it for their own purposes without extensive technical support.
acknowledgments i am thankful for the expertise of carlie magill, donna mccrea, and wes samson. without them this project would not have been possible.
references
1 carlie magill, e-mail message to author, august 8, 2017.
2 michael sutherland, “rss feed 2.0,” code4lib 31, january 28, 2016, http://journal.code4lib.org/articles/11299.
3 beth evans, “your space or myspace?” library journal 131 (2006): 8–12, library, information science & technology abstracts, ebscohost.
4 “social media fact sheet,” pew research center, january 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/.
5 ibid.
6 brian s. mathews, “do you facebook?” c&rl news, may 2006, http://crln.acrl.org/index.php/crlnews/article/viewfile/7622/7622.
7 therese fessenden, “a theory of user delight: why usability is the foundation for delightful experiences,” nielsen norman group, march 25, 2017, https://www.nngroup.com/articles/theory-user-delight/.
8 ibid.
9 daryl green, “utilizing social media to promote special collections: what works and what doesn’t” (paper, 78th ifla general conference and assembly, helsinki, finland, june 2012), 11, https://www.ifla.org/past-wlic/2012/87-green-en.pdf.
10 katrina rink, “displaying special collections online,” serials librarian 73, no. 2 (2017): 1–9, https://doi.org/10.1080/0361526x.2017.1291462.
11 ibid.
12 danielle salomon, “moving on from facebook,” college & research libraries news 74, no. 8 (2013): 408–12, https://crln.acrl.org/index.php/crlnews/article/view/8991.
13 sarah perez, “the rise of instagram,” techcrunch, april 24, 2012, https://techcrunch.com/2012/04/24/the-rise-of-instagram-tracking-the-apps-spread-worldwide/.
14 “social media fact sheet,” pew research center, january 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/.
15 lauren wallis, “#selfiesinthestacks: sharing the library with instagram,” internet reference services quarterly 19, no. 3–4 (2014): 181–206, https://doi.org/10.1080/10875301.2014.983287.
16 elizabeth brookbank, “so much social media, so little time: using student feedback to guide academic library social media strategy,” journal of electronic resources librarianship 27, no. 4 (2015): 232–47, https://doi.org/10.1080/1941126x.2015.1092344; salomon, “moving on from facebook.”
17 wallis, “#selfiesinthestacks”; salomon, “moving on from facebook.”
18 wendy abbott et al., “an instagram is worth a thousand words: an industry panel and audience q&a,” library hi tech news 30, no. 7 (2013): 1–6, https://doi.org/10.1108/lhtn-08-2013-0047.
19 salomon, “moving on from facebook.”
20 “federal social media accessibility toolkit hackpad,” digital gov, accessed november 25, 2017, https://www.digitalgov.gov/resources/federal-social-media-accessibility-toolkit-hackpad/.
21 ibid.
22 donna e. mccrea, “creating a more accessible environment for our users with disabilities: responding to an office for civil rights complaint,” archival issues 38, no. 1 (2017): 7, https://scholarworks.umt.edu/ml_pubs/25/.
23 “embedding,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/embedding/.
24 “authentication,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/authentication/.
25 pranay deegoju, “embedding instagram feed in your website,” logical feed, december 25, 2015, https://www.logicalfeed.com/embedding-instagram-feed-in-your-website.
26 wes samson, “ws784512 instagram,” github, 2016, https://github.com/ws784512/instagram.
27 “sandbox mode,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/sandbox/.
28 “terms of use,” instagram, accessed november 25, 2017, https://help.instagram.com/478745558852511; and “image-hashtag-feed,” digitoimisto dude oy, accessed november 25, 2017, https://github.com/digitoimistodude/image-hashtag-feed.
29 samson, “ws784512 instagram.”

testing for transition: evaluating the usability of research guides around a platform migration
ashley lierman, bethany scott, mea warren, and cherie turner
ashley lierman (lierman@rowan.edu) is instruction librarian, rowan university. bethany scott (bscott3@uh.edu) is coordinator of digital projects, university of houston. mea warren (mewarren@uh.edu) is natural science and mathematics librarian, university of houston. cherie turner (ckturner2@uh.edu) is assessment & statistics coordinator, university of houston.
abstract this article describes multiple stages of usability testing that were conducted before and after a large research library’s transition to a new platform for its research guides. a large interdepartmental team sought user feedback on the design, content, and organization of the guide homepage, as well as on individual subject guides. this information was collected using an open-card-sort study, two face-to-face, think-aloud testing protocols, and an online survey. significant findings include that users need clear directions and titles that incorporate familiar terminology, do not readily understand the purpose of guides, and are easily overwhelmed by excess information, and that many of librarians’ assumptions about the use of library resources may be mistaken. this study will be of value to other library workers seeking insight into user needs and behaviors around online resources.
introduction like many libraries that employ springshare’s popular libguides platform for creating online research resources, the university of houston libraries (uhl) has accumulated an extensive collection of guides over the years. by 2015, our collection included well over 250 guides, with varying levels of complexity, popularity, usability, and accessibility. this presented a major challenge when we planned to migrate our libguides instance (locally branded as “research guides”) to libguides v2 in fall 2015, but also an opportunity: the transition would be an ideal time to appraise, reorganize, and streamline existing guide content.
although uhl had conducted user research in the past to improve usability, in preparing for the migration it became clear that another round of tests would be beneficial in revising our guides for the new platform. our research guides would be presented much differently in libguides v2, and the design and organization of information would need to be tailored to the needs of our user community like any other service. user feedback would be vital to reorganizing our guides’ content and to making customizations to the new system. this article will describe the usability testing process that was employed before and after uhl’s migration to libguides v2. usability testing is one technique in the field of user experience (ux). the primary goal of ux is to gain a deep understanding of users’ preferences and abilities, in order to inform the design and implementation of more useful, easy-to-use products or systems. best practices for ux emphasize “improving the quality of the user’s interaction with and perceptions of your product and any related services.”1 usability tests conducted as part of this case study were informed by the work of jakob nielsen, who pioneered several ux ideas and techniques, and the explanations on conducting your own usability testing provided in steve krug’s seminal works on the topic, don’t make me think and rocket surgery made easy. uhl’s transition to libguides v2 consisted of five stages: (1) card sort testing to determine the best organization of guides in the new system; (2) the migration itself; (3) face-to-face usability testing after migration to study user expectations and behavior after the change; (4) a survey to identify any significant variations in distance users’ experiences; and (5) final analysis and implementation of the results. incorporating usability testing was a relatively easy and inexpensive process with a high yield of useful insights, which could be adapted as needed to other library settings in order to evaluate similar online resources.
literature review as libraries have moved from traditional paper pathfinders to online research guides of increasing sophistication, there has been substantial study into the effectiveness of online research guides for various audiences and information needs. several studies highlight the apparent disconnect between students’ and librarians’ perceptions of research guides, especially regarding the purpose, organization, and intended use of the guides.
reeb and gibbons used an analysis of surveys and web usage statistics from several university libraries to show that students rarely or never used online guides despite the extensive time spent by librarians to curate and present information resources.2 similarly, in courtois, higgins, and kapur’s one-question survey (“was this guide useful?”) the authors were surprised to find that 40 percent of the responses received rated guides unfavorably, noting that “it was disheartening for many guide owners to receive poor ratings or negative comments on guides that require significant time and effort to produce and maintain.”3 hemmig concluded that in order to increase the value of a guide from a user perspective, librarians must adopt a user-centric approach by guiding the search process, understanding students’ mental models for research, and providing “starter references.”4 staley’s survey of student users also indicates a need to be mindful of what resources guides are actually expected to provide, as it found that pages linking to articles and databases were far more used than pages with other content.5 data has also shown that undergraduate students are unable to match their information needs with the resources provided on broad subject-area guides, leading several authors to conclude that students would be able to use course-specific guides more easily. for instance, strutin found that course guides are among the most frequently used guides, especially when paired with library instruction sessions.6 several other studies cite survey data, statistics, and educational concepts like cognitive load theory to conclude that ideally, guides would be customized to the specific information needs of each course and its assignments in order to better match the mental models and information-seeking behavior of undergraduate students.7 while the value of online research guides has been under study for quite some time, usability testing of guides is a relatively recent phenomenon. 
in 2010, librarians at concordia university conducted usability testing of two online research guides and found that undergraduate students generally found the guides difficult to use.8 librarians at metropolitan state university conducted two rounds of usability tests on their libguides with a broader range of participant types, highlighting the ability to incorporate usability testing as part of an iterative design process.9 at ithaca college, subject librarians partnered with students in a human-computer interaction information technology and libraries | december 2019 78 course to test both course guides and subject guides through a series of usability tests, preand post-test questionnaires, and a group discussion in which students evaluated the findings of the usability tests and discussed their experiences.10 at the university of nevada, las vegas, librarians conducted usability testing with both undergraduate students and librarians, and surprisingly found that attitudes towards the guides were similar in both groups: interface design challenges were the greatest barrier to task completion, rather than the level of expertise of the user.11 finally, at northwestern university, librarians conducted several types of usability tests as a part of a transition from the original libguides platform to libguides v2, to determine what features worked from the original guides and what could be improved or updated during the migration.12 throughout these and other usability studies, the authors have identified a number of desirable and undesirable elements in research guide design: • clean and simple design is highly prioritized by users. students preferred streamlined text, plentiful white space, and links to “best bets” rather than comprehensive but overwhelming lists of databases.13 these findings also align with accepted web design best practices. • guide parts and included resources should be labeled clearly and without jargon.14 sections and subpages within each guide should be named according to key terms that students recognize and understand. also, librarians should consider creating subpages using a “need-driven approach,” based on the purpose of each research task or step, rather than by the format of materials or resources.15 • the tabbed navigation of libguides v1 is both unappealing to and easily missed by users, and if it must be implemented, great care should be taken to maximize its visibility and usability.16 • consistency of guide elements, both within a guide and from one guide to the next, helps users more easily orient themselves when using guides; certain elements should always be present in the same place on the page, including navigational elements and table of contents, contact information, supplemental resources such as citation and plagiarism information, and common search boxes.17 with the findings and recommendations of these predecessors in mind, we designed a multi-stage study to expand upon their results and identify new challenges and opportunities that the libguides v2 platform might present. methodology stage 1: card sort the majority of research guides at uhl are organized by subject area, by course, or both. there are a number of guides, however, that are not affiliated with any particular subject area or course, containing task-oriented information that may be valuable across a wide variety of disciplines. 
the organizational system for these guides had developed organically over time as new guides were developed, rather than being structured intentionally, and it had become evident that these guides were not particularly discoverable or well-used by students. the migration to libguides v2 presented an opportunity to reorganize these guides based on user input. a team of three librarians from the liaison services department conducted an open-card-sort study in november 2015, in order to determine how best to organize those research guides not already affiliated with a course or subject area. card sorting is a method of identifying the testing for transition | lierman, scott, warren, and turner 79 https://doi.org/10.6017/ital.v38i4.11169 categories and organization of information that make the most sense to users, by asking users to sort potential tasks into named categories representing the menus or options that would be available on the site. an open-card sort allows users to create and name as many categories as they need, as opposed to a closed-card sort, which requires users to sort the available options into a predetermined set of categories. to prepare for the study, we reviewed all of our guides to develop a complete list of those not affiliated with a subject or course. for each guide, we developed a brief, clear description of the guide’s topic that would be easy for an average library user to understand, each on a small laminated card. over an approximately ninety-minute period, we staffed a table in the 24-hour lounge of m.d. anderson library, where we recruited passersby to participate in the study. after answering a few demographic questions, participants were asked to place the cards into groups that seemed logical to them. they could create as many or as few groups as necessary, but were asked to try to place every card in a group. while the participants organized the cards, they were asked to explain their thought processes and rationale, and one librarian observed the sorting process and took notes on their actions and explanations. when a participant finished grouping the cards, they were asked to write on an index card a name for each of the groups they had created. the final groupings were photographed and the labels retained for recording purposes. after the testing was complete, participants’ responses were organized into a spreadsheet and reviewed for recurring patterns and commonalities. a new set of categories was developed based on those most commonly created by students during the study, and these categories were titled using the most common terminology used by students in their group labels. stage 2: migration at the direction of the instructional design librarian (idl), research guide editors at uhl revised and prepared their own guide content throughout fall 2015, eliminating unneeded information and reorganizing what remained. the idl led multiple trainings and work sessions throughout the process to ensure compliance. during this same time, the idl completed back-end work in the libguides system to prepare for migration, and the web services department created a custom layout for the new guide site. the data migration itself took place on december 18, 2015, followed by cleanup and full implementation in january 2016. the idl provided a deadline by which all content must be ready for public consumption, prior to the start of the spring semester. 
after that deadline, the web services department switched the url for uhl’s research guides site to the libguides v2 instance and made the new system publicly available.
stage 3: face-to-face testing after the migration process was complete, the idl assembled a team of ten other librarian and staff stakeholders from the liaison services, special collections, and web services departments to develop a usability testing protocol. this team assisted the idl in developing two different face-to-face testing scripts and the text of a survey for distance users, as well as helping to administer face-to-face testing. the method we chose for the face-to-face testing process was think-aloud testing. in a think-aloud test, the user is given a set of tasks to complete using the web resource, tasks that have been identified as common potential uses. the user is asked to attempt each task, and to narrate any thoughts or reactions to the resource, as well as the thought process and rationale behind each decision made. several members of the team were already familiar with usability practices and had participated in think-aloud user testing before. training for the others was provided in the form of short practical readings, verbal guidance from the idl in group meetings, and practice sessions before conducting the face-to-face testing. in the practice sessions, group members volunteered for their roles in the testing, discussed protocol and logistics and asked any questions, and practiced the tasks they would each need to complete: making the recruitment pitch to users, walking through the consent process, using recording software, using the notetaking sheet, and so on. as the team leader and one of the members experienced with usability, the idl conducted the actual testing interviews. each of the face-to-face tests focused on either subject guides or the guide homepage. for both tests, tables were set up in the 24-hour lounge for recruitment and testing. two team members recruited students in the library at the time of testing by offering snacks and library-branded giveaways. two additional team members facilitated the test and took notes during testing. both tests also used the same consent forms and demographic questions, and largely the same follow-up questions. participants in both homepage and subject guide testing were guided to the appropriate starting points and interviewed about their impressions of the homepage and guides, their perceptions of the purpose of these resources, and their understanding of the research guides name. subject guide testers were allowed to select which of our two testing guides they would be more comfortable using: the general business resources guide, or the biology and biochemistry resources guide. subject guide testers were also asked how they would seek help if the guide did not meet their needs. both groups were then asked to complete one of two sets of tasks. the homepage tasks were designed to test users’ ability to find individual guides, either for a specific course or for general information on a subject; the subject guide tasks were designed to test users’ ability to find appropriate resources for research on a given topic. after completing the tasks for their respective resources, participants answered several general follow-up questions, with additional questions from the facilitator as necessary.
stage 4: survey unlike the face-to-face testing, the survey focused only on use of subject guides, not the homepage. otherwise, however, because the purpose of the survey was to compare the behavior of distance users to the behavior of on-campus users, the survey was designed to mimic the face-to-face test as closely as possible. several team members with liaison responsibilities identified distance user groups in their subject areas who would be demographically appropriate and available at the needed time, and contacted appropriate faculty members to ask for assistance in distributing the survey via email. ultimately, the survey was distributed to small cohorts of users in the areas of social work, education, nursing, and pharmacy, and customized for each targeted cohort. each version of the survey linked users to their appropriate subject guide and then asked the same questions regarding impressions of the guide that were asked in the face-to-face testing. users were also asked to complete tasks using the guide that were similar in purpose to those in the face-to-face testing, and they were prompted to enter the resource they found at the end of each task. demographic information was requested at the end of the survey to ensure that in the event of drop-offs, basic demographic information would be more likely to be lost than testing data. the testing for transition | lierman, scott, warren, and turner 81 https://doi.org/10.6017/ital.v38i4.11169 survey was distributed to the target groups over a three-week period in june 2016. six users at least partially completed the survey, and four completed it in full. stage 5: analyzing and implementing results after completing the face-to-face testing, the idl reviewed and transcribed the recordings of each test session, along with additional insights from the notetakers. responses to each interview question were coded and ordered from most to least common, as were patterns of behavior and difficulties in completing each task. task results and completion times were also recorded for each user and organized into a spreadsheet with users’ demographic information. the idl then reported out to research guide editors on common responses and task behaviors observed in the testing, and interpretations of the implications of these results for guide design. after survey responses were collected, the idl compiled and analyzed the results using a similar process, although the survey received few enough responses that coding was not necessary. users’ responses to questions were noted and grouped, and success and failure rates on tasks were tallied. a second report out to research guide editors summarized these results and described which responses closely resembled those received in the face-to-face testing and which varied. finally, when all data had been collected, the idl compiled recommendations based on the testing results with other recommendations derived from past uhl studies and from reviewing the literature, and from these developed a set of research guides usability guidelines. the guidelines were organized from highest to lowest priority, based on how commonly each was indicated in testing or in the literature. research guide editors were asked to revise their guides according to these guidelines within one year of their implementation, and advised that their compliance would be evaluated in an audit of all guide content in summer 2017. 
in the interest of transparency, the idl also included in the guidelines document an annotated bibliography of the relevant literature review, and a formal report on the procedures and results of the usability testing process. findings card sort one significant observation from the card sort was that, while librarians tended to organize guides into groups based on type of user (e.g., “undergraduates,” “student athletes,” “first-years,” etc.), none of the students who participated categorized resources in this way, and they did not seem to be particularly conscious of the categories into which they or other users might fit. instead, their groupings focused on the type of task to which each guide would be most appropriate, rather than the type of user that would be most likely to use that guide. for example, users readily recognized guides related to citation tasks and preferred them to be grouped together, regardless of the level at which they addressed the topic, and also grouped advanced visualization techniques like gis with simpler multimedia-related tasks like finding images. similarly, category labels tended to include “how to . . . ” language in describing their contents, focusing on the task to which the guides in that category would be beneficial. this aligns with the recommendation from sinkinson et al. to name guide pages based on purpose rather than format.18 it is worth noting, however, that all of the students who participated in the card-sort study were undergraduates and may not have fully understood some of the more complex research tasks being described. it should also be noted that all users created some sort of category for “general” or “basic” research tasks, and most either explicitly created an “advanced” research category, or information technology and libraries | december 2019 82 created several more granular categories and then verbally described these as being for “advanced” research tasks. in general, organization by task type was most preferred, followed by level of sophistication of task. face-to-face testing: homepage no significant correlations were found between user demographics and users’ success rates in completing each task, nor between demographics and time on task. users’ ability to navigate the system was generally consistent regardless of major, year in program, and—somewhat surprisingly—frequency of library use. this is, however, in keeping with costello et al.‘s finding that technology barriers were more significant in user testing than level of experience.19 when testing the homepage, we found that all users were able to find known guides (such as a course guide for a specific course) and appropriate guides for a given task (such as a citation guide for a particular style) quickly and easily. when seeking a guide, users generally used the by subject view of all guides to locate both subject and course guides. if this view was not helpful, as in the case of citation style guides, users’ next step was most commonly to switch to the all guides view and use the search function to look for key terms. users understood and used the by subject and all guides views intuitively, expressed more confusion and hesitation about the by owner and by type views, and disregarded the by group view entirely. 
we had been concerned about whether the search function would confuse users by highlighting results from guide subpages, but on the contrary, the study participants used the search function easily, and the fact that it surfaced results from within guides seemed to help them find and identify relevant terms, rather than confusing them. overall, users responded favorably to the look and feel of guides, albeit with a few specific critiques: the initially limited color palette made it difficult for some users to distinguish parts of a guide from one another, and the text size was found to be uncomfortably small in some areas.
face-to-face testing: subject guides in subject guide testing, we found overwhelmingly that users both valued and made use of link and box descriptions within guides, using them throughout the navigation process as sources of additional information. users generally preferred briefer descriptions, rather than reading lengthy paragraphs of text, but several noted specific instances in which they would not have understood the nature or purpose of a database without the description that was provided. we also found, conversely, that librarian profile boxes were of less value to users than we had assumed. when asked how they would find help when researching, most subject guide testers said they would turn to google, ask at the library service desk, or use the contact us link in the libguides footer; only two mentioned the profile box as a potential source of help at all. users also seemed unsure of the purpose of the profile box, and not to recognize whose photo and contact information they were seeing, in spite of box labels and text. contrary to our expectations, users also readily clicked through to subpages of guides to find information, sometimes even when more useful information was actually available on the guide landing page. this was particularly evident in one of the subject guides that included internal navigation links in a box on the landing page: if users saw a term they recognized in one of these links, they would click it immediately, without exploring the rest of the page. in general, users latched on quickly to terms in subpage and box titles that seemed relevant to their tasks, and some expressed feelings of increased confidence and reassurance when seeing a familiar term featured prominently on an otherwise unfamiliar resource. scanning for keywords in this manner also sometimes led users astray, however: some navigated to inappropriate pages or links because they featured words like “research” or “library” in their titles. users also expressed confusion about page titles that did not match their expectations of tasks they could complete online, such as “biology reading room.” these findings support those of many prior authors regarding the importance of including clear descriptions with key words that users readily understand.20 many of our results from subject guide testing not only ran counter to our expectations, but challenged the assumptions on which we had based our questioning. for example, we had been curious to learn whether links to external websites were used significantly compared to links to library databases, or if they simply cluttered guides unnecessarily. in testing, however, we found that users did not distinguish between the two types of resources at all, and used both interchangeably.
a better question seemed to be not whether users found those links useful, but how to distinguish them from library content—or whether the distinction was necessary from the user’s perspective. some team members had also been concerned about the scroll depth of guide pages, but the majority of users not only said they did not mind scrolling, but seemed surprised and amused by being asked. their own assumptions about this type of resource clearly included the need to scroll down a page. a few other miscellaneous issues presented themselves in our face-to-face testing. one was that the purpose and nature of research guides was not readily evident to users. many used language that conflated guides with search tools like databases, or even with individual information resources like books or articles. for example, a user asked whether the by owner view listed the authors of articles available in this resource. the curated and directional nature of research guides was not at all clear to users. furthermore, in spite of the improvements to guide look and feel in libguides v2, several users still spoke of guides as being cluttered, lengthy, and overwhelming, leaving them intimidated and unsure of where to begin. consistently, testers tended to gravitate toward course guides even when subject guides would have been more appropriate for a given task, and some users expressed that this choice was because of the greater specificity in course guide titles. users demonstrated a great preference for familiarity, gravitating toward terms and resources that were known to them, and even repeating behaviors that had been unproductive earlier in the testing process. finally, one of the greatest points of confusion for users seemed to be the relationship of research guides to physical materials within the library. users readily and confidently followed links to online resources from research guides but expressed confusion and hesitancy when guides pointed to books or other resources available in the library. survey the survey of off-campus users had few responses, but the demographics of the respondents varied more than those of the on-campus testing participants, including graduate students and faculty. the users who did respond showed evidence of less use of guide subpages than we had observed in the face-to-face testing, indicating that the presence of a librarian during testing may have influenced users to explore guides more thoroughly than they would have when working on their own. at the same time, more experienced researchers in the survey group—in this case, a late-program graduate student and a faculty member—were apparently more likely than less experienced users to explore guides thoroughly, and to succeed at research tasks. survey respondents also were far more likely to state that they would use the profile box on guides for information technology and libraries | december 2019 84 help, with some indicating that they recognized their liaison librarian’s picture and were familiar with the librarian as a source of assistance. liaison librarians at uhl often work more closely with higher-level students and faculty than with undergraduates, and this greater familiarity was not surprising. discussion implementation of findings based on the results of the literature review and testing, a number of changes and recommendations were implemented. 
a brief description of the nature and purpose of research guides was added to the guide homepage’s sidebar, and more color variation was added to guides, while font sizes were increased. existing documentation was also reworked and expanded to create the research guides usability guidelines document for all guide editors, which included adding or revising the following recommendations: • pages, boxes, and resources should all be labeled with clear, jargon-free language that includes keywords familiar to their most frequent users. • page design should be “clean and simple,” minimizing text and incorporating ample white space. • brief, oneto two-sentence descriptions should be provided for all links. • each guide should have an introduction on its landing page with a brief description of its contents and purpose. it may be helpful to include links to subpages in this box as well, but this should be done judiciously, as these links may take users off the landing page prematurely. • pages and resources aimed at undergraduates should be organized and titled according to their relevance to research tasks (e.g., “find articles”), and not by user group. • electronic resources should be prioritized on guides over print resources. • clear distinctions should be made between library and non-library links when the distinction is important. • a profile box with a picture should be included, but the importance of this item is not as great as we had previously imagined. limitations one of the most significant challenges in our testing was actually negotiating the irb application process. delays in our application raised concerns within the team that we would not receive approval in time to test with students before the start of the summer break. although we did receive approval in time, the window for testing afterward was extremely narrow. submitting the application also bound us to the scripts and text that we had originally drafted, which severely limited the flexibility of the testing process. this became a challenge at several points when a particular phrasing or design of a question was found to be ineffective in practice, but could not be altered from its original form. tensions between the requirements for institutional review and the unique needs of usability testing are a persistent problem for user experience development in an academic setting, and must be planned for accordingly as much as possible. in some cases, as well, we might have improved our results by better designing our questions. one example of this was the question about the name “research guides,” which anecdotal evidence has suggested might be challenging for users. simply asking whether that name made sense to the participant was clearly not effective in practice, and did not yield actionable insights. in the future, testing for transition | lierman, scott, warren, and turner 85 https://doi.org/10.6017/ital.v38i4.11169 we might consider informal testing of our planned questions with users in the target demographic before proceeding with full-scale usability testing. a final challenge was in gathering data on use of guides by distance users. though we were able to get enough responses to draw some tentative conclusions, we had hoped for a larger pool of data. though it would make the results more difficult to compare to in-person testing, reducing the length of the survey might have helped to produce more responses. 
additionally, increased marketing and more flexible timing for survey distribution might have also helped us reach a larger audience. conclusions the results of our testing were very instructive, and led to the creation of valuable documentation for guide editors to use in their work. we also learned a number of lessons relating to process that would be of value to other librarians seeking to perform similar testing at their own institutions. the first of these is that working with a large, interdepartmental team on this type of project— while occasionally unwieldy—is greatly beneficial overall. even if all the team members are not able to fully participate, involving as many colleagues as possible in the usability testing process lessens the workload for each individual, increases flexibility, and ultimately increases buy-in and compliance with the resulting changes and recommendations. for a platform used directly by a relatively large percentage of librarians, as libguides generally is, the number of stakeholders in user research is correspondingly large, and as many of these stakeholders as possible should be involved to some degree. not only will this distribute the benefits of the process more broadly, it will make it possible to staff more extensive and more frequent testing sessions. in the course of our testing process, we also came to recognize the value of testers familiar with the user group under examination. a majority of librarians involved in testing were from publicfacing departments, with significant student contact in their day-to-day work. as a result, we were able to quickly attract a diverse set of participants for our testing simply through our collective knowledge of students’ likely behaviors and preferences: where students were most likely to congregate, what kinds of rewards would motivate them to participate, how to reach them at a distance, and how far their patience would be likely to extend for an in-person interview or an online survey. the incentives and location that the testing teams selected were so effective that the numbers of volunteers we received overwhelmed our capacity to accommodate within the allotted testing time, resulting in a substantial pool of responses for analysis. therefore, we conclude that the effectiveness of user research can be increased by including (or at least consulting) those most familiar with the user group to be studied. simply assuming that participants will be available may ultimately compromise the effectiveness of testing. additionally, time management is an extremely important element of testing development. failing to fully account for the demands of the irb process, for example, led to significant limitations for our project concerning the timing of testing, the availability of participants, our capacity for marketing and distribution of the survey, and the quality of our testing instrument. while acknowledging that, as in our case, sometimes the need for usability testing arises on short notice, we recommend allocating as much time and preparation to the process as possible, to ensure that every aspect of the testing can be given adequate attention. information technology and libraries | december 2019 86 figure 1. average monthly guide views by transition period. 
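figure 1 summarizes the average monthly guide views compared across the three transition periods described in the following passage. as a rough illustration only, that kind of per-period monthly averaging could be computed along these lines, assuming a hypothetical export of monthly view counts per guide (the real libguides statistics export may be shaped differently):

```python
# Sketch: average monthly views per guide for each transition period, as
# summarized in figure 1. Assumes a hypothetical export "guide_views.csv"
# with one row per guide per month ("guide_id", "month", "views"); the real
# LibGuides statistics export may be organized differently.
import pandas as pd

views = pd.read_csv("guide_views.csv", parse_dates=["month"])

def period(month):
    if month < pd.Timestamp("2016-01-01"):
        return "v1 (Sep 2014 - Dec 2015)"
    if month < pd.Timestamp("2017-09-01"):
        return "v2, pre-best-practices (Jan 2016 - Aug 2017)"
    return "v2, best practices (Sep 2017 - Apr 2019)"

views["period"] = views["month"].apply(period)

# Mean monthly views per guide within each period, then averaged across guides.
per_guide = views.groupby(["period", "guide_id"])["views"].mean()
print(per_guide.groupby("period").mean().round(1))
```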
testing for transition | lierman, scott, warren, and turner 87 https://doi.org/10.6017/ital.v38i4.11169 as a final note, nearly two years after the best practices were implemented, we collected and compared guide traffic statistics from three key periods: • september 2014 through december 2015, the sixteen months preceding our transition to libguides v2; • january 2016 through august 2017, our first twenty months on libguides v2, during which time best practices had not yet been fully developed and implemented; and • september 2017 through april 2019, from the beginning of best practices implementation through the time of writing (best practices were implemented gradually between september 2017 and february 2018). mindful of the fact that guide usage fluctuates with the academic year, we compared average views for each guide on a monthly basis. figure 1 shows the average number of times each guide was viewed in a month for each period of the transition. as the figure shows, for most of the academic year, guide views dropped sharply after our transition from libguides v1 to libguides v2, and continued to decline slightly with time through the period when our best practices were implemented. there are a number of possible causes for this phenomenon: • guide usage may be declining over time generally for a variety of reasons, and the transition to the new look of v2 may have confused and disoriented users in the immediate aftermath, causing use of some guides to be discontinued. • a substantial number of older guides were eliminated in the transition to v2, some of which may have been more heavily used than suspected, and new guides that have been created since may not yet have gained traction and recognition from users. • librarians may also have reduced their efforts to incorporate guides into their teaching and outreach strategies. • improved organization in the new system may be helping users to find the guide they need on the first try, without having to move through and examine multiple guides. in any case, this trend is concerning and merits further investigation, but a direct correlation with the transition to libguides v2 and the implementation of best practices has not been established. a more accurate measure of the effect of the best practices would be a user satisfaction survey, although a comparison would be difficult to make due to a lack of a baseline from bef ore the transition. we will continue to investigate trends in the use of our guide and how our best practices have affected our users, and how they can be improved upon in the future. information technology and libraries | december 2019 88 appendix a: homepage testing script welcome and demographics hello! thank you for agreeing to participate. i’ll be helping you through the process, and my colleague here will be taking notes. before we get started, i’d like to ask you a few quick questions about yourself. • are you a student? o (no:) ▪ what is your status at uh? (faculty, staff, fellow, etc.) ▪ with what college or area are you affiliated? o (yes:) ▪ are you an undergraduate or a grad student? ▪ what program are you in? ▪ what year are you in now? • how often do you use this library? • how often do you use the libraries’ website or online resources? • about how many hours a week would you say you spend online? • have you ever used the libraries’ research guides before? (if not) have you ever heard of them? are you ready to start? do you have any questions? 
homepage tour first, i’d like to ask you a few questions about the homepage, which you can see here. don’t worry about right or wrong answers, i just want to know your reactions. • when you look at this page, what are your first impressions of it? • just from looking at these pages, what do you think this resource is for? • look at the categories across the top of the screen. what do you think each of those mean? what would you use them for? • what would you call the resources listed here? • we call these resources “research guides.” does that name make sense to you? tasks: odd-numbered participants now we’re going to ask you to complete two tasks using this page and the links on it. this isn’t a test, and nothing you do will be the wrong or right answer. we just want to see h ow you interact with the site and what we can do to make that experience better. do you have any questions so far? let’s begin. please try to talk about what you’re doing as much as possible, and tell us what you’re thinking and why you’re taking each step. 1. you need to find sources for an assignment for your history class, and you aren’t sure where to start. you clicked a link on the help section of the library webpage that led you here. find a guide that you think can help you. 2. you are taking chemistry 1301, and your professor told you that the library has a research guide especially for this class. find the guide you think they meant. testing for transition | lierman, scott, warren, and turner 89 https://doi.org/10.6017/ital.v38i4.11169 tasks: even-numbered participants now we’re going to ask you to complete two tasks using this page and the links on it. this isn’t a test, and nothing you do will be the wrong or right answer. we just want to see how you interact with the site and what we can do to make that experience better. do you have any questions so far? let’s begin. please try to talk about what you’re doing as much as possible, and tell us what you’re thinking and why you’re taking each step. 1. you need to format a bibliography in mla style, and your professor told you that the library has a research guide that can help. find the guide you think she meant. 2. you are taking a psychology course for the first time, and you want find out what types of tools you should use to do research in psychology. you clicked a link on the help section of the library webpage that led you here. find a guide that you think can help you. follow-up questions now i’d like to ask you a few follow-up questions. • was this easy or hard to do? • what was the easiest part? • what was the hardest part? • what did you like about using this site? • what’s one thing that would have made these tasks easier to complete? information technology and libraries | december 2019 90 appendix b: subject guides testing script welcome and demographics hello! thank you for agreeing to participate. i’ll be helping you through the process, and my colleague here will be taking notes. before we get started, i’d like to ask you a few quick questions about yourself. • are you a student? o (no:) ▪ what is your status at uh? (faculty, staff, fellow, etc.) ▪ with what college or area are you affiliated? o (yes:) ▪ are you an undergraduate or a grad student? ▪ what program are you in? ▪ what year are you in now? • how often do you use this library? • how often do you use the libraries’ website or online resources? • about how many hours a week would you say you spend online? • have you ever used the libraries’ research guides before? 
(if not) have you ever heard of them? are you ready to start? do you have any questions? guide impressions first, i’d like to ask you a few questions about this page. don’t worry about right or wrong answers, i just want to know your reactions. • when you look at this page, what are your first impressions of it? • just from looking at this page, what do you think this resource is for? what would you use it for? • what would you call this type of resource? • we call resources like this “research guides.” does that name make sense to you? • if you couldn’t find what you were looking for on this page, what would you do to find help? now we’re going to ask you to complete two tasks using this page and the links on it. this isn’t a test, and nothing you do will be the wrong or right answer. we just want to see how you interact with the site and what we can do to make that experience better. do you have any questions so far? let’s begin. please try to talk about what you’re doing as much as possible, and tell us what you’re thinking and why you’re taking each step. tasks: general business resources guide 1. find a database that you could use for research in a general business class. 2. imagine you want to find information on census data. find an appropriate resource on this guide. 3. find a tool you could use to find a dissertation to use in a general business class. testing for transition | lierman, scott, warren, and turner 91 https://doi.org/10.6017/ital.v38i4.11169 tasks: biology and biochemistry resources guide 1. find a database that you could use for research in a biology class. 2. imagine you want to find information on taxonomy. find an appropriate resource on this guide. 3. find a tool you could use to find a thesis to use in a biology class. follow-up questions now i’d like to ask you a few follow-up questions. • was this easy or hard to do? • what was the easiest part? • what was the hardest part? • what did you like about using this site? • what did you dislike? • what’s one thing that would have made these tasks easier to complete? • did it bother you to have to scroll down the page to find additional information? • if you had been doing this on your own, do you think you would have kept scrolling, or gone to other pages on the guide? • did you notice or read the text below the links? • did the names of the different pages on the guide make sense to you? did you know what to expect? • do you think you would use these resources yourself if you were a student in the appropriate class? information technology and libraries | december 2019 92 appendix c: example survey— social work students screening questions are you a university of houston student, faculty member, or employee? • yes • no are you at least 18 years of age? • yes • no consent university of houston consent to participate in research project title: usability testing of library research guides you are being invited to participate in a research project conducted by ashley lierman, the instructional design librarian, and a team of other librarians from the university of houston libraries. non-participation statement your participation is voluntary and you may refuse to participate or withdraw at any time without penalty or loss of benefits to which you are otherwise entitled. you may also refuse to answer any question. if you are a student, a decision to participate or not or to withdraw your participation will have no effect on your standing. 
purpose of the study the purpose of this study is to investigate user interactions with the research guides area of the uh libraries’ website, in order to understand user needs and expectations and improve the performance of the site. procedures you will be one of approximately fifty subjects to be asked to participate in this survey. you will be asked to provide your initial thoughts and reactions to the libraries’ research guides, and to complete three ordinary research tasks using the page and associated links, then answer followup questions about your experience. the survey includes 23 questions and should take approximately 20-30 minutes. confidentiality your participation in this project is anonymous. please do not enter your name or other identifying information at any point in this survey. testing for transition | lierman, scott, warren, and turner 93 https://doi.org/10.6017/ital.v38i4.11169 risks/discomforts no foreseeable risks or discomforts should result from this research. benefits while you will not directly benefit from participation, your participation may help investigators better understand our users’ needs and expectations from the libraries’ website. alternatives participation in this project is voluntary and the only alternative to this project is non participation. publication statement the results of this study may be published in professional and/or scientific journals. it may also be used for educational purposes or for professional presentations. however, no individual subject will be identified. if you have any questions, you may contact ashley lierman at 713-743-9773. any questions regarding your rights as a research subject may be addressed to the university of houston committee for the protection of human subjects (713743-9204). by clicking the “i agree to participate” button below, you affirm your consent to participate in this survey. if you do not consent to participate, you may simply close this window. • i agree to participate guide impressions click the link below (will open in a new window) and explore the page it leads to, then return to this survey and answer the questions. http://guides.lib.uh.edu/socialwork when you look at the page linked above, what are your first impressions of it? just from looking at the page, what do you think this resource is for? what would you use it for? what would you call this type of resource, if you had to give it a name? if you couldn’t find what you were looking for on the page linked above, what would you do to find help? on the following pages, you will be asked to complete three brief tasks. this is not a test, and nothing you do will be the wrong or right answer. the purpose of these tasks is simply to allow you to experiment with using the guide in an authentic way. when you have completed all of the tasks, you will be asked a few questions about your experiences. http://guides.lib.uh.edu/socialwork information technology and libraries | december 2019 94 first task click the link below to open the social work resources guide (will open in a new window): http://guides.lib.uh.edu/socialwork on the social work resources guide, find a link to a database that you could use to investigate possible psychiatric medications. enter the name of the database you found: second task click the link below to open the social work resources guide (will open in a new window): http://guides.lib.uh.edu/socialwork imagine you want to find a psychological assessment. find an appropriate resource on social work resources guide. 
(you do not need to actually find an assessment, only the name of a resource that would help you locate one.) enter the name of the resource you found: third task click the link below to open the social work resources guide (will open in a new window): http://guides.lib.uh.edu/socialwork on the social work resources guide, find a tool you could use to find historical census data. enter the name of the tool you found: follow-up questions were the tasks on the preceding pages easy or difficult to do? • extremely easy • somewhat easy • neither easy nor difficult • somewhat difficult • extremely difficult what was the easiest part of completing the tasks? what was the most difficult part of completing the tasks? what did you like about using the guide that you were linked to? what did you dislike about using the guide? what is one thing that would have made the tasks easier to complete? demographics thank you for completing the survey! before you leave, please answer a few demographic questions about yourself. http://guides.lib.uh.edu/socialwork http://guides.lib.uh.edu/socialwork http://guides.lib.uh.edu/socialwork testing for transition | lierman, scott, warren, and turner 95 https://doi.org/10.6017/ital.v38i4.11169 are you a student? • yes • no type of student: • undergraduate • graduate • not a student program or major: year in program: • 1st • 2nd • 3rd • 4th • 5th or higher • not a student how often do you use the university of houston libraries? • daily • a few times a week • a few times a month • a few times a year • never how often do you use the libraries’ website or online resources (e.g. databases, catalog, etc.)? • daily • a few times a week • a few times a month • a few times a year • never have you ever used the libraries’ research guides before? • yes • no ending screen we thank you for your time spent taking this survey. your response has been recorded. information technology and libraries | december 2019 96 references 1 “user experience basics,” usability.gov, https://www.usability.gov/what-and-why/userexperience.html. 2 brenda reeb and susan gibbons, “students, librarians, and subject guides: improving a poor rate of return,” portal: libraries and the academy 4, no. 1 (2004): 123-30, https://doi.org/10.1353/pla.2004.0020. 3 martin p. courtois, martha e. higgins, and aditya kapur, “was this guide helpful? users’ perceptions of subject guides,” reference services review 33, no. 2 (2005): 188-96, https://doi.org/10.1108/00907320510597381. 4 william hemmig, “online pathfinders: toward an experience-centered model,” reference services review 33, no. 1 (2005): 66-87, https://doi.org/10.1108/00907320510581397. 5 shannon m. staley, “academic subject guides: a case study of use at san jose state university,” college & research libraries 68, no. 2 (2007): 119-40, http://crl.acrl.org/content/68/2/119.short. 6 michal strutin, “making research guides more useful and more well used,” issues in science and technology librarianship 55 (2008), https://doi.org/10.5062/f4m61h5k. 7 kristin costello et al., “libguides best practices: how usability showed us what students really want from subject guides” (presentation, brick & click ’15: an academic library conference, maryville, mo, november 6, 2015): 52-60; alisa c. gonzalez and theresa westbrock, “reaching out with libguides: establishing a working set of best practices,” journal of library administration 50, no. 5-6 (2010): 638-56, https://doi.org/10.1080/01930826.2010.488941; jennifer j. 
little, “cognitive load theory and library research guides,” internet reference services quarterly 15, no. 1 (2010): 53-63, https://doi.org/10.1080/10875300903530199; dana ouellette, “subject guides in academic libraries: a user-centered study of uses and perceptions,” canadian journal of information and library science 35, no. 4 (2011): 436-51, https://doi.org/10.1353/ils.2011.0024. 8 luigina vileno, “testing the usability of two online research guides,” partnership: the canadian journal of library and information practice and research 5, no. 2 (2010): 1-21. https://doi.org/10.21083/partnership.v5i2.1235. 9 alec sonsteby and jennifer dejonghe, “usability testing, user-centered design, and libguides subject guides: a case study,” journal of web librarianship 7, no. 1 (2013): 83-94. https://doi.org/10.1080/19322909.2013.747366. 10 laura cobus-kuo, ron gilmour, and paul dickson, “bringing in the experts: library research guide usability testing in a computer science class,” evidence based library and information practice 8, no. 4 (2013): 43-59, http://ejournals.library.ualberta.ca/index.php/eblip/article/view/20170. 11 costello et al., 56. https://www.usability.gov/what-and-why/user-experience.html https://www.usability.gov/what-and-why/user-experience.html https://doi.org/10.1353/pla.2004.0020 https://doi.org/10.1108/00907320510597381 https://doi.org/10.1108/00907320510581397 http://crl.acrl.org/content/68/2/119.short https://doi.org/10.5062/f4m61h5k https://doi.org/10.1080/01930826.2010.488941 https://doi.org/10.1080/10875300903530199 https://doi.org/10.1353/ils.2011.0024 https://doi.org/10.21083/partnership.v5i2.1235 https://doi.org/10.1080/19322909.2013.747366 http://ejournals.library.ualberta.ca/index.php/eblip/article/view/20170 testing for transition | lierman, scott, warren, and turner 97 https://doi.org/10.6017/ital.v38i4.11169 12 john j. hernandez and lauren mckeen, “moving mountains: surviving the migration to libguides 2.0,” online searcher 39, no. 2 (2015): 16-21. 13 ouellette, 447; denise fitzgerald quintel, “libguides and usability: what our users want,” computers in libraries 36, no. 1 (2016): 8; sonsteby and dejonghe, 89. 14 costello et al., 56; hernandez and mckeen, 20; sonsteby and dejonghe, 89. 15 caroline sinkinson et al., “guiding design: exposing librarian and student mental models of research guides,” portal: libraries and the academy 12, no. 1 (2012): 74, https://doi.org/10.1353/pla.2012.0008. 16 costello et al., 56; ouellette, 444-45; quintel, 8; kate a. pittsley, and sara memmot, “improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides,” information technology and libraries 31, no. 3 (2012): 56, https://doi.org/10.6017/ital.v31i3.1880; sonsteby and dejonghe, 87. 17 cobus-kuo, gilmour, and dickson, 50; costello et al., 56. 18 sinkinson et al., 74. 19 costello et al., 56. 20 costello et al., 56; hernandez and mckeen, 20; sonsteby and dejonghe, 89; sinkinson et al., 74. 
https://doi.org/10.1353/pla.2012.0008 https://doi.org/10.6017/ital.v31i3.1880 abstract introduction literature review methodology stage 1: card sort stage 2: migration stage 3: face-to-face testing stage 4: survey stage 5: analyzing and implementing results findings card sort face-to-face testing: homepage face-to-face testing: subject guides survey discussion implementation of findings limitations conclusions appendix a: homepage testing script welcome and demographics homepage tour tasks: odd-numbered participants tasks: even-numbered participants follow-up questions appendix b: subject guides testing script welcome and demographics guide impressions tasks: general business resources guide tasks: biology and biochemistry resources guide follow-up questions appendix c: example survey— social work students screening questions consent guide impressions first task second task third task follow-up questions demographics ending screen references reproduced with permission of the copyright owner. further reproduction prohibited without permission. pearls marmion, dan information technology and libraries; mar 2000; 19, 1; proquest pg. 53 pearls ed. note: "pearls" is a new section that will appear in these pages from time to time. it will be ital 's own version of the "top technology trends" topic begun by pat ensor. these pearls might be gleaned from a variety of places, but most often will come from discussion lists on the net. our first pearl, from thomas dowling appeared on web4lib on august 19, 1999 under the subject "pixel sizes for web from : thomas dowling to : multiple recipients of list sent : thu, 19 aug 1999 06:07 :08 -0700 (pdt) subject: [web4lib] pixel s izes for web pages dan marmion pages." he is responding to a query that asked if web site developers should assume the standard monitor resolution is 640x480 pixels, or something else. you may want to consult the web4lib archive for comments from the last few merry go-rounds on this topic. monitor size in inches is different from monitor size in pixels , which is different from window size in pixels, which is d ifferent from the rendered size of a browser's default font. not only are these four measurements different, they operate almost wholly independently of each other . so a statement like "i have trouble reading text at 600x800" puts the blame in the wrong place . html inherently has no sense of screen or window dimensions. many web designers will argue that the only aspects to a page with fixed pixel dimensions should be inline images; such designers typically restrain their use of images so that no single image or horizontal chain of images is wider than, say, 550px (with obvious exceptions for sites like image archives where the main purpose of a page is to display a larger image) . outside of images, find ways to express measurements relative to window size (percentages) or relative to text size (ems). users detest horizontal scrolling. in my experience, users with higher screen resolutions and/or larger monitors are less likely to run any application full screen; average window size on a 1280x1024 19" or 21 " monitor is very likely to be less than b00px wide. (the browser window i currently have open is 587px wide and 737px high .) i applaud your decision to support web access for the visually impaired . since that entails much , much more than monitor resolution, i trust the people actually writing your pages are familiar with the web content accessibility guidelines. 
it is actually possible to design web sites that are equally usable , even equally beautiful, under a wide range of viewing conditions. failing to accomplish that completely is understandable; failing to identify it as a goal is not. my recommendations to your committee would be a) find a starting point that isn't tied up in presentational nitpicking; b) find a design that looks attractive anywhere from 550 to 1550 pixels wide; c) crank up both your workstations ' resolution and font size; and d) continue to run your browsers in windows that are approximately 600 to 640 pixels wide . thomas dowling ohiolink ohio library and information network tdowllng @ohiolink.edu pearls i 53 spatiotemporal distribution change of online reference during the time of covid-19 article spatiotemporal distribution change of online reference during the time of covid-19 thomas gerrish and ningning nicole kong information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.15097 thomas gerrish (tgerrish@purdue.edu) is assistant professor, purdue university libraries and school of information studies. ningning nicole kong (kongn@purdue.edu) is associate professor, purdue university libraries and school of information studies. © 2022. abstract the goal of this project was to identify the impact of the covid-19 pandemic on the spatiotemporal distribution of the library’s online patrons, so that we could assess if the scheduled library reference hours are meeting the needs of the academic community. we collected each online reference patron’s location information via their ip address, as well as the timestamp of each online reference instance. the spatiotemporal distribution patterns were analyzed and compared before and after in-person instruction was suspended due to covid-19 distance protocols and a closing of the campus in the 2020 spring semester. the results show that the geographic origins of reference questions redistributed after covid-19 protocols were initially implemented and the university community underwent a temporary geographical redistribution. reference question origins tended to move away from campus to other areas of the state, other states, and internationally. this population redistribution suggested that the library could adjust the online reference schedule to provide better access and service to patrons. introduction the library’s online reference service, also referred to as library chat or digital reference, is a synchronous text-based interaction between the library and the patron via an internet connection, though audio/video communications are now also available. this online reference service provides a way to meet the information needs of patrons who cannot access the physical library location or prefer virtual communication. in this way, it expands the library’s reference services from the physical location to a virtual environment. when the university community was encouraged to socially distance due to covid-19, online reference became a key library function that maintains the library’s connection to the community and their information needs. this connection became vital when most of the library’s physical services were suspended for a short period during the 2020 spring semester. for many libraries, chat became the only possible way to connect with patrons. 
this increased reliance on online reference and the greater dispersion of the student population led to a looming assessment question with regard to the service: are the hours of online reference convenient for the populations that may live in time zones other than the library’s local time? that is to say, are we available when our patrons are likely to need us? to address the above questions, this study recorded the time stamp and ip address associated with every incoming chat reference from the beginning of the 2019 fall semester to the end of the 2020 fall semester. we evaluated and compared the information from the spatiotemporal distribution pattern change of online reference patrons before and during the covid-19 pandemic. an ip address is a unique string of numeric characters that identifies a particular computer or user over a network, and this number is generally saved in the background with all other information about the chat reference interaction. the ip addresses can be translated to mailto:tgerrish@purdue.edu mailto:kongn@purdue.edu information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 2 gerrish and kong latitude and longitude information using geocoding services. with the latitude and longitude coordinates derived from the ip address, we determined a patron’s location at a city level and time zone. thus, the information helped to evaluate the user population distribution in the world and compare users’ local times to the online reference service’s operation hours. the patrons’ location information as well as the timestamps were evaluated and analyzed in geographic information systems (gis) to provide insights about how the online reference service met the user needs and how the library could improve, if possible. background purdue university libraries serves a large r1 university with a student population of 45,869 as of fall 2020. the online reference service is staffed by approximately 20 professional staff, though this number jumped to 29 when the library’s physical locations closed due to covid-19. online reference at our libraries uses the springshare platform for synchronous chat service with the university community. covid-19 has had a measured effect on online reference hours of operation. prior to covid-19, online reference operated from 11 a.m. to 9 p.m., monday through thursday. on fridays, online reference operated from 11 a.m. to 5 p.m. sundays had truncated hours with the service open from 6 p.m. to 10 p.m. immediately following the move to virtual instruction in march 2020, the administration requested that the online reference service reflect the original hours of the physical library as closely as possible. thus, online reference hours of operation shifted to 7:30 a.m. to 10 p.m. with staff who normally cover the physical reference desks now covering the online service. additional hours were added on saturday afternoon, from 1 p.m. to 5 p.m., and sunday hours were extended from 1 p.m. to 10 p.m. during the university’s virtual instruction phase, only one physical library maintained limited hours of operation, from 8 a.m. to 5 p.m. for local students who needed a computer, wi-fi access, or printing. this was the only in-person service available until august 2020, when libraries began opening with covid-19 restrictions in place. during summer 2020, online reference opening time was moved back to 9 a.m. 
to allow staff to cover the operations of the library, but the evening and weekend hours were maintained. finally, in fall 2020, all physical libraries reopened with operating hours from 7 a.m. to 12 a.m. however, online reference hours did not return to the 2019 model. instead, online reference operated from 9 a.m. to 11 p.m., monday through thursday; 9 a.m. to 5 p.m. on friday; 1 p.m. to 5 p.m. on saturday; and 2 p.m. to 11 p.m. on sunday. figure 1 shows the timeline of the physical libraries operating status and online reference operating hours changes in 2020. prior to covid-19, this reference service mirrored the traditional in-person reference desk in the hours of operation and staff support. indeed, it was originally conceived, structured, and promoted as a supplement to the in-person reference desk, which was stationed at each library. after campus-wide covid-19 restrictions went into place in march 2020, on-campus students were asked to return home. all of the physical libraries except one were closed, and patrons were actively directed to online reference services. as a result, this service underwent changes in its availability and how users accessed it. while the service was already increasing in use, the postcovid-19 period observed a 30% increase as it became more widely used by not only the student population but also the faculty and staff. at this point, online reference became the primary connection between the geographically distributed community and the library. on march 23, 2020, all classes became virtual, and students largely departed campus. this dispersion was not information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 3 gerrish and kong figure 1. physical libraries operation status and online reference hours change in 2020 to accommodate the campus instruction mode change. only applied to students. members of the faculty and staff communities also relocated once it was clear that work would need to be done remotely. in the fall 2020 semester, university classes resumed with a hybrid online/in-person structure. of the total 45,869 students enrolled as of fall 2020, approximately 4,900 students (or 10.7% of the student population) elected online only classes for at least one semester. given that online reference would still have a schedule of 9 a.m. to 11 p.m., sunday through friday, the question became does this serve the students who were potentially located in time zones geographically distant from the university. literature review online reference service has been offered in academic libraries for more than three decades.1 projects that evaluate online reference service, along with technology changes and user community needs, have been conducted ever since.2 for example, mcclure et al. provided a guideline of statistics, measures, and quality standards for evaluating online reference services.3 however, not many studies have been done that assess the needs from patrons’ geographic perspective in order to improve the service. applying gis analysis to improve the library’s services has a comparatively long history. most of those studies are about mapping the interior spaces of libraries and understanding both space and facility use by patrons. 
among the representative articles, xia used gis to map the physical location of library materials against a user’s self-height to gain a better understanding of patron browsing habits.4 weessies used gis to evaluate the likelihood of a computer station’s use in relation to the distance to the library entrance, windows, printers, number of neighbo ring computers, and library service locations.5 given6 and mandel7 combined the traditional library metric of user counts from “visual sweeps” with gis to visualize library space usage, patron preferences, and traffic flow. complementing this, stoddart mapped library spaces against the expected use to visualize how library space was used.8 shen takes the space model a step further information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 4 gerrish and kong by creating a library-space information model that can direct patrons to the shelf location of a given book while also giving the library data on circulation stats vs. book location and shelf height.9 characteristic for this broad grouping of research is the use of gis to analyze patrons, patron behavior, and library resources within the extent of the brick and mortar library building. spaces outside of library buildings have been less examined. historically, the external space of the library has been important with regard to neighborhood service areas, wayfinding, distribution of reference within a consortium, and coverage over larger geographic areas. while many studies focused on the community and a library’s immediate locality and community, donnelly mapped library geographic dispersion on the national level against existing united states populations to examine the variation in local library coverage.10 with the implementation of covid-19 procedures and the ubiquity of online reference services, the external location of patrons and their change in distribution over time has increasingly become an important question that gis can address. there are limited studies using patron information, including ip addresses, to track patron location within an online reference model. in one example, clark geocoded patron addresses to visualize the library’s external service area.11 in an academic library setting, ruttenberg used the ip address to locate on-campus patrons when they asked a question to online reference.12 kinikin studied patron locations in the world and used gis to determine which populations were using the library and its branches while also informing decision makers on areas of low library coverage.13 these ideas were expanded in mon’s study, which geolocated ip addresses for the physical location of a patron who asked a question within the statewide florida electronic library collaborative chat service.14 building on this, bishop compared the originating location of a question against which librarian in a geographically disperse network asked the question as a measure of the utility of local knowledge in the reference process.15 in our study, we expanded upon the previous methods to compare patrons’ spatiotemporal pattern changes before and after the covid-19 pandemic. methods purdue university libraries online reference service uses springshare’s libanswers platform as an interface for chat. the system records each patron’s message text, timestamp, as well as the associated ip address. 
an initial data set of all online reference questions dated from fall semester 2019 until the end of fall semester 2020 (inclusive august 19, 2019, to december 15, 2020) was downloaded from the libchat administration function as a .csv file. this initial data set included fields for ip address, date, time, interaction transcript, patron email (if provided), and name (if provided). this initial data set included 8,754 chat interactions excluding emails, sms text transactions, and questions asked via twitter. all reference interactions were initiated from the library’s website, the ask-a-librarian page, or through the proactive chat widget, which is located on all purdue library webpages. we do not require patrons to identify their relation to the university, their email address, or their status in the university (i.e., undergraduate student, faculty, staff, etc.), so this information is generally unavailable unless the patron self-identifies through the course of the reference interaction. similarly, unless a patron identifies their physical location, reference staff generally do not know where in the world an online reference patr on is located. in practice, most incoming online reference questions are anonymous with no indicators of identity or location. information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 5 gerrish and kong the database was geocoded using the ipinfodb tool to get latitude and longitude information.16 per ipinfodb, the platform’s accuracy is “99.5% on a country level and around 60% on a city level for the us within a 25-mile radius.” the date and timestamps were also associated with the output file. the latitude and longitude coordinates were translated to city, region (i.e., state), and country. time zone information was added to each record according to its geographic location. the final data set contained separate fields for latitude, longitude, state or region, country, time zone, and timestamp, which includes both date and time. at this point, all personal information embedded in the original data set was de-identified, though certain conversations could potentially identify users. one potential limitation to the data set is that if a patron used the university’s vpn network, the reference interaction would be georeferenced with an ip address on campus. indeed, any vpn network would not report the reference interaction’s originating location correctly. the data from each semester was broken down and sorted separately. in addition, r ecords in spring semester 2020 for both preand post-covid-19 restrictions were compared to measure the redistribution of a question point of origin due to covid-19. the data were spatially plotted and analyzed using arcgis pro. results figure 2. total digital reference transactions per semester. 0 500 1000 1500 2000 2500 3000 fall 2019 spring 2020 summer 2020 fall 2020 total transactions per semester information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 6 gerrish and kong the digital reference transaction data were compiled and analyzed on a semester basis except for spring 2020, when all classes moved to 100% online on march 23, 2020. for the spring 2020 semester, the data were split into pre-march 23 (i.e., pre-covid-19 restrictions) and post-march 23 (i.e., post-covid-19 restrictions). 
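as a rough sketch of the geocoding step described in the methods above, the following python outline reads the exported chat records, looks each ip address up against a geolocation service, and writes the enriched fields used later in the analysis. the endpoint, parameter names, and response fields shown here are placeholders rather than ipinfodb's actual interface, and the column names in the export are assumptions for illustration:

```python
# Sketch: enrich exported LibChat records with location and time zone fields.
# Assumptions (not from the article): the export is "chats.csv" with columns
# "ip" and "timestamp", and GEO_URL/API_KEY stand in for whichever geocoding
# service is used (the authors used ipinfodb; its actual request format and
# response field names may differ from those shown here).
import csv
import requests

GEO_URL = "https://api.example-ip-geocoder.com/lookup"  # placeholder endpoint
API_KEY = "YOUR_KEY"

def geolocate(ip):
    """Return (lat, lon, city, region, country, tz) for an IP, or Nones on failure."""
    resp = requests.get(GEO_URL, params={"ip": ip, "key": API_KEY}, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return (data.get("latitude"), data.get("longitude"), data.get("city"),
            data.get("region"), data.get("country"), data.get("time_zone"))

with open("chats.csv", newline="", encoding="utf-8") as src, \
     open("chats_geocoded.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    fields = reader.fieldnames + ["lat", "lon", "city", "region", "country", "tz"]
    writer = csv.DictWriter(dst, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        # Identifying columns (patron name/email), if present, could be dropped here.
        try:
            lat, lon, city, region, country, tz = geolocate(row["ip"])
        except requests.RequestException:
            lat = lon = city = region = country = tz = None  # keep row, mark unlocatable
        row.update({"lat": lat, "lon": lon, "city": city,
                    "region": region, "country": country, "tz": tz})
        writer.writerow(row)
```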
the total number of chat interactions generally grew for each semester starting with fall 2019 (fig. 2). the number of summer chat interactions was relatively fewer because it was a shorter semester with fewer students taking classes as compared to the fall and spring semesters. we analyzed the spatial and temporal distribution of patrons before and after the covid-19 pandemic in the following sections. spatial distribution of the patrons before and after implementation of covid-19 protocol the spatial distributions of digital reference patrons before and after the pandemic are mapped in figure 3. there is a trend showing that after the start of the pandemic, patrons were more geographically distributed within the unites states and around the world. we mapped the international distribution of patrons in fall semester of 2019 and 2020 (see fig. 3(a) and 3(b)), as most of the international students make their travel plans by semester. there is a significant increase in questions coming from india, several european countries, and south america. we compared the spatial distribution of patrons before and after the implementation of covid-19 protocols in spring 2020 within the united states as the time frame is more suitable for domestic travel plans. figure 3(c) and 3(d) also show an increase of patrons around the country other than the campus area. figure 3. spatial distribution of patrons before and after initial covid-19 protocols closed the campus. (a) 2019 fall semester (b) 2020 fall semester (c) 2020 spring pre-covid-19 (d) 2020 spring post-covid-19 information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 7 gerrish and kong to help us further understand this spatial distribution change, we divided the geographic regions into four categories: local areas (west lafayette and lafayette areas of indiana, where the purdue university main campus is located), other indiana areas, other states apart from indiana, and areas outside of the united states. the transactions in these four regions are shown in table 1, and the percentages of transactions from each region were summarized in figure 4. in fall 2019, which is considered the last “normal” semester prior to the covid-19 response, 60% of reference questions originated in the immediate local area to campus (defined as the local area). the beginning of spring 2020 followed closely to this number with 56% of the total questions originating in the local area of the university. this proportion dropped at the march 23 boundary with only 29% of reference questions originating from the immediate local area to the university. during this period, there was an increase in questions originating in the state of indiana as well as other regions of the united states and the world. during summer 2020, 35% of the total number of chats originated in the campus area. in fall 2020, classes were offered as a combination of hybrid, in-person, and virtual formats; however, the proportion of questions originating in the immediate area did not return to fall 2019 levels. instead, only 43% of questions originated from the campus area. figure 4. the percentage of transactions before and after the population redistribution due to the implementation of covid-19 protocols in four geographic areas: local area (campus), indiana, united states outside of indiana, and international. 
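the four-region grouping behind table 1 and figure 4 amounts to a simple classification of each geocoded record followed by a per-period tally. a minimal sketch, assuming the enriched records from the geocoding sketch plus a "period" label (semester, with spring 2020 split at march 23) derived from the timestamp; the city spellings used to define the local area are also assumptions:

```python
# Sketch: bucket geocoded chats into the four regions used in table 1 / figure 4.
# Assumes the enriched export from the geocoding sketch plus a "period" column
# (semester, with spring 2020 split at March 23). The exact city, region, and
# country strings returned by the geocoder are assumptions for illustration.
import pandas as pd

LOCAL_CITIES = {"west lafayette", "lafayette"}  # local area around the main campus

def classify(row):
    if row["country"] != "United States":
        return "outside US"
    if row["region"] != "Indiana":
        return "outside state"
    if str(row["city"]).lower() in LOCAL_CITIES:
        return "local area"
    return "Indiana, not local"

chats = pd.read_csv("chats_geocoded.csv")
chats["origin"] = chats.apply(classify, axis=1)

# Counts and within-period percentages, mirroring table 1 and figure 4.
counts = chats.groupby(["period", "origin"]).size().unstack(fill_value=0)
percent = (counts.div(counts.sum(axis=1), axis=0) * 100).round(1)
print(counts)
print(percent)
```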
table 1. digital reference transactions coming from different geographic regions

semester                     total trans.   local area   indiana, not local   outside indiana   outside us
fall 2019                    1928           1156         203                  493               76
2020 spring pre-covid-19     1168           656          205                  273               34
2020 spring post-covid-19    1119           323          349                  389               58
summer 2020                  1909           676          552                  600               81
fall 2020                    2630           1118         670                  714               128

in general, the geography of origin was redistributed primarily to the state of indiana, followed by the united states as a whole, and then to international locations. fall 2019 saw 11% of virtual reference questions originating in indiana but outside of the local area. this increased to 18% in spring 2020 prior to the implementation of campus covid-19 restrictions. after march 23, 2020, when all classes went online, the percentage of questions in indiana outside of our campus area rose to 31%. this proportion remained steady during summer 2020, when 29% of questions originated in indiana outside of the local area. in fall 2020, when in-person classes resumed, the proportion dropped to 26%, but this remains more than two times the proportion measured during fall 2019. this pattern of redistribution of geographic origin was repeated in the data points outside the united states, though the fluctuation due to covid-19, while similar, occurred to a lesser degree. in fall 2019, 3.9% of questions arrived from geographic origins outside of the united states. in spring 2020, prior to covid-19 restrictions, this number dropped to 2.9%. after classes went virtual and students moved off campus, the percentage of questions increased to 5.2%. in fall 2020, the proportion dropped to 4.9%, but this is still not a return to fall 2019 levels.

the distance of digital reference patrons to main campus

to analyze the spatial distribution change of digital reference patrons, we calculated the distance of each patron to our main campus. a small portion of ip addresses (less than 4%) which couldn't be correctly located was excluded from the analysis. figure 5 represents the distance distributions in a box and whisker diagram. the horizontal lines within the boxes show the median of the data sets. in both fall 2019 and early spring 2020, the median distances are about 400 miles or less, which is within the local area. this indicates that most of the digital reference questions come from patrons who live around the main campus. the median distance increased to 1,000 miles after classes were moved online in spring 2020, which is about the typical distance of traveling within the state of indiana. the maximum value was extremely high, coming from international countries. this indicates that a significant portion of the patrons moved outside of the local area. in fall 2020, although the maximum distance dropped to a similar range as in normal semesters, the median and average values of the distance dataset are still much higher than the time before the pandemic.

figure 5. the box and whisker diagram shows the distance from ip addresses to campus (in miles).
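the distance analysis summarized in figure 5 was performed in arcgis pro; an equivalent great-circle (haversine) calculation can be sketched in a few lines. the campus coordinates below are approximate, and the column names continue the earlier sketches:

```python
# Sketch: great-circle distance from each geocoded chat to the main campus,
# in miles, approximating the distance analysis behind figure 5 (the authors
# used ArcGIS Pro). Campus coordinates are approximate; column names follow
# the earlier sketches, including the assumed "period" label.
from math import radians, sin, cos, asin, sqrt
import pandas as pd

CAMPUS_LAT, CAMPUS_LON = 40.4237, -86.9212  # approximate West Lafayette, IN
EARTH_RADIUS_MI = 3958.8

def haversine_miles(lat, lon):
    """Great-circle distance from (lat, lon) to the main campus, in miles."""
    phi1, phi2 = radians(CAMPUS_LAT), radians(lat)
    dphi = radians(lat - CAMPUS_LAT)
    dlmb = radians(lon - CAMPUS_LON)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(sqrt(a))

chats = pd.read_csv("chats_geocoded.csv").dropna(subset=["lat", "lon"])
chats["distance_mi"] = [haversine_miles(la, lo) for la, lo in zip(chats["lat"], chats["lon"])]

# Per-period summary comparable to the box-and-whisker plot in figure 5.
print(chats.groupby("period")["distance_mi"].describe())

chats.to_csv("chats_with_distance.csv", index=False)  # reused in the later sketches
```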
to test the statistical significance of the distance values in different time periods, we conducted anova tests for the distances in the spring 2020 semester comparing before and after the pandemic, as well as a comparison between fall 2019 and fall 2020. the test results are shown in table 2. both tests show there are significant differences before and after the classes were moved online. this means the pandemic situation significantly changed the patron distances to the main campus, with p-values < 0.05. we likewise compared the distances between spring 2020 post-pandemic and fall 2020; no significant difference was found. although the university started to offer in-person classes in fall 2020, and most of the students were back on campus, there were still quite a few questions coming from students and faculty who were not living in the geographic area around campus.

table 2. the anova test results of the patrons' distance to the main campus before and after the pandemic protocols were implemented (in miles)

summary
groups             count   sum       average   variance
spring 2020 pre    1134    323,361   285.15    8e+05
spring 2020 post   1061    592,238   558.19    2e+06

anova
source of variation   ss        df     ms        f          p-value
between groups        4.1e+07   1      4.1e+07   27.54942   0.0000
within groups         3.3e+09   2193   1.4e+06
total                 3.3e+09   2194

summary
groups      count   sum     average   variance
fall 2020   2502    1e+06   500.95    2e+06
fall 2019   1852    7e+05   385.43    2e+06

anova
source of variation   ss        df     ms        f          p-value
between groups        1.4e+07   1      1.4e+07   7.104865   0.0077
within groups         8.7e+09   4352   2.0e+06
total                 8.7e+09   4353

temporal distribution of the questions

while the spatial distribution of reference questions allowed us to understand where patrons were located, the temporal distribution of these questions helped us to plan the digital reference service hours to better meet patrons' needs. we analyzed the temporal distribution of questions by the day of the week for fall 2019 and fall 2020. figure 6 shows the median distances of the questions for each day of the two semesters. the distance was broken down into six ranges differentiated by color. from the nearest distance range to the ranges above, the analysis covers questions coming from around campus, in the local area, within indiana, around the eastern united states (with mostly eastern and central time zones), the entire united states, and international locations. in fall 2019, we provided digital reference service sunday through friday, and in fall 2020, the service was provided every day except holidays. in fall 2019, the monday-to-friday range showed that most digital reference questions came from the campus area, especially in the first half of the semester. questions from farther distances within indiana started to occur more often after november, probably due to holiday-related travel. relatively remote questions came more often on sundays. one possible explanation for this difference is that students and faculty might travel away from campus during weekends. interestingly, the fall 2020 weekly distribution of questions shows a different pattern. first, most median distances are greater than in the fall 2019 semester, which means there were many questions coming from people living off campus, whether at the beginning of the semester or later. second, there was no obvious difference between the median distances during the weekdays and weekends.
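the comparisons reported in table 2 above are one-way anova tests on the distance values for two periods at a time. a minimal sketch using scipy, assuming the distance column and period labels from the earlier sketches (with only two groups this is equivalent to a two-sample t-test, but it mirrors the layout reported in table 2):

```python
# Sketch: one-way ANOVA comparing patron distances to campus between two
# periods, as in table 2. The file name, "distance_mi" column, and period
# label strings are assumptions carried over from the earlier sketches.
import pandas as pd
from scipy.stats import f_oneway

chats = pd.read_csv("chats_with_distance.csv")

pre = chats.loc[chats["period"] == "spring2020_pre", "distance_mi"]
post = chats.loc[chats["period"] == "spring2020_post", "distance_mi"]
f_stat, p_value = f_oneway(pre, post)
print(f"spring 2020 pre vs. post: F = {f_stat:.2f}, p = {p_value:.4f}")

fall19 = chats.loc[chats["period"] == "fall2019", "distance_mi"]
fall20 = chats.loc[chats["period"] == "fall2020", "distance_mi"]
f_stat, p_value = f_oneway(fall19, fall20)
print(f"fall 2019 vs. fall 2020: F = {f_stat:.2f}, p = {p_value:.4f}")
```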
figure 6. the median distance of reference questions in miles by day of the week (left: fall 2019, right: fall 2020).

in addition, we analyzed the hourly distributions of digital reference questions for fall 2019 and fall 2020 (fig. 7). in fall 2019, the median distances show that questions came mostly from campus, especially during the peak hours. where the median distances were not from the campus area, they were at most in the local area, where most off-campus students, faculty, staff, and research community members live, or within greater lafayette. remote questions came from other time zones, such as international time zones or the pacific time zone, and usually arrived in either the first or the last hour of digital reference operating hours. in fall 2020, this distribution pattern changed. most of the median distances were around 200 miles, which means that a large portion of the questions came from off-campus populations. there were additional time slots with median distances above 2,000 miles, coming from time zones at least two hours removed from our campus. again, these questions were most often observed in early or late reference service hours, i.e., 8 a.m. to 9 a.m. or 10 p.m. to 11 p.m.

figure 7. the median distance of reference questions by hour of day (eastern time zone); (a) fall 2019, (b) fall 2020. distance ranges in miles: <= 5.6, <= 22, <= 188, <= 500, <= 2,000, > 2,000.

discussion and conclusion
covid-19 and the protocols developed in response to it had a redistributing effect on the geographic origin of reference questions in academic libraries. as the university closed and moved to virtual classes early in the pandemic, the geographic origin of reference questions shifted away from campus. in our case, origins tended to move from the campus to nearby areas within the state and neighboring states, though there was redistribution away from campus at the state, regional, national, and international levels. as of fall 2020, when the campus partially reopened, these numbers had begun reversing themselves, but a significant population remained beyond the campus and the local time zone. fall 2020 distribution numbers still show some of the redistribution effects observed early in the pandemic. there was a surprising lack of questions coming from the russian federation, china, and central asian countries, given that our university does have students from these countries. this may be due to the use of vpns by users in these countries when accessing library resources: if a user connects through a vpn, the question is recorded as having a geographic origin at the vpn provider's location rather than in the country of origin. this is one possible explanation for the lower numbers observed for china and the absence of users from the russian federation, eastern europe, and central asia.
this study demonstrated the broadening of our library's geographic footprint in response to covid-19 protocols. students, faculty, and staff were not bound to campus and were free to study and work anywhere with internet access. with populations distributed around the country and the world, the expansion of reference hours was necessary. prior to covid-19, online reference operated from 11 a.m. to 9 p.m. this meant that students studying virtually in the pacific time zone experienced effective reference desk hours of 8 a.m. to 6 p.m., which eliminated access during the evening hours. when the library extended online reference hours during the covid-19 lockdown to 7:30 a.m. to 10 p.m., this somewhat improved accessibility for patrons in the pacific time zone, creating effective reference hours of 4:30 a.m. to 7:00 p.m. pst. the contrast becomes even starker when examining international students studying online in much more distant countries. many students from india returned to their home country in spring 2020 following the move to virtual learning. for students studying in india, the online reference desk in pre-covid-19 times would have had effective hours of 8:30 p.m. to 6:30 a.m. ist (india standard time), which forced this population to interact with the library during their evenings and nights. the expanded reference hours improved this access to 5 p.m. to 7:30 a.m. ist. while this is better, it still forces students in this part of the world to interact with the library during the evenings and nights and excludes daytime hours.

interestingly, the data from fall 2019 seem to indicate that there was an international population prior to covid-19. these were likely students studying abroad, taking some of the early online classes, or simply traveling. thus, the distributed online reference user population is nothing new, but it has been exacerbated by covid-19 and the expansion of online classes. the number of international reference interactions can therefore be expected to decrease as covid-19 restrictions are gradually relaxed, but it will not go to zero.

endnotes
1 joseph janes, david carter, and patricia memmott, “digital reference services in academic libraries,” reference & user services quarterly 39, no. 2 (1999): 145–50, http://www.jstor.org/stable/20863724.
2 carol tenopir and lisa ennis, “a decade of digital reference 1991–2001,” reference & user services quarterly 41, no. 3 (2002): 264–73, http://www.jstor.org/stable/41241123.
3 charles r. mcclure, r. david lankes, melissa gross, and beverly choltco-devlin, “statistics, measures, and quality standards for assessing digital reference library services: guidelines and procedures,” eric clearinghouse on information & technology (2002).
4 jingfeng xia, “library space management: a gis proposal,” library hi tech (2004), https://doi.org/10.1108/07378830410570476.
5 kathleen w. weessies, “a locational analysis of academic library computer use,” reference services review (2011), https://doi.org/10.1108/00907321111175868.
6 lisa m. given and heather archibald, “visual traffic sweeps (vts): a research method for mapping user activities in the library space,” library & information science research 37, no. 2 (2015): 100–8, https://doi.org/10.1016/j.lisr.2015.02.005.
7 lauren mandel, “visualizing the library as place,” performance measurement and metrics (2016), https://doi.org/10.1108/pmm-04-2016-0016.
8 rick stoddart and bruce godfrey, “gathering evidence of learning in library curriculum center spaces with web gis,” evidence based library and information practice 15, no. 3 (2020): 21–35, https://doi.org/10.18438/eblip29721.
9 yaqui shen, “library space information model based on gis—a case study of shanghai jiao tong university,” information technology and libraries 37, no. 3 (2018): 99–110, https://doi.org/10.6017/ital.v37i3.10308.
10 f. p. donnelly, “regional variations in average distance to public libraries in the united states,” library & information science research 37, no. 4 (2015): 280–89, https://doi.org/10.1016/j.lisr.2015.11.008.
11 philip m. clark, “thematic mapping, data mapping, and geocoding techniques for analyzing library and information center data,” journal of education for library and information science (1995): 330–41.
12 judy ruttenberg and heather tunender, “mapping virtual reference using geographic information systems (gis),” poster presented at the ala conference, orlando, fl, june 2004, https://web.archive.org/web/20040808050212/http://helios.lib.uci.edu/question/gis-ala2004/campusmap2004-2.jpg.
13 janae kinikin, “applying geographic information systems to the weber county library system,” information technology and libraries 23, no. 3 (2004): 102.
14 lorri mon et al., “the geography of virtual questioning,” the library quarterly 79, no. 4 (2009): 393–420, https://doi.org/10.1086/605381.
15 bradley wade bishop, “location-based questions and local knowledge,” journal of the american society for information science and technology 62, no. 8 (2011): 1594–603, https://doi.org/10.1002/asi.21561.
16 “ip address information,” ipinfodb, https://ipinfodb.com/.

editorial board thoughts
tackling the big projects: do it yourself or contract with a vendor?
laurie willis
information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.12067
laurie willis (laurie.willis@sjlibrary.org) is web services manager, san jose public library, and a member of the information technology and libraries editorial board. copyright © 2020.

everyone who works with library technology sooner or later finds they are faced with a major project to tackle. sometimes we contract with a vendor to do the bulk of the work; sometimes we do the project ourselves. there are advantages and disadvantages to both methods. here at san jose public library we were faced with two large projects at once: a website migration/redesign and a new catalog discovery layer.
we considered bibliocommons as the vendor for both projects. they offer both a website product (biblioweb) and a discovery layer (bibliocore). we opted to complete the website migration/redesign ourselves using open source software, migrating from our previous drupal 7 platform to drupal 8, and to contract with bibliocommons to provide our new discovery layer. this put us in an unusual position: we were implementing a website migration/redesign ourselves while simultaneously working, on the catalog discovery layer, with the vendor we would likely have chosen for the website project. this gave us the opportunity to compare the experience of implementing the website project ourselves with what the same project might have been like if we had been working with a vendor.

what we learned

timing
not surprisingly, completing the website project on our own took longer than expected.
• learning curve: we expected there to be a learning curve, but it turned out to be significantly steeper than anticipated.
• unknowns: in addition to basic learning, we also came across functionality that didn't work as expected.
• failures: there were times when what we tried to do didn't work at all and we had to backtrack.
timing for the vendor-led project, on the other hand, kept to the planned timeline.
• prescribed timeline: as part of their contract, the vendor provided a timeline at the outset. we made small adjustments, but for the most part the project stayed on time.
• predictability: the vendor has completed many similar projects, so they had a solid idea of what to expect and how long it would take.
• problem solving: some challenges unique to our situation did arise and caused some delays.

control
the ability to have more control over the project results was a significant factor in our decision to complete the website project ourselves. we had the opportunity to make choices and also faced the challenge of a sometimes-overwhelming number of options.
• options: many options were available to us. we had choices regarding structure (website platform and theme), design, and content.
• overwhelm: the plethora of options encouraged a tendency to spend a lot of time (too much?) “shopping,” researching and evaluating options.
• we completed a thorough audit of our content and created a new site based on our needs.
• user experience (ux) testing: we were able to perform testing with our users and adapt our website to better fit their needs.
working with a vendor, on the other hand, limited what we were able to do, but the decision-making process was easier and faster.
• we had the option to select colors, but otherwise the structure and design were fixed.
• we had some control over textual content within the parameters given; for example, we could add links to the footer, but the number of links allowed was limited.
• little time was spent making these decisions.
• it's a challenge fitting unique content into a predetermined format.
• user experience (ux) testing: the vendor is able to include a wider sampling of people while testing, but they're not able to specifically consider our local users.

implementation
for the website project, implementation turned out to be more complex than expected.
• learning: as mentioned above, there were many new things to learn that came up as the project progressed.
• consultant: we came up with technical questions that were beyond the scope of our knowledge.
we found it extremely helpful to contract with a consultant for guidance.
• conflicting responsibilities: we worked on this project while continuing with our normal workload and maintaining the current website. we were also simultaneously working on the discovery layer implementation.
the vendor-led implementation went more smoothly.
• learning: the vendor assigned a project manager, who was available to guide us through the process. the vendor also provided documentation that walked us through it.
• expertise: when challenges did arise, the vendor had experienced staff to help us work through them.
• staff time: although the vendor did most of the work, the project did consume significantly more staff time than expected as we worked through every detail.

training and marketing
• staff: for the website, we had to create our own training for staff. for the catalog, the vendor offered webinars for staff and sent a trainer to do in-person training.
• public: the vendor offered samples of materials from other libraries to both inform and educate the public. since both projects were launching at the same time, we were able to adapt some of these materials to include both.

cost
the cost of hiring a vendor initially seems steep, but staff time is also expensive. considering the unexpected additional staff time spent, it likely would have been less expensive to choose the vendor option.

conclusion
there are pros and cons to both methods: completing a project on your own or working with a vendor. whether your project is a new website, a catalog, or something else entirely, learn as much as you can about what will be involved before you decide on an approach. weigh your options by looking at your needs and the resources and time available to you. the primary aspects to consider are:
• do staff have the necessary expertise to complete the project? will there be a learning curve? are staff prepared and willing to learn new things and figure things out? if you are considering a vendor, do they have a training plan for your staff?
• how much time is available? is there a deadline? if there is a deadline, what will be the costs if it needs to be extended? if you are considering a vendor, how committed are they to achieving the prescribed deadline?
• which is more important to you: control and flexibility, or ease of implementation?
• what resources are available if you have questions? if you work on your own, are there people and online resources you will be able to turn to? if you are considering a vendor, will you be assigned a representative to walk you through the process?
for our particular situation, i believe we made the right choice to complete the website project on our own. staff had enough expertise that they were willing and able to learn the necessary skills, calling upon a consultant when needed without outsourcing the entire project. while we had an expected timeline, we were able to extend it with only minor consequences (paying for additional web hosting while the project was under construction). we maintained the control and flexibility we needed in order to present some of the unique services and spaces that our library offers, which might have been lost using a vendor package. we had some knowledge of consultants working in the field and were able to hire one to show us how to proceed when we were over our heads. we also relied heavily on tutorials and other training resources posted online.
whatever you decide, taking time to think things through before beginning will help make your project a success. what we learned timing control implementation training and marketing cost conclusion highlights of isad board meeting 197 4 midwinter meeting chicago, illinois monday, january 21, 1974 43 the meeting was called to order at 10:15 a.m. by president frederick kilgour. those present were: board-frederick g. kilgour, lawrence w. s. auld, paul j. fasana, donald p. hammer (isad executive secretary), susan k. martin, ralph m. shoffner, and berniece coulter, secretary, isad. guest-brett butler. midwinter 1973 minutes approved. motion. mr. shoffner moved to approve the minutes of the midwinter 1973 board meetings. seconded by mr. fasana. carried. las vegas annual meeting minutes accepted. a correction on page one of the las vegas annual meeting minutes was noted: mr. auld's name should be added to the list of guests present. motion. mr. fasana moved that the minutes of the isad board meetings at the las vegas annual conference be accepted as corrected. seconded by mrs. martin. carried. isad history committee. the matter of appointing members to the isad history committee, whose function is to prepare a history of isad for ala's centennial celebration in 1976, was considered. mr. shoffner said that during the time he was president, he had rendered the isad history committee inactive. it was suggested by mr. kilgour that a historian would serve the purpose better than a committee. mr. shoffner remarked that he anticipated the chairman would be a historian. mrs. martin asked whether a check could be made first whether ala is planning to publish any document for the centennial celebration that would make any preparation by an isad committee or historian worth while. mr. kilgour remarked that isad definitely should be included if ala did plan to publish any document and asked the board to give an "ok" to appoint a historian. motion. mr. fasana moved that the ad hoc isad history committee 44 journal of library automation vol. 7/1 march 197 4 be abolished and recommended that the president be given the right to appoint a historian if ala planned to publish a centennial document. seconded by mr. auld. carried. ala dues structure. mr. hammer explained the information submitted to the board concerning the proposed ala. dues structure. the basic fee for ala membership under this proposed dues structure would be $35. membership in each division would be an additional $15. in es~ sence, each division would be on its own financially:· if there are not enough memberships to support a division, as could be the case, the division would cease to exist. !sad could support itself with its present membership, but there is no. way of knowing how many !sad members would still select !sad if the choice of two divisions included in the dues was removed. the divisions that publish a journal would attract membership much more easily than those that do not provide a journal. mr. hammer further remarked that the proposed dues schedule indicates that the divisions must prove themselves with membership dues as their only support, but this does not apply to ala committees, scmai, units such as the office for intellectual freedom, office for library service to the disadvantaged, and the administrative and support units of ala. these units may be of great value to ala, but if one tinit is forced to prove its value financially, then it seems that all should have to prove themselves. 
the divisions would be expected to depend on their own resources, e.g., if the division runs out of postage ·money, there would be no further mailings. the divisions would be expected to pay for their support services.· the idea is very closeto the federation plan which has been circulated for some time. in answer to the question of how a new division would get started, mr. hammer replied that he assumed there would have to be enough memberships to provide for it financially. mr. shoffner suggested that the discussion be divided into two parts: ( 1) the principle involved; and ( 2) the financial aspect. · the following points were brought up in the ensuing discussion by the board regarding the proposed dues structure: . starting a new division could be a problem; perhaps it could be subsidized for a stated time, after which the division: would be self-sufficient. the proposed separation of dues, however, would force a clarity in ex~ penditures of. ala in respect to how the divisions would benefit. some divisions could not be self-supporting and yet are producing important contributions for ala. ' ' a division would be at the mercy of the ala supporting units. if a sup~· port unit was not efficient, the divisions would be handicapped in the services to their members. would a division be able to know enough in advance how much money could be counted on for program planning? the answer was "yes" based highlights of meetings 45 on past membership, except in the first year. the income would be predicted on the basis of the previous year's income. an excess of income would remain in the division's funds. if the division income fell short of the anticipated amount, it would have no back-up from ala as it has presently. a person could not join one or more of the divisions without joining ala. some divisions could become part of a stronger division, e.g., a division could be broken up and absorbed into several other divisions with related interests. was there any plan to absorb or redirect these divisions which obviously could not be self-supporting? nothing has been announced so far. if a division got into financial difficulties, it could not cut down on its professional staff as a professional staff is needed to maintain ala's status with the internal revenue service. it was noted that there were more important reasons than this for maintaining a professional staff . . · this proposal was drafted by the then deputy director ruth warncke in 1970. the board was informed that a cost study of ala was recently discussed by staff members, but the reply has been that it would take five years to make such a study. the isad board disagreed with the period of five years, but stated that it could take a year. . · a division should be allowed to set up its own budget under this pro~ posal as well as have a voice in ala policy. · · the proposal appeared to be unfair in some points: ( 1) some divisions would have about twice their present income through memberships, while isad would break about even; ( 2) life members would be entitled to membership in all divisions; ( 3) apparently institutions without a group insurance plan of their own could join ala for $35 and be entitled to the gioup insurance for their staffs; at some point an examination of the privileges in each category of membership should be made; and ( 4) if the $35 ala membership fee were increased in the future, this would directly affect membership in the divisions. 
the isad budget for the 1973/74 year is approximately $47,000 and the journal of library automation $23,000, or a total of approximately $.70,000. if isad membership should fall back to 3,000 members and the membership fee were $25, isad could still be viable. "mr. kilgour's poll of the board revealed all were in favor of the principle of more or less independent divisions, but with reservations; the following was therefore moved: · 'motion. mr. shoffner moved that the isad board favors the prin. ciple of divided annual fees for ala and for its divisions subject to: ' · ( 1) division determination of the fee structure for division memberships and publications; ( 2) division participation in the governance of ala headquarters activities. seconded by mr. fasana. motion carried. ·selective dissemination of information system. mr. 46 journal of library automation vol. 7/1 march 1974 hammer presented a proposal for establishing on a subscription basis a selective dissemination of information system for ala members (see exhibit 1). mter discussion it was decided that mr. hammer would contact ohio state university library and obtain information on exact procedure as to how this would be run, how it would be publicized, who would develop the profiles, who would handle the subscriptions, the cost to the division, etc., and then repmt to the board. co-sponsorship of basic data processing seminars. mr. hammer presented a proposal to the board regarding co-sponsorship of basic data processing seminars with organizations outside isad, such as ibm and dataflow systems, inc. in bethesda, maryland. in the past isad seminars have generally been on library applications, but what he had in mind, mr. hammer said, was primarily on the basics of data processing, systems analysis, and other basic aspects that would be of interest to administrators. the intent would be to give administrators enough knowledge so that they could evaluate the results that they should be gaining from their data processing systems. these institutes would be a package deal in that the personnel and materials would be commercially supplied, dataflow has conducted seminars for the united states civil service commission. ibm has some seminars which are free, but there is a charge if they have to develop a special program. comment was made regarding seminars conducted several years ago where problems developed as to the commercial aspects. motion. it was moved by mrs. martin that the matter of !sad's cosponsoring basic data processing seminars with outside organizations be referred to the isad program planning committee for discussion and their evaluation. seconded by mr. fasana. carried. tuesday, january 22, 1974 the meeting was called to order by the president, mr. kilgour, at 2:25 p.m. those present were: board-frederick g. kilgour, lawrence w. s. auld, paul j. fasana, donald p. hammer (isad executive secretary), susan k. mqrtin, ralph m. shoffner, and berniece coulter, secretary, isad. guests-alex allain, brigitte kenney, ron miller, and velma veneziano. draft on ala goals and objectives. mrs. brigitte kenney sought feedback from the board on the paper previously distributed on the ala committee on planning's draft statement on ala's goals and objectives. several changes were suggested. mrs. kenney expressed her appreciation for their input. freedom to read foundation. mr. alex allain from the foundation presented the cause of the freedom to read foundation in rehighlights of meetings 47 gard to the current problem of censorship. 
he stressed the desire to keep channels open with the divisions of ala and with systems and networks across the nation. marbi and isad standards committee (tesla). velma veneziano, chairman of the marbi interdivisional committee, appeared before the isad board requesting clarification of the functions of marbi and the isad standards committee ( tesla). she said that her committee would like discrepancies cleared up and duplications eliminated. mrs. martin suggested that the charges to both marbi and tesla be reworded to clarify their functions. isad bylaws committee. in response to discussions concerning the establishment of several committees, mr. shoffner moved to establish an organization committee. seconded by mrs. martin. mr. fasana pointed out that the mechanism for establishing a bylaws committee was already spelled out in the isad constitution. the president can appoint the committee. motion withdrawn. mr. shoffner withdrew his motion. mr. fasana suggested that the bylaws committee also be charged with the organizational and review function. the matter of the standards committee's function was also made the charge of the bylaws committee. wednesday, january 23,1974 president kilgour called the meeting to order at 10:15 a.m. those present were: board-frederick g. kilgour, lawrence w. s. auld, paul j. fasana, donald p. hammer ( isad executive secretary), susan k. martin, ralph m. shoffner, and berniece coulter, secretary, isad. guestsbrett butler, john kountz, ann painter, charles payne, james rizzolo, richard utman, velma veneziano, and david waite. report of the nominating committee. the chairman, charles payne, announced the nominees for the 197 4/75 slate of isad candidates: vice-president/president-elect: board member-at-large: henriette a vram allen veaner ruth tighe maurice freedman the board members extended a vote of thanks to the nominating committee for their work. report of marc user's discussion group. mr. james rizzolo, chairman, said most of the discussion in the discussion group revolved around ala, clr, and the change in clr' s status which was moved in august from one irs classification to another. it is now an "op48 journal of library automation vol. 7/1 march 1974 erating foundation," i.e., it is active in programs rather than waiting for a reaction to a request using funds they have as a "carrot.'~ also discussed was whether clr should fund and pick the participants or clr should do the funding and ala pick the participants. , . also the group considered the question of standards and how one ardves at them. there are a number of groups in ala dealing with standards, but there is a need to work out a systematic method of developing standards. there needs to be a routine mechanism set up for going from an imtial formulation of an idea for a standard to a standard that the profession can live with. report of program planning committee.. the committee met at the asis meeting in los angeles prior to meeting at the ala midwinter meeting. . :rvir. brett butler, chairman, announced that three european librarians );lad been invited to participate in the 1974 annual program .at new york city. mr. kilgour was handling all arrangements. mr. kilgour informed the board that the travel expenses of all three librarians were ·being provided for by sources outside ala. linda crismond is the local planning person for the 1975. san francisco annual conference program which will be sponsored jointly with asis. 
joshua smith had suggested mark radwin of lockheed as liaison and he had agreed to serve in this capacity. . the new orleans institute on "alternatives in bibliographic networking" had enough registrants by midwinter to confirm it. there had been some difficulty concerning contact with speakers but the .details had been straightened out. copies of the program ·for the new orleans institute were distributed. mr. butler also inforrried the board that his committee was looking into the details of cooperating with other institutions and state schools which might be interested in working with isad in a seminar or institute. the committee was also considering what type of programs should ·be presented, subcontracting to outside companies, and how to control these. the members of the committee were working on a procedure manual for use in conducting institutes .. telecommunications committee report. the activities of the telecommunications committee are highly organizational at present. the committee has swung away from cable tv as its primary interest and towards telecommunications as applied to bibliographic networks. the chairman, david. waite, said there was a need to set up a simple guide to carry out their charge for the educational activities and legislation advisory responsibilities to the ala committee on legislation. more people would probably be appointed to the telecommunications committee as there was a need for more expertise to assign to the areas identified by the committee. . highlights of meetings 49 he further said that the need now is to determine what existing appara. tus may be utilized to fulfill the committee's responsibility to disseminate information regarding telecommunications as applied to the library community so that the committee could put most of .its effort into technical work. one project discussed was to gather background information on bibliographic data centers and network activities and their needs for telecommunication facilities in order to draft a requirements statement. the purpose of such a: statement is. that the committee could communicate .with new telecommunications systems. the committee was not aware of an ade· quate statement of library requirements that. is readily available ·for the commercial services that .are steadily increasing. assignments have been given to gordon randall, maryann duggan, and ron miller to gather this information. mtr. waite remarked that the committee would be interested in ~ny report on the proposed isad networks committee when available. brett. butler, chairman of the program planning co:rrnnittee; suggested that a telecommtmications institute should be in the future plans and mr. waite's or any of his committee members contribution of any ideas about· such would be appreciated. report 6f the interdivisional committee on machine-readable bibliographic information (marbi). (see exhibit 2~) mr. kilgoirr appointed velma veneziano to serveas liai~ son to the isad standards committee from marbi. her term as chair~ man of marbi will conclude in jnne 197 4. report of cola discussion group. (see exhibit 3.) report of committee on technical standards fqr library 'automation (tesla). (see exhibit 4.)~report of chairman jolln kountz. · · technological unemployment. president kilgour felt ala should do something about the spreading of unemployment due.' to increased use of technological development. m!r. auld suggested that someone be appointed to study the potential and existing problems in this area. 
this could be funded either: (1.) ,under a fellowship by clr; or (2) application for the j. morris jones .goals award. .· · · · mr. fasana. thought an interdivisional committee might be set up be~ tween the fotir rnost directly affected divisions: isad, lad, led.; 'and rtsd. ·. . . , mr. shoffner expressed· his view that as efficiency is ii:rcreased productivity is increased aj}d could possibly therefore increase employment. mr.: kil~ gour said tha.t.history had proved to.the contrary. mr. shoffner stated he felt the problem was on~. of education and ·.training. a specification· of 50 journal of library automation vol. 7/1 march 1974 what is expected of one and what training he would receive during a technical changeover was needed. mr. fasana's suggestion was that the four divisions be asked for papers of their views or a program at the san francisco annual conference be prepared on the subject of technological unemployment. mr. auld asked if it could not rather be introduced at the new york annual conference, to which ann painter volunteered the use of the isad /led education committee's two-hour time slot for the program at new york. motion. mr. fasana moved that mr. kilgour phrase a statement of the problem on technological unemployment as he sees it and present it to the !sad /led education committee for consideration as the program theme at the new york conference. seconded by mrs. martin. carried. proposed standards in ]ola tc. mr. john kountz brought up the subject of using lola tc for the interactive mechanism of presenting the proposal of a standard to the isad members for comment, and of having a form included to be filled out and returned. the board agreed that this was a good idea. isad/led education committee report. ann painter, chairman, asked for clarification of appointment of new members to the committee. roger greer is the only member whose term continues past this year. mr. hammer was asked to find out who appoints members to the above committee. the committee is working on a series of papers defining educational "modules" and has sent out a revised questionnaire to identify appropriate subject areas. it is planning to send the questionnaires to associated institutions as well as to the ala accredited schools. the need for funding the modules rather than depending upon volunteer or "slave labor" was considered by the committee. volunteers have little preparation time and so often there is a lack of in-depth or consistency in developing these modules. also the committee would like to set up a file of modules available to people across the country. there could be a problem of copyright involved. mr. kilgour asked miss painter for suggestions of people who might be interested in serving on the committee. lola manuscripts. mrs. martin, editor of ]ola, asked the board for its feeling on whether it would be appropriate or desirable to put the date of acceptance on published manuscripts in lola. the board decided that should be the editor's decision. vote of thanks to mrs. martin. the board gave mrs. susan martin a unanimous vote of thanks for her work in getting the issues of ]ola caught up to date in time to meet the post office deadline of december 31, 1973 in order to retain the second class permit. highlights of meetings 51 report of the membership survey committee. (see exhibit 5.) board minutes in lola. the board suggested that minutes published in ]ola be entitled "highlights of isad board meeting" rather than minutes. the meeting was adjourned at 12:30 p.m. 
exhibit 1 proposal for.establishing on a subscription basis a selective dissemination of information system for ala members the original proposal for an sdi system was intended for isad members only, but interest has grown at ala headqua1ters to the extent that it is being considered as a service to be provided for all ala members. the proposal therefore does not require any action on the part of the isad board. it is presented here for information and to give the board members an opportunity to comment on the idea and make suggestions toward developing the best possible procedure. it is hoped that a presently operating system can be found that would enable ala members to subscribe to a system using multisubject data banks that would automatically adjust profiles according to past output results and that would supply as requested copies of articles and documents whenever possible. such documents would of course be supplied at a fee additional to the basic subscription fee. it is also hoped that the operators of the system would be responsive to subscriber feedback and would improve the system as warranted. at present the only existing data banks in the library and information science fields are eric and marc, but hopefully as time goes on others will be developed. it, for example, would seem prudent for the h. w. wilson company to consider the sale of lihm1·y litemtme in machine-readable form. in any event, there is no reason to limit subscriptions to the service to information science data banks. if interested, members of ala could subscribe to other subject fields depending upon the data banks made available by the operating service. chemistry librarians could, if useful to them, subscribe to chemical abstmcts condensates, engineering librarians to enginee1'ing index, etc., etc. only time and the availability of sdi can determine the interest of librarians in such services. at the time of writing, only one of the two agencies contacted for information has provided descriptive data on their system. a copy of one of the papers sent by the ucla center for information services is attached. ohio state university libraries had not as yet responded. enquiries will be made with other operating systems so that a basis for comparison wiii be available for decision at ala headquarters. comments and suggestions from isad board members would be appreciated. information regarding presently operating systems would also be of great value. december 13, 1973 exhibit 2 reports of the meetings of the marbi committee (interdivisional committee on representation in machine readable form of bibliographic information) january 19 and 20, 1974 number one priority was the resolution of the relationship between the library of congress and marbi in its capacity as the marc advisory group. 52 journal of library automation vol. 7/1 march 1974 there was discussion of the position paper which was presented at the las vegas meeting (copy attached) entitled "the library of congress view on its relation to the ala marc advisory committee." lc had revised certain portions of this paper to conform with marbi's wishes. these revisions were acceptable to the committee. there was concern, however, over an addition which pertained to marbi's role with regard to formats other than books and serials (namely films, maps, music, etc.) alternate wording to lc's proposal was worked out by paul fasana and john knapp. 
several documents were submitted by henriette avram: (1) a proposed document numbering scheme for communications between lc and the committee and vice versa, and (2) proposed format for presenting changes to marc formats (copies attached). these documents and proposals were acceptable to the committee. (note: incidental to this discussion, the committee officially adopted "marbi" as its official acronym.) 1. the lc liaison presented two proposed marc format changes for the committee's consideration entitled: lc/marbi 2-addition of $x subfield for 4xx fields to allow for issn. lc/marbi 3-specincation of the 830 field. the committee decided that the following plan of action would be followed with regard to these two changes: they would be announced and distributed to isad marc users' discussion group at its january 21, 1974 meeting. the proposed changes would be sent to all on mudg's mailing list, asking for replies to the marbi chairman by february 16, 1974. the chairman would summarize responses and poll marbi committee members who would respond by march 16, 1974. the marbi committee chairman would respond to lc by march 16, 1974. marbi will request publication of changes in ]ola technical communications. 2. henriette avram presented to the committee a clr statement which had been presented to arl entitled "a composite effort to build an on-line national serials data base." the committee took note of the presentation with interest and voted to take no action on the matter at the january 19 meeting. 3. the character set subcommittee of marbi reported that it had issued a written report which will be used in support of the united states position concerning development of standards within the international standard organization. marbi issued thanks to the subcommittee and requested that they remain convened pending review of further developments coming from activities within iso. 4. there was a report on activities of the ad hoc committee convened by clr to discuss use of the marc format in a network environment. a paper entitled "sharing machine readable bibliographic data: a progress report on a series of meetings sponsored by the council on library resources" was discussed. the committee took note of these activities with interest and will wait for formal submission of format changes from the library of congress. 5. marbi discussed the apparent overlap of the change between marbi and the new isad committee on technical standards. marbi passed a resolution that the isad representatives should bring to the attention of the isad board its concern over the similarity of the function statements of the two committees, and asked that these apparent discrepancies be considered and any duplication be eliminated. 6. the proposed marbi serials task force was discussed. it was felt that marbi committee members needed to keep up on developments, and that the chairman should continue to collect and distribute as much documentation as possible to the committee highlights of meetings 53 members. it was decided that there was no need ~tt this time to set up a separate subcommittee to perform this function. 7. the proposed amendments to iso 2709-1973(e) were discussed. it appears that there are several proposals circulating to change this standard. marbi formed a subcommittee to study these proposals and respond, and possibly, to make counterproposals. the position of marbi will be reported to the chairman of ansi z-39, sc/2 and will be used in support of the u.s. position within iso. 
any committee member or interested professional may reply individually. the subcommittee appointed consists of charles payne, john knapp, mike malinconico, and charles husbands. response will be made by april 1, 197 4. at its regular scheduled meeting, on january 20, all members were present. (john byrum was unable to attend the unofficial meeting on january 19.) the distribution of the rtsd and isad manual material was discussed. the discussion of the previous day was summarized for purposes of review and for the benefit of the nonmembers attending the meeting. 1. marbi and lc the alternative wording to the lc position paper was presented by paul fasana. it was passed. henriette avram will have it published in lcib and will submit it to lola tc. lrts will also receive a copy. the paper will be submitted to each divisional board. 2. the national on-line union file of serials was discussed. larry livingston answered questions. 3. the character set subcommittee report will see that isad has a copy. interested professionals should ask for a copy from them. 4. the activities of the ad hoc clr committee were again reviewed. 5. the isad standards committee was discussed. 6. the serials task force for marbi was reported on. 7. the proposed changes to iso 2709-1973 (e) were reviewed. new business: 8. the activity of the ifla working group on content designators was discussed. it was reported that there is an attempt to standardize content designators across national boundaries, for purposes of international exchange. there are problems in the area of cataloging rules, not all libraries participating, and language. no action was needed, as this is only for informational purposes at this time. 9. location codes were discussed, but the issue was tabled pending report of ad hoc clr committee. 10. language and geographic area codes were brought up but not considered necessary to become involved. 11. the z39 standard account number (san) was reported by emery koltay. 12. progress in regard to the publication of the isbd-m and s was discussed. exhibit 3 cola report-midwinter '74 about fifty people were in attendance at portions of the four-hour meeting. the first half was taken up by a series of informal presentations about activity at: by: stanford allen veaner csuc john kountz berkeley & ulap sue martin ulap cis project at ucla peter watson 54 ]oumal of libmry automation vol. 7/1 march 1974 at: nypl-rlg & suny plans university of chicago lc by: mike malinconico charles payne rob mcgee mary kay daniels questions were entertained at the end of each presentation. the second half was opened by a few announcements by maryann duggan about the new orleans institute and henriette avram about the serials proposals. the major portion of the second half consisted of a panel discussion by john kountz, eme1y koltay, tom brady, and john knapp on the communication of orders, claim reports, ill requests and responses in machine-readable form. john kountz addressed general system design aspects, emery koltay discussed the isbn, issn, and standard account numbers, tom brady discussed b&t's experiences with batab, and john knapp addressed the nature of the data elements and the record structure itself. considerable discussion followed the presentations, centering heavily on the isbn and its good points and failings. both parts of the meeting seemed to be well received. 
the major value of cola seems to be as an occasion for a wide variety of automation-oriented people to discuss a similarly wide variety of topics in an informal environment. there was some feeling that the presentations in the first half could have been more tightly controlled. the presentation in the second half was quite useful, i feel. i would like to suggest cola as a good sounding-board for proposals and place for announcements, distributions of handouts or written position papers. john kountz and i have discussed setting aside a portion of it for tesla reports. exhibit 4 respectfully submitted, brian aveney to: board of directors, information science and automation division from: john kountz, chairman, committee on technical standards for library automation subject: report of committee's activities, ala midwinter meeting, 1974 the committee on technical standards for library automation (tesla) held its inaugural meetings on tuesday, january 1974 (4:3d-6:00 p.m. and 8:30-11:00 p.m.). these were icebreaker meetings for a new group. in view of the interest that had been expressed in various quarters, several interested observers attended, as well as six of the seven committee members (for membership attendance see attached list). in addition, the following individuals were invited to meet with the committee and present their review of standards activities in other areas; establish a working perspective for the committee within the american library association; and delineate the constraints of the committee's charge: mr. fred kilgour, mr. don hammer, ms. velma veneziano, mr. emery koltay. while the specific discussion that ensued covered a variety of topics, the central objectives for these two meetings (establishing/ defining action areas, constraints, roles, and reviewing in some detail the committee's charge) were met. in addition, stress was placed throughout the discussion on differentiating between professional, service, bibliographic, and similar library standards, and the communications/ clearinghouse function to be served by the committee in its dealings with technical standards impacting library automation. highlights of meetings 55 at its next meeting, the committee can be expected to complete its deliberations on the charge, complete a proposed pilot procedure for the handling of initiative/reactive requirements for standards, and recommend a shakedown of the proposed procedure. committee on technical standards for library automation ala midwinter meeting 1974 attendees of meetings held 21 january 1974 dr. edmund a. bowles, ibm mr. arthur brody, bro-dart industries mr. jay cunningham, university of california mr. john kountz, chairman, california state university and colleges mr. tony miele, illinois state library mr. richard utman, princeton university absent: ms. madeline henderson, national bureau of standards exhibit 5 report of the membership survey committee we mailed out 4,337 questionnaires as of november 3. as of last week, we had received 1,666 replies. they have now dwindled down to about five or six a day, so i feel we have probably received the majority of responses from our mailing. i hope for about a 40 percent response. the returns are presently being coded now by my graduate assistant, and the university of south carolina computer centre will keypunch them for us. i am hopeful that we can start analyzing the results by the end of february, and have the report ready for you by april. 
the expenses to date have been:
preliminary mailing $346.95
printing of envelopes 164.32
return postage 166.60
total $677.88
the bill for printing the questionnaire hasn't been received yet but should be a very minor one. jim williams will write the program for the data, and the library school has computer time which we can use. i expect when all the expenses are in that the total will be more than the budgeted $700, but not very much more.
submitted by: elspeth pope, chairman; jim williams; bill summers; martha manheimer

editorial board thoughts
halfway home: user centered design and library websites
mark cyzyk
information technology and libraries | march 2018
mark cyzyk (mcyzyk@jhu.edu), a member of lita and the ital editorial board, is the scholarly communication architect in the sheridan libraries, the johns hopkins university, baltimore, maryland.

our library website has now gone through two major redesigns in the past five or so years. in both cases, a user centered design approach was used to plan the site. in contrast to the single-person-vision and design-by-committee approaches, user centered design focuses on the empirical study and eliciting of the needs of users. great attention is paid to studying them, listening to them, and exposing their needs as expressed. in both of our cases, the overall design, functionality, and content of the new site was then focused exclusively on the results of such study. if a proposed design element, a bit of functionality, or a chunk of content did not appear as an expressly desired feature for our users, it was considered clutter and did not make it onto the site. both iterations of our website redesign were strictly governed by this principle.

but user centered design has blind spots. first, it may well be that what you take to be your comprehensive user base is not as comprehensive as you think. in my library, our primary users are our faculty and student researchers, so great attention was paid to them. this makes sense insofar as we are an academic library within a major research university. faculty and student researchers will always be our primary user group, but they are not our comprehensive user group. we have staff, administrators, visitors, members of our board of trustees, members of our friends group, outside members of the profession, and others, and they are all important constituencies in their own ways.

second, unless your sample size of users is large enough to be statistically valid, you are merely playing a game of three blind men and the elephant. each user individually will be expressing his or her own experience and perceived needs based on that experience, and yet none of them, even taken as a group, will be reporting on the whole beast. while personal testimony definitely counts as evidence, it also frequently and insidiously results in blind spots that would otherwise be exposed through having a statistically valid sample of study participants.

third, and perhaps most importantly, user centered design discounts the expertise of librarians. nobody knows a library's users and patrons as well as librarians. knowing their users, eliciting their needs, is part of what librarians as one of the “helping professions” do; it is a central tenet of librarianship. there is no substitute for experience and the expertise that follows from it. in the art world, this is connoisseurship. somehow, the art historian just knows that what is before him is not a genuine rembrandt.
the empirical evidence may ineluctably lead to a different conclusion, yet there remains something missing, something the connoisseur cannot fully elucidate. similarly, in the medical world the radiologist somehow just knows that the subtle gradations on his screen indicate one type of malady and not another. interestingly, in the poultry industry there is something called a “chicken sexer.” this is a person who quickly and accurately sorts baby chicks by sex. training for this vocation largely employs what the philosophers call “ostensive definition”: “this one is male; that one is female.” the differences are so small as to be imperceptible. and yet, experienced chicken sexers can accurately sort chicks at an astonishing rate. they just know through experience. such is the nature of tacit knowledge.

in the case of our most recent website redesign, none of our users expressed any interest whatsoever, for example, in including floor maps as part of the new site. we were assured a demand for floor maps on the site was “not a thing.” so floor maps were initially excluded from the site. this was met with a slow crescendo of grumbling from the librarians, and rightly so. librarians, and the graduate students at our information desk, know through long experience that researchers of varying types find floor maps of the building to be useful. that's why we've handed out paper copies for years. the fact that this need was missed through our focus on user centered design points to a blind spot in that process. valuable experience and the expertise that follows from it should not be dismissed or otherwise diminished through dogmatic adherence to the core principle of user centered design.

... and yet, don't get me wrong: insofar as it's the empirical study of select user groups and their expressed concerns and needs, user centered design as a design technique and foundational principle is crucially important and useful. it gets us halfway home.

lita president's message
joining together
emily morton-owens
information technology and libraries | december 2019
emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the assistant university librarian for digital library development & systems at the university of pennsylvania libraries.

in writing this column i am looking ahead, as i have been throughout my term as vice-president and president of lita, to the possibility of our merger with alcts and llama. recently our discussions have included an exploration on all sides of how a division can support members throughout their careers. this has inspired me to reflect on how lita has always taken a broad and inclusive view of what library technology work is and can be in the future. i believe the proposed core division can support and extend that tradition.

one question that i've heard posed from time to time is “am i technical enough for lita?” longtime lita members like to answer that with a full-throated “yes!” if you're interested enough to ask the question, we want you to join us in using technology as a part of your work. we want you to be supported in doing so at your current skill level, whether or not you want to make technology more a part of your work than it is today. if you want to go deeper into technology, we'll be there with you. while the culture of the for-profit technology industry can promote imposter syndrome, we want lita to be a haven.
in lita's events and meetings, we consistently see different facets of library technology work reflected. some of us are training users in new technologies or creating programs that get young people excited about coding. others are working to make online resources accessible and easy for our users to benefit from. we have members who are manipulating metadata, creating services to help researchers comply with data management requirements, creating websites that guide users to the information they need, and preserving cultural heritage in digital forms. some of us manage tech projects or workers. some of our members work on large tech teams with generous resources and others are spinning magic just from their own skills. when i started working in libraries, my bosses and mentors were often librarians who had started in technical services or other roles, before "automation." eager to improve their own workflows, and getting pulled into ils migrations and catalog development, they had become the technology experts. these accidental systems librarians have always been some of my favorite colleagues because of their sure-footed approach to our data. recently i've come to work with colleagues who are accidental systems librarians in the opposite sense: tech workers who took jobs in libraries and embraced what we do. one developer on my team, who had no previous library experience, took to our projects and ethical stance like a duck to water. he told me that he now goes to parties and tells people about how librarians are defenders of privacy and protectors of information. lita embraces growth in any direction because we want to support learning and problem-solving with a foundation of shared principles and resources. i don't see these developments as time-based or inevitable in any given person's career. there are plenty of library tech workers who prefer being an individual contributor and think they have their biggest impact doing direct work on applications. and many of my technical services colleagues prefer to define their work goals in those terms, no matter how adept they become with tech tools. whether or not they seek out a management position, our members will probably find themselves exhibiting leadership in some context, like developing or advocating for standards. instead of a rigid path of career development, many librarians today have fluid and multi-faceted careers. for myself, i have held similar positions at quite different types of libraries—public, medical, academic. lita has always been a part of my experience, though, providing a sort of collegial bedrock through a lot of change. the people are what make lita, lita: friendly, principled, and quirky. lita members are the kind of people who will learn all they can about a technology like the amazon alexa—and then unplug the one on the exhibit floor at annual. both as i was thinking about all this, and in this resulting column, leadership, collections, and technical services kept coming up. there is such strong and fruitful cross-pollination among these specialties, and i see that as something that would enhance the member experience—both for current lita members who want more contact with expert colleagues and for current llama and alcts members who want learning opportunities and support for their work with technology. lita members love to share their knowledge and hash through challenges together.
sometimes i wish more ala members would feel comfortable giving us a try, and perhaps core will be a new, friendly face for that ongoing outreach. if, in the future, someone asked the new question “am i technical enough for core?” i’m sure the answer will be the same: “yes, please join us!” information technology and libraries at 50: the 1990s in review steven k. bowers information technology and libraries | december 2018 9 steven k. bowers (sbowers@wayne.edu) is executive director, detroit area library network (dalnet). i played some computers games — stored on data cassette tapes — in the 1980s. that was entertaining, but i never imagined the greater hold that computers would have on the world by the mid-1990s. i can remember getting my first email account in 1993, and looking at information on rudimentary web pages in 1996. i remember my work shifting from an electric typewriter to a bulky personal computer with dial-up internet access. eventually, this new computing technology became a prevalent part of my everyday life. this shift to a computer-driven reality had a major effect on libraries too. i was amazed by the end of the 1990s to be doing research on a university library catalog system connected with other institutions of higher education throughout the region, wondering at the expanded access to, and reach of, information. in my mind, due to computers and the internet, libraries were really connected at that time more than they had ever been. as i prepared this review of what we were writing about in ital in the 1990s, i had some fond memories of the advent of personal computers in my daily life and in the libraries i had access to. as we take a look back, i think it is interesting to see what we were doing then and how it is connected to what we are still working on today. along with the eventual disruption that the internet was to libraries, computers and online access also had the effect of greatly changing how libraries constructed our core research tools, especially the catalog. prior to the 1990s libraries had begun automation projects to move their catalogs to computer-based terminals, creating connections and access that were not previously possible with a card catalog. if we are still complaining about the design and function of the online public access catalog (opac) today, in the early 1990s we were discussing what their design and function should be, in a positive and optimistic way. in some ways it seems hard to recall the discussions of how to format data and display it to users. in other ways it seems like we are still having the same discussions, but our work has become more complex as we continue to restructure library data to become more open and accessible. while we were contemplating the design of online library catalogs, libraries were also discussing the implementation of networking and other information technology infrastructures. nevins and learn examined the changes in hardware, software, and telecommunications at the time and predicted a more affordable cost model with distributed personal computers connected through networks, and enhancing library automation cooperation. 1 they expanded the discussion to include consideration of copyright and intellectual property, security, authorization, and a need for information literacy in the form of user navigation, all key to what we are doing today. beyond catalogs, there was the real adoption of the internet itself. by the early 1990s there was growing enthusiasm for accessing and exploring the internet. 
2 this created a need for libraries to learn about the internet and instruct others on how to use it. as late as 1997, however, even search engines were still being introduced and defined, and using the internet or searching the world wide web was still a new concept that was not fully understood by many people. at their basis, search engines were simply defined as indexing and abstracting databases for the web.3 it is interesting that library catalogs were developed separately from the development of search engines and we are still trying to get our metadata out of our closed systems and open to the rest of the web. in 1991, kibirige examined the potential impact of this new connectivity on library automation. he posited that "one of the most significant change agents that will pervade all other trends is the establishment and regular use of high-speed, fiber optic communication highways."4 his article in ital provides a prescient overview of much of what has played out in technology, not just in libraries. he noted the need for disconnected devices to become tools to access full-text information remotely.5 perhaps most important, he noted the need for librarians to become experts in non-library technology, to keep pace with developments outside of the profession. this admonition is still important to keep in mind today. at the time, however, libraries were working on the basics of converting records from online bibliographic utility systems running on mainframes to a more useful format for access on a personal computer, let alone thinking about transforming library metadata into linked data that can be accessed by the rest of the internet. so we keep moving forward. later in the decade, libraries began to think about the library catalog as a "one stop shop" for information. in 1997, caswell wrote about new work to integrate local content, digital materials, and electronic resources, all into one search interface. initially the discussion was more technical in nature, but caswell provided an early concept for providing a single access point to all of the content that the library has, print and electronic, which was a step forward from just listing the books in the catalog.6 at the time we were still far away from our current concept of a full discovery system with access to millions of electronic resources that may well surpass the print collections of a library. eventually more discussion developed around the importance of user experience and usability for the design of catalogs and websites. catalogs were examined in parallel with the structure of library metadata, and both were seen as important to the retrievability of library materials. human-machine interaction was starting to be examined on the staff side of systems, and this would eventually become part of examining the public interface usability as well. outlining an agenda for redesigning online catalogs, buckland summarized this new technological development work for libraries by noting that "sooner or later we need to rethink and redesign what is done so that it is not a mechanization of paper but fully exploits the capabilities of the new technology."7 more exciting, by the end of the 1990s we were seeing usability studies for specific populations and those with accessibility difficulties. systems were in wide enough use that libraries began to examine their usefulness to more audiences.
beyond our systems, the technology of our actual collections was changing. new network connectivity combined with new hardware led to new formats for library resources, specifically digital and electronic resources. in 1992, geraci and langschied summarized these changes, stating that "what is new for the 1990s is the complication of a greater variety of electronic format, software, hardware, and network decisions to consider."8 they also expanded the conversation to include data in all forms, and data sets of various kinds, well beyond traditional library materials. this is an important evolution as libraries worked to shift their operations, identities, and curatorial practices. geraci and langschied defined data by type, including social data, scientific data, and humanities data. they called most importantly for libraries to include access to this varied data to continue the role of libraries providing access to information, as they cautioned that information seekers were already beginning to bypass libraries and look for such information from other sources. libraries were beginning to lose ground as the gatekeepers of information and needed to shift to providing online access and open data themselves. the early 1990s were an exciting time for preservation, as discussion was moving from converting materials to microforms to digitization. in 1990, lesk compared the two formats and had hope for a promising digital future.9 thank goodness he was on target for sharing resources and creating economical digital copies, even if he did not completely predict the eventual shift to reliance on electronic resources that many research libraries have now made. lesk also noted the importance of text recognition, optical character recognition (ocr), and text formatting in ascii. others focused on digital file formats and the planning and execution of creating digital collections. digitization practices were developing and the need to formalize practice was becoming evident. the same year, lynn outlined the relationship between digital resources and their original media, highlighting preservation, capture, storage, access, and distribution.10 by the late 1990s there were more targeted discussions about the benefits of digitizing resources to provide not only remote access, but access to archival materials specifically. in 1996, alden provided a good primer on everything to consider when doing digitization projects, within budget constraints.11 by the mid-1990s, karen hunter was excited to extol the promises of the dissemination of information electronically, calling the high performance computing and high speed networking applications act of 1993 "[a] formidable vision and goal. real-time access to everything and a laser printer in every house. the 1990s equivalent to a chicken in every pot."12 hunter's article is a good overview of where libraries stood in working with electronic publications and online access in the early 1990s. halcyon enssle's piece on moving reserves to online access opened with a great summary of where much of library access was headed: "the virtual library, libraries without walls, the invisible user . . . these are some of the terms getting used to describe the library of the future . . . ."13 eventually, by the end of the decade we even learned to start tracking how our new online libraries were being used, applying our knowledge of print resource usage to our new online collections.
in 1995, laverna saunders had already developed a new definition of what a library was, and how the transformation of libraries from physical warehouses to providing access to online content would affect workflows in libraries. as defined by saunders, “the virtual library is a metaphor for the networked library, consisting of electronic and digital resources, both local and remote.”14 not a bad definition more than 20 years later. saunders asked pertinent questions such as which resources would be best in print vs. online, what print materials should be retained, and which resources and collections libraries should digitize themselves. the broader view provided was that these changes would affect not just collections but the entire operation of libraries. there would still be work to do in libraries, but changes in the work were necessary to address shifting technology and the composition of collections. by the end of the decade there was new work to assess use of electronic resources, extended virtual reference services, and information literacy extending to technology instruction. in 1998, kopp wrote about the promising future of library collaborations. consortia were well established in prior decades and they were seeing a resurgence. kopp noted that just as consortia the 1990s in review| bowers 12 https://doi.org/10.6017/ital.v37i4.10821 had been built around support for new shared utilities in the 1970s and 1980s, in the 1990s they were finding a new purpose in the new networking of the internet and possibilities of greater connectivity and collaborations in the online environment.15 beyond cataloging and automation technology, it is interesting to note that even in the new online environment that was forming in the 1990s, many consortia formed at the time to share print resources. this may have been conversely related to libraries shifting from complete print collections to online holdings that many may have felt were more ephemeral, or maybe money was spent on new technological infrastructures and less on library materials. resource sharing of print materials is still an important part of libraries working together to provide access to information, and since the time that kopp wrote about consortia and growing networked collaborations, there has also been a growing development of sharing electronic resources. a large part of the work of many consortia today revolves around purchasing of electronic resources, but in the late 1990s libraries were just beginning to get into purchasing commercial electronic resources.16 there were lots of ital articles in the 1990s looking at the future of libraries and technology, and some specific articles dedicated to prognostication. in 1991, looking into the future, kenneth e. dowlin shared a vision for public libraries in 2001. he predicted that libraries would still exist but it is noteworthy that at the time the future existence of libraries was questioned by many. dowlin did predict change for libraries, including the confluence of new media formats, computing, and yes, still books. he stated what time has now confirmed: “the public wants them all.”17 he had lots of other interesting ideas as well; his article is worth a second look. another fun take on the future was a special section on science fiction from 1994 considering future possibilities in information technology and access. 
in one piece, david brin noted, “nobody predicted that the home computer would displace the mega-machine and go on to replace the rifle over the fireplace as freedom’s great emancipator, liberating common citizens as no other technology has since the invention of the plow.”18 an interesting observation, even if the computer has now been replaced by phones in our pockets or other fantastic wearable technologies. by the end of the 1990s, libraries had been greatly transformed by technology. many libraries had automated, workflows continued to adjust in all areas of library work, and most libraries had at least partially incorporated elements of using the internet along with providing computer access to library users. some libraries were already moving through the change from print to electronic library resources. specific web applications and websites were also being developed and used for and by libraries. these eventually have matured into smarter systems that can provide better access to our collections and smarter assessment of our resource usage, for both print and electronic materials. as a whole, the 1990s are an exciting time to review when looking at the intersection of information technology and libraries. as information dissemination moved to an online environment, within and outside of the profession, the future existence of libraries began to be questioned. as we now know, libraries still play an important role in providing access to information. notes 1 kate nevins and larry l. learn, “linked systems: issues and opportunities (or confronting a brave new world),” information technology and libraries 10, no. 2 (1991): 115. information technology and libraries | month year 13 2 constance l. foster, cynthia etkin, and elaine e. moore, “the net results: enthusiasm for exploring the internet,” information technology and libraries 12, no. 4 (1993): 433-6. 3 scott nicholson, “indexing and abstracting on the world wide web: an examination of six web databases,” information technology and libraries 16, no. 2 (1997): 73-81. 4 harry m. kibirige, “information communication highways in the 1990s: an analysis of their potential impact on library automation,” information technology and libraries 10, no. 3 (1991): 172. 5 kibirige, “information communication highways in the 1990s,” 175. 6 jerry v. caswell, “building an integrated user interface to electronic resources,” information technology and libraries 16, no. 2 (1997): 63-72. 7 michael k. buckland, “agenda for online catalog designers,” information technology and libraries 11, no. 2 (1992): 162. 8 diane geraci and linda langschied, “mainstreaming data: challenges to libraries,” information technology and libraries 11, no. 1 (1992): 10. 9 michael lesk, “image formats for preservation and access,” information technology and libraries 9, no. 4 (1990): 300-308. 10 m. stuart lynn, “digital imagery, preservation, and access--preservation and access technology: the relationship between digital and other media conversion processes: a structured glossary of technical terms,” information technology and libraries 9, no. 4 (1990): 309-336. 11 susan alden, “digital imaging on a shoestring: a primer for librarians,” information technology and libraries 15, no. 4 (1996): 247-50. 12 karen a. hunter, “issues and experiments in electronic publishing and dissemination,” information technology and libraries 13, no. 2 (1994): 127. 13 halcyon r. enssle, “reserve on-line: bringing reserve into the electronic age,” information technology and libraries 13, no. 
3 (1994): 197. 14 laverna m. saunders, "transforming acquisitions to support virtual libraries," information technology and libraries 14, no. 1 (1995): 41. 15 james j. kopp, "library consortia and information technology: the past, the present, the promise," information technology and libraries 17, no. 1 (1998): 7-12. 16 international coalition of library consortia, "guidelines for statistical measures of usage of web-based indexed, abstracted, and full text resources," information technology and libraries 17, no. 4 (1998): 219-21; charles t. townley and leigh murray, "use-based criteria for selecting and retaining electronic information: a case study," information technology and libraries 18, no. 1 (1999): 32-9. 17 kenneth e. dowlin, "public libraries in 2001," information technology and libraries 10, no. 4 (1991): 317. 18 david brin, "the good and the bad: outlines of tomorrow," information technology and libraries 13, no. 1 (1994): 54. balancing community and local needs: releasing, maintaining, and rearchitecting the institutional repository daniel coughlin information technology and libraries | march 2022 https://doi.org/10.6017/ital.v41i1.14073 daniel coughlin (dmc186@psu.edu) is head, libraries strategic technologies, penn state university. © 2022. abstract this paper examines the decision points over the course of ten years of development of an institutional repository. specifically, the focus is on the impact and influence from the open-source community, the needs of the local institution, the role that team dynamics plays, and the chosen platform. frequently, the discussion revolves around the technology stack and its limitations and capabilities. inherently, any technology will have several features and limitations, and these are important in determining a solution that will work for your institution. however, the people running the system and developing the software, and their enthusiasm to continue work within the existing software environment in order to provide features for your campus and the larger open-source community, will play a bigger role than the technical platform. these lenses are analyzed through three points in time: the initial rollout of our institutional repository, our long-term running and maintenance, and eventual new development, and why we made the decisions we made at each of those points in time. the institutional repository (ir) a university institutional repository (ir) provides long-term access to the scholarship and research outputs of an institution.1 the outputs can be in the form of scholarly publications, data sets to support publications or other research, electronic theses and dissertations, and other digital assets that have value to the university to preserve and to the research community and beyond to disseminate. there is additional value in keeping these otherwise scattered resources collected in a single repository to showcase the scholarly accomplishments of an institution.2 there is value to the university to collect and disseminate the scholarly outputs of the university to understand the strengths of the university and promote that research to outside audiences, attract new faculty, and provide opportunities for new faculty where fields may be emergent or void of an institutional presence.
furthermore, there is value to the research community to be able to find peer research without having to pay publisher access fees. reducing the burden on faculty to meet various policy demands from a federal, publisher, and institutional perspective provides another motivation for irs. federal policies can require making anonymized research data and scholarship publicly available because it is publicly funded through tax dollars; publishers can make authors provide access to the data that supports the research that is being published.3 in the united states, a growing number of academic institutions, from 2005 to 2021, have adopted an open-access policy that requires researchers to provide a copy of any published scholarly article in a publicly accessible repository. the institutional repository is a way for a university to meet this increasing demand from research organizations and funding institutions for their researchers.4 as the size of a campus grows in disciplines, it inherently grows in complexity and a diversity of digital needs and use cases from its researchers. for example, high-resolution images or atmospheric data are likely to create a higher demand in storage needs than a discipline that relies largely on text. performance-based research may require multimedia resources and streaming capabilities while other large files can be shared in a more asynchronous manner. the diversity of needs contributes to the complexity of finding a solution for an institutional repository that meets all, or many, of the needs on a campus from a file storage, discovery, and access perspective. this paper broadly addresses penn state university's development of its ir at three distinct points in time: (1) choosing a platform for our ir and its initial release; (2) maintaining an ir; and finally, (3) our current solution nearly 10 years later. at each point in time, we analyze our decision process through four lenses. these lenses provided a thorough examination for us to decide on how to proceed; they are the needs of the open-source community and the potential tension with local needs, our team dynamics, and finally the platform we built our software on and the infrastructure required to maintain it. we discuss why we made the decisions we made through these four lenses, the benefits and drawbacks, and what we have learned along the way. penn state is the state of pennsylvania's land grant university in the united states. the university has 24 campuses physically located in the commonwealth of pennsylvania, the world campus which is online, two law schools, and a medical school. in the fall of 2021, penn state had 73,476 enrolled undergraduate students and 13,873 graduate students, with research expenditures totaling over $850 million for the last four years.5 penn state is a large, public research institution with a diverse set of needs. this is significant because when the university is considering developing a large system such as an institutional repository, we need to meet the needs of a broad set of disciplines and domains. we are fortunate enough to have software developer and system administration resources that smaller institutions may not have. this provides a bit of context into our considerations for an institutional repository.
selecting a repository in january 2012, penn state university libraries and penn state's central information technology department collaborated on developing an institutional repository for the university's growing data management needs. the university libraries was interested in becoming more involved in open-source software community development efforts. at that point, many universities that we had spoken with already had an ir solution in place; because we did not, we had a lot of freedom to choose a platform without the burden of data migration. we considered investigating (1) off-the-shelf, turnkey solutions such as dspace, (2) a prototype we had just built called curation architecture prototype services (caps) using a microservices approach, or (3) building on top of an existing platform. ultimately, we decided to build on top of an existing platform, samvera (named hydra at the time).6 we did not want a turnkey solution, because we felt that we had distinct needs that would require a level of customization that these solutions would not be able to offer. based on discussions with others, we decided to develop something of our own. we wanted to leverage the experience of others in the repository development domain. the microservices approach at the time was more of a conceptual approach towards development than an existing software solution. the ability to build on an existing platform was a happy middle ground for us, and we evaluated this decision through several lenses that led us to our selection at that time. community involvement we did not want to develop a solution in a vacuum and thought a group with a (relatively) common set of problems would be helpful to problem solve. the samvera community was a small but growing community working towards repository solutions like what we were trying to achieve. members of the community were both managerial and technical. this was valuable to us for understanding the strategic direction for the community and the ability to collaborate and problem solve on technical implementations. some of the key partners for our early work were university of hull (uk), stanford university, university of virginia, and notre dame. there was communication throughout the year over community email, chat platforms, and phone calls; however, the quarterly partner meetings were the most valuable time for collaboration. these quarterly meetings were a couple days in length, typically attended in person by managers and software/systems developers at a partner institution's campus. this provided the ability to work together on specific problems, showcase our work, and get to know each other more closely at lunch and after-hours meetups. working within the community would also get our team increased exposure and help with recruiting future colleagues. working in the open-source software community has been seen to benefit both candidates and employers in future job recruitment.7 we were excited by the promise of working with and contributing toward a larger community. our team had apprehension about building this alone, and we were happy to be working with the support of a community and within their set of processes.
local needs early on in our requirements for the repository we created a moscow chart that provided our "must have," "should have," "could have," and "won't have" features.8 the platform we were choosing was going to provide us with a significant set of these features for our repository with very little work on our end. these features were built in and included search, discovery, and basic file edit functionality. essentially, we were going to quickly meet the needs of our stakeholders by using this software. this was important for a couple of reasons. first, providing features to our stakeholders quickly gave them ample time to provide feedback so that we could make necessary customizations for their specific needs. a less quantitative benefit was gaining the trust of colleagues at the start of a new project and new initiative. rather than continually suggesting "that feature will be done next week," we were able to deliver results quickly and get feedback. for example, our repository integrated with our campus authentication system, restricting access. we were able to deliver these features and get feedback on both the functionality as well as terminology to improve the usability. in particular, the way our developers described permissions was initially too confusing for our users and we were able to make necessary adjustments prior to a production release of the ir. team dynamics we believed it was a significant professional development opportunity for many people on the team to work with a larger community and learn from and with those in the open-source community. the team working on the ir consisted of three full-time, or near full-time, developers (one joining after we started the project), and a systems administrator. this project was our first large project that included a project manager invested in agile project management methodologies and with a systems administrator in place at the beginning of the project. platform and infrastructure stability there was a desire to get to a common solution to easily set up other repositories for various needs within the libraries, and we hoped there would be an ability to plug and play various components or features. the three common components of this system were fedora commons to store both metadata and our digital assets; solr as an index for fast search; and blacklight as a web interface that sits on top of solr. one of the primary components, active-fedora, would sync content between the fedora and solr persistence layers. our hope was that with this model, we would be able to write code that could be used in other repositories, and we could use the code that other institutions had written for our repository needs to build other applications more quickly. the samvera community was initially called hydra because of the relationship with the mythical creature that has several heads (see figure 1). we were considering the potential of running a core storage infrastructure and discovery infrastructure, while developing several heads for our various applications. we knew this was a lofty expectation, but also thought that it was a good design principle for us to advance. additionally, the pilot that we developed on microservices (caps) seemed to have a relatively large storage service and we could not determine how to get away from that. although this was a bit of a shift in our philosophy, it was less of a shift based on our practical experience.
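to make the fedora/solr/blacklight arrangement more concrete, the following is a minimal sketch of how a model defined with active-fedora can declare a metadata field that is persisted to fedora and pushed into the solr index that blacklight searches. it is illustrative only, assuming a samvera-style ruby application; the class name, property, and predicate are hypothetical and not scholarsphere's actual code.

```ruby
# hypothetical samvera-style model, not scholarsphere's actual code
require "active_fedora"
require "rdf/vocab"

class ExampleWork < ActiveFedora::Base
  # the property is stored in fedora; the block tells active-fedora how to
  # write it into solr, where blacklight can search and facet on it
  property :sponsoring_unit, predicate: ::RDF::Vocab::DC.contributor, multiple: false do |index|
    index.as :stored_searchable, :facetable
  end
end
```

because the single declaration drives both persistence in fedora and indexing in solr, a field added for one application can, in principle, be reused by another repository built on the same stack.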
figure 1. aspirational intentions of running many applications on one access and discovery system. initial release the initial release of our ir, scholarsphere, was for research data, scholarly articles, and presentations. we considered the repository file agnostic and left the definitions of scholarly materials up to the depositor. the self-deposit process made very few assumptions to limit the barriers to deposit—there were few mandated fields for deposit in scholarsphere. the initial rollout of scholarsphere had met the "must have" and many of the "should have" needs that we had defined initially in our development requirements. the list of "must haves" included upload files via the web, create and assign metadata to the uploaded files, set three access levels to the files, search for files, display files, etc. the list of "should haves" included faceted browse, faceted search results, share files with a group, etc. working on a community-developed platform provided some of these features for us (search, faceted browse) and gave us the flexibility to customize where necessary. for example, we had our own data model of metadata to assign to files based on our users' needs. we were able to update the existing metadata that was provided out of the box, to accommodate that. this was a tremendous win for us to leverage community-provided solutions while meeting local needs. additionally, the platform provided a search index with solr. this enabled our infrastructure to have a common solution with community support on configuration questions. using the blacklight ui on top of solr created another opportunity for us to customize where desired and eased our development efforts. community: following the initial release, we worked with other members of the community to pull out some of the core functionality and place it in a separate ruby library. this library (sufia) could then be leveraged as a default set of repository features for other developers. the release of a new ir, and this library, provided us with a lot of positive exposure at various community events. local needs: locally, we used this library to develop a repository for our digital archives. it previously took two to three developers nearly nine months to develop scholarsphere; however, we used the sufia module to roll out a separate repository in six months with a single developer. this was another successful production rollout and a successful use of a product created by and for the community. team dynamics: we had a successful release and were getting support to hire new developers. we continued to move more of our projects toward an agile approach and permanently embed systems administrators into our development projects. infrastructure: we had not released the new archives system on exactly the same system that scholarsphere was developed on, but we were happy that our projects ran on relatively homogeneous technology stacks that were familiar to operate. maintaining the ir over the next several years we released three major updates to scholarsphere: (1) migrating the data object store to a new major version; (2) overhauling the user interface; and (3) migrating our data model to the portland common data model (pcdm). simultaneously, the sufia library that we developed had also grown in usage by other institutions and contributions from other developers.
we were excited to have additional contributors, and with that came an understandable sense of competing priorities within our community’s development roadmap. we were building scholarsphere features and functionality to meet the needs of our local institution and managing the tension between community direction and local needs. again, we look at these lenses as evaluating the period during maintenance, upgrades, and feature adds. community involvement two of the major releases mentioned above were largely community driven. in one case— migrating the data object store—we were one of the initial repositories within our open-source community to migrate our data storage system. we anticipated that doing this work early would prevent us from having to rewrite any code that relied on the data storage layer. ultimately, this may have been a bit early for us, because we never were able to create the momentum for others in the community to make this same migration. this created a bit of a divergence, but at this layer in our technology it did not prevent us from continuing to work closely with the community. information technology and libraries march 2022 balancing community and local needs | coughlin 6 we were able to add locally developed features for managing files and uploads, community components that allowed for controlled vocabularies, and cloud provider uploads.9 in all, from 2012 to 2019 we were an active member of the community: we provided technical contributions, we were being asked to present at community events, and our developers were frequently asked to help at several workshops. the community provided many opportunities for professional development and code from the community provided new features to our users. we felt this work was successful. we had three major releases. one was something that our local users were able to experience directly. two of our upgrades were largely on the back end and, while there is no argument on their importance, it can be a challenge to illustrate the significance of largely opaque technology upgrades to users. concurrently, we were coming up against other challenges that were proving difficult to solve in a sustainable and scalable way. large file size (larger than 1 gb) for uploads and downloads remained an issue that researchers seemed to be encountering more frequently. our mechanisms for getting around some of these obstacles led us to looking at an api for administrators and other applications to integrate. for example, if the web browser upload was not working, perhaps we could physically get the file from a user and upload it to the system ourselves. if we could do that, maybe we could use an api to do this upload, but we did not have an api. when developing new features, we would question if it should be code to contribute back to the community or only for our (penn state) needs. frequently, the devil is in the details and , while several institutions were interested in a feature based on a conversation, implementation could be much more detailed and it was difficult to find common ground. this complexity could lead to longer timelines and more difficult planning for local development features. team dynamics over this time period we advanced our team by adding several highly skilled developers (some of whom have now moved on to other positions and remain highly respected within the community), and enriched the collective skill set of the group. the team was enriched by this experience overall. 
the balance between community involvement and local needs became a frequent conversation point for our team. we spent a lot of effort on initiatives that had not solved some of the bigger problems our users were experiencing locally; our community disengagement was likely a combination of common reasons, for example, our lack of time to make meaningful contributions.10 in the spring of 2019, the development team that worked on scholarsphere shrank from three developers down to one. we had a strong number of developers within the samvera community to collaborate with; however, we had difficulty bringing on new members at the time because the complexity within the scholarsphere system created a high learning curve that was not necessarily transferable to other technology stacks. at the end of the summer in 2019 we were given 25 gb of video files to upload into scholarsphere and make accessible. the parameters of the request were outside of what we could support from our web interface, and we had no api to allow a product owner to develop against and work with the researcher to meet this request. after approximately one month of working with the data and our system, we successfully ingested the files into scholarsphere. at the end of this month, we decided that we needed to more urgently evaluate our path forward because we could not have our lone developer spending this amount of time on single-user requests. platform and infrastructure stability each of the major versions released between 2012 and 2019 had several patches and feature releases to enhance the system, the interface, and/or our processes for change management within the software system. for example, we went from a typed script containing a series of commands to chef (a tool used to automate software deployment) for deployment management; we upgraded infrastructure core components (fedora, solr, travis, redhat, etc.); and we added infrastructure to keep up with the system demands. in terms of adding infrastructure, we both enhanced the virtual capabilities (cpu and ram) of our systems and had tasks offloaded to other systems. we did not want the systems our users interfaced with to be responsible for all the heavy lifting. these tasks included characterization, indexing metadata for search, creating thumbnails, etc. (see figure 2). figure 2. systems and services with basic workflow process for uploading a file to scholarsphere, including the background jobs that ran on file upload (web server: apache, rails, passenger, clamav; repository server: tomcat, fedora, jetty, solr, redis; jobs server: rails, resque, fits; database: mariadb/mysql; storage: isilon nfs; jobs on upload: characterization, thumbnail creation, text extraction for solr, derivatives). adding additional components improved the user experience but made our infrastructure difficult to manage.
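to illustrate the offloading pattern in figure 2, the following is a rough sketch of a resque-style background job of the kind that keeps characterization, thumbnailing, and text extraction off the user-facing web servers. the class, queue, and id are hypothetical examples, not scholarsphere's actual jobs.

```ruby
# hypothetical post-upload job illustrating the pattern in figure 2;
# not scholarsphere's actual code
require "resque"

class CharacterizationJob
  @queue = :characterization          # worker processes subscribe to this queue

  def self.perform(file_id)
    # placeholder for the real work: fetch the stored file, run fits
    # against it, and persist the resulting technical metadata
    puts "characterizing file #{file_id}"
  end
end

# in the upload path the web process only enqueues the work; separate job
# servers pick it up, so the request returns quickly. similar jobs would
# cover thumbnail creation, text extraction for solr, and derivatives.
Resque.enqueue(CharacterizationJob, 42)
```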
we were continually trying to push our systems to reflect the best practices of the twelve-factor app.11 however, over time, we had certain "infrastructure smells." the infrastructure smells were essentially anti-patterns of these best practices or symptoms of a bigger problem.12 these anti-patterns included: • storage coupled closely to the application • lack of flexibility to scale storage to integrate • inability to spin up a scholarsphere instance • taking days to set up a dev environment • lack of flexibility to decouple small tasks that may require increased resources (create derivatives) evaluating next steps although we were coming up against some struggles and continued maintenance with scholarsphere, it was a successful software project that had several things we liked (and likely took for granted). it was important for us to recognize what features and characteristics of scholarsphere were a part of this list. scholarsphere's data model was flexible enough to support several current use cases and future needs and was developed with a significant amount of community input. there were other development teams within our organization that were also developing new applications in ruby, so the language continued to be relevant within our larger group, as were ruby on rails, blacklight, and solr. some of the libraries developed with these frameworks were providing us with struggles, and we knew that tools and infrastructure could be barriers to newcomers' onboarding and orientation.13 however, the languages themselves were still flexible enough for us to continue our work. we had three permission levels to access the full text of an uploaded file: (1) public, (2) penn state only, and (3) private, and we didn't want to develop anything more complex than that around access permissions. fedora provided us with versioning capability for our objects and we thought that this was something not only to continue but potentially enhance. we also had strong support from the samvera community for scholarsphere. many people had worked on the code that helped provide functionality and we could collaborate within that community when problems arose. at that point we largely decided to continue to develop needed features for scholarsphere while the community pushed forward. in part we were hoping that our divergent paths would converge within a year (give or take). the month following the relatively manual process of ingesting the 25 gb of video files into scholarsphere was spent making important updates to the system and fixing any low-hanging fruit. in october 2019 we decided to start from scratch, spend about two months developing a new solution, and evaluate our path forward after that. current solution we turned to the same four established lenses when evaluating our needs in 2019. however, it is worth noting that organizationally we were in a much different position than when we started in 2012. the software development and infrastructure team that managed the service was organizationally moved from central it to the libraries, where the service and the product owner reside.
being in the same building and having the same priorities improved communications. also, people within the teams had changed, and our leadership had changed, which changed how we approached some of our decisions. we had more experience in technical skills, specifically in repository development; we were more refined in our implementation of agile methodologies; and having run a service for years, we had a better sense of our users' needs. community involvement the community saw a tremendously successful period of growth during this time in adoption of software, exposure for funded grants, and number of partners. there was renewed excitement about multiple solutions, including turnkey repository solutions, hosted solutions, the merging of two highly regarded software libraries for performance, and improvement in developer friendliness. the latter improvement stripped away some of the design patterns that developers struggled with in favor of something more familiar and made it easier to onboard new developers. local needs the pressure to meet our local needs and competing priorities for the community-based software became a sticking point for us. we needed to have a more scalable backend and we were not sure when our needs and the priorities of the community would merge. we had also been behind on several dependencies, and the lift to get back up to date, before being able to add anything new, was considerable. this situation led us to create a prototype for evaluation. our initial goal was to see how difficult it would be to build a system to meet the needs of uploading the video files that scholarsphere currently could not handle. we had confidence we could develop features, but this area was a consistent challenge and we considered it a primary hurdle for us to jump. team dynamics our development team consisted of a single developer. however, we had an infrastructure developer who was able to help with systems configuration, automation, and containerization. our developer thoroughly understood scholarsphere and the underlying codebase and architecture, and we had the resources to hire a consultant to help with our efforts. we had considerable work performed by a local software development company on other repositories (our electronic theses and dissertations system, a digital cultural heritage repository, and our researcher metadata database). we valued this partnership and wanted to continue to utilize them as our staff numbers were down. we needed to be able to onboard others more quickly than we had in the past. if three relatively new members of the team were able to contribute to this progress, then we would also potentially have chosen a technology stack comfortable enough for others outside our development team to make a more immediate impact. platform and infrastructure stability as with many systems that are actively developed for years, our current system had several dependencies that had organically grown over time to become burdensome to put together in order to set up a development environment. additionally, a local development environment was not an exact replica of the production environment, because networked storage was implemented in production and our development systems had local storage. we also took this opportunity to test out amazon s3 storage options as our production storage system.
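as a point of reference for that evaluation, writing an object to s3 from ruby is a small amount of code. the sketch below assumes the aws-sdk-s3 gem and credentials supplied through the environment; the bucket, key, and file path are illustrative, not our production configuration.

```ruby
# minimal aws-sdk-s3 sketch; bucket, key, and path are illustrative only
require "aws-sdk-s3"

s3 = Aws::S3::Resource.new(region: "us-east-1")   # credentials come from the environment
object = s3.bucket("example-repository-files").object("deposits/123/dataset.zip")

# upload_file switches to multipart uploads for large files, which is what
# makes multi-gigabyte deposits practical compared with our previous stack
object.upload_file("/tmp/dataset.zip")
puts object.key
```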
we chose this alternative to see whether we would get increased reliability from our storage, to see how well we could manage data in s3, and to get a production service using it that would provide an example of the annual operating cost of using the cloud vendor. we were able to simplify our rollout a bit and modernize the technologies used to run our systems (i.e., docker containers and a kubernetes cluster) (see figure 3). development we had three general goals: (1) to improve stability/scalability for local needs; (2) to improve our ability to get an environment up for developers more simply; and (3) to be able to onboard new developers more quickly. shortly after our prototype test proved we could meet local needs in scalability, we were able to test out our second goal, getting a scholarsphere environment set up easily. the process of setting up a development environment went from days to hours. we had reached two of our three goals with these tests, and we believed that having our development team (which included two to three new developers) contribute to our first two goals was proof that we could onboard new developers quickly (our third goal). after several months of development in early 2020 we had removed several of the obstacles that had been in our way in recent years but were nowhere near feature parity with scholarsphere. figure 3. current infrastructure for scholarsphere, released in november of 2021. we had a rich feature set to transfer from the existing scholarsphere and did not want to simultaneously run two systems until we achieved some level of feature parity. we wanted to get to a minimum viable product (mvp) for our new prototype, migrate data, release our new version, and retire our existing system. our product owner had been working directly with scholarsphere users and was able to help us determine priorities for the features we needed in order to have an mvp. the following were some of those features: • an api, at the very least an internal one, for our migration script, other home-developed applications, and internal library employees • versioning and the ability to view versions • updated status (pending, published) • an updated user interface • urls that were harvestable • maintaining our data model for continued support of concepts such as collections • enhanced support for dois we also identified some features that had been developed over the years to either simplify or eliminate. the profiles within scholarsphere were not heavily used, and over the years the university had more mature systems for this type of purpose. similarly, finding a featured researcher for the home page seemed to create more work than it was worth, and our social media integrations were not going to be a priority. we also thought a user's dashboard—the default page after logging in—could be greatly simplified based on the most prominent actions our researchers wanted to perform. conclusion after a little over a year of development, in november 2020, we released our new version of scholarsphere. we used our own internal api, as planned, for data migration from our existing fedora commons storage system into the new one in amazon s3. over the past seven months we have done nine feature releases, including collections and an enhanced api to support penn state's open access initiative.
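the internal api in that feature list can be pictured as a small json endpoint. the sketch below is a hypothetical rails controller, assuming a rails application with active storage configured; the route, parameters, and model names are examples, not scholarsphere's actual api.

```ruby
# hypothetical internal ingest endpoint; names and parameters are examples only
module Api
  module V1
    class IngestController < ActionController::API
      # POST /api/v1/ingest with a title, a depositor, and an uploaded file
      def create
        work = Work.create!(
          title: params.require(:title),
          depositor: params.require(:depositor),
          visibility: params.fetch(:visibility, "private")
        )
        work.files.attach(params.require(:file))   # active storage, e.g., backed by s3
        render json: { id: work.id }, status: :created
      end
    end
  end
end
```

a migration script or another locally developed application could then deposit files with an ordinary http client instead of driving the web interface.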
we learned some lessons along the way within all of these lenses. we have also more than doubled the physical storage size of our repository since releasing in november 2020. over the summer, we were able to meet a faculty member's request to upload 30 to 40 videos of 300 to 400 gb, a request we never would have been able to meet in our prior solution. community & local needs working with the samvera community has provided countless opportunities for our entire team. we were able to sharpen our technical skills, were given opportunities to lead workshops, organized community development sprints, and collaborated on a plan for a community roadmap (to name a few). our entire team benefitted in several ways from the involvement in the community: our software knowledge is higher, our problem-solving skills are more creative, and our outside professional opportunities expanded. ultimately, our paths diverged in a way that made it difficult to justify the time and resources required for merging back. there are several benefits to community-based software: more eyes looking at potential security issues in code, more voices to let you know when a dependency of your code has become vulnerable, shared software ideas for development issues, and shared solutions for common problems. the cost of all these benefits is increased complexity in organizing a solution (you need to take multiple institutions into account), workflows for development (your local workflow may not be the same as the community-approved workflow), competing priorities within the community, and competing priorities between the community and local roadmaps. open source communities are largely online; these groups typically have a more shared, informal leadership structure, and that lack of formal leadership can make it difficult to find solutions to these complexities.14 team dynamics, and platform and infrastructure stability rewriting a system can be a daunting task, and several prominent developers would argue against it.15 reasons we believe we were successful are that (1) we did not change our data model, (2) although we changed our architecture, we did not change our coding conventions or our agile development process, and (3) the benefits of our changes were multidimensional. we were meeting users' needs with our development work and our infrastructure was enhancing our capabilities and making the work of our developers easier and less frustrating. our deployment process has improved to the point that we can perform a release easily and without downtime. our technology is no longer based on samvera and is now, largely, a more generic ruby on rails application. we migrated from using fedora as both a metadata and object store (retrieving objects on our central isilon system through fedora) to using postgres as a metadata store and amazon's s3 storage service for our files. we migrated our background job processing from resque to sidekiq. we continue to use the blacklight discovery and search interface, with solr as our search platform. many of these technical decisions were made because of the change in dynamics of our team, and perhaps the single biggest change was around experience and the confidence that comes with that. selecting a platform and infrastructure to support that platform is daunting.
it is particularly difficult when you have so many questions in front of you about how the system will be used, the demand it may be under, the need to scale, how to deploy new features and update dependencies, etc. our decisions in 2019 were made with much more experience and understanding of what was required of our system as well as what was desired by our users. this gave us the confidence to branch off slightly from the joint technical path and to recognize all the value (beyond technical solutions) of remaining members of the community, albeit in a modified capacity.

acknowledgements

many people have put tremendous time, effort, skill, thought, and enthusiasm into scholarsphere over the years. we want to acknowledge all those who have contributed to the development and advancement of the system and express our appreciation for their work: carolyn cole, hector correa, michael tribone, michael j. giarlo, adam wead, ryan schenk, jeff minnelli, dann bohn, justin patterson, joni barnoff, seth erickson, kieran etienne, calvin morooney, jim campbell, paul crum, chet swalina, matt zumwalt, justin coyne, elizabeth sadler, valerie maher, jamie little, brian maddy, kevin clair, patricia hswe, and beth hayes.

endnotes

1 helen hockx-yu, "digital preservation in the context of institutional repositories," program 40, no. 3 (2006): 232–43, https://doi.org/10.1108/00330330610681312.
2 raymond okon, ebele leticia eleberi, and kanayo kizito uka, "a web based digital repository for scholarly publication," journal of software engineering and applications 13, no. 4 (2020), https://doi.org/10.4236/jsea.2020.134005.
3 research data access and preservation, "browse data sharing requirements by federal agency," sparc, september 29, 2020, http://researchsharing.sparcopen.org/compare?ids=18&compare=data; "publisher data availability policies index," chorus, october 8, 2021, https://www.chorusaccess.org/resources/chorus-for-publishers/publisher-data-availability-policies-index/.
4 "registry of open access repository mandates and policies," roarmap, http://roarmap.eprints.org/view/country/840.html.
5 "student enrollment – fall 2021," the pennsylvania state university data digest 2021, https://datadigest.psu.edu/student-enrollment/.
6 stephen abrams, john kunze, and david loy, "an emergent micro-services approach to digital curation infrastructure," the international journal of digital curation 5, no. 1 (2010): 172–86, https://doi.org/10.2218/ijdc.v5i1.151.
7 jennifer marlow and laura dabbish, "activity traces and signals in software developer recruitment and hiring," in cscw '13: proceedings (acm, 2013): 145–56, https://doi.org/10.1145/2441776.2441794.
8 dai clegg and richard barker, case method fast-track: a rad approach (reading: addison-wesley, 1994).
9 "questioning authority," github, accessed september 2021, https://github.com/samvera/questioning_authority; "browse-everything," github, accessed september 5, 2021, https://github.com/samvera/browse-everything.
10 sophie huilian qiu et al., "going farther together: the impact of social capital on sustained participation in open source," 2019 ieee/acm 41st international conference on software engineering (icse) (2019): 688–99, https://doi.org/10.1109/icse.2019.00078.
11 adam wiggins, "the twelve-factor app," accessed september 2021, http://12factor.net.
12 akond rahman, chris parnin, and laurie williams, "the seven sins: security smells in infrastructure as code scripts," 2019 ieee/acm 41st international conference on software engineering (icse) (2019): 164–75, https://doi.org/10.1109/icse.2019.00033.
13 christopher mendez et al., "open source barriers to entry, revisited: a sociotechnical perspective," in proceedings of the 40th international conference on software engineering (may 2018): 1004–15, https://doi.org/10.1145/3180155.3180241.
14 lindsay larson and leslie a. dechurch, "leading teams in the digital age: four perspectives on technology and what they mean for leading teams," leadership quarterly 31, no. 1 (2020), https://doi.org/10.1016/j.leaqua.2019.101377.
15 frederick p. brooks jr., the mythical man-month: essays on software engineering (reading, mass.: addison-wesley pub. co., 1982), https://search.library.wisc.edu/catalog/999550146602121; joel spolsky, "things you should never do, part i," joel on software, april 6, 2000, https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/.

editorial board thoughts
the fourth industrial revolution: does it pose an existential threat to libraries?
brady lund
information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13193
brady lund (blund2@g.emporia.edu) is a doctoral student at emporia state university's school of library and information management and a member of the ital editorial board. © 2021. "editorial board thoughts" is a regular column contributed by a member of the ital editorial board. opinions do not necessarily reflect those of the editorial board as a whole.

no. no, it does not. not any more than any other technological innovation (information systems, personal computers, the internet, e-readers, google, google scholar) did. however, what is very likely is that the technologies that emerge from this era will slowly (but surely) lead to profound changes in how libraries operate. those libraries that fail to understand or embrace these technologies may, in fact, be left behind.
so, we must, as always, stay abreast of trends in emerging technologies and what the literature (i.e., articles in this journal) propose as ideas for adopting (and adapting) them to better serve our patrons. with this column, my aim is to briefly discuss what the fourth industrial revolution is and its relevance within our profession. the “fourth” industrial revolution? the term “fourth industrial revolution” describes the evolution of information technology towards greater automation and interconnectedness. it includes or incorporates technological advancements such as artificial intelligence, blockchain, advanced robotics, the internet of things, autonomous vehicles, virtual reality, 3d printing, nanotechnology, and quantum computing.1 imaginations can run amok with idyllic visions of walt disney’s epcot—a utopian world of interconnectedness and efficiency—or dystopian nightmares captured in the mind of stephen king, george orwell, or pixar’s wall-e. if we have learned anything from the past though—and after the last 12 months i cannot be entirely sure of that—it is that reality is likely to settle somewhere in the middle. some things will improve dramatically in our lives but there will also be negative impacts—funding changes, learning curves, and maybe a bit of soul-searching within the profession (not that that last one is necessarily a bad thing). the “fourth industrial revolution” is referred to as such because historians of technological and industrial innovation have placed the period as fourth in a line of major shifts in technological innovation (shocking, no?). the first industrial revolution was the industrial revolution, the one we were taught to call the “industrial revolution” in high school: the period from the late-18th to mid-19th century where rapid innovation in the areas of agriculture and manufacturing transformed the economy, created a market for invention and profiteering, and formed a true “working class” of laborers. as idyllic as it sounds in our history classes, it was not a particularly pleasant time for the average laborer. there is a reason why the communist manifesto, and the whole concept of the social sciences, emerged from this era. but this era also brought us the modern (semi-modern?) library. mailto:blund2@g.emporia.edu information technology and libraries march 2021 the fourth industrial revolution | brady 2 the second industrial revolution, the “technological revolution,” occurred from the late-19th to early-20th century. it was all about the power of electricity and the engine. here we have the emergence of vast telegraph networks, lightbulbs, automobiles, and airplanes. very nice stuff. you also had a lot of economic uncertainty that led to a few depressions and wars (but do not focus on that too much, readers). the third industrial revolution, the “digital revolution,” is the first revolution that we really have a good record of within library and information science. it is why we have that “information science” part in the name. it is why we have this journal that you are reading now. it was the era of digital computers and computer networks. yes, it changed a lot—some for the better, some for the worse. if you are a reader of this journal, you would probably say—at least in terms of library services—that this era was full of fairly positive developments for libraries. 
we gained digital catalogs, electronic databases, integrated library systems, the internet, and microsoft office, all developments the increase the ease and efficiency of our everyday work tasks. so, that leads us to the precipice that we are on now. the fourth industrial revolution. while uncertainty provokes trepidation, there is much we can do to inform ourselves—much more so than with previous revolutions (we could not very well turn to the internet to learn about the internet). there are scenarios on both ends of the spectrum that ranges from utopia and doom — and it’s not altogether bad to read about the risks that automation presents, such as described by andrew yang and other political and economic figures—but if history and precedent mean anything, it is likely that we will see many things in our economy, and in our libraries, change substantially but certainly not vanish. the role of libraries in the fourth industrial revolution we have already been adopting many of these fourth industrial revolution technologies for quite some time—whether we were cognizant of it or not. a lot of what could be called “artificial intelligence” already exists in our library systems and more is on the way if current projects being conducted around the globe come to fruition.2 there are a lot of “pie in the sky” ideas of what the future of libraries will look like with the fourth industrial revolution—some of which were even published in this very journal. it is good to have these perspectives, even if they do seem a bit unrealistic and/or dystopian. we need to consider the possibilities of this era while also understanding the practicalities. our goal as library technologists—to serve the information needs of patrons to the best of our abilities—should never falter. if this is true of our professional ethics and values, then we must be prepared to sacrifice and adapt to change for the common good. one thing i am quite certain of is that these technological innovations will not spell doom for libraries. libraries are resilient and librarians themselves mean more to patrons than a computer interface. we do not have to go back too far to see how libraries responded to a similar disruptive period: the introduction of the internet. if you want to read an interesting parallel to this editorial, check out david raitt’s 1994 editorial in the electronic library (and hopefully not feel too old when you discover that i was born in the year that it was published). the purpose of raitt’s editorial was to answer the question that considering developments with the internet, “will librarians still be around in 2024, and if so what are they likely to be doing?”3 boy, i sure hope they are around in three years, or i really made a poor career choice. the ease with which raitt dismisses any concern about the internet spelling the end of libraries is delightful to read: “are librarians so insecure about their profession and future?... (in 2024) librarians will still be doing information technology and libraries march 2021 the fourth industrial revolution | brady 3 what they do now and what they have always done, only they will have more new-fangled technology to help them do it.”4 you could argue that raitt’s prediction is not entirely correct. some things about libraries have changed considerably. but have any of these changes been decidedly for the worse? even the most curmudgeonly postmodernist must concede that libraries serve their patrons better now than before the internet age. 
libraries, on the whole, are still very much the same. we have not undergone massive upheavals in our professional values and ethics. some job duties have changed (such as a greater emphasis on instruction in academic libraries) but it has not spelled total doom for our librarians. so, people, turn and face the strange, because there is nothing too serious to fear with coming changes. and, if my prediction turns out to be less accurate than raitt’s, well, just remember that my name is “john barron” (and metadata that indicates to the contrary is fake news). the changes that have been clumped under the buzzword of the “fourth industrial revolution” will do a lot to advance the mission of libraries. right now, a lot of the “how?” can seem a bit hazy, but check out some of these library school programs that are working to answer that very question and i think you will get a good idea: • blockchains for the information profession (san jose state university ischool): https://ischoolblogs.sjsu.edu/blockchains. o an imls-funded program that examines applications of blockchain in libraries. • artificial intelligence (a program of stanford university libraries): https://library.stanford.edu/projects/artificial-intelligence/. o a university-supported program that examines applications of ai in libraries. • the good systems program (university of texas ischool, in partnership with a bunch of other departments on campus): https://bridgingbarriers.utexas.edu/good-systems. o this program focuses on ethical uses of ai to improve lives. while it does not necessarily focus specifically on libraries, its founding members are connected to the university’s library school and many of the program’s “products” have direct relevance to libraries. library and information technology is one area where academia and scholarly research offer a lot of useful knowledge and ideas for the professional librarian. there are a lot of great ideas from researchers and programs at these schools, in addition to the investments made by industry leaders like oclc, that suggest libraries are not going to be left behind during the “coming revolution.”5 expect the ideas filtering from these programs, and others, to take greater hold in practical library settings in coming years—as the internet did in the mid-to-late 1990s and social media did in the period of (approximately) 2008–2013. like with these past innovations, the extent of adoption will likely vary from library-to-library. i will avoid a lecture on diffusion of innovations here, though it is one of my favorite “simple” theories (a few suggested readings for those who are interested are chatman’s 1986 article—which is a bit more technical—and minishi-majanja and kiplang'at’s 2005 article).6 what is important to expect is that this will not all just be a “flash in a pan” like we have seen before with some technologies. these technologies in the “fourth industrial revolution” will bring about real change in our world. if we are “ahead of the curve” (to reference a diffusion concept), we will be well-positioned to adapt to the changes to come. https://ischoolblogs.sjsu.edu/blockchains https://library.stanford.edu/projects/artificial-intelligence/ https://bridgingbarriers.utexas.edu/good-systems information technology and libraries march 2021 the fourth industrial revolution | brady 4 references 1 klaus schwab, the fourth industrial revolution (new york, ny: penguin books, 2017). 
2 jason griffey, artificial intelligence and machine learning in libraries (chicago, il: american library association, 2019).
3 david raitt, "the future of libraries in the face of the internet," the electronic library 12, no. 5 (1994): 275.
4 ibid.
5 thomas padilla, responsible operations: data science, machine learning, and ai in libraries (dublin, oh: oclc research, 2019), https://doi.org/10.25333/xk7z-9g97.
6 elfreda a. chatman, "diffusion theory: a review and test of a conceptual model in information diffusion," journal of the american society for information science 37, no. 6 (1986): 377–86; mabel k. minishi-majanja and joseph kiplang'at, "the diffusion of innovations theory as a theoretical framework in library and information science research," south african journal of libraries and information science 71, no. 3 (2005): 211–24.

editorial board thoughts column
getting to yes: stakeholder buy-in for implementing emerging technologies in your library
ida joiner
information technology and libraries | september 2018
ida a. joiner (ida.joiner@gmail.com), a member of lita and the ital editorial board, is the librarian at the universal academy school in irving, texas. she is the author of "emerging library technologies: it's not just for geeks" (elsevier, august 2018).

have you ever wanted to implement new technologies in your library or resource center (such as drones, robotics, artificial intelligence, augmented/virtual/mixed reality, 3d printing, wearable technology, and others) and presented your suggestions to your stakeholders (board members, directors, managers, and other decision makers) only to be rejected with "there isn't enough money in the budget," "no one is going to use the technology," or "we like things the way that they are"? then this column is for you. i am very passionate about emerging technologies, how they are and will be used in libraries and resource centers, and how librarians will be able to assist those who will be affected by these technologies. i recently published a book introducing emerging technologies in libraries. i came up with suggestions on how doing your research—including the questions below and those on the accompanying checklist—will prepare you to meet with your stakeholders and improve the likelihood of your emerging technology proposal being approved.

1. who are your stakeholders? include them early on in the process. determine who your stakeholders are, what their areas of expertise are, and how they can support your emerging technology projects. the most critical piece to getting your stakeholders on board to support your technology initiatives is addressing the question "what's in it for them?" this will get their attention and increase your odds of getting a "yes" to your technology initiatives.

2. what are the costs? research what your costs will be and create a budget. find innovative ways to fund your initiatives by researching grants, strategic partnerships with others who might be interested in partnering with you, and locating other funding opportunities.

3. what are the risks? identify any potential risks so that you are prepared to discuss how you will mitigate them when you meet with your stakeholders.
some potential risks that you might want to address are budget cost overruns; staffing issues, such as a key person resigning or going on maternity or sick leave; or whether policies are in place to deter patrons from trying to use the technology for criminal means.

4. what is the timeline and key milestones? address the timeline for when you want or need to implement these technologies. have you planned for key milestones and possible delays, such as funds not being available? you need to have a detailed timeline, from your first kickoff meeting with your initiative's team, to your stakeholder meeting where you present your proposal, to getting signoff on the project.

5. what training will you offer? perform a needs assessment to determine who will need to be trained, what training you will offer, what your training costs will be, and who will pay for them. once you have all of this in place, you will select the trainer(s) and the training model (such as "train the trainer") that you will use.

6. how will you market your technology initiatives? will you rely on social media to market your technology initiatives? will you collaborate with your marketing department to develop your message through press releases, websites, blogs, e-newsletters, flyers, and other media outlets? you will need to meet with your marketing and publications experts to plan how you will market your emerging technology initiatives, along with your costs and who will pay them.

7. who is your audience and how can you engage them? this is one of the most important areas to address in your proposal to present to your stakeholders. without our patrons, there is no library. you will need to determine who your audience is and how you can utilize the emerging technologies to assist them. are they k to 12 students, adults who will be displaced by these technologies, technology novices who want to learn more about these technologies, or university faculty and/or students who want to use the technology for their projects? you can address all of these potential audiences in your proposal to your stakeholders.

these are just a few tips on how to get stakeholder buy-in for implementing emerging technologies in your library. feel free to share some of your own successes in getting stakeholders on board to implement emerging technologies in your library or resource center.

emerging technology stakeholder buy-in questionnaire

i have included questions below that you should work through when you are considering getting your stakeholders on board to implement new emerging technologies in your library. if you address all of these, you have a very good chance of getting your stakeholders on board to support your initiatives.

1. what technologies do you want to implement in your library/resource center and why do you want them?
2. who are your stakeholders and what are their backgrounds?
3. why should your stakeholders support your technology initiatives?
4. what is your budget for your new technology initiatives?
5. what training is needed to support these initiatives, who will provide the training, what are the costs, and who will pay for the training?
6. how will you market these technology initiatives, what are the costs, and who will pay for them?
7. did you perform a cost-benefit analysis for these technology initiatives?
8. are there legal fees? if so, what are they, and who will pay for them?
9. what are the risks?
10. what are the returns on the investment (roi)?
11. what strategic partnerships can you establish?
12. what is your timeline for implementing these technology initiatives?

a systematic approach towards web preservation
muzammil khan and arif ur rahman
information technology and libraries | march 2019
muzammil khan (muzammilkhan86@gmail.com) is an assistant professor, department of computer and software technology, university of swat. arif ur rahman (badwanpk@gmail.com) is an assistant professor, department of computer science, bahria university islamabad.

abstract

the main purpose of this article is to divide the web preservation process into small explicable stages and to design a step-by-step web preservation process that leads to a well-organized web archive. a number of research articles about web preservation projects and web archives were studied, and a step-by-step systematic approach for web preservation was designed. the proposed comprehensive web preservation process describes and combines the strengths of different techniques observed during the study for preserving digital web contents in a digital web archive. for each web preservation step, different approaches and possible implementation techniques have been identified that can be adopted in digital archiving. the potential value of the proposed model is to guide archivists, related personnel, and organizations to effectively preserve their intellectual digital contents for future use. moreover, the model can help to initiate a web preservation process and create a well-organized web archive to efficiently manage the archived web contents. a section briefly describes the implementation of the proposed approach in a digital news stories preservation framework for archiving news published online from different sources.

introduction

the amount of information generated by institutions is increasing with the passage of time. one of the mediums used to share this information is the world wide web (www). the www has become a tool to share information quickly with everyone regardless of their physical location. the number of web pages is vast: google and bing each index approximately 4.8 billion.1 though the www is a rapidly growing source of information, it is fragile in nature. according to the available statistics, 80 percent of pages become unavailable after one year and 13 percent of links (mostly web references) in scholarly articles are broken after 27 months.2 moreover, 11 percent of posts and comments on websites for various purposes are lost within a year. according to another study conducted on 10 million web pages collected from the internet archive in 2001, the average survival rate of web pages is 1,132.1 days with a standard deviation of 903.5 days; 90.6 percent of those web pages are inaccessible today.3 this information fragility causes valuable scholarly, cultural, and scientific information to vanish and become inaccessible to future generations. in recent years, it was realized that the lifespan of digital objects is very short, and rapid technological changes make it more difficult to access these objects.
therefore, there is a need to preserve the information available on the www. digital preservation is performed using the primary methods of emulation and migration, in which emulation provides the preserved digital objects in their original format while migration provide objects in a different format.4 in the last systematic approach towards web preservation | khan and ur rahman 72 https://doi.org/10.6017/ital.v38i1.10181 two decades, a number of institutions worldwide, such as national and international libraries, universities, and companies started to preserve their web resources (resources found at a web server, i.e., web contents and web structure). the first web archive was initiated in 1996 by brewster kahle, named the internet archive, and it holds more than 30 petabytes data, which includes 279 billion web pages, 11 million books and texts, and 8 million other digital objects such as audio, video, image files, etc. more than seventy web archive initiatives were started in 33 countries since 1996, which shows the importance of web preservation projects and preservation of web contents. this information era encourages librarians, archivists, and researchers to preserve the information available online for upcoming generations. while digital resources may not replace the information available in physical form, the digital version of these information resources improves access to the available information.5 there are different aspects of the preservation process and web archiving, e.g., digital objects’ ingestion to the archive during preservation process, digital object’s format and storage, archival management, administrative issues, access and security to the archive, and preservation planning. these aspects need to be understood for effective web preservation and will help in addressing the challenges that occur during the preservation process. the reference model for open archival information system (oais) is an attempt to provide a high-level framework for the development and comparison of digital archives. in web preservation, a challenging task is to identify the starting point of the preservation process and to effectively complete the process which help to proceed further to the other activities. therefore, the complicated nature of the web and the complex structure of the web contents make the preservation of the web content even more difficult. the oais reference model helps in achieving the goals of a preservation task in a step-by-step manner. the stakeholders are identified, i.e., producer, management, and consumer, and the packages, i.e., submission information package (sip), archival information package (aip) and dissemination information package (dip), which need to be processed, are clearly defined.6 this study aims to design a step-by-step systematic approach for web preservation that helps to understand preservation or archival activities’ challenges, especially those that relate to digital information objects at various steps of the preservation process. the systematic approach may lead to an easy way to analyze, design, implement, and evaluate the archive with clarity and different options for an effective preservation process and archival development. an effective preservation process is one that leads to a well-organized, easily managed web archive and accomplishes designated community requirements. this approach may help to address the challenges and risks that confront archivists and analysts during preservation activities. 
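as a simplified illustration of the sip and aip packages defined in the oais reference model discussed above, the short python sketch below models an ingest step that turns a submission package into an archival package. the field names and the sha-256 fixity value are illustrative assumptions, not part of the standard's formal data model.

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SubmissionInformationPackage:
    # what the producer hands to the archive: content plus descriptive metadata
    identifier: str
    content: bytes
    descriptive_metadata: dict

@dataclass
class ArchivalInformationPackage:
    # what the archive keeps: the content plus preservation information added at ingest
    identifier: str
    content: bytes
    descriptive_metadata: dict
    fixity_sha256: str
    ingested_at: str

def ingest(sip: SubmissionInformationPackage) -> ArchivalInformationPackage:
    # compute a checksum so later audits can detect corruption of the stored object
    digest = hashlib.sha256(sip.content).hexdigest()
    return ArchivalInformationPackage(
        identifier=sip.identifier,
        content=sip.content,
        descriptive_metadata=sip.descriptive_metadata,
        fixity_sha256=digest,
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )

sip = SubmissionInformationPackage(
    identifier="news-2019-0001",
    content=b"<html>archived page snapshot</html>",
    descriptive_metadata={"title": "example news story", "source": "example.com"},
)
aip = ingest(sip)
print(aip.fixity_sha256, aip.ingested_at)

a dissemination information package (dip) would be derived from the aip in a similar way when the archive responds to a consumer request.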
step-by-step systematic approach digital preservation is “the set of processes and activities that ensure long-term, sustained storage of, access to and interpretation of digital information.”7 the growth and decline rates of www content and the importance of the information presented on the web make it a key candidate for preservation. web preservation confronts a number of challenges due to its complex structure, a variety of available formats, and the type of information (purpose) it provides. the overall layout of the web varies domain to domain based on the type of information and its presentation. the websites can be categorized based on two things. first, the type of information (i.e., the web information technology and libraries | march 2019 73 contents) and second, the way this information presented (i.e., the layout or structure of the web page. examples include educational, personal, news, e-commerce, and social networking websites, which vary a lot in their contents and structure. the variations in the overall layout make it difficult to preserve different web contents in a single web archive. the web preservation activities are summarized in figure 1. the following sections explain the web preservation activities and possible implementation in proposed systematic approach. defining the scope of the web archive the www provides an opportunity to share information using various services, such as blogs, social networking websites, e-commerce, wikis, and e-libraries. these websites provide information on a variety of topics and address different communities based on their interest and needs. there are many differences in the way the information is handled and presented on the www. in addition, the overall layout of the web changes from one domain to another domain.8 therefore, it is not practically feasible to develop a single system to preserve all types of websites for the long term. so, before starting to preserve the web, one (the archivist) should define the scope of the web to be archived. the archive will be either a site-centric, topic-centric, or domaincentric archive.9 site-centric archive a site-centric archive focuses on a particular website for preservation. these types of archives are mostly initiated by the website creator or owner. the site-centric web archives allow access to the old versions of the website. topic-centric archive topic-centric archives are created to preserve information on a particular topic published on the web for future use. for scientific verification, researchers need to refer to the available information while it is difficult to ensure access to these contents due to the ephemeral nature of the web. a number of topic-centric archive projects have been performed including the archipol archive of dutch political websites,10 the digital archive for chinese studies (dachs) archive2,11 minerva by the library of congress,12 and the french elections web archive for archiving the websites related to the french elections.13 domain-centric archive the word “domain” refers to a location, network, or web extension. a domain-centric archive covers websites published with a specific domain name dns, using either a top-level domain (tld), e.g., .com, .edu, or .org, or a second-level domain (sld), e.g., .edu.pk or .edu.fr. an advantage of domain-centric archiving is that it can be created by automatically detecting specific websites. 
several projects have a domain-centric scope, e.g., the portuguese web archive (pwa) national websites,14 the kulturarw, a swedish royal library web archive collection of.se and .com domain websites,15 and the uk government web archive collection of uk government websites, e.g., .gov.uk domain websites. understanding the web structure after defining the scope of the intended web archive, the archivist will have a better understanding of the interest and expected queries of the intended community based on the resources available or the information provided by the selected domain. the focus in this step is to understand the type of information (contents) provided by the selected domain and how the information has been presented. the web can be understood by two dimensions. the first systematic approach towards web preservation | khan and ur rahman 74 https://doi.org/10.6017/ital.v38i1.10181 figure 1. systematic approach for web preservation process. information technology and libraries | march 2019 75 considers the web as a medium that communicates contents using various protocols, i.e., http, and the second considers the web as a content container, which further presents the contents to the viewers and not simply contents, e.g. the underlying technology used to display the contents.16 the preservation team should understand such parameters as the technical issues, the future technologies, and the expected inclusion of other related content. identify the web resources the archivist should understand the contents and the representation of the contents of the selected domain, e.g., blogs, social networking websites, institutional websites, educational institutional websites, newspaper websites, or entertainment websites. all of these websites provide different information and address individual communities that have distinct information needs. a web page is the combination of two things, i.e., web contents and web structure.17 the resources which can be preserved are as follows. web contents web contents or web information can be categorized into the following categories: • textual contents (plain text): this category describes textual information that appears on a web page. it does not include links, behaviors, and presentation stylesheets. • visual contents (images): these contents are the visual forms of information or are a complementary material to the information provided in the textual form. • multimedia contents: as another form of information, multimedia contents mainly include audio and video. it may also include animation or even text as a part of a video or a combination of text, audio, and video. web structure web structure can be categorized in the following categories: • appearance (web layout or presentation): this category indicates the overall layout or presentation of a web page. the look and feel of a web page (representation of the contents) are important, which is maintained with different technologies, e.g., html or stylesheets, etc. • behavior (code navigations): categorized by link navigations, these can be within a website or to other websites, external document links or dynamic and animated features, such as live feed, comments, tagging, or bookmarking. identify designated community the archivist should identify the designated community of the intended web archive, their functional requirements and expected queries by analyzing them carefully. 
the designated community means the potential users, such as those who can access the archived web contents for different purposes, i.e., accessing old information that is not available in normal circumstances or referring to an old news article which is not bookmarked properly or retrieving relevant news articles published long ago, etc. prioritize the web resources after a comprehensive assessment of the resources of the selected domain and the identification of potential users’ requirements and expected queries, the archivist should prioritize the web systematic approach towards web preservation | khan and ur rahman 76 https://doi.org/10.6017/ital.v38i1.10181 resources. the complexity of web resources and their representation cause complications in the digital preservation process. generally, it may be undesirable or unviable to preserve all web resources; therefore, it is worthwhile to designate the web resources for preservation. the priority should be assigned on the basis of two things: first, the potential reuse of the resource and second, the frequency with which the resource will be accessed. the resources with no value, little value, or those managed elsewhere can be excluded. for prioritization of resources, the moscow method can be applied.18 the acronym moscow can be elaborated as: m must have, the resource must be preserved or resources that must be a part of the archive and preserved. for example, in the digital news story archive (dnsa), the textual news story must be preserved in the archive because the preservation emphasis is on a textual news story.19 online news contains textual news stories, and many news stories contain associated images, and a fraction of news stories contain associated audio-video contents. s should have, the resource should be preserved if at all possible. almost all the news stories have associated images; a few news stories have associated audio and video that complement it and should be preserved as a part of the news story in the web archive. c could have, the resource could be preserved if it does not affect anything else or is nice to have. the web structure in dnsa depends on the resources to be used for the preservation of news stories; the layout of the newspaper website could (c) be a part of the preservation process if it does not affect anything, e.g., storage capacity and system efficiency. w won’t have, the resource would not be included. archiving multiple versions of the layout or structure of the online newspaper are not worthwhile and hence would not (w) be preserved. the prioritization of these resources is very important in the context of web preservation planning because it does not waste time and energy, and it is the best way to handle users’ requirements and fulfill their expected queries. how to capture the resource(s) the selection of a feasible capturing technique depends on: first, the resources to be captured and second, the capturing task frequency. there are three web resources capturing techniques, i.e., by browser, web crawler, and authoring system. each capturing technique has associated advantages and disadvantages.7 web capturing using browsers the intended web content can be captured using browsers after a web page is rendered when the http transaction occurs. this technique is also referred to as a snapshot or post-rendering technique. the method captures those things which are visible to the users; the behavior and other attributes remain invisible. 
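a post-rendering snapshot of this kind can be scripted with a headless browser. the sketch below is a minimal example using selenium with headless chrome to render a page and save a png image of what the user would see; the target url and output file are placeholders, and it assumes a chrome/chromedriver installation is available to selenium.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def snapshot(url: str, out_path: str) -> None:
    # render the page in a headless browser so scripts and styles are applied first
    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # save only what is visible after rendering; behavior and markup are not kept,
        # which is exactly the limitation of snapshot-style capture noted in the text
        driver.save_screenshot(out_path)
    finally:
        driver.quit()

if __name__ == "__main__":
    snapshot("https://example.com/", "example-snapshot.png")

as discussed next, the resulting image preserves only the rendered appearance; links, behavior, and the underlying structure are lost.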
capturing static contents is one of the disadvantages of web capturing by the browser approach, this approach generally preserved contents in the form of images. it is best for well-organized websites, and commercial tools are available for capturing the web. the following are well-known tools to capture web using browsers. webcapture (https://web-capture.net/) is a free online web-capturing service. it is a fast web page snapshot tool, which can grab web pages in seven different formats, i.e. jpeg, tiff, png, bmp information technology and libraries | march 2019 77 image formats, pdf, svg, and postscript files of high quality. it also allows downloading the intended format in a zip file and is suitable for long vertical web pages with no distortion in layout. a.nnotate (http://a.nnotate.com/), is an online annotating web snapshot tool to keep track of information gathered from the web efficiently and easily. it allows adding tags and notes to the snapshot and building a personal index of web pages as document index. the annotation feature can be used for multiple purposes, for example, compiling an annotated library of objects for organization, sharing commented web pages, product comparison, etc. snagit (https://www.techsmith.com/screen-capture.html) is a well-known snapshot tool for capturing screens with built-in advanced image editing features and screen recording. snagit is a commercial and advanced screen capture tool that can capture web pages with images, linked files, source code, and the url of the web page. acrobat webcapture (file > create > pdf from web page...) creates a tagged pdf file from the web page that a user visits while the adobe pdf toolbar is used for the entire website.20 the capture by a browser technique has the following advantages: • by this technique, the archivist can capture only the displayed contents, and it is an advantage if you need to preserve the displayed contents only. • it is a relatively simple technique for well-organized websites. • commercial tools exist for web capturing using browsers. in addition, the disadvantages are the following: • capturing displayed contents only is a disadvantage if the focus is not on only displayed contents. • it results in frozen contents and treats contents as if they are publications. • it loses the web structure, such as appearance, behavior, and other attributes of the web page. web capturing using an authoring system/server the authoring system capturing technique is used for web harvesting directly from the website hosting server. all the contents, e.g., textual information, images, and source code, are collected from the source web server. the authoring system allows the archivist to preserve the different versions of the website. the authoring system depends on the infrastructure of the content management system and is not a good choice for external resources. the system is best for an owned web server and works well for limited internal purposes. the web curator tool (http://webcurator.sourceforge.net/), pandas (an old british library harvesting tool), and netarchivesuite (https://sbforge.org/display/nas/netarchivesuite) are known tools use for planning and scheduling web harvesting. they can be used by non-technical personnel for both selection and harvesting web content selection policies. these web archiving tools were developed in a collaboration of the national library of new zealand and the british library and are used for the uk web archive (http://www.ariadne.ac.uk/issue50/beresford/). 
the tools can interface with web crawlers, such as heritrix (https://sourceforge.net/projects/archivecrawler/). authoring systems are also referred to as workflow systems or curatorial tools.

the authoring system has the following advantages:
• it is best for web harvesting, which captures everything available.
• it is easy to perform if you have the proper access permission or you own the server or system that holds the resources to be captured.
• it works well for short- to medium-term resources and is feasible for internal access within organizations.

the disadvantages of web capturing using the authoring system are:
• it captures all available raw information, not only the presentation.
• it may be too reliant on the authoring infrastructure or the content management system.
• it is not feasible for long-term resources or for external access from outside the organization.

web capturing using web crawlers

web crawlers are perhaps the most widely used technique for capturing web contents in a systematic and automated manner.21 crawler development requires expertise and experience with different tools, i.e., the strengths and weaknesses of the technologies and the viability of a tool in a specific scenario. the main advantage of crawlers is that they extract embedded content. heritrix, httrack, wget, and deeparc are common examples of web crawlers.

heritrix (https://github.com/internetarchive/heritrix3/wiki) is an open source, freely available web crawler written in java and developed by the internet archive. heritrix is one of the most widely used extensible, web-scale crawlers in web preservation projects. initially, heritrix was developed for purpose-specific crawling of particular websites; it is now a flexible, customizable web crawler for archiving the web.

httrack (https://www.httrack.com/) is a freely available, configurable browser utility. httrack crawls html, images, and other files from a server to a local directory and allows offline viewing of the website. the httrack crawler downloads a complete website from the web server to a local computer system and makes it available for offline viewing with all related link structure, so that browsing the copy feels like using the site online. it also updates the archived websites at the local system from the server and resumes any interrupted previous extractions. httrack is available for both windows and linux/unix operating systems.

wget (http://www.gnu.org/software/wget/) is a freely available, non-interactive command line tool that can easily be combined with other technologies and different scripts. it can capture files from the web using the widely used ftp, ftps, http, and https protocols, and it supports cookies as well. it also updates the archived websites and resumes interrupted extractions. wget is available for both microsoft windows and unix operating systems.

the advantages of web crawling:
• it is widely used among capturing techniques.
• it can capture specific content or everything.
• it avoids some access issues, such as link rewriting and embedded external content, whether served from an archive or live.

disadvantages associated with web crawling:
• much work is required, as well as tools or development expertise and experience, etc.
• the web crawler may not have the right scope: sometimes it does not capture everything that it should, and sometimes it captures too much content.
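to make the crawling idea concrete, the following minimal python sketch uses the requests and beautifulsoup libraries to start from a seed url, stay within a single host (a crude form of scoping), and save each fetched html page to disk. it is an illustration of the technique only, not a substitute for production crawlers such as heritrix or wget; the seed url and output directory are placeholders, and it ignores robots.txt, politeness delays, and non-html resources.

from collections import deque
from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

SEED = "https://example.edu/"          # placeholder seed / entry point
OUT = Path("archive")                  # placeholder output directory
ALLOWED_HOST = urlparse(SEED).netloc   # crude domain-centric scope rule

def crawl(seed: str, limit: int = 50) -> None:
    OUT.mkdir(exist_ok=True)
    queue, seen = deque([seed]), set()
    while queue and len(seen) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        resp = requests.get(url, timeout=30)
        if resp.status_code != 200 or "text/html" not in resp.headers.get("content-type", ""):
            continue
        # store the raw html under a filename derived from the url path
        name = urlparse(url).path.strip("/").replace("/", "_") or "index"
        (OUT / f"{name}.html").write_text(resp.text, encoding="utf-8")
        # extract embedded links and keep only those inside the allowed host
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == ALLOWED_HOST:
                queue.append(link)

if __name__ == "__main__":
    crawl(SEED)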
web content selection policy in the previous steps, the web resources are identified, prioritized based on requirements and expected queries of the designated community, and feasible capturing technique is identified based on capturing frequency. now, the contents need to be prepared and filtered for selection, and a feasible selection approach needs to be selected based on the contents. a web content selection policy helps to determine and clarify, which web contents are required to be captured based on the priorities, the purpose and the scope of web contents already defined.22 the decision of the selection policy comprises the description of the context, the intended users, the access mechanisms and the expected uses of the archive. the selection policy may comprise the selection process and selection approach. the selection process can be divided into subtasks which, in combination, provide a qualitative selection of web contents to a certain extent, i.e., preparation, discovery, and filtering, as shown in figure 2. the main objective of the preparation phase is to determine the targeted information space, the capture technique, capturing tools, extension categorization, granularity level, and the frequency of archiving activity. the best personnel who can provide help in preparation are the domain experts, regardless of the scope of the web archive. the domain experts may be the archivists, researchers, librarians, or any other authentic reference, i.e. a document or a research article. the tools defined in the preparation phase will help to discover intended information in the discovery phase, which can be divided into the following four categories: 1. hubs may be the global directories or topical directories, collection of sites or even a single web page with essential links related to a particular subject or topic. 2. search engines can facilitate discovery by defining a precise query or set of alternative queries related to a topic. the use of specialized search engines can significantly improve the results of discovering related information that can be greatly improved. 3. crawlers can be used to extract web contents such as textual information, images, audio, video and links. moreover, the overall layout of a web page or a whole website can also be extracted in a well-defined systematic manner. 4. external sources may be non-web sources that may be anything, such as printed material for mailing lists, which can be monitored by the selection team. the main objective of the discovery phase is to determine the source of information to be stored the archive. this determination can be achieved by two ways. first, a manually created entry point list is used to determine the list of entry points (usually links) for crawling the collection manually and updating the list during the crawl. there are two discovery methods, i.e., exogenous and endogenous. exogenous discovery is used in manual selection and mostly relies on exploitation of an entry point list for hubs, search engines, and on non-web documents. second, there is an automatically created entry point list to determine the list of entry points by extracting links automatically and obtaining an updated list every time during the crawl. endogenous discovery is systematic approach towards web preservation | khan and ur rahman 80 https://doi.org/10.6017/ital.v38i1.10181 used in automatic selection and relies on the link extraction using crawlers by exploring the entry point list. figure 2. selection process. 
the main objective of the filtering phase is to optimize and make concise the discovered web contents (discovery space). filtering is important in order to collect more specific web content and remove unwanted or duplicated content. usually, for preservation, an automatic filtering method is used; manual filtering is useful if the robots or automatic tools cannot interpret the web. the discovery and filter phase can be combined practically or logically. several evaluation axes can be used for the selection policy (e.g., quality, subject, genre, and publisher). in the literature, we have three known techniques for selecting web content. the selection approach can be either automatic or manual. manual content selection is very rare because it is labor intensive: it requires automatic tools for finding the content, and then manual review of that collection to identify the subset that should be captured. automatic selection policies are used frequently in web preservation projects for web collection, especially for web archives.23 the selection of the collection approach depends on the frequency with which the web content has been preserved in the archive. there are four different selection approaches for web content collection. unselective approach the unselective approach implies collecting everything possible; by specifically using this approach, the whole website and its related domains and subdomains are downloaded to the archive. it is also referred to as automatic harvesting or selection, bulk selection, and domain selection.24 the automatic approach is used in a situation where a web crawler usually performs the collection. for example, the collection of websites from a domain, i.e., .edu means all educational institution websites (at domain level) or the collection of all possible contents/pages from a website (harvesting at website level) by extracting the embedded links. a section of the data preservation community believes that technically it is a relatively cheaper, quicker collection approach and yields a comprehensive picture of the web as a whole. in contrast, its significant drawbacks are that it generates huge unsorted, duplicated, and potentially useless data, consuming too many resources. information technology and libraries | march 2019 81 the swedish royal library’s project kulturarw3 harvests websites at domain level, i.e., collecting websites from a .se domain which is a physically located website in sweden and one of the first projects to adopt this approach.25 usually, national-based web archive initiatives adopt the unselective approach, most notably nedlib, a helsinki university library harvester, and aola, an austrian online archive.26 selective approach the selective approach was adopted by the national library of australia (nla) in the pandas project in 1997. in this approach, a website is included for archiving based on certain predefined strategies and on the access and information provided by the archive. the library of congress’ project minerva and the british library project “britain on the web” are the other known projects that have adopted the selective approach. according to nla, the selected websites are archived based on nla guidelines after negotiation with the owners.27 the inclusion decision could be taken at one of the following levels: • website level: which websites should be included from a selected domain, e.g., to archive all educational websites from high level domain “.pk”. 
• web page level: which web pages should be included from a selected website, e.g., to archive the homepages of all educational websites. • web content level: which type of web contents should be preserved, e.g., to archive all the images from the homepages of educational websites. a selective approach is best if the numbers of websites to be archived are very large or the archiving process is targeting the entire www and wants to narrow down the scope by identifying the resources in which the archivists are more interested. this approach performs implicit or explicit assumptions about the web contents that are not to be selected for preservation. it may be very helpful to initiate a pilot preservation project, which identifies: what is possible? what can be managed? in addition, some tangible results may be obtained easily and quickly in order to enhance the scope of the project in a broader perspective. the selective approach may be based on a predefined criterion or based on an event. selective approach based on criteria involves selecting web resources based on various predefined sets of criteria. nla’s guidance characterizes the criteria-based selective approach as the “most narrowly defined method,” and described it as “thematic selection.” a simple or a complex content-selection criteria can be defined, which depends on the overall goal of preservation. for example, all resources owned by an organization, all resources of one genre, i.e., all programming blogs, resources contributed to a common subject, resources addressing a specific community within an institution, i.e., students or staff, all publications belonging to an individual organization or group of organizations, all resources that may benefit external users or an external user’s community, e.g., historians, or alumni. selective approach based on event involves selecting web resources or websites based on various time-based events. the archivists may focus on websites that address national or international important events, e.g., disasters, elections, and the football world cup, etc. eventbased websites have two characteristics: (1) very frequent updates and (2) website content is lost after a short time, e.g., a few weeks or a few months. for example, the start and end of a term or systematic approach towards web preservation | khan and ur rahman 82 https://doi.org/10.6017/ital.v38i1.10181 academic year, the duration of an activity, e.g., research project, appointment, or departure of a new senior official. deposit approach in the deposit collection approach, the information package is submitted by the administrator or owner of the website which includes a copy of the website with related files that can be accessed through different hyperlinks. the archival information package is applicable to the small collection (of a few websites), or the owner of the website can initiate the preservation project, e.g. a company can initiate a project for preserving their website. the deposit collection approach was adopted by the national archives and records administration (nara) for the collection of us federal agency websites in 2001 and by die deutsche bibliothek (ddb, http://deposit.ddb.de/) for the collection of dissertations and some online publications. 
new digital initiatives are heavily dependent on administrator or owner support and provide an easy way to deposit new content to the repository; for example, in macewan university’s institutional repository, the librarians leading the project tried to offer an easy and effective way for owners to deposit their archival contents.28 combined approach there are advantages and disadvantages associated with each collection approach, and the ongoing debate is which approach is best in a given situation. for example, the deposit approach depends on an agreement with the depositors, which should be inexpensive. the emphasis is on combining the automatic harvesting and selective approaches, as these two approaches are cheaper than the other selection approaches because only a few staff are required and they can cope with the technological challenges. this initiative was taken by the bibliothèque nationale de france (bnf) in 2006. the bnf automatically crawls information regarding updated web pages, stores it in an xml-based “site delta,” and uses page relevancy and importance, similar to how google ranks pages, to evaluate individual pages.29 the bnf used a selective approach for the deep web (that is, web pages or websites that are behind a password or are otherwise not generally accessible to search engines), referred to as the “deposit track.” metadata identification cataloging is required to discover a specific item in a digital collection. an identifier or set of identifiers is required to retrieve a digital record in a digital repository or an archive. for digital documents, this catalog, registration, or identifier is referred to as metadata.30 metadata are structured information concerning resources that helps to describe, locate (discover or place), manage, retrieve (access), and use digital information resources. metadata are often referred to as “data about data” or “information about information,” but it may be more helpful and informative to describe these data as “descriptive and technical documentation.”31 metadata can be divided into the following three categories: 1. descriptive metadata describes a resource for discovery and identification purposes. it may consist of elements for a document such as title, author(s), abstract, and keywords. 2. structural metadata describes how compound objects are put together, for example, how sections are ordered to form chapters. 3. administrative metadata imparts information to facilitate resource management, such as when and how a file was created, who can access the file, its type, and other technical information. administrative metadata is classified into two types: (1) rights management metadata, which addresses intellectual property rights, and (2) preservation metadata, which contains information needed to archive and preserve a resource.32 owing to new information technologies, digital repositories, especially web-based repositories, have grown rapidly over the last two decades. this growth has prompted the digital library community to devise metadata strategies to manage the immense amount of data stored in digital libraries.33 metadata play a vital role in the long-term preservation of digital objects, and it is important to identify the metadata that may help to retrieve a specific object from the archive after preservation.
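as a small illustration of how the three categories just listed might sit together for a single archived web page, the following python structure separates descriptive, structural, and administrative elements. the field names and values are invented for illustration only and do not follow any particular standard.

```python
# an invented record for one archived page, grouping the three metadata categories
archived_page_metadata = {
    "descriptive": {                      # supports discovery and identification
        "title": "homepage of the example university library",
        "creator": "example university library",
        "keywords": ["web archiving", "digital preservation"],
    },
    "structural": {                       # how the compound object fits together
        "parent_site": "https://www.example.edu/",
        "page_order": 1,
        "linked_files": ["style.css", "logo.png"],
    },
    "administrative": {                   # management, rights, and preservation
        "captured": "2019-01-31T10:15:00Z",
        "mime_type": "text/html",
        "rights": "copyright example university",      # rights management metadata
        "checksum_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",  # preservation metadata
    },
}

if __name__ == "__main__":
    for category, fields in archived_page_metadata.items():
        print(category, "->", ", ".join(fields))
```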
according to duff et al., “the right metadata is the key to preserving digital objects.”34 there are hundreds of metadata standards developed over the years for different user environments, disciplines, and purposes; many of them are in their second, third, or nth edition.35 digital preservation and archiving require metadata standards to trace digital objects and to ensure access to them. several of the common standards are briefly discussed below. the dublin core metadata initiative (dcmi, http://dublincore.org/) was initiated at the 2nd world wide web conference in 1994 and was standardized as ansi/niso z39.85 in 2001 and iso 15836 in 2003.36 the main purpose of the dcmi was to define an element set for representing web resources; initially, thirteen core elements were defined, later increased to a fifteen-element set. the elements are optional, repeatable, can follow any order, and are expressed in xml.37 (a minimal example record appears below.) metadata encoding and transmission standard (mets, http://www.loc.gov/standards/mets/) is an xml metadata standard intended to represent information about complex digital objects. mets elements evolved from the early making of america ii (“moa2”) project in 2001; the standard is supported by the library of congress, sponsored by the digital library federation (dlf), and was registered with the national information standards organization (niso) in 2004. a mets document contains seven major sections, each of which contains a different aspect of the metadata.38 metadata object description schema (mods, http://www.loc.gov/standards/mods/) was initiated by the marc21 maintenance agency at the library of congress in 2002. mods elements are richer than dcmi, simpler than the marc21 bibliographic format, and expressed in xml.39 mods identifies the widest facets or features of an object and presents nineteen high-level optional elements.40 visual resources association core strategies (vra core, http://www.loc.gov/standards/vracore/) was developed in 1996, and the current version 4.0 was released in 2007. the vra core is a widely used standard in art, libraries, and archives for such objects as paintings, drawings, sculpture, architecture, and photographs, as well as books and decorative and performance art.41 the vra core contains nineteen elements and nine sub-elements.42 preservation metadata implementation strategies (premis, http://www.loc.gov/standards/premis/), developed in 2005 and sponsored by the online computer library center (oclc) and the research libraries group (rlg), includes a data dictionary and supporting information about preservation metadata. premis defines a set of five interacting core semantic units or entities and an xml schema to support digital preservation activities. it is not concerned with discovery and access but with common preservation metadata; for descriptive metadata, other standards (dublin core, mets, or mods) need to be used. the premis data model contains intellectual entities (contents that can be described as a unit, e.g., books, articles, databases), objects (discrete units of information in digital form, which can be files, bitstreams, or any representation), agents (people, organizations, or software), events (actions that involve an object and an agent known to the system), and rights (assertions of rights and permissions).43 it is indisputable that good metadata improves access to the digital object in the digital repository.
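to make the dublin core description above concrete, here is a minimal, invented record for a single archived web page expressed with the simple dc element set in xml; the values are illustrative only and do not come from any of the projects discussed in this article.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>homepage of the example university library (archived copy)</dc:title>
  <dc:creator>example university library</dc:creator>
  <dc:date>2019-01-31</dc:date>
  <dc:type>Text</dc:type>
  <dc:format>text/html</dc:format>
  <dc:identifier>https://www.example.edu/library/</dc:identifier>
  <dc:subject>web archiving</dc:subject>
  <dc:subject>digital preservation</dc:subject>
  <dc:language>en</dc:language>
</metadata>
```

note that elements such as dc:subject may repeat, any element may be omitted, and the order is not significant, as the standard allows.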
therefore, the creation and selection of appropriate metadata make the web archive accessible to the archive user. structural metadata helps to manage the archival collection internally, as well as the related services, but may not always help to discover the primary source of the digital object.44 currently, there are many semi-automatic metadata generation tools. the use of these semi-automatic tools for generating metadata is crucial for the future, considering the complexity and cost of manual metadata origination.45 archival format the web archive initiatives select websites for archiving based on the relevance of the contents and the intended audience of the archived information. the size of web archives varies significantly depending on their scope and the type of content they are preserving, e.g., web pages, pdf documents, images, audio, or video files.46 to preserve these contents, a web archive uses different storage formats containing metadata and utilizes data compression techniques. the internet archive defined the arc format (http://archive.org/web/researcher/arcfileformat.php), later used as a de facto standard. in 2009, the international organization for standardization (iso) established the warc format (https://goo.gl/0rbwsn) as an official standard for web archiving. approximately 54 percent of web archive initiatives have applied the arc and warc formats for archiving. the use of standard formats helps archivists facilitate the creation of collaborative tools, such as search engines and ui utilities, to efficiently manipulate the archived data.47 information dissemination mechanisms a well-defined preservation process can lead to a well-organized web archive that is easy to maintain and from which a specific digital object can easily be retrieved using information dissemination techniques. poor search results are one of the main problems in the information dissemination of web archives: the users of a web archive spend excessive time retrieving the documents or information that satisfy their queries. archivists are more concerned with “ofness,” “what collections are made up of,” although archive users are concerned with “aboutness,” “what collections are about.”48 to use the full potential of web archives, a usable interface is needed to help the user search the archive for a specific digital object. full-text and keyword search are the dominant ways to search an unstructured information repository, as is evident from online search engines. the sophistication of search results against user queries depends on the ranking tools.49 access tools and techniques are getting the attention of researchers, and approximately 82 percent of european web archives concentrate on such tools, which makes these web archives easily accessible.50 the lucene full-text search engine and its extension nutchwax are widely used in web archiving.
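access tools like those mentioned above typically read the standard warc containers directly. as a generic illustration (not nutchwax or any specific initiative's code), the following sketch uses the open-source warcio library for python to iterate over a warc file and list the archived urls; the file name is a placeholder.

```python
from warcio.archiveiterator import ArchiveIterator  # pip install warcio

def list_archived_urls(warc_path: str) -> list[str]:
    """return the target urls of all response records in a warc file."""
    urls = []
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == "response":
                urls.append(record.rec_headers.get_header("WARC-Target-URI"))
    return urls

if __name__ == "__main__":
    for url in list_archived_urls("example.warc.gz"):
        print(url)
```

because arc and warc are standardized, the same small reader works across archives from different initiatives, which is exactly the collaborative benefit the formats are meant to provide.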
moreover, for collections whose semantic descriptions already rely on or are implicit within their descriptive metadata, reasoning-based or semantic searching of the archival collection can enable the system to offer novel possibilities for archival content retrieval and browsing.51 even in the current era of digital archives, mobile services are being adopted in digital libraries; e.g., access to e-books, library databases, catalogs, and text messaging are common mobile services offered in university libraries.52 in a massive repository, a user query retrieves millions of documents, which makes it difficult for users to identify the most relevant information. to overcome this problem, a ranking model estimates the relevancy of the results to the user’s query using specified criteria and sorts the results, placing the most relevant result at the top.53 a number of ranking models exist in the literature, e.g., conventional ranking models (such as tf-idf and bm25f), temporal ranking models (such as pagerank), and learning-to-rank models (such as l2r). the findings of the systematic approach for web preservation are used to automate the process of digital news-story preservation. the steps of the proposed model are carefully adopted to develop a tool that is able to add contextual information to the stories to be preserved. digital news stories preservation framework the advancement of web technologies and the maturation of the internet attract news readers to access news online, provided by multiple sources, and to obtain the desired information comprehensively. the amount of news published online has grown rapidly, and for an individual it is cumbersome to browse through all online sources for relevant news articles. news generation in the digital environment is no longer a periodic process with a fixed single output, such as printed newspapers; the news is instantly generated and updated online in a continuous fashion. however, for different reasons, such as the short lifespan of digital information and the speed at which information is generated, it has become vital to preserve digital news for the long term. digital preservation includes various actions to ensure that digital information remains accessible and usable for as long as it is considered important.54 libraries and archives carefully digitize newspapers, which are considered a good source for knowing history. many approaches have been developed to preserve digital information for the long term. the lifespan of news stories published online varies from one newspaper to another, i.e., from one day to a month. although a newspaper may be backed up and archived by the news publisher or a national archive, in the future it will be difficult to access particular information published in various newspapers regarding the same news story. the issues become even more complicated if a story is to be tracked through an archive of many newspapers, which requires different access technologies. the digital news story preservation (dnsp) framework was introduced to preserve digital news articles published online from multiple sources.55 the dnsp framework is planned based on adopting the proposed step-by-step systematic approach for web preservation to develop a well-organized web archive. initially, the main objectives defined for the dnsp framework are: • to initiate a well-organized national-level digital news archive of multiple news sources.
• to normalize news articles during preservation to a common format for future use. • to extract explicit and implicit metadata, which would be helpful in ingesting stories to the archive and browsing through the archive in the future. • to introduce content-based similarity measures to link digital news articles during preservation. the digital news story extractor (dnse) is a tool developed to facilitate the extraction of news stories from online newspapers and to migrate them to a normalized format for preservation. the normalization also includes a step to add metadata to the digital news stories archive (dnsa) for future use.56 to facilitate the accessibility of news articles preserved from multiple sources, some mechanisms need to be adopted for linking the archived digital news articles. an effective term-based approach, the “common ratio measure for stories” (crms), is introduced for linking digital news articles in the dnsa; it links similar news articles during the preservation process.57 the approach is empirically analyzed, and the results of the proposed approach are compared to obtain conclusive arguments. the initial results computed automatically using the common ratio measure for stories are encouraging and are compared with the similarity of news articles based on human judgment. the results are generalized by defining a threshold value based on multiple experimental results using the proposed approach. currently, there is ongoing work to extend the scope of the dnsa to dual languages, i.e., urdu and english, as well as content-based similarity measures to link news articles published in urdu and english. moreover, research is underway to develop tools that exploit the linkage created among stories during the preservation process for search and retrieval tasks. summary effective strategic planning is critical in creating web archives; hence, it requires a well-understood and well-planned preservation process. the process should result in a well-organized web archive that includes not only the content to be preserved but also the contextual information required to interpret the content. the study attempts to answer many questions by guiding the archivists and related personnel, such as: how to lead the web preservation process effectively? how to initiate the preservation process? how to proceed through the different steps? what are the possible techniques that may help to create a well-organized web archive? how can the archived information be used to its greatest potential? to answer these questions, the study resulted in an appropriate step-by-step process for web preservation and a well-organized web archive. the targeted goal of each step is identified by researching the existing approaches that can be adopted. the possible techniques for those approaches are discussed in detail for each step. references 1 “world wide web size,” the size of the world wide web, visited on jan 31, 2019, http://www.worldwidewebsize.com/. 2 brian f. lavoie, “the open archival information system reference model: introductory guide,” microform & imaging review 33, no. 2 (2004): 68-81; alexandros ntoulas, junghoo cho, and christopher olston, “what's new on the web? the evolution of the web from a search engine perspective,” in proceedings of the 13th international conference on world wide web-04 (new york, ny: acm, 2004), 1-12.
information technology and libraries | march 2019 87 3 teru agata et al., “life span of web pages: a survey of 10 million pages collected in 2001,” ieee/acm joint conference on digital libraries, (ieee, 2014), 463-64, https://doi.org/10.1109/jcdl.2014.6970226. 4 timothy robert hart and denise de vries, “metadata provenance and vulnerability,” information technology and libraries 36, no. 4 (dec. 2017): 24-33, https://doi.org/10.6017/ital.v36i4.10146. 5 claire warwick et al., “library and information resources and users of digital resources in the humanities,” program 42, no. 1 (2008): 5-27, https://doi.org/10.1108/00330330810851555. 6 lavoie, “open archival information system reference model.” 7 susan farrell, k. ashley, and r. davis, “a guide to web preservation,” practical advice for web and records managers based on best practices from the jisc-funded powr project (2010), https://jiscpowr.jiscinvolve.org/wp/files/2010/06/guide-2010-final.pdf. 8 lavoie, “open archival information system reference model;” farrell, ashley, and davis, “guide to web preservation.” 9 peter lyman, “archiving the world wide web,” washington, library of congress (2002), https://www.clir.org/pubs/reports/pub106/web/. 10 diomidis spinellis, “the decay and failures of web references,” communications of the acm 46, no. 1 (2003): 71-77, https://dl.acm.org/citation.cfm?doid=602421.602422. 11 digital archive for chinese studies (dachs) archive2 https://www.zo.uniheidelberg.de/boa/digital_resources/dachs/index_en.html, visited on jan 31, 2019. 12 julien masanès, “web archiving methods and approaches: a comparative study,” library trends 54, no. 1 (2005): 72-90, https://doi.org/10.1353/lib.2006.0005. 13 hanno lecher, “small scale academic web archiving: dachs,” in web archiving (berlin/heidelberg: springer, 2006), 213-25, https://doi.org/10.1007/978-3-540-463320_10. 14 daniel gomes et al., “introducing the portuguese web archive initiative,” in 8th international web archiving workshop (berlin/heidelberg: springer, 2009). 15 gerrit voerman et al., “archiving the web: political party web sites in the netherlands,” european political science 2, no. 1 (2002): 68-75, https://doi.org/10.1057/eps.2002.51. 16 sonja gabriel, “public sector records management: a practical guide,” records management journal 18, no. 2 (2008), https://doi.org/10.1108/00242530810911914. 17 farrell, ashley, and davis, “guide to web preservation.” systematic approach towards web preservation | khan and ur rahman 88 https://doi.org/10.6017/ital.v38i1.10181 18 jung-ran park and andrew brenza, “evaluation of semi-automatic metadata generation tools: a survey of the current state of the art,” information technology and libraries 34, no. 3 (sept, 2015): 22-42, https://doi.org/10.6017/ital.v34i3.5889. 19 muzammil khan and arif ur rahman, “digital news story preservation framework,” in digital libraries: providing quality information: 17th international conference on asia-pacific digital libraries, icadl 2015 seoul, korea, december 9-12, 2015 (proceedings, vol. 9469, springer, 2015), 350-52, https://doi.org/10.1007/978-3-319-27974-9; muzammil khan, “using text processing techniques for linking news stories for digital preservation,” phd thesis, faculty of computer science, preston university kohat, islamabad campus, hec pakistan, 2018. 20 dennis dimick, “adobe acrobat captures the web,” washington apple pi journal (1999): 23-25. 21 trupti udapure, ravindra d. kale, and rajesh c. 
dharmik, “study of web crawler and its different types,” iosr journal of computer engineering (iosr-jce) 16, no. 1 (2014): 01-05, https://doi.org/10.9790/0661-16160105. 22 dora biblarz et al., “guidelines for a collection development policy using the conspectus model,” international federation of library associations and institutions, section on acquisition and collection development (2001). 23 farrell, ashley, and davis, “guide to web preservation;” e. pinsent et al., “powr: the preservation of web resources handbook,” http://jisc.ac.uk/publications/programmerelated/2008/powrhandbook.aspx (2010); michael day, “preserving the fabric of our lives: a survey of web preservation initiatives,” lecture notes in computer science (berlin/heidelberg: springer, 2003): 461-72, https://doi.org/10.1007/978-3-540-45175-4_42. 24 pinsent et al., “powr:”; day, “preserving the fabric.” 25 allan arvidson, “the royal swedish web archive: a complete collection of web pages,” international preservation news (2001): 10-12. 26 andreas rauber, andreas aschenbrenner, and oliver witvoet, “austrian online archive processing: analyzing archives of the world wide web,” research and advanced technology for digital libraries (2002): ecdl 2002. lecture notes in computer science, vol 2458, (berlin/heidelberg: springer, 2002), 16-31, https://doi.org/10.1007/3-540-45747-x_2. 27 william arms, “collecting and preserving the web: the minerva prototype,” rlg diginews 5, no. 2 (2001). 28 sonya betz and robyn hall, “self-archiving with ease in an institutional repository: micro interactions and the user experience,” information technology and libraries 34, no. 3 (sept. 2015): 43-58, https://doi.org/10.6017/ital.v34i3.5900. 29 serge abiteboul et al., “a first experience in archiving the french web,” in international conference on theory and practice of digital libraries, (berlin/heidelberg: springer, 2002), 115, https://doi.org/10.1007/3-540-45747-x_1; sergey brin and lawrence page, “reprint of: information technology and libraries | march 2019 89 the anatomy of a large-scale hypertextual web search engine,” computer networks 56, no. 18 (2012): 3825-33, https://doi.org/10.1016/j.comnet.2012.10.007. 30 masanès, “web archiving.” 31 niso-press, “understanding metadata,” national information standards (2004), http://www.niso.org/publications/understanding-metadata. 32 ibid. 33 jane greenberg, “understanding metadata and metadata schemes,” cataloging & classification quarterly 40, no. 3-4 (2009): 17-36, https://doi.org/10.1300/j104v40n03_02. 34 michael day, “preservation metadata initiatives: practicality, sustainability, and interoperability,” publishers: archivschule marburg (2004): 91-117. 35 jenn riley, glossary of metadata standards (2010). 36 corey harper, “dublin core metadata initiative: beyond the element set,” information standards quarterly 22, no. 1 (2010): 20-31. 37 jane greenberg, “dublin core: history, key concepts, and evolving context (part one),” in slide presentation on dc-2010 international conference on dublin core and metadata applications pittsburgh, pa (2010). 38 cundiff v. morgan, “an introduction to the metadata encoding and transmission standard (mets),” library hi tech 22, no. 1 (2004): 52-64, https://doi.org/10.1108/07378830410524495; leta negandhi, “metadata encoding and transmission standard (mets),”in texas conference on digital libraries, tcdl-2012 (2012). 39 sally h. mccallum, “an introduction to the metadata object description schema (mods),” library hi tech 22, no. 
1 (2004): 82-88, https://doi.org/10.1108/07378830410524521. 40 r. gartner, “mode: metadata object description schema,” jisc techwatch report tsw (2003): 03-06. www.loc.gov/standards/mods/. 41 vra-core, “an introduction of vra core,” http://www.loc.gov/standards/vracore/vra core4 intro.pdf, created: oct 2014. 42 vra-core, “vra core element outline,” http://www.loc.gov/standards/vracore/vra core4 outline.pdf, created: feb 2007. 43 priscilla caplan, “understanding premis,” washington dc, usa: library of congress, (2009), https://www.loc.gov/standards/premis/understanding-premis.pdf; j. relay, “an introduction to premis,” singapore ipress tutorial, (2011), http://www.loc.gov/standards/premis/premistutorial ipres2011 singapore.pdf. systematic approach towards web preservation | khan and ur rahman 90 https://doi.org/10.6017/ital.v38i1.10181 44 jennifer schaffner, “the metadata is the interface: better description for better discovery of archives and special collections, synthesized from user studies,” making archival and special collections more accessible, 85 (2015). 45 joao miranda and daniel gomes, “trends in web characteristics,” in web congress, 2009. laweb'09. latin american, (ieee, 2009), 146-53, https://doi.org/10.1109/la-web.2009.28. 46 daniel gomes, joão miranda, and miguel costa, “a survey on web archiving initiatives,” research and advanced technology for digital libraries (2011): 408-20, https://doi.org/10.1007/978-3-642-24469-8_41. 47 ibid. 48 schaffner, “metadata is the interface.” 49 miguel costa and mário j. silva, “evaluating web archive search systems,” in international conference on web information systems engineering (berlin/heidelberg: springer, 2012), 440454. https://doi.org/10.1007/978-3-642-35063-4_32. 50 foundation, i, “web archiving in europe,” technical report, commercenet labs (2010). 51 georgia solomou and dimitrios koutsomitropoulos, “towards an evaluation of semantic searching in digital repositories: a dspace case-study,” program 49, no. 1 (2015): 63-90, https://doi.org/10.1108/prog-07-2013-0037. 52 liu yan quan and sarah briggs, “a library in the palm of your hand: mobile services in top 100 university libraries,” information technology and libraries 34, no. 2 (june 2015): 133, https://doi.org/10.6017/ital.v34i2.5650. 53 ricardo baeza-yates and berthier ribeiro-neto, modern information retrieval 463. (new york: acm pr., 1999). 54 daniel burda and frank teuteberg, “sustaining accessibility of information through digital preservation: a literature review,” journal of information science, 39, no. 4 (2013): 442-58, https://doi.org/10.1177/0165551513480107. 55 muzammil khan et al., “normalizing digital news-stories for preservation,” in digital information management (icdim), 2016 eleventh international conference on (ieee, 2016), 8590, https://doi.org/10.1109/icdim.2016.7829785. 56 khan, et al., “normalizing digital news.” 57 muzammil khan, arif ur rahman, and m. daud awan, “term-based approach for linking digital news stories,” in italian research conference on digital libraries (cham, switzerland: springer, 2018), 127-38, https://doi.org/10.1007/978-3-319-73165-0_13. letter from the editor: a blank page letter from the editor a blank page kenneth j. varnum information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.12405 nothing is as daunting as a blank page, particularly now. as i sat down to write this issue’s letter, i was struck by how much fundamental uncertainty is in our lives, so much trauma. 
a blank page can emphasize our concerns that the old familiar should return at all, or that a new, better, normal will emerge. at the same time, a blank page can be liberating at a time when so much of our social, professional, and personal lives needs to be reconceptualized and reactivated in new, healthier , more respectful and inclusive ways. we are collectively faced with two important societal ailments. the first is the literal disease of the covid-19 pandemic that has been with us for only months. the other is the centuries-long festering disease of racial injustice, discrimination, and inequality that typifies (particularly, but not uniquely) american society. while some of us may be in better positions to help heal one or the other of these two ailments, we can all do something in both, as different as they are. lend emotional support to those in need of it, take part in rallies if your personal health and circumstances allow, and advocate for change to government officials at all levels from local to national. learn about the issues and explore ways you can make a difference on either or both fronts. i hope i am not being foolish or naive when i say i believe the blank page before us as a society will be liberating: an opportunity to shift ourselves toward a better, more equitable, more just path. * * * * * * to rephrase humphrey bogart’s rick blaine in casablanca, “it doesn’t take much to see that the problems of three little people library association divisions don’t amount to a hill of beans in this crazy world.” but despite the small global impact of our collective decision, i am glad our alcts, llama, and lita colleagues chose a united future as core: leadership, infrastructure, futures. watch for more information about what the merged division means for our three divisions and this journal in the months to come. sincerely, kenneth j. varnum, editor varnum@umich.edu june 2020 https://core.ala.org/ mailto:varnum@umich.edu letter from the editor kenneth j. varnum information technology and libraries | june 2019 1 https://doi.org/10.6017/ital.v38i2.11241 welcome to the june 2019 issue of ital. you’ll likely notice a new look to the journal when you read this issue’s content. our helpful and supportive partners at boston college, where information technologies and libraries is archived, have updated the journal’s content management system to the current version of open journal systems. i am grateful to john o’connor at boston college for his patience with and quick, helpful responses to my numerous questions as we adapted to the new user interface and editorial workflows. columns in this issue include bohyun kim’s final “president’s message” as her term concludes, summarizing the work that has gone into the planned division merger that would combine lita, alcts, and llama. editorial board member cinthya ippoliti discusses the role of libraries in fostering digital pedagogy in her “editorial board thoughts” column. and, in the second of our new “public libraries leading the way” column, jeffrey davis discusses the technologies and advantages of digital pass systems. peer-reviewed articles in this issue include: • “no need to ask: creating permissionless blockchains of metadata records,” by dejah rubel, laying a path for using blockchain for managing metadata. 
• “50 years of ital/jla: a bibliometric study of its major influences, themes, and interdisciplinarity,” by brady lund, a thorough study of how our journal has influenced, and been influenced by, other leading information technology journals. • “weathering the twitter storm: early uses of social media as a disaster response tool for public libraries during hurricane sandy,” by sharon han. this article is the 2019 lita/ex libris student writing award-winning paper. • “‘good night, good day, good luck’: applying topic modeling to chat reference transcripts,” by megan ozeran and piper martin, describing a process to categorize chat reference themes using topic mapping software. • “information security in libraries: examining the effects of knowledge transfer,” by tonia san nicolas-rocca and richard j burkhard, investigating the importance of knowledge transfer across an organization to enhance information security behaviors. • “wikidata: from ‘an’ identifier to ‘the’ identifier,” by theo van veen, describing how libraries could use wikidata as a source of linked open data. thank you to this issue’s authors, and all of information technology and libraries’ readers for supporting peer-reviewed, open-access, scholarly publishing. in closing, i would like to thank the members of the editorial board whose terms are ending june 30: patrick “tod” colegrove, joseph deodato, richard guajardo, and frank cervone. i’m grateful to these four individuals, upon whom i’ve relied for their excellent advice and guidance in steering ital’s course. we are in the process of appointing new editorial board members with two-year terms starting on july 1, and i’ll introduce them in the next issue. kenneth j. varnum, editor varnum@umich.edu june 2019 the “black box”: how students use a single search box to search for music materials kirstin dougan information technology and libraries | december 2018 81 kristin dougan (dougan@illinois.edu) is head, music and performing arts library, university of illinois. abstract given the inherent challenges music materials present to systems and searchers (formats, title forms and languages, and the presence of additional metadata such as work numbers and keys), it is reasonable that those searching for music develop distinctive search habits compared to patrons in other subject areas. this study uses transaction log analysis of the music and performing arts module of a library’s federated discovery tool to determine how patrons search for music materials. it also makes a top-level comparison of searches done using other broadly defined subject disciplines’ modules in the same discovery tool. it seeks to determine, to the extent possible, whether users in each group have different search behaviors in this search environment. the study also looks more closely at searches in the music module to identify other search characteristics such as type of search conducted, use of advanced search techniques, and any other patterns of search behavior. introduction music materials have inherent qualities that present difficulties to the library systems that describe them and to the searchers who wish to find them. this can be exemplified in three main areas: formats, titles, and relationships. first, printed music comes in multiple formats such as full scores, vocal scores, study scores, and parts; and in multiple editions such as facsimiles, scholarly editions, performing editions (of various caliber); each format and edition serving a different purpose or need. 
related to this, but less problematic, is the variety of sound recording formats available. second, issues resulting from titling practices abound in music, ranging from frequent use of foreign terms, not just in descriptive titles (l'oiseau de feu = zhar-ptitsa = the firebird = feuervogel), but in generic titles as translated by various publishers from different countries (symphony=sinfonie). additionally, musical works often have only generic genre titles enhanced by key and work number metadata, for example symphony no. 1 in c minor. third, music materials present a relationship issue best defined as “one-to-many.” musical works often have multiple sections or songs in them (an aria in an opera or a movement in a symphony), and a cd or a score anthology may contain multiple pieces of music. given these three main challenges presented by music materials, it is possible that those searching for music develop distinctive search habits compared to patrons in other subject areas. this study uses transaction log analysis of the music and performing arts module of a library’s federated discovery tool to determine how patrons search for music materials. it also makes a top-level comparison of searches done using other broadly defined subject disciplines’ modules in the same discovery tool. it seeks to determine, to the extent possible, whether users in each group have different search behaviors in this search environment. the study also looks more closely at mailto:dougan@illinois.edu the “black box” | dougan 82 https://doi.org/10.6017/ital.v37i4.10702 searches in the music module to identify other search characteristics such as type of search conducted, use of advanced search techniques, and any other patterns of search behavior. background since fall 2007 the university of illinois library has had easy search (es), a locally developed search tool designed to aid users in finding results from multiple catalog, a&i, and reference targets quickly and simultaneously. there is a “main” es on the library’s main gateway page that searches a variety of cross-disciplinary tools (see figure 1). figure 1. gateway easy search. on the gateway, users have the option of selecting one of the format tabs to narrow their search to books, articles, journals, or media. when the data for this study was gathered, the journals tab was not present. starting in 2010 many of the subject and branch libraries in the university library created their own es modules with target resources specific to the disciplinary areas they serve. search boxes for these es subject modules are often displayed right on the branch library’s home page, but users can also select these subject module options from the dropdown in the main es (see figure 2). information technology and libraries | december 2018 83 figure 2. gateway dropdown subject choices. the mpal es interface as it appears on the mpal homepage can be seen in figure 3—it was created in 2011. figure 3. mpal easy search interface. the “black box” | dougan 84 https://doi.org/10.6017/ital.v37i4.10702 es is a federated search tool and does not have a central index like most current discovery layer tools. rather, it utilizes broadcast search technologies to target different tools and search them directly. while the gateway es now uses a “bento box” layout to display selected citations from each target, in the first iterations of the tool and still today in the subject modules, users are simply presented with a list of hit counts in each of the target tools (see figures 4 and 5). 
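as a rough illustration of the broadcast-search pattern described in this background section, in which one query is sent to several targets at once and a per-target hit count is reported back, here is a small python sketch. the target names and stub functions are invented placeholders, and this is not the actual easy search implementation; a real system would call each target's own search api over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# stub target searchers; real ones would query a catalog, an a&i database, etc.
def search_catalog(query: str) -> int:
    return 42        # pretend hit count

def search_ai_index(query: str) -> int:
    return 7

def search_reference(query: str) -> int:
    return 3

TARGETS = {
    "library catalog": search_catalog,
    "a&i index": search_ai_index,
    "reference source": search_reference,
}

def broadcast_search(query: str) -> dict[str, int]:
    """send the same query to every target concurrently and collect hit counts."""
    with ThreadPoolExecutor(max_workers=len(TARGETS)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in TARGETS.items()}
        return {name: fut.result() for name, fut in futures.items()}

if __name__ == "__main__":
    print(broadcast_search("ligeti requiem"))
```

the design choice this illustrates is that there is no central index: each target is searched live, so the interface can only present whatever counts or citations the targets return at query time.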
figure 4. mpal easy search display screen part 1. figure 5. mpal easy search results screen part 2. information technology and libraries | december 2018 85 not shown in the screen captures are the results from various newspaper indexes and reference sources such as oxford music online and the international encyclopedia of dance. literature review many studies have examined patron search behavior using transaction log analysis and other methods over the past few decades. since the appearance of google in 1998, and its vast impact on individuals’ expectations and search behavior, recent studies have looked at user search behavior in tools that initially present a single search box. additional studies have looked at disciplinespecific searching behaviors. general search studies and single search boxes many advantages and disadvantages have been ascribed to tools with single search boxes (whether federated search tools or discovery layers), namely ease and convenience on the one hand, and the lack of precision possible in searching and overwhelming number of results on the other. two companion articles, boyd et al. and dyas-correira, written ten years apart, attempted to visit and revisit these issues.1 results and patron satisfaction can vary based on size of library and number of resources accessed by these tools. these types of tools will never be able to search and display everything and that problem is magnified by the number of resources a library has. holman, porter, and zimerman discovered in independent studies that undergraduates do not search very efficiently or effectively and find library tools difficult to use.2 avery and tracy also found this true for the es tool under discussion in this study: the generation of keywords by many students indicates they often struggled to identify alternative terminology that may have resulted in a more successful search . . . . many students exhibited persistence in their searching, but the selection of search terms, sometimes compounded by spelling problems or problems in search string structure, likely did not yield the most relevant results.3 asher, duke, and wilson state in their study comparing student search strategies and success across a variety of library search tools and google that there were “strong patterns in the way students approached searches no matter which tool they used. . . . students treated almost every search box like a google search box, using simple keyword searches in 81.5 percent (679/829) of the searches observed.”4 dempsey and valenti note students’ infrequent use of limiters such as “peer-review” and “date” in eds, the high non-use and misuse rates of quotation marks, relatively low instances of repeated uncorrected spelling errors, and variance patterns in keyword usage.5 students like federated search tools and discovery layers because of their convenience and ease, as found in studies by armstrong, belliston, and williams et al.6 this is reiterated in asher et al., “despite the fact that they did not necessarily perform better on the research tasks in this study, students did prefer google and google-like discovery tools because they felt that they could get to full-text resources more quickly.”7 this one-box approach could hinder students, as described by swanson and green: the search box became an obstacle in other questions where it should not have been used. in some cases, the search box was viewed as an all-encompassing search of the entire site. 
the “black box” | dougan 86 https://doi.org/10.6017/ital.v37i4.10702 several students searched for administrative information, research guides, and podcasts in this box.8 lown et al. also found that users hope to access a vast range of information via a single search box. “one lesson is that library search is about more than articles and the catalog. about 23 percent of use of quicksearch took place outside either the catalog or articles modules, indicating that ncsu library users attempt to access a wide range of information from the single search box.”9 search and library use in different disciplines in their study comparing a discovery layer and subject-specific tools, dahlen and hanson found “subject-specific indexing and abstracting databases still play an important role for libraries that have adopted discovery layers. discovery layers and subject-specific indexing and abstracting databases have different strengths and can complement each other within the suite of library resources.”10 they also observed things iterated by previous authors, chiefly that “not all students prefer discovery tools” and “the tools that students prefer may not be those that give them the best results.”11 in addition, they found that default configuration matters in terms of students’ success in and preference for a given tool. fu and thomes found that creating smaller disciplinespecific subsets in discovery tools was beneficial to searchers by reducing results and in creasing the results’ relevance.12 few studies investigate how music students search for music materials. dougan found in her observational study of music students’ search behaviors that they have difficulty forming good searches; misuse quotation marks and other search elements; and at times struggle with finding music materials.13 mayer noted upper-class music students’ frustration with using library tools to find specific works of music, going so far as to state, “the music students agreed that both the discovery layer and the catalog are not effective for music-related searching, for any format.”14 clark and yeager found that students had an easier time searching for media items than music scores, and frequently struggled with search strategy revisions.15 there is more research on the larger information needs of disciplines and creating models for research behavior, and not necessarily specific search processes or constructions.16 whitmire, in her 2002 pre-google article, found that students majoring in the social sciences were engaged in information-seeking behaviors at a higher rate than students majoring in engineering.17 chrzastowski and joseph surveyed graduate students at the university of illinois at urbana– champaign and found that those in the life sciences, physical sciences, and engineering visited the libraries less often than students in other academic disciplines.18 students in the arts and humanities used the library more often than students in other disciplines. collins and stone report that in prior studies of users across different disciplines, arts and humanities users do not account for the biggest users of library materials, their survey found the opposite to be true. 19 when looking at the various student populations in their study, musicians had the highest library usage in terms of items borrowed and almost the highest number of library visits. 
music users in the study also showed high numbers of hours logged into the library e-resources and the highest number of e-resources accessed compared to others in their discipline group (but not as much as other disciplines). however, they show a low number of pdf downloads and a low number of e-resources accessed frequently. methodology this study conducted quantitative analysis of easy search (es) data as a whole and from a selection of the subject modules, including the music and performing arts library (mpal) module, using data from the period june 20, 2014 through june 16, 2015. additional quantitative and qualitative analysis was conducted only on the mpal es transaction log data. data from the following subject modules were included in comparative analyses: • funk agricultural, consumer and environmental sciences library (aces) (http://www.library.illinois.edu/funkaces/) • grainger engineering library (http://search.grainger.illinois.edu/top/) • history, philosophy, and newspaper library (hpnl) (http://www.library.illinois.edu/hpnl/) • music and performing arts library (mpal) (http://www.library.illinois.edu/mpal/) • social science, health, and education library (sshel) (http://www.library.illinois.edu/sshel/) • undergraduate library (ugl) (http://www.library.illinois.edu/ugl/) each of these libraries has a search box for es on its home page that is customized to the search targets identified as best for those subject areas by the subject librarians in that library. transaction log data on searches done in es is continuously compiled in a sql database, and queries were written to determine certain quantitative measures. searches done in these various subject modules were isolated by a variable in the sql data that indicates whether the search was done in the main gateway es, in the main gateway es but using one of the subject dropdown choices, or from the subject es box directly on that library’s homepage. searches in the six subject modules listed above and in the main es were assessed for the average number of searches per session and the average words per search. further analysis of searches done in the mpal es module used 25,503 sessions conducted on mpal public computers from march 21, 2014 to june 21, 2015, which is a slightly longer timespan than used for the comparative analysis between subject es modules described above. to make this more manageable, only every tenth session was considered, meaning 2,550 sessions were analyzed out of the full set of mpal data. searches were sorted by session id number, which is assigned to each session when a new session is begun. this method kept all strings from one session together, whereas simply sorting by date and string id did not, since multiple sessions can occur simultaneously. a session is a series of user actions (searches and click-throughs) from the same workstation in which there is less than a twenty-minute pause between actions. if there are user actions from the same workstation after a twenty-minute pause, a new session is established; therefore, there is the possibility that some of the sequential sessions were from the same user, but there is no easy way to determine that.
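the session rule just described (a new session after a twenty-minute pause on the same workstation), the every-tenth-session sample, and the two summary measures can be expressed in a short sketch. the log record layout and field names below are invented for illustration and are not the library's actual sql schema or code.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=20)

# each log row is assumed to be (workstation_id, timestamp, query_string)
def build_sessions(rows):
    """group rows into sessions: same workstation, gaps under twenty minutes."""
    sessions, last_seen = [], {}
    for station, ts, query in sorted(rows, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(station)
        if prev is None or ts - prev > SESSION_GAP:
            sessions.append([])            # start a new session for this workstation
        sessions[-1].append(query)
        last_seen[station] = ts
    return sessions

def summarize(sessions):
    sample = sessions[::10]                # every tenth session, as in the study
    searches = [q for s in sample for q in s]
    avg_per_session = len(searches) / len(sample) if sample else 0
    avg_words = sum(len(q.split()) for q in searches) / len(searches) if searches else 0
    return avg_per_session, avg_words

if __name__ == "__main__":
    rows = [("pc1", datetime(2015, 3, 1, 10, 0), "ligeti requiem"),
            ("pc1", datetime(2015, 3, 1, 10, 5), "ligeti requiem score"),
            ("pc1", datetime(2015, 3, 1, 11, 0), "mahler symphony 1")]
    print(summarize(build_sessions(rows)))
```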
the mpal data set was assessed using the following quantitative measures: 1) average number of searches per session and whether the session contained a) a single search, b) multiple searches for the same thing (either repeated exactly or varied), or c) multiple strings searching for multiple things; 2) average number of search terms per search; 3) type of search by index (title/author/keyword) or other advanced search; 4) use of boolean operators, quotation marks, parentheses, etc.; 5) use of work or opus numbers or key indications; and 6) searches indicating format (score, cd, etc.). findings comparing the data for searches done in the main es to some of the subject modules (see table 1) shows that the ugl es and the hpnl es have the fewest average searches per session, and the mpal es has the third highest average number of searches per session. the sciences tend to have higher average words per search string values, while mpal has the second lowest average number of words per search. this is not surprising given that the sciences tend to use a lot of journal literature and it is common for researchers to copy and paste such citations into es. whereas in music, as we will see later, keyword searches tend to focus on combinations of the composer’s name and words from the work title, occasionally with other terms added.

source | sessions | searches | average searches per session | average words per search
all es searches | 599,482 | 1,340,159 | 2.121 | 5.08220
gateway only | | | |
gateway everything tab | 382,040 | 757,862 | 1.9837 | 5.255
gateway books tab | 71,007 | 136,724 | 1.9255 | 4.048
gateway articles tab | 57,169 | 107,893 | 1.887 | 6.35
gateway total | | 1,002,479 | |
all subject modules | | | |
departmental searches (incl. those from gateway dropdown) | 75,035 | 214,364 | 1.9288 |
searches done directly from subject library pages | | 144,283 | |
select subject modules21 | | | |
agricultural, consumer and environmental sciences library | 2,732 | 5,221 | 1.911 | 4.07
engineering library | 32,018 | 68,146 | 2.128 | 5.092
history, philosophy, and newspaper library | 1,264 | 1,985 | 1.57 | 3.09
music and performing arts library (mpal) | 21,047 | 41,590 | 1.976 | 3.375
mpal data from march 21, 2014 to june 21, 2015 | 25,503 | 49,702 | 1.949 | 3.349
social science, health, and education library | 9,458 | 19,760 | 2.089 | 4.906
undergraduate library | 26,988 | 44,588 | 1.65 | 3.909
table 1. comparative search data from june 20, 2014 to june 16, 2015 (unless otherwise noted).

average number (and range) of searches per session in looking at the searches done directly from the mpal homepage and from the gateway dropdown from march 21, 2014 through june 21, 2015, there were 25,503 sessions conducted in the mpal es that contained a total of 49,702 searches, resulting in an average of 1.949 searches per session. of the 2,550 mpal search sessions in the study sample, the majority (63.2 percent) consisted of one search.22 this means the patron conducted one search and then left es, presumably clicking into the library catalog or another tool that is a target in es to complete their research.
sessions consisting of two to four searches account for 31 percent of sessions, while sessions involving five to nine searches only account for 5 percent of total sessions, and only 32 sessions, or fewer than 1 percent, consist of ten or more searches (see table 2).

searches per session | number of sessions
1 | 1604
2 | 476
3 | 191
4 | 116
5 | 51
6 | 29
7 | 22
8 | 12
9 | 15
10 | 6
11 | 6
12 | 7
13 | 2
14 | 3
16 | 1
17 | 2
18 | 1
19 | 2
20 | 1
23 | 1
30 | 2
total | 2,550
table 2. searches per session.

sessions with multiple searches (n = 946) were evaluated to see whether patrons were searching multiple times for the same thing (either with the same term[s] or with different terms), or whether they were searching for different things. five sessions that were clearly not music-related were removed from the sample. each session was categorized as “same/exact,” “same/different,” or “different.” at times, sessions might include several searches for the same thing using altered strings, in addition to searches for other things. those sessions were coded as “different.” for example: crumb zodiac crumb georgy crumb georgy cromb korean music there were 478 multi-search sessions (50.6 percent) in which patrons searched for different things within their session, 391 sessions (41.3 percent) in which patrons looked for the same thing with differing search strings, and 71 (7.5 percent) in which patrons reiterated the exact same search in each attempt. in the 71 sessions in which patrons used the same exact search multiple times, they averaged 2.25 searches. those sessions tagged as “same/exact” provide an opportunity to try to determine why patrons repeat the same search. common themes include: using too broad a search, searching in the wrong place (a non-performing-arts–related search), or repeatedly typing in the wrong information (e.g., typos or other errors) and not realizing the mistake. in the 391 sessions in which patrons spent their session searching for the same thing with different search strings, they did so with an average of 2.96 searches. often the variation in the search string was a change in spelling or a minor change in the terms, but sometimes it involved the addition or subtraction of terms, such as starting with morley fitzewilliam virginalists and going to morley fitzewilliam. in another example, we see how music metadata can prove challenging for searchers to format, such as when a patron started with schumann op.68 (without the necessary space between op. and 68), then progressed to album for the young, and finally schumann album for the young. in the 478 sessions in which patrons searched for completely different things within their session, they did so with an average of 4.08 searches per session.
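the categorization of multi-search sessions described above was done by hand; of the three labels, only “same/exact” lends itself to trivial automation, as in this hedged sketch (the other two labels require human judgment about whether different strings seek the same item). the function name and normalization are illustrative assumptions, not the study's actual procedure.

```python
def is_same_exact(session_queries: list[str]) -> bool:
    """true when a multi-search session repeats one identical query string."""
    normalized = {q.strip().lower() for q in session_queries}
    return len(session_queries) > 1 and len(normalized) == 1

if __name__ == "__main__":
    print(is_same_exact(["schumann op. 68", "schumann op. 68"]))     # True
    print(is_same_exact(["schumann op.68", "album for the young"]))  # False
```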
in many cases, although the searches were for different items, they were related in some way, either by genre, instrument, or some other element, such as in this example: microjazz color me jazz jamey aebersold play-along vandall jazz jazz piano pieces but sometimes the searches were for very different things: debussy voiles composition as problem mart humal composition as problem debussy ursatz average number (and range) of terms per search in looking at the approximately 4,900 searches included in the sample of 2,550 mpal sessions, without removing the small percentage of duplicate searches, two-term searches are the most common, followed by three-term searches—together accounting for more than half of the searches (55.3 percent). oneand four-term searches are the next most common, together accounting for 25.5 percent of searches (see table 3). in 2012, regular es single-term searches were at almost 60 percent.23 information technology and libraries | december 2018 91 number of terms in search string instances percentage (%) 1 605 12.4 2 1,559 31.8 3 1,149 23.5 4 642 13.1 5 400 8.2 6 196 4.0 7 100 2.0 8-15 216 4.5 16-57 29 .06 table 3. words per search string. longer search strings (8-15 terms) ranged from 74 to ten examples each, respectively, while searches with 16 to 20 terms ranged from 8 to 2 examples each, respectively. the following term counts each had only one example in the logs (25, 26, 31, 32, 36, 57). single-term searches types of single-term searches can be broken down into several categories (see table 4). over half (58.4 percent) were searches for personal names or part/all of a work title. some names and work titles are in fact so unique that a one-word search might in fact be successful (e.g., beyoncé, schwanengesang, newsies, or landowska). over a fifth (22.2 percent) were classified as “other or undetermined,” including publisher names, cities, or subject terms. type of one-word search number personal name 260 title 93 instrument/genre 51 tool/location/format 51 call number/barcode/label number 15 other/undetermined 135 table 4. one-term search types. in the tool/location/format category patrons searched for things such as: albums, images, dissertation, rilm, worldcat, jstor, and imslp. while rilm (abstracts of music literature) and worldcat can be found by a search in this tool because they will match on journal or database titles to which we subscribe, a search for imslp [international music score library project] only brings back mentions of imslp in rilm, etc. mpal links to imslp on its webpage, but neither imslp nor the library’s website are targets in es. when patrons only searched for a format, as in a session where a patron first searched for performances, then albums, and then audio cd [sic], it is difficult to know whether the patron expected to be led to a tool that only searched or listed recordings, whether they wanted a list of all of our recordings, or if some other logic was occurring. searchers also used this technique in multi-word searches, such as in the example george gershwin articles. the “black box” | dougan 92 https://doi.org/10.6017/ital.v37i4.10702 single-term searches in the “other/undetermined” category were a mix of subject terms like solfege, tuning, and spectralism. the patron could be trying to find materials related to these topics, examples of them (in the case of solfege), or definitions for them. they also included publisher or label terms such as rubank and puntamayo [sic], and even, on more than one occasion, urls and dois. 
two-term searches and names

the largest segment of the mpal data (31.8 percent) comprises two-term searches. the examples show that often a musical work can be easily sought based on the composer’s name and a word from the title, especially in cases where it is a common title but adding the composer’s name makes it unique (e.g., ligeti requiem). sometimes the patron only knows the work’s characteristics and not its proper name (e.g., lakme duet). patrons do attempt to search for topical material using only two words, and that is not likely enough for a good topic search in most cases, such as in the example mahler dying. sometimes phonetic spellings are employed such as woozy wick followed by woyzeck (which is both a play and a film with this spelling but could also potentially be a misspelling of berg’s opera wozzeck). another example is image cartier followed by images quartier.

personal names are frequently seen in two-term search strings. occasional use of foreign versions of names is observed, e.g., georgy crumb. it is difficult to know if these are typos or an artifact of our high international student population. as with any search that contains only a name, it is impossible to know whether the searcher was looking for materials by that individual or information about them. additionally, when current faculty names are searched, it is difficult to know whether patrons are looking for contact information for them, or scores or recordings by them. also observed in name searches is the phenomenon of patrons repeating their search with a change in order of names, such as bryan gilliam and then gilliam bryan. this occurs with other two-word searches as well, such as a change from introitus gubaidulina to gubaidulina introitus. switching the order of the words in a search no longer makes a difference in most search tools (although in some catalogs, of course, it was once required to formulate an author search as last name, first name). there is still the occasional use of a comma in last name, first name searches here. echoing the results of an earlier study that asked students what data points they used in searching, only occasionally did searches in this data set incorporate specific performers combined with a particular piece or composer: franck mutter, or for a particular edition: idomeneo barenreiter.24 sometimes names/titles were combined with format, such as a session in which a patron searched for hedwig images and then hedwig photo. here it is hard to tell if they are looking for pictures of a fictional owl or images from productions of hedwig and the angry inch, or something else. names are also frequently combined with work numbers instead of title words, such as mozart k.395 and moscheles op.73. search strings in the “other/undetermined” category sometimes included what appears to be an author/date search, perhaps for an article, such as mccord 2006.

long search strings

on the other end of the spectrum, the vast majority of the ten-plus word string searches are for performing arts items, but some were in other subject areas. these long searches are often citations that have been copied and pasted, which can be discerned from the use of punctuation and capitalization, like “welded in a single mass”: memory and community in london’s concert halls during the first world war.25 it is very common in general gateway es searches to see an entire citation pasted in,26 but less common in the mpal module.
searches such as this are often truncated through iteration to make the search more generic (see table 5). given easy search’s doi search recognition function, the longest version of this search would have worked had the doi been correct, but the correct doi number lacks the “.2” at the end (see table 5, query 1). the middle three searches (#s 2-4) failed because none of the a&i services that include this citation use hess, j. for the author’s name, but instead use her full first name (juliet). other examples showed that even when patrons use the exact citation, their search might not be successful if the citation formatting did not match that of the database(s) in which the article was indexed.

query #    query string
1    hess, j. (2014). radical musicking: towards a pedagogy of social change. music education research, 16(3), 229-250. doi: 10.1080/14613808.2014.909397.2).
2    hess, j. (2014). radical musicking: towards a pedagogy of social change. music education research, 16(3), 229-250.
3    hess, j. (2014). radical musicking: towards a pedagogy of social change.
4    hess, j. radical musicking: towards a pedagogy of social change.
5    radical musicking:
table 5. search truncation.

in some instances, searches were long because the patron included additional information such as in this example: bernstein, leonard. arranger: jack mason. title: west side story-selections (for symphonic "full" orchestra piano-conductor score). edition/publisher: hal leonard corporation. it is hard to tell if this was a copy and paste from another source such as a publisher catalog, or if the patron was trying to be very precise. in any case, this search was not successful, but would have been had the searcher omitted extraneous information such as the terms “arranger” and “edition/publisher.”

type of index search—title/author/keyword and adding subsets or tools

easy search does have an advanced search function with indexes for title and author, although it is rarely used by patrons. including repeated searches, searches done selecting the “title” index only numbered 207, or fewer than 10 percent of the sample. searches done selecting “author” were even scarcer, at 141 (5.5 percent). the remaining ~2,300 searches in the sample were conducted using the default keyword search. occasionally there was a misuse of index searching, such as:
ti: js bach english suite
ti: scarlatti sonatas
ti: haydn cello concerto d
in these examples, composer name is included in a title index search. it is unclear whether searchers do not realize that they have selected something other than a keyword search, or whether people inherently think of the composer’s name as part of the title. later in this paper the phenomenon of searches using possessive name forms is discussed, which may be associated.

patrons have the option to start from the main library gateway and perform a search in es, and in the advanced search screen can choose other subject modules such as arts and humanities, life sciences, and so forth, and/or types of tools to cross-search (see figure 6). patrons chose the music and performing arts tool subset in 161 sessions.

figure 6. easy search advanced search screen.
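the doi recognition behavior discussed above with table 5 can be illustrated in general terms: a pasted citation is scanned for a string shaped like a doi before the query is passed along. the following is a hypothetical sketch of that technique only, not easy search’s actual code; the regular expression simply follows the common “10.registrant/suffix” doi shape.

import re

# hypothetical illustration of doi recognition in a pasted citation; not easy
# search's actual implementation. the pattern matches the common
# "10.<registrant>/<suffix>" shape and stops at whitespace.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")

def extract_doi(query):
    match = DOI_PATTERN.search(query)
    if not match:
        return None
    # strip punctuation that often trails along when a citation is pasted
    return match.group(0).rstrip(".,;)")

citation = ("hess, j. (2014). radical musicking: towards a pedagogy of social "
            "change. music education research, 16(3), 229-250. "
            "doi: 10.1080/14613808.2014.909397.2).")
print(extract_doi(citation))  # 10.1080/14613808.2014.909397.2 -- still the patron's incorrect doi

a tool taking this approach would still have to resolve the extracted string against a doi service, which is where the extra “.2” in the patron’s citation would cause the lookup to fail.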
the vast majority of the time (4,557 searches or 93 percent), patrons chose to start from the mpal es on the mpal homepage and do a basic search there, but 179 times patrons started from the mpal es and chose other subsets through the advanced search.27 given our large music education program, logically, some patrons made tool choices that included the music subset and the education and/or social science subsets. but sometimes patrons chose every or almost every option available across multiple unrelated subject areas, which likely made for a very unwieldy result set.

use of boolean operators, quotation marks, parentheses, truncation, etc.

as in most search tools, there are several ways in es to conduct more sophisticated searches. however, patrons do not employ these techniques often, in part because they don’t always have to. in most older catalogs (including our classic voyager opac), searchers had to use boolean terms in capital letters, whereas in vufind and worldcat boolean and is now implied between terms. in the 159 examples of boolean logic in the searches, and is the most common term used. interestingly, some researchers used plus signs instead of and (as they might in google), not just between individual words, but in between multi-word segments of the string (without employing quotation marks). however, the + sign, like and, is ignored/implied by es.
berg + warm die lufte
progressive studies for trumpet
progressive studies for trumpet + john miller
progressive studies for trumpet (john miller)
new orleans + bossa nova
johnny alf + brazil
dick farney + brazil
dick farney + booker pittman
in some cases, the use of boolean did not seem intentional; that is, the term “and” appears as part of a common phrase (especially for instrument combinations), such as in webern violin and piano. only a handful of the boolean searches included examples of or and not, which seemed to stem from a class assignment designed by a professor, as the search strings are all very similar. one set is below:
machaut not mass
machaut or mass
machaut and mass
machaut mass
notre dame machaut mass
machaut and mass
commas were sometimes seen to stand in for boolean operators in a sense, or at least to separate search concepts, like the plus signs above, but were not counted in the total uses of boolean terms cited above. they are ignored by es.
rachmaninoff, moment music
planet, holst
city noir, john adams
piazzolla, flute and marimba
mussorgsky, pictures at an exhibition, manfred schandert
searchers used quotation marks on occasion (n=162) to keep phrases together, and parentheses were also used in this manner eight times (although they are ignored by es), such as in these examples: preludes and fugues (well-tempered clavier) cohen chaconne (from partita in d minor, bwv 1004) in some cases, searchers did not seem to grasp the function of quotation marks, as in this example: “snowforms" raymond murrey schafer, which was also observed by avery and tracy.28

truncation symbols can be another powerful tool in a searcher’s arsenal, but examples of their use in the transaction logs show that most searchers who attempt to use them do not understand them, such as in the examples doctor atomic?, boethius music,* and:
orchestra* history
history of the orchestra
orchest* history
orchestr* history
orchestra history
orchestral history
in fact, the current library catalog assists users by automatically applying truncation logic so that “symphony” returns results for “symphonies” and vice versa. it is doubtful that this is generally known among users and likely functions in a manner transparent to most of them.

work numbers and key indications

searching by music metadata elements such as work or opus numbers and key designations has always proved challenging in online search environments given that numbers and single letters can appear in other parts of the catalog record with different meanings (e.g., 1 part instead of symphony no. 1). added to this is the difficulty of describing items that contain multiple works—the item’s title might be “mozart’s complete symphonies” or “beethoven symphonies 1-6” without complete work details provided. nevertheless, 134 searches had some form of work number included, and 36 searches included a key indication. fantasie in f# minor presto georg philipp telemann and concerto en ut mineur j.c. bach are further examples of why a work’s key is hard to search by, one because of the use of the french solfege syllable “ut” and one because it includes a sharp symbol (#).29 the difficulties this can cause often led searchers to try various permutations of their search. mozart concerto g major sam franko; mozart concerto k 283 sam franko; scores; mozart violin concerto g major; mozart violin concerto g major sam franko; mozart violin concerto; sonata g major flute cpe bach sonata g major flute bach hamburger sonata flute cpe bach hamburger sonata hamburger sonata it is counterintuitive to searchers that including specific details in their search string might not help, but that is in fact the case in many online catalogs. searchers often run into the question of how or if to include the work indicator (op., k., bwv, etc.), which can lead to a “misuse” of this extra data such as in mozart k501 and mahler symphony no.9 (no spaces). another observation includes the use of what the author calls musicians’ shorthand. that is, those familiar with classical repertoire will know that examples such as sibelius 1 and mahler 5 are searches for symphonies even though they do not say so, but it will be harder, if not impossible, for the catalog to interpret that, leaving the searcher to sort through many extra results.
in addition is the long-standing issue of whether to enter the number as “1”, “1st”, or “first” and whether the system can interpret these against the form of the number present in the catalog record.

search by format or edition type

in forty-seven examples searchers used format terms in their searches, including score, vocal score, full score, dvd, performance recordings, albums, and audio cd as well as the following: prokofiev romeo and juliet orchestra parts orchestra excerpts prokofiev romeo and juliet viola tosca harp part assassins cd saxophone article

in fifteen examples searchers searched for edition types including urtext, facsimile, critical edition, and complete works. in the latter case they occasionally used the word “complete” and the composer’s name, such as complete schumann or complete webern. unfortunately, this approach will often not be successful, because even though the term “complete works” is used colloquially by musicians, the titles of such editions are often something else (and often in a foreign language, such as “opera omnia”).

other observations on formulation of searches

searching by call numbers and recording label numbers

while some catalogs allow call number searches, our current instance of vufind does not have a call number index, and keyword searching for them only works in some instances.30 but while call number searching does not work well in vufind (e.g., it has to be done as a keyword search and not a call number index search like in voyager), it still works in es because it is searching by keywords. there were thirty-two examples of searches in mpal’s es where patrons used entire call numbers or the first part of a classification number to find related materials:
count basie biography
count basie ml 410
duke ellington ml 410
duke ellington bibiliography
it is also not unrealistic to think that patrons might want to search by a recording’s label number, since most catalogs provide search options for isbns and issns for print materials. searchers attempted this in a handful of searches like lpo-0014,31 7.24356e+11,32 and 777337-2.33 unfortunately this information is not usually reflected in mpal’s catalog records.

common descriptions, natural language queries, genre queries, and context words

as mentioned already with the examples mahler 1 and complete works, patrons regularly search with terms and phrases that make sense to them or that are used colloquially when discussing music and sources, which may or may not be in the bibliographic record. additional examples in the data set include:
handel messiah critical edition
rodelinda in italian
mamma mia! book [for the text of a musical]
grove encyclopedia [the title of this is in fact “dictionary” not “encyclopedia”]
mgg sachteil [the abbreviation for musik in geschichte und gegenwart and the name for a section of it]
dance collection
the last example in the list is particularly intriguing—somewhat like the earlier search examples of performances and albums, one wonders if the patron hoped to find everything in that category and then be able to browse; however, it is hard to know what the searcher anticipated getting in return.
sometimes natural language queries appear, often in an attempt to find a smaller part of a larger work, such as the slow movement of brahms's first symphony, anonymous chant from vespers for christmas day, and chaconne (from partita in d minor, bwv 1004); or for things other than musical works, such as in reviews of stravinsky article by robert craft. another variation on natural language or colloquial searches is the use of the possessive form of composer names. although not common (23 examples), patrons do this when searching for composer and title of a work, e.g., verdi's requiem. it seems unlikely that people do this when searching for books or other works, but musicians make works possessive to the composer, such as in the example mendelssohn's violin concerto, to differentiate between pieces with the same form/generic title. in rare cases searchers used the term “by,” such as jeptha by carrissimi.

genre searches such as south indian vocal music and hindustani classical music show that people may want to search the way they might in pandora or itunes, although it is possible this person was looking for secondary materials and not recordings or scores:
pop
female pop
women pop
contemporary pop
searchers also exhibit a desire to find things by genre and instrument or voice type, such as soprano arias [which is ‘high voice’ in the lc subject heading], mozart satb sanctus, and baroque arias for medium voice. other examples include marimba literature, organ literature, and organ techniques. catalogs do not necessarily aid in these types of searches, even though they are natural constructions for users. sometimes searchers add context words to their search like they would in google in a way that will not necessarily help them in the catalog, such as daniel read composer.

discussion

even given the difficulties of searching for music materials, mpal patrons have embraced es—its module has almost as many searches as the undergraduate library’s, which serves a much larger population. it also has twice as many searches as the social science, health, and education library module, which also serves a much larger population than mpal. one of the possible reasons for this is the fact that mpal was an early adopter of developing an es subject module that could be searched from our homepage, which means our patrons have had longer to grow accustomed to using it. mpal has a lower average words-per-search ratio (3.375 or 3.349 depending on data set) than most other es modules, likely because there are more composer plus title keyword searches for musical works and not as many pasted article citation searches, which tend to be longer. this is supported by the comparison of the average number of words in searches done in the gateway books tab (4.048) vs. the gateway articles tab (6.35). in addition, although two- and three-word searches are most common, mpal has a significant number of single-word searches (12.4 percent). such searches can work in music, when there are unique titles like turandot and treemonisha that are unlikely to appear for more than one composer or as terms in other disciplines. for this same reason, single- or even two-word searches are unlikely to be effective in most other disciplines. at around seven words per search a transition in search patterns occurs. eight-word and longer search strings are almost always some version of a title of a book, article, chapter, or dissertation, etc.,
and strings with six words and fewer tend to be topical searches or combination composer/piece searches. other transaction log studies of es have shown that “title searching and results display—of journal titles, article titles, and book titles—is being heavily employed by users.”34 however, in music, where title alone may not be sufficient to identify and retrieve a musical work, searches with a combination of composer name and elements of the title and/or additional information will always be most prevalent.

search location appropriateness and context

even though discovery layers and federated search tools help with minimizing the number of silos and places in which scholars need to search, there are still issues with patrons attempting to use the es box to find things it is not designed to find.35 searchers see a box and search, without always understanding the context. this can happen on multiple levels. the mpal page clearly states that the mpal es box searches for arts-related things, but obviously patrons do not always see or comprehend this, even after they type in many queries that do not provide (good) results. this is likely related to the number of visitors to mpal from other disciplines who do not realize that there are various differently scoped versions of es. the following example could be a theatre set-construction-related search, which would work only moderately well in our tool. or, it may have been conducted by an architecture or structural engineering student, who would have better luck using a different es module.
light weigh [sic] structures in architecture
building research
the evolving design vocabulary of fabric structure
the engineering discipline of tent structures
building research jan/feb 1972:22
it would be ideal if the system were smart enough to make suggestions: “you appear to need architecture resources—if you are not finding what you need, might we suggest tool x, y, or z?” while es does this to an extent when it can in the generic es, it does not do so in the subject modules, and in reality, can only go so far. it raises the question of whether we are doing patrons a disservice by offering pre-defined subject modules. while this approach has some benefits for most users, it also creates different challenges for some. mpal’s es does not target all available relevant online tools and neither does the general es, so interdisciplinary researchers still need to be cautious of silos, even well-intentioned ones created by librarians or traditional ones created by vendors. it is difficult to inform patrons of this in one-box search settings—they see the box and are eager to get started without first having to read a lengthy set of instructions.

search location context is also important when patrons use es to try to find things that are described or linked on our website and not in es, such as for any of our named special collections. patrons also use es to find tools such as naxos, jstor, worldcat, and librarysource, some of which are targeted by es and some of which are not. es will at least provide a link to a tool, however (see figure 7).

figure 7. easy search post-search suggestion.

these particular tools are all also linked from the mpal website (in fact, naxos is linked further down the home page from the es box) and we also have a separate tool that enables one to search for databases and online journals by name.
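the kind of “you appear to need architecture resources” suggestion imagined above could, in principle, be driven by a simple vocabulary match against the queries in a session. the sketch below is only a toy heuristic under assumed keyword lists and an assumed threshold; it does not reflect how easy search generates its post-search suggestions.

# toy illustration of a subject-module suggestion heuristic; the hint lists and
# threshold are invented for demonstration and are not easy search's logic.
SUBJECT_HINTS = {
    "architecture": {"architecture", "structures", "structure", "fabric", "tent", "building"},
    "music": {"symphony", "opera", "concerto", "score", "aria", "sonata"},
}

def suggest_modules(session_queries, threshold=2):
    """return subject modules whose hint words appear at least `threshold` times."""
    words = [w for q in session_queries for w in q.lower().split()]
    suggestions = []
    for module, hints in SUBJECT_HINTS.items():
        hits = sum(1 for w in words if w in hints)
        if hits >= threshold:
            suggestions.append(module)
    return suggestions

session = [
    "light weigh structures in architecture",
    "the engineering discipline of tent structures",
]
print(suggest_modules(session))  # ['architecture']

even a heuristic this crude raises the design questions noted above: how confident the match must be before interrupting the patron, and whether the suggestion should point to another es module or to an entirely different tool.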
on some occasions, searchers used es to look for help using library tools, such as in the following example: rilm retrieval rilm using rilm

the library website, not the discovery layer, is a better tool for finding instructions, since help information is currently delivered via various libguides. however, this is not intuitive to patrons. on a related note, it is interesting to consider whether patrons searching for specific tools such as imslp expect to find results from non-library resources in our search layers, or if they simply do not differentiate in their minds what is an open tool and what is a library subscription tool.

patron knowledge level

many of the observations of this study are related to known-item searching, since a large percentage of people looking for music materials are looking for specific pieces of music. earlier studies show that it is difficult to search for something if you do not know what it is.36 this can be seen in examples like ombramaifu handel (should be ombra mai fu) or the interworkings of tennis (which was followed by the correct inner game of tennis). topical searches can be especially difficult in any subject when the patron does not quite know how to put what they want into words (or literally does not know the right words, especially in the case of our many patrons for whom english is not their first language).
qualtize musical tension
spell change click: kw:qualitize musical tension
quantize musical tension
quantitative musical tension
music motive similarity
surveying musical form through melodic-motivic similarities
a paradigmatic approach to extract the melodic structure of a musical piece
inding subsequences of melodies in musical pieces
spell change click: kw:finding subsequences melodies musical pieces
similarity measures for melodies
measures of musical tension
measuring musical tension
this echoes head and eisenberg’s 2009 findings and dempsey and valenti’s 2016 findings.37

shortcomings of the easy search tool

this study helped illuminate some shortcomings in es. sometimes the search formulation changes from es to the target, for example cramer preludes in es becomes all(cramer preludes) [a bound phrase] in one target, resulting in many fewer results than if the search had been done in the native interface. patrons may not realize this as they are searching. in another case there were no results for danças folclóricas brasileiras e suas aplicações educativas but removing the diacritics retrieves this title in our catalog, so it appears that diacritics do not function in es (at least when vufind is the target)—something that may not be apparent to searchers and hopefully can be addressed in the code.

further research

additional analysis could be done on this data set, including assessing whether searches were for known items or topics, and more specifically whether for articles, books, scores, or recordings. however, in many cases it is difficult to tell if a patron is looking for a score, recording, or information about a piece or composer. other research on es shows over half of searches (just over 58 percent in 2015) in the main es are for known items.38 this percentage is likely to be much higher in mpal’s es. with an enhanced data set it would also be possible to identify which target tools searchers are choosing most often.

conclusion

while many patrons (and librarians) are eager for a tool that can truly search everything, we are not there yet.
some have tried to make music-specific interfaces for library catalogs, but this work is not widespread.39 perhaps because music students are often searching for things other than articles, it would be better to have one tool that searches the catalog and streaming media tools and one that only searches article indexes. some schools have taken this approach—configuring their discovery layer indexes to include article content but not the local catalog. there were several observations in this data of patron search behaviors that are not fully supported by library systems in all cases, but perhaps should be (e.g., use of + signs, searching by record label numbers or genre names/types of music/formats). in some cases, this is an issue with the metadata standards in use and in others it is about needing more flexible search options based on the metadata that we already have. newcomer et al. discuss this in their article outlining music discovery requirements.40 tools like easy search and discovery layers solve some problems for users but can create others. dedicated library catalogs are still generally the best tools for finding scores and recordings in our physical (and some online) collections, but not all libraries offer that tool anymore, instead offering a discovery layer as the primary search tool. in those cases, serious consideration needs to be given to facets, the ability to limit by format, and especially the frbrization of items, which is particularly problematic for music. additionally, there is a continued need for targeted instruction for music library users, because not only are the tools used in libraries less than perfect, the inherent challenges in searching for music because of its formats and titles are aggravated by musicians’ use of shorthand and colloquialisms to describe music materials.

endnotes

1 john boyd et al., “the one-box challenge: providing a federated search that benefits the research process,” serials review 32, no. 4 (december 2006): 247–54, https://doi.org/10.1016/j.serrev.2006.08.005; sharon dyas-correia et al., “‘the one-box challenge: providing a federated search that benefits the research process’ revisited,” serials review 41, no. 4 (october-december 2015): 250–56, https://doi.org/10.1080/00987913.2015.1095581.

2 lucy holman, “millennial students’ mental models of search: implications for academic librarians and database developers,” journal of academic librarianship 37, no. 1 (january 2011): 19–27, https://doi.org/10.1016/j.acalib.2010.10.003; brandi porter, “millennial undergraduate research strategies in web and library information retrieval systems,” journal of web librarianship 5, no. 4 (july-december 2011): 267–85, https://doi.org/10.1080/19322909.2011.623538; martin zimerman, “digital natives, searching behavior, and the library,” new library world 113, nos. 3/4 (2012): 174–201, https://doi.org/10.1108/03074801211218552.

3 susan avery and dan tracy, “using transaction log analysis to assess student search behavior in the library instruction classroom,” reference services review 42, no. 2 (june 2014): 332, https://doi.org/10.1108/rsr-08-2013-0044.

4 andrew asher, lynda m. duke, and suzanne wilson, “paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources,” college & research libraries 74, no. 5 (september 2013): 473, https://doi.org/10.5860/crl-374.
5 megan dempsey and alyssa valenti, “student use of keywords and limiters in web-scale discovery searching,” journal of academic librarianship 42, no. 3 (may 2016): 203, https://doi.org/10.1016/j.acalib.2016.03.002.

6 annie r. armstrong, “student perceptions of federated searching vs. single database searching,” reference services review 37, no. 3 (august 2009): 291–303, https://doi.org/10.1108/00907320910982785; c. jeffrey belliston, jared l. howland, and brian c. roberts, “undergraduate use of federated searching: a survey of preferences and perceptions of value-added functionality,” college & research libraries 68, no. 6 (november 2007): 472-86, https://doi.org/10.5860/crl.68.6.472; sarah d. williams, angela bonnell, and bruce stoffel, “student feedback on federated search use, satisfaction, and web presence: qualitative findings of focus groups,” reference and user services quarterly 49, no. 2 (winter 2009): 131–39.

7 asher et al., “paths of discovery,” 476.

8 troy swanson and jeremy green, “why we are not google: lessons from a library web site usability study,” journal of academic librarianship 37, no. 3 (may 2011): 227, https://doi.org/10.1016/j.acalib.2011.02.014.

9 cory lown, tito sierra, and josh boyer, “how users search the library from a single search box,” college & research libraries 74, no. 3 (may 2013): 240, https://doi.org/10.5860/crl-321.

10 sarah dahlen and kathlene hanson, “preference vs. authority: a comparison of student searching in a subject-specific indexing and abstracting database and a customized discovery layer,” college & research libraries 78, no. 7 (november 2017), 892, https://doi.org/10.5860/crl.78.7.878.

11 ibid.

12 li fu and cynthia thomes, “implementing discipline-specific searches in ebsco discovery service,” new library world 115, nos. 3/4 (2014): 102–15, https://doi.org/10.1108/nlw-01-2014-0003.

13 kirstin dougan, “finding the right notes: an observational study of score and recording seeking behaviors of music students,” journal of academic librarianship 41, no. 1 (january 2015): 61–67, https://doi.org/10.1016/j.acalib.2014.09.013.

14 jennifer m. mayer, “serving the needs of performing arts students: a case study,” portal: libraries & the academy 15, no. 3 (july 2015): 416, https://doi.org/10.1353/pla.2015.0036.

15 joe clark and kristin yeager, “seek and you shall find? an observational study of music students’ library catalog search behavior,” journal of academic librarianship 44, no. 1 (january 2018): 105-12, https://doi.org/10.1016/j.acalib.2017.10.001.

16 christine d. brown, “straddling the humanities and social sciences: the research process of music scholars,” library & information science research 24, no.
1 (march 2002): 73–94, https://doi.org/10.1016/s0740-8188(01)00105-0; stephann makri and claire warwick, “information for inspiration: understanding architects' information seeking and use behaviors to inform design,” journal of the american society for information science & technology 61, no. 9 (september 2010): 1745–70, https://doi.org/10.1002/asi.21338; francesca marini, “archivists, librarians, and theatre research,” archivaria 63 (2007): 7–33; ann medaille, “creativity and craft: the information-seeking behavior of theatre artists,” journal of documentation 66, no. 3 (may 2010): 327–47, https://doi.org/10.1108/00220411011038430; marybeth meszaros, “a theatre scholar-artist prepares: information behavior of the theatre researcher,” in advances in library administration and organization (v. 29), delmus e. williams and janine golden, eds. (bingley, uk: emerald group publishing limited, 2010): 185-217; bonnie reed and donald r. tanner, “information needs and library services for the fine arts faculty,” journal of academic librarianship 27, no. 3 (may 2001): 231, https://doi.org/10.1016/s0099-1333(01)00184-7; shannon robinson, “artists as scholars: the research behavior of dance faculty,” college & research libraries 77, no. 6 (november 2016): 779-94, https://doi.org/10.5860/crl.77.6.779.

17 ethelene whitmire, “disciplinary differences and undergraduates’ information‐seeking behavior,” journal of the association for information science and technology 53 (june 2002): 631-38, https://doi.org/10.1002/asi.10123.

18 tina chrzastowski and lura joseph, “surveying graduate and professional students' perspectives on library services, facilities and collections at the university of illinois at urbana-champaign: does subject discipline continue to influence library use?,” issues in science & technology librarianship 45, no. 1 (winter 2006), https://doi.org/10.5062/f4dz068j.

19 ellen collins and graham stone, “understanding patterns of library use among undergraduate students from different disciplines,” evidence based library and information practice 9 (september 2014): 51–67, https://doi.org/10.18438/b8930k.

20 this is up from the 4.33 average reported by mischo in 2012 (164).

21 including direct from departmental webpage and via gateway es dropdown choices.

22 in mischo’s 2012 analysis of easy search logs, 52 percent of sessions had one string and 48 percent had two or more. by 2015, single-query sessions had risen to 57 percent (william mischo et al., “the bento approach to library discovery: web-scale and beyond,” internet librarian international, october 21, 2015).

23 william h. mischo et al., “user search activities within an academic library gateway: implications for web-scale discovery systems,” in planning and implementing resource discovery tools in academic libraries, ed. mary popp and diane dallis (hershey, pa: igi global, 2012), 163.

24 kirstin dougan, “information seeking behaviors of music students,” reference services review 40, no.
4 (november 2012): 563, https://doi.org/10.1108/00907321211277369.

25 vanessa williams, “‘welded in a single mass’: memory and community in london’s concert halls during the first world war,” the journal of musicological research 33, nos. 1–3 (2014): 27–38.

26 mischo, “user search activities,” 162.

27 this echoes earlier research that shows most searchers use default settings and keyword searches.

28 avery and tracy, “using transaction logs,” 31.

29 barbara d. henigman and richard burbank, “online music symbol retrieval from the access angle,” information technology & libraries 14, no. 1 (march 1995): 5–16.

30 we still have to use our older voyager opac or the staff-side of voyager to effectively search by call number until we get a newer version of vufind.

31 symphony no. 4 in e flat “romantic” by anton bruckner, klaus tennstedt (conductor), london philharmonic orchestra (performer).

32 this is mozart, “clarinet concerto in a, k. 622,” meyer/berlin philharmonic/abbado emi classics 57128; 7.24356e+11.

33 this is reich: sextet / piano phase / eight lines (griffiths kevin/ london steve reich ensemble/ the/ stephen wallace) (cpo: 777337-2).

34 mischo, “user search activities,” 169.

35 this reinforces what lown and asher et al. found as cited in the literature review above.

36 kirstin dougan, “finding the right notes: an observational study of score and recording seeking behaviors of music students,” journal of academic librarianship 41, no. 1 (january 2015): 66.

37 alison head and michael eisenberg, “finding context: what today’s college students say about conducting research in the digital age,” progress report (2009) (retrieved from http://projectinfolit.org/images/pdfs/pil_progressreport_2_2009.pdf); dempsey and valenti, “student use of keywords and limiters,” 2016.

38 william h. mischo et al., “the bento approach to library discovery: web-scale and beyond,” internet librarian international, october 21, 2015.

39 anke hofmann and barbara wiermann, “customizing music discovery services: experiences at the hochschule für musik und theater, leipzig,” music reference services quarterly 17, no. 2 (june 2014): 61–75, https://doi.org/10.1080/10588167.2014.904699; bob thomas, “creating a specialized music search interface in a traditional opac environment,” oclc systems & services 27, no. 3 (august 2011): 248–56, https://doi.org/10.1108/10650751111164588.

40 nara newcomer et al., “music discovery requirements: a guide to optimizing interfaces,” notes 69, no. 3 (march 2013): 494-524, https://doi.org/10.1353/not.2013.0017.
http://projectinfolit.org/images/pdfs/pil_progressreport_2_2009.pdf https://doi.org/10.1080/10588167.2014.904699 https://doi.org/10.1108/10650751111164588 https://doi.org/10.1353/not.2013.0017 abstract introduction background literature review general search studies and single search boxes search and library use in different disciplines methodology findings average number (and range) of searches per session average number (and range) of terms per search single-term searches two-term searches and names long search strings type of index search—title/author/keyword and adding subsets or tools use of boolean operators, quotation marks, parentheses, truncation, etc. work numbers and key indications search by format or edition type other observations on formulation of searches searching by call numbers and recording label numbers common descriptions, natural language queries, genre queries, and context words discussion search location appropriateness and context patron knowledge level shortcomings of the easy search tool further research conclusion endnotes june_ital_fagan_final an evidence-based review of academic web search engines, 2014-2016: implications for librarians’ practice and research agenda jody condit fagan an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 7 7 abstract academic web search engines have become central to scholarly research. while the fitness of google scholar for research purposes has been examined repeatedly, microsoft academic and google books have not received much attention. recent studies have much to tell us about google scholar’s coverage of the sciences and its utility for evaluating researcher impact. but other aspects have been understudied, such as coverage of the arts and humanities, books, and non-western, non-english publications. user research has also tapered off. a small number of articles hint at the opportunity for librarians to become expert advisors concerning scholarly communication made possible or enhanced by these platforms. this article seeks to summarize research concerning google scholar, google books, and microsoft academic from the past three years with a mind to informing practice and setting a research agenda. selected literature from earlier time periods is included to illuminate key findings and to help shape the proposed research agenda, especially in understudied areas. introduction recent pew internet surveys indicate an overwhelming majority of american adults see themselves as lifelong learners who like to “gather as much information as [they] can” when they encounter something unfamiliar (horrigan 2016). although significant barriers to access remain, the open access movement and search engine giants have made full text more available than ever.1 the general public may not begin with an academic search engine, but google may direct them to google scholar or google books. within academia, students and faculty rely heavily on academic web search engines (especially google scholar) for research; among academic researchers in high-income areas, academic search engines recently surpassed abstracts & indexes as a starting place for research (inger and gardner 2016, 85, fig. 4). given these trends, academic librarians have a professional obligation to understand the role of academic web search engines as part of the research process. jody condit fagan (faganjc@jmu.edu) is professor and director of technology, james madison university, harrisonburg, va. 
1 khabsa and giles estimate “almost 1 in 4 of web accessible scholarly documents are freely and publicly available” (2014, 5). an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 8 two recent events also point to the need for a review of research. legal decisions in 2016 confirmed google’s right to make copies of books for its index without paying or even obtaining permission from copyright holders, solidifying the company’s opportunity to shape the online experience with respect to books. meanwhile, microsoft rebooted their academic web search engine, now called microsoft academic. at the same time, information scientists, librarians, and other academics conducted research into the performance and utility of academic web search engines. this article seeks to review the last three years of research concerning academic web search engines, make recommendations related to the practice of librarianship, and propose a research agenda. methodology a literature review was conducted to find articles, conference presentations, and books about the use or utility of google books, google scholar, and microsoft academic for scholarly use, including comparisons with other search tools. because of the pace of technological change, the focus was on recent studies (2014 through 2016, inclusive). a search was conducted on “google books” in ebsco’s library and information science and technology abstracts (lista) on december 19, 2016, limited to 2014-2016. of the 46 results found, most were related to legal activity. only four items related to the tool’s use for research. these four titles were entered into google scholar to look for citing references, but no additional relevant citations were found. in the relevant articles found, the literature reviews testified to the general lack of studies of google books as a research tool (abrizah and thelwall 2014; weiss 2016) with a few exceptions concerning early reviews of metadata, scanning, and coverage problems (weiss 2016). a search on “google books” in combination with “evaluation or review or comparison” was also submitted to jmu’s discovery service,2 limited to 2014-2016 in combination with the terms. forty-nine items were found and from these, three relevant citations were added; these were also entered into google scholar to look for citing references. however, no additional relevant citations were found. thus, a total of seven citations from 2014-2016 were found with relevant information concerning google books. earlier citations from the articles’ bibliographies were also reviewed when research was based on previous work, and to inform the development of a fuller research agenda. a search on “microsoft academic” in lista on february 3, 2017 netted fourteen citations from 2014-2016. only seven seemed to focus on evaluation of the tool for research purposes. a search on “microsoft academic” in combination with terms “evaluation or review or comparison” was also submitted to jmu’s discovery service, limited to 2014-2016. eighteen items were found but no additional citations were added, either because they had already been found or were not relevant. 
the seven titles found in lista were searched in google scholar for citing references; four additional relevant citations were found, plus a paper relevant to google scholar not 2 jmu’s version of ebsco discovery service contained 453,754,281 items at the time of writing and is carefully vetted to contain items of curricular relevance to the jmu community (fagan and gaines 2016). information technology and libraries | june 2017 9 previously discovered (weideman 2015). thus, a total of eleven citations were found with relevant information for this review concerning microsoft academic. because of this small number, several articles prior to 2014 were included in this review for historical context. an initial search was performed on “google scholar” in lista on november 19, 2016, limited to 2014-2016. this netted 159 results, of which 24 items were relevant. a search on “google scholar” in combination with terms “evaluation or review or comparison” was also submitted to jmu’s discovery tool limited to 2014-2016, and eleven relevant citations were added. items older than 2014 that were repeatedly cited or that formed the basis of recent research were retrieved for historical context. finally, relevant articles were submitted to google scholar, which netted an additional 41 relevant citations. altogether, 70 citations were found to articles with relevant information for this review concerning google scholar in 2014-2016. readers interested in literature reviews covering google scholar studies prior to 2014 are directed to (gray et al. 2012; erb and sica 2015; harzing and alakangas 2016b). findings google books google books (https://books.google.com) contains about 30 million books, approaching the library of congress’s 37 million, but far shy of google’s estimate of 130 million books in existence (wu 2015), which google intends to continue indexing (jackson 2010). content in google books includes publisher-supplied, self-published, and author-supplied content (harper 2016) as well as the results of the famous google books library project. started in december 2004 as the “google print” project,3 the project involved over 40 libraries digitizing works from their collections, with google indexing and performing ocr to make them available in google books (weiss 2016; mays 2015). scholars have noted many errors with google books metadata, including misspellings, inaccurate dates, and inaccurate subject classifications (harper 2016; weiss 2016). google does not release information about the database’s coverage, including which books are indexed or which libraries’ collections are included (abrizah and thelwall 2014). researchers have suggested the database covers mostly u.s. and english-language books (abrizah and thelwall 2014; weiss 2016). the conveniences of google books include limits by the type of book availability (e.g. free ebooks vs. google e-books), document type, and date. the detail view of a book allows magnification, hyperlinked tables of contents, buying and “find in a library” options, “my library,” and user history (whitmer 2015). google books also offers textbook rental (harper 2016) and limited print-on-demand services for out-of-print books (mays 2015; boumenot 2015). in april 2016, the supreme court affirmed google’s right to make copies for its index without paying or even obtaining permission from copyright holders (authors guild 2016; los angeles times 2016). 
scanning of library books and “snippet view” was deemed fair use: “the purpose of the copying is highly transformative, the public display of text is limited, and the revelations do 3 https://www.google.com/googlebooks/about/history.html an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 10 not provide a significant market substitute for the protected aspects of the originals” (u.s. court of appeals for the second circuit 2015). literature concerning high-level implications of google books suggests the tool is having a profound effect on research and scholarship. the tool has been credited for serving as “a huge laboratory” for indexing, interpretation, working with document image repositories, and other activities (jones 2010). at the same time, the academic community has expressed concerns about google books’s effects on social justice and how its full-text search capability may change the very nature of discovery (hoffmann 2014; hoffmann 2016; szpiech 2014). one study found that books are far more prevalently cited in wikipedia than are research articles (kousha and thelwall 2017). yet investigations of google books’ coverage and utility as a research tool seem to be sorely lacking. as weiss noted, “no critical studies seem to exist on the effect that google books might have on the contemporary reference experience” (weiss 2016, 293). furthermore, no information was found concerning how many users are taking advantage of google books; the tool was noticeably absent from surveys such as (inger and gardner's (2016) and from research centers such as the pew internet research project. in a largely descriptive review, harper (2016) bemoaned google books’ lack of integration with link resolvers and discovery tools, and judged it lacking in relevant material for the health sciences, because so much of the content is older. she also noted the majority of books scanned are in english, which could skew scholarship. the non-english skew of google books was also lamented by weiss, who noted an “underrepresentation of spanish and overestimation of french and german (or even japanese for that matter)” especially as compared to the number of spanish speakers in the united states (weiss 2016, 286-306). whitmer (2015) and mays (2015) provided practical information about how google books can be used as a reference tool. whitmer presented major google books features and challenged librarians to teach google books during library instruction. mays conducted a cursory search on the 1871 chicago fire and described the primary documents she retrieved as “pure gold,” including records of city council meetings, notes from insurance companies, reports from relief societies, church sermons on the fire, and personal memoirs (mays 2015, 22). mays also described google books as a godsend to genealogists for finding local records (e.g. police departments, labor unions, public schools). in her experience, the geographic regions surrounding the forty participating google books library project libraries are “better represented than other areas” (mays 2015, 25). mays concludes, “its poor indexing and search capabilities are overshadowed by the ease of its fulltext search capabilities and the wonderful ephemera that enriches its holdings far beyond mere ‘books’” (mays 2015, 26). 
abrizah and thelwall (2014) investigated whether google books and google scholar provided “good impact data for books published in non-western countries.” they used a comprehensive list of arts, humanities, and social sciences books (n=1,357) from the five main university presses in information technology and libraries | june 2017 11 malaysia 1961-2013. they found only 23% of the books were cited in google books4 and 37% in google scholar (p. 2502). the overlap was small: only 15% were cited in both google scholar and google books. english-language books were more likely to be cited in google books; 40% of english language books were cited versus 16% malay. examining the top 20 books cited in google books, researchers found them to be mostly written in english (95% in google books vs 29% in the sample), and published by university of malaysia press (60% in google books vs 26% in the sample) (2505). the authors concluded that due to the low overlap between google scholar and google books, searching both engines was required to find the most citations to academic books. kousha and thelwall (2015; 2011) compared google books with thomson reuters book citation index (bkci) to examine its suitability for scholarly impact assessment and found google books to have a clear advantage over bkci in the total number of citations found within the arts and humanities, but not for the social sciences or sciences. they advised combining results from bkci with google books when performing research impact assessment for the arts and humanities and social sciences, but not using google books for the sciences, “because of the lower regard for books among scientists and the lower proportion of google books citations compared to bkci citations for science and medicine” (kousha and thelwall 2015, 317). microsoft academic microsoft academic (https://academic.microsoft.com) is an entirely new software product as of 2016. therefore, the studies cited prior to 2016 refer to entirely different search engines than the one currently available. however, a historical account of the tool and reviewers’ opinions was deemed helpful for informing a fuller picture of academic web search engines and pointing to a research agenda. microsoft academic was born as windows live academic in 2006 (carlson 2006), was renamed live search academic after a first year of struggle (jacsó 2008), and was scrapped two years later after the company recognized it did not have sufficient development support in the united states (jacsó 2011). microsoft asia research group launched a beta tool called libra in 2009, which redirected to the “microsoft academic search” service by 2011. early reviews of the 2011 edition of microsoft academic search were promising, although the tool clearly lacked the quantity of data searched by google scholar (jacsó 2011; hands 2012). there were a few studies involving microsoft academic search in 2014. ortega and aguillo (2014) compared microsoft academic search and google scholar citations for research evaluation and concluded “microsoft academic search is better for disciplinary studies than for analyses at institutional and individual levels. on the other hand, google scholar citations is a good tool for individual assessment because it draws on a wider variety of documents and citations” (1155). 4 google books does not support citation searching; the researchers searched for the book title to manually find citations to a book. 
an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 12 as part of a comparative investigation of an automatic method for citation snowballing using microsoft academic search, choong et al. (2014) manually searched for a sample of 949 citations to journal or conference articles cited from 20 systematic reviews. they found microsoft academic search contained 78% of the cited articles and noted its utility for testing automated methods due to its free api and no blocks to automated access. the researchers also tested their method against google scholar, but noted “computer-access restrictions prevented a robust comparison” (n.p.). also in 2014, orduna-malea et al. (2014) attempted a longitudinal study of disciplines, journals, and organizations in microsoft academic search only to find the database had not been updated since 2013. furthermore they found the indexing to be incomplete and still in process, meaning microsoft academic search’s presentation of information about any particular publication, organization, or author was distorted. despite this finding, mas was included in two studies of scholar profiles. ortega (2015) compared scholar profiles across google scholar, microsoft academic search, research gate, academia.edu, and mendeley, and found little overlap across the sites. they also found social and usage indicators did not consistently correlate with bibliometric indicators, except on the researchgate platform. social and usage indicators were “influenced by their own social sites,” while bibliometric indicators seemed more stable across all services (13). ward et al. (2015) still included microsoft academic search in their discussion of scholarly profiles as part of the social media network, noting microsoft academic search was painfully time-consuming to work with in terms of consolidating data, correcting items, and adding missing items. in september 2016, hug et al. demonstrated the utility of the new microsoft academic api by conducting a comparative evaluation of normalized data from microsoft academic and scopus (hug, ochsner, and braendle 2016). they noted microsoft academic has “grown massively from 83 million publication records in 2015 to 140 million in 2016” (10). the microsoft academic api offers rich, structured metadata with the exception of document type. they found all attributes containing text were normalized and that identifiers were available for all entities, including references, supporting bibliometricians’ needs for data retrieval, handling, and processing. in addition to the lack of document type, the researchers also found the “fields of study” to be too granular and dynamic, and their hierarchies incoherent. they also desired the ability to use the doi to build api requests. nevertheless, the advantages of microsoft academic’s metadata and api retrieval suggested to hug et al. that microsoft academic was superior to google scholar for calculating research impact indicators and bibliometrics in general. in october 2016, harzing and alakangas compared publication and citation coverage of the new microsoft academic with google scholar, scopus, and web of science using a sample of 145 academics at the university of melbourne (harzing and alakangas 2016a) including observations from 20-40 faculty each in the humanities, social sciences, engineering, sciences, and life sciences. 
in october 2016, harzing and alakangas compared publication and citation coverage of the new microsoft academic with google scholar, scopus, and web of science, using a sample of 145 academics at the university of melbourne (harzing and alakangas 2016a) that included 20-40 faculty each from the humanities, social sciences, engineering, sciences, and life sciences. they discovered microsoft academic had improved substantially since their previous study (harzing 2016b): the comparison sample grew 9.6%, compared with growth of 1.4%, 2%, and 1.7% in google scholar, scopus, and web of science, respectively (n.p.). the researchers noted a few problems with data quality, "although the microsoft academic team have indicated they are working on a resolution" (n.p.). on average, the researchers found that microsoft academic found 59% as many citations as google scholar, 97% as many citations as scopus, and 108% as many citations as web of science. google scholar had the top counts for each disciplinary area, followed by scopus, except in the social sciences and humanities, where microsoft academic ranked second. the researchers explained that microsoft academic "only includes citation records if it can validate both citing and cited papers as credible," as established through a machine-learning-based system, and discussed an emerging metric of "estimated citation count" also provided by microsoft academic. the researchers concluded that microsoft academic promises to be "an excellent alternative for citation analysis" and suggested microsoft should work to improve coverage of books and grey literature.

google scholar

google scholar was released in beta form in november 2004 and was expanded to include judicial case law in 2009. while google scholar has received much attention in academia, it seems to be regarded by google as a niche product: in 2011 google removed scholar from the list of top services and the list of "more" services, relegating it to the "even more" list. in 2014, the scholar team consisted of just nine people (levy 2014). describing google scholar in an introductory manner is not helped by google's vague documentation, which simply says it "includes scholarly articles from a wide variety of sources in all fields of research, all languages, all countries, and over all time periods." [5] the "wide variety of sources" includes "journal papers, conference papers, technical reports, or their drafts, dissertations, pre-prints, post-prints, or abstracts," as well as court opinions and patents, but not "news or magazine articles, book reviews, and editorials." books and dissertations uploaded to google book search are "automatically" included in scholar. google says abstracts are key, noting "sites that show login pages, error pages, or bare bibliographic data without abstracts will not be considered for inclusion and may be removed from google scholar."

[5] https://scholar.google.com/intl/en/scholar/inclusion.html

studies of google scholar can be divided into three major categories of focus: investigating the coverage of google scholar; the use and utility of google scholar as part of the research process; and google scholar's utility for bibliographic measurement, including evaluating the productivity of individual researchers and the impact of journals. there is some overlap across these categories, because studies of google scholar seem to involve three questions: 1) what is being searched? 2) how does the search function? and 3) to what extent can the user usefully accomplish her task?
the coverage of google scholar

scholars want to know what "scholarship" is covered by google scholar, but the documentation merely states that it indexes "papers, not journals" [6] and challenges researchers to investigate google scholar's coverage empirically, despite its notoriously challenging technical limitations. while some limitations of google scholar have been corrected over the years, longstanding logistical hurdles involved with studying google scholar's coverage have been well documented for over a decade (shultz 2007; bonato 2016; haddaway et al. 2015; levay et al. 2016), and include:

• search queries are limited to 256 characters
• not being able to retrieve more than 1,000 results
• not being able to display more than 20 results per page
• not being able to download batches of results (e.g., to load into citation management software)
• duplicate citations (beyond the multiple article "versions"), requiring manual screening
• retrieving different results with advanced and basic searches
• no designation of the format of items (e.g., conference papers)
• minimal sort options for results
• basic boolean operators only [7]
• illogical interpretation of boolean operators: "esophagus or oesophagus" and "oesophagus or esophagus" return different numbers of results (boeker, vach, and motschall 2013)
• non-disclosure of the algorithm by which search results are sorted.

[6] https://www.google.com/intl/en/scholar/help.html#coverage
[7] e.g., no nesting of logical subexpressions deeper than one level (boeker, vach, and motschall 2013) and no truncation operators.

additionally, one study reported experiencing an automated block to the researcher's ip address after the export of approximately 180 citations or 180 individual searches (haddaway et al. 2015, 14). furthermore, the research excellence framework was unable to use google scholar to assess the quality of research in uk higher education institutions because of researchers' inability to agree with google on a "suitable process for bulk access to their citation information, due to arrangements that google scholar have in place with publishers" (research excellence framework 2013, 1562). such barriers can limit what can be studied and also cost researchers significant time in terms of downloading (prins et al. 2016) and cleaning citations (levay et al. 2016). despite these hurdles, research activity analyzing the coverage of google scholar has continued in the past two years, often building on previous studies. this section will first discuss google scholar's size and ranking, followed by its coverage of articles and citations, then its coverage of books, grey literature, and open access and institutional repositories.

google scholar size and ranking

in a 2014 study, khabsa and giles estimated there were at least 114 million english-language scholarly documents on the web, of which google scholar had "nearly 100 million." another study, by orduna-malea, ayllón, martín-martín, and lópez-cózar (2015), estimated that the total number of documents indexed by google scholar, without any language restriction, was between 160 and 165 million.
by comparison, in 2016 the author's discovery tool contained about 168 million items in academic journals, conference materials, dissertations, and reviews. [8] google scholar's presence in the information marketplace has influenced vendors to increase the discoverability of their content, including pushing for the display of abstracts and/or the first page of articles (levy 2014). proquest and gale indexes were added to google scholar in 2015 (quint 2016). martín-martín et al. (2016b) noted that google scholar's agreements with big publishers come at a price: "the impossibility of offering an api," which would support bibliometricians' research (54).

[8] the discovery tool does not contain all available metadata but has been carefully vetted (fagan and gaines 2016).

google scholar's results ranking "aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature." [9] martín-martín and his colleagues (2017, 159) conducted a large, longitudinal study of null-query results in google scholar and found a strong correlation between result-list ranking and times cited. the influence of citations is so strong that when the researchers performed the same search process four months later, 14.7% of documents were missing in the second sample, causing them to conclude that even a change of one or two citations could lead to a document being excluded from or included in the top 1,000 results (157). using citation counts as a major part of the ranking algorithm has been hypothesized to produce the "matthew effect," where "work that is already influential becomes even more widely known by virtue of being the first hit from a google scholar search, whereas possibly meritorious but obscure academic work is buried at the bottom" (antell et al. 2013, 281). google scholar has been shown to heavily bias its ranking toward english-language publications even when there are highly cited non-english publications in the result set, although selection of interface language may influence the ranking. martín-martín and his colleagues noted that google scholar seems to use the domain of the document's hosting web site as a proxy for language, meaning that "some documents written in english but with their primary version hosted in non-anglophone countries' web domains do appear in lower positions in spite of receiving a large number of citations" (martín-martín et al. 2017, 161). this effect is shown dramatically in figure 3 of their paper.

[9] https://www.google.com/intl/en/scholar/about.html

google scholar coverage: articles and citations

the coverage of articles, journals, and citations by google scholar has commonly been examined by using brute-force methods to retrieve a sample of items from google scholar and possibly one or more of its competitors (studies discussed in this section are listed in table 1). the goal is usually to determine how well google scholar's database compares to traditional research databases, usually in a specific field. core methodology involves importing citations into software such as publish or perish (harzing 2016a), cleaning the data, then performing statistical tests, expert review, or both. haddaway (2015) and moed et al. (2016) have written articles specifically discussing methodological aspects.
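the cleaning step matters because exported citations differ in capitalization, punctuation, and completeness across sources (issues discussed further in the "messy metadata" section below). the sketch below is a minimal illustration of that workflow: it normalizes titles exported from two tools, deduplicates them, and reports overlap. the file names, field names, and normalization rules are assumptions for illustration, not the procedure of any study reviewed here.

```python
# illustrative sketch of the cleaning/comparison step; file names, field names,
# and normalization rules are assumptions, not any reviewed study's procedure.
import csv
import re

def normalize_title(title):
    """collapse case, punctuation, and whitespace so near-duplicate titles match."""
    title = title.lower()
    title = re.sub(r"[^a-z0-9 ]+", " ", title)   # keep only ascii letters, digits, spaces
    return re.sub(r"\s+", " ", title).strip()

def load_titles(path, title_field="title"):
    """read exported citations (e.g., a publish or perish csv) and dedupe by title."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {normalize_title(row[title_field]) for row in csv.DictReader(handle)}

if __name__ == "__main__":
    scholar = load_titles("google_scholar_export.csv")   # hypothetical export files
    wos = load_titles("web_of_science_export.csv")
    overlap = scholar & wos
    print(f"google scholar only: {len(scholar - wos)}")
    print(f"web of science only: {len(wos - scholar)}")
    print(f"overlap: {len(overlap)} ({len(overlap) / len(scholar | wos):.0%} of the union)")
```

real studies typically follow this kind of automated matching with manual or expert review of unmatched records, since title normalization alone cannot resolve every duplicate.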
recent studies repeatedly find that google scholar's coverage meets or exceeds that of other search tools, regardless of whether the target samples comprise journals, articles, or citations (karlsson 2014; harzing 2014; harzing 2016b; harzing and alakangas 2016b; moed, bar-ilan, and halevi 2016; prins et al. 2016; wildgaard 2015; ciccone and vickery 2015). in only three studies did google scholar find fewer items, and the meaningful difference was minimal. [10] science disciplines were the most studied in google scholar, including agriculture, astronomy, chemistry, computer science, ecology, environmental science, fisheries, geosciences, mathematics, medicine, molecular biology, oceanography, physics, and public health. social sciences studied include education (prins et al. 2016), economics (harzing 2014), geography (ştirbu et al. 2015, 322-329), information science (winter, zadpoor, and dodou 2014; harzing 2016b), and psychology (pitol and de groote 2014). studies related to the arts or humanities in 2014-2016 included an analysis of open access journals in music (testa 2016) and a comparison between google scholar and web of science for research evaluation within education, pedagogical sciences, and anthropology [11] (prins et al. 2016). wildgaard (2015) and bornmann et al. (2016) included samples of humanities scholars as part of bibliometric studies but did not discuss disciplinary aspects related to coverage. prior to 2014, the only study found related to the arts and humanities compared google scholar with historical abstracts (kirkwood jr. and kirkwood 2011).

[10] for example, bramer, giustini, and kramer (2016a) found slightly more of their 4,795 references from systematic reviews in embase (97.5%) than in google scholar (97.2%). in testa (2016), the music database rilm indexed two more of the 84 oa journals than google scholar (which indexed at least one article from 93% of the journals). finally, in a study using citations to the most-cited article of all time as a sample, web of science found more citations than did google scholar (winter, zadpoor, and dodou 2014).
[11] prins et al. classified anthropology as part of the humanities.

google scholar's coverage has been growing over time (meier and conkling 2008; harzing 2014; winter, zadpoor, and dodou 2014; bartol and mackiewicz-talarczyk 2015, 531; orduña-malea and delgado lópez-cózar 2014), with recent increases in older articles (winter, zadpoor, and dodou 2014; harzing and alakangas 2016b), leading some to question whether this supports the documented trend of increased citation of older literature (martín-martín et al. 2016c; varshney 2012). winter et al. noted that in 2005 web of science yielded more citations than google scholar for about two-thirds of their sample, but for the same sample in 2013, google scholar found more citations than web of science, with only 6.8% of citations not retrieved by google scholar (winter, zadpoor, and dodou 2014, 1560). the unique citations of web of science were "typically documents before the digital age and conference proceedings not available online" (winter, zadpoor, and dodou 2014, 1560). harzing and alakangas's (2016b) large-scale longitudinal comparison of google scholar, scopus, and web of science suggested that google scholar's retroactive expansion has stabilized and that all three databases are now growing at similar rates.

google scholar also seems to cover both the oldest and the most recent publications.
unlike traditional abstracts and indexes, google scholar is not limited by starting year, so as publishers post tables of contents of their earliest journals online, google scholar discovers those sources (antell et al. 2013, 281). trapp (2016) reported the number of citations to a highly cited physics paper after the first 11 days of publication to be 67 in web of science, 72 in scopus, and 462 in google scholar (trapp 2016, 4). in a study of 800 citations to nobelists in multiple fields, harzing found that "google scholar could effectively be 9-12 months ahead of web of science in terms of publication and citation coverage" (2013, 1073).

an increasing proportion of journal articles in google scholar are freely available in full text. a large-scale, longitudinal study of highly cited articles from 1950-2013 found 40% of article citations in the sample were freely available in full text (martín-martín et al. 2014). another large-sample study found 61% of articles in their sample from 2004-2014 could be freely accessed (jamali and nabavi 2015). in both studies, nih.gov and researchgate were the top two full-text providers.

google scholar's coverage of major publisher content varies; having some coverage of a publisher does not imply all articles or journals from that publisher are covered. in a sample of 222 citations compared across google scholar, scopus, and web of science, google scholar contained all of the springer titles, as many elsevier titles as scopus, and the most articles by wolters kluwer and john wiley. however, among the three databases, google scholar contained the fewest articles by bmj and nature (rothfus et al. 2016).

table 1. studies investigating google scholar's coverage of journal articles and citations, 2014-2016.

study: (bartol and mackiewicz-talarczyk 2015)
sample: documents retrieved in response to searches on crops and fibers in article titles, 1994-2013 (samples varied by crop)
results: google scholar returned more documents for each crop; for example, "hemp" retrieved 644 results in google scholar, 493 in scopus, and 318 in web of science. google scholar demonstrated higher yearly growth of records over time.

study: (bramer, giustini, and kramer 2016b)
sample: references from a pool of systematic reviewer searches in medicine (n=4,795)
results: google scholar found 97.2%, embase 97.5%, and medline 92.3% of all references; when using search strategies, embase retrieved 81.6%, medline 72.6%, and google scholar 72.8%.

study: (ciccone and vickery 2015)
sample: based on 183 user searches randomly selected from ncsu libraries' 2013 summon search logs (n=137)
results: no significant difference between the performance of google scholar, summon, and eds for known-item searches; "google scholar outperformed both discovery services for topical searches."

study: (harzing 2014)
sample: publications and citation metrics for 20 nobelists in chemistry, economics, medicine, and physics, 2012-2013 (samples varied)
results: google scholar coverage is now "increasing at a stable rate" and provides "comprehensive coverage across a wide set of disciplines for articles published in the last four decades" (575).

study: (harzing 2016b)
sample: citations from one researcher (n=126)
results: microsoft academic found all books and journal articles covered by google scholar; google scholar found 35 additional publications, including book chapters, white papers, and conference papers.
study: (harzing and alakangas 2016a)
sample: samples from (harzing and alakangas 2016b, 802) (samples varied by faculty)
results: google scholar provided higher "true" citation counts than microsoft academic, but microsoft academic "estimated" citation counts were 12% higher than google scholar for the life sciences and equivalent for the sciences.

study: (harzing and alakangas 2016b)
sample: citations of the works of 145 faculty among 37 scholarly disciplines at the university of melbourne (samples varied by faculty)
results: for the top faculty member, google scholar had 519 total papers (compared with 309 in both web of science and scopus); google scholar had 16,507 citations (compared with 11,287 in web of science and 11,740 in scopus).

study: (hilbert et al. 2015)
sample: documents published by 76 information scientists in german-speaking countries (n=1,017)
results: google scholar covered 63%; scopus, 31%; bibsonomy, 24%; mendeley, 19%; web of science, 15%; citeulike, 8%.

study: (jamali and nabavi 2015)
sample: items published between 2004 and 2014 (n=8,310)
results: 61% of articles were freely available; of these, 81% were publisher versions and 14% were pre-prints. researchgate was the top full-text source, netting 10.5% of full-text sources, followed by ncbi.nlm.nih.gov (6.5%).

study: (karlsson 2014)
sample: journals from ten different fields (n=30)
results: google scholar retrieved documents from all the selected journals; summon only retrieved documents from 14 out of 30 journals.

study: (lee et al. 2015)
sample: journal articles housed in florida state university's institutional repository (n=170)
results: metadata found in google for 46% of items and in google scholar for 75% of items; google scholar found 78% of available full text. google scholar found full text for six items with no full text in the ir.

study: (martín-martín et al. 2014)
sample: items highly cited by google scholar (n=64,000)
results: 40% could be freely accessed using google scholar; nih.gov and researchgate were the top two full-text providers.

study: (moed, bar-ilan, and halevi 2016)
sample: citations to 36 highly cited articles in 12 scientific-scholarly english-language journals (n=about 7,000)
results: 47% of sources were in both google scholar and scopus; 47% of sources were in google scholar only; 6% of sources were in scopus only. of the unique google scholar citations, sources were most often from google books, springer, ssrn, researchgate, acm digital library, arxiv, and aclweb.org.

study: (prins et al. 2016)
sample: article citations in the field of education and pedagogies, and citations to 328 articles in anthropology (n=774)
results: google scholar found 22,887 citations in education & pedagogical science compared to web of science's 8,870, and 8,092 in anthropology compared with web of science's 1,097.

study: (ştirbu et al. 2015)
sample: compared the number of citations resulting from two geographical topic searches (samples varied)
results: google scholar found 2,732 geographical references, whereas web of science found only 275, georef 97, and francis 45. for sedimentation, google scholar found 1,855 geographical references compared to web of science's 606, georef's 1,265, and francis's 33. google scholar overlapped web of science by 67% and 82% for the two searches, and georef by 57% and 62%.

study: (testa 2016)
sample: open access journals in music (n=84)
results: google scholar indexed at least one article from 93% of oa journals. rilm indexed two additional journals.
study: (wildgaard 2015)
sample: publications from researchers in astronomy, environmental science, philosophy, and public health (n=512)
results: publication count from web of science was 2-4 times lower for all disciplines than google scholar; citation count was up to 13 times lower in web of science than in google scholar.

study: (winter, zadpoor, and dodou 2014)
sample: growth of citations to two classic articles (1995-2013) and 56 science and social science articles in google scholar, 2005-2013 (samples varied)
results: total citation counts were 21% higher in web of science than google scholar for lowry (1951), but google scholar was 17% higher than web of science for garfield (1955) and 102% higher for the 56 research articles; google scholar showed a significant retroactive expansion for all articles compared to negligible retroactive growth in web of science.

google scholar coverage: books

many studies mentioned that books, including google books, are sometimes included in google scholar results. jamali and nabavi (2015) found 13% of their sample of 8,310 citations from google scholar were books, while martín-martín et al. (2014) had found that 18% of their sample of 64,000 citations from google scholar were books. within the field of anthropology, prins (2016) found books to generate the most citation impact in google scholar (41% of books in their sample were cited in google scholar) compared to articles (21% of articles were cited in google scholar). in education, 31% of articles and 25% of books were cited by google scholar (3). abrizah and thelwall found only 37% of their sample of 1,357 arts, humanities, and social sciences books from the five main university presses in malaysia had been cited in google scholar (23% of the books had been cited in google books) (abrizah and thelwall 2014, 2502). the overlap was small: 15% had impact in both google scholar and google books. the authors concluded that, due to the low overlap between google scholar and google books, searching both engines is required to find the most citations to academic books. english books were significantly more likely to be cited in google scholar (48% vs. 32%), as were edited books (53% vs. 36%). they surmised edited books' citation advantage was due to the use of book chapters in the social sciences. they found arts and humanities books more likely to be cited in google scholar than social sciences books (40% vs. 34%) (abrizah and thelwall 2014, 2503).

google scholar coverage: grey literature

grey literature refers to documents not published commercially, including theses, reports, conference papers, government information, and poster sessions. haddaway et al. (2015) was the only empirical study found that focused on grey literature. they discovered that between 8% and 39% of full-text search results from google scholar were grey literature, with the greatest concentration of citations from grey literature on page 80 of results for full-text searches and page 35 for title searches. they concluded "the high proportion of grey literature that is missed by google scholar means it is not a viable alternative to hand searching for grey literature as a standalone tool" (2015, 14). for one of the systematic reviews in their sample, none of the 84 grey literature articles cited were found within the exported google scholar search results.
the only other investigation of grey literature found was bonato (2016), who, after conducting a very limited number of searches on one specific topic and a search for a known item, concluded google scholar to be "deficient." in conclusion, despite much offhand praise for google scholar's grey literature coverage (erb and sica 2015; antell et al. 2013), the topic has been little studied, and when it has, grey literature results have not been prominent.

google scholar coverage: open access and institutional repository content

erb and sica touted google scholar's access to "free content that might not be available through a library's subscription services," including open access journals and institutional repository coverage (2015, 48). recent research has dug deeper into both these content areas.

in general, oa articles have been shown to net more citations than non-oa articles, as koler-povh, južnic, and turk (2014) showed within the field of civil engineering. across their sample of 2,026 scholarly articles in 14 journals, all indexed in web of science, scopus, and google scholar, oa articles received an average of 43 citations while non-oa articles were cited 29 times (1039). google scholar did a better job discovering those citations; in google scholar the median number of citations of oa articles was always higher than that for non-oa articles, whereas this was true in web of science for only 10 of the 14 journals and in scopus for 11 of the 14 journals (1040). similarly, chen (2014) found google scholar to index far more oa journals than scopus and web of science, especially "gold oa." [12] google scholar's advantage should not be assumed across all disciplines, however; testa (2016) found both google scholar and rilm to provide good coverage of oa journals in music, with google scholar indexing at least one article from 93% of the 84 oa journals in the sample. but the bibliographic database rilm indexed two more oa journals than google scholar.

[12] oa articles on publisher web sites, whether the journal itself is oa or not (chen 2014).

google scholar indexing of repositories may be critical for success, but results vary by ir platform and by whether the ir metadata has been structured according to google's guidelines. in a random sample from shodhganga, india's central etd database, weideman (2015) found not one article had been indexed in full text by google scholar, although in many cases the metadata was indexed, leading the author to identify needed changes to the way shodhganga stores etds. [13] likewise, chen (2014) found that neither google scholar nor google appears to index baidu wenku, a major full-text archive and social networking site in china similar to researchgate, and orduña-malea and lópez-cózar (2015) found that latin american repositories are not very visible in google or google scholar due to limitations of the description schemas chosen as well as search engine reliability. in yang's (2016) study of texas tech's dspace ir, google was the only search engine that indexed, discovered, or linked to pdf files supplemented with metadata; google scholar did not discover or provide links to the ir's pdf files and was less successful at discovering metadata.

[13] most notably, the need to store thesis documents as one pdf file instead of dividing them into multiple, separate files, to create html landing pages as per google's recommendations, and to submit the addresses of these pages to google scholar.
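the html landing pages mentioned in note 13 refer to google scholar's inclusion guidelines (the document cited in note 5), which ask repositories to expose bibliographic metadata as meta tags on each item's page. the sketch below generates such a page head; the highwire-style tag names follow those guidelines as published at the time of writing, but the helper itself is illustrative and is not code from any repository platform discussed above.

```python
# illustrative helper: emits the kind of landing-page metadata google scholar's
# inclusion guidelines ask repositories to provide; tag names follow those
# guidelines at the time of writing, but this is a sketch, not platform code.
from html import escape

def scholar_meta_tags(title, authors, pub_date, pdf_url):
    """build highwire-style meta tags for one etd or article landing page."""
    tags = [("citation_title", title)]
    tags += [("citation_author", author) for author in authors]  # one tag per author
    tags += [("citation_publication_date", pub_date),  # yyyy/mm/dd per the guidelines
             ("citation_pdf_url", pdf_url)]            # single full-text pdf, not split files
    return "\n".join(
        f'<meta name="{name}" content="{escape(value)}">' for name, value in tags
    )

if __name__ == "__main__":
    print(scholar_meta_tags(
        title="an example thesis title",
        authors=["lastname, firstname"],
        pub_date="2016/05/01",
        pdf_url="https://repository.example.edu/etd/1234.pdf",  # hypothetical url
    ))
```

pointing citation_pdf_url at one consolidated pdf reflects the change weideman recommends for shodhganga in note 13: full text split across multiple files is less likely to be indexed.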
when google scholar is able to index ir content, it may be responsible for significant traffic. in a study of four major u.s. universities' institutional repositories (three dspace, one contentdm) involving a dataset of 57,087 unique urls and 413,786 records, researchers found that 48%-66% of referrals came from google scholar (obrien et al. 2016, 870). the importance of google scholar in contrast to google was noted by lee et al. (2015), who conducted title searches on 170 journal articles housed in florida state university's institutional repository (using bepress's digital commons platform), 100 of which existed in full text in the ir. links to the ir were found in google results for 45.9% of the 170 items and in google scholar for 74.7% of the 170 items. furthermore, google scholar linked to the full text for 78% of the 100 cases where full text was available, and it even provided links to freely available full text for six items that did not have full text in the ir. however, the researchers also noted "relying on either google or google scholar individually cannot ensure full access to scholarly works housed in oa irs." in their study, among the 104 fully open access items there was an overlap in results of only 57.5%; google provided links to 20 items not found with google scholar, and google scholar provided links to 25 items not found with google (lee et al. 2015, 15).

google scholar results note the number of "versions" available for each item. in a study of 982 science article citations (including both oa and non-oa) in irs, pitol and de groote found 56% of citations had between four and nine google scholar versions (2014, 603). almost 90% of the citations shown were the publisher version, but of these, only 14.3% were freely available in full text on the publisher web site. meanwhile, 70% of the items had at least one free full-text version available through a "hidden" google scholar version. the author's experience in retrieving full text for this review indicates this issue still exists, but research would be needed to formulate reliable recommendations for users.

use and utility of google scholar as part of the research process

studies were found concerning google scholar's popularity with users and their reasons for preferring it (or not) over other tools. another group of studies examined issues related to the utility of google scholar for research processes, including issues related to messy metadata. finally, a cluster of articles focused specifically on using google scholar for systematic reviews.

popularity and user preferences

several studies have shown google scholar to be well known to scholarly communities. a survey of 3,500 scholars from 95 countries found that over 60% of scientists and engineers and over 70% of respondents in the social sciences, arts, and humanities were aware of google scholar and used it regularly (van noorden 2014). in a large-scale journal-reader survey, inger and gardner (2016) found that among academic researchers in high-income areas, academic search engines surpassed abstracts and indexes as a starting place for research (2016, 85, figure 4). in low-income areas, google use exceeded google scholar use for academic research.
major library link resolver software offers reports of full-text requests broken down by referrer. inger and gardner (2016) showed a large variance across subjects in whether people prefer google or google scholar: "people in the social sciences, education, law, and business use google scholar more to find journal articles. however, people working in the humanities and religion and theology prefer to use google" (88). humanities scholars' use of google over google scholar was also found by kemman et al. (2013); google, google images, google scholar, and youtube were used more than jstor or other library databases, even though humanities scholars' trust in google and google scholar was lower.

user research since 2014 concerning google scholar has focused on graduate students. results suggest scholar is used regularly but is only partially sufficient. in their study of 20 engineering master's students' use of abstracts and indexes, johnson and simonsen (2015) found that half their sample (n=20) had used google scholar the last time they located an article using specific search terms or criteria. google was the second most-used source at 20%, followed by abstracting and indexing services (15%). graduate students describe google scholar with nuance and refer to it as a specific part of their process. in bøyum and aabø's (2015) interviews with eight phd business students and wu and chen's (2014, 381) interviews with 32 graduate students drawn from multiple academic disciplines, the majority described using library databases and google scholar for different purposes depending on the context. graduate students in both studies were well aware of google scholar's use for citation searching. bøyum and aabø's (2015) subjects described library resources as more "academically robust" than google or google scholar. wu and chen's (2014) interviewees praised google scholar for its wider coverage and convenience, but lamented its uncertain quality, sometimes inaccessible full text, excessive results, lack of sort options (by document type or date), retrieval of documents from other disciplines, and duplicate citations. google scholar was seen by their subjects as useful during early stages of information seeking. in contrast to general assumptions, more than half the students interviewed (wu and chen 2014, 381) reported browsing more than three pages' worth of google scholar results. about half of interviewees reported looking at cited documents to find more; however, students had mixed opinions about whether the citing documents turned out to be relevant.

google scholar's "my library" feature, introduced in 2013, now competes with other bibliographic citation management software. in a survey of 344 (mostly graduate) students, conrad, leonard, and somerville found google scholar was the most used (47%), followed by endnote (37%) and zotero (19%) (2015, 572).
follow-up interviews with 13 of the students revealed that a few students used multiple tools; for example, one participant used endnote for sharing data with lab partners and others "across the community"; mendeley for her own thesis work, where she needs to "build a whole body of literature"; and google scholar citations for "quick reference lists that i may not need for a second or third time."

messy metadata

many studies have suggested google scholar's metadata is "messy." although none in the period of study examined this phenomenon in conjunction with relative user performance, the issues found could affect scholarship. a 2016 study itemized the most common mistakes in google scholar resulting from its extraction process: 1) incorrect title identification; 2) missing or incorrectly assigned authors; 3) book reviews indexed as books; 4) failing to group versions of the same document, which inflates citation counts; 5) grouping different editions of books, which deflates citation counts; 6) attributing citations to documents that did not cite them, or missing citations that did; and 7) duplicate author profiles (martín-martín et al. 2016b). the authors concluded that "in an academic big data environment, these errors (which we deem affect less than 10% of the records in the database) are of no great consequence, and do not affect the core system performance significantly" (54). two of these issues have been studied specifically: duplicate citations and missing publication dates.

the rate of duplicate citations in google scholar has been reported at up to 2.93% (haddaway et al. 2015) and 5% (winter, zadpoor, and dodou 2014, 1562), which can be compared to a 0.05% duplicate citation rate in web of science (haddaway et al. 2015, 13). haddaway found the main reasons for duplication include "typographical errors, including punctuation and formatting differences; capitalization differences (google scholar only), incomplete titles, and the fact that google scholar scans citations within reference lists and may include those as well as the citing article" (2015, 13). the issue of missing publication dates varies greatly across samples. dates were found to be missing 9% of the time in winter et al.'s study, although this varied by publication type: 4% of journals, 15% of theses, and 41% of the unknown document types (winter, zadpoor, and dodou 2014, 1562). however, martín-martín et al. studied a sample of 32,680 highly cited documents and found that web of science and google scholar agreed on publication dates 96.7% of the time, with an idiosyncratically large proportion of those mismatches in 2012 and 2013 (2017, 159).

utility for research processes

prior to 2014, studies such as asher, duke, and wilson's (2012) evaluated google scholar's utility as a general research tool, often in comparison with discovery tools. since 2014, the only such study found was namei and young's comparison of summon, google scholar, and google using 299 known-item queries. they found google scholar and summon returned relevant results 74% of the time; google returned relevant results 91% of the time. for "scholarly formats," they found summon returned relevant results 76% of the time; google scholar, 79%; and google, 91% (2015, 526-527). the remainder of studies in this category focused specifically on systematic reviews, perhaps because such reviews are so time-consuming.
authors develop search strategies carefully, execute them in multiple databases, and document their search methods and results. some prestigious journals are beginning to require similar rigor for any original research article, not just systematic reviews (cals and kotz 2016). information provided by professional organizations about the use of google scholar for systematic reviews seems inconsistent: the cochrane handbook for systematic reviews of interventions lists google scholar among sources for searching, but none of the five "highlighted reviews" on the cochrane web site at the time of this article's writing used google scholar in their methodologies. the manual of the uk's national institute for health and care excellence (nice) mentions google scholar only in an appendix of search sources under "conference abstracts."

a study by gehanno et al. (2013) found google scholar contained 100% of the references from 29 systematic reviews and suggested google scholar could be the first choice for systematic reviews or meta-analyses. this finding prompted a slew of follow-up studies in the next three years. an immediate response by giustini and boulos (2013) pointed out that systematic reviews are not performed by searching for article titles, as with gehanno et al.'s method, but through search strategies. when they tried to replicate a systematic review's topical search strategy in google scholar, the citations were not easily discovered. in addition, the authors were not able to find all the papers from a given systematic review even by title searching. haddaway et al. also found imperfect coverage: for one of the seven reviews examined, 31.5% of citations could not be found (2015, 11). haddaway also noted that special characters and fonts (as with chemical symbols) can cause poor matching when such characters are part of article titles.

recent literature concurs that it is still necessary to search multiple databases when conducting a systematic review, including abstracts and indexes, no matter how good google scholar's coverage seems to be. no one database's coverage is complete, including google scholar's (thielen et al. 2016), and the practical recall of google scholar is exceptionally low due to the 1,000-result limit; yet at the same time, google scholar's lack of precision is costly in terms of researchers' time (bramer, giustini, and kramer 2016b; haddaway et al. 2015). the challenges limiting study of google scholar's coverage also bedevil those wishing to use it for reviews, especially the 1,000-result retrieval limit, lack of batch export, and lack of exported abstracts (levay et al. 2016). additionally, google scholar's changing content, unknown algorithm and updating practices, search inconsistencies, limited boolean functions, and 256-character query limit prevent the tool from accommodating the detailed, reproducible search methodologies required by systematic reviews (bonato 2016; haddaway et al. 2015; giustini and boulos 2013). bonato noted google scholar retrieved different results with advanced and basic searches, could not determine the format of items (e.g., conference papers), and returned other inconsistent results. [14] bonato also lamented the lack of any kind of document type limit.

[14] bonato (2016) found zero hits for conference papers when limiting by year to 2015-2016, but found two papers presented at a 2015 meeting.
despite the limitations and logistical challenges, practitioners and scholars are finding solid reasons for including academic web search engines as part of most systematic review methodologies (cals and kotz 2016). stansfield et al. noted that "relevant literature for low- and middle-income countries, such as working and policy papers, is often not included in databases," and that google scholar finds additional journal articles and grey literature not indexed in databases (2016, 191). for eight systematic reviews by the eppi-center, "over a quarter of relevant citations were found from websites and internet search engines" (stansfield, dickson, and bangpan 2016, 2).

specific tools and practices have been recommended when using search engines within the context of systematic reviews. software is available to record search strategies and results (harzing and alakangas 2016b; haddaway 2015). haddaway suggests the use of snapshot tools (haddaway 2015) to record the first 1,000 google scholar records rather than the typical assessment of the first 50 search results as had been done in the past: "this change in practice could significantly improve both the transparency and coverage of systematic reviews, especially with respect to their grey literature components" (haddaway et al. 2015, 15). both haddaway (2015) and cochrane recommend that review authors print or save locally electronic copies of the full text or relevant details rather than bookmarking web sites, "in case the record of the trial is removed or altered at a later stage" (higgins and green 2011). new methods for searching, downloading, and integrating academic search engine results into review procedures using free software to increase transparency, repeatability, and efficiency have been proposed by haddaway and his colleagues (2015).

google scholar citations and metrics

google scholar citations and google scholar metrics are not academic search engines, but this article includes them because these products are interwoven into the fabric of the google scholar database. google scholar citations, launched in late 2011 (martín-martín et al. 2016b, 12), groups citations by author, while google scholar metrics (launch date uncertain) provides similar data for articles and journals. readers interested in an in-depth literature review of google scholar citations for earlier years (2005-2012) are directed to thelwall and kousha (2015b). in his comprehensive review of more recent literature about using google scholar citations for citation analysis, waltman (2016) described several themes. google scholar's coverage of many fields is significantly broader than that of web of science and scopus, and this seems to be continuing to improve over time. however, studies regularly report google scholar's inaccuracies, content gaps, phantom data, easily manipulable citation counts, lack of transparency, and limitations for empirical bibliometric studies. as discussed in the coverage section, google scholar's citation database is competitive with other major databases such as web of science and has been growing dramatically in the last few years (winter, zadpoor, and dodou 2014; harzing and alakangas 2016b; harzing 2014) but has recently stabilized (harzing and alakangas 2016b).
more and more studies are concluding that google scholar will report more comprehensive information about citation impact than web of science or scopus. across a sample of articles from many years of one science journal, trapp (2016) found the proportion of articles with zero citations was 37% for web of science, 29% for scopus, and 19% for google scholar. some of google scholar's superiority for citation analysis in the social sciences and humanities is due to its inclusion of book content, software, and additional journals (prins et al. 2016; bornmann et al. 2016). bornmann et al. (2016) noted citations to all ten of a research institute's books published in 2009 were found in google scholar, whereas web of science found citations for only two books. furthermore, they found data in google scholar for 55 of the institute's 71 book chapters. the four conference proceedings they could identify had 100 citations, of which 65 could be found in google scholar. the comparative success of google scholar for citation impact varies by discipline, however: levay et al. (2016) found web of science to be more reliable than google scholar, quicker for downloading results, and better for retrieving 100% of the most important publications in public health.

despite google scholar's growth, using all three major tools (scopus, web of science, and google scholar) still seems to be necessary for evaluating researcher productivity. rothfus et al. (2016) compared web of science, scopus, and google scholar citation counts for evaluating the impact of the canadian network for observational drug effect studies (cnodes), as represented by a sample of 222 citations from five articles. attempting to determine citation metrics for the cnodes research team yielded different results for every article when using the three tools. they found that "using three tools (web of science, scopus, google scholar) to determine citation metrics as indicators of research performance and impact provided varying results, with poor overall agreement among the three" (237). major academic libraries' web sites often explain how to find one's h-index in all three (suiter and moulaison 2015).

researchers have also noted the disadvantages of google scholar for citation impact studies. google scholar is costly in terms of researcher time. levay et al. (2016) estimated the cost of "administering results" from web of science to be 4 hours versus 75 hours for google scholar. administering results includes using the search tool to search, download, and add records to bibliographic citation software, and removing duplicate citations. duplicate citations are often mentioned as a problem (prins et al. 2016), although moed (2016) suggested the double counting by google scholar would occur only if the level of analysis is on target sources, not if it is on target articles. [15] downloaded citation samples can still suffer from double counts, however: harzing and alakangas described how cleaning "a fairly extreme case" in their study reduced the number of papers from 244 to 106 (2016b). google scholar also does not identify self-citations, which can dramatically influence the meaning of results (prins et al. 2016).

[15] "if a document is, for instance, first published in arxiv, and a next version later in a journal j, citations to the two versions are aggregated. in google scholar metrics, in which arxiv is included as a source, this document (assuming that its citation count exceeds the h5 value of arxiv and journal j) is listed both under arxiv and under journal j, with the same, aggregate citation count" (moed 2016, 29).
furthermore, researchers have shown it is possible to corrupt google scholar citations by uploading obviously false documents (delgado lópez-cózar, robinson-garcía, and torres-salinas 2014). while the researchers noted traditional citation indexes can also be defrauded, google's products are less transparent, and abuses may not be easily detected. google did not respond to the research team when contacted and simply deleted the false documents to which it had been alerted without reporting the situation to the affected authors, and the researchers concluded: "this lack of transparency is the main obstacle when considering google scholar and its by-products for research evaluation purposes" (453).

because these disadvantages do not outweigh google scholar's seemingly broader coverage, many articles investigate workarounds for using google scholar more effectively when evaluating research impact. harzing and alakangas (2016b) recommend the hia index [16], which is corrected for career length and co-authorship patterns, as the citation metric of choice for a fair comparison of google scholar with other tools. bornmann et al. (2016) investigated a method to normalize data and reduce errors when using google scholar data to evaluate citations in the social sciences and humanities.

[16] harzing and alakangas (2016b) define the hia as hi norm / academic age. academic age refers to the number of years elapsed since first publication. to calculate hi norm, one divides the number of citations by the number of authors for each paper and then calculates the h-index of the normalized citation counts.
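as a worked illustration of the hia calculation summarized in note 16, the sketch below computes hi norm and hia from a list of (citation count, number of authors) pairs. the function names and sample data are illustrative only; they are not taken from harzing and alakangas's implementation.

```python
# illustrative sketch of the hia calculation summarized in note 16:
# normalize each paper's citations by its author count, take the h-index
# of the normalized counts (hi norm), then divide by academic age.
def h_index(citation_counts):
    """largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

def hia(papers, years_since_first_publication):
    """papers: list of (citations, n_authors) tuples; returns the hia value."""
    normalized = [citations / n_authors for citations, n_authors in papers]
    hi_norm = h_index(normalized)
    return hi_norm / years_since_first_publication

if __name__ == "__main__":
    sample = [(40, 2), (25, 5), (12, 1), (9, 3), (3, 4)]  # hypothetical publication record
    print(hia(sample, years_since_first_publication=10))  # hi norm of 3 over 10 years = 0.3
```

dividing by author count and by academic age is what makes the metric comparable across researchers with different co-authorship patterns and career lengths, which is why harzing and alakangas prefer it for cross-tool comparisons.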
researcher profiles can also be used to find other scholars by topic. in a 2014 survey of researchers (n=8,554), dagienė and krapavickaitė found that 22% used a third-party service such as google scholar or microsoft academic to produce lists of their scholarly activities, and 63% reported their scholarly record was freely available on the web (2016, 158, 161). google scholar ranked second only to microsoft word as the most frequently used software to maintain academic activity records (160). martín-martín et al. (2016b) examined 814 authors in the field of bibliometrics using google scholar citations, researcherid, researchgate, mendeley, and twitter. google scholar was the most used social research sharing platform, followed by researchgate, with researcherid gaining wider acceptance among authors deemed "core" to the field. only about one-third of the authors created a twitter profile, and many mendeley and researcherid profiles were found empty. the study found the distinctive advantages of google scholar academic profiles to be automatic updates and a high growth rate, and its disadvantages to be scarce quality control, metadata mistakes inherited from google scholar, and its manipulability. overall, martín-martín and colleagues concluded that google scholar "should be the preferred source for relational and comparative analyses in which the emphasis is put on author clusters" (57).

google scholar metrics provides citation information for articles and journals. in a sample of 1,000 journals, orduña-malea and delgado lópez-cózar found that "despite all the technical and methodological problems," google scholar metrics provides sound and reliable journal rankings (2014, 2365). google scholar metrics seems to be an annual publication; the 2016 edition contains 5,734 publications and 12 language rankings. russian, korean, polish, ukrainian, and indonesian were added this year, while italian and dutch were removed for unknown reasons (martín-martín et al. 2016a). researchers also found that many discussion papers and working papers were removed in 2016. english-language publications are broken into subject areas and disciplines. google scholar metrics often, but not always, creates separate entries for each language in which a journal is published. bibliometricians call for google scholar metrics to display the total number of documents published in the publications indexed and the total number of citations received: "these are the two essential parameters that make it possible to assess the reliability and accuracy of any bibliometric indicator" (13). other improvements suggested by lópez-cózar and colleagues include adding country and language of publication and self-citation rates.

informing practice

the glaring lack of research related to the coverage of arts and humanities scholarship, the limited research on book coverage, and the relaunch of microsoft academic make it impossible to form a general recommendation regarding the use of academic web search engines for serious research. until the ambiguity of arts and humanities coverage is clarified, and until academic web search engines are transparent and stable, traditional bibliographic databases still seem essential for systematic reviews, citation analysis, and other rigorous literature search purposes. discipline-specific databases also have features such as controlled vocabulary, industry classification codes, and peer review indicators that make scholars more efficient and effective. nevertheless, the increasing relevance of academic search engines and their solid coverage of the sciences and social sciences make it essential for librarians to become expert with google scholar, google books, and microsoft academic. for some scholarly tasks, academic search engines may be superior: for example, when looking up doi numbers for this paper's bibliography, the most efficient process seemed to be a google search on the article title plus the term "doi," and the most likely site to display in the results was researchgate. [17] librarians and scholars should champion these tools as an important part of an efficient, effective scholarly research process (walsh 2015), while also acknowledging their gaps in coverage, biases, metadata issues, and missing features that are available in other databases. academic web search engines could form the centerpiece for instruction sessions surrounding the scholarly network, as shown by "cited by" features, author profiles, and full-text sources. traditional abstracts and indexes could then be presented on the basis of their strengths. at some point, explaining how to access full text will likely no longer focus on the link resolver but on the many possible document versions a user might encounter (e.g., pre-prints or editions of books) and how to make an informed choice.

[17] because the authority of researchgate is ambiguous, in such cases i then looked up the doi using google to find the publisher's version. in some cases, the doi was not displayed on the publisher's result page (e.g., https://muse.jhu.edu/article/197091).
in the meantime, even though web search engines and repositories may retrieve copious full text outside library subscriptions, college students should still be made aware of the library's collections and services such as interlibrary loan. when considering google scholar's weaknesses, it's important to keep in mind chen's observation that we may not have a tool available that does any better (antell et al. 2013). while google scholar may be biased toward english-language publications, so are many bibliographic databases. overall, google scholar seems to have increased the visibility of international research (bartol and mackiewicz-talarczyk 2015). while google scholar's coverage of grey literature has been shown to be somewhat uneven (bonato 2016; haddaway et al. 2015), it seems to include more diversity among relevant document types than many abstracts and indexes (ştirbu et al. 2015; bartol and mackiewicz-talarczyk 2015). although the rigors of systematic reviews may contraindicate the tool's use as a single source, it adds value to search results from other databases (bramer, giustini, and kramer 2016a). user preferences and priorities should also be taken into account; google scholar results have been said to contain "clutter," but many researchers have found the noise in google scholar tolerable given its other benefits (ştirbu et al. 2015).

google books purportedly contains about 30 million items, focused on u.s.-published and english-language books. but its coverage is hit-or-miss, surprising mays (2015) with an unexpected wealth of primary sources but disappointing harper (2016) with limited coverage of academic health sciences books. recent court decisions have enabled google to continue progressing toward its goal of full-text indexing and making snippet views available for the google-estimated universe of 130 million books, which suggests its utility may increase. google books is not integrated with link resolvers or discovery tools but has been found useful for providing information about scholarly research impact, especially for the arts, humanities, and social sciences.

as re-launched in 2016, microsoft academic shows real potential to compete with google scholar in coverage and utility for finding journal articles. as of february 2017 its index contains 120 million citations. in contrast to the mystery of google scholar's black-box algorithms and restrictive limitations, microsoft academic uses an open-system approach and offers an api. microsoft academic appears to have less coverage of books and grey literature compared with google scholar. research is badly needed about the coverage and utility of both google books and microsoft academic.

google scholar continues to evolve, launching a new algorithm for known-item searching in 2016 [18] that appears to work very well. google scholar does not reveal how many items it searches, but studies have suggested 160 million documents have been indexed. studies have shown the google scholar relevance algorithm to be heavily influenced by citation counts and language of publication.

[18] google scholar's blog notes that in january 2016, a change was made so "scholar now automatically identifies queries that are likely to be looking for a specific paper"; technically speaking, "it tries hard to find the intended paper and a version that that particular user is able to read." https://scholar.googleblog.com/.
google scholar has been so heavily researched and is such a “black box” that more attention would seem to have diminishing returns, except in the area of coverage of and utility for arts and humanities research. librarians may find these takeaways useful for working with or teaching google scholar: • little is known about coverage of arts and humanities by google scholar. • recent studies repeatedly find that in the sciences and social sciences google scholar covers as much if not more than library databases, has more recent coverage, and frequently provides access to full text without the need for library subscriptions. • although the number of studies is limited, google scholar seems excellent at retrieving known scholarly items compared with discovery tools. • using proper accent marks in the title when searching for non-english language items appears to be important. 18 google scholar’s blog notes that in january 2016, a change was made so “scholar now automatically identifies queries that are likely to be looking for a specific paper” technically speaking, “it tries hard to find the intended paper and a version that that particular user is able to read” https://scholar.googleblog.com/. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 32 • finding full text for non-english journal articles may require searching google scholar in the original language. • while google scholar may include results from google books, it appears both tools should be used rather than assuming google books will appear in google scholar. • while google scholar does include grey literature, these results do not usually rank highly. • google scholar and google must both be used to effectively search across institutional repository content. • free full text may be buried underneath the “all x versions” links because the publisher’s web site is usually the dominant version presented to the user. the right-hand column links may help ameliorate this situation, but not reliably. • google scholar is well-known in most academic communities and used regularly; however, it is seldom the only tool used, with scholars continuing to use other web search tools, library abstracts and indexes, and published web sites as well. • experts in writing systematic reviews recommend google scholar be included as a search tool along with traditional abstracts and indexes, using software to record the search process and results. • for evaluating research impact, google scholar may be superior to web of science or scopus, but using all three tools still seems necessary. • as with any database, citation metadata should be verified against the publisher’s data; with google scholar, publication dates should receive deliberate attention. • when google scholar covers some of a major publisher’s content, that does not imply it covers all of that publisher’s content. • google scholar metrics appears to provide reliable journal rankings. research agenda this review of the literature also provides direction for future research concerning academic web search engines. because this review focused on 2014-2016, researchers may need to review studies from earlier periods for methodological ideas and previous findings, noting that dramatic changes in search engine coverage and behavior can occur within only a few years.19 across the studies, some general best practices were observed. 
19 for example, ştirbu found that google scholar overlapped georef by 57% and 62% (ştirbu et al. 2015, 328), compared with a finding by neuhaus in 2006 where scholar overlapped with georef by 26% (2006, 133). when comparing the coverage of academic web search engines, assessing their utility for establishing research impact, or conducting other bibliometric studies, researchers should strongly consider using software such as publish or perish and designing their research approach with previous methodologies in mind. information scientists have charted a set of clear disciplinary methods; there is no need to start from scratch. even when performing a large-scale quantitative assessment such as kousha and thelwall (2015), manually examining and discussing a subset of the sample seems helpful for checking assumptions and for enhancing the meaning of the findings to the reader. some researchers examined the "top 20" or "top 10" results qualitatively (kousha and thelwall 2015), while others took a random sample from within their large-study sample (kousha, thelwall, and rezaie 2011). academic search engines for arts and humanities research research into the use of academic web search engines within arts and humanities fields is sorely needed. surveys show humanities scholars use both google and google scholar (inger and gardner 2016; kemman, kleppe, and scagliola 2013; van noorden 2014). during interviews of 20 historians by martin and quan-haase (2016) concerning serendipity, five mentioned google books and google scholar as important for recreating the serendipity of the physical library online. almost all arts and humanities scholars search the internet for researchers and their activities, and commonly express the belief that having a complete list of research activities online improves public awareness (dagienė and krapavickaitė 2016). mays's (2015) practical advice and the few recent studies on citation impact of google books for these disciplines point to the enormous potential for this tool's use. articles describing opportunities for new online searching habits of humanities scholars have not always included google scholar (huistra and mellink 2016). wu and chen's interviews with humanities graduate students suggested their behavior and preferences were different from those of science and technology students, doing more known-item searching and struggling with "semantically ambiguous keywords" that retrieved irrelevant results (2014, 381). platform preferences seem to have a disciplinary aspect: hammarfelt's (2014) investigation of altmetrics in the humanities suggests mendeley and twitter should be included along with google scholar when examining citation impact of humanities research, while a 2014 nature survey suggests researchgate is much less popular in the social sciences and humanities than in the sciences (van noorden 2014). in summary, arts and humanities scholars are active users of academic web search engines and related tools, but their preferences and behavior, and the relative success of google scholar as a research tool, cannot be inferred from the vast literature focused on the sciences. advice from librarians and scholars about the strengths and limitations of academic web search engines in these fields would be incredibly useful.
specific examples of needed research, and related studies to reference for methodological ideas: • similar to the studies that have been done in the sciences, how well do academic search engines cover the arts and humanities? an emphasis on formats important to the discipline would be important (prins et al. 2016). • how does the quality of search results compare between academic search engines and traditional library databases for arts and humanities topics? to what extent can the user usefully accomplish her task? (ruppel 2009)? an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 34 • to what extent do academic search engines support the research process for scholarship distinctive to arts and humanities disciplines (e.g. historiographies, review essays)? • in academic search engines, how visible is the arts and humanities literature found in institutional repositories (pitol and de groote 2014)? specific aspects of academic search engine coverage this review suggests that broad studies of academic search engine coverage may have reached a saturation point. however, specific aspects of coverage need additional investigation: • grey literature: although google scholar’s inclusion of grey literature is frequently mentioned as valuable, empirical studies evaluating its coverage are scarce. additional research following the methodology of haddaway (2015) could investigate the bibliographies of literature other than systematic reviews, investigate various disciplines, or use a sample of valuable known items (similar to kousha, thelwall, and rezaie’s (2011) methodology for books). • non-western, non-english language literature: for further investigation of the repeated finding of non-western, non-english language bias (abrizah and thelwall 2014; cavacini 2015), comparisons to library abstracts and indexes would be helpful for providing context. to what extent is this bias present in traditional research tools? hilbert et al. found the coverage of their sample increased for english language in both web of science and scopus, and “to a lesser extent” in google scholar (2015, 260). • books: any investigations of book coverage in microsoft academic and google scholar would be welcome. very few 2014-2016 studies focused on books in google scholar, and even looking in earlier years turned up little research. georgas (2015) compared google with a federated search tool for finding books, so her study may be a useful reference. kousha et al. (2011) found three times as many citations in google scholar than in scopus to a sample of 1,000 academic books. the authors concluded “there are substantial numbers of citations to academic books from google books and google scholar, and it therefore may be possible to use these potential sources to help evaluate research in bookoriented disciplines” (kousha, thelwall, and rezaie 2011, 2157). • institutional repositories: yang (2016) recommended that “librarians of digital resources conduct research on their local digital repositories, as the indexing effects and discovery rates on metadata or associated text files may be different case by case,” and the studies found 2014-2016 show that ir platform and metadata schema dramatically affect discovery, with some irs nearly invisible (weideman 2015; chen 2014; orduña-malea and lópez-cózar 2015; yang 2016) and others somewhat findable by google scholar (lee et al. 2015; obrien et al. 2016). 
askey and arlitsch (2015) have explained how google scholar's decisions regarding metadata schema can dramatically affect results.20 libraries that would like their institutional repositories to serve as social sharing platforms for research should consider conducting a study similar to martín-martín et al. (2016b). finally, a study of ir journal article visibility in academic web search engines could be extremely informative. 20 for example, google's rejection of dublin core. • full-text retrieval: the indexing coverage of academic search engines relates to the retrieval of full text, which is another area ripe for more research studies, especially in light of the impressive quantity of full text that can be retrieved without user authentication. johnson and simonsen (2015) found that more of the engineering students they surveyed obtained scholarly articles from a free download or a pdf from a colleague at another institution than from the library's subscription. meanwhile, libraries continue to pay for costly subscription resources. monitoring this situation is essential for strategic decision-making. quint (2016) and karlsson (2014) have suggested strategies for libraries and vendors to support broader access to subscription full text through creative licensing and per-item fee approaches. institutional repositories have had mixed results in changing scholars' habits (both contributors and searchers) but are demonstrably contributing to the presence of full text in the academic search engine experience. when will academic users find a good-enough selection of full-text articles that they no longer need the expanded full text paid for by their institutions? google books similarly to microsoft academic, google books as a search tool also needs dedicated research from librarians and information scientists about its coverage, utility, and/or adoption. a purposeful comparison with other large digital repositories such as hathitrust (https://www.hathitrust.org) would be a boon to practitioners and the public. while hathitrust is transparent about its coverage (https://www.hathitrust.org/statistics_visualizations), specific areas of google books' coverage have been called into question. weiss (2016) suggested a gap in google books exists from about 1915-1965 "because many publishers either have let it fall out of print, or the book is orphaned and no one wants to go through the trouble of tracking down the copyright owners" and found that copies in google books "will likely be locked down and thus unreadable, or visible only as a snippet, at best" (303). has this situation changed since the court rulings concerning the legality of snippet view? longitudinal studies of the growth of google books similar to harzing (2014) could illuminate this and other questions about google books' ability to deliver content. uneven coverage of content types, geography, and language should be investigated. mays noted a possible geographical imbalance within the united states (mays 2015, 26). others noted significant language and international imbalances, and large disciplinary differences (weiss 2016; abrizah and thelwall 2014; kousha and thelwall 2015). weiss and others suggest that google books' coverage imbalance has enormous social implications: "google and other [massive digital libraries] have essentially canonized the books they have scanned and contribute to the marginalization of those left unscanned" (301).
therefore more holistic quantitative investigations of the types of information in google books and possible skewness would be welcome. finally, chen's study (2012) comparing the coverage of google books and worldcat could be repeated to provide longitudinal information. the utility of google books for research purposes also needs further investigation. books are far more prevalently cited in wikipedia than are research articles (thelwall and kousha 2015a). examining samples of wikipedia articles' citation lists for the prevalence of google books could reveal how dominant a force google books has become in that space. on a more philosophical level, investigating the ways google books might transform scholarly processes would be useful. szpiech (2014) considered how the google books version of a medieval manuscript transformed his relationship with texts, causing a rupture "produced by my new power to extract words and information from a text without being subject to its order, scale, or authority" (78). he hypothesized readers approach google books texts as consumers, rather than learners, whereby "the critical sense of the gestalt" is at risk of being forgotten (84). have other researchers experienced what he describes? microsoft academic given the stated openness of microsoft's new academic web search engine,21 the closed nature of google scholar, and the promising findings of bibliometricians (harzing 2016b; harzing and alakangas 2016a), librarians and information scientists should embark on a thorough review of microsoft academic with the same enthusiasm with which they approached google scholar. the search engine's coverage, utility for research, and suitability for bibliometric analysis22 all need to be examined. microsoft academic's abilities for supporting scholarly social networking would also be of interest, perhaps using ward et al. (2015) as a theoretical groundwork. the tool's coverage and utility for various disciplines and research purposes are a wide-open field for highly useful research. 21 microsoft's faq says the company is "adopting an open approach in developing the service, and we invite community participation. we like to think what we have developed is a community property. as such, we are opening up our academic knowledge as a downloadable dataset" and offers the academic knowledge api (https://www.microsoft.com/cognitive-services/en-us/academic-knowledge-api). 22 see jacsó (2011) for methodology. professional and instructional approaches based on user research to inform instructional approaches, more study on user behavior is needed, perhaps repeating herrera's (2011) study with google scholar and microsoft academic. in light of the recent focus on graduate students, research concerning the use of academic web search engines by undergraduates, community college students, high school students, and other groups would be welcome. using an interview or focus group generates exploratory findings that could be tested through surveys with a larger, more representative sample of the population of interest. studying searching behaviors has been common; can librarians design creative studies to investigate reading, engagement, and reflection when web search engines are used as part of the process? is there a way to study whether the "matthew effect" (antell et al. 2013, 281), the aging citation
phenomenon (verstak et al. 2014; martín-martín et al. 2016a; davis and cochran 2015), or other epistemological hypotheses are influencing scholarship patterns? a bold study could be performed to examine differences in quality outcomes between samples of students using primarily academic search engines versus traditional library search tools. exploratory studies in this area could begin by surveying students about their use of search tools for research methods courses or asking them to record their research process in a journal, and correlating the findings with their grades on the final research product. three specific areas of user research needed are the use of scholarly social network platforms, researcher profiles, and the influence of these on scholarly collaboration and research (ward, bejarano, and dudás 2015, 178); the performance of google's relatively new known-item search23 (compared with microsoft academic's known-item search abilities); and searching in non-english languages. regarding the latter, albarillo's (2016) method, which he applied to library databases, could be repeated with google scholar, microsoft academic, and google books. finally, to continue their strong track record as experts in navigating the landscape of digital scholarship, librarians need to research assumptions regarding best practices for scholarly logistics. for example, searching google for article titles plus the term "doi," then scanning the results list for researchgate, was found by this study's author to be the most efficient way to obtain doi numbers, but is this a reliable approach? does researchgate have sufficient accuracy to be recommended as the optimal tool for this task? what is the most efficient way for a scholar to locate full text for a citation? are academic search engines' bibliographic citation management software export tools competitive with third-party commercial tools such as refworks? another area needing investigation is the visibility of links to free full text in google scholar. pitol and de groote found that 70 percent of the items in their study had at least one free full-text version available through a "hidden" google scholar version (2014, 603), and this author's work on this review article indicates this problem still exists — but to what extent? also, when free full text exists in multiple repositories (e.g. researchgate, digital commons, academia.edu), which are the most trustworthy and practically useful for scholars? librarians should discuss the answers to these questions and be ready to provide expert advice to users. 23 google scholar's blog notes that in january 2016, a change was made so "scholar now automatically identifies queries that are likely to be looking for a specific paper"; technically speaking, "it tries hard to find the intended paper and a version that that particular user is able to read" (https://scholar.googleblog.com/). conclusion with so many users opting to use academic web search engines for research, librarians need to investigate the performance of microsoft academic, google books, and google scholar for the arts and humanities, and to re-think library services and collections in light of these tools' strengths and limitations. the evolution of web indexing and increasing free access to full text should be monitored in conjunction with library collection development. to remain relevant to
an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 38 modern researchers, librarians should continue to strengthen their knowledge of and expertise with public academic web search engines, full-text repositories, and scholarly networks. bibliography abrizah, a., and mike thelwall. 2014. "can the impact of nonwestern academic books be measured? an investigation of google books and google scholar for malaysia." journal of the association for information science & technology 65 (12): 2498-2508. https://doi.org/10.1002/asi.23145. albarillo, frans. 2016. "evaluating language functionality in library databases." international information & library review 48 (1): 1-10. https://doi.org/10.1080/10572317.2016.1146036. antell, karen, molly strothmann, xiaotian chen, and kevin o’kelly. 2013. "cross-examining google scholar." reference & user services quarterly 52 (4): 279-282. https://doi.org/10.5860/rusq.52n4.279. asher, andrew d., lynda m. duke, and suzanne wilson. 2012. "paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources." college & research libraries 74(5):464-488. https://doi.org/10.5860/crl374. askey, dale, and kenning arlitsch. 2015. "heeding the signals: applying web best practices when google recommends." journal of library administration 55 (1): 49-59. https://doi.org/10.1080/01930826.2014.978685. authors guild. "authors guild v. google." accessed january 1, 2016, https://www.authorsguild.org/where-we-stand/authors-guild-v-google/. bartol, tomaž, and maria mackiewicz-talarczyk. 2015. "bibliometric analysis of publishing trends in fiber crops in google scholar, scopus, and web of science." journal of natural fibers 12 (6): 531. https://doi.org/10.1080/15440478.2014.972000. boeker, martin, werner vach, and edith motschall. 2013. "google scholar as replacement for systematic literature searches: good relative recall and precision are not enough." bmc medical research methodology 13 (1): 1. bonato, sarah. 2016. "google scholar and scopus for finding gray literature publications." journal of the medical library association 104 (3): 252-254. https://doi.org/10.3163/15365050.104.3.021. bornmann, lutz, andreas thor, werner marx, and hermann schier. 2016. "the application of bibliometrics to research evaluation in the humanities and social sciences: an exploratory study using normalized google scholar data for the publications of a research institute." information technology and libraries | june 2017 39 journal of the association for information science & technology 67 (11): 2778-2789. https://doi.org/10.1002/asi.23627. boumenot, diane. "printing a book from google books." one rhode island family. last modified december 3, 2015, accessed january 1, 2017. https://onerhodeislandfamily.com/2015/12/03/printing-a-book-from-google-books/. bøyum, idunn, and svanhild aabø. 2015. "the information practices of business phd students." new library world 116 (3): 187-200. https://doi.org/10.1108/nlw-06-2014-0073. bramer, wichor m., dean giustini, and bianca m. r. kramer. 2016. "comparing the coverage, recall, and precision of searches for 120 systematic reviews in embase, medline, and google scholar: a prospective study." systematic reviews 5(39):1-7. https://doi.org/10.1186/s13643-016-0215-7. cals, j. w., and d. kotz. 2016. "literature review in biomedical research: useful search engines beyond pubmed." journal of clinical epidemiology 71: 115-117. 
https://doi.org/10.1016/j.jclinepi.2015.10.012. carlson, scott. 2006. "challenging google, microsoft unveils a search tool for scholarly articles." chronicle of higher education 52 (33). cavacini, antonio. 2015. "what is the best database for computer science journal articles?" scientometrics 102 (3): 2059-2071. https://doi.org/10.1007/s11192-014-1506-1. chen, xiaotian. 2012. "google books and worldcat: a comparison of their content." online information review 36 (4): 507-516. https://doi.org/10.1108/14684521211254031. ———. 2014. "open access in 2013: reaching the 50% milestone." serials review 40 (1): 21-27. https://doi.org/10.1080/00987913.2014.895556. choong, miew keen, filippo galgani, adam g. dunn, and guy tsafnat. 2014. "automatic evidence retrieval for systematic reviews." journal of medical internet research 16 (10): 1-1. https://doi.org/10.2196/jmir.3369. ciccone, karen, and john vickery. 2015. "summon, ebsco discovery service, and google scholar: a comparison of search performance using user queries." evidence based library & information practice 10 (1): 34-49. https://ejournals.library.ualberta.ca/index.php/eblip/article/view/23845. conrad, lettie y., elisabeth leonard, and mary m. somerville. 2015. "new pathways in scholarly discovery: understanding the next generation of researcher tools." paper presented at the association of college and research libraries annual conference, march 25-27, portland, or. https://pdfs.semanticscholar.org/3cb1/315476ccf9b443c01eb9b1d175ae3b0a5b4e.pdf. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 40 dagienė, eleonora, and danutė krapavickaitė. 2016. "how researchers manage their academic activities." learned publishing 29(3):155-163. https://doi.org/10.1002/leap.1030. davis, philip m., and angela cochran. 2015. "cited half-life of the journal literature." arxiv preprint arxiv:1504.07479. https://arxiv.org/abs/1504.07479. delgado lópez-cózar, emilio, nicolás robinson-garcía, and daniel torres-salinas. 2014. "the google scholar experiment: how to index false papers and manipulate bibliometric indicators." journal of the association for information science & technology 65 (3): 446-454. https://doi.org/10.1002/asi.23056. erb, brian, and rob sica. 2015. "flagship database for literature searching or flelpful auxiliary?" charleston advisor 17 (2): 47-50. https://doi.org/10.5260/chara.17.2.47. fagan, jody condit, and david gaines. 2016. "take charge of eds: vet your content." presentation to the ebsco users' group, boston, ma, may 10-11. gehanno, jean-françois, laetitia rollin, and stefan darmoni. 2013. "is the coverage of google scholar enough to be used alone for systematic reviews." bmc medical informatics and decision making 13 (1): 1. https://doi.org/10.1186/1472-6947-13-7. georgas, helen. 2015. "google vs. the library (part iii): assessing the quality of sources found by undergraduates." portal: libraries and the academy 15 (1): 133-161. https://doi.org/10.1353/pla.2015.0012. giustini, dean, and maged n. kamel boulos. 2013. "google scholar is not enough to be used alone for systematic reviews." online journal of public health informatics 5 (2). https://doi.org/10.5210/ojphi.v5i2.4623. gray, jerry e., michelle c. hamilton, alexandra hauser, margaret m. janz, justin p. peters, and fiona taggart. 2012. "scholarish: google scholar and its value to the sciences." issues in science and technology librarianship 70 (summer). https://doi.org/10.1002/asi.21372/full. haddaway, neal r. 2015. 
"the use of web-scraping software in searching for grey literature." grey journal 11 (3): 186-190. haddaway, neal robert, alexandra mary collins, deborah coughlin, and stuart kirk. 2015. "the role of google scholar in evidence reviews and its applicability to grey literature searching." plos one 10 (9): e0138237. https://doi.org/10.1371/journal.pone.0138237. hammarfelt, björn. 2014. "using altmetrics for assessing research impact in the humanities." scientometrics 101 (2): 1419-1430. https://doi.org/10.1007/s11192-014-1261-3. hands, africa. 2012. "microsoft academic search – http://academic.research.microsoft.com." technical services quarterly 29 (3): 251-252. https://doi.org/10.1080/07317131.2012.682026. information technology and libraries | june 2017 41 harper, sarah fletcher. 2016. "google books review." journal of electronic resources in medical libraries 13 (1): 2-7. https://doi.org/10.1080/15424065.2016.1142835. harzing, anne-wil. 2013. "a preliminary test of google scholar as a source for citation data: a longitudinal study of nobel prize winners." scientometrics 94 (3): 1057-1075. https://doi.org/10.1007/s11192-012-0777-7. ———. 2014. "a longitudinal study of google scholar coverage between 2012 and 2013." scientometrics 98 (1): 565-575. https://doi.org/10.1007/s11192-013-0975-y. ———. 2016a. publish or perish. vol. 5. http://www.harzing.com/resources/publish-or-perish. ———. 2016b. "microsoft academic (search): a phoenix arisen from the ashes?" scientometrics 108 (3): 1637-1647.https://doi.org/10.1007/s11192-016-2026-y. harzing, anne-wil, and satu alakangas. 2016a. "microsoft academic: is the phoenix getting wings?" scientometrics: 1-13. harzing, anne-wil, and satu alakangas. 2016b. "google scholar, scopus and the web of science: a longitudinal and cross-disciplinary comparison." scientometrics 106 (2): 787-804. https://doi.org/10.1007/s11192-015-1798-9. herrera, gail. 2011. "google scholar users and user behaviors: an exploratory study." college & research libraries 72 (4): 316-331. https://doi.org/10.5860/crl-125rl. higgins, julian, and s. green, eds. 2011. cochrane handbook for systematic reviews of interventions. version 5.1.0 ed.: the cochrane collaboration. http://handbook.cochrane.org/. hilbert, fee, julia barth, julia gremm, daniel gros, jessica haiter, maria henkel, wilhelm reinhardt, and wolfgang g. stock. 2015. "coverage of academic citation databases compared with coverage of scientific social media." online information review 39 (2): 255-264. https://doi.org/10.1108/oir-07-2014-0159. hoffmann, anna lauren. 2014. "google books as infrastructure of in/justice: towards a sociotechnical account of rawlsian justice, information, and technology." theses and dissertations. paper 530. http://dc.uwm.edu/etd/530/. ———. 2016. "google books, libraries, and self-respect: information justice beyond distributions." the library 86 (1). https://doi.org/10.1086/684141. horrigan, john b. "lifelong learning and technology." pew research center, last modified march 22, 2016, accessed february 7, 2017, http://www.pewinternet.org/2016/03/22/lifelonglearning-and-technology/. hug, sven e., michael ochsner, and martin p. braendle. 2016. "citation analysis with microsoft academic." arxiv preprint arxiv:1609.05354.https://arxiv.org/abs/1609.05354. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 42 huistra, hieke, and bram mellink. 2016. "phrasing history: selecting sources in digital repositories." 
historical methods: a journal of quantitative and interdisciplinary history 49 (4): 220-229. https://doi.org/10.1093/llc/fqw002. inger, simon, and tracy gardner. 2016. "how readers discover content in scholarly publications." information services & use 36 (1): 81-97. https://doi.org/10.3233/isu-160800. jackson, joab. 2010. "google: 129 million different books have been published." pc world, august 6, 2010. http://www.pcworld.com/article/202803/google_129_million_different_books_have_been_pu blished.html. jacsó, p. 2008. "live search academic." peter’s digital reference shelf, april. jacsó, péter. 2011. "the pros and cons of microsoft academic search from a bibliometric perspective." online information review 35 (6): 983-997. https://doi.org/10.1108/14684521111210788. jamali, hamid r., and majid nabavi. 2015. "open access and sources of full-text articles in google scholar in different subject fields." scientometrics 105 (3): 1635-1651. https://doi.org/10.1007/s11192-015-1642-2. johnson, paula c., and jennifer e. simonsen. 2015. "do engineering master's students know what they don't know?" library review 64 (1): 36-57. https://doi.org/10.1108/lr-05-2014-0052. jones, edgar. 2010. "google books as a general research collection." library resources & technical services 54 (2): 77-89. https://doi.org/10.5860/lrts.54n2.77. karlsson, niklas. 2014. "the crossroads of academic electronic availability: how well does google scholar measure up against a university-based metadata system in 2014?" current science 107 (10): 1661-1665. http://www.currentscience.ac.in/volumes/107/10/1661.pdf. kemman, max, martijn kleppe, and stef scagliola. 2013. "just google it-digital research practices of humanities scholars." arxiv preprint arxiv:1309.2434. https://arxiv.org/abs/1309.2434. khabsa, madian, and c. lee giles. 2014. "the number of scholarly documents on the public web." plos one 9 (5): https://doi.org/10.1371/journal.pone.0093949 kirkwood jr., hal, and monica c. kirkwood. 2011. "historical research." online 35 (4): 28-32. koler-povh, teja, primož južnic, and goran turk. 2014. "impact of open access on citation of scholarly publications in the field of civil engineering." scientometrics 98 (2): 1033-1045. https://doi.org/10.1007/s11192-013-1101-x. kousha, kayvan, mike thelwall, and somayeh rezaie. 2011. "assessing the citation impact of books: the role of google books, google scholar, and scopus." journal of the american society information technology and libraries | june 2017 43 for information science and technology 62 (11): 2147-2164. https://doi.org/10.1002/asi.21608. kousha, kayvan, and mike thelwall. 2017. "are wikipedia citations important evidence of the impact of scholarly articles and books?" journal of the association for information science and technology. 68(3):762-779. https://doi.org/10.1002/asi.23694. kousha, kayvan, and mike thelwall. 2015. "an automatic method for extracting citations from google books." journal of the association for information science & technology 66 (2): 309320. https://doi.org/10.1002/asi.23170. lee, jongwook, gary burnett, micah vandegrift, hoon baeg jung, and richard morris. 2015. "availability and accessibility in an open access institutional repository: a case study." information research 20 (1): 334-349. levay, paul, nicola ainsworth, rachel kettle, and antony morgan. 2016. "identifying evidence for public health guidance: a comparison of citation searching with web of science and google scholar." research synthesis methods 7 (1): 34-45. 
https://doi.org/10.1002/jrsm.1158. levy, steven. "making the world’s problem solvers 10% more efficient." backchannel. last modified october 17, 2014, accessed january 14, 2016, https://medium.com/backchannel/the-gentleman-who-made-scholar-d71289d9a82d. los angeles times. 2016. "google, books and 'fair use'." los angeles times, april 19, 2016. http://www.latimes.com/opinion/editorials/la-ed-google-book-search-20160419-story.html martin, kim, and anabel quan-haase. 2016. "the role of agency in historians’ experiences of serendipity in physical and digital information environments." journal of documentation 72 (6): 1008-1026. https://doi.org/10.1108/jd-11-2015-0144. martín-martín, alberto, juan manuel ayllón, enrique orduña-malea, and emilio delgado lópezcózar. 2016a. "2016 google scholar metrics released: a matter of languages... and something else." arxiv preprint arxiv:1607.06260. https://arxiv.org/abs/1607.06260. martín-martín, alberto, enrique orduña-malea, juan m. ayllón, and emilio delgado lópez-cózar. 2016b. "the counting house: measuring those who count. presence of bibliometrics, scientometrics, informetrics, webometrics and altmetrics in the google scholar citations, researcherid, researchgate, mendeley & twitter." arxiv preprint arxiv:1602.02412. https://arxiv.org/abs/1602.02412. martín-martín, alberto, enrique orduña-malea, juan manuel ayllón, and emilio delgado lópezcózar. 2014. "does google scholar contain all highly cited documents (1950-2013)?" arxiv preprint arxiv:1410.8464. https://arxiv.org/abs/1410.8464. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 44 martín-martín, alberto, enrique orduña-malea, juan ayllón, and emilio delgado lópez-cózar. 2016c. "back to the past: on the shoulders of an academic search engine giant." scientometrics 107 (3): 1477-1487. https://doi.org/10.1007/s11192-016-1917-2. martín-martín, alberto, enrique orduña-malea, anne-wil harzing, and emilio delgado lópezcózar. 2017. "can we use google scholar to identify highly-cited documents?" journal of informetrics 11 (1): 152-163. https://doi.org/10.1016/j.joi.2016.11.008. mays, dorothy a. 2015. "google books: far more than just books." public libraries 54 (5): 23-26. http://publiclibrariesonline.org/2015/10/far-more-than-just-books/ meier, john j., and thomas w. conkling. 2008. "google scholar’s coverage of the engineering literature: an empirical study." the journal of academic librarianship 34 (3): 196-201. https://doi.org/10.1016/j.acalib.2008.03.002. moed, henk f., judit bar-ilan, and gali halevi. 2016. "a new methodology for comparing google scholar and scopus." arxiv preprint arxiv:1512.05741.https://arxiv.org/abs/1512.05741. namei, elizabeth, and christal a. young. 2015. "measuring our relevancy: comparing results in a web-scale discovery tool, google & google scholar." paper presented at the association of college and research libraries annual conference, march 25-27, portland, or. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/201 5/namei_young.pdf national institute for health and care excellence (nice). "developing nice guidelines: the manual." last modified april 2016, accessed november 27, 2016. https://www.nice.org.uk/process/pmg20. neuhaus, chris, ellen neuhaus, alan asher, and clint wrede. 2006. "the depth and breadth of google scholar: an empirical study." portal: libraries and the academy 6 (2): 127-141. https://doi.org/10.1353/pla.2006.0026. 
obrien, patrick, kenning arlitsch, leila sterman, jeff mixter, jonathan wheeler, and susan borda. 2016. "undercounting file downloads from institutional repositories." journal of library administration 56 (7): 854-874. https://doi.org/10.1080/01930826.2016.1216224. orduña-malea, enrique, and emilio delgado lópez-cózar. 2014. "google scholar metrics evolution: an analysis according to languages." scientometrics 98 (3): 2353-2367. https://doi.org/10.1007/s11192-013-1164-8. orduña-malea, enrique, and emilio delgado lópez-cózar. 2015. "the dark side of open access in google and google scholar: the case of latin-american repositories." scientometrics 102 (1): 829-846. https://doi.org/10.1007/s11192-014-1369-5. orduña-malea, enrique, alberto martín-martín, juan m. ayllon, and emilio delgado lópez-cózar. 2014. "the silent fading of an academic search engine: the case of microsoft academic information technology and libraries | june 2017 45 search." online information review 38(7):936-953. https://doi.org/10.1108/oir-07-20140169. ortega, josé luis. 2015. "relationship between altmetric and bibliometric indicators across academic social sites: the case of csic's members." journal of informetrics 9 (1): 39-49. https://doi.org/10.1016/j.joi.2014.11.004. ortega, josé luis, and isidro f. aguillo. 2014. "microsoft academic search and google scholar citations: comparative analysis of author profiles." journal of the association for information science & technology 65 (6): 1149-1156. https://doi.org/10.1002/asi.23036. pitol, scott p., and sandra l. de groote. 2014. "google scholar versions: do more versions of an article mean greater impact?" library hi tech 32 (4): 594-611. https://doi.org/0.1108/lht05-2014-0039. prins, ad a. m., rodrigo costas, thed n. van leeuwen, and paul f. wouters. 2016. "using google scholar in research evaluation of humanities and social science programs: a comparison with web of science data." research evaluation 25 (3): 264-270. https://doi.org/10.1093/reseval/rvv049. quint, barbara. 2016. "find and fetch: completing the course." information today 33 (3): 17-17. rothfus, melissa, ingrid s. sketris, robyn traynor, melissa helwig, and samuel a. stewart. 2016. "measuring knowledge translation uptake using citation metrics: a case study of a pancanadian network of pharmacoepidemiology researchers." science & technology libraries 35 (3): 228-240. https://doi.org/10.1080/0194262x.2016.1192008. ruppel, margie. 2009. "google scholar, social work abstracts (ebsco), and psycinfo (ebsco)." charleston advisor 10 (3): 5-11. shultz, m. 2007. "comparing test searches in pubmed and google scholar." journal of the medical library association : jmla 95 (4): 442-445. https://doi.org/10.3163/1536-5050.95.4.442. stansfield, claire, kelly dickson, and mukdarut bangpan. 2016. "exploring issues in the conduct of website searching and other online sources for systematic reviews: how can we be systematic?" systematic reviews 5 (1): 191. https://doi.org/10.1186/s13643-016-0371-9. ştirbu, simona, paul thirion, serge schmitz, gentiane haesbroeck, and ninfa greco. 2015. "the utility of google scholar when searching geographical literature: comparison with three commercial bibliographic databases." the journal of academic librarianship 41 (3): 322-329. https://doi.org/10.1016/j.acalib.2015.02.013. suiter, amy m., and heather lea moulaison. 2015. "supporting scholars: an analysis of academic library websites' documentation on metrics and impact." the journal of academic librarianship 41 (6): 814-820. 
https://doi.org/10.1016/j.acalib.2015.09.004. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 46 szpiech, ryan. 2014. "cracking the code: reflections on manuscripts in the age of digital books." digital philology: a journal of medieval cultures 3(1): 75-100. https://doi.org/10.1353/dph.2014.0010. testa, matthew. 2016. "availability and discoverability of open-access journals in music." music reference services quarterly 19 (1): 1-17. https://doi.org/10.1080/10588167.2016.1130386. thelwall, mike, and kayvan kousha. 2015b. "web indicators for research evaluation. part 1: citations and links to academic articles from the web." el profesional de la información 24 (5): 587-606.https://doi.org/10.3145/epi.2015.sep.08. thielen, frederick w., ghislaine van mastrigt, l. t. burgers, wichor m. bramer, marian h. j. m. majoie, sylvia m. a. a. evers, and jos kleijnen. 2016. "how to prepare a systematic review of economic evaluations for clinical practice guidelines: database selection and search strategy development (part 2/3)." expert review of pharmacoeconomics & outcomes research: 1-17. https://doi.org/10.1080/14737167.2016.1246962. trapp, jamie. 2016. "web of science, scopus, and google scholar citation rates: a case study of medical physics and biomedical engineering: what gets cited and what doesn't?" australasian physical & engineering sciences in medicine. 39(4): 817-823. https://doi.org/10.1007/s13246-016-0478-2. van noorden, r. 2014. "online collaboration: scientists and the social network." nature 512 (7513): 126-129. https://doi.org/10.1038/512126a. varshney, lav r. 2012. "the google effect in doctoral theses." scientometrics 92 (3): 785-793. https://doi.org/10.1007/s11192-012-0654-4. verstak, alex, anurag acharya, helder suzuki, sean henderson, mikhail iakhiaev, cliff chiung yu lin, and namit shetty. 2014. "on the shoulders of giants: the growing impact of older articles." arxiv preprint arxiv:1411.0275. https://arxiv.org/abs/1411.0275. walsh, andrew. 2015. "beyond "good" and "bad": google as a crucial component of information literacy." in the complete guide to using google in libraries, edited by carol smallwood, 3-12. new york: rowman & littlefield. waltman, ludo. 2016. "a review of the literature on citation impact indicators." journal of informetrics 10 (2): 365-391. https://doi.org/10.1016/j.joi.2016.02.007. ward, judit, william bejarano, and anikó dudás. 2015. "scholarly social media profiles and libraries: a review." liber quarterly 24 (4): 174–204.https://doi.org/10.18352/lq.9958. weideman, melius. 2015. "etd visibility: a study on the exposure of indian etds to the google scholar crawler." paper presented at etd 2015: 18th international symposium on electronic theses and dissertations, new delhi, india, november 4-6. http://www.web information technology and libraries | june 2017 47 visibility.co.za/0168-conference-paper-2015-weideman-etd-theses-dissertation-india-googlescholar-crawler.pdf. weiss, andrew. 2016. "examining massive digital libraries (mdls) and their impact on reference services." reference librarian 57 (4): 286-306. https://doi.org/10.1080/02763877.2016.1145614. whitmer, susan. 2015. "google books: shamed by snobs, a resource for the rest of us." in the complete guide to using google in libraries, edited by carol smallwood, 241-250. new york: rowman & littlefield. wildgaard, lorna. 2015. 
"a comparison of 17 author-level bibliometric indicators for researchers in astronomy, environmental science, philosophy and public health in web of science and google scholar." scientometrics 104 (3): 873-906. https://doi.org/10.1007/s11192-015-1608-4. winter, joost, amir zadpoor, and dimitra dodou. 2014. "the expansion of google scholar versus web of science: a longitudinal study." scientometrics 98 (2): 1547-1565. https://doi.org/10.1007/s11192-013-1089-2. wu, tim. 2015. "whatever happened to google books?" the new yorker, september 11, 2015. wu, ming-der, and shih-chuan chen. 2014. "graduate students appreciate google scholar, but still find use for libraries." electronic library 32 (3): 375-389. https://doi.org/10.1108/el-082012-0102. yang, le. 2016. "making search engines notice: an exploratory study on discoverability of dspace metadata and pdf files." journal of web librarianship 10 (3): 147-160. https://doi.org/10.1080/19322909.2016.1172539. bento box user experience study at franklin university articles bento-box user experience study at franklin university marc jaffy information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11581 marc jaffy (marc.jaffy@franklin.edu) is acquisitions librarian, franklin university. abstract this article discusses the benefits of the bento-box method of searching library resources, including a comparison of the method with a tabbed search interface. it then describes a usability study conducted by the franklin university library in which 27 students searched for an article, an ebook, and a journal on two websites: one using a bento box and one using the ebsco discovery service (eds). screen recordings of the searches were reviewed to see what actions users took while looking for information on each site, as well as how long the searches took. students also filled out questionnaires to indicate what they thought of each type of search. overall students found more items on the bento-box site, and indicated a slight preference for the bento-box search over eds. the bento-box site also provided quicker results than the eds site. as a result, the franklin university library decided to implement bento-box searching on its website. introduction “one page, one search box, results from as many library-resource types as possible.”1 in 2018, the franklin university library redesigned its website to provide users with a more modern interface that more closely matched franklin university’s website. the library also wanted to improve the site’s usability and make it easier for students to find information. to determine how to best improve the user experience, library staff members held a number of meetings to discuss the new site’s layout and contents. because “students almost always resort[] to searching via web site search boxes rather than navigating through the web site by browsing,” a crucial decision involved what search results the redesigned library website would provide. 2 as a result of these discussions, the franklin university library’s initial website redesign included a persistent search bar in the upper left of each page which searched the library’s website, as well as a prominent tabbed search bar on the library’s homepage (see figure 1). the homepage search bar provided a default tab that used ebsco discovery service (eds) to search the library resources cataloged in eds (most of the library’s databases and catalog) and a second tab which used ebsco’s journal finder to look for e-journals. 
once our new website went live, feedback from patrons demonstrated that the persistent website search bar caused confusion among users who expected it to search the library's databases rather than the library's website. we also found the "search journals" tab on the homepage unnecessary. as a result, we removed both the persistent search bar and the journals tab. after these changes, the main search option provided to library users was the eds search bar on the library's homepage, although some interior pages of the library's website provided a search bar related to that page (such as an option to search the catalog on the catalog page). although eds searches mainly for articles and books, it "may overlook user needs for other types of library resources or services."3 this is a problem because "library users increasingly perceive the discovery interface as a portal to all of the library's resources."4 due to dissatisfaction with the eds search, the library decided to look for alternatives. one alternative which "[a] number of libraries have turned to [is] the bento-based approach to discovery and display."5 figure 1. redesigned franklin university library website with two search boxes on the homepage. the circled search box in the upper left was initially persistent across the entire site. to determine whether the bento-box search format would serve our users better than eds, the library designed and conducted a usability study comparing eds and bento-box searches. by comparing user search behavior and results for each search method, as well as user opinion regarding these different methods of searching library websites, the library hoped to gain a clearer understanding of how its users interact with search boxes on the library's website and—most importantly—which search method would best serve its users. the remainder of this article sets forth the results of that trial. after explaining what bento-box search is, as well as reasons a library might want to use bento-box search results, it reports on a usability study conducted by the franklin university library, discussing both observations of screen recordings demonstrating user search behavior and responses to questionnaires. bento-box library search what is bento-box search? the term "bento box" is based on "japanese cuisine where different parts of a meal are compartmentalized in aesthetically pleasing ways."6 instead of compartmentalizing food, a bento-box search results page compartmentalizes search results from a variety of different resources on a single page. the user sees a single search box which gives "side-by-side results from multiple library search tools, e.g., library catalog, article index, institutional repository, website, libguides, etc."7 a bento-box search provides results based on searches of individual library resources.
this is important because of the difficulty of providing a single search that includes combined results from all resources: “the nature of library content and current technology makes it difficult to create usable ‘blended’ results; catalog materials may crowd out books or vice versa.” 8 bento-box results avoid this problem by “provid[ing] these resources on equal footing, leveraging the ranking algorithms internal to each resource’s individual search interface.” 9 as a result, the bento box gives libraries “the best cost/benefit way to improve article search relatively quickly with relatively large user benefit.”10 figure 2, the university of michigan’s bento-box results page, illustrates how a bento box provides search results from a variety of resources in visually discrete boxes. this is done behind the scenes by using separate searches to query the individual resources, as demonstrated by figure 3, the architecture of wayne state university’s bento-box search. benefits of a bento-box search results page wayne state university’s switch to a bento box “resulted in increased access to resources.”11 a bento box can increase access to resources both because it makes library search easier for users and because it provides results in a format that makes it easier for users to find information. simplified search when deciding what type of search to provide on the library website, the main consideration involves what users expect when searching. student experiences with internet search engines have influenced their expectations for library search, which leads them to “approach library search interfaces as if they were google.”12 what do users like about google? “one of the main reasons that users are satisfied with google is its simple user interface.”13 based on their experience with google and other search engines, library users “expect easy searching” that provides “one-step immediate access to the full text of library resources.”14 bento-box results let libraries meet these expectations by presenting users with a simple interface that permits easy searching and “returns results across many library sources.”15 additionally, bento-box results “can integrate library website results, allowing users to type things like ‘how do i renew a book’ into a single search box and get meaningful results.”16 as a result, adopting a bento-box search results page can permit a library to satisfy user search expectations. the bento-box format will provide users the information they seek whether they are looking for an article, a book, or information about the library. information technology and libraries march 2020 bento box user experience study at franklin university | jaffy 4 figure 2. university of michigan library’s bento-box results. information technology and libraries march 2020 bento box user experience study at franklin university | jaffy 5 figure 3. wayne state university library bento box architecture, from cole hudson and graham hukill, “one-to-many: building a single-search interface for disparate resources,” in k. varnum (ed.) exploring discovery: the front door to your library’s licensed and digitized content (chicago: ala editions, 2016): 147. 
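to make the pattern in figure 3 concrete, the sketch below (written in python) shows one way a bento-box results page can fan a single query out to each resource's own search api in parallel and keep the top hits for each compartment. the endpoint urls, resource names, and response shape are hypothetical placeholders, not wayne state's or franklin university's actual implementation.

import concurrent.futures
import requests

# hypothetical per-resource search endpoints; each compartment gets its own api
resources = {
    "articles": "https://library.example.edu/api/articles/search",
    "catalog": "https://library.example.edu/api/catalog/search",
    "journals": "https://library.example.edu/api/journals/search",
    "website": "https://library.example.edu/api/site/search",
}

def search_resource(name, url, query, limit=3):
    # query one resource's own search api and keep only its top hits,
    # relying on that resource's internal relevance ranking
    response = requests.get(url, params={"q": query, "limit": limit}, timeout=5)
    response.raise_for_status()
    return name, response.json().get("results", [])[:limit]

def bento_search(query):
    # send the same query to every resource in parallel and return
    # {resource name: top hits}, one entry per bento compartment
    boxes = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(resources)) as pool:
        futures = [
            pool.submit(search_resource, name, url, query)
            for name, url in resources.items()
        ]
        for future in concurrent.futures.as_completed(futures):
            name, hits = future.result()
            boxes[name] = hits
    return boxes

# each entry returned by bento_search("criminal justice") would be rendered as its own box

because each box is populated by a separate query, every resource's own relevance ranking is preserved rather than blended, which is the design choice the bento-box approach makes.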
better presentation of results bento-box results can help alleviate user confusion because “format types [are] more evident: novice users, such as undergraduates, may not have a good understanding of the difference between books, journals, and articles.”17 the bento-box presentation makes it easier for users to find information since “results from different sources are returned to visually discrete boxes” 18 on a single page. this presentation of grouped results benefits library users because “[b]y presenting search results in separate streams, users can more easily navigate to what they need.”19 when the princeton university library implemented a bento-box results page (termed all search), it found that “most users praised the new all search for its ease of use and also for the ‘bento-box’ approach of grouping results by category. they felt that this clarified the results they were seeing and made it easier for them to pursue different avenues of inquiry.”20 information technology and libraries march 2020 bento box user experience study at franklin university | jaffy 6 comparison with tabbed searching one alternative to the bento box is to offer users a tabbed search box which lets users select a specific resource to search by selecting a tab on the search bar. before the 2018 redesign, the franklin university library’s website provided users with a tabbed search box, as shown in figure 4. our redesigned website reduced the number of tabs from four to two, but we ultimately removed tabbed search from our website because we did not find it effective. figure 4. previous franklin university library website with tabbed search box. tabbed search requires users to decide which tab to use. when the franklin university library provided a tabbed search box we found that users had difficulty identifying which tab they should use. in addition to causing user confusion over which tab to use for their search, because each search tab only searches a portion of library resources, tabbed searching “misses a wide swath of available information and resources [which] will make that missing information practically invisible.”21 another problem with tabbed search is that it requires a library to designate one of the tabs as a default search. this can lessen the chances that users will search (or use) resources in the nondefault tab(s) because library users “tend to favor the most prominent search option.”22 lown, sierra, and boyer cited a study which found that the default option was used 60.5 percent of the time in a tabbed interface, and reported that on the north carolina state university website the default tab was used 73.7 percent of the time.23 tabbed search does not meet library user needs because it is “inconsistent and confusing.”24 when wayne state university switched from tabbed searching to a bento box “many library resources that were previously hidden from search and discovery on the main library website were, for the first time, exposed to all searches, for all users . . . [which] resulted in increased use and awareness of these resources.”25 information technology and libraries march 2020 bento box user experience study at franklin university | jaffy 7 design considerations bento-box results pages are highly customizable. a 2017 review of 38 academic libraries using bento-box search found “much variation in the implementation.”26 a bento-box results page needs to balance providing necessary information with displaying results in a way that is not too overwhelming—or too cluttered. 
the university of michigan analyzed usage of its bento-box results page and redesigned it to improve how it presented results to library users by displaying "[f]ewer results . . . in each section—with a more prominent 'see all' link—than in the original design."27 given user expectations and the challenges previously discussed, libraries must design the results to "maximize the exposure of [their] collection[s] and services, in the most appropriate precedence, while preventing cognitive overload."28 a cluttered results page, with a lack of distinction between categories, will make it difficult for users to find information and will cause confusion rather than ease it. another concern with a bento-box results page occurs when some of the result boxes "end up 'below the fold,' meaning users will need to scroll down to see them. this creates the same problem as a tabbed search box—users don't see results from all library sources."29 because users are less likely to see below-the-fold search results, the bento-box results page needs to prioritize category locations so that the more important results are above the fold (which requires the library to determine the relative importance of search result categories).
user experience study at franklin university
the trial design
during franklin university's fall 2018 and spring 2019 welcome weeks, in addition to providing students with information about the library's services, staff at the library's information table asked students to participate in a trial to help determine whether adopting a bento-box results page would benefit our users. participants were offered a franklin university coffee mug as an incentive. the trial asked participants to look for information on two different library websites: one using eds (franklin university library) and one using a bento box (wayne state university library). we set up two laptops for participants to use and made screen recordings of the participants' actions during their searches for later viewing and analysis. after they finished the tasks we asked participants to fill out a questionnaire (reproduced in appendix a), which had three background questions and six questions about their experience and thoughts relating to the tasks. to decide what information to ask participants to look for, we reviewed library websites that used bento boxes to see what categories they searched. we compared those categories to the types of information available on our library's website. although we identified a number of possible categories, we decided to limit the trial to three tasks because we did not want to overburden participants. based on our experience working with students, we decided on the three categories we were most interested in investigating: articles, books, and journals. to see how users searched for items in these categories we asked participants to complete the following three tasks on each website:
1. find an article available through the library on the topic "criminal justice."
2. find the ebook lean six sigma for leaders by martin brenig-jones and jo dowdall.
3. find the electronic journal business today.
participants
thirty-four people participated in the trial. however, not all of the participants completed all of the tasks on each library's website. we discarded the results from participants who did not attempt at least two tasks on each library's website.
removing those who did not complete at least two tasks on each site left 27 participants ("adjusted" results). unless otherwise noted, the data discussed below refers to the adjusted results. the trial was open to students, faculty, and staff. eleven participants took the trial in fall 2018 and 16 participants took it in spring 2019. most of the participants were undergraduates (21), with some graduate students (6). no doctoral students, faculty, or staff were included in the adjusted results. (one staff member participated and completed a questionnaire but did not perform enough tasks for their results to be included.)
results
we watched the screen recordings to time how long it took students to complete the three tasks on each library's website. however, if a student flipped between the sites while searching instead of first completing all three searches on one site we did not time the results. we also observed what students did while searching to gain an understanding of how they searched for information. overall, students spent less time searching on the site using the bento box (wayne state university library) than they did on the franklin university library site:
• students spent an average of 2 minutes, 35 seconds to complete the tasks on the wayne state site compared to 3 minutes, 28 seconds on franklin's.
• twelve students finished their searches quicker on wayne state's site, while six had quicker results from franklin's.
how students searched for information
the screen recordings showed that students looking for information often went to parts of the library websites which did not contain the content they sought. frequently, they would search whatever part of the site they were on—even if it did not contain the content they needed. on the wayne state site, when a student used an interior search bar the bento box provided results from a wide range of library resources. a search on the journals page would give results for journals, books, articles, and more—even if the page they were on did not contain the resource they needed to find. by contrast, any interior search box found on the franklin university page would only provide results for whatever portion of the library's resources that search box accessed: a search on the journals page would only provide journal results, a search on the catalog page would only provide catalog results, etc. student action after the initial search demonstrates the need for interior search boxes which can search the library's entire site. twelve of 27 students on each site (although not the same 12 students) followed up a search by using a search bar they found on their results page without returning to the homepage to use the main search bar. students did this even when the page they were on did not relate to the content they were looking for. for library users, the division between the content of the library's website and the content provided by the library "is not obvious and makes no real sense."30 the screen recordings of student behavior when searching for content on library websites demonstrated that students also could not distinguish between different areas within the library's website.
search for articles
the first task asked students to find an article on criminal justice. because it was the first task on the list, most students started with this search.
students had more success finding the article on franklin's site than on wayne state's site (18 to 14). although only 14 students were credited with finding the article on wayne state's site, several students actually reached the bento-box page which included a category for article results. however, they did not realize that they had found the article and selected an ebook or database instead.
search for ebooks
the second task asked students to find the ebook lean six sigma for leaders by martin brenig-jones and jo dowdall. students had more success finding this book on the bento-box site. twenty students found the book using the wayne state library's bento-box search, compared to ten who found the book on the franklin university library's site. many students, in searching for the ebook, typed their search on the results page from the previous search. on the wayne state university library's site, this led to a bento-box results page which included the book. on franklin university library's site, the results were more complex. between fall 2018 and spring 2019, our ebsco eds custom catalog was not renewed, which resulted in the search results for the ebook no longer displaying in eds. [this non-renewal was not intentional.] as a result, in fall 2018, the ebook that students looked for appeared in eds search results (on both the franklin university library's main search bar and interior ebsco search boxes); however, in spring 2019 it did not. to see what effect this had, we looked at the more limited search results from the fall 2018 trial, when an eds search on franklin university library's site would successfully find the book. even then, students had more success finding the book on the bento-box site: 9 students successfully found the book on wayne state's site, compared to 7 on franklin university's site. some students on franklin university library's site tried, unsuccessfully, to identify the proper page of the library's site to find "books." instead of the catalog page, they went to a page labeled "textbooks" which helped students locate course reserves. because there was no search option for the entire site on that page, those students did not successfully find the ebook.
search for journals
the third task asked students to look for the journal business today. as with the ebook search, students had more success finding the journal on the bento-box site: 19 students found the journal on wayne state library's site compared to 10 on franklin university library's site. students searched for the journal in a similar manner to the way they searched for the ebook. many just put the journal title in the search bar on the page they were on—if that search bar on franklin university library's site provided access to the journal, they would find it with their search. but if they were on a page which did not have a search which included the journal as a result, they would not. the journal search on the wayne state university library's site demonstrated the "below the fold" problem discussed above because the bento-box result for "journals" was below the fold. as a result, at least one student properly searched for the journal, but did not find it because the result was not visible on the screen and they did not scroll down.
questionnaires
we asked students a series of questions about their experience searching for information on the two library websites (see appendix a for the questionnaire and appendix b for the results). slightly more students preferred the wayne state library's site (14) to franklin university's (12). however, five of the users who preferred franklin university's site referenced their familiarity with that site. four of these users specifically referenced their familiarity with the franklin university library site in response to a question asking "[w]hy did you prefer the type of search you picked," while one referenced their familiarity with franklin university library's site in response to a question asking whether there was "anything else you'd like to tell us." the questionnaire also asked students why they preferred one site to the other and what they liked and disliked about each site.
preference: franklin university library
as mentioned above, many of the comments from those who indicated a preference for the franklin university library site indicated the preference was due to familiarity:
• "might be because i am a bit used to it, i just found it easier to navigate."
• "because it's the one i am familiar with."
• "because i'm familiar with it."
• "i liked both. both easy to use. familiar with franklin's."
other comments favored franklin university library's overall website design (as opposed to search):
• "the website is cleaner. easier to use."
• "easier to navigate."
some of the comments did indicate a preference for the search technique:
• "one search bar to search all types. seemed to include more in search results."
• "it easy to search and straight forward."
• "access to research was quicker on the franklin website. also, i felt like there was more research material available."
preference: wayne state university library
those who preferred the wayne state search did so more based on the search technique than did those who preferred franklin:
• "simple and the search was in one spot."
• "one search bar that pulled from the [catalog]."
• "because you can type in exactly what you were looking for and it comes up."
others appreciated the way search results were displayed:
• "their search system organizes the result by type of information, whereas franklin's website makes you search for the type of material information before displaying the results."
• "better layout breaks articles, journals, etc. into separate columns."
• "wayne had each section (book, e-journal, article) separately which was easier to find."
• "the layout."
still others just found the wayne state library search easier to use:
• "it is more visual and easy to find and easy to use."
• "it presented the information in an easy way to find."
• "easy—all in one."
what search results do library users want?
we asked participants to rank which results they would like to see displayed when searching on the library's site. while most applied a numerical ranking, some just circled items. all questionnaire responses were included when compiling these rankings, including rankings from those who did not complete at least two tasks on each website, because user preferences about what search results they want are valid even if they did not perform the required tasks on each library's website. we converted numerical rankings so that the first choice received six points, the second choice five points, etc.
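a short sketch of this point conversion is given below; it also covers the handling of circled-only responses, which is spelled out in the next paragraph (a ranked item in position r receives 7 - r points, and when a participant only circled n items, those items are treated as the top n choices and each receives the average of those point values). the function names and example data are illustrative only.

```python
def points_for_rank(rank, n_options=6):
    """first choice = 6 points, second choice = 5, and so on."""
    return n_options + 1 - rank

def score_ranked(ranking):
    """ranking maps each option to its rank (1 = first choice)."""
    return {option: points_for_rank(rank) for option, rank in ranking.items()}

def score_circled(circled, n_options=6):
    """circled items are treated as the top choices; each gets their average value."""
    top_points = [points_for_rank(r, n_options) for r in range(1, len(circled) + 1)]
    average = sum(top_points) / len(top_points)
    return {option: average for option in circled}

# two circled answers are treated as a first and second choice: 5.5 points each
print(score_circled(["articles", "journals"]))        # {'articles': 5.5, 'journals': 5.5}
print(score_ranked({"articles": 1, "databases": 3}))  # {'articles': 6, 'databases': 4}
```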
of the 34 participants who answered this question, 24 provided rankings and ten circled items without indicating how they ranked those items. where participants circled items, we converted their responses to a numerical equivalent based on how many answers they circled. if they only circled one, it was treated as the first choice, and given 6 points. if they circled more than one, we combined the numerical value of the answers and each answer received the average value. (for example, if two answers were circled, they were treated as a first and second choice, and each circled answer was given a score of 5.5.) the responses indicate students most wanted library search to provide results for articles and journals, followed by databases and ebooks:
• articles: 144
• journals: 125
• databases: 112
• books/ebooks: 111
• research guides: 69.5
• library site: 63.5
mischo, norman, and schlembach reported actual usage of the university of illinois' bento-box results page by category between 2015 and 2017. how do the categories our users indicated they would like to see on a bento-box results page compare with the actual use of bento-box results at the university of illinois? at the university of illinois, 56.1 percent of click-throughs were for articles (franklin university students' first choice), while 33.6 percent were for book and online catalog content (our students' fourth choice).31 databases, our students' third choice, were not a listed category, while journals (franklin university students' second choice) were only the fifth-most used resource (and the percentage of click-throughs was low, at only 3.6 percent).32
limitations
there were a few issues with the study which should be kept in mind when evaluating the results.
number of participants
thirty-four individuals participated in the trial. after removing results from participants who failed to complete a sufficient number of tasks, only 27 participants remained. while this number is small, it does provide information on what students think and, more importantly, how they act when searching for various types of information on the library's website. examples of library user experience testing based on similar numbers include:
• the university of kansas library conducted "usability testing of our primo discovery interface on basic library searching scenarios for undergraduate and graduate students" and reported results from 27 users.33
• the university of southern mississippi library conducted usability testing of 24 users ("six participants from each of the following library user groups: undergraduate students, graduate students, faculty, and library employees") to evaluate and modify their website.34
• syracuse university conducted usability testing on "ten students . . . and eighteen library staff members."35
familiarity with franklin university library website
student familiarity with the franklin university library website affected student opinion. of 12 students in the adjusted results who preferred franklin university library's site, 5 (41.7 percent) gave an answer indicating that familiarity was a factor in their preference. when considering all the questionnaires (including those from participants who were not included in the adjusted results), 7 out of 17 users (41.2 percent) who preferred franklin university library's site gave an answer referencing familiarity.
as a result, opinion may have been skewed in favor of franklin university, despite wayne state being slightly favored overall. a good illustration of this problem is the response from a student who, the screen recording showed, did not even attempt any of the tasks on the franklin university library website but indicated a preference for the franklin university library site because it's "easy to use."
conclusion
as a result of the user experience study, the franklin university library decided that providing bento-box search results would benefit our library's users. the trial showed that students required less time to conduct their searches using wayne state's bento-box search and found more items successfully on wayne state's site. the lack of student distinction between different types of library content, along with the likelihood of their entering a search in whatever search box they see on a page, further supports providing bento-box results. adopting a bento-box results page will permit the library to provide search boxes on interior pages which permit students to search for materials site-wide. the bento box will let students search for content anywhere on the library's website without requiring them to first figure out what type of library resource they are looking for and then find the correct section of the library's website. additionally, the ebook search issue previously discussed demonstrates the benefits of switching to a bento box. the disappearance of ebook search results from the eds listing would not have mattered with a bento-box-style search because the bento box would have displayed a box for catalog results. comments from two of the students who preferred the wayne state library website demonstrate the benefits of a bento-box format. the bento-box search meant that wayne state's site is "simple and the search was in one spot." it also helps students because "[wayne state's] search system organizes the result by type of information, whereas franklin's website makes you search for the type of material information before displaying the results."
appendix a: questionnaire
about this study
the franklin university library is studying how users search for, and find, information on library websites. the purpose of this study is to ask library users (and potential library users) to search for information on two different library websites and give their opinion on their search experience. you are asked to be a participant as a member of the franklin university community who is a library user, or a potential library user. we hope to have between 20 and 50 people participate in this study. if you agree to participate in this study, you will be asked to look for information to find four different resources on two library websites (franklin university's website and wayne state university's website). you will only be asked to find the information, not to access the information. you will then be asked to fill out a questionnaire providing demographic information and your opinion about library search. as part of this study, your search activity on the websites may be recorded by screen recording software. your participation in this study is anticipated to take about 15 minutes. there are no known risks to participation in the study.
the benefits of participation include helping the library to better serve its users by identifying how users search for information on the library's website. in return for your participation in this survey, you will receive a franklin university coffee mug. this study is conducted anonymously—no personally identifiable information will be collected. your participation in this survey is voluntary. if you decide to participate, you have the right to refuse to answer any of the questions that make you uncomfortable. you also have the right to withdraw from this study at any time, with no repercussions. this research has been reviewed and approved by the franklin university institutional review board. for questions regarding participants' rights, you can contact the institutional review board at irb@franklin.edu.
please answer the following background questions:
1) what do you do at franklin university (circle all that apply)? a) non-degree-seeking student b) undergraduate student c) master student d) doctoral student e) staff f) faculty
2) how often do you use the library's website (circle the best choice)? a) frequently (every week) b) occasionally (every month) c) rarely (less than once a month) d) never
3) how often do you use the search function on the library's website? a) frequently b) occasionally c) rarely d) never
please answer the following questions about your experience and preferences when searching for information on library websites:
1) which library's search did you prefer: a) franklin university library b) wayne state university library
2) why did you prefer the type of search you picked in the answer to question 1?
3) for franklin university's search results: a) what did you like? b) what didn't you like?
4) for wayne state university's search results: a) what did you like? b) what didn't you like?
5) please rank in order of preference what search results you would want to see displayed when searching on the library's website: a) articles related to a topic b) books / ebooks c) databases d) library website e) journals f) research guides g) other (please list):
6) is there anything else you'd like to tell us about your experience looking for information on these library websites?
appendix b: adjusted questionnaire results
below are the results from participants who completed at least two tasks on each university library's website. where the screen recordings indicated that participants did not complete at least two tasks on each of the websites the questionnaire responses were not recorded.
what do you do at franklin university (circle all that apply)? undergraduate: 21; masters: 6
how often do you use the library's website (circle the best choice)? occasionally: 9; frequently: 12; rarely: 6
how often do you use the search function on the library's website? occasionally: 7; frequently: 13; rarely: 7
which library's search did you prefer? franklin university: 12; wayne state university: 14; n/a: 1
appendix c: screen recording results
analysis of screen recordings from participants who completed at least two tasks on each university library's website. for timed results, we did not include the results of students who flipped between library websites while completing the tasks. (an example of flipping between sites occurred when a student found the article on the franklin university library site, then looked for it on the wayne state university library site before looking for the ebook on the franklin university site.)
time to complete tasks (average): franklin university library site: 3:28; wayne state university library site: 2:35
site where student finished search quicker: franklin university library site: 6; wayne state university library site: 12
total items found: franklin university library site: 38; wayne state university library site: 53
articles found: franklin university library site: 18; wayne state university library site: 14
books found: franklin university library site: 10; wayne state university library site: 20
journals found: franklin university library site: 10; wayne state university library site: 19
endnotes
1 cole hudson and graham hukill, "one-to-many: building a single-search interface for disparate resources," in k. varnum (ed.), exploring discovery: the front door to your library's licensed and digitized content (chicago: ala editions, 2016): 146.
2 suzanna conrad and nathasha alvarez, "conversations with web site users: using focus groups to open discussion and improve user experience," journal of web librarianship 10, no. 2 (april 2016): 71, https://doi.org/10.1080/19322909.2016.1161572.
3 scott hanrath and miloche kottman, "use and usability of a discovery tool in an academic library," journal of web librarianship 9, no. 1 (january 2015): 4, https://doi.org/10.1080/19322909.2014.983259.
4 irina trapido, "library discovery products: discovering user expectations through failure analysis," information technology and libraries 35, no. 3 (2016): 22, https://doi.org/10.6017/ital.v35i3.9190.
5 william mischo, michael norman, and mary schlembach, "innovations in discovery systems: user studies and the bento approach," proceedings of the charleston library conference (2017): 299, https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1991&context=charleston.
6 hudson and hukill, "one-to-many," 142.
7 emily singley, "to bento or not to bento—displaying search results," http://emilysingley.net/usablelibraries/to-bento-or-not-to-bento-displaying-search-results/.
8 jonathan rochkind, "article search improvement strategy," https://bibwild.wordpress.com/2012/10/02/article-search-improvement-strategy/.
9 hudson and hukill, "one-to-many," 145.
10 rochkind, "article search improvement strategy."
11 hudson and hukill, "one-to-many," 142.
12 nancy turner, "librarians do it differently: comparative usability testing with students and library staff," journal of web librarianship 5, no. 4 (october 2011), https://doi.org/10.1080/19322909.2011.624428; elena azadbakht, john blair, and lisa jones, "everyone's invited: a website usability study involving multiple library stakeholders," information technology & libraries 36, no. 4 (2017): 43, https://doi.org/10.6017/ital.v36i4.9959.
13 colleen kenefick and jennifer a. devito, "google expectations and interlibrary loan: can we ever be fast enough?" journal of interlibrary loan, document delivery & electronic reserves 23, no. 3 (july 2013): 158, https://doi.org/10.1080/1072303x.2013.856365.
14 kenefick and devito, "google expectations and interlibrary loan," 157; carol diedrichs, "discovery and delivery: making it work for users," serials librarian 56, no. 1-4 (january 2009): 81, https://doi.org/10.1080/03615260802679127.
15 singley, "to bento or not to bento."
16 singley.
17 singley.
18 hudson and hukill, "one-to-many," 146.
19 singley, "to bento or not to bento."
20 eric phetteplace and jeremy darrington, "a hybrid approach to discovery services," reference & user services quarterly 53, no. 4 (2014): 293.
21 cory lown, tito sierra, and josh boyer, "how users search the library from a single search box," college & research libraries 74, no. 3 (2013): 229.
22 singley, "to bento or not to bento."
23 lown, sierra, and boyer, "how users search the library from a single search box."
24 hudson and hukill, "one-to-many," 150.
25 hudson and hukill, 150.
26 mischo, norman, and schlembach, "innovations in discovery systems," 304.
27 suzanne chapman et al., "manually classifying user search queries on an academic library web site," journal of web librarianship 7, no. 4 (october 2013): 419, https://doi.org/10.1080/19322909.2013.842096.
28 aaron tay and feng yikang, "implementing a bento-style search in libguides v2," code4lib journal no. 29 (july 2015), https://journal.code4lib.org/articles/10709.
29 singley, "to bento or not to bento."
30 chapman et al., "manually classifying user search queries," 406.
31 mischo, norman, and schlembach, "innovations in discovery systems."
32 mischo, norman, and schlembach.
33 hanrath and kottman, "use and usability of a discovery tool," 5.
34 azadbakht, blair, and jones, "everyone's invited," 34.
35 turner, "librarians do it differently," 290.
public libraries leading the way
harnessing the power of orcam
mary howard
information technology and libraries | september 2020. https://doi.org/10.6017/ital.v39i3.12637
mary howard (mhoward@sccl.lib.mi.us) is reference librarian, library for assistive media and talking books (lamtb), at the st. clair county library, port huron, michigan. © 2020.
library for assistive media and talking books (lamtb) services are located at the main branch of the st. clair county library system. lamtb facilitates resources and technologies for residents of all ages who have visual, physical, and/or reading limitations that prevent them from using traditional print materials. operating out of port huron, michigan, we encounter many instances where we need to provide assistance above and beyond what a basic library may offer. we host talking book services which provide free players, cassettes, braille titles, and downloads to users who are vision or mobility impaired. we also have a large and stationary kurzweil reading machine that converts print to speech, video-enhanced magnifiers, and large print books. we also provide home delivery service for patrons who are unable to travel to branches. the library has been searching for a more technology-forward focus for our patrons. the state's talking books center in lansing set up an educational meeting at the library of michigan in 2018 to see a live demonstration of the orcam my eye reader. this was the innovation we were seeking, and i was thoroughly impressed with the compact and powerful design of the reader, the ease of use, and the stunningly accurate feedback provided by this ai reading assistive device. users are able to read with minimal setup and total control. orcam readers are lightweight, easily maneuverable assistive technology devices for users who are blind, visually impaired, or have a reading disability, including children, adults, and the elderly. the device automatically reads any printed text: newspapers, money, books, menus, labels on consumer products, text on screens or smartphones, etc. the orcam reader will repeat back any text immediately and is fit for all ages and abilities. orcam works with english, spanish, and french languages and can identify money and other business and household items. it can be placed near either the left or right ear. users can easily adjust the volume and speed of the read text. it can be attached to either the left or right temple of your glasses using a magnetic docking device. having a diverse group of users with different needs use the reader as they like is one of the more impressive offerings. changing most settings is normally facilitated with just a finger swipe on the orcam device.
the mission of orcam is to develop a "portable, wearable visual system for blind and visually impaired persons, via the use of artificial computer intelligence and augmented reality." by offering these devices to our sight-, mobility-, or otherwise impaired patrons, we open up the world of literacy, discovery, and education. some of our users are not able to read in any other fashion, and the orcam provides a much-needed boost to their learning profile. we secured a grant from the institute of museum and library services (imls) for the purchase of the readers (cfda 45.310). we also worked with orcam to get lower pricing for these units. normally they retail for $3,500 but we were able to move this to the lower price point of $3,000. we also were awarded a $22,106 improving access to information grant from the library of michigan to fund the entire purchase. without this funding stream we would not have been able to secure the orcam. however, if you have veterans in your service area please contact the company, since there is availability for va health coverage for low vision or legally blind veterans who may qualify to receive an orcam device, fully paid for by the va. please visit https://orcam.com/en/veterans for more information.
figure 1. close-up of the orcam device.
the grant was initially set to run from september 2019 to september 2020. we purchased six orcam readers for our library users, and they were planned to be rotated among our twelve branches throughout this grant cycle. however, due to the pandemic and out of safety concerns for staff and visitors, our library was closed from march 23 to june 15 and we were only able to offer it to the public at six branches. as of july 14, 2020, we are projecting that we may open to the public in september, but covid-19 issues could halt that. we have had to make arrangements with the grantor to extend the period for the usage of the orcam from september to december. this will make up for some of the lost time and open a path for the other six libraries to have their turn offering the orcam to their patrons. the interesting aspect of this is we now have to take our technology profile even further by offering remote training to prospective orcam users. thankfully, the design and rugged housing of the reader make it easy to clean and maintain, but the social distancing can prove to be intrusive for training. to set up a user you need to be within a foot or two of them, staying very close in order to get them used to how the orcam reads. there is a lot of directing involved and close contact between the user and instructor. we will use a workaround of providing distance instruction, including in-person and remote training. orcam also has a vast array of instructional videos that we will have cued up for users. we have had over 150 residents attend presentations, demonstrations, and talks on the orcam. i anticipate that this number will not be achieved for the second round; however, we may be more successful in our online presence since we can add the instruction to our youtube page, offer segments on facebook and other social media, and provide film clips for our webpage. the situation has been difficult, but it has opened up lamtb services to think about how we should be working to provide better and more remote service to our users.
since we cover over 800 square miles in the county, becoming more adaptable in serving our patrons has become a paramount area of work for the library. the orcam will bring a new way of remote training to our patrons, which will raise awareness of the reader and how it can benefit users. the st. clair county library system would like to thank the institute of museum and library services for supporting this program. the views, findings, conclusions or recommendations expressed in this article do not necessarily represent those of the institute of museum and library services.
what technology skills do developers need? a text analysis of job listings in library and information science (lis) from jobs.code4lib.org
monica maceli
information technology and libraries | september 2015
monica maceli (mmaceli@pratt.edu) is assistant professor, school of information and library science, pratt institute, new york.
abstract
technology plays an indisputably vital role in library and information science (lis) work; this rapidly moving landscape can create challenges for practitioners and educators seeking to keep pace with such change. in pursuit of building our understanding of currently sought technology competencies in developer-oriented positions within lis, this paper reports the results of a text analysis of a large collection of job listings culled from the code4lib jobs website. beginning more than a decade ago as a popular mailing list covering the intersection of technology and library work, the code4lib organization's current offerings include a website that collects and organizes lis-related technology job listings. the results of the text analysis of this dataset suggest the currently vital technology skills and concepts that existing and aspiring practitioners may target in their continuing education as developers.
introduction
for those seeking employment in a technology-intensive position within library and information science (lis), the number and variation of technology skills required can be daunting. the need to understand common technology job requirements is relevant to current students positioning themselves to begin a career within lis, those currently in the field who wish to enhance their technology skills, and lis educators. the aim of this short paper is to highlight the skills and combinations of skills currently sought by lis employers in north america through textual analysis of job listings. previous research in this area explored job listings through various perspectives, from categorizing titles to interviewing employers;1,2 the approach taken in this study contributes a new perspective to this ongoing and highly necessary work. this research report seeks a further understanding of the following research questions:
• what are the most common job titles and skills sought in technology-focused lis positions?
• what technology skills are sought in combination?
• what implications do these findings have for aspiring and current lis practitioners interested in developer positions?
as detailed in the following research method section, this study addresses these questions through textual analysis of relevant job listings from a novel dataset—the job listings from the code4lib jobs website (http://jobs.code4lib.org/). code4lib began more than a decade ago as an electronic discussion list for topics around the intersection of libraries and technology.3 over time, the code4lib organization expanded to an annual conference in the united states, the code4lib journal, and most relevant to this work, an associated jobs website that highlights jobs culled from both the discussion list and other job-related sources. figure 1 illustrates the home page of the code4lib jobs website; the page presents job listings and associated tags, with the tags facilitating navigation and viewing of other related positions. users may also view positions geographically or by employer.
figure 1. homepage of the code4lib jobs website, displaying most-recently posted jobs and the associated tags.4
in addition to the visible user interface for job exploration, the website consists of software to gather the job listings from a variety of sources. the website incorporates jobs posted to the code4lib discussion list, american library association, canadian library association, australian library and information association, highered jobs, digital koans, idealist, and archivesgig. this broad incoming set of jobs provides a wide look into new technology-related postings.
new job listings are automatically added to a queue to be assessed and tagged by human curators before posting. this allows manual intervention where a curator assesses whether the job is relevant to technology in the library domain and to validate the job listing information and metadata (see figure 2). curating is done on a volunteer basis, and curators are asked to assess whether the position is relevant to the code4lib community, if it is unique, and to ensure that it has an associated employer, set of tags, and descriptive text. combining both software processes and human intervention in the job assessment results in the ability to gather a large number of jobs of high relevance to the code4lib community. as mentioned earlier, code4lib's origins are in the area of software development and design as applied in lis contexts. these foci mean that most jobs identified as relevant for inclusion in the code4lib jobs dataset are oriented toward developer activities. the code4lib jobs website therefore provides a useful and novel dataset within which to understand current employment opportunities relating to the intersection between technology—particularly developer work—and the lis field.
figure 2. code4lib job curators interface where job data is validated and tags assigned.5
research method
to analyze the job listing data in greater depth, a textual analysis was conducted using the r statistical package, exploring job titles and descriptions.6 first, the job listing data from the most recent complete year (2014) were dumped from the database backend of the code4lib jobs website; this dataset contained 1,135 positions in total. the dataset included the job titles, descriptions, location and employer information, as well as tags associated with the various positions. the text was then cleaned to remove any markup tags or special characters that remained from the scraping of listings. finally, the tm (text mining) package in r was used to calculate term frequencies and correlations, generate plots, and cluster terms across both job titles and descriptions.7
results
job title analysis
of the full set of 1,135 positions, 30 percent were titled as a librarian position; popular specialties included systems librarian and various digital collections and curation-oriented librarian titles. figures 3 and 4 detail the most common terms used in position titles across librarian and nonlibrarian positions.
figure 3. most common terms used in librarian position titles.
figure 4. most common terms used in nonlibrarian position titles.
the most popular job title terms were then clustered using ward's agglomerative hierarchical method (dendrogram in figure 5). agglomerative hierarchical clustering, of which ward's method is widely used, begins first with single-item clusters, then identifies and joins similar clusters until the final stage in which one larger cluster is formed. commonly used in text analysis, this allows the investigator to explore datasets in which the number of clusters is not known before the analysis. the dendrograms generated (e.g., figure 5) allow for visual identification and interpretation of closely related terms representing various common positions, e.g., digital librarian, software engineer, collections management, etc. given that job titles in listings may include extraneous or infrequent words, such as the organization name, the cluster analysis can provide an additional view into common job titles across the full dataset in a more generalized fashion.
figure 5. cluster dendrogram of terms used in job titles generated using ward's agglomerative hierarchical method.
tag analysis
as described earlier, the code4lib jobs website allows curators to validate and tag jobs before listing. the word cloud in figure 6 displays the most common tags associated with positions, with xml being the most popular tag (178 occurrences). figure 7 contains the raw frequency counts of common tags observed.
figure 6. word cloud of most frequent tags associated with job listings by curators.
figure 7. frequency of commonly occurring tags (frequency of fifty occurrences or more) in the 2014 job listings.
job description analysis
the job description text was then analyzed to explore commonly co-occurring technology-related terms, focusing on frequent skills required by employers. figures 8, 9, and 10 plot term correlations and interconnectedness. terms with correlation coefficients of 0.3 or higher were chosen for plotting; this common threshold broadly included terms with a range of positive relationship strength from moderate to strong. plots were created to express correlations around the top five terms identified from the tags: xml, javascript, php, metadata, and html (frequencies in figure 7). any number of terms and frequencies can be plotted from such a dataset; to orient the findings closely around the job listing text, a focus on the top terms was chosen. these plots illustrate the broader set of skills related to these vital competencies represented in the job listings.
figure 8. job listing terms correlated with "xml" (most popular tag).
figure 9. job listing terms correlated with "javascript" (second most popular tag), including "php" and "html" (third and fifth most popular tags, respectively).
figure 10. job listing terms correlated with "metadata" (fourth most popular tag).
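the analysis itself was carried out in r with the tm package, as described in the research method section. the sketch below reproduces the same general workflow (building a document-term matrix, counting term frequencies, filtering pairwise term correlations at the 0.3 threshold, and clustering terms with ward's method) in python with scikit-learn and scipy; the tiny sample corpus and the choice of libraries are illustrative only, not the original code.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from scipy.cluster.hierarchy import linkage, dendrogram

# stand-in for the 1,135 job titles/descriptions -- illustrative only
docs = [
    "web developer javascript php html css",
    "digital initiatives librarian metadata xml",
    "systems librarian integrated library system sql",
    "metadata librarian xml marc dublin core",
]

# document-term matrix: rows are listings, columns are terms
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs).toarray()
terms = list(vectorizer.get_feature_names_out())

# raw term frequencies across the collection (cf. figure 7)
frequencies = dict(zip(terms, dtm.sum(axis=0)))

# pairwise term correlations; keep partners at or above the 0.3 threshold
corr = np.corrcoef(dtm, rowvar=False)
target = terms.index("xml")
related = [
    (terms[j], round(float(corr[target, j]), 2))
    for j in range(len(terms))
    if j != target and corr[target, j] >= 0.3
]
print("terms correlated with 'xml':", related)

# ward's agglomerative clustering of term vectors (cf. the figure 5 dendrogram)
links = linkage(dtm.T, method="ward")
dendrogram(links, labels=terms, no_plot=True)
```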
finally, a series of general plots was created to visualize the broad set of skills necessary in fulfilling the positions of interest to the code4lib community. as detailed in the title analysis (figures 3 and 4), apart from the generic term librarian, the two most common terms across all job titles were digital and developer. correlation plots were created to detail the specific skills and requirements commonly sought in positions using such terms. figure 11 illustrates the terms correlated with the general term of developer, while figure 12 displays terms correlated with digital. the implications of these findings will be discussed further in the following discussion section.
figure 11. job listing terms correlated with "developer."
figure 12. job listing terms correlated with "digital."
discussion
taken as a whole, the job listing dataset covered a quite dramatic range of positions, from highly technical (e.g., senior-level software engineer or web developer) to managerial and leadership roles (e.g., director or department head roles centered on digital services or emerging technologies). these findings support the suggestions of earlier research,8 which advocated for lis graduate programs to build their offerings not just in technology skills but also in technology management and decision-making. however, the code4lib jobs dataset is a one-dimensional view into the employment process and is focused largely on the developer perspective. additional contextual information, including whether suitable candidates were easily identified and if the position was successfully filled, would provide a more complete view of the employment process. prior research has indicated that many technology-related positions in lis are in fact difficult to fill with lis graduates.9 while lis graduate programs have made great strides in increasing the number of courses and topics covered that address technology, these improvements may not benefit those already in the field or wishing to shift towards a more technology-focused position.
in the common tags and terms analysis, experience with specific lis applications was relatively infrequently required, with the drupal content management system a notable exception. more generalizable programming languages or concepts, e.g., python, relational databases, xml, etc., were favored. as with technology positions outside of the lis domain, employers likely seek those with the ability to flexibly apply their skills across various tools and platforms. this may also relate to the above challenges in filling such positions with lis graduates, with the goal of opening up the position to a larger technologist applicant base.
common web technologies popular in the open-source software often favored by lis organizations continued to dominate, with a clear preference for candidates well versed in html, css, javascript, and php. relating to these skills, web development and design practices were often intertwined, with positions requesting both developer-oriented skillsets as well as interface design (e.g., figure 7). technologies supporting modern web application development and workflow management were evident as well, e.g., common requirements for experience with versioning systems such as git, popular javascript libraries, and development frameworks. also striking was the richness of the terms correlated with metadata (figure 10), including mention of growing areas of expertise, such as linked data.
interestingly, the general correlation plots expressing the common terms sought around "digital" and "developer" positions were quite varied. while the developer plot (figure 11 above) provided a richly technical view into common technologies broadly applied in web and software development, the terms correlated around digital were notably less technical (figure 12 above). while there was a clear focus on digital preservation activities and common standards in this area, mention of terms such as "grant" indicated that these positions likely have a broad role. the term digital was frequently observed in librarian job titles, so these roles may be tasked with both technical and administrative work.
finally, there are inherent difficulties in capturing all jobs relating to technology use in the lis domain that introduce limitations into this study. while the incoming job feeds attempt to broadly capture recent job posts, it is possible that jobs are missed or overlooked by the job curators. given the lack of one centralized job-posting source regardless of the field, this is a common challenge to research work attempting to assess every job posting. and as mentioned above, there is also a lack of corresponding data as to whether these jobs are successfully filled and what candidate backgrounds are ultimately chosen (i.e., from within or outside of lis).
conclusion
this assessment of the in-demand technology skills provides students, educators, and information professionals with useful direction in pursuing technology education or strengthening their existing skills. there are myriad technology skills, tools, and concepts in today's information environments. reorienting the pursuit of knowledge in this area around current employer requirements can be useful in professional development, new course creation, and course revision. the constellations of correlated skills presented above (figures 8–12) and popular job tags (figure 7) describe key areas of technology competencies in the diverse areas of expertise presently needed, from web design and development to metadata and digital collection management.
in addition to the results presented in this paper, the code4lib jobs website provides a continuously current view into recent jobs and related tags; these data can help those in the lis field orient professional and curricular development toward real employer needs.

acknowledgements

the author would like to thank ed summers of the maryland institute for technology in the humanities for generously providing the jobs.code4lib.org dataset for analysis.

references

1. janie m. mathews and harold pardue, "the presence of it skill sets in librarian position announcements," college & research libraries 70, no. 3 (2009): 250–57, http://dx.doi.org/10.5860/crl.70.3.250.
2. vandana singh and bharat mehra, "strengths and weaknesses of the information technology curriculum in library and information science graduate programs," journal of librarianship & information science 45, no. 3 (2013): 219–31, http://dx.doi.org/10.1177/0961000612448206.
3. "about," code4lib, accessed january 6, 2014, http://jobs.code4lib.org/about/.
4. "code4lib jobs: all jobs," code4lib jobs, accessed january 12, 2015, http://jobs.code4lib.org/.
5. "code4lib jobs: curate," code4lib jobs, accessed january 17, 2015, http://jobs.code4lib.org/curate/.
6. r core team, r: the r project for statistical computing, 2014, http://www.r-project.org/.
7. ingo feinerer and kurt hornik, "tm: text mining package," 2014, http://cran.r-project.org/package=tm.
8. meredith g. farkas, "training librarians for the future: integrating technology into lis education," in information tomorrow: reflections on technology and the future of public & academic libraries, ed. rachel singer gordon (medford, nj: information today, 2007), 193–201.
9. mathews and pardue, "the presence of it skill sets in librarian position announcements."

articles

is creative commons a panacea for managing digital humanities intellectual property rights?

yi ding

information technology and libraries | september 2019

yi ding (yi.ding@csun.edu) is online instructional design librarian and affordable learning $olutions co-coordinator, california state university, northridge.

abstract

digital humanities is an academic field that applies computational methods to explore topics and questions in the humanities. digital humanities projects, as a result, consist of a variety of creative works different from those in traditional humanities disciplines. born to provide free, simple ways to grant permissions to creative works, creative commons (cc) licenses have become top options for many digital humanities scholars to handle intellectual property rights in the us. however, there are limitations of using cc licenses that are sometimes unknown to scholars and academic librarians.
by analyzing case studies and influential lawsuits about intellectual property rights in the digital age, this article advocates for a critical perspective on copyright education and provides academic librarians with specific recommendations about advising digital humanities scholars to use cc licenses with four limitations in mind: 1) the pitfall of a free license; 2) the risk of irrevocability; 3) the ambiguity of noncommercial and nonderivative licenses; 4) the dilemma of sharealike and the open movement.

introduction

along with an increasing amount of digital scholarship, open access became a preferred, more affordable model for scholarly communication in the us.1 in particular, digital humanists envision a sharing culture in which digital content and tools can be widely distributed through open-access licenses.2 creative commons (cc) licenses, with their promise to provide simple ways to grant permissions to creative works, became top options for many digital humanities scholars to handle intellectual property rights in the us. however, creative commons is not a panacea for managing the intellectual property rights of digital scholarship. digital humanities projects usually consist of complicated components, and their intellectual property rights involve various licenses and stakeholders. with misunderstandings of intellectual property and cc licenses, many scholars are not fully aware of the implications of using cc licenses, which cannot provide legal solutions to all intellectual property rights issues. the increasingly popular application and commercialization of digital humanities projects in the us further complicate the issue. based on case studies and influential lawsuits involving this topic in the us, this article critically investigates the limitations of using cc licenses and recommends that academic librarians provide scholars with more sophisticated advice on using cc licenses as well as education on intellectual property rights in general.

literature review

usually identified as rights experts, academic librarians are in a unique position to provide copyright education in the digital humanities field through consultation, instruction, and other means to faculty and students.3 librarians sometimes position themselves as "reuse evangelists" who embrace the vision of creative commons by applying cc licenses as well as introducing cc licenses to the campus community through guides and webpages.4 yet few discussions have been raised about the limitations of cc licenses in the library community.5 drawing from scholarly literature in the law field and primary sources including lawsuits, websites, magazine articles, and newspaper articles involving this topic, this article intends to bring a critical perspective into the copyright education academic librarians provide by analyzing the four limitations of cc licenses in managing the intellectual property rights of digital humanities projects.

in the law community, scholars have examined the limitations of open licensing and creative commons. katz elaborated on the mismatch between the vision of creative commons and its licensors as well as how the incompatibility of cc licenses may result in potential detriment to the dissemination of knowledge.6 scholars have since referred to katz in extensive discussions of the limitations of cc licenses in different realms of copyrighted works.
for example, johnson investigated several limitations of cc licenses for entertainment media, including those with sharealike, noncommercial, and nonderivative licenses.7 lukoseviciene acknowledged the efficiency of cc licenses while pointing out its limitation in ensuring equity in a sharing culture.8 when discussing the problems of cc licenses in data sharing, khayyat and bannister echoed katz’s critique on the limitation of cc licenses in combining copyrighted works with different types of licenses.9 scholars have also addressed problems related to intellectual property rights other than copyright when applying cc licenses. for example, hietnanen discussed the problems of license interpretation and concluded that although cc licenses are useful for “low value high volume licensing,” it fails to address some important intellectual property rights including privacy and moral rights.10 burger demonstrated how cc commercial licenses have encouraged publicity right infringement in several cases.11 nevertheless, none of the above scholars discussed the implication of the limitations of cc licenses in digital scholarship. to solve the problem of excessive open-source licenses, gomulkiewicz suggested a license-selection “wizard” modeling what creative commons offers, which demonstrates the limitation of cc licenses in managing the intellectual property rights of codes, a common component of many digital humanities projects.12 this article does not aim to conduct a comprehensive assessment of pitfalls of cc licenses in digital scholarship or make legal recommendations to manage the intellectual property rights of digital humanities projects. rather, it discusses the four limitations of cc licenses that are usually overlooked but essential for academic librarians to educate patrons in the digital humanities field. with the development of the digital humanities field and more students involved in it, academic librarians should educate both faculty scholars and emerging scholars about implications of applying cc licenses.13 information technology and libraries | september 2019 36 four limitations of cc licenses is creative commons really free?—the pitfall of a free license one major reason that scholars and institutions are using cc licenses is the ease of applying them to creative works. the directory of open access journals (doaj), which is regarded as “both an important mode of discovery and marker of legitimacy within the world of open access publishing,” now recommends cc licenses as a best practice.14 doaj explicitly encourages scholars to use creative commons’ “simple and easy” license chooser tool. indeed, the creative commons website provides scholars and institutions a very user-friendly way to select and apply a license to copyrightable works.15 anyone can place a cc license on a work by copying and pasting from its website. however, this oversimplified process of handling intellectual property rights of creative works may mislead both copyright owners and copyrighted works users to overlook pitfalls of this free license, including unintentional copyright and other intellectual property rights infringements. more specifically, one prominent legal formality of cc licenses is that licensees do not need to pay to register with creative commons to apply a cc license. as indicated by creative commons website, a cc license is legally valid as soon as a user applies it to any material the user has the legal right to license. 
creative commons also does not “require registration of the work with a national copyright agency.”16 while copyright protection is automatic the moment a work is created and “fixed in a tangible form,” there are various advantages to register copyrighted works through the united states copyright office to establish a public record of the copyright claim.17 one foremost important advantage of copyright registration is that copyright owners can file an infringement suit of works of u.s. origin in court. actually, filing a registration before or within five years of publishing a work will actually put the copyright owner in a stronger position in court to validate the copyright.18 additionally, copyright registration enables one to get awarded statutory damages and attorney’s fees and to gain protection against the importation of infringing copies.19 the emphasis on a free-to-use license along with the lack of clarification of the functions of copyright registration on the website of creative commons may not only mislead scholars to ignore important legal formalities within the copyright law, but also increase the abuse of original materials by stakeholders such as predatory publishers. one example is how the integrated study of marine mammals repackaged existing articles taking advantage of the creative commons licenses used by plos one, which has been publishing articles on digital humanities. 20 the oversimplified process of using cc licenses advocated by creative commons website may also prevent licensors from double-checking or clarifying if they have the legal right to license a work. in 2013, persephone magazine, which used an image with a creative commons license, was later sued for $1,500 for using it. it turned out the photo did not belong to the person who uploaded it with a cc license, which led to 73 companies who used it being sued. persephone magazine claimed that $1,500 was more than its entire advertising revenue for the year and it had to ask its users to donate just to keep the site going.21 therefore, scholars of digital humanities projects, which usually include different types of content such as artworks and photographs, should be wary of using cc licensed images. otherwise, a freely available license might end up costing a scholar unexpected money and energy. in the is creative commons a panacea for managing digital humanities ip rights? | ding 37 https://doi.org/10.6017/ital.v38i3.10714 meantime, when deciding to put their projects under cc licenses or to publish their works in a journal that requires cc licenses, scholars should also be reminded to make accurate and clear copyright statements to prevent innocent infringements of other copyright owners ’ works. for example, a team of art historians who create an online map of architectures in ancient china are very likely to use and critique other people’s images in digital projects under fair use. these digital humanists should cite image sources and clarify the scope of the cc license that they apply to their project. it is understandable that in order to promote an open, sharing culture, the application of a cc license is intentionally designed to be simple and free by creative commons to fulfill its mission. however, the misuse of a free license can lead to false licenses and more innocent infringements and ultimately costs. 
academic librarians should become aware of these pitfalls and provide more in-depth training on cc licenses to scholars, especially by collaborating with campus centers of digital humanities or language and literature faculty as well as other institutional research support departments as suggested by fraser and arnhem.22 is creative commons really safe?—the risk of irrevocability similar to the pitfall of inaccurate licenses, the irrevocability of cc licenses can also be problematic. a “revocable” license is one that can be terminated by the licensor at any time during the term of the license agreement. an “irrevocable” license, on the other hand, cannot be terminated if there is no breach. all cc licenses are irrevocable.23 licenses and contracts usually have effective date of termination and even if they don’t have one, most courts hold that simple, nonexclusive licenses with unspecified durations that are silent on revocability are revocable at will.24 as a result, the irrevocability of cc licenses can be easily overlooked by cc licensors. this means that while in traditional academic publishing and other means of the dissemination of research, scholarly, and creative output, a scholar will be able to revise the copyright agreement he or she has established with a publisher or a scholarly communication venue due to the usually clear rules on termination dates and revocability, it is impossible to revoke a cc license. this discrepancy of the revocability between traditional copyright agreements and cc licenses may put copyright owners at disadvantage especially because many of them apply noncommercial cc licenses. copyright experts have warned scholars to keep in mind that once a “nonexclusive license,” which cc noncommercial licenses are, has been chosen to grant one’s work, the scholar has lost potential opportunities to “license the same work on an exclusive basis,” which is the case in the commercialization of a digital humanities work.25 we can understand this pitfall of the irrevocability of cc licenses in a case in late 2014. a plan by yahoo to begin selling prints of images uploaded to flickr was met with anger by users, even though yahoo only used photos with creative commons licenses that explicitly allowed commercial uses. although yahoo’s use of cc licensed works was legal, users who initially applied cc licenses with commercial use would not have wanted the company making canvas prints from the photos they posted to flickr to make money.26 should these copyright owners understand better the irrevocability of cc licenses, they might have chosen a different type of cc license with caution. bill of rights, a community of people advocating for protecting the intellectual property rights of artists, even called this kind of commercial use “abuse.”27 although most digital scholars, like those flickr users, have a genuine interest in making their works available to as many people as possible, it can be hard to gauge their reactions to all unforeseen outcomes of applying cc licenses to their works. therefore, scholars need more institutional support and education to information technology and libraries | september 2019 38 become aware of the irrevocability of cc licenses when managing the intellectual property rights of their digital scholarship projects. this institutional awareness-building is especially important because of the lack of support from creative commons. irrevocability is listed in the “considerations before licensing” section on the website of creative commons. 
however, scholars may easily overlook the irrevocability feature of cc licenses due to two reasons. first, the 6,500-plus-word “considerations before licensing” section is not a mandatory step to go through for licensors. it is simply a clickable link from the “choose a license” webpage of creative commons.28 second, although every cc license consists of three layers, the lawyer-readable legal code, the human-readable deed, and the machine-readable code, the irrevocability of cc licenses can be easily buried in those texts when a layperson without any experience or training of cc license look for the simplest way to promote and expose their works as much as possible.29 some may suggest putting everything under noncommercial use. however, it is not an option for some platforms and is even discouraged by some digital scholarship repositories. for example, the open access scholarly publishers association strongly encourages the use of the cc-by license wherever possible.30 the rationale behind the recommendation is the hope to make scientific findings available for innovations as well as to make open-access journals sustainable with sufficient profit to operate. driven by the same objectives, cc-by has become the gold standard for oa publishing. the three largest oa publishers (biomed central, plos, and hindawi) all use this license.31 in particular, the often multimedia and viable characteristics of digital humanities projects can expose them to even more infringement issues in the future. one example of this is romelab, a project focusing on the recreation of the roman forum, and its website is made up of multiple separate components. the project’s website is constructed with the drupal content management system, and is integrated with a 3d virtual world component, where users can access the romelab website and walk through the virtual space of rome itself. romelab is currently under a creative commons attribution-noncommercial license. as a project funded by the mellon foundation, romelab is required to offer “nonexclusive, royalty-free, worldwide, perpetual, irrevocable license to distribute” its data. 32 however, it is never clear to the researchers creating the site how to release the data that only work within the proprietary software unity engine that they used to produce the virtual space and more importantly, all the 3d models and pictures. simply putting the whole site under the creative commons attribution-noncommercial license doesn’t automatically make its research data accessible by the public. in this case, the irrevocability of cc licenses further complicates the issue of cc licenses being oversimplified. specifically, since the romelab website is also equipped with a chat feature and a multiplayer function, allowing multiple users to interact with each other, the project has a great potential to make profit if repurposed as a teaching tool and even an educational game in the future. whether or not researchers of romelab manage to make their research data publicly available, cc licenses are not a panacea to handle conflicting data release expectations and intellectual property rights of unity engine and mellon foundation. it is therefore recommended that digital scholars consider various data types and licensing options before exclusively applying irrevocable cc licenses to their creative works. 
moreover, if the creator of romelab wants to produce a virtual introduction of the 3d world of the project, he should take into consideration of the limitation of cc licenses before disseminating his is creative commons a panacea for managing digital humanities ip rights? | ding 39 https://doi.org/10.6017/ital.v38i3.10714 work via platforms such as youtube. in 2014, a user found out that somebody took his drone video of burning man 2013 and reposted it in its entirety to youtube under the inaccurate and misleading title “drone’s eye view of burning man 2014,” which earned a large number of views and advertising.33 when everyone was looking for the newest drone video of burning man in 2014, the video posted by this other person received millions of views, which earned them money from youtube advertising. the reason the user cannot sue this other person is that he originally licensed his video under cc by license, which allows commercial use, and which unfortunately is youtube’s only cc license option.34 had the original videographer better understood the irrevocability of cc licenses, he might have chosen a different platform to disseminate his video or at least utilized other ways to protect his copyright. scholars would not want this kind of abuse of their original works and thus should be more cautious of the irrevocability of cc licenses. furthermore, youtube and many other platforms that digital humanities scholars use to disseminate their research, scholarly, and creative work fail to provide effective functionalities and incentives to fulfill cc’s attribution requirement.35 cc by license stipulates, “if supplied, you must provide the name of the creator and attribution parties, a copyright notice, a license notice, a disclaimer notice, and a link to the material.”36 to find this piece of information on youtube, however, someone must go to a video’s landing page and first click the “show more” text in the description below the video. although it is clear to see the cc attribution license with link displayed, someone must click a “view attributions” link to discover the original author’s credit and source video link. the difficulty of going through different steps may impede an average youtube user or most potential licensees of a cc-licensed digital scholarly work to learn the original creator of any content and if what they are viewing was partially or wholly created by someone else.37 since cc licenses only provide licensees with a very general requirement to attribute, licensees are allowed to attribute “in any reasonable manner.”38 with the only limitation to be “not in any way that suggests the licensor endorses you or your use,” licensees are not incentivized to accurately attribute to the scholar of the original work and thus to help disseminate his or her work crediting the copyright owner.39 while users can search for registered works on the official website of united states copyright office, there is no way to conduct a comprehensive search for works under cc licenses. creative commons does not maintain a database of works distributed under cc licenses. although there are search engines and websites for works under cc licenses, there is no way to conduct an exhaustive search.40 this can create hurdles for future licensees of a derivative work to accurately and clearly attribute the original work. one of the most important motivations of scholars to distribute their works under cc licenses is to get gain more exposure. 
due to all these above limitations and others to be discussed in this paper, scholars should be more cautious of the irrevocability of cc licenses and its lack of enforcement and support system to help licensors accurately attribute the original work. is creative commons really clear?—the ambiguity of noncommercial and nonderivative licenses noncommercial license in the legal code of a cc attribution-noncommercial-sharealike license, noncommercial is defined as “not primarily intended for or directed towards commercial advantage or monetary compensation. for purposes of this public license, the exchange of the licensed material for other material subject to copyright and similar rights by digital file-sharing or similar means is noncommercial provided there is no payment of monetary compensation in connection with the https://www.youtube.com/watch?v=m2thtb6iffa https://www.youtube.com/watch?v=m2thtb6iffa https://www.youtube.com/watch?v=z9jtiouk_6o https://creativecommons.org/licenses/by/2.0/ information technology and libraries | september 2019 40 exchange.”41 this seemingly clear statement can create some confusion and problems in the real world. while a commercial use weighs against fair use, copyright law does not rule infringement solely on a use being commercial. in fact, it is hard to determine a use as totally noncommercial. in the case of princeton university press v. michigan document services, inc., michigan document services (mds) being a commercial copy shop weighs against a finding of fair use, but mds’s use being commercial is only one of the four factors in a fair-use analysis. in this case, the court held that mds’s commercial exploitation of the copyrighted works from princeton university press did not constitute fair use although the courts clarified the educational use was “noncommercial in nature.”42 there have been a number of cases in us copyright law where commercial uses have been ruled lawful fair use. by making commercial use a decisive factor to determine an illegal use, creative commons fails to specify real cases of commercial uses and thus oversimplifies the complicated copyright issues involving commercial uses that scholars should be aware of. more specifically, many digital scholars nowadays post their articles and projects with noncommercially cc licensed images on a website, the maintenance of which is seldom free. similar to the case of princeton university press v. michigan document services, inc., the educational or scholarly use of those noncommercially licensed images should be considered “noncommercial in nature.”43 however, if a digital humanist maintains a website that is subsidized partly by google ads or a company, the nature of the use of those noncommercially licensed images might be called into question as in the case of princeton university press v. michigan document services, inc. although in both situations, the image is not “primarily intended for or directed towards commercial advantage or monetary compensation,” the digital humanist may still increase the traffic of his site and thus profit from including those images on his site. 44 the “different viewpoints and colliding interests” among commercial publishers, librarians, scholars, university administrators, and others may further complicate the already “ambiguous commercial nature of use” in fair use analysis that creative commons oversimplifies.45 the more recent case of great minds v. fedex office & print services, inc. 
demonstrates this ambiguity of commercial use and one use of cc noncommercial license that is legal yet unexpected and unwanted by copyright owners. to specify, great minds argued that fedex should compensate it for the money the company made from copying materials that great minds distributed under a cc attribution-noncommercial-sharealike 4.0 license. in an amicus brief to support fedex office, creative commons held that “entities using cc-licensed works must be free to act as entities do—including through employees and the contractors they engage in their service” and otherwise “the value of the license would be significantly diminished.”46 creative commons demonstrated its interpretation of a commercial use to be different from the ruling in the case princeton university press v. michigan document services, in which the judge explicitly ruled the use to be commercial because the copyright complaint was performed on “a profitmaking basis by a commercial enterprise” and clearly forbade the contract between this enterprise and a nonprofit organization to copy and distribute copyrighted content.47 in contrast, in the case of great minds v. fedex office & print services, inc., the court held that great minds ’ nonexclusive public license, i.e. cc attribution-noncommercial-sharealike 4.0 international public license, “unambiguously permitted school districts to engage fedex, for a fee, to reproduce” the copyrighted content.48 scholars should therefore be wary of the complicated process and “several areas of uncertainty” surrounding creative commons, which can be easily is creative commons a panacea for managing digital humanities ip rights? | ding 41 https://doi.org/10.6017/ital.v38i3.10714 overlooked when applying the “simple and easy” cc licenses.49 none of the interpretations of noncommercial uses by creative commons are specified in the generic license deed. compared to more customized licenses that usually involve direct interactions between the licensor and the licensee, the free-of-charge license, cc licenses, has a long way to go to protect both licensors and licensees from infringements and financial loss. a study of noncommercial uses conducted by creative commons indicates that noncommercial licenses account for “approximately two-thirds of all creative commons licenses associated with works available on the internet.”50 kim confirmed this popularity of cc noncommercial licenses that “over 60 percent flickr users prohibit commercial use or derivative work.”51 as kim elaborated and as the previous section in this paper on the irrevocability of cc licenses showcases, either commercial or noncommercial cc licenses are “likely to be detrimental to potential professional careers” of copyright owners.52 nevertheless, as stated by creative commons, they do not offer legal advice. 53 when providing copyright education, academic librarians should therefore remind digital scholars to be careful in using both commercial and noncommercial content and making their own content available for noncommercial purposes. nonderivative license similarly, scholars should be reminded to have a critical view of nonderivative use of cc licenses. 
according to title 17 section 101 of the copyright act, a “derivative work” is a work based upon one or more preexisting works in which it may be recast, transformed, or adapted.54 however, creative commons used the phrase “adapted material” to define derivative work in the legal code for nonderivative uses.55 creative commons has a different understanding of derivative works from what is defined by the copyright act in musical works. “adapted material is always produced where the licensed material is synched in timed relation with a moving image.”56 this means that while using an original soundtrack in a video is not derivative work according to the copyright act, videos that use an nd-licensed song violate the terms of the cc license. similar to the difference of revocability and commercial use between creative commons and copyright act as discussed earlier in this article, this different understanding of derivative work should be made aware to scholars. specifically, when providing copyright education to scholars, academic librarians should make it clear that nonderivative license cannot alienate the fair use rights of users and that a noncommercial nonderivative license does not prevent companies from using a work in a parody.57 some licensors of cc licenses may not share creative commons’ vision of an open, sharing culture as suggested by the prevalence of nd licenses.58 therefore, instead of providing generic recommendation on using cc licenses, academic librarians should “balance the interests of information users and rights holders” by providing a more sophisticated and critical perspective when educating the scholarly community about the nonderivative cc licenses.59 is creative commons really sustainable—the dilemma of sharealike and open access incompatible sharealike licenses for many digital scholars, the sharealike term in cc licenses is intended to distribute their works more broadly and openly since a licensee is required by creative commons to “distribute . . . [their contributions] . . . under the same license as the original.”60 nevertheless, some incompatibility issues arise to prevent a more open distribution of works. for example, since the creative commons system offers two different sharealike licenses, a scholar cannot create a new derivative work combining two sharealike works with different terms of their respective licenses. http://www4.law.cornell.edu/uscode/17/101.html information technology and libraries | september 2019 42 it is the open and accessible nature of cc-licensed works that makes them ideal for scholars including digital humanists to collaborate on projects but ironically, the sharealike function can create the risk of “an intractable thicket” if incompatibilities between those licenses hinder future collaboration.61 creative commons does provide a series of compatible licenses, but only same licenses with differences in cc versions are considered compatible.62 against open access? 
these incompatibilities between certain cc licenses have been pointed out by copyright experts to limit "the future production and distribution of creative works" and even to be "anti-public domain."63 in 2009, the cofounder and ceo of creative commons, lawrence lessig, pointed out the perils of openness in government in his article "against transparency."64 echoing his argument that "whether and how new information is used to further public objectives depends upon its incorporation into complex chains of comprehension, action, and response," this paper advocates for a critical perspective on cc licenses in digital scholarship. apart from all the limitations of cc licenses discussed already, a more unsettling misuse of cc licenses is a failure to recognize other rights in a work beyond copyright. in 2011, the image of an underage girl, which was placed on flickr under a cc license, was used in an advertising campaign for mobile phone services.65 although, after the lawsuit, the creative commons ceo added a term to the legal deed of the latest version (4.0) of every cc license to explicitly state that other rights such as publicity, privacy, or moral rights may limit how the material can be used, the case reveals the perils of openness.66 when providing copyright education, academic librarians should not only warn digital scholars of this limitation of cc licenses but also encourage them to include a statement of intellectual property rights, including privacy and other rights, on their digital scholarship websites to reduce abuses and innocent infringements.

conclusion

even though cc licenses are helpful for digital humanities scholars seeking more exposure, these licenses are still being improved. creative commons pledged to the community to "clarify how the nc limitation works in the practical world."67 yet, when providing copyright consultation or partnering with digital humanities scholars, academic librarians should warn these scholars, as both licensors and licensees, about the sophisticated implications not only of the noncommercial license but also of other characteristics and limitations of cc licenses. academic librarians should introduce digital scholars to a more critical view of cc licenses by collaborating with different campus stakeholders.68 while it is recommended that academic librarians suggest digital scholars place their creative works under a noncommercial license, academic librarians should also educate them about the ambiguous definitions of commercial use as well as the possibility of commercial parody and other fair use situations. it is also recommended that academic librarians provide digital humanists with guidance on how to create intellectual property statements on their websites, which should include not only copyright but also privacy and other intellectual property rights. currently, a number of university libraries and nonprofit organizations, ranging from duke university library (http://library.duke.edu/) to the library of congress (https://www.flickr.com/photos/library_of_congress) and wikipedia (https://en.wikipedia.org/wiki/main_page), use cc licenses for their entire site.69 as cc license users, academic librarians should also be extremely careful when using cc-licensed pictures or music on the library's website.
the safest way is to use only those that are in the public domain or that are acquired by the library. despite the use of free and simple cc licenses, academic libraries are recommended to include terms-of-use and privacy sections on their websites to provide more detailed explanations of the function of cc licenses and intellectual property rights in general. the alignment between the visions of creative commons, digital humanities, and "higher education as a cultural and knowledge commons" puts academic librarians in a unique position to provide copyright education in the digital humanities field.70 because of all the limitations of cc licenses, academic librarians should go beyond a simple endorsement of cc licenses and offer a more sophisticated and critical perspective when educating the scholarly community about cc licenses.

notes

1 amanda hornby and leslie bussert, "digital scholarship and scholarly communication," university of washington libraries, accessed november 30, 2016, https://www.uwb.edu/getattachment/tlc/faculty/teachingresources/newmedia.

2 oya y. rieger, "framing digital humanities: the role of new media in humanities scholarship," first monday 15, no. 10 (october 11, 2010), http://firstmonday.org/ojs/index.php/fm/article/view/3198.

3 elizabeth joan kelly, "rights instruction for undergraduate students: needs, trends, and resources," college & undergraduate libraries 25, no. 1 (2018): 1-16, https://doi.org/10.1080/10691316.2016.1275910.

4 daniel hickey, "the reuse evangelist: taking ownership of copyright questions at your library," reference & user services quarterly 51, no. 1 (2011): 9-11; "research guides: image resources: creative commons images," ucla library, accessed april 28, 2019, https://guides.library.ucla.edu/c.php?g=180361&p=1185834; "finding public domain & creative commons media: images," harvard library research guides, accessed april 28, 2019, https://guides.library.harvard.edu/c.php?g=310751&p=2072816. ucla and harvard are two good examples.

5 lewin-lane et al., "the search for a service model of copyright best practices in academic libraries," journal of copyright in education and librarianship 2, no. 2 (2018): 1-24. this article, for example, does not discuss any limitation of cc licenses when conducting a literature review of copyright education in academic libraries to search for best practices.

6 zachary katz, "pitfalls of open licensing: an analysis of creative commons licensing," idea: the intellectual property law review 46, no. 3 (2006): 391-413.

7 eric e. johnson, "rethinking sharing licenses for entertainment media," cardozo arts & entertainment law journal 26, no. 2 (2008): 391-440.
8 aurelija lukoseviciene, "beyond the creative commons framework of production and dissemination of knowledge," http://dx.doi.org/10.2139/ssrn.1973967.

9 mashael khayyat and frank bannister, "open data licensing: more than meets the eye," information polity: the international journal of government & democracy in the information age 20, no. 4: 231–52, https://doi.org/10.3233/ip-150357.

10 herkko hietanen, "the pursuit of efficient copyright licensing: how some rights reserved attempts to solve the problems of all rights reserved," lappeenranta university of technology, 2008.

11 christa engel pletcher burger, "are publicity rights gone in a flash?: flickr, creative commons, and the commercial use of personal photographs," florida state business review 8 (2009): 129, https://ssrn.com/abstract=1476347.

12 robert w. gomulkiewicz, "open source license proliferation: helpful diversity or hopeless confusion?," washington university journal of law & policy 30 (2009): 261, expanded academic asap, accessed april 28, 2019, http://link.galegroup.com.libproxy.csun.edu/apps/doc/a208273638/eaim?u=csunorthridge&sid=eaim&xid=4bbf2442.

13 jacob h. rooksby, "a fresh look at copyright on campus," missouri law review (summer 2016): 769, general onefile, accessed april 27, 2019, http://link.galegroup.com.libproxy.csun.edu/apps/doc/a485538679/itof?u=csunorthridge&sid=itof&xid=1f2822f3.

14 "escholarship: copyright & legal agreements," accessed december 1, 2016, http://escholarship.org/help_copyright.html#creative.

15 "directory of open access journals," doaj, accessed december 1, 2016, https://doaj.org.

16 "frequently asked questions—creative commons," accessed december 7, 2016, https://creativecommons.org/faq/#do-i-need-to-register-with-creative-commons-before-i-obtain-a-license.

17 "copyright in general," u.s. copyright office, accessed july 30, 2019, https://www.copyright.gov/help/faq/faq-general.html.

18 "why should i register my work if copyright protection is automatic?," copyright alliance, accessed july 28, 2019, https://copyrightalliance.org/ca_faq_post/copyright-protection-ata/.

19 "copyright basics," u.s. copyright office and library of congress, accessed november 30, 2016, https://www.copyright.gov/circs/circ01.pdf#page=7.

20 phil clapham, "are creative commons licenses overly permissive?
the case of a predatory publisher," bioscience (2018): 842-43, accessed april 20, 2019, https://doi.org/10.1093/biosci/biy098; cornelius puschmann and marco bastos, "how digital are the digital humanities? an analysis of two scholarly blogging platforms," plos one 10, no. 2 (2015), accessed april 20, 2019, https://doi.org/10.1371/journal.pone.0115035.

21 "why your blog images are a ticking time bomb," koozai.com, accessed december 2, 2016, https://www.koozai.com/blog/content-marketing-seo/blog-sued-for-images/.

22 john w. white and heather gilbert, eds., laying the foundation: digital humanities in academic libraries (west lafayette: purdue university press, 2016), proquest ebook central.

23 "considerations for licensors and licensees—creative commons," accessed december 7, 2016, https://wiki.creativecommons.org/wiki/considerations_for_licensors_and_licensees.

24 "the terms 'revocable' and 'irrevocable' in license agreements: tips and pitfalls," accessed december 7, 2016, http://www.sidley.com/news/the-terms-revocable-and-irrevocable-in-license-agreements-tips-and-pitfalls-02-21-2013.

25 mark seeley and lois wasoff, "legal aspects and copyright," in academic and professional publishing, ed. robert campbell, ed pentz, and ian borthwick (cambridge, uk: elsevier, 2012), 355-83.

26 douglas macmillan, "fight over yahoo's use of flickr photos," wall street journal, november 25, 2014, sec. tech, http://www.wsj.com/articles/fight-over-flickrs-use-of-photos-1416875564.

27 "flickr apologizes but what about cc abuses by others?," accessed december 7, 2016, http://www.artists-bill-of-rights.org/news/campaign-news/flickr-apologizes-but-what-about-cc-abuses-by-others?/.

28 "the terms 'revocable' and 'irrevocable' in license agreements: tips and pitfalls," accessed december 7, 2016, http://www.sidley.com/news/the-terms-revocable-and-irrevocable-in-license-agreements-tips-and-pitfalls-02-21-2013.

29 "legal code—creative commons," accessed december 7, 2016, https://wiki.creativecommons.org/wiki/legal_code.

30 "why cc-by?—oaspa," accessed december 7, 2016, http://oaspa.org/why-cc-by/.

31 "why cc-by?—oaspa."

32 "intellectual property policy," the andrew w. mellon foundation, accessed july 28, 2019, https://mellon.org/grants/grantmaking-policies-and-guidelines/grantmaking-policies/intellectual-property-policy/.
33 "why i'm giving up on creative commons on youtube," eddie.com, september 6, 2014, http://eddie.com/2014/09/05/why-im-giving-up-on-creative-commons-on-youtube/.

34 "creative commons—attribution 4.0 international—cc by 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by/4.0/.

35 "why i'm giving up on creative commons on youtube."

36 "creative commons—attribution 4.0 international—cc by 4.0."

37 "why i'm giving up on creative commons on youtube."

38 "creative commons—attribution 4.0 international—cc by 4.0."

39 ibid.

40 "cc search," accessed december 7, 2016, https://search.creativecommons.org/.

41 "creative commons—attribution-noncommercial-sharealike 4.0 international—cc by-nc-sa 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.

42 "u.s. copyright office fair use index," u.s. copyright office, accessed april 21, 2019, https://www.copyright.gov/fair-use/.

43 ibid.

44 ibid.

45 jerry d. campbell, "intellectual property in a networked world: balancing fair use and commercial interests," library acquisitions: practice and theory 19, no. 2 (1995): 179-84, https://doi.org/10.1016/0364-6408(95)00020-a; igor slabykh, "ambiguous commercial nature of use in fair use analysis," aipla quarterly journal 46, no. 3 (2018): 293-339.

46 "defending noncommercial uses: great minds v fedex office," creative commons, august 30, 2016, https://creativecommons.org/2016/08/30/defending-noncommercial-uses-great-minds-v-fedex-office/.

47 "princeton university press v. michigan document services," bitlaw, accessed december 7, 2016, http://www.bitlaw.com/source/cases/copyright/pup.html#iiia.

48 justia, "great minds v. fedex office & print services, inc.," stanford copyright and fair use center, march 21, 2018, https://fairuse.stanford.edu/case/great-minds-v-fedex-office-print-services-inc/.

49 minjeong kim, "the creative commons and copyright protection in the digital era: uses of creative commons licenses," journal of computer-mediated communication 13, no.
1 (2007): 187-209, https://doi.org/10.1111/j.1083-6101.2007.00392.x; "directory of open access journals," doaj, accessed december 1, 2016, https://doaj.org.

50 "feature: creative commons: copyright tools for the 21st century," accessed december 7, 2016, http://www.infotoday.com/online/jan10/gordon-murnane.shtml.

51 "the creative commons and copyright protection in the digital era: uses of creative commons licenses."

52 ibid.

53 "creative commons—attribution-sharealike 4.0 international—cc by-sa 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by-sa/4.0/legalcode#s6a.

54 "17 u.s. code § 101—definitions," legal information institute, accessed april 20, 2019, https://www.law.cornell.edu/uscode/text/17/101.

55 "creative commons—attribution-noncommercial-noderivatives 4.0 international—cc by-nc-nd 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.

56 "creative commons—attribution-noncommercial-noderivatives 4.0 international—cc by-nc-nd 4.0."

57 the famous campbell v. acuff-rose music case established that a commercial parody could qualify as fair use.

58 katz, "pitfalls of open licensing," 411.

59 "professional ethics," tools, publications & resources, american library association, february 6, 2019, http://www.ala.org/tools/ethics.

60 "creative commons—attribution-sharealike 4.0 international—cc by-sa 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by-sa/4.0/.

61 molly houweling, "the new servitudes," georgetown law journal 96, no. 3 (2008): 885-950.

62 "compatible licenses," creative commons, accessed december 7, 2016, https://creativecommons.org/share-your-work/licensing-considerations/compatible-licenses/.
63 katz, "pitfalls of open licensing," 391; susan corbett, "creative commons licences, the copyright regime and the online community: is there a fatal disconnect?," the modern law review 74, no. 4 (2011): 506, http://www.jstor.org/stable/20869091.

64 lawrence lessig, "against transparency," new republic, october 8, 2009, https://newrepublic.com/article/70097/against-transparency.

65 "creative commons ceo apologizes to virgin mobile—stock photography news, analysis and opinion," accessed december 7, 2016, https://www.selling-stock.com/article/creative-commons-ceo-apologizes-to-virgin-mob.

66 "frequently asked questions," creative commons, accessed july 30, 2019, https://creativecommons.org/faq/#how-are-publicity-privacy-and-personality-rights-affected-when-i-apply-a-cc-license.

67 "defending noncommercial uses: great minds v fedex office," creative commons, august 30, 2016, https://creativecommons.org/2016/08/30/defending-noncommercial-uses-great-minds-v-fedex-office/.

68 andrea malone et al., "center stage: performing a needs assessment of campus research centers and institutes," journal of library administration 57, no. 4 (2017): 406–19, https://doi.org/10.1080/01930826.2017.1300451.

69 laura gordon-murnane, "feature: creative commons: copyright tools for the 21st century," information today, accessed december 7, 2016, http://www.infotoday.com/online/jan10/gordon-murnane.shtml.

70 ibid.

letter from the editor

reviewers wanted

kenneth j. varnum
varnum information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13xxx together with one of the other journals published by ala's core division, information technology and libraries (ital) and library leadership and management (ll&m) invite applications for peer reviewers. serving as a reviewer is a great opportunity for individuals from all types of libraries and with a wide variety of experience to contribute to scholarship within our chosen profession. we are seeking the broadest pool of reviewers possible. reviewers for both journals are expected to have an interest in or experience with the journal's topics, as described below. reviewers should expect to review 2-4 articles a year and should provide thoughtful and actionable comments to authors and the editor. reviewers will work with the editor, associate editor, and/or editorial board of the corresponding journal. see the job description for ital reviewers for more details about this new role. we welcome applications from individuals at libraries of all types, levels of experience, locations, perspectives, and voices, especially those from underrepresented groups. reviewers will be selected to maximize the diversity of representation across these areas, so if you're not sure whether you should apply, please do! increasing the pool of reviewers for information technology and libraries is part of the editorial board's desire to provide equitable treatment to submitted articles and will enable us to follow a more typical process for peer-reviewed journals: a two-reviewer, double-blind process. that will be a welcome and, frankly, overdue change to ital's current process, in which submitted articles are typically reviewed by one person. expanding the number of reviewers across the breadth of subject areas our journal covers will foster a more rigorous yet more open review process. should you be more interested in the policy side of this journal, please watch for a call for volunteers for the ital editorial board. that process will start in april. * * * * * * * as this issue of the journal goes online, covid as a global health crisis has just entered its second year. i'm constantly reminded of the duality of our collective ability to show resilience and exhibit fragility as we continue to endure this period. when i wrote the letter from the editor a year ago, i focused on the imminent vote to establish a new ala division, core, as the most important question facing me. how quickly things changed! by the time the march 2020 issue was published, everything was different. wherever you are, however you have adapted to the situation, i hope you are well and, like me, are turning from wondering when this period will end, to wondering what "normal" will be in the post-pandemic world. kenneth j. varnum, editor varnum@umich.edu march 2021
letter from the core president leadership, infrastructure, futures christopher cronin information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.13027 christopher cronin (cjc2260@columbia.edu) is core president and associate university librarian for collections, columbia university. © 2020. i am so pleased to be able to welcome all ital subscribers to core: leadership, infrastructure, futures! this issue marks the first of ital since the election of core’s inaugural leadership. a merger of what was formerly three separate ala divisions—the association of library collections & technical services (alcts), library & information technology association (lita), and the library leadership & management association (llama)—core is an experiment of sorts. it is, in fact, multiple experiments in unification, in collaboration, in compromise, in survival. while initially born out of a sheer fight or flight response to financial imperatives and the need for organizational effectiveness, developing core as a concept and as a model for an enduring professional association very quickly became the real motivation for those of us deeply embedded in its planning. core is very deliberately not an all-caps acronym representing a single subset of practitioner within the library profession. it is instead an assertion of our collective position at the center of our profession. it is a place where all those working in libraries, archives, museums, historical societies—information and cultural heritage broadly—will find reward and value in membership and a professional home. all organizations need effective leaders, strong infrastructure, and a vision for the future. and that is what core strives to build with and for its members. while i welcome ital’s readers into core, i also welcome core’s membership into ital. no longer publications of their former divisions, all three journals within core have an opportunity to reconsider their mandates. as with all things, audience matters. ital’s readership has now expanded dramatically, and those new readers must be invited into ital’s world just as much as ital has been invited into theirs. as we embark on this first year of the new division, we do so with a sense of altogether newness more than of a mere refresh, and a sense of still becoming more than a sense of having always been. and who doesn’t want to reinvent themselves every once in a while? start over. move away from the bits that aren’t working so well, prop up those other bits that we know deserve more, and venture into some previously uncharted territory. how will being part of this effort, and of an expanded division, reframe ital’s mandate? the importance of information technology has never been more apparent. it is not lost on me that we do this work in core during a year of unprecedented tumult. in 2020, a murderous global pandemic was met with unrelenting political strife, pervasive distribution of misinformation and untruths, devastating weather disasters, record-setting unemployment, heightened attention on an array of omnipresent social justice issues, and a racial reckoning that demands we look both inward and outward for real change. individually and collectively, we grieve so many losses —loss of life, loss of income, loss of savings, loss of homes, loss of dignity, loss of certainty, loss of control, loss of physical contact. and throughout all of these challenges, what have we relied on more this year than technology? technology kept us productive and engaged. 
it provided a focal point for communication and connection. it provided venues for advocacy, expression, inspiration, and, as a counterpoint to that pervasive distribution of misinformation, it provided mechanisms to amplify the voices of the oppressed and marginalized. for some, but unfortunately not all, technology also kept us employed. and as the physical doors of our organizations closed, technology provided us with new ways to invite our users in, to continue to meet their information needs, and to exceed all of our expectations for what was possible even with closed physical doors. and yet our reliance on and celebration of technology in this moment has also placed another critical spotlight on the devastating impact of digital poverty on those who continue to lack access, and by extension also a spotlight on our privilege. in her parting words to you in the final issue of ital as a lita journal, evviva weinraub lajoie, the last president of lita, wrote: we may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, "what are you going to do to change this?" balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. i believe that together, we can make core stand up to that challenge. i believe we will do this, too, and with a spirit of reinvention that is guided by principles and values that don't just inspire membership but also improve our professional lives and experience in tangible ways. it was a privilege to have served as the final president of alcts and such a humbling and daunting responsibility to now transition into serving as core's first. it is a responsibility i do not take lightly, particularly in this moment when so much is demanded of us. as we strive for equity and inclusion, we do so knowing that we are only as strong as every member's ability to bring their whole selves to this work. we must work together to make our professional home everything we need it to be and to help those who need us. it is yours, it is theirs, it is ours. editorial board thoughts do space's virtual interview lab: using simple technology to serve the public in a time of crisis michael p. sauers information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.13461
michael sauers (msauers@travelinlibrarian.info) is technology manager, do space, and a member of the journal's editorial board. © 2021. as a result of the pandemic, we were closed to the public for three months in early 2020 and reopened with limited services to the public in june 2020. both while we were closed and since, we have implemented many changes to our services and programming, including limiting the number of available computers to support social distancing and moving our all-in-person educational programming to all-online programming, to name just two of the bigger changes. but what i'd like to talk about is one new service we implemented that was simple, easy to set up, and has had a significant impact on a number of our members: do space's virtual interview lab. when we reopened to the public with limited services in june 2020, one of the questions we asked ourselves was what new services we could provide in the circumstances and, under those same circumstances, what new services the public might need. we considered the reality of social distancing, and the fact that our meeting rooms could no longer be used for meetings with multiple members. then, although nebraska has historically had a low unemployment rate, we realized that covid had pushed many employers that had not yet moved to online interviews to do so. this was combined with the fact that covid only reinforced the already significant digital divide. someone needing to attend a job interview online could easily be lacking something as simple as a good-quality webcam or microphone, or not have the bandwidth available at home to successfully video conference. worst case, they may not have a computer at all. these are exactly the situations that do space was created for: to offer the public access to the hardware, software, and bandwidth that they need to become successful. with this in mind, we decided to turn our small conference room into a virtual interview lab. the room already had a good-sized table, excellent available wifi, generally good lighting, and plain white walls, perfect as a simple background. previous users of this room would generally use a laptop to connect to our wifi and a large screen in the room. instead, for this setup we added a small micro pc which we connected via an ethernet port to our gigabit fiber internet connection. to this pc we added a 27" monitor, 1080p webcam, a blue yeti microphone, and headphones. on the pc we installed every virtual meeting platform we could think of, including zoom, adobe connect, microsoft teams, gotowebinar/meeting, and more, placing direct shortcuts to all of the programs and online services right on the desktop for easy access. with our setup complete, we opened the lab for bookings starting july 1, 2020. use has been slow and steady, possibly due to our low unemployment rate, but the members that have used it are grateful for its existence. our marketing at first was just on our website and social media, but after a month or two we gathered a list of over 50 area groups and organizations that assist folks with finding work, mailed them a stash of postcards that they could hand out, and asked them to let us know if they needed any more. one group was so inspired by the project that in their thank-you they said that they'd be starting their own virtual interview lab at their location.
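the setup described above is essentially a small, repeatable configuration: a dedicated pc on a wired connection with the common conferencing clients installed and pinned to the desktop. as a minimal sketch of how staff could script a pre-booking readiness check against that kind of configuration (the folder path and platform list here are hypothetical, not do space's actual setup), something like the following would do:

from pathlib import Path

# hypothetical list of conferencing clients expected on the interview lab pc
EXPECTED_PLATFORMS = ["zoom", "microsoft teams", "adobe connect", "gotomeeting"]

# hypothetical location of the desktop shortcuts on the lab machine
DESKTOP = Path("C:/Users/InterviewLab/Desktop")

def missing_shortcuts(desktop, expected):
    """return the expected platforms that have no .lnk shortcut on the desktop."""
    present = {p.stem.lower() for p in desktop.glob("*.lnk")}
    return [name for name in expected if name.lower() not in present]

if __name__ == "__main__":
    missing = missing_shortcuts(DESKTOP, EXPECTED_PLATFORMS)
    if missing:
        print("reinstall or re-pin before the next booking:", ", ".join(missing))
    else:
        print("all expected meeting clients are pinned to the desktop.")

a quick scripted check along these lines fits naturally into the 30-minute setup window the lab asks members to book ahead of their session, described below.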
in the past year the lab hasn’t changed all that much with the exception of moving to a different room and a broadening of the use case. we quickly realized that members were wishing to use the room for events beyond job interviews. those using the lab have done so for attending ged and language classes, business meetings, attending do space’s own online programming, and even participating in our virtual tech mentoring program. have there been any problems? we’re dealing with technology here so of course the answer is yes, but luckily, they have been minor. for example, one person commented that our blue headphones didn’t look as “professional” as they would have liked. other times zoom needed a last-minute update which staff quickly addressed. (we encourage everyone to book the start of their session 30 minutes in advance to give us a chance to fix such issues.) otherwise, feedback has been overall very positive. here’s just a few examples: • “thank you! i actually used the room on short notice for several conference calls (plumbing disaster at my house!). it's not the intended use, but it was open and your team was kind enough to let me use it. i sincerely appreciate it. the room, by the way, has an excellent set up. wifi was lightning fast, lighting was perfect and i love that you have a microphone to focus the sound. oh, and that cute coat rack dressed up my background when i had to talk to a large client. it's fantastic that you offer this resource to the community. thanks again for letting me use it!” • “wanted to note a few things. i used this space for a research interview where i was a participant. i wasn't strictly using this space for a job interview. that said, it suited my needs perfectly. i was very happy to utilize this space. it was quiet, clean, and accommodated what i was looking for. customer service was also excellent. the service desk worker was able to promptly get me set up when i was already running a bit late for my interview. thank you for making this service available and also making it intuitive and easy to utilize. i will probably look to use it again in the future.” • “the room is ideal, quiet, no distractions, i was able to connect clearly using teams, no glitches, the volume was loud enough. was able to hear clearly and see interviewers faces clearly. staff at do space were available and prompt to assist before the interview when i needed set-up help adjusting the appearance / display of my head within the frame/screen.” • “you are a godsend! i am so grateful, especially in these times, that you are here. the staff are kind, patient and thoroughly knowledgeable. i love you.” information technology and libraries june 2021 do space’s virtual interview lab | sauers 3 this experience has reminded me that while all the advanced experimenting and complex coding we create to better assist our users is all well and good, sometimes just a simple computer, internet connection, and webcam can make all the difference in someone’s life. while some of the changes that we’ve made over the past year, such as moving all programming online, will be either ending or slowly transitioning to pre-pandemic states, our virtual interview lab is one new service that we will be definitely keeping for the foreseeable future. applying gamification to the library orientation: a study of interactive user experience and engagement preferences articles applying gamification to the library orientation a study of interactive user experience and engagement preferences karen nourse reed and a. 
miller information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.12209 karen nourse reed (karen.reed@mtsu.edu) is associate professor, middle tennessee state university. a. miller (a.miller@mtsu.edu) is associate professor, middle tennessee state university. © 2020. abstract by providing an overview of library services as well as the building layout, the library orientation can help newcomers make optimal use of the library. the benefits of this outreach can be curtailed, however, by the significant staffing required to offer in-person tours. one academic library overcame this issue by turning to user experience research and gamification to provide an individualized online library orientation for four specific user groups: undergraduate students, graduate students, faculty, and community members. the library surveyed 167 users to investigate preferences regarding orientation format, as well as likelihood of future library use as a result of the gamified orientation format. results demonstrated a preference for the gamified experience among undergraduate students as compared to other surveyed groups. introduction background newcomers to the academic campus can be a bit overwhelmed by their unfamiliar environment: there are faces to learn, services and processes to navigate, and an unexplored landscape of academic buildings to traverse. whether one is an incoming student or a recently hired employee of the university, all newcomers need to become quickly oriented to their surroundings to ensure productivity. in the midst of this transition, the academic library may or may not be on the list of immediate inquiries; however, the library is an important place to start. newcomers would be wise to familiarize themselves with the building and its services so that they can make optimal use of its offerings. two studies found that students who used the library received better grades and had higher retention rates.1 another study regarding university employees revealed that untenured faculty made less use of the library than tenured faculty, a problem attributed to lack of familiarity with the library.2 researchers have also found that faculty will often express interest in different library services without realizing that these services are in fact available.3 it is safe to say that libraries cannot always rely on newcomers to discover the physical and electronic services on their own; they need to be shown these items in order to mitigate the risk of unawareness. in consideration of these issues, the walker library at middle tennessee state university (mtsu) recognized that more could be done to welcome its new arrivals to campus. the public university enrolls approximately 21,000 students, the majority of whom are undergraduates. however, with a carnegie classification of doctoral/professional and over one hundred graduate degree programs, there was a strong need for specialized research among the university's graduate students and faculty. other groups needed to use the library too: non-faculty employees on campus as well as community users who frequently used walker library for its specialized and general collections. the authors realized that when new members of these different groups arrived on campus, few opportunities were available for acclimation to the library's services or building layout.
limited orientation experiences were conducted within library instruction classes, but these sessions primarily taught research skills and targeted freshman generaleducation classes as well as select upper-division and graduate classes. in short, it appeared that students, employees, and visitors to the university would largely have to discover the library’s services on their own through a search on the library website or an exploration of the physical library. it was very likely that, in doing so, the newcomers might miss out on valuable services and information. as mtsu librarians, the authors felt strongly that library orientations were important to everyone at the university so that they might make optimal use of the library’s offerings. the authors based this opinion on their knowledge of relevant scholarly literature as well as their own anecdotal experiences with students and faculty.4 the authors defined the library orientation differently from library instruction: in their view, an orientation should acquaint users with the services and physical spaces of the library, as compared to instruction that would teach users how to use the library’s electronic resources such as databases. the desired new approach would structure orientations in response to the different needs of the library’s users. for example, the authors found that undergraduates typically had distinct library interests compared to faculty. it was recognized that library orientations were time-consuming for everyone: library patrons at mtsu often did not want to take the time for a physical tour, nor did the library have the staffing to accommodate large-scale requests. the authors turned to the gamification trend, and specifically interactive storytelling, as a solution. interactive storytelling has previous applications in librarianship as a means of creating an immersive and self-guided user experience.5 however, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. to overcome this gap, the authors developed an online, interactive, game-like experience via storytelling software to orient four different groups of users to the library’s services. these groups were undergraduate students, graduate students, faculty members (which included both faculty and staff at the university), and community members (i.e., visitors to the university or alumni); see figure 1 for an illustration of each groups’ game avatars. these groups were invited to participate in the gamified experience called libgo (short for library game orientation). after playing libgo, participants gave feedback through an online survey. this paper will give a brief explanation of the creation of the game, as well as describe the results of research conducted to understand the impact of the gamified experience across the four user groups. information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 3 figure 1. libgo players were allowed to self-select their user group upon entering the game. each of the four user groups was assigned an avatar and followed a logic path specified for that group. literature review traditional orientation searches for literature on library orientation yield very broad and yet limited details about users of the traditional library orientation method. 
it is important to note that the terms “library tour” and “library orientation” can be somewhat vague, because this terminology is not interchangeable, yet is frequently treated as such in the literature.6 these terms are often included among library instruction materials which predominately influence undergraduate students.7 kylie bailin, benjamin jahre, and sarah morris define orientation as “any attempt to reduce library anxiety by introducing students to what a college/university library is, what it contains, and where to find information while also showing how helpful librarians can be.”8 their book is a culmination of case studies of academic library orientation in various forms worldwide where the common theme across most chapters is the need to assess, revise, and change library orientation models as needed, especially in response to feedback, staff demands, and the evolving trend of libraries and technology.9 furthermore, the majority of these studies are undergraduate-focused, and often freshman-focused, while only a few studies are geared towards graduate students. other traditional orientation problems discussed in the literature include students lacking intrinsic motivation to attend library orientation, library staff time required to execute the orientation, and lack of attendance.10 additionally, among librarians there seems to be consensus that the traditional library tours are the least effective means of orientation, yet they are the most highly used and with attention predominately focused on the undergraduate population alone. 11 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 4 in 1997, pixey anne mosely described the traditional guided library tour as ineffective, and documented the trend of libraries discontinuing it in favor of more active learning options.12 her study surveyed 44 students who took a redesigned library tour, all of whom were undergraduates (with freshmen as the target population). although mosely’s study only addressed one group of library users, it does attempt to answer a question on library perception whereby 93 percent of surveyed students indicated feeling more comfortable in using the library after the more active learning approach.13 a comparison study by marcus and beck looked at traditional vs treasure hunt orientations, and ultimately discovered that perception of the traditional method is limited by the selective user population and lack of effective measurements. they cited the need for continued study of alternative approaches to academic library orientation.14 a study by kenneth burhanna, tammy eschedor voelker, and julie gedeon looked at the traditional library tour from the physical and virtual perspective. confronted with a lack of access to the physical library, these researchers at kent state university decided to add an online option for the required traditional freshman library tour.15 their study compared the efficacy of learning and affective outcomes between face-to-face library tours and those of online library tours. of the 3,610 students who took the required library tour assignment, 3,567 chose the online tour method and 63 opted or were required to take the in-person, librarian-led tour. surveys were later sent to a random list of 250 students who did not take the in-person tour and the 63 students who did take the in-person tour. 
of the 46 usable responses all but one were undergraduates and 39 (85 percent) of them were freshman.16 this is a small sample size with a ratio of slightly greater than 2:1 for online versus in-person tour participation. although results showed that an instructor’s recommendation on format selection was the strongest influencing factor, convenience was also significant for those who selected the online option (81.5 percent). in contrast, only 18.5 percent of the students who took the face-toface tour rated it as convenient. the authors found that regardless of tour type, students were more comfortable using the library (85 percent) and more likely to use library resources (80 percent) after having taken a library tour. interestingly, students who took the online tour seemed slightly more likely to visit the physical library than those who took the in-person tour. ultimately the analysis of both tours showed this method of library orientation encourages library resource use, and the “online tour seems to perform as well, if not slightly better than the in-person tour.”17 gamification use in libraries an alternative format to the traditional method is gamification. gamification has become a familiar trend within academic libraries in recent years, and most often refers to the use of a technology based game delivery within an instructional setting. some users find gamified library instruction to be more enjoyable than traditional methods. for these people, gamification can potentially increase student engagement as well as retention of information.18 the goal of gamification is to create a simplified reality with a defined user experience. kyle felker and eric phetteplace emphasized the importance of user interaction over “specific mechanics or technologies” in thinking about the gamification design process.19 proponents of gamification of library instructional content indicate that it connects to the broader mission of library discovery and exploration as exemplified through collaboration and the stimulation of learning.20 additional benefits of gamification are its teaching, outreach and engagement functions.21 many researchers have documented specific applications of online gaming as a means of imparting library instruction. mary j. broussard and jessica urick oberlin described the work of librarians at lycoming college in developing an online game as one approach to teaching about information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 5 plagiarism.22 melissa mallon offered summaries of nine games produced for higher education, several of which were specifically created for use by academic libraries.23 many of these online library games reviewed used flash, or required players to download the game before playing. by contrast, j. long detailed an initiative at miami university to integrate gamification into the library instruction, a project which utilized twine.24 twine is an in-browser method and therefore avoids the problem of requiring users to download additional software prior to playing the game. other libraries have used online gamification specifically as a tool for library orientations. 
although researchers have demonstrated that the library orientation is an important practice in establishing positive first impressions of the library and counteracting library anxiety among new users, the differences between in-person versus online delivery formats are unclear.25 several successful instances have been documented in which the orientation was moved to an online game format. nancy o’hanlon, karen diaz, and fred roecker described a collaboration at ohio state university libraries between librarians and the office of first year experience; for this project, they created a game to orient all new students to the library prior to arrival on campus.26 the game was called “head hunt,” and was cited among those games listed in the article by mallon. 27 anna-lise smith and lesli baker reported the “get a clue” game at utah valley university which oriented new students over two semesters.28 another orientation game developed at california state university-fresno was noteworthy for its placement in the university’s learning management system (lms).29 in reviewing the literature regarding online library gamification efforts, there appear to be several best practices. several studies cite initial student assessment to understand student knowledge and/or perceptions of the content, followed by an iterative design process with a team of librarians and computer programmers.30 felker and phetteplace reinforced the need for this iterative process of prototyping, testing, deployment, and assessment as one key to success; however they also stated that the most prevalent reason for failure is that the games are not fun for users.31 librarians are information experts, and are not necessarily trained in fun game design. some libraries have solved this problem by partnering with or hiring professional designers; however for many under-resourced libraries, this is not an option.32 taking advantage of opensource tools, as well as the documented trial-and-error practices of others, can be helpful to newcomers who wish to break into new library engagement methods utilizing gamification. as literature has shown, a traditional library tour may have a place in the list of library services, but for whom and at what cost are questions with limited answers in studies done to date. gamification has offered an alternative perspective but with narrow accounts of its success in the online storytelling format and for users outside of the heavily focused freshman group. across all literature of library orientation studies, there is little reference to other library user populations such as faculty, staff, community users, distance students, or students not formally part of a class that requires library orientation. development of the library game orientation (libgo) libgo was developed by the authors with not only a consideration for the walker library user experience, but also with a specific attention to the differing needs of the multiple user groups served by the library. this user-focused concern led to exploring creative methodologies such as user experience research and human-centered design thinking, a process of overlapping phases that produces a creative and meaningful solution in a non-linear way. 
the three pillars of design information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 6 thinking are inspiration, ideation, and iteration.33 defining the problem and empathizing with the users (inspiration) led into the ideation phase, whereby the authors created lowand high-fidelity prototypes. the prototypes were tested and improved (iteration) through the use of beta testing in which playtesters interacted with the gamified orientation. the authors were novice developers of the gamified orientation, and this entailed a learning curve for not only the design thinking mindset but also the technical achievability. the development started with design thinking conversations and quickly turned to low-fidelity prototypes designed on paper. the development soon advanced to the actual coding so that the authors could get early designs tested before launching the final version. prior to deployment on the library’s website, libgo underwent a series of playtesting by library faculty, staff, and student employees. this testing was invaluable and led to such improvements as streamlining of processes and less ambiguity of text. libgo was developed with the twine open-source software (https://twinery.org), a product which is primarily used for telling interactive, non-linear stories with html. twine was an excellent application for this project as it allowed the creation of an online and interactive “choose your own adventure” styled library orientation game, in which users could explore the library based upon their selection of one of multiple available plot directions. with a modest learning curve and as an open source software, twine is highly accessible for those who are not accustomed to coding. for those who know html, css, javascript, variables, and conditional logic, twine’s capabilities can be extended. the library’s interactive orientation adventure requires users to select one of the four available personas: undergraduate student, graduate student, faculty, or community member. users subsequently follow that persona through a non-linear series of places, resources and points of interest built with the html output of using twee (twine’s programming language). see figure 2 for an example point of interest page and figure 3 for an example of a user’s final score after completing the gamified experience. once the twine story went through several iterations of design and testing, the html file was placed on the library’s website for the gamified orientation to be implemented with actual users. https://twinery.org/ information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 7 figure 2. this instructional page within libgo explains how to reserve different library spaces online. upon reading this content, the user will progress by clicking on one of the hypertext lines in blue font at the bottom. figure 3. based upon the displayed avatar, this libgo page is representative of a graduate student’s completion of libgo. the page indicates the player’s final score and gives additional options to return to the home page or complete the survey. information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 8 purpose of study libgo utilized the common "choose your own adventure" format whereby players progress through a storyline based upon their selection of one of multiple available plot directions. 
although the literature suggests that other technology-based methods are an engaging and instructive mode of content delivery, little prior research exists regarding this specific approach to library outreach. furthermore, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. the researchers wanted to understand the potential of interactive storytelling as a means to educate a range of users on library services as well as make the library more approachable from a user perspective. the study was designed to understand the user experience of each of the four groups. the researchers hoped to discern which users, if any, found the gamified experience to be a helpful method of orientation to the library’s physical and electronic services. another area of inquiry was to determine whether this might be an effective delivery method by which to target certain segments of the campus for outreach. finally, the study intended to determine whether this method of orientation might incline participants toward future use of the library. methodology overview the authors selected an embedded mixed methods design approach in which quantitative and qualitative data were collected concurrently through the same assessment instrument.34 the survey instrument primarily collected quantitative data, however a qualitative open-response question was embedded at the end of the survey: this question gathered additional data by which to answer the research questions. each data set (one quantitative and one qualitative) was analyzed separately for each participant group, and then the groups were compared to develop a richer understanding of participant behavior. research questions the data collection and subsequent analysis attempted to answer the following questions: 1. which group(s) of library users prefer to be oriented to library services and resources through the interactive storytelling format, as compared to other formats? 2. which group(s) of library users are more likely to use library services and resources after participating in the interactive storytelling format of orientation? 3. what are user impressions of libgo, and are there any differences in impression based on the characteristics of the unique user group? participants participants for the study were recruited in-person and via the library website. in-person recruitment entailed the distribution of flyers and use of signage to recruit participants to play libgo in a library computer lab during a one-day event. online recruitment lasted approximately ten weeks and simply involved the placement of a link to libgo on the home page of th e library’s website. a total of 167 responses were gathered through both methods and participants were distributed as shown in table 1. information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 9 table 1. composition of study’s participants group number affiliation number of responses 1 undergraduate students 55 2 graduate students 62 3 faculty 13 4 staff 28 5 community members 9 total 167 for the purposes of statistical data analysis, groups 3 and 4 were combined to produce a single group of 41 university employee respondents; also, group 5’s data was not included in the statistical analysis due to the low number of participants. qualitative data for all groups, however, was included in the non-statistical analysis. 
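before any statistics were run, the five affiliations in table 1 were reduced to three analysis groups: faculty and staff were combined into a single university-employee group, and community members were excluded from the quantitative tests. a short pandas sketch of that recoding step, using hypothetical column names and toy rows rather than the study's actual data:

import pandas as pd

# hypothetical responses; the real study captured affiliation in the survey's second part
responses = pd.DataFrame({
    "affiliation": ["undergraduate", "graduate", "faculty", "staff", "community member"],
    "q1": [7, 6, 5, 6, 8],
})

# combine faculty and staff into one university-employee analysis group
responses["analysis_group"] = responses["affiliation"].replace(
    {"faculty": "university employee", "staff": "university employee"})

# community members are kept for qualitative review but dropped from the statistical tests
quantitative = responses[responses["analysis_group"] != "community member"]
print(quantitative["analysis_group"].value_counts())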
survey instrument a survey with twelve total questions was developed for this study and was administered online through qualtrics. after playing libgo, participants were asked to voluntarily complete the survey; if they agreed, they were redirected to the survey’s website. before answering any survey questions, the instrument administered an informed consent statement to participants . all aspects of the research, including the survey instrument, were approved through the university’s institutional review board (protocol number 18-1293). the first part of the survey (see appendix a) consisted of ten questions, each with a ten-point likert scaled response. the first five questions were each designed to measure a preference construct, and the next five questions each measured a likelihood construct. the pref erence construct referred to participant’s preference for a library orientation: did they prefer libgo’s online interactive storytelling format, or did they prefer another format such as in-person talks? the likelihood construct referred to the participant’s self-perceived likelihood of more readily engaging with the library in the future (both in-person and online) after playing libgo. the second part of the survey gathered the participant’s self-reported affiliation (see table 1 for the list of possible group affiliations) as well as offered participants an open-ended response area for optional qualitative feedback. data collection the study’s data was collected in two stages. in stage one, libgo was unveiled to library visitors during a special campus-wide week of student programming events. on the library’s designated event day, the researchers held a drop-in event at one of the library’s computer labs (see figure 4 for an example of event advertisement). library visitors were offered a prize bag and snacks if they agreed to play libgo and complete the survey. during the three-hour-long drop-in session, 58 individual responses were collected: the vast majority of these came from undergraduate students (51 responses), with additional responses from graduate students (n = 4), university staff employees (n = 2), and one community member responding. community members were defined as anyone not currently directly affiliated with the university; this group may have included prospective students or alumni. stage 2 began the following day after the library drop-in event, and simply involved the placement of a link to libgo on the home page of the library’s website. any visitor to the library’s website could click on the advertisement to be taken to libgo. this link information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 10 remained active on the library website for ten weeks, at which point the final data was gathered. a total of 167 responses were gathered during both stages and participants were distributed as previously shown in table 1. figure 4. example of student libgo event advertisement results quantitative findings statistical analysis of each of the ten quantitative questions required the use of one-way anova in spss. a post hoc test (hochberg’s gt2) was run in each instance to account for the different sample sizes. for all statistical analysis, only the data from undergraduates, graduate students, and university employees (a group which combined both faculty and staff results) were utilized. a listing of mean comparisons by group, for each of the ten survey questions, may be found in table 2. 
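the study ran its omnibus tests as one-way anovas in spss with hochberg's gt2 post hoc comparisons. as a rough, open-source re-creation of that workflow (not the authors' actual code), the sketch below computes the two construct scores, runs scipy's one-way anova for each question across the three analysis groups, and substitutes tukey's hsd from statsmodels for the post hoc step, since hochberg's gt2 is not available in the common python packages; the file and column names are hypothetical.

import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# hypothetical data file: one row per respondent, columns analysis_group and q1..q10 (10-point scale)
df = pd.read_csv("libgo_survey.csv")

# construct scores: preference = mean of q1-q5, likelihood = mean of q6-q10
df["preference"] = df[[f"q{i}" for i in range(1, 6)]].mean(axis=1)
df["likelihood"] = df[[f"q{i}" for i in range(6, 11)]].mean(axis=1)

groups = ["undergraduate", "graduate", "university employee"]
for question in [f"q{i}" for i in range(1, 11)]:
    samples = [df.loc[df["analysis_group"] == g, question] for g in groups]
    f_stat, p_value = f_oneway(*samples)  # omnibus one-way anova across the three groups
    print(question, round(f_stat, 3), round(p_value, 3))

# post hoc pairwise comparisons for one question (tukey hsd standing in for hochberg's gt2)
print(pairwise_tukeyhsd(df["q2"], df["analysis_group"]))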
the analysis of the one-way anovas yielded statistically significant results for three of the ten individual questions in the first part of the survey: questions 2, 3, and 6 (see table 3). table 2. descriptive statistics for survey results (10-point scale, with 10 as most likely) survey question mean for undergraduate students mean for graduate students mean for university employees 1. in considering the different ways to learn about walker library, do you find this library orientation game to be more or less preferable as compared to other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)? 7.02 6.39 6.02 2. in your opinion, was the library orientation game a useful way to get introduced to the library’s services and resources? 8.13 6.94 7.12 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 11 3. if your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own?) 7.38 5.94 5.98 4. please indicate your level of agreement with the following statement: “as compared to playing the game, i would have preferred to learn about the library’s resources and services by my own exploration of the library website?” 6.11 6.50 5.88 5. please indicate your level of agreement with the following statement: “as compared to playing the game, i would have preferred to learn about the library’s resources and services through an inperson orientation tour.” 6.11 5.08 5.76 6. after playing this orientation game, are you more or less likely to visit walker library in person? 8.27 6.94 6.90 7. after playing this library orientation game, are you more or less likely to use the walker library website to find out about the library (such as hours of operation, where to go to get different materials/services, etc.)? 7.82 6.97 7.20 8. after playing this library orientation game, are you more or less likely to seek help from a librarian at walker library? 6.95 6.58 6.63 9. after playing this library orientation game, are you more or less likely to use the library’s online resources (such as databases, journals, e-books)? 7.67 7.15 6.90 10. after playing this library orientation game, are you more or less likely to attend a library workshop, training, or event? 6.96 6.73 6.24 table 3. overall statistically significant group differences df f p w2 question 2 2 3.714 .027 .03 question 3 2 4.508 .012 .04 question 6 2 7.178 .001 .07 question 2 asked “in your opinion, was the library orientation game a useful way to get introduced to the library’s services and resources?” the one-way anova found that there was a statistically significant difference between groups (f(2,155) = 3.714, p = .027, ω2 = .03). the post hoc comparison using the hochberg’s gt2 test revealed that undergraduates were statistically significantly more likely to prefer libgo in this manner (m = 8.13, sd = 1.94, p = .031) as information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 12 compared to the graduate students (m = 6.94, sd = 2.72). there was no statistically significant difference between undergraduates and the university employees (p = .145). 
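the ω2 values reported with these anovas are omega-squared effect sizes. one common formulation for a one-way design, given here for reference rather than quoted from the article, is

\omega^{2} = \frac{SS_{\text{between}} - df_{\text{between}}\, MS_{\text{within}}}{SS_{\text{total}} + MS_{\text{within}}}

commonly cited benchmarks (roughly .01 small, .06 medium, .14 large) are consistent with the small and medium labels applied in the text that follows.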
according to criteria suggested by roger kirk, the effect size of .03 indicates a small effect in perceived usefulness of libgo as an introduction among undergraduates.35 question 3 asked “if your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?” the one-way anova found that there was a statistically significant difference between groups (f(2, 155) = 4.508, p = .012, ω2 = .04). the post hoc comparison using the hochberg’s gt2 test found that undergraduates were statistically significantly more likely to prefer libgo over other orientation options (m = 7.38, sd = 2.49, p = .021) as compared to graduate students (m = 5.94, sd = 3.06). there was no statistically significant difference between undergraduates and university employees (p = .053). the effect size of .04 indicates a small effect regarding undergraduate preference for libgo versus other orientation options. question 6 asked “after playing this library orientation game, are you more or less likely to visit walker library in person?” the one-way anova found that there was a statistically significant difference between groups (f(2,155) = 7.178, p = .001, ω2 = .07). the post hoc comparison using the hochberg’s gt2 test revealed that undergraduates were statistically significantly more likely to visit the library after playing libgo (m = 8.27, sd = 2.09, p = .003) as compared to graduate students (m = 6.94, sd = 2.20). additionally, the test found that undergraduates were statistically significantly more likely to visit the library after playing libgo (p = .007) as compared to university employees (m = 6.90, sd = 2.08). according to criteria suggested by kirk, the effect size of .07 indicates a medium effect regarding undergraduate potential to visit the library in person after playing libgo. 36 in addition to testing each individual survey question, tests were run to understand the possible group differences by construct (preference and likelihood). the preference construct was an aggregate of survey questions 1-5, and the likelihood construct was an aggregate of survey questions 6-10. for both constructs, the one-way anova found results which were not statistically significant. in all, the quantitative findings indicated three areas by which the experience of playing libgo was more helpful for the surveyed undergraduates than the other surveyed groups (i.e., graduate students or university employees). at this point, the analysis turned to the qualitative data so as to better understand participant views of libgo. qualitative findings analysis of the qualitative results was limited to the data collected in the survey’s final question. question 12 was an open-response area, and was intentionally prefaced with a vague prompt: “do you have any final thoughts for the library (suggestions, additions, modification, comments, criticisms, praise, etc.)?” of the 167 total survey responses, 67 individuals chose to answer this question. preliminary analysis showed that the feedback derived from this question covered a spectrum of topics, ranging from remarks on the libgo experience itself to broader concerns regarding other library services. open coding strategies were utilized to interpret the content of participant responses. 
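once each open-ended response segment has been assigned a code and a respondent group, the cell counts reported later in tables 4 and 5 are simple tallies. a minimal sketch of that tally, with invented (group, code) pairs rather than the study's coded data:

from collections import Counter

# hypothetical (group, code) pairs produced by open coding of the open-ended responses
coded_concerns = [
    ("undergraduate", "positive feedback"),
    ("graduate", "libgo improvement tip"),
    ("graduate", "library building feedback"),
    ("staff", "negative feedback"),
]

# per-cell counts by group and code, as in table 4
cell_counts = Counter(coded_concerns)
for (group, code), n in sorted(cell_counts.items()):
    print(f"{group:15s} {code:30s} {n}")

# row totals per code across all groups
print(Counter(code for _, code in coded_concerns))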
under this methodology, the responses were evaluated for general themes and then coded and grouped information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 13 under a constant comparative approach.37 nvivo 12 software was used to code all 67 participant responses. initial coding yielded eight open codes, but these were later consolidated into six final codes (see table 4). one code (libgo improvement tip) was rather nuanced and yielded five axial codes (see table 5). axial codes denoted secondary concerns which fell under a larger category of interest. although some participants gave longer feedback which addressed multiple concerns, care was taken to segregate each distinct concern to a specific code. therefore, it is important to note that some comments addressed multiple concerns, and so the total number of concerns (n = 76) is greater than the total number of individuals responding to the prompt (n = 67). table 4. distribution of qualitative codes by user group code undergraduate graduate faculty staff community member total # concerns positive feedback 7 7 1 4 2 21 negative feedback 1 2 0 3 0 6 in-person tour preference 2 3 0 1 0 6 libgo improvement tip 5 11 1 3 3 23 library services feedback 2 4 3 0 0 9 library building feedback 1 7 1 2 0 11 total: 18 34 6 13 5 76 discussion of qualitative themes positive feedback (21 separate concerns). affirmative comments regarding libgo were primarily split between undergraduate and graduate students, with a small number of comments coming from the other groups. although all groups stated that the game was helpful, one undergraduate wrote “i wish i would’ve received this orientation at the very beginning of the year!” a graduate student declared “this was a creative way to engage students, and i think it should be included on the website for fun.” both community members commented on the utility of libgo in providing an orientation without having to physically come to the library; for example, “interactive without having to actually attend the library in person which i liked.” additionally, a community member pointed out the instructional capability of libgo, writing “i think i learned more from the game than walking around in the library.” negative feedback (6 separate concerns). unfavorable comments regarding libgo primarily challenged the orientation’s characterization as a “game” in terms of its lack of fun. one graduate student wrote a comment representative of this concern by stating, “the game didn’t really seem like a game at all.” a particularly searing comment came from a university staff member who information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 14 wrote, “calling this collection of web pages an ‘interactive game’ is a stretch, which is a generous way of stating it.” in-person tour preference (6 separate concerns). a small number of concerns indicated a preference for in-person orientations versus online. one undergraduate cited the ability to ask questions during an in-person tour as an advantage of that delivery medium. a graduate student mentioned their desire for kinesthetic learning over an online approach, writing, “i prefer hands on exploration of the library.” libgo improvement tip (23 separate concerns). suggested improvements to libgo were the largest area of qualitative feedback and produced five axial themes (subthemes); see table 5 for a breakdown of the five axial themes by group. 1. 
design issues were the largest cited area of improvement, and the most commonly mentioned design problem was the inability of the user to go back to previously seen content. although this functionality did in fact exist, it was apparently not intuitive to users; design modifications in future iterations are therefore critical. other users made suggestions as to the color scheme used and the ability to magnify image sizes. 2. user experience was another area of feedback, and primarily included suggestions on how to make libgo a more fun experience. one graduate student offered a role-playing game alternative. another graduate student expressed an interest in a game with side missions, in addition to the overall goals, where tokens could be earned for completed missions; the student justified these changes by stating “i feel that incorporating these types of idea will make the game more enjoyable.” in suggesting similar improvements, one undergraduate stated that libgo “felt more like a quiz than a game.” 3. technology issues primarily addressed two related issues: images not loading and broken links. images not loading could be dependent on many factors, including the user’s browser settings, internet traffic (volume) delaying load time, or broken image links, among others. broken links could be the root issue since the images used in libgo were taken from other areas of the library website. this method of gathering content pointed out a design vulnerability of using existing image locations (controlled by non-libgo developers) rather than images exclusively for libgo. 4. content issues were raised exclusively by graduate students. one student felt that libgo placed an emphasis on physical spaces in the library and did not give a deep enough treatment to library services. another graduate student asked for “an interactive map to click on so that we physically see the areas” of the library, thus making the interaction more user-friendly with a visual. 5. didn’t understand purpose is a subtheme where improvement is needed and is based on two comments made by the two university staff members. one wrote that “an online tour would have been better and just as informative,” although libgo was not only designed to be an online tour of the library, but also an orientation of the library’s services. the other staff member wrote, “i read the rules but it was still unclear what the objective was.” in all, it is clear that libgo’s purpose was confusing for some. information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 15 table 5. libgo improvement tip axial codes by user group axial code undergraduate graduate faculty staff community member total # concerns design 4 3 0 0 1 8 user experience 1 2 1 0 1 5 tech issue 0 1 0 1 0 2 content 0 5 0 0 1 6 didn’t understand purpose 0 0 0 2 0 2 total: 5 11 1 3 3 23 library services feedback (9 separate concerns). several participants took the opportunity to provide feedback on general library services rather than on libgo itself. undergraduates simply gave general positive feedback about the value of the library, but many graduate students gave recommendations regarding specific electronic resource improvements. additionally, one graduate student wrote, “i think it is critical to meet with new graduate students before they start their program,” something the library used to do but had not pursued in recent years. 
although these comments did not directly pertain to libgo, the authors accepted all of them as valuable feedback to the library. library building feedback (11 separate concerns). this was another theme in which graduate students dominated the comments. feedback ranging from requests for microwave use, additional study tables and better temperature control in the building appeared. several participants asked for greater enforcement of quiet zones. like the library services feedback, the authors again took these comments as helpful to the overall library rather than libgo. discussion the results of this study indicated that some groups of library visitors better received the gamified library orientation experience than other groups. undergraduate students indicated the largest appreciation for a library orientation via libgo. specifically, they demonstrated a statistically significant difference over the other groups in supporting libgo’s usefulness as an orientation tool, a preference for libgo over other orientation formats, and a likelihood of future use of the physical library after playing libgo. these very encouraging results provide evidence for the efficacy of alternative means of library orientation. the qualitative results provided additional helpful insight regarding the user impressions from each of the five surveyed groups. this feedback demonstrated that a variety of groups benefited from the experience of playing libgo, including some community members who appreciated libgo as a means of becoming acclimated to the library without having to enter the building. a virtual orientation format was not ideal for a few players who indicated a preference for a face-toface orientation due to the ability to ask questions. many people identified areas of improvement for libgo. graduate students in particular offered a disproportionate number of suggestions as compared to the other groups. while they provided a great deal of helpful feedback, it is possible that graduate students were so distracted by the perceived problems that they could not fully take in the experience or gain value from libgo’s orientation purpose. it is also very likely that libgo information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 16 simply was not very fun for these players: several players noted that it did not feel like a game but rather a collection of content. the review of literature indicated that this amusement issue is a common pitfall of educational games. although the authors tried to design an enjoyable orientation experience, it is possible that more work is needed to satisfy user expectations. the mixed-methods design of this study was instrumental in providing a richer understanding of user perceptions. while the statistical analysis of participant survey responses was very helpful in identifying clear trends between groups, the qualitative analysis helped the authors draw valuable conclusions. specifically, the open-response data demonstrated that additional groups such as graduate students and community members appreciated the experience of playing libgo; this information was not readily apparent through the statistical analysis. additionally, the qualitative analysis demonstrated that many groups had concerns regarding areas of improvement that may have impaired their user experience. these important findings could help guide future directions of the research. 
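as a side note on the tallying behind tables 4 and 5 above, the following sketch shows, in a simplified and hypothetical form, how coded concerns can be counted by code and by user group. it is not the authors' nvivo workflow, and the sample records are invented; it only illustrates why the concern total (n = 76) can exceed the respondent total (n = 67) when one response carries several coded concerns.

// illustrative sketch only: counting coded concerns by code and by group.
// each respondent can contribute more than one coded concern, so the sum
// of all cells exceeds the number of respondents.
const codedResponses = [
  { group: "undergraduate", codes: ["positive feedback", "libgo improvement tip"] },
  { group: "graduate", codes: ["library building feedback"] },
  // ... one entry per respondent (n = 67), with invented values here
];

const tally = {}; // tally[code][group] = number of concerns
let totalConcerns = 0;
for (const response of codedResponses) {
  for (const code of response.codes) {
    tally[code] = tally[code] || {};
    tally[code][response.group] = (tally[code][response.group] || 0) + 1;
    totalConcerns += 1;
  }
}

console.log(`${totalConcerns} concerns from ${codedResponses.length} respondents`);
console.table(tally);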
in all, the authors concluded this phase of the research feeling satisfied that libgo showed great promise for library orientation delivery but could benefit from continued development and future user assessment. although undergraduate students seemed most receptive overall to a virtual orientation experience, other groups appeared to have benefited from the resource. study limitations a primary limitation of this study was its small sample size. as the entire university campus was targeted for participation in the study, the number of respondents was far too small to generalize the results. despite this limitation however, the study’s population reflected many different groups of library patrons on campus. the findings are therefore valuable as a means of stimulating future discussion regarding the value of alternative library orientation methods utilizing gamification. another limitation is that the authors did not pre-assess the targeted groups for their prior knowledge of walker library services and building layout, nor for their interest in learning about these topics. it is possible that various groups did not see the value in learning about the library for a variety of reasons. faculty members, in particular, may have considered their prior knowledge adequate for navigating the electronic holdings or building layout without recognizing the value of the other many services offered physically and electronically by the library. all groups may have experienced a level of “library anxiety” that prevented them from being motivated to learn more about the library.38 it is difficult to understand the range of covariate factors without a pre-assessment. finally, there was qualitative evidence supporting the limitation that libgo did not properly convey its stated purpose of orientation rather than imparting research skills. without understanding libgo’s focus on library orientation, users could have been confused or disappointed by the experience. although care was taken to make this purpose explicit, some users indicated their confusion in the qualitative data. this observed problem points to a design flaw that undoubtedly had some bearing on the study’s results. conclusion & future research convinced of the importance of the library orientation, the authors sought to move this traditional in-person experience to a virtual one. the quantitative results indicated that the gamified information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 17 orientation experience was useful to undergraduate students in its intended purpose of acclimating users to the library, as well as encouraging their future use of the physical library. at a time in which physical traffic to the library has shown a marked decline, new outreach strategies should be considered.39 the results were also helpful in showing that this particular iteration of the gamified orientation was preferred over other delivery methods by undergraduate students, as compared to other groups, to a statistically significant level. this is an important finding as it demonstrates that a diversified outreach strategy is necessary: different groups of library patrons desire their orientation information in different formats. the next logical question to ask however is: why did the other groups examined through the statistical data analysis (graduate students and faculty) not appreciate the gamified orientation to the same level as undergraduates? 
the answers to this question are complicated and may be explained in part by the qualitative analysis. based upon those findings, it is possible that the game did not appeal to these groups on the basis of fun or enjoyment; this concern was specifically mentioned by graduate students. faculty members, including staff, provided a smaller level of qualitative feedback; it is therefore difficult to speculate as to their exact reasons for disengagement with libgo. with this concern in mind, the authors would like to concentrate their next iteration of research on the specific library orientation needs of graduate students and faculty. both groups present different, but critical, needs for outreach. graduate students were the largest group of survey respondents, presumably indicating a high level of interest in learning more about the library. many graduate programs at mtsu are delivered partially or entirely online; as a result, these students may be less likely to come to campus. due to graduate students’ relatively infrequent visits to campus, a virtual library orientation could be even more meaningful for them in meeting their need for library services information. faculty are another important group to target because if they lack a full understanding of the library’s offerings, they are unlikely to assign assignments that wholly utilize the library’s services. although it is possible that faculty prefer an in-person orientation, many new faculty have indicated limited availability for such events. a virtual orientation seems conducive to busy schedules. however, it is possible that the issue is simply a matter of marketing: faculty may not know that a virtual option is available, nor do they necessarily understand all that the library has to offer. in all, future research should begin with a survey to understand what both groups already know about the library, as well as the library services they desire. another necessary step in future research would be the expansion of the development team to include computer programmers. although the authors feel that libgo holds great promise as a virtual orientation tool, more needs to be done to enhance the user’s enjoyment of the experience. twine is a user-friendly software that other librarians could pick up without having to be computer programmers; however, programmers (professional or student) could bring a design expertise to the project. future iterations of this project should incorporate the skills of multiple groups, including expertise in libraries, user research, visual design, interaction design, programming, marketing, and testers from each type of intended audience. collectively, this group will have the greatest impact on improving the user experience and ultimately the usefulness of a gamified orientation experience. this experience with gamification, and specifically interactive storytelling, was a valuable experience for walker library. these results should encourage other libraries seeking an alternate information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 18 delivery method for orientations. the authors hope to build upon the lessons learned from this mixed methods research study of libgo to find the correct outreach medium for their range of library users. acknowledgments special thanks to our beta playtesters and student assistants who worked the libgo event, which was funded, in part, by mt engage and walker library at middle tennessee state university. 
information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 19 appendix a: survey instrument information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 20 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 21 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 22 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 23 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 24 endnotes 1 sandra calemme mccarthy, “at issue: exploring library usage by online learners with student success,” community college enterprise 23, no. 2 (january 2017): 27–31; angie thorpe et al., “the impact of the academic library on student success: connecting the dots,” portal: libraries and the academy 16, no. 2 (2016): 373–92, https://doi.org/10.1353/pla.20160027. 2 steven ovadia, “how does tenure status impact library usage: a study of laguardia community college,” journal of academic librarianship 35, no. 4 (january 2009): 332–40, https://doi.org/10.1016/j.acalib.2009.04.022. 3 chris leeder and steven lonn, “faculty usage of library tools in a learning management system,” college & research libraries, 75, no. 5 (september 2014): 641–63, https://doi.org/10.5860/crl.75.5.641. 4 kyle felker and eric phetteplace, “gamification in libraries: the state of the art,” reference and user services quarterly 54, no. 2 (2014): 19-23, https://doi.org/10.5860/rusq.54n2.19; nancy o’hanlon, karen diaz, and fred roecker, “a game-based multimedia approach to library orientation,” (paper, 35th national loex library instruction conference, san diego, may 2007), https://commons.emich.edu/loexconf2007/19/; leila june rod-welch, “let’s get oriented: getting intimate with the library, small group sessions for library orientation,” (paper, association of college and research libraries conference, baltimore, march 2017), http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/201 7/letsgetoriented.pdf. 5 kelly czarnecki, “chapter 4: digital storytelling in different library settings,” library technology reports, no. 7 (2009): 20-30; rebecca j. morris, “creating, viewing, and assessing: fluid roles of the student self in digital storytelling,” school libraries worldwide, no. 2 (2013): 54–68. 6 sandra marcus and sheila beck, “a library adventure: comparing a treasure hunt with a traditional freshman orientation tour,” college & research libraries 64, no. 1 (january 2003): 23–44, https://doi.org/10.5860/crl.64.1.23. 7 lori oling and michelle mach, “tour trends in academic arl libraries,” college & research libraries, 63, no. 1 (january 2002): 13-23, https://doi.org/10.5860/crl.63.1.13. 8 kylie bailin, benjamin jahre, and sarah morriss, “planning academic library orientations: case studies from around the world,” (oxford, uk: chandos publishing, 2018): xvi. 9 bailin, jahre, and morriss, “planning academic library orientations.” 10 marcus and beck, “a library adventure”; a. carolyn miller, “the round robin library tour,” journal of academic librarianship 6, no. 4 (1980): 215–18; michael simmons, “evaluation of library tours,” edrs, ed 331513 (1990): 1-24. 
11 marcus and beck, “a library adventure”; oling and mach, “tour trends”; rod-welch, “let’s get oriented.” https://doi.org/10.1353/pla.20160027 https://doi.org/10.1016/j.acalib.2009.04.022 https://doi.org/10.5860/crl.75.5.641 https://commons.emich.edu/loexconf2007/19/ http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/letsgetoriented.pdf http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/letsgetoriented.pdf https://doi.org/10.5860/crl.64.1.23 https://doi.org/10.5860/crl.63.1.13 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 25 12 pixey anne mosley, “assessing the comfort level impact and perceptual value of library tours,” research strategies 15, no. 4 (1997): 261–70, https://doi.org/10.1016/s07343310(97)90013-6. 13 mosley, “assessing the comfort level impact and perceptual value of library tours.” 14 marcus and beck, “a library adventure,” 27. 15 kenneth j. burhanna, tammy j. eschedor voelker, and jule a. gedeon, “virtually the same: comparing the effectiveness of online versus in-person library tours,” public services quarterly 4, no. 4(2008): 317–38, https://doi.org/10.1080/15228950802461616. 16 burhanna, voelker, and gedeon, “virtually the same,” 326. 17 burhanna, voelker, and gedeon, “virtually the same,” 329. 18 felker and phetteplace, “gamification in libraries.” 19 felker and phetteplace, “gamification in libraries,”20. 20 felker and phetteplace, “gamification in libraries.” 21 felker and phetteplace, “gamification in libraries”; o’hanlon et al., “a game-based multimedia approach.” 22 mary j. broussard and jessica urick oberlin, “using online games to fight plagiarism: a spoonful of sugar helps the medicine go down,” indiana libraries 30, no. 1 (january 2011): 28–39. 23 melissa mallon, “gaming and gamification,” public services quarterly 9, no. 3 (2013): 210–21, https://doi.org/10.1080/15228959.2013.815502. 24 j. long, “chapter 21: gaming library instruction: using interactive play to promote research as a process,” distributed learning (january 1, 2017), 385–401, https://doi.org/10.1016/b978-008-100598-9.00021-0. 25 rod-welch, “let’s get oriented.” 26 o’hanlon et al., “a game-based multimedia approach.” 27 mallon, “gaming and gamification.” 28 anna-lise smith and lesli baker, “getting a clue: creating student detectives and dragon slayers in your library,” reference services review 39, no. 4 (november 2011): 628–42, https://doi.org/10.1108/00907321111186659. 29 monica fusich et al., “hml-iq: frenso state’s online library orientation game,” college & research libraries news 72, no. 11 (december 2011): 626–30, https://doi.org/10.5860/crln.72.11.8667. 
30 broussard and oberlin, “using online games”; fusich et al., “hml-iq”; o’hanlon et al., “a game-based multimedia approach.” 31 felker and phetteplace, “gamification in libraries.” 32 felker and phetteplace, “gamification in libraries”; fusich et al., “hml-iq.” 33 “design thinking for libraries: a toolkit for patron-centered design,” ideo (2015), http://designthinkingforlibraries.com. 34 john w. creswell and vicki l. plano clark, designing and conducting mixed methods research (thousand oaks, ca: sage publications, 2007). 35 roger kirk, “practical significance: a concept whose time has come,” educational and psychological measurement, no. 5 (1996). 36 kirk, “practical significance.” 37 sandra mathison, “encyclopedia of evaluation,” sage, 2005, https://doi.org/10.4135/9781412950558. 38 rod-welch, “let’s get oriented.” 39 felker and phetteplace, “gamification in libraries.”
public libraries leading the way utilizing technology to support and extend access to students and job seekers during the pandemic daniel berra information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13261 daniel berra (danielb@pfulgervilletx.gov) is assistant director, pflugerville (texas) public library. © 2021. “public libraries leading the way” is a regular column spotlighting technology in public libraries. the ongoing pandemic necessitated a reimagining of public library services and resources. out of this challenge rose opportunities to better serve the needs of our communities during the pandemic and beyond. when our library first closed our doors to the public last march, we began discussions on how the needs of our community have changed. we identified two key groups for whom the pandemic had forced an uncomfortable shift: students suddenly thrust into virtual learning and adults who had lost their jobs. while we continue to serve all members of our community in a variety of ways, we looked to increase support for these specific groups utilizing available technology. like many public libraries, the pflugerville public library quickly shifted our service model to include virtual programs, curbside pickup, library cards issued remotely, and a focus on electronic resources. our community is rapidly growing and diverse.
many of our nearly 70,000 residents are frequent users of library services, attend our wide array of programs, hold meetings, study or work inside the building and enjoy both the physical and virtual library collection. the pandemic shift required our talented staff to find ways to provide a similar level of service to a community who heavily utilizes the library. for both students and job seekers, we took steps to alleviate some of the difficulties the building’s closure caused by utilizing existing technology. we worked with the city’s it department to extend the library’s wi-fi to cover the entire parking lot, allowing for 24-hour access. we also utilized our existing print from your own device system to allow library users to submit print jobs and then pick them up through our curbside service. we added additional wi-fi hotspots available for checkout to ensure access at home for those lacking internet. since these services were already offered to some degree, the expansion of access was relatively easy to implement. for students we drew upon our existing relationship with the pflugerville independent school district (pfisd) to provide support and extend access. we expanded the offering of our special digit cards, which allow students to sign up for an account giving them access to all of our electronic resources and wi-fi hotspots. the school district’s librarians handle the signups and then submit the forms so we can set up the accounts then contact students by email or phone. we further extended access to ebooks by working with the district and our vendor overdrive, to provide a direct way for students to browse and check out through the district’s own ebook app. this allows students to seamlessly see both of our collections, significantly increasing their reading options and removing barriers to access. on the support front, we utilized a portion of the city’s cares act funds directed toward the library to launch a live, virtual tutoring service called brainfuse helpnow. students of all ages have anonymous access to tutors from home seven days a week, as well as additional homework mailto:danielb@pfulgervilletx.gov information technology and libraries march 2021 utilizing technology to support and extend access | berra 2 support resources. this piece meshes nicely with some of our virtual programming for teens, like our sat and act practice tests and other testand career-preparation e-resources. recognizing the pandemic’s impact on the economy, and how this directly affects our community, we worked to prioritize support for the unemployed and under-employed. we added a resume review/job-search coaching service led by two of our circulation staff members. we utilized another portion of our cares act funds to offer career online high school, providing adults with access to an online program to obtain their high school diploma. we also began lending laptops for home use to ensure access to necessary technology. some of our support was already in place before the pandemic began, and we made a significant marketing push to highlight these e-resources. for instance, we partner with the pflugerville community development corporation to provide the online training resource lynda.com (soon to be linkedin learning). we saw a large increase in usage particularly in the first few months of the pandemic as community members looked to add employable skills to their toolboxes. 
we also created a page on our website with all of our job search assistance resources and services highlighted in one place. while the main emphasis of these efforts utilizes technology, serving the needs of the entire community also requires supporting those who are generally less connected. we have to balance our digital expectations with something more tangible, recognizing many library users still utilize the library in a more traditional way. for students, our senior youth services librarian partnered with pfisd for a book give away in conjunction with the district’s food distribution program to get books in the hands of children for the summer. we also began distributing “care kits” through our curbside service that include personal grooming products and cold weather gear for anyone in need. while 2020 featured the addition of many new services or significant expansion of existing ones, we are focused in 2021 on increasing our marketing efforts for these offerings. relying too heavily on digital forms of communication can limit the impact of our services. for instance, if we want to let people who do not have access to the internet at home know we have wi-fi hotspots and laptops available for checkout, then spreading the word through our standard methods of social media, website, and email will prove ineffective. with the building currently closed to the public, we face an additional barrier to communication. to help alleviate some of this, we have created a job search assistance flyer that we are distributing at places like local food pantries. we plan to expand on similar methods of marketing throughout the year. while positive feedback is often hidden from libraries since we prioritize patron privacy and anonymity, we have received a few specific stories that highlight our impact. our firs t scholarship recipient for career online high school shared how the opportunity to obtain her high school diploma will open up new professional avenues and erase the stigma of having not completed high school. another community member who took advantage of our job search coaching to prepare for an interview expressed gratitude to the library staff who helped increase his employment chances. we also see resumes and homework assignments printed through our virtual printing service, hear from parents with children utilizing hotspots for virtual schooling, see cars in the parking lot using the extended wi-fi and track statistics showing a large increase in the usage of our electronic resources. https://library.pflugervilletx.gov/services/assistance-for-job-seekers information technology and libraries march 2021 utilizing technology to support and extend access | berra 3 the ongoing pandemic necessitated a re-imagining of library services. the needs of our community changed and we set out to find ways to provide assistance to those who need it the most utilizing technology, while remaining mindful of those who are not as comfortable in the digital age. the combination of utilizing technology to address the current needs and expanding access to this technology, has allowed us to better serve the community. we are in the process now of evaluating all of our changes to determine which ones will continue even after the pandemic ends. we already know that we will keep our methods of extending access like the expanded wi-fi availability, laptops for checkout, digit cards for students and the seamless connection to our ebook collection for pfisd. 
in the area of support, we will continue to offer career online high school, brainfuse helpnow for virtual tutoring, and our resume review/job search coaching service. public libraries are well positioned to innovate and adjust to changes in society. it is one of the things we do extremely well, out of necessity, but also out of a deep desire to serve our communities. all of the shifts the pflugerville public library made related to supporting students and job seekers drew upon existing technology and available resources. what changed was the areas on which we chose to focus our efforts. by prioritizing support and access while pinpointing the needs of the moment, we found ways to better serve our community within the context of everything else we provide. while the jury is still out on how successful some of these initiatives will prove, we already know that many of these changes will continue long after the pandemic ends. june_ital_fifarek_final president’s message: for the record aimee fifarek information technologies and libraries | june 2017 1 this is my final column as lita president. having just finished the 2016/17 annual report, i must admit i’m a little tapped out. over the last year i’ve written on the events of an ala annual and midwinter conferences, a lita forum, a new strategic plan, information ethics, and advocacy. even for an english major and a librarian that’s a lot of words. as i work with executive director jenny levine and the rest of the lita board to prepare the agenda for our meetings at annual, the temptation is to focus on all the work that is yet to be done. but with the end of school and fiscal years approaching, it is the ideal time to celebrate everything that has been accomplished over the last 12 months. first off, at some magical point during the year we completed the lita staff transition period. jenny has truly made the executive director position her own, and although she and mark beatty have more than enough work for six people, they are well on their way to guiding lita to a bright new future. with her knowledge of the inner workings of ala and her desire to make everything easier, faster and better, jenny is truly the right person for this job. next, we have a great new set of people coming in to lead lita. andromeda yelton is going to be a fabulous lita president. she is an eloquent speaker, has more determination than anyone i know, and is a kick ass coder to boot. bohyun kim has an amazing talent for organizing and motivating people, and as president-elect work wonders with the new appointments committee. our new directors-at-large lindsay cronk, amanda goodman, and margaret heller are all devoted litans who will be great additions to the board. i’m glad i get to work with them all in their new roles as i transition to past-president. and last but certainly not least we have started to make inroads on our advocacy and information policy strategic focus. the privacy interest group has already raised lita’s profile by supplementing ala’s intellectual freedom committee’s privacy policies with privacy checklists.1 a group of board members along with office for information technology policy liaison david lee king and advocacy coordinating committee liaison callan bignoli are working on a new task force proposal to outline strategies for effectively collaborating with the ala washington office. these are just the first steps towards a future in which lita is not only relevant but necessary. 
with all that hard work accomplished, it must be time to toast to our successes. i hope that everyone who will be at ala annual in chicago (http://2017.alaannual.org/) later this month will join us as we conclude our 50th anniversary year. sunday with lita promises to be amazing, with aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az. president’s message | fifarek https://doi.org/10.6017/ital.v36i2.10019 2 hugo award winner kameron hurley (http://www.kameronhurley.com) speaking at the president’ program, followed by what is sure to be a spectacular lita happy hour at the beer bistro (http://www.thebeerbistro.com/). we are still working on our goal to raise $10,000 for professional development scholarships. we’re only halfway there, so please donate at: https://www.crowdrise.com/lita-50th-anniversary. being lita president during the association’s 50th anniversary year has been both an honor and a challenge. during a milestone year like this you become acutely aware of all of the hard work and innovation that was required for the association to thrive for half a century, and feel more than a little pressure to leave an extraordinary legacy that will ensure another fifty years of success. it’s a tall order, especially in an era of rapid political and societal change. but as i navigated through my presidential year i realized that i didn’t have to do anything more than ensure that people who already want to work hard for the greater good have a welcoming place to do just that. after fifty years, lita still has the thing that made it a success in the first place: a core group of volunteers committed to the belief that new technologies can empower libraries to do great things. the talented and passionate people i have worked with on the board, in the committee and interest group leadership, and throughout the membership are the best legacy that an association can have. now more than ever the people in libraries who “do tech” can be leaders in their communities and on the national stage. now more than ever it is lita’s time to shine. references 1. http://litablog.org/2017/02/new-checklists-to-support-library-patron-privacy/ 10738 20190318 galley determining textbook cost, formats, and licensing with google books api: a case study from an open textbook project eamon costello, richard bolger, tiziana soverino, and mark brown information technology and libraries | march 2019 91 eamon costello (eamon.costello@dcu.ie) is assistant professor, open education at dublin city university. richard bolger (richard.bolger@dcu.ie) is lecturer at dublin city university. tiziana soverino (tiziana.soverino@dcu.edu) is researcher at dublin city university. mark brown (mark.brown@dcu.ie) is full professor of digital learning, dublin city university. abstract the rising cost of textbooks for students has been highlighted as a major concern in higher education, particularly in the us and canada. less has been reported, however, about the costs of textbooks outside of north america, including in europe. we address this gap in the knowledge through a case study of one irish higher education institution, focusing on the cost, accessibility, and licensing of textbooks. we report here on an investigation of textbook prices drawing from an official college course catalog containing several thousand books. 
we detail how we sought to determine metadata of these books including: the formats they are available in, whether they are in the public domain, and the retail prices. we explain how we used methods to automatically determine textbook costs using google books api and make our code and dataset publicly available. introduction the cost of textbooks is a hot topic for higher education. it has been reported that by 2014 the average student spent $1,200 annually on textbooks.1 another study claimed that between 2006 and 2016 the costs of college textbooks increased over four times the cost of inflation.2 despite this rise in textbook costs, a survey of more than 3,000 us faculty members (“the babson survey”) found that almost every course (98 percent) mandated a textbook or related study resources.3 one response to the challenge of rising textbook costs is open textbooks. open textbooks are a type of open educational resource (oer). oers have been defined as “teaching, learning, and research resources that reside in the public domain or have been released under an intellectual property license that permits their free use and repurposing by others. open educational resources include full courses, course materials, modules, textbooks, streaming videos, tests, software, and any other tools, materials, or techniques used to support access to knowledge.”4 oers stem from the principle that access to education is a human right and that, as such, education should be accessible to all.5 hence an open textbook is made available under terms which grant legal rights to the public, not only to use, but also to adapt and redistribute. creative commons licensing is the most prevalent and well-developed intellectual property licensing tool for this purpose. open textbook projects aimed at promoting publishing and redistributing open textbooks, both in digital and print formats, have been growing. for example, the bcampus project in canada began in 2012 with the aim of creating a collection of open textbooks aligned with the most popular subject areas in british columbia.6 the project has shown strong growth, with over 230 open digital textbooks now available and more than forty institutions involved. a significant recent determining textbook cost, formats, and licensing | costello, bolger, soverino, and brown 92 https://doi.org/10.6017/ital.v38i1.10738 development in open textbooks occurred in march 2018, when the us congress announced a $5 million investment in an open textbook initiative.7 in addition to helping change institutional culture, and challenge attitudes to traditional publishing models, one of the most oft-cited benefits of open textbooks is cost savings. according to the college board’s survey of colleges, the average annual cost to us undergraduate students in 2017 for textbooks and materials was estimated at $1,250.8 this figure is remarkably close to the aforementioned figure of $1,200 a year, as reported by baglione and sullivan. however, there is little known about the monetary face value of books that students are expected to buy, beyond studies based on self-reported data. students themselves in the us have attempted to at least open the debate in this area by highlighting book price disparities.9 nonetheless, they only report on a very small number of books, and the college board representing on-campus us textbook retailers have disputed their results for this reason, claiming that they have been selective in the book prices they have chosen. 
hence this study seeks to address the gap that exists in knowledge about the true cost of textbooks in higher education. this is in the context of a wider research project we are conducting on open textbooks in ireland.10 determining the cost of books is not straightforward as books can be new, used, rental, or digital subscription. however, the cost of new books does set a baseline for other forms, particularly rental and used books. our aim here is hence to start with new books, by analyzing costs of all the required and recommended textbooks of one higher education institution (hei) in ireland. the overarching research question this study sought to address is: what is known about the currently assigned textbooks in an irish university? the sub-questions were: • rq1: what is the extent of textbooks that are required reading? • rq2: what are the retail costs of textbooks? • rq3: are textbooks available in digital or e-book form? • rq4: are textbooks available in the public domain? the next section outlines our methodology and how we sought to find answers to these questions. methods in this section we describe our approach, the dataset generated, and the methods we used to analyze the data. we identified a suitable data source comprising the official course catalog of a hei in ireland with more than ten thousand students. in the course catalog faculty give required and recommended textbook details for all courses. this information is freely accessible on the website of the hei; the course catalog is powered by a software system known as akari (http://www.akarisoftware.com/). akari is a proprietary software system used by several heis in and outside ireland to create and manage academic course catalogs. the course team gained access to a download of all books recorded in the database of the course catalog (figure 1). in this catalog, fields are provided for lecturers to input information for students about books such as title, international standard book number (isbn), author, and publisher. following manual and automated data cleansing, 3,014 unique records of books were created. due to the large number of books, at this stage we sought a programmatic solution for finding out more information about these books. information technology and libraries | march 2019 93 figure 1. course catalog screenshot. we initially thought that isbns might prove the best way to accurately reconcile records of books. however, many isbns were incomplete or mistyped. moreover, many instructors simply did not enter an isbn. given the capacity for errors in the data—for instance, some lectures simply entered “i will tell you in class” in the book title field—we required a tool that could handle fuzzy search queries, e.g. cases where a book title or author were misspelled. the tool we selected was the google books application programming interface (api).11 this api provides an interface to the google books database of circa thirty million books. the service, like the main google search engine, is forgiving of queries that are mistyped or misspelled. hence, we constructed a query based on a combination of author name, book title, and publisher. following experimentation, we determined that these three search terms together allowed us to find books with a high degree of accuracy whilst also accounting for possible spelling errors. determining textbook cost, formats, and licensing | costello, bolger, soverino, and brown 94 https://doi.org/10.6017/ital.v38i1.10738 figure 2. system design. 
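to make the query construction described above concrete, the sketch below shows one way such a request could be issued. this is not the authors' released middleware (that code is available through the github and zenodo links they cite); it is a minimal, hypothetical example that assumes node.js 18 or later for the built-in fetch and uses the intitle, inauthor, and inpublisher qualifiers documented for the public google books volumes endpoint. the sample record values are illustrative only.

// illustrative sketch: querying the google books api with a combined
// title/author/publisher search; the forgiving, fuzzy matching happens
// on google's side.
const BASE = "https://www.googleapis.com/books/v1/volumes";

function buildQuery({ title, author, publisher }) {
  const parts = [];
  if (title) parts.push(`intitle:"${title}"`);
  if (author) parts.push(`inauthor:"${author}"`);
  if (publisher) parts.push(`inpublisher:"${publisher}"`);
  return parts.join(" ");
}

async function searchBook(record) {
  const url = `${BASE}?q=${encodeURIComponent(buildQuery(record))}&maxResults=1`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`google books returned http ${response.status}`);
  const data = await response.json();
  return data.totalItems > 0 ? data.items[0] : null; // null when nothing matched
}

// example call with one (illustrative) row from a course reading list
searchBook({
  title: "psychiatric and mental health nursing",
  author: "phil barker",
  publisher: "crc press", // hypothetical catalog value
}).then((volume) => console.log(volume ? volume.volumeInfo.title : "not found"));

because the whole query string is url-encoded and matched loosely by google's search, records with small spelling errors in the catalog can still resolve to the intended volume, which is the behavior the authors describe relying on.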
we then wrote a custom javascript middleware program deployed in the google cloud platform. this program parsed the file of the book search queries, passed them to the google books api as search requests and saved the results. the api returned results in javascript object notation (json) format. json is a modern web language for describing data. it is related to javascript and can be used to translate objects in the javascript programming language into textual strings. it is used as a replacement for xml as it is arguably more human readable and is considerably less verbose. we then imported this json into a mongodb database to filter and clean the data, before finally exporting them to excel for statistical analysis. mongodb is a document store database that natively stores objects in the json format and allows for efficient querying of the data. the google books api provides some key metadata on books aside from the usual author, publisher, isbn, edition, pages, etc. as it gives prices for selected books. google draws this information from its own e-book store which contains over three million books and a network of resellers who sell print and digital versions of the books. in addition to price, google books also contains information on accessible versions of books, digital/e-pub versions, pdf versions, and whether the book is in the public domain. we have published a release of this dataset and all of our code to the software repository github. we then used the zenodo platform to generate a digital object identifier (doi) for the code.12 one of the functions of the zenodo platform is to allow for code to be properly cited and referenced. we published our code in this way for others interested in replicating this work in other contexts. in the next section we will provide an analysis of the results of our queries. results after extracting and processing the data from the course catalog and google platforms, we obtained 3,030 unique course names and in these courses we found over 15,414 books listed. required versus recommended reading from the course catalog data, we found that 11,022 (71.5 percent) books were required readings and the remaining 4,392 (28.5 percent) were recommended. information technology and libraries | march 2019 95 upon cleaning and removing duplicates and missing data, we identified 3,014 books that could be queried using the google books api. querying the api returned results for 2,940 books, i.e. it found 97 percent of the books and only seventy-four books could not be found. the google books api returns information in json format. figure 3 below shows an example of the json information returned for one book. { "volumeinfo" : { "title" : "psychiatric and mental health nursing", "authors" : [ "phil barker" ], "industryidentifiers" : [ { "type" : "isbn_13", "identifier" : "9781498759588" }, { "type" : "isbn_10", "identifier" : "1498759580" } ], "imagelinks" : { "smallthumbnail" : "http://books.google.com/books/content?id=btsocgaaqbaj&printsec=frontcover&img=1&zo om=5&edge=curl&source=gbs_api" } }, "saleinfo" : { "isebook" : true, "retailprice" : { "amount" : 62.39, "currencycode" : "usd" } }, "accessinfo" : { "publicdomain" : false, "pdf" : { "isavailable" : true } } } figure 3. sample of book information returned by google books api. digital formats and public domain license figure 4 shows the numbers of pdf (1,219) and e-book (1,016) versions of books reported to be available. eight hundred and fifty-four were available in both pdf and e-book format. 
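the second half of the pipeline described above, taking the returned json and loading the fields of interest into mongodb, can be sketched in a similarly hypothetical way. the field names follow the public api response (and match the sample record shown in figure 3 below); the connection string, database, and collection names are placeholders rather than details taken from the authors' system.

// illustrative sketch: flattening the fields of interest from a google
// books volume and inserting the results into mongodb.
const { MongoClient } = require("mongodb"); // assumed dependency: npm install mongodb

function extractFields(volume) {
  const info = volume.volumeInfo || {};
  const sale = volume.saleInfo || {};
  const access = volume.accessInfo || {};
  return {
    title: info.title,
    authors: info.authors || [],
    isbns: (info.industryIdentifiers || []).map((id) => id.identifier),
    isEbook: sale.isEbook === true,
    retailPrice: sale.retailPrice ? sale.retailPrice.amount : null,
    currency: sale.retailPrice ? sale.retailPrice.currencyCode : null,
    pdfAvailable: access.pdf ? access.pdf.isAvailable === true : false,
    publicDomain: access.publicDomain === true,
  };
}

async function saveVolumes(volumes) {
  const client = new MongoClient("mongodb://localhost:27017"); // placeholder uri
  await client.connect();
  try {
    const books = client.db("textbook_study").collection("books"); // hypothetical names
    await books.insertMany(volumes.map(extractFields));
  } finally {
    await client.close();
  }
}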
from the determining textbook cost, formats, and licensing | costello, bolger, soverino, and brown 96 https://doi.org/10.6017/ital.v38i1.10738 total of 2,940 individual books listed their availability was as follows: figure 4. availability of 2,940 books in digital formats and public domain license. as per figure 4, only 0.18 percent (six) of the books had a version available in the public domain according to google books. cost results the google books api only returned prices for 596 (20 percent) of the books that we searched for. within that sample, the cost ranged from $0.99 to over $452, as illustrated in figure 5. the median price of a book was $40, and the mean price was $56.67. as there are on average 3.96 books per course, this implies an average cost to students of $224.41 per course taken. as students take an average of 8.05 courses per year, this further implies a cost per year of $1,806.50 per student if they were to buy new versions of all the books. 1,219 (39.73% ) 1,016 (34.56% ) 6 (0. 18%) 0 500 1000 1500 2000 2500 pdf ebook openpdf e-book public domain information technology and libraries | march 2019 97 figure 5. summary of book prices (n = 596). discussion and conclusion we have demonstrated that it is possible to programmatically search and determine the prices of large numbers of books. we used this information to attempt to estimate the full economic cost of books to students on average in an irish hei. we are still actively developing this tool and encourage others to use and even contribute to the code which we have published with the dataset. this proof of concept tool may allow stakeholders with an interest in book costs for students to quickly get real data on large numbers of books. ultimately, we hope that this will help highlight the costs of many textbooks. our findings also highlight relatively low levels of digital book availability. very few books were found to be in the public domain. a limitation of this research is that there are issues around the coverage of google books and its index policies or algorithms. in a literature review of research articles about google books in 2017, fagan pointed out that the coverage of google books is “hit and miss.”13 in 2017, google books included about thirty million books, though google did not release specific details on its database, as emphasized by fagan. it is known that content includes digitized collections from over forty libraries, and that us and englishlanguage books are overrepresented.14 furthermore, google books is only returning results for books that are in the public domain and cannot tell us if books are made available through open licenses such as creative commons. accepting such caveats, however, we have found the google books api to be a very useful tool for answering questions about large numbers of books in a systematic way and hope that our findings can help others. the prices that we derived in this study were for new books only. however, the new book prices provide a baseline for all other prices, e.g. 
a used book or a loan book price will be relative to a new book price and library budgets will need to take account of new book prices.15 further study is required to determine a more realistic figure for the cost of textbooks and the next phase of our 0 50 100 150 200 250 300 350 400 450 500 1 16 31 46 61 76 91 10 6 12 1 13 6 15 1 16 6 18 1 19 6 21 1 22 6 24 1 25 6 27 1 28 6 30 1 31 6 33 1 34 6 36 1 37 6 39 1 40 6 42 1 43 6 45 1 46 6 48 1 49 6 51 1 52 6 54 1 55 6 57 1 58 6 d ol la rs cost in usd books determining textbook cost, formats, and licensing | costello, bolger, soverino, and brown 98 https://doi.org/10.6017/ital.v38i1.10738 wider open textbook research projects involves interviews and focus groups with students to better understand the lived reality of their relationship with textbooks.16 references 1 stephen l. baglione and kevin sullivan, “technology and textbooks: the future,” american journal of distance education 30, no. 3 (aug. 2016): 145-55, https://doi.org/10.1080/08923647.2016.1186466. 2 etan senack and robert donoghue, “covering the cost: why we can no longer afford to ignore high textbook prices,” report, the student pirgs (feb. 2016), www.studentpirgs.org/textbooks. 3 elaine allen and jeff seaman, “opening the textbook: educational resources in u.s. higher education, 2015-16,” report, babson survey research group (july 2016), https://www.onlinelearningsurvey.com/reports/openingthetextbook2016.pdf. 4 william and flora hewlett foundation (2019), http://www.hewlett.org/programs/education-program/open-educational-resources. 5 2012 paris oer declaration, http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaratio n.htm. 6 mary burgess, “the bc open textbook project,” in open: the philosophy and practices that are revolutionizing education and science, rajiv s. jhangiani and robert biswas-diener (eds.). (london: ubiquity pr., 2017): 227–36. 7 nicole allen, “congress funds $5 million open textbook grant program in 2018 spending bill,” sparc open (mar. 20, 2018), https://sparcopen.org/news/2018/open-textbooks-fy18/. 8 jennifer ma et al., “trends in college pricing,” report, the college board (oct. 2017), https://trends.collegeboard.org/sites/default/files/2017-trends-in-college-pricing_0.pdf. 9 kaitlyn vitez, “open 101: an action plan for affordable textbooks,” report, student pirgs (jan. 2018), https://studentpirgs.org/campaigns/sp/make-textbooks-affordable. 10 mark brown, eamon costello, and mairéad nic giolla mhichíl, “from books to moocs and back again: an irish case study of open digital textbooks,” in exploring the micro, meso and macro. proceedings of the european distance and e-learning network 2018 annual conference, genova, 17-20 june, 2018 (budapest: the european distance and e-learning network): 206-14. 11 google books api (2018), https://developers.google.com/books/docs/v1/reference/volumes. 12 eamon costello and richard bolger, “textbooks authors, publishers, formats and costs in higher education,” bmc research notes 12, no. 1 (jan. 2019): 12-56, https://doi.org/10.1186/s13104-019-4099-1. information technology and libraries | march 2019 99 13 jody condit fagan, “an evidence-based review of academic web search engines, 2014-2016: implications for librarians’ practice and research agenda,” information technology and libraries 36, no. 2 (mar. 2017): 7-47, https://doi.org/10.6017/ital.v36i2.9718. 14 ibid. 15 anne christie, john h. 
pollitz, and cheryl middleton, “student strategies for coping with textbook costs and the role of library course reserves,” portal: libraries and the academy 9, no. 4 (oct. 2009): 491-510, http://digital.library.wisc.edu/1793/38662. 16 eamon costello et al., “textbook costs and accessibility: could open textbooks play a role?” proceedings of the 17th european conference on elearning (ecel), vol. 17 (athens, greece: 2018): 99-106. editorial board thoughts: events in the life of ital sharon farnel information technology and libraries | june 2018 4 sharon farnel (sharon.farnel@ualberta.ca) is metadata coordinator, university of alberta libraries. at the end of june 2018, i will be ending my time on the ital editorial board. during my term i have had the opportunity to write several “from the board” pieces and have very much enjoyed the freedom to explore a library technology topic of choice. this time around i would like to examine ital as seen through crossref’s event data service. crossref launched its event data service in beta in 2017; production service was announced in late march of this year. event data is “an open data service that registers online activity (specifically, events) associated with crossref metadata. event data will collect and store a record of any activity surrounding a research work from a defined set of web sources. the data will be made available as part of our metadata search service or via our metadata api and normalised across a diverse set of sources. data will be open, audit-able and replicable.”1 using dois as a basis, event data captures information on discussions, citations, references and other actions on wikipedia, twitter, and other services. i thought it might be interesting to see what the crossref event data might say about ital. i used the event data api2 to pull event data using the prefix for all ojs journals hosted by boston college (10.6017). i then used openrefine3 to filter out all non-ital records and then began further examining the data. the data was gathered on may 9, 2018. in total, 313 events were captured. of these, 193 events were from wikipedia, 110 from twitter, and 5 each from the lens (patent citations) and wordpress blogs. the 313 events are associated with 38 ital articles, the earliest from 1973 (volume 6, number 1, from ital’s digitized archive), and the most recent from 2018 (volume 37, number 1). the greatest number of events (126) are associated with an article from volume 25, number 1 (2006) on rfid in libraries.4 the other articles are associated with a varying number of discrete events, from one to 24. looking more closely at the events associated with the 2006 article on rfid, all 126 events are references in wikipedia. these represent references to the english and japanese language wikipedia articles on radio frequency identification. other references from wikipedia are to articles on open access, fast (faceted application of subject terminology), library 2.0 , biblioteca 2.0, and others. what about that article from 1973? it was written by j. j. dimsdale and titled “file structure for an on-line catalog of one million titles.” the abstract provides a tantalizing glimpse into the content: “a description is given of the file organization and design of an on-line catalog suitable for automation of a library of one million books. a method of virtual hash addressing allows rapid search of the indexes to the catalog file. 
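the api pull described in the column can be illustrated with a short, hypothetical sketch. it is not the workflow actually used; it assumes the current public query endpoint, the obj-id.prefix filter, and cursor-based paging as documented for crossref event data, plus node.js 18 or later for the built-in fetch. the mailto address is a placeholder requested by crossref for polite use.

// illustrative sketch: pulling all events whose object doi falls under a
// prefix (here, 10.6017 for the boston college-hosted ojs journals) and
// tallying them by source, roughly as reported in the column.
const ENDPOINT = "https://api.eventdata.crossref.org/v1/events";

async function eventsForPrefix(prefix, mailto) {
  const events = [];
  let cursor = "";
  do {
    const url =
      `${ENDPOINT}?obj-id.prefix=${prefix}&rows=1000` +
      `&mailto=${encodeURIComponent(mailto)}` +
      (cursor ? `&cursor=${cursor}` : "");
    const page = await (await fetch(url)).json();
    events.push(...page.message.events);
    cursor = page.message["next-cursor"]; // null when there are no more pages
  } while (cursor);
  return events;
}

eventsForPrefix("10.6017", "example@example.org").then((events) => {
  const bySource = {};
  for (const e of events) {
    bySource[e.source_id] = (bySource[e.source_id] || 0) + 1;
  }
  console.log(events.length, "events", bySource);
});

the resulting records could then be filtered down to ital dois (for example in openrefine, as described above) before looking at individual articles.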
storage of textual material in a compressed f orm allows considerable reduction in storage costs.”5 mailto:sharon.farnel@ualberta.ca events in the life of ital | farnel 5 https://doi.org/10.6017/ital.v37i2.10460 there are only four events associated with this 1973 article, but interestingly all are from the lens,6 a global patent search database. these are a set of related patents, by mayers and whiting, for data compression apparatus and methods.7 there are 110 events associated with twitter, with tweets from 15 different users. the largest number of events, 21, begins with aaron tay, 8 a librarian and blogger from singapore management university, tweeting about a 2016 ital article9 on user expectations of library discovery products, which was then retweeted 20 times. the two next most-tweeted articles (17 tweets/retweets each) discuss privacy and user experience in library discovery 10 and “reference rot” in etd (electronic theses & dissertations) repositories. 11 what value can such a brief examination of this small set of data from a very new service provide to ital authors, or the editorial board? it can certainly provide a glimpse of who might be accessing ital articles, and how, and perhaps provide some hints as to ways to increase the reach of the journal. this kind of data is not a replacement for download counts or bibliographic citation patterns, but can complement them and add another layer to our understanding of the place of ital in the library technology community and beyond. as ital continues to thrive and as services like event data continue to improve, i look forward to seeing what story this data continues to tell! references the event data used for this analysis can be found at https://bit.ly/2kgdjcm. 1 madeleine watson, “event data: open for your interpretation,” crossref blog, february 25, 2016, https://www.crossref.org/blog/event-data-open-for-your-interpretation/. 2 crossref, event data user guide, https://www.eventdata.crossref.org/guide/. 3 openrefine, http://openrefine.org/. 4 jay sing, navjit brar and carmen fong, “the state of rfid applications in libraries,” information technology and libraries 25 no. 1, 2006, https://doi.org/10.6017/ital.v25i1.3326. 5 j. j. dimsdale, “file structure for an on-line catalog of one million titles,” information technology and libraries 6, no. 1, 1973, https://doi.org/10.6017/ital.v6i1.5760. 6 the lens, https://www.lens.org/. 7 clay mayers and douglas whiting. data compression apparatus and method using matching string searching and huffman encoding. us patent 5532694, filed july 7, 1995, and issued july 2, 1996. 8 aaron tay, https://twitter.com/aarontay. 9 irina trapido, “library discovery products: discovering user expectations through failure analysis,” information technology and libraries 35, no. 3, 2016, https://doi.org/10.6017/ital.v35i3.9190. https://bit.ly/2kgdjcm https://www.crossref.org/blog/event-data-open-for-your-interpretation/ https://www.eventdata.crossref.org/guide/ http://openrefine.org/ https://doi.org/10.6017/ital.v25i1.3326 https://doi.org/10.6017/ital.v6i1.5760 https://www.lens.org/ https://twitter.com/aarontay https://doi.org/10.6017/ital.v35i3.9190 information technology and libraries | june 2018 6 10 shayna pekala, “privacy and user experience in 21st century library discovery,” information technology and libraries 36, no. 2, 2017, https://doi.org/10.6017/ital.v36i2.9817. 
11 mia massicotte and kathleen botter, “reference rot in the repository: a case study of electronic theses and dissertations (etds) in an academic library,” information technology and libraries 36, no. 1 (2017), https://doi.org/10.6017/ital.v36i1.9598.
letter from the editor: september 2021
kenneth j. varnum
information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13859
in the editorial section of this issue, we have two columns to share. the september editorial board thoughts essay is by paul swanson, “building a culture of resilience in libraries,” which reflects on the lessons of covid-driven flexibility and suggests that a culture of resilience in our libraries will help us adapt more easily to these and other emerging changes we will inevitably encounter. that is followed by carole williams’ public libraries leading the way column, “delivering: automated materials handling for staff and patrons,” in which she discusses the effects of an automated materials handling system on both the staff and patrons of the charleston county (sc) public library. in peer-reviewed content, we have a diverse set of articles on a range of topics: bias mitigation in metadata; accessibility of pdf documents; two articles on automated classification of different kinds of texts; two articles with lessons learned from our abrupt move to remote service; and a case study on the importance of product ownership.
1. mitigating bias in metadata: a use case using homosaurus linked data / juliet hardesty and allison nolan
2. accessibility of tables in pdf documents: issues, challenges and future directions / nosheen fayyaz, shah khusro, and shakir ullah
3. text analysis and visualization research on the hetu dangse during the qing dynasty of china / zhiyu wang, jingyu wu, guang yu, and zhiping song
4. topic modeling as a tool for analyzing library chat transcripts / hyunseung koh and mark fienup
5. expanding and improving our library’s virtual chat service: discovering best practices when demand increases / parker fruehan and diana hellyar
6. a rapid implementation of a reserve reading list solution in response to the covid-19 pandemic / matthew black and susan powelson
7. product ownership of a legacy institutional repository: a case study on revitalizing an aging service / mikala narlock and don brower
kenneth j. varnum, editor
varnum@umich.edu
september 2021
article
stateful library analysis and migration system (slam): an etl system for performing digital library migrations
adrian-tudor pănescu, teodora-elena grosu, and vasile manta
information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.12035
adrian-tudor pănescu (tudor@figshare.com) is software engineer, figshare. teodora-elena grosu (teodora@figshare.com) is software engineer, figshare. vasile manta (vmanta@tuiasi.ro) is professor, faculty of automatic control and computer engineering, gheorghe asachi technical university of iași, romania. © 2021.
abstract
interoperability between research management systems, especially digital libraries or repositories, has been a central theme in the community in recent years, with the discussion focused on means of enriching, linking, and disseminating outputs. this paper considers a frequently overlooked aspect, namely the migration of records across systems, by introducing the stateful library analysis and migration system (slam) and presenting practical experiences with migrating records from dspace and digital commons repositories to figshare.
introduction
bibliographic record repositories are a central part of the research venture, playing a key role in both the dissemination and preservation of outcomes such as journal articles, conference papers, theses and dissertations, monographs, and, more recently, datasets. as the ecosystem of which they are a part has evolved at a sustained pace in the last decade, repositories have also had to adapt while ensuring uninterrupted service to the research community. nevertheless, a number of developments, both at the local repository level and at a more general, global scale, have created the necessity of considering the complete replacement of certain systems with new repository solutions that are better suited to their stakeholders’ requirements. the following are a few such developments:
• the need to consolidate both technological solutions and operational teams, in order to reduce running costs and provide a unified experience for end users, the research personnel.1
• various policies require researchers to provide not only traditional outputs, such as journal articles or conference papers, but also the datasets and other materials backing up scientific claims.
for repositories, this means both adapting to larger amounts of stored data as well as ensuring that the metadata dissemination and preservation mechanisms are suited for the new output types (e.g., while full-text search is a common feature of literature repositories, it cannot be easily applied to numeric datasets).2 • apart from extending the set of stored outputs, policies have also created new requirements for existing record types. for example, the research excellence framework (ref) in the uk mandates monitoring open access (oa) publishing of research articles; thus, institutional repositories are no longer only a facilitator of green open access (selfarchiving of records) but also a means of monitoring compliance.3 this requires the implementation of new logic in existing repositories, which can frequently be difficult, especially when faced with legacy repository code bases or insufficient technological resources. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 2 • commercial, contractual, or leadership changes can also create the need to replace repository systems, due to uncertainty (see the acquisition of bepress by elsevier) or preference for certain platforms.4 while these developments can generate the requirement to switch repositories in a very short span of time, such a venture needs to be properly planned and executed in order to ensure, on the one hand, that no records are lost or corrupted and, on the other hand, that minimal or no downtime is caused. ideally, migrations would also be an opportunity to curate and enrich the existing corpus by consolidating and correcting bibliographic records. between 2018 and 2019 the research team has performed six digital library migrations from various source repository solutions (dspace, digital commons, custom in-house built systems) to the figshare software as a service (saas) repository platform. for this purpose, slam, an extract, transform, load (etl) system, was developed and successfully employed in order to migrate over 80,000 records. this article describes the rationale behind slam, its design and implementation, and the practical experiences with employing it for repository migrations. a number of future enhancements and open problems are also discussed. motivation and background of slam in early 2018 figshare started considering the suitability of its repository platform for storing content which is usually specific to institutional repositories (journal articles, theses, monographs), along with non-traditional research outputs (datasets or scientific software).5 while feature-wise this was validated by its hosted preprint servers, a new challenge was posed, as stakeholders choosing to use figshare as an institutional repository also had to transfer all content from their existing systems.6 thus, in the first half of 2018, a first migration was performed, transferring records from a bepress digital commons (dc) repository (https://www.bepress.com/products/digitalcommons/) to figshare (https://figshare.com). from a technical point of view, a python (https://www.python.org/) script was developed for this migration; this script parsed a commaseparated values (csv) report produced by dc which contained all metadata and links to the record files.7 using this information, records were created on the figshare repository using its application programming interface (api) (https://api.figshare.com). 
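the article does not reproduce that first script, but its shape is easy to sketch. the snippet below is an illustrative reconstruction only, not the authors’ code: the csv column names are hypothetical, and the figshare endpoint and payload fields are assumptions that would need to be checked against the current public api documentation.

```python
# Illustrative reconstruction of the naive, stateless 2018 migration script
# described above (not the authors' code). CSV column names are hypothetical;
# the figshare endpoint and payload fields are assumptions to verify against
# the current API documentation.
import csv

import requests

FIGSHARE_API = "https://api.figshare.com/v2/account/articles"
HEADERS = {"Authorization": "token PERSONAL_ACCESS_TOKEN"}  # placeholder token


def migrate(report_path: str) -> None:
    with open(report_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # 1. Create the record on the target repository from the exported metadata.
            payload = {
                "title": row["title"],
                "description": row["abstract"],
                "authors": [{"name": n.strip()} for n in row["authors"].split(";")],
            }
            response = requests.post(FIGSHARE_API, json=payload, headers=HEADERS)
            response.raise_for_status()
            print("created record for", row["title"], response.json())  # response shape assumed

            # 2. Fetch each file from the source repository and attach it.
            for file_url in filter(None, row["file_links"].split("|")):
                content = requests.get(file_url).content
                # Attaching files to a figshare record is a multi-step upload
                # (initiate, send parts, complete); elided here for brevity.


if __name__ == "__main__":
    migrate("digital_commons_report.csv")
```

a stateless, single-pass loop of this shape is quick to write, but it keeps no record of what it has already done, which is the root of the problems described next.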
while this migration succeeded, the naive technical solution presented a number of issues: • difficulties with the metadata crosswalk: while a crosswalk was initially set up, mostly based on the definition of the fields in the source and target repositories’ metadata schema, issues were discovered while migrating the records, mainly generated by inconsistencies in the values of the fields across the corpus. these issues were fixed on a case-by-case basis, in order to ensure a lossless migration, but it would have been preferable to surface them in the early phases, in order to have the migration script mitigate the issues in the final run. • running the migration procedure multiple times: the migration script followed mostly an all or nothing approach, which, at each run, fully migrated all records between repositories. this is undesirable, as there was a need to run the script only for those records that failed to migrate (due, for example, to metadata crosswalk issues). after the full migration was completed, there was also a need to apply only some minor corrections to records, without following the full procedure. this was not possible, since the script would recreate all records to migrate from scratch on the target repository, as it did not have any memory of information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 3 previous runs. this issue was also amplified by the fact that in the source repository records did not have any type of persistent identifier attached. thus, additional scripts, which only performed the corrections, had to be developed. • ability to run the migration procedure with minimal supervision: like most migrations, this instance considered a large number of records (over 10,000) and, ideally, the process would run with minimal supervision operators. while the script partially accomplished this, the need for better fault-tolerance and enhanced logging was identified. given the lessons learned from the initial attempt and the requirement that five additional migrations were to be completed between october 2018 and december 2019, a more robust alternative to the naive migration script was required. this alternative had to adhere to three design principles: 1. reusability: the system should be usable for multiple migrations without extensive additions or modifications. thus, it should be able to adapt to the workflows of multiple repositories, metadata schemas, and other concerns specific to each migration. 2. statefulness: in software engineering, programs can either discard knowledge of past transactions or preserve it, allowing previous results and operations to be revisited. migration systems benefit from a stateful architecture, as the system should be able to perform the same migration multiple times, without creating duplicate records on the target repository, while allowing for incremental record improvements with each run. apart from allowing for corrections to be applied post-migration, this would also support the prototyping phase (where multiple test migrations are performed in order to validate the metadata crosswalks), that no information is lost, and other general workflow aspects. 3. fault tolerance: the system should implement fault tolerance mechanisms at all levels, allowing it to run migrations of large corpora with minimal supervision and, at the same time, implement sufficient logging and exception handling to allow operators to identify and correct potential issues. 
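to make the statefulness principle concrete, the following is a minimal sketch of a persistent per-record, per-step registry; re-running the migration then retries only the steps that have not yet succeeded. this is an illustration of the principle under stated assumptions (sqlite storage, invented step names), not slam’s actual implementation.

```python
# Minimal illustration of the statefulness principle: a persistent per-record,
# per-step registry so that repeated runs skip work that already succeeded.
# Not SLAM's actual implementation; table and step names are invented.
import sqlite3

STEPS = ("metadata", "files", "usage_stats", "pid_update")


class MigrationState:
    def __init__(self, path: str = "migration_state.db"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS state ("
            " record_id TEXT, step TEXT, done INTEGER DEFAULT 0,"
            " PRIMARY KEY (record_id, step))"
        )

    def is_done(self, record_id: str, step: str) -> bool:
        row = self.db.execute(
            "SELECT done FROM state WHERE record_id = ? AND step = ?",
            (record_id, step),
        ).fetchone()
        return bool(row and row[0])

    def mark_done(self, record_id: str, step: str) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO state (record_id, step, done) VALUES (?, ?, 1)",
            (record_id, step),
        )
        self.db.commit()


def migrate_record(record_id: str, state: MigrationState, handlers: dict) -> None:
    """Run only the steps that did not complete in a previous run."""
    for step in STEPS:
        if state.is_done(record_id, step):
            continue  # already migrated in an earlier run; do not duplicate work
        handlers[step](record_id)  # if this raises, the step simply stays pending
        state.mark_done(record_id, step)
```

a registry of this kind is what later allows corrective runs to touch only the records that failed or need amending, rather than recreating the whole corpus.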
several repository migrations are represented in the literature. in van tuyl et al., the authors describe the process of moving from a dspace (https://duraspace.org/dspace) to a samvera (https://samvera.org) system, while in the study from do van chau records were migrated from a solution developed in house to dspace.8 both instances offer valuable insight into the challenges posed by digital library migrations, especially at the level of bibliographic metadata; on the other hand, both works are focused mostly on a specific use-case and do not propose general technical solutions for other migrations. it is interesting to note that the migration presented by van tuyl et al. required two and a half years of work, while slam was employed to carry five migrations in 14 months. the bridge2hyku toolkit (https://bridge2hyku.github.io/toolkit) is a collection of tools, including a module for the hyku repository solution (https://hyku.samvera.org), aimed at facilitating the import of records into digital libraries based on this software. similar to slam, it includes an analysis component, useful for surfacing and correcting potential metadata issues during the migration. slam provides two major improvements over this solution, namely it defines a generic architecture that can be used for migrating records between any two repositories, while also defining a procedural migration workflow to create a robust, fault-tolerant, and extensible solution. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 4 pygrametl (http://chrthomsen.github.io/pygrametl/) and petl (https://github.com/petldevelopers/petl) are two open-source frameworks which allow the defining of etl workflows; similar to slam, the processing steps are defined using python functions. these projects are targeted towards tabular and numeric data, making them unsuitable for the transfer of files and metadata across bibliographic repositories. singer (https://www.singer.io/) is an etl framework similar in design to slam, which allows the composing of various data sources (or taps) and targets, in order to move data between them. the two downsides of this implementation are that it is focused on processing data specified in the javascript object notation (json) format, which is not always available for bibliographic metadata, and that it does not facilitate extending the pipeline with, for example, the analysis facilities targeted by slam. hevo data (https://hevodata.com/), pentaho kettle (https://github.com/pentaho/pentahokettle) and talend open studio (https://www.talend.com/products/talend-open-studio/) are etl frameworks which employ graphical interfaces to allow users to define the processing workflows. while such functionality was not initially identified as a requirement for our planned migration projects, during testing it became obvious that providing such an interface could bring value by having repository administrators be more involved in defining and validating the processing applied to bibliographic records, as the administrators possess the most knowledge of the organisation of the repositories. a downside of the three solutions is that their usage requires commercial agreements, which did not line up with the business requirements of the considered migrations. 
in their work, tešendić and boberić krstićev use the pentaho suite in order to implement the etl component of a business intelligence (bi) solution for reporting on bibliographic records.9 while the structure of the etl processing is different—the authors being mostly interested only on certain aspects of the metadata—this work provides insights into the types of analysis that could be performed while migrating records. slam’s design and implementation following the design principles previously mentioned, slam’s architecture was devised as presented in figure 1; as for most etl systems, the easiest way of understanding its operation is by examining the data flow. the migration workflow proceeds by extracting all the required information from the source repository. this could be achieved in multiple ways, such as harvesting through an oai-pmh (https://www.openarchives.org/pmh/) endpoint or other types of api, using the bulk export functionality implemented by most repository systems, or even by crawling the html markup describing records, similar to what search engines do in order to discover web pages. once this mechanism has been established, practical experience proves that it is beneficial to move this raw data closer to the destination repository (to a staging area as depicted in figure 1). while this transfer might prove cumbersome, especially for large corpora, it is required only once. moreover, having the data close to the destination repository allows faster prototyping and testing of the migration procedure, as network latency and throughput are improved, while also ensuring that the source repository’s functioning is not affected in any manner. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 5 figure 1. main components and data flow in slam. areas in light blue are currently under development, while the components highlighted in green need to be adapted for each migration. the system splits the data to be migrated into four logical slices: bibliographic metadata, record files (e.g., pdfs of journal articles), persistent identifiers of records (pids, such as digital object identifiers or handles), and usage data (views and downloads). metadata is the first aspect to be considered. from the migration point of view, two dimensions are considered: the syntax and the semantics. metadata comes in various formats, such as csv or extensible markup language (xml) files, but most of these can be easily parsed by openly available software solutions. of more interest are the semantics of the metadata, which stem from the employed schemas or ontologies of field definitions; examples include dublin core (https://www.dublincore.org) or datacite (https://schema.datacite.org). a schema crosswalk, which describes how the fields in the target repository schema should be populated using the source data, needs to be set up when transferring records. while this should not be a concern if information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 6 the two repositories use the same schema, for the performed migrations (described below) this was not the case. other reasons for setting up such a crosswalk include • loosely defined schema in at least one of the repositories: certain repository systems do not specify a schema with clear field definitions, validations or applicability. 
by having the source repository administrators help with setting up a crosswalk, the migration team can avoid issues caused by incomplete understanding of the metadata. • support for the review of bibliographic records: migrations can prove to be an opportunity for reviewing and amending the records’ metadata; for example, infrequently used fields can be completely removed, and values which tend to confuse end users can be moved to other fields. • ensuring that a record on how the migration was performed, from the metadata point of view, is maintained. the crosswalk is considered an artefact of the migration and is preserved for future reference. in slam, the crosswalk is tested using elasticsearch, “an open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.”10 the setup uses the crosswalk to create elasticsearch documents which include all fields as they would be transferred to the destination repository. a kibana (https://www.elastic.co/products/kibana) dashboard is then used to inspect the records’ metadata and perform structured searches across the corpus. this can allow, for example, discovering fields which do not follow a consistent pattern for the values, as seen in figure 2. as the crosswalk includes, apart from the field mapping, altering operations that can be performed on each field, this analysis can facilitate the review process described by the second point above. while performing actual migrations, a number of inconsistencies that the source repository administrators were unaware of were surfaced by slam and corrected in the target repository. this is commonplace especially in large corpora spanning decades, where the repository metadata workflows and schemas changed multiple times. two points should be noted about this component: • this is the only component of the architecture for which we mention an actual solution chosen for the practical implementation, namely elasticsearch. while other solutions could have been chosen, such as the ones included in the bridge2hyku toolkit, elasticsearch proved to be the best fit for a highly automated system which requires analysis capabilities; it is a production-grade solution which can index a high number of documents and support complex queries, while also providing user-friendly analytical views via kibana. • there are arguments for loading the metadata in the analysis component without having it processed through the crosswalk; such a workflow could provide further insights into various issues in the corpus which are possibly obscured by the crosswalk. our practical experiences did not fully justify this requirement, while the actual implementation provided a mean to test the crosswalk, a major migration component; nevertheless, we are still considering the possibility of having to load the raw metadata for analysis in future migrations. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 7 figure 2. a view examining the possible values of the temporal coverage field from the dublin core schema in an institutional repository corpus to be migrated. this shows variation in the format of the values (full date, year only) which can cause issues when migrating to a schema which applies strict validation on date/time values, and thus need to be handled by the migration harness. 
this view is generated using kibana from the elasticsearch stack, employed by slam for metadata analysis purposes. with the crosswalk set up, the migration module can be completed. from a logical point of view, it comprises of four components: 1. metadata processing: this component uses the crosswalk in order to transfer the metadata to the target repository. 2. file upload: this simply uploads all files associated to a bibliographic record to their new locations. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 8 3. usage data transfer: most repositories implement counters for views and downloads of records, and this information, if available, is also transferred to the target repository. 4. persistent identifier update: if the records are using persistent identifiers, such as digital object identifiers (dois) (https://doi.org/) or handles (http://handle.net/), these are updated to resolve to the new locations in the target repository. while employing slam for migrations, cases in which persistent identifiers were not employed on source repositories were encountered, with records being accessible only via uniform resource locators (urls). as these cannot always be transferred across repositories, because each software uses its own url schema, it is advisable to implement persistent identifiers before migrations. figure 3. a simplified process diagram describing the steps required for migrating a bibliographic record. each successful operation is recorded in a persistent database which is used in subsequent runs for resuming the workflow. for example, files will not be uploaded each time the script is run, thus avoiding duplication. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 9 one of the architectural goals of slam is statefulness and this is implemented at this level, the migration module being designed as a state machine. a trivial example of such a state machine is shown in figure 3. the state machine status is serialised in a persistent database, with each migration run deserializing it in order to understand which operations still need to be applied for each record. maintaining such a registry provides several other benefits: • facilitates testing and prototyping: this was the original reason behind the architecture, useful especially before the metadata analysis functionality was implemented. if one of the operations required for transferring a record fails, subsequent runs will not apply all steps, but only the ones that did not complete. as for each record a separate state section is maintained, this becomes especially useful when migrating multiple entries; records which failed to migrate can be easily isolated and subsequently reprocessed. • allows creating reports on the migration: these are used, for example, to validate that all records were indeed transferred to the target repository. • allows the migration module to be portable: if the state machine serialisation is accessible, the module can run from different locations and at different points in time. the first architectural principle previously presented relates to the reusability of slam across migrations. the most common cause of divergence between migrations is related to the differences between repository solutions; slam isolates this concern by using two connectors, one for the source and one for the target repository. 
these connectors translate the information to be migrated to and from slam’s internal data model. thus, the source connector needs to be able to traverse the staging storage and provide slam with all the required record information, while the target connector will upload the records to the new repository (using a web-accessible api for example). this means that for each migration only three parts of slam need to be adapted (shown in green highlights in figure 1): the source and target connectors, and the metadata crosswalk. all other components can remain unchanged, thus reducing the technical development time. in the last step of slam’s workflow, the information that was used for the migration is sent to a long-term preservation storage, in order to ensure that it remains available for future reference. in our implementation, the following information is preserved: • original metadata and files, as extracted from the source repository. • metadata crosswalk from source to target repository. • migration script state machine serialisation. this information is sufficient for understanding the exact steps applied during the migration and, if required, for applying certain corrections to the migrated records at a future point in time. employing slam for real-world migrations slam was used for performing five repository migrations in one year, as described in table 1; the target repository in all five cases was figshare. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 10 table 1. overview of repositories migrated to figshare using slam. source repository identifier repository type software number of records ir1 institutional dspace 37,000 ir2 institutional dspace 25,605 d1 data custom 334 (105 gb) ir3 institutional digital commons 2,275 ir4 institutional dspace 15,474 slam’s viability was assessed based on the design principles outlined above. reusability, the main rationale behind slam, relates to being able to reuse as much of the system as possible across migrations. the architecture isolated the parts that required adaption from one migration to another (the connectors and the crosswalk); the time spent by a software engineer in order to set up these was monitored. the target here was to support the specialised staff on making domainspecific decisions, especially on the metadata crosswalk, by reducing the time needed to develop the three mentioned components. for example, the research excellence framework (ref) 2021 exercise in the united kingdom had strict metadata requirements, which required thorough testing in connection with current research information systems and open access monitoring solutions. between the first and fourth migration, this was reduced from six person-weeks to only two; it is important to note that slam evolved between the migrations, based on the lessons learned from each instance. statefulness, the property which allows re-processing already-migrated records, is covered in slam by the state machine implemented in the migration module, which is persistent and can be referenced in subsequent runs. all the migrations in table 1 required supplementary runs after all records were migrated, most frequently in order to fix metadata issues discovered after the full corpus was transferred. for example, ir1 required three such runs: 1. the first run fixed a number of issues caused by omissions in the metadata schema crosswalk. 2. 
the second run enriched the metadata using information taken from a current research information system (a source external to slam). 3. the last run corrected the usage statistics (view and downloads) which were incorrectly imported initially, due to incomplete understanding of the source repository’s database. due to slam’s design, no issues were encountered while performing these runs, as no records were duplicated, removed, or erroneously modified; this was manually checked by the repository administrators, either by sampling the corpus or by inspecting each migrated record, depending on the repository size. a key aspect highlighted by the requirement to reprocess migrated records relates to the granularity of the state machine. as an example, in ir3 a second run required attaching supplementary files to a number of migrated records, and this posed a challenge due to the fact that the state machine only recorded if all files have been uploaded, and not which files were successfully added to the record. thus, the state machine was amended to record the complete list of record files, allowing for more granular control over this processing step. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 11 the last concern, fault tolerance, was achieved by applying basic software engineering principles, such as fail-fast (report migration issues as soon as they manifest), the implementation of proper exception handling (such as not to ignore any potential issues), and addition of enhanced logging in order to provide a complete record of the processing steps. for each of the five migrations, slam ran unsupervised, reporting at the end of each run the records for which an issue was encountered. as an example, in the ir4 migration, slam initially failed to migrate 300 records. these were reported to the operator, and after minor fixes were applied to the metadata crosswalk the migration completed successfully. fault-tolerance plays a central role in ensuring that during migrations no data is lost or corrupted, by surfacing any edge-case that might have been missed during the development of the metadata crosswalk, repository connectors, or core migration module, while also isolating such issues to the records exhibiting them, with no impact on the full corpus. future directions while proven viable in real-world scenarios, a number of areas which can benefit from further improvements were identified through an analysis of the current implementation, based on the experiences of the five migrations. first, the migration-specific components (connectors and metadata crosswalk, shown in green in figure 1) require further decoupling from the core migration module. for example, since all migrations considered figshare as a target repository, this connector is currently strongly interlinked with the core module, in order to save development time according to business requirements and migration timelines. further decoupling will ensure that the core migration module’s design is not influenced in any way by the repository’s architecture and capabilities. completing this work will also allow making the source code of our current implementation of slam publicly available, as in its current state it is making use of proprietary components which are employed across other parts of the figshare platform. 
aside from these, the source code includes straightforward python modules and makes use of open technologies such as elasticsearch, which will allow the larger community to adapt and use slam with other source or target repositories, or even enhance it with further functionality. nevertheless, the general architecture can already be implemented in any other way or using a different set of technologies. further to this point, the metadata crosswalk is currently influenced by the logic and design of the migration module; for example, it uses the same procedural programming language, python, as all other components of slam. employing technologies such as extensible stylesheet language transformations (xslt, for metadata in xml formats) or sparql (for rdf) will help involve staff with in-depth domain knowledge further in the migration, for whom these technologies are more familiar; moreover, such a design does not require any knowledge of slam’s internal processes. second, the five completed migrations highlighted the importance of reviewing, correcting, and enhancing records during the migration. for example, when migrating a journal article’s version of record in an open access context, special care needs to be given to its metadata (title, authors, journal name, publication date or persistent identifier), as mistakes can generate issues with scholarly search engines which will not be able to link the published version to the repository one. a possible input for comparing and correcting existing metadata is the information contained by current research information systems, which aggregate information from various databases, such as scopus (https://www.scopus.com/). if access to such systems is not available, it is possible to information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 12 source metadata from open directories, such as crossref (https://www.crossref.org/). this component is included in the architectural overview presented in figure 1. the third area in need of improvement relates to testing the outcome of the migrations. as mentioned in the previous section, this is currently a manual process and can be both cumbersome and error prone. while in line with slam’s philosophy of automating every step of the process, implementing a mechanism for validating the end migration result could also provide stronger assurances on the completeness and correctness of the migration. finally, slam’s preservation module requires further development in order to ensure that it is fully automated; moreover, the possibility of adding a manifest explaining the migration artefacts needs to be considered, as knowledge on the organisation of the information, which is specific to each migration, might be lost in time. it is important to note that architecture-wise, which was the main concern of this work, we did not identify any major shortcomings in slam—most issues discussed above focus on implementation issues. slam’s modular design will facilitate any additions to the system, required to support new use cases and migrations. conclusions this paper describes slam, the stateful library analysis and migration system, an etl software architecture for performing digital library migrations. 
what differentiates such transfers from other data migrations is the required domain knowledge, the particularities of the target and source repositories in the context of the scholarly communications ecosystem, and the structure of the migration package, which includes, among others, bibliographic metadata, record files, and usage data. digital libraries are an integral part of the cultural heritage; thus, any migration needs to ensure that no information is lost or corrupted in the process. the main contributions brought by slam are 1. it includes an analysis module based on an industry standard search engine, elasticsearch, which allows operators to analyse the metadata and schema crosswalk, facilitating the decisions required for properly migrating information between repositories; 2. it implements a serializable state machine in its migration module, which facilitates running the migration procedures multiple times without duplicating, removing, or corrupting records, while allowing for corrections to be applied to the corpus; 3. it follows a modular design, which enhances its reusability across multiple migrations, by reducing the development time required for adapting the system to new source and target repositories. slam applies established software engineering principles in order to provide a trustworthy tool to digital library administrators that need to transfer content between systems. its design was both influenced and validated by real-world applications, having been used for five different migrations with various requirements and targeted repository solutions. future work will consider enhancing slam’s metadata analysis and enrichment capabilities as well as the collection of further data points on its performance and possible improvement directions while using it for new digital library migrations. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 13 endnotes 1 david scherer and dan valen, “balancing multiple roles of repositories: developing a comprehensive repository at carnegie mellon university,” publications 7, no. 2 (2019), https://doi.org/10.3390/publications7020030. 2 directorate-general for research & innovation, “h2020 programme—guidelines to the rules on open access to scientific publications and open access to research data in horizon 2020,” version 3.2, march 21, 2017, https://web.archive.org/web/20180826235248/http://ec.europa.eu/research/participants/ data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf; national institutes of health, “nih public access policy details,” last updated march 25, 2016, https://web.archive.org/web/20180421191423/https://publicaccess.nih.gov/policy.htm. 3 the ref, “research excellence framework,” https://web.archive.org/web/20191215143352/https://www.ref.ac.uk/. 4 roger c. schonfeld, “elsevier acquires bepress,” scholarly kitchen (blog), august 2, 2017, https://web.archive.org/web/20191212183253/https://scholarlykitchen.sspnet.org/2017/0 8/02/elsevier-acquires-bepress/. 5 alan hyndman, “announcing the figshare institutional repository… and data repository… and thesis repository… really just an all-in-one next gen repository,” figshare (blog), march 22, 2018, https://figshare.com/blog/announcing_the_figshare_institutional_repository_and_data_repos itory_and_thesis_repository_really_just_an_all-in-one_next_gen_repository/389. 
6 alan hyndman, “figshare to power chemrxiv™ beta, new chemistry preprint server for the global chemistry community,” figshare (blog), august 14, 2017, https://web.archive.org/web/20191218194210/https:/figshare.com/blog/_/322. 7 bepress, “digital commons dashboard,” https://web.archive.org/web/20191218192450/https://www.bepress.com/reference_guide_ dc/digital-commons-dashboard/. 8 steve van tuyl et al., “are we still working on this? a meta-retrospective of a digital repository migration in the form of a classic greek tragedy (in extreme violation of aristotelian unity of time),” code{4}lib journal no. 41 (august, 9, 2018), https://journal.code4lib.org/articles/13581; do van chau, “challenges of metadata migration in digital repository: a case study of the migration of duo to dspace at the university of oslo library” (master’s thesis, university of oslo, 2011), http://hdl.handle.net/10642/990. 9 danijela tešendić and danijela boberić krstićev, “business intelligence in the service of libraries,” information technology and libraries 38, no. 4 (2019), https://doi.org/10.6017/ital.v38i4.10599. 10 “what is elasticsearch?” elasticsearch bv, http://web.archive.org/web/20191207032247/https://www.elastic.co/whatis/elasticsearch. navigating uncharted waters: utilizing innovative approaches in legacy theses and dissertations digitization at the university of houston libraries article navigating uncharted waters utilizing innovative approaches in legacy theses and dissertations digitization at the university of houston libraries annie wu, taylor davis-van atta, bethany scott, santi thompson, anne washington, jerrell jones, andrew weidner, a. laura ramirez, and marian smith information technology and libraries | september 2022 https://doi.org/10.6017/ital.v41i3.14719 annie wu (awu@uh.edu) is head of metadata and digitization services and the ambassador kenneth franzheim ii and mrs. jorgina franzheim endowed professor, university of houston libraries. taylor davis-van atta (tgdavis-vanatta@uh.edu) is director of the digital research commons, university of houston libraries. bethany scott (bscott3@uh.edu) is head of preservation and reformatting, university of houston libraries. santi thompson (sathompson3@uh.edu) is associate dean for research and student engagement and the eva digital research endowed library professor, university of houston libraries. anne washington (washinga@oclc.org) is semantic applications product analyst, oclc. jerrell jones (jjones46@uh.edu) is digitization lab manager, university of houston libraries. andrew weidner (andrew.weidner@bc.edu) is head of digital production services, boston college libraries. a. laura ramirez (alramirez@uh.edu) is senior library specialist, university of houston libraries. marian smith (mrsmith8@uh.edu) is digital photo tech, university of houston libraries. © 2022. abstract in 2019, the university of houston libraries formed a theses and dissertations digitization task force charged with digitizing and making more widely accessible the university’s collection of over 19,800 legacy theses and dissertations. supported by funding from the john p. mcgovern foundation, this initiative has proven complex and multifaceted, and one that has engaged the task force in a broad range of activities, from purchasing digitization equipment and software to designing a phased, multiyear plan to execute its charge. 
this plan is structured around digitization preparation (phase one), development of procedures and workflows (phase two), and promotion and communication to the project’s targeted audiences (phase three). the plan contains step-by-step actions to conduct an environmental scan, inventory the theses and dissertations collections, purchase equipment, craft policies, establish procedures and workflows, and develop digital preservation and communication strategies, allowing the task force to achieve effective planning, workflow automation, progress tracking, and procedures documentation. the innovative and creative approaches undertaken by the theses and dissertations digitization task force demonstrated collective intelligence resulting in scaled access and dissemination of the university’s research and scholarship that helps to enhance the university’s impact and reputation. introduction to answer the call of implementing university of houston (uh) libraries strategic plan to position the libraries as a campus leader in research and transform library space to reflect evolving modes of learning and scholarship, the uh libraries launched a cross-departmental task force in 2019 charged with digitizing the university’s extensive print theses and dissertations collection. providing online access to newly digitized theses and dissertations boosts the reach and impact of our institution’s research and scholarship while expanding available space for computing, mailto:awu@uh.edu mailto:tgdavis-vanatta@uh.edu mailto:bscott3@uh.edu mailto:sathompson3@uh.edu mailto:washinga@oclc.org mailto:jjones46@uh.edu mailto:andrew.weidner@bc.edu mailto:alramirez@uh.edu mailto:mrsmith8@uh.edu information technology and libraries september 2022 navigating uncharted waters | 2 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith technology, and faculty and student learning and research activities. a study by bennett and flanagan revealed the positive impact and benefits of online dissemination of theses and dissertations, including enhanced discoverability by google’s strong indexing capabilities, significant increase in the usage of the works, and an overall enhancement of the reputation of an institution.1 encouraged by the positive outcomes and supported by funding from the john p. mcgovern foundation to initiate this project, the theses and dissertations digitization (tdd) task force developed a phased project plan and utilized creative, automated processes and methods to execute it. this article articulates the tdd project planning and the innovative work undertaken by the task force to achieve efficiency in making our print theses and dissertations readily available to new readerships around the world. literature review over the past several decades, research libraries have been building programs around digitization and open access repository infrastructures, largely aimed at expanding their digital collections and engaging communities with newly available research materials. for some, part of their programming has included projects that digitize their institution’s legacy print collections of theses and dissertations. the review below explores literature on the mass digitization process , including institutional case studies, guidance documents, legal and policy papers, and local documentation developed as libraries have planned and implemented these projects. 
any library tackling a retrospective thesis and dissertation project needs a framework for determining the copyright status of these works en masse. perhaps it is no coincidence, then, that copyright concerns are the most heavily documented aspect of the process. clement and levine provide the definitive work to date on copyright and the publication status of theses and dissertations written in the united states before 1978. their study asserts that “p re-1978 american dissertations were considered published for copyright purposes by virtue of their deposit in a university library or their dissemination by a microfilm distributor.”2 they go on to write that, “for copyright purposes, these were acts of publication with the same legal effect as dissemination through presses, publishers, and societies.”3 they suggest that libraries should investigate the copyright status for theses and dissertations authored between 1909 and 1978 (typically found on the title page and verso); if there is no copyright notice, then the thesis or dissertation is likely in the public domain and eligible for digitization and public release without permission. moreover, even those works that have a printed copyright notice might have fallen out of copyright if they were not renewed after 28 years for the same length of time.4 broad guidance and best practice for copyright status and other matters of process around theses and dissertations is provided in guidance documents for lifecycle management of etds, which acknowledges that legal services may be required for some retrospective thesis and dissertation digitization projects, especially “before scanning without the permission of former students .”5 the authors assert that information professionals should investigate any “appropriate access options” with institutional legal expertise before engaging in a retrospective digitization project and articulate the two most commonly encountered copyright scenarios: “[either] former student authors may not allow the reproduction and open dissemination of their work, or unauthorized copyrighted material was used in the original theses and dissertations.”6 strategies that might be employed to determine copyright status include “consulting with legal counsel at one’s institution to see where it stands on this issue; negotiating with commercial entities that make such content information technology and libraries september 2022 navigating uncharted waters | 3 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith available at a price so that institutions can have some control over it for the purpose of broader access; and working with groups such as alumni associations, colleges, departments, and graduate schools to establish contact with thesis and dissertation authors for securing their permission to digitize, and render available online, their past scholarship.”7 on the question of public access to newly digitized works, the guidance documents detail the implications of the “transition from print to electronic,” which “has led to increased scrutiny over who will be allowed to access the electronic versions and how widely they will be disseminated.”8 when there is any legal doubt, there are many reasons for libraries to exercise caution and restrict access to electronic theses and dissertations; that said, “research available on the web immediately upon submission of the final, approved thesis can prove advantageous to the newlydegreed student, the institution, and other researchers.”9 again, 
consulting legal officers and the original authors, if possible, remains the consensus approach to establishing a strategy for access to digitized theses and dissertations. the guidance documents also touch on the thorny issue of digitizing theses and dissertations that contain third-party content. they summarize the history and routine application of the fair use doctrine in both the creation and dissemination of scholarly works but provide little firm guidance on the matter.10 indeed, after reviewing the entire body of literature on retrospective thesis and dissertation projects, this remains a practical challenge that any library undertaking a mass digitization project must consider and the associated risks must be accounted for. in recent years, several case studies have documented institutions’ efforts to digitize and make more widely available legacy theses and dissertations. of the institutions that the tdd task force reviewed for the environmental scan, none of their case studies attempts an exhaustive documentation of end-to-end workflows and processes developed to execute the task; most focus on particularly difficult questions inherent to the process. martyniak provides a rationale for the university of florida’s (uf) retrospective scanning project and details their process for contacting authors before works were scanned.11 the workflow outlines several points of contact with authors to obtain signed distribution agreements, as well as uf’s approach to automate this process as much as possible. notably, the distribution agreement form and correspondence templates are provided as appendices to the article.12 as part of this retrospective digitization project, uf also released a scanning policy that articulates their approach to determining the copyright status of works and their resultant practice. 13 this policy document is an excellent example of an institution’s implementation of clement and levine’s research described above. likewise, mundle describes the methods used by simon fraser university (sfu) to establish its approach to the issues of copyright status and access, ultimately resulting in a public thesis access policy and procedures for contacting authors whenever possible to offer them the ability to opt their work out of the project.14 unlike the uf, sfu began scanning before any explicit permission had been obtained from authors. 
sfu also shares their use of scripts to automate the ingest of metadata from original marc records into their dspace repository.15 piorun and palmer, meanwhile, focus on an analysis of the time and cost associated with digitizing 300 doctoral dissertations for a newly implemented institutional repository at the university of information technology and libraries september 2022 navigating uncharted waters | 4 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith massachusetts chan medical school.16 piorun and palmer detail the library’s process for obtaining cost comparisons from external vendors as well as estimated costs, including labor, associated with undertaking the task in house.17 issues of workflow, policy development, and permissions are also addressed with an emphasis on developing accurate and streamlined methods of processing works; however, piorun and palmer conclude, “regardless of the amount of planning and thought that goes into a project, there is always the possibility that each record or file will need to be reworked.”18 shreeves and teper discuss theses and dissertations’ complicated status as grey literature and the university of illinois urbana-champaign (uiuc) library’s digitization project, which they describe as “less of a collection management or preservation issue and more as an effort to tackle broader scholarly communication and outreach issues.”19 after consulting with university legal counsel, digitized works were ingested to the uiuc institutional repository as a restricted (campus -only access) collection. as authors provide consent, access to their work is broadened to the public. worley demonstrates that, according to an analysis of circulation numbers, works that are accessible electronically are used dramatically more than print copies, serving as rationale for undertaking digitization of student works.20 they provide significant detail around virginia polytechnic institute and state university’s process to establish file specifications for its digitization process, and image quality/resolution and file format selection are discussed in some detail, with helpful visual examples.21 these case studies are particularly valuable in that they provide evidence and cautionary tales around how local contexts have made a difference in copyright and workflow issues. this case study contributes to the existing body of literature by attempting to provide an exhau stive, end-toend description of the retrospective digitization process—from copyright evaluation, to physical handling, to digitizing with an eye to access controls and digital preservation concerns. furthermore, our approach to digitization at scale incorporates automation at several points throughout the workflow, representing a production improvement to the decade-old case studies we reviewed. project planning and execution digitizing a large corpus of print theses and dissertations is a complex process touching areas of equipment, copyright policy, workflows for different sections of the process, progress tracking, preservation, and communication. to handle such a multifaceted project, the tdd task force designed a plan that divided the project activities into three phases (see table 1). phase one is dedicated to tasks of preparation such as the environmental scan, copyright permission investigation, digitization equipment purchasing, and print theses and dissertation inventory. 
phase two includes activities such as digitization and metadata workflow development, documentation, project tracking, ingestion, and preservation of digitized files. phase three is mainly for promotion and communication to our researchers on the availability of our digitized theses and dissertations collection. task force members volunteered to serve in subteams for identified specific tasks in each project phase. information technology and libraries september 2022 navigating uncharted waters | 5 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith table 1. phased planning for the tdd project theses and dissertation digitization task force planning phases task force activities subteams (*subteam lead) phase one: preparation environmental scan jerrell, anne, crystal, santi*, bethany physical theses/dissertations inventory/retention bethany copyright permissions and policies bethany, annie, taylor* purchase equipment jerrell*, crystal phase two: workflow development td digitization workflow jerrell, crystal* td metadata workflow anne*, taylor, annie td ingest and publishing workflow andrew, taylor td progress tracking annie, andrew preservation/storage strategy bethany*, santi phase three: communication/promotion promote dtd to colleagues and researchers taylor*, santi communicate progress to staff and users annie*, santi* develop training materials for stakeholders anne*, crystal* phase one: preparation a subgroup of the tdd task force conducted an environmental scan of similar theses and dissertations digitization approaches previously used by other institutions. the lead for the subgroup created a google sheet that all group members used to document information found in published literature, public documents, and institutional websites. the lead assigned group members to review information from institutions with publicly available data, including: the university of florida, the university of north texas, the university of illinois urbana champaign, brigham young university, william and mary university, texas a&m university, the university of arizona, the california institute of technology, the massachusetts institute of technology, iowa state university, xavier university, texas tech university, and the university of british columbia. group members noted relevant information pertaining to a variety of topics focused on theses and dissertations digitization. one of the most prominent was the institution’s response to copyright permissions. the group tried to determine if the institution required author permission before releasing a digitized thesis or dissertation (the “opt in” option), or incorporated policies and procedures that prioritized taking down digitized theses and dissertations once requested by the author (the “opt out” option). they observed software and hardware specifications used by other institutions—critical data that would inform the technology needed to complete a project of this scale. the group documented the key components of the digitization and metadata workflows, including roles and responsibilities, sequencing of actions, and the implications that policies and procedures had on the process. this data helped the group understand what gaps, common problems, and emerging best practices existed. finally, the group reviewed physical retention and preservation strategies articulated by institutions to ensure it understood the long-term stewardship hurdles and requirements for analog and digital material. 
information technology and libraries september 2022 navigating uncharted waters | 6 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith based on the assessment of the 19,800 uh theses and dissertations identified for inclusion in the project, the digitization subgroup members determined that several scanners would be required for agility in digitization production. the tdd digitization workflow was designed so that this project could run effectively, in parallel, with existing digitization projects regardless of the need for some theses to be scanned on existing equipment. automatic document feed (adf) scanners were a strategic choice for the rapid scanning of disbound items. two canon dr-g2110 adf scanners were purchased for the project. these scanners were chosen for their scanning speed, scanning quality, ease of use, onboard image preprocessing, and reasonable price point. the canon dr-g2110 can handle a large page stack, approximately 500 pages. theses and dissertations can be scanned on the longer or shorter dimension, which allows faster scanning times. among many innovative features, this duplex scanner simultaneously digitizes both sides of a document, rotates the pages based on text orientation, and auto-crops through preprocessing during the scanning process. this canon adf solution makes more image postprocessing automation possible since the resulting scans match our output expectations with minimal user input. other scanning options were needed for a smaller subset of theses and dissertations that could not be disbound. the digitization team leveraged an existing zeutschel os 12002 planetary scanner for items that could not be disbound. an existing plustek opticbook (po) a300 plus was used for items with foldouts containing graphs, maps, and illustrations that measure beyond 11 inches on the longest dimension. additionally, a plustek opticbook 3800l was purchased to accommodate fragile us letter–sized pages that are not suitable for adf scanning. thin or heavily waxed papers typically do not stand up well to the fast-moving rubber rollers and other internal scanning mechanisms. while the po 3800l provides a much longer scanning time than the po a300, both scanners can scan into the page gutter of bound materials, a useful feature for items with insufficient margins. figure 1. image processing workflow testing on a thesis in limb processing. the green check marks on the left indicate that a page has been processed correctly. information technology and libraries september 2022 navigating uncharted waters | 7 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith the canon adf scanner operates through two pieces of software working concurrently, canon captureontouch v4 pro and kofax vrs (an additional product supplied by the task force’s scanner vendor). some image processing settings are applied in kofax vrs, which communicates with captureontouch v4 pro. both pieces of software were bundled with purchases of the two canon adf scanners. limb processing by i2s was also purchased for the project. limb processing is a powerful mass image processing product that operates through user-built processing workflows that can be applied to multiple folders, creating standardized output suitable for automation. the limb software can transform an imperfect scan into a fully processed, clean derivative with minimal user input, which is especially useful for transforming legacy image data. 
abbyy finereader server 14 is used to provide quality optical character recognition (ocr) data and features efficient tools for automation, allowing for large ocr processing jobs to be queued and run recursively with minimal user intervention (see fig. 1). with these powerful tools, uh libraries has been able to leverage our existing scanners, new scanners, and advanced software to plan for the timely capture of nearly three million pages of content. the number of theses and dissertations required the implementation of a semiautomatic disbinding system. the spartan 185a paper cutter from the challenge machinery company was purchased to ensure the replication of many clean binding removals. options from several manufacturers were considered for these needs but the spartan 185a offered the cutting power needed to cut millions of pages over the life of the project (see fig. 2). the cutter features several safeguards that protect the operator, such as the lowering of a protective acrylic guard and the requirement of two hands, away from the blade, to lower the blade automatically. uh libraries chose a local cutter blade replacement company that services the equipment quarterly. in addition to the cutter, supplies for binding removal and physical volume management were needed, such as: • x-acto knife and/or utility blades • recycling bins • table brooms and dustpans • disbinding tables • cutting mats • standing mats • letter and legal-size folders • folder holders • surface cleaning materials • carts/book trucks information technology and libraries september 2022 navigating uncharted waters | 8 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith figure 2. (l) thesis scanning test on the zeutschel os 12002; (r) challenge spartan 185a cutter with red numbers indicating blade lowering and cutting safety button order. the physical retention of volumes was considered in the context of the overall preservation of the theses and dissertations collection, including the digital preservation approach to the tdd project. the uh libraries holds two copies of a student’s thesis and dissertation. after consulting with stakeholders throughout the library—such as the university archivist, the dean of libraries and associate deans, and access services’ shelving team for shelf space/storage in different areas of the library building—the task force decided to retain one bound copy of each thesis or dissertation. additional copies will be weeded from the general collection, and the best copy for digitization will be disbound for feeder scanning using the equipment described above. when only one copy of a thesis or dissertation exists in the collection, it will be scanned using a scanner that will not destroy or damage the binding. the retained theses and dissertations collection will be housed in uh libraries special collections in the secure and climate controlled closed stacks. once the tdd task force settled on this retention strategy, the digital projects coordinator, a member of the task force who represents special collections, conducted a full shelf -read of the theses and dissertations already housed in special collections. using a master tracking spreadsheet that was generated from catalog reports for project tracking and pulling, a small team of student workers reviewed over 20,000 volumes to identify missing titles, titles with multiple copies that can be weeded from special collections, and copies with label and/or cataloging errors. 
missing titles were transferred from the general stacks to special collections, and the items were reshelved in chronological order. a more extensive shelving shift still needs to be completed to move volumes to accommodate additions and finalize the shelf location for all items in this collection, which will no longer be growing because all theses and dissertations at uh have been submitted electronically since 2014. as part of the shifting project, the items also need to be checked in and/or have their location codes changed in the catalog to reflect their new permanent home in special collections.

phase two: workflow development
the theses and dissertations digitization workflow starts with pulling physical volumes from shelves. the task force generated a report of all uh theses and dissertations and sorted them by call number so that student workers can pull these volumes from the general stacks in call number order. after the pulled volumes' records are withdrawn from the catalog system, they are shelved by call number order in the "ready for digitization" section of the tdd shelf in the library basement, close to the digitization lab. volumes are pulled from a section of the library stacks dedicated to the tdd project and loaded onto book carts for transfer to the physical volume processing room. using a custom-built processing table, covers are removed with utility knives and discarded. the text is placed in a folder with a pre-printed label indicating the oclc number and call number of the volume. the spine of each volume is removed with a spartan guillotine. the completely disbound volumes, housed in labeled folders, are then moved on book carts to the scanning room. prior to scanning, physical volumes are grouped in batches of approximately 50 and a text file is created that lists each oclc number in a batch, one per line. a simple executable file reads the text file and creates a batch directory. the batch directory is labeled with the current date in yyyymmdd format and contains a folder for each scanned volume. the scanned volume folders are labeled by oclc number and contain a metadata.txt file that records the volume's descriptive metadata from the uh catalog system in yaml format: a data carrier that is easily readable for both humans and machines. scanning is performed with one of the two canon dr-g2110 high-speed feed scanners controlled by kofax vrs and captureontouch v4 pro software. before a volume is placed in the scanner, it is checked to ensure that the binding has been completely removed, that there are no pages that have been glued in after binding, and that there are no onion skin pages, irregular page sizes, inserts, or foldouts. if necessary, additional scans for delicate onion skin pages, inserts, or foldouts are performed on a flatbed scanner. page images are output as 300 dpi grayscale or color tiffs, and first-pass quality control for completeness, page legibility, rotation, and cropping is performed in captureontouch. after page images have been captured, a batch is loaded into limb for final processing. scanned volumes are again checked for completeness, legibility, and orientation. text pages are processed as 300 dpi bitonal tiffs. pages with grayscale or color images are processed as such.
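the batch-preparation step described above lends itself to a short illustration. the ruby sketch below is not the task force's actual executable; the oclc_numbers.txt input file and the lookup_catalog_metadata helper are hypothetical stand-ins for the oclc number list and the catalog lookup.

```ruby
# batch_setup.rb -- illustrative sketch only; not the uh tdd executable.
# 'oclc_numbers.txt' and lookup_catalog_metadata are hypothetical stand-ins.
require 'fileutils'
require 'yaml'
require 'date'

# stub: in the real workflow, descriptive metadata comes from the uh catalog system
def lookup_catalog_metadata(oclc_number)
  { 'oclc_number' => oclc_number, 'title' => 'unknown', 'creator' => 'unknown' }
end

batch_dir = Date.today.strftime('%Y%m%d') # batch directory labeled yyyymmdd
FileUtils.mkdir_p(batch_dir)

File.readlines('oclc_numbers.txt', chomp: true).each do |line|
  oclc = line.strip
  next if oclc.empty?
  volume_dir = File.join(batch_dir, oclc) # one folder per scanned volume, named by oclc number
  FileUtils.mkdir_p(volume_dir)
  # metadata.txt records the volume's descriptive metadata in yaml
  File.write(File.join(volume_dir, 'metadata.txt'), lookup_catalog_metadata(oclc).to_yaml)
end
```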
when batch processing is complete, the documents' processed signature pages, which include names and signatures of the author, advisor, and committee members, are separated out so they are not included in the final version published online. this step protects the privacy of individuals by not sharing their signature openly over the internet. the tdd project uses abbyy finereader server 14 to generate full text pdfs and a plain text file for each scanned volume. the data in each scanned volume directory undergoes transformations both before and after the ocr processing. the transformations are accomplished with the tdd workflow utility, a ruby command line application. before running a batch through ocr, the archive digitized batch function moves the high-resolution master tiffs to an archive directory and formats the batch directory for the input that abbyy expects. after ocr processing, the archive ocr batch function moves the derivative tiffs used as ocr input to the archive directory as well. in a final process before sending the batch to the metadata unit, the process ocr batch function adds descriptive metadata to the embedded exif metadata of each pdf document with exiftool for improved accessibility. the tdd task force sought to align print materials' metadata standards with the existing metadata standards applied to electronic theses and dissertations in the university's institutional repository, largely based on the dictionary of texas digital library descriptive metadata guidelines for electronic theses and dissertations, v.2.22 early on in the project, the metadata subteam reviewed thesis and dissertation records in the institutional repository (ir) as well as marc catalog records in uh libraries' library services platform, with special emphasis on the metadata elements used, to identify alignments and gaps. after analysis, the team established the crosswalk from marc to the qualified dublin core profile in the ir (see table 2). in july 2019, uh libraries migrated to the alma library services platform. prior to this migration, the task force exported tdd marc records from uh libraries' former library services platform, sierra, and crosswalked into dublin core metadata fields using the freely available software marcedit. data was further normalized using openrefine. at this early stage, openrefine proved to be a valuable tool for batch editing and formatting metadata and identifying legacy terms or missing data. once the crosswalked data was cleaned up and put into place, standard values for all records were added (see table 3).

table 2. metadata crosswalk from marc to qualified dublin core
metadata field | marc field | qualified dublin core element
oclc number | 001, 035 $a | dc.identifier.other
call number | 099 | [n/a, admin use only]
author name | 100 $a | dc.creator
title | 245 $a $b | dc.title
thesis year | 264 $c | dc.date.issued
degree information | 500, 502 $a | thesis.degree.name
subject | 6xx fields | dc.subject
department | 710 $b | thesis.degree.department

during the ongoing processing of digitized materials and as part of the quality control, each volume's metadata is evaluated against its corresponding metadata record and edited when necessary.
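table 2 above amounts to a simple field mapping. the task force performed the crosswalk with marcedit and openrefine rather than custom code, but as an illustration the sketch below expresses the same mapping in ruby using the marc gem; the input and output file names are hypothetical placeholders, and only the first subfield of each field is taken.

```ruby
# crosswalk_sketch.rb -- hypothetical illustration of the table 2 mapping;
# the project itself used marcedit and openrefine, not this script.
require 'marc' # ruby marc gem
require 'csv'

CSV.open('tdd_dublin_core.csv', 'w') do |csv|
  csv << %w[dc.identifier.other dc.creator dc.title dc.date.issued
            thesis.degree.name dc.subject thesis.degree.department]
  MARC::Reader.new('tdd_records.mrc').each do |record|
    subjects = record.fields(('600'..'699').to_a).map { |f| f['a'] }.compact
    csv << [
      record['001']&.value,                                                        # oclc number
      record['100'] && record['100']['a'],                                         # author name
      record['245'] && [record['245']['a'], record['245']['b']].compact.join(' '), # title
      record['264'] && record['264']['c'],                                         # thesis year
      record['502'] && record['502']['a'],                                         # degree information
      subjects.join('; '),                                                         # 6xx subjects
      record['710'] && record['710']['b']                                          # department
    ]
  end
end
```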
in an effort to enrich the metadata available to users and increase visibility of the volumes, information not typically provided in the marc records, such as thesis committee chairs, other committee members, and abstracts, is added to the records using the dublin core contributor (dc.contributor.committeemember) and abstract (dc.description.abstract) elements.

table 3. standard values added to all records
qualified dublin core element | value
dc.format.mimetype | application/pdf
dc.type.genre | "thesis" or "dissertation," as applicable
thesis.degree.grantor | university of houston
dc.type.dcmi | text
dc.format.digitalorigin | reformatted digital

in the interest of closely observing copyright best practices, members of the tdd task force, including the digital projects coordinator and the director of the digital research commons, created copyright review guides and applicable rights statements. under these guidelines, theses and dissertations are considered under copyright if a copyright notice appears on a volume created in 1977 or earlier; if the item was created between january 1, 1978, and february 28, 1989, and was registered with the us copyright office within five years of its creation; or if it was created on march 1, 1989, or later. inserts and other research material provided in the volumes are similarly considered for copyright evaluation during the copyright review process. once a volume has been evaluated for copyright status, an out-of-copyright or in-copyright statement is assigned. in alignment with the uh libraries' mission to provide valuable research and educational materials, digitized volumes and metadata records are then ingested into the institutional repository.23 in this stage of the process, out-of-copyright volumes are made available as open access materials. due to the limitations inherent in their copyright status, in-copyright volumes are access restricted and available solely to the university community. when content is ready for ingest, volumes are moved to the ingest folder and placed in staging directories based on rights status: open access or in copyright. the ingest process is the same for both types of content, but in-copyright content requires additional post-ingest processing, so ingest batch folders are labeled according to rights status for clarity. the tdd workflow utility's prepare ingest package function is used to create ingest packages in an input format expected by the saf creator, a utility for preparing dspace batch imports in the simple archive format.24 pdf files are copied and renamed in the format lastname_year_oclcnumber.pdf, a csv file is created with descriptive metadata for the batch, and the original files and metadata are moved to an archive directory. the saf creator is then used to create an saf ingest package that is imported into dspace. limiting access to copyrighted content was a necessary component of the project that took some time to solve. the team investigated creating a separate collection for the in-copyright content with access limited to users logged in with uh credentials. the downside to this approach was that the content within the restricted collection was not discoverable to users who were not logged into the ir.
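the copyright determination described above reduces to a small decision rule. the ruby sketch below restates a simplified reading of those guidelines for illustration; the method name, keyword arguments, and date boundaries are interpretations of the prose, not the task force's implementation.

```ruby
# copyright_status_sketch.rb -- a simplified, hypothetical reading of the
# review guides described above; not the task force's actual logic.
require 'date'

NOTICE_ERA_END   = Date.new(1977, 12, 31) # 1977 and earlier: in copyright only with a notice
REGISTRATION_END = Date.new(1989, 2, 28)  # 1978 through feb 28, 1989: registration window
# march 1, 1989 and later: in copyright automatically

def in_copyright?(created_on:, has_notice: false, registered_within_five_years: false)
  if created_on <= NOTICE_ERA_END
    has_notice
  elsif created_on <= REGISTRATION_END
    registered_within_five_years
  else
    true
  end
end

# example: a 1985 dissertation registered within five years would be flagged in copyright
puts in_copyright?(created_on: Date.new(1985, 5, 1), registered_within_five_years: true)
```

in this reading, a true result would correspond to the in-copyright rights statement and the restricted staging directory, and a false result to open access.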
in the end, the tdd task force worked with the texas digital library, a consortial nonprofit organization that hosts uh libraries’ dspace repository, to enable restricted access using bitstream authentication with shibboleth. this allows the task force to ingest all tdd project content into a single collection and apply uh authentication to copyrighted pdf documents. in this manner, descriptive metadata for all documents is discoverable, but access to the document content is only available to members of the uh community. applying authentication to bitstreams in the dspace administrative interface is a tedious process involving numerous clicks and dropdown menu selections. selenium ide, a browser plug-in designed for automated web development testing, is used to automate that process in the firefox web browser. after an in-copyright batch has been ingested, the tdd workflow utility’s prepare selenium script function is used to create an automation script for selenium. when loaded in the firefox selenium add-on, the script automatically applies the bitstream authentication steps in the browser for each volume in the batch. the tdd workflow comprises detailed tasks carried out at different units in the library in a sequential routine as an assembly line. tdd activities flow from pulling volumes from shelves to disbinding, scanning, image quality control and ocr, metadata creation and copyright evaluation, and digitized files ingestion into the dspace system. as the tdd task force worked collaboratively to develop and confirm workflows for this complicated process, they documented each section of the workflow in the one-stop tdd workflow google document for easy access and transparency of the overall process.25 the tdd working group members notify each other at completion of tasks at each section. to better track each thesis and dissertation as it moved through the digitization, metadata, and copyright verification workflows, the task force developed an excel spreadsheet tracking system.26 this tracking system lists uh libraries’ theses and dissertation titles, their oclc numbers, dates, and call numbers. it records the tdd volumes pulled from shelves, digitization completed, digitization batch, borrower notes, metadata completed, and other notes. this tracking system provides a channel for the team members to inform each other of completed tasks at each unit and to communicate issues in the working process (see fig. 3). information technology and libraries september 2022 navigating uncharted waters | 13 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith figure 3. a screenshot of a portion of the tdd tracking system. phase three: promotion, communication, and next steps it is important to have strategies for tdd promotion and communication to raise awareness of the online availability of the university’s legacy theses and dissertations. the tdd task force brainstormed elements such as audience, channels, and timeline for tdd communication. theses and dissertation authors and campus users are the two main groups the task force plans to target in its promotion and communication plan. to attract audience attention, the tdd task force will design an online flyer/postcard for dissemination. they are currently collaborating with the uh libraries director of communication, the uh alumni office, the uh graduate office, and the uh division of research to distribute messages to targeted audiences. 
the task force will communicate tdd digitization progress as they reach important milestones, including the completion of pre-1978 volumes, then at increments of 10,000 and 15,000 volumes, and once all volumes have been digitized and deposited in the repository. with the disbinding, digitization, and metadata workflows firmly in place, the tdd task force commenced the process of generating digitized versions of uh's theses and dissertations in 2020. while this process will continue over the next several years, the task force will also focus on refining policies and workflows around its copyright and digital preservation activities. the tdd task force has developed a draft copyright policy development document, which outlines copyright determination decisions and access controls for content deemed in copyright. the task force is currently consulting with uh general counsel to ensure its recommended copyright approaches are in concert with university best practices. at the same time, the task force is developing digital preservation procedures to ensure the long-term access, storage, and preservation of digitized theses and dissertations. the group has made some foundational decisions to date. since one physical copy of each title will be retained, allowing for future higher-resolution rescanning if needed, the task force determined that the preservation master file for each digitized thesis or dissertation will be one pdf. this will allow the uh libraries to greatly reduce the ongoing storage costs associated with digitally preserving the tdd collection. throughout 2023, the task force will be exploring ways to sync tdd content to its current digital preservation workflow process, including submitting content to uh libraries' archivematica instance for preservation curation services such as file fixity checks and normalization, and transferring preserved tdd content to cloud storage for distributed digital preservation. prior to ingesting any content into the institutional repository, the team reached out to uh's electronic and information resources accessibility (eira) coordinator for feedback on the accessibility of the pdf documents produced by abbyy. the eira coordinator recommended encoding our pdfs as pdf/a-1a, a standard designed for preservation and accessibility, and introduced the team to the accessibility tools available in adobe acrobat. the adobe acrobat accessibility checker has been useful for identifying and addressing accessibility issues with the pdfs that we are producing. uh libraries web accessibility standards strive to comply with the world wide web consortium's (w3c) web content accessibility guidelines (wcag). combined with the feedback from uh's eira coordinator, the current output was reviewed against these accessibility checklists, and areas needing improvement were identified. after several adjustments, the newest output for the project passes a majority of adobe acrobat's accessibility checker parameters, with further investigation planned to address weak points moving forward.

conclusion
the tdd project at uh libraries provides an in-depth view of the planning and workflow processes needed to launch a retrospective theses and dissertations digitization effort in an academic library setting.
collaborating across uh libraries departments, the tdd task force designed a phased approach to identify technology and resources needed to undertake the project, to develop policies, procedures, and workflows to guide the work to its completion, and to communicate about the scope, purpose, and progress of the project to internal and external stakeholders. throughout the planning and development phases, the task force leveraged automation, bibliographic data reuse, and project management tracking to achieve workflow objectives efficiently and responsibly. with the project well underway, the task force will continue refining its processes and working across uh libraries and campus units to ensure it complies fully with copyright and digital preservation best practices. through these ongoing efforts, the tdd task force is ensuring that the original research and scholarship contained in thousands of theses and dissertations are more accessible than ever before—broadening the reach and impact of uh graduates well into the future.

funding
this project was funded by the john p. mcgovern foundation.

acknowledgments
the authors dedicate this work to the memory of their colleague and tdd task force member crystal cooper.

endnotes
1 linda bennett and dimity flanagan, "measuring the impact of digitized theses: a case study from the london school of economics," insights: the uksg journal 29, no. 2 (2016): 111–19, https://doi.org/10.1629/uksg.300.
2 gail clement and melissa levine, "copyright and publication status of pre-1978 dissertations: a content analysis approach," portal: libraries and the academy 11, no. 3 (2011): 825, https://doi.org/10.1353/pla.2011.0031.
3 clement and levine, "copyright and publication status," 825.
4 clement and levine, "copyright and publication status," 826.
5 xiaocan (lucy) wang, "guidelines for implementing etd programs—roles and responsibilities," in guidance documents for lifecycle management of etds, eds. matt schultz, nick krabbenhoeft, and katherine skinner (2014): sect. 1, p. 14, https://educopia.org/guidance-documents-for-lifecycle-management-of-etds.
6 wang, "guidelines," 1-17.
7 patricia hswe, "briefing on copyright and fair use issues in etds," in guidance documents for lifecycle management of etds, eds. matt schultz, nick krabbenhoeft, and katherine skinner (2014): sect. 3, p. 12, https://educopia.org/guidance-documents-for-lifecycle-management-of-etds.
8 geneva henry, "guide to access levels and embargoes of etds," in guidance documents for lifecycle management of etds, eds. matt schultz, nick krabbenhoeft, and katherine skinner (2014): sect. 2, p. 1, https://educopia.org/guidance-documents-for-lifecycle-management-of-etds.
9 henry, "guide to access levels," 2-1.
10 hswe, "briefing on copyright," 3-9–3-13.
11 cathleen l. martyniak, "scanning our scholarship: the university of florida retrospective dissertation scanning project," microform and imaging review 37, no. 3 (2008): 122–24, https://doi.org/10.1515/mfir.2008.013.
12 martyniak, "scanning our scholarship," 127–29.
13 "retrospective dissertation scanning policy," (2011), university of florida, accessed january 1, 2022, https://ufdc.ufl.edu/aa00007596/00001.
14 todd mundle, "digital retrospective conversion of theses and dissertations: an in house project," in proceedings of eighth symposium on electronic theses and dissertations (sydney, australia, 2005): 3–4.
15 mundle, "digital retrospective conversion," 3.
16 mary piorun and lisa a. palmer, "digitizing dissertations for an institutional repository: a process and cost analysis," journal of the medical library association: jmla 96, no. 3 (2008): 223–29, https://doi.org/10.3163/1536-5050.96.3.008.
17 piorun and palmer, "digitizing dissertations," 224.
18 piorun and palmer, "digitizing dissertations," 227.
19 sarah l. shreeves and thomas h. teper, "looking backwards: asserting control over historic dissertations," college and research library news 73, no. 9 (2012): 532–33, https://doi.org/10.5860/crln.73.9.8830.
20 gary m. worley, "dissertations unbound: a case study for revitalizing access," in proceedings of the 10th international symposium on electronic theses and dissertations (uppsala, sweden, 2007).
21 worley, "dissertations unbound," 3–6.
22 dictionary of texas digital library descriptive metadata for electronic theses and dissertations, version 2.0, (2015), http://hdl.handle.net/2249.1/68437.
23 to access cougar roar, see https://guides.lib.uh.edu/roar.
24 saf creator is a tool developed by james creel at texas a&m university. for more, see https://github.com/jcreel/safcreator.
25 see the tdd google document: https://docs.google.com/document/d/18gyjq6isn7qsuelo1z3b7btmxlxnchmvqp8rhquzy8g/edit?usp=sharing.
26 see the complete tdd tracking system: https://docs.google.com/spreadsheets/d/1tehagvcqw6wb3n5cdaulbtlwzqzstwdbltiapd1oan0/edit?usp=sharing.

50 years of ital/jla: a bibliometric study of its major influences, themes, and interdisciplinarity
brady lund
information technology and libraries | june 2019
brady lund (blund2@g.emporia.edu) is a phd student at emporia state university's school of library and information management.
abstract
over five decades, information technology and libraries (and its predecessor, the journal of library automation) has influenced research and practice in library and information science technology. from its inception, the journal has been consistently ranked as one of the superior publications in the profession and a trendsetter for all types of librarians and researchers. this research examines ital using a citation analysis of all 878 peer-reviewed feature articles published over the journal's 51 volumes. impactful authors, articles, publications, and themes from the journal's history are identified. the findings of this study provide insight into the history of ital and potential topics of interest to ital authors and readership.

introduction
fifty-one years have passed since the first publication of the journal of library automation (jla), the precursor to information technology and libraries (ital), in 1968: 51 volumes, 204 issues, and 878 feature articles. information technology and its use within libraries has evolved dramatically in the time since the first volume, as has the content of the journal itself. given the interdisciplinary nature of library and information science (lis) and ital, and the celebration of this momentous achievement, an examination of the journal's evolution, based on the authors, publishers, and works that have influenced its content, seems apropos. the following article presents a comprehensive study of all 7,575 references listed for the 878 articles (~8.6 refs/article average) published over ital's fifty years, identifying the authors and publishers whose work has been cited the most in the journal and major themes in the cited publications, and evaluating the interdisciplinarity of references in ital publications. this study not only frames the history of the ital journal, but also demonstrates an evolution of the journal that suggests new paths for future inquiry.

conceptual framework
a major influence for the organization and methodology of this paper is imad al-sabbagh's 1987 dissertation from florida state university's school of library and information studies, the evolution of the interdisciplinarity of information science: a bibliometric study.1 in this study, al-sabbagh sought to examine the interdisciplinary influences on the burgeoning field of information science by examining the references of the journal of the american society of information science (jasis), today known as the journal of the association for information science and technology (jasist). in al-sabbagh's study, a sample of ten percent of jasis references was selected for examination.2 the references were sorted into disciplines based on the definitions supplied by dewey decimal classification categories, with the count in each discipline compared to the total number of sampled references to derive percentages (e.g., if 150 references of 1,000 total jasis references examined belonged to the category of library science, then 15 percent of references belonged to library science, and so on for all disciplines). the present study deviates slightly from al-sabbagh's in that it does not use a sampling method. instead, all 878 articles published in jla/ital and their 7,575 references will be examined.
the categories for disciplines, instead of being based on dewey decimal classification, will be based on definitions derived from the encyclopedia britannica, and will include new disciplines that were not used in al-sabbagh's original analysis, such as information systems and instructional design.3 additionally, the major authors, publishers, and articles cited throughout jla/ital's history will be identified; this was not done in al-sabbagh's study, but will likely provide additional beneficial information for researchers and potential contributors to ital. ital is an ideal publication to study using al-sabbagh's methodology, in that it is affiliated with librarianship and library science but, due to its content, is also closely associated with the disciplines of information science, computer science, information systems, instructional design, psychology, and many others. ital is likely one of the more interdisciplinary journals to still fall within the category of "library science." in fact, as part of al-sabbagh's 1987 study, he distributed a survey to several prominent information science researchers, asking them to name journals relevant to information science (this method was used to determine that jasis was the most representative journal for the discipline of information science). on the list of 31 journals compiled from the respondents' rankings, ital ranked as the seventh most representative journal for information science, above datamation, scientometrics, jelis, and library hi-tech.4 this shows that, for a long time, ital has been considered an important journal not just in library science, but in information science and likely beyond.

key terminology
while the findings of this study are pertinent to the ital reader, some of the terminology used throughout the study may be unfamiliar. to acclimate the reader to the terminology used in this study, brief definitions for key concepts are provided below.

bibliometrics. "bibliometrics" is the statistical study of properties of documents.5 the present study constitutes a "citation analysis," a type of bibliometric analysis that examines the citations in a document and what they can reveal about said document.

cited publications. "cited publications" are the references ("publications") listed at the end of a journal article.6 the purpose of al-sabbagh's study (and the present study) is to analyze these cited publications to determine what disciplines influenced the research published in a specific journal. this bibliometric analysis methodology is distinct from those that examine the influence of a specific journal on a discipline (i.e., the present study looks at what disciplines influenced ital, not what disciplines are influenced by ital).

discipline. in this study, the term "discipline" is used liberally to refer to any area of study that is presently or was historically offered at an institution of higher education (sociology, anthropology, education, etc.). in this study, library science and information science are considered distinct disciplines (as was the case with al-sabbagh's study).7 as discussed in the methodology section, the names and definitions of disciplines are all derived from the encyclopedia britannica.
literature review
the type of citation analysis used by al-sabbagh, which serves as the basis of the current study, is frequently used to examine the interdisciplinarity of library and information science and specific lis journals, as noted by huang and chang.8 tsay used a methodology similar to al-sabbagh's to examine cited publications in the 1980, 1985, 1990, 1995, 2000, and 2004 volumes of jasist. in this study, the researcher found that about one-half of the citations in jasist came from the field of lis.9 butler examined lis dissertations using a similar approach, finding that about 50 percent of the cited publications in the dissertations originated in lis, with education, computer science, and health science following in the second, third, and fourth positions.10 chikate and paul and chen and liang conducted similar studies of dissertations in india and taiwan.11 each study found different degrees of interdisciplinarity, possibly indicating a fluctuation within the discipline of lis based on publication type, country of origin, and other characteristics of the publications used in the study. several researchers have used these methods recently to examine library and information science journals, such as chinese librarianship,12 pakistan journal of library and information science,13 library philosophy and practice,14 and the journal of library and information technology.15 these studies are more common for journals published outside of the united states, but there is no reason why the methodology would not hold true for a u.s.-based journal like ital. recently, publications in a wide array of fields have used methodologies similar to al-sabbagh's to evaluate interdisciplinarity in a discipline. ramos-rodriguez and ruiz-navarro (2004) examined reference trends in the journal of strategic management.16 fernandez-alles and ramos-rodriguez (2009) conducted a bibliometric analysis to identify those publications most frequently cited in the journal human resource management.17 crawley-low (2006) used a similar methodology to identify the core (most frequently cited) journals in veterinary medicine from the american journal of veterinary research.18 these studies show a growing interest in the use of citation analysis to present new information about a publication to potential authors, editors, and readers. jarvelin and vakkari (1993) noted trends in lis from 1965 to 1985 based on an examination of cited publications in lis journals. the authors noted a trend in interest in the topic of information storage and retrieval, with a de-emphasis on classification and indexing and a strengthened emphasis on information systems and retrieval.19 this study deviated from al-sabbagh and related studies of interdisciplinarity—though it employed a similar methodology—in that it examined trends or subtopics within the discipline of lis. though it is not a primary focus of the present study, the use of subtopics to further divide the discipline of library science and examine what aspects (management, technology, cataloging, reference) of the discipline are the focus of cited publications is incorporated in several tables in the results section.

methods
all references from the 878 articles published in the jla/ital journals (n=7,575) were transcribed to an excel spreadsheet for analysis (this spreadsheet can be found as a supplemental file [https://ejournals.bc.edu/index.php/ital/article/view/10875/9469]).
the spreadsheet includes separate columns for primary author, title, publisher, and discipline of each reference. the list of disciplines with their definitions, derived from encyclopedia brittanica, is displayed in table 1 below. information technology and libraries | june 2019 21 table 1. definitions of disciplines used for this study. discipline definition library science the principles and practices of library operation and administration, and their study. information science the discipline that deals with the processes of storing and transferring information. information systems the study of the integrated set of components for collecting, storing, and processing data and for providing information, knowledge, and digital products. computer science the study of computers, including their design (architecture) and their uses for computations, data processing, and systems control. engineering the application of science to the optimum conversion of the resources of nature to the uses of humankind. instructional design the systematic development of instructional specifications using learning and instructional theory to ensure the quality of instruction. education the discipline that is concerned with methods of teaching and learning in schools or school-like environments as opposed to various nonformal and informal means of socialization. government resources produced within the political system by which a country or community is administered and regulated. sociology a social science that studies human societies, their interactions, and the processes that preserve and change them. popular newspaper, magazine, media reports that do not fit better within another category. philosophy the rational, abstract, and methodical consideration of reality as a whole or of fundamental dimensions of human existence and experience. psychology the scientific discipline that studies mental states and processes and behaviour in humans and other animals. corporate business, corporate, private organization publications that do not fit better within another category. archival science the study of the repository for an organized body of records produced or received by a public, semipublic, institutional, or business entity in the transaction of its affairs and preserved by it or its successors. management the study of the process of dealing with or controlling things or people. linguistics the scientific study of language. literature the art of creation of a written work. law the discipline and profession concerned with the customs, practices, and rules of conduct of a community that are recognized as binding by the community. 50 years of ital/jla | lund 22 https://doi.org/10.6017/ital.v38i2.10875 discipline definition mathematics the science of structure, order, and relation that has evolved from elemental practices of counting, measuring, and describing the shapes of objects (also includes statistics). health science study of humans, the extent of an individual’s continuing physical, emotional, mental, and social ability to cope with his or her environment. communication science the study of the exchange of meanings between individuals through a common system of symbols. geography the study of the diverse environments, places, and spaces of earth’s surface and their interactions. physics the science that deals with the structure of matter and the interactions between the fundamental constituents of the observable universe. 
art/design the study of the nature of art, including such concepts as interpretation, representation and expression, and form. economics the social science that seeks to analyze and describe the production, distribution, and consumption of wealth. biology the study of living things and their vital processes. museum studies the study of institutions dedicated to preserving and interpreting the primary tangible evidence of humankind and the environment. music the art concerned with combining vocal or instrumental sounds for beauty of form or emotional expression, usually according to cultural standards of rhythm, melody, and, in most western music, harmony. chemistry the science that deals with the properties, composition, and structure of substances (defined as elements and compounds), the transformations they undergo, and the energy that is released or absorbed during these processes. science and technology studies the study, from a philosophical perspective, of the elements of scientific inquiry. journalism the collection, preparation, and distribution of news and related commentary and feature materials through such print and electronic media as newspapers, magazines, books, blogs, webcasts, podcasts, social networking and social media sites, and e-mail as well as through radio, motion pictures, and television. anthropology the study of human beings in aspects ranging from the biology and evolutionary history of homo sapiens to the features of society and culture that decisively distinguish humans from other animal species. to determine the discipline in which a cited publication would be classified, the researcher used the cited publication’s title, abstract, and journal to select the most appropriate discipline from the table. in those cases where a source could not be easily identified as falling within one specific discipline, the researcher conferred with additional reviewers (professional librarians) to determine the best fit. information technology and libraries | june 2019 23 several analyses of this data were conducted to explore various aspects of jla/ital’s publication history. for the complete data of the publication’s 51 volumes, the top ten most referenced authors, articles, publishers (journals/publishing houses/organizations/websites), and disciplines were identified with the aid of excel’s functions. the same was done separately for both the jla’s 14 volumes and ital’s 37 volumes. this will allow for the comparison of the journal before and after the 1982 rebranding. the 51 volumes of jla/ital were also divided into the five decades of its history: 1968-77, 1978-87, 1988-97, 1998-2007, 2008-18 (eleven volumes instead of ten). for each of these decades, the top ten authors, publishers, and disciplines were identified. for each of the three categories, a table was created to show the top ten of each decade side-by-side. lastly, the titles of the 7,575 cited publications in jla/ital articles were examined using a content analysis, to identify major concepts and themes that appear to have influenced jla/ital articles. nvivo content analysis software was utilized for this analysis. titles were fed from the excel spreadsheet into the nvivo software, and the word frequency tools were used to identify the most frequently used terms and “generalizations,” or types or themes of statements in the titles.20 results table 2 displays the top ten most-cited authors, articles, publishers/publications, and disciplines throughout ital’s fifty-year history. 
among the authors, four of the top six are associated with two institutions: library of congress and oclc. there are four corporate or nonprofit organizations, three academics (associated with institutions of higher education), two women and four men. of the top ten articles, eight were published before 1973; three were published in jla/ital and five were published in journals versus five in other (non-journal) publications. of the top ten publishers, seven are journals; five of the publishers are directly associated with library science. within the disciplines, lis represents 60 percent of the total. there are 31 total disciplines represented throughout the 51 volumes, a greater number of disciplines than identified in al-sabbaugh’s study of jasist. table 3 displays the results for jla. jla emerged at the same time as the machine-readable catalog (marc) and oclc, and this is evident in the authors, articles, and publishers cited in the journal. during this phase of the journal’s history, the top three authors—fred kilgour, the library of congress, and henriette avram—dominated the citations. these three authors were cited more than the next seven combined (143 to 101). the cited publications during this period reflected a focus on systems, corporate, and government publications. results for the 37 volumes of ital are displayed in table 4. during this period, marshall breeding emerged as one of the biggest influences on information technology and libraries. all but two of the top articles (larson and bizer) were written before 1985. while six publishers were the same as with jla, three of these six (library of congress, association for computing machinery, and college and research libraries) changed position in the top ten. the disciplines of systems, psychology, educational and instructional design rose, while government, corporate, management, linguistics, and electrical engineering dropped; library science, information science, and computer science remained at the top. 50 years of ital/jla | lund 24 https://doi.org/10.6017/ital.v38i2.10875 table 2. overall most cited. top ten authors (affiliation) top ten articles top ten publishers top ten disciplines top ten disciplines with percentages 1 u.s. library of congress american library association. (1967). anglo-american cataloging rules. chicago, il: american library association. ital/jla library science— technology library science 44% 2 fred g. kilgour (oclc) avram, h. d. (1968). the marc ii format: a communications format for bibliographic data. washington, dc: library of congress. asist information science information science 16% 3 henriette d. avram (library of congress) ruecking jr, f. h. (1968). bibliographic retrieval from bibliographic input; the hypothesis and construction of a test. information technology and libraries, 1(4), 227-238. association for computing machinery library science— cataloging computer science 8% 4 american library association kilgour, f. g., leiderman, e. b., & long, p. l. (1971). retrieval of bibliographic entries from a name-title catalog by use of truncated search keys. ohio college library center. college and research libraries computer science information systems 7% 5 ibm: international business machines kilgour, f. g. (1968). retrieval of single entries from a computerized library catalog file. proceedings of the american society for information science, 5, 133136. library of congress information systems government 3% 6 ohio college library center/online computer library center (oclc) long, p. 
l., & kilgour, f. (1972). a truncated search key title index. information technology and libraries, 5(1), 17-20. american library association library science— general instructional design 3% 7 marshall breeding (vanderbilt university/independent) hildreth, c. r. (1982). online public access catalogs: the user interface. oclc online computer library center, incorporated. library resources and technical services government corporate 2% 8 jakob nielsen (independent) nugent, w. r. (1968). compression word coding techniques for information retrieval. information technology and libraries, 1(4), 250-260. library hitech library science— administration education 2% 9 karen markey (university of michigan) curwen, a. g. (1990). international standard bibliographic description. in standards for the international exchange of bibliographic information: papers presented at a course held at the school of library, archive and information studies, university college london (pp. 3-18). library journal instructional design psychology 2% 10 walt crawford (research libraries group/independent) fasana, p. j. (1963). automating cataloging functions in conventional libraries (no. isl-9028-37). lexington, ma: itek corp information sciences lab. oclc library science— academic sociology 2% information technology and libraries | june 2019 25 table 3. jla most cited. top ten authors (affiliation) top ten articles top ten publishers top ten disciplines top ten disciplines with percentages 1 fred g. kilgour (oclc) avram, h. d. (1968). the marc ii format: a communications format for bibliographic data. washington, dc: library of congress. journal of library automation library science— technology library science 58% 2 u.s. library of congress american library association. (1967). anglo-american cataloging rules. chicago, il: american library association. asist information science information science 14% 3 henriette d. avram (library of congress) ruecking jr, f. h. (1968). bibliographic retrieval from bibliographic input; the hypothesis and construction of a test. journal of library automation, 1(4), 227-238. library of congress library science— cataloging computer science 6% 4 ibm: international business machines kilgour, f. g., leiderman, e. b., & long, p. l. (1971). retrieval of bibliographic entries from a nametitle catalog by use of truncated search keys. ohio college library center. library resources and technical services library science— general government 5% 5 american library association long, p. l., & kilgour, f. (1972). a truncated search key title index. journal of library automation, 5(1), 17-20. ibm computer science corporate 5% 6 william r. nugent (inforonics, inc.) kilgour, f. g. (1968). retrieval of single entries from a computerized library catalog file. proceedings of the american society for information science, 5, 133-136. american library association government information systems 4% 7 paul j. fasana (columbia university) livingston, l.g. (1973). international standard bibliographic description for serials. library resources and technical services, 17(3), 293-298. association for computing machinery corporate management 2% 8 philip l. long (oclc) fasana, p. j. (1963). automating cataloging functions in conventional libraries (no. isl-9028-37). lexington, ma: itek corp information sciences lab. university of illinois press information systems linguistics 1% 9 martha e. williams (university of illinois) nugent, w. r. (1968). compression word coding techniques for information retrieval. 
journal of library automation, 1(4), 250-260. college and research libraries library science— academic electrical engineering 1% 10 university of california avram, h. d. (1970). the recon pilot project: a progress report. journal of library automation, 3(2), 102-114. special libraries library science— special psychology 1% 50 years of ital/jla | lund 26 https://doi.org/10.6017/ital.v38i2.10875 table 4. ital most cited. top ten authors (affiliation) top ten articles top ten publishers top ten disciplines top ten disciplines with percentages 1 u.s. library of congress american library association. (1967). anglo-american cataloging rules. chicago, il: american library association. information technology and libraries library science— technology library science 41% 2 american library association hildreth, c. r. (1982). online public access catalogs: the user interface. oclc online computer library center, incorporated. asist information science information science 16% 3 marshall breeding (vanderbilt university/independent) markey, k. (1984). subject searching in library catalogs. oclc online computer library center. association for computing machinery library science— cataloging computer science 9% 4 jakob nielsen (independent) malinconico, s. m. (1979). bibliographic data base organization and authority file control. wilson library bulletin, 54(1), 36-45. college and research libraries computer science information systems 7% 5 karen markey (university of michigan) matthews, j. r., lawrence, g. s., & ferguson, d. (1983). using online catalogs: a nationwide survey. neal-schuman publishers, inc.. library hitech information systems instructional design 3% 6 oclc bizer, c., heath, t., & berners-lee, t. (2011). linked data: the story so far. in semantic services, interoperability and web applications: emerging concepts (pp. 205-227). igi global. american library association instructional design government 2% 7 walt crawford (research libraries group/independent) tolle, j. e. (1983). current utilization of online catalogs: transaction log analysis. volume i of three volumes. final report. ohio college library center library science— administration education 2% 8 clifford a. lynch (university of california/coalition for networked 0information) larson, r. r. (1991). the decline of subject searching: long-term trends and patterns of index use in an online catalog. journal of the american society for information science, 42(3), 197-215. journal of academic librarianship library science— general sociology 2% 9 charles r. hildreth (read ltd.) markey, k. (1983). online catalog use: results of surveys and focus group interviews in several libraries. volume ii of three volumes. final report. library journal library science— academic psychology 2% 10 j.r. matthews (san jose state university/independent) ludy, l.e., & logan, s.j. (1982). integrating authority control in an online catalog. american society for information science meeting, 19, 176-178. library of congress government management 2% the top ten authors of each decade are shown in table 5. for the first two decades, fred kilgour was a dominate influence, receiving 15 more citations than the next closest author (the library of congress). in the third decade, kilgour dropped entirely from the top ten and was supplanted at the top spot by karen markey, professor at the university of michigan. during the fourth decade, in the wake of cipa and the u.s. 
patriot act, the library of congress rose to the top spot and john bertot and paul jaeger, who wrote extensively on these topics and their legal, social, and administrative implications, rose up the list. web resources, such as google, also began to emerge in the fourth decade. in the final decade, breeding, who writes on library systems as well as information technology in general, rose to the top spot. tim berners-lee, one of the pioneers of the internet and linked data, took the second spot. jakob nielsen, known for his contributions to usability testing, appears in the top three of the rankings for both the fourth and fifth decades. only the library of congress and the american library association appear in the top ten list for all five decades.

table 5. top ten authors of each decade.
rank | 1968-77 | 1978-87 | 1988-97 | 1998-2007 | 2008-18
1 | fred g. kilgour (oclc) | fred g. kilgour (oclc) | karen markey (university of michigan) | u.s. library of congress | marshall breeding (vanderbilt university/independent)
2 | u.s. library of congress | robert de gennaro (harvard university/pennsylvania university) | u.s. library of congress | jakob nielsen (independent) | tim berners-lee (w3 consortium/university of oxford/massachusetts institute of technology)
3 | henriette d. avram (library of congress) | henriette d. avram (library of congress) | clifford a. lynch (university of california/coalition for networked information) | john c. bertot (university of maryland) | jakob nielsen (independent)
4 | ibm: international business machines | ibm: international business machines | michael k. buckland (university of california) | oclc | u.s. library of congress
5 | american library association | s. michael malinconico (new york public library/university of alabama) | american library association | paul t. jaeger (university of maryland) | american library association
6 | paul j. fasana (columbia university) | u.s. library of congress | christine l. borgman (university of california-los angeles) | walt crawford (research libraries group/independent) | national information standards organization
7 | william r. nugent (inforonics, inc.) | frederick w. lancaster (university of illinois) | charles r. hildreth (read ltd) | american library association | u.s. government
8 | university of california | allen b. veaner (stanford university/university of california) | joseph r. matthews (san jose state university/independent) | roy tennant (university of california/oclc) | john c. bertot (university of maryland)
9 | kenneth j. bierman (oklahoma state university/university of nevada-las vegas) | alan l. landgraf (oclc) | walt crawford (research libraries group/independent) | google | oclc
10 | robert m. hayes (university of california-los angeles) | american library association | lois m. chan (university of kentucky) | thomas b. hickey (oclc) | jung-ran park (drexel university)

jla/ital appears as the most cited publisher in all decades except the fourth, as shown in table 6. during that decade, acm and jasist rose above ital, and library journal and websites (websites are considered in this study as a collective group) emerged on the list. library journal was a frequently used source for bertot and jaeger, who authored several ital articles during this period. there were also more articles about the internet, digital libraries, google and google scholar, and the future of libraries during the fourth decade.
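as the limitations section notes, these decade-by-decade tallies were compiled by hand in excel. purely as an illustration of the counting behind tables 5 and 6, a per-decade author ranking could be produced from a machine-readable citation list along the following lines; the record layout, the decade_label helper, and the sample data are hypothetical and are not part of the study.

```python
from collections import Counter, defaultdict

# hypothetical records: (year of the citing jla/ital article, first author of the cited work)
citations = [
    (1969, "fred g. kilgour"),
    (1972, "u.s. library of congress"),
    (2011, "marshall breeding"),
]

def decade_label(year):
    # the study's spans: 1968-77, 1978-87, 1988-97, 1998-2007, and an eleven-year 2008-18
    if year >= 2008:
        return "2008-18"
    start = 1968 + ((year - 1968) // 10) * 10
    return f"{start}-{start + 9}"

authors_by_decade = defaultdict(Counter)
for year, author in citations:
    authors_by_decade[decade_label(year)][author] += 1

for decade, tally in sorted(authors_by_decade.items()):
    print(decade, tally.most_common(10))  # ten most-cited authors in that span
```

the same counter pattern would apply to publishers or to assigned disciplines by swapping out the second element of each record.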
jasist appears in the top four of every decade but has declined in the fifth decade of ital. oclc, ibm, college and research libraries, cataloging and classification quarterly, journal of academic librarianship, library resources and technical services, and library hi-tech all appear in multiple decades of this list. table 6. top ten publishers of cited articles for each decade. 1968-77 1978-87 1988-97 1998-2007 2008-18 1 jla jla/ital ital association for computing machinery ital 2 library of congress jasist jasist jasist library hi-tech 3 jasist library journal college and research libraries ital association for computing machinery 4 library resources and technical services oclc american library association college and research libraries jasist 5 ibm university of illinois press library resources and technical services american library association journal of academic librarianship 6 american library association library of congress oclc library journal college and research libraries 7 special libraries library resources and technical services library of congress journal of academic librarianship computers in libraries 8 college and research libraries american library association library hi-tech general websites d-lib 9 association for computing machinery prentice-hall journal of academic librarianship library hi-tech cataloging and classification quarterly 10 university of illinois press ibm cataloging and classification quarterly oclc ieee as shown in table 7, library science and information science maintained the first and second positions for every decade of jla/ital’s publication, while computer science and information systems jockeyed for the third and fourth positions every decade except the first (when government reports had a major impact on the journal). government and corporate (ibm particularly) were important in the first three decades but were replaced by instructional design and education in the final two decades. sociology appears in four of five decades, while psychology appears in three of five. in the first two decades, electrical engineering (as it applied to the design of computer systems) rounded up the top ten; law emerged in decade four, following cipa and the information technology and libraries | june 2019 29 patriot act; in the final decade, with the discussion about encoded archival description in ital, archival science rose to the tenth spot. table 7. top ten disciplines of each decade (library science subcategories combined). 1968-77 1978-87 1988-97 1998-2007 2008-18 1 library science library science library science library science library science 2 information science information science information science information science information science 3 computer science information systems computer science computer science information systems 4 government computer science information systems information systems computer science 5 corporate corporate government instructional design instructional design 6 information systems government philosophy education psychology 7 management management sociology corporate government 8 linguistics sociology literature sociology education 9 electrical engineering psychology psychology philosophy sociology 10 chemistry electrical engineering education law archival science table 8 compares all disciplines (including subcategories of library science) in the first decade of jla/ital and the fifth decade. 
compared to the first decade, the fifth decade saw greater diversification of subtopics under library science, which led to "information science" supplanting "library science—technology" in the top spot. instructional design and archival science emerged from disciplines not discussed in the first decade to become some of the most important disciplines of the fifth decade. the library science subtopics of accessibility and teaching grew significantly as the roles of the librarian evolved.

table 8. first ten years vs. last eleven years disciplines (with subcategories of library science).
rank | 1968-77 | 2008-18
1 | library science—technology | information science
2 | information science | library science—technology
3 | library science—cataloging | information systems
4 | library science—general | computer science
5 | computer science | library science—cataloging
6 | government | instructional design
7 | corporate | library science—accessibility
8 | library science—academic | library science—academic
9 | information systems | library science—reference
10 | library science—special | library science—administration
11 | management | psychology
12 | linguistics | government
13 | electrical engineering | library science—general
14 | library science—medical | education
15 | popular | popular
16 | library science—reference | library science—teaching
17 | chemistry | sociology
18 | physics | archival science
19 | engineering—general | management
20 | psychology | law
21 | mathematics | corporate
22 | library science—local | mathematics
23 | communication science | philosophy
24 | health science | literature
25 | library science—accessibility | linguistics
26 | library science—school | physics
27 | philosophy | health science
28 | library science—administration | geography
29 | journalism | electrical engineering
30 | government | library science—medical
31 | music | biology
32 | education | art/design
33 | literature | museum studies
34 | - | economics
35 | - | communication science
36 | - | engineering—general
37 | - | journalism
38 | - | library science—special
39 | - | chemistry
40 | - | science and technology studies
41 | - | library science—school
42 | - | library science—local
43 | - | anthropology

table 9 shows the ten biggest themes and most frequently used terms throughout jla/ital's 51 volumes. library is the most common theme and term. the library catalog, and the associated concept of the integrated library system (ils), influence the second and third themes. "online" is an interesting theme/term for the different ways in which it was used throughout the history of the journal. in the early years, "online" was used to refer to the retrieval of computerized bibliographic information; in later years, "online" came to refer almost exclusively to the use of the world wide web. rounding out the top ten terms are several associated with the study of information science: data, bibliography, and retrieval. finally, table 10 depicts the top ten themes for each of ital's five decades. libraries remained at the top for all decades; the second spot, however, shifted dramatically. in the first decade, with marc being a major topic of discussion, "system" and "catalog" rose to the top. in decades two and three, with the melding of the disciplines of library science and information science, "information" rose to the top. in the final two decades, the world wide web was influential on the ital discourse. users, usability, and accessibility remained important themes throughout the history of the journal.
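the theme and term counts in tables 9 and 10 come from the titles of cited publications, which the study coded manually. purely as a rough sketch of that kind of tally, title terms could be counted along the lines below; the sample titles and the stopword list are hypothetical stand-ins, not the study's data or coding scheme.

```python
import re
from collections import Counter

# hypothetical sample of cited-publication titles (the study hand-coded 7,575 of them)
titles = [
    "online public access catalogs: the user interface",
    "the marc ii format: a communications format for bibliographic data",
    "linked data: the story so far",
]

# a minimal stopword list; a real analysis would need a fuller one
stopwords = {"the", "a", "an", "and", "of", "for", "in", "on", "to", "its", "so", "far"}

term_counts = Counter(
    word
    for title in titles
    for word in re.findall(r"[a-z]+", title.lower())
    if word not in stopwords
)

print(term_counts.most_common(10))  # most frequent title terms, cf. table 9
```

grouping related terms (catalog/catalogs, library/libraries) into broader themes, as the study does, would require an additional mapping step on top of the raw counts.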
table 9. major themes and term frequency in titles of cited publications (all 51 volumes).
rank | themes | terms
1 | library | library
2 | catalog | information
3 | system | online
4 | information | system
5 | online | web
6 | usability | catalog
7 | web | digital
8 | search | data
9 | computer | bibliography
10 | digital | retrieval

table 10. major themes in titles of cited publications (by decade).
rank | 1968-77 | 1978-87 | 1988-97 | 1998-2007 | 2008-18
1 | library | library | library | library | library
2 | system | information | information | web | web
3 | catalog | system | catalog | information | digital
4 | information | catalog | web | digital | information
5 | online | online | system | usability | usability
6 | usability | web | digital | users | data
7 | web | usability | online | catalog | users
8 | search | digital | usability | search | accessibility
9 | computer | users | users | accessibility | studies
10 | digital | search | accessibility | data | academic

discussion

one of the major benefits of a bibliometric study/citation analysis is the production of a set of themes, disciplines, seminal sources, influences, and influencers that may benefit potential authors in determining whether their manuscript is suitable for publication in a specific discipline or journal.21 the results of this study demonstrate that ital is undoubtedly a library science journal, but that it invites a high level of interdisciplinarity and has experienced a growing impact from the disciplines of information science, computer science, and information systems (which combined presently comprise about 30 percent of total ital references). throughout the journal's history, there has been an emphasis on library systems, particularly systems for library cataloging. recently, however, there has also been an emphasis on technology, law, and the library as well as instructional technology, teaching, and the library. ital authors take the majority of their citations/ideas from other ital articles, jasist, acm, and other library technology (library hitech, d-lib) and academic librarianship (college and research libraries, journal of academic librarianship) journals. some of the major authors to read to familiarize oneself with the history and themes of the ital publication include fred kilgour, henriette avram, karen markey, and marshall breeding. these are findings that potential ital authors may put to practical use while crafting their research and writing. with ital having a sustained role as a leading publication in library and information science, this study may have some generalizable findings for the discipline. in 2015, richard van noorden produced an interactive chart of the interdisciplinarity of a variety of disciplines, based on data from web of science and the national science foundation.22 if ital is considered representative of a sub-discipline called "library and information science—technology," it can be compared to the interdisciplinarity of the disciplines listed in van noorden's study. in the last decade of ital, 45.4 percent of citations came from outside of lis. compared to van noorden's findings, only 42 of 144 (29 percent) "fields" (or "disciplines," as they have been referred to in this study) have a higher proportion of references to outside disciplines.
this lis-tech sub-discipline would have a level of interdisciplinarity comparable to the fields of oceanography, botany, philosophy, history, and psychology, and on par with the average for all social sciences.23 this shows that the discipline certainly has its own proprietary knowledge base to build upon, but also values the contributions of knowledge from other disciplines. though it is not necessarily the purpose of this study to examine the influence of ital on other journals and within the discipline of lis, some of this information can be gathered rather easily from google scholar (by searching for the journal and comparing the number of citations for each article, as displayed by scholar) and is worth sharing. table 11 shows the top ten most-cited articles published over the history of jla/ital, with mcclure's 1994 article "network literacy: a role for libraries" receiving the most references of any article published in the journal. three ital articles have been cited by articles which themselves have over 1,000 citations, including one article (2007's "checking out facebook.com") that has been cited by an article which itself has been cited over 10,000 times. fifty-seven ital articles have at least 57 citations each, giving the journal an h-index24 of 57.

table 11. citations of ital articles in outside journals.
rank | journal citation | number of citations
1 | mcclure, c. r. (1994). network literacy: a role for libraries? information technology and libraries, 13(2), 115-26. | 447
2 | charnigo, l., & barnett-ellis, p. (2007). checking out facebook.com: the impact of a digital trend on academic libraries. information technology and libraries, 26(1), 23-34. | 391
3 | antelman, k., lynema, e., & pace, a. k. (2006). toward a 21st century library catalog. information technology and libraries, 25(3), 128-39. | 267
4 | spiteri, l. f. (2007). the structure and form of folksonomy tags: the road to the public library catalog. information technology and libraries, 26(3), 13-25. | 260
5 | katz, i. r. (2007). testing information literacy in digital environments: ets's iskills assessment. information technology and libraries, 26(3), 3-12. | 226
6 | jeng, j. (2005). what is usability in the context of the digital library and how can it be measured. information technology and libraries, 24(2), 47-56. | 196
7 | lankes, r. d., silverstein, j., & nicholson, s. (2007). participatory networks: the library as conversation. information technology and libraries, 26(4), 17-33. | 189
8 | dickstein, r., & mills, v. (2000). usability testing at the university of arizona library: how to let the users in on the design. information technology and libraries, 19(3), 144-51. | 188
9 | schaffner, a. c. (1994). the future of scientific journals: lessons from the past. information technology and libraries, 13(4), 239-47. | 177
10 | kopp, j. j. (1998). library consortia and information technology: the past, the present, the promise. information technology and libraries, 17(1), 7. | 166

limitations of this study

there were a couple of potential limitations to this study. this bibliometric analysis was conducted in the "old-fashioned" way, using excel and hand-typing out all 7,575 cited publications. this was deemed the most effective way to collect the data, based on the availability of the ital journal, but did take a great deal of time.
to save time in recording data, only the first author for each cited publication was listed and no publication dates were collected, nor were abstracts retained and analyzed (which may provide additional compelling details about the content of these cited publications). greater validity for the assignment of disciplines to cited publications may be achieved by having a large team of researchers for analysis, or using multiple researchers for all citations, not just those that the first researcher deems questionable.25 as with a content analysis, independent review of data and comparison and compromising of coding is likely to provide the most consistent and accurate results. information technology and libraries | june 2019 35 conclusion fifty-one volumes of the journal of library automation/information technology and libraries have been published, over which time library technology has evolved from early-marc, a time in which the exceptional library would have perhaps a single computer for “online retrieval,” to the internet age, characterized by personal computing, library management systems, and technology-aided instruction. as time has passed, many of the major influences on the journal have changed, yet the journal has remained one of the most influential for library and information science technology. increased interdisciplinarity in cited publications and new directions in information law and education offer new directions as the journal enters its sixth decade. endnotes 1 imad al-sabbagh, “evolution of the interdisciplinarity of information science: a bibliometric study” (phd diss., florida state university, 1987). 2 ibid. 3 encyclopedia britannica, https://www.britannica.com/ (accessed sept. 13, 2018). 4 al-sabbagh, “evolution of the interdisciplinarity.” 5 melissa k. mcburney and pamela l. novak, “what is bibliometrics and why should you care?” professional communication conference, ieee (2002): 108-14, https://doi.org/10.1109/ipcc.2002.1049094. 6 lutz bornmann and rudiger mutz, “growth rates of modern science,” journal of the association for information science and technology 66, no. 11 (2015): 2, 215-222, https://doi.org/10.1002/asi.23329. 7 al-sabbagh, “evolution of the interdisciplinarity.” 8 mu-hsuan huang and yu-wei chang, “a study of interdisciplinarity in information science: using direct citation and co-authorship analysis,” journal of information science 37, no. 4 (2011): 369-78, https://doi.org/10.1177/0165551511407141. 9 ming-yueh tsay, “journal bibliometric analysis: a case study on the jasist,” malaysian journal of library & information science 13, no. 2 (2008): 121-39, http://ejum.fsktm.um.edu.my/article/663.pdf. 10 lois buttlar, “information sources in library and information science doctoral research,” library & information science research 21, no. 2 (1999): 227-45, https://doi.org/10.1016/s0740-8188(99)00005-5 11 r.v. chikate and s.k. patil, “citation analysis of theses in library and information science submitted to university of pune: a pilot study,” library philosophy and practice 222 (2008); kuang-hua chen and chiung-fang liang, “disciplinary interflow of library and information science in taiwan,” journal of library and information studies 2, no. 2 (2004): 31-55. 50 years of ital/jla | lund 36 https://doi.org/10.6017/ital.v38i2.10875 12 akhtar hussain and nishat fatima, “a bibliometric analysis of the ‘chinese librarianship: an international electronic journal,’ 2006-2010,” chinese librarianship 31, no. 1 (2011): 1-14, http://www.iclc.us/cliej/cl31hf.pdf. 
13 nosheen fatima warraich and sajjad ahmad, “pakistan journal of library and information science: a bibliometric analysis,” pakistan journal of information management and libraries 12, no. 1 (2011): 1-7. http://eprints.rclis.org/25600/. 14 s. thanuskodi, “bibliometric analysis of the journal library philosophy and practice from 20052009,” library philosophy and practice 437 (2010): 1-6. https://digitalcommons.unl.edu/libphilprac/437/. 15 manoj kumar and a.l. moorthy, “bibliometric analysis of desidoc journal of library and information technology during 2001-2010,” desicoc journal of library and information technology 31, no. 3 (2011): 203-08. 16 antonio ramos-rodriguez and jose ruiz-navarro, “changes in the intellectual structure of strategic management research: a bibliographic study of the strategic management journal, 1980-2000,” strategic management journal 25, no. 10 (2004): 981-1,004, https://doi.org/10.1002/smj.397. 17 mariluz fernandez-alles and antonio ramos-rodriguez, “intellectual structure of human resources management research: a bibliometric analysis of the journal human resource management, 1985-2005,” jasist 60, no. 1 (2009): 161, https://doi.org/10.1002/asi.20947. 18 jill crawley-low, “bibliometric analysis of the american journal of veterinary research to produce a list of core veterinary medicine journals,” jmla 94, no. 4 (2006): 430-34. 19 kalervo jarvelin and pertti vakkari, “the evolution of library and information science 19651985: a content analysis of journal articles,” information processing and management 29, no. 1 (1993): 129-44, https://doi.org/10.1016/0306-4573(93)90028-c. 20 r. barry lewis, “nvivo and atlas.ti 5.0: a comparative review of two popular qualitative data-analysis programs,” field methods 16, no. 4 (2004): 439-69, https://doi.org/10.1177/1525822x04269174. 21 thad van leeuwen, “the application of bibliometric analyses in the evaluation of social science research: who benefits from it, and why it is still feasible,” scientometrics 66, no. 1 (2006): 133-54, https://doi.org/10.1007/s11192-006-0010-7. 22 richard van noorden, “interdisciplinarity research by the numbers,” nature 525, no. 7569 (2015): 306-07, https://doi.org/10.1038/525306a. 23 ibid, 306. 24 lutz bornmann and hans-dieter daniel, “what do we know about the h index,” journal of the american society for information science and technology 58, no. 9 (2007): 1,381-385, https://doi.org/10.1002/asi.20609. 25 linda c. smith, “citation analysis,” library trends 30, no. 1 (1981): 83-106. at the click of a button: assessing the user experience of open access finding tools articles at the click of a button assessing the user experience of open access finding tools elena azadbakht and teresa schultz information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.12041 elena azadbakht (eazadbakht@unr.edu) is health sciences librarian, university of nevada, reno. teresa schultz (teresas@unr.edu) is social sciences librarian, university of nevada, reno. abstract a number of browser extension tools have emerged in the past decade aimed at helping information seekers find open versions of scholarly articles when they hit a paywall, including open access button, lazy scholar, kopernio, and unpaywall. while librarians have written numerous reviews of these products, no one has yet conducted a usability study on these tools. this article details a usability study involving six undergraduate students and six faculty at a large public research university in the united states. 
participants were tasked with installing each of the four tools as well as trying them out on three test articles. both students and faculty tended to favor simple, clean design elements and straightforward functionality that enabled them to use the tools with limited instruction. participants familiar with other browser extensions gravitated towards tools like open access button, whereas those less experienced with other extensions preferred tools that load automatically, such as unpaywall. introduction while the open access (oa) movement seeks to make scholarly output freely accessible to a wide number of people, finding the oa versions of scholarly articles can be challenging. in recent years, several tools have emerged to help individuals retrieve an oa copy of articles when they hit a paywall. some of the most familiar of these—lazy scholar, open access button, unpaywall, and kopernio—are all free browser extensions. however, poor user experience can hamper even the adoption of free tools. usability studies, particularly of academic websites and search tools, are prevalent in the literature, but as of yet no one has compared the user-friendliness of these extensions. how open access tools work all of the tools can be installed for free as a google chrome browser extension. all four tools also work in firefox. the idea is that when a user hits a paywall for an article, they can use that tool to search for an open version. each works slightly differently: open access button (https://openaccessbutton.org/)—the oa icon will appear to the right of the browser’s search bar (see figure 1). when a user clicks it, a new page will open that is either the open version of the article if one is found or a message saying it was not able to find an open version. the user is then given the option to write an email to the author asking that it be made open. mailto:eazadbakht@unr.edu mailto:teresas@unr.edu https://openaccessbutton.org/ information technology and libraries june 2020 at the click of a button | azadbakht and schultz 2 figure 1. the oab icon appears as an orange padlock in the browser's toolbar. lazy scholar (http://www.lazyscholar.org/)—a horizontal bar will appear at the top of the page for any scholarly article (see figure 2). along with other information, such as how many citations an article has and the ability to generate a citation for that article, pdf and/or file icons will appear in the middle of the bar if an open version is found. users can then click on any of the icons to be taken to that open version. if no open version is found, no icons will appear. there is no text message indicating nothing has been found. a browser button is also installed, and users can click it to make the bar disappear and reappear. figure 2. the lazy scholar toolbar appears just below the browser's search bar. kopernio (https://kopernio.com/)—a tab will appear in the bottom left corner of the screen for any scholarly article (see figure 3). if there is an open version, the tab will be dark green. if no article is found, the tab will be shorter and grey. if a user hovers over it, they will see a message indicating if an open version was found. when a user clicks on the dark green tab, kopernio automatically opens the article in its own viewer, called a locker, instead of the browser’s viewer. unlike the other three tools, users must register with kopernio and they can add their institution so kopernio can search to see if their institution has access to the article. 
http://www.lazyscholar.org/ https://kopernio.com/ information technology and libraries june 2020 at the click of a button | azadbakht and schultz 3 figure 3. the kopernio icon appears on the bottom left of the screen. unpaywall (https://unpaywall.org/)—a short tab will appear in the middle right of a screen for a scholarly article. when it has found an open version, the tab will turn bright green (see figure 4). when an open version has not been found, it will turn a light grey. clicking on the grey tab will also open a message indicating an open version could not be found. figure 4. unpaywall's green padlock icon appears halfway down on the right side of the screen. literature review the need for open access finding tools although oa helps take down financial barriers to accessing the scholarly literature, there is no one place to deposit content in order to make it oa. the registry of open access repositories, a database of both institutional and subject repositories, shows 4,725 repositories.1 no central database exists that searches every possible location for oa material, which means discovery of oa content remains difficult. willi hooper noted that “making repository content findable is a major challenge facing libraries.”2 nicholas et al. found in their study of international early-career researchers that most rely on google and google scholar to find scholarly articles and that one of their main goals is to find the full text as fast as possible.3 google scholar does include oa versions of articles, but this is not always readily obvious without clicking and trying each article version until they find an oa version. dhakal also notes that search engines do not always aggregate content in institutional repositories on a consistent basis.4 joe mcarthur, one of the founders of the open access button, said he decided to invent it after hitting paywalls after graduating.5 https://unpaywall.org/ information technology and libraries june 2020 at the click of a button | azadbakht and schultz 4 oaft reviews and other research most of the scholarly literature on oafts has focused on reviews of specific tools. unpaywall has received a number of positive reviews,6 and both open access button7 and kopernio8 have received several as well. dhakal has noted that unpaywall “helps take the guesswork out of accessing oa articles.”9 reviewers have generally found the tools easy to use, although some have included criticism. for instance, rodriguez found that open access button can result in odd error messages and false negatives (that is, not finding an open access version that actually does exist), although he liked the tool overall.10 little research has looked at how well the tools work and how usable they are, however. regier informally investigated why unpaywall and similar tools do not always find articles that are open and noted that one problem is likely that publishers of oa journals do not always upload their license information to crossref, one of the sites that unpaywall relies on.11 schultz et al. looked at how many oa versions the tools found in comparison to google scholar. none of the tools found as many as google scholar, although lazy scholar, unpaywall, and open access button all compared favorably to it, and each tool found at least some open versions that no other tool did.12 usability and other evaluation studies since the late 1990s, libraries have sought to improve the user experience of their websites and electronic resources. 
usability testing has since become a popular means of evaluating a library's online presence with the input of its users. blummer's 2007 literature review chronicles the first phase of this trend in a section of her article dedicated to early usability studies of academic library websites.13 many of these studies included both student and faculty participants and found that navigation issues needed to be resolved in order to maximize users' ability to locate key information on the library websites being evaluated. some also discovered that users misunderstood library terminology and that providing better descriptive terms and text helped improve the user experience.14

more recent examples of library website usability studies include one from 2018 by guay, rudin, and reynolds and another published in 2019 by overduin. the former's findings echoed those of earlier studies in finding that a cluttered interface can mask important navigational elements and content, hindering use.15 overduin describes a think-aloud, task-based study of the california state university bakersfield walter w. stiern library that concluded it was important for libraries to consider the preferences of both new and returning users when redesigning their websites.16

while most of the literature involves usability studies of library websites, online catalogs, and discovery layers, librarians have also evaluated other academic products and tools. in 2015, imler, garcia, and clements investigated pop-up chat reference widgets, such as those available through springshare's libchat software program.17 librarians at penn state university interviewed thirty students across three campuses, asking them to interact with a chat widget. the vast majority of students did not find the pop-up widget annoying, and many agreed that they would be more likely to use chat reference if they encountered it on the website. in addition, the participants preferred to have at least a ten-second delay between the loading of the webpage and the appearance of the pop-up, with an average ideal time of about fourteen seconds.18 haggerty and scott evaluated the usability of an academic library search box and its specialized search features through task-based interviews with twenty participants, most of whom were students.19 most of the study's participants indicated a preference for a simplified search box, though some were reluctant to lose access to the specialized search tabs. at around the same time, beisler, bucy, and medaille conducted a primarily task-based usability study of three streaming video databases to determine how patrons were using them.20 the students showed a preference for intuitive interfaces, whereas the faculty were concerned with the videos' metadata and descriptions as well as the accessibility and shareability of the content. the databases' advanced features were used less successfully. the results suggest that vendors would benefit from making navigation simpler and terminology clearer while enhancing search functionality.

methodology

in keeping with usability testing best practices, this study involved twelve subjects total, six students and six faculty members at the university of nevada, reno (unr).21 the authors sought subjects from a diverse set of science and social science disciplines.
recruitment efforts consisted of fliers and targeted emails, some sent directly by the authors to faculty members in their liaison areas and others distributed to students and faculty by liaison librarian colleagues. interested students were directed to a simple qualtrics form that asked them for their name, major, class standing, contact information, and whether they had ever used any of the four tools before. the student participants each received a $15 amazon gift card. faculty did not receive any compensation. the study was approved by unr's institutional review board (project number 1452303-2). faculty interviews took place in september 2019, and student interviews took place in november 2019.

the usability testing took place in three private conference rooms within the main library on a university-owned laptop running microsoft windows 10 and the chrome browser. participants were asked if they were regular users of chrome, and all were to some degree, with several indicating that they use it exclusively. both authors were present at all of the tests, alternating who walked the participants through the various tasks and who took notes. the screencast-o-matic screen capture software recorded both the participants' audio and video as well as their movements on the computer screen. referring to a script, the authors asked the participants to install each browser extension and then use it, in turn, to find three scholarly articles from journals not available to the unr community but recently requested through the libraries' interlibrary loan service. the authors switched the order in which they had participants install the four tools. half of the participants started by installing open access button, followed by lazy scholar, unpaywall, and kopernio, whereas the other half installed them in reverse order, beginning with kopernio. the authors uninstalled the tools between usability tests.

the three journal articles were selected with the assistance of unr libraries' coordinator of course reserves and document delivery services. the study purposely included articles that the university did not have access to in order to ensure that none of the tools found "open versions" simply because of the libraries' subscription. two were findable by all four oa finding tools and one was a "planned fail" that could not be retrieved by any of them, allowing the authors to witness how participants responded to this failure on the part of the tools. participants decided
they also noted any issues the participants experienced and any comments they made. the authors met and coded all the information they had gleaned from the 12 usability tests together. limitations this study included only 12 participants, all from the same institution. the authors know the faculty participants, as they recruited faculty directly from the disciplines with which they routinely work. moreover, these faculty are all considered early or mid-career. while this was intentional, as the authors wanted to focus on that sub-population of researchers, it may have had an effect on the faculty members’ impression of the tools. likewise, the participants’ comfort with technology, particularly their ability to learn new technology on-the-fly, and prior experience using other browser extensions or research productivity tools was not formally assessed prior to testing. these skills may have impacted how quickly participants were able to figure out a particular tool and how long they tried to find the full text of one of the articles before giving up. results installation all participants successfully installed each of the four tools, and most took around the same time to install each tool, with none taking more than 90 seconds to install. the longest installation, 84 seconds for kopernio, was connected to a technical issue that occurred during the installation. most participants seemed to have an easy time installing the tools, although some noted they found certain tools easier to download than others. for instance, faculty 1 noted that they thought lazy scholar was easier to install than open access button, and faculty 3 said they thought both open access button and unpaywall “were pretty smooth.” student 2 liked it when there was an obvious “install now”–type button, saying, “that’s pretty convenient.” when participants did struggle, it was usually with open access button and lazy scholar. participants did not always seem to realize right away which button on open access button’s website would download the tool. other times, participants were not sure if the tool had installed, not noting the new button on the chrome browser bar. for lazy scholar, one participant, student 4, noted it seemed to take more clicks to install it, and two participants received an error message, although they both were able to successfully install the tool on a second try. kopernio also information technology and libraries june 2020 at the click of a button | azadbakht and schultz 7 resulted in several error messages for one participant when creating an account, and the authors had to use one of their accounts to allow the participant to continue with the study. ability to use the tools when looking at whether participants were able to successfully use each tool on each of the three sample articles, we determined that participants were most successful using unpaywall. all participants successfully used the tool on article 1 and article 3. because article 2 was the planned fail in that no tool was able to locate it, participants who did not realize this and continued to try to use a tool to find it were deemed to have “failed” that particular task. by this measure, one faculty member failed on the second article while using unpaywall. lazy scholar and open access button each had a total of eight fails, with two participants—a faculty member and a student—failing on all three articles, and another student failing on the first two before successfully using the tool on the third article. 
kopernio had a total of ten fails, with two participants failing on all three articles, and two others failing on the first two. all four of these participants were faculty members. in some cases of failures, participants would either try to find instructions for the tools or try clicking around on the screen and then following various links to see if they could successfully use the tools. in other cases, participants gave a cursory search for the tool but stopped after a short period of time. article 2, the planned fail, also caused confusion for participants. for instance, one faculty participant seemed to think that open access button had a technical glitch and looked to the instructions to see if they could troubleshoot it. others never seemed certain if the tool was working incorrectly or if the article just was not available. another issue came with the article version that lazy scholar returned for article 3. unlike the other instances when the tool took users directly to the article file, in this case lazy scholar took participants to the record page for the article in a scholarly repository. participants could then click on a link for the full text, and it took several participants a few tries of clicking around on the page before finding the correct link. student 2 noted “i expected it to pull up the pdf like all the others did.” another student stopped at the record page, not realizing they could click one more link to get the full text, which was considered a fail. themes several themes emerged during usability testing. a major one was the design of the various extensions, encompassing their aesthetics and on-screen behavior. other themes include the usefulness of each tool’s instructions and additional features as well as how participants’ experience with other browser extensions shaped how their expectations of the four tools. design as with most usability studies, certain design choices determined how successful students and faculty were at finding the three test articles and how they felt about the experience. participants gravitated toward simple, clean designs and faltered or expressed displeasure whenever they encountered extension elements that appeared overloaded with information or details. several participants, for instance, thought that lazy scholar’s toolbar was clunky or too cluttered looking, information technology and libraries june 2020 at the click of a button | azadbakht and schultz 8 even when they successfully used it to find the test articles. there were too many options embedded in the toolbar, which caused confusion, and its small font also proved problematic for a majority of the students and faculty. conversely, several participants said that they appreciated unpaywall’s minimalism, and many turned to it first when instructed to use the four tools to find the three articles. “this one is the most obvious one,” faculty 1 stated. many also responded positively to open access button’s neat-looking icon and simplicity. kopernio’s design led to a mix of user experiences. while participants seemed to appreciate its clear-cut, dark green icon, some of its other features—the search box and the storage “locker”—created unnecessary clutter. throughout testing, participants also expressed mixed views of tools that featured an automatic pop-up as a means of indicating that a free version of an article was or was not available. lazy scholar, unpaywall, and kopernio all involve some version of this design choice. 
open access button behaves like other commonly used browser extensions, such as zotero, in that the extension remains inactive until clicked upon. the participants’ stated preferences did not always align with their behavior. some participants did not like that the pop-ups appeared without prompting and that the pop-up tools blocked parts of the computer screen. “i like things that go away,” explained faculty 3. faculty 6 noted that they preferred open access button because it did not load automatically and that it opened in a new, separate window. what’s mo re, those participants who had experience using other browser extensions were not expecting the pop -ups and first tried clicking on the tools’ icons embedded in the browser bar. this happened more often with lazy scholar and kopernio than it did with unpaywall, but all three experienced this. however, several who said that they found pop-ups “annoying” or “distracting” nevertheless were able to successfully use the tools to quickly find free versions of the test articles. this discrepancy was especially evident in the case of unpaywall, which almost everyone used successfully and with apparent ease. a tool’s placement on the screen was likewise one of the key aspects of the tools’ design that made it either easier or more difficult to use during usability testing. unpaywall’s tab sits on the middleright side of the computer screen, whereas kopernio’s green “k” tab appears toward the bottom of the screen. sometimes the icon would disappear entirely after a few seconds, reappearing only after the page had been reloaded. kopernio’s location was especially problematic because most participants are not accustomed to needing to scroll or look to the bottom of a webpage. moreover, needing to scroll is “not convenient,” explained student 3. this appeared related to at least some of the failures that participants had with the tool. kopernio’s design did improve somewhat midway through usability testing. the icon is now highlighted when the webpage first loads and stopped dropping to the bottom of the page. however, some participants still missed the icon on their initial use of kopernio. student 2 said afterward that “unpaywall is definitely easier to use, because its pop-up button stayed up.” lazy scholar’s toolbar also proved a stumbling block for several participants. some did not notice it at first whereas others were not sure where within the toolbar they needed to click to retrieve the article, even though this is indicated by a standard pdf or html icon. the use of color also impacted participants’ success with the tools, particularly unpaywall. unpaywall’s lock icon turns bright green when the tool has found an open version of the article information technology and libraries june 2020 at the click of a button | azadbakht and schultz 9 and grey when it has not. both faculty and students appreciated this simple status indicator. “i recognized the little green button,” said student 4. for users with color vision deficiency, however, this favorite feature could be problematic. users can click on the icon regardless of whether they can differentiate between the icon’s two settings or not, but some convenience is lost. the kopernio icon’s darker green is likewise an issue for those with some forms of color blindness. open access button’s and lazy scholar’s color choices garnered less comment. 
prior experience with browser extensions another aspect of the tools’ design that influenced how participants interacted with a particular tool and how intuitive they ultimately found it to use was their prior experience with other browser extensions. several participants indicated that they used other browser extensions in their everyday lives. specifically, this knowledge appeared to affect their success with open access button, which behaves like most browser extensions do in that it does not launch automatically. faculty 2 said that using open access button “felt the most natural,” and faculty 3 said, “most other browser extensions i’ve used, when you want it to do the thing, you click it.” some participants who had less experience with other browser extensions still managed to use open access button successfully, though it took them slightly longer to do so. however, a few participants failed to use the tool at all during testing, having given up when they could not determine how it worked. instructions participants expressed a desire for simple, straightforward instructions and were more likely to read instructions that seemed succinct and easy-to-follow. they were also more likely to try out a tool just after installing it if the tools’ instructions were clear and if the instructions provided an example they could use to see the tool in action. unpaywall’s instructions do this particularly well, as they consist of minimal text on a large image of how the tool works. open access button and kopernio both provided instructions and examples that helped mitigate some of the issues participants had with them. for example, those who tried out open access button’s example before attempting to find the test articles—or who referred back to the instructions when they encountered a problem—were more likely to use it successfully, even if their prior experience with traditionally designed browser extensions was limited. kopernio’s instructions highlight where the icon appears, which primed the participants to later look towards the bottom of the screen for it. although this did not prevent confusion when using kopernio (as noted previously), it did reduce it. lazy scholar’s instructions, on the other hand, are quite detailed and are written in a very small font. this combination intimidated the participants, many of whom chose to quickly move on to the next task. some scanned the instructions, but none read through them. additional features three of the four tools—lazy scholar, kopernio, and, to a limited extent, open access button— offer additional features, including a way to contact article authors, integration with citation management tools, article metrics, and file storage space. however, many participants did not take note of the tools’ ability to do things other than find open versions of scholarly articles, and their enthusiasm for these options varied. this is likely partly due to the focus of the usability tests on these tools’ core function. more of the students responded positively to the tools’ extra features than did the faculty. information technology and libraries june 2020 at the click of a button | azadbakht and schultz 10 lazy scholar’s and kopernio’s extra features received the most attention. two students responded positively to lazy scholar’s cite option (a mendeley integration) in particular. for student 5, it made lazy scholar stand out. participants also tried out kopernio’s google scholar-powered search box when they had trouble locating and using its pop-up tab. 
a few students indicated that they would consider using this feature again to find related articles. however, those participants who came across mentions of kopernio’s article storage tool, known as a “locker,” either expressed confusion over its purpose—“locker? what locker?” wondered faculty 5—or were simply not interested in learning more about it. others said they did not need storage space of this kind. “i don’t get the metaphor. my hard drive is my locker,” noted faculty 2. favorites when asked which, if any, of the tools they preferred and would consider using, eight of the participants said unpaywall, followed by seven who said open access button (see table 1). four said they liked lazy scholar, and two said they liked kopernio, although two participants said they specifically would not use kopernio, and two said the same of lazy scholar. it is important to note that many of the participants named multiple tools, suggesting that they saw the need to rely on more than just one tool. table 1. breakdown of preference for oa finding tool by faculty and students. participant group open access button unpaywall lazy scholar kopernio faculty 3 4 1 1 students 4 4 3 1 discussion keep it simple the results show that users most preferred simplicity, including the instructions for downloading and using the tools. for example, participants seemed to have the easiest time downloading and trying out unpaywall because of how large and obvious its download button was, as well as how minimal and large their instructions were. in comparison, participants also seemed to like lazy scholar’s large and easy-to-see download button but disliked the long instructions, which were in a smaller font. as most of them did look at the instructions for unpaywall, it is clear they do find instructions helpful, as long as they can be read and understood in just a few seconds. this also seemed to be the reason why some participants struggled to find open access button’s icon. although the site does provide an instructional image similar to unpaywall’s, it is smaller and does not do as good of a job of pointing out the button’s location. some participants took a moment to look at the image but failed to notice what it was trying to highlight. likewise, participants, especially faculty, seemed to prefer the tools with a simple and clean design. the added features of lazy scholar were not worth the space it took up on the page. a few even remarked negatively on the size of the kopernio pop-up tab, saying it blocked too much of the screen. although a few at first remarked negatively on the unpaywall tab, several said that by information technology and libraries june 2020 at the click of a button | azadbakht and schultz 11 the end of the study, the tab no longer bothered them and that its usefulness outweighed its obtrusiveness. do not assume prior experience most participants who figured out how to use open access button seemed to like it; however, several struggled with finding it to begin with. part of this might be because the other tools, all of which use a pop-up tab, might have conditioned them to look for something similar. however, several participants noted that they were not familiar with browser extensions, which likely affected their ability to find the tool in the browser bar. they would try clicking directly on the article homepage screen. providing clear and obvious instructions would likely help ameliorate this issue. 
extra features not always worthwhile overall, participants did not seem interested in the extra features, especially kopernio’s locker and open access button’s option to email the author. and for faculty, the additional features of lazy scholar, including citation information and similar articles, proved to be a negative. however, some students did seem interested in these features, meaning this tool might be better for those who are still new to information discovery. conclusion although participants’ reaction to certain design elements, such as pop-ups and the finding tools’ additional features, were mixed, most of them were able to use the four browser extensions successfully. the tools’ location on the computer screen and their similarity (or dissimilarity) to other browser extensions influenced success rates. likewise, clean, simple design elements and straightforward instructions enhanced participants’ experience with the four tools. even though more of the students and faculty said they preferred unpaywall and open access button, each of the four tools appealed to at least some of the participants. both students and faculty were excited to find out about these tools and some even expressed surprise that they are freely available. many seemed open to the idea of using more than one tool, which can be helpful given each extension’s distinctive approach to finding and retrieving articles. however, having them use four tools at once also appeared to create issues for at least some of the participants as they would confuse which tool was which. librarians and other oa advocates can use the information from this study to help guide potential users to the tools that best suit their individual preferences and comfort level with similar technologies. increased promotion will ramp up adoption of the tools by a more diverse pool of users, which will ultimately generate the feedback needed to make the extensions more intuitive overall. endnotes 1 registry of open repositories, “welcome to the registry of open access repositories registry of open access repositories,” 2019, http://roar.eprints.org/. 2 michaela d. willi hooper, “product review: unpaywall [chrome & firefox browser extension],” journal of librarianship & scholarly communication 5 (january 2017): 1–3, https://doi.org/10.7710/2162-3309.2190. http://roar.eprints.org/ https://doi.org/10.7710/2162-3309.2190 information technology and libraries june 2020 at the click of a button | azadbakht and schultz 12 3 david nicholas et al., “where and how early career researchers find scholarly information,” learned publishing 30, no. 1 (january 1, 2017): 19–29, https://doi.org/10.1002/leap.1087. 4 kerry dhakal, “unpaywall,” journal of the medical library association 107, no. 2 (april 15, 2019): 286–88, https://doi.org/10.5195/jmla.2019.650. 5 eleanor i. cook and joe mcarthur, “what is open access button? an interview with joe mcarthur,” the serials librarian 73, no. 3–4 (november 17, 2017): 208–10, https://doi.org/10.1080/0361526x.2017.1391152. 6 terry ballard, “two new services aim to improve access to scholarly pdfs,” information today 34, no. 9 (november 2017): cover-29; chris bulock, “delivering open,” serials review 43, no. 3–4 (october 2, 2017): 268–70, https://doi.org/10.1080/00987913.2017.1385128; dhakal, “unpaywall”; e. e. gering, “review: unpaywall,” may 24, 2017, https://eegering.wordpress.com/2017/05/24/review-unpaywall/; barbara quint, “must buy? maybe not,” information today 34, no. 
editorial board thoughts: policy before technology — don't outkick the coverage
brady lund
information technology and libraries | march 2022, https://doi.org/10.6017/ital.v41i1.14773
brady lund (blund@g.emporia.edu) is a doctoral candidate and lecturer at emporia state university and a member of the ital editorial board. © 2022. opinions expressed in this column are the author's and do not reflect those of the editorial board as a whole or of core, a division of ala.

in the race to adopt the newest and best, practical considerations for emerging technologies are frequently overlooked. technology can set an organization apart and, in the case of libraries, be instrumental in helping demonstrate value. yet all new technologies carry additional, potentially unpleasant consequences, whether threats to privacy and security, barriers to accessibility, risks to health, learning barriers, or exposure to misinformation. organizations must consider these threats before introducing new technologies, rather than the other way around. to illustrate these threats and their policy implications, i will briefly discuss two popular technologies/innovations—virtual reality and data analytics—and the threats that are often overlooked by organizations and how they may be appropriately addressed by policy. virtual reality (vr) has quickly become a popular technology in all types of libraries and learning organizations. as noted in many recent publications, vr provides an immersive and interactive medium to engage with learning and entertainment content.1 of course, libraries are always seeking new ways to engage patrons with their collections and services, so it is natural that there would be high interest in this technology. however, i have observed that this technology is frequently made available with little foresight or oversight of potential issues. the engaging interface of vr technology also presents risks to certain individuals.
it has been known to trigger seizures among those who are predisposed and can cause severe dizziness and disorientation.2 these risks are severe enough that the institutional review board at my university required that a safety disclaimer be included for any project that utilized vr technology for learning. however, inclusion of a disclaimer is not necessarily common practice in library research and certainly not for non-research projects. further, substantial learning barriers should be acknowledged for virtual reality technology. a learning curve is perhaps a less serious threat, compared to the health and safety risks, but can still lead to non-use or misuse of the technology.3 libraries should want as many patrons as possible to use the technology to enrich their lives. this includes individuals who have limited technology experience. it is important to provide education and policy to ensure the technology is used properly, such that the technology will not be damaged and the user will not quit trying to use the technology due to frustration. specific policies for the use of vr technology could be integrated into existing technology policy (if such a policy already exists) or created as a new policy. either way, it should be highly visible, and patrons should be asked to acknowledge it before use. the policy may include elements such as "the patron must follow all library employees' guidance on how to properly use the vr headset" and "the patron is encouraged to ask any employee for assistance with the headset." while a library may not be able to foresee or enforce perfect policy for all issues that arise from using emerging technologies like vr, these are some commonsense policy items that protect the user, the library, and the technology while it is in use. though a vastly different "technology" in many ways, the evolution of data analytics in modern libraries similarly poses significant threats to library patrons. as opposed to physical threats to well-being, the threats associated with data analytics are mostly related to social, psychological, and economic well-being through privacy and security risks. depending on how data is used, it can be rather innocuous or overtly malicious. it is not always clear when data goes from being innocuous to being a threat.4 collecting patron addresses can seem like a necessary and acceptable practice in order to issue a library card. libraries could use this data, though, in conjunction with census and other government data, to identify demographics of library users, such as the ethnicity of patrons. this could be helpful in knowing, for instance, that a library has a large hispanic patron base and thus may want to invest in spanish-language resources, but it also involves using data that patrons were forced to supply in order to profile them and make inferences about what materials they would like. understandably, many patrons would likely rather not have private data about them collected and analyzed, even if it could significantly improve services. there is certain data—like the addresses mentioned above—that libraries must collect in order to provide services. this cannot be avoided. rather, what should be done is to have a policy that clearly (without much legal jargon) outlines what data is collected and for what purposes it is used. everyone knows that no one reads lengthy legal disclaimers.
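to make the idea of a short, plain-language data policy concrete, here is a minimal sketch, not drawn from any particular library's practice, of how a library might keep a simple inventory of what patron data it collects, why, and for how long, and render it as the kind of brief notice described above. all field names and example entries are hypothetical.

```python
# a hypothetical sketch of a patron-data inventory that could sit behind a
# library's written data policy and be rendered as a short notice for patrons.

from dataclasses import dataclass

@dataclass
class DataItem:
    name: str         # what is collected
    purpose: str      # why it is collected
    retention: str    # how long it is kept
    shared_with: str  # who else sees it, if anyone
    optional: bool    # can the patron decline and still receive service?

INVENTORY = [
    DataItem("home address", "issuing a library card and verifying residency",
             "while the card is active", "no one", optional=False),
    DataItem("checkout history", "holds, renewals, and reading suggestions",
             "deleted 30 days after return", "no one", optional=True),
]

def plain_language_notice(items):
    """render the inventory as a notice a patron can read in a minute."""
    lines = []
    for item in items:
        choice = "you may decline" if item.optional else "required for service"
        lines.append(f"- we collect your {item.name} for {item.purpose}; "
                     f"kept {item.retention}; shared with {item.shared_with}; {choice}.")
    return "\n".join(lines)

if __name__ == "__main__":
    print(plain_language_notice(INVENTORY))
```

keeping the inventory in one structured place makes the policy auditable by staff, while the rendered notice stays short enough that patrons might actually read it.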
while it may be seen as above-board in the eyes of the law, slipping policies that the library knows most patrons would question into a disclaimer is unethical. any questionable policies or procedures should be made clear to patrons, so that they can make an informed decision on whether to opt out of those services. it is great to have extensive data to improve services, but it should not be collected without real consent. no librarian should go home at the end of the day with any question about whether they used proper data collection procedures. additionally, there are always risks with the storage and maintenance of data. how is the data being stored? what security measures have been taken? these questions, along with the concerns in the prior paragraphs, are items that would all have to be addressed in an ethical review application for human subjects research at a university, but they can be (and often are) overlooked when it comes to library services and assessment. this may be particularly true at public libraries, which are not connected to an institution of higher education (which provides some ethical oversight). it is always better to start with a policy than to make one up as one goes along, even if it is necessary to adjust the policy over time as new risks and considerations emerge. for those who are creating a new policy from scratch, one of the best sources of information and inspiration can be the existing policies of other, similar organizations.5 for example, a large public library may look to the data policies of a similarly situated large public library for inspiration. i encourage additional work by researchers within the field of library technology to strengthen evidence-based practice within the area of technology policy formation. it is important to be careful with the design of policy and not to come at it without first doing your homework. yet, at the same time, it is important to consider the unique context of your own institution. what is a successful policy for one library may not be so for another—you must know your service population and your specific space, technology infrastructure, and management capacities. something like the administrative structure of a library system can significantly impact the success of policy implementation. policy, understandably, can be seen as a boring—if necessary—part of the proper functioning of a library and its technology. this can lead to policy being something that is created either in haste or after considerable procrastination, or something that becomes the subject of unnecessary, prolonged debate among library administration. in most cases, appropriate policy can, in fact, be quite straightforward, if libraries rely upon existing policy examples, an understanding of the technology in question, and a thorough assessment of their library environment to guide the policy-drafting process. technology policy can be a boring subject, but its necessity cannot be overstated for reducing liability and threats to the well-being of patrons, library employees, and property. it is important to have technology policy in place before the technology is made available to the public so that patrons can make informed decisions about whether to use the technology and/or agree to share data.

endnotes

1 matt cook et al., "challenges and strategies for educational virtual reality," information technology and libraries 38, no.
4 (2019): 25–48, https://doi.org/10.6017/ital.v38i4.11075; kenneth j. varnum, beyond reality: augmented, virtual, and mixed reality in the library (chicago, il: american library association, 2019).
2 james s. spiegel, "the ethics of virtual reality technology: social hazards and public policy recommendations," science and engineering ethics 24, no. 5 (2018): 1537–50, https://doi.org/10.1007/s11948-017-9979-y.
3 amy restorick roberts et al., "older adults' experiences with audiovisual virtual reality: perceived usefulness and other factors influencing technology acceptance," clinical gerontologist 42, no. 1 (2019): 27–33, https://doi.org/10.1080/07317115.2018.1442380.
4 yong jin park, "personal data concern, behavioral puzzle and uncertainty in the age of digital surveillance," telematics and informatics 66 (2022): article 101748, https://doi.org/10.1016/j.tele.2021.101748.
5 lili luo, "experiencing evidence-based library and information practice: academic librarians' perspective," college and research libraries 79, no. 4 (2018): 554–67, https://doi.org/10.5860/crl.79.4.554.

lita president's message: facing what's next, together
emily morton-owens
information technology and libraries | june 2020, https://doi.org/10.6017/ital.v39i2.12383
emily morton-owens (egmowens.lita@gmail.com) is lita president 2019–20 and the acting associate university librarian for library technology services at the university of pennsylvania libraries.

when i wrote my march editorial, i was optimistically picturing some of the changes that we are now seeing for lita—while being scarcely able to imagine how the world and our profession would need to adapt quickly to the impacts on library services as a result of covid-19. it is a momentous and exciting change for us to turn the page on lita and become core, yet this suddenly pales in comparison to the challenges we face as professionals and community members. libraries' rapid operational changes show how important the ingenuity and dedication of technology staff are to our libraries. since states began to shut down, our listserv, lita-l, has hosted discussions on topics like how to provide person-to-person reference and computer assistance remotely, how to make computer labs safe for re-occupancy, how to create virtual reading lists to share with patrons, and how to support students with limited internet access. there has been an explosion in practical problem-solving (ils experts reconfiguring our systems with new user account settings and due dates), ingenuity (repurposing 3d printers and conservation materials to make masks), and advocacy (for controlled digital lending). sometimes the expense of library technologies feels heavy, but these tools have the ability to scale services in crucial ways—making them available to more people at the same time, available to people who can only take advantage of them after hours, available across distances. technologists are focused on risk, resilience, and sustainability, which makes us adaptable when the ground rules change. our websites communicate about our new service models and community resources; ill systems regenerate around increased digital delivery; reservation systems for laptops now allocate the use of study seating.
our library technology tools bridge past practices, what we can do now, and what we'll do next. one of our values as ala members is sustainability. (we even chose this as the theme for lita's 2020 team of emerging leaders.) sustainability isn't about predicting the future and making firm plans for it; it's about planning for an uncertain future, getting into a resilient mindset, and including the community in decision-making. although the current crisis isn't climate-related per se, this way of thinking is relevant to helping libraries serve their communities. we will need this agile mindset as we confront new financial realities. our libraries and ala itself are facing difficult budget challenges, layoffs, reorganizations, and fundamental conversations about the vital importance of the services we provide. my favorite example from my own library of a covid-19 response is one where management, technical services, and it innovated together. our leadership negotiated an opportunity for us to gain access to digitized, copyrighted material from hathitrust that corresponds to print materials currently locked away in our library building. thanks to decades of careful effort by our technical services team, we had accurate data to match our print records with records for the digital versions. our it team had processes for loading the new links into our catalog almost instantaneously. the result was a swift and massive bolstering of our digital access precisely when our users needed it most. this collaboration perfectly illustrates how natural our merger with alcts and llama is. as threats to our profession and the ways we've done things in the past gather around us, i am heartened by the strengths and opportunities of core. it is energizing to be surrounded by the talent of our three organizations working together. i hope more of our members experience that over the summer and fall, as we convene working groups and hold events together, including a unique social hour at ala virtual and an online fall forum. i close out my year serving as the penultimate lita president in a world with more sadness and uncertainty than we could have foreseen. we are facing new expectations and new pressures, especially financial ones. as professionals and community members, we are animated by our sense of purpose. while lita has been transformed by our vote to continue as core, the support and inspiration we provide each other in our association will carry on.

the current state and challenges in democratizing small museums' collections online
avgoustinos avgousti and georgios papaioannou
information technology and libraries | march 2023, https://doi.org/10.6017/ital.v42i1.14099
avgoustinos avgousti (a.avgousti@cyi.ac.cy) is a researcher, the cyprus institute, cyprus. georgios papaioannou (gpapaioa@ionio.gr) is associate professor in museum studies and director of the museology research laboratory, ionian university, corfu, greece. © 2023.

abstract

this article focuses on the problematic democratization of small museum collections online in cyprus. while the web has enabled cultural heritage organizations to democratize information to diverse audiences, numerous small museums do not enjoy the fruits of this digital revolution; many of them cannot democratize their collections online.
the current literature provides insight into small and large museums' challenges worldwide. however, we do not have any knowledge concerning small cypriot museums. this article aims to fill this gap by raising the following research question: what is the current state of small museum collections online in cyprus, and what challenges do they face in democratizing their collections online? we present our empirical results from the interview summaries gathered from six small museums.

introduction

cultural heritage digitization and online accessibility offer an unprecedented opportunity to democratize museum collections. online collections, typically presented on institutional websites, represent the world's culture and reflect an increasing trend toward a world where information is digitally preserved, stored, accessed, and disseminated instantaneously through a global and interconnected digital network. consumers' habit of searching for information on the web has enabled cultural heritage institutions to democratize their collections online, yet most small museums have not benefited from this process and do not have their collections online. as a result of this problem, digital versions of small museum collections are largely inaccessible, meaning less access to information and knowledge. there is a clear need for small museums to remain relevant by publishing their collections online. small museums must move quickly into the digital world. current literature provides insights into the challenges they face worldwide. however, we do not have knowledge regarding the situation in cyprus. this study aims to fill this gap by researching small museums in cyprus and asking the following research question: what is the current state of small museum collections online in cyprus, and what challenges do they face in democratizing their collections online?

what is a small museum?

museums are defined as small based on their annual budget and number of staff. the american association for state and local history (aaslh) defines museums as small if they have an annual budget of less than $250,000 and limited staff with multiple responsibilities. other factors such as the size of collections and the physical size of the museum could further categorize a museum as small. katz set the same budget and set the staff number at five or fewer.1 honeysett and falkowski put the budget at $300,000 and five or fewer employees.2 miller notes that the average small museum has just two full-time employees and a budget of less than $90,000.3 watson, by contrast, defines small museums as ones that grew out of the community they serve.4 for the purposes of this article, a small museum is one with more than one but fewer than five full-time employees, not including museum custodians. categorizing a museum based on its budget is difficult and contentious since museum staff are often funded by another body, such as a municipality.

literature review

cultural heritage institutions such as galleries, libraries, archives, and museums (glams) were among the first organizations to digitize information by creating databases whose access was granted locally to institutional cardholders (horan, 2013).
the process of digitization is of paramount importance, with museums eager to offer online access to their physical collections.5 online collections provide a range of opportunities, including the facilitation of knowledge sharing and the creation of a participatory environment that promotes information exchange.6 through their online presence, museums can present their collections to a global audience.7 the accessibility of digital knowledge opens the door for further knowledge to be generated and enhances the educational reach of cultural institutions.8 online collections create opportunities for small and geographically isolated museums to deliver learning opportunities to audiences around the world, something all museums should aim for.9 while larger museums have done well, smaller ones have not been as successful.10 much of past knowledge is stored in small museums, whose importance in preserving cultural heritage should not be underestimated.11 they sometimes add far more to social capital than larger national ones.12 though the need for museum collections online is recognized, there are limitations. if it were simple, every museum would be online.13 however, most small museums are not online.14 their collections remain digitally inaccessible to future generations.15 oberoi and arnold have gone so far as to maintain that information absent from the internet can be regarded as nonexistent.16 on the other hand, in the rare cases where small museums do have their collections online, they target human consumers.17 the information is stored in isolated data silos incompatible with automatic processing. the challenge is to make collections discoverable via online search engines and metadata aggregators.18 the issue has been ongoing for many years: as gergatsoulis and lilis noted 18 years ago, the web lacks semantic information, and it has proved challenging to process such a massive set of interconnected data.19 clearly, online collections must be understood and used efficiently both by humans and machines, because machine-consumable content ultimately ends up as human-consumable content.20

the current state and challenges

small museums find it difficult to publish their collections online. most large museums have undergone a digital transformation, but few small ones have.21 the museum survey by tongue in 2017 showed that the number of museums planning to publish their collections online decreased from 40 percent in 2016 to 24 percent in 2017, although only 8 percent had already gone online by 2018.22 the survey in 2020 by the network of european museum organisations (nemo) on digitization in european museums shows that an average of 20 percent of museum collections in europe as a whole are online, and the median is 10 percent.
surprisingly, 43 percent are digitized but not online, meaning the public has access to less than half of the existing digital items.23 a report by flynn in 2018 reveals that most historical society collections are not accessible online.24 according to honeysett and falkowski, the majority of museums in their survey in the us have less than 10 percent of their collections online.25 according to a survey by axiell in 2017, only 21 percent of museums have a complete collection online, 27 percent have more than half, 38 percent have less than half, and 14 percent have no collections online.26 in 2020 beaudoin pointed out that approximately 32 percent of us art museums with holdings provide online collection systems that are openly available to the public, while 13 percent do not even have an institutional website.27 avgousti, papaioannou, and gouveia indicated that even if small museums manage to give online access to their collections, those collections are often stored in isolated data silos incompatible with automatic processing.28 the museum survey by vernon systems in 2016 showed that 82 percent of museums do not use any machine-consumable standards.29 furthermore, only 11.9 percent use dublin core as a metadata standard, 3.6 percent darwin core, 1.2 percent ead, and 8.3 percent other standards. further, the existence of individual collections online, maintained by different organizations, brings challenges to the discoverability, sharing, and reuse of resources.30 metadata aggregation is a frequently utilized strategy in which centralized organizations, such as europeana,31 collect associated metadata to make resources more discoverable and usable. why do we witness such low levels of online publishing in small museums? and why are online collections not in a format that is searchable and easy to find? according to the relevant literature, small museums lack the resources and skilled staff to move into the digital age.

current obstacles

a key obstacle in the digitization of small museum collections is insufficient resources. large cultural heritage institutions have much greater access to funds.32 according to klimper, while the internet has had a tremendous impact on the democratization of european culture, insufficient financial resources remain a significant challenge for small museums.33 irina oberlander from the institute of cultural memory has pointed out that small and medium-sized museums with limited budgets are victims of the digital age.34 laine-zamojska stressed that small museums, which are often entirely run by volunteers, cannot afford to digitize or make their collections available to a wider audience.35 therefore, online access to cultural heritage in these small institutions is minimal. the nemo report in 2020 showed that insufficient staff is another major obstacle for museum digitization and online accessibility.36 small museums are understaffed.37 this is confirmed by gallery systems, who noted that small museums face their own set of collection challenges.38 with smaller team sizes and limited staff hours, it is difficult to operate. the museum survey by tongue in 2017 showed that 73 percent of museums did not have dedicated staff to manage online collections.39 this means that collection management is given to staff who already have a full job description.
avgousti, papaioannou, and gouveia pointed out that museums do not usually hire experts to plan, develop, deploy, and maintain a digital collection, but delegate the task to museum staff who are often limited in technological skills,40 while wigodner and kearney mentioned that small museums typically have fewer (if any) employees devoted to web publishing.41 fewer employees often means a lack of skilled personnel. in the aforementioned survey, no museum with fewer than 50 staff members reported employing a computer expert.42 additionally, honeysett and falkowski mention that two-thirds of museums have one or no it personnel.43 in addition, the same concern has been observed in small university libraries, 67 percent of which did not have an it expert.44 further, small museums do not have suitable technology, and in many cases the staff is not technologically adept.45 additionally, klimper affirms that the internet's promise of providing access to european culture is hampered by a lack of technological skills.46 considerable expertise in semantic web technologies is needed to expose machine-consumable content to the "web of data."47 finally, in-depth knowledge of modeling, along with programming skills, is also essential.

complexity of technology and metadata issues

the nemo report in 2020 showed that less than 20 percent of museum collections are online.48 as already mentioned, this may be attributed to the prerequisites of online collections, which include complex technology and the need for online platforms. additionally, avgousti, papaioannou, and gouveia pointed out that small museums do not have suitable technologies.49 within the discussion of the semantic web (also known as web 3.0: extensions of the world wide web that make internet data machine-readable by applying standards), corlosquet stated that one of the significant challenges is getting semantic web data annotations into end-user applications. if this is achieved, there will be faster adoption of the web of data. moreover, while content management systems (cms) significantly aid the production of online content by end users, the problem of allowing the user to produce semantic web content remains elusive.50 further, velios discusses the problem of understanding semantic web concepts in relation to complex setups.51 such setups may be bewildering for humanities scholars without a technical background. he mentions that the semantic web does not offer the necessary tools to accommodate data easily. vavliakis, karagiannis, and mitkas postulate that even for mainstream use of the semantic web in the cultural heritage community, easily operated tools are required.52 cultural heritage institutions are encouraged to start processing and publishing content with semantic technologies. still, the tools that can undertake such a considerable task continue to lack user-friendly features.
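to make the idea of machine-consumable collection records more concrete, the following sketch shows one way a collection tool could publish the same object record for both people and machines, here by generating schema.org json-ld that can be embedded in the object's web page. the object, field values, and helper function are hypothetical and are not drawn from any museum in this study; the schema.org property names should be checked against the current schema.org documentation before use.

```python
# a hypothetical sketch: turning a simple museum object record into schema.org
# json-ld so the same content is readable by people and by search engines
# or metadata aggregators. all values below are placeholders.

import json

record = {
    "title": "terracotta votive figurine",
    "creator": "unknown",
    "date_created": "c. 600 BCE",
    "material": "terracotta",
    "identifier": "OBJ-0042",
    "image_url": "https://example-museum.org/objects/OBJ-0042.jpg",
    "page_url": "https://example-museum.org/objects/OBJ-0042",
}

def to_jsonld(rec):
    """map the flat record onto schema.org terms used by search engines."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": rec["title"],
        "creator": rec["creator"],
        "dateCreated": rec["date_created"],
        "material": rec["material"],
        "identifier": rec["identifier"],
        "image": rec["image_url"],
        "url": rec["page_url"],
    }

# the resulting <script> block can be embedded in the object's html page
snippet = ('<script type="application/ld+json">\n'
           + json.dumps(to_jsonld(record), indent=2)
           + "\n</script>")
print(snippet)
```

a nontechnical staff member would never need to see this output; the point is that a tool could generate it automatically from the same fields a small museum already records, which is the low-effort, machine-consumable publishing the literature above calls for.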
daradimos, vassilakis, and katifori claim that small museums use content management systems to publish their collections online.53 however, using a general-purpose cms (e.g., drupal) comes with great difficulty, primarily due to the lack of technical information such as dublin core fields, as nontechnical staff cannot be expected to know how to install and configure appropriate modules within drupal to enable the entry and publication of this metadata.54 moreover, there has been little development within current cmss of user-friendly tools targeting the implementation of semantic markup annotations. the integration of cmss with semantic web technologies would increase cultural heritage knowledge dissemination remarkably. further, the absence of robust and easily usable tools is considered a central challenge that continues to pose obstacles to the rapid adoption of the semantic web and linked data.55 antoniou and van harmelen explain that the semantic web's adoption relies on developing new and straightforward tools.56 the semantic web is also based on the adoption of existing technology rather than on new scientific solutions. modern and easy-to-use tools will facilitate the semantic web's adoption beyond what is available at present. however, only a small number of institutions use semantic technologies. tim berners-lee, the brains behind the semantic web, points out that the machine-readable web is always farther off than the human-readable web.57 in the case of large and well-funded organizations or museums like the bbc or the british museum, it is possible to work with semantic web technologies. on the other hand, small museums will have difficulties with the semantic web's smooth implementation.58 it is pivotal to emphasize that the challenges related to the implementation of machine-consumable content by museums depend heavily on adopting existing technology rather than on new scientific solutions. as antoniou and van harmelen have underlined, the most significant needs are for easily accessible tools that reach nontechnical communities. such technological progress will lead to a more advanced semantic web than what can be achieved today.59

methodology

data collection methods

interviews are regularly used in qualitative research for data collection.60 structured interviews lead to more specific answers, usually in a controlled environment. in unstructured interviews, there are no set-in-advance questions, and the interview can be very broad, open, and exploratory. semistructured interviews fall in the middle, as they allow both a few specific questions to be addressed and space for extra information by deviating from the set questions. this is the main reason why they are one of the most popular and widely used methods of data collection.61 the interview type selected depends on the questions to be asked and the research method. the current research aims to develop a comprehensive understanding of a problem. therefore, semistructured interviews were the ideal tool, and an interview guide containing open-ended questions was developed.

selection of the sample

the researcher selected museums based on nonrandom criteria. nonprobability sampling techniques are often suitable for qualitative research.
nonprobability sampling’s aim is not to test a hypothesis about a large population but to establish an initial understanding of a small community or a population under research. the current research targets small museums in cyprus. therefore, a nonprobability sampling method was used to select small museums. the small museums contacted were not always responsive. however, we managed to conduct interviews with six small museums in cyprus using the snowball sampling method, where the researcher asked the interviewee to refer other people for conducting future interviews. sample size in the current study, the sample population is homogeneous, meaning the population is related to small museums in cyprus. when the population is homogenous the sample size should be at least 4 to 12 cases. in cases of heterogeneous samples, for example in small museums from around the world, the sample size must be at least 12 to 30 cases. in more complex cases such as ethnographic or grounded theory, the sample size must be larger. information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 6 avgousti and papaioannou in our case, we started with two cases and continued until data saturation was achieved, the point in the research when no new information is discovered in data analysis.62 cyprus has 34 museums of which 10 are small (for this survey, defined has having one full-time and fewer than five total employees). we interviewed six of the 10 small museums and reached data saturation after interviewing the first four. conducting the interviews in preparation for the interviews, we contacted the interviewees by phone and email, informing them about the interview. information on the size of the staff was gathered by contacting the museum. while ten museums met the definition of small, only six agreed to participate in the research. first, a pilot test was conducted on two interviewees to identify any problems with the interview guide. based on this pilot test, we made changes and corrected mistakes. due to the covid-19 pandemic, interviews were conducted via internet-based technologies, mostly zoom, a video telephony software program, chosen because of its ability to record video. the interview length was about 20–25 minutes, and all participants had the option to choose greek or english as the interview language. due to the pandemic and logistic challenges, it took about six months to identify subjects and conduct the interviews. results this section discusses the empirical results extracted from the interview summaries. interviews were conducted in greek (both authors are native speakers of greek) and translated to english by the authors. under the major headings of our research subject, we present our findings concerning our research question. the current state of collections online our results indicate that most small museums in cyprus do not have an online presence. two of the six museums (4 and 5) do not have a website. the ones that do have websites created but not updated or supported for more than 15 years, and which therefore need replacement. here are two representative comments: “the museum has an old and simple website” (respondent 1); “[we have] a very old website that needs to be changed soon” (respondent 3). the two museums that do not have a website, use/have used social media: “the museum uses facebook and instagram” (respondent 4); “[we] used to have a facebook page” (respondent 5). 
we discovered that five of the six museums do not have their collections online: "the museum does not have any of its collections online" (respondent 1); "no online collections" (respondent 4); "we do not have any collections online" (respondent 5). further, we learned that none of the museums use machine-consumable standards to achieve wider interoperability on the web: "the online collections are only in a human-readable format" (respondent 2); "we do not use any machine-readable [standards]" (respondent 3). however, museums understand the need for and benefits of such solutions: "our goal is to have the online collection understandable by machines and share metadata online" (respondent 2). we noticed that all museums are willing to give online access to their collections, complete or partial, and agree that the primary goal is to disseminate information: "the primary goal is to give access to museum collections for general use" (respondent 1); "to put it another way, to communicate information, knowledge to scholars and the general public" (respondent 2); "to reach as many people as possible and spread those collections online to a variety of audiences" (respondent 3); "the main reason that online collections exist is that it is the tool to reach more people and disseminate those collections online to diverse audiences, researchers, and the public alike, in other words to disseminate knowledge" (respondent 4); "to disseminate knowledge and information to more people such as students and researchers and the general public" (respondent 5). museums also view online collections as a marketing tool that can bring more people to the museum's physical space: "[online collections] can work as a marketing tool, people that can view our collections online may visit the museum physical space" (respondent 4); "the main goal is to be found" (respondent 5); "tourists coming to cyprus can use the system and find out about our collections and the museum" (respondent 6). clearly, museums are eager to give online access to their collections. the goal is to disseminate information and attract more people to their physical premises. when asked about the goals of publishing machine-consumable content online, findability was most significant: "nowadays, people are using search engines to find the information they are looking for. and since the information is not in a machine-readable format and understandable by search engines, it creates difficulties to be located online" (respondent 1); "[the goal is] to make the collections more findable" (respondent 2); "… to be easily findable by search engines on the internet" (respondent 3); "[to] increase wider findability of the collections over the web" (respondent 6). additionally, we discovered that some museums are not aware of the existence of machine-readable formats: "i am not aware of machine-readable data" (respondent 4); "the museum is not aware of any machine-readable standards for wider web interoperability" (respondent 5). it is evident that findability is the main goal for online content. but it is also clear that some museums are not aware of the existence of machine-readable standards and such technologies.
the current challenges of collections online

insufficient resources and the cost of existing solutions

our study shows that museums' insufficient resources and the cost of existing solutions are the main obstacles to having their collections online. representative comments include: "lack of money" (respondent 1); "we got offers from different companies; however, the costs of existing solutions were well above our budget and possibilities" (respondent 2); "the main obstacle related to giving online access to the museum collections is the cost … outsourcing this kind of work costs a lot of money that the museum does not have" (respondent 4); "of course is the cost" (respondent 5).

insufficient staff (time) and skilled staff (know-how)

according to our findings, staff limitations are another obstacle small museums face in providing online access to their collections: "the existing staff has so many other responsibilities mostly related to research and museum daily functions" (respondent 1); "populating all the material to a new system requires a lot of time and staff that the museum does not have" (respondent 2); "the museum's limited staff" (respondent 4); "the limited staff of the museums is a problem" (respondent 6). further, interviewees shared that the lack of know-how is another obstacle to digitizing museum collections and making them accessible online: "we do not have the technical knowledge. of all of the staff members, no one has technical knowledge … this means that we must hire a person that has this kind of knowledge" (respondent 1); "we do not have a dedicated staff to work specifically for this function" (respondent 6).

complexity of technology (existing systems)

according to our research, the existing technological complexity is another major problem: "the lack of easy-to-use tools that we can use at the museum [is a problem]" (respondent 1); "creating a content model selecting all necessary fields is a very complex and time-consuming process" (respondent 2); "we need tools that are user-friendly, easy to use with nontechnical complexity without requiring a too specialized technical know-how" (respondent 3); "the technological complexity that is involved" (respondent 5); "hosting your own online collections due to the maintenance and technical knowledge is another issue that small museums are facing" (respondent 6).

insufficient infrastructure

our research revealed that the lack of technological infrastructure was an obstacle: "the lack of infrastructures … we cannot work with this kind of old infrastructure … we cannot work with a computer that is 20 years old, this is impossible … [we have] only one old computer that is connected to the internet" (respondent 1); "primary challenges related to technological infrastructure" (respondent 3); "the existing infrastructure of the museum, we have old computers" (respondent 4); "hosting your own online collections due to the maintenance and technical knowledge is another issue that small museums are facing in cyprus. this is why we use external platforms" (respondent 6).
not machine consumable

the complexity of technology was highlighted as the biggest challenge in publishing collections online in machine-consumable formats: "easy-to-use solutions" (respondent 1); "selection of the appropriate technology, there are so many standards for machine-readable data making the selection process extremely hard" (respondent 2); "the complexity of technology is the main obstacle" (respondent 3); "if the system we use can automatically create machine-consumable content this will help" (respondent 4); "the platform that publishes the collection [in] human-consumable content can at the same time publish in machine-understandable content will solve the problem" (respondent 6). for some, machine consumption is not a priority: "it is not a first priority of the museum" (respondent 5); "the museum is not familiar with machine publishing" (respondent 6). the complexity of technology and the lack of easy-to-use tools are among the biggest obstacles to publishing machine-consumable content.

discussion and conclusions

existing online collections and/or museum resources should be researched further, as they may not be completely digitized and accessible to different audiences online. with one-third of small museums in cyprus providing access to their collections online, there are many opportunities to help small museums give access to their collections and so benefit the democratization of information and knowledge. we discovered that the lack of resources and infrastructure are two significant challenges small museums in cyprus face in providing online access to their collections. our results show that no museum partners with national institutions, such as universities or academic research centers. we assert that such collaboration can reduce costs and eliminate the need for infrastructure. at the same time, institutions such as universities usually have the technological know-how and can provide museums with new tools and free and open-source systems that focus on cypriot small museum needs. such tools, which can be found in our research, will help museums drastically reduce the cost involved in buying such systems. moreover, we found that the lack of staff (time) is another challenge that prevents museums from having their collections online. we believe that developing new tools that can accelerate the process of generating, administering, maintaining, and uploading museum collections online will reduce demands on staff time. our research also uncovered that small museums in cyprus do not work with volunteers, as they have no time and resources to find and then train volunteers for museum work; we suggest museums consider this option in view of the lack of staff (time). additionally, we learned that museums lack specialized staff (know-how), another significant challenge that blocks museums from democratizing their online collections. we anticipate that developing technology that requires less technical expertise will benefit small museums that do not have specialized staff (e.g., developers and information technology specialists). support from external bodies such as universities may also help. on the other hand, there are platforms available that do not need specialized technical knowledge. however, we discovered that the complexity of existing technology impedes putting museum collections online.
we hope that creating less complex technology will enable museums to use and publish their collections online in human- and machine-consumable formats. further, training of existing staff in new technologies is needed. to sum up, small museums in cyprus and around the world need to invest in democratizing their collections online by digitizing, describing, and making their objects and collections available online. simple and turnkey solutions for publishing and describing digitized objects are required. there is a will; we continue researching to find the most suitable case-oriented and affordable ways.

acknowledgments

many thanks to all the interviewees from small museums in cyprus that opened their doors to our research. for ethical considerations, we keep institutions and interviewees anonymous.

endnotes

1 paul katz, "the quandaries of the small museum," the journal of museum education 20, no. 1 (1995): 15–17, https://www.jstor.org/stable/40479486.
2 nik honeysett and julia falkowski, museum technology landscape 2018: discovery and findings (lyrasis, 2018), https://www.lyrasis.org/leadership/documents/lyrasis-museum-tech-landscape-report-2018.pdf.
3 eric miller, uche ogbuji, victoria mueller, and kathy macdougall, bibliographic framework as a web of data: linked data model and supporting services (washington, dc: library of congress, 2012), 42.
4 sheila watson, ed., museums and their communities (new york: routledge, 2007).
5 guido cimadomo, "documentation and dissemination of cultural heritage: current solutions and considerations about its digital implementation," in 2013 digital heritage international congress (digitalheritage) (ieee, 2013), 555–62, https://doi.org/10.1109/digitalheritage.2013.6743796; s. sylaiou, f. liarokapis, p. patias, and o. georgoula, "virtual museums: first results of a survey on methods and tools" (paper presented at cipa 2005 xx symposium, 26 september–01 october 2005, torino, italy); rachel regelein, "a digital collections plan for the southwest seattle historical society" (unpublished master's project, university of washington, 2019), https://www.washington.edu/museology/2019/11/13/a-digital-collections-plan-for-the-southwest-seattle-historical-society/; ion gil fuentetaja and maria economou, "studying the type of online access provided to museum collections" (2008), https://www.semanticscholar.org/paper/studying-the-type-of-online-access-provided-to-fuentetaja-economou/b44415e02b5fca204d79b481d325b66482461f41.
6 regelein, "a digital collections plan"; bernadette flynn, "making collections accessible" (federation of australian historical societies inc., january 2018), https://www.history.org.au/wp-content/uploads/2018/10/makingcollectionsaccessible.pdf; karol j. borowiecki and trilce navarrete, "digitization of heritage collections as indicator of innovation," economics of innovation and new technology 26, no. 3 (2017): 227–46, https://doi.org/10.1080/10438599.2016.1164488; morgan schlesinger, "the museum wiki: a model for online collections in museums" (master's project/capstone, university of san francisco, 2016), https://repository.usfca.edu/capstone/456; genevieve horan, "digital heritage: digitization of museum and archival collections" (research paper, master of public administration, political science department, southern illinois university, 2013), https://opensiuc.lib.siu.edu/gs_rp/374.
7 ilse harms and werner schweibenz, "evaluating the usability of a museum web site" (2001), https://www.museumsandtheweb.com/mw2001/papers/schweibenz/schweibenz.html.
8 steen hvass, preface to the museum's web users: a user survey of museum websites (heritage agency of denmark, 2010), https://slks.dk/fileadmin/publikationer/kulturarv/the_museum_s_web_users_2010.pdf; monica bercigli, "dissemination strategies for cultural heritage: the case of the tomb of zechariah in jerusalem, israel," heritage 2, no. 1 (march 2019): 306–14, https://doi.org/10.3390/heritage2010020; elena villaespesa and trilce navarrete, "museum collections on wikipedia: opening up to open data initiatives" (paper presented at mw19, the 23rd museweb conference, boston, ma, april 2–6, 2019), https://mw19.mwconf.org/paper/museum-collections-on-wikipedia-opening-up-to-open-data-initiatives/; gerald wayne clough, "democratization of knowledge through digitization in libraries, museums, and archives" (2020), https://smartech.gatech.edu/handle/1853/62423; gerald wayne clough, "democratization of knowledge through digitization in libraries & archives" (2020), video, https://smartech.gatech.edu/bitstream/handle/1853/62423/clough.mp4?sequence=5&isallowed=y; eva richani, georgios papaioannou, and christina banou, "emerging opportunities: the internet, marketing and museums," in 20th international conference on circuits, systems, communications and computers (cscc 2016) 76, https://doi.org/10.1051/matecconf/20167602044.
9 lynsey martenstyn, "digital archives: making museum collections available to everyone," culture professionals network, the guardian, may 3, 2013, https://www.theguardian.com/culture-professionals-network/culture-professionals-blog/2013/may/03/museum-archives-digital-online; shyam oberoi and kristen arnold, "new architectures for online collections and digitization" (paper presented at mw2015: museums and the web, chicago, il, april 8–11, 2015), https://mw2015.museumsandtheweb.com/paper/new-architectures-for-online-collections-and-digitization/.
10 barbara lejeune, “the effects of online catalogues in london and other museums: a study of an alternative way of access,” papers from the institute of archaeology 18, no. s1 (2007): 79–97, https://doi.org/10.5334/pia.289. 11 chryssoula bekiari, leda charami, martin doerr, christos georgis, and athina kritsotaki, “documenting cultural heritage in small museums” (paper presented in 2008 annual conference of cidoc), https://cidoc.mini.icom.museum/wpcontent/uploads/sites/6/2018/12/25_papers.pdf; rolf däßler and ulf preuß, “digital preservation of cultural heritage for small institutions,” in digital cultural heritage, ed. horst kremers (springer international publishing, 2020), https://www.springerprofessional.de/en/digital-preservation-of-cultural-heritage-for-smallinstitutions/16842836. 12 penelope kelly, “managing digitization projects in a small museum” (master’s project, arts and administration program, university of oregon, march 2005), https://scholarsbank.uoregon.edu/xmlui/handle/1794/937. 13 kate taylor, “going digital not easy for cultural institutions,” the globe and mail, april 18, 2020, https://www-theglobeandmail-com.cdn.ampproject.org. 14 regelein, “a digital collections plan”; flynn, “making collections accessible”; susan wigodner and caitlin kearney, “who reviewed this?! a survey on museum web publishing in 2018 https://doi.org/10.3390/heritage2010020 https://mw19.mwconf.org/paper/museum-collections-on-wikipedia-opening-up-to-open-data-initiatives/ https://mw19.mwconf.org/paper/museum-collections-on-wikipedia-opening-up-to-open-data-initiatives/ https://smartech.gatech.edu/handle/1853/62423 https://smartech.gatech.edu/bitstream/handle/1853/62423/clough.mp4?sequence=5&isallowed=y https://smartech.gatech.edu/bitstream/handle/1853/62423/clough.mp4?sequence=5&isallowed=y https://doi.org/10.1051/matecconf/20167602044 https://www.theguardian.com/culture-professionals-network/culture-professionals-blog/2013/may/03/museum-archives-digital-online https://www.theguardian.com/culture-professionals-network/culture-professionals-blog/2013/may/03/museum-archives-digital-online https://mw2015.museumsandtheweb.com/paper/new-architectures-for-online-collections-and-digitization/ https://mw2015.museumsandtheweb.com/paper/new-architectures-for-online-collections-and-digitization/ https://doi.org/10.5334/pia.289 https://cidoc.mini.icom.museum/wp-content/uploads/sites/6/2018/12/25_papers.pdf https://cidoc.mini.icom.museum/wp-content/uploads/sites/6/2018/12/25_papers.pdf https://www.springerprofessional.de/en/digital-preservation-of-cultural-heritage-for-small-institutions/16842836 https://www.springerprofessional.de/en/digital-preservation-of-cultural-heritage-for-small-institutions/16842836 https://www-theglobeandmail-com.cdn.ampproject.org/ information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 12 avgousti and papaioannou (paper presented at mw18: museums and the web 2018, vancouver, canada, april 18–21, 2018), https://mw18.mwconf.org/paper/who-reviewed-this-a-survey-on-museum-webpublishing-in-2018/index.html; shyam oberoi and kristen arnold, “new architectures for online collections and digitization” (paper presented at mw2015: museums and the web, chicago, il, april 8–11, 2015), https://mw2015.museumsandtheweb.com/paper/newarchitectures-for-online-collections-and-digitization/index.html. 
15 flynn, “making collections accessible.” 16 oberoi and arnold, “new architectures.” 17 nuno freire, pável calado, and bruno martins, “availability of cultural heritage structured metadata in the world wide web” (paper presented at 22nd international conference on electronic publishing, june 2018), 11, https://www.researchgate.net/publication/325914185_availability_of_cultural_heritage_stru ctured_metadata_in_the_world_wide_web. 18 flynn, “making collections accessible.” 19 manolis gergatsoulis and pantelis lilis, “multidimensional rdf,” in on the move to meaningful internet systems 2005: coopis, doa, and odbase (2005): 1188–1205, https://doi.org/10.1007/11575801_17. 20 cruce saunders, “content authoring for human and machine consumption.” [a], september 2, 2019, https://simplea.com/articles/content-authoring-for-human-and-machine. 21 alejandra garcia bittar, “is a digital strategy necessary in small museums?” museum and digital culture – pratt institute, november 11, 2018, https://museumsdigitalculture.prattsi.org/is-adigital-strategy-necessary-in-small-museums-a72c1645e495. 22 charles tongue, “museum survey 2017,” vernon systems (blog), may 12, 2017, https://vernonsystems.com/museum-survey-2017/. 23 network of european museum organisations, “final report: digitisation and ipr in european museums” (network of european museum organisations, july 2020), https://www.nemo.org/fileadmin/dateien/public/publications/nemo_final_report_digitisation_and_ipr_in_ european_museums_wg_07.2020.pdf; cf. kelly, “managing digitization projects.” 24 flynn, “making collections accessible.” 25 honeysett and falkowski, “museum technology landscape.” 26 axiell, “museums accelerate implementation of digital strategies, making more content available online and on-site to improve visitor experiences,” axiell (blog), june 28, 2017, https://www.axiell.com/axiell-news/museums-accelerate-implementation-of-digitalstrategies-making-more-content-available-online-and-on-site-to-improve-visitor-experiences2/. 
https://mw18.mwconf.org/paper/who-reviewed-this-a-survey-on-museum-web-publishing-in-2018/index.html https://mw18.mwconf.org/paper/who-reviewed-this-a-survey-on-museum-web-publishing-in-2018/index.html https://www.researchgate.net/publication/325914185_availability_of_cultural_heritage_structured_metadata_in_the_world_wide_web https://www.researchgate.net/publication/325914185_availability_of_cultural_heritage_structured_metadata_in_the_world_wide_web https://doi.org/10.1007/11575801_17 https://simplea.com/articles/content-authoring-for-human-and-machine https://museumsdigitalculture.prattsi.org/is-a-digital-strategy-necessary-in-small-museums-a72c1645e495 https://museumsdigitalculture.prattsi.org/is-a-digital-strategy-necessary-in-small-museums-a72c1645e495 https://vernonsystems.com/museum-survey-2017/ https://www.ne-mo.org/fileadmin/dateien/public/publications/nemo_final_report_digitisation_and_ipr_in_european_museums_wg_07.2020.pdf https://www.ne-mo.org/fileadmin/dateien/public/publications/nemo_final_report_digitisation_and_ipr_in_european_museums_wg_07.2020.pdf https://www.ne-mo.org/fileadmin/dateien/public/publications/nemo_final_report_digitisation_and_ipr_in_european_museums_wg_07.2020.pdf https://www.axiell.com/axiell-news/museums-accelerate-implementation-of-digital-strategies-making-more-content-available-online-and-on-site-to-improve-visitor-experiences-2/ https://www.axiell.com/axiell-news/museums-accelerate-implementation-of-digital-strategies-making-more-content-available-online-and-on-site-to-improve-visitor-experiences-2/ https://www.axiell.com/axiell-news/museums-accelerate-implementation-of-digital-strategies-making-more-content-available-online-and-on-site-to-improve-visitor-experiences-2/ information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 13 avgousti and papaioannou 27 joan beaudoin, “art museum collections online: extending their reach” (paper presented at mw20, the 24th museweb conference, march 31–april 4, 2020), https://mw20.museweb.net/paper/art-museum-collections-online-extending-their-reach/. 28 avgoustinos avgousti, georgios papaioannou, and feliz ribeiro gouveia, “content dissemination from small-scale museum and archival collections: community reusable semantic metadata content models for digital humanities,” the code4lib journal, no. 43 (february 14, 2019), https://journal.code4lib.org/articles/14054. 29 “vernon systems – museum survey,” vernon systems (blog), may 19, 2016, https://vernonsystems.com/vernon-systems-museum-survey-data/. 30 nuno freire et al., “a survey of web technology for metadata aggregation in cultural heritage,” information services & use 37, no. 4 (2017): 425–36, https://doi.org/10.3233/isu-170859. 31 europeana: discover europe’s digital cultural heritage (website), accessed january 29, 2023, https://www.europeana.eu/en. 32 denis pitzalis, “3d and semantic web: new tools to document artefacts and to explore cultural heritage collections” (2013); denis pitzalis, “3d and semantic web: new tools to document artifacts and to explore cultural heritage collections. signal and image processing,” (phd diss. 
université pierre et marie curie, 2013); chryssoula bekiari et al., “documenting cultural heritage in small museums;” lejeune, “the effects of online catalogues in london and other museums: a study of an alternative way of access.” 33 paul klimper, “introduction to museums in the digital age,” in nemo 21st annual conference documentation, bukarest, romania, 2013, ed. julia pagel and kelly donahue, https://www.nemo.org/fileadmin/dateien/public/statements_and_news/nemo_21st_annual_conference_doc umentation.pdf. 34 network of euopean museum organisations, “final report”; dorel micle, “heritage networks and portals,” in museum and the internet. selected papers from the international summer course in buşteni, romania, 20th – 26th of september, 2004, ed. irina oberlander-târnoveanu (budapest: archaeolingua, 2008), 73-120; bittar, “is a digital strategy necessary in small museums?” 35 magdalena laine-zamojska, “virtual museum and small museums: vimuseo.fi project,” in museums and the web 2011: proceedings, ed. j. trant and d. beardman (toronto: archives and museum informatics, 2011), https://www.museumsandtheweb.com/mw2011/papers/virtual_museum_and_small_museu ms_vimuseofi_pro. 36 network of european museum organisations, “final report.” 37 noah lenstra, “website development for small museums: a case study of the katherine dunham dynamic museum,” january 1, 2008, https://mw20.museweb.net/paper/art-museum-collections-online-extending-their-reach/ https://journal.code4lib.org/articles/14054 https://vernonsystems.com/vernon-systems-museum-survey-data/ https://doi.org/10.3233/isu-170859 https://www.europeana.eu/en https://www.ne-mo.org/fileadmin/dateien/public/statements_and_news/nemo_21st_annual_conference_documentation.pdf https://www.ne-mo.org/fileadmin/dateien/public/statements_and_news/nemo_21st_annual_conference_documentation.pdf https://www.ne-mo.org/fileadmin/dateien/public/statements_and_news/nemo_21st_annual_conference_documentation.pdf https://www.museumsandtheweb.com/mw2011/papers/virtual_museum_and_small_museums_vimuseofi_pro https://www.museumsandtheweb.com/mw2011/papers/virtual_museum_and_small_museums_vimuseofi_pro information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 14 avgousti and papaioannou https://www.academia.edu/2439955/website_development_for_small_museums_a_case_stud y_of_the_katherine_dunham_dynamic_museum. 38 “resources for small museums,” gallery systems (blog), accessed july 13, 2022, https://www.gallerysystems.com/online-tools-resources-small-museums/. 39 tongue, “museum survey 2017.” 40 avgousti, papaioannou, and ribeiro gouveia, “content dissemination.” 41 wigodner and kearney, “who reviewed this?!” 42 wigodner and kearney, “who reviewed this?!” 43 honeysett and falkowski, “museum technology landscape.” 44 jasmine hoover, “gaps in it and library services at small academic libraries in canada,” information technology and libraries 37, no. 4 (2018): 15–26, https://doi.org/10.6017/ital.v37i4.10596. 45 avgousti, papaioannou, and ribeiro gouveia, “content dissemination.” 46 klimper, introduction. 47 vikas bhushan, shiv shakti ghosh, and sudipta biswas, “bridging the gap between cms and semantic web” (paper presented at naclin 2015 conference, kamataka, india), researchgate, 2016, https://www.researchgate.net/publication/307923260_bridging_the_gap_between_cms_and_ semantic_web. 
48 network of european museum organisations, “final report.” 49 avgousti, papaioannou, and ribeiro gouveia, “content dissemination.” 50 stéphane jean joseph corlosquet, “bootstrapping the web of data with drupal” (master’s thesis, national university of ireland, galway, 2009), https://aic.ai.wu.ac.at/~polleres/supervised_theses/stephane_corlosquet_mseng_2009.pdf . 51 athanasios velios and aurelie martin, “off-the-shelf crm with drupal: a case study of documenting decorated papers,” international journal on digital libraries 18, no. 4 (2017): 321–31, https://doi.org/10.1007/s00799-016-0191-5. 52 konstantinos vavliakis, georgios karagiannis, and pericles mitkas, “semantic web in cultural heritage after 2020” (2012), https://www.semanticscholar.org/paper/semantic-web-incultural-heritage-after-2020-vavliakiskaragiannis/c69d14de020d5dedb9e76a173c94cc56cc254251. 53 illias daradimos, costas vassilakis, and akrivi katifori, “a drupal cms module for managing museum collections” (2015), https://www.academia.edu/2439955/website_development_for_small_museums_a_case_study_of_the_katherine_dunham_dynamic_museum https://www.academia.edu/2439955/website_development_for_small_museums_a_case_study_of_the_katherine_dunham_dynamic_museum mailto:https://www.gallerysystems.com/online-tools-resources-small-museums/ https://doi.org/10.6017/ital.v37i4.10596 https://www.researchgate.net/publication/307923260_bridging_the_gap_between_cms_and_semantic_web https://www.researchgate.net/publication/307923260_bridging_the_gap_between_cms_and_semantic_web https://doi.org/10.1007/s00799-016-0191-5 https://www.semanticscholar.org/paper/semantic-web-in-cultural-heritage-after-2020-vavliakis-karagiannis/c69d14de020d5dedb9e76a173c94cc56cc254251 https://www.semanticscholar.org/paper/semantic-web-in-cultural-heritage-after-2020-vavliakis-karagiannis/c69d14de020d5dedb9e76a173c94cc56cc254251 https://www.semanticscholar.org/paper/semantic-web-in-cultural-heritage-after-2020-vavliakis-karagiannis/c69d14de020d5dedb9e76a173c94cc56cc254251 information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 15 avgousti and papaioannou https://www.academia.edu/29947679/a_drupal_cms_module_for_managing_museum_collect ions. 54 quinn dombrowski, drupal for humanists (college station: texas a&m university press, 2016.) 55 jennifer zaino, “2017 trends for semantic web and semantic technologies,” dataversity (blog), november 29, 2016, https://www.dataversity.net/2017-predictions-semantic-websemantic-technologies/. 56 grigoris antoniou and frank van harmelen, a semantic web primer, second edition (cambridge massachusetts and london, u.k.: the mit press, 2008). 57 jackson joab, “tim berners-lee: machine-readable web still a ways off” gcn, october 30, 2009, https://gcn.com/articles/2009/10/30/berners-lee-semantic-web.aspx. 58 eric miller, uche ogbuji, victoria mueller, and kathy macdougall, “bibframe primer – bibliographic framework as a web of data: linked data model and supporting services” (november 2012): 42, https://www.researchgate.net/publication/280113409_bibframe_primer__bibliographic_framework_as_a_web_of_data_linked_data_model_and_supporting_services. 59 antoniou and van harmelen, a semantic web primer. 60 bryn farnsworth, “qualitative vs quantitative research – what is what?” imotions (blog), june 11, 2019, https://imotions.com/blog/qualitative-vs-quantitative-research/. 
public libraries leading the way

intro to coding using python at the worcester public library

melody friedenthal

information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.12207

melody friedenthal (mfriedenthal@mywpl.org) is a public services librarian, worcester public library.

abstract

the worcester public library (wpl) offers several digital learning courses to our adult patrons; among them is "intro to coding using python." this six-session class teaches basic programming concepts and the vocabulary of software development, and it prepares students to take more intensive, college-level classes. the bureau of labor statistics predicts a bright future for software developers, web developers, and software engineers. wpl is committed to helping patrons increase their "hireability," and we believe our python class will help patrons break into these lucrative and gratifying professions… or just have fun.

history and details of our class

i came to librarianship from a long career in software development, so when i joined the worcester public library in january 2018 as a public services librarian, my manager proposed that i teach a class in programming. she asked me to research which language would be best. python got high marks for ease of use, flexibility, growing popularity, and a very active online community. once i selected a language, i had to choose an environment to teach it in – or so i thought.
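as a quick aside, to give a sense of the readability that earned python those high marks, here is a minimal first-session-style example; it is illustrative only and is not taken from the wpl course materials.

```python
# a first-lesson style example: variables, input, and a conditional.
# illustrative only; not taken from the wpl course manual.
name = input("what is your name? ")
favorite = input("what is your favorite number? ")

# input() always returns text, so convert it before doing arithmetic
favorite = int(favorite)

if favorite % 2 == 0:
    print(f"hello, {name}! {favorite} is an even number.")
else:
    print(f"hello, {name}! {favorite} is an odd number.")
```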
i had absolutely no experience in front of a classroom, and few pedagogical skills, so i sought out an online python course within which to teach. i decided to use the code academy (ca) website as our programming environment. ca has self-guided classes in a number of subjects, and the free beginning python course seemed to be just what we needed. i went through the whole class myself before using it as courseware. my intent was to help students register for ca, then, each day, teach them the concepts in that day's ca lesson. they would then be set to do the online lesson and assignments.

we first offered python in june 2018. problems with ca came up right from the start: students registered for the wrong class (despite the handout explicitly naming the correct class), and ca frequently tried to upsell a not-free python class. since ca's classes are moocs (massive open online courses), the developers built in an automated way of correcting student code: embedded behind each web page of the course, there is code that examines the student's code and decides whether it is acceptable or not. good in theory, not so good in practice. ca's "code-behind" is flawed and sometimes prevented students from advancing to the next lesson.

moreover, some of the ca tasks were inane. for example, one lesson incorporated a kind of mad libs game, in which the instructions ask, for example, for 13 nouns and 11 adjectives, and these are combined with set sentences to generate a silly story. this assignment turned out to be too long and difficult to complete, preventing students from advancing. (a minimal sketch of this kind of exercise appears later in this article, after the class-logistics discussion.) although i used ca the first few times i offered the class, i subsequently abandoned it and wrote my own classroom material.

after determining that ca wasn't appropriate, i chose an online ide where the students could code independently. this platform worked well when i tested it ahead of time, but when the whole class tried to log on at once, we received denial-of-service error messages. hurriedly moving on to plan c, i chose thonny, a free python ide which we downloaded to each pc in the lab (see https://thonny.org/).

each student receives a free manual (see figure 1), which i wrote. every time i've offered this class i've edited the manual, clarifying the topics students had a hard time with. i've also added new material, including commands students have shown me. it is now 90 pages long, written in microsoft word, and printed in color. we use soft binders with metal fasteners.

figure 1. intro to coding using python manual developed for the course.

the manual consists of the following sections:
• cover: course name, dates we meet, time class starts and ends, location, instructor's name, manual version number, and a place for the student to write their own name.
• syllabus: goals for each of the six sessions. this is aspirational.
• basic information about programming, including an online alternative to thonny, for students who don't have a computer at home and wish to use our public computers for homework.
• lessons 1–17: "hello world" and beyond.
• lesson 18: object-oriented design, which i consider to be advanced, optional material. skipped if time is pressing or the class isn't ready for it.
• lesson 19: wrap-up:
  o how to write good code.
  o how to debug.
  o list of suggested topics for further study.
  o online resources for python forums and community.
• list of wpl's print resources on python and programming.
• relevant comic strips and cartoons.

in march 2019, my manager asked me to start assigning homework. if a student attends all six sessions and makes a decent attempt at each assignment, at the sixth session they receive a certificate of completion. the certificate has the wpl name and logo, the student's name, and my signature. typically three or four students earn a certificate.

homework is emailed to me as an attachment. this class meets on tuesday evenings, and i tell students to send me their homework as soon as possible. inevitably, several students don't email me until the following monday. while i don't give out grades, i do spend considerable time reviewing homework, line by line, and i email back detailed feedback.

when the january 2020 course started, i found that between october's class and january, outlook had implemented a security protocol which removes certain file extensions from incoming email. and – you can see where this is going – the .py python extension was one of them. i told students to rename their python code files from xxxx.py to xxxx.py.doc, where "xxxx" is their program name. this fools outlook into thinking the file is a microsoft word document, and the email is delivered to me intact. when it arrives, i remove the .doc extension from the attachment and save it to a student-specific file. then i open the file in thonny and review it.

physically, our computer lab contains an instructor's computer and twelve student computers (see figure 2). it also has a projector which projects the active window from the instructor's computer onto a screen: usually the class manual. i use dry erase markers in a variety of colors to illustrate the concepts on a whiteboard. there is also a supply of pencils on hand for student note-taking.

the class is offered once per season. although the classroom can accommodate twelve students, we set our maximum registration to fourteen, which allows us to maximize attendance even if patrons cancel or don't show up. and if all fourteen do attend the first class, we have two laptops i can bring into the lab. we also maintain a small waitlist, usually of five spots. we've offered this class seven times, and the registration and waitlists have been full every time. sometimes we have to turn students away.

figure 2. classroom at worcester public library.

however, we had a problem with registered patrons not showing up, so last spring we implemented a process where, about a week before class starts, i email each student, asking them to confirm their continued interest in the class. i tell them that if they are no longer interested (or don't respond), i will give the seat we reserved for them to another interested patron from the waitlist. in this email i also outline how the course is structured and that they can each earn a certificate of completion. i tell them class starts promptly at 5:30 and to please plan accordingly. some students don't check their email. some patrons show up without ever registering; they are told registration is required and to try again in a few months. i keep track of attendance on an excel spreadsheet. here in worcester, ma, weather is definitely a factor for our winter sessions.
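as promised earlier, here is a minimal sketch of a mad-libs-style exercise of the kind described in the code academy discussion above; it is illustrative only and does not reproduce ca's actual assignment, which asked for far more words.

```python
# a small mad-libs-style exercise: collect a few words, then drop them
# into a fixed sentence. illustrative only; not the actual ca assignment.
nouns = []
adjectives = []

for i in range(2):
    nouns.append(input(f"enter noun #{i + 1}: "))
for i in range(2):
    adjectives.append(input(f"enter adjective #{i + 1}: "))

story = (
    f"the {adjectives[0]} {nouns[0]} walked into the library "
    f"and asked for a {adjectives[1]} book about {nouns[1]}s."
)
print(story)
```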
over time i've made the class more dynamic. i have a student read a paragraph in the manual aloud. i've switched around the order of some lessons in response to student questions. i have them play a game to teach boolean logic: "if you live in worcester and you love pizza, stand up!"… then: "if you live in worcester or you love pizza, stand up!"

students range from experienced programmers (of other languages), to people with no experience but great aptitude, to people who just never seem to "get it." this material is technical, and i try hard to communicate the concepts, but i lose a few students every time.

we ask our patrons for feedback on all of our programs. our python students have written:
• "… the classes were formatted in an organized manner that was beginner friendly"
• "the manual is a big help. i'm thankful that the program is free."
• "… coding is fun and i learned a new skill."
• "this made me think critically and helped me understand where my errors in the programs were."

wpl is proud to offer classes that make a difference in our patrons' lives.

article

applying topic modeling for automated creation of descriptive metadata for digital collections

monika glowacka-musial

information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.13799

monika glowacka-musial (monikagm@nmsu.edu) is assistant professor/metadata librarian, new mexico state university library. © 2022.

abstract

creation of descriptive metadata for digital objects tends to be a laborious process. specifically, subject analysis that seeks to classify the intellectual content of digitized documents typically requires considerable time and effort to determine subject headings that best represent the substance of these documents. this project examines the use of topic modeling to streamline the workflow for assigning subject headings to the digital collection of new mexico state university news releases issued between 1958 and 2020. the optimization of the workflow enables timely scholarly access to unique primary source documentation.

introduction

digital scholarship relies on digital collections and data. in the influential book digital_humanities, anne burdick and her associates affirm that humanistic knowledge production depends on collection building and curation.1 access to historical documents and data resources is essential for the development of new research questions and methodologies.2 this project utilizes topic modeling to support building a digital collection of institutional news releases. it is one of our initiatives to apply digital technologies in library workflows.

new mexico state university news releases

in response to a growing scholarly and public interest in original university press announcements, the digitization of past nmsu print news releases was approved in september 2013. sixty years of news releases, from the late 1950s to the present, were to be included. one of the arguments presented in justification of the project was that these institutional news briefs have a truly unique historical value. researchers view university press announcements as anchors in the history of nmsu and the region, particularly for dating events and initiatives.
they also find official communications essential for studying the way the news was framed by participants and the university administration. historically, the relationships between the university and the local media had always been a major concern of college administrators: how to respect the freedom of the press while ensuring responsible and factual journalism, and how to build an effective partnership that would benefit both sides?3 to address these questions, the administration early on established the college's information services, which have issued news releases about campus events, programs, and developments in the college's research, teaching, and service. these formal news reports, representing the perspective of the university, have been regularly distributed to local and worldwide media for many decades. this collection has become one of the most popular primary sources documenting the history of this southwestern educational institution.

since the beginning of the digitization project, thousands of press releases have been scanned, described, and added to the digital collection. currently, the collection features press releases issued by the university between 1958 and 1974. there is still a lot to be done. the most time-consuming element in the process is adding metadata, including library of congress subject headings, to individual news releases. with decreasing personnel, dwindling library resources, and competing work priorities, progress on the project has slowed substantially. its revitalization requires a fresh, problem-solving approach that would allow for a significant reduction in the time catalogers spend on metadata creation. in search of a viable solution, topic modeling, a computational tool for classifying large collections of texts, was put to the test and generated promising results. the following sections describe the tools, data, and process created for this experiment in some detail.

topic modeling and its applications

topic modeling (tm) is one of the methodologies used in natural language processing (nlp). it was specifically designed for text mining and discovering hidden patterns in huge collections of documents, images, and networks.4 according to practitioners, topic modeling is best viewed as a statistical tool for text exploration and open-ended discovery.5 it has been used extensively in computer science, genetics, marketing, political science, journalism, and digital humanities for the last two decades.
a growing literature on topic modeling applications provides clear evidence of its viability.6 examples of tm applications in digital social sciences and humanities include finding geographic themes in gps-associated documents on social media platforms such as flickr and twitter,7 selecting news articles on opposition to the euro currency from financial times data,8 identifying paragraphs on epistemological concerns in english and german novels,9 tracking research trends in different disciplines,10 and revealing dominant themes in newspapers,11 governance literature,12 and wikipedia entries.13 topic modeling has also been applied, in addition to text mining, to enhance access to large digital collections by providing minimal description and enriching metadata, including subject headings.14 the possibility of using topic modeling to determine subject headings for books on project gutenberg has also been explored.15

topic modeling in a nutshell

topic models help to identify the contents of document collections. topic modeling is a process of discovering clusters of words that best represent a set of topics. figure 1 shows the basic idea behind topic modeling. a large collection of text documents (the scrolls on top) consists of thousands of words (shown symbolically at the bottom). the algorithm seeks the most frequent words that tend to occur in proximity and clusters them together. each cluster, referred to as a topic, has a set of words with their probabilities of belonging to that topic. each document in the collection displays a set of combined topics to different degrees. here, documents are seen as mixtures of topics, and topics are seen as mixtures of words.16 topics also provide context to words. documents that have similar combinations of topics tend to be related. as a result, a large collection of text documents can be represented by a limited set of topics (as presented by icons in the middle of the figure). (a toy illustration of this mixture idea appears at the end of this section.)

figure 1. basic idea behind topic modeling.

topics and subject headings combined

the original purpose of topic modeling, as formulated by david blei and his associates in 2003, was to make large collections of texts more approachable for scholars by organizing texts automatically based on latent topics.17 these hidden topics can be discovered, measured, and consequently used by scholars to navigate the collection. the purpose of assigning subject headings is to identify "aboutness," or simply the subject concepts covered by the intellectual content of a given work, and then again to collocate related works.18 since both topic models and subject headings have a similar purpose, although very different methodology and scale, we decided to combine them and make topic models a prerequisite for assigning subject headings. in such a scenario, the computer deals with text collections at a scale beyond human reading capacity, and catalogers then fine-tune the results generated by the algorithm. the following methods section shows the subsequent stages involved in this process of semiautomated assignment of subject headings to documents.
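before turning to the methods, here is a toy illustration of the mixture idea described in the "topic modeling in a nutshell" section above. the article's own implementation uses r and the topicmodels package; the following python analogue, built on scikit-learn with a made-up four-document corpus and arbitrary parameters, is a sketch for illustration only.

```python
# a toy illustration of "documents as mixtures of topics, topics as mixtures
# of words," using scikit-learn's lda. the article's actual implementation
# is in r (the topicmodels package); this python analogue, with a made-up
# corpus and arbitrary parameters, is for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "students rehearse play for campus theater production",
    "engineering students win award at science competition",
    "symphony orchestra concert features student musicians",
    "college of engineering announces new research grant",
]

# build the document-term matrix (rows = documents, columns = terms)
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# fit a small lda model; n_components is the number of topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
gammas = lda.fit_transform(dtm)      # per-document topic proportions
betas = lda.components_              # per-topic term weights

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(betas):
    top_terms = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {top_terms}")
print("document-topic mixtures:\n", gammas.round(2))
```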
methods

overview

for topic modeling, we used the algorithm of latent dirichlet allocation (lda).19 lda takes a document-term matrix, with rows corresponding to documents and columns corresponding to terms (words), and, based on semirandom exploration, finds optimal probabilities of topics in documents (called gammas) and probabilities of terms in topics (called betas). after lda generates a set of topics that best represent the collection of news releases, each topic is associated with several subject headings that were previously assigned to news releases by catalogers. for a new news release, lda finds a set of most representative topics. subject headings associated with the dominant topics are combined into a list of subject candidates presented to a cataloger. the last step involves a cataloger using this short list of subject candidates to select subject headings for the news release.

training data

the training data used in this project consists of over 6,000 news releases (from 1958 to 1967) annotated with metadata. only two metadata properties—titles and subject headings—were considered. created by catalogers, both properties reflect the content of news releases accurately, although mistakes may sometimes happen. the values from the titles field were converted into a document-term matrix that, in turn, became an input for the algorithm. texts produced by ocr on the original news releases were not included in the analysis due to their poor quality.

detailed steps of the proposed method (a short sketch of the scoring in step 3 follows the list):

1. topic modeling on training data:
   a. run standard preprocessing of the training text data, including tokenization, stop words removal, and stemming.
   b. run topic modeling (lda) where each document from the training data set is assigned a set of topics (subsets of words), each one with a measurable contribution to the document.
2. assignment of subject headings to topics.20 for each topic:
   a. select a number of documents with the highest probability (gamma) for the topic. we used 400.
   b. gather the set of subject headings assigned to the documents selected in 2.a. and arrange them in decreasing frequency (freq) of occurrence in the set.
3. assignment of subject headings to a new document:
   a. assign to the new document gammas (probabilities) of topics using the lda model trained in 1.b.
   b. in subsequent topics, for each subject heading calculate its weight in the document as the product of its frequency in the topic (freq) and the probability of the topic (gamma) in the document; for subject headings duplicated across topics, sum their weights across topics.
   c. create a list of the candidate subject headings processed in 3.b. in descending order with respect to their weights in the document.
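the weighting in steps 3.b and 3.c can be made explicit with a small sketch. the article's implementation is in r; the following python version uses hypothetical data structures (doc_gammas, topic_subject_freqs) and made-up numbers, and is meant only to illustrate the freq × gamma scoring and the summing of duplicates across topics.

```python
# a rough sketch of the candidate-scoring step (3.b-3.c) described above.
# the article's implementation is in r; this python version uses hypothetical
# data structures and is meant only to make the weighting explicit.
from collections import defaultdict

def rank_candidate_subjects(doc_gammas, topic_subject_freqs):
    """doc_gammas: {topic_id: gamma}  probability of each topic in the document.
    topic_subject_freqs: {topic_id: {subject_heading: freq}}  how often each
    heading occurs among the top documents for that topic (step 2.b)."""
    weights = defaultdict(float)
    for topic_id, gamma in doc_gammas.items():
        for heading, freq in topic_subject_freqs.get(topic_id, {}).items():
            # weight = freq * gamma; duplicates across topics are summed
            weights[heading] += freq * gamma
    # descending order by weight = the list shown to the cataloger
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# tiny made-up example
doc_gammas = {0: 0.7, 1: 0.3}
topic_subject_freqs = {
    0: {"theater": 120, "students": 80},
    1: {"concerts": 90, "students": 40},
}
for heading, weight in rank_candidate_subjects(doc_gammas, topic_subject_freqs):
    print(f"{heading}\t{weight:.1f}")
```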
implementation

there is a growing number of tools used for topic modeling.21 for this project, we used the r programming language, which has many packages for data preprocessing and topic modeling (tm).22 the r packages used for this project are listed below:

• topicmodels, with the functions lda() for producing topic models, posterior() for assigning topics to test documents using pretrained models, and perplexity() for perplexity calculation.23
• tidytext, with tidying functions that allow for rearranging and exploring data as well as for interpreting the models.
• textstem for preprocessing data, including stemming and lemmatization.
• tidyr, dplyr, and stringr for data and string manipulation and arrangement.
• ggplot2 for data visualizations.

the code related to topic modeling was mostly reused from the datacamp class on topic modeling.24 occasionally, the data.table data structure was applied instead of data.frame. in addition to standard stop words, custom stop words including initials, names of weekdays, and dates were removed from the corpus using the function anti_join(). for finding topics in test documents with a pretrained model, the function posterior() from the r package topicmodels was used.25 the extra step needed before using posterior() was to align the new document with the document-term matrix used for training the lda model.26

results

for assessing the method's performance, we adopted the idea of recall. in this specific context, recall is defined as the fraction of original subject headings (i.e., those assigned to a document manually by a cataloger) that are present on the list of candidate subject headings produced by the method. the average recall is estimated using a leave-one-out setting.27 once a single test document is set aside, the lda model is trained on the remaining documents and recall is calculated for the tested document using the list of candidate subject headings produced by the method. recall is then averaged over a set of testing documents. this approach produces an estimate of the method's performance if tested on a new document. (a small sketch of this recall calculation appears after the discussion of figure 2 below.)

figure 2. average recall as a function of the size of the list of subject heading candidates.

figure 2 shows the dependence of average recall on the length of the list of candidate subjects produced by the method. recall is averaged over 1,500 randomly selected test documents. the dashed line represents chance-level performance, i.e., when the method would produce a random subset of all subject headings available in the data. for a list of 100 suggested subject headings, the recall is on average above 0.6, and for a list of 500 candidate subject headings, above 0.8. even though the average recall stays noticeably below 1 (a recall of 1 would mean perfect performance), it is still considerably above the chance level. the results presented in figure 2 were produced by the lda model trained with 16 topics.

one of the parameters affecting the method's performance is the number of topics used by the lda model. for finding the number of topics corresponding to the highest recall, an overall measure of recall across different lengths of the subject candidate list was defined as the cumulative recall for the first 100 subject candidates.
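as referenced above, here is a minimal sketch of the recall-at-k idea behind figures 2 and 3. it is written in python rather than the r used in the project, with made-up data, and it illustrates only the metric, not the leave-one-out training loop.

```python
# recall@k: the fraction of the cataloger's original subject headings that
# appear among the first k candidates proposed by the model. python sketch
# with made-up data; the article's actual implementation is in r.
def recall_at_k(original_headings, ranked_candidates, k):
    top_k = set(ranked_candidates[:k])
    hits = sum(1 for h in original_headings if h in top_k)
    return hits / len(original_headings)

original = ["theater", "students", "concerts"]
candidates = ["theater", "award winners", "students", "music", "scholarships"]

for k in (1, 3, 5):
    print(k, recall_at_k(original, candidates, k))
# averaging recall_at_k over many held-out documents gives the curves
# reported in figures 2 and 3.
```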
we assumed that 100 is a likely size for candidate lists that catalogers would be willing to go through. figure 3 shows the cumulative recall for different numbers of topics, based on which 16 was chosen as the optimum. interestingly, this corresponds well with the dependence of perplexity on the number of topics (fig. 4). perplexity, a measure of the model's surprise at the data, shows how well the model fits the data—a smaller number means a better fit, i.e., a better topic model.28

figure 3. cumulative recall as a function of number of topics in the lda model.

figure 4. perplexity of the lda model as a function of number of topics in the lda model.

to give a better idea of the method's performance, figure 5 shows the distribution of recall for individual test documents, for a list of 100 subject headings. since most documents in the training data have just a few subject headings, there is only a small set of discrete values possible for recall for individual documents. the distribution is wide, with a fraction of documents with no subject heading present on the proposed list (recall = 0) but also with a bigger fraction of documents fully covered by the list (recall = 1).

figure 5. distribution of recall across 1,500 test documents, for 100 subject candidates (for 16 topics).

the following examples show the sets of subject headings selected by the algorithm; the headings originally chosen by catalogers appear in bold blue in the published version.

example 1

title of news release: "'romeo and juliet' play part of campus celebration for 400th anniversary of shakespeare's birth"

subjects weights
new mexico state university. playmakers 0.280
theater 0.143
students 0.080
academic achievement 0.080
theater--production and direction 0.075
high school students 0.052
competitions 0.048
new mexico state university. college of engineering 0.042
plays 0.041
debates and debating 0.038
new mexico state university. aggie forensic festival 0.036
zohn, hershel 0.034
shakespeare, william, 1564-1616. a midsummer night's dream 0.034
forensics (public speaking) 0.034
frisch, max, 1911-1991. firebugs 0.027
tickets 0.027
theater rehearsals 0.027
new mexico state university. college of agriculture and home economics 0.022
shakespeare, william, 1564-1616. romeo and juliet 0.020
frisch, max, 1911-1991 0.020
performances 0.020
garcia lorca, federico, 1898-1936. casa de bernarda alba. english 0.020
molière, 1622-1673. bourgeois gentilhomme. english 0.020
anniversaries 0.014
new mexico state university. college of teacher education 0.012

example 2

title of caption to photo: "locals barbara gerhard, donna herron, lillian jean taylor rehearse for upcoming concert"

subjects weights
concerts 0.123
new mexico state university. university-civic symphony orchestra 0.085
institution. playmakers 0.077
united states. air force rotc 0.073
united states. army. reserve officers' training corps 0.062
military cadets 0.058
award presentations 0.054
theater 0.039
award winners 0.038
scholarships 0.035
music 0.035
musicians 0.031
awards 0.027
new mexico state university. department of military science 0.023
theater--production and direction 0.021
kennecott copper corporation 0.019
students 0.019
glowacki, john 0.019
new mexico state university symphonic band 0.015
new mexico state university. university-community chorus 0.015
lynch, daniel 0.015
drath, jan 0.015
performances 0.015
military art and science 0.012
united states. army--inspection 0.012

discussion

the major advantage of the method described above is reducing the long list of library of congress subject headings that catalogers need to consult before assigning subject headings to news releases. it is important to note that this method produces subject headings that are already present in the training data. the list of available subject headings can be expanded by periodic updates of the training data to include all entries in the catalog, assuming catalogers will add, where needed, subjects not present so far in the data set.

in this project we utilized metadata from just two fields: titles and subject headings. although documents' titles are supposed to compactly represent the content of documents, we expect that the presented approach would give better results if the full text (ocr) were analyzed. in this project, the limiting factors were both the quality of the print copies and the robustness of available ocr tools.

in some cases, subject annotations are imperfect, depending on the skills and experience of catalogers. that also affects the performance of our method, which relies on the quality of subject assignments. on the other hand, there are cases when the method suggests subjects that fit the content of news releases but were not selected by catalogers. this indicates that the method can also be used to refine existing annotations.

conclusion

we propose a way to streamline the workflow of metadata creation for university news releases by applying topic modeling. first, we use this digital technology to identify topics in a large collection of text documents. then, we associate the discovered topics with sets of subject headings. finally, to a new document, we assign those subject headings that are associated with the document's most dominant topics.

the proposed method facilitates the process of document annotation. it produces short lists of candidate subject headings that account for a significant part of the original labeling performed by catalogers. this approach can be applied to support annotation of any large digital collection of text documents.

one of the advantages of applying topic modeling is that it produces numeric representations of text documents. these numeric representations can be used by advanced analytical methodologies, including machine learning, for numerous practical purposes in library workflows, like text categorization, collocation of similar materials, enhancing metadata for digital collections, finding trends in government literature, etc. in addition, mastering digital methodologies may open new ways of collaboration between librarians and digital scholars across university campuses.
as johnson and dehmlow argue, "... digital humanities represent a clear opportunity for libraries to offer significant value to the academy, not only in the areas of tool and consultations, but also in collaborative expertise that supports workflows for librarians and scholars alike."29 digital technologies are best learned in hands-on practice. if librarians are to contribute to the development of digital scholarship, then they need to learn how to apply new technologies to their own work. and since both librarians and humanists work with texts, they might have much to offer each other.

correction

on november 21, 2022, the urls to references 24 and 26 were updated at the author's request to avoid user login.

endnotes

1 anne burdick et al., digital_humanities (cambridge, massachusetts: the mit press, 2012), 32–33.

2 thomas g. padilla, "collections as data implications for enclosure," acrl news 79, no. 6 (2018), https://crln.acrl.org/index.php/crlnews/article/view/17003/18751; rachel wittmann, anna neatrour, rebekah cummings, and jeremy myntti, "from digital library to open datasets: embracing a 'collections as data' framework," information technology and libraries 38, no. 4 (december 2019), https://doi.org/10.6017/ital.v38i4.11101.

3 gerald w. thomas, academic ecosystem: issues emerging in a university environment (gerald w. thomas, 1998), 159–64.

4 david m. blei, andrew ng, and michael jordan, "latent dirichlet allocation," journal of machine learning research 3, no. 1 (2003); david m. blei, "topic modeling and digital humanities," journal of digital humanities 2, no. 1 (winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-and-digital-humanities-by-david-m-blei/.

5 megan r. brett, "topic modeling: a basic introduction," journal of digital humanities 2, no. 1 (winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/; jordan boyd-graber, yuening hu, and david mimno, "applications of topic models," foundations and trends® in information retrieval 11, no. 2–3 (2017): 143–296.

6 boyd-graber, hu, and mimno, "applications of topic models," foundations and trends® in information retrieval 11, no. 2–3 (2017): 143–296; rania albalawi, tet hin yeap, and morad benyoucef, "using topic modeling methods for short-text data: a comparative analysis," frontiers in artificial intelligence 3 (2020): 42, https://doi.org/10.3389/frai.2020.00042; hamed jelodar, yongli wang, chi yuan, and xia feng, "latent dirichlet allocation (lda) and topic modeling: models, applications, a survey" (2017), https://www.ccs.neu.edu/home/vip/teach/dmcourse/5_topicmodel_summ/notes_slides/lda_survey_1711.04305.pdf.

7 zhijun yin et al., "geographical topic discovery and comparison," in www: proceedings of the 20th international conference on the world wide web (2011), https://doi.org/10.1145/1963405.1963443.

8 david andrzejewski and david buttler, "latent topic feedback for information retrieval," in kdd '11: proceedings of the 17th acm sigkdd international conference on knowledge discovery and data mining (2011), https://dl.acm.org/doi/10.1145/2020408.2020503.
9 matt erlin, "topic modeling, epistemology, and the english and german novel," cultural analytics 1, no. 1 (may 1, 2017), https://doi.org/10.22148/16.014.

10 cassidy r. sugimoto et al., "the shifting sands of disciplinary development: analyzing north american library and information science dissertations using latent dirichlet allocation," journal of the american society for information science and technology 62, no. 1 (january 2011), https://doi.org/10.1002/asi.21435; david mimno, "computational historiography: data mining in a century of classics journals," journal on computing and cultural heritage 5, no. 1 (april 2012): 3:1–3:19; andrew j. torget and jon christensen, "mapping texts: visualizing american historical newspapers," journal of digital humanities 1, no. 3 (summer 2012), http://journalofdigitalhumanities.org/1-3/mapping-texts-project-by-andrew-torget-and-jon-christensen/; andrew goldstone and ted underwood, "the quiet transformations of literary studies: what thirteen thousand scholars could tell us," new literary history 45 (2014): 359–84; carlos g. figuerola, francisco javier garcia marco, and maria pinto, "mapping the evolution of library and information science (1978–2014) using topic modeling on lisa," scientometrics 112 (2017): 1507–35, https://doi.org/10.1007/s11192-017-2432-9; jung sun oh and ok nam park, "topics and trends in metadata research," journal of information science theory and practice 6, no. 4 (2018): 39–53; manika lamba and margam madhusudhan, "metadata tagging of library and information science theses: shodhganga (2013–2017)," paper presented at etd 2018: beyond the boundaries of rims and oceans: globalizing knowledge with etds, national central library, taipei, taiwan, https://doi.org/10.5281/zenodo.1475795; manika lamba and margam madhusudhan, "author-topic modeling of desidoc journal of library and information technology (2008–2017), india," library philosophy and practice (2019): 2593, https://digitalcommons.unl.edu/libphilprac/2593.

11 david j. newman and sharon block, "probabilistic topic decomposition of an eighteenth-century american newspaper," journal of the american society for information science and technology 57, no. 6 (april 1, 2006): 753–67;
robert k. nelson, "mining the dispatch," last modified november 2020, https://dsl.richmond.edu/dispatch/about; tze-i yang, andrew torget, and rada mihalcea, "topic modeling on historical newspapers," in latech '11: proceedings of the 5th acl-hlt workshop on language technology for cultural heritage, social sciences, and humanities (2011), https://dl.acm.org/doi/10.5555/2107636.2107649; carina jacobi, wouter van atteveldt, and kasper welbers, "quantitative analysis of large amounts of journalistic texts using topic modelling," digital journalism 4, no. 1 (2015), https://doi.org/10.1080/21670811.2015.1093271.

12 jonathan o. cain, "using topic modeling to enhance access to library digital collections," journal of web librarianship 10, no. 3 (2016): 210–25, https://doi.org/10.1080/19322909.2016.1193455; alexandra lesnikowski et al., "frontiers in data analytics for adaptation research: topic modeling," wires climate change 10, no. 3 (2019): e576, https://doi.org/10.1002/wcc.576.

13 tiziano piccardi and robert west, "crosslingual topic modeling with wikipda," in proceedings of the web conference 2021 (www '21), april 19–23, 2021, ljubljana, slovenia (acm, new york), https://doi.org/10.1145/3442381.3449805.

14 cain, "using topic modeling to enhance access to library digital collections," 210–25; a. krowne and m. halbert, "an initial evaluation of automated organization for digital library browsing," in jcdl '05: proceedings of the 5th acm/ieee-cs joint conference on digital libraries (june 7–11, 2005): 246–55; david newman, kat hagedorn, and chaitanya chemudugunta, "subject metadata enrichment using statistical topic models," paper presented at acm ieee joint conference on digital libraries jcdl '07, vancouver, bc, june 17–22, 2007.

15 craig boman, "an exploration of machine learning in libraries," ala library technology report 55, no. 1 (january 2019): 21–25.

16 julia silge and david robinson, text mining with r: a tidy approach (sebastopol, california: o'reilly media, inc., 2017), 90.

17 blei, ng, and jordan, "latent dirichlet allocation."

18 arlene g. taylor, introduction to cataloging and classification, 10th ed. (westport, connecticut: libraries unlimited, 2006), 19–20, 301–14; arlene g. taylor and daniel n. joudrey, the organization of information, 3rd ed. (westport, connecticut: libraries unlimited, 2009), 303–28.

19 blei, ng, and jordan, "latent dirichlet allocation."

20 silge and robinson, text mining with r, 149.

21 albalawi, yeap, and benyoucef, "using topic modeling methods for short-text data," 42.

22 the r project for statistical computing, https://www.r-project.org/.

23 bettina grün and kurt hornik, "topicmodels: an r package for fitting topic models," journal of statistical software 40, no. 13 (2011): 1–30, https://doi.org/10.18637/jss.v040.i13.
public libraries leading the way the democratization of artificial intelligence: one library's approach thomas finley information technology and libraries | march 2019 thomas finley (tfinley@friscotexas.gov) is adult services manager, frisco public library. chances are that before you read this article, you probably checked your email, used a mapping app to find your way, or typed a search term online. without your even perceiving it, artificial intelligence (ai) has already helped you to accomplish something today. email spam filters use variants of ai to help cut down on harmful or useless emails in your inbox.1 with ai doing the fact-crunching, mapping apps quickly preview the best route based on a myriad of factors. search engine companies like google have been using ai to suggest or produce results faster for longer than anyone outside of the company really knew until recently.2 according to a recent study by northeastern university and gallup, 85% of americans are already using ai products.3 the true revelation behind these recent technological developments may not be the fact that ai is already embedded into the fabric of our modern lives. the real surprise might just be the sudden ubiquitous availability (and approachability) of ai tools for all. as google's former chief scientist of ai and machine learning, fei-fei li, said in 2017, "the next step for ai must be democratization, lowering the barriers of entry, and making it available to the largest possible community of developers, users and enterprises."4 this sounds a lot like most public libraries' mission statements. as with other important workforce development efforts, libraries are uniquely placed to participate in this new revolution as key platforms for the discovery and dissemination of emerging tech knowledge. at the frisco public library (https://www.friscolibrary.com), we saw this ai trend surfacing, recognized ai as a critical future job skill, and investigated ways to introduce our patrons to this space. the frisco public library has leveraged readily available technology in a cost-effective way that has engaged community interest. our efforts are also replicable and scalable in terms of multi-nodal experiences both at home and in classroom-based learning.
some basic definitions let's take a few steps back to give some broad definitions and boundaries to the scope of ai. according to the oxford english dictionary, artificial intelligence is "the capacity of computers or other machines to exhibit or simulate intelligent behavior."5 in the literature, you will find a further distinction between general ai, narrow ai, and something called machine learning.6 general ai is something that begins to look like science fiction: an artificial intelligence that learns how to learn, then is able to generalize what it has learned and apply that knowledge to a different case. in advanced examples of general ai, scientists are thinking of not putting a specific problem in front of a general ai program to solve; rather, they are giving it an entire dataset so the program itself can choose what problems it should work on, removing the limited point of view of whoever programs the program.7 narrow ai is easier to understand because it is what we interact with the most in our day-to-day lives. it is what powers those little speed-ups that help us do things faster every day: search through our emails to help us avoid spam, translate speech to text when we dictate a message on a smartphone, or help to parallel park a car at the touch of a button. narrow ai accomplishes a specific task extremely fast and accurately, and thus becomes an extension and multiplier of our own human productivity. a lot of these narrow ai activities are based on a type of artificial intelligence called machine learning (ml). ml is a set of very complex processes that can review large sets of information; create and train models based on this data; make predictions of what will happen next; and then refine that data for better future results.8 machine learning is the focus of our efforts at the frisco public library for two main reasons: 1) it is what has been made available through free tools such as google's open ai resources; and 2) it makes ai attainable in a library setting. our approach: makerspaces for everyone, at home the frisco public library has had 4 years of success with circulating makerspace technology in reasonably priced, hard-shell waterproof boxes with foam inserts. each kit is cataloged, rfid tagged, security tagged, and sealed with zip ties to enable self-checkouts (zip ties can be easily cut open at home, but prevent items from disappearing in the library). these cases are easy to handle and can take some abuse while protecting their contents. this is important because we circulate about 20 different kinds of robotics kits, no-soldering circuitry kits, 3d scanning kits, programming kits, and internet of things kits. most kits contain the theme item with quick start guides, instruction booklets, and a book to inspire advanced learning. we call these maker kits, and we have about 150 total. in our community, they are wildly popular and have circulated more than 4,000 times since their introduction in january 2016.9 aiy: artificial intelligence kits for everyone in 2017, google released their maker-focused aiy voice project kit (where aiy stands for "artificial intelligence yourself," a catchy play on do-it-yourself).10 the kit consists of several components that pair a raspberry pi (an entry-level computer) with a small speaker, housed in a cardboard box with a button prominently placed on top. the result is a stripped-down version of an amazon echo or google home device: essentially, a smart speaker.
although the aiy voice kit is not necessarily initially set up to play music, it is designed to take voice commands like the other products on the market. with a minimum of python coding expertise, aiy kits enable mass participation in artificial intelligence. there isn't even any soldering required to put this kit together! this is 100% in line with fei-fei li's (google's former chief scientist for ai and ml) remarks about the need to democratize ai. google has since released another kit called aiy vision that uses similar components paired with a camera. more information on the kits can be found at https://aiyprojects.withgoogle.com/. frisco public library's artificial intelligence maker kits based on our previous experience with other maker kits, we made a few modifications to the original google design that most librarians with access to a 3d printer can accomplish. the original aiy voice kit uses a punch-out cardboard box to fold and envelop the device. apart from being an extremely cost-effective way of making a box, it also seems like there is delicious irony (and message) in the contrasting of cardboard, a cheap and widely available material, with the advanced tech of ai. durability being our priority, we knew we needed to upgrade this aspect of google's original design. our maker librarian, adam lamprecht, quickly found a shared design file uploaded to the website www.thingiverse.com, which he modified to better suit our needs (see figure 1).11 figure 1. ai maker kits with 3d printed aiy voice device. we then printed these in a variety of colors on our 3d printers and modified the grid-patterned foam inserts to make room for the device and a few other items (see figure 2). we are currently circulating 21 of these kits without major incident. figure 2. interior view of the kits. library instruction: python as a window onto artificial intelligence our basic artificial intelligence classes have been key in the introduction of this technology to the public. we reserve 10 kits for a class and pair them with classroom laptops for ease of use. the structure of the class provides a short introduction to the technology and then walks participants through a basic voice recognition coding challenge. all of this is accomplished in python. python is great for beginning coders because it is easier to learn than other programming languages, takes less time to write lines of code, and can telescope up into a very large number of projects and applications.12 in fact, according to neal ford, director and software architect at thoughtworks, python "is very good at solving bigger kinds of problems."13 so with python, a beginning learner has a programming language that continues to be useful beyond the classroom and into the world of work or school. python provides another important advantage: "python provides the front-end method of how to hook into google's open ai," states tech writer serdar yegulalp.14 it is this combination of a free, accessible coding language with the powerful (and also free) resources of google's open ai that truly lowers the barrier to entry for anyone interested in a hands-on experience with artificial intelligence.
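to give a sense of what such a voice recognition exercise can look like, the following is a minimal sketch in python. it is not the library's class material or the aiy kit's own library; it assumes the third-party speechrecognition package (with a microphone backend such as pyaudio) and sends a short clip of audio to google's free web speech api for transcription.

import speech_recognition as sr  # assumed dependency: pip install SpeechRecognition (plus pyaudio for the microphone)

def listen_once():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate against room noise
        print("say something...")
        audio = recognizer.listen(source)
    try:
        # hand the recorded audio to google's web speech api and return the transcript
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return "(speech was not understood)"
    except sr.RequestError as err:
        return "(speech service unavailable: {})".format(err)

if __name__ == "__main__":
    print("you said:", listen_once())

a beginner who gets this far has already touched the core loop of the class: capture audio, send it to a recognition service, and do something with the returned text.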
lessons learned the ai maker kits are, by far, our most complicated circulating kits. we are hearing back from patrons that the kits are right on the mark. our users get it: they see the power in getting access to these ai tools (utilizing python) and, by all accounts thus far, are happy with their results. there has been a perception gap, however, between library staff expectations and what an ai kit can reasonably accomplish. adam lamprecht reports, "staff members had the expectation that perhaps with this kit, a rookie coder was going to be able to jump directly into developing deep learning neural networks (a very advanced subset of artificial intelligence) and so we definitely benefited from ongoing discussions of those broad ai terms and expectations."15 google's aiy voice is a good start, but there is plenty of room to grow ai classes for more depth. aiy vision is the next logical step that would allow us to enter the world of basic image recognition. our approach does rely on one company's platform, but there are now more platforms for exploring ai, one of which is amazon's machine learning offerings on aws (amazon web services). these services have recently been opened up to a wider audience, and amazon is now offering everyone the same online courses it uses to train its own engineers.16 the aws ml resources are currently behind paywalls, but access to the training alone could be powerful for the right learner. there are even interesting developments for younger learners in ai with robotics. anki (www.anki.com) is a consumer robotics company that uses ai to enliven its products. they released vector in 2018: a seemingly simple toy that responds to its environment and simple commands with the aid of ai. with the release of their software development kit, the company is allowing others under the hood of its robots, which potentially means an entry point for autonomous (or semi-autonomous) robotic vehicle technology powered by ai. what is clear is that the world of ai is already upon us. public libraries are well positioned to help meet the challenge of developing the workforce of the near and far future, with ai classes being a vital tool. the doorway to artificial intelligence is now open; the only question that remains is this: do you step through it? references 1 cade metz, "google says its ai catches 99.9 percent of gmail spam," wired, july 9, 2015, https://www.wired.com/2015/07/google-says-ai-catches-99-9-percent-gmail-spam/. 2 jack clark, "google turning its lucrative web search over to ai machines," bloomberg business, october 26, 2015, https://www.bloomberg.com/news/articles/2015-10-26/google-turningits-lucrative-web-search-over-to-ai-machines. 3 rj reinhart, "most americans already using artificial intelligence products," gallup, march 6, 2018, https://news.gallup.com/poll/228497/americans-already-using-artificial-intelligenceproducts.aspx. 4 scot petersen, "google joins chorus of cloud companies promising to democratize ai," eweek, march 10, 2017, ebscohost academic search complete. 5 "artificial intelligence, n," oed online, december 2018, oxford university, accessed march 1, 2019.
6 bernard marr, “what is the difference between artificial intelligence and machine learning?,” forbes, december 6, 2016, https://www.forbes.com/sites/bernardmarr/2016/12/06/whatis-the-difference-between-artificial-intelligence-and-machine-learning/#6d40eeec2742. 7 lex fridman, “juergen schmidhuber: godel machines, meta-learning, and lstms,” mit ai podcast, december 22, 2018. 8 serdar yegulalp, “what is tensorflow? the machine learning library explained,” infoworld. june 6, 2018, https://www.infoworld.com/article/3278008/tensorflow/what-is-tensorflow-themachine-learning-library-explained.html. 9 frisco public library, 2019 “unpublished maker kit statistics 2016-2019.” 10 “aiy projects: voice kit,” google, accessed december 15, 2018, https://aiyprojects.withgoogle.com/voice/. 11 adam lamprecht, “google aiy voice box,” thingiverse, accessed february 14, 2019, https://www.thingiverse.com/thing:3247685. 12 elena ruchko, “why learn python? here are 8 data-driven reasons,” dbader.org, accessed february 14, 2019, https://dbader.org/blog/why-learn-python. 13 christina cardoza, “the python programming language grows in popularity,” sd times, june 15, 2017, https://sdtimes.com/artificial-intelligence/python-programming-language-growspopularity/. 14 yegulalp, “what is tensorflow? the machine learning library explained.” 15 adam lamprecht, email message to the author, february 15, 2019. 16 locklear mallory, “amazon opens up its internal machine learning training to everyone,” engadget, november 26, 2018, https://www.engadget.com/2018/11/26/amazon-opensinternal-machine-learning-training/. near-field communication (nfc): an alternative to rfid in libraries articles near-field communication (nfc) an alternative to rfid in libraries neeraj kumar singh information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.11811 neeraj kumar singh (neerajkumar78ster@gmail.com), phd, is deputy librarian, panjab university, chandigarh, india abstract libraries are the central agencies for the dissemination of knowledge. every library aspires to provide maximum opportunities to its users and ensure optimum utilization of available resources. hence, libraries have been seeking technological aids to improve their services. near-field communication (nfc) is a type of radio-frequency technology that allows electronics devices—such as computers, mobile phones, tags, and others—to exchange information wirelessly across a small distance. the aim of this paper is to explore nfc technology and its applications in modern era. the paper will discuss potential use of nfc in the advancement of traditional library management system. introduction similar to other identification technologies such as radio-frequency identification (rfid), barcodes, and qr codes, near-field communication (nfc) is a short-range (4–10 cm) wireless communication technology. nfc is based on the existing 13.56 mhz rfid contactless card standards which have been established for several years and are used for payment, ticketing, electronic passport, and access control among many other applications. data rates range from 106 to 424 kilobits per second. a few nfc devices are already capable of supporting up to 848 kilobits per second which is now being considered for inclusion in the nfc forum specifications. 1 compared to other wireless communication technologies nfc is designed for proximity or shortrange communication which provides a dedicated read zone and some inherent security. 
its 13.56 mhz frequency places it within the ism band, which is available worldwide. it is a bi-directional communication meaning that you can exchange data in both directions with a typical range of 4 – 10 cm depending on the antenna geometry and the output power.2 nfc is convenient and fast: the action is automatically triggered when your phone comes within 10 cm near the nfc tag and you get instant access to the content on mobile, without a single click.3 rfid and nfc technologies are similar in that both use radio waves. both rfid and nfc technologies exchange data within electronic devices in active mode as well as in passive mode. in the active mode, outgoing signals are basically those that actually come from the power source, whereas in case of passive mode the signals use the reflected energy they have received from the active signal. in rfid technology the radio waves can send information to receivers up to hundreds of meters away depending on the frequency of the band used by th e tag. if provided with high amount of power, these signals can also be sent to extreme distances (e.g., in the case of airport radar). at large airports it typically controls traffic within a radius of 100 kilometers of the airport below an elevation of 25,000 feet. rfid is also used very often in tracking animals and vehicles. mailto:neerajkumar78ster@gmail.com information technology and libraries june 2020 near field communication (nfc) | singh 2 in contrast, items like passports and payment cards should not be capable of long-distance transmissions because of the threat of theft of personal information or funds. nfc is designed to meet this need. nfc tags are very small in size so as to fit on the inner side of devices and products such as inside luggage, purses and packs as well as from inside wallets and clothing and can be tracked. nfc technology has added security features that make it much more secure than the previously popular rfid equivalent and it is difficult to steal information stored in it. nfc has short range of work area compared to other wireless technologies, so it can be widely used for payments, ticketing and service admittance and thus has proved to be a safer technology. it is because of this security feature that this technology is used in cellular phones to turn them into a wallet.4 both rfid and nfc wireless technologies can operate in active and passive communication modes to exchange data within electronic devices. the main differences between nfc and rfid are: • though both rfid and nfc use radio frequencies for communication, nfc can be said to be an extension of the rfid technology. the rfid technology has been in use for more than a decade, but nfc has emerged on the scene recently. • rfid has a wider range whereas nfc has limited communication and operates only at close proximity. nfc typically has a range of a few centimeters. • rfid can function in many frequencies and many standards are being used, but nfc requires a fixed frequency of 13.56 mhz, and some other fixed technical specifications to function properly. • rfid technology can be used for such applications as item tracking, automated toll collecting on roads, vehicle movement, etc., that require wide area signals. nfc is appropriate for applications that carry data that needs to be kept secure like mobile payments, access controls, etc., that carry sensitive information. • rfid operates over long distances while exchanging data wirelessly so it is not secure for the applications that store personalized data. 
rfid using items susceptible to various fraud attacks such as data corruption. nfc’s short working range considerably reduces this risk of data theft, eavesdropping, and “man in the middle” attacks. • nfc has the capability to communicate both ways and thus is suitable to be used for advanced interactions such as card emulation and peer-to-peer sharing. • a number of rfid tags can be scanned simultaneously, while only a single nfc tag can be scanned at a time. how nfc works the extended functionality of a traditional rfid system has led to the nfc forum. the nfc forum has defined three operating modes for nfc devices: tag reader/writer mode; peer-to-peer mode, and card emulation mode (see figure 1). the nfc forum technical specifications for the different operating modes are based on the iso/iec 18092 nfc ip-1, jis x 6319-4, and iso/iec 14443. these specifications must be used to derive the full benefit from the capabilities of nfc technology. contactless smart card standards are referred to as nfc-a, nfc-b, and nfc-f in nfc forum specifications.5 information technology and libraries june 2020 near field communication (nfc) | singh 3 figure 1. nfc operation modes6 reader/writer mode in reader/writer mode (see figure 2), an nfc-enabled device is capable of reading nfc forummandated tag types, such as a tag embedded in an nfc smart poster. this mode allows nfcenabled devices to read the information that is stored on nfc tags embedded in smart posters and displays. since these tags are relatively inexpensive, they provide a great marketing tool for companies. figure 2. reader mode7 the reader/writer mode on the radio frequency interface is compliant with the nfc-a, nfc-b, and nfc-f schemes. examples of its use include reading timetables, tapping for special offers, and updating frequent flyer points, etc.8 information technology and libraries june 2020 near field communication (nfc) | singh 4 peer-to-peer mode in peer-to-peer mode (see figure 3), both devices must be nfc-enabled in order for them to communicate with each other to exchange information and to share files. the users of nfcenabled devices can thus quickly share information and other files with a touch. as an example, users can exchange data such as digital photos or virtual business cards via bluetooth or wifi. figure 3. peer-to-peer mode9 peer-to-peer mode is based on the nfc forum’s logical link control protocol specification and is standardized on the iso/iec 18092 standard. card-emulation mode in card-emulation mode (see figure 4), an nfc device behaves like a contactless smart card so that users can perform transactions such as purchases, ticketing, and transit access control with just a touch. an nfc device may have the ability to emulate more than one card. in card-emulation mode, an nfc-enabled device communicates with an external reader much like a traditional contactless smart card. this allows contact less payments and ticketing by nfc-enabled devices without changing the existing infrastructure. information technology and libraries june 2020 near field communication (nfc) | singh 5 figure 4. card-emulation mode by adding nfc to a contactless infrastructure one can enable two-way communications. in the air transport sector, this could simplify many operations such as updating seat information while boarding or adding frequent flyer points while making a payment.10 nfc standards and specifications the nfc specifications are defined by an industry organization called the nfc forum, which has nearly 200 member companies. 
the nfc forum was formed in 2004 with the objective of advancing the use of nfc technology. this was achieved by educating the market about nfc technology and developing specifications to ensure interoperability among devices and services. the nfc forum members are working together in task forces and working groups. as noted earlier, nfc technology is based on existing 13.56 mhz rfid standards and includes several protocols such as iso 14443 type a and type b, and jis x 6319-4 (which is also a japanese industrial standard known as sony felica). the iso 15693 standard, an additional 13.56 mhz protocol established in the market, is being integrated into the nfc specification by an nfc forum task force. smartphones in the market are already supporting the iso 15693 protocol.11 these nfc specifications and especially the specifications for the extended nfc functionalities are again standardized by the international standard organizations like iso/iec ecma and etsi.12 initially the rfid standards i.e. iso/iec 14443 a, iso/iec 14443 b and jis x6319-4 were also pronounced as nfc standards by different companies working in the field such as nxp, infineon, and sony. the first ever nfc standard was ecma 340, based on the air interface of iso/iec 14443a and jis x6319-4. ecma 340 adapted the iso/iec standard 18092. at the same time, major credit card companies like europay, mastercard, and visa introduced the emvco payment standard, which is based on iso/iec 14443 a and iso/iec 14443 b. these groups harmonised the over-the-air interfaces within the nfc forum. they are named nfc-a (iso/iec 14443 a based), nfc-b (iso/iec 14443 b based), and nfc-f (felica based).13 information technology and libraries june 2020 near field communication (nfc) | singh 6 nfc tags an nfc tag is a small microchip embedded in a sticker or wristband that can be read by the mobile devices that are within range. information regarding the item is stored in these microchips.14 an nfc tag has the capability to send the information stored on it to nfc enabled mobile phones. nfc tags can also perform various actions, such as changing the settings of handsets or even launch a website.15 tag memory capacity varies by the type of tag. for example, a tag may store a phone number or a url.16 the most common use of the nfc tag function on an object is mobile wallet payment processing, where the user swipes or flicks a mobile phone on a nfc tag to make payment. google’s version of this system is google wallet.17 figure 5. a quick overview of the tag types18 applications of nfc since it emerged as a standard technology in 2003, nfc technology has been implemented across multiple platforms in various ways. the primary driving force behind nfc is its application in the commercial sector in which the implementation of the technology focuses on such areas as sales and marketing. there are also emerging many new and interesting applications in various other fields of education and healthcare. all of these may impact libraries, librarians, and library users, either by prompting adaptations to existing collections and services or inspiring innovation in our profession.19 • mobile payment: customers with nfc-enabled smartphones can link with their bank accounts and are able to pay by simply tapping phones to an nfc-enabled point-of-sale.20 information technology and libraries june 2020 near field communication (nfc) | singh 7 • access and authentication: “keyless access” to restricted areas, cars, and other vehicles. 
one can imagine other potential uses of nfc in the future with the devices in the home being controlled by it.21 • transportation and ticketing: nfc-enabled phones can connect with an nfc-enabled kiosk to download a ticket, or the ticket can be sent directly to an nfc-enabled phone over the air (ota). the phone can then tap a reader to redeem that ticket and gain access. 22 • mobile marketing: nfc tags they can be embedded into the indoor and outdoor signage. upon tapping their smartphone on an nfc-enabled smart poster, the customer can read a consumer review, visit a website, or even view a movie trailer. • healthcare: nfc medical cards and bracelet tags can store relevant, up-to-date patient information like health history, allergies, infectious diseases, etc. • gaming: nfc technology is the bridge between physical and digital games. players can tap each other’s phones together and earn extra points or receive access to a new level, or get clues, by using nfc application.23 • inventory tracking, smart packaging, and shelf labels: nfc-tagged objects could provide a wide variety of information in different use environments. nfc-enabled smartphones can be used to tap the tags to access book reviews and information about the book’s author and recommend the book to other readers. users could check out a book or add it to a wish list to check out at a later date. indeed, with nfc, library records and metadata could theoretically be stored on and retrieved from library physical holdings themselves, allowing a patron to tap a book or resource borrowed from the library to recall its title, author, and due date.24 applications of nfc in libraries: introducing the smart library some libraries are beginning to use nfc technology as an alternative to rfid. yusof et al. proposed a newly developed application called the smart library, or “s-library,” that has adopted the nfc technology.25 in the s-library, library users can perform many library transactions just by using their mobile smartphones with integrated nfc technology. the users of s-library are required to download and install an app in their compatible mobile phone. this app provides the user relevant and easy to use library functionality such as searching, borrowing, returning, and viewing their transaction records. in this s-library model the app is integrated with the library management software. the s-library app needs to be installed on the mobile device, and the mobile device requires an internet connection that will connect it to the lms. the s-library provides five major functionalities to the user: scan, search, borrow, return, and transaction history. in the scanning function, users can access the information of a book by simply touching their mobile phone to the nfc tag on the book. as soon as the phone touches the book, information regarding its title, author, contents, synopsis, etc. will automatically be displayed on the screen of the mobile device. users can search for books by entering keywords such as book title, author name, year, etc. through the borrowing function the app allows users to check out books of interest. the user just needs to touch their mobile phone to the nfc-tagged book to borrow it. the transaction is automatically stored to the lms database. similar to the borrowing process is the returning process. the user is required to select the return function on the menu and touch the mobile device to the book, and the returning transaction will be automatically performed and stored in the lms database. 
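to make the scan-and-borrow flow concrete, the following is a minimal sketch in python of how a tag read could be wired to a library management system. it is not the s-library code described by yusof et al.; it assumes the open-source nfcpy library driving a usb reader on a workstation (a phone app would use the platform's own nfc api), and the checkout endpoint, patron id, and the idea that the tag holds the item id in a text record are all hypothetical placeholders for whatever interface a given lms exposes.

import nfc        # nfcpy (assumed dependency: pip install nfcpy), drives a usb nfc reader
import requests   # used here to call a hypothetical lms rest endpoint

LMS_CHECKOUT_URL = "https://lms.example.org/api/checkout"  # placeholder url, not a real service

def on_connect(tag):
    # read the ndef text record that, in this sketch, holds the item's barcode/id
    if tag.ndef and tag.ndef.records:
        item_id = tag.ndef.records[0].text
        response = requests.post(LMS_CHECKOUT_URL,
                                 json={"item": item_id, "patron": "demo-patron"},
                                 timeout=10)
        print("checkout status:", response.status_code)
    return True  # tell nfcpy the tag has been handled

clf = nfc.ContactlessFrontend("usb")
try:
    # block until a tag is tapped on the reader, then run on_connect
    clf.connect(rdwr={"on-connect": on_connect})
finally:
    clf.close()

the same pattern, with the post request replaced by a "return" call, would cover the returning function described above.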
however, it should be ensured that the book is physically returned to the library through the library's nfc-enabled book drop system, and only then should the transaction be updated in the lms. the user can check the due date for the current transaction as well as their transaction history. the transaction history function allows the user to view the list of books that have been borrowed from time to time and their status.26 data transmission for nfc technology can be up to 848 kilobits/second, whereas the data transmission rate with rfid technology is 484 kilobits/second. taking advantage of this high data rate, the response time for s-library is also very fast. this is a huge improvement over rfid technology and especially over barcode technology, where the data transmission rate is variable and inconsistent and dependent upon the quality of the barcodes. the second key advantage of s-library is that the time taken to read a tag (the communication time between a reader and an nfc-enabled device) is very fast. the third advantage of nfc is its usability in comparison to the other two technologies. nfc technology is human-centric because it is intuitive and fast, and the user is able to use it anywhere, anytime, using their mobile phone. in rfid and barcode technology, usability is item-centric, as the person has to go to the specific device located in the library.27 most of the shortcomings of rfid and barcode technology have been overcome by the s-library. with barcode technology, the quality of barcodes, printing clarity, print contrast ratio, and also the low level of security were all challenges. rfid technology had many drawbacks, such as a lack of common rfid standards, security vulnerability, and reader and tag collision, which happens when multiple tags are energized by the rfid tag reader simultaneously and reflect their respective signals back to the reader at the same time. because nfc is touch based, it has presented a viable alternative tool for library users to overcome these weaknesses of the older technology. yusof et al. found many advantages to s-library: faster book borrowing; saved time for the user as well as the library staff; the connection can be initialised in less than a second; no configuration on the mobile device is required; and higher usability ratings and security.28 however, there are also some limitations of s-library. first, device compatibility is an issue, because s-library presently supports only the android platform. second, as the s-library application only supports up to a 10-centimeter range, coverage is an issue. mobile payments nfc technology can be used for several library functions such as making payments, paying library fines, purchasing tickets to library events, or donating to the library. users may also be able to use their digital wallet to pay for photocopying, printing, scanning, etc. keeping the requirements of nfc technology in mind, libraries will have to enquire about the possibility of adding nfc payment capabilities to existing hardware and consider it when purchasing new machines. already, bibliotheca's smartserv 1000 self-serve kiosk, introduced in september 2013, includes nfc as a payment option. other library automation companies would also be worth monitoring for nfc integration in the future.29 library access and authentication nfc-enabled devices can be used to access the library and authenticate users.
these capabilities suggest that nfc technology may play an important role in the next generation of identity management systems. of particular interest in this context are several applications of nfc in two-factor authentication, which generally combines a traditional password or other digital credential with a physical, nfc-enabled component as well. for example, an authentication system information technology and libraries june 2020 near field communication (nfc) | singh 9 could require the user to type in a fixed password in addition to tapping an nfc-enabled phone, identity card, or ring to the device they are logging in to. ibm has demonstrated a two -factor authentication method for mobile payment in which a user first types in a password and then taps an nfc-enabled credit card, issued by their bank, to their nfc-enabled smartphone. libraries could investigate similar access and authentication applications for nfc, both for internal use (staff badges and keys) as well as for public services. particularly if nfc mo bile payment finally gains consumer attraction, library patrons may begin to expect that they can use their nfc-enabled mobile devices to replace not just their credit cards but also their library cards. already, d-tech’s rfid air self check unit allows library patrons to log into their user accounts by tapping their nfc-enabled phone to the kiosk. the patron then uses the kiosk’s rfid reader to check out their library materials and receives a receipt via email or sms. beyond its application in circulation, nfc authentication can be applied to streamline access to other services and resources of the library.30 nfc-enabled devices could be used to make reservation of library spaces, classrooms, auditoriums or community halls, digital media labs, meeting rooms , etc. library users could use nfc authentication to be able to access digital library resources, such as databases, e-journals, e-books collections, and other digital collections. nfc might allow libraries of all kinds to provide more convenient access and authentication options to users, though privacy and security considerations would certainly need to be addressed. nfc access and authentication will certainly have an impact on academic libraries. at universities where nfc access systems are deployed, student identification cards can be replaced with nfc-enabled mobile phones for afterhours services such as library building entry, wifi access, and printing, copying, and scanning services. the inconvenience of multiple logins can be eliminated. however, the libraries will have to take the responsibility of protecting student information and library resources with added security.31 promotion of library services librarians can borrow ideas from commercial implementations of nfc-based marketing to enhance promotions for library resources, services, and events. as a first step, as kane and schneidewind suggested, nfc tags can complement several promotional uses of qr codes that have already been piloted or implemented in libraries. 32 for promotional use, libraries can easily embed nfc tags in their new book displays that can be linked to the bestseller list or current acquisitions lists in the library catalog or digital collections. similarly, if the reference book collection is tagged with nfc tags, it could be linked to the relevant digital collections of databases or e-books. 
nfc tags can be placed on library building doors or on library promotional material by which information such as library hours, opening days, schedule of events, membership rules , or floor plans for the building could be shared. as an example, at the renison university college library in ontario, canada, visitors can tap an nfc-enabled “library smartcard” to retrieve a digital brochure of library services in a variety of formats, including pdf, epub, and mp3.33 to promote outreach programs and events instead of merely sharing links the libraries can take advantage of nfc’s interactive capabilities. as an example, libraries could use nfc tags on their event posters so that the users that can scan them and register for an event, save the event to their personal calendar, join the friends of the library program, or even download a library app. to send a text message to a librarian the users can tap the smart poster promoting a virtual reference service. nfc-enabled promotional materials can engage users with library content even when they are outside of the library building itself. a brilliantly creative example was created by the field information technology and libraries june 2020 near field communication (nfc) | singh 10 museum of chicago. it used nfc-enabled outdoor smart posters throughout the city to promote an exhibit of the 1893 world’s fair. the event posters depicted a personage from 1893 that invited the viewer to “see what they saw.” users could tap their nfc-enabled mobile device to the smart poster (or read a qr code) to download an app from the field museum that included 360° images of the fair as well as videos highlighting items in the exhibition.34 inventory control the smart packaging use case brings forward a very important question for libraries that use rfid for inventory control. first, can existing rfid tags and infrastructure be leveraged to provide additional services to patrons with nfc-enabled mobile devices? the concept is not new; walsh envisioned using library rfid tags to store book recommendations or other digital information, which users could then access with a conveniently located rfid reader. 35 what nfc brings to walsh’s vision is that a dedicated rfid reader may no longer be necessary; a patron could use their own nfc-enabled smartphone to read a tag rather than taking it to a special location to be read. indeed, with nfc, library records and metadata could theoretically be stored on and retrieved from library physical holdings themselves, allowing a patron to tap a book or resource borrowed from the library to recall its title, author, and due date. an exciting and immediate use for nfc in libraries is for self-checkout: a patron can browse the stacks and could tap an nfctagged book with their nfc-enabled phone to check it out without visiting the circulation desk or waiting in line.36 smart packaging a sector close to librarians’ hearts is publishing and several publishers have started testing smart packaging for books, using embedded nfc tags to share additional content with readers such as book reviews, reading lists, etc. with digital extras, the concept of smart packaging has significant implications for libraries as a new opportunity to connect physical collections (i.e., from books to digital media). 
one can envision in the future that when a user taps an nfc-enabled library book they shall get access to relevant digital information (such as bibliographic information) in a variety of citation formats, editorial reviews, the author’s biography, a projected rating for the book, and links to other similar information. borrowing and returning books one of a library’s key functions is circulating physical books from the library’s collections. due to the low cost of barcode technology, many libraries around the world are using it for circulation management. however, barcode technology has several constraints: it requires a line-of-sight to the barcode, it does not provide security of library collection, it does not offer any benefit for collection management, and it is becoming challenging for libraries to satisfy the increasing demands of their users, for example, reservation of books issued out, checking their transaction history, etc. this leads to the need to implement a new technology to improve the library circulation management, inventory, and security of library collections. librarians are known as early adopters of technology and have started using rfid to provide circulation services in a more effective and efficient manner, for security of library collections, and to satisfy the increasing demands of the users, for example putting tags in books allows them to issue multiple books together by placing stack of books near a reader. information technology and libraries june 2020 near field communication (nfc) | singh 11 recommendations according to mchugh and yarmey, the implementation of nfc has been slow and unsteady and they do not foresee an immediate implementation in libraries.37 however, they recommend that librarians learn and prepare for nfc. they recommend, for example, that librarians: • follow the progress of research and scholarship on nfc and commercial progress of nfc technology to better anticipate its adoption in your community; • experiment with nfc technology and develop prototype applications for nfc use in the library; • offer an informational workshop on nfc for users and library colleagues; • enquire from the rfid vendor about tag compatibility with nfc and rewriting the tags; • monitor the progress of security and privacy aspects of nfc technology and educate the users about these issues; develop or update your library security policy; • allow patrons to “opt-in” to any nfc services at your library, providing other modes of communication where possible; • develop and share best practices for nfc implementations; and • support research on nfc in libraries via planning grants, research forums, and conference sessions. conclusions beyond the potential benefits of nfc, librarians should also be aware of and prepared for privacy and security concerns that accompany the technology. user privacy is of the utmost concern. nfc involves users’ mobile devices generating, collecting, storing, and sharing a significant amount of personal data. several of these functions, particularly mobile payment, necessitate the exchange of highly confidential data, including but not limited to a user’s financial accounts, purchase history, etc. spam may also be a concern; sending unwanted content (e.g., advertisements, coupons, or adware) to users’ mobile devices without their consent. librarians should also use special caution when considering the implementation of nfc for library promotions or services. 
security is a significant concern and an active area of research, as many nfc implementations involve the exchange of sensitive financial or otherwise personal data. an important concept in nfc security, particularly in the context of mobile payment, is the idea of a tamper-proof “secure element” as a basic protection for sensitive or confidential data such as account information and credentials for authentication.38 outside of continued standardization, the most effective measures for protecting n fc data transmissions are data encryption and the establishment of a secure channel between the sending and receiving devices (e.g., using a key agreement protocol and/or via ssl). for security concerns, as with privacy concerns, librarians have a crucial role to play in user education. there are important steps that individual users can and should take to protect their devices—e.g., setting a lock code for their device, knowing how to remotely wipe a stolen phone, and installing and regularly updating antivirus software. however, many users are unaware of the vulnerability of their mobile devices and often fail to enact even basic protections. by empowering objects and people to communicate with each other at a different level and establish a “touch to share” paradigm, nfc technology has the potential to transform the information technology and libraries june 2020 near field communication (nfc) | singh 12 information environment surrounding our libraries and fundamentally alter the ways in which the library patrons interact with information. endnotes 1 doaa abdel-gaber and abdel-aleem ali, “near-field communication technology and its impact in smart university and digital library: comprehensive study,” journal of library and information sciences, 3, no. 2 (december 2015): 43-77, https://doi.org/10.15640/jlis.v3n2a4. 2 “nfc technology discover what nfc is, and how to use it,” accessed march 17, 2019, https://www.unitag.io/nfc/what-is-nfc. 3 apuroop kalapala, “analysis of near field communication (nfc) and other short range mobile communication technologies” (project report, indian institute of technology, roorkee, 2013 ), accessed march 19, 2019, https://idrbt.ac.in/assets/alumni/pt2013/apuroop%20kalapala_analysis%20of%20near%20field%20communication%20(nfc) %20and%20other%20short%20range%20mobile%20communication%20technologies_2013. pdf. 4 ed, “near field communication vs radio frequency identification,” accessed march 10, 2019, http://www.nfcnearfieldcommunication.org/radio-frequency.html. 5 “what it does,” nfc forum, accessed march 12, 2019, https://nfc-forum.org/what-is-nfc/whatit-does. 6 josé bravo et al., “m-health: lessons learned by m-experiences,” sensors 18, 1569 (2018): 1–27. 10.3390/s18051569. 7 vedat coskun, busra ozdenizci, and kerem ok, “the survey on near field communication,” sensors 15, no. 6 (2015): 13348-405, https://doi.org/10.3390/s150613348. 8 coskun, ozdenizci, and ok, “the survey on near field communications,” 13352. 9 coskun, ozenizci, and ok, “the survey on near field communication.” 10 “how nfc works?,” cnrfid, accessed january 12, 2019, http://www.centrenationalrfid.com/how-nfc-works-article-133-gb-ruid-202.html. 11 coskun, ozdenizci, and ok, “the survey on near field communication,” 13352. 12 c. ruth, “nfc forum calls for breakthrough solutions for annual competition,” accessed march 21, 2019, https://nfc-forum.org/newsroom/nfc-forum-calls-for-breakthrough-solutions-forannual-competition/. 13 m. 
roland, "near field communication (nfc) technology and measurements," accessed may 12, 2019, https://cdn.rohdeschwarz.com/pws/dl_downloads/dl_application/application_notes/1ma182/1ma182_5e_nfc_white_paper.pdf. 14 roland, "near field communication (nfc) technology and measurements." 15 "what is a near field communication tag (nfc tag)?," techopedia, accessed may 27, 2019, https://www.techopedia.com/definition/28812/near-field-communication-tag-nfc-tag. 16 "what is meant by the nfc tag?," quora, accessed july 12, 2019, https://www.quora.com/what-is-meant-by-the-nfc-tag. 17 s. profis, "everything you need to know about nfc and mobile payments," accessed june 27, 2019, https://www.cnet.com/how-to/how-nfc-works-and-mobile-payments/. 18 "the 5 nfc tag types," accessed march 24, 2019, https://www.dummies.com/consumer-electronics/5-nfc-tag-types/. 19 abdel-gaber and ali, "near-field communication technology and its impact in smart university and digital library," 64–71. 20 iviane ramos de luna et al., "nfc technology acceptance for mobile payments: a brazilian perspective," review of business management 19, no. 63 (2017): 82–103, https://doi.org/10.7819/rbgn.v0i0.2315. 21 rajiv, "applications and future of near field communication," accessed march 14, 2019, https://www.rfpage.com/applications-near-field-communication-future/. 22 "nfc in public transport," nfc forum, accessed april 12, 2019, http://www.smart-ticketing.org/downloads/papers/nfc_in_public_transport.pdf. 23 "gaming applications with rfid and nfc technology," smarttech, accessed may 14, 2019, https://www.smarttec.com/en/applications/gaming. 24 sheli mchugh and kristen yarmey, "near field communication: recent developments and library implications," synthesis lectures on emerging trends in librarianship 1, no. 1 (march 2014), 1–93. 25 m.k.
yusof et al., "adoption of near field communication in s-library application for information science," new library world 116, no. 11/12 (2015): 728–47, https://doi.org/10.1108/nlw-02-2015-0014. 26 yusof et al., "adoption of near field communication," 734–36. 27 yusof et al., "adoption of near field communication," 744. 28 yusof et al., "adoption of near field communication," 745. 29 abdel-gaber and ali, "near-field communication technology and its impact in smart university and digital library," 64. 30 mchugh and yarmey, "near field communication," 27. 31 mchugh and yarmey, "near field communication," 734. 32 danielle kane and jeff schneidewind, "qr codes as finding aides: linking electronic and print library resources," public services quarterly 7, no. 3–4 (2011): 111–24, https://doi.org/10.1080/15228959.2011.623599. 33 mchugh and yarmey, "near field communication," 31. 34 mchugh and yarmey, "near field communication," 31. 35 andrew walsh, "blurring the boundaries between our physical and electronic libraries: location-aware technologies, qr codes and rfid tags," the electronic library 29, no. 4 (2011): 429–37, https://doi.org/10.1108/02640471111156713. 36 projes roy and shailendra kumar, "application of rfid in shaheed rajguru college of applied sciences for women library, university of delhi, india: challenges and future prospects," qualitative and quantitative methods in libraries 5, no. 1 (2016): 117–130, http://www.qqml-journal.net/index.php/qqml/article/view/310. 37 mchugh and yarmey, "near field communication," 61–62. 38 garima jain and sanjeet dahiya, "nfc: advantages, limits and future scope," international journal on cybernetics & informatics 4, no. 4 (2015): 1–12, https://doi.org/10.5121/ijci.2015.4401. articles assessing the effectiveness of open access finding tools teresa auch schultz, elena azadbakht, jonathan bull, rosalind bucy, and jeremy floyd information technology and libraries | september 2019 teresa auch schultz (teresas@unr.edu) is social sciences librarian, university of nevada, reno.
elena azadbakht (eazadbakht@unr.edu) is health sciences librarian, university of nevada, reno. jonathan bull (jon.bull@valpo.edu) is scholarly communications librarian, valparaiso university. rosalind bucy (rbucy@unr.edu) is research & instruction librarian, university of nevada, reno. jeremy floyd (jfloyd@unr.edu) is metadata librarian, university of nevada, reno. abstract the open access (oa) movement seeks to ensure that scholarly knowledge is available to anyone with internet access, but being available for free online is of little use if people cannot find open versions. a handful of tools have become available in recent years to help address this problem by searching for an open version of a document whenever a user hits a paywall. this project set out to study how effective four of these tools are when compared to each other and to google scholar, which has long been a source of finding oa versions. to do this, the project used open access button, unpaywall, lazy scholar, and kopernio to search for open versions of 1,000 articles. results show none of the tools found as many successful hits as google scholar, but two of the tools did register unique successful hits, indicating a benefit to incorporating them in searches for oa versions. some of the tools also include additional features that can further benefit users in their search for accessible scholarly knowledge. introduction the goal of open access (oa) is to ensure as many people as possible can read, use, and benefit from scholarly research without having to worry about paying to read and, in many cases, restrictions on reusing the works. however, oa scholarship helps few people if they cannot find it. this is especially problematic for green oa works, which are those that have been made open by being deposited in an open online repository even if they were published in a subscription-based journal. opendoar reports more than 3,800 such repositories.1 as users are unlikely to search each individual repository, an efficient search method is needed to find the oa items spread across so many locations. in recent years, several browser extensions have been released that allow a user to search for an open version of an article while on a webpage for that article. the tools include: • lazy scholar, a browser extension that searches google scholar, pubmed, europepmc, doai.io, and dissem.in. it has extensions for both the chrome and firefox browsers.2 • open access button, which uses both a website and a chrome extension to search for oa versions.3 • unpaywall, which also acts through a chrome extension to search for open articles via the digital object identifier.4 • kopernio, a browser extension that searches subject and institutional repositories and is owned by clarivate analytics. kopernio has extensions for chrome, firefox, and opera.5 some of the tools offer other services, such as open access button's ability to help the user email the author of an article if no open version is available, as well as integration with libraries' interlibrary loan workflows.
kopernio and lazy scholar offer to sync with a user’s institutional library to see if an article is available through the library’s collection.6 although other similar extensions might also exist, this article is focused on the four mentioned above based on the authors’ knowledge of available oa finding tools at the time of the project. literature review as noted above, scholars have indicated for several years a need for reliable and user-friendly methods, systems, or tools that can help researchers find oa materials. bosman et al. forwarded the idea of a scholarly commons—a set of principles, practices, and resources to enable research openness—that depends upon clear linkages between digital research objects.7 bulock notes that oa has “complicated” retrieval in that oa versions are often housed in various locations across the web, including institutional repositories (irs), preprint servers, and personal websites. 8 there is no perfect search option or tool, although some have tried creating solutions, such as the open jericho project from wayne state university, which is seeking to create an aggregator to search institutional repositories and eventually other sources as well.9 however, this lack of a central search tool can lead to confusion among researchers.10 nicholas and colleagues found that their sample of early career scholars drawn from several countries relied heavily on google and google scholar to find articles that interested them.11 many also turn to researchgate and other social media platforms and risk running afoul of copyright. the results of ithaka s+r’s 2015 survey of faculty in the united states reflect these findings to a certain extent, as variations exist between researchers in different disciplines.12 a majority of the respondents also indicated an affinity for freely accessible materials. as more researchers become aware of and gravitate toward oa options, the efficacy of various discovery tools, such as the browser extensions evaluated in this study, will become even more pertinent. previous studies on the findability of oa scholarship have focused primarily on google and google scholar.13 a few have assessed tools such as oaister, opendoar, and pubmed central.14 norris, oppenheim, and rowland sought a selection of articles using google, google scholar, oaister, and opendoar.15 while oaister and opendoar found just 14 percent of the articles’ open versions, google and google scholar combined managed to locate 86 percent. jamali and nabavi assessed google scholar’s ability to retrieve the full text of scholarly publications and documented the major sources of the full-text versions (publisher websites, institutional repositories, researchgate, etc.).16 google scholar was able to locate full-text versions of more than half (57.3 percent) of the items included in the study. most recently, martin-martin et al. likewise used google scholar to gauge the availability of oa documents across different disciplines.17 they found that roughly 54.6 percent of the scholarly content for which they searched was freely available, although only 23.1 percent of their sample were oa by virtue of the publisher. as of yet, no known studies have systematically evaluated the growing selection of open access tools’ efficiency and effectiveness at retrieving oa versions of articles. 
however, several scholars and journalists have reviewed these new tools, especially the more established open access button and unpaywall.18 these reviews were mostly positive, even as some acknowledged that the tools are not a wholescale solution for locating oa publications. despite pointing out these tools’ information technology and libraries | september 2019 84 limitations, reviewers voiced their hope that the oa finding tools could help disrupt the traditional scholarly publishing industry.19 at least one study has used the open access button to determine the green oa availability of journal articles. emery used the tool as the first step to identify oa article versions and then searched individual institutional repositories, followed by google scholar as the final steps.20 emery found that 22 percent of the study sample was available as green oa but did not say what portion of that was found by the open access button. emery did note that the open access button returned 17 false positives (six in which the tool took the user to the wrong article or other content, and 11 in which it took the user to a citation of the article with no full text available). she also found at least 38 cases of false-negative returns from the open access button, or articles that were openly available that the tool failed to find. the study did not count open versions found on researchgate or academia.edu. methodology oa finding tools this study compared the chrome browser extensions for google scholar and four oa finding tools: lazy scholar, unpaywall, open access button, and kopernio. each extension was used while in the chrome browser to search for open versions of the selected articles and the success of each extension in finding any free, full version was recorded. the authors did not track whether an article was licensed for reuse. for the four oa finding tools, the occurrences of false positives (e.g., the retrieval of an error page, a paywalled version, or the wrong article entirely) were also tracked. false positives were not tracked for google scholar, which does not purport to find only open versions of articles. data collection occurred over a six-week period in october and november 2018. the authors used web of science to identify the test articles (n=1,000) with the aim of selecting articles that would give the tools the best chance for finding a high number of open versions. articles selected were published in 2015 and 2016. these years were selected in order to try to avoid embargoes that might have prevented articles being made open through deposit. the articles were selected from two disciplines: applied physics and oncology, both of which have a large share in web of science and come from a broader discipline with a strong oa culture.21 each comparison began with searching the google scholar extension by article doi or title if a doi was not available. all versions retrieved by google scholar were examined until an open version was located or until the retrieved versions were exhausted. the remaining oa tools were then tested from the webpage for the article record on the journal’s website (if available). if no journal page was available, the article pdf page was tested. all data were recorded in a shared google sheet according to a data dictionary. searches for open versions of paywalled articles were performed away from the authors’ universities to ensure the institutions’ subscriptions to various journals did not impact the results. 
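the study gathered its data by operating each tool’s browser extension by hand, but the doi-based lookup that unpaywall performs can also be scripted against its public rest api. the sketch below is only an illustration of that kind of lookup, not part of the study’s method; the endpoint, email parameter, and response field names reflect the unpaywall api as generally documented and should be checked against the current documentation before use.

```python
# a minimal sketch of a scripted doi lookup against the unpaywall rest api.
# this is not how the study gathered its data (the authors worked through each
# tool's browser extension by hand); it only illustrates the doi-based lookup
# that unpaywall performs. the endpoint, email parameter, and field names are
# as generally documented and should be verified against the current api docs.
import requests

UNPAYWALL_API = "https://api.unpaywall.org/v2/{doi}"
CONTACT_EMAIL = "you@example.edu"  # unpaywall asks callers to identify themselves

def find_open_version(doi):
    """return a url for an open copy of the article, or None if none is known."""
    response = requests.get(
        UNPAYWALL_API.format(doi=doi),
        params={"email": CONTACT_EMAIL},
        timeout=30,
    )
    response.raise_for_status()
    record = response.json()
    location = record.get("best_oa_location")  # None when no open copy is known
    if record.get("is_oa") and location:
        return location.get("url_for_pdf") or location.get("url")
    return None

if __name__ == "__main__":
    # doi of an article cited in this issue, used here only as a test value
    print(find_open_version("10.1080/15228959.2011.623599"))
```

a batch of dois could be pre-screened with a function like this, subject to the same polite rate limits that constrained the manual searching described here.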
authors were limited in the number of articles they could search each day as some tools blocked continued use, presumably over concerns of illegitimate web activity, after as few as 15 searches. study limitations this methodology might have missed open versions of articles, even using these five search tools. although studies have found google scholar to be one of the most effective ways of searching for assessing the effectiveness of open access finding tools |auch schultz, azadbakht, et al. 85 https://doi.org/10.6017/ital.v38i3.11109 open versions, way has shown that it is not perfect.22 therefore, it is possible that this study undercounted the number of oa articles. the study tested the ability of oa finding tools to locate open articles from a journal’s main article page, not other possible webpages (e.g., the google scholar results page). this design may have limited the effectiveness of some tools, such as kopernio, which appear to work well with some webpages but not others. results overall, the tools found open versions for just less than half of the study sample (490), whereas they found no open versions for 510 articles. although lazy scholar, unpaywall, open access button, and kopernio all found open versions, google scholar returned the most with 462 articles (94 percent of all articles with at least one open version). open access button, lazy scholar, and unpaywall all found a majority of the open articles (62 percent, 73 percent, and 67 percent, respectively); however, kopernio found open versions for just 34 percent of the articles (see figure 1). figure 1. number of open versions found by each tool. it was most common for three or more of the tools to find an open version for an article, with just 48 found by two tools and 98 found by only one tool (see figure 2). information technology and libraries | september 2019 86 figure 2. number of articles where x number of oa finding tools found an open version. when looking at articles where only one tool returned an open version, google scholar had the highest results (84). open access button (4) and lazy scholar (10) also returned unique hits, but unpaywall and kopernio did not. open access button returned the most false positives with 46, or nearly 5 percent of all 1,000 articles. lazy scholar returned 31 false positives (3 percent), unpaywall returned 14 (1 percent), and kopernio returned 13 (1 percent). discussion the results for the oa search tools show that while all four options met with some success, none of them performed as well as google scholar. three of the tools—lazy scholar, open access button, and unpaywall—did find at least half or more of the open versions that google scholar did. it is important to note that open access button, which found the second fewest open versions, does not search researchgate and academia.edu because of legal concerns over article versions that are likely infringing copyright.23 this could have affected open access button’s performance. likewise, kopernio’s lower percentage of finding oa resources might relate to concerns over article versions as well. when creating an account on kopernio, the user is asked to affiliate themselves with an institution so that the tool can search existing library subscriptions at that institution. for this study, the authors did not affiliate with their home institutions when setting up kopernio to get a better idea of which content was open as opposed to content being accessible because of the tool connecting to a library’s subscription collection. 
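as a brief aside on tabulation: the figures above were tallied from the shared results sheet, and a short script along the following lines could produce the same per-tool summaries (open versions found, unique hits, false positives) from a csv export. the column names and cell values shown are hypothetical, since the study’s actual data dictionary is not reproduced in this article.

```python
# a minimal sketch of tallying the per-tool figures reported above from a csv
# export of the shared results sheet. column names ("google_scholar",
# "unpaywall", etc.) and cell values ("open", "none", "false_positive") are
# hypothetical; the study's actual data dictionary may differ.
import csv
from collections import Counter

TOOLS = ["google_scholar", "lazy_scholar", "open_access_button", "unpaywall", "kopernio"]

def summarize(path):
    hits, unique_hits, false_positives = Counter(), Counter(), Counter()
    total = 0
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            total += 1
            found_by = [tool for tool in TOOLS if row.get(tool) == "open"]
            for tool in found_by:
                hits[tool] += 1
            if len(found_by) == 1:  # an open version located by exactly one tool
                unique_hits[found_by[0]] += 1
            for tool in TOOLS:
                if row.get(tool) == "false_positive":
                    false_positives[tool] += 1
    for tool in TOOLS:
        print(f"{tool}: {hits[tool]} open versions, {unique_hits[tool]} unique hits, "
              f"{false_positives[tool]} false positives across {total} articles")

if __name__ == "__main__":
    summarize("oa_tool_results.csv")
```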
if the authors were to identify assessing the effectiveness of open access finding tools |auch schultz, azadbakht, et al. 87 https://doi.org/10.6017/ital.v38i3.11109 with an institution, the number of accessible articles would likely increase, but this access would not be a true representation of what open content is discoverable. in addition, some tools might work better with certain publishers than others. for instance, kopernio did not appear to work with spandidos publications, a leading biomedical science publisher that publishes much of its content as gold oa, meaning the entire journal is published as oa. kopernio found just one open version of a spandidos article, compared to 153 by google scholar. this could be an unintentional malfunction either with spandidos or kopernio, which if fixed, could greatly increase the efficacy of this finding tool. however, open access button, lazy scholar, unpaywall, and google were able to find oa publications from spandidos at similar rates (135, 138, and 139, respectively) with no false positives. while none of the tools performed as well as google scholar, some of the tools were easier to use compared to google scholar. google scholar does not automatically show an open version first; instead, users often have to first select the “all x versions” option at the bottom of each record and then open each version until they find an open version. lazy scholar and unpaywall appear (for the most part) automatically, meaning users can see right away if an open version is available and then click a button once to be taken to that version. although open access button and kopernio do not show automatically if they have found an open version, users need to click a button on their toolbar once to activate each tool and see if the tool was able to find an open version. open access button also provides the extra benefit of making it easy for users to email authors to make their works open if an open version is not already available. relying on lazy scholar, unpaywall, or open access button first causes users no harm, and they can always rely on google scholar as a backup. whether all four tools are needed is questionable. for instance, a few of the authors found kopernio difficult to work with as it seemed to be incompatible with at least one publisher’s website and it introduced extra steps in downloading a pdf file. the fact that it also returned by far the fewest open versions—just 36 percent of the ones google scholar found and no unique hits—does not argue well for users to include it in their oa finding toolbox. also, while lazy scholar, unpaywall, and open access button all performed better on their own, the authors wonder what improvements could be created by combining the resources of the individual tools. conclusion the growth of oa finding tools is encouraging to see as far as helping to make oa works more discoverable. although the study showed that google scholar uncovered more articles than any of the other tools, the utility of at least two of the tools—lazy scholar and open access button—can still be seen in that both found articles not discovered by the other tools, including google scholar. indeed, using the tools in conjunction with one another appears to be the best method. and although open access button found the second fewest articles, the tool’s effort to integrate with interlibrary loan and discovery workflows, as well as its concern about legal issues are all promising for its future. 
likewise, kopernio might be a better tool for those interested in combining access to a library collection—which likely has a large number of final, publisher versions of scholarship—with their search for openly available scholarship. future studies can include newer oa finding tools that have entered the market, as well as evaluate the user experience of the tools. another study can also look at how well open access button’s author email feature works. also, as open access button and unpaywall continue to move into new areas, such as interlibrary loan support, research could explore if these are more effective ways of connecting users to oa material as well as measure users’ understanding of oa versions they find. overall, the emergence of oa finding tools offers much potential for increasing the visibility of oa versions of scholarship, although no tool is perfect. however, if scholars wish to support oa through their research practices or find themselves unable to purchase or legally acquire the publisher's version, each of these tools can be valuable additions to their work.
data statement
the data used for this study has been shared publicly in the zenodo database under a cc-by 4.0 license at https://doi.org/10.5281/zenodo.2602200.
endnotes
1 jisc, “browse by country and region,” accessed february 15, 2019, http://v2.sherpa.ac.uk/view/repository_by_country/countries_by_region.html.
2 colby vorland, “extension,” accessed march 14, 2019, http://www.lazyscholar.org/; colby vorland, “data sources,” lazy scholar (blog), accessed march 14, 2019, http://www.lazyscholar.org/data-sources/.
3 “avoid paywalls, request research,” open access button, accessed march 14, 2019, https://openaccessbutton.org/.
4 unpaywall, “browser extension,” accessed march 14, 2019, https://unpaywall.org/products/extension.
5 kopernio, “faqs,” accessed march 14, 2019, https://kopernio.com/faq.
6 colby vorland, “features,” lazy scholar (blog), accessed march 14, 2019, http://www.lazyscholar.org/category/features/.
7 jeroen bosman et al., “the scholarly commons—principles and practices to guide research communication,” open science framework, september 15, 2017, https://doi.org/10.17605/osf.io/6c2xt.
8 chris bulock, “delivering open,” serials review 43, no. 3–4 (october 2, 2017): 268–70, https://doi.org/10.1080/00987913.2017.1385128.
9 elliot polak, email message to author, june 4, 2019.
10 bulock, “delivering open.”
11 david nicholas et al., “where and how early career researchers find scholarly information,” learned publishing 30, no. 1 (january 1, 2017): 19–29, https://doi.org/10.1002/leap.1087.
12 christine wolff, alisa b. rod, and roger c. schonfeld, “ithaka s+r us faculty survey 2015,” 2015, 83, https://sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/.
13 mamiko matsubayashi et al., “status of open access in the biomedical field in 2005,” journal of the medical library association 97, no. 1 (january 2009): 4–11, https://doi.org/10.3163/1536-5050.97.1.002; michael norris, charles oppenheim, and fytton rowland, “the citation advantage of open-access articles,” journal of the american society for information science and technology 59, no. 12 (october 1, 2008): 1963–72, https://doi.org/10.1002/asi.20898; doug way, “the open access availability of library and information science literature,” college & research libraries 71, no. 4 (2010): 302–09; charles lyons and h. austin booth, “an overview of open access in the fields of business and management,” journal of business & finance librarianship 16, no. 2 (march 31, 2011): 108–24, https://doi.org/10.1080/08963568.2011.554786; hamid r. jamali and majid nabavi, “open access and sources of full-text articles in google scholar in different subject fields,” scientometrics 105, no. 3 (december 1, 2015): 1635–51, https://doi.org/10.1007/s11192-015-1642-2; alberto martín-martín et al., “evidence of open access of scientific publications in google scholar: a large-scale analysis,” journal of informetrics 12, no. 3 (august 1, 2018): 819–41, https://doi.org/10.1016/j.joi.2018.06.012.
14 norris, oppenheim, and rowland, “the citation advantage of open-access articles”; michael norris, fytton rowland, and charles oppenheim, “finding open access articles using google, google scholar, oaister and opendoar,” online information review 32, no. 6 (november 21, 2008): 709–15, https://doi.org/10.1108/14684520810923881; maria-francisca abad‐garcía, aurora gonzález‐teruel, and javier gonzález‐llinares, “effectiveness of openaire, base, recolecta, and google scholar at finding spanish articles in repositories,” journal of the association for information science and technology 69, no. 4 (april 1, 2018): 619–22, https://doi.org/10.1002/asi.23975.
15 norris, rowland, and oppenheim, “finding open access articles using google, google scholar, oaister and opendoar.”
16 jamali and nabavi, “open access and sources of full-text articles in google scholar in different subject fields.”
17 martín-martín et al., “evidence of open access of scientific publications in google scholar.”
18 stephen curry, “push button for open access,” the guardian, november 18, 2013, sec. science, https://www.theguardian.com/science/2013/nov/18/open-access-button-push; bonnie swoger, “the open access button: discovering when and where researchers hit paywalls,” scientific american blog network, accessed may 30, 2017, https://blogs.scientificamerican.com/information-culture/the-open-access-button-discovering-when-and-where-researchers-hit-paywalls/; lindsay mckenzie, “how a browser extension could shake up academic publishing,” chronicle of higher education 68, no. 33 (april 21, 2017): a29–a29; joyce valenza, “unpaywall frees scholarly content,” school library journal 63, no. 5 (may 2017): 11–11; barbara quint, “must buy? maybe not,” information today 34, no. 5 (june 2017): 17–17;
michaela d. willi hooper, “product review: unpaywall [chrome & firefox browser extension],” journal of librarianship & scholarly communication 5 (january 2017): 1–3, https://doi.org/10.7710/2162-3309.2190; terry ballard, “two new services aim to improve access to scholarly pdfs,” information today 34, no. 9 (november 2017): cover-29; diana kwon, “a growing open access toolbox,” the scientist, accessed december 11, 2017, https://www.the-scientist.com/?articles.view/articleno/51048/title/a-growing-open-access-toolbox/; kent anderson, “the new plugins — what goals are the access solutions pursuing?,” the scholarly kitchen, august 23, 2018, https://scholarlykitchen.sspnet.org/2018/08/23/new-plugins-kopernio-unpaywall-pursuing/.
19 curry, “push button for open access”; swoger, “the open access button”; mckenzie, “how a browser extension could shake up academic publishing”; kwon, “a growing open access toolbox.”
20 jill emery, “how green is our valley?: five-year study of selected lis journals from taylor & francis for green deposit of articles,” insights 31, no. 0 (june 20, 2018): 23, https://doi.org/10.1629/uksg.406.
21 anna severin et al., “discipline-specific open access publishing practices and barriers to change: an evidence-based review,” f1000research 7 (december 11, 2018): 1925, https://doi.org/10.12688/f1000research.17328.1.
22 way, “the open access availability of library and information science literature.”
23 open access button, “open access button library service faqs,” google docs, accessed february 19, 2019, https://docs.google.com/document/d/1_hwkryg7qj7ff05-cx8kw40ml7exwrz6ks5fb10gegg/edit?usp=embed_facebook.
article
user experience testing in the open textbook adaptation workflow: a case study
camille thomas, kimberly vardeman, and jingjing wu
information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.12039
camille thomas (cthomas5@fsu.edu) is scholarly communications librarian, florida state university. kimberly vardeman (kimberly.vardeman@ttu.edu) is user experience librarian, texas tech university. jingjing wu (jingjing.wu@ttu.edu) is web librarian, texas tech university. © 2021.
abstract
as library publishers and open education programs grow, it is imperative that we integrate practices in our workflows that prioritize and include end users. although there is information available on best practices for user testing and accessibility compliance, more can be done to give insight into the library publishing context. this study examines the user and accessibility testing workflow during the modification of an existing open textbook using pressbooks at texas tech university.
introduction
as library publishers and open education programs grow, there is an opportunity to integrate into our workflows practices that prioritize and include end users. although there is information available on best practices for user testing and accessibility compliance, more can be done to give insight into the library publishing context. there are currently no case studies that examine the user and accessibility testing workflow during the modification of an existing open textbook. this study examines user experience testing as a method to improve oer interfaces, learning experience, and accessibility during the oer production process using pressbooks at a large research university.
literature review
user experience (ux) is a “momentary, primarily evaluative feeling (good–bad) while interacting with a product or service” that can go beyond simple usability evaluations to consider “qualities such as meaning, affect and value.”1 ux evaluations are generally applied to library websites, spaces, and interfaces and are not currently a common element in library publishing workflows. open educational resources (oer) are defined as teaching, learning, and research resources that reside under an intellectual property license that permits their free use and repurposing by others.2 whitfield and robinson make a distinction between teaching vs. learning resources, instructional vs. interface usability, and ease of modification for creators.3 this select literature review considers usability testing of e-books, oer workflows, and accessibility evaluations and how they apply to local contexts.
along with incentives for instructors to engage with oer, the ability to adapt oer is o ften highlighted as a benefit. walz shares common workflows for oer production, including broad steps for design and development during creation of original oer.4 in the case of reuse, the design stage in walz’s workflow includes review, redesign, redevelopment, and adoption. open university models for transforming oer include the integrity model, in which the new oer remains close to the original material; the essence model, in which material is transformed by mailto:cthomas5@fsu.edu mailto:kimberly.vardeman@ttu.edu mailto:jingjing.wu@ttu.edu information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 2 reducing some features and adding new activities for interactivity; and the remix model, in which content is redesigned to be optimal for web viewing.5 student participation in oer production is often seen in open pedagogy, but these cases look at student frustrations and feedback with the objective of experiential learning, not for usability or evaluation.6 now that oer production has grown in scale, librarians and advocates seek the most effective and sustainable workflows. figure 1. illustrations of two oer lifecycles. this work was adapted by camille thomas from an original by anita walz (2015) under a cc by 4.0 international license.7 in his workflow and framework analysis meinke recommends the inclusion of more discrete steps and believes each institution’s ideal workflow will be based on local context.8 usability testing is a discrete workflow step that gives us human-centered insight about how users are affected by interfaces and how they value systems.9 libraries favor collections-based assessment that measures how many end users are using digital items, without prioritizing who users are or how and why they use resources.10 demonstrating and assessing value is essential for scholar buy-in and content recruitment, for example, which are central to all types of open resources. in the case of educational materials, lack of engagement and breakdowns in learning can be attributed to barriers and marginalization of learners.11 additionally, critiques of oer include assumptions that access to information equates to meaningful access to knowledge, but withou t context there is no guarantee that there will be meaningful transference or learning.12 harley believes defining a target audience and considering reasons for use and nonuse of resources in specific educational contexts beyond measuring anecdotal demand (e.g., website page views or survey responses, which harley does not see as indicators of value but rather of popularity) may address challenges to effectively measuring outcomes for content that is freely available on the web.13 meaningful evaluation of learning resources requires deep understanding of contextualized information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 3 educator and student needs, not just content knowledge.14 to address these barriers and assumptions, openstax, a leading oer publisher, has ux experts on staff, but this model is exceptional and rare at a university or library. many universities and libraries publishing oer do not have full-time personnel dedicated to review. 
some library user experience departments have hired content strategists for auditing, analyzing, defining, and designing website content, contentrelated projects, and overall content strategy.15 currently, oer work is rarely included in the scope of library user experience departments. however, limited literature does show the use of ux research methods in library publishing contexts. libraries and support units with few resources can also perform user testing.16 user experience practitioners have established that a low number of test participants—three to five—are enough to identify a majority of usability issues.17 borsci et al. suggest the important aspect is not securing a high-volume sample, but rather finding behavior diversity to make reliable evaluations.18 the number of users required to find diverse behavior can depend on what is being tested. following this standard, the consistent inclusion of user evaluations in oer workflows will not necessarily require large amounts of funding, participants, resources, or time. oer, in particular, are well suited to cumulative, early, frequent, and specific user testing. with their open copyright licensing and availability, oer are an example of the mutable digital content needed for collaboration, cumulative production, and support of networked communities. 19 several studies assert that complete information behavior analysis should be carried out before or during development, not after.20 meinke concludes his workflow analysis by encouraging iterative release in oer production workflows, which aligns with lean and iterative “guerilla” approaches used in libraries to sustainably improve usability.21 iteration is a process of development that involves cyclical inquiry or repetitive testing, providing multiple opportunities for people to revisit ideas or deliverables in order to improve an evolving design until the end goal is reached.22 in the context of design, it is a method of creating high-quality deliverables at a faster rate.23 a cyclical approach also reflects walz’s as well as blakiston and mayden’s workflow visualizations.24 walz asserts that incentives for instructors and the quality of the resources are key factors in advancing adoption, adaptation, and creation of oer.25 harley uncovered disconnects between what faculty say they need in undergraduate education and what those who produce digital educational resources imagine is an ideal state. 26 influence on faculty resource use, including oer, varied by discipline, teaching style and philosophy, and digital literacy levels, with personal preferences having the most influence on decision-making. in the evaluation or tracking stage found in most oer production workflows, we can see the impact of the quality assurance stage. the study by woodward et al. on student voice in evaluating textbooks found that incorporating multiple stakeholders into the process resulted in deeper exploration of students’ expectations when learning. students ranked one oer and one conventional textbook the highest based on content, design, pedagogy, and cost. multiauthored options ranked higher, and texts with examples were seen as more beneficial for distance learners.27 meinke believes unless discrete parts of the development process are identified, it is not useful to signal others to contribute to a project.28 an example of an oer production workflow containing usability considerations is the content, openness, reuse & repurpose, and evidence (corre) framework by the university of leicester (see fig. 
2).29 the openness phase of the corre 2.0 framework includes “transformation for usability,” which is assigned to the oer evaluator, editorial board, or head of department.30 versions of the corre workflow were adapted by the information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 4 university of bath and the university of derby in the united kingdom for their institutional contexts. for example, the university of derby assigned “transformation for usability” to a developer. by building usability as a discrete step in oer production workflows, publishers and collaborators can make improvements on pain points, make changes in context, and create clear guidelines for partnerships based on local needs. betz and hall’s study supports considerations for how user microinteractions, or individual actions within a process, can be improved to make them scalable and commonplace in library workflows.31 this can include publishing workflows. for example, a study of oer on mobile devices in brazil found problems related to performance, text legibility, and trigger actions in small objects.32 other guidelines for oer and usability include using high-quality multimedia source material, advanced support from educational technologists, and balancing high and low technology in order to avoid assumptions about learners’ internet connection or devices.33 although usability testing alone is an important part of evaluating a website or product, because the user experience is multifaceted, it is also important to ensure that the product is accessible, meets user needs, and has an appealing design.34 figure 2. corre framework for oer development at the university of leicester.35 accessibility studies also encourage integrating user interactions into the creation workflow. accessibility impacts usability, findability, and holistic user experience. 36 creators and supporting advocates have relied on universal design, web standards, and ada compliance when creating information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 5 accessible digital content, emphasizing that accessible content for those with disabilities means more accessible content for all users.37 areas addressed can include text style and formatting, linking, and alternative text considerations, as detailed in the bccampus open education accessibility toolkit.38 for example, kourbetis and boukouras drew from universal design to create a bilingual oer for students with hearing impairments in greece, incorporating contextual considerations for vernacular languages and other local user needs.39 early efforts toward accessible oer, such as a 2010 framework, prompted critiques from members of the accessibility community and impeded adoption.40 while guides based in universal design offer a starting place and consistent reference, oer advocates could create workflows that support adaptive changes seen in inclusive design. universal design and web standards are fixed, while inclusive design seeks to support adaptive changes as needs evolve and does not treat non normative users as a homogenous, segregated group.41 treviranus et al. 
go on to state that compliance is not achieved by providing a set of rules, guidelines, or regulations to be followed.42 beyond lack of awareness of accessibility best practices, librarians and creators tend to have little control over customizing proprietary digital content platforms to add local context. 43 the flexible learning for open education (floe) project, for example, aims to integrate automatic and unconscious inclusive oer design through open source tools, but many institutions may not be able to develop such tools to incorporate local contexts.44 both librarians and e-resources vendors have been interested in the features and usability of ebooks to fine-tune their collection development strategies as well as improve the user experience of their platforms. literature shows that most studies about e-books have focused on features or the interface design of e-book reading applications. the recent academic student ebook experience survey showed that three-quarters of survey attendees considered it extremely important or important for e-books to have page numbers for citation purposes.45 this survey and other studies suggest that search, navigation, zoom, annotation, mobile compatibility, as well as offline availability including downloading, printing, and saving, were the most expected features.46 other features, such as citation generation and emailing, were mentioned or tested in some research.47 while using e-books and using e-textbooks may involve the same functionality, the purpose, devices, and user types differ because knowledge transfer is needed in learning. jardina and chaparro evaluated eight e-textbook applications with four tasks: bookmarking, note-taking, note locating, and word searching.48 they found that the interfaces to these common features varied on the different applications. standardization, or at least following general web convention when designing these interfaces, may reduce distractions that keep students from learning. the etextbook user interface can be critical to the future success of e-textbook adoption. although limited research on usability of e-textbooks or open textbooks has been conducted, a considerable number of findings from studies on e-books are relevant and applicable to etextbook projects. the e-book or e-textbook applications usability evaluation methods and results can be borrowed when understanding oer user needs. libraries can apply these e-book usability evaluations to the basic infrastructure of oer, but leverage the local contexts of students, instructors, and institutional culture when adapting the material. the more normalized usability, prototyping, and collaboration are in oer production workflows, the richer the resources and community investment. this approach can address diverse and evolving oer user needs, locally and sustainably, as they arise. our study contributes to the information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 6 literature by examining the impacts of integrating usability testing in an inaugural oer adaptation project at a large research university in the united states. case study the project to adapt the open textbook college success, published by the university of minnesota libraries, for use in the raiderready first-year seminar course, was brought to the texas tech university libraries’ scholarly publishing office by the head of the outreach department in march 2018. 
the program was deciding between a commercial textbook and the adapted open textbook. the course was offered during each fall semester and had an average enrollment of over 1,600 since 2016. an initial meeting took place and regular weekly meetings were set up afterward to review edits and ensure communication within the original 30-day timeline. it was the first oer production project for the libraries’ publishing program, which had previously focused on open access journals and materials in the repository. originally, we sought to use open monograph press because we had a local instance already installed. however, a platform with more robust formatting capabilities was needed in order to reach the desired product within the timeline. we decided to use the pressbooks pro (a wordpress plugin) sandbox for one project through our open textbook network membership. a rough draft of edits to the original work were already completed. we used a mediated service model, in which librarians performed the formatting, quality assurance, and publishing. this was in contrast to self-service models in which creators work independently and consult with support specialists. the digital publishing librarian and scholarly communication librarian formatted the edits, with html/css customization and platform troubleshooting from the web librarian. other library staff involved in the project included communications & marketing (cover design), the user experience librarian, and the electronic resources librarian (cataloging in the catalog). campus stakeholders and partners included the libraries, the raiderready program, editors, copytech (printers affiliated with the university), the campus bookstore, and the worldwide elearning unit. program partners were enthusiastic about usability and accessibility testing for the textbook. the initial testing took place in the middle of the adaptation project timeline, once initial content was formatted and ready for testing. the bccampus’ accessibility toolkit and the pressbooks user guide were used as primary guides throughout the process. the scholarly publishing librarian and the user experience librarian met to develop the testing method and identify users who would reflect the audience using the textbook. a second round of tests was conducted a year after the initial project when the editors made updates to the text. while the resulting changes were minor, this further testing allowed us to seek more feedback on the most recent version of the textbook and apply some lessons learned from the first round of testing. we did not use personas or identify user needs beforehand. we planned to recruit first-year students and students who took the raiderready course in a previous semester. however, we decided to instead recruit from existing pools of student volunteers for library usability tests in order to get three to six students in a short amount of time. for the second round of testing, we planned to recruit on-campus students, distance students, and students with diverse abilities. we recruited from newly established pools of volunteers for distance students as well as existing volunteer pools. during the first iteration, we requested that worldwide elearning, texas tech university’s distance learning unit on campus, test the textbook pilot content in pdf and epub formats using screen reader software. 
information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 7 the user experience librarian conducted a first round of four usability tests in march 2018 and a second round of two usability tests in april 2019. a sample test script from steve krug provided a solid foundation for conducting our own tests.49 in each test, participants were asked to answer two pretest questions, complete four tasks, and answer four posttest questions (see appendix for script). tasks included finding the textbook, exploring the textbook itself, locating activities for a specific chapter, and searching for the student code of conduct. participants were instructed to think aloud as they worked through the tasks. the think-aloud protocol is a commonly used test method, where participants are asked to verbalize their thoughts and actions as they perform tasks on a website. the observation tasks are set beforehand, and the facilitator follows a script to ensure consistency among testers.50 the combination of observing and recording participants’ comments and reactions provides insight into a site’s usability. testers were invited to comment on their experience at the end of the session. each usability test was recorded using morae software to track on-screen activity such as mouse movements and clicks, typing, and the verbal comments of the facilitator and participant. we conducted tests using a laptop running windows 10 with a 15.6-inch display. in the first round of testing, we also showed students the book on an ipad mini, both in adobe reader and ibooks. while we asked them to briefly view the textbooks, we did not ask them to complete specific tasks while using the tablet. limitations the biggest limitation was that we did not test on users using a screen reader or other assistive technology. the user experience librarian built a pool of on-campus students who volunteered to participate in user research in 2018, and relationships with a pool of distance student users was established in 2019. however, a pool of other types of non-normative learners had not yet been established for either round of testing. another limitation of the study was that we primarily tested on campus servers, so we do not have data on rural or distance learner experiences with the textbook until the second round of testing. in addition, we used only a few devices, a windows 10 laptop for formal testing and an ipad students briefly viewed afterward. we also did not have an educational technologist as a partner throughout the process. results once testing was complete, the scholarly communications librarian and the user experience librarian analyzed the notes and identified areas of common concern and confusion among participants. all participants were familiar with online textbooks from other courses. participants cited cost as a major consideration when deciding between purchasing print or electronic texts. more than one participant said that electronic textbooks can be cheaper but can be more frustrating to use. participants had more experience viewing textbooks on laptops. the ability to download texts for reading on a phone was not always available due to publisher restrictions. content and navigation participants liked pictures and visuals to break up the blocks of text. however, one participant expressed a dislike for too many slideshows or other media. 
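before turning to the results, it is worth noting that the session script and recording setup described above lend themselves to a small, structured observation log so that notes stay comparable across participants and rounds. the sketch below is one possible shape for such a log; the field names and task list are illustrative (loosely following the script in the appendix), not the authors’ actual notes instrument.

```python
# a minimal sketch of a per-session observation log a facilitator could fill in
# while the screen recorder captures the session. the fields and task list are
# illustrative, loosely following the appendix script, and are not the study's
# actual data-collection form.
from dataclasses import dataclass, field
from typing import List

TASKS = [
    "find the digital textbook",
    "explore the textbook",
    "locate the chapter 1 activities",
    "find the student code of conduct",
]

@dataclass
class TaskObservation:
    task: str
    completed: bool
    path_taken: str          # e.g. "library catalog -> repository record -> pdf"
    quotes: List[str] = field(default_factory=list)  # notable think-aloud comments

@dataclass
class Session:
    participant_id: str
    device: str              # e.g. "windows 10 laptop" or "ipad mini"
    observations: List[TaskObservation] = field(default_factory=list)

    def completion_rate(self) -> float:
        """share of assigned tasks the participant completed in this session."""
        done = sum(1 for obs in self.observations if obs.completed)
        return done / len(self.observations) if self.observations else 0.0
```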
another in the first round of testing liked that there were not “too many” links that brought you out of the textbook, stating it was “annoying to split screen in order to see text plus activity/homework assignment.” in the second round of testing, one participant felt the lack of interactive content was best for the first-year students compared to videos and activities in textbooks for advanced courses. that participant also thought the simpler language of the text was more welcoming to first-year students. a information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 8 participant said an ipad would be better than a laptop for viewing this book, because scrolling was easier. several users did not find features such as bookmarked sections in the in-browser pdf viewers or adobe reader. participants who did not see or use the table of contents (toc) continually relied on scrolling through pages to locate content. only one participant, unprompted, used the ctrl+f shortcut to keyword search the text. a few other participants viewed the toc, then entered the desired page number in the top toolbar navigation field. most of them expected the code of conduct information in one of the tasks to be in the front matter. the emphasis on content reflects blakiston and mayden’s experience that without a content strategy, it becomes difficult to search and to demonstrate credibility, and it is a challenge to create a coherent, user-centered information architecture.51 all participants navigated to the toc several times to complete tasks, making it a relevant feature. in the second round of testing, one participant preferred the statements and questions at the beginning of the first chapter to learning objectives typically listed in textbooks. discovery and access participants took varied approaches to finding textbooks. one would get links from the professor via email or the syllabus. others would use the campus bookstore for purchases. one would use the student information system (raiderlink) to locate information about the textbook. potential access points to make the raiderready textbook discoverable included the institutional repository, the open textbook library, and the local library catalog. the open textbook library was ruled out mostly due to campus-specific adaptations, which were not more substantial for public use than the original college success. thinktech, the institutional repository, was the most viable option and allowed for permanent linking, which worked well with the access points student users mentioned. in the second round of testing, one participant searched for the textbook via the library catalog/discovery system, google search, and the raiderready department website. the course description on the department website listed an open textbook, but the user pointed out that it was not actually linked there. discussion user testing changed our actions during the project. interactions with students did not occur during any other stage of the adaptation process before the resource was adopted in the course. many insights from the testing were indicative of self-reported preferences such as requesting more visuals, preferring print for reading and exercises, and auditory screen reading. we also learned ways that cost impacted how students used textbooks. 
for example, when we followed up on a participant’s comment and asked if they liked to highlight books, the student responded that they try not to mark their books because they want to resell them. testing also helped us observe actual behaviors among similar users in a way oer toolkits and guidelines alone did not. we learned more about how oer fits into the culture of learning and resources at texas tech university and how that may differ from other institutions. for a visual representation of our workflow, we adapted billy meinke’s oer production workflow (targeted to creators) because it was an openly available, editable workflow with comprehensive discrete steps. similar to the corre framework adaptations, meinke’s workflow was adapted by others, including the southern alberta institute of technology (sait), lansing community college, and the university of houston, to fit their institutional contexts.52 our process did not include an external peer review process; instead review was done by the editors. priming and preproduction information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 9 phases in the workflow were relatively quick, occurring in the first two weeks. the bulk of time— about four weeks—was spent in the development phase. the quality assurance and publishing phases occurred for about two weeks, with most of the time spent on finalizing edits and formatting. the first round of user testing took about two hours total and redux (revisiting the prototype and implementing changes), along with the format finalization, took about two weeks. finalization for formatting and redux changes in the first iteration of the text involved pressbooks troubleshooting. the original timeline for the project was 30 days, but the actual time for the project was 60 days. the second round of user testing took about two hours total and occurred at the halfway point within a new 60-day deadline for an updated version of the textbook. we acknowledge that even though the actual time spent with users in the first round was limited to two hours, the process also required time for drafting recruitment emails, communicating with volunteers, scheduling testing, and debriefing after sessions. figure 3 shows our workflow diagram, including a new quality assurance phase (see fig. 4 for detail) based on our case study. it includes prototyping (content and format draft), user testing, and implementing user feedback on the oer prototype. figure 3. discrete production workflow including quality assurance phase. this workflow is an adaptation of a workflow by billy meinke and university of hawai’i at manoa under a cc-by license. information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 10 figure 4. quality assurance phase with user testing. we addressed several suggestions participants made during testing to improve the textbook’s navigation functionality. we were able to address requests for a linked toc in the second round of testing. in the first round of testing, formatting was tailored to print pdf format because the editors wanted a print version to be available. we were able to create a linked toc in the digital pdf format, but not the printable pdf format. 
we were not aware that the toc could be changed based on available documentation in the first round of testing, but we were able to successfully troubleshoot this issue in the second round of tests. we were not able to do any customization on the search feature, which was built in. for customization, pressbooks allows styling through css files for three formats (pdf, web page, epub) separately. we customized them for look and feel. many of the requests were constrained by our working knowledge of and the technical limits of pressbooks, so we added a tips for pdf readers and printing section in the front matter of the textbook during the first round. it is important to note that although these were not major changes to the interface, they gave us insight for iterative changes. upon reflection, it would have been information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 11 preferable to involve someone with pressbooks developer experience at the outset. because we had not led a similar project before, nor worked with the software previously, we were more limited on the changes we could make as a result of testing than we expected. however, after this experience, we know what areas to test and are better prepared to effect actual changes. we made chapters available separately in the institutional repository to cut down on excessive scrolling, because scrolling through an entire textbook slowed students’ ability to study and quickly reference the text. also, the editors requested digital as well as print access to the textbook through the campus bookstore. the raiderready textbook was also added to the library’s local records in the catalog. we did not make a web version available through pressbooks. a web version was not a priority because of the institution-specific customization and because the editors did not request one. usage statistics from the repository between march 2018 and february 2019 peaked during midterms and at the end of semesters in which the class was taught. chapters 5, 6, 7, and 8 had the most downloads—the last chapters of the book, likely the chapters students were tested on for the final—with the majority of downloads (1,368) taking place during october 2018. this indicates that the option to download individual chapters appealed to students. accessibility testing the textbook with screen readers confirmed the need for an epub format of the text. hyde gives the following guidance for educators using pressbooks: “pdf accessibility is known to be somewhat limited, and in particular some files are not able to be interpreted by screen readers. the pdf conversion tool that pressbooks uses does support pdf tagging, which improves screen reader compatibility, but often the best solution is to offer web and e-book versions of a textbook as well as the pdf, so readers can find the format that best suits their needs.”53 for pdfs, issues included lack of alt tags, headings not set, and tables and images lacking tags. adding alt tags was planned early, after they were lost when uploading the wxr (wordpress extended rss) file—a wordpress export file in xml format—in pressbooks, and loss of the alt tags was confirmed during testing at the midpoint of the process. however, due to deadlines and pressbooks functionality, we were not able to address more of the tagging issues. epubs worked much better in tests with screen readers, apple devices, and e-readers. 
editors preferred that a pdf be used as the primary version and wanted an epub for screen readers upon request. our partners’ preference was likely based on the common use of pdfs, but it did not comply with the principles of universal or inclusive design. regarding e-book accessibility, pressbooks documentation says, “ebook accessibility is somewhat dictated by the file format standards, which focus on font sizes and screen readers, and improvements are also being made with dynamic content. the international digital publishing forum has a checklist to prompt ebook creators on accessibility functions they can incorporate while creating their content.”54 we made a decision to include multiple formats to take multiple types of use into consideration. in the first round of changes, we included an epub alongside the pdf in the repository, so users with disabilities would not have to self-identify by making a request in order to gain access. upon learning more about inclusive design after the pilot, we realized we were treating users as a homogeneous group and segregating the more accessible version. in the second round, when we realized the epub was not available by separate chapters as was the pdf version, we then made it available by chapter as well. we recommend that evaluating oer according to the international digital publishing forum checklist be incorporated into the qa part of the workflow. conclusion there is room for future research on iterative testing for oer, testing with more emphasis on mobile devices, testing with deeper investigation into microinteractions concerning accessibility, and testing in workflows that use other publishing platforms. as the creators of the floe project suggest, many more customizations can be made to points of user interaction if the software platform for adaptation is open source. future research may also examine regional and cultural influence on learning and interface preferences. one change that may support future adaptation projects at texas tech university would be modifying internal guidelines to take into consideration previous testing and local context. we also recommend keeping detailed documentation, particularly of steps for changes that are not included in existing guides on oer production. creating a memorandum of understanding with partners that clearly outlined responsibilities could have prevented some of the misunderstandings that occurred. for example, when stakeholders discussed producing print copies of the textbook, it wasn’t clear what the library’s role was. with a short timeline and more work involved than expected, the library was in a position of overpromising and underdelivering. it was apparent that the workflows themselves needed to be open and adaptable to support resources, communities, and processes in local contexts. it was important throughout the process to be aware of our partners’ priorities (e.g., instructional preferences, cost to students, departmental buy-in), because we had to balance these priorities with user feedback. we recommend having specific roles for content strategists, educational technologists, and developers in workflows during oer production. the work of creating workflows, assigning roles, and creating standards for oer content currently falls on librarians, instructional designers, and creators.
as librarians seek the most sustainable workflows, it will be beneficial to emphasize investing in the quality assurance stages of oer production and evenly distributing responsibilities. this can be done through collaborative partnerships or by hiring additional positions. if other institutions were to scale the practices from our case study, ideally, librarians would take responsibility for adding roles or formalized work to the scope of either ux or oer departments so that it becomes normalized in oer workflows. we recommend working with editors to advocate for one textbook format that addresses a variety of learning needs. we plan to use these experiences, along with existing resources, to include inclusive and user-friendly recommendations in policies and guidelines for oer adaptation. conducting user testing challenged librarians’ and editing instructors’ assumptions about student use of oer. while we referred to toolkits, guidelines, and best practices, internal testing allowed us to make improvements to several specific microinteractions students encountered while using the text. it was very feasible to incorporate testing into the workflow. we were able to directly observe user information behavior from members of the community that the resource was intended to serve.
appendix: usability test method
pretest questions
1. what is your academic classification? (undergraduate, graduate, faculty)
2. have you ever used an e-textbook or a digital textbook in one of your classes? (if yes, ask for course details.)
tasks to observe
1. imagine you needed to get a copy of the digital textbook raider ready: unmasking the possibilities of college success. how would you go about finding it? it will help us if you think out loud as you go along—tell us what you’re looking at, what you’re trying to do, what you’re thinking.
2. [if the tester is unable to locate the digital textbook, the moderator will open it.] please take a couple of minutes to look at this textbook. explore and click on a link or two.
3. for the next task, imagine an instructor asked you to locate the chapter activities for chapter 1. could you show us how you would locate those?
4. for the final task, could you find the student code of conduct?
posttest questions
1. what were your impressions of this resource?
2. what did you like? dislike? what would you change?
3. how easy or difficult was it to find what you wanted? please explain.
4. is there anything else about your experience using this textbook today that you’d like to tell us?
endnotes
1 sonya betz and robyn hall, “self-archiving with ease in an institutional repository: microinteractions and the user experience,” information technology and libraries 34, no. 3 (september 21, 2015): 44–45, https://doi.org/10.6017/ital.v34i3.5900.
2 anita r. walz, “open and editable: exploring library engagement in open educational resource adoption, adaptation and authoring,” virginia libraries 61 (january 2015): 23, http://hdl.handle.net/10919/52377.
3 stephen whitfield and zoe robinson, “open educational resources: the challenges of ‘usability’ and copyright clearance,” planet 25, no. 1 (2012): 52, https://doi.org/10.11120/plan.2012.00250051.
4 walz, “open and editable,” 24.
5 andy lane, “from pillar to post: exploring the issues involved in repurposing distance learning materials for use as open educational resources” (working paper, uk open university, december 2006), accessed august 1, 2018, http://kn.open.ac.uk/public/document.cfm?docid=9724.
6 andy arana et al., eds., “open logic project,” university of calgary faculty of arts and the campus alberta oer initiative, accessed april 26, 2019, http://openlogicproject.org/; robin derosa, the open anthology of earlier american literature (public commons publishing, 2015), https://openamlit.pressbooks.com/; timothy robbins, “case study: expanding the open anthology of earlier american literature,” in a guide to making open textbooks with students, ed. elizabeth mays (the rebus community for open textbook creation, 2017), https://press.rebus.community/makingopentextbookswithstudents/chapter/case-study-expanding-open-anthology-of-earlier-american-literature/.
7 walz, “open and editable,” 24.
8 billy meinke, “discovering oer production workflows,” uh oer (blog), university of hawai’i, december 23, 2016, https://oer.hawaii.edu/discovering-oer-production-workflows/.
9 betz and hall, “self-archiving with ease,” 44.
10 beth st. jean et al., “unheard voices: institutional repository end-users,” college & research libraries 72, no. 1 (january 2011): 23, https://doi.org/10.5860/crl-71r1.
11 jutta treviranus et al., “an introduction to the floe project,” in international conference on universal access in human-computer interaction, universal access to information and knowledge, ed. constantine stephanidis and margherita antona, uahci 2014 (june 2014), lecture notes in computer science 8514: 454, https://doi.org/10.1007/978-3-319-07440-5_42.
12 sarah crissinger, “a critical take on oer practices: interrogating commercialization, colonialism, and content,” in the library with the lead pipe, october 21, 2015, http://www.inthelibrarywiththeleadpipe.org/2015/a-critical-take-on-oer-practices-interrogating-commercialization-colonialism-and-content/; diane harley, “why
understanding the use and users of open education matters,” in opening up education: the collective advancement of education through open technology, open content, and open knowledge, ed. toru iiyoshi and m.s. vijay kumar (cambridge, ma: the mit press, 2008), 197–212.
13 harley, “why understanding,” 208.
14 tom carey and gerard l. hanley, “extending the impact of open educational resources through alignment with pedagogical content knowledge and institutional strategy: lessons learned from the merlot community experience,” in opening up education: the collective advancement of education through open technology, open content, and open knowledge, ed. toru iiyoshi and m.s. vijay kumar (cambridge, ma: the mit press, 2008), 238.
15 rebecca blakiston and shoshana mayden, “how we hired a content strategist (and why you should too),” journal of web librarianship 9, no. 4 (2015): 202–6, https://doi.org/10.1080/19322909.2015.1105730; “our team,” openstax, rice university, accessed december 9, 2019, https://openstax.org/team.
16 maria nuccilli, elliot polak, and alex binno, “start with an hour a week: enhancing usability at wayne state university libraries,” weave: journal of library user experience 1, no. 8 (2018), https://doi.org/10.3998/weave.12535642.0001.803.
17 jakob nielsen and thomas k. landauer, “a mathematical model of the finding of usability problems,” in proceedings of the interact’93 and chi’93 conference on human factors in computing systems (may 1993): 211–12, https://doi.org/10.1145/169059.169166.
18 simone borsci et al., “reviewing and extending the five-user assumption: a grounded procedure for interaction evaluation,” in acm transactions on computer-human interaction 20, no. 5, article 29 (november 2013), 18–19, http://delivery.acm.org/10.1145/2510000/2506210/a29-borsci.pdf.
19 treviranus et al., “floe project,” 454.
20 laura icela gonzález-pérez, maría-soledad ramírez-montoya, and francisco j. garcía-peñalvo, “user experience in institutional repositories: a systematic literature review,” international journal of human capital and information technology professionals 9, no. 1 (january–march 2018): 79, 84, https://doi.org/10.4018/ijhcitp.2018010105; betz and hall, “self-archiving with ease,” 45; st. jean et al., “unheard voices,” 23, 36–37, 40.
21 meinke, “discovering oer production workflows”; nuccilli, polak, and binno, “start with an hour.”
22 steven d. eppinger, murthy v. nukala, and daniel e. whitney, “generalised models of design iteration using signal flow graphs,” research in engineering design 9, no. 2 (1997): 112; helen timperley et al., teacher professional learning and development (wellington, new zealand: ministry of education, 2007), http://www.oecd.org/education/school/48727127.pdf.
23 eppinger, nukala, and whitney, “design iteration,” 112–13.
https://doi.org/10.1080/19322909.2015.1105730 https://openstax.org/team https://doi.org/10.3998/weave.12535642.0001.803 https://doi.org/10.1145/169059.169166 http://delivery.acm.org/10.1145/2510000/2506210/a29-borsci.pdf https://doi.org/10.4018/ijhcitp.2018010105 http://www.oecd.org/education/school/48727127.pdf information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 16 24 walz, “open and editable,” 23; blakiston and mayden, “how we hired a content strategist,” 203. 25 walz, “open and editable,” 28. 26 harley, “why understanding,” 201–6. 27 scott woodward, adam lloyd, and royce kimmons, “student voice in textbook evaluation: comparing open and restricted textbooks,” international review of research in open and distributed learning 18, no. 6 (september 2017), 150–63, https://doi.org/10.19173/irrodl.v18i6.3170. 28 meinke, “discovering oer production workflows.” 29 samuel k. nikoi et al., “corre: a framework for evaluating and transforming teaching materials into open educational resources,” open learning: the journal of open, distance and e-learning 26, no. 3 (2011), 194–99, https://doi.org/10.1080/02680513.2011.611681. 30 “corre 2.0,” institute of learning innovation, university of leicester, accessed april 25, 2019, https://www2.le.ac.uk/departments/beyond-distance-researchalliance/projects/ostrich/corre-2.0. 31 betz and hall, “self-archiving with ease,” 45–46. 32 andré constantino da silva et al., “portability and usability of open educational resources on mobile devices: a study in the context of brazilian educational portals and android-based devices” (paper, international conference on mobile learning 2014, madrid, spain, february 28–march 2, 2014), 198, https://eric.ed.gov/?id=ed557248. 33 sarah morehouse, “oer bootcamp 3-3: oers and usability,” youtube video, 3:16, march 2, 2018, https://www.youtube.com/watch?v=cncxbcs-2gm. 34 krista godfrey, “creating a culture of usability,” weave: journal of library user experience 1, no. 3 (2015), https://doi.org/10.3998/weave.12535642.0001.301; peter morville, “user experience design,” semantic studios, june 21, 2004, http://semanticstudios.com/user_experience_design/. 35 meinke, “discovering oer production workflows.” 36 cynthia ng, “a practical guide to improving web accessibility,” weave: journal of library user experience 1, no. 7 (2017), https://doi.org/10.3998/weave.12535642.0001.701; whitney quesenbery, “usable accessibility: making web sites work well for people with disabilities,” ux matters, february 23, 2009, http://www.uxmatters.com/mt/archives/2009/02/usableaccessibility-making-web-sites-work-well-for-people-with-disabilities.php. 37 ng, “improving web accessibility.” 38 amanda coolidge et al., accessibility toolkit 2nd edition (victoria, b.c.: bccampus, 2018), 1–71, https://opentextbc.ca/accessibilitytoolkit/. 
https://doi.org/10.19173/irrodl.v18i6.3170 https://doi.org/10.1080/02680513.2011.611681 https://www2.le.ac.uk/departments/beyond-distance-research-alliance/projects/ostrich/corre-2.0 https://www2.le.ac.uk/departments/beyond-distance-research-alliance/projects/ostrich/corre-2.0 https://eric.ed.gov/?id=ed557248 https://www.youtube.com/watch?v=cncxbcs-2gm https://doi.org/10.3998/weave.12535642.0001.301 http://semanticstudios.com/user_experience_design/ https://doi.org/10.3998/weave.12535642.0001.701 http://www.uxmatters.com/mt/archives/2009/02/usable-accessibility-making-web-sites-work-well-for-people-with-disabilities.php http://www.uxmatters.com/mt/archives/2009/02/usable-accessibility-making-web-sites-work-well-for-people-with-disabilities.php https://opentextbc.ca/accessibilitytoolkit/ information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 17 39 vassilis kourbetis and konstantinos boukouras, “accessible open educational resources for students with disabilities in greece,” in universal access in human-computer interaction, universal access to information and knowledge, ed. constantine stephanidis and margherita antona, uahci 2014 (june 2014), lecture notes in computer science 8514: 349–57, https://doi.org/10.1007/978-3-319-07440-5_32. 40 treviranus et al., “floe project,” 455–56. 41 treviranus et al., 456–57. 42 treviranus et al., 456–57. 43 ng, “improving web accessibility”; treviranus et al., “floe project,” 460–61. 44 treviranus et al., “floe project,” 461. 45 2018 academic student ebook experience survey report (library journal research, 2018): 6, accessed may 3, 2019, https://mediasource.formstack.com/forms/2018_academic_student_ebook_experience_survey _report. 46 michael gorrell, “the ebook user experience in an integrated research platform,” against the grain 23, no. 5 (december 2014): 38; robert slater, “why aren’t e-books gaining more ground in academic libraries? e-book use and perceptions: a review of published literature and research,” journal of web librarianship 4, no. 4 (2010): 305–31; joelle thomas and galadriel chilton, “library e-book platforms are broken: let’s fix them,” academic e-books: publishers, librarians, and users (2016): 249–62; christina mune and ann agee, “ebook showdown: evaluating academic ebook platforms from a user perspective,” in creating sustainable community: the proceedings of the acrl 2015 conference (2015): 25–28; laura muir and graeme hawes, “the case for e-book literacy: undergraduate students’ experience with ebooks for course work,” the journal of academic librarianship 39, no. 3 (2013): 260–74; esta tovstiadi, natalia tingle, and gabrielle wiersma, “academic e-book usability from the student’s perspective,” evidence based library and information practice 13, no. 4 (2018): 70– 87. 47 erin dorris cassidy, michelle martinez, and lisa shen, “not in love, or not in the know? graduate student and faculty use (and non-use) of e-books,” the journal of academic librarianship 38, no. 6 (2012): 326–32; gorrell, “the ebook user experience,” 36–40. 48 jo r. jardina and barbara s. chaparro, “investigating the usability of e-textbooks using the technique for human error assessment,” journal of usability studies 10, no. 4 (2015): 140–59. 49 steve krug, rocket surgery made easy (berkeley, ca: new riders, 2010), 146–53. 50 danielle a. 
becker and lauren yannotta, “modeling a library website redesign process: developing a user-centered website through usability testing,” information technology and libraries 32, no. 1 (march 2013): 9–10.
51 blakiston and mayden, “how we hired a content strategist,” 194.
52 jessica norman, sait oer workflow, may 2019, accessed july 14, 2020, https://docs.google.com/drawings/d/1xvjpu9s4bb32k3gblnvw4uy1ely9rtxnr8bkfdm5-yk/; regina gong, oer production workflow, accessed july 14, 2020, http://libguides.lcc.edu/oer/adopt; ariana e. santiago, oer adoption workflow visual overview, april 2019, accessed july 14, 2020, https://docs.google.com/drawings/d/1czqhpgpqyrr46vm5iytoemyqj-s1zr0p-m-lj16rtto/; meinke, “discovering oer production workflows.”
53 zoe wake hyde, “accessibility and universal design,” in pressbooks for edu guide (pressbooks.com, 2016), https://www.publiconsulting.com/wordpress/eduguide/.
54 hyde.
letter from the editor kenneth j. varnum information technology and libraries | december 2019 https://doi.org/10.6017/ital.v38i4.11923 earlier this fall, i had the privilege of participating in the sharjah library conference, a three-day event hosted by the sharjah book authority in the united arab emirates with programming coordinated by the ala international relations office. the experience of meeting with so many librarians from cultures different from my own was truly rewarding and enriching. it was both refreshing and invigorating to see, first-hand, the global importance of the local matters that occupy so much of my professional life. i returned to my regular job with a newfound appreciation for how much the issues i spend so much of my professional time on—information access, equity, user experience, and the like—are universal. it is easy to get lost in the weeds of my own circumstances and environment, and sometimes difficult to look up and explore what colleagues, known and unknown, are doing and thinking. the experience reinforces the importance of open access publications such as information technology and libraries. while “open access” doesn’t remove every possible barrier to accessing the knowledge, experience, and lessons contained within its virtual cover, it does remove the all-important paywall. and that is no small thing, in a community of library technologists who interact and exchange information through social media, email, and other tools. our open access status gives this journal a vibrant platform for sharing knowledge, experience, and expertise to all who seek it.
i hope you find this issue’s contents useful and informative, and will share the items you find most important with your peers at your institutions and beyond. i invite you to add your own knowledge and experience to our collective wisdom through a contribution to the journal. for more details, see the about the journal page or get in touch with me. sincerely, kenneth j. varnum, editor varnum@umich.edu december 2019
personalization of search results representation of a digital library ljubomir paskali, lidija ivanovic, georgia kapitsaki, dragan ivanovic, bojana dimic surla, and dusan surla information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.12647 ljubomir paskali (ljubomir.paskali@gmail.com) phd student, university of novi sad, serbia. lidija ivanovic (lidija.ivanovic@uns.ac.rs) assistant professor, university of novi sad, serbia; she is the corresponding author. georgia kapitsaki (gkapi@cs.ucy.ac.cy) associate professor, university of cyprus, cyprus. dragan ivanovic (dragan.ivanovic@uns.ac.rs) full professor, university of novi sad, serbia. bojana dimic surla (bdimicsurla@raf.edu.rs) full professor, union university, serbia. dusan surla (surla@uns.ac.rs) professor emeritus, university of novi sad, serbia. © 2021. abstract the process of discovering appropriate resources in digital libraries within universities is important, as it can have a big effect on whether retrieved works are useful to the requester. the improvement of the user experience with the digital library of the university of novi sad dissertations (phd uns) through the personalization of search results representation is the aim of the research presented in this paper. there are three groups of phd uns digital library users: users from the academic community, users outside the academic community, and librarians who are in charge of entering dissertation data. different types of textual and visual representations were analyzed, and representations which needed to be implemented for the groups of users of the phd uns digital library were selected. after implementing these representations and putting them into operation in april 2017, the user interface was extended with functionality that allows users to select their desired style for representing search results using an additional module for storing message logs. the stored messages represent an explicit change in the results representation by individual users. using these message logs and the elk technology stack, we analyzed user behavior patterns depending on the type of query, type of device, and search mode. the analysis has shown that the majority of users of the phd uns system prefer using the textual style of representation rather than the visual. some users have changed the style of results representation several times and it is assumed that different types of information require a different representation style. also, it has been established that the most frequent change to the visual results representation occurs after users perform a query which shows all the dissertations from a certain time period and which is issued from the advanced search mode; however, there is no correlation between this change and the client’s device used.
introduction in order to place their current work within a framework of previous methods or identify research gaps, researchers often need to identify and study previous research. discovering information on the web is not always a trivial task. many systems allow scholars to search for research papers, dissertations, and other technical reports, providing at the same time relevant recommendations to users based on their areas of interest or previous searches. although web search engines are considered a superior solution to more specialized digital library systems, these specialized systems may provide more benefits in specific conditions, e.g., when searching for dissertations in specific languages, or by affiliated countries or institutions.1 nowadays, digital libraries are widely used by diverse communities of users for diverse purposes.2 xie and colleagues conducted an analysis in 2018 to compare similarities and differences in perceptions of the importance of different digital library evaluation criteria by heterogeneous stakeholders in academic settings.3 specifically, they surveyed three groups of stakeholders (scholars, librarians, and digital library users), and through their analysis of the survey’s responses, they identified differences in opinions not only between user expectations and the digital library practice but also between what is desirable and what is possible in the academic environment.
finally, it is necessary to determine if the initial search results representation should be stored in the history of users’ queries, device types, and search mode. users have the ability to provide their feedback on the visualization of the search results, therefore indicating if they prefer a textual or new visual results representation (by changing search results representation style). the feedback received is used to adapt the results representation based on the user preferences. this component represents the first step towards a completely personalized system, in which different contextual parameters will be used for providing a personalized context-based user experience. at this point, user feedback is used for personalization, search results representation, and subsequent system use. a preliminary version with preliminary results regarding the word cloud component is described by kapitsaki and ivanovic.5 in respect to this previous work, we are presenting the evolvement of personalization in the phd uns system and a more thorough evaluation that allows us to perform statistical analysis and draw more generic conclusions. accordingly, the motivation for this research is the personalization of the search results representation of a digital library, and the research questions to which this research should provide answers have been identified. we are discussing our results based on these questions: 1. rq1: what are the users’ profiles of the phd uns digital library? 2. rq2: how could search results best be presented to different users within phd uns’s digital library collections? 3. rq3: can the search results representation with phd uns’s digital library depend on the history of users’ queries, device types, and search mode? information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 3 related work dosird uns the dosird uns project (http://dosird.uns.ac.rs/) was launched in 2009 with the aim to develop software infrastructure for the research domain of the university of novi sad (uns). the cris uns system (www.cris.uns.ac.rs) is the first result of this project. this system represents the information system of the research domain of the uns. the development of the system started with the beginning of the project in 2009 and is still active. the digital library of theses and dissertations (phd uns), which is the topic of this paper, is integrated within cris uns. the complete cris uns system was developed in accordance with the recommendations of the eurocris (www.eurocris.org) non-profit organization. systems which contain the published scientific results were analyzed and, on the basis of these analyses, a set of metadata describing the scientific and research result in cris uns was created. 6 a paper by ivanović et al. described the cerif compatible data model based on the marc 21 format which maps part of the cerif data model to the marc 21 format data model. 7 the marc 21 format is a standardized format for storing bibliographic data. cris uns has been built on th is model. the system architecture and implementation are described in previous publications.8 the development of the digital ph.d. dissertations library (phd uns) began in 2010. in december 2012, the senate of the university of novi sad approved the commissioning of a public service for the search of a digital library of dissertations defended at the university (https://cris.uns.ac.rs/searchdissertations.jsf). 
phd uns has been implemented with the following characteristics: • the digital library of e-theses is integrated into the information system of the scientific research activity of the university of novi sad (cris uns). • the digital library is cerif compatible, that is, it can exchange metadata with cerifcompatible systems of scientific research activity. • e-theses are described by a set of metadata which includes all the metadata prescribed by the dublin core and the etd-ms metadata format, that is, the system can exchange the data in dublin core or etd-ms format via the oai-pmh protocol. • the digital e-thesis library has a data model and architecture that can be easily integrated with a bibliographic system based on the marc 21 bibliographic format. • the user interface allows a user to enter the thesis and dissertation data without knowing the standardized metadata formats on which the digital library is built. the integration of phd uns within cris uns involved the following four steps: 1. the cris uns data model has been extended with entities and properties for describing phd theses in accordance with cerif, dublin core, and etd-ms data models.9 2. the cris uns software architecture and user interface has been extended in order to support basic functionality of cataloguing theses.10 3. theses’ metadata have been imported from the previous source.11 4. the web page for searching among the collection of theses has been implemented.12 http://dosird.uns.ac.rs/ http://www.cris.uns.ac.rs/ http://www.eurocris.org/ https://cris.uns.ac.rs/searchdissertations.jsf information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 4 searching personalization the findings and analysis of scientific results described in papers, theses, and dissertations is an important part of research activities in the scientific community. therefore, the use and development of the tools and the bibliographic systems which enable advanced search is becoming increasingly more common. the personalization of search results can include automatic recommendations to users.13 moreover, part of search personalization refers to the personalization of results representation. similar to popular web search engines like google, the way the search results are represented is very important to users. the way the results are represented can affect user perception of the system and the frequency of its use. the results can be presented to users in formats other than textual in order to improve the user experience in search tools, as well as to improve access to finding information and in recommendation systems.14 ferran and colleagues described browsing and searching personalization systems for digital libraries.15 their approach is based on the use of ontologies for describing the relationships between elements that determine the functionalities of the desired personalization system. those elements include the user’s profile, including navigational history and the user preferences, as well as the information collected from the navigational behavior of the digital library users. such a personalization system can improve digital library users’ experience. sebrechts and colleagues presented a controlled comparison of text, 2d, and 3d approaches to a set of typical information seeking tasks on a collection of 100 top ranked documents retrieved from a much larger document set.16 the conducted experiments included 15 participants. 
the study revealed that although visualization can assist the reduction of the mental workload for interpreting the results, these reductions and their acceptance depend on an appropriate mapping among the interface, the task and the user. in relevance to the above, our approach lies in the area of 2d display of information (see the visual results representation section later in this article), but instead of focusing on basic text information we have adopted newer approaches found in word clouds. bowers and card analyzed visualization in the framework of database search. 17 soliman et al. presented an approach for the clustering of search engine results that relies on the semantics of the retrieved documents.18 the approach takes into consideration both lexical and semantic similarities among documents and applies activation spreading tech nique, in order to generate clusters based on semantic properties. nguyen and zhang proposed a model for web search visualization, where physical location, spatial distance, color, and movement of graphical objects are used to represent the degree of relevance between a query and relevant web pages considering this way the context of users’ subjects of interest.19 a word or tag cloud is a visual representation of word content commonly used to represent content in different environments.20 several past works have introduced various algorithms for the tag selection or new ways for the word cloud creation. 21 tag clouds have been used in pubcloud for the summarization of results from queries over the pubmed database of biomedical literature.22 pubcloud responds to queries of this database with tag clouds generated from words extracted from the abstracts returned by the query. the authors found that the descriptive information is this way provided in a better way to users. however, the discovery of relations between concepts is rendered less effective. information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 5 context awareness context awareness is a part of many systems in various domains, where the application or system functionality adapts to the context of use, such as in mobile computing or pervasive computing applications.23 the first definition of context was given by abowd and colleagues, where they defined context “as any information that is relevant to the user, to the system and to any interaction between the user and the system.”24 applications utilize context data in order to provide context-aware services to users. context information, such as user location or user preferences, are used to adapt the application functionality or presentation to a specific user. mobile computing and pervasive computing offer the necessary information from mobile device sensors and in users’ environments for context-aware application provision.25 fink and kobsa claimed that personalization may adapt various features in order to address the specific needs of each individual.26 many systems utilize users’ search history in order to offer personalized search in the framework of the web information retrieval systems. 
yoganarasimhan found that personalization based on short-term history or within-session behavior is less valuable than long-term or across-session personalization.27 behnert and lewandowski analyzed the application of web search engine ranking approaches to digital libraries, and they argued for a user-centric view on ranking, taking into account that ranking should be for the benefit of the user, and user preferences may vary across different contexts.28 frias-martinez and colleagues defined an approach to constructing personalized digital libraries. adaptive digital libraries automatically learn user preferences and goals and personalize their interaction using this information.29 based on previous work, frias-martinez and colleagues developed a personalized digital library to suit the needs of different cognitive styles.30 contribution of our work we share similarities with previous work in terms of techniques used, such as the harvard reference style, standardized bibliographic formats (marc 21, dublin core, etd-ms), and word clouds, which have been used in other systems for representation of search results in order to improve the user experience. however, in contrast to previous work, we apply these techniques in a specific context for a serbian digital library, allowing automatic adaptation of the representation of search results based on a user’s type, history, and reaction. those search results representations are implemented and integrated within a real system (phd uns) and are tested with real users’ feedback. this user feedback is analyzed using elk stack technologies, making our main conclusions useful for similar systems and future research on personalization in digital libraries. methodology the main requirements for implementation of the phd uns digital library were for the system to be compatible for integration with other systems of scientific research activity, support data exchange in different standardized formats, and provide representation of the results to users of different categories and profiles (researchers, scientists, librarians, users from outside the academic community, etc.). for these reasons, existing formats for representation of references, bibliographic metadata formats, as well as techniques for visual representation of textual publications, were analyzed. the format adopted for the representation of references was harvard style, implemented with a freemarker template. freemarker (https://freemarker.apache.org/) is an open-source template engine for java that assists in separating the web user interface from the main system functionality, following the mvc (model view controller) pattern. the analysis established that it was necessary to implement the search results representation in three bibliographic formats: marc 21, dublin core, and etd-ms. for each of these formats, appropriate mappers/serializers were made which transform the data from a database into the xml (extensible markup language) representation of the previously mentioned bibliographic formats. for visual representation, word cloud style was adopted. a component for generating word cloud images was implemented and integrated into the phd uns digital library.
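as a rough, hedged sketch of what one of those mappers/serializers might look like, the following java fragment builds a minimal dublin core (oai_dc) record for a dissertation; the thesis fields, method name, and output shape are illustrative assumptions and do not reproduce the actual phd uns source code.

    // illustrative sketch only: field names and structure are assumptions, not the phd uns implementation
    public class DublinCoreSerializerSketch {

        // minimal stand-in for a dissertation record loaded from the database
        static class ThesisRecord {
            String title;
            String authorName;
            int publicationYear;
            String language;

            ThesisRecord(String title, String authorName, int publicationYear, String language) {
                this.title = title;
                this.authorName = authorName;
                this.publicationYear = publicationYear;
                this.language = language;
            }
        }

        // maps a thesis record to a minimal oai_dc xml representation
        static String toDublinCore(ThesisRecord thesis) {
            StringBuilder xml = new StringBuilder();
            xml.append("<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\"")
               .append(" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n");
            xml.append("  <dc:title>").append(escape(thesis.title)).append("</dc:title>\n");
            xml.append("  <dc:creator>").append(escape(thesis.authorName)).append("</dc:creator>\n");
            xml.append("  <dc:date>").append(thesis.publicationYear).append("</dc:date>\n");
            xml.append("  <dc:language>").append(escape(thesis.language)).append("</dc:language>\n");
            xml.append("  <dc:type>PhD dissertation</dc:type>\n");
            xml.append("</oai_dc:dc>");
            return xml.toString();
        }

        // escapes the characters that may not appear as raw text in xml
        static String escape(String value) {
            return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }

        public static void main(String[] args) {
            ThesisRecord thesis = new ThesisRecord("sample dissertation title", "jane doe", 2017, "sr");
            System.out.println(toDublinCore(thesis));
        }
    }

an analogous method per format (marc 21, etd-ms) keeps each serializer independent of the others, which fits the idea of one mapper per bibliographic format described above.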
when presented with search results, users are able to choose which desired representation format is stored and used as preferable for the given user in future search results representations. a logging system was implemented to assist with this; this module is invoked when the user changes the default data view mode. received messages are preprocessed for the purpose of analyzing messages and obtaining a more accurate evaluation. the aim of the analysis of the messages on the work of the phd uns system is to obtain the desired statistics: • distribution of used representation styles • top queries executed before changing into the textual style, as well as into the visual style • distribution of devices used before changing into the textual style, as well as into the visual style • distribution of search modes before changing into the textual style, as well as into the visual style these statistics are analyzed to determine the user behavior patterns depending on the type of search (basic or advanced), the search device used, the executed query, etc. based on the established patterns it is possible to determine the representation style for future searches of the new users. the results of the analysis are graphically represented using elk stack technologies and are presented below in the evaluation section. the methodological approach is shown in figure 1. the rest of this paper is organized in accordance with the methodology steps shown in figure 1. https://freemarker.apache.org/ information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 7 figure 1. the methodology of the study presented in this paper. information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 8 analysis of search results’ representation styles textual representation for the needs of the phd uns library, three types of search results’ textual representation for three types of phd uns users were analyzed: reference representation. a citation style is defined as a set of rules for citing sources in academic writing and those rules prescribe style for in-text citations, as well as style for reference representation in the references’ list. this textual approach is intended for users from the academic community—researchers, teaching staff of universities, and phd students. since the majority of phd uns users belong to this group, this is the default representation in a textual representation of the search results. this group of users is familiar with this type of representation. from this type of representation, the users can easily recognize basic data of interest and can use this representation for citing and referencing the dissertations which have been retrieved as a result of executing a query. taking into account that there are currently many different styles for representing references and citations (e.g., apa, mla, harvard, vancouver, and chicago, among others), and that they might change in the future with the emergence of new trends in science (for example, the emergence of open science and the need to cite data sets, not just publications), it is necessary to create a scalable component for representing the results in a form of a reference. 
based on this analysis, we decided that the architecture of this component will be based on freemarker, which makes the introduction of new templates for the output format easier, and that the first freemarker template should be created for harvard style. structured representation. this textual approach is intended for users outside the academic community who want to search the digital library. this type of representation presents only the data from the digital library database in a legible format that is rendered in the web browser. bibliographic formats representation. this textual approach is intended for librarians who are in charge of entering and maintaining data in the digital library. in addition to one central library of the university of novi sad, there are libraries in every department within the university. most of these libraries use the bisis library system, which is based on the marc format. therefore, it can be concluded that the majority of the librarians who enter data into the phd uns library are familiar with the marc format. librarians can use the representation of metadata about dissertations in the marc 21 format to check if all of the information about a dissertation is entered correctly.
• marc 21 bibliographic format supports not only descriptions of theses and dissertations but also other published scientific results, such as a paper published in a journal, a monograph, a paper published in conference proceedings, etc. there are several examples where theses and dissertations are described using the marc 21 format in the bibliographic information systems of some universities.31
• dublin core (http://dublincore.org) is the most commonly used format for data exchange between different information systems, and data are exported in this format via the oai-pmh protocol from the phd uns system into a network of digital libraries, such as dart-europe, oatd, and nardus. the dublin core xml schema is available online at www.openarchives.org/oai/2.0/oai_dc.xsd. the representation in dublin core format can be used by librarians to check if the metadata will be correctly exported to the previously mentioned aggregation systems.
• electronic theses and dissertations metadata standard (etd-ms) (www.ndltd.org/standards/metadata) is an extension of the dublin core format with new features/properties. the standard defines a set of metadata that is used to describe a master’s thesis or a doctoral dissertation. the metadata of this standard describe the author, his/her paper, and the context in which this paper has been created in a way that will be useful not only to the researcher, but also to the librarian and/or the technical staff in charge of maintaining the paper in electronic form. this format is used within the ndltd worldwide network of digital theses and dissertations, and in this format the data is exported via the oai-pmh protocol from the phd uns system to this network. the xml schema of the etd-ms format is available online at www.ndltd.org/standards/metadata/etdms/1.0/etdms.xsd. the representation in the etd-ms format can be used by librarians to check if the metadata will be correctly exported to the ndltd network (http://union.ndltd.org/).
visual representation a word cloud is a visual representation of textual content, with the importance of each word indicated with a different font size and/or color. word clouds are often used in many digital libraries to represent textual content.32 as previously written, the word cloud is used in different environments and is a popular way to represent web results by summarizing the content of documents and other sources of information. we adopted a word cloud approach for visual representation of the user search results in the phd uns library. various tools for generating word clouds are available, such as the tool offered by jin.33 based on the characteristics of available tools, we decided to use the kumo library, available in the java programming language (https://github.com/kennycason/kumo), which allows easier integration within the phd uns digital library. implementation details this section presents the implementation of the textual and visual search results representation, as well as the implementation of the search results personalization. textual results representation based on the analysis presented in the previous section, we decided to implement the following functionality in order to enhance the phd uns digital library: a structured representation for users outside the academic community, a representation in the form of references for scholars, and a representation of bibliographic and library formats for librarians in charge of data in the phd uns digital library. reference representation. figure 2 depicts the architecture of the module for generating phd dissertations’ representations in the form of references for scholars. figure 2. architecture of module for generating reference representation. (figure 2 shows the reference generator drawing on the data model, a freemarker template, and the phd uns database to produce the reference representation.) the model of the reference generator component is shown as the class diagram in figure 3. this component can be used to generate textual representations for all publications from the data model component (figure 4) in the chosen reference style (freemarker template—see listing 1). the central class is templaterunner, which includes the necessary operations to generate reports. the templatesholder represents the template container and has operations for adding new templates and selecting a template for generating a report. the template class is the model of the template for one reference style and one publication type. the component architecture described in the class diagram of figure 3 is independent of the number of templates, whereas adding a new template to the component requires creating a new instance of the template class. as similarly performed in the cris uns system, the implementation of these instances of the template class is done in freemarker, which does not require recompilation of the source code. figure 3. architecture of the component for generating template. (figure 3 shows three classes: templaterunner, with the operations getrepresentation, getrectype, makeonereference, and organizerecords; templatesholder, with gettemplate and addtemplate; and template, with the attributes pubtype and referencestyle and the operations getdata and formatdata.)
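for orientation, a minimal sketch of how such a template instance might be processed through freemarker’s java api is shown below; the template file name and the data-model keys (chosen to mirror the variables in listing 1 below) are assumptions for illustration, not the actual templaterunner code.

    import freemarker.template.Configuration;
    import freemarker.template.Template;
    import freemarker.template.TemplateExceptionHandler;

    import java.io.File;
    import java.io.StringWriter;
    import java.util.HashMap;
    import java.util.Map;

    public class ReferenceRendererSketch {
        public static void main(String[] args) throws Exception {
            // freemarker is configured once and loads .ftl files from a template directory
            Configuration cfg = new Configuration(Configuration.VERSION_2_3_31);
            cfg.setDirectoryForTemplateLoading(new File("templates"));
            cfg.setDefaultEncoding("UTF-8");
            cfg.setTemplateExceptionHandler(TemplateExceptionHandler.RETHROW_HANDLER);

            // hypothetical data model whose keys mirror the variables used in listing 1 below
            Map<String, Object> name = new HashMap<>();
            name.put("firstname", "jane");
            name.put("lastname", "doe");
            Map<String, Object> author = new HashMap<>();
            author.put("name", name);
            Map<String, Object> institution = new HashMap<>();
            institution.put("somename", "university of novi sad");

            Map<String, Object> model = new HashMap<>();
            model.put("author", author);
            model.put("publicationyear", 2017);
            model.put("sometitle", "sample dissertation title");
            model.put("localizedstudytype", "doctoral dissertation");
            model.put("institution", institution);

            // one template per reference style and publication type, e.g. a harvard template for theses
            Template template = cfg.getTemplate("harvard-thesis.ftl");
            StringWriter out = new StringWriter();
            template.process(model, out);
            System.out.println(out.toString());
        }
    }

because each reference style and publication type maps to its own template file, supporting a new citation style amounts to dropping in a new template rather than recompiling the component, as noted above.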
<#macro nameinitial name>
  <@compress>
    <#t><#if name?length >1>
      <#t><#if (name?upper_case?starts_with("lj") || name?upper_case?starts_with("nj"))>, ${name?substring(0,2)?upper_case}.
      <#t><#else>, ${name?substring(0,1)?upper_case}.
      <#t></#if>
    <#t><#elseif name?length=1>, ${name?upper_case}.
    <#t></#if>
  </@compress>
</#macro>
<#t>${author.name.lastname?upper_case}
<#t><@nameinitial author.name.firstname/>
<#t><#if publicationyear??> (${publicationyear})</#if> ${sometitle!""}. (${localizedstudytype}), ${institution.somename}
listing 1. harvard-style freemarker template
structured representation. the simplified version of the bibliographic records data model that is used in the cris uns system is shown in figure 4. the cris uns system also contains other publication entities, such as monograph, journal paper, etc. the phd uns digital library is integrated into the cris uns system and uses the entities shown in figure 4. figure 4. data model. (figure 4 shows the thesis class, with the methods tohtmlrepresentation, tomarc21representation, todublincorerepresentation, and toetdmsrepresentation, as a specialization of the abstract publication class with title and publicationyear attributes; the diagram also includes the author class (firstname, lastname, address, email) and the institution class (name, address), connected to publications through the authors, supervisors, defendboardmembers, defendedat, and affiliations associations.) a structured representation for users outside the academic community is implemented with the help of the tohtmlrepresentation method contained in the thesis class. this method forms a structured representation of the dissertation with the help of html markup. when storing a dissertation in a database, the html representation is generated and stored in lucene indexes for faster representation of search results. bibliographic formats representation. after analyzing the bibliographic and library formats (see the methodology section above), we concluded that we should implement the search results representation in marc 21, dublin core, and etd-ms bibliographic formats for the needs of librarians who are in charge of entering data in the phd uns library. the representation of these formats is implemented in a similar way as the representation of structured data for users outside the academic community, with the help of the following methods: the tomarc21representation,
Screenshots of the user interface. Figure 5 presents the textual search results representation. The basic representation contains the metadata of the dissertation presented as a Harvard-style reference. This is the basic representation because researchers from the academic environment are the most common users of the PhD UNS library. Additional metadata are displayed by pressing the button located next to the reference; this shows the data structured for the needs of users outside the academic community (see fig. 6).

Figure 5. Results representation in a textual format.

In addition, the representation of the dissertation metadata is also available to library users in the MARC 21, Dublin Core, and ETD-MS formats (see fig. 6).

Figure 6. Structured and bibliographic formats representation.

Visual results representation

This section describes the implementation of the graphical (visual) search results representation. The graphical representation is realized using a word cloud to represent the content of a dissertation.

Word cloud generator component. The word cloud generator component forms a new part of the PhD UNS digital library. The aim of this new component is to present the user search results in a word cloud representation (see fig. 7).

Figure 7. Word cloud generator steps.

The word cloud component was implemented in Java. The component accepts a PDF file as input and generates an image (PNG file) as output. The tool takes as input the PDF file of the dissertation; it then parses the textual content of the file and performs a preprocessing of the text. The result of the preprocessing is a list of pairs containing the original version of each word from the text and its stem. The details of the tool utilized for this preprocessing step can be found in an existing publication.34 The tool then calculates the top frequencies of words in the text, generates the word cloud, and creates an image file. As mentioned above, the Kumo library was used for implementation purposes (https://github.com/kennycason/kumo). Kumo is open-source software released under the MIT license. Its source code has been extended to accommodate the needs of the PhD UNS digital library.

Integration into the PhD UNS system. The word cloud generator component described in the previous section has been integrated into the PhD UNS digital library application and was put into operation in April 2017; some necessary adaptations have been performed and integrated since then. Because word cloud generation is a lengthy and computationally intensive process, it is invoked in the indexing phase, and the generated image is stored as supplementary material to the PhD dissertation in the server file system. Figure 8 presents a Unified Modeling Language (UML) activity diagram which describes the process of adding a new dissertation to the PhD UNS digital library. The activity "generate word cloud image" is highlighted in red and represents invoking the execution of the word cloud component. Moreover, the activity "create Lucene index" includes the same text preprocessing steps as those described for the word cloud generator component (see fig. 7).
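Before turning to figure 8, the following sketch gives a rough illustration of the pipeline in figure 7: a dissertation PDF is turned into a PNG word cloud with the Kumo API. The use of Apache PDFBox for text extraction, the omission of the stemming step, and the chosen image size and frequency cut-offs are assumptions made for the sake of a compact example.

import com.kennycason.kumo.CollisionMode;
import com.kennycason.kumo.WordCloud;
import com.kennycason.kumo.WordFrequency;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.awt.Dimension;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: dissertation PDF -> plain text -> word frequencies -> PNG word cloud.
public class WordCloudGeneratorSketch {

    public void generate(File dissertationPdf, File outputPng) throws IOException {
        // 1. Extract the textual content of the dissertation (PDFBox is an assumed choice).
        String text;
        try (PDDocument document = PDDocument.load(dissertationPdf)) {
            text = new PDFTextStripper().getText(document);
        }

        // 2. Count word frequencies and keep the top 100
        //    (the real component also pairs each word with its stem).
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (token.length() >= 3) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<WordFrequency> frequencies = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(100)
                .map(e -> new WordFrequency(e.getKey(), e.getValue()))
                .collect(Collectors.toList());

        // 3. Render the word cloud and write it as a PNG image.
        WordCloud wordCloud = new WordCloud(new Dimension(600, 600), CollisionMode.PIXEL_PERFECT);
        wordCloud.build(frequencies);
        wordCloud.writeToFile(outputPng.getPath());
    }
}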
Figure 8. Adding a new dissertation into the PhD UNS system.

The search results representation in the form of a word cloud is enabled via the user interface page for representing the search results of the PhD UNS digital library (see fig. 9).

Figure 9. Results of a search of the PhD UNS system in a word cloud format.

Personalization of representation

This section describes the implementation related to the personalization of the search results representation. The user can select the desired style of representation, and the representation history is recorded in order to personalize the results representation and adapt it to the user's profile and information needs. The initial search results representation style for users who search for dissertations in the PhD UNS system for the very first time is a random selection of one of two options:

• result representation in a textual format
• result representation as word cloud images

After analyzing the logs of changes to the results representation (see the next section), this random selection could be replaced with a choice that depends on the context: queries, devices, types of searches, etc. The parts of the page which show how the results are presented in the textual and word cloud representations are shown in figures 5 and 9, respectively. Users can change the representation style from the page. In this way, users give feedback and indicate their preference for the visualization of the results, which is used in future results representations for that user.

Evaluation

Collecting user feedback

If a digital library user changes the style of results representation, a message about the change of the representation style, together with the user metadata, is recorded using Log4j. This process is shown in red in the activity diagram in figure 10.

Figure 10. The process of executing queries and giving feedback on the representation style.

Listing 2 is an example of a recorded message about the change of the representation style containing user metadata from the PhD UNS system. Information such as the time, the territorial determinant of the web client, the user agent, and the representation style is also recorded. The representation style is stored in the user's browser in the form of cookies and represents the default style for representing results in future searches of dissertations in the PhD UNS system. By analyzing the messages about the change of the representation style, we evaluate the results of our approach and examine how the users respond to the new style of representation.
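The log entry in listing 2 below is produced by a setter on the search page's managed bean (SearchDissertationsManagedBean:setRepresentationStyle). The following minimal sketch shows how such a setter might record the change with Log4j and keep the chosen style in a cookie in a JSF application; Log4j 2 is assumed (the paper only says Log4j), the geolocation lookup is omitted, and all identifiers other than the logged field labels are illustrative, not taken from the PhD UNS source code.

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import javax.faces.context.ExternalContext;
import javax.faces.context.FacesContext;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Sketch of the feedback-collection step: log the style change with user
// metadata and remember the chosen style in a cookie for future searches.
public class SearchDissertationsBeanSketch {

    private static final Logger LOG = LogManager.getLogger(SearchDissertationsBeanSketch.class);

    public void setRepresentationStyle(String newStyle, String userId, String query, String searchingMode) {
        ExternalContext ctx = FacesContext.getCurrentInstance().getExternalContext();
        String ipAddress = ctx.getRequestHeaderMap().getOrDefault("X-Forwarded-For", "unknown");
        String userAgent = ctx.getRequestHeaderMap().get("User-Agent");

        // One pipe-separated record per change, in the spirit of listing 2.
        LOG.info("date and time: {}| session id: {}| userid: {}| ip address: {}| "
                + "user agent (device): {}| new representation style: {}| query: {}| searching mode: {}",
                new Date(), ctx.getSessionId(false), userId, ipAddress, userAgent,
                newStyle, query, searchingMode);

        // Persist the preferred style in a browser cookie (here: 30 days).
        Map<String, Object> cookieProps = new HashMap<>();
        cookieProps.put("maxAge", 60 * 60 * 24 * 30);
        ctx.addResponseCookie("representationStyle", newStyle, cookieProps);
    }
}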
[INFO] 22.08.2017. 16:07:33 (SearchDissertationsManagedBean:setRepresentationStyle) date and time: Tue Aug 22 16:07:33 CEST 2017| miliseconds: 1503410853455| + session id: 2a4ce66932d0c3c8db97098dff956074| userid: 150341083728649| ip address: 188.2.29.239| location: city: Belgrade, postal code: null, regionname: null (region: 00), countryname: Serbia (country code: RS), latitude: 44.818604, longitude: 20.468094| user agent (device): Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36 OPR/47.0.2631.55| new representation style: wordcloud

Listing 2. Example of a message about the change of the representation style.

Preprocessing users' feedback

As already indicated, each change in the representation style of the search results causes the creation of an appropriate message (see listing 2). In order to better understand the context of use and the reason for changing the representation style, these messages are preprocessed and supplemented with information on the type of search and the query which preceded the change in the representation style (highlighted in yellow in listing 3). By analyzing this additional information, we can understand which context of usage and which user actions preceded the change of the representation style. The additional information is obtained from the queries received by the PhD UNS system and is mapped by using a unique user session identifier. An example of a message after preprocessing is shown in listing 3.

[INFO] 22.08.2017. 16:07:33 (SearchDissertationsManagedBean:setRepresentationStyle) date and time: Tue Aug 22 16:07:33 CEST 2017| miliseconds: 1503410853455| + session id: 2a4ce66932d0c3c8db97098dff956074| userid: 150341083728649| ip address: 188.2.29.239| location: city: Belgrade, postal code: null, regionname: null (region: 00), countryname: Serbia (country code: RS), latitude: 44.818604, longitude: 20.468094| user agent (device): Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36 OPR/47.0.2631.55| new representation style: wordcloud| query: internet| searching mode: basic

Listing 3. Example of a message about the change of the representation style after preprocessing.

Analysis of user feedback

Messages such as the one in listing 3, with additional information about contextual use, are suitable for further analysis using ELK stack technologies. Messages in this format are collected from the logs of the PhD UNS system using the Logstash grok filter. This filter is used for parsing, statistical analysis based on field values, data filtering, and advanced search using multiple filters. The parsed messages are forwarded to the Elasticsearch component of the ELK stack. The grok pattern definition, which represents the rules and instructions for parsing messages, is located in configuration files that are passed as a parameter when running the tool. An example of a configuration file is shown in listing 4.
input {
  file {
    path => "/config-dir/logs-style-formatted/*.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    break_on_match => false
    match => { "message" => "%{LOGLEVEL:loglevel}" }
    match => { "message" => "date and time: %{DAY:logday} %{MONTH:logmonth} %{MONTHDAY:logmonthday} %{NUMBER:loghour}:%{NUMBER:logminute}:%{NUMBER:logsecond} %{WORD:logtimezone} %{YEAR:logyear}" }
    match => { "message" => "userid: %{NUMBER:userid}\|" }
    match => { "message" => "city: %{DATA:city}," }
    match => { "message" => "countryname: %{DATA:country} \(" }
    match => { "message" => "user agent \(device\): %{DATA:useragent}\|" }
    match => { "message" => "new representation style: %{DATA:newstyle}\|" }
    match => { "message" => "query: %{DATA:query}\|" }
    match => { "message" => "searching mode: %{DATA:searchingmode}\|" }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
}

Listing 4. An example of a grok pattern used to analyze the messages about the change of the representation style.

The analysis of the messages about the work of the PhD UNS system is presented in this section. The results are represented using the Kibana graph component of the ELK stack. This component is used for visualization and data exploration, analysis of logs over specified time intervals, and real-time monitoring of applications.

The word cloud generating component was put into operation in April 2017. Log messages were analyzed from then until the end of 2019. In total, 17,474 messages about changing the style of search results representation were analyzed. In these messages, the style was changed to the textual representation 16,032 times, while it was changed to the visual representation style in the form of a word cloud image 1,442 times. Thus, most of the users of the PhD UNS system changed the representation style to the textual rather than the visual format. This tells us that the majority of users are more familiar with the textual style of representing search results in interaction with scientific systems. Based on this analysis, it can be concluded that the random selection of the representation style of the results is not a good choice.

We also analyzed the client devices used when changing the representation style (textual and visual). Computers were used considerably more frequently than mobile devices. Devices with larger-resolution screens are more suitable for presenting search results in different formats. The distribution of changes in the representation style of the search results is similar for computers and mobile devices, so based on the device we cannot conclude which representation style is more suitable for the user.

The queries that were submitted by users before changing the style of representation were also analyzed; in other words, which queries and results representations initiated the change to the other style of representation. Figure 11 and figure 12 show the most commonly executed queries before changing the style of representation to the textual and visual format, respectively. Figure 11 shows the most commonly executed queries before changing to the textual style of representation.
Some of the queries shown in this figure represent the names of faculties of the university, such as

• fakultet tehnickih nauka (Faculty of Technical Sciences)
• filozofski fakultet (Faculty of Philosophy)

Figure 11. Top queries before changing to the textual style.

The most commonly executed queries before changing to the visual representation style are shown in figure 12. Some queries shown in this figure represent scientific fields, such as

• doktor medicinskih nauka (doctor of medical science)
• doktor geografskih nauka (doctor of geographical science)

Figure 12. Top queries before changing to the visual representation style.

Based on figures 11 and 12, we can conclude that the queries users submitted before the change in the style of representing the results are of a general type; that is, they are queries by faculty or by scientific field. These types of queries give long lists of results. For queries over longer periods of time, where the representation of all dissertations defended in a certain period is required, users changed the representation style to visual.

The search mode used before the change to the textual style is shown in figure 13, while the search mode used before the change to the visual representation style is shown in figure 14.

Figure 13. Search mode before changing to the textual style.

Figure 14. Search mode before changing to the visual style.

By analyzing figures 13 and 14, we conclude that most queries preceding the change of the representation style were issued from the basic search mode (labeled basic in the figures), which is the default search mode. We also notice that the share of the advanced search mode is higher before changes to the visual style than before changes to the textual style. This is in line with the analysis following figures 11 and 12, because in the advanced search mode users make queries for a time range, which gives long lists of results. We also notice that some users changed the style of results representation several times, so it is assumed that different types of information require a different representation style. There has been no decrease or increase in the number of users since the introduction of the word cloud generating component, which indicates that the introduction of the new component has not significantly affected the frequency of system use.

Conclusion

This paper describes an improvement of the user experience for the users of the PhD UNS digital library. This improvement was implemented through the personalization of the search results representation, which was put into operation in April 2017. Users of the PhD UNS digital library use desktop and laptop computers considerably more than mobile devices (RQ1). Moreover, besides specific exploratory queries, the users issue general queries by scientific field, faculty, or time range. The PhD UNS digital library has three user groups: those from the academic community, those from outside the academic community, and librarians in charge of entering the dissertation data.
For these three groups of users, the following textual search results representations (RQ2) have been selected and implemented: a Harvard-style representation of the dissertation in the form of a reference for users from the academic community; an HTML structured results representation for users outside the academic community; and MARC 21, Dublin Core, and ETD-MS bibliographic records for the library users. For the visual representation, a word cloud presentation based on the complete text from the PDF file of the dissertation has been selected and implemented. It is possible to select the desired search results representation, which initiates the storing of messages about the representation style of the results, the client device used, the time, etc. This message is joined with the preceding query message to analyze the patterns of system usage and establish a correlation between the change of the representation style and the type of query, device, and search mode (RQ3). Based on the conducted analysis, we reached the following conclusions:

• A significantly larger number of users of the PhD UNS system use the textual representation style rather than the visual representation. This tells us that a larger number of users are more familiar with the textual style of representing search results in interaction with scientific systems and that the random selection of the representation style of the results used since April 2017 was not a good choice for the first-time user. Because of this observation, the initial selection of the representation style for the first-time user was changed to the textual search results representation (RQ3).

• Some users changed the representation style of the results several times, and it is assumed that different types of information require a different representation style. Based on this, we can conclude that the possibility of personalizing the search results representation is a useful functionality that contributes to the improvement of the PhD UNS system and the user experience.

• It has been established that the most frequent change to the visual results representation occurs after a query which shows all the dissertations from a certain time period, issued from the advanced search mode, but there is no correlation between this change and the device being used. Based on this, it can be concluded that in certain cases, for queries which give long lists of results, it is clearer to view the results in the visual mode (RQ3). It is necessary to collect more data and carry out additional analysis in order to establish this correlation precisely, or to determine precisely to which queries and which types of users it applies, so that the system could automatically change the style of representation in certain cases.

Directions for future research and application development include the following. It is planned to collect and analyze additional messages about the work of the digital library in order to further enhance the user experience. It is also necessary to follow trends in results representation due to changes in standardized reference styles, bibliographic formats, technologies, and hardware devices, and to coordinate the results representation with these trends. Differences between the behavior of the different user groups will also be examined further.

Endnotes
1 J. Brophy and D. Bawden, "Is Google Enough? Comparison of an Internet Search Engine with Academic Library Resources," Aslib Proceedings 57, no. 6 (2005): 498–512, https://doi.org/10.1108/00012530510634235.

2 A. F. Smeaton and J. Callan, "Personalisation and Recommender Systems in Digital Libraries," International Journal on Digital Libraries 5, no. 4 (2005): 299–308, https://doi.org/10.1007/s00799-004-0100-1.

3 Iris Xie, Soohyung Joo, and Krystyna K. Matusiak, "Multifaceted Evaluation Criteria of Digital Libraries in Academic Settings: Similarities and Differences from Different Stakeholders," The Journal of Academic Librarianship 44, no. 6 (2018): 854–63, https://doi.org/10.1016/j.acalib.2018.09.002.

4 Theodora Nanou, George Lekakos, and Konstantinos Fouskas, "The Effects of Recommendations' Presentation on Persuasion and Satisfaction in a Movie Recommender System," Multimedia Systems 16, no. 4–5 (August 2010): 219–30, https://doi.org/10.1007/s00530-010-0190-0.

5 Georgia Kapitsaki and Dragan Ivanović, "Representation with Word Clouds at the PhD UNS Digital Library," Computer Science & Information Technology 21 (2017), https://doi.org/10.5121/csit.2017.71102.

6 Dragan Ivanović, "Software Systems for Increasing Availability of Scientific-Research Outputs," Novi Sad Journal of Mathematics – NS JOM 42, no. 1 (2012): 37–48.

7 Dragan Ivanović, Dušan Surla, and Zora Konjović, "CERIF Compatible Data Model Based on MARC 21 Format," The Electronic Library 29, no. 1 (2011): 52–70, https://doi.org/10.1108/02640471111111433.

8 Dragan Ivanović, "A Scientific-Research Activities Information System" (PhD thesis, University of Novi Sad, 2010); D. Ivanović, G. Milosavljević, B. Milosavljević, and D. Surla, "A CERIF-Compatible Research Management System Based on the MARC 21 Format," Program: Electronic Library and Information Systems 44, no. 1 (2010): 229–51; Dragan Ivanović and Branko Milosavljević, "Software Architecture of System of Bibliographic Data," in Proceedings of the XXI Conference on Applied Mathematics PRIM 2009, 85–94.

9 Lidija Ivanović, Dragan Ivanović, and Dušan Surla, "A Data Model of Theses and Dissertations Compatible with CERIF, Dublin Core and EDT-MS," Online Information Review 36, no. 4: 548–67, https://doi.org/10.1108/14684521211254068.

10 Lidija Ivanović, Dragan Ivanović, and Dušan Surla, "Integration of a Research Management System and an OAI-PMH Compatible ETDs Repository at the University of Novi Sad, Republic of Serbia," Library Resources & Technical Services 56, no. 2: 104–12, https://doi.org/10.5860/lrts.56n2.104.

11 Lidija Ivanović and Dušan Surla, "A Software Module for Import of Theses and Dissertations to CRISs," in Proceedings of the CRIS 2012 Conference (Prague, June 6–9, 2012): 313–22.

12 Lidija Ivanović, "Search of Catalogues of Theses and Dissertations," Novi Sad Journal of Mathematics – NS JOM 43, no. 1 (2013): 155–65; Lidija Ivanović, Dragan Ivanović, Dušan Surla, and Zora Konjović, "User Interface of Web Application for Searching PhD Dissertations of the University of Novi Sad," in Proceedings of the Intelligent Systems and Informatics (SISY), 2013 IEEE 11th International Symposium: 117–22.
13 Joel Azzopardi, Dragan Ivanović, and Georgia Kapitsaki, "Comparison of Collaborative and Content-Based Automatic Recommendation Approaches in a Digital Library of Serbian PhD Dissertations," in Proceedings of the International KEYSTONE Conference 2016: 100–11, https://doi.org/10.1007/978-3-319-53640-8_9.

14 Dragan Ivanović and Georgia Kapitsaki, "Personalisation of Keyword-Based Search on Structured Data Sources," in Proceedings of the 1st International KEYSTONE Conference (IKC 2015).

15 Núria Ferran, Enric Mor, and Julià Minguillón, "Towards Personalization in Digital Libraries through Ontologies," Library Management 26, no. 4/5 (2005): 206–17, https://doi.org/10.1108/01435120510596062.

16 Mark M. Sebrechts et al., "Visualization of Search Results: A Comparative Evaluation of Text, 2D, and 3D Interfaces," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99): 3–10, https://doi.org/10.1145/312624.312634.

17 Frank H. Bowers and Stuart K. Card, "Method and Apparatus for Visualization of Database Search Results," U.S. Patent No. 5,546,529.

18 Sara Saad Soliman, Maged F. El-Sayed, and Yasser F. Hassan, "Semantic Clustering of Search Engine Results," The Scientific World Journal (2015), https://doi.org/10.1155/2015/931258.

19 Tien Nguyen and Jin Zhang, "A Novel Visualization Model for Web Search Results," IEEE Transactions on Visualization and Computer Graphics 12, no. 5 (2006), https://doi.org/10.1109/tvcg.2006.111.

20 Daniel Scanfeld, Vanessa Scanfeld, and Elaine L. Larson, "Dissemination of Health Information through Social Networks: Twitter and Antibiotics," American Journal of Infection Control 38, no. 3 (2010): 182–88, https://doi.org/10.1016/j.ajic.2009.11.004.

21 Carmel McNaught and Paul Lam, "Using Wordle as a Supplementary Research Tool," The Qualitative Report 15, no. 3 (2010): 630; Weiwei Cui et al., "Context Preserving Dynamic Word Cloud Visualization," in IEEE Pacific Visualization Symposium (PacificVis) (2010): 121–28, https://doi.org/10.1109/pacificvis.2010.5429600; Yusef Hassan-Montero and Víctor Herrero-Solana, "Improving Tag-Clouds as Visual Information Retrieval Interfaces," in Proceedings of the International Conference on Multidisciplinary Information Sciences and Technologies (2006): 25–28.

22 Byron Kuo, Thomas Hentrich, Benjamin Good, and Mark Wilkinson, "Tag Clouds for Summarizing Web Search Results," in Proceedings of the 16th International Conference on World Wide Web (WWW '07) (2007): 1203–04, https://doi.org/10.1145/1242572.1242766.

23 Jong-Yi Hong, Eui-Ho Suh, and Sung-Jin Kim, "Context-Aware Systems: A Literature Review and Classification," Expert Systems with Applications 36, no. 4 (2009): 8509–22, https://doi.org/10.1016/j.eswa.2008.10.071; Georgia M. Kapitsaki, George N. Prezerakos, Nikolaos D. Tselikas, and Iakovos S. Venieris, "Context-Aware Service Engineering: A Survey," Journal of Systems and Software 82, no. 8 (2009): 1285–97, https://doi.org/10.1016/j.jss.2009.02.026.
24 Gregory D. Abowd et al., "Towards a Better Understanding of Context and Context-Awareness," in Handheld and Ubiquitous Computing (Berlin/Heidelberg: Springer, 1999): 304–07, https://doi.org/10.1007/3-540-48157-5_29.

25 Mika Raento, Antti Oulasvirta, Renaud Petit, and Hannu Toivonen, "ContextPhone: A Prototyping Platform for Context-Aware Mobile Applications," IEEE Pervasive Computing 4, no. 2 (2005): 51–59, https://doi.org/10.1109/mprv.2005.29.

26 Josef Fink and Alfred Kobsa, "A Review and Analysis of Commercial User Modeling Servers for Personalisation on the World Wide Web," User Modeling and User-Adapted Interaction 10, no. 2 (2000): 209–49, https://doi.org/10.1023/a:1026597308943.

27 Hema Yoganarasimhan, "Search Personalization Using Machine Learning," Management Science 66, no. 3 (2020): 1045–70, https://doi.org/10.1287/mnsc.2018.3255.

28 Cristiane Behnert and Dirk Lewandowski, "Ranking Search Results in Library Information Systems—Considering Ranking Approaches Adapted from Web Search Engines," The Journal of Academic Librarianship 41, no. 6 (2015): 725–35, https://doi.org/10.1016/j.acalib.2015.07.010.

29 Enrique Frias-Martinez, George Magoulas, Sherry Chen, and Robert Macredie, "Automated User Modeling for Personalised Digital Libraries," International Journal of Information Management 26, no. 3 (2006): 234–48, https://doi.org/10.1016/j.ijinfomgt.2006.02.006.

30 Enrique Frias-Martinez, Sherry Chen, and Xiaohui Liu, "Evaluation of a Personalised Digital Library Based on Cognitive Styles: Adaptivity vs. Adaptability," International Journal of Information Management 29, no. 1 (2009): 48–56, https://doi.org/10.1016/j.ijinfomgt.2008.01.012.

31 Magda El-Sherbini and George Klim, "Metadata and Cataloging Practices," The Electronic Library 22, no. 3 (2004): 238–48, https://doi.org/10.1108/02640470410541633; Shawn Averkamp and Joanna Lee, "Repurposing ProQuest Metadata for Batch Ingesting ETDs into an Institutional Repository" (University Libraries Staff Publications, 2009): 38; Sai Deng and Terry Reese, "Customized Mapping and Metadata Transfer from DSpace to OCLC to Improve ETD Workflow," New Library World 110, no. 5/6 (2009): 249–64, https://doi.org/10.1108/03074800910954271.

32 Steffen Lohmann, Jürgen Ziegler, and Lena Tetzlaff, "Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration," in Proceedings of the IFIP Conference on Human-Computer Interaction (2009): 392–404, https://doi.org/10.1007/978-3-642-03655-2_43.

33 Yuping Jin, "Development of Word Cloud Generator Software Based on Python," Procedia Engineering 174 (2017): 788–92, https://doi.org/10.1016/j.proeng.2017.01.223.
34 Joel Azzopardi, Dragan Ivanović, and Georgia Kapitsaki, "Comparison of Collaborative and Content-Based Automatic Recommendation Approaches in a Digital Library of Serbian PhD Dissertations," in Proceedings of the International KEYSTONE Conference 2016: 100–11, https://doi.org/10.1007/978-3-319-53640-8_9.

Communication

Navigation Design and Library Terminology: Findings from a User-Centered Usability Study on a Library Website

Isabel Vargas Ochoa

Information Technology and Libraries | December 2020
https://doi.org/10.6017/ital.v39i4.12123

Isabel Vargas Ochoa (ivargas2@csustan.edu) is Web Services Librarian, California State University, Stanislaus. © 2020.

Abstract

The university library at California State University, Stanislaus is not only undergoing a library building renovation, but a website redesign as well. The library conducted a user-centered usability study to collect data in order to best lead the library website "renovation." A prototype was created to assess an audience-based navigation design, homepage content framework, and heading terminology. The usability study consisted of 38 student participants. It was determined that a topic-based navigation design will be implemented instead of an audience-based navigation, a search-all search box will be integrated, and the headings and menu links will be modified to avoid ambiguous library terminology. Further research on different navigation and content designs, and on usability design approaches, will be explored in future studies.

Introduction

The university library at California State University, Stanislaus is currently undergoing a much anticipated and necessary redesign of the library website. Website redesigns are crucial and a part of website maintenance needed to keep pace with modern technology and meet accessibility standards. "If librarians are expected to be excellent communicators at the reference desk and in the classroom, then the library website should complement the work of a librarian."1 In this case, a library website prototype was created, using a Springshare LLC product, LibGuides CMS, as the testing subject for our user-centered usability study. The usability study was completed with 38 student participants belonging to different academic years and areas of study. The library website prototype tested was designed using a user-based design framework and an audience-based navigation.
This study found issues reported by users concerning the navigation design and ambiguous library terminology. An audience-based navigation was chosen in order to organize and group the information and services offered in a way that makes them most accessible to users. However, an audience-based navigation will directly affect users and their search behaviors.2 The prototype, like the current library website, did not have a search-all search box during the study. A catalog search box was utilized to test whether or not the catalog was enough for student participants to find information. This also forced the participants to utilize the menu navigation.

Literature Review

The design and approach of usability studies, preferences for types of search boxes, navigation design, and library terminology evolve over time in parallel with technology changes. Most recent usability studies use screen and audio recording tools as opposed to written observation notes. Participants in recent studies are also more adept at navigating websites than participants in usability studies twenty years ago. Regardless, it's crucial to compare the results from previous usability studies to analyze differences and similarities.

Different types of usability studies include user-centered usability studies and heuristic usability studies. This study chose a user-centered approach because of the library's desire to collect data and feedback from student users. The way in which the usability study is presented is also critical to the approach. Website usability studies are meant to test the website, although participants may unconsciously believe they are being tested. In Tidal's library website case study (2012), researchers assured the participants that "the web site was being tested and not the participants themselves."3 This unconscious belief may also affect the data collected from the participants and "influence user behavior, including number of times students might attempt to find a resource or complete a given task."4

The features tested were the navigation design and homepage elements. The navigation design in the prototype was developed to test an audience-based navigation design (see figure 1). An audience-based navigation design organizes the navigation content by audience type.5 That is to say, the user will begin their search by identifying themselves first. Although this design can organize content in a more efficient manner, especially for organizations that have specific, known audiences, critics argue that this design forces users to identify themselves before searching for information, thus taking them out of their task mindset.6 For this usability study, I wanted to test this navigation design and compare the results to our current navigation design, which is a topic-based navigation design. A topic-based navigation design is developed to present topics as navigation content.7 This design is our current library website navigation design (see figure 2).

Figure 1. Screenshot of the audience-based navigation design developed for the library website prototype.

Figure 2. Screenshot of the current content-based navigation design in the library website.

Designing the navigation and homepage also means choosing accessible terms that are relevant to all users.
Unfortunately, over the course of many decades, library terminology has been a hindrance for student users. Terms such as "catalog," "reference," and "research guides" are still difficult for users to understand. As Conrad states (2019), "students are not predisposed to think of a 'research guide' as a useful tool to help them get started."8 A research guide isn't necessarily a self-explanatory term. In many ways, the phrase is ambiguous. Augustine's case study in 2002 had similar difficulties. Students' "lack of understanding and awareness of library resources impacted their ability more than the organization of the site did."9 It's unsettling to know that our own terminology has been deterring users from accessing library resources for decades. Librarians use library terminology to such an extent that it's part of our everyday language, but what is common knowledge to us may be completely alien to our very own audience.

Not only should libraries be aware of confusing library terms, but content should also not overwhelm the user with an abundance of information. Most students who visit the library are looking for something specific and easy to find. It's important for librarians to condense their information on guides or website pages so as not to frustrate the user or make them search elsewhere, like Google. "Students scan . . . rather than [read] material."10 This is also something that has been noted from our Crazy Egg statistics. Heatmaps of our website's pages show that users are not scrolling to the bottom of the pages. This also applies to the use of large images, or unnecessarily flashy or colorful content that covers most of the desktop or mobile screen. These images should be reduced in size so that users can find information swiftly. For this reason, any large design on the homepage should also be included in menu links, in case large flashy content is ignored.11

The search box is another fundamental element I analyzed. In this case study, our search box was the catalog search box for Ex Libris Primo. If a page, particularly the homepage, has two search boxes—search-all and catalog search—the user can be confused. Search boxes are primarily placed at the center of the page. Depending on how these search boxes are labeled and identified, users may not know which one to use. Students approach library search boxes as if searching Google.12 In our case, neither the current website nor the prototype has a general search-all box. We have a catalog search box placed at the top center of the homepage for both sites. If we were to add a general search-all box, it would be placed away from the catalog search box and preferably in the header, where it is visible on all pages.

Methodology

The usability study was conducted by the author, the web services librarian at California State University, Stanislaus, who also worked with a computer science instructor in order to recruit participants. Not only is the university library redesigning its website, but the university library building is also undergoing a physical renovation. Due to this project, the library has relocated to the Library Annex, a collection of modular buildings providing library services to the campus community. The usability study was conducted in a quiet study room in one of these modular sites. I reserved this study area and borrowed eight laptops for the sessions. The usability study employed two different methods to get students to participate.
The first was an extra credit incentive, arranged in collaboration with the computer science instructor. This instructor was teaching a course on human-centered design for websites. She offered her students an extra credit incentive, since several of her learning objectives centered on website design and usability studies. The second approach was an informal one. It consisted of scouting students who were already at the Library Annex during the scheduled usability study sessions. This enabled students to participate without having to sign up or remember to participate. The students were recruited in person during the usability sessions and through flyers posted in study rooms on the days of the study. An incentive of snacks for students to take home was also included.

I created questions and seven tasks to be handed out to the participants during the study. The tasks were created to test the navigation design of the main menu and the content on the homepage. I also added a task to test the research skills of the student. After these tasks, students were asked to rate the ease of access, answer questions about their experience navigating the prototype, and provide feedback. All students were given the same tasks; however, if the student was taking the human-centered design course, they were also given specific web design questions for feedback (see appendices A and B). The tasks were piloted before the study with three library student workers, who provided feedback on how to better word the tasks for students. The following are the final seven tasks used for the usability study:

1. Find research help on citing legal documents—a California statute—in APA style citation.
2. Find the library hours during spring break.
3. Find information on the library study spaces hours and location.
4. You're a student at the Stan State campus and you need to request a book from Turlock to be sent to Stockton. Fill out the request-a-book form.
5. You are a graduate student and you need to submit your thesis online. Fill out the thesis submission form.
6. For your history class, you need to find information on the university's history in the University Archives and Special Collections. Find information on the University Archives and Special Collections.
7. Find any article on salmon migration in Portland, Oregon. You need to print it, email it to yourself, and you also need the article cited.

The usability study sessions took place from 11 a.m. to 2 p.m. on February 10, 12, and 14, 2020. These days and times were chosen because the snack incentive would attract students during the lunch hour and I wanted to accommodate the start and end times of the human-centered design course on Mondays, Wednesdays, and Fridays. The total time it took for students to complete the seven tasks averaged 15 minutes. In total, there were 38 student participants. The students' experiences were recorded anonymously. I asked students to provide their academic year and major. Students included freshmen (5), sophomores (2), juniors (12), seniors (17), graduate students (1), and unknown (1). Areas of study included computer science (16), criminal justice (2), business (2), psychology (3), communications (1), sociology (1), English (3), nursing (1), Spanish (1), biology (3), geology (1), history (2), math (1), gender studies (1), and undeclared (1).
The subject tested was the library website prototype created and executed using a Springshare LLC product, LibGuides CMS. The tools I used were eight laptops and a screen recording tool, Snagit. Snagit is a recording tool made accessible through a campus subscription. The laptops were borrowed from the library for the duration of the sessions. During the sessions, students navigated and completed the tasks on their own with no direct interference, including no direct observations. I planned to create a space where my presence didn't directly influence or intimidate their experience with the website. My findings were based solely on their written responses and screen recordings. Because students had to sign in to the laptops using their campus student ID, I also explained to them that their screen-recorded video would not be linked to their identity. I did, however, occasionally walk around the tables in the room in case a student was navigating the current website or using a separate site to complete the tasks. Once the students completed the tasks and answered the questions, I collected the handouts and the screen-capture videos by copying them to a flash drive.

Limitations

During the usability study sessions, there were two technical issues that hindered the initial process. On the first day, there were difficulties accessing the campus Wi-Fi in the room as well as difficulties accessing the Snagit video recording application. This limitation affected some of the students' experiences and feedback. These issues were resolved and not present on the second and third days of the study.

Results and Observations

The results and observations collected from this study mirror results from the studies conducted by Azadbakht and Swanson.13 I found that students used the catalog search box to search for library collections, citations, and other library terms they didn't understand, even though it was a catalog search box labeled with the keywords "find articles, books, and other materials" in the search bar. Another finding was that the navigation design can detrimentally affect a user's experience with the website. Mixed reviews were received on the audience-based navigation design. The study also found that students are adept at finding research materials. For example, most students knew how to search, find, print, email, and cite an article. Students in general are also familiar with book requests, ILL accounts, and filling out book request webforms. This indicates that, in terms of utilizing library services, students are well aware of how to find, request, and acquire resources using the website on their own. What was most difficult for students was interpreting library terminology. This was explicitly shown in their attempts to complete tasks 1 and 6: finding how to cite a legal document in APA style and finding information on Special Collections and the University Archives.

The following results and observations are divided into three categories: written responses, video recording observations, and data collected. Data was collected based on observations from the video recordings and the written responses. Data was then input into eight separate charts.
Written Responses Observations

Comments from both the students in the human-centered design course and the other student participants included mixed reviews of the navigation layout, an overall positive outlook on the page layout design, suggestions to add a search-all "search bar," and frustrations with tasks 1 and 6.

Video Recording Observations

The Ex Libris Primo search box was constantly mistaken for a search-all search box. This occurred during students' searches for tasks 1 and 6: citation help and University Archives, respectively. Students also used the research guides search box in LibGuides as a search-all search box. Students found the citation style guides easily because of this feature; however, on the proposed new website it was difficult to find citation help. Students were also using research guides to complete other tasks, such as task 6. A search bar for the entire website was continuously mentioned as a solution by student participants.

Tasks 2 and 3, regarding library hours and study spaces, were easily completed. Tasks 4 and 5 were also easily accessible. After completing task 4 (book request form), it was easier for participants to complete task 5 (thesis submission form) because both tasks required students to search the top main navigation menu. To complete task 4, several students immediately signed in to their ILL account or logged in to Primo for CSU+, which was expected, as signing in to these accounts is an alternate way to request a book. An additional observation regarding task 4 is that confusion around the library term "call number" was resolved by adding an image reference pointing to the call number in the catalog. The call number image reference was opened several times for assistance. Most students completed task 7 (find a research article), but not all students used the catalog search box on the homepage to complete it. Several students searched the top main navigation and clicked on the "research help" link. Others utilized research guides and the research guides search box on the homepage.

A notable observation concerned some computer science students. Most computer science students were quicker to give up on a task than non-computer science students. Some computer science students did not scroll down when browsing pages. These students failed to complete several tasks because they didn't scroll down the page after being on the page for less than ten seconds.

Data Collected

Figure 3. Ease of navigation (overall).

Figure 3 illustrates the ease of navigation rating overall from all student participants. Students were asked to rate the ease of access of the website (see appendices A and B). Other than the keywords "ease of navigation (1 difficult; 10 easy)," students were given the freedom to define what "easy" and "difficult" meant to them individually. The mean ease of access rating for all student participants was 7.7. The lowest rating was 3 and the highest rating was 10.

Figure 4. Ease of navigation (computer science major).

Figure 4 illustrates the ease of access rating by student participants based on whether or not the student was a computer science major. The lowest ease of access ratings were from computer science majors.
Overall, non-computer science majors had higher ease of access ratings than computer science majors.

Figure 5. Ease of navigation (human-centered design).

Figure 5 illustrates the ease of access rating by student participants based on whether the student was taking the human-centered design course. The human-centered design students' learning outcomes include website user-interface design and an assignment on how to create a usability study. Similar to the pattern found in figure 4, human-centered design students had lower ease of access ratings.

Figure 6. Tasks – status of completion.

Figure 6 illustrates whether a task was completed or not. A task was counted as completed if the student found the page(s) that provided the solution to the task, and as not completed if the student was unable to find those page(s). "Not applicable" was recorded if the student did not use the website prototype (e.g., followed a link that led elsewhere or opted to use Google search instead). Most students completed tasks 2, 3, 4, 5, and 7. The task with the most "did not complete" results was task 1, which 64 percent of student participants did not complete. Task 6 had a middling completion rate of 63 percent. 86 percent of students completed tasks 2 and 4, and 90 percent of students completed tasks 3, 5, and 7. It is evident that task 1 was a difficult task to complete, regardless of the student's area of study. Task 1 required students to find APA legal citation help. The terms "APA legal citation" confused users. Likewise, for task 6 (special collections), students did not understand what "collections" referred to or where to search for them.

Figure 7. Tasks – number of clicks (complete).

Figure 7 illustrates how many clicks it took students to complete each task. The clicks were separated into three categories: 1-2 clicks, 3-5 clicks, and more than 6 clicks. This figure only illustrates data collected from tasks that were completed. The count of clicks began at the website prototype's homepage or at the main menu navigation found in the website prototype's header, when it was evident that the student was starting a new task. Tasks 2 and 3 were completed in 1-2 clicks, whereas tasks 1, 4, 5, 6, and 7 required an average of 3-5 clicks. Because of experience helping students find articles at the librarians' research help desk, task 7 (find research articles) was expected to require 6+ clicks. Task 1 may show a pattern of needing a high number of clicks because it was generally a difficult task to complete.

Figure 8. Tasks – number of clicks (did not complete).

Figure 8 illustrates how many clicks a student participant made before they decided to skip the task or believed they had completed it. This figure only illustrates data from tasks that were not completed. The clicks were separated into three categories: 1-2 clicks, 3-5 clicks, and more than 6 clicks. The count of clicks began at the website prototype's homepage or at the main menu navigation found in the website prototype's header, when it was evident that the student was starting a new task.
Tasks 1 and 6 show the clearest patterns in this figure. Task 1 (citation help) shows that students generally skipped the task after more than 6 clicks. Task 6 (special collections) was generally skipped after 3-6+ clicks.

Figure 9 illustrates the duration to complete each task. The duration was separated into three categories: 0-1 minutes, 1-3 minutes, or more than 3 minutes. This figure only illustrates data for tasks that were completed. The duration began when the student started a new task. This was determined when it was observed that the student started to use the main menu navigation, or directed their screen back to the website prototype's homepage. There are parallels between the number of clicks and the duration of tasks. For tasks 2, 3, and 5, the duration to complete the task was less than 1 minute. Task 5 was similar to task 4 (both are forms, linked once on the website), but the duration for task 5 may have averaged lower than the duration of task 4 because task 5 came after task 4. Having completed a form before task 5 may have influenced the students' behavior when searching for forms. Tasks 1, 6, and 7 averaged 1-3 minutes to complete.

Figure 9. Tasks – question duration (complete).

Figure 10. Tasks – question duration (did not complete).

Figure 10 illustrates the duration of each task that wasn't completed. The duration was separated into three categories: 0-1 minutes, 1-3 minutes, or more than 3 minutes. This figure only illustrates data for tasks that were not completed. The duration began when the student started a new task. This was determined when it was observed that the student started to use the main menu navigation, or directed their screen back to the website prototype's homepage. Similar to the observations for figure 7, there are parallels between the number of clicks and the duration of tasks. For task 1, the average time before students skipped the task varied; however, most students who didn't complete the task skipped it after more than 3 minutes of trying to complete it. For task 6, the average duration before skipping the task was 1-3 minutes.

Conclusion and Recommendations

This study was primarily designed to test the user-centered study approach and the navigational redesign of the library website. The results, however, provided the library with a variety of outcomes. Based on suggestions and comments on the website prototype's navigation design, menus, and page content, there are several elements that will be integrated to help lead the redesign of the library's website. Students found that the navigation design of the website was clear and simple, but also required some getting used to. Because of this, and in light of the navigation design literature, it is recommended to design a menu navigation that is topic-based as opposed to audience-based. Our findings also highlighted the effects of the use of library terms. To make menu links exceptionally user-friendly, it is recommended to utilize clear and common terminology. Student participants also voiced that a search-all search box for the website was necessary. This will enable users to access information efficiently.
library website developers should also map more than one link to a specific page, especially if the only link to the page is on an image or slideshow. the user-centered usability approach for this case study worked well in collaboration with campus faculty and as an informal recruitment method. it provided relevant and much-needed data and feedback for the university library. in terms of future usability studies, a heuristic approach may be effective. a heuristic study approach will enable moderators to gather feedback and analysis from library web development experts.14 moreover, the usability study could be conducted over a semester-long period and include focus groups to acquire consistent feedback.15 overall, website usability studies are evolving and require constant improvement and research.

appendix a

major: ___________ year (freshman, sophomore, etc.): ______________
link to site: url
please do not use url

please complete the following situations. for some of these, you don't need to actually submit/send, but pretend as if you are.
1. find research help on citing legal documents (a california statute) in apa style citation.
2. find the library hours during spring break.
3. find information on the library study spaces hours and location.
4. you're a student at the stan state campus and you need to request a book from turlock to be sent to stockton. fill out the request-a-book form.
5. you are a graduate student and you need to submit your thesis online. fill out the thesis submission form.
6. for your history class, you need to find information on the university's history in the university archives and special collections. find information on the university archives and special collections.
7. find any article on salmon migration in portland, oregon. you need to print it, e-mail it to yourself, and you also need the article cited.

complete the following questions.
1. rate the ease of access of the website (1 = really difficult to navigate, 10 = easy to navigate) 1 2 3 4 5 6 7 8 9 10
2. did you ever feel frustrated or confused? if so, during what question?
3. do you think the website provides enough information to answer the above questions? why or why not?

appendix b

cs 3500
major: ___________ year (freshman, sophomore, etc.): ______________
link to site: url
please do not use url

please complete the following situations. for some of these, you don't need to actually submit/send, but pretend as if you are.
1. find research help on citing legal documents (a california statute) in apa style citation.
2. find the library hours during spring break.
3. find information on the library study spaces hours and location.
4. you're a student at the stan state campus and you need to request a book from turlock to be sent to stockton. fill out the request-a-book form.
5. you are a graduate student and you need to submit your thesis online. fill out the thesis submission form.
6. for your history class, you need to find information on the university's history in the university archives and special collections. find information on the university archives and special collections.
7. find any article on salmon migration in portland, oregon. you need to print it, e-mail it to yourself, and you also need the article cited.

then, complete the following questions.
1.
rate the ease of access of the website (1= really difficult to navigate, 10=easy to navigate) 1 2 3 4 5 6 7 8 9 10 2. what did you think of the overall web design? 3. what would you change about the design? please be specific. 4. what did you like about the design? please be specific. information technology and libraries december 2020 navigation design and library terminology | ochoa 15 endnotes 1 mark aaron polger, “student preferences in library website vocabulary,” library philosophy and practice, no. 1 (june 2011): 81, https://digitalcommons.unl.edu/libphilprac/618/. 2 jakob nielsen, “is navigation useful?,” nn/g nielsen norman group, https://www.nngroup.com/articles/is-navigation-useful/. 3 junior tidal, “creating a user-centered library homepage: a case study,” oclc systems & services: international digital library perspectives 28, no. 2 (may 2012): 95, https://doi.org/10.1108/10650751211236631. 4 suzanna conrad and christy stevens, “‘am i on the library website?’: a libguides usability study,” information technology and libraries (online) 38, no. 3 (september 2019): 73, https://doi.org/10.6017/ital.v38i3.10977. 5 eric rogers, “designing a web-based desktop that's easy to navigate,” computers in libraries 20, no. 4 (april 2000): 36, proquest. 6 katie sherwin, “audience-based navigation: 5 reasons to avoid it,” nn/g nielsen norman group, https://www.nngroup.com/articles/audience-based-navigation/. 7 rogers, “designing a web-based desktop that's easy to navigate,” 36. 8 conrad, “‘am i on the library website?’: a libguides usability study,” 71. 9 susan augustine and courtney greene, “discovering how students search a library web site: a usability case study,” college & research libraries 63, no. 4 (july 2002): 358, https://doi.org/10.5860/crl.63.4.354. 10 conrad, “‘am i on the library website?’: a libguides usability study,” 70. 11 kate a. pittsley and sara memmott, “improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides,” information technology and libraries 31, no. 3 (september 2012): 54, https://doi.org/10.6017/ital.v31i3.1880. 12 elena azadbakht, john blair, and lisa jones, “everyone's invited: a website usability study involving multiple library stakeholders,” information technology and libraries 36, no. 4 (december 2017): 43, https://doi.org/10.6017/ital.v36i4.9959. 13 azadbakht, “everyone's invited,” 43; troy a. swanson and jeremy green, “why we are not google: lessons from a library web site usability study,” the journal of academic librarianship 37, no. 3 (february 2011): 226, https://doi.org/10.1016/j.acalib.2011.02.014. 14 laura manzari and jeremiah trinidad-christensen, “user-centered design of a web site for library and information science students: heuristic evaluation and usability testing ,” information technology and libraries 25, no. 3 (september 2006): 164, https://doi.org/10.6017/ital.v25i3.3348. 15 tidal, “creating a user-centered library homepage: a case study,” 97. 
article

beyond viaf: wikidata as a complementary tool for authority control in libraries

carlo bianchini, stefano bargioni, and camillo carlo pellizzari di san girolamo

information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12959

abstract
this paper aims to investigate the reciprocal relationship between viaf® and wikidata and their possible roles in the semantic web environment. it deals with their data, their approach, their domain, and their stakeholders, with particular attention to identification as a fundamental goal of universal bibliographic control. after examining interrelationships among viaf, wikidata, libraries and other glam institutions, a double approach is used to compare viaf and wikidata: first, a quantitative analysis of viaf and wikidata data on personal entities, presented in eight tables; and second, a qualitative comparison of several general characteristics, such as purpose, scope, organizational and theoretical approach, data harvesting and management (shown in table 9). quantitative data and qualitative comparison show that viaf and wikidata are quite different in their purpose, scope, organizational and theoretical approach, data harvesting, and management. the study highlights the reciprocal role of viaf and wikidata and its helpfulness in the worldwide bibliographical context and in the semantic web environment and outlines new perspectives for research and cooperation.

introduction
in 2011, the library linked data incubator group, a w3c working group with the aim "to help increase global interoperability of library data on the web," published its final report. two interrelated issues were tackled in that milestone report: what libraries can do for the semantic web and what the semantic web can do for libraries. linked data is an important asset for libraries as the "use of identifiers allows diverse descriptions to refer to the same thing. through rich linkages with complementary data from trusted sources, libraries can increase the value of their own data beyond the sum of their sources taken individually."1 so linked data greatly contribute to library cataloguing work not just for description of resources but also for their proper identification. on the other hand, libraries have always created and curated a significant amount of valuable information assets and library authority data for names and subjects to help reduce "redundancy of bibliographic descriptions on the web by clearly identifying key entities that are shared across linked data.
this will also aid in the reduction of redundancy of metadata representing library holdings."2 the report opened a new way of thinking about universal bibliographic control (ubc), a "worldwide system for control and exchange of bibliographic information" (https://archive.ifla.org/ubcim/ubcim-archive.htm), the purpose of which is "to make universally and promptly available, in a form which is internationally acceptable, basic bibliographic data on all publications in all countries."3 exchanging information and data requires standards, at both the national and international level, for description, identification, and data format.

carlo bianchini (carlo.bianchini@unipv.it) is associate professor, department of musicology and cultural heritage, university of pavia. stefano bargioni (bargioni@pusc.it) is deputy director, library of the pontifical university santa croce (rome). camillo carlo pellizzari di san girolamo (camillo.pellizzaridisangirolamo@sns.it) is graduate student, department of classics, university of pisa and scuola normale superiore. © 2021.

nowadays, a pillar of ubc is viaf® (the virtual international authority file), a worldwide project designed by a few national libraries and run by oclc, which combines multiple name authority files with the goal "to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the web [https://www.viaf.org/]." it "clusters together the various forms of names for an entity" and has become "a major source for authority control and is becoming the collective reference source at the international level."4 viaf is a fundamental tool for the identification of entities (people, locations, works, and expressions) relevant for the bibliographic universe. yet, as it is based on the harvesting of data from authoritative national libraries spread all over the world, it has a top-down approach: libraries and services that are not viaf sources can only refer to viaf, but not actively cooperate with it, and, by its nature, viaf cannot admit user cooperation. therefore, on a global scale, a very large number of local libraries are excluded, and their data, collections, and specificities are, too.
furthermore, since the design and development of viaf at the beginning of the 21st century, the semantic web environment has hugely evolved, and libraries are more and more required to act in new directions and to explore new forms of cooperation.5 illien and bourdon maintain not only that libraries "must now be careful to keep up their own interoperability," but also that they "would be well-advised to keep up or enter into dialogue with the most influential communities in the web of data—smoothing out their own disputes in the meantime."6 moreover, they believe that "building collaborative authority registries linked to standardized identifiers is one of the fundamental cornerstones of the new universal bibliographic control."7 also, dunsire and willer suggest that a "smart ubc should strive to support all those who wish to think globally and act locally, with a better mix of bottom-up and top-down methodologies," since "attempts to implement ubc as a worldwide system for the control and exchange of bibliographic information using top-down methodologies have only partially succeeded at global scale."8 as a result, a better integration of libraries into the semantic web seems to require the involvement of a larger group of stakeholders—such as non-national agencies, museums, archives, and users—and the adoption of a complementary bottom-up approach.

a new global actor of the semantic web has both a bottom-up and a very inclusive approach: wikidata. wikidata is a freely available hosted platform that anyone—including libraries—can use to create, publish, and use linked open data (lod). since 2012, many users have been involved in a bottom-up approach to identity management in wikidata. furthermore, interest in and experience with the use of wikidata to publish lod among glam (galleries, libraries, archives, and museums) institutions is constantly increasing.9 wikidata's role as an important tool for the identification of entities of any kind—not just those of traditional importance to glam—has likewise been increasingly recognized in recent years.10

so, two worldwide identification tools, two different backgrounds, two opposite approaches. are they mutually exclusive, or integrable? is one of them sufficient for libraries' needs, or do libraries need both? which stakeholders are best served by viaf? which are best served by wikidata? this paper investigates the reciprocal relationship between viaf and wikidata and their possible specific roles in the semantic web environment with respect to their approach, their domain, and their stakeholders, with particular attention to identification as a fundamental goal of ubc.

relationship between viaf and libraries
viaf gathers a huge quantity of authority data from more than 50 sources, listed on the home page of the project (https://viaf.org). millions of records coming from national libraries and other institutions are continuously processed using algorithms based on the matching of data and bibliographic relationships, with the goal of creating clusters of names (figure 1).11

figure 1. viaf cluster for wolfgang amadeus mozart

clusters are usable in many services "to identify names, locations, works, and expressions while preserving regional preferences for language, spelling, and script" (https://www.oclc.org/en/viaf.html). clusters may contain one or more ids from viaf sources.
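the source ids gathered in a cluster such as the one in figure 1 can also be inspected programmatically. the sketch below is not part of the article's toolchain; it assumes viaf's justlinks.json view of a cluster (a mapping from source codes such as lc, dnb, or wkp to identifiers), which should be verified against the current service, and it uses the mozart cluster id shown in figure 1.

```python
"""sketch only: list the source ids gathered in one viaf cluster,
assuming the justlinks.json view of the cluster."""
import requests

def cluster_sources(viaf_id: str) -> dict:
    url = f"https://viaf.org/viaf/{viaf_id}/justlinks.json"
    response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
    response.raise_for_status()
    data = response.json()
    # keep only the per-source id lists, dropping bookkeeping keys such as "viafID"
    return {source: ids for source, ids in data.items() if isinstance(ids, list)}

if __name__ == "__main__":
    # 32197206 is the mozart cluster shown in figure 1
    for source, ids in cluster_sources("32197206").items():
        print(source, ids)
```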
furthermore, the unique identifiers of clusters (a viaf id, e.g., https://viaf.org/viaf/7524651/) are freely reusable and are reused by other institutions to add useful information to their catalogues, open up new paths of information for the end user, contribute local data to the linked data cloud, and much more.12

data sources are selected and approved by the viaf council (see https://www.oclc.org/en/viaf/contributing.html) and may belong to two categories: viaf contributors, usually national lam (libraries, archives, museums) agencies, admitted following very selective criteria; and other data providers, i.e., "other selected sources (e.g., wikipedia [sic]) that are not viaf contributor agencies."13 other data providers include isni and wikidata (even if wikidata is often confused with wikipedia, as in the quotation above).14 while contributors are eligible to appoint a representative to the viaf council, other data providers are not. so, viaf is based on a rigid three-level hierarchical approach: viaf, viaf contributors, and other data providers. all the other national and local institutions, i.e., relevant national data producers that are not national agencies, cannot provide data to viaf; instead, they are expected to benefit from the use of viaf ids after performing a reconciliation process of their own data with viaf ids. however, the benefits may not be completely satisfactory in terms of data quality: while viaf deals with "widely-used authority files," it can be supposed that the libraries of non-national agencies need authority data that are more relevant on a local or specialist basis. lastly, while the viaf guidelines state that viaf participants should periodically send updated data to viaf, it is not clear when and how viaf retrieves and collects data from other data providers (https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf).

relationships between wikidata and academic, research, and public libraries
wikidata was launched in 2012 by the wikimedia foundation as the central storage of the structured data from all wikimedia foundation projects; it is "a freely available hosted platform that anyone—including libraries—can use to create, publish, and use lod."15 wikidata stores stable and common information about entities, i.e., items and properties, and interlinks between different wikimedia projects, in a form compliant with the rdf model (see https://www.mediawiki.org/wiki/wikibase/datamodel/primer). additionally, wikidata uses triples and enriches them with qualifiers and references.16 qualifiers allow adding specifications about the validity of a statement (start/end date, precision, obsolescence, series ordinal, etc.); references are fundamental to justify the data, i.e., to document the authority data creator's reason for choosing the name or form of name on which a controlled access point is based.17 wikidata uses the software wikibase (https://wikiba.se/), which is "an open-source software suite for creating collaborative knowledge bases" whose "data model prioritizes language independence and knowledge diversity." the wikibase open-source software, which is currently used by more than thirty institutions, supports federated sparql queries.18 wikibase's approach and characteristics are particularly interesting for the library world.
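as a concrete illustration of the statement model just described, the following sketch (not the authors' code) reads the viaf id statements of a single item, together with their qualifiers and references, through the public wikidata api; p214 is the viaf id property discussed later in the article, and q1339 (the item behind the bach example in the discussion section) is used only as an example.

```python
"""sketch only: read the viaf id (p214) statements of one wikidata item,
including their qualifiers and references, through the public wikidata api."""
import requests

API = "https://www.wikidata.org/w/api.php"

def viaf_statements(qid: str):
    params = {"action": "wbgetentities", "ids": qid, "props": "claims", "format": "json"}
    entity = requests.get(API, params=params, timeout=30).json()["entities"][qid]
    for claim in entity.get("claims", {}).get("P214", []):
        yield {
            "viaf_id": claim["mainsnak"]["datavalue"]["value"],
            "rank": claim["rank"],
            # qualifiers and references are attached to the statement itself,
            # as described above for the wikibase data model
            "qualifier_properties": list(claim.get("qualifiers", {})),
            "reference_count": len(claim.get("references", [])),
        }

if __name__ == "__main__":
    for statement in viaf_statements("Q1339"):
        print(statement)
```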
gemeinsame normdatei (gnd) created a working group with wikimedia deutschland in order to “debate whether wikibase is suitable for the needs of existing authority files coming from libraries” (https://wiki.dnb.de/display/gnd/authority+control+meets+wikibase); in march 2020 it was stated that the cooperation “has proven successful” and the current aim is to “develop a wikibasebased gnd and put it into use” (https://wiki.dnb.de/pages/viewpage.action?pageid=167019461). similarly, the bibliothèque nationale de france (bnf) and the agence bibliographique de l'enseignement supérieur (abes) launched the joint french national entities file (fne), which in 2019 carried out “a proof of concept to investigate the feasibility of using the software https://www.oclc.org/en/viaf/contributing.html https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf https://www.mediawiki.org/wiki/wikibase/datamodel/primer https://wikiba.se/ https://wiki.dnb.de/display/gnd/authority+control+meets+wikibase https://wiki.dnb.de/pages/viewpage.action?pageid=167019461 information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 5 infrastructure of wikibase to support the fne.”19 a synthesis of the proof of concept, published in july 2020, mentioned, among the decisions taken, the choice to develop fne to build on wikibase (https://www.transition-bibliographique.fr/wp-content/uploads/2020/07/synthese-preuveconcept-fne.pdf). fne is scheduled to be launched in the next few years (https://f.hypotheses.org/wpcontent/blogs.dir/2167/files/2020/02/20200128_8_versunfichiernationaldentites.pdf ). even more interestingly, between 2017 and 2018, oclc explored a linked data wikibase prototype; the final report shows, among other results, that “the building blocks of wikibase can be used to create structured data with a precision that exceeds current library standards” and that “to populate knowledge graphs with library metadata, tools that facilitate the import and enhancement of data created elsewhere are recommended [. . . and . . .] the pilot underscored the need for interoperability between data sources, both for ingest and export.”20 in late 2019, the ifla wikidata working group was formed “to explore and advocate for the use of and contribution to wikidata by library and information professionals, the integration of wikidata and wikibase with library systems, and alignment of the wikidata ontology with library metadata formats such as bibframe, rda, and marc” (https://www.ifla.org/node/92837). on the wikimedia side, in 2019 the ld4-wikidata affinity group (ld4 stands for “linked data for”) was created by hilary thorsen, wikimedian in residence at stanford university, to understand “how the library can contribute to and leverage wikidata as a platform for publishing, linking, and enriching library linked data” (https://wiki.lyrasis.org/display/ld4p2/ld4wikidata+affinity+group). libraries’ interest in wikidata is usually focused on lod and semantic discovery, not on authority control: “libraries may each use different, unique, or select identifiers and authority control methods for disambiguation. increasingly, wikidata is becoming an important tool for synchronizing across identifiers like virtual international authority file (viaf) and orcid identifiers. 
integrating awareness of wikidata and its uses for enhancing metadata and linked open data will help advance a more interconnected research web."21 identification is a key issue both in bibliographic control and in the semantic web environment, as john riemer noted: "recent examination of the efforts involved in what we have historically called authority control in the pcc community has led us to the conclusion that the primary emphasis should be on identity management."22 as a matter of fact, wikibase and wikidata's approach to authority control and bibliographic description is quite new: not only does the traditional distinction between authority and bibliographic data disappear in a wikibase description, but wikidata is to be considered first of all as an identity management tool for any kind of entity.23

relationship between viaf and wikidata
the first attempt at cooperation between viaf and wikidata goes back to 2012, when maximilian klein and alex kyrios, wikipedians in residence at oclc and the british library, respectively, developed a project to integrate authority data from viaf with english wikipedia biographical articles. the project successfully "added authority data to hundreds of thousands of articles on the english wikipedia," but above all showed that "linking of data represents an opportunity for libraries to present their traditionally siloed data, such as catalogue and authority records, in more openly accessible web platforms."24 at the time, wikidata was taking its first steps, but later authority data were successfully transferred from english wikipedia to wikidata.

at present, the connection between wikidata and viaf is very strong. both viaf and wikidata are founded on a strict authority control that is built on a few cataloguing principles. in particular, both apply the principle that the authorized access point "for the name of an entity should be recorded as authority data along with identifiers for the entity and variant forms of name."25 in addition, wikidata is a data provider in viaf, while viaf ids are constantly recorded and updated in wikidata items. at present, wikidata has 8,304,947 personal items, out of which 2,061,046 items have a viaf id. moreover, each month a wikidata bot (https://www.wikidata.org/wiki/user:krbot) updates links in wikidata items to redirected viaf clusters and removes links to abandoned viaf clusters.
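the monthly maintenance just described amounts to asking viaf what has become of each recorded cluster id. the sketch below is a hedged illustration, not the bot's actual code: it assumes that viaf answers a request for a merged cluster with an http redirect to the surviving cluster and a request for a missing one with an error status, which reflects observed web behaviour rather than a documented api contract.

```python
"""sketch only: ask viaf what has become of a cluster id. the mapping of http
responses to statuses below (redirect = merged, error = abandoned/unavailable)
is an assumption, not a documented api contract."""
import requests

def viaf_status(viaf_id: str) -> dict:
    response = requests.get(f"https://viaf.org/viaf/{viaf_id}/",
                            allow_redirects=False, timeout=30)
    if response.status_code in (301, 302):
        # merged/redirected cluster: the surviving cluster id ends the location header
        target = response.headers.get("Location", "").rstrip("/").rsplit("/", 1)[-1]
        return {"id": viaf_id, "status": "redirected", "target": target}
    if response.status_code == 200:
        return {"id": viaf_id, "status": "active", "target": viaf_id}
    return {"id": viaf_id, "status": "abandoned or unavailable", "target": None}

if __name__ == "__main__":
    # 7524651 is the example viaf id cited earlier in the article
    print(viaf_status("7524651"))
```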
the relevance of viaf to the wikidata information ecosystem is evident in the visualization of external identifiers in the items: viaf ids, represented on wikidata by property p214 (https://www.wikidata.org/wiki/property:p214), are automatically sorted as the first external identifier, preceded by the group of iso standards and followed by the group of viaf sources.26 using specific gadgets, i.e., enhancements of the edit interface, registered wikidata users can add to a specific item the ids of single viaf sources, extracting them from the viaf id(s) present in the item.27 unfortunately, there is no automatic reciprocity between viaf and wikidata: when a wikidata item gets a link to a viaf cluster, viaf does not have an automated way to add a reciprocal link to the wikidata item. likewise, when a viaf cluster gets a link to a wikidata item, wikidata has no automatic way to add a reciprocal link to the viaf cluster. another very important aspect of the viaf-wikidata relationship is that wikidata uploads data from viaf only through the voluntary work of wikidata users; this approach applies to national library data and to any other data, too. when available, viaf ids are typically one of the most important elements used by users to decide the identity of a wikidata item.

wikidata controls on viaf
in wikidata, the use of constraints—i.e., rules that check the appropriate use of a property (https://www.wikidata.org/wiki/help:property_constraints_portal)—enables easy discovery of possible inconsistencies in statements, both in data and in external identifiers. weekly, a wikidata bot (https://www.wikidata.org/wiki/user:krbot2) updates the database reports containing the constraint violations for each property, so that users can check the issues and try to fix them. users can also check constraint violations in real time using the appropriate queries linked in the talk page of each property. regarding viaf ids, two types of constraint violations are particularly relevant both for data entry and for the present paper:

• "single value" violations, i.e., one item has two or more viaf ids. this means either that one or more viaf ids are not related to the item, so that the non-pertinent viaf ids should be removed from the wikidata item, or that more than one viaf id exists for the same real entity, so that all the existing viaf ids must be kept in the wikidata item until viaf merges them. an example of a merge performed by viaf, perhaps on the basis of the corresponding wikidata item, can be found in iulius rufinianus (https://www.wikidata.org/wiki/q28131664), where the eight distinct viaf ids contained in the wikidata item on september 24, 2019, have now been merged (https://www.wikidata.org/w/index.php?title=q28131664&oldid=1001570078); in april 2021, the wikidata item for alaricus i (https://www.wikidata.org/wiki/q102371) contains four viaf ids (but there were ten on june 29, 2020; https://www.wikidata.org/w/index.php?title=q102371&oldid=1220309663).

• "unique value" violations, i.e., two or more wikidata items have the same viaf id. this violation not only signals a possible error on the wikidata side, but could also imply an error in viaf. in the former case, either one or more wikidata items have a non-pertinent viaf id, to be removed, or the same entity is referred to by two or more wikidata items, to be merged. in the latter case, the viaf id conflates two or more distinct entities in one cluster. an example of conflation is the cluster at https://viaf.org/viaf/57898554/, where the painter herbert e. abrams (1920–2003; https://www.wikidata.org/wiki/q4117019) and the physician herbert l. abrams (1920–2016; https://www.wikidata.org/wiki/q23665535) are conflated. in that case, wikidata users can report the viaf conflation error in the proper wikidata error-report pages.28
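both kinds of violation can be surfaced with the query service mentioned above. the sketch below illustrates only the first case ("single value"), restricted to the personal items analysed in this study; on the live service such an aggregation may need to be narrowed further, or taken from the weekly constraint-violation reports instead.

```python
"""sketch only: list person items carrying more than one viaf id (p214), i.e.,
candidates for "single value" constraint violations, via the wikidata query
service. the real weekly reports are produced differently by the bot."""
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item (COUNT(DISTINCT ?viaf) AS ?ids) WHERE {
  ?item wdt:P31 wd:Q5 ;     # instance of: human, the class analysed in this study
        wdt:P214 ?viaf .    # viaf id
}
GROUP BY ?item
HAVING (COUNT(DISTINCT ?viaf) > 1)
LIMIT 20
"""

def single_value_violations():
    response = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "beyond-viaf-example/0.1 (demo script)"},
        timeout=120,
    )
    response.raise_for_status()
    for row in response.json()["results"]["bindings"]:
        print(row["item"]["value"], row["ids"]["value"])

if __name__ == "__main__":
    single_value_violations()
```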
in most cases just a few weeks are required for viaf to merge clusters regarding the same entity when wikidata includes them in the same item, but cases of conflation are fixed more slowly. while updates to viaf clusters and ids are obviously necessary and welcome, they are somewhat risky for viaf contributors, providers, and users that base the consistency of their data on viaf: national libraries could import incorrect data into their own files, and wikidata could import wrong national library ids, referring to different entities, into the same wikidata item. there is no evidence that the error-report pages created and updated by wikidata users are being systematically taken into consideration by viaf to solve its conflations.

recently, other issues in the use of viaf as a source were raised when viaf removed very important information about its cluster merging process, information that is no longer available to worldwide libraries and users. the viaf data dump page (http://viaf.org/viaf/data) is refreshed monthly and, until april 2020, it included a persist file. for example, the february 2020 dump, viaf-20200203-persist-rdf.xml.gz, contained data about redirected clusters and—potentially—abandoned clusters as well. this information is essential to the prompt and safe synchronization of local data with viaf clusters. in this dump, each redirected cluster was described with an explicit statement, while every abandoned cluster (14,692,237 out of 24,030,176!) was erroneously described by an empty xml statement that omits the specific information about the abandoned cluster. to obtain this invaluable information again, we filed a bug by email.29 the decision taken was drastic: starting in may 2020, viaf stopped including this information in its monthly dump, as stated at the bottom of the page itself.30 as a result, the only recourse available to viaf contributors or any other institution that would synchronize their authority records with viaf identifiers is to rely on an external identification tool such as wikidata!

materials and methods
any comparison between viaf and wikidata must consider their different content. viaf contains personal name clusters, corporate name clusters, geographic name clusters, and work clusters, whereas wikidata allows items to describe any kind of entity relevant in the universe of discourse of the users' data, irrespective of its bibliographic nature.
even if all kinds of viaf clusters are relevant for bibliographic control, this study is limited to the analysis of personal name clusters in viaf and of items having "instance of: human" (p31:q5) in wikidata, because they are by far the most represented in viaf and they can be directly compared.31 some entities, such as mythological persons, legendary persons, etc., that are personal clusters in viaf are not treated as humans in wikidata and belong to other instances (e.g., https://www.wikidata.org/wiki/q95074).

a double approach was used to compare viaf and wikidata: first, data analyses of viaf and wikidata were performed to compare viaf clusters and wikidata items and to investigate their reciprocal relationships (see the data analysis section); second, a comparison of several general characteristics, such as scope, objectives, philosophy, authority control, and identification, was made based on the respective websites and available literature to find and highlight differences and similarities.

full viaf dumps are available in native xml, rdf, marc-21 xml, or iso-2709 marc-21 (http://viaf.org/viaf/data/). viaf clusters were analyzed using an xml dump published on september 6, 2020 (http://viaf.org/viaf/data/viaf-20200906-clusters.xml.gz). full wikidata dumps are available in xml, json, or rdf.32 however, given the size of the entire dataset, it is much more convenient to create customized rdf dumps using the tool wdumper (https://wdumps.toolforge.org/). all the information (settings, size, and date of the base dump) about dumps created using wdumper remains recorded (https://wdumps.toolforge.org/dumps). wikidata items were analyzed using a customized rdf dump updated to september 14, 2020 (https://wdumps.toolforge.org/dump/732). the customized dump contains all statements with non-deprecated values33 present in items having both "instance of: human" (p31:q5) in best rank and at least one value of "viaf id" (p214) in best rank. both dumps were parsed using three perl scripts. dumps and scripts were uploaded to zenodo and are all available for analysis and reuse.34 the perl scripts generate json data that are published on the html page http://catalogo.pusc.it/beyond_viaf/, where they are interpreted by javascript scripts in order to populate eight tables: three dedicated to viaf (tables 1–3) and five to wikidata (tables 4–8).
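for readers who want to reproduce this kind of counting without perl, the sketch below streams a gzipped n-triples dump and tallies the viaf ids per item. it is not one of the authors' scripts (those are on zenodo): the file name is a placeholder, and the sketch assumes the dump serializes truthy statements in the wdt: (prop/direct) namespace.

```python
"""sketch only: stream a gzipped n-triples dump produced with wdumper and
count the viaf ids per item."""
import gzip
from collections import defaultdict

WDT_P214 = "<http://www.wikidata.org/prop/direct/P214>"

def count_viaf_ids(path: str) -> None:
    viaf_ids_per_item = defaultdict(set)
    with gzip.open(path, "rt", encoding="utf-8") as triples:
        for line in triples:
            parts = line.split(" ", 2)
            if len(parts) < 3:
                continue
            subject, predicate, rest = parts
            if predicate == WDT_P214:
                # the object is a quoted literal such as "7524651"
                viaf_ids_per_item[subject].add(rest.split('"')[1])
    total_ids = sum(len(ids) for ids in viaf_ids_per_item.values())
    print(f"{len(viaf_ids_per_item)} items carry {total_ids} viaf ids")

if __name__ == "__main__":
    count_viaf_ids("wikidata-humans-with-viaf.nt.gz")  # hypothetical local dump file
```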
in order to select the statements to be analyzed in wikidata items, three sets of relevant properties were found through three distinct sparql queries at the end of september 2020: viaf members (table 5), authority controls related to libraries that are not viaf members (table 6), and biographical dictionaries (table 7).35 at the beginning of october 2020, another sparql query was performed to find all the personal items containing the authority controls related to libraries that are not viaf members (table 6, column 4), without filtering the search to personal items having at least one value of "viaf id" (p214).36

data analysis: viaf clusters and wikidata items
for this paper, two different versions of the data tables were produced: the first version, available at http://catalogo.pusc.it/beyond_viaf/, is a full, commented, and dynamic version of all the tables. within that version, links to the acronyms (such as lc, dnb, sudoc, etc.) of all the viaf contributors and other data providers are available too. static versions of these tables are included in this paper with commentary.

viaf
viaf has 22,099,715 personal clusters, half of which (50.90%; table 1, col. 2) are isolated clusters (i.e., they contain only one id). the presence of isolated clusters is interesting because it means that those clusters were created from data coming from just one source. what is more, the percentage of isolated clusters is much higher (71.19%; table 1, col. 12) if just viaf contributors are taken into account (i.e., excluding isolated clusters due to data from other data providers, such as isni). it is worth noting that other data providers can form isolated clusters, with the relevant exception of wikidata (for which viaf uses the acronym wkp), which never appears in isolated clusters (table 1, cols. 7 and 8).

table 1. viaf personal clusters by number of sources [adapted from http://catalogo.pusc.it/beyond_viaf/#tb1]

the total number of ids present in viaf clusters is 51,327,847 (table 2), distributed in 22,099,715 clusters; the most relevant contributors include lc (7,266,628 ids), dnb (5,677,731 ids), sudoc (3,278,189 ids), and nta (2,754,036 ids), while the most relevant other data providers are isni (8,455,814 ids) and wkp (2,148,680 ids) (table 2). apart from lc and dnb, data about isolated clusters (table 2, col. 5) show that the number of isolated clusters tends to slowly decrease over time and that clustering has improved: recently added sources tend to have a higher share of isolated ids. another relevant figure is that sources in non-latin alphabets usually have higher shares of isolated ids.37 so, a high number of isolated clusters may reveal a source that still partially needs to be gathered into existing clusters.
table 2. viaf personal clusters by source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb2]

the histories of viaf clusters, as contained in the xml dumps, appear weird and incoherent. for example, many viaf contributors in their first year of appearance seem to have no additions and many removals (e.g., the bav row; for complete information see table 3 on the website at http://catalogo.pusc.it/beyond_viaf/#tb3). the incoherence is due to the absence of redirected and abandoned clusters in the data. nevertheless, the histories allow us to reconstruct the year of first contribution of each source—information otherwise unavailable—and to detect major changes in the data provided to viaf by each source.38

table 3. viaf history of personal clusters by source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb3]

wikidata
wikidata has 8,304,947 personal items, and 2,061,046 of them contain a viaf id. usually one or more viaf sources are extracted from the viaf id(s), so that 1,905,470 personal items containing a viaf id have at least one viaf source id (table 4, col. 1). wikidata records ids from a wide range of other resources, such as non-viaf bibliographic agencies and biographical dictionaries (investigated in these tables), but also encyclopedias and various online databases. considering the 2,061,046 items containing a viaf id, 684,367 items contain only one viaf source id (table 4, col. 1), but only 353,710 items contain only one id from among viaf source ids, non-viaf source ids, and biographical dictionary ids (table 4, col. 15); so, more than 300,000 items containing only one viaf source id have at least one non-viaf source id and/or one biographical dictionary id.

table 4. wikidata personal items (pers. it.) by number of ids [adapted from http://catalogo.pusc.it/beyond_viaf/#tb4]

viaf and wikidata: a data comparison
from a quantitative perspective, wikidata personal items (8,304,947) are 37.58% of viaf personal clusters (22,099,715), while wikidata personal items having a viaf id (2,061,046) are 9.26%. ids from viaf sources present in wikidata personal items containing a viaf id (6,292,778; table 5, col. 3) are 12.91% of the ids present in viaf personal clusters (48,740,933; table 5, col. 4). in the authors' opinion, a quantitative comparison between viaf and wikidata must be carefully considered. it could be argued that this is a noticeable disadvantage of wikidata with respect to viaf, but that would be true only from a bibliographic control perspective, and the other side of the coin must be examined too. as wikidata represents any kind of entity relevant for its users (libraries, archives, museums, and many other stakeholders), viaf contains just over a third of wikidata items (37%).
furthermore, a very large part of the personal entities represented in wikidata (at present, more than 6,200,000, i.e., about 75%) cannot rely on viaf for identification purposes (for example, because wikidata personal items can also represent singers, lawyers, pilots, and so on). it can be concluded that viaf can be considered just one specialized source, in the domain of the semantic web and with respect to the objectives of wikidata. considering single viaf sources, wikidata surpasses viaf by number of ids only in two cases, perseus (135.18%) and simacob (102.17%) (table 5, col. 5). this is possible because wikidata and viaf gather different sets of data from both sources: the former uses sets of data obtained by its users, while the latter uses only the data sent by the contributor. all the other sources, because of the absence of systematic imports, are much rarer in wikidata than in viaf.

table 5. wikidata personal items (pers. it.) by viaf source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb5]

table 6 and table 7 show authority control in wikidata alongside viaf. wikidata contains some non-viaf sources (usually non-national libraries or groups of libraries which could not become viaf contributors); their ids in personal items having a viaf id (894,161) are 86.04% of their ids in all personal items (958,206; table 6, col. 4), meaning that wikidata provides a clusterization for more than 64,000 ids (6%) probably corresponding to non-existent viaf clusters (table 6, totals).

table 6. wikidata personal items (pers. it.) by non-viaf sources [adapted from http://catalogo.pusc.it/beyond_viaf/#tb6]

table 7. wikidata personal items (pers. it.) by biographical dictionary [adapted from http://catalogo.pusc.it/beyond_viaf/#tb7]

in general, the presence of ids of biographical dictionaries (796,609 ids in total) in 725,755 personal items having a viaf id helps significantly in the definition of authoritative dates of birth and death (table 7, total of column 2, and table 4, total of column 12).

a comparison between table 1, column 7, and table 2, row wkp (the acronym for wikidata wrongly used by viaf) shows that 2,147,319 clusters contain 2,148,680 wkp ids; this means that, from a viaf point of view, wikidata duplicates are only 1,361. furthermore, a comparison between the total and row 0 in table 8, col. 1, shows that 2,061,046 items contain at least one viaf id and that 2,037,638 items contain exactly one viaf id; so, items containing one or more viaf duplicates number 23,408. as a result, it can be concluded that the percentage of duplicates in wikidata is less than 0.01% and in viaf is about 0.01%, so wikidata is as trustworthy as viaf.
viaf and wikidata not only are able to discover reciprocal duplicates, but also discover duplicates in viaf sources, by a comparison between table 8, col. 3—containing the total number of the cases in which a viaf source has at least one duplicate—and table 8, col. 5—containing the total number of the cases in which viaf sources are duplicated. however, while duplicates recorded by viaf are findable only by querying the monthly dumps using in-house–made programs, duplicates discovered by wikidata are easily findable through sparql queries detecting single-value constraint violations. table 8. wikidata personal items (pers. it.) by repeated viaf sources and viaf source ids [adapted from http://catalogo.pusc.it/beyond_viaf/#tb8] discussion viaf and wikidata are quite different in their purpose, scope, organizational and theoretical approach, data harvesting and management. a major difference between viaf and wikidata is in their purpose: on the one hand, viaf aims to identify bibliographic entities and to connect authority data provided by selected contributors (national libraries, cultural agencies, and other major institutions) and extracted from other data providers (such as isni, rism or de663, wikidata, etc.) through the creation of clusters by means of software. on the other hand, like isni, wikidata focuses on both identification and description of entities and has the purpose of building collaboratively a database concerning the sum of all relevant knowledge—provided that each item complying with its notability criteria is accepted— using a crowdsourced approach (https://www.wikidata.org/wiki/wikidata:notability). http://catalogo.pusc.it/beyond_viaf/#tb1 http://catalogo.pusc.it/beyond_viaf/#tb2 http://catalogo.pusc.it/beyond_viaf/#tb8 http://catalogo.pusc.it/beyond_viaf/#tb8 http://catalogo.pusc.it/beyond_viaf/#tb8 http://catalogo.pusc.it/beyond_viaf/#tb8 https://www.wikidata.org/wiki/wikidata:notability information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 15 another relevant difference between viaf and wikidata is their scope: while viaf aims to identify a few selected types of entities already described within the bibliographic universe by national agencies, wikidata aims to identify and describe any kind of entity of interest for the wikidata community. wikidata items may exist for any kind of entity and may contain a very broad range of data and of external identifiers. so, wikidata can represent bibliographic data and entities —e.g., at present wikidata records data for the 54% of all the bibliographic sources cited in wikipedia entries—any other kind of entity provided for in viaf (i.e., agents, works, expressions, and places), and any other entity defined by the frbr-ifla lrm model (e.g., manifestations, items, timespans, nomens, res, etc.), and by other models relevant for the glam universe (such as frbroo and cidoc).39 but it is open to any data model because it can also include any kind of entity outside the bibliographic or cultural heritage universe, as it is a knowledge base capable of containing any kind of statement on any entity users want to describe. in addition, for any kind of entity there is no minimum or maximum number of statements that must or can be added; as soon as an entity is clearly identified, it can be added to wikidata. 
moreover, when missing, new identifiers—and properties for description—can be proposed by anyone through property proposals and, if well defined, they are usually approved within two weeks (https://www.wikidata.org/wiki/wikidata:property_proposal). a broader scope is supposed to be much more convenient for users who wish to discover previously unknown links and information in the semantic web.

organizational model
due to the viaf top-down approach, data is completely managed by oclc, with no chance for common users or medium and small libraries or other institutions to directly improve viaf clusters (e.g., by adding other data coming from their collections or from encyclopedias or online databases, merging duplicates, solving conflations, etc.). as the wikidata approach is "to crowdsource data acquisition, allowing a global community to edit the data," data is curated directly by users interested in its creation and use.40 so, in wikidata, data is produced by volunteers, by means of semiautomatic or manual data harvesting from any desired and available source. moreover, user statistics show that authoritative data from national bibliographic agencies and other libraries, archives, and museums are normally uploaded by common users, not by librarians (or any other kind of institutional data curator).41

identification function
the theoretical approach differs too, both as to the form of the names and as to the identification function. in viaf, preferred and variant forms of names for persons are based on national cataloguing codes. because national codes are different, viaf is needed and works as a neutral hub of all the national preferred forms. cataloguing rules can assure uniformity and univocity to the forms of the names of the entities within a national catalogue, but they are quite complicated for users to understand and use. in ranganathan's words, "the cataloguing conventions are on the surface quite contrary to what mr. everybody is familiar with."42 in contrast, preferred forms in wikidata are based on the international principles of the convenience of the user and common usage.43 a clear example is the use of the direct form of name (jane doe) instead of the inverted form of name (doe, jane).

a different usage in the forms of names could be an issue for the integration of library metadata in wikidata. in practice, however, it is not. first, there is no conflict between the wikidata form and any other form from a theoretical point of view, as the wikidata form is already treated in viaf as the preferred form within its specific context.44 in addition to that, wikidata accepts any library identifier, so that any library-controlled form can be linked to a wikidata item and vice versa. furthermore, a wikidata bot could be programmed to dump authorized and variant access points from national authority files and add them to the item labels and aliases (a sketch of what such a bot could look like follows at the end of this subsection).45 lastly, it could be argued that national cataloguing codes are compliant with the icp principles and with the convenience of the user and common usage. but a remarkable difference is that, while in national codes the principles are applied by cataloguers for users, in wikidata they are expressed directly by the users themselves. as the identification function is a major feature of the semantic web, the different approach of viaf and wikidata to this issue must be underlined.
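the following is a minimal, hypothetical sketch of the alias-adding bot mentioned above, using the pywikibot library; it is not an existing bot, the item id and variant forms are placeholders, and an actual run would require a logged-in bot account (and, on wikidata, community approval for the task).

```python
"""hypothetical sketch: merge variant access points taken from an authority
file into a wikidata item's aliases using pywikibot. run with DRY_RUN = True
to only preview the edit."""
import pywikibot

DRY_RUN = True

def add_aliases(qid: str, language: str, variant_forms) -> None:
    site = pywikibot.Site("wikidata", "wikidata")
    item = pywikibot.ItemPage(site.data_repository(), qid)
    item.get()  # load current labels, aliases, and claims
    current = set(item.aliases.get(language, []))
    label = item.labels.get(language)
    # keep existing aliases, add the new variant forms, never duplicate the label
    merged = sorted(current | {form for form in variant_forms if form != label})
    if DRY_RUN:
        print(f"{qid} [{language}] aliases would become: {merged}")
        return
    item.editAliases({language: merged},
                     summary="adding variant access points from an authority file")

if __name__ == "__main__":
    # placeholder data: an example item and variant forms of a personal name
    add_aliases("Q1339", "en", ["Bach, Johann Sebastian", "J. S. Bach"])
```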
as noted, “viaf remains neutral towards differences in the cataloguing policy of its data contributors” and, for this reason, viaf accepts all ids provided by its sources, even when they are not clearly identifiable entities but are just labels (see for example https://viaf.org/viaf/307171748 or https://viaf.org/viaf/305052259).46 on the contrary, wikidata explicitly requires each item to refer to “a clearly identifiable conceptual or material entity” (second notability criterium; https://www.wikidata.org/wiki/wikidata:notability). as a consequence, many isolated clusters formed by viaf on the basis of single contributors’ ids related to not-clearly-identifiable entities are not acceptable in wikidata and remain unlinked. moreover, data on cluster duplication shows that identification in wikidata is performed with the same quality level as in viaf. clusters for identification purpose are created both in viaf and wikidata, but differently from viaf, in wikidata external identifiers—as all the other data—are not provided in a structured way by national libraries or other institutions (with very few exceptions); instead, identifiers are usually found and added by common users through web scrapers and after data cleaning. what is more, matches are not performed automatically, but semiautomatically (through tools such as openrefine or mix’n’match (https://mix-n-match.toolforge.org/ and https://openrefine.org/) or manually. an enhanced feature of wikidata in clusterization is the record of a wider variety of sources and relative ids: due to its openness, wikidata refers to viaf and its sources, but also to any other library or cultural institution and to a large number of reference sources like encyclopedias and biographical dictionaries too (table 7). a wider variety of identification sources and manual work assure a higher level of identification. data quantity data harvesting affects both quantity and quality of data. in viaf, data are collected from periodical contributions of viaf participants, with very large sets of data. therefore, from a quantitative point of view, viaf has a far larger number of people (22,099,715 personal clusters) in comparison with wikidata (8,304,947 personal items). even though wikidata was created in 2012, the number of personal items in wikidata is currently only over a third (37%) of all viaf personal clusters. although quantities are not directly comparable due to the different universe to be described, in the last few years initiatives to enhance organized cooperation between libraries and wikidata and to promote data production in wikidata are increasing. a very high-quality initiative is supported by cornell university, harvard university, stanford university, and the university of iowa’s school of library and information science, in collaboration with the library of congress and the program for cooperative cataloging (pcc). their linked data for production (ld4p) wikidata project is “an indepth exploration of how wikidata could serve as a platform for publishing, linking, and enriching library linked data” https://viaf.org/viaf/307171748 https://viaf.org/viaf/305052259/#jones,_a._l https://www.wikidata.org/wiki/wikidata:notability https://mix-n-match.toolforge.org/ https://openrefine.org/ http://catalogo.pusc.it/beyond_viaf/#tb7 information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 17 (https://www.wikidata.org/wiki/wikidata:wikiproject_linked_data_for_production). 
an additional example is the ifla wikidata working group that was formed "to explore and advocate for the use of and contribution to wikidata by library and information professionals, the integration of wikidata and wikibase with library systems, and alignment of the wikidata ontology with library metadata formats such as bibframe, rda, and marc" (https://www.ifla.org/node/92837). even so, wikidata is still very far from having a structured workflow to ingest data from national or local libraries, museums, and archives. in fact, while the projects mentioned above are mainly dedicated to explaining to librarians and institutions why wikidata is important and how to contribute to it, there are still very few projects mainly dedicated to the concrete, massive synchronization of library and bibliographic data with wikidata; such projects also require a relevant effort in the manual cleaning of discrepancies and oddities emerging from the synchronization. relevant exceptions are the national library of wales47 and the biblioteca europea di informazione e cultura, where significant work has been done to synchronize the respective databases of authors (and of other types of entities) with wikidata.48

data quality
data quality also needs to be analyzed in detail. even if data from national libraries are authoritative and of high quality, as a virtual file viaf neither has nor produces its own data. consequently, viaf data does not always remain authoritative, because errors can be both inherited and added, and clusters can be duplicated. the issue is well known to isni, which "whenever necessary [. . .] splits and merges data coming from viaf, and even applies protection to data that has been fixed manually."49 as shown in table 2 and table 8, viaf clusters are subject to isolation and duplication when they are created and to many changes and updates when they are maintained. so, even if viaf collects a huge amount of authoritative data and creates clusters of ids, viaf users cannot always safely and continuously rely on them. data flows in just one direction (from national libraries to viaf), viaf deletes and rebuilds clusters without giving priority to the stability of one cluster over another, and, after april 2020, viaf no longer makes available to users a record of its changes.50

on the contrary, wikidata data is always under the strict control of any user, as its structure is designed to trace any minimal change to its data. every single addition or deletion is documented, not just to easily recover from eventual vandalism, but also to support any decision with clear evidence. any stakeholder can know exactly if, how, when, and why data changed, at any moment. what is more, from a qualitative point of view, wikidata seems to offer a better solution for the recording of authority data than viaf. first, it can store a wider variety of data about a person in a more semantic way. not only is it possible in wikidata to express preferred and variant forms of the name, related names, works, co-authors, publication statistics, and other data about the person—as in viaf—but all these data are expressed in a semantic way. for example, whereas in viaf "bach, anna magdalena" is just a related name of johann sebastian bach, in wikidata she is recorded and qualified as the person who married the musician. thanks to that different approach, wikidata can represent and show bach's full genealogic tree (https://magnus-toolserver.toolforge.org/ts2/geneawiki/?q=q1339).
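the difference can be made concrete with a one-triple query: on wikidata the relation is an explicit spouse (p26) statement on the bach item (q1339), not a flat "related name." the sketch below simply asks the public query service for it and is only an illustration.

```python
"""sketch only: retrieve the spouse (p26) statements of johann sebastian bach
(q1339) from the wikidata query service."""
import requests

QUERY = """
SELECT ?spouse ?spouseLabel WHERE {
  wd:Q1339 wdt:P26 ?spouse .                       # spouse of j. s. bach
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "beyond-viaf-example/0.1 (demo script)"},
    timeout=60,
)
for row in response.json()["results"]["bindings"]:
    print(row["spouse"]["value"], "-", row["spouseLabel"]["value"])
```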
as adamich noted, “building graphs from bibliographic entities is really about making the data machine readable and understandable. it is about making the data web enabled. in terms of translation, linked data opens up a whole new world over our marc entrapment.”51

quality is enhanced by matching methods too; whereas viaf matches identities by an algorithm based on explicit identifiers or string matching (such as the forms of the name, dates, and bibliographic relationships),52 wikidata matches are usually decided by a human user or (in the case of semiautomatic imports) at least checked a posteriori by a human after some time. the higher precision of manual over automatic matching is also recognized in the viaf guidelines.53 furthermore, as seen above, notability requires that, when clear identification is impossible, no item should be created in wikidata.

data maintenance and usability

data quality relies also on maintenance. comparison between wikidata items and viaf clusters shows a very small but constant presence of errors to be fixed in both (around 0.01%), even if it is impossible to determine with certainty whether viaf uses wikidata error pages. issues with fixing viaf errors directly by viaf contributors have already been noted: “while clustering anomalies can be handled by viaf itself, reporting errors found in source data of viaf partners raise problems related to the efficiency of the notification workflows. at this point, involvement of viaf partners themselves in the process is needed.”54 on the other hand, in wikidata anyone can edit items, add new data or delete mistakes, merge items, fix various issues, and so on, on the fly. due to its openness, wikidata may also suffer from vandalism, but it has its own solutions.55 along with this, data receive special attention to their accuracy and reliability because they are uploaded and maintained by users who are direct stakeholders. for this reason, in wikidata, references to bibliographical or biographical sources and to other data provider ids, such as any national and international identification system, are suggested, promoted, and carefully examined. moreover, there is a commitment to monitor the consistency of viaf clusters. the ability of wikidata to identify inconsistent viaf clusters, and the fact that viaf isolated clusters can be reduced by at least 30%56 by referring to identifiers from wikidata and other data providers, are the best demonstration of the quality of its data and of the importance of the other data providers in viaf clusterization.

as to the usability of data, the internal search of viaf offers little more than basic functions: the only available filter allows users to limit results to clusters having one specific source; by contrast, filtering searches to clusters having and/or not having a specific group of sources, or to clusters having more or fewer sources, would be very useful, especially in order to find duplicates.
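the kind of filter just described can be expressed directly against wikidata’s public sparql query service, which is introduced in the next paragraph. the sketch below is only an illustration: the property ids (p214 for viaf, p213 for isni) and the query endpoint are taken from the article’s endnotes, while the query shape, the restriction to persons, and the use of the requests library are our own assumptions.

# a minimal sketch, not a production client: ask wikidata's public sparql
# endpoint for person items that carry a viaf id (p214) but no isni (p213).
# p31/q5 ("instance of: human") is an assumption added for this example.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?viaf WHERE {
  ?item wdt:P31 wd:Q5 ;                      # restrict to persons
        wdt:P214 ?viaf .                     # has a VIAF identifier
  FILTER NOT EXISTS { ?item wdt:P213 [] }    # but no ISNI identifier
}
LIMIT 10
"""

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "beyond-viaf-example/0.1 (illustration)"},
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["item"]["value"], row["viaf"]["value"])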
in contrast, wikidata has a sparql query service which returns results based on the current status of the database, and its internal search can integrate some of the functions of the query service, allowing users to look for items having and/or not having specific statements (https://www.wikidata.org/wiki/special:search).57 considering cases in which viaf and wikidata discover potential duplicates in their sources, viaf has no page dedicated to listing cases of (supposedly) duplicate ids from its sources, while wikidata makes it easy to find cases in which single sources have (supposedly) duplicate ids, through constraint violations58 and appropriate sparql queries.

a comparison table

a comparison table was built to compare scope, role, system, and functions between viaf and wikidata, inspired by and adapted from a viaf vs isni comparison.59

table 9. comparison between and complementarity of viaf and wikidata features

scope
viaf: ● persons ● organizations ● works ● expressions ● locations
wikidata: ● any kind of viaf entity ● any “res” of ifla lrm ● any entity of cidoc ● any other non-glam entity ● any entity in the universe of discourse

software
viaf: ● unknown
wikidata: ● wikibase60

data. person entity properties
viaf: ● preferred form of name, based on national cataloguing rules ● very rich variant forms of name, identified by national agencies variant forms ● sources
wikidata: ● preferred form of name (label) based on convenience of the user and common usage61 ● variant forms of name (aliases), organized by languages and scripts62 ● sources (as statements and references and with qualifiers)

data. quantity (persons)
viaf: ● number of clusters: 33,656,281 (sept. 2020) ● number of personal clusters: 22,099,715 (sept. 2020)
wikidata: ● number of entities: 90,260,081 (oct. 2020) ● number of personal items: 8,304,947 (oct. 2020) ● number of personal items with viaf id: 2,061,046 (sept. 2020)

data. harvesting
viaf: ● data are provided by authoritative national bibliographic agencies
wikidata: ● data are added through massive semiautomatic imports and/or manually by any interested user

data. quality
viaf: ● data are granted by authoritative national bibliographic agencies
wikidata: ● data are controlled by any directly interested user, based on data from viaf, available bibliographic agencies, and other authoritative bibliographic sources

data. other entities properties
viaf: ● isbn, titles, dates included in the cluster ● dates, genre, bibliographic references from sources, xlinks, etc. ● properties are unchangeable
wikidata: ● any kind of property applicable to an entity can be used (multimedia included)63 ● all statements admit references, which are strongly recommended in some cases ● unavailable properties can be freely added through a process of property proposal64

data. dates
viaf: ● dates are extracted from authority and bibliographic records using a parsing technique; calendars and precision are not available65
wikidata: ● dates are imported semiautomatically from various sources or filled in manually; different calendars are available and further statements can be made through qualifiers66

data. vandalism
viaf: ● no vandalism: data are editable only by oclc
wikidata: ● everyone can edit, but items which are frequently vandalized can be temporarily or permanently protected from the edits of unregistered users67
data. fixing errors, deduplicating, or unmerging clusters/items
viaf: ● suggestions and requests via email ● asynchronous ● presumably, automated processes and human interventions ● viaf rebuilds clusters and does not give priority to the stability of one cluster over another68
wikidata: ● everyone can edit69 ● instantaneous ● probable errors (constraint violations) are detected in an automated way (by bots and through queries) ● pages with lists of probable errors (constraint violations) are freely available and constantly updated in an automated way (by bots)70

data. license
viaf: ● all public data (license: http://opendatacommons.org/licenses/by/1.0/)
wikidata: ● all public data (license: https://creativecommons.org/publicdomain/zero/1.0/deed.it)

role
viaf: ● create clusters ● ingest authority records from viaf contributors and other data providers (including wkd and isni) ● publish and diffuse viaf ids and data
wikidata: ● create items with a worldwide recognized and standard identifier ● interlink items with any available external identifier ● ingest data from viaf, from viaf contributors, and other data providers (e.g., isni) ● allow the creation and maintenance on toolforge of free tools—e.g., mix’n’match—to ingest external identifiers71 ● manage library, bibliographic, and non-library and non-bibliographic linked data ● publish and diffuse wikidata ids and data

organizational model
viaf: ● oclc service, guided by the viaf council of participating institutions ● hierarchical, top-down ● membership on request and subordinated to approval ● largely limited to national bibliographic agencies
wikidata: ● wikimedia project ● distributed, bottom-up ● everyone can take part in the project72 ● open to any bibliographic or non-bibliographic institution (national, large, medium, and small)

system. website
viaf: ● interface only in english
wikidata: ● interface in nearly any language and script; new ones can be added ● online facilities (end user input; edit online facilities for end user) ● login enhances users’ experience (by gadgets and scripts)

system. updating
viaf: ● periodical (asynchronous) ingestions
wikidata: ● continuous, instantaneous, free updates
system. versioning
viaf: ● history is included in each present cluster and for abandoned clusters ● history is inaccessible in redirected clusters
wikidata: ● page history available in each item and for redirected items ● for deleted items, history is accessible only to administrators

long-term preservation policy
viaf: ● oclc maintains the hosting, software, and data for viaf73
wikidata: ● the wikimedia foundation maintains the hosting, software, and data for wikidata74

notifications to stakeholders
viaf: ● notifications to be sent to data providers
wikidata: ● notifications are sent to end users and contributors

display, search, and download
viaf: ● in multiple formats: xml and json, including justlinks.json ● basic search interface ● clusters are listed without a clear ranking rule ● integrating monthly dumps ● api endpoint75 ● before april 2020, monthly dumps with persist links; after, monthly dumps without persist links
wikidata: ● in multiple formats: json, php, n3, ttl, nt, rdf, jsonld, html76 ● search interface77 ● api endpoint78 ● sparql query endpoint79 ● dumps80, also customizable81 ● see https://www.wikidata.org/wiki/help:about_data

linked data and sru
viaf: ● linked data ● sru82 (search and browse indexes, using cql syntax; output formats are xml or html)
wikidata: ● linked data

interoperability. local
viaf: ● a local institution can only reconcile viaf ids to its own data ● as changes are made by viaf, synchronization must be periodically performed by sources and local institutions
wikidata: ● full reconciliation, upload, and synchronization of local ids on wikidata and vice versa ● dedicated tools: mix’n’match ● other tools: openrefine ● bots ● manually

conclusion

main viaf and wikidata features and personal entities data were analyzed and compared in this study to focus on analogies and differences, and to highlight their reciprocal role and helpfulness in the worldwide bibliographical context and in the semantic web environment. viaf is a major international initiative to address the challenge of reliably identifying bibliographic agents on the web, by means of authoritative data based on national cataloguing codes and coming from the national libraries involved in the ubc program. moreover, viaf is a pillar of the identification process that users enact within wikidata. still, the comparison emphasized a few relevant issues in viaf’s approach, designed more than twenty years ago: a very selective policy regarding the inclusion of its sources—contributors and other data providers—and their participation in its governance, which prevents a worldwide openness of the project to non-national libraries and cultural institutions; an obvious neutrality toward data coming from its contributors, even when data are not compliant with the identification requirements of the semantic web; troubles in correctly clustering ids (duplicate clusters to be merged and conflated clusters to be split), and a one-way flow of data due to its top-down approach that prevents a quick and cooperative workflow to identify and fix errors; and the ability to identify only a narrow range of entities (i.e., mainly bibliographic entities, and not even all those provided by ifla lrm).
on the other hand, the semantic web has offered important new tools and opportunities to libraries, archives, museums, and other cultural institutions, and their data are recognized as a relevant asset for building the backbone of the semantic web with regard to the control of entities of bibliographic and cultural interest. after eight years of existence, wikidata is playing a relevant role in the publication, aggregation, and control of bibliographic and non-bibliographic information in the semantic web too. it is increasingly indicated as a hub for identifiers in the semantic web.83 wikidata depends on viaf for a large part of the identification work on its items, and viaf’s preeminent role in wikidata is acknowledged by its primary position in the identifiers section of the data of each item. for this reason, the wikidata community constantly monitors the consistency of viaf clusters and continuously updates lists of errors present in them. conversely, if viaf is undoubtedly very useful to the wikidata community, wikidata can support the consistency of viaf clusters. the wikidata informational ecosystem is much larger and wider, can be built by any interested institution and person, and its identification function can also count on the authority work of national and non-national libraries excluded from the viaf environment, and on authoritative non-bibliographical reference sources too.

this study opens some research perspectives. the analysis was limited to data about personal entities, as this kind of entity was the only one directly comparable, so further research is needed to extend the analysis to other kinds of entities. moreover, more research should be devoted to the treatment of special categories of persons and their names, such as mythological and legendary characters, ancient greek and latin authors, kings, queens, popes, saints, and so on, as the viaf guidelines84 themselves list the clusterization of such names among viaf’s typical problems (and these entities often get five or more viaf ids in wikidata). a further line of research should consider the relevance of the clusterization of encyclopedias and other reference sources in the identification process within wikidata. lastly, isolated clusters would need more consideration; in this study they were used as a clue to relatively recent uploads in viaf, but lc and dnb show a high rate of isolated clusters too (maybe due to the richness of their collections and metadata). more research on isolated clusters could help to describe with more precision the possible role of non-national libraries and institutions, and of their locally rich collections, in identifying lesser-known agents (not just persons) in a worldwide perspective.

from the analyzed data and the direct comparison, it can be concluded that viaf and wikidata can be constantly improved through reciprocal comparison, which allows the discovery of errors in both. viaf and wikidata are two relevant tools for authority control in the semantic web, and each has a specific role to play and different stakeholders. unfortunately, as opposed to the relationship between viaf and isni, at present no aspect of viaf-wikidata interoperability is discussed between the managing structures of the two systems, on a regular or irregular basis.
while wikidata appears to be more reliable with regard to the identification process, its most significant weakness consists in its unorganized and unplanned crowdsourced data acquisition, even if it is based at present on about 11,500 active editors.85 furthermore, the wikidata community still lacks the constant support and cooperation of institutional data curators such as librarians, archivists, and museum curators. many current projects are mainly dedicated to explaining to the potential institutional stakeholders the importance and the usefulness of wikidata for their institutional missions, but there are still too few projects devoted to the massive synchronization of data from institutional silos to wikidata. but, as soon as these initiatives reach a critical mass, wikidata will become the real global hub of the web of data.

acknowledgements

all the authors have cooperated in the drafting and revision of the article. nevertheless, each author has mainly authored specific sections and subsections of the article:
• stefano bargioni: data analysis; viaf; wikidata; viaf and wikidata: a data comparison.
• carlo bianchini: introduction; discussion; organizational model; identification function; data quantity; data quality; data maintenance and usability.
• camillo carlo pellizzari di san girolamo: relationship between viaf and libraries; relationship between wikidata and academic, research, and public libraries; relationship between viaf and wikidata; wikidata controls on viaf; materials and methods; conclusion.
all authors contributed to the comparison table. the authors wish to thank the anonymous reviewer whose suggestions helped to improve and enrich the paper, and the editor for his helpful edits.

endnotes

1 thomas baker et al., library linked data incubator group final report, sec. 2 (w3c incubator group, october 25, 2011), http://www.w3.org/2005/incubator/lld/xgr-lld-20111025/.
2 baker et al., library linked data.
3 dorothy anderson, universal bibliographic control. a long term policy—a plan for action (münchen: verlag dokumentation, 1974), 11.
4 anila angjeli, andrew mac ewan, and vincent boulet, “isni and viaf: transforming ways of trustfully consolidating identities,” in ifla wlic 2014 (ifla 2014 lyon, ifla, 2014), 2, http://library.ifla.org/985/1/086-angjeli-en.pdf.
5 rick bennett et al., “viaf (virtual international authority file): linking the deutsche nationalbibliothek and library of congress name authority files,” international cataloguing and bibliographic control 36, no. 1 (2007): 12–18; barbara b. tillett, the bibliographic universe and the new ifla cataloging principles : lectio magistralis in library science = l’universo bibliografico e i nuovi principi di catalogazione dell’ifla : lectio magistralis di biblioteconomia (fiesole (firenze): casalini libri, 2008), 14–15, http://digital.casalini.it/9788885297814; “viaf. connect authority data across cultures and languages to facilitate research,” oclc, 2020, https://www.oclc.org/en/viaf.html.
6 gildas illien and françoise bourdon, “a la recherche du temps perdu, retour vers le futur: cbu 2.0” (paper, ifla wlic 2014, lyon, france, 2014), 13–14, http://library.ifla.org/956/.
7 illien and bourdon, “a la recherche,” 15.
8 gordon dunsire and mirna willer, “the local in the global: universal bibliographic control from the bottom up” (paper, ifla wlic 2014, lyon, france, 2014), 11, http://library.ifla.org/817/.
9 luca martinelli, “wikidata: la soluzione wikimediana ai linked open data,” aib studi 56, no. 1 (march 2016): 75–85, https://doi.org/10.2426/aibstudi-11434; jesús tramullas, “objetos culturales y metadatos: hacia la liberación de datos en wikidata,” anuario thinkepi 11 (2017): 319–21, https://doi.org/10/ghbj63; xavier agenjo-bullón and francisca hernández-carrascal, “wikipedia, wikidata y mix’n’match,” anuario thinkepi 14 (2020), https://doi.org/10/ghbj6t; claudio forziati and valeria lo castro, “the connection between library data and community participation: the project share catalogue-wikidata,” jlis.it 9, no. 3 (2018): 109–20, https://doi.org/10/ggxj9n; adrian pohl, “was ist wikidata und wie kann es die bibliothekarische arbeit unterstützen?,” abi technik 38, no. 2 (2018): 208, https://doi.org/10/ghbj6w; arl white paper on wikidata: opportunities and recommendations (the association of research libraries, 2019), https://www.arl.org/wp-content/uploads/2019/04/2019.04.18-arl-white-paper-on-wikidata.pdf; regine heberlein, “on the flipside: wikidata for cultural heritage metadata through the example of numismatic description” (paper, ifla wlic 2019, libraries: dialogue for change, session 206: art libraries with subject analysis and access, athens, greece, august 28, 2019), http://library.ifla.org/2492/1/206-heberlein-en.pdf.
10 arl white paper on wikidata, 27–30; theo van veen, “wikidata: from ‘an’ identifier to ‘the’ identifier,” information technology and libraries 38, no. 2 (2019): 72–81, https://doi.org/10/ghbj62; hilary thorsen, “ld4p: linked data for production: wikidata as a hub for identifiers” (slideshow presentation, june 11, 2020), https://docs.google.com/presentation/d/1jwz3_ncf5rdd-7ejetglfv99uv2pnd1v/edit?usp=embed_facebook.
11 tillett, the bibliographic universe, 15.
12 open data commons attribution license (odc-by) v1.0 (as stated in http://viaf.org/viaf/data/).
13 “viaf admission criteria,” oclc, 2020, https://www.oclc.org/content/dam/oclc/viaf/viaf%20admission%20criteria.pdf.
14 the description of the wikidata source in http://viaf.org/viaf/partnerpages/wkp.html seems to refer to wikipedia before the existence of wikidata. the same acronym wkp reflects this anachronism, whereas isni correctly uses wkd. in any case, this description, as well as many others, requires an update.
15 stacy allison-cassin and dan scott, “wikidata: a platform for your library’s linked open data,” code4lib journal 40 (may 4, 2018), https://journal.code4lib.org/articles/13424.
16 carlo bianchini and pasquale spinelli, “wikidata at fondazione levi (venice, italy): a case study for the publication of data about fondo gambara, a collection of 202 musicians’ portraits,” jlis.it 11, no. 3 (september 15, 2020): 24.
17 ifla working group on functional requirements and numbering of authority records (franar), functional requirements for authority data: a conceptual model (münchen: k. g. saur, 2009), 46, https://www.ifla.org/files/assets/cataloguing/frad/frad_2013.pdf. for qualifiers, see https://www.wikidata.org/wiki/help:qualifiers; for references see https://www.wikidata.org/wiki/help:sources.
18 partial lists are linked from https://wikibase-registry.wmflabs.org/wiki/main_page.
19 see https://www.transition-bibliographique.fr/fne/french-national-entities-file/; the proof of concept is available at https://github.com/abes-esr/poc-fne.
20 jean godby et al., creating library linked data with wikibase: lessons learned from project passage (dublin, oh: oclc research, 2019): 8, https://doi.org/10.25333/faq3-ax08.
21 ifla, “opportunities for academic and research libraries and wikipedia” (discussion paper, 2016), 10, https://www.ifla.org/files/assets/hq/topics/info-society/iflawikipediaopportunitiesforacademicandresearchlibraries.pdf.
22 john riemer, “the program for cooperative cataloging & a wikidata pilot” (slideshow presentation, june 16, 2020), slide 5, https://docs.google.com/presentation/d/1npkaqdggft1wi2vx0zgmtixwxwjpq96ntxx4mmyxffi/edit#slide=id.p.
23 godby et al., “creating library linked data,” 8.
24 maximilian klein and alex kyrios, “viafbot and the integration of library data on wikipedia,” code4lib journal 22 (october 14, 2013), https://journal.code4lib.org/articles/8964.
25 ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp) (den haag: ifla, 2016), para. 5.3.
26 https://www.wikidata.org/wiki/mediawiki:wikibase-sortedproperties#ids_with_datatype_%22external-id%22; isni (p213, https://www.wikidata.org/wiki/property:p213) is presently sorted after viaf instead of in the iso section because it is considered primarily as a viaf source.
27 epìdosis, viaf e wikidata.mpg, 2020, https://commons.wikimedia.org/wiki/file:viaf_e_wikidata.mpg; a list of gadgets is available at https://www.wikidata.org/wiki/wikidata:viaf/cluster#gadgets.
28 the main error-report page is https://www.wikidata.org/wiki/wikidata:viaf/cluster/conflating_entities; its subpage https://www.wikidata.org/wiki/wikidata:viaf/cluster/conflating_specific_entries is designed for collecting “easy” cases of conflation, when only a few members of a cluster should be moved elsewhere, while the cluster is substantially sane.
29 moreno hayley, email to author, march 23, 2020. to the question of whether data about abandoned clusters would be maintained, viaf answered, “we recognize that the data in the file was not usable. viaf is in a period of transition and it was decided that we could not at this time fix the file so it has been removed from the list of available downloads.”
30 the statement read: “the persist-rdf.xml file has been removed and will no longer be available,” accessed october 23, 2020.
31 angjeli, mac ewan, and boulet, “isni and viaf,” 3.
32 https://dumps.wikimedia.org/wikidatawiki/; instructions and a list of kinds of data dumps are available at https://www.wikidata.org/wiki/wikidata:database_download.
33 a general explanation of ranks is available at https://www.wikidata.org/wiki/help:ranking. here is a small summary: values of statements can be ranked in three ways, “preferred,” “normal” (default), and “deprecated”; the expression “values with non-deprecated rank” includes all values with preferred rank or normal rank; the expression “values with best rank” includes only values with preferred rank or normal rank, with this condition: if the same statement has two or more values and at least one of them has preferred rank, values with normal rank aren’t counted; if there aren’t values with preferred rank, all values with normal rank are counted.
34 viaf and wikidata dumps, together with the scripts, were published on zenodo at https://doi.org/10.5281/zenodo.4457114.
35 the queries can be performed using the following links: viaf members: https://w.wiki/i5j; authority controls related to libraries but not being viaf members: https://w.wiki/i5k; biographical dictionaries: https://w.wiki/i5n.
36 the query can be performed using the following link: https://w.wiki/i5p.
37 it could be because they are probably more difficult to cluster, but in some cases also because they represent infrequently described entities.
38 as suggested by the reviewer, more removals than additions may be a clue of a cleanup project.
39 pat riva, patrick le boeuf, and maja zumer, ifla library reference model, draft (den haag: ifla, 2017), https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ifla_lrm_2017-03.pdf; nick crofts et al., “definition of the cidoc conceptual reference model,” version 5.0.4, icom/cidoc crm special interest group, 2011, http://www.cidoc-crm.org/html/5.0.4/cidoc-crm.html; chryssoula bekiari et al., eds., frbr object-oriented definition and mapping from frbrer, frad and frsad, version 2.0 (international working group on frbr and cidoc crm harmonisation, 2013), http://old.cidoc-crm.org/docs/frbr_oo/frbr_docs/frbroo_v2.0_draft_2013may.pdf; lydia pintscher, lea lacroix, and mattia capozzi, “what’s new on the wikidata features this year,” youtube video, october 26, 2020, truocolo, https://www.youtube.com/watch?v=ebxdzk54gru.
40 denny vrandečić and markus krötzsch, “wikidata: a free collaborative knowledgebase,” communications of the acm 57, no. 10 (september 23, 2014): 80, https://doi.org/10/gftnsk.
41 for a general statistic see http://wikidata.wikiscan.org/users; for a statistic about the viaf property see https://bambots.brucemyers.com/navelgazer.php?property=p214; changing the id of the property at the end of the url allows exploring other property statistics.
42 shiyali ramamrita ranganathan, reference service, 2nd ed., ranganathan series in library science 8 (bombay: asia publishing house, 1961), 74.
43 ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp), 5, https://www.ifla.org/publications/node/11015.
44 wikidata does have a guideline for a preferred label, and its choice is based on users’ convenience (https://www.wikidata.org/wiki/help:label, par. 1.2) as required by international cataloguing principles (2016). as to the choice of the wikidata label in a specific language, viaf does not show any clear principle, while the authors believe that it would be preferable to use the english (“en”) label, whenever available. see ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp).
45 for example, in september it was done for nkc using openrefine (sample edit: https://www.wikidata.org/w/index.php?title=q520487&diff=1269046867&oldid=1266870464).
46 angjeli, mac ewan, and boulet, “isni and viaf,” 9.
47 simon cobb (https://www.wikidata.org/wiki/user:sic19) became wikidata visiting scholar in 2017 (https://en.wikipedia.org/wiki/user:jason.nlw/wikidata_visiting_scholar).
48 federico leva and marco chemello, “the effectiveness of a wikimedian in permanent residence: the beic case study,” jlis.it 9, no. 3 (september 2018): 141–47, https://doi.org/10.4403/jlis.it-12481.
49 angjeli, mac ewan, and boulet, “isni and viaf,” 11.
50 andrew mac ewan, “isni, viaf and naco and their relationship to orcid, discussion paper for pcc policy committee, 4 november,” 2013, 2, http://www.loc.gov/aba/pcc/documents/isni%20poco%20discussion%20paper%202013.docx.
51 tom adamich, “library cataloging workflows and library linked data: the paradigm shift,” technicalities 39, no. 3 (may/june 2019): 14.
52 oclc, viaf guidelines, rev. july 16, 2019, 2, https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf.
53 oclc, viaf guidelines, 5. “when viaf is unable to algorithmically match some of the source authority records with each other, they can be manually pulled together into a single cluster using an internal table.”
54 angjeli, mac ewan, and boulet, “isni and viaf,” 16.
55 stefan heindorf et al., “vandalism detection in wikidata,” in proceedings of the 25th acm international conference on information and knowledge management, cikm ’16 (new york, ny: association for computing machinery, 2016), 327–36, https://doi.org/10/gg2nmm; amir sarabadani, aaron halfaker, and dario taraborelli, “building automated vandalism detection tools for wikidata,” in proceedings of the 26th international conference on world wide web companion, www ’17 companion (republic and canton of geneva, che: international world wide web conferences steering committee, 2017), 1647–54, https://doi.org/10/ghhtzf.
56 see table 1, col. 1 vs col. 9; it should be noted that col. 9 considers only non-viaf sources and biographical dictionaries, but wikidata also links to encyclopedias and other online databases.
57 for example, people not having a viaf id but having an iccu id (https://tinyurl.com/y6hbtjuo); instructions about the internal search are available at https://www.mediawiki.org/wiki/help:extension:wikibasecirrussearch.
58 https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations.
59 angjeli, mac ewan, and boulet, “isni and viaf,” 16.
60 https://www.mediawiki.org/wiki/wikibase/datamodel.
61 “the label is the most common name that the item would be known by” (https://www.wikidata.org/wiki/help:label). see also ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp), 5, https://www.ifla.org/publications/node/11015.
62 bots exist to create more and more variant forms based on matching properties, such as date of birth (p569) and date of death (p570), and to import variant forms of names from national authority files.
see, for example, https://www.wikidata.org/w/index.php?title=q5669&diff=611600491&oldid=608231160.
63 https://www.wikidata.org/wiki/help:data_type.
64 https://www.wikidata.org/wiki/wikidata:property_proposal.
65 jenny a. toves and thomas b. hickey, “parsing and matching dates in viaf,” code4lib journal, 26 (october 21, 2014), https://journal.code4lib.org/articles/9607; stefano bargioni, “from authority enrichment to authoritybox: applying rda in a koha environment,” jlis.it 11, no. 1 (2020): 175–89, https://doi.org/10/gg66rq.
66 https://www.wikidata.org/wiki/help:dates.
67 see heindorf et al., “vandalism detection in wikidata.”
68 see mac ewan, “isni, viaf and naco.”
69 see https://www.wikidata.org/wiki/help:merge, https://www.wikidata.org/wiki/help:split_an_item, and https://www.wikidata.org/wiki/help:conflation_of_two_people.
70 complete list at https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations (e.g., https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations/p214).
71 https://admin.toolforge.org/; see also xavier agenjo-bullón and francisca hernández-carrascal, “registros de autoridades, enriquecimiento semántico y wikidata,” anuario thinkepi 12 (2018): 361–72, https://doi.org/10/ghbj6z.
72 https://www.wikidata.org/wiki/wikidata:property_proposal.
73 https://www.oclc.org/en/viaf.html.
74 https://www.wikidata.org/wiki/wikidata:introduction.
75 https://platform.worldcat.org/api-explorer/apis/viaf.
76 https://www.wikidata.org/wiki/special:entitydata; see also https://www.wikidata.org/wiki/wikidata:database_download.
77 https://www.wikidata.org/wiki/special:search.
78 https://www.wikidata.org/w/api.php.
79 https://query.wikidata.org/.
80 https://dumps.wikimedia.org/wikidatawiki/.
81 https://wdumps.toolforge.org/.
82 https://www.oclc.org/developer/develop/web-services/viaf/authority-source.en.html.
83 van veen, “wikidata.”
84 see “typical problems” in viaf guidelines: https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf.
85 pintscher, lacroix, and capozzi, “what’s new.”

virtual reality as a tool for student orientation in distance education programs: a study of new library and information science students

sandra valenti, brady lund, and ting wang
information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.11937

dr. sandra valenti (svalenti@emporia.edu) is assistant professor, school of library and information management, emporia state university. brady lund (blund2@g.emporia.edu) is a doctoral student of library and information management at emporia state university. ting wang (twang2@emporia.edu) is a doctoral student of library and information management, emporia state university.

abstract

virtual reality (vr) has emerged as a popular technology for gaming and learning, with its uses for teaching presently being investigated in a variety of educational settings. however, one area where the effect of this technology on students has not been examined in detail is as a tool for new student orientation in colleges and universities. this study investigates this effect using an experimental methodology and the population of new master of library science (mls) students entering a library and information science (lis) program. the results indicate that students who received a vr orientation expressed more optimistic views about the technology, saw greater improvement in scores on an assessment of knowledge about their program and chosen profession, and saw a small decrease in program anxiety compared to those who received the same information as standard text-and-links. the majority of students also indicated a willingness to use vr technology for learning for long periods of time (25 minutes or more). the researchers concluded that vr may be a useful tool for increasing student engagement, as described by game engagement theory.

literature review

computer-assisted instruction (cai) has, for many years, been considered an effective method of instructional delivery that improves student engagement and outcomes.1 new technologies, such as the learning management system (lms), online video, laptops and tablets, word processors, spreadsheets, and presentation platforms, have all significantly altered how knowledge is transferred and measured in students. when adopted by instructors, these technologies can improve the quality of student learning, work, and their evaluation of this work.
empirical research has shown that learning technologies do indeed contribute to better learning than a lecture alone.2 positive reaction to the adoption of new learning technologies among student populations has been shown across all grade levels, from pre-k through postgraduate education.3 research in the fields of instructional design technology (idt) and information science (is) has shown that the novelty of new learning technology provides short-term improvement in outcomes.4 this supports the broader hypothesis that engagement increases retention of knowledge. these findings would suggest that, at least in the short term, instructors could anticipate improvement in knowledge retention through the use of a new technology like virtual reality. when used in sustained instructional efforts, many learning technologies show some promise for improving the attainment of learning outcomes.5 this is why interest in learning technology has grown so significantly in the past two decades and the job outlook for instructional designers is increasing faster than the national average.6

a large proportion of instructional technologies are not truly “adopted” by instructors, but rather used only in one-off sessions and then discarded.7 there seem to be some common factors among those technologies that are adopted and used regularly by instructors:
1. practicality, or the amount of work the new technology requires versus the perceived value of said technology;
2. affordability, or the cost of a new technology versus the perceived value of said technology; and
3. stability, or the likelihood of the product to be continuously supported and updated by its manufacturer (e.g., a product like microsoft office has a higher likelihood of ongoing maintenance).8

as noted by lund and scribner, only recently, with the introduction of free vr development programs and inexpensive viewers/headsets like google cardboard, has vr fit these criteria.9 it is finally practical to use vr as a learning tool for classrooms with large numbers of students. “virtual reality is the computer-created counterpart to actual reality. through a video headset, computer programs present a visual world that can, pixel-perfectly, replicate the real world—or show a completely unreal one.”10 virtual reality is distinct from augmented reality, which augments a real-world, real-time image (e.g., viewed through a camera on a mobile device) with computer-generated information, such as images, text, videos, animation, and sound.11 the focus of the present study is virtual reality only, not the related augmented (or mixed) reality technology.

an important contribution to the study of virtual reality in library and information science (lis) is varnum’s beyond reality.12 this short introductory book covers both theoretical and practical considerations for the use of virtual, augmented, and mixed reality in a variety of library contexts. while the book describes how vr can be utilized in a variety of library education (for non-lis majors) contexts, it does not include an example of how virtual reality may be used for library school education. it also does not investigate in significant detail the use of virtual reality for a virtual orientation to an academic program. these are the gaps that the following study attempts to address.
the present study may be viewed through the framework of game engagement theory, as described by whitton.13 game engagement theory suggests that five major learning engagement factors exist and that using gaming activities may improve how well learning activities address these factors. these factors include:
• challenge, motivation to undertake the activity;
• control, the level of choice;
• immersion, the extent to which an individual is absorbed into the activity;
• interest, an individual’s interest in the subject matter; and
• purpose, the perceived value of the outcome of the activity.
it has been suggested by several researchers, including dede, that immersive experiences like vr touch on similar factors of engagement.14

emporia state university’s school of library and information management

the setting for this study is emporia (ks) state university’s school of library and information management (esu slim). esu slim is the oldest library school west of the mississippi river, founded in 1902. compared to other lis education programs, esu slim is unique in that it offers a hybrid course delivery format. the six core courses in the mlis degree program are online with two in-person class weekends for each class. each class weekend is eleven hours: from 6 to 9 p.m. friday and 9 a.m. to 5 p.m. saturday at one of nine distance education locations scattered throughout the western half of the united states. due to this course delivery format, the student population of esu slim may skew slightly older and have more individuals who are employed full-time relative to residential master’s programs. esu slim uses a cohort system, with a new group of students beginning annually at each of the eight distance locations as well as the main emporia, kansas campus. before each new cohort begins its first course, a one-day, in-person student orientation is offered on the campus at which the cohort will attend classes. the purpose of this experimental study is to examine how well vr technology can support or satisfy the role of the in-person student orientation by emulating the experience and information students receive during this informational session.

methods

this study used a pre-test/post-test experimental design. depending on the state in which the students reside, they were assigned either to the experimental or the control group. the experimental group received a cardboard vr headset (similar to google cardboard) and a set of instructions on how to use it. they were instructed to utilize this headset to view an interactive experience that introduced elements of library service and library education as a form of new student orientation. students in the control group received a set of links that contained the same information as the vr experience, but in a more static (non-immersive, non-interactive) setting.

participants for this study were library school students from four states: south dakota, idaho, nevada, and oregon. these students were all enrolled in a mixed-delivery program in lis. for each core course in the program, students attend two intensive, in-person, weekend class sessions. the rest of the course content is delivered via a learning management system.
for this study, the researchers were particularly interested in understanding the role of vr orientation for distance education students, as these students do not have access to the physical university campus and thus miss out on information that in-person interaction with faculty and the library environment might provide. this also seemed like a worthwhile population to study given that a large portion of lis programs have adopted the distance education (online or mixed-delivery) format.

in march 2019, a sample of this population was asked to complete a short survey to indicate their interest in virtual reality for new student orientation and the extent to which acquiring information via this medium might relieve their anxiety and increase their success in the program. sixty-one percent of students indicated at least some elevated level of anxiety about their first mls course, while 55 percent agreed that knowing more about the program’s faculty and course structure and purpose would decrease that anxiety. students were also asked to indicate the most pressing information needs they have about the program. these needs are displayed in table 1 below. this information was used to guide the design of the vr content for this study.

table 1. information needs expressed by new mls students
information need | number of respondents (out of 55)
information about esu’s curriculum | 50
what courses professors normally teach | 42
information about information access | 41
information about librarianship in general | 39
professors’ research interests | 35
information about esu’s faculty | 27
to see who they are via a video introduction | 25
information about esu’s library | 24
why they teach for esu’s mls program | 23
a little personal information about faculty | 20
information about my regional director | 14
to which associations do faculty belong | 13
information about esu’s physical spaces | 5
information about esu’s archives | 4

these students were also asked to indicate the extent to which they would like to use vr to virtually “meet” faculty, learn more about the program’s format, see program spaces, and learn about library services, using a five-point likert scale. the findings for this question are displayed in figure 1.

figure 1. new mls students’ reception to using vr as an orientation tool (bar chart: frequency of respondents for each category of vr use, rated from strongly agree to strongly disagree)

based on the largely positive response towards using vr for new student orientation, the researchers progressed to the experimental phase of the study. a vr experience was developed using veer vr (veer.tv), a completely free and intuitive vr-creation platform. within this platform, creators are able to upload images captured using a 360-degree vr camera (we used a samsung gear 360 camera) and drag-and-drop interactive elements, including text boxes, videos, audio, and transitions to new images. thus, it was possible to create a vr experience within the setting of an academic library where users could navigate throughout the building, virtually meet faculty, and learn about fundamental concepts in librarianship. for this phase of the study, a set of research questions was defined, a hypothesis created, and independent and dependent variables identified:

research questions
1.
research question 1: will vr improve students’ knowledge of topics related to their library school and basic library topics, relative to those without a vr experience?
2. research question 2: will vr reduce students’ anxiety about their library program, relative to those without a vr experience?
3. research question 3: will students’ perceptions towards the usefulness of vr be significantly different based on whether or not they utilized the vr experience?

hypothesis

use of vr will improve students’ knowledge of topics related to library schools and librarianship, reduce their anxiety, and result in a more positive perspective towards vr technology.

variables

independent variable: whether a student viewed the vr experience for a virtual orientation or viewed the web links for an online orientation.
dependent variables: change in students’ scores on a post-test assessment of orientation knowledge, compared to their pre-test scores; change in students’ anxiety levels and perceptions of vr.

experimental phase

the experimental phase of the study was conducted in august 2019. twenty-nine students agreed to participate in this study. the age and gender characteristics of this population are as follows: fourteen under age 35, eleven age 35–44, four age 45+; nine male, seventeen female, and three fluid or transgender. thirty-three percent of the students who agreed to participate were in the control group, while 67 percent were in the experimental group. all participants in the study received a free vr headset, which was theirs to keep. funding for these vr headsets was provided by a generous grant from a benefactor at the researchers’ university. participants in the control group were encouraged to use the vr headset after they had completed their participation in the study.

both groups received instructions with their viewer that directed them to complete a pre-test survey, embedded within a module of their learning management system account. following the pre-test, the experimental group was instructed to use the vr experience created by the researchers to learn about their library school, its faculty, and library concepts. the control group was instructed to use links provided in the module to experience the same content, but without the vr experience. following the experience, both groups were instructed to complete a post-test survey in the module, as well as a follow-up survey that asked questions about how long they interacted with the content, how the experience affected their program anxiety, and additional comments. once the data were collected for all participants, the researchers conducted a series of analyses on the data, including an analysis of covariance (ancova) for post-test scores among the control and experimental groups, and an ancova for program anxiety following the experimental treatment.15
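as an aside for readers who want to see what such an analysis looks like in code, the following python sketch runs an ancova of post-test scores on group membership with pre-test scores as the covariate, using pandas and statsmodels. the data frame, column names, and values are hypothetical stand-ins for illustration, not the study’s raw data.

# a minimal ancova sketch with hypothetical data; "pretest", "posttest", and
# "group" are stand-in column names, not the study's actual data file.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# toy scores for illustration only
df = pd.DataFrame({
    "group":    ["vr"] * 5 + ["links"] * 5,
    "pretest":  [14, 15, 13, 16, 14, 15, 16, 15, 17, 15],
    "posttest": [17, 18, 16, 18, 17, 17, 17, 16, 18, 17],
})

# ancova: model post-test scores on group, controlling for pre-test scores
model = smf.ols("posttest ~ C(group) + pretest", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)   # type ii sums of squares
print(anova_table)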
nearly 60 percent of participants spent more than 25 minutes using the virtual reality experience. this finding may seem remarkable, given that the average attention span of students is generally no more than a handful of minutes, but it aligns with the finding of geri, winer, and zaks that engagement with interactive video lengthens the attention span of users, and it supports the premise of engagement theory as discussed in the literature review.16 only 10 percent of individuals assigned to the experimental group decided not to use the headset. additionally, about one-third of participants in both the experimental and control groups indicated that they used the vr headset to view other content after they completed the study.

figure 2. amount of time experimental group participants spent in vr experience

table 2 shows responses to the likert questions about participants' post-test perspectives of vr. participants in the vr group generally had more favorable perspectives on their experience than participants in the control group. participants in the control group, however, were a bit more optimistic about the idea that vr has promising uses for education and librarianship (though both groups expressed optimistic perspectives on these questions). there was some indication that participants would be willing to use vr for student orientation again, as both groups responded favorably to the idea that vr orientation information is appropriate and negatively to the idea that it would be better to get information from other sources. tables 3 and 4 display the ancova for pre-test/post-test score change among groups and the change in anxiety among the groups, respectively. post-test scores for the experimental group (17.23 correct out of 20 questions, or 86 percent) and the control group (17.38/20, or 87 percent) were virtually identical; however, the pre-test scores differed (the experimental group, at 72 percent, scored worse on the pre-test than the control group, at 78 percent), so the change in scores was actually greater for the experimental group. as shown in table 3, though, this difference in score change was not found to be statistically significant, f(1, 20) = .641, p = .4, r = .01. that is, no significant difference was found as to whether vr improves scores compared to links. it can be concluded, however, that the links and vr together did improve scores from the pre-test to the post-test, with ancova values of f(1, 20) = 7.6, p < .01, r = .47.

table 2.
post-test perspectives of vr for experimental and control groups (values are levels of agreement on a five-point likert scale: 1, strongly disagree; 5, strongly agree)
question: control (text links) / experimental (vr)
the instructions were easy to understand and follow: 3 / 3.38
the viewer/text links were fun to use: 3 / 3.63
the vr/text links content was engaging: 3 / 3.13
i would recommend continuing vr/text links use: 2.67 / 3
i felt better informed about the topics presented: 2.5 / 3.11
the information given was helpful: 2.5 / 3.38
i feel more connected to the school than before: 2.5 / 2.88
virtual reality is just a fad: 2 / 2.88
there are exciting uses for vr in education: 4 / 3.5
there are exciting uses for vr in librarianship: 4 / 3.5
using vr is too time consuming: 2 / 3
i'd rather get information in formats other than vr: 2.5 / 2.89
vr orientation information is appropriate: 4 / 3.38

table 3. ancova for pre-test/post-test change in scores
source: degrees of freedom / f value / p value
pretest: 1 / .135 / .7
group: 1 / .641 / .4
error: 18
total: 19
corrected total: 20

though the vr group generally reported less anxiety on a five-point likert scale following the experiment than the control group (both groups showed some reduction), this difference was not statistically significant at p < .05 (though it was significant at p < .1). it is worth noting that few students indicated prior experience with vr before this study, so it may simply have been the unfamiliar technology, rather than the nature of the content, that kept anxiety from dropping as far as anticipated. at the same time, it is worth noting, as bawden and robinson did, that information overload, which could certainly be the product of immersive vr orientations, is connected to information anxiety.17 thus, in designing vr orientations it may be better to keep the amount of new information to a minimum, introducing only broad concepts and allowing more freedom and flexibility for the user.

table 4. ancova for anxiety following the orientation experience
source: sum of squares / df / mean square / f / sig.
between groups: 3.219 / 1 / 3.219 / 3.449 / .079
within groups: 17.733 / 19 / .933
total: 20.952 / 20

discussion
participants in this study expressed willingness to use vr for extended periods of time (over 25 minutes) and demonstrated strong levels of engagement. based on this finding, it seems possible that a well-designed vr orientation could be a suitable substitute for the in-person orientation for distance students. this is a significant finding, given that the majority of existing research on orientation for distance education students focuses on the design of online course modules or video streaming for orientation, which are not nearly as immersive and dynamic as physical presence in the environment.18 vr much more closely emulates physical presence than noninteractive, nonimmersive videos and text. participants in the experimental (vr) group expressed more favorable perspectives towards the technology. this suggests that experience with the technology increases comfort and interest in it, and it aligns with the findings of theng, mei-ling, liu, and cheok, among others, who found that users of vr were more likely to accept the technology after using it.19 additionally, participants stated interest in using vr for other purposes; one-third of participants had already used the technology to explore other apps suggested by the researchers.
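for readers who want to see how this kind of analysis is assembled, the sketch below shows one way the analyses summarized in tables 3 and 4 could be computed with pandas and statsmodels. it is an illustration only: the file name and column names (group, pretest, posttest, anxiety_post) are hypothetical placeholders, not the study's actual data file or code.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# hypothetical data file: one row per participant, with columns
# group ("control" or "experimental"), pretest, posttest, anxiety_post
df = pd.read_csv("orientation_study.csv")

# ancova for post-test scores with the pre-test score as covariate
# (the structure reported in table 3: pretest and group effects on posttest)
posttest_model = ols("posttest ~ pretest + C(group)", data=df).fit()
print(sm.stats.anova_lm(posttest_model, typ=2))

# simple between-groups test of post-orientation anxiety
# (mirrors the between-groups/within-groups structure of table 4)
anxiety_model = ols("anxiety_post ~ C(group)", data=df).fit()
print(sm.stats.anova_lm(anxiety_model, typ=2))

in each model, the group term is the comparison whose f and p values correspond to those reported in the tables above.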
the findings of this study align with game engagement theory in several of its key aspects. vr is shown to have garnered the interest of the students who participated in the study, as indicated in table 2, aligning with the aspect of interest. they could see the purpose of the experience and were able to take control of the experience to ensure that they interacted with necessary information to satisfy this purpose. this is opposed to the control group, which had to follow links and read text in a sequential order with little control or creativity involved. accordingly, greater improvement in scores was observed for the experimental group. even though the improvement was not statistically significant, this could likely be explained by the relatively small sample size. with a larger number of participants, the statistical strength of the differences between the two study groups may have been more pronounced. this is one limitation of the present study. in addition to a small participant group, several other limitations exist with this study. participants came from only a small sample of states, all in the western half of the united states. a less homogeneous sample may have produced more robust results. some vr headsets arrived late due to delays in distributing them, giving the students less opportunity to review the content than information technology and libraries june 2020 virtual reality as a tool for student orientation | valenti, lund, and wang 10 they otherwise may have had. finally, the researchers were not able to easily troubleshoot problems with accessing the vr experience for distance students. while the best was done to help all participants figure out how to use the technology, several students opted to discontinue participation when the technology gave them trouble. this also led to a smaller study sample population than initially anticipated. conclusion the findings of this study may have several important implications for library professionals who are considering using vr technology for library orientations or instruction. this study found vr to have a positive effect on students’ interest and to slightly increase scores and reduce anxiety among them. while there is no indication from this study whether vr would produce positive effects over a sustained period of time (e.g., every class session over the course of a semester), in limited usage it appears to at least draw students’ attention more so than the traditional online teaching options like static text and links. the same vr experience developed to introduce students to basic concepts within the librarianship/the library could be used for undergraduate and graduate students in all majors during library orientation sessions. this may make the library a more memorable component of students’ early university experiences, as opposed to lecture information that students are likely to easily forget. library professionals may consider these factors when deciding whether to opt for the more traditional methods of instruction/orientation or experimenting with a more innovative method of teaching like virtual reality. endnotes 1 jennifer j. vogel et al., “using virtual reality with and without gaming attributes for academic achievement,” journal of research on technology in education 39, no. 1 (2006): 105–18, https://doi.org/10.1080/15391523.2006.10782475. 
2 yigal rosen, “the effects of an animation-based on-line learning environment on transfer of knowledge and on motivation for science and technology learning,” journal of educational computing research 40, no. 4 (2009): 451–67, https://doi.org/10.2190/ec.40.4.d; elisha chambers, efficacy of educational technology in elementary and secondary classrooms: a metaanalysis of the research literature from 1992–2002 (carbondale, il: southern illinois university at carbondale, 2002). 3 elisha chambers, “efficacy of educational technology in elementary and secondary classrooms: a meta-analysis of the research literature from 1992–2002,” phd diss., southern illinois university at carbondale, 2002. 4 jason m. harley et al., “comparing virtual and location-based augmented reality mobile learning: emotions and learning outcomes,” educational technology research and development 64, no. 3 (2016): 359–88, https://doi.org/10.1007/s11423-015-9420-7; jocelyn parong and richard e. mayer. “learning science in immersive virtual reality,” journal of educational psychology 110, no. 6 (2018): 785–95, https://doi.org/10.1037/edu0000241; paul legris, john ingham, and pierre collerette, “why do people use information technology? a https://doi.org/10.1080/15391523.2006.10782475 https://doi.org/10.2190%2fec.40.4.d https://doi.org/10.1007/s11423-015-9420-7 https://psycnet.apa.org/doi/10.1037/edu0000241 information technology and libraries june 2020 virtual reality as a tool for student orientation | valenti, lund, and wang 11 critical review of the technology acceptance model,” information and management 40, no. 3 (2003): 191–204, https://doi.org/10.1016/s0378-7206(01)00143-4. 5 zaid khot et al., “the relative effectiveness of computer‐based and traditional resources for education in anatomy,” anatomical sciences education 6, no. 4 (2013): 211–15, https://doi.org/10.1002/ase.1355; michael j. robertson and james g. jones, “exploring academic library users’ preferences of delivery methods for library instruction,” reference & user services quarterly 48, no. 3 (2011): 259–69. 6 joshua kim, “instructional designers by the numbers,” inside higher ed (2015), https://www.insidehighered.com/blogs/technology-and-learning/instructional-designersnumbers. 7 elena olmos-raya et al., “mobile virtual reality as an educational platform: a pilot study on the impact of immersion and positive emotion induction in the learning process,” eurasia journal of mathematics science and technology education 14, no. 6 (2018): 2045-57, https://doi.org/10.29333/ejmste/85874. 8 brady d. lund and shari scribner, “developing virtual reality experiences for archival collections: case study of the may massee collection at emporia state university,” the american archivist, https://doi.org/10.17723/aarc-82-02-07. 9 lund and scribner, “developing virtual reality experiences for archival collections.” 10 kenneth j. varnum, “preface,” in kenneth j. varnum, ed., beyond reality: augmented, virtual, and mixed reality in the library (chicago: ala editions, 2019): x. 11 brady d. lund and daniel a. agbaji, “augmented reality for browsing physical collections in academic libraries,” public services quarterly 14, no. 3 (2018): 275–82, https://doi.org/10.1080/15228959.2018.1487812. 12 kenneth j. varnum, ed., beyond reality: augmented, virtual, and mixed reality in the library (chicago: ala editions, 2019). 13 nicola whitton, “game engagement theory and adult learning,” simulation and gaming 42, no. 5 (2011): 596–609, https://doi.org/10.1177/1046878110378587. 
14 chris dede, “immersive interfaces for engagement and learning,” science 323, no. 5910 (2010): 66–69, https://doi.org/10.1126/science.1167311. 15 pat dugard and john todman, “analysis of pre‐test‐post‐test control group designs in educational research,” educational psychology 15, no. 2 (1995): 181–98, https://doi.org/10.1080/0144341950150207. https://doi.org/10.1016/s0378-7206(01)00143-4 https://doi.org/10.1002/ase.1355 https://www.insidehighered.com/blogs/technology-and-learning/instructional-designers-numbers https://www.insidehighered.com/blogs/technology-and-learning/instructional-designers-numbers https://doi.org/10.29333/ejmste/85874 https://doi.org/10.17723/aarc-82-02-07 https://doi.org/10.1080/15228959.2018.1487812 https://doi.org/10.1177%2f1046878110378587 https://doi.org/10.1126/science.1167311 https://doi.org/10.1080/0144341950150207 information technology and libraries june 2020 virtual reality as a tool for student orientation | valenti, lund, and wang 12 16 nitza geri, amir winer, and beni zaks, “challenging the six-minute myth of online video lectures: can interactivity expand the attention span of learners?,” online journal of applied knowledge management 5, no. 1 (2017): 101–11. 17 david bawden and lyn robinson, “the dark side of information: overload, anxiety and other paradoxes and pathologies,” journal of information science 35, no. 2 (2009): 180–91, https://doi.org/10.1177/0165551508095781. 18 moon-heum cho, “online student orientation in higher education: a developmental study,” educational technology research and development 60, no. 6 (2012): 1051–69, https://doi.org/10.1007/s11423-012-9271-4; karmen crowther and alan wallace, “delivering video-streamed library orientation on the web: technology for the educational setting,” college and research libraries news 62, no. 3 (2001): 280–85. 19 yin-leng theng et al., “mixed reality systems for learning: a pilot study understanding user perceptions and acceptance,” international conference on virtual reality (2007): 728–37, https://doi.org/10.1007/978-3-540-73335-5_79. https://doi.org/10.1177/0165551508095781 https://doi.org/10.1007/s11423-012-9271-4 https://doi.org/10.1007/978-3-540-73335-5_79 abstract literature review emporia state university’s school of library and information management methods research questions hypothesis variables experimental phase results discussion conclusion endnotes persistent urls and citations offered for digital objects by digital libraries article persistent urls and citations offered for digital objects by digital libraries nicholas homenda information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12987 abstract as libraries, archives, and museums make unique digital collections openly available via digital library platforms, they expose these resources to users who may wish to cite them. often several urls are available for a single digital object, depending on which route a user took to find it, but the chosen citation url should be the one most likely to persist over time. catalyzed by recent digital collections migration initiatives at indiana university libraries, this study investigates the prevalence of persistent urls for digital objects at peer institutions and examines the ways their platforms instruct users to cite them. 
this study reviewed institutional websites from the digital library federation’s (dlf) published list of 195 members and identified representative digital objects from unique digital collections navigable from each institution’s main web page in order to determine persistent url formats and citation options. findings indicate an equal split between offering and not offering discernible persistent urls with four major methods used: handle, doi, ark, and purl. significant variation in labeling persistent urls and inclusion in item-specific citations uncovered areas where the user experience could be improved for more reliable citation of these unique resources. introduction libraries, archives, and museums often make their unique digital collections openly available in digital library services and in different contexts, such as digital library aggregators like the digital public library of america (dpla, https://dp.la/) and hathitrust digital library (https://www.hathitrust.org/). as a result, there can be many urls available that point to digital objects within these collections. take, for example, image collections online (http://dlib.indiana.edu/collections/images) at indiana university (iu), a service launched in 2007 featuring open access iu image collections. users discover images on the site through searching and browsing and its collections are also shared with dpla. the following urls exist for the digital object shown in figure 1, an image from the building a nation: indiana limestone photograph collection: • the url as it appears in the browser in image collections online: https://webapp1.dlib.indiana.edu/images/item.htm?id=http://purl.dlib.indiana.edu/iudl/i mages/vac5094/vac5094-01446 • the persistent url on that page (“bookmark this page at”) http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446 • the url pasted from the browser for the image in dpla: https://dp.la/item/eb83ff0a6ae507e2ba441634f7eb0f18?q=indiana%20limestone nicholas homenda (nhomenda@indiana.edu) is digital initiatives librarian, indiana university bloomington. © 2021. https://dp.la/ https://www.hathitrust.org/ http://dlib.indiana.edu/collections/images https://webapp1.dlib.indiana.edu/images/item.htm?id=http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446 https://webapp1.dlib.indiana.edu/images/item.htm?id=http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446 http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446 https://dp.la/item/eb83ff0a6ae507e2ba441634f7eb0f18?q=indiana%20limestone mailto:nhomenda@indiana.edu information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 2 as a digital library or collection manager, which url would you prefer to see cited for this object? figure 1. an example of a digital object with multiple urls. mcmillan mill, ilco id in2288_1. courtesy, indiana geological and water survey, indiana university, bloomington, indiana. retrieved from image collections online at http://purl.dlib.indiana.edu/iudl/images/vac5094/vac509401446. citation instructions given to authors in major style guides explicitly mention using the best possible form of a resource’s url: “[i]t is important to choose the version of the url that is most likely to continue to point to the source cited.”1 of the three urls above, the second is a purl, or persistent url (https://archive.org/services/purl/), which is why both image collections online and dpla instruct users to bookmark or cite it. 
other common methods for issuing and maintaining persistent urls include digital object identifiers (doi, https://www.doi.org/), handles (http://handle.net/), and archival resource keys (ark, https://n2t.net/e/ark_ids.html). all of those have been around since the late 1990s to early 2000s. at indiana university libraries, recent efforts have focused on migrating digital collections to new digital library platforms, mainly based on the open source samvera repository software (https://samvera.org/). as part of these efforts, we wanted to survey how peer institutions were employing persistent, citable urls for digital objects to determine if a prevailing approach had emerged since indiana university libraries' previous generation of digital library services was developed in the early to mid-2000s. besides having the capability of creating and reliably serving these urls, our digital library platforms need to make these urls easily accessible to users, preferably along with some assertion that the urls should be used when citing digital objects and collections instead of the many non-persistent urls also directing to those same digital objects and collections. although libraries, archives, and museums have digitized and made digital objects in digital collections openly accessible for decades using several methods for providing persistent, citable urls, how do institutions now present digital object urls to people who encounter, use, and cite them? by examining digital collections within a large population of digital library institutions' websites, this study aims to discover:
1. what methods of url persistence are being employed for digital objects by digital library institutions?
2. how do these institutions' websites instruct users to cite these digital objects?

literature review
the study of digital objects in the literature often takes a philosophical perspective in attempting to define them. moreover, practical accounts of digital object use and reuse note the challenges associated with infrastructure, retrieval, and provenance. much of the literature about common methods of persistent url resolution comes from individuals and entities who developed and maintain these standards, as well as overviews of the persistent url resolution methods available. finally, several studies have investigated the problem of "link rot" by tracking the availability of web-hosted resources over time.
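the link-rot studies discussed below all rest on the same basic operation: revisiting a previously published url and recording whether it still resolves. as an illustration only, a minimal check of that kind can be written with the python standard library; the example urls and timeout are placeholders, not data or code from any of the cited studies.

import urllib.error
import urllib.request

def url_resolves(url, timeout=10):
    # returns true when the url answers with an http status below 400;
    # some servers reject head requests, so a sketch like this undercounts
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError):
        return False

for url in ["https://www.doi.org/", "http://handle.net/"]:
    print(url, "resolves" if url_resolves(url) else "does not resolve")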
allison notes the generations of philosophical thought that it took to recognize common characteristics of physical objects and the difficulty in understanding an authentic version of a digital object, especially with different computer hardware and software changing the way digital objects appear.2 hui also investigates the philosophical history of physical objects to begin to define digital objects through his methods of datafication of objects and objectification of data, noting that digital objects can be approached in three phases: objects, data, and networks, in order to define them.3 lynch is also concerned with determining the authenticity of digital objects and challenges inherent in the digital realm. in describing digital objects, he creates a hierarchy with raw data at the bottom, elevated to interactive experiential works at the top which elicit the fullest emotional connection contributing to the authentic experience of the work.4 the literature often examines digital objects from the practitioner’s perspective, such as the publishing industry’s difficulty in repurposing digital objects for new publishing products. publishers in benoit and hussey’s 2011 case study note the tension between managers and technical staff concerning assumptions about what their computer system could automatically do with their digital objects; their digital objects always require some human labor and intervention to be accurately described and retrievable later. 5 dappert et al. note the need to describe a digital object’s environment in order to be able to reproduce it in their work with the premis data dictionary for preservation metadata (https://www.loc.gov/standards/premis/).6 strubulis et al. provide a model for digital object provenance using inference and resource description framework (rdf) triples (https://w3.org/rdf/) since storing full provenance information for https://www.loc.gov/standards/premis/ https://w3.org/rdf/ information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 4 complex digital objects, such as the large amount of mars rover data they offer as an example, would be cost prohibitive.7 in 2001, arms describes the landscape of persistent uniform resource names (urn) of handles, purls, and dois near the latter’s inception.8 recent work by koster explains the persistent identifier methods most in use today and examines current infrastructure practices for maintaining them.9 the persistent link resolution method most prominently featured in the literature is the digital object identifier (doi). beginning in 1999, those behind developing and implementing doi have explained its inception, development, and trajectory, continuing with paskin’s deep explanation in 2002 of the reasons why doi exist and the technology behind the service. 10 discipline-specific research notes the utility of doi. sidman and davidson and weissberg studied doi for the purposes of automating the supply chain in the publishing industry.11 derisi, kennison, and twyman, on behalf of the public library of science (plos) announced their 2003 decision to broadly implement doi, followed by additional disciplinespecific encouragement of the practice by skiba in nursing education and neumann and brase in molecular design.12 the archival resource key (ark) is an alternative permanent link resolution scheme. 
since 2001, the open-source ark identifier offers a self-hosted solution for providing persistent access to digital objects, their metadata, and a maintenance commitment.13 recently, duraspace working groups have planned for further development and expansion of ark with the arks in the open project (https://wiki.lyrasis.org/display/arks/arks+in+the+open+project). persistent urls (purls) have been used to provide persistent access to digital objects for nearly 20 years, and their use in the library community is well documented. shafer, weibel, and jul anticipate uniform resource names becoming a web standard and offer purls as an intermediate step to aid in urn development.14 shafer also explained how oclc uses purls and alternate routing methods (arms) to properly direct global users to oclc resources.15 purls are also used to provide persistent access to government information and were seen by the cendi persistent identification task group as essential to their early efforts to implement the federal enterprise architecture (fea) and a theoretical federal persistent identification resolver.16 digital objects and collections should ideally be accessible via urls that work beyond the life of any one platform, lest the materials be subjected to “link rot,” or the process of decay when previously working links no longer correctly resolve. ducut et al. investigated 1994–2006 medline abstracts for the presence of persistent link resolution services such as handle, purl, doi, and webcite and found 20% of the links were inaccessible in 2008.17 mcmurry et al. investigated link rot in life sciences data and suggested practices for formatting links for increased persistence and approaches for versioning.18 the topic of link rot has been examined as early as 2003, in markwell and brooke’s “broken links: just how rapidly do science education hyperlinks go extinct,” cited by multiple link rot studies. ironically, this article is no longer accessible at the cited url.19 methodology this study sought a set of digital objects within library institutions’ digital collections websites. to locate examples of publicly accessible digital objects in digital collections, this study collected institutional websites from the digital library federation’s (dlf) published list of 195 members https://wiki.lyrasis.org/display/arks/arks+in+the+open+project information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 5 as of august 2019.20 subsequent investigation aimed to find one representative digital object from unique digital collections navigable from each institution’s main web page. this study aimed to locate digital collections that met the following criteria: 1. collections are openly available. 2. collections are in a repository service, as opposed to highlighted content visible on an informational web page or blog. 3. collections are gathered within a site or service that contains multiple collections, as opposed to individual digital project websites, when possible. 4. collections are unique to an institution, as opposed to duplicated or licensed content. these criteria were developed in an effort to find unique, publicly accessible digital objects within each institution’s digital collections. to be sure, users search for and discover materials in a variety of ways and in numerous services, but studying the information-seeking behavior of users looking for digital objects or digital collections is outside the scope of this study. 
ultimately, digital collections indexed by search engines or available in aggregator services like dpla often contain links to collections and objects in their institutionally hosted platforms. users who discover these materials are likely to be directed to the sites this study investigated. for the purposes of this study, at least one digital collection was investigated from each dlf institution. multiple sites for an institution were investigated when more than one publicly accessible site or service met the above criteria. when digital collections at an institution were delivered only through the library catalog discovery service, reasonable attempts were made to delimit discoverable digital collections content. in total, 183 digital collections were identified for this study. once digital collections were located, subsequent investigation aimed to locate individual digital objects within them. while digital objects represent diverse materials available in a variety of formats, for ease of comparing approaches between institutions, a mixture of individual digital images, multipage digital items, and audiovisual materials was examined. objects for this study were primarily available in websites containing a variety of collections and format types with common display characteristics despite format differences, and no additional efforts were made to locate equal or proportional digital object formats at each institution. one representative digital object was identified per digital collection, totaling 183 digital objects. once a digital object was located at an institution, the object's unique identifier, format, persistent url, persistent url label, method of link resolution (if identifiable), and citation were collected, with particular focus on the object's persistent url, if available. commonly used persistent url types and their url components can be identified, as seen in table 1; however, any means of persistence was collected if clearly identified. after examining initial results, the object's provided citation, if available, was added to the list of data collected, since many digital collection platforms provide recommended citations for individual objects.

table 1. commonly used persistent url methods and corresponding url components
persistent url type: url component
archival resource key (ark): ark:/
digital object identifier (doi): doi.org/ (or doi:)
handle: hdl.handle.net
persistent url (purl): purl.

results
most institutions have a single digital collection site or service that met the selection criteria for this study. some appear to have multiple digital collection repositories, often separated by digital object format or library department, and many institutions have collections that are only publicly accessible through discrete project websites, such as digital exhibits or focused digital humanities research projects. out of 195 dlf member institutions, 171 had publicly accessible digital collections. of these 171 institutions, 153 had digital collections services/sites that adhered to the criteria of this study, while 21 had only project-focused digital collections sites. since several institutions had more than one digital collection platform accessible via their main institutional website, a population of 183 digital collections was investigated.
one representative digital object from each collection was gathered, consisting of 107 digital images, 73 multipage items, and 3 audiovisual items (totaling 183).

table 2. number of instances of digital collection platforms identified
platform: number / percentage of total (183)
custom or unidentifiable: 53 / 29%
contentdm: 46 / 25%
islandora: 19 / 10%
dspace: 11 / 6%
samvera: 11 / 6%
omeka: 10 / 5%
internet archive: 7 / 4%
digital commons: 6 / 3%
fedora custom: 4 / 2%
luna: 3 / 2%
xtf: 3 / 2%
artstor: 2 / 1%
iiif server: 2 / 1%
primo: 2 / 1%
aspace: 1 / 1%
elevator: 1 / 1%
knowvation: 1 / 1%
veridian: 1 / 1%

as seen in table 2, almost a third of the digital collection platforms encountered appear to be custom-developed or customized so as not to reveal the software platform upon which they were based. of the platform-based services encountered where software was identifiable, 17 different platforms were used, and the top five were contentdm, islandora, dspace, samvera (hyrax, avalon, curation concerns, etc.), and omeka.

table 3. occurrence of persistent links in surveyed digital collections, method of link persistence, and persistent link labels
persistent links?: number / percentage of total (183)
no/unknown: 93 / 51%
yes/persistence claimed: 90 / 49%
persistent link method: number / percentage of total (90)
unknown: 33 / 37%
handle: 27 / 30%
ark: 19 / 21%
doi: 6 / 7%
purl: 5 / 6%
persistent link label: number / percentage of total (90)
other*: 24 / 26.7%
permalink: 22 / 24.4%
identifier: 13 / 14.4%
[no label given]: 10 / 11.1%
permanent link: 7 / 7.8%
uri: 5 / 6%
persistent link: 3 / 3.3%
handle: 2 / 2.2%
link to the book: 2 / 2.2%
persistent url: 2 / 2.2%
* twenty-four other persistent link labels were reported,21 each occurring only once.

as seen in table 3, the numbers of digital objects with and without publicly accessible persistent (or seemingly persistent) links were nearly equal. among the digital objects with persistent links, the majority claimed persistence without a discernible resolution method, with the rest divided between handle, ark, doi, and purl. these objects also had 33 different labels for these links in the public-facing interface. the top five labels were: permalink (22), identifier (13), permanent link (7), uri (5), and persistent link (3). as seen in table 4, the majority of digital objects surveyed had a unique item identifier in their publicly viewable item record. the majority did not offer a citation in the item's publicly viewable record. among items that offered citations, the majority contained a link to the item, and three offered downloadable citation formats only, such as endnote, zotero, and mendeley.

table 4. various digital object characteristics surveyed
unique item identifier in item record: number / percentage of total (183)
yes: 132 / 72%
no: 51 / 28%
citation in item record: number / percentage of total (183)
yes: 65 / 36%
no: 118 / 64%
citations containing links to item: number / percentage of total (65)
yes: 39 / 60%
downloadable citation format only: 3 / 5%
no: 23 / 35%

discussion
since proper citation practice dictates choosing the url most likely to provide continuing access to a resource, it follows that providing persistent urls to resources such as digital objects or digital collections is also a good practice.
it is encouraging to see a large number of institutions surveyed providing urls that persist (or claim to persist). providing persistent access to a unique digital resource implies a level of commitment to maintaining its url into the future, requiring policies, technology, and labor resources, further augmented by costs associated with registering certain types of identifiers like doi.22 it is likely that institutions not providing persistent (or not obviously persistent) urls are either internally committing to preserving their objects, collections, and services through means not known to end users; are constrained by technological limitations of their digital collection platforms; hope to develop or adopt new digital library services that offer these capabilities; or lack the resources to offer persistent urls. the four commonly used methods of persistent link resolution (doi, handle, ark, and purl) have been used for nearly 20 years, and it is not surprising that alternative observable methods were seldom encountered in this study. handles were the most common persistent url method, which seems related to the digital library platform used by an institution. dspace distributions are pre-bundled with handle server software, for example, and 12 out of 27 platforms serving digital objects with handles were based on dspace (https://duraspace.org/dspace/). when choosing to implement or upgrade a digital library platform, institutions often consider several available options. choosing a platform that offers the ability to easily create and maintain persistent urls might be less burdensome than making urls persist via independent or alternative means. thirty-three digital objects offered links that had labels implying some sort of persistence but lacked information describing the methods used or url components consistent with commonly used methods, as seen in table 1. to achieve persistence, there might be a combination of url rewriting, locally implemented solutions, or nonpublic persistent urls at work. it would benefit users, increasingly aware of the need to cite digital objects using persistent links, for digital object platforms that offer persistent linking to explicitly state that fact and ideally offer some evidence of the resolution method used. researchers will be looking for citable persistent links that offer some cues signifying their persistence, whether it is clearly indicated language on the website or a url pattern consistent with the four major methods commonly used.
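those url patterns can be matched mechanically. as an illustration only (the function name and screening logic below are hypothetical, not the script used for this study's data collection), a url can be flagged as one of the four major persistent-identifier types using the components from table 1:

def persistent_url_type(url):
    # heuristic match against the url components listed in table 1
    lowered = url.lower()
    if "ark:/" in lowered:
        return "ark"
    if "doi.org/" in lowered or lowered.startswith("doi:"):
        return "doi"
    if "hdl.handle.net" in lowered:
        return "handle"
    if "purl." in lowered:
        return "purl"
    return "unknown"

# the first two urls appear earlier in this article; the handle is a placeholder
examples = [
    "http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446",
    "https://doi.org/10.6017/ital.v40i2.12987",
    "https://hdl.handle.net/1234/5678",
]
for url in examples:
    print(persistent_url_type(url), url)

a label such as "permalink" or "permanent link" on the page would still need to be read by a person; the pattern match covers only the url itself.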
it was somewhat surprising to encounter these types of citation helpers that did not include persistent links. since a digital object’s preferred persistent link is often different than the url visible in the browser, efforts should be made to make citations available containing persistent links. there are institutions with digital collections that were not examined in this study due to a number of factors. first, this study examined the 195 institutions who were members of the digital library federation, and there are 2,828 four-year postsecondary institutions in the united states as of 2018.23 additional study could expand perceptions about persistent links for digital objects when looking beyond the dlf member institutions, which are predominantly four-year postsecondary institutions but also contain museums, public libraries, and other cultural heritage organizations. an alternative approach to collecting this data would be to conduct user testing focused on finding and citing digital objects from a number of institutions. this approach was not used, however, since the initial goal of this study was to see how peer digital library institutions have employed persistent links and citations across a broad yet contained spectrum. as one librarian with extensive digital library experience, my approach to locating these platforms and resources is subject to subconscious bias i may have accumulated over my professional career, but i would hope that my experience makes me more able to locate these platforms and materials than the average user. digital library platforms are numerous, and often institutions have several of them with varying degrees of public visibility or connectivity to their institution’s main library website. this study’s findings for any particular institution are not as authoritative as self-reported information from the institution itself. while a survey aimed at collecting direct responses from institutions might have yielded more accuracy, a potentially low response rate would also make it difficult to truly know what methods of persistent linking peer institutions are employing, especially with the majority of these resources being openly findable and accessible. still, further study with self reported information could shed more light on the decisions to provide certain methods of persistent links to objects within their chosen digital collection platforms. moreover, it is possible that some digital object formats are more likely to have persistent urls than others. newer formats such as three-dimensional digital objects, commonly cited resources like data sets, and scholarship held in institutional repositories could be available in digital library services similar to those surveyed in this study with different persistent url characteristics. additional study could aim to survey populations of digital objects by format across multiple institutions to investigate any correlation between persistent urls and object format. information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 10 conclusion unique digital collections at digital library institutions are made openly accessible to the pu blic in a variety of ways, including digital library software platforms and digital library aggregator services. regardless of how users find these materials, best practices require users to cite urls for these materials that are most likely to continue to provide access to them. 
persistent urls are a common way to ensure cited urls to digital objects remain accessible. commonly used methods of issuing and maintaining persistent urls can be identified in digital object records within digital collection platforms available at these institutions. this study identified characteristics about these digital objects, their platforms, prevalence of persistent urls in their records, and the way these urls are presented to users. findings indicate that dlf member institutions are split evenly between providing and not providing publicly discernible persistent urls with wide variation on how these urls are presented and explained to users. decisions made in developing and maintaining digital collection platforms and the types of urls made available to users impact which urls users cite and the possibility of others encountering these resources through these citations. embarking on this study also was prompted by digital collection migrations at indiana university, and these findings provide us interesting examples of persistent url usage at other institutions and ways to improve the user experience in digital collection platforms. endnotes 1 the chicago manual of style online (chicago: university of chicago press, 2017), ch. 14, sec. 7. 2 arthur allison et al., “digital identity matters,” journal of the american society for information science & technology 56, no. 4 (2005): 364–72, https://doi.org/10.1002/asi.20112. 3 yuk hui, “what is a digital object?” metaphilosophy 43, no. 4 (2012): 380–95, https://doi.org/10.1111/j.1467-9973.2012.01761.x. 4 clifford lynch, “authenticity and integrity in the digital environment: an exploratory analysis of the central role of trust” council on library and information resources (clir), 2000, https://www.clir.org/pubs/reports/pub92/lynch/. 5 g. benoit and lisa hussey, “repurposing digital objects: case studies across the publishing industry,” journal of the american society for information science & technology 62, no. 2 (2011): 363–74, https://doi.org/10.1002/asi.21465. 6 angela dappert et al., “describing and preserving digital object environments,” new review of information networking 18, no. 2 (2013): 106–73, https://doi.org/10.1080/13614576.2013.842494. 7 christos strubulis et al., “a case study on propagating and updating provenance information using the cidoc crm,” international journal on digital libraries 15, no. 1 (2014): 27–51, https://doi.org/10.1007/s00799-014-0125-z. 8 william y. arms, “uniform resource names: handles, purls, and digital object identifiers,” communications of the acm 44, no. 5 (2001): 68, https://doi.org/10.1145/374308.375358. https://doi.org/10.1111/j.1467-9973.2012.01761.x https://www.clir.org/pubs/reports/pub92/lynch/ https://doi.org/10.1002/asi.21465 https://doi.org/10.1080/13614576.2013.842494 https://doi.org/10.1007/s00799-014-0125-z https://doi.org/10.1145/374308.375358 information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 11 9 lukas koster, “persistent identifiers for heritage objects,” code4lib journal 47 (2020), https://journal.code4lib.org/articles/14978. 10 albert w. simmonds, “the digital object identifier (doi),” publishing research quarterly 15, no. 2 (1999): 10, https://doi.org/10.1007/s12109-999-0022-2; norman paskin, “digital object identifiers,” information services & use 22, no. 2/3 (2002): 97, https://doi.org/10.3233/isu2002-222-309. 
11 david sidman and tom davidson, “a practical guide to automating the digital supply chain with the digital object identifier (doi),” publishing research quarterly 17, no. 2 (2001): 9, https://doi.org/10.1007/s12109-001-0019-y; andy weissberg, “the identification of digital book content,” publishing research quarterly 24, no.4 (2008): 255–60, https://doi.org/10.1007/s12109-008-9093-8. 12 susanne derisi, rebecca kennison, and nick twyman, “the what and whys of dois,” plos biology 1, no. 2 (2003): 133–34, https://doi.org/10.1371/journal.pbio.0000057; diane j. skiba, “digital object identifiers: are they important to me?,” nursing education perspectives 30, no. 6 (2009): 394–95, https://doi.org/10.1016/j.lookout.2008.06.012; janna neumann and jan brase, “datacite and doi names for research data,” journal of computer-aided molecular design 28, no. 10 (2014): 1035–41, https://doi.org/10.1007/s10822-014-9776-5. 13 john kunze, “towards electronic persistence using ark identifiers,” california digital library, 2003, https://escholarship.org/uc/item/3bg2w3vs. 14 keith e. shafer, stuart l. weibel, and erik jul, “the purl project,” journal of library administration 34, no. 1–2 (2001): 123, https://doi.org/10.1300/j111v34n01_19. 15 keith e. shafer, “arms, oclc internet services, and purls,” journal of library administration 34, no. 3–4 (2001): 385, https://doi.org/10.1300/j111v34n03_19. 16 cendi persistent identification task group, “persistent identification: a key component of an egovernment infrastructure,” new review of information networking 10, no. 1 (2004): 97–106, https://doi-org/10.1080/13614570412331312021. 17 erick ducut, fang liu, and paul fontelo, “an update on uniform resource locator (url) decay in medline abstracts and measures for its mitigation,” bmc medical informatics & decision making 8, no. 1 (2008): 1–8, https://doi.org/10.1186/1472-6947-8-23. 18 julie a. mcmurry et al., “identifiers for the 21st century: how to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data,” plos biology 15, no. 6 (2017): 1–18, https://doi.org/10.1371/journal.pbio.2001414. 19 john markwell and david brooks, “broken links: just how rapidly do science education hyperlinks go extinct?” (2003), cited by many and previously available from: http://wwwclass.unl.edu/biochem/url/broken_links.html [currently non-functional]. 20 “our member institutions,” digital library federation (2020), https://www.diglib.org/about/members/. 
https://journal.code4lib.org/articles/14978 https://doi.org/10.1007/s12109-999-0022-2 https://doi.org/10.3233/isu-2002-222-309 https://doi.org/10.3233/isu-2002-222-309 https://doi.org/10.1007/s12109-001-0019-y https://doi.org/10.1007/s12109-008-9093-8 https://doi.org/10.1371/journal.pbio.0000057 https://doi.org/10.1016/j.lookout.2008.06.012 https://doi.org/10.1007/s10822-014-9776-5 https://escholarship.org/uc/item/3bg2w3vs https://doi.org/10.1300/j111v34n01_19 https://doi.org/10.1300/j111v34n03_19 https://doi-org/10.1080/13614570412331312021 https://doi.org/10.1186/1472-6947-8-23 https://doi.org/10.1371/journal.pbio.2001414 http://www-class.unl.edu/biochem/url/broken_links.html http://www-class.unl.edu/biochem/url/broken_links.html https://www.diglib.org/about/members/ information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 12 21 twenty-four labels used only once: archival resource key; ark; bookmark this page at; citable link; citable link to this page; citable uri; copy; copy and paste this url; digital object url; doi; identifier (hdl); item; link; local identifier; permanent url; permanently link to this resource; persistent link to this item; persistent link to this record; please use this identifier to cite or link to this item; related resources; resource identifier; share; share link/location; to cite or link to this item, use this identifier. 22 one of the frequently asked questions (https://www.doi.org/faq.html) states that doi registration fees vary. 23 national center for education statistics, “table 317.10. degree-granting postsecondary institutions, by control and level of institution: selected years, 1949–50 through 2017–18,” in digest of education statistics, 2018, https://nces.ed.gov/programs/digest/d18/tables/dt18_317.10.asp. https://www.doi.org/faq.html https://nces.ed.gov/programs/digest/d18/tables/dt18_317.10.asp abstract introduction literature review methodology results discussion conclusion endnotes research on knowledge organization of intangible cultural heritage based on metadata article research on knowledge organization of intangible cultural heritage based on metadata qing fan, guoxin tan, chuanming sun, and panfeng chen information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14093 qing fan (fanqmy@hotmail.com) is phd student, jingchu university of technology and central china normal university. guoxin tan (gxtan@mail.ccnu.edu.cn) is professor, central china normal university. chuanming sun (cms@ccnu.edu.cn) is assistant professor, central china normal university. panfeng chen (94388389@qq.com) is phd student, guizhou university. © 2022. abstract metadata has been analyzed and summarized. based on dublin core metadata, combined with the characteristics and forms of intangible cultural heritage, this article explores the metadata for intangible cultural heritage in knowledge organizations based on relevant resource description standards. the wuhan woodcarving ship model is presented as an example of national intangible cultural heritage to control the application of metadata in intangible cultural heritage knowledge organizations. new ideas are provided for the digital development of intangible cultural heritage. introduction intangible cultural heritage includes traditions or living expressions inherited from our ancestors and passed on to our descendants. 
digital storage and presentation of intangible cultural heritage resources is an inevitable requirement for the protection of china’s long history and its culture in the information age. with the rapid development of artificial intelligence and big data, all kinds o f massive data in the internet age are expanding, necessitating the development of a database platform for the inheritance and protection of intangible cultural heritage. at the same time, organizations must consider how to deal with the intangible cultural heritage using complex data. searching for data and visualizing the relationship with intangible cultural heritage is a current research hotspot. however, at this stage, there are still some problems in the construction of digital resources of intangible cultural heritage in china, such as the establishment of accurate and interoperable metadata. in this process, the diversity and uniqueness of intangible cultural heritage items needs to be fully considered, including the subsequent integration of digital resources and its existing digital resource system of intangible cultural heritage in china. therefore, the construction of the intangible cultural heritage resource database is not only to simply organize and list the data, but more importantly, to reveal the relationships between the knowledge content and resources in the intangible cultural heritage field and to build a thorough and relevant knowledge system. research status at home and abroad metadata is data that describes the attributes of a certain type of resource (or object). metadata can be used to locate and manage the resource and display information about it.1 metadata can also be structured data used to describe online information resources and strengthen the collection development, organization, and utilization of online information resources.2 from the perspective of knowledge organization, general metadata is used to describe the theme, content, and characteristics of information resources. the most common metadata format is dublin core mailto:fanqmy@hotmail.com mailto:gxtan@mail.ccnu.edu.cn mailto:94388389@qq.com information technology and libraries june 2022 research on knowledge organization of intangible cultural heritage | qing, tan, sun, and chen 2 (dc) metadata, which is structured and descriptive. the creation of metadata standards in the field of intangible cultural heritage must first combine the basic concepts and characteristics of cultural heritage to extract specific attributes and provide element definitions that describe the basic characteristics of intangible cultural heritage resources, that is, core metadata. this is not easy to achieve since intangible cultural heritage is traditional art, music, folklore, etc. only by unifying intangible heritage resources of different expressions through metadata standards can a relatively standardized intangible cultural heritage resource library be formed. the visual resources association of america (vra) created the vra core metadata standard to describe art, architecture, prehistoric artifacts, folk culture, and other artistic visual resources in the network environment.3 in terms of intangible cultural heritage material, lan xuliu et al. proposed the vra core as the foundation format and added elements from the categories for the description of works of art (cdwa) as the extended element metadata format of digital cultural resources.4 a sculpture of abraham lincoln was used as the basis for the metadata format. 
the example explains the specific use method of the proposed metadata format in practice. the solution does not extend the core elements and there is an overall lack of flexibility as users cannot customize the required elements. b. murtha proposed a descriptive metadata architecture in the field of art and architecture, including the core category of ontology id, and added a controlled vocabulary and classification system in the field of art and architecture to enrich the specific metadata model.5 it is mainly based on the theoretical discussion of metadata standards in this field, and there is no specific practice, but its method of formulating metadata from the perspective of user retrieval effects is worth learning. yi junkai et al. proposed the core metadata specification for digital museums as the basis for expansion, implemented the relevant methods in the metadata expansion rules, and finally formed a special metadata specification. 6 this metadata specification system can guarantee the basic and personalized description of resources. the metadata specification was developed and completed by the national museum of china. to keep this specification consistent with the metadata description of other metadata specifications at home and abroad, the description method refers to the iso-11179 standard.7 the national museum metadata specification contains seven element sets, 60 elements, and 342 restricted elements. the seven metadata element sets are: collection resource entity, data resource entity, responsible entity, business entity, transaction entity, relationship entity, and save entity. each metadata element of the museum’s digital resources defines several elements according to the concept of hierarchical structure; each element is defined and described by a group of attributes, such as name, version, logo, definition, type, and value range. there are 11 attributes of necessity, repeatability, lower-level elements, application scope, and annotations. the establishment of the metadata standard framework for museum digital resources is based on the digitization of museum collection resources. collection resources are the core of museum work, and the content of museum collection resources is the core component of digital resources.8 the digital resources of these collections are related to communication, transmission, storage, or business activities. based on the characteristics of china’s existing intangible cultural heritage information resources, li bo proposed a compatible and interoperable metadata model. the description of intangible cultural heritage information resources was created on the basis of information structure and semantic component analysis.9 the ontological characteristics of each intangible cultural heritage information and related documents, characters, objects, spaces, and other entities are included in the construction of the intangible cultural heritage metadata model, which combines china’s nonmaterial cultural heritage. the actual situation of the tangible cultural heritage database has a information technology and libraries june 2022 research on knowledge organization of intangible cultural heritage | qing, tan, sun, and chen 3 certain degree of international generality. ye peng compared the dc metadata standard system with the needs of china’s intangible cultural heritage protection, proposed a metadata standard based on intangible cultural heritage resources, and gave the scope of application. 
this metadata standard contains multiple core metadata elements corresponding to the relevant elements in dc.10 however, ye peng also pointed out that a major problem with this intangible cultural heritage metadata standard is that it is not compatible with china's existing intangible cultural heritage databases in areas such as information storage, data mining, file retrieval, and multimedia distribution.

connotation and design principles of intangible cultural heritage metadata

connotation of intangible cultural heritage metadata

the core metadata of this study is designed on the basis of the dc core metadata set, considering its versatility, scalability, ease of conversion between metadata schemes, interoperability between systems, and comparisons with existing standards. dc is the most influential and widely used metadata standard for describing information resources in the network environment. since the dc metadata standard is mainly aimed at the retrieval of network entity resources, it reveals common characteristics of digital entity resources but does not consider the cultural connotation and knowledge context of specific knowledge topics such as intangible cultural heritage.11 to reveal the originality of the object, the model proposed in this article also combines the application and recording of china's intangible cultural heritage items, reflecting the characteristics of specific intangible cultural heritage items. this facilitates compatibility and integration with existing information resources, forming a unified interface standard with the existing intangible cultural heritage management systems of the cultural sector and enabling the sharing of digital resources among cultural centers in different regions.

design principles of intangible cultural heritage metadata

the design of the metadata model for intangible cultural heritage information resources should be fully compatible with popular metadata standards. various metadata standards apply to different objects: dc is suitable for network resources, cdwa is suitable for artworks, and the federal geographic data committee (fgdc) standard is suitable for geographic space. when it comes to digital collections, the national library of the netherlands was one of the first institutions in the world to respond, starting in 1994 with the decision to collect digital publications and working with publishers and it partners to make important contributions to digital collections research. the national library of the netherlands went on to develop a new global information network. the main approach of the system is to add dc data to all collected web pages: providers are required to add dc core elements to new pages themselves, and once a page is submitted, the library's search engine uses these dc elements to assist retrieval. in recent years, the art museum community has adopted several metadata standards, such as cdwa and vra core, to describe its collections of art works. nam and lee proposed a set of metadata elements customized to fit the distinct context of small-scale art museums in south korea.12 their scheme combines the existing cdwa, vra core, and dc standards, and the proposed set of metadata elements is expected to support the description of artistic resources in small korean art museums. the metadata design of intangible cultural heritage resources should refer to the design cases of the netherlands and south korea.
when applying existing metadata standards, it is important to fully reveal the characteristics of the described objects and to decide whether to adopt the overall framework or only parts of it; existing standards must not be adopted blindly.13 the design of the metadata should connect with existing intangible cultural heritage information sources. at present, china should refer to the relevant standards for world intangible cultural heritage digital resources and establish a management system for intangible cultural heritage resources aligned across the national, provincial, municipal, and county levels. the relevant cultural management departments have also established information systems that form a unified set of authoritative and standardized data. therefore, in terms of the elements and concepts used in the metadata model, special attention should be paid to the connection with these existing data models, so that as new resources are developed, these rich information sources can be shared through the mapping relationships between elements.

the metadata model should have good scalability and strong descriptive ability, containing a rich set of elements. an element-rich metadata model strengthens the organization and management of information resources and the disclosure of their content, and makes data inspection more flexible. conversely, a metadata model that lacks elements will be less flexible when technology is upgraded or user description requirements expand; such a model requires constant expansion and modification, and its practicality is greatly reduced. the design of the metadata model should therefore also include a mechanism that allows different types of users to extend the elements according to their needs. the design of the metadata should also show the relationships between intangible cultural heritage resource entities. with the development of information resource description technology at home and abroad, a batch of metadata standards for various types of information resources has been formed. the metadata standards for china's intangible cultural heritage should aim for compatibility and integrate existing world standards on the basis of current results, and should be further expanded and developed in accordance with the needs of preserving intangible cultural heritage works. the metadata standards for intangible cultural heritage archives should describe resources while displaying the greatest possible degree of versatility, compatibility, and standardization. therefore, combining the requirements of cultural heritage archiving with the characteristics of intangible cultural heritage, the dc metadata standard is used as the basic standard, and the advantages of other metadata standards are combined to determine the metadata standard for china's intangible cultural heritage archives.
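to make the element-mapping idea described above concrete, the following minimal sketch (not part of the original study) shows how records from an existing management system might be crosswalked onto the dc-based elements defined later in this article (table 1); the field names on the left are hypothetical stand-ins for whatever an existing provincial or municipal system actually uses.

```python
# a minimal crosswalk sketch: map fields from an existing system onto the
# dc-based element names, keeping unmapped fields so nothing is lost.
EXISTING_TO_DC = {
    "item_name": "dc_title",            # dc_* names come from table 1
    "item_class": "dc_category",
    "inheritor_name": "dc_creator_own",
    "inheritor_region": "dc_creator_area",
}

def map_record(existing_record: dict) -> dict:
    """convert a record from an existing system into the dc-based scheme,
    routing unmapped fields into an 'extension' group."""
    mapped, extension = {}, {}
    for field, value in existing_record.items():
        target = EXISTING_TO_DC.get(field)
        if target:
            mapped[target] = value
        else:
            extension[field] = value
    if extension:
        mapped["extension"] = extension
    return mapped

# example record exported from a hypothetical municipal system
print(map_record({"item_name": "wuhan woodcarving ship model",
                  "item_class": "traditional handicraft",
                  "inheritor_name": "long congfa",
                  "local_id": "hb-2008-042"}))
```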
intangible cultural heritage knowledge organization

definition of intangible cultural heritage metadata

through semantic analysis, the core attributes and concepts involved in the metadata can be obtained, and the specificity of attributes and concepts can be improved through metadata standards, making users' cognition, retrieval, and evaluation of information more accurate and effective. at the same time, the normative concepts and common attributes in existing metadata schemes should be reused as much as possible: according to the attribute characteristics of the object, close and similar conceptual entities are selected from one or more general metadata schemes, so that the element definitions are universal and normative. in the convention for the safeguarding of the intangible cultural heritage, unesco pointed out that the types of intangible cultural heritage include oral traditions, performing arts, social practices, festivals, and traditional handicrafts. based on these definitions of intangible cultural heritage types and the previous comparative research on the metadata standards of various countries, combined with the research results of other scholars, this study draws on the dc standard metadata names and standard affix library to define a set of intangible cultural heritage archive metadata containing 23 elements plus extended elements (table 1).

table 1. core metadata of intangible cultural heritage digital resources (category; standard metadata name and field name; annotation)
content: title (dc_title), category (dc_category), bintroduction (dc_bintroduction); annotation: name and content of intangible cultural heritage
creator: mcreator (dc_creator_own), nation (dc_creator_nation), sex (dc_creator_sex), age (dc_creator_age), area (dc_creator_area), biography (dc_creator_biography); annotation: creator identity information
category: dance (dc_category_dance), song (dc_category_song), literature (dc_category_literature), quyi (dc_category_quyi), art (dc_category_art); annotation: heritage list category
resources: video (dc_category_video), picture (dc_resources_picture), text (dc_resources_text), network (dc_resources_network); annotation: resource type, including a description of resource content
organization: area (dc_organization_area), principal (dc_organization_principal), officephone (dc_organization_officephone), jobtitle (dc_organization_jobtitle), introduction (dc_organization_introduction); annotation: organization information

intangible cultural heritage metadata standards unify the information format and the mutual mapping relationships of intangible cultural heritage digital achievements. on the one hand, a single standard removes barriers to sharing metadata caused by intangible cultural heritage information resources residing on different hardware, different platforms, and in different formats; on the other, it enables the digital resources of intangible cultural heritage to be shared online. for example, the china intangible cultural heritage digital museum (https://www.ihchina.cn/) uses unified metadata in its design, which solves the problem of integrating and sharing different resources. smooth conversion between new and old data benefits the protection of intangible cultural heritage inventory data, avoids duplication of work, and improves the efficiency and effectiveness of intangible cultural heritage storage. in addition, the design of intangible cultural heritage metadata standards must consider the versatility, compatibility, and individualization of the metadata system.
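read programmatically, the element set in table 1 could be represented roughly as follows; this is an illustrative sketch only (the dc_* field names come from table 1, while the grouping keys, helper function, and sample record are this example's own assumptions).

```python
# the 23 core elements of table 1, grouped by the table's "category" column
CORE_ELEMENTS = {
    "content": ["dc_title", "dc_category", "dc_bintroduction"],
    "creator": ["dc_creator_own", "dc_creator_nation", "dc_creator_sex",
                "dc_creator_age", "dc_creator_area", "dc_creator_biography"],
    "heritage_category": ["dc_category_dance", "dc_category_song",
                          "dc_category_literature", "dc_category_quyi",
                          "dc_category_art"],
    "resources": ["dc_category_video", "dc_resources_picture",
                  "dc_resources_text", "dc_resources_network"],
    "organization": ["dc_organization_area", "dc_organization_principal",
                     "dc_organization_officephone", "dc_organization_jobtitle",
                     "dc_organization_introduction"],
}

ALL_FIELDS = {f for group in CORE_ELEMENTS.values() for f in group}

def non_core_fields(record: dict) -> list:
    """return the fields in a record that are not part of the core set,
    so they can be routed to the extension mechanism instead."""
    return [f for f in record if f not in ALL_FIELDS]

# toy record: dc_title and dc_creator_own are core, 'dc_custom_note' is not
print(non_core_fields({"dc_title": "wuhan woodcarving ship model",
                       "dc_creator_own": "long congfa",
                       "dc_custom_note": "exhibition history"}))
```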
description of digital resources of intangible cultural heritage

through analysis of intangible cultural heritage project objects, attributes covering content, management, resources, and so on can be derived. these attributes can correspond to the elements of the metadata during metadata design or serve as the semantic basis for the definition of elements. the analysis and extraction of the core attributes and concepts of the object should first consider the full presentation of the object's knowledge and resource content, and the concepts should have a certain degree of specificity so that users can recognize, retrieve, and evaluate the information. secondly, it is important to draw on the normative concepts and general attributes in existing metadata schemes as much as possible: according to the attribute characteristics of the object, close and similar conceptual entities are selected from one or more general metadata schemes, so that the element definitions are versatile and standardized. the content description of intangible cultural heritage items should therefore reflect their unique cultural meanings and characteristics. at present, only general concepts such as name, category, subject, and region are available across the common general metadata schemes. figure 1 shows the metadata framework of intangible cultural heritage.

figure 1. metadata framework of intangible cultural heritage.

in the content description, there are five elements: name, type, subject, region, and protection level as special attributes. of these, the only elements that can be drawn from general metadata standards are name, subject, category, and region. the protection level indicates whether a listed intangible cultural heritage item is an object of national or provincial protection. the national intangible cultural heritage declaration form uses these five content-description elements, and resource description analysis is then conducted based on the information organization structure of the intangible cultural heritage project object to construct the intangible cultural heritage description framework. the framework covers the main attributes and definitions involved in intangible cultural heritage objects, as well as their connections and hierarchical relationships. in the description framework, in addition to the attributes and definitions specified by the dc metadata, a set of custom elements is also defined.14 without changing the basic structure, users can customize elements according to their needs, making the model extensible. in the description of related resources, entities related to intangible cultural heritage are divided into four categories: inheritors, object categories, resources, and organizations. among them, the inheritor-related attributes include six general attributes: name, ethnicity, gender, region, age, and personal profile. object category attributes include dance, song, art, literature, video, network, and so on. for visually intuitive objects, the category description can refer to the categories for the description of works of art (cdwa) or the vra core categories for visual materials, and documentation and supporting materials can use the metadata defined by dc. this model does not prescribe how the attributes and concepts are used in metadata: in a specific metadata solution, they can correspond to metadata element names, or they can serve as modifiers, values, or parts of metadata element definitions; for example, the inheritor of shadow puppetry is lin shimin.
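the four categories of related entities and the custom-element mechanism could be modeled along the following lines; this is one possible reading only (the class and field choices follow the lists above, but none of this code is taken from the original framework).

```python
# sketch of the description framework's related entities: an item is linked
# to inheritors, resources, and an organization, and carries a dict of custom
# elements that users can extend without changing the basic structure.
from dataclasses import dataclass, field

@dataclass
class Inheritor:
    name: str
    ethnicity: str = ""
    gender: str = ""
    region: str = ""
    age: str = ""
    profile: str = ""

@dataclass
class IchItem:
    title: str
    category: str                                         # dance, song, literature, quyi, art ...
    inheritors: list = field(default_factory=list)
    resources: list = field(default_factory=list)         # video, picture, text, network
    organization: str = ""
    custom: dict = field(default_factory=dict)            # user-defined extension elements

item = IchItem(title="wuhan woodcarving ship model",
               category="traditional handicraft",
               inheritors=[Inheritor(name="long congfa", region="wuhan, hubei")],
               custom={"protection_level": "national"})
print(item)
```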
data association

linked data is a technical specification recommended by the world wide web consortium (w3c). the relationships among linked data objects support a greater degree of resource sharing and utilization, enabling users to efficiently and accurately locate needed resources on a larger scale. publishing linked data means describing the metadata of cultural resources in the form of the resource description framework (rdf). after semantic associations are formed, intelligent retrieval and data discovery services are provided on the application platform, ensuring the visual presentation and data sharing of intangible cultural heritage digital resources in knowledge organizations. linked data publishing provides standardized data access specifications. its biggest advantage is that it can correlate data across platforms and establish links among different data, making it convenient for users to search for data in different repositories. as far as the content of intangible cultural heritage is concerned, linked data presents unstructured, semi-structured, and structured data on the internet in the form of rdf. rdf description refers to the transformation of metadata in resources into rdf triples through data and relationship mapping, and the formation of w3c-supported documents through the construction of semantic relationships. visual presentation refers to the visual presentation of relevant content to users through network search, with the support of the network architecture. in essence, publishing digital resource data means realizing the rdf description and sharing of intangible cultural heritage metadata by reusing these relationships; at its core, it is a management and application process over the database. the linked data publishing process for intangible cultural heritage resources consists of three steps: (1) converting the metadata of the repository into an rdf triple model and assigning a uri identifier to form an rdf document of linked data; (2) establishing semantic relationships and building relational links to form semantic associations; and (3) mapping cultural resource data to the network through the uri access mechanism and presenting data search results in a visual way through user sparql queries. although different data publishing tools differ in structure, metadata-based linked data publishing follows these basic steps.
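with a common rdf toolkit such as rdflib, the three steps could look roughly like this; the code is a sketch under stated assumptions (the namespace uri, class and property names, and the local sparql query are illustrative, not the project's actual published vocabulary or endpoint).

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

ICH = Namespace("http://example.org/ich/")   # hypothetical base uri
g = Graph()
g.bind("ich", ICH)

# step 1: convert repository metadata into rdf triples and assign a uri
item = ICH["wuhan-woodcarving-ship-model"]
g.add((item, RDF.type, ICH.HeritageItem))
g.add((item, ICH.title, Literal("wuhan woodcarving ship model")))

# step 2: build relational links so items, inheritors, and organizations
# form semantic associations
inheritor = ICH["long-congfa"]
g.add((inheritor, RDF.type, ICH.Inheritor))
g.add((item, ICH.hasInheritor, inheritor))

# step 3: expose the data through the uri access mechanism and query it;
# a local sparql query stands in here for a published endpoint
results = g.query(
    "SELECT ?who WHERE { ?item <http://example.org/ich/hasInheritor> ?who }")
for row in results:
    print(row.who)

print(g.serialize(format="turtle"))   # the rdf document to be published
```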
examples of metadata application of intangible cultural heritage knowledge organizations

introduction to the wuhan woodcarving ship model intangible cultural heritage project

the wuhan woodcarving ship model is a unique art form within chinese woodcarving craftsmanship, with a history of more than 2,000 years.15 according to the song dynasty account in the history of jin (jin shi · zhang zhongyan): "when the boat was to be built, the craftsmen did not know how to build it. the boat model made by zhang zhongyan was only a few inches long and very delicate; the front and rear of the boat could be joined precisely without glue, and the other craftsmen were all amazed." as early as the 12th century, there were people in china who could carve small boats several inches long as models for building ships. the hubei woodcarving boat is a national intangible cultural heritage project, but the art and craft face challenges. like other intangible cultural heritage projects, development of the craftwork is weak. while younger generations in hubei may recognize the form of the woodcarving boat, few are willing to learn this art, and many young people have never even heard of it. in order to better honor this long-standing tradition, this article focuses on the characteristics of intangible cultural heritage digital resources, combines them with relevant theories of knowledge organization, and adopts relevant technical standards to carry out knowledge organization and metadata standard construction for the hubei woodcarving ship model.

knowledge organization construction based on metadata

to use metadata effectively for intangible cultural heritage, metadata specifications must be defined and described. rdf is a metadata specification description language that can express, at the semantic level, the attributes of an ontology and the interrelationships between those attributes. rdf information can easily be exchanged between computers running different operating systems and application languages.16 rdf regulates the realization of semantics in a standardized and interoperable way. a web page can invoke rdf in a simple way, thereby facilitating the retrieval of network data and the discovery of related knowledge. in this paper, the metadata system uses rdf to define the attributes, so that they can be better transformed into a language that the computer can understand. intangible cultural heritage items have relationships with inheritors, organizations, resource content, and so on; in order to establish a complete intangible cultural heritage resource database, these entities need to be described separately in rdf.

wuhan woodcarving ship model metadata definition

following the rdf description, the wuhan woodcarving ship model is used as a specific example to show the designed metadata scheme; that is, the relevant content of the example is filled into the defined resource description framework. for example, part of the rdf description of the wuhan woodcarving ship model intangible cultural heritage item records the inheritor long congfa, male, of wuhan, hubei, described as a family-level intangible cultural heritage project inheritor of the wuhan woodcarving ship model.
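the xml markup of the original example did not survive reproduction, so the following is only a sketch of how that description might be expressed in rdf using the table 1 field names; the namespace uri and the turtle serialization are this example's assumptions, not the project's published schema.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

ICH = Namespace("http://example.org/ich/")   # hypothetical namespace
g = Graph()
g.bind("ich", ICH)

# the inheritor record quoted above, re-expressed with table 1 field names
inheritor = ICH["long-congfa"]
g.add((inheritor, RDF.type, ICH.Inheritor))
g.add((inheritor, ICH.dc_creator_own, Literal("long congfa")))
g.add((inheritor, ICH.dc_creator_sex, Literal("male")))
g.add((inheritor, ICH.dc_creator_area, Literal("wuhan, hubei")))
g.add((inheritor, ICH.dc_creator_biography,
       Literal("family-level intangible cultural heritage project inheritor "
               "of the wuhan woodcarving ship model")))

print(g.serialize(format="turtle"))
```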
conclusion

this article reviews the classification system of china's intangible cultural heritage items and integrates existing knowledge organization work with other types of resources to design a more comprehensive and reasonable set of metadata standards with a certain degree of scalability, which is then applied to actual intangible cultural heritage knowledge organization. to effectively protect and use the digital resources of intangible cultural heritage, further research is needed beyond this study, including discussion of how to update and promote existing metadata specifications and how to aggregate existing resources along multiple dimensions to achieve knowledge discovery. through the integration of linked data and the sharing of existing digital resources, this article can encourage scholarship and conversation that leads to the preservation of china's intangible cultural heritage.

funding statement

this work was supported by the hubei key laboratory of big data in science and technology. this work was also supported by the palace museum's open project in 2021, research on the dissemination of intangible cultural heritage of the palace museum from the perspective of artificial intelligence. this subject has also been funded by the mercedes-benz star wish fund of the china youth foundation.

endnotes

1 feng xiangyun, xiao long, liao sansan, and zhuang jilin, "a comparative study of commonly used foreign metadata standards," journal of university libraries 4 (2001): 15–21, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2001&filename=dxts200104005&uniplatform=nzkpt&v=v9a8p-rcf4csl9yoaqskj5nbnfjrmwjhsaoj2pnqq9jl0tdsle3ntrjrzeto32h.
2 ma min, "metadata—the basic format for organizing online information resources," information science 4 (2002): 377–79, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2002&filename=qbkx200204012&uniplatform=nzkpt&v=yemo5mxwo0mzg5mkz6qml62oruvfchtdy2slxdbn_hesfdvspxuc-naorq0v0ikl.
3 "specification data function requirements," november 24, 2014, http://eprints.rclis.org/13191/1/frad_2009-zh.pdf.
4 lan xuliu and meng fang, "metadata format analysis of digital cultural resources," modern information 33, no. 8: 61–64, 102, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2013&filename=xdqb201308015&uniplatform=nzkpt&v=skctnh3sg04qrgzqahxdh3nj2hmpk2ppmjbp4ymnpdq-phf2ffjwxpp5vcns9qc9.
5 murtha baca, "practical issues in applying metadata schemas and controlled vocabularies to cultural heritage information," cataloging & classification quarterly 36, no. 3–4 (2003): 47–55, https://doi.org/10.1300/j104v36n03_5.
6 yi junkai, zhou yubin, and chen gang, "research and practice of scalable digital museum metadata specification," digital library forum 2 (2014): 43–53, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdtemp&filename=sztg201402011&uniplatform=nzkpt&v=tf76zueher7ymnfxdfafenmm2z2tetze08zqkdhoc7wq2zwtkoao3i0ei7oyvcf1.
7 jin saiying, "research on chinese and foreign art image metadata and framework," new art 37, no. 1 (2016): 129–32, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdlast2016&filename=xmsh201601019&uniplatform=nzkpt&v=eynvoucbcnpzjkw84mxeabs--auqafuwanchem0p5phcmjw0s7jttnplobqop0_h.
8 xiao long and zhao liang, introduction and examples of chinese metadata (beijing: beijing library press, 2007).
9 li bo, "research on metadata model of intangible cultural heritage information resources," library circle 5 (2011): 38–41, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2011&filename=tsgu201105016&uniplatform=nzkpt&v=unflzsdezr0jue0ut_npb7h0ri5vioemybvm3zytqfh2quzuycubz5tzrbshnkwh.
10 ye peng and zhou yaolin, "the framework and standards of chinese intangible cultural heritage metadata," 2013 international conference on applied social science research (paris: atlantis press, 2013).
11 bamo qubumo, guo cuixiao, yin hubin, and li gang, "customizing discipline-based metadata standards for digital preservation of living epic traditions in china: basic principles and challenges," 2013 digital heritage international congress, https://ieeexplore.ieee.org/document/6744746.
12 y. j. nam and s. m. lee, "localization of metadata elements in the art museum community," 충청문화연구 46, no. 2 (2012): 139–74.
13 bamo qubumo, c. guo, h. yin, et al., "customizing discipline-based metadata standards for digital preservation of living epic traditions in china: basic principles and challenges," digital heritage international congress, ieee, 2014.
14 chao gojin, "unesco ethical principles for the protection of intangible cultural heritage: an introduction and comment," inner mongolia social sciences (chinese version) 37, no. 5 (2016): 1–13, https://doi.org/10.14137/j.cnki.issn1003-5281.2016.05.00.
15 chen junxiu, "research on the mode of productive protection and utilization of intangible cultural heritage," learning and practice 5 (2015): 118–23, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdlast2015&filename=xxys201505014&uniplatform=nzkpt&v=telpps4abo6-qidxtqjyu9a_hy0q6ukovi4x5nz8br-u33pzq6py2d1cshqlclnw.
16 zhao zhihui, "visual analysis of the evolution path and hot frontiers of cultural heritage digitization research," library forum 2 (2013): 33–40, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2013&filename=tsgl201302007&uniplatform=nzkpt&v=yezmntrx2f00eqvogxwtz5yehk3zz1dm8layjik4l1lmjvvjuq7gaiymloplnmiv.
the impact of web search engines on subject searching in opac

holly yu and margo young

holly yu (hyu3@calstatela.edu) is library web administrator and reference librarian at the university library, california state university, los angeles. margo young (margo.e.young@jpl.nasa.gov) is manager of the library, archives and records section at the jet propulsion laboratory, california institute of technology, pasadena.

this paper analyzes the results of transaction logs at california state university, los angeles (csula) and studies the effects of implementing a web-based opac along with interface changes. the authors find that user success in subject searching remains problematic. a major increase in the frequency of searches that would have been more successful in resources other than the library catalog is noted over the time period 2000-2002. the authors attribute this increase to the prevalence of web search engines and suggest that metasearching, relevance-ranked results, and relevance feedback ("more like this") are now expected in user searching and should be integrated into online catalogs as search options.

in spite of many studies and articles on online public access catalogs (opacs) over the last twenty-five years, many of the original ideas about improving user success in searching the library catalog have yet to be implemented. ironically, many of these techniques are now found in web search engines. the popularity of the web appears to have influenced users' mental models and thus their expectations and behavior when using a web-based opac interface. this study examines current search behavior using transaction-log analysis (tla) of subject searches when zero hits are retrieved. it considers some of the features of web search engines and online bookstores and suggests future enhancements for opacs.

literature review

many studies have been published since the 1980s centering on the opac. seymour and large and beheshti provide in-depth overviews of opac research from the mid-1980s through the mid-1990s.1 much of this research has addressed system design and user behavior, including:
• user demographics,
• search behavior,
• knowledge of system,
• knowledge of subject matter,
• library settings,
• search strategies, and
• opac systems.2

opac research has employed a number of data-collection methodologies: experiment, interviews, questionnaires, observation, think aloud, and transaction logs.3 transaction logs have been used extensively to study the use of opacs, and library literature reflects this. while the exact details of tla vary greatly, peters et al. define it simply as "the study of electronically recorded interactions between online information retrieval systems and the persons who search for the information found in those systems."4 this section reviews the tla literature relevant to the study.

number of hits

tla cannot portray user intention or actual satisfaction since relevance, success, or failure are subjectively determined and require the user to decide. peters recommends combining tla with another technique such as observation, questionnaire or survey, interview, or focus group.5 in spite of the limitations of tla, many studies (including this one) rely on it alone. typically, these studies define failure as zero hits in response to a search.
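as a rough illustration of this zero-hit style of analysis (none of this code comes from the authors' study, and the tab-separated log format with a search type, query, and hit count per line is invented for the example):

```python
# compute, per search type, the share of searches that retrieved zero hits
from collections import Counter

def failure_rates(log_lines):
    """return the zero-hit rate for each search type found in the log."""
    totals, zeros = Counter(), Counter()
    for line in log_lines:
        search_type, _query, hits = line.rstrip("\n").split("\t")
        totals[search_type] += 1
        if int(hits) == 0:
            zeros[search_type] += 1
    return {t: zeros[t] / totals[t] for t in totals}

sample = ["subject\tglobal warming policy\t0",
          "subject\tdogs\t143",
          "keyword\tglobal warming\t27",
          "title\tthe great gatsby\t1"]
print(failure_rates(sample))   # {'subject': 0.5, 'keyword': 0.0, 'title': 0.0}
```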
generalizing from several studies, approximately 30 percent of all searches result in zero hits.6 the failure rate is even higher for subject searches: peters reported that about 40 percent of subject searches failed by retrieving zero hits.7 some researchers also define an upper number of results for a successful search. buckland found that the average retrieval set was 98.8 blecic reported that cochrane and markey found that opac users retrieve too much (15 percent of the time).9 wiberly, daugherty, and danowski (as reported in peters) found that the median number of postings considered to be too many was fifteen, although when fifteen to thirty postings were retrieved, more users displayed them all than abandoned the search.10

subject searching

some studies have specifically looked at subject searching. hildreth differentiated among various types of searches and defined one hundred items as the upper limit for keyword searches and ninety as the upper limit for subject searches.11 larson defined reasonable subject retrieval as between one and twenty items and found that only 12 percent of subject searches retrieved the appropriate number.12

larson is not the only researcher to have reported poor results in subject searching. for more than twenty years, research has demonstrated that subject or topical searches are both popular and problematic. tolle and hah found that subject searching is most frequently used and the least successful.13 moore reported that 30 percent of searches were for subject, and matthews et al. found that 59 percent of all searches were for subject information.14 hunter found that 52 percent of all searches were subject searches and that 63 percent of these had zero hits.15 van pulis and ludy referred to alzofon and van pulis's earlier work in 1984, where they reported that 42 percent of all searches were subject searches.16 hildreth found that 62.1 percent of subject searches and 35.4 percent of keyword searches failed.17 larson categorized the major problems with online catalogs as follows:
• users' lack of knowledge of library of congress subject headings (lcsh),
• users' problems with mechanical and conceptual aspects of query formulation,
• searches that retrieve nothing,
• searches that retrieve too much, and
• searches that retrieve records that do not match what the user had in mind.18

during an eleven-year longitudinal study, larson found that subject searching was being replaced by keyword searching.19 no consistent pattern in the number of search terms has emerged in the literature. van pulis and ludy reported that user searches were typically single words.20 markey contended that users' search terms frequently matched standardized vocabulary in large catalogs.21 none of markey's researchers consulted lcsh, and only 11 percent of van pulis and ludy's did so, notably in spite of their library's user-education programs. peters reported that lester found that the average search was less than two words and fewer than thirteen characters.22 hildreth found that more than two-thirds of keyword searches included two or more words and 42 percent of these multiple-word searches resulted in zero hits.23 the proportion of zero-hit keyword searches rose with the increasing number of words in the search. subject headings have been a matter of considerable study.
gerhan examined catalog records and surmised their accessibility in an online catalog. he contended that when a keyword from the title only is accessed, only 50 percent of all relevant books would be found, and that title keywords would lead a user to subject-relevant records in 55 percent of cases while lcsh would lead a user successfully in 85 percent of cases.24 in contrast, cherry found that 42 percent of zero-hit subject searches would have been more fruitful as keyword or title searches than by following cross-references retrieved from the subject field.25 she recommended converting zero-hit subject queries to other types of subject searches (keyword). thorne and whitlatch recommended that subject searchers should select keyword rather than subject headings as their first access strategy.26

types of problems in subject searches

numerous studies have categorized reasons for search failure (typically in zero-hit situations), but peters reports that a standard categorization has not yet been established.27 in cases where more than one error is made in a search (and hunter reported this to be frequent), there is no consistency in how that is assigned. nonetheless, some major categories of problems stand out:
• misspelling and typographical errors: peters found that these errors accounted for 20.8 percent of all unsuccessful keyword searches, while henty (reported by peters) concluded that 33 percent of such searches could be attributed to this.28 hunter found that 9.3 percent of subject searches had typographical and spelling errors.29
• keyword search: hunter found 52.6 percent of zero-hit searches used uncontrolled vocabulary terms.30
• wrong source or field: hunter concluded that 4.5 percent of searches should have been done in a source other than the catalog, while 1.3 percent of searches were of the wrong type (an author search in the subject-search option).31
• items not in the database: peters found that searches for items not held in the database accounted for 39.1 percent of unsuccessful searches, while hunter found that problem in only 2.5 percent of the problem cases.32

in addition to these problems, hunter also found that index display and rules relating to the systems accounted for 27 percent of errors.33

resulting recommendations for change

while hildreth stated in 1997 that "there has been little research on most components of the opac interface," he proposed two options to improve user success: increased user training or improved design based on information-seeking behavior.34 wallace pointed out that there is a very short window of opportunity when searchers are amenable to instruction and that successful screen designs should therefore focus on presenting first the quick-searching options employed by the majority of users.35 large and beheshti observed "that too many options simply caused confusion, at least for less experienced opac users," and they summarized that opac-interface research focuses on menu sequence, browsing, and querying.36

menu sequence

in terms of menu sequence, hancock-beaulieu indicated that "the menu sequence in which search options are offered will influence user selection."37 ballard found that the amount of keyword searching was affected by its position on the menu.38
scott reported that both keyword- and subject-search success improved when keyword was placed at the top of the menus.39 thorne and whitlatch used a combination of methods in their study and concluded that several interface changes should be implemented:
• strongly encourage novice users to start with keyword (list keyword above subject heading),
• relabel "keyword" to "subject or title words," and
• relabel "subject heading" to "library of congress subject heading."40

blecic et al. studied transaction logs over six months to track the impact of "simplifying and clarifying" opac introductory screens. after moving the keyword option to the top, keyword searching increased from 13.30 percent to 15.83 percent of all search statements. blecic et al. found their original tally of 35.05 percent of correct searches having zero hits decreased to 31.35 percent after screen changes.41

querying

opac-interface design has been based on an assumption that users come to the catalog knowing what they need to know. in either text-based or web-based opacs, query-based searches are still mainstream. searchers are required to have knowledge of title, author, or subject. ortiz-repiso and moscoso observed that web-based catalogs, like all library catalogs, basically fulfill two functions: locating works based on known details and identifying which documents in the database cover a given subject.42 natural-language input has long been considered a desirable way to overcome this shortcoming.

browsing

relevance-ranked output and hypertext were considered by hildreth to be promising in 1997.43 opacs have not been conceived within a true hypertext environment; rather, they maintain the structure of their original formats, principally machine-readable cataloging (marc), and therefore impede the generation of a structure of nodes and links.44 in addition to continuing to employ the marc format as their underlying structure, the concepts of main entry and added entry, field labels, and display logic all reflect cataloging rules. amazon.com and barnes and noble have completely moved away from this century-old structure to provide easy access to book information. in the web environment, the concept of main entry loses its meaning to multiple access points and the linking capabilities of author, subject, and call number. another prominent drawback of web-based opacs is that they have not taken advantage of thesaurus structure and utilized the thesaurus for search feedback. the hierarchical relationships in lcsh are underutilized in terms of the relationships between terms and associations through related terms. web-based opacs have failed to make use of this important access. the persistence of these drawbacks in opac-interface design is rooted deeply in cataloging rules that were derived from the manual environment more than a century ago. it reflects the gap between "concepts typically held by nonprofessional users and those used in library practices."45 in her article "why are online catalogs still hard to use?" borgman concludes: "despite numerous improvements to the user interface of online catalogs in recent years, searchers still find them hard to use. most of the improvements are in surface features rather than in the core functionality. we see little evidence that our research on searching behavior studies has influenced online catalog design."46

catalog content

users misunderstand the scope of the catalog.
in questionnaire responses, 80 percent of van pulis and ludy's participants indicated they had considered looking elsewhere than the library catalog, as in periodical indexes.47 blazek and bilal reported a request for inclusion of journal-article titles in one response to their questionnaire.48 libraries responded to these requests by acquiring databases on cd-rom, loading them locally (sometimes using the catalog system to mount a separate database), and, most recently, providing access to databases over the internet. however, seldom have libraries responded to these requests by integrating search access through a single front end as the default search.

impact of web search engines

blecic et al. found that keyword searching increased from 13.3 percent to 28.3 percent over their four-year series of logs. at the same time, zero hits in keyword increased from 8.71 percent to 20.78 percent while subject zero hits dropped from 23 percent to 13.69 percent. they surmised that the influence of web interfaces might have affected the regression-fluctuation in search syntax, initial articles, and author order.49

... automatically scouts the web for pages that are related to its results so it can find a large number of resources very quickly without requiring the user to select the right keywords. teoma structures the appropriate communities of interest on the fly and ranks the results on a range of factors, including authorities and hubs (good resources pointing to related resources). google offers an option of "similar pages." while the subject-redirect function in a web-based opac emulates this, it succeeds only if the user's initial search term yielded the right result. opac users have the option of clicking on hyperlinked headings (author, title, subject headings) but cannot ask the system to perform a more sophisticated search on their behalf.

user-popularity tracking

amazon and barnes and noble web sites present enhanced information about items by user-popularity tracking. circulation statistics or user comments could serve as a form of "recommender system" to help novices narrow their selections. messages such as "other students who checked this book out also read these books" could be dynamically inserted in bibliographic records. users could also be allowed to provide comments on materials in the catalog, thus providing an interactive experience for opac users.
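a minimal sketch of how the co-checkout idea might be computed from circulation data (the loan records and titles below are invented, and a production recommender would also need privacy safeguards and minimum-count thresholds):

```python
# "other students who checked this book out also read these books":
# co-checkout counts from circulation data drive a simple "more like this" list
from collections import Counter
from itertools import combinations

def co_checkout_counts(loans_by_patron):
    """count how often each pair of items was borrowed by the same patron."""
    pair_counts = Counter()
    for items in loans_by_patron.values():
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def also_read(item, pair_counts, top_n=3):
    """rank other items by how often they were borrowed alongside this one."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [title for title, _ in scores.most_common(top_n)]

loans = {"p1": ["intro to gis", "remote sensing basics", "cartography"],
         "p2": ["intro to gis", "remote sensing basics"],
         "p3": ["intro to gis", "cartography"]}
pairs = co_checkout_counts(loans)
print(also_read("intro to gis", pairs))   # both co-borrowed titles, each scoring 2
```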
summary of web features

there are positive and negative impacts of web search engines and online bookstores on web-based opac users. users who find web pages to be comfortable, easy, and familiar may make greater use of web-based opacs. while they bring with them their knowledge of search engines, they also bring their misperceptions. the possibility of using tools similar to those found on web search engines can greatly "reinforce the usefulness of the catalog as well as the positive perception that the end user has of it."61 given the diversity of the errors that users experience, a combination of approaches is necessary to improve their search success. automatic mapping of free text to thesaurus terms, translation of common spelling mistakes, and links to related pages are tools already in use in web search engines. "see similar pages," extensive use of relevance feedback, and popularity tracking along with natural language are less common.

recommendations for web-based opacs

the authors' tla revealed a continuing problem with subject-heading searches and showed a trend toward searching topics that are not typically answered in a book catalog. the former problem has a well-documented history, while the authors believe the latter problem stems from the influence of the web and web search engines. several changes to typical opacs are recommended to address the trends observed in the course of this study.

metasearching

the recent trend of incorporating databases and opacs into a single search reflects the necessity of expanding information resources and simplifying access to resources. this study's empirical results clearly indicate a need to expand this integration into one search. while some argue that this metasearching will further augment the syntax digression and prevent users from becoming information literate, others believe that metasearching, along with the option of searching each individual database, is an ultimate goal for online search. like it or not, the metasearch technology, also known as federated or broadcast search, "creates a portal that could allow the library to become the one-stop shop their users and potential users find so attractive."65 one-search-for-all cannot solve all problems; however, guiding users to where they are most likely to find results quickly (the quick search) should satisfy the needs of the majority of users.
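as a very small sketch of the broadcast-search idea (the source functions, result format, and merging step here are stand-ins, not any particular vendor's metasearch api; a real implementation would call the catalog, article databases, and other targets through their own protocols):

```python
# broadcast one query to several sources concurrently and merge the results
from concurrent.futures import ThreadPoolExecutor

def search_catalog(query):
    return [{"source": "catalog", "title": f"book about {query}"}]

def search_articles(query):
    return [{"source": "articles", "title": f"article on {query}"}]

def metasearch(query, sources):
    """query every source concurrently and merge results into one list."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda s: s(query), sources))
    merged = [hit for results in result_lists for hit in results]
    # a relevance-ranking step would normally go here before display
    return merged

print(metasearch("climate change", [search_catalog, search_articles]))
```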
menu sequence

effective screen design has a positive effect on user success. the menu sequence for search options plays a significant role in user selection. this research and others have demonstrated that users choose an option higher rather than lower in a list. too many options "simply cause confusion, at least for less experienced opac users."66

browsing feature

browsing is a natural and effective approach to many information-seeking problems and requires less effort and knowledge on the part of the user. the literature suggests that a great deal of the use of the web relies on known web sites, recommended sites, or return visits to sites recently visited, thus relying on browsing rather than on searching. jenkins, corritore, and widenbeck found that domain novices seldom clicked very deep (out and back) while web experts explored more deeply.67 holscher and strube note that hurtineene and wandtke claim that only minimal training is necessary for browsing an individual web site, while pollock and hockley claim that considerably more experience is required for querying and navigating among sites.68 hancock-beaulieu found that between 30 percent and 45 percent of all online searches, regardless of the type of search, are concluded with browsing the library shelves.69

... to implement user help through tips or tactics selected and accumulated from a collection of common user-search mistakes. in such a case, the system would play a more active role by generating relevant search tips on the fly and using zero-hit search results as a basis for generating a spell check or suggesting alternate wording. an ideal scenario is that the opac allows the user to pursue multiple avenues of an inquiry by entering fragments of the question, exploring vocabulary choices, and reformulating the search with the assistance of various specialized intelligent assistants. borgman suggests that an opac should be judged by whether the catalog answers questions rather than merely matches queries. she suggests the need to design systems that are based on behavioral models of how people ask questions, arguing that users still need to translate their question into what a system will accept.73
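the on-the-fly search-tips idea sketched above, using zero-hit results to drive a spell check or alternate wording, might be prototyped along these lines (the vocabulary list is a toy stand-in for a catalog's own indexes or lcsh cross-references):

```python
# suggest close index terms for query words that found nothing
import difflib

VOCABULARY = ["psychology", "philosophy", "photography", "physiology"]

def suggest_alternatives(query, vocabulary=VOCABULARY, cutoff=0.75):
    """for each query term with no index match, offer the closest index terms."""
    tips = {}
    for term in query.lower().split():
        if term not in vocabulary:
            close = difflib.get_close_matches(term, vocabulary, n=2, cutoff=cutoff)
            if close:
                tips[term] = close
    return tips

print(suggest_alternatives("psycology of learning"))
# {'psycology': ['psychology']} -- 'of' and 'learning' have no close index match
```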
user instruction

onsite training and online documentation can help make it easier to use the opac. with the advent of information literacy, the shift in library instruction from procedure-based query formulation to the question being answered has taken place. at csula, instruction for entry-level classes focuses on formulating a research statement and then identifying keywords and alternate terms. the instruction sessions that follow the initial concept formulation are short and focus on how to enter keyword or subject, author, and title, and the use of boolean operators. this approach may improve success until the systems provide the tools to improve search strategies or accept an untrained user's input. as an increasing number of users access online library catalogs remotely, assistance needs to be embedded into intuitive systems. "time invested in elaborate help systems often is better spent in redesigning the user interface so that help is no longer needed."74 users are not willing to devote much of their time to learning to use these systems. they just want to get their search results quickly and expect the catalog to be easy to use with little or no time invested in learning the system.

conclusion

the empirical study reported in this paper indicates that progress has been made in terms of increasing search success by improving the opac search interface. the goal is to design web-based opac systems for today's users, who are likely to bring a mental model of web search engines to the library catalog. web-based opacs and web search engines differ in terms of their systems and interface design. however, in most cases, these differences do not result in different search characteristics by users. research findings on the impact of web search engines and user searching expectations and behavior should be adequately utilized to guide interface design. web users typically do not know how a search engine works. therefore, fundamental features in the design of the next generation of the opac interface should include changing the search to allow natural-language searching with keyword search first, and a focus on meeting the quick-search need. such a concept-based search will allow users to enter the natural language of their chosen topic in the search box while the system maps the query to the structure and content of the database. relevance feedback to allow the system to bring back related pages, spelling correction, and relevance-ranked output remain key goals for future opacs.

references and notes

1. sharon seymour, "online public-access catalog user studies: a review of research methodologies, march 1986-november 1989," library and information science research 13 (1991): 89-102; andrew large and jamshid beheshti, "opacs: a research review," library and information science research 19 (1997): 2, 111-33.
2. ibid., 113-16.
3. ibid., 116-20.
4. thomas a. peters et al., "an introduction to the special section on transaction-log analysis," library hi tech 11 (1993): 2, 37.
5. thomas a. peters, "the history and development of transaction-log analysis," library hi tech 11 (1993): 2, 56.
6. pauline a. cochrane and karen markey, "catalog use studies since the introduction of online interactive catalogs: impact on design for subject access," in redesign of catalogs and indexes for improved subject access: selected papers of pauline a. cochrane (phoenix: oryx, 1985), 159-84; steven a. zink, "monitoring user success through transaction-log analysis: the wolfpac example," reference services review 19 (spring 1991): 449-56; michael k. buckland et al., "oasis: a front end for prototyping catalog enhancements," library hi tech 10 (1992): 7-22.
7. thomas a. peters, "when smart people fail: an analysis of the transaction log of an online public-access catalog," journal of academic librarianship 15 (1989): 5, 267.
8. michael k. buckland et al., "oasis," 7-22.
9. deborah d. blecic et al., "using transaction-log analysis to improve opac retrieval results," college and research libraries (jan. 1998): 48.
10. peters, "history and development of transaction-log analysis," 2, 52.
11. charles r. hildreth, "the use and understanding of keyword searching in a university online catalog," information technology and libraries 16 (1997): 6.
12. ray r. larson, "the decline of subject searching: long-term trends and patterns of index use in an online catalog," journal of the american society for information science and technology 42 (1991): 3, 210.
13. john e. tolle and sehchang hah, "online search patterns: nlm catline database," journal of the american society for information science 36 (mar. 1985): 82-93.
14. carol weiss moore, "user reaction to online catalogs: an exploratory study," college and research libraries 42 (1981): 295-302; joseph r. matthews et al., using online catalogs: a nationwide survey-a report of a study sponsored by the council on library resources (new york: neal-schuman, 1983), 144.
15. rhonda n. hunter, "success and failures of patrons searching the online catalog at a large academic library: a transaction-log analysis," rq 30 (spring 1991): 399.
16. noelle van pulis and lorne e. ludy, "subject searching in an online catalog with authority control," college and research libraries 49 (1988): 526.
ludy, "subject searching in an onl ine cata log with aut h ority contro l," college and research libraries 49 (1988): 526. 17. hildret h, "th e use and understanding of keyword searching," 6. 18. ray r. larson, "the decline of subjec t searching," 3, 60. 19. ibid. 20. van pulis and ludy, "subj ect searching in an onlin e cat alog," 527. 21. karen markey, research report on the process of subject searching in the library catalog: final report of the subject access research project (repo rt no. oclc /op r/ rr-83-1) (dub lin , ohio: oclc online co mput er library center, 1983), 529. 22. pe ters, "the history and deve lopment o f transactionlog ana lysis," 2, 43. 23. hi ldr eth, "the use and understanding of keyword searching," 8-9. 24. david r. gerhan, "lcsh in vivo: subje ct searching performance an d strategy in th e opac era," journal of academic librarianship 15 (1989): 86-8 7. 25. joan m. cherry, "improving subject access in op acs: an exploratory study of conversion of users' queries," journal of academic librarianship 18 (1992): 2, 98. 26. rosemary thorne and jo bell whitlatch, "patron on line catalog success," college and research libraries 55 (1994): 496. 27. peters, "the history and developmen t of transactionlog analys is," 2, 48. 28. ibid. 29. h unt er, "succe ss and failures," 400. 30. ibid., 399. 31. ibid., 400. 32. peters, "the histor y and developmen t of transa ctionlog analysis," 2, 56. 33. hunter, "success and failures," 400. 34. hildreth , "the use and understandi n g of keyword searchi n g," 6. 35. patricia m . wa llace, "how do patrons search th e online c:, talog w h en no one 's looking? trnn sae tion-log a nal ysis and impli cation s for bibliographic instruction and system desi gn, " rq 33 (winter 1993): 3, 249. 36. large and beheshti, "opacs: a research review," 125. 37. m. m. hancock-beaulieu , "online cata logue: a case for the user," in the online catalogue: developments and directions, c. hildreth, ed. (london: library association , 1989), 25-46. 38. terry ballard, "com parative searching styles of patrons and staff," library resources and technical services 38 (1994): 293305. 39. jane scott et al.,"@*&#@ this computer and the horse it rode in on: patron frustration and failur e at th e opac" (in "co ntinuity and transformation : the promise of confluen ce": u sabi li rs·"' i [: ,, ), b p l..jr l.i ""( ' " user interface consulting fed erat ed search tn(,ln es 1.ibr;'.'\ry portals & [)at/\, (ln 'itr s ()pacs f.." ( h i ldrei' l's dl(, ital libr ar ies ezra schwartz locs (773) 256-1418 ezra@artandtech.com http://www.artandtech.com proceedings of the acrl 7th nationa l conference, chicago: acrl 1995), 247-56. 40. thorne and whitlat ch, "patron on lin e catalog success," 496. 41. blecic et al., "usin g tran sac tion-log ana lys is," 46. 42. virginia ortiz-repiso and purificac ion moscoso, "we bbased op a cs: between tradition and innovation ," lnformntion technology and libraries 18, no. 2 (june 1999): 68-69. 43. hildreth, "the use and understanding of keyword searching," 6. 44. ortiz-repiso and mos coso, "web-bas ed opac s," 71. 45. ibid., 75. 46. chris tine borgm an, "why are on line catalogs still hard to us e?" journal of the americnn society for information science 47 (1996): 7, 501. 47. van pulis and ludy, "subje ct searching in an onlin e cat alog," 53. 48. rla zek and bilal , "prob lems with opac: a case study of an academic research library," rq 28 (w int er 1988): 175. 49. debora h d. 
blecic et al., "a longitud inal stu dy of the effects of opac screen changes on searching behavior and user success," college and research library 60, no. 6 (nov. 1999): 524,527. 50. bernar d j. jan sen and udo pooch, "a revi ew of web searching studies and a framework for future resear ch," journal of the american society for information science and technology 52 (2001): 3, 249-50. 51. ibid., 250. 52. blazek and bilal, "problems with opac: a case study," 175; moore , "user reaction to online cata logs," 295-302. the impact of web search engines on subject searching in opac i yu and young 179 reproduced with permission of the copyright owner. further reproduction prohibited without permission. 53. m. j. bates, "the design of browsin g and berry-pickin g techniques for the onlin e search interfac e," online review 13 (1989): 5, 407-24. 54. jan sen and pooch, "a review of web searc hing studies, " 238. 55. judy luther, "trumping google? metasearching's promise," library journal 128 (2003): 16, 36. 56. jack muramatsu and wanda pratt, "transparent queries: investigating users' mental models of search engines," research and development in information retrieval sept. 2001. accessed mar. 10, 2003, http://citeseer.nj.nec.com/muramatsuoltransparent. html. 57. jans en and pooch, "a review of web searching studie s," 235. 58. luth er, 't rumping goog le," 36. 59. blecic et a l., "a lon gitudina l study of th e effects of opac screen changes," 527. 60. sus an m. colaric, "ins truction for web searching: an empirical study," college and research libraries news, 64 (2003): 2. 61. a. g. sutcliff, m. ennis, and s. j. watkinson, "empirical studies of end-user informati on searching," joumal of the american society for information science and tcchnologtj 51 (2000): 13, 1213. 62. "a ll about google," google. accessed dec. 10, 2003, www.google.com. 63. g. salton, introduction to modern information retrieval (new york: mcgraw-hill, 1983), 18. 64. orti z-rep iso and moscoso, "we b-ba sed opacs," 71. 65. luth er, "trumping google," 37. 66. maaike d. kiestr a et al, "end-us ers searching th e online catalogue: the influenc e of domain and system knowledge on search patterns. experiment at tilburg university," the electronic library 12 (dec. 1994): 335-43. 67. c. jen kins et al., "pa tterns of in forma tion seeking on the web: a qualitative study of domain expertise and web experti se," it and society l (winter 2003): 3, 74,77. accessed may 10, 2003, www.itandsociety.org/. 68. c. holscher and g. strube, "web search behavior of internet experts and newbi es," 9th international world wide web conference, (amsterdam. 2000). accessed mar. 28, 2003, www9.org/ w9cdrom /8 1/81.html; a. pollock and a. hockley, "wha t's wrong with internet searching," d-lib magazine (mar. 1997). accessed may 10, 2003, www.dlib.org/dlib/march97 /b t /03 pollo ck.h tml. 69. m . m . hanco ck-beau lieu , "on lin e catalogue: a case for the user," 25-46. 70. wilbert 0. galitz, the essential guide to user interface design: an introduction to gui design principles and techniques (chichester, england: wiley, 1996). 71. juliana chan," an evaluation of displays of bibliographic records in opacs in canadian academic and public libraries," mis report, univ. of toronto, 1995. [025.3132 c454e] 72. giorgio brajnik et al., "strategic h elp in user interfaces for information retriev al," journal of the american society for information science and technology (jasist) 53 (2002): 5, 344 . 73. borgman, "why are online catalogs still hard to use?" 500. 74. 
ibid.

article

algorithmic literacy and the role for libraries

michael ridley and danica pawlick-potts

information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12963

abstract

artificial intelligence (ai) is powerful, complex, ubiquitous, often opaque, sometimes invisible, and increasingly consequential in our everyday lives. navigating the effects of ai as well as utilizing it in a responsible way requires a level of awareness, understanding, and skill that is not provided by current digital literacy or information literacy regimes. algorithmic literacy addresses these gaps. in arguing for a role for libraries in algorithmic literacy, the authors provide a working definition, a pressing need, a pedagogical strategy, and two specific contributions that are unique to libraries.

introduction

algorithms, in one form or another, are as old as human problem solving and as simple as "a sequence of computational steps that transform the input into the output."1 for centuries they have been effective, and uncontroversial, methodologies. however, the rise of artificial intelligence (the integration of big data, enhanced computation, and advanced algorithms) with its human and greater-than-human performance in many areas has positioned algorithms as transformational and a "major human rights issue in the twenty-first century."2

algorithmic literacy is important given the prevalence of algorithmic decision-making in many aspects of everyday life and because "the danger is not so much in delegating cognitive tasks, but in distancing ourselves from—or in not knowing about—the nature and precise mechanisms of that delegation."3 as a result, david lankes warns of a new type of digital divide with "a class of people who can use algorithms and a class used by algorithms."4 in a 2019 deloitte survey "only 4 percent reported they were confident explaining what ai is and how it works."5 while a 2019 edelman survey indicated general awareness of ai, it also revealed a similar lack of knowledge about the details of ai.6 an informed, algorithmically literate public is better able to negotiate and employ the complexities of ai.7

identifying and acting upon algorithms as a literacy makes them as "fundamental as reading, writing, and arithmetic."8 however, the uncritical use of the term literacy should make one suspicious of extending it to algorithms. increasingly "literacy" has come to mean merely a body of knowledge or a set of domain-specific skills.9 various literacies have been described, such as health, death, financial, physical, ocean, religious, visual, dancing, spatial, screen, and porn. this includes a dozen different technology-related literacies.10 the case for algorithmic literacy, and the role for libraries in advancing it, must rest on a clear definition, a recognized problem and need, a pedagogical strategy, and a unique (or at least supportive) contribution libraries can provide.

michael ridley (mridley@uoguelph.ca) is librarian emeritus, mclaughlin library, university of guelph, ontario, canada. danica pawlick-potts (dpawlic@uwo.ca) is phd candidate, faculty of information and media studies, western university, ontario, canada. © 2021.
algorithms and literacy

while the term "algorithmic literacy" is recent, it has antecedents that cover similar if not equivalent ground. the general terms computer literacy or digital literacy have spawned more specific terms such as cyber literacy, computational thinking, and algorithmic thinking.11 most of these arise from the field of computer science, where algorithms are central, and focus on the computational nature of algorithms as a "matter of mathematical proof" where "other knowledge about algorithms—such as their applications, effects, and circulation—is strictly out of frame."12 the implications of algorithms in everyday life suggest that a deeper and broader interpretation is required.

whether a literacy, a mode of thinking, or merely a set of skills, discussions about computation and algorithms have been plagued by "ambiguity and vagueness" and "definitional confusion" resulting in ongoing challenges in establishing core pedagogy in both k–12 and higher education.13 without a clear, acknowledged, and actionable definition that differentiates it from concepts such as digital literacy, computational thinking, and algorithmic thinking, algorithmic literacy will be relegated to a buzz phrase and the urgency of its recognition and application will be lost.

the relationship between algorithms and artificial intelligence might recommend the adoption of "ai thinking" or "ai literacy" as the more appropriate term.14 however, algorithmic literacy is both more foundational than the broader concept of ai and more actionable than just thinking. algorithms are not a technology like ai or, more generally, computers. algorithms provide a structure that frames—and constrains—how we express ourselves. they are a way of seeing and acting in the world and "need to be understood as relational, contingent, [and] contextual."15 while the technical and operational aspects of algorithms are important to understand and use (as they are for the technologies and processes of reading and writing in a new language), they are complemented by a broader awareness:

literacy is not a set of generic skills or something we do or do not possess, it's a sociocultural practice, it's something that we do, and what we do with literacy depends on the social, cultural, and historical contexts in which we do it. literacy looks different in different contexts and communities. literacy is not neutral, it's ideological. there are dominant and marginalized literacies.16

this perspective is the essence of critical algorithm studies, where algorithms are viewed as sociotechnical systems that are "intrinsically cultural . . . constituted not only by rational procedures, but by institutions, people, intersecting contexts, and the rough-and-ready sensemaking that obtains in ordinary cultural life."17 algorithms as part of increasingly ubiquitous ai, such as machine learning and deep learning systems, reflect and promulgate certain ideologies and have impacts and influences in the full range of human society.
cautions about algorithmic decision-making have identified the far-reaching implications for bias, fairness, privacy, and democratic processes.18 at the same time, numerous national strategies to support ai development have highlighted the substantial economic impact, anticipated to be $15.7 trillion (us) by 2030.19 the idea of algorithmic literacy must encompass multiple perspectives and contexts.

"literacies of the digital"

computer, internet, information, computation, and algorithmic are all "literacies of the digital."20 while each of these has its own domain and focus, they share common ideas and are generally symbiotic with each other. there is an especially strong and complementary connection between computational literacy and information literacy.21 computational thinking and algorithmic literacy are closely related even if most definitions of the former fail to fully acknowledge the broader social, economic, and political implications. however, the extensive literature on computational thinking is useful in helping to articulate aspects of algorithmic literacy.

wing's foundational article about computational thinking describes the key characteristics in terms that closely resemble a literacy:

1. conceptualizing, not programming
2. fundamental, not a rote skill
3. a way that humans, not computers, think
4. complements and combines mathematical and engineering thinking
5. ideas, not artifacts
6. for everyone, everywhere.22

jacob and warschauer make a strong case for computational thinking as a literacy. their three-part framework identifies computational thinking as a new literacy embedded in modern sociocultural practices (computational thinking as literacy), discusses how literacy development can be leveraged to foster computational thinking (computational thinking through literacy), and explores ways in which computational thinking can facilitate literacy development (literacy through computational thinking).23 this analysis of computational thinking informs the larger context and broader implications of algorithmic literacy.

defining algorithmic literacy

scribner and cole define a literacy as "socially organized practices [that] make use of a symbol system and a technology for producing and disseminating it."24 therefore, literacy = practices + symbol system + technology. to this definition, steiner adds a more aspirational and humanistic definition:

by "literacy" i mean the ability to engage with, to respond to, what is most challenging and creative in our societies. to experience and contribute to the energies of informed debate. to distinguish the "news that stays news," as ezra pound put it, from the tidal waves of ephemeral rubbish, superstition, irrationalism, and commercial exploitation.25

literacy is about knowing and meaning making through the processes of internalizing and externalizing information. literacy enables a reflective, critical, and integrative approach to information that utilizes a broad knowledge base for both understanding and communicating ideas.
finn calls for an algorithmic literacy "that builds from a basic understanding of computational systems, their potential and their limitations, to offer us intellectual tools for interpreting the algorithms shaping and producing knowledge" and thereby provides "a way to contend with both the inherent complexity of computation and the ambiguity that ensues when that complexity intersects with human culture."26 referring more broadly to "ai literacy," long and magerko provide an operational view defining it as "a set of competencies that enables individuals to critically evaluate ai technologies; communicate and collaborate effectively with ai; and use ai as a tool online, at home, and in the workplace."27

following an exhaustive analysis of different, and often contradictory, definitions of literacy, information literacy, and digital literacy, bawden suggests "explaining, rather than defining, terms."28 this provisional description of algorithmic literacy acknowledges that advice. algorithmic literacy is the skill, expertise, and awareness to

• understand and reason about algorithms and their processes
• recognize and interpret their use in systems (whether embedded or overt)
• create and apply algorithmic techniques and tools to problems in a variety of domains
• assess the influence and effect of algorithms in social, cultural, economic, and political contexts
• position the individual as a co-constituent in algorithmic decision-making.

this description recognizes two overarching concepts: "creativity and critical analysis."29 creativity involves building, creating, and using algorithms for specific purposes. critical analysis involves recognizing the application of algorithms in decision-making and the implications of their use in a variety of settings and within certain contexts.

why algorithmic literacy?

the need for algorithmic literacy arises from two key and equally important perspectives, both of which essentially focus on power: control and empowerment. algorithms, especially those using machine learning and deep learning, are complex, opaque, invisible, shielded by intellectual property protection, and most importantly, consequential in the everyday lives of people.30 control is held by those who build and deploy algorithms, not those who use them.

in part because of these characteristics, people hold significant misconceptions about algorithms, their use, and their effect. in a 2019 global survey of consumers, 72% said they understood what ai was. however, despite ai being used in a wide variety of consumer-facing applications (e.g., email, search, social media), 64% said they had never used ai.31 a study of facebook users found that 62% were unaware that the news feed is algorithmically constructed and, even when told this, 12% concluded that it is, as a result, completely random.32

bias, discrimination, and unfairness in ai have been well documented.33 it is clear that poor data combined with underspecified algorithms and uncritical interpretations of the ai model outcomes can lead to abuses in a variety of ways. there is no quick fix, no automated solution to these problems. accordingly, those creating algorithms and those using them must be able to question the source of training data, the strengths and weaknesses of learning algorithms, the metrics for success, and how (and for whom) the systems are being optimized.
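one way to make those questions concrete is to look past a single aggregate success metric. the sketch below (a minimal illustration in python; the evaluation records, group labels, and values are invented for the example and are not drawn from any study cited here) computes accuracy separately for each group represented in the data, a simple first check on how, and for whom, a system is being optimized.

```python
from collections import defaultdict

# invented evaluation records: (group label, true outcome, model prediction)
RECORDS = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0),
]

def per_group_accuracy(records):
    """accuracy for each group, so disparities are visible rather than averaged away."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, truth, prediction in records:
        total[group] += 1
        correct[group] += int(truth == prediction)
    return {group: correct[group] / total[group] for group in total}

overall = sum(truth == pred for _, truth, pred in RECORDS) / len(RECORDS)
print(f"overall accuracy: {overall:.2f}")   # about 0.83, which looks acceptable in aggregate
print(per_group_accuracy(RECORDS))          # reveals that group_b fares markedly worse
```

the point of the sketch is not the arithmetic but the habit it encourages: asking what the success metric hides before accepting it.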
the overarching objectives are accountability and transparency.

perhaps most critically, the prevalence of algorithms in our lives has changed the way we interact with and use those systems, and the ways we behave in personal and social contexts. we conduct ourselves to be "algorithmically recognizable" allowing us to become "increasingly legible to machines for capture and calculation."34 the danger is that this will "lead users to internalize their [algorithm's] norms and priorities."35

at the same time the power of algorithmic technology is abused and misused, it remains a powerful technology to enhance human capabilities and insight. algorithms are attributable to dramatic advances in health care and science as well as more mundane (but appreciated) applications such as spam filters. anti-science sentiments, typified by anti-vaxxers, should not be allowed to undermine the opportunities for algorithms that materially improve the human condition and the natural world. those opportunities now extend beyond the well-funded, technology-rich research and corporate ai departments. increasingly more consumer-friendly tools and applications allow a broader and more diverse population to create algorithmic solutions. the rise of mlaas (machine learning as a service) brings together powerful cloud-based machine learning environments with accessible toolsets.36

algorithmic literacy is needed to acknowledge both the technology's power (control) over people and power (empowerment) for people. recognizing the need for protection and encouragement, many governments have enacted protective legislation and training initiatives. emblematic of the former is the general data protection regulation (gdpr) of the european union with its "right to explanation" for algorithmic decisions.37 exemplary of the latter is finland's initiative to educate a large portion of their population through "elements of ai," a free online course.38 despite these advances there remain power imbalances that require vigilance on the part of 21st-century digital citizens.

understanding the power and politics of algorithms recognizes their ontological impact in "new ways of ordering the world."39 effects this profound suggest a deeper and more comprehensive understanding of algorithms is needed:

efforts to help people understand algorithms need to continue moving away from a focus on building awareness of algorithms—people increasingly know about "those things called algorithms"—and toward explaining algorithms in such a way that people have a more consistent conceptualization of what algorithms are, what algorithms do, and—what often is overlooked—what algorithms cannot do.40

algorithmic literacy, like all literacies, is not about mastery but levels of competence appropriate to age, circumstance, and need. understood simply as recipes or visual decision trees, algorithms are accessible to even those with minimal digital literacy. public institutions, and specifically libraries, can and must take a lead role in addressing the challenges of this "new world."

the library role in algorithmic literacy

libraries have traditionally played a central role in making emerging technologies accessible to their communities whether those be online systems, makerspaces, interactive media, virtual reality, or a host of others.
advancing digital access, digital literacy, and digital inclusion have long been acknowledged by governments and public agencies as a role of the public library, even if libraries are not appropriately funded to do so.41 recently, libraries have begun addressing their role in relation to ai and algorithmic literacy.

the urban libraries council (ulc) conducted an informal poll about ai and public libraries.42 of the responding libraries (83 of its 150-member library systems), 45% identified ai as important to their leadership with 23% having a staff person dedicated to ai and 27% providing programming to help the public learn about ai. in response to a question of how best libraries could serve their community in this area, 79% said by framing and building awareness of ai, 68% recommended providing continuous education opportunities for the public, and 61% supported the provision of experiential programming. in 2019, the ulc formed a working group to advance the public library role in ai awareness, education, and experiences.

in 2018 the canadian federation of library associations (cfla) held a national forum in part focused on artificial intelligence.43 participant discussions yielded three key priorities with respect to ai: training for library staff, educational materials for the public, and advocacy initiatives regarding privacy, bias, and transparency. a fourth priority was the inclusion of ai literacy and awareness in mis and mlis curricula to facilitate a leadership role for the profession in this area.

algorithmic literacy programs have two general audiences: members of the community the library serves and the staff of the libraries themselves. for the community, these programs center on awareness and implications, skill development, and application and use.44 through workshops, hands-on laboratories and makerspaces, consumer checklists, and a variety of informational tools, libraries can provide, or partner in providing, resources in an age- and context-appropriate setting. for library staff, an additional focus is required on advocacy with respect to regulatory issues, system development, and the evolution of the local and national information infrastructures. library staff can lead, and participate in, advocacy programs that seek to influence government, public agencies, commercial system and service providers, and others about algorithmic literacy.

it is a misconception to think of algorithms, and ai more generally, as arcane topics beyond the ability of library staff to understand and teach. while the technical details of ai are complex, this is not the level of understanding required of staff or needed by the library's community. for example, ai programming at the frisco public library introduced ai maker kits and ran basic ai classes. the toronto public library, through its digital innovation hubs, has offered learning circles in basic ai (using the finnish elements of ai course as a foundation) and hosted presentations on various aspects of algorithms in everyday life. by abstracting algorithms to higher-level concepts related directly to daily experience (using facebook is illustrative of many key ideas regarding algorithmic literacy), staff can obtain a sufficient overview from a variety of accessible, introductory texts or videos.
perhaps most importantly, given the new and evolving nature of this technology, library staff should view themselves as co-learners. no matter the setting or context, an active learning approach is recommended with learners situated as makers as well as consumers.45 a review of the k–12 curricula regarding computational literacy identified active learning strategies based on projects, problem solving, cooperation, and games. the researchers recommend augmenting these with scaffolding strategies, storytelling, and aesthetic approaches.46 while intended for algorithmic literacy initiatives involving children, four design principles from dasgupta and mako are relevant for any demographic:

1. make data analysis central and ensure the data is relevant to the learner,
2. manage risk by using sandboxes for experimentation,
3. respect community values about technology that may differ, and
4. support authenticity with real-world examples and scenarios.47

long and magerko document a set of 17 core competencies and 15 associated learning design considerations regarding ai literacy.48 taken together these represent the basis for an algorithmic literacy program for any demographic and any context.

libraries are encouraged to seek partnerships and collaborations with schools (k–12 and higher education) as well as with non-profit advocacy and training groups.49 examples among these include the algorithmic literacy project (algorithmliteracy.org) and a.i. for anyone (https://aiforanyone.org). many technology companies also offer high-quality programs and resources. however, a report from the public policy forum notes that digital literacy campaigns are "too often funded by the very companies that are contributing to the problem."50

a key issue is the lack of assessment instruments. there are none for algorithmic literacy and few for computational thinking. the most prominent of the latter is skills based, focusing on concepts and operational practices and very little on the wider social and cultural implications.51 library experience with information literacy assessment can inform algorithmic literacy assessment by helping to balance skills and operational concerns with a wider focus on concepts and contextual awareness.

information literacy and explainable ai (xai): unique library contributions

while libraries can make contributions to algorithmic literacy through a variety of programs, resources, and advocacy initiatives, two specific areas suggest opportunities for unique contributions: algorithmic literacy as a part of information literacy and algorithmic literacy in support of "explainable ai" (xai).

algorithmic literacy and information literacy

annemaree lloyd describes the opacity and ubiquity of algorithms as "a wicked problem for librarians and archivists who have a vested interest in equitable access, informed citizenry and the maintenance of public memory" and insists that information literacy "provides resistance to the expansionist claims of algorithms, while at the same time ensuring that people harness the power of this culture to their advantage."52 information literacy programs championed by libraries have been instrumental in raising awareness and skill building among their user communities.
using information literacy programs as a scaffold, algorithmic literacy can be incorporated into these successful initiatives. however, given the current needs, "machine learning and algorithms present frontstage in the information literacy constellation."53

head et al., in their important 2020 study of algorithms and information literacy, present a view of student perspectives that is both troubling and optimistic.54 the students expressed "a tangle of resignation and indignation" about the effects of algorithms on their lives. for them, algorithms obscure more than they reveal, privacy is compromised, "trust is dead," and skepticism is total. the authors conclude that we face an "epistemological crisis" where algorithms are "stripping individuals of the responsibility to interpret the facticity of the information these systems give us when that interpretation has been performed by the algorithms themselves." however, students also employed "defensive practices" against algorithms, utilized "multiple selves" to preserve their privacy, and were keen to learn how to "fight back" against surveillance and algorithmic decision-making. this is a reminder that "while algorithms certainly do things to people, people also do things to algorithms."55 people have "algorithmic capital" which they can use in "negotiation with algorithmic power."56

with these findings, it seems clear that status quo information literacy programs will not address the unique challenges presented by algorithms. jason clark, scott young, and lisa janicke hinchliffe took up this challenge with a project funded by an imls grant.57 calling "algorithmic awareness" a "new competency," these researchers identified a gap in the acrl framework for information literacy that revealed "a lack of an understanding around the rules that govern our software and shape our digital experiences."58 those rules are the "invisible logic" of algorithms that need to be made transparent for users and library staff. deliverables from this project include an integrated curriculum, syllabus, and software prototype that respond uniquely to the pedagogical challenges of algorithmic literacy.59

in promoting ml (machine learning) literacy, ryan cordell also calls for a specific pedagogical approach that would "emphasize the situated-ness of ml training data and experiments, including the biases or oversights that influence the outcomes of academic, economic, and governmental ml processes."60 recommendations from this report provide guidelines for developing staff expertise, running pilot projects, and creating toolsets and checklists supportive of responsible machine learning.

algorithmic literacy and explainable ai (xai)

perhaps a less obvious way for libraries to contribute to algorithmic literacy is through explainable ai (xai).61 difficulties in interrogating algorithms to assess bias, discrimination, and unfairness (as well as other deficiencies such as veracity and generalizability) have led to widespread interest in xai.
the purpose of xai is to "enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners" and to deploy ai systems that have "the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future."62

there is complementarity between the objectives of xai and algorithmic literacy. both seek transparency, promote understanding, and facilitate accountability. both recognize the primacy of human agency in human-machine interaction. xai is accomplished through a variety of techniques, strategies, and processes. these can involve unambiguous proofs, technical and statistical interventions for verification and validation, and authorizations that rely on standards, audits, and policy directives.63 explanations are contextual. system designers, professionals, regulators, end users, and the general public need explanations specific to their objectives and tailored to their skills and knowledge.

as algorithmic decision-making is increasingly embedded in the information tools, services, and resources provided by libraries and promoted to users, xai and algorithmic literacy can operate in close association. libraries can incorporate aspects of xai into algorithmic literacy programming and the principles of algorithmic literacy (and more generally information literacy) can inform how xai is sensitive and responsive to different explanatory needs. xai is still an emergent field but it has had, and will continue to have, a profound impact on the development of machine learning systems. the opportunity for library involvement is immediate:

librarians need to become well versed in these technologies, and participate in their development, not simply dismiss them or hamper them. we must not only demonstrate flaws where they exist but be ready to offer up solutions. solutions grounded in our values and in the communities we serve.64

a repeated message from lis researchers is that library-developed tools to interrogate ai systems are essential components in advancing algorithmic literacy.65 these tools can address the complexity and opacity of machine learning systems and provide levels of explainability and transparency in contextually appropriate ways. one such tool, either as a stand-alone system or embedded in an existing discovery system, might provide a user with access to the nature, and potential bias, of the training data, the general efficacy of the learning algorithm(s) used, and the generalizability of the trained model to different contexts. this xai scorecard would integrate the objectives of xai, algorithmic literacy, and information literacy.
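as an illustration of what such a scorecard might contain, the sketch below defines a small data structure in python. the fields, wording, and example values are assumptions made for this article's hypothetical tool; they are not a specification from the xai literature or from any existing discovery system.

```python
from dataclasses import dataclass, field

@dataclass
class XAIScorecard:
    """a plain-language summary a library system could attach to an algorithmic feature."""
    feature: str                       # e.g., a "recommended titles" panel in a discovery layer
    training_data: str                 # provenance and coverage of the training data
    known_biases: list = field(default_factory=list)   # documented gaps or skews
    efficacy: str = "not evaluated"    # how well the learning algorithm performs, in brief
    generalizability: str = "unknown"  # how far the trained model transfers to new contexts

    def summary(self) -> str:
        bias_note = "; ".join(self.known_biases) or "none documented"
        return (f"{self.feature}: trained on {self.training_data}. "
                f"known biases: {bias_note}. efficacy: {self.efficacy}. "
                f"generalizability: {self.generalizability}.")

card = XAIScorecard(
    feature="recommended titles",
    training_data="circulation records, 2015-2020, single institution",
    known_biases=["underrepresents non-english holdings"],
    efficacy="modest offline precision; no user study",
    generalizability="untested outside the originating library",
)
print(card.summary())
```

the value of a structure like this is less the code than the discipline it imposes: every algorithmic feature exposed to users would carry a short, contextually appropriate account of its data, its limits, and its reach.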
by leveraging and developing library staff skills and by partnering with ai research and industry groups "libraries can become ideal sites for cultivating responsible and responsive ml."66 padilla views this engagement as not just a technical initiative but a library-wide effort to promulgate "responsible operations" with ai, noting that library practices "that embed transparency and explainability increase the likelihood of organizational accountability."67

conclusion

algorithms are "the new power brokers in society" and "we are growing increasingly dependent on computational spectacles to see the world."68 lash argues that this development has altered the rules by which society operates. constitutive rules (e.g., rules that define the boundaries of society) and regulative rules (e.g., rules that define how we operate in society) are now joined by "algorithmic, generative rules." these rules are "compressed and hidden and we do not encounter them in the way that we encounter constitutive and regulative rules. yet this third type of generative rules is more and more pervasive in our social and cultural life of the post-hegemonic order."69

algorithmic literacy is a means to understand this new set of rules and to encourage the skills and abilities so people can use algorithms and not be used by them. libraries have typically championed accessible technology and its effective use. the ubiquity of algorithmic decision-making and its profound impact on everyday lives makes the recognition and promotion of algorithmic literacy a critical new challenge and imperative for libraries of all types.

endnotes

1 thomas h. cormen et al., introduction to algorithms, 3rd ed. (cambridge ma: mit press, 2009), 13.
2 yoav shohman et al., "ai index 2017 report" (stanford, ca: human-centered ai initiative, stanford university, 2017), http://cdn.aiindex.org/2017-report.pdf; safiya noble, algorithms of oppression: how search engines reinforce racism (new york: new york university press, 2018), 1.
3 jos de mul and bibi van den berg, "remote control: human autonomy in the age of computer mediated agency," in law, human agency, and autonomic computing, ed. mireille hildebrandt and antoinette rouvroy (abingdon: routledge, 2011), 58.
4 lee rainie and janna anderson, "code-dependent: pros and cons of the algorithmic age" (pew research center, february 2017), http://www.pewinternet.org/wp-content/uploads/sites/9/2017/02/pi_2017.02.08_algorithms_final.pdf.
5 "canada's ai imperative: from predictions to prosperity" (toronto: deloitte, 2019), 16, https://www.canada175.ca/en/reports/ai-imperative?&id=ca:2el:3or:awa_2019_fcc_omnia1:from_dca_fccomnia2.
6 "2019 edelman ai survey," edelman, 2019, https://www.edelman.com/sites/g/files/aatuss191/files/2019-03/2019_edelman_ai_survey_whitepaper.pdf.
7 jenna burrell, "how the machine 'thinks': understanding opacity in machine learning algorithms," big data & society 3, no. 1 (2016), https://doi.org/10.1177/2053951715622512; rainie and anderson, "code-dependent."
8 jeannette wing, "computational thinking, 10 years later," communications of the acm 59, no. 7 (2016): 10, https://doi.org/10.1145/2933410.
9 loanne snavely and natasha cooper, "the information literacy debate," journal of academic librarianship 23, no. 1 (1997): 9–14, https://doi.org/10.1016/s0099-1333(97)90066-5.
10 alfred thomas bauer and ebrahim mohseni ahooei, "rearticulating internet literacy," cyberspace studies 2, no. 1 (2018): 29–53, https://doi.org/10.22059/jcss.2018.245833.1012.
11 evelyn stiller and cathie leblanc, "from computer literacy to cyber-literacy," journal of computing sciences in colleges 21, no. 6 (2006): 4–13; peter j. denning and matti tedre, computational thinking (cambridge ma: mit press, 2019); z. katai, "the challenge of promoting algorithmic thinking of both sciences- and humanities-oriented learners," journal of computer assisted learning 31, no. 4 (2015): 287–99, https://doi.org/10.1111/jcal.12070.
12 nick seaver, "what should an anthropology of algorithms do?" (american anthropological association, chicago, 2013), 1–2, http://nickseaver.net/papers/seaveraaa2013.pdf.
13 jesús moreno-león and marcos román-gonzález, "on computational thinking as a universal skill," in ieee global engineering education conference (educon, santa cruz de tenerife, spain: ieee, 2018), 1684–89; shuchi grover and roy pea, "computational thinking in k–12: a review of the state of the field," educational researcher 42, no. 1 (2013): 38–43, https://doi.org/10.3102/0013189x12463051; betual c. czerkawski and eugene w. lyman iii, "exploring issues about computational thinking in higher education," techtrends 59, no. 2 (2015): 57–65.
14 daniel zeng, "from computational thinking to ai thinking," ieee intelligent systems (november/december, 2013), 2–4; duri long and brian magerko, "what is ai literacy? competencies and design considerations," in proceedings of the 2020 chi conference on human factors in computing systems, chi '20 (honolulu, hi: association for computing machinery, 2020), 1–16, https://doi.org/10.1145/3313831.3376727.
15 rob kitchin, "thinking critically about and researching algorithms," information, communication & society 20, no. 1 (2017): 18, https://doi.org/10.1080/1369118x.2016.1154087.
16 karen nicholson, "information into action? reflections on (critical) practice" (workshop on instruction in library use (wilu), university of ottawa, 2018), 7–8, https://ir.lib.uwo.ca/fimspres/51/.
17 nick seaver, "algorithms as culture: some tactics for the ethnography of algorithm systems," big data & society 4 (2017): 10, https://doi.org/10.1177/2053951717738104.
18 virginia eubanks, automating inequity: how high-tech tools profile, police, and punish the poor (new york: st.
martin’s press, 2018); noble, algorithms of oppression; cathy o’neil, weapons of math destruction: how big data increases inequality and threatens democracy (new york: crown, 2016); frank pasquale, the black box society: the secret algorithms that control money and information (cambridge, ma: harvard university press, 2015). 19 time dutton, “building an ai world: report on national and regional ai strategies” (toronto: cifar, 2018), https://www.cifar.ca/docs/default-source/aisociety/buildinganaiworld_eng.pdf?sfvrsn=fb18d129_4; pricewaterhousecooper, “sizing the prize: what’s the real value of ai for your business and how can you capitalise?,” 2017, https://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prizereport.pdf. 20 allan martin and jan grudziecki, “digeulit: concepts and tools for digital literacy development,” innovation in teaching and learning in information and computer sciences 5, no. 4 (2006): 249–67, https://doi.org/10.11120/ital.2006.05040249. 21 rosanne cordell, “information literacy and digital literacy: competing or complementary?,” communications in information literacy 7, no. 2 (2013): 177–83, https://doi.org/10.15760/comminfolit.2013.7.2.150; andreas dengel and ute heuer, “a curriculum of computational thinking as a central idea of information & media literacy,” in proceedings of the 13th workshop in primary and secondary computing education (wipsce’18) october 4-6, 2018, potsdam, germany (new york: acm, 2018), https://doi.org/10.1145/3265757.3265777; sarah gretter and aman yadav, “computational https://doi.org/10.3102/0013189x12463051 https://doi.org/10.1145/3313831.3376727 https://doi.org/10.1080/1369118x.2016.1154087 https://ir.lib.uwo.ca/fimspres/51/ https://doi.org/10.1177/2053951717738104 https://www.cifar.ca/docs/default-source/ai-society/buildinganaiworld_eng.pdf?sfvrsn=fb18d129_4 https://www.cifar.ca/docs/default-source/ai-society/buildinganaiworld_eng.pdf?sfvrsn=fb18d129_4 https://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prize-report.pdf https://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prize-report.pdf https://doi.org/10.11120/ital.2006.05040249 https://doi.org/10.15760/comminfolit.2013.7.2.150 https://doi.org/10.1145/3265757.3265777 information technology and libraries june 2021 algorithmic literacy and the role for libraries | ridley and pawlick-potts 12 thinking and media & information literacy: an integrated approach to teaching twenty-first century skills,” techtrends 60 (2016): 510–16, https://doi.org/10.1007/s11528-016-0098-4. 22 jeannette wing, “computational thinking,” communications of the acm 49, no. 3 (2006): 35. 23 sharin rawhiya jacob and mark warschauer, “computational thinking and literacy,” journal of computer science integration 1, no. 1 (2018): 3, https://doi.org/10.26716/jcsi.2018.01.1.1. 24 sylvia scribner and michael cole, the psychology of literacy, acls humanities e-book (series) (cambridge, ma: harvard university press, 1981), 99. 25 george steiner, “school terms: redefining literacy for the digital age,” lapham’s quarterly 1, no. 4 (2008): 198. 26 ed finn, “algorithm of the enlightenment,” issues in science and technology 33, no. 3 (2017): 25; ed finn, what algorithms want: imagination in the age of computing (cambridge, ma: mit press, 2017), 2. 27 long and magerko, “what is ai literacy?,” 2. 28 david bawden, “information and digital literacies: a review of concepts,” journal of documentation 57, no. 2 (2001): 233. 
29 gretter and yadav, "computational thinking," 510.
30 pasquale, the black box society; o'neil, weapons of math destruction.
31 "what consumers really think about ai: a global study," pega, 2019, https://www.ciosummits.com/what-consumers-really-think-about-ai.pdf.
32 motahhare eslami et al., "first i 'like' it, then i hide it: folk theories of social feeds," in proceedings of the 2016 chi conference on human factors in computing systems, chi '16 (san jose, ca: association for computing machinery, 2016), 2371–82, https://doi.org/10.1145/2858036.2858494.
33 julia angwin et al., "machine bias," propublica, may 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing; eubanks, automating inequity; noble, algorithms of oppression; pasquale, the black box society; ruha benjamin, race after technology: abolitionist tools for the new jim code (polity press, 2019); o'neil, weapons of math destruction.
34 tarleton gillespie, "the relevance of algorithms," in media technologies: essays on communication, materiality, and society, ed. tarleton gillespie, pablo j. boczkowski, and kirsten a. foot (cambridge, ma: mit press, 2014), 184; sun-ha hong, technologies of speculation: the limits of knowledge in a data-driven society (new york: new york university press, 2020), 2.
35 gillespie, "the relevance of algorithms," 187.
36 altexsoft, "comparing machine learning as a service: amazon, microsoft azure, google cloud ai, ibm watson," data science (blog), september 27, 2019, https://www.altexsoft.com/blog/datascience/comparing-machine-learning-as-a-service-amazon-microsoft-azure-google-cloud-ai-ibm-watson/.
37 european union, "regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016," 2016, http://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:32016r0679; bryce goodman and seth flaxman, "european union regulations on algorithmic decision making and a 'right to explanation,'" ai magazine 38, no. 3 (2017): 50–57, https://doi.org/10.1609/aimag.v38i3.2741.
38 finland, "work in the age of artificial intelligence: four perspectives on economy, employment, skills and ethics" (helsinki: ministry of economic affairs and employment, 2018), http://urn.fi/urn:isbn:978-952-327-313-9.
39 taina bucher, if . . . then: algorithmic power and politics (new york: oxford university press, 2018), 20.
40 alison j. head, barbara fister, and margy macmillan, "information literacy in the age of algorithms: student experiences with news and information, and the need for change" (project information literacy, 2020), 41, https://www.projectinfolit.org/uploads/2/7/5/4/27541717/algoreport.pdf.
41 paul t. jaeger et al., "the intersection of public policy and public access: digital divides, digital literacy, digital inclusion, and public libraries," public library quarterly 31, no. 1 (2012): 1, https://doi.org/10.1080/01616846.2012.654728.
42 "ulc snapshot: artificial intelligence," urban libraries council weekly newsletter, july 18, 2018.
43 canadian federation of library associations, "artificial intelligence and intellectual freedom: key policy concerns for canadian libraries" (ottawa: cfla, 2018), http://cfla-fcab.ca/wp-content/uploads/2018/07/cfla-fcab-2018-national-forum-paper-final.pdf.
44 martin and grudziecki, "digeulit."
45 b. alexander, s. adams becker, and m. cummins, "digital literacy: an nmc horizon project strategic brief" (austin, tx: the new media consortium, 2016), https://www.nmc.org/publication/digital-literacy-an-nmc-horizon-project-strategic-brief/.
46 ting-chia hsu, shao-chen chang, and yu-ting hung, "how to learn and how to teach computational thinking: suggestions based on a review of the literature," computers & education 126 (2018): 296–310, https://doi.org/10.1016/j.compedu.2018.07.004.
47 sayamindu dasgupta and benjamin mako hill, "designing for critical algorithmic literacies," arxiv:2008.01719 [cs], 2020, http://arxiv.org/abs/2008.01719.
48 long and magerko, "what is ai literacy?"
49 alexander, adams becker, and cummins, "digital literacy."
50 edward greenspon and taylor owen, "democracy divided: countering disinformation and hate in the digital public sphere" (ottawa: public policy forum, 2018), 19, https://ppforum.ca/wp-content/uploads/2018/08/democracydivided-ppf-aug2018-en.pdf.
51 marcos román-gonzález, juan-carlos pérez-gonzález, and carmen jiménez-fernández, "which cognitive abilities underlie computational thinking? criterion validity of the computational thinking test," computers in human behavior 72 (2017): 678–91, https://doi.org/dx.doi.org/10.1016/j.chb.2016.08.047.
52 annemaree lloyd, "chasing frankenstein's monster: information literacy in the black box society," journal of documentation 75, no. 6 (2019): 1476, https://doi.org/10.1108/jd-02-2019-0035.
53 head, fister, and macmillan, "information literacy in the age of algorithms," 42.
54 head, fister, and macmillan, "information literacy in the age of algorithms."
55 taina bucher, "the algorithmic imaginary: exploring the ordinary affects of facebook algorithms," information, communication & society 20, no. 1 (2017): 42, https://doi.org/10.1080/1369118x.2016.1154086.
56 tanya kant, making it personal: algorithmic personalization, identity, and everyday life (oxford: oxford university press, 2020), 152.
57 jason clark, lisa janicke hinchliffe, and scott young, "unpacking the algorithms that shape our ux" (washington, dc: imls, 2017), https://www.imls.gov/sites/default/files/grants/re-72-17-0103-17/proposals/re-72-17-0103-17-full-proposal-documents.pdf.
58 association of college and university libraries, "framework for information literacy for higher education," 2015, http://www.ala.org/acrl/standards/ilframework; jason clark, "building competencies around algorithmic awareness" (washington, dc: code4lib, 2018), https://www.lib.montana.edu/~jason/talks/algorithmic-awareness-talk-code4lib2018.pdf.
59 jason clark, algorithmic awareness (2018; repr., github, 2020), https://github.com/jasonclark/algorithmic-awareness.
60 ryan cordell, "machine learning + libraries: a report on the state of the field" (washington dc: library of congress, 2020), 31, https://labs.loc.gov/static/labs/work/reports/cordell-loc-ml-report.pdf.
61 michael ridley, "explainable artificial intelligence," research library issues, no. 299 (2019): 28–46, https://doi.org/10.29242/rli.299.3.
62 matt turek, "explainable artificial intelligence (xai)" (arlington, va: darpa, 2016), https://www.darpa.mil/program/explainable-artificial-intelligence; darpa, "explainable artificial intelligence (xai)" (arlington, va: darpa, 2016), http://www.darpa.mil/attachments/darpa-baa-16-53.pdf.
63 ashraf abdul et al., "trends and trajectories for explainable, accountable, and intelligible systems: an hci research agenda," in proceedings of the 2018 chi conference on human factors in computing systems, chi '18 (new york: acm, 2018), 582:1–582:18, https://doi.org/10.1145/3173574.3174156; wojciech samek and klaus-robert muller, "towards explainable artificial intelligence," in explainable ai: interpreting, explaining and visualizing deep learning, ed. wojciech samek et al., lecture notes in artificial intelligence 11700 (cham: springer international publishing, 2019), 5–22; alejandro barredo arrieta et al., "explainable artificial intelligence (xai): concepts, taxonomies, opportunities and challenges toward responsible ai," arxiv:1910.10045 [cs], 2019, http://arxiv.org/abs/1910.10045.
64 r. david lankes, "decoding ai and libraries," r. david lankes (blog), july 3, 2019, https://davidlankes.org/decoding-ai-and-libraries/.
article developing a minimalist multilingual full-text digital library solution for disconnected remote library partners todd digby information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.13319 todd digby (digby@ufl.edu) is chair, library technology services department, and associate university librarian, george a. smathers libraries, university of florida. © 2021. abstract the university of florida (uf) george a. smathers libraries have been involved in a wide range of partnered digital collection projects throughout the years with a focus on collaborating with institutions across the caribbean region. one of the countries in which we have a number of digitization projects is cuba. one of these partnerships is with the library of the temple beth shalom (gran sinagoga bet shalom) in havana, cuba. as part of this partnership, we have sent personnel to cuba to do onsite scanning and digitization of selected materials found within the institution. the digitized content from this project was brought back to uf and loaded into our university of florida digital collections (ufdc) system. because internet availability and low bandwidth are issues in cuba, the synagogue could not reliably access the full-text digitized content residing on ufdc. the synagogue also did not have a local digital library system into which to load the newly digitized content. to respond to this need, we focused on providing a minimalist technology solution that was highly portable to meet their desire to conduct full-text searches within their library on their digitized content.
this article will explore the solution that was developed using a usb flash drive loaded with a portableapps version of zotero and populated with multilingual ocr'd documents. about the partnership the university of florida (uf), george a. smathers libraries have been involved in a wide range of partnered digital collection projects throughout the years with a focus on collaborating with institutions across the caribbean region. uf has been involved with the digital library of the caribbean (dloc), which began in 2006, and the university is its technical home. the dloc brings together collections from countries around the caribbean in order to provide researchers with greater online access to these physically dispersed collections.1 this partnership reflects common interests of preservation, access, accessibility, discovery, and content management.2 one of the countries with which we have a number of digitization projects is cuba.3 the cuban judaica collection comprises materials held in the library of the temple beth shalom (gran sinagoga bet shalom) in havana, cuba. the synagogue library collection contains over 10,000 books. the collection originated with abraham marcus matterin, the founder of the cultural group la agrupacion cultural hebreo cubana, who first gathered and arranged the materials. in addition to matterin's own works, the materials in the library include many rare yiddish publications from the early 20th century, as well as little-known works produced in cuba beginning in the 1930s. the temple beth shalom library as a whole provides a complete snapshot of cuban jewish intellectual, cultural, religious, and political life as it evolved and progressed during the 20th century.4 among the rare publications included in the smathers libraries collections are habanera lebn, the main cuban jewish newspaper published between 1932 and 1960; israelia, a spanish-language newspaper which circulated as a monthly during the 1950s; as well as many other cuban jewish publications. this collection will also provide access to the synagogue library's wealth of jewish publications from other parts of the caribbean and latin america. the synagogue library digitization project is a partnership between the george a. smathers libraries, the isser and price library of judaica under the auspices of its neh challenge grant, la comunidad hebrea de cuba, and la biblioteca nacional de cuba josé martí. the digitization process as part of this partnership, graduate interns who were fluent in spanish travelled to cuba to do onsite scanning and digitization of selected materials found within the institution. the digitized content from this project was brought back to uf and loaded into our university of florida digital collections (ufdc) system. this digitization process involved taking images in a high-resolution tiff format and then creating the appropriate metadata to accompany these records for use once they were loaded into the digital library system. an additional step of this process was developed by fletcher durant, uf's preservation librarian, who cut strips of colorful acid-free paper, which were placed in physical items to indicate that they were digitized. these paper flags were used to tell local synagogue users which items were digitized and available locally in the synagogue and more broadly on the internet.
these digital files were then transported back to the digital support services group at the university of florida with the returning personnel on usb hard drives, which were appropriately scanned for viruses before the digitized files were extracted. as part of the ingest process to ufdc we created derivative access files, including jpgs, jpeg 2000 files, and thumbnail images, from the high-resolution tiffs. additionally, these files were processed through an optical character recognition (ocr) process to generate full-text searchable files. this ocr process involved a combination of adobe acrobat and abbyy finereader. the unique aspect of the ocr process was the need to ensure multilingual character recognition that could recognize and generate full-text files that may include spanish, hebrew, and english. these scanned files, along with the derivatives and ocr files, were then loaded into the ufdc system, made publicly searchable, and made accessible to the wider internet audience around the world. working with a partner in cuba, however, presented additional challenges. partnership challenges providing the synagogue with access to their own content presented challenges that were not necessarily anticipated when the scanning project started. the synagogue did not have a local digital library system that they could use to load and access the newly digitized content. we did provide the synagogue with copies of all the digital files, but since this digitization effort focused on text-based printed materials, access to the ocr files in a searchable format was the end goal for making full use of this content. this in itself is not normally an issue, since the digitized content would be loaded on ufdc systems and could be accessed online. unfortunately, one challenge that presented itself when working with cultural heritage partners within cuba is that limited technology infrastructure and internet connectivity can create issues in supporting the physical and digital needs of the project activities. with internet availability intermittent and bandwidth speeds limited in cuba, creative ways to address the digitization work and subsequent file sharing needed to be developed.5 aside from the broader infrastructural challenges related to technology when working with cuban partners, there are additional bureaucratic challenges: the conditions for partnered projects in cuba can be in flux and change with shifts in policy by either the us or cuban governments. these technological and political hurdles made our ability to offer ongoing remote support highly challenging. additionally, there were barriers to how we could offer remote support because our respective it technicians spoke either english or spanish, but not both. the need for translation between languages can be overcome, but it does slow down responsiveness. also, translators may not be accustomed to translating technical jargon, which can further complicate providing support. given the challenges presented above, we endeavored to provide the synagogue with a solution that would provide a multilingual full-text search of the materials that had been scanned and put through the ocr process.
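as a rough illustration of what a multilingual ocr pass like this involves, the sketch below uses the open-source tesseract engine from python rather than the adobe acrobat and abbyy finereader tools the project actually used; the folder names, the pytesseract package, and tesseract itself are assumptions made only for this example, and the sketch presumes that the spanish, hebrew, and english language packs are installed.

```python
# illustrative only: the project used adobe acrobat and abbyy finereader, not this code.
# assumes the tesseract engine plus its "spa", "heb", and "eng" language packs, and the
# pytesseract and pillow python packages.
from pathlib import Path

import pytesseract
from PIL import Image

scan_dir = Path("scans")      # hypothetical folder of high-resolution page images
text_dir = Path("ocr_text")   # hypothetical output folder for plain-text files
text_dir.mkdir(exist_ok=True)

for page in sorted(scan_dir.glob("*.tif")):
    # "spa+heb+eng" asks tesseract to consider all three languages on each page
    text = pytesseract.image_to_string(Image.open(page), lang="spa+heb+eng")
    (text_dir / f"{page.stem}.txt").write_text(text, encoding="utf-8")
```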
additional factors that influenced our planning, as discussed above, were the recognition that our cuban partner did not have reliable internet access to the content hosted in the us, and that this solution would be run locally at the synagogue, so we would be able to provide minimal, if any, support in installing or maintaining the system once it was deployed. minimalist computing solutions in the search for a solution, we were influenced by the concept of minimalist computing.6 borrowed from digital humanities, minimal computing "refer[s] to computing done under some set of significant constraints of hardware, software, education, network capacity, power, or other factors. minimal computing includes both the maintenance, refurbishing, and use of machines to do dh work out of necessity."7 an important focus of minimal computing in the digital humanities, as noted by risam and edwards (2017), is how these practices can be used by those who find themselves with technological needs while working outside larger macro structures of financial and technical support.8 in short, minimal computing is a solution for those individuals or institutions that are not positioned at a larger scale to support projects financially and technologically. appropriate technology in addition to minimalist computing, we drew upon related frameworks for our project. the most prominent of these is appropriate technology (at), a concept that comes from the field of economics.9 this concept was further adapted and used in the field of economic development, where its implementation was described as pertaining to "… small production units, appropriate technologies are small-scale but efficient, replicable in numerous units, readily operated, maintained and repaired, low-cost and accessible to low-income persons. in terms of the people who use or benefit from them, appropriate technologies seek to be compatible with local cultural and social environments."10 one of the main reasons for the use of appropriate technology is that advanced technologies were often inappropriate for the needs of the populations in countries that did not have the same level of technological infrastructure, support, and knowledge. this idea is composed of multiple facets: in some cases it describes using the simplest level of technology needed to meet the intended purpose of the user, and it can also refer to developing a system in a way that takes into account the social and environmental factors of a given use. open-source appropriate technology further influences on the design of this project can be found in a more granular approach to appropriate technology that has become known as open source appropriate technology (osat). as defined by pearce (2012), osat refers to technologies that can be sustainably developed, while at the same time being developed using the concepts and principles of free and open source software.
pearce (2012) further states that osat is made up of technologies that are easily and economically utilized from readily available resources by local communities to meet their needs, and that it must meet the boundary conditions set by the environmental, cultural, economic, and educational resource constraints of the local community.11 developing a solution as mentioned previously, the digitized synagogue content from this project was brought back to uf from cuba and loaded into our local digital collections system. this made the content accessible to anyone with internet access around the world; yet because internet availability and low bandwidth are issues in cuba, the synagogue's ability to access the full-text digitized content residing on ufdc could not be assured. additionally, the synagogue did not have a local digital library system to load the newly digitized content into. to respond to their desire to conduct full-text searches on their digitized content within their library, we focused on providing a minimalist technology solution that was highly portable, user friendly, open source, and sustainable with minimal or no technical support. in scanning the library technological landscape, our first thought was to find a small digital library system that could be used to meet these needs. although there are a number of open source digital library systems, and some of these can be configured to work in a non-internet-connected environment, the level of customization and ongoing technical support posed a problem, especially since we might not be able to provide support due to both issues in the telecommunications systems and possible language barriers. although we knew that they had a windows laptop, there were still technical uncertainties about the local computing environment within the synagogue. once we determined that a full digital library system was not going to be sustainable and deployable, we decided to look for alternative approaches. a solution that just involved providing the scanned materials on dvds was considered, but this also presented a problem, because the ocr'd pdfs would need to be opened individually to search the text within, and there was no logical way to provide citation information or organize files in a meaningful way. zotero portable eventually, we looked at the various citation management systems, since many of these allow pdf files to be imported and allow for searching the full text of pdf files using the ocr'd text. we then focused on a solution that is open source; this was especially important given that we were providing this to an entity in a foreign country and we did not want to experience any licensing or update issues with the chosen software. the platform that was chosen was zotero, primarily because of the open source license of the software, but also because of existing technical knowledge and experience using it. with zotero chosen as our platform, we then investigated how it could be delivered in a portable form that was already installed, configured, and populated for the end user. i had existing knowledge of the portableapps platform (https://portableapps.com/), which is a fully open source and free platform that enables a broad range of windows applications to be installed on a portable device, in our case a large flash drive.
once installed using the portableapps platform, these applications can be used without any additional installation, just by plugging the flash drive into another computer. in the case of the zotero application, there was already a deployment created to be used with portableapps, which made the installation and configuration less of a hurdle. see figure 1. figure 1. once plugged into the pc, the portableapps.com drive would present itself; double-clicking on the icon would load the menu for the user with only the zotero application available to choose. once installed, we started to load the digitized files into zotero. this consisted of a two-step process. first, because the content was already added to our digital library system, we imported citations for each item or volume into zotero, as one normally would when adding citations. then we went into each of these citations and added the corresponding digitized file(s) into each entry. some of these items consisted of multiple volumes that would be placed under an existing citation. in addition to this, some of the materials contained both spanish and hebrew text, so during the ocr process a separate file was created for each of these languages. at this point we were able to test the full-text search capabilities of zotero against our multilingual ocr'd pdf files. figure 2. this image shows a set of citations of the scanned materials. to perform a full-text search within these documents, a user uses the "everything" search box in the zotero toolbar. at this point in our testing, we had determined that our method was successful in being able to perform a full-text search across all the loaded pdf documents (see figure 2). however, a limit of zotero at this point was that the search would only identify which files the search terms were located in, and not the exact location of those terms within each file (see figure 3). although this was a limitation of our solution, we were able to provide a second step for searching within the pdf files for the exact location of the search terms. figure 3. the zotero search results, which highlighted the files in which the search text could be found. figure 4. acrobat search conducted on a file to locate the exact location of the text in the digitized materials. since we were aware that you can search within a full-text pdf file using adobe acrobat reader, we decided that for a more granular search we would instruct the users to click on the file that included the identified search term, which would load acrobat reader and open the file. then the user could search for the exact term using adobe acrobat reader's search capabilities to locate the place in the document where the term appears (see figure 4 for an example).
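for readers curious about what this kind of cross-document, full-text search involves under the hood, the minimal python sketch below is purely illustrative: it is not how the deployed solution works (the project deliberately relied on off-the-shelf zotero and acrobat reader rather than custom code), and the folder name, the pypdf package, and the sample search term are assumptions made for the example.

```python
# illustrative only: the deployed solution used zotero for cross-document search and
# adobe acrobat reader for locating terms within a document, not custom code.
from pathlib import Path

from pypdf import PdfReader  # assumes the pypdf package is installed

pdf_dir = Path("ocr_pdfs")  # hypothetical folder holding the ocr'd, searchable pdfs

def search_pdfs(term: str) -> None:
    """print each file and page on which the (case-insensitive) term appears."""
    needle = term.lower()
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        reader = PdfReader(str(pdf_path))
        for page_number, page in enumerate(reader.pages, start=1):
            text = (page.extract_text() or "").lower()
            if needle in text:
                print(f"{pdf_path.name}: page {page_number}")

search_pdfs("habana")  # hypothetical search term
```

the zotero-plus-acrobat workflow described above achieves the same result without requiring any scripting or software maintenance on the partner's side.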
although this two-step process is not ideal, for a minimal technological solution that addresses all the concerns it would meet the overall goals of the project and provide a workable searching solution for our partner. with the workflows, installation, and configuration of the flash drive complete, we next created documentation in both english and spanish to guide the user through the search process. we then provided the flash drive with accompanying documentation to the next staff member who was travelling to cuba. because of federal rules implemented shortly after we transported the flash drive to cuba, our ability to travel to cuba to work with our partners was limited. this has substantially reduced the information flow between our institution and our cuban partners and has limited how much we can know about how actively this resource is being used. it is hoped that in the near future we will once again be able to travel to cuba and re-engage with our partners to determine the success of this and the other projects we have been working on with them. conclusion in the realm of library technology, we often implement and support complex and highly costly systems as part of our regular oversight. by working on this project, we have been given a chance to take a step back and design a solution that uses open source and free tools that are readily available and require little support. looking broadly across the technology platforms, systems, and software in use, there is a tendency to find that they include a plethora of features and functions that are rarely used but add complexity. focusing on solutions that reduce this complexity and still meet user needs has been a rewarding experience. endnotes 1 brooke wooldridge, laurie taylor, and mark sullivan, "managing an open access, multi-institutional, international digital library: the digital library of the caribbean," resource sharing & information networks 20, no. 1–2 (2009): 35–44, https://doi.org/10.1080/07377790903014534. 2 miguel asencio, "collaborating for success: the digital library of the caribbean," journal of library administration 57, no. 7 (2017): 818–25, https://doi.org/10.1080/01930826.2017.1362902. 3 "celebrating cuba! collaborative digital collections of cuban patrimony," university of florida digital collections, accessed february 15, 2021, https://ufdc.ufl.edu/cuba. 4 "cuban judaica," university of florida digital collections, accessed february 15, 2021, https://ufdc.ufl.edu/cuban_judaica. 5 xuefei deng, nancy armando camacho, and larry press, "how do cubans use internet? the effects of capital," in proceedings of the 52nd hawaii international conference on system sciences (2019), https://doi.org/10.24251/hicss.2019.617. 6 jentery sayers, "minimal definitions," minimal computing: a working group of go::dh, october 2, 2016, https://go-dh.github.io/mincomp/thoughts/2016/10/02/minimal-definitions. 7 "about: what is minimal computing?" minimal computing: a working group of go::dh, https://go-dh.github.io/mincomp/about/. 8 roopika risam and susan edwards, "micro dh: digital humanities at the small scale," digital humanities 2017, http://works.bepress.com/roopika-risam/27/. 9 ernest f. schumacher, small is beautiful: economics as if people mattered (london: blond and briggs, 1973).
10 peter thormann, "proposal for a program in appropriate technology," in appropriate technologies for third world development (new york: st. martin's press, 1979): 280–99. 11 j. m. pearce, "the case for open source appropriate technology," environment, development and sustainability 14 (2012): 425–31, https://doi.org/10.1007/s10668-012-9337-9. articles from digital library to open datasets: embracing a "collections as data" framework rachel wittmann, anna neatrour, rebekah cummings, and jeremy myntti information technology and libraries | december 2019 rachel wittmann (rachel.wittmann@utah.edu) is digital curation librarian, university of utah. anna neatrour (anna.neatrour@utah.edu) is digital initiatives librarian, university of utah. rebekah cummings (rebekah.cummings@utah.edu) is digital matters librarian, university of utah. jeremy myntti (jeremy.myntti@utah.edu) is head of digital library services, university of utah. abstract this article discusses the burgeoning "collections as data" movement within the fields of digital libraries and digital humanities. faculty at the university of utah's marriott library are developing a collections as data strategy by leveraging existing digital library and digital matters programs. by selecting various digital collections, small- and large-scale approaches to developing open datasets are explored. five case studies chronicling this strategy are reviewed, along with testing the datasets using various digital humanities methods, such as text mining, topic modeling, and gis (geographic information system). introduction for decades, academic research libraries have systematically digitized and managed online collections for the purpose of making cultural heritage objects available to a broader audience. making archival content discoverable and accessible online has been revolutionary for the democratization of scholarship, but the use of digitized collections has largely mimicked traditional use: researchers clicking through text, images, maps, or historical documents one at a time in search of deeper understanding. "collections as data" is a growing movement to extend the research value of digital collections beyond traditional use and to give researchers more flexible access to our collections by facilitating access to the underlying data, thereby enabling digital humanities research.1 collections as data is predicated upon the convergence of two scholarly trends happening in parallel over the past several decades.2 first, as mentioned above, librarians and archivists have digitized a significant portion of their special collections, giving access to unique material that researchers previously had to travel across the country or globe to study. at the same time, an increasing number of humanist scholars have approached their research in new ways, employing computational methods such as text mining, topic modeling, gis (geographic information system), sentiment analysis, network graphs, data visualization, and virtual/augmented reality in their quest for meaning and understanding.
gaining access to high-quality data is a key challenge of digital humanities work, since the objects of study in the humanities are frequently not as amenable to computational methods as data in the sciences and social sciences.3 typically, data in the sciences and social sciences is numerical in nature and collected in spreadsheets and databases with the intention that it will be computationally parsed, ideally as part of a reproducible and objective study. conversely, data (or, more commonly, "evidence" or "research assets") in the humanities is text- or image-based and is created and collected with the intention of close reading or analysis by a researcher who brings their subjective expertise to bear on the object.4 even a relatively simple digital humanities method like identifying word frequency in a corpus of literature is predicated on access to plain text (.txt) files, high-quality optical character recognition (ocr), and the ability to bulk download the files without running afoul of copyright or technical barriers. as "the santa barbara statement on collections as data" articulates, "with notable exceptions like the hathitrust research center, the national library of the netherlands data services & api's, the library of congress' chronicling america, and the british library, cultural heritage institutions have rarely built digital collections or designed access with the aim to support computational use."5 by and large, digital humanists have not been well-served by library platforms or protocols. current methods for accessing collections data include contacting the library for direct access to the data or "scraping" data off library websites. recently funded efforts such as the institute of museum and library services' (imls's) always already computational and the andrew w. mellon foundation's collections as data: part to whole seek to address this problem by setting standards and best practices for turning digital collections into datasets amenable to computational use and novel research methods.6 the university of utah j. willard marriott library has a long-running digital library program and a burgeoning digital scholarship center, creating a moment of synergy for librarians in digital collections and digital scholarship to explore collaboration in teaching, outreach, and digital collection development. a shared goal between the digital library and digital scholarship teams is to develop collections as data of regional interest that could be used by researchers for visualization and computational exploration. this article will share our local approach to developing and piloting a collections as data strategy at our institution.
relying upon best practices and principles from thomas padilla's "on a collections as data imperative," we transformed five library collections into datasets, made select data available through a public github repository, and tested the usability of the data with our own research questions, relying upon expertise and infrastructure from digital matters and the digital library at the marriott library.7 digital matters in 2015, administration at the marriott library was approached by multiple colleges at the university of utah to explore the possibility of creating a collaborative space to enable digital scholarship. while digital scholarship was happening across campus in disparate and unfocused ways, there was no concerted effort to share resources, build community, or develop a multi-college digital scholarship center with a mission and identity. after an eighteen-month planning process, the digital matters pop-up space was launched as a four-college partnership between the college of humanities, college of fine arts, college of architecture + planning, and the marriott library. an anonymous $1 million donation in 2017 allowed the partner colleges to fund staffing and activity in the space for five years, including the hire of a digital matters director tasked with planning for long-term sustainability. the development of digital matters brings new focus, infrastructure, and partners for digital humanities research to the university of utah and the marriott library. monthly workshops, speakers, and reading groups led by digital scholars from all four partner colleges have created a vibrant community with cross-disciplinary partnerships and unexpected synergies. close partnerships and ongoing dialogue have increased awareness for marriott faculty, particularly those working in and collaborating with digital matters, of the challenges facing digital humanists and the ways in which the library community is uniquely suited to meet those needs. for example, a university of utah researcher in the college of humanities developed "century of black mormons," a community-based public history database of biographical information and primary source documents on black mormons baptized between 1830 and 1930.8 working closely with the digital initiatives librarian and various staff and faculty at the marriott library, they created an omeka s site that allows users to interact with the historical data using gis, timeline features, and basic webpage functionality. institution digital library the university of utah has had a robust digital library program since 2000, including one of the first digital newspaper repositories, utah digital newspapers (udn, https://digitalnewspapers.org/). in 2016, the library developed its own digital asset management system using open-source systems such as solr, phalcon, and nginx after using contentdm for over fifteen years.9 this new system, solphal, has made it possible for us to implement a custom solution to manage and display a vast amount of digital content, not only for our library, but also for many partner institutions throughout the state of utah. our main digital library server (https://collections.lib.utah.edu/) contains over 765,000 objects in nearly 700 collections, consisting of over 2.5 million files. solphal is also used to manage the udn, containing nearly 4 million newspaper pages and over 20 million articles.
digital library projects are continually evolving as we redefine our digital collection development policies, ensuring that we are providing researchers and other users the digital content that they are seeking. with such a large amount of data available in the digital library, we can no longer view our digital library as a set of unique, yet siloed, collections, but more as a wealth of information documenting the history of the university, the state of utah, and the american west. we are also engaged in remediating legacy metadata across the repository in order to achieve greater standardization, which could support computational usage of digital library metadata in the future. with this in mind, we are working to strategically make new digital content available on a large scale that can help researchers discover this historical content within a collections as data mindset. leveraging the existing digital library and digital matters programs, faculty at the marriott library are in the process of piloting a collections as data strategy. we selected digital collections with varying characteristics and used them to explore small- and large-scale approaches to developing datasets for humanities researchers. we then tested the datasets by employing various digital humanities methods such as text mining, topic modeling, and gis. the five case studies below chronicle our efforts to embrace a collections as data framework and extend the research value of our digital collections. text mining mining texts when developing the initial collections as data project, several factors were considered to identify the optimal material for this experiment. selecting already digitized and described material in the university of utah digital library was ideal to avoid waiting periods required for new digitization projects. the marriott library special collections' relationship with the american west center, an organization based at the university of utah with the mission of documenting the history of the american west, has produced an extensive collection of oral histories held in the audio visual archive which have typewritten transcripts yielding high-quality ocr. given the availability and readiness of this material, we built a selected corpus of mining-related oral histories, drawn from collections such as the uranium oral histories and carbon county oral histories. engaging in the entire process with a digital humanities framework, we scraped our own digital library repository as though we had no special access to the back end of the system, developing a greater understanding of the process and workflows needed to build a text corpus to support a research inquiry. in this way, we extended our skills so that we would be able to scrape any digital library system if this expertise was needed in the future. the extensive amount of text produced by the corpus of 230 combined oral histories provided ideal material for topic modeling. simply put, "topic modeling is an automatic way to examine the contents of a corpus of documents."10 the output of these models is word clouds with varying sizes of words based on the number of co-occurrences within the corpus; larger words indicate more occurrences and smaller ones indicate fewer. each topic model then points to the most relevant documents within the corpus based on the co-occurrences of the words contained in that model.
in order to create these topic models from the corpus of oral histories, a workflow was developed with the expertise of the digital matters cohort, implementing mallet via an r script and using the lda topic model developed by blei et al.11 figure 1. topic model from text mining the mining-related oral histories found in the university of utah's digital library. from the mining-related oral history corpus, twenty-six topic models were created. once generated, each topic model points to five interviews that are most related to the words in a particular model. in figure 1, the words carbon, county, country, and italian are the largest, because the interviews are about carbon county, utah. considering this geographical area of utah was the most ethnically diverse in the late 1800s due to the coal mining industry recruiting labor from abroad, including italy, these words are not surprising. as indicated by their prominence in the topic model, these words co-occur most often in the interview set. we approached the process of topic modeling the oral histories as an exploration, but after reviewing the results, we discovered that many of the words which surfaced through this process pointed to deficiencies in the original descriptive metadata, highlighting new possibilities for access points and metadata remediation. homing in on the midsize words tended to uncover unique material that is not covered in descriptive metadata, as these words are often mentioned more than a handful of times and across multiple interviews. the largest words in the model are typically thematic to the interview and included in the descriptive metadata. for example, when investigating the inclusion of "wine" in the topic model found in figure 1, conversations about the winemaking process amongst the italian mining community in carbon county, utah, were revealed. in an interview conducted in 1973 for the carbon county oral history project, mary nicolovo juliana discusses how her father, a miner, made wine at home.12 as the topic models are based on co-occurrences in the corpus, there was another interview, with emile louise cances, conducted in 1973 for the carbon county oral history project. cances, from a french immigrant mining family, discusses the vineyards her family had in france.13 with both of these oral histories, there was no reference to wine in the descriptive metadata. a researcher may miss this content because it isn't included as an access point in metadata. thus, topic modeling allowed for the discoverability of potentially valuable topics that may be buried in hundreds of pages of content. from this collections as data project, text mining the mining oral history texts to produce topic models, we are considering employing topic modeling when creating new descriptive metadata for similar collections. setting a precedent, the text files for this project are hosted on the growing marriott library collections as data github repository.
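a researcher who downloads those text files can reproduce a comparable topic modeling pass in python; the sketch below is illustrative only, using scikit-learn's lda implementation rather than the mallet-and-r workflow the team actually used, and the folder name and preprocessing parameters (everything except the twenty-six topics reported above) are assumptions made for the example.

```python
# illustrative only: the project's topic models were built with mallet driven from r.
from pathlib import Path

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# read every plain-text oral history transcript from a hypothetical local folder
docs = [p.read_text(encoding="utf-8", errors="ignore")
        for p in sorted(Path("oral_histories").glob("*.txt"))]

# build a document-term matrix, dropping very common and very rare words
vectorizer = CountVectorizer(stop_words="english", max_df=0.9, min_df=2)
dtm = vectorizer.fit_transform(docs)

# fit 26 topics to mirror the number of topic models reported for this corpus
lda = LatentDirichletAllocation(n_components=26, random_state=0)
lda.fit(dtm)

# print the ten highest-weighted terms for each topic
terms = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[-10:][::-1]]
    print(f"topic {topic_id}: {', '.join(top_terms)}")
```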
after we developed this corpus, we discovered that a graduate student in the history department had developed a similar project, demonstrating the research value of oral histories combined with computational analysis.14 harold stanley sanders matchbooks collection when assessing potential descriptive metadata for the harold stanley sanders matchbooks collection, an assortment of matchbooks reflecting many bygone establishments, predominately from salt lake city, including restaurants, bars, hotels, and other businesses, non-dublin core metadata was essential for computational purposes. with the digital project workflow now extending beyond publishing the collection in the digital library to publishing the collection data to the marriott library collections as data github repository, assessing metadata needs has evolved. as matchbooks function as small advertisements, they often incorporate a mix of graphic design, advertising slogans, and addresses of the establishment. the descriptive metadata was created first with the most relevant fields for computational analysis, including business name, type of business, transcription of text, notable graphics, colors of matchbooks, and street addresses. for collection mapping capabilities, street addresses were then geocoded using a google sheets add-on called geocode cells, which uses google's geocoding api (see figure 2). figure 2. a screenshot of google sheets add-on, geocode cells (https://chrome.google.com/webstore/detail/geocode-cells/pkocmaboheckpkcbnnlghnfccjjikmfc). figure 3. a screenshot of harold stanley sanders matchbook collection map, made with arcgis online. this proved efficient for this collection, as other geocoding services required zip codes, which were not present in the matchbooks' street addresses. with the latitude and longitude addition to the metadata, the collection was then mapped using arcgis online (see figure 3).15 the extensive metadata, including geographic-coordinate data, is available on the library's github repository for public use. after the more computationally ready metadata was created, it was then massaged to fit library best practices and dublin core (dc) standards. this included deriving library of congress subject headings for dc subjects from business type and concatenating notable matchbook graphics and slogans for the dc description. while providing the extensive metadata is beneficial for computational experimentation, it adds time and labor to the lifespan of the project. kennecott copper miner records one aspect of our collections as data work at the university of utah moving forward is the need for long-term planning for resources that contain interesting information that could eventually be used for computational exploration, even if we don't currently have the capacity to make the envisioned dataset available. the marriott library holds a variety of personnel records from the kennecott copper corporation, utah copper division.
these handwritten index cards contain a variety of interesting demographic data about the workers who were employed by the company from 1900 to 1919, such as name, employee id, date employed, address, dependents, age, weight, height, eyes, hair, gender, nationality, engaged by, last employer, education, occupation, department, pay rate, date leaving employment, and reason for leaving. not all the cards are filled out with the complete level of detail listed above; however, name, date employed, ethnicity, and notes about pay rates are usually included for each employee. developing a scanning and digitization procedure for creating digital surrogates of almost 40,000 employment records was fairly easy due to an existing partnership and reciprocal agreement with familysearch; however, developing a structure for making the digitized records available and providing full transcription is a long-term project. librarians used this project as an opportunity to think strategically about the limits of dublin core when developing a collections as data project from the start. the digital library repository at the university of utah provides the ability to export collection-level metadata as .tsv files. with this in mind, the collection metadata template was created with the aim of eventually being able to provide researchers with the granular information on the records. this required introducing a number of new, non-standard field labels to our repository. since we are not able to anticipate exactly how a researcher might interact with this collection in the future, our main priority was developing a metadata template that would accommodate full transcription for every data point on the card. twenty new fields in the template reflect the demographic data on the card, and ten are existing fields that map to our standard practices with dublin core fields. because we do not currently have the staffing in place to transcribe 40,000 records, we are implementing a phased approach of transcribing four basic fields, with fuller transcription to follow if we are able to secure additional funding. figure 4. employment card for alli ebrahim, 1916. figure 5. employment card for richard almond, 1917. woman's exponent a stated goal for digital matters is to be a digital humanities space that is unique to utah and addresses issues of local significance such as public lands, water rights, air quality, indigenous peoples, and mormon history.16 when considering what digital scholarship projects to pursue in 2019, digital matters faculty became aware of the upcoming 150th anniversary of women in utah being the first in the nation to vote. working with a local nonprofit, better days 2020, and colleagues at brigham young university (byu), digital matters faculty and staff decided to embark on a multimodal analysis of the 6,800-page run of the woman's exponent, a utah women's newspaper published between 1872 and 1914 primarily under the leadership of latter-day saint relief society president emmeline b. wells. in its time, the woman's exponent was a passionate voice for women's suffrage, education, and plural marriage, and chronicled the interests and daily lives of latter-day saint women. initially, we hoped to access the data through the brigham young university harold b. lee library, which digitized the exponent back in 2000.
we quickly learned that ocr from nearly twenty years ago would not suffice for digital humanities research and considered different paths for rescanning the exponent. after accessing the original microfilm from byu, we leveraged existing structures for digitization. through an agreement that the marriott library has in place with a vendor for completing large-scale digitization of newspapers on microfilm for inclusion in the utah digital newspapers program, we were able to add the woman's exponent to the existing project without securing a new contract for digitization. the vendor digitized the microfilm, created an index of each title, issue, date, and page, and extracted the full text through an ocr process. they then delivered 330 gb of data to us, including high-quality tiff and jp2000 images, a pdf file for each page, and mets-alto xml files containing the metadata and ocr text. acquiring data for the woman's exponent project illuminated the challenges that digital humanists face when looking for clean data. our original assumption was that if something had already been scanned and put online, the data must exist somewhere. we soon learned, when working with legacy digital scans, that the ocr might be insufficient or the original high-quality scans might be lost over the course of multiple system migrations. as librarians with existing structures in place for digitization, we had the content rescanned and delivered within a month. our digital humanities partners from outside of the library did not know this option was available and assumed our research team would have to scan 6,800 pages of newspaper content before we were able to start analyzing the data. this incongruity highlighted cultural differences between digital humanists with their learned self-reliance and librarians who are more comfortable and conversant looking to outside resources. indeed, our digital humanities colleagues seemed to believe that "doing it yourself" was part and parcel of digital humanities work. the woman's exponent project is still in early phases, but now that we have secured the data, we are considering what digital humanities methods we can bring to bear on the corpus. with the 2020 150th anniversary of women's suffrage in utah, we have considered a topic modeling project looking at themes around universal voting, slavery, and polygamy and tracking how the discussion around those topics evolved over the 42-year run of the paper. another potential project is building a social network graph of the women and men chronicled throughout the run of the paper. developing curriculum around women in utah history is of particular interest to the group as women are underrepresented in the current k-12 utah history curriculum. keeping in line with our commitment to collections as data, we have released the woman's exponent as a .tsv file with ocr full-text data, which can be analyzed by researchers studying utah, mormon studies, the american west, or various other topics. collaborators have also developed a digital exhibit on the woman's exponent which includes essays about a variety of topics as well as sections showcasing its potential for digital scholarship.17 obituary data the utah digital newspapers (udn) program began in 2002 with the goal of making historical newspaper content from the state of utah freely available to the public for research purposes.
between 2002 and 2019, there have been over 4 million newspaper pages digitized for udn. due to search limitations of the software system used for udn at the time, the data model for newspapers was made more granular, and included segmentation for articles, obituaries, advertisements, birth notices, etc. this article segmentation project ended in 2016 when it was determined that the high cost of segmentation was not sustainable with mass digitization of newspapers and users were still able to find the content they were looking for on a full newspaper page. before the article segmentation project concluded, udn had accrued over 20 million articles, including 318,044 articles that were tagged as obituaries or death notices. in 2013, the marriott library partnered with familysearch to index the genealogical information that can be gleaned from these obituaries. the familysearch indexing (fsi) program crowdsourced the indexing of this data to thousands of volunteers worldwide. certain pieces of data, such as place names, were mapped to an existing controlled vocabulary and dates were entered in a standardized format to ensure that certain pieces of the data are machine actionable.18 after the obituaries were indexed by fsi in 2014, a copy of the data was given to the marriott library to use in udn. the indexed data included fields such as name of deceased, date of death, place of death, date of birth, birthplace, and relative names with relationships. since this massive amount of data didn't easily fit within the udn metadata schema, it was stored for several years without the marriott library doing anything with the data. now that we are thinking about our digital collections as data, we are exploring ways that researchers could use this vast amount of data. the data was delivered to the library in large spreadsheets that are not easily usable in any spreadsheet software. we are exploring ingesting the data into a revised newspaper metadata schema within our digital asset management system or converting the data into a mysql database so it is possible to search and find relationships between pieces of data. working with a large dataset such as this can be challenging. the data from only two newspapers, including 1,038 obituaries, is a 25 mb file. the full database is over 10 gb of data. since this is a large amount of data, we are working through issues related to how we can distribute it in a usable way so that researchers can make use of it. we are also looking at the possibility of having fsi index additional obituary data from udn, which will make the database continually expand. conclusion as the digital library community recognizes the need for computational-ready collections, the university of utah digital library has embraced this evolution with a strategic investment. implementing the collections as data github repository for computational users is a first step towards providing access to collections beyond the traditional digital library environment. while there may be improved ways to access this digital library data in the future, the github repository filled an immediate need. developing standardized metadata for computational use can often require more time from metadata librarians who are already busy with the regular work of describing new assets for the digital library. developing additional workflows for metadata enhancement and bulk download can delay the process of making new collections available.
in most cases, collections need to be evaluated individually to determine what type of resources can be invested in making them available for computational use. for a project needing additional transcription, like the kennecott mining records, crowdsourcing might seem like a potential avenue to pursue. however, the digital library collection managers have misgivings about the training and quality assurance involved in developing a new large-scale transcription project. combined with the desire to ensure that the people who are working on the project have adequate training and compensation for their labor, we are making the strategic decision to transcribe some of the initial access points to the collection now, and attempt full transcription at a later date pending additional funding. for the udn obituary data, leveraging an existing transcription program at no cost, with minimal supervision needed by librarians, worked well in being able to surface additional genealogical data that can be released for researchers. the collections as data challenge mirrors a perennial digital library conundrum—how much time and effort should librarians invest for unknown future users with unknown future needs? much like digitization and metadata creation, creating collections as data requires a level of educated guesswork as to what collections digital humanists will want to access, what metadata fields they will be interested in manipulating, and in what formats they will need their data. considering the limited resources of librarians, should we convert our digital collections into data in anticipation of use or convert our collections on demand? this "just in case" vs. "just in time" question is worthy of debate and will naturally be dependent on the resources and priorities of individual institutions. with an increasing number of researchers experimenting with digital humanities methods, collections as data will be a standard consideration when working with new digitization projects at the university of utah. visualization possibilities outside of the digital-library environment will be regularly assessed. descriptive metadata practices beyond dublin core will be developed when beneficial to the computational and experimental use of the data by the public. integrating techniques like topic modeling into descriptive metadata workflows provides additional insight about the digital objects being described. while adding collections as data to existing digitization workflows will require an additional investment of time, developing these projects has also created new opportunities for collaboration both within the library and in developing expanded partnerships at the university of utah and other institutions in the mountain west. by leveraging our existing partnerships, we were able to create collections as data pilots organically by taking advantage of our current workflows and digitization procedures. while we have been successful in releasing smaller-scale collections as data projects, we still need to consider integration issues with our larger digital library program and experiment more with enabling access to large datasets. by producing curated datasets that evolve from unique special collections materials, librarians can extend the research value of the digital library and the collections that are unique to each institution.
as we look towards the future, we see this work continuing and expanding as librarians engage more with digital humanities teaching and support. acknowledgements the authors would like to acknowledge dr. elizabeth callaway, former digital matters postdoctoral fellow and current assistant professor in the department of english at the university of utah, for developing the topic modeling workflow used in the collections as data project, text mining mining texts. callaway's expertise was invaluable in creating the scripts to enable distance reading of the text corpus, documenting this process, and training library staff. references 1 thomas g. padilla, "collections as data: implications for enclosure," college & research libraries news 79, no. 6 (june 2018): 296, https://crln.acrl.org/index.php/crlnews/article/view/17003/18751. 2 thomas padilla et al., "the santa barbara statement on collections as data (v1)," n.d., https://collectionsasdata.github.io/statementv1/. 3 christine l. borgman, "data scholarship in the humanities," in big data, little data, no data: scholarship in the networked world (cambridge, ma: the mit press, 2015), 161–201. 4 miriam posner, "humanities data: a necessary contradiction," miriam posner's blog (blog), june 25, 2015, http://miriamposner.com/blog/humanities-data-a-necessary-contradiction/. 5 thomas padilla et al., "the santa barbara statement on collections as data (v1)," n.d., https://collectionsasdata.github.io/statementv1/. 6 thomas padilla, "always already computational," always already computational: collections as data, 2018, https://collectionsasdata.github.io/; thomas padilla, "part to whole," collections as data: part to whole, 2019, https://collectionsasdata.github.io/part2whole/. 7 "marriott library collections as data github repository," april 16, 2019, https://github.com/marriott-library/collections-as-data. 8 "century of black mormons," accessed april 25, 2019, http://centuryofblackmormons.org. 9 anna neatrour et al., "a clean sweep: the tools and processes of a successful metadata migration," journal of web librarianship 11, no. 3-4 (october 2, 2017): 194-208, https://doi.org/10.1080/19322909.2017.1360167. 10 anna l. neatrour, elizabeth callaway, and rebekah cummings, "kindles, card catalogs, and the future of libraries: a collaborative digital humanities project," digital library perspectives 34, no. 3 (july 2018): 162–87, https://doi.org/10.1108/dlp-02-2018-0004. 11 david m. blei et al., "latent dirichlet allocation," journal of machine learning research 3, no. 4/5 (may 15, 2003): 993–1022, http://search.ebscohost.com/login.aspx?direct=true&db=asn&an=12323372&site=ehost-live. 12 "mary nicolovo juliana, carbon county, utah, carbon county oral history project, no.
47, march 30, 1973," carbon county oral histories, accessed april 29, 2019, https://collections.lib.utah.edu/details?id=783960. 13 "mrs. emile louise cances, salt lake city, utah, carbon county oral history project, no. cc-25, february 24, 1973," carbon county oral histories, accessed april 29, 2019, https://collections.lib.utah.edu/details?id=783899. 14 nate housley, "a distance reading of immigration in carbon county," utah division of state history blog, 2019, https://history.utah.gov/a-distance-reading-of-immigration-in-carbon-county/. 15 "harold stanley sanders matchbooks collection," accessed may 8, 2019, https://collections.lib.utah.edu/search?facet_setname_s=uum_hssm; "harold stanley sanders matchbooks collection map," accessed may 8, 2019, https://mlibgisservices.maps.arcgis.com/apps/webappviewer/index.html?id=d16a5bc93b864fc0b9530af8e48c6c6f. 16 rebekah cummings, david roh, and elizabeth callaway, "organic and locally sourced: growing a digital humanities lab with an eye towards sustainability," digital humanities quarterly, 2019. 17 "woman's exponent data," https://github.com/marriott-library/collections-as-data/tree/master/womansexponent; "woman's exponent digital exhibit," https://exhibits.lib.utah.edu/s/womanexponent/. 18 john herbert et al., "getting the crowd into obituaries: how a unique partnership combined the world's largest obituary with utah's largest historic newspaper database," in salt lake city, ut: international federation of library associations and institutions, 2014, https://www.ifla.org/files/assets/newspapers/slc/2014_ifla_slc_herbert_mynti_alexander_witkowski_-_getting_the_crowd_into_obituaries.pdf. letter from the editor december 2021 kenneth j. varnum information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.14291 another year is nearly in the books. it has not been an easy year for many of us; perhaps not as truly chaotic and frightening as 2020 was, but still a year filled with uncertainty and rising concerns about the path of the pandemic.
as we turn to a new year in our calendars, i wish all our readers health, peace, and a continued spirit of adaptation as we begin 2022. our public libraries leading the way column, “how covid affected our python class at the worcester public library” by melody friendenthal, is a follow up to her 2020 column on moving a library course on the python programming language from in-person to online for the worcester (ma) public library. our peer-reviewed content this month showcases topics including: digital library innovations; virtual reality; diversity, equity, and inclusion (dei); and library hackathons. 1. stateful library analysis and migration system (slam): etl system for performing digital library migrations / adrian-tudor panescu, teodora-elena grosu, and vasile manta 2. black, white, and grey: the wicked problem of virtual reality in libraries / gillian d. ellern and laura cruz / 3. bridging the gap: using linked data to improve discoverability and diversity in digital collections / jason boczar, bonita pollock, xiying mi, and amanda yeslibas 4. developing a minimalist multilingual full-text digital library solution for disconnected remote library partners / todd digby 5. diversity, equity & inclusion statements on academic library websites: an analysis of content, communication, and messaging / eric ely 6. a 21st century technical infrastructure for digital preservation / nathan tallman 7. hackathons and libraries: the evolving landscape 2014-2020 / meris mandernach longmeier kenneth j. varnum, editor varnum@umich.edu december 2021 articles no need to ask: creating permissionless blockchains of metadata records dejah rubel information technology and libraries | june 2019 1 dejah rubel (rubeld@ferris.edu) is metadata and electronic resource management librarian, ferris state university. abstract this article will describe how permissionless metadata blockchains could be created to overcome two significant limitations in current cataloging practices: centralization and a lack of traceability. the process would start by creating public and private keys, which could be managed using digital wallet software. after creating a genesis block, nodes would submit either a new record or modifications to a single record for validation. validation would rely on a federated byzantine agreement consensus algorithm because it offers the most flexibility for institutions to select authoritative peers. only the top tier nodes would be required to store a copy of the entire blockchain thereby allowing other institutions to decide whether they prefer to use the abridged version or the full version. introduction several libraries and library vendors are investigating how blockchain could improve activities such as scholarly publishing, content dissemination, and copyright enforcement. a few organizations, such as katalysis, are creating prototypes or alpha versions of blockchain platforms and products.1 although there has been some discussion about using blockchains for metadata creation and management, only one company appears to be designing such a product. therefore, this article will describe how permissionless blockchains of metadata records could be created, managed, and stored to overcome current challenges with metadata creation and management. limitations of current practices metadata standards, processes, and systems are changing to meet twenty-first century information needs and expectations. 
there are two significant limitations, however, to our current metadata creation and modification practices that have not been addressed: centralization and traceability. although there are other sources for metadata records, including the open library project, the largest and most comprehensive database with over 423 million records is provided by the online computer library center (oclc).2 oclc has developed into a highly centralized operation that requires member fees to maintain its infrastructure. oclc also restricts some members from editing records contributed by other members. one example of these restrictions is the program for cooperative cataloging (pcc). although there is no membership fee for pcc, catalogers from participating libraries must receive additional training to ensure that their institution contributes high quality records.3 requiring such training, however, limits opportunities for participation and can create bottlenecks when non-pcc institutions identify errors in a pcc record. decentralization no need to ask | rubel 2 https://doi.org/10.6017/ital.v38i2.10822 would help smaller, less-well-funded institutions overcome such barriers to creating and contributing their records and modifications to a central database. the other significant limitation to our current cataloging practices is the lack of traceability for metadata changes. oclc tracks record creation and changes by adding an institution’s oclc symbol to the 040 marc field.4 however, this symbol only indicates which institution created or edited the record, not what specific changes they made. oclc also records a creation date and a replacement date in each record, but a record may acquire multiple edits between those two dates. recording the details of each change within a record would help future metadata editors to understand who made certain changes and possibly why they were made. capturing these details would also mitigate concerns about the potential for metadata deletion because every datum would still be recorded even if it is no longer part of the active record. information science blockchain research many researchers and institutions are exploring blockchain for information science applications. most of these applications can be categorized as either scholarly publishing, content dissemination and management, or metadata creation and management. one of the most promising applications for blockchain is coordinating, endorsing, and incentivizing research and scholarly publishing activities. in “blockchain for research,” rossum from digital science describes benefits such as data colocation, community self-correction, failure analysis, and fraud prevention.5 research activity support and endorsement would use an academic endorsement points (aep) currency to support work at any level, such as blog posts, data sets, peer reviews, etc. the amount credited to each scientist is based on the aep received for their previous work. therefore, highly endorsed researchers will have a greater impact on the community. one benefit of this system is that such endorsements would accrue faster than traditional citation metrics.6 one detriment to this system is its reliance on the opinions of more experienced scientists. the current peer review process assumes these experts would be the best to evaluate new research because they have the most knowledge. breakthroughs often overturn the status quo, however, and consequently may be overlooked in an echo chamber of approved theories and approaches. 
micropayments using aep could “also introduce a monetary reward scheme to researchers themselves,” bypassing traditional publishers.7 unfortunately, such rewards could become incentives to propagate unscientific or immoral research on topics like eugenics. in addition, research rewards might increase the influence of private parties or corporations to science and society’s detriment. blockchains might also reduce financial waste by “incentivizing research collaboration while discouraging solitary and siloed research.”8 smart contracts could also be enabled that automatically publish any article, fund research, or distribute micropayments based on the amount of endorsement points.9 to support these goals, digital science is working with katalysis on the blockchain for peer review project. it is hard to tell exactly where they are in development, but as of this writing, it is probably between the pilot phase and the minimum viable product.10 the decentralized research platform (deip) serves as another attempt “to create an ecosystem for research and scientific activities where the value of each research…will be assessed by an experts’ community.”11 the whitepaper authors note that the lack of negative findings and unmediated or open access to information technology and libraries | june 2019 3 research results and data often leads to scientists replicating the same research.12 they also state that 80 percent of publishers’ proceeds are from university libraries, which spend up to 65 percent of their entire budget on journal and database subscriptions.13 this financial waste is surprising because universities are the primary source of published research. therefore, deip’s goals include research and resource distribution, expertise recognition, transparent grant processes, skill or knowledge tracking, preventing piracy, and ensuring publication regardless of the results.14 the second most propitious application of blockchain to information science is content dissemination and management.15 blockchain is an excellent way to track copyright. several blockchains have already been developed for photographers, artists, and musicians. examples include photochain, copytrack, binded, and dotbc.16 micropayments for content supports the implementation of different access models, which can provide an alternative to subscriptionbased models.17 micropayments can also provide an affordable infrastructure for many content types and royalty payment structures. blockchain could also authenticate primary sources and trace their provenance over time. this authentication would not only support archives, museums, and special collections, but it would also ensure law libraries can identify the most recent version of a law.18 finally, blockchain could protect digital first sale rights, which are key to libraries being able to share such content.19 “while drm of any sort is not desirable, if by using blockchain-driven drm we trade for the ability to have recognized digital first sale rights, it may be a worthy bargain for libraries.”20 to support such restrictions, another use for blockchain developed by companies such as libchain is open, verifiable, and anonymous access management to library content.21 another suitable application for blockchain is metadata creation and management.22 an open metadata archive, information ledger, or knowledgebase is very appealing because access to high quality records often requires a subscription to oclc.23 some libraries cannot afford such subscriptions. 
therefore, they must rely on records supplied by either a vendor or a government agency, like the library of congress. unfortunately, as of this writing, there is little research on how these blockchains could be constructed at the scale of large databases like those of oclc and the library of congress. in fact, the only such project is demco’s private, invitation-only beta.24 demco does not provide any information regarding their new product, but to make its development profitable, it is most likely a private, permissioned blockchain. creating permissionless blockchains for metadata records this section will describe how to create permissionless blockchains for metadata records including grouping transactions, an appropriate consensus algorithm, and storage options. please note that these blockchains are intended to augment current metadata record creation and modification practices and standards, not supersede them. the author assumes that record creation and modification will still require content (rda) and encoding (marc) validation prior to blockchain submission. validation in this section will refer solely to blockchain validation. generating and managing public and private keys all distributed ledger participants will need a public key or address for blocks of transactions to be sent to them and a private key for digital signatures. one way to create these key pairs is to generate a seed, which can be a group of random words or passphrases. the sha-256 algorithm can then be applied to this seed to create a private key.25 next, a public key can be generated from that private key using an elliptic curve digital signature algorithm.26 for additional security, the no need to ask | rubel 4 https://doi.org/10.6017/ital.v38i2.10822 public key can be hashed again using a different cryptographic hash function, such as ripemd160, or multiple hash functions, like bitcoin does to create its addresses.27 these key pairs could be managed with digital wallet software. “a bitcoin wallet is an organized collection of addresses and their corresponding private keys.”28 larger institutions, such as the library of congress, could have multiple key pairs with each pair designated for the appropriate cataloging department based on genre, form, etc. creating a genesis block every blockchain must start with a “genesis block.”29 for example, a personal name authority blockchain might start with william shakespeare’s record. a descriptive bibliographic blockchain might start with the king james bible. this genesis block includes a block header, a recipient’s public key or address, a transaction count, and a transaction list.30 being the first block, the block header will not contain a hash of the previous block header. it will contain, however, a hash of all of the transactions within that block to verify that the transactions list has not been altered. the block header will also include a timestamp and possibly a difficulty level and nonce.31 then the block header is hashed using the sha-256 algorithm and encrypted with the creator’s private key to produce a digital signature. 
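before turning to how this signature is attached to the block, the key-generation step described above can be made concrete with a short sketch. the following python example is an illustration only, not the author's implementation: the third-party ecdsa package, the secp256k1 curve, and the ripemd-160 address step are assumptions borrowed from bitcoin-style tooling.

import hashlib
import ecdsa  # third-party package; an assumption for this sketch

def make_key_pair(seed_phrase):
    # sha-256 of the seed phrase becomes the 32-byte private key
    private_key = hashlib.sha256(seed_phrase.encode("utf-8")).digest()
    # derive the public key on the secp256k1 curve (bitcoin's curve; assumed here)
    signing_key = ecdsa.SigningKey.from_string(private_key, curve=ecdsa.SECP256k1)
    public_key = signing_key.get_verifying_key().to_string()
    # hash the public key again (sha-256, then ripemd-160 where openssl provides it)
    # to produce a shorter address that other nodes can send blocks to
    address = hashlib.new("ripemd160", hashlib.sha256(public_key).digest()).hexdigest()
    return signing_key, public_key, address

# a cataloging unit could hold several such key pairs in a simple wallet structure
signing_key, public_key, address = make_key_pair("example seed phrase for a name authority unit")
wallet = {address: signing_key}

in practice, a digital wallet application would generate and protect many such key pairs, for example one per cataloging department, rather than holding a single key in memory.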
this digital signature will be appended to the end of the block so validators can verify that the creator made the block by using their (the creator’s) public key.32 finally, the recipient’s public key or address, the transaction count, and transaction list are appended to the block header.33 block header • hash of previous block header • hash of all transactions in that block • timestamp • difficulty level (if applicable) • nonce (if applicable) block • recipient public key or address • transaction count • transaction list • digital signature in her master of information security and intelligence thesis at ferris state university, amber snow investigated the feasibility of using blockchain to add, edit, and validate changes to woodbridge n. ferris’ authority record.34 as shown in figure 1, she began by creating a hash function using the sha-256 algorithm to encrypt the previous hash, the timestamp, the block number, and the metadata record. “the returned encrypt value is significant because the returned data is the encrypted data that is being committed as [a] mined block transaction permanently to ledger.”35 the ledger block, however, “contains the editor’s name, the entire encrypted hash value, and the prior blocks [sic] hashed value.”36 information technology and libraries | june 2019 5 figure 1. creating a sha-256 hash. next, as shown in figures 2 and 3, she created a genesis block with a prior hashed value of zero by ingesting ferris’ authority record as “a single line file that contains the indicator signposts for cataloging the record.”37 figure 2. ingesting woodbridge n. ferris' authority record.38 figure 3. woodbridge n. ferris' authority record as a genesis block. note the previoushash value is zero. snow noted that “the understanding and interpretation of the marc authority record’s signposts is not inherently relevant for the blockchain data processing.”39 to keep the scope narrow, she also avoided using public and private key pairs to exchange records between nodes. “the ri [research institution] blockchain does not necessarily require two users to agree…instead the ri blockchain is looking to commit and track single user edits to the record.”40 creating and submitting new blocks for validation once a genesis block has been created and distributed, any node on the network can submit new blocks to the chain. for metadata records, new blocks should contain either new records or multiple modifications to the same record with each field being treated as a transaction. when a no need to ask | rubel 6 https://doi.org/10.6017/ital.v38i2.10822 second block is appended, the new block header will include the hash of the previous block header, a hash of all of the new transactions, a new timestamp, and possibly a new difficulty level and/or nonce. the block header will then be hashed using sha-256 and encrypted with the submitter’s private key to become a digital signature for that block. finally, another recipient’s public key or address, a new transaction count, and a new transaction list will be appended to the block header. additional blocks can then be securely appended to the chain ad infinitum without losing any of the transactional details. 
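a minimal sketch of the block structure and chaining just described follows; it is illustrative rather than prescriptive, and the field names, the json serialization, and the throwaway key are assumptions made for the example.

import hashlib
import json
import time
import ecdsa  # third-party package; an assumption for this sketch

# a throwaway key pair for the example (see the key-generation sketch above)
signing_key = ecdsa.SigningKey.generate(curve=ecdsa.SECP256k1)
recipient_address = "example-recipient-address"  # hypothetical; a real address would be a key hash

def build_block(previous_header_hash, transactions, recipient, key):
    # each transaction is one metadata field created or changed in a single record
    tx_blob = json.dumps(transactions, sort_keys=True).encode("utf-8")
    header = {
        "previous_header_hash": previous_header_hash,  # empty for a genesis block
        "transactions_hash": hashlib.sha256(tx_blob).hexdigest(),
        "timestamp": int(time.time()),
    }
    header_hash = hashlib.sha256(json.dumps(header, sort_keys=True).encode("utf-8")).hexdigest()
    return {
        "header": header,
        "header_hash": header_hash,
        "recipient": recipient,
        "transaction_count": len(transactions),
        "transactions": transactions,
        "signature": key.sign(header_hash.encode("utf-8")).hex(),
    }

# a genesis block followed by a second block holding one edit to the same (hypothetical) record
genesis = build_block("", [{"record": "auth-0001", "field": "100", "value": "initial heading"}],
                      recipient_address, signing_key)
edit = build_block(genesis["header_hash"],
                   [{"record": "auth-0001", "field": "670", "value": "source citation added"}],
                   recipient_address, signing_key)
assert edit["header"]["previous_header_hash"] == genesis["header_hash"]

because each changed field travels as its own transaction, every edit remains recorded in the ledger even after the active record moves on, which is the traceability benefit described earlier.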
if two validators approve the same block at the same time, then the fork where the next block is appended first becomes the valid chain while the other chain becomes orphaned.41 although snow’s method does not include exchanging records using public keys or addresses, she was able to change a record, add it to the blockchain, and successfully commit those edits using the proof of work consensus algorithm.42 as shown in figure 4, after creating and submitting a genesis block as “tester 1,” she added a modified version of woodbridge n. ferris’ record as “tester 2.” this version appended the string “testerchanged123” to woodbridge n. ferris’ authority record. then she validated or “mined” the second block to commit the changes. figure 4. submitting and validating an edited record. figure 5 shows that the second block is chained to the genesis block because the “previoushash” value of the second block matches the “hash” of the genesis block. this link is what commits the block to the ledger. the appended string in the second block is at the end of the “metadata” variable. information technology and libraries | june 2019 7 figure 5. the new authority record blockchain. a more sophisticated method to append a second block would require key pairs. as described previously, a block would include a recipient’s public key or address, which would route the new and modified records to large, known institutions like the library of congress. although every node on the network can see the records and all of the changes, large institutions with welltrained and authoritative catalogers may be the best repository for metadata records and could store a preservation or backup copy of the entire chain. they are also the most reliable for validating records for content accuracy and correct encoding. achieving algorithmic consensus once a block has been submitted for validation, the other nodes use a consensus algorithm to verify the validity of the block and its transactions. “consensus mechanisms are ways to guarantee a mutual agreement on a data point and the state…of all data.”43 the most well-known consensus algorithm is bitcoin’s proof of work, but the most suitable algorithm for permissionless metadata blockchains is a federated byzantine agreement. proof of work proof of work (pow) relies on a one-way cryptographic hash function to create a hash of the block header. this hash is easy to calculate, but it is very difficult to determine its components.44 to solve a block, nodes must compete to calculate the hash of the block header. to calculate the hash of a block header, a node must first separate it into its constituent components. the hash of the previous block header, the hash of all of the transactions in that block, the timestamp, and the difficulty target will always have the same inputs. the validator, however, changes the nonce or random value appended to the block header until the hash has been solved.45 in bitcoin this process is called “mining” because every new block creates new bitcoins as a reward for the node that solved the block.46 no need to ask | rubel 8 https://doi.org/10.6017/ital.v38i2.10822 bitcoin also includes a mechanism to ensure the average number of blocks solved per hour remains constant. this mechanism is the difficulty target. “to compensate for increasing hardware speed and varying interest in running nodes over time, the proof-of-work difficulty is determined by a moving average targeting an average number of blocks per hour. 
if they’re generated too fast, the difficulty increases.”47 adjusting the difficulty target within the block header keeps bitcoin stable because its block rate is not determined by its popularity.48 in sum, validators are trying to find a nonce that generates a hash of the block header that is less than the predetermined difficulty target. unfortunately, proof of work requires immense and ever-increasing computational power to solve blocks, which poses a sustainability and environmental challenge. bitcoin and other financial services may need to rely on proof of work because “the massive amounts of electricity required helps to secure the network. it disincentivizes hacking and tampering with transactions…”49 because an attacker would need to control over 51 percent of the entire network to convince the other nodes that a faulty ledger is correct.50 metadata blockchains would rely on public information and therefore would not need the same level of security as private financial, medical, or personally identifiable information. unlike bitcoin, metadata blockchains also would not need a difficulty target because fluctuations in block production rates would not affect a metadata block’s value the same way cryptocurrency inflation would. therefore, despite its incredible security, proof of work would be computationally excessive for metadata record blockchains. federated byzantine agreement byzantine agreements are “the most traditional way to reach consensus. […] a byzantine agreement is reached when a certain minimum number of nodes (known as a quorum) agrees that the solution presented is correct, thereby validating a block and allowing its inclusion on the blockchain.”51 byzantine fault-tolerant (bft) state machine replication protocols support consensus “despite participation by malicious (byzantine) nodes.”52 this support ensures consensus finality, which “mandates that a valid block…never be removed from the blockchain.”53 in contrast, proof of work does not satisfy consensus finality because there is still the potential for temporary forking even if there are no malicious nodes.54 the “absence of consensus finality directly impacts the consensus latency of pow blockchains as transactions need to be followed by several blocks to increase the probability that a transaction will not end up being pruned and removed from the blockchain.”55 this latency increases as block size increases, which may also increase the number of forks and possibility of attack.56 “with this in mind, limited performance is seemingly inherent to pow blockchains and not an artifact of a particular implementation.”57 bft protocols, however, can sustain tens of thousands of transactions at nearly network latency levels.58 a bft consensus algorithm is also superior to one based on proof of work because “users and smart contracts can have immediate confirmation of the final inclusion of a transaction into the blockchain.”59 bft consensus algorithms also decouple trust from resource ownership, allowing small organizations to oversee larger ones.60 to use bft, every node must know and agree on the exact list of participating peer nodes. ripple, a bft protocol, tries to ameliorate this problem by publishing an initial membership list and allowing members to edit that list after implementation. 
unfortunately, users are often reluctant to edit the membership list thereby placing most of the network’s power in the person or organization that maintains the list.61 information technology and libraries | june 2019 9 federated byzantine agreement (fba), however, does not require each node to agree upon and maintain the same membership list. “in fba, each participant knows of others it considers important. it waits for the vast majority of those others to agree on any transaction before considering the transaction settled.”62 theoretically, an attacker could join the network enough times to outnumber legitimate nodes, which is why quorums by majority would not work. instead, fba creates quorums using a decentralized method that relies on each node selecting its own quorum slices.63 “a quorum slice is the subset of a quorum convincing one particular node of agreement.”64 a node may have many slices, “any one of which is sufficient to convince it of a statement.”65 the system constructs quorums based on individual node decisions thereby generating consensus without every node being required to know about every other node in the system.66 one example of quorum slices that might be good for metadata blockchains is a tiered system as shown in figure 6. the top tier would be structured like a bft system where the nodes can tolerate a limited number of byzantine nodes at the same level. this level would include the core metadata authorities, such as the library of congress or pcc members. members of this tier would be able to validate any record. the second or middle tier nodes would depend on the top tier because, in this example, a middle tier node requires two top tier nodes to form a quorum slice. these middle tier nodes would be authoritative, known institutions, such as universities, that already rely on the core metadata authorities on the top tier to validate and distribute their records. finally, a third tier, such as smaller institutions, would, in this example, rely on at least two middle tier nodes for their quorum slice. figure 6. tiered quorum example. no need to ask | rubel 10 https://doi.org/10.6017/ital.v38i2.10822 using an fba protocol to validate a transaction requires each node to exchange two sets of messages. the first set of messages gathers validations and the second set of messages confirms those validations. “from each node’s perspective, the two rounds of messages divide agreement…into three phases: unknown, accepted, and confirmed.”67 the unknown status becomes an acceptance when the first validation succeeds. acceptance is not sufficient for a node to act on that validation, however, because acceptance may be stuck in an indeterminate state or blocked for other nodes.68 the accepting node may also be corrupted and validate a transaction the network quorum rejects. therefore, the confirmation validation “allows a node to vote for one statement and later accept a contradictory one.”69 figure 7. validation process of statement a for a single node v. fba would lessen concerns about sharing a permissionless blockchain, but it can “only guarantee safety when nodes choose adequate quorum slices.”70 after discovery, byzantine nodes should be excluded from quorum slices to prevent interference with validation. one example of such interference is tricking other nodes to validate a bad confirmation message. 
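the tiered quorum-slice arrangement described above can be illustrated with a small sketch. the node names and slice choices below are hypothetical, and the check shown covers only the basic quorum property (every member of a candidate set has at least one slice contained in the set), not the full federated voting protocol.

# each node lists one or more quorum slices: sets of peers (including itself)
# whose agreement convinces that node; names and choices here are hypothetical
slices = {
    "loc": [{"loc", "pcc_1", "pcc_2"}],                                              # top tier
    "pcc_1": [{"loc", "pcc_1", "pcc_2"}],
    "pcc_2": [{"loc", "pcc_1", "pcc_2"}],
    "university": [{"university", "loc", "pcc_1"}, {"university", "loc", "pcc_2"}],  # middle tier
    "small_library": [{"small_library", "university", "pcc_1"}],                     # third tier
}

def is_quorum(candidate, slices):
    # a set of nodes is a quorum when every member has at least one slice
    # that lies entirely inside the set
    return all(any(s <= candidate for s in slices[node]) for node in candidate)

print(is_quorum({"loc", "pcc_1", "pcc_2"}, slices))                                  # True: the top tier alone
print(is_quorum({"small_library", "university", "loc", "pcc_1", "pcc_2"}, slices))   # True: all three tiers together
print(is_quorum({"small_library", "university"}, slices))                            # False: slices reach outside the set

in this configuration the three top-tier nodes form a quorum on their own, while a smaller institution can only be part of a quorum that also convinces its chosen middle- and top-tier peers.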
"in such a situation, nodes must disavow past votes, which they can only do by rejoining the system under new node names."71 theoretically, this recovery process could be automated to include "having other nodes recognize reincarnated nodes and automatically update their slices."72 therefore, the key limitation to using an fba algorithm is continuity of participation. if too many nodes leave the network, reengineering consensus would require centralized coordination, whereas proof of work algorithms could operate after losing many nodes without substantial human intervention.73 storing the blockchain storing a large blockchain, such as bitcoin's, is a significant challenge. one method to facilitate that storage would be to rely on top tier nodes to retain a complete copy of the blockchain and allow smaller, lower tier nodes to retain an abridged version. in bitcoin, these methods are known as full payment verification (fpv) and simplified payment verification (spv). fpv requires a complete copy of the blockchain to "verify that bitcoins used in a transaction originated from a mined block by scanning backward, transaction by transaction, in the blockchain until their origin is found."74 unfortunately, as one might expect, fpv consumes many resources and can take a long time to initialize. for example, downloading bitcoin's blockchain can take several days. this long installation period is partly due to the size of the blockchain, but if proof of work is used as the consensus algorithm, then the new node must also connect to other full nodes "to determine whose blockchain has the greatest proof-of-work total (by definition, this is assumed to be the consensus blockchain)."75 using fba instead of proof of work would eliminate this time- and resource-consuming step. in contrast, spv only allows a node "to check that a transaction has been verified by miners and included in some block in the blockchain."76 a node does this by downloading the block headers of every block in the chain. in addition to retaining the hash of the previous block header, these headers also include root hashes derived from a merkle tree. a merkle tree is a method where "the spent transactions…can be discarded to save disk space."77 as shown in figure 8, combining transaction hashes for the entire block into a single root hash in the block header saves a considerable amount of storage capacity because the interior hashes can be eliminated or "pruned" off the merkle tree. figure 8. using a merkle tree for storage. as shown in figure 9, to verify that a transaction was included in a block, a node "obtains the merkle branch linking the transaction to the block it's timestamped in."78 although it cannot check the transaction directly, "by linking it to a place in the chain he can see that a network node has accepted it and blocks after it further confirm the network has accepted it."79 figure 9. verifying a transaction using a merkle root hash. compared to fpv, spv "requires only a fraction of the memory that's needed for the entire blockchain."80 this small amount of storage enables spv ledgers to sync and become operational in less than an hour.81 spv is limited, however, only allowing nodes to manage addresses or public keys that they maintain, whereas fpv ledgers are able to query the entire network.
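the merkle-root construction and branch verification described above can be sketched as follows; the pairing convention (duplicating the last hash when a level has an odd number of nodes) is an assumption borrowed from bitcoin rather than something specified in this article.

import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    level = [sha(tx.encode("utf-8")) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash when a level is odd
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_branch(tx, branch, root):
    # branch is a list of (sibling_hash, sibling_is_on_the_right) pairs, leaf to root
    node = sha(tx.encode("utf-8"))
    for sibling, sibling_is_on_the_right in branch:
        node = sha(node + sibling) if sibling_is_on_the_right else sha(sibling + node)
    return node == root

transactions = ["tx-a", "tx-b", "tx-c", "tx-d"]  # hypothetical transaction payloads
root = merkle_root(transactions)
# the branch for "tx-a": its sibling leaf, then the combined hash of the other pair
branch = [(sha(b"tx-b"), True), (sha(sha(b"tx-c") + sha(b"tx-d")), True)]
print(verify_branch("tx-a", branch, root))  # True

because an spv node holds only block headers and branches like these, it never examines the full transaction history itself.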
thus, an spv ledger must rely "on its network peers to ensure its transactions are legit."82 theoretically, an attacker could overpower the entire network and convince nodes using spv to accept fraudulent transactions, but such an attack is very unlikely for metadata blockchains. for additional security, an spv node could also "accept alerts from network nodes when they detect an invalid block, prompting the user's software to download the full block and alerted transactions to confirm the inconsistency."83 adding such a feature to metadata blockchain software would eliminate the slight risk of it being contaminated by malicious actors. thus, spv offers the ability for smaller institutions to participate in creating and maintaining a metadata blockchain without requiring them to have the storage capacity for the entire blockchain. conclusion and future directions this article described how permissionless metadata blockchains could be created to overcome two significant limitations in current cataloging practices: centralization and a lack of traceability. the process would start by creating private keys using a seed and the sha-256 algorithm and public keys using an elliptic curve digital signature algorithm. after creating the genesis block, nodes would submit either a new record or modifications to a single record for validation. validation would rely on a federated byzantine agreement (fba) consensus algorithm because it offers the most flexibility for institutions to select authoritative peers. quorum slices would be chosen using a tiered system where the top tier institutions would be the core metadata authorities, such as the library of congress. only the top tier nodes would be required to store a copy of the entire blockchain (fpv), thereby allowing other institutions to decide whether they prefer to use spv or fpv. future directions for research could start with investigating whether this theoretical design will work. fba has not been heavily promoted as an option for a consensus algorithm, but its quorum slices create trust between recognized authorities and smaller institutions. another area of study could be whether there is a significant demand for metadata blockchains. many institutions appear frustrated at the costs and limitations of working with a vendor, but they also view such relationships as necessary for metadata record creation and maintenance. a metadata blockchain would reduce such dependence, but some institutions may be leery of using open source software. other institutions might be hesitant to adopt blockchain because they believe it is merely another "fad" or an unnecessary addition to metadata exchange systems. a third area for research could be a cost-benefit analysis for implementing metadata blockchains that weighs current vendor fees and labor costs against potential storage and labor costs. such an analysis may create a tipping point where long-term return on investment outweighs the short-term challenges. endnotes 1 "about the project," blockchain for peer review, digital science and katalysis, accessed nov. 29, 2018, https://www.blockchainpeerreview.org/about-the-project/. 2 "marc record services," marc standards, library of congress, accessed nov. 29, 2018, https://www.loc.gov/marc/marcrecsvrs.html; "open library data," open library, internet archive, accessed nov. 29, 2018, https://archive.org/details/ol_data; oclc, 2017-2018 annual report.
3 “join the pcc,” program for cooperative cataloging, library of congress, accessed nov. 29, 2018, http://www.loc.gov/aba/pcc/join.html. 4 “040 cataloging source (nr),” oclc support & training, oclc, accessed nov. 29, 2018, https://www.oclc.org/bibformats/en/0xx/040.html. 5 dr. joris van rossum, “blockchain for research,” accessed nov. 29, 2018, https://www.digitalscience.com/resources/digital-research-reports/blockchain-for-research/. 6 van rossum, 11. 7 van rossum, 12. 8 van rossum, 12. 9 van rossum, 16. 10 digital science and katalysis, “about the project.” 11 “decentralized research platform,” deip, accessed nov. 29, 2018, https://deip.world/wpcontent/uploads/2018/10/deip-whitepaper.pdf. 12 deip, 13. 13 deip, 14. 14 deip, 16. no need to ask | rubel 14 https://doi.org/10.6017/ital.v38i2.10822 15 jason griffey, “blockchain for libraries,” feb. 26, 2016, https://speakerdeck.com/griffey/blockchain-for-libraries. 16 “e-services,” concensum, accessed nov. 29, 2018, https://concensum.org/en/e-services; “about,” binded, accessed nov. 29, 2018, https://binded.com/about; “faq,” dot blockchain media, accessed nov. 29, 2018, http://dotblockchainmedia.com/. 17 van rossum, “blockchain for research,” 10. 18 debbie ginsberg, “law and the blockchain,” blockchains for the information profession, nov. 22, 2017, https://ischoolblogs.sjsu.edu/blockchains/law-and-the-blockchain-by-debbieginsberg/. 19 griffey, “blockchain for libraries.” 20 “ways to use blockchain in libraries,” san josé state university, accessed nov. 29, 2018, https://ischoolblogs.sjsu.edu/blockchains/blockchains-applied/applications/. 21 “libchain: open, verifiable, and anonymous access management,” libchain, accessed nov. 29, 2018, https://libchain.github.io/. 22 griffey, “blockchain for libraries.” 23 san josé state university. “ways to use blockchain in libraries.” 24 “demco software blockchain,” demco, accessed nov. 29, 2018, http://blockchain.demcosoftware.com/. 25 jordan baczuk, “how to generate a bitcoin address—step by step,” coinmonks, accessed nov. 29, 2018, https://medium.com/coinmonks/how-to-generate-a-bitcoin-address-step-by-step9d7fcbf1ad0b. 26 “elliptic curve digital signature algorithm,” bitcoin wiki, accessed nov. 29, 2018, https://en.bitcoin.it/wiki/elliptic_curve_digital_signature_algorithm. 27 conrad barski and chris wilmer, bitcoin for the befuddled (san francisco: no starch pr., 2015), 139. 28 barski and wilmer, 12-13. 29 barski and wilmer, 11. 30 barski and wilmer, 172-73. 31 barski and wilmer, 172-73. 32 satoshi nakamoto, “bitcoin: a peer-to-peer electronic cash system,” accessed nov. 29, 2018, https://bitcoin.org/bitcoin.pdf. 33 barski and wilmer, bitcoin for the befuddled, 170-72. information technology and libraries | june 2019 15 34 amber snow, “the design and implementation of blockchain technology in academic resource’s authoritative metadata records: enhancing validation and accountability” (master’s thesis, ferris state university, 2018), 34. 35 snow, 40. 36 snow, 40. 37 snow, 37, 40. 38 snow, 42. 39 snow, 37. 40 snow, 39. 41 barski and wilmer, bitcoin for the befuddled, 23. 42 snow, “the design and implementation of blockchain technology,” 37. 43 “9 types of consensus mechanisms you didn’t know about,” daily bit, accessed nov. 29, 2018, https://medium.com/the-daily-bit/9-types-of-consensus-mechanisms-that-you-didnt-knowabout-49ec365179da. 44 barski and wilmer, bitcoin for the befuddled, 138. 45 barski and wilmer, 171. 46 barski and wilmer, 138. 47 nakamoto, “bitcoin,” 3. 
48 barski and wilmer, bitcoin for the befuddled, 171. 49 helen zhao, “bitcoin and blockchain consume an exorbitant amount of energy. these engineers are trying to change that,” cnbc, feb. 23, 2018, https://www.cnbc.com/2018/02/23/bitcoinblockchain-consumes-a-lot-of-energy-engineers-changing-that.html. 50 barski and wilmer, bitcoin for the befuddled, 23. 51 shaan ray, “federated byzantine agreement,” towards data science, accessed nov. 29, 2018, https://towardsdatascience.com/federated-byzantine-agreement-24ec57bf36e0. 52 marko vukolić, “the quest for scalable blockchain fabric: proof-of-work vs. bft replication,” ibm research – zurich, accessed nov. 29, 2018, http://vukolic.com/inetsec_2015.pdf 53 vukolić, “the quest for scalable blockchain fabric,” [5]. 54 vukolić, [6]. no need to ask | rubel 16 https://doi.org/10.6017/ital.v38i2.10822 55 vukolić, [6]. 56 vukolić, [7]. 57 vukolić, [7]. 58 vukolić, [7]. 59 vukolić, [6]. 60 david mazières, “the stellar consensus protocol: a federated model for internet-level consensus,” stellar development foundation, accessed nov. 29, 2018, https://www.stellar.org/papers/stellar-consensus-protocol.pdf. 61 mazières, 3. 62 mazières, 1. 63 mazières, 4. 64 mazières, 4. 65 mazières, 4. 66 mazières, 5. 67 mazières, 11. 68 mazières, 11. 69 mazières, 13. 70 mazières, 28. 71 mazières, 29. 72 mazières, 29. 73 mazières, 29. 74 barski and wilmer, bitcoin for the befuddled, 191. 75 barski and wilmer, 191. 76 barski and wilmer, 192. 77 nakamoto, “bitcoin,” 4. 78 nakamoto, 5. 79 nakamoto, 5. information technology and libraries | june 2019 17 80 barski and wilmer, bitcoin for the befuddled, 192. 81 barski and wilmer, 193. 82 barski and wilmer, 193. 83 nakamoto, “bitcoin,” 5. tech tools in pandemic-transformed information literacy instruction: pushing for digital accessibility article tech tools in pandemic-transformed information literacy instruction pushing for digital accessibility amanda rybin koob, kathia salomé ibacache oliva, michael williamson, marisha lamont-manfre, addie hugen, and amelia dickerson information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.15383 amanda rybin koob (amanda.rybinkoob@colorado.edu) is assistant professor, literature and humanities librarian, university of colorado. kathia salomé ibacache oliva (kathia.ibacache@colorado.edu) is assistant professor, romance languages librarian, university of colorado. michael williamson (michael.d.williamson@colorado.edu) is assistant director, assessment and usability, digital accessibility office, university of colorado. marisha lamont-manfre (marisha.manfre@colorado.edu) is accessibility and usability assessment coordinator, digital accessibility office, university of colorado. addie hugen (addison.hugen@colorado.edu) is senior accessibility tester, digital accessibility office, university of colorado. amelia dickerson (amelia.dickerson@colorado.edu) is accessibility professional, digital accessibility office, university of colorado. © 2022. abstract inspired by pandemic-transformed instruction, this paper examines the digital accessibility of five tech tools used in information literacy sessions, specifically for students who use assistive technologies such as screen readers. the tools are kahoot!, mentimeter, padlet, jamboard, and poll everywhere. 
first, we provide an overview of the americans with disabilities act (ada) and digital accessibility definitions, descriptions of screen reading assistive technology, and the current use of tech tools in information literacy instruction for student engagement. second, we examine accessibility testing assessments of the five tech tools selected for this paper. our data show that the tools had severe, significant, and minor levels of digital accessibility problems, and while there were some shared issues, most problems were unique to the individual tools. we explore the implications of tech tools’ unique environments as well as the importance of best practices and shared vocabularies. we also argue that digital accessibility benefits all users. finally, we provide recommendations for teaching librarians to collaborate with campus offices to assess and advance the use of accessible tech tools in information literacy instruction, thereby enhancing an equitable learning environment for all students. introduction the last fifteen years have seen the rise of collaborative and interactive web platforms and whiteboards, game-based learning technologies, audience polls, and other tools that contribute to student engagement in higher education classrooms. these educational tech tools have supported one-time library information literacy (il) sessions by enabling student participation in real time. still, knowing that tech tools may enhance engagement is not enough; we should also be asking whether these tech tools are accessible for all students and, if not, what can be done to make them more accessible. this paper examines the digital accessibility of five tech tools specifically for students who use assistive technologies such as screen readers. the tools are kahoot!, mentimeter, padlet, mailto:amanda.rybinkoob@colorado.edu mailto:kathia.ibacache@colorado.edu mailto:michael.d.williamson@colorado.edu mailto:marisha.manfre@colorado.edu mailto:addison.hugen@colorado.edu mailto:amelia.dickerson@colorado.edu information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 2 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson jamboard, and poll everywhere. these tech tools were identified in a 2021 paper inquiring which tech tools librarians used in emergency remote il instruction during the covid-19 pandemic along with their perceptions of the weaknesses and strengths of these tech tools.1 although there are guidelines aiding librarians in assessing ada accessibility around library spaces, there are no disability-related recommendations for specific tech tools used in il instruction or studies examining tech tools’ digital accessibility features. 2 there is also a lack of documentation regarding librarians’ outreach to ada-related academic offices and tech companies regarding tech tools. we argue that collaboration between libraries and ada-related offices at the campus level increases awareness of digital accessibility issues and requirements and could ultimately advance digital accessibility in educational tech tools used in il instruction. we place our paper within the context of other pandemic-responsive digital pedagogy research. we acknowledge that technology needs for student engagement are evolving in new face-to-face, hybrid, and remote instruction environments; thus, we hope to impact the way tech tools are assessed for digital accessibility and to promote the use of accessibility-tested tech tools in library instruction. 
first, we provide an overview of ada and digital accessibility definitions, descriptions of screen reading assistive technology, and the current use of tech tools in instruction for student engagement. secondly, we examine accessibility testing reports for the five tech tools selected for this paper. then, we discuss two trends found in the reports: shared issues between the tools and the implications of unique environments. we also argue that digital accessibility benefits all users. finally, we provide recommendations for teaching librarians to collaborate with campus offices to assess and advance the use of accessible tech tools in il instruction, thereby enhancing an equitable learning environment for all students. overview ada accessibility the americans with disabilities act (ada) was made law in 1990, signaling an initiative to protect people with disabilities from discrimination in employment opportunities, when purchasing goods, and when participating in state and local government services. 3 the idea behind the ada law was to provide equal opportunity.4 however, as health sciences librarian ariel pomputius notes, ada law protects people from discrimination, but it does not guarantee a right to accessibility beyond the legal requirements granted by this act.5 as higher education advances through the covid-19 pandemic, digital accessibility has become more essential than ever in il instruction as it takes place in hybrid, remote, and in -person environments. to ensure the digital accessibility of tech tools for all students, we should first understand its meaning. what is digital accessibility? the covid-19 pandemic brought digital accessibility to the forefront as universities navigated complex remote and hybrid learning environments. fernando h. f. botelho, a scholar with expertise in technology and disability, explains digital accessibility as the interconnection of “hardware design, software development, content production, and standards definition.”6 for botelho, accessibility is “an ongoing and dynamic process” rather than an immobilized state, where standards work together as a part of a ubiquitous process.7 as information studies information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 3 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson professor jonathan lazar notes, “digital accessibility means providing an equal user experience for people with disabilities, and it never happens by accident.” 8 georgetown law also defines digital accessibility from a perspective that may resonate with instructors who seek technologies that are accessible to all students. 
they define digital accessibility as “the inclusive practice of removing barriers that prevent interaction with, or access to websites, digital tools, and technologies.”9 however, it is lazar who moves the topic forward when referring to digital accessibility in research libraries, arguing that although accessibility laws protect people with disabilities, digital accessibility also benefits the whole population.10 lazar made this assertion after capturing the challenges and lessons learned related to digital accessibility during covid-19.11 the most salient lesson is that research libraries should create an infrastructure that supports digital accessibility, especially now that the covid-19 pandemic has driven universities to provide instruction in multiple formats.12 we argue that this infrastructure should also include digital accessibility evaluation of tech tools used in the classroom. assistive technology for blind users congress defined assistive technology in the disabilities act of 1988 as “any item, piece of equipment, or product system, whether acquired commercially off the shelf, modified, or customized, that is used to increase, maintain, or improve functional capabilities of individuals with disabilities.”13 furthermore, special education professors kathleen puckett and kim w. fisher state, “technology becomes assistive when it supports an individual … to accomplish tasks that would otherwise be difficult or impossible.”14 as scholars of occupational therapy claire kearney-volpe and amy hurst note, screen readers assist people with no or low vision by presenting web information on “a non-visual interface” via braille or speech synthesis.15 screen readers’ purpose is important because all people should have the opportunity to access the same information and services in the digital environment without facing undue barriers or burdens. the digital accessibility office’s (dao) assessment and usability team at university of colorado boulder (cu boulder) primarily tests tools for accessibility by utilizing screen reader assistive technology for both computers and mobile devices. assessment and usability staff rely on screen readers for testing because this assistive technology uses and responds to the underlying code of each webpage, application, and environment. this in-depth output makes screen readers good tools for overall accessibility testing, even though they are generally for people with no vision. however, we found no studies on tech tools and classroom engagement that consider assistive technology such as screen readers. classroom engagement with tech tools academic librarians emily chan and lorrie knight state in a 2010 study that library instruction risks being anachronistic if it does not include an engaging technology-based activity.16 with this in mind, there is ample literature documenting the impact and benefits of tech tools in the classroom. for example, authors highlight tech tools’ anonymous environment, categorized as free of judgment, noting that it is student-centered and enhances student participation.17 moreover, anonymity provides a means for students to answer honestly, fostering classroom discussion that includes introverted students.18 on the other hand, some authors argue that anonymous participation does not enhance critical thinking. 
ann rosnida md deni and zainor izat zainal, referring to padlet as an educational tool, argue that one challenge of using tech tools to advance student engagement is that they do not, on their own, enhance criticality or discussion because students may not want to oppose their peers' opinions.19 as with other pedagogical techniques, intentional facilitation with tech tools is necessary to enhance criticality.

many authors regard the use of tech tools in the classroom positively.20 examining kahoot! to test students' performance, darren h. iwamoto et al. state that students valued receiving immediate feedback on their answers after taking a high-stakes examination.21 carolyn m. plump and julia larosa also appreciate the use of instructional games to provide immediate feedback to students, noting that this feedback warned faculty instructors against making assumptions about how much students understand in class.22 similarly, librarian maureen knapp, referring to online tools for active learning, notes that instant feedback drives classroom discussions forward.23 liya deng, a librarian with a focus on disability studies, notes in a 2019 study that using poll everywhere in library instruction provides an opportunity to build rapport with students and a strategy to keep students focused and away from non-instruction-related internet distractions.24 engineering classes have also used tech tools to enhance teaching and learning. a 2021 case study addressing online education due to covid-19 reports that students found kahoot! to be a useful online tool that helped them reflect, apply knowledge, and receive feedback.25 similarly, engineering educator razzaqul ahshan advocates for incorporating tech tools like jamboard for active "think-pair-share" activities, noting that it enables instructors to connect with students as they do small group work.26 these studies suggest that tech tools continue to be relevant and beneficial during the pandemic, though again, they do not consider whether the tools are digitally accessible. in any case, the continued use of tech tools in various modalities (in-person, hybrid, and remote) attests to their relevance, which may continue to grow as instructors transition to pandemic-transformed pedagogy.

pandemic-transformed pedagogy

in a 2020 publication exploring covid-19 impacts on teaching, learning, and technology use, scholars jillianne code, rachel ralph, and kieran forde coined the phrase "pandemic-transformed pedagogy."27 as they state, educators find themselves "on the cusp of a rapid change that is compelling them to re-think their worldview in both how they teach and how their students learn, calling for their transformation as educators."28 a review of the recent literature available through google scholar on "pandemic-transformed pedagogy" shows expanding adoption of this phrase, including academics publishing on a range of interdisciplinary subjects and in international contexts, with implications for both k–12 and post-secondary education.29 as we reflect on this transformation and call for responsiveness to rapid change, we emphasize the need for support, planning, and advocacy for digital accessibility and tech tools.
before the covid-19 pandemic, scholars at the university of sydney found in 2018 that the most significant factor driving the choice to use technology was whether it was immediately available.30 these scholars emphasized "just in time" use, noting that ready access to the technology required "actions, expenditure, support, and commitment from policymakers and administrators."31 at the beginning of the pandemic, teachers, librarians, and students had a matter of days to pivot to remote work, and as ibacache, rybin koob, and vance found in a 2021 study, "availability" was a consideration for librarians in selecting tech tools for engagement and content delivery.32 this "just in time" consideration is even more important in the aftermath of covid-19, which prompted emergency remote learning. yet, teaching librarians also ought to go beyond what is easily available and move towards what is digitally accessible. part of the transformation we envision is to extend the concept of "pandemic-transformed pedagogy" to include digital accessibility and thus push for the tech tools we use in il instruction to be both readily available and digitally accessible.

methodology

as previously mentioned, this study examines the digital accessibility of five educational tech tools used in il instruction. to initiate a formal accessibility test, we created scripts detailing how to interact with the samples we provided for the five tools.33 these scripts were then used to manually test each tech tool for its digital accessibility using a variety of screen readers on both computers and mobile devices.

about the testers

the testers are native users of screen reading assistive technology and are blind. they test each tech tool first, with additional staff in the dao reviewing and validating results.

about the test scripts process

each test script contained the following parameters:
1. basic information about the tool.
2. contact information for access issues and technical questions, such as the tools' customer support email and librarians' emails for follow-up questions.
3. access points to the software and websites (urls).
4. step-by-step instructions for testers to impersonate a student engaging in an il task.

as a part of these test scripts, we created short sample quizzes and activities for each of the five tools considered in this paper. in addition, the test scripts provided step-by-step descriptions to help the testers interact with the tools. the testers then tried each tool, focusing on functionality and whether they could complete the tasks in the script. the reports describe three levels of problems: severe, significant, and minor. the results section of this paper reports on these problems as found with the five tools tested. the testers also assessed general user experience (usability), using a holistic approach that engaged with the entire virtual environment of the tool rather than looking only at isolated functions.

assistive technology

the testers utilized four types of screen reader software: voiceover, talkback, nvda, and jaws. voiceover, developed by apple, is a screen reader for mobile devices and computers that "comes standard on every iphone, mac, apple watch, and apple tv.
it is gesture-controlled, so by touching, dragging, or swiping[,] users can hear what's happening on screen and navigate between elements, pages, and apps."34 talkback is a google-based screen reader included in android mobile devices that functions similarly to voiceover.35 nvda is a free, open-source screen reader for microsoft windows that supports people who are blind or have vision impairment.36 jaws, also compatible with microsoft windows, allows people with visual impairment to read the pc screen with text-to-speech output or via a braille display.37 we also tested for visual usability issues using a free web-based color contrast analyzer.38 the testers provided thorough reports detailing the results of their testing, including the exact versions used. the tests were conducted between february 27 and may 1, 2022.

tools evaluated

the educational technology tools in this study are web-based and have free options, allowing students to engage in activities using their computers or their phones. we identified these tools based on a survey about tech tool use during the covid-19 pandemic and from our own experiences.39 the tools are jamboard, kahoot!, mentimeter, padlet, and poll everywhere. jamboard is a google-powered virtual whiteboard tool. kahoot! is a quiz/game platform offering multiple styles; we tested the standard quiz question format and utilized one of the vendor-provided sample quizzes. mentimeter is another quiz-making platform; we created the sample quiz utilizing multiple-choice and short-answer question formats. padlet is a collaborative bulletin board platform with various formats (including the three we tested: padlet maps, padlet shelf, and padlet wall). padlet includes options for users to add text and multimedia in response to question prompts or to post their own questions and other content in a collaborative virtual space. finally, poll everywhere is a polling/survey platform.

limitations

although digital accessibility offices at different universities commonly rely on shared standards for technology evaluation, such as the web content accessibility guidelines (wcag) 2.1, we acknowledge that the assessment approach will vary from office to office. overall, there is much debate on which practices and standards for evaluating tech tools yield the best results. not all higher education institutions have digital accessibility offices, let alone accessibility and usability labs and testers. some institutions may rely on automated checkers or a mix of automated and manual testing; there is disagreement among digital accessibility practitioners about whether a fully automated, fully manual, or hybrid approach is best. regardless, we expect that manual testing of these educational tech tools using similar assistive technologies would have produced similar results during the timeframe in which these tools were tested. the testing reports capture a moment in time, and it is important to note that web-based tools are frequently updated. we only tested the free versions of these tools; there may be differences in accessibility between free and paid versions. we tested only the browser versions of these tools on computers and mobile devices and did not test mobile applications, which may or may not be more accessible.
this decision was made due to the probability that most il librarians and other instructors would not regularly ask students to download applications to their personal devices for in-class engagement. kahoot!, mentimeter, padlet, and poll everywhere were tested on windows, ios, and android platforms. jamboard was tested only on windows because the browser version would not open on a mobile device using assistive technology; instead, it attempted to force an app download. we also tested each tool using sample environments and functions that we hope captured some of the ways the tools would be used in a typical il classroom. due to the nature of the tools and the many options available for question and collaboration formats in each tool, these samples were not exhaustive of all options available. these testing results are meant to be illustrative rather than comprehensive. finally, this study evaluated tech tools only for digital accessibility using the specific assistive technology of screen readers. further research is needed regarding how students with a range of different disabilities may interact with the technology tools examined here.

results

this section reports the three levels of problems (severe, significant, and minor) that dao testers found in jamboard, kahoot!, mentimeter, padlet (shelf, map, and wall), and poll everywhere. the testers also assessed user experience ("usability"). issues may be present in multiple categories based on how they impact the user's ability to complete actions.

severe issues

table 1 shows the severe issues found in the tools tested. severe issues create access barriers that prevent assistive technology users from completing tasks and are issues that need to be remediated. the testers consider these issues prohibitive for many individuals with disabilities and for those who use assistive technologies. the dao identified ten severe issues in padlet shelf; five severe issues each in jamboard, padlet wall, and poll everywhere; four severe issues in kahoot!; and three severe issues in padlet map. the testers did not find severe issues in mentimeter.

table 1 shows that the most common severe issue corresponds to elements that are unlabeled or inappropriately labeled. in the case of padlet map and jamboard, the testers found buttons that were unlabeled or labeled with irrelevant numbers, leaving testers unclear as to what the buttons were or what their functionality was. padlet shelf contained the most unlabeled buttons, including the buttons to add posts and the three-vertical-dot menu to edit or delete. this issue is highly relevant since users need these buttons to navigate and contribute to the padlet. the testers observed a similar problem when using the screen reader talkback to engage with padlet shelf. talkback found unknown or unlabeled buttons, which impede users' ability to navigate or interact with videos they submit to the padlet. figure 1 illustrates the play button located at the center of a video. in the screen reader, this button is unlabeled and appears after the video, preventing the screen reader from understanding its function and leaving users unclear whether this button is connected to the video.

figure 1. the play button at the center of the video is unlabeled. in the reading order, this unlabeled button appears after the video; therefore, it is unclear what it does or how it relates to the video.
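at root, an "unlabeled or inappropriately labeled" element is a control without an accessible name. as a minimal sketch of the difference (the markup and label text here are hypothetical, not padlet's or jamboard's actual code), the gap between a button announced only as "button" and one announced as "add a post, button" can be a single attribute:

```typescript
// inaccessible pattern: an icon-only button with no text and no label.
// a screen reader can only announce it as "button" (or an internal id).
const unlabeled = document.createElement("button");
unlabeled.className = "add-post-icon"; // icon supplied via css, invisible to screen readers

// accessible pattern: the same control given an explicit accessible name,
// so voiceover, talkback, nvda, and jaws can all announce "add a post, button".
const labeled = document.createElement("button");
labeled.className = "add-post-icon";
labeled.setAttribute("aria-label", "add a post");

document.body.append(unlabeled, labeled);
```

as discussed later in this paper, labeling problems like this are usually among the easier issues for developers to remediate.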
the second most prevalent severe issue is elements that are not accessible to screen readers. this issue affected padlet shelf and padlet wall. in the case of padlet shelf, the testers utilizing the voiceover screen reader were unable to interact with or locate gifs and graphics. when the testers utilized talkback, they would hear the gif but could not find the graphics because they were marked as links. in addition, the drawing feature was also not accessible to screen readers, including the visual elements that control colors, which appear as clickable links instead of the visual elements associated with colors. these elements were unavailable for users utilizing voiceover and jaws. the testers found a similar problem with the visual elements in padlet wall, especially when they tried to edit a post (see fig. 2).

figure 2. when users want to edit a post in padlet wall, there are visual elements that are available to change the color of the post. these elements are not available to screen readers.

figure 3. when images are not programmed to be read as graphics, screen readers are not able to gather information related to the gif. this image was read as "jaf3mi0ja5huk/giphy."

figure 4. while using nvda, the user hears links for images that do not make sense.

the third most frequent severe issue relates to graphics and gifs that are not appropriately programmed. this issue affected padlet shelf and padlet wall. when the testers were using jaws in padlet shelf, the gifs read as links with the following text: "jaf3ml0ja5huk/giphy." when the testers utilized nvda, the gifs read as "giphy," conveying no information describing the gif and hindering navigation (fig. 3). similarly, graphics and gifs in padlet wall are programmed as links rather than graphics. when the testers used jaws to understand graphics and gifs, they heard long links such as "eb351cc20e6bfda76d443f1e93ad7963/pumpkin_seedling_3." long links like this are useless to people using screen readers and disrupt people's ability to search for graphics. when the testers used nvda, they also heard links for the images, but without the other series of characters included in jaws (fig. 4).

the testers also found severe issues with elements not available by keyboard or screen reader (jamboard and poll everywhere) and timer features (kahoot!). for example, the pen, eraser, laser, shapes, and text box elements in jamboard can only be utilized or placed on the screen by a mouse, making them inaccessible to blind learners. another issue is the lack of alternative text. since jamboard offers a collaborative multi-user space, some users may post images; however, there is no way to input alternative text for an image. in the case of kahoot!, when the timer is activated, the countdown plays as the screen reader tries to read the page, confusing the screen reader and the user, who will hear the timer with random numbers and not the question. the timer feature also affects the user when starting a quiz or moving between questions. it is unclear whether the screen reader is unable to read the questions due to the short timeframe or whether the questions are truly unavailable to the screen reader. the instructor may extend the timer for quizzes in kahoot!, but it is impossible to turn it off altogether when using the kahoot! quiz question format.
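several of these severe issues come down to how a graphic is exposed to assistive technology. a minimal sketch of the contrast between a gif exposed as a bare link, roughly as the testers heard it, and one exposed as an image with alternative text (the urls and markup are hypothetical, not the vendors' actual code):

```typescript
// pattern the testers encountered: the gif is wrapped in a link whose only
// content is the file path, so jaws and nvda read a string of characters.
const asLink = document.createElement("a");
asLink.href = "https://example.com/media/giphy.gif"; // hypothetical url
asLink.textContent = "jaf3ml0ja5huk/giphy";

// accessible pattern: expose the gif as a graphic with alternative text that
// describes its content; a purely decorative image would instead get alt="".
const asImage = document.createElement("img");
asImage.src = "https://example.com/media/giphy.gif"; // hypothetical url
asImage.alt = "animated gif of a pumpkin seedling sprouting";

document.body.append(asLink, asImage);
```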
table 1. number of occurrences of severe issues found during screen reader testing for kahoot!, jamboard, mentimeter, padlet (three formats tested), and poll everywhere. (jamboard was tested on a windows computer only; the other tools were tested on windows, ios, and android.)

severe issue | jamboard | kahoot! | mentimeter | padlet: map | padlet: shelf | padlet: wall | poll everywhere | total occurrences
element not available by keyboard or screen reader | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 3
element presents gesture/navigation traps | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
elements are not keyboard accessible | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
elements are unlabeled or inappropriately labeled | 2 | 0 | 0 | 1 | 4 | 2 | 0 | 9
elements not accessible to screen reader | 0 | 0 | 0 | 0 | 3 | 2 | 0 | 5
errors do not get focus | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
graphics and gifs are not programmed appropriately | 0 | 0 | 0 | 0 | 3 | 1 | 0 | 4
graphics are unlabeled or inappropriately labeled | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2
graphics lack alternative text | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
lack of alert | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
text not read by screen reader | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
timed pages disrupt the ability to read the page | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 3
tool totals | 5 | 4 | 0 | 3 | 10 | 5 | 5 | 32

the testers found other severe issues such as text not being read by screen readers, missing notifications, elements not accessible to screen readers, unlabeled graphics, and lack of focus on images. for example, in the case of kahoot!, the screen reader could not read the answer notification text. this issue meant that while the tool offered visual indicators for correct and incorrect answers, the screen reader did not read these indicators and it remained unclear to testers whether their answers were correct or not. finally, "lack of focus" or challenges with "focus handling" indicate that the assistive technology's attention was not where it should be. this problem happens because tool developers do not set the appropriate code for screen readers.
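the missed answer notification in kahoot! is a status change that is shown visually but never announced. a minimal sketch (hypothetical code, not kahoot!'s implementation) of how a live region can announce that kind of feedback without moving the user's place in the page:

```typescript
// a live region that screen readers monitor for text changes.
const status = document.createElement("div");
status.setAttribute("role", "status"); // implies polite live-region behavior
status.setAttribute("aria-live", "polite");
document.body.append(status);

// updating the live region's text causes screen readers to speak it
// without stealing focus from the question the user is working on.
function announceResult(correct: boolean): void {
  status.textContent = correct ? "correct!" : "incorrect.";
}

announceResult(true);
```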
significant issues

table 2 shows the significant issues found in the tools. significant issues represent items that create great difficulty for people who use assistive technologies, but they do not necessarily prevent the tool from being used. significant issues are recommended for remediation. interestingly, most significant issues were not shared across the five tools; out of sixteen problems, only one was shared by four tools ("inconsistent focus handling"), and three were shared by two tools each ("graphics are inappropriately labeled," "reading order can be confusing to users," and "state is not indicated"). because of this lack of overlap, brief descriptions of how frequent issues affected specific tech tools are warranted, focusing on those issues that affected multiple tools, recurred most frequently, or both.

the significant issue that recurred most frequently was "reading order can be confusing to users," affecting jamboard as well as all three padlet styles. in jamboard, when creating a sticky note, the focus of the assistive technology went into the edit field but ignored the color options, meaning that users were unable to switch between colors when making a post. reading order also caused difficulties elsewhere. reading order is the way elements are tagged and read by screen readers; it may not be the same order most sighted users experience when reading elements on the page from top to bottom, though it should closely reflect the visual layout of the page, and it determines what a blind learner will understand about the digital environment and in what order. in padlet map, the screen reader went through irrelevant content, including the terms and conditions, before reading the "new post" button. padlet shelf had three instances of confusing reading order; for example, the "publish" and "update" options were in the reading order above the "edit" field, so the user would have to know to navigate back to finalize their post (this issue is repeated in padlet wall as well). further, if a user leaves the new post dialog box, it is difficult to return due to the reading order. the "more buttons" element was also read before the heading of a new post, and those additional buttons are unlabeled. finally, in padlet wall, the tester utilizing voiceover could not discard a post (fig. 5). a dialog opened asking for discard confirmation, but this dialog was buried in the reading order and challenging to locate.

figure 5. a dialog box appears visually in padlet wall when a user attempts to discard a post, but it is buried in the reading order of the voiceover screen reader, making it difficult to locate and complete the task.

the next most frequent significant issue was "inconsistent focus handling," which occurred six times. focus handling directs the attention of the user and facilitates various actions in a given environment. inconsistent focus handling emerged in four out of the five tools: jamboard, kahoot!, all three padlet styles, and poll everywhere. this issue often appeared when a new element on the screen was opened but the "focus" (what the screen reader was paying attention to at any given time) remained on the previous panel or element, causing confusion and difficulty. for example, in jamboard, when selecting the "open a jamboard" button, the panel opened visually, but the screen reader's focus remained on the button behind the open panel. to get to the new jamboard, the tester had to navigate the other page content first. focus handling was inconsistent across many activation buttons and interactions in all three padlet styles. in kahoot!, focus handling was inconsistent across screen readers, with the focus going to different places, such as after answering a question. in poll everywhere, the focus traveled to other areas of the page after answering a question, returning to previous questions, or refreshing the page. these inconsistencies varied among screen readers.
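the jamboard example above, where a panel opens visually but the screen reader's attention stays on the button behind it, is the kind of problem developers usually address by moving focus programmatically when the new panel appears. a minimal sketch, assuming hypothetical element ids rather than jamboard's actual markup:

```typescript
// when the trigger is activated, reveal the panel and move focus into it so
// the screen reader's attention starts on the new content, not behind it.
function openPanel(trigger: HTMLButtonElement, panel: HTMLElement): void {
  panel.hidden = false;
  trigger.setAttribute("aria-expanded", "true");
  panel.tabIndex = -1; // a generic container must be made focusable first
  panel.focus();
  // when the panel closes, focus should be returned to the trigger.
}

const trigger = document.querySelector<HTMLButtonElement>("#open-board")!; // hypothetical id
const panel = document.querySelector<HTMLElement>("#board-picker")!;       // hypothetical id
trigger.addEventListener("click", () => openPanel(trigger, panel));
```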
table 2. number of occurrences of significant issues found during screen reader testing for jamboard, kahoot!, mentimeter, padlet (three formats tested), and poll everywhere. (jamboard was tested on a windows computer only; the other tools were tested on windows, ios, and android.)

significant issue | jamboard | kahoot! | mentimeter | padlet: map | padlet: shelf | padlet: wall | poll everywhere | total occurrences
difficult combination/list box | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 3
element difficult to access | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
element state not indicated | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 3
error does not get focus | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
extensive load times create difficulties | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
graphics are inappropriately labeled | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2
graphics not programmed appropriately | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
headings are not used | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
inconsistent focus handling | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 6
lack of alert | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 3
lack of contextual text/information | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
lack of focus indicators | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 2
lack of notification | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 3
object placement | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
reading order can be confusing to users | 1 | 0 | 0 | 1 | 3 | 2 | 0 | 7
user-created objects initially lack markup | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
tool totals | 10 | 7 | 2 | 4 | 4 | 7 | 3 | 37

another issue that recurred across two tools (kahoot!, mentimeter) was "graphics are inappropriately labeled." in kahoot!, a graphic showing the final scoreboard from the quiz and a podium were hidden from screen readers. in mentimeter, the logo for the tool is labeled as a "logotype" in the alt text. while these inappropriate labels for graphics may seem minor, they leave players using assistive technology out of celebratory or fun elements and may be confusing. inappropriate labels cannot be corrected by instructors, who are unable to adjust the alt text for elements that are built into the software.

another issue that recurred was "state is not indicated" (jamboard, poll everywhere). here, "state" refers to any change or option for an element, that is, its state in the digital environment. in jamboard, there is no indication of what color is selected for sticky notes, for example, which can be problematic if instructors use color to convey meaning (fig. 6). in one test question on poll everywhere, the unlabeled image reads as clickable to nvda, and visually, the image becomes larger when clicked. but this change is not announced and again may be confusing.

figure 6. for screen readers, there is not an indication of what color has been selected for sticky notes, though this is available visually.

with ten issues listed, jamboard was the tool with the greatest number of significant problems. this was true even though jamboard was tested only on windows and not on mobile devices. padlet wall and kahoot! had seven significant issues each. this is a slight departure from the data in table 1, where padlet shelf had the most severe issues. in general, tools with severe issues consistently exhibited some significant issues as well.
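"state is not indicated" is usually a matter of exposing the current selection programmatically as well as visually. a minimal sketch of a color palette whose selected swatch is announced by screen readers (hypothetical code, not jamboard's actual implementation):

```typescript
// a small color palette exposed as a radio group, so screen readers report
// both the name of each swatch and which one is currently selected.
const colors = ["yellow", "blue", "green", "pink"];
const palette = document.createElement("div");
palette.setAttribute("role", "radiogroup");
palette.setAttribute("aria-label", "sticky note color");

colors.forEach((color, i) => {
  const swatch = document.createElement("button");
  swatch.setAttribute("role", "radio");
  swatch.setAttribute("aria-label", color);
  swatch.setAttribute("aria-checked", i === 0 ? "true" : "false");
  swatch.addEventListener("click", () => {
    // keep the programmatic state in sync with the visual selection.
    palette.querySelectorAll("[role='radio']")
      .forEach((el) => el.setAttribute("aria-checked", "false"));
    swatch.setAttribute("aria-checked", "true");
  });
  palette.append(swatch);
});

document.body.append(palette);
```

a full implementation would also add the arrow-key navigation screen reader users expect of a radio group; the sketch only shows how the selected state is exposed.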
minor issues

table 3 shows the minor issues found in the five technology tools. minor issues represent items that are inconvenient or annoying but do not necessarily create barriers to accessibility, e.g., repetitiveness or unclear text. the testers found that each tool had between one and four minor issues of its own but did not share any of the minor issues listed.

table 3. number of occurrences of minor issues found during screen reader testing for jamboard, kahoot!, mentimeter, padlet (three formats tested), and poll everywhere. (jamboard was tested on a windows computer only; the other tools were tested on windows, ios, and android.)

minor issue | jamboard | kahoot! | mentimeter | padlet: map | padlet: shelf | padlet: wall | poll everywhere | total occurrences
element is inappropriately labeled | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
elements confusing to users | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 3
elements read twice | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1
heading level not concise | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
headings are not used to provide structure | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
headings used too often | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1
inconsistent focus handling | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
labels are inconsistent | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
lack of a programmatic list creates confusion | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
lack of notification | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
same information is presented to screen reader multiple times | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
sound effects portray meaning | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
submenu item count not provided | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
unclear text is confusing to user | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2
tool totals | 2 | 4 | 2 | 1 | 2 | 3 | 1 | 17

kahoot! had three issues related to confusing elements: gibberish text heard on screen readers, blanks in the statement not read by screen readers, and an icon that shows the total number of users who finished a test, which the screen reader could not read. other minor issues include instructions, questions, and answers announced multiple times (poll everywhere), long heading text (padlet wall), lack of notification when the user adds an image (padlet wall), excessive use of headings in a page, forcing users to go through the entire page to find a heading (padlet shelf), and headings not used to provide structure and facilitate navigation (padlet map). the lack of heading structure complicates the ability of users who want to add a post with a heading and text, as seen in figure 6.

figure 6. users can enter text on option 1 and option 2, but these options do not generate a heading.

usability issues

the accessibility assessment reports also included usability evaluation. usability issues may impact users of any ability. the testers noticed insufficient color contrast in three tools (poll everywhere, padlet wall and map, and kahoot!). for example, figure 7 illustrates a poll everywhere sample question where the color of the text does not have enough contrast with the background. the evaluators also found a lack of color contrast in instructions and captions. in some padlet formats, the instructor can change color contrast by choosing a different template.

figure 7. an example of poll everywhere answer options that do not have sufficient color contrast between the text and background.

conveying meaning using colors is another issue. in the case of jamboard, the sticky note (fig. 8) has a blue bar at the bottom that looks like a loading bar; this bar is connected to a character count that is not noted by screen readers. in addition, the testers could continue typing past the character limit when the loading bar turned red.

figure 8. in jamboard, the blue bar (below the yellow box) is used as a visual indication of the character limit that is not available to screen readers.

the testers also noticed layered elements that caused usability problems. figure 9 illustrates how the preview panel in padlet map visually blocks the post and the button to close the preview panel.

figure 9. when the user has the "preview panel" in padlet map open and starts a new post, the preview panel blocks the post.

padlet shelf and mentimeter did not have usability issues.
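the insufficient-contrast findings (and the 4.5:1, 2:1, and 1.8:1 figures reported in appendix b) are wcag contrast ratios, which the web-based analyzer we used derives from the relative luminance of the text and background colors. a minimal sketch of that calculation, with hypothetical example colors rather than values sampled from the tools:

```typescript
// relative luminance of an srgb color given as [r, g, b] values from 0-255,
// following the wcag 2.x definition.
function luminance([r, g, b]: [number, number, number]): number {
  const channel = (v: number) => {
    const c = v / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

// contrast ratio = (lighter + 0.05) / (darker + 0.05); normal-size text
// needs at least 4.5:1 to meet wcag 2.1 level aa.
function contrastRatio(a: [number, number, number], b: [number, number, number]): number {
  const [lighter, darker] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (lighter + 0.05) / (darker + 0.05);
}

// light gray text on a white background fails the 4.5:1 threshold.
console.log(contrastRatio([170, 170, 170], [255, 255, 255]).toFixed(2)); // ≈ 2.32
```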
summary of results

the reports showed that mentimeter was the most digitally accessible tool of those considered in this study. kahoot! and poll everywhere were judged as relatively accessible, with caveats. both jamboard and all three types of padlet tested were found to be inaccessible for many individuals who use assistive technologies. in any case, all tools included either severe or significant issues, creating a great deal of difficulty for users.

most issues were unique to individual tools. of twelve severe issues, only two were shared across two tools each. of sixteen significant issues, only four were shared across tools, and only one was shared across more than two tools ("inconsistent focus handling" was a problem in all tools except mentimeter). all fourteen minor issues were unique. mentimeter, with very few issues, and padlet, with the most problems in aggregate, were outliers. because padlet offers so many different format options, we tested three, which affected the findings. still, padlet shelf had the most severe issues (ten), and jamboard had the most significant issues (ten) of any tool aside from padlet.

discussion

we hypothesized that the tools selected for this study would share similar digital accessibility issues. to our surprise, the reports showed that these tech tools had few shared problems. we will thus consider two trends in the reports worthy of further examination: shared issues among the tools and unique environments. we will also discuss how digital accessibility can benefit all users of tech tools, not only people with disabilities.

shared issues among the tools

as previously mentioned, many issues were unique to individual tech tools covered in this study, while a few problems were shared among the tools. when a particular issue was shared among different tools, the level of severity determined whether a person using assistive technology could have a successful interaction with a tool or not. tracking shared issues and their severity may help developers and digital accessibility staff create a shared vocabulary for discussing user experience. it may also help both parties recognize when issues are common and relatively easy to remediate (e.g., labeling, heading, and alt text problems). other shared issues, such as focus handling inconsistencies, are more difficult to resolve even though they are at the heart of screen reading assistive technology. tracking focus handling problems may allow developers and digital accessibility advocates to share possible solutions with one another. moreover, if tech tool developers and digital accessibility staff both understand the importance of a factor like focus handling, difficult and severe problems can be prioritized when creating and fixing tech tools.
it is also important for instruction librarians to have a basic grasp of this shared vocabulary so that they can anticipate the needs and experiences of the learners in their classrooms. looking at each tech tool in isolation offers only a tiny glimpse into what might happen when students connect to engagement technologies; evaluating multiple tools allowed us to better understand recurring problems and the barriers they create.

unique environments

though tracking shared issues is important for these reasons, by the end of our testing we found that the tools were not similar, and that even when they had shared issues, those problems had unique characteristics. for this study, we selected tools that have similar functionality (for example, both kahoot! and mentimeter can function as quiz platforms) and others that are distinctive (such as padlet maps, which incorporates gps data to allow users to interact with maps). these tools offer students real-time engagement, which helps foster a collaborative learning environment. as mentioned above, most severe issues (ten out of twelve), most significant issues (twelve out of sixteen), and all minor issues were unique; in other words, they were not shared across tools. from the testers' point of view, the presence of unique problems is explained by the fact that the elements of each tool combine to create unique environments. for example, some tech tools are more image-based, while others are text-based.40 our study shows that even tools that initially appear similar are revealed as unique through assistive technology testing.

an interesting finding concerned padlet. when tools have problems, these issues usually exist across all of the screen readers used for testing. padlet, however, caused inconsistencies across different screen readers. for example, padlet shelf had many unlabeled or inappropriately labeled elements that created different experiences between voiceover, nvda, and jaws. when irregularities appear across similar assistive technologies, this might mean that developers created unusual code in order to facilitate specific visual elements or other aspects of the technology. developers should consider testing on multiple platforms and with two or more screen readers to catch these inconsistencies, and should also consider whether simple html alternatives are possible in place of more complicated code, as sketched below. regarding padlet, it may be that the software developers used accessible rich internet applications (aria) code, which is known to cause inconsistencies for assistive technology. whenever possible, user experience should be consistent across screen readers: users should never be asked to switch assistive technologies in order to adapt to a tech tool.

although we sampled only five tech tools, when considering the breadth of other tools on the market and those that may not yet be developed, we wonder whether our results could indicate an abundance of unique environments with unique digital accessibility problems. this inference suggests that software developers may not be creating tech tools with digital accessibility in mind, or may be testing with only one type of screen reader. it also speaks to the lack of digital accessibility best practices in software development for educational tech tools.
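the "simple html alternatives" point is worth making concrete. a minimal sketch (hypothetical code, not padlet's actual implementation) of the difference between a custom aria widget and the native element it imitates:

```typescript
function addPost(): void {
  console.log("post added"); // placeholder action for the example
}

// custom widget: a div only becomes button-like with a role, a tabindex, and
// its own keyboard handling; mistakes in any of these are the kind of thing
// that can produce different behavior in voiceover, nvda, and jaws.
const divButton = document.createElement("div");
divButton.setAttribute("role", "button");
divButton.tabIndex = 0;
divButton.textContent = "add post";
divButton.addEventListener("click", addPost);
divButton.addEventListener("keydown", (e) => {
  if (e.key === "Enter" || e.key === " ") addPost();
});

// native element: a <button> is focusable, keyboard-operable, and announced
// consistently by screen readers with no extra code.
const nativeButton = document.createElement("button");
nativeButton.textContent = "add post";
nativeButton.addEventListener("click", addPost);

document.body.append(divButton, nativeButton);
```

the native element generally behaves the same way across screen readers with no extra code, which is why native html is usually the safer default.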
if anything, our results also illustrate the complexity of tech tool environments and the nuances of assistive technology. digital accessibility benefits users with different abilities digital accessibility is valuable for everyone, not just people with disabilities. two specific values illustrate this comprehensive benefit. first, if standards for digital accessibility are followed, digital content will be more “portable across platforms, browsers, and operating systems.”41 this interoperability could mean that learning content and properly formatted tech tools will be easy to use across assistive technologies and devices such as smartphones. secondly, accessible features benefit people who do not see themselves as having a disability. 42 for example, covid-19 amplified the benefits of using captioning for all learners, even when these learners did not have a specific disability.43 a 2004 microsoft survey also inferred that accessibility features benefit a wide variety of people.44 while a person with a disability benefits from clear organization, headings, labels, and color contrast, those aspects are also helpful for all users. recommendations and next steps planning with intention teaching librarians need to invest time learning about the environment of a tech tool they decide to use in il instruction. sometimes, tech tools that are digitally accessible are not easy or intuitive for instructors to use. we experienced this “easy to use” versus digital accessibility conflict when preparing the scripts for mentimeter and padlet. padlet is used extensively at our institution due perhaps to its instructor-friendly platform. however, padlet’s wall, shelf, and map assessments revealed many problems with digital accessibility. additionally, we had a difficult time creating a quiz in mentimeter, finding this platform unfriendly for the instructor; yet this tool had the fewest digital accessibility problems. this tension between ease of use and digital accessibility illustrates the importance of taking time to read and understand documentation and training materials before creating engagement activities for il sessions. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 20 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson we encourage teaching librarians to work with their local digital accessibility offices to evaluate the technology used most frequently in il classrooms. if a digital accessibility office does not exist on your campus, you may wish to advocate for your administration to create one. even if your institution does not yet have a digital accessibility office, there are ways for librarians to plan their il sessions with accessibility in mind. librarians may do basic assessments of tech tools without access to assistive technology by testing whether it is possible to access all features in a given tool using the tab keyboard key. if there is a function or action that you cannot access with tab or that you must use a mouse to navigate to, then that part of the tool will not be accessible to someone using a screen reader. you can also unplug or turn off the mouse and attempt using a tech tool. librarians can approach each tech tool by asking: is there anything in the tool that uses only images or colors? do sounds convey a meaning that is not otherwise communicated on the screen? 
if there is anything in the tool that relies on a single form of sensory feedback, it may be unperceivable to people using assistive technology. finally, we strongly suggest considering whether these tools add value to il instruction. if you like a certain tool but know it is inaccessible (or you are unsure), consider trying a different way of involving students in the same kind of engagement. think about simplifying the tech tools that you do use. extend or turn off timers where possible if you choose to use kahoot!, mentimeter, or other quiz-making tools. avoid using questions on any platform that require users to engage with images, even if alt text is provided, because they tend to be more difficult for screen readers. pursue documentation and take time to understand various options for each tool and each question, then weigh which option will be most accessible for most students. it takes time and energy to plan ahead with intention but increasing the ability of all students to engage in learning makes the planning process worthwhile. collaboration if collaboration between librarians and digital accessibility experts is possible on your campus, take the time to talk to one another about learning outcomes and reasons for using specific tech tools. consult with experts in digital accessibility who can also help you advocate for accessibility clauses in purchase contracts before agreeing to subscribe to a given tool or service. you may also foster collaboration with an inclusive community of practice if you have one at your library. further, the teaching and learning unit on your campus may offer support for integrating technology with pedagogy to promote the engagement and learning experience of all students attending il instruction. this collaboration may be impactful for the library and the campus teaching community. as librarians with teaching responsibilities, we usually do not work in isolation. instruction librarians can also serve as a resource for teaching faculty who may want to incorporate accessible tech tools into their instruction. in addition, librarians could investigate professional organizations that provide support and development in understanding digital accessibility. while a framework for assessing tech tools for accessibility does not currently exist, the development of standards and best practices would be beneficial for librarians, software developers, and accessibility professionals alike. we hope to undertake future research and consultation to develop such frameworks with colleagues, possibly through ala round tables or acrl sections focused on instruction and accessibility. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 21 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson next steps our next steps include sharing these reports with the companies who created the five tools we tested. we will ask them to prioritize both the most severe issues and those issues that are easy to fix and that impact user experience. we will also underscore those areas that surprised us, such as inconsistencies between screen readers for the same issue in a given environment. the goal of this outreach will be to build relationships with tech tool developers so that continued dialogue and testing can occur. the ultimate goal is a more accessible learning environment for everyone with technology vendors as partners in this journey. 
conclusion advocating for digital accessibility in research libraries requires relationship and capacity building. the challenges faced during emergency remote learning illustrate the necessity of campus units working together to ensure student inclusion and success. increased collaborations between academic libraries, tech tool developers, and digital accessibility offices mean that all parties can benefit from mutual expertise. librarians may share the kinds of tech tools being used in il sessions, while accessibility offices may test those tools and provide recommendations for improvement, which may then be leveraged when working with software companies to advocate for positive change. if more people are aware of digital accessibility vocabulary, needs, and resources across campus, that can also augment the number of people available to respond to and triage needs when future emergencies arise. acknowledgment we would like to thank scott holman and eric klinger from the cu boulder writing center for their help revising this manuscript. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 22 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson appendix a: tester instruction and script background poll everywhere is an online platform used in classrooms to engage with students through questions, surveys, and polls. people can sign up for a free account or for one of four subscription based account options. the free account allows users to create unlimited questions, have access to webinar tutorials, and upload images as question choices. this tool also allows people to respond via browser, sms, or app; to export data and screenshots; and to share to social networks, though some of these features are limited with a free account. poll everywhere script 1. type in your browser or click on the link provided. a pop-up might show on your screen. agree to the cookie policy if it does. 2. you may be prompted to introduce yourself and enter the screen name you would like to appear alongside your responses. 3. click continue. the survey will let you know that there are six questions. click start survey. 4. the first question is multiple choice. select your favorite sport. 5. click next on the upper right-hand corner. 6. the second question is a short response. type your favorite ice cream flavor. click submit. you can enter as many answers as you want. when you are ready to go to the next question click next. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 23 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson 7. the third question is also a short response. type in your favorite food. you can enter as many answers as you want. when you are ready to go to the next question click next. 8. the fourth question is also a short response. type what you are looking forward to this semester. you can enter as many answers as you want. when you are ready to go to the next question click next. 9. the fifth question is a clickable image question. click on the face that describes how you are feeling today. for this question, if you want to clear your response and enter a new one, you may do so by clicking “clear last response.” when you are ready, click next. 10. the sixth question is a ranking question. you need to use the arrow feature, which appears when you click next to the image. 
move images up and down organizing them from favorite to least favorite. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 24 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson for this question, if you want to clear your response and enter a new one, you may do so by clicking “clear last response.” when you are ready, click submit. 11. click finish in the upper right-hand corner. a screen will appear that says “all done!” the results of the survey are only available when the creator of the survey presents them in class. we were not able to figure out a way for respondents to access group responses asynchronously. notes • we noticed that when preparing questions 4 and 6, we were not prompted to enter alt-text by default. • the creator of the poll must enable alt-text for clickable image questions (such as question 4) by going to the user profile and selecting “features lab.” • alt-text did not seem to be available for question 6. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 25 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson appendix b: digital accessibility assessment report for poll everywhere information • testing tools: o windows 10 / jaws 2021/ nvda 2021.3.1 / google chrome (most recent version) o pixel 3a, android 12 / talkback / last updated 9/30/21 o iphone 12 mini / voiceover / safari ios 14.3 • testing dates: february 27, 28, and march 3, 2022 summary this document provides an overview of the issues the digital accessibility office (dao) identified on the poll everywhere platform. overall, we found the site to be relatively accessible for many individuals with disabilities or who use assistive technologies or alternative forms of access depending on the question type. the questions with images—to rank or select—were inaccessible. that said, through our testing, we found five severe issues, two significant issues, one minor issue, and one usability issue. severe issues represent items that create access barriers and need to be remediated, significant issues represent items that create a great deal of difficulty and should be remediated, and minor issues represent items that are the lowest priority but would be good to remediate. usability issues can impact users of any ability. if there are questions, concerns, or the desire to see demos of the issues presented in this report, please reach out to the assessment & usability testing team. please also consider filling out the assessment & usability testing feedback form to help us improve our testing protocols. issues severe graphics are unlabeled or inappropriately labeled • in question 6, there are four pictures of animals. the screen readers read all four images as “unlabeled images.” there is no differentiation between the four images. appropriate image descriptions are needed. o additionally, while reviewing the history of submissions, the answers are a list that read “(an image), (an image), (an image), (an image)” information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 26 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson • there were several elements that have dots in the label name. when using voiceover, these elements were read as “unpronounceable. [braille dots] ...” followed by numbers. 
o these elements included the marker on the emoji image, the up and down arrow buttons on question 6, and the finished icon. element presents gesture/navigation traps • on question 6, while using voiceover and talkback, the user could not swipe between the answer options. this made the buttons, links, and text before and after the options inaccessible. o a tester was able to leave the trap, but they had to use direct touch and focus landed outside the answers. o additionally, while using talkback, there was not any indication that the image was being moved up or down. element not available by keyboard or screen reader • question 5 is an emoji question where the user would need a mouse or direct touch (while not using a screen reader) to answer successfully. the alternative text says there are emojis, but the user does not know what five emotes or different colors are presented. to activate, the user selects “enter” or double taps (mobile screen reader). this makes a random selection and places the marker in the middle of the image without a way to move the marker to the appropriate emoji. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 27 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson errors do not get focus • during one instance, a user received an error that the response was not submitted. during this one instance, the focus was not pushed to the error message. the user would have to know it was there. ideally, the focus would be pushed to the error so all users would be aware that an error had occurred. significant element state not indicated • in question 6, the unlabeled image reads as “clickable” to nvda. when selecting enter, the state of the element is not announced. visually, the image gets larger. inconsistent focus handling • focus handling for all tools could be improved. focus goes to different areas of the page after responses, returning to previous questions, or refreshing the page. o focus inconsistencies depended on the screen reader. while going through the questions, focus would go to the top of the page, “close app download offer” button, “submit,” or “next.” ideally, focus would be on the heading 1 for each question. minor same information is presented to screen reader multiple times • while using voiceover, the instructions, questions, and answers were announced multiple times. this was noted on several occasions. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 28 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson usability insufficient color contrast (4.5:1) • in the multiple-choice question, after selecting an answer, the question’s color becomes lighter. the lighter color has insufficient color contrast for both the answer selected (2:1) and the answers not selected (1.8:1). information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 29 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson endnotes 1 kathia ibacache et al., “emergency remote library instruction and tech tools: a matter of equity during a pandemic,” information technology and libraries 40, no. 2 (2021): 8, https://doi.org/10.6017/ital.v40i2.12751. 2 there are many examples of guides which include ada compliance and library spaces, including but not limited to william w. sannwald, checklist of library building design considerations, 6th ed. 
(chicago: ala editions, an imprint of the american library association, 2016); and carrie scott banks et al., including families of children with special needs: a how-to-do-it manual for librarians (chicago: american library association, 2014). 3 "introduction to the ada," ada.gov: united states department of justice civil rights division, accessed june 15, 2022, https://www.ada.gov/ada_intro.htm. 4 "introduction to the ada." 5 ariel pomputius, "assistive technology and software to support accessibility," medical reference services quarterly 39, no. 2 (2020): 203, https://doi.org/10.1080/02763869.2020.1744380. 6 fernando h. f. botelho, "accessibility to digital technology: virtual barriers, real opportunities," assistive technology 33, no. s1 (2021): s31, https://doi.org/10.1080/10400435.2021.1945705. 7 botelho, "accessibility to digital technology," s27. 8 jonathan lazar, "planning for digital accessibility in research libraries," research libraries issues, no. 302 (2021): 20, https://doi.org/10.29242/rli.302.3. 9 "digital accessibility," georgetown law, accessed june 15, 2022, https://www.law.georgetown.edu/your-life-career/campus-services/information-systems-technology/digital-accessibility/. 10 lazar, "planning for digital accessibility," 21. 11 lazar, "planning for digital accessibility," 19. 12 lazar, "planning for digital accessibility," 26–28. 13 education of individuals with disabilities, 20 u.s.c. §§ 1400–1485 (suppl. 2 1988), https://tile.loc.gov/storage-services/service/ll/uscode/uscode1988-03202/uscode1988-032020033/uscode1988-032020033.pdf; see also kathleen puckett and kim w. fisher, "assistive technology," in the sage encyclopedia of intellectual and developmental disorders, ed. ellen b. braaten (thousand oaks, ca: sage publications, inc., 2018), 100–101. 14 puckett and fisher, "assistive technology," 100. 15 claire kearney-volpe and amy hurst, "accessible web development: opportunities to improve the education and practice of web development with a screen reader," acm transactions on accessible computing 14, no. 2 (july 21, 2021): 8:2. 16 emily k. chan and lorrie a. knight, "'clicking' with your audience: evaluating the use of personal response systems in library instruction," communications in information literacy 4, no. 2 (march 1, 2011): 192–201, https://doi.org/10.15760/comminfolit.2011.4.2.96. 17 referring to the use of padlet to foster collaboration in a statistics course, henrik skaug saetra suggests that students feel more welcome to ask basic questions in an anonymous environment, in "using padlet to enable online collaboration mediation and scaffolding in a statistics course," education sciences 11, no.
17 referring to the use of padlet to foster collaboration in a statistics course, henrik skaug saetra suggests that students feel more welcome to ask basic questions in an anonymous environment, in "using padlet to enable online collaboration mediation and scaffolding in a statistics course," education sciences 11, no. 5 (2021), 6, https://eric.ed.gov/?id=ej1297247. christopher j. e. anderson notes that anonymity invites classroom discussion participation for introverted students and states that answers can be reviewed without "requiring participants to reveal their choice, thus removing stigmas that keep many introverted students from orally participating," in "repurposing digital devices: using poll everywhere as a vehicle for classroom participation," journal of teaching and learning with technology 7 (2018): 154, https://eric.ed.gov/?id=ej1307006.
18 citing a 2010 article by b. jean mandernach and jana hackathorn, jared hoppenfeld states that anonymity provides a means for students to answer honestly, in "keeping students engaged with web-based polling in the library instruction session," library hi tech 30, no. 2 (2012): 238, https://doi.org/10.1108/07378831211239933. see also anderson, "repurposing digital devices," 154.
19 this paper considered pedagogical approaches when using padlet in the classroom, noting that this tool did not enhance criticality or students' desire to counter a post by a classmate; see ann rosnida md deni and zainor izat zainal, "padlet as an educational tool: pedagogical considerations and lessons learnt," proceedings of the 10th international conference on education technology and computers (october 2018), 157, https://doi.org/10.1145/3290511.3290512.
20 some authors surmise that an instructional game could be used to prepare students for exams, for example, patricia a. baszuk and michele l. heath, "using kahoot! to increase exam scores and engagement," journal of education for business 95, no. 8 (2020): 550, https://doi.org/10.1080/08832323.2019.1707752. examining technology as a tool to enhance teaching and learning in engineering classes, vian ahmed and alex opoku mentioned that students found kahoot! a useful online tool that helped them reflect, apply knowledge, and receive feedback, in "technology supported learning and pedagogy in times of crisis: the case of covid-19 pandemic," education and information technologies 27 (2021), https://doi.org/10.1007/s10639-021-10706-w. darren h. iwamoto et al. assert that kahoot! provided students with a fun activity that helped them memorize important concepts, in darren h. iwamoto, jace hargis, erik jon taitano, and kv vuong, "analyzing the efficacy of the testing effect using kahoot! on student performance," turkish online journal of distance education 18, no. 2 (2017): 83, 89, https://eric.ed.gov/?id=ej1145220.
21 iwamoto et al., "analyzing the efficacy," 83, 89.
22 carolyn m. plump and julia larosa, "using kahoot! in the classroom to create engagement and active learning: a game-based technology solution for elearning novices," management teaching review 2, no. 2 (2017): 156, https://doi.org/10.1177/2379298116689783.
23 maureen knapp, "technology for one-shot instruction and beyond," journal of electronic resources in medical libraries (2014): 224, https://doi.org/10.1080/15424065.2014.969969.
24 liya deng, "assess and engage: how poll everywhere can make learning meaningful again for millennial library users," journal of electronic resources librarianship 31, no. 2 (2019): 63, https://doi.org/10.1080/1941126x.2019.1597437.
25 a survey- and interview-based study by engineering faculty members ahmed and opoku examined both instructors' and students' perceptions of technology-supported learning during times of crisis. with regard to technological and pedagogical best practices, student participants noted that interactive feedback tools such as kahoot! helped them synthesize and apply their knowledge. as one student said, "kahoot! was a fun and interactive application and engaging." see ahmed and opoku, "technology supported learning and pedagogy," 381.
26 razzaqul ahshan, "a framework of implementing strategies for active student engagement in remote/online teaching and learning during the covid-19 pandemic," education sciences 11, no. 9 (2021): 487, https://doi.org/10.3390/educsci11090483.
27 jillianne code, rachel ralph, and kieran forde, "pandemic designs for the future: perspectives of technology education teachers during covid-19," information and learning sciences 121, no. 5/6 (january 1, 2020): 419–31, https://doi.org/10.1108/ils-04-2020-0112.
28 code, ralph, and forde, "pandemic designs," 426.
29 one such study examines the impact of covid-19 on higher education in ethiopia; see berhanu abera, "the effects of covid-19 on ethiopian higher education and their implication for the use of pandemic-transformed pedagogy: 'corona batches' of addis ababa university in focus," journal of international cooperation in education 24, no. 2 (2021): 3–25. another study focuses on polish primary school integration of ipads; see lucyna kopciewicz and hussein bougsiaa, "understanding emergent teaching and learning practices: ipad integration in polish school," education and information technologies 26, no. 3 (2021): 2916, https://doi.org/10.1007/s10639-020-10383-1. a third article explores pandemic-transformed pedagogy from the perspectives of early childhood instructors in the caribbean; see sabeerah abdul-majied, zoyah kinkead-clark, and sheron c. burns, "understanding caribbean early childhood teachers' professional experiences during the covid-19 school disruption," early childhood education journal (2022), https://doi.org/10.1007/s10643-022-01320-7.
30 paul f. burke et al., "exploring teacher pedagogy, stages of concern and accessibility as determinants of technology adoption," technology, pedagogy and education 27, no. 2 (2018): 149–63, https://doi.org/10.1080/1475939x.2017.1387602.
31 burke et al., "exploring teacher pedagogy," 158–59.
32 ibacache, rybin koob, and vance, "emergency remote library instruction," 9.
33 the authors of this study hold roles as academic subject specialist librarians and digital accessibility office staff, including accessibility and usability team testers.
34 "free accessibility tools and assistive technology you can use today," bureau of internet accessibility (blog), october 26, 2018, https://www.boia.org/blog/free-accessibility-tools-and-assistive-technology-you-can-use-today; see also "chapter 1, introducing voiceover," in voiceover getting started guide, apple, inc., accessed june 16, 2022, https://www.apple.com/voiceover/info/guide/_1121.html.
35 "get started on android with talkback," android accessibility help, accessed june 16, 2022, https://support.google.com/accessibility/android/answer/6283677?hl=en.
36 "about nvda," nv access, accessed june 17, 2022, https://www.nvaccess.org/about-nvda/.
37 jaws was developed by freedom scientific members with vision loss; see "jaws®—freedom scientific," accessed june 16, 2022, https://www.freedomscientific.com/products/software/jaws/.
38 "colour contrast analyser (cca)," tpgi, accessed june 16, 2022, https://www.tpgi.com/color-contrast-checker/.
39 ibacache, rybin koob, and vance, "emergency remote library instruction," 9.
40 jamboard is very visual, with multiple options such as sticky notes, drawing pens, and image searching. other tools such as kahoot! and mentimeter are not solely visual; they also include additional moving parts, such as sounds and other notifications.
41 lazar, "planning for digital accessibility," 21.
42 lazar, "planning for digital accessibility," 21.
43 lazar indicated that captioning benefits people who process information in different ways, who are learning the language being used, or who may otherwise struggle to understand a dialect, in "planning for digital accessibility," 21.
44 forrester research, inc., accessible technology in computing: examining awareness, use, and future potential, redmond, wa: microsoft corporation (2004): 9, http://download.microsoft.com/download/0/1/f/01f506eb-2d1e-42a6-bc7b-1f33d25fd40f/researchreport-phase2.doc.
the open access citation advantage: does it exist and what does it mean for libraries?
colby lewis
information technology and libraries | september 2018
colby lewis (colbyllewis@gmail.com), a second year master of science in information student at the university of michigan school of information, is winner of the 2018 lita/ex libris student writing award.
abstract
the last literature review of research on the existence of an open access citation advantage (oaca) was published in 2011 by philip m. davis and william h. walters. this paper reexamines the conclusions reached by davis and walters by providing a critical review of oaca literature that has been published since 2011 and explores how increases in open access publication trends could serve as a leveraging tool for libraries against the high costs of journal subscriptions.
introduction
since 2001, when the term "open access" was first used in the context of scholarly literature, the debate over whether there is a citation advantage (ca) caused by making articles open access (oa) has plagued scholars and publishers alike.1 to date, there is still no conclusive answer to the question, or at least not one that the premier publishing companies have deemed worthy of acknowledging. there have been many empirical studies, but far fewer with randomized controls. the reasons for this range from data access to the numerous potential "methodological pitfalls" or confounding variables that might skew the data in favor of one argument or another. the most recent literature review of articles that explored the existence (or lack thereof) of an open access citation advantage (oaca) was published in 2011 by philip m. davis and william h. walters. in that review, davis and walters ultimately concluded that "while free access leads to greater readership, its overall impact on citations is still under investigation. the large access-citation effects found in many early studies appear to be artifacts of improper analysis and not the result of a causal relationship."2 this paper seeks to reexamine the conclusions reached by davis and walters in 2011 by providing a critical review of oaca literature that has been published since their 2011 literature review.3 this paper will examine the methods and conclusions provoking such criticisms and whether these criticisms are addressed in the studies.
i will begin by identifying some of the top confounders in oaca studies, in particular the potential for self-archiving bias. i will then examine articles from july 2011, when davis and walters published their findings, to july 2017. there will be a few exceptions to this time frame, but the studies cited in figures 4 and 5 are entirely from this period. in addition to reviewing oaca studies since davis and walters' march 2011 study, i will explore the implications of an oaca on the future of publishing and the role of librarians in the subscription process. as antelman points out in her association of college and research libraries conference paper, "leveraging the growth of open access in library collection decision making," it is the responsibility of libraries to use the newest data and technology available to them in the interest of best serving their patrons and advancing scholarship.4 in connecting oaca studies and the potential bargaining power an oaca could bring libraries, i assess the current roles that universities and university libraries play in promoting (or not) oa publications and the implications of an oaca for researchers, universities, and libraries, and i provide suggestions on how recent research could influence the present trajectory. i conclude by summarizing what my findings tell us about the existence (or lack thereof) of an oaca, and what these findings imply for the future of library journal subscriptions and the publish-or-perish model for tenure. lastly, i will suggest some alternative metrics to citations that could be used by libraries in determining future journal subscriptions and general collection management.
self-archiving bias and why it doesn't matter
the idea of a self-archiving bias is based upon the concept that, if faced with a choice, authors will always opt to make their best work more widely available. effectively, when open access is not mandated, these articles may be specifically chosen to be made open access to increase readership and, hypothetically, citations.5 this biased selection method has the potential to confound the results of oaca studies because of the intuitive notion that an author's best work is much more likely to be cited than any of their other work. its effect is amplified by making this work available oa, but it prevents studies in which articles were self-archived from being able to convincingly claim that the citation advantage these articles received was due to oa and not to its inherent quality and subsequent likelihood to be cited anyway.
in a 2010 study, gargouri et al. determined that articles by authors whose institutions mandated self-archiving (such as in an institutional repository [ir]) saw an oaca just as great for articles that were mandated to be oa as for articles that were self-selected to be oa.6 this by no means proves a causal relationship between oa and ca, but does counter the notion that self-archived articles are an uncontrollable confounder that automatically compromises the legitimacy of oaca studies.7 ottaviani affirms this conclusion in a 2016 study in which he writes, "in the long run better articles gain more citations than expected by being made oa, adding weight to the results reported by gargouri et al."8 in short, claiming that articles self-selected for self-archiving irreparably confound oaca studies ignores the fact that these authors have accounted for the likelihood that articles of higher quality will inherently be cited more. as gargouri et al. put it, "the oa advantage [to self-archived articles] is a quality advantage, rather than a quality bias" (italics in original).9
gold versus green and their effect on oaca analyses
many critics of oaca studies have argued that such studies do not distinguish between gold oa, green oa, and hybrid (subscription journals that offer the option for authors to opt-in to gold oa) journals in their sample pool, thus skewing the results of their studies. in fact, there are many acknowledged subcategories of oa, but for the purposes of this paper, i will primarily focus on gold, green, and hybrid oa. figure 1, provided by elsevier as a guide for their clients, distinguishes between gold and green oa.10 while the chart provided applies specifically to those looking to publish with elsevier, it highlights the overarching differences between gold oa and green oa. a comprehensive list of oa journals is available through the directory of open access journals (doaj) website (https://doaj.org/).
figure 1. elsevier explains to potential clients their options for publishing oa with elsevier and the differences between publishing with gold oa versus green oa.
the argument that not distinguishing between gold oa and green oa in oaca studies distorts study results primarily stems from the potential for skew in green oa journals. green oa journals allow authors to self-archive their articles after publication, but the articles are often not made full oa until an embargo period has passed. this problem was addressed in a recent study conducted by science-metrix and 1science, who manually checked and coded approximately 8,100 top-level domains (tlds).11 it is important to note that this study was made available as a white paper on the 1science website and has not been published in a peer-reviewed journal. additionally, 1science is a company built on providing oa solutions to libraries, which means they have a vested interest in proving the existence of an oaca. however, just as publishers such as elsevier have a vested interest in a substantial oaca not existing, this should not prevent us from examining their data. for their study, 1science did not distinguish hybrid journals as being in a distinct journal category.
critics, such as the editorial director of journals policy for oxford university press, david crotty, were quick to fixate on this lack of distinction as a means of discrediting the study.12 employees of elsevier were similarly inclined to criticize the study, declaring that it, "like many others [studies] on this topic, does not appear to be randomized and controlled."13 however, archambault et al., acknowledging that their study "does not examine the overlap between green and gold," have provided an extremely comprehensive sample pool, examining 3,350,910 oa papers published between 2007 and 2009 in 12,000 journals.14 this paper examines the notion that "the advantage of oa is partly due to citations having a chance to arrive sooner . . . and concludes that the purported head start of oa papers is actually contrary to observed data."15
in a more recent study published in february 2018, piwowar et al. examine the prevalence of oa and average relative citation (arc) based on three sample groups of one hundred thousand articles each: "(1) all journal articles assigned a crossref doi, (2) recent journal articles indexed in web of science, and (3) articles viewed by users of unpaywall, an open-source browser extension that lets users find oa articles using oadoi."16 unlike the 1science study, piwowar et al. had a twofold purpose: to examine the prevalence of oa articles available on the web and whether an oaca exists based on their sample findings. i do not include their results in my literature review because of the dual focus of their study, although i do compare their results with those of archambault et al. and analyze the implications of their findings.
bronze: neither gold nor green
in their article, piwowar et al. introduce a new category of oa publication: bronze. if gold oa refers to complete open access at the time of publication, and green oa refers to articles published in a paywalled journal but ultimately made oa either after an embargo period or via an ir, bronze oa refers to oa articles that somehow don't fit into either of these categories. piwowar et al. define bronze oa articles as "free to read on the publisher page, but without any clearly identifiable license."17 however, as crotty points out in a scholarly kitchen article reflecting on the preprint version of piwowar et al.'s article, "bronze" already exists as an oa category, but has simply been called "public access."18 while coining "bronze" as a new term for "public access" is helpful in connecting it to oa terms such as "green" and "gold," it is not quite the new phenomenon it is touted to be.
arc as an indication of an oaca
both archambault et al. and the authors of the 1science paper provide the arc as a means of establishing a paper's impact on the larger research community.19 within their arc analyses, archambault et al. distinguish between non-oa and oa, within which they differentiate between gold and green oa (figure 2). piwowar et al. group papers by closed (non-oa) and oa, with the following oa subcategories: bronze, hybrid, gold, and green oa (figure 3). an arc of 1.0 is the expected amount of citations an article will receive "based on documents published in the same year and [national science foundation (nsf)] specialty."20 based on this standard, articles with an arc above or below 1.0 represent a citation impact that percentage above or below the expected citation impact of like articles. for example, an article with an arc of 1.23 has received 23 percent more citations than expected for articles of similar content and quality. this scale can be incredibly useful in determining the presence of a citation advantage, and it can enable researchers to determine overall ca patterns.
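concretely, an arc is an article's citation count divided by the mean citation count of its publication-year and nsf-specialty cohort, averaged over whatever group of articles is being compared. the python sketch below uses invented numbers purely to show the shape of the calculation; the actual computations in archambault et al. and piwowar et al. are more elaborate.

```python
# minimal sketch of an average relative citation (arc) calculation.
# the years, specialties, and citation counts here are made up for illustration.

from collections import defaultdict
from statistics import mean

articles = [
    # (year, nsf_specialty, is_oa, citations)
    (2014, "ecology", True, 12),
    (2014, "ecology", False, 7),
    (2014, "ecology", True, 9),
    (2014, "ecology", False, 5),
    (2015, "dentistry", True, 3),
    (2015, "dentistry", False, 4),
]

# expected citations = mean citations of all articles published in the same
# year and specialty (the "like articles" baseline described above)
baseline = defaultdict(list)
for year, field, _, cites in articles:
    baseline[(year, field)].append(cites)
expected = {key: mean(vals) for key, vals in baseline.items()}

# relative citation of one article = citations / expected citations;
# the arc of a group is the mean of those ratios
def arc(group):
    return mean(c / expected[(y, f)] for y, f, _, c in group)

oa_arc = arc([a for a in articles if a[2]])
closed_arc = arc([a for a in articles if not a[2]])
print(f"oa arc = {oa_arc:.2f}, closed arc = {closed_arc:.2f}")
```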
figure 2. research impact of paywalled (not oa) versus open access (oa) papers "computed by science-metrix and 1science using oaindx and the web of science." archambault et al., "research impact of paywalled versus open access papers," white paper, science-metrix and 1science, 2016, http://www.1science.com/1numbr/.
critics' fixation on the "randomized and controlled" nature of the 1science study ignores the fact that the authors do not claim causation. rather, their findings suggest the existence of an oaca when comparing oa (in all forms) and non-oa (in any form) articles (see figure 2). the authors ultimately conclude that "in all these fields, fostering open access (without distinguishing between gold and green) is always a better research impact maximization strategy than relying on strictly paywalled papers."21 unlike archambault et al., piwowar et al. found that gold oa articles had a significantly lower arc, and that the average arc of all oa balances out to 1.18 because of the high arcs of bronze (1.22), hybrid (1.31), and green (1.33). however, both studies found that non-oa (referred to by piwowar et al. as "closed") articles had an arc below 1.0, suggesting a definitive correlation between oa (without specifying type) and an increase in citations.
figure 3. "average relative citations of different access types of a random sample of web of science (wos) articles and reviews with a digital object identifier (doi) published between 2009 and 2015." heather piwowar et al., "the state of oa: a large-scale analysis of the prevalence and impact of open access articles," peerj, february 13, 2018, https://doi.org/10.7717/peerj.4375.
six years and what has changed in oaca research
between july 2011 and the publication of piwowar et al.'s work in february 2018, nine new oaca studies have been published in peer-reviewed journals. of these, five only look at the oaca in one field, such as cytology or dentistry. the other four are multidisciplinary studies, two of which are repository-specific and only use articles from deep blue and academia.edu, respectively. this is important to note because of critics' earlier stated objections to the use of studies that are not randomized controlled studies. however, the deep blue study can still be considered a randomized controlled sample group because the authors are not self-selecting articles to upload to the repository as they are with academia.edu. rather, articles were made accessible through deep blue "via blanket licensing agreements between the publishers and the [university of michigan] library."22 some of the field-specific studies use sample sizes that may not reflect a general oaca, but rather one only for that field, and in certain cases, only for a single journal.
field-specific studies
between july 2011 and july 2017, five field-specific studies were conducted to determine whether an oaca existed in those fields. i summarize the scope and conclusions of these studies in table 1.
as you can see from the table, the article sample size vastly varied between studies, but that can likely be accounted for by considering the specific fields studied since there are only five major cytopathology journals and nearly fifty major ecology journals. piwowar et al. acknowledge this in their study, noting that the nsf assigns all science journals "exactly one 'discipline' (a high-level categorization) and exactly one 'specialty' (a finer-grained categorization)."23 the more deeply nested in an nsf discipline a subject is, the more specialized the field becomes and the fewer journals there are on the subject. this alone is reason not to extrapolate from the results of these studies and project their results on the existence of oaca across all fields.
only two of these studies, those focused on an oaca in dentistry and ecology, can be considered truly randomized controlled studies. both the cytopathology and marine ecology studies chose a specific set of journals from which to draw their entire sample pool. while the dentistry and ecology studies can be considered randomized controlled in nature, they still only reflect the occurrence (or lack thereof) of an oaca in those specific fields. it would be irresponsible to allow the results from studies in a single field of a single discipline to represent oaca trends across all disciplines. therefore, it is surprising that elsevier employees use the dentistry study to make such a claim. hersh and plume write, "another recent study by hua et al (2016) looking at citations of open access articles in dentistry found no evidence to suggest that open access articles receive significantly more citations than non-open access articles."24 the key phrase missing from the end of this analysis is in dentistry. one might question whether a claim about multidisciplinary oaca can effectively be extrapolated from a single-field analysis. the authors do, two sentences later, qualify their earlier statement by saying, "in dentistry at least, the type of article you publish seems to make a difference but not oa status."25 that is indeed what this study seems to show, and is therefore a logical claim to make. likewise, the three empirical studies in table 1 show that, for those respective fields, oa status does correlate to a citation advantage. in the case of the ecology study, the authors are confident enough in their randomized controlled methodology to claim causation.26 the ecology study is the most recently published oaca study, and its authors were able to learn from similar past studies about the necessary controls and potential confounders in oaca studies. with this knowledge, tang et al. determined that:
by comparing oa and non-oa articles within hybrid journals, our estimate of the citation advantage of oa articles sets controls for many factors that could confound other comparisons. numerous studies have compared articles published in oa journals to those in non-oa journals, but such comparison between different journals could not rule out the impacts of potentially confounding factors such as publication time (speed) and quality and impact (rank) of the journal. these factors are effectively controlled with our focus on hybrid journals, thereby providing robust and general estimates of citation advantages on which to base publication decisions.27
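the within-journal design tang et al. describe can be reduced to a simple computation: compare oa and non-oa citation counts only inside the same hybrid journal, then average the per-journal differences so that journal-level factors cancel out. the sketch below uses invented journal names and citation counts purely for illustration; the published study additionally pairs articles and controls for author-country gni, article length, and other covariates.

```python
# minimal sketch of a within-journal oa vs. non-oa comparison.
# all records are invented; this is a simplified stand-in for tang et al.'s design.

from statistics import mean

records = [
    # (journal, is_oa, citations)
    ("journal a", True, 30), ("journal a", False, 22),
    ("journal a", True, 25), ("journal a", False, 18),
    ("journal b", True, 10), ("journal b", False, 9),
    ("journal b", True, 12), ("journal b", False, 7),
]

journals = {j for j, _, _ in records}
per_journal_advantage = []
for journal in journals:
    oa = [c for j, is_oa, c in records if j == journal and is_oa]
    non_oa = [c for j, is_oa, c in records if j == journal and not is_oa]
    # journal-level factors (rank, speed, field) are shared by both groups here
    per_journal_advantage.append(mean(oa) - mean(non_oa))

print(f"mean within-journal oa citation advantage: {mean(per_journal_advantage):.1f} citations")
```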
summary of key field-specific studies
author | study design | content | number of articles | controls | results, interpretation, and conclusion
clements 2017 | empirical | 3 hybrid-oa marine ecology journals | all articles published in these journals between 2009 and 2012; specific number not provided | jif; article type; self-citations | "on average, open access articles received more peer-citations than non-open access articles." oaca found.
frisch et al. 2014 | empirical | 5 cytopathology journals; 1 oa and 4 non-oa | 314 articles published between 2007 and 2011 | jif; author frequency; publisher neutrality | "overall, the averages of both cpp and q values were higher for oa cytopathology journal (cytojournal) than traditional non-oa journals." oaca found.
gaulé and maystre 2011 | empirical | 1 major biology journal | 4,388 articles published between 2004 and 2006 | last author characteristics; article quality | "we find no evidence for a causal effect of open access on citations. however, a quantitatively small causal effect cannot be statistically ruled out." oaca not found.
hua et al. 2016 | randomized controlled | articles randomly selected from pubmed database, not specific dentistry journals | 908 articles published in 2013 | randomized article selection; exclusion of articles unrelated to dentistry; multidatabase search to determine oa status | "in the present study, there was no evidence to support the existence of oa 'citation advantage', or the idea that oa increases the citation of citable articles." oaca not found.
tang et al. 2017 | randomized controlled | 46 hybrid-oa ecology journals | 3,534 articles published between 2009 and 2013 | gni of author country; randomized article pairing; article length | "overall, oa articles received significantly more citations than non-oa articles, and the citation advantage averaged approximately one citation per article per year and increased cumulatively over time after publication." oaca found.
table 1. scope, controls, and results of field-specific oaca studies since 2011. based on a chart in stephan mertens, "open access: unlimited web based literature searching," deutsches ärzteblatt international 106, no. 43 (2009): 711. jif, journal impact factor; cpp, citations per publication; q, q-value (see frisch, nora k., romil nathan, yasin k. ahmed, and vinod b. shidham. "authors attain comparable or slightly higher rates of citation publishing in an open access journal (cytojournal) compared to traditional cytopathology journals—a five year (2007–2011) experience." cytojournal 11, no. 10 (april 2014). https://doi.org/10.4103/1742-6413.131739 for specific equation used.)
summary of key multidisciplinary studies
author | study design | content | number of articles | controls | results, interpretation, and conclusion
mccabe and snyder 2014 | empirical | 100 journals in ecology, botany, and multidisciplinary science | all articles published in these journals between 1996 and 2005; specific number not provided | jif; journal founding year | "we found that open access only provided a significant increase for those volumes made openly accessible via the narrow channel of their own websites rather than the broader pubmed central platform." oaca found.
niyazov et al. 2016 | empirical | unspecified number of journals across 23 academic divisions | 31,216 articles published between 2009 and 2012 | field; jif; publication vs. upload date | "we find a substantial increase in citations associated with posting an article to academia.edu. . . . we find that a typical article that is also posted to academia.edu has 49% more citations than one that is only available elsewhere online through a non-academia.edu venue." oaca found for academia.edu.
ottaviani 2016 | randomized controlled | unspecified number of journals who have blanket licensing agreements between the publishers and the university of michigan library | 93,745 articles published between 1990 and 2013 | self-selection | "even though effects found here are more modest than reported elsewhere, given the conservative treatments of the data and when viewed in conjunction with other oaca studies already done, the results lend support to the existence of a real, measurable, open access citation advantage with a lower bound of approximately 20%." oaca found.
sotudeh et al. 2015 | empirical | 633 apc-funded oa journals published by springer and elsevier | 995,508 articles published between 2007 and 2011 | journals who adopted oa policies after 2007; journals with non–article processing charge oa policies | "the apc oa papers are, also, revealed to outperform the ta ones in their citation impacts in all the annual comparisons. this finding supports the previous results confirming the citation advantage of oa papers." oaca found.
table 2. scope, controls, and results of multi-disciplinary oaca studies since 2011. jif, journal impact factor; apc, article processing charge; ta, toll access.
based on the randomized controlled methodology that tang et al. found hybrid journals to provide, it is possible that this study may serve as an ideal model for future larger oaca studies across multiple disciplines. however, more field-specific hybrid journal studies will have to be conducted before determining if this model would be the most accurate method for measuring oaca across multiple disciplines in a single study.
multidisciplinary studies
the multidisciplinary oaca studies conducted since 2011 include a single randomized control study and three empirical studies (table 2). all these studies found an oaca; in the case of niyazov et al., an oaca was found specifically for articles posted to academia.edu. i included this study because it is an important contribution to the premise that a relationship exists between self-selection and oaca. niyazov et al. highlight this point in the section "sources of selection bias in academia.edu citations," explaining that "even if academia.edu users were not systematically different than non-users, there might be a systematic difference between the papers they choose to post and those they do not. as [many] . . . have hypothesized, users may be more likely to post their most promising, 'highest quality' articles to the site, and not post articles they believe will be of more limited interest."28 to underscore this point, i refer to gargouri et al., who stated that "the oa advantage [to self-archived articles] is a quality advantage, rather than a quality bias" (italics in original).29 again, it is unsurprising that articles of higher caliber are cited more and that making such articles more readily available increases the amount of citations they would likely already receive.
similar to my conclusion in the field-specific study section, we simply need more randomized controlled studies, such as ottaviani's, to determine the nature and extent of the relationship between oa and ca across multiple disciplines.
conclusions
critics of some of the most recent studies, specifically archambault et al. and ottaviani, have argued that authors of oaca studies are too quick to claim causation. while a claim of causation does indeed require strict adherence to statistical methodology and control of potential confounders, few of the authors i have examined actually claim causation. they recognize that the empirical nature of their studies is not enough to prove causation, but rather to provide insight into the correlation between open access and a citation advantage. in all their conclusions, these authors acknowledge that further studies are needed to prove a causal relationship between oa and ca. the recent work published by piwowar et al. provides a potential model for replication by other researchers, and ottaviani offers a replicable method for other large research institutions with non-self-selecting institutional repositories. alternatively, field-specific studies conducted in the style of tang et al. across all fields would serve to provide a wider array of evidence for the occurrence of field-specific oaca and therefore of a more widespread oaca.
recent developments in oa search engines have created alternative routes to many of the same articles offered by subscriptions, but at a fraction (if any) of the cost. antelman proposed that libraries use an oa-adjusted cost per download (oa-adj cpd), a metric that "subtracts the downloads that could be met by oa copies of articles within subscription journals," as a tool for negotiating the price of journal subscriptions.30 by calculating an oa-adj cpd, libraries could potentially leverage their ability to access journal articles through means other than traditional subscription bundles to save money and encourage oa publication. while antelman suggests using oa-adj cpd as a leveraging tool when making deals with publishers for journal subscriptions, i suggest that libraries use the data-gathering methods of piwowar et al. via unpaywall to determine whether enough articles from a specific journal can be found oa via unpaywall. by using metrics such as those collected by piwowar et al. through unpaywall, the potential confounding variable of articles found through illegitimate means (such as sci-hub) is alleviated. instead, piwowar et al.'s metrics focus on tracking the percentage of material searched by library patrons that can be found oa through the unpaywall browser extension. according to unpaywall's "libraries user guide" page, libraries "can integrate unpaywall into their sfx, 360 link, or primo link resolvers, so library users can read oa copies in cases where there's no subscription access. over 1000 libraries worldwide are using this now."31
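as a rough illustration of how these two ideas could be combined, the sketch below queries the public unpaywall rest api (which reports oa status per doi and asks for a contact email) for the articles patrons requested from one journal, and then computes an oa-adjusted cost per download by subtracting the downloads an oa copy could have satisfied. the dois, subscription cost, and download counts are invented, and this is only one possible way to operationalize antelman's metric.

```python
# minimal sketch: unpaywall lookups plus an oa-adjusted cost per download.
# the dois, price, and download counts below are hypothetical.

import requests

EMAIL = "your-email@example.edu"  # placeholder; unpaywall requests a real contact address

def is_oa(doi: str) -> bool:
    resp = requests.get(f"https://api.unpaywall.org/v2/{doi}", params={"email": EMAIL}, timeout=10)
    resp.raise_for_status()
    return bool(resp.json().get("is_oa"))

# hypothetical usage data for one subscribed journal: doi -> downloads in the period
requested_dois = {
    "10.1000/example.001": 120,
    "10.1000/example.002": 45,
    "10.1000/example.003": 210,
}
annual_subscription_cost = 4_500.00

total_downloads = sum(requested_dois.values())
oa_downloads = sum(n for doi, n in requested_dois.items() if is_oa(doi))

plain_cpd = annual_subscription_cost / total_downloads
# subtract downloads that an oa copy could have met, in the spirit of oa-adj cpd
oa_adjusted_cpd = annual_subscription_cost / max(total_downloads - oa_downloads, 1)
print(f"cost per download: {plain_cpd:.2f}; oa-adjusted: {oa_adjusted_cpd:.2f}")
```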
ideally, scholars will also be more willing to publish papers oa, and institutions will be more supportive of providing the necessary costs for making publications oa. though the publish-or-perish model still reigns in academia, there is great potential in encouraging tenured professors to publish oa by supplementing the costs through institutional grants and other incentives wrapped into a tenure agreement. perhaps through this model, as gargouri et al. have suggested, the longstanding publish-or-perish doctrine will give way to an era of "self-archive to flourish."32
bibliography
antelman, kristin. "leveraging the growth of open access in library collection decision making." acrl 2017 proceedings: at the helm, leading the transformation, march 22–25, baltimore, maryland, ed. dawn m. mueller (chicago: association of college and research libraries, 2017), 411–22. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/leveragingthegrowthofopenaccess.pdf.
archambault, éric, grégoire côté, brooke struck, and matthieu voorons. "research impact of paywalled versus open access papers." white papers, science-metrix and 1science, 2016. http://www.1science.com/1numbr/.
calver, michael c. and j. stuart bradley. "patterns of citations of open access and non-open access conservation biology journal papers and book chapters." conservation biology 24, no. 3 (may 2010): 872–80. https://doi.org/10.1111/j.1523-1739.2010.01509.x.
chua, s. k., ahmad m. qureshi, vijay krishnan, dinker r. pai, laila b. kamal, sharmilla gunasegaran, m. z. afzal, lahri ambawatta, j. y. gan, p. y. kew, et al. "the impact factor of an open access journal does not contribute to an article's citations" [version 1; referees: 2 approved]. f1000 research 6 (2017): 208. https://doi.org/10.12688/f1000research.10892.1.
clarivate analytics. "incites journal citation reports." dataset updated september 9, 2017. https://jcr.incites.thomsonreuters.com/.
clements, jeff c. "open access articles receive more citations in hybrid marine ecology journals." facets 2 (january 2017): 1–14. https://doi.org/10.1139/facets-2016-0032.
crotty, david. "study suggests publisher public access outpacing open access; gold oa decreases citation performance." scholarly kitchen, october 4, 2017. https://scholarlykitchen.sspnet.org/2017/10/04/study-suggests-publisher-public-access-outpacing-open-access-gold-oa-decreases-citation-performance/.
crotty, david. "when bad science wins, or 'i'll see it when i believe it.'" scholarly kitchen, august 31, 2016. https://scholarlykitchen.sspnet.org/2016/08/31/when-bad-science-wins-or-ill-see-it-when-i-believe-it/.
davis, philip m. "open access, readership, citations: a randomized controlled trial of scientific journal publishing." faseb journal 25, no. 7 (july 2011): 2129–34. https://doi.org/10.1096/fj.11-183988.
davis, philip m., and william h. walters. "the impact of free access to the scientific literature: a review of recent research." journal of the medical library association 99, no. 3 (july 2011): 208–17. https://doi.org/10.3163/1536-5050.99.3.008.
elsevier. "your guide to publishing open access with elsevier." amsterdam, netherlands: elsevier, 2015. https://www.elsevier.com/__data/assets/pdf_file/0020/181433/openaccessbooklet_may.pdf.
evans, james a. and jacob reimer. "open access and global participation in science." science 323, no. 5917 (february 2009): 1025. https://doi.org/10.1126/science.1154562.
eysenbach, gunther. "citation advantage of open access articles." plos biology 4, no. 5 (may 2006): e157. https://doi.org/10.1371/journal.pbio.0040157.
fisher, tim. "top-level domain (tld)." lifewire, july 30, 2017. https://www.lifewire.com/top-level-domain-tld-2626029.
frisch, nora k., romil nathan, yasin k. ahmed, and vinod b. shidham. "authors attain comparable or slightly higher rates of citation publishing in an open access journal (cytojournal) compared to traditional cytopathology journals—a five year (2007–2011) experience." cytojournal 11, no. 10 (april 2014). https://doi.org/10.4103/1742-6413.131739.
gaulé, patrick, and nicolas maystre. "getting cited: does open access help?" research policy 40, no. 10 (december 2011): 1332–38. https://doi.org/10.1016/j.respol.2011.05.025.
gargouri, yassine, chawki hajjem, vincent larivière, yves gingras, les carr, tim brody, and stevan harnad. "self-selected or mandated, open access increases citation impact for higher quality research." plos one 5, no. 10 (october 2010). https://doi.org/10.1371/journal.pone.0013636.
hajjem, chawki, stevan harnad, and yves gingras. "ten-year cross-disciplinary comparison of the growth of open access and how it increases research citation impact." ieee data engineering bulletin 28, no. 4 (december 2005): 39–46.
hall, martin. "green or gold? open access after finch." insights 25, no. 3 (november 2012): 235–40. https://doi.org/10.1629/2048-7754.25.3.235.
hersh, gemma, and andrew plume. "citation metrics and open access: what do we know?" elsevier connect, september 14, 2016. https://www.elsevier.com/connect/citation-metrics-and-open-access-what-do-we-know.
houghton, john, and alma swan. "planting the green seeds for a golden harvest: comments and clarifications on 'going for gold.'" d-lib magazine 19, no. 1/2 (january/february 2013). https://doi.org/10.1045/january2013-houghton.
hua, fang, heyuan sun, tanya walsh, helen worthington, and anne-marie glenny. "open access to journal articles in dentistry: prevalence and citation." journal of dentistry 47 (april 2016): 41–48. https://doi.org/10.1016/j.jdent.2016.02.005.
internet corporation for assigned names and numbers. "list of top-level domains." last updated september 13, 2018. https://www.icann.org/resources/pages/tlds-2012-02-25-en.
jump, paul. "open access papers 'gain more traffic and citations.'" times higher education, july 30, 2014. https://www.timeshighereducation.com/home/open-access-papers-gain-more-traffic-and-citations/2014850.article.
mccabe, mark j., and christopher m. snyder. "identifying the effect of open access on citations using a panel of science journals." economic inquiry 52, no. 4 (october 2014): 1284–1300. https://doi.org/10.1111/ecin.12064.
mccabe, mark j., and christopher m. snyder. "does online availability increase citations? theory and evidence from a panel of economics and business journals." review of economics and statistics 97, no. 1 (march 2015): 144–65. https://doi.org/10.1162/rest_a_00437.
mertens, stephan. "open access: unlimited web based literature searching." deutsches ärzteblatt international 106, no. 43 (2009): 710–12. https://doi.org/10.3238/arztebl.2009.0710.
moed, hank. "does open access publishing increase citation or download rates?" research trends 28 (may 2012). https://www.researchtrends.com/issue28-may-2012/does-open-access-publishing-increase-citation-or-download-rates/.
niyazov, yuri, carl vogel, richard price, ben lund, david judd, adnan akil, michael mortonson, josh schwartzman, and max shron. "open access meets discoverability: citations to articles posted to academia.edu." plos one 11, no. 2 (february 2016): e0148257. https://doi.org/10.1371/journal.pone.0148257.
ottaviani, jim. "the post-embargo open access citation advantage: it exists (probably), it's modest (usually), and the rich get richer (of course)." plos one 11, no. 8 (august 2016): e0159614. https://doi.org/10.1371/journal.pone.0159614.
pinfield, stephen, jennifer salter, and peter a. bath. "a 'gold-centric' implementation of open access: hybrid journals, the 'total cost of publication,' and policy development in the uk and beyond." journal of the association for information science and technology 68, no. 9 (september 2017): 2248–63. https://doi.org/10.1002/asi.23742.
piwowar, heather, jason priem, vincent larivière, juan pablo alperin, lisa matthias, bree norlander, ashley farley, jevin west, and stefanie haustein. "the state of oa: a large-scale analysis of the prevalence and impact of open access articles." peerj (february 13, 2018): 6:e4375. https://doi.org/10.7717/peerj.4375.
research information network. "nature communications: citation analysis." press release, 2014. https://www.nature.com/press_releases/ncomms-report2014.pdf.
riera, m. and e. aibar. "¿favorece la publicación en abierto el impacto de los artículos científicos? un estudio empírico en el ámbito de la medicina intensiva" [does open access publishing increase the impact of scientific articles? an empirical study in the field of intensive care medicine]. medicina intensiva 37, no. 4 (may 2013): 232–40. http://doi.org/10.1016/j.medin.2012.04.002.
sotudeh, hajar, zahra ghasempour, and maryam yaghtin. "the citation advantage of author-pays model: the case of springer and elsevier oa journals." scientometrics 104 (june 2015): 581–608. https://doi.org/10.1007/s11192-015-1607-5.
swan, alma, and john houghton. "going for gold? the costs and benefits of gold open access for uk research institutions: further economic modelling." report to the uk open access implementation group, june 2012. http://wiki.lib.sun.ac.za/images/d/d3/report-to-the-uk-open-access-implementation-group-final.pdf.
tang, min, james d. bever, and fei-hai yu. "open access increases citations of papers in ecology." ecosphere 8, no. 7 (july 2017): 1–9. https://doi.org/10.1002/ecs2.1887.
unpaywall. "libraries user guide." accessed september 13, 2018. https://unpaywall.org/user-guides/libraries.
wray, k. brad. "no new evidence for a citation benefit for author-pay open access publications in the social sciences and humanities." scientometrics 106 (january 2016): 1031–35. https://doi.org/10.1007/s11192-016-1833-5.
endnotes
1 elsevier, "your guide to publishing open access with elsevier" (amsterdam, netherlands: elsevier, 2015), 2, https://www.elsevier.com/__data/assets/pdf_file/0020/181433/openaccessbooklet_may.pdf.
2 philip m. davis and william h. walters, "the impact of free access to the scientific literature: a review of recent research," journal of the medical library association 99, no. 3 (july 2011): 213, https://doi.org/10.3163/1536-5050.99.3.008.
3 davis and walters, "the impact of free access," 208.
4 kristin antelman, "leveraging the growth of open access in library collection decision making," acrl 2017 proceedings: at the helm, leading the transformation, march 22–25, baltimore, maryland, ed. dawn m. mueller (chicago: association of college and research libraries, 2017): 411, 413, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/leveragingthegrowthofopenaccess.pdf.
5 research information network, "nature communications: citation analysis," press release, 2014, https://www.nature.com/press_releases/ncomms-report2014.pdf.
6 gargouri et al., "self-selected or mandated, open access increases citation impact for higher quality research," plos one 5, no. 10 (october 2010): 17, https://doi.org/10.1371/journal.pone.0013636.
7 david crotty, "when bad science wins, or 'i'll see it when i believe it'," scholarly kitchen, august 31, 2016, https://scholarlykitchen.sspnet.org/2016/08/31/when-bad-science-wins-or-ill-see-it-when-i-believe-it/.
8 jim ottaviani, "the post-embargo open access citation advantage: it exists (probably), it's modest (usually), and the rich get richer (of course)," plos one 11, no. 8 (august 2016): 9, https://doi.org/10.1371/journal.pone.0159614.
9 gargouri et al., "self-selected or mandated," 18.
10 elsevier, "your guide to publishing," 2.
11 top-level domain (tld) refers to the last string of letters in an internet domain name (i.e., the tld of www.google.com is .com). for more information on tlds, see tim fisher, "top-level domain (tld)," lifewire, july 30, 2017, https://www.lifewire.com/top-level-domain-tld-2626029. for a full list of tlds, see "list of top-level domains," internet corporation for assigned names and numbers, last updated september 13, 2018, https://www.icann.org/resources/pages/tlds-2012-02-25-en.
12 crotty, "when bad science wins."
13 hersh and plume, "citation metrics and open access: what do we know?," elsevier connect, september 14, 2016, https://www.elsevier.com/connect/citation-metrics-and-open-access-what-do-we-know.
14 archambault et al., "research impact of paywalled versus open access papers," white paper, science-metrix and 1science, 2016, http://www.1science.com/1numbr/.
15 archambault et al., "research impact."
16 heather piwowar et al., "the state of oa: a large-scale analysis of the prevalence and impact of open access articles," peerj, february 13, 2018, https://doi.org/10.7717/peerj.4375.
17 piwowar et al., "the state of oa," 5.
18 david crotty, "study suggests publisher public access outpacing open access; gold oa decreases citation performance," scholarly kitchen, october 4, 2017, https://scholarlykitchen.sspnet.org/2017/10/04/study-suggests-publisher-public-access-outpacing-open-access-gold-oa-decreases-citation-performance/.
19 archambault et al., "research impact"; piwowar et al., "the state of oa," 15.
20 piwowar et al., "the state of oa," 9–10.
21 archambault et al., “research impact.”
22 ottaviani, “the post-embargo open access citation advantage,” 2.
23 piwowar et al., “the state of oa,” 9.
24 hersh and plume, “citation metrics and open access.”
25 hersh and plume, “citation metrics and open access.”
26 tang et al., “open access increases citations of papers in ecology,” ecosphere 8, no. 7 (july 2017): 8, https://doi.org/10.1002/ecs2.1887.
27 tang et al., “open access increases citations,” 7. tang et al. list the following as examples of the “numerous studies” as quoted above, which i did not include in the quote for the purpose of brevity: (antelman 2004, hajjem et al. 2005, eysenbach 2006, evans and reimer 2009, calver and bradley 2010, riera and aibar 2013, clements 2017).
28 yuri niyazov et al., “open access meets discoverability: citations to articles posted to academia.edu,” plos one 11, no. 2 (february 2016): e0148257, https://doi.org/10.1371/journal.pone.0148257.
29 gargouri et al., “self-selected or mandated,” 18.
30 antelman, “leveraging the growth,” 414.
31 “library user guide,” unpaywall, accessed september 13, 2018, https://unpaywall.org/user-guides/libraries.
32 gargouri et al., “self-selected or mandated,” 20.
article
perceived quality of whatsapp reference service: a quantitative study from user perspectives
yan guo, apple hiu ching lam, dickson k. w. chiu, and kevin k. w. ho
information technology and libraries | september 2022. https://doi.org/10.6017/ital.v41i3.14325
yan guo (kguo@connect.hku.hk) is msc(lim) graduate, faculty of education, the university of hong kong. apple hiu ching lam (applelamwork@gmail.com) is edd candidate/msc(lim) graduate, faculty of education, the university of hong kong. dickson k. w. chiu (dicksonchiu@ieee.org) is lecturer, faculty of education, the university of hong kong. kevin k. w. ho (ho.kevin.ge@u.tsukuba.ac.jp) is professor of management information systems, graduate school of business sciences, humanities and social sciences, university of tsukuba. © 2022.
abstract
academic libraries are experiencing significant changes and making efforts to deliver their service in the digital environment. libraries are transforming from being places for reading to extensions of the classroom and learning spaces. due to the globalized digital environment and intense competition, libraries are trying to improve their service quality through various evaluations. as reference service is crucial to users, this study explores user satisfaction towards the reference service through whatsapp, a social media instant messenger, at a major university in hong kong and discusses the correlation between the satisfaction rating and three variables. suggestions and recommendations are raised for future improvements. the study also sheds light on the usage of reference services through instant messaging in other academic libraries.
introduction
due to the advancement of new technologies and mobile devices, library resources and services are more accessible.1 apart from independent searching strategies, the interactions between librarians and users have become an effective method to solve user problems, referred to as reference services.2 according to the reference and user services association (rusa), reference services include creating, managing, and assessing reference transactions and activities.3 with increasing user needs, reference services have become an essential part of library services and commonplace in academic libraries.4 further, technology development requires reference librarians to possess updated skills, willingness, and interest to deal with user inquiries.5
recently, due to the covid-19 pandemic, users have increasingly utilized virtual reference services to help them obtain information required for their academic studies instead of face-to-face modes.6 some libraries have employed different virtual tools, for example, instant messaging services, to provide reference services to their users. one of the most popular global instant messaging services is whatsapp.7 referring to the digital 2022—hong kong report, the most-used social media platform among internet users aged 16 to 64 in hong kong was whatsapp (84.3%), followed by facebook (83.7%), instagram (65.6%), wechat (55.2%), and facebook messenger (50.4%).8 the popularity of whatsapp in hong kong accordingly increases whatsapp reference service usage in academic libraries. the qualitative study by tsang and chiu has identified whatsapp as one of the most commonly used and relatively preferred reference services of an academic library in hong kong.9
many studies have investigated reference service quality with measurements such as satisfaction rating, perceived gaps in reference services’ ability to meet user expectations, and other information-seeking behaviors. however, few studies focus on instant messaging reference services compared to traditional services, except for a notable recent qualitative study by tsang and chiu.10 therefore, this research aims to quantitatively evaluate user satisfaction with whatsapp’s application in reference service of a major university’s library in hong kong through three dimensions: affect of service (as), information control (ic), and library as place (lp), which are detailed in the research purpose section. the results can help librarians better understand the effectiveness of applying whatsapp and other instant messaging to improve reference service quality. by expanding technology-based services, libraries can become more competitive in the digital era and provide better user experiences in the future. thus, this study deals with the following four research questions (rqs):
rq1. what is the users’ awareness level of the library’s reference services?
rq2. how do users evaluate the whatsapp application in the library?
rq3. what are the relationships between user satisfaction and the three service dimensions as, ic, and lp?
rq4. how can academic libraries increase user satisfaction with whatsapp reference services?
literature review
in the late 1800s, library leaders started to pay attention to the importance of reference services.
11 since then, reference services have also caught the public’s attention and were introduced into public libraries. reference services can assist readers in solving problems through various interactions between users and staff.12 currently, the library is not merely a repository of collections, and librarians can provide more help, particularly fulfilling users’ various information needs rather than just offering directions or physical locations of books.13 nowadays, librarians strive to solve various user problems and inquiries with their professional skills and information literacy.14
at first, in-person and telephone were the most common ways for reference services. however, with the increasing number of remote users and ubiquitous internet connectivity, face-to-face reference and asynchronous emails can no longer satisfy users’ needs.15 thus, libraries increasingly explore collaborative software and mobile applications such as instant messaging, online chatting, video sessions, and other methods to serve users, referred to collectively as virtual reference.16 virtual reference occurs electronically in real time, where users may interact with librarians through smartphones, computers, or other devices without physical presence.17
as libraries began to use the internet, several case studies investigated instant messaging reference services in academic libraries.18 at the same time, librarians and researchers began to investigate reference service quality with designated measurements. various indicators can help measure user satisfaction levels, such as accuracy, communication skills, user satisfaction, instruction, and user’s willingness to return.19 although these indicators were originally developed for physical reference services, most principles and methods can still be applied to virtual reference services, as instant messaging has become one of the most frequently used channels.20 some studies have confirmed the effectiveness of instant messaging for reference services for more traditional means, such as phone and email.21
as one of the most popular social media chatting software platforms, whatsapp has become a powerful tool for connecting librarians and users. a primary difference from traditional phone-based reference services is that whatsapp can share texts, images, documents, and videos (and their links) at a low cost.22 whatsapp can run as a mobile application on smartphones or as a web page on desktop browsers named whatsapp web. whatsapp web users are required to use their mobile phone to scan the qr (quick response) code on the computer browser (https://web.whatsapp.com/) for authentication before use. as the functionality of whatsapp web is similar to whatsapp, users can easily adapt to whatsapp web on desktop computers.
as of march 2020, the number of active whatsapp users has globally increased to approximately 2 billion and is still growing steadily.23 whatsapp, by april 2021, had become the most popular messaging application based on the number of monthly active users, compared with other popular messaging applications.24 studies also indicate that students may use whatsapp for two to three hours daily.25 although the essential chat functions of whatsapp are similar to other instant messaging services such as facebook messenger and wechat, whatsapp and wechat have been more popular for hongkongers and mainland chinese, respectively.26 surprisingly, howard et al. studied students’ habits of using social media platforms at purdue university in the us and revealed that respondents rarely use whatsapp in their daily lives, indicating that residents in different regions may have different social media platform preferences.27 recently, odu and omini have demonstrated a significant relationship between using whatsapp and library service satisfaction from the student’s perspective.28 some studies also stressed that many students welcome whatsapp as an effective reference service platform.29 however, friday et al. pointed out that some librarians might not be trained and equipped with proper and up -todate skills in using social media tools to provide library services effectively and efficiently.30 further, aina, babalola, and oduwole argued that hurdles such as instructional policies, lack of time, and heavy workloads might cause difficulties in using these tools to provide library services.31 as for evaluation, mohd azmi, noorhidawati, and yanti idaya aspura applied rusa’s guidelines for the behavioral performance of reference and information service providers to evaluate the perceived importance versus actual practices of whatsapp reference service from librarians’ perspectives.32 they suggested that although librarians expressed their awareness of the importance of rusa guidelines, they would not fully comply with the guidelines because of time and other constraints. yet, few studies deal with the satisfaction with whatsapp reference services of academic libraries from user perspectives. 
research purpose
regarding whatsapp and library services, a few studies focused on finding the relationship between whatsapp and service usage, user attitudes toward whatsapp applications, and the difficulties of using whatsapp, particularly for reference services.33 though mohd azmi, noorhidawati, and yanti idaya aspura evaluated librarians’ behavioral performance in providing whatsapp reference service, it was from librarians’ perspectives instead of users.34
thus, we studied user satisfaction with the whatsapp reference service offered by academic libraries by adapting tsui’s instrument to develop the survey framework.35 tsui employed three key indicators, i.e., affect of service (as), information control (ic), and library as place (lp), in libqual+, an online library assessment tool developed by the association of research libraries (arl).36 as measures “empathy, responsiveness, assurance, and reliability of library employees,”37 like librarian-user interactions concerning the librarians’ knowledge in inquiry responses and the level of reference service provided.38 ic measures “how users want to interact with the modern library and include scope, timeliness and convenience, ease of navigation, modern equipment, and self-reliance,”39 such as library resource availability and accessibility from user perspectives.40 lp measures “the usefulness of space, the symbolic value of the library, and the library as a refuge for work or study,”41 such as the availability of adequate facilities and an appropriate physical environment from user perspectives.42 the application of these indicators will be further discussed in the methodology section.
methodology
this study chose a major academic library in hong kong with a long track record of technological advancement. a reference desk is situated near the library’s main entrance for traditional services. the library’s web page shows a clear ask a librarian column with diversified methods for reference services, including email, telephone, whatsapp, and other electronic devices to access the reference services. notably, the whatsapp reference service is operated the same as other channels, available monday to friday from 9 am to 5 pm (except on public holidays). the library promises an inquiry response in no more than four hours. the mission and vision of such reference services are to
• help locate information resources;
• assist in searching strategies and research;
• deal with queries about the use of facilities and services; and
• equip users with information literacy.
the library has developed a whatsapp business account with a mobile phone in the whatsapp business application and uses the whatsapp web function to handle user inquiries on desktop computers. one to two library assistants support the whatsapp reference service on shift seamlessly from 9 am to 5 pm, including the lunch break, and a professional librarian reviews whatsapp inquiry records weekly.
this study used a survey administered through google form, both online and offline, to collect user perceptions about the library’s whatsapp reference service. online methods for collecting survey responses included email, facebook, wechat, and whatsapp, and offline methods included site delivery at the library entrance and sticking the survey qr code on public notice boards. no incentives were provided for the voluntary data collection.
the data collected comprised mainly undergraduate and postgraduate students to represent a general user view of the whatsapp reference service. microsoft excel and ibm spss statistics were used to analyze the data, including bivariate correlation for investigating the relationships between whatsapp satisfaction and the three variables based on tsui’s study, as, ic, and lp.43 among these indicators, as focuses on whether whatsapp is easy to use and supportive; ic evaluates the response speed, accuracy, and accessibility of the whatsapp reference service; and lp measures the staff attitude and whether whatsapp helps encourage librarian-user communication. the survey also includes demographic information, reference services usage, and user satisfaction with the whatsapp reference service. participants were asked to evaluate the quality of whatsapp reference service from these three dimensions through five-point likert scales in the satisfaction rating part. finally, the survey asked for the overall satisfaction and other useful comments about the reference service.
data analysis
demographic information
as the main analysis of this study is regression analysis, a check on the minimum number of participants needed for analysis was performed. as explained later in this paper, the regression involved six predictors of satisfaction. using medium effect size and 0.8 as the statistical power, the minimum sample size should be 97 using an online a-priori sample size calculator for multiple regression (https://www.danielsoper.com/statcalc/calculator.aspx?id=1); see the sketch after table 1 below. the data collection yielded 131 completed responses, with 66% of master’s students and the rest undergraduates. respondents had diversified academic backgrounds, including education (26.0%), science (14.5%), business and economics (13.0%), engineering (12.2%), liberal arts (10.7%), social science (9.9%), architecture (9.2%), and legal studies (4.6%).
for the time spent on instant messaging such as whatsapp and wechat, 39% spent three to five hours every day, while one-fifth of them would spend one to two hours. 22% of respondents spend five hours or above, and only a small portion of them (19%) would spend less than an hour. in summary, most respondents would spend at least one hour on instant messaging daily.
usage of reference service
table 1 summarizes respondents’ usage of reference services with a five-point likert scale (1 = never; 5 = always). as shown, walk-in and email are the most common methods to use the reference service, while whatsapp is the least frequent. when it comes to the purposes of using reference services (see table 2), databases and e-resources and identifying information sources are the two most common purposes for respondents, followed by service and facility and research assistance.
table 1. usage frequency of reference service through different methods (n = 131)
methods:    walk-in  email  phone  whatsapp
mean score: 3.23     3.24   2.53   2.32
note: 1 = never; 5 = always
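the minimum-sample-size figure reported in the demographic information subsection above can be cross-checked with a short script. the sketch below is illustrative only and is not the online calculator the authors used; it assumes cohen's f² = 0.15 for a "medium" effect, six predictors, α = 0.05, a target power of 0.80, and a g*power-style noncentrality of λ = f² × n, so the answer may differ from the reported 97 by an observation or two depending on the convention.

# illustrative a-priori power calculation for the overall f test of a multiple
# regression with six predictors; assumptions: cohen's f2 = 0.15, alpha = 0.05,
# target power = 0.80, noncentrality lambda = f2 * n (g*power-style convention).
from scipy.stats import f, ncf

def power_for_n(n, k=6, f2=0.15, alpha=0.05):
    df1, df2 = k, n - k - 1
    f_crit = f.ppf(1 - alpha, df1, df2)              # critical f under the null
    return 1 - ncf.cdf(f_crit, df1, df2, nc=f2 * n)  # power under the noncentral f

n = 10
while power_for_n(n) < 0.80:
    n += 1
print(n, round(power_for_n(n), 3))  # lands in the high 90s, in line with the reported 97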
table 2. purposes of using reference service (n = 131)
purposes:   service and facility  database and e-resources  identify information sources  research assistance (individual/group)  other
mean score: 3.10                  3.36                      3.28                          3.08                                    2.31
note: 1 = never; 5 = always
when asked about their preferred way to use reference services, more than half of the respondents said they would use email (67.9%), followed by walk-in (59.5%), whatsapp (23.7%), and phone (12.2%). as the traditional method, most respondents considered walk-in, in-person reference the most effective reference method because users could receive instant help from librarians, especially for urgent and complex problems. however, results indicated that despite a gap in users seeking reference services help by instant messaging and email, this gap is smaller than that for face-to-face and telephone.44
the user ratings for reference services through different methods were compared using anova. our result shows a significant result, f(3, 520) = 30.52, p < 0.01. walk-in (m = 4.18, sd = 0.71) is the most satisfying method. post-hoc tests showed that the ratings of email (m = 3.78, sd = 0.60) and phone (m = 3.73, sd = 0.83) did not differ from each other and were lower than walk-in, while both were considered better than whatsapp (m = 3.27, sd = 0.90).
apart from these ratings, respondents were also asked to leave a few comments and suggestions for the reference service. notably, most respondents showed a positive attitude to the whatsapp reference service while suggesting some improvements. for example, one respondent requested “longer office hours for whatsapp.” at the time of this research, the whatsapp reference service hours were monday to friday from 9 am to 5 pm, while in-person reference service hours were monday to friday, 8:30 am to 7 pm, and saturdays from 8:30 am to 7 pm. therefore, the library should extend the whatsapp service hours to provide more flexible service time, aligning with the findings of tsang and chiu.45 further, a respondent suggested that librarians should “respond to email more efficiently.” for this issue, whatsapp could serve to expand user access to reference services instead of emails.
users’ satisfaction with whatsapp reference service
prior research reported that as, ic, and lp influenced user satisfaction. this study adapted the instrument developed in tsui’s prior research (see appendix) to collect data to investigate these relationships.46 as the cronbach’s alpha values for all three constructs are higher than 0.7, it is valid to use the average value of these items for our data analysis.47
table 3 shows the analysis of whether respondents’ academic level would affect as, ic, lp, and overall user satisfaction with whatsapp using anova. results indicated that academic level affected as but not the other factors and satisfaction. further, multiple regression results indicated that ic and lp affected whatsapp satisfaction. table 4 tabulates our findings.
table 3. anova results
                              overall  undergraduate (n = 45)  master’s student (n = 86)  f-value
affect of service (as)        3.380    3.200                   3.474                      5.712 *
information control (ic)      3.202    3.162                   3.223                      0.286
library as place (lp)         3.645    3.550                   3.695                      0.273
whatsapp satisfaction (sat)   3.275    3.200                   3.314                      0.495
notes: *** p < 0.001; ** p < 0.01; * p < 0.05.
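for readers who want to reproduce the style of analysis behind tables 3 through 5, a minimal sketch follows. it is not the authors' workflow (the study used microsoft excel and ibm spss statistics); it assumes a hypothetical export whatsapp_survey.csv with columns as1–as5, ic1–ic7, lp1–lp5, sat (overall whatsapp satisfaction), and academic (0 = undergraduate, 1 = master's student), reverse-codes the three items flagged in the appendix, checks cronbach's alpha for each construct, and fits the main-effect and interaction regressions with statsmodels.

# minimal sketch (not the authors' spss workflow): scale reliability and the
# regressions summarized in tables 3-5. file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("whatsapp_survey.csv")   # hypothetical survey export

for item in ["ic5", "ic6", "ic7"]:        # appendix: these items are reverse coded
    df[item] = 6 - df[item]               # flip a five-point likert item

def cronbach_alpha(items: pd.DataFrame) -> float:
    # classic formula: k/(k-1) * (1 - sum of item variances / variance of the total score)
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

# "as" is a python keyword, so the affect-of-service score is stored as "as_"
scales = {"as_": [f"as{i}" for i in range(1, 6)],
          "ic": [f"ic{i}" for i in range(1, 8)],
          "lp": [f"lp{i}" for i in range(1, 6)]}
for name, cols in scales.items():
    print(name, round(cronbach_alpha(df[cols]), 3))  # acceptable if above 0.7
    df[name] = df[cols].mean(axis=1)                 # construct score = mean of its items

main = smf.ols("sat ~ as_ + ic + lp", data=df).fit()                 # table 4, main effects
inter = smf.ols("sat ~ (as_ + ic + lp) * academic", data=df).fit()   # table 4, interactions
print(main.summary())
print(inter.summary())

coding academic as 0/1 means each interaction coefficient can be read as the master's-minus-undergraduate difference in a dimension's effect, which is how the group-specific effects in table 5 can be derived.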
as shown in table 4, ic and lp have significant positive impacts on users’ satisfaction with using whatsapp for reference services. however, considering the academic level (undergraduate = 0; master’s student = 1) in our regression model (i.e., interaction effect), the following effects are notable. first, as does not affect user satisfaction with using whatsapp for reference services for undergraduates but positively affects master’s respondents. second, ic positively impacts satisfaction for both undergraduate and master’s respondents, of which the difference between these two respondent groups is statistically insignificant. lastly, even though lp also positively impacts satisfaction for both groups, the effect is higher for undergraduates than for master’s respondents. the different learning needs of the groups may explain such differences, as shown in table 5.48
table 4. regression analysis
                              main effect                interaction effect
independent variables         coefficient   t-value      coefficient   t-value
affect of service (as)        0.0933        0.7556       −0.3503       −1.744
information control (ic)      0.6718        6.736 ***    0.7624        4.178 ***
library as place (lp)         0.6092        6.143 ***    0.9366        6.335 ***
as × academic                                            0.7817        3.076 ***
ic × academic                                            −0.1741       −0.8701
lp × academic                                            −0.5721       −0.3178 ***
intercept                     −1.412        −3.748 ***   −1.423        −3.876 ***
r2 (adj.)                     0.5444                     0.5742
f-value                       52.78 ***                  30.21 ***
notes: *** p < 0.001; ** p < 0.01; * p < 0.05
table 5. impacts of as, ic, and lp on different student groups
                           undergraduate     master’s student
affect of service (as)     not significant   0.7817
information control (ic)   0.7624            0.7624
library as place (lp)      0.9366            0.3645
discussions and recommendations
subdivision of the whatsapp reference service into specialist subjects
our findings indicated that as had the strongest correlation with whatsapp satisfaction for master’s students, while the as part had the lowest satisfaction with undergraduate students. this reflected that respondents who are undergraduates could not receive adequate supportive help from librarians through whatsapp, aligning with the findings of tsang and chiu.49 a possible reason is that the number of whatsapp reference librarians with specialist subject knowledge was small. yet, one general reference whatsapp number on the library website is insufficient compared to other methods, as the library website shows seven telephone numbers of branch libraries to serve different patrons. through different numbers, users could easily find the required experts accordingly. the whatsapp reference service had only a single number probably because users mostly ask basic and general questions.
such a process would cost professionals too much time and energy to deal with.50 to further enhance the service, it is necessary to reform the operational policies and add a few more whatsapp accounts, for instance, creating a whatsapp business account for each branch library (a total of six branch libraries) or for each school serving users in different disciplines to connect to corresponding subject librarians via specialized whatsapp accounts.51 this approach can separate users from the general inquiry number dealing with quick and straightforward information inquiries from those requiring specific domain inquiries.52 further, the general inquiry whatsapp service should be extended to cater to various students’ needs by possibly improving to provide 24-hour service.53 to remedy human resources requirements, student helpers, interns, and volunteers can serve on shifts on saturdays, sundays, and even public holidays.54 more users may seek troubleshooting services during the holidays, especially long holidays, and recently, under the covid-19 pandemic and its associated isolation requirements.55 more staff training due to the whatsapp reference service features, the skills required for online and face-to-face conversations are different, e.g., it is difficult to convey emotions like facial expressions and body language online.56 further, due to the limited interactions between librarians and users and the lack of visual and audio cues through the whatsapp reference service, librarians can hardly identify user needs in a short time.57 therefore, librarians may need further professional training for such scenarios, particularly in answering questions quickly and precisely in real-time chat, because users tend to be more impatient during a chat engagement.58 in addition, unlike face-toface inquiry, some complex issues often cannot be adequately explained through whatsapp. therefore, librarians should make appropriate referrals if some problems cannot be solved through whatsapp. reference services through video-based platforms such as zoom can also help.59 regular training could offer librarians updated information on using the tool and refresh the skills used in responding to the whatsapp reference service among various staff members, i.e., librarians, library assistants, student helpers, volunteers, and interns. 
if the library staff does not acquire well-developed skills and competencies in texting, comprehension, and communication specialized in instant messaging services, they cannot efficiently and effectively understand the inquiries and search, locate, explain, and convey the appropriate information resources to users on the asynchronous whatsapp reference service in a shorter response time.60 establishment of whatsapp reference service guidelines whether the whatsapp reference service increases the capability to deal with user problems, it still relies on consistently favorable reference behaviors.61 mohd azmi, noorhidawati, and yanti idaya aspura pointed out that users need timely responses and friendly online contacts from librarians, though librarians might not completely follow the rusa guidelines due to human resources constraints.62 therefore, libraries should establish easy-to-follow, concise, and information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 9 whatsapp-tailored guidelines for appropriately conducting whatsapp reference services, especially because such skills differ from face-to-face services, as discussed above.63 the studied library has developed a simple series of internal work procedures for using whatsapp web, including how to open and close whatsapp web and how on-duty staff should handle inquiries. to enhance and standardize the whatsapp reference service, the library should develop guidelines by offering some polite, brief, and interactive text templates for answering inquiries, such as “i am (name) (job title). what may i help you with, dear user?”, as well as answers to frequently asked questions. progress reporting messages should be sent to users to acknowledge their searching status.64 the relationship between librarians and users can thus be enhanced by creating a consistent, friendly, and warm atmosphere, using informal conversation and emojis, and incorporating whatsapp’s features to engage users and establish continued service use.65 further, such guidelines can save time and energy in training new staff and provide the basis for the future development of artificial intelligence aids such as chatbots.66 promotion for the whatsapp service most respondents conveyed a positive attitude, considering whatsapp a convenient way to access the reference service, which is in line with the studies by ansari and tripathi and sudharani and nagaraju.67 however, it is still not the most frequently used and preferred method in the library. one reason is that users still need help with physical materials and ask for the answers face-toface. 68 however, this is not the only reason, as many studies showed that library promotional efforts are often weak.69 in addition to the traditional promotional materials such as leaflets and contact cards with whatsapp numbers, the library can also broaden the promotion through massive emails and social media such as facebook, instagram, linkedin, twitter, and signal.70 in the information era, social media is an effective and efficient channel for reaching the target audience and disseminating information in an accessible way. 71 the library should reform the webpage of the whatsapp reference service to further attract users. 
for instance, displaying some sample whatsapp chat screenshots of librarian-user interactions on the library’s website can increase the attractiveness of the service as images can graphically represent the application’s ease of use for library reference help.72
conclusion
the study has investigated user satisfaction with the whatsapp reference service in a major academic library in hong kong and explored the correlations between whatsapp satisfaction with three quality dimensions as, ic, and lp. the survey revealed various opinions toward using reference services and preference methods, including inconsistencies between users’ frequently used methods and preferred methods. moreover, by analyzing the correlation between whatsapp satisfaction and the three variables, results showed that users emphasized the whatsapp reference service. the results have led to some practical suggestions for improvement: subdividing the whatsapp reference service with subject specialists, providing more staff training, establishing staff guidelines and policies, and increasing whatsapp service promotion.
limitations and future research
there are still some limitations to the study. firstly, the survey only collected limited complete responses, which may not represent all users’ views. additionally, the perceptions of both library staff and users should be considered. secondly, the research evaluation design with three dimensions can be extended to measure other quality and effects. thirdly, as whatsapp is just one application among various emerging instant-messaging tools, further studies should cover other instant messaging platforms for similar and different purposes. for instance, as the studied university comprises a significant student population from mainland china, wechat could be investigated for its possibility and effectiveness as a whatsapp alternative for providing reference services and promotion to chinese students.73
appendix: key survey items (item, mean, sd)
affect of service (ease of use, supportive) (as) (cronbach’s alpha = 0.799)
as1. there is a clear introduction teaching library users about how to use the whatsapp function. 3.18 0.87
as2. the directories are easy to understand. 3.16 0.85
as3. reference service through whatsapp is easy to use. 3.69 0.94
as4. i can receive instant help from a librarian through whatsapp. 3.55 0.78
as5. i can request service anytime, anywhere. 3.31 0.93
information control (response speed, accuracy, accessible) (ic) (cronbach’s alpha = 0.707)
ic1. response of inquiry is reliable. 3.66 0.74
ic2. whatsapp application makes reference services easily accessible for users. 3.25 0.95
ic3. response of inquiry is accurate. 3.87 0.66
ic4. using whatsapp to gain access to reference services can meet my needs. 3.73 0.95
ic5. the quality of response obtained through whatsapp is inferior to walk-in (r). 2.60 1.26
ic6. the quality of response obtained through whatsapp is inferior to email (r). 2.66 1.17
ic7. the quality of response obtained through whatsapp is inferior to phone (r). 2.63 0.98
library as place (staff attitude, encourage communication) (lp) (cronbach’s alpha = 0.843)
lp1. reference staff is friendly or pleasant. 4.08 0.76
lp2. using whatsapp to contact a librarian is convenient. 3.90 0.78
lp3. whatsapp application in reference service increases my productivity in using online library services. 3.53 0.99
lp4. it provides an efficient channel to communicate with librarians. 3.87 0.89
lp5. i request more reference services after i know about the whatsapp channel. 3.28 0.86
note: ic5, ic6, and ic7 are reversed codes.
endnotes
1 karen hiu tung yip, patrick lo, kevin k. w. ho, and dickson k. w. chiu, “adoption of mobile library apps as learning tools in higher education: a tale between hong kong and japan,” online information review 45, no. 2 (2020): 389–405, https://doi.org/10.1108/oir-07-2020-0287; ken yiu kwan fan, patrick lo, kevin k. w. ho, stuart so, dickson k. w. chiu, and eddie h. t. ko, “exploring the mobile learning needs amongst performing arts students,” information discovery and delivery 48, no. 2 (2020), 103–12, https://doi.org/10.1108/idd-12-2019-0085; vanessa hiu ying chan, dickson k. w. chiu, and kevin k. w. ho, “mediating effects on the relationship between perceived service quality and public library app loyalty during the covid-19 era,” journal of retailing and consumer services 67 (2022): 102960, https://doi.org/10.1016/j.jretconser.2022.102960.
2 samuel s. green, “personal relations between librarians and readers,” library journal 1, no. 2 (1876): 74–81.
3 “measuring and assessing reference services and resources: a guide,” reference and user services association, accessed july 25, 2021, http://www.ala.org/rusa/sections/rss/rsssection/rsscomm/evaluationofref/measrefguide.
4 angel lok yi tsang and dickson k. w. chiu, “effectiveness of virtual reference services in academic libraries: a qualitative study based on the 5e learning model,” the journal of academic librarianship 48, no. 4 (2022): 102533; kun zhang and peixin lu, “what are the key indicators for evaluating the service satisfaction of wechat official accounts in chinese academic libraries?,” library hi tech, (2022), ahead-of-print, https://doi.org/10.1108/lht-07-2021-0218; yifei zhang, patrick lo, stuart so, and dickson k. w. chiu, “relating library user education to business students’ information needs and learning practices: a comparative study,” reference services review 48, no. 4 (2020): 537–58, https://doi.org/10.1108/rsr-12-2019-0084.
5 andrew chean yang yew, dickson k. w. chiu, yuriko nakamura, and king kwan li, “a quantitative review of lis programs accredited by ala and cilip under contemporary technology advancement,” library hi tech, (2022), ahead of print, https://doi.org/10.1108/lht-12-2021-0442; james friday, oluchi chidozie, and lauretta ngozi chukwuma, “social media and library services: a case of covid-19 pandemic era,” international journal of research and review 7, no. 10 (2020): 230–37, https://www.ijrrjournal.com/ijrr_vol.7_issue.10_oct2020/abstract_ijrr0031.html.
6 ruth sara connell, lisa c. wallis, and david comeaux, “the impact of covid-19 on the use of academic library resources,” information technology and libraries 40, no. 2 (2021): 1–20, https://doi.org/10.6017/ital.v40i2.12629.
7 “digital 2022: global overview report,” we are social and hootsuite, accessed april 30, 2022, https://wearesocial.com/hk/blog/2022/01/digital-2022/; tsang and chiu, “effectiveness of virtual reference services.”
8 “digital 2022—hong kong,” we are social and hootsuite, accessed april 30, 2022, https://wearesocial.com/hk/blog/2022/01/digital-2022/.
https://doi.org/10.1108/oir-07-2020-0287 https://doi.org/10.1108/oir-07-2020-0287 https://doi.org/10.1108/idd-12-2019-0085 https://doi.org/10.1016/j.jretconser.2022.102960 http://www.ala.org/rusa/sections/rss/rsssection/rsscomm/evaluationofref/measrefguide https://doi.org/10.1108/lht-07-2021-0218 https://doi.org/10.1108/lht-07-2021-0218 https://www.emerald.com/insight/search?q=yifei%20zhang https://www.emerald.com/insight/search?q=patrick%20lo https://www.emerald.com/insight/search?q=stuart%20so https://www.emerald.com/insight/search?q=dickson%20k.w.%20chiu https://doi.org/10.1108/rsr-12-2019-0084 https://doi.org/10.1108/lht-12-2021-0442 https://www.ijrrjournal.com/ijrr_vol.7_issue.10_oct2020/abstract_ijrr0031.html https://doi.org/10.6017/ital.v40i2.12629 https://wearesocial.com/hk/blog/2022/01/digital-2022/ https://wearesocial.com/hk/blog/2022/01/digital-2022/ information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 13 9 tsang and chiu, “effectiveness of virtual reference services.” 10 tsang and chiu, “effectiveness of virtual reference services.” 11 green, “personal relations.” 12 green, “personal relations.” 13 p. sankar and e. s. kavitha, “ask librarian to whatsapp librarian: reengineering of traditional library services,” international journal of information sources and services 3, no. 2 (march– april 2016): 35–40, https://www.researchgate.net/profile/drkavithaes/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditio nal_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapplibrarian-reengineering-of-traditional-library-services.pdf; spear wing sze wong and dickson k. w. chiu, “re-examining the value of remote academic library storage in the mobile digital age: a comparative study,” portal 23, no. 1 (2023), in press; tin nok leung, dickson k. w. chiu, kevin k. w. ho, and canon k. l. luk, “user perceptions, academic library usage and social capital: a correlation analysis under covid-19 after library renovation,” library hi tech 40, no. 2 (2021): 304–22, https://doi.org/10.1108/lht-04-2021-0122. 14 syeda hina batool, ata ur rehman, and imran sulehri, “the current situation of information literacy education and curriculum design in pakistan: a discovery using delphi method,” library hi tech (2021): ahead of print, https://doi.org/10.1108/lht-02-2021-0056; yew et al., “quantitative review of lis programs.” 15 tsang and chiu, “effectiveness of virtual reference services”; zhang and lu, “what are the key indicators.” 16 james ogom odu, and emmanuel ubi omini, “mobile phone applications and the utilization of library services in the university of calabar library, calabar, nigeria,” global journal of educational research 16, no. 2 (2017): 111–19, https://doi.org/10.4314/gjedr.v16i2.5. 17 “guidelines for behavioral performance of reference and information service providers,” american library association, june 2004, http://www.ala.org/template.cfm?section=home&template=/contentmanagement/contentd isplay.cfm&contentid=26937. 18 marianne foley, “instant messaging reference in an academic library: a case study,” college & research libraries 63, no. 
1 (2002): 36–45, https://doi.org/10.5860/crl.63.1.36; tsang and chiu, “effectiveness of virtual reference services.” 19 chun-wai tsui, “a study on service quality gap in remote service delivery with mobile devices among academic libraries in hong kong,” (master’s thesis, the university of hong kong, 2015), https://doi.org/10.5353/th_b5611574; leung et al., “user perceptions”; zhang and lu, “what are the key indicators.” 20 tsang and chiu, “effectiveness of virtual reference services.” https://www.researchgate.net/profile/drkavitha-es/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditional_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapp-librarian-reengineering-of-traditional-library-services.pdf https://www.researchgate.net/profile/drkavitha-es/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditional_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapp-librarian-reengineering-of-traditional-library-services.pdf https://www.researchgate.net/profile/drkavitha-es/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditional_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapp-librarian-reengineering-of-traditional-library-services.pdf https://www.researchgate.net/profile/drkavitha-es/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditional_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapp-librarian-reengineering-of-traditional-library-services.pdf https://doi.org/10.1108/lht-04-2021-0122 https://doi.org/10.1108/lht-02-2021-0056 https://doi.org/10.4314/gjedr.v16i2.5 http://www.ala.org/template.cfm?section=home&template=/contentmanagement/contentdisplay.cfm&contentid=26937 http://www.ala.org/template.cfm?section=home&template=/contentmanagement/contentdisplay.cfm&contentid=26937 https://doi.org/10.5860/crl.63.1.36 https://doi.org/10.5353/th_b5611574 information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 14 21 charlotte clements, “implementing instant messaging in four university libraries,” library hi tech 27, no. 3 (2009): 393–402, https://doi.org/10.1108/07378830910988522. 22 gunnan dong et al., “relationships between research supervisors and students from coursework-based master’s degrees: information usage under social media,” information discovery and delivery 49, no. 4 (2021): 319–27, https://doi.org/10.1108/idd-08-2020-0100; tsang and chiu, “effectiveness of virtual reference services.” 23 “number of monthly active whatsapp users worldwide 2013–2020,” statista research department, accessed july 26, 2021, https://www.statista.com/statistics/260819/number-ofmonthly-active-whatsapp-users/. 24 “most popular global mobile messaging apps 2021,” statista research department, accessed july 26, 2021, https://www.statista.com/statistics/258749/most-popular-global-mobilemessenger-apps/. 25 mohd shoaib ansari and aditya tripathi, “use of whatsapp for effective delivery of library and information services,” desidoc journal of library & information technology 37, no. 5 (2017): 360–65, https://doi.org/10.14429/djlit.37.5.11090; y. sudharani and k. nagaraju, “whatsapp usage among the students of svu college of engineering, tirupathi,” journal of advances in library and information science 5, no. 4 (2016): 325–29, https://jalis.in/pdf/5-4/nagaraju.pdf. 
26 jianhua xu, qi kang, zhiqiang song, and christopher peter clarke, “applications of mobile social media: wechat among academic libraries in china,” the journal of academic librarianship 41, no. 1 (2015): 21–30, https://doi.org/10.1016/j.acalib.2014.10.012; tsang and chiu, “effectiveness of virtual reference”; “digital 2022—hong kong,” 54; zhang and lu, “what are the key indicators.” 27 heather howard, sarah huber, lisa carter, and elizabeth moore, “academic libraries on social media: finding the students and the information they want,” information technology and libraries 37, no. 1 (2018): 8–18, https://doi.org/10.6017/ital.v37i1.10160. 28 odu and omini, “mobile phone applications.” 29 ansari and tripathi, “use of whatsapp”; sudharani and nagaraju, “whatsapp usage.” 30 friday, chidozie, and chukwuma, “social media and library services.” 31 adebowale japhet aina, yemisi tomilola babalola, and adebambo adewale oduwole, “use of web 2.0 tools and services by the library professionals in lagos state tertiary institution libraries: a study,” world digital libraries – an international journal 12, no.1 (2019): 1–17, https://content.iospress.com/articles/world-digital-libraries-an-internationaljournal/wdl12101. 32 nor azilawati mohd azmi, a. noorhidawati, and m. k. yanti idaya aspura, “librarians’ behavioral performance on chat reference service in academic libraries: perceived importance vs actual practices,” malaysian journal of library & information science 22, no. 3 (2017): 19–33, https://doi.org/10.22452/mjlis.vol22no3.2. https://doi.org/10.1108/07378830910988522 https://doi.org/10.1108/idd-08-2020-0100 https://www.statista.com/statistics/260819/number-of-monthly-active-whatsapp-users/ https://www.statista.com/statistics/260819/number-of-monthly-active-whatsapp-users/ https://www.statista.com/statistics/258749/most-popular-global-mobile-messenger-apps/ https://www.statista.com/statistics/258749/most-popular-global-mobile-messenger-apps/ https://doi.org/10.14429/djlit.37.5.11090 https://jalis.in/pdf/5-4/nagaraju.pdf https://doi.org/10.1016/j.acalib.2014.10.012 https://doi.org/10.6017/ital.v37i1.10160 https://content.iospress.com/articles/world-digital-libraries-an-international-journal/wdl12101 https://content.iospress.com/articles/world-digital-libraries-an-international-journal/wdl12101 https://doi.org/10.22452/mjlis.vol22no3.2 information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 15 33 aina, babalola, and oduwole, “use of web 2.0 tools and services”; ansari and tripathi, “use of whatsapp”; friday, chidozie, and chukwuma, “social media and library services”; odu and omini, “mobile phone applications”; sudharani and nagaraju, “whatsapp usage.” 34 mohd azmi, noorhidawati, and yanti idaya aspura, “librarians’ behavioral performance.” 35 tsui, “a study on service quality gap.” 36 “what is libqual+®?,” libqual+, accessed may 1, 2022, https://www.libqual.org/home; tsui, “a study on service quality gap.” 37 jessica kayongo, and sherri jones, “faculty perception of information control using libqual+™ indicators,” the journal of academic librarianship 34, no. 2 (2008): 131, https://doi.org/10.1016/j.acalib.2007.12.002. 38 rachael kwai fun ip and christian wagner, “libqual+® as a predictor of library success: extracting new meaning through structured equation modeling,” the journal of academic librarianship 46, no. 
2 (2020): 102102, https://doi.org/10.1016/j.acalib.2019.102102; selena killick, anne van weerden, and fransje van weerden, “using libqual+® to identify commonalities in customer satisfaction: the secret to success?.” performance measurement and metrics 15, no. 1/2 (2014), 23–31, https://doi.org/10.1108/pmm-04-2014-0012. 39 kayongo and jones, “faculty perception of information control,” 131. 40 ip and wagner, “libqual® as a predictor”; killick, van weerden, and van weerden, “using libqual® to identify commonalities.” 41 kayongo and jones, “faculty perception of information control,” 131. 42 ip and wagner, “libqual® as a predictor”; killick, van weerden, and van weerden, “using libqual® to identify commonalities.” 43 tsui, “a study on service quality gap.” 44 anabel quan–haase, “instant messaging on campus: use and integration in university students' everyday communication,” the information society 24, no. 2 (2008): 105–15, https://doi.org/10.1080/01972240701883955. 45 tsang and chiu, “effectiveness of virtual reference services.” 46 tsui, “a study on service quality gap.” 47 robert a. peterson, “a meta-analysis of cronbach's coefficient alpha,” journal of consumer research 21, no. 2 (1994): 381–91, https://doi.org/10.1086/209405. 48 ka po lau, dickson k. w. chiu, kevin k. w. ho, patrick lo, and eric w. k. see-to, “educational usage of mobile devices: differences between postgraduate and undergraduate students ,” the journal of academic librarianship 43, no. 3 (2017): 201–8, https://doi.org/10.1016/j.acalib.2017.03.004. https://www.libqual.org/home https://doi.org/10.1016/j.acalib.2007.12.002 https://doi.org/10.1016/j.acalib.2019.102102 https://doi.org/10.1108/pmm-04-2014-0012 https://doi.org/10.1080/01972240701883955 https://doi.org/10.1086/209405 https://doi.org/10.1016/j.acalib.2017.03.004 information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 16 49 tsang and chiu, “effectiveness of virtual reference services.” 50 aina, babalola, and oduwole, “use of web 2.0 tools and services.” 51 aina, babalola, and oduwole, “use of web 2.0 tools and services.” 52 leung et al., “user perceptions”; zhang et al., “relating library user education.” 53 maggie ka yin chan, dickson k. w. chiu, and ernest tak hei lam, “effectiveness of overnight learning commons: a comparative study,” the journal of academic librarianship 46, no. 6 (2020): 102253, https://doi.org/10.1016/j.acalib.2020.102253; tsang and chiu, “effectiveness of virtual reference services.” 54 wesley wing hong cheng, ernest tak hei lam, and dickson k. w. chiu, “social media as a platform in academic library marketing: a comparative study,” the journal of academic librarianship 46, no. 5 (2020): 102188, https://doi.org/10.1016/j.acalib.2020.102188. 55 parker fruehan and diana hellyar, “expanding and improving our library's virtual chat service: discovering best practices when demand increases,” information technology and libraries 40, no. 3 (2021): 1–9, https://doi.org/10.6017/ital.v40i3.13117; pui yik yu, ernest tak hei lam, and dickson k. w. chiu, “operation management of academic libraries in hong kong under covid-19,” library hi tech, (2022), ahead of print, https://doi.org/10.1108/lht10-2021-0342. 
56 friday, chidozie, and chukwuma, “social media and library services.” 57 mohd azmi, noorhidawati, and yanti idaya aspura, “librarians’ behavioral performance.” 58 aina, babalola, and oduwole, “use of web 2.0 tools and services”; friday, chidozie, and chukwuma, “social media and library services.” 59 yu, lam, chiu, “operation management” 60 tsang and chiu, “effectiveness of virtual reference services.” 61 kirsti nilsen, “the library visit study: user experiences at the virtual reference desk,” information research 9, no. 2 (2004), paper 171, http://informationr.net/ir/92/paper171.html. 62 mohd azmi, noorhidawati, and yanti idaya aspura, “librarians’ behavioral performance.” 63 tsang and chiu, “effectiveness of virtual reference services.” 64 mohd azmi, noorhidawati, and yanti idaya aspura, “librarians’ behavioral performance.” 65 tsang and chiu, “effectiveness of virtual reference services.” 66 dessy harisanty et al., “leaders, practitioners and scientists’ awareness of artificial intelligence in libraries: a pilot study,” library hi tech, (2022), ahead of print, https://doi.org/10.1108/lht10-2021-0356. https://doi.org/10.1016/j.acalib.2020.102253 https://doi.org/10.1016/j.acalib.2020.102188 https://doi.org/10.6017/ital.v40i3.13117 https://doi.org/10.1108/lht-10-2021-0342 https://doi.org/10.1108/lht-10-2021-0342 http://informationr.net/ir/9-2/paper171.html http://informationr.net/ir/9-2/paper171.html https://doi.org/10.1108/lht-10-2021-0356 https://doi.org/10.1108/lht-10-2021-0356 information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 17 67 ansari and tripathi, “use of whatsapp”; sudharani and nagaraju, “whatsapp usage.” 68 leung et al., “user perceptions”; tsang and chiu, “effectiveness of virtual reference services.” 69 ernest tak hei lam, cheuk hang au, and dickson k. w. chiu, “analyzing the use of facebook among university libraries in hong kong,” the journal of academic librarianship 45, no. 3 (2019): 175–83, https://doi.org/10.1016/j.acalib.2019.02.007; foley, “instant messaging reference”; tsang and chiu, “effectiveness of virtual reference services.” 70 lam, au, and chiu, “analyzing the use of facebook”; tsang and chiu, “effectiveness of virtual reference services.” 71 cheng, lam, and chiu, “social media as a platform.” 72 tsang and chiu, “effectiveness of virtual reference services.” 73 apple hiu ching lam, kevin k. w. ho, and dickson k. w. chiu, “instagram for student learning and library promotions? a quantitative study using the 5e instructional model,” aslib journal of information management, (2022), in press, https://doi.org/10.1108/ajim-12-2021-0389. 
editorial board thoughts
public libraries respond to the covid-19 pandemic, creating a new service model
jon goddard
information technology and libraries | december 2020. https://doi.org/10.6017/ital.v39i4.12847
jon goddard (jgoddard@northshorepubliclibrary.org) is a librarian at the north shore public library, and a member of the ital editorial board. © 2020.
during the covid-19 pandemic, public libraries have demonstrated, in many ways, their value to their communities. they have enabled their patrons not only to resume their lives but also to learn and grow. additionally, electronic resources offered to patrons through their library card have allowed people to be educated and entertained. the credit must go to the librarians, who initially fueled, and have maintained, this level of service by re-writing the rules, creating a new service model.
once libraries closed, librarians promoted ebooks and other important platforms available to patrons with their library cards. the result: the checkout of ebooks and the use of these platforms rose exponentially. community engagement became completely virtual with librarians, and those who provide library programs to the public, providing services on platforms that they may or may not have heard of, such as zoom and discord. as libraries re-opened, many offered real-time reference services, as well as seamless and contactless curbside service, providing a sense of control and continuity amongst the chaos.
exponential increases in electronic resource usage
overdrive, which is currently used by nearly 90% of public libraries in the united states to manage both ebook and audiobook collections, saw an exponential increase in its usage. since the lockdown began in mid-march, the daily average for ebook checkouts has been consistently 52% above pre-covid periods. additionally, new users to the platform have been consistently double and triple 2019 highs.1 library staff have been helping readers during this time to ensure they obtain access with their devices. in suffolk county, new york, where new patron registration to overdrive is up 72% from last year (as of august 2020), there has been no shortage of requests for help.2
with kids being home from school and learning virtually, it is no surprise that ebook readership skyrocketed amongst ya and juvenile readers with an 87% increase from last year.3 to help them with their homework and studies, families turned to online tutoring.
in suffolk county, new york, the usage of the brainfuse online live tutoring service has been consistently up by nearly 50% during the school closures.4 gale, a cengage company, which offers miss humblebee's academy, a virtual learning program for preschoolers, saw its user sessions increase by 100% from the previous year.5 mailto:jgoddard@northshorepubliclibrary.org information technology and libraries december 2020 public libraries respond to the covid-19 pandemic | goddard 2 adults, also eager to learn new skills, took to online courses as well. gale courses saw a 50% increase in enrollments from march-july from the previous year. likewise, gale presents: udemy, which offers on-demand video courses, saw just over 21,000 course enrollments from marchjune.6 to help those who did not have sufficient broadband wifi to use these necessary resources and platforms, many libraries left their wifi on even when the building was closed to allow access to those in the vicinity of the building. in addition, many libraries purchased wifi hotspots to lend to their patrons. according to pew research, approximately 25% of households do not have a broadband internet connection at home.7 while public libraries cannot provide the only local solution to this gap, here are other steps libraries have been taking during the shutdown: • strengthening wireless signals so people can access wireless from outside library buildings. • hosting drive-up wifi hotspot locations. • partnering with schools to obtain and distribute wifi hotspots to families in need. community engagement virtually community engagement has been vital since the covid-19 lockdown. both librarians and those who provide library programs to the public had to quickly adjust to the virtual world in which we were suddenly living. using a mixture of social media platforms, including facebook live and stories, discord, instagram, youtube, zoom, and gotomeeting, librarians flocked to the internet, providing a wide range of programming. even those libraries that did not previously have any virtual programs managed to very quickly provide quality programs to their patrons. virtual programming was not available at the san josé public library (sjpl) prior to the shutdown. librarians quickly started to move programs online, including story time, and created a program called spring into reading, similar to the summer reading program, to continue to encourage families to read together. they also started a weekly recorded story time, so patrons could call the library and use their phones to hear a story. to date, sjpl has hosted over 2,000 virtual events since the lockdown began on march 17th.8 some libraries, like the oceanside library in new york, were offering virtual programs before the pandemic. when the library closed on march 13, the team started planning to move completely virtual. two days later, the library was offering four programs a day, including story times, book chats, and book clubs. by the end of the week, they were offering eight programs a day.9 in april, may, and june, they found book discussions and story times were the most popular programs. they then started to open their programs to people from out of state, partnering with other libraries. the result? program attendance has increased and several zoom meeting rooms have been maxed out.10 through the lockdown, library patrons have been exercising, listening to concerts, taking virtual vacations, learning new skills, cooking, playing games, and reducing stress. 
this incredible adaptation was only possible due to library workers' quick thinking and a never-ending determination to help.

delivering information and materials with a new service model
at the san josé public library (sjpl), which has over 500,000 library members, library staff had to quickly shift to a new online reality just after the shutdown. to help patrons get the most from their electronic resources, sjpl used libanswers to post faqs and email responses to their issues and questions. when a librarian was available, patrons could use libchat to ask questions in real time. because no one was in the library buildings to answer phones, libanswers and libchat became the only way the public could communicate with staff. chat reference conversations increased roughly fourfold, from approximately 40 chat sessions per day to 160 per day. the chat service was also made available in spanish, vietnamese, and chinese. when the library implemented its express pickup service, sjpl utilized the spaces functionality in libcal to allow patrons to create pickup appointments. when patrons arrived at the library for their appointment, the sms functionality in libanswers allowed them to text staff upon arrival. through the city of san josé's sj access initiative, which aims to help bridge the digital divide in the city, sjpl worked closely with other city departments and the santa clara county office of education to purchase approximately 16,000 high-speed at&t hotspots for students and the public.11

working towards the new normal
the american library association (ala) is committed to advocating strongly for libraries on several different fronts. thanks to thousands of advocacy communications with congress, libraries secured $50 million for the institute of museum and library services (imls) in the coronavirus aid, relief, and economic security (cares) act. this enabled libraries and museums to apply for grants during this time of need.12 in addition, the ala is currently advocating for the passage of the library stabilization fund act (s.4181 / h.r.7486) to allow libraries to retain staff, maintain services, and safely keep communities connected and informed. the legislation calls for $2 billion in emergency recovery funding for america's libraries through the institute of museum and library services (imls).13

while the ala is rightly advocating for these emergency funds, public librarians and administrators should take advantage of this time to strategically review what has been put into place to react to the covid-19 pandemic, and plan for the long term. while it is true that libraries are physical spaces, they are also technology-driven services for learning and connections for all ages. additionally, they have shown that due to this new service model, access has exponentially expanded to new patrons, showing tremendous value when it comes to education and engagement. this new service model should be preserved. programs that engage our communities should be both physical and virtual. physical media and books should be provided both at the circulation desk and through a contactless service. reference services should be provided both at the reference desk and through chat reference. this must be our new normal.
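the express pickup flow described earlier (a patron books a pickup slot, then texts on arrival and staff bring the items out) is simple enough to prototype outside any particular product. the sketch below is a generic python illustration of that flow under the assumption of a hypothetical sms gateway; the class and function names are invented for illustration and this is not the libcal or libanswers api.

```python
# a minimal, generic sketch of a curbside "express pickup" flow:
# a patron books a pickup slot, then texts on arrival and staff are notified.
# this is not the libcal/libanswers api; all names here are hypothetical.
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass
class PickupAppointment:
    patron_phone: str
    slot: datetime
    items: List[str]
    arrived: bool = False

class PickupQueue:
    def __init__(self) -> None:
        self.appointments: Dict[str, PickupAppointment] = {}

    def book(self, phone: str, slot: datetime, items: List[str]) -> None:
        # one active appointment per phone number keeps the example simple
        self.appointments[phone] = PickupAppointment(phone, slot, items)

    def handle_inbound_sms(self, phone: str, text: str) -> str:
        # a real deployment would receive this via an sms gateway webhook
        appt = self.appointments.get(phone)
        if appt is None:
            return "no pickup appointment found for this number."
        if "here" in text.lower():
            appt.arrived = True
            self._notify_staff(appt)
            return "thanks! staff will bring your items out shortly."
        return "reply 'here' when you arrive at the pickup area."

    def _notify_staff(self, appt: PickupAppointment) -> None:
        # stand-in for a staff dashboard alert or printed paging slip
        print(f"[staff alert] {appt.patron_phone} arrived for {len(appt.items)} item(s).")

if __name__ == "__main__":
    q = PickupQueue()
    q.book("555-0100", datetime(2020, 10, 14, 15, 30), ["holds: 2 books"])
    print(q.handle_inbound_sms("555-0100", "i'm here"))
```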
endnotes
1 david burleigh, director, brand marketing & communication at overdrive, phone conversation with author, october 9, 2020.
2 maureen mcdonald, special projects supervisor at the suffolk cooperative library system, phone conversation, september 14, 2020.
3 burleigh.
4 mcdonald.
5 kayla siefker, head of media & public relations at gale, a cengage company; brian risse, vp of sales – public libraries; muna sharif, product manager, discovery & analytics; phone conversation with author, october 16, 2020.
6 siefker.
7 pew research center, "internet/broadband fact sheet," june 12, 2019, accessed october 13, 2020, https://www.pewresearch.org/internet/fact-sheet/internet-broadband/.
8 laurie willis, web services at sjpl, phone conversation with author, october 14, 2020.
9 erica freudenberger, "programming through the pandemic," library journal, may 22, 2020, https://www.libraryjournal.com/?detailstory=programming-through-the-pandemic-covid-19.
10 tony iovino, assistant director for community services at the oceanside library, phone conversation with author, october 19, 2020.
11 willis.
12 american library association, "advocacy & policy," accessed october 15, 2020, http://www.ala.org/tools/covid/advocacy-policy.
13 ibid.

researchgate metrics' behavior and its correlation with rg score and scopus indicators: a combination of bibliometric and altmetric analysis of scholars in medical sciences
saeideh valizadeh-haghi, hamed nasibi-sis,* maryam shekofteh, and shahabedin rahmatizadeh
information technology and libraries | march 2022
https://doi.org/10.6017/ital.v41i1.14033
saeideh valizadeh-haghi (saeideh.valizadeh@gmail.com) is assistant professor, department of medical library and information sciences, school of allied medical sciences, shahid beheshti university of medical sciences, tehran, iran. *corresponding author. hamed nasibi-sis (nasibi.lib@gmail.com) is msc graduate, department of medical library and information sciences, school of allied medical sciences, shahid beheshti university of medical sciences. maryam shekofteh (shekofteh_m@yahoo.com) is associate professor, department of medical library and information sciences, school of allied medical sciences, shahid beheshti university of medical sciences. shahabedin rahmatizadeh (shahab.rahmatizadeh@gmail.com) is assistant professor, department of health information technology and management, school of allied medical sciences, shahid beheshti university of medical sciences, tehran, iran. © 2022.

abstract
objective: social networking sites are appropriate tools for sharing and exposing scientific works to increase citations. the objectives of the present study are to investigate the activity of iranian scholars in the medical sciences in researchgate and to explore the effect of each of the four researchgate metrics on the rg score.
moreover, the citation metrics of the faculty members in scopus and the relationship between these metrics and the rg score were explored.
methods: the study population included all sbmu faculty members who have profiles in researchgate (n=950). the data were collected through researchgate and scopus in january 2021. the spearman correlation coefficient was applied to examine the relationship between researchgate metrics and scopus indicators as well as to determine the effect of each researchgate metric on the rg score.
results: the findings revealed that the publication sharing metric had the highest correlation (0.918) with the rg score and had the greatest impact on it (p-value <0.001), while the question asking metric showed the lowest correlation (0.11). moreover, there was a significant relationship between the rg score and scopus citation metrics (p-value <0.05). furthermore, all four rg metrics had a positive and significant relationship with scopus indicators (p-value <0.05), in which the number of shared publications had the highest correlation compared to other rg metrics.
conclusion: researchers' participation in the researchgate social network is effective in increasing citation indicators. therefore, more activity in the researchgate social network may have favorable results in improving universities' rankings.

introduction
conducting any scientific activity first requires gaining knowledge of previous relevant research and citing those sources. there is often a content link between these activities and the sources cited.1 typically, receiving citations is essential and valuable for researchers because, on the one hand, citations contribute to the career advancement and promotion of researchers and, on the other hand, researchers intend to have a greater impact on science by receiving more citations. to increase the citation rate of scholars' research, these works should be shared with other researchers and made visible to them using appropriate tools.2 in this respect, academic social network sites are appropriate tools for sharing and exposing scientific works to increase citation rates.3 academic social network sites have brought researchers together regardless of time and space constraints and have facilitated scientific communication and information exchange.4 in addition, researchers can use these networks to pursue their common interests with other users.5 various studies indicate that sharing publications through social networks, and the visibility that follows, increases the citation rate of these works by more than 50%.
it has also been observed that journal articles which are shared through these networks have received more citations than other articles in the same journals.6 one academic social network site for the exchange of scientific information is researchgate, which authors can use to cooperate with researchers in all scientific disciplines.7 through this network, researchers‘ scientific works will have better visibility by other people.8 to use this network, users must initially create their profile and then perform scientific activities.9 the researchers’ activity level in this network is indicated by the rg score, which is determined based on four individual metrics, including the number of shared publications, the researcher’s activity in asking questions, the researcher’s activity in answering other people’s questions, and the number of followers. the individual rg metrics affect researchers’ rg score, but the extent of individual metrics impact on this score is not clear.10 shahid beheshti university of medical sciences (sbmu) is one of the top universities in iran. according to the evaluation of medical universities’ research activities in the webometrics ranking of world universities, sbmu has achieved the fourth rank among iran’s medical universities.11 in the centre for science and technology studies (cwts) leiden ranking, this university is ranked 11th among iranian universities and 646th among world universities.12 faculty members are one of the main components in universities’ educational structure and play a crucial role in generating, conducting, and disseminating knowledge. due to the importance of citations of faculty members’ scientific works in ranking systems and the situation of sbmu in world rankings, it seems that measures should be taken to improve the citations of faculty members of this university as one of the ways to improve the ranking of the university. considering that more than half of the published articles never receive citations, as well as the positive role of research sharing on social networks in increasing the number of citations, it seems that the activities of sbmu faculty members in the researchgate network may increase the citations count to their research and, consequently, improve the university’s ranking.13 however, to date, no research has been carried out on the activity of the faculty members of sbmu in researchgate. literature review various studies have addressed researchers’ activity in the researchgate academic network. the level of researchers’ activities in researchgate and the relationship between citation metrics and rg score are among the topics that have been investigated. information technology and libraries march 2022 researchgate metrics’ behavior | valizadeh-haghi, nasibi-sis, shekofteh, and rahmatizadeh 3 regarding researchers’ activity in researchgate, numerous studies have been carried out. among these, the research of nikkar, rahmani, lui, sheeja, muhammad yousuf ali, mahajan, and joshi can be mentioned.14 nikkar et al. conducted a study to investigate surgical researchers’ activities in researchgate, which revealed that the majority of these researchers (86.24%) are active members in researchgate.15 rahmani et al. identified the activity of faculty members of technical colleges in researchgate, which showed that most of these researchers (64.16%) were active members of this network.16 the study by sheeja et al. 
of naval architecture engineering researchers at researchgate revealed that most of them (65%) have a researchgate profile.17 the study by muhammad yousuf ali, titled “altmetrics of pakistani library and information science researchers at researchgate,” indicated that 75.73% of researchers have a researchgate profile.18 in contrast, in studies conducted by mahajan et al., joshi et al., and lui et al., findings showed that less than half of the surveyed researchers are active users of this network.19 in addition to measuring activity in researchgate, some other studies also examined the relationship between the rg score and citation indicators. in this regard, in a study by joshi et al., it was revealed that there is a significant relationship between the rg score and citation metrics.20 shrivastava et al. also conducted an analysis of researchgate profiles of panjab university lecturers.21 the results demonstrated that there is a significant relationship between rg and citation metrics. a study conducted by naderbeigi et al. showed that there is a significant relationship between activity on the researchgate network, rg score, and scopus metrics of the faculty members of sharif university of technology.22 the allied medical science scientists’ activity in researchgate was examined by valizadeh-haghi et al.23 the study revealed that there is a significant relationship between rg scores and scopus indicators. correspondingly, the findings showed that there is a significant relationship between lecturers’ academic ranking and their rg scores as well as scopus indicators. according to the literature, it seems that the effect of each of the individual metrics of researchgate on the rg score has not so far been studied. researchgate also has not officially specified the impact of any of its individual metrics on the rg score, while researchers’ awareness of this impact may affect their activity behavior in any of the individual metrics to increase their rg score. previous studies also show that limited research has been conducted in iran regarding faculty members’ activity in researchgate. accordingly, none of these studies has investigated the activity of all faculties of a university. therefore, the objectives of the present study include (1) investigating the activity of sbmu faculty members in researchgate, (2) investigating the effect of each of the four individual researchgate metrics on the rg score of the faculty members, (3) determining the citation metrics of the faculty members in scopus, and (4) the relationship between individual rg metrics and the faculty members’ scopus citation metrics. material and methods the present altmetrics study population included all sbmu faculty members who have profiles in researchgate (n=950). information technology and libraries march 2022 researchgate metrics’ behavior | valizadeh-haghi, nasibi-sis, shekofteh, and rahmatizadeh 4 to extract the number as well as the name of the faculty members, we used the iranian scientometrics information database, which is developed by the deputy of research and technology of ministry of health and medical education of iran.24 the number of faculty members in this system was 1,430, of which 950 had profiles in researchgate and were examined. the data regarding rg score were collected through direct observation of their profiles in researchgate. 
the rg score includes four metrics: number of shared publications, researcher's activity in asking questions, researcher's activity in answering other people's questions, and followers. data related to the number of citations and the h-index of each of the lecturers were collected by viewing their profiles in the scopus database in january 2021. in this study, it was assumed that there is a significant relationship between researchgate individual metrics and scopus citation metrics. given that the data were not normally distributed, the spearman correlation coefficient was used to examine this relationship. moreover, because the impact of each of the researchgate metrics on the rg score has not been officially determined, the spearman correlation coefficient was also applied to determine the effect of the individual metrics of researchgate on the rg score of the participants. the collected data were analyzed using excel and spss version 18 software.

results
the rg score of the faculty members is shown in table 1. all faculty members had an rg score, and most of the faculty members (79.5%) had an rg score of less than one. the average rg score of participants was 15.88.

table 1. rg score of the sbmu faculty members
rg score | members | % | mean | min | max | sd | median
<1 | 55 | 5.79 | 0.01 | 0 | 0.59 | 0.08 | 0
1-11 | 300 | 31.58 | 6.18 | 1.13 | 10.98 | 2.89 | 6.56
11-21 | 297 | 31.26 | 16.09 | 11 | 20.98 | 2.83 | 15.92
21-31 | 209 | 22 | 25.24 | 21.06 | 30.97 | 2.79 | 25.25
31-41 | 82 | 8.63 | 34.9 | 31.02 | 40.32 | 2.56 | 34.5
41-51 | 6 | 0.63 | 42.88 | 41.46 | 45.84 | 1.57 | 42.34
51-61 | 1 | 0.11 | 56.49 | 56.49 | 56.49 | | 56.49
total | 950 | 100 | 15.88 | 0 | 56.49 | 10.42 | 15.05

the findings show that most of the faculty members have shared their publications in researchgate; only 4.11% of them are not active in sharing their publications (see table 2).

table 2. number of shared publications of the sbmu faculty members
publications | members | % | mean | min | max | sd | median
0 | 39 | 4.11 | 0 | 0 | 0 | 0 | 0
1-50 | 689 | 72.53 | 19.70 | 1 | 50 | 13.12 | 17
51-100 | 145 | 15.26 | 69.83 | 51 | 100 | 13.24 | 68
101-150 | 36 | 3.79 | 121.28 | 101 | 147 | 13.50 | 118
151-200 | 23 | 2.42 | 174.04 | 153 | 199 | 14.66 | 172
201-250 | 10 | 1.05 | 222.80 | 203 | 243 | 14.29 | 226
251-300 | 5 | 0.53 | 262.40 | 251 | 275 | 11.13 | 264
401-450 | 1 | 0.11 | 402 | 402 | 402 | | 402
501-550 | 1 | 0.11 | 522 | 522 | 522 | | 522
801-850 | 1 | 0.11 | 824 | 824 | 824 | | 824
total | 950 | 100 | 39.32 | 0 | 824 | 54.73 | 23

the findings indicate that most faculty members (94.42%) did not have any activity in asking questions. the highest level of activity in this metric was performed by 0.11% of faculty members (see table 3).

table 3. the faculty members' activity in asking questions
questions | members | % | mean | min | max | sd | median
0 | 897 | 94.42 | 0 | 0 | 0 | 0 | 0
1-10 | 51 | 5.37 | 1.73 | 1 | 9 | 1.58 | 1
20-30 | 1 | 0.11 | 28 | 28 | 28 | | 28
41-50 | 1 | 0.11 | 46 | 46 | 46 | | 46
total | 950 | 100 | 0.17 | 0 | 46 | 1.82 | 0

table 4. faculty members' activity in answering questions
answers | members | % | mean | min | max | sd | median
0 | 840 | 88.42 | 0 | 0 | 0 | 0 | 0
1-5 | 91 | 9.58 | 2.08 | 1 | 5 | 1.34 | 2
6-10 | 10 | 1.05 | 7 | 6 | 9 | 1.15 | 7
11-15 | 3 | 0.32 | 13.33 | 12 | 15 | 1.53 | 13
16-20 | 2 | 0.21 | 16 | 16 | 16 | 0 | 16
21-25 | 1 | 0.11 | 25 | 25 | 25 | | 25
31-35 | 1 | 0.11 | 31 | 31 | 31 | | 31
41-45 | 1 | 0.11 | 41 | 41 | 41 | | 41
216-220 | 1 | 0.11 | 218 | 218 | 218 | | 218
total | 950 | 100 | 0.68 | 0 | 218 | 7.43 | 0

additionally, in answering other researchers' questions, most faculty members (88.42%) did not have any activity.
the highest level of activity in this metric was done by 0.11% of them (see table 4). the findings demonstrated that most faculty members had followers, and only 0.74% had no followers (see table 5).

table 5. the number of followers of sbmu faculty members
followers | members | % | mean | min | max | sd | median
0 | 7 | 0.74 | 0 | 0 | 0 | 0 | 0
1-50 | 654 | 68.84 | 21.51 | 1 | 50 | 13.58 | 20
51-100 | 146 | 15.37 | 69.36 | 51 | 99 | 13.76 | 65.5
101-150 | 66 | 6.95 | 121 | 102 | 150 | 14.53 | 119
151-200 | 39 | 4.11 | 171.92 | 151 | 198 | 14.38 | 169
201-250 | 20 | 2.11 | 223.60 | 202 | 246 | 13.29 | 223.5
>250 | 18 | 1.89 | 391.44 | 253 | 891 | 181.72 | 338
total | 950 | 100 | 53.05 | 0 | 891 | 72.17 | 31

the correlation between rg metrics and rg score was examined using the spearman correlation test. the findings showed that the publication sharing metric had the highest correlation (0.918) with the rg score; therefore, it had the greatest impact on the rg score (p-value <0.001). the question asking metric had the lowest correlation (0.11) with the rg score (see table 6).

table 6. the correlation between researchgate metrics and rg score of faculty members
rg score | publication | followers | question | answers
correlation coefficient | 0.918 | 0.774 | 0.11 | 0.185
p-value | < .001 | < .001 | .001 | < .001

the number of citations of the faculty members in the scopus database is presented in table 7. the findings showed that most faculty members had citations, and only 5.16% of them had not received any citations.

table 7. number of citations of sbmu faculty members in scopus
citations | members | % | mean | min | max | sd | median
0 | 49 | 5.16 | 0 | 0 | 0 | 0 | 0
1-500 | 771 | 81.16 | 108.91 | 1 | 495 | 114.80 | 67
501-1000 | 68 | 7.16 | 694.06 | 503 | 997 | 148.87 | 690
1001-1500 | 29 | 3.05 | 1213.66 | 1016 | 1481 | 131.55 | 1180
1501-2000 | 14 | 1.47 | 1771.14 | 1594 | 1995 | 131.41 | 1741
2001-2500 | 9 | 0.95 | 2175.56 | 2029 | 2446 | 137.10 | 2192
2501-3000 | 2 | 0.21 | 2747 | 2686 | 2808 | 86.27 | 2747
3001-3500 | 2 | 0.21 | 3207.5 | 3007 | 3408 | 283.55 | 3207.5
3501-4000 | 2 | 0.21 | 3767 | 3582 | 3952 | 261.63 | 3767
4001-4500 | 1 | 0.11 | 4459 | 4459 | 4459 | | 4459
4501-5000 | 1 | 0.11 | 4581 | 4581 | 4581 | | 4581
6501-7000 | 1 | 0.11 | 6907 | 6907 | 6907 | | 6907
19001-19500 | 1 | 0.11 | 19272 | 19272 | 19272 | | 19272
total | 950 | 100 | 279.37 | 0 | 19272 | 817.14 | 83

the findings indicated that most faculty members had an h-index in scopus, and the mean of their h-index was 6.46 (see table 8).

table 8. h-index of sbmu faculty members
h-index | members | % | mean | min | max | sd | median
0 | 49 | 5.16 | 0 | 0 | 0 | 0 | 0
1-10 | 732 | 77.05 | 4.54 | 1 | 10 | 2.57 | 4
11-20 | 137 | 14.42 | 14.38 | 11 | 20 | 2.81 | 14
21-30 | 29 | 3.05 | 24.41 | 21 | 30 | 2.71 | 24
31-40 | 2 | 0.21 | 35.50 | 31 | 40 | 6.36 | 35.5
61-70 | 1 | 0.11 | 63 | 63 | 63 | | 63
total | 950 | 100 | 6.46 | 0 | 63 | 5.96 | 5

the correlation between researchgate indicators and scopus citation metrics is presented in table 9.

table 9. correlation between researchgate indicators and scopus citation metrics
metric | h-index correlation coefficient | h-index p-value | citation correlation coefficient | citation p-value
rg score | 0.803 | < .001 | 0.791 | < .001
publication | 0.735 | < .001 | 0.715 | < .001
question | 0.09 | .006 | 0.076 | .019
answers | 0.147 | < .001 | 0.119 | < .001
followers | 0.694 | < .001 | 0.676 | < .001

the findings showed a positive and significant relationship between the rg score and scopus citation metrics (p-value <0.05). additionally, all four rg metrics had a positive and significant relationship with scopus citation metrics, including citations and the h-index (p-value <0.05).
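as a brief aside, the nonparametric test reported in tables 6 and 9 is straightforward to reproduce on any pair of metric vectors. the sketch below uses scipy's spearmanr on invented values (not the study's data) and adds a small helper showing how an h-index is derived from a citation list.

```python
# minimal sketch of the spearman rank correlation used for tables 6 and 9.
# the vectors below are invented illustrations, not the study's data.
from scipy.stats import spearmanr

# paired observations for a handful of hypothetical researchers
shared_publications = [12, 45, 3, 80, 27, 0, 150, 9]
rg_score            = [6.2, 18.5, 1.1, 24.9, 11.3, 0.0, 33.7, 4.8]

rho, p_value = spearmanr(shared_publications, rg_score)
print(f"spearman rho = {rho:.3f}, p-value = {p_value:.4f}")
# the same call works for any rg metric against a scopus indicator
# (e.g., followers vs. h-index); no normality assumption is required,
# which is why a rank-based test suits these skewed count data.

def h_index(citations):
    """largest h such that the author has h papers with at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

print(h_index([10, 8, 5, 4, 3]))  # prints 4
```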
the findings showed that the number of shared publications had the highest correlation with citations (0.715) and h-index (0.735) compared to other rg metrics. discussion researchgate’s mission is to link the academic world and make research accessible to all scholars. this study has compared the rg metrics of sbmu faculty members. the major findings are highlighted and discussed around the four research questions of this study. the findings of the present study revealed that even though more than half of the faculty members have profiles in researchgate and are active in this network, compared to the findings of other studies, such as those of yousuf ali, janmohammadi, rahmani and nikkar, this rate is low.25 this issue may be due to the lack of knowledge and familiarity of faculty members with the researchgate social network or the lack of the need to publish outputs on the researchgate social network, which needs further investigation. the present study results also showed that the mean rg score of sbmu faculty members is similar to the results of other studies conducted in iran and other international studies.26 the current study results indicated that the subjects’ activity in the four rg metrics was slight in some indicators, and the highest activity was related to publications metric. the lowest level of members’ activity was related to the asking-questions metric. considering that the rg score results from the scores obtained by the researcher in the four rg metrics, this study’s research results confirm that the faculty members did not pay enough attention to the activity in all the rg metrics. the present findings showed that faculty members have limited activities in sharing publications, which is aligned with the results from other studies.27 this could be due to several reasons. firstly, young faculty members who have recently joined the university as faculty members may have fewer publications in comparison with older members. another reason may be that faculty members who have recently joined the researchgate social network have not had enough time to share all of their publications. it should be noted that sharing publications on researchgate has massive ramifications for the open access movement. it might be that one of the reasons authors do not publish on rg is because they do not have the rights to do so. in this regard, it is worthy to mention that the publication-sharing metric includes both full-text sharing and/or abstract information technology and libraries march 2022 researchgate metrics’ behavior | valizadeh-haghi, nasibi-sis, shekofteh, and rahmatizadeh 9 sharing in which sharing the abstract is legal. so, researchers were free to share their abstracts but they have not done it. the findings also show that most of the sbmu faculty members have no activity in two metrics: asking questions and answering questions. compared to other studies, the activity of sbmu faculty members in these metrics is at a lower level.28 possible reasons for this could be a lack of awareness of the importance of these metrics to increase the rg score, a lack of english language proficiency to participate in asking and answering questions, and a lack of time. however, more research is needed in this area. the results showed that most of the sbmu faculty members have followers. the mean number of followers of the faculty members is similar to what was found in other studies.29 as the number of followers increases, a person’s popularity in their subject area increases. 
they may even be influenced by the researcher’s studies in other areas and follow the researcher’s activities in researchgate and, with the increase of followers, there is a possibility of increasing the citation rate.30 therefore, it is recommended to elaborate on the importance and role of each of the rg metrics in raising the rg score through posters, workshops, and educational brochures for faculty members. in this study, the correlation between each of the rg metrics and rg score was examined using the spearman correlation test. the present study results revealed that the shared publication and number of followers metrics have a stronger correlation with the rg score compared to the metrics of questions and answers. the results also showed a significant correlation between all rg metrics and the rg score of sbmu faculty members. the study results indicated that most of the sbmu faculty members have citations in the scopus database and have an h-index, but most of them received the least number of citations. according to the present study findings, the subjects have little activity in the researchgate social network. as one of the possible reasons for the low number of citations, we can mention the low activity in the researchgate social network. research on surgeons’ publications has also confirmed this.31 nevertheless, there is a need for further research on the low number of citations of faculty members of sbmu. the present study’s findings demonstrated a significant relationship between the rg score and scopus citation metrics (h-index and number of citations). in this regard, the highest correlation was observed between the h-index and rg score (p-value = 0.803). this finding is consistent with other studies’ findings.32 there is also a significant relationship between each of the rg score metrics and scopus citation metrics. in this regard, the highest and lowest correlations with scopus citation metrics were observed between publication, questions, and answers metrics, respectively (p-value = 0.001). considering the positive relationship between each of the rg metrics and scopus citation metrics, it is suggested that faculty members pay enough attention to all of these metrics to help increase their citation indicators. due to the researchgate social network’s role in increasing the visibility of researchers’ scientific outputs, faculty members can consider the use of this network as one of the tools to increase the number of citations and the h-index. conclusion easy access to research outputs and increasing visibility is one of the most important features of researchgate, which, according to the results of this study, has a significant impact on increasing information technology and libraries march 2022 researchgate metrics’ behavior | valizadeh-haghi, nasibi-sis, shekofteh, and rahmatizadeh 10 the use and thus increasing citations. as the results revealed, researchers’ participation in the researchgate social network is effective in increasing citation indicators, including the number of citations and h-index. therefore, more activity in the researchgate social network, followed by receiving citations, can have favorable results in improving rankings for both research institutes and universities. universities can encourage faculty members to join and work in researchgate and other academic-social networks by considering privileges to improve their academic rank. 
libraries and research centers can explain the importance of faculty members' activities in these networks by holding workshops on altmetric indicators and academic social network sites, especially researchgate. they can also justify to researchers the benefits of using this network and sharing more scientific outputs.

declaration of interest: none
funding: this work was supported by the school of allied medical sciences, shahid beheshti university of medical sciences, tehran, iran [grant number 28727]. the research ethics committee has approved this research under the ethical code number ir.sbmu.retech.rec.1400.310.

endnotes
1 bart penders, "ten simple rules for responsible referencing," plos computational biology 14, no. 4 (2018), https://doi.org/10.1371/journal.pcbi.1006036; b. s. lancho barrantes et al., "citation flows in the zones of influence of scientific collaborations," journal of the american society of information science technology 63, no. 3 (2012): 481–89, https://doi.org/10.1002/asi.21682.
2 h. a. piwowar, r. s. day, and d. b. fridsma, "sharing detailed research data is associated with increased citation rate," plos one 2, no. 3 (2007): e308, https://doi.org/10.1371/journal.pone.0000308.
3 stefano bortoli, paolo bouquet, and themis palpanas, "social networking: power to the people," in papers presented in w3c workshop on the future of social networking position, january, barcelona (2009); brian kelly, "can linkedin and academia.edu enhance access to open repositories?", impact of social sciences blog, 2012, https://blogs.lse.ac.uk/impactofsocialsciences/2012/08/23/linkedin-academia-enhance-access-to-open-repositories/.
4 bortoli, bouquet, and palpanas, "social networking."
5 nicole muscanell and sonja utz, "social networking for scientists: an analysis on how and why academics use researchgate," online information review 41, no. 5 (2017): 744–59, https://doi.org/10.1108/oir-07-2016-0185.
6 stevan harnad, "publish or perish—self-archive to flourish: the green route to open access," ercim news 64 (2006), http://eprints.ecs.soton.ac.uk/11715/1/harnad-ercim.pdf.
7 vala ali rohani and siew hock ow, "eliciting essential requirements for social networks in academic environments," in 2011 ieee symposium on computers & informatics (ieee, 2011): 171–76, https://doi.org/10.1109/isci.2011.5958905.
8 elena giglia, "academic social networks: it's time to change the way we do research," european journal of physical and rehabilitation medicine 47, no. 2 (2011): 345–49.
9 rohani and hock ow, "eliciting."
10 peter kraker and elisabeth lex, "a critical look at the researchgate score as a measure of scientific reputation" (paper, quantifying and analysing scholarly communication on the web (ascw15), oxford, uk, june 30, 2015): 7–9, https://doi.org/10.5281/zenodo.35401.
11 webometrics ranking of world universities 2021, https://www.webometrics.info/en/asia/iran%20%28islamic%20republic%20of%29.
12 cwts leiden ranking 2018, https://www.leidenranking.com/ranking/2018/list.
13 richard van noorden, "the science that's never been cited," nature 552 (2017): 162–64, https://doi.org/10.1038/d41586-017-08404-0; rishabh shrivastava and preeti mahajan, "an altmetric analysis of researchgate profiles of physics researchers: a study of university of delhi (india)," performance measurement and metrics 18, no. 1 (2017): 52–66, https://doi.org/10.1108/pmm-07-2016-0033.
14 dennis h. lui et al., "contemporary engagement with social media amongst hernia surgery specialists," hernia 21, no. 4 (2017): 509–15, https://doi.org/10.1007/s10029-017-1609-8; n. k. sheeja and susan k. mathew, "researchgate profiles of naval architecture scientists in india: an altmetric analysis," library philosophy and practice (2019): 1–9, https://digitalcommons.unl.edu/libphilprac/2305/; muhammad yousuf ali and joanna richardson, "pakistani lis scholars' altmetrics in researchgate," program 51, no. 2 (2017): 152–69, https://doi.org/10.1108/prog-07-2016-0052; preeti mahajan, har singh, and anil kumar, "use of snss by the researchers in india: a comparative study of panjab university and kurukshetra university," library review 62, no. 8/9 (2013): 525–46; neil d. joshi et al., "social media in neurosurgery: using researchgate," world neurosurgery 127 (2019): e950–e956, https://doi.org/10.1016/j.wneu.2019.04.007; maliheh nikkar, rahim alijani, and hamid ghazizadeh khalifeh mahaleh, "investigation of the presence of surgery researchers in research gate scientific network: an altmetrics study," iranian journal of surgery 25, no. 2 (2017): 76–82; maryam rahmani et al., "rg score compared with h-index: a case study," sciences and techniques of information management 4, no. 2 (2018): 61–76, http://stim.qom.ac.ir/article_1139_en.html.
15 nikkar, alijani, and ghazizadeh khalifeh mahaleh, "investigation."
16 rahmani et al., "rg score."
17 sheeja and mathew, "researchgate."
18 yusuf ali and richardson, "pakistani."
19 mahajan, singh, and kumar, "use of snss"; joshi et al., "social media"; lui et al., "contemporary."
20 joshi et al., "social media."
21 rishabh shrivastava and preeti mahajan, "relationship amongst researchgate altmetric indicators and scopus bibliometric indicators: the case of panjab university chandigarh (india)," new library world (2015), https://doi.org/10.1108/nlw-03-2015-0017.
22 farahnaz naderbeigi and alireza isfandyari-moghaddam, "researchers' scientific performance in researchgate: the case of a technology university," library philosophy & practice (2018), https://digitalcommons.unl.edu/libphilprac/1752.
23 hamed nasibi-sis, saeideh valizadeh-haghi, and maryam shekofteh, "researchgate altmetric scores and scopus bibliometric indicators among lecturers," performance measurement and metrics 22, no. 1 (2020): 15–24, https://doi.org/10.1108/pmm-04-2020-0020.
24 iranian scientometrics information database, https://isid.research.ac.ir/, accessed december 26, 2021.
25 yusuf ali and richardson, "pakistani"; maryam janmohammadi, maryam rahmani, and zahra rootan, "review of rg indices and ranking of researchers in research gate: case study: faculty of veterinary medicine, university of tehran," in proceedings international interactive information retrieval conference (tehran, 2017); rahmani et al., "rg score"; nikkar, alijani, and ghazizadeh khalifeh mahaleh, "investigation."
26 rahmani et al., "rg score"; naderbeigi and isfandyari-moghaddam, "researchers"; janmohammadi, rahmani, and rootan, "review of rg"; shrivastava and mahajan, "an altmetric analysis"; shrivastava and mahajan, "relationship."
27 shrivastava and mahajan, "an altmetric analysis"; shrivastava and mahajan, "relationship"; janmohammadi, rahmani, and rootan, "review of rg."
28 shrivastava and mahajan, "an altmetric analysis"; shrivastava and mahajan, "relationship."
29 shrivastava and mahajan, "an altmetric analysis"; shrivastava and mahajan, "relationship"; naderbeigi and isfandyari-moghaddam, "researchers"; janmohammadi, rahmani, and rootan, "review of rg."
30 shrivastava and mahajan, "an altmetric analysis."
31 nikkar, alijani, and ghazizadeh khalifeh mahaleh, "investigation."
32 sheeja and mathew, "researchgate"; joshi et al., "social media"; shrivastava and mahajan, "relationship"; naderbeigi and isfandyari-moghaddam, "researchers."

an overview of the current state of linked and open data in cataloging
irfan ullah, shah khusro, asim ullah, and muhammad naeem
information technology and libraries | december 2018
irfan ullah (cs.irfan@uop.edu.pk) is doctoral candidate, shah khusro (khusro@uop.edu.pk) is professor, asim ullah (asimullah@uop.edu.pk) is doctoral student, and muhammad naeem (mnaeem@uop.edu.pk) is assistant professor, at the department of computer science, university of peshawar.

abstract
linked open data (lod) is a core semantic web technology that makes knowledge and information spaces of different knowledge domains manageable, reusable, shareable, exchangeable, and interoperable. the lod approach achieves this through the provision of services for describing, indexing, organizing, and retrieving knowledge artifacts and making them available for quick consumption and publication. this is also aligned with the role and objective of traditional library cataloging. owing to this link, major libraries of the world are transferring their bibliographic metadata to the lod landscape. some developments in this direction include the replacement of the anglo-american cataloging rules, 2nd edition, by resource description and access (rda) and the trend towards the wider adoption of bibframe 2.0. an interesting and related development in this respect is the discussion among knowledge resource managers and the library community on the possibility of enriching bibliographic metadata with socially curated or user-generated content.
the popularity of linked open data and its benefit to librarians and knowledge management professionals warrant a comprehensive survey of the subject. although several reviews and survey articles on the application of linked data principles to cataloging have appeared in literature, a generic yet holistic review of the current state of linked and open data in cataloging is missing. to fill the gap, the authors have collected recent literature (2014–18) on the current state of linked open data in cataloging to identify research trends, challenges, and opportunities in this area and, in addition, to understand the potential of socially curated metadata in cataloging mainly in the realm of the web of data. to the best of the authors’ knowledge, this review article is the first of its kind that holistically treats the subject of cataloging in the linked and open data environment. some of the findings of the review are: linked and open data is becoming the mainstream trend in library cataloging especially in the major libraries and research projects of the world; with the emergence of linked open vocabularies (lov), the bibliographic metadata is becoming more meaningful and reusable; and, finally, enriching bibliographic metadata with user-generated content is gaining momentum. conclusions drawn from the study include the need for a focus on the quality of catalogued knowledge and the reduction of the barriers to the publication and consumption of such knowledge, and the attention on the part of library community to the learning from the successful adoption of lod in other application domains and contributing collaboratively to the global scale activity of cataloging. introduction with the emergence of the semantic web and linked open data (lod), libraries have been able to make their bibliographic data publishable and consumable on the web, resulting in an increased understanding and utility both for humans and machines.1 additionally, the use of linked data principles of lod has allowed connecting related data on the web.2 traditional catalogs as mailto:cs.irfan@uop.edu.pk mailto:khusro@uop.edu.pk mailto:asimullah@uop.edu.pk mailto:mnaeem@uop.edu.pk current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 48 https://doi.org/10.6017/ital.v37i4.10432 collections of metadata about library content have served the same purpose for a long time.3 it is, therefore, natural to establish a link between the two technologies and exploit the capabilities of lod to enhance the power of cataloging services. in this regard, significant milestones have been achieved, which includes the use of linked and open data principles for publishing and linking library catalogs, bibframe, and europeana data model (edm).4 however, the potential of linked and open data for building more efficient libraries and the challenges involved in that direction are mostly unknown due to the lack of a holistic view of the relationship between cataloging and the lod initiative and the advances made in both areas. likewise, the possibility of enriching the bibliographic metadata with user-generated content such as ratings, tags, and reviews to facilitate the search for known-items as well as exploratory search has not received much attention. 5 some studies of preliminary extent have, however, appeared in literature an overview of which is presented in the following paragraphs. several survey and review articles have contributed to different aspects of cataloging in the lod environment. hallo et al. 
investigated how linked data is used in digital libraries, how the major libraries of the world implemented it, and how they benefit from it by focusing on the selected ontologies and vocabularies. 6 they identified several specific challenges to applying linked data to digital libraries. more specifically, they reviewed the linked data applications in digital libraries by analyzing research publications regarding the major national libraries (obtaining five-stars by following linked data principles) and published from 2012 to 2016.7 tallerås examined statistically the quality of linked bibliographic data published by the major libraries including spain, france, the united kingdom, and germany. 8 yoose and perkins presented a brief survey of lod uses under different projects in different domains including libraries, archives, and museums.9 by exploring the current advances in the semantic web, robert identified the potential roles of libraries in publishing and consuming bibliographic data and institutional research output as linked and open data on the web.10 gardašević presented a detailed overview of semantic web and linked open data from the perspective of library data management and their applicability within the library domain to provide a more open and integrated catalog for improved search, resource discovery, and access.11 thomas, pierre-yves, and bernard presented a review of linked open vocabularies (lov), in which they analyzed the health of lov from the requirements perspective of its stakeholders, its current progress, its uses in lod applications, and proposed best practices and guidelines regarding the promotion of lov ecosystem.12 they uncovered the social and technical aspects of this ecosystem and identified the requirements for the long-term preservation of lov data. vandenbussche et al. highlighted the features, components, significance, and applications of lov and identified the ways in which lov supports ontology & vocabulary engineering in the publication, reuse and data quality of lod.13 tosaka and park performed a detailed literature review of rda (2005–11) and identified its fundamental differences from aacr2, its relationship with the metadata standards, and its impact on metadata encoding standards, users, practitioners, and the training required.14 sprochi presented the current progress in rda, frbr (functional requirements for bibliographic records), and bibframe to predict the future of library metadata, the skills and knowledge required to handle it, and the directions in which the library community is heading. 15 gonzales identified the limitations of marc21 and the benefits of and challenges in adopting the bibframe information technology and libraries | december 2018 49 framework.16 taniguchi assessed bibframe 2.0 for the exchange and sharing of metadata created in different ways for different bibliographic resources.17 he discussed bibframe 1.0 from rda point of view.18 he examined bibframe 2.0 from the perspective of rda to uncover issues in its mapping to bibframe including rda expressions in bibframe, mapping rda elements to bibframe properties, and converting marc21 metadata records to bibframe metadata. 19 fayyaz, ullah, and khusro reported on the current state of lod and identified several prominent issues, challenges, and research opportunities. 
20 ullah, khusro, and ullah reviewed and evaluated different approaches for bibliographic classification of digital collections.21 by looking at the above survey and review articles, one may observe that these articles target a specific aspect of cataloging from the perspective of lod. the holistic analysis and a complete picture of the current state of cataloging in transiting to lod ecosystem are missing. this paper adds to the body of knowledge by filling this gap in the literature. more specifically, it attempts to answer the following research questions (rqs): rq01: how linked open data (lod) and vocabularies (lov) are transforming the digital landscape of library catalogs? rq02: what are the prominent/major issues, challenges, and research opportunities in publishing and consuming bibliographic metadata as linked and open data? rq03: what is the possible impact of extending bibliographic metadata with the usergenerated content and making it visible on the lod cloud? the first section of this paper answers rq01 by discussing the potential role of lod and lov in making library catalogs visible and reusable on the web. the second section answers rq02 by identifying some of the prominent issues, challenges, and research opportunities in publishing, linking, and consuming library catalogs as linked data. it also identifies specific issues in rda and bibframe from lod perspective and highlights the quality of lod-based cataloging. the third section answers rq03 by reviewing the state-of-the-art literature on the socially curated metadata and its role in cataloging. the last section concludes the paper followed by references cited in this article. the role of linked open data and vocabularies in cataloging the catalogers, librarians, and information science professionals have always been busy defining the set of rules, guidelines, and standards to record the metadata about knowledge artifacts accurately, precisely, and efficiently. the aacr2 are among the widely used rules and guidelines for cataloging. 
however, it has several issues with the nature of authorship, the relationships between bibliographic metadata, the categorization of format-specific resources, and the description of new data types.22 in an attempt to produce its revised version, aacr3, the cataloging community noticed that a new framework should be developed with the name of rda.23 based on frbr conceptual models, rda is a “flexible and extendible bibliographic framework” that supports data sharing and interoperability and is compatible with marc21 and aacr2.24 according to the rda toolkit, rda describes digital and non-digital resources by taking advantage of the flexibilities and efficiencies of modern information storage and retrieval technologies while at the same time is backward-compatible with legacy technologies used in conventional resource discovery and access applications.25 it is aligned with the ifla’s current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 50 https://doi.org/10.6017/ital.v37i4.10432 (international federation of library associations and institutions) conceptual models of authority and bibliographic metadata (frbr, frad [functional requirements for authority data], frsad [functional requirements for subject authority data]).26 rda accommodates all types of content and media in digital environments with improved bibliographic control in the realm of linked and open data; however, its responsiveness to user requirements needs further research.27 the discussion of the cataloging rules and guidelines stays incomplete without the metadata encoding standards and formats that give practical shape to these rules in the form of library catalogs. the most common encoding formats include dublin core (dc) and marc21. dublin core (http://lov.okfn.org/dataset/lov/vocabs/dce) is a [general-purpose metadata encoding scheme and] vocabulary of fifteen properties with “broad, generic, and usable terms” for resource description in natural language. it is advantageous as it presents relatively low barriers to repository construction; however, it lacks in standards to index subjects consistently as well as to offer a uniform semantic basis necessary for an enhanced search experience.28 the lack of uniform semantic basis is due to the individual interpretations and exploitations of dc metadata by the libraries, which in turn originated from its different and independent implementations at the element level.29 marc21 is the most common machine process-able metadata encoding format for bibliographic metadata. 
it can be mapped to several formats including dc, marc/xml (http://www.loc.gov/standards/marcxml/), mods (http://www.loc.gov/standards/mods), mads (http://www.loc.gov/standards/mads), and other metadata standards.30 however, marc21 has several limitations: only library software and librarians understand it, it is semantically inexpressive and isolated from the web structure, and it lacks expressive semantic connections to relate different data elements in a single catalog record.31 besides its limitations, the marc metadata encoding format is vital for resource discovery, especially within the library environment, and therefore ways must be found to make library collections visible outside the libraries and available through the major web search engines.32 one such effort is from the library of congress (http://catalog.loc.gov/), which introduced a new bibliographic metadata framework, bibframe 2.0, which will eventually replace marc21 and allow the semantic web and linked open data to interlink bibliographic metadata from different libraries. other metadata encoding schemas and frameworks include schema.org, edm, and the international community for documentation (cidoc)'s conceptual reference model (cidoc-crm).33

today, bibliographic metadata records are available on the web in several forms, including marc21, online public access catalogs (opacs), bibliographic descriptions from online catalogs (e.g., the library of congress), online cooperative catalogs (e.g., oclc's worldcat program, https://www.oclc.org/en/worldcat.html), social collaborative cataloging applications (e.g., librarything, https://www.librarything.com), digital libraries (e.g., the ieee xplore digital library, https://ieeexplore.ieee.org/xplore/home.jsp, and the acm digital library, https://dl.acm.org), book search engines such as google books, and commercial databases such as amazon.com. most of these cataloging web applications use either marc or other legacy standards as metadata encoding and representation schemes. however, the majority of these applications are either considering or transiting to the emerging cataloging rules, frameworks, and encoding schemes so that the bibliographic descriptions of their holdings could be made visible and reusable as linked and open data on the web for the broader interests of libraries, publishers, and end-users.

the presence of high-quality reusable vocabularies makes the consumption of linked data more meaningful, which is made possible by linked open vocabularies (lov) that bring value-added extensions to the web of data.34 the following two subsections attempt to answer rq01 by highlighting how lod and lov are transforming the current digital landscape of cataloging.

linked and open data
the semantic web and linked open data have enabled libraries to publish and make visible their bibliographic data on the web, which increases the understanding and consumption of this metadata both for humans and machines.35 lod connects and relates bibliographic metadata on the web using linked data principles.36 publishing, linking, and consuming bibliographic metadata as linked and open data brings several benefits.
these include improvements in data visibility, linkage with different online services, interoperability through a universal lod platform, and credibility due to user annotations.37 other benefits include the semantic modeling of entities related to bibliographic resources; ease in transforming topics into skos; ease in the usage of linked library data in other services; better data visualization according to user requirements; linking and querying linked data from multiple sources; and improved usability of library linked data in other domains and knowledge areas.38 different users, including scientists, students, citizens, and other stakeholders of library data, can benefit from adopting lod in libraries.39 linked data has the potential to make bibliographic metadata visible, reusable, shareable, and exchangeable on the web with greater semantic interoperability among the consuming applications. several major projects, including bibframe, lodlam (linked open data in libraries, archives and museums [http://lodlam.net]), and ld4l (linked data for libraries [https://www.ld4l.org]), are in progress, which advocate for this potential.40 similarly, library linked data (lld) consists of lod-based bibliographic datasets, available in mods and marc21, that could be used in making search systems more sophisticated and may also be used in lov datasets to integrate applications requiring library and subject-domain datasets.41 bianchini and guerrini report on the current changes in the library and cataloging domains from ranganathan's point of view of the trinity (library, books, staff), which states that changes in one element of this trinity undoubtedly affect the others.42 they found that several factors, including readers, collections, and services, influence this trinity and emphasize the need for change:
• readers have moved from libraries to the web and want to save their time, but they also want many capabilities, including searching and navigating the full text of resources by following links. they want resources connected to similar and related resources. they want concepts interlinked so they can perform exploratory searches and find serendipitous results that fulfill their information needs.
• collections encompass several changes, from their production to dissemination and from search and navigation to the representation and presentation of content. the ways users access them and catalogers describe them are changing. their management is moving beyond the boundaries of the corresponding libraries to the broader landscape of open access and exposure to the lod environment.
• services are moving from bibliographic data silos to the semantic web. this means moving the bibliographic model to a more connected, linked data model and environment. the data is moving from bibliographic database management systems to a large lod graph, where millions of marc records are reused and converted to new encoding formats that are backward compatible with marc21, rda, and others and provide opportunities to be exploited fully by the linked and open data environment.
thinking along this direction, new cataloging rules and guidelines, such as rda, are making us a part of the growing global activity of cataloging. therefore, catalogers should take keen interest in and avail themselves of the opportunities that lie in linked and open data for cataloging.
otherwise, they (as a service) might be forgotten or removed from the trinity, i.e., from collections and readers.43 several major libraries have been actively working to make their bibliographic metadata visible and reusable on the web. the library of congress, through its linked data service (http://id.loc.gov), enables humans and machines to access its authority data programmatically.44 it exposes and interconnects data on the web through dereferenceable uniform resource identifiers (uris).45 its scope includes providing access to the commonly used loc standards and vocabularies (controlled vocabularies and data values) for the list of authorities and controlled vocabularies that loc currently supports.46 according to loc, the linked data service brings several benefits to users, including: accessing data at no cost; providing granular access to individual data values; downloading controlled vocabularies and their data values in numerous formats; enabling linking to loc data values within user metadata using linked data principles; providing a simple restful api and a clear license and usage policy for each vocabulary; accessing data across loc divisions through a unified endpoint; and visualizing relationships between concepts and values.47 however, to fully exploit the potential of lod, loc is mainly focusing on its bibframe initiative.48 bibframe is not only a replacement for the current marc21 metadata encoding format; it is a new way of thinking about how the available large amount of bibliographic metadata could be shared, reused, and made available as linked and open data.49 the bibframe 2.0 (https://www.loc.gov/bibframe/docs/bibframe2-model.html) model organizes information into work (details about the work), instance (work on specific subject quantity in numbers), item (format: print or electronic), and nature (copy/original work). bibframe 2.0 elaborates the roles of the persons involved in a specific work as agents, and the subject of the work as subjects and events.50 according to taniguchi, bibframe 2.0 takes the bibliographic metadata standards to linked and open data with a model and vocabulary that make cataloging more useful both inside and outside the library community.51 to achieve this goal, it needs to fulfill two primary requirements: (1) accepting and representing metadata created with rda by replacing marc21, and therefore serving as a means of creating, exchanging, and sharing rda metadata; and (2) accepting and accommodating descriptive metadata for bibliographic resources created by libraries, cultural heritage communities, and users for wide exchange and sharing. bibframe 2.0 should comply with the linked data principles, including the use of rdf and uris. in addition to the library of congress, oclc, through its linked data research, has also been actively involved in transforming and publishing its bibliographic metadata as linked data.52 under this program, oclc aims to provide a technical platform for the management and publication of its rdf datasets at a commercial scale. it models the key bibliographic entities, including work and person, and populates them with legacy and marc-based metadata. it extends these models to efficiently describe the contents of digital collections, art objects, and institutional repositories, which are not well described in marc.
it improves the bibliographic description of works and their translations. it manages the transition from marc and other legacy encoding formats to linked data and develops prototypes for native consumption of linked data to improve resource description and discovery. finally, it organizes teaching and training events.53 since 2012, oclc has been publishing bibliographic data as linked data with three major lod datasets, including oclc persons, worldcat works, and worldcat.org.54 inspired by google research, oclc has been working on a knowledge vault pipeline to harvest, extract, normalize, weigh, and synthesize knowledge from bibliographic records, authority files, and the web to generate linked data triples that improve the exploration and discovery experience of end-users.55 worldcat.org publishes its bibliographic metadata as linked data by extracting a rich set of entities, including persons, works, places, events, concepts, and organizations, to make possible several web services and functionalities for resource discovery and access.56 it uses schema.org (http://schema.org) as the base ontology, which can be extended with different ontologies and vocabularies to model worldcat bibliographic data to be published and consumed as linked data.57 tennant presents a simple example of how this works. suppose we want to represent the fact "william shakespeare is the author of hamlet" as linked data.58 to do this, the important entities should be extracted along with their semantics (relationships) and represented in a format that is both machine-processable and human-readable. using schema.org, the virtual international authority file (viaf.org), and worldcat.org, the sentence can be represented as a linked data triple, as shown in figure 1, based on tennant.59 the digital bibliography & library project (dblp) is an online computer science bibliography that provides bibliographic information about major publications in computer science, with the goal of providing free access to high-quality bibliographic metadata and links to the electronic versions of these publications.60 as of october 2018, it has indexed more than 4.3 million publications from more than 2.1 million authors, covering more than 40,000 journal volumes, 38,000 conference/workshop proceedings, and more than 80,000 monographs.61 its dataset is available as lod, which allows for faceted search and faceted navigation to the matching publications. it uses growbag graphs to create topic facets and uses the dblp++ dataset (an enhanced version of dblp) and additional data extracted from related webpages.62 a mysql database stores the dblp++ dataset, which is accessible in several ways: (1) getting the database dump; (2) using its web services; (3) using a d2r server to access it in rdf; and (4) getting the rdf dump available in n3 serialization.63 the above discussions on loc, oclc, and dblp make it clear that lod can potentially transform the cataloging landscape of libraries by making bibliographic metadata visible and reusable on the web. however, this potential can only be exploited to its fullest if relevant vocabularies are provided to make the linked data more meaningful. lov fulfills this demand for relevant and standard vocabularies, as discussed in the next subsection.
figure 1. an example of publishing a sample fact as linked data (based on tennant64).
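the fact shown in figure 1 can be sketched in a few lines of python with rdflib. the following is a hedged illustration in the spirit of tennant's example; the specific viaf and worldcat uris are assumptions and should be replaced with the authoritative identifiers.

# a hedged sketch of the fact "william shakespeare is the author of hamlet"
# as a linked data triple, in the spirit of figure 1. the viaf and worldcat
# uris are illustrative assumptions; look up the authoritative ones.
from rdflib import Graph, URIRef, Namespace

SCHEMA = Namespace("http://schema.org/")

hamlet = URIRef("http://worldcat.org/entity/work/id/1835352")   # assumed worldcat work uri
shakespeare = URIRef("http://viaf.org/viaf/96994048")           # assumed viaf uri

g = Graph()
g.bind("schema", SCHEMA)
g.add((hamlet, SCHEMA.author, shakespeare))

print(g.serialize(format="turtle"))
# the triple reads: <hamlet work> schema:author <shakespeare> .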
linked open vocabularies
linked open vocabularies (lov) are a "high-quality catalog of reusable vocabularies to describe linked and open data."65 they assist publishers in choosing the appropriate vocabulary to efficiently describe the semantics (classes, properties, and data types) of the data to be published as linked and open data.66 lov interconnects vocabularies, supports version control, matches the property types of values against a query to increase the score of the terms, and offers a range of data access methods including apis, a sparql endpoint, and data dumps. the aim is to make the reuse of well-documented vocabularies possible in the lod environment.67 the lov portal brings value-added extensions to the web of data, which is evident from its adoption in several state-of-the-art applications.68 the presence of a vocabulary makes the corresponding linked data meaningful; if the original vocabulary vanishes from the web, linked data applications that rely on it no longer function because they cannot validate against the authoritative source. lov systems prevent vocabularies from becoming unavailable by providing redundant or backup locations for these vocabularies.69 the lov catalog meets almost all types of search criteria, including search using metadata, ontology, apis, rdf dump, and sparql endpoint, enabling it to provide a range of services regarding the reuse of rdf vocabularies.70 linked data should be accompanied by its meaning to achieve its benefits, which is possible using vocabularies, especially rdf vocabularies that are themselves published as linked data and linked with each other, forming an lov ecosystem.71 such an ecosystem defines the health and usability of linked data by making its meaningful interpretation possible.72 for an ontology or vocabulary to be included in the lov catalog, it must be of an appropriate size with low-level and normalized constraints and represented in rdfs or the web ontology language (owl); it must allow creating instances and support documentation by permitting comments, labels, definitions, and descriptions to support end users.73 the ontology must have additional characteristics, such as being described in semantic web languages like owl, being published on the web with no limitations on its reuse, and supporting content negotiation using searchable content and namespace uris.74 the lov catalog offers four core functionalities that make it attractive for libraries: aggregate accesses vocabularies through a dump file or a sparql endpoint; search finds classes/properties in a vocabulary or ontology; stat displays descriptive statistics of lov vocabularies; and suggest enables the registration of new vocabularies.75 radio and hanrath uncovered the concerns regarding transitioning to lov, including how preexisting terms could be mapped while considering the potential semantic loss.76 they describe this transition in light of a case study at the university of kansas institutional repository, which adopted oclc's fast vocabulary and analyzed the outcomes and impact of exposing its data as linked data. to them, a vocabulary that is universal in scope and detail can become "bloated" and may result in an aggregated list of uncontrolled terms. however, such a diverse system may be capable of accurately describing the contents of an institutional repository.
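the kind of term-to-vocabulary reconciliation that radio and hanrath describe can be prototyped crudely, as in the hedged sketch below: legacy keyword strings are matched against a small, hand-made table of oclc fast uris (the table and the uris are hypothetical placeholders, not a real reconciliation service), and unmatched terms are flagged as candidates for semantic loss.

# a rough, hypothetical illustration of mapping legacy subject keywords to a
# linked data vocabulary such as oclc fast. the lookup table and uris are
# made-up placeholders; a real workflow would query a reconciliation service.
legacy_terms = ["world war, 1939-1945", "local history club minutes", "birds"]

fast_lookup = {                      # hypothetical mappings
    "world war, 1939-1945": "http://id.worldcat.org/fast/0000001",
    "birds": "http://id.worldcat.org/fast/0000002",
}

mapped, unmatched = {}, []
for term in legacy_terms:
    uri = fast_lookup.get(term.lower().strip())
    if uri:
        mapped[term] = uri
    else:
        unmatched.append(term)       # candidate for semantic loss or local extension

print("mapped:", mapped)
print("needs review (potential semantic loss):", unmatched)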
in this regard, adopting linked data vocabulary may serve to increase the overall quality of data by ensuring consistency with greater exposure of the resources when published as lod. however, such a transition to a linked data vocabulary is not that simple and gets complicated when the process involves reconciling the legacy metadata especially when dealing with the issues of under or misrepresentation.77 publishers, commercial entities, and data providers such as universities are taking keen interest and consortial participation, and therefore the library community must contribute to, benefit from, and consider this inevitable opportunity seriously.78 considering, the core role of libraries in connecting people to the information, they should come forward to make available their descriptive metadata collections as linked and open data for the benefit of the scholarly community on the web. it is time to move from strings (descriptive bibliographic records) to things (data items) that are connected in a more meaningful manner for the consumption of both machines and humans.79 besides the numerous benefits of the lov, there are some well-documented [and well-supported] vocabularies that are “not published or no longer available.”80 while focusing on the mappings between schema.org and lov, nogales et al. argue that the lov portal is limited as “some of the vocabularies are not available here.”81 in other words, the lov portal is growing, but currently, it is at the infant stage, where much work is needed to bring all or at least the missing welldocumented and well-supported vocabularies. this way the true benefits of lov could be exploited to the fullest when such vocabularies are linked and made available for the consumption and reuse of the broader audience and applications of the web of data. challenges, issues, and research opportunities to answer the rq02, this section attempts to identify some of the prominent/key challenges and issues regarding publishing and consuming bibliographic metadata as linked and open data. the sheer scale and diversity of cataloging frameworks, metadata encoding schemes, and standards make it difficult to approach cataloging effectively and efficiently. the quality of the cataloging data is another dimension that needs proper attention. current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 56 https://doi.org/10.6017/ital.v37i4.10432 the multiplicity of cataloging rules and standards the importance and critical role of standards in cataloging are clear to everyone. with standards, it becomes possible to identify authors uniquely; link users to the intended and the required resources; assess the value and usage of the services a library or information system provides; operate efficiently different transactions regarding bibliographic metadata, link content, preserve metadata, and generate reports; and enable the transfer of notifications, data, and events across machines.82 the success of these standards is because of the community-based efforts and their utility for a person/organization and ease of adoption. 83 however, we are living in a “jungle of standards” with massive scale and complexity.84 we are facing a flood of standards, schemas, protocols, and formats to deal with bibliographic metadata. 85 it is necessary to come up with some uniform and widely accepted standard, schema, protocol, and format, which will make possible the uniformity between bibliographic records and make way for records de-duplication on the web. 
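as a rough illustration of why uniform records help de-duplication, the following sketch (an assumption-laden toy, not a standard algorithm) normalizes title, creator, and year into a match key so that records that differ only in formatting collapse into one cluster.

# a simplistic, assumed illustration of record de-duplication: records that
# differ only in punctuation, case, or name order collapse onto one key.
import re
from collections import defaultdict

def norm(s):
    # lowercase, keep alphanumeric runs, sort tokens so name order does not matter
    return "".join(sorted(re.findall(r"[a-z0-9]+", s.lower())))

def match_key(title, creator, year):
    return (norm(title), norm(creator), year)

records = [  # hypothetical records from different sources and encodings
    {"title": "Hamlet", "creator": "Shakespeare, William", "year": "1603"},
    {"title": "HAMLET.", "creator": "William Shakespeare", "year": "1603"},
]

clusters = defaultdict(list)
for rec in records:
    clusters[match_key(rec["title"], rec["creator"], rec["year"])].append(rec)

for key, recs in clusters.items():
    if len(recs) > 1:
        print("possible duplicates:", key, len(recs))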
also, because of the exponential growth of the digital landscape of document collections and the emerging yet widely adopted linked data environment, it becomes necessary for librarians to be part of this global scale activity of making their bibliographic data available as linked and open data.86 therefore, all these standards need reenvisioning and reconsideration when libraries transit from the current implementations to a more complex lod-based environment.87 rda is easy to use, user-centric, and retrieval-supportive with a precise vocabulary.88 however, it has lengthier descriptions with a lot of technical terms, is time-consuming, needs re-training, and suffers from the generation gap.89 rda is transitioning from aacr2 to produce metadata for knowledge artifacts, and it will be adaptive to the emerging data structures of linked data.90 although librarians could potentially play a vital role in making rda successful, it is challenging to bring them on the same page with publishers and vendors.91 while studying bibframe 2.0 from rda point of view, taniguchi observed that: • bibframe has no class correspondence with rda, especially making a distinction between work and expression is challenging. • some rda elements have no corresponding properties in bibframe, and therefore, cannot be expressed in bibframe. in other cases, bibframe properties cannot be converted back to rda elements due to the many-to-one and many-to-many mappings between them. • the availability of multiple marc21-to-bibframe tools results in the variety of bibframe metadata, which makes its matching and merging in the later stages challenging.92 to understand whether bibframe 2.0 is suitable as a metadata schema, taniguchi examined it closely for domain constraint of properties and developed four additional methods for implementing such constraints, i.e., defining properties in bibframe.93 in these methods, method 1 is the strictest one for defining such properties, method 2 from bibframe, and the remaining gradually loosen. method 1 defines the domain of individual properties as work or instance only, which is according to the method in rda. method 2 defines properties using multiclass structure (work-instance-item) for descriptive metadata. method 3 introduces a new class bibres to accommodate work and instance properties. method 4 uses two classes bibres and work for representing a bibliographic resource. method 5 leaves the domain of any property unspecified and uses rdf:type to represent whether a resource belongs to the work or instance. he observed that: information technology and libraries | december 2018 57 • the multi-class structure used in bibframe (method 2) questions the consistency between this structure and the domain definition of the properties. • if the quality of the metadata is concerned especially matching among converted metadata from different source metadata, then method 1 works better than method 2. • if metadata conversion from different sources is required, then method 4 or 5 should be applied.94 taniguchi concludes that bibframe’s domain constraint policy is unsuitable for descriptive metadata schema to exchange and share bibliographic resources, and therefore, should be reconsidered.95 according to sprochi, bibliographic metadata is passing through a significant transformation. 96 frbr, rda, and bibframe are among the three major and currently running programs that will affect the recording, storage, retrieval, reuse and sharing of bibliographic metadata. 
ifla has focused on reconciling the frbr, frad, and frsad models into one model, namely the frbr-library reference model (frbr-lrm [https://www.ifla.org/node/10280]), published in may 2016.97 sprochi further adds that it is generally expected that, by adopting this new model, rda will be changed and revised significantly. bibframe will also get substantial modifications to become compatible with frbr-lrm and the resulting rda rules.98 these initiatives, on the one hand, make possible their visibility on the web but, on the other hand, introduce several changes and challenges for the library and information science community.99 to cope with the challenges of making bibliographic data visible, available, reusable, and shareable on the web, sprochi argues that:100
• the library and information science community must think of bibliographic records in terms of data that is both human-readable and machine-understandable and that can be processed across different applications and databases with no format restrictions. this data must also support interoperability among vendors, publishers, users, and libraries and should therefore be thought of beyond the notion that "only libraries create quality metadata" (coyle (2007), as cited by sprochi101).
• a shared understanding of the semantic web, lod, data formats, and other related technologies is necessary for the library and information science community to have more meaningful and fruitful conversations with software developers, information & library science (ils) designers, and it & linked data professionals. at least some basic knowledge about these technologies will enable the library community to participate actively in publishing, storing, visualizing, linking, and consuming bibliographic metadata as linked and open data.
• the library community must show a strong commitment to moving ils vendors to "post-marc" standards such as bibframe or any other standard that is supportive of the lod environment. this way we will be in a better position to exploit linked data and the semantic web to their fullest.
the library community must be ready to adopt lod in cataloging. transitioning from marc to linked data needs collaborative efforts and requires addressing several challenges. these challenges include:
• committing to a single standard by getting all units in the library on board, so that the big data problem resulting from the use of multiple metadata standards by different institutions could be mitigated;
• bringing individual experts, libraries, universities, and governments to work together and organize conferences, seminars, and workshops to bring linked data into the mainstream;
• translating the bibframe vocabulary into other languages;
• involving different users and experts in the area; and
• obtaining funding from the public sector and other agencies to continue the journey towards linked data.102
in the current scenario of metadata practices, interoperability for the exchange of metadata varies across different formats.103 the semantic web and lod support different library models such as frbroo, edm, and bibframe. these conceptual models and frameworks suffer from the interoperability issue, which makes data integration difficult.
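the interoperability issue noted above can be seen in a small, hedged example: the same book described twice in one rdflib graph, once with dublin core terms and once with schema.org. nothing in the graph states that the two sets of properties are equivalent, so a consuming application cannot merge them without an explicit mapping; the uri and values are illustrative.

# a hedged sketch of the interoperability problem: one book described with
# two vocabularies. without a mapping (e.g., stating that dcterms:creator and
# schema:author are equivalent), an aggregator sees two unrelated descriptions.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DCTERMS

SCHEMA = Namespace("http://schema.org/")
book = URIRef("http://example.org/book/hamlet")   # illustrative uri

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("schema", SCHEMA)

g.add((book, DCTERMS.title, Literal("hamlet")))
g.add((book, DCTERMS.creator, Literal("william shakespeare")))
g.add((book, SCHEMA.name, Literal("hamlet")))
g.add((book, SCHEMA.author, Literal("william shakespeare")))

print(g.serialize(format="turtle"))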
currently, several options are available for encoding bibliographic data to rdf (and to lod), which further complicates the interoperability and introduces inconsistency.104 existing descriptive cataloging methodologies and the bibliographic ontology descriptions in cataloging and metadata standards set the stage for redesigning and developing better ways of achieving improved information retrieval and interoperability.105 besides the massive heaps of information on the web, the library community (especially digital libraries) has devised standards for metadata and bibliographic description to meet the interoperability requirements for this part of the data on the web.106 semantic web technologies could be exploited to make information presentation, storage, and retrieval more user-friendly for digital libraries.107 to achieve such interoperability among resources, castro proposed an architecture for semantic bibliographic description.108 gardašević emphasizes employing information system engineers and developers to understand the resource description, discovery, and access processes in libraries and then extend these practices by applying linked data principles.109 this way, bibliographic metadata will be more visible, reusable, and shareable on the web. godby, wang, and mixter stress collaborative efforts to establish a single and universal platform for cataloging rules, encoding schemas, and models at a higher level of maturity, which requires initiatives such as rda, bibframe, ld4l, and bibflow (https://bibflow.library.ucdavis.edu/about).110 the massive volume of metadata (available in marc and other legacy formats) makes data migration to bibframe challenging.111 although bibframe challenges the conventional ground of cataloging, which aims to record tangible knowledge containers, it is still at an infant stage at both theoretical and practical levels.112 for bibframe to be more efficient, enhanced, and enriched, it needs the attention of librarians and information science experts who will use it to encode their bibliographic metadata.113 gonzales suggests that librarians must be willing to share metadata and upgrade metadata encoding standards to bibframe; they should train, learn, and upgrade their systems to efficiently use the bibframe encoding scheme and research new ways of bringing interoperability between bibframe and other legacy metadata standards; and they should ensure the data security of patrons and mitigate the legal and copyright issues involved in making their resources visible as linked and open data.114 also, lov must be exploited from the cataloging perspective by finding ways to create a single, flexible, adaptable, and representative vocabulary. such a vocabulary would bring together the cataloging data from different libraries of the world and make it accessible and consumable as a single body of library linked data, freeing us from the jungle of metadata vocabularies [and standards].
publishing and consuming linked bibliographic metadata
according to the findings of one survey, there are several primary motives for publishing an institution's [meta]data as linked data.
these include (in the order from most frequent/ essential to a lesser one):115 • making data visible on the web; • experimenting and finding the potentials of publishing datasets as linked data; • exposing local datasets to understand the nature of linked data; • exploring the benefits of linked data for search engine optimization (seo); • consuming and reusing linked data in future projects; • increasing the data reusability and interoperability; • testing schema.org and bibframe; • meeting the requirements of the project; and • making available the “stable, integrated, and normalized data about research activities of an institution.”116 they also identified several reasons from the participants regarding the consumption of such data. these include (in the order from most frequent/essential to a lesser one):117 • improving the user experience; • extending local data with other datasets; • effectively managing the internal metadata; • improving the accuracy and scope of search results; • trying to improve seo for local resources; • understanding the effect of data aggregation from multiple datasets; and • experimenting and finding the potentials of consuming linked datasets. publishing and consuming bibliographic data on the lod cloud brings numerous applications. kalou et al. developed a semantic mashup by combining semantic web technologies, restful services, and content management services (cms) to generate personalized book recommendations and publish them as linked data.118 it allows for the expressive reasoning and efficient management of ontologies and has potential applications in the library, cataloging services, and ranking book records and reviews. this application exemplifies how we can use the commercially [and socially] curated metadata with bibliographic descriptions from improved user experience in digital libraries using linked data principles. however, publishing and consuming bibliographic metadata as linked and open data is not that simple and need addressing several prominent challenges and issues, which are identified in the following subsections along with some opportunities for further research. 
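before turning to those challenges, the mashup idea attributed to kalou et al. above can be sketched as follows; the review-count function stands in for a real restful service, and the uris, isbn, and property choices are illustrative assumptions rather than the authors' implementation.

# a hedged, illustrative sketch of a mashup in the spirit of kalou et al.:
# merge bibliographic triples with externally sourced review data and expose
# the result as json-ld. "fetch_review_count" is a stand-in for a real web
# service call, and the direct use of schema:reviewCount is a simplification.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import XSD

SCHEMA = Namespace("http://schema.org/")

def fetch_review_count(isbn):
    # placeholder for a restful call to a (hypothetical) review service
    return {"9780000000000": 1312}.get(isbn, 0)

book = URIRef("http://example.org/book/9780000000000")   # illustrative uri and isbn
g = Graph()
g.bind("schema", SCHEMA)
g.add((book, SCHEMA.name, Literal("hamlet")))
g.add((book, SCHEMA.isbn, Literal("9780000000000")))
g.add((book, SCHEMA.reviewCount, Literal(fetch_review_count("9780000000000"), datatype=XSD.integer)))

# json-ld output requires rdflib >= 6.0 (or the rdflib-jsonld plugin)
print(g.serialize(format="json-ld"))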
publishing linked bibliographic metadata
the university of illinois library worked on publishing the marc21 records of 30,000 digitized books as linked library data by adding links, transforming them to lod-friendly semantics (mods), and deploying them as rdf, with the objective of use by a wider community.119 to them, using semantic web technologies, a book can be linked to related resources and multiple possible contexts, which is an opportunity for libraries to build innovative user-centered services for the dissemination and use of bibliographic metadata.120 in this regard, the challenge is to maximally utilize the existing book-related bibliographic and descriptive metadata in a manner that parallels existing services (both inside the library and outside), as well as to exploit to the fullest full-text search and semantic web technologies, standards, and lod services.121 while publishing national bibliographic information as free open linked data, ifla identifies several issues, including:122
• dealing with the negative financial impact on the revenue generated from traditional metadata services;
• the inability to offer consistent services due to the complexity of copyright and licensing frameworks;
• the confusion in understanding the difference between the terms "open" and "free";
• remodeling library data as library linked data;
• the limited persistence and sustainability of linked data resources;
• the steep learning curve in understanding and applying linked data practices to library data;
• making choices between sites to link to; and
• creating persistent uris for library data objects.
from the analysis of the relevant literature, hallo identified several issues in publishing bibliographic metadata as linked and open data. these include difficulties in cataloging and migrating data to new conceptual models; the multiplicity of vocabularies for the same metadata; the lack of agreements to share data; the lack of experts and tools for transforming data; the lack of applications and indicators for its consumption; mapping issues; providing useful links to datasets; defining and controlling data ownership; and ensuring dataset quality.123 libraries should adopt the linked data five-star model by adopting emerging non-proprietary formats to publish their data, linking to external resources and services, and participating actively in enriching and improving the quality of metadata to improve knowledge management and discovery.124 cataloging has a bright future with more dataset providers, achieved by involving citizens and end-users in metadata enrichment and annotation, making ranking and recommendation part of library cataloging services, and increasing the participation of the library community in the semantic web and linked data.125 publishing linked data poses several issues. these include data cleanup issues, especially when dealing with legacy data; technical issues such as data ownership; the software maturity needed to keep linked data up to date; managing its colossal volume; providing it support for data entry, annotation, and modeling; developing representative and widely applicable lovs; and handling the steep learning curve to understand and apply linked data principles.
126 bull and quimby stress understanding how the library community is transiting their cataloging methods, systems, standards, and integrations to the lod for making them visible on the web and how they keep backward compatibility with legacy bibliographic metadata.127 it is necessary for the lod data model to maintain the underlying semantics of the existing models, schemas, and standards, yet innovate and renew old traditions, where the quality of the conversion solely depends on the ability of this new model to cope with heterogeneity conflicts, information technology and libraries | december 2018 61 maintain granularity and semantic attributes and consequently prevent loss of data and semantics.128 the new model should be semantically expressive enough to support meaningful and precise linking to other datasets. by thinking alternatively, these challenges are the significant research opportunities that will enable us to be part of linked and open data community in a more profound manner. consuming linked bibliographic metadata consuming linked data resources can be a daunting task and may involve resolving/mitigating several challenges. these challenges include:129 • dealing with the bulky or non-available rdf dumps, no authority control within rdf dumps, and data format variations; • identifying terms’ specificity levels during concept matching; • the limited reusability of library linked data due to lack of contextual data; • harmonizing classes and objects at the institution level; • excessive handcrafting due to few off-the-shelf visualization tools; • manual mapping of vocabularies; • matching, aligning, and disambiguating library and linked data; • the limited representation of several essential resources as linked data due to nonavailability of uris; • the lack of sufficient representative semantics for bibliographic data; • the time-consuming nature of linked data to understand its structure for reuse; • the ambiguity of terms across languages; and • the non-stability of endpoints and outdated datasets. syndication is required to make library data visible on the web. also, it is necessary to understand how current applications including web search engines perceive and treat visibility, to what extent schema.org matters, and what is the nature of the linked data cloud.130 an influential work may be translated into several languages, which results in multiple metadata records. some of these are complete, and others are with missing details. godby and smith‐ yoshimura suggest aggregating these multiple metadata records into a single record, which can be complete, link the work to its different translations and translators, and is publishable (and consumable) as linked data.131 however, such an aggregation demands a great deal of human effort to make these records visible and consumable as linked data. this also includes describing all types of objects that libraries currently collect and manage, translating research findings to best practices; and establishing policies to use uris in marc and other types of records. 132 to achieve the long-term goal of making metadata consumable as linked data; the libraries, as well as individual researchers, should align their research with work that of the major players such as oclc, loc, and ifla and follow their best practices.133 the issues in lov needs immediate attention to make lod more useful. 
these issues, according to include the following:134 • lov publishes only a subset of rdf vocabularies with no inclusion for value vocabularies such as skos thesaurus; • it provides no or almost negligible support for vocabulary authors; current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 62 https://doi.org/10.6017/ital.v37i4.10432 • it relies on third parties to get the information about vocabulary usage in published datasets; • it has insufficient support for multilingualism or many languages; • it should support multi-term vocabulary search, which is required from the ontology designers to understand and employ the complex relationships among concepts; • it should support vocabulary matching, vocabulary checking, and multilingualism to allow users to search and browse vocabularies using their native language. it also improves the quality of the vocabulary by translation, which allows the community to evaluate and collaborate; and • efforts are required to improve and make possible the long-term preservation of vocabularies. lod emerged to change the design and development of metadata, which has implications for controlled vocabularies, especially, the person/agent vocabularies that are fundamental to data linkage but suffer from the issues of metadata maintenance and verification. 135 therefore, practical data management and the metadata-to-triples transition should be studied in detail to make the wider adaptation of lod possible.136 to come out of the lab environment and make lod practically useful, the controlled vocabularies must be cleaned, and its cost should be reduced.137 however, achieving this is challenging and needs to answer how knowledge artifacts could be uniquely identified and labeled across digital collections and what should be the standard practices to use them.138 linked data is still new to libraries.139 the technological complexities, the feeling of risks in adopting new technology and limitations due to the system, politics, and economy are some of the barriers in its usage in libraries.140 however, libraries can potentially overcome these barriers by learning from the use of linked data in other domains including, e.g., google’s knowledge graph and facebook’s open graph.141 the graph interfaces could be developed to link author, publisher, and book-related information, which in turn can be linked to the other open and freely available datasets.142 it is time that the library and information science professionals come out of the old, document-centric approach to bibliographic metadata and adapt their thinking as more datacentric for a more meaningful consumption of bibliographic metadata by both users and machines.143 quality of linked bibliographic metadata the use of a cataloging data defines its quality.144 the quality is essential for the discovery, usage, provenance, currency, authentication, and administration of metadata. 145 cataloging data or bibliographic metadata is considered fit for use based on its accuracy, completeness, logical consistency, provenance, coherence, timeliness, conformance and accessibility. 
146 data is commonly assessed by its quality for use in specific application scenarios and use cases; however, sometimes low-quality data can still be useful for a specific application, as long as its quality meets the requirements of that application.147 several factors, including availability, accuracy, believability, completeness, conciseness, consistency, objectivity, relevance, understandability, timeliness, and verifiability, determine the quality of data.148 the quality of linked data can be of two types: one is the inherent quality of the linked data, and the other relates to its infrastructural aspects. the former can be further divided into aspects including domain, metadata, rdf model, links among data items, and vocabulary. the infrastructural aspects include the server that hosts the linked data, linked data fragments, and file servers.149 this typology introduces issues of its own; the issues related to inherent quality include "linking, vocabulary usage and the provision of administrative metadata."150 the infrastructural aspect introduces issues related to naming conventions, which include avoiding blank nodes and using http uris, linking through owl:sameas links, describing by reusing existing terms, and dereferencing.151 definitions of quality cataloging are mainly based on the experience and practices of the cataloging community.152 cataloging quality falls into at least four basic categories: (1) the technical details of the bibliographic records, (2) the cataloging standards, (3) the cataloging process, and (4) the impact of cataloging on the user.153 the cataloging community focuses mainly on the quality of bibliographic metadata. however, it is not sufficient to consider only the accuracy, completeness, and standardization of bibliographic metadata; catalogers should also consider the information needs of the users.154 van kleeck et al. investigated issues in the quality management of metadata for electronic resources to assess how well it supports the user tasks of finding, selecting, and accessing library holdings, as well as to identify the potential for increasing efficiencies in acquisition and cataloging workflows.155 they evaluated the quality of existing bibliographic records, mostly provided by their vendors, and compared them with those of oclc, finding that the latter better support users in resource discovery and access.156 from the management perspective, the complexity and volume of bibliographic metadata and the method of ingesting it into the catalog emphasize the selection of the highest-quality records.157 from the perspective of digital repositories, the absence of well-defined theoretical and operational definitions of metadata quality, interoperability, and consistency are some of the issues affecting the quality of metadata.158 the national information standards organization (niso) identifies several issues in creating metadata.
159 these include the inadequate knowledge about cataloging in both manual and automatic environments leading to inaccurate data entry, inconsistency of subject vocabularies, and limitations of resource discovery, and the development of standardized approaches to structure metadata.160 the poor quality of linked data can make its usefulness much difficult.161 datasets are created at the data level resulting in a significant variance in perspectives and underlying data models.162 this also leads to errors in triplication, syntax, and data; misleading owl:sameas links, and the low availability of sparql endpoints.163 library catalogs, because of their low quality, most often fail to communicate clear and correct information correctly to the users.164 the reasons for such low quality include user’s inability to produce catalogs that are free from faults and duplicates as well as low standards and policies that drive these cataloging practices. 165 although the rich collections of bibliographic metadata are available, these are rich in terms of the heaps of cataloging data and not in terms of quality with almost no bibliographic control. 166 these errors in and the low quality of bibliographic metadata are the result of misunderstanding the aims and functions of bibliographic metadata and adopting the “unwise” cataloging standards and policies.167 still there exist some high-quality cataloging efforts with well-maintained cataloging records, where the only quality warrant is to correctly understand the subject matter of the artifact and effectively communicate between librarians and experts in the corresponding domain knowledge. 168 the demand for such high quality and well-managed catalogs has increased on the web. although current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 64 https://doi.org/10.6017/ital.v37i4.10432 people are more accustomed to web search engines, the quality catalogs will attract not only libraries but the general web users as well (when published and consumed as linked data).169 the community must work together on metadata with publishers and vendors to approach cataloging from the user perspective and refine the skillset as well as produce quality metadata.170 as library and information science professionals, we should not only be the users of the standards , instead, we must actively participate and contribute to its development and improvement so that we may effectively and efficiently connect our data with the rest of the world.171 such collaboration is required from not only the librarians and vendors but also from the users in developing an efficient cataloging environment and for a more usable bibliographic metadata, this is discussed in the next section. linking the socially curated metadata this section addresses rq03 by reviewing the state-of-the-art literature from multiple but related domains including library sciences, information sciences, information retrieval, and semantic web. the section below discusses the importance and possible impact of making socially curated metadata as part of the bibliographic or professionally curated metadata. the next section highlights why social collaborative cataloging approaches should be adopted by librarians to work with other stakeholders in making their bibliographic data available and visible as linked and open data and what is the possible impact of fusing the user-generated content with professional metadata and making it available as linked and open data. 
the socially curated metadata matters in cataloging
conventional libraries have clear and well-established classification and cataloging schemes, but these are as challenging to learn, understand, and apply as they are slow and painful to consume.172 using computers to retrieve bibliographic records resulted in the massive usage of copy cataloging.173 however, adopting this practice is challenging because these records are inconsistent; incomplete; less visible, granular, and discoverable; unable to integrate metadata and content into the corresponding records; difficult to preserve in new and usable formats for consumption by users and machines; and not supportive of integrating user-generated content into the cataloging records.174 the university of illinois library, through its vufind service, offers extra features to enhance the search and exploration experience of end users by providing a book's cover image, table of contents, abstracts, reviews, comments, and user tags.175 users can contribute content such as tags, reviews, and comments, and recommend books to friends. however, it is necessary to research whether this user-generated content should be integrated into or preserved alongside the bibliographic records.176 in their book, alemu and stevens mention several advantages of making user-generated content part of library catalogs.177 these include (i) enhancing the functionality of professionally-curated metadata by making information objects findable and discoverable; (ii) removing the limitations posed by the sufficiency and necessity principles of professionally-curated metadata; (iii) bringing users closer to the library by "pro-actively engaging" them in rating, tagging, reviewing, etc., provided that users are also involved in managing and controlling metadata entries; and (iv) the resulting "wisdom of the crowd," which would benefit all the stakeholders of this massively growing socially-curated metadata. however, this combination can only be utilized optimally if we can semantically and contextually link it to internal and external resources; if the resulting metadata is openly accessed, shared, and reused; and if users are supported in easily adding metadata and are made part of quality control by enabling them to report spamming activities to the metadata experts.178 librarything for libraries (ltfl) makes a library catalog more informative and interactive by enhancing the opac, providing access to professional and social metadata, and enabling users to search, browse, and discover library holdings in a more engaging way (https://www.librarything.com/forlibraries). it is one of the practical examples of enriching library catalogs with user-generated content. this trend of merging social and professional metadata innovates library cataloging by dissolving the borders between the "social sphere" and library resources.179 social media has expanded the library into social spaces by exploiting tags and tag clouds as navigational tools and by enriching bibliographic descriptions with integrated user-generated content.180 it bridges the communication gaps between the library and its users, where users participate actively in resource description, discovery, and access.
181 the potential role of the socially curated metadata in resource description, discovery, and access is also evident from the long long-tail social book search research under the initiative for xml retrieval (inex) where both professionally curated bibliographic and user-generated social metadata are exploited for retrieval and recommendation to support both known-item as well as exploratory search.182 by experimenting with amazon/librarything datasets of 2.8 million book records, containing both professional and social metadata, the results conclude that enriching the professional metadata with social metadata especially tags significantly improves search and recommendation.183 koolen also noticed that the social metadata especially tags and reviews significantly improve the search performance as professionally curated metadata is “often too limited” to describe books resourcefully.184 users add socially curated metadata with the intention of making resource re-findable during a future visit, i.e., they add metadata such as tags to facilitate themselves and allow other similar users in resource discovery and access, and therefore, form a community around the resource.185 clements found user tags (social tagging) beneficial for librarians while browsing and exploring the library catalogs.186 to some librarians, tags are complementary to controlled vocabulary; however, training issues and lack of awareness of social tagging functionality in cataloging interfaces prevent their perceived benefit.187 the socially curated metadata as linked data metadata is socially constructed.188 it is shaping and shaped by the context in which it is developed and applied, and demands community-driven approaches, where data should be looked at from a holistic point of view rather than considering them as discrete (individual) semantic units.189 the library is adopting the collaborative social aspect of cataloging that will take place between authors, repository managers, libraries, e-collection consortiums, publishers, and vendors.190 librarians should improve their cataloging skills in line with the advances in technology to expose and make visible their bibliographic metadata as linked and open data.191 currently, linked library data is generated and used by library professionals. 
socially constructed metadata will add value in retrieving knowledge artifacts with precision.192 the addition of socially constructed and community-driven metadata to current metadata structures, controlled vocabularies, and classification systems provides a holistic view of these structures, as it adds a community-generated sense to the professionally-curated metadata structures.193 an example of the possibilities of making user-generated content part of cataloging and linked open data is the semantic book mashup (see "consuming linked bibliographic metadata" above), which demonstrates how commercially [and socially] curated metadata could be retrieved and linked with bibliographic descriptions.194 while enumerating the possible applications of this mashup, the authors argue that book reviews from different websites could be aggregated using linked data principles by extending the review class of bibframe 2.0.195 from the analysis of twenty-one in-depth interviews with lis professionals, alemu discovered four metadata principles, namely metadata enrichment, linkage, openness, and filtering.196 this analysis revealed that the absence of socially curated metadata is sub-optimal for realizing the potential of lod in libraries.197 their analysis advocates a mixed-metadata approach, in which social metadata (tags, ratings, and reviews) augments the bibliographic metadata by involving users proactively and by offering a social collaborative cataloging platform. the metadata principles should be reconceptualized, and linked data should be exploited to address the existing library metadata challenges. therefore, the current efforts in linked data should fully consider social metadata.198 library catalogs should be enriched by mixing professional and social metadata, as well as being semantically and contextually interlinked to internal and external information resources, to be optimally used in different application scenarios.199 to fully exploit this linkage, the duplication of metadata should be reduced. it must be made openly accessible so that its sharing, reuse, mixing, and matching become possible. the enriched metadata must be filtered per user requirements using an interface that is flexible, personalized, contextual, and reconfigurable.200 their analysis suggests a "paradigm shift" in metadata's future, i.e., from simple to enriched; from disconnected, invisible, and locked to well-structured, machine-understandable, interconnected, visible, and more visualized metadata; and from a single opac interface to reconfigurable and adaptive metadata interfaces.201 by involving users in the metadata curation process, the mixed approach will bring diversity to metadata and make resources discoverable, usable, and user-centric on the wider and well-supported platform of linked and open data.202 in conclusion, the fusion of socially curated metadata with standards-based professional metadata is essential from the perspective of the user-centric paradigm of cataloging, which has the potential to aid resource discovery and access and to open new opportunities for information scientists working in linked and open data as well as catalogers who are transitioning to the web of data to make their metadata visible, reusable, and linkable to other resources on the web.
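a hedged sketch of what this mixed-metadata approach might look like in rdf is given below: user-contributed tags and a rating are attached to the same resource that carries the professional description, so both can be published together as linked data. the uris, values, and the (deliberately simplified) use of schema.org properties are assumptions for illustration only.

# a hedged sketch of mixing professional and social metadata on one resource.
# the uris, tag values, and the direct attachment of schema:ratingValue to the
# book are illustrative simplifications, not a prescribed model.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DCTERMS, XSD

SCHEMA = Namespace("http://schema.org/")
book = URIRef("http://example.org/bib/hamlet")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("schema", SCHEMA)

# professionally curated description
g.add((book, DCTERMS.title, Literal("hamlet")))
g.add((book, DCTERMS.creator, Literal("shakespeare, william")))
g.add((book, DCTERMS.subject, Literal("revenge tragedy")))

# socially curated enrichment (user tags and an aggregate rating)
for tag in ["denmark", "ghosts", "read for class"]:
    g.add((book, SCHEMA.keywords, Literal(tag)))
g.add((book, SCHEMA.ratingValue, Literal("4.2", datatype=XSD.decimal)))

print(g.serialize(format="turtle"))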
from the analysis and scholarly discussions of alemu, stevens, farnel, and others as well as from the initial experiments of kalou et al.203 it becomes apparent that the application of linked data principles for library catalogs is future-proof and promising towards more user-friendly search and exploration experience with efficient resource description, discovery, access, and recommendations. conclusions in this paper, we presented a brief yet holistic review of the current state of linked and open data in cataloging. the paper identified the potentials of lod and lov in making the bibliographic descriptions publishable, linkable, and consumable on the web. several prominent challenges, issues, and future research avenues were identified and discussed. the potential role of sociallycurated metadata for enriching library catalogs and the collaborative social aspect of cataloging were highlighted. some of the notable points include the following: information technology and libraries | december 2018 67 • publishing, linking, and consuming bibliographic metadata on the web using linked data principles brings several benefits for libraries.204 the library community should improve their skills regarding this paradigm shift and adopt the best practices from other domains.205 • standards have a key role in cataloging, however, we are living in a “jungle of metadata standards” with varying complexity and scale, which makes it difficult to select, apply and work with.206 to be part of global scale activity of making bibliographic data available on the web as linked and open data, these standards should be considered and reenvisioned.207 • the quality of bibliographic metadata depends on several factors including accuracy, completeness, logical consistency, provenance, coherence, timeliness, conformance and accessibility.208 however, achieving these characteristics is challenging because of several reasons including cataloging errors; limited bibliographic control; misunderstanding the role of metadata; and “unwise” cataloging standards and policies.209 to ensure high-quality and make data visible and reusable as linked data, the library community should contribute to developing and refining these standards and policies. 210 • metadata is socially constructed and demands community-driven approaches and the social collaborative aspect of cataloging by involving authors, repository managers, librarians, digital collection consortiums, publishers, vendors, and users.211 this is an emerging trend, which is gradually dissolving the borders between the “social sphere” and library resources and bridging the communication gap between libraries and their users, where end users contribute to the bibliographic descriptions resulting in a diversity of metadata and making it user-centric and usable.212 • adopting a “mixed-metadata approach” by considering bibliographic metadata and the user-generated content complementary and essential for each other suggests a “paradigm shift” in the metadata’s future from simple to enriched; from human-readable data silos to machine understandable, well-structured, and reusable; from invisible and restricted to visible and open; and from single opac to reconfigurable interfaces on the web.213 several researchers including the ones cited in this article agree that the professionally curated bibliographic metadata supports mostly the known-item search and has little value to open and exploratory search and browsing. 
they believe that not only the collaborative social efforts of the cataloging community are essential but also the socially curated metadata, which can be used to enrich bibliographic metadata and support exploration and serendipity. this is evident not only from the wide usage of librarything and its ltfl but also from the long-running inex social book search research, where both professionally curated bibliographic metadata and user-generated social metadata are exploited for retrieval and recommendation to support known-item as well as exploratory search.214 therefore, this aspect should be considered for further research to make cataloging more useful for all stakeholders, including libraries, users, authors, and publishers, and for general consumption as linked data on the web. the current trend of social collaborative cataloging efforts is essential to fully exploit the potential of linked open data. however, if we look closely, we find four groups, namely librarians, linked data experts, information retrieval (ir) and interactive ir researchers, and users, all going their separate ways with minimal collaboration and communication. more specifically, they are not benefiting from one another as much as they could, which limits the possibilities for better resource description, discovery, and access. for example, the library community should consider the findings of the inex sbs track, which have demonstrated that professional and social metadata are essential to each other to facilitate end users in resource discovery and access and to support not only known-item search but also exploration and serendipity. the current practices of librarything, ltfl, and the social web in general advocate user-centric cataloging, where users are not only consumers of bibliographic descriptions but also contributors to metadata enrichment. linked open data experts have achieved significant milestones in other domains, e.g., e-government; they should understand cataloging and resource discovery and access practices in libraries to make bibliographic metadata not only visible as linked data on the web but also shareable, reusable, and beneficial to end users. a social collaborative cataloging approach that actively involves these four groups is essential to making bibliographic descriptions more useful, not only for the library community and users but also for consumption on the web as linked and open data. together we can, and we must.

references

1 maría hallo et al., “current state of linked data in digital libraries,” journal of information science 42, no. 2 (2016): 117–27, https://doi.org/10.1177/0165551515594729.

2 tim berners-lee, “design issues: linked data,” w3c, 2006, updated june 18, 2009, accessed november 09, 2018, https://www.w3.org/designissues/linkeddata.html; hallo, “current state,” 117.

3 yuji tosaka and jung-ran park, “rda: resource description & access—a survey of the current state of the art,” journal of the american society for information science and technology 64, no. 4 (2013): 651–62, https://doi.org/10.1002/asi.22825.

4 hallo, “current state,” 118; angela kroeger, “the road to bibframe: the evolution of the idea of bibliographic transition into a post-marc future,” cataloging & classification quarterly 51, no. (2013): 873–90.
https://doi.org/10.1080/01639374.2013.823584; martin doerr et al., “the europeana data model (edm).” paper presented at the world library and information congress: 76th ifla general conference and assembly, gothenburg, sweden, august 10–15, 2010. 5 getaneh alemu and brett stevens, an emergent theory of digital library metadata—enrich then filter,1st edition (waltham, ma: chandos publishing, elsevier ltd. 2015). 6 hallo, “current state,” 118 . 7 berners-lee, “design issues.” 8 kim tallerås, “quality of linked bibliographic data: the models, vocabularies, and links of data sets published by four national libraries,” journal of library metadata 17, no. 2 (2017):126– 55, https://doi.org/10.1080/19386389.2017.1355166. 9 becky yoose and jody perkins, “the linked open data landscape in libraries and beyond,” journal of library metadata 13, no. 2–3 (2013): 197–211, https://doi.org/10.1080/19386389.2013.826075. https://doi.org/10.1177/0165551515594729 https://www.w3.org/designissues/linkeddata.html https://doi.org/10.1002/asi.22825 https://doi.org/10.1080/01639374.2013.823584 https://doi.org/10.1080/19386389.2017.1355166 https://doi.org/10.1080/19386389.2013.826075 information technology and libraries | december 2018 69 10 robert fox, “from strings to things,” digital library perspectives 32, no. 1 (2016): 2–6, https://doi.org/10.1108/dlp-10-2015-0020. 11 stanislava gardašević, “semantic web and linked (open) data possibilities and prospects for libraries,” infotheca—journal of informatics & librarianship 14, no. 1 (2013): 26–36, http://infoteka.bg.ac.rs/pdf/eng/2013-1/infotheca_xiv_1_2014_26-36.pdf. 12 thomas baker, pierre-yves vandenbussche, and bernard vatant, “requirements for vocabulary preservation and governance,” library hi tech 31, no. 4 (2013): 657-68, https://doi.org/10.1108/lht-03-2013-0027. 13 pierre-yves vandenbussche et al., “linked open vocabularies (lov): a gateway to reusable semantic vocabularies on the web,” semantic web 8, no. 3 (2017): 437–45, https://doi.org/10.3233/sw-160213. 14 tosaka, “rda,” 651, 652. 15 amanda sprochi, “where are we headed? resource description and access, bibliographic framework, and the functional requirements for bibliographic records library reference model,” international information & library review 48, no. 2 (2016): 129–36, https://doi.org/10.1080/10572317.2016.1176455. 16 brighid m.gonzales, “linking libraries to the web: linked data and the future of the bibliographic record,” information technology and libraries 33, no. 4 (2014): 10, https://doi.org/10.6017/ital.v33i4.5631. 17 shoichi taniguchi, “is bibframe 2.0 a suitable schema for exchanging and sharing diverse descriptive metadata about bibliographic resources?,” cataloging & classification quarterly 56, no. 1 (2018): 40–61, https://doi.org/10.1080/01639374.2017.1382643. 18 shoichi taniguchi, “bibframe and its issues: from the viewpoint of rda metadata,” journal of information processing and management 58, no. 1 (2015): 20–27, https://doi.org/10.1241/johokanri.58.20. 19 shoichi taniguchi, “examining bibframe 2.0 from the viewpoint of rda metadata schema,” cataloging & classification quarterly 55, no. 6 (2017): 387–412, https://doi.org/10.1080/01639374.2017.1322161. 20 nosheen fayyaz, irfan ullah, and shah khusro, “on the current state of linked open data: issues, challenges, and future directions,” international journal on semantic web and information systems (ijswis) 14, no. 4 (2018): 110–28, https://doi.org/10.4018/ijswis.2018100106. 
21 asim ullah, shah khusro, and irfan ullah, “bibliographic classification in the digital age: current trends & future directions,” information technology and libraries 36, no. 3 (2017): 48–77, https://doi.org/10.6017/ital.v36i3.8930. 22 tosaka, “rda,” 659. https://doi.org/10.1108/dlp-10-2015-0020 http://infoteka.bg.ac.rs/pdf/eng/2013-1/infotheca_xiv_1_2014_26-36.pdf https://doi.org/10.1108/lht-03-2013-0027 https://doi.org/10.3233/sw-160213 https://doi.org/10.1080/10572317.2016.1176455 https://doi.org/10.6017/ital.v33i4.5631 https://doi.org/10.1080/01639374.2017.1382643 https://doi.org/10.1241/johokanri.58.20 https://doi.org/10.1080/01639374.2017.1322161 https://doi.org/10.4018/ijswis.2018100106 https://doi.org/10.6017/ital.v36i3.8930 current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 70 https://doi.org/10.6017/ital.v37i4.10432 23 tosaka, “rda,” 651, 652, 659. 24 tosaka, “rda,” 653, 660. 25 the first author used the trial version of rda toolkit to report these facts about rda (https://access.rdatoolkit.org). rda toolkit is co-published by american library association (http://www.ala.org), canadian federation of library associations (http://cflafcab.ca/en/home-page), and facet publishing (http://www.facetpublishing.co.uk). 26 ifla, “ifla conceptual models,” the international federation of library associations and institutions (ifla), 2017, updated april 06, 2009, accessed november 12, 2018, https://www.ifla.org/node/2016. 27 tosaka, “rda,” 651, 652, 655. 28 michael john khoo et al., “augmenting dublin core digital library metadata with dewey decimal classification,” journal of documentation 71, no. 5 (2015): 976–98. https://doi.org/10.1108/jd-07-2014-0103; ulli waltinger et al., “hierarchical classification of oai metadata using the ddc taxonomy,” in advanced language technologies for digital libraries, edited by raffaella bernardi, frederique segond and ilya zaihrayeu. lecture notes in computer science (lncs), 29–40: springer, berlin, heidelberg, 2011; aaron krowne and martin halbert, “an initial evaluation of automated organization for digital library browsing,” paper presented at the proceedings of the 5th acm/ieee-cs joint conference on digital libraries, denver, co, usa, june 7–11, 2005 2005; waltinger, “ddc taxonomy,” 30. 29 khoo, “dublin core,” 977, 984 . 30 loc, “marc standards: marc21 formats,” library of congress (loc), 2013, updated march 14, 2013, accessed january 2, 2014, http://www.loc.gov/marc/marcdocz.html. 31 philip e schreur, “linked data for production and the program for cooperative cataloging,” pcc policy committee meeting, 2017, accessed may 18, 2018, https://www.loc.gov/aba/pcc/documents/facil-session-2017/pcc_and_ld4p.pdf. 32 sarah bull and amanda quimby, “a renaissance in library metadata? the importance of community collaboration in a digital world,” insights 29, no. 2 (2016): 146–53, http://doi.org/10.1629/uksg.302. 33 philip e. schreur, “linked data for production,” pcc policy committee meeting, 2015, accessed november 09, 2018, https://www.loc.gov/aba/pcc/documents/pcc-ld4p.docx. 34 vandenbussche, “linked open vocabularies,” 437, 438, 450. 35 hallo, “current state,” 120. 36 hallo, “current state,” 118. 37 hallo, “current state,” 120, 124. 
https://access.rdatoolkit.org/ http://www.ala.org/ http://cfla-fcab.ca/en/home-page http://cfla-fcab.ca/en/home-page http://www.facetpublishing.co.uk/ https://www.ifla.org/node/2016 https://doi.org/10.1108/jd-07-2014-0103 http://www.loc.gov/marc/marcdocz.html https://www.loc.gov/aba/pcc/documents/facil-session-2017/pcc_and_ld4p.pdf http://doi.org/10.1629/uksg.302 https://www.loc.gov/aba/pcc/documents/pcc-ld4p.docx information technology and libraries | december 2018 71 38 hallo, “current state,” 120, 124. 39 hallo, “current state,” 124. 40 bull, “community collaboration,” 147. 41 sam gyun oh, myongho yi, and wonghong jang, “deploying linked open vocabulary (lov) to enhance library linked data,” journal of information science theory and practice 2, no. 2 (2015): 6–15, http://dx.doi.org/10.1633/jistap.2015.3.2.1. 42 carlo bianchini and mauro guerrini, “a turning point for catalogs: ranganathan’s possible point of view,” cataloging & classification quarterly 53, no. 3-4 (2015): 341–51, http://doi.org/10.1080/01639374.2014.968273. 43 bianchini, “turning point,” 350. 44 loc, “library of congress linked data service,” the library of congress, accessed march 24, 2018, http://id.loc.gov/about/. 45 loc, “linked data service.” 46 loc, “linked data service.” 47 loc, “linked data service.” 48 loc, “linked data service.” 49 margaret e dull, “moving metadata forward with bibframe: an interview with rebecca guenther,” serials review 42, no. 1 (2016): 65–69, https://doi.org/10.1080/00987913.2016.1141032. 50 loc, “overview of the bibframe 2.0 model,” library of congress, april 21, 2016, accessed november 09, 2018, https://www.loc.gov/bibframe/docs/bibframe2-model.html. 51 taniguchi, “bibframe 2.0,” 388; taniguchi, “suitable schema,” 40. 52 oclc. 2016, “oclc linked data research,” online computer library center (oclc), https://www.oclc.org/research/themes/data-science/linkeddata.html. 53 oclc, “linked data research.” 54 jeff mister, “turning bibliographic metadata into actionable knowledge,” next blog—oclc, february 29, 2016, http://www.oclc.org/blog/main/turning-bibliographic-metadata-intoactionable-knowledge/. 55 mister, “turning bibliographic metadata.” 56 george campbell, karen coombs, and hank sway, “oclc linked data,” oclc developer network, march 26, 2018, https://www.oclc.org/developer/develop/linked-data.en.html. 57 campbell, “oclc linked data.” http://dx.doi.org/10.1633/jistap.2015.3.2.1 http://doi.org/10.1080/01639374.2014.968273 http://id.loc.gov/about/ https://doi.org/10.1080/00987913.2016.1141032 https://www.loc.gov/bibframe/docs/bibframe2-model.html https://www.oclc.org/research/themes/data-science/linkeddata.html http://www.oclc.org/blog/main/turning-bibliographic-metadata-into-actionable-knowledge/ http://www.oclc.org/blog/main/turning-bibliographic-metadata-into-actionable-knowledge/ https://www.oclc.org/developer/develop/linked-data.en.html current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 72 https://doi.org/10.6017/ital.v37i4.10432 58 roy tennant, “getting started with linked data,” next blog—oclc, february 8, 2016, http://www.oclc.org/blog/main/getting-started-with-linked-data-3/. 59 tennant, “linked data.” 60 dblp, “dblp computer science bibliography: frequently asked questions,” digital bibliography & library project (dblp), updated november 07, 2018, accessed 08 november 2018. http://dblp.uni-trier.de/faq/. 
61 dblp, “frequently asked questions.” 62 jörg diederich, wolf-tilo balke, and uwe thaden, “demonstrating the semantic growbag: automatically creating topic facets for faceteddblp,” paper presented at the proceedings of the 7th acm/ieee-cs joint conference on digital libraries, vancouver, canada, june 17–22, 2007. 63 jörg diederich, wolf-tilo balke, and uwe thaden, “about faceteddblp,” 2018, accessed november 09, 2018, http://dblp.l3s.de/dblp++.php. 64 tennant, “linked data.” 65 in this section, lov catalog or portal refers to the lov platform available at http://lov.okfn.org/dataset/lov/, whereas the abbreviation lov, when used alone (without the term catalog/portal), refers to linked open vocabularies in general; vandenbussche, “linked open vocabularies,” 437. 66 vandenbussche, “linked open vocabularies,” 443, 450. 67 vandenbussche, “linked open vocabularies,” 437. 68 vandenbussche, “linked open vocabularies,” 437, 438, 450. 69 vandenbussche, “linked open vocabularies,” 438. 70 vandenbussche, “linked open vocabularies,” 437, 438, 443–46. 71 baker thomas, pierre-yves vandenbussche, and bernard vatant, “requirements for vocabulary preservation and governance,” library hi tech 31, no. 4 (2013): 657–68, https://doi.org/10.1108/lht-03-2013-0027. 72 thomas, “vocabulary preservation,” 658. 73 oh, “deploying,” 9. 74 oh, “deploying,” 9. 75 oh, “deploying,” 9, 10. http://www.oclc.org/blog/main/getting-started-with-linked-data-3/ http://dblp.uni-trier.de/faq/ http://dblp.l3s.de/dblp++.php http://lov.okfn.org/dataset/lov/ https://doi.org/10.1108/lht-03-2013-0027 information technology and libraries | december 2018 73 76 erik radio and scott hanrath, “measuring the impact and effectiveness of transitioning to a linked data vocabulary,” journal of library metadata 16, no. 2 (2016): 80–94, https://doi.org/10.1080/19386389.2016.1215734. 77 radio, transitioning,” 81. 78 robert, “strings to things,” 2. 79 robert, “strings to things,” 2, 4, 6. 80 vandenbussche, “linked open vocabularies,” 438. 81 as of april 23, 2018, the schema.org vocabulary is now available at http://lov.okfn.org/dataset/lov/; alberto nogales et al., “linking from schema.org microdata to the web of linked data: an empirical assessment,” computer standards & interfaces 45 (2016): 90-99. https://doi.org/10.1016/j.csi.2015.12.003. 82 bull, “community collaboration,” 146. 83 bull, “community collaboration,” 146. 84 bull, “community collaboration,” 147. 85 bull, “community collaboration,” 147. 86 bull, “community collaboration,” 147, 148. 87 schreur, 2015. linked data for production. 88 yhna therese p. santos, “resource description and access in the eyes of the filipino librarian: perceived advantages and disadvantages,” journal of library metadata 18, no. 1 (2017): 45–56, https://doi.org/10.1080/19386389.2017.1401869. 89 santos, “filipino librarian,” 51–55. 90 philomena w. mwaniki, “envisioning the future role of librarians: skills, services and information resources,” library management 39, no. 1, 2 (2018): 2–11, https://doi.org/10.1108/lm-01-2017-0001. 91 mwaniki, “envisioning the future,” 7, 8. 92 taniguchi, “bibframe 2.0,” 410, 411 . 93 taniguchi, “suitable schema,” 52–58 . 94 taniguchi, “suitable schema,” 59, 60. 95 taniguchi, “suitable schema,” 60. 96 sprochi, “where are we headed?,” 129, 134. 
https://doi.org/10.1080/19386389.2016.1215734 http://lov.okfn.org/dataset/lov/ https://doi.org/10.1016/j.csi.2015.12.003 https://doi.org/10.1080/19386389.2017.1401869 https://doi.org/10.1108/lm-01-2017-0001 current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 74 https://doi.org/10.6017/ital.v37i4.10432 97 sprochi, “where are we headed?,” 129. 98 sprochi, “where are we headed?,” 134. 99 sprochi, “where are we headed?,” 134. 100 sprochi, “where are we headed?,” 134, 135. 101 sprochi, “where are we headed?,” 134. 102 caitlin tillman, joseph hafner, and sharon farnel, “forming the canadian linked data initiative,” paper presented at the the 37th international association of scientific and technological university libraries 2016 (iatul 2016) conference, dalhousie university libraries in halifax, nova scotia, june 5–9, 2016. 103 carol jean godby, shenghui wang, and jeffrey k mixter, library linked data in the cloud: oclc's experiments with new models of resource description. vol. 5, synthesis lectures on the semantic web: theory and technology, san rafael, california (usa),morgan & claypool publishers, 2015, https://doi.org/10.2200/s00620ed1v01y201412wbe012. 104 sofia zapounidou, michalis sfakakis, and christos papatheodorou, “highlights of library data models in the era of linked open data,” paper presented at the the 7th metadata and semantics research conference, mtsr 2013, thessaloniki, greece, november 19 –22, 2013; timothy w. cole et al., “library marc records into linked open data: challenges and opportunities,” journal of library metadata 13, no. 2–3 (2013): 163–96, https://doi.org/10.1080/19386389.2013.826074; kim tallerås, “from many records to one graph: heterogeneity conflicts in the linked data restructuring cycle, information research 18, no. 3 (2013) paper c18, accessed november 10, 2018. 105 fabiano ferreira de castro, “functional requirements for bibliographic description in digital environments,” transinformação 28, no. 2 (2016): 223–31. https://doi.org/10.1590/231808892016000200008. 106 castro, “functional requirements,” 223, 224. 107 castro, “functional requirements,” 224, 230. 108 castro, “functional requirements,” 223, 228–30. 109 gardašević, “possibilities and prospects,” 35. 110 godby, oclc's experiments, 112. 111 gonzales, “the future,” 17. 112 karim tharani, “linked data in libraries: a case study of harvesting and sharing bibliographic metadata with bibframe,” information technology and libraries 34, no. 1 (2015): 5–15. https://doi.org/https://doi.org/10.6017/ital.v34i1.5664. 113 tharani, “harvesting and sharing,” 16. https://doi.org/10.2200/s00620ed1v01y201412wbe012 https://doi.org/10.1080/19386389.2013.826074 https://doi.org/10.1590/2318-08892016000200008 https://doi.org/10.1590/2318-08892016000200008 https://doi.org/https:/doi.org/10.6017/ital.v34i1.5664 information technology and libraries | december 2018 75 114 gonzales, “the future,” 16. 115 karen smith-yoshimura, “analysis of international linked data survey for implementers,” dlib magazine, 2016, july/august 2016. 116 smith-yoshimura, “analysis.” 117 smith-yoshimura, “analysis.” 118 aikaterini k. kalou, dimitrios a. koutsomitropoulos, and georgia d. solomou, “combining the best of both worlds: a semantic web book mashup as a linked data service over cms infrastructure,” journal of library metadata 16, no. 3–4 (2016): 228–49, https://doi.org/10.1080/19386389.2016.1258897. 119 cole, “marc,” 163, 165, 175. 120 cole, “marc,” 163, 164, 191. 121 cole, “marc,” 164, 191. 
122 ifla, “linked open data: challenges arising,” the international federation of library associations and institutions (ifla), 2014, accessed march 03, 2018, https://www.ifla.org/book/export/html/8548. 123 hallo, “current state,” 124. 124 hallo, “current state,” 126. 125 hallo, “current state,” 124. 126 karen smith-yoshimura, “linked data survey results 4–why and what institutions are publishing (updated),” hanging together the oclc research blog, september 3, 2014, accessed november 12, 2018, https://hangingtogether.org/?p=4167. 127 bull, “community collaboration,” 148. 128 tallerås, “one graph.” 129 karen smith-yoshimura, “linked data survey results 3–why and what institutions are consuming (updated),” hanging together the oclc research blog, september 1, 2014, accessed november 12, 2018, http://hangingtogether.org/?p=4155. 130 godby, oclc’s experiments, 116. 131 carol jean godby and karen smith‐yoshimura, “from records to things: managing the transition from legacy library metadata to linked data,” bulletin of the association for information science and technology 43, no. 2 (2017): 18–23, https://doi.org/10.1002/bul2.2017.1720430209. 132 godby, “from records to things,” 23. https://doi.org/10.1080/19386389.2016.1258897 https://www.ifla.org/book/export/html/8548 https://hangingtogether.org/?p=4167 http://hangingtogether.org/?p=4155 https://doi.org/10.1002/bul2.2017.1720430209 current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 76 https://doi.org/10.6017/ital.v37i4.10432 133 godby, “from records to things,” 22. 134 vandenbussche, “linked open vocabularies,” 449, 450. 135 silvia b. southwick, cory k lampert, and richard southwick, “preparing controlled vocabularies for linked data: benefits and challenges,” journal of library metadata 15, no. 3–4 (2015): 177–190, https://doi.org/10.1080/19386389.2015.1099983. 136 southwick, “controlled vocabularies,” 177. 137 southwick, “controlled vocabularies,” 189, 190. 138 southwick, “controlled vocabularies,” 183. 139 robin hastings, “feature: linked data in libraries: status and future direction,” computers in libraries (magzine article), 2015, http://www.infotoday.com/cilmag/nov15/hastings-linked-data-in-libraries.shtml. 140 hastings, “status and future.” 141 hastings, “status and future.” 142 hastings, “status and future.” 143 hastings, “status and future.” 144 tallerås, “national libraries,” 129 (by quoting from van hooland 2009; wang and strong 1996). 145 jung-ran park, “metadata quality in digital repositories: a survey of the current state of the art,” cataloging & classification quarterly 47, no. 3–4 (2009): 213–28, https://doi.org/10.1080/01639370902737240. 146 tallerås, “national libraries,” 129 (by quoting from bruce & hillmann, 2004). 147 park, “metadata quality,” 213, 224; tallerås, “national libraries,” 129, 150. 148 park, “metadata quality,” 213, 215, 218–21, 224, 225; tallerås, “national libraries,” 141. 149 tallerås, “national libraries,” 129. 150 tallerås, “national libraries,” 129. 151 tallerås, “national libraries,” 129. 152 karen snow, “defining, assessing, and rethinking quality cataloging,” cataloging & classification quarterly 55, no. 7–8 (2017): 438–55, https://doi.org/10.1080/01639374.2017.1350774. 153 snow, “quality cataloging,” 445. 154 snow, “quality cataloging,” 451, 452. 
https://doi.org/10.1080/19386389.2015.1099983 http://www.infotoday.com/cilmag/nov15/hastings--linked-data-in-libraries.shtml http://www.infotoday.com/cilmag/nov15/hastings--linked-data-in-libraries.shtml https://doi.org/10.1080/01639370902737240 https://doi.org/10.1080/01639374.2017.1350774 information technology and libraries | december 2018 77 155 david van kleeck et al., “managing bibliographic data quality for electronic resources,” cataloging & classification quarterly 55, no. 7-8 (2017): 560–77, https://doi.org/10.1080/01639374.2017.1350777. 156 van kleeck, “data quality,” 560, 575, 576. 157 van kleeck, “data quality,” 575. 158 park, “metadata quality,” 214, 216–18, 225. 159 niso, a framework of guidance for building good digital collections, ed. niso framework advisory group, 3rd ed (baltimore, md: national information standards organization, 2007), https://www.niso.org/sites/default/files/2017-08/framework3.pdf. 160 park, “metadata quality,” 214, 215; niso. guidance; jane barton, sarah currier, and jessie mn hey, “building quality assurance into metadata creation: an analysis based on the learning objects and e-prints communities of practice,” paper presented at the proceedings of the international conference on dublin core and metadata applications: supporting communities of discourse and practice—metadata research & applications, seattle, washington, september 28–october 2, 2003. 161 pascal hitzler and krzysztof janowicz, “linked data, big data, and the 4th paradigm,” semantic web 4, no. 3 (2013): 233–35, https://doi.org/10.3233/sw-130117. 162 hitzler, “4th paradigm,” 234. 163 hitzler, “4th paradigm,” 234. 164 alberto petrucciani, “quality of library catalogs and value of (good) catalogs,” cataloging & classification quarterly 53, no. 3–4 (2015): 303–13. https://doi.org/10.1080/01639374.2014.1003669. 165 petrucciani, “quality,” 303, 305. 166 petrucciani, “quality,” 303, 309, 311. 167 petrucciani, “quality,” 303, 309. 168 petrucciani, “quality,” 309, 310. 169 petrucciani, “quality,” 310. 170 bull, “community collaboration,” 147. 171 bull, “community collaboration,” 148. 172 han, myung-ja, “new discovery services and library bibliographic control,” library trends 61, no. 1 (2012):162–72, https://doi.org/10.1353/lib.2012.0025. 173 han, “bibliographic control,” 162. https://doi.org/10.1080/01639374.2017.1350777 https://www.niso.org/sites/default/files/2017-08/framework3.pdf https://doi.org/10.3233/sw-130117 https://doi.org/10.1080/01639374.2014.1003669 https://doi.org/10.1353/lib.2012.0025 current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 78 https://doi.org/10.6017/ital.v37i4.10432 174 han, “bibliographic control,” 169–71. 175 han, “bibliographic control,” 163. 176 han, “bibliographic control,” 167–70. 177 alemu, emergent theory, 29–33, 43–65. 178 alemu, emergent theory, 29–65. 179 lorri mon, social media and library services, synthesis lectures on information concepts, retrieval, and services, ed. gary marchionini, 40, san rafael, california (usa), morgan & claypool publishers, 2015), https://doi.org/10.2200/s00634ed1v01y201503icr040. 180 mon, social media, 50. 181 mon, social media, 24. 
182 marijn koolen et al., “overview of the clef 2016 social book search lab,” paper presented at the 7th international conference of the cross-language evaluation forum for european languages, évora, portugal, september 5–8, 2016; koolen et al., “overview of the clef 2015 social book search lab,” paper presented at the 6th international conference of the crosslanguage evaluation forum for european languages, toulouse, france, september 8–11, 2015; patrice bellot et al., “overview of inex 2014,” paper presented at the international conference of the cross-language evaluation forum for european languages, sheffield, uk, september 15–18, 2014; bellot et al., “overview of inex 2013,” paper presented at the international conference of the cross-language evaluation forum for european languages, valencia, spain, september 23–26, 2013. 183 bo-wen zhang, xu-cheng yin, and fang zhou, “a generic pseudo relevance feedback framework with heterogeneous social information,” information sciences 367–68 (2016): 909–26, https://doi.org/10.1016/j.ins.2016.07.004; xu-cheng yin et al., “isart: a generic framework for searching books with social information,” plos one 11, no. 2 (2016): e0148479, https://doi.org/10.1371/journal.pone.0148479; faten hamad and bashar alshboul, “exploiting social media and tagging for social book search: simple query methods for retrieval optimization,” in social media shaping e-publishing and academia, edited by nashrawan tahaet al., 107–17 (cham: springer international publishing, 2017). 184 marijn koolen, “user reviews in the search index? that’ll never work!” paper presented at the 36th european conference on ir research (ecir 2014), amsterdam, the netherlands, april 13–16, 2014. 185 alemu, emergent theory, 29–33, 43–65. 186 lucy clements and chern li liew, “talking about tags: an exploratory study of librarians’ perception and use of social tagging in a public library,” the electronic library 34, no. 2 (2016): 289–301, https://doi.org/10.1108/el-12-2014-0216. 187 clements, “talking about tags,” 291, 297-99. https://doi.org/10.2200/s00634ed1v01y201503icr040 https://doi.org/10.1016/j.ins.2016.07.004 https://doi.org/10.1371/journal.pone.0148479 https://doi.org/10.1108/el-12-2014-0216 information technology and libraries | december 2018 79 188 sharon farnel, “understanding community appropriate metadata through bernstein’s theory of language codes,” journal of library metadata 17, no. 1 (2017): 5–18, https://doi.org/10.1080/19386389.2017.1285141. 189 farnel, “bernstein’s theory,” 5, 6. 190 mwaniki, “envisioning the future,” 8. 191 mwaniki, “envisioning the future,” 8, 9. 192 getaneh alemu et al., “toward an emerging principle of linking socially-constructed metadata,” journal of library metadata 14, no. 2 (2014): 103–29, https://doi.org/10.1080/19386389.2014.914775. 193 farnel, “bernstein’s theory,” 15–16. 194 kalou, “book mashup.” 195 kalou, “book mashup,” 242, 243. 196 alemu, “socially-constructed metadata,” 103, 107. 197 alemu, “socially-constructed metadata,” 103. 198 alemu, “socially-constructed metadata,” 103, 104, 120, 121. 199 getaneh alemu, “a theory of metadata enriching and filtering: challenges and opportunities to implementation,” qualitative and quantitative methods in libraries 5, no. 2 (2017): 311–34, http://www.qqml-journal.net/index.php/qqml/article/view/343 200 alemu, “metadata enriching and filtering,” 311. 201 alemu, “socially-constructed metadata,” 125. 202 alemu, “metadata enriching and filtering,” 319, 320. 
203 alemu, “metadata enriching and filtering”; alemu, emergent theory; alemu, “socially-constructed metadata”; farnel, “bernstein's theory”; kalou, “book mashup.”

204 hallo, “current state,” 120.

205 alemu, “socially-constructed metadata,” 125; hastings, “status and future.”

206 bull, “community collaboration,” 147.

207 bull, “community collaboration,” 152; schreur, 2015. linked data for production.

208 tallerås, “national libraries,” 129.

209 petrucciani, “quality,” 303, 309.

210 bull, “community collaboration,” 147, 152.

211 farnel, “bernstein's theory,” 5, 6, 12, 13, 15, 16; mwaniki, “envisioning the future,” 8.

212 mon, social media, 3; alemu, “metadata enriching and filtering,” 320.

213 alemu, “socially-constructed metadata,” 125.

214 koolen, “clef 2016”; koolen, “clef 2015”; bellot, “inex 2014”; bellot, “inex 2013.”

lita president’s message
in the middle of difficulty lies opportunity: hope floats
evviva weinraub lajoie
information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.12687
evviva weinraub lajoie (evviva@gmail.com) is vice provost for university libraries and university librarian, university at buffalo, and the last lita president. © 2020.

if quarantine has illustrated anything to me, it’s that time is merely a construct. while my approximately 2-month term as president may be the shortest in lita history, it has been filled with meetings, reports, protests, and preparations for our metamorphosis into core. my thoughts have been consumed with the myriad financial, health, and societal issues that have also filled my news feed. i spend a lot of time thinking and worrying about what their impact will be on our work and our institutions, how they affect me and the people i work with personally, and what role core may play for many of us in the future. i imagine all of us are thinking about health and safety. we are all balancing those parts of ourselves that want to aid, to help, to teach and guide with the parts of ourselves that are anxious and scared. many of us have responsibilities where we need to protect our loved ones and ourselves. we are seeing the health and safety of our bipoc colleagues disproportionately harmed. balancing our crucial role within our communities is complicated, and there are no right answers. i imagine many of us have been spending a lot of time thinking about money, whether it be personal concerns, institutional and organizational concerns, or their intersection point.
we’re thinking a lot more about where our money comes from, how it is invested, how we pay for things, how we prioritize paying for things, who decides what gets purchased, and whose voice gets centered when we make that purchase. we’re thinking carefully about the institutions and infrastructures that have existed and how they will look different and should be different in a post-covid landscape. i imagine most of us are thinking about societal connections. we are interacting with our professional colleagues differently, and many of us are, perhaps for the first time, perceiving the deep imbalances that permeate our personal, social, and professional lives. we are all trying to figure out how to do the work we need to do when we are uncomfortable and the world is uncertain and the demands for change are coming from all angles and in a variety of forms. lita remained my professional home through the years because i found it to be a place where no matter who you were or where you worked, there was a place for you. that feeling of connection is so vital to all of us, pandemic and social unrest or not. knowing there is a network i can depend on to be there when i’m working through the difficult and uncomfortable makes the work just a little bit easier and significantly more meaningful. our professional organizations and affiliations have the ability to be an anchor in uncertain times whether through a change in career, a financial crisis, an environmental catastrophe, or a global health emergency. on august 31, 2020, lita officially dissolved and on september 1, our home became core. at our last lita board meeting, margaret heller and amanda l. goodman presented a history of lita. what became clear to me in the retelling is that this is not lita’s first reorganization. nor is it our second or our third. llama, lita, and alcts have always been dancing with each other. our merger is an acknowledgement that we “...play a central role in every library, shaping the future of the profession by striking a balance between maintenance and innovation, process and progress, collaboration and leading.” collectively, we have had a year that is beyond comprehension—it has been filled with loss, anger, frustration, grief, anxiety, depression, horror...we have all been weathering the same storm, but our ships are not all equally prepared for the task laid ahead of them. that has been, for so many of us, the hardest part of all of this. we may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, “what are you going to do to change this?” balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. i believe that together, we can make core stand up to that challenge. it has been an honor to serve as the last lita president. for the brief time i have served, to have the chance to hold an office so many people i truly admire have held...it is a legacy i am proud to have had a moment to uphold. i am gratified to transition lita into a partnership that will take all that we have loved about lita and make something new, something core.
public libraries leading the way
the first 500 mistakes you will make while streaming on twitch.tv
chris markman, kasper kimura, and molly wallner
information technology and libraries | september 2022 https://doi.org/10.6017/ital.v41i3.15475
chris markman (chris.markman@cityofpaloalto.org) is senior librarian, palo alto city library. kasper kimura (kasper.tsutomu@gmail.com) is methodist youth fellowship high school director, wesley united methodist church. molly wallner (molly.wallner@cityofpaloalto.org) is senior librarian, palo alto city library. © 2022.

introduction

three librarians at the palo alto city library embarked on an epic virtual event journey in 2020. this is our story. twitch.tv is the most popular video game streaming platform on the internet right now, but that does not mean it is the easiest to use or navigate as content creators. while the mistakes were many, you do not have to repeat them. in short, lessons learned over the past two years fell under four distinct categories, many of them interrelated or compounding one another:

• physical space limitations and challenges migrating studio setups during various phases of the covid-19 pandemic.
• complex decision-making around audio and video equipment purchases.
• our own familiarity with videogame streaming platforms and specialized software.
• converting our in-person event policies and codes of conduct for virtual events.

mistakes 001–135: picking the right time, place, and software

we can say confidently that mistake #1 in your 500-mistake journey is pretending the library will strike gold with its first-ever stream and achieve instant online success. we chose minecraft as our first videogame featured on twitch.tv. the cold reality is that real-world streamers who host thousands of viewers at one time are not building the interpersonal connections you are likely aiming for as a librarian. the second biggest mistake you’re likely to make while setting up a stream involves picking the right location. over the course of two years, in response to different levels of building access, we ended up moving our ad-hoc studio location a total of four times. each location posed its own challenges, and we learned more about what worked with every move. your streaming space should not only be distraction-free, but also easy to adjust as needed, because your setup will change over time. picking the right av equipment for your stream is a gigantic topic, and the subject of infinite support forum threads and online discussions. the correct answer also largely depends on whether you plan to stick with console game streaming, or pc, or some mixture of both. we can summarize by saying that to start off, you do not need the very best studio gear, and in fact, this thinking can lead to an artificial barrier that might result in more “tech debt” than necessary. you will end up spending a considerable amount of time troubleshooting strange quirks that were not there the last time you streamed, or with each new equipment purchase/upgrade.
mistakes 136–223: moderation tools and volunteers

we have had to block a few bots, as well as tactfully defuse some loose-cannon stream surfers by maintaining aggressive kindness in answer to their sarcastic questioning. overall, our moderating world has not been rocked in a way we weren’t prepared for, due to our thoughtfully crafted and transparent policy that was adapted from our patron code of conduct, trained teen volunteer moderators, and clear communication as a team.

mistakes 224–301: the finer points of twitch.tv

in addition to having had little experience playing video games in general, our stream host also had no experience with streaming. by design, kasper went into our first stream with only two guidelines for interacting with twitch viewers: don’t stop talking and be friendly. no one wants to watch someone silently play a game badly; it’s not engaging and it’s not fun. another part of using twitch that we did not account for until we were in the middle of the first stream is that the chat runs on a delay. this makes sense from a moderating point of view; you want to be able to catch inappropriate or spam comments. however, in terms of holding a conversation with the chat, it became a mental challenge to hold multiple threads of conversation at a time—all while playing—and all while narrating what’s happening on screen, and as people were typing to respond to what was just said or done. this process can be very overwhelming for twitch.tv hosts.1 imagine driving a car on the highway while also watching a movie of yourself, and then simultaneously holding a conversation with ten or more people in the back seat of this car at the same time. they’re not commenting on what you’re currently doing, though; instead, they’re making jokes about the on-ramp or stoplight two miles back. it’s not impossible to juggle these tasks simultaneously, but as the host, it does require practice.

mistakes 302–389: art is a process, just like the inevitable bugs you will find in your setup every time you change anything

heed our warning! you can find a mountain of well-meaning online advice and tutorials about the best possible streaming setup and content strategy: much of this is outdated or aimed at a very specific subset of gamers. there is a cottage industry of media consultants and youtuber personalities that review hardware and share tips-and-tricks advice. your information literacy skills should not go to waste here! always consider the source.

stream decks and keyboard shortcuts: what the twitch.tv pros get right

if we could go back in time, there is one element to our stream setup that could have been integrated sooner, and that’s the stream deck by elgato (https://www.elgato.com/en/stream-deck). this extra desktop keypad is literally a game changer for usability—it is the peanut butter that smooths over all the ux cracks created by obs studio (open broadcaster software) and the chaos of chat interactions already discussed. this small hardware upgrade also makes onboarding new stream hosts much easier because there is no need to memorize keyboard shortcuts: the buttons on the stream deck can be customized to do exactly what they say they do (like mute audio, change screen layouts, or stop and start streaming).
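for readers curious what a lightweight moderation aid for the chat challenges described above might look like, the sketch below connects to twitch chat (which speaks the irc protocol) and flags messages containing terms from a blocklist for a human moderator to review. the channel name, bot account, token, and blocklist phrases are all placeholders, and this is not the setup used at the palo alto city library; it is only a minimal illustration of how chat can be watched programmatically alongside twitch’s own moderation tools.

```python
import socket

# placeholder credentials: a real oauth token comes from twitch's developer flow
TOKEN = "oauth:replace-me"
NICK = "examplelibrarybot"
CHANNEL = "#examplelibrary"
BLOCKLIST = {"buy followers", "free prizes"}  # placeholder spam phrases

sock = socket.socket()
sock.connect(("irc.chat.twitch.tv", 6667))  # twitch chat's irc endpoint
sock.send(f"PASS {TOKEN}\r\n".encode())
sock.send(f"NICK {NICK}\r\n".encode())
sock.send(f"JOIN {CHANNEL}\r\n".encode())

while True:
    data = sock.recv(2048).decode("utf-8", errors="ignore")
    if data.startswith("PING"):
        sock.send("PONG :tmi.twitch.tv\r\n".encode())  # keep the connection alive
        continue
    for line in data.split("\r\n"):
        if "PRIVMSG" in line:
            # irc messages look like ":user!user@host PRIVMSG #channel :text"
            message = line.split(":", 2)[-1].strip().lower()
            if any(term in message for term in BLOCKLIST):
                # flag for a human moderator rather than acting automatically
                print(f"[flagged] {message}")
```

in practice, twitch’s built-in automod settings and trained human moderators do the heavy lifting; a script like this is only a safety net that surfaces suspicious messages without auto-banning anyone.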
mistakes 390–499: do androids dream of electric animal crossing dream codes? what does twitch.tv outreach look like?

we used social media to connect with other organizations doing similar work, such as the lgbtq+ youth space in san jose. we had worked with this group before the pandemic on some pride programs for teens at the library, and so in 2020, when we saw on their instagram that they had a minecraft server open to the local community, our team eagerly jumped on this opportunity to collaborate with them. we had a minecraft stream; they had a minecraft server—could the stars be any more aligned? after some planning, one of the server mods joined us for a stream and gave us a tour of their server, which ended up being one of our most popular streams to date.

conclusion: and what did we learn from all this?

the final mistake (#500) is giving up. over the past two years we have hosted over 50 streams at https://www.twitch.tv/paloaltolibrary and can say confidently that each virtual event was not only unique but also improved over time. we encourage more librarians to test out this mode of online outreach and practice their iterative design skills. video game streaming is not only fun for both the audience and hosts, but also a great way to connect with “extremely online” patrons of all ages.

endnotes

1 to illustrate this problem in more detail: consider the events of our very first stream, in which kasper’s dog saw a postal employee through the window while live on camera and reacted accordingly. this was one of the many reasons why moving our center of operations from the living room to the library was an upgrade.

public libraries leading the way
a collaborative approach to newspaper preservation
ana krahmer and laura douglas
information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.12596
ana krahmer (ana.krahmer@unt.edu) oversees the digital newspaper unit at unt. through this work, she manages the texas digital newspaper program collection on the portal to texas history, which is a gateway to historic research materials freely available worldwide. laura douglas (laura.douglas@cityofdenton.com) is the librarian in charge of special collections at the denton public library, which houses the genealogy, texana, and local denton history collections as well as the denton municipal archives. in her work, she regularly assists patrons with newspaper research questions specifically related to denton newspapers. © 2020.
introduction

when we first proposed this column in january 2020, we had no idea how much the world would change between then and the july deadline. while we have collaborated for many years on a variety of projects, the value of our collaboration has never proven itself more than in this covid-19 reality: collaboration leverages the strengths and resources of partners to form something stronger than each. in this world of covid-19, the collaboration between the denton public library (dpl) and the university of north texas libraries (unt) has allowed us to build open, online access to the first 16 years of the denton record-chronicle (drc). this newspaper is the city’s daily newspaper of record, and the collaboration between dpl and unt resulted in free, worldwide research access via the portal to texas history. the project was funded by a $24,820.00 grant through the imls library services and technology act (lsta), awarded from september 2019 to august 2020 by the texas state library and archives commission (tslac) as part of its textreasures program, to digitize 24,000 newspaper pages. this project has also resulted in a follow-up collaboration to build open access to further years of this daily newspaper title, through a 2021 textreasures award to digitize an additional 24,000 newspaper pages. the real question, though, is what recipe made this a successful collaboration.

background

the drc has been the community newspaper in denton for over 100 years. due to the sheer amount of material, digitizing a daily newspaper with such an extensive publication run is a long-term project that requires a lot of planning, time, and funding. since the dpl’s inception in 1937, the library has endeavored to collect items related to denton and texas history. with community support, the library has developed a well-rounded collection of local history, texana, and genealogical materials, all of which are housed in the special collections research area at the emily fowler central library. these materials support research, projects, and exhibits. one major research resource is the archival collection of local newspapers, mainly the drc, maintained on 752 rolls of microfilm containing issues from 1908 to 2018. before this project, access to these newspapers was only available in the special collections research area, through microfilm readers or paid subscription services. in addition, although steps had been taken to preserve the film, many of the rolls show wear from years of use, while others have developed vinegar syndrome and soon will no longer be a usable resource. in 2018, unt obtained publisher permission to make the drc run freely accessible on the portal to texas history. laura had been exploring different avenues to digitize this microfilm and make it freely available to the public when ana contacted her with information about tslac, which awards annual grants supported by library services & technology act funds, through the institute of museum & library services. lsta funding is annually provided to all fifty states through the institute of museum and library services, and the state library determines how this funding is expended.
in texas, lsta funding is provided through a number of grant programs, including textreasures, a competitive grant program for any texas library. as described by tslac, the “textreasures grant is designed to help libraries make their special collections more accessible for the people of texas and beyond. activities considered for possible funding include digitization, microfilming, and cataloging.” libraries can apply to fund the same type of project up to three years in a row, and the drc project applied for $24,820.00 in 2019 to digitize 24,000 newspaper pages, representing the earliest years of microfilm available at the denton public library. to create a viable grant application, dpl partnered with the texas digital newspaper program (tdnp), available through unt’s portal to texas history, and decided to start first by digitizing as many early years of microfilm as grant funding could cover. tdnp is the largest single-state, open-access digital newspaper preservation repository in the u.s., hosting just under 8 million newspaper pages at the time of this writing. in late 2018, unt received permission from the owner of the drc to include the newspaper run in the tdnp collection, which represented a very exciting opportunity for city and county researchers, as well as for the dpl. as thanks to the publisher for granting permission, unt built access to the 2014 to 2018 pdf eprint editions, which the tdnp preserves as a service to texas press association member publishers. after this, unt contacted the dpl to discuss applying for grant funding. once laura learned that the dpl had received the 2019 award, she prepared the local planning steps necessary to collaborate with the university.

the project becomes real

the denton record-chronicle digitization project grant contract and resolution for adoption went before the denton city council on october 8, 2019. the city of denton issued a press release that day, and the drc also published an article announcing the project. over the next few days, the drc article appeared across social media, including the city of denton’s social media accounts, as well as through library-associated email newsletters. after the first newspapers became available on the portal, both dpl and unt prepared blog posts about the project, which have also appeared on social media. these blog posts fulfilled publicity requirements specified by the grant, even while offering training to researchers in how to work with the online newspaper collection. one major convenience of this collaboration is that both organizations are in the same city. transfer of materials was arranged by email and accomplished by a trip across town. we completed the digitization process in batches, with the first 10 microfilm rolls going to unt on october 10, 2019, and unt uploading the first 854 issues in december 2019. the newspapers from the first microfilm set represented 1908–1916. dpl transferred the last set of microfilm in april 2020, with dates ranging from 1917 through september 1924, shortly after which unt completed and uploaded the grant-funded count of 24,000 newspaper pages. the grant proposal estimated that the scans would extend through 1938, but the newspaper’s page count proved to be much, much higher than originally estimated, and as a result, the funding only covered issues through september 1924.
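the gap between estimated and actual page counts is easy to underestimate when budgeting a digitization grant. the short calculation below is a rough sketch built from the award figures above; the per-issue page averages and the one-issue-per-day assumption are hypothetical and only meant to illustrate the arithmetic, not to restate unt’s actual fee schedule or the drc’s actual page counts.

```python
# award figures from the 2019 textreasures grant described above
award_amount = 24_820.00   # dollars
pages_funded = 24_000      # newspaper pages the award covers

print(f"implied cost per page: ${award_amount / pages_funded:.2f}")  # about $1.03

# hypothetical planning estimate vs. hypothetical observed average
planned_pages_per_issue = 6
observed_pages_per_issue = 8

issues_planned = pages_funded // planned_pages_per_issue   # 4,000 issues
issues_covered = pages_funded // observed_pages_per_issue  # 3,000 issues

# for a daily paper (assuming roughly one issue per day, 365 per year),
# the same page budget now covers nearly three fewer years of the run
years_lost = (issues_planned - issues_covered) / 365
print(f"approximate years of coverage lost: {years_lost:.1f}")        # about 2.7
```

running this kind of sensitivity check at the proposal stage makes it easier to explain to the granting agency why a fixed-page budget may not reach the originally projected end date.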
dpl and unt will continue their partnership by digitizing further years of the drc, through a variety of methods. as we were in the midst of preparing this column, tslac contacted laura to inform her that dpl had received a second grant award of $24,820.00 to digitize 24,000 additional newspaper pages, which will move the newspapers through 1954. as of july 23, 2020, the denton record-chronicle collection on the portal to texas history hosts 6,168 items and has been used 16,397 times. this includes 1,743 items that are pdf eprint editions of the paper from 2014 to 2018, which unt uploaded for long-term preservation and access. unt uploads eprint editions without a charge, and digitally preserves these through an agreement with the texas press association; these pdfs were not a part of the funded grant, but they do enhance access to the collection and helped to build community interest in seeing earlier years available on the portal. the usage of the collection skyrocketed after the early editions became available. january 2020 saw the highest monthly total, with 3,105 uses. once this project is complete, it will include over 200,000 newspaper pages. neither dpl nor unt could tackle this project on its own, but through collaboration, it is possible.

recipe for your own collaboration success

these are planning recommendations as you prepare for your own collaboration, drawn from what we’ve learned as we worked on this project together.

1. communicate early and often: communicating needs enables partners to identify each other’s strengths. each partner will bring their strengths to the project, which in this case included actual archival materials from dpl and technological expertise on the unt side. in addition, be prepared to communicate with local groups who need to endorse or sign off on the project, including possibly the city council, the historical commission, or the city manager.

2. partner to write the grant: partnering in preparing the grant achieves two goals: first, it enables partners to develop a communication flow that will move forward throughout the collaboration; second, it ensures that partners know what each can realistically accomplish within the grant timeline. in this case, laura wrote most of the grant application herself, but she had very specific questions that ana had to answer, and she needed key elements from unt, including project budget, technological infrastructure, and a commitment letter. communicating early and partnering on the grant application process ensured that there were no unexpected surprises that were within the control of either partner.

3. work together to explain your partnership: with a grant of this size, we always spoke in advance to ensure we weren’t over-promising about when newspapers would appear online. this also gave both laura and ana lead time for promoting the project: laura would share the years of the physical microfilm before sending them over, and ana would walk laura through the years that would get uploaded in a given month. this allowed them to plan publicity, training, and outreach efforts based on the dates of newspapers going online. in addition, laura regularly communicated with ana prior to submitting grant reports, and this was critical in preventing miscommunication going to the funding agency.
4. pad enough time for the unexpected: of course, we had no way of knowing a pandemic would occur when we began this project, and what saved us was that we'd started planning as soon as we learned about receiving the grant, rather than as soon as the grant started, which was in september 2019. planning two months in advance put us two months ahead of schedule, and we were able to start exchanging materials as soon as the grant period started. this gave us a few weeks of lead time, so we successfully completed the project by the end of april 2020, at which point the microfilm page count had been scanned and unt staff could remote in to complete the digitization processes. extra time is only a benefit: even if the covid-19 pandemic had not occurred, we still might have had to address technological or film-deterioration problems, and we could have resolved these earlier rather than later because we had given ourselves a few extra weeks of lead time.

5. don't be afraid to explain changes to your granting agency: if your project changes due to unforeseen circumstances, explain what happened. in our project, the uploaded total of pages reached 24,000 before we had digitized the entire planned date range; unt charges a per-page digitization fee, and these newspaper issues proved to contain more pages than expected. laura contacted the representative at tslac to explain the situation and offer an alternative approach to cover the digitization of the remaining years. the important thing is to keep the granting agency informed of any changes, delays, or hiccups in the project.

we are both proud of having completed this project three months before the end of the grant period, but we know that without solid communication, planning, and flexibility, the covid-19 pandemic would have made the situation extremely difficult if not impossible. leveraging the portal's technical infrastructure and tdnp's newspaper expertise, together with the volume of material and collection expertise provided by the dpl, has given us a model for success we plan to capitalize on in future projects. best of all, in the world of covid-19, our patrons can access these newspapers from the comfort of their own couches, without even taking off their pajamas!

primo new user interface: usability testing and local customizations implemented in response

blake lee galbreath, corey johnson, and erin hvizdak

information technology and libraries | june 2018

blake lee galbreath (blake.galbreath@wsu.edu) is core services librarian, corey johnson (coreyj@wsu.edu) is instruction and assessment librarian, and erin hvizdak (erin.hvizdak@wsu.edu) is reference and instruction librarian, washington state university.

abstract

washington state university was the first library system of its 39-member consortium to migrate to primo new user interface. following this migration, we conducted a usability study in july 2017 to better understand how our users fared when the new user interface deviated significantly from the classic interface. from this study, we learned that users had little difficulty using basic and advanced search, signing into and out of primo, and navigating their account. in other areas, where the difference between the two interfaces was more pronounced, study participants experienced more difficulty.
finally, we present customizations implemented at washington state university to the design of the interface to help alleviate the observed issues.

introduction

a july 2017 usability study by washington state university (wsu) libraries was the final segment of a six-month process for migrating to the new user interface of ex libris primo, called primo new ui. wsu libraries assembled a working group in december 2016 to plan for the migration from the classic interface to primo new ui and met bi-weekly through may 2017. to start, the primo new ui working group attempted to answer some baseline questions: what can and cannot be customized in the new interface? how, and according to what timeline, should we introduce the new interface to our library patrons? what methods could be used to assess the new interface? this working group customized the look and feel of the new interface to conform to wsu branding and then released a beta version of primo new ui in march, leaving the older interface (primo classic) as the primary means of access to primo but allowing users to enter and test the beta version of the new interface. in early may (at the start of the summer semester), the prominence of the old and new interfaces was reversed, making primo new ui the default interface but leaving the possibility of continued access to primo classic. the older interface was removed from public access in mid-august, just prior to the start of the fall semester. the public had the opportunity to work with the beta version from march to may and then another two months of experience with the production release by the time the usability study took place in july 2017. the remainder of this paper will focus on the details of this usability study.

research questions

primo new ui was the name given to the new front end of the primo discovery layer, which was made available to customers in august 2016. according to ex libris, "its design is based on user studies and feedback to address the different needs of different types of users."1 we were primarily interested in understanding the usability of the essential functionalities of primo new ui, especially where the design of the new interface deviated significantly from the classic interface (taking local customizations into account). for example, we noted that the new interface introduced the following differences to the user (this ordinal list corresponds to the number labels in figure 1):

1. basic search tabs were expressed as drop-downs.
2. the advanced search link was less prominent than it was with our customized shape and color in the classic interface.
3. main menu items were located in a separate area from the sign in and my account links.
4. my favorites and help/chat icons were located together and in a new section of the top navigation bar.
5. sign in and my account links were hidden beneath a "guest" label.
6. facet values were no longer associated with checkboxes or underlining upon hover.
7. availability statuses were expressed through colored text.

figure 1. basic search screen in primo new ui.

we also observed a fundamental change in the structure of the record in primo new ui: the horizontally oriented and tabbed structure of the classic record (see figure 2) was converted to a vertically oriented and non-tabbed structure in the new interface (see figure 3).
additionally, the tabbed structure of the classic interface opened in a frame of the brief results area, while the same information was displayed on the full display page of the new interface. the options displayed in these areas are known as get it and view it (although we locally branded our sections availability and request options and access options, information technology and libraries | june 2018 12 respectively). therefore, we were eager to see how this change in layout might affect a participant’s ability to find get it and view it information on the full display page. taking the above observations into account, we formulated the following questions: 1. will the participant be able to find and use the basic search functionality? 2. will the participant be able to understand the availability information of the brief results? 3. will the participant be able to find and use the sign in and sign out features? 4. will the participant be able to understand the behavior of the facets? 5. will the participant be able to find and use the actions menu? (see the “send to” boxed area in figure 3.) 6. will the participant be able to navigate the get it and view it areas of the full display page? (see the “availability and request options” boxed area in figure 3.) 7. will the participant be able to navigate the my account area? 8. will the participant be able to find and use the help/chat and my favorites icons? 9. will the participant be able to find and use the advanced search functionality? 10. will the participant be able to find and use the main menu items? (see figure 1, number 3.) figure 2. horizontally oriented and tabbed layout of primo classic. literature review 2012 witnessed a flurry of studies involving primo classic. majors compared the experiences of users within the following discovery interfaces: encore synergy, summon, worldcat local, primo central, and ebsco discovery service. the study used undergraduate students enrolled at the university of colorado and focused on common undergraduate searching activities. each interface was tested by five or six participants who also completed an exit survey. observations specific to the primo interface noted that users had difficulty finding and using existing features, such as email and e-shelf, and difficulty connecting their failed searches to interlibrary loan functionality.2 primo new user interface | galbreath, johnson, and hvizdak 13 https://doi.org/10.6017/ital.v37i2.10191 figure 3. vertically oriented and non-tabbed layout of primo new ui. comeaux noted issues relating to terminology and the display of services during usability testing carried out at tulane university. twenty people, including undergraduates, graduates, and faculty members, participated in this study, which tested five typical information-seeking scenarios. the study found several problems related to terminology. for example, participants did not fully understand the meaning of the expand my results functionality.3 participants also did not understand that the display text “no full-text” could be used to order an item via interlibrary loan. 4 the study also concluded that the mixed presentation of differing resource types (e.g., books, articles, reviews) was confusing for patrons who were attempting known-item searches.5 jarrett documented a usability study conducted at flinders university library. 
the aims of the study were to determine user perceptions regarding the usability of the discovery layer, the relevance of the information retrieved, and the user experiences of this search interface compared to other interfaces. 6 the usability portion of the study scored the participants’ completion of tasks in the primo discovery layer as difficult, confusing, neutral, or straightforward. scores indicated that participants had difficulty determining different editions of a book, locating a local thesis, and placing an item on hold. the investigators also observed that students had issues signing into primo and distinguishing between journals and journal articles.7 information technology and libraries | june 2018 14 nichols et al. conducted a usability test on a newly implemented primo instance at the university of vermont libraries in 2012. their research questions were designed to understand primo’s design, functionality, and layout.8 the majority of the participants were undergraduate students. similar to comeaux, confusion occurred when participants had to find specific or relevant records within longer sets of results.9 nichols et al. also noticed that test subjects had difficulty navigating and finding information in the primo tabbed structure. like jarrett, nichols et al. noted that participants had difficulty distinguishing between the journals and articles.10 similar to majors, participants in nichols et al. had difficulty finding certain primo functionality, such as email, the e-shelf, and the feature to open items in a new window.11 the investigators concluded that these tools were difficult to find because they were buried too deep in the interface. the university of kansas libraries conducted two usability studies on primo. the first study took place during the 2012–13 academic year and involved 27 participants, including undergraduate, graduate, and professional students, who performed four to five main tasks in two separate sessions. similar to other studies, participants experienced great difficulty using the save to e-shelf and email citation tools.12 kliewer et al. conducted the second usability study in 2016, which focused primarily on student satisfaction with the primo discovery tool. thirty undergraduates participated in this study that collected both qualitative and quantitative data. in contrast to most usability studies of discovery services, this study allowed participants to explore primo with open-ended searches to more closely mimic natural searching strategies. results of the study indicated that the participants preferred basic search to advanced search, used facets (but not enough to maximize their searching potential), rarely moved beyond the first page of search results, and experienced difficulties using the link resolver. in response to the latter, a primo working group clarified language on the link resolver page to better differentiate between links to articles and links to journals.13 brett, lierman, and turner conducted a usability study at the university of houston libraries focusing primarily on undergraduate students. users were able to complete the assigned tasks, but the majority did not do so in the most efficient manner. that is, the participants did not take full advantage of primo functionality, such as facets, holds, and recalls. additionally, some participants exhibited difficulty deciphering among the terms journals, journal articles, and newspaper articles. 
another difficulty participants experienced was knowing what further steps to take once they had successfully found an item in the results list. for example, participants had trouble locating stacks guides, finding request features, and using call numbers. the researchers concluded that many of the issues witnessed in this usability study could be mitigated via library instruction.14 usability testing of primo new ui has recently begun to take a foothold in academic libraries. in addition to conducting usability testing on the primo classic in april 2015 (5 participants, 5–6 tasks), researchers at boston university carried out both preand post-launch testing of the new interface in december 2016 and april 2017, respectively. pre-launch testing with five student participants identified issues with “labelling, locating links to online services, availability statement links in full results, [and] my favorites.”15 after completing fixes, post-launch testing with four students (2 infrequent users, 2 frequent) found that they were able to easily complete tasks, use filters, save results, and find links to online resources. usage statistics for the new interface, compared to classic, also showed an increased use of facets after fixes, and an increase in the use of some features but decrease in the use of others, providing information on what features warranted further examination.16 primo new user interface | galbreath, johnson, and hvizdak 15 https://doi.org/10.6017/ital.v37i2.10191 california state university (csu) libraries conducted usability studies on primo new ui with 24 participants (undergraduate students, graduate students, and faculty) across five csu campuses. five standard tasks were required: find a specific book, find a specific film, find a peer-reviewed journal article, find an item in the csu network not owned locally, and find a newspaper article. each campus added additional questions based on local needs. participants were overwhelmingly positive about the interface look and feel, ease of use, and speed of the system. the success rate for each task varied across the campuses, with participants having greater success on simple tasks such as finding a specific or known item and mixed results on more difficult tasks including using scopes, understanding icons and elements of the frbr record, and facets. steps were taken to relabel and rearrange the scopes and facets so that they were more meaningful to users, and frbr icons were replaced. the authors concluded that primo is an ideal solution to incorporate both global changes and local preference because of its customizability.17 university of washington libraries conducted usability studies on the classic and new primo interfaces. the primo new ui study observed 12 participants. each 60-minute session included an orientation, pre and post-tests, tasks, and follow-up questions. difficulties were noted with terminology, the site logo, the inability to select multiple facets, unclear navigation, volume requesting, advanced search logic, the pin location in item details, and the date facet. a/b testing with 12 participants (from both the new and c lassic ui studies) revealed the need to fix the sign-in prompt for my favorites, enable libraries to add custom actions to the actions menu, add a sort option for favorites in the new interface, add the ability to rearrange elements on a single item page, and add zotero support. overall, participants preferred the new interface. 
generally, participants easily completed basic tasks, such as known-item searches, searches for course reserves, and open searches, but had more difficulty with article subject searching, audio/visual subject searching, and print-volume searching, which was consistent from the classic to the new interfaces for student participants.18 method we conducted a diagnostic usability evaluation of primo new ui using eight participants, whom we recruited from the wsu faculty, staff, and student populations. in the end, we received a skewed distribution among the categories: three members of staff and five students (two undergraduate students and three graduate students). the initial composition of the participants comprised a greater number of undergraduate students, but substitution created the final makeup. all the study participants had some exposure to primo classic in the past. we recruited participants by hanging flyers around the libraries of our pullman campus and the adjoining student commons area. we offered the participants $15 in exchange for their time, which we advertised as being a maximum of one hour. the usability test was designed by a team of three library staff, one from systems (it) and two from research services (reference/instruction). two of us were present at each session, one to read the tasks aloud and the other to document the session. we used camtasia to record each session so that we would have the ability to return to it later if we needed to verify our notes or other specifics of the session. we stored the recordings on a secured share of the internal library drive. we received an institutional review board certificate of exemption (irb #16190) to conduct this study. information technology and libraries | june 2018 16 this usability test comprised eleven tasks (see appendix a) to test the research questions described above. the tasks were drafted in consultation with the ex libris set of recommendations for conducting primo usability testing.19 each investigator drew their conclusions as to the participants’ successes and failures. we then met as a group to form a consensus regarding task success and failure (see appendix b). we met to discuss the patterns that emerged and to formulate remedies to problems we perceived as hindering student success. results for each of the ten research questions below, consult appendix b to see details regarding the associated tasks and how each participant approached and completed each task. task set(s) related to research question 1: will the participant be able to find and use the basic search functionality? this was one of the easier tasks for the participants to complete. some participants did not follow the task literally to find their favorite book or movie, but rather completed a search for an item or topic of interest to them. all the participants completed this task successfully. task set(s) related to research question 2: will the participant be able to understand the availability information of the brief results? the majority of the participants understood that the availability text and its color represented important access information. however, there were instances where the color of the availability status was in conflict with its text. this led at least one participant to evaluate the availability of a resource incorrectly. task set(s) related to research question 3: will the participant be able to find and use the sign in and sign out features? the participants all successfully completed this task. 
participants used multiple methods to sign in: the guest link in the top navigation bar, the sign in link from the ellipsis main menu item, and the get it sign in link on the full display page. all participants signed out via the user link in the top navigation bar. task set(s) related to research question 4: will the participant be able to understand the behavior of the facets? almost all of the participants were able to select the articles facet without issue. one person, however, misunderstood the include behavior of the facets. instead of using the include behavior, this participant used the exclude behavior to remove all facets other than the articles facet. only two participants attempted to use the print books facet to complete the task, “from the list of results, find a print book that you would need to order from another library.” instead, the other 75 percent simply scanned the list of results to find the same information. five out of the eight participants attempted to find the peer-reviewed facet when completing the task to choose any peer-reviewed article from a results list: three were successful, while one selected the newspaper articles facet, and another selected the reviews facet. task set(s) related to research question 5: will the participant be able to find and use the actions menu? the tasks related to the actions menu (copy a citation and email a record) were some of the most difficult for the participants: two were successful, three had some difficulty, and three were unsuccessful. of those primo new user interface | galbreath, johnson, and hvizdak 17 https://doi.org/10.6017/ital.v37i2.10191 who experienced difficulty, one seemed not to understand the task fully; this participant found and copied the citation, but then spent additional time looking for a “clipboard.” the other two participants were both distracted by competing areas of interest: the citations section of the full display and the section headings of the full display. of those who were unsuccessful, one suffered from a technical issue that ex libris needs to resolve (the functionality to expand the list of action items failed), one did not seem to understand what a citation was when they found it, and another could not find the email functionality. this last subject continued searching in the ellipsis area of the main menu, in the my account area, and the facets, but ultimately never found the email icon in the scrolling section of the actions menu. task set(s) related to research question 6: will the participant be able to navigate the get it and view it areas of the full display page? three participants experienced substantial difficulty in completing this set of tasks. these participants were distracted by the styled show libraries and stack chart buttons on the full display page that were competing for attention with the requesting options. task set(s) related to research question 7: will the participant be able to navigate the my account area? all of the participants completed this task successfully. four participants located the back-arrow icon to exit the my account area, while the other four participants used alternate methods: using the library logo, selecting the new search button, and signing out of primo. task set(s) related to research question 8: will the participant be able to find and use the help/chat and my favorites icons? participants encountered very little difficulty in finding a way to procure help and chat with a librarian, with one exception. 
participant 2 immediately navigated to and opened our help/chat icon, but then moved away from this service because it opened in a new tab. this same participant, along with three others, had a more difficult time finding and deciding to use the pin this item icon than did the three participants who completed the same task with ease. the remaining participant failed to complete this task because they could not find the my favorites area of primo. task set(s) related to research question 9: will the participant be able to find and use the advanced search functionality? one participant had more trouble finding the advanced search functionality than the other seven. another experienced a technical difficulty, in which the primo screen froze during the experiment, and we had to begin the task anew. the remaining six people easily finished the tasks. task set(s) related to research question 10: will the participant be able to find and use the main menu items? the majority of the participants completed this task with ease, navigating to the databases link in the main menu items. one participant, however, was confused by the term database but was able to succeed once we provided a brief definition of the term. the remaining two participants were further confused by the term and instead entered general search terms into the primo search bar. these two participants failed to find the list of databases. discussion information technology and libraries | june 2018 18 study participants completed four of our task sets with relative ease: using basic search (see research question 1 above), signing into and out of primo (see research question 3 above), navigating their my account area (see research question 7 above), and using advanced search (see research question 9 above). there was one exception: one participant experienced minor trouble finding the advanced search link, checking first among the drop-down options on our basic search page. subsequent and unrelated to this study, wsu elected to eliminate the first set of drop-down options from our primo landing page. further testing might tell us if this elimination in the number of drop-down options has effectively made the advanced search link more prominent for users. also, the ease with which participants were able to use items located underneath the “guest” label contradicted our expectations. we predicted that this opacity would cause users issues, but it did not seem to deter them. from this, we concluded that the placement of the sign in options in the upper right corner is sufficient to maintain continuity. participants encountered a moderate degree of difficulty completing two task sets: determining availability statuses and navigating the get it area of the full display page. concerning availability, participants were quick to understand that statuses such as “check holdings” relayed that the item was not available. the participants were also keen to notice that green availability statuses implied access while non -green availability statuses implied non-access. however, per the design of the new interface, certain non-green links became green after opening the full display page of primo. this was a significant deviation from the classic interface, where colors indicating availability status did not change. this design element misled one participant. of note, we did not observe participants experiencing issues with the converted format of the get it and view it areas (see figures 2 and 3) per se. 
however, we did notice that three of our participants were unnecessarily distracted by the show libraries link when trying to find resource sharing options because wsu had previously styled the show libraries links with color and shape. therefore, our local branding in this area impeded usability and led us to rethink the hierarchy of actions on the full display page. similar to comments made by demars, study participants also remarked that the layout of the full display was cluttered and difficult to read.20 we therefore took steps to make this page more readable for the viewer. study participants displayed the greatest difficulty completing the remaining four task sets: selecting a main menu item, refining a search via the facets, using the actions menu, and navigating the my favorites functionality. however, web design was not necessarily the culprit in all four areas. three participants experienced difficulty finding the databases link (a main menu item). after further discussion, it became apparent that this trouble related not to usability but to information literacy—they did not understand the term databases. therefore, like majors and comeaux,21 we recognize the recurring issue of library jargon, and like brett, lierman, and turner,22 we believe that this issue would best be mitigated via library instruction. in agreement with the literature, two participants selected the incorrect facet because they had difficulty distinguishing among the terms articles, newspaper articles, reviews and peer-reviewed.23 further, one of these participants experienced even more difficulty because of not understanding the inherent functionality of the facet values. that is, this participant did not grasp that the facet value links performed an inclusion process by default. to the contrary, this person believed that they would have had to exclude all unwanted facet values to arrive at the wanted facet value. the change in facet behavior between classic and new interfaces likely caused this confusion. in primo classic, wsu had installed a local customization that provided checkboxes and underlining upon hover for each facet value. the new interface did not primo new user interface | galbreath, johnson, and hvizdak 19 https://doi.org/10.6017/ital.v37i2.10191 provide either one of these clues to the user. additionally, we observed, similar to kliewer et al. and brett, lierman, and turner, that participants oftentimes preferred to scan the results list over refining their search via faceting.24 this finding also matches a 2014 ex libris user study indicating that users are easily confused by too many interface options and thus tend to ignore them.25 regarding the actions menu, the majority of the participants attempted to find the email icon in the correct section of the full display page (i.e., the “send to” section). however, because of a technical issue in the design of the new interface, the email icon was not always present for the participant to find. for others, it was difficult to reach the icon even when it was present as participants had to click the right arrow three to four times to navigate past all the citation manager icons. this observed difficulty in finding existing functionalities in primo echoes that cited by majors and nichols et al.26 participants also experienced significant difficulty deciphering between the similarly named functionalities of the citation icon and the citations section of the full display page. 
as a result of this observed difficulty, we concluded that differentiating sections of the page with distinct naming conventions would be beneficial to users. like the results reported by boston university, our study participants encountered significant issues when trying to save items into their my favorites list.27 we noticed that participants had difficulty making connections between the icons named keep this item/remove this item and the my favorites area. during testing, it was clear that many of the participants were drawn to the pin icon for the correctly anticipated functionality but then were confused that the tooltips did not include any language resembling "my favorites." from this last observation, we surmised that providing continuity in language between these icons and the my favorites area would increase usability for our library patrons. pepitone reported problems with the placement of the my favorites pin icon,28 but we observed that this was less of a problem than the actual terminology used to name the pin icon. beyond success and failure, a 2014 ex libris user study suggested that academic level and discipline play a key role in user behavior.29 however, we were unable to draw meaningful conclusions among user groups because of our small and homogeneous participant pool.

decisions made in response to usability results

declined to change

facets. although one participant did not understand the inclusion mechanism of the facet values, we declined to investigate a customization in this area. according to the primo august 2017 release notes, ex libris plans to make considerable changes to the faceting functionality.30 therefore, we decided to wait until after this release to reassess whether customization was warranted.

implemented a change

labels

citations. we observed confusion between the citation icon of the actions menu and the section of the full display page labeled "citations." to differentiate between the two items, we changed the actions menu icon text to "cite this item" (see figure 4) and the heading for the citations section to "references cited" (see figure 5).

figure 4. cite this item icon of the actions menu.
figure 5. references cited section of the full display page.

my favorites. there was a mismatch among the tooltip texts of the my favorites icons. we changed the tooltip language for the "keep this item" pin to read "add to my favorites" (see figure 6) and the tooltip language for the "unpin this item" pin to read "remove from my favorites" (see figure 7).

figure 6. add to my favorites language for my favorites tooltip.
figure 7. remove from my favorites language for my favorites tooltip.

availability statuses. per the design of the new interface, certain non-green links became green after opening the full display page of primo new ui. we implemented css code to retain the non-green coloring of the availability statuses after opening the full display; in this case, "check holdings" remains orange (see figure 8).

figure 8. availability status color of brief display, before and after opening the full display.

link removal

full display page headings. there was confusion as to the function of the headings on the full display page. these are anchor tags, but patrons clicked on them as if they were functional links. no patrons used the headings successfully.
therefore, we hid the headings section via css (see figure 9).

figure 9. removal of headings on full display page.

links to other institutions. we observed participants attempting to use the links to other institutions to place resource sharing requests. therefore, we removed the hyperlinking functionality of the links in the list via css (see figure 10).

figure 10. neutralization of links to other institutions.

prioritized the emphasis of certain functionalities

request options and show libraries buttons. it is usually more important to be able to place a request than to find the names of other institutions that own an item. however, the show libraries button was originally styled with crimson coloring, which drew unwarranted attention, while the requesting links were not. therefore, we added styling to the resource-sharing links and removed styling from the show libraries button via css (see figure 11).

figure 11. resource sharing link styled in crimson; show libraries button with styling removed.

e-mail icon. we observed that the e-mail icon of the actions menu was difficult to find. therefore, we decreased the number of icons and moved the emailing functionality to the left side of the actions menu (see figure 12).

figure 12. email icon prioritized over citation manager icons.

contrast and separation

full display page sections. participants noted that the information on the full display page tended to run together. to remedy this, we created higher contrast between the foreground and background of the page sections via css. we also styled the section titles and dividers with color, among other edits (see figure 13).

figure 13. separated sections of full display page (see figure 3 to compare to the new ui default full display page design).

conclusion

while providing one of the first studies on primo new ui, we acknowledge several limitations. previous studies on primo had larger study populations compared to this one (which had eight participants). however, we adhered to nielsen's findings that usability studies uncover most design deficiencies with five or more participants.31 additionally, the scope of this study was limited to the usability of the desktop view. we recommend further studies that concentrate on accessibility compliance and that test the interface on mobile devices. regarding the study design, the question arose as to whether the participants' difficulties reflected poor design functionality or a misunderstanding of library terminology (as noted by majors and comeaux).32 the researchers did not carry out pre-tests or an assessment of participants' level of existing knowledge. this limitation is almost always unavoidable, however, as a task list will always risk not fitting the skills or knowledge of every participant. the lack of use of some features also might have been a consequence of the study design: while not using the facets may reflect that participants are unaware of them, it could also stem from the fact that they never had to scroll past the first few items to find the needed resource. users might have felt a greater need to use the facets had we asked more difficult discovery tasks. the study also contained an investigative bias in that the researchers were part of the working group that developed the customized interface, and then tested those customizations.
this bias could have been reduced if the study had used researchers who were not a part of the same group that made these customizations. despite these limitations, there are still key findings of note. tasks that participants completed with the greatest ease mapped to those that we assume they do most often, which included basic searching for materials and accessing account information. tasks beyond these basics proved to be more difficult. this raises the question of whether difficulties were really a function of the interface design or if they reflected ongoing literacy issues. therefore, it is crucial that designers work with public services and instruction librarians to identify areas where users might be well-served by making certain functionalities more userfriendly and creating educational and training opportunities to increase awareness of these functionalities.33 bringing diverse perspectives into the study is also crucial so that researchers can discover and be more conscious of commonalities in design and literacy needs, particularly regarding advanced tasks. information technology and libraries | june 2018 24 appendix a: usability tasks note: search it is the local branding for primo at washington state university. 1) please search for your favorite book or movie. a) is this item available for you to read or watch? b) how do you know that this item is or isn’t available for you to read or watch? 2) please sign in to search it. 3) please perform a search for “causes of world war ii” (do not include quotation marks). a) limit your search results to articles. b) for any of the records in your search results list: i) find the citation for any item and copy it to the clipboard. ii) email this record to yourself. 4) please perform a search for “actor’s choice monologues” (do not include quotation marks). a) from the list of results, find a print book that you would need to order from another library. 5) please perform a search for a print book with isbn 0582493498. a) this book is checked out. how would you get a copy of it? b) pretend that this book is not checked out. please show us the information from this record that you would use to find this item on the shelves. 6) please navigate to your library account (from within search it). a) pretend that you have forgotten how many items you have checked out. please show us how you would find out how many items you currently have checked out. b) exit your library account area. 7) please navigate to advanced search. a) perform any search on this page. 8) please show us where you would go to find help and/or chat with a librarian? 9) please perform a search using the keywords “gender and media.” a) add any source to your my favorites list. then open my favorites and click on the title of the source you just added. b) return to your list of results. choose any peer-reviewed article that has the full text available. click on the link that will access the full text. 10) please find a database that might be of interest to you (e.g., jstor). 11) please sign out of search it and close your browser. primo new user interface | galbreath, johnson, and hvizdak 25 https://doi.org/10.6017/ital.v37i2.10191 appendix b: usability results note: search it is the local branding for primo at washington state university. research question 1: will the participant be able to find and use the basic search functionality? associated task(s): 1. please search for your favorite book or movie. participant successful? 
commentary 1 yes searches for “the truman show” from the beginning. 2 yes searches for “pet sematary” from the beginning. 3 yes searches for “additive manufacturing” from the beginning. 4 yes signs in first, navigates to new search, searches for “pzt sensor design.” 5 yes searches for “the notebook” from the beginning. 6 yes searches for “das leben der anderen” from the beginning. 7 yes searches for “legally blonde” from the beginning. 8 yes searches for “jurassic park” from the beginning. research question 2: will the participant be able to understand the availability information of the brief results? associated task(s): 1b. how do you know that this item is or isn’t available for you to read or watch? 4a. from the list of results, find a print book that you would need to order from another library. participant successful? commentary 1 yes differentiates between green and orange text; uses the “check holdings” availability status. clicks on “availability and request option” heading and then clicks on the resource sharing link. 2 yes, with difficulty. says that green “check holdings” status indicates ability to read the book. selects book with “check holdings” status and locates resource sharing link. information technology and libraries | june 2018 26 participant successful? commentary 3 yes, with difficulty unclear. initially, goes to a record with online access; redoes search, eventually locates resource sharing link. 4 yes says the record for the item reads “in place” and the availability indicator = 1. the record for the item reads “check holdings.” 5 yes says that status is indicated by statement “available at holland/terrell libraries.” the record for the item reads “check holdings.” 6 yes says that status is indicated by statement “available at holland/terrell libraries” and “item in place.” clicks on “check holdings”; says that orange color denotes fact that we don’t have it. 7 yes hovers over “check holdings” status, and then notes that “availability” statement reads “did not match any physical resources.” the record for the item reads “check holdings.” 8 yes says that status is indicated by statement “available at holland/terrell libraries.” says the record for the item reads “check holdings.” research question 3: will the participant be able to find and use the sign in and sign out features? associated task(s): 2. please sign into search it. 11. please sign out of search it and close your browser. participant successful? commentary 1 yes navigates to “guest” link, signs in. 2 yes navigates to ellipsis, signs in. navigates to “user” link, signs out. 3 yes navigates to “guest” link, signs in. navigates to “user” link, signs out. 4 yes n/a—already signed in. navigates to “user” link, signs out. primo new user interface | galbreath, johnson, and hvizdak 27 https://doi.org/10.6017/ital.v37i2.10191 participant successful? commentary 5 yes navigates to “guest” link, signs in. navigates to “user” link, signs out. 6 yes navigates to “guest” link, signs in. navigates to “user” link, signs out. 7 yes uses sign in link from full display page. navigates to “user” link, signs out. 8 yes navigates to “guest” link, signs in. navigates to “user” link, signs out. research question 4: will the participant be able to understand the behavior of the facets? associated task(s): 3a. limit your search results to articles. 4a. from the list of results, find a print book that you would need to order from another library. 9b. return to your list of results. 
choose any peer-reviewed article that has the full text available. participant successful? commentary 1 yes selects articles facet. n/a—does not use facets (however, participant investigates the library and type facets, returns to results lists). 2 yes selects articles facet. n/a—does not use facets. 3 no uses “exclude” property to remove everything but articles. uses “exclude” property to remove everything but print books. looks in facet type for articles; selects newspaper articles instead. 4 yes, with difficulty selects articles facet. selects print books facet. selects articles under type facet, clicks on “full-text available” status, selects peer-reviewed articles facet. 5 no selects articles facet. n/a—does not use facets. screen freezes (technical issue) and participant is forced to redo search. n/a— does not use facets. when further prompted to find only peerreviewed articles, participant searches pre-filter area and then selects reviews facet. information technology and libraries | june 2018 28 participant successful? commentary 6 yes selects articles facet. clicks on “check holdings.” participant hovers over “online access” text and then selects peer-reviewed facet. 7 yes looks in drop-down scope, then moves to articles facet. n/a— does not use facets. n/a—does not use facets. 8 yes hovers over peer-reviewed articles facet, and then selects articles facet. n/a—does not use facets. selects peer-reviewed facet. research question 5: will the participant be able to find and use the actions menu? associated task(s): 3.b.i. for any of the records in your search results list, find the citation for any item and copy it to the clipboard. 3.b.ii. for any of the records in your search results list, email this record to yourself. participant successful? commentary 1 yes briefly looks at citation icon, scrolls to bottom of page and looks at citations area, returns to citation icon. scrolls to bottom of page, returns to actions area, scrolls with arrow to find email icon, emails to self. 2 no initially clicks on citation manager icon (easybib), then clicks on citation icon and copies to clipboard. could not find email icon (technical issue with search it). although further discussion reveals that participant expects to see email function within “send to” heading. 3 no opens full display page of item, scrolls to bottom of page. clicks on the citation icon but doesn’t see what looking for. finds email icon and emails to self. 4 no opens full display page of item, clicks on the citation icon, double-clicks to highlight citation. could not find email icon. searches in ellipsis. attempts the keep this item pin. navigates to my account. searches in facets. 5 yes, with difficulty finds citation icon, but then leaves the area via citations heading and winds up at web of science homepage. hovers over “cited in this” language. finds the copy functionality. primo new user interface | galbreath, johnson, and hvizdak 29 https://doi.org/10.6017/ital.v37i2.10191 participant successful? commentary attempts sent to heading twice, looks through actions icons, scrolls to right, finds email icon. 6 yes finds citation icon, copies to clipboard. scrolls down page, returns to actions menu, scrolls to email icon, emails record to self. 7 yes, with difficulty copies citation from the brief result, and then spends some time trying to find “the clipboard.” navigates to the email icon. 
8 yes, with difficulty scrolls to bottom of full display page, clicks on citing this link, clicks on title to record, and then copies first 3 lines of record. scrolls until finds email icon, but then moves to sent to heading, and then back to email icon, and sends. research question 6: will the participant be able to navigate the get it and view it areas of the full display page? associated task(s): 5.a. this book is checked out. how would you get a copy of it? 5.b. please show us the information from this record that you would use to find this item on the shelves. 9.b. click on the link that will access the full text. participant successful? commentary 1 yes clicks on “check holdings” availability status, clicks on availability and request options heading, clicks on request summit item link. refers to call number in alma iframe. clicks “full-text available” status, clicks database name. 2 yes opens record, locates resource sharing link. refers to call number; opens stack chart to find call number. clicks on title, clicks database name. 3 yes locates request option. locates call number in record. clicks “full-text available” status, clicks database name. 4 yes, with difficulty. clicks on show libraries button, then finds request option after searching page. locates call number in record. clicks “full-text available” status but does not click on database name. 5 yes, with difficulty. moves to stack chart button, then to show libraries button, and then to availability and request options heading, clicks on stack chart, clicks on show libraries, moves into first library listed and information technology and libraries | june 2018 30 participant successful? commentary back out, and finally to ill link. finds call number on full display page. 6 yes finds request summit option. identifies call number and stack chart as means to find book. clicks on database name. 7 yes, with difficulty. looks at status statement, scrolls to bottom of page, then show libraries button, then request summit option. identifies call number and stack chart as means to find book. attempts to use “full-text available” link, then clicks on database name. 8 yes finds summit request option. identifies call number and stack chart as means to find book. attempts to use “full-text available” link, then clicks on database name. research question 7: will the participant be able to navigate their my account area? associated task(s): 6. please navigate to your library account (from within search it). 6a. pretend that you have forgotten how many items you have checked out. please show us how you would find out how many items you currently have checked out. 6b. exit your library account area. participant successful? commentary 1 yes navigates to my account from “user” link. navigates to loans tab. uses back arrow icon. 2 yes navigates to my account from “user” link. navigates to loans tab. uses back arrow icon. 3 yes navigates to my account from main menu ellipsis. navigates to loans. uses back arrow icon. 4 yes navigates to my account from main menu ellipsis. navigates to loans. uses to back arrow icon. 5 yes navigates to my account from “user” link. navigates to loans. signs out of search it. 6 yes navigates to my account from “user” link. navigates to loans. uses search it logo to exit. primo new user interface | galbreath, johnson, and hvizdak 31 https://doi.org/10.6017/ital.v37i2.10191 participant successful? commentary 7 yes navigates to my account from “user” link. navigates to loans. uses new search button to exit. 
8 yes navigates to my account from “user” link. navigates to loans. uses search it logo to exit. research question 8: will the participant be able to find and use the help/chat and my favorites icons? associated task(s): 8. please show us where would you go to find help and/or chat with a librarian? 9.a. add any source to your my favorites list. then, open my favorites and click on the title of the source you just added. participant successful? commentary 1 yes, with difficulty navigates to help/chat icon. navigates to keep this item pin, hesitates, navigates to ellipsis, returns to and clicks on pin. moves to my favorites via animation. clicks on title. 2 yes, with difficulty initially navigates to help/chat icon, but thinks it is the wrong button because chat is not directly available within search it. navigates to keep this item pin, hesitates, looks around, selects pin. moves to my favorites via animation. clicks on title. 3 yes, with difficulty navigates to help/chat icon. navigates to ellipsis, actions menu, and tags section. finds keep this item pin. 4 no navigates to help/chat icon. navigates to ellipsis, keep this item pin, my account, and facets quits search. 5 yes, with difficulty navigates to help/chat icon. adds keep this item pin after investigating 12 other icons. moves to my favorites via animation. clicks on title. 6 yes navigates to help/chat icon. adds keep this item pin and moves to my favorites via animation. clicks on title. 7 yes navigates to help/chat icon. checks actions menu, adds keep this item pin and moves to my favorites via animation clicks on title. 8 yes navigates to help/chat icon. adds keep this item pin and moves to my favorites via animation. clicks on title. information technology and libraries | june 2018 32 research question 9: will the participant be able to find and use the advanced search functionality? associated task(s): 7. please navigate to advanced search. 7a. perform any search on this page. participant successful? commentary 1 yes navigates to advanced search. performs search. 2 yes navigates to advanced search. performs search. 3 yes, with difficulty navigates to basic search drop-down, then to new search, then to advanced search. has trouble inserting cursor into search box. 4 yes, with difficulty navigates to advanced search. builds complex search, then search it freezes and we have to restart the search tool. 5 yes navigates to advanced search. performs search. 6 yes navigates to advanced search. performs search. 7 yes navigates to advanced search. performs search. 8 yes navigates to advanced search. performs search. research question #10: will the participant be able to find and use the main menu items? associated task(s): 10. please find a database that might be of interest to you (e.g., jstor). participant successful? commentary 1 yes navigates to “databases” link of main menu. 2 yes navigates to “databases” link of main menu. 3 no types query “stretchable electronics” into search box, but unsure how to find a database in the results lists. 4 no types query “reinforced concrete” into search box, but unsure how to find a database in the results lists. primo new user interface | galbreath, johnson, and hvizdak 33 https://doi.org/10.6017/ital.v37i2.10191 participant successful? commentary 5 yes, with difficulty is confused by term database. enters “ieee” in search box. 6 yes navigates to “databases” link of main menu. 7 yes searches within drop-down scopes, then facets, then moves to “databases” link of main menu. 
8 yes navigates to "databases" link of main menu.

1 "frequently asked questions," ex libris knowledge center, accessed august 28, 2017, https://knowledge.exlibrisgroup.com/primo/product_documentation/050new_primo_user_interface/010frequently_asked_questions.
2 rice majors, "comparative user experiences of next-generation catalogue interfaces," library trends 61, no. 1 (2012): 186–207, https://doi.org/10.1353/lib.2012.0029.
3 david comeaux, "usability testing of a web-scale discovery system at an academic library," college & undergraduate libraries 19, no. 2–4 (2012): 199, https://doi.org/10.1080/10691316.2012.695671.
4 comeaux, "usability testing," 202.
5 comeaux, "usability testing," 196–97.
6 kylie jarrett, "findit@flinders: user experiences of the primo discovery search solution," australian academic & research libraries 43, no. 4 (2012): 280, https://doi.org/10.1080/00048623.2012.10722288.
7 jarrett, "findit@flinders," 287.
8 aaron nichols et al., "kicking the tires: a usability study of the primo discovery tool," journal of web librarianship 8, no. 2 (2014): 174, https://doi.org/10.1080/19322909.2014.903133.
9 nichols, "kicking the tires," 181.
10 nichols, "kicking the tires," 184.
11 nichols, "kicking the tires," 184–85.
12 scott hanrath and miloche kottman, "use and usability of a discovery tool in an academic library," journal of web librarianship 9, no. 1 (2015): 9, https://doi.org/10.1080/19322909.2014.983259.
13 greta kliewer et al., "using primo for undergraduate research: a usability study," library hi tech 34, no. 4 (2016): 576, https://doi.org/10.1108/lht-05-2016-0052.
14 kelsey brett, ashley lierman, and cherie turner, "lessons learned: a primo usability study," information technology & libraries 35, no. 1 (2016): 21, https://doi.org/10.6017/ital.v35i1.8965.
15 cece cai, april crockett, and michael ward, "our experience with primo new ui," ex libris users of north america conference 2017, accessed november 4, 2017, http://documents.el-una.org/1467/1/caicrockettward_051017_445pm.pdf.
16 cai, crockett, and ward, "our experience with primo new ui."
17 j. michael demars, "discovering our users: a multi-campus usability study of primo" (paper presented at the international federation of library associations and institutions world library and information conference 2017, warsaw, poland, august 14, 2017), 11, http://library.ifla.org/1810/1/s10-2017-demars-en.pdf.
18 anne m. pepitone, "a tale of two uis: usability studies of two primo user interfaces" (slideshow presentation, primo day 2017: migrating to the new ui, june 12, 2017), https://www.orbiscascade.org/primo-day-2017-schedule/.
19 "primo usability guidelines and test script," ex libris knowledge center, accessed october 28, 2017, https://knowledge.exlibrisgroup.com/primo/product_documentation/new_primo_user_interface/primo_usability_guidelines_and_test_script.
20 demars, "discovering our users," 9.
21 majors, "comparative user experiences," 190; comeaux, "usability testing," 198–204.
22 brett, lierman, and turner, “lessons learned,” 21. 23 jarrett, “findit@flinders,” 287; nichols, “kicking the tires,” 184; brett, lierman, and turner, “lessons learned,” 20–21. 24 kliewer et al., “using primo for undergraduate research,” 571–72; brett, lierman, and turner, “lessons learned,” 17. 25 miri botzer, “delivering the experience that users expect: core principles for designing library discovery services,” white paper, nov 25 2015, 10, http://docplayer.net/10248265-delivering-theexperience-that-users-expect-core-principles-for-designing-library-discovery-services-miri-botzerprimo-product-manager-ex-libris.html. 26 majors, “comparative user experiences,” 194; nichols et al., “kicking the tires,” 184–85. 27 cai, crockett, and ward, “our experience with primo new ui,” 28–29. 28 pepitone, “a tale of two uis,” 29. 29 botzer, “delivering the experience,” 4–5; christine stohn, “how do users search and discover? findings from ex libris user research,” library technology guides, may 5 2015, 7–8, https://librarytechnology.org/document/20650. https://doi.org/10.6017/ital.v35i1.8965 http://documents.el-una.org/1467/1/caicrockettward_051017_445pm.pdf http://documents.el-una.org/1467/1/caicrockettward_051017_445pm.pdf http://library.ifla.org/1810/1/s10-2017-demars-en.pdf http://library.ifla.org/1810/1/s10-2017-demars-en.pdf https://www.orbiscascade.org/primo-day-2017-schedule/ https://knowledge.exlibrisgroup.com/primo/product_documentation/%20new_primo_user_interface/primo_usability_guidelines_and_test_script https://knowledge.exlibrisgroup.com/primo/product_documentation/%20new_primo_user_interface/primo_usability_guidelines_and_test_script http://docplayer.net/10248265-delivering-the-experience-that-users-expect-core-principles-for-designing-library-discovery-services-miri-botzer-primo-product-manager-ex-libris.html http://docplayer.net/10248265-delivering-the-experience-that-users-expect-core-principles-for-designing-library-discovery-services-miri-botzer-primo-product-manager-ex-libris.html http://docplayer.net/10248265-delivering-the-experience-that-users-expect-core-principles-for-designing-library-discovery-services-miri-botzer-primo-product-manager-ex-libris.html https://librarytechnology.org/document/20650 primo new user interface | galbreath, johnson, and hvizdak 35 https://doi.org/10.6017/ital.v37i2.10191 30 “primo august 2017 highlights,” ex libris knowledge center, accessed november 2, 2017, https://knowledge.exlibrisgroup.com/primo/product_documentation/highlights/ 027primo_august_2017_highlights. 31 jakob nielsen, “how many test users in a usability study?,” nielsen norman group, jun 4, 2012, https://www.nngroup.com/articles/how-many-test-users/. 32 majors, “comparative user experiences,” 190; comeaux, “usability testing,” 200–204. 33 brett, lierman, and turner, “lessons learned,” 21. https://knowledge.exlibrisgroup.com/primo/product_documentation/highlights/%20027primo_august_2017_highlights https://knowledge.exlibrisgroup.com/primo/product_documentation/highlights/%20027primo_august_2017_highlights https://www.nngroup.com/articles/author/jakob-nielsen/ https://www.nngroup.com/articles/how-many-test-users/ abstract introduction research questions literature review method results task set(s) related to research question 1: will the participant be able to find and use the basic search functionality? task set(s) related to research question 2: will the participant be able to understand the availability information of the brief results? 
communications
using the harvesting method to submit etds into proquest: a case study of a lesser-known approach
marielle veve
information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12197
marielle veve (m.veve@unf.edu) is metadata librarian, university of north florida. © 2020.

abstract
the following case study describes an academic library's recent experience implementing the harvesting method to submit electronic theses and dissertations (etds) into the proquest dissertations & theses global database (pqdt). in this lesser-known approach, etds are deposited first in the institutional repository (ir), where they get processed, to be later harvested for free by proquest through the ir's open archives initiative (oai) feed. the method provides a series of advantages over some of the alternative methods, including students' choice to opt in or out of proquest, better control over the embargo restrictions, and more customization power without having to rely on overly complicated workflows. institutions interested in adopting a simple, automated, post-ir method to submit etds into proquest, while keeping the local workflow, should benefit from this method.

introduction
the university of north florida (unf) is a midsize public institution established in 1972, with the first theses and dissertations (tds) submitted in 1974. since then, copies have been deposited in the library, where bibliographic records are created and entered in the library catalog and the online computer library center (oclc). during the period of 1999 to 2012, some tds were also deposited in proquest by the graduate school on behalf of students who chose to do so.
this practice, however, was discontinued in the summer of 2012, when the institutional repository, digital commons, was established and submission to it became mandatory. five years later, in the summer of 2017, interest in getting unf tds hosted in proquest resurfaced. this renewed interest grew out of a desire of some faculty and graduate students to see the institution's electronic theses and dissertations (etds) posted there, in addition to a recent library subscription to the proquest dissertations & theses global database (pqdt). a month later, conversations between the library and graduate school began on the possibility of resuming hosting unf etds in proquest. consensus was reached that the pqdt database would be a good exposure point for our etds, in addition to the institutional repository (ir), yet some concerns were raised. one of the concerns was the cost of the service and who would be paying for it. neither the library nor the graduate school had allocated funds for this. the next concern was the possibility of proquest imposing restrictions that could prevent students, or the university, from posting etds in other places. it was important to make sure there were no such restrictions. another concern was expressed over students entering embargo dates in proquest that do not match the embargo dates selected for the ir. this is a common problem encountered by other libraries.1 for that reason, we wanted to keep the local workflow. the last concern expressed during the conversations was preserving students' right to opt in or out of distributing their theses in proquest. this is something both the graduate school and library have been adamant about. in higher education, requiring students to submit to proquest is a controversial issue which has raised ethical concerns and has been highly debated over the years.2 once conversations between the library and graduate school were held and concerns were gathered, the library moved ahead to investigate the available options to submit etds into proquest.

literature review
currently, there are three options to submit etds into proquest: (1) submission through the proquest etd administrator tool, (2) submission via file transfer protocol (ftp), and (3) submission through harvests performed by proquest.3

proquest etd administrator submission option
in this option, a proprietary submission tool called proquest etd administrator is used by students, or assigned administrators, to upload etds into proquest. inside the tool, a fixed metadata form is completed with information on the degree, subject terms are selected from a proprietary list, and keywords are provided. the whole administrative and review process gets done inside the tool. afterwards, zip packages with the etds and proquest's extensible markup language (xml) files are sent to the institution via ftp transfers, or through direct deposits to the ir using the simple web-service offering repository deposit (sword) protocol. the etd administrator submission method presents several shortcomings.
first, the proquest xml metadata that is returned to the institutions must be transformed into ir metadata for ingest in the ir, a process that can be long and labor intensive.4 second, the subject terms supplied in the returned files come from a proprietary list of categories maintained by proquest, which does not match the library of congress subject headings (lcsh) used by libraries.5 third, control over the metadata provided is lost because the metadata form cannot be altered, plus customizations to other parts of the system can be difficult to integrate.6 fourth, there have been issues with students indicating different embargo periods in the proquest and ir publishing options, with instances of students choosing to embargo etds in the ir, while not in proquest.7 lastly, this method does not allow students' choice, unless the etds are submitted separately in two systems in a process that can be burdensome. ultimately, for these reasons, we found the etd administrator not a suitable option for our institution.

ftp submission option
in this option, an administrator sends zip packages with the institution's etd files and proquest xml metadata to proquest via ftp.8 at the time of this investigation, there was a $25 charge per etd submitted through this method.9 we did not want to pursue this option because of the charge and the tedious metadata transformations that would be needed between ir and proquest xml schemas. another way around this would have been to submit the etds through the vireo application. vireo is an open-source etd management system used by libraries to freely submit etds into proquest via ftp.10 this alternative, however, was not an option for us as our ir, digital commons, does not support the vireo application.

harvesting submission option
this is the latest method available to submit etds into proquest. in this option, etds are submitted first into an ir, or other internal system, where they get processed to be later harvested by proquest through the ir's existing open archives initiative (oai) feed.11 at the time of this writing, we were not able to find a single study that documents the use of this method. this option looked appealing and worth pursuing as it met most of our desired criteria. first, with this option, students' choice would not be compromised as etds would be submitted to proquest after being posted in the ir. second, because the etd administrator would not be used, issues with conflicting embargo dates and unalterable metadata forms would be avoided. in addition, the local workflow would be retained, thus eliminating the need for tedious metadata transformations between proquest and ir schemas. from the available options, this one seemed the most feasible solution for our institution.

implementation of the harvesting method at unf
after research on the different submittal options was performed, the library approached proquest to express interest in depositing our future etds into their system by using a post-ir option. in the first communications, proquest suggested we use the etd administrator to submit etds because it is the most commonly used method.
when we expressed interest in the harvesting option, they said "we have not been harvesting from bepress sites" (the company that makes digital commons) and suggested we use the ftp option instead.12 ten months later, they clarified the harvests could be performed from bepress sites and that the option is free, with the only requirement of a non-exclusive agreement between the university and proquest. the news appeased both the library's and the graduate school's previous concerns, as we would be able to adopt a free method that would not compromise on students' choice nor restrict students from posting in other places, while keeping the local workflow. after agreement on the submittal method was established, planning and testing of the harvesting method began. the library worked with proquest and bepress to customize the harvesting process while the university's office of the general counsel worked with proquest on the negotiation process.

negotiation process
before proquest could harvest unf etds, two legal documents needed to be in place. the first document was the theses and dissertations distribution agreement, which specifies the conditions under which etds can be obtained, reproduced, and disseminated by proquest. the document had to be signed by unf's board of trustees and proquest. the agreement stipulated the following conditions:
• the agreement must be non-exclusive.
• the university must make the full-text uniform resource locators (urls) and abstracts of etds available to proquest.
• proquest must harvest the etds from the university's ir.
• the university and students have the option to elect not to submit individual works or to withdraw them.
• no fees are due from the university or students for the service.
• proquest must include the etds in the pqdt database.
the second document that needed to be in place was the theses and dissertations availability agreement, which grants the university the non-exclusive right to reproduce and distribute the etds. this agreement between students and unf specifies the places where etds can be hosted and the embargo restrictions, if any. unf had already been using this document as part of its etd workflow, but the document needed to be modified to include the additional option to submit etds into proquest. beginning with the spring 2019 semester, the revised version of the agreement provided students with two hosting alternatives: posting in the ir only or in the ir and proquest.

local steps performed before the harvesting
the workflow begins when students upload their etds and supplemental files (certificate of approval and availability agreements) directly into the digital commons ir. there, students complete a metadata template with information on the degree and provide keywords related to the thesis. after this, the graduate school reviews the submitted etds and approves them inside the ir platform. next, the library digital projects' staff downloads the native pdf files of etds, processes them, and creates public and archival versions for each etd. availability agreements are reviewed to determine which students chose to embargo their etds and which ones chose to host them in proquest, in addition to the ir. if students choose to embargo their etds, the embargo dates are entered in the metadata template. if students choose to publish their etds in proquest, a "proquest: yes" option is checked in their metadata template, while students who choose not to host in proquest get a "proquest: no" in their template. (the proquest field is a new administrative field that was added to the etd metadata template, starting with the spring 2019 semester, to assist with the harvesting process. it was designed to alert proquest of the etds that were authorized for harvesting. more detail on its functionality will be provided in the next section.) the reason library staff enters the proquest and embargo fields on behalf of students is to avoid having students enter incorrect data on the template. following this review, the metadata librarian assigns library of congress subject headings to each etd and creates authority files for the authors. these are also entered in the metadata template. afterwards, the etds get posted in the digital commons' public display, with the full-text pdf files available only for the non-embargoed etds. information that appears in the public display of digital commons will also appear immediately in the oai feed for harvesting. at this point, two separate processes take place:
1. the metadata librarian harvests the etds' metadata from the oai feed and converts it into marc records that are sent to oclc, with the ir's url attached. the workflow is described at https://journal.code4lib.org/articles/11676 (a simplified sketch of this conversion step appears after this list).
2. on the seventh of each month, proquest harvests the full-text pdf files, with some metadata, of the non-embargoed etds that were authorized for harvesting from the oai feed.
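the conversion step in item 1 is documented in full in the cited code4lib article; as a purely illustrative sketch, and assuming the pymarc library (version 5.x syntax) with hypothetical author, title, and ir url values, a brief record of this kind could be built as follows:

```python
# minimal sketch only: build one brief marc record from values already
# harvested out of the oai feed; the names and url below are hypothetical.
from pymarc import Record, Field, Subfield

record = Record()
record.add_field(
    Field(tag="100", indicators=["1", " "],
          subfields=[Subfield(code="a", value="doe, jane.")]),
    Field(tag="245", indicators=["1", "0"],
          subfields=[Subfield(code="a", value="a sample thesis title /"),
                     Subfield(code="c", value="jane doe.")]),
    # the 856 field carries the ir's url, as described in the workflow above
    Field(tag="856", indicators=["4", "0"],
          subfields=[Subfield(code="u",
                              value="https://digitalcommons.example.edu/etd/1234")]),
)

# write a binary marc file that could then be batch-loaded for oclc
with open("etd_records.mrc", "wb") as out:
    out.write(record.as_marc())
```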
harvesting process (customized for our institution)
to perform the harvests, proquest creates a customized robot for each institution that crawls oai-pmh-compliant repositories to harvest metadata and full-text pdf files of etds.13 the robot performs a date-limited oai request to pull everything that has been published or edited in an ir's publication set during a specific timeframe. information to formulate the date-limited request is provided to proquest by the institution for the first harvest only; subsequently, the process gets done automatically by the robot. the request contains the following elements:
• base url of the oai repository
• publication set
• metadata prefix or type of metadata
• date range of titles to be harvested
a hypothetical request assembled from these elements is sketched below.
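as an illustration only, a date-limited oai-pmh request built from these four elements could look like the following; the base url, publication set name, and qualified dublin core metadata prefix shown here are placeholders for the values an institution would actually supply:

```python
# minimal sketch of a date-limited oai-pmh listrecords request; the base
# url, set name, and metadata prefix are hypothetical placeholders.
import requests

params = {
    "verb": "ListRecords",
    "metadataPrefix": "qdc",        # qualified dublin core feed (placeholder prefix)
    "set": "publication:etd",       # the repository's etd publication set (placeholder)
    "from": "2019-01-07",           # previous harvest date
    "until": "2019-02-07",          # current harvest date
}
response = requests.get("https://digitalcommons.example.edu/do/oai/", params=params)
print(response.url)   # the full request a harvesting robot of this kind would issue
```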
in the particular case of our institution, we needed to customize the robot to limit the harvests to authorized etds only. to achieve this, we worked with bepress to add a new, hidden field at the bottom of our digital commons' etd metadata template. the field, called proquest, consisted of a dropdown menu with two alternatives: "proquest yes" or "proquest no" (see figure 1). the field was mapped to an element in the oai feed that displays the value of "proquest: yes" or "proquest: no," thus alerting the robot of the etds that were authorized for harvesting and the ones that were not. the element used to map the proquest field in the oai feed is a qualified dublin core (qdc) element (figure 2). for that reason, the robot needs to perform the harvests from the qdc oai feed in order to see this field.
figure 1. display of the proquest field's dropdown menu in the metadata template
figure 2. display of the proquest field in the qdc oai feed
after the etds authorized for harvesting have been identified with help from the "proquest: yes" field, the robot narrows down the ones that can be harvested at the present moment by using the availability-date element. this element, as the name implies, provides the date when the full-text file of an etd becomes available. it also displays in the qdc oai feed (see figure 3). if the date is on or before the monthly harvest day, the etd is currently available for harvesting. if the date is in the future, the robot identifies that etd as embargoed and adds its title to a log of embargoed etds with some basic metadata (including the etd's author and the last time it was checked). the log of embargoed etds is then pulled out in the future to identify the etds that come out of embargo so the robot can retrieve them.
figure 3. display of the availability-date element in the qdc oai feed
after the etds that are currently available for harvesting have been identified (because they have the "proquest: yes" field and a present or past availability date), the robot performs a harvest of their full-text pdf files by using a third element, which displays at the bottom of records in the oai feed (figure 4). this third element contains a url with direct access to the complete pdf file of etds that are currently not embargoed. etds that are currently on embargo contain a url that redirects the user to a webpage with the message: "the full-text of this etd is currently under embargo. it will be available for download on [future date]" (see figure 5).
figure 4. display of the third element at the bottom of records in the qdc oai feed
figure 5. message that displays in the url of embargoed etds
once the metadata and full-text pdf files of authorized, non-embargoed etds have been obtained by the robot, they get queued for processing by the proquest editorial team, who then assigns them international standard book numbers (isbns) and proquest's proprietary terms. it takes an average of four to nine weeks for the etds to display in the pqdt database after being harvested. records in the pqdt come with the institutional repository's original cover page and a copyright statement that leaves copyright to the author. afterwards, the process gets repeated once a month. this frequency can be set to quarterly or semi-annually if desired. a minimal sketch of the selection logic described in this section follows.
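the following sketch summarizes that selection logic under stated assumptions: each oai record has already been reduced to a simple python dictionary, and the key names are stand-ins for the qdc elements shown in figures 2-4 rather than the feed's actual tags:

```python
# illustrative sketch of the robot's selection logic described above;
# each record is assumed to be a dict built from one qdc oai record,
# and the key names are placeholders, not the feed's actual element tags.
from datetime import date

def select_for_harvest(records, harvest_day, embargo_log):
    """return full-text urls to pull now; queue embargoed titles for later rechecks."""
    to_harvest = []
    for rec in records:
        if rec["proquest"] != "proquest: yes":      # not authorized by the student
            continue
        available = date.fromisoformat(rec["available"])
        if available <= harvest_day:                # availability date has arrived
            to_harvest.append(rec["fulltext_url"])
        else:                                       # still embargoed: log and recheck later
            embargo_log.append({"title": rec["title"],
                                "author": rec["author"],
                                "checked": harvest_day.isoformat()})
    return to_harvest

# example: one authorized, non-embargoed record and one embargoed record
records = [
    {"proquest": "proquest: yes", "available": "2019-01-15", "title": "thesis a",
     "author": "doe, jane", "fulltext_url": "https://example.edu/etd/1"},
    {"proquest": "proquest: yes", "available": "2021-05-01", "title": "thesis b",
     "author": "roe, sam", "fulltext_url": "https://example.edu/etd/2"},
]
log = []
print(select_for_harvest(records, date(2019, 2, 7), log))  # ['https://example.edu/etd/1']
```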
additional points on the harvesting method
handling of etds that come out of embargo. when the embargo period of an etd expires, the full-text pdf of it becomes automatically available in the ir's webpage, and consequently, in the third element that displays in the oai record. each month, when the robot prepares to crawl the oai feed, it will first check the titles in the log of embargoed etds to determine if any of them have become fully available through the third element. the ones that become available are then pulled by the robot through this element.
handling of metadata edits performed after the etds have been harvested and published in pqdt. edits performed to the metadata of etds will trigger a change of date in the datestamp element that displays in the oai records. this change of date will alert the robot of an update that took place in a record, which is then manually edited or re-harvested, depending on the type of update that took place.
sending marc records to oclc. as part of the harvesting process, proquest provides free marc records for the etds hosted in their pqdt database. these can be delivered to oclc on behalf of the institution on an irregular basis. records are machine-generated "k" level and come with urls that link to the pqdt database and with proquest's proprietary subject terms. we requested to be excluded from these deliveries and continue our local practice of sending marc records to oclc with lcsh, authority file headings, and the ir's urls.
notifications of harvests performed by proquest and imports to the pqdt database. when harvests or imports to the pqdt have been performed by proquest, institutions do not get automatically notified. still, they can request to receive scheduled monthly reports of the titles that have been added to the pqdt. unf requested to receive these monthly reports.
usage statistics of etds hosted in pqdt. usage statistics of an institution's etds hosted in the pqdt can be retrieved from a tool called dissertation dashboard. this tool is available to the institution's etd administrators and provides the number of times some aspect of an etd (e.g., citations, abstract viewings, page previews, and downloads) has been accessed through the pqdt database.
royalty payments to authors. students who submit etds through this method are also eligible to receive royalties from proquest.

obstacles faced
during the planning phase, we encountered some obstacles that hindered progress on the implementation. these were:
• amount of time it took to get the ball rolling. initially, we were led to believe we would not be able to use the harvesting method to submit etds into proquest because we were bepress users, but that ended up not being the case. ten months later, we were notified by the same source that the harvesting option for bepress sites was indeed possible. those ten months delayed the implementation process.
it should be noted that digital commons users who want to exclude embargoed etds from displaying in the oai can do so by setting up an optional yes/no button in their submission form. this button prevents metadata of particular records from displaying in the oai feed. we did not pursue this option because we have been using the etd metadata that displays in th e oai to generate the marc records we send to oclc. in addition, we took the necessary precautions to avoid exposing full content of the embargoed etds in the oai feed. institutions planning to use this method should be very careful with the content they display in the oai as to avoid embargoed etds from been mistakenly pulled by proquest. access restrictions can be set by either suppressing the metadata of embargoed etds from displaying in the oai or by suppressing the urls with full access to the embargoed etds. the same precaution should be taken if planning to provide students with the choice to opt-in or out from proquest. altogether, the harvesting option proved to be a reliable solution to submit etds into proquest without having to compromise on students’ choice nor rely on complicated workflows with metadata transformations between ir and proquest schemas. institutions interested in adopting a simple, automated, post-ir method, while keeping the local workflow, should benefit from this method. information technology and libraries september 2020 using the harvesting method to submit etds into proquest | veve 9 endnotes 1 dan tam do and laura gewissler, “managing etds: the good, the bad, and the ugly,” in what’s past is prologue: charleston conference proceedings, eds. beth r. bernhardt et al. (west lafayette, in: purdue university press, 2017), 200-04, https://doi.org/10.5703/1288284316661; emily symonds stenberg, september 7, 2016, reply to wendy robertson, “anything to watch out for with etd embargoes?,” digital commons google users group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:da te/digitalcommons/rningtrarny/6byzt9apaqaj. 2 gail p. clement, “american etd dissemination in the age of open access: proquest, noquest, or allowing student choice,” college & research libraries news 74, no. 11 (december 2013): 562– 66, https://doi.org/10.5860/crln.74.11.9039; fuse, 2012-2013, graduate students re-fuse!, https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students %20re-fuse.pdf?sequence=25&isallowed=y. 3 “pqdt submissions options for universities,” proquest, http://contentz.mkt5049.com/lp/43888/382619/pqdtsubmissionsguide_0.pdf . 4 meghan banach bergin and charlotte roh, “systematically populating an ir with etds: launching a retrospective digitization project and collecting current etds,” in making institutional repositories work, eds. burton b. callicott, david scherer, and andrew wesolek (west lafayette, in: purdue university press, 2016), 127–37, https://docs.lib.purdue.edu/purduepress_ebooks/41/. 5 cedar c. middleton, jason w. dean, and mary a. gilbertson, “a process for the original cataloging of theses and dissertations,” cataloging and classification quarterly 53, no. 2 (february 2015): 234–46, https://doi.org/10.1080/01639374.2014.971997. 6 wendy robertson and rebecca routh, “light on etd’s: out from the shadows” (presentation, annual meeting for the ila/acrl spring conference, cedar rapids, ia, april 23, 2010), http://ir.uiowa.edu/lib_pubs/52/; yuan li, sarah h. theimer, and suzanne m. 
preate, “campus partnerships advance both etd implementation and ir development: a win-win strategy at syracuse university,” library management 35, no. 4/5 (2014): 398–404, https://doi.org/10.1108/lm-09-2013-0093. 7 do and gewissler, “managing etds,” 202; banach bergin and roh, “systematically populating,” 134; donna o’malley, june 27, 2017, reply to andrew wesolek, “etd embargoes through proquest,” digital commons google users group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort :date/digitalcommons/gadwi8infga/sg7de7sdcaaj. 8 gail p. clement and fred rascoe, “etd management & publishing in the proquest system and the university repository: a comparative analysis,” journal of librarianship and scholarly communication 1, no. 4 (august 2013): 8, http://doi.org/10.7710/2162-3309.1074. 9 “u.s. dissertations publishing services: 2017-2018 fee schedule,” proquest. https://doi.org/10.5703/1288284316661 https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj https://doi.org/10.5860/crln.74.11.9039 https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y http://contentz.mkt5049.com/lp/43888/382619/pqdtsubmissionsguide_0.pdf https://docs.lib.purdue.edu/purduepress_ebooks/41/ https://doi.org/10.1080/01639374.2014.971997 http://ir.uiowa.edu/lib_pubs/52/ https://doi.org/10.1108/lm-09-2013-0093 https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj http://doi.org/10.7710/2162-3309.1074 information technology and libraries september 2020 using the harvesting method to submit etds into proquest | veve 10 10 “support: proquest export documentation,” vireo users group, https://vireoetd.org/vireo/support/proquest-export-documentation/. 11 “pqdt global submission options, institutional repository + harvesting,” proquest, https://media2.proquest.com/documents/dissertations-submissionsguide.pdf. 12 marlene coles, email message to author, january 19, 2018. 13 “proquest dissertations & theses global harvesting process,” proquest. https://vireoetd.org/vireo/support/proquest-export-documentation/ https://media2.proquest.com/documents/dissertations-submissionsguide.pdf abstract introduction literature review proquest etd administrator submission option ftp submission option harvesting submission option implementation of the harvesting method at unf negotiation process local steps performed before the harvesting harvesting process (customized for our institution) additional points on the harvesting method handling of etds that come out of embargo. handling of metadata edits performed after the etds have been harvested and published in pqdt. sending marc records to oclc. notifications of harvests performed by proquest and imports to the pqdt database. usage statistics of etds hosted in pqdt. royalty payments to authors. 
articles
making disciplinary research audible: the academic library as podcaster
drew smith, meghan l. cook, and matt torrence
information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12191
drew smith (dsmith@usf.edu) is associate librarian, university of south florida. meghan l. cook (mlcook3@usf.edu) is coordinator of library operations, university of south florida. matt torrence (torrence@usf.edu) is associate librarian, university of south florida. © 2020.

abstract
academic libraries have long consulted with faculty and graduate students on ways to measure the impact of their published research, which now include altmetrics. podcasting is becoming a more viable method of publicizing academic research to a broad audience. because individual academic departments may lack the ability to produce podcasts, the library can serve as the most appropriate academic unit to undertake podcast production on behalf of researchers. the article identifies what library staff and equipment are required, describes the process needed to produce and market the published episodes, and offers preliminary assessments of the podcast impact.

introduction
the academic library has always had an essential role in the research activities of university faculty and graduate students, but until the last several years, that role has primarily focused on assisting university researchers with obtaining access to all relevant published research in their fields, making it possible for those researchers to complete a thorough literature review. more recently, that role has evolved to encompass assisting with other aspects of research and publication, including consulting on copyright-related issues, advising researchers on the most appropriate places to publish, preserving publications and data in institutional repositories, helping tenure-track faculty to evaluate their research impact as part of the tenure and promotion process, and hosting open-access journals. meanwhile, libraries of all types have experimented in the last ten to fifteen years with using social media to promote library collections, services, and events. many libraries have taken advantage of facebook, twitter, and youtube as part of these efforts. increasingly, libraries have incorporated makerspaces so that library patrons can create and edit video and audio files, meaning that this same equipment and software is now available to librarians and other library staff for their own purposes. this has resulted in libraries producing promotional videos and podcasts. the dramatic increase in mobile technology (smartphones and tablets) ownership and usage over the last decade has resulted in an increase in the consumption of podcasts wherever the listener happens to be when their ears are not otherwise fully occupied, such as commuting, exercising, and engaging in home chores. as a result, academic libraries are now finding themselves in an excellent position to use podcasting for instructional and promotional purposes in an effort to reach a broad audience. what happens when the university library combines its inherent interest in supporting the promotion of faculty and graduate student research with its ability to create podcasts to quickly and inexpensively reach an international audience?
this paper documents the efforts of an academic library at a high-level research university to partner with one of the university's academic departments to use podcasting to promote the research done by that department's faculty and doctoral candidates. we will describe which library staff were involved, how the podcast was planned, the execution of the podcasting process, the issues that were encountered throughout the process, and how the impact of the podcast was assessed. calling: earth, the podcast produced by the university of south florida (usf) libraries, can be found at http://callingearth.lib.usf.edu/.

literature review
podcasting as a means for promoting scholarly communication is a relatively new and uncommon idea in a library setting; therefore, the extant literature is scarce on the subject. a high percentage of the contemporary articles on the aforementioned topic focus on the use of podcasts as a means to satisfy a wide array of student learning needs. while pedagogical best-practices knowledge is useful, what current literature does exist is not an exact match for the concept of promoting scholarly communication, which offers subject specificity, faculty and graduate interaction, marketing of libraries, and research visibility as aggregate goals. what follows in this literature review is a summary of a slice of the literature related to podcasting, academia, and/or libraries. the researchers chose as a starting point to look at the general use of podcasting, as well as social media, in various academic and library environments. in a recent article on the use of social media and altmetrics, for example, the increased use of these tools is outlined, but with numerous caveats regarding the initial non-probabilistic methods of gathering information on the how and why of their adoption.1 to further emphasize the use of podcasts and, in a related way, social marketing, an article related to association of research libraries (arl) efforts in this vein was examined. a comprehensive study of arl member libraries published in 2011, with not much on this topic published since that date, demonstrated in figure 1 of their research that five of the 37 respondents' podcasts contained recorded interviews and only one included scholarly publishing content.2 this ten-year vacuum in further research was unexpected but indicates an opportunity for a new type of podcast focusing on academic production. scholars in academic libraries have long examined student preferences for new technologies and types of information transfer, including the use of podcasts. a study from sam houston state university found that 36 percent of users in 2011 were using podcasts for recreational purposes as opposed to much lower use for academic and scholarly communication benefits.3 in the future, academic creation and utilization of podcasts for scholarly communication is ripe for a hearty statistical and qualitative analysis. specific to this inquiry, literature on the application of podcasts for scholarly communication in a subject discipline appears to be lacking. furthermore, this literature review emphasizes the dearth of research that relates to promoting the research efforts of geosciences faculty and graduate students.
in terms of recent literature, there are also a number of publications available that deal with the history and evolution of podcasting in education and, specifically, higher education. one such current work provides an excellent outline of this growth in use, as well as outlining several major types, or genres, of podcasting in these types of environments. following a strong and succinct overview of the technology and its use in college and university settings, the author continues to effectively define, with examples, the three main genres they have identified: the "quick burst," the "narrative," and the "chat show."4 the model that most represents usf's calling: earth program is "narrative," as this includes a subcategory of "storytelling." this work is truly beneficial for any group or individual developing, or improving, an educational podcast effort. in 2011, peoples and tilley outlined the emergence of podcasts to disseminate information in academic libraries. one of the excellent questions that arises from this work deals with the access, advancement, and archiving of the content; is this content to be archived, or cataloged, as more permanent material, or is it electronic ephemera?5 this is a question for the usf calling: earth podcast group going forward as the level and quality of content and, ideally, use is expanded. additionally, educators are studying more about the limitations of podcasts; not to rule them out as academic tools, but to inspire and enhance the best possible outcomes. one excellent warning to be heeded by any library hoping to utilize podcasts for education and dissemination of research is summed up well in this quote: "if students do not utilize or do not realize the benefits of the self-pacing multimedia characteristics of podcasting, then the resource becomes a more likely contributor to cognitive overload."6 a small number of studies have examined the quantitative elements of podcast use in academic libraries. an article in vine: the journal of information & knowledge management systems outlined, via content analysis and other methods, various unique and shared characteristics of existing academic podcasts, while also furthering the concept of podcasting as a "library service."7 this may not have been the first publication to make this assertion, but this is a view that is also held by these authors, and this view shapes the development and advancement of the usf libraries podcasting efforts. librarians of all types must be wary, however, as there are numerous articles that focus on the better understanding of student learning preferences. as a recent article on the success of satellite and distance learners showed, though, these tools often match the delivery preferences of these types of students.8 switching gears to a bit more topic specificity, a number of news and academic articles were identified on the use of podcasts in areas of the geosciences. one such effort is the geology flannelcast. the development and implementation of this combination of education and entertainment, which is also a goal of these authors, is outlined by the creators' poster presentation at a recent geological society of america conference.
with a focus on the increasing ease of podcasting technology, reduced cost of equipment, and the use of a "conversational atmosphere" within a pedagogical framework, this model stood out as one worth studying.9 furthermore, the geosciences are, or can be, interesting and exciting. a recent podcast on communicating geosciences with fun and flair is just the encouragement this research group needed to go all-in on this project, and a reminder that the geosciences are far from boring!10 as is evidenced by an examination of current and historical literature on this topic, there are multiple opportunities for further exploration and library efforts, expressly as one of the main points of this work is to emphasize faculty and graduate research efforts and scholarly communication and original content creation. in addition to the focus on these publications and presentation efforts, the results will be measured by the initial assessment projects including download and utilization data and, hopefully, positive feedback from participants and library administration. further measurement is expected to demonstrate increased citation counts and downloads of the publications of the faculty and graduate student interviewees. it will be correlation and not causation, of course, but the team hopes to have positive feedback for participants and the library.

staffing
as with any successful project, a project to produce a podcast focused on academic research had to begin with individuals who had either the interest or the expertise, ideally both, to initiate the work. one was an associate librarian with more than 13 years of experience in producing regular podcasts, while the other was a library staff member who was a doctoral candidate serving on the usf libraries research platform team (rpt) for the usf school of geosciences. the rpt was already tasked with assisting the geosciences faculty and graduate students in maximizing the impact of their work and had been using various means in order to accomplish this, such as an institutional repository for research output, and tools to measure the impact of previously published work. during a conversation in late 2018, the librarian suggested to the rpt staff person that podcasting could be used to promote research to a variety of audiences, including usf faculty and students, faculty and students at other universities, k-12 science teachers, and members of the general public (both local and beyond). the librarian offered to initiate the podcast and train the rpt staff on how to continue the podcast after a number of episodes had been produced. the librarian brought to the project the needed expertise with launching and maintaining a podcast, while the rpt doctoral candidate was already familiar with the geosciences faculty and other doctoral candidates and could identify those who would make good candidates for being interviewed about their research.

planning
the initial planning for the podcast began approximately two months before the first episode release. the original project managers and podcast creators met a number of times to discuss logistics, equipment, and staffing needs, and to agree upon a podcast name (calling: earth). since the notion of podcasting for researcher promotion was an unexplored territory, the support from higher administration was cautious.
however, after production of the first episodes, traction behind the podcast grew and additional support for future endeavors was received. the podcasters acquired handheld recording equipment, a tascam dr-05 linear pcm recorder, from the usf libraries digital media commons and tested it out in multiple environments (for instance, a quiet office versus a recording studio) to find the optimal location to record the interviews. we found the hand-held recording equipment worked well in a quiet office and allowed for travel to the researcher's office if they requested. the podcast creation team discussed how to add intro and outro music to the podcast that would not violate any copyright restrictions but that would fit the mood of the podcast. the rpt staff person knew of a local tampa-based band, the growlers, as a potential source for music because the bass guitarist was an adjunct professor and alumnus of the usf school of geosciences. the alumnus gave permission to use a portion of the band's recorded music for the podcast. a hosting service was needed to host and publish the podcast. the librarian suggested using libsyn, because of their 13 years of previous experience with the platform, libsyn's inexpensive hosting plan, and the ability to acquire statistics including the geographic locations (countries and states) where the podcast was being downloaded.

execution
potential interviewees were contacted via email and invited to be interviewed. once the potential interviewee agreed, a time and a place to conduct the interview was agreed upon. the rpt staff person determined what the most recent research was for each interviewee, and then provided that content to the librarian host for review. the host then prepared interview questions based on the research content. the host went over the questions with the interviewee before the interview began to clear the content with the interviewee and to make sure everything would be covered in the interview that the interviewee wished to cover. the interviews took approximately 30 minutes to an hour. editing of the podcast was done using garageband, allowing for the addition of the music to the beginning and end, as well as the host introducing both the general podcast and the specific episode, identifying the academic units involved in the podcast, indicating how listeners might provide feedback, and thanking the music group for allowing the use of their music. in a few rare cases, small interview segments were removed, usually due to the interviewee feeling that it did not represent them well.

challenges
as with any new endeavor, challenges were faced at all stages in the process of getting the podcast to production and beyond.

buy-in from library administration
an early challenge was to gain buy-in from the library administration. this began with requesting that the library fund the hosting service, and the feeling of the administrator was that it was a worthwhile experiment, at least in the short term. once a number of episodes had been produced, the library administration had a better sense of the quality of the production and how it would serve the interests of the library in its academic support role.

lack of budget
with no budget for this project (beyond the administration's monthly payment of the hosting service), the podcasters were at the mercy of the quality of the recorders available for library checkout.
if the recorders did not produce a high-quality recording, the podcast would possibly lack the sophistication needed for production. also, high-quality graphics work was needed and required us to look into other library units for help with creating a logo.

getting the podcast into apple podcasts
once content was being produced and published, it was time to submit the podcast to apple podcasts. apple initially rejected the submission because the first logo looked very similar to an iphone. it should be noted that apple did not supply a specific explanation of what copyright was being infringed, so the podcasters were faced with making a best guess as to what the problem was. based on our assumption, we changed the logo and resubmitted the podcast. a further problem arose when apple required that the new submission use a different rss feed than the original submission. eventually the podcasters sought assistance from libsyn, who explained how to make a minor change to the url of the rss feed so that the podcast could be successfully resubmitted.

new logo creation
the first logo continued to be used for the entire first season, but before the second season was released, the library's new communications and marketing coordinator assisted with the creation of a new logo that looked more sophisticated and more in line with other podcast logos. having an in-house graphics designer was extremely helpful in rolling out a new logo (see figures 1 and 2).
figure 1. season 1 logo
figure 2. current logo

setting up interviews
identifying potential interviewees, requesting interviews, and setting good times and locations for the interviews brought on another batch of challenges. the usf school of geosciences is composed of geologists, geographers, and environmental scientists, so when planning out the schedule for the potential interviewees, an effort was made to involve a wide range of researchers. some potential interviewees declined the request altogether, while others were not available for the needed time period. given that the podcast was released every two weeks, there was a little wiggle room for scheduling hiccups, but once or twice a last-minute request to a new potential interviewee was made to ensure production stayed on schedule. deciding where and when the interview would be held required a lot of back-and-forth emails between the rpt staff person and the interviewee. preference on time and location was given to the interviewee, but it was requested that, if they did not want to come to the library to be interviewed, their own office/lab space could be used if it was a sufficiently quiet environment for recording purposes.

comfortability of the interviewee
once an interview began, the challenge of engagement from the host and comfortability of the interviewee became apparent. the host had to engage the researcher at a level appropriate for a general audience, which was challenging given that the research done by the usf school of geosciences is often at a high level of critical thinking and problem-solving. to add on to the complexity of the research being explained, the comfort level of the interviewee had the potential to dampen the interview. one researcher was so uncomfortable speaking in an interview that they typed up in advance what they wanted to say.
assessment
libsyn statistics
according to libsyn statistics (as of july 17, 2020) there were a total of 3,593 unique downloads from 48 different countries of the published 35 episodes of calling: earth. in table 1, the 48 countries where calling: earth has been downloaded are shown, as well as how many times the podcast has been downloaded in each country. it is worth noting that there are 105 downloads that do not have a location specified, so the total of the downloads in table 1 does not equal the total number of downloads reported by libsyn.

table 1. downloads by country
name                   downloads    name                     downloads
united states          2,729        chile                    3
united kingdom         103          denmark                  3
india                  98           romania                  3
australia              88           south africa             3
france                 62           yemen                    3
ireland                50           argentina                2
bangladesh             43           ecuador                  2
spain                  37           poland                   2
russian federation     36           taiwan                   2
norway                 30           turkey                   2
portugal               30           belgium                  1
germany                20           bulgaria                 1
japan                  19           colombia                 1
mexico                 18           costa rica               1
italy                  14           estonia                  1
netherlands            12           greece                   1
new zealand            11           latvia                   1
brazil                 9            macedonia                1
korea, republic of     9            nigeria                  1
czech republic         7            pakistan                 1
ukraine                7            saudi arabia             1
china                  6            united arab emirates     1
hong kong               5            vietnam                  1
sweden                 4            without a location       105
canada                 3

preliminary survey and scholarly impact
a survey was sent out to the interviewees to gauge their impressions of the podcast and to see if they had noticed any impact to their citations or document downloads. our goal for the survey was to find out if the podcast was accomplishing its original intention, increasing researcher impact through research dissemination, as well as to inform the podcast processes and procedures. the questions asked were:
1. in what ways do you view the calling: earth podcast as a way to positively affect your research impact?
2. what evidence do you have, if any, to suggest your research has been positively impacted because of being an interviewee on the calling: earth podcast?
3. what would you have liked to be different about your interview process for the calling: earth podcast?
4. what suggestions do you have for the future seasons of the calling: earth podcast? for example, should the format change, should the focus be different, should the length of the interview change, etc.
furthermore, each interviewee was asked to contribute their scholarship to the library's institutional repository, scholar commons, to allow for the archiving of their research publications and to use as a means of tracking scholarship impact as a result of the podcast. once an interviewee's scholarship was placed in scholar commons, a selected works profile was created so that a direct link to the scholar's work could be disseminated through the podcast notes. impact on faculty has also been noteworthy. the download totals for faculty interview participants (when comparing roughly the same amount of time just prior to and following their published interview) showed an average increase of 30 percent and suggest a strong correlative link between the podcast and researcher impact. furthermore, anecdotal evidence from interviewees such as "puts my name out there to a wider audience," "enhances the visibility of my work," and "allow others to hear about [my research] in a more passive way" indicates the potential impact a researcher can see from being a part of the podcast.
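for clarity, the 30 percent figure reflects a before-and-after comparison of download counts for each participant; treating it as an average of per-participant percent changes is our reading of that comparison, and the counts in this small sketch are made up rather than actual usf data:

```python
# sketch of a percent-change calculation with hypothetical download counts
# for each participant's publications before and after their episode aired
downloads = [
    {"before": 40, "after": 55},   # +37.5%
    {"before": 80, "after": 100},  # +25.0%
    {"before": 20, "after": 26},   # +30.0%
]
changes = [(d["after"] - d["before"]) / d["before"] * 100 for d in downloads]
print(round(sum(changes) / len(changes), 1))  # 30.8 (average percent increase)
```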
a second survey was sent to the faculty, students, and staff of the entire school of geosciences to determine who was listening to the podcast, who was not, and their reasons for listening or not listening. the survey contained five questions in total, but depending on how a participant answered, not all of the questions were presented (see figure 3; a sketch of this branching logic appears after the checklist below). the first question asked their status in the school of geosciences (faculty, staff, undergraduate, graduate, or other). the second question asked whether they had heard of the podcast and whether they had listened to it. if a participant chose the option that they had never heard of the podcast, the survey ended for them. if a participant chose the option that they had heard of the podcast but had not listened to it, the survey directed them to a question that asked them to provide reasons they had not listened to the podcast. if a participant chose the option that they had heard of the podcast and had listened to at least one episode, the survey directed them to a question that asked how many episodes the participant had listened to and for what reasons they were listening to the podcast. this data was collected to inform the future direction of the podcast.

figure 3. flow chart for the entire school of geosciences survey. the survey branches from the status and awareness questions either to the end of the survey, to the reasons for not listening (not knowing what a podcast is, lack of time, lack of interest, or other), or to the number of episodes listened to (one through eight) and the reasons for listening (enjoyable content, awareness of current research in the usf school of geosciences, instructional purposes, finding collaborators, or other).

checklist for podcast planning/execution

based on our experiences in the production of the calling: earth podcast, we recommend that academic librarians and library staff use the following list to help with planning and executing the production of their own podcasts:
• get general buy-in from library staff and administration, and update them as the planning progresses and budgeting is needed.
• decide on goals, audience, content, format, frequency of production, and methods of assessment.
• work with media staff to design marketing, including podcast title (avoiding duplication with other podcasts) and logo development.
• choose a podcast hosting service.
• identify relevant staff for hosting, recording, editing, and publishing and train as needed.
• evaluate existing hardware and software and make additional purchases as needed.
• contact potential interviewees and create a schedule.
• prepare customized interview questions and share as appropriate with interviewees.
• record interviews.
• edit and publish episodes.
• submit the podcast to apple podcasts, spotify, and other popular podcast directories.
• monitor statistics.
• continue to engage in marketing and assessment activities.
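as a companion to figure 3 above, the sketch below encodes the survey's branching ("skip") logic as plain conditionals. the question wording is abbreviated, and the response values and data structure are our own shorthand for illustration, not the survey platform's export format.

```python
# sketch: the branching ("skip") logic of the school of geosciences survey in figure 3.
# question wording is abbreviated; the response values are our own shorthand.

def run_survey(status: str, awareness: str, answers: dict) -> dict:
    """Walk one participant through the branching survey and return their responses."""
    responses = {"status": status, "awareness": awareness}
    if awareness == "never heard of it":
        return responses  # survey ends immediately
    if awareness == "heard of it, not listened":
        # branch: ask why they have not listened
        responses["reason_not_listening"] = answers.get("reason_not_listening")
        return responses
    if awareness == "listened to at least one episode":
        # branch: ask how many episodes and why they listen
        responses["episodes_listened"] = answers.get("episodes_listened")
        responses["reasons_listening"] = answers.get("reasons_listening", [])
    return responses

# example participant: a graduate student who listens for content and awareness
print(run_survey(
    status="graduate student",
    awareness="listened to at least one episode",
    answers={"episodes_listened": 4,
             "reasons_listening": ["enjoyable content", "awareness of current research"]},
))
```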
conclusions and future directions

enthusiasm and anecdotal positive feedback are enough fuel for current activities, and the future of podcasting in libraries appears open and exciting. at the usf libraries, calling: earth is currently in its third season, and with each new episode, new ideas and increased archival content become a permanent part of the library's legacy and collections. this is another area ripe for future exploration: as this type of original content is archived, cataloged, and disseminated, it becomes another regular measure of academic impact. in this vein, the usf libraries podcasting group plans to further codify cyclical assessment tools, including obtaining irb clearance for future surveys and data collection. in addition to cleaning up and refining these assessment practices, this will also provide the opportunity to publish and present publicly on more specific data. ideally, the group will be able to correlate the show's presence with positive citation or other metrics for show participants. the usf libraries geosciences rpt is currently collecting baseline aggregate information, which could then be compared following further maturation and dissemination of the podcast. causality may never be within reach, but any positive impacts will be exciting and beneficial. it is also the hope of those involved with calling: earth that it might provide a model or template for other rpt or library podcasts or media efforts. one of the current benefits is the strong and effective support from the development and communication directors at the usf libraries, and their partnerships will certainly be key to the success of this and any other potential projects of this type. in closing, the academic library podcasting landscape is wide open for further exploration and examination, and the usf libraries plans to lead and learn.

endnotes

1 cassidy r. sugimoto et al., "scholarly use of social media and altmetrics: a review of the literature," journal of the association for information science and technology 68, no. 9 (2017): 2037–62.
2 james bierman and maura l. valentino, "podcasting initiatives in american research libraries," library hi tech 29, no. 2 (may 2011): 349, https://doi.org/10.1108/07378831111138215.
3 erin dorris cassidy et al., "higher education and emerging technologies: student usage, preferences, and lessons for library services," reference & user services quarterly 50, no. 4 (2011): 380–91, https://doi.org/10.5860/rusq.50n4.380.
4 christopher drew, "educational podcasts: a genre analysis," e-learning and digital media 14, no. 4 (2017): 201–11, https://doi.org/10.1177/2042753017736177.
5 brock peoples and carol tilley, "podcasts as an emerging information resource," college & undergraduate libraries 18, no. 1 (january 2011): 44, https://doi.org/10.1080/10691316.2010.550529.
6 stephen m. walls et al., "podcasting in education: are students as ready and eager as we think they are?," computers & education 54, no. 2 (january 2010): 372, https://doi.org/10.1016/j.compedu.2009.08.018.
7 tanmay de sarkar, "introducing podcast in library service: an analytical study," vine 42, no. 2 (2012): 191–213, https://doi.org/10.1108/03055721211227237.
8 lizah ismail, "removing the road block to students' success: in-person or online? library instructional delivery preferences of satellite students," journal of library & information services in distance learning 10, no. 3–4 (2016): 286–311, https://doi.org/10.1080/1533290x.2016.1219206.
9 jesse thornburg, "podcasting to educate a diverse audience: introducing the geology flannelcast," in innovative and multidisciplinary approaches to geoscience education (posters) (boulder, co: geological society of america, 2015).
10 catherine pennington, "podcast: geology is boring, right? what?! no! why scientists should communicate geoscience...," n.d., https://britgeopeople.blogspot.com/2018/10/podcast-geology-is-boring-right.html.

editorial board thoughts

what more can we do to address broadband inequity and digital poverty?

lori ayre

information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12619

lori ayre (lori.ayre@galecia.com) is principal, the galecia group, and a member of the ital editorial board. © 2020.

we are now almost seven months into our new lives with the novel coronavirus and over 190,000 americans have died of covid-19. library administrators have been struggling with their commitment to provide services to their communities while keeping their staff safe. initially, libraries relied on their online offerings, so more e-books and other online resources were acquired. staff learned that they could do quite a bit of their work from home. they could still respond to email and phone messages. they could evaluate and order new material. they could deliver online programs like summer reading and story time. they could interact with people on social media. they could put together key resources for patrons and post them on the website.1 a lot of what the library was doing while the buildings were closed was not obvious. most people associate the library with the building, and since the building was closed… it seemed like nothing was happening at the library. yet, library workers were busy. once it became possible for library staff to enter the building (per local health ordinances), the first thing that libraries started to do was accept returns.
that was a little fraught considering how little we knew about the virus and how long contaminants might live on returned library material. eventually, with the long-awaited testing results from the realm project and battelle labs (https://www.webjunction.org/explore-topics/covid-19-research-project.html), people started standardizing on a three-day quarantine of returns. then more testing of stacked material was done, resulting in some people choosing to quarantine returns for four days. as of early september, we have learned that even five days isn't enough to quarantine delivery totes and some other plastic material. curbside pick-up was born in these early days of being allowed back in the buildings. if someone had mapped who was offering curbside pick-up, it would look like popcorn popping across the country. the number of libraries offering the service slowly increased, and pretty soon nearly everyone was doing it.2 many library directors will say that curbside pick-up is here to stay. people love the convenience too much to take the service away. rolling out curbside pick-up has had some challenges: how to safely make the handoff between library staff and library patrons; whether to accept returns; whether to charge fines; modifying circulation policies to fit current needs; and selecting books for people who want them but who don't have the skills needed to negotiate the library catalog's requesting system. some libraries started putting together grab bags of materials selected by staff for specific patrons, kind of like homebound services on the fly. curbside helped get material into circulation again. importantly, also during this period, libraries started finding creative ways to get wi-fi hotspots out into communities. they began lending them if they weren't already. those libraries already circulating hotspots increased their inventory. they took their bookmobiles into neighborhoods and created temporary hotspot connections around town. many libraries made sure wi-fi from their building was available in their own parking lots.3 but one thing everyone has learned during this pandemic is that libraries alone cannot be the solution to the digital divide. this isn't news to librarians, who have been arguing that internet access should be as readily available as electricity and water. librarians understand that information cannot be free and accessible unless everyone has internet access and knows how to use it. public access computers, wi-fi hotspots, and media literacy are all staple services in our libraries today.4 however, these services are not enough to bridge a digital divide that only seems to be getting worse. the coronavirus that closed libraries and schools has made it painfully clear that something much bigger has to happen to address the problem.
as gina millsap stated in a recent facebook post: "i think it's become obvious that the covid-19 crisis is shining a spotlight on the flaws we have in our broadband infrastructure and on our failure to make the investments that should have been made for equitable access to what should be a basic utility, like water or electricity."5 according to broadbandnow, the number of people who lack broadband internet access could be as high as 42 million.6 the fcc reports that at least "18 million people lacked access to broadband internet at the end of 2018."7 even if all the libraries were open and circulating hundreds of wi-fi hotspots, we'd still have a very serious access problem.

thinking differently about addressing the digital divide

in a paper published march 28, 2019, by the urban libraries council (ulc), the author suggested three specific actions that libraries can take to address race and social equity and the digital divide. they are:
1. assess and respond to the needs of your community through meaningful conversation (including considering different partners for your work),
2. optimize funding opportunities to support your efforts (e.g., e-rate), and
3. think outside the box to create effective solutions that are informed by those in need (e.g., lending wi-fi hot spots).8
while we know libraries have been heeding this advice when it comes to wi-fi hotspots, let's look into what can be done when we consider ulc's suggestion to consider different partners for your work.

community partners

an excellent example of what can be done with a coalition of community partners comes from detroit, where a mesh wireless network was put in place to provide permanent broadband in a low-income neighborhood.9 the project is called the detroit community technology project. with the community-based mesh network, only one internet account is needed to provide access for multiple homes. the networks also enable people on the network to share resources (calendar, files, bulletin board), and that data lives on their network, not in the cloud. one of the sponsors of the detroit community technology project is the allied media project (https://www.alliedmedia.org/), which also sponsors casscowifi and the equitable internet initiative to get broadband and digital literacy training into several underserved areas. community networks (https://muninetworks.org/), a project of the institute for local self-reliance (https://ilsr.org/), describes several innovative projects in which communities partner with electric utilities. surry county, virginia, expects to extend broadband access to 6,700 households through a first-ever partnership between a utility (dominion energy virginia) and an electric cooperative (dominion energy). a similar project is underway with the northern neck cooperative and dominion energy.10 these initiatives are made possible due to some regulatory changes made in virginia (sb 966). according to community networks, there are 900 communities providing broadband connectivity locally (https://muninetworks.org/communitymap). but nineteen states still have barriers in place that discourage, if not outright prevent, local communities from investing in broadband.
libraries in states where community networks are a viable option should be at the table, or perhaps setting the table, for discussions about how to bring broadband to the entire community, not just into the library or dispatched one at a time via wi-fi hotspots. this is an opportunity to convene community conversations focusing on the issue of broadband. library staff have been doing more and more of this type of outreach into the community and acting as facilitators. the ala has even produced a community conversation workbook (http://www.ala.org/tools/sites/ala.org.tools/files/content/ltc_convoguide_final_062414.pdf) to support libraries just getting started.

state partners

in california, the governor recently issued executive order n-73-20 (https://www.gov.ca.gov/wp-content/uploads/2020/08/8.14.20-eo-n-73-20.pdf) directing state agencies to pursue a goal of 100 mbps download speed and outlining actions across state agencies and departments to accelerate mapping and data collection, funding, deployment, and adoption of high-speed internet.11 this will undoubtedly create fertile ground for libraries to partner with other agencies and community organizations to advance this initiative. libraries are specifically called out to raise awareness of low-cost broadband options in their local communities. every state has some kind of broadband task force, commission, or advisory council (https://www.ncsl.org/research/telecommunications-and-information-technology/state-broadband-task-forces-commissions.aspx). this is another instance where libraries should be at the table. in my state, our state librarian is on the california broadband council. but many of these commissions do not have a representative from the library world, which means they probably are not hearing from us. whether it is through your local library, your state library, or your state library association, it is important for librarians to build relationships with people on these commissions, if not get a seat on the commissions themselves.

national partners

unless your community is blanketed with affordable broadband connectivity, it will be important that we continue to advocate nationally for the needs we see. in addition to helping the patron standing right in front of us checking out their hotspot, we also need to address the needs of the people who aren't able to get to the library but are equally in need of access. our job is to make sure that any new initiatives undertaken by a new administration provide for free and equitable access to the internet for every household. extending e-rate (the federal communications commission's program for making internet access more affordable for schools and libraries) isn't enough. free (or at least affordable) broadband needs to be brought to every home. the electronic frontier foundation (eff) argues that fiber-to-the-home is the best option for consumers today because it will be easily upgradeable without touching the underlying cables and will support the next generation of applications (see https://www.eff.org/wp/case-fiber-home-today-why-fiber-superior-medium-21st-century-broadband).
libraries have worked with the eff on issues related to privacy and government transparency. maybe it's time to team up with them on broadband.

global partners

low earth orbit (leo) satellites could potentially bring broadband to everyone on earth.12 starlink (https://www.starlink.com/) is elon musk's initiative, and project kuiper (https://blog.aboutamazon.com/company-news/amazon-receives-fcc-approval-for-project-kuiper-satellite-constellation) is amazon's jeff bezos' project. a private beta starlink service is due (or perhaps it is already happening). if it works as musk has envisioned, it could be a game-changer. or it might just make the digital divide worse if it isn't affordable to everyone who needs it. how might we lobby musk to roll out this service in a way that is equitable and fair?

speak up, speak out, and get in the way

these are just a few avenues that we, as professionals committed to free access to information, might pursue. i worry that we have not made enough noise about the problems we see in our communities that are a result of broadband inequity and digital poverty. and although virtually every library is doing something to address the problem, our efforts are no match for the magnitude of the problem. in a blog post on the brookings institution's website, authors lara fishbane and adie tomer argue for a new agenda focused on comprehensive digital equity that includes (among other things) "building networks of local champions, ensuring community advocates, government officials, and private network providers share intelligence, debate priorities, and deploy new programming."13 there are no better local champions and advocates for communities than the city or county librarians and their staffs. let's treat this problem with the seriousness it deserves and at a scale that will be meaningful. to quote john lewis (as so many of us have since his death on july 17, 2020), it's time for us to "speak up, speak out, and get in the way."14 we have to make it painfully clear to policymakers that libraries cannot bridge the digital divide with public access computers and hotspots. we need to tell our communities' stories, convene conversations, and agitate for equitable broadband that is as readily available as water and electricity.

endnotes

1 "libraries respond: covid-19 survey," american library association, accessed august 25, 2020, http://www.ilovelibraries.org/sites/default/files/may-2020-covid-survey-pdf-summary-of-results-web-2.pdf.
2 erica freudenberger, "reopening libraries: public libraries keep their options open," library journal, june 25, 2020, https://www.libraryjournal.com/?detailstory=reopening-libraries-public-libraries-keep-their-options-open.
3 lauren kirchner, "millions of americans depend on libraries for internet. now they're closed," the markup, june 25, 2020, https://themarkup.org/coronavirus/2020/06/25/millions-of-americans-depend-on-libraries-for-internet-now-theyre-closed.
4 jim lynch, "the gates library foundation remembered: how digital inclusion came to libraries," techsoup, accessed august 24, 2020, https://blog.techsoup.org/posts/gates-library-foundation-remembered-how-digital-inclusion-came-to-libraries.
5 gina millsap, "this was in april. q. we're starting a new school year and what has changed? a. not much. it's past time to get serious about universal broadband in the u.s.," facebook, august 16, 2020, 5:37 a.m., https://www.facebook.com/gina.millsap.7/posts/10218986781485855, accessed september 14, 2020.
6 "libraries are filling the homework gap as students head back to school," broadband usa, last modified september 4, 2018, https://broadbandusa.ntia.doc.gov/ntia-blog/libraries-are-filling-homework-gap-students-head-back-school.
7 james k. willcox, "libraries and schools are bridging the digital divide during the coronavirus pandemic," consumer reports, last modified april 29, 2020, https://www.consumerreports.org/technology-telecommunications/libraries-and-schools-bridging-the-digital-divide-during-the-coronavirus-pandemic/.
8 sarah chase webber, "the library's role in bridging the digital divide," urban libraries council, last modified march 28, 2019, https://www.urbanlibraries.org/blog/the-librarys-role-in-bridging-the-digital-divide.
9 cecilia kang, "parking lots have become a digital lifeline," the new york times, may 20, 2020, https://www.nytimes.com/2020/05/05/technology/parking-lots-wifi-coronavirus.html.
10 ry marcattilio-mccracken, "electric cooperatives partner with dominion energy to bring broadband to rural virginia," last modified august 6, 2020, https://muninetworks.org/content/electric-cooperatives-partner-dominion-energy-bring-broadband-rural-virginia.
11 "newsom issues executive order on digital divide," cheac (improving the health of all californians), last modified august 14, 2020, https://cheac.org/2020/08/14/newsom-issues-executive-order-on-digital-divide/.
12 tyler cooper, "bezos and musk's satellite internet could save americans $30b a year," podium: opinion, advice, and analysis by the tnw community, last modified august 24, 2019, https://thenextweb.com/podium/2019/08/24/bezos-and-musks-satellite-internet-could-save-americans-30b-a-year/.
13 lara fishbane and adie tomer, "neighborhood broadband data makes it clear: we need an agenda to fight digital poverty," brookings institution, last modified february 6, 2020, https://www.brookings.edu/blog/the-avenue/2020/02/05/neighborhood-broadband-data-makes-it-clear-we-need-an-agenda-to-fight-digital-poverty/.
14 rashawn ray, "five things john lewis taught us about getting in 'good trouble,'" brookings institution, last modified july 23, 2020, https://www.brookings.edu/blog/how-we-rise/2020/07/23/five-things-john-lewis-taught-us-about-getting-in-good-trouble/.

article

peer reading promotion in university libraries based on a simulation study about readers' opinion seeking in social networks

yiping jiang, xiaobo chi, yan lou, lihua zuo, yeqi chu, and qingyi zhuge

information technology and libraries | march 2021
https://doi.org/10.6017/ital.v40i1.12175

yiping jiang (jyp@zjut.edu.cn) is associate professor in information and library science, zhejiang university of technology, china. xiaobo chi (chixiaobo@zjut.edu.cn) is associate professor in information and library science, zhejiang university of technology, china. yan lou (jljly@zju.edu.cn) is associate professor in administrative department of continuing education, zhejiang university, china. lihua zuo (jljly@zju.edu.cn) is librarian at zhejiang university of technology, china.
yeqi chu (cyq77@zjut.edu.cn) is librarian at zhejiang university of technology, china. qingyi zhuge (beckygoodly@163.com) is librarian at zhejiang university of technology, china. © 2021.

abstract

university libraries use social networks to promote reading; however, there are challenges to increasing the use of these library platforms, such as poor promotion and low reader participation. therefore, these libraries need to find ways of dealing with the behavior characteristics of social network readers. in this study, a simulation experiment was developed to explore the behaviors of readers seeking book reviews and opinions on social networks. the study draws on social network theory to find the causes of students' behavior and how these affect their selection of information. finally, it presents strategies for peer reading promotion in university libraries.

introduction

over the last decade, social media has made an impact on almost every aspect of daily life. university libraries have gradually accepted social media as a way of promoting their services, and almost every university library in the people's republic of china has its own social media accounts. however, there are challenges to increasing libraries' use of social media, such as poor promotion and low reader participation.1 university libraries cannot depend only on promoting reading through readers' unenthusiastic use of social media tools, as constructive engagement with social networks requires users' participation, dominance, and construction.2 therefore, as a baseline, university libraries must take into consideration their readers' social attributes and then make full use of the mutual cooperation and sharing mechanisms between peers so that readers can become more involved in the use of these platforms that promote reading. in the current study, a free simulation was conducted wherein participants were required to complete a preferential choice task while browsing a book review survey that was integrated with social media platforms.3 finally, we provide some suggestions to promote reading.

literature review

university reading promotion

our review of literature on the promotion of university reading reveals three main research perspectives. the first perspective focuses on libraries. there is some evidence to suggest increasing enthusiasm for reading programs within universities.4 rodney detailed the experiences of a library at a small liberal arts university that launched a one book, one community program.5 hou emphasized the dominant position of university reading in reading promotion and put forward specific promotion strategies.6 li et al.
established a subscription digital service system for reading promotion in universities that provided personalized services for users.7 the national resource center for the first-year experience and students in transition hosts a discussion site and has compiled a list of institutions reporting first-year summer reading programs along with a list of book titles used in the programs.8 appalachian state university also has an active university reading program discussion list.9 college reading experience programs have the potential to bring disparate disciplines and college departments together in ways that extend student learning and engagement beyond the classroom. it could be argued that librarians are one group of natural "boundary spanners."10 gustavus adolphus college has compiled a lengthy list of links to universities that participate in the first-year experience. querying this list suggests that there is a growing number of reading programs on college campuses and that librarians are increasingly finding a role in their development and delivery.11

the second perspective focuses on readers. for instance, zhou et al. analyzed users' reading needs and proposed ideas of how universities could promote reading through the use of questionnaires.12 based on self-determination theory, wei et al. constructed an index of students' and teachers' motivation and participation in reading promotions in libraries by using questionnaires. their factor analysis of readers' reading psychology from the aspects of information value, social sharing, interest, cognition, and emotional entertainment concludes that the theme, intelligence, and interactivity of college reading promotions are significant.13 dali made specific recommendations on how to give reading practices in academic libraries a boost and a new direction through the lens of the differentiated nature of readerships on campuses.14

the third perspective focuses on cultural constructions in colleges. boff et al. studied the practical activities they termed library participation in the "campus reading experience" (cre) at two american community colleges and two four-year institutions in the united states.15 their research pointed out the importance of reading promotion activities in the cultural constructions of colleges and universities. moreover, it presented efficient suggestions of how librarians can hold reading promotion activities on campus and how librarians can play a more positive role in presenting reading promotion plans to their administrations. marcoux et al. emphasized the vital status of canadian college libraries in various subject areas and cultural dominance in colleges and insisted that reading promotion should be enforced by bringing together colleges, teachers, and students.16

peer education in university libraries

since the 1970s, university libraries have experimented with making students an extension of reference services and part of established peer instruction services.
for example, at california state university, fresno, student assistants were recruited to work on the reference desk and answer directional and simple reference questions.17 the university of michigan in ann arbor developed its peer information counseling program in 1985 to focus on the retention of minority students.18 the university of wisconsin-parkside and binghamton university in new york employ student peers to provide instructional support.19 all these programs incorporate peer tutoring models developed specifically for their settings. there are many practical projects in which libraries provide instruction to peer readers, such as the peer practical project at wabash college, the student assistant project at valparaiso university, the student consultant project at the university of new mexico, the curriculum consultant project at the university of new hampshire, and the student assistants project at utah state university.20 surveys of targeted students in these programs revealed that students were more likely to ask questions of student assistants than of librarians. descriptions of the training in these projects emphasized knowledge of the library's resources, with little or no explanation of incorporating peer-learning principles.21 by 2010, libraries acknowledged that peer learning provides an opportunity to further solidify the information literacy skills of all students. programs such as the library instruction tutor project at the university of new mexico and research mentors at the university of new hampshire were designed with a focus on taking advantage of the uniqueness of the peer-to-peer relationship, rather than replacing reference services.22 the librat program at california polytechnic state university trained students to provide single-shot information literacy instruction. student endorsement of peer-led sessions provides supportive evidence that participating attendees perceive this type of session as useful and valuable.23 to summarize, university libraries have identified that harnessing the uniqueness of peer relationships is an effective way to engage students in learning.24

social reading promotion

the first published papers regarding the use of facebook by libraries and librarians appeared in 2006.25 up to that time, most scholars defined social reading as sharing theoretical research but had not yet explored its relationship to social media. for instance, zhang et al. suggested that classic books and other resources should be aimed more accurately at potential users, and that the convenience of the connection between a library and its users in the social media environment should be fully utilized in order to understand the personal needs of users.26 yang et al. explored the relationship between users who engage in social reading and reading resources by analyzing it from the perspective of users.27 white et al. explored how social networks promote reading and studying and found that social media can promote users' selection, critiques, and discussions in the reading context, which are part of the process of constructive studying.28 similarly, asteman et al.
investigated the impact of users' discussions on reading participation and reading promotion by using facebook as a research target, and they found that users' discussions on social reading were beneficial to reading, studying, and understanding complex scientific topics.29 some researchers have explored resource recommendation methods and algorithms for social reading. take kochitchi et al.'s research as an example: they built a visual analysis system to construct the relationship between user characteristics and resources by extracting social reading tags and user interaction behavior characteristics.30 huang et al. proposed an efficient method for recommending information among social network groups.31 some scholars put particular emphasis on researching user services. for instance, liu et al. created reading review flows to help improve users' reading ability and optimize the reading experience by analyzing users' needs on social platforms.32 fox subdivided users on social media platforms into three categories (passive, active, and interactive) and summarized that users' standardized behaviors can effectively enhance user interaction.33 yao et al. studied the data gathered from practical activities such as information posts and book retrieval through social reading platforms at the tsinghua university library.34

methods

research questions

existing studies on social reading promotion in libraries have mainly explored how to use social media platforms to promote and develop reading promotion services. however, few scholars have explored the patterns of readers seeking opinions within social networks. this study explored the effects that peers have on each other, as opposed to services provided by university libraries, and addressed the following three questions:
1. do readers value the opinions of peers on social networks?
2. what tendencies do readers exhibit when seeking opinions on different types of literature?
3. how does social capital influence readers' tendencies when those readers are seeking opinions?
social capital refers to the potential value of social relations and includes two key dimensions: structural and relational.35 social structure can be characterized by quantity and configuration.36 with respect to quantity, the more social ties one has the potential to activate, the more information resources can be transferred.37 the configuration of social capital means that it is higher when a network's structure is more sparse.38 relational capital refers to the potential value associated with the quality of social relationships, which are created and embedded by network peers and can be utilized by network friends.39 previous studies have used different social attributes to describe relational capital, including homogeneity, trust, expertise, power, and closeness.40

study design

we used a questionnaire applet and designed a survey of book reviews to explore patterns in the way readers seek information on books through social networks and the factors that influence readers when they adopt others' opinions. as an initial step, we recruited 300 college student participants from 15 colleges and universities. these students, from the wechat group of the eighth national mechanical design competition for college students, expressed interest in the study.
the students were offered three lists of books, categorized as leisure literature, mechanical literature, and information resources utilization literature. there were 10 books in every list, and the books were selected from a 2019 lending list compiled by five colleges and universities. participants were asked to log in to our survey using their wechat credentials. they were required to write reviews for the books that they had read, and they were encouraged to recommend similar literature and write reviews for those books as well. meanwhile, 30 librarians were also invited to write reviews for the books on the lists (see fig. 1).

figure 1. the book review steps.

to keep a representative sample, we adopted stratification and divided participants into non-overlapping homogeneous groups. we set up six groups of similar size according to the number of wechat friends that the readers had within the 300-student sample. readers in the first group did not have any friends. readers in the second group had one or two friends. readers in the third group had three or four friends. readers in the fourth group had five or six friends. readers in the fifth group had seven to eleven friends. readers in the sixth group had twelve or more friends. subsequently, we randomly invited 15 readers from every group to complete the following steps: (1) complete an online questionnaire that measured their relational capital (see table 1 and fig. 2) and (2) consult others' reviews and select books that they intended to read (see fig. 2).

table 1. three measurements of relational capital
professional skills: 1. reading is a part of my life. 2. reading is on my daily to-do list. 3. i go to the library to study. 4. my reading ability is good. 5. reading helps me a lot. 6. i read a lot (at least 10 books a year).
similarity: 1. my friends and i read similar books. 2. my friends and i have similar feelings about reading. 3. recommendations are useful to me.
intimacy: 1. my survey group is trustworthy. 2. others' comments are beneficial to me. 3. i am willing to share my feelings about reading with my friends.
note: a five-point likert scale (strongly disagree, disagree, neutral, agree, strongly agree) was used.

figure 2. selecting the books.

data collection

measurement of readers' behavior when seeking opinions. a book review applet (with wechat's questionnaire function) was incorporated to record the number of times a reader looked at reviews from peers and librarians (see fig. 3).

figure 3. measurement of readers' behavior when seeking opinions. note: when readers wanted to refer to other people's comments, the applet allowed them to choose either classmates' or librarians' reviews. readers could browse the comments through a drop-down menu, and the number of reviews read by the respondents was recorded.
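the applet's log reduces to a tally of which source each browsed review came from and which literature list the book belonged to. the sketch below shows how such a log might be aggregated into the counts reported later in table 5; the event format and the sample records are assumptions for illustration, not the applet's actual data model.

```python
# sketch: aggregating logged "review browsed" events into counts by source and
# literature type, as in table 5. the event format and sample data are invented.
from collections import Counter

# each event: (reader id, literature type, source of the review that was browsed)
events = [
    ("r001", "leisure", "peer"),
    ("r001", "leisure", "peer"),
    ("r002", "mechanical", "librarian"),
    ("r003", "information resources utilization", "peer"),
    ("r003", "information resources utilization", "librarian"),
]

counts = Counter((literature, source) for _, literature, source in events)
totals = Counter(source for _, _, source in events)

for (literature, source), n in sorted(counts.items()):
    # share of all reviews browsed for this literature type that came from this source
    share = n / sum(v for (lit, _), v in counts.items() if lit == literature)
    print(f"{literature:35s} {source:9s} {n:4d} ({share:.1%})")
print("totals:", dict(totals))
```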
measurement of structural capital. the scale of a reader's structural capital was related to the number of wechat friends they had.41 drawing on the extant literature, we computed network sparseness by dividing the network's effective size by the overall size. network effective size is the average number of wechat friends within the sample set of 300 people.

measurement of relational capital. we used the three variables of professional skills, similarity, and intimacy to measure social relationships.42 data were gathered from the online questionnaire (see table 1).

experiment validity

reliability analysis. we performed measurement validity checks on the three variables applied in the measurement of relational capital. table 2 shows evidence of satisfactory convergence and discriminant validity.

table 2. reliability analysis results
cronbach's alpha: .811; standardized cronbach's alpha: .811; number of items: 12

factor analysis. we tested whether it was scientifically meaningful to consider network scale, network sparseness, and relational capital as independent variables in respectively analyzing the two dependent variables, that is, times of seeking peers' opinions and times of referring to librarians' opinions. tables 3 and 4 show the results of the kaiser-meyer-olkin (kmo) spherical test and bartlett's test, which show that factor analysis is suitable and further analysis can be performed.

table 3. kmo spherical test
kmo: .654
bartlett's test: chi-square 314.342, df 10, significance .00

table 4. bartlett's test (initial, extract)
network scale: 1.000, .724
network sparseness: 1.000, .676
relational capital: 1.000, .720
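the reliability figure in table 2 can be reproduced directly from the item-level likert responses. the sketch below computes cronbach's alpha from a small invented respondent-by-item matrix; the data are illustrative only, and the kmo and bartlett statistics in tables 3 and 4 would normally come from a dedicated factor-analysis package rather than this snippet.

```python
# sketch: cronbach's alpha for a set of likert items, computed from an invented
# respondent-by-item matrix (rows = respondents, columns = the 12 items in table 1).
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# invented 5-point likert responses for 30 respondents on 12 items;
# real use would substitute the survey's actual response matrix.
rng = np.random.default_rng(0)
responses = rng.integers(low=1, high=6, size=(30, 12))
print(f"cronbach's alpha: {cronbach_alpha(responses):.3f}")
```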
results

reviews written and browsed

we recorded the number of reviews read by the respondents and then compared the number of times they consulted peers with the number of times they consulted librarians. the results are shown in table 5. in total, readers sought opinions from peers 1,374 times (70.9%) and from librarians 563 times (29.1%). for leisure literature, mechanical literature, and information resources utilization literature, they sought opinions from peers 422 times (85.3%), 519 times (88.3%), and 433 times (50.7%), respectively. from these results, it can be surmised that readers tend to seek opinions from peers.

table 5. comparison of sources consulted by readers seeking opinions (reviews browsed)
readers: leisure 422 (85.3%); mechanical 519 (88.3%); information resources utilization 433 (50.7%); total 1,374 (70.9%)
librarians: leisure 73 (14.7%); mechanical 69 (11.7%); information resources utilization 421 (49.3%); total 563 (29.1%)

we used regression analysis to analyze the relationship between readers' opinion-seeking behavior and social capital. results are shown in table 6. according to the t-test, given the significance level of 0.10, the significance probability of the three variables was less than 0.10 for both the times that readers sought peer opinions and the times that they consulted librarians. table 7 shows the introduction and elimination of variables in the process of stepwise regression. the analysis of seeking peers' opinions eliminated relational capital, and the analysis of referring to librarians' opinions eliminated network sparseness.

table 6. regression analysis results
dependent variable: seeking peers' opinions
(constant): b = 4.864, std. error = 1.334, t = 3.645, sig. = .000
network scale: b = 1.341, std. error = .143, beta = .715, t = 9.393, sig. = .000
network sparseness: b = 3.450, std. error = 1.012, beta = -.245, t = -3.408, sig. = .001
relational capital: b = .094, std. error = .318, beta = .017, t = .295, sig. = .769
dependent variable: seeking librarians' opinions
(constant): b = .292, std. error = 1.943, t = .150, sig. = .881
network scale: b = -1.691, std. error = .208, beta = .909, t = 8.133, sig. = .000
network sparseness: b = -1.312, std. error = 1.474, beta = .094, t = .890, sig. = .376
relational capital: b = -1.385, std. error = .463, beta = -.247, t = -2.990, sig. = .004

table 7. variable introduction and elimination process
dependent variable: seeking peers' opinions. variables entered: network scale, network sparseness; variable removed: relational capital.
dependent variable: seeking librarians' opinions. variables entered: network scale, relational capital; variable removed: network sparseness.

according to the analysis results, we can draw some conclusions. first, the number of times readers sought opinions from peers was in proportion to network sparseness. in other words, the higher the number of readers' online peers, the lower the network sparseness and the more they sought advice from their peers. second, the number of times readers sought librarians' opinions was inversely proportional to their network scale and relational capital. this means that readers tended not to seek opinions from librarians if they had more network peers and more relational capital.

according to further analysis of tables 6 and 7, two equations can be derived:
1. regression equation of readers seeking opinions from peers: y1 = b0 + b1x1 + b2x2, where y1 represents seeking peers' opinions and x1 and x2 represent network scale and network sparseness, respectively, with the estimated coefficients reported in table 6.
2. regression equation of readers seeking opinions from librarians: y2 = c0 + c1x1 + c3x3, where y2 represents seeking librarians' opinions and x1 and x3 represent network scale and relational capital, respectively, again with the estimated coefficients reported in table 6.
based on the two regression equations, it can be observed that the number of peers' reviews consulted increases by 13.9% for each point increase in network scale and by 35.7% for each additional unit of network sparseness. the number of librarians' reviews consulted decreases by 60.7% for each point increase in network scale and by 70.7% for each additional unit of relational capital.
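the regressions summarized in tables 6 and 7 can be approximated with any ordinary least squares routine once the three predictors are assembled per reader. the sketch below uses statsmodels on invented data, with a single backward-elimination step standing in for full stepwise selection; the variable names, threshold, and data are assumptions for illustration, not the study's actual analysis.

```python
# sketch: regressing times-seeking-peer-opinions on network scale, network sparseness,
# and relational capital, then dropping the least significant predictor (a crude
# stand-in for stepwise selection). the data frame below is invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 90
df = pd.DataFrame({
    "network_scale": rng.integers(0, 15, n),          # number of wechat friends in the sample
    "network_sparseness": rng.uniform(0.2, 1.0, n),   # effective size / overall size
    "relational_capital": rng.uniform(1, 5, n),       # mean of the likert items in table 1
})
# invented outcome loosely tied to two of the predictors plus noise
df["peer_lookups"] = (5 + 1.3 * df["network_scale"]
                      - 3.0 * df["network_sparseness"]
                      + rng.normal(0, 2, n))

predictors = ["network_scale", "network_sparseness", "relational_capital"]
X = sm.add_constant(df[predictors])
model = sm.OLS(df["peer_lookups"], X).fit()
print(model.params.round(3))
print(model.pvalues.round(3))

# drop the predictor with the largest p-value if it is not significant at 0.10
weakest = model.pvalues.drop("const").idxmax()
if model.pvalues[weakest] > 0.10:
    predictors.remove(weakest)
    model = sm.OLS(df["peer_lookups"], sm.add_constant(df[predictors])).fit()
    print(f"removed {weakest}; refit coefficients:")
    print(model.params.round(3))
```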
discussion

the value of peers

according to the results in table 5, readers tend to seek opinions from peers. this is mainly because familiar information sources can provide more diagnostic help.43 meanwhile, the cognitive effort required to process such information is lower, and the information is easier to understand. research on peer education shows similar findings. jean piaget and lev vygotsky found that it is easier to build partnerships among children than between children and adults. moreover, children are more willing to negotiate and theorize with partners who are not authoritative. in this social media age, the value of peers is even more pronounced. therefore, libraries should recognize this, recruit influential readers for reading promotion, and utilize the influence of peer social networks to spread information related to the promotion of reading.

opinion seeking tendency of different types of literature information

the participants in this study were university students majoring in mechanical engineering disciplines across several different universities. this means that they had similar backgrounds, experiences, and feelings as they participated in the mechanical design competition. under these circumstances, it would be expected that they had close peer relationships. behavioral science research shows that if the communicator and the receiver have similar experiences, are concerned about similar things, and face similar problems, the receiver is more likely to accept information from the communicator. this viewpoint is consistent with the standpoint proposed by psychological models in which relevant sources of information are more frequently activated.44 therefore, the result that readers are more willing to seek opinions from peers is supported. the result regarding information resources utilization literature shows similar numbers of opinions sought from peers and from librarians, with a far larger share of opinion seeking directed to librarians than for the other two types of literature. this may be because libraries are the literature and information resource centers of universities, and readers trust the professional abilities of librarians and are willing to seek their help with regard to resource utilization. the results indicate that librarians should make efforts to promote the use of information resources.

seeking tendency of readers with different social capital

in the process of decision-making, readers will look for homogeneous and credible people to assist them in their search for and evaluation of information.45 identity, experience, reading level, and taste in reading are contributing factors to credibility. the more partners a reader has, the more relational capital; if readers have more trustworthy sources of information, they will not turn to librarians for their opinions. information searching is a dynamic and adaptive process. when readers find information that is novel, it has value in decision-making. conversely, when the information is redundant (not novel), it may cause the seeker to stop searching. it is widely acknowledged that sparse social networks reduce the possibility of information redundancy.46 therefore, the sparser the social network, the more likely it is that readers will need peer help.

conclusions

in this age of social media, university students are accustomed to using social networks. seeking information that complies with their psychological needs is more significant and valuable to them than the value of the information itself. therefore, libraries should make full use of peer influences when employing social media in reading promotion activities. first, university libraries ought to realize their great potential for involving students within the social flow to participate in reading promotion activities. in the digital age, readers' consciousness is repeatedly awakened. individuality, advocating for information freedom, and improving the flow of information mean that readers are no longer satisfied with passively receiving information and are more willing to actively search out and read information. meanwhile, sharing and interaction can fully meet the desires of individuals to share and communicate as well as meet their psychological need of realizing their self-worth.
libraries must understand the characteristics of contemporary university students' information needs and create a space for readers to take the initiative. only in this way can readers be more than passive recipients of information; they can also be pushers and disseminators of information. bolder attempts at innovation should be applied to reading promotions. in this research, an analysis and exploration were conducted based on the literature, indicating that readers prefer to seek opinions from their own social networks. therefore, the library can make full use of readers' social network groups when promoting the library's literature resources. likewise, other services and activities provided by libraries can be promoted through readers' social networks. for instance, libraries can invite student volunteers to take part in a new service before launching and then invite them to share their feelings and evaluations through social networks. these methods are more efficient than traditional flyer notification.

last but not least, when organizing reading promotion activities, libraries should stay behind the scenes. university libraries can establish a set of systematic peer reading promotion rules, including recruitment, training, and management systems, to build a wide and influential reading promotion student volunteer team on social networks. libraries should, however, strengthen the monitoring of peer reading promotions to prevent negative influences caused by harmful information on social media.47 they should pay special attention to the control of social opinion by using the reader volunteer team. in addition, through the monitoring and analysis of data, the strategy and direction of reading promotions can be adjusted over time to improve their pertinence and effectiveness. moreover, libraries should strengthen the effective evaluation of peer reading promotion projects. readers should be involved in the systematic readjustment of traditional reading promotion methods. innovative methods need to be tested in practice so that libraries can strengthen the effective evaluation of peer projects.

limitations and future research

there are some limitations in the methodology design, theoretical scope, empirical context, and research perspective of the current study. addressing these limitations can also provide direction for further research. in the methodology, although the cross-sectional sampling method is often adopted, it makes it difficult for this research to disentangle the roles of readers' social capital from those of opinion seeking. this study explored the correlations between several variables, which need to be further investigated by introducing control variables to fully examine the interactions between the variables. in theory, this research focused on analyzing the pattern by which readers seek opinions through social networks, which reflects the behavior observed in information searching and browsing. readers then need to decide whether to use the information they find. these two behaviors are related and, based on this research, we need to further explore readers' adoption behavior to better guide reader service work in libraries. in the empirical analysis, it is worth further considering how the various efforts undertaken by university libraries have promoted information channels.
the participants in this study were university students majoring in mechanical engineering disciplines. however, students in different majors may exhibit different information selection behavior, and this deserves further analysis and exploration. this study compared the influence of peers' and librarians' opinions on readers. however, how do readers feel about opinions from peers as compared to those from librarians? what is the impact of students preferring to use their social networks instead of librarians for information retrieval? is there a difference in the adoption of peer opinions by readers in different social media contexts? these questions deserve further study to fully understand the impact of social networks on reader opinion-seeking behavior.

acknowledgements

this work was supported by the humanities and social sciences research fund of the chinese ministry of education [grant number 17yja870003] and the philosophy and social science fund of zhejiang province [grant number 21ndjc039yb].

endnotes

1 shi-man tang, "research on the reading promotion model and implementation path of university library based on social media platform" (master's thesis, university of jilin, 2013): 90–111.

2 yao qi, hua-wei ma, huan yan, and qi chen, "analysis of social network users' online behavior from the perspective of psychology," advances in psychological science 22, no. 10 (2014): 1647–59, https://doi.org/10.3724/sp.j.1042.2014.01647.

3 david gefen, elena karahanna, and detmar w. straub, "trust and tam in online shopping: an integrated model," mis quarterly 27, no. 1 (march 2003): 51–90, https://doi.org/10.2307/30036519.

4 colleen boff, robert schroeder, carol letson, and joy gambill, "building uncommon community with a common book: the role of librarians as collaborators and contributors to campus reading programs," research strategies 20 (2007): 272–83, https://doi.org/10.1016/j.resstr.2006.12.004.

5 mae l. rodney, "building community partnerships: the 'one book one community' experience," c&rl news 65, no. 3 (march 2004): 130–32, https://doi.org/10.5860/crln.65.3.130.

6 ai-hua hou, "analysis and research on the reading promotion strategy of university library," lifelong education 9, no. 5 (2020), https://doi.org/10.18282/le.v9i5.1251.

7 mei-ning li, tian-zi zhao, xu guan, and xin-hua chen, "study on building digital service system of 'subscription' reading promotion for university library," library and information service 62, no. 18 (2018): 77–82, https://doi.org/10.13266/j.issn.0252-3116.2018.18.008.

8 boff, "building uncommon community with a common book," 271–83.

9 kim becnel et al., "'somebody signed me up': north carolina fourth-graders' perceptions of summer reading programs," children & libraries: the journal of the association for library service to children 15, no. 3 (2017): 3–8, https://doi.org/10.5860/cal.15.3.3.

10 rodney, "building community partnerships," 130–32, 155.

11 boff, "building uncommon community with a common book," 271–83.

12 mu-chen wan and liang ou, "the empirical research of university libraries reading promotion effect based on the wechat public platform," library and information service 60, no. 22 (2015): 72–78, https://doi.org/10.13266/j.issn.0252-3116.2015.22.011.
13 xiao-li wei, yi-ming mi, and fang sheng, "motivation measurement of university library's participation in the reading promotion based on self-determination theory," journal of library and information science 10 (2018): 1–8.

14 dali keren and lindsay mcniff, "reading work as a diversity practice: a differentiated approach to reading promotion in academic libraries in north america," journal of librarianship and information science 52, no. 4 (february 2020): 1050–62, https://doi.org/10.1177/0961000620902247.

15 boff, "building uncommon community with a common book," 271–83.

16 elizabeth betty marcoux and d. v. loertscher, "the role of a school library in a school's reading program," teacher librarian 37, no. 1 (2009): 8, 10–14, 84.

17 brett b. bodemer, "they can and they should: undergraduates providing peer reference and instruction," college & research libraries 75, no. 2 (2014): 162–78, https://doi.org/10.5860/crl12-411.

18 barbara macadam and darlene p. nichols, "peer information counseling: an academic library program for minority students," journal of academic librarianship 15, no. 4 (1989): 204–9, https://doi.org/10.1016/0268-4012(89)90012-1.

19 turkey alzahrani and melinda leko, "the effects of peer tutoring on the reading comprehension performance of secondary students with disabilities: a systematic review," reading & writing quarterly (april 2017): 1–17, https://doi.org/10.1080/10573569.2017.1302372.

20 bodemer, "they can and they should," 162–78; ruth sara connell and patricia j. mileham, "student assistant training in a small academic library," public services quarterly 2, no. 2–3 (2006): 69–84, https://doi.org/10.1300/j295v02n02_06; michael m. smith and leslie j. reynolds, "the street team: an unconventional peer program for undergraduates," library management 29, no. 3 (2008): 145–58, https://doi.org/10.1108/01435120810855287; gail fensom et al., "navigating research waters: the research mentor program at the university of new hampshire at manchester," college & undergraduate libraries 13, no. 2 (2006): 49–74, https://doi.org/10.1300/j106v13n02_05; wendy holliday and c. nordgren, "extending the reach of librarians: library peer mentor program at utah state university," college & research libraries news 66, no. 4 (2005), https://doi.org/10.5860/crln.66.4.7422.

21 mary o'kelly, julie garrison, brian merry, and jennifer torreano, "building a peer-learning service for students in an academic library," libraries and the academy 15, no. 1 (2015): 163–82, https://doi.org/10.1353/pla.2015.0000.

22 fensom et al., "navigating research waters," 49–74.

23 bodemer, "they can and they should," 162–78.

24 ling-jie yao, "peer education: a new mode of university library services," library development 12 (2012): 57–59.

25 jamie m. graham, allison faix, and lisa hartman, "crashing the facebook party: one library's experiences in the students' domain," library review 58, no. 3 (2009): 228–36, https://doi.org/10.1108/00242530910942072.
26 yue-qun zhang and chun-ning li, "change of library role in knowledge transfer in social network environment and countermeasures," library and information 166, no. 6 (2015): 107–12.

27 yi yang and ji-qing sun, "professional reading habits correlation research based on the social network theory," new century library 70, no. 10 (2012): 81, 91–92, https://doi.org/10.16810/j.cnki.1672-514x.2012.10.024.

28 john wesley white and holly hungerford-kresser, "character journaling through social networks," journal of adolescent & adult literacy 57, no. 8 (2014): 642–54, https://doi.org/10.1002/jaal.306.

29 christa s. c. asterhan and rakheli hever, "learning from reading argumentative group discussions in facebook," computers in human behavior, no. 53 (2015): 570–76, https://doi.org/10.1016/j.chb.2015.05.020.

30 a. kochtchi, t. v. landesberger, and c. biemann, "networks of names: visual exploration and semi-automatic tagging of social networks from newspaper articles," computer graphics forum 33, no. 3 (2014): 211–20, https://doi.org/10.1111/cgf.12377.

31 zhen-hua huang, bo zhang, qiang fang, and yang xiang, "an efficient algorithm of information recommendation between groups in social networks," acta electronica sinica 43, no. 6 (2015): 1090–93.

32 cheng-ying liu, ming-syan chen, and chi-yao, "incrests: towards real-time incremental short text summarization on comment streams from social network services," ieee transactions on knowledge and data engineering 27, no. 11 (2015): 2986–3000, https://doi.org/10.1109/tkde.2015.2405553.

33 jesse fox and courtney anderegg, "romantic relationship stages and social networking sites: uncertainty reduction strategies and perceived relational norms on facebook," cyberpsychology behavior & social networking 17, no. 11 (2014): 685–91, https://doi.org/10.1089/cyber.2014.0232.

34 fei yao, cheng-yu zhang, wu chen, and tian-fang dou, "study on integrating library services into social network sites: taking the book club of tsinghua university library as a practice example," library journal 30, no. 6 (2011): 24–28, https://doi.org/10.13663/j.cnki.lj.2011.06.014.

35 lin nan, "social capital: a theory of social structure and action" (cambridge: cambridge university press, 2001); paul s. adler and seok-woo kwon, "social capital: prospects for a new concept," academy of management review 27, no. 1 (2002): 17–40, https://doi.org/10.5465/amr.2002.5922314; peter moran, "structural vs. relational embeddedness: social capital and managerial performance," strategic management journal 26, no. 12 (2005): 1129–51, https://doi.org/10.1002/smj.486.

36 peter h. gray, s. parise, and b. iyer, "innovation impacts of using social bookmarking systems," mis quarterly 35, no. 3 (2011): 629–43, https://doi.org/10.1002/asi.21581.

37 linton c. freeman, "centrality in social networks' conceptual clarification," social networks (1978), https://doi.org/10.1016/0378-8733(78)90021-7; stephen p. borgatti, "centrality and network flow," social networks 27, no. 1 (2005): 55–71, https://doi.org/10.1016/j.socnet.2004.11.008.

38 ronald s. burt, "structural holes: the social structure of competition" (cambridge: harvard university press, 1992).

39 adler and kwon, "social capital," 17–40.

40 peter v. marsden and k. e. campbell, "reflections on conceptualizing and measuring tie strength," social forces 91, no. 1 (2012): 17–23, https://doi.org/10.1093/sf/sos112tti; stephen p. borgatti and r. cross, "a relational view of information seeking and learning in social networks," management science 49, no. 4 (2003): 432–45, https://doi.org/10.1287/mnsc.49.4.432.14428; peter moran, "structural vs. relational embeddedness," 1129–51; mesch gustavo and i. talmud, "the quality of online and offline relationships: the role of multiplexity and duration of social relationships," the information society 22, no. 3 (2006): 137–48, https://doi.org/10.1080/01972240600677805.

41 camille grange and i. benbasat, "opinion seeking in a social network-enabled product review website: a study of word-of-mouth in the era of digital social networks," social science electronic publishing 27, no. 6 (2018): 629–53, https://doi.org/10.2139/ssrn.2993427.

42 borgatti and cross, "a relational view of information seeking and learning in social networks," 432–45; gustavo and talmud, "the quality of online and offline relationships," 137–48; fox and anderegg, "romantic relationship stages and social networking sites," 685–91; gray, parise, and iyer, "innovation impacts of using social bookmarking systems," 629–43.

43 david gefen, "e-commerce: the role of familiarity and trust," omega 28, no. 6 (2000): 725–37, https://doi.org/10.1016/s0305-0483(00)00021-9.

44 tam kar yan and s. y. ho, "understanding the impact of web personalization on user information processing and decision outcomes," mis quarterly 30, no. 4 (2006): 865–90, https://doi.org/10.2307/25148757.

45 jacqueline johnson brown and peter h. reingen, "social ties and word-of-mouth referral behavior," journal of consumer research 14, no. 3 (december 1987): 350–62, https://doi.org/10.1086/209118.

46 glenn j. browne, mitzi g. pitts, and james c. wetherbe, "cognitive stopping rules for terminating information search in online tasks," mis quarterly 31 (march 2007): 89–104, https://doi.org/10.2307/25148782.

47 yue long and yi-yang liu, "propagation characteristics and paths of negative network public opinions in colleges under the new media environment," information science 37, no. 12 (2019): 134–39, https://doi.org/10.13833/j.issn.1007-7634.2019.12.022.
article

decision-making in the selection, procurement, and implementation of alma/primo: the customer perspective

jin xiu guo and gordon xu

information technology and libraries | march 2023
https://doi.org/10.6017/ital.v42i1.15599

jin xiu guo (jiguo@fiu.edu) is associate dean for technical services, florida international university. gordon xu (gordon.xu@njit.edu) is associate university librarian for collections & information technology, new jersey institute of technology. © 2023.

abstract

this case study examines the decision-making process of library leaders and administrators in the selection, procurement, and implementation of ex libris alma/primo as their library services platform (lsp). the authors conducted a survey of libraries and library consortia in canada and the united states who have implemented or plan to implement alma. the results show that most libraries use both request for information (rfi) and request for proposal (rfp) in their system selection process, but the vendor-offered training is insufficient for effective operation. one-third of the libraries surveyed are considering switching to open-source options for their next automation system. these insights can benefit libraries and library consortia in improving their technological readiness and decision-making processes.

introduction

with the exponential growth of digital information, libraries have been seeking innovative systems to manage electronic resources and provide collection services. the next-generation integrated library system (ils) should address both current challenges and future demands. with that in mind, new cloud-based commercial products have come into the market in recent years. ex libris alma, oclc worldshare, and innovative sierra are often referred to as library service platforms (lsps) compared to a client-based ils. whichever of these new products a library chooses, selecting and implementing a new system is no small task. studies show that libraries might overlook the capacity of an ils to accommodate many functions and make a tough choice between sticking with the current vendor or switching to another before investing time and resources to migrate to a completely new system.1 libraries do not always make these kinds of decisions in a rational manner, that is, one that involves clearly defining the problem, identifying and evaluating potential options, weighing the pros and cons of each option, considering an organization's values, goals, and preferences, making a choice based on a systematic analysis, and continuously reassessing and adjusting the decision as new information becomes available.
as a result, a selected system might not be the best fit for a library's actual needs.2 library consortia also face a similar challenge, but in a more complex context. for example, cost sharing, level of collaboration, and integration with other library applications can be quite different for a small library than for a large research library. additionally, the requirements for security and scalability can vary among consortial members. ninety-four percent of academic libraries that migrated their systems to alma in 2018 did so by joining a consortium.3 at a consortial level, managing a system migration project adds a significant challenge because of the competing, often conflicting desires of constituent institutions.

budgeting for a migration project needs to be secured before the project takes place. the one-time migration cost has a huge impact on a library's decision on a new system. lengthy procurement processes mean that it can take a year to communicate requirements, solicit bids, and make a final decision. libraries also wonder if they should acquire such a new system through a consortial deal or on their own. a successful implementation of a new system starts with making a sound choice.

the system migration project encompasses various technological and management decisions made by project managers, team leaders, and library administrators. decisions about data cleanup, migration mapping, system configuration, communication, and training can have a tremendous impact on project outcomes, staffing, existing workflows, and job functions and responsibilities. in the meantime, the project itself also provides libraries a great opportunity to improve the existing operational and staffing model and to adjust their strategy for managing technological and organizational change.

there are few studies on decision-making in the alma/primo selection, procurement, and migration from the user's perspective. alma is a cloud-based library management system that helps libraries manage, deliver, and discover digital and physical resources. it offers functionalities such as resource discovery, resource management, resource sharing, and analytics. primo ve is a next-generation library discovery platform that provides users with access to a central index of the library's collections. it offers a personalized and intuitive search experience, with features such as faceted searching, saved searches, and item recommendations. both alma and primo ve are ex libris products. this case study fills the gap and provides a better understanding of how american and canadian library leaders and administrators make decisions for their libraries and consortia. the pairing of ex libris's alma and primo products has become a widely accepted next-generation system due to its cloud-based model for managing both electronic and print resources. the findings of this study offer insights and lessons learned to help library leaders and administrators make better decisions on future technological change.

literature review

the growing user demand for electronic resources over the last decade has led libraries to make a rapid digital transformation to manage and deliver online library services. consequently, system providers are eager to develop next-generation library systems. organizations have started to adopt cloud computing as their infrastructure.
a benefit of cloud computing is that local it staff no longer need to handle hardware failures and software installation. cloud computing streamlines processes and saves time and money. additionally, cloud computing not only enables libraries to deliver resources and services within a network and a library community but also frees libraries from managing technology so they can focus on collection building, service improvement, and innovation. therefore, libraries have started to migrate their client-based integrated library systems (ils) to cloud-based next-generation systems, often referred to as lsps. these lsps can be connected with other web applications, increase collection visibility and accessibility, streamline workflows, reduce duplication of staffing and collections, and create a greener ecosystem for organizations.4

library consortia have been playing vital roles in resource sharing, cooperative purchasing, discovery, user experience, and technical support. many libraries migrate to a shared next-generation ils or lsp by joining a consortium. besides sharing common needs, participating libraries are quite different with respect to their sizes, the kinds and numbers of resources they provide, services, priorities, and staffing. although this could pose some challenges for participating libraries, such as cost sharing, workflow design, policy, and a collaboration model, libraries still benefit greatly from the shared catalog and enhanced metadata as well as cooperation on a global level through product communities such as eluna and igelu.5

the selection of a new system is not a small decision. calvert and read pointed out that some libraries turned to the "sheep syndrome" of selecting what other libraries have bought due to the lack of software knowledge.6 their study suggested that a request for proposal (rfp) could be a part of the lsp selection process by providing a consistent set of vendor responses with a narrow scope, a formal statement of requirements for benchmarking, and a mechanism for vendors to compete. gallagher advised considering existing contracts, financial resources, and rfps before beginning a system assessment. he indicated that the expiration date of the current ils and opt-out clauses of the existing contract could be indicators of a go-live date. a price quote including a one-time implementation fee and a cost-benefit analysis of the current ecosystem compared to the vendor offer could provide a helpful document that envisions future library services.7 in addition to an rfp, yang and venable also considered the library automation marketplace and the needs of their own library when migrating from sirsidynix symphony to alma/primo.8

gallaway and hines embraced competitive usability techniques to test a set of standard tasks across multiple systems by using focus groups at loyola university new orleans to select a next-generation system.9 they also collected anecdotal information and feedback on the system performance of the current library online catalog through a survey of library staff. this evidence-based decision-making process makes system selection more rational. manifold, on the other hand, proposed a principled approach to selecting a new lsp. he believed that system selection was a part of the continuing process of organizational change and needed to involve library staff and users throughout the process.
today's lsp systems can connect almost the entire range of library operations, from resource management and acquisitions to user request fulfillment and the integration of subject guides on research, teaching, and learning. a system migration is much more than just a move to a new system; instead, it is a transfer to a new culture. he suggested the acquisitions process must start with educating participants on the features of various systems, methods of vendor assessment, the rules of contract negotiation, communication, and stress management. the success of system selection and implementation should be measured over the life span of the system to guide new decisions along the way.10

in addition to commercial products, some libraries are acquiring open-source software (oss) that enables them to have greater control over customization. the potential benefits of oss include cost effectiveness, interoperability, user friendliness, reliability, stability, auditability, and customization. koha, evergreen, folio, abcd, winisis, newgenlib, emilda, pmb (phpmybibli), and weblis are examples of oss ils/lsp products on the market.11 when selecting and implementing an oss solution, small libraries such as the paine college collins-callaway library, with a limited budget and small staff, chose a hosted open-source ils (koha) to obtain specific expertise and services at a reasonable price.12

once a system is selected, the implementation process itself can be critical to the perception of overall system success. lovins expressed concern about choosing a project management approach that is schedule-driven over one that is results-driven. he also recommended organizing implementation activities around the incoming system functionality. for a consortium-wide system migration, a "train-the-trainer" strategy was adopted in the training program, which mostly offered demonstrations instead of instruction to future trainers.13 the program hardly met libraries' expectations for training.

active staff participation in a system migration is key to project success. banerjee and middleton reported that when library staff owned the migration process, fewer mistakes and greater satisfaction with the new system, as well as quicker troubleshooting of problems that did arise as a result of the migration, were observed.14 avery shared that the god's bible college libraries did an informal pre- and post-assessment of library users and staff to gather feedback on both the legacy and target ils. he recommended conducting a formalized pre- and post-evaluation of user satisfaction with the ils.15 stewart and morrison observed that acquisitions workflows in a shared alma environment must balance required consortial needs with local policies and procedures. the unmet training needs and the lack of an electronic resources management (erm) module in alma presented challenges for library staff in developing and managing alma workflows. they argued that a two-year project cycle was very ambitious, especially if the consortium was large and its individual libraries varied widely.16 when migrating from horizon to symphony (both are sirsidynix products), king fahd university of petroleum and minerals, based in dhahran, saudi arabia, experienced a delayed implementation.
some unmet needs, such as a dramatic shift of workflows, user interface customization, and training support by a system provider or its parent company not matched by a local vendor, became hurdles for this project.17

although a new lsp, whether alma/primo or an oss product, empowers libraries to create unified workflows across functional modules, this feature requires a system user to have cross-functional roles to conduct these activities.18 when migrating from non-ex libris product lines to alma/primo, libraries may need to make tough implementation decisions. for example, the university of south carolina migrated library data to alma/primo from innovative's millennium and ebsco's full text finder. when the legacy and target products are from different vendors, the system migration can be more complicated in communication, data mapping, data quality, and expected results of data migration. for the usc library, the preexisting duplicate records for electronic resources should have been cleaned up before the migration.19

libraries should address their concerns about key activities during the implementation to get the best possible result. the joint bank fund library had a three-day onsite training in workflows in the middle of the project. it would have been much more effective if the library had communicated with the vendor to reschedule the training at a later stage of the migration, because library staff were not yet familiar with the lsp by the expected time.20 the university of north carolina at charlotte migrated from oclc's worldshare management services (wms) to alma/primo after migrating from millennium to wms four and a half years previously. the atkins library went through the second system migration because wms modules did not meet the library's needs. going through two system migrations in the span of five years was particularly costly, and frustrated technical services staff spent more than half of their work time on data cleanup. additional time for data cleaning, workflow design, and training was also needed after the migration to alma.21

fu and fitzgerald studied the effect of lsp staffing models for library systems and technical services by analyzing the software architecture, workflows, and functionality of voyager and millennium against those realigned in alma, wms (worldshare management services), and innovative sierra. they discovered that the workload of systems staff could be reduced by around 40 percent, so library systems staff could have additional time to focus on local applications development, the discovery interface, and system integration. meanwhile, the functionality of the next-generation ils provides a centralized data services platform to manage all types of library assets with unified workflows. consequently, libraries could streamline and automate workflows for both physical and electronic resources through systems integration and enhanced functionality. this change requires libraries to reconsider their staffing models, redefine job descriptions, and even reorganize the library structure to leverage the benefits of a new lsp.22 western michigan university (wmu) decided to reorganize its technical services department after the alma migration was completed in 2015. after the alma implementation, it was observed that staff spent 38 percent less time working with physical materials.
the systems department also shifted its focus from back-end system support to front-end user support and other new technologies. wmu consolidated fourteen departments into six and renamed technical services to resource management, composed of cataloging and metadata, collections and stacks, and electronic resources. the lsp administration was shared by four certified alma administrators and one discovery administrator residing in the resource management department.23

although researchers and library practitioners have studied ils selection and implementation processes and the impact of migration on library operation and staffing, only the studies on the rfp and usability testing have focused on decision-making in ils selection. today, library administrators and leaders face technological change more often while making a transformation to a digital business model. they should understand how decisions are made at different organizational levels when managing change. this study fills this gap and helps library administrators and leaders better prepare for future change through the following research questions:

• what is the decision-making process and what do libraries consider?
• how do libraries evaluate the migration project?
• what are the impacts of the system migration on library staffing and operation?
• what lessons have libraries learned from the system migration?
• what will libraries do differently for a future system migration?

methods

researchers have adopted both qualitative and quantitative methods for studies about system migration. the literature indicates that both interviews and surveys have been employed to collect data for these studies.24 usability testing through a set of tasks across systems has also been utilized in a system selection.25 a comparative analysis of vendor documents, rfp responses, and webinars has been applied in studying the impact of system migration on staffing models.26 in this research, the authors used a qualitative method through a survey to understand decision-making in system selection, procurement, and implementation.

data collection

the population for this study is those libraries that implemented or are planning to implement alma. through the eluna membership management site (https://eluna40.wildapricot.org/), the authors identified 1,440 libraries in the united states and canada that use at least one ex libris product. with help from sue julich at university of iowa libraries, who manages the site, 1,150 alma libraries were identified. the authors also contacted marshall breeding, the founder and publisher of library technology guides (https://librarytechnology.org/), and obtained a list of 1,134 alma libraries in the united states and canada. comparing the alma libraries acquired from the two different sources, they eventually identified 1,079 libraries from the united states and 55 libraries from canada as eligible survey-participating libraries.

the authors developed a 13-question survey in qualtrics. this questionnaire aimed to help participants recall the project experience and offer them an opportunity to self-reflect and give feedback. the survey was distributed via email to the eligible libraries. a few email reminders were sent out to encourage participation. upon the closure of the survey, 291 libraries (27%) completed the survey in full.
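to make the quantification step concrete, the following is a minimal sketch (with hypothetical category labels and counts, not the authors' actual data or code) of how manually coded open-ended responses of the kind described in the data analysis section below can be tallied into the percentage tables reported in the findings:

import pandas as pd

# hypothetical coded responses to one open-ended question; in the study the
# coding was done manually and the tallies were produced in qualtrics/excel
coded_responses = pd.Series([
    "training", "communication", "data cleanup", "training",
    "implementation process", "training", "communication",
])

# count each category and express it as a percentage of all responses,
# mirroring the percentage tables reported in the findings
summary = (coded_responses.value_counts(normalize=True) * 100).round(1)
print(summary.to_string())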
data analysis

qualtrics generates data analysis and reports. the authors also conducted a text analysis by manually categorizing responses to the open-ended survey questions to clarify the characteristics of each response, and then presented and analyzed the data in microsoft excel.

findings

part i: library profile & background information

the participating libraries have diverse profiles in terms of size and geographic location, reflecting points of view ranging from small libraries to library consortia. remarkably, during the survey, the authors received requests for a complete survey questionnaire so that respondents could coordinate and provide complete and accurate data on behalf of their libraries.

respondents

the majority of the respondents in this survey were deans, directors of the library or university librarians, and system librarians (see table 1). respondents holding a wide variety of other position titles across cataloging, acquisitions, technical support, and reference also participated in the survey (see table 2).

participating libraries' geographic location

the participating libraries were located in the united states and canada, and the majority of them were american libraries (see table 3). the american libraries were distributed across 36 states, while the canadian libraries came from 4 provinces.

table 1. the position titles of the respondents
dean/director of the library/university librarian: 35%
system librarian: 23%
other: 42%

table 2. the other position titles of the respondents
assessment librarian; asset management librarian; assistant director; associate dean; associate director; associate law librarian; associate university librarian; cataloging and metadata librarian; cataloging librarian; collections librarian; consortial executive director; deputy director of the library; director of library systems; director of library technology services; director of technical services; electronic resources librarian; head librarian; head of acquisitions; head of collection management; head of library systems; head of library technology services; head of metadata and cataloging; head of technical services; ils coordinator; instructional technology librarian; lead librarian; library technician; library technology manager; manager of archives & access services; manager of digital services; manager of technical support; metadata librarian; project director; public services librarian; reference librarian/webmaster; resource description and access librarian; solutions architect, alma implementation project manager; supervisor for access services; technical services and instruction librarian; technical services librarian; technical services section head; technology manager

table 3. the geographic locations of the libraries
united states: 92%
canada: 8%

library size

the libraries served a wide variety of student populations, ranging from less than 1,000 to over 50,000 students (see table 4). the smallest library had only 199 students while the largest library system or consortium had 482,000. the number of employees in those institutions ranged from less than 1,000 employees to over 20,000 faculty and staff (see table 5).
the smallest institution may only have 10 employees, while there were three larger institutions with over 50,000 faculty and staff.

table 4. student population (number of ftes)
<1,000: 6%
1,000–1,999: 14%
2,000–2,999: 10%
3,000–3,999: 8%
4,000–4,999: 4%
5,000–5,999: 6%
6,000–6,999: 4%
7,000–7,999: 6%
8,000–8,999: 4%
9,000–9,999: 1%
10,000–14,999: 9%
15,000–19,999: 8%
20,000–29,999: 6%
30,000–39,999: 5%
40,000–49,999: 3%
50,000+: 4%

table 5. faculty and staff population (number of ftes)
<100: 9%
100–499: 25%
500–1,000: 17%
1,000–1,999: 14%
2,000–2,999: 7%
3,000–4,999: 12%
5,000–9,999: 9%
10,000–19,999: 4%
20,000+: 5%

library type

the majority of the libraries were single campus libraries; some were part of a multicampus library system or consortium libraries (see table 6). the other types of libraries may include single campus libraries serving more than one institution or location, central offices of a consortium, part of a statewide system, or independent libraries involved in consortium purchase and implementation of alma.

table 6. library type
single campus library: 45%
part of a multicampus library system: 24%
part of a consortium: 26%
other: 5%

previous integrated library system (ils)

the majority of previous ilss used by the participating libraries were voyager, aleph, millennium, and sierra (see table 7), and their vendors were ex libris, innovative interfaces, inc., and sirsidynix (see table 8). thirty-seven percent of libraries reported that they had used their previous ils over 20 years before they planned to migrate or migrated to alma (see table 9). also, one-fifth of libraries indicated that prior to alma, it was their first time to adopt an ils. therefore, this was their only experience in system migration (see table 10). all libraries used cataloging, circulation, and opac modules in their previous ilss, and they also used other modules (see tables 11 and 12).

table 7. the previous ilss
voyager: 29%
aleph: 24%
millennium: 16%
sierra: 12%
symphony: 6%
worldshare management services: 3%
horizon: 2%
workflows: 2%
tlc: 1%
clio: 1%
evergreen: 1%
surpass: 1%
the library corporation: 1%
other: 3%

table 8. the previous system vendors
ex libris: 49%
innovative interfaces, inc.: 28%
sirsidynix: 11%
oclc: 4%
endeavor: 1%
tlc: 1%
surpass: 1%
the library corporation: 1%
other: 5%

table 9. years with the previous systems
3: 1%
4: 1%
5–9: 7%
10–14: 18%
15–19: 27%
20+: 37%
unknown: 9%

table 10. whether the previous systems were the first ilss
no: 72%
yes: 20%
unknown: 7%

table 11. modules used in previous ilss
cataloging: 100%
circulation: 100%
opac: 100%
serials: 77%
acquisitions: 76%
course reserves: 64%
interlibrary loan: 28%
other: 9%

table 12.
other modules used in previous ilss
analytics
booking
course reserves
discovery system
electronic resource management
ereserves
inn-reach
licensing

part ii: implementation process

alma modules/functions

the majority of libraries reported that they will implement or have implemented the following alma modules: fulfillment, primo/primo ve, resource management, and acquisitions (see table 13). some libraries mentioned that they also used summon to replace primo/primo ve as they had used it before the system migration.

table 13. alma modules/functions implemented
fulfillment: 100%
primo/primo ve: 93%
resource management: 92%
acquisitions: 84%
erm (electronic resources management): 77%
course reserves: 73%
network zone: 50%
interlibrary loan: 40%
digital collections: 21%
other: 8%

selection process

rfi and rfp

when asked if an rfi (request for information) was involved, more than half of the libraries responded affirmatively (see fig. 1). about half of the libraries reported that they did not conduct a system functionality survey to collect information from library users and colleagues (see fig. 2). more than half of the libraries indicated that the rfp (request for proposal) process was required for the system migration (see fig. 3). the libraries that did not conduct an rfp process cited a variety of reasons (see fig. 4), such as that an rfp may not be necessary when migrating systems to the same vendor, there was no increase in expenditure, the expenditure did not reach a budget threshold (e.g., less than $100,000), or the previous contract stipulated it if upgrading to a new product with the same vendor. another reason was that libraries might have an existing relationship with vendors and would like to continue using their products. some libraries were given authority by the university administration and library directors to handle the negotiation, or they thought an rfi offered sufficient information to make this decision. other libraries had no choice in conducting an rfi or rfp process for reasons such as their system was outdated and they had to migrate, the decision was made by a consortium, or alma was their sole-source procurement.

figure 1. whether an rfi (request for information) was involved (yes 52%, no 40%, unknown 8%).

figure 2. whether a system functionality survey was conducted (no 51%, yes 43%, unknown 6%).

figure 3. whether an rfp (request for proposal) was involved.

figure 4. the rationales for libraries who did not conduct the rfp.

decision-making

the authors found that the common roles involved in the decision-making process included library dean/director, alma local implementation team, and alma project working group (consortium) (see fig. 5). some libraries indicated that their system migration decision was made by university executives (provost, vp finance, cio, and cfo), campus it, the aul for library technology, or all librarians/staff.
one library reported that the dean of arts, languages & learning services made the selection decision instead of the library or librarians.

figure 5. the decision makers.

important factors for system selection

the authors found that the four most important elements to consider for system selection were budget reality; electronic resource management (erm), bibliographic, and authority control; discovery layers (primo, primo ve); and cloud hosted (see table 14).

table 14. the important factors for system selection (strongly disagree / somewhat disagree / neither agree nor disagree / somewhat agree / strongly agree)
the budget reality: 3% / 6% / 11% / 34% / 47%
the number of libraries adopted: 7% / 7% / 27% / 40% / 19%
erm, bibliographic, & authority control: 2% / 2% / 17% / 38% / 41%
discovery layers (primo, primo ve): 6% / 4% / 13% / 27% / 50%
the analytics/reporting functionality: 4% / 6% / 15% / 41% / 35%
cloud hosted: 3% / 3% / 12% / 36% / 47%
the campus it infrastructure & its ecosystems: 8% / 12% / 31% / 31% / 18%
integration with other erps: 12% / 15% / 30% / 33% / 10%
customer support & satisfaction: 4% / 6% / 21% / 37% / 31%
system user training programs: 5% / 11% / 24% / 38% / 21%

figure 6. the data migrated to alma.

data migrated

the most common types of data migrated to alma were bibliographic records, holdings and items, patrons, and circulation data (see fig. 6). some libraries reported that they also migrated other types of data, including vendor lists, e-resource data, all available data types, etc.

discovery service

the survey asked if there were any libraries that migrated to alma and did not choose primo/primo ve for their discovery service. nine libraries reported that this was the case. four of them used summon, four chose ebsco discovery service, and one adopted a locally developed product. when asked the reason for their choices, the nine libraries indicated that they would like to stay with their existing discovery service. additionally, two of the libraries stated that a budget limitation was a part of their reasons, and one library cited a better discovery service for users as the rationale.

part iii: feedback on alma migration

system migration evaluation

the majority of libraries reported that they did not conduct a formal post-migration evaluation. half of the libraries thought the migration achieved their project goals or met the needs of library operations (acquisitions, cataloging, fulfilment, discovery, etc.) (see fig. 7).

figure 7. whether a formal post-migration evaluation was conducted.

some libraries also conducted their own migration evaluation, including rfp mandatory requirements signoff, an availability study, focus groups with library staff, usability testing with students and faculty, feedback and cross-checking with the consortium, debriefs of library staff, etc. some only did an informal evaluation, which turned out to be not handled well or not very satisfactory. for example, one consortium did a survey on the migration and provided the feedback to ex libris for improvement.
other libraries reported that they had not done an evaluation because they had not started the migration process, were still in the migration stage, did not include an evaluation in the decision-making process, or received alma as a free product through their consortial partnerships.

valuable lessons learned

the authors asked what were the most valuable lessons the libraries had learned from the migration project, and how they would implement the migration differently if they had a chance to do it again. the most valuable lessons concentrated on training, communication, engagement, the implementation process, and data cleanup/preparation (see fig. 8). these lessons are shared in greater detail in the discussion section.

figure 8. the valuable lessons learned from the migration project.

prospective migration

when asked if they would consider working with ex libris again if they migrated to a new system in the future, 70 percent of libraries gave an affirmative answer, but some libraries indicated that they would seek other alternatives (see fig. 9). when asked how likely they would be to consider implementing an open-source ils, the majority of libraries conveyed that they would not consider open source; only 7 percent of libraries would consider it (see fig. 10).

figure 9. whether ex libris products would be considered in the future.

figure 10. whether an open-source ils would be considered in the future.

discussion

the authors examine the above findings further through the lens of the research questions raised in the literature review section.

the decision-making process and factors considered

the survey indicates that both rfi and rfp are important for a selection process. fifty-two percent of the libraries conducted an rfi and 57 percent required the rfp process for the system migration. interestingly, even with a variety of sound reasons, such as no increase in expenditure, staying within the budget threshold, existing relationships with vendors, sole-source procurement, consortium decision, riders, etc., some libraries still did not roll out the rfp process. besides the rfi and rfp, 43 percent of libraries went through a system functionality survey to collect information from library users and colleagues.

for most libraries, the library dean or director, alma local implementation team, or alma project working group of a consortium were involved in the decision-making process. in some cases, university executives such as the provost, vp finance, cio, cfo, campus it, and the associate dean or associate university librarian for library technology made a collective decision. in a rare case, the dean of arts, languages & learning services made the call for the system selection.

when considering system migration, many factors can be important. this survey shows that libraries mainly consider budget reality; erm, bibliographic, and authority control; discovery layers; and cloud-hosted systems. it is interesting that most libraries would like to move to a cloud-based system that has better functionality for discovery and electronic resources management. the survey also reveals that library administration needs to find a way to offset the cost increase of the system migration.
the lack of comparable system or service offerings in the market also contributes to the decision on system selection.

project evaluation

project evaluation provides important feedback from both system users and system providers and a great opportunity for libraries to learn. the findings indicate that many libraries do not have a formal assessment process. some consortia have conducted surveys and provided feedback to ex libris, but no response from ex libris to that feedback was reported. both libraries and system vendors have lost the opportunity to learn and improve project management. for example, well-documented complaints about dissatisfaction with ex libris training have not been effectively addressed. some libraries believe a demonstration-focused training model does not provide the same experience that onsite training offers. many libraries have had trouble with acquisitions workflows. the eocr (electronic order confirmation record) and edi (electronic data interchange) processes are standard practices in libraries today to generate order records and create invoices automatically, and they should be a part of the implementation contract to ensure that libraries can operate appropriately after a new system goes live. it is time for both libraries and system providers to consider a formal project assessment as a part of system migration down the road. libraries will not do better if they do not improve today. libraries cannot improve if they do not know where previous projects have gone wrong. a better way to learn from mistakes is project assessment.

impacts on library staffing and library operation

some libraries reported that insufficient staffing over the system migration created additional problems and hardships. some library departments were stretched very thin in order to work on the migration project in addition to their regular operational duties. however, about one-third of survey-participating libraries reported that meeting the needs of library operation, including acquisitions, cataloging, fulfilment, and discovery, is a criterion of project evaluation. the lack of dedicated lsp migration project staff creates a challenge for system migration. most importantly, additional staffing time and technical capacity are important factors that determine whether libraries can fully take advantage of the functionalities of a new system. libraries might manage the system migration better by hiring additional technical staff on a project basis to handle technical aspects if staff cannot be released from library operations to focus on the migration project.

the system integration and unified automated workflows of a modern lsp can enable libraries to run their operations more efficiently. particularly in a shared environment or network, libraries could share bibliographic records for general collections more widely and deeply, which could dramatically reduce the need for both original and copy cataloging. system staff no longer need to install or upgrade proprietary software and maintain servers in house. these changes might cause job insecurity for some library staff. it is critical for library leaders to adjust some job responsibilities or develop new skills to meet new demands. this requires library administration to create a culture of embracing change, learning, and collaboration.
staff can take the advantage of a new system by being curious and reassessing previous workflows. library administration could create a flexible structure to encourage learning and collaboration across departments. lessons learned many libraries shared valuable lessons they learned from the migration projects. those lessons concentrate on training, communication and engagement, implementation process, and data cleanup and preparation. training many libraries expressed dissatisfaction with the training provided by their vendor. for example, libraries moving to alma reported that ex libris could have focused more on in-person, postmigration training. as it was, staff felt undertrained because they had access only to online training before the libraries had access to their own data in alma/primo. additionally, ex libris did not have regular trainers for a particular library, so there was less continuity across training sessions than there could have been. some suggest that ex libris do a concentrated several-day initial training for migration so that libraries have a solid overview of the entire system before data exports for testing loads, and then delve into a detailed weekly training that includes more library staff. it seems a good idea to schedule more training sessions after implementation because libraries may not know how the system functions during the implementation period. in an ideal world, libraries would put more contractual obligations on ex libris to train staff more thoroughly. after all, libraries need to hold ex libris more accountable for project outcome. for consortium libraries, they should insist that ex libris provide specialized individual trainers and technical contacts. attending group training sessions conducted by a variety of different ex libris information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 21 guo and xu trainers does not work well in large migration projects. ex libris needs to train the library staff rather than focusing on training the consortium support staff and expecting them to do most of the staff training. ex libris indeed carries a variety of training webinars that are free; however, for bespoke training or intimate training sessions, they charge their customers. a barrier for many libraries is that they just cannot afford to pay more on these bespoke training sessions so they depend on other in-house training and best practices (e.g., work groups, training committees, inhouse power users, etc.) to train/manage the training needs of their library personnel. communication and engagement many libraries express that communication is extremely important and buy-in from stakeholders at all levels is critical to the migration project’s success. investing the initial time to have all stakeholders onboard will pay off. blocking off time for weekly meetings with involved staff and ex libris is key. some suggested asking more questions and seeking to understand the functionality of the new system more deeply. for consortial libraries, librarians can become much closer to each other and learn to seek out and receive help from one another in the ways that they might never do before. the networking can be an invaluable source for mutual support going forward. some libraries reported that due to the lack of communication, an overly sudden decision for the implementation timeline was made at the legislative level. 
information regarding requirements and expenses was not fully clarified before the process began and came as a surprise during the migration. the whole process felt very rushed by the vendor with insufficient trainings, which turned out to be very dissatisfying. implementation process a system migration is complex and requires a great deal of time, institutional resources, and staff. some key processes needed to be better prepared in advance, such as staff trainings, project plans and major milestones, system analysis, customer inputs for implementation and configuration, data cleanup, physical to electronic processing (p2e), source data extraction, validation and delivery, workflow analysis, fulfillment network, authentication, third-party integrations, data review and testing, go-live readiness checklist, etc. in practice, the migration was often more time and resource-intensive than expected, meaning that libraries found it difficult to complete their part of the process in the contractually-specified time. libraries should clear the decks of core staff to focus on migration, and make sure there are no other major projects occurring at the same time. if staff have insufficient time during the migration window, libraries need to hire temporary experienced staff for the project. this investment will benefit library operation in the long run. the implementation team members should have more dedicated time to be trained so that the library staff are well prepared and knowledgeable in the areas in which they work. it is wise to clean up data as much as possible prior to migration. it would be ideal if the existing workflows were fully documented with diagrams so that it would be easier to determine what parts of the workflows need change. some libraries reported their migration happened during the pandemic with state-issued stay-athome orders in force. it was extremely stressful juggling all of the changes for the library while keeping up with system migration. ideally, it would be better to avoid doing the migration during a pandemic and postpone the migration. but if libraries have no other choices, one benefit is to take advantage of closures for cutover days. the stress of the implementation and trying to get information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 22 guo and xu things done may cause frustrations to boil over. it is advised to manage these situations by adding additional support where needed and by always ensuring that communication is a top priority so that any confusion is kept to a minimum. for consortial libraries, it is important for individual institution members to have their own project managers. the consortial libraries would have tried to standardize more configurations across the consortia, like user groups, circulation settings, item types, etc. some libraries felt the whole migration process was rushed by the vendor, which turned out to be not very successful. libraries should not let the vendor talk them into a compressed, severalmonth migration timeline; instead, they should spend more time in the preparation and implementation process. data cleanup and preparation although it is tedious and time consuming, many libraries suggested cleaning up data as much as possible prior to migration. more pre-migration data cleanup would avoid the post-migration mess. 
some libraries recommended more stringent cleanup of catalog records, acquisitions data, circulation data, patron records, weeding, etc. it is important to make sure the cataloging structure matches the structure of the new system. had they taken the data review stage more seriously and fully modeled the processes and workflows that would be needed, they would have had fewer data cleanup problems to address after the migration was complete. some libraries cautioned that alma’s p2e (physical to electronic) migration process was more complex than anticipated. they stated that the p2e conversion did not work as it should have, and ex libris should do a better job in the future. due to misalignment of source and target collections, the p2e process resulted in a large cleanup after the migration. a number of libraries would have asked more questions about what data was migrated and to where. ex libris had migrated data that should not have been migrated. as a result, a messy system became a reality. planning for future system migrations when asking what libraries will do differently for a future system migration, many provided very interesting insights. some libraries believed that the system migration put library leadership in a difficult position. they needed to engage all library employees in decision-making and provide staff with the resources they needed to navigate change, experience the vulnerability of learning a new system, and even have difficult conversations with colleagues. at the same time, library leaders are accountable to their parent organizations and subject to budget pressure and mandates to follow procurement processes, which are geared around efficiency and hierarchy rather than promoting democratic decision-making and self-governance. many libraries expressed a concern about training. they stated that they would demand a separate contract for training in the future and put more contractual obligations on system providers to train staff more thoroughly. they would spell out in greater detail what a successful migration would consist of to hold ex libris responsible for outcome. during the bidding process, library staff should be less distracted by smooth presentations but ask difficult questions about system functionality. another concern is about the pricing. one early adopter of alma stated that they learned the risks, rewards, and excitement of helping with a developing product as they felt aleph was a dead end information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 23 guo and xu and did not see many other alternatives. they would have negotiated more strongly with ex libris on pricing considering the immaturity of the product and pricing model at the time of adoption. some libraries felt they were not given competitive pricing, and their costs went up significantly, which constituted a large budget shift. some small libraries believed alma is too big for them, and oclc might be more appropriate for their size of collections and materials. they realized they underutilized a very expensive system. some libraries preferred a customized implementation as opposed to the one-size-fits-all model ex libris offered. they stated that despite learning the new system, they found that the solutions ex libris offered for their implementation rarely worked. they would better off fitting in their own workflows with alma (especially for budgeting). 
ex libris seems to be not ready to work with single-campus small colleges. other libraries reported that they had multiple people in a project management role, which created communication issues. they learned that in any future migration processes they should have a single project manager empowered to make decisions. for consortium libraries, some libraries suggested taking advantage of cohorts of migrating institutions to share information, issues, and raise common questions. they would have made some local decisions instead of simply going with the consortiums. one consortium experienced a major difficulty that the group implementation took place in different countries. the time difference with their implementation team had added an additional dimension to project management. they would have done an individual migration instead of a group migration since they had a very complex institutional structure. some libraries strongly recommended open-source systems as well. they believed that the trend toward vertical consolidation of vendors is not healthy for the library system market in the long run. with mergers and acquisitions, gigantic companies are formed and might over-control the market and pricing. conclusions decision-making on the selection, procurement, and implementation of a new lsp is a process that requires gathering information and seeking input from library administration, experts, and different levels of stakeholders in a systematical way to ensure the system quality, fitness, and a successful implementation. the findings suggest that libraries should adopt an rfi/rfp (request for information/proposal) or system functionality survey as the basis for system selection. budget, resource discovery, and electronic resources management are the most important factors to be considered in an ils selection. staffing time and technical capability must be addressed before implementing a new system to enable libraries to manage user expectations. insufficient staff and the lack of technical skills could affect the realization of the benefits of a new system. technological change can lead to the shifts of staff job responsibilities and lead to a new way of working together. it is important for library administration to address organizational change when making technological change. a formal project assessment is essential for libraries and system providers to learn and improve collectively. open-source systems could open doors for libraries to seek more customized and affordable systems. research limitations like all research studies, this study has limitations that provide opportunities for further investigations. firstly, because we asked for responses from individuals, not libraries, the findings information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 24 guo and xu might be biased by participants due to individual experiences. secondly, due to the limitation of time, space, and number of survey questions, reported data mainly focused on alma libraries and could not cover migration experiences of libraries migrating to other products or all aspects of system migration. further research would benefit the library community from interviewing participating libraries in a different size, type, and geographic location as well as different system providers. practical implications every new system has its advantages and downsides. 
to help libraries fully take advantage of a new system, it would be helpful if vendors could evaluate training, physical to electronic (p2e) process, and system affordability. providing training after a system goes live will help libraries implement workflows effectively and give staff better experience. p2e is crucial for ensuring that all relevant information is transferred and maintained in the new system. vendors could address potential p2e issues before a system migration takes place so that libraries might approach data cleanup differently. it would be great if vendors could customize system modules or functionalities as needed by both small and large libraries. this will give libraries flexibility to invest in most needed library operations at different prices to make the system affordable. customer services can be crucial for libraries to continue optimizing the new system down the road. regularly seeking libraries’ feedback can foster a positive customer relationship and benefit both libraries and vendors. acknowledgements the authors appreciate the support of marshall breeding and sue julich for providing the library contact lists. the authors would also like to thank the office of research integrity for reviewing the survey questionnaire and providing comments. much gratitude goes to the survey participants who volunteered their time to participate in this study and took the time to communicate with the authors in order to provide accurate responses for their libraries or consortia. information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 25 guo and xu appendix: survey questionnaire adult online consent to participate in a research study a customers’ perspective: decision-making on system migration summary information things you should know about this study: • purpose: the purpose of the study is to understand how library leaders make decisions on system migration during technological change and the impact of these decisions on library operation and staff. • procedures: if you choose to participate, you will be asked to answer 12 multiplechoice questions and 3 open-ended questions. • duration: this will take about 15 to 20 minutes. • risks: there is little risk or discomfort from this research since you share your project experience anonymously. • benefits: the main benefit to you from this research is to self-reflect on the project and have an opportunity to share the project experience. we plan to publish our findings, which will bring potential benefits to you and the library community. • alternatives: there are no known alternatives available to you other than not taking part in this study. • participation: taking part in this research project is voluntary. please carefully read the entire document before agreeing to participate. confidentiality the records of this study will be kept private and will be protected to the fullest extent provided by law. in any sort of report we might publish, we will not include any information that will make it possible to identify you. research records will be stored securely and only the researcher team will have access to the records. information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 26 guo and xu the following questions are for general analytical use only. 
although qualtrics does not collect your email address, please do not provide your personal identification indicators (pii) with your answers. if pii appear in the responses, we will apply a data anonymization process to anonymize pii after the results are added into the final tally. right to decline or withdraw your participation in this study is voluntary. you are free to participate in the study or withdraw your consent at any time during the study. you will not lose any benefits if you decide not to participate or if you quit the study early. the investigator reserves the right to remove you without your consent at such time that he/she feels it is in the best interest. researcher contact information if you have any questions about the purpose, procedures, or any other issues relating to this research study you may contact jin guo (jiguo@fiu.edu) or gordon xu (gordon.xu@njit.edu). irb contact information if you would like to talk with someone about your rights of being a subject in this research study or about ethical issues with this research study, you may contact the fiu office of research integrity by phone at 305-348-2494 or by email at ori@fiu.edu. participant agreement i have read the information in this consent form and agree to participate in this study. i have had a chance to ask any questions i have about this study, and they have been answered for me. by clicking on the “consent to participate” button below i am providing my informed consent. consent to participate mailto:jiguo@fiu.edu mailto:gordon.xu@njit.edu information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 27 guo and xu section i: library profile and background information 1. your title: a. dean/director of the library/university librarian b. system librarian c. other (please specify: _________________) 2. describe your institution a. location i. us ii. canada iii. state 2. total student and faculty population a. total student population (number of ftes) b. total faculty population (number of ftes) 3. information about your library a. single campus library b. part of a multicampus library system c. part of a consortium d. other (please specify: _________________) 4. previous ils: a. the previous ils name: b. the previous ils vendor: c. years with the previous system: d. was it your first ils? a. yes b. no 5. ils modules in use prior to alma migration: (please check all that apply) a. acquisitions b. cataloging c. circulation d. interlibrary loan e. reserves f. serials g. opac h. other (please specify: _____________________) information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 28 guo and xu section ii: alma implementation process 6. alma modules/functions implemented: (please check all that apply) a. acquisitions b. resource management c. fulfillment d. interlibrary loan e. course reserves f. erm g. network zone h. primo/primo ve i. digital collections j. other (please specify: ________________________) 7. the system selection process • was an rfi (request for information) involved? a. yes b. no • did you conduct a system functionality survey to collect information from library users and colleagues? a. yes b. no • was the rfp (request for proposal) process required? • a. yes, please specify the person/department that prepared for rfp. _____ • b. no, please provide the reason why (e.g., budget cap less than $100k, etc.)_____ 8. 
who was involved in the decision-making process? • alma project working group (consortium) • alma local implementation team • project manager(s) • library dean • institutional coordinators/leads • departmental heads • others (please specify ______) information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 29 guo and xu 9. what are important factors for system selection (5 points, weight/per response)? • the budget reality • the number of libraries adopted • e-resource management (erm), bibliographic, and authority control • discovery layers (primo, primo ve) • the analytics/reporting functionality • cloud hosted • the university/college it infrastructure and its ecosystems • integration with other erp (enterprise resource planning) systems/platforms • customer support & satisfaction • system user training programs 10. what data was migrated (please select all that apply)? • authority data • bibliographic records • holdings and items • patrons • loans, holds, and fines • acquisitions • course reserves • digital metadata and objects 11. please skip this question if you use primo/primo ve. if you chose non-ex libris products for discovery service, please specify the product____, and select the possible reason below: • budget limitation • stay with the existing discovery service • others section iii: feedback on alma migration project. 12. how did your library evaluate the system migration project? • no formal post-migration evaluation • user satisfaction survey • achieved the project goals • met the needs of library operations (acquisitions, cataloging, fulfilment, discovery, etc.) information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 30 guo and xu 13. open-ended questions • what are the most valuable lessons you have learned from this project? if you had a chance to do it again, how would you implement the migration differently? • would the library consider working with ex libris again if it were to migrate to a new system in the future? • how likely is it that this library would consider implementing an open-source ils? information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 31 guo and xu endnotes 1 zhonghong wang, “integrated library system (ils) challenges and opportunities: a survey of us academic libraries with migration projects,” the journal of academic librarianship 35, no. 3 (2009): 207–20, https://doi.org/10.1016/j.acalib.2009.03.024. 2 teri oaks gallaway and mary finnan hines, “competitive usability and the catalogue: a process for justification and selection of a next-generation catalogue or web-scale discovery system,” library trends 61, no. 1 (2012): 173–85. 3 guoying liu and ping fu, “shared next generation ilss and academic library consortia: trends, opportunities and challenges,” international journal of librarianship 3, no. 2 (2018): 53–71. 4 matt goldner, “winds of change: libraries and cloud computing,” bcla browser: linking the library landscape 4, no. 1 (2012): 1–7. 5 liu and fu, “shared next generation,” 53–71; jone thingbø, frode arntsen, anne munkebyaune, and jan erik kofoed, “transitioning from a self-developed and self-hosted ils to a cloudbased library services platform for the bibsys library system consortium in norway,” bibliothek forschung und praxis 40, no. 3 (2016): 331–40, https://doi.org/10.1515/bfp-20160052. 
6 philip calvert and marion read, “rfps: a necessary evil or indispensable tool?” electronic library 24, no. 5 (2006): 649–61. 7 matt gallagher, “how to conduct a library services platform review and selection,” computers in libraries 36, no. 8 (2016): 20. 8 zhongqin (june) yang and linda venable, “from sirsidvnix symphony to alma/primo: lessons learned from an ils migration,” computers in libraries 38, no. 2 (march 2018): 10–13. 9 gallaway and hines, “competitive usability,” 173–85. 10 alan manifold, “a principled approach to selecting an automated library system,” library hi tech 18, no. 2 (2000): 119–30, https://doi.org/10.1108/07378830010333455. 11 ayoku a. ojedokun, grace o. o. olla, and samuel a. adigun, “integrated library system implementation: the bowen university library experience with koha software,” african journal of library, archives and information science 26, no. 1 (2016): 31–42. 12 lyn h. dennison and alana faye lewis, “small and open source: decisions and implementation of an open source integrated library system in a small private college,” georgia library quarterly 48, no. 2 (spring 2011): 6–9. 13 daniel lovins, “management issues related to library systems migrations. a report of the alcts camms heads of cataloging interest group meeting. american library association annual conference, san francisco, june 2015,” technical services quarterly 33, no. 2 (2016): 192–98, https://doi.org/10.1080/07317131.2016.1135005. https://doi.org/10.1016/j.acalib.2009.03.024 https://doi.org/10.1515/bfp-2016-0052 https://doi.org/10.1515/bfp-2016-0052 https://doi.org/10.1108/07378830010333455 https://doi.org/10.1080/07317131.2016.1135005 information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 32 guo and xu 14 kyle banerjee and cheryl middleton, “successful fast track implementation of a new library system,” technical services quarterly 18, no. 3 (2001): 21–33. 15 joshua m. avery, “implementing an open source integrated library system (ils) in a special focus institution,” digital library perspectives 32, no. 4 (2016): 287–98, https://doi.org/10.1108/dlp-02-2016-0003. 16 morag stewart and cheryl aine morrison, “breaking ground: consortial migration to a nextgeneration ils and its impact on acquisitions workflows,” library resources & technical services 60, no. 4 (2016): 259–69. 17 zahiruddin khurshid and saleh a. al-baridi, “system migration from horizon to symphony at king fahd university of petroleum and minerals,” ifla journal 36, no. 3, (2010): 251–58, https://doi.org/10.1177/0340035210378712. 18 efstratios grammenis and antonios mourikis, “migrating from integrated library systems to library services platforms: an exploratory qualitative study for the implications on academic libraries’ workflows,” qualitative and quantitative methods in libraries 9, no. 3 (september 2020): 343–57, http://qqml-journal.net/index.php/qqml/article/view/655/585. 19 abigail wickes, “e-resource migration: from dual to unified management,” serials review 47, no. 3–4 (2021): 140–42. 20 yang and venable, “from sirsidynix,” 13. 21 joseph nicholson and shoko tokoro, “cloud hopping: one library’s experience migrating from one lsp to another,” technical services quarterly 38, no. 4 (2021): 377–94. 22 ping fu and moira fitzgerald, “a comparative analysis of the effect of the integrated library system on staffing models in academic libraries,” information technology and libraries 32, no. 3 (september 2013): 47–58. 
23 geraldine rinna and marianne swierenga, “migration as a catalyst for organizational change in technical services,” technical services quarterly 37, no. 4 (2020): 355–75, https://doi.org/10.1080/07317131.2020.1810439. 24 vandana singh, “experiences of migrating to open source integrated library systems,” information technology and libraries 32, no. 1 (2013): 36–53, https://doi.org/10.6017/ital.v32i1.2268; shea-tinn yeh and zhiping walter, “critical success factors for integrated library system implementation in academic libraries: a qualitative study,” information technology and libraries 35, no. 3 (2016): 27–42, https://doi.org/10.6017/ital.v35i3.9255; grammenis and mourikis, “migrating from integrated library systems,” 343–54; xiaoai ren, “service decision-making processes at three new york state cooperative public library systems,” library management 35, no. 6 (2014): 418–32, https://doi.org/10.1108/lm-07-2013-0060; wang, “integrated library system,” 207– 20; pamela r. cibbarelli, “helping you buy ils,” computers in libraries 30, no. 1 (2010): 20–48, https://www.infotoday.com/cilmag/cilmag_ilsguide.pdf; calvert and read, “rfps,” 649–61. https://doi.org/10.1108/dlp-02-2016-0003 https://doi.org/10.1177/0340035210378712 http://qqml-journal.net/index.php/qqml/article/view/655/585 https://doi.org/10.1080/07317131.2020.1810439 https://doi.org/10.6017/ital.v32i1.2268 https://doi.org/10.6017/ital.v35i3.9255 https://doi.org/10.1108/lm-07-2013-0060 information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 33 guo and xu 25 gallaway and hines, “competitive usability,” 173–85. 26 fu and fitzgerald, “a comparative analysis,” 47–58. abstract introduction literature review methods data collection data analysis findings part i: library profile & background information respondents participating libraries geographic location library size library type previous integrated library system (ils) part ii: implementation process alma modules/functions selection process rfi and rfp decision-making important factors for system selection data migrated discovery service part iii: feedback on alma migration system migration evaluation valuable lessons learned prospective migration discussion the decision-making process and factors considered project evaluation impacts on library staffing and library operation lessons learned training communication and engagement implementation process data cleanup and preparation planning for future system migrations conclusions research limitations practical implications acknowledgements appendix: survey questionnaire endnotes automated fake news detection in the age of digital libraries article automated fake news detection in the age of digital libraries uğur mertoğlu and burkay genç information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12483 uğur mertoğlu (umertoglu@hacettepe.edu.tr) is a phd candidate, hacettepe university. burkay genç (bgenc@cs.hacettepe.edu.tr) is assistant professor, hacettepe university. © 2020. abstract the transformation of printed media into the digital environment and the extensive use of social media have changed the concept of media literacy and people’s habits of news consumption. while online news is faster, easier, comparatively cheaper, and offers convenience in terms of people's access to information, it speeds up the dissemination of fake news. 
due to the free production and consumption of large amounts of data, fact-checking systems powered by human efforts are not enough to question the credibility of the information provided, or to prevent its rapid dissemination like a virus. libraries, long known as sources of trusted information, are facing challenges caused by misinformation as mentioned in studies about fake news and libraries.1 considering that libraries are undergoing digitization processes all over the world and are providing digital media to their users, it is very likely that unverified digital content will be served by world’s libraries. the solution is to develop automated mechanisms that can check the credibility of digital content served in libraries without manual validation. for this purpose, we developed an automated fake news detection system based on turkish digital news content. our approach can be modified for any other language if there is labelled training material. this model can be integrated into libraries’ digital systems to label served news content as potentially fake whenever necessary, preventing uncontrolled falsehood dissemination via libraries. introduction collins dictionary which chose the term “fake news” as the “word of the year 2017,” describes news as the actual and objective presentation of a current event, information, or situation that is published in newspapers and broadcast on radio, television, or online.2 we are in an era where everything goes online, and news is not an exception. many people today prefer to read their daily news online, because it is a cost-effective and convenient way to remain up to date. although this convenience has lucrative benefits for society, it can also have harmful side effects. having access to news from multiple sources, anytime, anywhere has become an irresistible part of our daily routines. however, some of these sources may provide unverified content which can easily be delivered right to your mobile device. most importantly, potential fake news content delivered by these sources may mislead society and cause social disturbances such as triggering violence against ethnic minorities and refugees, causing unnecessary fear related to health issues, or even sometimes result in crisis, devastating riots and strikes. not having a steady definition compared to news, fake news is often defined according to the data used or the limited perspective of the study in the literature. for example; difranzo and gloriagarcia defined the fake news as “false news stories that are packaged and published as if they were genuine.”3 on the other hand, guess et al. see the term as “a new form of political misinformation” within the domain of politics, whereas mustafaraj is more direct and defines it as mailto:umertoglu@hacettepe.edu.tr mailto:bgenc@cs.hacettepe.edu.tr information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 2 “lies presented as news.”4 a comprehensive list of 12 definitions can be found in egelhofer and lecheler.5 in simplified terms, news which is created to deceive or mislead readers can be called fake news. however, the concept of fake news is a quite broad one that needs to be specified meticulously. fake news is created for many purposes and emerges in many different types. having an interwoven structure, most of these types are shown in figure 1. 
although, it is not easy to cluster these types into separate groups, they can be categorized according to the information quality or based on the intention as it is created to deceive deliberately or not, as rashkin et al. did.6 we propose the following classification where the two dimensions represent the potential impact and the speed of propagation. figure 1. the volatile distribution of the fake news types (clustered in four regions: sr, sr, sr, sr) with respect to two dimensions: speed of propagation and potential impact. the four regions visualized are clustered according to their dangerousness. first of all, it should be noted that to order types of fake news in a stable precision is quite a challenging task. the variations within the field highly depend on dynamic factors such as timespan, actors, and echochamber effect. hence, this figure should be considered as a clustering effort. there are possible intersecting areas of types within the regions. we will now give examples for two regions, “sr” and “sr.” for example, the sr grouping shows characteristics of high-risk levels and fast dissemination. this includes varieties of fake news such as propaganda, manipulation, misinformation, hate news, information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 3 provocative news, etc. we usually encounter this in the domain of politics. this kind of news may cause critical and nonrecoverable results in politics, the economy, etc., in a short period of time. the rise of the term fake news itself can also be attributed to this kind of news. on the other hand, the relatively less severe group (sr) of fake news, comprising of satire, hoax, click-bait, etc., has low-risk levels and a slow speed of dissemination. a frequently used type of this group, click-bait, is a sensational headline or link that urges the reader to click on a post, link, article, image, or video. these kinds of news have a repetitive style. it can be said that readers become aware of falsehood after experiencing a few times. so, risk level is lower, and dissemination is slower. vosoughi et al. stated the assumption that “falsehood diffuses significantly farther, faster, deeper, and more broadly than the truth.”7 so indeed, just one piece of fake news may affect many more people than thousands of true news items do because of the dramatic circulation of fake news. in their recent survey about fake news, zhou and zafarani highlighted that fake news is a major concern for many different research disciplines especially information technologies. 8 being a trusted source of information for a long time, libraries will play an important role in fighting against fake news problem. kattimani et al. claims that the modern librarian must be equipped with necessary digital skills and tools to handle both printed collections and newly emerging digital resources.9 similarly, we foresee that digital libraries, which can be defined as collections of digital content licensed and maintained by libraries, can be a part of the solution as an authority service with a collective effort. connaway et al. point to the key role of information professionals such as librarians, archivists, journalists, and information architects in helping society use the products and services related to news in a convenient way. 
10 as libraries all over the world are transitioning into digital content delivery services, they should implement mechanisms to avoid fake and misleading content being disseminated through them under the guidance of information professionals. to lay out proper future directions for the solution strategy, a clear understanding of interaction between library and information science (lis) community and fake news must be addressed. sullivan states that the lis community has been affected deeply in the aftermath of the 2016 us presidential elections.11 moreover, he quotes many other scientists, emphasizing libraries’ and librarians’ role in the fight against fake news. for example, finley et al. say that libraries are the direct antithesis of fake news, the american library association (ala) called fake news an anathema to the ethics of librarianship in 2017, rochlin emphasizes the role of librarians in this fight, and talks about the need to adopt fake news as a central concern in librarianship and many other researchers name librarians in the front lines of the fight against fake news.12 today, the struggle to detect fake news and prevent their spread is so popular that competitions are being organized (e.g., http://www.fakenewschallenge.org/) and conferences are being held (e.g., bobcatsss 2020). the struggle against fake news can be classified under three main venues: • reader awareness • fact-checking organizations and websites • automated detection systems the first item requires awareness of individuals against fake news and a collective conscience within the society against spreading fake news. to this end, visual and textual checklists, frameworks, and guidance lists are being published by official organizations, such as ifla’s13 http://www.fakenewschallenge.org/ information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 4 (international federation of library associations) infographic which contains eight steps to spot fake news. the radar framework and the currency, relevance, authority, accuracy, and purpose (craap) test are some of the efforts trying to increase reader-awareness of fake news.14 unfortunately, due to the nature of fake news and the clever way they are created triggering people’s hunger to spread sensational information, it is very difficult to achieve full control via this strategy. some studies explicitly showed that humans are prone to get confused when it comes to spotting lies or deciding whether a news item is fake or not.15 furthermore, people often overlook facts that conflict with their current belief, especially in politics and controversial social issues.16 the second strategy focuses on third-party manually driven systems for checking and labelling content as fake or valid. recently, we have seen many examples of offline and online organizations trying to work according to this strategy, such as a growing body of fact-checking organizations, start-ups (storyzy, factmata, etc.), and other projects with similar purposes.17 unfortunately, these manually powered systems cannot cope with the huge amounts of digital content being steadily produced. therefore, they focus only on a subset of digital content that they classify as having higher priority. even for this subset of content, their reaction speed is much slower than the fake information’s spread speed. therefore, automated and verified systems emerge as an inevitable last option. 
the third strategy offers automated fact-checking systems, which once trained, can deliver content labelling at unprecedented speeds. today, many researchers are researching automated solutions and building models with different methodologies.18 notwithstanding the latest studies, there is still a lot to do in the realm of automated fake news detection. automated fact-checking systems will be detailed in the rest of the paper. thanks to the internet, the collections of digital content served by digital libraries can be accessed by a great number of users without distance and time limits. therefore, we propose a solution to the problem by positioning digital libraries as automated fact-checking services, which label digital news content as fake or valid as soon as or before it is served through library systems. the main reason we associate this approach with digital libraries is their access to a wide variety of digital content which can be used to train the proposed mathematical models, as well as their role in the society as the publisher of trusted information. to this end, we develop a mathematical model that is trained using existing news content served by digital libraries, and capable of labelling news content as fake or valid with unprecedented accuracy. the proposed solution uses machine learning techniques with an optimized set of extracted features and annotated labels of existing digital news content. our study mainly contributes (a) a new set of features highly applicable for agglutinative languages, (b) the first hybrid model combining a lexicon/dictionarybased approach with machine learning methods to detect fake news, and (c) a benchmark dataset prepared in turkish for fake news detection. literature review contemporary studies have indicated that social, economic, and political events in recent years, especially after the 2016 us presidential elections, are increasingly associated with the concept of fake news.19 since then, fake news has begun to be used as a tool in many domains. on the other hand, researchers motivated by finding automated solutions started to make use of machine learning, deep learning, hybrid models, and other methodologies for their solutions. https://storyzy.com/ information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 5 although computational deception detection studies applying nlp (natural language processing) operations are not new, textual deception in the context of text-based news is a new topic for the field of journalism.20 accordingly, we believe that there is a hidden body language of news text, which has linguistic clues indicating whether the news is fake or not. thus, lexical, syntactic, semantic, and rhetorical analysis when used with machine learning and deep learning techniques offers encouraging directions. the textual deception spread over a wide spectrum and the studies have utilized many different techniques. there are some prominent studies which took the problem as a binary classification problem utilizing linguistic clues.21 although it is still early to say the linguistic characteristics of fake news are fully understood, research into fake-news detection in english-language texts is relatively advanced compared to that in other languages. in contrast, agglutinative languages such as turkish have been little researched when it comes to fake news detection. 
agglutinative languages enable the construction of words by adding various morphemes, which means that words that are not practically in use may exist theoretically. for example, “gerek-siz-leş-tir-ebilecek-leri-miz-den-dir,” is a theoretically possible word that means “it is one of the things that we will be able to make redundant,” but it is not a practical one. shu et al. classified the models for the detection of fake news in their study.22 according to this study, the automated approaches can focus on four types of attributes to detect fake news: knowledge based, style based, stance based, or propagation based. among these, it can be said that the most useful approaches are the ones which focus on the textual news content. th e textual content can be studied by an automated process to extract features that can be very helpful in classifying content as fake or valid. many scholars have tried to build models for automatic detection and prediction of fake news using machine learning algorithms, deep learning algorithms, and other techniques. these scholars approach the detection of fake news from many different perspectives and domains. for example, in one of the studies, scientific news and conspiracy news were used.23 in shu et al.’s study based on credibility of news, the headlines were used to determine whether the article was clickbait or not. in another study, reis et al. worked on buzzfeed articles linked to the 2016 us election using machine learning techniques with a supervised learning approach.24 studies which try to detect satire and sarcasm can be attributed to subcategories of fake news detection.25 our observation, in line with the general view, is that satire is not always recognizable and can be misunderstood for real news.26 for this reason, we included satirical news in our dataset. it should be noted that although satire or sarcasm can be classified by automated detection systems, experts should still evaluate the results of the classification. while some scholars used specific models focusing on unique characteristics, some others such as ruchansky et al. proposed hybrid deep models for fake news detection making use of multiple kinds of features such as temporal engagement between users and news articles over time and generated a labelling methodology based on those features.27 in related studies, many features such as automatic extracted features, hand-crafted features, social features, network information, visual features, and some others such as psycholinguistic features, are applied by researchers.28 in this work, we focused on news content features, however the social context features can also be adapted using different tiers such as user activity patterns, information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 6 analysis of user interaction, profile metadata, social network/graph analysis etc. to extract features. we also have some of these features in our data but not having ground truth quantitatively, we avoided using these features. methodology in this section, we present our motivation for this work which we visualized in a framework and named global library and information science (glis_1.0). subsequently, we discuss the construction of the automated detection system as the key element of the glis_1.0 framework. we explain the framework, model, dataset, features, and the techniques used in this section. 
framework the main structure of the proposed framework is shown in figure 2. this framework consists of highly cohesive but flexible layers. figure 2. the glis_1.0 framework main structure. in the presentation layer one can find the different sources of news that are publicly available. these sources can be accessed directly using their websites or can be searched for via search engines. the news is received by fact-checking organizations which classify them manually, digital libraries which archives and serves them, and automated detection systems (ads) which classify them automatically. digital libraries work together with fact-checking organizations and adss to present clean and valid news to the public. moreover, search engines use digital libraries systems to label their results as fake or valid. information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 7 fact-checking organizations should also benefit from the output of adss, as instead of manually checking heaps of news content, they could now focus on news labeled as potentially fake by an ads. through glis, adss make the life of fact-checking organizations and digital libraries much easier, all the while increasing the quality of news served to the public. considering this is a high-level overview of a structure given in figure 2, there may be many other components, mechanisms, or layers, but the key elements of this structure are automated detection systems and the digital libraries. a critical approach to this framework can be why we need such an authority mechanism. the answer will be quite simple, technological progress is not the only solution. on the contrary, tech giants have already been subject to regulatory scrutiny for how they handle personal information.29 also, their policy related to political ads has been questioned. furthermore, they are often blamed for failing to fight fake news. indeed, there is an urgent need for a global action more than ever. digital libraries are much more than a technological advancement. hence, they should be considered as institutions or services which can be a great authority service to provide news to society since the printed media disappears day by day. the threats caused by fake news are real and dangerous, but only recently have researchers from different disciplines been trying to find possible solutions such as educational, technological, regulatory, or political. digital librarianship can be the intersection of all these solutions for promoting information/media literacy. hence, digital librarianship will make use of many automated detection systems (ads) to serve qualified news. in the following section, we discuss ads in detail. model an overview of our model of automated detection system solution which is very critical for the framework is shown in figure 3. our fake news detection model consists of two phases. first is the language model/lexicon generation and the second is machine learning integration. in this work, we used machine learning algorithms via supervised learning techniques which learn from labeled news data (training) and helps us to predict outcomes for unforeseen news data (test). dataset we collected our data from three sources: • the primary source is the gdelt (global database of events, language and tone) project (https://www.gdeltproject.org/), a massive global news media archive offering free access to news text metadata for researchers worldwide. 
it can almost be considered a digital library of news in its own right. however, gdelt does not provide the actual news text and only serves processed metadata along with the url of the news item. gdelt normally does not check the validity of any news item. however, we only used news from approved news agencies and completely ignored news from local and lesser-known sources to maximize the validity of the news we automatically obtained through gdelt. moreover, we post-processed the obtained texts by cross-validating them against teyit.org data to remove any potential fake news obtained through gdelt links. • the second source is teyit.org, a fact-checking organization based in turkey that is compliant with the principles of the ifcn (international fact-checking network) and aims to prevent the spread of false information through online channels. manually analyzing each news item, they tag it as fake, true, or uncertain. we used their results to automatically download and label each news text. • lastly, our team collected manually curated and verified fake and valid news obtained from various online sources and named it mvn (manually verified news). this set includes fake and valid news that we manually accumulated over time during our studies and that did not overlap with the news obtained from the gdelt and teyit.org sources.
figure 3. integrated fake news detection model with main phases combining the language-model-based approach with the machine learning approach.
we named our dataset trfn. in phase 2, the data is very similar to the data we used in phase 1; however, to assess the effectiveness of the model, we excluded old news from before 2017 and added new items from 2019. the news items in our dataset span the time frame 2017–2019 and are uniformly distributed. table 1 outlines the dataset statistics, namely where the news text comes from, its class (fake or valid), the number of distinct texts, and the corresponding data collection method. it can be seen from the table that most of our valid news comes from the gdelt source, whereas teyit.org, a fact-checking organization, contributes only fake news.
table 1. trfn dataset summary after cleaning and duplicate removal.
dataset      class      size of processed data    collection method
gdelt        non-fake   82708                     automated
teyit.org    fake       1026                      automated
mvn          non-fake   1049                      manual
mvn          fake       400                       manual
all news items were processed through zemberek (http://code.google.com/p/zemberek), the turkish nlp engine, to extract different morphological properties of the words within the texts. after this processing phase, all obtained features were converted into tabular format and made available for future studies. this dataset is now available for scholarly studies upon request. in a study of this nature, the verifiability of the data used is important. as we have already mentioned, most of the data we used comes from verified sources, namely mainstream news agencies accessed through gdelt and the teyit.org archives, which are verified by teyit.org staff. all data used in training the mathematical models, which are explained in the rest of the paper, is either directly or indirectly verified.
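to make the construction of trfn concrete, the following minimal sketch shows how the three labeled sources described above could be stacked into a single table and de-duplicated; the in-memory records and column names are illustrative assumptions rather than the authors' actual pipeline.

```python
# A minimal sketch (not the authors' actual pipeline) of how the three labeled
# sources described above could be stacked into one TRFN-style table.
# The tiny in-memory records and column names are illustrative assumptions.
import pandas as pd

gdelt = pd.DataFrame({"text": ["valid news text 1", "valid news text 2"]})
teyit = pd.DataFrame({"text": ["fake news text 1"]})
mvn = pd.DataFrame({"text": ["manually verified text 1"], "label": ["non-fake"]})

gdelt["label"], gdelt["source"] = "non-fake", "gdelt"   # mainstream agencies via GDELT
teyit["label"], teyit["source"] = "fake", "teyit.org"   # items debunked by fact-checkers
mvn["source"] = "mvn"                                   # manually curated and labeled set

# Stack the sources, then drop duplicate and cross-source overlapping texts.
trfn = pd.concat([gdelt, teyit, mvn], ignore_index=True)
trfn = trfn.drop_duplicates(subset="text").reset_index(drop=True)

print(trfn.groupby(["source", "label"]).size())  # cf. the counts reported in table 1
```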
another important issue was the generalizability of the dataset, which determines whether the results of the study are applicable only to specific domains or to all available domains. although focusing on a specific news domain would clearly improve our accuracies, we preferred to work in the general domain and included news from all specific domains. the distribution of domains in our dataset is visualized in figure 4. this distribution closely matches the distribution one would experience reading daily news in turkey. hence, we have no domain-specific bias in our training dataset.
figure 4. the distribution of domains in the dataset. (scitechenvwetnatlife = science, technology, environment, weather, nature, life. educulturearttourism = education, culture, art, tourism.)
moreover, during the exploratory data analysis we found strongly correlated evidence of syntactic similarity with other nlp studies in turkish. for example, the results of a study by the zemberek developers (http://zembereknlp.blogspot.com/2006/11/kelime-istatistikleri.html) on the most common words in turkish, which experimented with over five million words, are compatible with the most common words in our corpus. this evidence can be attributed to the representativeness of our dataset. the last issue worth discussing is the imbalanced nature of the dataset. an imbalanced dataset occurs in a binary classification study when the frequency of one class dominates the frequency of the other class. in our dataset, the amount of fake news is far surpassed by the amount of valid news. this generally causes difficulties in applying conventional machine learning methods. however, it is a frequently observed phenomenon, because such class disparities are common in real-world problems. to avoid potential problems due to the imbalanced nature of the dataset, we used smote (synthetic minority over-sampling technique), an over-sampling method.30 it creates synthetic samples of the minority class that are relatively close in the feature space to the existing observations of the minority class, as sketched below.
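as an illustration of this step, the following minimal sketch applies smote from the imbalanced-learn package to a toy imbalanced dataset; the synthetic data and parameters are assumptions for demonstration and not the exact configuration used in this study.

```python
# A minimal sketch of over-sampling the minority (fake) class with SMOTE from
# the imbalanced-learn package; the toy data below stands in for a numeric
# feature matrix, and the parameters are illustrative assumptions.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced data: roughly 2% minority class, mimicking the fake/valid disparity.
X, y = make_classification(n_samples=5000, n_features=6, weights=[0.98, 0.02], random_state=0)
print("before:", Counter(y))

# SMOTE synthesizes new minority samples close to existing minority observations
# in feature space; it should be applied to the training split only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_res))
```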
features

in this study, we discarded some features because of their relatively low impact on overall performance during the exploratory data analysis and, subsequently, in the training phase. the most effective features we decided on are shown in table 2.

table 2. main features.

features          group                     definition
nrootscore        language model features   the news score calculated according to the root model
nrawscore         language model features   the news score calculated according to the raw model
spellerrorscore   extracted features        spell errors per sentence
complexityscore   extracted features        the complexity/readability score of the news
source            labels                    the url or identifier of the news
maincategory      labels                    the category of the news
newssite          labels                    the unique address of the news

the language model features nrootscore and nrawscore are borrowed from our earlier study on fake news detection.31 in that study, we focused on constructing a fake news dictionary/lexicon based on different morphological segments of the words used in news texts. these two scores were found to be the most successful in determining the fakeness or validity of a news text, one considering the raw form of the words and the other the root form.

the extracted features are complexityscore and spellerrorscore. complexityscore represents the readability of the text. studies on good readability metrics exist for the turkish language.32 we used a modified version of the gunning fog metric, which is based on word length and sentence length.33 since turkish is an agglutinative language, we used word length instead of syllable count. we also made some modifications to normalize the scores. the average number of syllables per word in turkish is 2.6, so we defined a long word as one with more than nine letters.34 for a given news text t, the complexity score (cs) is computed by equation 1.

(1)  $T_{CS} = \dfrac{\frac{Word_{count}}{Sentences_{count}} + \frac{LongWord_{count} \times 100}{Word_{count}}}{10}$

the second extracted feature is spellerrorscore. we expected fake news to contain considerably more spelling errors than valid news. we calculated spell error counts using the turkish spell-checker class of zemberek. because news texts vary in length, we normalize the count by the number of sentences. for a given news text t, the spell error score (se) is calculated as shown in equation 2.

(2)  $T_{SE} = \dfrac{SpellErrorCount}{SentencesCount}$
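to make these two formulas concrete, a minimal python rendering follows. it assumes simple punctuation-based sentence splitting and whitespace tokenization in place of zemberek's turkish tooling, and it leaves spell checking to a caller-supplied function, so the helper names and tokenization rules here are illustrative only.

```python
# illustrative implementation of equations 1 and 2 (complexity score and spell error score).
import re

def complexity_score(text: str, long_word_len: int = 10) -> float:
    """equation 1: ((words/sentences) + (long_words*100/words)) / 10.
    a 'long word' has more than nine letters, per the paper."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        return 0.0
    long_words = [w for w in words if len(w) >= long_word_len]
    return (len(words) / len(sentences) + len(long_words) * 100 / len(words)) / 10

def spell_error_score(text: str, count_errors) -> float:
    """equation 2: spelling errors per sentence. `count_errors` is a caller-supplied
    function (e.g., wrapping zemberek's spell checker) returning the error count."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return count_errors(text) / len(sentences) if sentences else 0.0

# toy usage with a dummy spell checker that reports no errors
print(complexity_score("bu bir deneme metnidir. cümleler kısadır."))
print(spell_error_score("bu bir deneme metnidir. cümleler kısadır.", lambda t: 0))
```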
finally, we included the metadata categories source, maincategory, and newssite as additional identifiers for the learning process. we then combined the features obtained from text representation techniques with the features shown in table 2 and trained the model with different classifiers. for text representation, we followed two directions in the experiments. first, we converted text into structured features with the bag-of-words (bow) approach, in which text data is represented as the multiset of its words. second, we experimented with n-grams, which represent sequences of n words; in other words, the text is split into chunks of n words.

in the bow model, documents in trfn are represented as a collection of words, ignoring grammar and even word order but preserving multiplicity. in a classic bow approach, each document is represented as a fixed-length vector whose length equals the vocabulary size, so each dimension of the vector corresponds to the occurrence of a word in a news item. we customized the generic approach by reducing variable-length documents to fixed-length vectors so that they can be used with many machine learning models.

figure 5. an overview of the bow (bag-of-words) approach.

because word order is ignored, each document is reduced to a fixed-length histogram of counts, as seen in figure 5. assuming n is the number of news documents and w is the number of possible words in the corpus, the resulting n × w count matrix is large but sparse: we have many news documents, yet most words do not occur in any given document, so many terms are rare, which is a drawback of the approach. we therefore compensated for this rarity problem by weighting terms with the tf-idf measure, which evaluates how important a word is to a document within a collection.

the other technique we used is the n-gram model, the generic term in computational linguistics for a contiguous sequence of words; it is used extensively in text mining and nlp tasks. the prefix that replaces the n indicates the number of consecutive words in the sequence: a unigram refers to one word, a bigram to two words, and an n-gram to n words.

experimental results and discussion

in this section, the experimental process and the results are presented. all experiments were performed using the scikit-learn library. to evaluate the performance of the model and the proposed features, we employed the precision, recall, f1 score (the harmonic mean of precision and recall), and accuracy metrics. we ran many experiments using different combinations of features.

several classification models were trained: k-nearest neighbor, decision trees, gaussian naive bayes, random forest, support vector machine, extratrees classifier, and logistic regression. to be effective, a classifier should correctly classify previously unseen data; to this end, we tuned the parameter values for all the classification models used. the models were then trained and evaluated on the trfn dataset using 10-fold cross-validation.
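a minimal sketch of this setup with scikit-learn is shown below; the documents, labels, and parameter values are placeholders rather than the trfn data or the authors' tuned settings, and the combination of the handcrafted features of table 2 with the text features (e.g., via a column transformer) is omitted for brevity.

```python
# illustrative sketch: unigram+bigram tf-idf text features, an extra-trees classifier,
# and 10-fold cross-validation reporting precision, recall, f1, and accuracy.
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# placeholder documents and labels (0 = valid, 1 = fake), not the trfn corpus
texts = [f"örnek geçerli haber metni {i}" for i in range(50)] + \
        [f"örnek sahte haber metni {i}" for i in range(50)]
labels = [0] * 50 + [1] * 50

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),                      # unigram + bigram, tf-idf weighted
    ExtraTreesClassifier(n_estimators=200, random_state=42),  # extremely randomized trees
)

scores = cross_validate(
    pipeline, texts, labels, cv=10,
    scoring=["precision", "recall", "f1", "accuracy"],
)
print({k: round(v.mean(), 4) for k, v in scores.items() if k.startswith("test_")})
```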
in table 3, we present the best scores of the proposed model. the results are highly encouraging and exemplify how useful automated detection systems can be as a key component of the integrated solution framework in figure 2. we compared the algorithms on three final feature sets, which gave the most consistent results among the feature set combinations we tried. set1 stands for bigram + fopt (optimized features), set2 for bowmodified + fopt, and set3 for unigram + bigram + fopt. the results show relative consistency in performance across the models. in almost all models, the combination of unigram + bigram with the optimized feature set (fopt) gives better results than the other combinations. the extratrees classifier was chosen as the best model due to its higher performance. this model, also known as the extremely randomized trees classifier, is an ensemble learning technique that aggregates the results of multiple decision trees collected in a “forest” to produce its classification. it is very similar to the random forest classifier and differs only in how the decision trees are constructed, which is why the results of these two classifiers are close.

table 3. results. evaluation results of all combinations of features and classification models (precision and recall are given per class, in the order class 0, class 1).

model                    feature set   precision % (0, 1)   recall % (0, 1)   accuracy   f1 score
gaussian naive bayes     set1          93.32 / 93.96        93.92 / 93.36     93.64      93.62
                         set2          93.37 / 94.02        93.98 / 93.42     93.70      93.68
                         set3          93.95 / 94.21        94.19 / 93.97     94.08      94.07
k-nearest neighbour      set1          93.70 / 93.50        93.52 / 93.69     93.60      93.61
                         set2          93.66 / 94.05        94.03 / 93.68     93.85      93.84
                         set3          94.42 / 94.21        94.22 / 94.41     94.31      94.32
extratrees classifier    set1          94.15 / 94.92        94.88 / 94.19     94.53      94.51
                         set2          94.09 / 94.94        94.90 / 94.14     94.51      94.49
                         set3          97.90 / 95.72        95.81 / 97.86     96.81      96.85
support vector machine   set1          89.61 / 88.92        88.99 / 89.54     89.26      89.30
                         set2          89.70 / 88.96        89.04 / 89.62     89.33      89.37
                         set3          90.85 / 91.26        91.22 / 90.89     91.05      91.03
logistic regression      set1          91.56 / 92.28        92.23 / 91.62     91.92      91.89
                         set2          91.50 / 92.28        92.22 / 91.56     91.89      91.86
                         set3          92.25 / 92.90        92.86 / 92.30     92.57      92.55
random forest            set1          93.71 / 94.44        94.40 / 93.75     94.07      94.05
                         set2          93.87 / 95.00        94.94 / 93.94     94.44      94.41
                         set3          94.77 / 95.14        95.12 / 94.79     94.96      94.95
decision trees           set1          93.95 / 94.59        94.56 / 93.99     94.27      94.25
                         set2          94.05 / 95.08        95.03 / 94.11     94.57      94.54
                         set3          94.94 / 95.24        95.23 / 94.95     95.09      95.08

every ads in the glis_1.0 framework may use its own approach to detect fake news, and open-source adss may improve through feedback. hybrid models and other techniques, such as neural networks with deep learning, can also be used depending on the data, the language of the news, and news features related to both social context and news content.

conclusion and future work

in this study we presented a novel framework that offers a practical architecture for an integrated fake news identification system. we have tried to illustrate how digital libraries can act as a service authority to promote media literacy and fight fake news. because librarians are trained to critically analyze information sources, their contributions to our proposed model are critical. accordingly, we see this work as encouragement for further collaborative studies between the lis and cs (computer science) communities. we think there is an immediate need for lis professionals to participate in and contribute to automated solutions that can help detect inaccurate and unverified information. in the same manner, we believe the collaboration of lis professionals, computer scientists, fact-checking organizations, and pioneering technology platforms is the key to providing qualified news within a real-time framework and promoting information literacy. moreover, we place the reader at the core of the framework, in the feed-reader position, while consuming news. in terms of automated detection systems, we proposed a fake news detection model integrating a dictionary-based approach with machine learning techniques and offering optimized feature sets applicable to agglutinative languages. we comparatively analyzed the findings with several classification models and demonstrated that machine learning algorithms, when used together with dictionary-based findings, yield high scores for both precision and recall. consequently, we believe that, once operational in the field, the proposed workflow can be extended to support other news elements such as photographs and videos. with the help of social network analysis (sna), it may be possible to stop or slow the spread of fake news as it emerges.
during all the experiments we did, this work also highlighted several tasks as future research directions such as: • the studies can be deepened to mathematically categorize the fake news types and the dissemination characteristics of each type can be analyzed. • the workflow has the potential to provide an automated verification platform for all news content existing in digital libraries to promote media literacy. endnotes 1 m. connor sullivan, “why librarians can’t fight fake news,” journal of librarianship and information science 51, no. 4 (december 2019): 1146–56, https://doi.org/10.1177/0961000618764258. 2 “definition of 'news',” available at: https://www.collinsdictionary.com/dictionary/english/news 3 dominic difranzo and kristine gloria-garcia, “filter bubbles and fake news,” xrds: crossroads, the acm magazine for students 23, no. 3 (april 2017): 32–35, https://doi.org/10.1145/3055153. https://doi.org/10.1177/0961000618764258 https://www.collinsdictionary.com/dictionary/english/news https://doi.org/10.1145/3055153 information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 16 4 andrew guess, brendan nyhan, and jason reifler, “selective exposure to misinformation: evidence from the consumption of fake news during the 2016 us presidential campaign,” european research council 9, no. 3 (2018): 4; eni mustafaraj and p. takis metaxas, “the fake news spreading plague: was it preventable?” proceedings of the 2017 acm on web science conference, (june 2017): 235–39, https://doi.org/10.1145/3091478.3091523. 5 jana laura egelhofer and sophie lecheler, “fake news as a two-dimensional phenomenon: a framework and research agenda,” annals of the international communication association 43, no. 2 (2019): 97–116, https://doi.org/10.1080/23808985.2019.1602782. 6 hannah rashkin et al., “truth of varying shades: analyzing language in fake news and political fact-checking,” proceedings of the 2017 conference on empirical methods in natural language processing, (2017): 2931–37. 7 soroush vosoughi, deb roy, and sinan aral, “the spread of true and false news online,” science 359, no. 6380 (2018): 1146–51, https://doi.org/10.1126/science.aap9559. 8 xinyi zhou and reza zafarani, “a survey of fake news: fundamental theories, detection methods, and opportunities,” acm computing surveys (csur) 53, no. 5 (2020): 1–40, https://doi.org/10.1145/3395046. 9 s. f. kattimani, praveenkumar kumbargoudar, and d. s. gobbur, “training of the library professionals in digital era: key issues” (2006), https://ir.inflibnet.ac.in:8443/ir/handle/1944/1234. 10 lynn silipigni connaway et al., “digital literacy in the era of fake news: key roles for information professionals,” proceedings of the association for information science and technology 54, no. 1 (2017): 554–55, https://doi.org/10.1002/pra2.2017.14505401070. 11 matthew c. sullivan, “libraries and fake news: what’s the problem? what’s the plan?,” communications in information literacy 13, no. 1 (2019): 91–113, https://doi.org/10.15760/comminfolit.2019.13.1.7. 12 wayne finley, beth mcgowan, and joanna kluever, “fake news: an opportunity for real librarianship,” ila reporter 35, no. 3 (2017): 8–12; american library association, “resolution on access to accurate information,” 2018; nick rochlin, “fake news: belief in post-truth,” library hi tech 35, no. 
3 (2017): 386–92, https://doi.org/10.1108/lht-03-2017-0062; linda jacobson, “the smell test: in the era of fake news, librarians are our best hope,” school library journal 63, no. 1 (2017): 24–29; angeleen neely–sardon, and mia tignor, “focus on the facts: a news and information literacy instructional program,” the reference librarian 59, no. 3 (2018): 108–21, https://doi.org /10.1080/02763877.2018.1468849; claire wardle and hossein derakhshan, “information disorder: toward an interdisciplinary framework for research and policy making,” council of europe report 27 (2017). 13 ifla, “how to spot fake news,” 2017. https://doi.org/10.1145/3091478.3091523 https://doi.org/10.1080/23808985.2019.1602782 https://doi.org/10.1145/3395046 https://doi.org/10.1002/pra2.2017.14505401070 https://doi.org/10.15760/comminfolit.2019.13.1.7 https://www.emerald.com/insight/publication/issn/0737-8831 https://doi.org/10.1108/lht-03-2017-0062 information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 17 14 jane mandalios, “radar: an approach for helping students evaluate internet sources,” journal of information science 39, no. 4 (2013): 470–78, https://doi.org/10.1177/0165551513478889; sarah blakeslee, “the craap test,” loex quarterly 3, no. 3 (2004):4. 15 victoria l. rubin and niall conroy, “discerning truth from deception: human judgments and automation efforts,” first monday 17, no. 5 (2012), https://doi.org/10.5210/fm.v17i3.3933; verónica pérez-rosas et al., “automatic detection of fake news,” arxiv preprint arxiv:1708.07104 (2017). 16 justin p. friesen, troy h. campbell, and aaron c. kay, “the psychological advantage of unfalsifiability: the appeal of untestable religious and political ideologies,” journal of personality and social psychology 108, no. 3 (2015): 515–29, https://doi.org/10.1037/pspp0000018. 17 tanja pavleska et al., “performance analysis of fact-checking organizations and initiatives in europe: a critical overview of online platforms fighting fake news,” social media and convergence 29 (2018). 18 yasmine lahlou, sanaa el fkihi, and rdouan faizi, “automatic detection of fake news on online platforms: a survey,” (paper, 2019 1st international conference on smart systems and data science (icssd), rabat, morocco, 2019), https://doi.org/10.1109/icssd47982.2019.9002823; christian janze, and marten risius, “automatic detection of fake news on social media platforms,” (paper, pasific asia conference on information systems (pacis), 2017); torstein granskogen, “automatic detection of fake news in social media using contextual information” (master’s thesis, norwegian university of science and technology (ntnu), 2018). 19 jacob l. nelson and harsh taneja, “the small, disloyal fake news audience: the role of audience availability in fake news consumption,” new media & society 20, no. 10 (2018): 3720–37, https://doi.org/10.1177/1461444818758715; philip n. howard et al., “social media, news and political information during the us election: was polarizing content concentrated in swing states?,” arxiv preprint arxiv:1802.03573 (2018); alexandre bovet and hernán a. makse, “influence of fake news in twitter during the 2016 us presidential election,” nature communications 10, no. 7 (2019): 1–14, https://doi.org/10.1038/s41467-018-07761-2. 20 lina zhou et al., “automating linguistics-based cues for detecting deception in text-based asynchronous computer-mediated communications,” group decision and negotiation 13, no. 
1 (2004): 81–106, https://doi.org/10.1023/b:grup.0000011944.62889.6f; myle ott et al., “finding deceptive opinion spam by any stretch of the imagination,” arxiv preprint arxiv:1107.4557 (2011); rada mihalcea and carlo strapparava, “the lie detector: explorations in the automatic recognition of deceptive language,” (paper, proceedings of the acl-ijcnlp 2009 conference short papers, (2009): association for computational linguistics, 309–12); julia b. hirschberg et al., “distinguishing deceptive from non-deceptive speech,” (2005), https://doi.org/10.7916/d8697c06. 21 victoria l. rubin, yimin chen, and nadia k. conroy, “deception detection for news: three types of fakes,” proceedings of the association for information science and technology 52, no. 1 (2015): 1–4, https://doi.org/10.1002/pra2.2015.145052010083; david m. markowitz, and jeffrey t. hancock, “linguistic traces of a scientific fraud: the case of diederik stapel,” plos https://doi.org/10.1177/0165551513478889 https://doi.org/10.5210/fm.v17i3.3933 https://psycnet.apa.org/doi/10.1037/pspp0000018 https://doi.org/10.1109/icssd47982.2019.9002823 https://doi.org/10.1177%2f1461444818758715 https://doi.org/10.1038/s41467-018-07761-2 https://doi.org/10.1023/b:grup.0000011944.62889.6f https://doi.org/10.7916/d8697c06 https://doi.org/10.1002/pra2.2015.145052010083 information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 18 one 9, no. 8 (2014): e105937, https://doi.org/10.1371/journal.pone.0105937; jing ma et al., “detecting rumors from microblogs with recurrent neural networks,” (paper, proceedings of the 25th international joint conference on artificial intelligence (ijcai 2016), (2016): 3818–24), https://ink.library.smu.edu.sg/sis_research/4630. 22 kai shu et al., “fake news detection on social media: a data mining perspective,” acm sigkdd explorations newsletter 19, no. 1 (2017): 22–36, https://doi.org/10.1145/3137597.3137600. 23 eugenio tacchini et al., “some like it hoax: automated fake news detection in social networks,” arxiv preprint arxiv:1704.07506 (2017). 24 julio c.s. reis et al., “supervised learning for fake news detection,” ieee intelligent systems 34, no. 2 (2019): 76–81, https://doi.org10.1109/mis.2019.2899143. 25 victoria l. rubin et al., “fake news or truth? using satirical cues to detect potentially misleading news,” (paper, proceedings of the second workshop on computational approaches to deception detection, (2016): 7–17); francesco barbieri, francesco ronzano, and horacio saggion, “is this tweet satirical? a computational approach for satire detection in spanish,” procesamiento del lenguaje natural, no. 55 (2015): 135-42; soujanya poria et al., “a deeper look into sarcastic tweets using deep convolutional neural networks,” arxiv preprint arxiv:1610.08815 (2016). 26 lei guo and chris vargo, “’fake news’ and emerging online media ecosystem: an integrated intermedia agenda-setting analysis of the 2016 us presidential election,” communication research 47, no. 2 (2020): 178–200, https://doi.org/10.1177/0093650218777177. 27 natali ruchansky, sungyong seo, and yan liu, “csi: a hybrid deep model for fake news detection,” proceedings of the 2017 acm on conference on information and knowledge management, (november 2017): 797–806, https://doi.org/10.1145/3132847.3132877. 
28 yaqing wang et al., “eann: event adversarial neural networks for multi-modal fake news detection,” proceedings of the 24th acm sigkdd international conference on knowledge discovery & data mining, (2018): 849–57, https://doi.org/10.1145/3219819.3219903; james w. pennebaker, martha e. francis, and roger j. booth, “linguistic inquiry and word count: liwc 2001”, mahway: lawrence erlbaum associates 71, no. 2001 (2001). 29 “facebook, twitter may face more scrutiny in 2019 to check fake news, hate speech,” accessed may 17, 2020, available: https://www.huffingtonpost.in/entry/facebook-twitter-may-facemore-scrutiny-in-2019-to-check-fake-news-hate-speech_in_5c29c589e4b05c88b701d72e. 30 nitesh v. chawla et al., “smote: synthetic minority over-sampling technique,” journal of artificial intelligence research 16, (2002): 321–57, https://doi.org/10.1613/jair.953. 31 uğur mertoğlu and burkay genç, “lexicon generation for detecting fake news,” arxiv preprint arxiv:2010.11089 (2020). 32 burak bezirci, and asım egemen yilmaz, “metinlerin okunabilirliğinin ölçülmesi üzerine bir yazilim kütüphanesi ve türkçe için yeni bir okunabilirlik ölçütü,” dokuz eylül üniversitesi https://doi.org/10.1371/journal.pone.0105937 https://ink.library.smu.edu.sg/sis_research/4630 https://doi.org/10.1145/3137597.3137600 https://doi.org10.1109/mis.2019.2899143 https://doi.org/10.1177%2f0093650218777177 https://doi.org/10.1145/3132847.3132877 https://doi.org/10.1145/3219819.3219903 https://www.huffingtonpost.in/entry/facebook-twitter-may-face-more-scrutiny-in-2019-to-check-fake-news-hate-speech_in_5c29c589e4b05c88b701d72e https://www.huffingtonpost.in/entry/facebook-twitter-may-face-more-scrutiny-in-2019-to-check-fake-news-hate-speech_in_5c29c589e4b05c88b701d72e https://doi.org/10.1613/jair.953 information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 19 mühendislik fakültesi fen ve mühendislik dergisi 12, no. 3 (2010): 49–62, https://dergipark.org.tr/en/pub/deumffmd/issue/40831/492667. 33 robert gunning, “the technique of clear writing,” revised edition, new york: mcgraw hill, 1968. 34 ender ateşman, “türkçede okunabilirliğin ölçülmesi,” dil dergisi 58, no. 71–74 (1997). https://dergipark.org.tr/en/pub/deumffmd/issue/40831/492667 abstract introduction literature review methodology framework model dataset features experimental results and discussion conclusion and future work endnotes library-authored web content and the need for content strategy articles library-authored web content and the need for content strategy courtney mcdonald and heidi burkhardt information technology and libraries | september 2019 8 courtney mcdonald (crmcdonald@colorado.edu) is learner experience & engagement librarian and associate professor, university of colorado at boulder. heidi burkhardt (heidisb@umich.edu) is web project manager & content strategist, university of michigan. abstract increasingly sophisticated content management systems (cms) allow librarians to publish content via the web and within the private domain of institutional learning management systems. “libraries as publishers” may bring to mind roles in scholarly communication and open scholarship, but the authors argue that libraries’ self-publishing dates to the first “pathfinder” handout and continues today via commonly used, feature-rich applications such as wordpress, drupal, libguides, and canvas. although this technology can reduce costly development overhead, it also poses significant challenges. 
these tools can inadvertently be used to create more noise than signal, potentially alienating the very audiences we hope to reach. no cms can, by itself, address the fact that authoring, editing, and publishing quality content is both a situated expertise and a significant, ongoing demand on staff time. this article will review library use of cms applications, outline challenges inherent in their use, and discuss the advantages of embracing content strategy. introduction we tend to look at content management as a digital concept, but it’s been around for as long as content. for as long as humans have been creating content, we’ve been searching for solutions to manage it. the library of alexandria (300 bc to about ad 273) was an early attempt at managing content. it preserved content in the form of papyrus scrolls and codices, and presumably controlled access to them. librarians were the first content managers.1 (emphasis added) content is, and has always been, central to the mission of libraries. content is physical, digital, acquired, purchased, leased, subscribed, and created. “libraries as publishers” may bring to mind roles in scholarly communication and open scholarship, but the authors argue that libraries’ selfpublishing dates to the first mimeographed ‘pathfinder’ handout and continues today via commonly used, feature-rich web content management systems (cmss). libraries use these cmss to support research, teaching, and learning in a variety of day-to-day operations. the sophisticated and complex infrastructure surrounding web-based library content has evolved from the singular, independently hosted and managed “library website” into a “library web ecosystem” comprised of multiple platforms, including integrated library systems, institutional repositories, cmss, and others. multiple cms applications, whether open-source (e.g., wordpress, drupal), institutionally supported (e.g., canvas, blackboard) or library-specific (e.g., springshare’s libguides), are employed by most libraries to power the library’s website and research guides, as well as to make their collections, in any and all formats, discoverable and accessible. mailto:crmcdonald@colorado.edu mailto:heidisb@umich.edu) library-authored web content and the need for content strategy | mcdonald and burkhardt 9 https://doi.org/10.6017/ital.v38i3.11015 library staff at all levels create and publish content through these cms platforms, an activity that is critical to our users discovering what we offer and accomplishing their goals. the cms removes technical bottlenecks and enables subject matter experts to publish content without coding expertise or direct access to a server. this disintermediation has many benefits, enabling librarians to share and interact directly with their communities, and reducing costly development overhead. as with any powerful technology that’s simple to use, effectively implementing a cms is not without pitfalls. through these tools, we can inadvertently create more noise than signal, potentially alienating the very audiences we hope to reach. further, effective management of content and workflows across and among so many platforms is not trivial. distributing web content creation among many authors can quickly lead to numerous challenges requiring expert attention. governance strategies for library-authored web content are rarely addressed in the library literature. 
this article will review library use of cms applications, outline challenges inherent in their use, and discuss the advantages of embracing content strategy as a framework for library-authored web content governance. content management systems: a definition any conversation on this topic is complicated by the fact that there is both misunderstanding and disagreement regarding the definition of a content management system. in their survey of 149 libraries covering day-to-day website management, including staffing, infrastructure, and organizational structures, bundza et al. observed “[w]hen reviewing the diverse systems mentioned, it is obvious that people defined cmss very broadly.”2 connell surveyed over 600 libraries regarding their use of cmss, defined as “website management tools through which the appearance and formatting is managed separately from content, so that authors can easily add content regardless of web authoring skills.”3 a few respondents “indicated their cms was dreamweaver or adobe contribute” and another “self-identified as a non-cms user but then listed drupal as their web management tool.”4 while the authors find the survey definition itself slightly ambiguous (likely in the service of clarity for survey respondents), we also believe that these responses may hint at an underlying and widespread lack of clarity regarding the technology itself. an early report on potential library use of content management systems by browning and lowndes in 2001 opined that “a cms is not really a product or a technology. it is a catch-all term that covers a wide set of processes that will underpin the ‘next generation’ large-scale website.”5 while technological developments over the last twenty years reveal some limitations to this early characterization, we believe it is fundamentally sound to define the cms primarily through its functions. fulton defined a cms as “an application that enables the shared creation, editing, publish ing, and management of digital content under strict administrative parameters.”6 the authors concur with barker’s (2016) similarly task-based definition: “a content management system (cms) is a software package that provides some level of automation for the tasks required to effectively manage content . . . usually server-based, multi-user . . . [and] interact[ing] with content stored in a information technology and libraries | september 2019 10 repository.”7 browning & lowndes defined the key tasks, or functions, of the cms as encompassing four major categories: authoring, workflow, storage, and publishing.8 barker (2016) also outlined “the big four” of content management as: enterprise content management (e.g., intranets), digital asset management (dam), records management, and web content management (wcm), with wcm defined as “the management of content primarily intended for mass delivery via a website. wcm excels at separating content from presentation and publishing to multiple channels.”9 for the purpose of clarity within the scope of this article, our discussion will primarily focus on content management systems as they are used for wcm, acknowledging that some principles may apply in varying degrees to other categories. 
the cms and library websites the library literature reveals that, generally speaking, libraries began the transition from telnet and gopher catalog interfaces to launching websites in the 1990s.10 case studies of library websites from this period through the mid-2000s report library website pages increasing at a rapid rate, in some cases doubling or tripling on a yearly basis. 11 a comment from dallis and ryner in regard to their own case study provides a sense of what might be considered typical during this period: “the management of the site was decentralized, and it grew to an estimated 8,000 pages over a period of five years.”12 this proliferation, in turn, spurred focused interest in content management. “web content management (wcm) as a branch of content management (cm) gained importance during the web explosion in the mid-1990s.”13 as early as 2001 there were published laments regarding the state of library websites: institutions are struggling to maintain their web sites. out of date material, poor control over design and navigation, a lack of authority control and the constriction of the webmaster (or even web team) bottleneck will be familiar to many in the he/fe [higher education / further education] sector. the pre-millennial web has been characterized by highly manual approaches to maintenance; the successful and sustainable post-millennial web will have significant automation. one vehicle by which this can be achieved is the cms.14 mach wrote: the special concerns of web maintenance have only multiplied with the increased size and complexity of many library web sites. not only does the single webmaster model no longer work for most libraries, but the static html page is also in jeopardy. many overworked web librarians dream about the instant content updates possible with database-driven site or content management software. but while these technical solutions save staff time, they demand a fair amount of compromise.15 in 2010, fulton noted, “at one time, all institutions [mentioned in her literature review] could effectively manage their sites outside of a cms. however, changing standards combined with uncontrollable growth patterns persuaded them to take steps to prevent prolonged chaos.”16 library-authored web content and the need for content strategy | mcdonald and burkhardt 11 https://doi.org/10.6017/ital.v38i3.11015 changing technology, accessibility, and literacy throughout the early 2000s, advances in consumer technology and in web development (e.g., css, html 5, bootstrap) together with the need to comply with web-accessibility standards resulted in a gradual move from static, hand-coded sites to other solutions. in 2005, yu stated, “today’s content management solution is either a sophisticated software-based system or a databasedriven application.”17 after a detailed explanation of the cumbersome process of managing and updating a static site using microsoft’s frontpage, kane and hegarty noted, “the opportunity to migrate the site to a content management system provided a golden opportunity . . . to bring the code into line with best practice.”18 this transition also coincided with the growth of viable cms options, particularly open-source tools. 
black stated in 2011: “in the past few years, the field of open-source cmss has increased, making it more likely that a library will find a viable cms in the existing marketplace that will meet the organization’s needs.”19 in 2013, comeaux and schmetzke replicated an earlier study of library websites’ accessibility, reviewing the homepages of library websites at 56 institutions offering ala-accredited library and information science programs using bobby, an automated web-accessibility checker. they found that cms-powered library websites had a higher average of approved pages and a lower average of errors per page than those not powered by a cms.20 in a 2017 study, comeaux manually reviewed 37 academic library websites (members of the association of southeastern research libraries), and found that approximately three-quarters of cms-driven sites were responsive, as compared to only one-quarter of sites without a cms.21 accessibility also manifests itself on the web in other ways. it is important to consider what we know about literacy and how people read online. the ability to write using plain language, in addition to other essential techniques for effective web writing, is an important aspect of accessibility that must be addressed in tandem with compliance with industry standards such as the web content accessibility guidelines (wcag, https://www.w3.org/tr/wcag20/). a summary of recent results for the program for the international assessment of adult competencies (piaac, https://nces.ed.gov/surveys/piaac/) survey, administered to us adults, reported “the majority of people may struggle to read through a ‘simple’ bullet-point list of rules . . . nearly 62% of our population might not be able to read a graph or calculate the cost of shoes reliably.”22 blakiston succinctly observed: “on the web, scanning and skimming is the default.”23 these trends have led to an increasing push to adopt “plain language” by governmental agencies and others.24 skaggs stated, “adopt plain language throughout your website. plain language focuses on understanding and writing for the user’s goals, making content easily scannable for the user, and writing in easy to understand sentences.”25 library websites and the challenges of a distributed environment in 2011, black pointed out one of the chief advantages to using a cms: “cmss support a distributed content model by separating the content from the presentation and giving the content provider an easy to use interface for adding content”.26 empowerment to focus on special expertise is noted as another benefit: “chief among the efficiencies gained in using a cms is the simple act of giving content authors the tools they need to create webpages and, most importantly, to do so without requiring the technical knowledge that used to be a part of webpage development. designers can design, writers can write, editors can edit, and technology folks can manage the cms and support https://www.w3.org/tr/wcag20/ https://nces.ed.gov/surveys/piaac/ information technology and libraries | september 2019 12 its users.”27 browning and landes agreed: “the concept of ‘self-service authoring’, whereby staff do not need special skills to edit the content for which they are responsible, can be regarded as a major step towards acceptance of the web as a medium for communication by non-web specialists. 
providing this is the key advantage of a cms.”28 librarians quickly found, however, that while the adoption of a cms could empower more subject matter experts to participate in web content development and address technical issues such as responsive design and compliance with accessibility standards, the transition to a distributed model of content creation, oversight, and maintenance resulted in larger organizational ramifications. in 2006, approximately a decade following libraries’ general move to the web and at an early stage for cms adoption, guenther (2006) cautioned: “a cms is only a tool. purchasing the very best cms with every bell and whistle available will be a useless exercise without a solid plan to guide people and processes around its use.”29 this same article went on to observe: what makes using a cms a tremendous advantage is exactly what makes it a potential nightmare. a cms can make website development really easy; that's the good part. the bad part is, it makes webpage development really easy. one of the first issues you encounter is having to suddenly support a lot more content authors posting a lot more content. what once was an environment with limited activity can become a web development environment requiring considerably more oversight and technical support. having more hands stirring the pot, so to speak, is wrought with all kinds of challenges. 30 untenable growth this model of distributed content creation, in which authorship is undertaken by numerous parties across the organization, generally results in a rapidly increasing quantity of content without necessarily guaranteeing consistent quality. a review of the literature reveals that, more commonly, a distributed model leads to a lack of consistency and focus in library web content’s structure and execution. some papers underscore the problematic quality of the highly individualized nature of the content: “the sheer mass of [libraries’] public web presence has reached the point where maintenance is a problem. often the webpages grew out of the personal interests of staff members, who have since left for other jobs for other responsibilities or simply retired.”31 blakiston stated, “for a number of years, librarians were motivated to create more web content. it was assumed that adding more content was a service for library users, and it was also seen as a way to improve their web skills and demonstrate their fluency with technology.”32 similarly, chapman and demsky described how the university of michigan library website grew “in an organic fashion” and noted, “[a]s in many places, the library’s longstanding attitude toward the web was that more was more and that there was really no harm in letting the website develop however individual units and librarians thought best.”33 other papers described “authority and decision-making issues . . . differing opinions, turf struggles or a lack of communication . . . a shortage of time and motivation, general inertia, and resistance to change on the part of content authors.”34 iglesias noted, “some librarians will always be more comfortable creating webpages from scratch, fearing a loss of control. 
the library as a whole must decide if the core responsibility of librarians is to create content or to create websites.”35 library-authored web content and the need for content strategy | mcdonald and burkhardt 13 https://doi.org/10.6017/ital.v38i3.11015 newton and riggs stated, “this approach to content appears to be at odds with the role of librarians as leaders in information management practices and in supporting users to find , filter and critically evaluate information.”36 in her article “editorial and technological workflow tools to promote website quality,” morton-owens discussed several studies measuring the severe impact of even small flaws (such as typographical errors) on users’ judgements of a website’s credibility, and, by extension, of the organization’s credibility: “users’ experience of a website leads them to attribute characteristics of competence and trustworthiness to the sponsoring organization.”37 a. paula wilson, citing mcconnell and middleton, summarized the potential pitfalls inherent in a distributed model in which empowerment of content creators overshadows a unified vision, strategy, and approach to library-wide content management: a decentralized model without the use of guidelines, standards or templates will eventually fail. the website may experience inconsistency in presentation and navigation, outdated and incorrect information, and gaps in content, and its webpages maybe noncompliant in usability and accessibility design so much so that users cannot find information.38 inconsistent voice and lack of organizational unity in addition to such compounding factors and in contrast to journalistic practice, “libraries lack an editorial culture where content production and management is viewed as a collective rather than a personal effort.”39 morton-owens noted: “the concept of editing is not yet consistently applied to websites unless the site represents an organization that already relies on editors (like a newspaper)—but it is gaining recognition as a best practice. if the website is the most readily available public face of an institution, it should receive editorial attention just as a brochure or fundraising letter would.”40 in an environment with distributed authorship lacking a strong and consistent editorial culture, an organization's “voice” can quickly deteriorate. in web writing, voice is often defined as personality. blakiston stated: “the written content you provide plays an essential role in defining your library as an organization.”41 young went further, aligning voice with values, and arguing “[a]ny item of content that your library creates—an faq, a policy page, or a facebook post—should be conveyed in the voice of your library and should communicate the values of your library. a combined expression of content and values defines the voice of your organization.” 42 in their 2006 article “cms/cms: content management system/change management strategies,” goodwin et al. insightfully explore organizational challenges: the effort of developing a unified web presence reveals where the organization itself lacks unity . . . effective use of a content management system requires an organized and comprehensive consolidation of library resources, which emphasizes the need for a different organizational model and culture—one that promotes thinking about the library as a whole, sharing and collaboration.43 fulton built on this concept: “disunity in the library’s web interface could signify disunity within the institution. 
on the other hand, a harmonious web presence suggests an institution that works well together.”44 young drew an inherent connection between a strongly unified organizational identity and a consistent and coherent “content strategy”: information technology and libraries | september 2019 14 while libraries in general can draw on decades or centuries of cultural identity, each individual library may wish to convey a unique set of attributes that are appropriate for unique contexts. in this way, the element of “organizational values” inherent to content strategy signals a larger visioning project for determining the mission, vision, and values of your library. if these elements are already in place, then the work of content strategy can easily be adapted to fit existing values statements. otherwise, content strategy and organizational values can develop as a joint initiative. 45 library websites and content strategy content strategy is an emerging discipline that brings together concepts from user experience design, information architecture, marketing, and technical writing. content strategy encompasses activities related to creating, updating, and managing content that is intentional, useful, usable, well-structure, easily found, and easily understood, all while supporting an organization’s strategic goals.46 browning and lowndes recognized as early as 2002 that strategy would be required as the variety of communication channels for libraries increased: “as local information systems integrate and become more pervasive, self-service authoring extends to the concept of ‘write once, re-use anywhere’, in which the web is treated as just another communication channel along with email, word processor files and presentations, etc.”47 more than a decade later, in the introductory column to a 2013 themed issue of information outlook focused on content strategy, hales stated: content strategy is a field for which information professionals and librarians are ideally suited, by virtue of both their education and temperament. content, after all, is another word for information, and librarians and information professionals have been developing strategies for acquiring, managing, and sharing information for centuries. today, however, information is available to more people in more forms and through more channels than ever before, making content strategies a necessity for organizations rather than an afterthought.48 jones and farrington posited a common refrain for stating the importance of content strategy for librarianship: “library website content must be viewed in much the same way as a physical collection” and the “library website, to apply s. r. ranganathan’s fifth law, is a growing organism and must be treated as such, especially with the complexity of web content.”49 claire rasmussen drew connections between ranganathan’s laws and content strategy in a blog post, pointing out that web content represents an additional set of responsibilities to be managed: “for hundreds of years, librarians have been the primary caretakers of the content corpus. 
but somebody needs to care for the content that never makes it into a library’s collections, too.”50 blakiston & mayden provided a helpful overview of content strategy and its application in libraries in their article “how we hired a content strategist (and why you should too),” finding many points of connection between skill sets essential to content strategy and those commonly possessed by librarians: librarians who have worked in public services may have the needed skills to ask good questions and find out what users need . . . professionals doing this kind of work came from backgrounds including communications, english and library science . . . desirable library-authored web content and the need for content strategy | mcdonald and burkhardt 15 https://doi.org/10.6017/ital.v38i3.11015 qualifications for . . . content strategist[s] . . . [include] strategic planning, web skills and project management.51 the circumstances that motivated them to propose and eventually hire a dedicated content strategist at the university of arizona libraries hearken back to the discussion earlier in this article regarding the increasing complexity of web librarianship: “the web product manager had independently coordinated all user research and content strategy work. the idea of both managing [a major web redesign project] and leading these other important areas was not realistic.”52 datig also pointed to increasing day-to-day responsibilities when advocating for the importance of content strategy for librarians with outreach and marketing responsibilities: “lack of time, and a desire for that time to be well spent, is a huge concern for all librarians involved in library outreach and marketing . . . content strategy is an important and overlooked aspect of maintaining an effective and vital library outreach program.”53 hackett reflected on her role as web content strategist in a blog post after a recent website migration, noting: “moving forward with a content strategy . . . will ensure that university libraries’ website is useful, usable, and discoverable—now and in the future.”54 yet, while the need for strategy is hard to dispute and librarians are theoretically well suited for web content strategy work, blakiston & mayden noted that explicit organizational support for content strategy in libraries remained limited: “despite the growing popularity of content strategy as a discipline, only a handful of libraries had hired staff dedicated to this role at the time we proposed adding a content strategist to our staff.”55 conclusion this article has traced the history of library adoption of web content management systems, the evolution of those systems, and the corresponding challenges as libraries have attempted to manage increasingly prolific content creation workflows across multiple, divergent cms platforms. what is the library website, anyway? while some variation would to be expected from institution to institution, largely missing from the conversation is agreement on the purpose and aim of the library website writ large. this lack of definition, together with the technological and growth-related issues already discussed, has doubtless contributed to the confusion. after all, how would we know if we are “building it right” if we are not sure what we are meant to be building in the first place? 
in response to this ambiguity, the following definition was proposed: the library website is an integrated representation of the library, providing continuously updated content and tools to engage with the academic mission of the college/university. it is constructed and maintained for the benefit of the user. value is placed on consump tion of content by the user rather than production of content by staff.56 effective management of library web content requires dedicated resources and clear authority inconsistent processes, disconnects between units, varying constituent goals, and vague or ineffective wcm governance structures are recurrent themes throughout the literature. as cms applications have enabled broader access to web publishing, models of library web management information technology and libraries | september 2019 16 have moved away from workflows structured around strictly technical tasks and permissions, and have instead migrated toward consensus-based, revolving committee structures. while greater involvement of subject matter experts has been noted as a positive earlier in this article, other challenges have also been acknowledged. mcdonald, haines, and cohen stated: “in the context of web design and governance, consensus is a blocker to nimble, standards-based, user-focused action.”57 library website as an integrated representation of the organization as previously discussed, web content governance issues often signal a lack of coordination, or even of unity, across an organization. demsky stated, “we won’t be fully successful until we see it as our website” (emphasis added).58 internal documentation from the university of michigan library emphasized the value of “publicly represent[ing] ourselves as one library,” and stated: the more people are provided with clear communication that shows our offerings and unique items are part of the . . . library—rather than confuse users by making primary attribution to a sub-library, collection, or service point—the more people will recognize and understand the library's tremendous, overall value.59 content strategy and the case for library-authored content no cms can, by itself, address the fact that authoring, editing, and publishing quality content is both a situated expertise and a significant, ongoing demand on staff time. each platform, resource, or database brings its own visual style, terminology, tone and functionality. they are all parts of the library experience, which in turn is one part of the student, research or teaching experience. an understanding of content strategy is critical if staff are to see the connections between their own content and the rest of the content delivered by the organization.60 libraries must proactively embrace and employ best practices in content strategy and in writing for the web to effectively address considerations of literacy and to present a consistent voice for the organization. these practices position libraries to fully realize the promise of content management systems through embracing an ethos of library-authored content. the authors define library-authored content as collectively owned and authored content that represents the organization as a whole. 
library-authored content is: • collaboratively planned, written, and edited with participation of both subject matter experts and domain experts (i.e., library staff with expertise in content strategy, web librarianship); • carefully drafted to optimize for clarity within the context of the end-user; • current, reviewed on a recurrent schedule, and regularly updated; • consistent across the ecosystem of cms applications and other platforms, including print materials and social media; • compliant with industry standards (including but not limited to those related to accessibility), and with relevant internal brand standards; and • centrally managed as the primary responsibility of one or more domain experts. library-authored web content and the need for content strategy | mcdonald and burkhardt 17 https://doi.org/10.6017/ital.v38i3.11015 in order for libraries to meet the ever-increasing demands on our resources to produce timely, user-centered content that advances our missions for supporting teaching, research, and learning, a cultural shift toward a more collective, collaborative model of web content management and governance is necessary. content strategy provides a flexible, adaptable framework for libraries to more efficiently and effectively leverage the power of multiple cms platforms, to present engaging on-point content, and to provide appropriate, scaffolded support for researchers at all levels — with a team of one or a team of many. endnotes 1 deane barker, “what web content management is (and isn’t),” in web content management (o’reilly media, inc., 2016), sec. what web content management is (and isn’t), https://learning.oreilly.com/library/view/web-content-management/9781491908112/. 2 maira bundza, patricia fravel vander meer, and maria a. perez-stable, “work of the web weavers: web development in academic libraries,” journal of web librarianship 3, no. 3 (september 15, 2009): 252, https://doi.org/10.1080/19322900903113233. 3 ruth sara connell, “content management systems: trends in academic libraries,” information technology and libraries 32, no. 2 (june 10, 2013): 43, https://doi.org/10.6017/ital.v32i2.4632. 4 connell, 46. 5 paul browning and mike lowndes, “jisc techwatch report: content management systems,” 2001, 3, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.9100. 6 camilla fulton, “library perspectives on web content management systems,” first monday 15, no. 8 (july 15, 2010): sec. review of literature, https://doi.org/10.5210/fm.v15i8.2631. 7 barker, “what web content management is (and isn’t),” sec. what is a content management system? 8 browning and lowndes, “jisc techwatch report,” 4. within a diagram outlining the major functions within the content life-cycle, they include the steps ‘review’, ‘archive’ and ‘dispose’ steps which, in the experience and observations of the authors, are often overlooked in general library web practice. 9 barker, sec. types of content management systems. 10 laura b. cohen, matthew m. calsada, and frederick j. 
public libraries leading the way: on educating patrons on privacy and maximizing library resources

t.j. lamanna
t.j. lamanna (professionalirritant@riseup.net) is an adult services librarian, cherry hill public library.

abstract

libraries are one of our most valuable institutions. they serve people of all demographics and provide services to patrons that they would not be able to get anywhere else. the list of services libraries provide is extensive and comprehensive, but there are significant gaps in what we can offer, particularly around advancing technology and patron privacy. library classes on privacy protection are a valiant effort, but we can do much more and lead the way, maybe not for the privacy industry but for our communities and patrons. building a strong foundational knowledge will help patrons apply these skills in their day-to-day lives and help them educate their families about common privacy issues. in this column, we'll explore some of the ways libraries can use their current resources, and offer ideas on how to maximize their effectiveness and fold new technologies into their operations.

though many libraries have policies on how they handle patron privacy, some of those policies aren't very strong, and staff often aren't trained in their details. fortunately, for libraries that don't yet have such policies, others, such as the san jose public library, offer their own as a framework.1 libraries that do have a strong, comprehensive policy must make sure they enforce it and regularly update it to account for newly released technologies. it's a daunting task, but as article vii of the library bill of rights says, "all people, regardless of origin, age, background, or views, possess a right to privacy and confidentiality in their library use. libraries should advocate for, educate about, and protect people's privacy, safeguarding all library use data, including personally identifiable information."2 this means we have a responsibility to our patrons to do everything in our power to protect them and to teach them to protect themselves. this requires a concerted effort not just from technology and it librarians, but from all library workers. a privacy policy means little if those on the front lines are unaware of it or unsure how it is to be implemented. all library staff should therefore understand the fundamental reasons behind library privacy policies and be trained in maintaining them. libraries may consider implementing this training during staff development days or offering independent training sessions as needed.

since the introduction of the patriot act, libraries have stopped collecting patrons' reading habits, but many integrated library systems (ils) still retain large amounts of patron information we may not even be aware of. i've been administering our ils for over two years, and i recently found yet another place where data was being unnecessarily retained that i hadn't noticed before. cases like this call for limiting personally identifiable information (pii) to what is strictly necessary. in limiting the pii gathered in the first place, library staff should consider the following questions: what information do libraries really need to collect to offer library cards or programming? does your library really need patrons' date of birth or gender? probably not. if not, you shouldn't be collecting it, and if you must, make sure you anonymize the data. metrics are vital to how libraries function, receive funding, and schedule programming; you can still use the information, but it should not be connected to a patron in any way.
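one way to keep those metrics while severing the connection to individual patrons is to pseudonymize identifiers before the data is ever used for statistics. the sketch below is illustrative only and is not a feature of any particular ils: the csv layout, the column names, and the handling of the secret key are assumptions made for the sake of example.

```python
import csv
import hashlib
import hmac

# a secret held only by the library; rotating it breaks linkability across
# reporting periods. (placeholder value for illustration, not a real key.)
SECRET_KEY = b"replace-with-a-locally-generated-secret"

def pseudonymize(patron_id: str) -> str:
    """return a keyed hash of a patron identifier.

    hmac-sha256 with a local secret prevents re-identification by simply
    hashing a list of known barcodes, which a plain sha-256 would allow.
    """
    return hmac.new(SECRET_KEY, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()

def anonymize_circulation_export(src_path: str, dest_path: str) -> None:
    """copy a circulation export, keeping only what the statistics need.

    assumes (hypothetically) columns named 'patron_barcode', 'item_call_number',
    and 'checkout_date'; every other column (name, address, birth date, gender)
    is dropped entirely.
    """
    keep = ["item_call_number", "checkout_date"]
    with open(src_path, newline="") as src, open(dest_path, "w", newline="") as dest:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dest, fieldnames=["patron_token"] + keep)
        writer.writeheader()
        for row in reader:
            record = {col: row[col] for col in keep}
            record["patron_token"] = pseudonymize(row["patron_barcode"])
            writer.writerow(record)

if __name__ == "__main__":
    anonymize_circulation_export("circulation_export.csv", "circulation_stats.csv")
```

because the hash is keyed, someone who knows a patron's barcode cannot recompute the token without the library's secret; rotating or discarding the key periodically removes even the ability to link the same patron across reporting periods.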
after educating staff, we can educate patrons on developing better and safer practices around personal privacy and security in their daily lives. practical examples range from teaching patrons how to create strong passwords and back up sensitive files to explaining how malware works and what the "cloud" actually is. this is a start, but it goes far beyond that. i've served many patrons who, even after taking courses on the subject, are overwhelmed by the security measures needed to protect themselves. this isn't necessarily a sign that our classes are ineffective, but it does suggest that new tactics are needed. let's look at a few examples.

another source of pii that we often overlook is our own security measures, such as closed-circuit television (cctv) and the security or police officers in our buildings.3 these are often either forgotten or treated as outside the purview of the library itself. as the college of policing notes, "cctv is more effective when directed at reducing theft of and from vehicles, while it has no impact on levels of violent crime."4 while there are justifications for bringing this technology into the library, cameras should be installed only where needed, taking great care not to point them at patron or staff computers. if cctv is needed, make sure to follow local retention laws and delete the footage as soon as its retention period has expired. this applies to all collected information: there is no reason to archive data past the point at which it can be destroyed, as doing so puts the library and its patrons in a compromised position.

law enforcement in the library is a tough thing to argue against in our current political climate, but studies have shown that police presence does little to deter crime and may disproportionately impact marginalized communities.5 consider the purpose of law enforcement personnel and whether their presence is actually necessary to the proper functioning of your library. should law enforcement arrive with a subpoena requiring you to turn over patron data, it's important to have a warrant canary in place that can be removed so your patrons understand what has happened.6

another way libraries can lead in protecting patron privacy, both inside and outside the library, is by supporting legislation that bans facial recognition software. this type of technology is becoming ubiquitous, but communities have already started pushing back, and libraries can be at the center of this movement. it has already been banned in oakland7 and san francisco8 (one of the homes of this technology), as well as somerville, massachusetts, and groups like the massachusetts library association have unanimously called for a moratorium on facial surveillance, the practice of recording people's faces to create user profiles.9 other states are moving down this path, and it is heartening to see libraries step up in front of something they know would damage our communities. we ought to be activists, standing on the front lines and showing our patrons our deepest commitment to them.

surely there are greater strides we can make, such as revising wifi policies. wifi is one of the most used services libraries offer, and many libraries don't use it to its full potential.
for instance, some libraries turn off their wifi when the building is closed, severely limiting patrons' access. it's a service we pay for, and there is no reason it shouldn't be available at all times. your it staff should make sure the wifi is secure, whether it's available at all hours or not. around-the-clock access to wifi is invaluable to users who need it for emergencies, for completing work, or for reaching important online services when the library is closed. while bandwidth is limited and it services must actively maintain wifi security, libraries should make sure it's available to the public as often as possible.

now that we've covered using bandwidth when we aren't open, let's talk about libraries with excess bandwidth. no resource should go unused in the library; we have limited budgets, and we should make sure every penny is used to serve our communities. one fantastic use of excess bandwidth, especially during closed hours, would be to set up a tor relay in your library. tor is an anonymity network that allows people to browse the internet with extra security and privacy in mind. a relay is quite easy to set up, and you can limit how much bandwidth it uses so you aren't shorting anyone in your library. the network is used by people such as journalists and activists who want to make positive change in the world and need a safe place to do so. some are concerned that the tor network is used for malicious purposes, but the tor project, the organization that runs the network, constantly works to ensure nothing like that is taking place. also, anything illicit you can find on the tor network is available on the regular internet, including on sites like facebook or craigslist, so the stigma around the network should be taken in context. the tor project routinely monitors the network and searches out illegal material (there are no hired killers on the tor network). given all this, you could help the network greatly by partitioning off just a small amount of your bandwidth.

libraries have the unique ability to be transformative. unlike other nonprofits or organizations, we have the ability to pivot: we can change direction as needed and pave the way for our communities as leaders in the movement toward patron privacy. i leave you with a quote from hardt and negri: "…we share common dreams of a better future."10 that should be our motto.

endnotes

1 "our privacy policy," san jose public library, accessed august 15, 2019, https://www.sjpl.org/privacy/our-privacy-policy.
2 "library bill of rights," american library association, last modified january 19, 2019, http://www.ala.org/advocacy/intfreedom/librarybill.
3 "importance of cctv in libraries for better security," accessed august 14, 2019, https://www.researchgate.net/publication/315098570_importance_of_cctv_in_libraries_for_better_security.
4 "effects of cctv on crime," college of policing, accessed august 14, 2019, http://library.college.police.uk/docs/what-works/what-works-briefing-effects-of-cctv-2013.pdf.
5 "do police officers in schools really make them safer?" accessed august 14, 2019, https://www.npr.org/2018/03/08/591753884/do-police-officers-in-schools-really-make-them-safer.
6 "warrant canary," wikipedia, https://en.wikipedia.org/wiki/warrant_canary.
7 sarah ravani, "oakland bans use of facial recognition technology, citing bias concerns," san francisco chronicle, july 17, 2019, https://www.sfchronicle.com/bayarea/article/oakland-bans-use-of-facial-recognition-14101253.php.
8 kate conger, richard fausset, and serge f. kovaleski, "san francisco bans facial recognition technology," new york times, may 14, 2019, https://www.nytimes.com/2019/05/14/us/facial-recognition-ban-san-francisco.html.
9 sarah wu, "somerville city council passes facial recognition ban," boston globe, june 27, 2019, https://www.bostonglobe.com/metro/2019/06/27/somerville-city-council-passes-facial-recognition-ban/sfaqq7mg3dgulxonbhscyk/story.html.
10 michael hardt and antonio negri, multitude: war and democracy in the age of empire (new york: the penguin press, 2009), 128.

lita president's message: sustaining lita

emily morton-owens

emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the assistant university librarian for digital library development & systems at the university of pennsylvania libraries.

recently, at the 2019 midwinter meeting in seattle, ala decided to adopt sustainability as one of the core values of librarianship. the resolution includes the idea of a triple bottom line: "to be truly sustainable, an organization or community must embody practices that are environmentally sound and economically feasible and socially equitable." if you had thought of sustainability mainly in terms of the environment, you have plenty of company. i originally pictured it as an umbrella term for a variety of environmental efforts: clean air, waste reduction, energy efficiency. but in fact the idea encompasses human development in a broader sense. one definition of sustainability involves making decisions in the present that take into account the needs of the future.
of course our current environmental threats demand our attention, and libraries have found creative ways to promote environmental consciousness (myriad examples include books on bikes, seeking leed or passive house certification for library buildings, providing resources on xeriscaping, and many more). even if you're not presently working in a position that allows you to engage directly on the environment, though, the concept of sustainability turns out to permeate our work and values. the ideas of solving problems in a way that doesn't create new challenges for future people, developing society in a way that allows all people to flourish, and fostering strong institutions: these concepts all resonate with the work we do daily, not only in what we offer our users but also in how we work with each other.

as a profession, we have a history of designing future-proof systems (or at least attempting to). whenever i've been involved in planning a digital library project, one of the first questions on the table is "how do we get our data back out of this, when the time comes?" no matter how enamored we are of the current exciting new solution, we remember that things will look different in the future. library metadata schemas are all about designing for interoperability and reusability, including in new ways that we can't picture yet. someone who is unaccustomed to this kind of planning may see a high project overhead in these concerns, but we have consistently incorporated long-term thinking into our professional values due to the importance we place on free access, data preservation, and interoperability.

the triple-bottom-line approach, considering economic, social, and environmental factors, also influences the lita leadership. i recently announced the lita board's decision to reduce our in-person participation at ala midwinter for 2020, which is partly in response to ala's deliberations about reinventing the event starting in 2021. with all the useful collaboration technologies now at our fingertips, it is harder to justify requiring our members to meet in person more than once per year. it is possible for us to do great work, on a continuous and rolling basis, throughout the year. more importantly, we want to offer committee and leadership positions to members who may not be able to travel extensively, for personal or work reasons, especially when many do not receive financial support from their employers. (and, to come back around to environmental concerns for a moment, think of all the flights our in-person meetings require.) by being more flexible about what participation looks like, we sustain the effort that our members put into lita through a world of work that is changing.

financial sustainability is also a factor in our pursuit of a merger with alcts and llama. we are three smaller divisions based on professional role, not library type, who share interests and members. we also have similar needs and processes for running our respective associations.
unfortunately, lita has been on an unsustainable course with our budget for some time: we spend more than we take in annually, due to overhead costs and working within ala's processes and infrastructure. the lita board has engaged for many years on the question of how to balance our financial future with the fact that our programs require full-time staff, instructors, technology, printing, meeting rooms, etc. core, as the new merged division will be known, will allow us to correct that balance by combining our operations, streamlining workflows, and containing our costs. the staff will also be freed up to invest more effort in member engagement. we can't predict all the services that associations will offer in the future, but we know that, for example, online professional development is always needed, so we're ensuring that the plan allows it to continue. it is inspiring to talk about the new collaborations and subject-matter synergies that the merger will bring with it, but core will also achieve something important for sustaining a level of service to our membership.

at the ala level, the steering committee on organizational effectiveness (scoe) is also looking at ways to streamline the association's structure and make it more approachable and welcoming to new members. i would add that a simplified structure should make ala more accountable to members as well, which is crucial for positioning it as an organization worth devoting yourself to. these shifts are essential because member volunteers are what make ala happen, and we need a structure that invites participation from future generations of library workers.

taken together, these may look like a confusing flurry of changes. but librarians have evolved to be excellent at long-term thinking about our goals and values and how to pursue an exciting future vision based on what we know now and what tools (technology, people, ideas) we have at hand. we care about helping our users thrive and are able to take a broad view of what that encompasses. in particular, with the new resolution about sustainability, we're including the health of our communities and the security of our environment as a part of that mission. due to their innovative spirit and principled sense of commitment, our members are well-placed to lead transformations in their home institutions and to participate in the development of lita. as we weigh all these changes, we value the achievements of our association and its past leaders and members, and seek to honor them by making sure those successes carry on for our future colleagues.

editorial board thoughts: building a culture of resilience in libraries

paul swanson

paul swanson (swans062@umn.edu) is technology manager, minitex, and a member of the ital editorial board. © 2021.

we find ourselves a year and a half into the global pandemic, and libraries, just like the rest of the workforce, have been navigating through a drastic amount of change in a short amount of time with no real guideposts as to what may come next. libraries were completely shut down, library staff displaced, and library services transformed in a short period of time like they hadn't been before. we've also found that as we've reopened our libraries there are new patron and staff expectations.
it is expected that the changes we've enacted in the middle of a crisis will continue and will be folded into a new service delivery model. patrons have different expectations of libraries; library staff have different expectations of management and of the technology that drives library services and their day-to-day work. in order to meet these new expectations, we've embarked on a path of implementing more flexibility into our environments and workflows.

i feel, however, that the concept of flexibility misses the mark. flexibility is about being open to change and reacting to it, and possibly taking different paths to solving a common problem. flexibility, though, is part of a broader concept that i think we need to embrace, and that is resiliency. resiliency is defined as "tending to recover from or adjust easily to misfortune or change."1 we've all reacted and persevered through the last year and a half in our own way, but i would bet that those organizations that have a culture of resiliency have fared better. organizations that pride themselves on their flexibility can sometimes be more reactionary than anything: a problem arises, and they may be quick to change, to flex and react. a resilient organization can make the conscious decision to absorb the problem, make a change that can be rolled back when the time is right, or even make it permanent. such organizations have built a resilient foundation that allows them not just to be flexible when a problem arises, but to assimilate the solutions to the challenges that come up. flexibility is what you do; resilience is what you are.

to be more resilient with the technology that your library uses and depends on is really rather easy. it means doing all those things that you have always meant to do but haven't made the time for. you are building the foundation that your organization can rely on when change or misfortune strikes. that foundation is what enables you to make positive change through a crisis, and it is what can keep you from making reactionary decisions. let's talk through a few of the different places where you can instill resilience in your organization.

the stronger your change management practices are, the more resilient your technology will be: everything from governance for changes, to documentation around decision making, to a detailed history of the changes that have been made to your systems. it sounds like a lot because it is a lot, but you have to start now. you need to be able to think beyond today's current state of your systems and understand where you were in the past and how you got to where you are today. you cannot effectively make changes to your systems, especially in the midst of a crisis, if you don't know why you've done the things you've done up to this point. you also need that disaster recovery plan that you've always meant to start and keep updated. think of a disaster recovery plan as a set of decisions that have already been made, so you have the latitude to act in a crisis without the usual bureaucracy of change governance. in your disaster recovery plans, make sure you have defined roles for everyone who is involved, and make sure that communication plays just as important a role as resolving the crisis at hand.
you need to finish that disaster recovery plan and keep it updated if you are going to instill resilience into your technology and organization. when it comes down to building resilience in the decisions that you are making, you need to start integrating the concept of two-way decision making into your process.2 a two-way decision is reversible. yes, most decisions that we make feel like a one-way street. anything from starting a new service, applying for a grant, or purchasing a new technology platform. but we can temper this by applying two-way decision making to the implementation decisions for the project or activity that we are undertaking. even if you just plan on ways to mitigate the impact of a decision gone wrong, you are building resilience into your decision-making process. you should also go beyond asking “how can we roll this decision back?” by having discussions around “what are the impacts this decision might have that would cause us to roll it back?” defining those evaluation metrics before you embark on a path can help to make your decisions more resilient. if you think about decisions as a two-way door and incorporate both the how and the why into them, you will be on your way to having a much more resilient change management process. when it comes to building resilience into our day-to-day workflows, we have to paradoxically standardize how we do work. if we are going to be more flexible and absorb change, it would be reasonable to think that we should support many different ways of working. however, that type of work fragmentation doesn’t scale to a hybrid work environment. it is just too much to try to support a work environment where common tasks are performed differently by staff that are physically located almost anywhere. therefore, we need to standardize how we work in order to support a hybrid work environment. staff communications and expectations need to be the same for every single staff member. everyone uses instant messaging or no one does. everyone keeps their calendar up to date. meetings are held with the same expectations whether a one-on-one with a supervisor or a team brainstorming session. this is foundational to incorporating resilience into the organization. you need to be able to rely on consistent communication channels when a crisis is thrust upon you. workflows that have been traditionally on-premise all need to be reenvisioned into cloud-based processes. work needs to be performed from anywhere, and the only way you can do that effectively is by standardizing how that work is accomplished. the more exceptions that you allow to this, the more you will run into problems with change. this type of standardization has to be ingrained into the culture of the organization with a top -down approach. therefore, it is imperative that those at the top of the library’s org chart embody this standardization. building this type of operational resilience will go a long way towards having that strong foundation that you can rely on across your organization. through the pandemic, many libraries focused on core services, reimagining them so they can function in new ways during the crisis. pandemic-prompted services that may well be permanent include increased use of pop-up libraries and bookmobiles, low-touch self-service kiosks, and webinar-based story times, among others.3 these are the types of changes and decisions that are upon us right now and may very well take up all of the oxygen in the room. 
however, we still need to make sure that we dedicate resources to that next thing. we can't overcorrect and only circle the wagons around the delivery of our core services. we have future challenges that we will have to meet. the seeds of the solutions to those challenges may be growing right now in an open source project, a new resource sharing standard, or an innovative use of blockchain. even if, in the end, you don't take on those projects, you will help build resilience into your library by expanding awareness and understanding. just as the speed of societal change always seems to be increasing, the speed of expectations for libraries will be increasing. we have to make sure that we continue to dedicate resources to understanding, researching, and building those new services that are only now on the horizon.

none of the things that i've said are new or groundbreaking. the urgency is what is new. delivery of library services has permanently changed over the past 18 months. how work gets done has permanently changed as well. the only way that we will be able to contend with this new paradigm is to do all of those things that we know we should be doing. shore up your foundations. absorb the change that has happened and will continue to happen. don't be reactionary or fight for your own personal work needs. be confident problem solvers and work together for the strength of your library and for the needs of your patrons. instill resilience in the work that you do and the services that your library provides, and it will help you to endure through whatever may come next.

endnotes

1 "resilient," merriam-webster, https://www.merriam-webster.com/dictionary/resilient.
2 jeff haden, "why emotionally intelligent people embrace the 2-way-door rule to make better and faster decisions," inc., july 6, 2021, https://www.inc.com/jeff-haden/why-emotionally-intelligent-people-embrace-2-way-doors-rule-to-make-better-faster-decisions.html.
3 ellen rosen, "beyond the pandemic, libraries look toward a new era," the baltimore sun, september 27, 2020, https://www.baltimoresun.com/featured/sns-nyt-libraries-digital-resources-beyond-the-pandemic-20200927-lrgsracn6jhandorc37lxngnye-story.html.
web content strategy in practice within academic libraries

courtney mcdonald and heidi burkhardt

courtney mcdonald (crmcdonald@colorado.edu) is associate professor and user experience librarian, university of colorado boulder. heidi burkhardt (heidisb@umich.edu) is web project manager and content strategist, university of michigan. © 2021.

abstract

web content strategy is a relatively new area of practice in industry, in higher education, and, correspondingly, within academic and research libraries. the authors conducted a web-based survey of academic and research library professionals in order to identify present trends in this area of professional practice by academic librarians and to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries. this article presents the findings of that survey. based on analysis of the results, we propose a web content strategy maturity model specific to academic libraries.

introduction

our previous article traced the history of library adoption of web content management systems (cms), the evolution of those systems and their use in day-to-day library operations, and the corresponding challenges as libraries have attempted to manage increasingly prolific content creation workflows across multiple, divergent cms platforms.1 these challenges include inconsistencies in voice and a lack of sufficient or dedicated resources for library website management, resulting in the absence of shared strategic vision and organizational unity regarding the purpose and function of the library website. we concluded that a productive solution to these challenges lay in the inherently user-centered practice of web content strategy, defined as "an emerging discipline that brings together concepts from user experience design, information architecture, marketing, and technical writing."2 we further noted that organizational support for web content management and governance strategies for library-authored web content had been rarely addressed in the library literature, despite the growing importance of this area of expertise to the successful provision of support and services: "libraries must proactively embrace and employ best practices in content strategy . . .
to fully realize the promise of content management systems through embracing an ethos of library-authored content."3

we now investigate the current state of practice and philosophy around the creation, editing, management, and evaluation of library-authored web content. to what degree, if at all, does web content strategy factor into the actions, policies, and practices of academic libraries and academic librarians today? does a suitable measure for estimating the maturity of web content strategy practice for academic libraries exist?

background

maturity models

maturity models are one useful mechanism for consistently measuring and assessing an organization's current level of achievement in a particular area, as well as providing a path to guide future growth and improvement: "maturity levels represent a staged path for an organization's performance and process improvement efforts based on predefined sets of practice areas. . . . each maturity level builds on the previous maturity levels by adding new functionality or rigor."4 the initial work on maturity models emerged from carnegie mellon institute (cmi), focused on contract software development.5 since that time, cmi founded the cmmi institute, which has expanded the scope of maturity models into other disciplines. many such models, tailored to a variety of specific industries or specializations, have since been developed based on the cmmi institute approach, in which stages are defined as:

• maturity level 1: initial (unpredictable and reactive);
• maturity level 2: managed (planning, performance, measurement, and control occur on the project level);
• maturity level 3: defined (proactive, rather than reactive, with organization-wide standards);
• maturity level 4: quantitatively managed (data-driven with shared, predictable, quantitative performance improvement objectives that align to meet the needs of internal and external stakeholders); and
• maturity level 5: optimizing (stable, flexible, agile, responsive, and focused on continuous improvement).6

application of maturity models within user experience work in libraries

thus far, discussion of maturity models in the library literature relevant to web librarianship has primarily centered on user experience (ux) work. in their 2020 paper "user experience methods and maturity in academic libraries," young, chao, and chandler noted, ". . . several different ux maturity models have been advanced in recent years," reviewing approximately a half-dozen approaches with varying emphases and numbers of stages.7

in 2013, coral sheldon-hess developed the following five-stage model, based on the aforementioned cmmi framework, for assessing maturity of ux practice in library organizations:

1 – decisions are made based on staff's preferences, management's pet projects. user experience [of patrons] is rarely discussed.
2 – some effort is made toward improving the user experience. decisions are based on staff's gut feelings about patrons' needs, perhaps combined with anecdotes from service points.
3 – the organization cares about user experience; one or two ux champions bring up users' needs regularly. decisions are made based on established usability principles and studies from other organizations, with occasional usability testing.
4 – user experience is a primary motivator; most staff are comfortable with ux principles. users are consulted regularly, not just for major decisions, but in an ongoing attempt at improvement.
5 – user experience is so ingrained that staff consider the usability of all of their work products, including internal communications. staff are actively considerate, not only toward users but toward their coworkers.8

as an indicator of overall ux maturity within an organization, sheldon-hess focuses on "consideration" in interactions not only between library staff and library patrons, but also between library staff: "when an organization is well and truly steeped in ux, with total awareness of and buy-in on user-centered thinking, its staff enact those principles, whether they're facing patrons or not."9

in 2017, macdonald conducted a series of semi-structured interviews with 16 ux librarians to investigate, among other things, "the organizational aspects of ux librarianship across various library contexts."10 macdonald proposes a five-stage model, broadly similar in concept to the cmmi institute structure and to sheldon-hess's model. most compelling, however, were these three major findings, taken from macdonald's list:

• some (but not all) ux librarian positions were created as part of purposeful and strategic efforts to be more self-aware; . . .
• the biggest challenges to doing ux are navigating the complex library culture, balancing competing responsibilities, and finding ways to more efficiently employ ux methods; and
• the level of co-worker awareness of ux librarianship is driven by the extent to which ux work is visible and by the individual ux librarian's ability to effectively communicate their role and value.11

based on analysis of the results of their 2020 survey of library ux professionals, in which they asked respondents to self-diagnose their organizations, young, chao, and chandler presented, for use in libraries, their adaptation of the nielsen norman group's eight-stage scale of ux maturity:

• stage 1: hostility toward usability / stage 2: developer-centered ux—apathy or hostility to ux practice; lack of resources and staff for ux.
• stage 3: skunkworks ux—ad hoc ux practices within the organization; ux is practiced, but unofficially and without dedicated resources or staff; leadership does not fully understand or support ux.12
• stage 4: dedicated ux budget—leadership beginning to understand and support ux; dedicated ux budget; ux is assigned fully or partly to a permanent position.
• stage 5: managed usability—the ux lead or ux group collaborates with units across the organization and contributes ux data meaningfully to organizational and strategic decision-making.
• stage 6: systematic user-centered design process—ux research data is regularly included in projects and decision-making; a wide variety of methods are practiced regularly by multiple departments.
• stage 7: integrated user-centered design / stage 8: user-driven organization—ux is practiced throughout the organization; decisions are made and resources are allocated only with ux insights as a guide.13

young et al.'s findings supported macdonald's, underscoring the importance of shared organizational understandings, priorities, and culture related to ux activities and personnel:

ux maturity in libraries is related to four key factors: the number of ux methods currently in use; the level of support from leadership in the form of strategic alignment, budget, and personnel; the extent of collaboration throughout the organization; and the degree to which organizational decisions are influenced by ux research. when one or more of these four connected factors advances, so too does ux maturity.14

these findings are consistent with larger patterns in the management of library-authored web content identified in the earlier cited literature review:

inconsistent processes, disconnects between units, varying constituent goals, and vague or ineffective wcm governance structures are recurrent themes throughout the literature . . . web content governance issues often signal a lack of coordination, or even of unity, across an organization.15

assessing the maturity of content strategy practice in libraries

we consider kristina halvorson's definition of content strategy, offered in content strategy for the web, as the authoritative definition. halvorson states: "content strategy is the practice of planning for the creation, delivery, and governance of useful, usable content."16 this definition can be divided into five elements:

1. planning: intentionality and alignment, setting goals, discovery and auditing, connecting to a strategic plan or vision
2. creation: roles, responsibilities, and workflows for content creation; attention to content structure; writing or otherwise developing content in its respective format
3. delivery: findability of content within the site and more broadly (i.e., search engine optimization), use of distinct communication channels
4. governance: maintenance and lifecycle management of content through coordinated process and decision making; policies and procedures; measurement and evaluation through analysis of usage data, testing, and other means
5. useful/usable (hereafter referred to as ux): relevant, current, clear, concise, and in context

jones discusses the application of content strategy–specific maturity models as a potential tool for content strategists: "the[se] model[s] can help your company identify your current level of content operations, . . . decide whether that level will support your content vision and strategy . . . [and] help you plan to get to the next level of content operations."17 three examples of maturity models developed for use by content strategy industry professionals map industry-specific terms, tools, and actions to the level-based structure put forward by the cmmi institute (see table 1).
table 1. comparative table of content strategy maturity models

content strategy, inc. [2016]:18
• ad hoc: inconsistent quality, lack of uniform practice, little or no opportunity to understand customer needs
• rudimentary: movement toward structure, unified process and voice; can be derailed by timelines, resistance
• organized & repeatable: strong leadership, uniform process and voice has become routine, integration of user-focused data collection
• managed & sustainable: larger buy-in across organization, can sustain changes in leadership, increased number and sophistication of methods
• optimized: close alignment to strategic objectives, integration across the organization, leadership within and outside the organization

jones (gathercontent) [2018]:19
• chaotic: no formal content operations, only ad hoc approaches
• piloting: trying content operations in certain areas, such as for a blog
• scaling: expanding formal content operations across business functions
• sustaining: solidifying and optimizing content operations across business functions
• thriving: sustaining while also innovating and seeing return on investment (roi)

randolph (kapost) [2020]:20
• reactive: chaotic, siloed, lacking clarity, chronically behind
• siloed: struggles to collaborate, poorly defined and inconsistently measured goals
• mobilizing: varying collaboration, content is centralized but not necessarily accessible, defined strategy sometimes impacted by ad hoc requests
• integrating: effective collaboration across multiple teams, capability for proactive steps, still struggle to prove roi
• optimizing: cross-functional collaboration results in seamless customer messaging and experiences, consistently measured roi contributes to planning

while these models have some utility for content strategy practitioners in higher education, including those in academic and research libraries, emphasis on commercial standards for assessing success (e.g., business goals, centrally managed marketing) limits their direct application in the academic environment. the 2017 blog post by tracey playle, "ten pillars for getting the most of your content: how is your university doing?", presented ten concepts paired with questions, which could be used by higher education content professionals to reflect on their current state of practice.21 this model was developed for use by a consultancy, and the "pillars"—"strategy and vision," "risk tolerance and creativity," and "training and professional development"—are more broadly conceived than typical maturity models. thus, this approach seems more appropriate as a personal or management planning tool rather than as a model for evaluating maturity across library organizations.

methods

following review and approval by the researchers' institutional review boards, a web-based survey collecting information about existing workflows for web content, basic organizational information, and familiarity with concepts related to web content strategy was distributed to 208 professionals in april 2020. the survey was available for four weeks. participants were drawn from academic and research libraries across north america, providing their own opinions as well as information on behalf of their library organization. (see appendix a: institution list.)
the sample group (n=208) was composed of north american academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding non-academic member institutions): the association of research libraries, the big ten academic alliance, the greater western library alliance, and/or the oberlin group. some libraries are members of multiple groups. details are supplied below in table 3.

we identified individuals (n=165) based on their professional responsibilities and expertise using the following order and process:

1. individual job title contains some combination of the following words and/or phrases: content strategy, content specialist, content strategist, web content, web communications, digital communications, digital content
2. head of web department or department email
3. head of ux department or department email
4. head of it or department email

for institutions where a specific named individual could not be identified through a review of the organizational website, we identified a general email (e.g., libraries@state.edu) as the contact (n=43). a mailing list was created in mailchimp, and two campaigns were created: one for named individuals, and one for general contacts. only one response was requested per institution. (see appendix b: recruitment emails.)

the 165 named individuals, identified as described above, received a personalized email inviting them to participate in the study. the recruitment email explained the purpose of the study, advised potential participants of possible risks and their ability to withdraw at any time, and included a link to the survey. a separate email was sent to the 43 general contacts on the same day, explaining the purpose of the study and requesting that the recipient forward the communication to the appropriate person in the organization. this email also included information advising potential participants of possible risks and their ability to withdraw at any time, and a link to the survey. data was recorded directly by participants using qualtrics. the bulk of survey data does not include any personal information; we did not collect the names of institutions as part of our data collection, so identifying information is limited to information about institutional memberships.

for the group of named individuals, one email bounce was recorded. the open rate for personalized emails sent to named individuals was approximately 62% (88 of 142 successfully delivered emails were opened) and the survey link was followed 66 times. the general email group had a 51% open rate (n=22) with 11 clicks of the survey link. with recruitment occurring in april 2020, most individuals and institutions were at the height of switching to remote operations in light of the covid-19 pandemic. despite this, our open rates were considerably higher than average open rates as reported by mailchimp.22 as discussed below, we achieved our minimum response rate goal of 20%.
the survey included 16 questions; question topics and response counts are noted in table 2. informed consent was obtained as part of the first survey question. (see appendix c: survey questions and appendix d: informed consent document.) most questions were multiple-choice or short answer (i.e., a number). two questions required longer-form responses.

table 2. survey question topics and response count
question | topic | category | response count
1 | consent | — | 43
2 | organizational memberships | demographic | 40
3 | approx. # full-time employees | demographic | 41
4 | cms products used | infrastructure/organizational structure | 41
5 | primary cms | infrastructure/organizational structure | 39
6 | number of site editors | infrastructure/organizational structure | 39
7 | describe responsibility for content | infrastructure/organizational structure | 39
8 | existence of position(s) with primary duties of web content | infrastructure/organizational structure | 39
9 | titles of such positions, if any | infrastructure/organizational structure | 24
10 | familiar with web content strategy | content strategy practices | 36
11 | definition of web content strategy | content strategy practices | 32
12 | policies or documentation | content strategy practices | 35
13 | methods | content strategy practices | 37
14 | willing to be contacted | — | 37
15 | name | — | 27
16 | email | — | 26

information collected fell into the following three categories:
• demographics (estimated total number of employees; institutional memberships; estimated number of employees with website editing privileges)
• infrastructure and organizational structure (content management systems used to manage library-authored web content; system used to host the primary public-facing website; distribution of responsibility for website content; titles of positions, if any, whose primary responsibilities focus on web content)
• web content strategy practices (familiarity with the concept; personal definition; presence or absence of policy or documentation; evaluation methods regularly used)

upon completion of the survey questions, participants had the option to indicate that they would be willing to be contacted for an individual interview as part of planned future research on this topic. twenty-seven individuals (63%) opted in and provided us with their contact information.

findings

in sum, 43 responses were received, resulting in a response rate of 20.67%. because we did not collect names of individuals or institutions and used an anonymous link for our survey, we cannot determine the response rate by contact group (named individuals or general email).

demographic information

the bulk of responses came from association of research libraries members, but within-group response rates show that the proportion of responses from each group was relatively balanced within the overall 20% response rate.

table 3. distribution of survey contacts, responses, and response rates by group23
organization | member libraries contacted | responses | share of total responses (%) | group response rate (%)
association of research libraries | 117 | 26 | 50.98 | 22.22
big ten academic alliance | 15 | 5 | 9.80 | 33.33
greater western library alliance | 38 | 8 | 15.69 | 21.05
oberlin group | 80 | 12 | 23.53 | 15.00

infrastructure & organizational structure

content management systems

a variety of content management systems are used to manage library-authored web content (see table 4); libguides, wordpress, omeka, and drupal were most commonly used across the group.
other systems mentioned as write-in responses included acquia drupal, cascade, fedora-based systems, archivesspace, google sites, and "wiki and blog." one response stated, "most pages are just non-cms for the website." write-in responses for "other" and "proprietary system hosted by institution" were carried forward within the survey from question 3 to question 4 and are available in full in appendix e: other content management systems mentioned by respondents.

table 4. cms products used to manage library-authored web content
q3: cms products used | percentage (%) | count
libguides | 28.06 | 39
wordpress | 18.71 | 26
omeka | 15.11 | 21
drupal | 13.67 | 19
other | 9.35 | 13
sharepoint | 7.19 | 10
proprietary system hosted by institution | 7.19 | 10
adobe experience manager | 0.72 | 1
total | 100 | 139

for their primary library website, just under half of respondents relied on drupal (n=17, 43.59%). slightly fewer, just under 36% in total (n=14), selected the specific system they had shared as a write-in answer to the previous question, whether the institution's proprietary system or some other option. despite the widespread use reported in the previous question, only two respondents indicated that their primary website was hosted in libguides. (see table 5.)

table 5. cms used to host primary library website
q4: primary website cms | percentage (%) | count
drupal | 43.59 | 17
other (write-in answers) | 20.51 | 8
wordpress | 15.38 | 6
libguides | 5.13 | 2
proprietary system hosted by institution (write-in answers) | 15.38 | 6

dedicated positions, position titles, and organizational workflows

almost two-thirds of respondents (n=24, 61.5%) indicated there were position(s) within their library whose primary duties were focused on the creation, management, and/or editing of web content. a total of 52 position titles were shared (the full list of position titles can be found in appendix f). terms and phrases most commonly occurring across this set were web (15), librarian (15), user experience (10), and digital (8). explicitly content-focused terms appeared more rarely: content (6), communication/communications (5), and editor (1).

most respondents described collaborative workflows for web content management, in which a group of representatives or delegates collectively stewards website content (see table 6 for a summary and appendix f for full-text responses). collaborative concepts appeared 29 times, including terms like group (7), team (6), distributed (5), and committee (3); within this set, decentralized, inclusive, and cross-departmental each appeared once. similarly, within terms related to locus of control, the phrase "their own" appeared seven times. specifically assigned roles or responsibilities were mentioned 18 times, including terms like admin/administrator (6), manager (5), and editor/s or editorial (4). respondents discussed support structures such as training, guidance, or consulting five times. libguides were mentioned 14 times.

table 6. frequency of terms and phrases in free-text descriptions of website content management, grouped by the authors into concepts
• collaborative (29): group (7), team (6), distributed (5), committee (3), stakeholder (3), cross-departmental (1), decentralized (1), inclusive (1)
• assigned roles (18): admin/administrator (6), manager (5), editor/s (4), developer (3), product owner (2)
• locus of control (13): their own (7), review (3), oversight (3), representative (2), permission (1)
• support (5): training (2), guidance (2), consulting (1)
• libguides (14)
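the concept counts in table 6 come from the authors' manual grouping of terms found in the free-text responses; a rough automated analogue of that counting is sketched below. it is illustrative only: the concept-to-term mapping and the sample responses are hypothetical, and the naive substring matching does not reproduce the authors' coding.

```python
# rough, illustrative analogue of the term counting summarized in table 6:
# tally how often terms associated with each concept appear in free-text
# responses. the mapping and the responses below are hypothetical examples,
# and matching is naive substring matching rather than qualitative coding.
from collections import Counter

concept_terms = {
    "collaborative": ["group", "team", "distributed", "committee", "stakeholder"],
    "assigned roles": ["admin", "manager", "editor", "developer"],
    "locus of control": ["their own", "review", "oversight"],
    "support": ["training", "guidance", "consulting"],
}

responses = [
    "a cross-departmental team manages the site; librarians edit their own guides.",
    "the web group provides training and reviews new pages before publication.",
]

counts = Counter()
for text in responses:
    lowered = text.lower()
    for concept, terms in concept_terms.items():
        counts[concept] += sum(lowered.count(term) for term in terms)

for concept, n in counts.most_common():
    print(f"{concept}: {n}")
```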
over 60% of respondents indicated that 20 or fewer employees had editing privileges on the library website (see table 7). three respondents commented "too many" when citing the number or range: "too many! i think about five, but there could be more"; "too many, about 12"; "too many to count, maybe 20+."

table 7. distribution of the number of employees with website editing privileges
response | percentage (%) | count
less than five | 23.08 | 9
5–10 | 20.51 | 8
11–20 | 17.95 | 7
21–99 | 23.08 | 9
100–199 | 10.26 | 4
200+ | 2.56 | 1

the greatest variation in practice regarding how many employees had website editing privileges occurs in institutions with more than 100 total employees, where institutions reported within every available range (see table 8).

table 8. comparison of number of total employees and of number of employees with editing privileges
(rows: total employees; columns: employees with editing privileges)
number of employees | less than 5 | 5–10 | 11–20 | 21–99 | 100–199 | 200+
4–10 | 2 | — | — | — | — | —
11–25 | 3 | 1 | — | — | — | —
26–50 | — | 2 | 2 | — | — | —
51–99 | 1 | 1 | 4 | 1 | — | —
100+ | 3 | 4 | 2 | 8 | 4 | 1

web content strategy practices

almost all respondents (n=36, 83%) reported that they were familiar with the concept of web content strategy. conversely, only 20% (n=7) reported that their library had either a documented web content strategy or a web content governance policy. respondents were asked, optionally, to provide a definition of web content strategy in their own words, and we received 32 responses (see appendix g: definitions of web content strategy). we analyzed the free-text definitions of content strategy based on the five elements of halvorson's previously cited definition: planning, creation, delivery, governance, and ux. we first rated the definitions individually, then determined a mutually agreed rating for each. across the set, responses most commonly addressed concepts or activities related to planning and ux, and least commonly mentioned concepts or activities related to delivery (see table 9).

table 9. occurrence of content strategy elements in free-text definitions
element | count | percentage (%)
plan (intentional, strategic, brand, style, best practices) | 29 | 91
creation (workflows, structure, writing) | 20 | 63
delivery (findability, channels) | 13 | 41
governance (maintenance, lifecycle, measurement/evaluation) | 16 | 50
ux (needs of the user, relevant, current, clear, concise, in context) | 19 | 59

responses were scored on each of the five elements as follows: zero points, concept not mentioned; one point, some coverage of the concept; two points, thorough coverage of the concept. representative examples are provided in table 10. a perfect score for any individual definition would be 10. the median score across the group was four, and the average score was 3.4. we consider scores of three or less to indicate a basic level of practice; scores from four to seven, an intermediate level of practice; and scores of eight or above, an advanced level of practice.
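to make the rubric concrete, here is a minimal sketch of how a 0–2 rating per element could be totaled and classed using the thresholds just described; the element names follow halvorson's definition as used in this article, but the ratings in the example are hypothetical, not respondent data.

```python
# illustrative sketch of the scoring described above: each free-text definition
# is rated 0-2 on five elements (maximum total 10) and then classed by total.
from statistics import mean, median

ELEMENTS = ("plan", "creation", "delivery", "governance", "ux")

def total_score(ratings: dict) -> int:
    """sum the 0-2 ratings over the five elements (missing elements count as 0)."""
    return sum(ratings.get(element, 0) for element in ELEMENTS)

def practice_level(score: int) -> str:
    """class a total: three or less = basic, four to seven = intermediate, eight or above = advanced."""
    if score <= 3:
        return "basic"
    if score <= 7:
        return "intermediate"
    return "advanced"

# hypothetical ratings for three definitions (not respondent data)
ratings = [
    {"plan": 1},
    {"plan": 1, "creation": 1, "ux": 1},
    {"plan": 2, "creation": 1, "governance": 1, "ux": 2},
]

scores = [total_score(r) for r in ratings]
print(scores, [practice_level(s) for s in scores])  # [1, 3, 6] ['basic', 'basic', 'intermediate']
print("median:", median(scores), "mean:", round(mean(scores), 2))
```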
of the 32 responses to the free-text definition question, one contained no usable data; 14 were classed as basic, 17 as intermediate, and none as advanced.

table 10. example showing scoring of four representative free-text definitions provided by respondents (element descriptors as given in table 9)
• "intentional and coordinated vision for content on the website." plan 1, creation 0, delivery 0, governance 0, ux 0; total score 1
• "an overarching method of bringing user experience best practices together on the website including heuristics, information architecture, and writing for the web." plan 1, creation 1, delivery 0, governance 0, ux 1; total score 3
• "strategies for management of content over its entire lifecycle to ensure it is accurate, timely, usable, accessible, appropriate, findable, and well-organized." plan 1, creation 0, delivery 1, governance 1, ux 1; total score 4
• "the process of creating and enacting a vision for the organization and display of web content so that it is user friendly, accurate, up-to-date, and effective in its message. web content strategy often involves considering the thoughts and needs of many stakeholders, and creating one cohesive voice to represent them all." plan 2, creation 1, delivery 0, governance 1, ux 2; total score 6

respondents reported most frequent use of practices associated with web librarianship and user experience work: analysis of usage data (n=36) and usability testing (n=28) (see fig. 1). content-specific methods were less commonly used overall.

figure 1. frequency of reported usage of analysis and evaluation methods

the five "other" responses mainly clarified or qualified the selections, although some added additional information, for example:

"at this time, all library websites use a standard template, so they have the same look and feel. beyond that everything else is 'catch as catch can' because we do not have a web services librarian, nor are we likely to get that dedicated position any time soon, given the recent covid-19 financial upheaval."

brand guidelines, accessibility guidance, and personal responsibility were also mentioned.

discussion

the targeted recruitment methodology and survey, combining demographic and practice-based questions, aspired to collect data suitable to generate a snapshot of how web content strategy work is being undertaken in academic libraries at this time, as well as the depth and breadth of that practice. we were struck by several contrasts in the findings: first and foremost, the 80–20 inversion across responses related to knowledge of web content strategy versus its practice. this was particularly notable in combination with respondents' reports that, in nearly two-thirds of organizations, one or more positions exist with primary duties focused on the creation, management, and/or editing of web content.
the influence of ux thinking and methods in academic libraries is visible in the frequency of respondents' reported use of general and established ux practices for maintaining the primary website (e.g., usability testing). the other four elements of halvorson's definition were less thoroughly covered, both in the provided definitions of web content strategy and in the methods reported. some respondents mentioned use of methods such as content audits or inventories and style guides, but many fewer reported reliance on review checklists, content calendars, and readability scores.

in reviewing the self-reported definitions of content strategy for evidence of each of the five elements of halvorson's previously discussed definition, trends in the findings suggest higher levels of maturity in the elements of planning, creation, and ux, and lower levels in the elements of delivery and governance. nearly all respondents (91%) referenced the element of planning. almost two-thirds mentioned concepts or practices related to creation, and approximately 60% of respondents referenced usability of content or a focus on the user in some capacity. only half made mention of governance (including maintenance and evaluation), and even fewer (41%) referenced delivery, whether considering content channels or findability; in fact, no single definition touched on both. overall, the results of the analysis of provided definitions (discussed in the previous section) suggest that at present, web content strategy as a community of practice in academic libraries is operating at, or just above, a basic level.

proposed maturity model

from these findings, and referencing the structure of the cmmi institute five-stage maturity model, the authors propose the following content strategy maturity model for academic libraries. as previously noted in our findings, we assess the web content strategy community of practice in academic libraries as operating at, or just above, a basic level. to align the proposed maturity model with the definition scores, we applied the 10-point rating scale for provided definitions to the five levels by assigning two points per level, so that a score of one or two is equivalent to level 1, a score of three or four to level 2, and so on (table 11).
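assuming the two-points-per-level alignment just described, the mapping from a 1–10 definition score to a maturity level can be expressed as a small lookup; the sketch below uses the level names of the model presented in the next section, and the function name is ours, not part of the published model.

```python
# illustrative sketch of the alignment in table 11: two definition-score points
# per maturity level, so scores 1-2 map to level 1, 3-4 to level 2, and so on.
# level names follow the content strategy maturity model for academic libraries
# presented below; the function name is ours.
import math

LEVEL_NAMES = {1: "ad hoc", 2: "establishing", 3: "scaling", 4: "sustaining", 5: "thriving"}

def maturity_level(definition_score: int) -> tuple[int, str]:
    """map a 1-10 definition score onto the five-level model."""
    if not 1 <= definition_score <= 10:
        raise ValueError("definition scores range from 1 to 10")
    level = math.ceil(definition_score / 2)
    return level, LEVEL_NAMES[level]

print(maturity_level(4))   # (2, 'establishing')  -- the group median score
print(maturity_level(8))   # (4, 'sustaining')
```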
table 11. comparison of maturity model with definition rating scale and maturity assessment
maturity model level | definition score | assessment
level 1 | 1 | basic
level 1 | 2 | basic
level 2 | 3 | basic
level 2 | 4 | intermediate
level 3 | 5 | intermediate
level 3 | 6 | intermediate
level 4 | 7 | intermediate
level 4 | 8 | advanced
level 5 | 9 | advanced
level 5 | 10 | advanced

content strategy maturity model for academic libraries

level 1: ad hoc
• no planning or governance
• creation and delivery are reactive, distributed, and potentially chaotic
• no or minimal consideration of ux

level 2: establishing
• some planning and evidence of strategy, such as use of content audits and creation of a style guide; may be localized within specific groups or units
• basic coordination of content creation workflows
• delivery workflows not explicitly addressed, or remain haphazard
• no or minimal organization-wide governance structures or documentation in place; may be localized within specific groups or units
• evidence of active consideration of ux in creation and structure of content

level 3: scaling
• intentional and proactive planning coordinated across multiple units
• basic content creation workflows in place across organization
• delivery considered, but may not be consistent or strategic
• ad hoc evaluation through usage data and usability testing; organization-wide governance documents and workflows may be at a foundational level
• consideration of ux is integral to process of creating useful, usable content
• web content creation and maintenance is assigned at least partly to a permanent position with some level of authority and responsibility for the primary website

level 4: sustaining
• alignment in planning, able to respond to organizational priorities; style guidelines and best practices widely accepted
• established and accepted workflows for content creation are coordinated through a person, department, team, or other governing body
• delivery includes strategic and consistent use of channels, as well as consideration of findability
• regular and strategic evaluation occurs; proactive maintenance and retirement practices in place; managed through established governance documents and workflows
• web content strategy explicitly assigned partly or fully to a permanent position

level 5: thriving
• full lifecycle of content (planning, creation, delivery, maintenance, retirement) managed in coordination across all library-authored web content platforms
• governance established and accepted throughout the organization, including documented policies, procedures, and accountability
• basic understanding of content strategy concepts and importance across the organization
• overall stable, flexible, agile, responsive, user-centered and focused on continuous improvement

as previously mentioned, the median score across the group was four, and the average score was 3.4; these measures suggest that the majority of survey respondents' organizational web content strategy maturity levels would currently stand at level 2 or 3, with a few at level 1.

conclusion

the findings of this survey and assessment, while inherently limited, suggest that web content strategy is currently not a pervasive factor for academic libraries and academic web librarians in the development and implementation of actions, policies, and practices related to website creation, maintenance, and evaluation.
we have proposed a measure for self-estimating the maturity of web content strategy practice in academic libraries. our content strategy maturity model for academic libraries, while grounded both in industry best practices and in evidence from practitioners in academic libraries, is nonetheless a work in progress. we intend to further develop and strengthen the model through follow-up interviews with practitioners, drawing on those survey respondents who opted in to being contacted. interviewees will be invited to discuss their work within and outside the frame of the proposed maturity model and to provide feedback on the model itself, with the ultimate goal of better understanding web content strategy practice in academic libraries and the needs of its community of practice.

endnotes

1 courtney mcdonald and heidi burkhardt, "library-authored web content and the need for content strategy," information technology and libraries 38, no. 3 (september 15, 2019): 8–21, https://doi.org/10.6017/ital.v38i3.11015.
2 mcdonald and burkhardt, 14.
3 mcdonald and burkhardt, 16.
4 "cmmi levels of capability and performance," sec. maturity levels, cmmi institute llc, accessed may 28, 2020, https://cmmiinstitute.com/learning/appraisals/levels.
5 "about cmmi institute," cmmi institute llc, accessed may 28, 2020, https://cmmiinstitute.com/company.
6 "cmmi levels of capability and performance," sec. maturity levels.
7 scott w. h. young, zoe chao, and adam chandler, "user experience methods and maturity in academic libraries," information technology and libraries 39, no. 1 (march 16, 2020): 2, https://doi.org/10.6017/ital.v39i1.11787.
8 coral sheldon-hess, "ux, consideration, and a cmmi-based model," para. 6, july 25, 2013, http://www.sheldon-hess.org/coral/2013/07/ux-consideration-cmmi/.
9 sheldon-hess, "ux, consideration, and a cmmi-based model," para. 2, http://www.sheldon-hess.org/coral/2013/07/ux-consideration-cmmi/.
10 craig m. macdonald, "'it takes a village': on ux librarianship and building ux capacity in libraries," journal of library administration 57, no. 2 (february 17, 2017): 196, https://doi.org/10.1080/01930826.2016.1232942.
11 macdonald, 212.
12 skunk works is trademarked by lockheed martin corporation, but is informally used to describe an experimental, sometimes secret, research and development group focused on agile innovation.
13 young, chao, and chandler, "user experience methods and maturity in academic libraries," 19.
14 young, chao, and chandler, 23.
15 mcdonald and burkhardt, "library-authored web content and the need for content strategy," 15–16.
16 kristina halvorson, content strategy for the web, 2nd ed. (berkeley, ca: new riders, 2012), 28.
17 colleen jones, "a content operations maturity model," sec. a maturity model for content operations, gathercontent (blog), november 30, 2018, https://gathercontent.com/blog/content-operations-model-of-maturity.
18 "understanding the content maturity model," content strategy inc.
(blog), march 2016, https://www.contentstrategyinc.com/understanding-content-maturity-model/. 19 jones, “a content operations maturity model,” sec. a maturity model for content operations. 20 zoë randolph, “where do you fall on the content operations maturity model?,” sec. the content operations maturity model, kapost blog (blog), april 20, 2020, https://kapost.com/b/content-operations-maturity-model/. 21 tracy playle, “ten pillars for getting the most of your content: how is your university doing?,” pickle jar communications (blog), september 29, 2017, http://www.picklejarcommunications.com/2017/09/29/content-strategy-benchmarking/. 22 “email marketing benchmarks by industry,” mailchimp, accessed june 15, 2020, https://mailchimp.com/resources/email-marketing-benchmarks/. 23 some libraries are members of multiple groups. https://doi.org/10.1080/01930826.2016.1232942 https://gathercontent.com/blog/content-operations-model-of-maturity https://www.contentstrategyinc.com/understanding-content-maturity-model/ https://kapost.com/b/content-operations-maturity-model/ http://www.picklejarcommunications.com/2017/09/29/content-strategy-benchmarking/ https://mailchimp.com/resources/email-marketing-benchmarks/ information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 19 appendices appendix a: institution list appendix b: recruitment emails appendix c: survey questions appendix d: informed consent document appendix e: other content management systems mentioned by respondents appendix f: organizational responsibility for content; and position titles appendix g: definitions of web content strategy information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 20 appendix a: institution list institution membership(s) agnes scott college oberlin group alabama arl alberta arl albion college oberlin group alma college oberlin group amherst college oberlin group arizona arl, gwla arizona state arl, gwla arkansas gwla auburn arl augustana college oberlin group austin college oberlin group bard college oberlin group barnard college oberlin group bates college oberlin group baylor gwla beloit college oberlin group berea college oberlin group boston arl boston college arl boston public library arl bowdoin college oberlin group brigham young arl, gwla british columbia arl brown arl bryn mawr college oberlin group bucknell university oberlin group calgary arl california, berkeley arl california, davis arl california, irvine arl california, los angeles arl information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 21 institution membership(s) california, riverside arl california, san diego arl california, santa barbara arl carleton college oberlin group case western reserve arl chicago arl, btaa cincinnati arl claremont colleges gwla, oberlin group clark university oberlin group coe college oberlin group colby college oberlin group colgate university oberlin group college of the holy cross oberlin group college of wooster oberlin group colorado arl, gwla colorado college oberlin group colorado state arl, gwla columbia arl connecticut arl connecticut college oberlin group cornell arl dartmouth arl davidson college oberlin group delaware arl, gwla denison university oberlin group denver gwla depauw university oberlin group dickinson college oberlin group drew university oberlin group duke arl earlham 
college oberlin group eckerd college oberlin group emory arl information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 22 institution membership(s) florida arl florida state arl franklin & marshall college oberlin group furman university oberlin group george washington arl georgetown arl georgia arl georgia tech arl gettysburg college oberlin group grinnell college oberlin group guelph arl gustavus adolphus college oberlin group hamilton college oberlin group harvard arl haverford college oberlin group hawaii arl hope college oberlin group houston arl, gwla howard arl illinois, chicago arl, gwla illinois, urbana arl, btaa indiana arl, btaa iowa arl, btaa iowa state arl, gwla johns hopkins arl kalamazoo college oberlin group kansas arl, gwla kansas state gwla kent state arl kentucky arl kenyon college oberlin group knox college oberlin group lafayette college oberlin group information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 23 institution membership(s) lake forest college oberlin group laval arl lawrence university oberlin group library of congress arl louisiana state arl louisville arl macalester college oberlin group manhattan college oberlin group manitoba arl maryland arl, btaa massachusetts arl mcgill arl mcmaster arl miami arl michigan arl, btaa michigan state arl, btaa middlebury college oberlin group mills college oberlin group minnesota arl, btaa missouri arl, gwla mit arl morehouse/spelman colleges (auc) oberlin group mount holyoke college oberlin group nebraska arl, btaa nevada las vegas gwla new mexico arl, gwla new york arl north carolina arl north carolina state arl northwestern arl, btaa notre dame arl oberlin college oberlin group occidental college oberlin group information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 24 institution membership(s) ohio arl ohio state arl, btaa ohio wesleyan university oberlin group oklahoma arl, gwla oklahoma state arl, gwla oregon arl, gwla oregon state gwla ottawa arl pennsylvania arl pennsylvania state arl, btaa pittsburgh arl princeton arl purdue arl, btaa queen's arl randolph-macon college oberlin group reed college oberlin group rhodes college oberlin group rice arl, gwla rochester arl rollins college oberlin group rutgers arl, btaa sarah lawrence college oberlin group saskatchewan arl sewanee: the university of the south oberlin group simmons university oberlin group simon fraser arl skidmore college oberlin group smith college oberlin group south carolina arl southern california arl, gwla southern illinois arl, gwla southern methodist gwla st. john's university / college of st. benedict oberlin group information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 25 institution membership(s) st. lawrence university oberlin group st. 
olaf college oberlin group suny-albany arl suny-buffalo arl suny-stony brook arl swarthmore college oberlin group syracuse arl temple arl tennessee arl texas arl, gwla texas a&m arl, gwla texas state gwla texas tech arl, gwla toronto arl trinity college oberlin group trinity university oberlin group tulane arl union college oberlin group utah arl, gwla utah state gwla vanderbilt arl vassar college oberlin group virginia arl virginia commonwealth arl virginia tech arl wabash college oberlin group washington arl, gwla washington and lee university oberlin group washington state arl, gwla washington u.-st. louis arl, gwla waterloo arl wayne state arl, gwla wellesley college oberlin group information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 26 institution membership(s) wesleyan university oberlin group west virginia gwla western arl wheaton college oberlin group whitman college oberlin group whittier college oberlin group willamette university oberlin group williams college oberlin group wisconsin arl, btaa wyoming gwla yale arl york arl information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 27 appendix b: recruitment emails recruitment email: named recipients this message is intended for *|mmerge6|* dear *|fname|*, we are writing today to ask for your participation in a research project “content strategy in practice within academic libraries,” (cu boulder irb protocol #18-0670), led by co-investigators courtney mcdonald and heidi burkhardt (university of michigan). we have provided the information below as a downloadable pdf should you wish to keep it for your records. the purpose of the study is to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding nonacademic member institutions): association of research libraries, big ten academic alliance, greater western library alliance, and/or the oberlin group. if you opt to participate, we expect that you will be in this research study for the duration of the time it takes to complete our web-based survey. you will not be paid to be in this study. whether or not you take part in this research is your choice. you can leave the research at any time and it will not be held against you. we expect about 210 people, representing their institutions, in the entire study internationally. this survey will be available over a four-week period in the spring of 2020, through friday, may 1. ** confidentiality ----------------------------------------------------------- information obtained about you for this study will be kept confidential to the extent allowed by law. research information that identifies you may be shared with the university of colorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 28 research protections. 
the information from this research may be published for scientific purposes; however, your identity will not be given out. ** questions ----------------------------------------------------------- if you have questions, concerns, or complaints, or think the research has hurt you, contact the research team at crmcdonald@colorado.edu. this research has been reviewed and approved by an irb. you may talk to them at (303) 735 3702 or irbadmin@colorado.edu if: * your questions, concerns, or complaints are not being answered by the research team. * you cannot reach the research team. * you want to talk to someone besides the research team. * you have questions about your rights as a research subject. * you want to get information or provide input about this research. thank you for your consideration, courtney mcdonald crmcdonald@colorado.edu heidi burkhardt heidisb@umich.edu ============================================================ not interested in participating? you can ** unsubscribe from this list (*|unsub|*). this email was sent to *|email|* (mailto:*|email|*) why did i get this? (*|about_list|*) unsubscribe from this list (*|unsub|*) update subscription preferences (*|update_profile|*) information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 29 recruitment email: named recipients dear library colleague, we are writing today to ask for your participation in a research project “content strategy in practice within academic libraries,” (cu boulder irb protocol #18-0670), led by co-investigators courtney mcdonald and heidi burkhardt (university of michigan). our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding non academic member institutions): association of research libraries, big ten academic alliance, greater western library alliance, and/or the oberlin group. we ask that you forward this message to the person in your organization whose role includes oversight of your public web site. we are only requesting a response from one person at each institution contacted. thank you for your assistance in routing this request. we have provided the information below as a downloadable pdf should you wish to keep it for your records. the purpose of the study is to establish an understanding of the degree of institutio nal engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. if someone within your library opts to participate, we expect that person will be in this research study for the duration of the time it takes to complete our web-based survey. the participant will not be paid to be in this study. whether or not someone in your library takes part in this research is an individual choice. the participant can leave the research at any time and it will not be held against them. we expect about 210 people, representing their institutions, in the entire study internationally. this survey will be available over a four-week period in the spring of 2020, through friday, may 1. ** confidentiality ----------------------------------------------------------- information obtained about you for this study will be kept confidential to the extent allowed by law. 
research information that identifies you may be shared with the university of co lorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human research protections. the information from this research may be published for scientific purposes; however, your identity will not be given out. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 30 ** questions ----------------------------------------------------------- if you have questions, concerns, or complaints, or think the research has hurt you, contact the research team at crmcdonald@colorado.edu. this research has been reviewed and approved by an irb. you may talk to them at (303) 735 3702 or irbadmin@colorado.edu if: * your questions, concerns, or complaints are not being answered by the research team. * you cannot reach the research team. * you want to talk to someone besides the research team. * you have questions about your rights as a research subject. * you want to get information or provide input about this research. thank you for your consideration, courtney mcdonald crmcdonald@colorado.edu heidi burkhardt heidisb@umich.edu ============================================================ not interested in participating? you can ** unsubscribe from this list (*|unsub|*). this email was sent to *|email|* (mailto:*|email|*) why did i get this? (*|about_list|*) unsubscribe from this list (*|unsub|*) update subscription preferences (*|update_profile|*) information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 31 appendix c: survey questions web content strategy methods and maturity start of block: introduction q1 web content strategy methods and maturity in academic libraries (cu boulder irb protocol #20-0581) purpose of the study the purpose of the study is to gather feedback from practitioners on the proposed content strategy maturity model for academic libraries, and to further enhance our understanding of web content strategy practice in academic libraries and the needs of its community of practice. q2 please make a selection below, in lieu of your signature, to document that you h ave read and understand the consent form, and voluntarily agree to take part in this research. o yes, i consent to take part in this research. (1) o no, i do not grant my consent to take part in this research. (2) skip to: end of survey if q2 = no, i do not grant my consent to take part in this research. end of block: introduction start of block: demographic information information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 32 q3 estimated total number of employees (fte) at your library organization: o less than five (12) o 5-10 (13) o 11-20 (14) o 21-99 (15) o 100-199 (16) o 200+ (17) q4 estimated number of employees with editing privileges within your primary library website: o less than five (12) o 5-10 (13) o 11-20 (14) o 21-99 (15) o 100-199 (16) o 200+ (17) q5 does your library have a documented web content strategy and / or a web content governance policy? 
o no (1) o yes (2) information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 33 q6 are there position(s) within your library whose primary duties are focused on creation, management, and/or editing of web content? o no (1) o yes, including myself (2) o yes, not including myself (3) end of block: demographic information start of block: web content strategy q7 please indicate the degree to which each of the five elements of content strategy are currently in practice at your library. q8 creation employ editorial workflows, consider content structure, support writing. definitely true (48) somewhat true (49) somewhat false (50) definitely false (51) this is currently in practice at my institution. (1) o o o o information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 34 q9 delivery consider findability, discoverability, and search engine optimization, plus choice of content platform or channels. definitely true (48) somewhat true (49) somewhat false (50) definitely false (51) this is currently in practice at my institution. (1) o o o o q10 governance support maintenance and lifecycle of content, as well as measurement and evaluation. definitely true (31) somewhat true (32) somewhat false (33) definitely false (34) this is currently in practice at my institution. (1) o o o o q11 planning use an intentional and strategic approach, including brand, style, and writing best practices. definitely true (31) somewhat true (32) somewhat false (33) definitely false (34) this is currently in practice at my institution. (1) o o o o information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 35 q12 user experience consider needs of the user to produce relevant, current, clear, concise, and in context. definitely true (31) somewhat true (32) somewhat false (33) definitely false (34) this is currently in practice at my institution. (1) o o o o q13 please rank the elements of content strategy (as defined above) in order of their priority based on your observations of practice in your library. • ______ creation (1) • ______ delivery (2) • ______ governance (3) • ______ planning (4) • ______ user experience (5) q14 how would you assess the content strategy maturity of your organization? o basic (1) o intermediate (2) o advanced (3) end of block: web content strategy information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 36 start of block: thank you! q15 your name: ________________________________________________________________ q16 thank you very much for your willingness to be interviewed as part of our research study. prior to continuing on to finalize your survey submission, please sign up for an interview time: [link] (this link will open in a new window in order to allow you to finalize and submit your survey response after scheduling an appointment) please contact courtney mcdonald, crmcdonald@colorado.edu, if you experience any difficulty in registering or if there is not a time available that works for your schedule. end of block: thank you! 
information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 37 appendix d: informed consent document permission to take part in a human research study page 37 of 28 title of research study: content strategy in practice within academic libraries irb protocol number: 18-0670 investigators: courtney mcdonald and heidi burkhardt purpose of the study the purpose of the study is to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding nonacademic member institutions): association of research libraries, big ten academic alliance, and/or greater western library alliance. we expect that you will be in this research study for the duration of the time it takes to complete our web-based survey. we expect about 210 people, representing their institutions, in the entire study internationally. explanation of procedures we are directly contacting each library to request that the appropriate individual(s) complete a web-based survey. this survey will be available over a four-week period in the spring of 2020. voluntary participation and withdrawal whether or not you take part in this research is your choice. you can leave the research at any time and it will not be held against you. the person in charge of the research study can remove you from the research study without your approval. possible reasons for removal include an incomplete survey submission. confidentiality information obtained about you for this study will be kept confidential to the extent allowed by law. research information that identifies you may be shared with the university of colorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human research protections. the information from this research may be published for scientific purposes; however, your identity will not be given out. payment for participation you will not be paid to be in this study. contact for future studies we would like to keep your contact information on file so we can notify you if we have future research studies we think you may be interested in. this information will be used by only th e information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 38 principal investigator of this study and only for this purpose. you can opt-in to provide your contact information at the end of the online survey. questions if you have questions, concerns, or complaints, or think the research has hurt you, contact to the research team at crmcdonald@colorado.edu this research has been reviewed and approved by an irb. you may talk to them at (303) 7353702 or irbadmin@colorado.edu if: • your questions, concerns, or complaints are not being answered by the research team. • you cannot reach the research team. • you want to talk to someone besides the research team. • you have questions about your rights as a research subject. • you want to get information or provide input about this research. 
signatures in lieu of your signature, your acknowledgement of this statement in the online survey document documents your permission to take part in this research. mailto:crmcdonald@colorado.edu mailto:irbadmin@colorado.edu information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 39 appendix e: other content management systems mentioned by respondents question #4: which of the following content management systems does your library use to manage library-authored web content? write-in responses for ‘proprietary system hosted by institution’ ● xxxxxxxxxxx • archivesspace • pressbooks • preservica • hippo cms • siteleaf • cascade • dotcms • terminal four • acquia drupal • fedora based digital collections system built in house write-in responses for ‘other” • wiki and blog • we draft content in google docs & also use gather content for auditing. • google sites • cascade • ebsco stacks • modx • islandora and online journal system • contentful • we also have some in-house-built tools such as for room booking; some of these are quite old and we would like to upgrade or improve them when we can. (very few people can make edits in these tools.) • cascade • the majority of the library website (and university website) is managed by a locally developed cms; however, the university is in the process of migrating to the acquia drupal cms. • blacklight, vivo, fedora • most pages are just non-cms for the website information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 40 appendix f: organizational responsibility for content; and position titles question 6 please explain how your organization distributes responsibility for content hosted in your content management system(s). if different parties (individuals, departments, collaborative groups) are responsible for managing content in different platforms please describe. • we have one primary website manager who oversees the management of the website, including content strategy and editing, and 2 editors who assist with small editing tasks. • we have content editors that edit content for individual libraries and collections. there is a content creator network managed by library communications. they provide trainings and guidance for content editors and act as reviewers, but not every single thing gets reviewed. • we have a team of developers and product owners who are responsible for managing web content. • we currently have a very distributed model, where virtually any library staff member or student assistant can request a drupal account and then make changes to existing content or develop new pages. we have a cross-departmental team that oversees the libraries' web interfaces and makes decisions about library homepage content, the menu navigation, overall ia, etc. we have web content guidelines to help staff as they develop new content. we have identified functional and technical owners for each of our cmss and have slightly different processes for managing content in those cmss. our general approach, however, is very inclusive (for better or worse ;) )-lots of staff have access to creating and editing content. we are, however, moving to a less distributed content for drupal in particular. moving forward, we'll have a small team responsible for editing and developing new content. this is to ensure that content is more consistent and user-centered. 
we attempted to identify funding for a full-time content manager but were unsuccessful, so this team will attempt to fill the role of a full-time content manager. • ux is the product owner and admin. if staff want content added to the website, they send a request to ux, we structure and edit content in a google doc, and then ux posts to the website. • there's no method for how or why responsibility is distributed. it ends up being something like, someone wants to add some content, they get editing access, they can now edit anything for as long as they're at the library. we are a super decentralized and informal library. • the primary content managers are the xxxxxx librarian and the xxxxxx. other individuals (primarily librarians) that are interested in editing their content have access on our development server. their edits are vetted by the xxxxxxand/or the xxxxxx librarian before being moved into production. • the xxxxxx department (6 staff) manages content and helps staff throughout the organization create and maintain content. ux staff sometimes teach others how to manage content, and sometimes do it for them. if design or content is complex, usually ux staff do the work. many staff don't maintain any content beyond their staff pages. subject specialists and instruction librarians maintain content [like] libguides-like content, but we don't use libguides. branch library staff maintain most of the content for their library pages. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 41 • in addition, the xxxxxx manages the catalog. the xxxxxx department manages special web projects. and the xxxxxx department manages social media, publications, and news. • a web content team made up of two administrators and librarians from xxxxxx and xxxxxx makes executive-level decisions about web content. • the xxxxxx team (xxxxxx) provides oversight and consulting for online user interfaces chaired by a xxxxxxposition which is new and is not yet filled. • for the public website, content editing is distributed to many groups and teams throughout the libraries. • the xxxxxxteam manages the main portions of the site including the homepage, news, maps, calendars, etc. the research librarians and subject liaisons manage the research guides. the xxxxxx provides guidance regarding overall responsibilities and style guidelines. • site structure and top-level pages for our main website resides with xxxxxx. page content is generally distributed to the departments closest to the services described by the pages. • right now editing of pages is distributed to those individuals who have the closest relationship to the pages being edited, with a significantly smaller number of people having administrative access to all of the libraries' websites. • primary website is co-managed by xxxxxx team (4 people) and xxxxxx team (3 people). xxxxxxteam creates timely content about news/events/initiatives while xxxxxx team manages content on evergreen topics. • research librarians and staff manage libguides content, which is in sore need of an inventory and pruning. • primarily me, plus two colleagues who serve with me as a web editorial board • one librarian manages the content and makes changes based on requests from other library staff • my role (xxxxxx) is xxxxxx. we also have a web content creator in our xxxxxx. 
i chair our xxxxxxgroup (xxxxxx), which has representatives from each division in the library and they are the primary stewards of supporting library authored web content. our "speciality" platforms (libguides, omeka, and wordpress for microsites) all have service leads, but content is managed by the respective stakeholders. the lead for libguides is a xxxxxx [group] member due to its scope and scale. in our primary website, we are currently structured around drupal organic groups for content management with xxxxxx [group] having broad editing access. in our new website, all content management will go through the xxxxxx, with communications for support and dynamic content (homepage, news, events) management. • management is somewhat in flux right now. we recently migrated our main web site to acquia drupal; there is a very new small committee consisting of xxxxxx, and three representatives from elsewhere in the library. for libguides, all reference, instructio n, and subject librarians can edit their own guides; the xxxxxx has tended to have final oversight but i don't know if this has ever been formally delegated. • librarians manage their own libguides subject guides; several members of xxxxxx can make administrative changes to coding, certificates, etc. on the entire site; there are individuals in different departments who control their own pages/libguides. there is a group within the library that administers wordpress for the institution. other content systems are administered by individuals within the library. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 42 • librarians are responsible for their own libguides. the xxxxxx department manages changes to most content, although some staff do manage their own wordpress content. they tend not to want to. • individuals. mainly one person authors content. the other individual has created some research guides. • individuals in different positions and departments within the library are assigned roles based on the type of content they frequently need to edit. • for instance, xxxxxx staff have the ability to create and edit exhibition content in drupal. xxxxxx staff and xxxxxx staff have the ability to create and edit equipment content. the event coordinator and librarians and staff involved in instruction are allowed to create and edit event and workshop listings. • only the communication coordinator is permitted to create news items that occupy real estate on the home page and various service point home pages. • as for general content, the primary internal stakeholders for that content typically create and edit that content, but if any staff notice a typo or factual error they are encouraged to correct them on their own, although they can also submit a request to the it department if they are not comfortable doing so. • subject specific content is hosted in libguides, and is maintained by subject liaison librarians. other content in libguides, software tutorials or information related to electronic resources for example, is created and maintained by appropriate specialists. • the drupal site when launched had internal stakeholders explicitly defined for each page, and only staff from the appropriate group could edit that content (e.g. if xxxxxx was tagged as the only stakeholder for a page about xxxxxx policies, then only staff from the xxxxxx department with editing privileges could edit that page). 
this system was abandoned after about two years as it was considered too much overhead to maintain and also the introduction of a content revisioning module that kept a history of edits alleviated fears of malicious editing. • individuals are assigned pages to keep content updated. the xxxxxx is responsible for coordinating with those staff and offers training to make sure content gets updated. • individual liaison librarians are responsible for their own libguides. i and the "xxxxxx" are the primary editors of the wordpress site, although 4 others have editing access (an employee who writes and posts news articles, the liaison librarian who spearheaded our new video tutorials, and two who work in special collections to update finding aids on that site, which is still on wordpress and i would consider under the main libraries web page, but is part of a multisite installation.) • in omeka and libguides, librarians are pretty self-sufficient and responsible for all of their own content. the three or four digital projects faculty and staff who work with omeka manage it internally alongside one of our developers. our omeka instance is relatively small-scale. • i (xxxxxx) oversee our libguides environment. while i am in the process of creating and implementing formal libguides content and structure guidelines, as of now it's a bit of a free-for-all with everyone responsible for the content pertaining to their own liaison department(s). content is made available to patrons via automatically populating legacy landing pages (we've had libguides for a decade and i've been with the institution not yet a year). information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 43 • as the xxxxxx, i am ultimately responsible for almost all of the content in our wordpress environment. that said, i try to distribute content upkeep responsibilities to the relevant department for each piece of the site. managers and committee chairs provide me with what they want on the web, and as needed (and in consultation with them) i review/rewrite it for the web (plain language), develop information architecture, design the front-end, and accessibly publish the content. there are only a few faculty and staff at my library who are comfortable with wordpress -but one of my long-term goals is to empower more folks to enact their own minor edits (e.g., updating hours, lending policies, etc.) while i oversee large-scale content creation, overall architecture, and strategy. we have a blog portion of our wordpress site which is not managed by anyone in particular, but i tend to clean it up if things go awry. • generally all of our web authors *can* publish to most parts of the site. (a very few content types (mostly featured images that display on home pages) can be edited only by admins and a small number of super-users.) however the great majority of people who can post content very rarely do (and some never do). some edit or post only to specific blogs, some only to their own guides or to very specific pages or suites of pages (e.g. liaison librarians to their own guides; thesis assistant to thesis pages). our small group in xxxxxx reviews new and updated pages and edits for in-house style and usability guidelines, and also trains and works collaboratively with web authors to create more usable content and reduce duplication -but given the large number of authors (with varied priorities, skills, and preferences) and pages we have trouble keeping up. 
we also more actively manage content on home pages. • for the main website and intranet, we have areas broken apart by unit area. we use workbench access to determine who can edit which pages. libguides is managed by committee, but most of the librarians have access. proprietary systems have separate accounts for those who need access. • for libguides, librarians can create content as they like, though there is a group that provides some (light) oversight. for main library website, most content is overseen by departments (in practice, one person each from a handful of “areas”, such as the branches, access services, etc.). • dotcms is primarily managed in systems (2 staff), with delegates from admin and outreach allowed to make limited changes to achieve their goals. libguides is used by all librarians and several staff, with six people given admin privileges. wordpress is used only in special collections. • xxxxxx dept manages major public facing platforms (drupal, wordpress, and shares libguides responsibilities with xxxxxx dept). xxxxxx manages omeka. within platforms, responsibilities are largely managed by department with individuals assigned content duties & permissions as needed. • different units maintain their content; one unit has overall management and checks for uniformity, needed updates, and broken links. • developers/communications office oversees some aspects, library management, research and collections librarians, and key staff edit other pieces. • currently, content is maintained by the xxxxxx librarian in coordination with content stakeholders from around the organization. we are in the process of migrating our site from drupal to omniupdate. once that is complete, we will develop a new model for content responsibilities. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 44 • content is provided by department/services. • 5 librarians manage the libguides question 9 titles of positions in your organization whose primary duties involve creation, management and/or editing of web content: • head of web services; developer; web designer; user experience librarian • user experience librarian, lead librarian for discovery systems, digital technologies development librarian, lead librarian for software development. and we have titles that are university system it titles that don't mean a whole lot, such as technology support specialist and business and technology applications analyst. • web content specialist • user experience strategist, user experience designer, user experience student assistants , director of marketing communications and events • sr. ux specialist • web support consultant; coordinator, web services & library technology • editor & content strategist in library communications • web manager • discovery & systems librarian • head of library systems and technology • web services and data librarian • communications manager • web content and user experience specialist • metadata and discovery systems librarian, systems analyst, outreach librarian • digital services librarian; manager, communication services; communication specialist • (1) web project manager and content strategist, (2) web content creator • web services librarian • web developer ii • sr. 
software engineer, program director for digital services • user experience librarian • digital initiatives & scholarly communication librarian; senior library associate in digital scholarship and services • web services and usability librarian • senior library specialist -web content • web developer, software development librarian information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 45 appendix g: definitions of web content strategy question 11 in your own words, please define web content strategy. • a cohesive plan to create an overall strategy for web content that includes tone, terminology, structure, and deployment to best communicate the institution's message and enable the user. for the next question, the true answer is sort of. we have the start of a style guide. we also have the university's branding policies. we also have a web governance committee that is university-wide, of which i'm a part of. however, we don't have a complete strategy and it is certainly not documented. so you pick. • planning, development, and management of web content. two particularly important parts of web content strategy for academic library websites: 1. keeping content up to date and unpublishing outdated content. 2. building consensus for the creation and maintenance of a web style guide and ensuring that content across the large website adheres to the style guide. • strategies for management of content over its entire lifecycle to ensure it is accurate, timely, usable, accessible, appropriate, findable, and well-organized. • a system of workflows, training, and governance that supports the entire lifecycle of content, including creation, maintenance, and updating of content across all communications channels (e.g. websites, social media, signage). • a comprehensive, coordinated, planned approach to content across the site including components such as style guides, accessibility, information architecture, discoverability, seo. • not terribly familiar with the concept in a formal sense but think of it related to how the institution considers the intersection of content made available by the institution, the management and governance of issues such as branding/identity, accessibility, design, marketing, etc. • intentional and coordinated vision for content on the website • content strategy is the planning for the lifecycle of content. it includes creating, editing, reviewing, and deleting content. we also use a content strategy framework to determine each of the following for the content on our websites: audience, page goal, value proposition, validation, and measurement strategy. • website targets the community to ensure they can find what they need • the process of creating and enacting a vision for the organization and display of web content so that it is user friendly, accurate, up-to-date, and effective in its message. web content strategy often involves considering the thoughts and needs of many stakeholders, and creating one cohesive voice to represent them all. • web content strategy is the planning, design, delivery and governance plan for a website. this responsibility is guided by the library website management working group. • a web content strategy is a cohesive approach to managing and editing online content. an effective strategy takes into account web accessibility standards and endeavors to produce and maintain consistent, reliable, user-centered content. 
an effective content strategy evolves to meet the needs of online users and involves regular user testing and reviews of web traffic/analytics. • web content strategy is the theory and practice of creating, managing, and publishing web content according to evidence-based best practices for usability and readability information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 46 • making sure your content aligns with both your business goals and your audience needs. • a plan to oversee the life cycle of useful, usable content from its creation through maintenance and ultimately removal. • web content strategy is the overarching strategy for how you develop and disseminate web content. ideally, it would be structured and user tested to ensure that the content you are spending time developing is meeting the needs of your library and your community. • a web content strategy guides the full lifecycle of web content, including creation, maintenance, assessment, and retirement. it also sets guiding principles, makes responsibility and authority clear, and documents workflows. • an overarching method of bringing user experience best practices together on the website including: heuristics, information architecture, and writing for the web • planning and management of online content • a defined strategy for creating and delivering effective content to a defined audience at the right time. • in the most basic sense, web content strategy is matching the content, services and functionality of web properties with the organizational strategic goals. • web content strategy can include guidelines, processes, and/or approaches to making your website(s) usable, sustainable, and findable. it's a big-picture or higher-level way of thinking about your site(s), rather than page by page or function by function. • deliberate structures and practices to plan, deliver, and evaluate web content. • producing content that will be useful to users and easy for them to access • tying content to user behavior/user experience? • web content strategy is the thoughtful planning and construction of website content to meet users' needs. • n/a • cohesive planning, development, and management of web content, to engage and support library users. • working with teams and thinking strategically and holistically about the usability, functions, services, information, etc. provided on the website to best meet the needs of the site's users, as well as incorporating the marketing/promotional perspectives offered by the website. • planning and managing web content • web content strategy is the idea that all written and visual information on a certain site would conform to or align with the goals for that site. • ensuring that the most accurate and appropriate words, images, and other assets are presented to patrons at the point of need, while using web assets to tell stories patrons might not know they want to know. 
article tending to an overgrown garden: weeding and rebuilding a libguides v2 system rebecca hyams information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12163 rebecca hyams (rhyams@bmcc.cuny.edu) is web and systems librarian, borough of manhattan community college/cuny. © 2020. abstract in 2019, the borough of manhattan community college's library undertook a massive cleanup and reconfiguration of the content and guides contained in their libguides v2 system, which had been allowed to grow out of control over several years as no one was in charge of its maintenance. this article follows the process from identifying issues, getting departmental buy-in, and doing all of the necessary cleanup work for links and guides. the aim of the project was to make their guides easier for students to use and understand and for librarians to maintain. at the same time, work was done to improve the look and feel of their guides and implement the built-in a-z database list, both of which are also discussed. introduction in early 2019, the a. philip randolph library at the borough of manhattan community college (bmcc) (part of the city university of new york (cuny) system) hired a new web and systems librarian. the position itself was new to the library, though some of its functions had previously been performed by a staff member who had left more than a year prior. it quickly became apparent to the newest member of the library's faculty that, while someone had at one point managed the website, the same could not really be said for the library's libguides system and the mass of content contained within. the library's libguides system was first implemented in january 2013 and over time the system came to be used primarily by instruction librarians to serve their teaching efforts. not long after bmcc implemented libguides, springshare announced libguides version 2 (v2), a new version of the system that included several enhancements and features not present in the earlier version.1 these features included the ability to mix content types in a single box (in the earlier version, for example, boxes could have either rich text or links but not both), a centrally managed asset library, and an automatically-generated a-z database list designed to make it easy to manage a public-facing display. bmcc moved to libguides v2 around early 2015, but few of those who worked with the system ever took advantage of the newer features offered for quite some time, if at all. at the time the web and systems librarian came aboard, the bmcc libguides system contained over 400 public guides and an unwieldy asset library filled with duplicates and broken widgets and links. many of the guides essentially duplicated others, with only the name of the classroom instructor differing.
there were, for example, 69 separate guides just for english 101, some of which had not been updated in three or four years. there were no local guidelines for creating or maintaining guides, and in theory, each librarian was responsible for their own. however, it was apparent that in practice, no one was actively managing the guides or their related assets, as the lists of both were overwhelming. the creators of existing guides were primarily reference and instruction librarians whose other responsibilities meant there was little time to do guide upkeep, and because there was no single person in charge of the guides, there was no one to ensure any maintenance took place. in addition to the unwieldy guide list and asset library, the bmcc library was also effectively maintaining two separate a-z database lists, one on the library's website that was a homegrown sql database built by a previous staff member, and another running on libguides to provide links to databases via the guides. the lists were not in sync with one another and several of the librarians were unaware that the libguides version of the list even existed, leading to links to databases appearing on both the database list and as link assets. and, while the libguides a-z list was not linked from the library's website, it was still accessible from points within libguides, meaning that patrons could encounter an incorrect list that was not being maintained. getting started before any work could be done on our system, there needed to be buy-in from the rest of the library faculty. with the library director in agreement, agenda items were added to department meetings between march and may 2019 for discussion and department approval. the various aspects of the project were pitched to emphasize the following goals: • removing outdated material, broken links, etc. • streamlining where information could be found • decluttering guides to make everything easier to use and understand for students • improving the infrastructure to make maintenance and new guide creation easier and more manageable • standardizing layouts and content the aim of all of this would be to increase guide usability, accessibility, and make the guides overall a more consistent resource for our students. for the sake of transparency (as well as to have a demo of some of the aesthetic changes discussed in more detail below), a project guide was created and shared with the rest of the library department to share preliminary data as well as detailed updates as tasks were completed.2 process the database list while the libguides a-z database list, a feature built into v2 of the platform, contained information about our databases, it was essentially only serving to provide links to databases when creating guide content. there was some indication, in the form of a dormant a-z database "guide," that someone had tried to create a list in libguides by manually adding assets to a guide. while that was a common practice in libguides v1 sites, as the built-in list was not yet a part of the system, the built-in list itself was never properly put into use. the links on our website all pointed to a homegrown list which, while powered by an sql database, was essentially a manual list. because of its design, it had proved impossible for anyone in the library to update without extensive web programming knowledge.
it seemed a no-brainer to work on the database list first. this way we had both the infrastructure to update database-related content on the guides and a single and up-to-date list of resources with enhanced functionality that could benefit the library’s users almost immediately.3 information technology and libraries december 2020 tending to an overgrown garden | hyams 3 to begin, the two lists were compared to find any discrepancies, of which there were many. as the e-resources librarian was on leave at the time, the library director was consulted to determine which of databases missing from the libguides list were active subscriptions (and which of the ones missing from the homegrown list were previously cancelled so they could be removed). once the database list reflected current holdings, the metadata entries for the databases on the libguides side were updated to include resource type, related subjects, and related icons. these updates would enhance the functionality of the libguides list, as it could be filtered or searched using that additional information, something that was missing from the homegrown lis t. in addition to updating content and adding useful metadata, some slight visual changes were made to improve the look and usability of the list using custom css. most of this was done because as the list was being worked on, several librarians (of those who were even aware of it in the first place) mentioned that one reason they disliked the libguides list was because of the font size and spacing, which they felt was too small and hard to read. with the list updated, it was presented at the march 2019 department meeting and quickly won over all in attendance, especially when it was pointed out that the list could be very easily maintained because it required no special coding knowledge. while the homegrown list would remain live on the server for the rest of the semester (so as to not disrupt any classes that may have been using it), it was agreed that the web and systems librarian could go ahead with switching all of the links pointing to the homegrown list to point to the springshare list instead. the asset library because of how guides were typically created over the years since adopting libguides (many appeared to have been copied from another existing guide each time) the asset library had grown immense and unmanageable. for example, there were 149 separate links to our “databases by subject” page on the library’s website, the overwhelming majority of which were only used once. there were also 145 separate widgets for the same embedded scribd-hosted keyword worksheet, which was in fact broken and displayed no content. this is to say nothing of the broken-link report that no one had reviewed in quite some time. tackling the cleanup of duplicates and fixing of broken links/embeds was a large piece of the invisible work taken on behind the scenes to make maintaining the guides easier in the future. in order to analyze the data, the asset library report was exported to an excel file to make it easier to identify issues that needed correction. to start this process, we requested that springshare technical support wipe out all assets (other than documents) that were not mapped to anything and were just cluttering up the asset library (this ended up being just under 2,000 assets).4 most of those items had been removed from the guides they were originally included on but were never removed from the asset library. they served no real function other than to clutter up the backend. 
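the kind of duplicate-spotting described above (149 copies of one link, 145 copies of one widget) can also be scripted rather than done entirely by eye in a spreadsheet. the following is a minimal sketch only, not part of the library's actual workflow; it assumes the asset report has been exported as a csv file named assets.csv with columns named "Name" and "URL" (hypothetical headings chosen for illustration):

import csv
from collections import defaultdict

# group exported asset rows by normalized url so duplicates stand out
groups = defaultdict(list)
with open("assets.csv", newline="", encoding="utf-8") as report:
    for row in csv.DictReader(report):
        url = (row.get("URL") or "").strip().lower()
        if url:
            groups[url].append(row.get("Name", ""))

# list every url that appears more than once, most-duplicated first
for url, names in sorted(groups.items(), key=lambda item: -len(item[1])):
    if len(names) > 1:
        print(f"{len(names):4d} copies: {url}")

a listing like this makes it straightforward to pick one canonical asset per url and queue the remaining copies for remapping and deletion.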
the guide authors had given the web and systems librarian permission to remove anything broken that could not be easily fixed. this included the aforementioned broken worksheet (and other similar items), as well as an assortment of youtube video embeds where the video had since been taken down, resulting in a “this video is unavailable” error message. it was felt that since those were already not working and seriously hurt the reliability of our guides to our users, that no further permission was needed. then came the much more tedious task of standardizing (where possible) which assets were in use. this involved going into guides listed as containing known-duplicate assets, replacing them with a single, designated asset, and then removing the resulting unmapped items.5 it was decided information technology and libraries december 2020 tending to an overgrown garden | hyams 4 that while many of the guides would likely be deleted after the spring semester, that only assets appearing on currently-active guides would be standardized. while in hindsight, as many of the links that were fixed were on guides that were soon-to-be deleted, it would have been better to hold off and wait until guides could be deleted first. however, doing at least some of this work in advance helped find other issues including instances where our proxy prefix was included directly in the url (an issue as we were also in the process of changing our ezproxy hosting) and where custom descriptions or link names were unclear. “books from the catalog” assets had their own issues that also needed to be addressed. with a pending migration of the library’s ils, it was already apparent that the links to any books in the library’s catalog would need updating so they could have a shot at continuing to function postmigration.6 we had been told at the time that the library’s primo instance would remain through the migration (though this changed during the migration process) so at the time we felt it important to ensure that all links were pointing to primo, as some had been pointing to the soonto-be decommissioned opac. for consistency, the urls were structured as isbn searches instead of ones relying on internal system numbers that would soon change. however, it became obvious very early on that some of the links to books were either pointing to materials that were no longer in the library’s collection, or were pointing to a previously decommissioned opac server, both of which resulted in errors. because the domain of the previously decommissioned opac server had been whitelisted in the link checker report settings, these items had not appeared on the broken link list. using the filtered list of “books from the catalog” assets, all titles were checked, which allowed the web and systems librarian to remove items that were no longer in the collection and make other adjustments as needed. as a result of the asset cleanup process, the asset library went from an unwieldy total of more than 5,000 items to just over 2,000 items. it also simplified the process for reusing assets in new guides, as there was now only one choice per item, and made it much easier to find and fix broken links and embeds. the guides the cleanup of the guides themselves was by far the most complex task. before starting the guide cleanup work itself, the web and systems librarian performed a content analysis to identify and recommend guides for deletion and which could be converted into general subject area guides. 
because a common practice was to create a “custom” guide for each class that came in for a library instruction session, there was an overrepresentation of guides for the classes that had regular sessions: english 101 (english composition), english 201 (introduction to literature), speech 100 (public speaking), and introduction to critical thinking. those four courses accounted for 187 guides, or over 40 percent of the total number in our system. the majority of them had not been updated directly in over three years, and in some cases, were designed for instructors who no longer taught at the college. perhaps more telling was that the content for these guides diff ered more across the librarians who created them than across the courses they were designed for. this meant that while there might be three or four different iterations of the english 101 guide, the guides created by the same librarian for different introductory courses were essentially the same. before the arrival of the web and systems librarian, one of the other librarians had been occasionally maintaining guide groups for “current courses” and “past courses,” but it was unclear if anyone was still actively maintaining these groupings, as guides for current instructors were sometimes under “past courses” and vice versa. because these groups did not actually hide the information technology and libraries december 2020 tending to an overgrown garden | hyams 5 guides from view on the master list of guides and appeared to be unnecessary work, it was decided to remove the groupings. instead, the web and systems librarian would plan to revisit the guides on a regular basis to unpublish/remove anything for courses that were no longer taught. however, since the philosophy behind the guides was to move from “custom” guides for each instructor’s section to a general guide for the course as a whole for the overwhelming majority of cases, the need for maintaining these groupings was essentially eliminated anyway. in may 2020, a preliminary list of guides to be deleted was presented to the librarians at the monthly department meeting. the list was broken down as: • duplicates to be deleted: this portion consisted primarily of course guides like those mentioned above where multiple guides existed for the same course, most of which used the exact same content. • guides to be “merged:” while merging guides is not actually possible in the libguides platform, there were cases where we had two or three for the same course. they could be condensed into a single guide with the rest deleted. • guides to convert to subject area guides: these were guides that were essentially already structured as a subject guide but were titled for a specific course, and in many cases, a guide for the subject area did not already exist (for example, a course-specific guide for business would become the business subject area guide). • dead guides: these were guides that had not been updated in more than two years and had not been viewed in at least one year. librarians were given an opportunity in the department meeting to comment on the list, as well as to contact the web and systems librarian with any comments. additionally, as some of the classroom faculty on campus had connections to specific guides, the library director also sent out a message to classroom faculty to let them know of our general plan to revamp the guides and that many would be removed over the summer. 
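the "dead guides" criterion above (no updates in more than two years and no views in at least a year) is the sort of rule that can be applied mechanically to an exported guide report rather than checked guide by guide. the sketch below is illustrative only and was not part of the review described here; the file name, date format, and column headings ("Guide Name", "Last Updated", "Views Past Year") are all hypothetical:

import csv
from datetime import datetime, timedelta

TWO_YEARS_AGO = datetime.now() - timedelta(days=2 * 365)

# flag guides that meet the "dead guide" criterion described above
with open("guides.csv", newline="", encoding="utf-8") as report:
    for row in csv.DictReader(report):
        last_updated = datetime.strptime(row["Last Updated"], "%Y-%m-%d")
        views_past_year = int(row["Views Past Year"])
        if last_updated < TWO_YEARS_AGO and views_past_year == 0:
            print("candidate for deletion:", row["Guide Name"])

a first pass like this does not replace human review, but it produces a short list that librarians and classroom faculty can react to, much as the preliminary list presented at the may 2020 department meeting did.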
surprisingly, there were few objections either amongst the librarians or the classroom faculty once they understood the rationale and process. of the few classroom faculty members that did respond to the library director’s message, most of them were more concerned with content or specific links that they felt strongly about versus the guides themselves. in those cases, we noted the content requests to make sure they appeared on the new guides. most of these instructors were satisfied when we further explained our process and , if needed, ensured them that the content they requested would be worked into the new guide. only one instructor who responded, whose assignment was related to a grant they had received, made a strong case for keeping a separate guide for their sections of english 101. with the project approval out of the way, it was then time to begin removing all of the to-bedeleted guides and start the process of revamping those that would be kept. the goal was that the project would be completed by the start of the fall semester so that faculty and students would come back to a new (and hopefully, much improved) set of guides. removing debris to be cautious, a few preliminary steps were taken before the guides selected for deletion were removed. for starters, the selected guides had their status changed to “unpublished,” meaning that they no longer appeared on the public-facing list of guides. this gave everyone a chance to say something if a guide they were actively using suddenly went “missing.” these unpublished guides were then downloaded using the libguides html backup feature and saved to the department’s information technology and libraries december 2020 tending to an overgrown garden | hyams 6 network share drive. while the html backup output is not a full representation of the guide (the file generated displays as a single page and is missing any formatting or images that were included in the guide), it does include all of a guide’s content, meaning that a link or block of text can be retrieved from the backup in case of moments of “i know i had this on my guide before but....” because of the somewhat haphazard nature of our guides, deleting unwanted ones turned out to result in interesting and unexpected challenges. over the years, some of librarians had, from time to time, reused individual boxes between guides, but there was no consistency to the practice. while there was a repository guide for reusable content, not everyone used it or used it consistently. thankfully, libguides runs a pre-delete check, which proved to be invaluable in this process, as it showed if any of the boxes displayed on one guide were reused on any others. in most cases where boxes were reused, they were reused on guides that were also on the “to be deleted” list, but that was not always the case. by having that check we could find the other guides listed and make copies of the boxes that would have otherwise been deleted. if a box was reused on multiple guides that were being kept, it was copied to the reusable content guide and then remapped from there. cosmetic improvements in conjunction with the work being done to improve content of our guides, the web and systems librarian felt it was the perfect opportunity to update the guide templates and overall aesthetics to make the guides more visually appealing, especially considering little had been done in this area system-wide apart from setting the default color scheme. 
using the project guide as an initial sandbox, several changes were put into motion that would eventually be worked into new templates and pushed out to all of the reworked guides. the first, and perhaps biggest, change was the move from tab navigation to side navigation (an option first made available with the release of libguides v2). while there have been several usability studies that have debated using one over the other, in this case side navigation was chosen both for the streamlined nature of the layout as a whole (by default there is only one full content column), and because enabling the box-level navigation could serve as a quick index for anyone looking to find specific content on a page.7 side navigation also avoided the issue of long lists of tabs spilling into a second row, which further complicated page navigation. several changes to the look and feel of the guides were also put into place, with many of the changes coming from suggestions given on various libguide style or best practice guides or more general recommendations from web usability guidelines.8 perhaps most importantly, all of the font sizes were increased for improved readability, especially on box titles and headers, to better facilitate visual scanning. the default fonts were also replaced with two commonly used fonts from the google fonts library, roboto (for headings and titles) and open sans (for body text). additionally, the navigation color scheme was changed because the orange of the college’s blueand-orange color scheme regularly failed accessibility contrast checks and was described by some colleagues as “harsh on the eyes.” instead, two analogous lighter shades of blue (one of which was taken from the college’s branding documentation) were selected for the navigation and box titles respectively, both of which allowed for the text in those areas to be changed from white to black (again, for improved readability). figure 1 shows a typical “before” guide navigation design, and figure 2 shows a typical “after” design. information technology and libraries december 2020 tending to an overgrown garden | hyams 7 figure 1. a sample of guide navigation and content frequently found on guides before start of cleanup figure 2. navigation and content after revisions additionally, the web and systems librarian took this opportunity to go through the remaining guides to ensure they were all consistent. most of this work fell in the area of text styling, or rather, undoing text styling. it was clear from several of the guides that over the years, librarians had not been happy with the default font sizes or styles, which lead to a lot of customizing using the built-in wysiwyg text editor. not only did this create a nightmare in the code itself (as the wysiwyg editor adds a lot of extraneous tags and style markup), but it also meant that the changes coming from the new stylesheet were not being applied universally as any properties assigned on a page overrode the global css. there was also the issue of paragraph text (
<p>
    ) that was sometimes styled as fake headings (made larger or bolder to look like headings, but not using the proper tags) which needed to be corrected for consistency and accessibility purposes. replanting and sprucing up with an overwhelming majority of the guides (and their associated assets) deleted, it was finally time to rework the remaining guides into clear, easy-to-use resources that would benefit our students. at this point the guides fell into three categories: • guides that just needed to be pruned and updated. • guides that should be combined into a single subject area guide. • guides that should be created to fill an unmet need. information technology and libraries december 2020 tending to an overgrown garden | hyams 8 pruning and updating tasks were generally the least-arduous, as many of the guides included content that was also housed on discrete guides (citations, resource evaluation, etc.). instead of duplicating, for example, citation formats on every guide, those pages were replaced with navigation-level links out to the existing citation guide. this was also the point that we could do more extensive quality control such as switching to a single content column which further emphasized the extraneous information on many of our guides. infographics, videos, and long blocks of links or text were scrutinized to determine if they were helping to enhance students’ understanding of the core content or if they were merely providing clutter that would make it more difficult to understand the important information.9 in some cases, by going from guide to guide, it became apparent that there were guides for multiple courses in a subject area where the resources were basically identical. this was most noticeable in the criminal justice and health education subject areas. in these cases, it made little sense to keep separate course guides when the content was basically the same across them. to remedy this duplication, one of the course guides for each subject was transformed into the subject area guide, and resources were added to ensure they covered the same materials that the separate course guides may have covered. the remaining course guides were then marked for future deletion as they were no longer needed. lastly, subject areas without guides were identified so that work could be done later to create them. as we had discussed moving towards using the “automagic” integration of guide content into our blackboard learning management system (lms), this step will be key in ensuring that all subject areas have at least some resources students can use. however, as of this time we have yet to finish creating these additional guides, and several subject areas (including computer science, nursing, and gender studies) have no guides at all. next steps now that all of the work to clean and update our libguides is done, the most important next step is coming up with a workflow to ensure that the guides stay relevant and useful. the web and systems librarian mostly left the guides alone for the fall 2019 semester to allow their colleagues time to use them and report back any issues. to the web and systems librarian’s surprise there were few issues reported, but that does not mean there is no room for future improvement. as a department, it is clear that we need a formal plan for maintaining the guides, including update frequency, content review, and guidelines for when guides should be added or deleted. 
additionally, immediately following the conclusion of this cleanup project the library’s website was forced into a server migration and full rebuild for reasons outside of the scope of this article. however, as a result there were changes made on the library’s site involving the look and feel of pages that will need to be carried through into our guides and associated springshare platforms. while most of this work is relatively simple, mimicking changes developed in wordpress to work properly on external services will take time and effort. conclusion overall, while this project was a massive undertaking (done almost entirely by a single person), the end result, at least on the surface, has made our guides much easier to use and understand. there were obviously several things that, if the project were to be done over, should have been done differently, mostly involving the cleaning of the asset library. however, it is now much easier information technology and libraries december 2020 tending to an overgrown garden | hyams 9 to refer students to guides for their courses and the feelings about the guides amongst the library faculty have become much more positive. endnotes 1 “libguides: the next generation!,” springshare blog (blog), june 26, 2013, https://blog.springshare.com/2013/06/26/libguides-the-next-generation/. 2 the guide can be viewed at: https://bmcc.libguides.com/guidecleanup. 3 though the author only learned of the project undertaken at unc a few years ago, after they had already finished this project, a similar project was outlined here: sarah joy arnold, “out with the old, in with the new: migrating to libguides a-z database list,” journal of electronic resources librarianship 29, no. 2 (april 2017): 117–20, https://doi.org/10.1080/1941126x.2017.1304769. 4 because there was no way to view the documents before a bulk deletion, documents were manually reviewed and deleted as needed. 5 it was only long after this process that springshare promoted that they could do this on the backend by request. 6 however, it turned out that due to the differences in url structure between classic primo and primo ve that this change was completely unnecessary as the urls did actually needed to be changed again post-migration. at least they were consistent which meant a systemwide findand-replace could take care of most of the links. 7 several studies have been done since the roll out of libguides v2 including: sarah thorngate and allison hoden, “exploratory usability testing of user interface options in libguides 2,” college and research libraries 78, no. 6 (2017): 844–61, https://doi.org/10.5860/crl.78.6.844; kate conerton and cheryl goldenstein, “making libguides work: student interviews and usability tests,” internet reference services quarterly 22, no. 1 (january 2017): 43–54, https://doi.org/10.1080/10875301.2017.1290002. 8 of the many guides the author consulted, the following were the most informative: stephanie jacobs, “best practices for libguides at usf,” https://guides.lib.usf.edu/c.php?g=388525&p=2635904; jesse martinez, “libguides standards and best practices,” https://libguides.bc.edu/guidestandards/getting-started; carrie williams, “best practices for building guides & accessibility tips,” https://training.springshare.com/libguides/best-practices-accessibility/video. 9 there is a very detailed discussion of cognitive overload in libguides in jennifer j. little, “cognitive load theory and library research guides,” internet reference services quarterly 15, no. 
1 (march 1, 2010): 53–63, https://doi.org/10.1080/10875300903530199. articles a comprehensive approach to algorithmic machine sorting of library of congress call numbers scott wagner and corey wetherington information technology and libraries | december 2019 62 scott wagner (smw284@psu.edu) is information resources and services support specialist, penn state university libraries. corey wetherington (cjw36@psu.edu) is open and affordable course content coordinator, penn state university libraries. abstract this paper details an approach for accurately machine sorting library of congress (lc) call numbers which improves considerably upon other methods reviewed. the authors have employed this sorting method in creating an open-source software tool for library stacks maintenance, possibly the first such application capable of sorting the full range of lc call numbers. the method has potential application to any software environment that stores and retrieves lc call number information. background the library of congress classification (lcc) system was devised around the turn of the twentieth century, well before the advent of digital computing.1 consequently, neither it nor the system of library of congress (lc) call numbers which extend it were designed with any consideration to machine readability or automated sorting.2 rather, the classification was formulated for the arrangement of a large quantity of library materials on the basis of content, gathering like items together to allow for browsing within specific topics, and in such a way that a new item may always be inserted (interfiled) between two previously catalogued items without disruption to the overall scheme. unlike, for instance, modern telephone numbers, isbns, or upcs—identifiers which pair an item with a unique string of digits having a fixed and regular format, largely irrespective of any particular characteristics of the item itself—lc call numbers specify the locations of items relative to others and convey certain encoded information about the content of those items. the library of congress summarizes the essence of the lcc in this way: the system divides all knowledge into twenty-one basic classes, each identified by a single letter of the alphabet. most of these alphabetical classes are further divided into more specific subclasses, identified by two-letter, or occasionally three-letter, combinations. for example, class n, art, has subclasses na, architecture; nb, sculpture; nd, painting; as well as several other subclasses. each subclass includes a loosely hierarchical arrangement of the topics pertinent to the subclass, going from the general to the more specific.
individual topics are often broken down by specific places, time periods, or bibliographic forms (such as periodicals, biographies, etc.). each topic (often referred to as a caption) is assigned a single number or a span of numbers. whole numbers used in lcc may range from one to four digits in length, and may be further extended by the use of decimal numbers. some subtopics appear in alphabetical, rather than hierarchical, lists and are represented by decimal numbers that combine a letter of the alphabet with a numeral, e.g., .b72 or .k535. relationships among topics in lcc are shown not by the numbers that are assigned to them, but by indenting subtopics under the larger topics that they are a part of, much like an outline. in this respect, it is different from more strictly hierarchical classification systems, such as the dewey decimal classification, where hierarchical relationships among topics are shown by numbers that can be continuously subdivided.3 as this description suggests, lcc cataloging practices can be quite idiosyncratic and inconsistent across different topics and subtopics, and sorting rules for properly shelf-ordering lc call numbers can be correspondingly complex, as we will see below.4 for the purposes of discussion in what follows, we divide lc call number strings into three principal substrings: the classification, the cutter, and what we will term the specification. the classification categorizes the item on the basis of its subject matter, following detailed schedules of the lcc system published by the library of congress; the cutter situates the item alongside others within its classification (often on the basis of its title and/or author5); and the specification distinguishes a specific edition, volume, format, or other characteristic of the item from others having the same author and title: hc125 (a) .g25313 (b) 1997 (c) in the above example, the classification string (a) denotes the subject matter (in this case, general economic history and conditions of latin america), the cutter string (b) locates the book within this topic on the basis of author and/or title (following a specific encoding process), and the specification string (c) denotes the particular edition of the text (in this case, by year). each of these general substrings may contain further substrings having specific cataloging functions, and though each is constructed following certain rigid syntactical rules, a great deal of variation in format may be observed within the basic framework. the following is an inexhaustive summary of the basic syntax of each of the three call number components: • the classification string always begins with one to three letters (the class/subclass), almost always followed by one to four digits (the caption number), possibly including an additional decimal. the classification may also contain a date or ordinal number following the caption number. • the beginning of the cutter string is always indicated by a decimal point followed by a letter and at least one digit. while the majority of call numbers contain a cutter, it is not always present in all cases. among the sorting challenges posed by lc call numbers, we note in particular the "double cutter"—a common occurrence in certain subclasses—wherein the cutter string changes from alphabetic to numeric, then back to alphabetic, and finally again to numeric.
triple cutters are also possible, as are dates intervening between cutters. certain cutter strings (e.g., in juvenile fiction) end with an alphabetic "work mark" composed of two or more letters. • the specification string (which may be absent on older materials) is always last, and usually contains the date of the edition, but may also include volume or other numbering, ordinal numbers, format/part descriptions (e.g., "dvd," "manual," "notes"), or other distinguishing information. figure 1 shows example call numbers, all found within the catalog of penn state university libraries, suggesting the wide variety of possibilities: • b1190 1951 (no cutter string) • dt423.e26 9th.ed. 2012 (compound specification) • e505.5 102nd.f57 1999 (ordinal number in classification) • hb3717 1929.e37 2015 (date in classification) • kbd.g189s (no caption number, no specification) • n8354.b67 2000x (date with suffix) • ps634.b4 1958-63 (hyphenated range of dates) • ps3557.a28r4 1955 ("double cutter") • pz8.3.g276lo 1971 (cutter with "work mark") • pz73.s758345255 2011 (lengthy cutter decimal) figure 1. example call numbers. as one might expect given this irregularity in syntax, systematic machine-sorting of lc call numbers is by no means trivial. to begin with, sorting procedures within the lcc system are to a certain degree contextual—that is, the sorter must understand how a given component of a call number operates within the context of the entire string in order to determine how it should sort. both integer and decimal substrings appear in lc call numbers, so that a numeral may properly precede a letter in one part of a call number (a '1' sorts before an 'a' in the classification portion, for example: h1 precedes ha1), while the contrary may occur in another part (within the cutter, in particular, an 'a' may well precede a '1': hb74.p65a2 precedes hb74.p6512). similarly, letters may have different sorting implications depending on where and how they appear. compare, for instance, the call numbers v23.k4 1961 and u1.p32 v.23 1993/94. the v in the former denotes the subclass of general nautical reference works and simply sorts alphabetically, whereas the v in the latter call number functions in part as an indicator that the numeral 23 refers to a specific volume number and is to be sorted as an integer rather than a decimal. such contextual cues are often tacitly understood by a human sorter, but can present considerable challenges when implementing machine sorting procedures. furthermore, the lack of uniformity or regularity in the format of call number strings poses various practical obstacles for machine sorting. taken together, these assorted complexities suggest the insufficiency of a single alphanumeric sorting procedure to adequately handle lc call numbers as unprocessed, plain text strings. literature review a thorough review of information science literature revealed little formal discussion of the algorithmic sorting of lc call numbers. if the topic has been more widely addressed in the scholarly or technical literature, we were unable to discover it. nevertheless, the general problem appears to be fairly well known. this is evident both from informal online discussions of the topic (e.g., in blog posts, message board threads, and coding forums) and from the existence of certain features of library management system (lms) and integrated library system (ils) software designed to address the issue. in this section we examine methods proffered by some of these sources, and detail how each fails to fully account for all aspects of lc call number sorting.
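before turning to those methods, the three-part structure just described can be made concrete with a small parsing sketch. the following is a simplified illustration only, not the authors' tool: it assumes a basic call number shape (class letters, caption number with optional decimal, optional cutter, anything remaining treated as the specification) and will not handle every format shown in figure 1, such as call numbers lacking a caption number or containing dates within the classification.

import re

# rough split of an lc call number into classification / cutter / specification
# (illustrative only; real call numbers require far more careful handling)
PATTERN = re.compile(
    r"^(?P<classification>[a-z]{1,3}\s*\d{1,4}(?:\.\d+)?)"  # class letters + caption
    r"\s*(?P<cutter>\.[a-z]\d+(?:[a-z]+\d*)*)?"              # optional cutter string
    r"\s*(?P<specification>.+)?$",                           # whatever is left over
    re.IGNORECASE,
)

def split_call_number(call_number):
    match = PATTERN.match(call_number.strip())
    return match.groupdict() if match else None

print(split_call_number("hc125 .g25313 1997"))
# {'classification': 'hc125', 'cutter': '.g25313', 'specification': '1997'}

even this toy version shows why the specification has to be treated separately: whatever trails the cutter may be a date, volume numbering, a format note, or some combination of these, and cannot simply be compared character by character against the rest of the string.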
in a brief article archived online, conley and nolan outline a method for sorting lc call numbers through the use of function programming in microsoft excel.6 given a column of plain-text lc call numbers, their approach entails successive processing of the call numbers across several spreadsheet columns with the aim of properly accounting for the sorting of integers. the fully-processed strings are then ultimately ready for sorting in the rightmost column using excel's built-in sorting functionality. we note that conley and nolan's method (hereafter "cnm") only attempts to sort what the authors refer to as the "base call number" (i.e., the classification and cutter portions), and does not address the sorting of "volume numbers, issue numbers, or sheet numbers" (what we refer to here as the "specification").7 cnm stems from the tacit observation that ordinary, single-column sorting of lc call numbers is clearly inadequate in an environment like excel's. for instance, in the following example, standard character-by-character sorting fails at the third character position, since pz30.a1 erroneously sorts before pz7.a1 (as 3 is compared to 7 in the third character position), contrary to the correct order (7 before 30). to address this, cnm normalizes the numeric portion of the class number with leading zeros so that each numeric string is of equal length, ensuring that the proper digits are compared during sorting. this entails a transformation, pz30.a1 → pz0030.a1 and pz7.a1 → pz0007.a1, following which the strings will in fact sort correctly in an excel column. this technique appears adequate until we compare call numbers having subclasses of different length: p180.a1 → p0180.a1 versus pz30.a1 → pz0030.a1. here, while standard excel sorting will in fact properly order the resulting strings, in other applications, depending on the sorting hierarchy employed, sorting may fail in the second position if letters are sorted before numbers. hierarchy aside, it is not difficult to see the potential issues that may arise from sorting unlike portions of the call number string against one another in this way, particularly when comparing characters within the cutter string or in situations involving a "double cutter." for instance, the call numbers b945.d4b65 1998 and b945.d41 1981b are listed here in their proper sorting order, but are in fact sorted in reverse by cnm when, in the eighth character position, 1 is sorted before b in accordance with excel's default sorting priority. this again illustrates an essential problem of character-by-character sorting: in certain substrings we require a letters-before-numbers sorting priority, while in others a numbers-before-letters order is needed. this impasse makes clear that no single-column sorting methodology can succeed for all types of lc call numbers without significant modification to the call number string. in a blog post, dannay observed that cnm does not account for certain call number formats, particularly those of legal materials within the k classification having 3-letter class strings.8 (the
in a blog post, dannay observed that cnm does not account for certain call number formats, particularly those of legal materials within the k classification having 3-letter class strings. 8 (the information technology and libraries | december 2019 66 same would also be true in the d classification, where 3-letter strings also appear.) although minor modification of portions of the function code (e.g., replacing certain ‘2’s with ‘3’s) would be sufficient to alleviate this particular issue, dannay proposed instead to employ placeholder characters to normalize the classification string and avoid instances of alphabetic characters being compared against numeric ones. dannay’s method (dm) normalizes various parts of the classification string, including the subclass, caption, and decimal portions: q171.t78  q**0171.0.t78 qa9.r8  qa*0009.0.r8 (here, of course, it is imperative that the chosen placeholder character sort before all letters in the sorting hierarchy.) dm thus successfully avoids the issue of comparing classification strings of unequal length or format. nevertheless, despite the improvements of dm over cnm, both approaches are ultimately unable to properly process certain types of common lc call numbers. for example, call numbers with dates preceding the cutter (e.g., gv722 1936.h55 2006) and call numbers without cutters (e.g., b1205 1958) both result in errors, as do those containing the aforementioned “double cutters.” furthermore, as we previously noted, neither dm nor cnm were designed to handle any portion of the specification string following the cutter, where the presence of ordinal and volumetype numbering is commonplace. hence neither method is able to properly order the quite ordinary pair of call numbers ac1.g7 v.19 and ac1.g7 v.2, since the first digit of each’s volume number is compared and ordered numerically (i.e., character-by-character), resulting in a mis-sort. though neither dn nor cnm is ultimately comprehensive (nor designed to be), both methods contain valuable insights and strategies that inform our own approach to the problem. software review available software solutions for sorting lc call numbers appear to be nearly as scant as literature on the subject. while github contains a handful of programs that attempt to address the problem, we found none which could be considered comprehensive. table 1 is a summary of those programs we discovered and were able to examine. the “sqlite3-lccn-extension” program is an extension for sqlite 3 which provides a collation for normalizing lc call numbers, executing from a sqlite client shell. we discovered several limitations in its ability to sort certain call number formats similar to those discussed above in the literature review. for instance, the program cannot correctly sort specification integers (e.g., it sorts v.13 before v.3) or call numbers lacking cutter strings (e.g., it sorts b 1190.a1 1951 before b 1190 1951). we found similar issues with “js-loc-callnumbers,” a javascript program with a web interface into which a list of call numbers can be pasted. the program transforms the call numbers into normalized strings, which are then sorted and displayed to the user. 
however, we observed that it does not account for dates or ordinal numbers in the classification string, nor can it correctly sort call numbers lacking caption numbers.9

program and author | app-type, interface | repository url | last update
"sqlite3-lccn-extension" by brad dewar | database extension, shell | https://github.com/macdewar/sqlite3-lccn-extension | dec. 2013
"js-loc-callnumbers" by ray voelker | javascript, web | https://github.com/rayvoelker/js-loc-callnumbers | feb. 2017
"library-of-congress-system" by luis ulloa | python tutorial, command line | https://github.com/ulloaluis/library-of-congress-system | sep. 2018
"lcsortable" by mbelvadi2 | google sheets script | https://github.com/mbelvadi2/lcsortable | may 2017
"library-callnumber-lc" by library hackers | perl, python | https://github.com/libraryhackers/library-callnumber-lc/tree/master/perl/library-callnumber-lc | dec. 2014
"lc_call_number_compare" by smu libraries | javascript, command line | https://github.com/smu-libraries/lc_call_number_compare | dec. 2016
"lc_callnumber" by bill dueber | ruby | https://github.com/billdueber/lc_callnumber | feb. 2015
table 1. list of github software involving lc call number sorting.

several of the programs are rather narrow in scope. the "lcsortable" script is a google sheets tool for normalizing lc call numbers into a separate column for sorting, very much like cnm and dm. its normalization routine appears to conflate decimals and integers, though, leading to transformations such as

hf5438.5.p475 2001 → hf5438.0005.p04752001

which would clearly result in a great deal of incorrect sorting across a wide array of lc call number formats. the command-line-based python program "library-callnumber-lc" processes a call number and returns a normalized sort key, but is not intended to store or sort groups of call numbers. it cannot adequately handle compound specifications or cutters containing consecutive letters (e.g., s100.bc123 1985), and does not appear to preserve the demarcation between a caption integer and caption decimal (i.e., the decimal point), thereby commingling integer and decimal sorting logic. lastly, "library-of-congress-system" is a tutorial/training program written in python that runs from the command line and supplies a list of call numbers for the user to sort.
it does not draw call numbers from a static collection nor allow call numbers to be input by the user; rather, it randomly generates call numbers within certain parameters and following a prescribed pattern. as such, we were not able to satisfactorily test its sorting capabilities for the kind of use-case scenario under discussion. we did not evaluate the remaining two github programs, "lc_call_number_compare" and "lc_callnumber," as we could not get the former, a javascript es6 module, to execute, and as the latter, a ruby application which we did not attempt to install, evidently remains unfinished: its github documentation lists "normalization: create a string that can be compared with other normalized strings to correctly order the call numbers" as among the tasks yet to be completed.

in addition to these open resources, we examined lc sorting capability within the commercial lms/ils software we had at hand. the marc (machine-readable cataloging) 21 protocol, a widely used international standard for formatting bibliographic data, provides a specific syntax for cataloging lc call numbers for the purposes of machine parsing.10 symphony workflows, the lms licensed by penn state university libraries from sirsidynix (and thus the only one available for our direct examination), contains within its search module a call number browsing feature which attempts to sort call numbers in shelving order via "shelving ids," call number strings rendered from each item's marc 21 "050" data field for sorting purposes. while these shelving ids are not visible within workflows (that is, they operate in the background), they can be accessed as plain text strings via bluecloud analytics, a separate, sirsidynix-branded data assessment and reporting tool peripheral to the lms. examination of these sort keys revealed integer normalization strategies similar to those of dm and cnm, with additional processing of volume-type numbering within the specification string. however, these shelving ids are similarly unable to correctly sort "double cutter" substrings and other syntactic complexities, such as ordinal numbers appearing in the classification.
the following shelving id transformations of two call numbers in the penn state university libraries catalog, for instance, fail to properly account for the ordinal numbers which appear within the classification:

e507.5 36th.v47 2003 → e 000507.5 36th.v47 2003
e507.5 5th.c36 2000 → e 000507.5 5th.c36 2000

consequently, and as expected, these two call numbers sort incorrectly within workflows' call number browsing panes.11

proposed parsing and sorting methodology

given the sorting difficulties inherent in the single-column approaches outlined above, we suggest a multi-column, tiered sorting procedure in which only like portions of the call number are compared to one another. this requires the call number to be processed, its various components identified, and each component appropriately sorted according to its specific type. this, in turn, requires a sorting algorithm which can identify like substrings by scanning for specific patterns and cues. "shelf reading" is a term for the common practice of verifying the correct ordering of items filed on a library shelf, typically unaided by technology, and our approach is primarily informed by the kind of mental procedures one undertakes when performing such sorting "in one's head."12 perhaps the most significant component of this process involves recognizing and interpreting the role and logic of specific types of substrings and identifying their positions within the sorting hierarchy.

the overall design of the lc classification, from class to subclass to caption, constitutes a left-to-right progression from general to specific, and the classification portion of a call number can be interpreted as a series of containers holding items of increasingly narrow scope, some of which may be empty (that is, absent). this creates a structure that has a linear, hierarchical aspect, but also contains within it subcategories that share a common position within the structure. the priority that a subcategory (or container) is afforded in the sorting process depends first upon its position in the linear hierarchy, and subsequently on the depth ascribed to it relative to other subcategories that share the same position. call numbers indicate a subcategory's position in the linear dimension by including or expanding sections; its depth within a given position is encoded in the character or series of characters chosen to represent it. thus, the sorting process may be regarded as a comparison of the paths that two call numbers denote through this structure, and the point at which the paths diverge is then the decisive point in determining an item's position relative to others. this inflection point may occur at any juncture of the comparison, from the first character to the last. given these observations, a comprehensive machine-sorting strategy must observe the following provisions:

1. characters in call numbers should only be compared to characters that occupy an equivalent section of another call number. ("like compared to like.")
2. within these designated sections, characters should only be compared to characters that occupy a corresponding position (place value) within that section.
3. if call numbers are identical up to the point that one of them lacks a section that the other call number possesses, the one with the "missing" section is ordered first.
this is in keeping with the convention that items occupying a more general level in the hierarchy are ordered before those occupying a more specific one. (this principle is often summarized in shelf-reading tutorials as "nothing before something.")
4. if call numbers are identical up to the point that one of them lacks a character in a given position within a particular section that the other call number possesses, the one missing the character is ordered first. again, this preserves the general-to-specific scheme of lcc sorting. (another instance of "nothing before something.")
5. whole numbers (e.g., caption integers, volume numbers) must be distinguished from decimals. for character-by-character sorting to work in sections of the call number containing integers, the length of whole numbers must be normalized to assure each digit is compared to another of equal place value.

application of methodology

shelfreader is a software application designed by the authors to improve the speed and accuracy of the shelf-reading process in collections filed using the library of congress system and is, to our knowledge, the first such application to do so. it was coded by scott wagner in php and javascript, uses mysql for data storage and sorting, and is deployed as a web application. shelfreader allows the user to scan library items in the order they are shelved and receive feedback regarding any mis-shelved items. the program receives an item's unique barcode identification via a barcode scanner, assembles a rest request incorporating the barcode, and sends it to an api connected to the lms. the application then processes the response, retrieving the title and call number of the item, along with information about the item's status (for example, if it has been marked as lost or missing). the call number is passed off to the sorting algorithm, which processes it and assigns it a position among the set of call numbers recorded during that session. a user interface then presents a "virtual shelf" to the user displaying a graphical representation of the items in the order they were scanned. when items are out of place on the shelf, the program calculates the smallest number of moves needed to correct the shelf and presents the necessary corrections for the user to perform until the shelf is properly ordered. a screenshot depicting the shelfreader gui during a typical shelf-reading session is presented in figure 2.

figure 2. a screenshot of the shelfreader gui, showing an incorrectly filed item (highlighted in blue text) and its proper filing position (represented by the green band).

shelfreader's sorting strategy consists of breaking call numbers into elemental substrings and arranging those parts in a database table so that any two call numbers may be compared exclusively on the basis of their corresponding parts. to this end, a base set of call number components was established. these are shown in table 2, along with their abbreviations (for ease in reference), maximum length, and corresponding mysql data types. the specific mysql data type determines the kind of sorting employed in each column:

• varchar accepts alphanumeric string data. sorting is character by character, numbers before letters.
• integer accepts numerical data; numbers are evaluated as whole numbers.
• decimal accepts decimal values.
specifying the overall length of the column and the number of characters to the right of the decimal point has the effect of adding zeros as placeholders in any empty spaces to the right of the last digit. the values are then compared digit by digit.
• timestamp is a date/time value that defaults to the date and time the entry is made. this orders call numbers that are identical (i.e., multiple copies of the same item) in the order they are scanned.

section, component | abbreviation | max. length | mysql data type
classification: class/subclass | sbc | 3 | varchar
classification: caption number, integer part | ci | 4 | integer
classification: caption number, decimal part | cdl | 16 | decimal
classification: caption date | cdt | 4 | varchar
classification: caption ordinal | co | 16 | integer
classification: caption ordinal indicator | coi | 2 | varchar
cutter: first cutter, alphabetical part | c1a | 3 | varchar
cutter: first cutter, numerical part | c1n | 16 | decimal
cutter: first cutter date | cd | 4 | integer
cutter: second cutter, alphabetical part | c2a | 3 | varchar
cutter: second cutter, numerical part | c2n | 16 | decimal
cutter: second cutter date | cd2 | 4 | integer
cutter: third cutter, alphabetical part | c3a | 3 | varchar
cutter: third cutter, numerical part | c3n | 16 | decimal
specification: specification | sp | 256 | varchar
timestamp | — | — | mysql timestamp
table 2. shelfreader call number components and data types.

when parsing a call number, it must be assumed that each call number may contain all of the components identified above. the following is a general outline of the parsing algorithm which processes the call number:

1. an array is created from the call number. each character, including spaces, is an element of the array.
2. a second array is then created to serve as a template for each call number, replacing the actual characters with ones indicating data type. for example, all integers are replaced with 'i's. this makes pattern matching and data-type testing simpler.
3. pattern matching is used to identify the presence or absence of landmarks such as cutters, spaces, volume-type numbering, etc.
4. when landmarks are identified, their beginning and ending positions in the call number string are noted.
5. component strings are created by looping through the appropriate section of the call number template, constructing a string in which the template characters are replaced by the actual characters in the call number string and continuing until a space, the end of the string, or an incompatible character is encountered.
6. where needed, whole-number strings are normalized to uniform length.

dividing a call number into its component parts and placing those parts in separate columns in a database table, then, effectively creates a sort key that may be used for ordering. this key occupies a row of the table, and is an inflated representation of the call number insofar as it makes use of the maximum possible string length of each component type. it contains the characters of each component the call number possesses, and any empty columns serve as placeholders for components it does not possess. when two call numbers are compared, sorting proceeds through each successive column, each component (and each character within each component) serving as a potential break point within the sorting process. we note that every column (with the exception of the specification) contains exclusively alphabetic or numeric data, so that numbers and letters are never compared in those sections of the call number string. (the use of spaces in the specification string effectively accounts for the mixed alphanumeric data type.) a simplified sketch of this component-based sorting appears below.
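the following python snippet is our own highly simplified illustration of the idea, not shelfreader's php/mysql implementation; it recognizes only a subset of the components in table 2 (class/subclass, caption integer and decimal, up to two cutters, and a trailing specification), and the regular expression and padding widths are assumptions made for the example.

```python
import re

# a simplified component pattern: subclass, caption integer, optional caption
# decimal, up to two cutters, and whatever remains as the specification
PATTERN = re.compile(
    r"^(?P<sbc>[a-z]{1,3})\s*"
    r"(?P<ci>\d+)"
    r"(?:\.(?P<cdl>\d+))?"
    r"(?:\.(?P<c1a>[a-z]+)(?P<c1n>\d+))?"
    r"(?:\.?(?P<c2a>[a-z]+)(?P<c2n>\d+))?"
    r"\s*(?P<sp>.*)$"
)

def sort_key(call_number: str):
    """Build a tuple of components; tuple comparison mimics column-by-column sorting."""
    m = PATTERN.match(call_number.lower())
    if not m:
        return (call_number,)  # fall back to the raw string for unparsed input
    g = m.groupdict(default="")
    return (
        g["sbc"],
        int(g["ci"] or 0),                              # caption integer as a whole number
        float("0." + g["cdl"]) if g["cdl"] else 0.0,    # caption decimal
        g["c1a"],
        float("0." + g["c1n"]) if g["c1n"] else 0.0,    # cutter numbers are decimals
        g["c2a"],                                       # empty string sorts before letters
        float("0." + g["c2n"]) if g["c2n"] else 0.0,
        g["sp"].zfill(7),                               # crude whole-number padding of the spec
    )

shelf = ["b945.d4b65 1998", "b945.d41 1981b", "pz7.a1", "pz30.a1"]
for cn in sorted(shelf, key=sort_key):
    print(cn)
# b945.d4b65 1998, b945.d41 1981b, pz7.a1, pz30.a1 -- the orderings that defeat
# single-column sorting come out correctly because only like parts are compared
```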
some additional points of clarification regarding the algorithm's multi-column approach to sorting are worth mentioning:

1. any lowercase alphabetic characters are converted to uppercase before processing in order to ensure that letter case does not affect sorting.
2. components are arranged in the database table from left to right in the order they occur in the call number.
3. if a call number does not contain a given component, the column is left empty (in the case of a varchar column) or is assigned a zero value (in the case of numeric columns).
4. empty columns and zero columns sort before columns containing data.
5. in columns designated as varchar columns, numbers are compared as whole numbers. this means that, in order to sort correctly, the length of any number stored must be normalized to a uniform length (6 places) by adding leading zeros. for example, 17 must be normalized to "000017."
6. sorting proceeds column by column, provided the call numbers are identical. when the first difference is encountered, sorting is complete.

table 3 shows two randomly selected call numbers of rather common configuration, along with the corresponding sort keys created by shelfreader:

column order: sbc | ci | cdl | cdt | co | coi | c1a | c1n | cd | c2a | c2n | cd2 | c3a | c3n | sp
e169.1.b634 2002 → e | 0169 | 0.10000 | 0 | b | 0.6340000000 | 0002002
e169.1.b653 1987 → e | 0169 | 0.10000 | 0 | b | 0.6530000000 | 0001987
table 3. example shelfreader sort-key processing of two similar call numbers (empty columns omitted).

in this first example, sorting is complete when 3 is compared to 5 in the first numerical cutter (c1n) column. (note that we have here truncated the length of certain strings for space and readability.) to illustrate how the application handles call numbers having heterogeneous formats, table 4 shows the sort keys created from two call numbers in an example mentioned above, one with a "double cutter" and one without:

column order: sbc | ci | cdl | cdt | co | coi | c1a | c1n | cd | c2a | c2n | cd2 | c3a | c3n | sp
b945.d4b65 1998 → b | 0945 | 0.0 | 0 | d | 0.400000 | b | 0.650000 | 0001998
b945.d41 1981b → b | 0945 | 0.0 | 0 | d | 0.410000 | 0.000000 | 0001981b
table 4. shelfreader sort-key processing of a "double cutter" call number and a nearby, single-cutter call number (empty columns omitted).

by pushing the second cutter (b65) in the first call number into the c2a and c2n columns, the issue of comparing incompatible sections of the call number is avoided, as the 1 in the second call number is compared to the placeholder 0 in the first. when the sorting routine reaches this position, it terminates, and any subsequent characters are ignored. aspects of this multi-column approach may seem counterintuitive at first, but the method mimics what we do when we order call numbers mentally. one compares two call numbers character by character within these component categories until encountering a difference, or until a character or entire category in one of the call numbers is found to be absent.

results

shelfreader's sorting method is powerful and accurate, and it has been extensively tested without issue in a number of different academic libraries within penn state's statewide system.
the application accurately sorts all valid lc call numbers (with the exception of those for certain cartographic materials in the g1000–g9999 range, which sometimes employ a different syntax and sorting order) as well as those of the national library of medicine classification system (which augments lcc with class w and subclasses qs–qz) and the national library of canada classification (which adds to lcc the subclass fc, for canadian history). while there may conceivably be valid lc or lc-extended call numbers having exotic formats that would fail to correctly sort in shelfreader, we are not aware of any examples (outside of, once again, the g1000–g9999 range), nor have we received reports of any from users.

in addition to verifying proper shelf-ordering, shelfreader contains a number of other features useful for stacks maintenance. the program can identify shelved items that are still checked out to patrons, have been marked missing or lost, or are flagged as in transit between locations, and often reveals items which have been inadvertently "shadowed" (i.e., excluded from public-facing library catalogs) or have shelf labels which do not match their catalogued call numbers. the gui has different modes to accommodate the user's preferred view (both single-shelf and multi-shelf, stacks views), and allows for a good deal of flexibility in how and when the user wishes to make and record shelf corrections. a reports module is also included, which tracks shelving statistics and other useful information for later reference.

the shelfreader application code (including the full sorting algorithm) is freely available via an mit license at https://github.com/scodepress/shelfreader. while shelfreader was developed and tested using the collections and systems of penn state university libraries, its architecture could be adapted and configured for use with other library apis and adjusted to suit local practices within the general confines of the lc call number structure.13 we can also envision a wide array of potential applications of the sorting functionality within other software environments, and we welcome and encourage users to pursue innovative adaptations of the method.

references and notes:

1 leo e. lamontagne, american library classification: with special reference to the library of congress (hamden, ct: the shoe string press, 1961). the lengthy development of the lcc is described in detail in chapters xiii and xiv (pp. 221-51).
2 indeed, as lamontagne asserts, "the classification was constructed [ . . . ] to provide for the needs of the library of congress, with no thought to its possible adoption by other libraries. in fact, the library has never recommended that other libraries adopt its system . . . " (ibid., p. 252). nevertheless, lcc is employed by the overwhelming majority of academic libraries in the united states (brady lund and daniel agbaji, "use of dewey decimal classification by academic libraries in the united states," cataloging & classification quarterly 56, no. 7 (december 2018): 653-61, https://doi.org/10.1080/01639374.2018.1517851).
3 "library of congress classification," library of congress, https://www.loc.gov/catdir/cpso/lcc.html. italics in original.
4 for a summary of lc sorting rules, see "how to arrange books in call number order using the library of congress system," rutgers university libraries, https://www.libraries.rutgers.edu/rul/staff/access_serv/student_coord/libconsys.pdf.
note that this summary is not comprehensive and does not cover all contingencies.
5 here we emphasize that our definition of the cutter string may differ from that of others, including (at times) that of the library of congress. for instance, the schedules for certain lcc subclasses regard the first portion of a cutter as part of the classification itself. since this paper concerns sorting rather than classification, we favor the simpler and more convenient definition.
6 j.f. conley and l.a. nolan, "call number sorting in excel," https://scholarsphere.psu.edu/downloads/9cn69m421z.
7 conley and nolan, "call number sorting in excel."
8 tim dannay, "sorting lc call numbers in excel," https://medium.com/@tdannay/sorting-lc-call-numbers-in-excel-75de044bbb04.
9 while there is in fact a "hack" or partial patch built into the program which identifies call numbers beginning with the subclass kbg and parses them separately, there is no general support for other call numbers in this category.
10 for the details of this syntax, see "050 library of congress call number (r)," library of congress, https://www.loc.gov/marc/bibliographic/bd050.html.
11 testing was conducted on sirsidynix symphony workflows staff client version 3.5.2.1079, build date june 5, 2017.
12 for an overview, see "student library assistant training guide: shelving basics," florida state college at jacksonville, https://guides.fscj.edu/training/shelving.
13 shelfreader was written to receive real-time data directly from a sirsidynix api connected to penn state university libraries' lms, a great improvement over drawing from a static collections database. this does, however, present a challenge for making the program easily adaptable to libraries using distinct web services. a strategy to adapt the program would need to account for potential differences in barcode structure, structure and naming conventions in the rest request, and structure and naming conventions within the server response from institution to institution. it is possible that these issues could be resolved via a configuration file made available to the user, but no attempt to address this issue has been undertaken as of yet.

letter from the editor
kenneth j. varnum
information technology and libraries | september 2018
https://doi.org/10.6017/ital.v37i3.10747

this september 2018 issue of ital continues our celebration of the journal's 50th anniversary with a column by former editorial board member mark dehmlow, who highlights the technological changes beginning to stir the library world in the 1980s. the seeds of change planted in the 1970s are germinating, but the explosive growth of the 1990s is still a few years away.
in addition to peer-reviewed articles on recommender systems, big data processing and storage, finding vendor accessibility documentation, using gis to find specific books on a shelf, and a recommender system for archival manuscripts, we are also publishing the student paper by this year's ex libris/lita student writing award winner, "the open access citation advantage: does it exist and what does it mean for libraries?", by colby lewis at the university of michigan school of information. this insightful paper impressed the competition's judges (as ital's editor, i was one of them) and i am very pleased to include ms. lewis' work here. this issue also marks my fourth as editor. with one year under my belt i am finding a rhythm for the publication process and starting to see the increased flow of articles from outside traditional academic library spaces that i wrote about in december 2017. as always, if you have an idea for a potential ital article, please do get in touch. we on the editorial board look forward to working with you.

sincerely,
kenneth j. varnum, editor
varnum@umich.edu
september 2018

article
bridging the gap: using linked data to improve discoverability and diversity in digital collections
jason boczar, bonita pollock, xiying mi, and amanda yeslibas
information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.13063

jason boczar (jboczar@usf.edu) is digital scholarship and publishing librarian, university of south florida. bonita pollock (pollockb1@usf.edu) is associate director of collections and discovery, university of south florida. xiying mi (xmi@usf.edu) is digital initiative metadata librarian, university of south florida. amanda yeslibas (ayesilbas@usf.edu) is e-resource librarian, university of south florida. © 2021.

abstract

the year of covid-19, 2020, brought unique experiences to everyone in their daily as well as their professional lives. facing many challenges of division in all aspects (social distancing, political and social divisions, remote work environments), university of south florida libraries took the lead in exploring how to overcome these various separations by providing access to its high-quality information sources to its local community and beyond. this paper shares the insights of using linked data technology to provide easy access to digital cultural heritage collections not only for scholarly communities but also for underrepresented user groups. the authors present the challenges at this special time in history, discuss the possible solutions, and propose future work to further the effort.

introduction

we are living in a time of division. many of us are adjusting to a new reality of working separated from our colleagues and the institutions that formerly brought us together physically and socially due to covid-19. even if we can work in the same physical locale, we are careful and distant with each other. our expressions are covered by masks, and we take pains with hygiene that might formerly have felt offensive. but the largest divisions and challenges being faced in the united states go beyond our physical separation.
the nation has been rocked and confronted by racial inequality in the form of black lives matter, a divisive presidential campaign, income inequality exacerbated by covid-19, the continued reckoning with the #metoo movement, and the wildfires burning the west coast. it feels like we are burning both literally and metaphorically as a country. adding fuel to this fire is the consumption of unreliable information. ironically, even as our divisions become more extreme, we are increasingly more connected and tuned into news via the internet. sadly, fact checking and sources are few and far between on social media platforms, where many are getting their information. the pew foundation report the future of truth and misinformation online warns that we are on the verge of a very serious threat to the democratic process due to the prevalence of false information. lee raine, director of the pew research center’s internet and technology project, warns, “a key tactic of the new anti-truthers is not so much to get people to believe in false information. it’s to create enough doubt that people will give up trying to find the truth, and distrust the institutions trying to give them the truth.”1 libraries and other cultural institutions have moved very quickly to address and educate their populations and the community at large, trying to give a voice to the oppressed and provide information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 2 reliable sources of information. the university of south florida (usf) libraries reacted by expanding antiracism holdings. usf’s purchases were informed by work at other institutions, such as the university of minnesota’s antiracism reading lists, which has in turn grown into a rich resource that includes other valuable resources like the mapping prejudice project and a link to the umbra search.2 the triad black lives matter protest collection at the university of north carolina greensboro is another example of a cultural institution reacting swiftly to document, preserve, and educate.3 these new pages and lists being generated by libraries and cultural institutions seem to be curated by hand using tools that require human intervention to make them and keep them up to date. this is also a challenge the usf libraries faced when constructing its new african american experience in florida portal, a resource that leverages already existing digital collections at usf to promote social justice. another key challenge is linking new digital collections and tools to already established collections and holdings. beyond the new content being created in reaction to current movements, there is already a wealth of information established in rich archives of material, especially regarding african american history. digital collections need to be discoverable by a wide audience to achieve resource sharing and educational purposes. this is a challenge many digital collections struggle with, because they are often being siloed from library and archival holdings even within their own institutions. all the good information in the world is not useful if it is not findable. an example of a powerful discovery tool that is difficult to find and use is the umbra search (https://www.umbrasearch.org/) linked to the university of minnesota’s anti-racism reading list. 
umbra search is a tool that aggregates content from more than 1,000 libraries, archives, and museums.4 it is also supported by high-profile grants from the institute of museum and library services, the doris duke charitable foundation, and the council on library and information resources. however, the website is difficult to find in a web search. umbra search was named after society of umbra, a collective of black poets from the 1960s. the terms umbra and society of umbra do not return useful results for finding the portal, nor do broader searches of african american history; the portal is difficult to find through basic web searches. one of the few chances for a user to find the site is if they came upon the human-made link in the university of minnesota anti-racism reading list. despite enthusiasm from libraries and other cultural institutions, new purchases and curated content are not going to reach the world as fully as hoped. until libraries adopt open data formats instead of locking away content in closed records like marc, library and digital content will remain siloed from the internet. the library catalog and digital platforms are even siloed from each other. we make records and enter metadata that is fit for library use but not shareable to the web. as karen coyle asked in her lita keynote address a decade ago, the question is how can libraries move from being "on the web" to being "of the web"?5 the suggested answer and the answer the usf libraries are researching is with linked data.

literature review

the literature on linked data for libraries and cultural heritage resources reflects an implementation that is "gradual and uneven." as national libraries across the world and the library of congress develop standards and practices, academic libraries are still trying to understand their role in implementation and identify their expertise.6 in 2006 tim berners-lee, the creator of the semantic web concept, outlined four rules of linked data:

1. use uris as names for things.
2. use http uris so that people can look up those names.
3. when someone looks up a uri, provide useful information, using the standards (rdf, sparql).
4. include links to other uris so that they can discover more things.7

it was not too long after this that large national libraries began exploring linked data and experimenting with uses. in 2010 the british library presented its prototype of linked data. this move was made in accordance with the uk government's commitment to transparency and accountability along with the user's expectation that the library would keep up with cutting-edge trends.8 today the british library has released the british national bibliography as linked data instead of the entire catalog because it is authoritative and better maintained than the catalog.9 the national libraries of europe, spurred on by government edicts and europeana (https://www.europeana.eu/en), are leading the progress in implementation of linked data. national libraries are uniquely suited to the development and promotion of new technologies because of their place under the government and proximity to policy making, bridging communication between interested parties and the ability to make projects into sustainable services.10 a 2018 survey of all european national libraries found that 15 had implemented linked data, two had taken steps for implementation, and three intended to implement it.
even national libraries that were unable to implement linked data were contributing to the linked data open cloud by providing their data in datasets to the world.11 part of the difficulty with earlier implementation of linked data by libraries and cultural heritage institutions was the lack of a “killer example” that libraries could emulate.12 the relatively recent success of european national libraries might provide those examples. many other factors have slowed the implementation of linked data. a survey of norwegian libraries in 2009 found considerable gap in the semantic web literature between the research undertaken in the technological field and the research in the socio-technical field. implementing linked data requires reorganization of the staff, commitment of resources, education throughout the library and buy-in from the leadership to make it strategically important.13 the survey of european national libraries cited the exact same factors as limitations in 2018.14 outside of european national libraries the implementation of linked data has been much slower. many academic institutions have taken on projects that tend to languish in a prototype or proof of concept phase.15 the library-centric talis group of the united kingdom “embraced a vision of developing an infrastructure based on semantic web technologies” in 2006, but abandoned semantic web-related business activities in 2012.16 it has been suggested that it is premature to wholly commit to linked data, but it should be used for spin-off projects in an organization for experimentation and skill development.17 linked data is also still proving to be technologically challenging for implementation of cultural heritage aggregators. if many human resources are needed to facilitate linked data, it will remain an obstacle for cultural heritage aggregators. a study has shown automated interpretation of information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 4 ontologies is hampered by a lack of inter-ontology relations. cross-domain applications will not be able to use these ontologies without human intervention.18 aiding in the development and awareness of linked data practices for libraries is the creation and implementation of bibframe by the library of congress. the library of congress’s announcement in july 2018 that bibframe would be the replacement of marc definitively shows that the future of library records is focused on linking out and integrating into the web.19 the new rda (resource description and access) cataloging standards made it clear that marc is no longer the best encoding language for making library resources available on the web.20 while rda has adopted the cataloging rules to meet a variety of new library environments, the marc encoding language makes it difficult for computers to interpret and apply logic algorithms to the marc format. in response, the library of congress commissioned the consulting agency zepheria to create a framework that would integrate with the web and be flexible enough to work with various open formats and technologies, as well as be able to adapt to change. using the principles and technologies of the open web, the bibframe vocabulary is made of “resource description framework (rdf) properties, classes, and relationships between and among them.”21 eric miller, the ceo of zepheria, says bibframe “works as a bridge between the description component and open web discovery. 
it is agnostic with regards to which web discovery tool is employed" and though we cannot predict every technology and application bibframe can "rely on the ubiquity and understanding of uris and the simple descriptive power of rdf."22 the implementation of linked data in the cultural heritage sphere has been erratic but seems to be moving forward. it is important to pursue, though, because "bringing local data out of the 'deep web' and making them open and universally accessible, means offering minority cultures a democratic opportunity for visibility."23

linked data

linked data is one way to increase the access and discoverability of critical digital cultural heritage collections. also referred to as semantic web technologies, linked data follows the w3c resource description framework (rdf) standards.24 according to tim berners-lee, the semantic web will bring structure and well-defined meaning to web content, allowing computers to perform more automated processes.25 by providing structure and meaning to digital content, information can be more readily and easily shared between institutions. this provides an opportunity for digital cultural heritage collections of underrepresented populations to get more exposure on the web. following is a brief overview of linked data to illustrate how semantic web technologies function.

linked data is created by forming semantic triples. each rdf triple contains uniform resource identifiers, or uris. these identifiers allow computers (machines) to "understand" and interpret the metadata. each rdf triple consists of three parts: a subject, a predicate, and an object. the subject defines what the metadata rdf triple is about, while the object contains information about the subject which is further defined by the relationship link in the predicate.

figure 1. example of a linked data rdf triple describing william shakespeare's authorship of hamlet.

for example, in figure 1, "william shakespeare wrote hamlet" is a triple. the subject and predicate of the triple are written as a uri containing the identifier information, and the object of the triple is a literal piece of information. the subject of the triple, william shakespeare, has an identifier which in this example links to the library of congress name authority file for william shakespeare. the predicate of the rdf triple describes the relationship between the subject and object. the predicate also typically defines the metadata schema being used. in this example, dublin core is the metadata schema being used, so "wrote" would be identified by the dublin core creator field. the object of this semantic triple, hamlet, is a literal. literals are text that are not linked because they do not have a uri. subjects and predicates always have uris to allow the computer to make links. the object may have a uri or be a literal. together these uris, along with the literal, tell the computer everything it needs to know about this piece of metadata, making it self-contained.

rdf triples with their uris are stored in a triple-store, graph-style database which functions differently from a typical relational database. relational databases rely on table headers to define the metadata stored inside. moving data between relational databases can be complex because tables must be redefined every time data is moved. graph databases don't need tables since all the defining information is already stored in each triple.
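to make the figure 1 example concrete, the short python sketch below constructs a comparable triple with the rdflib library. this is our illustration rather than anything drawn from the usf workflow: the library of congress name authority identifier is shown only as an assumed example, and the "wrote" predicate is minted in a hypothetical example.org namespace because canonical dublin core usage would invert the direction (the work as subject, dc:creator pointing to the person).

```python
# a minimal sketch of the figure 1 triple using rdflib; the loc identifier and
# the example.org predicate are illustrative assumptions, not verified values.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/relation/")  # hypothetical namespace for "wrote"
shakespeare = URIRef("http://id.loc.gov/authorities/names/n78095332")  # assumed lcnaf uri

g = Graph()
# subject: the person (a uri); predicate: "wrote"; object: the literal "Hamlet"
g.add((shakespeare, EX.wrote, Literal("Hamlet")))

# serializing shows the triple in turtle, ready to be loaded into a triple store
print(g.serialize(format="turtle"))
```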
the absence of fixed table structures allows for bidirectional flow of information between pieces of metadata and makes transferring data simpler and more efficient.26 information in a triple-store database is then retrieved using sparql, a query language developed for linked data. because linked data is stored as self-contained triples, machines have all the information needed to process the data and perform advanced reasoning and logic programming. this leads to better search functionality and lends itself well to artificial intelligence (ai) technologies. many of today's modern websites make use of these technologies to enhance their displays and provide greater functionality for their users. the internet is an excellent avenue for libraries to un-silo their collections and make them globally accessible. once library collections are on the web, advanced keyword search functionalities and artificial intelligence machine-learning algorithms can be developed to automate metadata creation workflows and enhance search and retrieval of library resources. the use of linked data metadata in these machine-learning functions will add a layer of semantic understanding to the data being processed and analyzed for patron discovery. ai technology can also be used to create advanced graphical displays making connections for patrons between various resources on a research topic.

sharing digital cultural heritage data with other institutions often involves transferring data and is considered one of the greatest difficulties in sharing digital collections. for example, if one institutional repository uses dublin core to store its metadata for a certain cultural heritage collection and another repository uses mods/mets to store digital collections, there must first be a data conversion before the two repositories could share information. dublin core and mods/mets are two completely different schemas with different fields and metadata standards. these two schemas are incompatible with each other and must be crosswalked into a common schema. this typically results in some data loss during the transformation process. this makes combining two collections from different institutions into one shared web portal difficult.

linked data allows institutions to share collections more easily. because linked data triples are self-contained, there is no need to crosswalk metadata stored in triples from one schema into another when transferring data. the uris contained in the rdf triples allow the computer to identify the metadata schema and process the metadata. rdf triples can be harvested from one linked data system and easily placed into another repository or web portal. a variety of schemas can all be stored together in one graph database. storing metadata in this way increases the interoperability of digital cultural heritage collections. collections stored in triple-store databases have sparql endpoints that make harvesting the metadata in a collection more efficient. libraries can easily share metadata on important collections, increasing the exposure and providing greater access for a wider audience. philip schreur, author of "bridging the worlds of marc and linked data," sums this concept up nicely: "the shift to the web has become an inextricable part of our day-to-day lives. by moving our carefully curated metadata to the web, libraries can offer a much-needed source of stable and reliable data to the rapidly growing world of web discovery."27
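as a companion to the sketch above, the following snippet shows what retrieval with sparql can look like against a small in-memory rdflib graph; it is a toy stand-in for the sparql endpoints just described, reusing the same illustrative uris rather than any real endpoint or collection.

```python
# a toy sparql retrieval sketch over an in-memory rdflib graph (illustrative uris only)
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/relation/")
shakespeare = URIRef("http://id.loc.gov/authorities/names/n78095332")

g = Graph()
g.add((shakespeare, EX.wrote, Literal("Hamlet")))
g.add((shakespeare, EX.wrote, Literal("King Lear")))

# ask for every work linked to this subject by the "wrote" predicate
results = g.query(
    """
    PREFIX ex: <http://example.org/relation/>
    SELECT ?work
    WHERE { <http://id.loc.gov/authorities/names/n78095332> ex:wrote ?work . }
    ORDER BY ?work
    """
)
for row in results:
    print(row.work)  # Hamlet, then King Lear
```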
linked data also makes it easier to harvest metadata and import collections into larger cultural heritage repositories like the digital public library of america (dpla), which uses linked data to "empower people to learn, grow, and contribute to a diverse and better-functioning society by maximizing access to our shared history, culture, and knowledge."28 europeana, the european cultural heritage database, uses semantic web technologies to support its mission, which is to "empower the cultural heritage sector in its digital transformation."29 using linked data to transfer data into these national repositories is more efficient, and there is less loss of data because the triples do not have to be transformed into another schema. this increases the access of many cultural heritage collections that might not otherwise be seen.

one of the big advantages to linked data is the ability to create connections between other cultural heritage collections worldwide via the web. incorporating triples harvested from other collections into the local datasets enables libraries to display a vast amount of information about cultural heritage collections in their web portals. libraries thus can provide a much richer display and allow users access to a greater variety of resources. linked data also allows web developers to use uris to implement advanced search technologies, creating a multifaceted search environment for patrons. current research points to the fact that using semantic web technologies makes the creation of advanced logic and reasoning functionalities possible. according to liyang yu in the book introduction to the semantic web and semantic web services, "the semantic web is an
through the use of linked data, we can connect to other trusted sources on the web.… we can also take advantage of a truly international web environment and reuse metadata created by other national libraries.”31 university of south florida libraries practice university of south florida libraries digital collections house a rich collection varying from cultural heritage objects to natural science and environment history materials to collections related to underrepresented populations. most of the collections are unique to usf and have significant research and educational value. the library is eager to share the collections as widely as possible and hopes the collections can be used at both document and data level. linked data creates a “web of data” instead of a “web of documents,” which is the key to bringing structure and meaning to web content, allowing computers to better understand the data. however, collections are mostly born at the document level. therefore, the first problem librarians need to solve is how to transform the documents to data. for example, there is a beautiful natural history collection called audubon florida tavernier research papers in usf libraries digital collections. the audubon florida tavernier research papers is an image collection which includes rookeries, birds, people, bodies of water, and man-made structures. the varied images come from decades of research and are a testament to the interconnectedness of bird population health and human interaction with the environment. the images reflect the focus of audubon’s work in research and conservation efforts both to wildlife and to the habitat that supports the wildlife.32 this was selected to be the first collection the authors experimented with to implement linked data at usf libraries. the lessons learned from working with this collection are applied to later work. when the collection was received to be housed in the digital platform, it was carefully analyzed to determine how to pull the data out of all the documents as much as possible. the authors designed a metadata schema of the combination of mods and darwin core (darwin core, abbreviated to dwc, is an extension of dublin core for biodiversity informatics) to pull out and properly store the data. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 8 figure 2. american kestrel. figure 3. american kestrel metadata. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 9 figure 2 is one of the documents in the collection, which is a photo of an american kestrel. figure 3 shows the data collected from the document and the placement of the data in the metadata schema. the authors put the description of the image in free text in the abstract field. this field is indexed and searchable through the digital collections platform. location information is put in the hierarchical spatial field. the subject heading fields describe the “aboutness” of the image, that is, what is in the image. all the detailed information about the bird is placed in darwin core fields. thus, the document is dissembled into a few pieces of data which are properly placed into metadata fields where they can be indexed and searched. having data alone is not sufficient to meet linked data requirements. 
the first of the four rules of linked data is to name things using uris.33 to add uris to the data, the authors needed to standardize the data and reconcile it against widely used authorities such as the library of congress subject headings, wikidata, and the getty thesaurus of geographic names. standardized data tremendously increases the percentage of successful data reconciliation, which will lead to more links with related data once published.

figure 4. amenaprkitch khachkar.

figure 4 shows an example from the armenia heritage and social memory program. this is a visual materials collection with photos and 3d digital models. it was created by the digital heritage and humanities collection team at the library. the collection brings together comprehensive information and interactive 3d visualization of the fundamentals of armenian identity, such as their architectures, languages, arts, etc.34 when preparing the metadata for the items in this collection, the authors devoted extra effort to adding geographic location metadata. this effort serves two purposes: one is to respectfully and honestly include the information in the collection; and the second is to provide future reference to the location of each item, as the physical items are in danger and could disappear or be ruined. the authors employed the getty thesaurus of geographic names because it supports a hierarchical location structure. the location names at each level can be reconciled and have their own uris. the authors also paid extra attention to the subject headings. figure 5 shows how the authors used library of congress subject headings, local subject headings assigned by the researchers, and the getty art and architecture thesaurus for this collection. in the data reconciliation stage, the metadata can be compared against both library of congress subject headings authority files and the getty aat vocabularies so that as many uris as possible can be fetched and added to the metadata. the focus on geographic names and subject headings is to standardize the data and use controlled vocabularies as much as possible. once moving to the linked data world, the data will be ready to be added with uris. therefore, the data can be linked easily and widely.

figure 5. amenaprkitch khachkar metadata.
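as a rough picture of this reconciliation step, the python sketch below stands in for a reconciliation service such as those the authors reach through openrefine; the tiny in-memory authority table and the identifiers in it are placeholders only, and a uri is returned solely on an exact, case-insensitive label match, mirroring the exact-match policy described here.

```python
from typing import Optional

# a stand-in for a reconciliation service: the identifiers below are
# placeholders, not real lcsh or getty aat records
AUTHORITY_LABELS = {
    "khachkars": "http://vocab.getty.edu/aat/300000000",                          # placeholder id
    "church architecture": "http://id.loc.gov/authorities/subjects/sh00000000",   # placeholder id
}

def reconcile(label: str) -> Optional[str]:
    """Return a matching authority uri, or None when no exact match exists."""
    return AUTHORITY_LABELS.get(label.strip().lower())

for term in ["Khachkars", "Church architecture", "medieval monuments"]:
    uri = reconcile(term)
    print(term, ":", uri if uri else "no exact match; term kept as a plain literal")
```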
with semantic web technology support, these collections can be turned into machine actionable datasets to assist research and education activities on racism, anti-racism and to piece into the holistic knowledge base. usf libraries started to partner with dpla in 2018. dpla leverages linked data technology to increase discoverability of the collections contributed to it. dpla employs javascript object notation for linked data (json-ld) as its serialization for their data which is in rdf/xml format. json-ld has a method of identifying data with iris. the use of this method can effectively avoid data ambiguity considering dpla is holding a fairly large amount of data. json-ld also provides computational analysis in support of semantics services which enriches the metadata and in results, the search will be more effective.35 in the 18 months since usf began contributing selected digital collections to dpla, usf materials have received more than 12,000 views. it is exciting to see the increase in the usage of the collections and it is the hope that they will be accessed by more diverse user groups. usf libraries are exploring ways to scale up the project and eventually transition all the existing digital collections metadata to linked data. one possible way of achieving this goal would be through metadata standardization. a pilot project at usf libraries is to process one medium-size image collection of 998 items. the original metadata is in mods/mets xml files. we first decided to use the dpla metadata application profile as the data model. if the pilot project is successful, we will apply this model to all of our linked data transformation processes. in our pilot, we are examining the fields in our mods/mets metadata and identify those that will be meaningful in the new metadata schema. then we transport the metadata in those fields to excel files. the next step is to use openrefine to reconcile the data in these excel files to fetch uris for exact match terms. during this step, we are employing reconciliation services from the library of congress, getty tgn, and wikidata. after all the metadata is reconciled, we are transforming the excel file to triples. the column headers of the excel file become the predicates and the metadata as well as their uris will be the objects of the triples. next, these triples will be stored in an apache jena triple-store database so that we can start designing sparql queries to facilitate search. the final step will be designing a user-friendly interface to further optimize the user experiences. in this process, to make the workflow as scalable as possible, we are focusing on testing two processes: first, creating a universal metadata application profile to apply to the most, if not all, of the collections; and second, only fetching uris for exactly matching terms during the reconciliation information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 12 process. both of these processes aim to reduce human interactions with the metadata so that the process is more affordable to the library. conclusion and future work linked data can help collection discoverability. in the past six months, usf has seen an increase in materials going online. usf special collections department rapidly created digital exhibits to showcase their materials. 
if the trend in remote work continues, there is reason to believe that digital materials may be increasingly present and, given enough time and expertise, libraries can leverage linked data to better support current and new collections. the societal impact of covid-19 worldwide sheds light on the importance of technologies such as linked data that can help increase discoverability. when items are being created and shared online, either directly related to covid-19 or a result of its impact, linked data can help connect those resources. for instance, new covid-19 research is being developed and published daily. the publications office of the european union datathon entry “covid-19 data as linked data” states that “[t]he benefit of having covid-19 data as linked data comes from the ability to link and explore independent sources. for example, covid-19 sources often do not include other regional or mobility data. then, even the simplest thing, having the countries not as a label but as their uri of wikidata and dbpedia, brings rich possibilities for analysis by exploring and correlating geographic, demographic, relief, and mobility data.”36 the more institutions that contribute to this, the greater the discoverability and impact of the data. in 2020 there has been an increase in black lives matter awareness across the country. this affects higher education. usf libraries are not the only ones engaged in addressing racial disparities. many institutions have been doing this for years. others are beginning to focus on this area. no matter whether it’s a new digital collection or one that’s been around for decades, the question remains: how do people find these resources? perhaps linked data technologies can help solve that problem. linked data is a technology that can help accentuate the human effort put forth to create those collections. linked data is a way to assist humans and computers in finding interconnected materials around the internet. usf libraries faced many obstacles implementing linked data. there is a technological barrier that takes well-trained staff to surmount, i.e., creating a linked data triple store database and having linked data interact correctly on webpages. there is a time commitment necessary to create the triples and sparql queries. sparql queries themselves vary from being relatively simple to incredibly complicated. the authors also had the stumbling block of understanding how linked data worked together on a theoretical level. taking all of these considerations into account, we can say that creating linked data for a digital collection is not for the faint of heart. a cost/benefit analysis must be taken and assessed. the authors of this paper must continue to determine the need for linked data. at usf, the authors have taken the first steps in converting digital collections into linked data. we’ve moved from understanding the theoretical basis of linked data and into the practical side where the elements that make up linked data start coming together. the work to create triples, sparql queries, and uris has begun, and full implementation has started. our linked data group has learned the fundamentals of linked data. the next, and current, step is to develop workflows for existing metadata conversion into appropriate linked data. the group meets regularly and has created a triple store database and converted data into linked data. 
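a minimal sketch of what that conversion can look like, assuming an rdflib graph in which spreadsheet column headers have become dublin core predicates and reconciled uris have become objects; the item uris, the placeholder authority identifier, and the query are hypothetical, not usf's actual data model. the same serialized triples could then be loaded into an apache jena triple store and queried with sparql.

```python
# Minimal sketch: column headers become predicates, reconciled URIs
# become objects, and a SPARQL query lists items sharing one subject.
# Item URIs and the authority identifier below are placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

ITEM = Namespace("https://digital.lib.usf.edu/example-item/")          # hypothetical
SUBJECT_URI = URIRef("http://id.loc.gov/authorities/subjects/sh0000000")  # placeholder

rows = [
    {"id": "1", "title": "american kestrel", "subject_uri": str(SUBJECT_URI)},
    {"id": "2", "title": "roseate spoonbill", "subject_uri": str(SUBJECT_URI)},
]

g = Graph()
for row in rows:
    item = ITEM[row["id"]]
    g.add((item, DCTERMS.title, Literal(row["title"])))
    g.add((item, DCTERMS.subject, URIRef(row["subject_uri"])))

# Query the in-memory graph; against Jena the same SPARQL would be sent
# to the store's endpoint instead.
query = """
    SELECT ?item ?title WHERE {
        ?item <http://purl.org/dc/terms/subject> ?s ;
              <http://purl.org/dc/terms/title> ?title .
    }
"""
for item, title in g.query(query, initBindings={"s": SUBJECT_URI}):
    print(item, title)
```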
while the process is slow information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 13 moving due to group members’ other commitments, progress is being made by looking at the most relevant collections we would like to transform and moving forward from there. we’ve located the collections we want to work on, taking an iterative approach to creating linked data as we go. with linked data, there is a lot to consider. how do you start up a linked data program at your institution? how will you get the required expertise to create appropriate and high-quality linked data? how will your institution crosswalk existing data into triples format? is it worth the investment? it may be difficult to answer these questions but they’re questions that must be addressed. the usf libraries will continue pursuing linked data in meaningful ways and showcasing linked data’s importance. linked data can help highlight all collections but more importantly those of marginalized groups, which is a priority of the linked data group. endnotes 1 peter perl, “what is the future of truth?” pew trust magazine, february 4, 2019, https://www.pewtrusts.org/en/trust/archive/winter-2019/what-is-the-future-of-truth. 2 “anti-racism reading lists,” university of minnesota library, accessed september 24, 2020, https://libguides.umn.edu/antiracismreadinglists. 3 “triad black lives matter protest collection,” unc greensboro digital collections, accessed december 9, 2020, http://libcdm1.uncg.edu/cdm/blm. 4 “umbra search african american history,” umbra search, accessed december 10, 2020, https://www.umbrasearch.org/. 5 karen coyle, “on the web, of the web” (keynote at lita, october 1, 2011), https://kcoyle.net/presentations/lita2011.html. 6 donna ellen frederick, “disruption or revolution? the reinvention of cataloguing (data deluge column),” library hi tech news 34, no. 7 (2017): 6–11, https://doi.org/10.1108/lhtn-072017-0051. 7 tim berners-lee, “linked data,” w3, last updated june 18, 2009, https://www.w3.org/designissues/linkeddata.html. 8 neil wilson, “linked data prototyping at the british library” (paper presentation, talis linked data and libraries event, 2010). 9 diane rasmussen pennington and laura cagnazzo, “connecting the silos: implementations and perceptions of linked data across european libraries,” journal of documentation 75, no. 3 (2019): 643–66, https://doi.org/10.1108/jd-07-2018-0117. 10 jane hagerlid, “the role of the national library as a catalyst for an open access agenda: the experience in sweden,” interlending and document supply 39, no. 2 (2011): 115–18, https://doi.org/10.1108/02641611111138923. 11 pennington and cagnazzo, “connecting the silos,” 643–66. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 14 12 gillian byrne and lisa goddard, “the strongest link: libraries and linked data,” d-lib magazine 16, no. 11/12 (2010): 2, https://doi.org/10.1045/november2010-byrne. 13 bendik bygstad, gheorghita ghinea, and geir-tore klæboe, “organisational challenges of the semantic web in digital libraries: a norwegian case study,” online information review 33, no. 5 (2009): 973–85, https://doi.org/10.1108/14684520911001945. 14 pennington and cagnazzo, “connecting the silos,” 643–66. 15 heather lea moulaison and anthony j. million, “the disruptive qualities of linked data in the library environment: analysis and recommendations,” cataloging & classification quarterly 52, no. 
4 (2014): 367–87, https://doi.org/10.1080/01639374.2014.880981. 16 marshall breeding, “linked data: the next big wave or another tech fad?” computers in libraries 33, no. 3 (2013): 20–22. 17 moulaison and million, “the disruptive qualities of linked data,” 369. 18 nuno freire and sjors de valk, “automated interpretability of linked data ontologies: an evaluation within the cultural heritage domain,” (workshop, ieee conference on big data, 2019). 19 “bibframe update forum at the ala annual conference 2018,” (washington, dc: library of congress, july 2018), https://www.loc.gov/bibframe/news/bibframe-update-an2018.html. 20 jacquie samples and ian bigelow, “marc to bibframe: converting the pcc to linked data,” cataloging & classification quarterly 58, no. 3–4 (2020): 404. 21 oliver pesch, “using bibframe and library linked data to solve real problems: an interview with eric miller of zepheira,” the serials librarian 71, no. 1 (2016): 2. 22 pesch, 2. 23 gianfranco crupi, “beyond the pillars of hercules: linked data and cultural heritage,” italian journal of library, archives & information science 4, no. 1 (2013): 25–49, http://dx.doi.org/10.4403/jlis.it-8587. 24 “resource description framework (rdf),” w3c, february 25, 2014, https://www.w3.org/rdf/. 25 tim berners-lee, james hendler, and ora lassila, “the semantic web,” scientific american 284, no. 5 (2001): 34–43, https://www.jstor.org/stable/26059207. 26 dean allemang and james hendler, “semantic web application architecture,” in semantic web for the working ontologist: effective modeling in rdfs and owl, (saint louis: elsevier science, 2011): 54–55. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 15 27 philip e. schreur and amy j. carlson, “bridging the worlds of marc and linked data: transition, transformation, accountability,” serials librarian 78, no. 1–4 (2020), https://doi.org/10.1080/0361526x.2020.1716584. 28 “about us,” dpla: digital public library of america, accessed december 11, 2020. https://dp.la/about. 29 “about us,” europeana, accessed december 11, 2020, https://www.europeana.eu/en/about-us. 30 liyang yu, “search engines in both traditional and semantic web environments,” in introduction to semantic web and semantic web services (boca raton: chapman & hall/crc, 2007): 36. 31 schreur and carlson, “bridging the worlds of marc and linked data.” 32 “audubon florida tavernier research papers,” university of south florida libraries digital collections, accessed november 30, 2020, https://lib.usf.edu/?a64/. 33 berners-lee, “linked data,” https://www.w3.org/designissues/linkeddata.html. 34 “the armenian heritage and social memory program,” university of south florida libraries digital collections, accessed november 30, 2020, https://digital.lib.usf.edu/armenianheritage/. 35 erik t. mitchell, “three case studies in linked open data,” library technology reports 49, no. 5 (2013): 26-43. 36 “covid-19 data as linked data,” publications office of the european union, accessed december 11, 2020, https://op.europa.eu/en/web/eudatathon/covid-19-linked-data. letter from the editor: improving ital's peer review letter from the editor improving ital’s peer review kenneth j. varnum information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.13573 over the past several months, ital has enrolled almost 30 reviewers to the journal’s new review panel. 
increasing the pool of reviewers for the journal supports the editorial board’s desire to provide equitable treatment to submitted articles by having two independent reviews provide double-blind consideration of each article, a practice that has now been in effect for articles submitted after may 1, 2021. i am grateful to the individuals (listed on the editorial team page) who volunteered, attended an orientation session, and have begun contributing to the work of the journal. * * * * * * in this issue in the editorial section of this issue, we have a column by incoming core president margaret heller. her essay, “making room for change through rest,” highlights the need for each of us to recharge after a collectively challenging year. this inaugurates what we plan to be an occasional feature, the “core leadership column,” to which we invite contributions from members of core leadership. it is joined by two other regular items, our editorial board thoughts essay by michael p. sauers, “do space’s virtual interview lab: using simple technology to serve the public in a time of crisis,” and william yarbrough’s public libraries leading the way column, “service barometers: using lending kiosks to locate patrons.” an interesting and diverse set of peer-reviewed articles rounds out the issue: 1. the impact of covid-19 on the use of academic library resources / ruth sara connell, lisa c. wallis, and david comeaux 2. emergency remote library instruction and tech tools: a matter of equity during a pandemic / kathia ibacache, amanda rybin, and eric vance 3. off-campus access to licensed online resources through shibboleth / francis jayakanth, anand t. byrappa, and raja visvanathan 4. a framework for measuring relevancy in discovery environments / blake l. galbreath, alex merrill, and corey m. johnson 5. beyond viaf: wikidata as a complementary tool for authority control in libraries / carlo bianchini, stefano bargioni, and camillo carlo pellizzari di san girolamo 6. algorithmic literacy and the role for libraries / michael ridley and danica pawlick-potts 7. persistent urls and citations offered for digital objects by digital libraries / nicholas homenda kenneth j. varnum, editor varnum@umich.edu june 2021 article classical musicians v. copyright bots: how libraries can aid in the fight adam eric berkowitz information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14027 adam eric berkowitz (berkowitza@hcflgov.net) is supervisory librarian, tampa-hillsborough county public library. © 2022.
abstract the covid-19 pandemic forced classical musicians to cancel in-person recitals and concerts and led to the exploration of virtual alternatives for engaging audiences. the apparent solution was to livestream and upload performances to social media websites for audiences to view, leading to income and a sustained social media presence; however, automated copyright enforcement systems add new layers of complexity because of an inability to differentiate between copyrighted content and original renditions of works from the public domain. this article summarizes the conflict automated copyright enforcement systems pose to classical musicians and suggests how libraries may employ mitigation tactics to reduce the negative impacts when uploaders are accused of copyright infringement. introduction the covid-19 pandemic, unlike anything the country has seen in a century, forced industries to reevaluate the manner in which they provide services to the public. businesses and citizens everywhere made hairpin turns as they quickly searched for virtual alternatives to everyday inperson activities. with many remaining home for extended periods of time, demand for digital content and entertainment skyrocketed. in may 2020, comcast reported a 40% increase in online video streaming since march 1, just weeks before governments instated stay-at-home mandates.1 throughout the year, subscription-based streaming services saw enormous surges in customer usage and, likewise, social media platforms saw a significant spike in content production and consumption.2 daily blogging on facebook replaced in-person interactions, and youtubers generated higher volumes of videos to meet viewer demand. classical musicians were also heavily reliant on social media platforms in order to showcase performances as pointed out in the washington post article “copyright bots and classical musicians are fighting online. the bots are winning.” highlighted by american library association’s american libraries, the article illustrated the toll social media content moderation algorithms took on classical musicians sharing their performances online.3 this article became the starting point for the 2021 study “are youtube and facebook canceling classical musicians?,” which investigated the relationship between classical musicians and automated copyright enforcement systems.4 the following is a summary of this study’s findings and brings attention to the role libraries can play in aiding classical musicians facing copyright infringement claims. automated copyright enforcement evidence shows that automated copyright enforcement systems wrongfully remove useruploaded materials in the name of copyright protections on a regular basis.5 in fact, it happens so often that the australian broadcasting corporation began wittingly dubbing such instances “copywrongs.”6 these algorithms are not designed to distinguish between recordings of music mailto:berkowitza@hcflgov.net information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 2 owned by record labels and those shared online by freelance musicians. they are instructed to recognize copyrighted recordings and content resembling those recordings as identical matches, ensuring the protection of intellectual property from unauthorized reproduction. as such, automated content moderation systems are incapable of making allowances for the performance of works from the public domain. such performances comprise nearly all of a classical musician’s repertoire. 
automated copyright enforcement systems are typically based on a combination of matching and classification methods. the most effective matching technique for content moderation is perceptual hashing, which isolates unique strings of data (hashes) taken from an uploaded file and compares distinguishing markers and patterns to a database of samples provided by copyright owners.7 this technique allows systems to detect exact matches and iterations of the original work, such as live recordings and remixes.8 among classification methods, artificial neural networks with deep learning are best suited to the task of algorithmic moderation. consisting of a network of nodes, they are meant to simulate the structure and function of neural networks in animals and humans.9 this enables them to solve multifaceted, dynamic problems, which makes them ideal for instantaneous content moderation, allowing them to identify musical similarities in real time.10 both youtube and facebook enable users to upload recordings and broadcast live feeds to their websites. matching techniques are used to review prerecorded content since the upload process allows for automated systems to sample the material for comparison to the companies’ hash databases before allowing the recording to be posted.11 in contrast, live broadcasts are transmitted instantaneously and allow for no time to review the footage before it is visible online. therefore, hashes cannot be sampled from streaming content, requiring that classification methods using training data identify infringing material on the fly.12 while these algorithms make content moderation easier, they are limited in their capacity. one study showed that youtube is surprisingly inaccurate in its attempts to recognize infringing material in live broadcasts, failing to identify 26% of copyrighted footage within the first thirty minutes of streaming and blocking 22% of non-infringing livestreams.13 research strongly suggests that the only factors considered by music copyright enforcement systems are pitch, volume, and melodic and harmonic contour.14 those values alone cannot be used to distinguish copyrighted works from the public domain. as such, these systems are not yet advanced enough to account for the total complexity of human creativity, and human intervention is required before these programs systematically accuse uploaders of copyright infringement.15 compositions in the public domain are not subject to copyright; however, recorded performances of compositions from the public domain can be copyrighted. individuals may upload or livestream their own performances of classical music without fear of infringing copyright but may not upload another musician’s copyrighted recordings of the same pieces. for example, no one owns the copyright to bach’s cello suites and, therefore, anyone can profit from performing these works. sony music, though, owns the copyright to yo-yo ma’s recordings of bach’s cello suites, and anyone uploading these specific recordings to social media would be infringing copyright and subject to the repercussions. unfortunately, automated copyright enforcement systems often misidentify an individual’s performances as copyrighted recordings. information technology and libraries june 2022 classical musicians v. 
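the matching side of these systems can be illustrated with a toy fingerprint: derive a coarse bit string from an audio signal and compare uploads against reference hashes by hamming distance. this is only a sketch of the idea behind perceptual hashing; production systems such as content id rely on far richer acoustic features, and the features and thresholds below are assumptions made for illustration.

```python
# Toy illustration of hash-based matching in the spirit of perceptual
# hashing: a coarse fingerprint from frame energies, compared by
# Hamming distance. Real systems are far more sophisticated.
import numpy as np

def fingerprint(samples: np.ndarray, frame: int = 1024) -> int:
    """Bit string: 1 where a frame is louder than the previous frame."""
    frames = samples[: len(samples) // frame * frame].reshape(-1, frame)
    energy = (frames.astype(float) ** 2).sum(axis=1)
    bits = (energy[1:] > energy[:-1]).astype(int)
    return int("".join(map(str, bits)), 2)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

rng = np.random.default_rng(0)
reference = rng.normal(size=64 * 1024)                # stand-in for a label's recording
upload = reference + rng.normal(scale=0.05, size=reference.size)  # a very similar performance
unrelated = rng.normal(size=64 * 1024)                # a different piece entirely

ref_hash = fingerprint(reference)
print("near-duplicate distance:", hamming(ref_hash, fingerprint(upload)))
print("unrelated distance:     ", hamming(ref_hash, fingerprint(unrelated)))
# A small distance triggers a match claim; an original performance of a
# public-domain work can still land close enough to be flagged.
```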
copyright bots | berkowitz 3 the impact on classical musicians classical musicians are accustomed to having their content misidentified for infringing copyright, but with the pandemic forcing many more musicians to share performances regularly on social media, the problem has become ever more pervasive. adrian spence, the artistic director for chamber ensemble camerata pacifica, found himself appealing multiple copyright claims from both facebook and youtube. on occasion, he would dispute several claims issued by different copyright owners for the same recording. until these issues were resolved, facebook suspended camerata pacifica’s ability to livestream, and youtube displayed a notification on their channel informing viewers that their videos were likely to be removed due to anticipated copyright infringement.16 owen espinosa, a high school senior, was preparing for a piano recital, and during rehearsal, facebook ended his livestream over claims of copyright infringement. he was unable to successfully appeal the claim which meant that facebook would not host his performance. instead, he had to broadcast his recital on an acquaintance’s youtube channel.17 michael sheppard, a professional pianist, has had broadcasts interrupted and videos removed by facebook multiple times with notifications stating that music owned by naxos of america was detected in his performances.18 after facebook rejected his disputes, sheppard took to twitter, alerting naxos of his situation. his videos were eventually restored, but nothing could be done about his livestreams.19 the violinist.com broadcasts weekly, hour-long concerts featuring multiple guest musicians. during one of these performances, facebook muted child violinist yugo maeda due to a claim of copyright infringement. after appealing the notice, facebook unmuted maeda’s performance three days later.20 while covid-19 exacerbated the issue, classical musicians often had their performances interrupted or removed from social media. in 2019, conducting students at the university of british colombia had their facebook live feed interrupted over copyright infringement claims and, in 2018, facebook removed a recording of an in-home performance given by pianist james rhodes also stating that the music infringed copyright.21 also in 2018, the australia broadcasting corporation’s abc classic fm livestreamed a performance of beethoven’s symphony no. 9. the broadcast ended with facebook issuing a claim stating that the music in question was owned by two different copyright owners.22 in 2016, violinist claudia schaer disputed several of youtube’s copyright claims. she typically had success with these appeals, but one of her recordings received three claims from different copyright owners. she was able to refute two of them; however, the third remained, and she was warned that if she was unsuccessful in her second attempt at appealing the claim, her account would receive a copyright strike, deleting her video from the site permanently. she felt both intimidated and aggravated by the ordeal.23 the author of this article has also had to refute a copyright infringement claim on youtube. according to the notice, 51 seconds of the author’s approximately five-minute performance of beethoven’s “für elise” infringed copyright. as a result, the claimant authorized youtube to include ads in the video, allowing them to generate revenue. the dispute was upheld after the claimant’s 30-day window for a response expired. 
although the author does not rely on monetized videos and livestreams for income, it is unethical for another entity to profit from the work of an unaffiliated individual. information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 4 disputing a copyright claim while there is recourse for uploaders facing copyright claims from social media sites, the appeals process can be lengthy and overwhelming. it can take more than two months for youtube to render a verdict when a musician disputes a copyright notice. during this span of time, classical musicians depending on ad revenue cease to generate income as these funds are held by the company until a final decision is made, at which point all profits accumulated by the video are released to the appropriate party. if the claim is upheld, the recording may remain online with proceeds going to the supposed copyright owner.24 uploaders may attempt to refute the result, but a failed appeal leads to the video’s removal and a copyright strike levied against the uploader s preventing them from livestreaming and monetizing videos for three months. should this occur, a counter notification can be issued which insists that the content in question has been mischaracterized as infringing and requires that would-be copyright owners file a lawsuit to uphold the claim. after three strikes, accounts are permanently deleted along with all associated uploads.25 the time that elapses for a final verdict along with the suspension of uploading and livestreaming permissions due to a copyright strike amounts to more than five months without being able to sustain an income. when a single performance is charged with multiple claims from different entities, as in the aforementioned examples, the uploader must dispute each one individually. this makes it easy to accumulate copyright strikes, risking account termination. it would be reasonable to assume that many classical musicians who endure these circumstances avoid the dispute process for fear of youtube removing their recordings, enforcing limitations on their ability to broadcast and monetize videos, and even permanently deleting their accounts. meanwhile, mistakenly recognized copyright owners can leverage this by appropriating the earnings generated by the work of unaffiliated musicians. furthermore, should the matter be redirected to the courts, the uploader faces the burden of retaining legal counsel. youtube algorithms deal with approximately 98% of all copyright issues and, because youtube’s business model generates profits primarily via user-uploaded content, it has been found to show bias towards established copyright owners.26 copyright owners can set preferences for how they want the system to react to instances of copyright infringement, resulting in the automatic monetization of 95% of claims for the copyright owner. as a result, user uploads make up 50% of the revenue generated by youtube for the music industry.27 although google reported in 2018 that 60% of disputed claims were found in favor of accused uploaders, the system clearly benefits established copyright owners.28 all of the aforementioned musicians who were accused of copyright infringement had their livestreams interrupted, saw their videos removed, and witnessed companies profiting from their work performing music that has long since passed into the public domain. 
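the public-domain question that runs through these disputes can be reduced to the copyright terms cited later in this article: the life of a known composer plus 70 years, and, for anonymous or pseudonymous works, 95 years from publication or 120 years from creation, whichever expires first. the sketch below encodes only that simplified rule of thumb; it is illustrative rather than legal advice, and works published before 1978 involve additional rules it ignores.

```python
# Rough sketch of the copyright terms as summarized in this article.
# Real public-domain determinations involve more rules than this.
from datetime import date

def us_copyright_expires(death_year=None, publication_year=None, creation_year=None):
    """Year the stated term ends under the article's simplified summary."""
    if death_year is not None:               # known composer: life plus 70
        return death_year + 70
    terms = []
    if publication_year is not None:
        terms.append(publication_year + 95)  # 95 years from publication
    if creation_year is not None:
        terms.append(creation_year + 120)    # 120 years from creation
    return min(terms)                        # whichever expires first

# Beethoven died in 1827, so his compositions (as opposed to modern
# recordings of them) are long out of copyright under this rule.
print(us_copyright_expires(death_year=1827))                              # 1897
print(us_copyright_expires(publication_year=1950, creation_year=1940))    # 2045
print(date.today().year > us_copyright_expires(death_year=1827))          # True
```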
youtube’s video series copyright and content id on youtube attempts to educate users on how automated copyright enforcement and the dispute process work, and while fair use and copyright permissions are discussed, the public domain is never mentioned; although, youtube does offer a brief explanation of the public domain on its help site.29 according to the us copyright act, the duration of copyright extends to 70 years after the death of the known composer, and for uncredited compositions or those composed by a musician under a pseudonym, copyright is recognized for 95 years from the date the work was published or 120 years from when it was composed, depending on which information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 5 expires first.30 while record labels are fully within their right to protect the recordings they own, that should have no bearing on individual musicians performing pre-twentieth-century music. the majority of online music consumption occurs on social media sites with 47% of the market share going to youtube.31 reports from deezer showed a near 20% increase in users listening to classical music since the start of the pandemic.32 given that more users are gravitating towards listening to classical music, and that the most popular digital access point for music is youtube, classical musicians coping with pandemic-induced restrictions were presented with what should have proven to be a lucrative opportunity. adhering to social distancing requirements and stay-athome mandates meant musicians cancelled their performances, leading to an exploration of virtual alternatives such as uploading recordings and livestreaming. obstructing these activities interrupts their sole source of income. conclusion while researchers have suggested a handful of improvements for automated copyright enforcement systems, they have not addressed the role that libraries can play in assis ting classical musicians.33 the tampa-hillsborough county public library, prior to the spread of covid-19, maintained four branches outfitted with recording studios; today, that number has grown to five. prior to pandemic library closures, recording studios were reserved just over 800 times, amounting to about 1,600 hours of usage between january 1, 2019 and march 13, 2020. patrons using the recording studios produce music and videos with the intention of uploading them to social media. other libraries with recording studios likely see their patrons doing the same, but without knowledge of copyright. libraries have the means and the motive to assist classical musicians. libraries can hold classes covering the basics of copyright, fair use, and the public domain, or that expand upon how automated copyright enforcement systems work on social media. library staff, however, may feel overwhelmed by the numerous texts on these subjects and may not know where to begin. an excellent starting point is the frequently asked questions page on the us copyright office website. this webpage offers explanations for a broad array of copyright-related issues and questions.34 fair use allows for unauthorized borrowing from a creative work; however, navigating how fair use is determined is always challenging. steven m. 
davis’ “computerized takedowns: a balanced approach to protect fair uses and the rights of copyright owners” is a reliable point of reference for defining fair use, its application in copyright infringement cases, and ethical and legal implications regarding the limitations of algorithmic moderation systems.35 for a thorough look into the mechanics and applications of automated copyright enforcement, refer to the previously mentioned “are youtube and facebook cancelling classical musicians?” this article offers a synopsis on the shift from physical to digital media, descriptions of different algorithmic models developed specifically for copyright enforcement, and an account of how youtube’s and facebook’s copyright enforcement systems came to be.36 libraries can also offer help sessions that support patrons through the copyright claims dispute process. the youtube dispute interface is user friendly, and the instructions are comprehensible. throughout each step, explanations are offered to clarify what is being required of the user. for example, when asked for the reasoning behind the dispute, the user is offered four options: the disputed material is original content, the user has acquired permission to reproduce the co ntent, the content falls under fair use, or the content originates from the public domain. once selected, information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 6 additional explanations for each option are given in order to provide further clarification and context which allows the user to reconsider their choice and also helps the user better explain how their content falls under the selected category. finally, the user is asked to provide a narrative explaining how the content in question does not infringe copyright. facebook’s counternotification process is less generous, providing brief, ineffectual descriptions of copyright and a simple form requesting the user’s personal information and explanation for why the copyright infringement claim is unfounded. after library staff demonstrate the use of these interfaces, patrons can be guided to library resources to help them articulate and refine their arguments. for anything that cannot be found among the library’s collections, library staff may need to assist with internet searches, or patrons may request materials through interlibrary loan. additionally, patrons may still feel overwhelmed by the terminology being presented, which would further support the need for library programming that covers copyright-related topics. when considering the research involved to produce a convincing counterargument, information literacy and metaliteracy classes may be warranted. libraries can also encourage patrons to include descriptions in their uploads and livestreams with links to supporting evidence explaining that the featured music belongs to the public domain, and as the uploader, they own the rights to recordings and broadcasts of their own performances. the public domain description on youtube’s help page provides links to columbia university libraries’ copyright advisory service and cornell university’s copyright information center, and it suggests that these resources can lead to supporting evidence regarding works in the public domain.37 another excellent resource is the international music score library project’s petrucci music library. 
this database of almost 200,000 compositions belonging to the public domain features both sheet music and recordings of each of these works.38 users can also point to the public domain song anthology, a book comprising 348 popular songs from the public domain; the entire text can be downloaded from the publisher’s website.39 these resources and explanations can be included in disputes to support the reasoning for why a copyright claim is invalid. it should be noted that library employees are most often not lawyers, and as such, it is ill-advised to answer direct questions about the specific legality of the myriad of situations musicians face when disputing copyright claims. these matters require expert, specialist knowledge with which library staff are not equipped. the role of the library should only be to provide access to resources and inform the public on various issues regarding the use of information. as information specialists, librarians are in a unique position to educate patrons on information policy, and in this case, copyright. library systems with law libraries or with access to law collections and databases would be especially suited to teach patrons about copyright, guide them through the dispute process, and assist them with gathering resources to support their counterarguments. the tampahillsborough county public library and other systems like it that are outfitted with both music recording studios and a law library are encouraged to offer such services. hopefully, this overview of automated copyright enforcement, its impacts on classical musicians, and the suggestions to libraries offered here will promote further conversation that eventually leads to action and a possible solution. perhaps, as progress is made, automated copyright enforcement systems will grow more hospitable towards user-generated recordings and livestreams of classical music. after all, social media should be able to freely host the artistic talents of all musicians. information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 7 endnotes 1 “covid-19 network update,” comcast, may 20, 2020, https://corporate.comcast.com/covid19/network/may-20-2020. 2 julia alexander, “the entire world is streaming more than ever—and it’s straining the internet,” the verge, march 27, 2020, https://www.theverge.com/2020/3/27/21195358/streaming-netflix-disney-hbo-nowyoutube-twitch-amazon-prime-video-coronavirus-broadband-network; ella koeze and nathaniel popper, “the virus changed the way we internet,” the new york times, april 7, 2020, https://www.nytimes.com/interactive/2020/04/07/technology/ coronavirus-internet-use.html. 3 michael andor brodeur, “copyright bots and classical musicians are fighting online. the bots are winning,” the washington post, may 21, 2020, https://www.washingtonpost.com/entertainment/music/copyright-bots-and-classicalmusicians-are-fighting-online-the-bots-are-winning/2020/05/20/a11e349c-98ae-11ea-89fd28fb313d1886_story.html. 4 adam eric berkowitz, “are youtube and facebook cancelling classical musicians? the harmful effects of automated copyright enforcement on social media platforms,” notes 78, no. 2 (december 2021): 177–202. 5 rebecca tushnet, “all of this has happened before and all of this will happen again: innovation in copyright licensing,” berkeley technology law journal 29, no. 3 (december 2014): 1147–87. 
6 matthew lorenzon, “why is facebook muting classical music videos?” abc classic fm, december 21, 2018, https://www.abc.net.au/classic/read-and-watch/music-reads/facebook-copyright/10633928. 7 xia-mu niu and yu-hua jiao, “an overview of perceptual hashing,” acta electronica sinica 36, no. 7 (2008): 1405–11. 8 robert gorwa, reuben binns, and christian katzenbach, “algorithmic content moderation: technical and political challenges in the automation of platform governance,” big data & society 7, no. 1 (january 2020): 7. 9 larry hardesty, “explained: neural networks,” mit news, april 14, 2017, https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414. 10 daniel graupe, principles of artificial neural networks, 3rd ed. (hackensack, nj: world scientific publishing company, 2013), 1–3. 11 gorwa, binns, and katzenbach, “algorithmic content moderation,” 7. 12 daniel (yue) zhang, jose badilla, herman tong, and dong wang, “an end-to-end scalable copyright detection system for online video sharing platforms,” in proceedings of the 2018 ieee/acm international conference on advances in social networks analysis and mining (barcelona, spain: ieee press, 2018), 626–27. 13 daniel (yue) zhang et al., “crowdsourcing-based copyright infringement detection in live video streams,” in proceedings of the 2018 ieee/acm international conference on advances in social networks analysis and mining (barcelona, spain: ieee press, 2018), 367. 14 berkowitz, “are youtube and facebook cancelling classical musicians?,” 200. 15 diego cerna aragon, “behind the screen: content moderation in the shadows of social media,” critical studies in media communication 37, no. 5 (october 19, 2020): 512–14. 16 brodeur, “copyright bots and classical musicians are fighting online.” 17 amy williams, “camerata pacifica to stream high school graduate’s senior recital,” classical candor: classical music news and reviews (blog), june 6, 2020, https://classicalcandor.blogspot.com/2020/06/classical-music-news-of-week-june-6-2020.html. 18 baltimore school for the arts, “sometimes you have to fight!,” facebook, may 22, 2020, https://www.facebook.com/baltimoreschoolforthearts/posts/sometimes-you-have-to-fight-our-michael-sheppard-was-recently-giving-a-facebook-/3146142648740808/. 19 michael sheppard (@pianistcomposer), “dear @naxosrecords please stop muting portions of works whose composers have been dead for hundreds of years.” twitter, may 9, 2020, https://twitter.com/pianistcomposer/status/1259118489622777856. 20 laurie niles, “facebook and naxos censor music student playing bach,” violinist.com (blog), july 13, 2020, https://www.violinist.com/blog/laurie/20207/28375/. 21 brodeur, “copyright bots and classical musicians are fighting online”; ian morris, “facebook blocks musician from uploading his own performance—but did he break copyright?” daily mirror, september 7, 2018, https://www.mirror.co.uk/tech/facebook-blocks-musician-uploading-performance-13208194. 22 matthew lorenzon, “why is facebook muting classical music videos?” abc classic fm, december 21, 2018, https://www.abc.net.au/classic/read-and-watch/music-reads/facebook-copyright/10633928. 23 claudia schaer, “youtube copyright issues,” violinist.com (blog), february 15, 2016, https://www.violinist.com/discussion/archive/27589/. 24 “monetization during content id disputes,” youtube help, accessed october 24, 2019, https://support.google.com/youtube/answer/7000961?hl=en&ref_topic=9282678#zippy=,filing-a-content-id-dispute,more-info-about-the-content-id-dispute-process,filing-a-content-id-appeal,more-info-about-the-content-id-appeal-process. 25 “copyright strike basics,” youtube help, accessed october 24, 2019, https://support.google.com/youtube/answer/2814000#zippy=,what-happens-when-you-get-a-copyright-strike,resolve-a-copyright-strike. 26 google, how google fights piracy (november 2018), 14, https://www.blog.google/documents/27/how_google_fights_piracy_2018.pdf; joanne e. gray and nicolas p. suzor, “playing with machines: using machine learning to understand automated copyright enforcement at scale,” big data & society 7, no. 1 (april 2020): 1–15. 27 karl borgsmiller, “youtube vs. the music industry: are online service providers doing enough to prevent piracy?” southern illinois university law journal 43, no. 3 (spring 2019): 660. 28 google, how google fights piracy, 28–31. 29 youtube creators, copyright and content id on youtube, october 12, 2020, accessed december 11, 2021, https://www.youtube.com/playlist?list=plpjk416fmkwrnrbv72kshryeknnsaafkd; “frequently asked copyright questions,” youtube help, accessed october 24, 2019, https://support.google.com/youtube/answer/2797449#c-pd&zippy=,what-is-the-public-domain. 30 “how long does copyright protection last?” copyright.gov, us copyright office, https://www.copyright.gov/faq/faq-duration.html. 31 adam j. reis and manon l. burns, “who owns that tune? issues faced by music creators in today’s content-based industry,” landslide 12, no. 3 (january & february 2020): 13–16. 32 maddy shaw roberts, “research shows huge surge in millennials and gen zers streaming classical music,” classic fm, august 19, 2020, https://www.classicfm.com/music-news/surge-millennial-gen-z-streaming-classical-music/. 33 berkowitz, “are youtube and facebook cancelling classical musicians?,” 199–201. 34 “frequently asked questions,” copyright.gov, us copyright office, https://www.copyright.gov/help/faq. 35 steven m. davis, “computerized takedowns: a balanced approach to protect fair uses and the rights of copyright owners,” roger williams university law review 23, no. 1 (winter 2018): 1–24. 36 berkowitz, “are youtube and facebook cancelling classical musicians?,” 177–202. 37 “frequently asked copyright questions,” youtube help. 38 “main page,” imslp: petrucci music library, accessed december 12, 2021, https://imslp.org/wiki/main_page. 39 david berger and chuck israels, the public domain song anthology: with modern and traditional harmonization (charlottesville: aperio, 2020), https://aperio.press/site/books/m/10.32881/book2/.
article the role of the library in the digital economy serhii zharinov information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12457 serhii zharinov (serhii.zharinov@gmail.com) is researcher, state scientific and technical library of ukraine. © 2020. abstract the gradual transition to a digital economy requires all business entities to adapt to new environmental conditions, an adaptation that takes place through their digital transformation. these tasks are especially relevant for scientific libraries, as digital technologies make changes in the main subject field of their activities: the processes of creating, storing, and disseminating information. in order to find directions for the transformation of scientific libraries and determine their role in the digital economy, a study of the features of digital transformation and the experience of the digital transformation of foreign libraries was conducted. management of research data, which is implemented through the creation of current research information systems (cris), was found to be one of the most promising areas of the digital transformation of libraries. the problem area of this direction and ways of engaging libraries in it have also been analyzed in this work. introduction the transition to a digital economy contributes to the even greater penetration of digital technologies into our lives and the emergence of new conditions of competition and trends in organizations’ development. big data, machine learning, and artificial intelligence are becoming common tools implemented by the pioneers of digital transformation in their activities.1 significant changes in the main functions of libraries (the storage and dissemination of information), caused by the development of digital technologies, affect the operational activities of libraries, user and partner requests to the library, and the ways to meet them. in the process of adapting to these changes, the role of libraries in the digital economy is changing. this study is designed to find current areas of library development and to determine the role of the library in the digital economy.
achieving this goal requires studying the “digital economy” concept and the peculiarities of the digital transformation of organizations in order to better understand the role of the library in it; researching the development of libraries to determine what best fits the new role of the library in the digital economy; and identifying obstacles to the development of this area and ways to engage libraries in it. the concept of the “digital economy” the transition to an information society and digital economy will gradually change all industries, and all companies must change accordingly.2 taking advantage of the digital economy is the main driving force of innovation, competitiveness, and economic development of the country.3 the transition to a digital economy is not instant but occurs over many years. the topic emerged at the end of the twentieth century, but in recent years it has experienced rapid growth. in the web of science (wos) citation database, publications with this term in the title began to appear in 1996 (figure 1). figure 1. the number of publications in the wos citation database for the query “digital economy.” one of the first books devoted entirely to the study of the digital economy concept is the work of don tapscott, published in 1996. in this book, the author understands the digital economy as an economy in which the use of digital computing technologies in economic activity becomes its dominant component.4 thomas mesenbourg, an american statistician and economist, identified in 2000 the three main components of the digital economy: e-business, e-commerce, and e-business infrastructure.5 a number of works on the development of indicators to assess the state of the digital economy, in particular the work of philip barbet and nathalie coutinet, are based on the analysis of these components.6 alnoor bhimani, in his 2003 paper “digitization and accounting change,” defined the digital economy as “the digital interrelationships and dependencies between emerging communication and information technologies, data transfers along predefined channels and emerging platforms, and related contingencies within and across institutional and organizational entities.”7 bo carlsson’s 2004 article described the digital economy as a dynamic state of the economy characterized by the constant emergence of new activities based on the use of the internet and new forms of communication between different authors of ideas, whose communication allows them to generate new activities.8 in 2009, john hand defined the digital economy as the new design or use of information and communication technologies that help transform the lives of people, society, or business.9 ciocoiu carmen nadia, in her 2011 article, explained the digital economy as a state of the economy where knowledge and networking begin to play a more important role than capital in a postindustrial society due to technology.10
mykhailo voinarenko and larysa skorobohata, in a study of network tools in 2015, gave the following definition of the digital economy: "the digital economy, unlike the internet economy, assumes that all economic processes (except for the production of goods) take place independently of the real world. goods and services do not have a physical medium but are 'electronic.'"12 yurii pivovarov, director of the ukrainian association for innovation development (uaid), gives the following definition: "digital economy is any activity related to information technology. and in this case, it is important to separate the terms: digital economy and it sphere. after all, it is not about the development of it companies, but about the consumption of services or goods they provide—online commerce, e-government, etc.—using digital information technology."13

taking into account the above, in this study the digital economy is defined as a digital infrastructure that encompasses all business entities and their activities. the transition to the digital economy is the process of creating conditions for the digital transformation of organizations, the creation of digital infrastructure, and the gradual involvement of various economic entities and certain sectors of the economy in that digital infrastructure.

one of the first practical and political manifestations of the transition to the digital economy was the european commission's digital economy and society index (desi), first published in 2014. the main components of the index are connectivity, human capital, internet use, digital integration, and digital public services. the 2019 index shows significant progress among european countries in the digitalization of business and in the interaction of society with the state.14 for ukraine, the first step towards the digital economy was the concept for the development of the digital economy and society of ukraine, which defines the understanding of the digital economy and the direction and principles of the transition to it.15 for active representatives of the public sector, this concept is a signal that the development of structures and organizations should be based not on improving operational efficiency but on transformation in accordance with the requirements of industry 4.0. confirmation of the seriousness of the ukrainian government's intentions in this direction is the creation of the ministry of digital transformation in 2019 and the delivery of new public services through online channels.16

one of the priority challenges that needs to be solved at the stage of transition to the digital economy is the development of digital skills in the entire population. this is relevant not only for ukraine but also for the european union. in europe, a third of the active workforce does not have basic skills in working with digital technologies; in ukraine, 15.1 percent of ukrainians do not have digital skills, and the share of the working population with below-average digital skills is 37.9 percent.17 part of the solution to this challenge in ukraine is entrusted to the "digital education" project, implemented by the ministry of digital transformation (osvita.diia.gov.ua), which aims to build digital literacy in the population of ukraine through short educational series created for different target audiences.
features of digital transformation
developed digital skills in the population make the digital transformation of organizations not just a competitive advantage but a prerequisite for their survival: the more accustomed the target audience is to the benefits of the digital economy, the more actively the organization has to adapt to new requirements and customer needs and to the new competitive environment.

digital transformation of an organization is a complex process that is not limited to implementing software in the company's activities or automating certain components of production. it includes changes to all elements of the company, including methods of manufacturing and customer service, the organization's strategy and business model, and approaches and methods of management. according to a study by mckinsey, the integration of new technologies into a company's operations can reduce profits in 45 percent of cases.18 therefore, it is extremely important to take a comprehensive approach to digital transformation: understanding the changes being implemented, choosing the method of their implementation, and gradually involving all structural units and business processes in the transformation.

the boston consulting group study identified six factors necessary for the effective use of the benefits of modern technologies:19
• connectivity of analytical data;
• integration of technologies and automation;
• analysis of results and application of conclusions;
• strategic partnership;
• competent specialists in all departments; and
• flexible structure and culture.

mckinsey consultants draw attention to the low percentage of successful digital transformation practices and, based on the successful experience of 83 companies, form five categories of recommendations that can contribute to successful digitalization:20
• involvement of leaders experienced in digitalization;
• development of digital staff skills;
• creating conditions for the use of digital skills by staff;
• digitization of tools and working procedures of the company; and
• establishing digital communication and ensuring the availability of information.

experts at the institute of digital transformation identify four main stages of digital transformation in the company:21
1. research, analysis, and understanding of customer experience.
2. involvement of the team in the process of digital transformation and implementation of a corporate culture that contributes to this process.
3. building an effective operating model based on modern systems.
4. transformation of the business model of the organization.

the "integrated model of digital transformation" study identifies focusing on priority digital projects as one of the key factors of successful digital transformation; dedicated organizational teams should be engaged in the development and implementation of these projects.
the authors identify three main functional activities for digital transformation teams, the implementation of which provides a gradual, comprehensive renewal of the company: the creation and implementation of a digital strategy, digital activity management, and digitization of operational activities.22 in their study, ukrainian scientists natalia kraus, oleksandr holoborodko, and kateryna kraus determine that the general pattern for all digital economy projects is their focus on a specific consumer and the comprehensive use of available information about that consumer and about the conditions of project effectiveness.23 initially, a project is pre-tested on a small scale, and only after obtaining satisfactory results from testing the new principles of activity on a narrow target audience is the project scaled to a wider range of potential users. all this reduces the risks associated with digital transformation: eliminating unnecessary changes and false hypotheses on a small scale allows an organization to avoid overspending at the stage of a comprehensive transformation of the entire enterprise.

therefore, the process of effective digital transformation should begin with the involvement of leaders experienced in digital transformation, analysis of the weaknesses of the organization, and the building of a plan for its comprehensive transformation, divided into individual projects implemented by qualified teams, with a gradual increase in the volume of these projects as their effectiveness is confirmed on a small scale. the process of digital transformation should be accompanied by constant training of employees in digital skills. the goal of digital transformation is to build an efficient, high-performing company that can quickly adapt to new environmental conditions, which is achieved through the introduction of digital technologies and new methods and tools of organizational management.

directions of library development in the digital economy
based on the study of the digital economy concept and the peculiarities of digital transformation, a review of library development in the digital economy was conducted to find the library's place in digital infrastructure and to identify potential projects that an individual library can implement as part of its comprehensive transformation plan. the main task is to determine the new role of the library in the digital economy and the areas that best meet it. the search for directions in the development of the library in response to the spread of digital technology began at the end of the last century.
one of the first concepts to reflect the impact of the internet on the library sector is the concept of the digital library, published in 1999.24 in 2006, the concept of "library 2.0" emerged, based on the use of web 2.0 technologies: dynamic sites, users becoming data authors, open-source software, api interfaces, and data added to one database being immediately fed to partner databases.25 the spread of social networks and mobile technologies, and their successful use in library practice, led to the formation of the concept of "library 3.0."26 the development of open source, cloud services, big data, augmented reality, context-aware computing, and other technologies has influenced library activities, which is reflected in the concept of "library 4.0."27 researchers, scholars, and the professional community continued to develop concepts of the modern library, drawing on the experience of implementing changes in library activities and taking into account the development of other areas, and in 2020 articles began to appear describing the concept of "library 5.0," based on a personalized approach to students: support of each student during the whole period of study, development of the skills necessary for learning, and a set of other supporting actions integrated into the educational process.28

in determining the current role of the library in the digital economy, it is necessary to pay attention to a study by denys solovianenko, who identifies research and educational infrastructure as one of the key elements of scientific libraries of the twenty-first century.29 olga stepanenko considers libraries part of the information and communication infrastructure, the development of which is one of the main tasks of transforming the socioeconomic environment in accordance with the needs of the digital economy; this infrastructure ensures high efficiency for stakeholders and sets the pace of digitalization of the state economy, which occurs through the development of its constituent elements.30 the replacement of traditional library services by digital infrastructure, illustrated by the example of the moravian library, is demonstrated in a study by michal indrák and lenka pokorná, published in april 2020.31

projects that contribute to the library's adaptation to the conditions of the digital economy, implemented in the environment of public libraries, include: digitization of library collections (including historical heritage) and the creation of a database of full-text documents; providing free access to the internet via library computers and wi-fi; organization of online customer service and development of services that do not require a physical presence in the library; and organization of events for the development of users' digital and information skills.32 under such conditions, the role of the librarian as a specialist in the field of information changes from that of a custodian to that of an intermediary, a distributor.33 one of the main objectives of library activity in the digital economy becomes overcoming the digital divide: disseminating knowledge about modern technologies and innovations, assisting the community in their use, and developing digital skills in all users of the library.34 an example of the digital public library is the digital north library project in canada, which resulted in the creation of the inuvialuit digital library (https://inuvialuitdigitallibrary.ca).
the project lasted four years, bringing together researchers from different universities and the community in the region, who together digitized cultural heritage documents and created metadata. the library now has more than 5,200 digital resources collected in 49 catalogues. the implementation of this project provides access to library services and information to a significant number of people living in remote areas of northern canada who are unable to visit libraries in person (https://sites.google.com/ualberta.ca/dln/home?authuser=0, https://inuvialuitdigitallibrary.ca).35 other representatives of modern digital libraries, one of whose main tasks is the preservation of cultural heritage and the spread of national culture, are the british library (https://www.bl.uk), the hispanic digital library of the biblioteca nacional de españa (http://www.bne.es), the gallica digital library in france (https://gallica.bnf.fr), the german digital library (deutsche digitale bibliothek, https://www.deutsche-digitale-bibliothek.de), and europeana (https://www.europeana.eu).

another direction is the development of analytical skills in information retrieval. academic libraries, drawing on their competencies in information retrieval and information technology and refining the results of their analyses, have been able to better identify trends in academia and expand cooperation with teachers to update curricula.36 libraries become active participants in the processes of teaching, learning, and assessment of acquired knowledge in educational institutions. t. o. kolesnikova, in her research on models of library development, substantiates the expediency of creating information intelligence centers for introducing the latest scientific advances into training and production processes, involving libraries in the educational activities of higher educational establishments, and creating centralized repositories as directions of development for the university libraries of ukraine.37 one of the advantages of the development and dissemination of digital technologies is the possibility of forming individual curricula for students; involvement of university libraries in this work is one of the new areas of their activity in the digital economy.38

one of the important areas of operation for departmental and scientific-technical libraries that contributes to increasing the innovative potential of the country is activity in the area of intellectual property.
consulting services in the field of intellectual property, information support for scientists, the creation of electronic patent information databases in the public domain, and other related services are important components of library activity in many countries.39 another important component of libraries' transformation is the deepening of their role in scientific communication: expanding the use of information technology in order to integrate scientific information into a single network and creating and managing the information technology infrastructure of science.40

the presence of libraries on social networks has become an important component of their digital transformation. on the one hand, libraries have thus created another source of information dissemination and expanded the number of service delivery channels, developing online training videos and interactive help services in support of them.41 on the other hand, social networks have become a marketing tool for engaging the audience with the library's digital collections and online services. an additional important benefit of the presence of libraries on social networks has been the establishment of contacts and the exchange of ideas with other professional organizations, which has contributed to the further expansion of the network of library partners.42

another area of activity that libraries are taking on in the digital economy is the management of research data, which is confirmed by the significant number of publications on this topic in professional scientific and research journals for 2017–18.43 joining this area allows libraries to become part of the scientific digital information and communication infrastructure, the creation of which is one of the main tasks of digital transformation on the way to the digital economy.44 the development of this area contributes to the digitalization of the scientific and information sphere; the systematization and structuring of research data has a positive effect on the effectiveness of research and on the level of scientific novelty of the results of intellectual activity.

the ukrainian institute of the future, together with the digital agency of ukraine, considers digital transformation to be the integration of modern digital technologies into all spheres of business. the introduction of modern technologies (artificial intelligence, blockchain, cobots, digital twins, iiot platforms, and others) into the production process will lead to the transition to industry 4.0. according to their forecasts, the key competence in industry 4.0 will be data processing and analytics.45 research information is an integral part of this competence, so its development is one of the most promising directions for the library in the digital economy.

the tools used in the management of research data are called current research information systems, abbreviated as cris. in ukraine, there is no such system connected to the international community.46 the change of the library's role from a repository of information to its manager, the alignment of the functions and tasks of a cris with the key requirements of the digital economy, and the advantages of such systems, together with the fact that they are still not used in ukraine, make this area extremely relevant for research and a promising area of work for scientific libraries, so we will consider it more thoroughly.
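to make the shape of such a system more concrete, the following sketch models in python the kind of core entities a cerif-oriented cris links together. the class names, fields, and example values are illustrative assumptions for this discussion only, not the cerif specification, the design of uris, or any existing product.

    # a minimal, hypothetical sketch of the linked entities at the heart of a
    # cris: researchers, projects, and publications. real systems (and the
    # cerif standard) define far richer models; this only illustrates why a
    # cris can answer questions that siloed, unlinked databases cannot.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Researcher:
        orcid: str          # persistent identifier for the person
        name: str
        affiliation: str

    @dataclass
    class Project:
        title: str
        funder: str
        members: List[Researcher] = field(default_factory=list)

    @dataclass
    class Publication:
        doi: str            # persistent identifier for the output
        title: str
        year: int
        authors: List[Researcher] = field(default_factory=list)
        project: Optional[Project] = None

    # linking outputs to projects and people is what lets a cris answer
    # questions such as "which funded projects produced which publications?"
    alice = Researcher("0000-0002-1825-0097", "a. researcher", "example university")
    grant = Project("digital library infrastructure", "national research fund", [alice])
    paper = Publication("10.1234/example", "a sample study", 2020, [alice], grant)
    print(paper.project.funder)  # -> national research fund

the point of the sketch is the linkage itself: once people, projects, and outputs share persistent identifiers, reports and evaluations can be generated across them rather than from disconnected lists.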
problems in research data management
global experience of research information management shows several problems in the process of research data management. some of them are related to the processes of workflow organization, control, and reporting. this is due to the use of several poorly coordinated systems to organize the work of scientists. data sets from different systems without metadata are very difficult to combine into a single system, and it is almost impossible to automate the process. all this manifests itself in a lack of informational support for decision-making in the field of science, both at the state level and at the level of individual structures. this situation can lead to poor management decisions, to overspending on similar, duplicate projects, and to higher costs for recruiting and finding scientists with relevant experience and for finding the equipment needed for research.

cris, which began to appear in europe in the 1990s, are designed to overcome these shortcomings and promote the effective organization of scientific work. such systems are now widespread throughout the world, with a total of about five hundred, concentrated mainly in europe and india. however, there is currently no research information management system in ukraine that meets international standards and integrates with international scientific databases. this omission slows down ukraine's integration into the international scientific community. the solution to this problem may be the creation of the national electronic scientific information system uris (ukrainian research information system).47 the development of this system is an initiative of the ministry of education and science of ukraine. it is based on combining data from ukrainian scientific institutions with data from crossref and other organizations, as well as on ensuring integration with other international cris through the use of the cerif standard.

future developers of the system face a number of challenges, some specific to ukraine and some already studied by foreign scientists. a significant number of studies in this area are designed to overcome the problem of lack of access to research data, as well as to solve problems of data standardization and openness. global experience has examined how to manage collection processes and develop structured data sets, how to distribute them on a commercial basis, and how to benefit from providing them in open access. the mechanisms for financing these processes have been studied; in particular, effective ways of attracting patronage funds have been analyzed, and the possibilities for licensing and distributing the resulting data sets, along with the approaches and tools likely to be most effective for the library, have been determined. alice wise, for example, describes the experience of settling some legal aspects by clarifying site use in the license agreement, which covers the conditions of access to information and search within it while maintaining a certain level of anonymity.48 the problem of data consistency is related to the lack of uniform standards for information retention covering the data format, the metadata, and the methods of their generation and use.
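as a concrete illustration of the kind of aggregation described above, the following sketch queries the public crossref rest api for a single doi and reshapes the response into a small record a cris could store. the doi shown and the choice of fields are illustrative assumptions on my part, not part of the uris design.

    # a minimal sketch: harvesting publication metadata from the public
    # crossref rest api, one of the sources a national cris such as uris
    # could aggregate.
    import requests

    CROSSREF_API = "https://api.crossref.org/works/"

    def fetch_publication(doi):
        """return a small, cris-friendly record for one doi."""
        response = requests.get(CROSSREF_API + doi, timeout=30)
        response.raise_for_status()
        message = response.json()["message"]
        return {
            "doi": doi,
            "title": (message.get("title") or [""])[0],
            "authors": [
                (a.get("given", "") + " " + a.get("family", "")).strip()
                for a in message.get("author", [])
            ],
            "year": message.get("issued", {}).get("date-parts", [[None]])[0][0],
            "journal": (message.get("container-title") or [""])[0],
        }

    if __name__ == "__main__":
        # the doi below is purely illustrative; substitute a real one to run
        print(fetch_publication("10.1234/example-doi"))

the same pattern, pull from an external source, normalize to a local record shape, applies whether the source is crossref, an institutional repository, or another cris exchanging cerif data.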
thus, the use of different standards and formats in repositories and archives leads to data consistency problems for researchers, which in turn affects the quality of service delivery and makes it impossible to use multiple data sets together.49 another important problem for the dissemination of research data is the lack of tools and components in the libraries and repositories of higher educational establishments and scientific institutions. it is worth developing the infrastructure so that, at the end of a project, scientists publish not only the research results but also the research data they used and generated. this approach will be convenient both for authors (in case they need to reuse the research data) and for other scientists (because they will have access to data that can be used in their own research).50 developing the necessary tools is quite relevant, especially because, according to international surveys, researcher-practitioners are in favor of sharing the data they create with other researchers and of the licensed use of other people's datasets in conducting their own research.51

another reason for the low prevalence of research data sharing is that datasets have less impact on a researcher's reputation and rating than publications.52 this is partly due to the lack of citation-tracking infrastructure for datasets, in contrast to the publication of research results, and to the lack of standards for storing and publishing data. prestigious scientific journals have been grappling with this problem for several years. for example, the american economic review requires authors whose articles contain empirical, modelling, or experimental work to provide information about the research data in sufficient detail for replication.53 nature and science require authors to preserve research data and provide them at the request of the journals' editors.54 one of the reasons for the underdeveloped infrastructure in research data management is the weak policy of disseminating free access to these data, as a result of which even the small share of scientific data that is usable remains closed by license agreements and cannot be used by other scientists.55 open science initiatives related to publications have been operating in the scientific field for a long time, but their extension to research data remains insufficient.

the development of the uris system will provide management of scientific information and will address the problems highlighted in the scientific works cited above; it will promote the efficient use of funds, simplify the process of finding data for research, and discipline research, and it will therefore have a positive impact on the entire economy of ukraine.

library and research information management
library involvement in the development of scientific information management systems will be an important future direction of their work. such systems, which could include all the necessary information about scientific research, will contribute to the renewal and development of the library sphere of ukraine and will promote the state's transition to a digital economy. the creation of the uris system is designed to provide access to research data generated by both ukrainian and foreign scientists.
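for a system like uris to provide access to research data and make datasets as citable as publications, each deposited dataset needs a minimal, machine-readable description with a persistent identifier. the sketch below assembles one hypothetical shape for such a record and renders a simple data citation from it; the field names loosely follow the mandatory properties of the datacite metadata schema, while all values, helper functions, and the citation format are illustrative assumptions, not part of the uris design.

    # a hypothetical sketch of a minimal, citable dataset record.
    def dataset_record(doi, creators, title, publisher, year, resource_type="Dataset"):
        """assemble a minimal descriptive record for a deposited dataset."""
        return {
            "identifier": {"identifierType": "DOI", "value": doi},
            "creators": creators,
            "title": title,
            "publisher": publisher,
            "publicationYear": year,
            "resourceType": resource_type,
        }

    def citation(record):
        """render a simple data citation string from the record."""
        names = "; ".join(record["creators"])
        return "{} ({}). {} [{}]. {}. https://doi.org/{}".format(
            names, record["publicationYear"], record["title"],
            record["resourceType"], record["publisher"],
            record["identifier"]["value"])

    example = dataset_record(
        doi="10.1234/example-dataset",            # hypothetical doi
        creators=["researcher, a.", "colleague, b."],
        title="survey responses on data sharing practices",
        publisher="example university library",
        year=2020,
    )
    print(citation(example))

once every deposited dataset carries such a record, citation tracking for data can reuse the same identifier-based infrastructure that already exists for publications.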
such a system can ensure the development of cooperation in the field of research, the intensification of knowledge exchange and interaction through the open exchange of scientific data, and the integration of ukrainian scientific infrastructure into the world scientific and information space. according to surveys conducted by the international organizations eurocris and oclc, of the 172 respondents working in the field of research information management, 83 percent said that libraries play an important role in the development of open science, copyright, and the deposit of research results; the share of libraries that play a major role in this direction was 90 percent. almost 68 percent of respondents noted the significant contribution of libraries in filling in the metadata needed to correctly identify the work of researchers in various databases; 60 percent noted the important role of libraries in verifying the correctness of the metadata entered by researchers; and almost 49 percent of respondents assess the role of libraries as the main one in the management of research data (figure 4).

figure 4. the proportion of organizations among 172 users of cris systems that assess the role of libraries in the management of research information as basic or supporting.56 (the activities compared in the figure are financial support for rim, project management, maintaining or servicing technical operations, impact assessment and reporting, strategic development, management and planning, creating internal reports for departments, system configuration, outreach and communication, initiating rim adoption, research data management, metadata validation workflows, metadata entry, training and support, and open access, copyright, and deposit.)

at the same time, library assistance in the information management of scientific research can take various forms, which should be adopted by the scientific libraries of ukraine; some of these forms will also be useful to public libraries, which can become science ambassadors in their communities. based on the experience of foreign libraries, we have identified areas of activity in which the library can join the management of research information.

one of the main directions for libraries that cooperate with cris users, or are themselves the organizers of such systems, is the introduction and support of open science. historically, libraries support open science because they provide access to scientific papers, but they can expand their activities further. using open data resources and promoting them among the scientific community, involving scientific users in disseminating their own research results on the principles of open science, supporting users in disseminating their publications, creating conditions for increasing the citation of scientific papers, tracking information about user publications, and creating and supporting public profiles of scientists in scientific and professional resources and scientific social networks: all this will help researchers engage more actively in open science and take advantage of this area. analysis of world experience shows that scientific libraries are significantly intensifying their support for the strategic goals of the structures that finance their activities and to which they are subordinate.
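the survey findings above single out metadata entry and metadata validation as places where libraries contribute most. the sketch below shows what a small automated validation pass over researcher-entered records might look like; the field names, patterns, and rules are illustrative assumptions rather than the checks of any real cris.

    # a minimal sketch of a metadata validation pass of the kind a library
    # might run over researcher-entered records before they enter a cris.
    import re

    ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
    DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
    REQUIRED_FIELDS = ("title", "authors", "year", "doi", "orcid")

    def validate_record(record):
        """return a list of human-readable problems found in one record."""
        problems = []
        for name in REQUIRED_FIELDS:
            if not record.get(name):
                problems.append("missing required field: " + name)
        if record.get("orcid") and not ORCID_PATTERN.match(record["orcid"]):
            problems.append("malformed orcid: " + record["orcid"])
        if record.get("doi") and not DOI_PATTERN.match(record["doi"]):
            problems.append("malformed doi: " + record["doi"])
        year = record.get("year")
        if year and not 1900 <= int(year) <= 2030:
            problems.append("implausible publication year: " + str(year))
        return problems

    # a hypothetical researcher-entered record
    example = {"title": "sample study", "authors": ["a. researcher"],
               "year": 2020, "doi": "10.1234/abc", "orcid": "0000-0002-1825-0097"}
    print(validate_record(example))  # -> []

checks like these do not replace the librarian's judgment; they simply surface the records that need human review, which is the division of labor the survey responses describe.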
libraries are moving away from routine customer service and expanding their activities through the use of their own assets and the introduction of new, modern tools. such libraries try to promote the development of their parent structures and to build up modern competencies in order to better meet the needs and goals of these institutions. by introducing various management development tools, libraries synchronize their strategy with the strategy of the parent structure to achieve a synergistic effect.

the next important direction of library development is socialization. wanting to shed the antiquated understanding of the word library, many libraries conduct campaigns aimed at changing the image of the library in the minds of users, communities, and society. an important component of this systematic effort is building relationships with the target audience and creating user communities around the library whose members are not only its users but also supporters, friends, and promoters. building relationships with members of the scientific community allows libraries to reduce resistance to the changes that come with the introduction of scientific information management systems and to influence users positively, so that they adopt new tools in their usual activities, receive benefits, and become an active part of the process of structuring the scientific space.

recently, work with metadata has undergone some changes. the need to identify and structure data in the world scientific space means that metadata are now filled in not only by libraries but also by other organizations that produce, issue, and publish scientific results and scientific literature. scientists are beginning to make more active use of modern information standards in order to promote their own work. libraries, in turn, take on the role of consultant or contractor with many years of experience working with metadata and sufficient knowledge in this area. on the other hand, the filling in of metadata by users frees up librarians' time and creates conditions for them to perform other functions, such as information management and the creation of automated data collection and management systems integrated with scientific databases, both ukrainian and international.

another area of research information management is the direct management of this process. cris are developed and implemented with the contribution of scientific libraries in different countries of the world. this allows libraries to combine disparate data obtained from different sources, compile scientific reports, evaluate the effectiveness of the scientific activities of the institution, create profiles of scientific institutions and scientists, develop research networks, etc. scientists and students can find the results of scientific research and look for partners and sources of funding for research. research managers have access to up-to-date scientific information, which allows them to assess more accurately the productivity and influence of individual scientists, research groups, and institutions. business representatives get access to up-to-date information on promising scientific developments, and the public gains a way to monitor how effectively research is being conducted.
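one way to picture the "combining disparate data" task mentioned above is as a merge of records about the same publications arriving from different sources, keyed on a persistent identifier. the sketch below is a minimal illustration under that assumption; the normalization and merge rules are hypothetical, not a prescription for any particular cris.

    # a minimal sketch of combining records about the same publications that
    # arrive from different sources (for example, an institutional repository
    # and an external index), keyed on a normalized doi.
    def normalize_doi(doi):
        """lowercase a doi and strip common url prefixes."""
        doi = doi.strip().lower()
        for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
            if doi.startswith(prefix):
                doi = doi[len(prefix):]
        return doi

    def merge_sources(*sources):
        """merge lists of records into one record per doi; later sources
        fill in fields that earlier sources left empty."""
        merged = {}
        for source in sources:
            for record in source:
                key = normalize_doi(record.get("doi", ""))
                if not key:
                    continue  # records without a doi need manual matching
                target = merged.setdefault(key, {})
                for name, value in record.items():
                    if value and not target.get(name):
                        target[name] = value
        return merged

    repository = [{"doi": "10.1234/abc", "title": "sample study"}]
    index = [{"doi": "https://doi.org/10.1234/ABC", "year": 2020}]
    print(merge_sources(repository, index))
    # -> {'10.1234/abc': {'doi': '10.1234/abc', 'title': 'sample study', 'year': 2020}}

the library's contribution in practice is less the code than the rules it encodes: deciding which source is authoritative for which field, and what happens to records that cannot be matched automatically.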
conclusions
ukraine is on the path to a digital economy, characterized by the penetration of new technologies into all areas of human activity, simplified access to information, goods, and services, the blurring of companies' geographical boundaries, an increasing share of automated and robotic production units, and a strengthening role for the creation and use of databases. these changes affect all sectors of the economy, and all organizations, without exception, need to adapt accordingly. rapid response to these changes helps to increase competitiveness both at the level of individual organizations and at the level of the state economy. adaptation to the conditions of the digital economy occurs through digital transformation: a complex process that requires a review of all business processes of the organization and radically changes its business model. the digital transformation of an organization takes place through the involvement of management that is competent in digitization, updating management methods, developing digital skills, establishing efficient production and services, implementing digital tools and building digital communication, implementing individual development projects, and adapting to new user needs. the digital transformation of the economy occurs through the transformation of its individual sectors, creating conditions for the transformation of their representatives.

one of the first steps in the process of transition to the digital economy is the establishment of digital information and communication infrastructure. libraries are representatives of the information sphere and were the main operators of information in the analogue era. significant changes in the subject area of their activities require the search for a new role for libraries. modern projects and directions of library development are integral elements of transformation to the conditions of the digital economy. completing this complex implementation will allow libraries to update their management methods, the range of services, and the channels of their provision; change their fixed assets through digitization, structuring of data, and creation of metadata; rework their approaches to communication with users and to cooperation with both domestic and international partners; change the functions and positioning of the library; and become effective information operator-managers.

in the digital economy, the role of the library is changing from passively collecting and storing information to actively managing it. one of the areas of development that most comprehensively meets this role is the management of research data, which is implemented through the creation of cris. in this model, the main asset of libraries is a digital, structured database, automatically and regularly updated, whose main purpose is to support the decision-making process. the library becomes an assistant in conducting research and in finding funding, partners, fixed assets, and information; it becomes a partner in the strategic management of both scientific organizations and the state at the level of committees and ministries. the development of this area in ukraine requires solving a number of technical, administrative, and managerial questions that are relevant not only in ukraine but also around the world.
in particular, libraries need to address the issues of data integration and consistency, data accessibility and openness, copyright, and personal data. the problems of creating and operating cris in ukraine are promising areas for future research.

endnotes

1 andriy dobrynin, konstantin chernykh, vasyl kupriyanovsky, pavlo kupriyanovsky, and serhiy sinyagov, "tsifrovaya ekonomika—razlichnyie puti k effektivnomu primeneniyu tehnologiy (bim, plm, cad, iot, smart city, big data i drugie)," international journal of open information technologies 4, no. 1 (2016): 4–10, https://cyberleninka.ru/article/n/tsifrovaya-ekonomika-razlichnye-puti-k-effektivnomu-primeneniyu-tehnologiy-bim-plm-cad-iot-smart-city-big-data-i-drugie.

2 jurgen meffert, volodymyr kulagin, and alexander suharevskiy, digital @ scale: nastolnaya kniga po tsifrovizatsii biznesa (moscow: alpina, 2019).

3 victoria apalkova, "kontseptsiia rozvytku tsyfrovoi ekonomiky v yevrosoiuzi ta perspektyvy ukrainy," visnyk dnipropetrovskoho universytetu. seriia «menedzhment innovatsii» 23, no. 4 (2015): 9–18, http://nbuv.gov.ua/ujrn/vdumi_2015_23_4_4.

4 don tapscott, the digital economy: promise and peril in the age of networked intelligence (new york: mcgraw-hill, 1996).

5 thomas l. mesenbourg, measuring the digital economy (washington, dc: bureau of the census, 2001).

6 philippe barbet and nathalie coutinet, "measuring the digital economy: state-of-the-art developments and future prospects," communications & strategies, no. 42 (2001): 153, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.576.1856&rep=rep1&type=pdf.

7 alnoor bhimani, "digitization and accounting change," in management accounting in the digital economy, ed. alnoor bhimani, 1–12 (london: oxford university press, 2003), https://doi.org/10.1093/0199260389.003.0001.

8 bo carlsson, "the digital economy: what is new and what is not?," structural change and economic dynamics 15, no. 3 (september 2004): 245–64, https://doi.org/10.1016/j.strueco.2004.02.001.

9 john hand, "building digital economy—the research councils programme and the vision," lecture notes of the institute for computer sciences, social informatics and telecommunications engineering 16 (2009): 3, https://doi.org/10.1007/978-3-642-11284-3_1.

10 carmen nadia ciocoiu, "integrating digital economy and green economy: opportunities for sustainable development," theoretical and empirical researches in urban management 6, no. 1 (2011): 33–43, https://www.researchgate.net/publication/227346561.
11 lesya zenoviivna kit, "evoliutsiia merezhevoi ekonomiky," visnyk khmelnytskoho natsionalnoho universytetu. ekonomichni nauky, no. 3 (2014): 187–94, http://nbuv.gov.ua/ujrn/vchnu_ekon_2014_3%282%29__42.

12 mykhailo voinarenko and larysa skorobohata, "merezhevi instrumenty kapitalizatsii informatsiino-intelektualnoho potentsialu ta innovatsii," visnyk khmelnytskoho natsionalnoho universytetu. ekonomichni nauky, no. 3 (2015): 18–24, http://elar.khnu.km.ua/jspui/handle/123456789/4259.

13 yurii pivovarov, "ukraina perekhodyt na 'tsyfrovu ekonomiku': shcho tse oznachaie," ed. miroslav liskovuch, ukrinform (january 21, 2020), https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html.

14 european commission, "digital economy and society index," brussels, belgium, https://ec.europa.eu/commission/news/digital-economy-and-society-index-2019-jun-11_en.

15 kabinet ministriv ukrainy, "pro skhvalennia kontseptsii rozvytku tsyfrovoi ekonomiky ta suspilstva ukrainy na 2018–2020 roky ta zatverdzhennia planu zakhodiv shchodo yii realizatsii" (kyiv: 2018), https://zakon.rada.gov.ua/laws/show/67-2018-%d1%80.

16 kabinet ministriv ukrainy, "pytannia ministerstva tsyfrovoi transformatsii" (kyiv: 2019), https://zakon.rada.gov.ua/laws/show/856-2019-%d0%bf.

17 5 kanal, "biblioteky stanut pershymy oflain-khabamy: mintsyfry zapustyt kursy z tsyfrovoi osvity," https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry-zapustyt-kursy-z-tsyfrovoi-osvity-206206.html.

18 jacques bughin, jonathan deakin, and barbara o'beirne, "digital transformation: improving the odds of success," mckinsey & company, https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success.

19 domynyk fyld, shylpa patel, and henry leon, "kak dostich tsifrovoy zrelosti," the boston consulting group inc. (2018), https://www.thinkwithgoogle.com/_qs/documents/5685/ru_adwords_marketing___sales_891609_mastering_digital_marketing_maturity.pdf.

20 hortense de la boutetière, alberto montagner, and angelika reich, "unlocking success in digital transformations," mckinsey & company, https://www.mckinsey.com/business-functions/organization/our-insights/unlocking-success-in-digital-transformations.

21 top lead, "tsyfrova transformatsiia biznesu: navishcho vona potribna i shche 14 pytan," businessviews, https://businessviews.com.ua/ru/business/id/cifrova-transformacija-biznesu-navischo-vona-potribna-i-sche-14-pitan-2046.
22 vasily kupriyanovsky, andrey dobrynin, sergey sinyagov, and dmitry namiot, "tselostnaya model transformatsii v tsifrovoy ekonomike—kak stat tsifrovyimi liderami," international journal of open information technologies 5, no. 1 (2017): 26–33, https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike-kak-stat-tsifrovymi-liderami.

23 nataliia kraus, alexander holoborodko, and kateryna kraus, "tsyfrova ekonomika: trendy ta perspektyvy avanhardnoho kharakteru rozvytku," efektyvna ekonomika no. 1 (2018): 1–7, http://www.economy.nayka.com.ua/pdf/1_2018/8.pdf.

24 david bawden and ian rowlands, "digital libraries: assumptions and concepts," international journal of libraries and information studies (libri), no. 49 (1999): 181–91, https://doi.org/10.1515/libr.1999.49.4.181.

25 jack m. maness, "library 2.0: the next generation of web-based library services," logos 13, no. 3 (2006): 139–45, https://doi.org/10.2959/logo.2006.17.3.139.

26 woody evans, building library 3.0: issues in creating a culture of participation (oxford: chandos publishing, 2009).

27 younghee noh, "imagining library 4.0: creating a model for future libraries," the journal of academic librarianship 41, no. 6 (november 2015): 786–97, https://doi.org/10.1016/j.acalib.2015.08.020.

28 helle guldberg et al., "library 5.0," septentrio conference series, uit the arctic university of norway, no. 3 (2020), https://doi.org/10.7557/5.5378.

29 denys solovianenko, "akademichni biblioteky u novomu sotsiotekhnichnomu vymiri. chastyna chetverta.
suchasnyi riven dyskursu akademichnoho bibliotekoznavstva ta postup e-nauky," bibliotechnyi visnyk no. 1 (2011): 8–24, http://journals.uran.ua/bv/article/view/2011.1.02.

30 olga petrivna stepanenko, "perspektyvni napriamy tsyfrovoi transformatsii v konteksti rozbudovy tsyfrovoi ekonomiky," in modeliuvannia ta informatsiini systemy v ekonomitsi: zb. nauk. pr., ed. v. k. halitsyn (kyiv: kneu, 2017), 120–31, https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120-131.pdf?sequence=1&isallowed=y.

31 michal indrák and lenka pokorná, "analysis of digital transformation of services in a research library," global knowledge, memory and communication (2020), https://doi.org/10.1108/gkmc-09-2019-0118.

32 irina sergeevna koroleva, "biblioteka—optimalnaya model vzaimodeystviya s polzovatelyami v usloviyah tsifrovoy ekonomiki," informatsionno-bibliotechnyie sistemyi, resursyi i tehnologii no. 1 (2020): 57–64, https://doi.org/10.20913/2618-7515-2020-1-57-64.

33 james currall and michael moss, "we are archivists, but are we ok?," records management journal 18, no. 1 (2008): 69–91, https://doi.org/10.1108/09565690810858532.

34 kirralie houghton, marcus foth, and evonne miller, "the local library across the digital and physical city: opportunities for economic development," commonwealth journal of local governance no. 15 (2014): 39–60, https://doi.org/10.5130/cjlg.v0i0.4062.

35 sharon farnel and ali shiri, "community-driven knowledge organization for cultural heritage digital libraries: the case of the inuvialuit settlement region," advances in classification research online no. 1 (2019): 9–12, https://doi.org/10.7152/acro.v29i1.15453.

36 elizabeth tait, konstantina martzoukou, and peter reid, "libraries for the future: the role of it utilities in the transformation of academic libraries," palgrave communications no. 2 (2016): 1–9, https://doi.org/10.1057/palcomms.2016.70.

37 tatiana alexandrovna kolesnykova, "suchasna biblioteka vnz: modeli rozvytku v umovakh informatyzatsii," bibliotekoznavstvo. dokumentoznavstvo. informolohiia no. 4 (2009): 57–62, http://nbuv.gov.ua/ujrn/bdi_2009_4_10.

38 ekaterina kudrina and karina ivina, "digital environment as a new challenge for the university library," bulletin of kemerovo state university. series: humanities and social sciences 2, no. 10 (2019): 126–34, https://doi.org/10.21603/2542-1840-2019-3-2-126-134.

39 anna kochetkova, "tsyfrovi biblioteky yak oznaka xxi stolittia," svitohliad no. 6 (2009): 68–73, https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68-kochetkova.pdf.
40 victoria alexandrovna kopanieva, "naukova biblioteka: vid e-katalohu do e-nauky," bibliotekoznavstvo. dokumentoznavstvo. informolohiia no. 6 (2016): 4–10, http://nbuv.gov.ua/ujrn/bdi_2016_3_3.

41 christy r. stevens, "reference reviewed and re-envisioned: revamping librarian and desk-centric services with libstars and libanswers," the journal of academic librarianship 39, no. 2 (march 2013): 202–14, https://doi.org/10.1016/j.acalib.2012.11.006.

42 samuel kai-wah chu and helen s. du, "social networking tools for academic libraries," journal of librarianship and information science 45, no. 1 (february 17, 2012): 64–75, https://doi.org/10.1177/0961000611434361.

43 acrl research planning and review committee, "2018 top trends in academic libraries: a review of the trends and issues affecting academic libraries in higher education," c&rl news 79, no. 6 (2018): 286–300, https://doi.org/10.5860/crln.79.6.286.

44 currall and moss, "we are archivists, but are we ok?," 69–91, https://doi.org/10.1108/09565690810858532.

45 valerii fishchuk et al., "ukraina 2030e—kraina z rozvynutoiu tsyfrovoiu ekonomikoiu," ukrainskyi instytut maibutnoho, 2018, https://strategy.uifuture.org/kraina-z-rozvinutoyu-cifrovoyu-ekonomikoyu.html.

46 eurocris, "search the directory of research information systems (dris)," https://dspacecris.eurocris.org/cris/explore/dris.

47 mon, "mon zapustylo novyi poshukovyi servis dlia naukovtsiv—vin bezkoshtovnyi ta bazuietsia na vidkrytykh danykh z usoho svitu," https://mon.gov.ua/ua/news/mon-zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na-vidkritih-danih-z-usogo-svitu.

48 nancy herther et al., "text and data mining contracts: the issues and needs," proceedings of the charleston library conference, 2016, https://doi.org/10.5703/1288284316233.

49 karen hogenboom and michele hayslett, "pioneers in the wild west: managing data collections," portal: libraries and the academy 17, no. 2 (2017): 295–319, https://doi.org/10.1353/pla.2017.0018.

50 philip young et al., "library support for text and data mining," a report for the university libraries at virginia tech, 2017, http://bit.ly/2fccowu.

51 carol tenopir et al., "data sharing by scientists: practices and perceptions," plos one 6, no. 6 (2011), https://doi.org/10.1371/journal.pone.0021101.
52 filip kruse and jesper boserup thestrup, "research libraries' new role in research data management, current trends and visions in denmark," liber quarterly 23, no. 4 (2014): 310–35, https://doi.org/10.18352/lq.9173.

53 american economic review, "data and code," aer guidelines for accepted articles: instructions for preparation of accepted manuscripts, 2020, https://www.aeaweb.org/journals/aer/submissions/accepted-articles/styleguide#iic.

54 "data access and retention," the publication ethics and malpractice statement (new york: marsland press, 2019), http://www.sciencepub.net/marslandfile/ethics.pdf.

55 patricia cleary et al., "text mining 101: what you should know," the serials librarian 72, no. 1–4 (may 2017): 156–59, https://doi.org/10.1080/0361526x.2017.1320876.

56 rebecca bryant et al., practices and patterns in research information management: findings from a global survey (dublin: oclc research, 2018), https://doi.org/10.25333/bgfg-d241.

letter from the editors
kenneth j. varnum and marisha c. kelly
information technology and libraries | september 2022
https://doi.org/10.6017/ital.v41i3.15559

as summer turns to fall—too soon for the editor of this journal—the northern hemisphere's academic calendar is getting started. after the past two years of covid-directed activities, it feels good to be returning to a more typical start to the year. while we're not out of the pandemic woods yet, by any means, it does feel as if we've turned a corner. some of us have returned to pre-covid modes of working and socializing, while others are finding the return to the status quo ante a bit more challenging. if the pandemic has shown us one thing, it's the power—and limitations—of technology to adapt to a changed world, and of our human ability to adapt, or not, to new habits of technology. we've covered many of these responses of adaptation in the pages of this journal over the past two years and expect that many more innovations and lessons learned will be shared here in the years to come. technological change is a never-ending process. information technology and libraries will continue to share the ways cultural memory institutions adapt, respond, and react to changes in the technological tools we use. as always, if you have lessons learned about technologies and their effect on our mission, we'd like to hear from you. our call for submissions outlines the topics and process for submitting an article for review.
if you have questions or wish to bounce ideas off the editor and assistant editor, please contact either of us at the email addresses below.

this issue's contents
the september "public libraries leading the way" column, "the first 500 mistakes you will make while streaming on twitch.tv" by chris markman, kasper kimura, and molly wallner of the palo alto (california) public library, is all about lessons learned. the authors summarize the many things they discovered while launching, managing, and sustaining a library presence on twitch. our peer-reviewed content this month showcases topics including public library broadband connectivity, two articles on aspects of chat reference, digitization, library management, and a learning object repository to support a cross-institutional, land-based, multidisciplinary academic initiative.
1. measuring library broadband networks to address knowledge gaps and data caps / chris ritzo, colin rhinesmith, and jie jiang
2. perceived quality of whatsapp reference service: a quantitative study from user perspectives / yan guo, apple hiu ching lam, dickson k. w. chiu, and kevin k. w. ho
3. library management practices in the libraries of pakistan: a detailed retrospective / asim ullah, shah khusro, and irfan ullah
4. navigating uncharted waters: utilizing innovative approaches in legacy theses and dissertations digitization at the university of houston libraries / annie wu, taylor davis-van atta, bethany scott, santi thompson, anne washington, jerrell jones, andrew weidner, a. laura ramirez, and marian smith
5. using machine learning and natural language processing to analyze library chat reference transcripts / yongming wang
6. an omeka s repository for place- and land-based teaching and learning / neah ingram-monteiro and ro mckernan

kenneth j. varnum, editor (varnum@umich.edu)
marisha c. kelly, assistant editor (marisha.librarian@gmail.com)

public libraries leading the way
service barometers: using lending kiosks to locate patrons
william yarbrough
information technology and libraries | june 2021
https://doi.org/10.6017/ital.v40i2.13499

public libraries have been using lending kiosks for close to ten years now. typically, kiosks are used as a sort of satellite collection, delivering library services directly to the community. often, they target people with limited mobility or who lack reliable transportation. reaching these underserved populations helps expand a library's service area and user base, which can factor into state aid. but when amanda jackson first took over as director of the chesapeake public library, back in 2018, one of her first ideas involved using lending kiosks in a way that's slightly unconventional.
chesapeake is the second largest city in the commonwealth of virginia, at 350 square miles. while considered suburban, the city is also plenty rural, with large areas of farmland, forest, swamps, and river. one of those areas is southern chesapeake. spanning roughly 130 square miles, southern chesapeake stretches all the way down to the border with north carolina. the closest library, however, is located on the northern end of the city, all the way up in great bridge. this library, which houses over 289,000 items (not to mention a law library and history room), is the largest in the city. more than 500,000 people visit it annually. still, for many of the 19,586 residents in southern chesapeake, it’s a bit of a hike. for a population this size, breaking ground on a new library branch would be warranted, especially since that population is growing. from 2018 to 2019, southern chesapeake saw its population increase by 1.62%, good for second highest among the city’s nine boroughs. but even a small library of, say, 12,000 square feet would cost $3 million—and that’s being conservative. justifying that big an expense, both to the city council and local taxpayers, requires proving a return on investment. “building a new library brings a lot of excitement and energy into the community,” jackson says. “but first, to support that decision, we need to better understand what south chesapeake needs from the library.” the first person jackson turned to was maiko medina, who heads up cpl’s it division. like jackson, medina has worked in libraries for close to 20 years, first as frontline staff before transitioning to it. while with the neighboring virginia beach public library system, he helped install a variety of new systems, including self-checkout kiosks. together, medina and jackson came up with a plan that uses lending kiosks as a type of service barometer. cpl would install kiosks all around south chesapeake, at city parks, community centers, police and fire stations, and local businesses. each kiosk would provide a selection of new and popular items from the library’s catalogue, which patrons could check out on the spot. “by studying how the kiosks are used, we’ll get a better idea for where our patrons are,” medina says. “it’ll also tell us what they’re interested in.” this plan was submitted as a capital project. at a proposed $113,000, the project would fund one initial lending kiosk, along with an accompanying holds locker. chesapeake city council approved this project for fiscal year 2020. installation of the first kiosk in southern chesapeake was then scheduled for 2021. however, when the covid-19 pandemic hit, jackson, medina, and the rest of the team at cpl recognized the need to speed the plan into action. “we kept hearing about how kids were struggling with virtual learning,” jackson says, “especially those in our underserved communities.” one of those communities is south norfolk. among the neighborhood’s 22,851 residents, 59.2% identify as black. 31% are 19 or younger. of those children, 39% are enrolled at a title i school. as schools were forced to move learning online, many of these students fell behind, whether because they lacked reliable internet, access to a home computer, or both. to meet this need, the neighboring dr. clarence v.
cuffee library was transformed into an outreach and innovation center. along with a business center, maker spaces, stem walls, and a rotating art gallery, the new-and-improved cuffee library came with a student learning center. through this service, students can schedule one-on-one tutoring appointments with library staff and local college students, either virtually or in person. of course, adding these new services required moving other things around. books, dvds, and other materials were redistributed to other libraries across the system. this may seem like an odd decision (after all, what’s a library without books?). but patrons weren’t using this library for materials; in fact, the number of items checked out from the collection (17,922) was significantly lower than at any of the other six branches. still, the library didn’t want to abandon those patrons who rely on cuffee for more traditional services, especially since many were likely stuck at home during the pandemic. to meet this need, rather than wait until 2021, the library secured additional funding through the cares act to install a lending kiosk right outside the center’s main entrance last october, shortly after the newly renovated dr. clarence v. cuffee outreach and innovation center opened. with this kiosk—a lendit 200 (courtesy of d-tech international)—patrons can check out from a rotating list of 200 items. they can also get any other item in the library’s collection by using the holdit locker (also from d-tech). both services are free and are available 24 hours a day, 7 days a week. so far, since last december, 136 items have been checked out through the cuffee lendit kiosk. another 304 have been checked out via the holds locker. among those checkouts, the most popular categories are adult nonfiction dvds (33%) and adult fiction books (20%). as a result, more of these items will be rotated into the kiosk’s collection. not only that, but next month the library will break ground on another, bigger lending kiosk. located at fire station 7, this kiosk will be the first lendit 500 installed in south chesapeake. “the lending kiosk has really helped us continue to serve our patrons during all the changes brought on over the past 18 months,” say both jackson and medina. “now that things are starting to move ahead a little, we’re excited for how this technology will help us reach more of chesapeake.” over the next couple of years, chesapeake public library will use these lending kiosks to learn more about what the growing number of people in south chesapeake need from the library. maybe that’s more kiosks, a small storefront, or even a full-sized, brick-and-mortar building. either way, new and innovative technologies like the lending kiosks will lead the way, helping cpl deliver services further into the community.
president’s message
andromeda yelton
information technology and libraries | june 2018 https://doi.org/10.6017/ital.v37i2.10493
andromeda yelton (andromeda.yelton@gmail.com) is lita president 2017–18 and senior software engineer, mit libraries, cambridge, massachusetts.
as i started planning this column, i looked back over my other columns for the year and discovered that they have a theme: the connection that runs from our past, through our present, and into our future. in my first column, i talked about the first issues of ital: henriette avram founding marc right here in these pages.
early lita hackers cobbling together the technologies of their age to make streamlined, inventive library services—just as lita members do today. in my second column, i talked about conferences where we come together today—lita forum 2017 and 2018—and encounter the issues of today—data for black lives. i can close my eyes and i’m in denver, chatting with long-time colleagues and first-time presenters ... or i’m at the mit media lab, watching algorithmic opportunity and injustice spar with one another, while artists and poets point us toward the wakandan imaginary. and in my third column, i talked about the possibility of lita, llama, and alcts coming together to form a new division: a potential future. this possibility both knocked my world off its axis and let me see it in a new light; i didn’t imagine that i’d spend my presidency exploring the options for large-scale organizational transformation, and yet i can see how this route could not only address challenges all three divisions face, but also give us opportunities to be stronger together. i believe in this roadmap, but i also want us all to grapple with the question of identity. what’s peripheral, and what’s central, to who we are as library technologists? what’s ephemeral, and what endures? what’s the through line we can hold on to, across that past and present, and carry with us into the future? today, here in the present, i’m preparing to turn over my piece of that line to president-elect bohyun kim. she has been unfailingly brilliant and diligent in the years i’ve known her, and i know she’ll ask insightful questions, advocate for all that’s best in lita and its people, and get things done. but i’m also cognizant that it was never really my line; it was yours. i had the immense privilege of carrying it for a while, but as we hear every time we survey our members, the best part of lita is the networking—it’s you. we will have many chances to discuss our through line in the months to come, and i urge you to bring your voices to the table: ask your questions, tell us what matters, and depict your imaginaries.
articles
likes, comments, views: a content analysis of academic library instagram posts
jylisa doney, olivia wikle, and jessica martinez
information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.12211
jylisa doney (jylisadoney@uidaho.edu) is social sciences librarian, university of idaho. olivia wikle (omwikle@uidaho.edu) is digital initiatives librarian, university of idaho. jessica martinez (jessicamartinez@uidaho.edu) is science librarian, university of idaho. © 2020.
abstract
this article presents a content analysis of academic library instagram accounts at eleven land-grant universities. previous research has examined personal, corporate, and university use of instagram, but fewer studies have used this methodology to examine how academic libraries share content on this platform and the engagement generated by different categories of posts. findings indicate that showcasing posts (highlighting library or campus resources) accounted for more than 50 percent of posts shared, while a much smaller percentage of posts reflected humanizing content (emphasizing warmth or humor) or crowdsourcing content (encouraging user feedback).
crowdsourcing posts generated the most likes on average, followed closely by orienting posts (situating the library within the campus community), while a larger proportion of crowdsourcing posts, compared to other post categories, included comments. the results of this study indicate that libraries should seek to create instagram posts that include various types of content while also ensuring that the content shared reflects their unique campus contexts. by sharing a framework for analyzing library instagram content, this article will provide libraries with the tools they need to more effectively identify the types of content their users respond to and enjoy as well as make their social media marketing on instagram more impactful. introduction library use of social media has steadily increased over time; in 2013, 86 percent of libraries reported using social media to connect with their patron communities.1 the ways in which libraries use social media tend to vary, but common themes include marketing services, content, and spaces to patrons, as well as creating a sense of community.2 even with this wealth of research, fewer studies have examined how libraries use instagram, and those that do often utilize a formal or informal case study methodology.3 this research seeks to fill that gap by examining the types of content shared most frequently by a subset of academic library instagram accounts. although this research focused on academic libraries, its methods and findings could be leveraged by educational institutions and non-profits in their own investigations of instagram usage and impact. literature review since its inception in 2010, instagram’s number of account holders has been steadily increasing. by 2019, more than one billion user accounts were active each month, making it the third most popular social media network in the world, and the pew research center has reported that instagram is the second most used social media platform among people ages 18-29 in the united states, after facebook.4 instagram has estimated that 90 percent of user accounts follow at least one business account.5 previous research has also shown that individuals who use instagram to follow specific brands have the highest rates of engagement with, and commitment to, those mailto:jylisadoney@uidaho.edu mailto:omwikle@uidaho.edu mailto:jessicamartinez@uidaho.edu information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 2 brands when compared to users of other social media platforms.6 though businesses are fundamentally different in the products or services they are trying to market, academic libraries share a desire to provide information to, and engage with, their followers. as such, in the past decade, libraries have begun to adopt instagram as a way to market their libraries and interact with patrons.7 however, methods and parameters for libraries’ use of instagram vary across types of libraries and even within specific library types.8 research has demonstrated that academic libraries’ use of social media, including instagram, is often for the purpose of increasing the sense of community among librarians and patrons by marketing the library’s services and encouraging student feedback and interaction.9 similarly, harrison et al. 
discovered that academic library social media posts reflected three main themes: “community connections, inviting environment, and provision of content.”10 chatten and roughley have also reported that libraries’ use of social media ranges from providing customer service to promoting the library and building a community of users.11 indeed, when comparing modern social networking systems, such as instagram, to older platforms, such as myspace, fernandez posited that today’s popular social media sites encourage networking and are especially suited to creating community.12 ideally, community engagement in the virtual social media environment would encourage more patrons to enter the library and thus engage in more face-to-face encounters.13 libraries’ methods for measuring the success of their social media engagement are as varied as the ways in which they use social media. assessment of libraries’ social media efficacy is tricky, and highly variable from institution to institution. hastings has cautioned that librarians should recognize that patrons both actively and passively interact with social media content.14 for this reason, while a large number of comments or likes may be identified as positive markers for active engagement, passive forms of engagement, such as the number of times a post appeared in users’ instagram feeds, may also be relevant.15 therefore, when librarians measure the success of an instagram post by examining only the number of likes and comments, they should be aware that they are measuring a very specific type of engagement: one which, on its own, may not determine a post’s full reach or effectiveness. other ways to measure engagement include monitoring how the number of people subscribed to an account changes over time, evaluating reach and impressions,16 or analyzing the content of comments (a type of qualitative measure that may indicate the type of community developing around the library’s social media). despite, or perhaps because of, the general excitement surrounding the possibilities that libraries’ engagement with social media can produce, very little has been written about how different types of libraries (such as academic libraries, law libraries, public libraries, etc.), or libraries in general, use these platforms.17 additionally, many librarians may lack expertise in marketing, including those who are managing social media accounts.18 as social media culture continues to evolve, librarians should move toward a more targeted and pragmatic approach to their instagram practices. this refinement in social media practices may enable libraries to develop more structure, so that they may create and share the type of content that would achieve their desired result at a given time. however, in order to develop this kind of measured approach, it is necessary for researchers to first analyze libraries’ current instagram practices to determine how posts are being used and the outcomes they generate. one effective method of analyzing instagram content centers on coding and classifying images. while many such schemas have been developed for analyzing images posted by instagram users and businesses, transferring these schemas to academic contexts has been difficult. 19 to address information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 3 this gap, stuart et al. adapted a schema that had been used to examine how “news media [and] non-profits,” as well as businesses, used instagram.20 this new schema allowed stuart et al. 
to classify instagram posts produced by academic institutions in the uk and measure the effect of these universities’ attempts to engage with students via instagram.21 stuart et al.’s schema, which classified instagram images into six categories (orienting, humanizing, interacting, placemaking, showcasing, and crowdsourcing), was the basis for the present study.22 methods research questions the impetus for this study was to learn more about how academic libraries use instagram to connect with their campus communities and promote their services and events. the authors of the present study adapted the research questions posed by stuart et al. to reflect academic library contexts:23 • rq1: which type of post category is used most frequently by libraries on instagram? • rq2: is the number of likes or the existence of comments related to the post category? identifying a sample population this study investigated a small subset of academic institutions: the university of idaho’s sixteen peer institutions. these peers have similar “student profiles, enrollment characteristics, research expenditures, [or] academic disciplines and degrees”; each is designated as a land-grant institution; and the university of idaho considers three to be “aspirational peers.”24 after selecting this population, the authors investigated the library websites of each of the sixteen peer institutions to determine whether or not they had a library-specific instagram account. when a link was not available on the library websites, the authors conducted a search within instagram as well as a general google search in an attempt to identify these instagram accounts. of the university of idaho’s sixteen peer institutions, eleven had active, library-specific instagram accounts. data collection the authors undertook manual data collection between november and december 2018 for these eleven library instagram accounts. initial information about each instagram account was gathered prior to the study on october 23, 2018: the date of the first post, the total number of posts shared by the account, the total number of followers, and the total number of accounts followed. for each account, the authors identified posts shared from january 1, 2018, to june 30, 2018. the “print to pdf” function available in the chrome browser was used to preserve a record of the content, in case the accounts were later discontinued while research was underway. if a post included more than one image, only the first image was captured in the pdf and analyzed. to organize the 3 77 instagram posts shared within this timeframe, the authors assigned each institution a unique, fivedigit identifier; file names included this identifier as well as the date of the post (e.g. , 00004_igpost_20180423). this file naming convention ensured that posts were separated based on institution and that future studies could use the same file naming convention, even if the sample size increased significantly. the authors added the file names of all 377 instagram posts to a shared google sheet, and for each post they reported the kind of post (photo or video), the number of likes, and whether comments existed. information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 4 research data analysis content analysis this project adapted the coding schema stuart et al. employed to investigate the ways in which uk universities used instagram.25 expanding on research by mcnely, stuart et al. 
employed six instagram post categories: orienting, humanizing, interacting, placemaking, showcasing, and crowdsourcing.26 for the purposes of the present study, the authors used the same category names when coding library instagram posts. however, they updated and adapted the descriptions of each category over the course of two rounds of coding to better reflect academic library contexts (see table 1). within this coding schema, the authors elected to apply only a single category name (i.e., a code) to each library instagram post. interrater reliability during the first round of coding, the authors selected two or three institutions every month, independently coded the posts based on the initial adapted schema, met to discuss discrepancies, and identified the final code based on consensus.27 however, during these discussions, it became evident that there was substantial disagreement concerning how specific categories were interpreted. to examine the impact of this disagreement, the authors calculated fleiss’ kappa, which can be used to assess interrater reliability when two or more coders categorically evaluate data.28 although this project’s fleiss’ kappa (0.683554901) was relatively close to a score of 1.0, demonstrating moderate agreement between each of the three coders, the authors recognized that additional fine-tuning of the adapted coding schema would allow for a more accurate representation of the types of content shared by academic libraries. after updating the schema (table 1), a small sample of collected instagram posts (20 percent, or 76 posts) was randomly selected for independent recoding by each of the authors. again, after coding this random sample individually, the authors met to seek consensus. anecdotal feedback from the coders, as well as an increase in the project’s fleiss’ kappa (0.795494117), demonstrated that the updated coding schema was more robust and representative. based on this evidence, the authors randomly distributed the remaining 301 posts amongst themselves; each post was coded by one author. information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 5 table 1. coding schema for library instagram posts [adapted from: emma stuart, david stuart, and mike thelwall, “an investigation of the online presence of uk universities on instagram,” online information review 41, no. 5 (2017): 588, https://doi.org/10.1108/oir-02-2016-0057.] category description example1 crowdsourcing posts that were created with the intention of generating feedback within the platform. if the content of the post itself fits within a different classification category, but the image is accompanied by text that explicitly asks for viewer feedback, then the post should be classified as crowdsourcing. includes requests for followers to like, comment on, or tag others in a particular post. humanizing posts that aim to emphasize human character or elements of warmth, humor, or amusement. this includes historic/archival photos used to convey these sentiments. this code is only used if both the text and the photo or video can be categorized as humanizing because many library posts contain a “humanizing” element. 1 sample images from the university of idaho library’s instagram account. https://doi.org/10.1108/oir-02-2016-0057 information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 6 category description example1 interacting posts with candid photographs or videos at library and libraryassociated events. 
includes events within or outside the library. orienting posts that situate the library within its larger community, especially regarding locations, artifacts, or identities. text often includes geographic information. placemaking posts that capture the atmosphere of the library through its physical space and attributes. includes permanent murals, statues, etc. showcasing posts that highlight library or campus resources, services, or future events. can include current or on-going events if people are not the focus of the image (e.g., exhibit, highlight of collection, etc.). these posts can also present information about library operations, such as hours and fundraising. posts can also entice their audience to do something outside of instagram, such as visit a specific website.
results
general data about the library instagram accounts
as of october 23, 2018 (the date this initial information was gathered), the eleven academic library instagram accounts had shared a combined 3,124 posts. most libraries created their instagram accounts and started posting between 2013 and 2016, but one library shared a post in 2012 and one created its account in april 2018. since the date of their first post, each account had shared 284 posts on average, while the actual number of posts shared across accounts ranged from 62 to 520. the number of followers and accounts followed across these eleven accounts ranged from 115 to 1,390 and 65 to 2,717, respectively. between january 1, 2018, and june 30, 2018, these eleven library instagram accounts shared a total of 377 posts. the number of posts shared by each account during this time period ranged from four to 57, with an average of 34 posts.
rq1: which type of post category is used most frequently by libraries on instagram?
of the 377 posts analyzed, 359 included photos and 18 included videos. more than 50 percent of posts shared were coded as showcasing, with humanizing (18 percent) and crowdsourcing (9.8 percent) being the next most common categories (see table 2), although data demonstrated that individual libraries differed in their use of specific post categories (see table 3). when examining frequency based on category of post, the authors identified slight differences between video and photo posts. as with photos, the majority of videos (55.6 percent) were still coded as showcasing; however, the second most common post category for videos was interacting (16.7 percent).
table 2. number and percentage of posts by category for posts with photos or videos
category        number of posts    percentage of posts
crowdsourcing   38                 10.1%
humanizing      68                 18.0%
interacting     16                 4.2%
orienting       28                 7.4%
placemaking     33                 8.8%
showcasing      194                51.5%
total           377                100%
table 3. percentage of posts by category and library for posts with photos or videos
library   crowdsourcing   humanizing   interacting   orienting   placemaking   showcasing
lib 1     7.7%            15.4%        0%            23.1%       30.8%         23.1%
lib 2     4.2%            50.0%        0%            4.2%        0%            41.7%
lib 3     56.1%           10.5%        1.8%          3.5%        7.0%          21.1%
lib 4     0%              4.1%         4.1%          4.1%        2.0%          85.7%
lib 5     0%              24.4%        2.2%          20.0%       26.7%         26.7%
lib 6     7.5%            18.9%        3.8%          11.3%       11.3%         47.2%
lib 7     0%              20.0%        0%            0%          10.0%         70.0%
lib 8     0%              21.6%        9.8%          5.9%        0%            62.7%
lib 9     0%              25.0%        25.0%         0%          0%            50.0%
lib 10    0%              16.1%        6.5%          0%          9.7%          67.7%
lib 11    0%              15.0%        5.0%          5.0%        5.0%          70.0%
rq2: is the number of likes or the existence of comments related to the post category?
number of likes by category
the results of the coding process also indicated that the number of likes differed based on the category of post. when examining photo posts, the authors noted that every post received at least five likes, with most posts receiving between 20 and 39 likes (see table 4). on average, crowdsourcing photo posts generated the highest average number of likes across all categories, followed by orienting and placemaking posts (see table 5). however, it is important to recognize that crowdsourcing posts often asked visitors to participate in a post by “liking” it, often with the chance to win a library-sponsored contest, which may partially explain the higher average number of likes.
table 4. number of posts by category and range of likes for posts with photos (does not include posts with videos)
category        5-19   20-39   40-59   60-79   80-99   100-119   120-140
crowdsourcing   0      11      16      6       1       1         1
humanizing      16     26      10      9       5       0         1
interacting     5      5       3       0       0       0         0
orienting       2      7       9       8       0       1         0
placemaking     3      10      12      3       2       1         1
showcasing      67     83      27      5       1       0         1
total           93     142     77      31      9       3         4
table 5. average number of likes by category for posts with photos (does not include posts with videos)
category        average number of likes   number of posts
crowdsourcing   53.6                      36
humanizing      39.9                      67
interacting     27.8                      13
orienting       50.0                      27
placemaking     46.9                      32
showcasing      27.6                      184
existence of comments by category
the authors also examined the existence of comments, another metric for engagement with instagram posts. data demonstrated that 78.9 percent of crowdsourcing posts included comments, while a much lower percentage of placemaking (30.3 percent), orienting (28.6 percent), and humanizing (26.5 percent) posts generated this type of engagement (see table 6). as with the data on the number of “likes,” many crowdsourcing posts encouraged visitors to comment on a particular post, at times with an incentive connected to this type of engagement.
table 6. presence of comments by category for posts with photos or videos
category        number of posts with comments   number of posts without comments   total number of posts   percentage of posts with comments
crowdsourcing   30                              8                                  38                      78.9%
humanizing      18                              50                                 68                      26.5%
interacting     3                               13                                 16                      18.8%
orienting       8                               20                                 28                      28.6%
placemaking     10                              23                                 33                      30.3%
showcasing      40                              154                                194                     20.6%
total           109                             268                                377                     28.9%
discussion
as noted previously, the post category used most frequently by these eleven libraries on instagram was showcasing (51.5 percent).
the fact that libraries were more likely to share this type of content—which highlighted library resources, events, or collections—is understandable, as library promotion is one of the foundational reasons libraries spend the time and effort required to maintain social media accounts.29 this finding differs substantially from previous research with uk universities, which classified only 28.8 percent of posts as showcasing.30 when examining other post categories, it also became clear that uk universities shared humanizing posts more frequently (31 percent) than the eleven libraries (18 percent) included in this study.31 although the results of this study demonstrated that showcasing posts were shared most often, the data also indicates that showcasing posts were neither the category with the most likes on average nor the category that received comments most often. crowdsourcing posts were the category with the highest average number of likes (53.6) with orienting posts coming in at a close second (50), followed by placemaking (46.9) and humanizing (39.9) posts. showcasing posts, along with interacting posts, only generated slightly more than half the number of likes on average, when compared to the other categories (27.6 and 27.8, respectively). the category with the largest proportion of comments was crowdsourcing posts, with 78.9 percent of posts in this category generating comments from visitors. however, this result is likely skewed, as one of the library instagram accounts had exceptionally successful crowdsourcing posts, which often included a giveaway or other incentive for participation. in fact, when this institution was removed from the data set, only six crowdsourcing posts remained, two of which generated information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 11 comments. to better determine whether crowdsourcing posts are always this effective at generating engagement, it would be necessary to code a larger sample of instagram pos ts. it is clear that while showcasing posts were the most common among the instagram accounts analyzed, they also received the lowest number of likes, on average, and generated comments less frequently than all but one post category. while this may seem disheartening, it is important to remember that the showcasing category includes informational posts that convey library hours, services, or closures; this information that may be effectively relayed to users without necessitating an active response in the form of likes and comments. therefore, one might use different criteria to determine the success of showcasing posts, perhaps examining instagram data related to reach (the total number of unique visitors that view a post) and impressions (the total number of times a post is viewed).32 data on reach and impressions are only available to instagram account “owners.” in the current study, the authors did not quantify these types of engagement as their goal was to evaluate the content and metrics available to all instagram users, rather than the data that was only available to the “owners” of these library instagram accounts. in addition to answering the research questions, coding these instagram posts prompted several new questions regarding the types of information libraries and other institutions share online. one such question includes: with both universities and academic libraries working with students, why did academic libraries share a smaller percentage of interacting posts than uk universities? 
33 additional research is needed to answer this question, but anecdotally, this difference may be related to the fact that universities, as a whole, have a larger number of opportunities to promote and share instances of interaction via instagram than libraries. for example, general university instagram accounts often include photos of students and affiliates interacting at large scale events such as sports games, musical performances, and other student gatherings that take place across campus. library-specific accounts on the other hand, have fewer opportunities to post photos that capture individuals “interacting” candidly. further, the fact that libraries tend to be proponents of privacy rights may inhibit library staff from taking photos of their users and sharing them online without first getting permission. therefore, differences related to the number of events and the organization type may contribute to whether or not universities and libraries share interacting posts; more research is needed to examine this hypothesis. another issue that arose during coding was that, if not for their inclusion of a request to comment, many crowdsourcing posts could have been classified under other categories. if an account follower looked only at the photos included in many of the crowdsourcing posts without reading the captions, they may not interpret those posts as crowdsourcing. therefore, a future research project might examine whether applying secondary categories to crowdsourcing posts, as a means of further classifying images and not just their captions, could generate a more comprehensive picture of what libraries are sharing on their instagram accounts. the authors also discovered that a majority of the library instagram posts included in this sample contained humanizing elements. almost all posts attempted to convey warmth, humor, or assistance, and therefore had the potential to be classified as humanizing. to successfully adapt stuart et al.’s coding schema for academic library instagram accounts, the authors specified that a post had to have both a humanizing caption as well as a humanizing photo to be coded as such.34 as with crowdsourcing posts, adding secondary categories to humanizing posts could better reflect the dual nature of this content and help future coders more accurately interpret the types of content shared by academic libraries. information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 12 limitations and future research the number of library instagram accounts selected as well as the use of a six-month timeframe were limitations of the current study. in the future, selecting a larger sample size and a different group of academic libraries would serve to advance the discipline’s understanding of the types of content shared by academic libraries and how users interact with these instagram posts. additionally, collecting instagram posts shared during an expanded timeframe could allow researchers to explore whether library instagram accounts consistently share the same types of content at various points throughout the year. as mentioned in the discussion section, future research could also include adding secondary categories to posts, which would allow researchers to gather more granular information about the types of content shared and the relationships between post category, comments, and likes. 
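the kind of per-category tabulation reported in tables 5 and 6 (average likes and the share of posts with comments) is straightforward to reproduce on any set of coded posts, with or without secondary categories. the sketch below is purely illustrative: the category labels follow the schema used here, but the sample records, field names, and values are hypothetical and are not this study's data.

```python
from collections import defaultdict

# hypothetical coded posts; field names and values are illustrative only,
# not the data set analyzed in this study
coded_posts = [
    {"category": "showcasing",    "likes": 24, "has_comments": False},
    {"category": "crowdsourcing", "likes": 61, "has_comments": True},
    {"category": "humanizing",    "likes": 38, "has_comments": False},
    {"category": "orienting",     "likes": 47, "has_comments": True},
    {"category": "showcasing",    "likes": 19, "has_comments": False},
]

likes = defaultdict(list)    # likes observed per category
comments = defaultdict(int)  # count of posts with comments per category
totals = defaultdict(int)    # total posts per category

for post in coded_posts:
    cat = post["category"]
    likes[cat].append(post["likes"])
    totals[cat] += 1
    if post["has_comments"]:
        comments[cat] += 1

for cat in sorted(totals):
    avg_likes = sum(likes[cat]) / totals[cat]
    pct_with_comments = 100 * comments[cat] / totals[cat]
    print(f"{cat:13} n={totals[cat]:2}  avg likes={avg_likes:5.1f}  "
          f"with comments={pct_with_comments:5.1f}%")
```

adding a secondary category would only require a second field on each record and a second pass of the same grouping logic.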
lastly, to better understand the post categories that generate the greatest engagement, collaborative research between institutions could allow researchers to gather and analyze metrics that are only available to account owners, such as impressions and reach. with this type of collaboration, researchers could also investigate how social media outreach goals influence the types of content shared on library instagram accounts. for example, researchers could conduct interviews or surveys with libraries and ask questions such as: what does your library hope to accomplish with its instagram account, who are you attempting to reach, how do you define a successful post, what metrics do you use to evaluate your instagram presence, and do your social media outreach goals influence the types of content shared on instagram? pursuing these types of questions, in addition to examining the actual content shared, would allow researchers to gain a more complete picture of what a successful social media presence looks like for an academic library. conclusion this research provides initial insight into the instagram presence of a subset of academic libraries at land-grant institutions in the united states. expanding on the research of stuart et al., this project used an adapted coding schema to document and analyze the content and efficacy of academic libraries’ instagram posts.35 the results of this study suggest that social media accounts, including those used by academic libraries, perform better when they reflect the community the library inhabits by highlighting content that is unique to their particular constituents, rather than simply functioning as another platform through which to share information. this study’s findings also demonstrate that academic libraries should strive to create an instagram presence that encompasses a variety of post categories to ensure that their online information sharing meets various needs. information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 13 endnotes 1 nancy dowd, “social media: libraries are posting, but is anyone listening?,” library journal 138, no. 10 (may 7, 2013), 12, https://www.libraryjournal.com/?detailstory=social-media-libraries-areposting-but-is-anyone-listening. 2 marshall breeding, next-gen library catalogs (london: facet publishing, 2010); zelda chatten and sarah roughley, “developing social media to engage and connect at the university of liverpool library,” new review of academic librarianship 22, no. 2/3 (2016), https://doi.org/10.1080/13614533.2016.1152985; amanda harrison et al., “social media use in academic libraries: a phenomenological study,” the journal of academic librarianship 43, no. 3 (2017), https://doi.org/10.1016/j.acalib.2017.02.014; nicole tekulve and katy kelly, “worth 1,000 words: using instagram to engage library users,” brick and click libraries symposium, maryville, mo (2013), https://ecommons.udayton.edu/roesch_fac/20; evgenia vassilakaki and emmanouel garoufallou, “the impact of twitter on libraries: a critical review of the literature,” the electronic library 33, no. 4 (2015), https://doi.org/10.1108/el03-2014-0051. 3 yeni budi rachman, hana mutiarani, and dinda ayunindia putri, “content analysis of indonesian academic libraries’ use of instagram,” webology 15, no. 
2 (2018), http://www.webology.org/2018/v15n2/a170.pdf; catherine fonseca, “the insta-story: a new frontier for marketing and engagement at the sonoma state university library,” reference & user services quarterly 58, no. 4 (2019), https://www.journals.ala.org/index.php/rusq/article/view/7148; kjersten l. hild, “outreach and engagement through instagram: experiences with the herman b wells library account,” indiana libraries 33, no. 2 (2014), https://journals.iupui.edu/index.php/indianalibraries/article/view/16633; julie lê, “#fashionlibrarianship: a case study on the use of instagram in a specialized museum library collection,” art documentation: bulletin of the art libraries society of north america 38, no. 2 (2019), https://doi.org/10.1086/705737; danielle salomon, “moving on from facebook: using instagram to connect with undergraduates and engage in teaching and learning,” college & research libraries news 74, no. 8 (2013), https://doi.org/10.5860/crln.74.8.8991.
4 “our story,” instagram, https://business.instagram.com/; chloe west, “17 instagram stats marketers need to know for 2019,” sprout blog, april 22, 2019, https://web.archive.org/web/20191219192653/https://sproutsocial.com/insights/instagram-stats/; pew research center, “social media fact sheet,” last modified june 12, 2019, http://www.pewinternet.org/fact-sheet/social-media/.
5 “our story,” instagram.
6 joe phua, seunga venus jin, and jihoon jay kim, “gratifications of using facebook, twitter, instagram, or snapchat to follow brands: the moderating effect of social comparison, trust, tie strength, and network homophily on brand identification, brand engagement, brand commitment, and membership intention,” telematics and informatics 34, no. 1 (2017), https://doi.org/10.1016/j.tele.2016.06.004.
7 fonseca, “the insta-story;” hild, “outreach and engagement;” lê, “#fashionlibrarianship;” rachman, mutiarani, and putri, “content analysis;” salomon, “moving on from facebook;” tekulve and kelly, “worth 1,000 words.”
8 vassilakaki and garoufallou, “the impact of twitter.”
9 breeding, next-gen library catalogs; hild, “outreach and engagement;” rachman, mutiarani, and putri, “content analysis;” vassilakaki and garoufallou, “the impact of twitter.”
10 harrison, burress, velasquez, and schreiner, “social media use,” 253.
11 chatten and roughley, “developing social media.”
12 peter fernandez, “‘through the looking glass: envisioning new library technologies’ social media trends that inform emerging technologies,” library hi tech news 33, no. 2 (2016), https://doi.org/10.1108/lhtn-01-2016-0004.
13 robin m. hastings, microblogging and lifestreaming in libraries (new york: neal-schuman publishers, 2010).
14 hastings, microblogging.
15 robert david jenkins, “how are u.s. startups using instagram? an application of taylor’s six-segment message strategy wheel and analysis of image features, functions, and appeals” (ma thesis, brigham young university, 2018), https://scholarsarchive.byu.edu/etd/6721.
16 lucy hitz, “instagram impressions, reach, and other metrics you might be confused about,” sprout blog, january 22, 2020, https://sproutsocial.com/insights/instagram-impressions/.
17 vassilakaki and garoufallou, “the impact of twitter.”
18 mark aaron polger and karen okamoto, “who’s spinning the library? responsibilities of academic librarians who promote,” library management 34, no. 3 (2013), https://doi.org/10.1108/01435121311310914.
19 yuhen hu, lydia manikonda, and subbarao kambhampati, “what we instagram: a first analysis of instagram photo content and user types,” eighth international aaai conference on weblogs and social media (2014), https://www.aaai.org/ocs/index.php/icwsm/icwsm14/paper/viewpaper/8118; jenkins, “how are u.s. startups using instagram?;” brian j. mcnely, “shaping organizational image-power through images: case histories of instagram,” proceedings of the 2012 ieee international professional communication conference, piscataway, nj (2012), https://doi.org/10.1109/ipcc.2012.6408624; emma stuart, david stuart, and mike thelwall, “an investigation of the online presence of uk universities on instagram,” online information review 41, no. 5 (2017): 584, https://doi.org/10.1108/oir-02-2016-0057.
20 stuart, stuart, and thelwall, “an investigation of the online presence;” mcnely, “shaping organizational image-power,” 3.
21 stuart, stuart, and thelwall, “an investigation of the online presence.”
22 stuart, stuart, and thelwall, “an investigation of the online presence,” 588.
23 stuart, stuart, and thelwall, “an investigation of the online presence,” 585.
24 “university of idaho’s peer institutions,” university of idaho, accessed october 8, 2019.
25 stuart, stuart, and thelwall, “an investigation of the online presence,” 588.
26 mcnely, “shaping organizational image-power,” 4; stuart, stuart, and thelwall, “an investigation of the online presence,” 588.
27 johnny saldaña, the coding manual for qualitative researchers (los angeles: sage publications, 2013), 27.
28 “fleiss’ kappa,” wikipedia, https://en.wikipedia.org/wiki/fleiss%27_kappa.
29 chatten and roughley, “developing social media.”
30 stuart, stuart, and thelwall, “an investigation of the online presence,” 590.
31 stuart, stuart, and thelwall, “an investigation of the online presence,” 590.
32 hitz, “instagram impressions, reach, and other metrics.”
33 stuart, stuart, and thelwall, “an investigation of the online presence,” 590.
34 stuart, stuart, and thelwall, “an investigation of the online presence,” 588.
35 stuart, stuart, and thelwall, “an investigation of the online presence.”
articles
user experience with a new public interface for an integrated library system
kelly blessinger and david comeaux
information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11607
kelly blessinger (kblessi@lsu.edu) is head of access services, louisiana state university. david comeaux (davidcomeaux@lsu.edu) is systems and discovery librarian, louisiana state university.
abstract
the purpose of this study was to understand the viewpoints and attitudes of researchers at louisiana state university toward enterprise, the new public search interface from sirsidynix. fifteen university constituents participated in user studies to provide feedback while completing common research tasks. of particular interest to the librarian observers was identifying and characterizing the problems participants expressed as they used the new interface. this study was approached within the framework of cognitive load theory and user experience (ux). problems that were discovered are discussed along with remedies, in addition to areas for further study.
introduction
the library catalog serves as a gateway for researchers at louisiana state university (lsu) to access the print and electronic resources available through the library. in 2018 lsu, in collaboration with our academic library consortium (louis: the louisiana library network), upgraded to a new library catalog interface. this system, called enterprise, was developed by sirsidynix, which also provides symphony, an integrated library system (ils) long used by the lsu libraries. “sirsidynix and innovative interfaces are the two largest companies competing in the ils arena that have not been absorbed by one of the top-level industry players.”1 there were several reasons for the change. most importantly, sirsidynix made the decision to discontinue updates to the previous online public access catalog (opac), known as e-library, and to focus development on enterprise. in response to this announcement, the louis consortium chose to sunset the e-library opac in the summer of 2018. this was welcome news to many, especially the systems librarian, who had felt frustrated by the antiquated interface of the old opac as well as its limited potential for customization. the newer interface has a more modern design and includes features such as faceted browsing to better suit the twenty-first-century user. enterprise also delivers better keyword searching. this is largely because it uses the solr search platform, which operates on an inverted index. solr (pronounced “solar”) is based on open source indexing technology and is customizable, more flexible, and usually provides more satisfactory results to common searches than our previous catalog. inverted indexing can be conceptualized similarly to indexes within books. “instead of scanning the entire collection, the text is preprocessed and all unique terms are identified. this list of unique terms is referred to as the index. for each term, a list of documents that contain the term is also stored.”2 unlike the old catalog, which sorted results by date (newest to oldest), enterprise ranks results by relevance, like search engines. the new search is also faster because results are matched against the inverted index instead of whole records.3
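to make the inverted-index idea concrete, here is a minimal sketch in python. it is purely illustrative and uses a toy set of invented catalog records; it is not how solr or enterprise is implemented, but it shows the basic mechanism the quotation above describes: preprocess the collection once, map each unique term to the records containing it, and answer a keyword query by combining those lists instead of scanning every record.

```python
from collections import defaultdict

# a few toy catalog records; titles are invented for illustration
records = {
    1: "introduction to information retrieval",
    2: "library catalogs and discovery systems",
    3: "information literacy in academic libraries",
}

# build the inverted index: each unique term -> set of record ids containing it
index = defaultdict(set)
for record_id, text in records.items():
    for term in text.lower().split():
        index[term].add(record_id)

def search(query):
    """return the ids of records that contain every term in the query."""
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

print(search("information libraries"))  # {3}
```

a production engine such as solr layers text analysis (stemming, stop words), relevance scoring, and faceting on top of this same underlying structure.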
the authors wanted to investigate how well this new interface would meet users’ research needs. while library database and website usage patterns can be assessed through quantitative measures using web analytics, librarians are often unaware of the elements that cause frustration for users unless they are reported. prior to enterprise going live, the library’s head of access services solicited internal “power users” in the library to use the new interface. power users were identified as library personnel in units that used the catalog daily for their work. this group included interlibrary loan, circulation, and some participants in research and instruction services. these staff members were asked to use enterprise as their default search to help discover problems before it went live. a shared document was created in google drive for employees to leave feedback regarding their experiences and suggestions for improvements. the systems librarian had access to this folder, reviewed it periodically, and made warranted changes that were within his control. several changes were made based on feedback from the internal user group. these included adding the term “checked out” in addition to the due date, and the adjustment of features that were not available or working correctly in the advanced search mode, such as date, series, call number, and isbn. several employees were curious about differences between the results in the old system and enterprise due to the new algorithm. additionally, most internal users found the basic search too simplistic and not useful, so the advanced search mode was made the default search. among the suggestions, there was also praise for the new interface. these statements concerned elements of the user-enablement tools, such as “i was able to login using my patron information. i really like the way that part functions,” and areas where additional information was now available, such as “i do enjoy that it shows the table of contents—certainly helps with checking citations for ill.” while the feedback from internal stakeholders was helpful, the authors were determined to gather feedback from patrons as well. to obtain this feedback, the authors elected to conduct usability studies. usability testing employs representative users to complete typical tasks while observers watch, facilitate, and take notes.
the goal of this type of study is to collect both qualitative and quantitative data to explore problem areas and to gauge user satisfaction.4 enterprise includes an integration with ebsco discovery service (eds) to display results from the electronic databases subscribed to by the library as well as the library’s holdings. eds was implemented several years ago as a separate tool. the implementation team suspected that launching enterprise with periodical article search functionality might be confusing to those who were not accustomed to the catalog operating in this manner. therefore, for the initial roll-out, the discovery functionality was disabled in enterprise, leaving it to function strictly as a catalog to library resources. this decision will be revisited later. like many other academic libraries, eds is currently the default search for users from the lsu libraries homepage. other search interfaces, labeled “catalog,” “databases,” and “e-journals,” are also included as options in a tabbed search box. conceptual framework two schools of thought helped to frame this research inquiry: cognitive load theory and user experience (ux). cognitive load theory relates to the amount of new information a novice learner can take on at a given time due to limitations of the working memory. this theory originated in the field of instructional design in the late 1980s.5 the theory states that what separates novice learners from experts is that the latter know the background, or are familiar with the schema of a information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 3 problem, whereas novices start without this information. accordingly, experts are able to categorize and work with problems as they are presented, whereas new learners need to formulate their problem-solving strategies while encountering new knowledge. as a result, novices are quicker to max out the cognitive load in their working memories while trying to solve problems. ux emerged from the field of computer science and measures the satisfaction of users with their experience with a product. a 2010 article reviewed materials published in 65 separate studies with “cognitive load” in the title or abstract.6 early articles on cognitive load focused on learning and instructional development. these studies concentrated on limiting extraneous information (e.g. , materials and learning concepts), which affects the amount of information able to be held in the working memory.7 while the research that developed cognitive load theory centered on real-life problemsolving scenarios, later research focused on its impact in e-learning environments and learning regarding this delivery mode.8 in contrast to cognitive load theory, which was formed by academic study, the concept of ux was developed in response to user/customer satisfaction, particularly regarding electronic resources such as websites.9 user testing allows end users to provide realtime feedback to developers so they see the product working, and in particular, to note where it could be improved. ux correlates well with cognitive load theory for this study, as the concept arose with the widespread use of computers in the workplace and in homes in the mid-1990s. 
user studies user expectations have shifted beyond the legacy or “classic” opac, originally designed for use by experienced researchers with the primary goal of searching for known items.10 user feedback has historically been sought when libraries release new platforms and services and to gauge user satisfaction regarding research tools. “libraries seek fully web-based products without compromising the rich functionality and efficiencies embodied in legacy platforms.”11 a study by borbely used a combination of the log files of opac searches and a satisfaction questionnaire to determine which factors were most important to both professional and nonprofessional users. their findings indicated that task effectiveness, defined as the system returning relevant results, was the primary factor related to user satisfaction.12 many of the articles dealing with user studies and library holdings published in recent years have focused on next-generation catalogs (ngcs). this was defined in a 2011 study by 12 characteristics: “a single point of entry for all library resources, state of the art web interface, enriched content, faceted navigation, simple keyword search box with a link to advanced search box on every page, relevancy based on circulation statistics and number of copies, ‘did you mean . . .’ spell checking recommendations/related materials based on transaction logs, user contributions (tagging and ranking), rss feeds, integration with social networking sites, and persistent links.”13 catalogs defined as next-generation provide more options and functionality in a user-friendly, intuitive format. they are typically designed to more closely mimic web search engines, with which novice users are already familiar. tools within ngcs such as the faceted browsing of results have been reported as popular in user studies, especially among searchers without high levels of previous search experience. “faceted browsing offers the user relevant subcategories by which they can see an overview of results, then narrow their list.”14 a 2015 study interviewed 18 academic librarians and users to seek their feedback regarding new features made possible by ngcs. their findings indicate “that while the next-generation catalogue interfaces and information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 4 features are useful, they are not as ‘intuitive’ as some of the literature suggests, regardless of the users’ searching skills.”15 this study also indicated that users typically use the library catalog in combination with other tools such as google scholar or worldcat local both for ease of use and for a more complete review of the literature. while enterprise contains the twelve elements that yang and hofmann defined for a ngc, since the discovery element has been disabled, lsu libraries use of enterprise may be better described as an ils opac with faceted results navigation. while the implementation of discovery services and other web tools has shifted users to sources other than the catalog, many searchers often still prefer to use the library’s catalog. reasons for this may include familiarity with the interface, the ability to limit results to smaller numbers, or the unavailability of specific desired search options through other interfaces. problem statement the purpose of this study was to understand the viewpoints and attitudes of university stakeholders regarding a new interface to the online catalog. in particular, four areas were investigated: 1. 
identification of problems searching for books on general and distinct topics. 2. identification of problems searching for books with known titles and specific journal titles. 3. exploration of the usability of patron-empowerment features. 4. identification of other issues and/or frustrations (e.g., “pain points”). methodology three groups of users were identified for this study: undergraduate students, graduate students, and staff/faculty. the student participants were the easiest to recruit due to a fine forgiveness program that was initiated at lsu libraries in 2016. this program gives library users the option of completing user testing in lieu of some or all of their library fines (up to $10 per user test). all the student participants were recruited in this manner. additionally, five faculty/staff members identified as frequent library users were asked to participate. the participant pool for user testing included five undergraduate students, five graduate students, and five faculty and staff members. each of these groups had five participants, which is considered a best practice in user testing.16 the total sample studied for this study was 15 library users representing these three unique user groups. these participants are described in more detail in appendix a. for the observations, individuals were brought to the testing room in the library. this is a small neutral conference room with a table, laptop, and chairs for the librarian observers and participants. each participant was tested individually and was asked to speak aloud through their thought process as they used the new interface. the authors employed a technique known as “rapid iterative testing.” this type of testing involves updating the interface soon after problems are identified by a user or observer. thus, after each user test, confusing and extraneous information was removed applying cognitive load theory, improving the interface in alignment with the concept of ux. this approach helped to minimize the number of times participants repeatedly encountered the same issues. this framework makes this study more of a working study than a typical user study. a demonstration of this type of testing is included as figure 1. information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 5 before after figure 1. iterative testing model. this shows the logon screen before and after it was modified. this was based on the observation that users were unsure what information to enter here. the software usability studio was utilized to record audio and video of the participants’ electronic movements throughout each test. although the software can also record video of users throughout tests, the authors felt that this may make the participants uncomfortable and possibly more reluctant to openly share their opinions. at the beginning of each user study, participants were informed of the purpose of the study, the process, and the estimated time of the observation (30 to 45 minutes). the participants were then asked to sign a consent form for participation in the study. the interviews began with two open-ended pre-observation questions to gauge the users’ previous library experience. the first question asked students whether they had received library training in any of their courses, or, if the participant was a teaching staff or faculty member, if they regularly arranged library instruction for their students. 
the second question explored whether they had needed to use library resources in a previous assignment or had required these in one they had assigned. then volunteers were given a written list of four multi-part task-based exercises, detailed in appendix b. these exercises were designed to evaluate the areas of concern outlined in the problem statement and to let the users explore the system, helping the observers discover unforeseen issues. the observations ended with two follow-up questions that asked the participants to describe their experience with the new interface. they were asked what they liked and what they found frustrating. they were also asked if there were areas where they felt they needed more help, and how the design could be made more user friendly. after the testing was completed, the audio files were imported into temi, a transcription tool that provided the text of what the users and the observers said throughout the test periods. the authors reviewed these transcripts and the recorded videos of the users’ keystrokes within the system for further clarity. the process and all instruments involved were reviewed by the lsu institutional review board prior to the testing. all user tests took place from march through november 2018. findings previous library training and assignments three of the five undergraduate participants had received some previous library training from their professors or from a librarian who visited their classes. those who had training tended to recall specific databases they found useful, such as cq researcher or jstor. the assignments requiring library research mentioned by the undergraduate participants typically required several scholarly resources as a component of an assignment. four out of five of the graduate-student participants also indicated that they had some library training, and most also indicated they used the library frequently. two of the graduate-student participants, both studying music, mentioned that a research course was a required component of their degree program. the staff and faculty members tested mentioned that, depending on the format of the course, they either demonstrated resources to their students themselves, or would request a librarian to teach more extensive training. participant 13, a teaching staff member, mentioned that in a previous class she was “able to get our subject librarian to provide things to cater to the students, and they had research groups, so she [the subject librarian] was very helpful.” some of the teaching staff and faculty mentioned providing specific scholarly resources for their students. they acknowledged that since these were provided, their students did not gain hands-on experience finding scholarly materials themselves. participant 10, a faculty member, stated that she usually requires that students in one of her courses “do an annotated bibliography. i’ll require that they find six to ten sources in the library and usually require that at least three or four of those sources be on the shelf physically, because i want them to actually work with the books, and in addition, to avail themselves of electronic resources.” most of the staff and faculty participants indicated that, despite its weaknesses, they preferred using the online catalog over eds, mainly because eds included materials outside of our collection.
when asked to explain, participant 12, a staff member said, because i feel like [with] ebsco you get a ton of results, and you know, i’m still looking for stuff that you guys have. um, [however] because of the way the original catalog is, i feel like i have to go through discovery to get a pretty accurate search on what lsu has. because, when i do use the discovery search, it’s a lot more sensitive, or should i say maybe a lot less sensitive, and it will pick up a lot of results. . . . it searches your catalog really well, just like worldcat does. . . . so, if the catalog was the thing that was able to do that, that would be cool. if the catalog search was more intuitive and inviting, i wouldn’t even bother going to some of these other places. books: general and distinct topics the observers noticed multiple participants using or commenting on known techniques learned from experience with the old catalog interface. these included boolean operators such as and to connect terms within the results. enterprise does not include boolean logic in its searches. a goal of the structure for the new algorithm is to provide a search closer to natural language. while most of the student participants typically searched by keyword when searching for books on general topics, staff and faculty participants typically preferred to search within the subject or title fields. faculty and staff participants also actively utilized the library of congress subject heading links within records and said that they also recommended that their students find materials in this manner. participant 9, a faculty member, said that he usually told his students to “find one book, then go to the catalog record . . . where you’ll get the subject headings. because . . . you’re not going information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 7 to guess the subject headings just off the top of your head, and that's how they are organized. that's the best way of getting . . . your real list.” many users were able to deduce that a book was available in print based on the call number listed. however, some undergraduate-student participants were confused by links in individual records and assumed that these were links to electronic versions of the book. these primarily linked to additional information about the book, such as the table of contents. the observers also found that many students used the publication date limiter when searching for materials, while commenting that they typically favored recent publications, unless the subject matter made a historical perspective beneficial. the date limiter, while effective for books, is less effective for periodicals, which include a range of publication dates. more advanced researchers, such as one staff participant, enjoyed the option to sort their results by date, but sorted these by “oldest first” indicating that they did this to find primary documents. known title books none of the user groups tested had trouble locating a known book title within the new interface, although two undergraduate students remarked that they preferred to search by the author, if known, to narrow the results. most undergraduates determined if the books were relevant to their needs based on the title and did not explore the tables of contents or further information available. graduate students tended to be more sophisticated in their searches for relevance and used the additional resources available in the records. 
participant 12, a staff member, mentioned that he liked the new display of the results when he was searching for a specific book. while the old system contained brief title information in the results display, he believed the new version showed more pertinent information, such as the call number, in the initial results. he said, “and this is also great too, because the old system . . . you would bring up information, then there’s another tab you have to click on to get to the . . . real meat of it. so . . . this is really good to see if it’s a book, to know what the number is immediately, just to not have to go through so many clicks.” specific journals specific journal results were problematic and confusing in multiple ways. the task regarding journals directed users to find a specific journal title within enterprise, and then to determine whether 2018 issues were available and in what format (e.g., print or electronic). all the student users had trouble determining whether a journal was available in print or electronically and whether the year they needed was available. the task of finding a specific journal title and its available date range was also troublesome to many students. the catalog lists “publication dates” for journals prominently in the search results. however, these dates indicate the years that a journal was published, not the years that the library holds. users need to go into the full record for a journal to see the available years listed under the holding information. unfortunately, this was not intuitive to many. additionally, the presentation of years in records for journals was also unclear to some. for instance, participant 2, a freshman, did not understand that a dash next to an initial date (e.g., 2000–) indicated that the library held issues from 2000 to the present. many student users, especially those familiar with google scholar or eds, did not understand that journals are solely indexed in the catalog by the title of the journal. this is problematic for those who are accustomed to more access points for journals, such as article title and author. journals were additionally confusing because each format (e.g., print, electronic, or microfilm) has its own record in the catalog. typically, users clicked on the first record with the title that matched the one they required, and assumed that this was all that was available, rather than scrolling through the results to view the different formats and the varying years within. this issue was problematic for all the student participants. participant 5, a phd student, summed up this frustration by saying, “i find stuff like that sometimes when i’m looking for other things. like, it shows the information [for the journal] but great, awesome. i found that [the journal] is maybe, hopefully somewhere, but sometimes you click on whatever link they have, and it goes to this one thing and there’s like one of the volumes available. so, this is not useful.” in the past, records were cataloged with all the holding information in one record, which was better functionality for end users, but this practice was changed in the mid- to late 2000s because updating journal holdings was a manual process completed by technical services staff. this timeframe was when the influx of electronic journal records began to steadily increase, making this workflow too cumbersome.
due to all these known issues with journals, when asked to search for a specific journal in the catalog, several advanced searchers (graduate students, staff, and faculty) indicated they would not use the catalog to find journals. several stated other sources they preferred to use, whether google scholar, interlibrary loan, or subject-specific databases in their fields. after fumbling around with the catalog, participant 12, a staff member, summed this up by saying, “i guess if i was looking for a journal, i would just go back to the main page, and go from there [from the e-journals option]. i haven’t really searched for journals from the catalog. the catalog is usually my last [resort], especially for something like a journal.” usability of patron-empowerment features many participants were confused by the login required to engage with the patron-enablement tools prior to the iterative changes demonstrated in figure 1. once changes were made clarifying the required login information, patrons were able to use the patron-enablement tools well, placing holds and selecting the option to send texts regarding overdue materials. however, few undergraduate participants intuitively understood the functionality of the checkboxes next to records to retrieve items later. some participants assumed that they needed to be logged into the system to use this functionality, similar to eds. participant 1, a senior, said that she used a different method for retrieving items later, stating “normally, i'm going to be honest, if i needed the actual title, i’d put it in a word document on my computer. i wouldn’t do it on the library website.” another graduate student, participant 14, stated that while he was aware of the purpose of the checkboxes, he would not use them because the catalog would not be the only resource he would be using. he said that his preference was to “keep a reference list [in word] for every project. and then this reference list will ultimately become the reference for the work done.” participants in every category noted that they did not usually create lists in the catalog to refer to them later. there was enthusiasm regarding the new option to text records, with participant 6, a staff member, going so far as to say “boy, this is gonna make me very annoying to my friends” and staff participant 12 stating “that’s a really cool feature. i think that’s more helpful than this email to yourself.” unfortunately, there were several issues discovered regarding the text functionality. the first issue was that it was not reliably working with all carriers. once that was resolved, the systems librarian removed extraneous information regarding the functionality. this included text that “standard rates apply” and a requirement for users to choose their phone carrier before a text could be sent. these were both deemed unnecessary as it was assumed that users would information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 9 know whether they were charged for receiving text messages. additionally, one of the international graduate student participants did not understand the connection between texting information and the tool to do this, which was titled “sms notification.” while the texts were successful while the users were performing the studies, it was discovered later that the texts did not include the call numbers for items. 
after discussion regarding this problem arose at a louis conference, the decision was made to hide texting as an option until the system was able to properly text all the necessary information. when sirsidynix can fix this issue, the language around this will likely be made more intuitive by labeling it “text notifications” instead of sms. other issues and frustrations the researchers noticed that some options were causing confusion, such as the format limiter. under this drop-down option, several areas were displayed that did not align to known formats in the lsu collections, such as “continuing resources.” to remedy this, all the formats that could not be identified were removed as options. another confusing element was the way that records were displaying in initial user tests. some marc cataloging information was visible to users, so the systems librarian modified the full record display to hide this information. originally, the option to see this information was moved to the side under an option to “view marc record.” however, since this still seemed to confuse users, this button was changed to “librarian view.” undergraduate-student users reported confusion when they needed to navigate out of the catalog into a new, unfamiliar database interface to obtain an electronic article. participant 3, a senior, described her feelings when this happened, that she felt like she was “not in the hands of lsu anymore. i’m with these people, and i don’t know how to work this for sure.” another undergraduate user gave the suggestion that the system provide a warning when this occurs, so users knew that they would be navigating in a new system. since so many of the records link to databases and other non-catalog resources, this was not pursued. several undergraduate-student users mentioned that they didn’t understand the physical layout of the library, and that they used workarounds to get the materials they needed rather than navigate the library. for example, some were using the “hold” option in the catalog to have staff pull materials for them for reasons not initially intended by the library. rather than using this feature for convenience, they stated they were using it due to a lack of awareness of the layout of the library or the call number system. one user, participant 4, a sophomore, used the hold feature to determine whether a book was in print or electronic. when she clicked on the “hold” button in a record and it was successful, she said “okay, so i can place a hold on it, so i’m assuming there is copy here.” follow-up questions feedback to the new interface was primarily positive. several participants mentioned that the search limiters were now more clearly presented as choices from drop-down boxes. additionally, result lists are now categorized by facets such as format and location, which users had options to “include” or “exclude” at their discretion. participant 9, a faculty member, particularly liked the new library of congress subject facet from within the search results. she mentioned that these were available in the past interface, but the process to get to them was much more cumbersome. she regarded this new capability as a “game changer” and “something she hadn’t even dreamed of.” experienced searchers, such as participant 6, a staff member, noticed and appreciated the improvements in search results made possible by the new algorithm. 
she said, “it’s very easy to information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 10 look at, especially compared to the old database, and the keyword searching is a lot better.” after conducting a brief search, another staff member, participant 12, mentioned that he thought the results returned by an author search were much more relevant than in the past. he said, “sometimes with the older system even name searches can be sort of awful. i mean . . . this is a lot better. . . . if you type in benjamin franklin, for me at least, it’s difficult to get to the things he actually wrote. you know, you can find a lot of books about [them], and so you kind of have to filter until you can find . . . the subject.” figure 2. catalog disclaimer. the new search is also more forgiving of misspellings than the old version, which responded with “this search returned no results” when an item was misspelled. those who were very familiar with the old interface, such as staff and faculty, were particularly excited by small changes. these included being able to use a browser’s back button instead of the navigation internal to the system, or the addition of the isbn field to the primary search. prior to the new interface, the system would give an error message when a user attempted to use a browser’s back button instead of the internal navigation. additionally, users mentioned that they liked that features were similar to the previous interface with additional options. an example of a new feature is the system employing fuzzy logic to provide “did you mean” and autocomplete suggestions when users start typing in titles they are interested in, similar to what google provides. this same logic also returns results with related terms, eliminating the need for truncation tools.17 one graduate student, however, particularly mentioned missing boolean operators; they thought they were helpful because students had been taught these and were familiar with them. due to this comment, and other differences between the old and new interfaces, a disclaimer was added to information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 11 make users aware of these changes (see figure 2). two of the five undergraduate participants and two faculty participants noted if they didn’t understand something, or needed help, they would ask a person at a service desk for assistance. one of the staff members mentioned she would use the virtual chat if she had a question regarding the catalog. she also suggested that a question mark symbol providing additional information when clicked might be helpful if users got confused in the system. to allow users to provide continued feedback, the systems librarian created a web form for users to ask questions or report errors regarding the system. discussion study participants requested several updates; unfortunately, some of the recommendations suggested were in areas where the systems librarian had little control to make changes. since the initial advanced search was not easy to customize, the systems librarian created a custom new advanced search that more closely fit the needs of catalog users than the built-in search. one limitation of the default advanced search that several participants and staff users noted was the inability to limit by publication date. 
to work around this problem, one of the features the systems librarian implemented in the custom search was a date-range limiter. while still falling short of the patrons’ desired outcome of inputting a precise date to limit by, the date range feature was still a step forward. he was also able to make stylistic changes, such as bolding fields or making buttons appear in bright colors to make them more visible. other changes included eliminating confusing options and reordering the way full records appeared. this included moving the call number to a more visible area than where it was originally located. after a staff participant suggested it, he was also able to make the author name a hyperlinked field. now users can click on an author’s name to see what other books are available by that author within the library. the systems librarian was also able to make the functionality of the checkboxes more intuitive by adding a “select an action” option at the top of the list of results, which more clearly indicated what could be done with the checked options. these include being added to a list, printed, emailed, or placed on hold. the username and pin required to engage with the user-enablement tools was continually problematic, and not intuitive. only one of the student participants knew their login information, a graduate student close to graduation. the user name is a unique nine-digit lsu id number, which students, faculty, and staff don’t often use. the pin is system generated, so there is no way users could intuit what their pin is. once the user selects the “i forgot my pin” option however, the pin is sent to them, and then they have the option to change it to something they prefer. this setup is not ideal, especially since many other resources on campus are accessed through a more familiar text-based login and password. the addition of “i forgot my pin” to this part of the interface helps by anticipating and assisting with this problem by providing an example with the nine-digit id number, but this can also be overlooked. for this reason and for other security reasons related to paying fines, the library is exploring options to provide a single-sign-in login mechanism. the lack of knowledge regarding the physical layout of the library cannot be solely blamed on the users. in 2014, the lsu libraries made several changes to middleton library, the main campus library. the first was the closing of a once distinct collection, education resources, whose titles were merged with the regular collections. the second was weeding a large number of materials on the library’s third floor to facilitate the creation of a math lab. the resulting shifting of the collection had a direct impact on how patrons were able to locate materials within the library. due to required deadlines, access services staff needed to place books in groupings out of their typical information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 12 call number locations. the department is still working to remedy this five years later. in addition, service points previously manned to assist with wayfinding on the third and fourth floors were closed. conclusion to optimize whatever resources the library provides, user feedback is a useful tool. however, there are limitations to this study, the most obvious being that the data collection took place and was based on researchers at one institution. this could limit the applicability of the study’s results. 
the second is regarding the sampling method for user tests. student users self-selected by volunteering in lieu of user fines, while staff and faculty identified as frequent library users were purposively selected. less-experienced users may have encountered different issues as they navigated the system. most of the student participants indicated they had received library training in some manner, and that they had been required to use library resources in the past to complete assignments. only a small number of participants were undergraduates without library training. the authors noted that the student participants who had received library training were more likely to attempt complicated searches and to explore advanced features. however, they also tended to try to conduct searches that were optimized for the previous catalog, such as using boolean logic. those with library training were also more likely to identify problematic areas, such as searching for journals, and to develop workarounds to get the materials they desired. the two graduate students in music, who were required to take a research course, both indicated how helpful this knowledge was to conducting research in their field. the user tests in this study demonstrated which information points the users at lsu found to be the most relevant, which allowed the systems librarian to redesign a search that better fit their needs. this included hiding or separating extraneous information, such as additional information regarding texting, and making changes so all marc coding only appeared under the newly created “librarian view.” while this study demonstrated that the advanced researcher participants created workarounds regarding journal searches, undergraduate participants also created workarounds (such as placing holds) to accommodate their lack of knowledge regarding the library system and the physical library. several of the undergraduate participants reported having anxiety regarding their ability to navigate systems when the catalog linked to databases with interfaces new to them. the authors found that more advanced researchers appreciated having more data in catalog records, such as information on publishers and library of congress subject headings. students without as much exposure to library resources tended to prefer to conduct keyword searches and were more likely to judge the relevance of a record based mainly on the title or year of publication. most of the staff and faculty participants in this study indicated that they preferred to use the opac over eds. less-seasoned researchers tended to prefer ease and convenience over additional control and functionality. these kinds of generalizations could be tested by additional studies at other universities. the new user-empowerment features were received positively, especially the new “text notifications” feature. most participants indicated that they found it easy to renew items within the interface. however, the authors discovered that few patrons indicated they would use “my lists” to capture records they would like to retrieve later. the user tests highlighted how many problems lsu library users were having signing on to the system to utilize the user-enablement tools. it is hoped that the upcoming change to a single sign-on will alleviate these issues and the users’ frustrations.
the systems librarian would like to incorporate other changes, such as the information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 13 request to return to the same spot on a list after going into a full record rather than returning to the top of the list. he is also planning on programming the mobile view for the catalog soon. currently the mobile site is still linking to the desktop version. he has also reintroduced the option to conduct a boolean search by linking to the old catalog due to so many users being familiar with it. the text messaging is expected to be corrected in an upcoming upgrade. overall, response from the participants in this study was positive, especially regarding the new algorithm. they also appreciated the familiarity of the design with the previous catalog interface, with additional features and functionality. regardless of the limitations in this study, some of this study’s findings reaffirm those from previous user studies. these include researchers indicating a need to consult multiple resources either in combination with or in exclusion of the catalog, and ngcs not being as intuitive as expected. the need to consult multiple resources particularly correlated with this study’s findings regarding journals. librarians were aware that searching journals in the online catalog was tricky for users due to multiple issues. many of the experienced participants in this study mentioned that they appreciated the new algorithm because it provided more accurate results. this reaffirmed results from the borbely study, which indicated that task effectiveness, or the system returning relevant results, as the primary factor related to user satisfaction. also similar to findings from the literature, users appreciated the newly available faceted-browsing features. dissimilar to a previous study however, it was the advanced searchers, rather than the novices, who mentioned these specifically as an improvement.18 the authors noted that it was common for undergraduate library participants to express confusion regarding navigating the physical library, so the library has taken several steps to remedy this. since this user testing was completed, a successful grant was written to provide new digital signage to replace outdated signage. this digital signage will be much more flexible and easier to update than the older fixed signage. additionally, this grant provided a three-year license to the stackmaps software. this software has since been integrated into the catalog and eds tool to direct users to physical locations within the libraries. additionally, the access services department updated physical bookmarks that display the call number ranges and resources available on each floor. these are now available at all the library’s public service desks. the library will also continue providing the popular “hold” services for patrons. this is a relatively new service, which was started to offset confusion and to assist patrons during the construction they may have encountered during the changes to the library. finally, since the fine forgiveness program has been so fruitful regarding recruitment for user studies, the special collections library also anticipates providing user studies in lieu of photocopying costs in the future. future research these user tests made it obvious that finding specific journal information through the catalog was difficult for most users. 
this is an area that needs remediation, and the systems librarian plans to conduct further user testing to explore avenues to make searching for journal holdings more efficient. another potential area for further study includes assessing enterprise’s integration of article records. as previously mentioned, enterprise can be configured to include article-level records into its display. however, this functionality would duplicate an existing feature of our main search tab, an implementation of eds that we have labeled “discovery.” while the implementation team felt that duplicating this functionality on a search tab labeled “catalog” might initially confuse users, replacing our current default search tab with enterprise warrants serious consideration. an additional area to explore is a rethinking of the tabbed search box design. while the tabbed design remains popular in libraries, a trend toward a single search box on the library homepage has been observed in academic libraries.19 a future study with an emphasis on determining the best presentation of the various search interfaces, including either a reshuffling of available tabs or a move to a single search box, is planned in the foreseeable future.
appendix a: study participants
participant | status | year | major | date tested
1 | undergrad | senior | international studies & psychology | 3/23/2018
2 | undergrad | freshman | mass communication | 4/6/2018
3 | undergrad | senior | child and family studies | 4/13/2018
4 | undergrad | sophomore | pre-psychology | 4/24/2018
5 | graduate | phd | music | 4/26/2018
6 | staff | n/a | english | 5/3/2018
7 | graduate | masters | music | 5/3/2018
8 | graduate | phd | french | 5/4/2018
9 | faculty | n/a | history | 5/7/2018
10 | faculty | n/a | english | 5/8/2018
11 | undergrad | junior | accounting | 6/1/2018
12 | staff | n/a | history | 6/4/2018
13 | staff | n/a | mass communication | 8/28/2018
14 | graduate | phd | curriculum and instruction | 10/3/2018
15 | graduate | phd | petroleum engineering | 10/2/2018
appendix b: user exercises worksheet
1) you need to do a research paper on gerrymandering and race. a) identify three books that you may want to use. b) how would you save these titles to refer to later?
2) you are looking for the book titled harriet tubman and the fight for freedom by lois e. horton. find out the following and write your answers below. a) does the library own this in print? b) what is the call number? c) if we have this book, go into the record, and text yourself the information. d) place a hold on this book.
3) you need an article from the journal of philosophy. do we have access to the 2018 issues? what type of access (e.g., print or electronic)?
4) log in to your personal account to see the following: a) what you have checked out currently; if you have materials out, try to renew an item. b) determine any fines you owe. c) add a text notification for overdue notices.
endnotes
1 marshall breeding, “library systems report 2018: new technologies enable an expanded vision of library services,” american libraries (may 1, 2018): 22–35.
2 david a. grossman and ophir frieder, information retrieval: algorithms and heuristics, 2nd ed.
(dordrecht, the netherlands: springer, 2004).
3 dikshant shahi, apache solr: a practical approach to enterprise search (berkeley, ca: springer ebooks, 2015), ebscohost.
4 “usability testing,” u.s. department of health and human services, accessed june 1, 2019, https://www.usability.gov/how-to-and-tools/methods/usability-testing.html.
5 john sweller, “cognitive load during problem solving: effects on learning,” cognitive science 12, no. 2 (1988): 257–85, https://doi.org/10.1207/s15516709cog1202_4.
6 nina hollender et al., “integrating cognitive load theory and concepts of human–computer interaction,” computers in human behavior 26, no. 6 (2010): 1278–88, https://doi.org/10.1016/j.chb.2010.05.031.
7 wolfgang schnotz and christian kürschner, “a reconsideration of cognitive load theory,” educational psychology review 19, no. 4 (2007): 469–508, https://doi.org/10.1007/s10648-007-9053-4.
8 jeroen j. g. van merriënboer and paul ayres, “research on cognitive load theory and its design implications for e-learning,” educational technology research and development 53, no. 3 (2005): 5–13, https://doi.org/10.1007/bf02504793.
9 ashok sivaji and soo shi tzuaan, “website user experience (ux) testing tool development using open source software (oss),” in 2012 southeast asian network of ergonomics societies conference (seanes), ed. halimahtun m. khalid et al. (langkawi, kedah, malaysia: ieee, 2012), 1–6, https://doi.org/10.1109/seanes.2012.6299576.
10 deeann allison, “information portals: the next generation catalog,” journal of web librarianship 4, no. 4 (2010): 375–89, https://doi.org/10.1080/19322909.2010.507972.
11 breeding, “library systems report 2018.”
12 maria borbely, “measuring user satisfaction with a library system according to iso/iec tr 9126‐4,” performance measurement and metrics 12, no. 3 (2011): 151–71, https://doi.org/10.1108/14678041111196640.
13 sharon q. yang and melissa a. hofmann, “next generation or current generation? a study of the opacs of 260 academic libraries in the usa and canada,” library hi tech 29, no. 2 (2011): 266–300, https://doi.org/10.1108/07378831111138170.
14 jody condit fagan, “usability studies of faceted browsing: a literature review,” information technology & libraries 29, no. 2 (2010): 58–66, https://doi.org/10.6017/ital.v29i2.3144.
15 hollie m. osborne and andrew cox, “an investigation into the perceptions of academic librarians and students towards next-generation opacs and their features,” program: electronic library and information systems 51, no. 4 (2015): 2163, https://doi.org/10.1108/prog-10-2013-0055.
16 “best practices for user centered design,” online computer library center (oclc), accessed june 7, 2019, https://www.oclc.org/content/dam/oclc/conferences/acrl_user_centered_design_best_practices.pdf.
17 “enterprise,” sirsidynix, accessed june 27, 2019, https://www.sirsidynix.com/enterprise/.
18 fagan, “usability studies.”
19 david j. comeaux, “web design trends in academic libraries—a longitudinal study,” journal of web librarianship 11, no. 1 (2017): 1–15, https://doi.org/10.1080/19322909.2016.1230031.
technology integration in storytime programs: provider perspectives
maria cahill, erin ingram, and soohyung joo
information technology and libraries | june 2023, https://doi.org/10.6017/ital.v42i2.15701
maria cahill (maria.cahill@uky.edu) is professor, university of kentucky. erin ingram (erin.ingram@chpl.org) is youth librarian, cincinnati and hamilton county public library. soohyung joo (soohyung.joo@uky.edu) is associate professor, university of kentucky. © 2023.
abstract technology use is widespread in the lives of children and families, and parents and caregivers express concern about children’s safety and development in relation to technology use. children’s librarians have a unique role to play in guiding the technology use of children and families, yet little is known about how public library programs facilitate children’s digital literacy. this study sought to uncover librarians’ purposes for using technology in programs with young children as well as the supporting factors and barriers they encountered in attempting to do so. findings reveal 10 purposes for integrating technology into public library storytime programs and 15 factors across four dimensions that facilitate and/or inhibit its inclusion. if librarians are to embrace the media mentor role with confidence and the necessary knowledge and skills required of the task, much greater attention should be devoted to the responsibility, and more support in the way of professional development and resources is necessary.
introduction technology use is widespread in the lives of children and families. from a very early age, children in highly developed countries across the world regularly interact with technology, and data from device trackers substantiate parental reports.1 nearly all families have access to one or more mobile devices, and nearly three-fourths of children in the united states begin some form of digital engagement, primarily television viewing, before age three.2 prior to formal schooling, children (ages two to four) in highly developed countries tend to use a device with a screen for about two and a half hours per day on average.3 differences in screen use by income level and race are significant, with children from lower-income families and children of color spending more time on electronic devices than children from higher-income families and children who are white.
though most parents do allow their children to use technology, many voice some concerns about their children’s well-being, particularly regarding privacy as well as the content of the media.4 yet, young children’s digital activity can be beneficial, particularly when the technology is designed to foster active, meaningful engagement and when it facilitates social interaction.5 in light of children’s usage and parents’ concerns, librarians in public libraries have a unique role to play in this information realm. not only can librarians provide access to technology and recommended resources but they can also provide guidance in how to use technology to contribute to children’s learning, especially in the areas of reading, information literacy, and academic concepts.6 yet, little is known about whether librarians actually facilitate children’s digital literacy through integration of technology into programs, and this dearth of empirical evidence is highlighted in the association of library services to children (alsc) research agenda.7 storytime, as a program attended by both children and caregivers, can be used as a time for children’s librarians to integrate technology for the purposes of modeling and explaining how various electronic tools might be beneficial for young children.8 due to this potential, it is important to understand how and why children’s librarians are—or are not—integrating technology into storytime programs. previous studies of technology use in children’s programs and storytimes internationally, there have been few investigations of technology integration within library programs for young children. within the united states, two survey studies, both commissioned by alsc, sought to capture the use of technology in youth programming.9 the initial survey launched in 2014 and the follow-up survey in 2018. respondents to these surveys reported that the types of devices used most often in libraries were proprietary institutional devices, digital tablets, tangible tech such as squishy circuits that allow children to build electrical circuits with play dough, and programmable tech such as cubetto, a wooden robot toy.10 additionally, more than half of respondents working in medium and large libraries and more than 45% of those working in small libraries indicated using digital devices during storytimes.11 conversely, a comprehensive study of programming for young children in public libraries, which included observations, concluded that, “while many libraries offer families a place to use computers and other digital resources together, few libraries actively promote the use of technology during their programming.”12 notably, neither the 2014 nor 2018 alsc survey included questions about the types of technology used in storytimes, nor were respondents asked to explain their thoughts on why or how technology was or was not included in storytime.13 a study conducted in aotearoa new zealand collected data about technology use in storytime in three phases: a survey of 25 children’s librarians, interviews with librarians in nine libraries, and a survey of 28 caregivers who attend a library storytime with a young child.14 slightly more than a quarter of the librarians responding to the survey reported incorporating digital technology such as tablets or e-books into storytime programs.
the most common rationale for technology use in storytime was to educate caregivers. other reasons included for the novelty of it and to promote accessibility and the aims of library services. interviewees explained that they used technology in storytime to show caregivers the availability of high-quality digital media such as e-books and educational apps, with one likening the use and recommendation of digital media to librarians’ traditional role as recommenders of storybooks (i.e., readers’ advisory services). conversely, one interviewee expressed reservations about using technology for fear that children would be distracted from the content of the story. the majority of caregiver respondents who had attended a storytime with digital technology reported enjoying the experience. however, those who had never attended a storytime with technology were apprehensive about doing so. technology best practices: joint media engagement and media mentorship recent scholarship encourages children’s librarians to use their expertise and experience to evaluate and recommend technology and new media resources as well as to model for adults how to interact with children as they use technology.15 for example, librarians can promote joint media engagement during storytimes both by modeling the practice and by directly explaining it to the adults in attendance. using technology during storytime can be seen as modeling modern literacy practices, just as reading print books has modeled literacy practices in traditional storytimes since the 1940s .16 information technology and libraries june 2023 technology integration in storytime programs 3 cahill, ingram, and joo alsc instructs youth services librarians to act as media mentors, a role that means they will assist caregivers in choosing and using technology by researching new technology and by modeling technology use, such as joint media engagement, for caregivers in programs such as storytimes.17 media mentorship is seen as an extension of how youth services librarians have traditionally been called upon to meet the needs of caregivers and children with their knowledge of child development and ability to facilitate caregivers’ information seeking.18 while alsc encourages media mentorship, the extent to which children’s librarians have embraced this role is unclear in professional research. findings from prior surveys and interviews with storytime providers suggest that librarians are regularly integrating technology into programs while observations of library programs suggest otherwise.19 further, goulding and colleagues found that while many librarians were comfortable recommending technology such as apps, it was unclear whether or not they were modeling its use during storytimes.20 study objectives the overarching research question of this study is “how do storytime providers view the integration of technology into storytime programs?” the following three research questions guide this study. 1. what are the purposes for using technology in storytimes? 2. what are factors associated with adopting technology in storytimes? 3. what are barriers to integrating technology in storytimes? 
method participants as part of a larger institute of museum and library services (imls)-funded, multistate study that was approved by the university of kentucky institutional review board (irb number 42829), researchers conducted semi-structured interviews with 34 library staff who facilitate storytime programs at public libraries serving urban, suburban, and rural communities across kentucky, ohio, and indiana.21 interviewees were not asked to identify their race or ethnicity. thirty-two identified as female and two as male. all but one of the participants (97%) had earned a college degree, but only 13 (38.2%) held a master’s degree from a library and information science (lis) program, while another two were enrolled in an lis master’s degree program when the interviews occurred. the majority of participants (57.1%) had five years or more of experience in children’s library services. the participants will be referred to as “storytime providers.” procedure the interviews were conducted by one member of the research team. other members of the team created written transcripts from recordings of the interviews. for the study reported in this paper, researchers focused on participants’ answers to the interview question “what place, if any, does technology or digital media have in a quality storytime program?” an open coding method was used to organize participants’ statements within three categories: purposes underlying technology use, factors associated with technology adoption, and barriers to technology integration. three researchers conducted open coding independently and came up with the initial set of coding results. then, the researchers discussed the coding results multiple times to assess the relevance of the coded constructs, refine operational definitions, and select one representative quote for each code. interviewees were assigned a number between 1 and 34 to eliminate identifying information. information technology and libraries june 2023 technology integration in storytime programs 4 cahill, ingram, and joo results what are the purposes for using technology in storytimes? to find answers to this research question, the researchers coded statements related to how or why interviewees used or wanted to use technology in storytime programs. we identified 10 specific purposes, formed operational definitions for each, and chose one representative quote (table 1). although most purposes had statements from more than one interviewee associated with them, we collaborated to choose one example due to space constraints. researchers determined that the purposes for technology use could be divided into two categories: experiential and learning. experiential purposes are those for which technology is used to create a positive, engaging experience for child and/or adult participants. learning purposes are those for which technology use is intended to help child and/or adult participants learn. what are factors associated with adopting technology in storytimes? to answer the second research question, researchers looked for statements explaining the reasons or causes for storytime providers using or wanting to use technology in their storytime programs. these would be factors that facilitate technology adoption. researchers coded statements independently and then discussed results multiple times to verify relevance and consolidate categories into 15 factors in four dimensions: storytime provider, participant, library system, and content. 
though many factors had more than one corresponding statement from participants, we chose one representative quote for each. results are presented in table 2. what are barriers to integrating technology in storytimes? to answer this question, researchers independently reviewed responses, looking for statements related to why storytime providers did not or did not wish to use technology during storytime. after individual coding, we collaborated to verify relevance, refine definitions of the 15 identified barriers, and choose representative quotes. the results are presented in table 3. researchers found that three of the dimensions created for factors that lead to techno logy adoption could also be applied to barriers to technology integration: storytime provider, participant, and library system. information technology and libraries june 2023 technology integration in storytime programs 5 cahill, ingram, and joo table 1. purposes for using technology in storytimes category purpose operational definition representative quote experiential accommodating large groups technology is used to enable a large group to view books/materials 2: “i had this huge group of kids. and i took them to our red room and did a story on our big screen. you know, through tumblebooks.” children’s enjoyment provider incorporates media or technology because children enjoy it 14: “and then as far as, um, sometimes, um, we’ll have, like, at the end of a storytime, we may have a little short, um, like nonfiction or sign language or if we were doing something on the alphabet, maybe i would throw in a little dvd and give them popcorn for the end of storytime and things like that and i think that they really enjoy that. it is important to integrate that in.” facilitating adult participation provider uses technology to display the words to songs to facilitate adult participation 12: “the closest thing i would say, i use a powerpoint that has the words on it for the parents to be able to follow along, um, or for the kids if they can pick out some of the letters or start to read, even some of the older ones.” facilitating movements technology is used to facilitate movements or dancing 19: “in addition to our singing, just to give, you know, to change it up a little bit. so, they can hear the music. we clap rhythms. so, we use that a lot.” playing songs/music technology is used to play songs or music 13: “we have a sound system that i love, with surround sound. we always do our last song with, you know, that, and i’ve been fortunate that it’s worked all the time.” information technology and libraries june 2023 technology integration in storytime programs 6 cahill, ingram, and joo category purpose operational definition representative quote sound effects technology is used to create a sound or voice 17: “one of the better things that i’ve done, that i like to do, is, i like to use animal sounds. i’ll research or pull up a list of sounds on youtube or whatever and have the kids listen to them. i think that’s always been a fun way to work in a little bit of technology without taking out all of the flow.” visual aids technology is used to support children’s visual experience 24: “and, like, it gives the kids a visual. 
and i feel like sometimes, if we could give them a better visual, they might be more engaged.” learning support for adult-child interaction technology is used to support adult-child interaction 1: “if you’re actually sitting down with your child, looking at it together, it’s a lot more effective and the child is getting a lot more out of it versus just sitting them in front of it and expecting to teach something to the child.” teaching caregivers technology is integrated to model for caregivers 11: “i think it’s important to share with parents really good e-resources, such as, like, apps. and books and stuff. so, that, i think it’s very important…. i have, like, when i have like a screen, a projector screen, maybe when the book i picked for the storytime was an e-book that they could get through the library, and kind of, you know, advertise that resource, and then we would, we would read the e-book, you know, from the projector. so i’ve done, like, e-books and stuff.” teaching concepts technology is used to present letters, words, numbers, shapes, sign language, colors, or coding skills, to children 22: “…. all these different color songs, um, and they’re actually just on youtube…. so that is one way that we’ve been incorporating technology, um, is with those color songs because it spells it out for them. they can see the word, it’s a familiar tune, and it helps them, you know, at least be able to sing, sing the song.” information technology and libraries june 2023 technology integration in storytime programs 7 cahill, ingram, and joo table 2. factors associated with adopting technology in storytimes dimension factor operational definition representative quotes storytime provider awareness provider is aware of the tool/technology available for storytime 1: “i’m aware of all kinds of apps that are out there and of course the ebooks.” familiarity provider feels comfortable with the technology and with integrating the technology into programs 1: “i feel like it’s going to be effective if it’s what you’re comfortable with and you’re excited about. because that will come through when you actually provide the storytime.” choice of provider ultimately it is up to the provider to choose to integrate technology or not 1: “i think it all depends on the provider.” provider’s philosophy and approach how the provider views storytime and its purpose influences technology integration 1: “everyone has their own, unique storytime philosophy and the way that they approach planning storytimes…. so, really, a lot of it is just ... theory of how you want to approach it since there’s so many options out there.” reaction/success with initial attempt if the provider tried technology integration, the success or failure of that initial attempt influences subsequent attempts 2: “it went over really well.” information technology and libraries june 2023 technology integration in storytime programs 8 cahill, ingram, and joo dimension factor operational definition representative quotes research base provider is aware of research to support integration of technology 1: “... it’s kind of what the research is saying with parents and digital media at home. it all depends on how you are using it. 
if you’re actually sitting down with your child, looking at it together, it’s a lot more effective and the child is getting a lot more out of it versus just sitting them in front of it and expecting to teach something to the child.” participant number of participants the number of participants facilitates technology integration 2: “i think this summer was the first time i ever did that [used technology], and it was because i had this huge group of kids.” perception of caregivers’ reactions provider’s perception of how the caregivers would react to technology use 1: “i think they would probably be open to it…. i don’t know if maybe the perception some parents don’t want any technology, that would keep some people from appreciating it. but i think in general, it would be wellreceived if we tried it.” responsive to children’s interests provider uses digital resources because the children show interest or engagement 10: “kids are automatically interested in that stuff. they don’t need to be enticed. you know, you just get out an iphone or an ipad and they’re, like, gasp.” library system access to equipment and resources provider has access to technology and tools 1: “... we have technology, i think, in our system to implement it. you know, e-readers and ipads and things that we can use in storytimes. and large screen tvs.” information technology and libraries june 2023 technology integration in storytime programs 9 cahill, ingram, and joo dimension factor operational definition representative quotes colleague support provider is part of a branch or system that shares information and resources for technology integration 17: “so, you know, we have, and we’ve gotten pretty [good] at sharing with other storytime providers in our system if we have any websites or anything that we’ve been using or music that works really well for ‘movers and shakers’ or anything like that.” expectation to integrate technology in programs provider feels pressure to integrate technology and is defensive about the choice not to do so 1: “i kind of apologize for it…. so, we have the technology available, and they encourage us to use it....” training provider has used or wants to use technology during storytime because of a training 17: “we did a digital mentoring training about how to appropriately model, like, tech skills and screen time with families. so we’ve been encouraged to add in a little bit more technology into our storytimes if we can do those, you know, in an appropriate way.” content interactivity provider can use technology to facilitate interactivity 24: “... i would love to use some, like, smart tvs, smart boards, those kind of things. just for some interactive songs and you know, activities... when i go into these kindergartens and first grade and second grade rooms, like, these kids are using the smart boards for interactive activities for abcs and colors and shapes and numbers. and it may be through an activity or a song that’s being used with that smart board. and i say, ‘oh, i love that! i wish i could do that!’” theme provider uses technology that clearly connects to the theme of the storytime 17: “actually in my kinderbridge storytime now, it’s shapes month. we have the osmotangrams that i bring out. so that’s one of the ones all four weeks i’m going to use the apps and bring out both of our ipads so that kids can practice those spatial shapes.” information technology and libraries june 2023 technology integration in storytime programs 10 cahill, ingram, and joo table 3. 
barriers to integrating technology in storytime dimension barrier operational definition representative quote storytime provider fear of difficulties/ problems provider doesn’t plan or hesitates to plan technology use because there may be problems with using it 13: “but technology can be a problem. when you’re planning or something and it’s not working.” previous/ own child’s experiences with tech provider has negative experience using technology with children 5: “i have a four-year old. and it’s interesting to see how he responds to technology and what he responds to. and what helps him to learn the most. and it’s just, like, night and day what he learns from. you know, hearing repeated songs and rhymes and just reading tons of books versus what he learns.… i mean, i think that probably the most he ever learned from an ipad was getting to watch sesame street. just sort of the same, sort of like watching a storytime, i think. but yeah, i think just now from experience seeing like, ‘oh! that really doesn’t. it’s not a helpful tool, i don’t think, for that age.’ just from my experience.” undecided about the value of tech provider is unsure if tech integration is appropriate 5: “i have been all over the board in terms of that subject … like i said, it’s really important for me to pack in as much of what i think they need in a storytime. and i don’t know, again, i’m not sure that i’m doing exactly what is correct and maybe i should be exposing them more. but i feel like, especially for threeto five-year olds, it’s one of those things.... screen time/ overuse concerns provider is concerned about children’s screen time 2: “because i think there’s plenty of opportunity to be had in other places.” information technology and libraries june 2023 technology integration in storytime programs 11 cahill, ingram, and joo dimension barrier operational definition representative quote storytime activities as purposeful alternative to technology provider deliberately chooses not to use technology in storytime because they see storytime activities as equally or more beneficial 16: “and one thing that i’ve gotten feedback on is that kids are exposed to the technology in pretty much every facet of their life, so if we can make this a space where they can learn and experience things in a way that doesn’t have technology and they can see that it’s still really fun and exciting and we can learn a lot, then that has its own place, too.” unwilling to adopt a new technology provider keeps using the prior tool and does not try a new alternative technology 18: “i’m kind of old school because we’ve been using our cd player.” participant children devalue other components of storytime when tech is integrated provider perceives that the children prefer tech over other components of storytime 5: “i used to sometimes show a short video, and then i kind of found that that’s what they looked forward to most. i wanted to sort of change that perception of what the library was for some kids.” difficult to use tech with young children provider experiences difficulty using technology with young children 5: “i have found, for preschoolers, that it is really hard to incorporate anything digital.” lack of access to the internet poor broadband in rural area; why expose children to something they can’t use at home 5: “i feel like, especially here in this rural area, … [w]e have a really poor broadband network here, so not a lot of people have access to the internet. 
and so sometimes i feel like, also, showing them something that they can’t really utilize at home is not really helpful until they’re a little older also. information technology and libraries june 2023 technology integration in storytime programs 12 cahill, ingram, and joo dimension barrier operational definition representative quote perception or anticipated perception of some parents/ caregivers if the provider perceives that some parents/caregivers will object to tech integration, the storytime provider may be reluctant to do so 1: “i don’t know if maybe the perception, some parents don’t want any technology, that would keep some people from appreciating it.” tech is distracting for young children provider believes technology is distracting 5: “personally, i think i kind of get distracted by the media, so, then i think they would, too. library system lack of access to devices library does not have a certain device or technology even though the provider would like to have or think useful for storytime 24: “um, i’ll be honest with you, if we had the ability, i would love to use some like smart tvs, smart boards, those kind of things.… we just don’t really have that option here.” lack of time to integrate tech into storytime, the provider has to have time to explore tools and know the best resources/media to integrate, and that takes time 1: “and part of it’s time, too. having the time to find quality resources, and to learn how to use them. because we have the technology, i think, in our system to implement it. “ information technology and libraries june 2023 technology integration in storytime programs 13 cahill, ingram, and joo dimension barrier operational definition representative quote lack of training provider thinks self doesn’t have the knowledge, interest, skill, or training to use technology during storytime 15: “and i’d be open to ways to use it, but i guess i haven’t taken, you know, any trainings on … i mean, i really haven’t seen a lot of things offered at conferences.” old facility library does not support installing newer technology 21: “... that’s a thing that we have struggled with previously because of our infrastructure and set-up. it was almost a hazard to set up a projector and have some sort of digital aspect to storytime.” information technology and libraries june 2023 technology integration in storytime programs 14 cahill, ingram, and joo discussion purposes experiential many of the storytime providers’ purposes for using technology revealed a goal to create a positive, engaging experience for all children and adults who attend storytime, a theme that prior research has highlighted.22 specifically, technology facilitates the sharing of visual aids, sound effects, and songs. providers also use technology to encourage adult participation, and like their early childhood educator colleagues, storytime providers in this study reported using technology to scaffold and coordinate children’s gross motor movements with songs and action rhymes.23 learning storytime providers’ responses also show the aim to contribute to the learning of children and adults in storytime. this finding mirrors those of goulding, shuker, and dickie, which found that providers like to use technology in ways that coincide with the aims of children’s services. 
24 two of the purposes show an awareness of best practices in technology integration: support for adult-child interaction and teaching caregivers.25 additionally, storytime can be an opportune time for providers to model technology best practices for caregivers as providers have been modeling literacy best practices throughout the history of storytime programming.26 importantly, when storytime providers do model and intentionally seek to support caregivers’ learning, caregivers expand their knowledge, experience heightened confidence, and tend to utilize the strategies they encountered.27 notably, storytime providers tend to feel discomfort with providing instructional or developmental information directly to caregivers via “asides”; thus, a more palatable approach for many storytime providers might include using “we” language along the lines of “when we use digital media, we want to be sure that we are developing healthy habits. some families set a timer to help them monitor the duration of their children’s screentime.”28 one way that storytime providers might model digital media use is to search for and find information related to the storytime theme or book in one of the library’s databases. for example, if a book shared in storytime included a sloth, the storytime provider might demonstrate how to search for a video of a sloth in one of the library’s digital encyclopedias (e.g., encyclopedia britannica). storytime providers should also keep in mind that digital play can be incorporated into the informal activities that typically occur before and after storytime programs as a means to support children’s social interaction with other children.29 for example, if puzzles are typically included as one of the informal activity options before or after the storytime program, the provider might offer both traditional and digital puzzles (e.g., https://kids.nationalgeographic.com/games/puzzles/) on library-owned tablets and provide a simple how-to if needed. supports and barriers through the process of open coding, researchers identified four dimensions that storytime providers’ perceived supports and barriers could fall into based on the primary influential factor: provider, library system, participants, or content. provider the providers’ perceptions about technology and experiences with technology in the library setting serve as facilitators or barriers to integration. if a provider is aware of useful technology, familiar and comfortable with its use, knowledgeable of research supporting technology use, has a professional philosophy that can accommodate technology use, and/or has had a positive experience trying out technology, then these may be factors that lead to the adoption of technology in storytime. on the other hand, if the provider has concerns about the difficulties of technology use or the amount of time children spend on screens, if the provider’s professional philosophy views storytime as a deliberate alternative to time with technology, or if the provider has had a negative experience with technology, then these may be factors that prevent the adoption of technology in storytime. these same factors affect early childhood practitioners and influence their decisions to incorporate technology into classroom practices.30 the factors that lead to technology integration could be seen as related to media mentorship.
a media mentor has awareness, familiarity, knowledge, and a professional philosophy that supports technology use, all of which were factors identified by interviewees. professional training in mentorship was mentioned by one interviewee (17) who stated, “we did a digital mentoring training about how to appropriately model, like, tech skills and screen time with families.” thus , some providers’ responses indicate some general awareness of the currently emphasized best practice of media mentorship. however, the ambivalence toward the role of media mentor that goulding and colleagues found amongst librarians is also found here as interviewees’ responses do not give a clear picture of how they model technology use for caregivers during storytimes .31 in addition, responses that highlight barriers to technology integration show ways in which some providers are opposed to employing the role of media mentor specifically during storytime. as such, our findings align with prior observational studies that noted “few instances of librarians willing to speak directly to parents about how to interact with their children using technology.”32 participant providers consider the perspectives of the adult and child participants in storytimes in relation to integrating technology. providers are more likely to integrate technology if they view it as an aid to facilitating sessions for large groups, they believe caregivers will be open to the technology, and they appreciate that young children show a high interest in devices such as ipads. however, children’s high interest in devices was seen by other providers as a negative aspect of technology use and a barrier to integration because they thought children were too focused on the technology itself or would be distracted by the technology. just as early childhood teachers have been encouraged to broaden their perspectives of literacy to encompass digital literacy, so too might storytime providers, as this shift in focus would enable them to view these incidences as engagement rather than distraction.33 also, the same interviewee who thought caregivers might be open to technology in storytime expressed the concern that other caregivers might not like its use. our findings related to caregiver reaction echo similar findings from goulding and colleagues: the reaction that providers anticipate from adult participants might be either a support or a barrier for technology integration.34 library system two aspects of the library system were present in both factors and barriers: access and training. when the library system in which the provider worked gave them access to technology and training in its use for programs, they were more likely to integrate technology. in contrast, when a provider did not have access to technology, the library building did not support its use, or training was not given, the provider was less likely to integrate technology. 
libraries pride themselves on providing the highest level of service to members of the community and “removing barriers to access presented by socioeconomic circumstances.”35 yet, if libraries are to facilitate the digital learning of young children, it is necessary for them to recognize the digital divide impacting information technology and libraries june 2023 technology integration in storytime programs 16 cahill, ingram, and joo children’s access to technology throughout the world, and parents’ reluctance to spend money on digital apps.36 content content was a dimension only found in factors that support technology integration, not in barriers. providers used or wanted to use technology because they could connect the technology to two essential elements in the content of storytime: interactivity and theme. this dimension relates to purposes for technology use in the learning category as providers want to use the interactivity of technology as well as technology directly related to the session’s theme to boost children’s learning. indeed, child learning has long been librarians’ goal in providing storytime programs as has facilitating the development of parent skills.37 conclusion technology is prevalent in the lives of children and many begin interacting with digital tools as early as the first year of life; and caregivers seek guidance regarding their children’s technology use.38 while alsc has championed children’s librarians as media mentors, findings from this study, coupled with those from prior research, highlight storytime providers’ opposition to the media mentor role and the integration of technology within storytime programs.39 some first steps storytime providers might take are to integrate the digital tools the library is already providing. for example, if the library offers e-books (e.g., via libby), the storytime provider might consider integrating one or more picturebooks from that collection into storytime. alternatively, if the library does not have the tools necessary to share the book electronically during the program (e.g., a screen large enough for the storytime group), the provider might read the print version but then follow that up with a comment along the lines of “grownups, did you know that the library also offers this as an e-book that you could read on a phone, tablet, or other device? 
i would be happy to show you how to access it and other e-books after the program.” providers looking for other ways to incorporate digital tools into library programs might read strategies recommended by librarians in a fully and freely accessible online book.40 as scholars have previously noted, early childhood providers, including those who support young children and families in libraries, need much more professional development.41 specifically, the field needs more opportunities for librarians and other early childhood educators to develop their knowledge and skills within the realm of digital technology for young children, but they also need training that advances the notion of media mentor and boosts their confidence and identities relative to that role.42 the institute of museum and library services recently funded a project designed to support librarians’ knowledge and skills within the realm of family media for children ages five to eleven years—and products from that project are certainly a good starting place for storytime providers; however, additional resources and research focused on library programs and services designed for children from birth through five years are needed.43 if librarians are to embrace the media mentor role with confidence and the necessary knowledge and skills required of the task, much greater attention should be devoted to the responsibility and more support in the way of professional development and resources is necessary. acknowledgement this work was supported by the institute of museum and library services [federal award identification number: lg-96-17-0199-17]. endnotes 1 nalika unantenne, mobile device usage among young kids: a southeast asia study (the asianparent insights, november 2014), https://s3-ap-southeast-1.amazonaws.com/tap-sg-media/theasianparent+insights+device+usage+a+southeast+asia+study+november+2014.pdf; brooke auxier, monica anderson, andrew perrin, and erica turner, parenting children in the age of screens (pew research center, 2020), https://www.pewresearch.org/internet/2020/07/28/parenting-children-in-the-age-of-screens/; stephane chaudron, rosanna di gioia, and monica gemo, young children (0–8) and digital technology: a qualitative study across europe (publications office of the european union, 2018), https://doi.org/10.2760/294383; organization for economic cooperation and development, what do we know about children and technology? (2019), https://www.oecd.org/education/ceri/booklet-21st-century-children.pdf; victoria rideout and michael b. robb, the common sense census: media use by kids age zero to eight, 2020 (common sense media, 2020), https://www.commonsensemedia.org/sites/default/files/uploads/research/2020_zero_to_eight_census_final_web.pdf; jenny s. radesky et al., “young children’s use of smartphones and tablets,” pediatrics 146, no. 1 (2020). 2 unantenne, mobile device usage; auxier, anderson, perrin, and turner, parenting children; chaudron, di gioia, and gemo, young children (0-8) and digital technology. 3 rideout and robb, the common sense census; sebastian paul suggate and philipp martzog, “preschool screen-media usage predicts mental imagery two years later,” early child development and care (2021): 1–14. 4 auxier, anderson, perrin, and turner, parenting children; suggate and martzog, “preschool screen-media usage.” 5 marc w. hernandez, carrie e.
markovitz, elc estrera, and gayle kelly, the uses of technology to support early childhood practice: instruction and assessment. sample product and program tables (administration for children & families, u.s. department of health & human services, 2020), https://www.acf.hhs.gov/media/7970; lisa b. hurwitz and kelly l. schmitt, “can children benefit from early internet exposure? short- and long-term links between internet use, digital skill, and academic performance,” computers & education 146 (2020): 103750; kathy hirsh-pasek et al., “putting education in ‘educational’ apps: lessons from the science of learning,” psychological science in the public interest 16, no. 1 (2015): 3–34. 6 amy koester, ed., young children, new media, and libraries: a guide for incorporating new media into library collections, services, and programs for families and children ages 0–5 (little elit, 2015), https://littleelit.files.wordpress.com/2015/06/final-young-children-new-media-and-libraries-full-pdf.pdf. 7 association for library service to children, national research agenda for library service to children (ages 0–14), 2019, https://www.ala.org/alsc/sites/ala.org.alsc/files/content/200327_alsc_research_agenda_print_version.pdf. 8 christner, hicks, and koester, “chapter six: new media in storytimes: strategies for using tablets in a program setting.” in a. koester, ed., a guide for incorporating new media into library collections, services, and programs for families and children ages 0–5 (little elit, 2015), 77–88. 9 kathleen campana, j. elizabeth mills, marianne martens, and claudia haines, “where are we now? the evolving use of new media with young children in libraries,” children and libraries 17, no. 4 (2019): 23–32; j. elizabeth mills, emily romeign-stout, cen campbell, and amy koester, “results from the young children, new media, and libraries survey: what did we learn?”, children and libraries 13, no. 2 (2015): 26–32. 10 campana, mills, martens, and haines, “where are we now?”. 11 campana, mills, martens, and haines, “where are we now?”. 12 susan b.
neuman, naomi moland, and donna celano, “bringing literacy home: an evaluation of the every child ready to read program” (chicago: association for library service to children and public library association, 2017), 5, http://everychildreadytoread.org/wpcontent/uploads/2017/11/2017-ecrr-report-final. 13 campana, mills, martens, and haines, “where are we now?”; mills, romeign-stout, campbell, and koester, “results from the young children, new media, and libraries survey.” 14 anne goulding, mary jane shuker, and john dickie, “media mentoring through digital storytimes: the experiences of public libraries in aotearoa new zealand,” in proceedings of ifla wlic (2017), https://library.ifla.org/id/eprint/1742/1/138-goulding-en.pdf. 15 goulding, shuker, and dickie, “media mentoring through digital storytimes”; cen campbell and amy koester, “new media in youth librarianship,” in a. koester, ed., a guide for incorporating new media into library collections, services, and programs for families and children ages 0–5 (little elit, 2015), 8–24. 16 jennifer nelson and keith braafladt, technology and literacy: 21st century library programming for children and teens (chicago: american library association, 2012). 17 c. campbell, c. haines, a. koester, and d. stoltz, media mentorship in libraries serving youth (chicago: association for library service to children, 2015), https://www.ala.org/alsc/sites/ala.org.alsc/files/content/media%20mentorship%20in%20li braries%20serving%20youth_final_no%20graphics.pdf. 18 association for library service to children, competencies for librarians serving children in libraries. 19 campana, mills, martens, and haines, “where are we now?”; mills, romeign-stout, campbell, and koster, “results from the young children, new media, and libraries survey”; neuman, moland, and celano, “bringing literacy home”; goulding, shuker, and dickie, “media mentoring through digital storytimes.” http://everychildreadytoread.org/wp-content/uploads/2017/11/2017-ecrr-report-final http://everychildreadytoread.org/wp-content/uploads/2017/11/2017-ecrr-report-final https://www.ala.org/alsc/sites/ala.org.alsc/files/content/media%20mentorship%20in%20libraries%20serving%20youth_final_no%20graphics.pdf https://www.ala.org/alsc/sites/ala.org.alsc/files/content/media%20mentorship%20in%20libraries%20serving%20youth_final_no%20graphics.pdf information technology and libraries june 2023 technology integration in storytime programs 19 cahill, ingram, and joo 20 goulding, shuker, and dickie, “media mentoring through digital storytimes” in proceedings of ifla wlic (2017), https://library.ifla.org/id/eprint/1742/1/138-goulding-en.pdf. 21 institute of museum and library services, public libraries survey, 2016, https://www.imls.gov/research-evaluation/data-collection/public-libraries-survey. 22 maria cahill, soohyung joo, mary howard, and suzanne walker, “we’ve been offering it for years, but why do they come? the reasons why adults bring young children to public library storytimes,” libri 70, no. 4 (2020), 335–44; peter andrew de vries, “parental perceptions of music in storytelling sessions in a public library,” early childhood education journal 35, no. 5 (2008): 473–78; goulding and crump, “developing inquiring minds.” 23 courtney k. blackwell, ellen wartella, alexis r. lauricella, and michael b. robb, technology in the lives of educators and early childhood programs: trends in access, use, and professional development from 2012 to 2014 (chicago: northwestern school of communication, 2015). 
24 campbell and koester, “new media in youth librarianship.” 25 campbell, haines, koester, stoltz, media mentorship in libraries serving youth; prachi e. shah et al., “daily television exposure, parent conversation during shared television viewing and socioeconomic status: associations with curiosity at kindergarten,” plos one 16, no. 10 (2021), e0258572. 26 nelson and braafladt, technology and literacy. 27 roger a. stewart et al., “enhanced storytimes: effects on parent/caregiver knowledge, motivation, and behaviors,” children and libraries 12, no. 2 (2014): 9–14; scott graham and andré gagnon, “a quasi-experimental evaluation of an early literacy program at the regina public library/évaluation quasi-expérimentale d'un programme d'alphabétisation des jeunes enfants à la bibliothèque publique de regina,” canadian journal of information and library science 37, no. 2 (2013): 103–21. 28 maria cahill and erin ingram, “instructional asides in public library storytimes: mixed methods analyses with implications for librarian leadership,” journal of library administration 61, no. 4 (2021): 421–38. 29 leigh disney and gretchen geng, “investigating young children’s social interactions during digital play, early childhood education journal (2021): 1–11. 30 hernandez, markovitz, estrera, and kelly, “the uses of technology”; karen daniels et al., “early years teachers and digital literacies: navigating a kaleidoscope of discourses,” education and information technologies 25, no. 4 (2020): 2415–26. 31 goulding, shuker, and dickie, “media mentoring through digital storytimes.” 32 neuman, moland, and celano, “bringing literacy home,” 58. 33 daniels et al., “early years teachers and digital literacies.” https://library.ifla.org/id/eprint/1742/1/138-goulding-en.pdf https://www.imls.gov/research-evaluation/data-collection/public-libraries-survey information technology and libraries june 2023 technology integration in storytime programs 20 cahill, ingram, and joo 34 goulding, shuker, and dickie, “media mentoring through digital storytimes.” 35 association for library service to children, competencies for librarians serving children in libraries (2020) https://www.ala.org/alsc/edcareeers/alsccorecomps; american library association, code of ethics of the american library association (2021), https://www.ala.org/tools/ethics 36 jenna herdzina and alexis r. lauricella, “media literacy in early childhood report,” child development 101 (2020): 10; sara ayllon et al., digital diversity across europe: policy brief september 2021 (digigen project, 2021), https://www.digigen.eu/news/digital-diversityacross-europe-recommendations-to-ensure-children-across-europe-equally-benefit-fromdigital-technology/. 37 goulding and crump, “developing inquiring minds”; nancy l. kewish, “south euclid’s pilot project for two-year-olds and parents,” school library journal 25, no. 7 (1979): 93–97. 38 auxier, anderson, perrin, and turner, parenting children; rideout and robb, the common sense census. 39 neuman, moland, and celano, “bringing literacy home”; goulding, shuker, and dickie, “media mentoring through digital storytimes.” 40 koester, ed., young children, new media, and libraries. 41 us department of education, office of educational technology, policy brief on early learning and use of technology, 2016, https://tech.ed.gov/files/2016/10/early-learning-tech-policybrief.pdf. 42 herdzina and lauricella, “media literacy in early childhood report.” 43 rebekah willett, june abbas, and denise e. 
agosto, navigating screens (blog), https://navigatingscreens.wordpress.com.

articles are ivy league library website homepages accessible? wenfan yang, bin zhao, yan quan liu, and arlene bielefield information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.11577 wenfan yang (youngwf@126.com) is a master’s student in the school of management, tianjin university of technology, china. bin zhao (andy.zh@126.com) is professor in the school of management, tianjin university of technology, china. yan quan liu (liuy1@southernct.edu) is professor in information and library science at southern connecticut university and special hired professor of tianjin university of technology. arlene bielefield (bielefielda1@southernct.edu) is professor in information and library science at southern connecticut university. copyright © 2020. abstract as a doorway for users seeking information, library websites should be accessible to all, including those who are visually or physically impaired and those with reading or learning disabilities. in conjunction with an earlier study, this paper presents a comparative evaluation of ivy league university library homepages with regard to the americans with disabilities act (ada) mandates. data results from wave and achecker evaluations indicate that although the error of missing form labels still occurs in these websites, other known accessibility errors and issues have been significantly improved from five years ago. introduction an academic library is “a library that is an integral part of a college, university, or other institution of postsecondary education, administered to meet the information and research needs of its students, faculty, and staff.”1 people living with physical disabilities face barriers whenever they enter a library. many blind and visually impaired persons need assistance when visiting a library to do research. in such cases, searching the collection catalog, periodical indexes, and other bibliographic references is frequently conducted by a librarian or the person accompanying that individual to the library. thus, professionals in these institutions can advance the use of academic libraries for the visually impaired, physically disabled, hearing impaired, and people with learning disabilities.
library websites are libraries’ virtual front doors for all users pursuing information from libraries. fichter stated that the power of the website is in its popularization.2 access by everyone regardless of disability is an essential reason for its popularization. whether users are students, parents, senior citizens, or elected officials navigating the library website to find resources or signing up for computer courses at the library, the website can be either a liberating or a limiting experience.3 according to the web accessibility initiative (https://www.w3.org/wai/), website accessibility means that people with disabilities can use the websites. more specifically, website accessibility means that people with disabilities can perceive, understand, navigate, and interact with websites and that they can contribute to the websites. incorporating accessibility into website design enables people with disabilities to enjoy the benefits of websites to the same extent as anyone else in their community. this study evaluated the current state of the accessibility of university websites of the american ivy league university libraries using guidelines established by the americans with disabilities act (ada) for those who are visually or physically impaired or who have reading or learning disabilities. section 508 of the rehabilitation act and the web content accessibility guidelines (wcag) from the world wide web consortium (w3c) provide guidelines for website developers that define what makes a website accessible to those with physical, sensory, or cognitive disabilities. since a broad array of disabilities is recognized under the ada, websites seeking to be compliant with the ada should use the act’s technical criteria for website design. this study used two common accessibility evaluation tools—wave and achecker—for both section 508 and the wcag version 2.0 level aa. among universities in the united states, the eight ivy league universities—brown, columbia, cornell, dartmouth, harvard, princeton, university of pennsylvania, and yale—all have a long and distinguished history, strict academic requirements, high-quality teaching, and high-caliber students. because of their good reputations, they are expected to lead by example, not only in terms of academic philosophy and campus atmosphere, but also by the accessibility of their various websites. of course, any library website, whether an urban public library or a university library, should be accessible to everyone. hopefully, this study of their accessibility can enlighten other universities on how to better develop and maintain library websites so that individuals with disabilities can enjoy the same level of accessibility to academic knowledge as everyone else. literature review in 1999, schmetzke reported that emerging awareness about the need for accessible website design had not yet manifested itself in the actual design of library websites. for example, at the fourteen four-year campuses within the university of wisconsin system, only 13 percent of the libraries’ top-level pages (homepages plus the next layer of library pages linked to them) were free of accessibility problems.4 has this situation changed in the last twenty years?
to answer this question, a number of authors have suggested various methods for evaluating software/hardware for accessibility and usability.5 included in the process of compiling data is “involving the user at each step of the design process. involvement typically takes the form of an interview and observation of the user engaged with the software/hardware.”6 providenti & zai conducted a study in 2007 focused on providing an update on the implementation of website accessibility guidelines of kentucky academic library websites. they tested the academic library homepages of bachelor-degree-granting institutions in kentucky for accessibility compliance using watchfire’s webxact accessibility tester and the w3c’s html validator. the results showed that from 2003 to 2007, the number of library homepages complying with basic accessibility guidelines was increasing.7 billingham conducted research on edith cowan university (ecu) library websites. the websites were tested twice, in october 2012 and june 2013, using automated testing tools such as code validators and color analysis programs, resulting in findings that 11 percent of the guidelines for wcag 2.0 level a to level aa were passed in their first test. additionally, there was a small increase in the percentage of wcag 2.0 guidelines passed by all pages tested in the second test.8 while quite a few research studies focus on library website accessibility rather than the university websites, the conclusions diverge. tatiana & jeremy (2014) tested 509 webpages at a large public university in the northeastern united states using wave (http://wave.webaim.org) and cynthia says (http://www.cynthiasays.com). the results indicated that 51 percent of those webpages passed automated website accessibility tests for section 508 compliance with cynthia says. however, when using wave for wcag priority 1 compliance, which is a more rigorous evaluation level, only 35 percent passed the test.9 maatta smith reported that not one of the websites of 127 us members of the urban library council (ulc) was without errors or alerts, with the average number of errors being 27.10 such results were similar to those reported by liu.11, 12 they also found that about half (58 of 127) of the urban public libraries provided no information specifically for individuals with disabilities. of the 127 websites, some were confusing, using a variety of verbiage to suggest services for individuals with disabilities. sixty-six of them provided some information about services within the library for individuals with disabilities. the depth of the information varied, but in all instances contact information was included for additional assistance. liu, bielefield, and mckay examined 122 library homepages of ulc members and reported on three main aspects. first, only seven of them presented as error free when tested for compliance with the 508 standards. the highest percentage of errors occurred in accessibility sections 508(a) and 508(n). second, the number of issues was dependent on the population served. that means libraries serving larger populations tend to have more issues with accessibility than those serving smaller ones.
third, the most common errors were missing label and contrast errors, while the highest number of alerts was related to the device-dependent event handler, which means that a keyboard or mouse is a necessary piece of equipment to initiate a desired transaction.12 although they were interested in overall website accessibility, theofanos and redish focused their research on the visually impaired website user. the authors investigated and revealed six reasons to bridge the gap between accessibility and usability. the six reasons were:
1. disabilities affect more people than you may think. worldwide, 750 million people have a disability, and three of every ten families are touched by a disability. in the united states, one in five have some kind of disability, and one in ten have a severe disability. that’s approximately 54 million americans.
2. it is good business. according to the president’s committee on the employment of people with disabilities, the discretionary income of people with disabilities is $175 billion.
3. the number of people with disabilities and income to spend is likely to increase. the likelihood of having a disability increases with age, and the overall population is aging.
4. the website plays an important role and has significant benefits for people with disabilities.
5. improving accessibility enhances usability for all users.
6. it is morally the right thing to do.13
lazar, dudley-sponaugle, and greenidge validated that most blind users were just as impatient as most sighted users. they want to get the information they need as quickly as possible. they don’t want to listen to every word on the page just as sighted users do not read every word.14 similarly, foley found that using automated validation tools did not ensure complete accessibility. students with low vision found many of the pages hard to use even though they were validated.15 outcomes of all the research revealed that most university library websites have developed a policy on website accessibility, but the policies of most universities had deficiencies.16 library staff must be better informed and trained to understand the tools available to users, and when reviewing web pages, the audiences of all kinds must be considered.17 research design and methods this study, as a continuing effort from an earlier study on urban library websites, made use of content analysis methodology to examine the website accessibility of the university libraries against the americans with disabilities act (ada), with a focus on those with visual or cognitive disabilities.18 under the ada, people with disabilities are guaranteed access to all postsecondary programs and services. the evaluation of accessibility focuses on the main pages of these university library websites, as shown in table 1, because these homepages considerably demonstrate the institution’s best effort or, at least, most recent redesigns. it was the intent of the authors of this research to reveal the current status of the ivy league library homepages’ accessibility and the importance that ivy league universities attach to the accessibility of their websites. commonly recognized website evaluators (wave, achecker, and cynthia says), along with other online tools, evaluate a website’s accessibility by checking its html and xml code.
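the kind of markup inspection these evaluators automate can be illustrated with a short script. the sketch below is only a simplified illustration, not the actual logic of wave, achecker, or cynthia says: it assumes the python packages requests and beautifulsoup4 are installed, the function name quick_accessibility_scan is invented for this example, and it tests just two of the many conditions the full tools cover (images lacking alternative text and form controls lacking an associated label or aria name).

import requests
from bs4 import BeautifulSoup

def quick_accessibility_scan(url):
    # fetch the homepage and parse its html
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # images with no alt attribute give a screen reader nothing to announce
    # (this simplification also flags intentionally empty alt="" on decorative images)
    images_missing_alt = [img for img in soup.find_all("img") if not img.get("alt")]

    # form controls should have an associated <label for=...>, an aria-label,
    # or an aria-labelledby; this check ignores labels that wrap their control
    labeled_ids = {lab.get("for") for lab in soup.find_all("label") if lab.get("for")}
    unlabeled_controls = [
        ctrl for ctrl in soup.find_all(["input", "select", "textarea"])
        if ctrl.get("type") not in ("hidden", "submit", "button", "image")
        and ctrl.get("id") not in labeled_ids
        and not ctrl.get("aria-label")
        and not ctrl.get("aria-labelledby")
    ]

    return {"images missing alt text": len(images_missing_alt),
            "form controls without labels": len(unlabeled_controls)}

# one of the homepages listed in table 1
print(quick_accessibility_scan("https://library.brown.edu"))

production evaluators apply the complete section 508 and wcag rule sets and report many more categories (errors, alerts, features, structural elements, and aria usage), but the basic approach is the same: fetch the page, parse the markup, and test each element against the guidelines.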
wave and achecker were selected for this study for the robustness of their evaluation based on w3c guidelines, comprehensiveness of evaluation reporting, and ready availability to any institution or individual conducting website evaluations. wave is a web evaluation tool that was utilized to check websites against section 508 standards and wcag 2.0 guidelines. this assessment was conducted by entering a uniform resource locator (url), or website address, in the search box. the evaluation tool provided a summary of errors, alerts, features, structural elements, html5 and aria. achecker is a tool to check single html page content for conformance with accessibility standards to ensure the content can be accessed by everyone. it produces a report of all accessibility problems for the selected guidelines by three types of problems: known problems, likely problems, and potential problems. both wave and achecker help website developers make their website content more accessible. data from different periods were compared to show statistically whether enough attention was paid to accessibility issues by the ivy league university systems. the study team collected the first data set in february 2014, using wave for section 508. in 2018, achecker accessibility checker was used for both section 508 and wcag 2.0 aa. the access board published new requirements for information and communication technology covered by section 508 of the rehabilitation act (https://www.access-board.gov/guidelines-and-standards/communications-and-it/about-the-ict-refresh) on january 18, 2017. the latest wcag 2.0 guidelines were updated on september 5, 2013 (https://www.w3.org/tr/wcag2ict/). while the wave development team indicated that they have updated the indicators in wave regarding wcag 2.0, the current indicators regarding section 508 refer to the previous technical standards for section 508, not the updated 2017 ones. according to achecker.ca, the versions of the section 508 standards and wcag 2.0 aa guidelines used were published on march 12, 2004 and june 19, 2006, respectively, with neither being the latest versions. this study centered on three research questions:
1. are the library websites of the eight ivy league universities ada compliant?
2. are there easily identified issues that present barriers to access for the visually impaired on the ivy league university library homepages?
3. what should ivy league libraries do to achieve ada compliance and to maintain it?
table 1. investigated websites of ivy league university libraries (library: website address).
brown university library: https://library.brown.edu
columbia university libraries: http://library.columbia.edu
cornell university library: https://www.library.cornell.edu
dartmouth library: https://www.library.dartmouth.edu
harvard library: http://library.harvard.edu
princeton university library: http://library.princeton.edu
penn libraries: http://www.library.upenn.edu
yale university library: https://web.library.yale.edu
results & discussion all five evaluation categories employed by wave for section 508 standards, as shown in figure 1, were examined, with a more in-depth review of the homepage of the university of pennsylvania library.
similar results in numbers of the five categories are presented in the library homepages of brown university, columbia university, and cornell university. interestingly, wave indicates more errors and alerts on the homepage of yale university. figure 1. wave results for section 508 standards. in order to determine the accuracy of the results, the team also used achecker to reevaluate these homepages in the year 2018. known problems as the category in achecker are as serious as errors in wave. they have been identified with certainty as accessibility barriers by the website information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 6 evaluators and need to be fixed. likely problems are problems that could be barriers which require a human to decide whether there is a need to fix them. achecker cannot identify potential problems and requires a human to confirm if identified problems need remediation. figure 2 shows the numbers for each category as detected by achecker on june 18, 2018, on the eig ht ivy league university libraries’ homepages. the library homepage of the university of pennsylvania was found to contain the most, which was the same as the result from wave. however, among the seven remaining libraries’ homepages, the homepage of harvard university library displayed the same number of problems as the university of pennsylvania detected by achecker. figure 2. achecker results for section 508 standards. there was significant improvement between 2014 and 2018 the results of this research from wave for section 508 standards signify a significant shift in the accessibility of these websites between the years of 2014 and 2018. among the five wave detection categories in the eight library homepages, the total of errors and alerts decreased during this period. for instance, the total number of errors was 36 in 2014 decreasing to 11 in 2018, and the number of alerts decreased from 141 to 14. figure 3 shows the number of errors in each library homepage, and figure 4 shows the number of alerts. they all show a downward trend from 2014 to 2018. but features, structural elements and html/aria were all on the rise when comparing the two years’ data sets. the green sections in table 2 indicate a decrease of the numbers in three categories from 2014 to 2018, and the yellow sections indicate an increase in numbers. these data results revealed that errors and alerts, the most common problems related to access, had been better controlled during these years, while others might still remain unchanged. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 7 figure 3. change of errors from 2014 to 2018. figure 4. change of alerts from 2014 to 2018. table 2. changes of features, structural elements, and html/aria between 2014 and 2018. categories features structural elements html/aria year of data collection 2014 2018 2014 2018 2014 2018 total 108 191 184 233 24 89 brown university library 13 15 6 13 0 1 columbia university libraries 12 13 23 14 17 0 cornell university library 5 6 20 18 0 4 dartmouth library 10 8 15 27 0 23 harvard library 20 20 14 24 0 4 princeton university library 15 31 45 24 0 3 penn libraries 12 90 29 104 7 50 yale university library 21 8 32 9 0 4 missing form labels were the top error against the ada the data used in the analysis below were all the test data collected in 2018. all errors appearing in data results were collected and analyzed. 
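before turning to the 2018 error breakdown, the direction of change for each wave category can be recomputed from the totals reported above; the short python sketch below is purely illustrative and uses only the figures given in the text and table 2.

```python
# Totals across the eight homepages as reported in the text and table 2
# (2014 value, 2018 value) per WAVE category; the per-library rows of
# table 2 could be handled the same way.
totals = {
    "errors":              (36, 11),
    "alerts":              (141, 14),
    "features":            (108, 191),
    "structural elements": (184, 233),
    "html/aria":           (24, 89),
}

for category, (y2014, y2018) in totals.items():
    change = y2018 - y2014
    pct = 100 * change / y2014
    trend = "decrease" if change < 0 else "increase"
    print(f"{category}: {y2014} -> {y2018} ({trend}, {pct:+.0f}%)")
```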
figure 5 shows the number of errors that were identified based on the specific requirements contained in section 508 of the rehabilitation act as evaluated by wave. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 8 figure 5. occurrences of specific error per specific 508 standards. the term error refers to accessibility errors that need to be fixed. missing form label was the highest frequency error type shown. only two types of errors occurred in ivy league university libraries’ homepages. but these errors didn’t appear on every homepage. there are several errors in some homepages while others had no errors. for example, linked image missing alternative text occurred on the library homepage of harvard university twice. table 3 shows the distribution of errors in eight homepages. table 3. distribution of errors in eight homepages. missing form label linked image missing alternative text brown university library columbia university libraries 1 cornell university library dartmouth library 3 harvard library 2 princeton university library penn libraries 1 yale university library 4 missing form label is listed in section 508 (n) and means there is a form control without a corresponding label. this is important because if a form control does not have a properly associated text label, the function or purpose of that form control may not be presented to sc reen reader users. linked image missing alternative text occurred only in the harvard library homepage among the eight ivy league university libraries’ homepages. it indicated that an image without alternative text results in an empty link. if an image is within a link that does not provide alternative text, a screen reader has no content to present to the user regarding the function of the link. these website accessibility issues may be easy fixes and considered minor to some; however, if they are not detected, they are major barriers for persons living with low vision or blindness. as a result, users are left at a disadvantage because they are lacking critical information to successfully fulfill their needs. examples of such error icons in wave are displayed in figures 6 and 7. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 9 figure 6. missing form label icon from yale university library homepage. figure 7. linked image missing alternative text icon from harvard library homepage. a total of eleven errors, as shown in figure 8, were located on the homepages of the eight ivy league libraries and illustrated the number of errors that occurred in each library homepage. the average number of errors for each homepage was 1.375. yale university library homepage had the most errors with a total of four. library homepages of brown university, cornell university and princeton university performed best with zero errors. figure 8. the total of errors in ivy league libraries’ homepages. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 10 six alerts appear among ada requirements the issues that alerts identify are also significant for website accessibility. figure 9 shows there are six different kinds of alerts that were identified based on the specific requirements contained in section 508 of the rehabilitation act. figure 9. occurrences of specific alert per specific 508 standards. the noscript element was the most encountered alert issue. 
alerts that wave reports need close scrutiny, because they likely represent an end-user accessibility issue. the noscript element is related to the 508 (l) requirement and means that noscript content is presented when javascript is disabled. because almost all users of screen readers and other assistive technologies have javascript enabled, noscript cannot be used to provide an accessible version of inaccessible scripted content. skipped heading level ranked second in number. the importance of headings is in their provision of document structure and facilitation of keyboard navigation for users of assistive technology. these users may be confused or may experience difficulty navigating when heading levels are skipped. example icons of these alerts, as evaluated by wave and shown in figures 10 and 11, mark these elements as potential barriers to accessibility. figure 10. noscript element icon from cornell university library homepage. figure 11. skipped heading level icon from dartmouth library homepage. a total of fourteen alert problems were detected. figure 12 illustrates the number of alerts that occurred on each library homepage. on average, there were 1.75 alerts present on the eight websites. the library homepages of yale university and the university of pennsylvania had the most alerts with 4 on each site. only the brown university library’s homepage had zero alerts. figure 12. the total of alerts in ivy league libraries’ homepages. linked image with alternative text was the most frequently found feature issue features as a category of issues indicates conditions of accessibility that probably need to be improved and usually require further verification and manual fixing. for example, if a feature is detected on a website, further manual verification is required to confirm its accessibility. figure 13 shows the number of features that were identified, based on the specific requirement contained in section 508 of the rehabilitation act. figure 13. occurrences of specific features per specific 508 standards. linked image with alternative text, which is a 508 (a) requirement, was shown to be the most encountered feature issue. this means that alternative text should be present for an image that is within a link. by including appropriate alternative text on an image within a link, the function and purpose of the link and the content of the image are available to screen reader users even when images are unavailable. another frequently occurring feature was form label, which means a form label is present and associated with a form control. a properly associated form label is presented to a screen reader user when the form control is accessed. these evaluation steps were the same ones used for errors and alerts. example icons of features evaluated by wave are displayed as figures 14 and 15. figure 14. linked image with alternative text icon from brown university library homepage. figure 15. form label icon from penn libraries homepage. this study also ranked the number of features that were detected by wave in the eight ivy league library homepages. figure 16 displays the number of features that occurred on each library homepage.
in total there were 191 features detected by wave in the eight ivy league university libraries’ homepages. the homepage of the university of pennsylvania library was found to have 90 features, by far the most of all the libraries. no library was entirely free of features according to the wave measurement using section 508 standards. figure 16. the total of features in ivy league libraries’ homepages. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 14 table 4a. comparison between wave & achecker section 508 standards on brown and columbia’s library homepages. section 508 standards brown university columbia university wave achecker wave achecker april june april june april june april june total 33 29 47 47 28 29 79 83 a 9 9 9 9 12 13 12 14 b c 14 14 26 28 d 8 8 14 14 e f g h i j 8 8 14 14 k l 6 6 12 12 m n 1 1 1 1 1 1 o 23 19 1 1 15 15 1 1 p table 4b. comparison between wave & achecker section 508 standards on cornell and dartmouth’s library homepages. section 508 standards cornell university dartmouth college wave achecker wave achecker april june april june april june april june total 30 29 107 106 59 68 65 67 a 2 2 2 2 8 8 10 11 b c 36 36 22 23 d 32 32 9 9 e f g h i j 33 32 9 9 k l 3 3 7 7 m n 7 7 23 29 8 8 o 21 20 1 1 28 31 p information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 15 table 4c. comparison between wave & achecker section 508 standards on harvard and princeton’s library homepages. section 508 standards harvard university princeton university wave achecker wave achecker april june april june april june april june total 51 51 139 139 57 61 74 74 a 20 20 29 29 25 25 20 20 b c 43 43 32 32 d 32 32 10 10 e f g h i j 34 34 10 10 k l 1 1 m n 5 5 3 7 o 26 26 1 1 29 29 1 1 p table 4d. comparison between wave & achecker section 508 standards on pennsylvania and yale’s library homepages. section 508 standards university of pennsylvania yale university wave achecker wave achecker april june april june april june april june total 253 249 129 139 28 29 84 85 a 40 37 14 19 6 7 4 5 b c 82 87 28 28 d 11 11 21 21 e f g 1 1 h i j 11 11 21 21 k l 1 1 9 9 3 3 4 4 m 3 2 n 103 104 1 1 8 8 4 4 o 106 105 1 1 11 11 1 1 p information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 16 a few 508 standards deviate from comparison between two evaluators to determine whether the wave tool missed some specific requirements in section 508, the authors comparatively examined these eight university homepages using both wave and achecker from one site to another synchronously in april and again in june 2019. there are sixteen principles in section 508. they are arranged from a to p. tables 4a–4d indicate issues for these section 508’s requirements in the eight universities’ homepages respectively. except the requirement g for yale library homepage which shows one issue in achecker, in neither wave nor achecker during the time we conducted our examination, there was no issue found for the seven requirements (b, e, f, h, i, k, and p) below: b. equivalent alternatives for any multimedia presentation shall be synchronized with the presentation; e. redundant text links shall be provided for each active region of a server-side image map; f. client-side image maps shall be provided instead of server-side image maps except where the regions cannot be defined with an available geometric shape; h. 
markup shall be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers; i. frames shall be titled with text that facilitates frame identification and navigation; k. a text-only page, with equivalent information or functionality, shall be provided to make a website comply with the provisions of this part, when compliance cannot be accomplished in any other way. the content of the text-only page shall be updated whenever the primary page changes; p. when a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required. the results tabulated in tables 4a–4d indicate that these seven section 508 requirements are probably not problematic for these websites. conclusions based on the results, this study determined that the eight ivy league universities’ homepages exhibited some issues with accessibility for people with disabilities. considerable effort is necessary to ensure their websites are ready to meet the challenges and future needs of web accessibility. users with visual impairments can navigate a website with assistive technology only when it is designed to be accessible. while each institution presented both general and comprehensive coverage of services for users with disabilities, it would have been more practical and efficient if specific links were posted on the homepage. according to the american foundation for the blind (https://www.afb.org), “usability” is a way of describing how easy a website is to understand and use. accessibility refers to how easily a website can be used, understood, and accessed by people with disabilities. this study has concluded that expertise and specialized training and skill are still needed in this area. principles of accessible website design must be introduced and taught, underscoring that design matters for people with disabilities just as it does in the physical environment. as highlighted earlier through the evaluation tool wave, most of the problems detected can be fixed with the solutions the tool provides. frequent review is critical, and websites should be assessed at least yearly for accessibility compliance. there is much to be done if accessibility is to be realized for everyone. limitations the authors recognize that this study, using free website accessibility testing tools, has certain limitations. as wave notes in its help pages, the aim for website developers is not to eliminate every category of item the tool identifies (only errors must be fixed) but to determine whether a website is accessible. at the time of writing, neither wave nor achecker had been updated with the latest wcag 2.1 aa rules. while wcag 2.1 is expected to provide new guidelines for making websites even more accessible, more careful and comprehensive studies against the wcag 2.1 aa rules could further assist university library professionals and their website developers in providing accessible websites to those with disabilities. moreover, while machine-generated evaluations are efficient, it is equally important that researchers check the issues manually and apply human analysis in determining the major issues with content. endnotes 1 joan m. reitz, odlis: online dictionary for library and information science (westport, ct: libraries unlimited, 2004), 1–2.
2 darlene fichter, “making your website accessible,” online searcher 37, no. 4 (2013): 73–76. 3 fichter, “making your website accessible,” 74. 4 axel schmetzke, web page accessibility on university of wisconsin campuses: a comparative study (stevens point, wi, 2019). 5 jeffrey rubin and dana chisnell, handbook of usability testing: how to plan, design, and conduct effective tests (idaho: wiley, 2008), 6–11. 6 alan foley, “exploring the design, development and use of websites through accessibility and usability studies,” journal of educational multimedia and hypermedia 20, no. 4 (2011): 361–85, http://www.editlib.org/p/37621/. 7 michael providenti and robert zai iii, “web accessibility at kentucky’s academic libraries,” library hi tech 25, no. 4 (2007): 478–93, https://doi.org/10.1108/07378830710840446. 8 lisa billingham, “improving academic library website accessibility for people with disabilities,” library management 35, no. 8/9 (2014): 565–81, https://doi.org/10.1108/lm-11-2013-0107. 9 tatiana i. solovieva and jeremy m. bock, “monitoring for accessibility and university websites: meeting the needs of people with disabilities,” journal of postsecondary education and disability 27, no. 2 (2014): 113–27, http://search.proquest.com/docview/1651856804?accountid=9744. 10 stephanie l. maatta smith, “web accessibility assessment of urban public library websites,” public library quarterly 33, no. 3 (2014): 187–204, https://doi.org/10.1080/01616846.2014.937207. 11 yan quan liu, arlene bielefeld, and peter mckay, “are urban public libraries’ websites accessible to americans with disabilities?,” universal access in the information society 18, no. 1 (2019): 191–206, https://doi.org/10.1007/s10209-017-0571-7. 12 liu, bielefeld, and mckay, “are urban public library websites accessible.” 13 mary frances theofanos and j. redish, “bridging the gap: between accessibility and usability,” interactions 10, no. 6 (2003): 36–51, https://doi.org/10.1145/947226.947227. 14 jonathan lazar, a. dudley-sponaugle, and k. d. greenidge, “improving web accessibility: a study of webmaster perceptions,” computers in human behavior 20, no. 2 (2004): 269–88, https://doi.org/10.1016/j.chb.2003.10.018. 15 foley, “exploring the design,” 365. 16 david a. bradbard, cara peters, and yoana caneva, “web accessibility policies at land-grant universities,” internet & higher education 13, no. 4 (2010): 258–66, https://doi.org/10.1016/j.iheduc.2010.05.007. 17 mary cassner, charlene maxey-harris, and toni anaya, “differently able: a review of academic library websites for people with disabilities,” behavioral & social sciences librarian 30, no. 1 (2011): 33–51, https://doi.org/10.1080/01639269.2011.548722. 18 liu, bielefeld, and mckay, “are urban public library websites accessible,” 195.
ontology for the user-learner profile personalizes the search analysis of online learning resources: the case of thematic digital universities marilou kordahi information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.13601 marilou kordahi (marilou_kordahi@yahoo.fr) is assistant professor, faculty of business administration and management, saint-joseph university of beirut, and associate researcher, paragraph research laboratory, paris 8 university. © 2022. abstract we hope to contribute to the field of research in information technology and digital libraries by analyzing the connections between thematic digital universities and digital user-learner profiles. thematic digital universities are similar to digital libraries, and focus on creating and indexing open educational resources, as well as improving learning in the information age. the digital user profile relates to the digital representation of a person’s identity and characteristics. in this paper we present the design of an ontology for the digital user-learner profile (ontoulp) and its application program. ontoulp is used to structure a user-learner’s digital profile. the application provides each user-learner with tailor-made analyses based on informational behaviors, needs, and preferences. we rely on an exploratory research approach and on methods of ontologies, user modeling, and semantic matching to design the ontoulp and its application program. any user-learner could use the ontoulp and its application program.
introduction more online learning environments are supporting the creation and dissemination of quality open educational resources (oer) to facilitate change in the education sector, improve education, ensure longlife learning, reduce cost, and other motives.1 in 2002, the united nations educational, scientific and cultural organization (unesco) recommended the definition of oer as follows: “the open provision of educational resources, enabled by information and communication technologies, for consultation, use and adaptation by a community of users for non-commercial purposes.”2 the william and flora hewlett foundation defined oer as “freely licensed, remixable learning resources—[they] offer a promising solution to the perennial challenge of delivering high levels of student learning at lower cost.”3 in 2012, unesco noted that oer offer education stakeholders an opportunity to access textbooks and other learning contents to enhance their knowledge and professional experiences.4 education stakeholders may choose oer based on their informational needs, behaviors, and preferences.5 we hope to contribute to the field of research in information technology and digital libraries by analyzing the connections between thematic digital universities and digital user-learner profiles. we are conducting a case study using the digital university engineering and technology.6 in the following we will explain these topics and the interest in the digital university engineering and technology. in 2003, the french ministry of higher education, research, and innovation initiated the creation of thematic digital universities to facilitate the integration and use of information and mailto:marilou_kordahi@yahoo.fr information technology and libraries june 2022 ontology for the user-learner profile | kordahi 2 communication technologies for education in university teaching practices.7 in total, there are six thematic digital universities which are organized by broad disciplines: health sciences and sports, engineering sciences, environment and sustainable development, humanities, economics and management, as well as technical studies. thematic digital universities are similar to digital libraries, and focus on creating and indexing oer, as well as improving learning in the information age.8 although thematic digital libraries are mostly comprised of oer, they also develop complete training programs with some of these resources (e.g., massive open online courses, or moocs). they are partners with canal-u, the video library for higher education, as well as the french national platform for massive open online courses (fun-mooc). thematic digital universities are mostly created for learners and teachers, as they offer complementary educational resources to bachelor, masters, and doctoral programs.9 to date, learners and teachers have free access to most thematic digital universities and corresponding educational resources. registration is not required; however, without registration neither the learner nor the teacher can analyze her/his search for oer based on informational behaviors, needs, and preferences.10 we will focus on the analysis of oer metadata records in the context of thematic digital universities. each oer in the repository holds a metadata record to precisely describe its specifications to the learner or teacher (e.g., the learning level, language, and topics). 
specifications are written according to the institute of electrical and electronics engineers (ieee) standards for learning object metadata (lom),11 lomfr, and suplomfr. lom provides an accurate descriptive schema of a learning object suitable for educational resources12 (e.g., the classification and identification of an educational resource). lomfr and suplomfr are currently applications of lom in the french educational community.13 the digital university engineering and technology attracted our attention because of the following characteristics: clear presentation of its objectives, regular information updates, priority for free access to oer and open data, 3,000 published educational resources, extensive documentation of oer indexing, interoperability of oer and metadata records, and an advanced search engine for oer. each metadata record describes precise information on the oer, including the main title, keywords, descriptive text, educational types (or resources), learning level, copyrights, knowledge domains, topics, authors, and publishers. it is processed and structured with xml language which is human-readable and machine-readable. digital user profiles relate to the digital representation of a person’s identity and characteristics.14 digital identity is the sum of digital traces (or “footprints”) relating to an individual or a community found on the web or in digital systems. digital traces correspond to the user’s profile, browsing history, and contribution actions.15 our focus is the learner who wishes to use the thematic digital universities for tailor-made analysis of retrieved information based on her/his needs and preferences. we offer the learner an option to register on these platforms to track behavior over time while searching for oer. analyses are based on criteria the learner has previously chosen to personalize this search. subsequently, we suggest using the term “digital user-learner profile.” we will do our best to respect the general data protection regulations when collecting information on the digital userlearner profile.16 the general data protection regulations are privacy laws drafted and passed by information technology and libraries june 2022 ontology for the user-learner profile | kordahi 3 the european union that prohibit the processing, storage, or sharing of certain types of information about individuals without their knowledge and consent. the research questions are as follows: 1. in the context of thematic digital universities, how can a user-learner personalize the search for open educational resources according to her/his digital profile? 2. in this same context, what kinds of information can a user-learner analyze in a search for open educational resources according to her/his digital profile? the objectives of this article are to present the preliminary results of work in progress on the design of the ontology for the digital user-learner profile (ontoulp) and its application program, the personalized modeling system for the user-learner profile (psul). we rely on the methods of ontology,17 user modeling,18 and semantic matching.19 the method of ontology is used to describe in a formal manner a set of concepts and objects which represent the meaning of an information system in a specific area and the relationships between these concepts and objects.20 the method of user modeling describes the process of designing and changing a user’s conceptual understanding. it is applied to customize and adjust systems to meet the user’s needs and preferences. 
the method of semantic matching is used to identify and relate a meaning concept (or class) to its homologous concept in tree-like schemas and to consider the concept’s position in these schemas (e.g., mapping a class in an ontology to homologous concepts in metadata records). this relationship can be a one-to-one concept or one-to-many concepts. the ontoulp is a first approach, and it will be used to structure a user-learner’s digital profile in the context of thematic digital universities. we design this ontology for three main reasons: to structure collected and generated information21 (e.g., structuring a user-learner’s learning preferences will enable the identification of learning behaviors and activities), to analyze collected and generated information22 (e.g., analyzing generated information by a user-learner may predict a search for oer), as well as to facilitate relationships between a user-learner and thematic digital universities23 (e.g., analyzing user-learner informational behaviors may improve oer creation and dissemination). the psul will be designed as an application program for the ontoulp. it will be used to provide each user-learner with tailor-made analyses based on informational behaviors, needs, and preferences. psul will include a secure database and web pages, namely those for registering and editing the user-learner profile and its dashboard.24 ontoulp and its application program will offer each registered user-learner an opportunity to analyze the search for oer according to informational behaviors and needs. ontoulp and psul could be implemented in the structure of information systems for educational and research institutions, documentation and information centers, and many others. we will finetune our analysis by relying on a case example—the thematic digital universities. this article comprises six sections. first, we will explain the exploratory research carried out in the context of thematic digital universities. second, we will present the main published works related to the subject of the article. third, we will explain the approach followed to design and write the ontoulp. fourth, we will discuss the creation of the psul application program. fifth, we will demonstrate the integration of the designed ontology and its application program into a information technology and libraries june 2022 ontology for the user-learner profile | kordahi 4 mirror site to perform a technical test. finally, we will discuss the completed work before concluding the article. exploratory research approach this exploratory research is based on an analysis of the literature, a semistructured questionnaire, and an in-depth documentary research. we check the consistency of collected information and identify the need to personalize the search for oer as well as make tailor-made analysis of information. methods used during the first 18 months of the covid-19 pandemic (november 2020–may 2021), we conducted qualitative research to deepen our comprehension of the practices of thematic digital universities. we collected and interpreted primary and secondary information. primary information: we contacted the digital university association and their six thematic digital universities.25 because of their extensive expertise and robust knowledge in leading or managing thematic digital universities, directors and general secretaries were chosen to selfadminister an electronic semistructured questionnaire. we contacted seven individuals and received six responses. 
in this questionnaire, we asked about the following topics: the recent knowledge of thematic digital universities, conditions of access to oer, metadata records indexing as well as user-learner’s expectations. an example of the questionnaire is included in the appendix. secondary information: we analyzed a report by the french general inspectorate of the national education and research administration. we have also studied recently-published scientific articles by anne boyer (2011), deborah arnold (2018),26 and sihem zghidi and mokhtar ben henda (2020). the results and findings will be explained in the following paragraphs. results of information collection we have compared responses to the questionnaire and contents of published documents and articles. for the digital university in health sciences and sports, “resources are mostly accessible to learners from member universities, through an identification system based on the university email address.”27 only a few resources are open to the public. otherwise, according to comments gathered from the other four digital universities and digital university association, “thematic digital universities are part of global movements providing access to oer by promoting open access to knowledge.”28 they are an opportunity for learners to discover new disciplines and explore new areas.29 in fact, “the process for indexing metadata records meets standards for education, such as lom, lomfr and suplomfr.”30 at present, there is no feedback on the use of thematic digital universities platforms. in other words, “thematic digital universities have no information about learners who view oer, because there is no login and password. this is done on purpose to make them as open as possible.”31 these platforms are considered as a means of selftraining with quality assurance, as the documents have been produced and validated by higher education teachers. “thematic digital universities provide a certain flexibility allowing learners to work when and where they want.”32 information technology and libraries june 2022 ontology for the user-learner profile | kordahi 5 findings five thematic digital universities and the digital university association responded to the semistructured questionnaire. two thematic digital universities can track user-learners’ behaviors. these digital universities are related to the disciplines of health and sport in addition to technical studies. to date, four thematic digital universities cannot track user-learners’ interactions based on informational behaviors and preferences. ontoulp and its application program could be implemented in four thematic digital universities, which are related to the disciplines of engineering sciences, environment and sustainable development, humanities, economics, and management. literature review to our best knowledge, published research works addressing this research subject are limited in the context of thematic digital universities.33 we analyze the most recent ontologies and user modeling systems that are close to our research objectives. the main works we use are those of bloom et al. (1984),34 smythe et al. (2001),35 green and panzer (2009),36 and kordahi (2020),37 in addition to kelly and belkin (2002). the work methods and field studies these researchers have developed are useful to design the structure of the ontoulp and the model of its application program. in the following paragraphs, we will explain these works and the relationships with this research article. 
selection of recently published works in 2020 and 2021, kordahi designed an ontology and a personalized dashboard for user learners.38 the objectives of these works were to track individual searches for oer and compare them with a user-learner’s field of work. to design her ontology, kordahi relied on standardized ontologies and validated taxonomies which are used in online learning environments, namely the ims learner information profile (ims lip)39 and bloom’s taxonomy. the personalized dashboard was linked to the user-learner ontology. the designed dashboard was tested technically with its ontology in a digital library environment to examine its performance. kordahi used the methods of ontologies and semantic matching. learner model we are mostly interested in the learner model40 as it “is a model of the knowledge, difficulties and misconceptions of the individual [learner].” 41 as students learn the educational resources they find, the learner model is updated to display their current progress. the model can continue to tailor students’ interactions as they learn. there are several learner models, such as the ims lip.42 we examine the ims lip, which is based on a standardized data model describing a learner’s characteristics. it is mainly used to manage a student’s learning history to discover her/his learning opportunities. ims lip is made from 11 categories that gather learning information: “the identification, goals, qualifications and licenses, activity, interest, competency, accessibility, transcript, affiliation, security, and relationships.”43 this model has been successfully used by many renowned researchers (e.g., paquette 201044) to design a learner model and then adapt it to appropriate contexts. ims lip’s reliability, accuracy, and flexibility match well with the ontoulp motives. we will use it to begin designing the structure of the ontoulp and adapt it to the thematic digital universities context. we will also consider the ieee lom, lomfr, and suplomfr classification fields. this measure will be used to improve semantic matching between the ontoulp and oer metadata records. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 6 taxonomy of educational objectives we examine the user-learner’s educational objectives to meet informational needs and expectations.45 in each oer metadata record, educational objectives are defined based on bloom’s taxonomy (e.g., “understand the context and rules of scientific publication” 46). bloom et al. have developed a taxonomy for educational objectives to classify statements teachers expected students to learn as a result of lessons and instructions. the researchers described a method for allowing students to achieve educational goals while carrying out exercises utilizing the resources of the environment. bloom et al. relied on in-depth qualitative studies to design and validate this taxonomy. bloom’s taxonomy contains the following six major categories related to the cognitive domain: knowledge, comprehension, application, analysis, synthesis, and evaluation. this taxonomy was revised in 2001 by lorin anderson et al.47 bloom’s taxonomy is still in use internationally as in the works of kordahi. integrating bloom’s taxonomy into the ontoulp will enhance the structure of a user-learner’s educational objectives. these educational objectives will be organized in six categories allowing the user-learner to refine her/his informational goals. 
therefore, we will create a mutual link between the user-learner’s educational objectives and oer educational objectives. knowledge domains knowledge organization systems48 are seen as a valuable component for searching for oer.49 our research includes analyses of oer metadata records to establish relationships between their knowledge topics and the user-learner’s topics of interest. in the thematic digital universities’ metadata records, a precise classification is reported respecting both knowledge topics and dewey decimal classification (e.g., geographic information systems (526.028 5)). 50 the dewey decimal classification and relative index 22nd edition,51 published in 2003 by the online computer library center,52 is being used worldwide in digital libraries and by the thematic digital universities.53 in their works published in 2009, green and panzer have developed an ontology to structure knowledge domains.54 this ontology recognizes two classes, which are dewey classes and knowledge topics. we selected the dewey decimal classification for the ontoulp because the thematic digital universities are already using it. we will rely on green and panzer’s ontology to structure the knowledge domains in the ontoulp (e.g., the use of dewey classes and knowledge topics). we will establish relationships between the knowledge domains and user-learner model, allowing the user-learner to choose the most appropriate learning topics. user modeling system the “user modeling system for personalized interaction and tailored retrieval” is useful for analyzing each user-learner’s informational needs and preferences.55 kelly and belkin’s system helps the user to track informational needs over time. it contains three classes of models and a set of interactions. the “general behavioral model” tracks information seeking and user behavior to determine informational needs. the “personal behavioral model” characterizes each user’s information search according to specific preferences and behaviors. the “topical models” are associated with concepts related to each user’s informational behaviors. this model is developed by renowned researchers specialized in information retrieval and corresponds to the objectives of the research article. we will use the structure of kelly and belkin’s model (2002) to design the psul application program, in the context of thematic digital information technology and libraries june 2022 ontology for the user-learner profile | kordahi 7 universities. relationships between both the psul and ontoulp ontology will be established to carry out personalized analyses of oer search. ontoulp ontology ontoulp’s design is based on the works discussed in the previous section. it consists of two stages. we start by writing it. we then describe the ontology and emphasize the relationships between different entities. writing the ontology we write ontoulp with protégé editor and use the hermit inference engine to check the consistency of classes and their relationships with objects. the ontology’s first approach is saved in owl format, which is compliant with the semantic web technologies. ontoulp description the ontology is comprised of five subsystems. these are: user-learner, user-learner model, educational objectives, learning design, and knowledge domains. each subsystem is composed of classes that inherit the attributes of the subsystem on which they depend. for brevity, the figures show the hierarchical representation of these subsystems. 
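the five-subsystem layout just described can be pictured as a small owl class hierarchy. the sketch below is hypothetical: ontoulp is authored in the protégé editor and saved in owl format, and its real class names, namespace iri, root class, and properties are not published here, so every identifier in the sketch is illustrative only. it uses the python library rdflib to declare the five subsystems and serialize them in a semantic-web-compliant format.

```python
# Hypothetical sketch of an ONTOULP-like skeleton: five subsystem classes under
# a common root, serialized as RDF/OWL. Class names, the namespace IRI, and the
# object property are illustrative stand-ins, not the published ontology.
# Requires the third-party package rdflib (pip install rdflib).
from rdflib import Graph, Namespace, RDF, RDFS, OWL, Literal

ULP = Namespace("http://example.org/ontoulp#")  # placeholder IRI
g = Graph()
g.bind("ulp", ULP)

subsystems = {
    "UserLearner": "user-learner",
    "UserLearnerModel": "user-learner model",
    "EducationalObjectives": "educational objectives",
    "LearningDesign": "learning design",
    "KnowledgeDomains": "knowledge domains",
}

g.add((ULP.ProfileSubsystem, RDF.type, OWL.Class))  # illustrative root class
for name, label in subsystems.items():
    cls = ULP[name]
    g.add((cls, RDF.type, OWL.Class))
    g.add((cls, RDFS.subClassOf, ULP.ProfileSubsystem))
    g.add((cls, RDFS.label, Literal(label)))

# illustrative object property for the relationship described in the text,
# where the user-learner model subsystem conveys structured information
# to the user-learner subsystem
g.add((ULP.conveysStructuredInformationTo, RDF.type, OWL.ObjectProperty))
g.add((ULP.conveysStructuredInformationTo, RDFS.domain, ULP.UserLearnerModel))
g.add((ULP.conveysStructuredInformationTo, RDFS.range, ULP.UserLearner))

print(g.serialize(format="turtle"))
```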
the user-learner subsystem contains all recorded private information on the digital user-learner profile. the classes personal information, identification sessions, and traces provide information about the user-learner’s behavior and search history for oer, e.g., the search duration for oer (see fig. 1). the user-learner model subsystem is responsible for structuring collected information related to learning behaviors and needs, namely the classes identification, interest, learning level (or qualifications and licenses), personal preferences (or accessibility), activities, learning objectives (or goals), affiliation, and network of contacts (or relationships). in the context of thematic digital universities, the resulting subsystem is composed of eight classes instead of eleven. the userlearner model subsystem conveys the structured information to the user-learner subsystem. figure 1 shows the structure of both subsystems, the user-learner and user-learner model. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 8 figure 1. hierarchical representation of both subsystems, the user-learner and user-learner model. the educational objectives subsystem includes cognitive objectives involved in the process of acquiring knowledge. we design their structure by adapting bloom’s taxonomy. the cognitive objectives class includes six interrelated subclasses: remember (or knowledge), understand (or comprehension), apply (or application), analyze (or analysis), synthetize (or synthesis), and evaluate (or evaluation). the cognitive objectives class is enhanced with the ieee lom, lomfr, and suplomfr classification fields enabling the user-learner to choose objectives which best describe their needs and preferences, e.g., the class apply has subclasses design, choose (see fig. 2). information technology and libraries june 2022 ontology for the user-learner profile | kordahi 9 figure 2. hierarchical representation of educational objectives and learning design subsystems. the learning design subsystem is an adaptation of the ims learning design model, in the context of thematic digital universities.56 the learning design subsystem has two main classes: the userlearner’s environment and learning activities. the environment class has six thematic digital universities as subclasses. in a general manner, information about the environment class comes from thematic digital universities platforms (e.g., the viewed metadata records). the learning activities class has resources as a subclass. the resources subclass is also enriched with the ieee lom, lomfr, and suplomfr classification fields to complete its structure and meet the userlearner’s needs and expectations. further, we have connected the learning activities with cognitive objectives classes to ensure continuity between them (e.g., the subclass experimentation is associated with subclass analyze). figure 2 illustrates the main structure of both subsystems, the learning objectives and learning design. the knowledge domains subsystem contains the main class dewey decimal classification and class contacts. this main class has two subclasses: dewey classes, with the corresponding divisions as subclasses, and knowledge topics, with the corresponding subtopics as subclasses (e.g., science topic corresponds to dewey class 500, manufacturing subtopic corresponds to division 670). information technology and libraries june 2022 ontology for the user-learner profile | kordahi 10 figure 3. 
hierarchical representation of the subsystem knowledge domains. the subclass knowledge topics is related to the subclass user-learner’s learning topics to improve informational behavior analyses. the class contacts is linked to the subclass user-learner’s network of contacts to analyze the strength or weakness of networks between the user-learner and oer publishers/authors (see fig. 1). the subsystem knowledge domains can deal with questions which belong to different levels in the ontoulp. for example, which learning topics is the user-learner looking for? which network of contacts is the user-learner interested in? what are the activities related to the user-learner learning topics? what keywords searched relate to the user-learner’s learning topics?57 in figure 3, we show some of the subsystem’s elements. personalized modeling system for the user-learner profile the psul is based on the works discussed in the previous sections. it is written with php, javascript, and xml, computing languages for the web. this new modeling system comprises three classes of models: the general behavioral, personal behavioral, and topical (see fig. 4). the general behavioral model has two roles. it registers a user-learner’s digital profile in order to determine informational needs and preferences for oer. it also collects informational behaviors of a user-learner while viewing oer metadata records for tailor-made analyses. the general information technology and libraries june 2022 ontology for the user-learner profile | kordahi 11 behavioral model includes the ontology ontoulp as well as user-learner registration and editing pages. the registration page contains relevant information about a user-learner, an option to accept or reject data collection, and a list of choices for behavioral analyses. once registered, the user-learner can modify her/his profile from the editing page. both pages are mapped to the ontoulp to populate criteria fields. the user-learner profile information is stored in a secure database (as described in the introduction). the personal behavioral model is used to analyze information according to the registered digital user-learner profile and informational behaviors. it contains a set of queries to collect and tailor information for each user-learner. the sources of information are the general behavioral model and oer metadata records. this model is designed based on analyses of the general behavioral model. when a user-learner begins searching for oer, the general behavioral model provides the personal behavioral model with all profile information as well as the history of oer search. this information is transmitted to make an adjustment to the personalized user-learner profile. the user-learner profile changes as the personal behavioral model receives more information from the general behavioral model. informational interactions connect the personal behavioral model to topical models. the topical models bring together all analyses of oer search for each user-learner.58 they are inferred from the personal behavioral model. informational interactions connect the topical models to the general behavioral model. for now, we have designed four topical models and present their outcome in the user-learner dashboard page. this page may be used as a practical dashboard providing feedback to each user-learner, who can use these analyses to adjust or make changes in the profile or the oer search. 
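the kind of tailor-made analysis the personal behavioral model feeds into can be illustrated with a short script before the four topical models are described below. the sketch is hypothetical: psul itself is written with php, javascript, and xml, and the element names in the sample record are simplified stand-ins for a real lom/lomfr metadata record; it parses one viewed record, extracts its knowledge topics, and compares them with the topics of interest stored in a registered user-learner’s profile.

```python
# Hypothetical sketch of one tailor-made analysis step. The XML element names
# below are simplified stand-ins for a LOM/LOMFR metadata record, and the
# profile structure is illustrative; PSUL itself is implemented with PHP,
# JavaScript, and XML. Uses only the Python standard library.
import xml.etree.ElementTree as ET
from collections import Counter

VIEWED_RECORD = """
<record>
  <title>introduction to geographic information systems</title>
  <classification scheme="dewey">526</classification>
  <topic>technology</topic>
  <topic>engineering sciences</topic>
</record>
"""

# fields a registered user-learner filled in on the registration page
profile = {"topics_of_interest": {"technology", "management and public relations"}}

def analyze_viewed_record(xml_text: str, interests: set[str]) -> Counter:
    """Count how the viewed record's topics match the profile's interests."""
    root = ET.fromstring(xml_text)
    topics = {t.text.strip().lower() for t in root.findall("topic") if t.text}
    matches = Counter()
    for topic in topics:
        matches["matched" if topic in interests else "unmatched"] += 1
    return matches

if __name__ == "__main__":
    result = analyze_viewed_record(VIEWED_RECORD, profile["topics_of_interest"])
    print(dict(result))  # e.g. {'matched': 1, 'unmatched': 1}
```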
topical model 1 is used to synthesize each user-learner’s search history and to suggest a profile adjustment. the suggested adjustment is based on analyses of user-learner behavioral trends.59 topical model 2 allows each user-learner to examine the list of knowledge topics which have caught her/his attention. it contains two separate lists describing viewed oer metadata records and matching them to the chosen topics of interest. topical model 3 shows comparative analyses between the user-learner’s preference criteria and viewed metadata records. the user-learner can interact with this model by comparing the chosen topics of interest to the viewed knowledge topics. the user-learner can also compare the chosen learning activities to the viewed teaching pedagogies. the teaching pedagogies as well as knowledge topics are extracted from oer metadata records (see fig. 5a). topical model 4 highlights each user-learner’s interest based on the keyword search volume. the user-learner can interact with this model by studying the relationships between searched keywords and chosen topics of interest (see fig. 5a and fig. 5b). figure 4 shows the diagram of psul as explained in the paragraph. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 12 figure 4. the psul diagram based on the kelly and belkin’s system (2002).60 ontoulp and its modeling system in the context of a thematic digital university for now, ontoulp and its application program are implemented in the digital university engineering and technology private platform which is hosted on a private server. we conducted a technical test to mainly assess ontoulp’s precision and performance. the digital university’s team has sent us a complete archive of their oer metadata records. these oer metadata records are saved on the private server with the digital university engineering and technology platform. once a user-learner is registered to this platform, she/he can carry out actions through the psul. for example, these actions are a search by keyword, personalization of profile, tailored-made analysis of oer search, and visualization of analyses in the dashboard. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 13 figure 5a. screenshot of a section of the dashboard. the bar chart shows comparative analyses between a user-learner’s topic of interest and knowledge topics. the knowledge topics are extracted from the viewed oer metadata records. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 14 figure 5b. screenshot of a section of the dashboard. the pie chart highlights a user-learner’s interest based on a keyword search volume. the bar chart shows comparative analyses between a user-learner learning activities and viewed teaching pedagogies. the keywords are extracted from the search. the teaching pedagogies are extracted from oer metadata records. to avoid making the article longer, in figures 5a and 5b, we show brief results of a technical test. in this example, the user-learner’s identity is fictitious, or the user-learner’s persona is a construct.61 information technology and libraries june 2022 ontology for the user-learner profile | kordahi 15 in other words, the user-learner’s identity is not real, it is fabricated to conduct and complete the technical test. 
when registering, this user-learner has selected the technology topic (dewey class 600) in addition to the management and public relations subtopic (dewey division 650). this userlearner has also selected all topical models. during a viewing session, this user-learner chose to search for oer while using a few keywords. the keywords were chosen according to the userlearner’s profile and in order to continue the technical test. discussion and conclusion the ontology for the digital user-learner profile is a first approach based on the semantic web. it is designed for the personalization of interactions and retrieval of tailored information. we have combined standardized and validated resources, such as the ims lip, bloom’s taxonomy, and knowledge domains ontology, to allow the user-learner’s search analyses. we have discussed the design of a new application program prototype allowing a user-learner to analyze the search for oer according to her/his digital profile. psul provides automated real-time feedback based on the user-learner’s search history and information she/he has inserted about herself/himself. we have then demonstrated the integration of the ontoulp and psul into a mirror site to perform a technical test. the ontology’s main characteristics are flexibility and adaptability. while designing ontoulp, we have reused or restructured resources to allow its use in other thematic digital universities and online learning environments, including digital libraries. another advantage of ontoulp is the application of several information processing techniques. for example, a registered user-learner can self-assess her/his search for oer by keywords. she/he can also analyze the relevance of the search for oer through the psul. we have successfully overcome three essential limitations. the first limitation concerns the literature on the subject (see literature review section). while contributing to the field of research in information technology and digital libraries, this work has also drawn on disciplines as diverse as those of education as well as cognitive, social, and human sciences. the terminological definitions of disciplines, concepts, and even methods vary over decades or centuries, and among groups of researchers. we have made every effort to define the different terms correctly and to cite the corresponding researchers. the second difficulty relates to the design of ontoulp. published works dealing with this topic are rare. we used an exploratory research approach and the published works of renowned international researchers to fine-tune our study (see the exploratory research approach and literature review sections). we then determined the classes and objects as well as relationships between them. the third constraint concerns the design of the psul by following the thematic digital universities policies and respecting the general data protection regulations. according to the regulations, we have opted for an optional registration to thematic digital universities and to collecting information on the digital user-learner profile. thus, the user-learner will always have the possibility of registering to these platforms to make a tailor-made information analysis according to the digital profile. as we conclude our work, we have a plan to focus our research and initiatives in the following areas. firstly, we will further deepen our study of ontoulp classes to further increase their precision. 
we will also examine the personalization of oer searches based on the uses and practices of algorithms in ontoulp.62 for example, by relying on a newer version of the ontology, we will identify the topics likely to interest a specific user-learner. we will implement this newer version in some thematic digital universities to perform technical tests. second, we will conduct qualitative and quantitative studies to analyze participants' behavior while using ontoulp and its application program in the context of thematic digital universities. for example, we will examine how many participants would choose to use ontoulp and the psul and how many would not (e.g., the usefulness of ontologies to participants). we will analyze the behavior of individuals with digital personae and make connections between their searches for oer.63 we will study their profiles, behaviors, and interests to ultimately suggest oer (e.g., through recommendation systems). we will also analyze how participants' behavior and feedback may affect future findings. participants will be selected in advance to contribute to these studies. third, we will study the effects of ontoulp and psul practices on the thematic digital universities. this study will concern an analysis of the thematic digital universities' search engines and user-learners' needs. for example, exploratory research will allow us to better understand user-learners' informational needs and expectations when using the oer search engines. we will analyze the design of oer search engines in light of these needs and expectations. we will then integrate these findings to suggest alternatives to the thematic digital universities to further improve these search engines.

acknowledgments
we thank the digital university association and the thematic digital universities for their elaborate and enlightening explanations concerning the platforms. we thank the reviewers and claude baltz, emeritus professor in information and communication sciences at paris 8 university, for carefully reviewing this article and enriching it with their expert observations. thanks to mohammad hajj hussein, communication and it engineer, for his help programming the dashboard.

appendix: semistructured questionnaire example
email subject: digital university engineering and technology
dear sir, madam,
i am affiliated with the paragraph research laboratory at paris 8 university (laboratoire de recherche paragraphe, université paris 8). i am writing to you to gather further information concerning the digital university engineering and technology. the objective of this semistructured questionnaire is to deepen my understanding of the practices of the digital university engineering and technology in order to write a research article and contribute to its improvement. i would be grateful if you could answer the following questions:
• what are your responsibilities at the digital university engineering and technology?
• do the thematic digital universities as well as the digital university engineering and technology provide "open" educational resources?
• are the educational resources accessible only to students enrolled in the training programs of partner universities?
• how is access to the educational resources provided?
• do the educational resources follow document processing for their indexing? • is the document processing specific to the thematic digital universities? • what are the expectations of “users” searching for educational resources? thank you in anticipation sincerely yours, marilou kordahi information technology and libraries june 2022 ontology for the user-learner profile | kordahi 18 endnotes 1 “cape town open education declaration: unlocking the promise of open educational resources,” 2007, http://www.capetowndeclaration.org/read-the-declaration. 2 unesco, “forum on the impact of open courseware for higher education in developing countries,” (2002): 24, http://unesdoc.unesco.org/images/0012/001285/128515e.pdf. 3 william and flora hewlett foundation, “open education,” accessed april 5, 2022, https://hewlett.org/strategy/open-education. 4 unesco, “2012 paris oer declaration,” 2012, http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaratio n.html. 5 camille thomas, kimberly vardeman, and jingjing wu, “user experience testing in the open textbook adaptation workflow,” information technology and libraries journal 40, no. 1 (2021): 1–18, https://doi.org/10.6017/ital.v40i1.12039. 6 digital university engineering and technology, “open educational resources for engineering and technology,” accessed april 5, 2022, https://unit.eu. 7 jean delpech de saint guilhem, sonia dubourg-lavroff, and jean-yves de longueau, “thematic digital universities,” general inspectorate of the national education and research administration, 2016, https://www.enseignementsuprecherche.gouv.fr/cid104387/www.enseignementsup-recherche.gouv.fr/cid104387/lesuniversites-numeriques-thematiques.html. 8 asim ullah, shah khusro, and irfan ullah, “bibliographic classification in the digital age: current trends & future directions,” information technology and libraries 36, no. 3 (2017): 48–77, https://doi.org/10.6017/ital.v36i3.8930; anne boyer, “thematic digital universities: report,” sciences et technologies de l'information et de la communication pour l'éducation et la formation 18, no. 1 (2011): 39–52. 9 sihem zghidi and mokhtar ben henda, “open educational resources and open archives in the open access movement: an educational engineering and scientific research crossed analysis,” distances and mediations of knowledge 31 (2020), https://doi.org/10.4000/dms.5347. 10 diane kelly and nicholas j. belkin, “a user modeling system for personalized interaction and tailored retrieval in interactive ir,” proceedings of the american society for information science and technology 39, no. 1 (2002): 316–25, https://doi.org/10.1002/meet.1450390135. 11 ieee learning technology standards committee, “learning object metadata, final draft standard, 1484.12.1-2002,” http://ltsc.ieee.org/wg12. 12 gregory m. shreve, and marcia lei zeng, “integrating resource metadata and domain markup in an nsdl collection,” in international conference on dublin core and metadata applications (2003): 223–29. 
http://www.capetowndeclaration.org/read-the-declaration http://unesdoc.unesco.org/images/0012/001285/128515e.pdf https://hewlett.org/strategy/open-education http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaration.html http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaration.html https://doi.org/10.6017/ital.v40i1.12039 https://www.enseignementsup-recherche.gouv.fr/cid104387/www.enseignementsup-recherche.gouv.fr/cid104387/les-universites-numeriques-thematiques.html https://www.enseignementsup-recherche.gouv.fr/cid104387/www.enseignementsup-recherche.gouv.fr/cid104387/les-universites-numeriques-thematiques.html https://www.enseignementsup-recherche.gouv.fr/cid104387/www.enseignementsup-recherche.gouv.fr/cid104387/les-universites-numeriques-thematiques.html https://doi.org/10.6017/ital.v36i3.8930 https://doi.org/10.4000/dms.5347 https://doi.org/10.1002/meet.1450390135 http://ltsc.ieee.org/wg12 information technology and libraries june 2022 ontology for the user-learner profile | kordahi 19 13 french standardization association, “description standard for the field of education in france – – part 1: description of learning resources (nodefr-1), nf z76-041,” 2019. 14 arthur allison, james currall, michael moss, and susan stuart, “digital identity matters,” journal of the american society for information science and technology 56, no. 4 (2005): 364–72, https://doi.org/10.1002/asi.20112. 15 katalin feher, “digital identity and the online self: footprint strategies – an exploratory and comparative research study.” journal of information science 47, no. 2 (2021): 192–205. https://doi.org/10.1177/0165551519879702. 16 robyn caplan and danah boyd, “who controls the public sphere in an era of algorithms,” mediation, automation, power (2016), https://www.datasociety.net/pubs/ap/mediationautomationpower_2016.pdf. 17 thomas r. gruber, “a translation approach to portable ontology specifications,” knowledge acquisition 5, no. 2 (1993): 199–220, https://doi.org/10.1006/knac.1993.1008. 18 gerhard fischer, “user modeling in human–computer interaction,” user modeling and useradapted interaction 11, no. 1 (2001): 65–86, https://doi.org/10.1023/a:1011145532042. 19 yannia kalfoglou and marco schorlemmer, “ontology mapping: the state of the art,” the knowledge engineering review 18, no. 1 (2003): 1–31, https://doi.org/10.1017/s0269888903000651. 20 tom gruber, “collective knowledge systems: where the social web meets the semantic web,” web semantics: science, services and agents on the world wide web 6 no. 1 (2008): 4–13, https://doi.org/10.1016/j.websem.2007.11.011. 21 peter ingwersen, “search procedures in the library – analysed from the cognitive point of view,” journal of documentation 38, no. 3 (1982): 165–97, https://doi.org/10.1108/eb026727. 22 tefko saracevic, amanda spink, and mei-mei wu, “users and intermediaries in information retrieval: what are they talking about?” in user modeling: proceedings of the sixth international conference (vienna: springer, 1997): 43–54. 23 núria ferran, enric mor, and julià minguillón, “towards personalization in digital libraries through ontologies,” library management 26, no. 4/5 (2005): 206–17. https://doi.org/10.1108/01435120510596062. 24 katrien verbert, erik duval, joris klerkx, sten govaerts, and josé luis santos, “learning analytics dashboard applications,” american behavioral scientist 57, no. 10 (2013): 1500– 1509, https://doi.org/10.1177/0002764213479363. 
25 digital university association, “open educational resources for all,” accessed april 5, 2022, https://univ-numerique.fr. https://doi.org/10.1002/asi.20112 https://doi.org/10.1177/0165551519879702 https://www.datasociety.net/pubs/ap/mediationautomationpower_2016.pdf https://doi.org/10.1006/knac.1993.1008 https://doi.org/10.1023/a:1011145532042 https://doi.org/10.1017/s0269888903000651 https://doi.org/10.1016/j.websem.2007.11.011 https://doi.org/10.1108/eb026727 https://doi.org/10.1108/01435120510596062 https://doi.org/10.1177/0002764213479363 https://univ-numerique.fr/ information technology and libraries june 2022 ontology for the user-learner profile | kordahi 20 26 deborah arnold, “the french thematic digital universities – a 360° perspective on open and digital learning,” in european distance and e-learning network conference proceedings, no. 1 (2018): 370–78. 27 director of the digital university in health and sport messaged author, may 3, 2021. 28 director of the virtual university of environment and sustainable development messaged author, january 6, 2021. 29 director of the digital university in economics and management messaged author, december 08, 2020. 30 general secretary of the open university of the humanities messaged author, may 1, 2021. 31 member of digital university association messaged author, december 18, 2020. 32 director of the digital university engineering and technology messaged author, december 11, 2020. 33 laecio araujo costa, leandro manuel pereira sanches, ricardo josé rocha amorim, laís do nascimento salvador, and marlo vieira dos santos souza, “monitoring academic performance based on learning analytics and ontology: a systematic review,” informatics in education 19, no. 3 (2020): 361–97. 34 benjamin s. bloom, david r. krathwohl, and bertram b. masia, taxonomy of educational objectives: the classification of educational goals (new york: longman, 1984). 35 colin smythe, frank tansey, and robby robson, “ims learner information package. best practice & implementation guide,” ims global learning consortium, 2001. 36 rebecca green and michael panzer, “the ontological character of classes in the dewey decimal classification,” the library, (2009), https://www.ergonverlag.de/isko_ko/downloads/aiko_vol_12_2010_25.pdf 37 marilou kordahi, «le changement de l’apprentissage, l’ontologie du profil de l’utilisateurapprenant, » management des technologies organisationnelles, 10 (2020): 73–88. 38 marilou kordahi, “information literacy: ontology structures user-learner profile in online learning environment,” in seventh european conference on information literacy, (2021): 130, http://ecil2021.ilconf.org/wpcontent/uploads/sites/9/2021/09/ecil2021_book_of_abstracts_final_v3.pdf#page=149. 39 “ims learner information package accessibility for lip best practice and implementation guide,” ims global learning consortium, last revised june 18, 2003, https://www.imsglobal.org/accessibility/acclipv1p0/imsacclip_bestv1p0.html. 40 judy kay, “learner know thyself: student models to give learner control and responsibility,” in proceedings of international conference on computers in education (1997): 17–24. 
https://www.ergon-verlag.de/isko_ko/downloads/aiko_vol_12_2010_25.pdf https://www.ergon-verlag.de/isko_ko/downloads/aiko_vol_12_2010_25.pdf http://ecil2021.ilconf.org/wp-content/uploads/sites/9/2021/09/ecil2021_book_of_abstracts_final_v3.pdf#page=149 http://ecil2021.ilconf.org/wp-content/uploads/sites/9/2021/09/ecil2021_book_of_abstracts_final_v3.pdf#page=149 https://www.imsglobal.org/accessibility/acclipv1p0/imsacclip_bestv1p0.html information technology and libraries june 2022 ontology for the user-learner profile | kordahi 21 41 susan bull, “supporting learning with open learner models,” in proceedings of 4th hellenic conference with international participation information and communication technologies in education (2004): 47–61. 42 peter dolog and wolfgang nejdl, “challenges and benefits of the semantic web for user modelling,” in proceedings of the workshop on adaptive hypermedia and adaptive web-based systems (ah2003) at 12th international world wide web conference (2003). 43 ims global learning consortium, “ims learner information package accessibility for lip best practice and implementation guide,” para. 2. 44 gilbert paquette, “ontology-based educational modelling-making ims-ld visual,” technology, instruction, cognition & learning 7, no. 3–4 (2010): 263–93. 45 john seely brown and richard p. adler, “open education, the long tail, and learning 2.0,” educause review 43, no. 1 (2008): 16–20. 46 open university of the humanities, “how to write and publish a scientific article,” accessed on april 5, 2022, https://uoh.fr/front/noticefr/?uuid=6a063dd7-3a02-482a-9857934501f7c82d. 47 lorin w. anderson, david r. krathwohl, peter w. airiasian, kathleen a. cruikshank, richard e. mayer, paul r. pintrich, james raths, and merlin c. wittrock. a taxonomy for learning, teaching and assessing: a revision of bloom’s taxonomy of educational objectives (new york: longman publishing group, 2001). 48 birger hjørland, “theories are knowledge organizing systems (kos).” knowledge organization 42, no. 2 (2017): 113–28, https://doi.org/10.5771/0943-7444-2015-2-113. 49 walter moreira and daniel martínez-ávila, “concept relationships in knowledge organization systems: elements for analysis and common research among fields,” cataloging & classification quarterly 56, no. 1 (2018): 19–39, https://doi.org/10.1080/01639374.2017.1357157. 50 wayne a. wiegand, “the ‘amherst method’: the origins of the dewey decimal classification scheme.” libraries & culture 33, no. 2 (1998): 175–94. 51 melvil dewey, dewey decimal classification and relative index, ed. joan s. mitchell, julianne beall, giles martin, and winton e. matthews, 22nd ed., (dublin, ohio: oclc, 2003). 52 joan s. mitchell, “ddc 22: dewey in the world, the world in dewey,” advances in knowledge organization 9 (2004): 139–45. 53 hamid saeed and abdus sattar chaudhry, “using dewey decimal classification scheme (ddc) for building taxonomies for knowledge organisation,” journal of documentation 58, no. 5 (2002): 575–83. 54 rebecca green and michael panzer, “the interplay of big data, worldcat, and dewey,” advances in classification research online 24, no. 1 (2013): 51–58. https://doi.org/10.5771/0943-7444-2015-2-113 https://doi.org/10.1080/01639374.2017.1357157 information technology and libraries june 2022 ontology for the user-learner profile | kordahi 22 55 kelly and belkin, “a user modeling system,” 319. 
56 rob koper and colin tattersall, eds., learning design: a handbook on modelling and delivering networked education and training (heidelberg: springer science and business media, 2005).
57 david beer, "envisioning the power of data analytics," information, communication & society 21, no. 3 (2018): 465–79, https://doi.org/10.1080/1369118x.2017.1289232.
58 charles lang, george siemens, alyssa wise, and dragan gasevic, eds., handbook of learning analytics (society for learning analytics and research, 2017), https://doi.org/10.18608/hla17.
59 joris klerkx, katrien verbert, and erik duval, "learning analytics dashboards," in handbook of learning analytics, ed. charles lang, george siemens, alyssa wise, and dragan gasevic (society for learning analytics and research, 2017), 143–50, https://doi.org/10.18608/hla17.
60 kelly and belkin, "a user modeling system," 319.
61 roger clarke, "the digital persona and its application to data surveillance," the information society 10, no. 2 (1994): 77–92, https://doi.org/10.1080/01972243.1994.9960160.
62 ahu sieg, bamshad mobasher, and robin burke, "web search personalization with ontological user profiles," in proceedings of the sixteenth acm conference on conference on information and knowledge management (2007): 525–34, https://doi.org/10.1145/1321440.1321515.
63 roger clarke, "persona missing, feared drowned: the digital persona concept, two decades later," information technology & people 27, no. 2 (2014): 182–207, https://doi.org/10.1108/itp-04-2013-0073.

topic modeling as a tool for analyzing library chat transcripts
hyunseung koh and mark fienup
information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13333
hyunseung koh (hyunseung.koh@uni.edu) is an assessment librarian and assistant professor of library services, university of northern iowa. mark fienup (mark.fienup@uni.edu) is an associate professor in the computer science department, university of northern iowa. © 2021.

abstract
library chat services are an increasingly important communication channel to connect patrons to library resources and services. analysis of chat transcripts could provide librarians with insights into improving services. unfortunately, chat transcripts consist of unstructured text data, making it impractical for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing tools.
as a stepping-stone toward a more sophisticated chat transcript analysis tool, this study investigated the application of different types of topic modeling techniques to analyze one academic library’s chat reference data collected from april 10, 2015, to may 31, 2019, with the goal of extracting the most accurate and easily interpretable topics. in this study, topic accuracy and interpretability—the quality of topic outcomes—were quantitatively measured with topic coherence metrics. additionally, qualitative accuracy and interpretability were measured by the librarian author of this paper depending on the subjective judgment on whether topics are aligned with frequently asked questions or easily inferable themes in academic library contexts. this study found that from a human’s qualitative evaluation, probabilistic latent semantic analysis (plsa) produced more accurate and interpretable topics, which is not necessarily aligned with the findings of the quantitative evaluation with all three types of topic coherence metrics. interestingly, the commonly used technique latent dirichlet allocation (lda) did not necessarily perform better than plsa. also, semi-supervised techniques with human-curated anchor words of correlation explanation (corex) or guided lda (guidedlda) did not necessarily perform better than an unsupervised technique of dirichlet multinomial mixture (dmm). last, the study found that using the entire transcript, including both sides of the interaction between the library patron and the librarian, performed better than using only the initial question asked by the library patron across different techniques in increasing the quality of topic outcomes. introduction with the rise of online education, library chat services are an increasingly important tool for student learning.1 library chat services have the potential to support student learning, especially for distant learners who have a lack of opportunity to come and learn about library and research skills in person. in addition, unlike traditional in-person reference services whose use has declined drastically, library chat services have become an important communication channel that connects patrons to library resources, services, and spaces.2 quantitative and qualitative analysis of chat transactions could provide librarians with insights into improving the quality of these resources, services, and spaces. for example, in order to maximize patrons’ satisfaction, librarians could identify or evaluate quantitative and qualitative mailto:hyunseung.koh@uni.edu mailto:mark.fienup@uni.edu information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 2 patterns of chat reference data (e.g., busiest days and times of nondirectional, research-focused questions) and develop a better staffing plan for assigning librarians or student employees to most appropriate days and times. furthermore, these insights could be used to help demonstrate library value by showing external stakeholders how successfully library chat services support students’ needs, which is increasingly in demand for higher education. 
3 in practice, it is burdensome for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing chat software tools, such as libraryh3lp, questpoint, springshare’s libchat, and liveperson.4 currently, in order to obtain rich and hidden insights from large volumes of chat transcripts, librarians need to conduct manual qualitative analysis of chat transcripts with unstructured text data, which requires a lot of time and effort. in an age when library patrons' information needs have been changing, the lack of chat analysis tools that handle large volumes of transcripts hinders librarians’ ability to respond to patrons’ wants and needs in a timely manner.5 in particular, small and medium-sized academic libraries have seen a shortage of librarians and need to hire and train student employees , so librarians’ capabilities for real-time quick and easy analysis and assessment will become critical in helping them take appropriate actions to best meet user needs.6 as part of an effort to develop a quick and easy analysis tool for large volumes of chat transcripts, this study applied topic modeling, which is a statistical technique “for learning the latent structure in document collections” or “a type of statistical model for finding hidden topical patterns of words.”7 we compared outcomes of different types of topic modeling techniques and attempted to propose topic modeling techniques that would be most appropriate in the context of chat reference transcript data. literature review to identify the most appropriate research methods that would facilitate analyzing a vast amount of chat transcripts, this section first introduces literature in relation to research methods used in analyzing chat transcript data in library settings and nonlibrary settings. it follows by discussing different types of topic modeling techniques that have high potential for quick and easy analysis of chat transcripts and their strengths and weaknesses. chat transcript analysis methods in library settings in analyzing library chat transcripts, which are one major data source of library chat service research, researchers have used variants of quantitative and qualitative research methods.8 coding-based content analysis with or without predefined categories is one type of qualitative method.9 the other type of qualitative research method is conversation or language usage analysis but it is not a dominant type of research method, as compared to coding-based qualitative content analysis.10 the most common quantitative methods are simple descriptive countor frequencybased analyses that are accompanied by qualitative coding-based content analyses.11 in some recent research, advanced quantitative research methods, such as cluster analysis and topic modeling techniques, have been used, but they have not been fully explored yet with a wide range of techniques.12 chat transcript analysis methods in nonlibrary settings as shown in table 1, researchers in nonlibrary settings also used research methods in analyzing chat data from diverse technology platforms or contexts, ranging from qualitative manual coding methods to data mining and machine learning techniques. 
topic modeling techniques are one of the chat analysis methods, but again, it seems that they have not been fully explored yet in chat analyses in nonlibrary settings, even though they have been used in a wide range of contexts.13 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 3 table 1. chat transcript analysis applications in non-library settings disciplines platforms/sources of chat transcript data chat transcript analysis methods/tools/techniques education chat rooms and text chat14 qualitative content analysis health social media15 qualitative & quantitative content analysis business in-game chat features and chatbots16 a spell-checker, readability scores, the number of spelling and grammatical errors, linguistic inquiry and word count (liwc) program, logistic regression analysis, decision tree, support vector machine (svm) criminology instant messengers, internet relay chat (irc) channels, internet-based chat logs, and social media17 liwc program, cluster analysis, latent dirichlet allocation (lda) topic modeling techniques and their strengths and weaknesses as a quantitative and statistical method appropriate for analyzing a vast amount of chat transcript data, researchers from both library and nonlibrary settings used topic modeling. as shown in table 2, conventional topic modeling techniques include latent semantic analysis, probabilistic latent semantic analysis, and latent dirichlet allocation, each of which has its unique strengths and weaknesses.18 in order to overcome weaknesses of the conventional techniques, researchers have developed alternative techniques. for example, dirichlet multinomial mixture (dmm) has been proposed to overcome data sparsity problems in short texts.19 as another example, correlation explanation (corex) has been proposed to avoid time and effort to identify topics and their structure ahead of time.20 last, guided lda (guidedlda) has been proposed to improve performance of infrequently occurring topics.21 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 4 table 2. strengths and weaknesses of conventional topic modeling techniques acronym definitions strengths weaknesses latent semantic analysis lsa a document is represented as a vector of numbers found by applying dimensionality reduction (specifically, truncated svd) to summarize the frequencies of cooccurring words across documents. can deal with polysemy (multiple meanings) to some extent. is hard to obtain and to determine the optimal number of topics. probabilistic latent semantic analysis plsa a document is represented as vectors, but these vectors have nonnegative entries summing to 1 such that each component (topic) represents the relative prominence of some probabilistic mixture of words in the corpus. topics in a document are “probabilistic instead of the heuristic geometric distances.”22 can deal with polysemy issues; provides easy interpretation terms of word, document, and topic probabilities. has over-fitting problems. latent dirichlet allocation lda a bayesian extension of plsa that adds assumptions about the relative probability of observing different document's distributions over topics. prevents overfitting problems; provides a fully bayesian probabilistic interpretation. does not show relationships among topics. data, preprocessing, analysis, and evaluation this section first introduces the data used for this study. 
next, it explains the procedures of each stage, from preprocessing to analyzing the chat transcript data with different types of conventional and alternative topic modeling techniques. last, it discusses the quantitative and qualitative evaluation of the quality of topic outcomes across the different topic modeling techniques. for more details, including python scripts, please visit our github page at https://github.com/mfienup/uni-library-chat-study.

data
this study collected the university of northern iowa's rod library chat reference data dated from april 10, 2015, to may 31, 2019 (irb#18-0225). this raw chat data was downloaded from libchat in the form of an excel spreadsheet with 9,942 english chat transcripts, with each transcript as a separate row.

preprocessing
as the first step, this study removed unnecessary components of each chat transcript using a custom python script. components removed were timestamps, patron and librarian identifiers, http tags (e.g., urls), and non-ascii characters. next, it processed the resulting text using python's natural language toolkit (https://www.nltk.org/) and its wordnetlemmatizer function (https://www.nltk.org/_modules/nltk/stem/wordnet.html) to normalize words for further analyses. as the final step, it prepared four types of data sets to identify which type would produce better topic outcomes. the four types of data sets were as follows:
• question-only: consists of only the initial question asked by the library patron in each chat transcript. only the latter 10.7% of the chats recorded in the excel spreadsheet contained an initial question column entry; the remaining chats were assumed to contain their initial question in the patron's first response if it was longer than a trivial welcome message.
• whole-chat: consists of the whole chat transcript from the library patron and librarians.
• whole-chat with nouns and adjectives: consists of only nouns and adjectives as parts of speech (pos) from the whole chat transcripts.
• whole-chat with nouns, adjectives, and verbs: consists of only nouns, adjectives, and verbs as pos from the whole chat transcripts.
the first two data sets were prepared to examine whether the first question initiated by each patron or the whole chat transcript would help produce better topic outcomes. the last two data sets were prepared to examine which retained parts of speech would help produce better topic outcomes.

data analysis with conventional topic modeling techniques
this study first analyzed chat reference data using three conventional topic modeling techniques: latent semantic analysis (lsa), probabilistic latent semantic analysis (plsa), and two versions of latent dirichlet allocation (lda), as shown in table 3. all three are unsupervised topic modeling techniques that automatically analyze text data from a set of documents (in this study, a set of chat transcripts) to infer predominant topics or themes across all documents without human help. a key challenge, or a key parameter to be determined, for unsupervised topic modeling techniques is identifying the optimal number of topics. the study ran the commonly used lda technique on the whole-chat data set with various numbers of topics. fifteen was chosen as the optimal number of topics for this study by calculating and comparing log-likelihood scores across the various numbers of topics; a brief illustrative sketch of this preprocessing and model-selection workflow is given below.
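the authors' released scripts are on their github page; the following is only a minimal sketch of the workflow described above (lemmatization with nltk's wordnetlemmatizer, then fitting scikit-learn's lda at several candidate numbers of topics and comparing approximate log-likelihood scores). the sample transcripts and candidate topic counts here are invented for illustration, and the sketch assumes the nltk wordnet data have been downloaded.

```python
import re
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# toy stand-ins for cleaned chat transcripts
raw_chats = [
    "Hi, can I renew my books online?",
    "How do I request an interlibrary loan for this article?",
    "Is the library open late tonight? I need a study room.",
]

lemmatizer = WordNetLemmatizer()

def normalize(text):
    # keep ascii word characters only, then lemmatize each token
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(lemmatizer.lemmatize(token) for token in tokens)

docs = [normalize(chat) for chat in raw_chats]

# bag-of-words counts for lda (lda expects raw counts rather than tf-idf weights)
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# compare approximate log-likelihood across candidate numbers of topics
for n_topics in (5, 10, 15, 20):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(X)
    print(n_topics, round(lda.score(X), 1))  # higher (less negative) is better
```

for the lsa and plsa runs, the same document-term matrix could instead be tf-idf weighted and passed to truncated svd or to scikit-learn's nmf, as listed in table 3.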
fifteen was chosen as an optimal number of topics for this study by calculating and comparing the log-likelihood scores among various number of topics. https://www.nltk.org/ https://www.nltk.org/_modules/nltk/stem/wordnet.html information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 6 table 3. conventional topic modeling techniques and their sources technique programming language implementation source version used in the study latent semantic analysis python https://pypi.org/project/gensim/ 3.8.1 probabilistic latent semantic analysis python https://scikitlearn.org/stable/modules/generated/ sklearn.decomposition.nmf.html 0.21.3 latent dirichlet allocation (with sklearn) python https://scikitlearn.org/stable/modules/generated/ sklearn.decomposition.latentdirichlet allocation.html 0.21.3 latent dirichlet allocation (with pymallet) python https://github.com/mimno/pymallet dated february 26, 2019 also, before analyzing chat transcript data using lsa and plsa, this study performed a term frequency–inverse document frequency (tf–idf) transformation. tf–idf is a measure of how important a word is to a document (i.e., a single chat transcript) compared to its relevance in a collection of all documents. data analysis with alternative topic modeling techniques in addition to conventional topic modeling techniques, this study analyzed chat reference data using three alternative techniques of dirichlet multinomial mixture (dmm), anchored correlation explanation (corex) and guided lda (guidedlda), as shown in table 4. this study selected dmm as an alternative unsupervised topic modeling technique that has been developed for short texts. also, this study selected anchored corex and guided lda (guidedlda) as semi-supervised topic modeling techniques that require human-curated sets of words, called anchors or seeds, which nudge topic models toward including the suggested anchors. this is based on the assumption that human’s curated techniques would help produce better quality of topics than the unsupervised techniques. for example, the three words “interlibrary,” “loan,” and “request,” or the two words “article” and “database,” are possible anchor words in the context of library chat transcripts. such anchor words can appear anywhere within a chat in any order. https://pypi.org/project/gensim/ https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.nmf.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.nmf.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.nmf.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.latentdirichletallocation.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.latentdirichletallocation.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.latentdirichletallocation.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.latentdirichletallocation.html https://github.com/mimno/pymallet information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 7 table 4. alternative topic modeling techniques and their sources unsupervised vs. 
semisupervised technique programming language implementation source version used in the study unsupervised dirichlet multinomial mixture (dmm) java https://github.com/qiang2 100/sttm 9/27/2019 semi-supervised anchored correlation explanation (corex) python https://github.com/gregve rsteeg/corex_topic 1/21/2020 semi-supervised guided lda using collapsed gibbs sampling python https://guidedlda.readthe docs.io/en/latest/ 10/5/2017 given that a known set of anchor words associated with academic library chats seems unavailable in the literature, this study decided to obtain a list of most meaningful anchor words by combining outcomes of the unsupervised techniques with a human’s follow-up curation, as follows: step 1. execute unsupervised topic modeling techniques step 2. combine resulting topics from all unsupervised topic modeling techniques step 3. identify a list of all possible pairs of words (bi-occurrences), e.g., 28 pairs of words if each topic has 8 words, and all possible combinations of tri-occurrences of words step 4. identify most common bi-occurrences and tri-occurrences of words across all topics by ordering in descending order by frequency step 5. select a set of anchors from these bi-occurrences and tri-occurrences of words by a human’s judgment in terms of selecting a set of anchor words, the librarian author of this paper judged whether combinations of words in each row from step 4 were aligned with frequently asked questions or easily inferable themes in academic library contexts. as shown in table 5, a set of “interlibrary,” “loan,” and “request” was selected as anchor words that are aligned with one frequently asked question about interlibrary loan requests, whereas a set of “access,” “librarian,” and “research” was not selected as anchor words because multiple themes, such as access to resources and asking for research help from librarians, can be inferred. additionally, a set of “hour,” “time,” and “today” was selected over a set of “time,” “tomorrow,” and “tonight” as better or clearer anchor words. https://github.com/qiang2100/sttm https://github.com/qiang2100/sttm https://github.com/gregversteeg/corex_topic https://github.com/gregversteeg/corex_topic https://guidedlda.readthedocs.io/en/latest/ https://guidedlda.readthedocs.io/en/latest/ information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 8 table 5. examples of anchor words that were selected and not selected examples of tri-occurrences of words (note: strikethrough denotes a set of words that were not selected as anchor words) 1 interlibrary loan request 2 hour time today 3 time tomorrow tonight 4 time today tomorrow 5 floor librarian research 6 access librarian research 7 camera digital hub 8 digital hub medium 9 access article journal 10 access article database 11 access account campus 12 research source topic 13 paper research topic quantitative evaluation with topic coherence metrics comparing the quality of topic outcomes across various topic modeling techniques is tricky. 
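the authors' own scripts for the five-step anchor-selection procedure above are on their github page; the counting in steps 3 and 4 can be sketched roughly as follows. the topic lists here are invented examples standing in for the combined output of step 2, and step 5 (the librarian's judgment) remains a manual review of the ranked lists.

```python
# a minimal sketch of steps 3-4: counting bi- and tri-occurrences of words
# across the topics produced by the unsupervised techniques
from collections import Counter
from itertools import combinations

topics = [
    ["request", "loan", "interlibrary", "link", "book", "form", "submit", "fill"],
    ["room", "reserve", "study", "scheduler", "group", "space", "check", "desk"],
    ["hour", "time", "today", "open", "close", "pm", "tomorrow", "tonight"],
]

pair_counts, triple_counts = Counter(), Counter()
for topic in topics:
    words = sorted(set(topic))                    # order within a topic does not matter
    pair_counts.update(combinations(words, 2))    # all 28 pairs for an 8-word topic
    triple_counts.update(combinations(words, 3))  # all 56 triples for an 8-word topic

# step 4: rank candidate anchors by how often they co-occur across topics;
# step 5 applies a human's judgment to these ranked lists
print(pair_counts.most_common(10))
print(triple_counts.most_common(10))
```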
purely statistical and quantitative evaluation techniques, such as held-out log-likelihood measures, have proven to be unaligned with human intuition or judgment with respect to topic interpretability and coherency.23 thus, this study adopted the three topic coherence metrics of tcpmi (normalized pointwise mutual information), tc-lcp (normalized log conditional probability), and tc-nz (number of topic word pairs never observed together in the corpus) that have been introduced by boyd-graber, mimno, and newman; bouma; and lau, newman, and baldwin.24 these three metrics are based on the assumption that the likelihood that two words that co-occur in a topic would also co-occur within a corpus. to utilize the three topic coherence metrics, the study chose a binarized choice (e.g., does a transcript contain two words?) instead of a sliding window of fixed size (e.g., do two words appear within a fixed window of 10 consecutive words?) as a type of how to count term cooccurrences. this decision was made because each chat transcript is relatively short, and a fixed information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 9 window size seemed inconsistent across different type of data sets that included different parts of speech. in terms of the other decision to be made for applying the three topic coherence metrics, this study chose a training corpus of all the chat transcripts instead of external corpuses such as the entire collection of english wikipedia articles that has little in common with average library chat transcripts. qualitative evaluation with human judgment in addition to quantitative evaluation with topic coherence metrics, qualitative accuracy and interpretability were judged by the librarian author of this paper based on whether topics were aligned with frequently asked questions or easily inferable themes in academic library contexts. for example, “find or access book or article” was inferred, from a set of words in topic 1 on lsa in table 6, as an accurate and easily interpretable theme. from a set of words in topic 3 on lda, “reserve study room” and “check out laptop computer” were inferred as two separable, easily interpretable themes. from a set of words in topic 15 on corex with nine anchors, no theme was inferred as an easily interpretable theme. (see table 10 in the results section for all themes inferred from table 6.) information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 10 table 6. examples of topics found by topic modeling techniques topic modeling technique topics (top 15 topics with eight words per topic) note: parenthetical additions are explanations or descriptions and not part of the topic. latent semantic analysis (lsa) topic 1. article book search find access link will check topic 2. renew book article room reserve search journal check topic 3. room renew reserve book study scheduler loan online topic 4. renew request loan interlibrary search room review peer topic 5. loan floor renew access interlibrary request log book topic 6. book open print request search loan renew interlibrary topic 7. print floor open printer color hour research pm topic 8. open hour print search review close peer floor topic 9. print access renew research book loan librarian open topic 10. floor article open book renew print locate database topic 11. article book attach file print database floor check topic 12. 
check book desk laptop answer print shortly open topic 13. answer desk shortly place room database circulation pick topic 14. review peer search reserve log access campus database topic 15. database file attach collection access journal research reserve probabilistic latent semantic analysis (plsa) topic 1. collection special youth contact email number archive department topic 2. book title hold online check pick number reserve topic 3. room reserve study scheduler reservation group rodscheduler (software) space topic 4. search bar click type journal onesearch (a discovery tool) result homepage topic 5. request loan interlibrary link illiad (system) submit inter instruction topic 6. renew online account book today number circulation item topic 7. access link log campus click work online sign topic 8. article journal attach file title access google scholar topic 9. research librarian paper appointment consultation source topic question topic 10. open hour today close pm tomorrow midnight tonight topic 11. check answer place shortly desk laptop student long topic 12. print color printer computer printing mobile release black topic 13. floor locate desk stack main fourth number section information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 11 topic modeling technique topics (top 15 topics with eight words per topic) note: parenthetical additions are explanations or descriptions and not part of the topic. topic 14. database az subject ebsco(database) list business topic access topic 15. review peer journal topic sociology study article result latent dirichlet allocation (lda) with sklearn topic 1. file attach cite citation link article author pdf topic 2. check book renew student item today time member topic 3. room reserve computer laptop study check reservation desk topic 4. book request loan interlibrary check title online copy topic 5. search article database review result type google bar topic 6. student class access iowa course university college fall topic 7. research librarian source paper topic good appointment specific topic 8. email contact chat good librarian work question address topic 9. open hour today check pick hold desk close topic 10. link access click log work campus sign database topic 11. floor locate desk main art music circulation section topic 12. medium digital check video hub desk rent camera topic 13. article journal access title online link education amp topic 14. print printer color card scan document charge job topic 15. answer check place collection shortly special question number dirichlet multinomial mixture (dmm) topic 1. room reserve how will study check floor what topic 2. request loan book interlibrary how article will link topic 3. article access find journal link how search full topic 4. book how find check what online link will topic 5. article find attach file what how will link topic 6. how check open today desk hour will what topic 7. find article what search how research source database topic 8. how print will cite printer link what citation topic 9. search article find how review will database journal topic 10. book find floor how will where call number topic 11. book check how renew will today request what topic 12. research how librarian find what article will email topic 13. find how will contact collection what special email topic 14. access article link log how campus database work topic 15. 
article find will search what link book how anchored correlation explanation (corex) with nine anchor words topic 1. request loan interlibrary illiad (system) form submit inter fill topic 2. study reserve room scheduler hub medium equipment digital information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 12 topic modeling technique topics (top 15 topics with eight words per topic) note: parenthetical additions are explanations or descriptions and not part of the topic. topic 3. search review peer bar result type onesearch (a discovery tool) homepage topic 4. today open hour pm assist close window midnight topic 5. locate floor main where third fourth desk stack topic 6. print printer color printing black white mobile release topic 7. number collection special call phone youth archive xxx topic 8. research librarian appointment consultation paper set xxx transfer topic 9. access database journal article campus full az text topic 10. email will contact work when good who student topic 11. education read school class professor amp teacher child topic 12. topic source cite write apa start citation recommend topic 13. find attach file google what scholar title specific topic 14. click log link left side catid button hand topic 15. shortly place answer check cedar fall iowa northern guidedlda with nine anchor words and confidence 0.75 topic 1. book request loan interlibrary will how check link topic 2. room reserve how check will desk study medium topic 3. search article find how will database book review topic 4. book check how renew today will hour open topic 5. book floor find how check where call locate topic 6. print how computer will printer color desk student topic 7. contact collection will find email special how check topic 8. research librarian find how what will email article topic 9. article access link how log click database find topic 10. article find how access what link attach file topic 11. find chat copy how good online what will topic 12. article find file attach what journal will work topic 13. how check book answer place shortly what find topic 14. book how find what sport link video textbook topic 15. how cite what find citation author article source information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 13 results this section first introduces which topic modeling techniques, as well as which type of data set, performed the best on each of the three topic coherence metrics. it follows by introducing which technique was the best according to human qualitative judgment. quantitative evaluation with topic coherence metrics given that for a topic coherence metric tc-pmi larger values mean more coherent topics, table 7 and its corresponding figure 1 show that corex with anchor words on the whole-chat performed best on tc-pmi. tf–idf & plsa on the whole-chat performed better than lda on the whole-chat. given that for topic coherence metric tc-lcp larger values mean more coherent topics, table 8 and its corresponding figure 2 show that dmm on the whole-chat performed best on tc-lcp. tf– idf & plsa on the whole-chat performed better than lda, even though lda (pymallet) on the whole-chat performed better than tc-idf & plsa on the whole-chat. 
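as context for the numbers reported in tables 7 through 9, the following sketch shows roughly how a per-topic tc-pmi (npmi) score and a tc-nz count can be derived from binarized document co-occurrence over the chat corpus, as described in the methods above. the exact definitions, smoothing, and averaging used in this study follow the cited works and the authors' scripts, so this is only an approximation of the general idea, run on a toy corpus.

```python
import math
from itertools import combinations

def doc_frequencies(docs):
    """map each word and each word pair to the number of documents containing it."""
    single, pair = {}, {}
    for doc in docs:
        words = set(doc.split())
        for w in words:
            single[w] = single.get(w, 0) + 1
        for a, b in combinations(sorted(words), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair

def topic_coherence(topic_words, docs, eps=1e-12):
    single, pair = doc_frequencies(docs)
    n = len(docs)
    npmi_scores, never_together = [], 0
    for a, b in combinations(sorted(topic_words), 2):
        p_a, p_b = single.get(a, 0) / n, single.get(b, 0) / n
        p_ab = pair.get((a, b), 0) / n
        if p_ab == 0:
            never_together += 1          # contributes to the tc-nz count
            continue
        pmi = math.log((p_ab + eps) / (p_a * p_b + eps))
        npmi_scores.append(pmi / -math.log(p_ab + eps))
    tc_pmi = sum(npmi_scores) / max(len(npmi_scores), 1)
    return tc_pmi, never_together        # (tc-pmi, tc-nz) for one topic

docs = ["renew book online", "interlibrary loan request book", "reserve study room"]
print(topic_coherence(["book", "renew", "loan"], docs))
```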
given that for topic coherence metric tc-nz smaller values mean more coherent topics, table 9 and its corresponding figure 3 show that tf–idf & plsa, lda and lda (pymallet) on the wholechat performed best on tc-nz. table 7. tc-pmi comparison of topic modeling techniques on the four types of data sets (with top 15 topics with eight words per topic) topic modeling technique whole-chat whole-chat (noun, adjective, verb) whole-chat (noun, adjective) question-only tf–idf & lsa -0.066 -0.061 -0.063 -0.429 tf–idf & plsa 0.508 0.321 0.494 -0.122 lda (sklearn) 0.378 0.261 0.099 -0.995 lda (pymallet) 0.218 0.262 0.271 -0.091 dmm 0.136 0.22 0.285 0.109 corex without anchor words 0.47 0.497 0.396 -0.584 corex with nine anchor words 0.522 0.534 0.558 -0.401 guidedlda with nine anchor words and confidence 0.75 0.133 0.216 0.262 0.069 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 14 figure 1. tc-pmi comparison of topic modeling techniques on the four types of data sets. table 8. tc-lcp comparison of topic modeling techniques on the four types of data sets (with top 15 topics with eight words per topic) topic modeling technique whole-chat whole-chat (noun, adjective, verb) whole-chat (noun, adjective) question-only tf–idf & lsa -1.114 -1.124 -1.204 -1.675 tf–idf & plsa -0.751 -0.793 -0.893 -1.956 lda (sklearn) -0.789 -0.979 -1.263 -2.827 lda (pymallet) -0.637 -0.767 -0.918 -1.626 dmm -0.546 -0.645 -0.731 -1.159 corex without anchor words -0.868 -0.853 -1.062 -2.618 corex with nine anchor words -0.82 -0.791 -0.884 -2.348 guidedlda with nine anchor words and confidence 0.75 -0.637 -0.686 -0.792 -1.143 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 15 figure 2. tc-lcp comparison of topic modeling techniques on the four types of data sets. information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 16 table 9. tc-nz comparison of topic modeling techniques on the four types of data sets (with top 15 topics with eight words per topic) topic modeling technique whole-chat whole-chat (noun, sdjective, verb) whole-chat (noun, adjective) question-only tf–idf & lsa 0.267 0.267 0.333 1.8 tf–idf & plsa 0 0 0.067 3.8 lda (sklearn) 0 0.467 1.2 7.067 lda (pymallet) 0 0.133 0.267 1.8 dmm 0.067 0 0 0.267 corex without anchor words 0.333 0.067 0.6 7.067 corex with nine anchor words 0.133 0 0.133 5.267 guidedlda with nine anchor words and confidence 0.75 0.2 0.067 0 0.133 figure 3. tc-nz comparison of topic modeling techniques on all four data sets. information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 17 last, all tables 7 to 9 and their corresponding figures 1 to 3 clearly show that the whole-chat data set with all parts of speech was generally the best data set on all the techniques. qualitative evaluation with human judgment as shown in table 10, all techniques had relatively high accuracy and interpretability in terms of straightforward topics or themes in italicized text, such as “interlibrary loan,” “technology,” “hours,” and “room reservations,” where one keyword could represent a whole theme. however, in terms of less-straightforward topics or themes plsa performed better than the other techniques. 
in other words, plsa had the highest number of topics that are aligned clearly with frequently asked questions or are easily inferable themes in academic library contexts. also, plsa had a lower number of unrelated or multiple themes within one topic, whereas other techniques had a higher number of unrelated or multiple themes within one topic. as an example, topic 8 on dmm shows that “print” and “citation” can be inferred as two unrelated themes within one topic. table 10. examples of themes qualitatively inferred from a list of words (a topic) identified by each topic modeling technique topic modeling technique themes inferred from table 6 (note: italics denotes straightforward themes; and strikethrough denotes themes with no interpretability or unrelated, multiple themes within one topic) latent semantic analysis (lsa) topic 1. find or access book or article topic 2. renew book or article; reserve a room; search journal topic 3. renew book online; reserve room; loan topic 4. renew; interlibrary loan; search; room topic 5. renew book; interlibrary loan; floor topic 6. renew; interlibrary loan print book; search topic 7. print color; floor; hours; research topic 8. hours; print; search; peer peer review; floor topic 9. print; renew book; librarian; open hours topic 10. renew book and article, print, floor and locate; database topic 11. print; database; floor topic 12. check out book or laptop; print; open topic 13. circulation desk; room; database topic 14. not clear topic 15. not clear probabilistic latent semantic analysis (plsa) topic 1. contact information of special collection and youth topic 2. not clear topic 3. room reservation topic 4. journal search and onesearch topic 5. interlibrary loan request topic 6. how to renew book online topic 7. working from off campus (not clear) topic 8. journal article via google scholar topic 9. appointment with librarians for research consultations topic 10. open hours topic 11. not clear topic 12. printing information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 18 topic modeling technique themes inferred from table 6 (note: italics denotes straightforward themes; and strikethrough denotes themes with no interpretability or unrelated, multiple themes within one topic) topic 13. stack on the fourth floor topic 14. databases a-z for business including ebsco topic 15. peer reviewed journals for sociology latent dirichlet allocation (lda) with sklearn topic 1. not clear topic 2. not clear topic 3. reserve study room; check out laptop computer topic 4. interlibrary loan online topic 5. search article via databases topic 6. not clear topic 7. appointment with research librarians topic 8. contact librarian via email topic 9. open hours topic 10. database access from off campus topic 11. floor for art and music circulation desk topic 12. rent camera topic 13. access journal article topic 14. printing and charge topic 15. special collection dirichlet multinomial mixture (dmm) topic 1. reserve study room and floor topic 2. interlibrary loan topic 3. search and access article topic 4. find book online topic 5. find article (not clear) topic 6. open hours topic 7. find article and database topic 8. print; citation topic 9. find article & database topic 10. find book with call number topic 11. renew book (not clear) topic 12. email librarians for research help topic 13. special collection (not clear) topic 14. access article/database from on campus topic 15. 
anchored correlation explanation (corex) with nine anchor words
topic 1. interlibrary loan
topic 2. reserve study room; equipment
topic 3. peer-reviewed and onesearch
topic 4. open hours
topic 5. floor location
topic 6. printing
topic 7. special collection and phone number
topic 8. research consultation appointment
topic 9. access database a-z
topic 10. not clear
topic 11. not clear
topic 12. apa citations
topic 13. google scholar (not clear)
topic 14. log in
topic 15. not clear

guidedlda with nine anchor words and confidence 0.75
topic 1. interlibrary loan
topic 2. reserve study room & medium
topic 3. search and find article; databases
topic 4. renew book; hours
topic 5. find book with call number
topic 6. printing
topic 7. special collection
topic 8. email to research librarian
topic 9. access article and databases
topic 10. access article; attach file (not clear)
topic 11. not clear
topic 12. find article and journal; file attach (not clear)
topic 13. not clear
topic 14. find book, video, and textbook about sport
topic 15. citation

discussion

given that different topic modeling techniques performed best depending on the topic coherence metric used, it is not possible to conclude firmly that one technique is better than the others. interestingly, lda, the commonly used technique tested in both sklearn and pymallet in this study, did not consistently outperform tf–idf & plsa. in addition, the semisupervised techniques, anchored correlation explanation (corex) and guided lda (guidedlda), did not necessarily outperform the unsupervised dirichlet multinomial mixture (dmm). last, by human qualitative judgment, plsa performed the best, which aligns with the findings on tc-nz. this might imply that tc-nz is a more appropriate metric than the other metrics for measuring topic coherence in the context of academic library chat transcripts.

in terms of data sets, all three whole-chat data sets significantly outperformed the question-only data set. at the outset of the study, it was conjectured that the initial question of each chat transaction might concentrate the essence of the chat, thereby leading to better performance. clearly this was not the case, possibly because the rest of a chat transcript reinforces a topic by standardizing the vocabulary of the chat's initial question. it was somewhat interesting that varying the parts of speech (pos) retained in the three whole-chat data sets had little effect on the topic modeling analyses. this might imply that topic modeling techniques are sensitive enough to differentiate across parts of speech on their own, leading to good performance regardless of the data set type.

conclusion

this study clearly showed that conventional techniques should also be examined, to avoid the erroneous assumption that newly developed techniques such as lda will always perform better regardless of context.
also, both the quantitative and qualitative evaluations indicate that unsupervised techniques should be weighted equally with semi-supervised techniques that involve human intervention. as a future study, and as in other similar research, it would be meaningful to compare human qualitative judgment with the scores of each metric more rigorously, along with input from more librarians, to confirm (or disconfirm) our preliminary conclusion that tc-nz is the most appropriate topic coherence metric in the context of library chat transcripts.25 it would also be interesting to investigate and examine semi-supervised techniques with different types of anchoring approaches, such as tandem anchoring.26 last, in order to overcome the limitations of this study, it would be valuable to collect a larger and more diverse set of chat reference data and compare the output of topics across different types of institutions (e.g., teaching versus research institutions).

acknowledgments

this project was made possible in part by the institute of museum and library services [national leadership grants for libraries, lg-34-19-0074-19].

endnotes

1 christina m. desai and stephanie j. graves, "cyberspace or face-to-face: the teachable moment and changing reference mediums," reference & user services quarterly 47, no. 3 (spring 2008): 242–55, https://www.jstor.org/stable/20864890; megan oakleaf and amy vanscoy, "instructional strategies for digital reference: methods to facilitate student learning," reference & user services quarterly 49, no. 4 (summer 2010): 380–90, https://www.jstor.org/stable/20865299; shu z. schiller, "chat for chat: mediated learning in online chat virtual reference service," computers in human behavior 65 (july 2016): 651–65, https://doi.org/10.1016/j.chb.2016.06.053; mila semeshkina, "five major trends in online education to watch out for in 2021," forbes, february 2, 2021, https://www.forbes.com/sites/forbesbusinesscouncil/2021/02/02/five-major-trends-in-online-education-to-watch-out-for-in-2021/?sh=3261272521eb.

2 maryvon côté, svetlana kochkina, and tara mawhinney, "do you want to chat? reevaluating organization of virtual reference service at an academic library," reference and user services quarterly 56, no. 1 (fall 2016): 36–46, https://www.jstor.org/stable/90009882; sarah lemire, lorelei rutledge, and amy brunvand, "taking a fresh look: reviewing and classifying reference statistics for data-driven decision making," reference & user services quarterly 55, no. 3 (spring 2016): 230–38, https://www.jstor.org/stable/refuseserq.55.3.230; b. jane scales, lipi turner-rahman, and feng hao, "a holistic look at reference statistics: whither librarians?," evidence based library and information practice 10, no. 4 (december 2015): 173–85, https://doi.org/10.18438/b8x01h.
3 pamela j. howard, "can academic library instant message transcripts provide documentation of undergraduate student success?," journal of web librarianship 13, no. 1 (february 2019): 61–87, https://doi.org/10.1080/19322909.2018.1555504.

4 côté and kochkina, "do you want to chat?"; sharon q. yang and heather a. dalal, "delivering virtual reference services on the web: an investigation into the current practice by academic libraries," journal of academic librarianship 41, no. 1 (november 2015): 68–86, https://doi.org/10.1016/j.acalib.2014.10.003.

5 feifei liu, "how information-seeking behavior has changed in 22 years," nn/g nielsen norman group, january 26, 2020, https://www.nngroup.com/articles/information-seeking-behavior-changes/; amanda spink and jannica heinström, eds., new directions in information behavior (bingley, uk: emerald group publishing limited, 2011).

6 kathryn barrett and amy greenberg, "student-staffed virtual reference services: how to meet the training challenge," journal of library & information services in distance learning 12, no. 3–4 (august 2018): 101–229, https://doi.org/10.1080/1533290x.2018.1498620; robin canuel et al., "developing and assessing a graduate student reference service," reference services review 47, no. 4 (november 2019): 527–43, https://doi.org/10.1108/rsr-06-2019-0041.

7 bhagyashree vyankatrao barde and anant madhavrao bainwad, "an overview of topic modeling methods and tools," in proceedings of international conference on intelligent computing and control systems, 2018, 745–50, https://doi.org/10.1109/iccons.2017.8250563; jordan boyd-graber, david mimno, and david newman, "care and feeding of topic models: problems, diagnostics, and improvements," in handbook of mixed membership models and their applications, eds. edoardo m. airoldi et al. (new york: crc press, 2014), 225–54.

8 miriam l. matteson, jennifer salamon, and lindy brewster, "a systematic review of research on live chat service," reference & user services quarterly 51, no. 2 (winter 2011): 172–89, https://www.jstor.org/stable/refuseserq.51.2.172.

9 kate fuller and nancy h. dryden, "chat reference analysis to determine accuracy and staffing needs at one academic library," internet reference services quarterly 20, no. 3–4 (december 2015): 163–81, https://doi.org/10.1080/10875301.2015.1106999; sarah passonneau and dan coffey, "the role of synchronous virtual reference in teaching and learning: a grounded theory analysis of instant messaging transcripts," college & research libraries 72, no. 3 (2011): 276–95, https://doi.org/10.5860/crl-102rl.

10 paula r. dempsey, "'are you a computer?' opening exchanges in virtual reference shape the potential for teaching," college & research libraries 77, no. 4 (2016): 455–68, https://doi.org/10.5860/crl.77.4.455; jennifer waugh, "formality in chat reference: perceptions of 17- to 25-year-old university students," evidence based library and information practice 8, no. 1 (2013): 19–34, https://doi.org/10.18438/b8ws48.
11 robin brown, "lifting the veil: analyzing collaborative virtual reference transcripts to demonstrate value and make recommendations for practice," reference & user services quarterly 57, no. 1 (fall 2017): 42–47, https://www.jstor.org/stable/90014866; sarah maximiek, elizabeth brown, and erin rushton, "coding into the great unknown: analyzing instant messaging session transcripts to identify user behaviors and measure quality of service," college & research libraries 71, no. 4 (2010): 361–73, https://doi.org/10.5860/crl-48r1.

12 christopher brousseau, justin johnson, and curtis thacker, "machine learning based chat analysis," code4lib journal 50 (february 2021), https://journal.code4lib.org/articles/15660; ellie kohler, "what do your library chats say?: how to analyze webchat transcripts for sentiment and topic extraction," in brick & click libraries conference proceedings (maryville, mo: northwest missouri state university, 2017), 138–48, https://files.eric.ed.gov/fulltext/ed578189.pdf; megan ozeran and piper martin, "'good night, good day, good luck,'" information technology and libraries 38, no. 2 (june 2019): 49–57, https://doi.org/10.6017/ital.v38i2.10921; thomas stieve and niamh wallace, "chatting while you work: understanding chat reference user needs based on chat reference origin," reference services review 46, no. 4 (november 2018): 587–99, https://doi.org/10.1108/rsr-09-2017-0033; nadaleen tempelman-kluit and alexa pearce, "invoking the user from data to design," college & research libraries 75, no. 5 (2014): 616–40, https://doi.org/10.5860/crl.75.5.616.

13 jordan boyd-graber, yuening hu, and david mimno, "applications of topic models," foundations and trends in information retrieval 11, no. 2–3 (2017): 143–296, https://mimno.infosci.cornell.edu/papers/2017_fntir_tm_applications.pdf.

14 ewa m. golonka, medha tare, and carrie bonilla, "peer interaction in text chat: qualitative analysis of chat transcripts," language learning & technology 21, no. 2 (june 2017): 157–78, http://hdl.handle.net/10125/44616; laura d. kassner and kate m. cassada, "chat it up: backchanneling to promote reflective practice among in-service teachers," journal of digital learning in teacher education 33, no. 4 (august 2017): 160–68, https://doi.org/10.1080/21532974.2017.1357512.

15 eradah o. hamad et al., "toward a mixed-methods research approach to content analysis in the digital age: the combined content-analysis model and its applications to health care twitter feeds," journal of medical internet research 18, no. 3 (march 2016): e60, https://doi.org/10.2196/jmir.5391; janet richardson et al., "tweet if you want to be sustainable: a thematic analysis of a twitter chat to discuss sustainability in nurse education," journal of advanced nursing 72, no. 5 (january 2016): 1086–96, https://doi.org/10.1111/jan.12900.
16 shuyuan mary ho et al., "computer-mediated deception: strategies revealed by language-action cues in spontaneous communication," journal of management information systems 33, no. 2 (october 2016): 393–420, https://doi.org/10.1080/07421222.2016.1205924; mina park, milam aiken, and laura salvador, "how do humans interact with chatbots?: an analysis of transcripts," international journal of management & information technology 14 (2018): 3338–50, https://doi.org/10.24297/ijmit.v14i0.7921.

17 abdur rahman, m. a. basher, and benjamin c. m. fung, "analyzing topics and authors in chat logs for crime investigation," knowledge and information systems 39, no. 2 (march 2014): 351–81, https://doi.org/10.1007/s10115-013-0617-y; michelle drouin et al., "linguistic analysis of chat transcripts from child predator undercover sex stings," journal of forensic psychiatry & psychology 28, no. 4 (february 2017): 437–57, https://doi.org/10.1080/14789949.2017.1291707; da kuang, p. jeffrey brantingham, and andrea l. bertozzi, "crime topic modeling," crime science 6, no. 12 (december 2017): 1–12, https://doi.org/10.1186/s40163-017-0074-0; md waliur rahman miah, john yearwood, and siddhivinayak kulkarni, "constructing an inter‐post similarity measure to differentiate the psychological stages in offensive chats," journal of the association for information science and technology 66, no. 5 (january 2015): 1065–81, https://doi.org/10.1002/asi.23247.
18 charu c. aggarwal and chengxiang zhai, eds., mining text data (new york: springer, 2012); rubayyi alghamdi and khalid alfalqi, "a survey of topic modeling in text mining," international journal of advanced computer science and applications 6, no. 1 (2015): 146–53, https://doi.org/10.14569/ijacsa.2015.060121; leticia h. anaya, "comparing latent dirichlet allocation and latent semantic analysis as classifiers" (phd diss., university of north texas, 2011); barde and bainwad, "an overview of topic modeling"; david m. blei, "topic modeling and digital humanities," journal of digital humanities 2, no. 1 (winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-and-digital-humanities-by-david-m-blei/; tse-hsun chen, stephen w. thomas, and ahmed e. hassan, "a survey on the use of topic models when mining software repositories," empirical software engineering 21, no. 5 (september 2016): 1843–919, https://doi.org/10.1007/s10664-015-9402-8; elisabeth günther and thorsten quandt, "word counts and topic models: automated text analysis methods for digital journalism research," digital journalism 4, no. 1 (october 2016): 75–88, https://doi.org/10.1080/21670811.2015.1093270; gabe ignatow and rada mihalcea, an introduction to text mining: research design, data collection, and analysis (new york: sage, 2017); stefan jansen, hands-on machine learning for algorithmic trading: design and implement investment strategies based on smart algorithms that learn from data using python (birmingham: packt publishing limited, 2018); lin liu et al., "an overview of topic modeling and its current applications in bioinformatics," springerplus 5, no. 1608 (september 2016): 1–22, https://doi.org/10.1186/s40064-016-3252-8; john w. mohr and petko bogdanov, "introduction—topic models: what they are and why they matter," poetics 41, no. 6 (december 2013): 545–69, https://doi.org/10.1016/j.poetic.2013.10.001; gerard salton, anita wong, and chung-shu yang, "a vector space model for automatic indexing," communications of the acm 18, no. 11 (november 1975): 613–20, https://doi.org/10.1145/361219.361220; jianhua yin and jianyong wang, "a dirichlet multinomial mixture model-based approach for short text clustering," in proceedings of the twentieth acm sigkdd international conference on knowledge discovery and data mining (new york: acm, 2014), 233–42, https://doi.org/10.1145/2623330.2623715; hongjiao xu et al., "exploring similarity between academic paper and patent based on latent semantic analysis and vector space model," in proceedings of the twelfth international conference on fuzzy systems and knowledge discovery (new york: ieee, 2015), 801–5, https://doi.org/10.1109/fskd.2015.7382045; chengxiang zhai, statistical language models for information retrieval (williston, vt: morgan & claypool publishers, 2018).

19 neha agarwal, geeta sikkaa, and lalit kumar awasthib, "evaluation of web service clustering using dirichlet multinomial mixture model based approach for dimensionality reduction in service representation," information processing & management 57, no. 4 (july 2020), https://doi.org/10.1016/j.ipm.2020.102238; chenliang li et al., "topic modeling for short texts with auxiliary word embeddings," in proceedings of the thirty-ninth international acm sigir conference on research and development in information retrieval (new york: acm, 2016), 165–74, https://doi.org/10.1145/2911451.2911499; jipeng qiang et al., "short text topic modeling techniques, applications, and performance: a survey," ieee transactions on knowledge and data engineering 14, no. 8 (april 2019): 1–17, https://doi.org/10.1109/tkde.2020.2992485.
20 ryan j. gallagher et al., "anchored correlation explanation: topic modeling with minimal domain knowledge," transactions of the association for computational linguistics 5 (december 2017): 529–42, https://doi.org/10.1162/tacl_a_00078.

21 jagadeesh jagarlamudi, hal daumé iii, and raghavendra udupa, "incorporating lexical priors into topic models," in proceedings of the thirteenth conference of the european chapter of the association for computational linguistics (stroudsburg, pa: acl, 2012), 204–13, https://www.aclweb.org/anthology/e12-1021; olivier toubia et al., "extracting features of entertainment products: a guided latent dirichlet allocation approach informed by the psychology of media consumption," journal of marketing research 56, no. 1 (december 2019): 18–36, https://doi.org/10.1177/0022243718820559.

22 nan zhang and baojun ma, "constructing a methodology toward policy analysts for understanding online public opinions: a probabilistic topic modeling approach," in electronic government and electronic participation, eds. efthimios tambouris et al. (amsterdam, netherlands: ios press bv, 2015): 72–9, https://doi.org/10.3233/978-1-61499-570-8-72.

23 jonathan chang et al., "reading tea leaves: how humans interpret topic models," in proceedings of the twenty-second international conference on neural information processing systems (new york: acm, 2009), 288–96, https://dl.acm.org/doi/10.5555/2984093.2984126.

24 gerlof bouma, "normalized (pointwise) mutual information in collocation extraction," in proceedings of the international conference of the german society for computational linguistics and language technology (tübingen, germany: gunter narr verlag, 2009), 43–53; boyd-graber, mimno, and newman, "care and feeding of topic models," in handbook of mixed membership models and their applications, eds. edoardo m. airoldi, david m. blei, elena a. erosheva, and stephen e. fienberg (boca raton: crc press, 2014), 225–54; jey han lau, david newman, and timothy baldwin, "machine reading tea leaves: automatically evaluating topic coherence and topic model quality," in proceedings of the fourteenth conference of the european chapter of the association for computational linguistics (stroudsburg, pa: acl, 2014), 530–39, https://doi.org/10.3115/v1/e14-1056.

25 lau, newman, and baldwin, "machine reading tea leaves"; david newman et al., "automatic evaluation of topic coherence," in proceedings of human language technologies: the 2010 annual conference of the north american chapter of the association for computational linguistics (new york: acm, 2010), 100–108, https://dl.acm.org/doi/10.5555/1857999.1858011.

26 jeffrey lund et al., "tandem anchoring: a multiword anchor approach for interactive topic modeling," in proceedings of the fifty-fifth annual meeting of the association for computational linguistics (stroudsburg, pa: acl, 2017), 896–905, https://doi.org/10.18653/v1/p17-1083.
article

an omeka s repository for place- and land-based teaching and learning

neah ingram-monteiro and ro mckernan

information technology and libraries | september 2022
https://doi.org/10.6017/ital.v41i3.15123

neah ingram-monteiro (ingramn@wwu.edu) is a teaching and learning librarian, western washington university. ro mckernan (rmckernan@whatcom.edu) is the oer librarian, whatcom community college. © 2022.

abstract

our small community college library developed a learning object repository to support a cross-institutional, land-based, multidisciplinary academic initiative using the open-source platform omeka s. drawing on critical, feminist, and open practices, we document the relational labor, dialogue, and tensions involved with this open education project. this case study shares our experience with tools and processes that may be helpful for other small-scale open education initiatives, including user-centered iterative design, copyright education, metadata design, and user-interface development in omeka s.

introduction

whatcom community college (wcc) is a rural, public institution located on the lands of coast salish peoples, including lummi, nooksack, semiahmoo, and samish, in the northwest region of washington state, just south of british columbia and the us-canada border. referred to as the pacific northwest or the north puget sound, this area is part of the greater salish sea bioregion (see fig. 1). the sea's name was adopted in 2009 in washington state and british columbia to refer collectively to the strait of georgia, the strait of juan de fuca, and the puget sound.1 the library at wcc has recently established several new digital services, including the college's first institutional repository. housed within this repository is a site named the salish sea curriculum repository, which has been developed to host a collection of materials and multidisciplinary curriculum related to engaging college students with this bioregion and is a unique cross-institutional collaboration between the library and the salish sea institute at nearby western washington university (wwu). in this paper, we document, from the perspective of the constraints of a small community college library, the development of the institutional repository service through the creation of the salish sea curriculum repository.
this first phase of the development process began through relational work and proceeded through user-centered iterative design, copyright education, metadata design, and user-centered interface development. a second phase was then launched that produced a curated index of existing work. we document our process as a case study of a small-scale, open-source–backed scholarly communication project that can reasonably be replicated by other smaller institutions, in order to encourage scholarly communication and open education services at all levels of librarianship.

figure 1. "reference map for the salish sea bioregion," aquila flower, 2020. made as part of the salish sea atlas, https://wp.wwu.edu/salishseaatlas/. creative commons attribution-noncommercial-noderivatives 4.0 international license.

description of library repository service development

in spring 2020, our library began to develop an institutional repository in response to a need to document faculty and staff scholarship and student scholarship, including newspapers and journals, and to host a collection of historical college images and videos. lacking the budget for bepress and the dedicated technical expertise to implement dspace, we found that omeka s hosted on a shared server through reclaim hosting was the most appropriate fit for our needs. omeka was originally developed at the roy rosenzweig center for history and new media at george mason university; it offers libraries and museums a way to publish online exhibits while ensuring accessibility and the inclusion of standards-based metadata to support discovery and use.2 omeka s is a later platform that offers a single point of administration for multiple instances of omeka, making it more usable for institutions like ours with a variety of unique collection sites with their own display templates. omeka s adheres to international standards, such as the dublin core schema for metadata, and allows creation of digital content collections, simple web pages, and complex online exhibits. the software can be managed and administered by one librarian. it allows interoperability through the open archives initiative protocol for metadata harvesting (oai-pmh is critical for future ingestion into the digital public library of america, which will provide wider discoverability) and rest apis (which will be necessary for any integration into the library's opac; a sketch of a simple api query appears below). as the college's open education resources and copyright librarian, mckernan initially developed and administers the library's omeka s installation. while the initial collections were in line with traditional institutional repository sites, a new need developed later in 2020 in response to curricular developments at the college: a repository based around multidisciplinary, land-based learning objects.
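as a rough illustration of that interoperability, the following minimal sketch lists items from an omeka s site using python's requests library. the base url and item-set id are hypothetical placeholders, and the endpoint and property names reflect our understanding of the omeka s rest api (public read access, json-ld responses in which dcterms properties are lists of value objects); treat it as a sketch rather than a tested integration.

```python
# minimal sketch: list items from an omeka s site via its rest api.
# BASE_URL and ITEM_SET_ID are hypothetical placeholders; the call assumes
# public read access (write operations would additionally need the
# key_identity/key_credential api keys issued from the omeka s admin dashboard).
import requests

BASE_URL = "https://repository.example.edu"  # hypothetical omeka s install
ITEM_SET_ID = 123                            # hypothetical item set (a collection)


def list_items(base_url, item_set_id, per_page=25):
    """Fetch one page of items belonging to an item set."""
    response = requests.get(
        f"{base_url}/api/items",
        params={"item_set_id": item_set_id, "per_page": per_page},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def first_value(item, prop):
    """Return the first literal value of a property such as 'dcterms:title'."""
    values = item.get(prop, [])
    return values[0].get("@value") if values else None


if __name__ == "__main__":
    for item in list_items(BASE_URL, ITEM_SET_ID):
        print(first_value(item, "dcterms:title"), "|", first_value(item, "dcterms:subject"))
```

a harvester for the digital public library of america would instead use the oai-pmh endpoint rather than this json api, but the same items and dublin core metadata are exposed either way.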
this repository was the salish sea curriculum repository.

development of salish sea curriculum repository

relational work, which in our process includes time to build relationships and engage in dialogue, is important given our mutual exploration of open education projects. luo, hostetler, freeman, and stefaniak point to the importance of a campus culture that supports open education, through resource allocation such as oer design and development.3 as part of a larger team, this project represented three institutions (wcc, wwu, and the university of british columbia) with three different open education cultures. and additional faculty partners at wcc and wwu had varying experience with open education, ranging from an awareness of creative commons licenses to experience authoring oer textbooks. dai and carpenter discuss the feminized labor that goes into oer projects, arguing that, like instruction librarianship, oer librarianship is predicated on relational work.4 as feminized work is often invisible and undervalued, they suggest planning and documenting time for consultative tasks such as meeting with faculty as ways of bringing critical, feminist, and open pedagogies into this work.5 by discussing the development of the salish sea curriculum repository in terms of development phases, we want to devote space not just to the final products, but also to documenting this collaborative process.

a note on terminology: the eric descriptor place based education is described as pedagogy to engage learners in their cultural, social, and ecological contexts; it often includes inviting community members in as instructors and bringing learners into the natural environments where they live. while place-based education is the prevailing term, calderon has argued that the expression of this term has typically erased indigenous relations with land and obscures the violence of settler colonialism.6 in contrast, calderon writes that land education or land-based education makes explicit the ideologies and structures of settler colonialism and that land education centers indigenous peoples' relations to land, critically examining what it means to inhabit the lands of indigenous nations. we will continue to use the phrase land-based education in this paper.

at wcc, the salish sea studies (sali) curriculum was developed by history instructor anna booker in partnership with natalie j. k. baloy of the salish sea institute at wwu. the curriculum includes experiential learning about the complex human-environment systems of the bioregion that builds a sense of place, connection, and relational accountability.7 at both colleges, the instruction teams who co-teach this course rotate from term to term and include faculty from multiple disciplines. instructors at wcc have included faculty from the departments of history, anthropology, geology, and sociology. at wwu, instructors have included faculty with appointments in salish sea studies, canadian-american studies, comparative indigenous studies, the college of the environment, and fairhaven college of interdisciplinary studies. units in the introductory sali course are designed to demonstrate that many ways of knowing are relevant and important to understanding the salish sea.
when the second iteration of the course shifted online due to the covid-19 pandemic, with less than two weeks' notice in spring 2020, instructors shifted to creating learning objects for asynchronous learning. since then, curricula for this course and adjacent courses in salish sea studies have been designed for online, in-person, and hybrid learning. subsequently, booker received a 2020–22 grant from the national endowment for the humanities (neh) to further develop the salish sea studies curriculum. with the pivot to online, she was looking for digital ways to support curriculum sharing. baloy had been approached by ingram-monteiro (who was in ubc's master of library and information studies program at the time) about supporting salish sea studies during the initial covid-19 shutdown. baloy connected her with booker about a possible grant work project. because of booker's previous work on oer with mckernan, she suggested approaching the college's library about a collaboration. the idea of using existing repository software to build a new site that would serve as a space to collect and share curriculum was born out of this dynamic context. in addition to the rotating instruction team and the teaching modality variables introduced by the global health pandemic, the field of salish sea studies as taught in our context was also being defined and articulated concurrently with the initial development of this repository (through distinct curricular conversations). what started as a repository to share open educational resources about the salish sea bioregion became an online space for creating and sharing a curated set of oer explicitly for use in land-based, experiential, multidisciplinary, and transboundary teaching and learning about the salish sea bioregion.

phase one: initial digital repository development

the first phase of development ran from november 2020 to january 2021. library work in the first phase included designing, building out, styling, and initially populating the repository. we used a user-centered iterative design process in this phase (see fig. 2). user-centered iterative design is used in the human-computer interaction field to foreground user needs during design processes. van house, butler, ogle, and schiff discuss how designers need to know the larger context and purposes of users' work with a digital library, as well as their specific tasks and information acts, such as searching or repackaging.8 for our project, user-centered iterative design started with consulting faculty partners to help them articulate use cases, identify primary users, and discuss ways to build the repository to incentivize submissions from instructors. these consultations included asking a lot of questions to draw out their needs and wishes for the repository. ingram-monteiro spent about 15 hours (of the 120 dedicated to this phase of the project) meeting or corresponding with our faculty partners over the course of four weeks. our partners contributed this time and more, in addition to their standard workloads and during winter break. this was very much a dialogue, as we went back and forth on some topics over the course of a few weeks, brainstorming together and looking for inspiration to share with each other.

figure 2. "agile development" by dave gray is licensed under cc by-nd 2.0.
through our dialogue, we were able to articulate that the repository would include adaptable, reusable teaching materials for lessons and courses about the salish sea. users of the repository would primarily include faculty contributors who use the repository to submit teaching materials and instructors from bioregional higher education institutions who use the repository to find material to adapt for their teaching. other users considered could include site visitors who are seeking information about the introduction to salish sea studies course that is taught in parallel at wcc and wwu.

copyright education

we provided copyright education in various modalities to the instructors who were involved during this phase. our faculty partners' questions informed how we built copyright considerations into the repository. questions included what materials they could use and remix in a learning object that they would then license for reuse. while they felt protected by fair use for distributing copyrighted videos, maps, or readings within a traditionally mediated classroom environment, this calculus could not be automatically extended to distribution in oer. a recent systematic literature review of empirical oer studies found that faculty are consistently uncertain how to license their creation when it includes remixed materials,9 and our experience echoed that.10 the solution for the salish sea repository was the creation of a resources section in the repository that included citations for all-rights-reserved published works, so that an instructor could point to traditionally copyrighted works without directly uploading them to the repository (see fig. 3).

figure 3. screenshot of an omeka s record for an article citation in the resources section.

our faculty partners also had questions about creative commons licensing and how to select an appropriate license for their work. we designed the curriculum submission form to include explanations of each creative commons license type, as well as public domain and all rights reserved options. a submitter can read about these six terms and select which license is appropriate for their activity.

figure 4. screenshot of copyright guidance and license selector on the submission form.

while we were able to provide some guidance in the context of this project, bigger questions remained. faculty partners worked through the challenges of creating public oer from private, contextual lesson plans and learning objects.11 given the curricular emphasis on relational accountability in the salish sea bioregion, and the central role of indigenous knowledge holders and ways of knowing in line with land-based education, materials referenced in salish sea studies include traditional knowledge of indigenous nations.
while this knowledge is shared in a consensual way in the context of a course (where a knowledge holder may be an invited guest, for example), sharing these materials in an open repository online introduces different considerations.12 local contexts' traditional knowledge (tk) labels are a popular topic in open education and have been adopted by the library of congress, but as reijerkerk demonstrates, simply applying these labels in online catalogs is insufficient.13 the use of these labels is intended to be one intervention, done in relationship with indigenous knowledge holders.14 our role may be more to ask questions about existing permissions to share that knowledge, especially around ownership of that knowledge. christen provides more context on how tk labels can be applied to such material when it has been published in the public domain.15

metadata schema

omeka s offers linked data infrastructure with the dublin core™ metadata initiative's dcmi metadata terms (dcterms:) as the default vocabulary.16 we used this vocabulary to create a functional metadata schema that allows faculty to describe their submissions in ways that would be useful for other users. the metadata added during the submission process was then cleaned and enhanced by the librarian who reviewed each submission. for site visitors, this metadata allows browsing through the set of learning objects in the repository; they can browse lessons from a particular discipline or place, for example. they can also perform keyword searching to find results based on titles and lesson descriptions.

during this design phase of the metadata, we started with an examination of the types of learning objects that would be shared through the repository. through iterative design we arrived at an initial structure that included four types of objects that would be deposited: assignments, activities, existing published resources, and learning modules. for each type, we then created an omeka s resource template to support consistent metadata processing and a "collecting form" to support metadata collection. we then added 40 resources, one module, two assignments, and five activities—all provided by our faculty partners—to test this structure. after more faculty consultation, we simplified the metadata design to include two learning object types: activities (including assignments) created by instructors and bibliographic citations for core resources used in salish sea studies. we refined our metadata schema for each of these and documented this metadata design and processing. see table 1 for one example.
table 1. metadata design—resource type: activity

metadata field | label | values | notes
dcterms:title | activity | described by submitter |
dcterms:description | lesson description | described by submitter |
dcterms:subject | discipline | indigenous ways of knowing, humanities, natural sciences, social sciences, multidisciplinary/interdisciplinary, other |
dcterms:spatial | spatial coverage | described by submitter | repeatable
dcterms:audience | course modality | face-to-face, online synchronous, online asynchronous, hybrid, other |
dcterms:temporal | temporal coverage | described by submitter | repeatable
dcterms:format | primary format of activity | icebreaker, problem-based discussion, field trip, other |
dcterms:extent | estimated time for students to complete activity | 15 minutes, 30 minutes, one hour, two hours, more than two hours, multiple sessions, all quarter |
dcterms:creator | primary creator(s) | full name | repeatable
dcterms:contributor | institutional affiliation | western washington university, whatcom community college, other |
dcterms:license | license | 8 listed in item set | add as "omeka resource"

user interface design

once we had a working metadata schema, a collection mechanism and workflow, and the high-level site structure, we shifted our focus to considerations of the interactivity and look and feel of the repository site. we heard from our faculty partners that they wanted a clean, colorful design that would be appealing to users. they shared the stanford history education group as one example, noting its simple navigation, and blackpast.org, noting its interactive timeline. they also shared spokanehistorical.org, which is built on omeka and includes an interactive map. we tried to manage expectations about what would be possible. we did not have many resources available for web design or experience with the omeka-compatible mapping and timeline tool neatline. still, we found that it was possible to create a simple, visually pleasing interface with some basic css skills, omeka s modules, and documentation from histsex.org, a library collection made with omeka s.17

modules in omeka s are plug-ins that can be installed and activated to add additional functionality. one of the key modules we activated (on the admin side) was the css editor.18 we could then write an internal style sheet in this editor to style the colors and links in accordance with the web content accessibility guidelines for styling headers, color contrast, and text decoration for hyperlinks.19 the css editor module also includes input fields for external style sheets, which enabled us to include one for google fonts. the color scheme we selected uses wcc's colors and complements the blues and greens of the salish sea.
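as an illustration of the kind of rules this workflow involves, a small internal style sheet like the one below could be pasted into the css editor module; the selectors, font, and hex colors here are illustrative placeholders rather than the rules actually used on the salish sea site.

```css
/* illustrative internal style sheet for the css editor module.
   selectors, font, and colors are placeholders, not the production rules. */

/* headings: a single sans-serif family keeps the hierarchy consistent */
h1, h2, h3 {
  font-family: "Open Sans", Arial, sans-serif;
}

/* body text: dark text on a light background for wcag-level contrast */
body {
  color: #1c2b36;
  background-color: #ffffff;
}

/* links: keep underlines so links are not signaled by color alone */
a {
  color: #005a70;
  text-decoration: underline;
}

/* hover and keyboard focus: darker color plus a visible focus outline */
a:hover,
a:focus {
  color: #003947;
  outline: 2px solid #003947;
}
```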
we initially deployed it only for bibliographic citations because this was the only resource type with a critical mass of existing content when we built the repository. when a visitor opens the resources page from the navigation, they see the option to “find resources by place,” with an embedded openstreetmap that includes markers that link to associated citations in the repository (see fig. 5). the spatial markers help locate scholarship to concepts of land-based education. figure 5. a screenshot of map indicators tied to repository items. a third omeka s module that we installed was fields as tags.21 this module increases the number of browsable metadata fields available to visitors on the main pages of the repository; in addition to title, subject, extent, and creator, visitors can also browse spatial and temporal coverage tags. in the interim months between phase one and phase two, we introduced the repository to wcc faculty who were participating in a year-long professional development workshop about teaching salish sea studies. the culminating project for that workshop involved submitting a teaching activity to the repository. however, while many participants began the submission process, few information technology and libraries september 2022 an omeka s repository for placeand land-based teaching and learning 10 ingram-monteiro and mckernan were able to submit an activity that was repository ready. reflecting on this and their own experiences developing curriculum for introduction to the salish sea, our faculty partners scaled back on expectations for oer development. it was evident that thoughtfully designing land and place-based, experiential, multidisciplinary, transboundary curriculum that is also open and adaptable would require dedicated resources in the context of deeper relationships and a longer timeframe. phase two: a curated index of published works the second phase of development, which took place roughly from july to december 2021, focused on further honing the interface and usability of the site. our faculty partners designated some neh grant funds to pay ingram-monteiro a stipend for summer website development work. further developing the resources section of the repository thus became the focus of our work in summer 2021. by providing a central access point for curated, published works about the salish sea, the repository would support faculty who were developing salish sea teaching materials as oer. we also referred to the resources section as the salish sea index, a space that provides building blocks for teaching materials. developing this index included adding individual pages for maps, collections, and terminology. the maps and collections pages—as well as the original resources page that includes published articles, books, videos, podcasts—are configured to automatically pull in newly entered omeka s items. each item was added using our previously developed bibliographic citation resource template for published resources. whenever there was a creative commons license, open access license, or copyright information provided, we note this at the item level to facilitate reuse and attribution. the digital collections page points visitors to digital collections (such as the northwest indian college salish sea speaker series videos and the south asian american digital archive) as well as to information about physical collections (such as the wing luke museum and the center for pacific northwest studies), which can be visited in person. 
finally, in addition to maps, journal articles, archival collections, and other media catalogued in the index, another building block for creating salish sea teaching materials is the terminology. this html page is in progress. it will be a reference tool for the vocabulary of salish sea studies, synthesizing concepts that are critical to this multidisciplinary and transboundary pedagogy. providing these building blocks functions as a way of supporting oer creation and remixing. phase three: future work—building transboundary community as of spring 2022, the salish sea repository’s role is to share curricular building blocks, learning outcomes, and sample materials. our faculty partners, with our support, are working on building a transboundary community around the repository, including librarians and interdisciplinary scholars engaged with relational accountability and landand place-based learning in higher education. we are producing this article in this context. as we expressed earlier, we wanted this to document the way relationship building is critical to the development and future growth of this project. as an example, we met with ashley edwards, one of two indigenous initiatives and instruction librarians at simon fraser university (sfu). in her work with the indigenous curriculum resource centre at sfu, edwards collects and organizes http://whatcomdigitalcommons.org/s/salishsea/item/1269 http://whatcomdigitalcommons.org/s/salishsea/item/1269 http://whatcomdigitalcommons.org/s/salishsea/item/1273 http://whatcomdigitalcommons.org/s/salishsea/item/1274 http://whatcomdigitalcommons.org/s/salishsea/item/1271 http://whatcomdigitalcommons.org/s/salishsea/item/1271 information technology and libraries september 2022 an omeka s repository for placeand land-based teaching and learning 11 ingram-monteiro and mckernan resources to assist faculty in learning about and engaging in indigenizing their pedagogy and curriculum, centering materials by and about coast salish communities.22 the creation of the center is part of sfu’s response to canada’s truth and reconciliation commission calls to action.23 though no such mandate exists in the us, indigenous and non-indigenous settler faculty at wwu and wcc are engaged with indigenization, as reflected in the inclusion of land-based education in salish sea studies. building a transboundary community invites collaboration with indigenous librarians like edwards, from whose work we can learn how to better support indigenization of curriculum in ethical, responsible, and respectful ways. we also presented the repository at the washington library association academic libraries division/association of college and research libraries of oregon and washington (ald/acrlwa and acrl-wa) academic libraries summit in fall 2021 with the intention of sharing this case study to document our work in the vein of open scholarship. audience questions focused on labor—attendees were interested in knowing about the job titles of people involved with the project. as more oer are developed for sharing in the salish sea repository, we intend to continually evaluate the effectiveness of the repository for users, including user experiences around adapting and remixing the building blocks, filling out the submission form, and browsing learning objects. one area that we expect to focus on is refining the metadata scheme. for example, what is the best approach for describing spatial coverage in this repository, given the variety of place names that can describe one location? 
we began with a controlled vocabulary and then shifted to an open entry user-defined field. this trades off the user’s ability to browse by a place name for the contributor’s ability to choose the specificity of a location name, which is important given the interdisciplinarity of land-based learning and the inclusion of multiple ways of knowing in this curriculum. we hope metadata librarians will be interested in bringing their skills to this project and working through such questions. future refinements will be driven by these evaluations. summary in response to an emerging, multidisciplinary academic initiative that originated at two local public colleges, our small library utilized our omeka s installation to create the salish sea curriculum repository. we implemented this open education project using a user-centered iterative development process. as of spring 2022, this has involved three phases of development. in the first phase, library work focused on metadata design, copyright education, and user interface development in omeka s. in the second phase, we focused on developing an index of salish sea resources, including information to help instructors find, adapt, and remix published maps, vetted terminology, and bioregional archival collections. the third phase will be focused on building a transboundary community around the creation and sharing of oer that meets salish sea studies learning objectives, including inviting other librarians to bring their specialized skills in support of this project. https://whatcomdigitalcommons.org/s/salishsea/page/welcome https://whatcomdigitalcommons.org/s/salishsea/page/welcome information technology and libraries september 2022 an omeka s repository for placeand land-based teaching and learning 12 ingram-monteiro and mckernan endnotes 1 “salish sea naming project,” college of the environment, western washington university, accessed march 17, 2022, https://cenv.wwu.edu/si/salish-sea-naming-project. 2 “project,” omeka, accessed march 16, 2022, https://omeka.org/about/project/. 3 tina luo, kirsten hostetler, candice freeman, and jill stefaniak, ”the power of open: benefits, barriers, and strategies for integration of open educational resources,” open learning: the journal of open, distance, and e-learning 35, no. 2 (october 2020): 149, https://doi.org/10.1080/02680513.2019.1677222. 4 jessica y. dai and lindsay inge carpenter, “bad (feminist) librarians: theories and strategies for oer librarianship,” international journal of open educational resources 3, no. 1 (may 2020): 152, https://doi.org/10.18278/ijoer.3.1.10. 5 dai and carpenter, 159. 6 dolores calderon, “speaking back to manifest destinies: a land education-based approach to critical curriculum inquiry,” environmental education research 20, no. 1, (2014): 24–36. https://doi.org/10.1080/13504622.2013.865114. 7 kathryn l. sobocinski, “section 6: opportunities for improving assessment and understanding of the salish sea,” in state of the salish sea, ed. k. l. sobocinski (bellingham: salish sea institute, 2021), 213, http://doi.org/10.25710/vfhb-3a69. 8 nancy a. van house, mark h. butler, virginia ogle, and lisa shiff, “user-centered iterative design for digital libraries,” d-lib magazine (february 1996), http://webdoc.sub.gwdg.de/edoc/aw/d-lib/dlib/february96/02vanhouse.html. 9 luo, hostetler, freeman, and stefaniak, 143. 
10 the 2021 guide "code of best practices in fair use for open educational resources," distributed by american university's washington college of law and the center for media and social impact, has since become an important resource in such consultations, https://cmsimpact.org/code/open-educational-resources/.

11 one faculty partner shared walthausen's article "how the internet is complicating the art of teaching" (the atlantic, october 26, 2016, https://www.theatlantic.com/education/archive/2016/10/how-the-internet-is-complicating-the-art-of-teaching/505370/), pointing to the sentence "what i hadn't understood before this tentative jump into the broader sharing economy was that making assignments is so much about personalization," which illustrates one challenge to this work.

12 we are writing from lummi territory and so will share an example from here. anthropologist stacy michelle rasmus was asked by the lummi nation to study knowledge transmission and acquisition in the 1990s and early 2000s, including how research relationships are affected by the way knowledge is accessed and controlled in different contexts. in a 2002 article, rasmus shares several ways that outside researchers unethically extract and disseminate knowledge beyond the community. she shares how knowledge holders will share knowledge without giving it away, but outsiders often interpret this sharing as a license to do with it as they wish. she writes, "… some researchers may not in fact know when they have been exposed to knowledge that, within a community context, is considered private in nature."

13 dana reijerkerk, "ux design in online catalogs: practical issues with implementing traditional knowledge (tk) labels," first monday 25, no. 8 (august 2020), https://doi.org/10.5210/fm.v25i8.10406.

14 jane anderson and kim christen, "'chuck a copyright on it': dilemmas of digital return and the possibilities for traditional knowledge licenses and labels," museum anthropology review 7, no. 1–2 (spring–fall 2013), 110.

15 kimberly christen, "tribal archives, traditional knowledge, and local contexts: why the 's' matters," journal of western archives 6, no. 1 (2015), 13, https://doi.org/10.26077/78d5-47cf.

16 "vocabularies," omeka s user manual, accessed may 13, 2022, https://omeka.org/s/docs/user-manual/content/vocabularies/.

17 brian m. watson, "grant application for 50 years on, many years past," iu scholarworks (march 2020), https://hdl.handle.net/2022/25593.

18 "css editor," omeka s user manual, accessed may 13, 2022, https://omeka.org/s/docs/user-manual/modules/csseditor/.

19 "wcag 2 overview," web accessibility initiative, accessed may 13, 2022, https://www.w3.org/wai/standards-guidelines/wcag/.

20 "mapping," omeka s user manual, accessed may 13, 2022, https://omeka.org/s/docs/user-manual/modules/mapping/.

21 libnamic, "omeka s fields as tags," omeka s modules, accessed may 13, 2022, https://omeka.org/s/modules/fieldsastags/.
22 ashley edwards, "supporting faculty in indigenizing curriculum and pedagogy: case study of the indigenous curriculum resource centre," in ethnic studies in academic and research libraries, eds. raymond pun, melissa cardenas-dow, and kenya s. flash (chicago, il: association of college & research libraries, 2021), 171–72.

23 edwards, 177.

article

exploring final project trends utilizing nuclear knowledge taxonomy: an approach using text mining

faizhal arif santosa

information technology and libraries | march 2023
https://doi.org/10.6017/ital.v42i1.15603

faizhal arif santosa (faizhalarif@gmail.com) is academic librarian, polytechnic institute of nuclear technology, national research and innovation agency. © 2022.

abstract

the national nuclear energy agency of indonesia (batan) taxonomy is a framework of nuclear competence fields organized into six categories. the polytechnic institute of nuclear technology, as an institution of nuclear education, faces a challenge in organizing student publications according to the fields in the batan taxonomy, especially in the library. the goal of this research is to determine the most efficient automatic document classification model using text mining to categorize student final project documents in indonesian and to monitor the development of the nuclear field in each category. the knn algorithm is used to classify documents, and the best model is identified by comparing cosine similarity, correlation similarity, and dice similarity, along with binary term occurrence and tf-idf vector creation. a total of 99 documents labeled as reference data were obtained from the batan repository, and 563 unlabeled final project documents were prepared for prediction. several text mining techniques were applied, including stemming, stop word filtering, n-grams, and filtering by token length. the best model used k = 4 with cosine similarity and binary term occurrence, reaching an accuracy of 97 percent; knn worked better with binary term occurrence than with tf-idf for these indonesian-language documents. engineering of nuclear devices and facilities is the most popular field among students, while management is the least preferred; isotopes and radiation, however, is the most prominent field in nuclear technochemistry. text mining can assist librarians in grouping documents based on specific criteria. it also makes it possible to observe the evolution of each category as documents accumulate and to apply similar methods in other circumstances.
because of the curriculum and courses given, the growth of each discipline of nuclear science in the study program is different and varied. introduction the national nuclear energy agency of indonesia (batan), now known as the research organization for nuclear energy (ortn)—national research and innovation agency (brin), in 2018 issued a decision regarding batan’s six competencies: isotopes and radiation (ir), nuclear fuel cycle and advanced materials (nfcam), engineering of nuclear devices and facilities (endf), nuclear reactor (nr), nuclear and radiation safety and security (nrss), and management (mgt). these areas of focus are also known as batan’s knowledge taxonomy, which is used to support nuclear knowledge management (nkm) and the grouping of explicit knowledge in repositories.1 the polytechnic institute of nuclear technology (pint), which is under the auspices of batan and is now in one of the directorates of brin, can also utilize batan’s knowledge taxonomy to classify students’ final assignments. every year the pint library accepts final assignments from mailto:faizhalarif@gmail.com information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 2 santosa students who have graduated from three study programs, namely nuclear technochemistry, electronics instrumentation, and electromechanics. over the past six years (2017 to 2022), 563 final assignments in indonesian were collected and needed to be classified into the batan’s knowledge taxonomy in order to see the document growth of each existing competency. however, it is quite time consuming for librarians to assign individual documents to the most appropriate taxonomy term. it is also possible to involve experts to determine the right group, which results in increased working time to complete a document. this obstacle arises because librarians do not have in-depth and detailed knowledge of the nuclear field so it is feared that grouping errors will occur. in this study, the author tried to classify the collection of final project documents owned by the pint library based on batan’s knowledge taxonomy. the author used text mining tools, choosing the k-nearest neighbors (knn) algorithm for this study. similar research also leads to trying to focus on automatic document classification of certain subjects,2 which in this case is the subject of nuclear engineering. the hope is that users will find it easier to explore knowledge according to their area of interest through taxonomy grouping based on explicit knowledge,3 in this case, pint students’ final project documents. finding the trend of research conducted by students on each subject is also one of the goals of this research. literature review text mining in libraries the increasing number of publications currently makes it a challenge to classify and find out the growth and trends of a topic. document classification is one of the jobs that is quite time consuming so document classification automation by utilizing text mining is very necessary.4 the application and utilization of text mining itself is very broad. several studies have demonstrated the usefulness of text mining in libraries. pong et al. 
from city university of hong kong conducted research to facilitate the classification process using machine learning.5 this study aimed to streamline document categorization utilizing automatic document classification by using a system called the web-based automatic document classification system (wadcs) and claimed to be the pioneer of a comprehensive study of automatic document classification on a classification that is already popular in the world, namely the library of congress classification (lcc) utilizing knn and naive bayes (nb). this research indicates that the machine-learning algorithm they used can be applied by the library for document classification. wagstaff and liu utilized text mining to perform automatic classification to help make decisions to select candidate documents for weeding.6 this study used data from wesleyan university from 2011 to 2014 to predict which documents were eligible for weeding and which will be stored. five classifier models, namely knn, naive bayes, decision tree, random forest, and support vector machines (svm), were used to compare their performance. while this process may not replace librarians, this study can help librarians make better decisions and reduce their workload significantly. lamba and madhusudhan applied the use of text mining to extract important topics which were published in the desidoc journal of library and information technology over a period of 38 years.7 the latent dirichlet allocation (lda) method used in this study is able to find topics from information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 3 santosa within a collection of documents so that they can see how these topics develop over time. because lda is an algorithm for looking at topics from a group of words that appear together, the authors suggest that this study be expanded by utilizing articles that have been labeled using supervised classification. knn classifier various studies try to find answers to the most appropriate method of grouping the collection of documents. the knn and svm algorithms were used as comparative methods in the document classification study.8 however, there is no definite standard for the methods used in text mining.9 choosing the right technique in each phase of document classification can improve the performance of the text classifier, so, experts generally make adjustments to existing methods to get better results.10 kim and choi compared knn, maximum entropy model (mem), and svm to classify japanese patent documents by focusing on the structure of patents.11 instead of comparing the entire text, specific components named semantic elements, such as purpose, background, and application fields, are compared from the training document. these semantically grouped components are the basis for patent categorization. in addition, the strategy used is the existence of cross -references from two semantic fields that are useful for determining the intentions of the patent writer s who are still unsure or hidden. this strategy works well on knn compared to mem and svm where svm doesn’t do very well when handling large data sets. however, research conducted by alhaj et al. on arabic documents showed that svm can outperform knn by implementing a stemming strategy.12 meanwhile, through the approach to the relationship between unstructured text documents, the study conducted by mona et al. 
was able to increase the performance of knn combined with tf-idf by 5 percent.13 the knn algorithm is one of the popular classifiers that categorizes new data based on the concept of similarity from the amount of data (determined by the specified “k” value) around it.14 this method is believed to be able to group documents effectively because it is not limited to the number of vector sizes.15 wagstaff and liu noted that one of the weaknesses of knn is the long processing time when faced with large datasets, but knn as a classifier is easy to apply.16 in terms of measurement, previous experiments showed that knn was not suitable when used with euclidean distance.17 generally, similarity measures such as cosine, jaccard, and dice were used in the knn classifier.18 one of the problems in text classification is the number of attributes or dimensions so that many irrelevant attributes in the data set cause the classifier’s performance to not run optimally.19 for this reason, it is necessary to have a technique to increase effectiveness and reduce dimensions that are too large through the selection of features or terms,20 such as within-document tf, weighting with one of the popular methods, namely tf-idf (which sees how important a word is in a collection of corpus),21 and binary representation which looks at the absence and presence of a concept in a document22 by converting it to 0 and 1.23 aims of the study university libraries have a vital role in managing internal publications to support the education ecosystem. in connection with the role of the pint to support nkm and nuclear development, it is necessary to apply technology to help provide advice on certain classes of documents. in addition, in order to see scientific developments, generally experts conduct bibliometric studies which are information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 4 santosa limited to the title and abstract fields. text mining provides an opportunity to dig deeper. instead of just the title and abstract, this study used the full text of the final project collection. the trend of a subject will be seen from the growth and percentage of existing documents. so, the objectives of this study are to • explore the best knn model to be applied to classify the final project; • know the development of nuclear subjects based on batan’s knowledge taxonomy; and • know the development of nuclear subjects from each study program at the pint. methods a total of 99 documents were taken from the batan repository and manually labeled as reference data. this study was conducted using rapidminer studio software. the first document processing method is to convert all words into lower case and divide the text into a collection of tokens. filters on tokens are also applied based on the length of the token. in this case, the author applied a minimum of 3 characters and a maximum of 25 characters. stop words were also applied to eliminate short words (e.g., “and,” “the,” and “are”), thereby reducing the vector size. english and indonesian stop words were used for this study to overcome the use of english in the abstract section and indonesian as the document language. 
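the study itself was built in rapidminer studio, but the same sequence of steps can be sketched in code. the following python example, using scikit-learn, is a minimal approximation of the pipeline described in this section and in the paragraphs that follow (length filtering, stop word removal, n-grams, binary term occurrence, and a distance-weighted knn with cosine similarity). the document list, labels, and stop word list are placeholders rather than the study's actual data, and the indonesian stemming step is only indicated in a comment.

```python
# a minimal sketch of the classification pipeline, not the author's rapidminer process.
# replace the placeholder lists with the 99 labeled batan repository documents.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = ["..."]    # full text of each labeled reference document (indonesian)
labels = ["..."]   # one of: ir, nfcam, endf, nr, nrss, mgt

# combined english + indonesian stop words; an indonesian stemmer (e.g., sastrawi)
# would be applied to `texts` before vectorization in a fuller version of this sketch.
stop_words = ["dan", "yang", "untuk", "the", "and", "are"]  # placeholder list

model = make_pipeline(
    CountVectorizer(
        lowercase=True,                     # transform cases
        token_pattern=r"(?u)\b\w{3,25}\b",  # keep tokens of 3 to 25 characters
        stop_words=stop_words,
        ngram_range=(1, 3),                 # unigrams through trigrams
        binary=True,                        # binary term occurrence; swap in TfidfVectorizer for tf-idf
    ),
    KNeighborsClassifier(
        n_neighbors=4,                      # k = 4, the best value reported in the results
        metric="cosine",                    # cosine similarity as the numerical measure
        weights="distance",                 # weighted vote by neighbor proximity
    ),
)

# 10-fold cross validation, as in the study
scores = cross_val_score(model, texts, labels, cv=10, scoring="accuracy")
print(scores.mean())
```

note that scikit-learn treats cosine as a distance (one minus the similarity), so the nearest neighbors in this sketch are the documents with the highest cosine similarity, matching the intent of the measures compared in the results.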
the collection of words from haryalesmana was chosen to be the stop words for indonesian.24 the stemming technique is applied to reduce dimensions that are useful for improving the function of the classification system 25 by changing word forms into basic word,26 e.g., water, waters, watered, and watering into water. this analysis applies wicaksana data to indonesian stemming.27 some words cannot be separated from other words because they form a meaning, e.g., nondestructive testing, biological radiation effects, structural chemical analysis, and water -cooled reactors. to overcome this case, the use of n-grams can help identify compound words that have a meaning so that the words are not reduced.28 n-grams will record a number of “n” words that follow the previous word.29 to accommodate these words, in this study, three words were assigned to n-grams. information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 5 santosa figure 1. nuclear taxonomy classification framework. vector creation in this study used tf-idf and binary term occurrence and then compared them to determine the best performance. in the knn method, it is necessary to determine the value of “k” manually, so a value of 2–10 was chosen by activating a weighted vote which is useful for weighing the contributions of neighbors in the vicinity. weight voting indicates the use of multiple voting methods by assigning a weight to each neighbor depending on their distance from the unknown item.30 the types of measurement chosen to get maximum results were numerical measure and tested cosine similarity, correlation similarity, and dice similarity. meanwhile, to measure performance, the author used cross validation with a number of folds of 10. then, using this set of procedures, documents from the batan repository are classified. the procedure that achieves the highest level of accuracy is then submitted as a model. this model was applied to 563 final project documents that have not been labeled so that each document has a label according to batan’s knowledge taxonomy. results the experiment was carried out 54 times to determine the best knn performance from the proposed approach, namely cosine-binary, correlation-binary, dice-binary, cosine–tf-idf, correlation–tf-idf, and dice–tf-idf utilizing cross validation. cosine was still the most accurate in the tf-idf vector creation process, with an accuracy of 81.89 percent on seven neighbors, and dice reaches the lowest point when used on four neighbors. in contrast to correlation and dice, cosine can perform well when creating binary vectors. cosine on four neighbors had the best performance, with a 97 percent accuracy rate. the lowest accuracy occurred when the number of selected neighbors was two and the overall numerical measure had decreased in neighbors more than nine. the classification model for unlabeled documents was determined to be the cosine-binary method with four neighbors. the experiment found that this method did not successfully group three information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 6 santosa documents (for details of the confusion matrix, see appendix a). even though document 7 ought to be on nfcam, but with a lower score of 0.49921, it was predicted on the nrss with a confidence value of 0.50079. documents 86 and 93, which were supposed to be about endf, were unable to be foreseen. 
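the per-document confidence values reported here and in the discussion are typically derived from the weighted vote of the k nearest neighbors: each candidate field receives a share of the vote proportional to the distance-weighted contributions of its neighbors. a small, self-contained sketch of how such confidence values could be extracted and low-confidence predictions flagged for librarian review is shown below; the corpus, labels, and threshold are invented placeholders, not the study's data.

```python
# illustrative only: obtain per-field confidence values from a fitted knn model and
# flag predictions that fall near the decision boundary for manual review.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

labeled_texts = [
    "reaktor nuklir dan termohidrolika ...",   # placeholder documents
    "aplikasi isotop dan radiasi ...",
    "pengelolaan limbah radioaktif ...",
    "instrumentasi dan kendali fasilitas ...",
]
labels = ["nr", "ir", "nfcam", "endf"]
unlabeled_texts = ["deteksi radiasi pada fasilitas nuklir ..."]

model = make_pipeline(
    CountVectorizer(binary=True),
    KNeighborsClassifier(n_neighbors=2, metric="cosine", weights="distance"),
)
model.fit(labeled_texts, labels)

confidences = model.predict_proba(unlabeled_texts)   # one row of field confidences per document
for doc, row in zip(unlabeled_texts, confidences):
    best = int(np.argmax(row))
    score = row[best]
    # predictions with confidence close to 0.5, like documents 7, 86, and 93 above,
    # are good candidates for direct review by a librarian
    note = " <- review manually" if score < 0.55 else ""
    print(f"{model.classes_[best]}: {score:.5f}{note}")
```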
document 93 was predicted on the nrss with a confidence value of 0.50126, and document 86 was predicted on the nr with a value of 0.49936.

figure 2. a comparison of the accuracy levels in the knn method.

this study utilized 563 unlabeled documents spanning six years. there were 34 fewer documents in 2021 than in 2020, a significant drop from the previous year (see table 1). the number of documents then climbed again in 2022, reaching 98. rapidminer's labeling process ran into issues at the document-processing stage. to improve memory performance, the documents were split into three runs (2017–2018, 2019–2020, and 2021–2022) because memory was not sufficient to execute the full set of document-processing commands at once. the results of each run were then exported as tabular data for further study. the evolution of each nuclear subject in the final project reports can be seen year by year (see fig. 3). during the test period, 282 documents (50.09%) of the total were predicted on endf, followed by ir with 95 documents (16.87%) and nfcam with 69 documents (12.26%). there was little difference between nr and nrss: nr had 47 documents (8.35%) while nrss had 45 documents (7.99%). mgt was the subject with the fewest documents, with a total of 25 (4.44%) from 2017 to 2022.

table 1. the pint's final project documents growth from 2017 to 2022

study program                2017  2018  2019  2020  2021  2022  grand total
electromechanics             35    34    43    35    24    41    212
electronics instrumentation  27    34    38    38    22    28    187
nuclear technochemistry      31    31    26    27    20    29    164
grand total                  93    99    107   100   66    98    563

see appendix b for more information on the confidence value of each predicted document. of the 212 final project reports in the electromechanics study program, 63.68 percent (135 documents) were projected to be on the endf subject, followed by 17.92 percent (38 documents) on nfcam, 8.96 percent (19 documents) on nrss, and 5.19 percent (11 documents) on nr. meanwhile, ir and mgt had the fewest papers predicted, with 2.83 percent (6 documents) and 1.42 percent (3 documents) respectively. every year, endf was the most predicted subject in this study (see fig. 4).

figure 3. nuclear subject development by percentage each year.

figure 4. nuclear subject development in electromechanics by percentage each year.

the 187 final project reports in electronics instrumentation were predicted into five subjects. endf was projected to contain 141 documents (75.40%), nrss 24 documents (12.83%), and nr 14 documents (7.49%). furthermore, only 7 documents (3.74%) were predicted on mgt and 1 document (0.53%) on ir. nfcam, on the other hand, is not represented in any of the electronics instrumentation documents (see fig. 5). final processing was performed on the collection of nuclear technochemistry documents. of these 164 documents, 53.66 percent (88 documents) were predicted on ir, 18.90 percent (31 documents) on nfcam, 13.41 percent (22 documents) on nr, 9.15 percent (15 documents) on mgt, and 3.66 percent (6 documents) on endf, with the remaining 1.22 percent (2 documents) predicted on nrss. subjects that were popular each year vary (see fig.
6) when compared to electromechanics and instrumentation electronics, where endf was the most popular topic in these two study programs. information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 9 santosa figure 5. nuclear subject development in electronics instrumentation by % each year. information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 10 santosa figure 6. nuclear subject development in nuclear technochemistry by % each year. discussion the study found that implementing knn with cosine similarity in association with vector construction=binary and k=4 resulted in the highest accuracy results of 97 percent. in general, this strategy outperformed in every class examined, and it can only be balanced on one occasion, notably at k=9 by utilizing correlation similarity. when compared to the use of tf-idf, the results likewise indicated that binary term occurrence always functioned well. tf-idf was only able to achieve its highest accuracy of 81.89 percent when k was 7 using correlation similarity. cosine similarity also seemed to work efficiently on every vector creation, both when using binary and tf-idf (in classes numbering 2, 5, and 10 the use of tf-idf was not optimal), compared to numerical measures of correlation similarity and dice similarity. cosine similarity evaluates the similarity of documents, and a high similarity score indicates that the documents are quite similar.31 nuclear field growth in general, aside from the endf field, which is steady and increasing, other subjects endure annual changes in development. for the past six years, endf has been the most popular subject among students. the endf reached the highest percentage rate in 2022, with 59 documents predicted on this subject. students preferred engineering final project reports on mechanics and structures, electromechanics, control systems, nuclear instrumentation, or nuclear facility process technology. research conducted by wang et al. also suggests that the current popular topic of research on nuclear power is modeling and simulation.32 information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 11 santosa the endf document’s average confidence value was 0.6499916, with a median value of 0.7490455. the two documents with the lowest confidence in the endf were document numbers 233 and 597. document 233 had a confidence value of 0.25105 and was predicted in the other three subject areas (nrss, nr, mgt) with close values. likewise, the 597 documents predicted in the endf with a confidence value of 0.25156 were higher than the nrss, nfcam, and ir subjects, but with a not too significant difference. both of these documents can be investigated further and directly evaluated by the librarian in order to obtain a more precise field. the majority of the final project reports projected in the endf have confidence levels around 0.50, and some even higher at 0.75. this study also reveals that 11 documents in the endf category have a confidence value of 1. with lower nrss confidence values, 239 endf documents connected to the nrss field. this relationship demonstrates a good tendency among students conducting nuclear engineering related to the nrss discipline. though it differs significantly from endf, ir is becoming a prominent field. 
the final project report for ir was developed in 2017–2018, but it shrank again from 2019 to 2021, then increased in 2022. in comparison to other fields, ir has the highest minimal confidence score of 0.4 987, with many documents lying within the 0.5 and 0.75 range. meanwhile, the confidence value for 26 documents predicted by ir is 1. the nfcam subject area is a prediction that appears frequently in ir predictions but has a lower level of confidence. there are 54 documents indicating the existence of research that involves isotopes and radiation in nuclear materials, nuclear excavations, radioactive waste, structures, or advanced materials. nfcam is inversely proportional to the conditions that occur in endf. after increasing in 2019, this subject faced a reversal over the next three years, with only two documents classified in this subject through 2022. students are still uncommonly interested in nuclear minerals, nuclear fuel, radioactive waste, structural materials, and advanced materials. six projected documents in this field have confidence levels of 1, while many more have confidence levels between 0.50 and 0.75. the ir field is also expected to appear alongside the nfcam field publications. there were also ups and downs in nr and nrss. twenty-five of the 47 documents identified on the nr were also predicted with a lower value in the nrss field. this demonstrates that students explored the relationship between the subject of reactor research and safety and security in various documents. meanwhile, only eight of the 46 nrss papers are unrelated to the endf field. this demonstrates that students who study nuclear safety and security tend to perform engineering to address situations involving nuclear safety and security. documents in these two fields are usually concentrated in the 0.5 confidence value range in both nr and nrss. mgt is one of the least studied topics among students. human resources, organization, management, program planning, auditing, quality systems, informatics utilization, or cooperation are more commonly associated with the mgt field. the mgt increased in 2020, although it became the field with the fewest documents on earlier occasions (2017 to 2019 and 2021 to 2022). in terms of confidence value, 21 mgt documents have a value greater than 0.5, with eight documents worth 1. with 10 documents, the endf is the most often discussed study area with mgt. progression in each study program even if they are still within the purview of nuclear science, the growth of the nuclear field in each study program differs depending on the curriculum. students are influenced by knowledge, and information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 12 santosa more specifically the process of learning and comprehending (whether theoretical or more practical).33 endf is still the most popular field in electromechanics and electronics instrumentation study programs. these two study programs offer courses in endf topic areas such as mechanical, civil and architectural, electromechanical, electrical, control systems, and radiation detection for nuclear devices. furthermore, the electronics instrumentation study program offers courses on nuclear electronics, signal processing techniques, and practical work on interface and data acquisition techniques, all of which are part of the endf nuclear instrumentation group. apart from endf, the fields of nfcam and nrss have been present in electromechanics for a period of six years. 
while mgt is currently a less appealing topic, there have been no final project reports relating to mgt in the most recent three years. in electronics instrumentation, the absence of a field occurs in nfcam. the findings of the predictions demonstrate that none of the documents predicted on nfcam were proper. meanwhile, only 10 documents that intersect with nfcam which have lower confidence in the range of values from 0.247 to 0.251. nuclear minerals, nuclear fuel, structural materials and advanced materials, and radioactive waste were not studied in depth in this study program, illustrating why nfcam is not predicted in instrumentation electronics. in contrast to other study programs, ir is the most predictable field in the final project report in nuclear technochemistry. in this investigation, nuclear technochemistry owns 88 of the 95 documents examined. this study program includes ir specializations such as the use of isotopes and radiation in agriculture, health, and industry. radioisotope production becomes another discipline that specializes in the creation of isotopes and radiation sources, which explains why ir is so popular among nuclear technochemistry students. the nfcam field was not present in 2022, despite the fact that it had been the topic of several students’ studies throughout the preceding five years. while the endf and mgt fields have only been present in the last three years, there were no predictable papers in the previous three years. conclusion the trend of research activities carried out by students from one study program to the next appears to vary although they are both within the scope of the nuclear field. for example, the field of endf is quite popular among electromechanics and electronics instrumentation students but not for nuclear technochemistry students because endf only appeared three years ago and the number of documents is still modest. however, endf deserves to be a field that needs attention. nuclear technochemistry students with radiochemistry learning experiences demonstrate that the ir field is linear and interesting to them. due to a paucity of publications, the low proportion in certain categories, e.g., mgt, shows a potential to further investigate this field. this study demonstrates an opportunity to use text mining to assist librarians in performing automatic document classification based on specific subjects. the best model in this study is produced by combining knn with cosine similarity and binary term occurrence. the model used can help improve the quality of decisions made to accurately and efficiently categorize documents. to determine a more specific classification, pay close attention to documents that have a low level of confidence and intersect with other issues. this study is limited to the knn method and information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 13 santosa documents from the batan repository, as well as final project documents for pint students. large-scale testing can be conducted, for instance, in the international atomic energy agency ’s (iaea) nuclear repository known as the international nuclear information system (inis) repository, or in other databases with the complexity of categorizing documents throughout many languages. data accessibility datasets and data analysis code for rapidminer have been uploaded to the rin dataverse: https://hdl.handle.net/20.500.12690/rin/asrgvo. 
data visualization can be accessed through tableau public: https://public.tableau.com/app/profile/faizhal.arif/viz/finalprojecttrendsutilizingnuclearknowledgetaxonomy/story1

appendix a: confusion matrix of 10-fold cross validation

accuracy: 97.00% +/- 4.83% (micro average: 96.97%)

               true nfcam  true ir  true nrss  true mgt  true nr  true endf  class precision
pred. nfcam    13          0        0          0         0        0          100.00%
pred. ir       0           18       0          0         0        0          100.00%
pred. nrss     1           0        20         0         0        1          90.91%
pred. mgt      0           0        0          19        0        0          100.00%
pred. nr       0           0        0          0         13       1          92.86%
pred. endf     0           0        0          0         0        13         100.00%
class recall   92.86%      100.00%  100.00%    100.00%   100.00%  86.67%

appendix b: the confidence value of each field

[this appendix consists of figures showing the distribution of confidence values for documents predicted in each field: endf, ir, mgt, nfcam, nr, and nrss.]

endnotes

1 budi prasetyo and anggiana rohandi yusuf, "pengelolaan pengetahuan eksplisit berbasis teknologi informasi di batan," in prosiding seminar nasional sdm teknologi nuklir (seminar nasional sdm teknologi nuklir, yogyakarta: sekolah tinggi teknologi nuklir, 2018), 126–32, https://inis.iaea.org/collection/nclcollectionstore/_public/50/062/50062856.pdf?r=1.

2 joanna yi-hang pong et al., "a comparative study of two automatic document classification methods in a library setting," journal of information science 34, no. 2 (april 2008): 213–30, https://doi.org/10.1177/0165551507082592.

3 prasetyo and yusuf, "pengelolaan pengetahuan eksplisit."

4 jae-ho kim and key-sun choi, "patent document categorization based on semantic structural information," information processing & management 43, no. 5 (september 2007): 1200–15, https://doi.org/10.1016/j.ipm.2007.02.002; pong et al., "a comparative study"; khusbu thakur and vinit kumar, "application of text mining techniques on scholarly research articles: methods and tools," new review of academic librarianship (may 12, 2021): 1–25, https://doi.org/10.1080/13614533.2021.1918190.

5 pong et al., "a comparative study."

6 kiri l. wagstaff and geoffrey z. liu, "automated classification to improve the efficiency of weeding library collections," the journal of academic librarianship 44, no. 2 (march 2018): 238–47, https://doi.org/10.1016/j.acalib.2018.02.001.

7 manika lamba and margam madhusudhan, "mapping of topics in desidoc journal of library and information technology, india: a study," scientometrics 120, no. 2 (august 2019): 477–505, https://doi.org/10.1007/s11192-019-03137-5.

8 fábio figueiredo et al., "word co-occurrence features for text classification," information systems 36, no.
5 (july 2011): 843–58, https://doi.org/10.1016/j.is.2011.02.002; yen-hsien lee et al., “use of a domain-specific ontology to support automated document categorization at the concept level: method development and evaluation,” expert systems with applications 174 (july 2021): 114681, https://doi.org/10.1016/j.eswa.2021.114681; yousif a. alhaj et al., “a study of the effects of stemming strategies on arabic document classification,” ieee access 7 (2019): 32664–71, https://doi.org/10.1109/access.2019.2903331. 9 david antons et al., “the application of text mining methods in innovation research: current state, evolution patterns, and development priorities,” r&d management 50, no. 3 (june 2020): 329–51, https://doi.org/10.1111/radm.12408; muhammad arshad et al., “next generation data analytics: text mining in library practice and research,” library philosophy and practice (2020): 1–12. 10 mowafy mona, rezk amira, and hazem m. el-bakry, “an efficient classification model for unstructured text document,” american journal of computer science and information technology 06, no. 01 (2018), https://doi.org/10.21767/2349-3917.100016. https://inis.iaea.org/collection/nclcollectionstore/_public/50/062/50062856.pdf?r=1 https://doi.org/10.1177/0165551507082592 https://doi.org/10.1016/j.ipm.2007.02.002 https://doi.org/10.1080/13614533.2021.1918190 https://doi.org/10.1016/j.acalib.2018.02.001 https://doi.org/10.1007/s11192-019-03137-5 https://doi.org/10.1016/j.is.2011.02.002 https://doi.org/10.1016/j.eswa.2021.114681 https://doi.org/10.1109/access.2019.2903331 https://doi.org/10.1111/radm.12408 https://doi.org/10.21767/2349-3917.100016 information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 18 santosa 11 kim and choi, “patent document categorization.” 12 alhaj et al., “a study of the effects of stemming strategies.” 13 mona, amira, and el-bakry, “an efficient classification model.” 14 thakur and kumar, “application of text mining techniques.” 15 kim and choi, “patent document categorization.” 16 wagstaff and liu, “automated classification.” 17 najat ali, daniel neagu, and paul trundle, “evaluation of k-nearest neighbour classifier performance for heterogeneous data sets,” sn applied sciences 1, no. 12 (december 2019): 1559, https://doi.org/10.1007/s42452-019-1356-9. 18 roiss alhutaish and nazlia omar, “arabic text classification using k-nearest neighbour algorithm,” the international arab journal of information technology 12, no. 2 (2015): 190–95. 19 mona, amira, and el-bakry, “an efficient classification model.” 20 guozhong feng et al., “a probabilistic model derived term weighting scheme for text classification,” pattern recognition letters 110 (july 2018): 23–29, https://doi.org/10.1016/j.patrec.2018.03.003. 21 snezhana sulova et al., “using text mining to classify research papers,” in 17th international multidisciplinary scientific geoconference sgem 2017, vol. 17, international multidisciplinary scientific geoconference-sgem (17th international multidisciplinary scientific geoconference sgem, sofia: surveying geology & mining ecology management (sgem), 2017), 647 –54, https://doi.org/10.5593/sgem2017/21/s07.083. 22 lee et al., “use of a domain-specific ontology.” 23 man lan et al., “supervised and traditional term weighting methods for automatic text categorization,” ieee transactions on pattern analysis and machine intelligence 31, no. 4 (april 2009): 721–35, https://doi.org/10.1109/tpami.2008.110. 
24 devid haryalesmana, "masdevid/id-stopwords," 2019, https://github.com/masdevid/id-stopwords.

25 alhaj et al., "a study of the effects of stemming strategies."

26 pong et al., "a comparative study."

27 ananta pandu wicaksana, "nolimitid/nolimit-kamus," 2015, https://github.com/nolimitid/nolimit-kamus.

28 antons et al., "the application of text mining methods."

29 kanish shah et al., "a comparative analysis of logistic regression, random forest and knn models for the text classification," augmented human research 5, no. 1 (december 2020): 12, https://doi.org/10.1007/s41133-020-00032-0.

30 judit tamas and zsolt toth, "classification-based symbolic indoor positioning over the miskolc iis data-set," journal of location based services 12, no. 1 (january 2, 2018): 2–18, https://doi.org/10.1080/17489725.2018.1455992.

31 hanan aljuaid et al., "important citation identification using sentiment analysis of in-text citations," telematics and informatics 56 (january 2021): 101492, https://doi.org/10.1016/j.tele.2020.101492.

32 qiang wang, rongrong li, and gang he, "research status of nuclear power: a review," renewable and sustainable energy reviews 90 (july 2018): 90–96, https://doi.org/10.1016/j.rser.2018.03.044.

33 ronald barnett, "knowing and becoming in the higher education curriculum," studies in higher education 34, no. 4 (june 2009): 429–40, https://doi.org/10.1080/03075070902771978.

article

a rapid implementation of a reserve reading list solution in response to the covid-19 pandemic

matthew black and susan powelson

information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13209

matthew black (mblack@ucalgary.ca) is the discovery & systems librarian, university of calgary. susan powelson (spowelso@ucalgary.ca) is the associate university librarian, technology, discovery & digital services, university of calgary. © 2021.

abstract

in the spring of 2020, as post-secondary institutions and libraries were adapting to the covid-19 pandemic, libraries and cultural resources at the university of calgary rapidly implemented ex libris' reading list solution leganto to support the necessary move to online teaching and learning. this article describes the rapid implementation process and changes to our reserve reading list service and policies, reviews the status of the implementation to date, and presents key takeaways that will be helpful for other libraries considering implementing an online reading list management system or other systems on a rapid timeline.
overall, rapid implementation allowed us to meet our immediate need to support online teaching and learning; however, long term successful adoption of this tool will require additional configuration, engagement, and support. introduction in response to the changes to the post-secondary learning environment due to covid-19 and to better integrate our course reserve reading list services with our library management system (ex libris’ alma), libraries and cultural resources (lcr) at the university of calgary (ucalgary) decided to rapidly implement ex libris’ reading list solution leganto. after rapidly implementing this reading list solution, lcr made it accessible to instructors and students in our learning management system, desire2learn (d2l), for the fall 2020 term. this article will discuss lcr’s decision to rapidly implement leganto, the implementation process, changes to our reserve reading list service and policies, and will conclude by reviewing the implementation so far and present key takeaways. this paper will be helpful for those who are considering implementing an online reading list management system in general but more so for those looking to do so on a rapid timeline. it will also be helpful for those implementing new systems to support changes to services due to covid-19. literature review online reading lists management systems have been in use since the early 2000s. from 1999 to 2000, loughborough university developed and implemented an in-house open-source reading list management system which “was an electronic representation of the academic’s paper-based reading list.”1 since then open-source and commercial solutions have been developed and implemented by many libraries. richard cross summarizes the development and growth of the market for “resource list software” in the uk and notes that “in the absence of a mature commercial market, several uk universities have developed in-house” systems. cross notes that since 2010, commercial solutions have been developed and that “the high-level specifications requirements” for reading list management systems have now been distilled.2 as part of the development of the commercial market, ex libris launched its leganto online reading list solution in 2015; the solution has since been implemented by over 230 institutions worldwide.3 mailto:mblack@ucalgary.ca mailto:spowelso@ucalgary.ca information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 2 overall, the literature on reading list management systems focuses on reviewing implementations and identifying factors that contribute to successful implementation of these solutions.4 also, some literature captures instructor and student perceptions of these systems and provides recommendations for successful engagement and use.5 these recommendations focus on the importance of administrative support for the tool, updating or creating policy and workflows to support implementation, technical configurations and integrations, and faculty/instructor engagement. 6 libraries that have implemented an online reading list system indicate that it is important to have the adoption of the system supported and championed by senior administration within the library and from the wider institution.7 having this support means that reading list policies and services can be aligned with library and institutional goals to support teaching and learning. 
furthermore, marie o'neill and lara musto contend that successful adoption is dependent on this support and integration with institutional goals and not just “premised on technology.”8 establishing policies and service goals that are supported by senior administration provides an impetus to make sure workflows and functionalities are configured to achieve these. the literature recommends that implementing an online reading list solution provides an important opportunity to review or revise previous workflows or to develop new workflows for standardization and “timely satisfaction of resource list needs.”9 to support these new workflows and services, technical integrations and configurations need to be considered. these include integrations with the library management system, the learning management system (lms), and institutional authentication systems.10 these integrations are essential to make the system streamlined and accessible for instructors and students and are expected by these user groups. for example, when the university of manchester library was implementing leganto in 2019, they surveyed students and instructors and found that students valued convenience and access, and that instructors were interested in a system that would allow them to order books and digitized chapters and see analytics.11 thus, the reading list solution should be configured and implemented with functionalities and integrations to support these expectations. ease of access within the lms is perceived as is an important technical requirement for successful implementation. o’neill and musto’s study finds that faculty had a strong desire to have a reading list system integrated with the lms.12 meredith gorran farkas contends that “positioning the library at the heart of the virtual campus seems as important as positioning the library as the heart of the physical campus.”13 murray and feinberg address the placement of libguides in the learning management system stating it is critical to make resources available despite where students are physically located.”14 this can be extrapolated to reading lists, and in this environment, where students are learning online in geographically dispersed locations, the lists need to be easily available. faculty and student engagement are important nontechnical considerations when embedding library services in the lms. knowing and including relevant stakeholders, gaining instructor buyin and understanding of the benefits, and determining how to raise student awareness of the tool, both short and longer term, are key elements.15 the literature recommends providing instructors with resources such as templates and training “at specific points in the semesters when faculty have less time pressures.”16 however, engagement should not be limited to this: it should be information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 3 ongoing and collaborative. this ongoing engagement can work to create “a virtuous circle” in which instructors “see return on the investment in resource list work” and “improved student satisfaction.”17 one university reports this can provide an opportunity for collaboration and generate “word-of mouth” to promote adoption.18 what is leganto? leganto is a cloud-based reading list solution fully integrated with ex libris’ alma library management system. 
it allows reading lists to be processed directly in alma by library staff and provides an interface for creating and engaging with readings lists that can be integrated directly into learning management systems using the learning tools interoperability (lti) integration. instructors can use leganto to create reading lists by searching for and adding resources. leganto allows diverse resources to be added to reading lists, including physical and electronic resources that are in alma, ex libris discovery index resources (via their central discovery index), internet sources (any url added manually or through the cite it! browser bookmark tool), or uploaded resources (documents uploaded by the instructor). university of calgary context the university of calgary is a comprehensive research university, ranked one of canada’s top ten research universities.19 the university is home to 14 faculties (offering more than 200 academic programs) and more than 33,000 students are currently enrolled in undergraduate, graduate, and professional degree programs. d2l is the university’s learning management system, managed by the university’s teaching and learning unit. libraries and cultural resources is a principal division of the university of calgary and includes eight libraries on campus and across the city, and two art galleries. the main library is the taylor family digital library (tfdl) which opened in 2011. in 2018, lcr adopted ex libris’ alma as its library management system. pre-covid-19 reserve reading list service prior to covid-19, lcr had distinct and unintegrated systems and processes for managing course reserves and reading lists. while some functions in alma were used to manage physical course reserves, most of the workflows were managed outside of alma. these included a web form that instructors could use to submit requests for course materials and a course reserve tool (atlas systems’ ares product). the submissions from the web form would be reviewed by our copyright office who would determine if the requested item needed copyright clearance or not. through this process, physical items which were already in the library collection were flagged and sent to fulfillment staff. staff would create or update the course in alma, add the item to a course reading list, and move it to a reserve location so it could be borrowed by students. doing this also made the items searchable through our course reserve search in our discovery service (primo). requested items that were not in the library collection were sent to the collections department for purchase consideration. through email communications with the copyright office, instructors could also request parts of items be scanned and approved for use in classes. requests for electronic reserves were not managed in alma. how did covid-19 affect our reserve reading list service? shortly after the covid-19 pandemic became widespread in the province of alberta, the government implemented restrictions which required post-secondary institutions to close physical locations and to restrict or move most courses and services online. in march 2020, lcr had to close all physical locations and was only able to provide online access to resources and information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 4 services. from march to august, lcr focused on online services and promoted access to online resources. 
the main taylor family digital library was not able to open until august, when we began offering contactless pickup service and limited study space bookings. as of july 2021, the majority of the courses offered by the university continue to be online. as a result, we are not fulfilling instructor requests to make print/physical resources accessible to students through course reserve. this has been a significant change from our previous operations in which physical course reserve was a popular and successful service. in 2019, lcr had a total 3,428 physical items added in alma as course reserve citations and these items circulated 16,345 times. with the announcement that fall 2020 term courses would be predominantly online, and the anticipation that the same would hold true for winter 2021, we realized that a new mechanism to deliver course reserves would be necessary. while we had been considering ex libris’ course reading list tool before the pandemic, the operational changes due to covid-19 provided us with an urgent need and impetus. we had to quickly develop alternative ways to support instructors in creating reading lists and for students to access reading list resources in the online learning environment. leganto rapid implementation leganto implementation is managed by ex libris and is typically done over an eight-to-twelveweek schedule. during this implementation, institutions work with ex libris to set up configurations to meet local needs. in 2020, ex libris began offering rapid implementation for leganto, which involves a shortened timeline of approximately four to five weeks. this is achievable because institutions allow ex libris staff to set up the leganto now configuration, which focuses on configuring essential features to quickly get the tool up and running. there are inherent tradeoffs between the typical and rapid implementation that we needed to consider. the shortened timeline would allow us to launch the service sooner and begin promoting it as a tool to support faculty in moving to teaching online due to covid-19. in addition, because we would implement leganto out of the box we could use the vendor created videos and support tools rather than creating our own. the additional three to seven weeks required for the typical implementation would have provided us more time to fine tune and configure the tool to meet our specific needs but would have delayed our ability to support our faculty in this time of need. for example, the rapid implementation schedule did not include support for setting up workflows for digitization requests or configuration of the more advanced copyright clearance functions to improve automatic processing. to configure these, we would need to work on them on our own while going through the rapid implementation or once the implementation was complete. after considering these tradeoffs, in april 2020 we made the decision to rapidly implement leganto. this decision was supported by the provost and lcr senior administration providing us with the institutional support identified in the literature as necessary for a successful implementation. table 1 outlines the rapid implementation schedule we followed. information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 5 table 1. 
project implementation schedule

event | dates
kickoff | may 27, 2020
implementation | may 28 to june 28, 2020
go-live | june 29, 2020
switch to support | july 15, 2020

the implementation portion involved weekly meetings to review the project status, frequent posting to the online project management tool to update the status and ask clarifying questions, and many internal meetings to discuss our progress and to make decisions about workflows, configurations, and engagement. as a requirement for go-live, at least one reading list had to be created by an instructor and be accessible to students. ex libris offered the opportunity to implement together with the university of manitoba, which is a similarly sized canadian university. this was a great opportunity to not only start implementation sooner but also connect with the university of manitoba. in the short term, rapid implementation would help us meet an immediate need to support online access and learning while operating under covid-19 restrictions. in the long term, this was an opportunity to develop a new way to engage with instructors and further promote our collections as resources for supporting learning. also, it was the opportunity to revise or develop workflows to support these goals. overall, we hoped leganto would make it easier for instructors to add resources to reading lists and to get copyright clearance so these resources would be accessible to students through a standardized tool integrated with d2l. with these goals in mind, we aimed to
• pilot leganto for summer 2020 term,
• have leganto accessible for all courses by fall 2020 term, and
• revise or develop workflows for digitization, copyright clearance, and purchase requests.

implementation work
despite the out-of-the-box leganto now configuration which aimed to have leganto up and running for our go-live date, implementation still required the lcr team to do a significant amount of work and planning. to manage this work, we established a technical team and an engagement team. the technical team, comprised of representatives from the library's systems and discovery, copyright, digitization and collections units, the university's teaching and learning unit, and the associate university librarian, technology, discovery and digital services (aul), worked to review, revise, and develop workflows and to test configurations in alma and leganto. the engagement team, created by a call for participants and comprised of three subject librarians and the aul, worked to develop strategies, communications, and resources to promote and support the use of leganto to instructors. also, the two teams collaborated to test configurations and functions and suggest improvements. the aul's presence in the teams was critically important to demonstrate senior leadership support of the project, an important element for success noted by o'neill and musto.20 overall, this work required the teams to meet and discuss short-term and long-term changes to our course reserve service and how leganto could be configured to support these. the rapid implementation schedule made this challenging because we had to start with the leganto now configurations and then test these to see how they aligned with our desired services and workflows. if they did not align well, we had to investigate and adapt.
this was a back-and-forth negotiation as we learned in a short time frame how to configure the system to support our desired workflows and services and how to adjust our services and workflows to align with the capabilities/functionalities of the system. an important part of the implementation was configuring leganto to use the learning tools interoperability (lti) standard to integrate with d2l. as mentioned in the literature, having the online reading list solutions embedded and integrated with the institutional lms is a key factor for successful adoption of the tool by instructors and students.21 using the lti integration, we were able to connect alma, leganto, and d2l. this work required coordination with the adminis trators of d2l at the university so that course and student data could be communicated between d2l, alma, and leganto. after configuring and testing the lti integration, we decided to use it to create reading list (leganto) links in d2l courses through the d2l tools menu. when a user clicks the link, user and course data from d2l is sent to alma to • create the course based on the course information and • assign the appropriate role in alma and leganto for the instructor or student. since lcr could not provide physical reserves because of covid-19, we decided to support the full/partial digitization of physical resources based on copyright approval and the creation of purchase requests for electronic copies of physical items added to a reading list. to achieve these goals, we needed to revise and establish workflows that make use of the basic digitization and purchase request functions in leganto and alma. these functions were not part of the leganto now configuration, but we decided to take the opportunity to make them available to instructors. for our purposes, the digitization workflows and functions in alma and leganto would be used to support the full or partial scanning of physical resources. we already had workflows to support this type of work, but these needed to be revised so requests created by instructors in leganto could be reviewed by the copyright office, items could be retrieved from the collection and scanned by staff, and scans could be made accessible in leganto for instructors and students. the technical team worked with the departments involved in these workflows to determine how the new functions could support this work and how the workflows needed to be adjusted. also, we had to decide how to use the functions in alma and leganto to facilitate electronic purchase requests for print/physical resources added to readings lists. similarly, we already had workflows for this but needed to review and revise these to make use of the functions supported by alma and leganto. using these functions, we configured alma and leganto so that if an instructor added a book to their reading list and we did not have an e-book version, a purchase request for an e-book version would be submitted to the collections department. this was achieved using tags and automatic processing rules with definable parameters. there were a few other settings we had to customize to meet our needs. this included configuring the default out-of-the-box processing and copyright statuses for citations added to reading lists, reading list visibility settings to make sure only enrolled students could access the lists, user interface adjustments to control what functions are available to students and instructors, and interface text/messaging changes to align with our services. 
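to make the lti integration described above more concrete, the sketch below shows the kind of form parameters an lms course link posts to the reading list tool in a standard lti 1.1 basic launch. this is an illustrative sketch of the lti 1.1 specification, not our institution-specific d2l or alma configuration; the tool url, course, and user values are all placeholders.

```python
# Hypothetical sketch of an LTI 1.1 basic launch from the LMS to Leganto.
# The tool URL and all identifiers below are placeholders; in practice the
# LMS (D2L) builds and signs this request itself.
LEGANTO_LTI_URL = "https://example.alma.exlibrisgroup.com/leganto/lti"  # placeholder

launch_params = {
    "lti_message_type": "basic-lti-launch-request",
    "lti_version": "LTI-1p0",
    "resource_link_id": "d2l-course-12345-readinglist",   # unique per course link
    # course data used to create or match the course on the library side
    "context_id": "12345",                                # LMS course identifier
    "context_title": "HIST 201 - Introduction to Archives",
    # user data used to assign an instructor or student role
    "user_id": "a1b2c3d4",
    "roles": "Instructor",                                # or "Learner" for students
    "lis_person_name_full": "Jane Instructor",
    "lis_person_contact_email_primary": "jane@example.edu",
}

# An LTI 1.1 launch is an HTML form POST of these parameters, signed with
# OAuth 1.0a (HMAC-SHA1) using the consumer key/secret shared between the
# LMS and the tool; the oauth_* signature fields are added before the
# browser submits the form.
```

the point of the sketch is simply that the course and user fields carried in the launch are what allow the course to be created and the correct instructor or student role to be assigned automatically, as described above.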
information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 7 the other major part of this work involved engagement to publicize the new tool and provide training and support to instructors and library staff. the engagement team worked to develop resources and provided training to staff and instructors. library staff were oriented to the product, the key benefits, and how to add course materials so that they could speak knowledgably to their faculty about the tool. faculty training sessions introduced faculty to the tool and how to add specific types of resources, for example a book chapter, a website, or an item from their personal collection. one libguide was created to support both faculty and staff. ex libris provided support to the engagement team by providing communication templates and advice on engagement strategies. the team also promoted the new tool in our institutional newsletter and worked with the university’s teaching and learning unit to promote it. overall, it was challenging during the five-week rapid implementation to quickly map and adapt our current course reserve workflows and ensure that these configurations will meet long term needs while considering the restraints of the current covid environment. go-live and post go-live we finished implementation on june 29, 2020. for the summer term pilot, we only had one instructor publish a reading list for their course. however, instead of using the functionalities to add citations to the list the instructor uploaded a document version of their reading list. this was similar to what instructors were used to doing in d2l and we had to support the instructor to add the citations from the document to the reading list in leganto. shortly after going live, we had to make the lti link to leganto available in all courses because after publicizing the new tool, we received requests for access from instructors who were preparing for their fall term courses. this was sooner than expected and we believe this was because the online fall term may have motivated instructors to start preparing earlier than usual. with the end of the fall 2020 term, we have been able to use alma and leganto’s analytics reporting to see instructor use and student engagement with reading lists (see table 2). from these reports, we can see that for the fall 2020 term, there were 50 reading lists created that were associated with a course and that had at least one citation added. this means the instructors of these courses at least tried to use the reading list tool. however, of these 50 lists, there were only 30 lists that had student activity. student activity is a category of interactions that ex libris uses to indicate how well the lists are being used by students and includes activities such as reading list views, number of citation views, number of full text views, number of files downloaded, and number of students that have marked a reading as done. we surmise that for the other 20 reading lists, the instructors encountered barriers to list creation and abandoned the process. we continued to use the analytics reporting to monitor the status of instructor use for the winter 2021 term. by the end of that term, there were 66 lists created with at least one citation. information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 8 table 2. 
fall 2020 and winter 2021 term usage

term | reading lists with at least one citation added | number of reading lists with student activity | students viewing reading lists | reading list views
fall 2020 | 50 | 30 | 664 | 5,320
winter 2021 | 66 | 49 | 1,154 | 13,200

takeaways
rapidly implementing leganto during covid-19 has been a valuable and challenging learning experience which has offered several takeaways. go through rapid implementation with another university of a comparable size so that you can connect, support, and learn from each other. rapid implementation is an intense period of uncertainty which can be challenging to go through alone. another institution can provide support and help the implementation team cope. for example, during the implementation we had meetings with the university of manitoba to discuss our progress and challenges with the implementation before meeting with the vendor implementation team. since implementing we have stayed in contact and have been able to rely on each other to discuss the status of our engagement, adoption, and configurations. rapid implementation requires effort and time. the shortened implementation timeline implied that it would be relatively easy; however, the project dominated the schedules of those involved and has continued to require work. the weekly meetings with ex libris, the weekly internal meetings, the work in between meetings, and the continuous testing and reworking of settings required a significant time commitment from all involved. timing is everything. we had hoped for a successful pilot in the summer term and to use this experience to learn and prepare for the wider rollout in the fall term. however, instructors began preparing for the fall term earlier than expected and consequently we needed to make leganto accessible earlier. understand your instructor timelines and practices around course preparation. the rapid implementation allowed us to respond to this unexpected pressure and begin to support instructors who wanted to make use of the tool. finally, further customizations and integrations will be necessary because of the nature of the rapid implementation and the inherent tradeoffs. the rapid implementation did not provide the time for us to pursue these customizations and integrations, and this is typical. for example, the course data integration would have required us to coordinate with our central it department and registrar to get approval and resources to build a data source and scripts to export and format the data so it could be loaded by the tool. normal implementation would have provided support and time for this. interestingly, sheedy, wells, and bellenger, in discussing their implementation at curtin university, noted that they too did not pursue some of the configurations during their implementation.22 regardless of the implementation schedule, libraries may be uncertain of how these configurations will be useful until after using the tool. in our case, it was after implementation that we realized the benefits these configurations could offer staff and users in terms of efficiency.
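for readers who want to pull usage figures like those in table 2 programmatically, ex libris exposes saved alma/leganto analytics reports through a rest api. the sketch below is a minimal, hedged example of retrieving such a report; the regional api host, report path, and api key are placeholders, and the specific report behind table 2 is an assumption rather than something documented in this article.

```python
# Minimal sketch: fetch a saved Alma/Leganto Analytics report over the REST API.
# The host, report path, and API key below are placeholders, not actual values.
import requests
import xml.etree.ElementTree as ET

BASE = "https://api-ca.hosted.exlibrisgroup.com/almaws/v1/analytics/reports"
params = {
    "path": "/shared/University/Reports/Leganto reading list usage",  # placeholder path
    "limit": 100,
    "apikey": "YOUR_ANALYTICS_API_KEY",  # placeholder
}

resp = requests.get(BASE, params=params, timeout=60)
resp.raise_for_status()

# The API returns the report rows as XML; each <Row> element carries the
# report's columns (e.g., term, lists created, lists with student activity).
root = ET.fromstring(resp.text)
for elem in root.iter():
    if elem.tag.endswith("Row"):
        print([child.text for child in elem])
```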
our next steps include • loading course data (we are working on this with our central it department and hope to have it complete for summer 2021), • refining and expanding the advanced automatic rules for copyright approval, • refining digitization workflows, and • implementing the q&a functionality to prompt faculty to describe if they need an entire ebook or just a digitized chapter. these steps will support staff in the administration of courses and reading list in alma and leganto and in improved efficiency in processing citations. furthermore, they will help ensure resources are accessible to all registered students for the appropriate time period. conclusion it is not yet clear if the adoption of the tool has been successful with only 30 lists with student activity for the fall 2020 term and 49 for the winter 2021 term. we had hoped that given the increased need for online learning in this term, instructors would have been eager to use the reading list tool to support student access to resources and learning. conversely, it seems likely that the new tool may have been too much for some instructors during this stressful period.23 sheedy, wells and bellenger noted that there is a potential for system rejection by end users if the change is perceived as creating additional workloads for academic staff, and this may be a factor in our implementation.24 as a result, we will continue to monitor use and determine if our engagement strategies are working. if not, we will need to provide further engagement, training, and support to build interest and use. the public health restrictions due to covid-19 challenged the structure of learning and library services in post-secondary institutions. in some cases, these challenges were opportunities for change. presented with the challenge of how to continue to provide course reserve reading list service, lcr decided to adopt leganto through a rapid implementation. this implementation was an opportunity for lcr to continue to provide reserve reading list service while implementing new workflows to support online access to resources (digitization and purchase requests). although this has met some of lcr’s immediate needs, there is still work that needs to be done to ensure long term successful adoption and use. we will need to continue to review and improve engagement strategies, workflows, and integrations to make the most of this investment. endnotes 1 gary brewerton and jon knight, “from local project to open source: a brief history of the loughborough online reading list system (lorls),” vine 33, no. 4 (2003): 189–95, https://doi.org/10.1108/03055720310510909. 2 richard cross, “implementing a resource list management system in an academic library,” electronic library 33, no. 2 (july 2015): 221, https://doi.org/10.1108/el-05-2013-0088. 3 “leganto course resource list management,” ex libris, march 18, 2021, https://exlibrisgroup.com/products/leganto-reading-list-management-system/. 
4 cross, "implementing a resource list management system"; brewerton and knight, "local project to open source."
5 marie o'neill and lara musto, "faculty perceptions of loughborough's online reading list system (lorls) at dublin business school (dbs)," new review of academic librarianship 23, no. 4 (2016): 368, https://doi.org/10.1080/13614533.2016.1272473; cross, "implementing a resource list management system in an academic library."
6 cross, "implementing a resource list management system"; brewerton and knight, "local project to open source"; o'neill and musto, "faculty perceptions"; olivia walsby, "implementing a reading list strategy at the university of manchester—determination, collaboration and innovation," insights the uksg journal 33 (2020), https://doi.org/10.1629/uksg.494.
7 o'neill and musto, "faculty perceptions," 368.
8 o'neill and musto, "faculty perceptions," 368.
9 o'neill and musto, "faculty perceptions," 368; linda sheedy, david wells, and amanda bellenger, "implementation of a leganto reading list service at curtin university library," in technology, change and the academic library, ed. jeremy atkinson (chandos publishing, 2021), 55–61, https://doi.org/10.1016/b978-0-12-822807-4.00005-1.
10 o'neill and musto, "faculty perceptions," 368.
11 walsby, "implementing a reading list strategy."
12 o'neill and musto, "faculty perceptions," 368.
13 meredith gorran farkas, "libraries in the learning management system," in american libraries tips and trends (summer 2015), https://acrl.ala.org/is/wp-content/uploads/2014/05/summer2015.pdf.
14 jennifer murray and daniel feinberg, "collaboration and integration," information technology and libraries 39, no. 2 (november 2020), https://doi.org/10.6017/ital.v39i2.11863.
15 murray and feinberg, "collaboration and integration," 9.
16 o'neill and musto, "faculty perceptions," 368.
17 cross, "implementing a resource list management system."
18 ex libris, "course materials affordability: a win for university of st. thomas," library journal (november 7, 2018), https://www.libraryjournal.com/?detailstory=course-materials-affordability-a-win-for-university-of-st-thomas.
19 "canada's best medical doctoral universities: rankings 2020," maclean's, october 3, 2019, https://www.macleans.ca/education/university-rankings-2020-canadas-top-medical-doctoral-schools/.
20 o'neill and musto, "faculty perceptions," 368.
21 murray and feinberg, "collaboration and integration," 9.
22 sheedy, wells, and bellenger, "implementation of a leganto reading list service," 55–61.
23 "faculty wellness and careers," course hero (blog), december 1, 2020, https://www.coursehero.com/blog/faculty-wellness-research/.
24 sheedy, wells, and bellenger, "implementation of a leganto reading list service," 55–61.

information technology and libraries at 50: the 1980s in review
mark dehmlow
information technology and libraries | september 2018
mark dehmlow (mdehmlow@nd.edu) is director, library information technology at the hesburgh libraries, university of notre dame.

my view of library technology in the 1980s through the lens of journal of library automation (jola) and its successor information technology and libraries (ital) is a bit skewed by my age. i am a gen-xer and much of my professional perspective has been shaped by the last two decades in libraries. while i am cognizant of our technical past, my perspective is very much grounded in the technical present. in a way, i think that context made my experience reviewing the 1980s in jola and ital all the more fun. the most pronounced event for the journal during the 1980s was the transition from the journal of library automation to information technology and libraries between 1981 and 1982. the rationale for this change is perhaps best captured through the context set in the guest editorial "old wine in new bottles?" by kenney in the first issue of ital: "proliferating technologies, the trend toward integration of some of these technologies into new systems, and rapidly increasing adoption of technology-based systems of all types in libraries ..."1 the article grounds us in the anxieties and challenges of the decade surrounding an accelerating change in technology. libraries were evolving from implementing systems of "automation," a term that focuses more on processes, to broadening their view to "information technology," which is more of a discipline — an ecosystem made up of technology, process, systems, standards, policies, etc. in a way, the article acknowledges the departure of libraries from their adolescent technological pasts to their young adult present for which the 80s would be the background. perhaps no other event is more technologically significant during the decade than the standardization of the internet. while the concept of networks and a network of networks, e.g., the internet, was conceptualized in the 1960s, it was the development of the tcp/ip network protocol that is the most consequential event because it made it possible to interconnect computer systems using a common means of communication. while the internet wouldn't become ubiquitously popularized until the early 1990s with the emergence of the world wide web, the internet was active and alive well before that and, in its early state, was critical to the emergence and evolution of library technologies. from the first issue through the last of the 1980s, ital references the term "online" frequently. the "online" of the 80s, however, was largely text based, where systems were interconnected using lightweight terminals to navigate browse and search systems.
it was not unlike a massive “choose your own adventure book,” skipping from menu to menu to find what you were looking for. throughout my review, i was happy to see a small, but significant, percentage of international articles that focused on character sets, automation, and collection comparisons in countries like kuwait, australia, china, and israel. diversity is a cornerstone for lita and ala and the journal has continued this trend to encourage the submission of articles from outside of the u.s. the 1980s volumes of ital traversed a plethora of topics ranging from measuring system the 1980s in review | dehmlow 9 https://doi.org/10.6017/ital.v37i3.10749 performance (efficiency was important during a time when computing was relativ ely slow and expensive) to how to use library systems to provide data that can be used to make business decisions. over the decade, there was a significant focus on library organizations coming to terms with new technology, e.g. the automation of circulation, acquisitions, and the marc bibliographic record. there were several articles that discussed the complications, costs, and best practices for converting card-catalog metadata to electronic records and several other articles that detailed large barcoding projects. the largest number of articles on a single topic focused on the automation and management of authority control in automated library systems. there were articles on the emergence of research databases often delivered as applications on cd-roms which would then be installed on microcomputers. the term “microcomputer” was frequently used because the 80s saw the emergence of the personal computer in the work environment, a transformative step in enabling staff and patrons alike to access online library services and applications to support their research and work. electronic mail was in its infancy and became a novel way to share information with end users across a campus. several articles focused on the physical design of search terminals and optimizing the ergonomics of computers. there were also many articles about designing the best opac interface for users, ranging from how to present bibliographic records to users, to what information should be sent to printers, to early efforts to extend local catalogs with article-based metadata. many of these topics have parallels today. instead of only analyzing statistical usage data we can pull from our systems, libraries are striving to develop predictive analytics, leveraging big-data from across an assortment of institutions. i found the 1988 article “investigating computer anxiety in an academic library,” which examines staff resistance to technology and change to be as apropos today as it was then.2 cd-roms have gone the way of the feathered and overly hairsprayed coifs of the 80s and have largely been superseded by hard drives and solid state flash media that can hold significantly more data and can transfer data more rapidly. the current decade of the 2010s has been dedicated to providing the optimal search experience for our end users as we have broadened our efforts to the discovery of all scholarly information, not just what is held in our collections. and of course, instead of adding a few article abstracting resources to our catalogs in an innovative, but difficult to sustain manner, the commercial sector has created web-scale mega-indexes that are integrated with our catalogs and offer the promise of searching a predominant amount of the scholarly record. 
there was a really interesting thread of articles over the decade that traced the evolution of the ils in libraries. there were articles about how to develop automation systems for libraries, the various functions that could be automated — cataloging, circulation, acquisitions, etc. — and evaluation projects for commercial systems. if the 2000s was the era of consolidation, the early 1980s could easily represent the era of proliferation. the decade nicely traces the first two generations of library systems, starting with university-developed automation and database backed systems and the migration of many of those systems to vendors. the northwestern university-based notis system was referenced a lot and there were some mentions of oclc’s acquisition and distribution of the ls/2000 system. this part of our automation history is a palpable reminder that libraries have been innovative leaders in technology for decades, often developing systems ahead of the commercial industry in an effort to meet our evolving service portfolios. this early strategy for libraries mirrors recent developments of institutional repositories, current research information systems (criss), and faculty profiling systems like vivo that were developed before the commercial sector saw the feasibility of commercialization. information technology and libraries | september 2018 10 the cycle of selecting and implementing a new integrated library system is something that m any organizations are faced with again. the only difference is that the commercial sector has entered into the development of the 4th or 5th generation of integrated library systems, many of which are coming with data services integrated and most of them are implemented in the cloud. in addition to seeing our technically rudimentary past, there were several articles over the decade that discussed especially innovative ideas or that anticipated future technologies. a 1983 article by tamas doszkocs which was written long before the emergence of google is an early revelation that regular patrons struggle to use expert systems that require normalized and boolean searching strategies. not surprising is the conclusion that users lean organically toward natural language searching, but even then we were having the expert experience vs. intuitive experience debate in the profession: “the development of alternative interfaces, specifically designed to facilitate direct end user interaction in information retrieval systems, is a relatively new phenomenon.”3 the 1984 article, “packet radio for library automation,” is about eliminating the challenges of retrofitting buildings with cabling to connect lan networks by using radio based interfaces.4 could this be an early precursor to wifi? there is the 1985 article titled “microcomputer based faculty-profile” about using a local database management application on a pc to create an index of faculty publications and university publishing trends.5 this is nearly three decades before the popularization of the cris and faculty profile system. 
in 1986, there is an article “integrating subject pathfinders into a geac ils: a marc-formatted record approach,” an article that made me think about how library websites are structured, and the current trend of developing online research guides and making them discoverable in our websites as a research support tool.6 and finally, i was struck by the innovative approach in 1987’s “remote interactive online support,” wherein the authors wrote about using hardware to make simultaneous shell connections to a search interface so they could give live search guidance to researchers remotely. 7 we take remote technical support for granted now, but in the late 80s, this required several complicated steps to achieve. the 80s were an exciting time for technology development and a decade that is rife with technical evolution. i think this quote from the article “1981 and beyond: visions and decisions” by fasana in the journal of library automation best elucidates the deep connection between the past and the future, “library managers are currently confronted with a dynamic environment in which they are attempting simultaneously to plan library services and systems for the future, and to control the rate and direction of change.”8 this still holds true. library managers are still planning services in a rapidly changing environment, except, i like to think we have learned to live with change that we cannot control the rate nor direction of. 1 b. kenney, “guest editorial: old wine in new bottles?,” information technology and libraries, 1 no. 1 (march 1982), p. 3. 2 maryellen sievert, rosie l. albritton, paula roper, and nina clayton, “investigating computer anxiety in an academic library,” information technology and libraries 7 no. 3 (september 1988), pp. 243-252. the 1980s in review | dehmlow 11 https://doi.org/10.6017/ital.v37i3.10749 3 tamas e. doszkocs, “cite nlm: natural-language searching in an online catalog,” information technology and libraries 2 no. 4 (december 1983), p. 364. 4 edwin b. brownrigg, clifford a. lynch, and rebecca pepper, “packet radio for library automation,” information technology and libraries 3 no. 3 (september 1984), pp. 229-244. 5 vladimir t. borovansky and george s. machovec, “microcomputer based faculty-profile,” information technology and libraries 4 no. 4 (december 1985), pp. 300-305. 6 william e. jarvis and victoria e. dow, “integrating subject pathfinders into a geac ils: a marcformatted record approach,” information technology and libraries 5 no. 3 (september 1986), pp. 213-227. 7 s. f. rossouw and c. van rooyen, “remote interactive online support,” information technology and libraries 6 no. 4 (december 1987), pp. 311-313. 8 paul j. fasana, “1981 and beyond: visions and decisions,” journal of library automation 13 no. 2 (june 1980), p. 96. letter from the editors (december 2022) letter from the editors kenneth j. varnum and marisha c. kelly information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.16005 from the articles and communications in our december issue, the library technology profession has begun thinking through and reporting on the adaptations and changes wrought by the ongoing (some may say never-ending) covid-19 pandemic. four of the 5 articles in this issue relate to the many ways the pandemic altered how libraries do their work, both behind the scenes and in public. 
from the tools we use internally for project management to those we provide to our public service colleagues, it seems no aspect of library technology has been untouched. in particular, the seriousness of the challenges caused by interfaces with poor accessibility has been brought to the foreground. a critical component of libraries' diversity, equity, inclusion, and accessibility (deia) efforts, ensuring equitable access for all must be top of mind. when the pandemic led libraries, and education in general, to adapt to largely virtual presentation models, the interactive tools we reached for—products such as padlet, jamboard, and poll everywhere—became de rigueur for establishing two-way interactions with our communities. yet little attention was paid, until now, to the accessibility of those tools. in this issue, "tech tools in pandemic-transformed information literacy instruction: pushing for digital accessibility" provides excellent qualitative data to help us understand how well, or poorly, these tools meet accessibility needs.

articles
• digitization of libraries, archives, and museums in russia / heesop kim and nadezhda maltceva
• tech tools in pandemic-transformed information literacy instruction: pushing for digital accessibility / amanda rybin koob, kathia salomé ibacache oliva, michael williamson, marisha lamont-manfre, addie hugen, and amelia dickerson
• spatiotemporal distribution change of online reference during the time of covid-19 / thomas gerrish and ningning nicole kong

communications
• a library website redesign in the time of covid: a chronological case study / erin rushton and bern mulligan
• a library website migration: project planning in the midst of a pandemic / isabel vargas ochoa

as always, if you have lessons learned about technologies and their effect on our mission, we'd like to hear from you. our call for submissions outlines the topics and process for submitting an article for review. if you have questions or wish to bounce ideas off the editor and assistant editor, please contact either of us at the email addresses below. we particularly welcome our public library colleagues to consider a column in our "public libraries leading the way" series; proposals for 2023 may be submitted through the pllw proposal form. with best wishes for 2023,
kenneth j. varnum, editor (varnum@umich.edu)
marisha c. kelly, assistant editor (marisha.librarian@gmail.com)

off-campus access to licensed online resources through shibboleth
francis jayakanth, ananda t. byrappa, and raja visvanathan
information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12589

abstract
institutions of advanced education and research, through their libraries, invest substantially in licensed online resources. only authorized users of an institution are entitled to access licensed online resources.
seamless on-campus access to licensed resources happens mostly through internet protocol (ip) address authentication. increasingly, licensed online resources are accessed by authorized users from off-campus locations as well. libraries will, therefore, need to ensure seamless off-campus access to authorized users. libraries have been using various technologies, including proxy server or virtual private network (vpn) server or single sign-on, to facilitate seamless off-campus access to licensed resources. in this paper, the authors share their experience in setting up a shibboleth-based single sign-on (sso) access management system at the jrd tata memorial library, indian institute of science, to enable authorized users of the institute to seamlessly access licensed online resources from off-campus locations.

francis jayakanth (francis@iisc.ac.in) is scientific officer, j.r.d. tata memorial library, indian institute of science. ananda t. byrappa (anandtb@iisc.ac.in) is librarian, j.r.d. tata memorial library, indian institute of science. raja visvanathan (raja@inflibnet.ac.in) is scientist c (computer science), inflibnet centre, gandhinagar, india. © 2021.

introduction
the internet has both necessitated and offered options for libraries to enable remote access to an organization's licensed online content—journals, e-books, technical standards, bibliographical and full-text databases, and more. in the absence of such an option for remote access, faculty, students, and researchers have limited and constrained access to the licensed online content from off-campus locations. as scholarly resources transitioned from print to online in the mid-1990s, libraries and their vendors had to start identifying user affiliations in order to grant access to licensed online resources to the authorized users of an institution. the ip address was an obvious mechanism to do that. allowing or denying access to online resources based on a user's ip address was simple, it worked, and, in the absence of practical alternatives, it became the universal means of authentication for gaining access to licensed library content.1 to facilitate seamless access to licensed online resources from off-campus sites, libraries have been using various technologies including proxy server or vpn server or remote desktop gateway or federated identity management or a combination of the said technologies. in our institute, the on-campus ip-based access to the licensed content is supplemented by vpn technology for off-campus access. the covid-19 pandemic has necessitated that academic and scientific staff work from home, which demands smooth and seamless access to the organization's licensed content. the sudden surge in demand for seamless off-campus access to the licensed online resources had an impact on the institute's vpn server. also, not all authorized users of the institute are entitled to get vpn access. to mitigate the situation, the library, therefore, had to explore a secure, reliable, and cost-effective solution to facilitate seamless off-campus access to all the licensed online resources for all the authorized users of the institute. after exploring the possibilities, the library decided to implement a single sign-on solution based on shibboleth.
shibboleth software implements the security assertion markup language (saml) protocol, separating the functions of authentication (undertaken by the library or university, which knows its community of end users) and authorization (undertaken by the resource provider, which knows which libraries have licenses for their users to access the resource in question).2

about the indian institute of science (iisc)
the indian institute of science (iisc, or "the institute") was established in 1909 by a visionary partnership between the industrialist jamsetji nusserwanji tata, the maharaja of mysore, and the government of india. over the 109 years since its establishment, iisc has become the premier institute for advanced scientific and technological research and education in india. since its inception, the institute has laid a balanced emphasis on the pursuit of fundamental knowledge in science and engineering, and the application of its research findings for industrial and social benefit. during 2017–18, the institute initiated the practice of undergoing international peer academic reviews over a 5-year cycle. each year, a small team of invited international experts reviews a set of departments. the experts spend 3 to 4 days at the institute. during this period, they interact closely with the faculty and students of these departments and tour the facilities, aiming to assess the academic work against international benchmarks. iisc has topped the ministry of human resource development (mhrd), government of india's nirf (national institutional ranking framework) rankings not only in the university's category but also overall among all ranked institutions. times higher education has placed iisc at the 8th position in its small university rankings (that is, among universities with fewer than 5,000 students), at the 13th position in its ranking of universities in the emerging economies, and in the range 91–100 in its world reputation rankings. in the qs world university rankings, iisc is ranked 170. in the same ranking system, on the metric of citations per faculty, iisc is placed in second position. iisc publishes about 3,000 papers per year in scopus and web of science indexed journals and conferences and, each year, the institute awards around 400 phd degrees.

about the iisc library
jrd tata memorial library (https://www.library.iisc.ac.in), popularly known as the indian institute of science library, is one of the best science and technology libraries in india. started in 1911, as one of the first three departments in the institute, it has become a precious national resource center in the field of science and technology. the library receives annually a grant of 10–12% of the total budget of the institute. the library spends about 95% of its budget toward periodical subscriptions, which is unparalleled in this part of the globe. with a collection of nearly 500,000 volumes of books, periodicals, technical reports and standards, the jrd tata memorial library is one of the finest in the country. currently, it subscribes to over 13,000 current periodicals. the library also maintains the iisc's research publications repository, eprints@iisc (http://eprints.iisc.ac.in), and its theses and dissertations repository, etd@iisc (https://etd.iisc.ac.in).
off-campus access to licensed online resources
in a typical research library, licensed scholarly resources comprise research databases, electronic journals, e-books, standards, and more. a library licenses these resources through publishers/vendors. these license agreements limit access to the resources to the authorized users of an institute. in our case, authorized users include faculty members, enrolled students, current staff, contractual staff, and walk-in users to the library. seamless access to the licensed resources from on-campus sites is predominantly ip-address authenticated, which is a simple and efficient model for users physically located on the institute campus. these users expect a similar experience while accessing licensed online resources from off-campus locations. therefore, the challenge to the libraries is to ensure that such off-campus accesses are secure, seamless, and restricted to authorized users of an institute. libraries have been using various technologies including proxy servers, vpn servers, or single sign-on to facilitate seamless off-campus access to licensed resources. our institute has been using vpn technology to enable off-campus access to licensed online resources. a virtual private network (vpn) is a service offered by many organizations to its members to enable them to remotely connect to the organization's private network. a vpn extends a private network across a public network and allows users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network. applications running across a vpn may therefore benefit from the functionality, security, and management of the private network. encryption is common, although not an inherent part of a vpn connection.3 in our institute, faculty members and students are provided access to the vpn service when their institute email address is created. users follow four steps to use a vpn client to get connected to the campus network:
• install vpn client software on their computer system. cisco anyconnect (https://www.cisco.com/c/en/us/products/collateral/security/anyconnect-secure-mobility-client/at-a-glance-c45-578609.html) is one such software.
• start the vpn client software every time there is a need to connect to the private network.
• enter the address of the institute's vpn server, and click connect in the anyconnect window.
• log in to the vpn server using their institutional email credentials.
an authorized user of the institute can use any of the ip-authenticated network services, including the licensed online resources, after a successful login to the vpn server. the vpn technology has been serving the purpose well, but the service is, by default, available only to the institute's faculty and students. other categories of employees such as project assistants, project associates, research assistants, post-doctoral fellows, and others, who constitute a good percentage of iisc staff, are provided vpn access on a case-by-case basis. during the covid-19 lockdown, the library received several enquiries about accessing the online resources from off-campus sites.
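the steps above assume the cisco anyconnect gui client. as an aside, anyconnect-compatible vpn servers can usually also be reached from the command line with the open-source openconnect client; the hostname and username in the sketch below are placeholders, not iisc's actual vpn details.

```
# Connect to an AnyConnect-compatible VPN server with the open-source
# openconnect client (hostname and username are placeholders).
sudo openconnect --user=jdoe vpn.example.ac.in
```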
realizing the importance of the situation, the library quickly assessed the various possibilities for facilitating seamless off-campus access to the subscribed online resources apart from the vpn-based access. federated access through a shibboleth identity provider (idp) service emerged as a possible solution to facilitate seamless off-campus access for the entire institute community.

federated access
federated access is a model for access control in which authentication and authorization are separated and handled by different parties. if a user wishes to access a resource controlled by a service provider (sp), the user logs in via an identity provider (idp). more complex forms of federated access involve the use of attributes (information about the user passed from the idp to the sp, which can be used to make access decisions) and can include extra services such as trust federations and discovery services (where the user selects which idp to use to connect to the sp).4 examples of this federated access model include shibboleth and openathens. shibboleth is open-source software that offers single sign-on infrastructure. openathens is a commercial product delivered as a cloud-based solution. it supports many of the same standards as shibboleth. so, an institution could pay and join the openathens federation, which will provide technical support to set up, integrate, and operationalize federated access using openathens. we decided to go with shibboleth for the following reasons:
• to avoid the recurring cost associated with the openathens solution.
• the existence of a shibboleth-based infed federation in the country. infed manages the trust between the participating institutions and publishers (http://infed.inflibnet.ac.in/).
• infed is part of the edugain inter-federation, which enables our users to gain access to the resources of federations of other countries.

what is shibboleth?
shibboleth is a standards-based, open-source software package for web single sign-on across or within organizational boundaries. it allows sites to make informed authorization decisions for individual access of protected online resources in a privacy-preserving manner. the shibboleth software implements widely used federated identity standards, principally the oasis security assertion markup language (saml), to provide a federated single sign-on and attribute exchange framework. a user authenticates with their organizational credentials, and the organization (or identity provider) passes the minimal identity information necessary to the service provider to enable an authorization decision. shibboleth also provides extended privacy functionality allowing a user and their home site to control the attributes released to each application (https://www.shibboleth.net/index/). shibboleth has two major components: (1) an identity provider (idp), and (2) a service provider (sp). the idp supplies required authorizations and attributes about the users to the service providers (for example, publishers). the service providers make use of the information about the users sent by the idp to make decisions on whether to allow or deny access to their resources.
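which attributes an idp releases to which sp is normally expressed as attribute filter policy. the fragment below is an illustrative, heavily trimmed shibboleth idp attribute-filter rule (in a real deployment it sits inside an AttributeFilterPolicyGroup element with the appropriate xml namespaces declared); the sp entityid and the released attributes are placeholder examples, not the configuration described in this article.

```xml
<!-- Illustrative, trimmed Shibboleth IdP attribute-filter rule; all values are placeholders. -->
<AttributeFilterPolicy id="releaseToPublisherSP">
  <!-- Apply this rule only to one publisher's SP -->
  <PolicyRequirementRule xsi:type="Requester"
      value="https://sp.publisher-example.com/shibboleth"/>
  <!-- Release only a pseudonymous identifier and the library entitlement -->
  <AttributeRule attributeID="eduPersonTargetedID">
    <PermitValueRule xsi:type="ANY"/>
  </AttributeRule>
  <AttributeRule attributeID="eduPersonEntitlement">
    <PermitValueRule xsi:type="ANY"/>
  </AttributeRule>
</AttributeFilterPolicy>
```

releasing only a pseudonymous identifier and an entitlement value (such as the widely used common-lib-terms entitlement) is one way the "minimal identity information" and privacy-preserving behavior described above is achieved in practice.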
interaction between a shibboleth identity provider and service provider
when a user attempts to access licensed content on the service provider's platform, the service provider generates an authentication request and then directs the request and the user to the user's idp server. the idp prompts for the login credentials. in our setup, the idp server communicates the login credentials to the institute's active directory (ad) using the secure lightweight directory access protocol (ldap). ad is a directory service provided by microsoft. in a directory service, objects (such as a user, a group, a computer, a printer, or a shared folder) are arranged in a hierarchical manner facilitating easy access to the objects. organizations primarily use ad to perform authentication and authorization. once the authenticity of a user is verified, ad helps in determining if a user is authorized to use a specific resource or service. access is granted to a user only if the user checks out on both counts. the ad authenticates a user, and the response is sent back to the idp along with the required attributes. the idp then releases only the required set of attributes to the service provider. based on the idp attributes, which represent a user's entitlements, the sp grants access to the resource. figure 1 illustrates the functioning of the two components of shibboleth.

figure 1. a shibboleth workflow involving a user, identity provider, and service provider.

identity federation
the interaction between a service provider and identity provider happens based on mutual trust. the trust is established by providing idp metadata as encrypted keys and the idp url that the sp uses to send and request information from the idp. the exchange of metadata between idp and sp can be informal if an institution licenses online resources from only a few publishers. however, research libraries license content from hundreds of sps. therefore, the role of federations is significant. in the absence of a federation, each identity provider and service provider must individually communicate with each other about their existence and configuration, as illustrated in figure 2.

figure 2. individual communication between idps and sps.

a federation is merely a list of metadata entries aggregated from their member idps and their sps. our institute is a member of infed (information and library network access management federation). infed was established as a centralized agency to coordinate with member institutions in the process of implementing user authentication and access control mechanisms across all member institutions. infed manages the trust relationship between the idps and sps (publishers) in india. therefore, individual idps that intend to facilitate access to subscribed online resources through shibboleth will share their metadata with infed. infed, in turn, will share the metadata of the idps with respective service providers, as illustrated in figure 3. other regions have their own federations. for example, in the us, incommon (https://www.incommon.org/) serves as the federation, and in the uk, it is the uk access management federation (http://www.ukfederation.org.uk/).
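for reference, in a shibboleth idp v3 installation such as the one described in this article, the connection from the idp to active directory is typically configured in conf/ldap.properties. the sketch below only illustrates the kind of settings involved; every hostname, dn, and filter is a placeholder rather than the institute's actual directory configuration.

```properties
# Illustrative conf/ldap.properties for a Shibboleth IdP v3 authenticating
# against Active Directory over LDAPS. All values are placeholders.
idp.authn.LDAP.authenticator    = bindSearchAuthenticator
idp.authn.LDAP.ldapURL          = ldaps://ad.example.ac.in:636
idp.authn.LDAP.useStartTLS      = false
idp.authn.LDAP.baseDN           = OU=Users,DC=example,DC=ac,DC=in
idp.authn.LDAP.userFilter       = (sAMAccountName={user})
idp.authn.LDAP.bindDN           = CN=idp-bind,OU=Service Accounts,DC=example,DC=ac,DC=in
idp.authn.LDAP.bindDNCredential = *****
```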
figure 3. role of a federation as a trust manager between idps and sps.

how does one gain access to shibboleth-enabled resources?
a federation manages the trust between identity providers and service providers. the sps enable shibboleth-based access to subscribed resources for the idps based on the metadata shared by a federation. once the sps allow access, users can access such resources by using the institutional login option via the athens/shibboleth link found on the service provider's platform. alternatively, a library can create a simple html page listing all the shibboleth-enabled licensed resources, as shown in figure 4.

figure 4. partial screenshot of shibboleth-enabled resources of our institute.

each of the links in figure 4 is a wayfless url. a wayfless url is specific to an institution (idp), and it enables users of that institution to gain federated access to a service or resource in a way that bypasses the where are you from (wayf), or institutional login (discovery service), steps on the sp's platform. since the institutional login or the discovery service step can be confusing to end users, wayfless links to the resources will facilitate an improved end-user experience in accessing licensed resources. a user needs to follow a link from the list of resources. the link will take the user to the sp. the sp will redirect the user to the idp server for authentication. after successful authentication, the user will gain access to the resource. there are two ways to get a wayfless url to a service: (1) the service provider can share the url, or (2) one can make use of a wayfless url generator service like wugen (https://wugen.ukfederation.org.uk/wugen/login.xhtml). a sketch of what such a link typically looks like appears after the benefits list below.

benefits of shibboleth-based access
shibboleth-based single sign-on can effectively address several requirements of the libraries in ensuring secure and seamless on-campus and off-campus access to subscribed online resources. there are other benefits of shibboleth-based sso:
1. it is open-source software that provides single sign-on infrastructure.
2. it enables organizations to use their existing user authentication mechanism to facilitate seamless access to licensed online resources.
3. being a single sign-on system, for the end users, it eliminates the need to have individual credentials for each online resource.
4. it uses security assertion mark-up language (saml) to securely transfer information about the process of authentication and authorization.
5. it is used by most of the publishers, who facilitate shibboleth-based access through shibboleth federations.
6. it requires a formal federation as a trusted interface between the institutions as an identity provider (idp) and publishers as service providers (sp), thereby ensuring the use of uniform standards and protocols while transmitting attributes of authorised users to publishers. inflibnet's access management federation, infed, plays this role (https://parichay.inflibnet.ac.in/objectives.php).
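as noted above, the exact form of a wayfless url varies from publisher to publisher, but for publishers running the shibboleth service provider software a common pattern is an sp-side session-initiator link that names the user's idp and the target resource. the hostnames and entityid in the sketch below are placeholders, not any real publisher or institute values.

```
# A common SP-initiated WAYFless pattern (a single line in practice);
# hostnames and entityID are placeholders, and in a real link the
# entityID and target values are URL-encoded.
https://www.publisher-example.com/Shibboleth.sso/Login?entityID=https://idp.example.ac.in/idp/shibboleth&target=https://www.publisher-example.com/journal/some-article
```

when a user follows such a link, the publisher's sp skips its discovery step, sends the authentication request straight to the named idp, and returns the user to the target resource after a successful login.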
idp server configuration

we installed the shibboleth idp software, version 3.3.2, on a virtual machine on the azure platform. the vm is configured with two virtual cpus, 4 gb of ram, a 300 gib os disk (standard hdd), and ubuntu linux 18.04.4 lts. coordination with the organization's network support team is essential: the team handles domain name service resolution for the idp server, enables the idp server to communicate with the organization's active directory, and opens the non-standard communication ports required on the idp server.

shibboleth idp usage statistics

the infed team has developed a beta version of a usage analysis tool called infedstat to analyse how federated access is being used to reach licensed resources. we have implemented the tool on the idp server. figure 5 shows a redacted screenshot of the infedstat dashboard. it shows

• date-wise usage details of logged-in users, along with ip address, time logged in, and the publishers' platforms accessed,
• the number of times the publishers' platforms were accessed during a specific period,
• the number of times users logged in during a specific period,
• unique users for a specific period, and
• unique publishers accessed during a specific period.

figure 5. idp usage dashboard.

conclusions

the implementation of federated access to subscribed online resources has ensured that all the authorized users of the institute can access almost all the licensed resources from wherever they are. the counter 5 usage analysis of subscribed resources for the period of january 2020 to october 2020 indicates that usage of online resources increased by nearly 20 percent compared with the same period of the previous year. the enhanced use could be partly because of the ease of access facilitated by federated access. to assess the reasons for the increased usage, the library is planning to conduct a survey to understand how convenient and useful federated access to online resources has been, especially while off campus. federated access through single sign-on is useful not just for accessing licensed online resources. a typical research library offers various other services to its users, including an institutional repository, a learning management system, an online catalogue, etc. the library intends to integrate such services with sso, thereby freeing end users from service-specific credentials.

endnotes

1 thomas dowling, "we have outgrown ip authentication," journal of electronic resources librarianship 32, no. 1 (2020): 39–46, https://doi.org/10.1080/1941126x.2019.1709738.
2 john paschoud, "shibboleth and saml: at last, a viable global standard for resource access management," new review of information networking 10, no. 2 (2004): 147–60, https://doi.org/10.1080/13614570500053874.
3 andrew g. mason, ed., cisco secure virtual private network (cisco press, 2001): 7, https://www.ciscopress.com/store/cisco-secure-virtual-private-networks-9781587050336.
4 masha garibyan, simon mcleish, and john paschoud, "current access management technologies," in access and identity management for libraries: controlling access to online information (london, uk: facet publishing, 2014): 31–38.
contactless services: a survey of the practices of large public libraries in china

yajun guo, zinan yang, yiming yuan, huifang ma, and yan quan liu

information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14141

yajun guo (yadon0619@hotmail.com) is professor, school of information management, zhengzhou university of aeronautics. zinan yang (yangzinan612@163.com) is master, school of information management, zhengzhou university of aeronautics. yiming yuan (yuanyiming361@163.com) is master, school of information management, zhengzhou university of aeronautics. huifang ma (mahuifang126@126.com) is master, school of information management, zhengzhou university of aeronautics. yan quan liu (liuy1@southernct.edu), corresponding author, is professor, department of information and library science, southern connecticut state university. © 2022.

abstract

contactless services have become a common way for public libraries to provide services. the strategies used by public libraries in china will help reduce the spread of epidemics transmitted through human contact and will serve as a model for other libraries throughout the world. the primary goal of this study is to gain a deeper understanding of the contactless service measures provided by large chinese public libraries for users in the pandemic era, as well as the challenges and countermeasures involved in providing such services. the data for this study were obtained using a combination of website investigation, content analysis, and telephone interviews for an analytical survey of 128 large public libraries in china. the study finds that touch-free information dissemination, remote resources use, no-touch interaction self-services, network services, online reference, and smart services without personal interactions are among the contactless services available in chinese public libraries. exploring the current state of contactless services in large public libraries in china helps fill a need for empirical attention to contactless services in libraries and the public sector. up-to-date information to assist libraries all over the world in improving their contactless services implementation and practices is provided.

introduction

the spread of covid-19 began in 2020, and people all over the world are still fighting the severity of its spread, the breadth of its impact, and the extent of its endurance. the virus's continued spread has had a wide-ranging impact on industry sectors worldwide, including libraries. the growth of public libraries has also seen significant changes as a result of covid-19, resulting in added patron services, including contactless services. contactless services are those that patrons can use without having to interact face to face with librarians.
these services transcend time and geographical constraints, as well as lower the danger of disease transmission through human interaction. since the covid-19 pandemic, contactless or touch-free interaction services are emerging in chinese public libraries. this service model can also serve as a reference for other libraries. this study evaluates and analyzes contactless service patterns in large public libraries in china, and then suggests a contactless service framework for public libraries, which is currently in the process of being implemented. mailto:yadon0619@hotmail.com mailto:yangzinan612@163.com mailto:mahuifang126@126.com mailto:liuy1@southernct.edu information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 3 literature review the available literature shows that the term “non-contact” appeared as early as 1916 in the article “identification of the meningococcus in the naso-pharynx with special reference to serological reactions” and described a patient’s infection in the context of medical research.1 in recent years, with the widespread application of “internet +” and the development and promotion of technologies such as the internet of things, cloud computing, and artificial intelligence, the contactless economy has grown by leaps and bounds, and so has the research on library contactless services.2 library contactless services encompass a wide range of services such as selfservices, online reference, and smart services without personal interactions. library self-service has become a major service model for contact-free services. the self-service model was first adopted in american public libraries in the 1970s with the emergence of self service borrowing and returning practices.3 many public libraries have since adopted stand-alone, fully automated self-service halls, self-service counters, etc.4 by the 1990s, a range of commercial self-service kiosks and self-service products had been introduced.5 currently, the most mature self-service type used by the library community is the circulation self-service product.6 in addition to self-service borrowing and returning of titles, libraries have launched self-service printing systems, self-service computer systems, and self-service booking of study spaces.7 as an example, patrons can complete printing operations using a self-service system and can offer payment by bank card, alipay, wechat, and other means.8 a face recognition system can also be used to borrow and return books, a solution for patrons who forget their library cards.9 these library selfservice system elements are confined to simple, repetitive, and routine tasks such as conducting book inventories, book handling, circulating books, and the like, whose development stems from the widespread application of electronic magnetic stripe technology and radio frequency identification (rfid), optical character recognition (ocr) technology, and face recognition.10 new applications of technology continue to advance the development of contactless services in libraries. the overall work and service processes of the library have been made intelligent to varying degrees. online reference is an important service in the contactless service program. researchers have started to study the current state of library reference services. 
interactive online reference services support patrons using the library, including how to search for literature, locate and renew books, schedule a study or seminar room, and participate in other library activities, such as seminars, lectures, etc.11 in response to the problem of how patrons access various library service abilities, digital reference systems need to have functions such as automated semantic processing, automated scene awareness, through automatic calculation and adaptive matching, understanding of patrons’ interests preferences and needs, and the ability to recommend the most suitable information resources for them.12 at present, most library reference services in china mainly include the use of telephone, email, wechat, robot librarians/interactive communication, microblogs, and qq, an instant messaging software popular in china. during the past two years, most public libraries in china have essentially implemented the use of the aforementioned reference tools to communicate and interact with patrons, with wechat having a 55.6% adoption rate when compared to other instant reference tools.13 the use of online chat in reference services has allowed librarians to help patrons from anywhere and at any time through embedding chat plug-ins into multiple pages of the library website and directing patrons to ask questions based on the specific page they are viewing, setting up automatic pop-up chat windows, and changing patrons’ passive waiting to active engagement. 14 in terms of technology, emerging technologies information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 4 such as patron profiling, natural language processing, and contextual awareness can support the development of reference advisory services in libraries.15 the online reference service provides a 24/7, high-quality, efficient, and personalized service that connects libraries more closely with society and is an important window in the future smart library service system. smart services without personal interactions may become the most popular form of library services development for the future, and research on library smart services has gradually deepened. in terms of conceptual definition, the library community generally understands the concept of library smart services as mobile library services that are not limited by time and space and can help patrons find books and other types of materials in the library by connecting to the wireless internet.16 apart from this, there are two other ways to define library smart services. one discusses the meaning of smart services in an abstract way, such as library smart services that should be an advanced library form dedicated to knowledge services through human-computer interaction, a comprehensive ecosystem.17 the other concretizes the extension of this concept expressed with a formula “smart library = library + internet of things + cloud computing + smar t devices.”18 applied technology research is an important part of smart services in libraries. library smart services have three main features: digitization, networking, and clustering. 
among them, digitization provides the technical basis, networking provides the information guarantee, and clustering provides the library management model of resources sharing, complementary advantages, and common development among libraries.19 the key breakthrough in the development of smart services is the applications deployment of smart technologies to truly realize a new form of integration of online and offline, virtual and reality. 20 the integration of face recognition technology in traditional libraries, as well as its application to services like acces s control management, book borrowing and returning, and wallet payment, can help libraries build smart services faster.21 the integration of deep learning into a mobile visual search system for library smart services can play an important role in integrating multiple sources of heterogeneous visual data and the personalized preferences of patrons.22 blockchain technology, born out of the impact of the new wave of information technology, has also been applied to the construction of smart library information systems because of its decentralized and secure features.23 library smart services can leverage new technologies and smart devices to enhance the efficiency of library contact-free services and provide new opportunities for knowledge innovation, knowledge sharing, and universal participation, thereby enabling innovation in service models. additional research on the development of contactless services in service areas such as library self-services, online reference, and smart services is discussed. in particular, the research and construction of smart library services have been enriched with the advent of big data and artificial intelligence. however, non-contact service has not been systematically researched and elaborated in domestic and international librarianship. the emergence and prevalence of covid-19 has enabled libraries in many countries to practice various types of touch-free services, such as the introduction of postal delivery, storage deposit, and click-and-collect in australian libraries; curbside pickup service or build a book bag service in us public libraries; and delivery book to the building services in chinese university libraries. 24 therefore, a systematic investigation and study of contactless services in public libraries in the pandemic is of great importance for the adaptation and innovation of library services. information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 5 methods survey samples the survey selected some of the most typical public libraries for the study. the selection criteria were those large public libraries in the more economically and culturally developed regions of china. a total of 128 large public libraries were identified, including national libraries, 32 provincial public libraries, and municipal public libraries in the top 100 cities by gdp ranking in 2020, of which five public libraries, including the capital library and nanjing library, are both top 100 city libraries and provincial libraries. these 128 large public libraries can more obviously reflect the current service level of the better developed public libraries in china, and represent the highest level of public library construction in china. (see table 1 for a list of the libraries studied.) table 1. a list of the 128 public libraries that were studied no. library no. library 1. national library of china 2. hebei library 3. shanxi library 4. liaoning provincial library 5. 
jilin province library 6. heilongjiang provincial library 7. zhejiang library 8. anhui provincial library 9. fujian provincial library 10. jiangxi provincial library 11. shandong library 12. henan provincial library 13. hubei provincial library 14. hunan library 15. guangzhou library 16. hainan library 17. sichuan library 18. guizhou library 19. yunnan provincial library 20. shanxi library 21. gansu provincial library 22. qinghai library 23. guangxi library 24. inner mongolia library 25. tibet library 26. ningxia library 27. xinjiang library 28. shanghai library 29. capital library of china 30. shenzhen library 31. guangzhou digital library 32. chongqing library 33. tianjin library 34. suzhou library 35. chengdu public library 36. wuhan library 37. hangzhou public library 38. nanjing library 39. qingdao library 40. wuxi library 41. changsha library 42. ningbo library 43. foshan library 44. zhengzhou library 45. nantong library 46. dongguan library 47. yantai library 48. quanzhou library 49. dalian library 50. jinan library 51. xi’an public library 52. hefei city library 53. fuzhou library 54. tangshan library 55. changzhou library 56. changchun library 57. guilin library 58. harbin library 59. xuzhou library 60. shijiazhuang library 61. weifang library 62. shenyang library 63. wenzhou library 64. shaoxing library information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 6 no. library no. library 65. yangzhou library 66. yancheng library 67. nanchang library 68. zibo library 69. kunming library 70. taizhou library 71. erdos city library 72. public library of jining 73. taizhou library 74. linyi library 75. luoyang library 76. xiamen library 77. dongying library 78. nanning library 79. zhenjiang library 80. jiaxing library 81. xiangyang library 82. jinhua library 83. yichang library 84. huizhou tsz wan library 85. cangzhou digital library 86. zhangzhou library 87. weihai library 88. digital library of handan 89. guiyang library 90. sun yat-sen library of guangdong province 91. ganzhou library 92. baotou library 93. huaian library 94. yulin digital library 95. dezhou network library 96. yuyang library 97. changde library 98. baoding library 99. the library of jiujiang city 100. taiyuan library 101. hohhot library 102. wuhu library 103. langfang library 104. national library of hengyang city 105. maoming library 106. nanyang library 107. heze library 108. urumqi library 109. zhanjiang library 110. zunyi library 111. shangqiu library 112. jiangmen library 113. liuzhou library 114. zhuzhou library 115. xuchang library 116. chuzhou library 117. lianyungang library 118. suqian library 119. mianyang library 120. zhuhai library 121. xinyang library 122. zhoukou library 123. zhumadian library 124. huzhou library 125. lanzhou library 126. fuyang library 127. xinxiang library 128. jiaozuo library survey methods web-based investigation, content analysis, and interviews with librarians were used to assess 128 public libraries in china. the survey was carried out between march 10 and september 15 in 2021. first, the authors identified the media platforms for sharing information about each public library’s contactless services, including an official website, a social networking account on wechat, or a library-developed app. the authors investigated whether these media platforms were updated with information about the contactless services and if they provided various information about these services. 
next, the authors searched the various contactless services offered by each library through these media platforms and recorded them. finally, the authors reviewed the data and findings from the survey to minimize errors and ensure the accuracy of the findings.

findings

touch-free information distribution

the distribution of library information is generally carried out in a touch-free manner. there are three commonly used information media in libraries: the official website, the wechat official account, and a library-developed app. the adoption rate of each information medium was determined by investigating whether libraries had opened that platform and whether the opened platform was updated with service information. the results showed that the information medium with the highest adoption rate was the wechat official account, reaching 100%. the library's official website showed an adoption rate of 94%. only 57% of libraries use apps to distribute contactless information (see fig. 1).

figure 1. percentage of touch-free information distribution platforms in large public libraries in china.

patron services must provide timely and convenient access if public libraries want to effectively expand their patron base or increase library usage. wechat is better adapted to user convenience than websites, which explains its greater utilization as a contactless information dissemination tool for libraries. as a public service institution, the chinese public library has an incomparable impact on politics, economy, and culture. libraries have a great influence on the cultural popularization and educational development of the public. therefore, touch-free information dissemination plays an important role in improving the efficiency of information dissemination. wechat has been fully integrated into china's public library services as a communication tool, allowing libraries to better foster cultural growth. in the process of cultural growth, libraries need to emphasize interactive public participation and combine public culture, social topics, citizen interaction, and media communication, bringing innovative value to promote urban vitality and urban humanism. the widespread use of wechat helps users stay up to date on the newest information and access library resources and services more conveniently.

remote resources services

restrictions on the use of digital resources are closely related to the frequency of patrons' use. restrictive measures that posed obstacles to patrons using digital resources were identified. among the 128 large public libraries surveyed, 42% of libraries require reader card authentication before patrons can access remote resources services, while 8% of libraries do not require users to have reader cards for services. patrons can use the remote resources services of the remaining 49% of public libraries without needing to register for a user account or patron id on the library website. to reduce the risk of infection between librarians and patrons, some libraries adopted non-contact paper document delivery services for users in urgent need of paper books during the pandemic. for example, the peking university library's book delivery to building service (see fig. 2) and xiamen library and wenzhou library's book delivery to home service (see fig. 3) allow patrons to reserve books online, and librarians will express mail the books to patrons' homes according to their needs.

figure 2. peking university library's book delivery service to the building.

figure 3. book delivery service of xiamen library and wenzhou library.

contactless services have two outstanding advantages: services can be obtained without person-to-person contact, and they are convenient. however, if the use of remote resources is restricted in many ways, it will lead to a decrease in the utilization of digital resources in libraries. while intellectual property requirements and concerns must be appropriately managed, public libraries should strive to provide patrons with unlimited access to digital materials and physical print books.

no-touch interaction self-services

no-touch interaction self-services in chinese public libraries mainly include self-checkout, self-retrieval, self-storage, self-printing, self-card registration, and other self-service options, such as self-payment and self-reservation of study rooms or seminar rooms (see fig. 4).

figure 4. percentage of large public libraries in china that provide contactless self-service.

the survey of large public libraries in china shows that the majority offer self-checkout and self-retrieval services. the percentage of public libraries offering self-storage, self-card registration, and self-printing is low, at 50% or less. self-storage, one of the earlier self-services, has a usage rate of 50%. only 34 percent of public libraries offered self-card registration. the self-service card registration machine has four main functions: reader card registration, payment, password modification, and renewal. for example, when patrons need to pay deposits or overdue fines, they can use the self-service card registration machine to swipe their cards and make payments, facilitating subsequent borrowing of various resources. the machine supports face recognition technology for card application and online deposit recharge, catering to the needs of patrons in many aspects of operation (see fig. 5). the proportion offering self-printing is even lower, available at only 15% of libraries. self-card registration and self-printing are both emerging self-service options that require strong financial and technical support and are therefore not widely available.

figure 5. self-service card registration machine in chinese large public libraries.
information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 11 figure 6. changsha library no-touch interaction self-service hall. figure 7. taiyuan library no-touch interaction self-service hall. 24-hour self-service library the 24-hour self-service library, a contactless phenomenon in china’s public libraries, was introduced in 2006 and officially launched in 2007 by dongguan library and followed by shenzhen library’s initial batch of ten self-service libraries. the success of the shenzhen model has sparked a boom in the construction of self-service libraries in china, with 77% of the chinese public libraries surveyed having opened self-help libraries. the development of self-service libraries is divided into two types of service models: space-based self-service libraries (see fig. 8), i.e., unattended libraries with a certain amount of space for use, in which patrons can freely select books and read for leisure, such as 24-hour city bookstores; and a cabinet-type self-service library (see fig. 9), similar to a bank atm with an operating panel and similar in appearance to a bookcase, which allows real-time data interaction with the central library via the network. the eight self-service libraries in taiyuan library in shanxi can provide self-service book borrowing services through the new model of library + internet + credit, which allows patrons to apply for a reader’s card without a deposit and make reservations online and deliver books to the counter (see fig. 10). by cross-referencing the reader’s card with the patron’s face information, the guangzhou self-service library provides self-service borrowing and returning services for patrons through face recognition. there are many similar self-service libraries in china, which provide various types of patron services in different forms, largely reducing direct contact between patrons and librarians, and between patrons and readers. for example, when the pandemic was most severe, data collected from the ningbo self-service library showed that 7,022 physical books were borrowed and returned from january to march 2020, 50% more than in a normal year.25 information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 12 figure 8. space-based self-service libraries. figure 9. cabinet type self-service library. information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 13 figure 10. taiyuan self-service library. the popularity of 24-hour self-service libraries in china is first and foremost due to the strong support and financial investment of government departments in the construction of self -service libraries. secondly, the features of self-service libraries, which are convenient, time-independent, time-saving, efficient, and diversified, are in line with modern lifestyles, integrating public library services into people’s lives, increasing the visibility and penetration of public library patron services, and maximizing patrons’ needs in reading. network services there is a wide range of network services but the most common are seat reservation, online renewal, and overdue fee payment (see fig. 11). the survey found that 89% of chinese public libraries offer at least one of these network services, indicating a high adoption rate of network services. in 2002, online renewals began to appear in china and then gradually became popular. 
most of the public libraries in china provide this service through the patron's personal library account or the wechat official account, and the rate of adoption of online renewal is as high as 85% among the 128 public libraries surveyed. the prevalence of seat reservation services is not high: only 28% of the public libraries surveyed offered seat reservation services.

figure 11. percentage of large chinese public libraries that provide network services.

coverage of the online overdue fee payment service was even lower, with only 21% of public libraries providing access. however, some libraries have replaced the overdue fee system with other methods, such as the shantou library's lending points system. in that system, the initial number of points on a patron's account is 100, with two points added for each book borrowed and one point deducted for each day a book is overdue. when the points on the account reach zero, the reader's card is frozen for seven days and cannot be used to borrow books. after the freeze is lifted, the number of points is reset to 20.26 in summary, contactless services in china's public libraries are moving in a more humane direction.

online reference services

as a type of contactless service, online reference services are extremely helpful in developing access to documentary information resources. the survey shows that 94% of public libraries provide online reference services. online reference services are available by telephone, website, email, qq, and wechat. telephone reference and website reference are the earliest forms of contactless service, with the highest usage rates of 79% and 71%, respectively, among the public libraries surveyed. this is followed by slightly lower coverage of email reference and qq reference, at 55% and 48% respectively. wechat reference coverage is the lowest, at only 16% (see fig. 12). qq and wechat are both tencent's instant messengers, but qq's file-transfer function is slightly stronger than wechat's: qq can send large files of over 1 gb, and files do not expire, making it easy for reference librarians to communicate with patrons.

figure 12. percentage of large public libraries in china that provided online reference service tools.
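before turning to other reference methods, the shantou lending-points scheme described above is simple enough to express directly. the following sketch is a hypothetical illustration, not the library's actual system: it only applies the rules as reported, namely that accounts start at 100 points, gain two points per loan, lose one point per overdue day, are frozen for seven days at zero, and resume at 20 points.

# minimal sketch of the lending-points rules reported for shantou library.
# purely illustrative; the library's real system is not public.
class LendingPoints:
    START = 100            # initial balance on a new account
    BONUS_PER_LOAN = 2     # points added per book borrowed
    PENALTY_PER_DAY = 1    # points deducted per overdue day
    FREEZE_DAYS = 7        # freeze length once the balance reaches zero
    RESUME_POINTS = 20     # balance after the freeze is lifted

    def __init__(self):
        self.points = self.START
        self.frozen_days_left = 0

    def borrow(self):
        if self.frozen_days_left:
            raise RuntimeError("card is frozen; borrowing not allowed")
        self.points += self.BONUS_PER_LOAN

    def overdue(self, days):
        self.points = max(0, self.points - self.PENALTY_PER_DAY * days)
        if self.points == 0:
            self.frozen_days_left = self.FREEZE_DAYS

    def pass_day(self):
        if self.frozen_days_left:
            self.frozen_days_left -= 1
            if self.frozen_days_left == 0:
                self.points = self.RESUME_POINTS

card = LendingPoints()
card.borrow()          # balance rises to 102
card.overdue(110)      # balance drops to zero, card frozen
for _ in range(7):
    card.pass_day()    # freeze lifted on the seventh day
print(card.points)     # 20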
patrons can also choose expert reference and see available reference experts in the expert list and their details, including name, library, title, specialties, status, etc.27 in addition, the hunan library provides joint online reference, which is a public welfare platform of the hunan provincial literature and information resources common construction and sharing collaborative network, to provide online reference services to the public. eleven member units, including hunan library, hunan university library, and hunan science and technology information institute benefit from the rich literature resources, information technology, and human resources of the network, and all sites work together to provide free online reference advice and remote delivery of literature to a wide range of patrons, as well as advisory and tutorial services to guide patrons on how to use the library’s physical and digital resources.28 smart services without personal interactions driven by artificial intelligence, blockchain, cloud computing, and other technologies, libraries are evolving from physical and digital libraries to smart libraries. smart services without personal interactions are a fundamental capability of smart libraries. this survey found that the coverage of 4% 79% 71% 55% 48% 16% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% others telephone website email qq wechat information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 16 smart services was 52%, with virtual reality coverage at 21%, face recognition coverage at 20%, and swipe face to borrow books at 9%. face recognition can be used in library resources services, face gates, security monitoring, self-checkout, and other online and offline real-name identity verification instances, which can improve the efficiency of identity verification. the biggest advantage of face recognition is that it is contactless and easy to use, avoiding the health and safety risks associated with contact identification such as fingerprints. swipe face to borrow books is one of the applications included in face recognition technology that allows patrons to quickly borrow and return books by swiping faces, even if they have forgotten their reader’s card. this technology also tracks the interests of patrons based on their borrowing habits and history records, providing them with corresponding reading recommendation services. it is worth noting that chinese public libraries have a rich variety of smart service methods. in terms of vr technology applications, the national library of china launched the national library virtual reality system in 2008, the first service in china to bring vr technology to the public eye. the virtual reality system provides patrons with the option to explore virtual scenes and interact with virtual resources available in the library. the virtual scenes are distributed by using computer systems to build realistic architectural structures and reading rooms, so that patrons can learn about the library in the library lobby with the help of vr equipment. virtual resources are digital resources presented in virtual form. the technology combines flash and human gesture recognition systems, allowing patrons to flip through books touch-free at virtual reality reading stations, enhancing the reading style and interactive experience. 
in addition, the fuzhou library is concerned with the characteristics of different groups of people and has made virtual experiences a focus of its services, using vr technology to innovate reading methods, such as presenting animal images in 3d form on a computer screen, which has been welcomed by a large number of readers, especially children. shanghai library, tianjin library, shenzhen library, chongqing library, and jinan library have introduced vr technology into their patron services as to attract more users. in terms of blockchain applications, the national digital library of china makes use of the special features of blockchain technology in terms of distributed storage, traceable transmission, and high-grade encryption to provide full-time, full-domain, and full-scene copyright protection for massive digital resources and promotes the construction of intelligent library services. related to big data technology, the shanghai library provides personalized recommendation services for e-books based on the characteristics of the books borrowed by readers. patrons using a mobile phone can scan a code on borrowed books and click on the recommended book’s cover for immediate reading.29 conclusion & recommendations an in-depth analysis of the contactless service strategy will help to steadily improve the smart library development process in public libraries and to support their transition to smart libraries. this report provides a systematic framework for contactless services for public libraries based on a survey and assessment of the contactless service status of large public libraries in china. contactless patron services, contactless space services, contactless self -services, and contactless extension services are the four key components of the framework (see fig. 13). information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 17 figure 13. a systematic framework of contactless services for public libraries. providing contactless patron services patron services are the heart and soul of each public library. the library’s services providing no personal physical contact or touch-free connection with patrons are referred to as contactless patron services. this includes book lending, online reference, digital resources and network reading promotion. at present, most chinese public libraries have few contactless lending options, making it difficult to meet the needs of patrons who cannot access the library due to covid-19 or transportation difficulties for various reasons. therefore, public libraries can enrich their existing book lending methods by providing patrons with contactless services, such as book delivery and online lending, to create a convenient reading environment. a focus on digital resources is fundamental to achieving contactless patron services. at present, some public libraries in china neglect the management of digital resources due to the emphasis on paper resources, and digital resources are not updated and maintained in a timely manner, which leads to the inability of patrons to use them smoothly; therefore, the effective management of digital resources in libraries is crucial. in addition, public libraries can carry out activities such as network reading promotion and reader education to effectively improve the utilization of library resources. 
information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 18 building contactless space services contactless space services refer to the touch-free interaction between physical space and virtual space. physical space services mainly include self-reservation of study rooms, discussion rooms, meeting rooms, as well as providing venues for public lectures or exhibitions, etc., to fulfill the space demands arising from patrons’ access to information. virtual space services mainly include building spaces for collaboration and communication, creative spaces, information sharing spaces, and cultural spaces, providing a virtual integrated environment for patrons’ needs for information exchange and acquisition in the online environment. public libraries can develop their activities through different channels according to the characteristics and elements of physical and virtual spaces, so that libraries can evolve from “library as a place” to “library as a platform.” the combination of an offline library space and an online library platform provides a more convenient and accessible library experience for patrons. implementing no-touch interaction self-services no-touch interactive self-service plays a pivotal role as one of the service forms of the contactless service strategy. it mainly includes no-touch interaction self-services such as information retrieval, resources navigation, self-checkout, and self-printing. public libraries can set up no-touch interaction self-service sections on their official websites or social media accounts to help patrons quickly access up-to-date information from anywhere and at any time. developing contactless extension services in the three dimensions of time, space, and approach, contactless extension services refer to the mutual extension of the library. public libraries can be open year round on a 24/7 basis or during holidays without librarians, allowing patrons to swipe their own cards to gain access. the traditional collection of paper books should not only be available in offline libraries but can extend to individual self-service libraries or city bookshops. libraries can approach patrons with a more individualized service strategy. for example, some public libraries provide a service called build a book bag, where librarians select books according to the patron’s personal interests and reading preferences and deliver them to a designated location. limitations and prospects after analyzing the current status of contactless services in large public libraries in china, this paper finds that contactless services such as reference and access to digital resources are well established in chinese public libraries. on the other hand, the availability of contactless applications such as no-touch interaction self-services, network services, and smart services without personal interaction are less well-developed. despite the rapid development of touch-free services and their variety, public libraries in china have not yet implemented a system of contactless services. this paper proposes a systematic framework to improve the development and practice of contactless services in public libraries and interrupt the spread of covid-19. the framework includes four core modules: contactless patron services, contactless space services, contactless self-help services, and contactless extension services. it is foreseeable that contactless services will become the mainstream of public library services in the future. 
information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 19 endnotes 1 fred griffith, “identification of the meningococcus in the naso-pharynx with special reference to serological reactions,” journal of hygiene 15, no. 3 (1916): 446–63, https://doi.org/10.1017/s0022172400006355. 2 “guiding opinions of the state council on actively promoting the ‘internet +’ action,” 2015, http://www.gov.cn/zhengce/content/2015-07/04/content_10002.htm. 3 d. brooks, “a program for self-service patron interaction with an online circulation file,” in proceedings of the american society for information science 39th annual meeting (oxford, england, 1976). 4 beth dempsey, “do-it-yourself libraries,” library journal 135, no. 12 (2010): 86–93, https://doi.org/10.1016/j.lisr.2010.03.004. 5 jackie mardikian, “self-service charge systems: current technological applications and their implications for the future library,” reference services review 23, no. 4 (1995): 19–38, https://doi.org/10.1108/eb049262. 6 pan yongming, liu huihui, and liu yanquan, “mobile circulation self-service in u.s. university libraries,” library and information service 58, no. 12 (2014): 26–31, https://doi.org/10.13266/j.issn.0252-3116.2014.12.004. 7 chen wu and jang airong, “building a modern self-service oriented library,” journal of academic libraries, no. 3 (2013): 93–96, https://doi.org/cnki:sun:mrfs.0.2016-24-350. 8 rao zengyang, “innovative strategies for university library services in the era of smart libraries,” library theory and practice, no. 12 (2016): 75–76, https://doi.org/10.14064/j.cnki.issn1005-8214.2016.12.018. 9 wang weiqiu and liu chunli, “functional design and model construction of intelligent library services in china based on face recognition technology,” research on library science, no. 18 (2018): 44–50, https://doi.org/10.15941/j.cnki.issn1001-0424.2018.18.008. 10 cheng huanwen and zhong yuanxin, “a three-dimensional analysis of a smart library,” library tribune 41, no. 6 (2021): 43–45. 11 nahyun kwon and vicki l. gregory, “the effects of librarians’ behavioral performance and user satisfaction in chat reference services,” reference & user services quarterly, no. 47 (2007): 137–48, https://doi.org/10.5860/rusq.47n2.137. 12 w. uutoni, “providing digital reference services: a namibian case study,” new library world 119, no. 5 (2018): 342–56, https://doi.org/10.1108/ils-11-2017-0122. 13 zhu hui, liu hongbin, and zhang li, “an analysis of the remote service model of university libraries in response to public safety emergencies,” new century library, no. 5 (2021): 39–45, https://doi.org/10.16810/j.cnki.1672-514x.2021.05.007. https://doi.org/10.1017/s0022172400006355 http://www.gov.cn/zhengce/content/2015-07/04/content_10002.htm https://doi.org/10.1016/j.lisr.2010.03.004 https://doi.org/10.1108/eb049262. https://doi.org/10.13266/j.issn.0252-3116.2014.12.004 https://doi.org/10.14064/j.cnki.issn1005-8214.2016.12.018 https://doi.org/10.15941/j.cnki.issn1001-0424.2018.18.008 https://doi.org/10.5860/rusq.47n2.137 https://doi.org/10.1108/ils-11-2017-0122 https://doi.org/10.1080/24750158.2020.1840719 information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 20 14 xiangming mu, alexandra dimitroff, jeanette jordan, and natalie burclaff, “a survey and empirical study of virtual reference service in academic libraries,” journal of academic librarianship 37, no. 2 (2011): 120–29, https://doi.org/10.1016/j.acalib.2011.02.003. 
15 cheng xiufeng et al., “a study on a library’s intelligent reference service model based on user portraits,” research on library science, no. 2 (2021): 43–55, https://doi.org/10.15941/j.cnki.is sn1001-0424.2021.02.012. 16 m. aittola, t. ryhänen, and t. ojala, “smart library-location-aware mobile library service,” in human-computer interaction with mobile devices and services, international symposium, (2003). 17 chu jingli and duan meizhen, “from smart libraries to intelligent libraries,” journal of the national library of china, no. 1 (2019): 3–9, https://doi.org/10.13666/j.cnki.jnlc.2019.01.001. 18 yan dong, “iot-based smart libraries,” journal of library science 32, no. 7 (2010): 8–10, http://doi.org/10.14037/j.cnki.tsgxk.2010.07.034. 19 wang shiwei, “a brief discussion of the five relationships of smart libraries,” library journal 36, no. 4 (2017): 4–10, https://doi.org/10.13663/j.cnki.lj.2017.04.001. 20 morell d. boone, “unlv and beyond,” library hi tech 20, no. 1 (2002): 121–23, https://doi.org/10.1108/07378830210733981. 21 qin hong et al., “research on the application of face recognition technology in libraries,” journal of academic libraries 36, no. 6 (2018): 49–54, https://doi.org/10.16603/j.issn10021027.2018.06.008. 22 li mo, “research on a mobile visual search service model for smart libraries based on deep learning,” journal of modern information 39, no. 5 (2019): 89–96. 23 zhou jie, “study on the application of lora technology in smart libraries,” new century library, no. 5 (2021): 57–61, https://doi.org/10.16810/j.cnki.1672-514x.2021.05.010. 24 international federation of library associations and institutions, “the covid-19 and the global library community,” 2020, https://www.ifla.org/covid-19-and-the-global-library-field/; guo yajun, yang zinan, and yang zhishun, “the provision of patron services in chinese academic libraries responding to the covid-19 pandemic,” library hi tech 39, no. 2 (2021): 533–48, https://doi.org/10.1108/lht-04-2020-0098; peking university library, “book delivery service to the buildings where the patrons live,” (2020), https://mp.weixin.qq.com/s/eknyg_-_rjrcl6sjc-it-a. 25 hu bin ying yan, “study on the intelligent construction of ningbo library under the influence of epidemic,” jiangsu science & technology information 38, no. 24 (2021): 17–21, https://doi.org/10.3969/j.issn.1004-7530.2021.24.005. 26 shantou library, “come and be a book ‘saint’! city library changes lending rules, points system instead of overdue fees,” 2021, http://www.stlib.net/information/26182. https://doi.org/10.1016/j.acalib.2011.02.003 https://doi.org/10.13666/j.cnki.jnlc.2019.01.001 https://doi.org/10.13663/j.cnki.lj.2017.04.001 https://doi.org/10.1108/07378830210733981 https://doi.org/10.16603/j.issn1002-1027.2018.06.008 https://doi.org/10.16603/j.issn1002-1027.2018.06.008 https://doi.org/10.16810/j.cnki.1672-514x.2021.05.010 https://www.ifla.org/covid-19-and-the-global-library-field/ https://doi.org/10.1108/lht-04-2020-0098 https://mp.weixin.qq.com/s/eknyg_-_rjrcl6sjc-it-a' http://dx.chinadoi.cn/10.3969/j.issn.1004-7530.2021.24.005 http://www.stlib.net/information/26182 information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 21 27 zhejiang library, “online reference services,” 2020, https://www.zjlib.cn/yibanwt/index.htm?liid=2. 28 hunan provincial collaborative network for the construction and sharing of literature and information resources, “reference union of public libraries in hunan province,” 2021, http://zx.library.hn.cn/. 
29 ministry of culture and tourism of the people’s republic of china, “shanghai library launches personalized recommendation service for e-books,” 2021, https://www.mct.gov.cn/whzx/qg whxxlb/sh/202101/t20210106_920497.htm. https://www.zjlib.cn/yibanwt/index.htm?liid=2 http://zx.library.hn.cn/ https://www.mct.gov.cn/whzx/qgwhxxlb/sh/202101/t20210106_920497.htm https://www.mct.gov.cn/whzx/qgwhxxlb/sh/202101/t20210106_920497.htm abstract introduction literature review methods survey samples survey methods findings touch-free information distribution remote resources services no-touch interaction self-services 24-hour self-service library network services online reference services smart services without personal interactions conclusion & recommendations providing contactless patron services building contactless space services implementing no-touch interaction self-services developing contactless extension services limitations and prospects endnotes letter from the editor: farewell 2020 letter from the editor farewell 2020 kenneth j. varnum information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.13051 i don’t think i’ve ever been so ready to see a year in the rear-view mirror as i am with 2020. this year is one i’d just as soon not repeat, although i nurture a small flame of hope. hope that as a society what we have experienced this year will exert a positive influence on the future. hope that we recall the critical importance of facts and evidence. hope that we don’t drop the effort to be better members of our local, national, and global communities and treat everyone equitably. hope that as a global populace we continue to get into “good trouble” and push back against institutionalized policies and practices of racism and discrimination and strive to be better. despite the myriad challenges this year has brought, it is welcome to see so many libraries continuing to serve their communities, adapting to pandemic restrictions, and providing new and modified access to books and digital information. and equally gratifying, from my perspective as ital’s editor, is that so many library technologists continue to generously share what they have learned through submissions to this journal. along those lines, i’m extending my annual invitation to our public library colleagues to propose a contribution to our quarterly column, “public libraries leading the way.” items in this series highlight a technology-based innovation from a public library perspective. topics we are interested in could include any way that technologies have helped you provide or innovate service to your communities during the pandemic, but could touch on any novel, interesting, or promising use of technology in a public library setting. columns should be in the 1,000-1,500 word range and may include illustrations. these are not intended to be research articles. rather, public libraries leading the way columns are meant to share practical experience with technology development or uses within the library. if you are interested in contributing a column, please submit a brief summary of your idea. wishing you the best for 2021, kenneth j. 
varnum, editor varnum@umich.edu december 2020 https://ejournals.bc.edu/index.php/ital/pllw https://docs.google.com/forms/d/e/1faipqlsd7c0-g-lxetkj2ukjokd7oyt-vprtoizdm1fs8xuhkotctug/viewform https://docs.google.com/forms/d/e/1faipqlsd7c0-g-lxetkj2ukjokd7oyt-vprtoizdm1fs8xuhkotctug/viewform mailto:varnum@umich.edu articles weathering the twitter storm: early uses of social media as a disaster response tool for public libraries during hurricane sandy sharon han information technology and libraries | june 2019 37 sharon han (shrnhan@gmail.com) is candidate for master of science in library and information science, school of information sciences, university of illinois. abstract after a disaster, news reports and online platforms often document the swift response of public libraries supporting their communities. despite current scholarship focused on social media in disasters, early uses of social media as an extension of library services require further scrutiny. the federal emergency management agency (fema) recognized hurricane sandy as one of the earliest u.s. disasters in which first responders used social media. this study specifically examines early uses of twitter by selected public libraries as an information tool during sandy’s aftermath. results can inform uses of social media in library response to future disasters. introduction in the digital age of instantaneous communication, when disasters hit, they hit us all. the fall and winter of 2017-18 brought a literal and figurative deluge to our screens with the arrival of hurricanes harvey, irma, and maria to the united states. within moments of each event, websites and news feeds filled with images of destruction and cries for help. the use of social media to bring awareness to victims’ situations through hashtags and directly tagging first responders underscores the importance of this technological tool in the twenty-first century. in fact, the ubiquity of social media in documenting hurricane harvey have led some to believe that it should be considered the first “social media storm.”1 however, many of the most popular social media platforms have existed since the mid-2000s and have already been used to communicate disasterrelated information since well before harvey reached the united states’ shores. some of social media’s earliest adapters were even public libraries who had the resources and means to use this information technology as a method of connecting with their communities. why should social media matter to public libraries in times of disaster? as a physical manifestation of information access, the public library maintains a relationship with its community that varies across regions, time, and context. currently, the public library as an entity is in an interventionist period, according to jaeger’s article “libraries, policy, and politics in a democracy: four historical epochs,” where its roles and responsibilities are heavily influenced by outside factors, especially the federal government.2 from tax forms to permits to insurance claims, the government encourages people to use the public library to find and use information necessary to navigate american society. public demand for accessing government and other resources is especially apparent after natural disasters, which, due to their unpredictable nature, can heighten weathering the twitter storm | han 38 https://doi.org/10.6017/ital.v38i2.11018 community uncertainty and the need for credible and reliable information. 
public libraries can meet this information need by using social media as one strategy to assess and provide resources in real time. when hurricane sandy made landfall on new jersey's shore on october 29, 2012, it prompted a new era for societal response to emergencies and community needs. due to the hurricane's trajectory into densely populated areas of the american northeast and subsequent widespread flooding, hurricane sandy was the deadliest storm of 2012.3 with initial estimated recovery costs of up to $50 billion, the degree of damage to buildings and infrastructure, and the endangerment of people's safety, made swift and coordinated communication paramount in response efforts. thus, the aftermath of hurricane sandy resulted in federal agencies using social media for the first time in coordinating and implementing disaster response.4 as community-based service providers, many public libraries responded to the hurricane by sharing available resources and services with patrons. however, few studies explicitly examine the use of social media as a tool for libraries to support their communities. this paper explores the role of social media and its impact on public library services in response to hurricane sandy as a measure of libraries using digital media to support their communities. twitter posts from three public libraries impacted by the hurricane are analyzed and compared to reported library services after the storm. the analysis is then used to discuss the use of social media as a library tool and to offer recommendations for social media implementation in future disaster response. background information library response to disasters according to the institute of museum and library services' public library data from 2009 to 2011, over half of all public libraries are located within declared "disaster counties."5 this figure makes disaster response an important topic within public librarianship discourse. in addition to assessing damage to buildings and collections, libraries must also meet the needs of their communities. information needs are heightened after a disaster, as the destruction results in information uncertainty and loss of important resources such as power and telecommunication services.6 consistent and increased use of public libraries is not unusual post-disaster. for example, despite 35 percent of louisiana libraries being closed after hurricane katrina in 2005, a study found that overall library visitor counts only decreased by 1 percent.7 frequent use of library resources after a disaster can be attributed to the library's free and low-cost resources, as well as the institution's reputation as a source for reliable and credible information.8 libraries also extend their resources and services beyond their walls. library bookmobiles and delivery programs provide services to those who are unable to physically visit the library. 
some libraries use their skills in information management and communication to assist local disaster preparedness groups and response teams.9 in 2011, the federal emergency management agency (fema) declared public libraries eligible for temporary relocation funds in the event of an emergency, a distinction once limited to first responders, hospitals, utilities, and schools.10 former executive director of the american library association’s (ala) washington office, emily sheketoff, stated such a distinction recognizes libraries as “essential community organizations.”11 in context with jaeger’s interventionist period, it benefits libraries and government agencies alike to have libraries open to serve communities after a disaster. information technology and libraries | june 2019 39 in the aftermath of hurricane sandy, communities suffered from varying degrees of damage, such as flooding, power outages, debris, and downed trees.12 the impact of the storm drove many community members to their local libraries to seek shelter, charge their electronics, file insurance claims and other e-government forms, drop off or pick up donations, and obtain entertainment.13 despite the many stories of libraries serving disaster victims and working with first responders, such actions have yet to be translated into widespread library policy and procedures. ala provides a “disaster preparedness and recovery” resources webpage, but it primarily focuses on addressing material and structural needs after a disaster, such as mitigating water damage to collections.14 other studies also note a majority of library disaster response literature remains focused on protecting materials.15 such a limited perspective is highlighted in a national survey in which the majority of librarian respondents believed protecting library materials and performing daily services were their primary goals in the event of an emergency.16 as a result, library communication with the community and local organizations remains a relatively unexplored subject in context with disaster response.17 while trade journals and websites publish stories of individual libraries serving their communities, formal studies and research are comparatively scarce. with the widespread use of technology and the internet, one method of communication stands out as an important tool for library outreach and study: social media. disaster response through social media as information providers and advocates of communication technology, libraries should use social media to connect with their communities. although libraries were early adopters of social media prior to hurricane sandy, their use of these tools tends to focus on one-way information sharing instead of a dialogue with their community.18 social media in context of disaster response may upend traditional library social media use, which is why this topic needs further examination. 
social media coupled with mobile technology has created a society in which information sharing and communication are constant and instantaneous.19 since social networking is a relatively new form of media, formal studies on its impact on social behaviors have only come about in the last decade.20 within this young body of literature, however, social media use in disaster response and recovery is a popular topic for researchers, organizations, and federal agencies.21 alexander claims that social media provides the following benefits during disaster response: • provides an outlet to listen and share thoughts, emotions, opinions; • monitors a situation; • integrates social media into emergency plans; • crowdsources information; • creates social cohesion and promoting therapeutic initiatives; • furthers causes; and • creates research data.22 such a comprehensive list is beneficial to this study because it provides a framework through which library social media use can be examined. these benefits stem from the sharing of information with people or entities, which is a large component of library disaster response, as discussed in the previous section. using alexander’s list as a reference, the three main benefits this study examines in context with library disaster response are: weathering the twitter storm | han 40 https://doi.org/10.6017/ital.v38i2.11018 1. monitors a situation. a survey of library patrons impacted by the 2015 south carolina floods revealed all respondents used social media to learn about the flooding and impacted areas.23 people now frequently use social media to get updates on situations, whether they were directly or indirectly impacted by the natural disaster itself. disaster response groups also monitor social media feeds to assess and allocate resources to those in need.24 libraries can use social media feeds to assess resources and services use, plan outreach opportunities, and even inform the public about its own status during the disaster. 2. integrates social media into emergency plans. social media is a low-cost and effective way to coordinate disaster response between organizations and people. much like bookmobiles, social media serves as outreach for librarians to improve service accessibility. librarians can use platforms like twitter and facebook to help coordinate their activities and services alongside with other responders in the community. having an established plan of action where the library’s role and responsibilities are clearly outlined will result in more effective service and efficient response to community needs.25 3. creates social cohesion and promoting therapeutic initiatives. in alignment with the library’s mission of creating and serving communities, social media can act as an extra method of fostering connections in times of need. disaster victims can take advantage of social media’s speed and ubiquity to check in with family, tell them they are safe, and participate in relief efforts.26 social cohesion through platforms such as twitter can also create participatory discourse between people and organizations. 
for example, then-fema administrator craig fugate's recommendation to read to children during the hurricane prompted the hashtag #stormreads to trend on twitter, as many accounts, libraries included, shared their recommended titles.27 library use of social media can also address growing concerns about rumors and misinformation spread during disasters.28 as providers of reliable and accurate information, libraries help establish source credibility and push more accurate resources to misinformed and unaware community members. although there is a substantial amount of research focused on libraries responding to disasters and on social media use during disasters separately, there is a gap in library science literature examining social media as a method of library disaster response. interestingly, formal studies that mention library disaster response note an explicit absence of social media as a form of emergency communication.29 despite the current dearth, library social media studies can develop quickly thanks to the abundant amount of data available on social media platforms. as libraries continue to respond to disasters, they will require more deliberate and planned use of social media as a communication tool. such a need demands a closer examination of how libraries have historically used social media during disasters. case studies: three public libraries and twitter this study examines the social media feeds of three public libraries during and immediately after hurricane sandy made landfall on the northeastern coast as a measure of social media's impact on communication and information sharing among libraries, patrons, and first responders. due to its frequent use for sharing up-to-date information, twitter was the selected social media platform to study.30 the public library systems were selected for this analysis based on their varying characteristics and available literature describing their actions after the hurricane. new york public library (nypl, @nypl), princeton public library (ppl, @princetonpl), and queens library (ql, @queenslibrary) had twitter accounts that were at least two years old by october 2012. all accounts were active during the time period of interest, although the libraries themselves were closed when hurricane sandy made landfall. nypl and ql were closed an additional two days due to damage to several branch libraries.31 these library systems serve varied communities. nypl and ql are urban libraries located in new york city, with 91 and 62 branches respectively, and ppl is a single-branch library located in downtown princeton, new jersey. the larger library systems reported flooding and power outages at several branches from the hurricane, while ppl sustained no structural or internal damages.32 however, all library systems were in communities where large numbers of households lost electricity and internet access and sustained damage from fallen trees and flooding.33 the library systems were mentioned in news reports for services to library patrons affected by the storm, including providing charging stations for electronics, helping people fill out fema insurance forms, running programs for children and adults, and having public computers and wireless connections to access the internet.34 the libraries' coupled use of twitter and active provision of disaster response services make them ideal candidates for examining the correlation between the two activities. 
methodology this study used a filtered search on twitter to identify tweets from each library's feed within the time period of interest. within searches, each tweet was recorded and categorized based on content and message format. a single tweet could have more than one category. common content subcategories were identified to improve analysis. the defined categories are as follows:
• hurricane information: information on the hurricane's status and impact from news and government agencies.
• library policies: information on library policies.
• library policies, renewals/fines: information on renewals and fines during the studied time period.
• library status: information on library branch closures.
• library event/service related to hurricane: event or service specifically planned in response to the hurricane.
• library event/service not related to hurricane: regular library programming; included event/service cancellations as a direct or indirect result of the hurricane.
• non-library event/service related to hurricane: information on non-library sponsored events and services provided in response to the hurricane.
• replies: a publicly posted message from the library to another twitter user.
• social interactions: non-informative and informative tweets aimed at conversing with people or organizations in a social manner.
selected categories were then associated with a corresponding benefit from three of alexander's defined benefits (table 1).35 after categorizing, the collected data was organized for analysis and comparison.
table 1. categories organized by social media benefits.36 each benefit is listed with its associated twitter content categories.
monitoring a situation: hurricane information; library event/service related to hurricane; replies.
integrating social media into emergency plans: library policies; library status; non-library event/service related to hurricane.
creating social cohesion and promoting therapeutic initiatives: library event/service related to hurricane; library event/service not related to hurricane; non-library event/service related to hurricane; replies; social interactions.
results from october 29-31, each library used twitter regularly to provide information or to communicate with library followers. tweet frequencies were counted and compared over the five-day period across libraries (figure 1). while nypl and ql averaged almost 11 tweets per day, ppl had nearly double their numbers, at about 18 tweets per day. nypl and ql had a generally increasing trend in tweets, while ppl's twitter use fluctuated greatly. nypl and ql's low tweet counts during the studied time frame may be attributed to library-wide closures, although only ql's tweet count increased significantly upon reopening. figure 1. number of tweets per day by library. content analysis illustrated variations in twitter use across all three libraries (figure 2). nypl tweeted the most about their library status and renewal/fine policy, with 21 and 17 tweets, respectively. ppl focused more on advertising library events and services such as electrical outlets, heat, internet, and entertainment. they also used twitter heavily for social interactions, which accounted for 35 percent of ppl's 112 tweets, including asking questions, recommending books, thanking concerned patrons, and even apologizing for retweeting too many news articles about the hurricane. 
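the per-day and per-category tallies above were compiled from hand-coded tweets; for a larger sample, the same aggregation could be scripted once the coding is done. the following is a minimal sketch, not part of the original study, assuming a hypothetical csv export with illustrative date, library, and categories columns:

```python
import csv
from collections import Counter

# hypothetical export of hand-coded tweets, one row per tweet, e.g.:
# date,library,categories
# 2012-10-29,NYPL,"library status;library policies, renewals/fines"
TWEETS_FILE = "coded_tweets.csv"  # illustrative file name

per_day = Counter()        # (library, date) -> tweet count (cf. figure 1)
per_category = Counter()   # (library, category) -> tweet count (cf. figure 2)

with open(TWEETS_FILE, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        library = row["library"]
        per_day[(library, row["date"])] += 1
        # a single tweet can carry more than one category, separated here by ";"
        for category in row["categories"].split(";"):
            category = category.strip()
            if category:
                per_category[(library, category)] += 1

for (library, date), count in sorted(per_day.items()):
    print(f"{library}\t{date}\t{count}")
for (library, category), count in per_category.most_common():
    print(f"{library}\t{category}\t{count}")
```

a script like this only aggregates the labels; the categorization itself remains a manual judgment, as described in the methodology.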
ql’s twitter use was more of a mix, often posting about library status and socially interacting with other twitter users. figure 2. twitter content by library. each library also differed in least common content tweeted. nypl had the fewest tweets about the hurricane, non-library services and events related to the hurricane, other library policies, and social interactions. ppl also had few tweets with information about the hurricane and rarely tweeted about fines and renewals. ql had no tweets about the hurricane, nor did they tweet about any library events or programs that were unrelated to their disaster response. discussion the data collected was analyzed to determine whether each library fulfilled the three identified benefits of social media that directly relate to the library’s mission of information access and community building: monitoring a situation, integrating social media into emergency plans, and creating social cohesion and promoting therapeutic initiatives. each library’s consistent responses to twitter users, status updates, and information about library services illustrates they all monitored their communities’ situations and responded accordingly through services and programs, as evidenced in news reports. libraries also used twitter to engage with others and weathering the twitter storm | han 44 https://doi.org/10.6017/ital.v38i2.11018 create a social network of library patrons and local institutions. based on the lack of information about the storm itself and few recommendations for non-library disaster response group resources, it is not apparent libraries integrated social media as part of their emergency policy and procedures. this also resulted in a dissonance between library action and their online communication. one notable example: many news reports described librarians aiding patrons with finding and filling out fema insurance forms, but only one of the 196 tweets analyzed in this study advertised fema assistance at the library.37 ppl tweeted several posts illustrating library use by affected patrons, but also emphasized they were at capacity due to large visitor numbers and shortages in charging stations and internet bandwidth. ppl also failed to offer alternatives on twitter to meet patron information needs. the lack of a coordinated effort perhaps can be explained in two parts. first, as no two disasters are alike, library response is often a direct reaction to the event and damages to their institution and community. a busy library would logically place social media communication and coordination as a lower priority than other immediate, tangible needs. second, librarians may not make a concerted effort to use social media if they are trained to prioritize protecting library collections and conducting regular services.38 while digital and outreach services such as bookmobiles have been common components of libraries, there is still a noticeable gap in libraries extending these same services using online tools. the libraries in this study used social media as a part of their disaster response, but the lack of planning resulted in each library’s twitter feed acting more as a “triage center,” providing basic assistance as the need arose, rather than an extension of in-house services. takeaways and further research while these libraries provided much needed services in the aftermath of hurricane sandy, their implementation of social media as a communication and information-sharing tool illustrates opportunities to develop more coordinated efforts. 
as library presence on and use of social media continues to grow, it should be considered as a necessary component of library disaster response and collaboration with other government agencies and first responders. while libraries are qualified for fema funding, it is uncertain that local first responder groups are aware of the services and benefits libraries provide post-disaster at all. as of 2013, the u.s. department of homeland security’s virtual social media working group did not include any library organizations, which leaves libraries out of crucial conversations in designing comprehensive disaster response plans.39 in an effort to participate in productive discourse, librarians also need to improve their social media use to better align with their practice when serving distressed communities. while the exact reasons for librarians’ lack of effective social media use in disaster response remains speculative, other research has shown that training opportunities for social media use in libraries remain scarce and not very effective.40 since hurricane sandy, social media has only grown as a powerful tool for people and communities, rendering it an essential skill for librarians today. this should motivate librarians, library associations, and other professional groups to consider developing effective training and workshops geared towards intentional use of social media. despite its power, social media should be seen as a complementary tool to enhance information services for community members. it will optimize the library’s reach, but it cannot completely replace current methods of outreach, nor should it. this is especially important when considering information technology and libraries | june 2019 45 who benefits the most from libraries, many of whom do not necessarily have consistent access to social media.41 social media use varies across age, socioeconomic status, digital access, and education levels, making it important for librarians to consider whose information needs are and are not being met online. considering such limitations, learning impactful social media skills and creating a support network amongst disaster response groups will enable libraries to effectively develop outreach strategies and improve disaster response services. the discussion and takeaways highlight the necessity for further research on social media use in library disaster response. as the history of library development and service informs the direction of libraries today, so too should historic uses of social media as a library service tool guide future work. continuing research may include case studies of public library response to recent disasters, which would provide better insight into the developing use of social media. the identified patterns and strengths can be used to guide future work in incorporating effective social media policies and protocols in library disaster plans. considering social media usage by first responders and federal agencies, future research should also include a closer examination of relationships between public libraries, first responders, and disaster information providers in improving coordinated response efforts. conclusion when disaster strikes, many communities exhibit a great need for resources and information. despite libraries providing much needed service and resources to community members after natural disasters, their use of social media platforms as a tool remains overlooked. 
this study examines historical use of social media as a communication and service tool between libraries, community members, and disaster response groups in the aftermath of hurricane sandy. the effectiveness of social media use was evaluated using alexander’s review of social media benefits and compared with descriptions of post-sandy library resources and services described in the literature. the study found social media use to be highly variable based on content and correlations with reported in-house library services. there was no sign of a coordinated effort with other disaster response groups, and the primary objective of their twitter accounts was connecting with patrons and other organizations through social interactions. improvements to social media use could be achieved through intentional coordination with first responders, directed training, and evaluating social media’s strengths and limitations in disaster response. if libraries wish to continue providing pertinent information, they need to adapt to communication methods used by their community. with social media’s strong presence in society, suburban and urban libraries such as the ones examined in this study should improve their use of social media as an effective information sharing and communication tool. continuing to examine and assess uses of social media as a disaster response tool can help shape policies and procedures that will enable libraries to better serve their communities. references 1 maya rhodan, “‘please send help.’ hurricane harvey victims turn to twitter and facebook,” time, aug. 30, 2017, http://time.com/4921961/hurricane-harvey-twitter-facebook-socialmedia/. weathering the twitter storm | han 46 https://doi.org/10.6017/ital.v38i2.11018 2 paul t. jaeger et al., “libraries, policy, and politics in a democracy: four historical epochs,” library quarterly 83, no. 2 (apr. 2013): 166–81, https://doi.org/10.1086/669559. 3 virtual social media working group and dhs first responders group, “lessons learned: social media and hurricane sandy", u.s. department of homeland security, june 2013, https://www.dhs.gov/sites/default/files/publications/lessons%20learned%20social%20me dia%20and%20hurricane%20sandy.pdf. 4 virtual social media working group and dhs first responders group. 5 bradley w. bishop and shari r. veil, “public libraries as post-crisis information hubs,” public library quarterly 32 (2013): 33–45, https://doi.org/10.1080/01616846.2013.760390. 6 bishop and veil. 7 bishop and veil. 8 bishop and veil; jingjing liu et al., “social media as a tool connecting with library users in disasters: a case study of the 2015 catastrophic flooding in south carolina,” science & technology libraries 36, no. 3 (july 2017): 274–87, https://doi.org/10.1080/0194262x.2017.1358128. 9 charles r. mcclure et al., “hurricane preparedness and response for florida public libraries: best practices and strategies,” florida libraries 52, no. 1 (2009): 4–7. 10 michael kelley, “ala midwinter 2011: fema recognizes libraries as essential community organizations,” school library journal, jan. 11, 2011, http://lj.libraryjournal.com/2011/01/industry-news/ala-midwinter-2011-fema-recognizeslibraries-as-essential-community-organizations/. 11 kelley. 12 maureen m. garvey, “serving a public library community after a natural disaster: recovering from ‘hurricane sandy,’” journal of the leadership & management section 11, no. 2 (spring 2015): 22–31; cathleen a. 
merenda, “how the westbury library helped the community after hurricane sandy,” journal of the leadership & management section 11, no. 2 (spring 2015): 32– 34. 13 sarah bayliss, shelley vale, and mahnaz dar, “libraries respond to hurricane sandy, offering refuge, wifi, and services to needy communities,” school library journal, nov. 1, 2012, http://www.slj.com/2012/11/public-libraries/libraries-respond-to-hurricane-sandy-offeringrefuge-wifi-and-services-to-needy-communities/; joel rose, “for disaster preparedness: pack a library card? : npr,” npr, aug. 12, 2013, https://www.npr.org/2013/08/12/210541233/for-disasters-pack-a-first-aid-kit-bottledwater-and-a-library-card. 14 “disaster preparedness and recovery,” ala advocacy, legislation & issues, 2017, http://www.ala.org/advocacy/govinfo/disasterpreparedness. information technology and libraries | june 2019 47 15 bishop and veil, “public libraries as post-crisis information hubs.” 16 lisl zach, “what do i do in an emergency? the role of public libraries in providing information during times of crisis,” science & technology libraries 30, no. 4 (sept. 2011): 404–13, https://doi.org/10.1080/0194262x.2011.626341. 17 bishop and veil, “public libraries as post-crisis information hubs.” 18 liu et al., “social media as a tool connecting with library users in disasters: a case study of the 2015 catastrophic flooding in south carolina”; zach, “what do i do in an emergency?” 19 virtual social media working group and dhs first responders group, “lessons learned.” 20 david alexander, “social media in disaster risk reduction and crisis management,” science & engineering ethics 20, no. 3 (sept. 2014): 717–33, https://doi.org/10.1007/s11948-013-9502z. 21 alexander; liu et al., “social media as a tool”; virtual social media working group and dhs first responders group, “lessons learned.” 22 alexander, “social media in disaster risk reduction.” 23 liu et al., “social media as a tool.” 24 alexander, “social media in disaster risk reduction.” 25 bishop and veil, “public libraries as post-crisis information hubs.” 26 alexander, “social media in disaster risk reduction.” 27 bayliss, vale, and dar, “libraries respond.” 28 liu et al., “social media as a tool.” 29 liu et al.; zach, “what do i do in an emergency?” 30 deborah d. halsted, library as safe haven: disaster planning, response, and recovery: a how-todo-it manual for librarians, first edition (chicago: american library association, 2014). 31 george m. eberhart, “libraries weather the superstorm,” american libraries magazine, nov. 4, 2012, https://americanlibrariesmagazine.org/2012/11/04/libraries-weather-thesuperstorm/; rose, “for disaster preparedness.” 32 bayliss, vale, and dar, “libraries respond”; eberhart, “libraries weather the superstorm”; rose, “for disaster preparedness.” 33 bayliss, vale, and dar, “libraries respond.” weathering the twitter storm | han 48 https://doi.org/10.6017/ital.v38i2.11018 34 bayliss, vale, and dar; eberhart, “libraries weather the superstorm”; lisa epps and kelvin watson, “emergency! how queens library came to patrons’ rescue after hurricane sandy,” computers in libraries 34, no. 10 (dec. 2014): 3–30; rose, “for disaster preparedness.” 35 alexander, “social media in disaster risk reduction.” 36 benefits listed and defined in alexander, david. “social media in disaster risk reduction and crisis management.” science & engineering ethics 20, no. 3 (sept. 2014): 717–33. https://doi.org/10.1007/s11948-013-9502-z. 
37 eberhart, “libraries weather the superstorm”; rose, “for disaster preparedness.” 38 zach, “what do i do in an emergency?” 39 virtual social media working group and dhs first responders group, “lessons learned.” 40 rachel n. simons, melissa g. ocepek, and lecia j. barker, “teaching tweeting: recommendations for teaching social media work in lis and msis programs,” journal of education for library and information science 57, no. 1 (dec. 1, 2016): 21–30, https://doi.org/10.3138/jelis.57.1.21. 41 alexander, “social media in disaster risk reduction.” microsoft word 13353 20211217 galley.docx article diversity, equity & inclusion statements on academic library websites an analysis of content, communication, and messaging eric ely information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.13353 eric ely (eely@wisc.edu) is a phd candidate in the information school, university of wisconsin-madison. © 2021. abstract post-secondary education in the 21st century united states is rapidly diversifying, and institutions’ online offerings and presence are increasingly significant. academic libraries have an established history of offering virtual services and providing online resources for students, faculty, staff, and the general public. in addition to these services and resources, information on academic library websites can contribute to an institution’s demonstration of value placed on diversity, equity, and inclusion (dei). this article analyzes the dei statements of a library consortium’s member websites to explore how these statements contribute to institutional construction of, and commitment to, diversity, equity, and inclusion. descriptive analysis revealed 12 of 16 member libraries had explicitly labeled dei statements in november 2020, with an additional member updating their website to include such a statement in early 2021. content analysis examined how the existing statements contributed to institutional value of and commitment to dei, and multi-modal theory explored the communicative aspects of dei statement content. analysis revealed vague conceptualizations of diversity and library-centered language in dei statements, while a subset of statements employed anti-racist and social justice language to position the library as an active agent for social change. implications and avenues for future research are discussed. introduction according to the national center for education statistics, 44% of us resident students attending us degree-granting postsecondary institutions were non-white during the fall 2017 term.1 academic libraries can utilize their online presence to engage diverse students. these sites provide users with various services, resources, and information. the convenience of remote access may encourage physical library use and encourage lasting library utilization. clearly demonstrating institutional values of diversity, equity, inclusion (dei) sends a message to users. given the amount and variety of content on academic library websites, creating a shared vision regarding the purpose of academic library websites is challenging.2 as instances of racial discrimination and marginalization continually occur within society, academic libraries can position themselves as agents for social justice via the presence and content of dei statements. 
in addition to social justice and student demographics, professional values, outlined in the american library association’s bill of rights, demonstrate the need for academic libraries to adequately serve non-white students.3 this article examines the dei statements on academic library websites and examines the presence, or lack thereof, and content of these statements to address the following research question: how do dei statements on academic library websites contribute to the construction of institutional value of diversity, equity and inclusion? information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 2 literature review literature regarding dei in academic libraries is plentiful.4 in addition to abundant scholarly research, the association of college and research libraries (acrl) addressed diversity via the “diversity standards: cultural competency for academic libraries,” while equity, diversity and inclusion are part of acrl’s timeless core ideology.5 the association of research libraries (arl) has similarly addressed diversity via the spec kit 356: diversity and inclusion, which compiled information regarding recruitment and retention of minority librarians, strategies for fostering inclusive workplaces, and diversity programs and assessment.6 additionally, the american library association (ala) announced the formation of a joint task force to create a framework for cultural proficiencies in racial equity.7 despite ample research and professional attention to dei, surprisingly, no studies have explicitly examined academic library dei statements and few studies have examined diversity content on library websites. academic libraries and dei statements examining website diversity content, mestre reviewed 107 arl member websites for the presence and visibility of diversity content.8 employing content analysis, mestre found that diversity language which focused on ethnic and racial diversity, particularly for black, latinx, native americans, and asian americans, was most included in a strategic plan (37%, n=39) and in a values statement (27%, n=29). member sites that included diversity in a mission (16%, n=17), vision (14%, n=15) or diversity (13%, n=14) statement were less frequent. generally, across types of diversity-related links and information, diversity content was limited on the arl sites and, when present, was often difficult to find, situated deeply within a website behind multiple layers, or requiring a site search to locate. academic library mission statements expanding the scope to include mission statements yields literature that examines communicating purpose. salisbury and griffis examined the presence and placement of mission statements on 113 arl websites.9 operating under the principle that considers website content as hierarchical (e.g., the most important information is most visible), the authors documented the number of steps necessary to reach the library’s mission statement. eighty-four percent (n=95) of library websites contained a mission statement and 3.5% (n=4) of libraries contained a direct link from the homepage. the authors identified a visibility issue, as mission statements on 14% (n=16) of websites required one click to access, but only four were clearly labeled as mission statements. despite this issue, salisbury and griffis found that mission statements were available in two steps or fewer in over 60% (n=72) of libraries. 
the authors’ findings indicate that academic libraries acknowledge the need to make their mission statements more visible to various stakeholders. academic libraries are responsible to the institutions within which they are situated, making their institutions a primary stakeholder. wadas employed discourse analysis and compared library and institutional mission statements from 44 colleges and universities and found 14 (31.8%) institutions, “showed a discernable degree of agreement between the college or university and library mission statements,” while the remaining 30 showed none.10 like salisbury and griffis’ finding regarding the lack of explicit labeling, wadas also identified a labeling inconsistency regarding the information in, and purpose of, each statement type. wadas’ analysis also identified a prevailing sense of vagueness across college/university and academic library mission statements, further contributing to confusion regarding statements’ purpose and intended messages.11 information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 3 mission statements, strategic plans and diversity: content analyses wilson, meyer, and mcneal examined institutional mission statements and other diversity-related content on 80 websites of institutions of higher education in the united states.12 fifty-nine (75%) referenced diversity in their mission statements and 52 (65%) had a separate diversity statement. of these statements, wilson et al. found most diversity references fell into two areas: population demographics (student body racial or ethnic composition) and cultural vitality (incorporating various cultures within the campus community). of the 59 institutions that referenced diversity in the mission statement, 63% were related to changing student demographics, while 55% referenced cultural diversity. furthermore, less than 10% of the statements included language that fell into both categories, indicating institutions conceptualized diversity in one area or the other. in addition to formal mission statements, wilson et al. found that 52 (65%) of the institutions contained other diversity content. given these findings, the authors state their disappointment in the 25%–35% of institutions that did not include diversity content in official, primary statements. recognizing the rapid developments affecting the lis field, saunders employed content analysis and examined the publicly available strategic plans of 63 acrl institutions.13 saunders’ analysis indicated that while 40 (63.5%) libraries alluded to institutional mission, goals, or strategic plan to some degree, only 17 (27%) made explicit connections, results similar to wadas’ findings.14 regarding specific content, saunders categorized themes into three tiers: major emphasis (>75% of strategic plans), second tier, and other areas of emphasis. saunders’ analysis revealed that strategic plan diversity content was a second-tier issue related to library staff. saunders found the term diversity was used in two ways: to refer to expertise, skills, and abilities; and to delimitate demographic characteristics, including ethnicity, nationality, or language.15 like wilson et al., saunders’ findings demonstrate academic libraries’ recognition of the importance of diversity in higher education.16 methodology this study employed content analysis and examined uborrow consortium members’ library websites (see appendix a for a list) for the presence and content of dei statements. 
uborrow is an interlibrary loan service composed of big ten academic alliance members, plus the university of chicago and the center for research libraries, in which "users at member institutions are granted access to the collective wealth of information of the entire consortium."17 uborrow members leverage individual campus resources to collaboratively assist the academic pursuits of students and faculty of each institution via the expedited sharing of resources. these libraries were chosen as representative of a model consortium and are a reasonable focus for examination. content analysis is a research technique for making replicable and valid inferences from texts or other forms of contextually based data. it allows for data analysis "in view of the meanings, symbolic qualities, and expressive contents they have and of the communicative roles they play in the lives of the data's sources."18 content analysis provides a foundation for understanding how messages and meanings are constructed. as such, content analysis is appropriate for analyzing the content and meanings of dei statements on uborrow websites. additionally, this study utilized multimodal theory, particularly lemke's hypermodality and three communicative acts: organizational, presentational, and orientational.19 examining dei statements as multimodal texts allows for the analysis of meaning making and construction across each type of act. just as users make meanings across sentences, paragraphs, and pages, users likewise make meanings from the ways in which they interact with digital information.20 the organizational aspect provides a way to examine the spatial arrangement of library websites, for example, libraries that dedicate entire webpages to dei statements or those in which these statements share pages with other content. content analysis provides a way of examining the presentational aspect of information, the ideational content of texts, in this case how dei statements are presented on uborrow websites. content analysis also provides a way to examine the orientational aspect, which indicates the nature of the communicative relationship, by exploring how libraries establish relations with those with whom they are communicating, for example, how the presence of dei statements positions libraries as conscientious entities engaged in the promotion of diverse and inclusive environments. i examined each uborrow member website for an explicit dei statement. informed by previous literature, i created an excel spreadsheet and entered data for each institution, including institution name, library website url, dei statement (yes/no), homepage link (yes/no), dei statement url, and notes, following a standardized process. first, i recorded the library's homepage. next, i searched the homepage for a dei statement link. if found, i indicated the presence in the yes/no columns and recorded the url. only direct links to library dei statements were marked as yes in the homepage link column. if no homepage link existed, i searched the library websites using the following terms: diversity, equity, and inclusion. when it was difficult to locate dei statements, i utilized the chat feature or e-mailed library administrators to ensure i was not overlooking relevant information. i conducted an initial search in july 2020 and a subsequent search in november 2020; no changes to explicit dei statements occurred between the two searches. 
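the homepage check described above was done manually; a scripted first pass could flag candidate links for the same kind of review. the following is a minimal sketch, not part of the original study, assuming a hypothetical sample of homepage urls and an illustrative output file name:

```python
import csv
import requests
from bs4 import BeautifulSoup

# illustrative sample of member homepages; the study itself covered all 16 members
HOMEPAGES = {
    "example university a": "https://library.example-a.edu/",
    "example university b": "https://library.example-b.edu/",
}
TERMS = ("diversity", "equity", "inclusion")

rows = []
for institution, url in HOMEPAGES.items():
    homepage_link, dei_url = "no", ""
    try:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        # look for an anchor whose visible text mentions any of the dei terms
        for a in soup.find_all("a", href=True):
            text = a.get_text(" ", strip=True).lower()
            if any(term in text for term in TERMS):
                homepage_link, dei_url = "yes", a["href"]
                break
    except requests.RequestException:
        homepage_link = "error"  # flag for manual follow-up

    rows.append([institution, url, homepage_link, dei_url])

with open("dei_homepage_links.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["institution", "library website url",
                     "homepage link (yes/no)", "dei statement url"])
    writer.writerows(rows)
```

a flagged link is only a candidate; whether the destination is an explicitly titled dei statement, rather than, say, an event announcement, still requires the manual verification described above.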
i conducted a follow-up search in april 2021. in the intervening months, one major change occurred on the university of minnesota libraries website; implications of this change are discussed below. once dei statements were identified, i examined the pages on which they were located. first, i examined page organization. in this step, i noted whether the dei statement was the sole content on a page and, if not, i noted the accompanying content. this analysis focused on lemke's traversals, or the varied paths available to users in their search and navigation of websites.21 second, i analyzed dei statement content and identified the ways libraries presented their statements. this step included an examination of the language used in the dei statement. third, i expanded upon the presentational analysis and considered the ways dei statement content oriented the library toward users by exploring how statement language contributed to portraying the library in a certain way. this analysis focused on two areas: library-centered language common across statements and social justice language, which a subset of libraries' dei statements employed. limitations uborrow is a single 16-member library consortium. further research on similar consortia or library associations would help address this study's limited sample size. this study focused on explicit dei statements, thereby excluding other forms of dei content (e.g., announcements, marketing material, events). further research employing a broader view of dei content on academic library sites would also build on this study's findings. finally, this study represents library websites during a snapshot in time. findings and analysis twelve (75%) uborrow member websites had an explicitly titled dei statement in november 2020, and 13 had an explicitly titled and labeled dei statement as of april 2021 (see table 1). in november 2020, the university of minnesota had a clearly defined statement; however, this statement was untitled, and its location was unique among websites during the initial search. initially, this statement was not considered an explicit statement due to its lack of a title, the implications of which are discussed in detail below. however, between november 2020 and april 2021, the university of minnesota libraries updated their homepage to include a link to a clearly defined and labeled dei statement, which university librarian and dean of libraries lisa german approved on february 1, 2021. for this reason, the university of minnesota libraries website receives unique discussion in the analysis that follows. three additional consortium members did not have an explicit dei statement. table 1. 
uborrow member libraries and the presence of dei statements (institution: explicit dei statement, y/n)
university of chicago: yes
center for research libraries: no
university of illinois at urbana-champaign: yes
indiana university: yes
university of iowa: yes
university of maryland: yes
university of michigan: yes
university of minnesota: no (fall 2020), yes (spring 2021)
michigan state university: yes
university of nebraska: yes
northwestern university: no
ohio state university: yes
penn state university: yes
purdue university: no
rutgers university: yes
university of wisconsin-madison: yes
while 12/13 of the 16 uborrow members contained an explicitly labeled dei statement, all member institutions addressed dei in some form, including libguides, links to library resources, library events, and statements responding to specific societal events. however, the degree to which additional dei content was prominent varied, with some content buried deep within library sites, as mestre's work indicated.22 dei statement analysis: organization, presentation, and orientation the following section presents the descriptive analysis of the findings utilizing content analysis and lemke's organizational, presentational, and orientational communicative aspects.23 analysis focused on how the organization and presentation of dei statements contributed to the construction of meaning, on the content of dei statements, and on how content is oriented toward users in ways that position academic libraries as conscientious entities via their dei statements. organizational aspect of dei statements the organizational aspect of communication is instrumental and organizes and composes content in such a way that it is coherent and cohesive.24 organizational meanings have practical consequences, as the example of the university of minnesota's library website demonstrates. unique among uborrow sites as of november 2020, the university of minnesota libraries' website contained a clearly focused, although untitled, dei statement. this statement is accessed via the about option on the homepage's menu (see figure 1), which includes dropdown links to library policies, library overview, and the untitled dei statement. figure 1. the university of minnesota libraries' homepage (november 2020). the statement's placement is problematic for several reasons. first, the statement is easy to overlook. the researcher and a library staff member who responded to the researcher's query via library chat both overlooked the statement; only when a third staff member was consulted was the statement identified. second, lemke discusses the affordances of hypertext and the many ways users can navigate websites, calling possible paths traversals.25 among the most basic is the visual-organizational traversal, which considers how webpage composition guides users' eyes across the page. in this instance, the links are a call to action and signify to users that clicking on a link will transport them to a page with more information. static text on a webpage does not offer the same affordance. as a block of text located next to two panes of links, the statement is static, passive, and non-interactive, contributing to the ease with which users can overlook the statement. finally, this statement did not appear in the results of a library site search for the terms diversity, equity, or inclusion. 
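this kind of findability gap, where content is present in the page body but absent from the page title, headings, and site search results, can also be checked programmatically. the following is a minimal sketch, not part of the original study, assuming a hypothetical page url, that reports where each term appears:

```python
import requests
from bs4 import BeautifulSoup

TERMS = ("diversity", "equity", "inclusion")
PAGE_URL = "https://library.example.edu/about"  # illustrative url

soup = BeautifulSoup(requests.get(PAGE_URL, timeout=10).text, "html.parser")

title = (soup.title.get_text(strip=True) if soup.title else "").lower()
headings = " ".join(
    h.get_text(" ", strip=True) for h in soup.find_all(["h1", "h2", "h3"])
).lower()
body = soup.get_text(" ", strip=True).lower()

for term in TERMS:
    found_in = [label for label, text in
                (("title", title), ("headings", headings), ("body", body))
                if term in text]
    # a term found only in the body mirrors the untitled-statement problem:
    # present on the page, but easy to miss when scanning or searching
    print(f"{term}: {', '.join(found_in) if found_in else 'not found'}")
```

a page where all three terms appear only in the body would be flagged for the kind of closer review described here.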
given the various ways users can transverse a website, including actively searching for information, the lacking title makes this statement difficult to locate via scanning and searching. in users’ traversals of websites, two common approaches, identifying links or actively searching for desired information, are not applicable in locating the university of minnesota libraries’ dei statement. in the intervening months, between november 2020 and april 2021, the university of minnesota libraries website was updated to include an explicitly titled and labeled dei statement, available via a link from the homepage, prominently located in the upper right quadrant between the menu bar and hours and locations information (see figure 2). information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 7 figure 2. the umn libraries’ homepage (april 2021). this statement, written by the university libraries’ diversity, equity, and inclusion leadership committee, was approved on february 1, 2021. presented on a standalone page, this statement is similar to those of eight other uborrow consortium members, which are discussed in the next section. organization: stand-alone dei statements of the websites that contained explicitly labeled dei statements, eight libraries dedicated an entire page to the dei statement (nine including the umn libraries update). examination of these webpages revealed similar page titles, with variance according to the terms included. some page titles only included diversity, while others included diversity, equity, and inclusion. the university of michigan was unique as it also included accessibility. the relative consistency across these titles contributes to less frustrating and confusing user experience. clear and descriptive titles provide a positive experience for users accessing pages with an assistive screen reader. logistically, clear titles amplify page presence on searches conducted via google or other search engines. in addition to webpage titles, examination of the eight/nine pages revealed a relatively similar page organization and structure. each page contained headings that included some or all the terms diversity, equity, or inclusion. the pages were text heavy, with the university of nebraska and penn state university the only two whose pages included visual representations of diversity (i.e., images containing multi-racial groups). furthermore, the detail level of libraries’ dei statements were relatively consistent across the eight/nine webpages. while the page titles, organization, and detail of dei statements were similar, differences existed in the amount of additional dei content. for example, along with their dei statement, the university of maryland libraries’ diversity page defined diversity, an equitable environment, and inclusion. the university of michigan library followed their explicit dei statement with information relating the statement to the library’s collections, services, spaces, and people. other library webpages did not contain as much other on-page information. for example, rutgers university libraries links to various dei resources, which was another common trait (see figure 3). although clicking links requires additional steps to reach dei content, the presence of links is significant in consideration of lemke’s traversals. information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 8 figure 3. 
rutgers libraries’ diversity homepage with dei links. of the many types, a common organizational traversal is what lemke terms cohesive, in which “each element is an instance of some general category, and therefore with some thematic and/or visual similarities to the others, and as we catenate them we are cumulating toward an exhaustive exploration of the category”.26 the links on the rutgers university libraries’ diversity page allow users to traverse the library’s dei content, along with institutional dei content, as several links direct users to diversity pages external to the library. these links serve as calls to action and require users to click for more information. associated to dei content via the categorical connection, these links allow users to fully explore and expand upon the information found on the library’s dei statement page, allowing users to create their own meaning of library commitment to dei. user creation of meaning is in opposition to the library making this decision for the user, as when dei statements are placed on pages with other content, as is the case in four uborrow member websites. organization: shared dei statements unlike the libraries that dedicated a page to dei statements, variety exists in the page titles of the four libraries on which dei statements share pages with other content. dei statements are available via the about section of the library’s website, while of these, two are further couched on administration pages. the pages on which dei statements shared space exemplify mestre’s finding that dei content situated deeply within a website are difficult to locate.27 furthermore, the location of dei content is not entirely intuitive, making a user’s traversal to locate desired information less cohesive. michigan state university’s (msu) dei statement is found on the information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 9 library’s strategic plan page. rather than in a single statement, dei content is spread across the library’s strategic plan, including an inclusivity statement; a vision statement; and diversity, equity, and inclusion strategic direction (see figure 4). also, unlike the pages singularly devoted to a dei statement, msu’s strategic plan page was comparatively static with no links to other library or institutional dei content. the lack of links does not allow users to traverse msu’s site for dei content as easily due to the page’s static nature, making it difficult for users to, “construct a traversal which is more than the sum of its parts.”28 in this way, msu constructs the meaning of their commitment to dei via the limitations and restrictions on users’ interaction opportunities with the page on which their dei content is situated. figure 4. michigan state university libraries’ diversity content as part of strategic plan. organization: homepage links homepage links to dei statements were present on seven (58%) library homepages. when present, homepage links were located at two locations: in the menu or page footers. additionally, two levels of clarity existed regarding homepage links, as some sites contained an explicitly labeled link, while others required a two-step process to access the dei link. 
for example, the university of iowa and penn state university libraries each had a clearly labeled link to their dei statement available via a single click on their library homepages. contrastingly, michigan, maryland, nebraska, ohio state, and rutgers all required users to first navigate a menu bar to find a link to the library's dei statement. this two-step process requires more time and effort, whereas direct links require one less step. however, the university of iowa's direct link sits in the library homepage's footer (see figure 5), and penn state university's direct dei link is located near the bottom of the page, requiring users to scroll through the entire page. although requiring an extra step, libraries with a menu link at the top of the homepage, such as the ohio state university (see figure 6), do not require scrolling. a tradeoff therefore exists between page location and the number of steps needed to locate a link to the library's dei statement when a homepage link is present. regardless of the homepage location, the presence of links to dei statements provides relatively easy access, making a user's traversal to these statements relatively effortless and straightforward.

figure 5. university of iowa libraries' homepage dei statement footer link.

figure 6. the ohio state university libraries' homepage dei statement menu link.

presentational aspect of dei statements

lemke defines presentational meanings as those that present some state of affairs, construed from connections among processes, relations, events, participants, and circumstances, and significant for institutional purposes.29 users see the product of the actions that result in public dei statements. the discussions, meetings, efforts, and decisions that contribute to dei statements on library websites are concealed. the presence of dei statements represents the hidden work necessary for their creation, making dei statement content the library's presentation of its commitment to diversity, equity, and inclusion.

presentation: vague language and diversity conceptualizations

examining the content of the 12/13 libraries with an explicit dei statement revealed that these statements are frequently vague. many statements do not include specific language identifying what diverse means or who is in- or excluded. for example, rutgers university libraries' dei statement states, "the libraries advance and promote diversity in all its forms" without describing, defining, or providing examples of diversity.30 additionally, rutgers libraries endeavors "to create a welcoming workplace that reflects and supports the many populations and programs of the university with which we engage [emphasis original]."31 again, no definition indicates who these many populations include. similarly, vague language produced an inconsistency regarding to whom dei statements were directed, with many, but not all, statements including faculty and staff.
indiana university's statement represents the latter, stating, "iu libraries esteems diversity of all kinds […] to support students from diverse socio-economic backgrounds and foster a global, diverse inclusive community… in addition, the libraries commits to diversifying its own staff to reflect a diversity of perspectives and backgrounds [emphasis original]."32 including library faculty and staff acknowledges the potential significance of having a diverse and representative workforce, but still vaguely addresses the issue. unlike many dei statements, which vaguely conceptualize diversity, the university of maryland libraries includes in its definition of diversity "race, ethnicity, nationality, religion, socioeconomic status, education, marital status, language, age, gender, sexual orientation, cognitive or physical disability; and learning styles" while noting diversity is not limited to these categories.33 similarly, the university of iowa libraries "welcomes and serves all, including people of color from all nations, immigrants, people with disabilities, lgbtq, and the most vulnerable in our community."34 while still broad, and with language to cover additional conceptions of diversity, these statements' explicit mention of various groups is unique among uborrow members' dei statements.

presentation: library-focused language

continuing the broad conceptualizations of diversity, the university of chicago libraries' statement includes an inward focus, which asks library users to consider their own positions and backgrounds: "we encourage open and honest discussion, reflect on our assumptions, and actively seek viewpoints beyond our own … and respect the uniqueness that we each bring to our shared endeavors."35 this statement asks library users to actively challenge their own assumptions, values, beliefs, and views. however, the statement does not include active language regarding the need to prepare for challenging and difficult conversations and interactions. furthermore, the general conceptualization of these interactions with diversity makes it difficult for individuals to prepare for concrete situations in which one may encounter challenging, uncomfortable, or difficult conditions. utilizing lemke's presentational aspect of communication, which considers processes, relations, events, participants, and circumstances to create and present a state of affairs, demonstrates that uborrow member libraries are vague in their dei statements, which the university of illinois at urbana-champaign (uiuc) exemplifies in its recognition of "diversity as a constantly changing concept. it [diversity] is purposefully defined broadly as encompassing, but not limited to, individuals' social, cultural, mental and physical differences."36 dei statements, as representative of academic libraries, present these institutions as attuned to larger social issues and to the difficulties of making sweeping, definitive statements regarding diversity, when the term itself is, as the uiuc statement indicates, evolving and contested. the challenges this creates for library administrators, and the hidden work that contributes to the creation and presentation of dei statements, are invisible in the end product, yet inform the content of these public statements.

orientational aspect of dei statements

orientational meanings establish relations between those who are communicating.
these meanings communicate point of view, attitudes, and values.37 dei statements demonstrate libraries' willingness to engage with and address dei issues and, in some cases, to combat racism and discrimination. analyzing the content of these statements produces insights into how statements orient libraries to their audiences. the vague and general language of many library dei statements creates a sense of detachment between libraries and users. conceptualizing diversity using vague language in an exchange between library users and a library dei statement orients the library in an abstract, immaterial way. using vague, broad, and ill-defined language makes no concrete demands of users. additionally, many dei statements are written in library-centered language, which positions the libraries at the center and users as peripheral. for example, the university of nebraska's statement begins, "the university libraries creates and fosters inclusive environments for teaching, learning, scholarship, creative expression and civic engagement."38 in this instance, the onus is on the libraries and what they can do to address issues of diversity, equity, and inclusion. the statement continues, "libraries staff members are empowered to provide an array of library services, collections, and spaces to meet the diverse needs of students, faculty, and researchers."39 again, the library self-promulgates its efforts to address dei issues and ignores users' contributions to positive, inclusive, and equitable environments. the university of nebraska libraries' statement is not unique in its use of library-centered language, as such language is common across uborrow members' dei statements. less vague, more user-centered language would make library dei statements more humanizing and valuable and would contribute to the inclusive environments these statements espouse.

orientation: anti-racism and social justice language

some libraries' dei statements make explicit mention of larger social issues and actively position themselves as advocates of social justice, particularly anti-racism. the university of wisconsin – madison libraries are "dedicated to the principles and practices of social justice, diversity and equality and … commit ourselves to doing our part to end the many forms of discrimination that plague our society."40 the penn state university libraries' statement includes a commitment to "disrupting racism, hate and bias whenever and wherever we encounter it."41 the university of michigan library "actively work[s] to ensure that tenets of diversity and antiracism influence all aspects of our work."42 these statements present the libraries as cognizant, responsible, and socially aware entities. in these statements, the libraries' employment of social justice discourse demonstrates non-neutrality. the university of wisconsin – madison libraries' statement recognizes its place within society and the continual legacy of discrimination that tangibly affects current students.
identifying social discrimination as a "plague" implies a solution via targeted, collective efforts to "further and enable the opportunities for education, benefit the good of the public and inform citizens."43 similarly, the ohio state university libraries are guided by priorities "which facilitate, celebrate and honor diversity, inclusion, access and social justice."44 embracing an active stance against social discrimination and positing the libraries as proponents of social justice utilizes the libraries' dei statement as a tool to combat these injustices. semantically, dei statement text offers information to users. statement content demonstrates libraries' willingness to address dei issues institutionally and within society. the text demonstrates the libraries' desire to combat injustice and the importance they place on doing so. additionally, in linking dei statements to social justice issues, libraries make demands of users. while still employing library-centered language, these statements provide a call to action via their direct acknowledgement that the libraries' actions are part of larger, collective efforts in the continual struggle against social injustices.

lack of explicit dei statements

as the analysis shows, the ways in which academic libraries organize, present, and orient themselves via their dei statements contribute to the construction of institutional value of, and commitment to, diversity, equity, and inclusion. but what about libraries that do not have an explicit dei statement? in the united states context, given the attention to diversity brought by black lives matter and other social movements advocating for social justice, it is surprising that four uborrow members do not have explicitly labeled dei statements on their websites. orientationally, the absence of an explicit dei statement suggests a lack of concern and consideration on the part of libraries and makes them seem out of touch with broader social contexts in which racial disparities persist. a clear dei statement, however, is only a single piece of a library's online presence. academic libraries can organize and present dei content on their websites in other ways, as all uborrow members did, even when an explicit statement was lacking. for example, the purdue university libraries have a diversity, inclusion, racism and anti-racism resources library guide, which acts as a one-stop shop for dei-related material. additionally, this guide contains a statement from the dean of libraries, dated june 2, 2020, condemning systemic racism and making a collective call to action to address it. given this statement, while acknowledging the bureaucratic mechanisms in place that may slow the creation of an explicit dei statement, the question remains: if "enough is enough," as the statement claims, why have the purdue libraries not taken swift action to expedite the bureaucratic process? purdue university libraries are working behind the scenes and have created a council on equity, inclusion and belonging, as well as a new strategic plan in which "edi [equity, diversity and inclusion] is much more prominent in the current draft of that plan than in previous ones."45 similarly, northwestern university does not, as of this writing, have an explicit dei statement.
however, minimal diversity language is present in a public-facing welcome message on the library's about page stating, "your library serves the diversity of the northwestern community."46 furthermore, minimal diversity language appears in the internal strategic plan, 2019–2021, which includes a commitment to "responding to the vibrant diversity of our campus community."47 additionally, recent conversations regarding racism, diversity, and social justice among library leadership have spurred the creation of a formal edi program at the institutional level.48 examining the situation at northwestern university offered a look into the hidden work required to create and present dei content and an explicit dei statement, demonstrating the institutional significance of the presentational aspect of communication.

discussion and implications

the descriptive analysis presented in this study provides a foundation for closer analysis and future research, with potential avenues suggested below. this analysis also illustrates issues with the way in which dei statements are presented on academic library websites, which, given the pervasive whiteness of academic librarianship, affects academic librarians, staff, and the students they serve. following lemke's treatment of organizational meanings as primarily instrumental, the following section discusses presentational and orientational implications of dei statement content.49

academic libraries are an integral component of the institutions within which they are situated. their physical and digital spaces, services, and resources are critical to students' academic success and faculty research. academic libraries also contribute to larger institutional dei initiatives. while an examination of institutional dei statements is beyond the scope of this study, institutional mission and vision statements also address diversity, equity, and inclusion. although many institutions have implemented specific diversity statements, wilson, meyer, and mcneal identified diversity content on institutional websites as being limited.50 given the changing demographics of higher education in the united states, the significance of dei to academic institutions and libraries will continue to increase. if the purpose of mission and diversity statements is to reflect institutional priorities, as wilson and colleagues argue, the presence, or lack thereof, and the content of these statements indicate the extent to which institutions value diversity, equity, and inclusion.51

presentational implications of dei statements

in the context of the present study, that all uborrow member libraries' websites engaged in some way with dei content demonstrates the value they place on diversity, equity, and inclusion. however, that only 12/13 of the 16 sites contained explicitly titled dei statements demonstrates that more concerted effort is required if these libraries are to truly demonstrate their commitment. despite other dei language, northwestern university, a member of the 2020 acrl diversity alliance, does not have an explicit, public-facing dei statement, which demonstrates that academic libraries are involved with diversity initiatives in many different ways.
while academic libraries may have internal policies that guide practice, policies that are not public do not contribute to the construction and dissemination of the libraries' message of commitment to diversity, equity, and inclusion. the lack of a public-facing statement, whether intentional or not, contributes to the message that the library is not fully committed to diversity, equity, and inclusion. in this vein, further exploration of diversity content and statements, at the institutional and library levels, is necessary to expand upon the findings of the present study regarding the messages dei statements send. qualitative studies could investigate the working cultures of academic libraries and explore the internal mechanisms that contribute to the creation of public-facing statements and how these mechanisms operate. lemke argues that presentational meanings are typically uncritical due to the presupposition of institutional hierarchies and roles, which minimize threats to the status quo, making this avenue especially fruitful from a critical or decolonizing framework.52 other opportunities for further research include quantitative content analysis of diversity statements, which could reveal specific words, terms, and phrases that institutions and academic libraries use, shedding light on how these entities conceptualize dei; a minimal illustration of such an analysis appears at the end of this section. research examining users' perspectives of academic library dei content is necessary to explore the ways in which libraries' messages are received.

orientational implications of dei statements

examining uborrow members' dei statements revealed the frequent employment of library-centered language. framing the statements in this way places the responsibility to create inclusive, equitable, and welcoming environments on academic libraries, librarians, and staff. if the onus is on academic libraries, as this dei statement language suggests, those who staff libraries are required to appropriately serve diverse students. as such, the practical consideration of staff training in cultural competence is of paramount importance, which the university of michigan recognizes as they "encourage all library staff to participate in diversity-focused professional development and training activities."53 while training and professional development opportunities are of limited utility on their own, as cultural competence, cultural humility, and a diversity mindset cannot be acquired in one-off sessions, setting a pervasive atmosphere of this kind establishes the library's institutional valuing of diversity, equity, and inclusion. furthermore, hiring and retaining staff representative of student demographics is critical, as doing so is one way academic libraries can demonstrate the value they place on diversity.
that librarianship has traditionally been a white profession, as 86.7% of ala members self-identified as white as of 2017 and 86.1% of higher education credentialed librarians were white as of 2009–2010, exacerbates the need for representative library staff.54 however, recruiting and hiring diverse staff is challenging, as the number of visible minorities in academic librarianship has remained stagnant.55 retaining academic librarians and staff of color is a separate challenge, as institutional and library environments, expectations, and research output are all explicit barriers, while internal pressure and time management constraints are implicit barriers.56 academic librarians and staff of color are subject to racial microaggressions perpetrated by unaware non-minority colleagues, an issue that permeates higher education, particularly at historically white institutions (hwis).57 these environments contribute to individual stress and fatigue for faculty of color.58 a history of what mehra and gray label white-ist trends in lis, an amalgamation of practices that symbolize "racist connotations and racism in lis that is part of its historical evolution and development in the united states," affects librarians and staff of color.59 at the societal level, hate crimes are a continual issue in the united states.60

academic library dei statements were not created to directly address grand social issues. however, some dei statements included a social justice call to action. while not all dei statements contained such language, those that did not still made a commitment to supporting diversity. academic libraries' dei statements identify the scope of available services and demonstrate libraries' collective attempt to provide equitable spaces for all campus community members. while these statements occasionally align with institutional diversity statements, institutional responses to bias and discrimination provide insight into other ways institutions craft an identity.61 especially at hwis, these responses typically include demonstrating a professed commitment to dei; acknowledging actions to prevent future instances; establishing a protocol in the event an incident occurs; and addressing the issue while removing the institution from the perpetrators' actions.62 an academic library dei statement that simply states a commitment to diversity and inclusion without actively promoting change (a promotion lacking in the vague, library-centric language common to these statements) represents a typical, though not emphatic, stance. this passive stance demonstrates the need for critical analysis of orientational meanings. such critical analysis allows for scrutiny of the actors and processes involved in dei statement creation, presentation, and messaging, and offers an avenue to hold institutions accountable for their words and dei statements. future research that examines academic libraries' responses to specific incidents of bias and discrimination could provide further insight into the internal processes that lead to the public display of academic libraries as change agents. additional research could examine individual academic librarians and staff to interrogate the congruences or dissimilarities between individual and institutional practices regarding engagement with dei initiatives.
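as a minimal sketch of the quantitative content analysis suggested above, simple term counts across a set of statements can surface the words and phrases libraries lean on. the statements in the example below are placeholder text, not data from this study, and the term list is only one possible operationalization.

```python
# count how often diversity-related terms appear in each statement.
# the statements dict is placeholder text, not data from this study.
from collections import Counter
import re

statements = {
    "library a": "we advance and promote diversity in all its forms ...",
    "library b": "we are committed to equity, inclusion, and anti-racism ...",
}

terms = ["diversity", "equity", "inclusion", "anti-racism", "social justice"]

def term_counts(text: str) -> Counter:
    counts = Counter()
    lowered = text.lower()
    for term in terms:
        # word-boundary match so "equity" is not counted inside other words
        counts[term] = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
    return counts

for library, text in statements.items():
    print(library, dict(term_counts(text)))
```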
conclusion

examination of uborrow members' websites revealed that 12/13 of 16 sites contained explicitly labeled dei statements. although not all members' sites contained an explicit statement, every library engaged with dei content in some way. among the 12/13 sites that contained an explicit dei statement, distinctions existed in statement organization. eight/nine libraries dedicated an entire page to their dei statement, while four members' statements shared a page with other content. organizationally, the pages containing dei statements were similar, with text-heavy pages common across the websites. presentationally, dei statements serve as public-facing representations of university libraries. the most telling insight into the presentational aspect of communication was revealed in an analysis of the sites that did not contain explicit dei statements, as this analysis examined the hidden work that is necessary in dei statement creation. orientationally, vague and library-centric language distances academic libraries and positions them as abstract entities. those libraries whose dei statements employed social justice language made more concrete demands of users. while explicit dei statements comprise only a portion of academic library dei content, an analysis of these statements revealed the ways in which they contribute to academic libraries' construction of value of, and commitment to, diversity, equity, and inclusion. this analysis demonstrated how the presence, or absence, of dei statements positions libraries as conscious entities operating within institutional and social contexts that both restrain and encourage the promotion of diversity, equity, and inclusion. that the university of minnesota libraries updated their homepage to include a link to a newly constructed dei statement during the months between the first and second examination of uborrow consortium members' websites in this study indicates the significance and value institutions place on dei initiatives. academic libraries, as entities that operate within institutions in the social context of historical racism, discrimination, and marginalization in the united states, are not immune to the consequences of these enduring legacies. despite current and ongoing efforts, this analysis revealed that much work and dedication are still required in the continual engagement with dei initiatives.
appendix a: uborrow member institutions

university of chicago
university of illinois at urbana-champaign
indiana university
university of iowa
university of maryland
university of michigan
michigan state university
university of minnesota
university of nebraska – lincoln
northwestern university
ohio state university
penn state university
purdue university
rutgers university
university of wisconsin – madison
center for research libraries

appendix b: urls for dei pages from uborrow consortium websites

university of chicago: https://www.lib.uchicago.edu/about/thelibrary/
university of illinois at urbana-champaign: https://www.library.illinois.edu/about/administration-overview/
indiana university: https://libraries.indiana.edu/administration#panel-about
university of iowa: https://www.lib.uiowa.edu/about/diversity-equity-inclusion/
university of maryland: https://www.lib.umd.edu/about/deans-office/diversity
university of michigan: https://www.lib.umich.edu/about-us/about-library/diversity-equity-inclusion-and-accessibility
michigan state university: https://lib.msu.edu/strategic-plan/
university of minnesota: https://www.lib.umn.edu/about/inclusion
university of nebraska – lincoln: https://libraries.unl.edu/diversity
ohio state university: https://library.osu.edu/equity-diversity-inclusion
penn state university: https://libraries.psu.edu/about/diversity
rutgers university: https://www.libraries.rutgers.edu/diversity
university of wisconsin – madison: https://www.library.wisc.edu/diversity/

endnotes

1 "table 306.30 fall enrollment of u.s. residents in degree-granting postsecondary institutions, by race/ethnicity: selected years, 1976–2028," national center for education statistics, last modified march 2019, https://nces.ed.gov/programs/digest/d18/tables/dt18_306.30.asp.

2 courtney mcdonald and heidi burkhardt, "library-authored web content and the need for content strategy," information technology and libraries 38, no. 3 (2019): 8–21, https://doi.org/10.6017/ital.v38i3.11015; courtney mcdonald and heidi burkhardt, "web content strategy in practice within academic libraries," information technology and libraries 40, no. 1 (2021): 52–98, https://doi.org/10.6017/ital.v40i1.12453.

3 library bill of rights, american library association, amended january 29, 2019, https://www.ala.org/advocacy/intfreedom/librarybill.

4 alice m. cruz, "intentional integration of diversity ideals in academic libraries: a literature review," the journal of academic librarianship 45, no. 3 (2019): 220–27, https://doi.org/10.1016/j.acalib.2019.02.011; jenny lynne semenza, regina koury, and sandra shropshire, "diversity at work in academic libraries 2010–2015: an annotated bibliography," collection building 36, no. 3 (2017): 89–95, https://doi.org/10.1108/cb-12-2016-0038.

5 acrl racial and ethnic diversity committee, "diversity standards: cultural competency for academic librarians," college and research libraries news 73, no. 9 (2012): 551–61, https://doi.org/10.5860/crln.73.9.8835; "acrl plan for excellence," american library association, revised november 2019, http://www.ala.org/acrl/aboutacrl/strategicplan/stratplan.
6 toni anaya and charlene maxey-harris, diversity and inclusion, spec kit 356 (washington, dc: association of research libraries, september 2017), https://doi.org/10.29242/spec.356.

7 american library association, "acrl, arl, odlos, and pla announce joint cultural competencies task force," news release, may 18, 2020, https://www.ala.org/news/membernews/2020/05/acrl-arl-odlos-and-pla-announce-joint-cultural-competencies-task-force.

8 lori s. mestre, "visibility of diversity within association of research libraries websites," the journal of academic librarianship 37, no. 2 (2011): 101–8, https://doi.org/10.1016/j.acalib.2011.02.001.

9 preston salisbury and matthew r. griffis, "academic library mission statements, web sites, and communicating purpose," the journal of academic librarianship 40, no. 6 (2014): 592–96, https://doi.org/10.1016/j.acalib.2014.07.012.

10 linda r. wadas, "mission statements in academic libraries: a discourse analysis," library management 38, no. 2/3 (2017): 108–16, https://doi.org/10.1108/lm-07-2016-0054.

11 salisbury and griffis, "academic library mission statements"; wadas, "mission statements in academic libraries."

12 jeffery l. wilson, katrina a. meyer, and larry mcneal, "mission and diversity statements: what they do and do not say," innovative higher education 37 (2012): 125–39, https://doi.org/10.1007/s10755-011-9194-8.

13 laura saunders, "academic libraries' strategic plans: top trends and under-recognized areas," the journal of academic librarianship 41, no. 3 (2015): 285–91, https://doi.org/10.1016/j.acalib.2015.03.011.

14 saunders, "academic libraries' strategic plans"; wadas, "mission statements in academic libraries."

15 saunders, "academic libraries' strategic plans."

16 wilson, meyer, and mcneal, "mission and diversity statements"; saunders, "academic libraries' strategic plans."

17 "library borrowing," big ten academic alliance, accessed november 5, 2020, https://www.btaa.org/library/reciprocal-borrowing.

18 klaus krippendorf, content analysis: an introduction to its methodology, 3rd ed. (los angeles, ca: sage, 2013), 49.

19 jay l. lemke, "travels in hypermodality," visual communication 1, no. 3 (2002): 299–325, https://doi.org/10.1177%2f147035720200100303.

20 lemke, "travels in hypermodality."

21 lemke, "travels in hypermodality."

22 mestre, "visibility of diversity."

23 krippendorf, content analysis; lemke, "travels in hypermodality," 304–5.

24 lemke, "travels in hypermodality," 304.

25 lemke, "travels in hypermodality," 300–1.

26 lemke, "travels in hypermodality," 318.

27 mestre, "visibility of diversity."

28 lemke, "travels in hypermodality," 318.

29 lemke, "travels in hypermodality," 304.

30 "diversity, equity, and inclusion," rutgers university libraries, accessed april 2, 2021, https://www.libraries.rutgers.edu/about-rutgers-university-libraries/diversity-equity-and-inclusion.

31 "diversity, equity, and inclusion," rutgers university libraries.
32 "indiana university libraries diversity strategic plan," indiana university libraries, accessed april 2, 2021, https://libraries.indiana.edu/strategicplan.

33 "diversity, equity, inclusion," university of maryland libraries, accessed april 2, 2021, https://www.lib.umd.edu/about/deans-office/diversity.

34 "the university of iowa libraries' commitment to diversity, equity, and inclusion," iowa university libraries, accessed april 2, 2021, https://www.lib.uiowa.edu/about/diversity-equity-inclusion/.

35 "diversity, equity, and inclusion statement," university of chicago library, accessed april 2, 2021, https://www.lib.uchicago.edu/about/thelibrary/.

36 "library diversity statement," university of illinois library diversity committee, accessed april 2, 2021, https://www.library.illinois.edu/about/administration-overview/.

37 lemke, "travels in hypermodality," 304.

38 "diversity mission statement," university of nebraska-lincoln libraries, accessed april 2, 2021, https://libraries.unl.edu/diversity.

39 "diversity mission statement," university of nebraska-lincoln libraries.

40 "our commitment to diversity and inclusion," university of wisconsin–madison libraries, accessed april 2, 2021, https://www.library.wisc.edu/about/administration/commitment-to-diversity-and-inclusion/.

41 "libraries diversity, equity, inclusion, and accessibility (deia) commitment statement," penn state university libraries, accessed april 2, 2021, https://libraries.psu.edu/about/diversity.

42 "diversity, equity, inclusion, and accessibility," university of michigan library, accessed april 2, 2021, https://www.lib.umich.edu/about-us/about-library/diversity-equity-inclusion-and-accessibility.

43 "our commitment to diversity and inclusion," university of wisconsin–madison libraries.

44 "diversity, equity, inclusion and accessibility (deia)," the ohio state university, university libraries, accessed april 2, 2021, https://library.osu.edu/equity-diversity-inclusion.

45 mark a. puente, associate dean for organizational development, inclusion and diversity, personal communication with the author, november 11, 2020.

46 "about," northwestern university libraries, accessed april 2, 2021, https://www.library.northwestern.edu/about/index.html.

47 "strategic plan," northwestern university libraries, accessed july 21, 2020, https://www.library.northwestern.edu/documents/about/2019-21-plan.pdf.

48 claire roccaforte, director of library marketing & communication, personal communication with the author, october 26, 2020.

49 lemke, "travels in hypermodality," 304.

50 wilson, meyer, and mcneal, "mission and diversity statements."

51 wilson, meyer, and mcneal, "mission and diversity statements."

52 lemke, "travels in hypermodality."

53 "diversity, equity, inclusion, and accessibility," university of michigan library.

54 kathy rosa and kelsey henke, 2017 ala demographic study (chicago: ala office for research and statistics, 2017): 1–3, https://www.ala.org/tools/sites/ala.org.tools/files/content/draft%20of%20member%20demographics%20survey%2001-11-2017.pdf; diversity counts 2012 tables (data from diversity counts study, chicago: american library association), https://www.ala.org/aboutala/sites/ala.org.aboutala/files/content/diversity/diversitycounts/diversitycountstables2012.pdf.
55 janice y. kung, k-lee fraser, and dee winn, "diversity initiatives to recruit and retain academic librarians: a systematic review," college and research libraries 81, no. 1 (2020): 96–108, https://doi.org/10.5860/crl.81.1.96.

56 trevar riley-reid, "breaking down barriers: making it easier for academic librarians of color to stay," the journal of academic librarianship 43, no. 5 (2017): 392–96, https://doi.org/10.1016/j.acalib.2017.06.017.

57 jaena alabi, "racial microaggressions in academic libraries: results from a survey of minority and non-minority librarians," the journal of academic librarianship 41, no. 1 (2015): 47–53, https://doi.org/10.1016/j.acalib.2014.10.008; chavella t. pittman, "racial microaggressions: the narratives of african american faculty at a predominantly white university," the journal of negro education 81, no. 1 (2012): 82–92, https://doi.org/10.7709/jnegroeducation.81.1.0082.

58 william a. smith, tara j. yosso, and daniel g. solorzano, "challenging racial battle fatigue on historically white campuses: a critical race examination of race-related stress," in covert racism: theories, institutions, and experiences, ed. rodney d. coates (boston: brill, 2011): 211–37.

59 bharat mehra and laverne gray, "an 'owning up' of white-ist trends in lis to further real transformations," library quarterly 90, no. 2 (2020): 189–239, https://doi.org/10.1086/707674.

60 "hate crime statistics, 2019," federal bureau of investigation, https://ucr.fbi.gov/hatecrime/2019.

61 wadas, "mission statements in academic libraries."

62 glyn hughes, "racial justice, hegemony, and bias incidents in u.s. higher education," multicultural perspectives 15, no. 3 (2013): 126–32, https://doi.org/10.1080/15210960.2013.809301.

a tale of two tools: comparing libkey discovery to quicklinks in primo ve

communication

jill k. locascio and dejah rubel
information technology and libraries | june 2023
https://doi.org/10.6017/ital.v42i2.16253

jill k. locascio (jlocascio@sunyopt.edu) is associate librarian, suny college of optometry. dejah rubel (dejahrubel@ferris.edu) is metadata and electronic resources management librarian, ferris state university. © 2023.

introduction

consistent delivery of full-text content has been a challenge for libraries since the development of online databases. library systems have attempted to meet this challenge, but link resolvers and early direct linking tools often fell short of patron expectations. in the last several years, a new generation of direct linking tools has appeared, two of which will be discussed in this article: third iron's libkey discovery and quicklinks by ex libris, a clarivate company. figure 1 shows the "download pdf" link added by libkey. figure 2 shows the "get pdf" link provided by quicklinks. given the way we configured our discovery interface, a resource cannot receive both the libkey and quicklinks pdf links. these two direct linking tools were chosen because they were both relatively new to the market in april 2021, when this analysis took place, and because they can both be integrated into primo ve, the library discovery system of choice at the authors' home institutions of suny college of optometry and ferris state university. through analysis of the frequency of direct links, link success rate, and number of clicks, this study may help determine which product is most likely to meet your patrons' needs.

figure 1. example of a libkey discovery link in primo ve.
figure 2. example of a quicklink in primo ve.

literature review

over the past 20 years, link resolvers and direct linking have evolved in tandem. early link generator tools, such as proquest's sitebuilder, often involved a process that "… proved too cumbersome for most end-users."1 five years later, tools from ebsco, gale, ovid, and proquest had improved, but they were all proprietary. bickford postulates that metadata-based standards, like openurl, may make linking as simple as copying and pasting from the address bar; however, they may be more likely to fail "… as long as vendors use incompatible, inaccurate, or incomplete metadata."2 the first research was wakimoto's 2006 study of sfx, which relied on 224 test queries and 188,944 individual uses for its data set.3 of those queries, 39.7% of search results included a full-text link, and that link was accessed 65.2% of the time. unfortunately, wakimoto also discovered that 22.2% of all full-text results failed and concluded that most complaints against sfx were problems with the systems it links to and not the link resolver itself. although intended to be provider-neutral, the openurl standard is, in fact, vulnerable to metadata omissions.

content providers, whether aggregators or publishers, have a vested interest in link stability and platform use and have therefore invested in building direct link generation tools. in 2006, grogg examined ebsco's smartlink, which checks access rights before generating the link; proquest's crosslinks, which was used to link from proquest to another vendor's content; and silverplatter and links@ovid, which relied on a knowledge base in the terabytes for static links.4 in 2008, cecchino described the national library of medicine's linkout tool for selected publishers within pubmed.5 they also described two ovid products, links@ovid and linksolver, noting that the former is similar to linkout and the latter is similar to sfx. most of the time these tools worked well, but their use was restricted to a particular platform or set of publishers.

as online public catalogs became discovery layers, direct linking became a feature of the library management system. two studies have been done thus far: silton's analysis of summon and stuart's analysis of 360 link. in 2014, silton tested the percentage of full-text articles retrievable from summon by running a test query and examining the first 100 results. over a year, the total success rate for unfiltered queries rose from 61% to 76%. after direct linking was introduced, the success rate of link resolver links rose to 65.8–73% and direct links succeeded 90.48–100% of the time. silton concluded, "while direct linking had some issues in its early months, it generally performs better than the link resolver."6 in 2011, stuart, varnum, and ahronheim began testing the 1-click feature of 360 link on 579 citations, 82.2% of which were successful. after direct linking became an option for summon in 2012, 61–70% of their sample relied on it. "between direct linking and 1-click about 93 to 94% of the time an attempt was made to lead users directly to the full text of the article … [and] … we were able to reach full text … from 79% to about 84% of the time."7 direct linking outperformed 1-click with a 90% success rate compared to 58–67% for 1-click.
stuart also compared the actual error rate with one based on user reports and discovered that "relying solely on user reports of errors to judge the reliability of full-text links dramatically underreports true problems by a factor of 100."8 openurl links were especially alarming, with approximately 20% of them failing. although direct linking is more reliable, stuart closes by noting that direct linking binds libraries closer to vendors, thereby decreasing their institutional flexibility.

methods

the goal of this project was to assess two of the latest direct linking tools: ex libris's native quicklinks feature and third iron's libkey discovery. we performed a side-by-side comparison of the two tools by searching for specific articles in primo ve, the library discovery system used by the authors' respective home institutions, suny college of optometry and ferris state university, and measuring
• how often each vendor's direct links appeared on the brief record;
• the success rate of the links; and
• the number of clicks it takes from each link to reach the pdf full text.

both suny college of optometry and ferris state university use ex libris' alma as their library services platform. alma provides a number of usage reports in its analytics module. we sourced the queries used in our analysis from the alma analytics link resolver usage report. the report contains a field, number of requests, which records the number of times an openurl request was sent to the link resolver. an openurl request is sent to the link resolver when the user clicks on a link to the link resolver from an outside source (such as google scholar), when the user submits a request using primo's citation linker, or when the user accesses the article's full record in primo by clicking on either the brief record's title or availability statement. this means that results that have a direct link (whether a quicklink or libkey discovery link) on the brief record will not appear in the report if the user clicked the direct link to the article. thus, in order to create test searches that would be an accurate representation of articles being accessed, we used article titles taken from suny optometry's october 2019 alma link resolver usage report, a report that was generated prior to the implementation of both libkey discovery and quicklinks. the report was filtered to include only articles with the source type of primo/primo central to ensure that the initial search was taking place within the native primo interface, as requests from outside sources like google scholar or from primo's citation linker are irrelevant to this analysis. this filtering generated a total of 412 articles. after further removal of duplicates and non-article material, there were 386 article titles in our test query set.

we created two separate primo views as test environments: one with libkey discovery and the other with quicklinks. we ran the test searches twice in each view. in the first round of testing, we recorded whether a direct link was present. we also recorded the name of the full-text provider (if present), as well as whether the article was open access. suny optometry does not filter its primo results by availability; therefore, many of the articles included in the initial search did not have any associated full-text activations.
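as a sketch of the kind of filtering described above, assuming the usage report has been exported to csv, the steps might look like the following. the column names and values are assumptions for illustration, not the exact alma analytics field names, and this is not the authors' actual script.

```python
# illustrative filtering of an exported link resolver usage report:
# keep primo-sourced requests, then drop duplicates and non-article material.
import pandas as pd

report = pd.read_csv("link_resolver_usage_oct2019.csv")

# keep only requests that originated in the native primo interface
primo_only = report[report["source type"] == "Primo/Primo Central"]

# drop duplicate titles and non-article material
articles = (
    primo_only[primo_only["material type"] == "article"]
    .drop_duplicates(subset="article title")
)

print(len(primo_only), "primo-sourced rows")
print(len(articles), "unique article titles in the test query set")
```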
since these articles are irrelevant to our assessment, we removed them before analyzing the first round of data and proceeding with the second search. the exception to these removals was articles identified as open access by unpaywall, as the presence of unpaywall links is independent of any activations in alma. furthermore, third iron's libkey discovery and ex libris' quicklinks both incorporate unpaywall's api into their products to provide direct links to pdfs of open access articles. this functionality helps fill coverage gaps where institutions may not have activated a hybrid open access journal due to its paywalls. therefore, we are including the presence of direct links resulting from the unpaywall api when determining whether a libkey discovery link or quicklink is present. after filtering for availability, we had 254 article titles for the first round of searching and analysis. the initial analysis revealed the need to further filter the articles used for the second round of searching, which would provide a much closer comparison of the two direct linking tools, as third iron had partnered with more content providers than ex libris. controlling for shared providers would give a more accurate representation of how each direct linking tool performs in relation to the other. when controlling for shared providers and open access articles, we were left with 145 article titles for the second query set. during the second round of searching, we measured whether the direct link was successful in linking to the full text (meaning that the link was neither broken nor linked to an incorrect article) and how many clicks were necessary to get from the direct link to the article pdf. along the way, additional qualitative measures were observed, such as document download time and metadata record quality. while not as easy to measure as the quantitative data, these observations provided additional insight into the strengths and weaknesses of each of these direct linking tools. since april 2022, when our research was conducted, ex libris has added several quicklinks providers, possibly increasing the current number of quicklinks available. additionally, both rounds of searching were conducted on campus, so our analysis excludes any consideration of authentication and/or proxy information.

results

of the 254 articles searched, 208 (82%) had libkey discovery links present, while 129 (52%) had quicklinks present. while this seems like a large discrepancy between the two direct link providers, it can be explained by the fact that, during the time of testing, ex libris was collaborating with fewer content providers than third iron. ex libris has since added more providers. while the provider discrepancy meant that there were many instances where a libkey discovery link was present and a quicklink was not, there were 5 articles where a quicklink was present while a libkey discovery link was not. as mentioned previously, the criterion for the 254 articles included in the second round of searching was that the articles must be activated in alma or must be open access. of these 254 articles, we identified 137 (54%) as open access. of those open access articles, 132 (96%) had libkey discovery links present, and 118 (86%) had quicklinks present. we found that 113 (82%) of the open access articles had both libkey discovery links and quicklinks present.
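as an aside, presence rates and overlap figures like those above can be tallied directly from per-article observations. the sketch below uses placeholder rows, not the study's data, and assumes each test search was recorded with simple boolean flags.

```python
# illustrative tally of link presence and overlap from recorded observations.
# the sample rows are placeholders, not data from this study.
rows = [
    {"title": "article 1", "libkey": True,  "quicklink": False, "open_access": True},
    {"title": "article 2", "libkey": True,  "quicklink": True,  "open_access": False},
    {"title": "article 3", "libkey": False, "quicklink": False, "open_access": False},
]

def pct(count: int, total: int) -> str:
    return f"{count}/{total} ({count / total:.0%})"

total = len(rows)
libkey = sum(r["libkey"] for r in rows)
quick = sum(r["quicklink"] for r in rows)
both = sum(r["libkey"] and r["quicklink"] for r in rows)

print("libkey present:", pct(libkey, total))
print("quicklink present:", pct(quick, total))
print("both present:", pct(both, total))
```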
we also discovered within this set of 137 open access articles that 30 (22%) were from non-activated resources. of those 30 open access articles from non-activated titles, all 30 (100%) had libkey discovery links appearing on the brief results and 24 (80%) had quicklinks. to get a better idea of how libkey discovery links and quicklinks compared in terms of linking success, we filtered to only those articles available from providers participating in both libkey discovery and quicklinks. since both direct linking tools use unpaywall integrations, we continued to include open access articles. this filtering resulted in 145 articles, among which libkey discovery links were present in 137 articles (94%) while quicklinks were present in 129 articles (89%). we found that 123 (85%) of these 145 articles had both libkey discovery links and quicklinks present. there were 2 (1%) articles that had neither libkey discovery links nor quicklinks present despite being activated in a journal currently participating as a provider in both direct linking tools. there were also 14 articles (10%) that had libkey discovery links but not quicklinks; all of these articles were open access. in total, of the 145 articles searched, 128 (88%) were identified as open access. as for the 137 libkey discovery links, 130 (95%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article. of the 129 quicklinks, 126 (98%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article. we also attempted to measure the time it took for pages to load after the initial click on the libkey discovery links and quicklinks; however, the tools used to measure this, as well as the environments in which the links were being clicked, proved too varied to provide an appropriate comparison. nevertheless, we observed that page load times after clicking on libkey discovery links and quicklinks were generally consistent, but that quicklinks attempts to connect to the wiley platform took a significant time (at least 10 seconds) to load.

conclusions

with high article linking success rates, both third iron's libkey discovery and ex libris' quicklinks deliver on the promise to provide fast and seamless access to full-text articles. however, the libkey discovery tool far outpaces quicklinks when it comes to coverage. both direct linking tools perform well with open access articles, supplying libraries with better options for full-text links to articles that may be in hybrid journals. as with any kind of full-text linking, both direct linking tools rely on metadata. in conclusion, while libkey discovery provides a more complete direct linking solution, both libkey discovery and quicklinks are reliable tools that improve primo's discovery and delivery experience.

endnotes

1 david bickford, "using direct linking capabilities in aggregated databases for e-reserves," journal of library administration 41, no. 1/2 (2004): 31–45, https://doi.org/10.1300/j111v41n01_04.

2 bickford, 45.

3 wendy furlan, "library users expect link resolvers to provide full text while librarians expect accurate results," evidence based library and information practice 1, no. 4 (2006): 60–63, https://doi.org/10.18438/b88c7p.

4 jill e. grogg, "linking without a stand-alone link resolver," library technology reports 42, no. 1 (2006): 31–34.
5 nicola j. cecchino, "full-text linking demystified," journal of electronic resources in medical libraries 5, no. 1 (2008): 33–42, https://doi.org/10.1080/15424060802093377.

6 kate silton, "assessment of full-text linking in summon: one institution's approach," journal of electronic resources librarianship 26, no. 3 (2014): 163–69, https://doi.org/10.1080/1941126x.2014.936767.

7 kenyon stuart, ken varnum, and judith ahronheim, "measuring journal linking success from a discovery service," information technology and libraries 34, no. 1 (2015): 52–76, https://doi.org/10.6017/ital.v34i1.5607.

8 stuart, varnum, and ahronheim, 74.

cultivating digitization competencies: a case study in leveraging grants as learning opportunities in libraries and archives

article

gayle o'hara, emily lapworth, and cory lampert
information technology and libraries | december 2020
https://doi.org/10.6017/ital.v39i4.11859

gayle o'hara (gayle.ohara@wsu.edu) is manuscripts librarian, washington state university. emily lapworth (emily.lapworth@unlv.edu) is digital special collections & archives librarian, university of nevada las vegas. cory lampert (cory.lampert@unlv.edu) is head of digital collections, university of nevada las vegas. © 2020.

abstract

this article is a case study of how six digitization competencies were developed and disseminated via grant-funded digitization projects at the university of nevada, las vegas libraries special collections and archives. the six competencies are project planning, grant writing, project management, metadata, digital capture, and digital asset management. the authors will introduce each competency, discuss why it is important, and describe how it was developed during the course of the grant project, as well as how it was taught in a workshop environment. the differences in competency development for three different stakeholder groups will be examined: early career grant staff gaining on-the-job experience; experienced digital collections librarians experimenting and innovating; and a statewide audience of cultural heritage professionals attending grant-sponsored workshops.

introduction

digitization of cultural heritage resources is commonly viewed as an important and necessary task for libraries, archives, and museums. there are many reasons for engaging in digitization projects and creating digital collections, including providing increased access to unique collections, preserving fragile records, raising the global profile of the institution, meeting user demand, and supporting the teaching, learning, and research needs of host institutions. in addition, there is an expectation among the public that research resources are digitized and available online. from the perspective of librarians and archivists, digitization of special collections and archives materials involves more than just reformatting analog materials into a digital format (this article uses the term "digitization" to refer to the entire lifecycle of digitization projects involving special collections and archives materials, from planning to preservation).
materials must be selected and prepared, the digital surrogates must be described and preserved, and access must be provided to the appropriate audiences. digitization work is often project-based, since each set of materials to be digitized may require different equipment, specifications, approaches, or workflows. digitization projects and workflows can be a solo affair, a temporary project team, or a permanent functional area complete with staff specializing in activities such as project management, grant writing, web development, or metadata. staff learning needs will vary significantly depending on organizational characteristics, assigned roles, project specifications, and the motivation of individuals. overall, the libraries' and archives' profession-wide approach to teaching and developing digitization competencies is somewhat haphazard. there are many methods to learn about digitization, including self-study of published resources, online tutorials and resources, conference presentations, workshops, continuing education courses, and masters in library and information science (mlis) program classes.1 in many graduate school programs there has been a move toward integrating digital library theory and practice, but courses are necessarily broad in nature, and not every student will be required or have the opportunity to complete a practicum or internship while studying. this can make it difficult for new librarians to identify which skills are most in demand and which type of self-study is most useful for the job market. identifying key competencies, and how to acquire them, may be helpful in supporting new librarians as they make the jump from graduate education to their first professional position, but it is not a challenge limited to newer professionals. even seasoned librarians and archivists, with practical experience in their portfolios, may find that their local experience does not translate to different organizations, is too broad for a particular project, or is not deep enough for them to lead the initiation of a new digitization program.

the digital collections department at the university of nevada, las vegas (unlv) has a decade-long record of hiring early career librarians for grant-funded projects, providing them with opportunities to develop digitization competencies on the job. from 2017 to 2019, unlv's digital collections department completed two grant-funded digitization projects that specifically set out goals to contribute to competency development for multiple stakeholders. early career project managers learned, practiced, and refined skills; the department experimented with and innovated its own workflows; and the project team held two workshops to contribute to the development of digitization competencies throughout the state. the six main competencies that were developed during the grant projects are project planning, grant writing, project management, metadata, digital capture, and digital asset management. the authors, who were members of the grant project teams, will discuss the six competencies in this article. using the grant projects as a case study, they will describe each competency and share how it was used and developed within the project team via on-the-job learning, and within the state via the statewide workshops.
literature review the idea of professional competencies for librarians and archivists is well-established and documented in academic literature, and defined competencies are recognized as valuable tools for education, recruitment, professional development, and evaluation. drawing from organizational project management literature, daniel, oliver, and jamieson define competency as the ability to apply combined knowledge, skills, and abilities in service of a measurable and observable goal. 2 in the united states, the american library association (ala) defines “core competencies of librarianship” and “competencies for special collections professionals.”3 the competency framework of the archives & records association of the united kingdom and ireland (ara) describes five levels of experience: novice, beginner, competent, proficient, and expert/authoritative.4 ara’s recognition of the varying dimensions of competency is a helpful guide, and aligns with the reality of different levels of expertise. however, the competencies identified by ala, ara, and other similar professional organizations are necessarily broad; competencies for specific library roles are harder to generalize and define. in order to identify the knowledge, skills, and abilities required of “digital librarians,” researchers such as choi and rasmussen analyzed job announcements and surveyed practitioners.5 job announcement analysis shows that there is no single definition of a digital librarian; instead digital librarian positions consist of many varied roles and responsibilities in almost infinite combinations. the competencies discussed in this article (project planning, grant writing, project management, metadata, digital capture, and digital asset management) were locally important to information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 3 unlv’s digitization projects, but they also align with the competencies identified in previous research. in their study of projects undertaken in the national digital stewardship residency program (ndsr), blumenthal et al. found that project management skills and technical skills (including metadata, workflow enhancement/development, digital asset management, and digitization) were important.6 the level of required technical competency tended to vary by project but workflow enhancement stood out as a universally important skill. a 2019 analysis of the latest career trends for information professionals by san jose state university’s (sjsu) ischool noted that there is increasing demand for project management skills across all career types. 7 this usually encompasses the ability to organize complex tasks and collaborate with other departments or institutions in service of a shared goal. sjsu also cited “new technologies” as a necessary skill. however, they specified that this refers to “all iterations relating to interest in, familiarity with, or experience with new and emerging technologies” (emphasis in the original). in choi and rasmussen’s article analyzing job ads, the authors note that many of the frequently stated job requirements tend to be vaguely described or cover broad areas, including current trends in digital libraries, competency on general technological knowledge, and the current state of information technology as three most frequently mentioned competencies. 
8 digital asset management, digital scanning, digital preservation, and metadata were some of the specific technical skills desired, as well as project management, planning, and organization. research shows that the more generic the competencies, the more broadly applicable they are; but specific competencies depend on the local environment, the role of the position, and the variables of the project or responsibilities. the wide range of competencies required by the digital library field paired with the specificity of local implementation requires new librarians and archivists to seek out learning opportunities that target both theory and practice. in fact, one of the most important aspects of practical experience is the benefit gained by experiencing the concepts in real-world situations that require decision-making, iteration, and sometimes even failure. the education field points to the kolb model of experiential learning, a cycle that is composed of four elements: concrete experience, reflective observation, abstract conceptualization, and active experimentation. 9 these elements mirror the process of learning observed in the grant case studies. new project staff are often trained to do tasks, then reflect upon what went well or was challenging. then permanent staff in leadership roles encourage and facilitate discussions in abstract concepts such as the philosophy behind an organization’s decision to prioritize efficiency or the concepts of creating authentic digital surrogates. while it may not happen in every project, within both grant cases, the final phase of the learning cycle was also reached as project staff and permanent employees worked together to move practice forward through testing, experimentation with new methods, and ultimately innovation of new models for digital library practices in the area of large-scale digitization. kolb’s model can be useful throughout the library and archives field, as shown in the following example. the federal agencies digital guidelines initiative (fadgi) started in 2007 as a collaboration of federal agencies seeking to articulate common sustainable practices and guidelines for digitized and born-digital cultural heritage resources. the fadgi website is a treasure trove of approved and recommended guidelines covering still image digitization, embedding metadata, project planning for digitization activities, and more.10 it essentially provides step-by-step guides for all aspects of digitization and is a tool that those interested or actively involved in digitization should be familiar with and consult on a regular basis. however, information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 4 fadgi technical standards are relatively prescriptive, so organizations often have to decide how to implement them within their local environments, taking into consideration a wide range of variables. if every new digitization project manager conscientiously implemented the fadgi standards without associated institutional context, they could be investing their organization in long-term cost commitments that cannot be sustained over time or that do not meet the project goals. this scenario points to the need for hands-on experience and learning as outlined in the kolb model. the digitization project manager may want to revisit the goals of the project (access vs. 
preservation, or both) and resource allocations (storage capacity, software and hardware specifications, staff time and expertise), and then pilot a subset of materials by capturing with the fadgi standard and calculating the storage sizes of the files and any associated workflows for long-term management. through this small experiential exercise, much information can be gained, reflected upon, and then used to conceptualize how to proceed. most of the tasks associated with digital library projects demand increasing competency over time to progress from enacting the technical standard in an organizational context, to revising it across projects or local environments, to educating others about the role of the standard, or to, at the highest levels of competency actively participate in the creation or revision of the standard itself as it changes over time. the ability to not only implement but also refine and even innovate comes from a process of mastery of the competency in question. experiential learning is an important method for developing and refining competencies from a novice to more expert level, but not all librarians and archivists have the opportunity to learn from more experienced colleagues on the job. matusiak and hu emphasize the importance but also the inconsistency of integrating experiential learning into mlis programs.11 for those who do not gain practical experience in library school or on the job, workshops are an additional learning opportunity that can help professionals bridge the gap from written resources to local implementation. the illinois digitization institute is one example described in detail by maroso in 2005.12 digital directions is a conference that presents the “fundamentals of creating and managing digital collections” in two days.13 other available workshops focus more closely on different aspects of digitization, such as metadata or preservation, or training for specific equipment via a vendor. in the following examination of unlv’s digitization grant projects and workshops, the authors address six competencies that were either employed or developed by staff or have been identified in existing literature. these competencies may be viewed as critical building blocks for digitization projects and the authors address how they were developed to different levels of expertise and using different methods, experiential learning, and workshops. overview of grant projects unlv’s digital collections completed two grant projects with the main goals of: (1) the large-scale digitization of archival collections, (2) the development of large-scale digitization models and workflows that could be reused, and (3) statewide workshops to share those models and workflows with other libraries and archives institutions. both projects were funded by library services & technology act (lsta) grants administered by the nevada state library and archives. the first project, “raising the curtain: large-scale digitization models for nevada cultural heritage,” digitized mainly visual materials on the topic of las vegas entertainment, while the second project, “building the pipelines: large-scale digitization models for nevada cultural heritage,” digitized mostly text documents about water issues in southern nevada. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 5 digital collections hired two types of temporary project-specific staff for the two digitization grants: project managers and student assistants. 
the project manager for each grant coordinated the day-to-day activities, such as preparation of the materials, digital capture, quality control, metadata, and ingest into the digital collection management system, as well as helping to fine-tune workflow documentation. the student assistants contributed to digital capture, quality control, metadata creation, and upload to the digital collection management system. these grant projects are strong examples of experiential learning and competency development. two of the authors were principal investigators (pis) for both of the grants, and one author was the project manager for the second building the pipelines grant. at time of hire, the project manager for the second grant had experience working in special collections and archives but had not previously worked in a digital environment. one student assistant was hired for this project; she had already worked on the first large-scale digitization grant project in digital collections and was already familiar with the digitization workflow, as well as the hardware and software. employing a student who had already experienced the concrete tasks (phase 1, “concrete experience” in the kolb model) allowed her to help the new project staff as they could together perform “reflective observation” (phase 2) and learn from their compiled shared experience. the project pis were intentional in designing opportunities for discussion. they regularly met with the student and project manager to help them understand what they were seeing and experiencing in the context of the organization's mission and the grant goals (kolb’s “abstract conceptualization”). the building the pipelines grant project facilitated each of them gaining more competency and moving to the next level while also helping the pis learn through experimenting with new approaches (the final phase of “active experimentation”). the same experiential learning model was also successfully used for the first raising the curtain grant project. as previously stated, conducting a day-long digitization workshop for nevada libraries and archives was a goal of both large-scale digitization grant projects undertaken by unlv digital collections. the nevada statewide large-scale digitization workshops, which were held towards the end of each grant period, were free for participants, and travel grants were available thanks to the grant funding. the workshops sought to provide an overview of large-scale digitization using unlv projects as examples, as well as to provide practical advice related to developing digitization competencies. the first workshop that unlv held in may 2018 consisted of presentations and discussions addressing the basics, methods, and challenges of large-scale digitization. the second workshop, held in may 2019, still shared what unlv learned about large-scale digitization during the grant project, but widened the scope to address multiple important digitization competencies, whether the project is large or small. competencies whether presented in a project-based learning environment, a one-day workshop, or in a self-study scenario, learners can benefit from a clear understanding of what is meant by competencies in each of the areas that make up a successful digitization project. below, the authors share the competencies most critical to success in the case study projects. these were also the competencies selected as priorities for the workshops.
while expertise is not mandatory in all of the competencies in order to start a digitization project or apply for a grant, reflection and planning for each of these steps should be addressed prior to initiating any project. by identifying available resources (such as existing documentation, available staff with expertise to consult, or approval from a supervisor for a self-study plan) project managers can ensure that if there are any information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 6 competency gaps, they will learn the needed competencies to carry out the project. in addition, throughout the learning process, interpersonal skills such as proactive communication, adaptability to change, flexibility in evolving job scope, and cultivation of comfort with ambiguity are all qualities that are just as necessary as any technical skill in mastering competency in digitization. project planning this competency can be defined as the ability to create a shared and documented vision and plan so that specifications, expectations, roles, goals, and timelines are considered in advance and clear to everyone involved. planning for a digitization project is best approached holistically. the planning period is the time to consider all needed competencies and plan for their implementation. writing up a project plan is important, especially since digitization can involve many collaborators and stakeholders. even if one is working alone, there are so many components, steps, and details involved in digitization projects that it is important to plan ahead for them and to document everything. brainstorm and write down ideas and plans for the project, from the overall scope, goals, timelines, and roles, to the specific details of each component, including specifications and workflows for digital capture, metadata, access, preservation, assessment, and promotion (see appendix a, “an overview of planning and implementing digitization projects”). the plan should be communicated, remain flexible, and be updated (or better yet, versioned) to document changes implemented during the project. an important part of project planning is selecting materials for digitization. to develop competency in effectively selecting materials, a person should be familiar with the materials and the digitization process or collaborate closely with people who are. it is often not until one is in the weeds and discussing the nitty-gritty details of a project that the challenges and actual viability of digitizing specific materials become apparent. format is a huge factor in digitization, as is description, and understanding how materials will be used.14 digitizing a group of materials that can all be processed the same way is much easier than undertaking a project to digitize many different formats that require different digitization specifications, equipment, description, processing, etc. one must also take into account legal and ethical considerations. successful selection of materials takes all factors into account and targets materials that fit with the overall goals and vision of a specific project.15 in the case of unlv’s grant projects, the head of digital collections and the digital collections librarian identified the main goals, developed tentative workflows, and authored the grant applications as copis. the pis had multiple years of experience planning and completing digitization projects, which they drew upon to plan these projects. 
they both started off developing their digitization competencies by completing pilot projects, developing workflows and writing grants to fund smaller-scale, highly curated “boutique” projects. as they honed their skills and the department’s workflows over the years and the organization built the capacity and expertise to successfully scale up the rate of digitization, digital object production grew from one staff member using one scanner to digitize a couple hundred items in a year, to a robust department with a digitization lab that produces tens of thousands of digital surrogates per year. the pis documented the vision and goals of the projects in the grant applications, along with timelines, desired outcomes, the roles of the team members, and budgets. the grant application information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 7 provided a structure to help with the bigger picture of project planning, and the digital collections librarian also used a template to create detailed digitization plans for the collections. the template was developed locally based on past experience planning and implementing digitization projects (see appendix b for unlv’s “digitization plan template”). project planning was completed prior to the hire of the project managers and student assistants. the project managers and student assistants were responsible for enacting the project plans, and during the projects they were empowered to adapt and improve upon the plans. the modelling provided by the pis, coupled with the day-to-day experience of the project managers, led to the continuous improvement of and adaptation of workflows through experiential learning. the grant application and digitization plan, along with all of the prepared workflow documentation and tracking spreadsheets, provided a concrete example of how large digitization projects can successfully be planned. by implementing and refining the plans herself, the project manager gained direct experience and intimate knowledge of the plans, including what worked well and what did not. the project manager therefore developed competency in project planning to be able to create plans herself, and the pis further refined their own planning skills, allowing them to plan for even larger or more complex projects in the future. based on previous experience with projectbased learning, the pis had already established a level of expertise at roughly level 4 in the ara tiers. level 5 includes innovation, which was a target of the grant project as it required the pis not only to successfully map past experience to a new situation, but in cases where experiences did not map, gain new knowledge through experimentation. the project team included project planning as a topic of the statewide digitization workshops, sharing digitization plan templates, finalized workflows, and other planning resources that aided in the successful completion of the grant projects. building upon feedback from the first workshop in 2018, the second one addressed the ability to create a digitization project plan of any scale, recognizing that many nevada institutions do not have the ability to engage in large-scale projects. despite the emphasis on the foundational importance of project planning, most attendees noted that they do not currently create detailed digitization plans prior to starting a project. 
providing examples of plans, practical resources, sharing hands-on experiences, and welcoming discussion was helpful to participants, as indicated by feedback on the post-workshop survey. the workshop organizers scheduled time for participants to work on their own digitization plan, and also offered private consultations to help them, but many participants did not have a specific project in mind and did not seem ready to jump into the details of project planning during the workshop. overall, these teaching strategies helped participants gain a better idea about how to plan digitization projects, but they do not match the experience of creating or implementing a plan oneself. grant writing all projects begin with an idea, but only a small fraction of possible projects are acted upon. this is due primarily to a scarcity of resources. grant writing is not a necessary competency for all projects, but it is a valuable skill that can secure funding for projects that otherwise would not have been prioritized or possible. in its simplest form, a grant is a well-communicated idea with supporting rationale that effectively communicates why a project is a priority to undertake.16 grant applications are usually composed of a narrative section that covers the main goals, a budget with associated costs for the project, letters of support from partners, and details about the project team leading the work. even if a grant is not needed to undertake a project, the process of writing one often mirrors the very same decision-making that is necessary in the project planning information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 8 step. project planning is recommended for all digitization projects and it is nearly always required by external grant funders. grant writing can be undertaken alone, in a team, or (in larger institutions) as part of a research office or external funding program. in any case, it can be defined as the skill of writing text, calculating costs, and compiling relevant documentation to successfully propose projects for the award of external funding. competency in grant writing requires excellent communication skills, including the ability to craft persuasive arguments advocating for the project and the analytical ability to interpret instructions and guidelines to ensure the project is in compliance with the funder’s requirements. often, grant writing involves several people: disciplinary experts, collaborative partners, commercial vendors or contractors, technicians, and advisory boards. being able to facilitate discussions and coordinate actions is vital to wrangling the pieces of a large grant pre-award, as well as successful grant administration once funded. grants are competitive in nature, so creativity and originality in framing of a problem can mean the difference between a highly ranked grant and one that is passed over by reviewers. one method to obtain competency in grant writing is to read as many grant proposals as possible, specifically targeting those for similar projects.17 in addition, some funders look for a panel of grant reviewers and seeking out opportunities to participate in these processes is a valuable education. 
in the case of unlv digital collections and the pis, grant writing has been honed over time by some of the strategies mentioned above: reading other grant applications, serving on grant review panels, collaborating with other stakeholders, and communicating with the granting agency to understand criteria and solicit feedback. although the grant proposal was written and the grant was secured prior to the hire of the project managers, the project managers were able to develop a thorough understanding of the grant process. by successfully completing the grant projects, in addition to reviewing the grant proposals, contributing to quarterly reports, and discussing the projects with the pis and other stakeholders, the project managers gained valuable experience and understanding to inform their own future grant applications. given the scarcity of resources, the statewide digitization workshops made it a priority to address various aspects of locating grant opportunities, preparing to write proposals, seeking out collaborations to strengthen applications, and the mechanics and timelines to expect when applying for grants. one of the panel sessions in the workshops included a presentation by the state library’s grant administrator, who provided an overview of the state process and what the board looks for when reviewing project proposals. many participants found this particularly helpful because seeking out and applying for grants for digitization projects was not within their frame of reference, especially as many did not believe they had the requisite expertise in digitization. awareness of a need, gathering information, and analyzing examples are some of the first steps in developing a competency. the workshops helped attendees take these first steps of developing competency in grant writing and management but fell short of actually helping them to write their own grants. in this case, however, it was appropriate since the attendees did not have specific projects in mind and likely needed to spend more time in the first stages of competency development before jumping into implementation. workshops are most effective when the level of the content is appropriate to the level of expertise of the attendees. project management project management training is not often specifically emphasized in mlis programs. while there is literature on this topic, most people learn on the job.18 a successful project manager demonstrates information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 9 mastery of this competency by taking responsibility and assuming leadership of the project throughout the process, even if they are not intimately involved in the day-to-day tasks. they often are responsible for hiring and training project team members as well as communicating and responding to project team members and stakeholders. while they are tracking and analyzing progress using appropriate metrics, they are often the one raising a red flag if the project is experiencing delays or challenges. because they are responsible for ensuring the completion of the main goals of the project within the specified timeline, they often need to analyze bottlenecks and propose possible solutions in order to deliver high-quality results. ideally, they learn from their experiences and also help other team members and the organization learn from experience. 
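to make the idea of tracking progress with appropriate metrics concrete, the following sketch (in python, with entirely hypothetical dates, stages, and counts rather than unlv's actual tracking data) compares cumulative output per workflow stage against a straight-line pace toward a project goal, the kind of simple calculation that can back up a project manager's decision to raise a red flag about a bottleneck.

from datetime import date

# hypothetical grant parameters, not unlv's actual figures
PROJECT_GOAL = 10_000                 # digital surrogates promised to the funder
PROJECT_START = date(2018, 7, 1)
PROJECT_END = date(2019, 4, 30)

# hypothetical progress log: (week ending, workflow stage, items completed)
log = [
    (date(2018, 7, 27), "capture", 350),
    (date(2018, 7, 27), "quality control", 300),
    (date(2018, 8, 31), "capture", 1200),
    (date(2018, 8, 31), "quality control", 900),
    (date(2018, 8, 31), "metadata", 650),
]

def pace_report(as_of: date) -> None:
    """print each stage's cumulative total against a straight-line target."""
    elapsed = (as_of - PROJECT_START).days
    total = (PROJECT_END - PROJECT_START).days
    expected = PROJECT_GOAL * elapsed / total
    for stage in ("capture", "quality control", "metadata"):
        done = sum(n for d, s, n in log if s == stage and d <= as_of)
        flag = "on track" if done >= expected else "behind, investigate bottleneck"
        print(f"{stage:16} {done:6} done vs {expected:6.0f} expected ({flag})")

pace_report(date(2018, 8, 31))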
a key role of the project manager is not only to deliver the outputs, but to assess and analyze, both during the project, in order to make improvements, and after, in order to inform future projects. therefore, investment in mentoring and supporting a project manager, whether a temporary or permanent staff member, can greatly influence how much learning takes place during the project and how that acquired knowledge is transferred to others. documentation is a key part of project management. this needs to happen at every interval of the project—while planning, during implementation, and at the conclusion.19 documenting concrete data including the time spent on specific activities helps to plan cost predictions for future projects, as well as to make recommendations regarding future staffing and equipment. mastering this competency involves planning, an eye for both details and the big picture, clarity, transparency, communication, and dedicated recordkeeping from the start of the project to the end. much like in project planning, the unlv pis had multiple years of experience stewarding projects from start to finish, which assisted them in on-the-job development of the project management competency. they were able to share with the project managers their accumulated years of learning experiences on both the projects, providing guidance on what to look for and how to comprehensively document the current digitization projects. this mentorship, combined with the experience of managing the day-to-day workings of the digitization projects, allowed the early career librarians to develop this competency. in addition, monthly project staff meetings, complemented by on the spot consultation when necessary, contributed to the ease of competency development. during the statewide digitization workshops, the project teams discussed digitization project management and shared strategies and tools such as using google sheets and trello to track workflows and progress. the teams also provided advice for aspects of project management such as managing student workers, troubleshooting equipment, transparent communication, and more. the project team chose to focus specifically on their own large-scale digitization experience because literature and resources about general and library project management are readily available. in addition, participants were encouraged to consider how their non-digitization experiences with project management could be translated to this kind of project as a way to encourage reflective learning based on their individual experience. metadata digitizing materials would not be a valuable endeavor without comparable investment in describing them with metadata that aids users in discovering and using the digital objects. developing a project plan that includes metadata approaches is essential in scoping project work and resources. metadata assignment and quality review is often a far more resource-intensive step information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 10 than the process of digital capture. metadata is one digitization competency that is robustly addressed in library school programs. standards are well documented and examples of digital library metadata are easily accessible online. the importance of metadata to the library and archives profession means that many professionals already have a foundational knowledge. 
what makes metadata a difficult competency to master is the level of detail and specificity it entails, which makes the step from theory to practice challenging. metadata competencies require an understanding of recognized standards, the ability to interpret and apply them, and an awareness of metadata mobility including: reusability, interoperability, and flexibility for migration or transfer.20 metadata-related skills require comfort moving along a wide spectrum of varied tasks, often toggling between awareness and understanding of high-level philosophical issues (such as inclusiveness of subject terms) and a laser-focused eye for detail to troubleshoot data issues (like correcting spreadsheets or code). metadata work traverses several phases of the digitization lifecycle: from initial preparation of collections, during capture, through the ingest into systems, and over the long-term to maintain and preserve the assets. metadata quality itself is difficult to quantify, making this a competency that can be tricky to evaluate. mastery can be indicated through the identification and study of appropriate standards, including compliance with any data reuse requirements, such as a regional digital library consortium, or metadata requirements to ensure compatibility with existing systems and data. in addition to selection of standards, or adherence to existing standards, metadata can be subjective and needs to be undertaken with attention to the level of specificity required for the project. completion of successful projects demonstrates efficient processing of records balanced with an appropriate level of metadata richness. documentation of metadata approach via a metadata application profile (map) as well as training materials and examples for metadata creators are also good indicators of metadata expertise. while technical skills are valuable for metadata competencies, communication and soft skills should not be underestimated as part of this skill set. often metadata competency is an area where collaboration is required. many libraries have catalogers, metadata librarians or aggregators that can advise and sometimes train or provide documentation for projects. before creating a new metadata approach from scratch, consultation can be a very effective way to gain greater competency. at unlv, the choice of an already processed collection eased the metadata choices for digitization. this meant there was already a certain amount of basic metadata regarding the collection; in addition, having the curator function as a subject expert engaged in prepping the collection enabled the project team to have a readymade list of prioritized subject terms, people, and corporate bodies available to input as each folder in the collection was digitized. the building the pipelines project manager had prior coursework in metadata, as well as experience assigning metadata in a previous internship. using the unlv’s metadata application profile as a guide and the existing metadata procedures established for the project, the project manager was able to hone a better understanding of metadata theory applied in practice, including how to best capture the “aboutness” of these particular digital objects. the project manager also observed the importance of consistency in applying metadata by performing quality control of the studentcreated metadata. 
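as an illustration of what routine quality control of spreadsheet-based metadata can look like, the sketch below (python, with a made-up set of required fields and a made-up rights vocabulary standing in for a real metadata application profile) flags rows that are missing required values or that use an unrecognized rights statement; it is a sketch of the general approach, not the procedure used at unlv.

import csv

# hypothetical application profile: required fields and one controlled vocabulary
REQUIRED = ["title", "date", "description", "subject", "rights"]
RIGHTS_VOCAB = {"in copyright", "no known copyright", "public domain"}

def check_metadata(path: str) -> list[str]:
    """return a list of human-readable problems found in a metadata spreadsheet."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f), start=2):  # row 1 is the header
            for field in REQUIRED:
                if not (row.get(field) or "").strip():
                    problems.append(f"row {i}: missing {field}")
            rights = (row.get("rights") or "").strip().lower()
            if rights and rights not in RIGHTS_VOCAB:
                problems.append(f"row {i}: unrecognized rights statement '{rights}'")
    return problems

if __name__ == "__main__":
    # "batch_metadata.csv" is a placeholder filename for a batch awaiting review
    for problem in check_metadata("batch_metadata.csv"):
        print(problem)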
a final contributing factor in developing competency in this area is that the team, consisting of the digital collections librarian, the project manager, and the student assistant, had many resources available as a team to solve problems. as previously mentioned, the team met to review any project concerns and to pull in adjacent team members such as technical services staff, the metadata librarian, the curator, or even those with experience in programming and application development who could advise on how the metadata would appear in other systems, such as the one being developed for future digital collections. this larger group feedback was invaluable in the learning process and often touched on the more abstract concepts underpinning the tasks. at the statewide workshops, a metadata “bootcamp” was held in which staff addressed the types of metadata standards attendees were likely to encounter, the role of a metadata application profile, how to identify an existing map and apply it to your collection materials, as well as the value of having a subject expert available for consultation. while reuse of existing description data (e.g., finding aids or inventories) was an important topic for the first workshop, in response to feedback the second workshop’s metadata bootcamps focused more on concrete steps required to make digitized images searchable regardless of other workflows or systems that might be in use. again, this was an example of tailoring the content to the learning level of the audience. while all participants were familiar with metadata, many did not have experience using a map or taking interoperability into consideration. many recognized a need to devote more time to developing this competency, regardless of project. digital capture whether it is done in-house or outsourced to a vendor, competency in digital capture (digitization in the most specific sense) is key. this competency requires considering the materials to be digitized, how they will be displayed, and how long-term access will be provided to the digital objects. working in-house, technical mastery is not required, but it is necessary to have a solid idea of what hardware and software capabilities are, as well as who to consult should difficulties arise (and they will).21 mastery of this competency means having a vision for the ongoing presentation and use of the digitized material and outlining specifications to make that happen. documenting digitization specifications is useful not only for the project manager and for future projects, but also as a training tool for students, interns, and volunteers. it can also be a source of important preservation and technical metadata ensuring files created today are sustainable into the future. in addition, a robust quality control workflow should be in place prior to uploading digital objects for display and use. a key component of digital capture is efficiently preparing the selected materials. at unlv, experience has taught the digital collections department that digitization is most successful when using materials that have already been physically processed (surveyed and arranged) and for which an inventory (finding aid) has been created.
digitization of archival materials can quickly become complicated because they are often not physically uniform or consistent, and sometimes they are grouped together for digitization into complex/compound/aggregate digital objects. well thought-out workflows for naming and tracking individual files can make the digital capture process smoother, especially when files are related (such as the front and back of a photo, or pages of a scrapbook). this item-level documentation is critical to managing the large volume of files created in digital capture. any conservation or preservation concerns of the physical materials should also be addressed prior to capture. additional consultation may be required if unforeseen complications or problems arise during digital capture; item-level review may not be possible for all materials during the planning stage. for instance, there may need to be an alternate workflow for items that contain personally identifiable information or which are too fragile to undergo scanning or capture. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 12 there are a number of options for capturing images to create digital surrogates, including digital camera systems and a variety of scanners. depending upon the method of capture, additional software may be needed to edit, output, and ingest the images into a digital management system. for a text-heavy collection, software for optical character recognition (ocr) makes the items fulltext searchable. for audiovisual materials, digital capture is even more complex. the local hardware, software, and procedures for capture all may require an investment in hands-on time learning and testing procedures. the repetitive nature of capturing items may also require some investigation of ergonomics or more human-friendly configurations of these variables. at unlv, step-by-step documentation for using the various hardware and software is key to developing staff competencies. such documentation includes screenshots of steps in the process to contribute to comprehensive understanding and correct implementation of the workflow. projectbased staff also make suggestions, as they move through projects, to improve current workflows. the clear documentation, repetition of tasks, access to workflows of prior digitization projects, consultation with experienced staff, and review of available resources (such as the previously mentioned fadgi website) all contributed to competency development for the project managers. although the pis have years of experience digitizing, it is a detailed process that can be forgotten without use and practice, and it is a competency that must be continually cultivated because of changing technology. if it is decided to outsource digital capture, there are a number of factors to take into account in order to find the right vendor. issues to consider include cost, company stability, prior clients and completed projects, timelines, where the work is performed, and preferred communication methods. requesting a quote for services can be a good way to gain visibility into vendor communications, flexibility, and workflows, and will be essential if the project funds are administered in conjunction with any state or organizational purchasing rules or guidelines. although it can be time-consuming, it is vital for the research and legwork to take place prior to starting the project (see the “project planning” section). 
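the item-level naming and tracking workflows described above for in-house capture can be sketched in a few lines; the naming pattern below (collection_box_folder_item_sequence) and the manifest columns are invented for illustration, since any real project would follow its own documented scheme, but they show how related captures such as the front and back of a photograph stay grouped and traceable.

import csv

def master_filename(collection: str, box: int, folder: int, item: int, seq: int) -> str:
    # e.g. pho001_b03_f12_0045_001.tif for the first capture (front) of item 45
    return f"{collection}_b{box:02}_f{folder:02}_{item:04}_{seq:03}.tif"

def write_manifest(rows: list[dict], path: str = "capture_manifest.csv") -> None:
    # one manifest row per file keeps related captures (front/back, scrapbook pages) together
    fields = ["filename", "collection", "box", "folder", "item", "sequence", "note"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

# example: two captures (front and back) of a single hypothetical photograph
rows = []
for seq, note in enumerate(["front", "back"], start=1):
    rows.append({
        "filename": master_filename("pho001", 3, 12, 45, seq),
        "collection": "pho001", "box": 3, "folder": 12,
        "item": 45, "sequence": seq, "note": note,
    })
write_manifest(rows)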
in outsourcing, confidence in the digital capture partner is key. mastery of this aspect of digitization means a comprehensive, transparent agreement, a regular flow of communication, and comfort in letting go of control over a major part of the project. resources provided by the northeast document conservation center (nedcc) and the sustainable heritage network help to consider the pros and cons of both in-house and outsourced digital capture.22 project management skills can also be very useful as working with a vendor shifts the needed competency from digital capture to more of a project management focus. unlv often employs vendors for the more challenging formats mentioned, such as oversized materials like maps and architectural drawings, and for materials like newspapers that require specialized zoning in the metadata to retrieve articles. working with a vendor can be an informative experience, teaching communication skills, negotiation of contracts, building appropriate timelines, and quality reviewing deliverables. some granting agencies cover a limited timeframe and outsourcing digital capture can free up an organization’s time to do more librarycentric work like metadata or archival processing. for the building the pipelines project, most of the material in the selected collections was flat printed material that was not oversized or in challenging formats such as film/transparent material, newspapers, or media (audio/video). this led to a high comfort level for in-house digital capture as there were established procedures for the archival collection. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 13 at the statewide workshops, participants attended a digital capture session where they were presented with digital capture workflows and information about unlv’s decision-making regarding digitization equipment, outsourcing vendors, and technical standards, and then they went into the digitization lab to observe the equipment in action. the digital capture bootcamp was facilitated by the head of digital collections, the student assistant, and the visual resources curator (who is a professional photographer). this unstructured session offered a place for attendees to preview equipment that might be suitable for their projects, get a sense of costs if they were looking to purchase equipment, and to observe the digital capture in a large-scale workflow (a specially designed rapid capture overhead camera system), a medium-scale workflow (with digital slr camera and copy stand), and a small-scale workflow (flatbed and map scanners). attendees were encouraged to match equipment to their project needs or identify if outsourcing was an appropriate approach for their collection. attendees were not able to use the equipment themselves or practice the digital capture workflows, but the small workshop format allowed them to view demonstrations in person, ask specific questions, and also see example workflows in action, which is a step above what online research or resources provide for competency development. digital asset management competency in digital asset management goes beyond identifying the storage capacity necessary for a project. digital asset management includes the storage, access, and preservation of digital files and their accompanying metadata. 
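one concrete, scriptable piece of this competency, touched on below in the discussion of digital preservation, is fixity checking: recording a checksum for every master file and re-verifying the files later to detect silent corruption. the sketch that follows assumes a plain directory of tiff masters and a csv manifest; it is a minimal illustration, not unlv's preservation workflow.

import csv
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    # hash the file in 1 mb chunks so large masters do not have to fit in memory
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record_fixity(masters_dir: str, manifest: str = "fixity.csv") -> None:
    # write a manifest of relative path plus checksum for every master file
    with open(manifest, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["relative_path", "sha256"])
        for p in sorted(Path(masters_dir).rglob("*.tif")):
            writer.writerow([str(p.relative_to(masters_dir)), sha256(p)])

def verify_fixity(masters_dir: str, manifest: str = "fixity.csv") -> list[str]:
    # return the files whose current checksum no longer matches the manifest
    failures = []
    with open(manifest, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))[1:]
    for rel, expected in rows:
        p = Path(masters_dir) / rel
        if not p.exists() or sha256(p) != expected:
            failures.append(rel)
    return failures

# record_fixity("masters/")            # once, after capture
# print(verify_fixity("masters/"))     # periodically, e.g. before a migration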
there are different ways to provide access to digital objects, some of the most popular being online content management systems like omeka or digital collection management systems like contentdm.23 as mentioned previously, metadata is important for staff and users to discover and locate digital objects. competency in digital asset management requires technical knowledge of how to securely and efficiently transfer digital files that are requested, or how to provide secure and user-friendly online access. it also requires planning to ensure that whichever approach taken is sustainable and can meet demand. good digital preservation means planning and implementing the necessary actions to ensure that digitized resources continue to be discoverable, accessible, and usable well into the future. in the case of digitized libraries and archives materials, this means that they must be well-documented and trustworthy. preserving digital materials includes maintaining multiple copies of files, capturing checksums to verify if the bits of a file have been corrupted over time, and in some cases, migrating file formats so that items can be viewed and used with future hardware and software. models for digital preservation include the open archival information system (oais) model and the national digital stewardship alliance (ndsa) levels of preservation.24 software and tools to aid in digital preservation tasks are available, as is training. however, digital preservation is still relatively new to many in the libraries and archives profession, although some individuals and institutions have developed very sophisticated and carefully considered programs and approaches. since digital preservation is based on technology, it will always be changing. one must not only learn and be able to implement the current standards and best practices of digital preservation, but also always keep up with changes. success in digital preservation requires ongoing effort and evaluation. successful digital preservation means that staff and users can find, understand, view, and use digital resources at any point in the future. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 14 for the grant project managers, this was the most challenging competency. while they were exposed to the complexities of digital preservation at unlv, this process was already wellestablished, having been developed over time by the pis and other library staff. the project managers essentially stewarded the newly digitized objects up to this point and then handed over the reins to the digital collections librarian. while they were free to ask questions and developed an understanding of the standards that contribute to long-term digital preservation, the project managers did not implement this particular workflow, nor did they contribute to adapting it. it is important to keep in mind that digital preservation is not an all or nothing proposition; small steps can be taken by libraries and archives professionals to address short-term digital preservation while gaining a better understanding of long-term solutions.25 given the complexity of this competency, it was difficult to train participants in the statewide digitization workshop setting. 
however, unlv’s digital collections staff emphasized the multiplicity of options available for libraries and archives with varying levels of resources and encouraged participants to be open to starting despite ambiguity about the ultimate long-term solution for their organization. digital collections staff also provided an overview of these options and shared the evolution of digital preservation strategies at unlv, including suggesting some first steps such as creating an inventory of digital assets and a digital preservation plan. developing expertise in this competency requires in-depth research, consultation, and analysis to customize plans for local circumstances. the statewide workshops provided only an hour-long introduction to the topic and a broad overview as an example. digital preservation is a topic that is well-addressed by other more intensive workshops though, such as the society of american archivists digital archives specialist courses and the powrr institute.26 summary of competency development: experiential learning versus workshops learning through experience for project teams unlv’s grant projects are examples of how specific time-bound projects and grant funding can be utilized to develop both individual and organizational competencies, and to share what is learned via workshops, aiding in the professional development of others. the early career project managers advanced the most in competency development because of the opportunity for focused training and experiential learning through practice. they progressively developed digitization competencies in a number of ways, including training from the pis, working with experienced student assistants and staff, reading locally created documentation, observing project activities and decision-making by the team, proposing solutions to challenges and testing them through trial and error, learning by doing tasks and suggesting small iterations to improve them, consulting the workflows from previous projects, and reviewing recommended resources such as the fadgi website. the project managers, though temporary employees, were treated with the same status as permanent staff and encouraged to attend meetings, ask questions, take risks, and experiment in a safe and controlled environment of learning. given the many multifaceted details and tasks that go into a digitization project from start to finish, it is unrealistic to expect staff to remember everything without engaging in the process themselves. training for the grant projects broke each project down into a series of discrete steps, including preparation, digital capture, quality review, ocr transcription, metadata creation and review, and upload into the digital asset management system. each task was reviewed and practiced in a linear manner. given the volume of materials, basic mastery and self-sufficiency for the grant project staff were achieved fairly quickly. this allowed project staff to then identify areas for workflow improvements and test adjustments for increased efficiency. despite having just two dedicated staff, one of whom had no prior digitization experience, over 55,000 digital surrogates were created during the ten-month building the pipelines project, far exceeding the original goal of 10,000 images.
in both grant projects the project managers were able to develop digitization competencies as a result of on-the-job experience, enriching their skill sets while also assisting unlv digital collections to refine workflows for large-scale digitization projects. this in turn strengthened competencies on the organizational level, and those of the pis. in the best-case scenario of this kind of project, temporary staff develop valuable digitization competencies via project-based work; however, that is not always the case, and temporary project-based positions can be very harmful to the personal and professional development of workers. when undertaking a project that uses temporary labor, the organization should plan for and prioritize equitable hiring practices, fair compensation and benefits, and a positive and productive experience for temporary staff.27 learning through experience for organizations grants are temporary in nature, so it is important that organizations who fund them and who receive funding think about the long-term implications of the temporary work. it is important for project staff to clearly document all of the details of the digitization approaches and workflows that worked successfully in the grant, as well as any problems that can be avoided. all the extra work of testing and refining new workflows completed by the project staff, can (and should) be adopted and integrated by permanent staff into the existing structure of the department or institution. one of the drawbacks for institutions undertaking grant-funded projects is that temporary staff leave and take their expertise with them. it is essential for permanent staff to not only teach, but also be open to and active in learning from the temporary staff during the project, even if the permanent staff are not doing the day-to-day work. building opportunities for information-sharing and knowledge transfer into a project plan vastly increases the value of the grant project funding. this organizational learning is a form of accountability to the funder ensuring that projects can be sustained and that lessons learned contribute to increased capacity in the funded organization and beyond. learning through workshops for professional development grant projects also pave the way to share lessons learned with colleagues via workshops or collaborative endeavors. as previously stated, conducting a day-long digitization workshop for nevada libraries and archives institutions was a goal of both large-scale digitization grant projects undertaken by unlv digital collections. besides the metropolitan areas surrounding las vegas and reno, much of nevada is rural and sparsely populated. these workshops provided a forum for people who might not usually come together to meet and talk about their work. many libraries and archives institutions in nevada are small and may have limited or no experience with digitization. the workshops sought to provide an overview of large-scale digitization using unlv projects as examples, as well as to provide practical advice related to developing digitization competencies. the first workshop at unlv, held in may 2018, consisted of presentations and discussions addressing the basics, methods, and challenges of large-scale digitization (see appendix c for the may 2018 agenda, “nevada statewide large-scale digitization symposium”). the grant team surveyed participants after the workshop and received mostly positive responses. 
sixteen out of nineteen people who completed the survey said they learned something, thirteen said they were confident and likely to apply what they learned, and eleven people said that if there was a follow-up workshop they would attend. the comments from the surveys showed that participants wanted more interactive activities, and many of them were not ready to implement large-scale digitization at their institutions—they wanted to learn more about the basics of digitization first. this feedback highlighted two challenges of workshop-based learning: the tendency toward passive delivery of large amounts of information and designing content for an audience with unknown or varying skill levels. the second workshop, held in may 2019, still shared what unlv learned about large-scale digitization during the grant project, but widened the scope to address multiple important digitization competencies, whether the project is large- or small-scale (see appendix d for the may 2019 agenda, “nevada statewide large-scale digitization symposium”). prior to the workshop, attendees were surveyed about their expertise level and topics of interest and were asked to review a project planning document with their local materials in mind. sessions were designed as bootcamps with more extensive documentation that could be used as a template for implementation at their home organization. participants were encouraged to ask questions and share their own experiences during the workshops and were given the option to sign up for a private consultation. the team endeavored across the workshop to allow for more interactive, hands-on learning. although unlv adjusted the second workshop based on the feedback from the first, teaching practical how-to skills that are broadly applicable in a one-day workshop is challenging. digitization is a complicated and technical undertaking that is most easily learned via hands-on experience, which is most effectively gained through repetition rather than a one-day workshop. there was not enough time or equipment for participants to actually practice parts of the digitization process themselves and so experiential learning was not always an option for every competency. also, if participants return to an organization with different equipment, hardware, and software, there are limits to hands-on training. another potentially problematic issue is staying up to date with the rapid technological changes that characterize digital collections. if a person gains a basic intellectual understanding of digitization via a workshop or other professional education opportunities, and then returns to their setting without starting a specific project in a timely manner, there is a risk that the knowledge they gained becomes outdated. despite the drawbacks of workshop-based learning, workshops are still valuable venues for colleagues to come together and learn from one another. they can also provide demonstrations or hands-on learning activities that help to bridge the gap from written theory to local implementation. conclusion online access to libraries and archives materials is expected and increasingly necessary in order for institutions and their collections to remain vital, useful, and relevant. ideally, digitization in libraries, archives, and museums would be a permanent functional area with specialized staff.
however, many medium-sized and smaller libraries and archives institutions do not have the capacity to sustain such an area. competencies in the areas of project planning and management, grant writing and administration, digital capture, metadata, and digital asset management are instrumental to completing a successful digitization project or instituting a digitization program in any setting. despite the proliferation of professional workshops, online resources, literature, and conferences regarding digitization skills, it can be difficult to make time to study these materials and put such learning into practice in a way that builds to more sophisticated learning through experience. the diversity of collection materials to be digitized, the range of local circumstances, and the rapid pace of technological change prevent any profession-wide standardized approach to digitization education. instead, individuals, organizations, and the profession as a whole must strategically invest in the most effective and efficient methods and opportunities for developing digitization competencies. locally, unlv digital collections has found that experiential project-based learning is the most effective way to pilot new workflows and develop competencies. project-based experiences, if thoughtfully designed with an eye to mentoring and supporting temporary staff, provide an opportunity for individuals to develop and practice these competencies in a hands-on way that encourages deep learning. there is a unique place for small pilot projects, modest grant projects, or one-time experimental projects to create a space for this kind of learning in almost any organization. as capacity increases, digitization projects can also be designed to develop competencies at the staff functional group level, the organizational level, or the regional level. workshops in turn can be an opportunity for project teams or experienced individuals to share what they have learned and teach basic competencies to others. although not as comprehensive and effective as experiential learning, workshops can provide a solid introduction to digitization competencies, especially if interactive and hands-on learning methods are incorporated and organizations are willing to remain available for consultation or questions from attendees. workshops that have pre- and post-session components can add continuity, and workshops that can be offered multiple times have the ability to evolve and scale. rotating instructors, incorporating hands-on sessions, and ongoing mentoring are all ways to improve workshop-based learning. scaffolding these approaches and sharing what is learned individually or locally with others is a way to continue to develop the capacity of libraries and archives institutions to provide global online access to unique historical materials. although this approach is already widespread in the profession, it is important not to leave individuals or institutions with fewer resources behind. when planning new digitization projects or initiatives, institutions should consider adding and investing in new positions, partnerships, and regional collaborations and networks. when new permanent positions are not possible, temporary positions should be designed to be empowering and valuable for workers, rather than exploitative and harmful.
in an age where technology is changing rapidly and is driven by large, well-resourced corporations, developing the profession's competencies in digitization, keeping pace with digital technologies, and remaining relevant in the information environment depends on decentralized, peer-to-peer educational opportunities that use efficient and effective methods of teaching, such as interactive and hands-on learning.
appendix a
an overview of planning and implementing digitization projects, created by emily lapworth for local use, march 8, 2018, and shared at the statewide digitization workshops. these steps were written for large-scale digitization but can be applied to a digitization project of any size.
1) identify collections for digitization.
a) brainstorm your goals for this project. think about what you will do with these digital surrogates, and who your audience is.
b) criteria for selection of materials
i) formats: start simple. if everything is the same, large-scale workflows are easier to apply. ultimately you will need to create different workflows for each format with differing requirements. for example, print photos are digitized differently than film negatives. text documents benefit from transcription using optical character recognition (ocr) software, while photos do not, and handwritten materials present additional discoverability challenges. when creating complex digital objects with different formats within them, things can become even more complicated.
ii) condition: fragile materials require extra handling time and possibly additional physical treatment prior to digitization.
iii) existing arrangement and description: it is easiest if online access can directly mirror physical access, but the materials may need additional arrangement and description before digitization, depending on your goals. if the materials already have item- or folder-level description, that is ideal. if there is any hierarchy in the existing description, especially inconsistent or complex hierarchy, consider how you will reuse that description for digital objects.
iv) copyright: plan on providing public online access only if you own the copyright, have permission from the copyright holder, or if it is a strong case of fair use.
c) see the preparation step (below) to come up with some idea of how you will undertake this project. it will likely be modified during the actual preparation, but you need to have some idea of what you will do and how you will do it in order to gather support and resources.
2) assess the technical infrastructure needed to create, manage, provide access to, and preserve the digital files.
a) estimate how much storage space you will need, and how much space will be needed for long-term digital preservation (see the worked example after this step).
b) make sure that your current digital preservation policies and workflows will be able to accommodate this project. adjust them if needed.
c) identify what equipment and software will be needed and whether you already have it, can acquire it, or can use someone else's.
d) assess whether your existing workflows and systems for providing access to digital materials will be able to accommodate this project, and what changes you might need to make.
e) technology could be a great area for collaboration! if you lack certain resources, explore opportunities to collaborate with other institutions.
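the arithmetic behind the storage estimate in step 2a is simple but easy to underestimate. the short python sketch below uses purely illustrative figures (an 8 x 10 inch print scanned at 600 ppi, a made-up count of 20,000 items) rather than any actual project specification:

# rough storage estimate for uncompressed 24-bit rgb tiffs (illustrative numbers only)
def tiff_bytes(width_in, height_in, ppi, bytes_per_pixel=3):
    """approximate size of one uncompressed rgb tiff, ignoring header overhead."""
    return int(width_in * ppi * height_in * ppi * bytes_per_pixel)

master = tiff_bytes(8, 10, 600)     # one 8 x 10 inch print at 600 ppi: roughly 86 mb
item_count = 20000                  # hypothetical project size
print(f"one master: {master / 1e6:.0f} mb")
print(f"{item_count} masters: {item_count * master / 1e12:.1f} tb before derivatives")

whatever the real numbers are, padding the estimate to cover derivatives, metadata, and at least one preservation copy is a reasonable starting point.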
3) coordinate with other stakeholders to verify choices and plans for digitization.
a) find out what kind of support there is (financial, staffing, etc.) from management, administration, and the community.
b) identify possible collaborators and discuss plans, make agreements, etc.
c) decide who will manage and oversee the project and how different responsibilities will be distributed.
d) identify and apply for grants if appropriate.
4) prepare collections for digitization.
a) arrangement: assess how the materials are physically arranged and described, and whether that will help or slow down your anticipated workflows. plan for and complete additional processing if needed.
b) decide how you will display digitized materials. mirroring the existing arrangement is the easiest, but you also have to consider the file formats you want to create.
c) description: figure out how you can reuse existing description. plan metadata fields, vocabularies, and prioritized subject terms and names.
d) prepare preliminary metadata. reuse what you already have!
e) prepare physical materials. verify that the physical contents of the collection match the existing description or inventory. remove staples, unbind, unsleeve, flatten, etc. identify and address any preservation or conservation issues.
f) identify physical formats (this will help determine the timeline and what equipment is needed).
g) decide: outsource or in-house?
h) create and test workflows and procedures.
i) create documentation for workflows and procedures (important for the duration of the project, for reuse in future projects, and also for future employees stewarding these digital assets to know what you did and how you did it).
j) create and prepare systems, documents, or mechanisms to track work (it is important to stay organized, especially when dealing with a large amount of materials or a team of workers).
5) digitize collections
a) set up consistent file naming procedures and make sure they are followed.
b) when dealing with mixed materials in-house: depending on equipment and the composition of materials, start with the easiest or what you have the most of, then take note of other formats (e.g., transparencies, oversize, etc.) that require different equipment or settings so that you can group them together and digitize them all at once later.
c) keep specifications simple if possible, especially if you have student workers doing the digitization. (for example, if you have complex digital objects with both text and photographic prints, and can digitize both materials on the same equipment without changing settings, do so. if you normally digitize text at 300 ppi but want photos at 600 ppi, rather than having the technician stop and change the settings, capture all at 600 ppi if you have the space.)
d) auto-crop is a great tool if you have it, but otherwise try to improve the efficiency of your processes with any tools at your disposal. sometimes this can be as simple as placing the item with the correct orientation to avoid the need to manually rotate later.
e) file formats: archival images are generally tiffs. smaller derivative files may be necessary for access or to speed up ocr processes. sometimes it is better to output them at the time of scanning than to batch process later (a sketch of batch derivative creation follows this step).
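where derivatives are created after scanning rather than at capture time, a short script can batch the work and enforce the naming convention from step 5a at the same time. the python sketch below uses the pillow imaging library; the folder names, the collection number, and the size and quality settings are illustrative assumptions, not project specifications:

# minimal sketch: create jpeg access derivatives from master tiffs with consistent names
from pathlib import Path
from PIL import Image

masters = Path("masters")            # hypothetical folder of archival tiffs
access = Path("access")
access.mkdir(exist_ok=True)

for n, tiff in enumerate(sorted(masters.glob("*.tif")), start=1):
    with Image.open(tiff) as im:
        im = im.convert("RGB")       # jpeg cannot store 16-bit or cmyk image data
        im.thumbnail((2000, 2000))   # cap the long edge for web access copies
        # zero-padded, sequential names ("ms0001" is a made-up collection number)
        im.save(access / f"ms0001_{n:04d}.jpg", "JPEG", quality=85)

because a script assigns the names, the convention is applied the same way every time, which is harder to guarantee when many student workers name files by hand.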
6) process images
a) see above: try to improve your digitization workflows and procedures to shave time off of image processing.
b) ocr: if you have textual materials, ocr transcription makes them much more accessible with far less manual work than creating detailed metadata. this is especially true for large aggregations of textual documents. resist the urge to aim for perfect ocr transcription. something is better than nothing, and when dealing with scale, you do not have the time to correct everything. this is also an opportunity for crowdsourcing, if you have the technical resources to set it up (see the sketch after this step).
c) ocr file output: depending on how you choose to display and make the digital surrogates available, you may need to output text files and/or pdf/a files.
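as a rough illustration of how little code "good enough" ocr can require, the python sketch below uses the pytesseract wrapper for the tesseract ocr engine. the folder names are hypothetical, and the tesseract engine plus the pytesseract and pillow packages would need to be installed for it to run:

# minimal ocr sketch: write one uncorrected plain-text transcript per access image
from pathlib import Path
from PIL import Image
import pytesseract

access = Path("access")              # access derivatives from the previous step
text_out = Path("ocr_text")
text_out.mkdir(exist_ok=True)

for jpg in sorted(access.glob("*.jpg")):
    with Image.open(jpg) as im:
        text = pytesseract.image_to_string(im)   # uncorrected, "good enough" ocr
    (text_out / f"{jpg.stem}.txt").write_text(text, encoding="utf-8")

uncorrected output like this is usually enough to make a large text collection keyword-searchable, which is the point of step 6b.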
7) describe and provide access
a) reuse description that already exists (e.g., from an inventory or a finding aid). if a finding aid exists, make sure you are using all available information and understand how description is inherited and can be reused.
b) at the beginning of the project, transform the metadata that already exists into a format you can use to describe the digital objects. you can add to this existing metadata throughout the workflow.
c) at the beginning of the project, identify preferred subject terms and important names to look out for and add to digital object metadata when appropriate. this is especially important when metadata is created by students, teams, or anyone unfamiliar with the subject matter of the collection. it will help ensure consistency and make faceting better for users.
d) explore how search engine optimization (seo) works for your public online access system. take that into consideration when creating metadata in order to optimize discovery of the materials.
e) make it as easy as possible for users to identify the provenance of the digital object and to find other digital objects from the same collection.
f) consider the links between the original collection description and the digital surrogates. consider adding digitization information or links to digital surrogates into finding aids and other records. consider also adding a link to the finding aid in the digital object metadata. consider using persistent identifiers, such as arks (archival resource keys), to do this, instead of using regular urls.
g) find out how your access system indexes full-text transcripts and how it displays different file formats. consider whether you are able to, and want to, offer multiple file formats of a digital object. for example, a compound digital object that includes both text and images could be available as a collection of image files, a single pdf file, or both. identify what would be most useful to your users.
h) don't forget about structural, administrative, technical, and preservation metadata!
8) implement quality control procedures (qc)
a) have a strategy (e.g., sampling), guidelines, and goals for qc.
b) for staff performing quality control, identify the most important things to look for.
c) decide how much time should be spent on qc.
d) identify and acquire any automated tools that can be used.
e) set up procedures or steps to follow when errors are found.
9) preserve digital assets
a) you should have already planned how you will ensure access to and preservation of the digital files and metadata in the long term. best practice is to have policies in place identifying what digital assets should be preserved and to what extent. identify applicable standards and best practices, and implement software and technical solutions.
b) set up workflows and procedures to ensure that the digital files receive appropriate ongoing digital preservation treatment (a fixity-checking sketch appears at the end of this appendix).
10) publicize and promote
a) work with administration, collaborators, and other stakeholders to publicize and promote the project.
b) depending on your audience, social media, academic listservs, and professional organization publications can be other avenues to spread the word.
c) set up harvesting with your regional digital library for inclusion in the digital public library of america.
11) assess
a) web statistics can be used to track the use of online materials. see saa/acrl's "standardized statistical measures and metrics for public services," section 8, "online interactions," for general information on what information to collect, and the digital library federation's "best practices for google analytics" for specific information on google analytics. if you are a contentdm user, see "google analytics in contentdm."28
b) surveys, interviews, and focus groups are other methods that can be used to gather feedback.
c) record and compile any oral or written feedback received from stakeholders and audiences.
d) analyze feedback and usage statistics to identify areas of success and areas for improvement. make improvements as necessary and incorporate findings into planning for future projects.
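as a companion to step 9, the python sketch below shows one minimal form of fixity work: recording sha-256 checksums for the master files so that a later audit can detect corruption or silent changes. the file locations and manifest name are illustrative only:

# record a checksum manifest for master files (step 9); rerun later to audit fixity
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """hash a file in 1 mb chunks so large tiffs never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

masters = Path("masters")
with open("checksum_manifest.txt", "w", encoding="utf-8") as manifest:
    for tiff in sorted(masters.glob("*.tif")):
        manifest.write(f"{sha256sum(tiff)}  {tiff.name}\n")

a full digital preservation system does far more than this, but even a simple manifest, stored alongside the files and compared on a schedule, is a step toward the fixity checks described in the ndsa levels of digital preservation cited in the endnotes.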
appendix b
digitization plan template
template created by emily lapworth for local use and shared at the statewide digitization workshop.
project overview
collection name(s):
collection number(s):
link to finding aid(s) or existing description(s):
project staff:
project supervisor:
research value/audience:
goals:
available resources: staff, money, equipment, software, etc.
additional resources needed: staff, training, money, equipment, software, etc.
priority level: low, medium, or high. why is this being digitized now? part of the regular workflow, part of a grant project, or specially requested?
publicity and promotion plans:
assessment plans:
estimated time frame/due date: estimate how much time should be spent on the collection, or when it should be finished by.
date completed/approximate hours spent:
formats and quantity of items: e.g., seven boxes of photographic prints, three folders of flat text documents, two drawers of oversize materials, etc.
existing arrangement & description: how is the collection currently arranged, and what description is currently available?
copyright: what is the copyright status of the materials, and can you legally digitize and provide access to them?
restricted or sensitive materials: e.g., skip over restricted folders, digitize a restricted-item notice, or physically cover pii (personally identifiable information, such as social security numbers) during digitization.
preservation issues: any fragile or delicate materials that need extra attention?
supply needs: e.g., envelopes needed for rehousing.
notes for future/follow-up: e.g., missing items, materials that should be restricted, recommended additional processing, rehousing, digitization, metadata enhancement, etc.
preparation
what will be digitized and what won't be? e.g., series x will not be digitized at this time; it consists of audiovisual materials, which would need to be outsourced.
how will items be arranged and described online?
how will identifiers/file names be assigned? e.g., each folder = a compound object; file titles from the finding aid will be used as titles for the digital objects.
what physical preparation must take place before digitization? e.g., remove all staples and fasteners.
digitization
equipment/technical specs:
• outsourcing or in-house equipment to be used
• file types (e.g., tiffs)
• file quality (e.g., 24-bit color, 600 ppi)
• file naming
other specifications:
• where will digital files be stored and preserved?
• how will special physical formats be handled? (e.g., scrapbooks: entire page or individual photos? magazines: entire issue or just the cover? etc.)
digital file processing:
• image correction?
• cropping or other editing?
• ocr or transcription?
• create derivative files?
digital file quality control: what procedures and workflows will you put in place to ensure that everything is digitized accurately and according to the project specifications?
metadata
what standards, fields, guidelines, and controlled vocabularies will you use?
metadata quality control: what procedures and workflows will you put in place to ensure that all metadata is accurate, consistent, and conforms to the project specifications?
access
how will digital objects be accessed? what systems, workflows, and procedures will be used to provide access?
appendix c
nevada statewide large-scale digitization symposium
funded by lsta, may 18, 2018
coffee and pastries, 9:00–9:30
digitization lab tour, 9:30–10:00
welcome, 10:00–10:15: opening remarks from the dean of university libraries and the director of special collections and archives.
session: what is large-scale? [live streaming begins], 10:15–11:00: this session will cover the characteristics of large-scale digitization and what sets it apart from other types of digitization projects. the unlv entertainment project team will also provide an update on the lsta-funded project they undertook to digitize over 25,000 items from unlv's entertainment collections.
panel: methods for ramping up: identifying resources, 11:00–12:00: there is a mandate to increase efficiency in digitization, but what resources can help you get there? this session will detail four methods to increase digitization output and address how organizations of varying resource levels can adopt them.
lunch, 12:00–1:00: enjoy a catered lunch and some discussion time with colleagues from across the state and region. there will be time to walk around the room and share digitization activities at your organization via whiteboards. during lunch you can also browse the “equipment buffet,” where we will have handouts/displays on various types of digitization equipment and outsourcing vendors.
panel: challenges of digitization at a larger scale, 1:00–2:00: ramping up digitization is not as simple as merely increasing numbers. in this session we will discuss the challenges encountered in each phase of digitization when scaling up and some strategies to meet those challenges.
break [live streaming ends], 2:00–2:15: during the break, browse the “equipment buffet,” where we will have handouts/displays on various types of digitization equipment and outsourcing vendors. using the provided worksheet, shop the buffet and rank how well each product meets your digitization needs.
discussion: resource 5: statewide collaboration (in groups) 2:15 3:15 the last session of the day will focus on an additional resource to ramping up digitization: your peers and partners right here in nevada! we will review the notes about organizational projects and shared challenges, identify potential partnerships or collaborations, discuss grant opportunities, and work as a group to prioritize our state’s most at-risk collections. wrap up / assessment 3:15 3:30 before everyone departs for home, we will share contact information from attendees, complete a workshop evaluation and discuss follow up activities for next year. all attendees will leave with a customized plan of action for their organization. attendee learning objectives: • be able to define characteristics of digitization projects (mass, large-scale, boutique) and where your organization fits. decide on the type of digitization appropriate for your organization to move toward. • understand pros and cons of each method and the type of resources needed to support implementation. identify one or more method/resource for your organization to target to increase your organizational capacity. • understand complexities of large-scale digitization and identify one or more challenges at your organization. • gain perspective on projects across nevada. be able to identify at least one future collaborative opportunity. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 26 appendix d nevada statewide large-scale digitization workshop funded by lsta may 10, 2019 workshop outcomes: • digitization boot camp sessions guided by survey responses • upr lsta project update and lessons learned • project consultations available • reflections on statewide workshops compare over 1 year agenda 8:00 9:00 *concurrent session coffee and pastries digitization lab equipment consultations welcome 9:00 9:15 opening remarks from the dean of university libraries panel: challenges of digitization at a larger scale 9:15 -10:00 what does it take to complete a large digitization project? in this case study panel presentation, we will cover the approach used in digitizing the union pacific railroad water documents, including: writing the grant and selecting materials, preparing archival collections for efficient digitization, managing the project, the student technician perspective, and trouble-shooting imaging and technical issues. panelists: project manager; curator; digital collections librarian; student technician; visual resources curator goal: overview of large-scale digitization and project deliverables. boot camp: preparing to digitize 10:00 11:00 goal: dig into the decisions needed to create a digitization plan. there will be a short presentation to go over the planning document, including asking “what makes a good project”? we will discuss labor and students and complete hands-on activities with actual collections to encourage work on individual plans. 11:00 12:00 *concurrent session boot camp: capture images group a boot camp: create metadata group b information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 27 goal: provide introductions to two main workflows in digitization projects: digital capture and metadata creation. there will be demonstrations, hands-on activities, and a chance to ask questions with the goal of helping to complete digitization plans. 
12:00 1:00 lunch 1:00 2:00 *concurrent session boot camp: capture images group b boot camp: create metadata group a goal: provide introductions to two main workflows in digitization projects: digital capture and metadata creation. there will be demonstrations, hands-on activities, and a chance to ask questions with the goal of helping to complete digitization plans. boot camp finding external funding 2:00 2:30 goal: learn what opportunities exist to secure funding for your project. hear tips on successful grant writing. discuss possible collaboration opportunities across the state. presenting online images dams overview 2:30 3:30 goal: see several options for presenting your collection to an online audience. options will highlight strategies for many staffing configurations including: solo librarian/historian, low it resourced institutions, common systems in the profession, and complex open source development communities focused on digital asset management platform (islandora 8). wrap up / assessment 3:30 3:45 goal: complete short survey on the workshop and ideas for future statewide events related to digitization. one on one consultations available 3:45 4:30 information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 28 endnotes 1 some examples include: “moving theory into practice: digital imaging tutorial,” cornell university library/research department, http://preservationtutorial.library.cornell.edu/contents.html; “bcr’s cdp digital imaging best practices version 2.0,” bibliographical center for research, june 2008, https://sustainableheritagenetwork.org/system/files/atoms/file/bcrcdpimagingbp.pdf; “new self-guided curriculum for digitization,” digital public library of america, https://dp.la/news/new-self-guided-curriculum-for-digitization/; elizabeth la beaud, “analysis of digital preservation course offerings in ala accredited graduate programs,” slis connecting 6, no. 2 (2017): 10, https://doi.org/10.18785/slis.0602.09. 2 anne daniel, amanda oliver, and amanda jamieson, “toward a competency framework for canadian archivists,” journal of contemporary archival studies 7, article 4 (2020): 1–13, https://elischolar.library.yale.edu/jcas/vol7/iss1/4. 3 “ala’s core competences of librarianship,” american library association, 2009, http://www.ala.org/educationcareers/careers/corecomp/corecompetences; “guidelines: competencies for special collections professionals,” association of college and research libraries, 2017, http://www.ala.org/acrl/standards/comp4specollect. 4 archives & records association of the united kingdom and ireland, “the ara competency framework,” 2016, https://www.archives.org.uk/160-cpd/cpd/700-competency-framework.html. 5 youngok choi and edie rasmussen, "what is needed to educate future digital librarians," d-lib magazine 12, no. 9 (september 2006), https://doi:10.1045/september2006-choi. youngok choi and edie rasmussen, "what qualifications and skills are important for digital librarian positions in academic libraries? a job advertisement analysis," the journal of academic librarianship 35, no. 5 (2009): 457–67, https://doi.org/10.1016/j.acalib.2009.06.003. 6 karl-rainer blumenthal et al., “what makes a digital steward: a competency profile based on the national digital stewardship residencies,” lis scholarship archive (2017), https://doi.org/10.17605/osf.io/tnmra. 
7 “mlis skills at work: a snapshot of job postings,” san jose state university school of information, 2019, https://ischool.sjsu.edu/lis-career-trends-report. 8 choi and rasmussen, “what qualifications.” 9 david a. kolb and ronald fry, “toward an applied theory of experiential learning,” in theories of group process, ed. cary l. cooper (london: john wiley, 1975), 33–57. 10 “guidelines,” federal agencies digital guidelines initiative, http://www.digitizationguidelines.gov/guidelines/. 11 krystyna k. matusiak and xiao hu, “educating a new cadre of experts specializing in digital collections and digital curation: experiential learning in digital library curriculum,” proceedings of the american society for information science and technology 49, no. 1 (2012): 1–3, https://doi.org/10.1002/meet.14504901018. 12 amy lynn maroso, “educating future digitizers,” library hi tech 23, no. 2 (june 1, 2005): 187–204, https://doi.org/10.1108/07378830510605151. 13 “agenda, digital directions: fundamentals of creating and managing digital collections, october 19-20, 2020, tucson, az,” northeast document conservation center, https://www.nedcc.org/preservation-training/dd20/agenda. 14 kim christen and lotus norton-wisla, “digitization project decision-making: starting a digitization project,” center for digital scholarship and curation, sustainable heritage network, july 1, 2017, https://sustainableheritagenetwork.org/digital-heritage/digitizationproject-decision-making-starting-digitization-project. 15 kim christen and lotus norton-wisla, “digitization project decision-making: should we digitize? can we digitize?,” center for digital scholarship and curation, sustainable heritage network, july 1, 2017, https://sustainableheritagenetwork.org/digital-heritage/digitizationproject-decision-making-should-we-digitize-can-we-digitize-0. 16 taylor surface, “getting a million dollar digital collection grant in six easy steps,” oclc next, december 6, 2016, http://www.oclc.org/blog/main/getting-a-million-dollar-digital-collectiongrant-in-six-easy-steps/. 17 institute of museum and library services, “putting your best foot forward: tips on making your preliminary proposal competitive,” december 31, 2015, https://www.imls.gov/blog/2015/12/putting-your-best-foot-forward-tips-making-yourpreliminary-proposal-competitive. 18 examples of project management literature relevant to cultural heritage digitization projects include: cyndi shein, hannah e. robinson, and hana gutierrez, “agility in the archives: translating agile methods to archival project management,” rbm: a journal of rare books, manuscripts, and cultural heritage 19, no.
2 (2018), https://rbm.acrl.org/index.php/rbm/article/view/17418/19208; michael dulock and holley long, “digital collections are a sprint, not a marathon: adapting scrum project management techniques to library digital initiatives,” information technology and libraries 34, no. 4 (2015), https://doi.org/10.6017/ital.v34i4.5869; michael middleton, “library digitisation project management,” proceedings of the iatul conferences (1999), http://docs.lib.purdue.edu/iatul/1999/papers/20; “dlf project managers toolkit,” digital library federation, https://wiki.diglib.org/dlf_project_managers_toolkit; theresa burress and chelcie juliet rowell, “project management for digital library projects with collaborators beyond the library,” journal of college & undergraduate libraries 24, no. 2–4 (2017), https://doi.org/10.1080/10691316.2017.1336954. 19 “guiding digital success,” online computer library center (oclc), https://www.oclc.org/content/dam/oclc/contentdm/guiding_digital_success_handout.pdf. 20 useful metadata resources include: digital public library of america, “metadata application profile,” https://pro.dp.la/hubs/metadata-application-profile; dublin core metadata initiative, “guidelines for dublin core application profiles,” https://www.dublincore.org/specifications/dublin-core/profile-guidelines/; oksana l. zavalina et al., “developing an empirically-based framework of metadata change and exploring relation between metadata change and metadata quality in marc library metadata,” procedia computer science 99 (2016): 50–63, https://doi.org/10.1016/j.procs.2016.09.100. 21 “guidelines: technical guidelines for digitizing cultural heritage materials,” federal agencies digital guidelines initiative, http://www.digitizationguidelines.gov/guidelines/digitizetechnical.html; “digital preservation at the library of congress,” library of congress, https://www.loc.gov/preservation/digital/. 22 robin l.
dale, “reformatting: 6.7 outsourcing and vendor relations,” northeast documentation conservation center, https://www.nedcc.org/free-resources/preservation-leaflets/6.reformatting/6.7-outsourcing-and-vendor-relations; “deciding to outsource or digitize inhouse," digital stewardship curriculum, center for digital scholarship and curation/sustainable heritage network, https://www.sustainableheritagenetwork.org/system/files/atoms/file/1.20_outsourcingvsin house.pdf. 23 “omeka,” roy rosenzweig center for history and new media, https://omeka.org/; “contentdm: build, showcase, and preserve your digital collections,” oclc, https://www.oclc.org/en/contentdm.html. 24 “iso 14721:2012 space data and information transfer systems—open archival information system (oais)—reference model,” international organization for standardization, https://www.iso.org/standard/57284.html; “levels of digital preservation,” national digital stewardship alliance/digital library federation, https://ndsa.org//activities/levels-of-digitalpreservation/. 25 “from theory to action: ‘good enough’ digital preservation solutions for under-resourced cultural heritage institutions”, preserving digital objects with restricted resources (powrr), august 2014, http://commons.lib.niu.edu/handle/10843/13610. 26 “digital archives specialist (das) curriculum and certificate program,” society of american archivists, https://www2.archivists.org/prof-education/das; “powrr institutes,” digital powrr, https://digitalpowrr.niu.edu/institutes/. 27 sandy rodriguez et al., “collective responsibility: seeking equity for contingent labor in libraries, archives, and museums,” institute for museum and library services white paper, http://laborforum.diglib.org/wpcontent/uploads/sites/26/2019/09/collective_responsibility_white_paper.pdf. 
28 saa-acrl/rbms joint task force on public services metrics, “standardized statistical measures and metrics for public services in archival repositories and special collections libraries,” 2018, https://www2.archivists.org/standards/standardized-statistical-measuresand-metrics-for-public-services-in-archival-repositories; molly bragg et al., “best practices for google analytics in digital libraries: digital library federation assessment interest group analytics” working group, 2015, https://doi.org/10.17605/osf.io/ct8bs; “google analytics in contentdm,” oclc, https://help.oclc.org/metadata_services/contentdm/get_started/google_analytics_in_contentdm.
core leadership column
making room for change through rest
margaret heller
information technology and libraries | june 2021
https://doi.org/10.6017/ital.v40i2.13513
i write this column from the vantage point of my current role as a member of the core technology section leadership team, and as a newly elected president-elect of core, with my term starting in july 2021. the planning for core began years ago but became a real division of ala in the most chaotic of times.
visions for the first year of core were set aside as we had to face the reality of all the work needing to be done remotely, without any conferences that would allow for in-person conversations, and with all the leadership and members under personal and professional strain. yet being forced to start up slowly and deliberately provides some advantages. settling into this new situation has allowed staff, leaders, and members to acclimate to a new division an d learn how we want to do things in the future, rather than relying too much on how we did things in the past or feeling pressure to meet every demand. right now, we are all at a juncture in our personal and professional lives, and thinking about how to approach the coming months. summer offers the promise of growth and reinvention. the pause that a break implies allows time for us both as individuals to make time for what is important to us, and as members or employees of institutions to reconsider our priorities. for people working in library technology, however, the “summer break” is often anything but. public libraries become a hub for activity as schools are closed, and school and academic libraries may use slow periods when classes are not in session for necessary systems upgrades or to roll out a new service. the summer of 2020 was one of the most challenging of my life, both professionally and personally, and meeting all the demands of the moment left hardly any time for a true break. this year, just like last year, feels like a summer we might not let ourselves rest for a moment. while many libraries have been open to some degree over the past year, the upcoming summer has the potential for a return to something like normal. shutting down regular in-person services and buildings felt chaotic since it required new ways of providing those services and building up new technical infrastructure, but without us having expected this in advance like a normal summer project. the return may also feel chaotic, but rather than approaching it as a series of tasks in a plan that requires lots of energy and work, i hope we can treat the time as a period of reflective practice and give ourselves time to understand what has changed. adapting to the realities of life since spring of 2020 has changed us all in various ways, and so too our library users have new needs and expectations. in some cases, they have embraced new services, though this has not been a smooth process for everyone. i have a family member who started using an e-reader for the first time during the pandemic to access library e-books when her public library was closed or had limited services. she was grateful for the option to access books this way, but occasionally struggled to follow the complex workflow from library app to vendor site to device. without the ability to visit a physical reference desk to ask for help, she asked me to assist with device troubleshooting on several occasions. that worked well for her, but margaret heller (mheller1@luc.edu) is digital services librarian, loyola university chicago, and (as of july 1, 2021) president-elect of the core: leadership, infrastructure, futures division of ala. © 2021. mailto:mheller1@luc.edu information technology and libraries june 2021 making room for change through rest | heller 2 not everyone has a digital services librarian in their quarantine bubble. i share this to illustrate that while some people will have adapted or gotten the help they need, for many, this time has been one of doing without or maladaptation. 
going back to “normal” will not help those who will need even more than they did pre-pandemic. taking time to understand that fact, and to accept that it will not be a quick process of return for many people, will allow us to give each other space to find a way back to our lives as library users and library employees. while many of us feel uncomfortable when we see slow progress—i know i do—i am coming to realize the value of making space for slowness and for rest. rest comes in all forms. it could be physical rest, but it could be pursuing an artistic or athletic hobby, intentional social interactions, or spiritual practices. institutions might give extra time off or set healthy expectations for work hours and meeting-free days, while also discarding old practices and attitudes to create better future work environments. there are crises to which we must immediately react and respond, but without personal and institutional energy in reserve, we will not do as good a job when they occur. crises include political upheaval, public health emergencies, and other major events, but we can also appreciate how they unfold on a more mundane level. information technology work often requires odd hours, intense bursts of energy to complete projects in a small window of time, and unpredictable problems that require dropping everything else to address an emergency. it is natural to constantly look towards the most urgent and the newest problem. this tendency results in lengthy backlogs for requests and accumulates technical debt from deferred maintenance or refactoring. yet as we bring our libraries and other institutions out of pandemic mode over the next few years, allowing for reflective space can help us to be cautious about the choices we make. for example, during earlier stages of the pandemic, many of us probably had to set u p systems for some type of surveillance to maintain social distancing and aid in contact tracing. taking some time to review all those new procedures and systems—and purposefully dismantle those with negative privacy implications—will help us to go forward as more ethical and empathetic institutions. taking it slow is going to be the only way through the next period. summer 2021 should be about reflection on collective trauma. we responded to the events of the past year, whether it was for closing libraries, keeping libraries open as safely as possible, racial justice work, or election support, and now we must consider how to incorporate what we started into lasting change. to do that reflection will require rest. we know how important rest is but finding space for it is not usually a high priority. rest allows us to integrate our experiences, and will build us back to make sure we can keep responding to what comes next. i am challenging myself to spend time in deliberate reflection at the cost of mindless productivity over the coming months so that i can keep helping my library and core succeed. i hope you will consider doing the same. collaboration and integration: embedding library resources in canvas articles collaboration and integration embedding library resources in canvas jennifer l. murray and daniel e. feinberg information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.11863 jennifer l. murray (jennifer.murray@unf.edu) is associate dean, university of north florida. daniel e. feinberg (daniel.feinberg@unf.edu) is online learning librarian, university of north florida. 
abstract the university of north florida (unf) transitioned to canvas as its learning management system (lms) in summer 2017. this implementation brought on opportunities that allowed for a more userfriendly learning environment for students. working with students in courses which were in-person, hybrid, or online, brought about the need for the library to have a place in the canvas lms. students needed to remember how to access and locate library resources and services outside of canvas. during this time, the thomas g. carpenter library’s online presence was enhanced, yet still not visible in canvas. it became apparent that the library needed to be integrated into canvas courses. this would enable students to easily transition between their coursework and finding resources and services to support their studies. in addition, librarians who worked with students, looked for ways for students to easily find library resources and services online. after much discussion, it became clear to the online learning librarian (oll) and the director of technical services and library systems (library director) that the library needed to explore ways to integrate more with canvas. introduction online learning is not a new concept at the unf. in fact, in-person, hybrid, and online courses used online learning in some capacity since distance learning took hold in higher education. unf transitioned to canvas as their learning management system (lms) in summer 2017, which replaced blackboard. this change, which affected all the unf’s online instruction and student learning, brought on new benefits and challenges that allowed for a more secure system for students to take in-person, hybrid, and distance learning courses. while this change occurred, unf’s library went through many changes in its virtual presence. students, specifically those who had classes that utilized canvas, needed a user-friendly way to use the library website and its resources virtually. in response, the library’s resources transitioned into having a greater online presence. however, ultimately, many students needed to use resources that they did not actually realize were available electronically from the library. through instruction and research consultations (both in-person and virtually), students needed to be directed back to the library homepage to access resources; however, the reality was that unless there was a presence of library instruction or professors pointing out library resources, students instead turned to google or other easy to find online resources to which they were previously exposed. how the project originated by spring 2018, there was growth of unf courses that were converted to online or hybrid courses. as students used canvas more, librarians started receiving feedback from in-person and online sessions that students had difficulty accessing the library’s resources while in canvas. the lack of library visibility in canvas caused the librarians to truly acknowledge that this was a problem. mailto:jennifer.murray@unf.edu mailto:daniel.feinberg@unf.edu information technology and libraries june 2020 collaboration and integration | murray and feinberg 2 students had to open a new browser window to access the library and then go back to canvas to complete their assignments, which involved multiple steps. this caused frustration among students who had to remember the library url, while also getting used to navigating their new courses in canvas. 
librarians consistently spent large amounts of time instructing students how to navigate to the library website during library instruction sessions and research consultations. in effect, more time was spent with students to guide them to library resources such as programmatic or course specific springshare hosted libguides (also known as library guides), or the library homepage. rather than being focused on how to use library resources and become more information literate, students spent more time on just locating the library website to get to the unf library’s online resources. together, the oll and library director talked about possibilities in canvas that would benefit all students who attended unf both in-person and online. canvas is located in unf’s mywings, a portal where all students go for coursework, email, and resources that support their academic studies at unf. it became apparent that if it was possible, there needed to be a quicker way to access the unf library resources for students. literature review with the advent of online learning, it became obvious that students needed to have library access within their online learning management system. for campuses such as unf, this meant within canvas. for unf students that are distance or online students only, this was especially true. farkas noted that librarians had worked to determine the best ways to provide library materials, services, and embed librarians into the lms.1 over the last fifteen years, lms have become more important to support the growth of online learning. pomerantz noted that the lms has become critical to instruction and online learning. approximately 99 percent of institutions adopt an lms and 88 percent of faculty utilize an lms platform.2 this “puts it in the category with cars and cellphones as among the most widely adopted technologies in the united states.”3 library guides that have been integrated into an lms increased their visibility, but did not guarantee that faculty and students would utilize them. that is why it was critical to continuously collaborate and communicate with faculty, students, and librarians to bring attention to the resources that could assist students. farkas noted that librarians at the ohio state university discussed that no matter how the library was integrated into a university’s lms, the usage of the library there was decidedly dependent on if the faculty professor promoted the library to their students.4 the reality that libraries faced was that without visibility in an lms, students that were online/distance learners needed to remember or find the library’s website. while this seemed to be inconsequential, it caused students to use google or other resources instead of their university/college’s library discovery tool or library databases. farkas noted that shank and dewald’s seminal article described a university’s lms as having two levels, macro and micro. when there was one way to access the library in the lms, then it was termed macro. this single pathway allowed for less maintenance since there was a single way to access the library from the lms.5 the university of kentucky embedded the library by adding a library tab in blackboard. 
other institutions like portland state university, ohio state university, and oakland state university developed library widgets to make the library more accessible.6 the addition of library and research guides in library instruction was critical to increase visibility for information technology and libraries june 2020 collaboration and integration | murray and feinberg 3 students and furthermore make sure students had easier access to the library through their lms. getting librarians’ access to the lms at their institutions is an ongoing issue.7 unf librarians wanted to determine best practices to decide how the library could integrate into canvas. therefore, research was needed to see what other university libraries were doing. the librarians at unf discovered that there was no obvious preference based on examples found in research to accomplish how to get the library into canvas. davis observed that “claiming a small amount of real estate in the lms template . . . is an easy way to put the library in students’ periphery.” by simply having a library link added or a page added to each course was “the digital equivalent for students of walking past the library on the way to class.”8 however, it seemed that a lot depended on how an lms was used at an unf and the technical expertise available. thompson and davis noted that the “lms technology has added another layer of complexity to the puzzle. as technology evolves to address changes in learning design, student and facu lty attitudes, expectations, perceptions will continue to be a critical piece of the course integration puzzle.” 9 while looking at other institutions, there were a variety of ways in which canvas and the library were integrated. there were numerous examples from embedded springshare product library guides, to the creation of modules of quizzes or tutorials, and even to the creation of online minicourses, and embedded librarians in lms courses.10 penn state university looked at their method of how to add library content into canvas. they already had a specific way of putting library guides in canvas, but it was not in a highly visible location for students to easily access. when faculty put guides in their courses, with the collaboration of librarians, the guides were used. however, many of the faculty did not use these librarians or resources. a student survey and user studies were used to best learn how to fix the problem of students and faculty that did not use the guides and content more. penn state worked with their comm 190 instructors to administer a survey that was extra credit, to ensure getting responses.11 “general findings included: 53 percent of students did not know what a course guide was; 41 percent of students had used a guide for a class project; 60 percent accessed a course guide via the lms; and 37 percent of students used course guides on their own.”12 many students were interested in doing their library research within canvas itself. it should be noted that the guides needed to be in a prominent place in canvas, but not overwhelm the course content. for course-specific guides an introductory statement was needed to describe what the guide was about. when the release of springshare’s lti tool occurred, it became an optimal time in which the technical solutions allowed for penn state’s library guides to be embedded smoothly into canvas.13 the learning tools interoperability (lti tool) allows for online software to be integrated into canvas. 
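for readers unfamiliar with the plumbing, an lti 1.1 launch is essentially a signed form post from the lms to the external tool. the python sketch below, which uses the oauthlib package and entirely hypothetical launch urls, keys, and course identifiers, shows roughly what such a launch request contains; canvas and springshare handle this exchange automatically, so the sketch is illustrative rather than something an institution would need to write:

# illustrative lti 1.1 launch: the lms signs course context parameters with a shared
# secret (oauth 1.0a) and posts them to the tool's launch url; all values are made up
from urllib.parse import urlencode
from oauthlib.oauth1 import Client, SIGNATURE_TYPE_BODY

params = {
    "lti_message_type": "basic-lti-launch-request",
    "lti_version": "LTI-1p0",
    "resource_link_id": "course-123-research-guide",   # hypothetical placement id
    "context_id": "course-123",                        # hypothetical canvas course
    "roles": "Learner",
}
client = Client("example-consumer-key", client_secret="example-shared-secret",
                signature_type=SIGNATURE_TYPE_BODY)
uri, headers, body = client.sign(
    "https://tool.example.edu/lti/launch",             # hypothetical tool provider url
    http_method="POST",
    body=urlencode(params),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(body)   # the original parameters plus oauth_* fields, including the signature

the tool provider verifies the signature with the same shared secret and, if it matches, trusts the course and user information in the post, which is what lets a guide appear inside a course without a separate login.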
in effect, when professors want to add a tool to their course, it allows for more seamless and controlled avenue. in the case of library guides, it creates a way in which guides can be embedded into the lms with little problems. another example of a library integration into a campus lms was at the utah state university (usu) merrill-cazier library. they looked to find a way to maximize the effectiveness of springshare’s library guides when they assessed the design and reach of library guides within their lms.14 they took a unique approach to build an lti tool that automatically pulled in the most appropriate library guide when the “research help” link in canvas was activated by a professor. they also saw this as an opportune time to redesign their subject guides and ensure there were guides for all disciplines. they provided usage data to subject librarians to help determine where there might be opportunities to interact with classes and provide more library instruction. overall, information technology and libraries june 2020 collaboration and integration | murray and feinberg 4 the study and feedback they received from students helped them to find ways to improve how librarians used and thought about library guides, and expanded their reach based on usage data. 15 this ability to add library guides to canvas provided students a way to access library materials or the library without having to leave the online classroom. many libraries have conducted focus groups and usability studies that were key to providing valuable feedback on the knowledge and understanding that faculty and students had of guides, ways to improve information shared that assisted students with their coursework and faculty in their online teaching. research indicated that exploration and implementation of integrating library guides into an lms led to a need to improve and provide more consistently designed guides.16 the literature indicated the importance of a strong relationship with the department that manages the lms. these integrations were made much easier when there was a relationship established and it sometimes led to finding out about additional opportunities to integrate more with the lms. penn state saw an increase of over 200,000 hits to its course guides believed to be because of the lti integration.17 this, however, did not guarantee that the students benefited from the course guides, similar to library statistics not proving resources were being used despite page hits. in addition, faculty were able to turn off the lti course guide tool, which reduced the chances of student usage or awareness of the course guide. based on feedback from students and faculty, it did not matter where the course guides were since they could be ignored anyway. a penn state blog was developed by the teaching and learning with technology unit to provide instructors a venue in which they could be aware of online services that librarians provide.18 “although automatic integration allows specialized library resources to be targeted at all lms courses, that does not mean that they’ll be accessed. it is important then to build ongoing relationships with stakeholders, providing not just information that such integrations exist, but also reasons why to use them.”19 however, not all universities and colleges decided to integrate the library strictly through a library guide or a link to the library integration into their lms. karplus noted that students spent more time online rather than going to the physical academic library. 
karplus discussed that the digital world combined with academic library resources had two benefits. one of which brought online research as a more normal occurrence. the second benefit was that students were more comfortable with accessing online resources.20 while using blackboard, st. mary’s college’s goal was to incorporate library information literacy modules into courses that existed. using the blackboard lms, students were able to access all components of their courses including assigned readings. this became their academic environment. therefore, information literacy modules, tutorials, and outside internet resources could be added to the lms.21 tutorials combined with preand post-testing, gave faculty instant feedback. librarians were also able to participate in blackboard through discussion boards and work with students.22 there was a constant need to update the modules and the information added to blackboard. librarians having access to the blackboard site, allowed for students to use the library resources more readily. “the site can be the focal point for many librarians in one location thus ensuring a consistent, collaborative instructional program.”23 overall, the integration of campus librarians into an lms was to get students to use the library in order to be more successful in their academic endeavors. information technology and libraries june 2020 collaboration and integration | murray and feinberg 5 developing a plan of action initially, the oll and library director brainstormed possible integration ideas, ranging from adding a library global navigation button to the canvas ui, to adding a link to the library in the canvas help menu. at the same time, they also researched what other libraries had done. after brainstorming, it was realized that additional conversations needed to occur within the library and with unf’s online learning support team, a part of the center for instruction and research technology (cirt), the group that manages canvas. the discussion to integrate the library and canvas was a complex matter. unf administrators asked for a proposal to be written so it could be brought to the library, online learning support team, and information technology services (its) stakeholders for discussion and approval. that proposal, along with much needed discussion, was critical in order to determine the possibilities and actions that needed to be taken. that being said it was important to recognize the importance of what was best to serve the faculty and students. when brainstorming discussions started to occur with unf’s online learning support team, it was important for the library to determine what options were available to embed the library in canvas. the library had a strong relationship with unf’s online learning support team and its administrators, which made this an easy process to pursue. what the oll and library director initially wanted was to add a simple link to the global navigation in canvas that would take all users to the library homepage. however, it became apparent that this was not possible due to the fact that this space is limited and many departments on campus would like greater visibility in canvas. the next option, which was easier to implement, was to add a link to the library homepage under the help menu in canvas. although this menu link was added, it was so hidden in canvas that the oll and library director felt that it would never be found in canvas by faculty, let alone students. 
cirt administrators advised the oll and library director of what other possibilities were available. after researching options, the library recommended creating access to library resources and services using a springshare lti tool for library guides, and cirt agreed. library guides, or libguides, are library webpages built with springshare software. using the lti tool seemed like a strong option since it would give the library more of a presence in addition to the help link to the library homepage. after approval from library administration and initial discussions with it, the project moved forward.

implementation

the project took about a year to complete from the time discussions began internally in the library to the time the integration went live (see figure 1).

figure 1. project timeline

a seamless entryway to the library seemed like a good idea based on observations of and feedback from students, but the oll and library director started by completing an environmental scan to see what other institutions had done and to get ideas on ways the unf library could integrate into canvas. they learned that it had been done in a variety of ways: integration of the library at the global navigation level, at the course level, and by an added link to the library under the help menu in canvas. it became clear that an integration into canvas was an obvious progression that would strengthen online learning and also give students the ability to benefit from the resources the library subscribed to in support of their curricular needs. conversations then occurred with unf's online learning support team to discuss integration options further. after much discussion, a decision was made to pursue an added link to the library website under the canvas help menu and a new lti tool at the course level. since canvas was used in so many courses, it was determined that university-wide campus committee agreement was needed on how to go about adding library guides to canvas courses. librarians were also approached at this time for their input and feedback. the goal seemed obvious to the librarians: when they were approached, buy-in to support students in canvas by way of the help button and lti tool integration was more than straightforward. for the librarians, the goal was to solve the problem of making sure that students could easily access library materials. overall, the library faculty's preference for the implementation was to embed the library website under the canvas help menu while also having the student resources libguide inside all canvas courses using the springshare lti tool. after all internal approvals were obtained, the link to the library was added under the canvas help menu. as for the springshare lti tool, it required more work and discussion before it could be implemented. after approval was granted from the unf online learning support team and the campus its security team, the integration began. configuration options for the lti tool were explored, and the systems and digital technologies librarian worked closely with the unf online learning support team and springshare to set up the libapp lti tool.
the first step was to configure springshare's automagic lti tool to automatically add libguides to courses in canvas. this included adding an lti tool name and description, which appeared in canvas during setup and in the course navigation. it was also decided, based on feedback from across campus, to set the student resources libguide as the default guide for all courses. instructors could request to use a different libguide for their course. to enable this, two parameters had to be set in the automagic lti tool to enable libguide matching between canvas and libguides:

• lti parameter name: for unf, this was set to the "context label" value, which selects the course code field in canvas.
• libguides metadata name: this was set to the appropriate value to identify the metadata field used in libguides.

if an instructor decided to change the default guide to another guide, these two parameters would need to be entered into the specific libguide's custom metadata so that canvas could link to the designated guide to display in a course. the change had to be made in the libguide itself, so it was handled by the systems and digital technologies librarian. not many instructors had requested this yet, but if the option were used, the library would also have to ensure the link carried over each semester by updating the metadata in the guide to the new course code. (an illustrative sketch of this matching logic appears at the end of this section.)

after the configuration was completed on the springshare side, the next step was to set up the integration in the canvas test environment. an external application had to be installed in canvas to allow the springshare lti tool to run. after it was tested, the application was applied across all courses and set to display by default, which the majority of faculty preferred. faculty who did not want to use the integration had the ability to manually turn it off in canvas. during the implementation setup, a few minor issues were encountered. after seeing what the student resources guide looked like in canvas, it became clear that the header and footer were not needed and just cluttered the guide; both were removed in the lti setup options to ensure a cleaner-looking guide. since the libguides were being embedded into another system (canvas), formatting of the guides had to be adjusted. the other issue encountered was trying to add available springshare widgets, such as the library hours or room booking content, to the guide using the automagic lti tool. while this was not successful, it was determined that the additional options were not needed. once the integration was set up in the canvas test environment, demonstrations were held and input was gathered from stakeholders through campus-wide meetings with faculty. it was critical to determine whether faculty would utilize libguides in their canvas courses. an overview of the integration and its benefits was given to the campus technology committee and distance learning committee faculty, along with a demonstration so that these faculty committees could see what the integration would look like in their courses. overall, the feedback obtained from the faculty was very positive. the preference was to have the configuration be opt-out, where the library guides would automatically display in canvas courses. many faculty members were excited about the integration and looked forward to having it in their courses.
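the matching itself happens inside springshare's tool, so the following python sketch is only an illustration of the behavior described above; the field names, guide urls, and course codes are hypothetical, not unf's actual configuration.

# schematic sketch only: illustrates the guide-matching behavior described above,
# not springshare's actual implementation. names and urls are hypothetical.

DEFAULT_GUIDE = "https://library.example.edu/student-resources"  # hypothetical default guide

# hypothetical mapping from a libguide's custom metadata value to its url;
# in practice this metadata lives in libguides, not in local code.
guide_metadata = {
    "ENC1101-2020-FALL": "https://library.example.edu/enc1101",
    "BSC2010-2020-FALL": "https://library.example.edu/bsc2010",
}

def choose_guide(lti_launch_params: dict) -> str:
    """return the guide url to embed for a canvas course.

    canvas is assumed to send the course code in the lti launch (here the
    'context_label' parameter, matching the configuration described above);
    if a guide's metadata matches, use it, otherwise fall back to the default.
    """
    course_code = lti_launch_params.get("context_label", "")
    return guide_metadata.get(course_code, DEFAULT_GUIDE)

# example launch payloads (hypothetical values)
print(choose_guide({"context_label": "ENC1101-2020-FALL"}))  # course-specific guide
print(choose_guide({"context_label": "ART1000-2020-FALL"}))  # falls back to the default guide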
after demos took place and final setup was completed based on feedback, the integration was then setup in the canvas production environment and was announced via newsletters, emails , and social media. as of the fall 2019 semester, the library's student resources guide was integrated into all courses in canvas (see figure 2). information technology and libraries june 2020 collaboration and integration | murray and feinberg 8 figure 2. student resources library guide in canvas benefits of the integration students are dependent on their campus lms in order to complete their coursework, support their studies, and in the case at unf, have easier access to the online campus. the libguide integration not only streamlined their way to library resources, but also promoted library usage from students that may not have known how to get to the resources available to them. for faculty it should be noted that they were able to replace the default student libguide with a more specific subject or course guide. either way, it brought more awareness to resources and services that supported curricular needs. the springshare chat widget in the guide also gave students the ability to communicate directly with a librarian from within canvas. this integration not only increased the library visibility in the online environment but enabled all students, whether inperson, hybrid, or online, with direct access to the resources they needed for their coursework. challenges of the integration although there were many benefits to integrate the library into canvas, there were many challenges with making the integration happen. there were many more stakeholders than expected. from library administration, to the canvas administrators, to library faculty, and teaching faculty committees, their input was needed prior to the project taking place. since the project grew organically, this meant that all of the stakeholders were brought in as the project grew or unfolded. once the project received approval from the library and cirt administrators, its administrators had to give the final approval in order to proceed with the integration of library guides. the process to implement the integration took some time to figure out. in addition, getting buy-in from the teaching faculty was key as the navigation options in their canvas courses would be impacted. making sure the faculty understood how it would assist their students was important as the goal was to help their students succeed with their coursework. information technology and libraries june 2020 collaboration and integration | murray and feinberg 9 a concern was if faculty would tell their students, or conversely, would students find the link to the libguide on their own? determining how the news of the library and canvas integration would be communicated to the unf community was the final step. the library director, oll, and cirt administrators needed to determine the best communication routes to get the unf community aware of this news. in effect, emails, unf updates/newsletters, and by word of mouth by teaching faculty. it was crucial that students be aware of these tools. this meant that going forward, unf would depend on word of mouth or student's curiosity in the canvas navigation bars themselves. discussion and next steps integrating the library with the unf’s learning management system, canvas, took much planning and collaboration, which was key to creating a more user-friendly learning environment for students. 
in reflecting on what went well and what did not, the unf librarians learned several important lessons that will help improve the implementation of future projects. to begin, it is important to identify and involve stakeholders early on, so they can provide feedback along the way. getting buy-in from the teaching faculty is also key since the integration affects the navigation options in their canvas courses. initially, the oll and library director did not realize how many groups of teaching faculty and departments would need to approve this canvas change and implementation. it was important to help them understand the importance of the integration and how it could assist their students with their coursework. considering the content of the library guides was important because of the impact it would have on canvas courses. for example, at the unf library some students thought that the librarian's picture on the default guide was in fact their professor and began to contact her, which caused much confusion for students and professors alike. along the way, communication is critical so that everyone is kept informed as the integration progresses; communicating at the appropriate times, and gathering all information about configuration options before starting conversations with stakeholders, is important too. finally, investigating the ability to retrieve usage statistics from day one would be extremely beneficial and would provide data to assess how often the library guides are being used in the lms and by whom. this information would help determine next steps and identify other potential integration opportunities. at unf, the librarians were not able to implement statistics as part of the integration, which has made it more difficult to assess the usage of the library guides in canvas. now that the integration has been completed, ensuring it continues to meet the needs of faculty and students will be important. feedback will need to be gathered from stakeholders to find out if they find the integration useful, if there are any issues being encountered, and/or if they have any recommendations for ways to enhance the integration. usage statistics will also be gathered as soon as they are available. this will provide information on which instructors are using the library guides in their courses and which are not. for those who have used them, it will be an opportunity to target those courses for instruction. for those who have not, it will be an opportunity to find out why and to make sure they are aware of the benefits of using the guides in their courses. exploring other integration possibilities, especially as the technology continues to evolve, will be important to ensure the library continues to reach students. while the natural progression of the unf integration would be to embed librarians in the canvas platform, others have faced challenges.
"according to the ithaka s & r library survey 2013 by long and schonfeld, 80–90 percent of academic library directors perceive their librarians' primary role as contributing to student learning while only 45–55 percent of faculty agree librarians contribute to student learning."24 even though this is a challenge, faculty collaboration with librarians is crucial for the embedded librarian role. without a requirement of embedded librarianship, marketing the librarians and what they can do for students will be essential for their role to be successful.25 at unf, conversations will have to be held to determine what other integrations would be of interest and possible at the university. the unf library will also be looking to improve the design and layout of library guides. now that their visibility has increased, it will be important to standardize them and ensure they all have a consistent look and feel, which will make it easier for students to find the information and resources they are looking for.

conclusion

in today's rapidly changing technological world, it is critical to make resources available regardless of where students are physically located. integrating the library's libguides into canvas not only brought more visibility to the library, its resources, and its services, but also brought the library to where students were engaged with the university. as noted by farkas, "positioning the library at the heart of the virtual campus seems as important as positioning the library as the heart of the physical campus."26 providing resources to students at their point of need enabled them to easily access the information they needed to succeed in their courses. it also allowed faculty to integrate the library resources most beneficial to their courses, enhancing their teaching as well as meeting the educational needs of their students. the unf library will continue to look at how library resources are used and how best to serve the online community going forward. it will be important to explore ways to enhance existing services with existing technology but also to look ahead and determine what may be possible down the road with new and upcoming technologies. in addition, assessing how the library connects to online learners and gathering feedback from students and faculty will be critical to contributing to the success of students.

endnotes

1 meredith gorran farkas, "libraries in the learning management system," american libraries tips and trends (summer 2015), https://acrl.ala.org/is/wp-content/uploads/2014/05/summer2015.pdf.
2 jeffrey pomerantz et al., "foundations for a next generation digital learning environment: faculty, students, and the lms" (jan 12, 2018): 1–4.
3 pomerantz et al., "foundations for a next generation digital learning environment."
4 farkas, "libraries in the learning management system."
5 farkas, "libraries in the learning management system."
6 farkas, "libraries in the learning management system."
7 farkas, "libraries in the learning management system."
8 robin camille davis, "the lms and the library," behavioral & social sciences librarian 36, no. 1 (jan 2, 2017): 31–35, https://doi.org/10.1080/01639269.2017.1387740.
9 liz thompson and davis vess, "a bellwether for academic library services in the future: a review of user-centered library integrations with learning management systems," virginia libraries 62, no. 1 (2017): 1–10, https://doi.org/10.21061/valib.v62i1.1472.
10 davis, "the lms and the library."
11 amanda clossen and linda klimczyk, "chapter 2: tell us a story: canvas integration strategy," library technology reports 54, no. 5 (2018): 7–10, https://doi.org/10.5860/ltr.54n5.
12 clossen and klimczyk, "chapter 2," 8.
13 clossen and klimczyk, "chapter 2," 8.
14 britt fagerheim et al., "extending our reach," reference & user services quarterly 56, no. 3 (2017): 180–88, https://doi.org/10.5860/rusq.56n3.180.
15 fagerheim et al., "extending our reach," 187.
16 fagerheim et al., "extending our reach," 188.
17 amanda clossen, "chapter 7: ongoing implementation: outreach to stakeholders," library technology reports 54, no. 5 (2018): 28.
18 clossen, "chapter 7," 29.
19 clossen, "chapter 7," 29.
20 susan s. karplus, "integrating academic library resources and learning management systems: the library blackboard site," education libraries 29, no. 1 (2006): 5, https://doi.org/10.26443/el.v29i1.219.
21 karplus, "integrating academic library resources and learning management systems."
22 karplus, "integrating academic library resources and learning management systems."
23 karplus, "integrating academic library resources and learning management systems."
24 beth e. tumbleson, "collaborating in research: embedded librarianship in the learning management system," reference librarian 57, no. 3 (july 2016): 224–234, https://doi.org/10.1080/02763877.2015.1134376.
25 tumbleson, "collaborating in research."
26 farkas, "libraries in the learning management system."

president's message
andromeda yelton

information technology and libraries | march 2018
andromeda yelton (andromeda.yelton@gmail.com) is lita president 2017–18 and senior software engineer, mit libraries, cambridge, massachusetts.

in my last president's message, i talked about change — ital's transition to new leadership — and imagination — wakanda and the archival imaginary. today change and imagination are on my mind again as lita contemplates a new path forward: potentially becoming a new combined division with alcts and llama. as you may have already seen on litablog (http://litablog.org/2018/02/lita-alcts-and-llama-document-on-small-division-collaboration/), the three divisional leadership teams have been envisioning this possibility, and all three division boards discussed it at midwinter. while the idea sprang out of our shared challenges with financial stability, in discussing it we've realized how much opportunity we have to be stronger together.
for instance, we've heard for years that you, lita members, want more of a leadership training pathway, and more ways to stay involved with your lita home as you move into management; alignment with llama automatically opens up all kinds of possibilities. they have an agile divisional structure with their communities of practice and an outstanding set of leadership competencies. and anyone involved with library technology knows that we live and die by metadata, but we aren't all experts in it; joining forces with alcts creates a natural home for people no matter where they are (or where they're going) on the technology/metadata continuum. alcts also runs far more online education than lita and runs a virtual conference. meanwhile, of course, lita has a lot to offer to llama and alcts. you already know how rewarding the networking is, and how great the depth of expertise on technology topics. we also bring strong publications (like this very journal), marquee conference programs (like top tech trends and the imagineering panel), and a face-to-face conference. (speaking of which, please pitch a session (http://bit.ly/2gpgxdf) for the 2018 lita forum!) i want to emphasize that no decisions have been made yet. the outcome of our three board discussions was that we all feel there is enough merit to this proposal to explore it further, but none of us are formally committed to this direction. furthermore, it is not practically or procedurally possible to make a change of this magnitude until at least 2019. in the meantime, we expect there will be numerous working groups to determine if and how this all could work, as well as open forums for the membership of all three divisions to express hopes, concerns, and ideas. personally, my highest priority is to ensure that you, the members, continue to have a divisional home: one that gives you learning opportunities and a place for professional camaraderie, and that is on solid financial footing so it can continue to be here for you in the long term. so, i'm excited about the possibilities that a superhero teamup affords, but i'm even more excited to hear from you. do you find this prospect thrilling, scary, both? do you think we should absolutely go this way, or definitely not, or maybe but with caveats and questions? please tell me what you think. you can submit anonymous feedback and questions at https://bit.ly/litamergefeedback. i will periodically collate and answer these questions on litablog. you can also reach out to me personally any time (andromeda.yelton@gmail.com).

public libraries leading the way
vr hackfest
chris markman, m ryan hess, dan lou, and anh nguyen

information technology and libraries | december 2019
chris markman (chris.markman@cityofpaloalto.org) is senior librarian – information technology & collections, palo alto city public library. m ryan hess (ryan.hess@cityofpaloalto.org) is library services manager — digital initiatives, information technology & collections, palo alto city public library. dan lou (dan.lou@cityofpaloalto.org) is senior librarian — information technology & collections, palo alto city public library.
anh nguyen (anh.nguyen@cityofpaloalto.org) is library specialist, information technology & collections, palo alto city public library.

we built the future of the internet…today! the elibrary team at the palo alto city library held a vr hackfest weaving together multiple emerging technologies into a single workshop. during the event, participants had hands-on experience building vr scenes, which were loaded to a raspberry pi and published online using the distributed web. throughout the day, participants discussed how these technologies might change our lives, for good and for ill. and afterward, an exhibit showcasing the participants' vr scenes was placed at our mitchell park branch to stir further conversation.

multiple emerging technologies explored

the workshop was largely focused around a-frame, a framework for publishing 3d scenes to the web (https://aframe.io/). however, we also integrated a number of other technologies, including a raspberry pi, qr codes, a twitter-bot, and the inter-planetary file system (ipfs), which is a distributed web technology.

virtual reality built with a-frame code

in the vr hackfest, participants first learned how to use a-frame code to render 3d scenes that can be experienced through a web browser or vr headset. a-frame is a new framework that web publishers and 3d designers can use to design web sites, games, and 3d art. a-frame is an extension of html, the code used to build web pages. anyone who is familiar with html will pick up a-frame very quickly, but it is simple enough even for beginners. for example, figure 1 shows some raw a-frame code.

figure 1. try this code example! https://tinyurl.com/ipfsvr02

save the code from figure 1 as an html file and open it with a webvr-compatible browser like chrome and you will then see a blue cube in the center of your screen. by just changing the values of a few parameters, novice coders can easily change the shape, size, color, and location of primitive 3d objects, add 3d backgrounds, and more. advanced users can also insert javascript code to make the 3d scenes more interesting. for example, in the workshop, we provided javascript that animated a 3d robot head (see figure 1) pre-loaded into the codepen (https://codepen.io) interface for quicker editing and iteration.

the inter-planetary file system (ipfs)

the collection of 3d scenes created in the vr hackfest was published to the internet using the inter-planetary file system (ipfs), an open source distributed web technology originally created in palo alto by protocol labs in 2014 and now actively improved by a global network of software developers. ipfs allows anyone to publish to the internet without a server, through a peer-to-peer network that can also work seamlessly with the regular internet through http "gateways." in november 2019, brave browser (https://brave.com) became the first to offer seamless ipfs integration, capable of spawning its own background process, or daemon, that can upload and download ipfs content on the fly without the need for an http gateway or a separate browser extension. unlike p2p technologies such as bittorrent, ipfs is best suited for distributing small files available for long periods of time rather than the quick distribution of large files over a short period of time.
this is an oversimplification of what is really happening behind the scenes (part of the magic involves content-addressable storage and asynchronous communication methods based on pub/sub messaging, to name a few), but the ability to share and publish 3d environments and 3d objects in a way that can instantly scale to meet demand could have far-reaching consequences for future technologies like augmented reality. (a minimal sketch of this publishing workflow appears at the end of this section.)

figure 2. workshop attendees.

ipfs can load content much faster and more securely (through features like automated cryptographic hash checking), and it allows people to publish directly to the internet without the need for a third-party host. google, facebook, and amazon web services need not apply. the same technology has already been used to overcome censorship efforts by governments, but like any technology it has its downsides. content on ipfs is essentially permanent, allowing free speech to flourish, but it could also make undesirable content, like hate speech or child pornography, all but impossible to control.

toward 21st century literacy

like our other technology programs, the vr hackfest was designed to engage customers around new forms of literacy, particularly around understanding code and thinking critically about emerging communication technologies. in 2019, we are already seeing how technologies like machine learning and social media are impacting social relations, politics, and the economy. it is no longer enough to know how to read and write the code that underlies the web. true literacy also requires understanding how these technologies interface with each other and how they impact people and society.

figure 3. the free-standing exhibit.

to this end, the vr hackfest sought to take participants on a journey, both technological and sociological. once the initial practice with the code was completed, we moved on to a discussion of the consequences of using these technologies. with the distributed web, for example, we explored questions like:

• what are the implications of permanent content on the web which no one can take down?
• what power do gatekeepers like the government and private companies have over our online speech?
• what does a 3d web look like and how will that change how we communicate, tell stories, and learn?

after the workshop ended, we continued the conversation with the public through an exhibit placed at our mitchell park branch (see figure 3). in this exhibit, we showcased the vr scenes participants had created and introduced the technologies underlying them. but we also asked people to reflect on the future of the internet and to share their thoughts by posting on the exhibit itself. public comments reflected the current discourse around the internet. responses (see figure 5) were generally positive—most of our customers mentioned better download speeds or other efficiency increases, but a few also highlighted online privacy and safety improvements. we recorded an equal number of pessimistic and technical responses to the same question; these often demonstrated either knowledge of similar technology (e.g., "how is this different than napster?") or displeasure with the current state of the world wide web (e.g., "less human connections" or "more spyware and less freedom").
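as a minimal sketch of the publishing workflow described above, the following python snippet assumes the go-ipfs command-line client is installed and a local daemon is running; the scene filename and gateway urls are illustrative and are not the workshop's actual tooling.

# minimal sketch, assuming the "ipfs" cli is installed and a daemon is available.
# filenames and gateways below are illustrative only.
import subprocess

def publish_to_ipfs(path: str) -> str:
    """add a file to ipfs and return its content identifier (cid)."""
    # "ipfs add -Q" prints only the final hash of the added content
    result = subprocess.run(
        ["ipfs", "add", "-Q", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

cid = publish_to_ipfs("scene.html")  # e.g., a participant's a-frame scene
print("local daemon gateway: http://127.0.0.1:8080/ipfs/" + cid)
print("public gateway: https://ipfs.io/ipfs/" + cid)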
outcomes

one surprise outcome was that our project attracted the attention of the developers of ipfs, who happen to live a few blocks away from the library. after reading about the exhibit online, their whole team visited our team at the library. in fact, one of their team turned out to be a former child customer of our library! the workshop itself, which was featured as a summer reading program activity, also brought in record numbers. originally open to 20 participants and later expanded to 30, the workshop grew a waitlist that more than quadrupled our initial room capacity. clearly, people were interested in learning about these two emerging technologies. we also want to take a moment to highlight the number of design iterations this project went through before making its way into the public eye. the free-standing vr hackfest exhibit was originally conceived as a wall-mounted computer kiosk that encouraged users to publish a short message directly to the web with ipfs, but this raised too many privacy concerns, and ultimately our building design does not make mounting a computer on the wall an easy task. our workshop also initially focused much more on command-line skills working directly with ipfs, but user testing with library staff showed that learning a-frame was more than enough.

figure 4. building the exhibit.
figure 5. exhibit responses.
figure 6. visit from protocol labs co-founders.

the vr hackfest was also a win because it combined so many different skills into a single project. we were not only working with open source tools and highlighting new technologies, but also building an experience for workshop attendees and showcasing their work to thousands of people.

future work

our immediate plans include re-use of the exhibit frame for future public technology showcases and offering another round of vr hackfest workshops, perhaps in a smaller group so participants have the chance to view their work while wearing a vr headset.

figure 7. 3d mock-up.

beyond this, we also think libraries have the opportunity to harness the distributed web for digital collections, potentially undercutting the cost of alternative content delivery networks or file hosting services. through this project we have already tested things like embedding ipfs links in marc records and building a 3d object library. essentially, all the pieces of the "future web" are already here, and it is just a matter of time before all modern web browsers offer native support for these new technologies. in general, our project demonstrated the popularity of 21st-century literacy programs. but it also demonstrated the significant technical difficulties of conducting cutting-edge technology workshops in public libraries. clearly, the demand is there, and our library will continue to strive to re-imagine library services.
public libraries leading the way
how covid affected our python class at the worcester public library
melody friedenthal

information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.14041
melody friedenthal (mfriedenthal@mywpl.org) is a public services librarian, worcester public library. © 2021.

in june 2020, ital published my account of how the worcester public library (ma) came to offer a class in python programming and how that class was organized. although readers may have read the article in the middle of our covid-year, i wrote it mostly in early january 2020, before libraries across the country closed in an effort to protect staff and patrons from the disease. from spring 2020 through april 2021, i taught intro to coding: python for beginners five more times. but, of course, these classes were not face-to-face. like virtually all other library, musical, political, religious, and cultural programming across the world, our python course was taught virtually. the public services team has one professional zoom account, which my colleagues and i share. how did going remote affect this class? it depends on whether your perspective is that of a student or that of the instructor. many of us have read how difficult it's been for teachers to effectively reach their elementary-through-college-age students. i've had many of the same challenges, but since nearly all my students are adults and they all chose to take this class, i don't need to grapple with fidgety kids or recess. on the other hand, there were few distractions in our computer lab, while covid-time students have to grapple with pets, children squabbling, or noise from a tv. i was teaching from my home office. at the library i have one monitor but at home i have two, which makes it easier for me to spread out my assorted documents. to "protect" my students from seeing my messy house, i used a virtual background, one chosen not to distract. however, the software which determines the borders of a human presenter isn't perfect, and there is sometimes a halo around my head of the things behind me; this may itself be distracting. prior to covid, since we had twelve seats in the computer lab, we limited registration to fourteen, allowing for some no-shows (and we have two spare laptops, in case everyone showed up). a week prior to session one i would email the registrants, asking them to confirm their continued interest. if a student didn't confirm, i'd give their seat to someone on the waitlist. while i was not prepared to make my class a mooc (massive open online course) because i individually review homework and give lots of feedback, we did increase maximum registration to fifteen since the number of seats in the computer lab was no longer a limiting factor. and, as before, i ask for confirmation via email, but i also include in that email two links and an attached word doc. the document is an excerpt from cory doctorow's novel little brother on the joys of coding. the first embedded link leads to the free version of zoom. the second link is to the thonny website (https://thonny.org). thonny is a free ide (integrated development environment) where students can write and execute python code.
we used thonny when i taught face-to-face, but the lab computers all had thonny installed, and were ready for students to use. now, i have to depend on the ability of students to download the software to their own computers. i ask students to do the two downloads ahead of session one. which brings us to two problems: the class was no longer accessible to students who live in a household without a computer and internet service. and, as i found out with one prospective student, it’s not accessible to patrons who don’t have administrative rights to their computer; that is, the ability to download new software. when a patron confirms their interest, i email them the course manual. it now contains about 93 pages. i told students they might choose to print it but doing so is up to individual preference. the advantage of having a digital copy is that students can search for keywords easily. the disadvantage is that the cost of printing the manual is shifted to the student and may be prohibitive for some. in session one, i acknowledged that it’s difficult to learn technical material via zoom, and i encouraged everyone to ask questions during class and to email me if they are stymied while working on the homework. i reiterated that invitation during every session. while teaching, i bounce back-and-forth between screen-sharing my thonny window and the manual, while trying to keep an eye on the little zoom windows showing my students. some students cannot or choose not to turn on their video. this is a problem for me, since i can’t readily determine who’s asking a question. moreover, it is helpful to associate a face with a name. and since i give out a certificate of completion to each student who does the homework and attends all sessions, i want to make sure the student is actually taking part. i’ve had students who sign in, leave their camera off, and then apparently leave (i call on students by name and sometimes the no-video ones never respond). offering the class online has advantages in snowy worcester. students can tune in from the comfort of their own homes, avoid the slick roads, bypassing paying for parking at the municipal lot next to our building or for a bus to downtown, or the discomfort of walking in a dark citycenter in the evening. another plus: as program organizers and program participants have discovered, with videoconferencing we are no longer limited geographically. i had registrants who live in pennsylvania and georgia. as always, students range from total beginners to experienced programmers-of-other-languages. i’ve thought about how i can give extra time to the former while not boring the latter. one thing i’ve done is to make some assignments optional and say, “if you want an extra challenge, give this a try….” i’ve slowed the class down a bit, leaving more time for coding during each session. if a student has difficulties, i invited them to share their screen. this pedagogical technique actually works better information technology and libraries december 2021 how covid affected our python class | friedenthal 3 via zoom than in-person, because we could all see that screen equally well. in the computer lab, only the student who sat at the same (2-person) desk could easily see what the other person had coded. another thing i’ve done is to ratchet down the formality of the class: i am chattier and demo fun games i’ve written, e.g., hangman, tic-tac-toe, rock-paper-scissors, and you sunk my battleship, for inspiration. 
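for readers curious what those short demos look like, here is a sketch of the kind of beginner-level game mentioned above; it is illustrative only and not the class's actual code.

# illustrative only, not the class's actual code: a short beginner-level
# rock-paper-scissors demo of the kind described above.
import random

choices = ["rock", "paper", "scissors"]
beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

player = input("rock, paper, or scissors? ").strip().lower()
computer = random.choice(choices)
print("the computer chose", computer)

if player not in choices:
    print("please type rock, paper, or scissors")
elif player == computer:
    print("tie!")
elif beats[player] == computer:
    print("you win!")
else:
    print("you lose!")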
i experimented with using the built-in zoom whiteboard but that wasn't satisfactory, so i wrote supplementary notes as comments in thonny. parents were fearful their kids were not being intellectually challenged when schools were closed due to the pandemic, so maybe i shouldn't have been surprised that the april 2021 class contained seven children. there would have been an eighth, but when i realized one registrant was just seven years old, i told his mother that, while she was the best judge of her son's abilities, i discouraged him from taking the class. she decided to take it herself.

figure 1. a word-cloud of our fall 2020 project outcome evaluations (includes other digital learning programs).

at our sixth and final session i traditionally execute a program which draws colorful graphics, rather like spirograph. students were able to see each curve being drawn in a new window launched by the ide, but this window doesn't exist until i execute the program. while we were using zoom, when i attempted to share my screen the students missed the first graphics, no matter how fast i was at screen-sharing. i made the program "sleep" for a few seconds to give me time to switch screens before the graphics were drawn. a larger percentage of students earned the certificate of completion during the virtual classes than on average in the in-person pre-covid classes, perhaps 75% vs. 40%. for the in-person classes our communications officer printed the certificates on heavy paper adorned with the wpl logo; i signed each and handed them out during the final session. for our virtual classes, the certificates were digitally signed and then emailed; students could print them if they chose. this follow-up is being written during october 2021, and with a substantial percentage of massachusetts residents vaccinated for covid, the worcester public library is now back to offering many programs in person, including python. the city of worcester requires mask use in all municipal buildings, and while some patrons don't cooperate, i've told my students that anyone not wearing a mask properly will be asked to leave the computer lab. with so many people out of work due to the economic devastation wrought by covid, we were gratified to be able to offer a class that teaches in-demand skills, especially ones that can be applied in a work-from-home environment.

article
a 21st century technical infrastructure for digital preservation
nathan tallman

information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.13355
nathan tallman (ntt7@psu.edu) is digital preservation librarian, pennsylvania state university. © 2021.

abstract

digital preservation systems and practices are rooted in research and development efforts from the late 1990s and early 2000s when the cultural heritage sector started to tackle these challenges in isolation. since then, the commercial sector has sought to solve similar challenges, using different technical strategies such as software-defined storage and function-as-a-service. while commercial sector solutions are not necessarily created with long-term preservation in mind, they are well aligned with the digital preservation use case. the cultural heritage sector can benefit from adapting these modern approaches to increase sustainability and leverage technological advancements widely in use across fortune 500 companies.
introduction most digital preservation systems and practices are rooted in research and development efforts from the late 1990s and early 2000s when the cultural heritage sector started to tackle these challenges in isolation. since then, the commercial sector has sought to solve similar challenges, using different technical strategies. while commercial sector solutions are not necessarily created with long-term preservation in mind, they are well aligned with the digital preservation use case because of similar features. the cultural heritage sector can benefit from adapting these modern approaches to increase sustainability and leverage technological advancements widely in use across fortune 500 companies. in order to understand the benefits, this article will examine the principles of sustainability and how they apply to digital preservation. typical preservation activities that use technology will be described, followed by how these activities occur in a 20th-century technical infrastructure model. after a discussion on advancements in the it industry since the conceptualization of the 20thcentury model, a theoretical 21st-century model is presented that attempts to show how the cultural heritage sector can employ industry advancements and the beneficial impact on sustainability. galleries, libraries, archives, and museums cannot afford to ignore the sustainability of managing and preserving digital content and neither can distributed digital preservation or commercial service providers.1 budgets lag behind economic inflation while the cost of and amount of materials to purchase rises, coupled with the need to hire more employees to do this work. if digital preservation programs are going to scale up to enterprise levels and operate in perpetuity, it is imperative to update technical approaches, adopt industry advancements, and embrace cloud technology. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 2 sustainability for digital preservation programs to succeed, they must be sustainable per the triple bottom line or they risk subverting their mission. the triple bottom line definition of sustainability identifies three pillars: people (labor), planet (environmental), and profit (economic).2 while there are typically few people with digital preservation in their job title within an organization, it’s a collaborative domain with roles and responsibilities distributed throughout organizations, reflecting the digital object lifecycle. it’s important that the underlying technical infrastructure can easily be supported and is not so complicated that it is hard to recruit systems administration staff. digital preservation consumes many technical resources and data centers have a substantial environmental impact. as ben goldman points out in “it’s not easy being green(e),” data centers consume an immense amount of power and require extravagant cooling systems that use precious fresh water resources.3 because there is no point in preserving digital content if there will be no future generation of users, responsible digital preservation programs will seek to reduce carbon outputs and the number of rare-earth elements in our technical infrastructure.4 while cultural heritage organizations rarely seek to make a profit, economic sustainability is vital to organizational health and costs for digital preservation must be controlled. 
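to make the distinction above concrete, the following python sketch computes a crc32 value of the sort suited to transactional fixity and a sha-256 digest suited to authentication fixity, using only the standard library; the file paths are hypothetical.

# illustrative sketch of the two kinds of fixity values discussed above,
# using python's standard library. file paths are hypothetical.
import hashlib
import zlib

def crc32_of(path: str) -> str:
    """cheap, non-cryptographic checksum suited to transactional fixity."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")

def sha256_of(path: str) -> str:
    """cryptographic digest suited to authentication fixity over the long term."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# transactional check after a copy within a trusted system (hypothetical paths)
assert crc32_of("/ingest/master.tif") == crc32_of("/storage/master.tif")

# authentication value recorded at ingest and re-verified at a later date
print(sha256_of("/storage/master.tif"))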
modern technological infrastructures discussed here will help to increase sustainability by using widespread technologies and strategies for which support can be easily obtained, by reducing energy consumption, by minimizing reliance on hardware using rare-earth elements, and by leveraging advances in infrastructure components such as storage to perform digital preservation activities. basic digital preservation activities this paper will examine technical preservation activities and the author acknowledges that basic digital preservation activities are likely to include risk management and other non-technical concepts. while there is no formal, agreed-upon definition of what constitutes a set of basic digital preservation activities, bit-level digital preservation is a common baseline. bit-level digital preservation seeks to preserve the digital object as it was received, ensuring that you can get out an exact copy of what you put in, no matter how long ago the ingest occurred; however, with no guarantees as to the renderability of said digital object. two basic digital preservation activities are key to this strategy: fixity and replication. fixity fixity checking, or the “practice of algorithmically reviewing digital content to ensure that it has not changed over time,” is a foundational digital preservation strategy for verifying integrity that aligns with rosenthal et al.’s “audit” strategy.5 fixity is how preservationists demonstrate mathematically that the content has not changed since it was received. not all fixity is the same, however; fixity can be broken up into three types: transactional fixity, authentication fixity, and fixity-at-rest.6 transactional fixity transactional fixity is checked after some sort of digital preservation event7, such as ingest or replication. depending on the event, it’s desirable to use a non-cryptographic algorithm, such as crc32 or md5, when files move within a trusted system. when it’s only necessary to prove that a file hasn’t immediately changed, such as copying between filesystems, cryptographic algorithms are unnecessarily complex and are too expensive, in terms of compute consumption. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 3 authentication fixity authentication fixity proves that a file hasn’t changed over a long period of time, particularly since ingest. although one could use a chain of transactional fixity checks to cumulatively prove there has been no change, it’s often desirable to conduct one fixity check that can be independently verified. unbroken cryptographic algorithms, such as one from the sha-2 and sha-3 families, are well suited to this use case and worth the complexity and compute expense, particularly since this type of fixity check doesn’t have to be run as often. fixity-at-rest fixity-at-rest is when fixity is monitored while content is stored on disk. while some organizations may choose to only conduct fixity checks when files move or migrate, this strategy can miss bit loss due to media degradation, software or human error, or malfeasance that is only discovered when the file is retrieved.8 a common approach for monitoring fixity-at-rest is to systematically conduct fixity checks on all or a sample of files at regular intervals. these types of fixity checks may or may not use cryptographic algorithms, depending on their availability.9 replication replication is another cornerstone of achieving bit-level digital preservation. 
the national digital stewardship alliance’s 2019 levels of digital preservation, a popular community standard, recommends maintaining at least two copies in separate locations, while noting three copies in geographic locations with different disaster threats is stronger.10 all of these copies must be compared to ensure fixity is maintained. an important concept to consider when thinking about replication is the independence of each copy. according to schaefer et al.’s user guide for the preservation storage criteria, “the copies should exist independently of one another in order to mitigate the risk of having one event or incident which can destroy enough copies to cause loss of data.”11 in other words, a replica should not depend on another replica, but instead depend on the original file. advanced digital preservation activities when considering more robust digital preservation strategies beyond bit-level preservation, additional activities must be considered to ensure that the information contained within digital files can be understood. implementation of these activities may vary by digital object, depending on the digital preservation goal and appraised value of the content. this paper only describes a handful of the many advanced digital preservation activities as illustrative examples; the ideas in this paper could be applied to most advanced activities. metadata extraction digital files often contain various types of embedded metadata that can be used to help describe both its intellectual content and technical characteristics. this metadata can be extracted and used to populate basic descriptive metadata fields, such as title or author. extracted technical metadata is useful for broader preservation planning, but also for validating technical characteristics in derivative files. for example, if generating an access file for digitized motion picture film, it’s necessary to know the color encoding, aspect ratio, and frame rate. if these details are ignored, the access derivative may appear significantly different than the original file and give a false impression to users. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 4 file format conversions file format conversions help to ensure the renderability of digital content. there are two types of file format conversions to consider: normalization and migration. normalization generally refers to proactively converting file formats upon ingest to retain informational content, e.g., converting a wordperfect document to plain text or pdf when only the informational content is desired. migration may occur at any time: upon ingest, upon access, or any time while an object is in storage. migration occurs when file formats are converted to a newer version of the same format, e.g., microsoft access 2003 (mdb) to microsoft access 2016 (accdb) or to a more stable and open format that retains features, e.g., microsoft access 2016 (accdb) to sqlite. versioning versioning, or the retention of past states of a digital object with the ability to restore previous states, is complex to implement and not always necessary. an organization might choose to apply versioning to subsets of digital content, such as within an institutional repository, but not for born-analog, or digitized material. additionally, an organization may choose to version metadata only, ignoring changes to the bitstream, such as for born-analog digital objects. figure 1. 
the infrastructure architecture for a typical 20th-century stack. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 5 the 20th century technical infrastructure the technical infrastructure that enables digital preservation can come in many forms. while technology has advanced over the past thirty years, the cultural heritage sector, particularly where digital preservation is concerned, has been slow to adapt. below are descriptions of three common components of a typical server stack (technical infrastructure), though the author acknowledges that some organizations have already moved past this model. figure 1 is a diagram of the typical 20th-century stack. storage storage, at the core of digital preservation, has benefitted from rapid technological advancement since computers first started storing information on punch cards and magnetic media. twentiethcentury servers often use three main types of storage: file, block, and object. file storage file storage is what most people are familiar with. a filesystem interfaces with the underlying storage technology (block or object) and physical media (hard disk drives, solid state drives, tapebased media, or optical media) to present users with a hierarchy of directories and subdirectories to store data. this data can easily be accessed by users or applications using file paths, while the filesystem negotiates the actual bit-locations on the physical media. the choice of filesystem can impact data integrity (fixity), although choice may be limited by operating system. in the 20th century, journaling filesystems offered the most data protection as the filesystem keeps track of all changes; in the event of a disk failure, it’s possible to recover more data if a journaling filesystem is used. block storage block storage uses blocks of memory on physical media (disk, tape, etc.) that are managed through a filesystem to present volumes of storage to the server. all interactions between server and storage are handled by the filesystem via file paths, though the data is stored on scattered blocks on the media. block storage directly attached to a server is often the most performant option, the data does not travel outside the server. network attached storage, in which an external file system is mounted to the server as if it were locally attached block storage, requires data to travel through cables and networks before it gets to the server, which decreases performance. object storage object storage, which still uses tape and disk media, is an abstraction on top of a filesystem. instead of using a filesystem to interact directly with storage media, the storage media is managed by software. the software pools storage media and interactions happen through an api, with files being organized into “buckets” instead of using a filesystem with paths. object storage is webnative and the basis for commercial cloud storage. software-defined storage, which is discussed in more detail later in this article, also allows users to create block storage volumes that can be directly mounted to virtual servers as part of a filesystem or to create network shares that present the underlying storage to users via a filesystem.12 both block and object storage can be used for high-performance storage, hot storage (online), cold storage (nearline), and offline storage. generally, tape and slower performing hard disks are used for offline and nearline storage; faster performing hard disks are used for online storage. 
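as an illustration of the bucket-and-key interaction described above for object storage, the sketch below uses the widely deployed s3-style api through the boto3 python client; the endpoint, credentials, bucket, and key are placeholder assumptions and do not refer to any specific vendor or to the systems discussed in this article.

import boto3

# connect to an s3-compatible object store; endpoint and credentials are placeholders
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.org",
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

# store a file in a bucket under a key rather than a filesystem path
with open("master.tif", "rb") as data:
    s3.put_object(Bucket="preservation-masters", Key="collection-a/item-42/master.tif", Body=data)

# retrieve it later by the same key; no client-side filesystem hierarchy is involved
response = s3.get_object(Bucket="preservation-masters", Key="collection-a/item-42/master.tif")
content = response["Body"].read()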
solid-state drives (ssds) using non-volatile memory express (nvme) protocols are best suited for high-performance storage.13 in the 2019 storage infrastructure survey, by the national digital stewardship alliance, 60% of those aware of their organizational infrastructure reported a reliance on hardware-based filesystems (file and block storage) while about 15% used software-based filesystems (object storage), with 14% reporting a hybrid approach.14 this indicates that the cultural heritage sector continues to rely more on file and block storage and is not yet fully embracing object storage. the survey did not probe into why this might be. servers: physical and virtual twentieth-century technical infrastructures relied primarily upon physical servers. physical servers, also called bare metal, dominated the server landscape up through roughly 2005. virtual servers arrived on the scene after "vmware introduced a new kind of virtualization technology which … [ran] on the x86 system" in 1999.15 server virtualization facilitated a fresh wave of innovation by making it easier and less expensive to create, manage, and destroy servers as necessary. dedicating physical servers to one or a limited number of applications requires more resources and expends a higher carbon cost; virtual servers can be highly configured for their precise needs and this configuration can be changed using software, rather than changing parts on a physical server, resulting in less waste. cultural heritage organizations have been slow to fully adopt virtual servers. the 2019 ndsa storage infrastructure survey reports that 81% of respondents continue to rely on physical servers with 63% of respondents using virtual servers. fewer than 10% reported using containers, an even more efficient virtualization technology.16 containers are an evolution of virtual servers that act like highly optimized, self-contained servers doing a specific activity.17 applications and microservices in the 20th century, applications often required dedicated servers. business logic was handled by applications or microservices that ran on top of the server and storage, the highest level in the stack. there are advantages to handling the business logic at this high level: it's completely in the control of the developer and can be finely tuned to the application's needs. unfortunately, this is also an expensive place to handle all business logic as the application needs to be maintained over time and there's overhead involved in working at this level of the stack. microservices, in this server model, are generally specific commands that can be invoked as needed. while called microservices because they can be applied individually, they still run in this expensive part of the stack and have the same downsides as applications. in digital preservation systems using this type of architecture, basic and advanced digital preservation activities occur within this application layer. fixity can be a costly activity.
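the cost becomes easier to see in a sketch of the kind of whole-repository checksum sweep that an application or microservice in this layer typically performs: every byte of every file must be read back and hashed, whether or not anything has changed. the root directory and algorithm below are illustrative assumptions.

import hashlib
from pathlib import Path

def fixity_sweep(root, algorithm="sha256"):
    # read every file under the repository root and calculate a checksum for each;
    # i/o and cpu cost grow linearly with the total volume of stored content
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.new(algorithm)
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        results[str(path)] = digest.hexdigest()
    return results

comparing these values against stored preservation metadata reveals silent change, but only at the price of re-reading the entire holdings on every sweep.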
garnett, winter, and simpson, in their paper "checksums on modern filesystems, or: on the virtuous consumption of cpu cycles," point out that "calculating full checksums in software is not efficient" and "increases wear and tear on the disks themselves, actually accelerating degradation."18 fixity, when done this way, is a linear process that requires every file to be read from disk so a checksum can be calculated; when performing fixity over large amounts of content, this is very inefficient and time consuming. preservation activities in the 20th-century stack in this model of infrastructure, many cultural heritage institutions are relying on practices created when the field of digital preservation was emerging. basic activities basic preservation activities take a generalized approach and mostly occur in the costly application and microservices layer. this follows the general approach of application development from the commercial sector in the 20th century. fixity although there are differences in frequency, most organizations do not currently make distinctions between transactional fixity, authentication fixity, or fixity-at-rest. common current practices use the same method (md5, sha-256, sha-512) for all fixity checks.19 this inefficient approach takes place in the application and microservices layer and uses more compute power than necessary, increasing the environmental impact. replication in most 20th-century stacks, replication is handled in the application layer, where it is most costly in terms of computational power and labor to maintain, which has a negative impact on sustainability. some systems use 20th-century microservices as well. advanced digital preservation activities like basic preservation activities, advanced ones chiefly take place in the application and microservices layer if they occur at all. metadata extraction and file format conversion metadata extraction and file format conversion tend to occur only upon ingest as a one-time event. archivematica, the popular open-source digital preservation system, uses 20th-century microservices for each, and they only occur during the transfer (ingest) process.20 other systems often include this in the business logic of the application layer. versioning version control is a feature that many organizations choose not to implement. the 2019 ndsa storage infrastructure survey shows that fewer than half of respondents (40 of 83) used any type of version control.21 version control is hard to implement in a custom system, though alternative approaches exist. fedora, a digital preservation backend repository, introduced support for versioning in the application layer around 2004.22 advances in the commercial sector since the conceptualization of the 20th-century stack, there have been significant advancements made in the general it industry. virtualization technology developed in the 1990s led to the proliferation of cloud computing and infrastructure that transformed the it industry in the early 2000s, leading to the "long-held dream of computing as a utility" or commodity.23 clouds can be public, where anyone is able to provision and use services, or private, where services are only available to a group of authorized users. public clouds are run in commercial data centers while private clouds are typically built in privately owned data centers, though it's possible to use commercial data centers to build private clouds.
hybrid clouds are also possible, typically combining private and public clouds, or combining on-premises infrastructure with a private or public cloud. in 2009, researchers at uc berkeley identified three strong reasons why cloud computing has been so widely adopted: the illusion of vertical scaling on demand, elimination of upfront cost, and the ability to pay for short-term resources.24 surveys from the ndsa and the beyond the repository grant project show a steady but slow adoption of cloud infrastructure by the cultural heritage community.25 it is unclear whether early adopters have chosen independently or simply followed it changes in their parent organizations. any organization can build a private cloud and take advantage of the benefits described in this article. using the cloud does not mean that you must contract with commercial cloud providers. some organizations may choose to build a private cloud if there are concerns over data sovereignty, mistrust in public clouds, or for other reasons. the ontario council of university libraries in canada has built a private cloud for its members called the ontario library research cloud using openstack, a suite of open-source software for building clouds.26 software-defined storage while virtualization enables cloud computing, software-defined storage is the foundation for cloud storage. software-defined storage combines inexpensive hardware with software abstractions to create a flexible, scalable storage solution that provides data integrity.27 software-defined storage can use the same pool of disks to present all three of the common types of storage: file, object, and block. file storage is what most users are familiar with. software-defined file storage creates a network file share from which files can be accessed on local devices via a filesystem.28 object storage in this environment is like a web-native file share; files are stored in buckets, which can be further organized by folders. files are not accessed through a filesystem, but are instead accessed through uris, which makes object storage very amenable to web applications and avoids some of the pitfalls of relying on filesystems. block storage is mostly used to mount storage to virtual servers, storage that is directly attached to the server as if it were a physical disk or volume mounted to the server. block storage is more performant than either file or object storage; as such it's typically used for things like the operating system and application code, but not for storing content. all storage can be managed through apis, adding to its suitability for automation, software development, and it operations.29 hardware diversity software-defined storage also has features that make a compelling use case for digital preservation. first, software-defined storage accommodates hardware diversity. because software-defined storage is an abstraction, it's possible to combine different types of storage media, from different manufacturers and production batches, to ensure some technical diversity and avoid the risk of catastrophic failure from a hardware monoculture. fixity and integrity second, like the use of raid in traditional filesystems, file integrity can be strengthened through the use of erasure coding.30 erasure coding splits files into chunks and spreads them across multiple disks or potentially nodes such that the file can be reconstructed if some of the disks or nodes fail.
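the principle can be illustrated with the simplest possible scheme, a single xor parity chunk; production software-defined storage uses more sophisticated codes (reed-solomon, for example) that tolerate multiple simultaneous failures, so the python sketch below is only a toy model of the idea.

def encode_with_parity(data, k=4):
    # split the content into k equal-size data chunks plus one xor parity chunk;
    # any single lost chunk can then be rebuilt from the k that survive
    size = -(-len(data) // k)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = bytes(size)
    for chunk in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return chunks, parity

def rebuild_chunk(chunks, parity, missing_index):
    # xor the parity with the surviving chunks to recover the missing one;
    # in practice the missing chunk would be absent, here it is skipped by index
    rebuilt = parity
    for i, chunk in enumerate(chunks):
        if i != missing_index:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, chunk))
    return rebuilt

choosing how many data and parity chunks to use is the configuration decision discussed next.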
this can be configured in different ways, depending on the amount of parity desired.31 replication third is replication of content. for cloud administrators, replication might be an alternative to erasure coding for ensuring data integrity; for digital preservationists, it's a distinct strategy and basic preservation activity. operating nodes in a software-defined storage network can be in different availability zones; through object storage policies, content can be replicated as many times as necessary to provide mitigation of geographically based threats. it's even possible to replicate to object storage in a different software-defined storage network, helping to achieve organizational diversity as well. figure 2. the infrastructure architecture for a theoretical 21st-century stack. an updated technical infrastructure for the 21st century a theoretical 21st-century stack for digital preservation has many of the same components as its 20th-century antecedent. however, these components are used in different ways, largely due to technological advancements. leveraging these advancements to handle digital preservation activities at lower levels of the stack reduces the complexity of the business logic in the application layer. figure 2 shows an updated architecture diagram for this 21st-century stack, which may be used by an individual organization, consortium, or service provider planning to build a digital preservation system. the storage layer is built on software-defined storage with digital content primarily being stored as objects; these objects are stored using the oxford common file layout (discussed further later). physical bare metal servers are used to power virtual machines that host applications such as a digital repository. physical servers also host container and function-as-a-service platforms to provide a suite of microservices for processing digital content. storage in this new stack, storage is primarily managed through software-defined storage with data flowing over networks. there are currently two primary open-source options for running a software-defined storage service: gluster and ceph. both can be installed and run on-premises, in a private or public data center, or even contracted through infrastructure as a service (iaas). in his presentation at the 2018 designing storage architectures for digital collections meeting, hosted by the library of congress, glenn heinle recommended ceph over gluster where data integrity is the highest priority; however, others argue that gluster is better for long-term storage.32 this is likely because ceph is better able to recover from hardware failures.33 file storage reliance on file storage has become minimal in this theoretical stack, with data primarily stored as objects. however, file storage may still be used; when it is, it benefits from using a modern filesystem. several modern filesystems have emerged since 2000, most notably zfs and openzfs34 with their innovative copy-on-write transactional model and methods for managing free space.35 both zfs and openzfs can also be configured to use raid-z, which maintains block-level fixity by calculating checksums for each block of data and verifying the checksum when accessed. this can be combined with simple software to touch every block on a regular basis to ensure fixity-at-rest.
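the "simple software" mentioned above could be as small as a scheduled job that reads every file back in full, which forces a checksumming filesystem such as zfs or openzfs to verify each block it returns (zfs pools can also be scrubbed natively). the python sketch below is an assumption about how such a job might look, not a feature of any particular filesystem.

from pathlib import Path

def scrub_by_reading(root):
    # read every file in full; on a checksumming filesystem each read verifies the
    # per-block checksums, and a read error points to a block that could not be
    # verified or repaired
    bytes_read = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            while True:
                chunk = f.read(1 << 20)
                if not chunk:
                    break
                bytes_read += len(chunk)
    return bytes_read

run from a scheduler at a regular interval, the outcome of each pass can be recorded as a fixity-at-rest event in preservation metadata.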
although this is a different approach than file-level fixity checks, it accomplishes the same thing in a much more efficient way: preservation metadata could be recorded for each block that contains part of the file.36 zfs has also inspired similar modern filesystems such as btrfs, apple file system (apfs), refs, and reiser.37 however, even if this theoretical stack isn't relying on file storage to persist data, software-defined storage is an abstraction that sits atop servers and disks (or tape) that do use filesystems.38 ironically, zfs is not the best option for the underlying disks as its data integrity features come with more overhead and data integrity can be achieved through different means with software-defined storage.39 block storage block storage comes in two forms in this future stack. many virtual servers will leverage the block storage offerings of the software-defined storage service, attaching virtual disk blocks to virtual servers. however, the physical servers that support virtualization will still have some physically attached storage using ssds (through nvme) to support high-performance storage needs. this physically attached block storage is more performant than virtually attached block storage since the system has direct access to the disks and does not have to work through a virtually abstracted filesystem. object storage object storage has become the primary method of storing data in this theoretical stack. the flexibility of object storage, with its web-native apis and authentication, gives it an advantage as systems become less centralized and more integrations are needed. the natural scalability of object storage and the variety of private, public, and commercial offerings greatly simplify geographic and organizational redundancy when replicating data. with software-defined storage, it's also possible to offer hot (live) and cold (nearline, offline) options, giving flexibility for how data is stored to better optimize the storage for various needs. hot storage may use either hard disk or solid-state drives while cold storage would rely on tape or optical media. presently, options for running software-defined storage on tape and optical media are mostly proprietary.40 while this would be a concern if these systems held the only copy, if the data is replicated to systems using other technology and media, this risk can be managed. while optical media has long been criticized for use as a preservation medium, when well-managed, the risk may be overstated.41 oxford common file layout the oxford common file layout (ocfl) is a "shared approach to filesystem layouts for institutional and preservation repositories."42 ocfl is a specification for organizing digital objects in a way that supports preservation while being computationally efficient.
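as a rough picture of what ocfl provides, the python sketch below builds a simplified, non-spec-compliant object: content is written into immutable version directories and an inventory maps sha-512 digests to logical paths, so a file that has not changed is never stored twice. the field names and layout here are abbreviations of the real specification cited in the endnotes, not a substitute for it.

import hashlib
import json
from pathlib import Path

def add_version(object_root, files):
    # files is a mapping of logical path -> bytes describing the new version's state
    root = Path(object_root)
    root.mkdir(parents=True, exist_ok=True)
    inventory_path = root / "inventory.json"
    if inventory_path.exists():
        inventory = json.loads(inventory_path.read_text())
    else:
        inventory = {"digestAlgorithm": "sha512", "head": "v0", "manifest": {}, "versions": {}}
    version = "v" + str(int(inventory["head"][1:]) + 1)
    state = {}
    for logical_path, data in files.items():
        digest = hashlib.sha512(data).hexdigest()
        if digest not in inventory["manifest"]:
            # only new content is written; unchanged files are referenced by digest
            content_path = version + "/content/" + logical_path
            target = root / content_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            inventory["manifest"][digest] = [content_path]
        state.setdefault(digest, []).append(logical_path)
    inventory["versions"][version] = {"state": state}
    inventory["head"] = version
    inventory_path.write_text(json.dumps(inventory, indent=2))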
it has several advantages for use in digital preservation: a repository can be rebuilt from the files alone, it is both human and machine readable, it supports native error detection, it allows objects to be efficiently versioned, and it is designed to work with a variety of storage infrastructures.43 although some implementation details are still being worked out, ocfl can be used with object storage.44 ocfl is in production use and client libraries are available for javascript, java, ruby, and rust.45 in this future stack, all storage operations are handled by an ocfl client, which then interacts with the underlying software-defined storage network as shown in figure 2. servers physical servers are used chiefly to support virtualization in this future stack. however, this stack moves beyond virtual servers and supports containers and serverless computing. virtual servers are chiefly used to support applications and databases while containers are perfectly suited for microservices running preservation activities. serverless, or function-as-a-service, is the next evolution in virtualization. while a container may be idling all the time, spinning into action when a microservice is called, serverless functions are run on-demand only. they can cost less when using commercial services such as aws lambda or aws fargate, where the customer is billed for usage only.46 though serverless functions can make use of containers, function-as-a-service platforms have emerged, such as apache openwhisk and openfaas, that don't always require containers. preservation activities in the 21st-century stack this 21st-century stack performs the same preservation activities as its predecessor. however, it generally does this at lower levels of the stack, in the infrastructure layers as opposed to the application and microservice layers. this change reduces the computational load on the stack and simplifies the business logic. basic activities fixity and replication are achieved by leveraging a combination of microservices and software-defined storage. by optimizing the approach to fixity for each use case, instead of using the same computationally intensive method for all fixity, organizations can use less compute power. while fixity and replication still involve the microservice layer, it is a more targeted approach. transactional fixity transactional fixity is maintained through a function-as-a-service-based microservice. each time a file is moved, the microservice is triggered, which calculates an md5 checksum and compares it to a stored value that was created upon ingest. if there is a mismatch between the md5 values, a second microservice is called that fetches a valid file replica. while crc32 might be preferred (because it's slightly less cpu-intensive), box has shown that crc32 values can differ depending on how they are calculated.47 a stored crc32 can only be reliably used to confirm fixity if the new calculation uses the exact same method, because crc32 is not a true specification in the way md5 is, and implementations may differ. crc32 is recommended only when it's possible to calculate it in the same manner, such as within the same microservice. as this introduces technical complexity, some organizations may prefer to rely solely on md5 for transactional fixity.
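a transactional fixity microservice of the kind just described might look something like the python sketch below; the event shape, the storage and metadata clients, and the repair helper are all assumptions standing in for whatever the chosen function-as-a-service platform and repository actually provide.

import hashlib

def md5_of(storage, bucket, key):
    # stream the moved object from storage and calculate its md5 checksum
    digest = hashlib.md5()
    for chunk in storage.read_chunks(bucket, key):  # assumed storage client interface
        digest.update(chunk)
    return digest.hexdigest()

def repair_from_replica(bucket, key):
    # placeholder for the second microservice that fetches a valid file replica
    pass

def handle_move_event(event, storage, metadata_store):
    # triggered each time a file is moved or copied within the system
    bucket, key = event["bucket"], event["key"]
    expected = metadata_store.get_checksum(key, algorithm="md5")  # recorded at ingest
    actual = md5_of(storage, bucket, key)
    if actual == expected:
        metadata_store.record_event(key, "fixity check", outcome="pass")
    else:
        metadata_store.record_event(key, "fixity check", outcome="fail")
        repair_from_replica(bucket, key)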
authentication fixity authentication fixity is maintained in much the same way as in the 20th-century model, except using a cryptographically secure checksum algorithm, such as sha-512 (part of the sha-2 family). however, distinguishing between transactional and authentication fixity allows more precise use of algorithms, only requiring more computationally intensive cryptography when it's truly needed. authentication fixity may require the use of a container-based microservice, versus a function-as-a-service, due to the increased computational need. fixity-at-rest fixity-at-rest, the most common type of fixity, is managed by the software-defined storage service and reported in preservation metadata. how this is achieved might look different, depending on which software-defined storage service is used. the ceph community has developed a new technology called bluestore, which serves as a custom storage backend that directly interacts with disks, essentially replacing the need to use an underlying filesystem.48 bluestore calculates checksums for every file and verifies them when read. because this is all internal and managed by the same system, crc32 is the default algorithm, but multiple algorithms are supported. ceph will "scrub" data every week. scrubbing is the process of reading the file simply to verify the checksum, even if no user has accessed the file. because of the way ceph performs erasure coding, if a checksum is invalid, the file can be repaired. what remains to be done is writing a script that will read ceph's internal metadata and record preservation events within the object's preservation metadata for the fixity verification and any reparative actions. ryu and park propose a "markov failure and repair model" to optimize the frequency of data scrubbing and number of replicas such that the least amount of power is consumed and that scrubbing occurs at off-peak times.49 it appears that this optimization causes less media degradation from the process of regularly reading the file, though empirical studies are needed to confirm that there is overall less degradation than conducting fixity checks through an application. gluster has a similar scrubbing process for fixity-at-rest in the optional bitrot feature, although it uses sha-256 by default, instead of crc32, which requires more computing power.50 replication replication in this future stack is mostly handled by the software-defined storage service, but microservices may play a role in achieving independence of copies.51 object storage policies allow the automatic copying of data into another region or availability zone that is within the software-defined storage network. however, these copies are not replicas, or independent instances, because all copies are in a chain derived from the primary instance; if there is a problem anywhere in the chain, bad data will be copied. in addition to using object storage policies, microservices could be used to independently verify the fixity of downstream copies as well as trigger true replications to independent instantiations, such as an alternative storage service or a different storage area within the same software-defined storage network.
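the difference between policy-driven copies and independently verified replicas can be sketched as follows; the storage clients, the list of locations, and the checksum of record are assumptions, and the point is only that every downstream copy is compared against the authoritative value rather than against the copy it was made from.

import hashlib

def sha512_of(client, location):
    # stream the copy held at one storage location and calculate its checksum
    digest = hashlib.sha512()
    for chunk in client.read_chunks(location):  # assumed storage client interface
        digest.update(chunk)
    return digest.hexdigest()

def verify_replicas(checksum_of_record, copies):
    # copies maps a label (e.g., "secondary-region", "partner-network") to a
    # (client, location) pair; each copy is checked against the checksum of record
    # so that a corrupted copy cannot silently propagate down a replication chain
    results = {}
    for label, (client, location) in copies.items():
        results[label] = sha512_of(client, location) == checksum_of_record
    return results

copies that fail verification can be re-replicated from a known-good instance, with the outcome recorded as a preservation event for each location.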
bill branan suggested a similar approach in his cloud native preservation presentation at ndsa digital preservation 2019.52 advanced digital preservation activities advanced digital preservation activities in a 21st-century stack also make use of microservices for metadata extraction and file format conversion. versioning, however, relies upon features of the oxford common file layout, even though object storage often supports versioning natively. metadata extraction function-as-a-service microservices are well suited to metadata extraction, actuated upon ingest or as needed. since embedded metadata is machine-readable, this activity will not be resource intensive or time consuming. in addition to extracting metadata and storing it as discrete, sidecar files, these microservices can be used to populate specific metadata fields used by the repository, including descriptive, administrative, and technical metadata. this approach is more efficient as it gives flexibility to reuse the functions in multiple workflows as opposed to specific events like ingest. file format conversion file format conversions use a combination of function-as-a-service and container-based microservices, depending upon the original format. like metadata extraction, conversion may occur at ingest (normalization) or as needed (migration). function-as-a-service is well suited for small to medium files, such as converting wordperfect to opendocument format. function-as-a-service is also well suited for logical preservation, when only the informational content is necessary to preserve, such as converting a tif to a txt file through ocr. container-based microservices are better suited for converting large media files that may take more memory and time, for example, migrating proprietary encoded digital video to open codecs and container formats; function-based services often have a time constraint. versioning although object storage typically supports versioning, it is inefficient because each version is an entirely new object. this means that unchanged data is duplicated, taking up more disk space. the oxford common file layout, which negotiates storage between the application and microservices layers and a software-defined storage service, supports forward delta versioning in which each new version only contains the changes. using the object inventories, it's possible to rebuild any object to any version without duplicating bits.53 an additional benefit of using ocfl is that it inherently uses checksums, and any changes or corruption are detected when an interaction occurs with the object, creating a layered approach to maintaining fixity-at-rest. sustainability in the 21st-century stack the differences between our 20th- and 21st-century stacks result in a more sustainable approach to digital preservation, per the triple-bottom-line definition.54 by adopting commercial sector approaches, cultural heritage organizations can become more efficient consumers of data center resources. people (labor) by shifting activities to lower levels in the stack and letting infrastructure components self-manage, fewer people are needed to develop and maintain the business logic that formerly handled the same action. the application and microservice layers use programming languages and libraries that can become outdated quickly, requiring development work to maintain functionality.
while there is still a need for specialized knowledge, fewer people are needed to do the work when these actions take place in more stable parts of the stack. planet (environmental) our new stack has a lower environmental impact for a variety of reasons. first, per kryder's law (the storage parallel to moore's law for computing), the areal density of storage media predictably increases annually, and our new stack uses the latest hard disk and tape technology.55 this results in needing less media, some of which doesn't need power to run, decreasing the carbon impact. additionally, our new stack uses a mix of hot and cold storage, making it possible to implement automatic tiering to shift objects to less resource-intensive storage, like tape.56 second, as the stack becomes more serverless, fewer computational resources are needed. even though container and function-based microservices may incur some overhead in terms of cpu cycles, it is more efficient in terms of system idling to be running these as microservices on one platform rather than doing the same action in the application or vm layer. this further decreases the carbon impact while also decreasing the dependency on rare-earth elements. relatedly, by leveraging software-defined storage to maintain fixity-at-rest, the compute load is greatly decreased; the cpu cost of calculating checksums in the storage layer is less than when this is done through applications or microservices. profit (economic) sustainability improvements for both people and planet may also result in a lower total cost of ownership for a digital preservation system. cost is a prime motivator when administrators and leaders make long-term decisions; decreasing annual operating costs related to digital preservation is crucial to the viability of a program. future and related work the 21st-century stack proposed in this paper is not the only way to increase sustainability or the only way in which digital preservation stacks will change. the planet is running out of bandwidth and will need to expand into using 5g and low-earth orbit satellite communications. new, quantum-resistant algorithms will need to be introduced as quantum computing advances.57 blockchain technology introduces many possibilities. inherently, blockchain maintains fixity. the archangel project is exploring practical methods of recording provenance and proving authenticity by using a permissioned blockchain.58 blockchain is also the technology behind the interplanetary file system (ipfs), a content-addressed distributed storage network, that is in turn used by filecoin, a marketplace for ipfs storage. small data industries is building starling, a filecoin-based application designed for cultural heritage organizations to securely store digital content.59 it's important to note that these blockchain-based projects use a proof-of-stake model instead of a proof-of-work model, which has a significantly lower environmental impact than other blockchain implementations like the cryptocurrency bitcoin.60 while some organizations, like stanford university, may already leverage software-defined storage, most in the cultural heritage sector do not.61 the metaarchive cooperative, a nonprofit consortium for digital preservation, has begun a noteworthy project to explore using software-defined storage in a distributed digital preservation network.
metaarchive, which currently uses lockss, is one of the few public digital preservation services that mitigates risk through organizational and administrative diversity. because members host and administer the lockss nodes that contain the replications, each copy is managed by a different set of organizational and administrative policies and often use different technology to do so. diversifying in this way protects against a single point of failure if only one organization managed the technical infrastructure. how this same diversity is achieved in a software-defined storage-based distributed digital preservation network will be a great contribution to the community. it would also be useful to study the reasons cultural heritage organizations have been so reluctant to adopt commercial sector technologies. identifying these hesitations would make it possible to create strategies that would encourage adoption of these approaches. it may simply be that when it comes to digital preservation, familiar and proven technologies provide a level of comfort. organizations may also be entrenched in custom developed solutions that are hard to move away from. conclusion digital preservation is a long-term commitment. while re-appraisal may take place, it’s inevitable that the amount of content stored in digital repositories will only increase over time. it is fiduciarily incumbent upon the cultural heritage community to examine our practices and look for better alternatives. exceptionalism ignores technological advancements made by the commercial industry, advancements that are very well suited to digital preservation. by adopting commercial industry data practices, such as software-defined storage, while simultaneously implementing innovations from within the cultural heritage community, like the oxford common file layout, it is possible to reduce the ongoing costs, resource consumption, and environmental impact of digital preservation. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 16 endnotes 1 ben goldman, “it’s not easy being green(e): digital preservation in the age of climate change,” in archival values: essays in honor of mark a. greene, ed. mary a. caldera and christine weidman (chicago: american library association, 2018), 274–95, https://scholarsphere.psu.edu/concern/generic_works/bvq27zn11p. 2 “a simple explanation of the triple bottom line,” university of wisconsin sustainable management, october 2, 2019, https://perma.cc/2hwf-3mmq. 3 goldman, “it’s not easy being green(e).” 4 keith l. pendergrass et al., “toward environmentally sustainable digital preservation,” the american archivist 82, no. 1 (2019): 165–206, https://doi.org/10.17723/0360-9081-82.1.165. 5 sarah barsness et al., 2017 fixity survey report: an ndsa report (osf, april 24, 2018), https://doi.org/10.17605/osf.io/snjbv; david s. h. rosenthal et al., “requirements for digital preservation systems: a bottom-up approach,” d-lib magazine 11, no. 11 (2005), https://perma.cc/x2r7-r5xp. 6 matthew addis, which checksum algorithm should i use? (dpc technology watch guidance note, digital preservation coalition, december 11, 2020), https://doi.org/10.7207/twgn20-12. 7 premis editorial committee, premis data dictionary for preservation metadata, version 3.0 (library of congress, november 2015), https://perma.cc/l79v-gqv7. 8 some organizations may continue to use a strategy where fixity is only checked when a file is accessed, if the potential loss fits within a defined acceptable loss. 
while this strategy may not work for all organizations, recognizing that loss is inevitable and defining a level of acceptable loss is an effective and pragmatic approach to managing risk of bit decay. 9 barsness et al., 2017 fixity survey report. 10 ndsa levels of preservation revisions working group, “levels of digital preservation, 2019 lop matrix, v2.0 (osf, october 14, 2019), https://osf.io/2mkwx/. 11 sibyl schaefer et al., “user guide for the preservation storage criteria,” february 25, 2020, https://doi.org/10.17605/osf.io/sjc6u. 12 mark carlson et al., “software defined storage,” (white paper, storage network industry association, january 2015), https://perma.cc/aq4t-9yxq. 13 abutalib aghayev et al., “file systems unfit as distributed storage backends” (proceedings of the 27th acm symposium on operating systems principles—sosp ’19, huntsville, ontario, canada: association for computing machinery, 2019): 353–69, https://doi.org/10.1145/3341301.3359656. 14 ndsa storage infrastructure survey working group, 2019 storage infrastructure survey: results of the storage infrastructure survey (osf, 2020), https://doi.org/10.17605/osf.io/uwsg7. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 17 15 joseph migga kizza, “virtualization technology and security,” in guide to computer network security, computer communications and networks (springer, cham, 2017), 457–75, https://doi.org/10.1007/978-3-319-55606-2_21. 16 ndsa storage infrastructure survey working group, 2019 storage infrastructure survey. 17 eric jonas et al., “cloud programming simplified: a berkeley view on serverless computing” (university of california, berkeley, february 10, 2019), https://perma.cc/yam2-tz8w. 18 alex garnett, mike winter, and justin simpson, “checksums on modern filesystems, or: on the virtuous consumption of cpu cycles,” in ipres 1028 conference [proceedings] (international conference on digital preservation, boston, mass., 2018), https://doi.org/10.17605/osf.io/y4z3e. 19 barsness et al., 2017 fixity survey report. 20 “import metadata,” documentation for archivematica 1.12.1, artefactual systems, inc., accessed may 21, 2021, https://perma.cc/ue3r-bdgz.; “ingest,” documentation for archivematica 1.12.1, artefactual systems, inc., accessed may 21, 2021, https://perma.cc/5sn5-gfx3. 21 ndsa storage infrastructure survey working group, 2019 storage infrastructure survey. 22 “fedora content versioning,” 2005, https://duraspace.org/archive/fedora/files/download/2.0/userdocs/server/features/version ing.html. 23 michael armbrust et al., above the clouds: a berkeley view of cloud computing, (technical report, eecs department, university of california, berkeley, february 10, 2009), https://perma.cc/qj5w-8s5y. 24 armbrust et al., above the clouds. 25 micah altman et al., “ndsa storage report: reflections on national digital stewardship alliance member approaches to preservation storage technologies,” d-lib magazine 19, no. 5/6 (may 2013), https://doi.org/10.1045/may2013-altman; michelle gallinger et al., “trends in digital preservation capacity and practice: results from the 2nd bi-annual national digital stewardship alliance storage survey,” d-lib magazine 23, no. 
7/8 (2017), https://doi.org/10.1045/july2017-gallinger; ndsa storage infrastructure survey working group, 2019 storage infrastructure survey; evviva weinraub et al., beyond the repository: integrating local preservation systems with national distribution services (northwestern university, 2018), https://doi.org/10.21985/n28m2z. 26 ontario council of university libraries, “ontario library research cloud,” accessed april 14, 2021, https://perma.cc/kmp9-fs8k; “open source cloud computing infrastructure,” openstack, accessed april 14, 2021, https://perma.cc/g9ge-92jd. 27 nathan tallman, “software defined storage,” (presentation for the ndsa infrastructure interest group, march 16, 2020), https://doi.org/10.26207/3nn2-zv13. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 18 28 these network shares typically use the smb (server message block) or cifs (common internet file system) protocols to present file shares through a graphical user interface in operating systems such as windows or macos while the nfs (network file shares) protocol is more often used to mount storage in linux. 29 carlson et al., “software defined storage.” 30 raid, or the redundant array of independent disks, is technology that splits a file into multiple chunks and spreads them across multiple disks in a storage device, adding extra copies of the chunks so that the file can be recovered if an individual drive fails. 31 abhijith shenoy, “the pros and cons of erasure coding & replication vs raid in next-gen storage platforms” (software developer conference, storage networking industry association, 2015), https://perma.cc/yfs5-kxkk. 32 glenn heinle, “unlocking ceph” (presentation, designing storage architectures for digital collections, washington, dc: library of congress, 2019), https://perma.cc/z2r9-79ze; tamara scott, “big data storage wars: ceph vs gluster,” technologyadvice (blog), may 14, 2019, https://perma.cc/2yy2-bbxg. 33 giacinto donvito, giovanni marzulli, and domenico diacono, “testing of several distributed file-systems (hdfs, ceph and glusterfs) for supporting the hep experiments analysis,” journal of physics: conference series 513, no. 4 (june 2014): 042014, https://doi.org/10.1088/1742-6596/513/4/042014. 34 matthew ahrens, “openzfs: a community of open source zfs developers,” in asiabsdcon 2014 (asiabsdcon, tokyo, japan: bsd research, 2014), 27–32, https://perma.cc/xg79-pbu7. 35 brian hickmann and kynan shook, “zfs and raid-z: the über-fs?” (university of wisconsin– madison, december 2007), https://perma.cc/w5pd-enpp. 36 garnett, winter, and simpson, “checksums on modern filesystems.” 37 edward shishkin, “resier5 (format release 5.x.y),” marc mailing list archive, 2019, https://perma.cc/dn8y-v8kq. 38 “fujifilm launches ‘fujifilm software-defined tape,’” fujifilm europe, may 19, 2020, https://perma.cc/b3gn-plr9. 39 aghayev et al., “file systems unfit as distributed storage backends.” 40 ibm systems, “tape goes high speed,” 2016, https://perma.cc/fnv9-rtg9; “fujifilm launches ‘fujifilm software-defined tape’”; desire athow, “here’s what sony’s million gigabyte storage cabinet looks like,” techradar (blog), 2020, https://perma.cc/vhn4-layt. 41 david rosenthal, “optical media durability: update,” dshr’s blog, august 20, 2020, https://perma.cc/vkw9-83j3. 
information technology and libraries december 2021 a 21st century technical infrastructure | tallman 19 42 andrew hankinson et al., “the oxford common file layout: a common approach to digital preservation,” publications 7, no. 2 (june 2019): 39, https://doi.org/10.3390/publications7020039. 43 andrew hankinson et al., “oxford common file layout specification,” july 7, 2020, https://perma.cc/s73z-3n6k. 44 marco la rosa et al., “our thoughts on ocfl over s3 · issue #522 · ocfl/spec,” github, accessed march 12, 2021, https://perma.cc/pa3g-cb78. 45 hannah frost, “version 1.0 of the oxford common file layout (ocfl) released,” stanford libraries (blog), july 23, 2020, 1, https://perma.cc/5j5m-gyqw; andrew woods, “implementations | ocfl/spec,” github, february 10, 2021, https://github.com/ocfl/spec. 46 while serverless might be the ultimate microservice, requiring the least amount of overhead, costs may still be hard to predict. 47 ryan luecke, “crc32 checksums; the good, the bad, and the ugly,” box blog, october 12, 2011, https://perma.cc/mvp7-yvzv. 48 aghayev et al., “file systems unfit as distributed storage backends.” 49 junkil ryu and chanik park, “effects of data scrubbing on reliability in storage systems,” ieice transactions on information and systems e92-d, no. 9 (september 1, 2009): 1639–49, https://doi.org/10.1587/transinf.e92.d.1639. 50 raghavendra talur, “bitrot detection | gluster/glusterfs-specs,” github, august 15, 2015, https://github.com/gluster/glusterfsspecs/blob/fe4c5ecb4688f5fa19351829e5022bcb676cf686/done/glusterfs%203.7/bitrot.m d. 51 schaefer et al., “user guide for the preservation storage criteria.” 52 bill branan, “cloud-native preservation” (osf, october 22, 2019), https://osf.io/kmdyf/. 53 andrew hankinson et al., “implementation notes, oxford common file layout specification,” july 7, 2020, https://perma.cc/pvf8-sqfn. 54 although out of scope in terms of the stack, the policies and practices implemented by organizations can have a direct impact on digital preservation sustainability. for example, appraisal can be the most powerful tool available to an organization to control the amount of content being preserved. despite storage vendors proclamations that storage is cheap, digital preservation is not. it is not wise nor necessary to keep every digital file. organizations will benefit from applying flexible appraisal systems that reduce the amount of content needing preservation, but also establishing different classes of preservation so the most advanced activities are only applied as needed. additionally, organizations should consider allowing lossy compression to decrease disk usage, where appropriate; compression as an appraisal choice is very similar to choosing to sample a grouping of material rather than preserving the whole. for additional information see nathan tallman and lauren work, “approaching information technology and libraries december 2021 a 21st century technical infrastructure | tallman 20 appraisal: framing criteria for selecting digital content for preservation,” in ipres 1028 conference [proceedings] (international conference on digital preservation, boston, mass.: osf, 2018), https://doi.org/10.17605/osf.io/8y6dc. 55 david rosenthal, “cloud for preservation,” dshr’s blog, 2019, https://perma.cc/zls9-r857. 
56 pendergrass et al., “toward environmentally sustainable digital preservation.” 57 henry newman, “industry trends” (presentation, designing storage architectures for digital collections, washington, dc: library of congress, 2019), https://perma.cc/3mgk-n5u3. 58 t. bui et al., “archangel: trusted archives of digital public documents,” in proceedings acm document engineering 2018 (association for computing machinery, arxiv.org, 2018), https://doi.org/10.1145/3209280.3229120. 59 ben fino-radin and michelle lee, “[starling]” (presentation, designing storage architectures for digital collections, washington, dc: library of congress, 2019), https://perma.cc/7lguuew9. 60 for additional information on the differences of proof-of-stake vs. proof-of-work models, see peter fairley, “ethereum plans to cut its absurd energy consumption by 99 percent,” ieee spectrum (blog), january 2, 2019, https://perma.cc/gch7-t556. 61 julian morley, “storage cost modeling” (presentation, pasig, mexico city, mexico, 2019), https://doi.org/10.6084/m9.figshare.7795829.v1. product ownership of a legacy institutional repository: a case study on revitalizing an aging service article product ownership of a legacy institutional repository a case study on revitalizing an aging service mikala narlock and don brower information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13241 mikala narlock (mnarlock@nd.edu) is digital collections strategy librarian, university of notre dame. don brower (dbrower@nd.edu) is digital projects lead, university of notre dame. © 2021. abstract many academic libraries have developed and/or purchased digital systems over the years, including digital collection platforms, institutional repositories, and other online tools on which users depend. at hesburgh libraries, as with other institutions, some of these systems have aged without strong guidance and resulted in stale services and technology. this case study will explore the lengthy process of stewarding an aging service that satisfies critical external needs. starting with a brief literature review and institutional context, the authors will examine how the current product owners have embraced the role of maintainers, charting a future direction by defining a clear vision for the service, articulating firm boundaries, and prioritizing small changes. the authors will conclude by reflecting on lessons learned and discussing potential future work, both at the institutional and professional level. introduction our home-grown institutional repository (ir) began almost a decade ago with enthusiasm and promise, driven by an eagerness to meet as many use cases as possible. over time, the code grew unwieldy, personnel transitioned into new roles, and new priorities emerged, leaving few individuals to manage the repository, allocate resources, articulate priorities, or advocate for user needs. this in turn left the system underutilized and undervalued. in mid -2019, two product owners (pos) at hesburgh libraries, university of notre dame were named to oversee the service and tasked with determining how the service should continue, if at all. the pos began by evaluating the service, current commitments, and benefits, and identifying potential on-campus adopters of the service. 
after agreeing the service should continue, the pos started the lengthy process of turning the metaphorical ship, prioritizing modest adjustments that would have large payoffs.1 selected literature review since the 2003 seminal article by clifford lynch, much has been authored on the topic of institutional repositories as academic libraries and archives have flocked to create their own.2 a complete literature review is beyond the scope of this case study: institutional repositories have contended and continue to contend with a wide variety of challenges, including legal, ethical, and socio-technical challenges.3 while the lessons presented in this case study can apply to a wide variety of legacy services, a brief overview of some of the literature surrounding irs is crucial to understanding the challenges the authors were presented as product owners. broadly defined “as systems and service models designed to collect, organize, store, share, and preserve an institution’s digital information or knowledge assets worthy of such investment,” libraries and archives flocked to build the “essential infrastructure for scholarship in the digital mailto:mnarlock@nd.edu mailto:dbrower@nd.edu information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 2 age.”4 operating under the assumption that faculty members would flock to the service to deposit their works, irs were promised to solve many problems, including supporting open access publishing and digital asset management.5 as articulated by dorothea salo, however, the field of dreams model (“build it and they will come”) was insufficient as repositories often failed to meet changing user needs and expectations while heavily employing library jargon that was foreign to faculty members.6 moreover, as identified by kim, some irs struggle to even be known to their users, while also grappling with concerns of trust.7 other problems that have plagued repositories include limited adoption rates, restricted resources to support digitization of analog materials for faculty that operate in both analog and digital media, failing support from fellow library colleagues, and inconsistent and incomplete metadata.8 salo warned more than a decade ago that high-level library administrative support would be necessary to empower repository managers to enact lasting and substantive change, and recent studies echo these concerns.9 libraries have slowly started to serve faculty on their terms, such as by creating automated processes for populating irs, streamlining content deposits, experimenting with metadata harvesting features to provide increased access, and building more tools to integrate directly with the research lifecycle.10 however, these new technologies and services may be out of reach for many institutions. in addition to limited resources, some institutions are grappling with a legacy system that is incompatible with newer code, leaving these institutions in a feature desert, reliant on aging technology and cumbersome deposit processes.11 moreover, even in an institution where resources might be more readily available for licensing or purchasing newer technology, early forks of open-source code or otherwise deprecated components might make migration to newer platforms extremely difficult, if not impossible, without extensive infrastructure improvements. 
lastly, as libraries grappled with some of the issues mentioned above and options for repositories continued to proliferate, many institutions struggled to clearly articulate boundaries around their digital library holdings. confusion between digital collections, scholarly content, e-resources, and other digital materials resulted in some institutions having too many options to store content, leaving internal and external stakeholders confused as to where to discover and distribute materials; conversely, other institutions have few options, and a wide variety of content is pigeonholed imperfectly into a single repository.12 in both situations, developing repositories with vague content scopes can be exceedingly difficult, as a restrictive scope can stifle development , while an overly inclusive approach results in too many use cases and competing stakeholder interests to effectively prioritize feature development. local context our institutional repository at the university of notre dame, managed by hesburgh libraries employees, suffered from many problems that affected our locally built code: limited adoption and awareness on campus; aging technology that made adding new features a monumental, if not impossible, task; and an overly broad scope (and a simultaneous proliferation of other digital collection tools). while the detailed history of this repository is beyond the scope of this paper, a brief overview of the development provides critical context. additionally, the technical details and implementation particulars will not be discussed, as this case study transcends specific software frustrations and will resonate with many institutions regardless. information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 3 in 2012, after a failed attempt to launch a repository in the early 2000’s, consortial development of our ir began in an open-source community. in 2014, an early implementation of the product was envisioned to be a unified digital library service that would provide support to many different stakeholders. this included a plan for a single location for researchers to share their scholarly work, research outputs, and research data, as well as for the university libraries to provide access to digitized rare and archival holdings. as development continued on the homegrown service, features were implemented to serve the numerous purposes mentioned above. this included components of an institutional repository, such as a self-deposit interface, customizable access levels, and a proof-of-concept researcher profile system. over time support for browsing digital collections was added, namely the development of the work type “collection,” which allowed curators to create a landing page for their collection and customize it with a representative image. development continued in a somewhat sporadic fashion, often aligning at the intersection of “what is easy?” and “what is needed?” as technical staff continued growing the open-source code. as content was added to the system, stemming from special collections, various campus partners, and electronic thesis and dissertation (etd) deposits, additional use cases emerged and were added to the scope of the repository. the system quickly grew cumbersome and difficult to work with. in short, the repository struggled with the challenges of many open-source technologies. 
the struggle was compounded by decreasing resources, an overly inclusive scope, limited adoption—both with external faculty as well as library faculty and staff—and consortial development that introduced features extraneous to local campus needs. while our repository did many different things, it failed to do any one well. after falling short of meeting the expectations for digital collections, particularly with regard to browsing and displaying objects, the library applied for, and received, a three-year mellon grant.13 this grant, a collaboration with the snite museum of art, university of notre dame, was initially sought to improve upon the existing repository and to build the infrastructure necessary to support the online display of collective cultural heritage materials and facilitate serendipitous discovery for patrons. however, soon into the grant, it became clear that creating an entirely new system for digital collections would be not only easier to build and maintain, but also better suited to meet the specific needs of digital collections as articulated by campus partners. first things first: what is our ir? around the same time this shift was announced, two individuals were appointed to serve as product owners (pos) of the repository. while exact duties vary between institutions, pos are responsible for liaising with users, managing the product backlog, directing development, communicating with a wide variety of stakeholders, resolving issue tickets, and guiding the overall direction of the product.14 the pos were tasked with making this amorphous, oft-critiqued service usable while dealing with uncertain resources and competing institutional priorities. with the change in grant objectives mentioned above, namely the desire to develop a new repository instead of contending with the legacy code, the option was presented to retire the repository and direct users to other systems that could sufficiently meet their needs, such as discipline-specific repositories, general-purpose repositories, or even online cloud storage. the pos recognized that continuing the system due solely to sunk costs was a fallacy: if the service was too cumbersome to maintain with even nominal use, the return on investment would be abysmal and ultimately prevent the library from investing resources more appropriately. in order to evaluate the service, the pos considered active commitments and ongoing partnerships tied to the service. in particular, several centers and departments on campus had utilized the system to capture citations and demonstrate their impact. additionally, after conversations with library liaisons, it became apparent that there was great value in providing the campus with a discipline-agnostic repository that allows deposition of, provides access to, and preserves scholarly outputs that might otherwise be lost. while the pos recognized that faculty adoption or even awareness of the service was limited, they realized there were several campus-specific features that were useful to local champions, including flexible access controls at the record and file levels, as well as a customized etd workflow that served the graduate school, internal technical services, and the students and faculty required to interact with the system.
acknowledging that the system and related services were still critical, the pos prioritized making sure the system remained useful: maintaining the legacy repository would cost valuable time and resources and would need to overcome the resentment that many internal stakeholders had developed over the years. after deciding the system was worth maintaining, it was necessary to explicitly narrow the scope of the service, which had broadened over time in an ad hoc manner: as other services were turned off, leaving various digital content to find a new location, our institutional repository was often leveraged to host the content, even when support for the needs of niche content was poor at best. when considering the future of the repository, several key use cases emerged, including the etd support provided to the graduate school as mentioned above. while the service had done many things acceptably, the strength was in the support for scholarship: the customized access levels, self-deposit interface, and robust preservation capabilities were frequently lauded as the highlights of the service to internal and external stakeholders. these considerations, combined with the eventual migration of digitized rare and unique materials to the new mellon-funded platform, resulted in rebranding and redefining the service as exclusively focused on scholarly outputs. with the goal of best supporting the teaching and research mission of the university, the directional force became how to (re)build the service as a trusted, useful, and integral repository for campus scholars to provide access to their research outputs. mission (and vision) critical operating under the guiding principles of usefulness, usability, and transparency, the first task after redefining and rearticulating the scope of the service was to keep the service operational. however, with the recognition that maintenance alone, while critical, would not lead to an enhanced reputation on campus, it was important to continue charting a forward direction. the product owners were given the freedom to articulate their ideal mission statement. to complement the vision of the repository as both trusted and integral, the pos further defined the mission statement in three key areas: to increase the impact of campus researchers, scholars, and departments; to advance new research by facilitating access to scholarship in all forms; and to serve as active campus partners in the research lifecycle. while these statements are far from innovative or revolutionary, it was essential for moving the service forward. in fact, these sentences were carefully crafted over the course of a month, during which time the product owners drafted the language, compared it with peer and aspirational peer information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 5 institutions, and solicited feedback from trusted internal colleagues before sharing it more broadly. this time-consuming process was critical for success, however: with the knowledge that these words would serve as the foundation for prioritizing feature requests and advocating for resources, the pos wanted to establish both the repository and themselves in their new role. this clarity in mission was also important for grappling with legacy emotional and mental frustrations that lingered towards the system, as the pos had a strong, unified foundation to advocate for resources and the service as a whole. 
relatedly, these mission and vision statements provided critical and consistent talking points, which were leveraged in presentations to internal stakeholders, provided to librarians as messaging for the liaison faculty, and useful in short communications to teaching professors, research faculty, and department administrators. clear and present boundaries in rebranding the repository, it also became clear that firm boundaries would be instrumental in attaining success. in addition to narrowly focusing feature development on supporting research and scholarly outputs, the pos also scaled back goals for adoption, intentionally excluded digital collection features, and identified features that were patently unattainable in the short term. the repository was often seen as a failure locally due to limited adoption and an incomplete record of the academic outputs of campus, reflecting concerns of irs more generally.15 combatting this narrative required a clear articulation and acceptance of the fact that the institutional repository, regardless of how seamlessly integrated or easy to use, would never be absolutely comprehensive or the authoritative record of our researchers and scholars. with limited resources and a current technical infrastructure in which it is difficult to incorporate automatic harvesting mechanisms, any effort to make the repository comprehensive would be impractical, unrealistic, and a waste of limited resources. instead, by focusing efforts on making the repository useful and refraining from being yet another requirement for an already overwhelmed faculty member or graduate student, the service can be improved to meet the unique needs of campus faculty, serving as a more viable option for those who need it.16 similarly, because there is less concern with filling the repository and increasing usage statistics and more concern with what the patron needs, the pos have been able to develop robust partnerships with stakeholders, leading to champions in research centers, labs, departments, and other administrative units across campus. this has helped scholars demonstrate the impact of their work, which in turn led to more partnerships with other campus centers, as champions began to advocate for the service to colleagues facing similar challenges across the university. in this way, decreasing the effort to fill the repository has actually increased holdings and driven more traffic to the site: by focusing on useful offerings and decreasing the burden on ourselves to create a comprehensive research repository, the pos have been able to prove the value of a discipline-agnostic approach to internal and external stakeholders. an additional, and extremely beneficial, boundary was intentionally excluding library-owned digital collections from the repository's collecting and feature-development scope. the pos received little pushback from internal users on this change: the repository had been the de facto scholarly and research repository for nearly five years, as it was patently clear that supporting digital collections had been more of an afterthought, with limited features built to support curators and users in creating and interacting with rare and archival materials. in fact, internal colleagues supported this change wholeheartedly, as the pos volunteered to continue providing access to the extant digital content in the ir as the mellon grant-funded site was built.
while this information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 6 direction had already been understood by individuals across the organization, it was helpful to clearly articulate the new boundaries in open forums for internal stakeholders, communication through a library-wide listserv, and repetition in smaller meetings. by articulating this new boundary clearly and repeating it frequently in different methods of communication, the pos had the authority to reject feature requests that were explicitly in support of rare and archival materials. with a clear focus on collecting and providing access to scholarly and research outputs, niche metadata fields, advanced browsing features, and robust collection landing pages were identified as unnecessary, as they were scoped for the mellon-funded platform, and internal colleagues quickly embraced this boundary. the final, crucial boundary, also related to feature requests, was to clearly define requests that were impossible to accommodate in the current technical infrastructure. as mentioned earlier, the pos focused first on maintenance: by updating code, critically evaluating the service and existing commitments, and charting a future direction, the pos could more effectively steward the project. this also meant revisiting previous feature requests, and even technical promises, in order to set more reasonable expectations on what the service would, and would not, be able to support in the coming years. with limited resources, advanced features such as research profiles—a frequent request from internal allies—was beyond the current capabilities with the aging technical stack. moreover, a feature-rich repository would be essentially useless if users’ basic expectations were left unmet: a cumbersome deposit interface, limited upload support, and confusing language throughout the site were more pressing issues, as they prevented users from even engaging with the site for any amount of time. by resolving these limitations and generating awareness of the repository, the pos could better serve not only current campus partners, but also future users, as an increase in adoption and use would lead to more resources to develop advanced features. instead of planning a new outfit for the proverbial patient, it was more important to stop the bleeding. by adopting firm boundaries, the pos were able to scope developer work, prioritize maintenance and modest feature development, and even deny implementation of previously requested features that were no longer relevant to the repository or would be unattainable in the coming years. the pos could explicitly drop support for unused services, allow other services to limp along, and improve existing strengths. this has further helped to clarify messaging about the service and garner more support from our campus partners; instead of a malleable system that fits too many roles in a limited capacity, the pos could clearly state how the repository offers support and garner users from across campus. small changes, big rewards the last critical component of rebranding and revitalizing the institutional repository was the conscious decision to implement incremental improvements instead of large, sweeping changes. in particular, there were known frustrations with the service that were easy to start working on while the product owners expanded the user base and sought additional user feedback. 
small changes to the user interface, including the addition of use metrics and color-coded access tags, received immediate attention and positive feedback from key stakeholders. additionally, over the numerous years of development, many projects to improve the repository had stalled for various reasons. by either prioritizing the work necessary to complete the project or accepting the sunk costs and clearing the backlog for other projects, the technical development team could build momentum, completing projects and clearing mental space for new, exciting endeavors. with limited resources on hand, maximizing the return on investment also included an emphasis on securing and keeping internal and external champions. due to the limited outreach conducted early in the system's existence as well as the mediocre service offerings, many campus users were unaware of the tool, and a few were using the repository in a somewhat limited fashion. in order to build support for the service, it was critical that key users of the repository received targeted support and outreach efforts. a primary example of this was an imaging facility on campus: this unit provided a critical service to campus, yet had difficulty showing the impact of their work as many faculty members did not cite their team in publications. the facility slowly began collecting citations manually, but still struggled to publicly advertise their capabilities and show the fruits of their labor. they solved this problem by loading citation records into the repository, which became the single location where any interested faculty, staff, and students could look to see the full output of the center. while they were using the repository in a somewhat different manner than anticipated, they found the system useful and were actively directing other campus centers and institutes to the repository for similar support. in conversations with them, it became clear that a few modest changes would streamline their workflows and alleviate some cumbersome burdens. with this concentrated outreach and a minimal amount of development, the repository secured a champion that continues to advocate for the service to colleagues across campus. lastly, prioritizing maintenance and paying down technical debt was critical for moving the repository forward. many software dependencies had fallen behind by several major version updates, making it difficult to add new features or consider potential migration paths to future technical solutions. while the amount of technical debt to be paid was substantial, by prioritizing a small amount of maintenance every month, the development team quickly caught up, thereby improving the overall performance of the site and providing the product owners with the flexibility to consider future technical implementations and key features to continue recruiting users. lessons learned and future work moving forward, the product owners are embracing the role of maintainers. in specific reference to repositories, that includes "repairing, caring for and documenting a wide variety of knowledge systems beyond irs to facilitate access and optimize user experience."17 the work of critically evaluating commitments, establishing clear boundaries, and reaffirming the mission of the repository is useful on a recurring basis, and will need to be continued as the repository ages.
maintaining the technical infrastructure as appropriate and conducting user experience testing to improve the service will be critical to ensuring the long-term success of the repository and the information contained therein. beyond the stewardship and small improvements required for maintaining the service, there is the opportunity to reconsider the role of the institutional repository, both at the local level and within the academic community. by prioritizing usefulness over comprehensiveness, the product owners made great strides in making the service accessible to patrons and actually usable. when considering the future of repositories, specifically through a lens of usefulness, it is critical to consider how future work will best serve faculty needs without overburdening librarians. adding pos who are examining how a service will be used and what will promote the mission of the library reframes a repository from being a piece of technology to being a source of interconnections. scholarship usually requires a level of technology different from what most campus it departments can provide: research does not usually just deal in urls, it requires dois information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 8 and persistent identifiers; files are not just backed up, but are preserved (an active process that requires consideration for how computing will change over the coming decades). not only is a library a place to go to look for data, but it is also a place that can help publish and deposit items, providing valuable services to connect researchers to tools and platforms to facilitate research. this is an area of service that libraries and repositories can provide. in the relationship between libraries and technologies, innovation and maintenance, one clear challenge was the amount of emotional labor necessary to revitalize a service. the pos spent a large portion of time apologizing for previous failures, managing expectations by scaling back previous promises, and grappling with the current technical shortcomings of the service. while this is, at least in part, the role of the pos, the phenomenon of controlling expectations and handling the emotional debt that comes with broken promises and failed technologies is not localized to hesburgh libraries. in libraries especially, this work tends to fall to women, where they are forced to be the middle ground between technology and patron-facing librarians.18 while embracing the term “product owner” has helped to make visible and valuable the labor invested, especially that which might otherwise be overlooked, libraries writ large still need to contend with the gender divide plaguing the seeming dichotomy between innovation and maintenance. 19 in fact, as libraries continue to build new technologies and support innovative research, the role of the product owners in managing legacy technologies will be crucial for success, as will embracing a culture of care and empathy. while beyond the scope of this case study, continued discussions of the gender roles often employed in library technology need to continue, especially as academic libraries embrace scrum methodology, project management, and product ownership. conclusion in this case study, the product owners of a legacy institutional repository described methods for revitalizing a service. 
for the institutional repository managed by hesburgh libraries, there has been a noticeable increase in usage in the past six months: more deposits, higher access counts, and more support tickets tracked. it appears the efforts of the product owners are showing results. this increased usage is one more piece of evidence that a repository is more than software and more than technology: by allowing the product owners oversight of the mission and ultimate direction of the service, not to mention the freedom to engage with users on behalf of the development team, the system is in a much better position than in previous years. despite these improvements, there is still room for growth as the pos guide the overall mission and development of the institutional repository as both a service and a system. similarly, as more institutions contend with legacy digital technology, using pos and the methods described above may prove beneficial. there is additional work to be done, such as investigating more thoroughly the role of the repository—indeed the concept of the repository—and discussions of gender norms in technology. endnotes 1 this article is based on a presentation by don brower and mikala narlock: "what to do when your repository enters middle age" (online presentation, samvera connect 2020, october 28, 2020), https://doi.org/10.7274/r0-e32v-2h81. 2 clifford lynch, "institutional repositories: essential infrastructure for scholarship in the digital age," portal: libraries and the academy 3 (april 1, 2003): 327–36, https://doi.org/10.1353/pla.2003.0039. 3 soohyung joo, darra hofman, and youngseek kim, "investigation of challenges in academic institutional repositories: a survey of academic librarians," library hi tech 37, no. 3 (january 1, 2019): 525–48, https://doi.org/10.1108/lht-12-2017-0266. 4 j. j. branin, "institutional repositories," in encyclopedia of library and information science, ed. m. a. drake (boca raton, fl: taylor & francis group, 2005): 237–48; lynch, "institutional repositories." 5 raym crow, "the case for institutional repositories: a sparc position paper," arl bimonthly report 223, august 2002: 7; lynch, "institutional repositories." 6 dorothea salo, "innkeeper at the roach motel," december 11, 2007, https://minds.wisconsin.edu/handle/1793/22088. 7 jihyun kim, "motivations of faculty self-archiving in institutional repositories," journal of academic librarianship 37, no. 3 (may 1, 2011): 246–54, https://doi.org/10.1016/j.acalib.2011.02.017; deborah e. keil, "research data needs from academic libraries: the perspective of a faculty researcher," journal of library administration 54, no. 3 (april 3, 2014): 233–40, https://doi.org/10.1080/01930826.2014.915168. 8 trevor owens, "the theory and craft of digital preservation," lis scholarship archive, july 15, 2017, https://doi.org/10.31229/osf.io/5cpjt.
9 e.g., joo, hofman, and kim, "investigation of challenges in academic institutional repositories." 10 sarah hare and jenny hoops, "furthering open: tips for crafting an ir deposit service," october 26, 2018, https://scholarworks.iu.edu/dspace/handle/2022/22547; james powell, martin klein, and herbert van de sompel, "autoload: a pipeline for expanding the holdings of an institutional repository enabled by resourcesync," code4lib journal, no. 36 (april 20, 2017), https://journal.code4lib.org/articles/12427; carly dearborn, amy barton, and neal harmeyer, "the purdue university research repository: hubzero customization for dataset publication and digital preservation," oclc systems & services, february 1, 2014, https://docs.lib.purdue.edu/lib_fsdocs/62. 11 clifford lynch, "updating the agenda for academic libraries and scholarly communications," college & research libraries 78, no. 2 (february 2017): 126–30, https://doi.org/10.5860/crl.78.2.126. 12 lynch, "updating the agenda," 128. 13 diane walker, "hesburgh/snite mellon grant," october 31, 2018, https://doi.org/10.17605/osf.io/cusmx. 14 hrafnhildur sif sverrisdottir, helgi thor ingason, and haukur ingi jonasson, "the role of the product owner in scrum-comparison between theory and practices," in "selected papers from the 27th ipma (international project management association), world congress, dubrovnik, croatia, 2013," special issue, procedia—social and behavioral sciences, 119 (march 19, 2014): 257–67, https://doi.org/10.1016/j.sbspro.2014.03.030. 15 salo, "innkeeper." 16 carolyn ten holter, "the repository, the researcher, and the ref: 'it's just compliance, compliance, compliance'," journal of academic librarianship 46, no. 1 (january 1, 2020): 102079, https://doi.org/10.1016/j.acalib.2019.102079. 17 don brower et al., "on institutional repositories, 'beyond the repository services,' their content, maintainers, and stakeholders," against the grain 32, no. 1 (2020), https://against-the-grain.com/2020/04/v321-atg-special-report-on-institutional-repositories-beyond-the-repository-services-their-content-maintainers-and-stakeholders/. 18 bethany nowviskie, "on capacity and care," october 4, 2015, http://nowviskie.org/2015/on-capacity-and-care/; ruth kitchin tillman, "who's the one left saying sorry?
gender/tech/librarianship," april 6, 2018, https://ruthtillman.com/post/whos-the-one-left-saying-sorry-gender-tech-librarianship/. 19 dale askey and jennifer askey, "one library, two cultures" (library juice press, 2017), https://macsphere.mcmaster.ca/handle/11375/22281; rafia mirza and maura seale, "dudes code, ladies coordinate: gendered labor in digital scholarship," october 22, 2017, https://osf.io/hj3ks/.
using open access institutional repositories to save the student symposium during the covid-19 pandemic allison symulevich and mark hamilton information technology and libraries | march 2022 https://doi.org/10.6017/ital.v41i1.14175 allison symulevich (asymulev@usf.edu) is scholarly communications librarian, university of south florida. mark hamilton (hamiltonma@longwood.edu) is research and digital services librarian, longwood university. © 2022. abstract in 2020, during the covid-19 pandemic, colleges and universities around the world were forced to close or move to online instruction. many institutions host yearly student research symposiums. this article describes how two universities used their institutional repositories to adapt their student research symposiums to virtual events in a matter of weeks. both universities use the bepress digital commons platform for their institutional repositories. even though the two universities' symposium strategies differed, some commonalities emerged, particularly with regard to learning the best practices to highlight student work and support their universities' efforts to host research symposiums virtually. introduction many colleges and universities host student research symposiums as a way to celebrate students' intellectual experiences and support the high-impact practice of presenting original student research. students contribute research outputs and share their projects with others in their institution's community, beyond the classroom.
typically, many of these student research symposiums are conducted in the second half of the spring semester in order to allow students to work on their research throughout the course of the year. during the 2020 school year, the world experienced the covid-19 pandemic. the many ways this pandemic has changed our society are only now being understood, but the pervasive move to virtual meetings and presentations is certainly one of the most dramatic. college campuses began delivering remote instruction in a matter of days and organizers of student research symposiums around the country were forced either to cancel or reimagine the events. longwood university and university of south florida st. petersburg campus (usf) were two institutions that transformed their in-person student symposiums into online events in a matter of weeks. in this article, the authors share their experiences of working with many people throughout their campuses to create a student research symposium experience similar to their past in-person events. both universities use bepress' digital commons platform for their institutional repositories. overall, longwood's and usf's symposium strategies were different in some regards, but some commonalities emerged, particularly with regard to learning the best practices that celebrate the students' achievements and support their universities' efforts promoting high-impact student research. literature review student research has grown in importance. following george kuh's 2008 report, high-impact educational practices: what they are, who has access to them, and why they matter, universities recognized and responded to the need to integrate these high-impact practices into their curricular and co-curricular efforts.1 one of the recognized high-impact practices is student research.2 students can contribute to their disciplinary scholarly conversation through their original research and by presenting on their research projects, and colleges and universities can promote this conversation by facilitating the display of student work and enabling interactive discussions between the student presenters and other members of their academic community. the number of student research conferences has increased internationally.3 students participate in the formal aspect of these conferences, as well as informal conversations where they can continue to expound on their research, extend their professional social networks, and gain confidence as researchers.4 student research is also being captured in institutional repositories (irs) more than in the past.5 "these most junior members of the academic community are doing research and adding to the body of knowledge generated by their institutions and in their disciplines."6 passehl-stoddart and monge point out the importance of institutional repositories supporting student work: "the ir also serves to support, enhance, and capture evidence of high-impact educational practices; acts as an equitable access point to meaningful learning opportunities; and provides a platform for students to begin to develop academic confidence and an entryway into the scholarly communication learning cycle."7 in supporting high-impact student research, librarians do not act alone.
we collaborate with other departments on campus such as offices of undergraduate and graduate studies; offices of research; honors colleges; student affairs; and more. krause, eickholt, and otto describe how the library collaborated with the music department at eastern washington university to upload student musical performances to the institutional repository.8 this type of collaboration leads to increased student support, as well as increased discoverability of student intellectual and creative scholarship. when the covid-19 pandemic hit, universities around the world were forced to change their means of conducting business. classes were moved online at many institutions. conferences were either canceled or moved online as well. many colleges and universities around the country host student research symposiums to highlight the high-impact work that students are doing. these symposiums needed to move to remote delivery, and many of these had to move quickly as the spring semester was well underway when institutions were being forced to close. symposiums and conferences adapted to online environments by moving away from their in-person events. this applied to both academic and professional conferences. for example, oregon state university (osu) and the new haven local section of the american chemical society hosted their respective events virtually using a variety of technologies. osu worked with its distance learning unit to create a canvas course, whereas the new haven local section of the american chemical society used a combination of open broadcaster software (obs studio), youtube, zoom, and google drive. as for professional conferences, many used prerecorded sessions when hosting on digital platforms such as zoom.9 there were positive outcomes from these virtual symposiums. for example, osu saw benefits of "enhanced ability to devote personalized attention to presenters (e.g., by providing links to relevant publications or websites), fewer distractions, more time to craft thoughtful responses, and an ability for students to keep track of shared resources and discussants' contact information that could be used for follow-up after the event." their post-event surveys also showed that students who could not previously participate due to distance circumstances were able to participate in an online forum. osu's approach involved using canvas, their learning management system (lms), through which students submitted prerecorded lightning talks over powerpoint slides with a written narrative. the canvas course was open to the osu community. discussion boards for commenting were open for a two-day period.10 in her article, stephanie houston interviewed various conference coordinators.11 interviewees stated that a major benefit was global access to information from top researchers.12 with regard to cancer research conferences, free registration vastly expanded the number of registrants from previous years.13 conference hosts felt as though some of the differences of online events would stay for future years because of personal scheduling issues, ability to provide global access, and environmental impact.14 others think the novelty of virtual events may wear off following the pandemic.15 however, the switch to virtual events was not without challenges.
osu noted that two main challenges they faced were organization of presentations and presenters responding to comments on their asynchronous presentations.16 houston’s interviewees explain that the lack of informal discussions and face-to-face interactions was a negative of hosting virtual symposiums.17 speirs also states that virtual poster sessions suffer from the lack of interaction of face-to-face exchanges, especially for young researchers.18 some saw that the large number of participants made it difficult for participants to engage in question-and-answer sessions.19 two of the interviewees attempted to fix this by using twitter to have asynchronous q&a using a specific hashtag for the event.20 technology issues such as limited bandwidth and internet connectivity problems are a concern for virtual conferences.21 conferences that are not archived can result in a loss of material beyond the original event. jonathan bull and stephanie davis-kahl discuss the problem of conference ephemera not being accessible in their poster presentation.22 they explain that conference-hosting as an institutional repository service can assist with this lack of accessibility. “by posting documents and artifacts from conferences within an institutional repository, the content is not only accessible for future use, but also preserves those materials for the future and for institutional memory.”23 virtual student research symposiums longwood university’s virtual spring showcase for research and creative inquiry longwood university is a public university in south central virginia. it has about 4,000 undergraduate and 500 graduate students. it is known for its liberal arts focus, with strong programs in the humanities, nursing, and education. since spring of 2018 there has been a spring symposium of undergraduate research (now called the spring showcase for research and creative inquiry). for the first three years of its existence, the spring showcase was planned as a single day event in april, then a fall showcase was added in november 2019. in january 2020, the university showcase committee began planning to have an in-person spring showcase for research and creative inquiry on april 22, 2020. the proposed schedule was to be as follows: students would register to be part of it by march 13, 2020; they would be notified of their acceptance by the end of march, and they would be encouraged to submit posters to the institutional repository, digital commons @ longwood, by the date of the spring showcase, april 22. planning for the in-person showcase continued throughout february and the first part of march. one of the elements on the registration form was giving permission for student content to information technology and libraries march 2022 using open access institutional repositories to save the student symposium | symulevich and hamilton 4 go into the institutional repository. this step had been added in fall 2019 for the previous showcase. as covid-19 cases in the united states began rising in the beginning of march, administrators at longwood began to discuss the possibility of altering certain events. author mark hamilton considered the possibility of offering the institutional repository as a vehicle for hosting digital content for the showcase. by march 23, the director of undergraduate research notified the author that a decision had been made to host the spring showcase as an asynchronous event from april 22–24, 2020. 
this decision had been made by a small group including the co-chairs of the showcase and the provost in consultation with others. the director of undergraduate research specifically asked the library if the event could be hosted virtually through digital commons @ longwood and also requested a comments feature to facilitate online conversation. students, faculty, and staff would comment on the presentations throughout the three-day showcase. presenters would check the comments within those three days and post replies. the director also asked if it would be possible to upload videos that could go along with posters and other presentations. hamilton and the library’s digital initiatives specialist began to work through the technical aspects of making the showcase virtual. after inquiring about potential software from digital commons, they looked through two suggested options, disqus (https://disqus.com/) and intense debate (https://intensedebate.com/). they decided on intense debate because the comments feature was already integrated into the platform. they also looked through the various video formats available. then they worked with the digital commons representative to develop the structure for the showcase. this involved a bit of dialogue back and forth between the showcase co-chairs, digital commons staff, and the library. because the registration form already gave permission to post student content, it was decided that the university did not need to ask for this permission a second time for a virtual conference. new workflows were developed for the research submission process which included posters, presentations, and videos. faculty would submit files on behalf of students in their classes. library staff and instructional designers developed video and printed upload instructions. they were posted on the showcase website (part of the university website) as well as advertised via the library website. faculty asked students to submit their final projects by thursday, april 16, so there was time to upload the project posters and videos to digital commons @ longwood. most students submitted their projects through the campus lms, canvas. as faculty attempted to upload content, library staff were available to help them. the author helped one faculty member via zoom, describing the process of uploading to digital commons. one process that had to be adjusted involved powerpoint presentations that contained videos—they had to be separately downloaded and then just the powerpoint of the poster re-uploaded, so visitors could both view the powerpoint and watch the video. hamilton and the other staff member worked with faculty to place all the required content for each presentation; then the library’s digital initiatives specialist made all the content live. a number of activities occurred during the showcase. initially the digital initiatives specialist individually approved the comments that were posted, because this was the default set up. later this was changed to allow for automatic posting to speed up the approval process and to remove any apparent bias on the part of the administrators. some faculty also uploaded a few new versions of presentations. 
some of the science students decided to post only their abstracts, because they were going to publish their research in journals and did not want their content to be open access and because some faculty were co-authors in these publications. in the subject listing, library staff included a link to the live zoom presentations that were offered. there was also a link to a live zoom session for the showcase awards ceremony, highlighting submissions to the journal of the college of arts and sciences. longwood university has hosted two more virtual showcases: in fall 2020 and spring 2021. the showcase organizers chose to switch the hosting platform to symposium by forager one (https://symposium.foragerone.com/), a third-party platform that allows for virtual and live posting of presentations and videos. the new platform provided an easier interface for students to submit research and administrators to manage it. library staff worked with the showcase organizing committee to preserve all the abstracts from the spring showcase. they are in discussions about how future content will be preserved and whether library staff should collect some of the research into the institutional repository. university of south florida st. petersburg campus virtual student research symposium the university of south florida st. petersburg campus is a branch campus of the larger university of south florida (usf). usf is an r1 research institution with approximately 50,000 undergraduate and graduate students in tampa, florida. at the time of the 2020 virtual student research symposium, usf st. petersburg campus was a separately accredited institution with roughly 5,000 undergraduate and graduate students. the student research symposium was in its 17th iteration in 2020. the office of research at usf st. petersburg organized the event and coordinated with the nelson poynter memorial library and the honors program. undergraduate and graduate students were invited to share their work with the campus community to demonstrate the high-impact research that they were conducting. in 2019, the library had worked with the office of research to host award winners and posters nominated by faculty on the bepress digital commons institutional repository called usfsp digital archive. for the 2020 symposium, the office of research began planning in august 2019. the in-person symposium was scheduled for april 16, 2020. when the covid-19 pandemic hit, the usf st. petersburg campus moved to remote instruction and ended on-campus activities on march 20, 2020. the office of research staff contacted librarians at nelson poynter memorial library to discuss the possibility of a virtual symposium. author allison symulevich considered a variety of platforms for hosting the research symposium, such as the campus website, canvas, libguides, digital commons, and facebook. criteria for platforms included factors related to team control, security, engagement, and archiving. because of these factors and the prior pilot project, the library decided to recommend that the institutional repository be used to host the virtual research symposium. the office of research wanted to capture an experience for the students similar to that of the in-person event.
thus, they requested that the platform include both video and audio options, as well as a way for the poster to be viewed. the office of research also requested audience participation through a commenting feature if possible. they also extended student submission deadlines to assist with the disruption in students' lives. the office of research used a course in canvas, the learning management system used at usf, to collect research posters and presentations from the students. the library digital team was given access to the canvas course so that the team could download posters, presentations, and abstracts to then upload to the institutional repository. the library uploaded 55 student projects, 43 of which had a video or audio presentation. the digital team had hoped to batch upload the files to the institutional repository using spreadsheets containing metadata such as author names, titles, abstracts, and links to audio or video presentation files. however, due to technical concerns, everything was uploaded manually, with work divided amongst team members who had previously used the system. first, all of the content was downloaded from canvas. these files were then posted to a shared drive in a variety of folders organized to maintain a workflow. the projects with audio/video presentations were uploaded first. then the projects that had abstracts and posters were added. due to time constraints, the digital team wanted to make sure the basics were done first so students would have time to make any revisions necessary before the site was promoted to the usf st. petersburg community. after this initial implementation, the team had a meeting with the larger committee to discuss the progress of the digital collection. the committee suggested some changes and offered constructive feedback. once the abstracts, posters, and presentations (either audio or video) were posted, the team noticed issues with some submissions. some students had submitted powerpoint presentations that did not display as the team was hoping, so one of the team members changed the format to mp4 files. audio files did not include a visual component. as a way to add a visual component, the team worked with digital commons to create a digital image gallery and add thumbnail images that could then be added to a special metadata field called poster preview. this enabled the collection to have a visual of the poster displayed above the audio file, allowing virtual attendees to press play on the audio file and see the poster image on the same page. the team then turned to the office of research's request for a feature that allowed virtual attendees to interact with student presenters. digital commons does not have a commenting feature, so the digital team had to look at third-party commenting platforms. digital commons was able to integrate the platform chosen, intense debate, so that virtual attendees could comment on presentations. students were asked to monitor their posters for a two-week period.
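the batch-upload route the team had hoped to take is worth sketching for institutions with more preparation time. the fragment below is a minimal, hypothetical illustration rather than the workflow usf actually ran: it walks a directory of submissions exported from the learning management system and writes a metadata spreadsheet that could then be mapped onto a repository's own batch-loading template. the folder layout, file-naming convention, and column names are all assumptions made for the example; a real digital commons batch template defines its own required headers.

import csv
from pathlib import Path

# hypothetical layout: one folder per student project, exported from the lms,
# containing a poster (pdf), an optional media file, and an abstract.txt file.
SUBMISSIONS = Path("symposium_2020/submissions")
OUTPUT = Path("symposium_2020/batch_upload.csv")

# hypothetical column names; map these onto the repository's own template.
FIELDS = ["title", "authors", "abstract", "document_path", "streaming_media_path"]

def describe(project_dir: Path) -> dict:
    """collect one spreadsheet row of metadata for a single project folder."""
    abstract_file = project_dir / "abstract.txt"
    abstract = abstract_file.read_text(encoding="utf-8").strip() if abstract_file.exists() else ""
    posters = sorted(project_dir.glob("*.pdf"))
    media = sorted(project_dir.glob("*.mp4")) + sorted(project_dir.glob("*.mp3"))
    return {
        "title": project_dir.name.replace("_", " "),
        "authors": "",  # filled in later from a registration export or by hand
        "abstract": abstract,
        "document_path": str(posters[0]) if posters else "",
        "streaming_media_path": str(media[0]) if media else "",
    }

def main() -> None:
    rows = [describe(d) for d in sorted(SUBMISSIONS.iterdir()) if d.is_dir()]
    with OUTPUT.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    print(f"wrote {len(rows)} rows to {OUTPUT}")

if __name__ == "__main__":
    main()

a spreadsheet produced this way also doubles as a manifest for manual uploading and later quality checks, which remains useful even when, as here, the records ultimately have to be keyed in by hand.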
moving forward, the library and the usf st. petersburg campus discussed using the institutional repository for the spring 2021 symposium. however, due to administrative consolidation of the usf tampa, st. petersburg, and sarasota-manatee campuses into one usf with a single accreditation, the new combined office of research on the tampa campus decided to host the newly expanded, one-university, undergraduate student symposium through a canvas course.24 download statistics following the research symposiums, the authors looked at metrics for the virtual events. at usf, 55 presentations were uploaded to the ir. all-time downloads from april 1, 2020 to december 31, 2021, including additional supplementary files, are 2,068, from 53 countries around the world. total streams of audio or video presentations for the same timeframe are 1,168. at longwood, 200 presentations were uploaded to the ir. all-time downloads from april 13, 2020 to december 31, 2021, including additional supplementary files, are 16,190 from 124 countries around the world. total streams of video presentations for the same timeframe are 2,541. see figures 1 and 2. these presentations are still getting downloads and streams—one of the benefits of preserving high-impact student research projects. figure 1. downloads of symposium materials from each campus (university of south florida and longwood university) from april 20, 2020 to december 31, 2021. blue represents downloads of presentations; red represents supplementary materials. figure 2. streams of symposium materials from each campus (university of south florida and longwood university) from april 20, 2020 to december 31, 2021. dark blue represents plays, blue represents views, and light blue represents completed viewings. best practices after reflecting on these large undertakings to move in-person events to online student research symposiums, the authors have identified some common best practices, meant to assist other institutions making similar decisions. these decisions are based on the following core requisites. consistent university branding although both universities used bepress' digital commons platform, institutions can use a variety of online platforms, such as campus websites, other institutional repositories, and third-party software for conference hosting, such as symposium by forager one or lumen learning. use a system that creates a cohesive look and feel to the collection of student research projects. usf had to do this with audio-only presentations for consistency of viewing, adding a visual component to match those of video presentations. university access or open access use a platform that allows archiving of student projects. even if the platform chosen for hosting the event does not allow for archiving, libraries should work with event hosts to provide institutional repository digital archiving of projects, similar to usf's pilot project and longwood's 2021 spring project of archiving abstracts. libraries can offer this as a solution to provide permanent archiving of high-impact student work.25 institutions need to consider whether they will keep their symposiums closed, meaning only accessible to the university community, or open to the world. while it is technically straightforward to restrict access using the campus lms, irs using net id sign-ins, or private websites, the authors argue for worldwide access to these presentations.
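usf addressed the audio-only problem by attaching a poster-preview image in digital commons; a different option, sketched below purely as an illustration and not something either institution did, is to render each audio narration together with its poster into a single mp4 before upload, so that every presentation arrives in the one file format the platforms handled best. the sketch assumes ffmpeg is installed and uses hypothetical file names; the flags are a standard still-image-plus-audio recipe.

import subprocess
from pathlib import Path

def narration_to_mp4(poster: Path, audio: Path, output: Path) -> None:
    """render a static poster image plus an audio narration into a single mp4 file."""
    subprocess.run(
        [
            "ffmpeg",
            "-loop", "1",            # repeat the single poster frame for the whole video
            "-i", str(poster),
            "-i", str(audio),
            "-c:v", "libx264",
            "-tune", "stillimage",
            "-c:a", "aac",
            "-b:a", "192k",
            "-pix_fmt", "yuv420p",   # broad player compatibility
            "-shortest",             # stop when the narration ends
            str(output),
        ],
        check=True,
    )

if __name__ == "__main__":
    # hypothetical file names for one student project
    narration_to_mp4(Path("poster.png"), Path("narration.mp3"), Path("presentation.mp4"))

converting everything to a single video format up front trades a little processing time for consistent playback, streaming statistics, and preservation handling downstream.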
archiving student work archiving these projects allows students to build their cvs for graduate school or interviews by providing hyperlinked citations to worldwide published projects. making these projects available open access allows students to contribute to the worldwide scholarly conversation on their given research topics.26 statistics from both longwood and usf show international downloads. file formats for file formats, consider embedding video and audio files in consistent formats. mp4 video files worked best for both longwood and usf on the digital commons platform. audio files should be consistent as well for preservation. cross-unit collaborations work with other departments on campus to host these major academic events. many units on campus contribute to student success, and these efforts can be combined to distribute work amongst university faculty and staff so as not to overload one department and to provide the best possible symposium. different departments have different skill sets, such as technology and marketing. both longwood and usf st. petersburg libraries worked with departments such as undergraduate studies and communications to switch these in-person events to successful online programs. consider working with distance learning units to increase distance learning student participation in student research symposiums.27 distance learning and it departments may have additional technology experience that could lead to a better overall experience for students. in 2021, longwood worked with the office of student research, the library, marketing and communications, and academic affairs to put on the spring student showcase. this inter-unit work led to another successful online event, with several hundred student researchers presenting their work. flexibility institutions should use flexible workflows when transitioning in-person events to online. both longwood and usf used flexible workflows for posting presentations into institutional repositories. however, the two universities differed in their submission process. longwood had faculty submit student projects directly to the ir, a more distributed approach. usf st. petersburg had students submit projects to canvas, and then the digital team posted projects to the ir, a more centralized approach. institutions will need to decide which approach works best for them. usf st. petersburg does not have a history of allowing outside submissions to its ir. the digital teams needed to remain flexible as event dates were moving and online technology requests were changing, for example, event coordinators requesting online commenting features. similarly, deadlines should be set with realistic timeframes, allowing enough time for uploading projects to the online platform. longwood and usf worked with offices of research to establish flexible timelines for digital teams. submission forms consider using forms or a system to collect student submissions. google forms, microsoft forms, or a learning management system such as canvas are ways to collect the projects. make sure to test these prior to the submission process. both universities used canvas during the 2020 student research symposiums to collect student projects.
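as noted earlier, the usf team had hoped to batch upload projects using spreadsheets of metadata. purely as an illustrative sketch (the export and batch column names below are hypothetical stand-ins, not an actual digital commons batch template), a submissions export from a form or lms could be reshaped into such a sheet with a short script:

```python
import csv

# Hypothetical field names for a forms/LMS export and for a repository batch sheet;
# actual platform batch templates use different, platform-specific headers.
EXPORT_FIELDS = ["student_name", "project_title", "abstract", "presentation_link"]
BATCH_FIELDS = ["title", "author", "abstract", "multimedia_url", "document_type"]

def export_to_batch_sheet(export_path, batch_path):
    """Convert a submission-form export into a batch-upload metadata sheet."""
    with open(export_path, newline="", encoding="utf-8") as src, \
         open(batch_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=BATCH_FIELDS)
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({
                "title": row["project_title"].strip(),
                "author": row["student_name"].strip(),
                "abstract": row["abstract"].strip(),
                "multimedia_url": row["presentation_link"].strip(),
                "document_type": "poster",  # assumption: all submissions are posters
            })
```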
however, in 2021, longwood students (both graduate and undergraduate) submitted directly to forager one’s symposium platform because it was already integrated into the campus single sign-on service, enabling ease of submissions. for the graduate student research symposium in 2021, usf used microsoft forms to create a form that was tailored to file format preference. although this form was not used after the office of graduate studies went in another direction, symulevich felt it was an improvement from the previous year’s collection process due to the output of an excel spreadsheet for metadata collection for batch uploading purposes. abstract archiving institutions should consider allowing students to submit abstracts only. longwood allowed students to not submit complete presentations if they were planning to publish their projects. this may be more of an issue when students are working with faculty members on research to be published at a later date. promoting the symposium and creating engagement promote the event to increase student participation. this can be done both through social media and through university web presences. consider working with your campus marketing and communications department to broaden marketing beyond the library. this marketing can be both to gain student projects and to promote the event to the broader campus community. likewise, seek ways to promote engagement on the institutional repository or whatever platform is chosen. this could include using a third-party commenting feature as a way to further engage students with their scholarly topic. however, make sure to monitor commenting in some capacity to avoid spam. also, turn off commenting features after a certain period of time so as not to overburden students. commenting features and increased engagement via online platforms, like video and audio presentations, help avoid the negative impact of a lack of face-to-face interactions.28 information technology and libraries march 2022 using open access institutional repositories to save the student symposium | symulevich and hamilton 10 hybrid symposiums even after the end of the covid pandemic when events resume in-person, hybrid symposium models should be considered, as evidenced by longwood’s use of synchronous presentations using zoom. these links were integrated into the ir. osu is considering using hybrid solutions in the future as well.29 conclusion moving in-person student research symposiums to online platforms during a pandemic is challenging. but this process of creating online events allows students to continue to celebrate their highimpact research and contribute to the scholarly community. open access archiving of these projects has been successful based on download counts at longwood university and usf st. petersburg campus. the authors hope to continue to use innovative digital archiving to provide support for student research projects. remaining flexible and working with other departments on campus can lead to successful online events. the authors hope in-person events will eventually return; however, these online platforms can enhance student research symposiums, providing global access to high-impact student projects. acknowledgement the authors thank the collaborative teams at longwood university and university of south florida st. petersburg campus that helped make these student research symposiums happen and succeed during a very difficult time. endnotes 1 george d. 
kuh, “high-impact educational practices: what they are, who has access to them, and why they matter,” leap (2008), association of american colleges & universities, https://provost.tufts.edu/celt/files/high-impact-ed-practices1.pdf. 2 kuh, “high-impact.” 3 helen walkington, jennifer hill, and pauline e. kneale, “reciprocal elucidation: a student-led pedagogy in multidisciplinary undergraduate research conferences,” higher education research & development 36, no. 2 (2017): 417, https://doi.org/10.1080/07294360.2016.1208155. 4 walkington, hill, and kneale, “reciprocal elucidation,” 417–18. 5 danielle barandiaran, betty rozum, and becky thoms, “focusing on student research in the institutional repository: digitalcommons@usu,” college & research libraries news 75, no. 10 (2014): 546–49, https://doi.org/10.5860/crln.75.10.9209; betty rozum, becky thoms, scott bates, and danielle barandiaran, “we have only scratched the surface: the role of student research in institutional repositories” (paper, acrl 2015 conference, portland, or, march 26, 2015), https://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2015/rozum_thoms_bates_barandiaran.pdf. 6 rozum, thoms, bates, and barandiaran, “we have only scratched the surface,” 804. 7 erin passehl-stoddardt and robert monge, “from freshman to graduate: making the case for student-centric institutional repositories,” journal of librarianship and scholarly communication 2, no. 3 (2014): 2, https://doi.org/10.7710/2162-3309.1130. 8 rose sliger krause, andrea langhurst eickholt, and justin l. otto, “creative collaboration: student creative works in the institutional repository,” digital library perspectives 34, no. 1 (2018): 20–31, https://doi.org/10.1108/dlp-03-2017-0010. 9 jessica g. freeze et al., “orchestrating a highly interactive virtual student research symposium,” journal of chemical education 97, no. 9 (2020): 2773–78, https://dx.doi.org/10.1021/acs.jchemed.0c00676; sophie pierszalowski et al., “developing a virtual undergraduate research symposium in response to covid-19 disruptions: building a canvas-based shared platform and pondering lessons learned,” scholarship and practice of undergraduate research 4, no. 1 (fall 2020): 75, https://doi.org/10.18833/spur/4/1/10. 10 pierszalowski et al., “developing a virtual undergraduate research symposium,” 75. 11 stephanie houston, “lessons of covid-19: virtual conferences,” journal of experimental medicine 217, no. 9 (2020): e20201467, https://doi.org/10.1084/jem.20201467. 12 houston, “lessons of covid-19,” 2. 13 valerie speirs, “reflections on the upsurge of virtual cancer conferences during the covid-19 pandemic,” british journal of cancer 123 (2020): 698–99, https://doi.org/10.1038/s41416-020-1000-x. 14 houston, “lessons of covid-19,” 3. 15 speirs, “reflections on the upsurge,” 699. 16 pierszalowski et al., “developing a virtual undergraduate research symposium,” 75.
17 houston, “lessons of covid-19,” 2–3; goedele roos et al., “online conferences—towards a new (virtual) reality,” computational and theoretical chemistry 1189 (november 2020): 5, https://doi.org/10.1016/j.comptc.2020.112975. 18 speirs, “reflections on the upsurge,” 699. 19 houston, “lessons of covid-19,” 2–3. 20 houston, “lessons of covid-19,” 2. 21 houston, “lessons of covid-19,” 3; roos et al., “online conferences,” 5; speirs, “reflections on the upsurge,” 699. 22 jonathan bull and stephanie davis-kahl, “contributions to the scholarly record: conferences & symposia in the repository,” library faculty presentations (2015): paper 12, http://scholar.valpo.edu/ccls_fac_presentations/12. 23 bull and davis-kahl, “contributions to the scholarly record.” 24 digital commons @ usf will be used for a hybrid symposium, the 2022 annual undergraduate research conference. there will be an in-person component, as well as both synchronous and asynchronous presentations. 25 passehl-stoddardt and monge, “from freshman to graduate,” 2; barandiaran, rozum, and thoms, “focusing on student research in the institutional repository”; rozum, thoms, bates, and barandiaran, “we have only scratched the surface,” 804. 26 houston, “lessons of covid-19,” 2. 27 pierszalowski et al., “developing a virtual undergraduate research symposium,” 75. 28 houston, “lessons of covid-19,” 2–3; roos et al., “online conferences,” 3. 29 pierszalowski et al., “developing a virtual undergraduate research symposium,” 75. letter from the editor kenneth j. varnum information technology and libraries | march 2018 1 https://doi.org/10.6017/ital.v37i1.10388 this issue marks 50 years of information technology and libraries. the scope and ever-accelerating pace of technological change over the five decades since journal of library automation was launched in 1968 mirrors what the world at large has experienced. from “automating” existing services and functions a half century ago, libraries are now using technology to rethink, recreate, and reinvent services — often in areas that simply were in the realm of science fiction. in an attempt to put today’s technology landscape in context, ital will publish a series of essays this year, each focusing on the highlights of a decade.
in this issue, editorial board member mark cyzyk talks about selected articles from the first two volumes of the journal. in the remaining issues this year, we’ll tackle the 1970s, 1980s, 1990s, and 2000s. the journal itself, now as ever before, focuses on the present and the near future, so we will hold off recapitulating the current decade until our centennial celebration in 2068. as we look back over the journal’s history, the editorial board is also looking to the future. we want to make sure that we know for whom we are publishing these articles, and to make sure that the journal is as relevant to today’s (and tomorrow’s) readership as it has been for those who have brought us to the present. to that end, we invite anyone who is reading this issue to take this brief survey (https://umich.qualtrics.com/jfe/form/sv_6hafly0cyjpbk4j) — tell us a little about how you came to ital today, how you’re connected with library technology, and what you’d like to see in the journal. it won’t take much of your time (no more than 5 minutes) and will help us understand the context in which we are working. there’s another opportunity for you to help shape the future of the journal. due to a number of terms being up at the end of june 2018, we have at least five openings on the editorial board to fill. if you are passionate about libraries and technology, enjoy working with authors to shape their articles, and want to help set out today’s scholarly record for tomorrow’s technologists, submit a statement of interest at https://goo.gl/forms/5gbqouuseolxrfx52. we seek to have an editorial board that represents the diversity of library technology practitioners, and particularly invite individuals from non-academic libraries and underrepresented demographic groups to apply. sincerely, kenneth j. varnum editor march 2018 the benefits of enterprise architecture for library technology management: an exploratory case study sam searle information technology and libraries | december 2018 27 sam searle (samantha.searle@griffith.edu.au) is manager, library technology services, griffith university, brisbane, australia. abstract this case study describes how librarians and enterprise architects at an australian university worked together to document key components of the library’s “as-is” enterprise architecture (ea). the article covers the rationale for conducting this activity, how work was scoped, the processes used, and the outputs delivered. the author discusses the short-term benefits of undertaking this work, with practical examples of how outputs from this process are being used to better plan future library system replacements, upgrades, and enhancements. longer-term benefits may also accrue as the results of this architecture work inform the library’s it planning and strategic procurement. this article has implications for practice for library technology specialists as it validates views from other practitioners on the benefits for libraries in adopting enterprise architecture methods and for librarians in working alongside enterprise architects within their organizations. introduction griffith university is a large comprehensive university with multiple campuses located across the south east queensland region in australia. library and information technology operations are highly converged and from 1989–2017 were offered within a single division of information services.
scalable, sustainable, and cost-effective it is seen as a key strategic enabler of the university’s core business in education and research. “information management and integration” and “foundation technology” are two of four key areas outlined in the griffith digital strategy 2020, which highlights enterprise-wide decision-making and proactive moves to take advantage of as-a-service models for delivering applications.1 from late 2016 through to early 2018, library and learning services (“the library”) and it architecture and strategy (itas) worked iteratively to document key components of the library’s “as-is” enterprise architecture (ea). around fifty staff members have participated in the process at different points. the process has been very positive for all involved and has led to a number of benefits for the library in terms of improved planning, decision-making, and strategic communication. as manager, library technology services, the author was well placed to act as a participant-asobserver with the objective of sharing these experiences with other library practitioners. the author actively participated in the processes described here and has been able to informally discuss the benefits of this work with the architects and some of the library staff members who were most involved. mailto:samantha.searle@griffith.edu.au benefits of enterprise architecture for library technology management | searle 28 https://doi.org/10.6017/ital.v37i4.10437 literature review enterprise architecture (ea) emerged over twenty years ago and is now a well-established it discipline. like other disciplines such as project management and change management, there are a number of best practice frameworks in common use, including the open group architecture framework (togaf).2 a global federation of member professional associations has been in place since 2011, with aims including the formalization of standards and promotion of the value of ea.3 educational qualifications, certifications, and professional development pathways for enterprise architects are available within universities and the private training sector. according to the international higher education technology association educause, ea is relatively new within universities but is growing in importance. as a set of practices, “ea provides an overarching strategic and design perspective on it activities, clarifying how systems, services, and data flows work together in support of business processes and institutional mission.”4 yet despite this growing interest in our parent organizations, individual academic libraries applying ea principles and methods are notably absent from the scholarly literature and library practitioner information sharing channels. the fullest account to date of the experience and impacts of enterprise architecture practice in a library context is a case study from the canada institute for scientific and technical information (cisti). at the time of the case study’s writing in 2008, cisti was already well underway in its adoption of ea methods in an effort to address the challenges of “legacy, isolated, duplicated, and ineffective information systems” and to “reduce complexity, to encourage and enable collaborations, and, finally, to rein in the beast of technology.”5 the author of this case study concludes that while getting started in ea was complex and resource-intensive, this was more than justified at cisti by the improvements in technology capability, strategic planning, and services to library users. 
broader whole-of-government agendas are a driver for ea adoption in non-university research libraries. the national library of finland’s ea efforts were guided by a national information society policy and the ea architecture design method for finnish government. 6 a 2009 review of the it infrastructure at the u.s. library of congress (lc) argued lc was lagging behind other federal agencies in adoption of government-recommended ea frameworks. the impact of this included: inadequate linking of it to the lc mission; potential system interoperability problems; difficulties assessing and managing the impact of changes; poor management of it security; and technical risk due to non-adherence to industry standards and lack of future planning.7 a followup review in 2015 noted that lc had since developed an architecture, but that it had still fallen short by not gathering data from management and validating the work with stakeholders. 8 there is little discussion in the literature about the ea process as a collaborative effort. in their 2016 discussion of emerging roles for librarians, parker and mckay proposed ea as a new area for librarians themselves to consider moving into, rather than as a source of productive partnerships.9 they argued that there are many similarities in the skillsets and practices of enterprise architects and information professionals (in particular, systems librarians and corporate information managers). areas of crossover identified included: managing risks, for example, related to intellectual property and data retention; structured and standardized approaches to (meta)data and information; technical skills such as systems analysis, database design and vendor management; and understanding and application of information standards and internal information technology and libraries | december 2018 29 information flows. while not a research library, within a broader information management context state archives and records nsw has promoted the benefits to records managers of working with enterprise architects, including improved program visibility, strategic assistance with business case development, and the embedding of recordkeeping requirements within the organization’s overall enterprise architecture.10 getting started: context and planning library technology services context in 2015–16, the awareness of enterprise and solution architecture expanded significantly within griffith university’s library technology services (lts) team. in 2015, some members of the team participated in activities led by external consultants to document griffith’s overall enterprise architecture at a high level. in 2016, the author became a member of the university’s solution architecture board (sab). lts submitted several smaller solution architectures to this group for discussion and approval, and team members found this process useful in identifying alternative ways to do things that we may not have otherwise considered. as a small team looking after a portfolio of high-use applications, lts was seeking to align itself as much as possible with university-wide it governance and strategy. these broader approaches included aggressively seeking to move services to cloud hosting, standardizing methods for transferring data between systems, complying with emerging requirements for greater it security, and participating in large-scale disaster recovery planning exercises. the author also needed to improve communication with senior it stakeholders. 
there was little understanding outside of the library of the scale and complexity involved in delivering online library services to a community of over 50,000 people. in a resource-scarce environment, it was increasingly important to make business cases not just in formal project documents but also opportunistically in less formal situations (the “elevator pitch”). existing systems were definitely hindering the library in making progress toward an improved online student experience and more efficient usage of staff resources. a complex ecosystem of more than a dozen library applications had developed over time. the library had selected these at different times based on requirements for specific library functions rather than alignment with an overall architectural strategy. our situation mirrored that described at cisti: “a complex and ‘siloed’ legacy infrastructure with significant vendor lock-in” combined with “reactionary” projects that “extended or redesigned [existing infrastructure] to meet purported needs, without consideration for the complexity that was being added to overcomplicated systems.”11 complex data flows between local systems and third-party providers that were critical to library services were not always well-documented. while lts staff members were extremely experienced, much of their knowledge was tacit. as in many libraries, staff could be observed sharing in informal, organic ways focused on the tasks at hand, but less effort was spent on capturing knowledge systematically. building a more explicit shared understanding about the library’s application portfolio would help address risks associated with staff succession. improved internal documentation would also address emerging requirements for team members to both develop their own understanding in new areas (upskilling) as well as become more flexible in terms of taking up broader roles and responsibilities across the team (cross-skilling). benefits of enterprise architecture for library technology management | searle 30 https://doi.org/10.6017/ital.v37i4.10437 there was also a sense that the time was right to take stock and evaluate the current state of affairs before embarking on any major changes. the team was supporting several applications, including the library management system and the interlibrary loans system, that were end-of-life. we needed to make decisions, and these needed to not only address our current issues but also provide a firm platform for the future. it was in this context that in 2016 library technology services approached the information technology architecture and solutions group for assistance. information technology architecture and solutions context in 2014, griffith university embarked on a new approach to enterprise architecture. the chief technology officer was given a mandate by the senior leadership of the university to ensure that it architecture was managed within an architecture governance framework, and the information services ea team was tasked with developing and maintaining an ea and providing services to support the development of solution architectures for projects and operational activities. 
two new boards were established to provide governance: the information and technology architecture board (itab) would control architectural standards and business technology roadmaps, while the solution architecture board (sab) would “support the development and implementation of solution architecture that is effective, sustainable and consistent with architectural standards and approaches.” project teams and operational areas were explicitly given responsibility to engage with these boards when undertaking the procurement and implementation of it systems. sets of architectural, information, and integration principles were developed, which promoted integration mechanisms that minimized business impact and were future-proof, loosely coupled, reusable, and shared services.12 our enterprise architects saw their primary role as maximizing the value of the university’s total investment in it by promoting standards and frameworks that could potentially improve consistency and reduce duplication across the whole organization. in order to do this , they would need to work with and through other business units. from the architects’ perspective, a collaboration with the library offered an opportunity to exercise skillsets and frameworks that were in place but still relatively new. griffith was still maturing in this area and attempting to move from the hiring of consultants as the norm to building more internal capability. working with the library would be a good learning experience for a junior architect, who was on a temporary work placement from another part of information services as a professional development opportunity. she could build her skills in a friendly environment before embarking on other engagements with potentially less open client groups. determining scope in a statement of architecture work once the two teams had decided that the process could have benefits on both sides, the next step was to jointly develop a statement of architecture work outlining what the process would include and how we would work together. a formal document was eventually endorsed at the director level, but prior to that, the librarians and the architects had a number of useful informal conversations in which we discussed our expectations, as well as the amount of time that we could reasonably contribute to the process. in developing the statement of work, the two teams agreed to focus on the current “as-is” environment and on assessment of the maturity of the applications already in use (see figure 1). this would help us immediately with developing business cases and roadmaps, without information technology and libraries | december 2018 31 necessarily committing either team to the much greater effort required to identify an ideal “to-be” (i.e., future) state to work towards. figure 1. overview of the architecture statement of work. full size version available at https://doi.org/10.6084/m9.figshare.6667427. 
the open group architecture framework (togaf) supports the development of enterprise architectures through four subdomains: business architecture, data architecture, application architecture, and technology architecture.13 the work that we decided to pursue maps to two of these areas: data architecture, which “describes the structure of an organization’s logical and physical data assets and data management resources;” and application architecture, which “provides a blueprint for the individual applications to be deployed, their interactions, and their relationships to the core business processes of the organization.” enterprise architecture process and outputs once the architecture statement of work had been agreed on, the two teams embarked on the process of working together over an extended period. while the lapsed time from approval of the statement of work through to endorsement of the architecture outputs by the solution architecture board was approximately fourteen months, the bulk of the work was undertaken within the first six months. following an intense period of information gathering involving large numbers of staff, a smaller subset of people then worked iteratively to refine the outputs for final approval. several times architecture activities had to be placed on hold in favor of essential ongoing operational work and higher priority projects, such as a major upgrade of the institutional repository. the process involved four main activities which are described in more detail in following sections. https://doi.org/10.6084/m9.figshare.6667427 benefits of enterprise architecture for library technology management | searle 32 https://doi.org/10.6017/ital.v37i4.10437 data asset and application inventory the first activity consisted of a series of three workshops to review information held about library systems in the ea management system, orbus software’s iserver. this is the tool used by the griffith ea team to develop and store architectural models, and to produce artifacts such as architecture diagrams (in microsoft visio format) and documentation (in microsoft word, excel, and powerpoint formats).14 the architects guided a group of librarians who use and support library systems through a process of mapping the types of data held against an existing list of enterprise data entities. in this context, a data entity is a grouping of data elements that is discrete and meaningful within a particular business context. for library staff, meaningful data entities included all the data relating to a person, to items and metadata within a library collection, and to particular business processes such as purchasing. we also identified the systems into which data were entered (system of entry), the systems that were considered the “source of truth” (system of record), and the systems that made use of data downstream from those systems of record (reference systems). the main output of this process was a workbook (figure 2) showing a range of relationships: between systems and data entities; between internal systems; and between internal systems and external systems. the first two columns in the worksheet contain a list of all the data entities and sub-entities stored in library systems (as expressed in the enterprise architecture). along the top of the worksheet is a list of all the products in our portfolio along with a range of systems they are integrated with. each of the orange arrows in this spreadsheet represents the flow of data from one system to another. 
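to make the structure of that workbook more concrete, the sketch below expresses the same idea in python; the entity and system names are illustrative stand-ins, not the actual griffith inventory, which lives in iserver and is far more granular:

```python
# Illustrative stand-ins for workbook rows (data entities) and columns (systems).
inventory = {
    "bibliographic record": {
        "system_of_entry": ["library management system - cataloguing"],
        "system_of_record": "library management system",
        "reference_systems": ["discovery layer", "reading lists"],
    },
    "collection item - holdings": {
        "system_of_entry": ["library management system - cataloguing"],
        "system_of_record": "library management system",
        "reference_systems": ["interlibrary loans", "discovery layer"],
    },
}

def data_flows(inv):
    """Yield (source_system, target_system, entity) tuples, the 'orange arrows'."""
    for entity, mapping in inv.items():
        record_system = mapping["system_of_record"]
        for entry_system in mapping["system_of_entry"]:
            if entry_system != record_system:
                yield (entry_system, record_system, entity)
        for reference in mapping["reference_systems"]:
            yield (record_system, reference, entity)

for src, dst, entity in data_flows(inventory):
    print(f"{entity}: {src} -> {dst}")
```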
the workbook in this raw form is definitely messy and the data within it is not really meant to be widely consumed in this format. the workbook’s main role is as the data source for the application communication diagram that is described in a later section. as a result of this data asset inventory, the management system used by our architects now contains a far more comprehensive and up-to-date view of the library’s architectural components than before: • the data entities better reflect library content. for example, while iserver already had a collection item data entity, we were able to add new data entity subtypes for bibliographic records, authority records, and holdings records. • library systems are now captured in ways that make more sense to us. workshopping with the architects led to the breakdown of several applications into more granular architectural components. for example, the library management system is now represented not just as a single system, but rather as a set of interconnected modules that support different business functions, such as cataloguing and circulation. similarly, our reading lists solution was broken down into its two main components: one for managing reading lists and one for managing digitized content. this granularity has enabled us to build a clearer picture of how systems (and modules within systems) interface with each other. information technology and libraries | december 2018 33 figure 2. part of the data asset and application inventory worksheet. full size version available at https://doi.org/10.6084/m9.figshare.6667430. https://doi.org/10.6084/m9.figshare.6667430 benefits of enterprise architecture for library technology management | searle 34 https://doi.org/10.6017/ital.v37i4.10437 • the wide range of technical interfaces we have with third parties, such as publishers and other libraries, is now explicitly expressed. feedback from the architects suggested that the library was very unusual compared to other parts of the organization in terms of the number of critical external systems and services that we use as part of our service provision. previously iserver did not contain a full picture of these critical services, including: o the web-based purchasing tools that we use to interact with publishers, such as ebsco’s gobi;15 o the library links program that we use to provide easier access to scholarly content via google scholar;16 and o various harvesting processes that enable us to share metadata with content aggregators, such as the national library of australia’s trove service and the australian national data service’s research data australia portal. 17 application maturity survey the second activity was an application maturity assessment. this involved forty-four staff members from all areas of the library with different viewpoints (technical, non-technical, and management) answering a series of questions in a spreadsheet format. the survey contained questions about: • how often a system was used; • how easy it was to use; • how well it supported the business processes that person carried out; • how well it performed, for example, in terms of response times; • how quickly changes/enhancements were implemented in the product; • how easily the system could be integrated with other systems; • the level of compliance with industry standards; and • overall supportability (including vendor support). 
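before looking at the outputs, it may help to sketch how such one-to-five responses can be rolled up into per-application scores and a quadrant-style recommendation. the question groupings, midpoint threshold, and quadrant labels below are assumptions made purely for illustration, not the actual model used in iserver:

```python
from statistics import mean

# Hypothetical grouping of survey questions into the three headline measures.
BUSINESS_FIT = ["ease_of_use", "process_support", "change_speed"]
TECHNICAL_FIT = ["performance", "integration", "standards", "supportability"]
CRITICALITY = ["frequency_of_use"]

def summarize(responses):
    """responses: list of dicts of 1-5 scores for a single application."""
    def avg(keys):
        return mean(r[k] for r in responses for k in keys)
    return {
        "business_fit": round(avg(BUSINESS_FIT), 1),
        "technical_fit": round(avg(TECHNICAL_FIT), 1),
        "criticality": round(avg(CRITICALITY), 1),
    }

def management_strategy(business_fit, technical_fit, midpoint=3.0):
    """Map fit scores onto four quadrants; the mapping here is an assumption."""
    if business_fit >= midpoint and technical_fit >= midpoint:
        return "optimise"
    if business_fit < midpoint and technical_fit >= midpoint:
        return "implementation review"
    if business_fit >= midpoint and technical_fit < midpoint:
        return "technology refresh"
    return "replace"
```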
as different respondents were assigned multiple systems depending on their level of support and/or use, the final overall number of responses to the survey was 144 responses relating to eleven different systems. the outputs of this process were a summary table and a series of four graphs. the summary table (see figure 3) presents aggregated scores on a scale of one (low) to five (high) for each application as well as recommended technical and management strategies. it is interesting, and somewhat disheartening, to note that scores for the business criticality of the applications are generally much higher than the scores for fitness. there is also some variation in the strategies required; some systems need to be replaced, but there are others where the issues seem to be less technical. the third row of the table shows a product that is scored as highly business-critical and perfectly suited to the job from a technical perspective, yet the product still scores much more poorly for business fit, which could indicate that something has gone wron g in the way that this product has been implemented. information technology and libraries | december 2018 35 figure 3. table summarizing the results of the application maturity assessment [product names redacted]. applications are rated on a scale of one to five, and one of four management strategies (technology refresh—not shown here, optimise, implementation review, or replace) is recommended. full size version available at https://doi.org/10.6084/m9.figshare.6667433. figure 4. two of the four graph types produced from the application maturity survey results, for a product [name redacted] that is performing well. full size version available at https://doi.org/10.6084/m9.figshare.6667436. figures 4 and 5 show the four graph types produced automatically from the survey results. on the left in figure 4 is a view displaying the business criticality, business fit, and technical fit for an individual application (shown in pink) as compared to the overall portfolio (shown in blue). on the right is a graph showing scores for the range of measures covered by the survey. this https://doi.org/10.6084/m9.figshare.6667433 https://doi.org/10.6084/m9.figshare.6667436 benefits of enterprise architecture for library technology management | searle 36 https://doi.org/10.6017/ital.v37i4.10437 particular product is doing well; technical and business fit are high in the graph on the left, and most measures are above average in the graph on the right. figure 5 shows the remaining two graphs for the same product. the graph on the left plots the scores for business criticality and application suitability (fitness for purpose) to produce a recommended technical strategy. the graph on the right plots the scores for business fit and technical fit to produce a recommended management strategy. in both graphs, it is possible to see how the specific application is performing (the red square) compared to the portfolio overall (the blue diamond). placement within the quadrant with the green optimize label is preferred, as in this case. figure 5. the remaining two graph types from the application maturity survey results, for a system [product name redacted] that is performing well. the specific system’s location is shown by the red square, while the blue diamond maps the average for all systems in the application portfolio. full size version available at https://doi.org/10.6084/m9.figshare.6667442. figures 6 and 7 present the same set of graphs for an end-of-life system. 
in figure 6 the graph on the left shows that the product is very business-critical but that its scores for technical fit and business fit (the lower corners of the pink triangle) are lower than the average across all applications (the lower corners of the blue triangle). the graph on the right shows that supportability and the time to market for changes and enhancements (the least prominent “points” in the pink polygon) are below the portfolio average (shown in blue along the same axes) while scores for other criticality, standards compliance, information quality, and performance were more in line with the portfolio average. https://doi.org/10.6084/m9.figshare.6667442 information technology and libraries | december 2018 37 figure 6. the first and second (of four) graphs for a system [product name redacted] that is end of-life. full size version available at https://doi.org/10.6084/m9.figshare.6667478. in figure 7, this application is placed well within the quadrant suggesting replacement. figure 7. the third and final graphs for a system [product name redacted] that is end-of-life. the placement of the red square within the replace quadrant indicates that this product is a high candidate for decommissioning. this is a marked difference from the portfolio as a whole (the blue diamond), which could be reviewed for possible implementation improvements. full size version available at https://doi.org/10.6084/m9.figshare.6667484. https://doi.org/10.6084/m9.figshare.6667478 https://doi.org/10.6084/m9.figshare.6667484 benefits of enterprise architecture for library technology management | searle 38 https://doi.org/10.6017/ital.v37i4.10437 the graphs are also useful for highlighting anomalies. figure 8 shows a product that is assessed as better-than-average in the portfolio on most measures. however, the survey results quite clearly show that information quality is a major issue. figure 8. graph from application maturity survey showing a specific area of concern (data quality) for an otherwise well-performing application [product name redacted]. full size version available at https://doi.org/10.6084/m9.figshare.6667487. this type of finding will help library technology services to target our continuous improvement efforts and work through our relationships with user groups and vendors to get a better result. application communication diagram the third major activity was the production of an application communication diagram (see figure 9). this is a visual representation of all of the information that was collated through the workshops using the workbook described above. https://doi.org/10.6084/m9.figshare.6667487 information technology and libraries | december 2018 39 figure 9. application communication diagram [simplified view]. full size version available at https://doi.org/10.6084/m9.figshare.6667490. https://doi.org/10.6084/m9.figshare.6667490 benefits of enterprise architecture for library technology management | searle 40 https://doi.org/10.6017/ital.v37i4.10437 the diagram includes a number of things to note. • key applications that make up the library ecosystem. an example of this is the large blue box on the top left. this represents the intota product suite from proquest, which contains multiple components, including our link resolver, discovery layer, and electronic resource manager. • physical technology. self-checkout machines appear as the small green box mid-right. • other internal systems that connect to library system components. 
examples of these are throughout and include: corporate systems, such as peoplesoft for human resources and finances; identity management systems like metadirectory and ping federate; the learning management system blackboard; and research systems, including the research information management system and the researcher profiles system. • external systems that connect to our systems. these are mostly gathered into the large grey box bottom right. • actors who access the systems. this includes administrators, staff, students, and the general public. actors are identified using a small person icon. • interfaces between components. each line in the diagram represents a unique connection into another system or interface. captions on these lines indicate the nature of the connection, e.g., manual data entry, z39.50 search, export scripts, and lookup lists. the production of this diagram has been an iterative process that has taken place over a long time period. the number of components involved in the diagram is quite large, so it is worth noting that the version presented here has actually been simplified. the architects’ tools can present information in different ways and this particular “view” was chosen to balance the need for detail and accuracy with the need to communicate meaningfully with a variety of stakeholders. production of interactive visualizations in the fourth and final work package, the data entity and application inventory spreadsheet was used as a data source to provide an interactive visualization (see figure 10). a member of the architecture team converted the workbook (see figure 2) from microsoft excel .xls into a .csv file. he developed a php script to query the file and return a json object based on the parameters that were passed. the data driven documents javascript library (d3.js) was used to produce a force graph that uses shapes, colors, and lines to visually present the spreadsheet information in a more interactive way.18 this tool enables navigation through the library’s network of data entities (shown as orange squares) and applications (shown as blue dots). in the example being displayed, the data entity “bibliographic records—marc” has been selected. it is possible to see both in the visualization and in the popup box on the left how marc records are captured, stored, and used across our entire ecosystem of applications. this visualization was very much an experiment and the value of this in the long term is something we are still discussing. in the short term, other outputs have proven to be more useful for planning purposes. figure 10. interactive visualization of library architecture, showing relationships between a single data subentity (bibliographic records—marc) and various applications. full size version available at https://doi.org/10.6084/m9.figshare.6667493. discussion the process described above was not without its challenges, including establishing a common language. enterprise architecture and libraries are both fertile breeding grounds for jargon and acronyms. there was also a disconnect in our understandings of who our users were, with the architects tending to concentrate on internal users, while the librarians were keen to include the perspectives of the academic staff and students who make up our core client base.
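the visualization work package described above used a php script and d3.js. as an analogous sketch only (in python rather than php, and with hypothetical column names rather than the real iserver export), converting the inventory csv into the nodes-and-links json object that a d3.js force layout consumes might look like this:

```python
import csv
import json

def inventory_to_force_graph(csv_path, json_path):
    """Build the {nodes, links} JSON shape a d3.js force layout expects.

    Assumes hypothetical 'data_entity' and 'application' columns; the real
    workbook exported from iServer is structured differently.
    """
    nodes, links, seen = [], [], {}

    def node_id(name, kind):
        key = (name, kind)
        if key not in seen:
            seen[key] = len(nodes)
            nodes.append({"name": name, "group": kind})
        return seen[key]

    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            source = node_id(row["data_entity"], "entity")
            target = node_id(row["application"], "application")
            links.append({"source": source, "target": target})

    with open(json_path, "w", encoding="utf-8") as out:
        json.dump({"nodes": nodes, "links": links}, out, indent=2)
```

a small d3.js page could then fetch the resulting json file and draw the force graph from it.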
these were minor challenges, and the experience of working with the enterprise architects was overall an interesting and positive one for the library. our collaboration validated mckay and parker’s view that there is much crossover in the skillsets and mindsets of librarians and enterprise architects.19 both groups tended to work in systematic and analytical ways, which was helpful in removing some of the more emotive aspects that might have arisen through a more judgmental “assessment” process. the enterprise architects’ job was to promote conformance with standards that are aspirational in many respects for the library. however, the collaborative nature of the process and the immediate usefulness of its outputs helped us to approach this as an opportunity to improve our internal practices as well as the services that we offer to library customers. the architects observed in return that library staff were very open-minded about the process; this had not necessarily always been their experience with other groups in the university. one reason for this may have been lts’s efforts to communicate early with other library staff. before embarking on this work, we sent emails and provided verbal updates to all participants and their supervisors. these communications were clear about both the time commitment needed for workshops and surveys and also about the benefits we hoped to achieve. short-term impacts in the library domain the level of awareness and understanding in library technology services about ea concepts and methods is much higher than what it was previously. our capacity to self-identify architectural issues is better as a result and this is enabling us to be proactive rather than reactive. a recent example of this is a request from our solution architecture board (sab) to seek an exemption from our it advisory board (itab) for our proposed use of the niso circulation interchange protocol (ncip) to support interlibrary loan. while ncip is a niso standard that is widely used in libraries, it is not one of the integration mechanisms incorporated into the architecture standards. as a result of this request, we plan to develop a document for these it governance groups about all the library-specific data transfer protocols that we use; not just ncip, but also z39.50, the open archives initiative protocol for metadata harvesting (oai-pmh), the edifact standard for transferring purchasing information, and possibly others. it is in our interests to educate these important governance groups about integration methods commonly used in the library environment, since these are not well understood outside of our team. the baseline as-is application architecture diagram gives us a much better grasp on the complexity we are faced with. understanding this complexity is a prerequisite to controlling it. the diagram, and the process worked through to populate it, makes it easier to identify manual processes that should be automated and integrations that might be done more efficiently or effectively. for example, like most libraries, we still have many scheduled batch processes that we could potentially replace in the future with web services to provide real-time updates. information technology and libraries | december 2018 43 the iserver platform is now an important source of data to support our decision-making, in terms of arriving at broad recommendations for replacing, reimplementing, or optimizing our systems as well as highlighting specific areas of concern. 
importantly, the process produced relative results, so that we can see across our application portfolio which systems are underperforming compared to others. this makes it easier to determine where the team should be putting its efforts and highlights areas where firmer approaches to vendor management could be applied. a practical example of this was our decision in late 2017 to review (and ultimately unbundle and replace) an e-journal statistics module that was underperforming compared to other modules within the same suite. the outputs from this process are also helping library technology services communicate, both within our own team and also with other stakeholders. the results of the application maturity assessment were included as part of a business case seeking project funding to upgrade our library management system and replace our interlibrary loans system. that funding bid was successful. while it is possible that the business case would have been approved regardless, a recommendation from the architects that the system needed to be replaced was likely more persuasive than the same recommendation coming solely from a library perspective. in our organizational context, enterprise architects are trusted by very senior executives; they are perceived as neutral and objective, and the processes that they use are understood to be systematic and data-driven. longer-term impacts in an enterprise context there are a number of longer-term impacts that may arise from this work. seeing the library’s applications in a broader enterprise context is likely to lead to more questioning of the status quo and to a desire to investigate new ways to do things. in large organizations like universities, available enterprise systems can offer better functionality and more standardized ways of operating than library systems. financial systems are an obvious example, as are business intelligence tools. the canned and custom reports and dashboards within library systems meet a narrow set of requirements, but do not compare well for increasingly complex analytics when compared to enterprise data warehousing, emerging “data lake” technologies for less structured data, and sophisticated reporting tools. an enterprise approach also highlights where the same process is being done across different systems. for example, oai-pmh harvesting is a feature of multiple systems at griffith. traditionally each system provides its own feeds. our data repository, publications repository, and researcher profile system all provide oai-pmh harvesting endpoints for sending metadata to different aggregators. an alternative solution to explore could be to harvest all publications data from multiple systems into our corporate data warehouse (particularly if this evolved to provide more linked data functionality) and provide a single oai-pmh endpoint that could then be managed as a single service. the ea process has further raised our already high level of concern with the current library systems market. there has been a move in recent years towards larger, highly-integrated “black box” solutions. while there have been some moves towards openness, for example through the development of apis, these are often rhetorical rather than practical. the pricing structures for products mean that we continue to pay for functionality that would not be required if we could integrate library applications with non-library enterprise tools in smarter ways. 
at griffith, the products that scored most highly in our maturity assessment in terms of business and technical fit were the less expensive, lightweight, browser-based, cloud-native tools designed to do one or two things really well. this suggests that strategies around a more loosely coupled microservices approach, such as that being developed through the folio open source library software initiative, will be worth exploring in future.20 conclusion there are few documented examples of librarians working closely with enterprise architects in higher education or elsewhere. the goal of this case study is to encourage other librarians to learn more about architects’ work practices and to seek opportunities to apply ea methods in the library systems space for the benefit not just of the library but also for the organization as a whole. as a single institution case study, the applicability of this work may be limited in other environments. griffith has a long tradition of highly converged library and it operations; other organizations may have more structural barriers to entry if the library and it areas are not as naturally cooperative. a further obvious limitation relates to resourcing. the author of the cisti case study cautions that getting started in ea can be complex and resource-intensive. few libraries are likely to be in the position of cisti in having dedicated library architects, so working with others will be required. in many universities, work of this nature is outsourced to specialist consultants because of a lack of in-house expertise. at griffith university, we conducted this exercise entirely with in-house staff. a downside of this was that, despite our best efforts at the scoping stage, competing priorities in both areas meant that this work took far longer than we expected. in theory, external consultants could have guided the library through similar activities to produce similar outputs, and probably in a shorter timeframe. however, we would observe that the process has been just as important as the outputs; the knowledge, skills, and relationships that have been built will continue into the future. at cisti, investments in ea were assessed by the library as justified by the improvements in technology capability, strategic planning, and services to library users. the griffith experience validates this perspective. it is also important to note that ea work can and should be done in an iterative way. our experience suggests that some outputs can be delivered earlier than others and useful insights can be gleaned even from drafts. our local “ecosystem” of library applications, enterprise applications, and integrations between these different components must respond to changes in technologies; legal and regulatory frameworks; institutional policies and procedures; and other factors. it is therefore unrealistic to expect outputs from a process like this to remain current for long. assuming that the library’s data and application architecture will always be a work-in-progress, it will continue to be worth the effort involved to build and maintain positive working relationships with the enterprise architects, who now have a deeper understanding of who we are and what we do.
acknowledgements thank you to anna pegg, associate it architect; jolyon suthers, senior enterprise architect; colin morris, solution consultant; the library technology services team; all our library and learning services colleagues who participated in this initiative; and joanna richardson, library strategy advisor, for support and feedback during the writing of this article. this work was previously presented at theta (the higher education technology agenda) 2017, auckland, new zealand.
references
1 griffith university, "griffith digital strategy 2020," 2016, https://www.griffith.edu.au/__data/assets/pdf_file/0026/365561/griffithuniversity-digital-strategy.pdf.
2 the open group, "togaf®, an open group standard," accessed june 4, 2018, http://www.opengroup.org/subjectareas/enterprise/togaf.
3 federation of enterprise architecture professional associations, "a common perspective on enterprise architecture," 2013, http://feapo.org/wp-content/uploads/2013/11/common-perspectives-on-enterprise-architecture-v15.pdf.
4 judith pirani, "manage today's it complexities with an enterprise architecture practice," educause review, february 16, 2017, https://er.educause.edu/blogs/2017/2/manage-todays-it-complexities-with-an-enterprise-architecture-practice.
5 stephen kevin anthony, "implementing service oriented architecture at the canada institute for scientific and technical information," the serials librarian 55, no. 1–2 (july 3, 2008): 235–53, https://doi.org/10.1080/03615260801970907.
6 kristiina hormia-poutanen, "the finnish national digital library: national library of finland developing a national infrastructure in collaboration with libraries, archives and museums," accessed march 24, 2018, http://travesia.mcu.es/portalnb/jspui/bitstream/10421/6683/1/fndl.pdf.
7 karl w. schornagel, "information technology strategic planning: a well-developed framework essential to support the library's current and future it needs. report no. 2008-pa-105," may 2, 2009, https://web.archive.org/web/20090502092325/https://www.loc.gov/about/oig/reports/2009/final%20it%20strategic%20planning%20report%20mar%202009.pdf.
8 joel willemssen, "information technology: library of congress needs to implement recommendations to address management," december 2, 2015, https://www.gao.gov/assets/680/673955.pdf.
9 rebecca parker and dana mckay, "it's the end of the world as we know it . . . or is it? looking beyond the new librarianship paradigm," in marketing and outreach for the academic library, ed. bradford lee eden (lanham, md: rowman and littlefield, 2016): 81–106.
10 new south wales state archives and records authority, "recordkeeping in brief 59—an introduction to enterprise architecture for records managers," 2011, https://web.archive.org/web/20120502184420/https://www.records.nsw.gov.au/recordkeeping/government-recordkeeping-manual/guidance/recordkeeping-in-brief/recordkeeping-in-brief-59-an-introduction-to-enterprise-architecture-for-records-managers.
11 anthony, "implementing service oriented architecture," 236–37.
12 jolyon suthers, "information and technology architecture," 2016, accessed april 6, 2018, https://www.caudit.edu.au/system/files/media%20library/resources%20and%20files/communities/enterprise%20architecture/ea2016%20joylon%20suthers%20caudit%20ea%20symposium%202016%20-%20it%20architecture%20v2_0.pdf.
13 the open group, "togaf® 9.1," 2011, 2018, http://pubs.opengroup.org/architecture/togaf9-doc/arch/index.html, part 1 introduction, section 2: core concepts.
14 orbus software, "iserver for enterprise architecture," accessed march 26, 2018, https://www.orbussoftware.com/enterprise-architecture/capabilities/.
15 ebsco, "gobi®," accessed june 5, 2018, https://gobi.ebsco.com/gobi.
16 google scholar, "google scholar support for libraries," accessed june 5, 2018, https://scholar.google.com/intl/en/scholar/libraries.html.
17 national library of australia, "trove," accessed june 5, 2018, https://trove.nla.gov.au/; australian national data service, "research data australia," accessed june 5, 2018, https://researchdata.ands.org.au/.
18 mike bostock, "d3.js—data-driven documents," accessed april 3, 2018, https://d3js.org/.
19 parker and mckay, "it's the end of the world," 88.
20 marshall breeding, "five key technology trends for 2018," computers in libraries 37, no. 10 (december 2017), http://www.infotoday.com/cilmag/dec17/breeding--five-key-technology-trends-for-2018.shtml.
solving seo issues in dspace-based digital repositories: a case study and assessment of worldwide repositories
matúš formanek
information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.12529
matúš formanek (matus.formanek@fhv.uniza.sk) is assistant professor in the department of mediamatics and cultural heritage, faculty of humanities, university of zilina, slovakia. © 2021.
abstract this paper discusses the importance of search engine optimization (seo) for digital repositories. we first describe the importance of seo in the academic environment. online systems, such as institutional digital repositories, are established and used to disseminate scientific information. next, we present a case study of our own institution's dspace repository, performing several seo tests and identifying potential seo issues with a group of three independent audit tools. in this case study, we attempt to resolve most of the seo problems that appeared within our research and propose solutions to them. after making the necessary adjustments, we were able to improve the quality of seo variables by more than 59% compared to the non-optimized state (a fresh installation of dspace). finally, we apply the same software audit tools to a sample of dspace-based institutional repositories worldwide. in the discussion, we compare the seo results of this sample with the average score of the semi-optimized dspace repository (from the case study) and draw conclusions.
introduction and state of the art search engine optimization (seo) is a crucial part of the academic electronic environment.
its users must process large amounts of information and need to retrieve it quickly and effectively. making academic information findable is essential. digital institutional repository systems, used to disseminate scientific information, must present their content in ways that make it easy for researchers elsewhere to find. in this paper, we describe work conducted in the department of mediamatics and cultural heritage at the faculty of humanities, university of zilina, to improve the discoverability of materials contained within its dspace institutional repository. in the literature review, we examine definitions of website quality and discuss audit tools. then, beginning our case study, we describe the tools applied at our institution. we next describe the selection process of a suitable set of testing tools, focused on the optimization of seo variables of the selected institutional repository running on dspace software, that will be applied later in the case study. the remainder of the article focuses on the identification and resolution of potential seo issues using the three independent online tools we selected. we aim to resolve as many problems as possible and compare the level of achieved improvement with the default installation of dspace 6.3 software, which our digital repository is based on. the primary goal is not only to improve the seo parameters of the discussed system but also to increase the searchability of scientific website content disseminated by dspace-based digital repositories. next, we offer insights into worldwide dspace-based repositories. we will show that dspace is currently one of the most widely used software packages to support and run digital repositories. unfortunately, there are many major seo issues that will be discussed later. the secondary objective of this paper is to use the same set of tools to evaluate the current state of a sample of worldwide digital repositories also based on dspace. we will report on our own findings. in the discussion, the seo score of the optimized dspace (from the case study) will be compared with the current state of seo parameters across the worldwide dspace repositories. finally, our work also introduces several relatively novel approaches related to digital repositories that have not yet been extensively discussed in the literature. literature review to achieve our goal, we started with a review of existing academic papers. drawing from those papers, we describe the current state of academic institutions' presentation through the internet and search engines. in this sense, we focus on website optimization. the internet, as a medium, is still rapidly expanding. a massive amount of data is communicated, shared, and available online, as noted by christos ziakos: as a result, billions of websites were created, which made it hard for the average (or even advanced) user to extract useful information from the web efficiently for a specific search. the need for an easier, more efficient way to search for information led to the development of search engines. gradually, search engines began to assess the relevance of every website on their indexes compared to the queries provided to them by the users.
they took into consideration several website characteristics and metrics and calculated the value of each website using complex algorithms. the enormous number of websites being indexed from search engines, along with the increasing competition for the first search results, led to studying and implementing various techniques in order for websites to appear more valuable in search engines.1 that description applies to academic websites as well as to commercial ones. a review of relevant literature suggests that it is very important for academic institutions to carefully consider and apply website optimization. there were around 28,000 universities worldwide in 2010, according to one study that monitored research in the field of worldwide academic webometrics.2 the actual number of universities seems to be very similar in 2020. baka and leyni affirm in their working paper that the success or failure of an academic institution depends on its website: "the work of each university exists only when it encounters and interacts with society. their popularity with the public is steadily growing." this is directly connected with the institution's presence on the world wide web.3 many authors define the term search engine optimization (seo) as a series of processes that are conducted systematically to improve the volume and quality of traffic from search engines to a specific site by utilizing the working mechanism or algorithm of the search engine. it is a technique of optimizing a website's structure and content to achieve a higher position in search results. the aim is to increase the website's ranking in web search results.4 after an extensive search of the relevant literature, we can conclude that although seo is currently a widely discussed topic, there is very little accessible scientific literature related to seo applications in the field of digital repositories in general, and none at all in the particular subset of dspace-based repositories. website quality many authors generally affirm that there is a positive correlation between academic excellence and the complex web presence of an institution. this suggests that website quality is a factor with a predictive or causal relationship to seo performance.5 numerous tools can be employed to measure the quality of websites, test them closely, and produce an seo performance ranking reflecting websites' ability to properly promote their content through the search engines. for example, the academic ranking of world universities (the shanghai ranking, http://www.shanghairanking.com) has been established for the top 1,000 universities in the world. website quality is considered by the authors as the quality of an institution's online presence, its ability to properly promote digital content in search engines, and, in combination, its overall web presence. according to the shanghai ranking list, this is a factor for some "prospective students to decide on whether they will enroll in a specific institute or not."6 a number of recent studies have also attempted to examine the online presence of academic institutions from various points of view.
one of the older studies mentioned that the quality of academic websites is very important for students in the process of enrollment.7 another key aspect is optimized website performance, along with seo and website security.8 audit tools if we want to perform any optimization, we need an appropriate software tool to check a current website's ranking. according to g2, the world's largest technology online marketplace, seo software is designed to improve the ranking of websites in search engine results pages without paying the search engine provider for placement. these tools provide seo insights to companies through a variety of different features, helping identify the best strategies to improve a website's search relevance.9 seo audit software can be used by seo specialists as well as by system administrators. audit software performs one or more of the following functions in relation to seo: content optimization, keyword research, rank tracking, link building, or backlink monitoring. the software then provides reports on the optimization-related metrics.10 many authors stress the importance of a holistic approach to seo factors (24 factors were tested), while noting that success depends most on the most effective ones: for example, the quantity and quality of backlinks, the ssl certificate, and so on, which will be described later in this paper.11 the quality of academic websites is very important for researchers, too. they need to disseminate scientific information and communicate it in effective ways. according to some authors, the topic of academic seo (aseo) has been gaining attention in recent years.12 aseo applies seo principles to the search for academic documents in academic search engines such as google scholar and microsoft academic. in another scientific paper, aseo is considered very similar to traditional seo, in that institutions want to make good use of seo to promote digital scientific content on the internet. beel, gipp, and wilde emphasize the importance for researchers to ensure that their publications will receive a high rank on academic search engines.13 by making good use of aseo, researchers will have a higher chance of improving the visibility of their publications and having their work read and cited by more researchers. in recent years, digital institutional repositories, as academic systems, have become a modern way of promoting and disseminating digital scientific objects through the internet. digital objects need to reach a wider audience: digital repositories have a website interface, interact with students, teachers, or researchers on a daily basis, and hold citations, articles, theses, and other research objects. institutional repositories are affected by search engines too, so some improvements to repositories' seo parameters are needed. these factors contribute to a system's rankings. seo on institutional repositories is not an entirely new scientific topic. kelly stressed eight years ago that google is critical in driving traffic to repositories. he analyzed results from a survey summarizing seo findings for 24 institutional repositories in the united kingdom.
the survey results showed that referring platforms were primarily responsible for driving traffic to those institutional repositories, thanks to many hypertext links in referring domains.14 since then, seo analyses of digital repositories have not been a widely discussed topic in the literature. discussing seo for a specific type of digital repository software is therefore a relatively novel topic; here we focus on dspace, the most widely used and most popular software for running digital libraries and repositories.15 consequently, this paper focuses on that topic, since a dspace-based digital repository is a complex online system in which some seo parameters can be adjusted. seo audit tools help identify potential adjustments to those website properties that could produce higher rankings in search engines (and improve the visibility of the whole system). audit tools selection process website variables that affect seo can be tested using specialized online software tools. this topic is discussed in detail on a semi-professional level on specialized websites that provide a number of recommendations regarding the use of specific tools as well as evaluations of the tools.16 these tools can keep track of changes in many seo variables. we want to use this approach in our study. however, first we need to choose an appropriate set of these tools. we have found that many seo audit tools mentioned in professional online sources are narrowly specialized.17 for example, they may be focused only on keyword analysis, backlink analysis (for example, ahrefs' free backlink checker), and so on. in our study, we intend to describe a greater number of seo parameters to monitor rather than emphasize only a few selected ones. we also need tools that are fully available online for free. based on these criteria, we immediately excluded several tools from the selection, because they provide only sparse, simplistic, or restricted information. many tools were excluded because they were limited to a single test with the requirement of registration or provision of an email address. a number of testing tools were also available only in paid versions. we wanted a set of tools that focus on several aspects of seo analysis and evaluate the quality of websites' seo variables comprehensively. it is important to add that the selected tools' results must be comparable, too. after careful consideration of all possibilities, we finally decided to choose three independent seo audit tools in order to make the approach more transparent. the selected tools met most of the criteria mentioned above. however, it is very important to note that many other software tools surely meet the criteria and could also be suitable for testing purposes. based on the scientific literature review, we were not able to identify specific recommendations in this regard; therefore, we have been guided by the advice offered in the websites and blogs previously mentioned that are focused primarily on seo. our tool selection is as follows (listed in alphabetical order): 1. seo checker (https://suite.seotesteronline.com/seo-checker) is part of a complex audit software suite called seo tester online suite. seo checker provides tests in the following categories: base, content, speed, and connections to social media.
it tracks, among many other parameters, title coherence, text/code ratio, accessibility of microdata, opengraph metadata, social plugins, in-page and off-page links, quality of links, mobile friendliness of the page, and many other seo and technical website attributes. regarding restrictions, only two sites can be tested within a 24-hour period. the limit increases to four sites per day after free registration with a valid email address. moreover, there is a 14-day trial period during which all hidden functionalities work. in the free version that we used, a complete report can only be viewed, not downloaded or saved. 2. seo site checkup (https://seositecheckup.com/) was selected based on many positive recommendations from the technically oriented expert website traffic radius.18 seo site checkup is described as "a great seo tool that offers more than 40 checks in 6 different categories (common seo issues like missing metadata, keywords, issues related with absence of connections to social media, semantic web, etc.) to serve up a comprehensive report that you can use to improve results and the website's organic traffic. it also gives recommendations to fix critical issues in just a few minutes. as a tool, it is very fast and provides in-depth information about the various seo opportunities and accurate results."19 seo site checkup is also ranked number one among audit tools by the geekflare website.20 another reason we selected this tool for our testing scenario is that the google search engine offers a link to it as the first result (excluding paid links) after entering the search query "seo testing tool". seo site checkup is also the fastest of the selected audit tools, which can be considered another advantage. its disadvantages include the ability to test only one website within 24 hours from one public ip address. 3. woorank (https://woorank.com) is recommended by traffic radius: "woorank offers an in-depth analysis that covers the performance of existing seo strategies, social media and more. the comprehensive report analysis is classified into eight sections for improved readability quotient, and you may also download the report as branded pdf."21 woorank occupies the third position among the recommended software tools. trustradius gives it a score of 9.2 out of 10, and users rate it 4.67 out of 5 stars based on 51 reviews.22 some results are hidden in the free version, but the final score is shown. woorank has no limit on the number of websites tested per day, but it is the slowest of the selected testing tools. we selected these three seo audit tools because they work independently, their results are comparable to each other, and they offer a quick way to get comprehensive seo analysis results for a tested site. it should be noted that the results of some tests are hidden, but general guidance on how to fix some issues is provided. however, the solution always depends on the specific site and the technology used. using three different tools adds objectivity because we do not rely on just one tool and a one-sided view of the seo issue. the three selected testers all display results in the same way: test results are always shown as a summarized score in the range of 0 to 100 points (100 represents the best result).
a very large set of seo parameters and technical website properties is evaluated in all three cases. these tests are usually divided into several categories (for example, common seo issues, performance, security issues, and social media integration). although similar parameters are assessed in all three audit tools, there are still some differences between them. each testing tool is unique in a certain area, because it tests parameters that the others do not cover or evaluates a website with a different methodology. still, the fact remains that the evaluated seo parameters overlap between the tools. we will not overload this paper with technical details of the individual tests, because they can easily be found on the websites of the given tools (seo site checkup, seo checker online, woorank). we will just mention the common core of main tests: css minification test, favicon test, google search results preview test, google analytics test, h1 heading tags test, html page size test, image alt test, javascript minification test, javascript error test, keywords usage test, meta description test, meta title test, seo friendly url test, sitemap test, social media test, robots.txt test, url canonicalization test, and url redirects test. another group consists of tests specific to a particular audit tool. thanks to them, we get a more comprehensive view of the tested area of a website's seo characteristics. for example, seo checker features the following specific tests: title coherence test, unique key words test, h1 coherence test, h2 heading tags test, and facebook popularity test. woorank, as the second tool, extends the basic set of tests with the following: title tag length test, in-page links test, off-page links test, language test, twitter account test, instagram account test, traffic estimations, and traffic rank. of course, there is also a set of tests shared by two of the audit tools but not covered by the third, which is specialized in another area. as we have mentioned, the tools offer a list of suggestions for potential improvement of seo characteristics. the user is informed about an issue, but no instructions or solutions are provided on how to resolve it. the main benefit of this paper lies in its objective of solving specific seo issues. this work may improve the visibility and searchability of dspace-based institutional repositories. the set of three audit tools described above will be used in the following section. we attempt to identify possible seo issues of the selected institutional repository in the form of a case study. we then aim to fix the identified seo issues and improve the quality of its seo parameters, as well as demonstrate the potential impact of the repairs on website traffic. all traffic measurements will be based on google analytics data. the institutional repository of the department of mediamatics and cultural heritage (seo case study) background information an older version of our digital repository (based on dspace v5.5) was launched by the department of cultural heritage and mediamatics in april 2017. now, in 2021, the repository makes available online over 180 digital objects, most of them open access under creative commons licenses.
the first attempts to create and establish a similar virtual space for digital objects started long ago. several software solutions had been tested for this purpose, for example invenio and eprints, along with dspace. according to opendoar's statistics, eprints and dspace have always been the most popular tools for running digital repositories.23 a few years ago, dspace was chosen as the primary software for running a digital repository. since then, our usage of open-source software has been growing. for example, ubuntu server lts (long term support) is used as the operating system, tomcat 8 is used as the web server, postgresql assumes the role of the database system, etc. all of those software components are part of a complex digital system and are orchestrated in a virtual environment that is built on an open-source virtualization solution called xcp-ng (in version 8.2). some software components have been switched for others during the development period. based on our experience, the digital repository's regular visitors were mostly staff and students of the department. we initially did not feel a need to improve the visibility of this system to search engines, an oversight that turned out to be a mistake in the long run. we did not perform any search engine optimization on this repository until november 2019, when we coincidentally discovered several scientific articles dealing with seo in the academic environment. after studying the theoretical background, we initiated the practical application process. we applied theory and our experience with dspace software to an seo troubleshooting process within our local repository. most of the optimizing actions related to solving the major seo issues were performed before november 10, 2019. we will describe the seo adjustments we made and derive a list of recommendations for other institutions based on our own experience. initial testing of a clean dspace 6 installation in order to formulate any recommendations related to seo and the administration of dspace digital repositories, it is important to determine and test a starting point. for this purpose, we chose a clean instance of dspace v6.3 with an xml user interface (xmlui), the latest commonly available stable version. this is the same version that we use in this case study and in our production environment. (a newer version, dspace 7 beta 4, was released by atmire on october 13, 2020.)24 no other customization edits were made except a base configuration and necessary url settings.
these notes were retrieved by reports on results. we have used the prefix semiin the last column because we were not able to resolve all detected seo issues—only most of them. all related reasons will be described briefly in the discussion section. when the improving change between states has been made, we have changed a status pictogram (from the red cross to the green correct tick) and set the row color to yellow. the changes leading to improvement (e.g., the yellow rows) will be discussed in detail later, too. recall that we have no need to overload the main text of this paper with detailed technical information about partial tests, because it can be easily found on the websites of the given test tools. table 1 shows the compared results between the non-optimized and semi-optimized states of the dspace repository. based on table 1, the default instance of dspace with basic http and other information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 8 default settings received only 58 points out of 100 in seo site checkup, 50.1 points in seo checker and 32 points in woorank. the average final score is 46.7 points out of 100. although this gained score could be considered as low, the dspace default instance still meets certain basic criteria of seo. in addition, many repository administrators usually do not rely only on a default installation, but they make at least some changes in configuration immediately after the initial installation. inter alia, the first thing to do should be an implementation of https protocol, adding a connection with google analytics services and so on. the improved state is shown in the last column of table 1. whenever we solved an issue, the overall score raised. the semi-optimized repository has obtained a higher score compared to the previous column (default installation). the last column represents the final (however semioptimized) state of technical and seo attributes which we were able to reach at this moment. as shown, many seo issues have been solved. we highlighted them in yellow. on the one hand, some issues remain unsolved. on the other hand, the overall seo improvement is more than noticeable although the final average gained score has not reached the maximum value (100 points). information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 9 table 1. comparison of results between the non-optimized and semi-optimized states of dspace repository. test name state default installation (before optimization) semi-optimized (after a few optimization steps) meta title test, title tag length the title tag is set, but the meta title of the webpage (dspace home) has a length of 11 characters. it is too low. the title tag has been set to “digitálny repozitár katedry mediamatiky a kultúrneho dedičstva” (note: in slovak language). title coherence test the keywords in the title tag are included in the body of the page the title of the page seems optimized. meta description test no meta-description tag is set. meta-description tag has been set. (121 characters) google search results preview test “dspace home” is too general. the title of the page has been changed. keywords usage test the keywords are not included in title and meta-description tags. a set of appropriate keywords has been added. unique key words test the textual content is not optimized on the page. there is an excellent concentration of keywords in the page. 
this page includes 382 words of which 58 are unique. h1 heading tags test 8 h1 tags, 6 h2 tags the h1 tags of the page seem not to be optimized. there are too many h1 tags. h1 coherence test the keywords present in the tag h1 are included in the body of the page. some of the keywords of the tag h1 are not included in the body of the page. h2 heading tags test the keywords present in the tag

    are included in the body of page. information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 10 test name state default installation (before optimization) semi-optimized (after a few optimization steps) language test detected: slovak declared: missing a missed language tag has been implemented. robots.txt test no “robots.txt” file has been found. “robots.txt” file has been enabled. sitemap test no sitemap has been found. sitemap has been enabled. seo friendly url test webpage contains urls that are not seo friendly! webpage contains urls that are not seo friendly. image alt test the webpage does not use “img” tags. it is optimized. inline css test the webpage uses inline css styles. the webpage uses inline css styles. deprecated html tags test the webpage does not use html deprecated tags. google analytics (ga) test ga is not in use. ga has been implemented. favicon test default dspace favicon is used. the favicon has been customized. js error test no severe javascript errors were detected. no severe javascript errors were detected. social media test no connection with social media has been detected. the website is successfully connected with social media (using facebook). facebook account test information about facebook page has been added by schema.org metadata. facebook popularity (low) the webpage is promoted enough on facebook. information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 11 test name state default installation (before optimization) semi-optimized (after a few optimization steps) twitter account test no connection with twitter has been detected. information about twitter account has been added by schema.org metadata. twittercard test no twittercard is implemented. metainformation about twittercard has been added by opengraph metadata. instagram account test no connection with instagram has been detected. information about instagram account has been added by schema.org metadata. microdata (opengraph, schema.org) test there is no microdata or opengraph/schema.org metadata on the website. some opengraph and schema.org matadata has been added. html page size test the size of the page is excellent. (23.65 kb) the size of the page is excellent. (28.84 kb) text/code ratio test 10.71% (excellent) 15.45% (excellent) html compression/gzip (no compression is enabled) the size of html could be reduced up to 79%. the webpage is successfully compressed using gzip compression on your code. your html is compressed with 78% size savings. site loading speed test loading time is around 1.86s loading time is around 2.39s page objects test the webpage has fewer than 20 http requests. the webpage has fewer than 20 http requests. page cache test (server-side caching) the pages are not cached. the pages are not cached. flash test website does not include flash objects. information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 12 test name state default installation (before optimization) semi-optimized (after a few optimization steps) cdn usage test your webpage is not serving all resources (images, javascript and css) from cdns. your webpage is not serving all resources (images, javascript and css) from cdns. image, javascript, css caching tests data are not cached. data are not cached. javascript minification test javascripts are not minified. javascript files’ minification has been enabled in tomcat configuration. 
css minification test. before and after: some of the webpage's css resources are not minified.
nested tables test: the webpage does not use nested tables.
frameset test: the webpage does not use frames.
doctype test: the website has a valid doctype declaration.
url redirects test: 1 url redirect has been detected; it is acceptable.
url canonicalization test: the webpage urls are not canonicalized; https://repozitar.kmkd.uniza.sk/xmlui and https://www.repozitar.kmkd.uniza.sk/xmlui should resolve to the same url, but currently do not.
canonical tag test. before: no canonical tag has been detected. after: the webpage is using a canonical link tag.
https test. before: the website is not ssl secured. after: https has been implemented.
safe browsing test: no malware or phishing activity found.
server signature test: the server self-signature for https is off.
directory browsing test: the server has disabled directory browsing.
plaintext emails test: the webpage does not include email addresses in plain text.
mobile friendliness test (includes tap targets, no plugin content, font size legibility, mobile viewport): the webpage is optimized for mobile visitors.
seo site checkup final score. before: 58/100. after: 81/100.
seo checker online final score. before: 50.1/100. after: 78.0/100.
woorank final score. before: 32/100. after: 65/100.
average final score. before: 46.7/100. after: 74.66/100.
resolving major seo issues this section will look at how we resolved the major seo issues that the tools detected. this is the key technical part, because most of the issues highlighted in table 1 were solved and are described here. the following technical and seo adjustments were implemented and tested in order to improve the average final score by 59.87% (by 27.96 points, from 46.7 to 74.66 points), comparing the fresh installation of dspace against the semi-optimized one. all the following solution procedures are based on our own experience, experiments, and research carried out in the area of digital repositories and their optimization as virtual spaces. during the solving process, we follow the order of issues stated in table 1 and describe them in more detail in the dspace v6.3 environment with the xml user interface (xmlui). the following procedures may differ slightly if you are using a different version of dspace or another graphical interface (for example, jspui). examples of code are given in monospaced font. title, description, and keywords tags in a website header this criterion requires filling in specific metadata (meta-content) fields in the page's html code. search engines process them automatically to find out what the website is about. to solve these seo issues, change the website title (by default "dspace home") located in the language translation config file at /dspace/webapps/xmlui/i18n/messages_en.xml. find the appropriate key and change the value. all content in this file is fully customizable.
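as a concrete illustration, the relevant entry in messages_en.xml might look something like the following sketch. the key name xmlui.general.dspace_home is our assumption based on the default "dspace home" value and may differ between dspace versions; the title shown is the one reported in table 1.

    <!-- /dspace/webapps/xmlui/i18n/messages_en.xml (illustrative sketch) -->
    <!-- locate the entry whose default value is "DSpace Home" and replace the value -->
    <message key="xmlui.general.dspace_home">digitálny repozitár katedry mediamatiky a kultúrneho dedičstva</message>

after saving the file, the xmlui webapp (or the whole tomcat service) usually needs to be reloaded before the new title appears.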
next, edit dspace’s page structure config file (in path /themes/mirage/lib/xsl/core/page-structure.xsl) in order to add the metadata content: • a meta-description tag • a keywords tag • an author tag with a carefully selected content and length just below the main tag, as shown in the example: note: do not forget the termination characters />. the keywords should be included in title and meta–description tags. several other seo parameters are affected by performing those steps, for example, google search results preview test, keywords usage test, unique key words test and keywords concentration test. language declaration the language declaration is very important for search engines to identify the primary language of the website content. if a declared language is missing in a website, you can define it by adding the following line into the page-structure.xsl file (the process is similar to adding keywords and description tag as explained above). edit the page-structure.xsl file (with vim or another text editor, for example) and add a statement like the following above the main tag: note: “sk” is an abbreviation for “slovak language” as stated in w3 namespaces. more information is available at https://www.w3.org/tr/xml/ . google analytics, robots.txt and sitemap implementation the connection between a website and google analytics services enables google analytics to track users’ behavior and understand as they interact with this site. it is the basis of web analysis. the “robots.txt” and “sitemap.xml” files are simple text files which are required for search engines to specify the website structure and additional information about it. https://www.w3.org/tr/xml/ information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 15 to enable google analytics services, insert a ua code identifier (id is a string), obtained from google analytics, into the dspace.cfg config files located in the dspace home folder. in that file find the key/row named “xmlui.google.analytics.key=” and insert the corresponding ua identifier there. next, it is needed to uncomment the row with the key “xmlui.controlpanel.activity.max = 250” in the same “dspace.cfg” file. finally, uncomment the row below in the “xmlui.xconf”file located in the path /dspace/config/ and restart the tomcat service: the “robots.txt” file is commonly used and enabled in dspace, but many seo audit tools are not able to detect it successfully because this file is located in path other than the expected default one. to enable robots.txt file detection, copy the file /dspace/webapps/xmlui/static/robots.txt to the root of the tomcat folder (usually located in path /var/lib/tomcat8/webapps/root). finally, restart the tomcat web service. a sitemap for a currently running dspace instance is available in the “robots.txt” file mentioned above. edit this file and set an appropriate url for the sitemap location. enabling connections with social media this criterion detects a hyperlink (or other metadata) connection between a website and popular social media, such as facebook, twitter, etc. the primary goal is to promote the digital content. this subsection deals with social media connections with a dspace-based repository. a simple creation of a profile or a site on a social network related to a repository is considered an essential example of good practice. however, an appropriate form of connection between sites must be created, too. 
naturally, further endorsement of this system through social networks is another key step. social media-oriented tests are performed by every seo audit tool nowadays. a detected connection with social media can have a big impact on the site's popularity, as well as on the final seo score. there are many ways to establish these connections. a connection with facebook, instagram, or twitter can be as simple as a direct link from the homepage: to add a link to a facebook site profile, edit the page-structure file (/dspace/webapps/xmlui/themes/mirage/lib/xsl/core/page-structure.xsl) just below the div tag with id "ds-footer-wrapper", for example with a link element like the one sketched below.
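a minimal sketch of such a footer link follows; the facebook url and the wrapping div id are placeholders, so only the idea of a plain outbound link (which the audit tools can detect) is illustrated, not the exact markup used in our theme.

    <!-- illustrative fragment for page-structure.xsl, placed just below the div with id "ds-footer-wrapper" -->
    <div id="ds-social-links">
        <!-- the profile url below is a placeholder; use the repository's real facebook page -->
        <a href="https://www.facebook.com/example-repository-page" target="_blank" rel="noopener">facebook</a>
    </div>

similar anchor elements (or the schema.org and opengraph metadata mentioned in table 1) can be added for twitter and instagram.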